Web-based “social” technologies have become a part of our everyday lives and routines. As a result, we constantly leave behind digital traces (electronic records in various forms) during our interactions with these technologies. Such digital traces contain valuable clues about our information and social preferences, as well as our identities.
Our research focuses on the analysis of users’ behavioral traces in networked, social information systems, with an emphasis on the unstructured (textual) data that they share. Our work is positioned at the intersection of social computing and computational social science (particularly, the information and communication sciences). Our ultimate goal is to use behavioral traces to learn about people; however, these insights often have important implications for the design of social technologies.
Social media participants who interact directly with one another can clearly influence each other’s behaviors. However, many social media are not all that social, in the sense that they do not facilitate direct interaction between participants. Examples include comment and review forums where participants share their thoughts and opinions on content (e.g., comments in response to published news articles) or on goods and services (e.g., travel destination reviews at TripAdvisor.com) but cannot respond to other participants.
By studying language patterns in such environments, our experiments have shown that participants influence one another’s behaviors even without direct interaction. For instance, minority-group participants (e.g., women) in a given environment often adapt their language patterns to those of the majority group (e.g., men). We have also discovered a herding effect on language: participants can be induced to use very unusual features (e.g., emoticons in a rather formal medium) in the content they create when previous contributions exhibit such features, even though people generally tend to avoid them. Our work has serious implications for textual data-mining efforts. Linguistic features are often relied upon to infer people’s demographic characteristics; however, there is evidence that our language patterns are susceptible to manipulation. In other words, in social media, we may not always behave as ourselves.
The globalization of the marketplace has led to an increasing number of social computing and social media applications that target worldwide participation. This intent is reflected in the mission statements of some of the biggest industry players, many of which are based in the United States. For instance, Twitter seeks to “give everyone the power to create and share ideas and information instantly, without barriers,” and more than three quarters of its accounts are registered to users outside the U.S. Similarly, “Facebook’s mission is to give people the power to share and make the world more open and connected.” Despite this, little research has been conducted on the behaviors and experiences of users outside North America.
Our research explores various popular social media in which there is a significant presence of international participants. In some of these media, there is support for participants’ local language (e.g., Greek Wikipedia) while in others English is used as a lingua franca (e.g., Internet Movie Database). We are comparing the behavioral and language traces of international participants to those of participants based in North America in order to learn 1) how participation patterns and preferences vary, 2) whether and how international participants bring diversity to the content shared via these media, and 3) how large, multilingual datasets can be processed accurately and efficiently. The findings will help us understand the extent to which the same technology deployed across users of multiple cultures can actually result in making the “world more open and connected” as proposed by Facebook.