Tagged: social media

On Social Media Sampling

Social media sampling involves many issues; two of them are 1) the silent majority problem and 2) the grouping problem.

The former refers to the imbalance between participants and spectators: can we trust that the vocal few represent the views of all?

The latter means that people of similar opinions tend to flock together, so that looking at a single online community, or even a single social media platform, can give a biased understanding of the whole population.

Solving these problems is hard; it requires understanding the online communities, their polarity, the sociology and psychology driving participation, and the functional principles of the algorithms that determine visibility and participation on the platforms.

Prior knowledge of the online communities can serve as a basis for stratified sampling, which offers a partial remedy.
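As a rough sketch of what such a stratum-based remedy might look like in practice (the community names, population shares, and toy posts below are invented for illustration), one can weight the sample by assumed population shares rather than by platform activity:

```python
import random

# Hypothetical prior knowledge: each community's share of the
# population we actually care about (assumed, for illustration).
population_shares = {"forum_a": 0.6, "forum_b": 0.3, "forum_c": 0.1}

def stratified_sample(posts_by_community, total_n, shares):
    """Draw a sample whose community proportions match the assumed
    population shares, not the platform's raw activity levels."""
    sample = []
    for community, share in shares.items():
        k = round(total_n * share)
        pool = posts_by_community[community]
        sample.extend(random.sample(pool, min(k, len(pool))))
    return sample

# Toy data: forum_c is just as loud as the others (same number of
# posts), but it represents only a tenth of the population.
posts = {
    "forum_a": [f"a{i}" for i in range(1000)],
    "forum_b": [f"b{i}" for i in range(1000)],
    "forum_c": [f"c{i}" for i in range(1000)],
}
sample = stratified_sample(posts, 100, population_shares)
```

Even though all three communities produce equally many posts, the vocal-but-small forum_c contributes only 10 of the 100 sampled posts, which is the essence of correcting for the silent majority with prior knowledge.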

Web 3.0: The dark side of social media

Web 2.0 was about all the pretty, shiny things of social media: user-generated content, blogs, customer participation, ”everyone has a voice,” and so on. Web 3.0, in turn, is all about the dark side: algorithmic bias, filter bubbles, group polarization, flame wars, cyberbullying, etc. We discovered that perhaps not everyone should have a voice after all, or at least that a voice should be used with more attention to what one is saying.

While it is tempting to blame Facebook, the media, or ”technology” for all this (just as it is easy to praise them for the good things), the truth is that individuals should accept more responsibility for their own behavior. Technology provides platforms for communication and information, but it does not generate communication and information; people do.

In consequence, I’m very skeptical about technological solutions to the Web 3.0 problems; they are not technological problems but social ones, requiring primarily social solutions and secondarily hybrid ones. We should start respecting the opinions of others, get educated about different views, and learn how to debate based on facts and on finding fundamental differences, not by resorting to argumentation fallacies. Here, machines have only limited power – it’s up to us to re-learn these skills and keep teaching them to new generations. It is quite pitiful that even though our technology is 1000x better than in Ancient Greece, our ability to debate properly is one tenth of what it was 2,000 years ago.

Avoiding enslavement by machines requires going back to the basics of humanity.

From polarity to diversity of opinions

The problem with online discussions and communities is that the extreme poles draw people effectively, causing group polarization, in which a person’s original opinion becomes more radical due to the influence of the group. In Finnish, we have a saying: ”in a group, stupidity concentrates” (joukossa tyhmyys tiivistyy).

Here, I’m exploring the idea that this effect – the growth of polar extremes (for example, being for or against immigration, as many European citizens currently are) – arises simply because people lack options to identify with. There are only the extremes, but no neutral or moderate group, even though, as I argue here, most people are in fact moderate and understand that extremes and absolutes are misleading simplifications either way.

In other words, when there are only two ”camps” of opinion, people are more easily split between them. However, my argument is that people’s preferences correspond to being in the middle, not at the extremes.

These preferences remain hidden because there are only two camps to subscribe to: one cannot be moderate because there is no moderate group.

For example, there are liberals and conservatives, but what about the people in the middle? What about those who share some ideas of liberals and others of conservatives? With only these two groups, other combinations become socially impossible, because people are – again socially – pressured to adopt all the opinions of the group they subscribe to, even when they do not agree with a particular view. This effect has been studied in relation to the concept of groupthink, but no permanent remedy has been found.

How to solve the problem of extremes?

My idea is simple: we should start more camps – more views to subscribe to – especially ones representing moderate views.

The argument is that with a greater supply of camps, people will distribute more evenly among them, and we get less polarization as a consequence.

This is illustrated in the picture (sketched quickly in Paint when inspiration struck).


In (A), public discourse is dominated by the extremes: the distribution of attention is skewed toward the ends of a given opinion spectrum. In (B), the distribution is focused on the center of the opinion spectrum (i.e., moderate views) while the extremes are marginalized (as they should be, according to the assumption of a moderate majority).

An example: having several political parties results in more diverse views being presented. In the US, you are either a Democrat or a Republican (although the marginal Green Party and the progressives must be acknowledged), but in Finland you can also be many other things: Center Party, National Coalition Party, or Green Party, for example. The same applies to most countries in Europe. Although I don’t have hard facts for this, public discourse in the US seems exceptionally polarized compared to many other countries [1].

Giving the moderate ”silent majority” more choices to identify with would reveal the ”true” opinions of citizens and would ideally marginalize both extremes, avoiding the tyranny of the minority [2] that currently dominates public discourse.

Finally, all this could be formalized in game theory by assuming heterogeneity of preferences over the opinion spectrum and parameters such as gravity (the ”pull factor” of the extremes), justifiable e.g. by the media attention given to extreme views over moderate ones. But the implication remains the same: diversity of camps reduces polarization under this set of assumptions.
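As a minimal toy sketch of that formalization (the preference distribution, camp positions, and the gravity value are all assumptions, not empirical estimates), let agents with moderately distributed preferences join the camp that best trades off closeness to their own view against a ”gravity” bonus for extreme positions:

```python
import random

def choose_camp(preference, camps, gravity=0.3):
    """Pick the camp with the best score: closeness to one's own
    preference, plus a 'gravity' bonus for extreme camps (standing
    in for the media attention given to extreme views)."""
    def score(camp):
        return -abs(preference - camp) + gravity * abs(camp)
    return max(camps, key=score)

def mean_extremity(camps, n=10_000, seed=42):
    """Average |position| of the camps agents end up in; higher
    means a more polarized public discourse."""
    rng = random.Random(seed)
    # Preferences clustered around the moderate center of [-1, 1].
    prefs = [max(-1.0, min(1.0, rng.gauss(0, 0.4))) for _ in range(n)]
    return sum(abs(choose_camp(p, camps)) for p in prefs) / n

two_camps = [-1.0, 1.0]                    # only the extremes exist
five_camps = [-1.0, -0.5, 0.0, 0.5, 1.0]   # moderate options added
```

With only two camps, every agent necessarily ends up at an extreme, so the mean extremity is 1.0 by construction; adding moderate camps lets the (assumed) moderate majority sort itself toward the center, and the measured polarization drops even though the gravity pull still favors the extremes.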


[1] Of course there are other reasons, such as media taking political sides.

[2] This means that extreme views are not representative of the whole population (which is more moderate than either pole), but they get disproportionate attention in the media and public discourse. This happens because the majority views are hidden; they would need to be revealed.

Questions from ICWSM17

In the ”Studying User Perceptions and Experiences with Algorithms” workshop, many interesting questions popped up. Here are some of them:

  • Will increased awareness of algorithm functionality change user behavior? How?
  • How can we build better algorithms to diversify information users are exposed to?
  • Do most people care about knowing how Google works?
  • What’s the ”count to 10” equivalent for online discussions? How to avoid snap judgments?
  • How to defuse revenge seeking in online discussions?
  • What are individuals’ affective relationships with algorithms like?

These make for great research questions.

Reading list from ICWSM17

In one of the workshops of the first conference day, ”Studying User Perceptions and Experiences with Algorithms”, the participants recommended papers to each other. Here are, if not all, then most of them, along with their abstracts.

Bakshy, E., Messing, S., & Adamic, L. A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science, 348(6239), 1130–1132.

Exposure to news, opinion, and civic information increasingly occurs through social media. How do these online networks influence exposure to perspectives that cut across ideological lines? Using deidentified data, we examined how 10.1 million U.S. Facebook users interact with socially shared news. We directly measured ideological homophily in friend networks and examined the extent to which heterogeneous friends could potentially expose individuals to cross-cutting content. We then quantified the extent to which individuals encounter comparatively more or less diverse content while interacting via Facebook’s algorithmically ranked News Feed and further studied users’ choices to click through to ideologically discordant content. Compared with algorithmic ranking, individuals’ choices played a stronger role in limiting exposure to cross-cutting content.

Bucher, T. (2017). The algorithmic imaginary: exploring the ordinary affects of Facebook algorithms. Information, Communication & Society, 20(1), 30–44. 

This article reflects the kinds of situations and spaces where people and algorithms meet. In what situations do people become aware of algorithms? How do they experience and make sense of these algorithms, given their often hidden and invisible nature? To what extent does an awareness of algorithms affect people’s use of these platforms, if at all? To help answer these questions, this article examines people’s personal stories about the Facebook algorithm through tweets and interviews with 25 ordinary users. To understand the spaces where people and algorithms meet, this article develops the notion of the algorithmic imaginary. It is argued that the algorithmic imaginary – ways of thinking about what algorithms are, what they should be and how they function – is not just productive of different moods and sensations but plays a generative role in moulding the Facebook algorithm itself. Examining how algorithms make people feel, then, seems crucial if we want to understand their social power.

Eslami, M., Rickman, A., Vaccaro, K., Aleyasen, A., Vuong, A., Karahalios, K., … Sandvig, C. (2015). “I Always Assumed That I Wasn’T Really That Close to [Her]”: Reasoning About Invisible Algorithms in News Feeds. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (pp. 153–162). New York, NY, USA: ACM.

Our daily digital life is full of algorithmically selected content such as social media feeds, recommendations and personalized search results. These algorithms have great power to shape users’ experiences, yet users are often unaware of their presence. Whether it is useful to give users insight into these algorithms’ existence or functionality and how such insight might affect their experience are open questions. To address them, we conducted a user study with 40 Facebook users to examine their perceptions of the Facebook News Feed curation algorithm. Surprisingly, more than half of the participants (62.5%) were not aware of the News Feed curation algorithm’s existence at all. Initial reactions for these previously unaware participants were surprise and anger. We developed a system, FeedVis, to reveal the difference between the algorithmically curated and an unadulterated News Feed to users, and used it to study how users perceive this difference. Participants were most upset when close friends and family were not shown in their feeds. We also found participants often attributed missing stories to their friends’ decisions to exclude them rather than to Facebook News Feed algorithm. By the end of the study, however, participants were mostly satisfied with the content on their feeds. Following up with participants two to six months after the study, we found that for most, satisfaction levels remained similar before and after becoming aware of the algorithm’s presence, however, algorithmic awareness led to more active engagement with Facebook and bolstered overall feelings of control on the site.

Epstein, R., & Robertson, R. E. (2015). The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections. Proceedings of the National Academy of Sciences, 112(33), E4512–E4521.

Internet search rankings have a significant impact on consumer choices, mainly because users trust and choose higher-ranked results more than lower-ranked results. Given the apparent power of search rankings, we asked whether they could be manipulated to alter the preferences of undecided voters in democratic elections. Here we report the results of five relevant double-blind, randomized controlled experiments, using a total of 4,556 undecided voters representing diverse demographic characteristics of the voting populations of the United States and India. The fifth experiment is especially notable in that it was conducted with eligible voters throughout India in the midst of India’s 2014 Lok Sabha elections just before the final votes were cast. The results of these experiments demonstrate that (i) biased search rankings can shift the voting preferences of undecided voters by 20% or more, (ii) the shift can be much higher in some demographic groups, and (iii) search ranking bias can be masked so that people show no awareness of the manipulation. We call this type of influence, which might be applicable to a variety of attitudes and beliefs, the search engine manipulation effect. Given that many elections are won by small margins, our results suggest that a search engine company has the power to influence the results of a substantial number of elections with impunity. The impact of such manipulations would be especially large in countries dominated by a single search engine company.

Gillespie, T. (2014). The Relevance of Algorithms. In: Media technologies essays on communication, materiality, and society. Cambridge, Mass.

Algorithms (particularly those embedded in search engines, social media platforms, recommendation systems, and information databases) play an increasingly important role in selecting what information is considered most relevant to us, a crucial feature of our participation in public life. As we have embraced computational tools as our primary media of expression, we are subjecting human discourse and knowledge to the procedural logics that undergird computation. What we need is an interrogation of algorithms as a key feature of our information ecosystem, and of the cultural forms emerging in their shadows, with a close attention to where and in what ways the introduction of algorithms into human knowledge practices may have political ramifications. This essay is a conceptual map to do just that. It proposes a sociological analysis that does not conceive of algorithms as abstract, technical achievements, but suggests how to unpack the warm human and institutional choices that lie behind them, to see how algorithms are called into being by, enlisted as part of, and negotiated around collective efforts to know and be known.

Mittelstadt, B. D., Allo, P., Taddeo, M., Wachter, S., & Floridi, L. (2016). The ethics of algorithms: Mapping the debate. Big Data & Society, 3(2).

In information societies, operations, decisions and choices previously left to humans are increasingly delegated to algorithms, which may advise, if not decide, about how data should be interpreted and what actions should be taken as a result. More and more often, algorithms mediate social processes, business transactions, governmental decisions, and how we perceive, understand, and interact among ourselves and with the environment. Gaps between the design and operation of algorithms and our understanding of their ethical implications can have severe consequences affecting individuals as well as groups and whole societies. This paper makes three contributions to clarify the ethical importance of algorithmic mediation. It provides a prescriptive map to organise the debate. It reviews the current discussion of ethical aspects of algorithms. And it assesses the available literature in order to identify areas requiring further work to develop the ethics of algorithms.

Pasquale, F. (2015). The Black Box Society: The Secret Algorithms That Control Money and Information. Cambridge, MA, USA: Harvard University Press.

Every day, corporations are connecting the dots about our personal behavior, silently scrutinizing clues left behind by our work habits and Internet use. The data compiled and portraits created are incredibly detailed, to the point of being invasive. But who connects the dots about what firms are doing with this information? The Black Box Society argues that we all need to be able to do so, and to set limits on how big data affects our lives. Hidden algorithms can make (or ruin) reputations, decide the destiny of entrepreneurs, or even devastate an entire economy. Shrouded in secrecy and complexity, decisions at major Silicon Valley and Wall Street firms were long assumed to be neutral and technical. But leaks, whistleblowers, and legal disputes have shed new light on automated judgment. Self-serving and reckless behavior is surprisingly common, and easy to hide in code protected by legal and real secrecy. Even after billions of dollars of fines have been levied, underfunded regulators may have only scratched the surface of this troubling behavior. Frank Pasquale exposes how powerful interests abuse secrecy for profit and explains ways to rein them in. Demanding transparency is only the first step. An intelligible society would assure that key decisions of its most important firms are fair, nondiscriminatory, and open to criticism. Silicon Valley and Wall Street need to accept as much accountability as they impose on others.

Proferes, N. (2017). Information Flow Solipsism in an Exploratory Study of Beliefs About Twitter. Social Media + Society, 3(1), 2056305117698493.

There is a dearth of research on the public’s beliefs about how social media technologies work. To help address this gap, this article presents the results of an exploratory survey that probes user and non-user beliefs about the techno-cultural and socioeconomic facets of Twitter. While many users are well-versed in producing and consuming information on Twitter, and understand Twitter makes money through advertising, the analysis reveals gaps in users’ understandings of the following: what other Twitter users can see or send, the kinds of user data Twitter collects through third parties, Twitter and Twitter partners’ commodification of user-generated content, and what happens to Tweets in the long term. This article suggests the concept of “information flow solipsism” as a way of describing the resulting subjective belief structure. The article discusses implications information flow solipsism has for users’ abilities to make purposeful and meaningful choices about the use and governance of social media spaces, to evaluate the information contained in these spaces, to understand how content users create is utilized by others in the short and long term, and to conceptualize what information other users experience.

Woolley, S. C., & Howard, P. N. (2016). Automation, Algorithms, and Politics| Political Communication, Computational Propaganda, and Autonomous Agents — Introduction. International Journal of Communication, 10(0), 9.

The Internet certainly disrupted our understanding of what communication can be, who does it, how, and to what effect. What constitutes the Internet has always been an evolving suite of technologies and a dynamic set of social norms, rules, and patterns of use. But the shape and character of digital communications are shifting again—the browser is no longer the primary means by which most people encounter information infrastructure. The bulk of digital communications are no longer between people but between devices, about people, over the Internet of things. Political actors make use of technological proxies in the form of proprietary algorithms and semiautomated social actors—political bots—in subtle attempts to manipulate public opinion. These tools are scaffolding for human control, but the way they work to afford such control over interaction and organization can be unpredictable, even to those who build them. So to understand contemporary political communication—and modern communication broadly—we must now investigate the politics of algorithms and automation.


Problems of using fact-checking sites as inputs for social media algorithms

This is a brief post describing a key problem in using fact-checking sites as inputs to filter undesirable content (e.g., fake news) from social media newsfeeds (e.g., Facebook).

The premise sounds good, right? We use human raters to verify the truthfulness of an article and use that information as part of the decision-making algorithm.

However, there are two problems:

  1. Human raters may be biased
  2. Not all statements are in fact verifiable

First, it has become obvious that many journalists are not objective but blatantly biased, even to the degree of being proud of it. Consequently, their credibility is lost. If fact-checkers are biased journalists, they will interpret statements based on their own beliefs and attitudes while seeing nothing wrong in doing so. This, of course, is a major issue for using fact-checking services as inputs in machine decision-making: if the inputs are corrupt, so are the outputs (”garbage in, garbage out”).
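The ”garbage in, garbage out” point can be illustrated with a toy model (the statements, truthfulness scores, thresholds, and bias values are all made up): if the raters feeding a filter share a bias, the resulting feed inherits it wholesale.

```python
def rater_verdict(true_score, bias):
    """A rater perceives a statement's truthfulness (0..1) shifted
    by their personal bias, and calls it true above a 0.5 threshold."""
    perceived = true_score + bias
    return perceived >= 0.5

def feed_filter(statements, rater_biases):
    """Keep a statement only if a majority of raters call it true.
    With a biased rater pool, the filter inherits that bias."""
    kept = []
    for text, true_score in statements:
        votes = sum(rater_verdict(true_score, b) for b in rater_biases)
        if votes > len(rater_biases) / 2:
            kept.append(text)
    return kept

# Toy statements with a hypothetical 'true' truthfulness score.
statements = [("s1", 0.6), ("s2", 0.55), ("s3", 0.4)]
neutral = [0.0, 0.0, 0.0]
biased  = [-0.2, -0.2, 0.1]   # two raters lean against these claims
```

With the neutral pool the filter keeps the two mostly-true statements; with the biased pool the same algorithm and the same content suppress everything. Nothing in the pipeline changed except who rated.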

Second, there are several cases where the ”facts” being checked are not truth statements. According to Oxford Index,

An important difference between the truth of a statement and the validity of a norm is that the truth of a statement is verifiable — i.e. it must be possible to prove it to be true or false — while the validity of a norm is not.

For example, a Finnish fact-checking site checks ”facts” such as ”The EU is oppressing nation states into following its will.” (Link, in Finnish.) Clearly, this is not a factual statement, because a statement of that sort cannot be unambiguously verified. Yet they unambiguously label the statement ”wrong,” thus exposing their political bias rather than the truth value of the statement.

Another example: the popular fact-checking site Snopes.com checks facts like ”Are Non-Citizens Being Registered to Vote Without Their Knowledge?” In that case, the verdict was ”Mostly false” because only five such errors had taken place according to their second-hand information. However, that is in fact proof that the claim is possible. To be a correct truth statement, it should read ”Have non-citizens been registered to vote without their knowledge?” (i.e., are there known cases), and the answer, in light of the evidence, should be ”True,” not ”False.” In such cases, the raters’ interpretation clearly shows in how the verified statements are formulated and accordingly judged.

The particular problems of fact-checking sites like Snopes.com are that 1) they use selective referencing, seemingly focusing on citing ”liberal media” such as CNN (thus breaking the fact-checkers’ code of principles), and 2) they use an ambiguous definition of truth, including classifications such as ”Mostly True” and ”Mostly False.” But a truth statement (fact) is either true or false, not somewhere in between. Anything else is interpretation, and therefore susceptible to human bias.


For a fact to be verifiable, it needs to be a truth statement. In other words, we should be able to state unambiguously whether it is true or false. However, some of the ”facts” that so-called fact-checking sites (e.g., Snopes) verify are not verifiable, i.e. they are not truth statements.

If the ”facts” are not in fact truth statements but something else – jokes, sarcasm, or forms of exaggeration – then using fact-checking services as inputs to social media algorithms to fight ”fake news” becomes highly problematic.

