On Social Media Sampling

Social media sampling has many issues. Two of them are 1) the silent majority problem and 2) the grouping problem.

The former refers to the imbalance between participants and spectators: can we trust that the vocal few represent the views of all?

The latter means that people with similar opinions tend to flock together, so by looking at a single online community, or even a single social media platform, we may get a biased understanding of the whole population.

Solving these problems is hard. It requires understanding the online communities and their polarity, the sociology and psychology that drive participation, and the functional principles of the algorithms that determine visibility and participation on the platforms.

Prior knowledge of the online communities can serve as a basis for stratified sampling, which can be a partial remedy.
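A minimal sketch of the stratified-sampling idea, under loudly stated assumptions: each post carries a community label, and we somehow know (from prior research) each community's share of the population. The function `stratified_sample`, the `community`/`stance` fields and all the numbers are hypothetical. The point is that the sample is allocated by population share, not by post volume, so a vocal community no longer dominates the estimate.

```python
import random
from collections import defaultdict

def stratified_sample(posts, population_shares, n, seed=0):
    """Draw about n posts so that each community's share of the sample matches
    its (assumed known) share of the population, not its share of posts."""
    rng = random.Random(seed)
    by_community = defaultdict(list)
    for post in posts:
        by_community[post["community"]].append(post)
    sample = []
    for community, share in population_shares.items():
        k = round(n * share)                   # proportional allocation
        pool = by_community[community]
        sample.extend(rng.sample(pool, min(k, len(pool))))
    return sample

def mean_stance(posts):
    return sum(p["stance"] for p in posts) / len(posts)

# Hypothetical data: community A is vocal (900 posts) but assumed to be 30% of
# the population; community B posts little (100 posts) but is assumed to be 70%.
posts = ([{"community": "A", "stance": 1} for _ in range(900)]
         + [{"community": "B", "stance": -1} for _ in range(100)])
shares = {"A": 0.3, "B": 0.7}

naive = mean_stance(posts)                                   # 0.8
sample = stratified_sample(posts, shares, n=100)
stratified = mean_stance(sample)                             # -0.4
print(naive, stratified)
```

This only corrects imbalance *between* communities; within a community the silent majority is still unobserved, which is why stratification is at best a partial remedy.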

Web 3.0: The dark side of social media

Web 2.0 was about all the pretty, shiny things in social media: user-generated content, blogs, customer participation, “everyone has a voice,” and so on. Web 3.0, in contrast, is all about the dark side: algorithmic bias, filter bubbles, group polarization, flame wars, cyberbullying, and so on. We discovered that maybe not everyone should have a voice, after all. Or at least that people should pay more attention to what they say with that voice.

While it is tempting to blame Facebook, the media, or “technology” for all this (just as it is easy to praise it for the other things), the truth is that individuals should accept more responsibility for their own behavior. Technology provides platforms for communication and information, but it does not generate communication and information; people do.

Consequently, I’m very skeptical about technological solutions to Web 3.0 problems; they seem to be social rather than technological problems, requiring primarily social solutions and secondarily hybrid ones. We should start respecting the opinions of others, educate ourselves about different views, and learn how to debate based on facts, identifying fundamental differences instead of resorting to argumentation errors. Here, machines have only limited power: it’s up to us to re-learn these things and keep teaching them to new generations. It’s quite pitiful that even though our technology is 1000x better than in Ancient Greece, our ability to debate properly is one tenth of what it was 2000 years ago.

Avoiding enslavement by machines requires going back to the basics of humanity.

From polarity to diversity of opinions

The problem with online discussions and communities is that the extreme poles attract people effectively, causing group polarization, in which a person’s original opinion becomes more radical due to the group’s influence. In Finnish, we have a saying: “In a group, stupidity concentrates” (joukossa tyhmyys tiivistyy).

Here, I explore the idea that this effect, namely the growth of polar extremes (for example, being either for or against immigration, as many European citizens currently are), arises simply because people lack options to identify with. There are only the extremes and no neutral or moderate group, even though, as I argue here, most people are in fact moderate and understand that extremes and absolutes are misleading simplifications either way.

In other words, when there are only two “camps” of opinion, people are more easily split between them. However, my argument is that people have preferences that correspond to being in the middle, not at the extremes.

These preferences remain hidden because there are only two camps to subscribe to: one cannot be moderate because there is no moderate group.

For example, there are liberals and conservatives, but what about the people in the middle? What about those who share some ideas with liberals and others with conservatives? With only these two groups, other combinations become socially impossible, because people are, again socially, pressured to hold all the opinions of the group they subscribe to, even if they don’t agree with a particular view. This effect has been studied in relation to the concept of groupthink, but no permanent remedy has been found.

How to solve the problem of extremes?

My idea is simple: we should start more camps, more views to subscribe to, especially those representing moderate views.

The argument is that with a greater supply of camps, people will distribute more evenly among them, and we will have less polarization as a consequence.

This is illustrated in the picture (sketched quickly in Paint when inspiration struck).

[Figure: panels (A) and (B), distribution of attention over the opinion spectrum]

In (A), public discourse is dominated by the extremes (the distribution of attention is skewed toward the extremes of a given opinion spectrum). In (B), the distribution is focused on the center of the opinion spectrum (=moderate views) while the extremes are marginalized (as they should be, according to the assumption of moderate majority).

An example: having several political parties results in more diverse views being presented. In the US, you are either a Democrat or a Republican (although, it must be said, there are also the marginal Green Party and the progressives), but in Finland you can also be many other things: Center Party, National Coalition Party, or Green Party, for example. The same applies to most countries in Europe. Although I don’t have data to support this, public discourse in the US seems exceptionally polarized compared with many other countries [1].

Giving the “silent majority,” which is moderate rather than extreme, more choices to identify with would reveal the “true” opinions of citizens and, ideally, marginalize both extremes, avoiding the tyranny of the minority [2] that currently dominates public discourse.

Finally, all this could be formalized in game-theoretic terms by assuming heterogeneous preferences over the opinion spectrum and parameters such as gravity (the “pull factor” of the extremes), justifiable, e.g., by the media attention given to extreme views over moderate ones. But the implication remains the same: under this set of assumptions, a greater diversity of camps reduces polarization.
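A toy simulation of this argument, not a full game-theoretic model and not taken from the post: every specific choice here is an assumption. Preferences on the spectrum [-1, 1] follow a normal distribution centered on 0 (the “moderate majority” assumption), each agent joins the camp that minimizes its distance from the agent’s own view minus `gravity` times the camp’s extremity (the hypothetical “pull factor”), and polarization is measured as the share of agents in the outermost camps and the mean extremity of the chosen camps.

```python
from statistics import NormalDist

def preferences(n=1000, spread=0.35):
    """Deterministic 'moderate majority': evenly spaced quantiles of a normal
    distribution centered on 0, clipped to the opinion spectrum [-1, 1]."""
    dist = NormalDist(mu=0.0, sigma=spread)
    return [max(-1.0, min(1.0, dist.inv_cdf((i + 0.5) / n))) for i in range(n)]

def choose_camp(x, camps, gravity):
    """An agent at x joins the camp with the lowest cost: distance from its own
    view, minus a pull proportional to how extreme the camp is (assumed form)."""
    return min(camps, key=lambda c: abs(x - c) - gravity * abs(c))

def polarization(prefs, camps, gravity=0.2):
    """Return (share of agents in the outermost camps, mean |camp position|)."""
    edge = max(abs(c) for c in camps)
    chosen = [choose_camp(x, camps, gravity) for x in prefs]
    extreme_share = sum(abs(c) == edge for c in chosen) / len(chosen)
    mean_extremity = sum(abs(c) for c in chosen) / len(chosen)
    return extreme_share, mean_extremity

prefs = preferences()
two_camps = polarization(prefs, [-1.0, 1.0])                    # (1.0, 1.0)
five_camps = polarization(prefs, [-1.0, -0.5, 0.0, 0.5, 1.0])
print("two camps: ", two_camps)
print("five camps:", five_camps)
```

With only two camps everyone is forced to an extreme, whatever their preference; adding moderate camps lets the moderate majority sort itself toward the center, even though the gravity term still pulls some agents outward.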


[1] Of course there are other reasons, such as media taking political sides.

[2] This means that extreme views are not representative of the whole population (which is more moderate than either view), but they get disproportionate attention in the media and public discourse. This is because the majority views are hidden; they would need to be revealed.

Questions from ICWSM17

In the “Studying User Perceptions and Experiences with Algorithms” workshop, many interesting questions came up. Here are some of them:

  • Will increased awareness of algorithm functionality change user behavior? How?
  • How can we build better algorithms to diversify information users are exposed to?
  • Do most people care about knowing how Google works?
  • What’s the “count to 10” equivalent for online discussions? How can we avoid snap judgments?
  • How can we defuse revenge seeking in online discussions?
  • What are individuals’ affective relationships with algorithms like?

These make for great research questions.

Reading list from ICWSM17

In one of the workshops on the first conference day, “Studying User Perceptions and Experiences with Algorithms,” the participants recommended papers to each other. Here are most, if not all, of them, along with their abstracts.

Problems of using fact-checking sites as inputs for social media algorithms

This is a brief post describing a key problem in using fact-checking sites as inputs to filter undesirable content (e.g., fake news) from social media newsfeeds (e.g., Facebook).

The premise sounds good, right? We use human raters to verify the truthfulness of an article and use that information as part of the decision-making algorithm.

However, there are two problems:

  1. Human raters may be biased
  2. Not all statements are in fact verifiable

First, it has become obvious that many journalists are not objective but blatantly biased, even to the point of being proud of it. Consequently, their credibility is lost. If fact-checkers are biased journalists, they will interpret statements based on their own beliefs and attitudes without seeing anything wrong in doing so. This, of course, is a major issue for using fact-checking services as inputs to machine decision-making: if the inputs are corrupt, so are the outputs (“garbage in, garbage out”).
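To make “garbage in, garbage out” concrete, here is a deliberately simplified sketch; it is not how any real platform works. A hypothetical `filter_feed` hides every item whose fact-check label is “false,” and it can only see the label, never the ground truth. The `Item` fields, the leaning tags “X”/“Y” and both rater functions are invented for illustration: one rater labels by truth, the other marks everything from one side as false.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    text: str
    leaning: str   # hypothetical political leaning tag, "X" or "Y"
    is_true: bool  # ground truth, which a real filter never sees

def filter_feed(items, label_of):
    """Keep items whose fact-check label is not 'false'. The filter trusts
    label_of completely, so its output can only be as good as the labels."""
    return [it for it in items if label_of(it) != "false"]

def unbiased_rater(item):
    return "true" if item.is_true else "false"

def biased_rater(item):
    # Hypothetical biased rater: every "Y"-leaning item is "false", true or not.
    if item.leaning == "Y":
        return "false"
    return "true" if item.is_true else "false"

items = [
    Item("claim 1", "X", True),
    Item("claim 2", "X", False),
    Item("claim 3", "Y", True),
    Item("claim 4", "Y", False),
]

fair_feed = [it.text for it in filter_feed(items, unbiased_rater)]    # ['claim 1', 'claim 3']
biased_feed = [it.text for it in filter_feed(items, biased_rater)]    # ['claim 1']
print(fair_feed, biased_feed)
```

The filtering code is identical in both runs; only the rater changes, and the true “Y” claim disappears from the feed.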

Second, in several cases the “facts” being checked are not truth statements. According to the Oxford Index,

An important difference between the truth of a statement and the validity of a norm is that the truth of a statement is verifiable — i.e. it must be possible to prove it to be true or false — while the validity of a norm is not.

For example, a Finnish fact-checking site checks “facts” such as “The EU is oppressing nation states into following its will.” (Link, in Finnish.) Clearly, this is not a truth statement, because a statement of that sort cannot be unambiguously verified. Yet the site unambiguously labels the statement “wrong,” thus exposing its political bias rather than the truth value of the statement.

Another example: the popular fact-checking site Snopes.com checks claims like “Are Non-Citizens Being Registered to Vote Without Their Knowledge?” In that case, the claim was rated “Mostly False” because, according to their second-hand information, only five such errors had taken place. However, that is in fact proof that the claim is possible. To be a proper truth statement, it should read “Have non-citizens been registered to vote without their knowledge?” (i.e., are there known cases?), and the answer, in light of the evidence, should be “True,” not “Mostly False.” In such cases, the raters’ interpretation clearly shows in how the verified statements are formulated and, accordingly, interpreted.

The particular problems of fact-checking sites like Snopes.com are that 1) they use selective referencing, seemingly focusing on citing “liberal media” such as CNN (thus breaking the fact-checkers’ code of principles), and 2) they use an ambiguous definition of truth, including classifications such as “Mostly True” and “Mostly False.” But a truth statement (fact) is either true or false, not somewhere in between. Anything else is interpretation, and therefore susceptible to human bias.

For a fact to be verifiable, it needs to be a truth statement. In other words, we should be able to state unambiguously whether it is true or false. However, some of the “facts” that so-called fact-checking sites (e.g., Snopes) verify are not verifiable; that is, they are not truth statements.

If the “facts” are not actually truth statements but something else, such as jokes, sarcasm, or exaggeration, then using fact-checking services as inputs to social media algorithms to fight “fake news” becomes highly questionable.


