Data Science Dispatch

Real-Time Misinformation Tracking: Tech-Driven Solutions for Combating Political Fake News

How effective is a machine learning-based misinformation detection system in reducing user engagement with political misinformation on Bluesky?

By Uma Krishnan, Gabby Tom & Ambro Quach

Research Design · Question Formulation · Data and Decision · 22 min read

[Figure: Hypothesized effect of the flag. A randomized A/B test on Bluesky, measured over three months: the treatment group sees a misinformation flag and the control group does not. Engagement (likes · reposts · views): baseline in control, reduction in treatment. Trust in the source: baseline in control, shift in treatment.]

1. Overview

1.1 The Problem

Misinformation has become a pervasive issue in the political landscape, particularly on social media. Its influence extends far beyond casual discourse, shaping public opinion, swaying voter behavior, and ultimately affecting election outcomes. Platforms like Twitter and Facebook have seen the rampant spread of false information, which can grow to affect election results (Hao). Twitter, in particular, has transitioned from a public forum to a space where propaganda and misinformation thrive. For instance, during the recent presidential election cycle, at least 87 tweets by Elon Musk that spread misinformation in swing states garnered nearly 2 billion views; his amplification of these tweets has contributed to spreading false information (Singh and Dang).

While machine learning-based solutions designed to detect misinformation exist, more research is necessary to evaluate their effectiveness in real-world scenarios. It is crucial to measure how well these ML solutions mitigate misinformation. As Bluesky steps in to fill the void left by Twitter, our Bluesky research team is dedicated to enhancing the platform, prioritizing efforts to combat misinformation, and building user trust in our platform.

1.2 Intended Audience

The Bluesky leadership team is the primary audience for this study. We aim to utilize machine learning tools to regulate our content and enhance credibility and trust.

The secondary audiences include private and public entities such as social media platforms, fact-checking organizations, policymakers, and political campaigns. These groups aim to reduce misinformation and ensure that accurate information reaches the public, thus enabling individuals to make informed decisions.

1.3 Existing Literature

Misinformation shapes public opinion in the United States, resulting in consequential political actions and policy decisions (Zencity). Politicians use misinformation for their benefit and detriment, disregarding the blatant deception of the American public. In an article in Political Science Quarterly, the authors describe how "the Bush administration deliberately misled the American public about whether Iraq has weapons of mass destruction" (Hochschild and Einstein). Although the administration spread misinformation for years to justify its invasion of Iraq, the public did not begin to recognize the deception until long after the war had begun.

Due to the constant misinformation dispersed by leaders, "the public's trust in our democratic system subsequently declined" (Sanchez and Middlemass). We aim to use our ML-based misinformation classifier to reduce engagement with false narratives. If we can identify misinformation at the source and relay that information to our users, governments do not need to go through the process of teaching their constituents media literacy. Another way to offset the spread of misinformation is to "have multiple Rapid Response Election Security Cyber Units that work at the local, state, and national levels to find and disable accounts that purposely spread misinformation" (Sanchez and Middlemass). Although tedious, this allows organizations to control purposeful deception. Using our ML-based classifier, social media organizations can neutralize misinformation more quickly, easily, and cheaply than a human-based unit dedicated to monitoring misinformation could.

1.4 Anticipated Impact

This project aims to test the machine learning model's ability to reduce engagement with political misinformation (by labeling posts as such) through experimental testing and measurable outcomes. We want to see whether implementing this model causes a notable reduction in engagement with misinformation and influences user sentiment and trust toward the source. Misinformation can erode people's confidence in elections, influence voter behavior, and threaten democracy (Sanchez and Middlemass). We want Bluesky to implement this feature permanently, creating greater trust among platform users, which would drive more people to use Bluesky as a more "trusted" service. We anticipate that some other social media platforms would follow suit. Public sentiment reflects a distrust of politicians and social media, which could lead to policy changes requiring social media platforms to take stronger measures against misinformation (Zencity). We could also see politicians moving toward Bluesky as their primary platform because it combats misinformation.

2. Research Question

2.1 Main Research Question

How effective is a machine learning-based misinformation detection system in reducing user engagement with political misinformation on Bluesky?

2.2 Sub-Question

Will the ML system's intervention affect user sentiment toward the subject/source of the misinformation?

2.3 Definitions

  • Information: Facts provided or learned through study or observation.
  • Misinformation: False or inaccurate information, especially if meant to deceive its audience.
  • Political Misinformation: Refers to false or misleading claims about political figures, policies, elections, or governance that aim to misinform or manipulate the public.
  • Machine Learning (ML): Using statistical analysis of historical data to identify trends and inform future models of those trends.
  • Natural Language Processing (NLP): A machine-learning technology that measures sentiment (positive/negative) and emotional indicators (e.g., trust, anger, skepticism) from the context of comments and sentiment surveys.
  • Classifier: A model designed to sort posts into specific categories (misinformation and non-misinformation).
  • User Engagement: The level of interaction, participation, and involvement that users have with content on Bluesky, measured through metrics such as likes, comments, shares, views, reposts, clicks, and time spent engaging with posts.
  • User Sentiment: Users' attitudes toward the subject or sources of the misinformation, whether positive, negative, or neutral.
  • Flagged Post: A post that contains misinformation is prominently highlighted for the user. A clear explanation is provided detailing the reason for the designation, such as reliance on non-credible sources, outdated information, or links to fake domains.
  • Unflagged Post: A post that has not been marked as misinformation and is treated as normal content.
  • Sentiment Analysis: The evaluation of the emotional tone or subjective opinions on whether flagged misinformation conveys positive, negative, or neutral sentiment.
  • Comment Analysis: Examining and interpreting user-generated comments using natural language processing (NLP) and machine learning techniques to extract user sentiment.

3. Study Design

Mainstream social media like Facebook and Twitter have become prominent spaces where misinformation thrives, posing significant concerns during high-impact political periods (e.g., election time, debates). The widespread dissemination of political misinformation has been shown to impact voter behaviors, incite political unrest, and increase distrust in politics and mainstream social media outlets (Medzerian). In response to these concerns, we at Bluesky want to reinforce our dedication to building user trust in our platform while effectively combating misinformation.

This study, conducted by Bluesky's research team, aims to evaluate the efficacy of an existing machine learning (ML) classifier in reducing platform users' engagement with political misinformation. This classifier was trained on data (in the form of posts) labeled as misinformation vs. not misinformation, so we have existing metrics on baseline user engagement on these posts labeled misinformation. By addressing the specific challenges posed by misinformation, particularly during critical political events, this study aligns with our goals to create a more trustworthy platform.

Previous research, such as the findings from the Annual Reviews, highlights that ML models can be evaluated based on their ability to engage users with accurate and informative content to reduce the spread of misinformation within online communities (Boukouvalas and Shafer). This can be measured through metrics such as user engagement, social sharing, and click-through rates. Guided by these insights, the objectives of our study are outlined below:

3.1 Objectives

The primary objective of this study is to test an existing ML classifier's efficacy in reducing engagement with political misinformation on our platform.

The secondary objective is to evaluate the effectiveness of the ML classifier's flagging in influencing user sentiment, specifically by reducing the perceived credibility of sources that post political misinformation or of the subjects of that misinformation (e.g., false claims about political candidates, policies, elections, or governance). Sentiment will be measured through end-of-week surveys on posts identified as misinformation and through comments on misinformation-flagged posts.

3.2 Measurable Outcomes / Key Metrics

To evaluate the efficacy of our ML classifier in combating engagement with political misinformation, we will focus on two key measurable outcomes: user engagement and user sentiment metrics. This study aims to determine how post-interactivity differs between the treatment group (flagged posts) and control group (unflagged posts) and whether user engagement is reduced when the machine learning classifier is active.

  • User Engagement Metrics: We aim to observe reductions in engagement with flagged misinformation posts compared to unflagged posts in terms of reposts, likes, comments, and views.
  • User Sentiment Metrics: Our goal is to influence user sentiment such that users exposed to flagged misinformation posts show reduced trust in or support for the source/subject of the misinformation (e.g., a political figure or organization). We will measure this in two ways. The first is survey-based sentiment analysis: at the end of each week, a Likert-scale survey (shown in-app for a smooth user experience) will collect user feedback on trust in or support for the source of flagged political misinformation. For example: "How trustworthy do you find the source of this content?" (Scale: 1 = Not trustworthy, 5 = Very trustworthy). The second is comment-based sentiment analysis, which uses NLP to analyze comments on posts identified as misinformation across both groups, where the treatment group sees the misinformation flag and the control group does not. We will then compare these metrics between the treatment and control groups to identify differences in trust toward misinformation sources when the ML classifier is active.

The treatment group consists of users for whom the ML classifier actively flags political misinformation posts in their feeds. The control group consists of users for whom the classifier is inactive; posts in their feeds will not be flagged for misinformation, and their user experience will not change with respect to misinformation flagging.

The control group will be asked about their sentiment toward the same flagged post as the treatment, but they won't know that it is flagged. The sentiment analyses of comments under flagged posts will be evaluated similarly; treatment participants will know it's misinformation, but control participants won't.

3.3 Data

Preliminary Research: In a study by USC on the causes of rampant misinformation spread on social media, they highlighted that the issue stems from the platform's reward-based structure, driven by likes and comments (Medzerian). So, both information and misinformation are amplified due to increased user interaction and engagement. In our experiment, we will label pieces of misinformation with flags identifying misinformation, and platform users will likely gain less positive engagement for posting clearly labeled misinformation.

Data Collection During Experiment: We will collect data on the treatment effects as described below:

  • Engagement Metrics: We will collect engagement data on each flagged post, including the number of comments, likes, views, and reposts.
  • Sentiment Survey: We will collect a sentiment score using Likert scale responses from a weekly in-app survey indicating users' attitudes toward the subject of the misinformation.
  • Historical Data: We will use the historical data on which the ML classifier was trained to generate baseline engagement and sentiment levels for comparison.
  • User Data: We will utilize user data containing age, location, network size (source characteristics), and baseline platform activity level to generate a population with the necessary characteristics to sample.
  • Post-Related Data: We would gather data about individual posts, specifically the type of post (image, text, video), length of post, and time of post.

Intervention: This study's treatment (intervention) involves labeling posts identified as political misinformation by the ML classifier. Users in the treatment group see posts flagged with a visible misinformation warning designed to reduce their engagement with such content. This study has only one treatment: the presence of the misinformation flag on posts for the treatment group. This intervention aims to influence user interactivity by clearly indicating the content's potential inaccuracy. Both the treatment and control groups will participate in the same sentiment analysis surveys at the end of the week to measure sentiment (again, control participants will not know they are reading flagged misinformation). Additionally, posts identified as political misinformation by the classifier will undergo comment-related sentiment analysis on user comments from both groups.

The control condition is a baseline where users are exposed to the same context without flagged posts. In this condition, users do not see any flags on posts, regardless of whether the content is misinformation. This setup allows us to compare engagement and sentiment metrics in the treatment and control groups under otherwise similar conditions.

3.4 Sample

Unit of Analysis: The primary unit of analysis is the post, as we measure engagement metrics (likes, reposts, comments, views) and treatment effects (flagged vs. unflagged posts) at this level. However, posts are nested within users, and user-level characteristics (e.g., network size, engagement history) are included as aggregated features to account for potential confounding factors.

Population: Our population of interest is all users in the United States of America who are over 18 and have a base platform activity level on the Bluesky platform. The users' base platform activity level must be at least 2 hours a day, and their average engagement history (likes, reposts, comments) must be greater than 5 daily.
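The population criteria above can be written as a simple eligibility filter. The sketch below is illustrative only: the `User` record, its field names, and the sample values are hypothetical and do not reflect any actual Bluesky data schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical user record; field names are assumptions for illustration."""
    user_id: str
    country: str              # country code, e.g. "US"
    age: int
    daily_hours: float        # average hours on the platform per day
    daily_engagements: float  # average likes + reposts + comments per day


def is_eligible(user: User) -> bool:
    """Apply the population criteria stated in the Population paragraph."""
    return (
        user.country == "US"
        and user.age > 18                # "over 18"
        and user.daily_hours >= 2        # "at least 2 hours a day"
        and user.daily_engagements > 5   # "greater than 5 daily"
    )


users = [
    User("u1", "US", 34, 2.5, 12),
    User("u2", "US", 17, 3.0, 20),  # not over 18
    User("u3", "CA", 40, 4.0, 9),   # outside the United States
    User("u4", "US", 52, 1.0, 30),  # under 2 hours a day
    User("u5", "US", 29, 2.0, 5),   # engagement not greater than 5
]
eligible = [u.user_id for u in users if is_eligible(u)]
print(eligible)
```

Only the first record passes every criterion; each of the others fails exactly one, which makes the boundary conditions easy to check.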

Random Assignment:

  • Users: We would perform a power analysis to identify the minimum sample size. Then, we would draw a Simple Random Sample of that size from the population and randomly assign the sampled users in equal numbers to the control or treatment group.
  • Posts: For those sampled users, we will perform a Clustered Sample of their potentially viewed posts (from followed or suggested accounts) and apply the ML classifier to it. If the user is in the treatment group, posts the classifier identifies as political misinformation will be flagged; if the user is in the control group, their potentially viewed posts will remain unchanged.
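As a rough illustration of the user-level steps, the sketch below computes a per-group sample size with the standard normal-approximation formula for a two-sided, two-sample comparison of means, then draws a simple random sample and splits it evenly. The effect size (0.2), significance level, power, and the function names are assumptions chosen for illustration; the study itself does not specify them.

```python
import math
import random
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means with standardized effect size d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def assign_groups(user_ids, n_per_group: int, seed: int = 0) -> dict:
    """Draw a simple random sample of 2 * n_per_group users and split it evenly.

    random.sample returns items in random selection order, so the two
    halves of the list are themselves random, equal-sized groups.
    """
    rng = random.Random(seed)
    sampled = rng.sample(list(user_ids), 2 * n_per_group)
    return {
        "treatment": sampled[:n_per_group],
        "control": sampled[n_per_group:],
    }


n = sample_size_per_group(effect_size=0.2)  # assumed small effect
print(n)  # 393

population = [f"user_{i}" for i in range(10_000)]  # placeholder IDs
groups = assign_groups(population, n)
print(len(groups["treatment"]), len(groups["control"]))  # 393 393
```

A smaller assumed effect size raises the required sample quickly, since the sample size scales with the inverse square of the effect size. Because posts are nested within users, a real power analysis would likely also need to account for clustering.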

Timeline/Duration: This experiment will run for 3 months to observe measurable outcomes. The end-of-week sentiment surveys will occur every 7 days at the end of each 7-day bracket.

Data Collection Procedures: We must gather data ethically to protect users' personal information and privacy while running an A/B test on their experience.

  • We would create a flag on the treatment group's feed to indicate that they will experience a research test while protecting their personal data.
  • We would anonymize personal data so that we do not know the identity of the person posting. We will know only their age, location, and historical platform engagement from the past month.
  • We would only collect data strictly necessary for the study: the post's contents (text, links to sources, or photos) and engagement data.
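One common way to approach the anonymization and data-minimization steps above is to replace user identifiers with a keyed hash and keep only the listed fields. The sketch below uses HMAC-SHA256 from the Python standard library. The function names, field names, and key handling are hypothetical. Note that keyed hashing is pseudonymization rather than full anonymization, since anyone holding the key can re-link records.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Replace a user ID with a keyed SHA-256 digest (HMAC).

    The same ID always maps to the same pseudonym, so records can still be
    joined, but the mapping cannot be recomputed without the key.
    """
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def minimal_record(raw: dict, secret_key: bytes) -> dict:
    """Keep only the fields the study says it needs; drop everything else."""
    return {
        "pseudo_id": pseudonymize(raw["user_id"], secret_key),
        "age": raw["age"],
        "location": raw["location"],
        "engagement_last_month": raw["engagement_last_month"],
    }


# Hypothetical raw record and key, for illustration only.
key = b"replace-with-a-securely-stored-key"
raw_user = {
    "user_id": "alice.example",
    "email": "alice@example.com",
    "phone": "555-0100",
    "age": 31,
    "location": "PA",
    "engagement_last_month": 412,
}
record = minimal_record(raw_user, key)
print(sorted(record))
```

The resulting record carries no email or phone number, and the pseudonym stays stable across the three-month window as long as the key is unchanged.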

Table 1: Implementation of ML Classifier, Posts, & Sentiment on Control & Treatment Group

| Group | ML Classifier | Posts | Sentiment |
|---|---|---|---|
| Control Group (The sample size will be determined using a power analysis (see the Sample section)) | Inactive | Posts containing misinformation remain unflagged and lack any form of labeling. | The group will receive an end-of-week survey to assess sentiment analysis on unflagged misinformation. We will use NLP to analyze comments on unflagged misinformation posts. |
| Treatment Group (The sample size will be determined using a power analysis (see the Sample section)) | Active | Posts identified as containing misinformation are flagged and prominently highlighted. A clear explanation is provided, detailing the reason for the designation, such as reliance on non-credible sources, outdated information, or links to fake domains. | The group will receive an end-of-week survey to assess sentiment analysis on flagged misinformation. We will use NLP to analyze comments on flagged misinformation posts. |

3.5 Hypotheses

Our primary hypothesis examines engagement metrics to evaluate whether activating the machine learning (ML) classifier reduces user engagement with posts flagged as political misinformation compared to unflagged posts.

  • Null Hypothesis (H₀): Flagging political misinformation does not have an effect on user engagement metrics, such as reposts, likes, comments, and views.
  • Alternative Hypothesis (H₁): Flagging political misinformation reduces user engagement metrics, including reposts, likes, comments, and views.

Our secondary hypothesis focuses on user sentiment, investigating whether exposure to flagged political misinformation influences trust or support for the subject/source of the misinformation (e.g., a political figure or candidate).

  • Null Hypothesis (H₀): Exposure to flagged political misinformation does not affect user sentiment toward the source/subject of the misinformation, such as a candidate or political figure.
  • Alternative Hypothesis (H₁): Exposure to flagged political misinformation reduces user trust in or support for the source posting the misinformation.
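Stated more formally, both hypotheses are one-sided comparisons of group means. The notation below is an assumption introduced for illustration: $\mu^{E}$ denotes mean engagement per post (for any one of reposts, likes, comments, or views), $\mu^{S}$ denotes the mean trust score, and the subscripts $T$ and $C$ denote the treatment and control groups.

```latex
\[
\begin{array}{lll}
\text{Primary (engagement):}  & H_0:\ \mu^{E}_{T} = \mu^{E}_{C} & H_1:\ \mu^{E}_{T} < \mu^{E}_{C} \\[4pt]
\text{Secondary (sentiment):} & H_0:\ \mu^{S}_{T} = \mu^{S}_{C} & H_1:\ \mu^{S}_{T} < \mu^{S}_{C}
\end{array}
\]
```

Because each alternative predicts a reduction, a one-sided test matches the hypotheses as written. A two-sided test would be the more conservative choice if an increase in engagement or trust were also of interest.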

Table 2: Expectations of Impact of ML Classifier Activation on User Engagement and Sentiment

| Hypothesis | Control Condition (Unflagged Posts) | Treatment Condition (Flagged Posts) |
|---|---|---|
| Primary Hypothesis: Testing User Engagement | Baseline engagement: We would see user engagement with misinformation remain the same. Users interact more freely (e.g., likes, shares, views). | Expected reduction: Engagement with misinformation among treatment-group users will decrease as they avoid flagged content. |
| Secondary Hypothesis: Testing User Sentiment | Baseline sentiment: Neutral or supportive (e.g., trust in political figures remains stable). | Expected shift: Compared to control, we would see a reduction of positive/neutral sentiment (e.g., trust in political figures posting misinformation decreases). |

3.6 Variables

Independent Variables (IV):

Intervention IV (i.e., the treatment we enact): The intervention variable is the activation of our ML classifier; it is either active (treatment group) or inactive (control group).

Covariate IVs: Many possible covariates exist, as this experiment takes place on a forum-style social media platform.

  • Covariates for Engagement Metrics Testing the Primary Hypothesis:
    • Type of post: Engagement with a post will likely differ depending on its content type. Pure text, video, and image posts will all vary and possibly affect outcomes.
    • Length of post: The length of a post affects engagement metrics because a user might not want to spend time reading a longer post.
    • Source characteristics (verification status, follower count): If a user is verified or has more followers, their posts are likely to garner more engagement than those of an unverified user or a user with a smaller following.
    • Time-related covariates (post longevity, time of post): The two time-related covariates are how long a post has been up and what time a post has been made. Certain times yield higher post visibility. If a post has been up longer, it's likely to have circulated more and amassed more engagement.
  • Covariates for User Sentiment Testing the Secondary Hypothesis:
    • User demographics (age, location): Specific user demographics of age and location are essential measures to watch for, as they could impact their sentiment towards a piece of political misinformation. For example, a person of a high-propensity voter age demographic would likely care about political misinformation. Swing state voters are highly targeted by misinformation, so they are more likely to be concerned if flagged misinformation relates to their state.
    • Engagement history: A user who is actively using the app and engaging with posts is more likely to complete surveys and care more about misinformation on their feed.

Control Variable IVs: In our experiment, the time variables we control are the window of the experiment and the timing of the end-of-week sentiment surveys. We are conducting this experiment over 3 months, and all posts during this time are included. The weekly surveys will be sent at the end of every 7 days from the start of the experiment.

Dependent Variables (Outcomes): In testing our primary hypothesis, the dependent variables we are tracking are the number of reposts, likes, comments, and views of a given post. We hope to see a reduction in overall engagement for those in our treatment group. In testing our secondary hypothesis, the dependent variable we are tracking will be a sentiment score determined by Likert scale responses from the end-of-week surveys indicating users' attitudes toward the subject/source of the misinformation, depending on the post. We will be using NLP techniques to identify comment-based sentiment on posts the classifier identifies as misinformation and compare how the treatment group (who sees the flag) comments vs the control group (who doesn't see the flag).

3.7 Statistical Method

To evaluate the impact of the ML classifier on user engagement metrics (e.g., reposts, likes, comments, views), we will use three main statistical methods: descriptive statistics, t-tests, and regression analysis. We will use descriptive statistics to calculate the average engagement levels for flagged and unflagged posts in the treatment and control groups. We will then conduct t-tests to compare engagement metrics between the treatment and control groups, determining whether flagging significantly reduces engagement. Finally, we will apply regression analysis to model engagement metrics as a function of the flagging status (treatment vs. control) while controlling for the covariates outlined in our Variables section.
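To make the t-test step concrete, the sketch below computes Welch's t statistic (which does not assume equal variances) and its approximate degrees of freedom using only the standard library. The like counts are made-up illustrative numbers, not study data. The p-value uses a normal approximation, which is reasonable only for large samples; a real analysis would use the t distribution from a statistics package.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(treatment, control):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_t, n_c = len(treatment), len(control)
    se2_t = variance(treatment) / n_t  # squared standard error, treatment
    se2_c = variance(control) / n_c    # squared standard error, control
    t_stat = (mean(treatment) - mean(control)) / math.sqrt(se2_t + se2_c)
    df = (se2_t + se2_c) ** 2 / (
        se2_t ** 2 / (n_t - 1) + se2_c ** 2 / (n_c - 1)
    )
    return t_stat, df


def one_sided_p_normal(t_stat: float) -> float:
    """Approximate P(T <= t) with a standard normal (large-sample assumption).

    This matches the one-sided alternative that flagging reduces engagement.
    """
    return NormalDist().cdf(t_stat)


# Hypothetical like counts per post, for illustration only.
flagged_likes = [12, 9, 15, 7, 10, 8, 11, 6]
unflagged_likes = [18, 22, 15, 20, 17, 25, 19, 21]

t_stat, df = welch_t(flagged_likes, unflagged_likes)
p_value = one_sided_p_normal(t_stat)
print(f"t = {t_stat:.2f}, df = {df:.1f}, approx. one-sided p = {p_value:.2g}")
```

A negative t statistic means the flagged posts averaged fewer likes than the unflagged ones. Engagement counts are often heavily skewed, so a log transform or a count-based regression model may be worth considering alongside the t-test.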

To analyze the effect of the ML classifier on user sentiment toward misinformation subjects or sources, we will also use descriptive statistics to summarize survey-based sentiment scores and comment-based sentiment derived from NLP techniques. Then, we will conduct t-tests to compare average sentiment scores between flagged and unflagged posts, assessing whether flagging influences trust or support for misinformation sources. Finally, we will use logistic regression on Likert-scale survey data and NLP techniques to model the likelihood of lower trust scores caused by flagged posts while evaluating the relationship between flagging and comment sentiment.
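As a first look at the logistic-regression step, note that when the only predictor is a binary flag indicator, the fitted logistic-regression coefficient equals the log odds ratio from the 2x2 table of outcomes. The sketch below computes that quantity directly. The "low trust" cutoff (a Likert score of 2 or below) and the survey scores are assumptions for illustration. Dichotomizing a 5-point scale discards information, so an ordinal model may be a better fit for the full analysis.

```python
import math

LOW_TRUST_MAX = 2  # assumption: Likert scores of 1-2 count as "low trust"


def low_trust_counts(scores):
    """Return (number of low-trust responses, number of other responses)."""
    low = sum(1 for s in scores if s <= LOW_TRUST_MAX)
    return low, len(scores) - low


def log_odds_ratio(treatment_scores, control_scores):
    """Log odds ratio of low trust (treatment vs. control) and its standard error.

    With a single binary predictor, this equals the slope that logistic
    regression would estimate for the flag indicator.
    """
    a, b = low_trust_counts(treatment_scores)  # treatment: low, not low
    c, d = low_trust_counts(control_scores)    # control: low, not low
    if min(a, b, c, d) == 0:
        raise ValueError("A zero cell makes the odds ratio undefined.")
    log_or = math.log((a * d) / (b * c))
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, std_err


# Hypothetical 1-5 trust scores from the weekly survey, for illustration only.
treatment_scores = [1, 2, 2, 3, 1, 2, 4, 2, 1, 3]
control_scores = [3, 4, 2, 5, 3, 4, 1, 3, 4, 5]

log_or, std_err = log_odds_ratio(treatment_scores, control_scores)
print(f"log odds ratio = {log_or:.3f} (SE {std_err:.3f})")
```

A positive log odds ratio indicates that low-trust responses are more likely in the treatment group, which is the direction the secondary hypothesis predicts. Adding the covariates from the Variables section would require fitting a full regression model.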

4. Potential Risks

We will disclose the study's general purpose to the users so they are aware when participating in the experiment. This gives us informed consent to address the ethical concern of transparency. However, more specific details will be omitted to ensure that participants won't skew the results in the way the study expects.

To address privacy concerns, we will implement strict guidelines for data collection and security (encryption, anonymization) to protect participants' sensitive information (name, phone number, email, etc.), and we will collect only the minimum data needed to achieve our study objectives.

4.1 Scientific Validity

  • Possible Reduction of Misinformation: Using a classifier to flag misinformation exclusively for the treatment group could potentially reduce overall misinformation, as these individuals may be less likely to amplify false information in their feeds. The user experience would not change for any of the non-treatment platform users. Still, they may be less likely to see specific posts containing misinformation due to reduced engagement from the treatment group.
  • Algorithmic Bias: Our algorithm may run into confirmation, source, language, political, or ambiguity bias that can undermine its effectiveness in reducing engagement with misinformation. Bluesky is a relatively new company that may not have a wide variety of data to provide the algorithm with irregular instances of misinformation. This could cause it to flag content that could be ambiguous or opinion-based, over-flag lesser-known sources and well-known narratives, and have difficulty understanding the context of other languages.
  • Alteration of User Behavior Due to Partial Disclosure: Another important consideration is the potential risk of participants changing their behavior due to partial disclosure of the study's purpose. Being aware that they are being observed might influence how they engage with flagged misinformation.

4.2 Law and Ethics, Including Data Security

  • Increased Polarization: If users disagree with the flagging of their content as misinformation, they may become more entrenched in their opinions and view opposing viewpoints as alienating.
  • Censorship: Users may feel frustrated that their free speech is being suppressed if they see their content being flagged as misinformation (despite us not actively taking it down). This could lead to dissatisfaction with the experiment or Bluesky, altering their participation.
  • Data Security: We keep all data internally. We will not be sharing private information and will keep participants anonymous. We want to retain platform users' safety and security.

5. Deliverables

After this experiment ends, we will share a report with the leadership team at Bluesky showing the effectiveness of the classifier's flagging in combating engagement with fake political news. In that report, we will also share the sentiments users in the experiment felt towards pieces of misinformation they experienced in their feeds via our end-of-week surveys and comment-based sentiment analysis run on posts the ML classifier identified as political misinformation.

If our research study yields positive results, we hope Bluesky implements real-time misinformation flagging as a permanent feature for all users. We want Bluesky to be a social media platform that users trust. If we reduce engagement with political misinformation, we can influence policy better and possibly prevent political problems before they are amplified. We are seeing a massive shift in users migrating to Bluesky right now as it's been culturally deemed a more ethical social media service, so strengthening the platform's ethos will benefit us by bringing in more users and more platform activity. Distrust in social media and politics is at an all-time high, so creating some trust would be a societal good.

References

A research-design project from UC Berkeley's School of Information, by Uma Krishnan, Gabby Tom, and Ambro Quach.
