Researchers find flaws in using source reputation for training automatic misinformation detection algorithms

Researchers at Rutgers University have found a major flaw in the way that algorithms designed to detect "fake news" evaluate the credibility of online news stories.

Most of these algorithms rely on a credibility score for the "source" of the article, rather than assessing the credibility of each individual article, the researchers said.

"It is not the case that all news articles published by sources labeled 'credible' (e.g., The New York Times) are accurate, nor is it the case that every article published by sources labeled 'non-credible' publications are 'fake news,'" said Vivek K. Singh, an associate professor at the Rutgers School of Communication and Information and co-author of the study "Misinformation Detection Algorithms and Fairness Across Political Ideologies: The Impact of Article Level Labeling," published on OSFHome.

"Our analysis shows that labeling articles for misinformation based on the source is as bad an idea as just flipping a coin and assigning true/false labels to news stories," added Lauren Feldman, an associate professor of journalism and media studies at the School of Communication and Information, who is another co-author of the paper.

The researchers found that source-level credibility labels are not a reliable stand-in for article-level ones: the two kinds of label matched only 51% of the time. This labeling process has important implications for tasks such as the creation of robust fake news detectors and for audits on fairness across the political spectrum.
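The 51% figure can be read as a simple match rate between two sets of labels for the same articles. The sketch below is illustrative only and is not the study's code: the function name `label_agreement` and the toy labels are assumptions, with `True` standing for "credible".

```python
from typing import Sequence


def label_agreement(source_labels: Sequence[bool],
                    article_labels: Sequence[bool]) -> float:
    """Return the fraction of articles whose source-level label
    equals their article-level label."""
    if len(source_labels) != len(article_labels):
        raise ValueError("label lists must be the same length")
    if not source_labels:
        raise ValueError("no labels to compare")
    matches = sum(s == a for s, a in zip(source_labels, article_labels))
    return matches / len(source_labels)


# Toy data invented for illustration (not taken from the study).
# Each position is one article; True means "credible".
source_labels = [True, True, False, False]   # inherited from the outlet
article_labels = [True, False, False, True]  # assigned per article

agreement = label_agreement(source_labels, article_labels)
print(f"agreement: {agreement:.0%}")
```

With two balanced classes, a match rate near 50% is what random assignment would produce, which is the point of the coin-flip comparison.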

To address this problem, the study offers a new dataset of journalistic-quality, individually labeled articles, along with an approach for misinformation detection and fairness audits. The findings highlight the need for more nuanced and reliable methods of detecting misinformation in online news and provide valuable resources for future research in this area.

Researchers assessed the credibility and political leaning of 1,000 news articles and used these article-level labels to build misinformation detection algorithms. Then, they evaluated how the labeling methodology (source level versus article level) impacts the performance of misinformation detection algorithms.

Their aim was to explore the impact of article-level labeling on the process. Specifically, they wanted to determine whether the bias that exists when a machine-learning approach is applied at the source level persists when the same approach is applied to individual articles, and whether that bias is reduced when the articles are individually labeled.
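One common way to audit a detector for bias across political leanings is to compare its error rates group by group. This article does not say which fairness metric the study used, so the sketch below is an assumption: it computes the false positive rate (credible articles wrongly flagged as misinformation) per leaning. The function name, record layout and toy data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record per article: (political leaning, truly misinformation?, flagged by detector?)
Record = Tuple[str, bool, bool]


def false_positive_rate_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """For each leaning, return the share of credible articles
    that the detector flagged as misinformation."""
    credible = defaultdict(int)  # credible articles seen per group
    flagged = defaultdict(int)   # credible articles wrongly flagged per group
    for leaning, is_misinfo, predicted_misinfo in records:
        if not is_misinfo:
            credible[leaning] += 1
            if predicted_misinfo:
                flagged[leaning] += 1
    # Groups with no credible articles are omitted to avoid dividing by zero.
    return {group: flagged[group] / credible[group] for group in credible}


# Toy data invented for illustration (not taken from the study).
records = [
    ("left", False, False),
    ("left", False, True),
    ("left", True, True),
    ("right", False, True),
    ("right", False, True),
    ("right", True, False),
]

rates = false_positive_rate_by_group(records)
print(rates)
```

A large gap between groups would suggest the detector treats articles from one side of the spectrum more harshly. Running the same audit with source-level and then article-level ground truth shows how much the choice of labels changes that picture.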

The authors presented their paper at the 15th Association for Computing Machinery Web Science Conference 2023, held April 30 to May 1 in Austin, Texas.

The study was a joint effort among journalism, information science and computer science professionals. In addition to Singh and Feldman, the authors include Jinkyung Park, a Ph.D. alumna of the School of Communication and Information; Rahul Dev Ellezhuthil, a computer science master's degree student; School of Communication and Information doctoral student Joseph Isaac; and Christoph Mergerson, a Ph.D. alumnus of the School of Communication and Information and an assistant professor of race and media at the University of Maryland.

The authors said algorithms used to detect misinformation in online articles function the way they do "mainly because there is a dearth of fine-grained labels defined at the news article level. We acknowledge that labeling each news article may not be feasible given the massive volume of news articles that are published and disseminated on the web. At the same time, there are reasons to question the validity of datasets labeled at the source level."

"Validating online news and preventing the spread of misinformation is critical for ensuring trustworthy online environments and protecting democracy," the authors wrote, adding that their work "aims to increase public confidence in misinformation detection practices and subsequent corrections by ensuring the validity and fairness of results," and their dataset and the conceptual results "aim to pave the way for more reliable and fair misinformation detection algorithms."

More information: Jinkyung Park et al, Misinformation Detection Algorithms and Fairness across Political Ideologies: The Impact of Article Level Labeling, DOI: 10.17605/OSF.IO/QWNSF

Provided by Rutgers University
