Disinformation on the Internet is on the rise, and advancements in artificial intelligence are making it faster and easier to generate falsehoods. But our Product Owner Aleksandras Šulženko says that web scraping, a technology for collecting publicly available data, can help us combat this issue. Let’s find out how.
The global pandemic and the ongoing Russian aggression against Ukraine have amplified the problem of misinformation, and rapidly evolving tools built on large language models, such as Auto-GPT, are making it even easier to generate false information. What’s concerning is that a large part of the general public lacks the skills to recognize false information on the Internet.
“When determining the truthfulness of news, it’s crucial to analyze the emotions and tone of the information. While all news sources regularly use linguistic devices like epithets and hyperbole to make their texts more engaging, perpetrators of misinformation use more intense emotions in their communication,” says Aleksandras.
That’s where web scraping and machine learning can be extremely helpful. Web scraping allows collecting large amounts of publicly available data, while machine learning can assess the credibility of the gathered data. By combining these technologies, emotion analysis can be carried out.
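To give a rough idea of the collection step, the snippet below sketches how publicly available article text can be gathered with common open-source tools. It is only an illustration, not our actual scraping pipeline; the URL is a placeholder, and the assumption that the article body sits in paragraph tags is purely hypothetical.

```python
# Minimal sketch of collecting publicly available article text.
# The URL and the assumption that the article body lives in <p> tags are
# hypothetical; real pipelines handle many layouts, rate limits, and errors.
import requests
from bs4 import BeautifulSoup

def fetch_article_text(url: str) -> str:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Join paragraph text into a single article string
    return "\n".join(p.get_text(strip=True) for p in soup.find_all("p"))

if __name__ == "__main__":
    text = fetch_article_text("https://example.com/news/article")  # placeholder URL
    print(text[:500])
```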
Mood or emotion analysis is extremely useful for identifying fake or biased news. It all begins with collecting a large number of reliable articles to establish the typical level of emotion expressed in factually accurate texts. This acts as a reference point, and articles that exceed it can then be examined more closely.
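To make the idea concrete, here is a minimal sketch of that baseline approach. The tiny emotion lexicon and the sample articles are made up for illustration only; a real system would rely on a trained sentiment model and a large corpus of verified articles.

```python
# Toy sketch: score the emotional intensity of verified articles to set a
# baseline, then flag new articles whose intensity is far above it.
from statistics import mean, stdev

# Tiny, made-up emotion lexicon; a real system would use a trained model.
EMOTION_WORDS = {"outrageous", "shocking", "disaster", "horrifying", "catastrophe"}

def emotion_score(text: str) -> float:
    """Fraction of words that carry a strong emotional charge."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in EMOTION_WORDS for w in words) / len(words) if words else 0.0

# Hypothetical corpus of verified, factually accurate articles (the reference point)
reliable_articles = [
    "The council approved the budget after a two-hour public session.",
    "Researchers published the study results in a peer-reviewed journal.",
    "The ministry reported a slight increase in exports last quarter.",
]
scores = [emotion_score(a) for a in reliable_articles]
baseline, spread = mean(scores), stdev(scores)

def exceeds_baseline(article: str, sigmas: float = 3.0) -> bool:
    """Flag articles whose emotional intensity is well above the reference level."""
    return emotion_score(article) > baseline + sigmas * spread

print(exceeds_baseline("Shocking disaster! Outrageous cover-up horrifies the nation!"))
```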
Aleksandras adds that relying solely on emotion analysis isn’t enough to identify misinformation. To do this, we need to evaluate more aspects of the texts and their sources — and that’s where machine learning tools become extremely handy.
“Machine learning helps find answers in a complex, interconnected data network, as it doesn’t require defining specific factors that distinguish real news from fake ones. These tools receive data and learn patterns within it. Thus, by using public data collection tools, we can extract massive amounts of data, label it accordingly, feed it to a machine learning model, and then let it identify false news,” says Aleksandras.
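The sketch below illustrates the kind of supervised approach Aleksandras describes, assuming a set of articles has already been collected and labeled by fact-checkers. The in-line dataset and the scikit-learn pipeline are illustrative only, not the model Debunk EU or our team actually use.

```python
# Illustrative sketch: learn patterns from labeled articles and use them to
# classify new texts. The in-line "dataset" is made up; a real model needs
# large volumes of carefully labeled, independently verified articles.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled data: 1 = flagged as disinformation, 0 = reliable
texts = [
    "Officials confirmed the figures in a routine quarterly report.",
    "The study was replicated by two independent research groups.",
    "SHOCKING: secret plot exposed, the truth THEY hide from you!",
    "Outrageous betrayal! Share before this gets deleted!",
]
labels = [0, 0, 1, 1]

# Bag-of-words features plus a linear classifier: the model learns which
# word patterns tend to appear in flagged articles.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

new_article = "Unbelievable cover-up: the shocking truth they refuse to admit!"
print(model.predict_proba([new_article])[0][1])  # estimated probability of disinformation
```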
However, for the labeling to be objective, it’s crucial that the person doing this work remains unbiased and that the datasets are meticulously verified by others. Otherwise, we risk introducing subjectivity into the model.
We think disinformation is a crucial issue — that’s why we collaborated with Debunk EU pro bono and provided them with access to our web intelligence tools. Now, they can gather large volumes of public data, making their fact-checking process more efficient.
Our partnership with Debunk EU is part of our pro bono Project 4β, where we collaborate with nonprofit organizations and help them use web scraping tools to solve various social issues.
Ready to use the power of web intelligence for the greater good? Bring your talents over to our team — check out our open positions.