Disinformation on the Internet is on the rise, and advancements in artificial intelligence are making it faster and easier to generate falsehoods. But our Product Owner Aleksandras Šulženko says that web scraping, a technology for collecting publicly available data, can help us combat this issue. Let’s find out how.
The global pandemic and the ongoing Russian aggression against Ukraine have amplified the problem of misinformation, and rapidly evolving tools built on large language models, such as Auto-GPT, are making it even easier to generate false information. What’s concerning is that a large part of the general public lacks the skills to recognize false information on the Internet.
“When determining the truthfulness of news, it’s crucial to analyze the emotions and tone of the information. While all news sources regularly use linguistic devices like epithets and hyperbole to make their texts more engaging, perpetrators of misinformation use more intense emotions in their communication,” says Aleksandras.
That’s where web scraping and machine learning can be extremely helpful. Web scraping allows collecting large amounts of publicly available data, while machine learning can assess the credibility of the gathered data. By combining these technologies, emotion analysis can be carried out.
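To give a rough idea of the collection step, the snippet below sketches how publicly available article text can be gathered with common open-source tools. It is only an illustration, not our actual scraping pipeline; the URL is a placeholder, and the assumption that the article body sits in paragraph tags is purely hypothetical.

```python
# Minimal sketch of collecting publicly available article text.
# The URL and the assumption that the article body lives in <p> tags are
# hypothetical; real pipelines handle many layouts, rate limits, and errors.
import requests
from bs4 import BeautifulSoup

def fetch_article_text(url: str) -> str:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Join paragraph text into a single article string
    return "\n".join(p.get_text(strip=True) for p in soup.find_all("p"))

if __name__ == "__main__":
    text = fetch_article_text("https://example.com/news/article")  # placeholder URL
    print(text[:500])
```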
Mood or emotion analysis is extremely useful for identifying fake or biased news. It all begins with collecting a large number of reliable articles to establish the typical level of emotion expressed in factually accurate texts. This acts as a reference point, and articles that exceed it can then be examined more closely.
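To make the idea concrete, here is a minimal sketch of that baseline approach. The tiny emotion lexicon and the sample articles are made up for illustration only; a real system would rely on a trained sentiment model and a large corpus of verified articles.

```python
# Toy sketch: score the emotional intensity of verified articles to set a
# baseline, then flag new articles whose intensity is far above it.
from statistics import mean, stdev

# Tiny, made-up emotion lexicon; a real system would use a trained model.
EMOTION_WORDS = {"outrageous", "shocking", "disaster", "horrifying", "catastrophe"}

def emotion_score(text: str) -> float:
    """Fraction of words that carry a strong emotional charge."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in EMOTION_WORDS for w in words) / len(words) if words else 0.0

# Hypothetical corpus of verified, factually accurate articles (the reference point)
reliable_articles = [
    "The council approved the budget after a two-hour public session.",
    "Researchers published the study results in a peer-reviewed journal.",
    "The ministry reported a slight increase in exports last quarter.",
]
scores = [emotion_score(a) for a in reliable_articles]
baseline, spread = mean(scores), stdev(scores)

def exceeds_baseline(article: str, sigmas: float = 3.0) -> bool:
    """Flag articles whose emotional intensity is well above the reference level."""
    return emotion_score(article) > baseline + sigmas * spread

print(exceeds_baseline("Shocking disaster! Outrageous cover-up horrifies the nation!"))
```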
Aleksandras adds that relying solely on emotion analysis isn’t enough to identify misinformation. To do this, we need to evaluate more aspects of the texts and their sources — and that’s where machine learning tools become extremely handy.
“Machine learning helps find answers in a complex, interconnected data network, as it doesn’t require defining specific factors that distinguish real news from fake ones. These tools receive data and learn patterns within it. Thus, by using public data collection tools, we can extract massive amounts of data, label it accordingly, feed it to a machine learning model, and then let it identify false news,” says Aleksandras.
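The sketch below illustrates the kind of supervised approach Aleksandras describes, assuming a set of articles has already been collected and labeled by fact-checkers. The in-line dataset and the scikit-learn pipeline are illustrative only, not the model Debunk EU or our team actually use.

```python
# Illustrative sketch: learn patterns from labeled articles and use them to
# classify new texts. The in-line "dataset" is made up; a real model needs
# large volumes of carefully labeled, independently verified articles.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled data: 1 = flagged as disinformation, 0 = reliable
texts = [
    "Officials confirmed the figures in a routine quarterly report.",
    "The study was replicated by two independent research groups.",
    "SHOCKING: secret plot exposed, the truth THEY hide from you!",
    "Outrageous betrayal! Share before this gets deleted!",
]
labels = [0, 0, 1, 1]

# Bag-of-words features plus a linear classifier: the model learns which
# word patterns tend to appear in flagged articles.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

new_article = "Unbelievable cover-up: the shocking truth they refuse to admit!"
print(model.predict_proba([new_article])[0][1])  # estimated probability of disinformation
```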
However, for the labeling to be objective, it’s crucial that the person doing this work remains unbiased and that the datasets are meticulously verified by others. Otherwise, we risk introducing subjectivity into the model.
We think disinformation is a crucial issue — that’s why we collaborated with Debunk EU pro bono and provided them with access to our web intelligence tools. Now, they can gather large volumes of public data, making their fact-checking process more efficient.
Our partnership with Debunk EU is part of our pro bono Project 4β, where we collaborate with nonprofit organizations and help them use web scraping tools to solve various social issues.
Ready to use the power of web intelligence for the greater good? Bring your talents over to our team — check out our open positions.