Evaluating Location-Dependent Variation in Political Google Search Results: A Case Study in Brazilian Politics

Blog, Journal for High Schoolers, Journal for High Schoolers 2023

Nguyen Hoang Minh Ngoc, Vania Tucto

Mentor: Amy Dunphy

Abstract

With the rise of the internet and social media, misinformation has driven elections all over the world to become increasingly contentious and polarized. Web search results, while generally understudied as a vector for misinformation and political bias, have been found to have dramatic effects on the behavior of undecided voters.

In this project, we sought to study the effect of search location on the ranking and political slant of search results. In a small initial dataset focused on the 2022 Brazilian election, we observed differences in the results returned for different queries based on the location from which a user was searching. This effect could have a widespread impact on elections worldwide. In future work, we seek to quantify the differences between results for different locations, in order to shed light on the specific ways that a location’s characteristics could impact the search experience of voters in that region.

Background

Web Search Engines and Elections

The internet, particularly technologies such as social media and web search, has made it easier than ever to seek or disseminate information. While these developments have had a massive positive impact, many problems have arisen, including the rising spread of mis- and disinformation. A study by Nanyang Technological University found that 7 in 10 people had accidentally shared social media posts containing fake news. Only one in four Britons reports that they trust social media. [1]

Trust in web search results is considerably higher than in social media. In the 2021 Edelman Trust Barometer [12], 56 percent of respondents reported trusting search engines, compared with 35 percent for social media. The process of gathering information through search engines like Google has been found to reaffirm existing beliefs [9], amplifying the effect of misinformation or biased result ordering in web searches.

Many papers have examined the influence web search results may have on elections, especially with regard to shaping voters’ opinions toward candidates. The ranking of web search results could prompt up to 80% of undecided voters to select the up-ranked candidate [6]. Epstein’s five experiments in two countries also demonstrated that web search rankings could shift the voting preferences of undecided voters in real elections by 20%. Most concerningly, study participants showed no awareness of the manipulation.

The 2022 Brazilian Election

The 2022 Brazilian election was highly controversial and drew international concern and attention. Election and misinformation researchers around the world carried out discussions and studies to ensure free and fair elections. Researchers at Harvard and Brazilian analysts both warned that misinformation would be a critical risk, as social media channels flooded with “disinformation, calls to ‘Stop the Steal’ and cries for a military coup”. [2, 8]

Supporters of incumbent President Jair Bolsonaro widely claimed that his loss was the result of election fraud. Some supporters of the far-right Bolsonaro took to the streets in anger, actively calling for intervention from the military. After weeks of investigation by Brazilian authorities and independent security experts, it was determined that there was no credible evidence of voter fraud. [4]

Location-Based Search Results

Search engine results depend on the location from which a user is searching, even if the location is not explicitly included in the search query. The location data is used to improve search results: for example, if a user searches for “coffee shops”, they will receive information on establishments nearby. Broadly, search engines seek to provide relevant and useful results, which may be affected by the behavior of other users in the same area. [3]

However, a major concern is the degree to which the search engine’s idea of relevance may become biased. If a web search user lives in a region where Bolsonaro is very popular, many of their neighbors googling for political news will favor pro-Bolsonaro content. We hypothesize that an undecided voter living in a pro-Bolsonaro region may see different web search results from one living in a pro-Lula region. This is an effect we want to investigate and quantify, since it could potentially have a dramatic effect on the decisions of undecided voters.

Methods

WebSearcher Library

WebSearcher is a package providing tools for conducting and parsing web searches. It is capable of returning web search results from a range of different locations. In this research paper, we apply this package to search Google from a variety of locations across Brazil and retrieve the text, title, and hyperlink of the top results. [13]

TF-IDF analysis

TF-IDF (Term Frequency – Inverse Document Frequency) is a technique used in text mining. This statistical measure evaluates how important a word is to a document relative to a collection of documents. It is computed by multiplying two metrics:

– Term frequency measures how often a word appears in a document, normalized by the count of the most frequent word in that document:

tf(t, d) = f(t, d) / max{f(w, d) : w ∈ d}
  • tf(t, d) : term frequency of the word t in document d
  • f(t, d) : raw frequency of the word t in document d
  • max{f(w, d) : w ∈ d} : raw frequency of the most frequent word in document d

– Inverse document frequency shows how common or rare a word is across the document set:

idf(t, D) = log(|D| / |{d ∈ D : t ∈ d}|)
  • idf(t, D) : inverse document frequency of the word t in document set D
  • |D| : total number of documents in the document set D
  • |{d ∈ D : t ∈ d}| : number of documents in document set D that contain the word t

The TF-IDF score is the product of the two:

tfidf(t, d, D) = tf(t, d) · idf(t, D)

Therefore, the higher the tf-idf is, the more relevant that word is in that particular document. [7]
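
As an illustration, the calculation can be sketched in a few lines of Python. This is a minimal sketch, not the code used in this project: it assumes term frequency is normalized by the most frequent word in the document and that the inverse document frequency uses a natural logarithm. The function names and the toy documents are invented for the example.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Assumed form: f(t, d) divided by the count of the most frequent word in d."""
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {term: count / max_count for term, count in counts.items()}


def inverse_document_frequency(documents):
    """Assumed form: log(|D| / number of documents that contain the word)."""
    n_docs = len(documents)
    doc_counts = Counter()
    for tokens in documents:
        doc_counts.update(set(tokens))  # count each word once per document
    return {term: math.log(n_docs / n) for term, n in doc_counts.items()}


def tf_idf(documents):
    """Return one {word: score} dictionary per document."""
    idf = inverse_document_frequency(documents)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
        for tokens in documents
    ]


# Toy documents, invented for illustration (not taken from the study's data).
docs = [
    "bolsonaro desmatamento amazônia desmatamento".split(),
    "bolsonaro crime segurança".split(),
    "bolsonaro pobreza".split(),
]
scores = tf_idf(docs)
```

In this toy example, “bolsonaro” appears in every document, so its inverse document frequency is log(1) = 0 and its score is zero everywhere, while “desmatamento” scores highest in the first document. This matches the intuition that words shared by all results tell us little about what makes one result distinctive.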

Sentiment analysis

Sentiment analysis is an automated process of tagging data according to its sentiment (e.g., positive or negative). We selected a pre-trained model from Hugging Face in order to analyze the sentiment of the results. [5]
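
A rough sketch of how a pre-trained model could be applied to result snippets is shown below. It assumes the `transformers` library's `pipeline("sentiment-analysis", ...)` interface; the specific model is left as a parameter because it is not named here, and the helper names `load_classifier` and `label_counts` are hypothetical.

```python
from collections import Counter


def load_classifier(model_name):
    """Build a Hugging Face sentiment pipeline for the given model name.

    model_name is a placeholder: the model used in this project is not
    specified, so the caller must supply one suited to the text's language.
    """
    # Imported lazily so label_counts can be used without transformers installed.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=model_name)


def label_counts(texts, classifier):
    """Count the predicted sentiment labels for a list of result snippets."""
    predictions = classifier(texts)  # list of {"label": ..., "score": ...}
    return Counter(prediction["label"] for prediction in predictions)
```

Calling `label_counts(snippets, load_classifier(model_name))` once per search location would give a per-location tally of labels that could then be compared across locations.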

Queries and locations

We selected search locations across a wide variety of geographies, population densities, and political slants. They range from urban to very rural, with populations from sixty thousand to twelve million, and with up to 80% support for one candidate over the other. Figure 1 shows the geographical distribution of our selected locations, as well as their populations and the results of the 2022 first-round election.

Figure 1: The characteristics of our seven selected search locations

Results

Using the WebSearcher library to gather data, we faced a key challenge: it is very difficult to gather large data sets from major platforms. Technology companies often utilize strategies such as CAPTCHAs in order to prevent automated data collection, which dramatically limits our ability to gather representative datasets.

We gathered the title and preview text of the top Google results for each of our seven selected locations for ten Portuguese queries: ‘floresta amazônica’, ‘desmatamento’, ‘militares’, ‘crime’, ‘COVID-19’, ‘pobreza’, ‘segurança’, ‘povo indígena’, ‘refugiados’, and ‘imigração’. To select political results, we appended “Bolsonaro” to each of the queries. In the future, we wish to expand the number of keywords, and append “Lula” as well: unfortunately, our data collection methods broke before we were able to gather these additional datasets.
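
The query set described above can be sketched as follows. The keyword list comes from the text; everything else is an assumption for illustration: the candidate name is joined with a single space, the location labels are placeholders rather than our actual seven locations, and `build_queries` and `build_jobs` are hypothetical helper names.

```python
from itertools import product

# The ten Portuguese keywords listed in the text.
KEYWORDS = [
    "floresta amazônica", "desmatamento", "militares", "crime", "COVID-19",
    "pobreza", "segurança", "povo indígena", "refugiados", "imigração",
]

# Placeholder labels only; the real location names are not reproduced here.
LOCATIONS = [f"location_{i}" for i in range(1, 8)]


def build_queries(keywords, candidate):
    """Append the candidate's name to each keyword (assumed space-separated)."""
    return [f"{keyword} {candidate}" for keyword in keywords]


def build_jobs(queries, locations):
    """Pair every query with every search location."""
    return list(product(queries, locations))


queries = build_queries(KEYWORDS, "Bolsonaro")
jobs = build_jobs(queries, LOCATIONS)
```

Ten queries across seven locations give 70 searches per candidate. Passing “Lula” to `build_queries` would produce the second set of queries mentioned as future work.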

The results of the queries were generally quite similar across different locations, though there were observable differences. Many of the results were recent news articles (from roughly the previous week), and most were collected after the election. In the future, we would like to add a time filter to focus on articles published before or during the election. We also saw many results in English, despite the fact that we explicitly used settings for Portuguese and used Portuguese queries.

The differences we did observe tracked city size and political leaning. Both very small towns and large cities tended to return English results from international news sites, which skewed much more heavily anti-Bolsonaro [10]. In contrast, mid-sized cities returned the most Portuguese results. Additionally, news about deforestation in Brazil and Bolsonaro’s role in it [11] appeared slightly more often in pro-Lula regions, such as Teresina.

Unfortunately, our initial dataset was not large enough to quantify any strong differences in sentiment or word use between different search locations. However, we have demonstrated that there are in fact some differences in web search results and rankings for political or election-related terms, which can vary depending on search location across a single country. This could have many implications for future elections, and we look forward to better quantifying these differences in future work.
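
One simple way such differences could be quantified with a larger dataset is the Jaccard similarity between the sets of links returned for the same query in two locations. This is only a sketch of one possible measure, not an analysis we performed, and the URLs below are made up for illustration.

```python
def jaccard(results_a, results_b):
    """Share of distinct links that two result lists have in common (0 to 1)."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Invented example links for one query in two hypothetical locations.
location_one = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
location_two = ["https://example.com/b", "https://example.com/c", "https://example.com/d"]

overlap = jaccard(location_one, location_two)
```

A value of 1 means both locations returned the same set of links, and a value of 0 means they shared none. Note that this measure ignores ranking, so two locations that show the same links in a different order would still score 1.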

Future directions

As the quantitative analysis techniques we implemented could not be used on the small initial data set we collected, our next goal is to compile a much larger data set using a wider variety of search queries, and with date ranges set to focus on news from the time of the election.

We would also like to analyze the language-based differences among the articles. Qualitatively, we observed that English-language results appeared to skew more anti-Bolsonaro; however, we would like to gather a much larger dataset to determine if this effect can be quantified.

In the future, we also plan to expand to a wider variety of search engines. While Google is very widely used, some initial exploration found that search engines like Yandex (a Russian search engine) return significantly more pro-Bolsonaro content. In this way, we can see whether location-based variation in search results appears across other types of search engines as well.

References

  [1] Chee, K. (2022) ‘Many in Singapore confident they can spot fake news but may not actually be able to: Study’, The Straits Times, 12 January. Available at: https://www.straitstimes.com/tech/tech-news/many-in-singapore-confident-they-can-spot-fake-news-but-may-not-actually-be-able-to-study.
  [2] Dwoskin, E. (2023) ‘Come to the “war cry party”: How social media helped drive mayhem in Brazil’, The Washington Post, 8 January. Available at: https://www.washingtonpost.com/technology/2023/01/08/brazil-bolsanaro-twitter-facebook/.
  [3] Google Search Help ‘Understand & manage your location when you search on Google’. Available at: https://support.google.com/websearch/answer/179386?hl=en&co=GENIE.Platform%3DDesktop.
  [4] Nicas, J. (2022) ‘Brazil Counted All Its Votes in Hours. It Still Faces Fraud Claims.’, The New York Times, 10 November. Available at: https://www.nytimes.com/2022/11/10/world/americas/brazil-election-fraud.html.
  [5] Pascual, F. (no date) ‘Getting Started with Sentiment Analysis using Python’, Hugging Face. Available at: https://huggingface.co/blog/sentiment-analysis-python.
  [6] Shultz, D. (2015) ‘Internet search engines may be influencing elections’, Science [Preprint]. Available at: https://doi.org/10.1126/science.aac8982.
  [7] Stecanella, B. (no date) ‘Understanding TF-ID: A Simple Introduction’, MonkeyLearn. Available at: https://monkeylearn.com/blog/what-is-tf-idf/.
  [8] ‘The Battle Against Fake News in Brazil’s 2022 Elections’ (2022). David Rockefeller Center For Latin American Studies, Harvard University. Available at: https://drclas.harvard.edu/event/battle-against-fake-news-brazil%E2%80%99s-2022-elections.
  [9] Tripodi, F. (2018) ‘SEARCHING FOR ALTERNATIVE FACTS Analyzing Scriptural Inference in Conservative News Practices’, Data&Society, 16 May. Available at: https://datasociety.net/library/searching-for-alternative-facts/.
  [10] Watson, K. (2021) ‘Covid: Brazil’s Bolsonaro “should be charged with crimes against humanity”’, BBC, 20 October. Available at: https://www.bbc.com/news/world-latin-america-58976197.
  [11] (2022) ‘Amazônia perdeu 31 mil km2 sob Bolsonaro, aponta Inpe’, 12 August. Available at: https://www.dw.com/pt-br/amaz%C3%B4nia-perdeu-31-mil-km-sob-bolsonaro-aponta-inpe/a-62794898.
  [12] (2021) ‘2021 Edelman Trust Barometer’. Available at: https://www.edelman.com/trust/2021-trust-barometer.
  [13] Robertson, R. ‘WebSearcher’. Available at: https://github.com/gitronald/WebSearcher/tree/master
