Significance of Entropy in Combating AI-Driven Disinformation
By: Henry Widjaja, Shreya Das, GeorgeDaniel Dixon, Farah Basher
Mentors: Ann Grimes, Noah Huffman
The proliferation of disinformation presents significant societal challenges, serving as a multifaceted tool for perpetuating false narratives. The spread of AI technology has accelerated this phenomenon through fabricated media, exemplified by AI-generated photos such as a portrayal of Putin’s detainment and an image of Tesla CEO Elon Musk holding hands with General Motors CEO Mary Barra, both staged and fictitious (Weber et al., 2023). Because of instances like these, it is crucial to be able to differentiate AI-generated content from authentic, human-generated content. Understanding these differences is key to identifying and authenticating the information that communities consume. One answer, in the field of AI-generated text, is to use entropy as a distinguishing metric. As a numerical measurement of uncertainty, entropy allows us to distinguish between AI- and human-generated text (Wardle, 2021).
Through the use of entropy, our research found a measurable difference between AI- and human-generated text. AI text yielded a 2-gram conditional entropy estimate of 3.119, while human text yielded an estimate of 3.883, a difference of roughly 20%. This suggests entropy’s potential for distinguishing AI- from human-generated text, a crucial capability amid the rise of many AI startups. Thus, our research takes a step towards content authenticity through AI text detection.
Disinformation is an evolving issue in our society, advancing through the usage of AI technology. With the increased prevalence of AI tools on the web, the production of AI-generated content has ramped up tremendously, especially in recent years. AI content has appeared in the fields of photography, music, art, and journalism. This holds the potential for massive repercussions, as AI-generated content could very easily be used to spread false information through deep fakes and falsely constructed photos and articles. Various forms of propaganda could be created and spread very easily, posing a real threat to people’s ability to receive truthful and unbiased information (Wardle, 2021).
AI in Today’s Society
Just this year, NewsGuard, a tool used to aid in misinformation tracking and rating of various sites, identified news sites heavily utilizing AI-generated content, such as iBusiness Day or Daily Time Update (Sadeghi et al., 2023). AI’s integration into the field of journalism has led to unreliable and low-quality sites, some of which produced a large amount of disinformation. A few of them actively used OpenAI in order to generate false articles, such as a report on President Biden’s death. When NewsGuard attempted to reach out to these sites, they found little contact information and little assurance of any actual reliability (Lutkevich, 2023).
AI Detection: Who’s playing?
As a result of the proliferation of news sites using AI-generated content, along with the rise of tools such as ChatGPT, it has become difficult to distinguish AI-authored from human-authored content. To understand how this content is made, we must consider Natural Language Processing (NLP) methods. These methods leverage Large Language Models (LLMs), trained on extensive data, to enable computers to mimic human language patterns and create AI-generated content. Acknowledging the danger of AI, companies have begun building AI content detectors, which use statistics to ascertain whether a piece of writing was produced by a human or an AI (Karjian, 2023). There are several different players in the field of AI detection, such as:
- Giant Language Model Test Room (GLTR), from the MIT-IBM Watson AI Lab and Harvard NLP: GLTR aims to identify AI text by employing the same kind of language models that content generators use, and can thus recognize text produced by GPT-2 and its later variants with ease (Delgado, 2023).
- OpenAI’s GPT-3 Detector: OpenAI created its own AI language detector for its AI suite. Using datasets of both artificial intelligence and human-generated content on related themes, it trained its AI content detecting tool (Delgado, 2023).
- Content at Scale AI Detector: The Content at Scale AI writing detector predicts the most likely word combinations that statistically increase the likelihood of AI detection. To improve its odds of making correct predictions, it is trained using trillions of pages of data (Delgado, 2023).
- GPTZero: GPTZero first came to light when ChatGPT began demonstrating its use cases for content creation in the education market. As a result, it is often regarded primarily as a ChatGPT detector, although it was created as a general-purpose AI detector. Perplexity and burstiness are the two indicators that GPTZero reports (Delgado, 2023).
Why distinguish AI-generated text?
With greater AI integration into the web, there is a far greater quantity of AI-produced content online. In a world of rampant misinformation, we lack a baseline method of media authentication. The spread of AI technologies, which can easily be used to generate false content, is making this a bigger issue. Fake or mis-contextualized content can be used to push harmful agendas or spread propaganda. The advent of Web 3.0, which fosters a more decentralized internet space, may also have aided the creation and mass sharing of false content and disinformation. Thus, our study takes an important step in developing a method to distinguish AI- from human-generated text, a key step towards a design for content authenticity in the media.
Entropy as a Distinguishing Metric
Entropy is a measurement of the uncertainty, or disorder, in a system. Originally a concept from thermodynamics, entropy can be applied to texts in order to characterize their complexity, organization, and structure. It is computed from an equation that yields a numerical value, with lower entropy signifying lower disorder and higher cohesion. Given that cohesion is directly related to (communicative) efficiency, entropy is inversely related to the efficiency of a text sample. Thus, our main use of entropy will be to gauge the efficiency of a text sample.
For example, suppose languages A and B exist, where both often communicate the same meaning through different vocabulary. Whereas Language A often uses words such as “hi”, “great”, or “bad”, Language B elects to use words such as “greetings”, “fantastic”, or “awful”. These languages will have different entropies. Language A will have a lower entropy, as it often communicates the same message as Language B but uses fewer bits to do so. Because it takes fewer bits for Language A to communicate the same concepts as Language B, it is more efficient in its communicative ability and therefore has lower disorder and entropy.
Explanation of an N-gram
An n-gram is a sequence of n consecutive characters, drawn from a set that can include, but is not limited to, letters, punctuation, and digits. For example, 1-grams are single characters (such as a, b, c, etc.). In English, considering only lowercase letters, there are 26 possible 1-grams. Likewise, 2-grams are 2-character pairs (such as aa, ab, ac, etc.), so there are 26^2 = 676 possible 2-grams. In general, the n-grams of a given length cover all possible combinations of that many characters. In our case, we used only lowercase alphabetical 1- and 2-grams to provide preliminary results. By including 2-grams, we arrive at a more accurate textual entropy estimate.
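To make the definition concrete, here is a minimal sketch of counting lowercase alphabetical n-grams in a string. The function name `char_ngrams` is hypothetical, and treating non-letters as separators (so that no n-gram spans a space or punctuation mark) is an assumption; the article does not specify how such boundaries were handled.

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count n-grams made of lowercase letters a-z in `text`.

    Assumption: any non-letter ends the current run of letters, so
    n-grams never span spaces, digits, or punctuation.
    """
    counts = Counter()
    run = []  # current run of consecutive letters
    for ch in text.lower() + " ":  # the trailing space flushes the last run
        if "a" <= ch <= "z":
            run.append(ch)
        else:
            for i in range(len(run) - n + 1):
                counts["".join(run[i:i + n])] += 1
            run = []
    return counts

unigrams = char_ngrams("Hello, hello!", 1)
bigrams = char_ngrams("Hello, hello!", 2)
print(unigrams["l"])  # 4
print(bigrams["ll"])  # 2
print(26 ** 2)        # 676 possible lowercase 2-grams
```

Only the n-grams that actually occur are stored, so a real sample will usually contain fewer than the 676 possible 2-grams.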
Mathematical Explanation of Entropy
In information theory, entropy is represented mathematically by the equation below.
Figure 1. Equation of Entropy
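The figure itself is not reproduced in this text. Based on the accompanying description, it presumably corresponds to the standard Shannon form shown here; the symbol names and the base-2 logarithm (entropy measured in bits) are assumptions.

```latex
H = \sum_{i} p_i \log_2 \frac{1}{p_i} = -\sum_{i} p_i \log_2 p_i
```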
This equation allows us to compute an upper-bound estimate of the entropy, or disorder, of a text sample. This is done by taking the probability of a character’s occurrence and multiplying it by the log of the inverse of that probability. These terms are then summed over every character (or n-gram) i. An expanded equation is below.
Figure 2. Expanded Equation of Entropy
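The figure itself is not reproduced in this text. A plausible reconstruction, assuming it follows Shannon's standard n-gram formulation, is sketched here. The symbols X_k (the k-th character), p(b_i, j) (the probability of the combination b_i followed by character j), and the base-2 logarithm are assumptions.

```latex
F_n = H(X_n \mid X_1, \ldots, X_{n-1})
    = -\sum_{i,j} p(b_i, j)\,\log_2 p(j \mid b_i)

H = \lim_{n \to \infty} F_n
```

The sum can equivalently be written as the entropy of n-character blocks minus the entropy of (n-1)-character blocks, which matches the subtraction of 1-gram entropy from 2-gram entropy used in the calculations later in this article.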
The first equation gives the conditional entropy Fn of a text sample, evaluated over n characters. It presents two representations of the entropic equation. The first representation reads that entropy can be thought of as the entropy of a character conditioned on the previous characters, and so on. This goes on until all characters in a sample are exhausted. The variable n represents the number of characters being evaluated. Thus, we must consider the probability of a character given the previous characters. This is because character combinations that are more likely to form a word are more likely to appear together.
The second representation reads that entropy can be thought of as a sum over all character combinations i, j: the probability of the combination bi followed by the character j, multiplied by the log of the probability of j conditioned on bi, with the total negated so that the result is non-negative. In simpler terms, every possible pairing of a preceding combination bi with a next character j contributes the log of its conditional probability, weighted by how often that pairing occurs.
The second equation states that as n, the length of the n-grams evaluated, approaches infinity, this quantity approaches the entropy of the English language. This shows that we can reliably use this method to estimate the entropy of each dataset and compare the results, as all human-written samples should approach the entropy of English while AI-generated
Calculation of Probability
Entropy depends on the probability of each character in a text sample. Given a preceding character, a character’s probability is calculated for every character combination in an n-gram, until all combinations are exhausted. This is done by counting the occurrences of each specific character combination and dividing that count by the total number of counted character combinations, a standard calculation of probability.
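The count-and-divide step above can be sketched in a few lines. The function name `ngram_probabilities` and the counts are hypothetical and chosen only for illustration.

```python
from collections import Counter

def ngram_probabilities(counts):
    """Turn raw n-gram counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

# Hypothetical 2-gram counts, for illustration only.
counts = Counter({"th": 6, "he": 3, "an": 1})
probs = ngram_probabilities(counts)
print(probs["th"])  # 0.6
```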
Relation of Probability and Entropy
Probability reflects the likelihood and certainty of an event. Thus, logically speaking, probability is inversely related to uncertainty. However, if uncertainty were defined simply as the inverse of probability, a fully certain probability (of one) would yield an uncertainty of one, paradoxically signifying that there is uncertainty in a sample whose character combinations are fully certain. The equation below outlines this scenario.
Figure 3. Disproving the sole use of inverse to calculate Uncertainty
To resolve this, the log of the inverse of probability must be used, correctly presenting us with zero uncertainty in a sample where the character combination is fully certain. If we consider that the probability of a character is 0, its uncertainty is undefined. This is acceptable, given that the uncertainty for a nonexistent character combination is irrelevant. The equation below outlines this scenario.
Figure 4. Proving the use of log and inverse to calculate Uncertainty
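Neither figure is reproduced in this text. The two scenarios they describe can be written out as a short worked comparison; the base-2 logarithm is an assumption, and the case p = 1/2 is added only for illustration.

```latex
\begin{align*}
\text{inverse only:}\quad & p = 1 \;\Rightarrow\; \frac{1}{p} = 1
  \quad (\text{nonzero uncertainty for a certain event}) \\
\text{log of inverse:}\quad & p = 1 \;\Rightarrow\; \log_2 \frac{1}{p} = \log_2 1 = 0 \\
& p = \tfrac{1}{2} \;\Rightarrow\; \log_2 \frac{1}{p} = \log_2 2 = 1 \text{ bit}
\end{align*}
```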
Integration of Python in Sample Calculation
Given the large samples we were working with, it was imperative to use time-saving methods, and one way to do this was through NumPy, which let us turn our samples of English and AI text into numerical arrays. We counted the 1-grams and 2-grams, stored these counts in a NumPy array, and applied the entropy calculation to each value of the array with respect to the total number of combinations counted per sample. The resulting terms were then summed to find the entropy of the given text sample.
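The array-based calculation described above might look like the following minimal sketch. This is not the authors' code: the function name `entropy_bits`, the base-2 logarithm, and the example counts are all assumptions.

```python
import numpy as np

def entropy_bits(counts) -> float:
    """Shannon entropy, in bits, of the distribution implied by raw counts."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]   # combinations never seen contribute nothing
    p = counts / counts.sum()     # relative frequency of each combination
    return float(np.sum(p * np.log2(1.0 / p)))

# Hypothetical counts for four character combinations.
print(entropy_bits([5, 5, 5, 5]))   # 2.0 (four equally likely outcomes)
print(entropy_bits([20, 0, 0, 0]))  # 0.0 (one fully certain outcome)
```

Because the whole array is processed in one vectorized expression, no Python-level loop over individual n-grams is needed, which is where the time saving on large samples comes from.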
Application of Entropy of the Human-Written Language
To calculate the entropy of human-written language, we utilized the Google Books English 1 Million N-Gram corpus (Google, 2012). This corpus, stored as a CSV, contains word counts from 1 million different books published between 1500 and 2008. It is representative of the progression of written English, as the books counted per year also reflect the subject variety of English at the time. Utilizing Python’s Pandas package, we split this dataset by year into samples, each representing a single year. Then, we used NumPy to count each 1-gram present in each sample (in this case, year), applying the entropy calculation to each 1-gram count and summing the results to find an entropy estimate per year. Then, we applied the same calculation to the counts of 2-grams, summing the results and subtracting the entropy of 1-grams, to arrive at a more accurate entropy estimate per year. The diagrams below outline this process.
Figure 5. 1-Gram Setup and Parser Structure – Human Text
Figure 6. 2-Gram Setup Structure – Human Text
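As a rough illustration of the per-year procedure, the sketch below groups word counts by year, tallies weighted letter 1-grams and 2-grams, and subtracts the 1-gram entropy from the 2-gram entropy. It is not the authors' pipeline: it uses only the standard library in place of Pandas and NumPy so that it runs without third-party packages, and the rows, the helper names `entropy_bits` and `add_word`, and the choice to drop non-letters inside a word are all assumptions.

```python
import math
from collections import Counter, defaultdict

def entropy_bits(counts):
    """Shannon entropy, in bits, of a Counter of n-gram counts."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def add_word(counts, word, weight, n):
    """Add the letter n-grams of `word`, each weighted by the word's count."""
    letters = "".join(ch for ch in word.lower() if "a" <= ch <= "z")
    for i in range(len(letters) - n + 1):
        counts[letters[i:i + n]] += weight

# Hypothetical (word, year, match_count) rows standing in for the corpus file.
rows = [("the", 1800, 50), ("cat", 1800, 10),
        ("the", 1900, 40), ("dog", 1900, 30)]

# One pair of counters (1-grams, 2-grams) per year.
by_year = defaultdict(lambda: (Counter(), Counter()))
for word, year, match_count in rows:
    unigrams, bigrams = by_year[year]
    add_word(unigrams, word, match_count, 1)
    add_word(bigrams, word, match_count, 2)

# 2-gram entropy minus 1-gram entropy, per year.
estimates = {
    year: entropy_bits(bigrams) - entropy_bits(unigrams)
    for year, (unigrams, bigrams) in by_year.items()
}
for year in sorted(estimates):
    print(year, round(estimates[year], 3))
```

On a toy sample this small the numbers carry no meaning (the difference can even come out negative, because 1-grams and 2-grams are tallied separately); the estimate is only informative on large samples.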
Application of Entropy of AI-Generated Language
To calculate the entropy of AI-generated language, we utilized a corpus of over 40,000 ChatGPT responses (Peng et al., 2023). This corpus is representative of what a typical user may ask of ChatGPT and includes a variety of conversational responses involving mathematics, literature, and science. Utilizing Python’s JSON package, we extracted only the ChatGPT output from a large JSON file storing these responses, in order to evaluate only the output and not the question being asked. Utilizing NumPy, we applied the same entropy calculation over the entire sample’s 1-gram counts, summing the results to find a preliminary entropy estimate. Then, as shown in the diagrams below, we did the same with the entire sample’s 2-gram counts, summing the results and subtracting the entropy of the 1-gram counts to find a more accurate entropy estimate.
Figure 7. 1-Gram Setup and Parser Structure – AI Text
Figure 8. 2-Gram Setup Structure – AI Text
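The output-only extraction step could be sketched as follows. The record layout and the key names "instruction" and "output" are assumptions made for illustration; the article does not describe the structure of the JSON file.

```python
import json

# Hypothetical excerpt; the key names are assumed, not taken from the article.
raw = '''[
  {"instruction": "Add 2 and 3.", "output": "The sum is 5."},
  {"instruction": "Name a primary color.", "output": "Red is a primary color."}
]'''

records = json.loads(raw)  # for a file on disk, json.load(file_handle) plays this role
outputs = [record["output"] for record in records]  # keep responses, drop prompts
ai_text = " ".join(outputs).lower()  # one lowercase sample for n-gram counting

print(len(outputs))
print(ai_text)
```

Joining the responses into a single sample mirrors the description above, where the entropy is computed over the entire sample rather than per response.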
Collection of Results
We aimed to collect three different metrics as our results: the entropy of the English language over time, the entropy of both human- and AI-generated language, and the entropic difference between human- and AI-generated language. To determine the progression of English’s entropy, the entropy calculated for each year was plotted as a point over time, producing a graph. For our last two metrics, the entropy calculations for human-written and AI-generated language were compared directly via an inequality, and the difference between them was calculated.
Calculated Entropy – Human/AI Text
In our experiment, the calculated 2-gram entropy for human-generated text, 3.883, was higher than that for AI-generated text, 3.119. This signifies that AI text has lower uncertainty and higher efficiency; specifically, AI text has roughly 20% lower uncertainty.
The diagram below represents the evaluation of texts and the thresholds by which we consider text to be AI- or human-generated. Any text with an entropy below 3.119 is most likely AI-generated, and any text with an entropy above 3.883 is likely human-generated. The diagram also shows the roughly 20% gap between these two threshold values.
Figure 9. Entropic Separation of AI and Human Generated Text
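The thresholds and the percentage gap can be checked with a short sketch. The two entropy values come from the results above; the function name `label_by_entropy` and the "undetermined" label for values between the thresholds are assumptions, since the article does not say how such texts are treated.

```python
AI_ENTROPY = 3.119     # 2-gram estimate reported for AI-generated text
HUMAN_ENTROPY = 3.883  # 2-gram estimate reported for human-generated text

def label_by_entropy(h: float) -> str:
    """Label a text by its 2-gram entropy estimate using the two thresholds."""
    if h < AI_ENTROPY:
        return "likely AI-generated"
    if h > HUMAN_ENTROPY:
        return "likely human-generated"
    return "undetermined"  # assumed label for the gap between the thresholds

relative_gap = (HUMAN_ENTROPY - AI_ENTROPY) / HUMAN_ENTROPY
print(round(relative_gap, 3))  # 0.197, i.e. roughly 20%
print(label_by_entropy(3.0))   # likely AI-generated
print(label_by_entropy(3.5))   # undetermined
print(label_by_entropy(4.0))   # likely human-generated
```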
Entropic Progression of English Language
The entropy of the English language was found to follow a variable trend. Entropic estimates were volatile and increasing at first, owing to the low sample size in the Google corpus for the earliest years, but drew a distinct path starting around the year 1700. The entropy of the English language started to drop off steadily after this time, around the emergence of Late Modern English (circa 1800). After the early-to-mid 1800s, the entropy of English increased in a roughly linear fashion. Below are the 1-gram and 2-gram entropies for the English language since 1500.
Figure 10. Entropy of the English Language over Time: 1-Grams
Figure 11. Entropy of the English Language over Time: 2-Grams
Distinguishing Human/AI Text
This paper has shown that it may be possible to distinguish between human-generated and AI-generated text through the use of entropy. Using the human-written and AI-generated data, we were able to establish distinguishing entropy values for both sets of data. The values given in the results, 3.883 and 3.119, reflect upper bounds on their respective entropies. The roughly 20% difference between the two entropy values is readily detectable, suggesting a clear method of differentiation.
Significance of Entropic Changes over Time
The entropy of the English language seems to increase with time. The graph displaying the progression shows entropy swinging rapidly back and forth up until the development of Late Modern English, around 1800. During this period, the entropy of the English language decreased slightly or stabilized, representing stable or increasing cohesiveness and efficiency in the English language. After this period, the entropy of English increased, indicating a decrease in cohesiveness and efficiency. This is consistent with the fact that Late Modern English is the form of English in use today, while more words have been invented over time. This invention of new words leads to a decrease in communicative efficiency, with a greater number of words used to communicate the same message. Additionally, utilizing 2-grams appears to return a more effective estimate of entropy, as variation is reduced.
Efficiency of AI Text
The results of the research also suggest that AI technologies may be able to communicate in English with fewer average bits than humans, or may be able to communicate more efficiently than humans. The lower entropy value for AI-generated content (3.119) hints at greater cohesiveness and efficiency, though this number may be affected by a series of outside factors.
First, the human-generated text corpus is far larger than the AI-generated text corpus. Human text has been around for far longer than AI text has, and the AI data used was taken from ChatGPT conversations, a tool that has only come into wide use in recent years. Our relatively small corpus of only 40k ChatGPT conversations can be expected to have more variance than the larger Google corpus, so our entropy computation may be more biased in this case. In addition, the AI text may appear to use fewer bits because the Google Books corpus may not capture informal abbreviations, such as “lol”, so this efficient mode of human conversation may be overlooked. We also accounted only for bigrams, or pairs of consecutive letters, and it is very possible that results would change if we attempted the calculation for higher-order n-grams, which could provide a more accurate look into how the ordering of letters affects the entropy of the different texts.
Implications of AI on Public Trust
One of the pressing questions that has surfaced from our research is: How might the rise of AI impact journalism, law, education, regulation, and other areas of public policy? In the journalism industry, once the usage of AI matures, artificial intelligence will be able to automatically create a myriad of new stories, such as sports results and financial earnings reports. That means AI can help news organizations produce content more effectively [cheaply], faster, and arguably with fewer errors. While this is promising, the journalism industry, like many others, fears AI-driven job losses. In a recent New York magazine article entitled How Will Artificial Intelligence Change the News Business?, author John Herrman frames generative AI as a tsunami, pointing out “We can either ride it or get wiped out by it” (Herrman, 2023). While AI may be faster, cheaper, and effectively better, its negative aspects, such as protecting copyrights and identifying plagiarism, have yet to be solved. Our research aims to take a step towards addressing potential plagiarism from AI-generated text.
The implications of AI for public policy raise thorny questions about ownership and about appropriate guidelines and rules that differentiate between AI-generated content and human-generated content. Public policy makers must also address ways to determine the authenticity of the original content being presented to the public. This will combat AI-driven misinformation and improve the public’s trust in the information that journalists and news outlets provide.
Implications of AI in Education
With regard to the impact of AI on education, the question of machine-generated content through ChatGPT has been widely discussed. Manjeet Rege avers in his article, The Impact of Artificial Intelligence and Chat GPT on Education, “The deployment of ChatGPT, an artificial intelligence (AI) application that can compose grade-passing essays, create detailed art and even engage in philosophical conversations… is raising tough questions about the future of AI in education. Will AI replace or stunt human intellect or critical thinking?” (Rege, 2023). Beyond that, some suggest that AI could fill the gap and need for teachers in education. However, others point out that AI cannot meet the social and emotional needs that students must have met to thrive in classrooms. They argue that AI educational platforms simply cannot take into account students’ learning behaviors and preferences to enhance their learning to the extent that teachers can. AI can perpetuate biases in its learning models, it lacks personalization, and its potential for errors is greater (Greene, 2023). However, that is not to say AI cannot be helpful in educational spaces. AI can make grading more efficient, so that teachers can focus on the actual curriculum and on meeting the needs of their students. Also, AI can provide responses instantaneously to students’ questions in a virtual setting if the teacher is not available (Seo et al., 2021). All in all, AI can advance education, but the questions still remain: How? Will it be effective? How can we be sure it is as accurate as it can be?
Artificial intelligence is increasingly prevalent in today’s society, and so is the spread of misinformation (Ortiz, 2023). It is of the utmost importance to be able to differentiate between human-generated content and artificially generated content. Deep fakes are just one example of the threats undermining job security, resulting in the need for more systematic provenance. Our research project opens the door to distinguishing human-made content from AI-generated content by utilizing entropy, ultimately creating a pathway for future technologies and a more harmonious relationship between humans and AI.
- Weber, J., Wesolowski, K., & Sparrow, T. (2023, May 2). Fact check: How can I spot AI-generated images? – DW – 04/09/2023. dw.com. https://www.dw.com/en/fact-check-how-can-i-spot-ai-generated-images/a-65252602
- Wardle, C. (2021, August 3). Understanding information disorder. First Draft. https://firstdraftnews.org/long-form-article/understanding-information-disorder/
- Sadeghi, M. (2023, August 14). Tracking AI-enabled misinformation: Over 400 “unreliable AI-generated news” websites (and counting), plus the top false narratives generated by Artificial Intelligence Tools. NewsGuard. https://www.newsguardtech.com/special-reports/ai-tracking-center/
- Lutkevich, B. (2023, June 27). Artificial Intelligence Glossary: 60+ terms to know. WhatIs.com. https://www.techtarget.com/whatis/feature/Artificial-intelligence-glossary-60-terms-to-know
- Karjian, R. (2023, August 2). How to detect AI-generated content. Enterprise AI. https://www.techtarget.com/searchenterpriseai/feature/How-to-detect-AI-generated-content?utm_campaign=20230806_ERU-ACTIVE_WITHIN_240_DAYS&utm_medium=email&utm_source=SGERU&source_ad_id=365534713&src=15002958&asrc=EM_SGERU_273812674
- Delgado, J. (2023, July 13). 8 most accurate and reliable AI content detectors in 2023. 10Web. https://10web.io/blog/ai-detectors/
- Ortiz, S. (2023, August 14). What is CHATGPT and why does it matter? Here’s what you need to know. ZDNET. https://www.zdnet.com/article/what-is-chatgpt-and-why-does-it-matter-heres-everything-you-need-to-know/
- Google. (2012). Google Ngram Viewer. http://books.google.com/ngrams/datasets
- Peng, B., Li, C., He, P., Galley, M., & Gao, J. (2023). Instruction tuning with gpt-4. arXiv preprint arXiv:2304.03277.
- Herrman, J. (2023, August 1). How Will Artificial Intelligence Change the News Business?. Intelligencer. https://nymag.com/intelligencer/2023/08/how-ai-will-change-the-news-business.html#:~:text=It%20would%20be%20automating%20the,news%E2%80%9D%20that%20are%20already%20cheap.&text=Much%20more%20common%2C%20so%20far,centric%20news%20production%20and%20distribution.
- Rege, M. (2023, April 13). The impact of Artificial Intelligence and CHATGPT on education – newsroom: University of St. Thomas. Newsroom | University of St. Thomas. https://news.stthomas.edu/the-impact-of-artificial-intelligence-and-chatgpt-on-education/
- Seo, K., Tang, J., Roll, I., Fels, S., & Yoon, D. (2021, October 26). The impact of artificial intelligence on learner–instructor interaction in online learning – international journal of educational technology in higher education. SpringerOpen. https://educationaltechnologyjournal.springeropen.com/articles/10.1186/s41239-021-00292-9
- Greene, R. T. (2023, April 24). The Pros and cons of using AI in learning: Is CHATGPT helping or hindering learning outcomes?. eLearning Industry. https://elearningindustry.com/pros-and-cons-of-using-ai-in-learning-chatgpt-helping-or-hindering-learning-outcomes
- Vynck, Gerrit de. (2023, August 7). Every start-up is an AI company now. bubble fears are growing. The Washington Post. https://www.washingtonpost.com/technology/2023/08/05/ai-hype-bubble-chatgpt/
- Urbani, S. (2021, June 28). Verifying online information. First Draft. https://firstdraftnews.org/long-form-article/verifying-online-information/
- HAKIMI LE GRAND, H. (2023, July 6). Using artificial intelligence in your reporting while maintaining Audience Trust. International Journalists’ Network. https://ijnet.org/en/story/using-artificial-intelligence-your-reporting-while-maintaining-audience-trust