JASMINE LIU, VALENTINA JURICEK, ALLISON SU
Abstract
Social and emotional recognition are fundamental aspects of children’s development, namely their ability to regulate their own emotions and properly understand those of others. However, while children’s literature can aid in developing their emotional competence, many children struggle to recognize emotions expressed in writing; unlike in verbal communication, where emotions are articulated through tone of voice, facial expressions, and physical gestures, children often find it difficult to comprehend intended emotions as they read. We aim to improve their literary emotion detection using Natural Language Processing (NLP). Our NLP-based application takes text as input and uses the EmoRoBERTa model to output the main emotion the text conveys.
INTRODUCTION
SOCIAL AND EMOTIONAL LEARNING (SEL)
Social and emotional learning (SEL) has been defined in numerous ways. Broadly, SEL consists of a set of social, emotional, behavioral, and character skills necessary for effectively navigating everyday tasks. The Collaborative for Academic, Social, and Emotional Learning (CASEL), a leading research center and influential advocate for SEL inclusion within schools, recognizes SEL as a set of five competencies: self-awareness, social awareness, relationship skills, self-management, and responsible decision making [1]. More recently, the Wallace Foundation model identifies the three domains of SEL as cognitive regulation (attention control, inhibitory control, cognitive flexibility), emotional processes (emotion knowledge, expression, and regulation, and empathy or perspective taking), and social/interpersonal skills (social cues, conflict resolution, etc.) [2].
Together, these skills allow people to develop healthy identities, manage emotions, and understand the perspectives of others. They play a vital role in establishing and maintaining positive, supportive relationships and in guiding responsible short- and long-term decision-making. In short, SEL develops the interpersonal, intrapersonal, and cognitive skills needed to succeed in school, the workplace, and relationships.
While SEL in schools may seem to detract from time spent on academics, considerable research suggests that SEL is a necessary foundation for both academic and career success, and can even facilitate learning. Children who are able to effectively manage their thinking, attention, and behavior are also more likely to have better grades and higher test scores; studies have shown as much as an 11-percentile-point improvement on standardized test scores [3] [4]. In addition, children with higher teacher-rated social and emotional competence in early childhood are more likely to graduate, attend college, and have a job 20 years later. They are also less likely to face mental health challenges, have criminal justice involvement, or receive public assistance in young adulthood [5]. Thus, SEL is crucially beneficial to children’s development.
TEACHING SEL THROUGH LITERATURE
An individual’s ability to recognize emotions is part of their emotional competence. Emotional competence comprises three key components: (1) the recognition of emotions, (2) the expression of emotions, and (3) the experience of emotions [6]. The recognition of emotions involves one’s ability to perceive the emotional state(s) of oneself and others. This also entails identifying differences between inner emotional states and outer expression, and at more mature levels, understanding that emotional-expressive behaviors can greatly affect others [6]. Emotional expression encompasses one’s ability to evaluate, regulate, adapt, and respond to these emotions, as well as the ability to use normatively accepted vocabulary to express emotions. The appropriate experience of emotions involves one’s ability to recognize and regulate emotions of varying intensity. It also includes the capacity for emotional self-efficacy, where one can confidently accept and embrace their emotional experience [6].
The development of these skills begins in early childhood and is largely shaped by one’s social context [6]. Typically, this means one’s family experience and relationships with parents, teachers, and peers. However, it also includes media that shape children’s recognition, expression, and experience of emotions. Fairy tales, for example, introduce concepts of morality, imagination, danger, decision-making, proper etiquette, and social norms [7]. The didactic nature of fairy tales often teaches children the interrelations between social and emotional behaviors. The story’s ultimate outcome raises the awareness that “the structure or nature of relationships is in large part defined by how emotions are communicated within the relationship, such as by the degree of emotional immediacy or genuineness of expressive display and by the degree of emotional reciprocity or symmetry within the relationship” (Saarni 5) [6]. Hence, the stories of the characters teach children the impacts of their own emotional experiences.
In addition, as children are given the perspectives of multiple characters, they are guided into recognizing emotions in others and forming judgements about the characters and situations themselves. They are furthermore led to develop more advanced competency skills as they familiarize themselves with conventional emotional vocabulary and recognize differences between characters’ internal thoughts and external emotional representation [8]. At an early age, children typically find it easier to express their emotions through physical gestures (a smile, a laugh, etc.) than through words. Consequently, teaching appropriate oral language is crucial for children to learn to cope with, express, and regulate their emotions [9]. Children’s literature thus supports social and emotional maturation.
NLP EMOTION DETECTION
Natural language processing (NLP) has a history of more than 50 years as a scientific discipline, with applications to education appearing as early as the 1960s. Initial NLP work dates back to the late 1940s, with a focus on text-to-text translation across multiple languages. Early systems used dictionary lookup of appropriate words and manual word reordering after translation [10]. Chomsky later introduced the idea of generative grammar, revolutionizing syntactically accurate translation [10]. In recent years, efforts have shifted to improving human-computer conversation with speech recognition, statistical analysis and prediction, conversational agents/chatbots, facial recognition, auto-complete recommendation analysis, emotion detection, and natural language generation.
Typically, emotion detection works by training a model on a labeled data set and then applying the trained model to new text using opinion mining. Opinion mining, also known as sentiment analysis, is crucial for a computer to understand emotions and feelings in text. It is a technique that uses NLP and computational linguistics to identify and decode the sentiment behind a text [11].
This paper applies NLP emotion detection to research on SEL. While children’s literature plays an instrumental role in developing emotional competence, many children struggle with reading comprehension and with identifying the intended emotion(s) within literary texts. As manual annotation of texts is time-intensive and costly, we use literary emotion detection to support children’s educational and emotional needs, creating an app for children to run texts through. We then conduct sentiment analysis, examining emotion-related patterns within children’s books and comparing them with those in other literary texts.
MATERIALS AND METHODS
EMOROBERTA
The application uses the EmoRoBERTa NLP model [12], which is derived, via RoBERTa, from BERT (Bidirectional Encoder Representations from Transformers), a deep learning model published by Devlin and his colleagues at Google in 2018. BERT is based on the Transformer architecture, learns the context of words bidirectionally, and was trained using Wikipedia (2,500M words) and BooksCorpus (800M words).
Our application uses EmoRoBERTa to classify sequences of text as one of 28 emotions [12]. The data set included 60+ Grimms’ Fairy Tales [13], 50 standard children’s books [14], 50 TIME for Kids articles [14], 5 TIME articles [15], 4 novels (Metamorphosis, Frankenstein, Strange Case of Dr. Jekyll and Mr. Hyde, and Moby Dick) [13], and 3 Shakespeare plays (Hamlet, The Tempest, and Twelfth Night) [13]. Many of these were downloaded from the Project Gutenberg website, an open library offering over 60,000 free eBooks.
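The classification step can be sketched as follows. This is a minimal illustration rather than the application’s actual code: the checkpoint name `arpanghoshal/EmoRoBERTa`, the helper names, and the rescaling of the score to a 0 to 100 range are all assumptions, and loading the checkpoint may require a particular deep learning backend to be installed.

```python
from typing import Callable, Dict, List

# Assumption: a publicly hosted EmoRoBERTa checkpoint; the paper does not
# state which release or loading code the application used.
MODEL_NAME = "arpanghoshal/EmoRoBERTa"


def load_emoroberta() -> Callable[[List[str]], List[Dict]]:
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so main_emotions() can be used without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_NAME)


def main_emotions(sentences: List[str], classifier: Callable) -> List[Dict]:
    """Label each sentence with its top emotion and a 0-100 confidence."""
    results = classifier(sentences)  # one {"label", "score"} dict per sentence
    return [
        {
            "sentence": sentence,
            "emotion": result["label"],
            # Pipeline scores lie in [0, 1]; rescale to 0-100 (assumption).
            "confidence": round(result["score"] * 100, 1),
        }
        for sentence, result in zip(sentences, results)
    ]
```

Calling `main_emotions(sentences, load_emoroberta())` would then return one labeled record per input sentence. Passing the classifier in as an argument keeps the helper independent of any particular model.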
As noted earlier, children’s stories, especially fairy tales, help develop children’s emotional lives and introduce them to many moral concepts that remain with them throughout their lives. The Brothers Grimm’s fairy tales include some of the most widely read fairy tales, including Little Red Riding Hood, Rumpelstiltskin, Hansel and Gretel, and The Golden Goose. They were chosen for analysis in this project for that reason and because they are freely accessible. Other texts were selected as points of comparison. For example, Hamlet (a tragedy), Twelfth Night (a comedy), and The Tempest (which contains elements of both tragedy and comedy) were selected to compare emotion frequencies across different genres.
Data pre-processing consisted of data cleaning and text normalization. Stop words were removed using the Natural Language Toolkit (NLTK) library in Python before bigrams were processed, and regular expressions were used to normalize irregular spacing and remove punctuation in the top bigrams. NLTK was also used to tokenize the data, splitting each text into sentences. These sentences were then classified using the EmoRoBERTa model, which labeled each sentence’s main emotion and provided a score indicating the model’s confidence in that label. The score ranged from 0 to 100, with higher numbers representing more confidence.
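The pre-processing steps can be illustrated with a short sketch. It uses only the Python standard library so that it runs without downloading NLTK data: the tiny stop-word list and the naive regex sentence splitter are hypothetical stand-ins for NLTK’s stop-word corpus and sentence tokenizer, and lowercasing the text is a simplification of our own.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (NLTK's English list is larger).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "was", "he", "she", "it"}


def split_sentences(text):
    """Normalize irregular spacing, then split after '.', '!' or '?'."""
    text = re.sub(r"\s+", " ", text).strip()
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def top_bigrams(text, n=5):
    """Count adjacent word pairs after dropping punctuation and stop words."""
    # Keeping only letter runs (with an optional apostrophe part) removes
    # punctuation; lowercasing merges "The" and "the".
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    words = [w for w in words if w not in STOP_WORDS]
    # Note: pairs may span sentence boundaries in this simplified version.
    return Counter(zip(words, words[1:])).most_common(n)
```

For example, `top_bigrams(story_text, n=10)` returns the ten most frequent word pairs with their counts, and `split_sentences(story_text)` yields the sentences that would be passed to the classifier.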
Overall, four main variables were analyzed from and across the texts: (1) emotion frequency, (2) emotion distribution, (3) sentence word count, and (4) word complexity.
- Emotion frequency was calculated by counting the number of sentences assigned a specific emotion by EmoRoBERTa.
- Emotion distribution was approximated to view the “timeline” of emotions. While emotion frequency gives the total count of each emotion, it provides no insight into how those emotions are concentrated throughout the text; emotion distribution, on the other hand, illustrates the varying amounts of each emotion at different parts of the story. It was calculated by categorizing each of the 28 emotions as positive, negative, or neutral and counting the positive, negative, and neutral emotions in a given segment. The emotions were classified as follows:
- Positive emotions: admiration, amusement, approval, caring, excitement, gratitude, joy, love, optimism, pride, relief
- Negative emotions: anger, annoyance, confusion, disappointment, disapproval, disgust, embarrassment, fear, grief, nervousness, sadness
- Neutral emotions: realization, surprise, neutral, curiosity, desire
Each text was then divided into 5 to 20 segments (depending on its length), and the positive, negative, and neutral emotions in each segment were summed.
- Sentence word count was found by counting the number of tokens in a given sentence.
- Word complexity was determined using the wordfreq tool [16], which estimates the vocabulary difficulty of words from how commonly they are used. Based on their frequency, words are assigned Zipf values to determine their vocabulary level and classified as follows:
- Zipf 5 − 8: Beginner
- Zipf 3 − 5: Intermediate
- Zipf 0 − 3: Advanced
The Zipf values of the words in a sentence were averaged to give the vocabulary level of that sentence.
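The four variables above can be sketched in a few helper functions. This is an illustrative reconstruction under stated assumptions, not the code used in the study: the function names are hypothetical, any label outside the three lists is reported as "unlisted", texts are cut into near-equal segments by sentence order, and Zipf values that fall exactly on a boundary (3 or 5) are assigned to the easier level.

```python
from collections import Counter

POSITIVE = {"admiration", "amusement", "approval", "caring", "excitement",
            "gratitude", "joy", "love", "optimism", "pride", "relief"}
NEGATIVE = {"anger", "annoyance", "confusion", "disappointment", "disapproval",
            "disgust", "embarrassment", "fear", "grief", "nervousness", "sadness"}
NEUTRAL = {"realization", "surprise", "neutral", "curiosity", "desire"}


def polarity(label):
    """Map an emotion label to positive, negative, or neutral."""
    if label in POSITIVE:
        return "positive"
    if label in NEGATIVE:
        return "negative"
    if label in NEUTRAL:
        return "neutral"
    return "unlisted"  # assumption: labels absent from the lists above


def emotion_frequency(labels):
    """Count how many sentences received each emotion label."""
    return Counter(labels)


def emotion_distribution(labels, n_segments):
    """Cut the ordered labels into n_segments chunks and count polarities."""
    size, extra = divmod(len(labels), n_segments)
    timeline, start = [], 0
    for i in range(n_segments):
        end = start + size + (1 if i < extra else 0)  # spread the remainder
        timeline.append(Counter(polarity(l) for l in labels[start:end]))
        start = end
    return timeline


def vocabulary_level(zipf):
    """Classify a Zipf value; boundary values go to the easier level."""
    if zipf >= 5:
        return "Beginner"
    if zipf >= 3:
        return "Intermediate"
    return "Advanced"


def sentence_complexity(words, zipf_of):
    """Return (word count, mean Zipf value) for a non-empty token list."""
    return len(words), sum(zipf_of(w) for w in words) / len(words)
```

With the wordfreq package installed, `zipf_of` could be `lambda w: zipf_frequency(w, "en")` after `from wordfreq import zipf_frequency`; any other word-to-Zipf lookup works equally well.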
RESULTS
EMOTION LABEL COUNTS
Figure 3 displays the percentages of each of the 27 non-neutral emotions in various texts.
The most common emotions in the Grimms’ Fairy Tales were admiration, fear, joy, and sadness, with admiration being the highest, while the least common emotions were embarrassment and pride.



Figure 1 and Figure 2 depict the relative amounts of each emotion label across the different texts. Each chart is color-coded to display the amount of positive (pink/yellow), negative (blue/purple), and neutral (gray) emotion types.
As can be seen, the fairy tales contained the greatest amount of the ‘neutral’ emotion while the novels contained the least. As will later be seen in “Emotions at a Lexical Level,” most text within the fairy tales is simple dialogue or basic character and setting description. The fairy tales consist almost entirely of character dialogue or omniscient narration, which often leaves out the emotions that a character experiencing the events would normally express. Hence, many of these sentences read as neutral: they do not contain a specific emotion but merely describe events.
Conversely, the novels contained the least amount of the ‘neutral’ category of emotions. The greater length and complexity of the novels allow for more character and plot development, and thus emotional depth. The progression of these novels is often driven by emotional development. Their success can also be attributed to their ability to engage the audience with strong and descriptive words.
In terms of non-neutral emotions, children’s books contained the least amount of negative emotions, with fairy tales and novels having roughly the same amount. However, fairy tales showed a much greater amount of positive emotions, while novels had nearly equal amounts of positive and negative. Fairy tales often instill a sense of fear or danger that the character must overcome in order to teach a moral lesson, while children’s stories remain fairly upbeat from beginning to end, with no sense of moral struggle. These fairy tales typically end on a note of happiness, teaching children that if they abide by certain social codes they will prevail.
Figures 4 and 5 display the emotion counts in TIME and TIME for Kids articles. The TIME articles are mostly neutral, with most positive emotions stemming from ‘approval’, while the TIME for Kids articles are mostly positive, with the predominant emotions being ‘admiration’ and ‘joy.’


ORDERING OF EMOTIONS
Figure 7 represents the “emotion timelines,” or distributions of positive (shown in blue), negative (green), and neutral (orange) emotions across the text. In each of the timelines, the ‘neutral’ emotion was omitted because of its dominating frequency.
As seen, the fairy tales (shown across the top row) typically begin with a neutral emotion (often a simple line describing the setting) and end on notes of happiness (“happily ever after”).
The distribution and varying levels of positive, negative, and neutral emotions stay fairly consistent with genre and plot development.
The last row displays three Shakespearean plays: one tragedy, one comedy, and one that contains elements of both. As Hamlet is a tragedy, it starts with somewhat even levels of positive and negative emotions, though negative is slightly more prevalent. The text briefly rises in positive emotion (likely during the play-within-a-play scene), and from about one-fourth of the way in, it progressively declines in positive emotion while increasing in negative. In the comedy Twelfth Night, the positive emotions remain consistently more prevalent than the negative and neutral ones. The Tempest’s chart does not exhibit the clear distinction between positive and negative emotions seen in Twelfth Night, though it also does not show the steady increase in negative emotion seen in Hamlet, even ending with less negative emotion than it began with; it is a mix of both.
EMOTIONS AT A LEXICAL LEVEL
The following table shows the leading bigrams for the selection of Grimms’ fairy tales analyzed. Many of these bigrams, such as “(‘said’, ‘I’)” and “(‘I’, ‘shall’)”, demonstrate dialogue. This is further supported by the fact that the Grimms’ stories are typically written in the third person, so personal pronouns such as “I” are evidence of dialogue or thoughts. The bigrams “(‘old’, ‘woman’)” and “(‘long’, ‘time’)” suggest the simpler descriptions used in the Grimms’ fairy tales, which makes sense as they are typically seen as friendly for children to read.


SENTENCE STRUCTURE
Emotions were then analyzed with respect to sentence structure. Relationships between sentence word count and sentence-level complexity were analyzed using simple linear regression and product-moment correlation coefficient tests for the neutral, positive, and negative emotions. Figures 8, 9, and 10 are scatter plots of the relationship being studied. Note that lower sentence complexity Zipf values indicate a more advanced vocabulary level.



For the ‘neutral’ emotion, shown in Figure 8, the line of best fit was calculated as y = 0.023x + 3.33. The correlation coefficient, 0.183, is greater than its respective critical value at both the 95% and 99% confidence levels, meaning we can reject the null hypothesis that there is no correlation between word count and word complexity. Figures 9 and 10 yielded similar results, with r values of 0.185 and 0.176, respectively. Figure 11 shows scatter plots of the neutral, positive, and negative emotions in the other four text types.
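For readers who wish to reproduce this kind of analysis, the line of best fit and the correlation coefficient can be computed as in the sketch below. The function names are hypothetical, and the t statistic is a standard equivalent of comparing r with a tabulated critical value; it is not necessarily the procedure used for the figures above.

```python
import math


def fit_line_and_r(xs, ys):
    """Least-squares slope and intercept plus Pearson's r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def t_statistic(r, n):
    """t statistic for the null of no correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))
```

Here `xs` would hold sentence word counts and `ys` the corresponding mean Zipf values. The resulting t statistic is compared against the two-tailed critical value for n - 2 degrees of freedom, so the number of sentences in each sample matters as much as r itself.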
DISCUSSION
Our findings suggest a relationship between word count and word complexity. However, while the results aim to shed light on the emotions within the texts, they can also be influenced by the NLP model and how it detects specific emotions within texts. For example, the model often marks sentences containing symbols as exceptionally complex. In a TIME news article discussing COVID, it identified a relatively simple sentence (“so far, APDC members have contributed to identifying three major SARS-CoV-2 variants”) as ‘advanced,’ likely due to the sentence’s inclusion of “SARS-CoV-2 variants” [15]. The model is also not entirely reliable at determining whether a particular part of a sentence expresses a certain emotion. The sentence “then everybody laughed and jeered at her; and she was so abashed, that she wished herself a thousand feet deep in the earth” from the Grimms’ text was marked as amusement; however, while the first half of this sentence expresses amusement, the second conveys embarrassment. As a whole, though, the model seems generally reliable for larger pieces of text. The emotion timelines for the Shakespeare plays, for example, seem to follow what would be expected of their respective genres.
CONCLUSION AND FUTURE DIRECTIONS
We hope our results and app will help refine natural language processing, provide more insight into literary emotion analysis, and advance SEL-based development.
In the future, we hope to use our application to help children improve their social and emotional literacy. To do this, we aim to create a fully functioning app with an easy and accessible UI. The app will introduce a robot called EmoBot. Children will then have the opportunity to engage with EmoBot by inputting text for the bot to analyze for emotion. Goals for this app include designing an icon for the robot, a summarizing tool, a synonym and word recommendation tool, and a feature that allows children to guess the main emotion displayed.

ACKNOWLEDGEMENTS
We would like to express our immense appreciation for our mentors, Raymond Zhang and Sukhamrit Singh, for introducing us to EmoRoBERTa, helping us develop our research, and giving us the opportunity to participate in the program. We also want to thank the Stanford Compression Forum for giving us this opportunity to develop our research skills.
REFERENCES
- Collaborative for Academic, Social, and Emotional Learning (CASEL). What is the CASEL framework?, Oct 2021.
- Wallace Foundation. Wallace Foundation, 2016.
- Roger P Weissberg, Joseph A Durlak, Celene E Domitrovich, and Thomas P Gullotta. Social and emotional learning: Past, present, and future. 2015.
- Joseph A Durlak, Roger P Weissberg, Allison B Dymnicki, Rebecca D Taylor, and Kriston B Schellinger. The impact of enhancing students’ social and emotional learning: A meta-analysis of school-based universal interventions. Child development, 82(1):405–432, 2011.
- Damon E Jones, Mark Greenberg, and Max Crowley. Early social-emotional functioning and public health: The relationship between kindergarten social competence and future wellness. American journal of public health, 105(11):2283–2290, 2015.
- Carolyn Saarni. The development of emotional competence. Guilford press, 1999.
- Leilani VisikoKnox-Johnson. The positive impacts of fairy tales for children. University of Hawaii at Hilo Hohonu, 14:77–81, 2016.
- Elefteria Beazidou, Kafenia Botsoglou, and Maria Vlachou. Promoting emotional knowledge: strategies that Greek preschool teachers employ during book reading. Early Child Development and Care, 183(5):613–626, 2013.
- Marion Dowling. Young Children’s Personal, Social and Emotional Development. Sage, 2014.
- Antoine Louis. A brief history of natural language processing — part 1, Jul 2020.
- What is opinion mining why is it essential?, 2020.
- Rohan Kamath, Arpan Ghoshal, Sivaraman Eswaran, and Prasad B Honnavalli. EmoRoBERTa: An enhanced emotion detection model using RoBERTa. In IEEE International Conference on Electronics, Computing and Communication Technologies, 2022.
- Project Gutenberg. Project Gutenberg, 2021.
- TIME for Kids, 2022.
- TIME, 2022.
- Robyn Speer, Joshua Chin, Andrew Lin, Sara Jewett, and Lance Nathan. LuminosoInsight/wordfreq: v2.2, October 2018.