Virtual Reality for Emotional Response

Journal for High Schoolers


Sylvia Chin, Tiffany Liu, Aditya Rao


Virtual reality (VR) is a visionary platform that enables individuals to immerse themselves in simulated experiences. Thus far, VR simulation has been widely applied for entertainment and educational purposes. However, given the recent constraints caused by COVID-19, we hoped to explore its potential applications in health and wellness to make mental health support more accessible. Our research studied whether VR experiences can engage and influence a user’s mood more effectively than a video’s detached 2D experience. Identifying the strengths of VR’s effects on emotions can inform health programs such as proactive therapy and meditation.

Our team created two otherwise identical experiences: a 360º VR video and a 2D video. Before watching their respective experience, study participants answered a series of questions to quantify how strongly they were feeling certain emotions at the time. We used a between-subjects experimental design with a control group of 20 users who experienced the 2D video and a variable group of 20 users who experienced the 360º VR experience. Both experiences were designed to evoke calming emotions using visuals, sound effects, and music. Afterwards, the users answered the same questions again so that we could evaluate whether the experience had shifted their moods. After the study, we analyzed video recordings the participants took of themselves during the experiment to determine whether machine learning models could accurately detect their emotions.

1. Background

Virtual reality aims to transform an individual’s visual environment by controlling their full field of view. Past studies of “mood induction” have shown that static audio and visual elements in film can affect moods [1]. As VR viewers like the Google Cardboard become more prevalent and accessible, there is a broadening field for simulated mental health treatment and entertainment.

Virtual Reality Exposure Therapy (VRET) is an increasingly popular tool for therapists and doctors to use on patients suffering from physical or emotional pain. Through VRET, patients can be treated from the comfort of their homes without a practitioner physically present [2]. Our project aimed to support the development of VRET by exploring the emotional responses of people to VR technology.

2. Materials and Methods

2.1 VR and 2D Experiences

We filmed the video experiences with a Rylo 360º camera mounted to a drone, added audio through iMovie, and uploaded the videos to YouTube. The video depicted a flyover above a lagoon at sunset and aimed to evoke pleasant and calm emotions. This content was the same for both study groups, but Group A’s video was formatted as a regular 2D YouTube video and Group B’s video was a 360º YouTube video. Study subjects in Group B were provided Google Cardboards for their experience and asked to use headphones.

2.2 Survey 

Each participant was given a survey to quantify how much they were experiencing eight different emotions or moods at the time of the survey. These moods included happy, sad, peaceful, calm, tired, anxious, excited, and alert. The participants were asked to self-identify their emotional state in these eight moods on a scale of 1 to 5, with 1 being the most disagreement and 5 being the most agreement. They did this survey before and after watching the study video. There were several other questions on the survey regarding demographics and preferences to mislead the participants about the purpose of the experiment. The survey included instructions for navigating to either the video or the virtual reality headset.

2.3 Machine Learning Algorithm for Detecting Emotions

Participants were instructed to record videos of themselves, specifically their faces, throughout the entire duration of the survey. We found a pre-trained facial recognition model in an open-source Python machine learning project on GitHub and configured the code to analyze the recorded videos [3]. The facial recognition algorithm uses a deep convolutional neural network to classify emotions into seven categories: angry, disgusted, fearful, happy, sad, surprised, and neutral. We estimated the accuracy of the algorithm by comparing the emotions that it detected from the videos with the survey results.

3. Results

Figure 1:

Figure 2:

Figure 3:

Figure 4:

4. Conclusion

4.1 Survey

Figures 1 and 2 compare the before and after answers of both study groups. Figure 3 was derived from these graphs and shows the percentage change in each mood for each group. The results support the hypothesis that the VR experience shifted moods more than the 2D experience did. The direction of the mood shifts was the same in the control and variable groups for 7 out of 8 moods. The largest difference between the emotional shifts of the two groups was in calmness, underscoring the potential for VR in meditation. In both groups, the decrease in anxiety was the largest percentage change among all moods, which suggests that the experience affected anxiety the most. In a free response question asking respondents to describe their thoughts and feelings, 12 out of 20 subjects in the 2D video group used the word “video” in their response and 3 used the word “experience,” whereas these counts were flipped for the VR group. Ultimately, these findings support the notion that users found VR to be more real and immersive than 2D video.
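The paper does not state exactly how the mood percentage changes in Figure 3 were computed. As a minimal sketch, assuming each change is the percent change in a group's mean 1-to-5 rating between the pre- and post-surveys, the calculation could look like the following. The function name and the ratings are hypothetical and are not the study's data.

```python
from statistics import mean


def mood_percent_change(before, after):
    """Percent change in the mean 1-5 rating from the pre- to the post-survey."""
    baseline = mean(before)
    return (mean(after) - baseline) / baseline * 100.0


# Hypothetical "calm" ratings from five participants (not the study's data).
calm_before = [2, 3, 3, 2, 4]
calm_after = [4, 4, 3, 3, 5]

print(f"calm: {mood_percent_change(calm_before, calm_after):+.1f}%")
```

A positive value means the group reported more of that mood after the experience; a negative value (for example, for anxiety) means less.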

4.2 Machine Learning Algorithm for Detecting Emotions

The facial recognition algorithm correctly detected the participants’ emotions 52% of the time. It was 17% more accurate post-experience than pre-experience. This suggests that the video experiences were effective in changing moods, since the facial recognition had a greater success rate after the participants watched the videos. The algorithm could not make predictions for the portions of the recorded videos in which participants were wearing the Google Cardboards, which likely contributed to some of the inaccuracies. Camera quality, lighting, and the unique neutral expressions of individuals could also be sources of error. Nevertheless, given a limited sample size and a relatively mild experience, a 52% success rate is an encouraging indicator that facial recognition has potential for determining moods.

5. Future Directions

After studying the short-term shifts in emotions discovered through this project, we hope to explore VR’s potential for prolonged mood stability and positive improvement. If granted more time and testing equipment, we could create more complex and customized scenes for users, which may lead to more generalizable results. There are several factors such as cultural background, age, and time of day that could be more thoroughly studied. Based on these factors, we could tailor the VR experience to certain individuals in order to further engage and enhance their experiences. Additionally, our experimentation with machine learning facial interpretation yielded good results, and its promise may broaden the possibilities to capture precise data and analyze the effects of VR in real time while a VR headset is worn. It would also be interesting to examine VR’s potential for treating a specific type of physical or mental illness. 

6. Acknowledgements

We would like to acknowledge founding director Professor Tsachy Weissman of the Stanford Electrical Engineering department for a profound look into research, the humanities, and endless opportunity. Special thanks to Stanford undergraduate senior Rachel Carey for mentoring us throughout the entire process. Another thank you to Devon Baur for her valuable insight on analysis and experimental design. We would also like to thank Cindy Nguyen and Suzanne Marie Sims for ordering and delivering the materials that made our experiment possible. Finally, a big thank you to our fellow SHTEM students, family members, and friends who provided data for our research.

7. References

[1] Fernández-Aguilar, Luz, et al. “How Effective Are Films in Inducing Positive and Negative Emotional States? A Meta-Analysis.” PLOS ONE, Public Library of Science.

[2] Charles, Nduka. “Emotion Sensing In VR: How VR Is Being Used To Benefit Mental Health.” VRFocus, 29 Sept. 2017.

[3] Dwivedi, Priya. “Priya-Dwivedi/face_and_emotion_detection.” GitHub.

[4] Baños, Rosa María, et al. “Changing Induced Moods Via Virtual Reality.” SpringerLink, Springer, Berlin, Heidelberg, 18 May 2006.

[5] Droit-Volet, Sylvie, et al. “Emotion and Time Perception: Effects of Film-Induced Mood.” Frontiers in Integrative Neuroscience, Frontiers Research Foundation, 9 Aug. 2011.

[6] Senson, Alex. “Virtual Reality Therapy: Treating The Global Mental Health Crisis.” TechCrunch, TechCrunch, 6 Jan. 2016.

Understanding COVID-19 Through Sentiment Analysis on Twitter and Economic Data

Journal for High Schoolers


Cecilia Quan*, Noemi Chulo*, and Parth Amin (* indicates equal contribution)


We trained a sentiment analysis bot using machine learning and Twitter data to classify tweets as expressing positive, negative, or neutral sentiments toward COVID-19 safety measures such as mask wearing and social distancing. We then compared data obtained from this bot to both economic data and COVID-19 case count data to better understand the interplay between social mindsets, consumer spending, and disease spread.



The COVID-19 pandemic has caused mass social unrest across the United States. The safety procedures (e.g., mask wearing and social distancing) that have been strongly recommended by the Centers for Disease Control and Prevention (CDC) to constrain the spread of the virus have proven to be controversial. By analyzing publicly available Twitter data via the Twitter API, we sought to better understand 1) the usage of, and attitudes toward, these procedures over time in the US, and 2) their effect on case count growth. Using tweets containing the keywords “mask”, “masks”, or “social distancing”, we trained a machine-learning-based sentiment analysis bot to determine whether a tweet expresses positive, negative, or neutral sentiments regarding COVID-19 related safety measures. Specifically, we used a combination of the Naive Bayes and Logistic Regression algorithms to train the bot. We then used our bot to automatically classify thousands of tweets into each of these categories and observe how public sentiments have changed over time since the beginning of the pandemic. Finally, we analyzed government economic data related to consumer spending in brick-and-mortar locations (such as restaurants and retail stores) to see whether this data had any correlation with the sentiment data and case counts from the pandemic. The case count and death count datasets were obtained from the World Health Organization (WHO).



Sentiment analysis is a branch of computer science that attempts to identify sentiment and emotion from natural language input. There are a multitude of ways to accomplish this, but most methods fall into two main branches: machine-learning-based sentiment analysis, and dictionary approaches to sentiment analysis. 

Machine-learning-based sentiment analysis refers to a sub-field of artificial intelligence that aims to understand human emotion in natural language expression in an automated fashion. By training a bot with a classification algorithm and training data consisting of text strings and their corresponding labels, the bot can learn to classify text on its own with some level of accuracy. Although using sentiment analysis on tweets comes with many limitations, we aim to show that it gives us a better perspective on both public opinion about COVID safety procedures and how these opinions may influence the actions of others.

Dictionary approaches are much simpler. They essentially work by maintaining a large database of words with certain associations (similar to any dictionary for natural language, but encoded for computer purposes), and then classifying a string of text by referring to the dictionary’s classification of the words within it. Compared to machine-learning-based sentiment analysis, this system does not learn as it reads more data, cannot be trained, and is unable to understand any word that is not within its dictionary.
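To make the dictionary approach concrete, here is a minimal sketch of a lexicon-based classifier. The word list, the scores, and the function name are all hypothetical (real sentiment dictionaries contain thousands of scored words), and this is not the method used in this project.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "good": 1, "safe": 1, "protect": 1,
    "useless": -1, "hoax": -1, "hate": -1,
}


def dictionary_sentiment(text):
    """Sum the scores of known words; words missing from the lexicon count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "POS"
    if score < 0:
        return "NEG"
    return "DIS"


print(dictionary_sentiment("Masks protect people and keep us safe"))
print(dictionary_sentiment("Masks are pointless"))
```

The second call illustrates the limitation described above: "pointless" is not in the lexicon, so the sketch cannot register the negative opinion and falls back to the neutral label.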

Data and Methods


This section describes the data collection and labeling process, as well as the details of our machine learning system. 

  1. Data Collection and Categorization from Twitter   


To create training data for our bot, we had to obtain a large number of tweets containing our desired keywords (“mask”, “masks”, “social distancing”). To do this, we used the free tier of Twitter Developers, known as Sandbox. This allowed us to fetch up to 5,000 historical tweets (tweets more than 7 days old) per month, but unfortunately historical tweets are truncated (each tweet is cut off after 140 characters). Instead of collecting historical tweets for our training data, we used the “Stream” function, which allowed us to obtain real-time tweets with no monthly limit. These tweets were not truncated and came in throughout July, when we were doing most of our classification. We collected 1297 of them and hand-labeled each as neutral (DIS), positive (POS), or negative (NEG). An additional 87 were collected but could not be used because they were in the wrong language or did not contain a keyword. We hypothesize that some of the returned tweets were retweets of tweets containing a keyword but did not contain the keywords themselves. This is a potential issue for our final results.
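The keyword check used to discard unusable tweets can be sketched as follows. This is an illustration only: the pattern and the function name are assumptions, the project's actual filtering code is not shown in the paper, and the language check is not covered here.

```python
import re

# Matches "mask", "masks", or "social distancing" as whole words, ignoring case.
KEYWORD_PATTERN = re.compile(r"\b(masks?|social distancing)\b", re.IGNORECASE)


def has_keyword(tweet_text):
    """Return True if the tweet text itself contains one of the study keywords."""
    return KEYWORD_PATTERN.search(tweet_text) is not None


tweets = [
    "Wear your MASK!",
    "Practicing social distancing today",
    "RT this please",
]
usable = [t for t in tweets if has_keyword(t)]
print(usable)
```

A retweet whose own text lacks the keywords, as hypothesized above, would be dropped by a check like this.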

Although sorting tweets into negative, positive, and neutral categories seems like a simple thing to do, it is extremely subjective and difficult to judge. Many of the tweets analyzed were entirely incoherent and self-contradictory. At the beginning, we believed that if a tweet was in favor of social distancing, it would also be in favor of masks, and vice versa. This was not always the case, so deciding where a tweet would go was often subjective. Tweets also tended to follow political orientation, but not as frequently as we had expected. You can find a full list of the tweets we used for training and testing, along with their classifications, below the Appendix. If you notice places where you think we misjudged, we’d love to discuss this with you so we can further increase the objectivity of our sentiment analysis bot.

NEG (Negative, as in against masks or social distancing)

  1. General negative language/phrases/feelings surrounding protective masks and/or social distancing to prevent the spread of COVID 19
  2. Indication that masks/social distancing are part of a conspiracy theory/government control
  3. Blatantly state they do not wear a mask (for any reason)
  4. Indication that a group of people does not have to wear a mask (even if the person themself is not within that group)*
  5. Proposing that masks lower oxygen intake 
  6. Indication that masks “don’t work”
  7. Proposing rebellions against mask wearing, or “cheating” the system
  8. Indicating “it makes no sense to do X because Y disease was much worse.”
  9. Wanting to open schools AND does not specifically suggest protective measures
  10. Apathy regarding masks or social distancing
  11. Against general mask wearing/social distancing mandates*
  12. Arguing against/ making a negative statement towards something or someone that promotes social distancing or masks
  13. Stating that coronavirus does not exist/ is exaggerated to justify not following safety guidelines
  14. Proposing that the best way to beat coronavirus is to “build up immunity” by defying public health guidelines. 
  15. Makes fun of/ is clearly against people or things that support mask wearing/ social distancing

POS (Positive, as in in favor of masks or social distancing)

  1. General positive language/phrases/feelings surrounding protective masks and/or social distancing to prevent the spread of COVID 19
  2. Implying that masks/social distancing work
  3. Mentioning that they wear a mask/social distance
  4. Arguing against/ making a negative statement towards something or someone that does not promote social distancing or masks
  5. Advertising masks (handmade or otherwise)
  6. Stating that coronavirus is not something trivial and/or should be feared. 
  7. Arguing that reasonable mandates are justified
  8. Encouraging others to follow guidelines

DIS (Discounted/Neutral, as in not clearly expressing a meaningful opinion)

  1. Neutral language/phrases/feelings surrounding protective masks for COVID 19
  2. Contradicting statements to the point the writer’s opinion is somewhat indiscernible, and the tweet does not blatantly conform to a specific rule in POS or NEG that would overwrite that issue. *
  3. People saying they want to understand another side (unclear where their actual opinions are)
  4. Calling people hypocrites for not wearing a mask*
  5. Not referring to masks in the context of COVID 19*
  6. Claiming that they are against a law/mandate that is ridiculous or unimaginable without further indication of their position.*

The full list of clarifications for rules can be found in the Appendix at the end of the paper. Rules that require clarifications are denoted by *.

Prior to the data labeling process, we assumed the issue of categorization would be more black and white than it really is. Although our philosophy towards categorization has evolved over the course of our research, at the moment we base categorization on how the user’s words will affect others who read their tweet, rather than only on the writer’s own opinion. This makes sense for two reasons:

First is the volume of data that can be collected on this premise, and the accuracy of the analysis. Most of our tweets would have to be thrown out as neutral under guidelines that only concern the users’ own assumed opinions, and in the end our data probably wouldn’t be a good reflection of the actual negative vs. positive bias.

Second is the fact that basing classification on how the tweet affects the mindsets of others who see it addresses our thesis better. It shows better how social media opinions on coronavirus affect real life circumstances, not the portrayal of real life circumstances in social media. The difference is that a user’s opinions are a symptom of a situation, whereas the effect of their opinions on others may collectively cause a situation. We are looking for how social media may predict circumstances rather than how circumstances predict the state of social media. 

An example of this is the following commercial tweet: 

“Pssst! I got a secret. Get at ADDITIONAL 20% OFF face masks that are already on sale!!! That’s around $6 a mask. Only if you buy 4 or more! Sale won’t last long. BUY NOW!!!

RT plz @XMenFilms @xmentas @WolverSteve @DailyXMenFacts #XMen #facemasksforsale”

Although it would make sense to assume that someone selling masks is pro-mask, we have no evidence of this whatsoever. If we were to adopt a philosophy of sorting tweets based on an individual’s opinion, we would run the risk of being forced to classify a majority of our tweets into neutral categories and therefore have a skewed dataset. Instead, we consider this as a positive tweet by following our more holistic philosophy of connecting this tweet to its likely effect on those who read it. The specific tweet above promotes a societal acceptance and usage of masks, and people who read it will be affected by this philosophy. 

  2. Collecting Economic Data


In order to analyze consumer spending patterns during the pandemic, we sought out data made available by the Bureau of Economic Analysis under the United States Department of Commerce. On their website are monthly reports of Personal Income and Outlays, which illustrate consumer earning, spending, and saving. Personal outlay is the sum of Personal Consumption Expenditure, personal interest payments, and personal current transfer payments. 

Personal outlays can also be calculated as Personal Income minus Personal Savings and Personal Current Taxes. This gives an overall measure of how much consumers spent within a month. Using Table 1 of the Personal Income and Outlays report of June 2020 [1], we graphed Personal Outlays in billions of United States dollars against months in Google Spreadsheets (Figure 1). Since this is a monthly report with overall changes, there is only one data point for each month. Additionally, the dollar amounts are seasonally adjusted at annual rates, which helps remove seasonal patterns that may affect the data. All the dollar amounts in the following figures, as well as the index numbers, are seasonally adjusted at annual rates. We chose to include 4 months of data because the coronavirus pandemic started to impact the United States in mid-March, so the March data reflects both the pre-pandemic and the pandemic periods.
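The two definitions of personal outlays given above can be written as a small consistency check. This is a sketch with hypothetical round numbers in billions of dollars, not BEA figures, and the function names are illustrative.

```python
def outlays_from_components(pce, interest_payments, transfer_payments):
    """Personal outlays as the sum of PCE, interest payments, and current transfer payments."""
    return pce + interest_payments + transfer_payments


def outlays_from_income(personal_income, current_taxes, personal_saving):
    """Personal outlays as personal income minus current taxes and personal saving."""
    return personal_income - current_taxes - personal_saving


# Hypothetical values in billions of dollars (not taken from the BEA report).
by_components = outlays_from_components(pce=100.0, interest_payments=3.0, transfer_payments=2.0)
by_income = outlays_from_income(personal_income=130.0, current_taxes=15.0, personal_saving=10.0)

print(by_components, by_income)
```

When the inputs come from the same report, the two routes should agree, which makes this a quick way to catch transcription errors when copying values into a spreadsheet.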

Figure 1: Personal Outlays (in billions of US dollars)

Within Personal Outlays, the subtopic of Personal Consumption Expenditure (PCE), also known as Consumer Spending, is a more specific measure of spending on consumer goods and services. PCE is the value of the goods and services purchased by, or on behalf of, “persons” who reside in the United States. Using the same June 2020 report from the BEA [1], we gathered the PCE in billions of dollars over the months February 2020 to June 2020. For the total amount, we used Table 1: Personal Income and Its Disposition (Months), and for the changes between months, we used Table 3: Personal Income and Its Disposition, Change from Preceding Period (Months). PCE is divided into two sections, goods and services. Within goods, there are two further subtopics: durable and non-durable goods. The first graph is the total amount (Figure 2).

Figure 2: Personal Consumption Expenditure (in billions of US dollars)

For more detail on PCE, we looked at a variety of products and collected their PCE in billions of USD. We found the data in the Excel spreadsheets linked under the Underlying Details section of Interactive Data on the Consumer Spending [3] page of the Bureau of Economic Analysis website, listed as SECTION 02: PERSONAL CONSUMPTION EXPENDITURE [2]. In the Excel spreadsheet, we accessed Table 2.4.4U. Price Indexes for Personal Consumption Expenditures by Type of Product, which has the spreadsheet code U20404. From the spreadsheet, we chose a range of goods and services that showed a variety of changes over the four months.

  1. Computer software and accessories
  2. Food and beverages purchased for off-premises consumption
  3. Food and nonalcoholic beverages purchased for off-premises consumption (4)
  4. Food purchased for off-premises consumption
  5. Personal care products
  6. Electricity and gas
  7. Public transportation
  8. Air transportation
  9. Live entertainment, excluding sports
  10. Food services and accommodations
  11. Food services
  12. Personal care and clothing services
  13. Personal care services
  14. Hairdressing salons and personal grooming establishments

Using the data in the spreadsheet, we collected the PCE of each good or service in billions of USD and graphed it using Google Spreadsheets (Figure 3).

Figure 3: Detailed Personal Consumption Expenditure (in billions of US dollars)

To go even more in depth on retail sales, we gathered data from the US Census Bureau [4] in their Advance Monthly Trade Report released in July. Using their customizable time series, we found the sales in millions of US dollars for the following: 

  1. Retail Trade and Food Services: U.S. Total
  2. Retail Trade: U.S. Total
  3. Grocery Stores: U.S. Total
  4. Health and Personal Care Stores: U.S. Total
  5. Clothing and Clothing Access. Stores: U.S. Total
  6. General Merchandise Stores: U.S. Total
  7. Department Stores: U.S. Total
  8. Nonstore Retailers: U.S. Total
  9. Food Services and Drinking Places: U.S. Total

Figure 4: Sales of Food and Retail Services (in millions of US dollars)

Another determinant associated with consumer spending is the PCE Price Index, which is a measure of the prices that people living in the United States, or those buying on their behalf, pay for goods and services, and reflects changes in consumer behavior. Utilizing the same June 2020 Personal Income and Outlays BEA report [1], we gathered the data from Table 9: Price Indexes for Personal Consumption Expenditures: Level and Percent Change from Preceding Period (Months). Using the percent change in index, we calculated the change based on the index in February, and graphed it across four months. 

Figure 5: Change in Personal Consumption Expenditure Price Index

As an additional comparison, we gathered the Consumer Price Index (CPI) and compared it with the PCE Price Index. The difference between the two indexes is that the CPI gathers data from consumers while the PCE Price Index is based on information from businesses. Moreover, the CPI tracks expenditures only from urban consumers, while the PCE Price Index tracks spending from all households that purchase goods and services. See this resource by the BLS for more details on the differences [5].

For the CPI, we collected data from the Bureau of Labor Statistics, which is a bureau under the Department of Labor. On the CPI Databases page, we accessed Tables for the series All Urban Consumers, which led us to the page Archived Consumer Price Index Supplemental Files, where we accessed News Release Table 3 [6]: Consumer Price Index for All Urban Consumers (CPI-U): U.S. city average, special aggregate indexes, June 2020. We chose exactly the same expenditures as we had for the PCE Price Index (Services, Durables, and Non-Durables) and collected the seasonally adjusted percent change for the months March 2020 to June 2020. Durable goods are not for immediate consumption and thus are purchased infrequently, while non-durables are purchased on a frequent basis. Since each of the three percentages describes the change between two consecutive months, we assigned each percentage to the later month. Using the percent change in index, we calculated the change relative to the index in February and graphed it across three months.
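The paper does not spell out how the month-over-month percent changes were turned into a change relative to February. One plausible reading, shown here purely as an assumption, is to compound the monthly changes starting from a February baseline of 0%. The function name and the input values are hypothetical, not BLS or BEA data.

```python
def cumulative_change_from_baseline(monthly_pct_changes):
    """Compound month-over-month percent changes into percent change vs. the baseline month."""
    level = 1.0  # baseline index level (February), normalized to 1
    cumulative = []
    for pct in monthly_pct_changes:
        level *= 1.0 + pct / 100.0
        cumulative.append((level - 1.0) * 100.0)
    return cumulative


# Hypothetical month-over-month changes for three consecutive months after the baseline.
changes = cumulative_change_from_baseline([-0.5, -1.0, 0.5])
print([round(c, 3) for c in changes])
```

Under this reading, the y = 0 line of the resulting graph corresponds to the baseline month. For small monthly changes, simply summing the percentages gives nearly the same curve.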

Figure 6: Change in Consumer Price Index

  3. Machine Learning Methods to Classify Twitter Data


Once we had all our data prepared, we turned to Wolfram Alpha to start creating our sorting bot. We decided to include two separate machine learning algorithms in our bot to increase accuracy. Our first bot separated neutral tweets from binary (negative or positive) tweets, while our second sorted the binary tweets into negative or positive categories. The first bot was trained and tested on all the data we sorted, while the second was only trained and tested on the non-neutral training and test data. This didn’t make a significant impact on the amount of data the second bot was trained on, since neutral data only represented 21.14% of the sorted data.
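The two-stage arrangement described above can be sketched as a simple cascade. The two stand-in predicates below are hypothetical placeholders for the trained Wolfram classifiers, which are not reproduced here; only the control flow is meant to match the description.

```python
def classify_tweet(text, is_neutral, is_positive):
    """Stage 1 filters out neutral tweets; stage 2 labels the remainder POS or NEG."""
    if is_neutral(text):
        return "DIS"
    return "POS" if is_positive(text) else "NEG"


# Hypothetical stand-ins for the two trained classifiers (for illustration only).
def toy_is_neutral(text):
    return "mask" not in text.lower()


def toy_is_positive(text):
    return "wear" in text.lower()


for tweet in ["Nice weather today", "Please wear a mask", "Masks are pointless"]:
    print(tweet, "->", classify_tweet(tweet, toy_is_neutral, toy_is_positive))
```

One consequence of this design is that a non-neutral tweet must be handled correctly by both stages to receive the right label.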

Wolfram has many classifier algorithms available to take advantage of, but since most are designed to be trained on numerical data as opposed to language data, they can be flawed when used for natural language processing (NLP). Here are most of the options available in Alpha.

Percentage accuracy was one of our top priorities, but another important consideration was bias. We wanted to make sure that when our bot made a mistake, it wasn’t significantly more likely to make one sort of mistake than the other. This ended up ruling out some methods, because 100% of their errors were predictions of “positive” when the true label was “negative”. Note that we had anywhere from 302 to 374 pieces of test data depending on the type of test being run, so this was unlikely to be due to pure chance. We assume that these methods were not created for processing strings, and that when given strings they simply guessed the most probable label from the training data in every instance. This is an important reason to run different kinds of tests and analyses on bots besides accuracy alone: although these methods had high accuracy on our particular data, they were very unreliable.
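The bias check described above amounts to asking what share of a classifier's errors fall in each direction. A minimal sketch, with hypothetical labels and predictions rather than the project's test data:

```python
from collections import Counter


def error_directions(true_labels, predicted_labels):
    """Return the share of all errors for each (true label, predicted label) pair."""
    errors = Counter(
        (truth, pred)
        for truth, pred in zip(true_labels, predicted_labels)
        if truth != pred
    )
    total = sum(errors.values())
    return {pair: count / total for pair, count in errors.items()} if total else {}


# Hypothetical classifier that always guesses the majority label "POS".
truth = ["NEG", "NEG", "POS", "POS", "NEG"]
always_pos = ["POS"] * len(truth)
print(error_directions(truth, always_pos))
```

A classifier that always predicts the majority label puts every one of its errors in a single direction, which is the pattern that led to some methods being ruled out.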

Another important consideration was how neutral tweets tended to be sorted when they were mis-sorted. In this scenario, it’s much harder to assume an “ideal” rate. Neutral tweets didn’t only include tweets entirely unrelated to COVID-19, so the ideal sorting ratio for them is much more difficult to hypothesize. Given time constraints, in our experiment we assumed that neutral tweets should ideally be sorted equally into “positive” and “negative” categories if they weren’t sorted as neutral. Although this metric is much more difficult to control for, we took this final predicted ratio into consideration after processing our data to better predict a confidence ratio for each datapoint.

We ended up using Naive Bayes for the neutral vs. binary sorter, and a combination of Naive Bayes and Logistic Regression for the positive vs. negative sorter. The combination was computed by taking each algorithm’s probability for each outcome, multiplying it by that algorithm’s weight (0.65 for Logistic Regression, 0.35 for Naive Bayes), adding the weighted probabilities across algorithms, and choosing the outcome with the largest sum.
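The weighted combination for the positive vs. negative sorter can be sketched as follows. The weights come from the text; the function name, dictionary layout, and example probabilities are hypothetical, and the project itself was implemented in Wolfram rather than Python.

```python
# Weights from the text: 0.65 for Logistic Regression, 0.35 for Naive Bayes.
WEIGHTS = {"logistic_regression": 0.65, "naive_bayes": 0.35}


def combine(probabilities):
    """Weight each model's class probabilities, sum across models, pick the largest."""
    labels = next(iter(probabilities.values())).keys()
    scores = {
        label: sum(WEIGHTS[model] * probabilities[model][label] for model in WEIGHTS)
        for label in labels
    }
    return max(scores, key=scores.get), scores


# Hypothetical per-model probabilities for one tweet.
example = {
    "logistic_regression": {"POS": 0.40, "NEG": 0.60},
    "naive_bayes": {"POS": 0.90, "NEG": 0.10},
}
label, scores = combine(example)
print(label, scores)
```

In this example the two models disagree, and the confident Naive Bayes prediction outweighs the mildly negative Logistic Regression prediction despite its smaller weight.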

On their own, the estimated accuracies of the neutral sorter and the binary sorter are P1 = 79.841% and P2 = 72.093%, respectively. Total accuracy is more difficult to calculate, because our main interest isn’t sorting every tweet into the correct group; it’s obtaining the correct proportion of tweets in each group. We don’t currently have an estimate of this total accuracy, but for our test group we do know the assumed vs. true proportions and the proportion of correctly sorted tweets. We can assume that the latter is roughly equivalent to P1 * P2 = 0.5755977. In the future, we hope to improve this overall accuracy by refining the algorithms we use and implementing synonym-based data augmentation strategies.
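The product of the two accuracies can be reproduced directly. Note the assumption behind it: multiplying P1 and P2 treats the two sorters' errors as independent and describes a tweet that must pass through both stages. The function name is illustrative.

```python
def combined_accuracy(p1, p2):
    """Chance that both stages are correct, assuming their errors are independent."""
    return p1 * p2


p1 = 0.79841  # neutral vs. binary sorter
p2 = 0.72093  # positive vs. negative sorter
combined = combined_accuracy(p1, p2)
print(f"{combined:.7f}")
```

A tweet that is truly neutral only needs the first stage to be correct, so this product is best read as a rough lower-end estimate for non-neutral tweets.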

One more issue we’d like to mention is that although using Wolfram allowed us to complete this project in a timely manner with a wide variety of options available to us, Wolfram is closed-source software, and therefore information about how its algorithms work is somewhat obscure. When using Naive Bayes in Wolfram, we didn’t know whether the function would automatically reformat or clean data for us, or, if it did, what issues it might have encountered when analyzing unknown strings like hashtags. Although we may or may not continue using Wolfram for future extensions of this project, keeping these issues in mind might help guide logistical and practical decisions in the future.

In getting our final data we were somewhat limited by time constraints and available computing power. Since only one of us had access to Wolfram Alpha, and because we did our computations locally, we were only able to process 100 pieces of data per month, save for July which also used our pre-labeled data. 

Figure 7: Proportion of Tweets by Sentiment (keywords: mask, masks, social distancing)

We graphed the data above by month to better match the economic data, and because we did not have enough data to show a day-by-day graph. For each month, we took the data from every day in the month, classified it, and then computed the proportion of that month’s data that fell into each classification. Unfortunately, historical tweets are truncated, so our algorithm guessed tweet sentiment solely based on the first 140 characters of each tweet for every month before July. In the future, we hope to find a work-around for this issue.
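The monthly aggregation described above can be sketched in a few lines. The (month, label) pairs below are hypothetical classifier outputs, not the project's data, and the function name is illustrative.

```python
from collections import Counter, defaultdict


def monthly_proportions(classified):
    """Map each month to the share of its tweets carrying each sentiment label."""
    counts = defaultdict(Counter)
    for month, label in classified:
        counts[month][label] += 1
    return {
        month: {label: n / sum(c.values()) for label, n in c.items()}
        for month, c in counts.items()
    }


# Hypothetical (month, predicted label) pairs.
sample = [
    ("2020-03", "POS"), ("2020-03", "NEG"), ("2020-03", "POS"), ("2020-03", "DIS"),
    ("2020-04", "NEG"),
]
proportions = monthly_proportions(sample)
print(proportions)
```

With only about 100 tweets per month, each proportion moves by a full percentage point per tweet, which is part of why the resulting graph should be read cautiously.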

A final issue to mention is that our training data was not equally distributed. About 55.3241% of our human-labeled data was positive, 23.071% negative, and 21.2963% neutral. This means that our machine learning algorithms may hover around these ratios regardless of true values, and means that our data may be more conservative in percent changes than it should be. We decided to keep these ratios because changing them to be equal would significantly decrease the amount of labeled data we could use to train the bot, but we hope that as we work on this in the future, we will have enough data such that we could have equal ratios without having to sacrifice much of our labeled data. 
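To give a rough sense of the trade-off, the sketch below estimates how much labeled data would survive if the classes were equalized by downsampling every class to the size of the smallest one. Downsampling is an assumption here; the text does not say which balancing method was considered.

```python
# Class shares of the human-labeled data, in percent, as reported above.
shares = {"positive": 55.3241, "negative": 23.071, "neutral": 21.2963}

# Balancing by downsampling keeps, for every class, only as many
# examples as the smallest class has.
smallest_share = min(shares.values())
kept = len(shares) * smallest_share

# The reported shares sum to slightly under 100, so normalize by their sum.
retained_fraction = kept / sum(shares.values())

print(f"Labeled data retained after balancing: {retained_fraction:.1%}")
```

Under that assumption, roughly a third of the labeled tweets would be discarded, almost all of them positive.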

Correlation Analysis Between Differing Data Forms


Figure 8: WHO Data on Confirmed Cases & Deaths in US 


Figures 1-6: As with all the economic data, we use publicly available government data from monthly reports, so we have only one data point for each month and can only estimate what happened between months. For example, Figure 6 shows that the lowest dip occurred in April, but it does not show when in April. 

Figures 5 & 6: For the index data, we calculated the change of the index from the “original” index given in February (pre-pandemic), which means the y=0 line represents the February index. 

Figure 6: This graph, using data from the Bureau of Labor Statistics, only has three months, which is inconsistent with the rest of the economic data graphs. 

Figure 7: There are a number of reservations to be had with this graph and it should not be considered as fact. We hope to collect a higher volume of data, alongside more accurate evaluations for our future pursuits, so the data we have at the moment could be considered a ‘teaser’ for what is to come. Our machine learning software only has about a 75% accuracy rate and only 1796 pieces of data were used to create the graph due to data processing issues, alongside limits in usage for historical Twitter data. Not to mention, the algorithms we used do have considerable bias which could affect our data. Also, data is exhibited month-by-month because of a lack of data that would make day-by-day analysis jerky and confusing.

Figure 8: For part 1 of Figure 8, since WHO data for the US comes from US government reports, there is some discrepancy in the numbers and the curve due to collection errors, lack of testing, underreporting, misdiagnosis, and other issues. The death counts, unlike death counts for seasonal diseases like the common flu, are not representative of real time. It often takes days to weeks to test the deceased, get the results, and send the reports to the National Center for Health Statistics. This means that the data is often based on past weeks and is not entirely current. In addition, even if deaths were reported on time, the counts would better represent the situation of the nation two weeks earlier than on that day, because it takes a while after contracting the virus for any patient to be at risk of death. The following article goes more in depth about the issue:

Initial Observations


Fig 1: Throughout the data collection, positive tweets always seemed to be the majority, indicating that the majority of the US population is in support of mask usage and social distancing. This is supported by surveys of the general US population, which might indicate that representation of opinions on social media about this particular issue might be somewhat representative of the general population. 

Interestingly, the data seems to go through significant changes between April and May, and again between June and July. Between April and May, a rising number of Twitter users seem to have positive opinions about mask wearing and social distancing, and there are few neutral comments, which could indicate that more people are talking about COVID-19, that more people are stating explicit opinions about masks and social distancing, or a combination of both during that interval. Between June and July, neutral rates stay somewhat constant, but there’s a sharp increase in negative tweets and a decrease in positive ones. From this, we might hypothesize that, given how long safety mandates have been in place, people are gradually getting more upset about the mandates and are changing their opinions of them. 

Figure 2-3 & 4-5: These four figures represent the overall Consumer Spending, with Personal Outlays as the overall curve, and then CPE, detailed CPE, and sales of food and services. They all follow a similar dip in USD spent in April, which is succeeded by a slow growth back up. However, at the time of June, the numbers do not reach back to pre-COVID levels, especially considering the rate at which our consumer spending was growing before that. In the possible ways economic impact can play out due to a pandemic like this, our consumer spending may slowly yet surely return to normal growth rate, since there is no indication in our graphs of a quick growth spurt to catch up to what could have been our consumer spending levels. A possible reason that Consumer Spending is recovering while COVID-19 is spreading is the prevalence of e-commerce, especially when it comes to retail. Even though unemployment is still an issue during these times, the stimulus check combined with online shopping and delivery may have helped spur spending in the economy. Another possible option is that many counties and states have started reopening during May and June, allowing more in-person spending. 

Figure 6 & 7: Unlike the other graphs, the price indexes see the dip in May instead of April. Since the price indexes are a measure of inflation, this could be showing a delayed effect of the money spent on the prices of goods and services. 

Fig 8: Both case counts and death counts rise significantly between April and May, and have somewhat bell-curve-like shapes during the interval. Case counts spike between July and August, while death counts begin to slowly increase day by day in the same interval, lagging behind case counts as expected. 

Between Figures

April-May: Between April and May most graphs seem to change dramatically. Figure 1 sees an increase in positive tweets and a decrease in neutral tweets. Figures 2-7 see a dip in most forms of in-person sales, like restaurants or air travel. Figure 8 sees a local maximum in case counts and deaths. Because Figures 1-7 are monthly, it is more difficult to make general assumptions about associations between the data types. For example, we cannot say whether Twitter sentiments imply case counts or vice versa; we can simply say that they may be correlated. Regardless of implications, this does give us a hint of how social changes might affect case counts, or how the reverse might be true. 

June-August: Although little government economic data about this timeframe has been posted yet, we can make some assumptions and inferences about the correlations between Twitter sentiment data and case count data. Case counts rise dramatically during this time and reach a peak that is both local and global. Meanwhile, Twitter data indicates that positive sentiment for precautionary measures is dramatically falling (between late June and late July), whilst negative sentiment rises to its peak, about 380.67% higher than the highest previous point, which takes place in April. After gathering further Twitter data, and showing data on a day-by-day basis instead of month by month, we might find that the dramatic maximum in case counts in mid-July can be explained by a rapidly diminishing concern for COVID safety precautions. For now, we can only speculate that this may be the case, but it would be very interesting and telling if it were true. 

Possible Meaning


We’re intrigued by the idea that economic data and/or Twitter sentiment data may be able to be used to predict upcoming case rates. Although the idea that economic data can be used to predict spread of disease is not new, the idea that sentiment from social media can be used to predict (and possibly help prevent) diseases like this from spreading in the future is a new and revolutionary idea that gives reason to both hold social media companies more accountable, and to take these trends much more seriously than we have in the past. Although we’re far from making those conclusions, simply showing the possibility is important to us, and hopefully with more data we will be able to better understand these connections.

Conclusions and Future Directions


One way to strengthen our understanding of the multifaceted impact of the coronavirus is to combine qualitative and quantitative data. In this case, public opinion on social distancing measures is difficult to visualize; however, by using sentiment analysis on Twitter, we can better understand public sentiment. Since the interaction between the pandemic and society is so complex, we further explored the connection between more quantitative impacts such as consumer spending and COVID-19 data. For some of the economic data, we found similar dips between the two, especially in April. We also found that positive public sentiment decreased while consumer spending continued to rise. This type of analysis is applicable to real world problems, and, through deeper understanding, can lead to policy change and better preparation for future pandemics. 

Mentions for the Future

Economic Pathway:

A future direction would be to analyze the change in growth, which is the derivative of the economic data curves, and possibly correlate that with the derivative of the sentiment analysis data. Although consumer spending may be rising in dollar amounts, there could be indication that the rise is slowing, and such rate-of-change information could offer a deeper layer of analysis. 
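With one data point per month, the derivative reduces to a difference between consecutive months. The following is a minimal sketch of that idea; the function name and the spending figures are hypothetical and are not the study’s data.

```python
def month_over_month_change(values):
    """Discrete derivative: difference between consecutive monthly values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical monthly spending figures (arbitrary units), not study data.
spending = [100.0, 88.0, 94.0, 98.0, 100.0]

growth = month_over_month_change(spending)      # first difference
acceleration = month_over_month_change(growth)  # change in growth

print(growth)
print(acceleration)
```

In this toy series spending rises every month after the dip, yet the shrinking values in `growth` show the recovery slowing, which is the kind of signal the paragraph above describes.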

Twitter Pathway:

We plan to increase the accuracy of our sentiment analysis bots by increasing data intake, which may include synonym-based data augmentation and using computer-generated lists with human corrections to speed up sorting. We also plan to collect more data so that it can be shown on a day-by-day basis, and to label the data next to a timeline of political events that might have led to drastic changes in sentiment. We also plan to change the formatting of our algorithms, and possibly migrate to a new coding language to gain better and wider control over them. This, along with stronger data processing that might include word-to-word associations and grammatical constructs, will allow us to build a better and more holistic bot that can tell us more about the data itself. We plan on getting tweet volume data for our keywords and using it to estimate the number of tweets in each sentiment per day. Finally, if we accomplish our previous goals, we plan to analyze more word-based analytics to better understand what people are talking about in the context of COVID-19 safety procedures, and what this can tell us about the spread of misinformation regarding COVID-19.

Time Delay: 

Another way to analyze the relationship between the economic and sentiment analysis data would be to look at the time delay between their curves. For example, we might observe that consumer spending increases a certain amount of time following a rise in positive public sentiment. 

Our results show this kind of relationship between the peaks of the respective graphs, but with a closer quantitative look, we could determine whether the time delay is consistent. This would be very interesting and may lead us to more ways to help prevent diseases from spreading as rapidly in the future.
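One common way to quantify such a delay is to shift one series against the other and see which shift gives the highest correlation. The sketch below illustrates this with a Pearson correlation; the function names and the two series are hypothetical, and with only a handful of monthly points a real estimate would be very uncertain.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_lag(leader, follower, max_lag):
    """Find the delay (in samples) at which follower best tracks leader."""
    scores = {}
    for lag in range(max_lag + 1):
        # Pair leader[t] with follower[t + lag].
        xs = leader[: len(leader) - lag]
        ys = follower[lag:]
        scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get), scores


# Hypothetical series: "follower" repeats "leader" two steps later.
leader = [1, 3, 2, 5, 4, 6, 3, 2, 4, 1]
follower = [0, 0] + leader[:-2]

lag, scores = best_lag(leader, follower, max_lag=4)
print(lag)
```

Repeating the search over different time windows would indicate whether the delay is consistent, which is the question raised above.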



A huge thank you to our graduate student mentors Surin and Leighton in this project for always being there for us to help answer any questions we had, putting us in contact with people to give us advice, helping format the paper, and generally always being there for us to give valuable advice and support us along the way. 

Another thank you to our overseeing professor, Ayfer Ozgur. Her support helped guide us through this project and helped us feel motivated through every step. 

Thank you to Huseyin Inan for giving us great advice on our project and supporting us.

I (Noemi) also wanted to thank Megan Davis for helping me through learning Wolfram so I was able to do this project. 

Appendix: Clarifications/Mentions for Rules for Sorting Tweets


Curse words and retweets in tweets are censored like so: kjnwef

General Mention 1: Sarcasm negates all rules. For example, if someone was advertising masks (POS 5) but doing it in a sarcastic or joking way with fake/non-existent masks, this would be considered a negative. Negating a rule also would likely categorize a tweet in the category opposite of the original rule that was negated. This doesn’t always apply for rules in DIS.

General Mention 2: If a person clearly is attempting to express one opinion or another, but is not using the right words, they will still be sorted under that category assuming that they misunderstood the meaning of their words. An example tweet to illustrate our point:

That shit be pissing me off. People putting other people in danger because of their negligence.

I don’t even know you but GIRL FUCK YOU!

And fuck anyone who refuses to practice herd immunity and wear mask to protect their loved ones and others.

This shit is NOT A GAME.

This tweet follows POS 3, POS 4 and POS 6. The only issue is that the statement “practice herd immunity” could imply that the user was attempting to convey NEG 16. In this case, we can assume that by “herd immunity” the user meant “social distancing” based on everything else she said. Of course, making assumptions always leaves room for error, but human languages require these assumptions of us if we are to semi-accurately predict collective intended meaning. 

General Mention 3: If a person’s tweet fits into one of the neutral rules, but also has qualities that would fit into negative or positive, it will almost always be sorted into whichever binary (non neutral) category it shares a rule with. There are a few exceptions, but in general negative/positive rules will overwrite neutral rules unless that neutral rule encompasses the binary rule. See the following example of an exception where neutral overwrites binary:

Yep. #MaskUp America. Unless of course you’re the exalted Dr. Fauci and the Mrs. I’m guessing just like protesting/looting, there’s an invisible shield that protects you when at a baseball game. #Hypocrite

This tweet contains both (POS 8) and sarcasm, which might normally put it in the NEG category because sarcasm negates previous rules. However, since the entire tweet is framed around showing that a certain person is a hypocrite, we choose to put it in neutral. 

NEG 4: This includes people who state “people with asthma should not have to wear masks”, “children should not wear masks”, “STRONG, NOT-SICK PEOPLE SHOULDN’T WEAR MASKS”, or otherwise indicate that certain groups are exempt from mask wearing. We as researchers acknowledge that there are medical conditions that would warrant not wearing a mask, but these mainly include severe skin conditions and severe particle allergies that would likely bar the patient from going outside at all during this time. For this reason, stating that certain people should not have to wear a mask would likely put a tweet in the NEG category and overwrite most positive statements. 

NEG 11: This mainly refers to people that do not necessarily explicitly mention that they are against masks, but that they are against reasonable mandates for masks. (ex. Required to wear a mask to enter a store, or receiving a fine for not wearing masks) This is often characterized by a user stating that a mandate would violate civil liberties, or otherwise violate a human right of theirs. This does not include people who state that they are against mandates to wear a mask/social distance in a domestic setting, or claim to be against other ridiculous or unrealistic mandates. People who state the above will probably instead be sorted into a neutral category. 

DIS 2: If there is little to no way for us to discern a user’s opinion with even a small degree of accuracy, we will likely choose to discard the tweet. This is because even if the user clearly has some opinion, by labeling it we risk misrepresenting them, and because other people who have read the tweet may not have understood it either, making it less important to us in the first place. 

  1. “@smogmonster89 @savesnine1 @sainsburys Yet you ID people who abuse you? People in their late 20s who say “are you taking the piss?” Becuase it’s the law; if wearing a mask is also meant to be the law surely it’s same as ID’ing people, just my thoughts”
  2. “More credible then u or trump, but u pitch only to his gullible base. So w/out context, u bring up mask issue I know trump did everything perfect,on face value thats moronic Ur blathering nonsense isnt helping reelect. Thinking repub @JesseBWatters @BillOReilly @newtgingrich”

Above are two examples of tweets that were labeled as neutral, despite being very clearly opinionated. They appear to be self contradictory, could be argued to be representing either side, and have inconsistent and confusing grammar. To some extent, it’s unclear whether or not they’re even talking about masks. For tweets that are entirely incomprehensible, it’s better to just leave them out of the main dataset. 

DIS 4: This particular issue came about when there was controversy around Anthony Fauci for not wearing a mask or social distancing during certain parts of a baseball game. Many would criticize him for not wearing a mask, but not make their own opinions clear. One would think that these tweets would follow POS 4, but some users were criticizing Fauci solely because he was not wearing a mask, whilst others were really only criticizing him for being a hypocrite, and not necessarily for not wearing a mask. I noticed that this held in general: when someone was being criticized for being hypocritical, the tweet didn’t necessarily convey the user’s opinion on the action itself; sometimes it just conveyed their opinion of the person. For this reason, tweets under this category that do not explicitly follow some other rule that would make them positive or negative will often be labeled as neutral. 

Anthony Fauci: It’s ‘Mischievous’ to Criticize Me Taking Off Mask in Baseball Stands:We all need to understand when DEM/Socialist and theGODS of the Media make the laws they are making those laws for you and me not for them they are above the law

The above tweet is very obviously anti-Fauci, and we might assume that they are anti-mask, but because the article they linked was only explicitly “anti-Fauci” and didn’t mention their take on masks, and because the user themself did not mention their take on masks, we have to put them in the neutral category. I considered putting this tweet in the negative category because the last part of their tweet seems to indicate NEG 6 by implying that “mask rules” only apply to people who aren’t democrats, but because we technically can’t differentiate between them being angry about their belief that the rules are only applied to them, or anger that necessary rules don’t apply to others, we cannot categorize it. 

DIS 5: This category just includes tweets that aren’t talking about masks in the context of COVID-19. For example, they may be referencing Batman’s mask, masking emotions, or other non-medical references to masks. Below is my favorite example of this rule. 

@REMills2 I’m an abusive pageant mom. Every day I shake him by his tiny little shoulders and say “ONLY WINNERS IN MY HOUSE” and he sheds a single glistening anime tear, knowing the mask of fame must continue to hide his deep dissatisfaction and emptiness. Alexa play Lucky by Britney Spears

DIS 6: We came across a few tweets whose authors claimed to be completely against “mandates requiring you to wear a mask in your own home”. It makes sense to be completely against such a rule if it existed, regardless of one’s position on masks, so if a tweet gave no other indication of the user’s position on the issue, it was sorted into this category. 

Twitter Data




[1] Personal Income and Outlay, June 2020, and Annual Update, Bureau of Economic Analysis, July 31, 2020

[2] SECTION 02: PERSONAL CONSUMPTION EXPENDITURE, Underlying Details, Consumer Spending, Bureau of Economic Analysis

[3] Consumer Spending, Bureau of Economic Analysis, July 31, 2020

[4] Advance Monthly Retail Trade Report, US Census Bureau, July 16, 2020

[5] Differences between Consumer Price Index and Personal Consumption Price Index, Bureau of Economic Analysis, May 2011

[6] News Release Table 3 June 2020, Archived Consumer Price Index Supplemental Files, Bureau of Labor Statistics, June 2020

The Price of Latency in Financial Exchanges

Journal for High Schoolers


Fadi Kidess, Madhav Puri, Shiv Trivedi, Vig Sachidananda


Modern financial exchanges facilitate the trading of trillions of dollars of assets per day [1] and process all incoming orders from market participants at one location, a central matching engine. The time it takes (latency) for a market participant to reach this matching engine can vary drastically with some participants paying for lower latency by co-locating next to or within an exchange. Lowering latency leads to faster access to data and order placement. High frequency trading (HFT) heavily relies on low latency and the zero-sum game of the stock market implies that this advantage is likely to the detriment of all other participants.

Building on work that has proposed mechanisms for mitigating the effects of latency inequality present in current exchanges, we develop an exchange testbed to both simulate and quantify the effect of latency on trading algorithms. By doing so, we examine the consequences of the current high frequency arms race for lower latency and aim to build upon this research to inform future policies on how auctions in matching engines can be designed to provide equality of opportunity for financial market participants.


High frequency trading firms have a few options for reducing the time it takes them to reach an exchange. In order to communicate quickly between exchanges these firms have utilized optimized fiber optic networks [2] and more recently microwave links [3]. Within an exchange, trading algorithms can be co-located and housed within the same datacenter as the central matching engine [4].

Recent work has examined the potential negative implications of the latency inequality created by these tools. Perhaps the most famous criticism appears in Michael Lewis’ Flash Boys, which recounts the practice of front-running, in which HFT firms observe slower market participants’ orders and execute on them before those participants are able to themselves. Additionally, quote sniping, an arbitrage across markets or multiple highly correlated stock symbols, is enabled through purchased low latency [5]. 

Given the broad implications of latency inequality in the stock market, several proposals have been made to design an exchange that favors no class of participants in particular. IEX is an exchange that enforces a “speed bump” on all orders to lessen the relative difference in latency across participants [6]. Additionally, block auctions have been suggested as an alternative to the continuous limit order books that exchanges currently implement [5]. Lastly, counterfactual simulators for trading algorithms exist which allow for researching the effects of trading algorithms with realistic market order flow [7].

Methods and Materials

In our work we develop trading simulations on the CloudEx trading platform. Since the platform is able to let us modify both trading behavior and infrastructure we can leverage such a tool to understand the interplay between latency and trading algorithms.

For all experiments, we set up an exchange with 1 matching engine, 3 gateways and 9 traders (each gateway serves 3 traders). We index the gateways and traders and we denote traders served by Gateway 1 as traders 1, 2, and 3. We use a FIFO sequencer for our matching engine which results in orders not being held by a resequencing buffer once they reach the matching engine.

Each trader trades using a random strategy in which limit prices and actions (buy or sell) are chosen randomly. When starting an experiment, each trader is given a random seed from which they will generate buy or sell orders. In our setup, we pass the same random seed to a trader every experiment and we pass different random seeds to the 9 traders in our experiment testbed. This setup allows us to generate sequences of orders for each trader that are unique from each other and deterministic over experiment trials. Furthermore, since these traders are not reacting to price, this setup allows us to investigate the effects of latency on order execution without the confounding effects of latency applied to the receipt of market data.
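The seeding scheme above can be illustrated with a short sketch. This is not CloudEx code: the function name, the price range, and the seed values are hypothetical, and the paper does not state how limit prices are distributed.

```python
import random


def generate_orders(seed, num_orders, price_range=(90.0, 110.0)):
    """Generate a reproducible sequence of random limit orders.

    price_range is a hypothetical parameter chosen for illustration.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    orders = []
    for _ in range(num_orders):
        side = rng.choice(["buy", "sell"])
        limit_price = round(rng.uniform(*price_range), 2)
        orders.append((side, limit_price))
    return orders


# Each of the 9 traders gets its own seed, reused in every experiment.
trader_seeds = {trader_id: 1000 + trader_id for trader_id in range(1, 10)}

run_a = {t: generate_orders(s, 5) for t, s in trader_seeds.items()}
run_b = {t: generate_orders(s, 5) for t, s in trader_seeds.items()}

print(run_a == run_b)
```

Because every trader submits the same orders in every trial, differences in outcomes between experiment types can be attributed to the added latency rather than to different order flow.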

For all experiments, traders start with $200k of cash and 10k shares of one asset. All traders trade on a single asset and over the course of the experiment they place 100,000 orders. 

Experiment Type 1: All gateways equidistant
Experiment Type 2: Gateway 1 has 1ms latency (Gateway 2 and 3 unchanged)
Experiment Type 3: Gateway 1 has 5ms latency (Gateway 2 and 3 unchanged)

We run three types of experiments in our testbed to better understand the effect of latency on our traders. In the first experiment type (Type 1), all of our gateways are equidistant to the matching engine. In the second type, Gateway 1 has a 1 millisecond latency added to its gateway-to-matching-engine transit time through the use of a hold and release buffer implemented on the gateway. In the last experiment type, Gateway 1 has a 5 millisecond latency added to orders in transit to the matching engine.
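A hold and release buffer of the kind described can be sketched as a priority queue keyed on release time. The class below is a simplified illustration under that assumption, not the CloudEx implementation; all names are hypothetical.

```python
import heapq
import itertools


class HoldAndReleaseBuffer:
    """Holds each order for a fixed delay before forwarding it."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms
        self._heap = []                # entries: (release_time, seq, order)
        self._seq = itertools.count()  # tie-breaker keeps submission order

    def submit(self, order, now_ms):
        """Accept an order at time now_ms and schedule its release."""
        release_time = now_ms + self.delay_ms
        heapq.heappush(self._heap, (release_time, next(self._seq), order))

    def release_due(self, now_ms):
        """Return all orders whose hold time has elapsed, oldest first."""
        released = []
        while self._heap and self._heap[0][0] <= now_ms:
            _, _, order = heapq.heappop(self._heap)
            released.append(order)
        return released


buffer = HoldAndReleaseBuffer(delay_ms=5.0)
buffer.submit("order-A", now_ms=0.0)
buffer.submit("order-B", now_ms=2.0)

early = buffer.release_due(now_ms=4.9)  # nothing is due yet
first = buffer.release_due(now_ms=5.0)  # order-A becomes due
later = buffer.release_due(now_ms=7.0)  # order-B becomes due
```

Setting `delay_ms` to 1.0 or 5.0 on Gateway 1 only, and leaving the other gateways without a buffer, would correspond to experiment types 2 and 3.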

When running experiments, we collect a variety of statistics on both the systems running the exchange and the trading performance of each trader. On the systems side, we collect timestamps for the transit times between the gateway and matching engine. For each trader, we collect timestamped trades that were executed and the details of each trade. In the following section we present the results of analysis on this data.


We have set up three different scenarios for experimentation using a variety of simulated traders. In the first experiment type, all the gateways are an equal distance to the matching engine. In the second experiment type, the first gateway has an added 1ms latency and is simulated to be farther away from the matching engine than the others. For the third and last experiment type, the first gateway now has an added 5ms latency and is the furthest away. There are three total gateways which each serve three traders. All the traders are executing a “buy low sell high” algorithm, allowing us to compare them against each other in a consistent manner.

Figures 1-3 show this added latency relative to the other gateways for each experiment. The added latency shifts the peak of gateway 1 further to the right with each experiment type, producing a noticeable difference between the gateways’ peaks. Each peak marks the delay at which an order from that gateway is most likely to reach the matching engine.

Figure 1 (Experiment Type 1, all gateways equidistant)

Figure 2 (Experiment Type 2, gateway 1 has an added 1ms latency)

Figure 3 (Experiment Type 3, gateway 1 has an added 5ms latency)

Figure 4 shows the rate of return of traders by experiment type, where the first three traders are connected to gateway 1, and the next three traders are connected to gateway 2, and the last three traders are connected to gateway 3. We observe a negative impact on the rate of return for traders 1, 2 and 3 for experiment types 2 and 3 in which additional latency was imposed, while other traders benefited from the latency disadvantage imposed on gateway 1. Figure 5 visualizes analogous data to figure 4 as it plots the rate of return of each gateway by experiment type. This plot follows up the one in figure 4 by clearly showing the inequity in the trading exchange, since clients trading at gateway number one are at a clear disadvantage, while those at the other gateways are able to prosper from said disadvantage. 
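The text does not spell out how the rate of return is computed. One plausible definition, sketched below purely as an assumption, values the starting and ending portfolios at a common mark price and takes the relative change. The starting cash and share counts come from the setup described earlier; the end state and mark price are hypothetical.

```python
def rate_of_return(final_cash, final_shares, mark_price,
                   start_cash=200_000.0, start_shares=10_000):
    """Return (final value - initial value) / initial value.

    Both portfolios are valued at the same mark_price, so the result
    reflects trading gains rather than a change in the asset's price.
    """
    initial_value = start_cash + start_shares * mark_price
    final_value = final_cash + final_shares * mark_price
    return (final_value - initial_value) / initial_value


# Hypothetical end state: the trader sold 1,000 shares at an average of $21
# while the mark price is $20.
r = rate_of_return(final_cash=221_000.0, final_shares=9_000, mark_price=20.0)
print(f"{r:.4%}")
```

Averaging this quantity over the three traders behind a gateway would give a per-gateway figure comparable in spirit to Figure 5.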

Figure 6 shows the effect of latency on the number of trades executed. Higher latency results in fewer orders executed, since fewer orders reach the matching engine at any given time, which shows another way latency can act as a disadvantage to some traders. Furthermore, the number of shares traded, which depends on the number of trades executed, is also greatly affected, as is evident in figure 7.

From the data in figures 8 and 9, we determine the effects of latency on buy prices: the average buy price for gateway 1 increases as latency is added, and more trades become unavailable because traders with lower latency get their orders matched earlier (this is evident in figure 9, where each data point corresponds to a trade order).

However, we didn’t reach a direct conclusion with the sell price plots in our experiments, and more tests are needed to verify any correlation between latency and the average sell price. Our results for the sell price data are shown in figures 10 and 11.

In experiment types 2 and 3, the return for gateway 1 dropped significantly below that of gateways 2 and 3, showing the inequitable arbitrage. This is visualized in figures 12, 13 and 14, where each figure corresponds to the return over number of shares for one experiment type.


During our investigation, we have shown the impact of added latency on order execution in a controlled trading exchange environment. Our results show that traders experienced ~50% fewer trades executed with 5 milliseconds of added latency, costing them ~10% on their rate of return. We also acknowledge the importance of understanding the cost of latency in order to ensure fairness in a universal trading environment, which can be done by keeping latency considerations in mind when designing such market exchanges. Visualizing the effects of latency on many trading factors clearly shows the inequitable arbitrage existing in trading exchanges.

Future Directions

Future work would be helpful to clarify the impacts of specific latency on the rest of a market as well as with different exchange matching algorithms. Experimenting with larger simulations that have more clients and gateways would be more realistic. Different matching algorithms to test with include auctions and IEX’s “speed bump” algorithm. Experiments with different resequencing buffer delay values in the order’s upstream path, allowing the matching engine to sort orders based on their gateway timestamps before the orders are processed may also prove to provide valuable data and offer new insights. Though a standard buy low sell high algorithm was used in our trials for consistency, testing with other algorithms may yield interesting and unpredictable effects as a result of different amounts of latency. The influence of efficient technologies such as the latest (at time of writing) fifth generation (5G) telecommunications standard should also be analyzed to gain more relevant insights.
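To make the resequencing idea concrete, the sketch below contrasts FIFO processing with a simplified resequencing buffer that orders by gateway timestamp. It is a toy model built on stated assumptions, not the CloudEx design: the class, function names, and timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    gateway_ts_ms: float  # when the gateway stamped the order
    arrival_ts_ms: float  # when it reached the matching engine


def fifo_sequence(orders):
    """Process orders strictly in order of arrival at the matching engine."""
    return sorted(orders, key=lambda o: o.arrival_ts_ms)


def resequence(orders, buffer_delay_ms):
    """Hold each order until buffer_delay_ms after its gateway timestamp,
    then process in gateway-timestamp order.

    An order that arrives after its slot has passed cannot be moved
    earlier, so it is processed on arrival.
    """
    def process_time(o):
        return max(o.gateway_ts_ms + buffer_delay_ms, o.arrival_ts_ms)

    return sorted(orders, key=lambda o: (process_time(o), o.gateway_ts_ms))


# Order A is stamped first but travels through a slow (5 ms) gateway.
orders = [Order("A", 0.0, 5.0), Order("B", 1.0, 1.2)]

fifo = [o.order_id for o in fifo_sequence(orders)]
long_buffer = [o.order_id for o in resequence(orders, buffer_delay_ms=6.0)]
short_buffer = [o.order_id for o in resequence(orders, buffer_delay_ms=2.0)]
```

In this toy model the buffer restores the gateway-timestamp order only when its delay is at least as long as the slowest gateway’s transit time, which is why experimenting with different delay values is informative.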


[1] Budish, E., Cramton, P., & Shim, J. (2015). The High-Frequency Trading Arms Race: Frequent Batch Auctions as a Market Design Response*. The Quarterly Journal of Economics, 130(4), 1547-1621. doi:10.1093/qje/qjv027

[2] Co-Location (CoLo) – NASDAQ. (n.d.). Retrieved August 07, 2020, from

[3] Daily Market Summary. (n.d.). Retrieved August 07, 2020, from

[4] Hasbrouck, J., & Saar, G. (2013, May 22). Low-latency trading. Retrieved August 05, 2020, from

[5] Huang, R., & Polak, T. (2011). LOBSTER: Limit Order Book Reconstruction System. SSRN Electronic Journal. doi:10.2139/ssrn.1977207

[6] IEX Group. (n.d.). Retrieved August 07, 2020, from

[7] McKay Brothers: Low Latency Microwave. (n.d.). Retrieved August 07, 2020, from

[8] Moallemi, C., Tsoukalas, G., Sun, Z., & Mookerjee, R. (2013, April 25). OR Forum-The Cost of Latency in High-Frequency Trading. Retrieved August 05, 2020, from 

[9] Spread Networks – CME Group. (n.d.). Retrieved August 07, 2020, from

RF/mm-Wave Semiconductor Technology for 5G Applications and Beyond

Journal for High Schoolers


Ashley Jun, Rafael Perez Martinez, Srabanti Chowdhury


5G will play an important role in keeping up with the exploding demand for wireless networks while delivering better quality (rate and efficiency) than is seen today. Currently, too many devices share the band of frequencies used by 4G and its prior generations, so we are running out of spectral capacity. This crowding comes from the past two technological waves, which began in the 1980s. The answer is not to minimize device usage or cut back on how we use the existing frequencies, because there is no returning to a time without the technological advances we have today. Instead, the approach of 5G is to move to higher frequencies (e.g., the mm-Wave spectrum), opening a new realm of applications in imaging, sensing, radar, communication, etc. This enables a third wireless revolution in which innovation is no longer inhibited by the lack of wireless networks and the capacity they can hold. Given the possibilities opened by 5G, the questions are why it has not yet been implemented if it is the solution many wireless network companies are shifting toward, how it is implemented, and what the challenges of implementation are.
Silicon and conventional III-V materials dominate the devices that have enabled wireless networks up until today, and experts say that we are on the brink of reaching the capacity limit of current semiconductor technology. As a result, new semiconductor materials are being researched extensively to enable the needed expansion. In this work, the capabilities of materials such as GaN, GaAs, InP, SiGe, and gallium oxide are compared with present semiconductor technology to understand whether these technologies are sufficient for 5G applications and beyond. The challenge in shifting the materials used in wireless networks is that the existing semiconductor industry has had decades of development and billions of dollars of investment to get us where we are today. The other materials are surrounded by misconceptions about 5G and unknowns about their possibilities, because they have not received the kind of investment that produced the two major technological waves. Combining an understanding of the foundations of 5G with comparisons of semiconductor technologies provides insight into how the future of wireless networks will work and whether the technologies investigated today are enough to address the shortage of spectral capacity and beyond.


Since 1980, there have been four generations of cellular systems, and 5G is the newest addition, projected to open a new avenue for technological advances. 1G was for voice communications and consisted of a basic analog system known as the analog FM cellular system. 2G moved to digital technology, with modulation schemes based on code division multiple access that improved spectral efficiency. In 2001, 3G brought mobile technology into the mainstream with high-speed Internet access as well as improved video and audio streaming. 3G introduced technologies and standards that were used worldwide, such as Wideband Code Division Multiple Access (W-CDMA) and High-Speed Packet Access (HSPA), which includes two mobile telephony protocols: High-Speed Downlink Packet Access (HSDPA) and High-Speed Uplink Packet Access (HSUPA). The scale of this expansion can be seen in the more than 350 communication service providers operating over multiple frequency bands. 4G then vastly advanced mobile communications technology. 4G LTE enabled a scalable transmission bandwidth of up to 20 MHz in its radio access technology, offering a fully 4G-capable mobile broadband platform. One key technology that allowed for the high data rates was Multiple-Input Multiple-Output (MIMO). MIMO provided high spectral efficiency through multi-stream transmission, along with improved link quality and adaptive radiation patterns that use beamforming for signal gain and interference mitigation. However, even with both LTE and HSPA working at optimal load balances, this is not enough to meet the demand for wireless networks and greater efficiency, which is where 5G and its expansion into higher frequencies come in.

However, a larger range of frequencies (i.e., the mm-Wave spectrum) is not the only technological advance supporting the shift to 5G and addressing the lack of spectral capacity. 5G consists of five large components, each of which has challenges that call for further research and understanding: mm-Wave spectrum, massive MIMO, beamforming, full-duplex (FD), and small cells. The mm-Wave spectrum refers to the expansion into higher frequencies (e.g., 28 GHz – 39 GHz), which makes room for current and future technological advances such as virtual reality, augmented reality, mixed reality, self-driving cars, and other improvements in sensing, imaging, radar, and communication. Massive MIMO increases the number and capacity of antenna ports by 22-fold or more over what 4G MIMO could hold, in order to handle all cellular traffic. The drawback of such expansion is that MIMO broadcasts information in all directions, so there will be interference. Beamforming combats this interference by sending focused streams of data aimed at the signal’s intended destination. This increases efficiency by allowing larger numbers of incoming and outgoing data streams to be handled at the same time. Full-duplex further increases efficiency by letting the incoming and outgoing data streams operate simultaneously on the same frequency, a capability tied to the principle known as reciprocity. Rather than having the streams operate on different frequencies, silicon transistors create high-speed switches that decrease the transition time between incoming and outgoing data streams. Small cells address a weakness of the mm-Wave spectrum: at such high frequencies, signals have difficulty penetrating obstacles such as buildings, trees, and people, which would greatly weaken the connection between a device and a large cell tower.
Small cells solve this through sheer numbers: because there are many more small towers than large cell towers, they can be placed in many locations close to the devices. For example, consider a device sending and receiving data streams while its user walks through a town. Within a certain range, the device communicates with one small tower; when it moves out of that tower’s range, it connects to another, maintaining a connection despite the mm-Wave spectrum’s difficulty with obstacles.
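The walk-through-town example can be sketched as a simple handoff rule. This is a minimal illustration with assumed values, not a model of any real 5G handover protocol: the street is reduced to one dimension, and the cell positions, the 100 m range, and the function name are hypothetical.

```python
def nearest_cell_in_range(position_m, cells, range_m):
    """Return the name of the closest small cell within range, or None if none is reachable."""
    best_name, best_dist = None, None
    for name, cell_pos in cells.items():
        dist = abs(position_m - cell_pos)
        if dist <= range_m and (best_dist is None or dist < best_dist):
            best_name, best_dist = name, dist
    return best_name


# Hypothetical street with three small cells (positions in metres).
cells = {"cell_a": 0.0, "cell_b": 150.0, "cell_c": 300.0}
walk = [0, 50, 100, 150, 200, 250, 300]

serving = [nearest_cell_in_range(x, cells, range_m=100.0) for x in walk]
print(serving)
# ['cell_a', 'cell_a', 'cell_b', 'cell_b', 'cell_b', 'cell_c', 'cell_c']
```

Because the cells are spaced more closely than twice their range, the device always has a tower to hand off to as it moves.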

Methods and Materials

To advance wireless networks with 5G and the generations beyond it, semiconductor technology must advance in parallel, because each must adapt to the other. One large component of this is the move to higher frequencies with 5G, which calls for smaller transistors, but this increase in speed can lead to breakdown. Various semiconductor technologies are therefore compared to find which qualities are best suited for the projected future. The semiconductor most widely used in applications today is silicon, but research now suggests that it may not stand on its own in the generations beyond 5G. Silicon is regarded as a slow semiconductor because of its low carrier mobility for electrons and holes, which limits its maximum carrier velocity to about 1×10^7 cm/s under normal conditions. Silicon also has a much lower breakdown voltage than materials such as GaN. In addition, GaN has a faster switching speed, lower resistance, and much lower waste heat than silicon, which makes it better suited for high-power RF electronics (i.e., PA applications). However, there has not been nearly as much investment in materials like GaN, so they have not achieved silicon’s dominance in semiconductor technology. Through those years of understanding and experience, silicon has been found to work very well for several reasons: it yields many low-cost integrated circuits per wafer; wafers are available in large sizes; it has great thermal properties for dissipating heat; it can be controllably doped with n- and p-type impurities over a high dynamic range; it is abundant and easily purified; an extremely high-quality SiO2 dielectric can be grown on it; it has high mechanical strength for handling and fabrication; and it allows very low-resistance ohmic contacts. Many say that we are reaching the limit of silicon’s potential, which is where comparisons with GaN and other materials help determine the next steps of investment for wireless networks in 5G and beyond.

Semiconductor technologies play a large role in transistor design, where performance is compared in terms of frequency range, power capacity, and noise characteristics. RF bipolar junction transistors (BJTs), among the oldest and most popular active RF devices in use today, offer good operating performance at low cost. These transistors are mostly made of silicon and have proven useful for amplifiers up to the range of 2–10 GHz and for oscillators up to 20 GHz. Bipolar transistors are optimal for low-phase-noise oscillators because of their very low 1/f noise. At frequencies below about 2–4 GHz, bipolar junction transistors are preferred over field-effect transistors (FETs) for their higher gain, lower cost, and ability to be biased with a single power supply. However, bipolar junction transistors are subject to shot and thermal noise, which makes their noise figure worse than that of FETs. FETs come in various forms, including the MESFET (metal-semiconductor FET), the MOSFET (metal-oxide-semiconductor FET), the HEMT (high electron mobility transistor), and the PHEMT (pseudomorphic HEMT), and at high frequencies they have proven useful for low-noise amplifiers (LNAs) and power amplifiers (PAs). One of the most common FETs is the GaAs MESFET, used in microwave and millimeter-wave applications at frequencies up to 60 GHz or more (with higher frequencies obtained using GaAs HEMTs). FETs suit LNAs because they have a lower noise figure than other active devices. For PAs, recently developed semiconductor technologies are very useful: GaN HEMTs serve high-power RF and microwave amplifiers, while CMOS FETs provide RF integrated circuits with high levels of integration at low cost and low power requirements.
These PA qualities are projected to be useful in commercial wireless applications as they expand into 5G and beyond. Another transistor built from compound semiconductor technologies is the heterojunction bipolar transistor (HBT), which operates similarly to a BJT. HBTs use semiconductors such as GaAs, indium phosphide (InP), or silicon-germanium (SiGe), often in conjunction with thin layers of other materials (e.g., aluminum). This combination of materials allows high-frequency performance exceeding 100 GHz, which is the direction of the generations beyond 5G in commercial wireless applications. Understanding these transistors and the qualities of their semiconductor technologies is what enables applications at higher frequencies and for new technological purposes.


The results of this research came out of the investigation of how 5G and beyond would be implemented: the steps, the source of demand, the applications, the challenges, and the comparisons of different semiconductor technologies.

5G Implementation Step: Small Cells
Description: Increasing the number of cell towers because the mm-Wave frequencies of 5G cannot penetrate through objects like trees, buildings, and people as readily as the current frequencies.

5G Implementation Step: Full Duplex
Description: Maximizing the efficiency of the incoming and outgoing data to reduce interference and drag. The data has a directed path.

5G Implementation Step: Massive MIMO and Beamforming
Description: Expanding the number and capacity of the cell towers by 22-fold or more over what is used for 4G LTE to store and send out a greater amount of data. The data is sent out in a narrower direction toward the intended target, reducing the interference caused by the previous method of sending out data in a 360-degree manner.

The Evolution of G was established by, and has continued from, the need for expansion. 1G introduced mobile communication for the first time. 2G brought the first digital standards. 3G allowed access to the internet, which expanded dramatically with 4G and 4G LTE for streaming. 5G now opens new possibilities in areas such as communication, sensing, radar, and conjoined realities (i.e., augmented and virtual).

The demand for 5G and its expansion of frequency bandwidth arises because the exponential growth of devices and technological advances is crowding the frequencies currently in use. This crowding is projected to get worse if we remain confined to those frequencies, so 5G relieves it by opening new usable frequencies in the mm-Wave regime.

Application of 5G: Gigabit Communication
Enables: Self-driving cars by increasing the speed of communication between vehicles.

Application of 5G: Gigabit Communication
Enables: Receiving information without the limitations of distance to cell towers or the inability of signals to penetrate obstacles.

Application of 5G: Sensing
Enables: Vision and detection without the impedances of weather on clarity, which increases safety.

Application of 5G: Imaging
Enables: Radiometry and remote sensing, as well as imagers and body scanners that depict a clearer image by utilizing mm-Wave frequencies to bring out the smaller details.

Challenge with Semiconductor Technologies: Attenuation vs. Frequencies
Description: Attenuation helps determine which specific frequency to choose. In the figure, atmospheric attenuation is shown as a function of frequency. Several relevant atmospheric transmission windows are presented as examples of where we see technology going with 5G and beyond: the 5G band at 28-39 GHz, 75-110 GHz (W-band), 125-165 GHz (D-band), and higher. At lower frequencies, such as those of 4G and below, attenuation has less impact on performance. At higher frequencies, however, the operating frequency is chosen at a local minimum of attenuation, because devices need to communicate clearly with the small cells and too much attenuation prevents that clear communication. For these reasons, the frequency bands shown in the figure are regions where attenuation is low.

Challenge with Semiconductor Technologies: Power vs. Frequencies
Description: There is a tradeoff between frequency and power. Silicon is a prime example because it works at the two extremes: it can operate at higher frequencies without generating much power, or generate a lot of power only at low frequencies. This tradeoff therefore affects the decision of which semiconductor technology is used. Another example presented in the figures is GaN, where there is less of a tradeoff, which is why it is more applicable to the purposes of 5G and beyond.

Challenge with Semiconductor Technologies: PA Design
Description: The three things needed in a power amplifier are high power (which GaN is good at), high efficiency (which requires high gain), and small die area (to reduce cost; it is also integration-friendly). A good metric for characterizing a power amplifier is PAE, the power-added efficiency. The PAE of an amplifier is the ratio of the added signal power (the difference between output and input power) to the DC input power of the amplifier. Mathematically, it can be represented as the product of the drain efficiency, (1 – the inverse of the gain), and the power-combining efficiency. To maximize PAE from a technology point of view, you want a platform with the highest gain available at a given frequency, which GaN provides. For that reason, power amplifiers for transmitters utilize GaN, InP HBT, and SiGe HBT, while low-noise amplifiers for receivers utilize InP HEMT and InP MOS-HEMT.

Challenge with Semiconductor Technologies: Power vs. Frequencies (Efficiency)
Description: The figure shows power in dBm as a function of frequency for several semiconductor technologies: CMOS/SiGe, GaAs, GaN, and InP. This gives an idea of how the technologies compare with each other. As the figure shows, GaN can produce up to 35 dBm at a very high frequency, as reported by HRL.
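
The PAE relationship described under PA Design can be checked numerically. The following is a minimal Python sketch; the function names and the example power levels (2 W out, 0.2 W in, 4 W DC) are illustrative assumptions, not measurements from this work. It computes PAE both from the power definition, (P_out − P_in) / P_DC, and from the product form, drain efficiency × (1 − 1/gain) × combining efficiency, with the combining efficiency assumed to be 1.

```python
def pae_from_powers(p_out_w, p_in_w, p_dc_w):
    """PAE as added RF power divided by DC input power (all in watts)."""
    return (p_out_w - p_in_w) / p_dc_w


def pae_from_gain(drain_efficiency, gain, combining_efficiency=1.0):
    """PAE as drain efficiency x (1 - 1/gain) x power-combining efficiency.

    `gain` is a linear power ratio, not a value in dB.
    """
    return drain_efficiency * (1.0 - 1.0 / gain) * combining_efficiency


# Assumed example values, chosen only to make the arithmetic easy to follow.
p_out_w, p_in_w, p_dc_w = 2.0, 0.2, 4.0
gain = p_out_w / p_in_w              # linear gain of 10 (10 dB)
drain_efficiency = p_out_w / p_dc_w  # 0.5

pae_direct = pae_from_powers(p_out_w, p_in_w, p_dc_w)
pae_product = pae_from_gain(drain_efficiency, gain)
print(f"{pae_direct:.2f} {pae_product:.2f}")  # 0.45 0.45
```

The (1 − 1/gain) factor shows why high gain matters: at a linear gain of 10 the amplifier keeps 90% of its drain efficiency, while at a gain of 2 it would keep only 50%.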

GaN is a semiconductor technology whose qualities make it appealing for the PAs in 5G, and this is mirrored in projections of the GaN market from 2017 to 2022. Revenue is projected to reach $1.1 billion by 2022, with growing application in commercial sectors alongside the largest areas of the market: military and cellular infrastructure.


In researching the performance and capacity of the various semiconductor technologies and transistors, it was notable that the generations from here on, 5G and beyond, will use carriers in the 5G band, W-band, and D-band, eventually exceeding 100 GHz. This is what will drive transistor development toward greater possibilities. In transceiver design (the design of both the transmitter and the receiver), it is optimal for a semiconductor technology to have an fmax at least twice its operating frequency. For transmitter PAs, GaN, InP HBT, and SiGe HBT are recognized because they have high fmax values, which should be enough to cover the frequencies of interest. Likewise, for receiver LNAs, InP HEMT and InP MOS-HEMT are well suited to carriers exceeding 100 GHz. When looking for the semiconductor technology that gives the best performance, no single semiconductor matches the best qualities of all the others. Silicon has been the forerunner in the evolution of G through 4G LTE, but it is not compatible enough with carriers at mm-wave frequencies beyond 100 GHz. The direction now is to use a multitude of semiconductor technologies in designing transceivers for 5G and beyond, offsetting the weaknesses of one with the strengths of another.
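The fmax rule of thumb above can be applied to the bands named in this paper. The sketch below is only an illustration: the function names are hypothetical, the factor of 2 is the guideline stated in the text, and taking the upper edge of each band as the operating frequency is an assumption made here for simplicity.

```python
# Band edges in GHz, taken from the bands discussed in this paper.
BANDS_GHZ = {
    "5G band": (28, 39),
    "W-band": (75, 110),
    "D-band": (125, 165),
}


def min_fmax_ghz(band_ghz, factor=2.0):
    """Smallest fmax (GHz) that satisfies fmax >= factor x the band's upper edge."""
    _low, high = band_ghz
    return factor * high


def meets_fmax_rule(fmax_ghz, operating_ghz, factor=2.0):
    """True if a technology's fmax is at least `factor` times the operating frequency."""
    return fmax_ghz >= factor * operating_ghz


required = {name: min_fmax_ghz(band) for name, band in BANDS_GHZ.items()}
for name, fmax in required.items():
    print(f"{name}: fmax of at least {fmax:.0f} GHz")
# 5G band: fmax of at least 78 GHz
# W-band: fmax of at least 220 GHz
# D-band: fmax of at least 330 GHz
```

Under this guideline, a technology intended for the D-band would need an fmax of roughly 330 GHz or more.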

Comparisons of semiconductor technologies applicable to different generations beyond 6G for their qualities in transistor development based on their fmaxes. 

Future Directions

Progress in enabling integrated, compact, and efficient chip-scale THz technology is bridging the THz gap between our advances in technology and their application. Results have been seen across a vast array of areas, including solid-state and photonic devices, two-dimensional (2D) materials, heterogeneous integration, and miniaturized THz technology demonstrated with quantum-cascade lasers, microbolometers, nanowires, novel plasmonic nanostructures, metamaterials, and ultrafast photoconductive semiconductor materials. Recent efforts have been dedicated to maximizing the capabilities of solid-state semiconductor technology (III–V and silicon-based) at room temperature and with low-cost manufacturing, to benefit from economies of scale. As a result, the paradigm has shifted from the idea of one master THz device and semiconductor technology to working with a multitude of semiconductor technologies, whose new system-level properties can enable a moderately efficient system that is versatile and programmable. This programmability allows electronic reconfigurability of the wavefront and polarization of the emitted THz fields, enabling applications in communication, radar, and imaging, as well as dynamic spectral control of the radiated fields for spectroscopy and hyperspectral imaging. These possibilities, however, require a versatility that is not found in many current non-integrated THz platforms. With the dominance of silicon-based integrated technology, a platform with massive integration and complex phased arrays allows imaging and communication systems to reach output power in the 100 μW range at 1 THz. Still, there is greater potential performance to be gained by expanding into other semiconductor technologies for PAs and LNAs beyond 5G.
An example of what these future devices may offer is multifunctional electromagnetic surfaces that allow subwavelength control, doing away with many of the tradeoffs of the partitioned block-by-block design approach. Strides toward capabilities such as beamforming for sensing, imaging, and communication show clear functionality in integrated circuits, but there is much to be done before a fully functional THz system is implemented. THz systems require transporting THz signals across longer distances between integrated circuits, multiplexing and demultiplexing across a network of nodes, free-space modulation, and so on. Some of this can come from electrically actuated external beam-steering devices, and multi-input–multi-output (MIMO) antenna systems will play a crucial role in the future of THz wireless systems. In the small-cell step of implementing 5G’s expansion into mm-wave frequencies, the MIMO arrays need to be compact while still allowing drastic increases in capability and number. Questions remain about how to implement and optimize systems with such a large number of antennas in a small footprint. Beamforming likewise faces the question of how to achieve spatial and spectral multiplexing, because these aspects were not familiar to RF systems in the past. As we reach higher frequencies, devices also need to be compact, efficient, high-performance sensing devices that operate at low power, deploy at large scale, and apply integrated circuit technology. The challenges of entering this new expansion of frequencies for real-world applications come from the demand for efficient and widely reconfigurable chip-scale systems whose spectrum, radiation patterns, and polarization can be controlled.


I would like to acknowledge Professor Tsachy Weissman of Stanford University’s Electrical Engineering Department and head of the Stanford Compression Forum for his guidance and help with the project. I also want to thank my mentor Rafael Perez Martinez for assisting me through the understanding and research of this project by maintaining a line of open communication whenever I needed help. Additionally, I would like to thank Professor Srabanti Chowdhury for all of her advice throughout the project as well as inviting me into the WBG Lab.


[1] Angelov, et al., “On the large-signal modeling of AlGaN/GaN HEMTs and SiC MESFETs,” 2005 Gallium Arsenide and Other Semiconductor Application Symposium.

[2] B. Romanczyk et al., “W-band N-polar GaN MISHEMTs with high power and record 27.8% efficiency at 94 GHz,” 2016 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, 2016, pp. 3.5.1-3.5.4, doi: 10.1109/IEDM.2016.7838339.

[3] J. Chéron, M. Campovecchio, R. Quéré, D. Schwantuschke, R. Quay, O. Ambacher, “High-gain over 30% PAE power amplifier MMICs in 100 nm GaN technology at Ka-band frequencies,” 2015 10th European Microwave Integrated Circuits Conference (EuMIC).

[4] K. Nakatani, Y. Yamaguchi, Y. Komatsuzaki, S. Sakata, S. Shinjo, and K. Yamanaka, “A Ka-Band High Efficiency Doherty Power Amplifier MMIC using GaN-HEMT for 5G Application,” 2018 IEEE MTT-S International Microwave Workshop Series on 5G Hardware and System Technologies.

[5] M. J. W. Rodwell, “50 – 500 GHz Wireless: Transistors, ICs, and System Design,” GeMiC 2014; German Microwave Conference, Aachen, Germany, 2014, pp. 1-4.

[6] R. Guo, et al., “A 26 GHz Doherty power amplifier and a fully integrated 2×2 PA in 0.15μm GaN HEMT process for heterogeneous integration and 5G,” IEEE IWS, 2018.

[7] R. Ma, K. H. Teo, S. Shinjo, K. Yamanaka, and P. M. Asbeck, “A GaN PA for 4G LTE-Advanced and 5G,” IEEE Microwave Magazine, vol. 18, issue 7, pp. 77–85, 2017.

[8] Sengupta, K., Nagatsuma, T. & Mittleman, D.M. Terahertz integrated electronic and hybrid electronic–photonic systems. Nat Electron 1, 622–635 (2018).

[9] Vu, Kien & Bennis, Mehdi & Debbah, Mérouane & Latva-aho, Matti. (2019). Joint Path Selection and Rate Allocation Framework for 5G Self-Backhauled mmWave Networks. IEEE Transactions on Wireless Communications. 18. 2431 – 2445. 10.1109/TWC.2019.2904275.

[10] Y. Yamaguchi, Koji Yamanaka, and Toshiyuki Oishi, “A Scalable Large-Signal Distributed Model for mm-Wave GaN HEMTs”, 2-17 IEEE TWHM.

Optimizing the Measurement of SPO2 With a Miniaturized Forehead Sensor

Journal for High Schoolers


Jack Burd, Joon Yang, Ada Poon


The Maxim Integrated MAXREFDES117# infrared and red light sensor chip has been shown to be viable for measuring SpO2 in a forehead-mounted sensor array setup. This was demonstrated through the creation of a forehead-mounted pad prototype which successfully measured SpO2 with >99% accuracy as compared to much larger conventional finger measurement apparatuses. This study’s results show that it is essential that the sensor patch be applied to thoroughly cleaned skin and placed in a dark environment in order to produce accurate measurements. The results also illustrate that this chip is sufficiently accurate to be employed in a sleep-monitoring fully-integrated system implemented on a patch that would replace the plethora of wires and sensors in current clinical sleep studies.


Current sleep studies require an abundance of bulky equipment, including jumbles of wires, a finger pulse oximeter, and various electrode pads placed on the skull. As a result of this, it is more difficult for researchers to conduct sleep studies on children and the elderly. In addition, the bulky setup means that participants have greater difficulty falling asleep and that many participants who already went through a sleep study do not want to return for a second trial. 

The solution is to consolidate all of the sensors required for a sleep study into a single forehead-mounted sensor array pad, which is a project being developed by my mentor. This single system, adhered to the forehead, would house four electrodes to measure brain waves via an electroencephalogram (EEG) as well as two chips: one to measure the participant’s pulse oximetry and another to process data from the electrodes and transmit all data from the sticker’s sensors wirelessly to a remote computer for processing and analysis. In addition, a thin-film battery would power the patch’s integrated circuits. This work focuses on the commercially available Maxim Integrated MAXREFDES117# infrared and red light sensor chip [1], to be used for the pulse oximetry measurement. During this project, the chip was mounted in a prototype forehead patch to evaluate whether it is accurate enough to be used in a future stand-alone pad monitoring system.


Most doctors and scientists today use the finger pulse oximeter, which is a device that clamps onto the finger to read a person’s heart rate and blood oxygen level, or saturation (SpO2). However, this project uses the MAXREFDES117# chip, mounted on a patch that can be applied to a subject’s forehead to implement the same functionality. In the prototype pulse oximetry measurement system, an Arduino Uno is used to stream out the data to be analyzed on a computer, but this would be replaced by an integrated processing chip in a future forehead pad system. 

Both devices rely on photoplethysmography (PPG) [2]. This method emits weak red or infrared light, which is absorbed by many different tissues in the body but most strongly by the blood. Oxygenated blood absorbs a different amount of light than deoxygenated blood because oxyhemoglobin and deoxyhemoglobin absorb light differently. Thus, the sensor can determine what percentage of one’s blood is oxygenated, which is the SpO2. While finger pulse oximeters emit light from the top and detect how much has passed through the finger at the bottom, the Maxim Integrated chip emits light into the skin and detects how much is reflected.
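To make the idea concrete, the following Python sketch shows one common way pulse oximeters turn red and infrared readings into an SpO2 estimate, the "ratio of ratios." It is an illustration under stated assumptions, not the algorithm running on the Maxim chip: the function names and sample values are hypothetical, and the linear formula 110 − 25R is a commonly cited textbook approximation, whereas real devices rely on device-specific calibration.

```python
def ac_dc(samples):
    """Split a light signal into its pulsing part (AC) and steady part (DC)."""
    dc = sum(samples) / len(samples)  # average light level
    ac = max(samples) - min(samples)  # peak-to-peak swing caused by the pulse
    return ac, dc


def ratio_of_ratios(red, ir):
    """R = (AC_red / DC_red) / (AC_ir / DC_ir)."""
    ac_red, dc_red = ac_dc(red)
    ac_ir, dc_ir = ac_dc(ir)
    return (ac_red / dc_red) / (ac_ir / dc_ir)


def estimate_spo2(r):
    """Textbook linear approximation, clamped to 0-100%. Not a device calibration."""
    return max(0.0, min(100.0, 110.0 - 25.0 * r))


# Hypothetical raw sensor counts over one heartbeat.
red = [1000, 1010, 1000, 990]
ir = [2000, 2040, 2000, 1960]

r = ratio_of_ratios(red, ir)
spo2 = estimate_spo2(r)
print(r, spo2)  # 0.5 97.5
```

A smaller R means the red signal pulses less strongly relative to the infrared signal, which corresponds to more oxygenated blood.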

Pulse oximetry is critical in sleep studies. Pulse oximetry data can be used to diagnose in excess of 109 different sleep disorders [3], and information about these disorders is essential for sleep scientists and patients. One of the most common sleep disorders is Obstructive Sleep Apnea (OSA), which may cause snoring. OSA occurs when the throat muscles relax and a person’s airway narrows, disrupting sleep. In addition to causing severe sleep deprivation, OSA can increase a person’s risk for cardiovascular disease. Prior work by Chiang [4] demonstrates that SpO2 measurements were highly effective for diagnosing sleep apnea in primary care hospital patients, which further illustrates the importance of pulse oximetry measurements in sleep studies.

Previous work by Longmore [5] looked into which parts of the body give the best heart rate, SpO2, and respiration rate readings when using reflective photoplethysmography, the same technology that the Maxim Integrated chip in this study uses. The study found that the best place to measure both heart rate and SpO2 was the forehead, but the forehead was one of the worst spots to measure respiration rate. Because this project is aimed at recording SpO2 data for sleep studies, Longmore’s work supports the idea that a forehead setup could be a viable alternative to commercial finger pulse oximeters.

Test Setup


  • Black cardstock paper
  • 1 3M Tegaderm Transparent Film Dressing
  • 5 Elegoo Dupont wires
  • 1 Arduino Uno
  • Superglue
  • Green tape
  • 1 Maxim Integrated MAXREFDES117# HR and SpO2 measurement chip

Physical Construction:

A 1×1 cm square was cut in the center of a Tegaderm forehead sticker. Then, the Maxim Integrated chip was inserted into this hole and secured using small strips of green tape. Next, two sheets of black cardstock were cut in the shape of the forehead sticker, superglued together, and then glued onto the sticker. Five Dupont wires connected the chip to an Arduino Uno, which was then connected to a computer via USB. A slight curvature was added to the forehead sticker system.

The chip itself provides the best data when operating in dark conditions with a good point of contact between the glass sensor box and the skin. The Tegaderm sticker adequately provided a strong seal between the forehead and the sticker system, and pressed the sensor surface firmly against the skin, for optimal positioning. In addition, the black paper backing reduced the amount of light shining through the semi-transparent sticker in order to maintain a dark environment for the chip for best operation. Bending the sticker slightly created a better seal with the forehead.

Images of my prototype: front view, side view, back view, and connected to the Arduino and computer.


The Arduino Uno was programmed to take in a constant stream of infrared and red light data from the chip over its I2C interface, and to calculate a new heart rate and SpO2 value every second. The data values were then sent to the serial port in a specific format so that they could be read into Excel for data analysis, using the PLX-DAQ Microsoft Excel macro [6].
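The serial format that PLX-DAQ reads is a stream of comma-separated lines, each starting with a directive such as LABEL (column headers) or DATA (one spreadsheet row). The following is a minimal sketch of that row format, written in Python for illustration only; it is not the Arduino firmware used in this project, and the column names and the helper names plx_label and plx_data are hypothetical.

```python
def plx_label(*columns: str) -> str:
    """Build a header line; the LABEL directive names the spreadsheet columns."""
    return "LABEL," + ",".join(columns)


def plx_data(heart_rate_bpm: int, spo2_percent: int) -> str:
    """Build one data line; PLX-DAQ fills in the TIME keyword with the PC clock time."""
    return f"DATA,TIME,{heart_rate_bpm},{spo2_percent}"


# Illustrative values only, not measurements from this study.
print(plx_label("Time", "HeartRate", "SpO2"))
print(plx_data(72, 98))
```

On the Arduino itself, equivalent lines would be written to the serial port once per second, one DATA line per new heart rate and SpO2 value.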


Two and a half thousand data points were collected in order to adequately test the accuracy of this forehead-mounted Maxim Integrated pulse oximetry chip. Below are some of the graphs of the accumulated data.

Two hundred fifty seconds’ worth of forehead pulse oximetry data and finger pulse oximetry data were collected from each of three subjects while they were at rest. The forehead setup streamed its data from the forehead sticker, while the finger pulse oximetry data was recorded manually. Using the conventional finger pulse oximeter readings as the accurate baseline reference, the chip’s measurements were compared against the baseline to quantify the error.
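The comparison described above can be sketched as follows. This is a minimal example under stated assumptions: the error is taken as the absolute percent difference from the finger reading, and the standard deviation is the population form; the text does not specify either choice. The function name error_stats and the sample readings are hypothetical and are not data from this study.

```python
from statistics import mean, pstdev


def error_stats(forehead: list[float], finger: list[float]) -> dict[str, float]:
    """Summarize the percent error of forehead readings against finger readings."""
    if len(forehead) != len(finger):
        raise ValueError("both series need one reading per time step")
    # Assumed definition: absolute difference as a percentage of the baseline.
    errors = [100.0 * abs(f - r) / r for f, r in zip(forehead, finger)]
    return {
        "max": max(errors),
        "mean": mean(errors),
        "std": pstdev(errors),  # population standard deviation (assumption)
    }


# Made-up SpO2 readings for illustration only.
stats = error_stats(forehead=[100, 98, 99, 99], finger=[100, 100, 100, 100])
print(stats)
```

Running the same function once per subject would give the maximum error, average error, and standard deviation in the form reported for each subject.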

Subject 1 (13-year-old male):

    Maximum error: 9.2%
    Average error: 0.67%
    Standard deviation: 1.7%

Subject 2 (50-year-old female):

    Maximum error: 3.1%
    Average error: 0.36%
    Standard deviation: 1.7%

Subject 3 (50-year-old male):

    Maximum error: 5.2%
    Average error: 1.3%
    Standard deviation: 1.5%

Across all three subjects, the average percent error was 0.78%, with a standard deviation of 1.63%, as shown in the summary table below:

                  Subject 1   Subject 2   Subject 3   Total
Mean Error        0.67%       0.36%       1.3%        0.78%
Std. Deviation    1.7%        1.7%        1.5%        1.6%
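The totals are consistent with an unweighted average of the three per-subject values. The short check below assumes that is how they were obtained, which the text does not state explicitly; the variable names are illustrative.

```python
from statistics import mean

# Per-subject values as reported for Subjects 1, 2, and 3 (percent).
mean_errors = [0.67, 0.36, 1.3]
std_devs = [1.7, 1.7, 1.5]

# Assumption: each total is the simple average of the per-subject values.
overall_mean_error = round(mean(mean_errors), 2)
overall_std_dev = round(mean(std_devs), 2)

print(overall_mean_error, overall_std_dev)
```

Because each subject contributed the same number of seconds of data, an unweighted average of the mean errors equals the mean error over the pooled readings. The same is not true in general for standard deviations, so the combined standard deviation is best read as a summary figure.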


Given the average percent error of 0.78% and the 1.6% standard deviation, this work has demonstrated that a forehead measurement of SpO2 based on the commercial Maxim MAXREFDES117 infrared and red light sensor chip is sufficiently accurate for clinical studies, and removes the need for the inconvenient finger sensor typically used for SpO2 measurements. While the prototype setup used an Arduino Uno to interface with the chip and provide the measurement readout, this functionality can be readily miniaturized to achieve the ultimate goal of replacing all the wired telemetry sensors in a sleep study with a single patch applied to the forehead. This work therefore achieved its goal of demonstrating the viability of the infrared and red light SpO2 sensor.

Throughout the testing phase, several limitations with the prototype were uncovered. The chip’s readings became more inconsistent and less accurate if light leaked through the sticker and into the environment around the chip. In addition, participants needed to scrub their skin thoroughly in order to produce consistent, accurate data with the prototype. Lastly, the chip requires firm placement against the skin but it cannot be pressed into the forehead; one participant applied too much force to the sticker and pressed the chip into his forehead, which resulted in a number of data points where the chip did not take a usable reading. 

Beyond this forehead-mounted sensor array project, this methodology could be employed in other applications where SpO2 is a salient data point, such as devices enabling the early and efficient diagnosis of COVID-19 [7], health-sensing wearables, etc.


I’d like to thank my mentor Joon Yang and Professor Ada Poon for their guidance and support, and for this opportunity to explore the field of bioengineering and meaningfully contribute to ongoing research. I’d also like to thank Professor Tsachy Weissman and Cindy Nguyen of the Stanford STEM to SHTEM Program for an enlightening and engaging experience this summer.


[1] Maxim MAXREFDES117#: Heart-Rate and Pulse-Oximetry Monitor.  Retrieved July 10, 2020, from

[2] How Does PPG Technology Works? – SoulFitBlog. (2018, April 14). Retrieved July 10, 2020, from

[3] (2018, March 19). Here is what no one tells you about Pulse Oximetry for Sleep. Retrieved August 08, 2020, from

[4] Chiang, L. (2018). Overnight pulse oximetry for obstructive sleep apnea screening among patients with snoring in primary care setting: Clinical case report. Retrieved July 10, 2020, from

[5] Longmore, S., Lui, G., Naik, G., Breen, P., Jalaludin, B., & Gargiulo, G. (2019, April 19). A Comparison of Reflective Photoplethysmography for Detection of Heart Rate, Blood Oxygen Saturation, and Respiration Rate at Various Anatomical Locations. Retrieved July 10, 2020, from

[6] Parallax Data Acquisition tool (PLX-DAQ) software add-in for Microsoft Excel. Retrieved July 10, 2020, from

[7] Pathak, N. (2020, April 28). What Is a Pulse Oximeter and Can It Help Against COVID-19? Retrieved August 07, 2020, from