# Creating Multi-Platform Immersive Narratives to Illustrate the Effects of AI Algorithms on Human Behavior and Thought Patterns

Ritali Jain, Alexander Nguyen, Houda Miftah, Julian Reed, Kepler Boyce, Nina Franz, Cecilia Colberg, Audrey Edwards, Ayushman Chakraborty

## Abstract

Artificial intelligence algorithms on online platforms have come to dominate our lives and our time, shaping our behavior and opinions without our awareness. The clearer the picture these algorithms can paint of a user, the more they can manipulate that user to act and think in ways that are profitable to the technology monopolies of our current world. To emphasize the massive scale of data collected by these companies, our team designed an immersive shopping website, Sahara Prime: The Simulated Experience, to demonstrate to users how much information can be collected on them in just twenty to thirty minutes. Our website can be accessed through the following link: shtem.herokuapp.com. This project and paper were crafted to explain and raise awareness of the infringement of users’ privacy that occurs through these highly advanced algorithms. We believe that this platform will lead others to do their own research on how they can protect their privacy and resist surveillance capitalism, so that each individual has more control over their personal information. With this, we can break the veil of ignorance that has allowed us to be manipulated by our screens for so long.

## Background

The role of algorithms in our daily lives has been expanding with the rise of digital platforms, including social media, e-commerce, and online services. The Covid-19 pandemic has contributed to the digital surge, with usage of Internet services rising by 40% to 100% compared to pre-lockdown levels. Digital platforms offer large technology companies, often monopolies like Amazon, Facebook, and Apple, more mechanisms to collect data about our tendencies and demographics through our digital trace. Our online presence generates a massive amount of data, including search history, previous purchases, watch history, which posts we like on social media, and more. This raw data is fed into advanced machine learning algorithms that use analytics to determine our behavior, interests, and persona. These insights undergo pattern discrimination, the addition of identifiers to input data to filter information, in order to divide users into segments for purposes such as advertising or political campaigning.

### Algorithms

We begin by defining key terms related to the concept of algorithmic targeting: privacy, surveillance capitalism, persuasive technologies, and the attention economy. Privacy is the right to keep personal information to yourself and/or trusted individuals rather than accessible in the public domain. Through terms of service, cookies, and other mediums, users often exchange their privacy for online services. Companies benefit from access to this data because of surveillance capitalism, the economic system that aims to collect as much of users’ personal data as possible for financial gain, since data insights are sold to businesses and advertisers. In order to increase the amount of data gathered, technology companies aim to maximize the amount of time that people spend on their platforms. This is the concept of the attention economy, in which Internet users pay with their attention for the free services available to them, which tend to be persuasive technologies designed to change humans’ thought patterns. Essentially, when users are not paying for a product, they tend to ‘be’ the product, since tech monopolies make a solid profit from selling data to third parties. With this data they can also modify their products to be more enticing, such as by using algorithms to pinpoint your interests and thereby determine which product recommendations suit you most.

Surveillance capitalism has become innate to our economy, and as a result, consumers do not realize the adverse effects of algorithms and the attention economy on different facets of their lives, such as screen time. Building on this foundation on algorithms, we aimed to produce a captivating storyline that raises awareness of these concerning realities.

### Data Collection

It is impossible to discuss surveillance capitalism without elaborating on data collection. The amount and nature of data collected tells a story: the story of you. Depending on the nature of the technology offered, companies may be collecting any or even all of the following protected private information: first and last name, gender, date of birth, age, location, IP address, mailing address, email address, phone number, payment details, and credit card information.

Companies also accumulate data surplus, or data that is not necessary to the functionality of the service or experience. Generally, a company may collect data such as social media accounts (if you log in using them), clicks, time spent on a website, reviews, browsing data, language, employment history, education history, weight, height, and body measurements (e.g. estimated stride and shoe/foot size). Google Analytics is a tool that acquires data from each website visitor by inserting JavaScript page tags into the code of each webpage so that data from each visitor’s browser can be sent to Google’s servers.

Companies also rely on cookies for data collection, as they comprise a unique user ID and a site name. Cookies enable websites to retrieve this information when users revisit them, so that they can remember users and their preferences and tailor page content accordingly. When users accept the use of cookies, they make the collection of their personal information easier. Often, major corporations require that users accept cookies so that they can improve their services and products, improve customer experience, and curate their products. However, data collection benefits the corporation more than the user, as personal data is sold to third parties in order to send targeted ads to users.
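The cookie mechanism described above can be sketched in a few lines. This is a minimal illustration rather than any particular company's implementation: the cookie name `visitor_id`, the one-year lifetime, and the function names are all assumptions made for the example, and the ID generator is passed in so the logic stays independent of any runtime.

```typescript
/** Parse a raw `Cookie` request header ("a=1; b=2") into a name -> value map. */
function parseCookieHeader(header: string): Map<string, string> {
  const cookies = new Map<string, string>();
  for (const pair of header.split(";")) {
    const index = pair.indexOf("=");
    if (index === -1) continue; // skip malformed fragments
    const name = pair.slice(0, index).trim();
    const value = pair.slice(index + 1).trim();
    if (name) cookies.set(name, value);
  }
  return cookies;
}

interface VisitorResult {
  visitorId: string;
  isReturning: boolean;
  /** `Set-Cookie` header value to send back; only present for new visitors. */
  setCookie?: string;
}

// Hypothetical cookie name and lifetime, chosen for illustration only.
const COOKIE_NAME = "visitor_id";
const ONE_YEAR_IN_SECONDS = 60 * 60 * 24 * 365;

/**
 * Recognize a returning visitor from their Cookie header, or mint a new ID.
 * `newId` is injected so that callers decide how IDs are generated.
 */
function identifyVisitor(cookieHeader: string, newId: () => string): VisitorResult {
  const existing = parseCookieHeader(cookieHeader).get(COOKIE_NAME);
  if (existing) {
    return { visitorId: existing, isReturning: true };
  }
  const visitorId = newId();
  return {
    visitorId,
    isReturning: false,
    setCookie: `${COOKIE_NAME}=${visitorId}; Max-Age=${ONE_YEAR_IN_SECONDS}; Path=/; SameSite=Lax`,
  };
}
```

On a first visit the function returns a `setCookie` value for the server to send; on every later visit the browser sends the same ID back, which is what lets a site link separate sessions to one person.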

### Why Social Media is Free

Although platforms like YouTube offer users an engaging and personalized experience, it is key to recognize that social media corporations benefit from the attention economy and surveillance capitalism. Dissecting their business model is fundamental to understanding why social media is free. Although their revenue is mostly based on personalized ads from firms, social media companies go beyond just allowing the promotion of products on their platforms. They serve as a cage for the guinea pig: the user. By analyzing user data from different inputs, they can sell inferences to corporations who are looking for an easy way to target their ideal customers.

To drive the attention economy, social media platforms use features that research has shown are addictive, such as push notifications, typing awareness indicators, and banner ads. Platforms also serve up media that is predicted to match your tastes and preferences, which can have devastating impacts. For instance, Instagram and Facebook have contributed to the rise of political polarization.

Algorithms identify users’ political beliefs and update their feeds to match those views in order to tempt them to click and engage with the media. This results in a rabbit hole where the algorithm suggests articles with increasingly radical ideas, and these become the primary source of information that the user consumes. Another example of an unhealthy user feed arises when an algorithm learns that a user is depressed and updates the feed with ads for vape pens and other dangerous influences.

### Insight from Workshops

Throughout our journey, we participated in a multitude of workshops, not only to gain a deeper understanding of the dangers of surveillance capitalism, but also to learn about the different ways we could create an interactive piece of theater that conveys this information to our audience. Theater is not limited to a production on stage: it can also encompass an interactive phone call or a science-fiction role-playing game. We learned that having the audience become a large part of the performance, rather than merely watching it, further immerses them in the story, as they essentially become the story. This allows the audience to resonate more strongly with the message of the story being presented, which we wanted to take advantage of to convey our message in the most impactful way. These workshops helped us unlock the full potential of our creativity as well as expand our understanding of theater, inspiring the experience we created.

### The Social Dilemma

During our preliminary research, we watched two documentaries. The first, The Social Dilemma, introduces the ethics of algorithms and the tactics that companies use not only to collect our data but also to keep us engaged, so that they can expose us to as many ads as possible and increase their revenue. The documentary personifies the algorithms as three men. These men are not seen as caring for the user, but rather as manipulating and taking advantage of the user for profit. Another key detail is that as the user spends more time on their phone, the algorithms obtain a better idea of who the user is, which is symbolized by a holographic image of the user that becomes more detailed over time.

We also watched a documentary about surveillance capitalism featuring Harvard Professor Shoshana Zuboff. The documentary introduced us to multiple case studies in which data was used and sold to companies that delivered extremely effective ads to their targeted audiences. One case study that stood out was that of a pregnant woman. Using the data collected from the woman, Target’s algorithms were able to figure out that she was pregnant before her family found out. Such case studies showed us the true power of data collection and inspired multiple aspects of the project.

## Objectives & Rationale

Following our literature review on algorithms, surveillance capitalism, attention economy, and data collection, we found it pertinent to convey our research to a general audience. In particular, our goal was to illustrate the inherent monopolies that companies utilizing data and algorithms have established in our society, not just controlling economic markets, but dominating our daily lives. Algorithms dictate consumer spending, marketing, politics, and screen time among other facets, and our constant provision of data online no matter how innocuous fuels these algorithms and improves their ability to predict future behavior.

Our primary objective in this project was to illustrate the unwavering influence of surveillance capitalism via a story that is executed by harnessing online platforms. We chose to convey our research through an immersive experience so that users can realize for themselves, within a short span of time, the nature of the data collected from them as they walk through the story alone.

We also wanted to show how platforms that appear to be free to use, like social media applications, rely on a much more ephemeral currency – our time and our attention. Using a shopping website as our driving mechanism served this need and also fulfilled the setting where much of surveillance capitalism occurs in reality. Through this vehicle, we were able to collect personally identifiable information (PII) that is necessary for the functionality of a shopping website, but more importantly micro-data that we were able to piece together to make inferences about our user. The purpose was to show how predictive algorithms can use micro-data to make inferences and suggestions for you, oftentimes determining aspects of a person unknown to that person’s conscious mind, and extending the time spent on those platforms.

## Methods

To accomplish our aims, we created Sahara Prime: The Simulated Experience, an immersive shopping narrative. Our goal was to incorporate several platforms throughout the immersive experience in order to guide the user through each layer of our story. We wanted to start with a website that would mimic a high-end fashion store and replicate some of the features that users would be accustomed to. This initial layer of our story aimed to not only collect and compile a user profile, but to also display how valuable data collection is. The next layer of our experience leads the user to a phone call in which an automated voice directs them to a data leak, in which the user is able to view all the data that Sahara Prime has collected on them. Phase alpha is also initiated, a process aimed at detaching the body from the algorithm. The user is then led through the research we collected to reveal the main purpose of Sahara Prime: to inform users on the importance of resistance in the age of attention economy. The last platform we were able to incorporate was the use of email, which was used to wrap up the experience and send more information about resistance. Through this multi-layered experience, we were able to effectively construct an immersive shopping narrative that is both entertaining and informative.

### Tools

We used many tools and frameworks to create our website; as such, we need to define several terms. Our website was built upon the Node.js runtime and package ecosystem, meaning it was written in the JavaScript programming language. In our case, however, we used TypeScript, a statically typed superset of JavaScript. Put simply, this makes our code more robust by catching many errors at compile time, before the code ever runs.

Since our website requires dynamic functionality, we used Next.js, a full-stack application framework that leverages React for frontend development. React is a JavaScript library that allows one to write reusable “components” in JSX or TSX (syntax extensions to JavaScript and TypeScript, respectively), which render particular HTML content. This removes the need for independent HTML files and script files, allowing for a much cleaner codebase: all of the content that is rendered on a page and the logic for that page’s functionality are self-contained in a single JSX or TSX file. In addition, content that needs to be reused in many places on the site can be stored in a React component, following the “DRY” principle of programming: Don’t Repeat Yourself. With traditional HTML and JavaScript, our website source code would have been much larger and more difficult to manage. React is currently one of the most popular frontend libraries and appears in many frontend web development job postings, which reflects how effective it is at shortening development time and improving code maintainability.

To further streamline development, we used a tool called Tailwind CSS, which almost entirely eliminates the need to write custom CSS by providing premade utility classes. Rather than creating verbose CSS files, one can instead do the vast majority of styling by applying Tailwind’s class names to HTML elements. For example, styles such as

```css
div#example {
  display: flex;
  flex-direction: column;
  gap: 0.5rem;
  justify-content: flex-end;
}
```

that would appear in an external CSS file become simplified to inline class names:

```tsx
<div className="flex flex-col gap-2 justify-end" />
```

Further benefits of the Next.js framework include its server-side rendering and built-in API routes. Our website required an email server for a few elements, and because email servers can only run on the backend, we needed some form of backend. Next.js does not offer a complete replacement for a traditional backend, but it allows one to create web API endpoints very easily, which was sufficient for our needs. These API endpoints receive HTTP requests from the frontend containing information such as the email address and contents of the email, and the email server sends an email in accordance with the provided information.
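As a hedged sketch of what such an endpoint has to do before anything is sent, the function below validates the JSON body that the frontend is assumed to post. The field names (`to`, `subject`, `text`), the simple address pattern, and the function itself are illustrative assumptions, not our production code; the wiring into a Next.js API route and the mail library call are deliberately left out.

```typescript
/** Shape the frontend is assumed to POST as JSON (field names are hypothetical). */
interface EmailRequest {
  to: string;
  subject: string;
  text: string;
}

type ValidationResult =
  | { ok: true; email: EmailRequest }
  | { ok: false; status: 400; error: string };

// Deliberately loose pattern: something@something.tld with no spaces.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

/** Check an untrusted request body before handing it to the mail server. */
function validateEmailRequest(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, status: 400, error: "Request body must be a JSON object" };
  }
  const { to, subject, text } = body as Record<string, unknown>;
  if (typeof to !== "string" || !EMAIL_PATTERN.test(to)) {
    return { ok: false, status: 400, error: "Field 'to' must be a valid email address" };
  }
  if (typeof subject !== "string" || subject.trim() === "") {
    return { ok: false, status: 400, error: "Field 'subject' must be a non-empty string" };
  }
  if (typeof text !== "string" || text.trim() === "") {
    return { ok: false, status: 400, error: "Field 'text' must be a non-empty string" };
  }
  return { ok: true, email: { to, subject, text } };
}
```

An API route handler would call `validateEmailRequest` on the parsed body, respond with the 400 status and error message on failure, and pass the validated `email` object to the mail server on success.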

Because browsers can only run JavaScript (with the exception of WebAssembly, though this still requires JavaScript to access and edit the DOM), the React frontend source gets compiled to optimized HTML and JavaScript for the final production build.

### Currency

Instead of using the traditional coins or tokens system, we established heartbeats as our form of currency for Sahara Prime. The purpose for choosing this was to make users grasp just how much time they allow screens and websites to take. By making the focus on heartbeats, the user can realize how many precious seconds are wasted for the gain of the attention economy. We believe this will have a lasting effect on our audience and will make them ponder on the quality of life we are all missing out on because of our inability to turn off our electronic devices.

### Collecting Personal Data

On our account setup page, we collected PII such as email address, full name, phone number, and birthdate, as well as whether the user has a preference for visual guides to the audio components of the website. Our website also includes a preliminary survey to gather users’ interests and demographics, such as preferred brands, ethnicity, gender, and level of introversion vs. extroversion. Throughout the website, we collect data on the amount of time spent and the number of clicks on different pages and products to infer which components pique the user’s interest.
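The per-page click and time tracking described above can be approximated with a small class. This is a sketch under stated assumptions, not the site's actual code: the class name, method names, and the injected clock are all hypothetical, and a real page would call these methods from navigation and click event handlers.

```typescript
interface PageStats {
  clicks: number;
  millisecondsSpent: number;
}

/** Accumulates clicks and dwell time per page (all names here are illustrative). */
class InteractionTracker {
  private readonly stats = new Map<string, PageStats>();
  private readonly now: () => number;
  private currentPage: string | null = null;
  private enteredAt = 0;

  /** `now` returns a timestamp in milliseconds; it is injectable for testing. */
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  private statsFor(page: string): PageStats {
    let entry = this.stats.get(page);
    if (!entry) {
      entry = { clicks: 0, millisecondsSpent: 0 };
      this.stats.set(page, entry);
    }
    return entry;
  }

  /** Call on navigation: closes out the previous page and starts timing the new one. */
  enterPage(page: string): void {
    this.leavePage();
    this.currentPage = page;
    this.enteredAt = this.now();
  }

  /** Call when the user leaves: adds the elapsed time to the current page. */
  leavePage(): void {
    if (this.currentPage === null) return;
    this.statsFor(this.currentPage).millisecondsSpent += this.now() - this.enteredAt;
    this.currentPage = null;
  }

  /** Call from a click handler; clicks outside any tracked page are ignored. */
  recordClick(): void {
    if (this.currentPage !== null) {
      this.statsFor(this.currentPage).clicks += 1;
    }
  }

  /** The page with the most accumulated time, or null if nothing was tracked. */
  mostEngagingPage(): string | null {
    let best: string | null = null;
    let bestTime = -1;
    for (const [page, { millisecondsSpent }] of this.stats) {
      if (millisecondsSpent > bestTime) {
        best = page;
        bestTime = millisecondsSpent;
      }
    }
    return best;
  }

  snapshot(): Record<string, PageStats> {
    return Object.fromEntries([...this.stats].map(([page, s]) => [page, { ...s }]));
  }
}
```

Even these two signals are enough to rank a user's interests: the page with the longest dwell time and the most clicks is a reasonable first guess at what they want to buy.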

For the user to earn enough currency to buy their desired product, we offered several microtasks that they had to complete, including surveys, captchas, video ads, image ads, and signing up for our newsletter. For the surveys and captchas, we embedded questions that seemed playful but would give algorithms a lot of important predictive information. One such disguised question was ‘Which president would be the worst at video games?’ with options of ‘Abraham Lincoln’, ‘Millard Fillmore’, ‘Barack Obama’, ‘Donald Trump’, and ‘Ulysses S. Grant’. The question could be used to reveal any potential political biases of the user.

### Design Choices

On top of applying commonly used data collection tactics in our very own site, we also focused on incorporating satire in our website. Satire allowed us to exaggerate certain components of our website that users often take for granted. We applied satire in the terms and conditions page, where we made the user scroll for at least ten seconds in order to reach the bottom of the page. Our terms and conditions also evidently included nonsensical clauses. By doing so, we hoped that the user would become conscious about the terms and conditions methods that other companies use to get the user’s permission to collect data without them even knowing it. Furthermore, in our captcha task, we included some questions that were impossible to answer correctly. Again, this use of satire illustrated how human verification captchas are persistent forms of data collection on websites.

At the end of checkout, our website is ‘hacked’ by a fictional youth resistance group, The New Generation (the TNGers). The rest of the experience is predominantly outsourced to a phone call, with the exception of showing how much data was collected about the user and what inferences were made. The phone call discusses methods of resistance by walking the user through a meditation-like experience and illustrating the effects devices have on our lives. We chose this multi-platform approach to increase the engagement and awareness that the experience provides by helping the user gradually transition from our site to joining the resistance against surveillance capitalism.

### Limitations

Given the nature of our project, certain limitations arise. First, the inferences we made about our users were created using conditionals rather than a trained algorithm. Developing a machine learning model that could make inferences from the particular data we collected would require a very large set of training data with our custom features, which we did not have the time or resources to generate. Second, although users can view their data on Sahara Prime, we built the site on localStorage, which stores all information only in the user’s browser, rather than on a database. We opted not to retain personally identifiable information from our users in order to stay true to our principles. Furthermore, we are currently testing our experience with a sample audience, and the feedback had not been returned at the time of writing. We hope to publish our results and make them available upon request once we obtain a large, diverse pool of feedback from our audience.
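To make the conditional-based approach concrete, the sketch below pairs a few hand-written rules with browser-side storage. Everything specific in it is an assumption for illustration: the profile fields, the storage key, the thresholds, and the wording of each inference are invented here and are not the rules used on Sahara Prime.

```typescript
/** A small slice of what a site might collect (fields chosen for illustration). */
interface CollectedProfile {
  birthYear: number;
  preferredBrands: string[];
  secondsOnPage: Record<string, number>;
}

/** The subset of the Web Storage API used here; `window.localStorage` satisfies it. */
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const PROFILE_KEY = "sahara-profile"; // hypothetical storage key

function saveProfile(store: KeyValueStore, profile: CollectedProfile): void {
  store.setItem(PROFILE_KEY, JSON.stringify(profile));
}

function loadProfile(store: KeyValueStore): CollectedProfile | null {
  const raw = store.getItem(PROFILE_KEY);
  return raw === null ? null : (JSON.parse(raw) as CollectedProfile);
}

/**
 * Hard-coded conditionals standing in for a trained model.
 * Every threshold and label below is an invented example.
 */
function inferTraits(profile: CollectedProfile, currentYear: number): string[] {
  const inferences: string[] = [];

  const age = currentYear - profile.birthYear;
  if (age < 25) {
    inferences.push("likely a student or early-career shopper");
  }

  if (profile.preferredBrands.length >= 3) {
    inferences.push("brand-conscious");
  }

  // Find the page with the longest dwell time, if any pages were recorded.
  const [topPage] = Object.entries(profile.secondsOnPage).sort((a, b) => b[1] - a[1]);
  if (topPage && topPage[1] >= 60) {
    inferences.push(`strong interest in ${topPage[0]}`);
  }

  return inferences;
}
```

In the browser, `window.localStorage` can be passed as the `store` argument, so the profile never leaves the user's device. The trade-off of rules like these is that they are transparent and need no training data, but they only ever find the patterns their authors thought to write down.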

## Conclusion

The significance of our piece lies in the context behind it. Through our use of the heartbeats currency and microtasks, we illustrated the famous saying: if you are not paying for the product, then you are the product. Through our data, we are being sold to companies who want to know more about us so that they can pitch a very personalized ad. This process is how users eerily get ads for products that capture exactly what they were looking for. By simulating the website crash and the data leak, we also conveyed how personal data collected by companies is susceptible to cyber attacks. Even if a security threat of such a magnitude never occurs, the business model of surveillance capitalism requires corporations to sell data in order to earn revenue.

Throughout this process, we also came to learn about how theater is evolving into an experience that does not clearly define the role of the performer and the audience. With new digital mediums, theater is adapting to more immersive pieces, where audiences play a role in the story. This feedback loop between the audience and the performers is called autopoiesis and is easier to embed in performances taking place in the new digital mediums that are at society’s disposal today.

Interestingly, we noticed that the techniques used to create theatrical pieces are very similar to algorithms used by companies. Both collect user input and use that input to shape the user experience. Thus, they both use autopoiesis. Is it possible to conclude that the algorithms share similar aspects with theater?

The answer depends on the definition of theater. Is theater centered around interaction? Or is it the story lines that make up theater? We believe theater is the performance of creating a strong user experience that is personalized to the audience. The data-collecting and inference-making algorithms make technology companies a form of theatrical display, as AI algorithms make these platforms a perfect medium for personalized user interaction. With more progress on deep learning models, AI becomes much smarter, and soon enough, we will have to dig deeper into the Turing test, the test of whether a machine’s behavior can be told apart from a human’s. With AI becoming more human, each experience on the Internet becomes more interactive and more focused on developing an impactful human experience.

In our research, we focused heavily on the impact of surveillance capitalism and the attention economy on our lives. Thus, it was crucial for us to highlight the importance of resisting surveillance capitalism. Resistance is important because surveillance capitalism reduces personal privacy and narrows choices through the creation of algorithmic echo chambers. We distinguished two methods of resistance to surveillance capitalism: radical and analog. Radical resistance focuses on changing device settings, such as turning on private browsing mode, rejecting certain cookies in browsers, switching to search engines that don’t collect users’ private data (e.g. DuckDuckGo, Startpage, Ecosia), turning off push notifications to prevent distractions, and setting the screen to grayscale to reduce screen time. Analog methods of resistance focus on improving users’ mental health and overall well-being. They include practicing mindfulness and gratitude, exploring new hobbies, making bedrooms screen-free, and developing positive habits such as choosing one day per week to set your phone aside or putting a hairband around your phone (the hairband allows users to answer phone calls easily, but makes other uses of the phone more difficult).

## Future Directions

Currently, most of the algorithms that make inferences on Sahara Prime use hard-coded conditions. We hope to eventually implement deep learning algorithms that train themselves to output more information about the user based on the input data. We also want to find more ways to show the amount and impact of screen time. Our way of illustrating the value of the data was to show the massive amount that we collected in the twenty minutes of the experience. One feature that we hope to put into effect would quantify the value of a given user’s data using the number of clicks or the amount of time spent on a page, as these indicate how much a user interacts with the site and how much data could be collected during that time span. By doing so, we hope to keep the audience aware of the data collection methods embedded in our website.

## Acknowledgements

This project would not have been possible without the support of our mentors and the STEM to SHTEM team. Our team would like to acknowledge and thank Devon Baur and Marieke Gaboury for their guidance and inspiration. We would also like to express our gratitude to Professor Tsachy Weissman for creating the Stanford Compression Forum Internship and to Sylvia Chin for running the program. We are grateful for this unique opportunity to explore novel topics, conduct research, and gain insightful feedback.

## References

Berrios, Giovanni, et al. Cyber Bullying Detection System. 6 May 2020, https://engineering.ucdenver.edu/docs/librariesprovider29/college-of-engineering-and-applied-science/sp2020-capstone/csci14-report.pdf?sfvrsn=d3731fb9_2.

Cohn, N. (2014, June 12). Polarization is dividing American society, not just politics. The New York Times. Retrieved August 5, 2022, from https://www.nytimes.com/2014/06/12/upshot/polarization-is-dividing-american-society-not-just-politics.html

De’, R., Pandey, N., & Pal, A. (2020). Impact of digital surge during Covid-19 pandemic: A viewpoint on research and practice. International journal of information management, 55, 102171. https://doi.org/10.1016/j.ijinfomgt.2020.102171

Ioanăs, Elisabeta, and Ivona Stoica. Social Media and its Impact on Consumers Behavior. Report no. 1, 2014. International Journal of Economic Practices and Theories, https://training.unmuhkupang.ac.id/index.php/JAK/article/view/2

Kant, T. (2021). Identity, Advertising, and Algorithmic Targeting: Or How (Not) to Target Your “Ideal User.” MIT Case Studies in Social and Ethical Responsibilities of Computing, (Summer 2021). https://doi.org/10.21428/2c646de5.929a7db6

McDavid, J. (2020). The Social Dilemma. Journal of Religion and Film, 24(1), COV41+. https://link.gale.com/apps/doc/A616580373/AONE?u=googlescholar&sid=bookmark-AONE&xid=2ffcc915

Smith, Ben. “How TikTok Reads Your Mind.” The New York Times, 6 Dec. 2021, https://www.nytimes.com/2021/12/05/business/media/tiktok-algorithm.html

“Tips for Reducing Screen Time, Reduce Screen Time.” National Heart Lung and Blood Institute, U.S. Department of Health and Human Services, https://www.nhlbi.nih.gov/health/educational/wecan/reduce-screen-time/tips-to-reduce-screen-time.htm.

# Designing for Authenticity

Adanna Taylor, Chloe Zhu, Jared Rosales, Lilah Durney, Ryan Brunswick, Samip Phuyal, Selina Song

## Abstract

We have entered an age of information disorder; with the current design of the internet, it has become increasingly difficult for users to access, identify, and trust authentic information. Editing tools have made the alteration or fabrication of image and video content dangerously easy, leading to the vast amount of misleading and false information available online. Misinformation online has jeopardized the public’s trust in news media and the free press. Furthermore, disinformation is being used now more than ever for information warfare, which has had measurable effects on the political, social and economical climate of nations worldwide.

The imminent transition to Web 3.0 opens up an opportunity to redesign the internet using new technologies. Looking at the Starling Lab’s work on image verification, for instance, we see cryptography’s ability to secure and verify the history of a piece of visual content as a powerful tool to ensure authenticity. In this paper, we seek to explore and demonstrate how Web 3.0’s technology can be applied to solve the information disorder; we ask the question “How might we design for authenticity? And how might we visualize that design?”

All of our results can be found in this folder.

## Background

To understand Starling Lab’s work, it’s important to consider how we arrived at this time full of so much information disorder. Since its invention in 1989, the web has gone through three distinct phases: Web 1, Web 2, and Web 3 [1]. Web 1 gained popularity in 1991, and its structure was decentralized, with no single authority or group of authorities dictating the content that could be published. Next, around the year 2004, Web 2 emerged with the arrival of the social-mobile web and companies like Facebook. Based on user interaction and the spread of information via posting, commenting and messaging, this phase of the web saw a centralization of power in the hands of a few large companies, which resulted in many companies selling users’ data and information [2]. These characteristics of Web 2 are what have led to the spread of misinformation and disinformation, two terms that we will explore later in this paper. We are now on the cusp of Web 3, a new phase of the internet that strives to fix the unintended consequences of Web 2, like centralization and monopoly power. The goal of Web 3 is to create a system that removes power from bad actors, decentralizes control, and creates transparency [3]. Beyond the applications that have gained popularity in the past few years, like cryptocurrencies, we believe that Web 3 technologies can reduce “information uncertainty” and help bolster trust in digital content.

### Blockchain & Tech

We can specifically use cryptographic technologies to help us verify and authenticate information. The roadmap of verification starts when the creator, or Person A, takes a photo and saves it as a data file. They run the file through a cryptographic hashing program, software that returns a hash: a short, fixed-length fingerprint that changes if even a single bit of the file changes. Next, they sign the hash with their private key, verifying their identity and ownership [4]. Person A then registers their signature on a distributed blockchain ledger, a decentralized database that can be seen across different sites and is not controlled by a single, centralized company like Google or Facebook [4]. Person A then stores the file on a distributed storage network, which splits the data across multiple storage places, or servers. If one server is hacked, the others are not compromised, a security benefit unique to decentralized technology. To check the file, Person B verifies the signature with Person A’s public key and obtains the signed hash [5]. Person B then compares the hash from the digital signature with a hash computed from the data file they received [5]. If the hashes match, the data is verified and authentic. If the hashes differ, someone has changed the data file and it is not authentic.
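The hash, sign, and verify steps can be sketched with Node.js’s built-in `crypto` module. This is a minimal illustration under assumptions of our own: SHA-256 and Ed25519 are choices made for the example rather than algorithms named by the sources, and `ProvenanceRecord`, `registerFile`, and `verifyFile` are hypothetical names. Ledger registration and distributed storage are left out. Note also that modern signature APIs fold “recover the hash and compare” into a single verification call.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

/** SHA-256 fingerprint of a file's bytes, as a hex string. */
function sha256Hex(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

/** What Person A would publish alongside the file (a hypothetical record shape). */
interface ProvenanceRecord {
  hashHex: string;
  signature: Buffer;
}

/** Person A: hash the file, then sign the hash with the private key. */
function registerFile(file: Buffer, privateKey: KeyObject): ProvenanceRecord {
  const hashHex = sha256Hex(file);
  // Ed25519 takes no separate digest algorithm, hence the `null` first argument.
  const signature = sign(null, Buffer.from(hashHex, "hex"), privateKey);
  return { hashHex, signature };
}

/** Person B: recompute the hash from the received file and check the signature. */
function verifyFile(file: Buffer, record: ProvenanceRecord, publicKey: KeyObject): boolean {
  const recomputed = sha256Hex(file);
  if (recomputed !== record.hashHex) {
    return false; // the file's bytes no longer match the registered fingerprint
  }
  return verify(null, Buffer.from(record.hashHex, "hex"), publicKey, record.signature);
}

// Usage: Person A registers a photo, Person B checks two copies of it.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const photo = Buffer.from("original photo bytes");
const record = registerFile(photo, privateKey);

console.log(verifyFile(photo, record, publicKey)); // true
console.log(verifyFile(Buffer.from("edited photo bytes"), record, publicKey)); // false
```

The second check fails at the hash comparison, before the signature is even consulted, which is the property that makes any edit to the file detectable.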

The history of changes made to a data file can be found on the ledger and is stored as metadata. Information on the ledger is immutable, so it cannot be changed or removed once saved [7]. Data explaining what changes were made, at what time, and by whom can all be accessed.
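One way to see why recorded history is hard to rewrite is a hash-chained log, in which every entry stores the hash of the entry before it. The sketch below is a simplified, single-machine stand-in for a blockchain ledger, with entry fields and function names invented for illustration. It shows tamper evidence only; a real ledger adds many independent copies and a consensus protocol, which is where immutability in practice comes from.

```typescript
import { createHash } from "node:crypto";

/** One recorded change: what was done, when, and by whom. */
interface LedgerEntry {
  index: number;
  timestamp: string; // ISO 8601
  author: string;
  change: string;
  previousHash: string; // links this entry to the one before it
  hash: string;
}

const GENESIS_HASH = "0".repeat(64); // placeholder "previous hash" for the first entry

/** Hash every field except `hash` itself, in a fixed order. */
function hashEntry(entry: Omit<LedgerEntry, "hash">): string {
  const payload = JSON.stringify([
    entry.index,
    entry.timestamp,
    entry.author,
    entry.change,
    entry.previousHash,
  ]);
  return createHash("sha256").update(payload).digest("hex");
}

/** Return a new ledger with one more entry chained onto the end. */
function appendEntry(
  ledger: readonly LedgerEntry[],
  author: string,
  change: string,
  timestamp: string,
): LedgerEntry[] {
  const last = ledger[ledger.length - 1];
  const body = {
    index: ledger.length,
    timestamp,
    author,
    change,
    previousHash: last ? last.hash : GENESIS_HASH,
  };
  return [...ledger, { ...body, hash: hashEntry(body) }];
}

/** Recompute every hash and link; any edited or reordered entry breaks the chain. */
function isLedgerIntact(ledger: readonly LedgerEntry[]): boolean {
  return ledger.every((entry, i) => {
    const previous = ledger[i - 1];
    const expectedPrevious = previous ? previous.hash : GENESIS_HASH;
    const { hash, ...body } = entry;
    return (
      entry.index === i &&
      entry.previousHash === expectedPrevious &&
      hash === hashEntry(body)
    );
  });
}
```

Because each hash covers the previous entry’s hash, quietly editing an old entry would require recomputing every entry after it, on every copy of the ledger at once.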

### Misinformation & Disinformation

Misinformation has been one of the main issues to accompany the evolution of the web. Misinformation is false information that someone spreads or uses without knowing where it came from, though not necessarily with the intent to cause harm. There are seven categories of misinformation, ranging from satire, which involves little manipulation, to fabricated, manipulated, false, misleading, or imposter content [9]. For example, some websites post false content under headings or logos similar to those of credible and trustworthy news organizations.

Between these categories lie further variants that only serve to confuse the public. Disinformation, by contrast, is false information that is deliberately and repeatedly used to cause harm. Some politicians, for example, overuse the term “fake news,” undercutting all news, including reliable journalism and information.

How might we better verify digital content and make it clear to the public that what they read and see has been authenticated, thereby bolstering trust in journalism and the media? That’s a question on the minds of many journalists and photographers who worry about how and where readers and viewers get their information and whether they can trust it. These days, some say, we’re all suffering from “information disorder.”

### Players & Analysis of Problem

Many developers have been working together to re-architect the web by looking for upstream solutions using provenance and cryptographic tools. There is a world of open source players working in this ecosystem, which allows for various groups to collaborate using publicly available tools to combat misinformation and disinformation. We have spoken to representatives of many of the organizations spearheading this movement, including the Starling Lab for Data Integrity led by Stanford and University of Southern California, The Content Authenticity Initiative, or the CAI, spearheaded by Adobe, as well as the News Provenance Project, an experiment led by the New York Times Research and Development team.

The CAI collaborates with groups across several industries to fight misinformation by using technology and tools that bolster digital content provenance and put data on a decentralized, unalterable, and transparent distributed network [8].

Similarly, the Starling Lab looks to establish trust in records of digital media by using provenance, various cryptographic methods, and widely accessible collaborative tools. The Lab follows a three-step framework to capture, store, and verify digital content, ensuring that information about the content’s origins is authentic and viewable [10].

The New York Times Research and Development team has also experimented with provenance, working with IBM on the “News Provenance Project,” which explores technical solutions that combat the spread of misinformation by allowing readers to verify the validity of news online [11].

However, there is still a severe lack of awareness about the host of issues that misinformation and disinformation have brought about, as well as the steps that are being taken to address these problems. The work our team has done sheds light on the consequences of misinformation and disinformation, as well as solutions for them.

## Methods and Materials

When discussing the most effective way to inform others about the utility and journey of Web 3.0, we advocated for the use of visual aids. Our primary criteria were comprehension, precision, and engagement. Creating these visuals involved various editing software, combining informative text with striking images. Photoshop and Canva were the primary tools, allowing creative flexibility in overlaying text and cycling images. Furthermore, iMovie played a significant role in expanding our reach to a media-centric audience through video.

## Results

Throughout the summer, we have focused on multimedia forms of creation. We created visuals and infographics to explain the various concepts of Web3. These concepts span from the history of the Internet to the technologies behind data authentication, such as hashing, signatures, and public/private keys. In addition, with the supervision of media professional Aaron Huey, we have created a video to present our methods and findings throughout the program.

Some examples of our work:

The following is a compilation of all visuals and infographics we created on the various topics, technical and conceptual, of Web3:

## Future Directions

We can further increase the scope and quality of our work to make knowledge about cryptography’s value in content verification more accessible. To measure the effectiveness of our work, we can gather feedback through surveys of our local communities. After we present the infographics to them, we can track how much information was retained through a follow-up quiz. In addition, to create a more professional video with greater human interaction, we can create a video interview series to gauge strangers’ starting familiarity with Web3 and how it changes after a brief discussion and video playback. On the technical side, we can learn more about the product development process for Web3 applications and potentially create our own dApp, or decentralized app. Overall, there is an abundance of opportunity to explore the variety of uses of Web3, utilize it to create change, and continue to design for authenticity.

## References

[1] Wikipedia contributors. (2022, July 21). World Wide Web. Wikipedia. https://en.wikipedia.org/wiki/World_Wide_Web

[2] Wikipedia contributors. (2022a, July 3). Web 2.0. Wikipedia. https://en.wikipedia.org/wiki/Web_2.0

[3] Web3: in a nutshell. (2021, September 9). Mirror. https://eshita.mirror.xyz/H5bNIXATsWUv_QbbEz6lckYcgAa2rhXEPDRkecOlCOI

[4] Johnson, S. (2021, September 3). Beyond the Bitcoin Bubble. The New York Times. https://www.nytimes.com/2018/01/16/magazine/beyond-the-bitcoin-bubble.html

[5] IBM. (2021, March 5). Digital signatures. https://www.ibm.com/docs/en/ztpf/1.1.0.14?topic=concepts-digital-signatures

[6] Blockgeeks. (2019, November 8). BLOCKCHAIN INFOGRAPHICS: The Most Comprehensive Collection. https://blockgeeks.com/blockchain-infographics/

[7] Koptyra, K., & Ogiela, M. R. (2020). Imagechain-Application of Blockchain Technology for Images. Sensors (Basel, Switzerland), 21(1), 82. https://doi.org/10.3390/s21010082

[8] CAI. (2022). Secure Mode Enabled. Content Authenticity Initiative. https://contentauthenticity.org/case-study

[9] Wardle, C. (2021, August 3). Understanding Information disorder. First Draft. https://firstdraftnews.org/long-form-article/understanding-information-disorder/

[10] Starling Lab. (2022). Starling Lab. https://www.starlinglab.org/

[11] NYT R&D. (2022). The New York Times R&D. https://rd.nytimes.com/

# A Novel Approach for Generating Customizable Light Field Datasets for Machine Learning

Authors

Julia Huang, Aloukika Patro, Vidhi Chhabra, Toure Smith (High School Students)

Mentor: Manu Gopakumar, Stanford Electrical Engineering PhD Student

## Abstract

Deep learning models, which often outperform traditional approaches, are trained on large datasets of a given medium, e.g., images, in numerous areas. However, for light field-specific machine learning tasks, such datasets are scarce. Therefore, we create our own light field datasets, which have great potential for a variety of applications due to the abundance of information in light fields compared to single images. Using the Unity engine and C# scripts, we develop a novel approach for generating large, scalable, and reproducible light field datasets based on customizable hardware configurations to accelerate light field deep learning research.

## Keywords

Dataset, machine learning, light fields, 3D graphics pipeline, vertex processor, Unity Engine, C#

## Introduction

A light field, represented by L(u, v, s, t), is a 4D function over the light rays passing through every point in empty space, describing the amount of light flowing in every direction. We can represent light fields using a two-plane representation in general position (see Fig. 1) to model the analytic geometry of perspective imaging. This two-plane representation can be seen as a collection of perspective images of the ST plane, each taken from an observer position on the UV plane. Put another way, it can be interpreted as many cameras photographing the same scene from different perspectives; thus, a light field is in effect a collection of photos taken at different angles. Because light fields contain more information than single images, they have many important applications in computer vision, such as light field depth estimation [1], synthetic aperture photography [2], and 3D models of objects.

In the application of depth estimation, the objective is to calculate the depths of all objects in an image, or, in our case, a light field. To compute the depth of light field scenes, several manual optimization-based techniques have been used: using epipolar plane images that contain lines of different slopes, combining defocus and correspondence cues [5], etc. However, these methods either take too much time or result in low accuracies due to noise or occlusions.
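
To make the two-plane indexing and the epipolar plane images mentioned above concrete, the following sketch stores a tiny light field as nested Python lists indexed `L[u][v][s][t]`. It is a toy under stated assumptions: the helper names and the synthetic `radiance` function are hypothetical, and the "scene" is a single bright stripe whose image position shifts by one pixel per camera step.

```python
def make_light_field(n_u, n_v, n_s, n_t, radiance):
    """Sample radiance(u, v, s, t) on a regular grid into L[u][v][s][t]."""
    return [[[[radiance(u, v, s, t) for t in range(n_t)]
              for s in range(n_s)]
             for v in range(n_v)]
            for u in range(n_u)]


def sub_aperture_image(L, u, v):
    """Perspective image of the ST plane seen from observer position (u, v)."""
    return L[u][v]


def epipolar_plane_image(L, v, t):
    """Fix v and t, vary u and s: a 2D slice in which each scene point
    traces a line whose slope depends on its depth."""
    n_u, n_s = len(L), len(L[0][0])
    return [[L[u][v][s][t] for s in range(n_s)] for u in range(n_u)]


# Synthetic scene (assumption): a bright stripe at s = 2 + u, so it moves
# one pixel per camera step, as a point at a fixed depth would.
L = make_light_field(3, 3, 6, 4, lambda u, v, s, t: 1.0 if s == 2 + u else 0.0)

for row in epipolar_plane_image(L, v=1, t=0):
    print(row)
```

Printing the slice shows the bright value stepping diagonally from row to row. In an epipolar plane image, that slope is the cue depth estimation methods exploit.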

Alternatively, when using synthetic aperture imaging, the goal is to measure the sharpness of a synthetic aperture image. Several calculation- and optimization-based approaches for this application include derivatives or local statistics of image pixel values, variance [6], the discrete cosine transform [7], and Tian and Chen’s Laplacian mixture model of wavelets [8]. However, like the methods for light field depth estimation, these approaches are computationally expensive, inaccurate, or time-consuming.
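
As a small illustration of the simplest focus measure listed above, the sketch below scores an image by the variance of its pixel values and compares a sharp test pattern with a blurred copy. It is not the implementation from [6]; the checkerboard image and the 3x3 box blur that stands in for defocus are assumptions made for the example.

```python
def variance_focus_measure(image):
    """Variance of pixel intensities; larger values suggest a sharper image."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def box_blur(image):
    """3x3 mean filter with edge clamping, used here only to fake defocus."""
    h, w = len(image), len(image[0])
    blurred = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            row.append(sum(window) / 9.0)
        blurred.append(row)
    return blurred


# Sharp test pattern: a checkerboard of 2x2 blocks.
sharp = [[255.0 if (x // 2 + y // 2) % 2 == 0 else 0.0 for x in range(8)]
         for y in range(8)]
blurred = box_blur(sharp)

print(variance_focus_measure(sharp) > variance_focus_measure(blurred))
```

Blurring pulls every pixel toward the mean intensity, so the variance drops. This is why variance can serve as a sharpness score, and also why it is sensitive to noise.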

Beyond the depth estimation approaches detailed above, plenoptic cameras, commonly known as light field cameras, can also be used to augment photo editing. Such a camera captures the light field of a scene as many images taken at different angles, and the angular information it records allows photographers to alter the focus and perspective of an image after it is captured. However, there are limitations to its use: because noise is prevalent in most real-world light field data, derived products such as depth maps computed from a plenoptic capture appear unclear.

## Recent Related Work

Fortunately, recent machine learning approaches for light field applications have shown the ability to be more precise and faster than previous conventional methods, even in the presence of noise. However, there is a lack of data surrounding light fields, both in terms of the number of datasets publicly available and the amount of data in current sparse light field datasets. This motivates the creation of more robust light field datasets to fully unlock the ability of deep learning techniques, as higher quality, more varied, and larger amounts of data lead to more accurate models.

Researchers have turned to machine learning algorithms to replace manual calculation and human-reliant approaches for various problems. For example, scientists used to sort through large amounts of data manually, but with a machine learning model they can solve tasks such as image classification or object detection faster and more accurately. Machine learning algorithms have therefore recently become more widely used in light field applications. For example, in 2019, Pei et al. developed a deep neural network that estimates whether a single synthetic aperture image is in focus [2]. Synthetic aperture imaging is a technique in which the images from a camera array, which capture multiple viewpoints of a light field, are combined to simulate a camera with a large virtual convex lens.

Traditionally, CNNs, due to their robust ability to learn visual features from pictures, are trained on image sets, most notably ImageNet [9], to perform image identification and classification tasks. More recently, CNNs have also been used to estimate depth from light field datasets, as these learned models can achieve faster and more accurate light field depth estimation than traditional methods such as correspondence matching between views and depth from defocus with synthetic aperture photography. For example, Shin et al. [1] developed a depth estimation CNN called EPINET that achieved the top rank in the HCI 4D Light Field Benchmark [10] on assessment metrics such as bad pixel ratio and mean square error. The EPINET design takes four streams of sub-aperture images, one for each of four directions (vertical, horizontal, right diagonal, left diagonal), as input and produces four independent feature representations. Shin et al. then combined these feature maps into a single higher-level representation to estimate depth. However, the HCI synthetic light field dataset has only 28 scenes in total, and EPINET’s network architecture has a small receptive field, meaning it can only handle a limited spacing between cameras. This makes its state-of-the-art (SOTA) performance limited and unrepresentative of performance on real-world light field data [11].

To address the lack of light field data, Shin et al. [1] also used data augmentation: modified copies of existing samples are added to a dataset of the same medium to expand it. Because there is not enough variety of publicly available light field data, and machine learning algorithms usually depend heavily on the amount and variety of training data, Shin et al. applied augmentations such as scaling, rotations, and transpositions to the sparse HCI dataset. Using augmentations helped increase the depth estimation accuracy of their EPINET algorithm. However, augmentation yields only a limited increase in data, as only different crops, rotations, and other modifications of clones of the same scenes are added to an existing dataset. Hence, this process leads to a limited increase in accuracy.
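
The kinds of augmentation named above can be sketched on a single view stored as a list of pixel rows. This is not Shin et al.’s code: the helper names are hypothetical, `downscale` uses plain subsampling as a stand-in for proper image scaling, and a full light field augmentation would likely also need to rearrange the grid of views and adjust the disparity labels consistently, which this sketch omits.

```python
def transpose(image):
    """Swap rows and columns."""
    return [list(col) for col in zip(*image)]


def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def downscale(image, step):
    """Keep every step-th pixel in both directions (nearest-neighbour scaling)."""
    return [row[::step] for row in image[::step]]


def augment(image):
    """Return the original view plus a few modified clones of it."""
    return [image, transpose(image), rotate90(image), downscale(image, 2)]


view = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]

for variant in augment(view):
    print(variant)
```

Every variant is derived from the same sixteen pixel values, which illustrates the limitation noted above: augmentation multiplies the sample count without adding new scene content.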

Others attempted dataset creation from scratch to increase the amount and diversity of light field data. Xiong et al. [12] captured three discrete 5D hyperspectral light field scenes (represented by f(x, y, u, v, λ)) with a special hyperspectral camera. Unfortunately, this dataset is limited in size and does not have disparity labels; therefore, it cannot be used for evaluating the performance of deep learning algorithms. In addition, it takes 16 hours for their MATLAB code to reconstruct a single hyperspectral light field. Although their hybrid imager hardware system, which includes a Lytro camera and a coded aperture snapshot spectral imager, recovers 5D light fields with both high spectral and angular resolutions, the expensive and time-consuming process makes it impractical for others to reproduce. On the other hand, Schambach et al. [13] created a 507-scene multispectral light field dataset, where each light field is represented by L(u, v, s, t, λ), using a scene generator they designed to randomly output images and adding seven handcrafted scenes. Though their dataset is larger than Xiong et al.’s hyperspectral light field dataset, their multispectral dataset creation approach is complex, especially because seven of the scenes had to be handcrafted manually. Besides, most cameras, and especially most light field cameras, are not multispectral, which narrows the practical applications of a multispectral dataset.

## Materials and Methodology

In this paper, we present a novel approach for creating more robust light field datasets that avoids the shortcomings of previous augmentation and creation methods and assists learning-based methods, such as the ones mentioned above, which have been shown to outperform manual optimization-based methods on light field tasks. We generate synthetic RGB (red, green, and blue) light field datasets from scratch. Our approach is easy to create, set up, and modify using the Unity engine and covers a wide range of hardware and camera parameters that can easily be changed for any light field task or application, such as the three mentioned above. In addition, the datasets created by our approach are in the RGB color format, which is significantly easier to generate and covers a wider range of applications than hyperspectral and multispectral formats, as the images captured by most cameras, including mobile, hand-held, and, most significantly, plenoptic or light field cameras, are in RGB format. Furthermore, our light field images can be saved in any file type, e.g., png or jpg, at capture time just by changing a method and a String format in the C# script, which makes them adaptable to many different tasks and data requirements. Our goal is to offer a practical, convenient, and robust solution to deep learning and light field researchers by creating large datasets that are reliable as well as easily scalable for numerous light field applications and a wide range of deep learning algorithms. Therefore, in this paper, we present a Unity-based approach towards increasing the diversity of light field datasets to enable more machine learning approaches that are both fast and accurate for a variety of applications.

First, we discuss the software and materials used to generate our datasets. To create custom light field datasets in 3D, we utilize the capabilities of Unity (see Fig. 2), a cross-platform game engine that provides a variety of virtual scenes, backgrounds, prefabs, assets, and objects. With this engine, we conveniently set up a full scene with directional lighting and terrain as the background for our light field images. Afterwards, we develop the code to tune specific hardware parameters, spawn random objects into the empty terrain space, create and position multiple cameras to view this space, and take automatic snapshots of the entire scene. The code is written as C# scripts that are embedded within the Unity project in Unity Hub 3.2.0 and can be found in our public GitHub repository [14]. We use Unity Editor version 2020.3.36f1 to leverage the objects, functions, etc. that exist in and are compatible with older Unity versions. In addition, we have stored six of our datasets of various sizes (18, 18, 40, 100, 200, and 500 images), generated by our Unity approach in .png format, in a public Kaggle dataset for others to use [15]. We choose Kaggle because it is one of the top platforms and Jupyter notebook environments for deep learning researchers and enthusiasts alike to use datasets to develop machine learning algorithms. Through Kaggle, our datasets can be used on the same platform, downloaded to a computer, or loaded into any other IDE (code development tool) or Jupyter notebook for training deep learning models.

Next, we discuss the process to create our custom Unity dataset. A 3D graphics pipeline (see the top sequence diagram in Fig. 3), which includes view transforms, is the starting point we use to create our light field dataset [16] [17] (Nanyang Technological University, 2012; Wetzstein). We specifically modify the Vertex Processor, one part of the graphics pipeline described above, to create light fields (see the bottom sequence in Fig. 3). We adjust both the model and view transforms of this step. We use random model transforms to create randomized scenes for an arbitrarily large dataset of scenes, setting up fields of view and the spawning of random objects. We create functions in our C# script to automatically generate random objects of different shapes, sizes, and textures at different positions. Additionally, we use multiple view transforms to generate a light field for a particular scene by adding multiple cameras to our Unity setup and modifying their positions so they can take snapshots of the scene from multiple angles. We create functions in our C# scripts that set up a specified number of cameras at different positions and take automatic snapshots of the scene, which we can modify easily by adding different textures, objects, and backgrounds. Combining the model and view transforms, we develop an automated Unity process that can create a full light field dataset within minutes.
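
The effect of multiple view transforms can be sketched outside Unity with a grid of unrotated pinhole cameras. The example below is a simplified illustration, not the authors' C# script: the function names, the translation-only view transform, and the unit focal length are all assumptions, and Unity's own camera matrices and coordinate conventions are not reproduced.

```python
def camera_grid(n_x, n_y, spacing):
    """Camera centres on the UV plane (z = 0), centred on the origin."""
    x0 = -(n_x - 1) * spacing / 2.0
    y0 = -(n_y - 1) * spacing / 2.0
    return [(x0 + i * spacing, y0 + j * spacing, 0.0)
            for j in range(n_y) for i in range(n_x)]


def view_transform(point, camera):
    """World to camera coordinates for an unrotated camera: a pure translation."""
    return tuple(p - c for p, c in zip(point, camera))


def project(point_cam, focal_length=1.0):
    """Pinhole projection onto an image plane at distance focal_length."""
    x, y, z = point_cam
    return (focal_length * x / z, focal_length * y / z)


def image_shift(point, cam_a, cam_b):
    """Horizontal image shift of one scene point between two cameras."""
    xa, _ = project(view_transform(point, cam_a))
    xb, _ = project(view_transform(point, cam_b))
    return xa - xb


cameras = camera_grid(3, 3, spacing=0.1)
near_point = (0.0, 0.0, 1.0)
far_point = (0.0, 0.0, 10.0)

print(image_shift(near_point, cameras[3], cameras[4]))
print(image_shift(far_point, cameras[3], cameras[4]))
```

The near point shifts more between neighbouring cameras than the far point does. That depth-dependent parallax is the extra information a light field carries compared with a single image.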

With the end goal of generating more light field data to train robust machine learning models, we design an end-to-end algorithm using Unity graphics and virtual scenes to generate a dataset of random images and depths of random scenes relative to any number of cameras. As mentioned above, our algorithm randomly positions objects in the Unity scene and allows multiple cameras to be added so that multiple view transforms capture a full light field. By taking quick snapshots from slightly different angles, we can create a large dataset of light field snapshots to train a deep learning architecture. We write C# scripts to automatically generate objects at random positions within set ranges and boundaries that fit the rectangular scene and to automate snapshots of these scenes. The cameras capture a snapshot with a specified Width x Height resolution (which can be modified in our Unity program) from each sampled location on the UV plane. The script then deletes all the objects from the scene, regenerates new random objects at new random positions, and takes a snapshot of the new scene, and the entire process repeats. At the end of the process, all the snapshots are automatically stored in a Snapshots folder.

Our Unity-based method is faster, more convenient, and more scalable, reproducible, and robust than the previous methods discussed above, for the reasons below. The speed of data generation is limited only by the processing speed of the computer running the Unity engine. Using a Windows 10 desktop with a 3.60 GHz CPU and a 1080 Ti GPU (more details of our hardware specifications are listed in Fig. 6), spawning up to one thousand objects takes about four seconds or less, and creating a dataset of 2,000 light field images with 40 objects in each snapshot takes less than five and a half minutes. This speed comes from our automated random object spawner script, which can generate new data synthetically from any angle in the scene and take a snapshot of each light field image or scene. We also have the flexibility of creating a dataset of any size for any situation by modifying the number-of-images variable in Unity. We can easily generate very large light field datasets (up to 2,000 snapshots within our RAM limits) by adjusting just a few parameter variables, for example producing more than a thousand different scenes for a dataset. This exceeds all three datasets mentioned above, the 28-scene HCI dataset [10], the 5D hyperspectral dataset [12], and the 507-scene multispectral dataset [13], in size and convenience. In addition, we can easily increase the number of viewpoints by updating parameters in the same Unity scene to match any new hardware, such as another plenoptic camera. Our Unity setup also allows numerous parameters to be tuned, including the number of cameras, number of objects, types of objects, and textures of objects, to increase the diversity of the produced data. Also, because each data point comes from a different random scene, no two data points share the same scene, which gives more data variety than augmentation-based methods and Schambach et al.’s two-camera system for their dataset.

As mentioned before, data augmentations only apply different modifications to clones of the same scene in a dataset and are therefore not as diverse as our approach. In addition, we choose to generate RGB light fields, where three color channels replace the light field’s spectral coordinate and dependencies [13], because RGB light fields are easier to generate, computationally more efficient to create, and more widely applicable to light field tasks than multispectral light fields, as most cameras use RGB formats. Therefore, our approach also compares favorably with Schambach et al.’s multispectral dataset on a second criterion [13]. Furthermore, our Unity-based approach is flexible, as its data generation parameters can easily be changed for different hardware setups and matched exactly to specific hardware requirements, such as cameras’ resolutions and positions. Xiong et al.’s hyperspectral dataset includes different light field images, but it was made for one specific hardware setup. To suit a different hardware setup, their dataset must be changed or regenerated, which is cumbersome because both a Lytro camera and an imager are needed to recover the light fields. With our method, the parameters of our snapshots and locations on the UV plane can be updated easily in Unity to match specified hardware parameters, such as the number and position of cameras. Our Unity scenes approximate the occlusions, reflections, and diffractions of light present in real-world scenes, which allows us to quickly generate accurate light field datasets without noise, expensive hardware setups such as plenoptic cameras or spectral imagers, or human intervention.

The only materials needed to generate a light field dataset are the Unity3D engine and Unity Hub/Editor installed on a desktop or laptop, together with our publicly available code, which takes just minutes to create a light field image dataset of any size. Most notably, we developed our C# script to automate the generation of objects and the capturing of images, requiring no manual work other than changing variables and parameters. Therefore, our approach is affordable, customizable, convenient, and reproducible.

Fig. 4: Screenshots of our C# scripts written in Visual Studio Code and embedded within the Unity game engine. The left side lists the customized variables whose values we can change for any specific scene, such as the number of cameras in the x and y directions, the size of the rectangular screenshot, and the number of objects to spawn. The right side displays the two main functions: at the top, we spawn a specified number of objects at positions drawn from set random ranges for the x, y, and z coordinates, and at the bottom, we include a nested for loop to automate the screenshot capturing of the scene using multiple cameras.

A summary of our process: First, the objects are randomly spawned into the scene, then the cameras take a photo when the number of objects displayed is equal to the variable maxSpawnAmount. After a photo is taken of that scene, all the objects are destroyed, and the scene is empty again. Then, the counter num_images increases by one. This process repeats until the number of images is equal to the specified dataset size.
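
The loop summarized above can be mimicked in plain Python to show its control flow. This is a sketch under stated assumptions rather than the C# script in [14]: a "snapshot" here is only a record of which camera saw which objects, `num_images` is assumed to count scenes rather than individual camera views, and the bounds, seed, and function names are hypothetical. Only `maxSpawnAmount` and `num_images` come from the description above.

```python
import random


def spawn_objects(rng, max_spawn_amount, bounds):
    """Place max_spawn_amount objects at random positions inside the scene bounds."""
    (x_lo, x_hi), (y_lo, y_hi), (z_lo, z_hi) = bounds
    return [(rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi), rng.uniform(z_lo, z_hi))
            for _ in range(max_spawn_amount)]


def generate_dataset(dataset_size, cameras, max_spawn_amount, bounds, seed=0):
    """Spawn, photograph from every camera, destroy, and repeat until done."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    snapshots = []
    num_images = 0
    while num_images < dataset_size:
        scene = spawn_objects(rng, max_spawn_amount, bounds)
        if len(scene) == max_spawn_amount:  # capture only once the scene is full
            for cam_index, camera in enumerate(cameras):
                snapshots.append({"scene": num_images,
                                  "camera": cam_index,
                                  "camera_position": camera,
                                  "objects": list(scene)})
        scene.clear()    # destroy all objects so the scene is empty again
        num_images += 1  # one more scene captured
    return snapshots


cameras = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
bounds = ((-5.0, 5.0), (0.0, 2.0), (1.0, 10.0))
snapshots = generate_dataset(dataset_size=5, cameras=cameras,
                             max_spawn_amount=40, bounds=bounds)
print(len(snapshots))
```

Seeding the random generator is a design choice made for this sketch so that the same parameters always produce the same scenes. Whether the original script seeds Unity's random generator is not stated in the text.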

See our full code on GitHub at [14].

## Results

Our Unity program outputs a set number of images with the cameras at different x and y coordinate positions. As discussed in the previous section, the variable parameters in the C# scripts can be customized to change the number of images to capture, the number of translations the cameras make, and the distance the cameras move. In addition to the viewpoint, every parameter within the scene can be customized to fit specific simulation specifications. Fig. 5 displays one of our datasets.

The number and size of the spawned objects can be adjusted to simulate real-life measurements and quantities and are limited only by hardware (our specific computer hardware parameters are listed in the caption of Fig. 6). Our program is lightweight and computationally efficient, so a substantial number of objects can be spawned at once. We tested up to 1,000 generated objects in a single scene and found generation fast and stable across a variety of Unity hardware and variable parameter specifications, and up to 2,000 light field images for a dataset can be generated within five and a half minutes (see Fig. 6 for a comparison of data creation times).

Fig. 6. Table showing the speed of our dataset generation approach for several dataset sizes. Each snapshot of a scene has 40 objects. The quick speed supports the reproducibility and convenience of our method in generating new light field datasets, and it is much faster than Xiong et al.’s method, which took 16 hours. Our Unity program was run on a Windows 10 desktop, Version 10.0.19044 and Build 19044, with an Intel Core i9-9900K @ 3.60 GHz CPU, an NVIDIA GeForce GTX 1080 Ti GPU, and 32.0 GB of RAM @ 3200 MHz.

## Conclusion

In this paper, we presented a novel approach for generating light field image datasets that is quick, easy, customizable, and robust for machine learning and supports various hardware parameters on a UV plane. As discussed in the methods section, our Unity algorithm allowed us to take quick snapshots of slight shifts in viewpoint and angle of scenes by adding objects and multiple cameras, producing highly customizable and varied light field datasets. Our results suggest that, with just a computer and free Unity software and without any special equipment, a large dataset can be generated conveniently and quickly.

Because using machine learning requires a lot of high-quality data to train robust models accurately, our customizable datasets produced using Unity can provide larger and more scalable datasets for both existing and future deep learning algorithms developed for the purpose of disparity, depth, or shape estimation of light fields. Deep learning based architectures, such as EPINET [1], have been shown to be more robust, accurate, and faster than light field cameras and optimization-based techniques for the above tasks.

Overall, our custom curated dataset can be used for machine learning models in an expansive variety of light field applications, notably synthetic aperture photography, depth estimation, 3D representations, and more.

## Limitations and Future Directions

Because Unity provides virtual scenes only, all our datasets are synthetic and noise-free, which is unrepresentative of the real world, as real-life cameras and images may include noise. Hence, in the future, we plan to simulate real-world noise by adding Unity3D’s built-in Cinemachine Noise Properties, such as Perlin noise, which helps simulate random movements of our virtual cameras [19]. In addition, our light field scene simulations are not perfect representations of the real world, as most Unity graphics are not as detailed as real objects and scenes. Adding more complex graphics to our objects and scenes in Unity would take more time to run, since more computation would be needed to generate each dataset sample, but once added, they can be used to generate datasets with more lifelike scenes, which is worth the investment in the long run.
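
For readers unfamiliar with Perlin noise, the sketch below implements a one-dimensional gradient-noise function and uses it to jitter a camera position smoothly over time. It is an independent illustration, not Cinemachine's implementation: the lattice size, the `amplitude` and `frequency` parameters, and the function names are assumptions made for this example.

```python
import math
import random


def make_perlin_1d(seed=0, lattice_size=256):
    """Return a 1D gradient-noise function with random slopes at integer points."""
    rng = random.Random(seed)
    gradients = [rng.uniform(-1.0, 1.0) for _ in range(lattice_size)]

    def fade(t):
        # Quintic fade curve 6t^5 - 15t^4 + 10t^3 for smooth interpolation.
        return t * t * t * (t * (t * 6 - 15) + 10)

    def noise(x):
        i0 = math.floor(x)
        t = x - i0
        g0 = gradients[i0 % lattice_size]
        g1 = gradients[(i0 + 1) % lattice_size]
        n0 = g0 * t            # contribution of the left lattice point
        n1 = g1 * (t - 1.0)    # contribution of the right lattice point
        return n0 + fade(t) * (n1 - n0)

    return noise


def jittered_camera_x(base_x, time_s, noise, amplitude=0.01, frequency=1.5):
    """Offset a camera's x position by smooth noise sampled over time."""
    return base_x + amplitude * noise(time_s * frequency)


noise = make_perlin_1d(seed=42)
for step in range(5):
    t = step * 0.1
    print(round(jittered_camera_x(0.0, t, noise), 6))
```

Unlike independent random offsets per frame, consecutive samples of this function change gradually, which is what makes such noise suitable for simulating hand-held camera movement.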

Thanks to the promise of machine learning as a robust and accurate mechanism for numerous applications, e.g. audio processing, image classification, etc., but specifically for light field tasks, our novel Unity-based approach can easily be used as a reproducible mechanism to automatically create large variegated datasets to train any machine learning model to maximize its efficiency and accuracy. For future research, we plan to develop a novel machine learning model ourselves, such as a convolutional neural network or autoencoder, to test the robustness and usability of our Unity-generated light field datasets.

## Acknowledgements

We would like to thank the SHTEM Summer Research Internship Program managed by the Stanford University Compression Forum and its organizers Cindy Nguyen, Sylvia Chin, Carrie Lei, Eric Guo, and Kaiser Williams for providing us amazing opportunities to participate in scientific research. Specifically, we would like to thank our mentor Manu Gopakumar, an Electrical Engineering PhD student from Stanford University, for his guidance throughout our research.

## References

[1] Shin, C., Jeon, H. G., Yoon, Y., Kweon, I. S., & Kim, S. J. (2018). Epinet: A fully-convolutional neural network using epipolar geometry for depth from light field images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4748-4757).

[2] Z. Pei, L. Huang, Y. Zhang, M. Ma, Y. Peng and Y. -H. Yang, “Focus Measure for Synthetic Aperture Imaging Using a Deep Convolutional Network,” in IEEE Access, vol. 7, pp. 19762-19774, 2019, doi: 10.1109/ACCESS.2019.2896655.

[3] Levoy, M., & Hanrahan, P. (1996, August). Light field rendering. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques (pp. 31-42).

[4] M. Levoy, “Light Fields and Computational Imaging,” in Computer, vol. 39, no. 8, pp. 46-55, Aug. 2006, doi: 10.1109/MC.2006.270.

[5] M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi. Depth from combining defocus and correspondence using light field cameras. In Proceedings of International Conference on Computer Vision (ICCV), 2013.

[6] W. Huang and Z. Jing, “Evaluation of focus measures in multi-focus image fusion,” Pattern Recognit. Lett., vol. 28, no. 4, pp. 493–500, 2007.

[7] M. Kristan, J. Perš, M. Perše, and S. Kovačič, “A Bayes-spectral-entropy-based measure of camera focus using a discrete cosine transform,” Pattern Recognit. Lett., vol. 27, no. 13, pp. 1431–1439, 2006.

[8] J. Tian and L. Chen, “Adaptive multi-focus image fusion using a wavelet-based statistical sharpness measure,” Signal Process., vol. 92, no. 9, pp. 2137–2146, 2012.

[9] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248–255).

[10] K. Honauer, O. Johannsen, D. Kondermann, and B. Goldluecke. A dataset and evaluation methodology for depth estimation on 4d light fields. In Proceedings of Asian Conference on Computer Vision (ACCV), 2016.

[11] Leistner, T., Mackowiak, R., Ardizzone, L., Köthe, U., & Rother, C. (2022). Towards Multimodal Depth Estimation from Light Fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12953-12961).

[12] Z. Xiong, L. Wang, H. Li, D. Liu, and F. Wu, “Snapshot hyperspectral light field imaging,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 3270–3278.

[13] Maximilian Schambach, Michael Heizmann. (2020). A Multispectral Light Field Dataset for Light Field Deep Learning. IEEE Dataport. https://dx.doi.org/10.21227/y90t-xk47

[16] 3D graphics with OpenGL. 3D Graphics with OpenGL – The Basic Theory. (n.d.). Retrieved July 22, 2022, from https://www3.ntu.edu.sg/home/ehchua/programming/opengl/CG_BasicsTheory.html

[17] Wetzstein, G. (n.d.). The Graphics Pipeline and OpenGL I: Transformations. EE267. Retrieved July 22, 2022, from https://stanford.edu/class/ee267/lectures/lecture2.pdf

[18] Noise properties: Cinemachine: 2.8.6. Cinemachine | 2.8.6. (n.d.). Retrieved July 31, 2022, from https://docs.unity.cn/Packages/com.unity.cinemachine@2.8/manual/CinemachineVirtualCameraNoise.html

# The Progression of Perovskite Light Emitting Diodes (LEDs) in our Future

Authors

J. Guzman, R. Schweyk, and L. Traore

## Abstract

Over 1.5 billion incandescent bulbs and III-V light-emitting diodes (LEDs) are currently in use in the United States. Whether used in headlamps, flashlights, or lamps, these light sources consume about 136 billion kilowatt-hours per year in the United States. Recent progress in perovskite LEDs offers a possible alternative. If we were to replace all the light sources in the United States that are dominated by III-V LEDs and incandescent light bulbs with perovskite LEDs as they currently stand, would the energy savings outweigh the financial cost, and what would this mean for the future of perovskite LEDs? Limited by their operational stability, perovskites as they stand are not a viable replacement. However, with the rapid evolution of perovskite LEDs as well as their superior optoelectronic properties, it is probable that in the near future they will play a fundamental role in our daily lives.

In order to understand the cost efficiency of perovskites, one must consider economies of scale. When a company mass-produces perovskite LEDs, one must account for buying materials in bulk quantities rather than pricing each material in a single device individually. In addition, one must examine the advantages in energy consumption of perovskite LEDs in comparison to III-V LEDs and incandescent bulbs.

In this investigation, we will analyze the viability of replacing III-V LEDs and incandescent light bulbs with next-generation perovskite LEDs, with an emphasis on the current operational stability of perovskite LEDs, how that stability will improve in the next 5-10 years, and when it will rival or surpass that of III-V and incandescent technologies. Overall, this report will demarcate when it will make financial sense to switch to perovskite LEDs on a national scale.

# Background and Motivation

To understand what perovskite LEDs (PeLEDs) are, one must become familiar with what is available on the market. As of now, the majority of the world uses incandescent lights or LED bulbs as light sources. A light-emitting diode (LED) is a semiconductor light source that emits light when current flows through it. Before examining the physics governing PeLEDs, we need to understand ambipolar charge transport, in which both holes and electrons carry charge and together lead to light emission within PeLEDs.

Electrons are subatomic particles with a net negative charge. Their most influential role is in bonding individual atoms together, known as atomic bonding. Electrons are found in every element, where they occupy the orbitals of each atom. It is also important to understand electron flow, or in other words, current. This is essential when considering a semiconductor: when a voltage is applied, the protons (positively charged subatomic particles) are held stationary, whereas the electrons flow with some drift velocity. An abundance of a given charge carrier makes it the majority carrier, whereas a scarcity makes it the minority carrier. The applied electric field drives these carriers and in turn induces a current. This will be key in this investigation regarding some of the issues that currently face perovskite LEDs, including ion migration and Joule heating. Holes carry a charge equal in magnitude to that of an electron but positive in sign. In essence, a hole represents the absence of an electron. Both, however, are charge carriers that are necessary for current flow in ambipolar semiconductors.

Electrons in the semiconductor recombine with holes as electron-hole pairs, releasing energy in the form of photons and thus leading to light emission. Perovskite light-emitting diodes depend upon the perovskite active layer, which has the form ABX$_3$. In lead halide perovskites, A represents a cation, B represents lead, and X represents a halogen. In the most common lead halide perovskites, the halogen is typically chlorine, bromine, or iodine, and the cation is usually cesium or methylammonium. For example, CsPbBr$_3$ is a typical lead halide perovskite.

Perovskites were first invented in the early 1950s, initially emitting only red; green emission became possible later. However, blue was and still is challenging to create, being 150 times less stable than green and red. Blue’s instability limits the color palette that perovskites can display, thereby impeding perovskite commercialization. One particular advantage of PeLEDs over current-generation LEDs is that PeLEDs emit light with very narrow emission spectra, resulting in ultra-high color purity. Thus, lead halide perovskites are actively researched for their light emission capabilities and their promise for next-generation lighting and display technologies.

Each of these components is essential in creating even the most basic perovskite LED, and all of them factor into the progression of the technology. In this report, we will further examine the cost analysis as well as the plausibility of our proposal. Given their current operational stability issues, we will predict how far into the future these perovskite LEDs will become a suitable replacement for all the current incandescent bulbs and III-V LEDs in the United States.

# Methods and Results

To analyze when the US can transition to perovskite LEDs from incandescent bulbs and III-V LEDs, one must consider the power of these PeLEDs. White light is currently the most commonly used color of light and consists of a superposition of three colors: red, green, and blue. Once one determines the power each color individually consumes, one can sum the wattages to approximate the power white light would need at its best efficiency.

## I. Power Analysis

When considering the power of each individual light source, the equation $P = IV$ is central, where $P$ represents power, $I$ represents current, and $V$ represents voltage. The procedure can be boiled down to four steps that determine how much power is needed to emit a given color of light. First, one must gather the peak EQE (external quantum efficiency), the ratio of the number of photons leaving the system to the number of electrons entering it, for the color. Second, one uses the EQE to identify the operating point on the device’s J-V-L graphs, reading off the voltage at which the peak EQE occurs. Third, one compares this voltage to a graph relating J (current density) to voltage to obtain the corresponding current density. This is one of the most important steps in the process, as J is most commonly measured in mA/cm$^2$, which is relevant in the step that follows. Finally, one inputs the gathered data into $P = IV$. Although plugging in the value for $V$ is simple, the current is a bit more complicated: since J is given in mA/cm$^2$, one must multiply it by the device area of each perovskite substrate to obtain the overall current. Only then can one complete the calculation and solve for power.

With these criteria established, we can begin the calculations. Take, for instance, red perovskite LEDs. With a peak EQE currently at 17.8%, the corresponding voltage at the best efficiency is 2.2 V. The corresponding J value is 8 mA/cm$^2$, which must be multiplied by an area of 2.25 cm$^2$ as explained above. This gives a result of approximately 41 mW. The same steps can be applied to green and blue perovskite LEDs. Green perovskite LEDs have a peak EQE of 21.63%, the highest of the three. With a voltage of 3.5 V and a current density of 5 mA/cm$^2$, both at the best efficiency of the perovskite, this results in a total of 63 mW. Although all three colors are still in development, blue perovskite LEDs have the furthest to go in terms of stability and EQE, as blue is a very difficult color to produce. With a peak EQE of 10.11%, a most efficient voltage of 3.5 V, and a most efficient J value of 15 mA/cm$^2$, the power comes to 118 mW.
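The four steps can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' own calculation: the function and variable names are hypothetical, and the only inputs are the voltages, current densities, and device area quoted in the text.

```python
# Sketch: electrical power of a PeLED at its peak-EQE operating point.
# P = I * V with I = J * A. All names here are illustrative assumptions;
# the numeric inputs are the values quoted in the text.

DEVICE_AREA_CM2 = 2.25  # device area quoted in the text

# colour -> (voltage in V, current density in mA/cm^2) at peak EQE
OPERATING_POINTS = {
    "red": (2.2, 8.0),
    "green": (3.5, 5.0),
    "blue": (3.5, 15.0),
}


def power_mw(voltage_v, current_density_ma_cm2, area_cm2=DEVICE_AREA_CM2):
    """Return electrical power in mW from voltage, current density and area."""
    current_ma = current_density_ma_cm2 * area_cm2  # I = J * A, in mA
    return current_ma * voltage_v  # mA * V = mW


if __name__ == "__main__":
    for colour, (v, j) in OPERATING_POINTS.items():
        print(f"{colour}: {power_mw(v, j):.1f} mW")
```

With the quoted inputs this gives about 39.6 mW for red, 39.4 mW for green, and 118.1 mW for blue. The blue value matches the reported 118 mW. The red and green values differ from the reported 41 mW and 63 mW, which may reflect operating points read from the graphs at finer precision than the rounded figures quoted here.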

These results are reasonable when one considers the energy associated with each color. From the equation $E = \frac{hc}{\lambda}$, the colors with the longest wavelengths have the lowest photon energy, and from the equation $P = \frac{E}{t}$, the greater the energy, the greater the power. Blue has the shortest wavelength and therefore consumes the largest amount of power. In contrast, red has the longest wavelength and in turn consumes the least power, whereas green falls in between the two.
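To make the wavelength argument concrete, the sketch below evaluates $E = \frac{hc}{\lambda}$ for one representative wavelength per color. The wavelengths (630 nm, 530 nm, 450 nm) are assumptions chosen as typical values for red, green, and blue; they are not taken from the devices discussed in the text.

```python
# Sketch: photon energy E = h * c / wavelength for representative colours.
PLANCK_J_S = 6.62607015e-34        # Planck constant, J*s
LIGHT_SPEED_M_S = 299_792_458.0    # speed of light, m/s
ELECTRON_VOLT_J = 1.602176634e-19  # joules per electronvolt

# Assumed typical emission wavelengths in nm (illustrative, not from the text)
WAVELENGTHS_NM = {"red": 630.0, "green": 530.0, "blue": 450.0}


def photon_energy_ev(wavelength_nm):
    """Return the energy of one photon in eV for a wavelength in nm."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m / ELECTRON_VOLT_J


if __name__ == "__main__":
    for colour, nm in WAVELENGTHS_NM.items():
        print(f"{colour} ({nm:.0f} nm): {photon_energy_ev(nm):.2f} eV")
```

Under these assumed wavelengths the photon energy rises from roughly 2.0 eV for red to roughly 2.8 eV for blue, which is the ordering the paragraph above relies on.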

Since none of these perovskites currently has a competitive EQE or stability to replace incandescent bulbs and III-V LEDs, it is important to understand how much they need to improve. State-of-the-art green LEDs reside at approximately 38.4% EQE, whereas red LEDs reside at 35.48%. These values are so high because of optical lenses, which direct light in a single forward direction; EQE is measured only from the forward light emission of the device. With this in mind, once perovskite LEDs reach this level, they will be a logical replacement. This raises the question of when we will be able to replace all current incandescent bulbs and III-V LEDs with perovskite LEDs, and what the future of blue perovskite LEDs looks like.

## II. PeLED Operational Lifetime and EQE Estimates:

In order to predict commercialization of PeLEDs, one must consider the following factors: blue’s current and predicted EQE; the projected operational lifetimes of red, green, and blue in 10 years; and their operational lifetimes when ion migration suppression methods are applied.

### Blue EQE Prediction:

Currently, blue perovskite LEDs are not at their maximum external quantum efficiency. Whereas green and red have come close to reaching their maximum EQE, blue is still behind. Since the projected maximum EQE of green and red is around 25%, blue is expected to reach this maximum of 25% as well. To predict when blue perovskite LEDs would reach their maximum EQE, this investigation first gathered blue EQE values from the years 2016-2021 and plotted them to form a curve of best fit through the points. The resulting equation is $y = -\frac{825.58}{x} + 50$, where $x$ is the last two digits of the year (for example, $x = 16$ for 2016). The curve flattens over time, reflecting that blue’s EQE would eventually plateau as it approaches a maximum of 25%. This means that in the year 2034 (12 years from now), blue’s predicted EQE will reach 25 percent (view equation (A) for solution and figure (6) for growth).
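The fitted curve can be inverted directly to find the year at which a given EQE is reached. The sketch below is a minimal illustration with hypothetical function names; it uses only the coefficients quoted in the text.

```python
# Sketch: the blue EQE fit y = -825.58 / x + 50 and its inverse.
# x is the last two digits of the year; y is EQE in percent.

def blue_eqe_percent(year_two_digit):
    """Predicted blue EQE (percent) for a two-digit year."""
    return -825.58 / year_two_digit + 50.0


def year_reaching(eqe_percent):
    """Two-digit year at which the fit reaches the given EQE.

    Rearranging y = -825.58 / x + 50 gives x = 825.58 / (50 - y).
    """
    return 825.58 / (50.0 - eqe_percent)


if __name__ == "__main__":
    x = year_reaching(25.0)
    print(f"fit reaches 25% at x = {x:.2f}")
    print(f"fit value at x = 34: {blue_eqe_percent(34):.2f}%")
```

The inverse gives $x \approx 33.0$ for an EQE of 25%, so the fit crosses the 25% ceiling at the start of the 2033 to 2034 window, consistent with the year quoted above once rounded up.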

In addition, red, green, and blue perovskite LEDs are all significantly behind the operational lifetime of conventional LEDs, which holds back perovskite commercialization. This, however, is temporary, as advances and predictions can be made to show when perovskite LEDs will surpass conventional LEDs. The research started by predicting each color’s operational stability within the next ten years, which helps show how rapidly perovskites are truly improving. To do so, we gathered papers that reported each color’s operational stability over the years.

### Predicted Blue PeLED Operational Lifetime in 10 years:

Starting with blue, the operational stability was 14.5 minutes in 2019, 51 minutes in 2020, and 81.3 minutes in 2021. With this data, one can create a line of best fit through the points, obtain a growth rate, and predict future operational stability. Fitting the data gives the equation $y = 33.4x - 619.1$, where again $x$ is the last two digits of the year and $y$ is the lifetime in minutes. Substituting $x = 31$, representing 2031 (10 years from now), gives a predicted operational lifetime of about 7 hours for blue by 2031 (view equation (B) for solution and figure 7 for growth).
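The line of best fit can be reproduced from the three data points. The sketch below assumes an ordinary least-squares fit, which is our assumption about the fitting method; it happens to recover the quoted slope and intercept. Function names are hypothetical.

```python
# Sketch: ordinary least-squares line through the blue lifetime data.
# Assumption: the "line of best fit" in the text is a least-squares fit.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


YEARS = [19, 20, 21]                # last two digits of 2019, 2020, 2021
LIFETIMES_MIN = [14.5, 51.0, 81.3]  # blue operational lifetime in minutes

if __name__ == "__main__":
    slope, intercept = fit_line(YEARS, LIFETIMES_MIN)
    lifetime_2031_min = slope * 31 + intercept
    print(f"y = {slope:.1f} x + ({intercept:.1f})")
    print(f"2031: {lifetime_2031_min:.0f} min = {lifetime_2031_min / 60:.1f} h")
```

The fit returns a slope of 33.4 and an intercept of about -619.1, and the 2031 value is roughly 416 minutes, or about 7 hours.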

It is important to notice that blue’s operational lifetime growth is linear. This is extremely valuable since blue is the most challenging color to create due to its large bandgap which will be further explained once compared to red and green’s operational lifetime growth.

### Predicted Red PeLED Operational Lifetime in 10 years:

Just as for blue, the investigation took similar steps to predict red perovskites’ operational stability in 10 years. Again, red’s previous operational stability was collected from research papers to estimate a growth rate. In 2017, red operated for 16 hours; this increased to 30 hours in 2018 and to 317 hours in 2021. This data differs from blue’s in that it grows exponentially, so an exponential fit was used instead of a linear one. This has to do with the fact that red is a much easier color to achieve than blue, allowing it to advance faster. The fitted exponential equation for red is $y = 0.0000273735 \times 2.16953^{x}$. Substituting $x = 31$, representing 2031, shows that by 2031 red is predicted to operate for 726,735 hours, an extremely noticeable difference from the current operational stability (view equation (C) for solution and figure 8 for growth).

### Predicted Green PeLED Operational Lifetime in 10 years:

The same process was repeated for green, gathering its reported operational stability to estimate the operational lifetime in 10 years. In 2016 the green perovskite’s operational stability was 10 hours; two years later, in 2018, the LEDs operated for 46 hours; and in 2021 they reached their current maximum of 208 hours. This allowed the creation of an exponential curve of best fit, $y = 0.0033 \times 1.69116^{x}$. Substituting the last two digits of the year, $x = 31$ for 2031 (10 years from 2021), gives an operational lifetime for green of 39,105 hours (view equation (D) for solution and figure 9 for growth). This operational stability for 2031 is smaller than red’s, considering that red is the most efficient.
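The two exponential fits can be checked against the reported data points and then evaluated for 2031. The sketch below is illustrative only; it uses the bases 2.16953 and 1.69116 that appear in the commercialization equations later in this section, and the function names are hypothetical.

```python
# Sketch: exponential lifetime fits for red and green PeLEDs.
# x is the last two digits of the year; the result is in hours.

def red_lifetime_h(year_two_digit):
    """Red fit: y = 0.0000273735 * 2.16953 ** x."""
    return 0.0000273735 * 2.16953 ** year_two_digit


def green_lifetime_h(year_two_digit):
    """Green fit: y = 0.0033 * 1.69116 ** x."""
    return 0.0033 * 1.69116 ** year_two_digit


if __name__ == "__main__":
    # Compare with the most recent reported lifetimes (317 h and 208 h in 2021)
    print(f"red 2021:   {red_lifetime_h(21):.0f} h")
    print(f"green 2021: {green_lifetime_h(21):.0f} h")
    # Ten-year projections
    print(f"red 2031:   {red_lifetime_h(31):,.0f} h")
    print(f"green 2031: {green_lifetime_h(31):,.0f} h")
```

Both fits land close to the 2021 measurements, and the 2031 projections come out in the neighborhood of the values quoted above; small differences are expected from the rounding of the fit coefficients.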

The fact that blue’s bandgap is the widest correlates directly with its growth rate. The bandgap denotes the minimum energy required to excite an electron into the conduction band, where it can conduct current. The valence band is the lower energy level, and if there is a gap between this level and the higher-energy conduction band, energy must be added to allow electrons to flow. Green and red have smaller bandgaps, which correspond to longer operational stability and exponential growth. Blue, however, has a wider bandgap, which corresponds to shorter operational stability and linear growth. Because of its higher-energy bandgap, blue requires more energy to emit a photon, ultimately compromising the efficiency of blue perovskites.

## III. PeLED Commercialization Viability

### Red PeLED Commercialization:

Now that equations for each color have been obtained, predictions of commercialization can be considered. To find when each color can reach commercialization, one must acknowledge that an average LED can last $6 \times 10^{6}$ h. Using this information, one can set $y = 6 \times 10^{6}$ h, the operational stability, in the red perovskite equation and solve for the year $x$: $0.0000273735 \times 2.16953^{x} = 6 \times 10^{6}$. The solution lies between 33 and 34, representing the year 2034. This means that by the year 2034 the red perovskite should be able to compete with LEDs (view equation (E) for solution).

### Green PeLED Commercialization:

One can repeat the process for the green perovskite, substituting the desired operational lifetime into its equation. Setting $y = 6 \times 10^{6}$ h gives $6 \times 10^{6} = 0.00336293 \times 1.69116^{x}$, whose solution is approximately $x = 40$, representing the year 2040. By the year 2040, green perovskites are estimated to commercialize (view equation (F) for solution).
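For reference, both exponential equations are solved the same way, by taking logarithms. Writing the fit as $y = a\,b^{x}$ and the target lifetime as $y_{t}$ (symbols introduced here for illustration):

$$a\,b^{x} = y_{t} \quad\Longrightarrow\quad x = \frac{\ln\left(y_{t}/a\right)}{\ln b}$$

With $y_{t} = 6 \times 10^{6}$ h this gives

$$x_{\text{red}} = \frac{\ln\left(6 \times 10^{6} / 0.0000273735\right)}{\ln 2.16953} \approx 33.7, \qquad x_{\text{green}} = \frac{\ln\left(6 \times 10^{6} / 0.00336293\right)}{\ln 1.69116} \approx 40.5,$$

which correspond to the red and green years quoted above.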

### Blue PeLED Commercialization:

Lastly, for the blue perovskite, one must solve $33.4x - 619 = 6 \times 10^{6}$ for the estimated year, which gives $x = 179{,}659.25$. This number appears very different from the predicted commercialization years of the other perovskites, which fall in the 2000s. To convert the number as was done for red and green, one adds 2000, which results in a predicted commercialization year for blue of 181,659.25 (view equation (G) for solution).

Figure 8 displays the equation $y = 0.0000273735 \times 2.16953^{x}$, used to predict the operational lifetime of the red perovskite in 10 years. Figure 9 displays the equation $y = 0.0033 \times 1.69116^{x}$, used to predict the operational lifetime of the green perovskite in 10 years.

Blue is immensely behind; however, this is without taking into account advances that will be made in the future. In particular, these numbers do not account for ion migration and Joule heating suppression methods, nor for new research and studies that will specifically help advance the blue perovskite.

## IV. Pathways to enhance PeLED performance and stability

### Ion Migration Suppression Strategies:

As of now, there are many methods used to counteract ion migration and Joule heating. One is B-site engineering, which consists of doping the B site with metal ions such as Mn$^{2+}$. Traps in the mid-gap can prevent electrons from recombining with holes, ultimately limiting the perovskite because light cannot be emitted. Reducing traps allows electrons to recombine without encountering any obstructions. The addition of Mn$^{2+}$ lowers the trap and defect density (the number of defects) in perovskites, which reduces ion migration and results in greater operational stability. One study that used this method enabled blue perovskites to operate 1,440 times longer than the original undoped perovskites. This study also applied B-site engineering to red perovskites, allowing them to operate 305 times longer than the undoped perovskites.

The second method is Precursor Solution Composition Optimization. A perovskite film can have one of three degrees of crystalline order. An amorphous film has no discernible order. A crystalline film is fully ordered and visually resembles a checkerboard. Lastly, a polycrystalline film contains small, ordered groupings (grains) of ions, with grain boundaries at the limits of those groupings.

In a polycrystalline material, a grain boundary is a point where two grains, or crystallites, meet. Grain boundaries are two-dimensional defects in the crystal structure that reduce the material’s electrical and thermal conductivity. This indicates that there are flaws in the device that cause it to degrade. Halide ions, such as bromide, chloride, or iodide, can easily cross grain boundaries since they have the lowest activation energy.

These halide ions sit at grain boundary cusps, and because they are not strongly bound to the grain, they can be swept away by the electric field, resulting in ion migration. Under the electric field applied to the layer, halide ions are swept across the film and concentrated in certain regions, producing segregation. Instead of employing cesium bromide (CsBr) as the precursor, one can use cesium trifluoroacetate (CsTFA). CsTFA-derived films have a flatter energy landscape (a more homogeneous energy level distribution for charges), a more stable crystal structure, superior optical characteristics, and reduced ion migration compared to films made with the CsBr method.

As a result, the grain boundaries become passivated, and the halide ions become more difficult to displace within the electric field. This produces tighter films without bubbles (effectively a continuous sheet, compared to the defect-laden films from cesium bromide), so less ion migration occurs. A research group used Precursor Solution Composition Optimization, which allowed green perovskites to operate 17 times more efficiently.

### Applying Precursor Solution Composition Optimization to Green PeLEDs:

These factors can then be applied to the current undoped operational lifetime. Precursor Solution Composition Optimization allows green perovskites to function 17 times more efficiently. Therefore, one first finds the lifetime that, multiplied by 17, equals $6 \times 10^{6}$ h, which is 352,941 h. One can then calculate the year $x$ in which green will reach 352,941 h. The solution is $x = 35$, meaning that green will be able to commercialize in the year 2035 (view equation (H) for solution).

### Applying B-site Engineering to Red PeLEDs:

By applying B-site engineering, red perovskites will be able to operate 305 times longer than in our original predictions. One must find the minimum number that, multiplied by 305 (the factor by which the doped perovskite outlasts the undoped one), equals $6 \times 10^{6}$ h. This number is 19,672 h. Substituting it for $y$, the operational lifetime, in the red perovskite equation and solving for $x$ gives 26, representing the year 2026. This method brings commercialization forward by 7 years compared to not using any ion migration suppression methods (view equation (I) for solution).

The set of equations A-J shows the steps taken to predict blue EQE (A), blue perovskite estimated operational lifetime in ten years (B), red perovskite estimated operational lifetime in ten years (C), green perovskite estimated operational lifetime in ten years (D), predicted red perovskite commercialization year (E), green perovskite commercialization year (F), blue perovskite commercialization year (G), green perovskite estimated commercialization year using Precursor Solution Composition Optimization (H), red perovskite estimated commercialization year using B-site engineering (I), and blue perovskite estimated commercialization year using B-site engineering (J).

### Applying B-site Engineering to Blue PeLEDs:

The same process can then be repeated for the blue perovskite. In this case, however, ion migration suppression made the operational lifetime 1,440 times longer. The minimum number that, multiplied by 1,440 (the factor by which the doped perovskite outlasts the undoped one), equals $6 \times 10^{6}$ h is 4,166 h, which was substituted for $y$, the operational lifetime, in the blue perovskite equation. Solving for $x$, the estimated year, gives 143, representing the year 2143, when the blue perovskite would be able to commercialize (view equation (J) for solution). Although this is significantly sooner than the undoped blue perovskite commercialization prediction, there are still other factors one must consider. Since perovskites are rapidly evolving, there are still numerous studies and ion migration suppression methods that can be applied. With this in mind, new research can be applied to every color’s estimated commercialization year. Although red and green will be able to commercialize sooner than blue, blue is still a color with great improvements expected over the next few years.
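The three suppression-adjusted estimates follow one pattern: divide the target lifetime by the improvement factor, then invert the corresponding fit. The sketch below illustrates this with hypothetical helper names and the coefficients quoted in the text. Note that it follows the text in applying the hour-valued target directly to the blue line, even though the blue data were reported in minutes.

```python
import math

# Sketch: commercialization year after applying a lifetime improvement factor.
TARGET_LIFETIME_H = 6e6  # LED lifetime used as the target in the text


def year_from_exponential(a, b, target):
    """Solve a * b ** x = target for x."""
    return math.log(target / a) / math.log(b)


def year_from_linear(slope, intercept, target):
    """Solve slope * x + intercept = target for x."""
    return (target - intercept) / slope


def required_undoped_lifetime(improvement_factor):
    """Lifetime the undoped device must reach so the treated one hits the target."""
    return TARGET_LIFETIME_H / improvement_factor


if __name__ == "__main__":
    green_x = year_from_exponential(0.00336293, 1.69116, required_undoped_lifetime(17))
    red_x = year_from_exponential(0.0000273735, 2.16953, required_undoped_lifetime(305))
    # Blue: follows the text as written (target in hours, fit in minutes)
    blue_x = year_from_linear(33.4, -619.1, required_undoped_lifetime(1440))
    for name, x in [("green", green_x), ("red", red_x), ("blue", blue_x)]:
        print(f"{name}: x = {x:.1f} -> year {2000 + int(x)}")
```

Run as a script, this reproduces the years 2035, 2026, and 2143 quoted for green, red, and blue respectively.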

## V. Modeling the Economies of Scale towards Mass Production

To evaluate the total cost of mass-producing perovskite LED materials in the project of replacing all light sources in the US, we would need to apply economies of scale to our calculations. Economies of scale is the total average cost savings obtained by an enterprise for a greater quantity of production. As production increases, the total average cost decreases, resulting in a lesser total cost than the sum of the price per unit. In the generic model, the variables P1 and P2 represent the cost of production, and the variables Q1 and Q2 represent the quantity of production, and with an increase in the quantity of production (Q value), there is a gradual decline in the cost of production.

Our cost model, which accounts for economies of scale, was developed from a quotation by the Stanford Congreve Lab for TPBi, which lists prices at different quantities in grams. From these numbers, we were able to gather initial prices and establish a trend. The prices initially provided were $566.00 for 2 grams and $1341.00 for 5 grams, thus $283.00/g and $268.20/g respectively, while the cost for 1 gram is between $600 and $700. Modeling economies of scale required developing an asymptotic model based on these costs for 2 grams and 5 grams, in order to calculate the price per gram for 10,000 grams and the price for bulk production of 1 billion at this rate.

For an increase in gram quantity by a factor of 2.5, there is a 5.23% decrease in cost per gram. This number was obtained by subtracting 268.20 from 283.00, the respective prices per gram for 5 grams and 2 grams, and expressing the resulting 14.80 as a percentage of 283.00. By this model, we can calculate a projected $169.69/g for 10,000 grams and a projected $1.70/g for 1 billion. The asymptotic model for TPBi prices for increasing gram quantities follows this trend. Having calculated our model for the TPBi layer, we can then apply it similarly to each layer, which would be expected to follow a similar price trend.
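One simple way to express such a trend is a constant percentage decrease in price per gram for every 2.5-fold increase in quantity. The sketch below makes that assumption explicit; it is not necessarily the exact model used to obtain the projected figures, and the function and parameter names are hypothetical.

```python
import math

# Sketch: price per gram under a constant fractional drop per 2.5x quantity.
# Assumption: the 5.23% drop observed between 2 g and 5 g repeats for every
# further 2.5x increase in quantity.

def price_per_gram(quantity_g,
                   ref_quantity_g=2.0,
                   ref_price_per_g=283.00,
                   step_factor=2.5,
                   step_ratio=268.20 / 283.00):
    """Return the modelled price per gram at the given quantity."""
    # Number of 2.5x steps (possibly fractional) from the reference quantity
    steps = math.log(quantity_g / ref_quantity_g) / math.log(step_factor)
    return ref_price_per_g * step_ratio ** steps


if __name__ == "__main__":
    for grams in (2, 5, 10_000):
        print(f"{grams:>6} g: ${price_per_gram(grams):.2f}/g")
```

By construction the model returns the quoted prices at 2 g and 5 g. At 10,000 g it gives a value of roughly $172/g, in the same range as the projected figure above; the exact result depends on rounding and fitting choices that are not specified in the text.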

To apply these numbers, we must consider the fact that for TPBi, 10,000 grams amounts to 1,000,000 substrates. For the ITO layer, $250.00 is the approximate price for 100 substrates, and for an increase in the number of substrates by a factor of 250, there is a 0.05% decrease in cost per substrate. For 1,000,000 substrates, the cost can be estimated to be $237.48, a mere fraction of a cent per substrate at that quantity. Similarly, an approximate $125.00 corresponds to 100 substrates for the PEDOT layer, and by the same process, the total cost for 1,000,000 substrates can be estimated to be $118.74. For the CsPbBr$_3$ layer, the original costs were found to be $38.00 for CsBr and $10.64 for PbBr$_2$ for 1 substrate, and once more by the same process, the cost for 1,000,000 substrates can be estimated to be $34.29 for CsBr and $9.60 for PbBr$_2$. For the LiF layer, $1 corresponds to the price of LiF for 1 substrate, and for 1,000,000 substrates, the cost can be calculated to be approximately $0.90. Finally, $0.27 corresponds to the price for 1 substrate in the aluminum layer, and the cost for 1,000,000 substrates can be similarly estimated to be $0.24. These costs were originally gathered from Sigma-Aldrich and Ossila.

Finally, starting from our calculated value of 3 to 4 cents for the cost of 1 substrate for a perovskite LED, we can apply our model to find this cost at mass production, which amounts to a decrease of approximately a fifth of a cent for each increase in quantity by a factor of 2.5. At production of 1 billion, the price would be about 11 million USD.

## VI. Financial, Performance, and Energy Analysis of Transitioning to PeLED-based Lighting

With the preceding analysis of perovskite LEDs in hand, the ultimate question of whether the price of perovskite LEDs is worth the energy consumption can finally be addressed. Using the equation $P = \frac{E}{t}$, where $P$ is power, $E$ is energy, and $t$ is time, it is possible to calculate the energy of each color of perovskite LED given its power and operating time. One must then combine these values to find the total energy the US would need to power its lighting with perovskite LEDs. Since these perovskites are assumed to be at the level where they can compete with current LEDs, each color is taken to be at its maximum EQE of 25%. The calculation proceeds as follows: $((0.063 \times 1.1) + (0.041 \times 1.4) + (0.118 \times 2.5)) \times (60 \times 60 \times 6 \times 360 \times 328{,}200{,}000)$. Each power value is multiplied by a specific factor to approximate its value at around 25% EQE, which gives a better representation of the future. The factor of 3,600 ($60 \times 60$) converts between hours and seconds, and the remaining factors scale the result to a year of use for every person in the US. This equates to approximately $1.5 \times 10^{15}$ kWh. Compared to the 136 billion kWh the US consumes per year for lighting, this does not currently seem to be a logical fit, but with Joule heating suppression methods along with the rapid improvements of these perovskite LEDs, they are sure to reach the stability and energy use of regular LEDs in the future.
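The expression can be evaluated step by step while tracking units. The sketch below rests on a unit interpretation that is our assumption rather than something stated in the text: the powers are taken in watts and the time factor in seconds, so the product is an energy in joules, and a separate helper converts joules to kilowatt-hours. All names are hypothetical.

```python
# Sketch: evaluating the annual lighting energy expression with explicit units.
# Assumption: powers are in watts and 60*60*6*360 is a duration in seconds.

SCALED_POWERS_W = [
    0.063 * 1.1,  # green, scaled toward 25% EQE as in the text
    0.041 * 1.4,  # red
    0.118 * 2.5,  # blue
]
SECONDS_OF_USE = 60 * 60 * 6 * 360  # time factor from the expression
US_POPULATION = 328_200_000         # population factor from the expression
JOULES_PER_KWH = 3.6e6              # 1 kWh = 3.6e6 J


def annual_energy_joules():
    """Total energy (J) = summed power (W) * time (s) * number of people."""
    return sum(SCALED_POWERS_W) * SECONDS_OF_USE * US_POPULATION


def joules_to_kwh(energy_j):
    """Convert an energy in joules to kilowatt-hours."""
    return energy_j / JOULES_PER_KWH


if __name__ == "__main__":
    energy_j = annual_energy_joules()
    print(f"summed power: {sum(SCALED_POWERS_W):.4f} W per person")
    print(f"energy: {energy_j:.3e} J = {joules_to_kwh(energy_j):.3e} kWh")
```

Under these unit assumptions the product evaluates to roughly $1.1 \times 10^{15}$ J. The result need not match the figure quoted above if additional factors or different units were used in the original calculation.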

## VII. Cumulative Research Prediction of White PeLED

Taking into account power analysis, operational stability, and economies of scale, our research is sufficient to predict how much power a white PeLED would use, how long it would operate, and how much the product would cost the U.S. as a whole. As previously stated, the blue PeLED has an operational lifetime of 81 min, noticeably behind the green and red PeLED operational lifetimes; using multiple blue PeLEDs in one white PeLED would therefore help extend its lifetime. For example, creating a white PeLED requires one red, one green, and one blue PeLED. However, one could use ten blue PeLEDs to extend the lifetime tenfold, as long as only one blue PeLED is used at a time. Nevertheless, there are limitations, such as the cost of having multiple PeLEDs in one unit and the large amount of power needed to operate them. Therefore, as all colors currently stand, we would use only 173 blue PeLEDs in one white PeLED, because this would increase the operational lifetime to 207 hours (close to the green PeLED's current operational stability of 208 hours). We multiplied the power needed to light a blue PeLED by 173 and then added the power needed to operate one green and one red PeLED. The power for this white PeLED would equate to 20.518 kW. According to our calculations, supplying the U.S. population with units containing 173 blue PeLEDs, one red, and one green would cost \$128 million. As they currently stand, white PeLEDs would not be the best alternative.

However, we can also estimate the lifetime of the white PeLED 16 years from now. It is important to consider how the white PeLED would change in the future, particularly in 16 years, when the green perovskite would have reached an operational stability of 1 million. At that time the blue perovskite would operate for a total of 10 hours, and could potentially operate for 1,000 hours using 100 blue PeLEDs in one unit. These blue PeLEDs would also take less energy as time progressed, with the power totaling 11.9 kW to operate one white LED in the year 2037. The cost to fully transition to these white PeLEDs would be 127 million dollars. This is a lower cost for a longer operational lifetime and less power than where they currently stand. It is also important to consider that future advances to combat ion migration could further improve power, operational lifetime, and cost.

# Challenges Encountered

We are currently facing many problems when considering the stability of PeLEDs. PeLEDs are unstable at the moment, and the goal for the future is to increase the stability of each perovskite. However, issues such as ion migration and joule heating must be solved before that goal can be reached.

Before examining the problems caused by ion migration, we must first understand what an electric field is. An electric field surrounds electrically charged particles and causes the attraction and repulsion of other electrically charged particles. Ion migration occurs in the perovskite layer, where cations or anions (typically halide anions in lead halide perovskites, as seen in figure 3) drift toward either the negative or positive side of the electric field, which creates unevenness throughout the perovskite. Once the ions congregate near the terminals, light emission becomes uneven, which in turn compromises the performance of the device. This effect is due solely to the electric field.

Joule heating occurs in the electron transport layer, the perovskite layer, and the hole transport layer. It is the heat produced by current flowing through a material. The problem is that it can raise the temperature of the material by up to 40 degrees Celsius, which in turn degrades the material and decreases its stability. Any material is optimized at 200 K, and performance progressively regresses as temperature increases: electrons are scattered more, which decreases the current. No flow amounts to no current, which results in no light emission. This is solely a thermal effect.

Another contributing factor to perovskite degradation is band gap width, which can cause problems with operational lifetime, particularly for the color blue. Red is the most efficient color because of its small band gap. Blue has the widest band gap of the three colors, which means that more energy is needed to move an electron. For red, which has the smallest band gap, the electron does not need as much energy to drop down. The smaller band gaps give red and green greater stability. For blue, because the band gap is wider, the electron occasionally gets trapped in the middle of the band gap, an area called the mid-gap. Here the perovskite cannot emit light, as light emission does not occur in the mid-gap, only when the electron falls. This is one of the major problems faced, as band gap width is not something that can be changed. Unlike green and red, which have longer operational lifetimes, blue currently faces this problem and will continue to face it. Mid-gap traps affect the operational lifetime of more than just the blue perovskite, but they jeopardize the blue perovskite's lifetime in particular.

Another challenge encountered was the limited accuracy of our economies of scale model. The model used to calculate the cost values at various quantities yields imperfect estimates. It is based on very small quantities of TPBi, so it could have been made more accurate if costs for larger quantities had also been available for comparison. Additionally, all estimates are based on the model for TPBi, so the estimates for the other materials are expected to be slightly less accurate. Although it does not specifically account for such inaccuracies, the model is itself an estimation and only approximates the trend in cost change at larger scales.

# Conclusion

Whether we consider each individual color of perovskite LED or all of them together, they will undoubtedly play a fundamental role in our future. Each aspect explained in the preceding sections of this paper shows evidence of the rapid evolution of these perovskites, whether in the power of each color, their operational stability, or the price of each perovskite.

The power analysis played an equally important role in determining the reliability of each perovskite. Using the equation P=IV, the peak EQEs of each color, and J-V graphs, and combining them to obtain a final value for power, all demonstrate the importance of the power analysis. As each color’s EQE continues to rise, so will the efficiency of perovskites, which may be the most important factor in transitioning from III-V LEDs and incandescent lighting to perovskites. With methods to address joule heating, perovskites are improving rapidly, which is very promising for their future.

By using previously reported perovskite operational lifetimes, we had sufficient information to predict whether PeLEDs are worth the transition. We started with the prediction of EQE for the blue perovskite, where research showed that growth is rapid. Green and red perovskites have also shown large projected improvements in operational lifetime. While the blue perovskites do seem to struggle, there are advances such as B-site engineering and precursor solution composition optimization to combat their short operational stability. Most importantly, the current white PeLEDs would not be a reasonable transition, with the blue perovskite being the largest setback. However, with time we expect to see white PeLEDs thrive.

A significant consideration in evaluating the cost of transitioning to lighting by perovskite LEDs was economies of scale. At the high production volume necessary for this replacement, the total financial estimate could not be made accurately by multiplying the price per unit by the intended number of units, as such a calculation would overestimate the actual cost. Thus, the reduction in cost was calculated based on the original prices gathered from Sigma-Aldrich and Ossila. The economies of scale assumed for these calculations were a continued 5.23% decrease in cost per gram for each quantity increase by a factor of 2.5, and by the model we used, this amounts to an LED cost of a fraction of a cent when the quantity in grams increases along that trend. Currently, to supply the U.S. population with next-generation lighting, it would cost \$128 million.

# References

1. Aneer Lamichhane, Nuggehalli M. Ravindra, Energy Gap-Refractive Index Relations in Perovskites, Materials, 10.3390/ma13081917, 13, 8, (1917), (2020).
2. ResearchGate. Nov. 2014, http://www.researchgate.net/publication/280845761_Nobel_Prize_in_Physics_The_birth_of_the_blue_LED. Accessed 10 Aug. 2021.
3. “Congreve Lab.” Congreve Lab, congrevelab.stanford.edu/.
4. “External Quantum Efficiency.” External Quantum Efficiency – an Overview | ScienceDirect Topics, www.sciencedirect.com/topics/engineering/external-quantum-efficiency.
5. Jiang, Yuanzhi. “Spectra stable blue perovskite light-emitting diodes.” Nature Communications, 23 Apr. 2019, http://www.nature.com/articles/s41467-019-09794-7. Accessed 27 July 2021.
6. Ma, Dongxin. “Chloride Insertion–Immobilization Enables Bright, Narrowband, and Stable Blue-Emitting Perovskite Diodes.” ACS Publications, 9 Mar. 2020, pubs.acs.org/doi/abs/10.1021/jacs.9b12323. Accessed 27 July 2021.
7. Ren, Zhenwei. “High-Performance Blue Perovskite Light-Emitting Diodes Enabled by Efficient Energy Transfer between Coupled Quasi-2D Perovskite Layers.” Wiley Online Library, 2021, onlinelibrary.wiley.com/doi/epdf/10.1002/adma.202005570. Accessed 27 July 2021.
8. IOPscience. 16 Oct. 2017, iopscience.iop.org/article/10.1088/1361-6528/aa8b8b/meta. Accessed 27 July 2021.
9. Dong, Qi. “Operational stability of perovskite light-emitting diodes.” IOPscience, 2020, iopscience.iop.org/article/10.1088/2515-7639/ab60c4/pdf. Accessed 27 July 2021.
10. Li, Hanming. “Efficient and Stable Red Perovskite Light-Emitting Diodes with Operational Stability >300 h.” Wiley Online Library, 9 Mar. 2021, onlinelibrary.wiley.com/doi/abs/10.1002/adma.202008820. Accessed 27 July 2021.
11. Wang, Yanan. “Ultrastable, Highly Luminescent Organic–Inorganic Perovskite–Polymer Composite Films.” Advanced Materials, 2016, lcd.creol.ucf.edu/Publications/2016/Advanced_Materials.pdf. Accessed 27 July 2021.
12. Lin, Kebin. “Perovskite light-emitting diodes with external quantum efficiency exceeding 20 per cent.” Nature, 10 Oct. 2018, http://www.nature.com/articles/s41586-018-0575-3?WT.feed_name=subjects_physical-sciences. Accessed 27 July 2021.
13. Han, Boning. “Green Perovskite Light-Emitting Diodes with 200 Hours Stability and 16% Efficiency: Cross-Linking Strategy and Mechanism.” Wiley Online Library, 17 Apr. 2021, onlinelibrary.wiley.com/doi/abs/10.1002/adfm.202011003. Accessed 27 July 2021.
14. “Band Gap.” Energy Education, energyeducation.ca/encyclopedia/Band_gap#:~:text=A%20band%20gap%20is%20the,it%20can%20participate%20in%20conduction. Accessed 10 Aug. 2021.
15. Dong, Qi. “Operational stability of perovskite light-emitting diodes.” IOPscience, 2020, iopscience.iop.org/article/10.1088/2515-7639/ab60c4/pdf. Accessed 27 July 2021.
16. Wikipedia. en.wikipedia.org/wiki/Grain_boundary. Accessed 10 Aug. 2021.
17. Xiao, Peng. “Advances in Perovskite Light-Emitting Diodes Possessing Improved Lifetime.” Nanomaterials, 4 Jan. 2021, http://www.mdpi.com/2079-4991/11/1/103/htm. Accessed 10 Aug. 2021.
18. Kenton, Will. “Economies of Scale.” Investopedia, Investopedia, 10 Apr. 2021, http://www.investopedia.com/terms/e/economiesofscale.asp. Accessed 22 July 2021.
19. “Economies of Scale – Definition, Types, Effects of Economies of Scale.” Corporate Finance Institute, 31 Jan. 2021, corporatefinanceinstitute.com/resources/knowledge/economics/economies-of-scale/. Accessed 9 Aug. 2021.

# Investigating the Extent to which Discrepancies in SpO2 Readings Occur due to Differences in Patient Skin Pigmentation

Authors

Hillary Khuu, Jun Hyun Park, Carolina Pavlik, Sylvia Chin

# Abstract

Pulse oximeters measure the level of oxygen saturation, also known as SpO2, in a person’s arterial blood and help diagnose conditions such as anemia, chronic obstructive pulmonary disease (COPD), and sleep disorders. They were developed in 1974 and are the primary method for measuring oxygen saturation. However, some studies show that pulse oximeters may display inaccurate readings when used on individuals with darker skin tones. In light of the COVID-19 pandemic, during which pulse oximeters have frequently been used for self-monitoring, solving technical issues with pulse oximeters has become increasingly crucial.

Our project aims to determine the extent to which pulse oximeters are less accurate on individuals with darker skin tones and to identify the factors that cause inaccurate readings. To obtain a close look at pulse oximetry use in hospitals today, we repurpose a dataset of 400 patients and cross-analyze several demographic variables against pulse oximeter accuracy. Furthermore, we explore engineering methods to minimize discrepancies in SpO2 readings that may occur due to variances in patient skin pigmentation by creating our own pulse oximeters.

# Background

Pulse oximetry uses a technique called photoplethysmography, which uses red and infrared light to detect variations in blood volume [3]. A pulse oximeter shines a red light (wavelength of around 660 nm) and an infrared light (wavelength of around 940 nm) through the finger it is placed on, and a photo sensor on the opposite side of the finger detects the light that passes through. Digital programs can determine the ratio between oxygenated and deoxygenated blood based on these light absorption levels and therefore provide patients with their SpO2 levels. This process is possible because oxygenated blood has a bright red color, whereas deoxygenated blood has a darker color, so the two absorb different amounts of red and infrared light. A healthy level of oxygen saturation hovers between 90% and 100%. An SpO2 reading of 94% indicates that 94% of the hemoglobin in a person’s arterial blood is carrying oxygen and 6% is not [2].
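The step from light absorption levels to an SpO2 value is often described with a "ratio of ratios" calculation, sketched below in Python. This is an illustration under stated assumptions rather than the method used by any particular device or by the code in this project: the function names are hypothetical, the AC and DC inputs stand for the pulsatile and steady parts of the photodiode signal at each wavelength, and the linear mapping `110 - 25 * R` is a commonly quoted textbook approximation, not a calibrated curve.

```python
def ratio_of_ratios(red_ac, red_dc, ir_ac, ir_dc):
    """Return R = (AC_red / DC_red) / (AC_ir / DC_ir).

    AC is the pulsatile component of the detected light and DC the steady
    baseline, measured separately for the red and infrared LEDs.
    """
    if red_dc <= 0 or ir_dc <= 0 or ir_ac <= 0:
        raise ValueError("DC levels and infrared AC level must be positive")
    return (red_ac / red_dc) / (ir_ac / ir_dc)


def estimate_spo2(r):
    """Map R to an SpO2 percentage with an uncalibrated linear approximation.

    Real devices use empirically calibrated lookup curves; the constants
    here are assumptions for illustration only.
    """
    spo2 = 110.0 - 25.0 * r
    # Clamp to the physically meaningful range of 0 to 100 percent.
    return max(0.0, min(100.0, spo2))


# Made-up signal levels for demonstration.
r = ratio_of_ratios(red_ac=0.02, red_dc=1.0, ir_ac=0.04, ir_dc=1.0)
print(r, estimate_spo2(r))
```

Dividing each AC component by its DC baseline normalizes away factors that dim both signals equally, which is one reason calibration against a sufficiently diverse population matters when a factor such as skin pigmentation affects the two wavelengths differently.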

There are two methods of pulse oximetry: reflective and transmissive. Reflective pulse oximetry is a process in which light is emitted through a finger, bounces off reflective material on the opposite side of the finger, and is then received by a photodiode located on the same side as the light emitters. Transmissive pulse oximetry, the more widely used and more accurate method, involves light being emitted through a finger and received by a photodiode on the opposite side of the finger [4].

FDA-cleared pulse oximeters are accurate to within 2% to 3% of arterial blood gas values. Factors that may affect the accuracy of pulse oximeter readings include poor circulation, skin pigmentation, skin thickness, skin temperature, tobacco use, and fingernail polish [7].

In the 1970s, when pulse oximeters were being created and tested, the population on which they were used was not racially diverse. Recently, research reports have concluded that pulse oximeters are more likely to miss low oxygen levels in darker-skinned individuals than in lighter-skinned individuals. These researchers used the Fitzpatrick scale, a method of categorizing skin tone based on the ability to burn and tan. Their findings suggest that these devices may not be equally accurate on all skin tones. One explanation for why the potential for racial bias in pulse oximetry has not been publicized until recently lies in the current U.S. Food and Drug Administration (FDA) requirements. As of 2021, the FDA requires at least two test subjects or 15% of subjects (whichever amount is larger) to be “darkly pigmented” [8].

# Methods and Materials

• Dataset from BMC Pulmonary Medicine journal’s “A multicentre prospective observational study comparing arterial blood gas values to those obtained by pulse oximeters used in adult patients attending Australian and New Zealand hospitals”
• 10 of Newark’s MCL053PD red LED
• 4 of Newark’s OP165A infrared emitter
• 4 of Newark’s TEFD4300 photodiode
• 1 of Newark’s A000066 Arduino Uno Board
• 10 of Newark’s MCF 0.25W 10K ohm resistor
• 10 of Newark’s MCCFR0W4J0331A50 330 ohm resistor
• 1 of Newark’s 759 jumper wires
• 4 of Newark’s SSL-LX5093UWW white LED
• 1 USB-A to USB-B cable
• 1 I2C 16×2 Arduino LCD Display Module
• Male to female jumper wires

Our project focused on examining the relationship between pulse oximetry readings and potential racial bias due to variations in skin pigmentation. To investigate, we contacted Janine Pilcher, first author of “A multicentre prospective observational study comparing arterial blood gas values to those obtained by pulse oximeters used in adult patients attending Australian and New Zealand hospitals,” which was published in the BMC Pulmonary Medicine journal, and obtained her dataset of 400 patients [5]. While her research tested the accuracy of pulse oximeters (which provide the SpO2 value) against arterial blood gas tests (which provide the SaO2 value) for determining oxygen saturation levels, her team recorded information on skin tone as well, making the dataset viable for our project.

We used the Fitzpatrick scale as a reference for determining different levels of human skin pigmentation. The Fitzpatrick scale classifies human skin color into six categories that are dependent on the skin’s melanin concentration and reaction to UV rays. A low number on the scale generally indicates a lighter skin tone that burns more often than tans, while a high number generally indicates a darker skin tone that tans more often than burns [1].

We repurposed Pilcher’s dataset and organized the data by demographic information (i.e. Fitzpatrick type, gender, hospital location). We applied formulas to calculate the average percent error between the SpO2 and SaO2 measurements. We converted these computations into charts and plotted the patient Fitzpatrick scale skin types against the average percent error between SpO2 and SaO2 measurements to interpret whether there were discrepancies in the accuracy of measurements of different skin tones.
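The grouping and averaging described above can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet formulas actually used: the rows are made-up values rather than data from Pilcher's study, the field names and function names are hypothetical, and treating SaO2 as the reference value and taking the absolute difference are assumptions about how percent error was defined.

```python
from collections import defaultdict


def percent_error(spo2, sao2):
    """Absolute percent error of SpO2 relative to the SaO2 reference."""
    return abs(spo2 - sao2) / sao2 * 100.0


def mean_percent_error_by_group(records, key):
    """Average percent error for each value of a demographic field."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [error sum, count]
    for rec in records:
        acc = totals[rec[key]]
        acc[0] += percent_error(rec["spo2"], rec["sao2"])
        acc[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Made-up example rows; not taken from the study dataset.
records = [
    {"fitzpatrick": "I", "spo2": 96.0, "sao2": 100.0},
    {"fitzpatrick": "I", "spo2": 98.0, "sao2": 100.0},
    {"fitzpatrick": "IV", "spo2": 95.0, "sao2": 95.0},
]

print(mean_percent_error_by_group(records, "fitzpatrick"))
```

Passing a different `key`, such as a gender or hospital field, would produce the other breakdowns with the same function. Reporting the number of patients in each group alongside the average would also make it clear when a category is represented by only one or two patients.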

Additionally, we furthered our understanding of how pulse oximeters function by assembling our own pulse oximeters with instructions from Giulio Pons’ “Really Homemade Oximeter Sensor” [6]. Although the guide was excellent, some modifications were required due to accessibility issues. For example, instead of using a pre-assembled KY-039 sensor as the basis of the light emitting and sensing portion of the circuit, our group individually connected a red LED, infrared LED, and photodiode to create the sensor. Also, some parts of the code were edited to fit the parameters of our own pulse oximeter. Eventually, we were able to construct a working pulse oximeter that successfully detects heart rate as well as SpO2 readings.

# Results

We reorganized a dataset of 400 patients from Pilcher’s study in Australia and New Zealand, which compared the accuracy of pulse oximeters to arterial blood gas tests. From this dataset, we were able to organize patients by their Fitzpatrick scale classification, gender, hospital location, and percent error between SpO2 and SaO2 readings.

When creating our pulse oximeters, a significant challenge we faced in ensuring accurate results was correctly positioning the LEDs and photodiode. The red and infrared LEDs had to be placed on the top of the finger and the photodiode on the bottom so that the transmitted light could be detected properly. Because of accessibility issues due to COVID-19, a casing could not be 3-D printed for the pulse oximeter, which required us to manually position the sensors, as well as tape certain components together for stability. Correctly placing the light emitters and sensors was crucial, since incorrect placement could result in inaccurate readings or no readings at all. Also, the lack of at-home soldering material required us to intertwine the metal wires rather than solder them together, which proved to be difficult and tedious at times. As shown below in Figure 1, one member of our research group obtained a heart rate of 91 beats per minute and an SpO2 reading of 95%.

Some studies show that pulse oximeters display results with larger errors when used on individuals with darker skin complexions. However, in our analysis a slight downtrend in percent errors is visible as the Fitzpatrick scale increases (see Figure 4), contradicting our hypothesis. These contradictory results can largely be attributed to the fact that the Australian and New Zealand study mostly conducted tests on patients that were types I through IV on the Fitzpatrick scale, with only one patient in category V and zero patients in category VI, as shown in Figure 3. Therefore, since most data points were obtained from patients with lighter skin tones, the study does not provide a holistic view of the global community. Furthermore, Figure 5 shows that the average percent error varied for each gender, and Figure 6 shows that the average percent error varied for each hospital location. Such discrepancies in factors other than skin tone could also have contributed to the contradictory downtrend in average percent errors as skin tone became darker.

# Conclusion

While the results of our research suggest that patients between types IV and V on the Fitzpatrick scale obtain more accurate pulse oximeter results than those between types I and III, different results may have been obtained if an equal number of patients from each Fitzpatrick category had been tested and the population size of each category had been substantially greater. Our results highlight the necessity of diverse test subjects and new FDA requirements. The FDA requires only two subjects or 15% of subjects, whichever is larger, to be “darkly pigmented” but fails to specify what qualifies as “darkly pigmented” [8]. To clarify, the FDA should list specific Fitzpatrick scale numbers. In addition, the requirements should be adjusted so that the test subjects in each Fitzpatrick scale category make up around 15% of the total in order to reach equal representation. These adjustments should be applied to all Internet of Medical Things (IoMT) devices, not solely pulse oximeters, to remove biases based on racial traits.

Additionally, the study was conducted only in Australia and New Zealand, which strengthens our belief that further research should be conducted on a diverse pool of subjects in order to represent all groups of the world. While our data suggest that percent errors between SpO2 and SaO2 readings decline as the Fitzpatrick type increases, we would need to expand our project globally before the results could be trusted definitively.

Furthermore, while the six categories of the Fitzpatrick scale are beneficial for simplicity, they are oversimplified and do not account for important nuances between darker skin pigmentations, as types V and VI were added many years after the lighter tones [9]. The Fitzpatrick scale is the current scientific standard, and while it has room for improvement, it is superior to its predecessor, Von Luschan’s chromatic scale (VLS). VLS consists of 36 categories, as opposed to the Fitzpatrick scale’s 6, so each category is far more specific, making it more challenging to have a classification for every skin pigmentation. Also, the large number of categories has led to inconsistent results when classifying skin tones. However, one benefit of VLS is that it categorizes skin based on pigmentation. In contrast, the Fitzpatrick scale classifies based on the ability to burn and tan, which may not always correlate with the expected skin pigmentation and does not correspond to any race [10]. Therefore, the FDA should require that the race of the subjects be recorded in conjunction with their Fitzpatrick scale classification to ensure equal representation in the research.

# Future Directions

Through our project, we hope to contribute to the mitigation of racial bias within the medical field in order to improve the lives of all patients. Because COVID-19 has limited human interaction and the ability to work hands-on, we were not able to reach as far into our goals as originally planned. However, future work would include obtaining our own dataset with an equal number of individuals from each Fitzpatrick category to ensure a diversely represented pool of subjects. Testing pulse oximeters on a diverse population would allow us to detect potential discrepancies in SpO2 readings between Fitzpatrick categories.

Furthermore, we would continue to develop stringent recommendations for FDA requirements to ensure safety and accuracy for all patients regardless of skin tone. In addition, we would consult with pulse oximetry experts on their opinions and experiences with inaccurate SpO2 readings that result from variances in skin pigmentation as a method of confirming our hypothesis. We would then continue to modify our pulse oximeters so that skin pigmentation would no longer have any effect on pulse oximeter accuracy. We would test our novel pulse oximeters on the aforementioned group of diverse subjects to confirm the efficiency and accuracy of our successful modification. To further eliminate racial biases and inequities in healthcare, we intend to expand access to our pulse oximeters by making them inclusive and affordable for everyone.

# References

1. Beveridge, Chloe. “Determining your skin type on the Fitzpatrick scale.” Current Body Editorial, 4 May 2018, http://www.currentbody.com/blogs/editorial/determining-your-skin-type-on-the-fitzpatrick-scale. Accessed 21 July 2021.
2. Chan, Edward D., et al. “Pulse oximetry: Understanding its basic principles facilitates appreciation of its limitations.” ScienceDirect, vol. 107, no. 6, 2013, pp. 789-99, https://doi.org/10.1016/j.rmed.2013.02.004. Accessed 21 July 2021.
3. Cheriyedath, Susha, M.Sc. “Photoplethysmography (PPG).” Edited by Yolanda Smith, B.Pharm. News Medical, 27 Feb. 2019, http://www.news-medical.net/health/Photoplethysmography-(PPG).aspx. Accessed 21 July 2021.
4. Nitzan, Meir et al. “Pulse oximetry: fundamentals and technology update.” Medical devices (Auckland, N.Z.) vol. 7 231-9. 8 Jul. 2014, doi:10.2147/MDER.S47319
5. Pilcher, Janine et al. “A multicentre prospective observational study comparing arterial blood gas values to those obtained by pulse oximeters used in adult patients attending Australian and New Zealand hospitals.” BMC pulmonary medicine vol. 20,1 7. 9 Jan. 2020, doi:10.1186/s12890-019-1007-3
6. Pons, Giulio. “Really Homemade Oximeter Sensor.” Project Hub, 13 May 2020, create.arduino.cc/projecthub/giulio-pons/really-homemade-oximeter-sensor-7cf6a1. Accessed 21 July 2021.
7. “Pulse Oximeter Accuracy and Limitations: FDA Safety Communication.” U.S. Food and Drug Administration, 19 Feb. 2021, http://www.fda.gov/medical-devices/safety-communications/pulse-oximeter-accuracy-and-limitations-fda-safety-communication. Accessed 21 July 2021.
8. “Pulse Oximeters – Premarket Notification Submissions [510(k)s]: Guidance for Industry and Food and Drug Administration Staff.” U.S. Food and Drug Administration, 4 Mar. 2013, http://www.fda.gov/regulatory-information/search-fda-guidance-documents/pulse-oximeters-premarket-notification-submissions-510ks-guidance-industry-and-food-and-drug. Accessed 15 July 2021.
9. Sharma, Ajay N., and Bhupendra C. Patel. “Laser Fitzpatrick Skin Type Recommendations.” National Center for Biotechnology Information, 11 Mar. 2021, http://www.ncbi.nlm.nih.gov/books/NBK557626/. Accessed 8 Aug. 2021.
10. “Von Luschan’s chromatic scale.” Wikipedia, 20 June 2021, en.wikipedia.org/wiki/Von_Luschan%27s_chromatic_scale. Accessed 1 Aug. 2021.