Renee Aziz, Stella Chen, Raken Ann Estacio, Kyle Heller, Anthony Sky Ng-Thow-Hing, Jonathan Sneh, Shirley Wang, Patrick Zhu
COVID-19 has challenged the status quo in every aspect of society; theatre and theatrical performance have been swept up in the uncertainty and face multiple obstacles to continuing in their current form. As such, virtual adaptations have become a shaky new ground of experimentation as theatre companies, both amateur and professional, struggle to maintain performances and ensure theatre’s future. Our work through Stanford’s STEM to SHTEM internship allowed us to experiment with various forms of virtual experience in order to produce a performance that could fulfill the requirements of theatre while confined to an entirely virtual space. The final production, entitled “YOU ARE HERE (AND HERE AND THERE),” focused on themes of relativity, perspective, and morals, and built an entirely online performance consisting of multiple platforms and story “tracks” for different audience members to experience. In this paper, we walk through our writing process and the technology and platforms used to build the performance, discuss our experience as the creators and crew, examine audience feedback, and discuss the tentative future of this form of performance, as well as the prospects it opens up for theatre as a whole.
The COVID-19 pandemic has ravaged the world, tearing down corporations, disrupting education, and forever altering society as a whole. Theatre is no exception from the chaos as many state and county health orders barred the gatherings required to produce and perform theatrical works. Traditional methods of theatre have been crippled, unable to maintain normal operation. These unprecedented times have forced the theatre community to experiment with technological implementations for live performance and adjust the parameters of theatre, venturing into risky, unexplored realms.
As thespians started to explore the viable possibilities of digital platforms such as video conferencing software, the shortcomings of virtual theatre were swiftly revealed. Instead of being fully immersed in a play on a stage, the experience of audience members is confined by a single rectangular screen. The lively ambiance emitted by characters has dissipated; instead, they have become shallow and one-dimensional. Much of what has characterized Western theatrical performances over the past millennia is evaporating before our eyes.
All of this leads to a new, pertinent question: Can technology resolve the jeopardized state of theatre? If not used wisely, technology itself could easily complicate problems in the theatrical world. Some attempts to modernize traditional Broadway musicals through film, such as the adaptation of Cats, drew widespread criticism, as the “life” of the performance was suppressed by a one-dimensional forced perspective. In an environment in which digital plays seem to be the only way to keep theatrical culture alive, theatre professionals are forced to reevaluate the core characteristics of theatre and how technology can be used to enhance, rather than eliminate, those values. Clearly, there are differing definitions of these characteristics amongst the theatre community.
Through Stanford’s STEM TO SHTEM summer internship, we created a theatrical performance called “YOU ARE HERE (AND HERE AND THERE)” consisting of multiple platforms and three different paths for audience members to explore. In this paper, we will introduce the technical aspects of the performance we created, discuss our observations as creators as well as the audience’s perspective, and talk about the prospective future of theatre through the deeper implications of our performance.
We began our production by exploring the platform most popularly used for virtual performances – Zoom video conferencing. Turning our cameras on and off served as a virtual alternative to entering and leaving the stage. The virtual background feature was used in place of backdrops and sets, with experimentation in exploiting the spotty background-filtering software to create illusions of floating objects or portals. We used the video filter software Snap Camera to alter our faces and apply masks for characters, mimicking the costume set-ups for traditional theatre while also giving us access to effects not available in conventional theatre without costly prosthetics.
Even with these features, Zoom was still a lackluster performance platform. Video was choppy and lagged. The audience was in control of their own settings and could accidentally see our “backstage”. Rather than enhancing our performance, the technology felt like a poor translator: Zoom forcefully adapted a physical performance into a digital one. To combat the one-dimensionality of Zoom, we decided to take a multi-platform approach instead. Hours of research went into choosing the platforms. As a team, we insisted that each platform chosen must bring something to our digital performance that is not achievable through traditional means.
We started by drafting a story that could take advantage of the live interactive experience that characterizes theatre. The final draft of the script centered on themes of spacetime, celestial bodies, and perspective, using multiple pathways to enable greater interactivity and cast-audience connection. The story casts the audience as a group of space explorers taking their final exam to receive a space exploration license. After a lecture about stars, led by an eccentric face-filtered alien professor, their exam is hacked by a mysterious “star merchant”, who leads the explorers to buy property on a star and transports them to one of three celestial objects: a black dwarf, a neutron star, or a black hole. We divided audience members into three separate “tracks”, one for each star type. To navigate the separate tracks, we developed our own dynamically loading website that took audience members through a quiz and served as a control hub and gateway between acts of the performance. As the quiz concluded, audience members returned from their track with different overall experiences. Each track used different digital platforms in its performance.
On the “Black Dwarf” track, audience members received a transmission from a space-protection agent through the live streaming platform Twitch. We wanted to emulate the feeling of being in a spaceship and having the ship’s screens and controls hijacked. Twitch’s emphasis on the stream itself and limited interactivity best helped us achieve that effect. Initially, we used Zoom, but we noticed that it was hard to immerse and engage an audience through a monologue. Many people easily got “Zoom fatigued.” At the end of the Twitch stream, the audience was transported to a star, which turned into a Black Dwarf. The space-protection agent forced them to aid her in restoring the star, transporting them to a different planet to meet the locals and collect materials. This planet was entirely built out of a web of Google Docs, which acted as live chat-rooms with clickable links and images. Audience members were tasked with retrieving fuel, and saw characters interact through a text-format. At times, this experience almost ventured into the world of video games due to a high level of interactivity.
On the “Neutron Star” track, the scene started with an argument between a mother and her son. Audience members dialed in to a Zoom phone call, which gave them the sense that they were eavesdropping on the conversation and made the entire scene feel more intimate and connected. The track then progressed to a platform called High Fidelity, which emulated the surface of a neutron star. This platform allowed people to join two-dimensional rooms, where audience members moved around the map and talked to each other through spatial audio. The two-dimensional properties of High Fidelity represented the powerful gravity of a neutron star that would instantly flatten anyone within the celestial body’s vicinity. Throughout the scene, our actors interacted with audience members by asking them for their opinions mid-argument, encouraging them to vocalize their own opinions about the practicality of living on a neutron star.
The “Black Hole” track opened in a traditional Zoom room with virtual backgrounds configured to give the appearance of being on a spaceship. The scene centered around two ship captains bickering about their morals. One character had more capitalistic values and supported harvesting stars while the other was an environmentalist and believed in preserving space. The Zoom meeting was designed to be interactive by having audience members perform specific actions, such as using objects around them as props. This forced them to stay engaged with the scene and provided them with agency. Through Zoom’s screen share feature, we played a video to simulate traveling through outer space. We combined prerecorded effects and live performance by having actors react to the different events occurring in the video. Following the Zoom scene, the audience was taken to a YouTube livestream where they were led through a virtual tour of a star. Here, the audience was engaged by answering questions in the live chat and taking a poll, prompting them to ponder their own ideas regarding the tradeoff between capitalism and environmentalism.
The closing scene aimed to connect the theme of perspective to contemporary moments while bringing audience members back into reality. Through Google Earth, we displayed the houses of each audience member, emphasizing connectivity despite being all around the world, and creating a sense of intimacy and interaction. It also served to cement our theme of perspective by showing that, while they had wildly different experiences in the show, they all lived in the same reality.
Throughout the performance, we also used OpenAI’s GPT-2 text generation model to generate certain characters’ lines, as well as the poetry used in the closing Google Earth sequence. We used it mainly because of the experimental nature of the project, but also to add a new constraint that would further spur creativity in writing around the AI-generated segments.
During performance runs, we also had a subset of the crew act as “tech support”, allowing audience members to refer to them whenever they got lost while switching between or within platforms. Our performance tech support crew had roles that could be seen as analogous to what a stage manager would do for a physical performance, giving cues and ensuring that things went smoothly.
The virtual space provided a unique safety net; messages could be sent to actors through Slack or Zoom chat to give cues or raise alerts, even while they performed. As the performances were taking place, we had various technical difficulties both from the audience and from the crew. For example, not every actor or member of our team was able to tell what was happening in other pathways. When we felt that audience members were not getting the experience we intended, we attempted to resolve the issue through Slack or private messaging in the Zoom chat. Throughout the performance, we communicated constantly and relied heavily on our improvisation skills, mitigating the impact of technical issues on the performance.
Sometimes audience members would take a path not assigned to them. Some users did not transition to the website or tried to join the wrong Zoom room, causing them to jump to a different track; this likely points to an issue with this form of performance: a technological skill barrier to entry. One audience member returned to the performance to experience a different pathway but instead was placed on the same pathway three times. Internet connection was thankfully not a huge issue. Most performers did not experience their internet cutting out, but if it did, their roles were covered. Almost no audience members experienced connectivity issues, and for those who did, their internet problems were resolved quickly.
Even though our performances contained a few technical errors and bits of unplanned improvisation, we were delighted with our results. We surveyed audience members one week after our performances and many respondents expressed awe in the creativity allowed by the art form. The fact that, even a week removed from the performance, audience members felt a lasting sense of amazement demonstrates that a virtual space does not limit the emotional impact a performance can leave. Our belief that this form of performance is viable going forward was validated by our audience feedback.
Audience members had conflicting opinions about the connection they felt with other participants. Nearly three quarters of survey respondents felt a significant connection with others, especially when using Zoom, where everyone’s faces were visible, or in the YouTube livestream, where audience members could discuss their opinions in the comment section. On the other hand, there were times when the audience felt confused. For instance, some reported that the instructions were not clear in the beginning, while others felt overwhelmed by quickly switching between platforms. All of these factors could make the performance hard to navigate.
Over three quarters of audience members polled considered this performance to be “theatre.” Some expressed that this was more engaging than traditional theatre at times. One anonymous audience member wrote:
“For me, theatre is about real-time interaction among the actors, between the actors and the audience, and sometimes among the audience members, in ways that are adaptive and may influence the experience. This performance had all of those elements, and to a stronger extent than in traditional theatre.”
However, some audience members disagreed with this sentiment. They believed that it was a “digital performance” or an “experiment” rather than theatre. Regardless, most of the audience reported a shift in their perception of what a virtual performance entails. One anonymous audience member described:
“It became clear that one can become extremely engaged and immersed, no less and even more than with traditional theatre, via virtual theatre when it’s done right, and that virtual theatre has a lot of potential to develop substantially given how effectively it was carried out in this performance using existing technology.”
Despite all the challenges that we faced during the performance, we were able to broaden the scope of theatre while exploring the immense possibilities that technology had to offer.
Our performance, through the abilities of our actors and unique platforms, created a compelling story—one that left viewers to contemplate the unstable relativity of time, the ethics between good and bad, and unity during isolation. Our team whittled down the crucial facets of theatre into a few components/principles: theatre must captivate the audience and leave them with a newfound perspective. It must usher them into contemplation about fundamental ideas within our world and how we function. It is not the setting that makes the actor; it is the actor that carves out the space. An excellent actor should be able to intrigue viewers from any environment, in-person or virtual. Audiences can still be moved to tears or into fits of laughter even in the comfort of their own homes. The pure, unbridled feeling of seeing actors perform and elegantly present a story should be the focus of theatre.
Although we did not have the physical space that theatre traditionally requires, we incorporated unconventional environments for performance such as Twitch and High Fidelity, ultimately altering our vision of theatre and of what constitutes a theatrical performance. The three paths gave each group of audience members a different perspective, creating room for discussion. The closing Google Earth scene gave the audience a feeling of uncertainty and disorientation, mirroring the emotions created by the COVID-19 pandemic. Our show’s interactivity added to the engagement aspect of the performance: audience members were constantly on their toes, needing to move from platform to platform.
Modern society is becoming increasingly technological, and this move towards technology has only been accelerated by the pandemic. To adapt to this almost completely virtual way of life, we must be willing to broaden the scope of what we define as theatre, and discover new ways to interact with audiences. We have been forced to alter our perception by reconsidering some elements of theatre that we had originally thought of as crucial to the performance (e.g., a physical space). The changes that COVID-19 has brought upon the theatre industry should not be seen as disadvantages; instead, they should be viewed as opportunities for experimentation that could potentially transform the art form. As Willem Dafoe proclaimed, “With theatre, you have to be ready for anything.”
Our unprecedented performance has opened up an avenue of theatrical production that must be explored. As artists navigate this new environment, we must experiment with the range of resources available to revolutionize the vision of theatre in the 21st century. The possibilities of experimenting with virtual platforms and different technologies are endless; it is up to us to uncover what their roles in theatre are.
 Donaldson, Kayleigh. “Why The Cats Movie Is So Bad.” ScreenRant, 15 Jan. 2020, screenrant.com/cats-movie-bad-reasons-cgi-songs/.
 You Are Here (And Here And There). By Byte-Size, STEM-TO-SHTEM. 4-25 Jul. 2020, Online. Performance.
Evan Huang, Joanne Hui, Michelle Lu, Ganesh Pimpale, Jennifer Song
Robotic dexterity and adaptivity are highly valued in industrial settings such as manufacturing plants and assembly lines because they reduce both delays and the need for human involvement. Consequently, these attributes are often modeled after the human hand, which is considered one of the most versatile mechanisms for object manipulation given its powerful grip and its ability to manipulate small objects with great precision. Although hardware with the potential to mimic the human hand does exist, there are few options for intelligent software that can autonomously handle objects in conjunction with this hardware. As a step towards producing this software, we created a pure object identification algorithm to discern the optimal means of holding a complex object. The algorithm proceeds by deconstructing complex objects into pure shapes of different parameters, which are then used to determine the grasp that requires the least movement from the hand’s initial position and the least pressure on the hand and object. The program is also able to validate the grasp and, upon confirmation, run a test process involving the optimal grasp and pure object.
The human hand is known to be one of the most intricate and useful mechanisms of the human body. Furthermore, most products are manufactured to be used by human-like hands. Giving such dexterity to robots allows them to interact with products designed for human use, which makes this software useful for social and interactive robots. Its utility is also highly valued in industrial scenarios where the dexterity and adaptiveness of robots directly affect the amount of human involvement needed and the number of manufacturing delays. Using hardware and software that allow for robotic dexterity enables robots to fix misplaced or improperly assembled products more efficiently and with less human interaction.
In the field of robotic dexterity, several projects have addressed this topic, but most of the work uses a two-finger claw or suction cups. The existing research provides information on the main challenges and on algorithms that can be repurposed for this project. However, these algorithms were not created for the same hardware, making them difficult to transfer. Consequently, only research done in pure image and data processing can transfer to this project.
Vision-based grasping algorithms are not entirely new: they have been used in the past for claw-based mechanisms, but they have not been intelligently translated to human-like hardware. Some vision-based systems use feature detection, which searches for certain places to grasp the object and often relies on object identification. Other vision-based grasping algorithms use object identification and determine the grasping method from the classification of the object. Although this method works for common and recognizable objects, complex objects can be hard to classify, leading to nonoptimal grasping methods.
Apart from vision-based grasping algorithms, there have also been data-driven approaches. These algorithms are blind to the type of object and instead look for certain features on the object. This eliminates the need to create 3D models of the objects and to estimate the position of the objects, allowing robots to work with ease regarding very complex objects. However, this method calls for fairly high-quality training data, which also requires a variety of environments, objects to grasp, and other physical aspects of the robot. Training data has also taken a “learning by example” approach as data has been produced by remote robot operation as well as directly editing the training data to achieve faster learning and better performance. Although there has been an increase in performance, this approach of data collection is not scalable and it is difficult to obtain such high-quality training data.
Commonly Used Terms and Definitions
Listed here are some commonly used terms and their definitions:
Object: The physical object to pick up
Pure Object: A simple component of the object
Pure Object Parameters: Values that determine the size and shape of the pure object; these values change between objects.
Specificity: The amount of precision and accuracy the calculations have OR The level of detail something is calculated to
In the case of our project, this will be defined as the voxel size
Simple Object: Object that can be represented using one pure object
Complex Object: Object that can only be represented using multiple pure objects
Note: Whether an object is simple or complex depends on the specificity of the scenario
Due to COVID-19 restrictions, as well as the cost of actual hardware, we decided to do all of our testing through simulations. To do this, we designed all of our hardware in SolidWorks, a GUI CAD (Computer-Aided Design) tool, and ran our simulations in PyBullet, a physics simulator.
Designing the Hand
The objective of the hand is to mimic the design and function of the human hand and match the performance of current mechanical hands. Our initial design was based on the Open Bionics Brunel V2.0 hand. This is a 3D printable hand that has two joints per finger and a two-axis thumb. Although the hand had all the mechanical functionality needed for our scenario, the detail of the components inside it made it very difficult to simulate with the computing power we had available. As a result, we created our own hand design, as seen to the right. This hand is scaled to match the size of a human hand and includes joints that mimic the motion of human fingers as accurately as possible. Each finger has three joints except for the thumb, which has two. The thumb is attached to the thumb mount at a joint that allows it to move in front of the palm. The thumb mount is attached to the palm at a joint that allows the thumb to move laterally closer to and further from the rest of the fingers. The hand has more functionality than most commercially available mechanical hands, which is not ideal because we would like to match current hardware as closely as possible. However, it still provides the level of functionality that we need.
Designing the Arm
The arm design is based on BCN3D’s Moveo arm, which features six axes of rotation. Those six axes provide more mobility and dexterity to the arm and hand compared to a simpler arm with just one joint. In the figure below, the pink square represents where the hand would be mounted, and the green arrows represent the directions in which each joint is able to move. In the original model of the Moveo arm, there is a two-finger claw attached at the top. However, in our design, we mount our human-mimicking hand to create the most realistic and human-like robotic arm possible. The majority of the arm is 3D printable and the remaining parts are relatively low-cost hardware, making it a very realistic option for real-world use. A diagram of the arm and some annotations can be seen below.
Designing the Setting
The setting (as seen above) consists of a green section, a red section, and a metal stand. The metal stand is used to hold cameras and lights. Cameras can have one of two purposes: data input, where the data are given to the software for the robot to process, or monitoring, where they are used to judge the grasp against those of other algorithms. The object that the robot needs to pick up is placed in the green section so that its shape can be identified. The solid matte green background helps with performance in the initial meshing by removing textures that can interfere with the color comparison stages of the mesh generation. Once the robot has determined the optimal grasp, it picks the object up and places it into the red section.
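To illustrate why a uniform green background simplifies the color comparison, the following is a minimal sketch of separating object pixels from background pixels. It is not the project's meshing code: the function names, the RGB tuple representation, and the threshold values are all assumptions chosen for illustration.

```python
def is_background(pixel, min_green=120, dominance=1.4):
    """Treat a pixel as background when green is strong and clearly dominant.

    min_green and dominance are hypothetical thresholds, not measured values.
    """
    r, g, b = pixel
    return g >= min_green and g >= dominance * max(r, b, 1)


def object_mask(image):
    """Return a grid of booleans: True where the pixel belongs to the object."""
    return [[not is_background(px) for px in row] for row in image]


# A tiny 2x3 "image": green everywhere except one reddish object pixel.
image = [
    [(20, 200, 30), (20, 200, 30), (20, 200, 30)],
    [(20, 200, 30), (180, 60, 50), (20, 200, 30)],
]
mask = object_mask(image)
```

With a textured or multicolored background, no single threshold like this would separate the object cleanly, which is the problem the matte green surface avoids.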
Creating URDF Simulation Files
In order to run the simulation, all of the CAD files have to be exported as a URDF file using the SolidWorks to URDF exporter (SW2URDF). To do that, we had to configure the parts of the hand into a link tree, as seen on the right. The trunk of the link tree is the base part and then each other link extends from the base.
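The link tree can be pictured as a set of parent and child links connected by joints. The sketch below builds a skeletal URDF document for a hand with Python's standard XML library. It is an illustration only, not the output of SW2URDF: the link and joint names, the choice of four fingers plus a thumb, and the simple chain layout of each finger are assumptions, and the geometry, joint axes, and joint limits that a loadable URDF needs are omitted.

```python
import xml.etree.ElementTree as ET


def build_hand_urdf(fingers):
    """Build a skeletal URDF tree: a palm base with one chain of links per finger.

    fingers maps a (hypothetical) finger name to its number of joints.
    """
    robot = ET.Element("robot", name="hand")
    ET.SubElement(robot, "link", name="palm")  # trunk of the link tree
    for finger, n_joints in fingers.items():
        parent = "palm"
        for i in range(1, n_joints + 1):
            child = f"{finger}_link{i}"
            ET.SubElement(robot, "link", name=child)
            joint = ET.SubElement(
                robot, "joint", name=f"{finger}_joint{i}", type="revolute"
            )
            ET.SubElement(joint, "parent", link=parent)
            ET.SubElement(joint, "child", link=child)
            parent = child  # the next link extends from this one
    return robot


# Three joints per finger and two for the thumb, as described for the hand design.
hand = build_hand_urdf(
    {"index": 3, "middle": 3, "ring": 3, "little": 3, "thumb": 2}
)
urdf_text = ET.tostring(hand, encoding="unicode")
```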
Creating the Designed Hardware
Most of this hardware can be made using either a CNC or a 3D printer. The setting, which is simple and doesn’t have many fine details, could be made using wood and a CNC. However, the arm, which has small and complex parts, would need to be made using a 3D printer so that not as much material is wasted. The hand could be 3D printed, but it would not be able to move, since the hand was designed solely for simulation purposes. The hand design does not include any motors, tendon threads, or any mechanisms to make it move, making it useless to print.
Collecting Data and Data Processing
Simulation Logic and Data Collection
The purpose of the simulation is to produce data on the best gripping motion for each shape, depending on its unique parameters such as radius, height, width, length, etc. Because an optimal grip is defined as the grip requiring the least amount of movement and the minimum amount of pressure on the hand, we collected the torque and position values for each joint on the hand. Thus, it was only necessary to import the hand and each shape into the simulation. An overview of the process is detailed below:
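The definition of an optimal grip can be made concrete as a cost that combines joint movement and joint torque. The following is a minimal sketch under stated assumptions: the equal weights, the use of absolute sums, and all names are hypothetical, and the source does not specify how the two quantities were combined.

```python
def grip_cost(initial, final, torques, w_move=1.0, w_torque=1.0):
    """Lower is better: total joint movement plus total joint torque.

    w_move and w_torque are hypothetical weights.
    """
    movement = sum(abs(f - i) for i, f in zip(initial, final))
    effort = sum(abs(t) for t in torques)
    return w_move * movement + w_torque * effort


def best_grip(initial, candidates):
    """Pick the candidate grip with the lowest cost.

    candidates maps a grip name to (final joint positions, joint torques).
    """
    return min(candidates, key=lambda name: grip_cost(initial, *candidates[name]))


initial = [0.0, 0.0, 0.0]
candidates = {
    "two_finger": ([0.5, 0.4, 0.0], [0.2, 0.2, 0.0]),
    "full_palm": ([0.9, 0.9, 0.9], [0.1, 0.1, 0.1]),
}
chosen = best_grip(initial, candidates)
```

In this made-up example the two-finger grip wins because it moves the joints far less, even though its peak torque is higher.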
To generate each shape in a range of sizes, we used SolidPython, a Python library that generates OpenSCAD code, along with the Python module itertools. We generated data for four different shapes: cones, cylinders, ellipsoids, and rectangular prisms. The program takes in a range of values, and itertools is used to find all possible permutations of those values, with a length that depends on the shape. For example, rectangular prisms have three parameters (width, length, and height), so the program finds all possible permutations of length three. Because there is no built-in function in SolidPython to directly build ellipsoids, we scaled a sphere by a vector instead. However, this requires four parameters (radius, x, y, and z), creating many more ellipsoids than shapes of the other three types. After SolidPython generates the OpenSCAD code for each shape, the shapes are written out as .scad files and exported through OpenSCAD’s command line to a single STL file that is overwritten with each shape. This STL file is referenced in the URDF file that is loaded into the PyBullet simulation. Because the same STL file is updated on every iteration, the shape in the simulation changes with each step of the simulation.
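The parameter enumeration can be sketched with the standard library alone. To stay self-contained, this example writes OpenSCAD source strings directly instead of calling SolidPython. The parameter counts for cones and cylinders (radius and height), the shape names, and the helper functions are assumptions for illustration.

```python
from itertools import permutations

# Number of parameters per shape; the cone and cylinder counts are assumed.
PARAM_COUNT = {"cone": 2, "cylinder": 2, "prism": 3, "ellipsoid": 4}


def parameter_sets(shape, values):
    """All ordered selections of distinct values, one per shape parameter."""
    return list(permutations(values, PARAM_COUNT[shape]))


def scad_source(shape, params):
    """Return one line of OpenSCAD source for a shape and its parameters."""
    if shape == "prism":
        w, l, h = params
        return f"cube([{w}, {l}, {h}]);"
    if shape == "cylinder":
        r, h = params
        return f"cylinder(h={h}, r={r});"
    if shape == "cone":
        r, h = params
        return f"cylinder(h={h}, r1={r}, r2=0);"
    if shape == "ellipsoid":
        r, x, y, z = params
        # No ellipsoid primitive: scale a sphere by a vector instead.
        return f"scale([{x}, {y}, {z}]) sphere(r={r});"
    raise ValueError(f"unknown shape: {shape}")


prisms = parameter_sets("prism", [1, 2, 3])
ellipsoids = parameter_sets("ellipsoid", [1, 2, 3, 4])
example = scad_source("ellipsoid", ellipsoids[0])
```

One consequence worth noting: `itertools.permutations` never repeats a value within a tuple, so a cube such as 2 x 2 x 2 is never generated. If repeated values are wanted, `itertools.product(values, repeat=n)` would include them.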
After the shapes are generated, the hand and current shape iteration are imported into a PyBullet simulation on a plane. Then, the hand must move to grip the shape with a predefined gripping strategy. We currently have defined two gripping methods for each of the four shapes: a two-fingered grip, meant for small objects, and a full-palm grasp, meant for larger objects. For small cylinders, the thumb and index fingers grip the two bases; for large cylinders, the full palm grasps the middle of the cylinder body, wrapping the fingers around. For all rectangular prisms and ellipsoids, the hand will hold the narrower side, whether with two fingers or with the full palm. For cones, the hand will hold it at the base. To delineate the border between “small” and “large” objects, each object is tested with both grips, and the torque and position values for each joint are exported to a CSV file using Pandas. An artificial limit between the two grips is set by the user based on these torque and position values.
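The grip selection and logging step might look roughly like the sketch below. It uses Python's built-in csv module rather than Pandas so that it runs without extra packages. The column names, the single-number notion of object size, and the function names are assumptions.

```python
import csv
import io

FIELDS = ["shape", "size", "grip", "joint", "position", "torque"]


def choose_grip(size, limit):
    """Apply the user-set limit: below it use two fingers, otherwise the full palm."""
    return "two_finger" if size < limit else "full_palm"


def trials_to_csv(trials):
    """Write one row per joint reading and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for trial in trials:
        for joint, (position, torque) in enumerate(trial["readings"]):
            writer.writerow({
                "shape": trial["shape"],
                "size": trial["size"],
                "grip": trial["grip"],
                "joint": joint,
                "position": position,
                "torque": torque,
            })
    return buffer.getvalue()


# Made-up readings for a single trial with two joints.
trials = [{
    "shape": "cylinder",
    "size": 1.0,
    "grip": choose_grip(1.0, limit=2.5),
    "readings": [(0.1, 0.02), (0.3, 0.05)],
}]
csv_text = trials_to_csv(trials)
```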
Unfortunately, the simulation was not finished within the time frame of the program; we finished generating the shapes, importing them into the simulation, and retrieving the torque and position values for each joint, but the code for each grip is unfinished. More gripping techniques may be necessary to cover all types of objects, especially very large ones.
Data Processing and Produced Trendlines
After each grip has been set to be used for a certain range of sizes, trend lines are made for each grip to compare hand position to the size of each shape. An example is shown below:
This data was retrieved from very preliminary test trials. The joint position values are retrieved by PyBullet’s built-in getJointStates() function. After the hand moves to grip the designated object, the position values are saved and added to a CSV file to create the graph. Each joint’s position is kept track of, and each line in the graph corresponds to the movement of a certain joint as cylinder radius increases. In general, as the cylinder’s radius increases, most joints will have to rotate more, with the direction dependent on which joint it is. However, most of the base joints generally do not move much. After more trials are done with each grip, these trend lines would be used for determining where to grasp a certain pure shape; after the shape to be gripped has been determined, the parameters for the shape would be plugged into the corresponding trendline to see where each joint should position in order to properly grip the object. Because we were unable to finish the code for gripping techniques, we do not currently have all of the trend lines completed.
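As a sketch of how such trend lines could be used, the example below fits a straight line per joint by least squares and then evaluates it at a new radius. The assumption that the trend is linear, the joint names, and the sample values are all hypothetical; the data here is invented and is not the project's measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_joint_positions(trendlines, radius):
    """Evaluate every joint's trend line at the given radius."""
    return {
        joint: slope * radius + intercept
        for joint, (slope, intercept) in trendlines.items()
    }


radii = [1.0, 2.0, 3.0, 4.0]
recorded = {
    "index_base": [0.1, 0.1, 0.1, 0.1],  # base joints barely move
    "index_tip": [0.2, 0.4, 0.6, 0.8],   # tip rotates more as radius grows
}
trendlines = {joint: fit_line(radii, ys) for joint, ys in recorded.items()}
targets = predict_joint_positions(trendlines, 2.5)
```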
Pure Object and Parameter Recognition
The complete Pure Object Recognition (POR) system’s purpose is to find pure objects and their size from multiple photos of the object. This algorithm mainly consists of SfM (Structure from Motion) concepts which will be briefly described below.
Input: Camera data
Use the ODM implementation of OpenSfM to generate a mesh from images
Classify voxel groups into:
Collect the size data of each of the identified pure objects
Select an object to grasp based on how similar each identified object is to the pure object
Use the generated trendlines [explained above] to calculate the initial values
Perform a grasp validity test to ensure there is a proper grip on the object
Output: Confirm grasp and perform placement
(All steps of the flowchart are described in more detail below.)
Mesh and Voxel Generation
The first step in the POR system is to create a predicted 3D model of the object. This will allow the software to estimate the shape and size of the object and predict the shape of the object that is not visible to the camera.
Input: Multiple Images
Look for recognizable objects in the provided images
Is there a recognized object?
Run the collected image data and classification through a Neural Network trained on the ShapeNet dataset
Generate a mesh from the given point cloud
Voxelize the mesh and export it to a file
Run the ODM software on the images
Voxelize the given mesh and export it to a file
Output: File containing Voxelized mesh
The first step in this process is to determine whether the object is recognizable. This step can significantly improve performance, because a predicted mesh is much easier to build from already known data when the object is recognizable. If an object is recognizable, its classification and the image data can be run through a neural network trained on the ShapeNet dataset. The network generates a point cloud that can be meshed and then turned into voxels using a simple recursive algorithm. Examples of each can be seen below.
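As a rough illustration of what voxelization produces, the sketch below bins a point cloud directly into a binary occupancy grid with NumPy. This is a simplification swapped in for illustration: it skips the meshing step and does not use the recursive algorithm mentioned above. The function name and sample points are hypothetical.

```python
import numpy as np

def voxelize_points(points, voxel_size):
    """Convert an (N, 3) point cloud into a binary 3D occupancy grid."""
    points = np.asarray(points, dtype=float)
    origin = points.min(axis=0)
    # Integer voxel index of every point along X, Y, and Z.
    indices = np.floor((points - origin) / voxel_size).astype(int)
    shape = indices.max(axis=0) + 1
    grid = np.zeros(shape, dtype=bool)
    grid[indices[:, 0], indices[:, 1], indices[:, 2]] = True
    return grid

# Hypothetical points; the first two fall into the same voxel.
cloud = np.array([
    [0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0],
    [1.1, 0.0, 0.0],
    [0.0, 2.5, 0.0],
])
grid = voxelize_points(cloud, voxel_size=1.0)
```

The resulting boolean array could then be saved with `np.save` to play the role of the exported voxel file.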
If the object is not recognizable, the process becomes a bit more complex. To obtain results similar to those from the ShapeNet model, we use OpenSfM, a Structure from Motion (SfM) library. OpenSfM takes multiple images of a setting and stitches them into a point cloud using an incremental reconstruction algorithm. This complex algorithm can be reduced to three main steps. First, find pairs of images that can create the initial reconstruction; images that have a large overlap are usually the best.
After that, the algorithm bootstraps the reconstruction, testing each image pair until one works as the initial reconstruction. Once an initial reconstruction is found, more images are added one at a time to build up the point cloud. These processes are often used to create 3D interactive maps, so GPS data can help in the reconstruction of the images. A diagram of this process can be seen below.
Image Attribution: OpenMVG, part of Mozilla Public License V2
Instead of using our own implementation of OpenSfM, we decided to use OpenDroneMap (ODM), as it has its own implementation of OpenSfM that performed much better than the one we produced. In addition, ODM provides the option to create a Node server that can then be accessed through a Python API. Instead of providing point clouds, ODM generates meshes from the given images, which we convert directly to voxels. Although ODM can produce decent results under the right conditions, any textures or shadows heavily interfere with the meshing algorithm. Examples can be seen below:
Because of the meshing errors caused by texture issues, the setting for this project has been designed with a matte green screen and with light and camera mounts, so that there are no shadows and no issues with the texture of the background. Although this works within the confines of this project, the workaround is not scalable, and the underlying problem will have to be fixed.
Voxel Group Classification
Once the voxelized mesh has been obtained, voxel relation analysis can be performed to identify pure objects in the voxel mesh. Voxels are much easier to compute with than meshes because they are binary 3D arrays. This makes voxel relation analysis straightforward: each voxel is represented by an array whose values indicate whether another voxel lies directly next to it along each dimension (X, Y, and Z).
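A minimal sketch of the per-voxel neighbor array, assuming the voxel mesh is a boolean NumPy array and that "next to" means the six face-adjacent positions. The function name and the ordering of the six flags are assumptions made for illustration.

```python
import numpy as np

def neighbor_flags(grid):
    """For every voxel, flag whether a filled neighbor exists in each of the
    six axis directions, ordered (-X, +X, -Y, +Y, -Z, +Z).

    Empty voxels get all-False flags.
    """
    grid = np.asarray(grid, dtype=bool)
    # Pad with empty voxels so that border voxels have no neighbor outside.
    padded = np.pad(grid, 1, constant_values=False)
    core = (slice(1, -1),) * 3
    flags = np.zeros(grid.shape + (6,), dtype=bool)
    k = 0
    for axis in range(3):
        for step in (-1, 1):
            # After the roll, position i holds the value found at i + step.
            shifted = np.roll(padded, -step, axis=axis)[core]
            flags[..., k] = grid & shifted
            k += 1
    return flags

# Hypothetical example: three voxels in a row along X.
line = np.ones((3, 1, 1), dtype=bool)
flags = neighbor_flags(line)
```

Here the middle voxel has neighbors on both sides along X, while each end voxel has only one.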
A flowchart of the process can be seen below:
To work with voxels, the main tools used here were NumPy and PyVista, which allowed for the creation and visualization of the voxels. One of the main components of the algorithm shown above is to estimate edges as straight lines or curves by using the differences in height. This is done by building a sequence from the differences and then determining what kind of sequence it is; the edge is classified according to whether the resulting graph is a straight line or a curve. Another important note is that if an object is over-classified, meaning it has two different classifications, it defaults to a rectangular prism. This is because a rectangular prism is the closest fit for the widest range of shapes, and an object that is not a rectangular prism can usually still be held as one.
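One simple way to test a height profile is sketched below, under the assumption that an edge counts as straight when its successive height differences are nearly constant. The function name and tolerance are hypothetical, and the project's actual sequence test may differ.

```python
import numpy as np

def classify_edge(heights, tolerance=1e-6):
    """Label a height profile 'straight' if its successive differences are
    (nearly) constant, otherwise 'curve'."""
    diffs = np.diff(np.asarray(heights, dtype=float))
    if diffs.size < 2:
        # Two or fewer samples always lie on a straight line.
        return "straight"
    spread = diffs.max() - diffs.min()
    return "straight" if spread <= tolerance else "curve"

straight_label = classify_edge([0, 2, 4, 6])  # constant step of 2
curve_label = classify_edge([0, 1, 4, 9])     # steps grow: 1, 3, 5
```

Because voxel heights are quantized, a tolerance of about one voxel may be more practical than the near-zero default used here.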
Pure Object Parameter Recognition
Once the segmented voxel mesh has been created, finding the parameters of each voxel group is relatively easy, because the voxels have a fixed size and can be counted to find general lengths. With voxel counting, all the software has to know is what to count, which is also simple: there are only four classifications, so there are only four methods of parameter collection. A flowchart and explanation are below.
Create a list of all voxel segments and the corresponding classifications
Run calculation method depending on voxel classification
Save the parameters in an array
Repeat step one until all the parameters of the segments have been calculated
Output: Array of object parameters
All the counting is done through NumPy and collects the following parameters:
Rectangular Prism: Length, Width, Height
Cylinder: Radius 1, Radius 2, Height
Sphere: Radius 1, Radius 2, Radius 3
Cone: Radius 1, Radius 2, Height
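The counting step can be sketched as follows for two of the four classifications. This is a minimal example, assuming each segment is a boolean NumPy array aligned with the X, Y, and Z axes and that a cylinder's axis runs along Z; the function names and `voxel_size` parameter are hypothetical.

```python
import numpy as np

def filled_extent(segment):
    """Number of voxels spanned by the filled region along X, Y, and Z."""
    filled = np.argwhere(np.asarray(segment, dtype=bool))
    return filled.max(axis=0) - filled.min(axis=0) + 1

def prism_parameters(segment, voxel_size=1.0):
    """Length, width, and height of a rectangular prism segment."""
    length, width, height = (filled_extent(segment) * voxel_size).tolist()
    return {"length": length, "width": width, "height": height}

def cylinder_parameters(segment, voxel_size=1.0):
    """Two radii and height of a cylinder segment (axis assumed along Z)."""
    x_span, y_span, height = (filled_extent(segment) * voxel_size).tolist()
    return {"radius_1": x_span / 2, "radius_2": y_span / 2, "height": height}

# Hypothetical segment: a 4 x 2 x 3 block of voxels, each 0.5 units wide.
block = np.zeros((6, 6, 6), dtype=bool)
block[1:5, 2:4, 0:3] = True
params = prism_parameters(block, voxel_size=0.5)
```

Spheres and cones would follow the same pattern with their own counting rules.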
Performing the Grasp
Checking Grasp Validity
At this stage, the software has decided to grasp a certain object and knows how to hold it. The grasp validity test is a sequence of short tests to make sure that there is a solid grasp on the object and that the method being used to hold the object is optimal for the given grip type. A flowchart and explanation are below.
Input: Initial grasping values
Perform the initial grasp, decreasing all values so that the hand position is larger than the calculated value
Increase finger positions to the IAP (initial applied pressure) value
Constantly accelerate upwards and measure the acceleration changes while moving
If the acceleration difference is less than the ADT (acceleration difference tolerance):
Grasp is validated
Increase forces applied by each of the fingers
Even out the pressure between each of the fingers
Start over from step 1
The initial values here (IAP, GTH, MTV, ADT) are all set by a human as these are calibrated values. Depending on these values, this process can be very short or very long. In addition, due to the voxelization and the number of changes the data goes through in the algorithm, this step acts as the final barrier before the task is completed and performed. It checks if the calculations are correct and accounts for the fact the meshes are voxelized and an error range is introduced into the scenario. Once the grasp is validated the robotic arm performs a hardcoded task to move from the green section to the red and drop off the object.
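The validation loop can be sketched in a few lines. This is only an illustration under several assumptions: a single scalar pressure stands in for all fingers (so the step that evens out pressure is omitted), `measure_accel_diff` is a hypothetical callable standing in for the simulator or sensor, and `pressure_step` and `max_attempts` are invented parameters.

```python
def validate_grasp(measure_accel_diff, iap, adt, pressure_step, max_attempts=20):
    """Raise finger pressure until the acceleration difference measured while
    lifting drops below the tolerance.

    Returns (validated, final_pressure, attempts_used).
    """
    pressure = iap  # start from the initial applied pressure (IAP)
    for attempt in range(1, max_attempts + 1):
        # Below the acceleration difference tolerance (ADT): grasp is valid.
        if measure_accel_diff(pressure) < adt:
            return True, pressure, attempt
        # Otherwise increase the applied force and start over.
        pressure += pressure_step
    return False, pressure, max_attempts

# Hypothetical stand-in for the simulator: slip shrinks as pressure grows.
def fake_sensor(pressure):
    return max(0.0, 1.0 - 0.25 * pressure)

ok, final_pressure, attempts = validate_grasp(
    fake_sensor, iap=1.0, adt=0.3, pressure_step=1.0
)
```

The attempt limit reflects the note above that, depending on the calibrated values, the process can be very short or very long.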
Throughout the past eight weeks, our progress includes creating a Computer-Aided Design (CAD) model of the hardware using SolidWorks, forming the first prototype of the mesh generation algorithm, designing parts of the voxel classification, and collecting test data on finger placement defined by pure object parameters. As explained above, a CAD model is used to improve the quality of the design without physical hardware. Our CAD model of the hand, based on the human hand, has three hinges per finger and is sized to match the average human hand. Additionally, we created a CAD model of a six-axis arm and the first prototype of the mesh generation algorithm, which helps the hand recognize the pure objects (e.g. cone, cylinder, rectangular prism, ellipsoid) that make up each chosen object. To do so, the camera input from the robot, displaying an image of the object, was employed. Building on the mesh generation algorithm, we then created parts of the voxel classification algorithm. Groups of voxels, or 3D pixels, are analyzed and segmented into different pure shapes, such as cones, cylinders, rectangular prisms, and ellipsoids, all with unique parameters. Lastly, using our object creation code, which generates different shapes and sizes through a set range, and our simulation code, which creates a simulation in PyBullet, we collected some data on finger placement to eventually determine trend lines for the optimal grasp based on the size of each shape.
Due to the time constraints and physical limitations posed by COVID-19, we were not able to completely meet our objectives. Future steps for this project would be to obtain more training data with a wider variety of shapes and sizes, along with more gripping techniques for varying complex objects. We propose to run more trials and collect data to create precise, definitive trend lines that could determine how to optimally grasp certain pure objects. Currently, we use a complex mathematical algorithm to segment the voxel mesh, but an edge detection or deep learning approach could greatly expedite the process. As for the hardware, we would use a Computer Numerical Control (CNC) router to construct the setting and 3D print a majority of the parts for the arm, but the hand would need to be either redesigned or replaced by an existing hand model, as our current model is geared toward simulation purposes only.
Regarding the applications of the hardware, we expect to be able to explore the possibilities of applying our research and testing data to the creation of a more dexterous robotic prosthetic hand. Although the current state of our software is not collaborative because it cannot work in conjunction with a human, pure object recognition can be added to prosthetic limbs in order to optimize the functionality. This type of task will also need more technology to predict what the user will want to hold and also when to let go of a certain object.
We would like to thank everyone who supported our project. We would like to acknowledge Professor Tsachy Weissman of Stanford’s Electrical Engineering Department and the head of Stanford Compression Forum for his guidance throughout this project. In addition, we would like to acknowledge Cindy Nguyen, STEM to SHTEM Program Coordinator, for the constant check-ins and chats and thank you to Suzanne Sims for all of the behind-the-scenes work. We would like to thank Shubham Chandak for being our mentor and advising us. Thank you to all of the alumni who presented in the past eight weeks or gave us input regarding our project. Lastly, thank you to past researchers; your work has helped and inspired our project.
 Shreeyak S. Sajjan, Matthew Moore, Mike Pan, Ganesh Nagaraja, Johnny Lee, Andy Zeng, & Shuran Song. (2019). ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation.
 Shuran Song, Andy Zeng, Johnny Lee, & Thomas Funkhouser. (2019). Grasping in the Wild: Learning 6DoF Closed-Loop Grasping from Low-Cost Demonstrations.
 Zeng, A., Song, S., Lee, J., Rodriguez, A., & Funkhouser, T. (2019). TossingBot: Learning to Throw Arbitrary Objects with Residual Physics.
 Zeng, A., Song, S., Yu, K.T., Donlon, E., Hogan, F., Bauza, M., Ma, D., Taylor, O., Liu, M., Romo, E., Fazeli, N., Alet, F., Dafle, N., Holladay, R., Morona, I., Nair, P., Green, D., Taylor, I., Liu, W., Funkhouser, T., & Rodriguez, A. (2018). Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching. In Proceedings of the IEEE International Conference on Robotics and Automation.
 Song, S., Yu, F., Zeng, A., Chang, A., Savva, M., & Funkhouser, T. (2017). Semantic Scene Completion from a Single Depth Image. Proceedings of 30th IEEE Conference on Computer Vision and Pattern Recognition.
 R. Jonschkowski, C. Eppner, S. Höfer, R. Martín-Martín, and O. Brock. Probabilistic multi-class segmentation for the amazon picking challenge. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1–7, Oct 2016.
 J. Redmon and A. Angelova, “Real-time grasp detection using convolutional neural networks,” in ICRA, 2015.
 D’Avella, S., Tripicchio, P., and Avizzano, C. (2020). A study on picking objects in cluttered environments: Exploiting depth features for a custom low-cost universal jamming gripper. Robotics and Computer-Integrated Manufacturing.
 C. Wu, “Towards Linear-Time Incremental Structure from Motion,” 2013 International Conference on 3D Vision – 3DV 2013, Seattle, WA, 2013.
 Saxena, A., Driemeyer, J., & Ng, A. Y. (2008). Robotic Grasping of Novel Objects using Vision. The International Journal of Robotics Research, 27(2), 157–173.
 Billard, A., & Kragic, D. (2019, June 21). Trends and challenges in robot manipulation. Science Magazine.
 Ji, S., Huang, M., & Huang, H. (2019, April 2). Robot Intelligent Grasp of Unknown Objects Based on Multi-Sensor Information.
Virtual reality (VR) is a visionary platform that enables individuals to immerse themselves into simulated experiences. Thus far, VR simulation has been widely applied for entertainment and educational purposes. However, due to recent constraints caused by COVID-19, we hoped to explore its potential applications in health and wellness to make mental health support more accessible. Our research studied how VR experiences can better engage and influence a user’s mood, as opposed to a video’s detached 2D experience. Identifying the advantages and strengths of its effects on emotions can lead to health development programs such as proactive therapy and meditation.
Our team created two otherwise identical experiences: a 360º VR video and a 2D video. Prior to watching the respective experiences, study participants were given a series of questions to quantify the levels at which they were feeling certain emotions at the time. We utilized a between-subjects experimental design with a control group of 20 users who experienced the 2D video and a variable group of 20 users who experienced the 360º VR experience. Both experiences were designed to evoke calming emotions using visuals, sound effects, and music. Afterwards, the users were asked to answer the same questions again to evaluate whether the experience had successfully shifted their moods. After the study, we analyzed video recordings the participants took of themselves during the experiment in order to determine whether machine learning models could accurately detect their emotions.
Virtual reality aims to transform an individual’s sighted environment by controlling their full vision scope. Past studies of “mood induction” have shown that static audio and visual elements in film can affect moods. As VR lenses like the Google Cardboard become more prevalent and accessible, there is a broadening field for simulated mental health treatment and entertainment.
Virtual Reality Exposure Therapy (VRET) is an increasingly popular tool for therapists and doctors to use on patients suffering from physical or emotional pain. Through VRET, patients can be treated from the comfort of their homes without a practitioner physically present. Our project aimed to support the development of VRET by exploring the emotional responses of people to VR technology.
2. Materials and Methods
2.1 VR and 2D Experiences
We filmed the video experiences with a Rylo 360º camera mounted to a drone, added audio through iMovie, and uploaded the videos to YouTube. The video depicted a flyover above a lagoon at sunset and aimed to evoke pleasant and calm emotions. This content was the same for both study groups, but Group A’s video was formatted as a regular 2D YouTube video and Group B’s video was a 360º YouTube video. Study subjects in Group B were provided Google Cardboards for their experience and asked to use headphones.
Each participant was given a survey to quantify how much they were experiencing eight different emotions or moods at the time of the survey. These moods included happy, sad, peaceful, calm, tired, anxious, excited, and alert. The participants were asked to self-identify their emotional state in these eight moods on a scale of 1 to 5, with 1 being the most disagreement and 5 being the most agreement. They did this survey before and after watching the study video. There were several other questions on the survey regarding demographics and preferences to mislead the participants about the purpose of the experiment. The survey included instructions for navigating to either the video or the virtual reality headset.
2.3 Machine Learning Algorithm for Detecting Emotions
Participants were instructed to record videos of themselves, specifically their faces, throughout the entire duration of the survey. We found a pre-trained facial recognition model in open source Python machine learning code on GitHub and configured the code to analyze the recorded videos. The facial recognition algorithm uses a deep convolutional neural network to classify emotions into seven categories: angry, disgusted, fearful, happy, sad, surprised, and neutral. We estimated the accuracy of the facial recognition algorithm by comparing the emotions that the algorithm detected from the videos with the survey results.
Figures 1 and 2 show a comparison between the before and after answers of both study groups. Figure 3 was derived from these graphs and shows the mood percentage changes in each group. The results support the hypothesis that the VR experience shifted moods more than the 2D experience did. The direction of the mood shift was the same in the control and variable groups for 7 out of 8 moods. The most significant difference between the emotional shifts of the two groups was in calmness, underscoring the potential for VR in meditation. Among all moods, the decrease in anxiety showed the greatest percentage change in both groups, which suggests that the experience likely affected anxiety the most. In a free response question asking respondents to describe their thoughts and feelings, 12 out of 20 subjects in the 2D video group used the word “video” in their response and 3 used the word “experience,” whereas the counts were flipped for the VR group. Ultimately, these findings support the notion that users found VR to be more real and immersive than 2D video.
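For readers who want to reproduce this kind of comparison, the sketch below shows one way a mood percentage change could be computed. It assumes the change is the relative difference between a group's mean score before and after the experience; the scores are hypothetical placeholders, not results from the study.

```python
def percent_change(before, after):
    """Relative change from the 'before' mean to the 'after' mean, in percent."""
    return (after - before) / before * 100.0

# Hypothetical group means on the 1 to 5 survey scale (not study data).
before_means = {"calm": 3.0, "anxious": 2.5}
after_means = {"calm": 3.6, "anxious": 2.0}

changes = {
    mood: percent_change(before_means[mood], after_means[mood])
    for mood in before_means
}
```

A positive value means the mood was reported more strongly after the experience, and a negative value means it was reported less strongly.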
4.2 Machine Learning Algorithm for Detecting Emotions
The facial recognition algorithm accurately detected the participants’ emotions 52% of the time. The algorithm was 17% more effective post-experience than pre-experience. This suggests that the video experiences were effective in changing moods, since the facial recognition had a greater success rate after the participants watched the videos. The algorithm could not make predictions for participants wearing the Google Cardboards in portions of the recorded videos, which likely contributed to some of the inaccuracies. Camera quality, lighting, and the unique neutral expressions of individuals could also be sources of error. Nevertheless, given a limited sample size and a relatively mild experience, a 52% success rate is a good indicator that facial recognition has great potential in determining moods.
5. Future Directions
After studying the short-term shifts in emotions discovered through this project, we hope to explore VR’s potential for prolonged mood stability and positive improvement. If granted more time and testing equipment, we could create more complex and customized scenes for users, which may lead to more generalizable results. There are several factors such as cultural background, age, and time of day that could be more thoroughly studied. Based on these factors, we could tailor the VR experience to certain individuals in order to further engage and enhance their experiences. Additionally, our experimentation with machine learning facial interpretation yielded good results, and its promise may broaden the possibilities to capture precise data and analyze the effects of VR in real time while a VR headset is worn. It would also be interesting to examine VR’s potential for treating a specific type of physical or mental illness.
We would like to acknowledge founding director Professor Tsachy Weissman of the Stanford Electrical Engineering department for a profound look into research, the humanities, and endless opportunity. Special thanks to Stanford undergraduate senior Rachel Carey for mentoring us throughout the entire process. Another thank you to Devon Baur for her valuable insight on analysis and experimental design. We would also like to thank Cindy Nguyen and Suzanne Marie Sims for ordering and delivering the materials that made our experiment possible. Finally, a big thank you to our fellow SHTEM students, family members, and friends who provided data for our research.
We trained a sentiment analysis bot using machine learning and Twitter data to classify tweets as expressing positive, negative, or neutral sentiments toward COVID-19 safety measures such as mask wearing and social distancing. We then compared data obtained from this bot to both economic data and COVID-19 case count data to better understand the interplay between social mindsets, consumer spending, and disease spread.
The COVID-19 pandemic has caused mass social unrest across the United States. The safety procedures (e.g., mask wearing and social distancing) that have been strongly recommended by the Centers for Disease Control and Prevention (CDC) to constrain the spread of the virus have proven to be controversial. By analyzing publicly available Twitter data via the Twitter API, we sought to better understand 1) the usage and attitudes towards these procedures over time in the US, and 2) their effect on case count growth. Using tweets containing the keywords “mask”, “masks”, or “social distancing”, we trained a machine learning-based sentiment analysis bot to determine whether a tweet expresses positive, negative, or neutral sentiments regarding COVID-19 related safety measures. Specifically, we used a combination of the Naive Bayes and Logistic Regression algorithms to train the bot. We then used our bot to automatically classify thousands of tweets into each of these categories and observe how public sentiments have changed over time since the beginning of the pandemic. Finally, we analyzed government economic data related to consumer spending in brick-and-mortar locations (such as restaurants and retail stores) to see if this data had any correlation with the sentiment data and case counts from the pandemic. The case count and death count datasets were obtained from the World Health Organization (WHO).
Sentiment analysis is a branch of computer science that attempts to identify sentiment and emotion from natural language input. There are a multitude of ways to accomplish this, but most methods fall into two main branches: machine-learning-based sentiment analysis, and dictionary approaches to sentiment analysis.
Machine-learning-based sentiment analysis refers to a sub-field of artificial intelligence that aims to understand human emotion in natural language expression in an automated fashion. By training a bot with a classification algorithm and training data consisting of text strings and then their corresponding labels, the bot can learn to classify text on its own with some level of accuracy. Although using sentiment analysis on tweets comes with many limitations, we aim to show that its usage will give us a better perspective of both public opinion on COVID safety procedures, and how these opinions may influence the actions of others.
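To make the training idea concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It illustrates only the Naive Bayes half of the approach, not the combined Naive Bayes and Logistic Regression bot described in this paper; the training tweets are invented examples and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase whitespace tokenizer (no punctuation handling)."""
    return text.lower().split()

def train_naive_bayes(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training examples, not taken from the labeled dataset.
training = [
    ("always wear a mask to protect others", "POS"),
    ("social distancing saves lives", "POS"),
    ("masks do not work", "NEG"),
    ("i refuse to wear a mask", "NEG"),
    ("where can i buy a halloween mask", "DIS"),
]
model = train_naive_bayes(training)
prediction = classify("masks do not work", model)
```

With so few examples the classifier only memorizes its training data; a realistic model needs a labeled set of far greater size.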
Dictionary approaches are much more simplistic, and essentially work by maintaining a large database of words with certain associations (similar to any dictionary for natural language, but encoded for computer purposes), and then classifying a string of text by referring to the dictionary’s classification of the words within it. Compared to machine-learning based sentiment analysis, this system does not learn as it reads more data, cannot be trained, and is unable to understand any word that is not within its dictionary.
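For contrast, a dictionary approach can be sketched in a few lines. The lexicon below is a tiny invented example, and mapping a zero score to the neutral label is an assumption made for illustration.

```python
# Invented miniature lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "protect": 1, "safe": 1, "saves": 1,
    "hoax": -1, "useless": -1, "refuse": -1,
}

def dictionary_sentiment(text, lexicon=LEXICON):
    """Sum the scores of known words; words outside the lexicon count as 0."""
    score = sum(lexicon.get(word, 0) for word in text.lower().split())
    if score > 0:
        return "POS"
    if score < 0:
        return "NEG"
    return "DIS"

label = dictionary_sentiment("masks protect everyone")
```

A text made up entirely of words outside the lexicon always falls into the neutral category, which is the limitation described above.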
To create training data for our bot, we had to obtain a large number of tweets containing our desired keywords (“mask”, “masks”, “social distancing”). To do this, we used the free tier of Twitter Developers, known as Sandbox. This allows us to fetch up to 5,000 historical tweets (tweets from more than 7 days ago) per month, but unfortunately historical tweets are truncated (cut off after 140 characters). Instead of collecting historical tweets for our training data, we used the “Stream” function, which allowed us to obtain real-time tweets with no monthly limit. These were not truncated and came in during July, when we were doing most of our classifications. We collected 1297 of these and hand-labeled each of them as neutral (DIS), positive (POS), or negative (NEG). An additional 87 were collected but could not be used because they were in the wrong language or did not contain a keyword. We hypothesize that some of the returned tweets are retweets of tweets containing a keyword but do not contain the keywords themselves. This is a potential issue we face in our final results.
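A simple check for tweets that lack a keyword might look like the sketch below. It assumes case-insensitive substring matching, which also accepts words such as “unmasked”; the function names and sample tweets are hypothetical, and no Twitter API calls are shown.

```python
KEYWORDS = ("mask", "masks", "social distancing")

def contains_keyword(text, keywords=KEYWORDS):
    """True if any keyword appears in the text, ignoring case."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)

def split_usable(tweets):
    """Separate tweets that contain a keyword from those that do not."""
    usable = [tweet for tweet in tweets if contains_keyword(tweet)]
    discarded = [tweet for tweet in tweets if not contains_keyword(tweet)]
    return usable, discarded

# Invented sample tweets.
sample = ["Wear a MASK please", "RT great point, totally agree"]
usable, discarded = split_usable(sample)
```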
Although sorting tweets into negative, positive, and neutral categories seems like a simple thing to do, it is extremely subjective and difficult to judge. Many of the tweets analyzed were entirely incoherent and self-contradictory. At the beginning, we believed that if a tweet was in favor of social distancing, it would also be in favor of masks, and vice versa. This was not always the case, so deciding where a tweet would go was often subjective. Tweets also tended to follow political orientation, but not as frequently as we expected before we began. You can find a full list of the tweets we used for training and testing along with their classifications below the Appendix. If you notice places where you think we misjudged, we’d love to discuss this with you so we can further increase the objectivity of our sentiment analysis bot.
NEG (Negative, as in against masks or social distancing)
General negative language/phrases/feelings surrounding protective masks and/or social distancing to prevent the spread of COVID 19
Indication that masks/social distancing are part of a conspiracy theory/government control
Blatantly state they do not wear a mask (for any reason)
Indication that a group of people does not have to wear a mask (even if the person themself is not within that group)*
Proposing that masks lower oxygen intake
Indication that masks “don’t work”
Proposing rebellions against mask wearing, or “cheating” the system
Indicating “it makes no sense to do X because Y disease was much worse.”
Wanting to open schools AND does not specifically suggest protective measures
Apathy regarding masks or social distancing
Against general mask wearing/social distancing mandates*
Arguing against/ making a negative statement towards something or someone that promotes social distancing or masks
Stating that coronavirus does not exist/ is exaggerated to justify not following safety guidelines
Proposing that the best way to beat coronavirus is to “build up immunity” by defying public health guidelines.
Makes fun of/ is clearly against people or things that support mask wearing/ social distancing
POS (Positive, as in in favor of masks or social distancing)
General positive language/phrases/feelings surrounding protective masks and/or social distancing to prevent the spread of COVID 19
Implying that masks/social distancing work
Mentioning that they wear a mask/social distance
Arguing against/ making a negative statement towards something or someone that does not promote social distancing or masks
Advertising masks (handmade or otherwise)
Stating that coronavirus is not something trivial and/or should be feared.
Arguing that reasonable mandates are justified
Encouraging others to follow guidelines
DIS (Discounted/Neutral, as in not clearly expressing a meaningful opinion)
Neutral language/phrases/feelings surrounding protective masks for COVID 19
Contradicting statements to the point the writer’s opinion is somewhat indiscernible, and the tweet does not blatantly conform to a specific rule in POS or NEG that would overwrite that issue. *
People saying they want to understand another side (unclear where their actual opinions are)
Calling people hypocrites for not wearing a mask*
Not referring to masks in the context of COVID 19*
Claiming that they are against a law/mandate that is ridiculous or unimaginable without further indication of their position.*
The full list of clarifications for rules can be found in the Appendix at the end of the paper. Rules that require clarifications are denoted by *.
Prior to the data labeling process, we assumed the issue of categorization would be more black and white than it really is. Although our philosophy towards categorization has evolved over our research, at the moment we base categorization on how the user’s words will affect others who read their tweet instead of just analyzing the writer’s opinion on their own. This makes sense for two reasons:
First is the volume of data that can be collected on this premise, and the accuracy of analysis. Most of our tweets would have to be thrown out as neutral under guidelines that only concern the users’ own assumed opinions, and in the end our data probably wouldn’t be a good reflection of the actual negative vs. positive bias.
Second is the fact that basing classification on how the tweet affects the mindsets of others who see it addresses our thesis better. It shows better how social media opinions on coronavirus affect real life circumstances, not the portrayal of real life circumstances in social media. The difference is that a user’s opinions are a symptom of a situation, whereas the effect of their opinions on others may collectively cause a situation. We are looking for how social media may predict circumstances rather than how circumstances predict the state of social media.
An example of this is the following commercial tweet:
“Pssst! I got a secret. Get at ADDITIONAL 20% OFF face masks that are already on sale!!! That’s around $6 a mask. Only if you buy 4 or more! Sale won’t last long. BUY NOW!!!”
Although it would make sense to assume that someone selling masks is pro-mask, we have no evidence of this whatsoever. If we were to adopt a philosophy of sorting tweets based on an individual’s opinion, we would run the risk of being forced to classify a majority of our tweets into neutral categories and therefore have a skewed dataset. Instead, we consider this as a positive tweet by following our more holistic philosophy of connecting this tweet to its likely effect on those who read it. The specific tweet above promotes a societal acceptance and usage of masks, and people who read it will be affected by this philosophy.
In order to analyze consumer spending patterns during the pandemic, we sought out data made available by the Bureau of Economic Analysis under the United States Department of Commerce. On their website are monthly reports of Personal Income and Outlays, which illustrate consumer earning, spending, and saving. Personal outlay is the sum of Personal Consumption Expenditure, personal interest payments, and personal current transfer payments.
Personal outlay can also be calculated as Personal Income minus Personal Savings and Personal Current Taxes. This represents an overall measure of how much consumers have spent within a month. Using Table 1 of the Personal Income and Outlays report of June 2020, we graphed the Personal Outlays in billions of United States dollars against months in Google Sheets (Figure 2). Since this is a monthly report with overall changes, there is only one data point for each month. Additionally, the dollar amounts are seasonally adjusted at annual rates, which helps remove seasonal patterns that may affect the data. All the dollar amounts in the following figures are seasonally adjusted at annual rates, as are the index numbers. We chose to include 4 months of data, as the coronavirus pandemic started to impact the United States in mid-March, so the March data straddles both the pre-pandemic and pandemic periods.
Figure 1: Personal Outlays (in billions of US dollars)
Within Personal Outlays, the subtopic of Personal Consumption Expenditure (PCE), also known as Consumer Spending, is a more specific measure of spending on consumer goods and services. PCE is the value of the goods and services purchased by, or on behalf of, “persons” who reside in the United States. Using the same June 2020 report from the BEA, we gathered the PCE in billions of dollars over the months February 2020 to June 2020. For the total amount, we used Table 1: Personal Income and Its Dispositions (Months), and for the changes between months, we used Table 3. Personal Income and Its Disposition, Change from Preceding Period (Months). PCE is divided into two sections, goods and services. Within goods, there are two further subtopics: durable and non-durable goods. The first graph is the total amount (Figure 2).
Figure 2: Personal Consumption Expenditure (in billions of US dollars)
For more detail on PCE, we looked at a variety of products and collected their PCE in billions of USD. The data came from the Excel spreadsheets linked under the Underlying Details section of Interactive Data on the Bureau of Economic Analysis Consumer Spending page, listed as SECTION 02: PERSONAL CONSUMPTION EXPENDITURE. From that spreadsheet, we accessed Table 2.4.4U. Price Indexes for Personal Consumption Expenditures by Type of Product, which has the spreadsheet code U20404. We then chose a range of goods and services that showed a variety of changes over the four months:
Computer software and accessories
Food and beverages purchased for off-premises consumption
Food and nonalcoholic beverages purchased for off-premises consumption
Food purchased for off-premises consumption
Personal care products
Electricity and gas
Live entertainment, excluding sports
Food services and accommodations
Personal care and clothing services
Personal care services
Hairdressing salons and personal grooming establishments
Using the data in the spreadsheet, we collected the PCE of each good or service in billions of USD, and graphed it using Google Spreadsheets (Figure 3).
Figure 3: Detailed Personal Consumption Expenditure (in billions of US dollars)
To go even more in depth on retail sales, we gathered data from the US Census Bureau’s Advance Monthly Trade Report released in July. Using their customizable time series, we found the sales in millions of US dollars for the following:
Retail Trade and Food Services: U.S. Total
Retail Trade: U.S. Total
Grocery Stores: U.S. Total
Health and Personal Care Stores: U.S. Total
Clothing and Clothing Access. Stores: U.S. Total
General Merchandise Stores: U.S. Total
Department Stores: U.S. Total
Nonstore Retailers: U.S. Total
Food Services and Drinking Places: U.S. Total
Figure 4: Sales of Food and Retail Services (in millions of US dollars)
Another determinant associated with consumer spending is the PCE Price Index, which is a measure of the prices that people living in the United States, or those buying on their behalf, pay for goods and services, and which reflects changes in consumer behavior. Using the same June 2020 Personal Income and Outlays BEA report, we gathered the data from Table 9: Price Indexes for Personal Consumption Expenditures: Level and Percent Change from Preceding Period (Months). Using the percent change in index, we calculated the change based on the index in February, and graphed it across four months.
Figure 5: Change in Personal Consumption Expenditure Price Index
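The change-from-February calculation can be sketched as follows. This is a minimal illustration, assuming that the month-over-month percent changes are compounded to get a cumulative change relative to the base month; the function name and the sample percentages are hypothetical placeholders, not values from the BEA report.

```python
def cumulative_change_from_base(monthly_pct_changes):
    """Compound month-over-month percent changes into the percent
    change relative to the base month (assumed here to be February)."""
    cumulative = []
    factor = 1.0
    for pct in monthly_pct_changes:
        factor *= 1.0 + pct / 100.0
        cumulative.append((factor - 1.0) * 100.0)
    return cumulative


# Placeholder month-over-month changes (NOT real index data).
example_changes = [10.0, -10.0]
print(cumulative_change_from_base(example_changes))
```

With this convention, a value of 0 on the y-axis corresponds to the base-month index.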
As an additional point of comparison, we gathered the CPI, or Consumer Price Index, and compared it with the PCE Price Index. One difference between the two indexes is that the CPI gathers data from consumers while the PCE Price Index is based on information from businesses. Moreover, the CPI tracks expenditures only by urban consumers, while the PCE Price Index tracks spending by all households that purchase goods and services. See this resource by the BLS for more details on the differences.
For the CPI, we collected data from the Bureau of Labor Statistics, a bureau under the Department of Labor. From the CPI Databases page, we accessed the tables for the All Urban Consumers series, which led us to the Archived Consumer Price Index Supplemental Files page, where we accessed News Release Table 3: Consumer Price Index for All Urban Consumers (CPI-U): U.S. city average, special aggregate indexes, June 2020. We chose the same expenditure categories as we had for the PCE Price Index (Services, Durables, and Non-Durables) and collected the seasonally adjusted percent changes for the months March 2020 to June 2020. Durable goods are not for immediate consumption and thus are purchased infrequently, while non-durables are purchased on a frequent basis. Since each of the three percentages describes the change between two consecutive months, we assigned each percentage to the later month. Using the percent change in index, we calculated the change based on the index in February, and graphed it across three months.
Figure 6: Change in Consumer Price Index
3) Machine Learning Methods to Classify Twitter Data
Once we had all our data prepared, we took to Wolfram Alpha to start creating our sorting bot. We decided to include two separate machine learning algorithms as part of our bot to increase accuracy. Our first bot separated neutral tweets from binary (negative or positive) tweets, while our second sorted the binary tweets into negative or positive categories. The first bot was trained and tested on all the data we sorted, while the second was trained and tested only on the non-neutral training and test data. This didn’t make a significant impact on the amount of data the second bot was trained on, since neutral data only represented 21.14% of the sorted data.
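The two-stage arrangement can be sketched as a simple cascade. The sketch below is only illustrative: the two stage functions are toy keyword stand-ins (hypothetical, not our trained Wolfram classifiers), and the point is the control flow, in which the second classifier only ever sees tweets the first one did not mark as neutral.

```python
def classify_tweet(text, neutral_vs_binary, positive_vs_negative):
    """Two-stage cascade: filter out neutral tweets first,
    then split the remaining tweets into positive or negative."""
    if neutral_vs_binary(text) == "neutral":
        return "neutral"
    return positive_vs_negative(text)


# Toy stand-ins for the two trained classifiers (assumptions for illustration).
def toy_stage_one(text):
    return "binary" if "mask" in text.lower() else "neutral"


def toy_stage_two(text):
    return "negative" if "refuse" in text.lower() else "positive"


sample_tweets = [
    "Please wear a mask in stores",
    "I refuse to wear a mask",
    "Nice weather today",
]
for tweet in sample_tweets:
    print(tweet, "->", classify_tweet(tweet, toy_stage_one, toy_stage_two))
```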
Wolfram has many classifier algorithms available to take advantage of, but since most are designed to be trained on numerical data as opposed to language data, they can be flawed when used for NLP (Natural Language Processing). Here are most of the options available in Alpha.
Percentage accuracy was one of our top priorities, but another important consideration was bias. We wanted to make sure that when our bot made a mistake, it wasn’t significantly more likely to make one sort of mistake than the other. This ended up ruling out some methods, because 100% of their errors came from predicting “positive” when the label was “negative”. Note that we had anywhere from 302 to 374 pieces of test data depending on the type of test being run, so this was unlikely to be due to pure chance. We assume that these methods were not created for processing strings, and simply guessed the most probable class from the training data every time the tested data were strings. This is an important reason to run different kinds of tests and analysis on bots besides just accuracy: although these methods had high accuracy for our particular data, they were very unreliable.
Another important consideration that we kept in mind was how neutral tweets tended to be sorted when they were mis-sorted. In this scenario, it’s much harder to assume an “ideal” rate. Neutral tweets didn’t only include tweets entirely unrelated to COVID-19, so the ideal sorting ratio for them is much more difficult to hypothesize. Given time constraints, in our experiment we made the assumption that neutral tweets should ideally be sorted equally into “positive” and “negative” categories if they weren’t sorted as neutral. Although this is a metric that is much more difficult to control for, we took this final predicted ratio into consideration after processing our data to be able to better predict a confidence ratio for each datapoint.
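The kind of check described here, looking at the direction of errors and not just the accuracy, can be sketched as below. The labels are made-up placeholders, and the "always positive" predictor is a hypothetical stand-in for a method that guesses the majority class every time.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def error_breakdown(true_labels, predicted_labels):
    """Count misclassifications by (true label, predicted label) pair."""
    errors = Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth != guess:
            errors[(truth, guess)] += 1
    return errors


# Placeholder test set: mostly positive, as in an imbalanced dataset.
truth = ["positive", "positive", "positive", "negative"]
always_positive = ["positive"] * len(truth)

print(accuracy(truth, always_positive))
print(error_breakdown(truth, always_positive))
```

In this toy case the accuracy looks reasonable, but every error falls in the single ("negative", "positive") cell, which is the one-sided pattern that led us to rule methods out.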
We ended up using Naive Bayes for the neutral vs. binary sorter, and a combination of Naive Bayes and Logistic Regression for the positive vs. negative sorter. This was done by collecting the probabilities for each outcome within each algorithm and multiplying each probability by its respective ratio (.65 for Logistic Regression, .35 for Naive Bayes), adding them across algorithms, and choosing the largest.
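The weighted combination for the positive vs. negative sorter can be sketched as follows. The weights come from the description above; the per-model probabilities are invented placeholders, and the function and variable names are hypothetical rather than anything from our Wolfram code.

```python
# Weights from the text: 0.65 for Logistic Regression, 0.35 for Naive Bayes.
WEIGHTS = {"logistic_regression": 0.65, "naive_bayes": 0.35}


def combine_predictions(probabilities_by_model, weights=WEIGHTS):
    """Weight each model's class probabilities, sum them across models,
    and return the label with the largest combined score."""
    combined = {}
    for model, probabilities in probabilities_by_model.items():
        for label, probability in probabilities.items():
            combined[label] = combined.get(label, 0.0) + weights[model] * probability
    best_label = max(combined, key=combined.get)
    return best_label, combined


# Placeholder probabilities for a single tweet (not real model output).
example = {
    "logistic_regression": {"positive": 0.40, "negative": 0.60},
    "naive_bayes": {"positive": 0.80, "negative": 0.20},
}
label, scores = combine_predictions(example)
print(label, scores)
```

In this example the two models disagree, and the combined score decides the label.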
On their own, the estimated accuracies of the neutral sorter and the binary sorter are P1 = 79.841% and P2 = 72.093%. Total accuracy is more difficult to calculate, because our main interest isn’t sorting every tweet into the correct group; it’s obtaining the correct proportion of tweets in each group. We don’t currently have an estimate of how accurate those proportions are, but for our test group we do know the assumed vs. true proportions, and the proportion of correctly sorted tweets. We can assume that the proportion of correctly sorted tweets is roughly equivalent to P1 * P2 = 0.5755977. In the future, we hope to improve this overall accuracy by refining the algorithms we use and implementing synonym-based data augmentation strategies.
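The product above can be reproduced directly. Treating P1 * P2 as the combined accuracy is itself an assumption: it supposes that the two sorters make errors independently and that a tweet has to pass through both stages.

```python
p1 = 0.79841  # estimated accuracy of the neutral vs. binary sorter
p2 = 0.72093  # estimated accuracy of the positive vs. negative sorter

# Assumption: errors in the two stages are independent.
combined_accuracy = p1 * p2
print(round(combined_accuracy, 7))
```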
One more issue we’d like to mention is that although using Wolfram allowed us to complete this project in a timely manner with a wide variety of options available to us, Wolfram is a closed source software and therefore the information about how their algorithms work is somewhat obscure. When using Naive Bayes in Wolfram, we didn’t know whether the function would automatically reformat or clean data for us, and if it did what issues it might’ve encountered when analyzing unknown strings like hashtags. Although we may or may not continue using Wolfram for future extensions of this project, keeping these issues in mind might help guide logistical and practical decisions in the future.
In getting our final data we were somewhat limited by time constraints and available computing power. Since only one of us had access to Wolfram Alpha, and because we did our computations locally, we were only able to process 100 pieces of data per month, save for July which also used our pre-labeled data.
Figure 7: Proportion of Tweets by Sentiment (keywords: mask, masks, social distancing)
We graphed the data above by month to be able to match the economic data better, and because we did not have enough data to show a day-by-day graph. We would take the data from every day in the month, classify it, and then take the proportion of the data that fit that classification out of all of the data in that month. Unfortunately, historical tweets are truncated, so our algorithm guessed tweet sentiment solely based on the first 140 characters of each tweet for every month before July. In the future, we hope to find a work-around for this issue.
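The monthly aggregation can be sketched like this. It is a minimal illustration with made-up labels; the function name and the (month, label) input format are assumptions, not the structure of our actual dataset.

```python
from collections import Counter, defaultdict


def monthly_proportions(labeled_tweets):
    """Given (month, label) pairs, return for each month the share of
    that month's tweets that fall under each sentiment label."""
    counts = defaultdict(Counter)
    for month, label in labeled_tweets:
        counts[month][label] += 1
    proportions = {}
    for month, label_counts in counts.items():
        total = sum(label_counts.values())
        proportions[month] = {
            label: count / total for label, count in label_counts.items()
        }
    return proportions


# Placeholder classified tweets (not real data).
sample = [
    ("April", "positive"),
    ("April", "positive"),
    ("April", "negative"),
    ("April", "neutral"),
    ("May", "positive"),
]
print(monthly_proportions(sample))
```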
A final issue to mention is that our training data was not equally distributed. About 55.3241% of our human-labeled data was positive, 23.071% negative, and 21.2963% neutral. This means that our machine learning algorithms may hover around these ratios regardless of true values, and means that our data may be more conservative in percent changes than it should be. We decided to keep these ratios because changing them to be equal would significantly decrease the amount of labeled data we could use to train the bot, but we hope that as we work on this in the future, we will have enough data such that we could have equal ratios without having to sacrifice much of our labeled data.
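To give a rough sense of the cost of equalizing the classes, the sketch below assumes the simplest balancing strategy, downsampling every class to the size of the smallest one, and uses the class shares quoted above. Other balancing strategies would give different numbers.

```python
# Class shares of the human-labeled data, as quoted in the text.
class_shares = {"positive": 0.553241, "negative": 0.23071, "neutral": 0.212963}

# Assumption: balance by downsampling each class to the smallest class.
smallest_share = min(class_shares.values())
retained_fraction = smallest_share * len(class_shares)

print(f"Fraction of labeled data kept after balancing: {retained_fraction:.3f}")
```

Under this assumption roughly 64% of the labeled data would remain, which is why we kept the original ratios.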
Figure 8: WHO Data on Confirmed Cases & Deaths in US
Figures 1-6: As with all the economic data, we use publicly available data from the government through monthly reports; therefore we only have one datapoint for each month. This leaves only an estimate of the data between months. For example, Figure 6 shows that the lowest dip occurred in April, but it does not show when in April.
Figures 5 & 6: For the index data, we calculated the change of the index from the “original” index given in February (pre-pandemic), which means the y=0 line represents the February index.
Figure 6: This graph, using data from the Bureau of Labor Statistics, only has three months, which is inconsistent with the rest of the economic data graphs.
Figure 7: There are a number of reservations to be had with this graph and it should not be considered as fact. We hope to collect a higher volume of data, alongside more accurate evaluations for our future pursuits, so the data we have at the moment could be considered a ‘teaser’ for what is to come. Our machine learning software only has about a 75% accuracy rate and only 1796 pieces of data were used to create the graph due to data processing issues, alongside limits in usage for historical Twitter data. Not to mention, the algorithms we used do have considerable bias which could affect our data. Also, data is exhibited month-by-month because of a lack of data that would make day-by-day analysis jerky and confusing.
Figure 8: For the first part of Figure 8, since WHO data for the US comes from US government reports, there is some discrepancy in the numbers and the curve due to collection errors, lack of testing, underreporting, misdiagnosis, and other issues. Death counts, unlike those for seasonal diseases like the flu, are not representative of real time. It often takes days to weeks to test the deceased, get the results, and send the reports to the National Center for Health Statistics. This means that the data is often based on past weeks and is not entirely current. In addition, even if deaths were reported on time, the counts would better represent the situation of the nation two weeks earlier than on that day, because it takes a while after contracting the virus for a patient to be at risk of death. The following article goes more in depth about the issue: https://fivethirtyeight.com/features/coronavirus-case-counts-are-meaningless/
Figure 7: Throughout the data collection, positive tweets always seemed to be the majority, indicating that the majority of the US population is in support of mask usage and social distancing. This is supported by surveys of the general US population, which might indicate that opinions on social media about this particular issue are somewhat representative of the general population.
Interestingly, the data seems to go through significant changes between April and May, and also between June and July. Between April and May, a rising number of Twitter users seem to have positive opinions about mask wearing and social distancing, with few neutral comments. This could indicate that more people were talking about COVID-19, that more people were stating explicit opinions about masks and social distancing, or a combination of both during that interval of time. Between June and July, neutral rates stay somewhat constant, but there’s a sharp increase in negative tweets and a decrease in positives. From this, we might hypothesize that given the length of time safety mandates have been in place, people are gradually getting more upset about the mandates and are changing their opinions on them.
Figures 1-4: These four figures represent overall Consumer Spending, with Personal Outlays as the overall curve, followed by PCE, detailed PCE, and sales of food and retail services. They all follow a similar dip in USD spent in April, which is succeeded by a slow growth back up. However, as of June, the numbers have not returned to pre-COVID levels, especially considering the rate at which consumer spending was growing before that. Of the possible ways the economic impact of a pandemic like this can play out, consumer spending may slowly yet surely return to its normal growth rate, since there is no indication in our graphs of a quick growth spurt to catch up to what our consumer spending levels could have been. A possible reason that Consumer Spending is recovering while COVID-19 is spreading is the prevalence of e-commerce, especially when it comes to retail. Even though unemployment is still an issue during these times, the stimulus check combined with online shopping and delivery may have helped spur spending in the economy. Another possible explanation is that many counties and states started reopening during May and June, allowing more in-person spending.
Figures 5 & 6: Unlike the other graphs, the price indexes dip in May instead of April. Since the price indexes are a measure of inflation, this could be showing a delayed effect of spending on the prices of goods and services.
Fig 8: Both case counts and death counts rise significantly between April and May, and have somewhat bell-curve-like shapes during the interval. Case counts spike between July and August, while death counts begin to slowly increase day by day in the same interval, lagging behind case counts as expected.
April-May: Between April and May most graphs seem to change dramatically. Figure 7 sees an increase in positive tweets and a decrease in neutral tweets. Figures 1-6 see a dip in most forms of in-person sales like restaurants or air travel. Figure 8 sees a local maximum in case counts and deaths. Because Figures 1-7 are monthly, it is more difficult to make general assumptions about associations between the data types. For example, we cannot say whether Twitter sentiments drive case counts or vice versa; we can simply say that they may be correlated. Regardless of the direction, this does give us a hint of how social changes might be able to affect case counts, or the reverse.
June-August: Although little government economic data about this timeframe has been posted yet, we can make some assumptions and inferences about the correlations between Twitter sentiment data and case count data. Case counts rise dramatically during this time and reach a peak that is both local and global. Meanwhile, Twitter data indicates that positive sentiment for precautionary measures is dramatically falling (between late June and late July), whilst negative sentiment rises to its peak, about 380.67% higher than the highest previous point, which takes place in April. With further Twitter data shown on a day-by-day basis instead of month by month, we might find that the dramatic maximum in case counts in mid-July can be explained by a rapidly diminishing concern for COVID safety precautions. For now, we can only speculate that this may be the case, but it would be very interesting and telling if it were true.
We’re intrigued by the idea that economic data and/or Twitter sentiment data may be able to be used to predict upcoming case rates. Although the idea that economic data can be used to predict spread of disease is not new, the idea that sentiment from social media can be used to predict (and possibly help prevent) diseases like this from spreading in the future is a new and revolutionary idea that gives reason to both hold social media companies more accountable, and to take these trends much more seriously than we have in the past. Although we’re far from making those conclusions, simply showing the possibility is important to us, and hopefully with more data we will be able to better understand these connections.
One way to strengthen our understanding of the multifaceted impact of the coronavirus is to combine qualitative and quantitative data. In this case, public opinion on social distancing measures is difficult to visualize; however, by using sentiment analysis on Twitter, we can better understand public sentiment. Since the interaction between the pandemic and society is so complex, we further explored the connection between more quantitative impacts such as consumer spending and COVID-19 data. For some of the economic data, we found similar dips between the two, especially in April. We also found that positive public sentiment decreased while consumer spending continued to rise. This type of analysis is applicable to real world problems, and, through deeper understanding, can lead to policy change and better preparation for future pandemics.
Mentions for the Future
A future direction would be to analyze the change in growth, which is the derivative of the economic data curves, and possibly correlate that with the derivative of the sentiment analysis data. Although consumer spending may be rising in dollar amounts, there could be indication that the rise is slowing, and such rate-of-change information could offer a deeper layer of analysis.
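With monthly data, the derivative mentioned here reduces to a month-over-month difference. The sketch below uses placeholder numbers (not values from our figures) to show how a series can keep rising while its rate of growth slows.

```python
def month_over_month_change(values):
    """Discrete stand-in for the derivative: difference between
    each month's value and the previous month's value."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Placeholder monthly spending levels (NOT real data).
spending = [100.0, 80.0, 90.0, 95.0]
changes = month_over_month_change(spending)
print(changes)
```

Here the last two changes are both positive, but the second is smaller than the first, which is the "rise is slowing" pattern described above.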
We plan to increase the accuracy of our sentiment analysis bots by increasing data intake. This may include synonym-based data augmentation and using computer-generated lists with human corrections to increase the speed of sorting. We also plan to collect more data so that it can be shown on a day-by-day basis, and to label the data alongside a timeline of political events that might have led to drastic changes in sentiment. We also plan to change the formatting of our algorithms, and possibly migrate to a new coding language to have better and wider control over our algorithms. This, along with stronger data processing that might include word-to-word associations and grammatical constructs, will allow us to build a better and more holistic bot that can tell us more about the data itself. We plan on getting tweet volume data for our keywords and using this to estimate the number of tweets in each sentiment per day. Finally, if we accomplish our previous goals, we plan to analyze more word-based analytics to better understand what people are talking about in the context of COVID-19 safety procedures, and what this can tell us about the spread of misinformation regarding COVID-19.
Another way to analyze the relationship between the economic and sentiment analysis data would be to look at the time delay between their curves. For example, we might observe that consumer spending increases a certain amount of time following a rise in positive public sentiment.
Our results show this very relationship between the peaks of the respective graphs, but with a closer quantitative look, we could figure out if the time delay is consistent or not. This would be very interesting and may lead to us finding more ways to help prevent diseases from spreading as rapidly in the future.
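One possible way to quantify such a time delay is to shift one series against the other and see which shift gives the highest correlation. The sketch below is an assumption about how this could be done, not something we have run on our data; the two series are synthetic placeholders in which the second is simply the first delayed by two steps. With only a handful of monthly points this would not be reliable, so it presumes the denser day-by-day data we hope to collect.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / (spread_x * spread_y) ** 0.5


def best_lag(leader, follower, max_lag):
    """Try delays of 0..max_lag steps and return the delay at which
    the follower series correlates most strongly with the leader."""
    scores = {}
    for lag in range(max_lag + 1):
        shifted_leader = leader[: len(leader) - lag]
        delayed_follower = follower[lag:]
        scores[lag] = pearson(shifted_leader, delayed_follower)
    return max(scores, key=scores.get), scores


# Synthetic placeholder series: "spending" repeats "sentiment" two steps later.
sentiment = [1, 3, 2, 5, 4, 6, 3, 2]
spending = [0, 0, 1, 3, 2, 5, 4, 6]
lag, scores = best_lag(sentiment, spending, max_lag=3)
print(lag)
```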
A huge thank you to our graduate student mentors Surin and Leighton in this project for always being there for us to help answer any questions we had, putting us in contact with people to give us advice, helping format the paper, and generally always being there for us to give valuable advice and support us along the way.
Another thank you to our overseeing professor, Ayfer Ozgur. Her support helped guide us through this project and helped us feel motivated through every step.
Thank you to Huseyin Inan for giving us great advice on our project and supporting us.
I (Noemi) also wanted to thank Megan Davis for helping me through learning Wolfram so I was able to do this project.
Appendix: Clarifications/Mentions for Rules for Sorting Tweets
Curse words and retweets in tweets are censored like so: kjnwef
General Mention 1: Sarcasm negates all rules. For example, if someone was advertising masks (POS 5) but doing it in a sarcastic or joking way with fake/non-existent masks, this would be considered a negative. Negating a rule also would likely categorize a tweet in the category opposite of the original rule that was negated. This doesn’t always apply for rules in DIS.
General Mention 2: If a person clearly is attempting to express one opinion or another, but is not using the right words, they will still be sorted under that category assuming that they misunderstood the meaning of their words. An example tweet to illustrate our point:
That shit be pissing me off. People putting other people in danger because of their negligence.
I don’t even know you but GIRL FUCK YOU!
And fuck anyone who refuses to practice herd immunity and wear mask to protect their loved ones and others.
This shit is NOT A GAME.
This tweet follows POS 3, POS 4 and POS 6. The only issue is that the statement “practice herd immunity” could imply that the user was attempting to convey NEG 16. In this case, we can assume that by “herd immunity” the user meant “social distancing” based on everything else she said. Of course, making assumptions always leaves room for error, but human languages require these assumptions of us if we are to semi-accurately predict collective intended meaning.
General Mention 3: If a person’s tweet fits into one of the neutral rules, but also has qualities that would fit into negative or positive, it will almost always be sorted into whichever binary (non neutral) category it shares a rule with. There are a few exceptions, but in general negative/positive rules will overwrite neutral rules unless that neutral rule encompasses the binary rule. See the following example of an exception where neutral overwrites binary:
Yep. #MaskUp America. Unless of course you’re the exalted Dr. Fauci and the Mrs. I’m guessing just like protesting/looting, there’s an invisible shield that protects you when at a baseball game. #Hypocrite
This tweet contains both (POS 8) and sarcasm, which might normally put it in the NEG category because sarcasm negates previous rules. However, since the entire tweet is framed around the user attempting to show that a certain person is a hypocrite, we choose to put it in neutral.
NEG 4: This includes people who state “people with asthma should not have to wear masks”, “children should not wear masks”, “STRONG, NOT-SICK PEOPLE SHOULDN’T WEAR MASKS”, or otherwise indicate that certain groups are exempt from mask wearing. We as researchers acknowledge that there are medical conditions that would warrant not wearing a mask, but these mainly include severe skin conditions and severe particle allergies that would likely bar the patient from going outside at all during this time. For this reason, stating that certain people should not have to wear a mask would likely put a tweet in the NEG category and overwrite most positive statements.
NEG 11: This mainly refers to people that do not necessarily explicitly mention that they are against masks, but that they are against reasonable mandates for masks. (ex. Required to wear a mask to enter a store, or receiving a fine for not wearing masks) This is often characterized by a user stating that a mandate would violate civil liberties, or otherwise violate a human right of theirs. This does not include people who state that they are against mandates to wear a mask/social distance in a domestic setting, or claim to be against other ridiculous or unrealistic mandates. People who state the above will probably instead be sorted into a neutral category.
DIS 2: If there is little to no way for us to discern a user’s opinion with even a small degree of accuracy, we will likely choose to discard it. This is because even if the user clearly has some opinion, by labeling it we risk misrepresenting them, and because other people who have read the tweet may not have understood it either, making it less important to us in the first place.
“@smogmonster89 @savesnine1 @sainsburys Yet you ID people who abuse you? People in their late 20s who say “are you taking the piss?” Becuase it’s the law; if wearing a mask is also meant to be the law surely it’s same as ID’ing people, just my thoughts”
“More credible then u or trump, but u pitch only to his gullible base. So w/out context, u bring up mask issue I know trump did everything perfect,on face value thats moronic Ur blathering nonsense isnt helping reelect. Thinking repub @JesseBWatters @BillOReilly @newtgingrich”
Above are two examples of tweets that were labeled as neutral, despite being very clearly opinionated. They appear to be self contradictory, could be argued to be representing either side, and have inconsistent and confusing grammar. To some extent, it’s unclear whether or not they’re even talking about masks. For tweets that are entirely incomprehensible, it’s better to just leave them out of the main dataset.
DIS 4: This particular issue came about when there was controversy around Anthony Fauci for not wearing a mask or social distancing during certain parts of a baseball game. Many would criticize him for not wearing a mask, but not make their own opinions clear. One would think that these tweets would follow POS 4, but some users were criticizing Fauci solely because he was not wearing a mask, whilst others were really only criticizing him for being a hypocrite, and not necessarily for not wearing a mask. I noticed that this held in general: when someone was being criticized for being hypocritical, the tweet didn’t necessarily convey the user’s opinion on what the person did; sometimes it just conveyed their opinion of the person themself. For this reason, tweets under this category that do not explicitly follow some other rule that would make them positive or negative will often be labeled as neutral.
Anthony Fauci: It’s ‘Mischievous’ to Criticize Me Taking Off Mask in Baseball Stands:We all need to understand when DEM/Socialist and theGODS of the Media make the laws they are making those laws for you and me not for them they are above the law https://t.co/2txBqx5nU7
The above tweet is very obviously anti-Fauci, and we might assume that they are anti-mask, but because the article they linked was only explicitly “anti-Fauci” and didn’t mention their take on masks, and because the user themself did not mention their take on masks, we have to put them in the neutral category. I considered putting this tweet in the negative category because the last part of their tweet seems to indicate NEG 6 by implying that “mask rules” only apply to people who aren’t democrats, but because we technically can’t differentiate between them being angry about their belief that the rules are only applied to them, or anger that necessary rules don’t apply to others, we cannot categorize it.
DIS 5: This category just includes tweets that aren’t talking about masks in the context of COVID-19. For example, they may be referencing Batman’s mask, masking emotions, or other non-medical references to masks. Below is my favorite example of this rule.
@REMills2 I’m an abusive pageant mom. Every day I shake him by his tiny little shoulders and say “ONLY WINNERS IN MY HOUSE” and he sheds a single glistening anime tear, knowing the mask of fame must continue to hide his deep dissatisfaction and emptiness. Alexa play Lucky by Britney Spears
DIS 6: We came across a few tweets in which users claimed to be completely against “mandates requiring you to wear a mask in your own home”. It makes sense to be completely against such a rule if it existed, regardless of one’s position on masks, so if the tweet gave no other indication of the user’s position on the issue, it was sorted into this category.