Olfactory Communications In a Virtual Environment

Journal for High Schoolers

Authors:

Anushka Sheth, Bikrant Das Sharma, Harvey Jarin, Ian Zhang, Jonathon Sneh, Leena Elzeiny, Prayusha Parikh and Vyomika Gupta

Abstract:

Current VR systems immerse an individual visually and auditorily, but the lack of olfactory stimulation is part of the void that separates virtual reality from real-life experiences. The prospect of adding scent as a third dimension to virtual reality prompted our research into the effects of adding an olfactory feature to virtual interactions, as well as its effect on the judgement of the pleasantness and personality of others met in the virtual environment. At the start of the experiment, all participants take a personality quiz that predicts how much each participant will enjoy a certain smell. Each individual then participates in a conversation in virtual reality with a stranger. Scent is dispersed during the conversation for some participants and not for others. The stranger the participant meets gives scripted responses. Immediately afterwards, each participant answers questions about the conversation. The questionnaire quantitatively scores (1) pleasantness, to examine whether scents impact the perception of someone, and (2) the perceived personality of the stranger, to determine whether olfactory communication aids in an accurate first impression. This research investigates the implications of integrating olfactory communication into virtual interactions and assesses its effects on the judgement of the other person's personality and pleasantness.

1. Introduction

Olfactory communication is deeply instinctual. A scent is first processed by the olfactory bulb, which directly activates the amygdala, the control center for emotions in the brain. The association between smells and emotions can run so deep that, according to three clinical studies by two psychiatric professors, patients with post-traumatic stress disorder reported feeling guilt and nausea upon smelling an environment associated with the traumatic incident [1]. Technology is now slowly shifting to incorporate olfaction machinery into physical spaces, ushering in a new era of immersive reality. In order to simulate the physical world, one must mimic all of its dimensions, yet little research has been conducted to understand the human response to synthetic scents.

The purpose of our research is to determine the effect of aroma on a person's judgement of the pleasantness and personality of someone they meet. A personality quiz is given to each participant to predict their enjoyment of a certain smell. The participant then interacts with another person in virtual reality. During this conversation between the participant and an actor, the scent is dispersed for some subjects without their knowledge.

2. Background

Virtual reality (VR) is a highly realistic, immersive, simulated experience incorporating sophisticated visual and audio elements. Virtual reality is primarily used for educational and entertainment purposes, as it simulates realistic interaction. In our study, the two participants interact in virtual reality through the platform AltspaceVR, a social virtual world where users create avatars and explore chat rooms. AltspaceVR serves to supplement social interactions and connects users worldwide. While AltspaceVR and other such virtual communication platforms attempt to mimic and replicate real-life encounters, many subtleties of physical human interaction are lost. Most first impressions, as well as the formation of memories, rely on tangible visual and olfactory cues that cannot currently be conveyed through virtual means. First impressions are formed in the amygdala and posterior cingulate cortex, parts of the brain that process fear and assign value to objects. The factors involved in the assignment of value and trustworthiness are largely derived from visual signals, primarily body language. People can evaluate faces for trustworthiness, status, and attractiveness after even a brief glance (e.g., 33 ms), and extra time (100 or 500 ms) only slightly improves the correspondence of these time-constrained evaluations with an independent set of time-unconstrained judgments. Perceptions of personality are also based on visual cues, including clothing, posture, and eye contact. Many such cues, however, are also conveyed through olfaction.

Yet this strong connection has not always been appreciated. The potential of olfactory communication has been set aside since the ancient Greek philosophers: Plato and Aristotle described scent as the most animalistic of the five senses [2]. Later philosophers such as Immanuel Kant agreed, since smell contributed nothing to the music and art of the Enlightenment. On the other hand, some non-Western cultures glorified the sense of smell [3]. For example, the Bororo people of Brazil consider body odor to be the life force of an individual and the smell of breath to be linked to the soul.

3. Methods and Materials 

We received a scent machine from OWidgets, a device developed in the SCHI Lab at the University of Sussex. AltspaceVR was used for the virtual reality world and experience. The VR headsets used were an Oculus Rift and an HTC Vive. Scents were provided by the Institute of Art and Olfaction in Los Angeles, California. An Arduino board, connected to a laptop, controlled the dispersion of scent from the OWidgets machine.

The experience begins in a waiting room, where participants are ushered in and out. Each participant takes the scent personality quiz anonymously, and the database stores the person's scent preferences as scored by the quiz. From there, participants are told the narrative for the experiment: they have left their lives and are in a state that determines whether they will return to their old life or move on to the afterlife. The participant then enters the virtual reality world, where another person is waiting, and the two users converse. After the experience, each takes an exit survey judging the other's personality and pleasantness.

An important factor in conducting the experiment is the participants' perception of it as a blinded study. The experiment is disguised as a study on the effect of virtual reality on first impressions. Each participant meets an "actor": a person who has memorized responses to the questions and gives those memorized responses when the participant asks them.

After the interaction, the participant steps out of the VR headset and takes an exit survey on the interaction they just had. The exit survey asks the participant to judge the actor's personality and pleasantness.

3.1 Altspace

AltspaceVR is a virtual reality platform that is used as a meeting space for both private and public events. In AltspaceVR, worlds can be built from scratch, which allows for greater customization and control over certain variables (avatars, music, boundaries, setting). In addition to its customization features, Altspace also supports world-building through Unity and in-game app development through its MDK feature. This, in combination with its other features, makes AltspaceVR ideal for our experimental procedure.

For our world, we wanted to create a neutral, non-religious interpretation of the afterlife. To ensure that the world would not elicit any negative emotions, we also wanted to make it as serene and calm as possible. We determined that the best world would include the following features: soothing music, a nighttime mountain background, a reflective plane, and a limiting boundary in the form of rocks. To make the experience more interactive, we also implemented a Tic-Tac-Toe game to go along with the questions.

3.2 Personality Quiz

In order to determine the scent that corresponded to a participant's unique personality, we created our own version of a personality quiz. Based on our research into various personality quizzes (Myers-Briggs, Five Factor Model, Eysenck's PEN Model), we determined that a quiz following the same general structure and trait associations as the FFM would best suit our purpose. We devised 12 questions, 11 with four response choices and 1 with two, based around the five main personality traits in the FFM: Openness to experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (OCEAN). Since there are four main fragrance families (Floral, Oriental, Woody, Fresh) and five trait groups, we combined openness and extraversion due to their similarities [4]. To connect fragrance families and personality traits, we drew on correlations between one's personality and preferred perfume. The fragrance-trait relations we used were Floral-Neuroticism, Oriental-Conscientiousness, Woody-Agreeableness, and Fresh-Openness/Extraversion [5].

Early versions of our quiz were skewed, as a majority of people overwhelmingly favoured one response over the other answer choices for several questions. This caused most people to end up with a high score in the Fresh fragrance family relative to the other three. After several iterations of modifying and removing biased questions, we were left with 12 questions with even distributions of answer responses. For scoring, we assigned each response three points (with the exception of question 11, which was given four points for more weight). The points corresponded to different scents, as our research indicated that people who chose a specific answer choice were more likely to favour a specific fragrance. Each answer choice an individual made added points to one or more scents. In total, there were 37 possible points in each of the four scents, meaning there was an equal opportunity to score highly in any of the four fragrance families. After all 12 questions were answered, the points scored in each fragrance family were summed to create the participant's scent array, which ranked the four scents to anticipate which one the participant would be partial to. Next, we had to evaluate whether an individual's personality, as judged by our quiz, actually had an effect on the kinds of smells they preferred. To test the accuracy of our personality quiz, we went to various locations on campus and surveyed people randomly. Participants were first given the quiz so that we could predict the scent they should prefer based on their personality type. Then, they were given four strips of paper, each sprayed with a different fragrance oil, and asked to choose which one they preferred. Of the 24 people we surveyed, 16 chose one of their top two scoring scents as their favourite, suggesting that our quiz could predict which aromas an individual would prefer a majority of the time.
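
As an illustration of this scoring scheme, the sketch below tallies a scent array from a set of quiz responses; the answer-to-fragrance point mapping shown is a hypothetical placeholder rather than our actual answer key.

# Minimal sketch of the scent-array scoring (hypothetical point mapping).
FAMILIES = ["Floral", "Oriental", "Woody", "Fresh"]

# ANSWER_POINTS[question][choice] -> {family: points}. Each response is worth
# three points (four for question 11); the assignments below are illustrative only.
ANSWER_POINTS = {
    1: {"A": {"Fresh": 3}, "B": {"Floral": 3}, "C": {"Woody": 3}, "D": {"Oriental": 3}},
    11: {"A": {"Fresh": 2, "Woody": 2}, "B": {"Floral": 2, "Oriental": 2}},
    # ... questions 2-10 and 12 would be defined the same way
}

def scent_array(responses):
    """responses: {question_number: chosen_letter} -> families ranked by score."""
    scores = {family: 0 for family in FAMILIES}
    for question, choice in responses.items():
        for family, points in ANSWER_POINTS.get(question, {}).get(choice, {}).items():
            scores[family] += points
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(scent_array({1: "A", 11: "B"}))   # ranked list of (family, score) pairs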

3.3 OWidgets Scent Machine 

To disperse the participants' unique perfumes throughout the room, we looked into various scent delivery systems such as essential oil diffusers, heat diffusers, and nebulizer oil diffusers. While each of these has proved effective at dispersing smells throughout a given environment, the fragrance oils they give off tend to linger in the room for hours. Over time, the different smells would blend together to form olfactory white, an unidentifiable scent that would distort our data. We wanted to avoid the Smell-O-Vision disaster of the mid-twentieth century, in which users were overwhelmed by the multiple scents coming at them. Furthermore, we had to ensure that the smell was not too strong. Since each VR experience would last only four to five minutes, we needed a way to easily get rid of the smell after each encounter. To do this, we partnered with OWidgets, a company that designs ways to incorporate olfaction into various experiences. The company was founded by Emanuela Maggioni, an experimental psychologist whose interest in the human olfactory system led her to develop a machine for experimentation with smells. The scent machine that she and her team created allows three different scents to be sprayed from its nozzles. Through experimentation, we confirmed that perfumes sprayed from the machine lingered noticeably in the environment for only 1-3 seconds after the machine was turned off. We found that the ideal distance between the user and the OWidgets scent machine was five feet. Moreover, we found that the best way to spray the fragrances was to run the machine on a cycle of five seconds on and 15 seconds off for the duration of the experience, about four minutes. This ensured that the participant could smell the scent for a majority of the experience without being overwhelmed by a strong fragrance.
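
This dispersion cycle can be written as a simple timed loop on the laptop. The sketch below assumes the Arduino firmware accepts single-byte on/off commands over a serial link; the port name and command bytes are placeholders for whatever protocol the firmware actually uses.

# Hypothetical duty-cycle driver: 5 s of scent, 15 s off, for about four minutes.
import time
import serial  # pyserial

PORT = "/dev/ttyACM0"   # placeholder port name
ON, OFF = b"1", b"0"    # placeholder command bytes understood by the firmware

with serial.Serial(PORT, 9600, timeout=1) as arduino:
    end_time = time.time() + 4 * 60
    while time.time() < end_time:
        arduino.write(ON)    # start dispersing
        time.sleep(5)
        arduino.write(OFF)   # stop dispersing
        time.sleep(15)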

4. Future Directions

The next step in continuing this research would be to use more perfumes in the VR interaction. Expanding from only fresh to also include the floral, warm and spicy (oriental), and woody families would offer more data about responses to smells that people do and do not like.

Afterwards, expanding to a more dynamic set of scents would be another possible direction. By implementing scents that correspond to one's unique environment, users would gain an even more immersive experience. Smells in an environment could be identified by an e-nose and then transferred to another user using a system similar to ours, but the actual replication of the scent, as well as the current capabilities of e-noses, limits this process. Future advancements in these areas would be needed to further the inclusion of dynamic scents in VR or any other platform.

5. Acknowledgements

We would like to acknowledge Professor Tsachy Weissman of Stanford’s Electrical  Engineering Department and head of the Stanford Compression Forum for his guidance  and help with the project. We also want to thank Devon Baur for being the mentor of our  group and helping us with designing the project, accessing and getting materials, and  running the experiment. Additionally, we also want to thank Professors Subhashish Mitra,  Gordon Wetzstein, and Debbie Senesky for all of their help with the project. Also, we want  to thank the OWidgets group in London for giving us a scent machine and helping us set it  up and troubleshoot. Lastly, we want to thank the Institute of Art and Olfaction in Los  Angeles for providing us with the scents we used for the project.  

6. References

[1] Vermetten E. & Bremner J.D. Olfaction as a traumatic reminder in posttraumatic stress disorder: case reports and review. The Journal of Clinical Psychiatry 64 (2003), 202-207.

[2] Camille Ferdenzi, S. Craig Roberts, Annett Schirmer, Sylvain Delplanque, Sezen Cekic, Christelle Porcherot, Isabelle Cayeux, David Sander, Didier Grandjean, Variability of Affective Responses to Odors: Culture, Gender, and Olfactory Knowledge, Chemical Senses, Volume 38, Issue 2, February 2013, Pages 175–186, https://doi.org/10.1093/chemse/bjs083

[3] Bhana, Yusuf. "Localising Smells: How Different Cultures Perceive Scents." TranslateMedia, 7 Dec. 2017, http://www.translatemedia.com/translation-blog/localising-smells-different-cultures-perceive-scents/

[4] Mensing J., Beck C. (1988) The psychology of fragrance selection. In: Van Toller S., Dodd G.H. (eds) Perfumery. Springer, Dordrecht

[5] Ahrndt, Sebastian & Fähndrich, Johannes & Albayrak, Sahin. (2015). Modelling of Personality in Agents: From Psychology to Implementation.

Nanopore DNA Sequencing

Journal for High Schoolers

Authors:

Georgina Cortez, Nicole Krukova, Thuan Le, and Suraj Thangellapally

Abstract:

Nanopore technology is a modern way of sequencing DNA; in other words, it determines the order of nucleotides in a DNA strand [9]. First, a DNA strand passes through a motor protein that slows its movement to improve the accuracy of the readings. The strand then runs through a nano-sized hole across which an ionic current flows. As the DNA strand passes through the nanopore, the individual nucleotides cause disruptions in the electric current that help determine the nucleic acid sequence [14]. Approximately 250 bases can be sequenced per second by nanopore technology [9]. However, the results are "noisy," or not completely reliable, because of several factors, including the inconsistency of the nucleotides' dwell times, the periods of time the central bases spend in the nanopore. The shape of the ionic current signal depends on the dwell time; inconsistent dwell times produce signals that make it difficult to accurately determine the nucleic acid sequence.

We believe the central base's dwell time depends on the nucleotides present in the motor protein at that moment. The central base is the nucleotide currently in the nanopore whose dwell time we are focusing on. A k-mer (e.g., "AAT" or "CGATC") is a short sequence of nucleotides that we associate with the central base's dwell time. To characterize the inconsistencies in the dwell times, we analyzed the data to find correlations between the k-mers and their dwell times. These correlations can help future researchers assign specific characteristics to k-mers and their impact on dwell times.

1. Methods

We conducted our research with previously acquired sequencing data from the lambda bacteriophage, whose genome has approximately 48,000 nucleotides. Using Python, we manipulated the data to find trends and correlations between the k-mers and their corresponding dwell times. Our main goal in the programming was to create visual representations of the data in the form of plots. These plots allowed us to discern noticeable trends in the data that we otherwise might not have observed. Our initial plots were difficult to analyze because there was too much data, which made it nearly impossible to see any trends. To resolve this issue, we organized the data into a more digestible format by plotting only the average dwell time for each k-mer and sorting the points from lowest dwell time to highest. This allowed us to more easily spot any outliers or trends in the data and compare different k-mers.
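
The sketch below shows the core of this analysis under an assumed input format of one base call per row with its dwell time; the column names, filename, shift, and k-mer length are parameters rather than fixed values from our pipeline.

# Sketch: average dwell time per k-mer, sorted from lowest to highest.
# Assumes a CSV with one row per base call: columns "base" and "dwell_time".
import pandas as pd
import matplotlib.pyplot as plt

def kmer_dwell_means(df, k=5, shift=0):
    bases = df["base"].tolist()
    dwells = df["dwell_time"].tolist()
    rows = []
    for i in range(len(bases)):
        start = i - k // 2 + shift      # window around (or shifted past) the central base
        if start < 0 or start + k > len(bases):
            continue
        rows.append(("".join(bases[start:start + k]), dwells[i]))
    table = pd.DataFrame(rows, columns=["kmer", "dwell_time"])
    return table.groupby("kmer")["dwell_time"].mean().sort_values()

means = kmer_dwell_means(pd.read_csv("lambda_dwell_times.csv"), k=5, shift=1)  # placeholder filename
ax = means.plot(style=".", figsize=(10, 4))
ax.set_ylabel("mean dwell time")
plt.show()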

2. Results

The plot and table showed that the data for all of the bases were very similar, with barely any differences between their means, standard deviations, etc. These results supported our belief that the dwell times do not depend on the base in the pore but rather on the following bases passing through the motor protein, since the motor protein controls the speed of the DNA strand. To test this idea, we moved on to creating plots of different k-mers' dwell times to see if any correlations became more apparent.

We began with an unshifted 3-mer as practice, to construct our function and begin associating k-mers with the central base's dwell time. As expected, there were no significant trends in the data, because the unshifted 3-mer has only one nucleotide following the central base. We believed that a longer k-mer would show correlations with the dwell time since, as our hypothesis claims, the central base's dwell time depends on the nucleotides passing through the motor protein at that moment. Following this observation, we created a dataset with shifted 3-mers, where we shifted the 3-mer one position to the right.

Since the error bars for the dwell times of the 3-mers are so large, the 3-mer data should not be relied on. This further justified our focus on the 5-mer, which yields more reliable data.

We observed that the outlying dwell times below 7 units and above 9 units came from k-mers with a consecutive repetition of a base. The dwell times in these ranges varied greatly from the "normal" dwell times in the 7 to 9 unit range. The k-mers in this "normal" range had minimal repetition compared to the outlying ranges. This led us to believe that the consecutive repetition of a nucleotide causes an irregular dwell time.

Similar to the previous sample of 5-mer data, there was a repetition of a single base in k-mers with irregular dwell times. In the 6.5 to 6.7 unit range of dwell times, there was a repetition of a single base in each k-mer. We also noticed a higher prevalence of C's at the lower end of the dwell times. Moving to the higher 10 to 12 unit range, we observed a consistent repetition of the nucleotide G in each k-mer. G's are known to be particularly disruptive, so we believe that their repetition caused the change in the dwell time. This data supports the idea that the repetition of a nucleotide causes irregularities in the dwell times.

The first plot shows no difference between the prior k-mers and the 7-mer, indicating that in order to find a trend we need to increase the length of the k-mer and keep shifting it. There are 16,384 possible 7-mers, so we plotted their averages for better visualization. The second plot represents the averages over the first 10,000 bases of the lambda data, sorted from least to greatest. After analyzing the plot, we concluded that we would need a longer k-mer to spot any significant trend. For future reference, researchers should look at the repetition of nucleotides, focus on specific ranges, and examine any other possible trends that could lead to a discovery.

3. Conclusions

Our project confirmed that the dwell time is not dependent on the central base but rather on the following bases in the motor protein. A significant discovery was that the consecutive repetition of a nucleotide causes irregular dwell times. The nucleotides G and C also appear to affect the consistency of the dwell time. With this information, future researchers should continue to create different shifts of k-mers and gather more data about the impact of base repetition on the dwell time. By understanding this relationship, researchers will be able to characterize and predict the effect a k-mer will have on the dwell time. This can lead to more accurate readings of the sequenced DNA, which will allow for the expansion of nanopore technology.

Advancements in nanopore technology will expand the applications of DNA sequencing in fields such as healthcare and data compression. Increased reliability will permit the technology to play a more widespread role in personalized medicine, where treatment is tailored to one's genetics, and in DNA data storage, an incredibly efficient way of storing a substantial amount of data in a microscopic space.

4. References

[1]​ Python: The Ultimate Beginner’s Guide!​. Scotts Valley: CreateSpace Publishing, 2016.

[2] “Python Numpy Tutorial”, ​Cs231n.github.io​, 2019. [Online]. Available: http://cs231n.github.io/python-numpy-tutorial/#numpy-arrays. [Accessed: 08- Jul- 2019].

[3] “Data Analysis and Visualization with Python for Social Scientists alpha: Extracting row and columns”, ​Datacarpentry.org​, 2017. [Online]. Available: https://datacarpentry.org/python-socialsci/09-extracting-data/index.html. [Accessed: 08-Jul-2019].

[4] G. Templeton, “How DNA data storage works – ExtremeTech”, ​ExtremeTech​, 2016. [Online]. Available: https://www.extremetech.com/extreme/231343-how-dna-data-storage-works-as-scientis ts-create-the-first-dna-ram. [Accessed: 08-Jul-2019].

[5] G. Templeton, “How DNA sequencing works – ExtremeTech”, ​ExtremeTech​, 2015. [Online]. Available: https://www.extremetech.com/extreme/214647-how-does-dna-sequencing-work. [Accessed: 08-Jul-2019].

[6] “Replacing strings with numbers in Python for Data Analysis – GeeksforGeeks”, GeeksforGeeks​, 2018. [Online]. Available: https://www.geeksforgeeks.org/replacing-strings-with-numbers-in-python-for-data-analysis/. [Accessed: 08-Jul-2019].

[7] N. Jetha, C. Feehan, M. Wiggin, V. Tabard-Cossa and A. Marziali, “Long Dwell-Time Passage of DNA through Nanometer-Scale Pores: Kinetics and Sequence Dependence of Motion”, ​Biophysical Journal​, vol. 100, no. 12, pp. 2974-2980, 2011. Available: 10.1016/j.bpj.2011.05.007.

[8] T. Petrou, “Selecting Subsets of Data in Pandas: Part 1 – Dunder Data – Medium”, Medium​, 2017. [Online]. Available: https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c. [Accessed: 11-Jul-2019].

[9] “How it works”, ​Oxford Nanopore Technologies​, 2019. [Online]. Available: https://nanoporetech.com/how-it-works. [Accessed: 11-Jul-2019].

[10] S. Lynn, “Using iloc, loc, & ix to select rows and columns in Pandas DataFrames”, Shane Lynn​, 2018. [Online]. Available: https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-an d-ix/#loc-selection. [Accessed: 11-Jul-2019].

[11] “Python Pandas Tutorial 7. Group By (Split Apply Combine)”, ​YouTube​, 2017. [Online]. Available: https://www.youtube.com/watch?v=Wb2Tp35dZ-I. [Accessed: 14-Jul-2019].

[12] “How do I apply multiple filter criteria to a pandas DataFrame?”, ​YouTube​, 2016. [Online]. Available: https://www.youtube.com/watch?v=YPItfQ87qjM. [Accessed: 14-Jul-2019].

[13] “Sequencing DNA (or RNA) | Real-time, Ultra Long-Reads, Scalable Technology from Oxford Nanopore”, ​YouTube​, 2017. [Online]. Available: https://www.youtube.com/watch?v=GUb1TZvMWsw. [Accessed: 14-Jul-2019]

[14] "Introduction to nanopore sequencing", YouTube, 2019. [Online]. Available: https://www.youtube.com/watch?v=I9BOF8Hla5s. [Accessed: 14-Jul-2019].

[15] Lambda data accessed from Weissman's lab.

Implicit Communication in Collaborative Object Transport

Journal for High Schoolers

Authors:

Munir Bshara and My Pohl

Abstract:

When two people carry an object together, they manage to change directions and avoid obstacles with minimal or even no verbal communication. Communication using nudges, forces, and eye contact is often sufficient. In this paper, we study this nonverbal language with the aim of understanding it better. This type of communication is also useful for artificially intelligent robots. The results of our study and further experiments may contribute to better communication between AI robots and humans as well as between robots. The experimental setup consisted of two people carrying an object to a certain goal while data was collected from force sensing resistors attached to the object and from a camera. The results suggest that there are different styles of communicating. To apply this kind of communication to robots, we think it will be necessary to identify these different styles rather than collecting experimental data and calculating an average.

1. Methods

Materials. A grid made out of masking tape was used to outline the experimental area; a table was the object the participants carried; four force sensing resistors (FSRs) measured the forces applied to the object; and a web camera (Logitech Pro Stream Webcam C922 1080HD) recorded videos so they could later be analyzed in a program (OpenPose). To measure the forces, we used four FSRs connected to four breadboards, each connected individually through a 3.3 kΩ resistor and three breadboard cables to an Arduino UNO (REV3), which was connected to a PC through USB.

Figure 1. A schematic overview of the electrical circuit.
Figure 2. A photograph of the actual set up.

The pictures above show how the ZJchao force sensitive resistor was attached to the UNO R3 board. The FSR was attached to the breadboard through a F-M Dupont wire. The red and blue breadboard jumper wires are for 5V power and grounding, while the yellow breadboard jumper wire is used for the direct connection to the computer. The resistance of the resistor was 3.3 kΩ. To convert the analog output to digital, the Arduino IDE was used along with some code to read out the force sensors.

Design. The experiment took place in a room with a grid on the floor and fabric pieces of four different colors placed in squares of the grid. The grid had 32 squares (8 × 4), each 50 cm × 50 cm. A camera in the ceiling recorded the procedure. Afterwards, the videos were analyzed in a program named PoseNet, which tracked the participants' motions and trajectories. At the short side of the grid was the starting position of the table with the four force sensing resistors attached.

On the short sides of the table there were handles, each with two force sensing resistors. The sensors were attached directly to the table, with the handles on top of them. Each handle had a piece of eraser, with the same area as the sensor, attached to it. The handles were attached to the table with silver tape so that the eraser touched the sensor. No pressure was put on the sensors while at rest, and the handles were only loosely attached to the table.

Figure 3 and 4. Figure 3 is a photograph which shows what the construction of the handles looked like. Figure 4 illustrates how the handles were used.
Figure 5. A schematic showing a connection between an FSR and computer using Arduino UNO.
Figure 6. A graph showing the relationship between resistance and force.

The schematic above shows the connection between the FSR and the computer using the Arduino UNO, which allows us to convert the analog signal from the FSR to a digital signal that can be printed. The relationship between resistance and force is generally linear from 50 g and up, but note what the relationship does below 50 g, and even more so below 20 g. These sensors have a turn-on threshold: a force that must be present before the resistance drops to a value below 10 kΩ, where the relationship becomes more linear. To counteract this irregularity at the beginning of the chart, we altered the code. This alteration changes the nonlinear reading of pressure on the force sensor to a roughly linear one by dividing the reading by a constant that was given to us.
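
As an illustration of the logging loop on the computer side, the sketch below reads one sensor's values over a serial link every 0.5 seconds and divides by a calibration constant; the port name, the constant, and the one-reading-per-line output format are placeholders rather than our exact setup.

# Sketch of the PC-side logging loop: read one force value per line from the
# Arduino every 0.5 seconds and divide by a linearizing calibration constant.
import time
import serial  # pyserial

PORT = "/dev/ttyACM0"   # placeholder port name
CALIBRATION = 100.0     # placeholder for the constant used to linearize the reading

with serial.Serial(PORT, 9600, timeout=1) as board, open("forces.txt", "w") as log:
    start = time.time()
    while time.time() - start < 60:              # one trial; duration arbitrary here
        raw = board.readline().decode().strip()  # analog reading printed by the Arduino
        if raw:
            force = float(raw) / CALIBRATION     # crude linearization, as described above
            log.write(f"{time.time() - start:.1f}\t{force:.3f}\n")
        time.sleep(0.5)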

Procedure. There were two participants at a time. Initially, a general explanation of the experiment was read aloud to them. They were told that they would carry a table together through a room and that they were not allowed to turn around or speak with the other person. The experiment was repeated three times with some small changes, and each trial ended when the table was put down on the floor. Further information was given individually.

Figure 7. A schematic overview of the experimental area with the three goals plotted. The numbers stand for the number of the experiment.
Figure 8. Photograph of the actual experimental area taken with the web camera from the same view as the video recordings.

These were the instructions given to the participating pairs on paper:

Experiment 1, person 1:

“On the floor there will be a grid with some fabric pieces of different colors in it. The grid is the area where the experiment will take place. Remember, you are not allowed to speak with the other person or turn around.”

Experiment 1, person 2:

“On the floor there will be a grid with some fabric pieces of different colors in it. The grid is the area where the experiment will take place. Your task is to place the two legs of the table on your side into two squares with blue colored fabric pieces. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”

Experiment 2, person 1:

“Your task is to place the two legs of the table on your side into two squares with green colored fabric pieces. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”

Experiment 2, person 2:

“Your task is to place one of the two legs of the table on your side into a square with a yellow colored fabric piece in it. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”

Experiment 3, person 1:

“Your task is to place one of the two legs of the table on your side into a square with a yellow colored fabric piece in it, and the other into a square with a blue colored fabric piece. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”

Experiment 3, person 2:

“Your task is to place one of the two legs of the table on your side into a square with a white colored fabric piece in it, and the other into a square with a blue colored fabric piece. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”

All the experiments were recorded on film using the Logitech camera in the ceiling. After the experiment was done, these videos were analyzed in OpenPose, which gave us the exact position data, movements, and trajectories. During the experiment the force sensing resistors were read every 0.5 seconds. We also timed all the experiments using a stopwatch.

To analyze the force data, we connected each of the four force sensors to its own breadboard and its own Arduino UNO board and, through a USB connection to a computer, used the Arduino IDE with some code to print out the force outputs. Then, using a script and vi, we isolated the forces and their timestamps into a plain text file.


To have something to compare the experimental results against, we established a baseline: the same experiments, but with both participants knowing where the table should be placed.

Participants. Participants were recruited from different departments at Stanford University (the Artificial Intelligence Laboratory, Wallenberg Hall, and interns from the "STEM to SHTEM" internship program). All participants were between the ages of 15 and 52, with an average age of 21 years. Twelve people participated in the experiment (5 female and 7 male). All participated voluntarily.

2. Results

Figure 9. Showing the position for player 2.
Figure 10. Showing the position for player 1.

All trials were recorded with the webcam attached to the ceiling. Machine learning techniques were then used to extract various body points. OpenPose and TensorFlow allowed us to track positions relative to the camera, and a combination of Python scripts and bash tools was needed to extract and format the data.
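
As a rough illustration of this extraction step, the snippet below pulls one body keypoint per person out of OpenPose's per-frame JSON files; the output directory, the file-naming pattern, and the BODY_25 keypoint ordering assumed here should be checked against the OpenPose configuration actually used.

# Sketch: extract one body keypoint per detected person from OpenPose JSON frames.
import glob
import json

NECK = 1  # index of the neck keypoint in the BODY_25 model (assumed)

def neck_positions(json_dir):
    frames = []
    for path in sorted(glob.glob(f"{json_dir}/*_keypoints.json")):  # assumed naming pattern
        with open(path) as f:
            people = json.load(f)["people"]
        positions = []
        for person in people:
            kp = person["pose_keypoints_2d"]     # flat list [x0, y0, c0, x1, y1, c1, ...]
            positions.append((kp[3 * NECK], kp[3 * NECK + 1]))
        frames.append(positions)
    return frames

trajectories = neck_positions("openpose_output")  # placeholder directory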

It seems that different types of people react differently to each of the experiments based on their previous interactions. For example, a father and his daughter moved the table calmly, while a pair of friends made more powerful movements. We also noticed that people pulled more than they pushed.

Figure 11. Four graphs illustrating the forces that were put onto the object from the two participants in experiment 1.
Figure 12. Four graphs illustrating the forces that were put onto the object from the two participants in experiment 2.
Figure 13. Four graphs illustrating the forces that were put onto the object from the two participants in experiment 3.

In the graphs above, we have calculated the difference between the measurements from the pushing and pulling sensors on each handle to get the resulting force, over the whole duration of each experiment. In the bottom-right corner of the figure is the baseline, which can be used for comparison.
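
For reference, this per-handle resulting force can be computed directly from the logged readings. The sketch below assumes the four sensor streams have been merged into a single five-column text log (time, then push and pull readings for each handle); that layout is an assumption, not the exact format of our logs.

# Sketch: net force per handle = pull reading minus push reading.
import numpy as np

# Assumed log layout: time, push_1, pull_1, push_2, pull_2 (whitespace-separated).
data = np.loadtxt("forces_merged.txt")   # placeholder filename
t = data[:, 0]
net_handle_1 = data[:, 2] - data[:, 1]   # positive values mean pulling dominates
net_handle_2 = data[:, 4] - data[:, 3]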

Not surprisingly, the baseline pair solved the task faster than the others in all of the experiments. The baseline also used more precise nudges rather than a lot of back-and-forth communication. However, the other participants, who did not know exactly where the goal was, also managed to complete the tasks well.

There is a pattern in the way the different groups solved the task. For example, group 3's three graphs look a lot alike, and the same holds for groups 1 and 2. More measurements are needed to draw reliable conclusions, but there seem to be a few different styles of solving these kinds of tasks. It may be better to try to identify these different styles rather than to collect data and calculate an average, because a combination of working methods does not necessarily result in a new working one.

3. Conclusions

Humans have an advanced and efficient way of communicating with each other, and the rapidly growing field of artificial intelligence would benefit from learning this kind of communication. In this paper we focused on nonverbal communication via nudges when carrying an object. We developed a methodology to measure the push and pull forces applied to an object that is being carried. The results from the experiments showed that the tasks were possible to solve, but that they were solved more efficiently when both participants knew the goal. Another conclusion is that there seem to be different methods of solving these kinds of tasks, possibly depending on the relationship between the two participants. These different methods should be identified.

Limitations and future work. When we designed this experiment, we wanted to eliminate all ways to communicate except for nudges, so that we could be sure the measurements were reliable and correct. One limitation is that we did not exclude the possibility of communicating via eye contact; future work should take eye contact into consideration. An easy way to solve this problem would be to have the participants wear capes or hats. The measurements from the force sensing resistors showed how the two participants pushed and pulled. Another dimension that should be added is turning, which would give a more complete picture of how the exchange of information via nudges works. Turning can easily be measured with two handles on each side of the table, placed at the four corners (this requires four more force sensing resistors). With four handles, the measurements would show when someone is pushing more on one of the handles or pulling more strongly on one side of the table, which would make it possible to identify turning. The complete product of this project would be to transfer our measurements onto AI robots so that they can communicate more efficiently and in a more human-like way.

4. Acknowledgements

We would like to thank Professor Weissman for allowing us to participate in the STEM to SHTEM program. Also many thanks to Mengxi Li and Minae Kwon for mentoring us in this project and finally a big thank you to all the participants who helped us gather data for this paper.

5. References

Force Sensitive Resistor Hookup Guide​, learn.sparkfun.com/tutorials/force-sensitive-resistor-hookup-guide/all.

Kevin W. Rio and William H. Warren (2016). Interpersonal coordination in biological systems: The emergence of collective locomotion.

Patrick Nalepka, Rachel W. Kallen, Anthony Chemero, Elliot Saltzman, and Michael J. Richardson (2017). Herd Those Sheep: Emergent Multiagent Coordination and Behavioral-Mode Switching.

Human-Based Image Compression; Using a Deterministic Computer Algorithm to Reconstruct Pre-Segmented Images

Journal for High Schoolers

Authors:

Vivaan Mahtab, Ganesh Pimpale, Juan Aldama, and Paul Truong

Abstract:

In 2018, Prof. Tsachy Weissman's lab conducted research on using human semantics to compress an image. A 'describer' would provide links to image assets representing elements of the original image [1]; for example, given a picture of the savanna, the describer would provide images of grass and trees, among others. With the compiled components, the describer would use natural language to describe how a 'reconstructor' could combine the elements to create an image semantically similar to the original. The instructions and links constituted the compressed image. They called this approach Humans as Awesome Compressors (HAAC). While this compression scheme significantly reduced the size of an image, the natural language instructions were imprecise and could be shortened further. To solve this, we built a scripting-based image editing tool that tracks the actions of the describer and can execute them on the reconstructor side, thus improving precision. Future research could involve completely automated image compression through the use of artificial intelligence, implementing the compression with a neural network, and creating a database of pre-segmented images. Code can be found at https://github.com/GaneshPimpale/AI-Compression.

1. Introduction and Problem Statement

There is a plethora of lossy and lossless compression algorithms that help in storing and communicating massive numbers of images. Different types of compression algorithms take advantage of different properties of an image to reduce file size while maintaining similarity to the original image. Some of the most common are JPEG [2], JPEG2000 [3], and WEBP [4]. Most lossy compression techniques attempt to represent more image content with less data. In the JPEG compression algorithm, the image is converted to YCbCr, the chroma channels are downsampled, the DCT (Discrete Cosine Transform) is used to represent a finite array of values as a sum of cosine functions, and the result is quantized to represent the image in as few values as possible. This process is similar to WEBP, where the image is downsampled and brightness is decreased, but instead of using the DCT, WEBP decomposes the image into smaller blocks and then uses a prediction algorithm to represent more of the image with fewer values. These processes have major effects on certain properties of the image when a high compression ratio is used, namely blurring, discoloration, and a decrease in resolution.

Figure 1: Comparison of the effects of the WEBP and JPEG compression algorithms on images
Figure 2: The performance of WEBP compared to JPEG and JPEG2000 [4]

From Figures 1 and 2, it is clear that WEBP performs significantly better than JPEG. In the examples in Figure 1, the effects of JPEG on the image include miscoloring, noise, and over-quantization, with the file size reduced to around 140 kilobytes. In comparison, the WEBP image only suffered some discoloration and a change in noise, with a file size below JPEG's at around 130 kilobytes. The graph in Figure 2 shows that WEBP is more efficient than JPEG and JPEG2000, especially when working with smaller images.

Prof. Tsachy Weissman's lab conducted research testing the abilities of human lossy compression [1]. The research group showed that saving only the important parts of an image is a viable way of storing it. The team used two humans: one would encode the image by finding image elements, similar to parts of the original image, that represent its important content. The encoder would then communicate the image to the decoder over an online chat, and the decoder would redraw the image. The original image and the composite image were then compared on file size, quality, and semantic similarity. However, this method of testing had notable flaws, and other variables should have been considered. The first flaw is that the directions, being natural language, can be executed differently by different people. Another issue is that the language used to describe the image is far from concise, leading to larger compressed sizes than a precise editing language would produce.

None of the current standard algorithms consider the semantics of an image: they apply the same compression to all parts of the image, whether a region is the focus of the image or its background, so the entire image loses quality. Existing research on semantic image compression has not yet produced a practical technique. This project aims to automate semantic image compression by creating a deterministic algorithm that reconstructs images in a repeatable manner.

2. Overview of Methods and Image Reconstruction

The overall goal was to capture every editing command executed by the describer so that the instructions would be perfectly precise and reproducible. Thus, a custom script-based editor was created. To create a composite image, the describer writes a Python script in which they import elements, perform per-element manipulations such as resize, rotate, and blur, and place each element at the correct location and order on the canvas. The describer can run the script at any time to see the composite so far and check their work. When the describer is satisfied with the composite image, they run the script through a parser, which compiles it into a custom simplified command language to reduce file size and writes the output to a text file. This compressed text file of compiled commands constitutes the compressed image. To decompress and display the image, the text file is decompressed and run through a parser to turn it back into a Python script; the script is executed, and the composite image is recreated and saved.
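
A describer script of this kind might look like the minimal Pillow sketch below; the element file names, positions, and sizes are illustrative and not taken from our test images.

# Minimal describer script: compose pre-isolated elements onto a canvas with Pillow.
from PIL import Image, ImageFilter

canvas = Image.new("RGBA", (1024, 768), (135, 206, 235, 255))   # plain sky-colored background

grass = Image.open("elements/grass.png").convert("RGBA")        # placeholder element paths
tree = Image.open("elements/tree.png").convert("RGBA")

grass = grass.resize((1024, 200))
tree = tree.resize((300, 400)).rotate(5, expand=True)
tree = tree.filter(ImageFilter.GaussianBlur(1))                 # slight blur to match the scene

canvas.paste(grass, (0, 568), grass)   # third argument keeps the element's transparency mask
canvas.paste(tree, (600, 220), tree)
canvas.save("composite.png")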

In the previous work on HAAC, the describer would compile links to elements and generate text instructions on how to place the elements to create a composite image. However, this scheme had issues of repeatability and accuracy. Because natural language is poorly suited to precise descriptions of how to edit images, the describer had to receive a video feed of the reconstructor's work in order to provide instructions on refining the composite image. Thus, image compression could not be carried out effectively with this scheme unless the describer and reconstructor were in live communication with each other.

Figure 3: ​Flow chart of the overall method of compression

The compression scheme not only eliminates the need for live communication between describer and reconstructor, it eliminates the need for a reconstructor entirely. Rather than using natural language, the describer constructs the composite image directly via a Python script. The commands are recorded and can be executed to identically recreate the composite image without any further communication with the describer. Thus, the scheme takes advantage of human qualitative understanding to find, edit, and place relevant elements, and of a computer's quantitative precision to capture exactly how the elements were edited and where they were placed.

To create a composite image, the elements used must be on transparent backgrounds. During testing, however, few elements came pre-isolated; most had to be isolated from their original backgrounds using GIMP's freeform crop tool. The experimental design assumed that in a completed implementation of this compression scheme there would be a repository of image elements with transparent backgrounds, as discussed further in the section on further research. Thus, the same images as the previous research group were used, and the elements were classified and isolated from their backgrounds as a pre-processing step, so the describer worked with transparent-background elements when writing the script to generate the composite image.

Additionally, while in last year's research elements were referred to using links, this year they were referred to using file paths. This simplified the testing scheme, and it does not skew the data, because a URL shortener could be used to generate links shorter than the average file path used during testing; for example, the suffix of a Bitly link is only seven characters long, while the average file path in testing was longer than seven characters [5].

Figure 4: ​Flow chart of the script parsing algorithm

To create a consistent scripting method for encoding and decoding, as well as to compress further, a parser was designed to compile scripts into a constructed, syntax-specific language. This involved creating an overall method for parsing the code that produces the composite image, one that provides full functionality while reducing the script as much as possible. To account for overloaded functions and multiple images, the parser takes in the entire element script and outputs a single string in which each element image has an id, each of its functions is given an identification value, and the parameters are converted to hexadecimal, with no strict limit on the number of identifications or the number of parameters. This string is stored as the representation of the composite image and can be executed to decode and reconstruct it. A flow chart of this process can be seen in Figure 4.
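
To make the compilation step concrete, the sketch below turns a list of editing operations into a single compact string with per-function identifiers and fixed-width hexadecimal parameters; the opcode table and encoding shown are hypothetical stand-ins for our actual command language.

# Sketch of the command compiler: editing operations -> one compact string.
# Opcode assignments and the fixed-width hex encoding are illustrative only.
OPCODES = {"open": "0", "resize": "1", "rotate": "2", "blur": "3", "paste": "4"}

def compile_ops(ops):
    """ops: list of (element_id, function_name, numeric_params) tuples."""
    parts = []
    for element_id, func, params in ops:
        hex_params = "".join(format(int(p), "04x") for p in params)  # 4 hex digits per parameter
        parts.append(f"{element_id:x}{OPCODES[func]}{hex_params}")
    return ";".join(parts)

script = [
    (0, "open", []),             # element 0 (its file path or link is stored separately)
    (0, "resize", [300, 400]),
    (0, "rotate", [5]),
    (0, "paste", [600, 220]),
]
print(compile_ops(script))       # prints the compact command string for this element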

We considered building on existing image editing software. The ideal software would capture user actions in the GUI and save them as a script, which would make the describer's workflow much easier than manually writing the script. The options were narrowed to GIMP, Adobe Photoshop, and ImageMagick, because they all include both GUIs and scripting utilities. ImageMagick was quickly eliminated, however, because it is built for per-image editing, not placing and editing multiple images on the same canvas [6]. And while GIMP and Photoshop can store UI commands as macros [7] and have editing histories [8], macros lacked the necessary flexibility and editing histories lacked the necessary precision. Existing plugins to capture image editing commands were either incomplete or generated so much data that the edit log would be larger than the original image. Thus, a research-specific editing tool was designed, because Python's Pillow [9] image editing library includes all the necessary image editing functionality while affording more flexibility, including the ability to add a GUI on top of the original program in the future.

3. Experimental Results

Table 1: ​This table shows the type of compression algorithms used and the file sizes they produced.

Table 1 compares the file sizes produced by each compression algorithm for each of the images. The Original column contains the original image sizes. The WEBP column contains images compressed with WEBP at quality zero, and the JPEG column contains images compressed with JPEG at quality zero. The HAAC column contains the file sizes that last year's research group achieved. HAAC + Algorithm is the file size of the plain Python script used to build the image, and HAAC + Algorithm + Parser is the size of the Python script after being compressed further. The accompanying bar graph is a visual aid to Table 1; to make it easier to read, the vertical axis is on a logarithmic scale.

Table 2: ​Deterministic difference between a compressed image and the original image

Table 2 shows the deterministic differences between each compressed image and the original image. The WEBP images are compressed at quality zero and the JPEG images at quality one. To compare the images, both the original image and the compressed image were resized to 1000 by 1000 pixels. On the resized images, an algorithm compares each pair of pixel values and counts them as similar if they are within a certain bound of each other. The accompanying bar graph is a visual representation of Table 2.
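
The comparison in Table 2 can be reproduced with a few lines of Pillow and NumPy; the 1000-by-1000 resize matches the procedure described above, while the per-channel tolerance value is an assumption.

# Sketch: percentage of pixels whose RGB values fall within a tolerance of the original.
import numpy as np
from PIL import Image

def percent_similar(original_path, compressed_path, tolerance=25):   # tolerance is assumed
    size = (1000, 1000)
    a = np.asarray(Image.open(original_path).convert("RGB").resize(size), dtype=np.int16)
    b = np.asarray(Image.open(compressed_path).convert("RGB").resize(size), dtype=np.int16)
    close = np.all(np.abs(a - b) <= tolerance, axis=-1)   # True where all three channels match
    return 100.0 * close.mean()

print(percent_similar("original.png", "composite.png"))   # placeholder filenames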

4. Conclusions and Further Research

Through our research, we were able to create a compression scheme capable of compressing an image by approximately 99.9884 percent: the average compressed size across our test images was 318 bytes, down from roughly 2 million bytes originally. However, there is the drawback that our images were deterministically the least similar to the originals, differing by around 25.22 percent on average.

In the previous research on HAAC, elements were referenced by linking to images on the internet. However, links are unreliable: the compressed file may point to images that are no longer available or whose URL has changed. Additionally, there is the issue of isolating elements from a larger image; freeform cropping elements from their source images was a time-intensive part of the compression process. Both of these issues would theoretically be solved with an element database designed for the purpose of semantic compression [10]. Such a database would provide links to elements with the assurance that they will be available later, and its elements would already be isolated from their backgrounds, avoiding the laborious freeform cropping step when creating composites.

Further research could be conducted on the feasibility of the aforementioned database. A cost-efficiency analysis should be conducted to measure how large the database must be to consistently contain elements similar to those of an original image while minimizing overall size. If the database is not exceedingly large, clients could also store a local copy, removing the transmission cost of downloading the elements required to recreate a composite image.

Building even further on the idea of complete automation, an AI could both segment images and form the instructions for recreating them. Segmentation software is already common and well researched. On the other hand, automatically generating editing directions is a field with little existing research, and it is essential to making HAAC fully automated.

5. Acknowledgements

Thank you to all those who supported this project. Thank you to the mentors: Micheal Herrera, Shubham Chandak, Kedar Tatwawadi, and Tsachy Weissman. Thank you to the past researchers: Ashutosh Bhown, Soham Mukherjee, Sean Yang, Irena Fischer-Hwang, Shubham Chandak, Kedar Tatwawadi, Judith Fan, and Tsachy Weissman.

6. References

[1] Weissman, T., Tatwawadi, K., Fischer-Hwang, I., Chandak, S., Yang, S., Mukherjee, S., & Bhown, A.(2019, June). ​Towards improved lossy image compression: Human image reconstruction with public-domain images​. Retrieved from Stanford Compression Forum website: https://compression.stanford.edu/human-compression

[2] Gregory K. Wallace, "The JPEG still picture compression standard," Communications of the ACM, vol. 34, no. 4, pp. 30–44, 1991.

[3] David Taubman and Michael Marcellin, JPEG2000 Image Compression Fundamentals, Standards and Practice, Springer Publishing Company, Incorporated, 2013.

[4] “WebP,” https://developers.google.com/speed/webp/, Accessed: 2018-10-16.

[5] [Webpage compressor]. (n.d.). Retrieved from bitly website: https://bitly.com/

[6] ImageMagick [Computer software]. (n.d.). Retrieved from https://imagemagick.org/index.php. Kiel, S. (n.d.). Automate Editing. Retrieved from gimp.org website: https://www.gimp.org/tutorials/Automate_Editing_in_GIMP/

[7] Script recorder (a.k.a. macro recorder) [Online forum post]. (2001). Retrieved from gitlab.gnome.org website: https://gitlab.gnome.org/GNOME/gimp/issues/8

[8] Creating actions. (n.d.). Retrieved from helpx.adobe.com website: https://helpx.adobe.com/photoshop/using/creating-actions.html

[9] Clark, A. (n.d.). Pillow. Retrieved from pillow.readthedocs.i website: https://pillow.readthedocs.io/en/stable/

[10] Yu, F. (2017). ResNet-152 in Keras [Computer software]. Retrieved from https://gist.github.com/flyyufelix/7e2eafb149f72f4d38dd661882c554a6

[11] pythonprogramming.net. (n.d.). Retrieved from https://pythonprogramming.net/programming-k-nearest-neighbors-machine-learning-tutorial/

[12] PyPE-python-photo-editor [Computer software]. (2016). Retrieved from https://github.com/corentinm68/PyPE-python-photo-editor

7. Appendix