Implicit Communication in Collaborative Object Transport

Journal for High Schoolers

Authors:

Munir Bshara and My Pohl

Abstract:

When two people carry an object together, they manage to change direction and avoid obstacles with little or no verbal communication: nudges, forces, and eye contact are often sufficient. In this paper, we study this nonverbal language with the aim of understanding it better. This type of communication is also useful for artificially intelligent robots. The results of our study and of further experiments may contribute to better communication between AI robots and humans as well as between robots. The experimental setup consisted of two people carrying an object to a goal while data was collected from force sensing resistors attached to the object and from a camera. The results suggest that there are different styles of communicating. To apply this kind of communication to robots, we believe it will be necessary to identify these different styles rather than collecting experimental data and computing an average.

1. Methods

Materials. A grid made out of masking tape was used to outline the experimental area; a table served as the object the participants carried; four force sensing resistors (FSRs) measured the forces applied to the object; and a web camera (Logitech Pro Stream Webcam C922, 1080p HD) recorded videos for later analysis in OpenPose. To measure the forces, the four FSRs were each connected to a breadboard and then, through a 3.3 kΩ resistor and three breadboard cables, to an Arduino UNO (REV3), which was connected to a PC through USB.

Figure 1. A schematic overview of the electrical circuit.
Figure 2. A photograph of the actual set up.

The pictures above show how the ZJchao force sensitive resistor was attached to the UNO R3 board. The FSR was connected to the breadboard through a female-to-male Dupont wire. The red and blue breadboard jumper wires provide 5 V power and ground, while the yellow breadboard jumper wire carries the signal that is read out by the computer. The resistance of the resistor was 3.3 kΩ. The Arduino IDE and a short sketch were used to convert the analog output to digital values and record the readings of the force sensors.

Design. The experiment took place in a room with a grid on the floor and pieces of fabric in four different colors placed in some of the squares of the grid. The grid had 32 squares (8 × 4), each measuring 50 cm × 50 cm. A camera in the ceiling recorded the procedure. Afterwards, the videos were analyzed in OpenPose, which tracked the participants' motions and trajectories. The starting position of the table, with the four force sensing resistors attached, was at the short side of the grid.

On each short side of the table there was a handle with two force sensing resistors. The sensors were attached directly onto the table, and the handles rested on top of the sensors. Each handle had a piece of eraser, with the same area as a sensor, attached to it. The handles were loosely fastened to the table with silver tape so that the eraser touched the sensor, and no pressure was put on the sensors while at rest.

Figure 3 and 4. Figure 3 is a photograph which shows what the construction of the handles looked like. Figure 4 illustrates how the handles were used.
Figure 5. A schematic showing a connection between an FSR and computer using Arduino UNO.
Figure 6. A graph showing the relationship between resistance and force.

The schematic above shows the connection between the FSR and the computer using the Arduino UNO, which allows us to convert the analog signal from the FSR to a digital value that can be printed. The relationship between resistance and force is roughly linear from 50 g upward, but note how it behaves below 50 g, and even more so below 20 g. These sensors have a turn-on threshold: a force that must be present before the resistance drops below 10 kΩ, where the relationship becomes more linear. To counteract this irregularity at the beginning of the chart, we altered the code. This alteration changes the parabolic reading of the pressure on the force sensor into a linear one by dividing the force by a constant that was given to us.

Procedure. There were two participants at a time. Initially, a general explanation of the experiment was read out loud to them. They were told that they were supposed to carry a table together through a room and that they were not allowed to turn around or speak with the other person. The experiment would be repeated three times with some small changes. Each experiment was over when the table was put down on the floor. Further information was given individually.

Figure 7. A schematic overview of the experimental area with the three goals plotted. The numbers stand for the number of the experiment.
Figure 8. Photograph of the actual experimental area taken with the web camera from the same view as the video recordings.

These were the instructions that were given to the participating pairs on a paper.

Experiment 1, person 1:

“On the floor there will be a grid with some fabric pieces of different colors in it. The grid is the area where the experiment will take place. Remember, you are not allowed to speak with the other person or turn around.”

Experiment 1, person 2:

“On the floor there will be a grid with some fabric pieces of different colors in it. The grid is the area where the experiment will take place. Your task is to place the two legs of the table on your side into two squares with blue colored fabric pieces. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”

Experiment 2, person 1:

“Your task is to place the two legs of the table on your side into two squares with green colored fabric pieces. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”

Experiment 2, person 2:

“Your task is to place one of the two legs of the table on your side into a square with a yellow colored fabric piece in it. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”

Experiment 3, person 1:

“Your task is to place one of the two legs of the table on your side into a square with a yellow colored fabric piece in it, and the other into a square with a blue colored fabric piece. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”

Experiment 3, person 2:

“Your task is to place one of the two legs of the table on your side into a square with a white colored fabric piece in it, and the other into a square with a blue colored fabric piece. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”

All the experiments were recorded using the Logitech camera in the ceiling. After the experiments, these videos were analyzed in OpenPose, which gave us the position data, movements, and trajectories. During the experiments, the force sensing resistors were read every 0.5 seconds. We also timed every experiment with a stopwatch.

To record the force data, each of the four force sensors was connected to its own breadboard and its own Arduino UNO board. Through a USB connection to a computer, the Arduino IDE and a short sketch printed out the force readings, and a script together with vi was then used to isolate the forces and timestamps into a plain text file.
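The logging on the computer side can be reproduced with a few lines of Python. The sketch below is illustrative only (not the script we used); the serial port name, baud rate, trial length, and output format are assumptions.

    # Log the Arduino's serial output with the pyserial package.
    import serial, time

    ser = serial.Serial("/dev/ttyACM0", 9600, timeout=1)   # assumed port and baud rate
    start = time.time()
    with open("forces.txt", "w") as log:
        while time.time() - start < 120:                   # stop after one two-minute trial
            line = ser.readline().decode(errors="ignore").strip()
            if line:                                        # one force reading per line
                log.write(f"{time.time() - start:.1f}\t{line}\n")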


To have something to compare the experimental results against, we also recorded a baseline. The baseline consisted of exactly the same experiments, but with both participants knowing where the table should be placed.

Participants. Participants were recruited from different parts of Stanford University (the Artificial Intelligence Laboratory, Wallenberg Hall, and interns from the internship program "STEM to SHTEM"). All participants were between the ages of 15 and 52, with an average age of 21 years. Twelve people participated in the experiment (5 female and 7 male), all of them voluntarily.

2. Results

Figure 9. The tracked position of player 2.
Figure 10. The tracked position of player 1.

All trials were recorded with the webcam attached to the ceiling. Machine learning techniques were then used to extract various body points: OpenPose and TensorFlow allowed us to track positions relative to the camera. A combination of Python scripts and bash tools was used to extract and format the data.
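As an illustration, the per-frame keypoint files that OpenPose writes can be collected into trajectories with a short Python script. This is a sketch under assumptions: the exact JSON field names (such as "pose_keypoints_2d") and the keypoint index depend on the OpenPose version and body model.

    # Collect one (x, y) position per person per frame from OpenPose JSON output.
    import glob, json

    def load_trajectories(json_dir):
        frames = []
        for path in sorted(glob.glob(json_dir + "/*_keypoints.json")):
            with open(path) as f:
                data = json.load(f)
            people = []
            for person in data.get("people", []):
                kp = person["pose_keypoints_2d"]    # flat list: x0, y0, c0, x1, y1, c1, ...
                people.append((kp[3], kp[4]))       # keypoint 1 (neck) in the BODY_25 layout
            frames.append(people)
        return frames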

It seems that different pairs of people react differently to each of the experiments based on their previous interactions. For example, a father and his daughter moved the table calmly, while a pair of friends made more forceful movements. Another observation was that people pulled more than they pushed.

Figure 11. Four graphs illustrating the forces that were put onto the object from the two participants in experiment 1.
Figure 12. Four graphs illustrating the forces that were put onto the object from the two participants in experiment 2.
Figure 13. Four graphs illustrating the forces that were put onto the object from the two participants in experiment 3.

In the graphs above, we have calculated the difference between the measurements from the pushing and pulling sensors on each handle to obtain the resulting force, over the full duration of each experiment. In the bottom right corner of each figure is the baseline, which can be used for comparison.
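Computing the resulting force from the logged readings is a simple elementwise subtraction; the following snippet is a sketch of that step, with hypothetical file names.

    # Net force on one handle: the push sensor's series subtracted from the pull sensor's.
    import numpy as np

    pull = np.loadtxt("handle1_pull.txt")    # readings taken every 0.5 s
    push = np.loadtxt("handle1_push.txt")
    net = pull - push                        # positive values mean a net pull
    t = 0.5 * np.arange(len(net))            # time axis in seconds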

Not surprisingly, the baseline pairs solved the tasks faster than the others in all of the experiments. The baseline pairs also used more precise nudges rather than a lot of back-and-forth communication. However, the participants who did not know exactly where the goal was also managed to complete the tasks well.

There is a pattern in the way the different groups solved the tasks. For example, group 3's three graphs look very much alike, and the same holds for groups 1 and 2. More measurements are needed before reliable conclusions can be drawn, but there seem to be a few different styles of solving these kinds of tasks. It may be better to try to identify these different styles rather than collecting data and calculating an average, because a combination of working methods does not necessarily result in a new working one.

3. Conclusions

Humans have an advanced and efficient way of communicating with each other, and the rapidly growing research area of artificial intelligence would benefit from learning this kind of communication. In this paper we focused on nonverbal communication via nudges when carrying an object. We developed a methodology to measure the push and pull forces applied to an object that is being carried. The results from the experiments showed that the tasks could be solved, but that they were solved more efficiently when both participants knew the goal. Another conclusion is that there seem to be different methods of solving these kinds of tasks and that they may depend on the relationship between the two participants. These different methods should be identified.

Limitations and future work. When we designed this experiment, we wanted to eliminate all ways of communicating except nudges, so that we could be sure the measurements were reliable and correct. One limitation is that we did not exclude the possibility of communicating via eye contact; future work should take eye contact into consideration. An easy way to solve this problem would be to have the participants wear capes or hats. The measurements we obtained from the force sensing resistors showed how the two participants pushed and pulled. Another dimension that should be added is turning, which would give a more complete picture of how information is exchanged via nudges. Turning can easily be measured with two handles on each side of the table, placed at the four corners (this requires four more force sensing resistors). With four handles, the measurements would show when someone pushes harder on one handle or pulls harder on one side of the table, which would make it possible to identify turns. The ultimate goal of this project is to transfer our measurements onto AI robots so that they can communicate more efficiently and in a more human-like way.

4. Acknowledgements

We would like to thank Professor Weissman for allowing us to participate in the STEM to SHTEM program. Also many thanks to Mengxi Li and Minae Kwon for mentoring us in this project and finally a big thank you to all the participants who helped us gather data for this paper.

5. References

Force Sensitive Resistor Hookup Guide, learn.sparkfun.com/tutorials/force-sensitive-resistor-hookup-guide/all.

Rio, K. W., & Warren, W. H. (2016). Interpersonal coordination in biological systems: The emergence of collective locomotion.

Nalepka, P., Kallen, R. W., Chemero, A., Saltzman, E., & Richardson, M. J. (2017). Herd Those Sheep: Emergent Multiagent Coordination and Behavioral-Mode Switching.

Human-Based Image Compression; Using a Deterministic Computer Algorithm to Reconstruct Pre-Segmented Images

Journal for High Schoolers

Authors:

Vivaan Mahtab, Ganesh Pimpale, Juan Aldama, and Paul Truong

Abstract:

In 2018, Prof. Tsachy Weissman's lab conducted research on using human semantics to compress an image. A ‘describer’ would provide links to image assets representing elements of the original image [1]; for example, given a picture of the savanna, the describer would provide images of grass and trees, among others. Using the compiled components, the describer would then use natural language to describe how a ‘reconstructor’ could combine the elements to create an image semantically similar to the original. The instructions and links constituted the compressed image. They called this approach Humans as Awesome Compressors (HAAC). While this compression scheme significantly reduced the size of an image, the natural language instructions were imprecise and could be shortened further. To solve this, we built scripting-based image editing software that tracks the actions of the describer and can execute them on the reconstructor side, thus improving precision. Future research could involve completely automated image compression through the use of artificial intelligence, the implementation of compression using a neural network, and the creation of a database of pre-segmented images. Code can be found at https://github.com/GaneshPimpale/AI-Compression.

1. Introduction and Problem Statement

There is a plethora of lossy and lossless compression algorithms that help in storing and communicating large numbers of images. Different types of compression algorithms take advantage of different properties of an image to reduce file size while maintaining similarity to the original image. Some of the most common compression algorithms are JPEG [2], JPEG2000 [3], and WEBP [4]. Most lossy compression techniques attempt to represent the image with less data. In the JPEG compression algorithm, the image is converted to YCbCr and downsampled, the DCT (Discrete Cosine Transform) is used to represent a finite array of values as a sum of cosine functions, and the result is quantized to represent the image in as few colors as possible. This process is similar to WEBP, where the image is also downsampled and its brightness reduced, but instead of using the DCT, WEBP decomposes the image into smaller blocks and then uses a prediction algorithm to represent more of the image with fewer values. These processes have major effects on certain properties of the image when a high compression ratio is used, namely blurring, discoloration, and a decrease in resolution.

Figure 1: Comparison of the effects of the WEBP and JPEG compression algorithms on images
Figure 2: The performance of WEBP compared to JPEG and JPEG2000 [4]

From Figures 1 and 2, it is clear that WEBP performs significantly better than JPEG. In the examples in Figure 1, the effects of JPEG on the image consist of miscoloring, noise, and over-quantization, with the file size reduced to around 140 kilobytes. By comparison, the WEBP image only suffered some discoloration and a change in noise, with a file size below JPEG's at around 130 kilobytes. The graph in Figure 2 clearly shows that WEBP is more efficient than JPEG and JPEG2000, especially when working with smaller images.

Prof. Tsachy Weissman's group conducted research testing the abilities of human lossy compression [1]. The research group showed that saving only the important parts of an image is a viable way of storing it. The team used two humans: one would encode the image by finding image elements, similar to those in the original image, that represented its important parts. The encoder would then communicate the image to the decoder using an online chat, and the decoder would redraw the image. The original image and the composite image were then compared based on their file size, quality, and semantic similarity. However, this method of testing had several flaws, and there were other variables that should have been considered. The first flaw is that the directions, being largely qualitative, can be carried out differently by different people. Another issue is that the natural language used to describe the image is far from concise, leading to larger file sizes as a result of an imperfect editing language.

None of the current standard algorithms consider the semantics of an image: they apply the same compression to all parts of the image, whether a region is the focus or the background, so the entire image loses quality. Existing research is not yet a practical way to harness the idea of semantic image compression. This project aims to automate semantic image compression by creating a deterministic algorithm that reconstructs images in a repeatable manner.

2. Overview of Methods and Image Reconstruction

The overall goal was to capture every editing command executed by the describer so that the instructions would be perfectly precise and reproducible. Thus, a custom script-based editor was created. To create a composite image, the describer writes a Python script that imports elements, performs per-element manipulations such as resizing, rotating, and blurring, and places each element at the correct location and in the correct order on the canvas. The describer can run the script at any time to see the composite so far and check the work. When the describer is satisfied with the composite image, the script is run through a parser which compiles it into a custom simplified command language to reduce file size and writes the output to a text file. This compressed text file of compiled commands constitutes the compressed image. To decompress and display the image, the text file is decompressed and run through a parser to turn it back into a Python script; the script is executed, and the composite image is recreated and saved.
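For illustration, a describer script in this style could be as simple as the following Pillow sketch. The element file names, sizes, and placements are hypothetical, and this is not the editor's exact interface.

    # A hypothetical describer script: import elements, edit them, place them on a canvas.
    from PIL import Image

    canvas = Image.new("RGBA", (1024, 768), "white")           # blank canvas

    sky = Image.open("elements/sky.png").convert("RGBA")
    sky = sky.resize((1024, 400))
    canvas.paste(sky, (0, 0), sky)                             # background first

    tree = Image.open("elements/tree.png").convert("RGBA")
    tree = tree.resize((250, 350)).rotate(5, expand=True)      # per-element edits
    canvas.paste(tree, (600, 300), tree)                       # then foreground elements

    canvas.save("composite.png")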

In the previous work on HAAC, the describer would compile links to elements and generate text instructions on how to place the elements to create a composite image. However, this scheme had issues of repeatability and accuracy. Because natural language is poorly suited to precise descriptions of how to edit images, it was necessary for the describer to receive a video feed of the reconstructor’s work so that the describer could provide instructions on how to refine the composite image. Thus, image compression could not be effectively carried out with this scheme unless the describer and reconstructor were both in live communication with each other.

Figure 3: ​Flow chart of the overall method of compression

The compression scheme not only eliminates the need for live communication between describer and reconstructor, it eliminates the need for a reconstructor entirely. Rather than using natural language, the describer constructs the composite image directly via a Python script. The commands are recorded and can be executed to recreate the composite image identically without any further communication with the describer. Thus, the scheme combines a human's qualitative understanding, used to find relevant elements, edit them, and place them, with a computer's quantitative precision, which captures exactly how the elements were edited and where they were placed.

To create a composite image, the elements used must be on transparent backgrounds. During testing, however, few elements came pre-isolated; most had to be isolated from their original backgrounds using GIMP's freeform crop tool. The experimental design was based on the assumption that, in a completed implementation of this compression scheme, there would be a repository of image elements with transparent backgrounds, as discussed further in "Future Goals". Thus, the same images as the previous research group were used, and the elements were classified and isolated from their backgrounds as a pre-processing step, so that the describer worked with transparent-background elements when writing the script to generate the composite image.

Additionally, while elements were referred to by links in last year's research, this year they were referred to by file paths. This simplified the testing scheme, but it does not skew the data, because a URL shortener could be used to generate links shorter than the average file path used during testing; for example, the suffix of a Bitly link is only seven characters long, while the average file path in testing was longer than seven characters [5].

Figure 4: ​Flow chart of the script parsing algorithm

To create a consistent method of scripting for encoding and decoding, and to compress the scripts further, a parser was designed to translate scripts into a constructed, syntax-specific language. This required an overall method for parsing the code that produced the composite image, one that preserved full functionality while shrinking the script as much as possible. To account for overloaded functions and multiple images, the parser takes in the entire script of elements and outputs a single string in which each element image has an ID, each of its functions is given an identification value, and the parameters are converted to hexadecimal, with no strict limit on the number of identifications or parameters. This string is stored as the compressed representation of the composite image and can be executed to decode it and reconstruct the image. A flow chart of this process can be seen in Figure 4.
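The encoding side of such a parser can be sketched in a few lines. The command table, separators, and output format below are illustrative assumptions rather than the actual command language.

    # Encode one element's edits as "<element id>;<function id>:<hex params>;..."
    FUNC_IDS = {"resize": "0", "rotate": "1", "paste": "2"}

    def encode(element_id, calls):
        """calls is a list of (function_name, parameters) tuples."""
        parts = [element_id]
        for name, params in calls:
            hex_params = ",".join(format(int(p), "x") for p in params)
            parts.append(FUNC_IDS[name] + ":" + hex_params)
        return ";".join(parts)

    # encode("e1", [("resize", (250, 350)), ("paste", (600, 300))])
    # -> "e1;0:fa,15e;2:258,12c"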

Building on existing image editing software was also considered. The ideal software would capture user actions in the GUI and save them as a script, which would make the describer's workflow much easier than writing the script by hand. The options were narrowed to GIMP, Adobe Photoshop, and ImageMagick, because they all include both GUIs and scripting utilities. ImageMagick was quickly eliminated because it is built for per-image editing, not for placing and editing multiple images on the same canvas [6]. And while GIMP and Photoshop can store UI commands as macros [7] and keep editing histories [8], the macros lacked the necessary flexibility and the editing histories lacked the necessary precision. Existing plugins for capturing image editing commands were either incomplete or generated so much data that the edit log would be larger than the original image. Thus, research-specific editing software was designed around Python's Pillow [9] image editing library, which provides all the necessary image editing functionality while affording more flexibility, including the ability to add a GUI on top of the program in the future.

3. Experimental Results

Table 1: ​This table shows the type of compression algorithms used and the file sizes they produced.

Table 1 shows the compression algorithms used and the resulting file sizes for each image. The Original column gives the size of the original image. The WEBP column contains images compressed with WEBP at quality zero, and the JPEG column contains images compressed with JPEG at quality zero. The HAAC column lists the file sizes achieved by last year's research group. HAAC + Algorithm is the file size of the plain Python script used to build the image, and HAAC + Algorithm + Parser is the size of that script after further compression. The bar graph provided is a visual aid to Table 1; to make it easier to read, the vertical axis is on a logarithmic scale.

Table 2: ​Deterministic difference between a compressed image and the original image

Table 2 shows the deterministic differences between each compressed image and the original image. The WEBP images are compressed at quality zero and the JPEG images at quality one. To compare the images, both the original image and the compressed image were resized to 1000 by 1000 pixels. On the resized images, an algorithm compares each pair of pixel values and counts them as matching if they are within a certain tolerance of each other. The bar graph above is a visual representation of Table 2.
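The comparison step can be sketched as follows; the tolerance value is an assumption and the scoring here is a simplified stand-in for the exact algorithm used.

    # Percentage of per-channel pixel values that differ by more than a tolerance
    # after both images are resized to 1000 x 1000.
    import numpy as np
    from PIL import Image

    def percent_different(path_a, path_b, tol=10):
        a = np.asarray(Image.open(path_a).convert("RGB").resize((1000, 1000)), dtype=int)
        b = np.asarray(Image.open(path_b).convert("RGB").resize((1000, 1000)), dtype=int)
        matches = np.abs(a - b) <= tol
        return 100.0 * (1.0 - matches.mean())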

4. Conclusions and Further Research

Through our research, we were able to create a compression scheme capable of compressing an image by approximately 99.9884 percent: the average size of the images we compressed came out to roughly 318 bytes, down from about 2 million bytes originally. The drawback is that our images were deterministically the least similar to the original image, differing by around 25.22 percent on average.

In the previous research on HAAC, elements were referenced by linking to images on the internet. However, links are unreliable: the compressed file may point to images that are no longer available or whose URL has changed. Additionally, there is the issue of isolating elements from a larger image; freeform cropping elements from their source images was a time-intensive part of the compression process. Both of these issues would be solved by an element database designed for the purpose of semantic compression [10]. Such a database would make it possible to link to elements with the assurance that they will still be available later, and its elements would already be isolated from their backgrounds, avoiding the laborious freeform cropping step when creating composites.

Further research could be conducted on the feasibility of the aforementioned database. A cost-efficiency analysis should be conducted to measure how large the database must be in order to provide enough elements to contain consistently similar elements to the original image while minimizing overall size. If the database is not exceedingly large, clients could also store a local copy, thus removing the transmission cost of downloading all elements required to recreate a composite image.

A further step toward complete automation would be to have an AI both segment the image and generate the instructions for recreating it. Image segmentation software is common and well researched; automatically generating reconstruction directions, on the other hand, is a field with little research so far, and it is essential to making HAAC fully automated.

5. Acknowledgements

Thank you to all those who supported this project. Thank you to the mentors: Micheal Herrera, Shubham Chandak, Kedar Tatwawadi, and Tsachy Weissman. Thank you to the past researchers: Ashutosh Bhown, Soham Mukherjee, Sean Yang, Irena Fischer-Hwang, Shubham Chandak, Kedar Tatwawadi, Judith Fan and, Tsachy Weissman.

6. References

[1] Weissman, T., Tatwawadi, K., Fischer-Hwang, I., Chandak, S., Yang, S., Mukherjee, S., & Bhown, A.(2019, June). ​Towards improved lossy image compression: Human image reconstruction with public-domain images​. Retrieved from Stanford Compression Forum website: https://compression.stanford.edu/human-compression

[2] David Taubman and Michael Marcellin, JPEG2000 Image Compression Fundamentals, Standards and Practice, Springer Publishing Company, Incorporated, 2013.

[3] Gregory K Wallace, “The JPEG still picture compression standard,” Communications of the ACM, vol. 34, no. 4, pp. 30–44, 1991.

[4] “WebP,” https://developers.google.com/speed/webp/, Accessed: 2018-10-16.

[5] [Webpage compressor]. (n.d.). Retrieved from bitly website: https://bitly.com/

[6] ImageMagick [Computer software]. (n.d.). Retrieved from https://imagemagick.org/index.php. Kiel, S. (n.d.). Automate Editing. Retrieved from gimp.org website: https://www.gimp.org/tutorials/Automate_Editing_in_GIMP/

[7] Script recorder (a.k.a. macro recorder) [Online forum post]. (2001). Retrieved from gitlab.gnome.org website: https://gitlab.gnome.org/GNOME/gimp/issues/8

[8] Creating actions. (n.d.). Retrieved from helpx.adobe.com website: https://helpx.adobe.com/photoshop/using/creating-actions.html

[9] Clark, A. (n.d.). Pillow. Retrieved from pillow.readthedocs.io website: https://pillow.readthedocs.io/en/stable/

[10] Yu, F. (2017). ResNet-152 in Keras [Computer software]. Retrieved from https://gist.github.com/flyyufelix/7e2eafb149f72f4d38dd661882c554a6

[11] https://pythonprogramming.net/programming-k-nearest-neighbors-machine-learning-tutorial/. (n.d.). Retrieved from pythonprogramming.net website:
https://pythonprogramming.net/programming-k-nearest-neighbors-machine-learning-tutorial/

[12] PyPE-python-photo-editor [Computer software]. (2016). Retrieved from https://github.com/corentinm68/PyPE-python-photo-editor

7. Appendix

Building a Human-Centric Lossy Compressor for Facial Images

Journal for High Schoolers

Authors:

Hannah Chau, Tyler Pauly, Darin Phan, Roshan Prabhakar, Lian Quach, and Christina Vo

Abstract:

Many lossy image compressors have been created, yet they are not programmed with knowledge of which parts of an image are most important to humans. In order to build a lossy compressor that takes human perception into consideration, the components of facial images that are fundamental for making a face distinguishable from others need to be determined. In this paper, two separate methods are described with regard to their importance, how they were designed, and their contribution to the future of facial image compression. The first experiment was built on the Amazon Mechanical Turk platform, where users participated in a human-to-human interactive game. In this game, one participant described a targeted facial image from a dataset to another participant using text instructions. The results provide data on the accuracy of the descriptions and on which facial features were most useful in helping the second participant guess correctly. The second experiment used this data to create a program that produces a compressed facial image while preserving the important aspects of the face. Data were collected from both experiments and subsequently applied to construct a human-centric compression algorithm.

1. Introduction

Recently, compression, the reduction of the space needed to store data, has become a significant issue because of the growing need for more storage on devices. Lossy compression, which includes image compressors such as JPEG and WebP, reduces the amount of data by removing parts that the program deems unnecessary. Although the file becomes significantly smaller, this process often results in blurry and pixelated pictures. Furthermore, these lossy compressors are not human-centric: they do not focus compression on the parts of images that people care least about while keeping the parts that are most important to human perception. We aim to build such a human-centric lossy compressor, one that removes data in a way that affects the perceived picture quality the least. This would still reduce the amount of data the image uses, but it would decrease the distortion by discarding only data that humans would not notice in the image.

2. Previous Works

This study expands on related work, including experiments on human compression and image reconstruction (Bhown et al., 2018). Those experiments involved one person with an image sending text messages with links and descriptions to another person, who recreated the photo using only images from the web and Photoshop. The results showed that when people from Amazon Mechanical Turk judged the human recreation against the lossy compressor WebP, the human recreation scored higher in all but one trial. That one trial was a recreation of an image of a face: no photos of other people's faces or combinations of facial landmarks found on the internet were able to match the original picture, and therefore the person was unrecognizable. Because faces are more complex than most images, we wanted to use our experiments to determine the significant parts of faces that make them unique and how those parts could be preserved during compression.

We incorporated into our own design some aspects of past experiments on graphical communication under constraints, in order to determine which facial landmarks are the most salient (Bergmann, Dale, & Lupyan, 2013). In that study, the authors created an online game in which participants from Amazon Mechanical Turk took part in one of two roles: speaking or listening. In the speaking trial, one person had five seconds to draw, with their mouse, a picture of the facial image given to them. In the listening trial, a different person was shown the animation of the first person sketching this image and had to guess which of two faces the sketch matched.

3. Models and Materials

For the first experiment, we created a human-to-human interactive game using participants from Amazon Mechanical Turk. The game included two roles, describer and guesser, which were randomly assigned to each player. Before starting, users entered demographic information (age, gender, race, region) so that we could observe whether it correlated with their accuracy in differentiating between the faces in the dataset. Participants were then given an overview of the game: the describer would see eight similar faces and receive a specific target image to describe to the guesser, who would then choose, out of the dataset, the face they believed was being described.

The describer was shown one of the eight photos, randomly selected by the computer, along with the whole dataset, and was given either a time constraint or a word limit. With the time limit, the describer was told to type, within 25 seconds, a description of the face in the box below to help the guesser distinguish this particular face from the other seven. The description did not have to be in complete sentences, and the describer could not type after the timer hit zero. With the word limit, the describer typed a description into the box and could not type any more once the limit of 15 words was reached.

The guesser was then shown the description and the whole dataset of eight faces and was asked to pick the face they believed the description was describing. Once they selected a face, they clicked submit within a 15 second time limit. The program then revealed whether the guess was correct. If it was, the participant was asked which facial feature or part of the description had been most helpful in matching the picture to the description, and the process repeated.

The goal of this experiment was to find the rate-distortion curve for humans and determine whether it is more efficient than the JPEG rate-distortion curve. Additionally, the intent was to better understand which distinguishing facial features must be preserved when compressing a facial image.

For our second experiment, we created a human-centric compression program that compresses facial images while preserving facial landmarks for recognizability. In this program, we utilized Haar cascades to detect the regions of key facial features. We then implemented an adapted version of the K-nearest neighbor algorithm to determine which of the regions found by the cascades were legitimate. After determining the location of the key features, the image is scaled down to a smaller resolution and then resized back to its original resolution, producing a fully pixelated image. Finally, the regions defined by the Haar cascades are overlaid onto the pixelated image. The product is a pixelated image with the main facial landmarks clearly defined. If no faces are present in the image, the program returns a fully pixelated image.
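A condensed sketch of this pipeline is shown below using OpenCV's pretrained frontal-face Haar cascade. Using a single face cascade (rather than separate feature cascades plus the K-nearest-neighbor check) and the chosen downscale factor are simplifications for illustration.

    # Pixelate an image and paste the detected face regions back at full resolution.
    import cv2

    def compress_face_image(path, factor=16):
        img = cv2.imread(path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

        h, w = img.shape[:2]
        small = cv2.resize(img, (max(1, w // factor), max(1, h // factor)))
        pixelated = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)

        for (x, y, fw, fh) in faces:                  # overlay the detected regions
            pixelated[y:y + fh, x:x + fw] = img[y:y + fh, x:x + fw]
        return pixelated                               # fully pixelated if no face is found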

To test the validity of this algorithm, the image reconstructions were rated by human scorers based on how recognizable the facial image is and how satisfied they were with the aesthetic appeal of the overall reconstruction.

4. Results

The finalized dataset, composed from the Humanae project on Tumblr, will be used in the human-to-human interactive game.

5. Conclusions

We have yet to come to a solid conclusion, as our project has not received Institutional Review Board approval to complete phase one of the experiment. Also, although data has been collected for the image reconstructions, we have yet to analyze it and alter our algorithm accordingly. So far, our most promising result is the prototyped compression algorithm, which preserves key facial landmarks.

6. Future Directions

We feel that the weakest part of our algorithm is the compression applied after the landmarks are determined. Currently, we pixelate the image to reduce its size. Future work requires researching better lossy compression methods that can be applied to entire images; this would let us achieve a better compression ratio while preserving the aesthetics and recognizability of the image. Another future step would be a better verification process for the results of the Haar cascades. We are currently using K-nearest neighbor to compare each result with a handmade database of many different landmark pictures, which is a time- and memory-expensive process.

7. Acknowledgements

We thank Prof. Tsachy Weissman for guiding our group and helping us to start this whole study. We would also like to thank Sara Abdali for also being a mentor to our group and for the multiple informative discussions. Lastly, we would like to thank Judith E. Fan, Kedar Tatwawadi, and Sophia Kivelson for providing feedback and answering any questions that we needed answered as a means to getting our study started.

8. References

Bergmann, T., Dale, R., & Lupyan, G. (2013). The Impact of Communicative Constraints on the Emergence of a Graphical Communication System. ​Proceedings of the Annual Meeting of the Cognitive Science Society​, 35.

Fan, Judith & Hawkins, Robert & Wu, Mike & Goodman, Noah. (2019). ​Pragmatic inference and visual abstraction enable contextual flexibility during visual communication.

Suchow, Jordan & Peterson, Joshua & L Griffiths, Thomas. (2018). ​Learning a face space for experiments on human identity.

Bhown, et al. “Towards Improved Lossy Image Compression: Human Image Reconstruction with Public-Domain Images.”

Monroe, Will & Hawkins, Robert & D. Goodman, Noah & Potts, Christopher. (2017). ​Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding. Transactions of the Association for Computational Linguistics​.

Hawkins, R.X., & Sano, M. (2019). Disentangling contributions of visual information and interaction history in the formation of graphical conventions.

Human Compression. (n.d.). Retrieved from https://compression.stanford.edu/human-compression

Panos, et al. “ShapeGlot: Learning Language for Shape Differentiation.” ArXiv.org, 8 May 2019, arxiv.org/abs/1905.02925.

Angelicadass. (n.d.). Humanæ – work in progress. Retrieved from https://humanae.tumblr.com/

“HAAR Cascades.” ​Haar Cascades​, alereimondo.no-ip.org/OpenCV/34.

Opencv. “Opencv/Opencv.” HAAR Cascades – Github, 7 Feb. 2018, github.com/opencv/opencv/tree/master/data/haarcascades.

Comparing the compressibility of different genomes and the correlation between genome length and compression ratio

Journal for High Schoolers

Authors:

Tyler Lafayette, Heather Lee, Naomi Boneh, and Joel Rivera

Abstract:

The compression ratios of various species’ genomes reflect how repetitive those genomes are and can lead to a deeper understanding of each genome. In addition, we can make connections between gene repetition and compaction on the one hand and the intended use of the DNA and the conditions it must survive in on the other. We tested our ideas by running the genomes of many different organisms through several compression algorithms. Our main question was how much variation there is in the repetitiveness of genomes; to study this, we used compression, since it gives an approximate measure of repetitiveness.

1. Introduction

A genome is an organism’s set of DNA, which holds all the information needed to construct and maintain it. Sequences of adenine (A), thymine (T), guanine (G), and cytosine (C) make up DNA, and we wanted to see how well individual species’ base pairs could be compressed. With constant advances in technology around the world, DNA sequencing has become cheaper and more accessible. The problem, however, is that the human genome is three billion base pairs long, so each person’s genome requires a large amount of data to store. According to “How Repetitive Are Genomes?”, there are large amounts of repetition among diverse genomes. We approached this topic by seeing how far we could compress different genomes and by using different compressors to determine which was the most effective. After running the genomes through our pipeline, we found that the best compression ratios were given by the entropy measure H(p).

2. Methods

To start, we plotted probabilities of dice rolls. We used the Java programming language to roll a die 1000 times and record which side it landed on; using an ArrayList and the six defined sides of a die, the program output the number of times each side was rolled. Since some of us were in the beginning stages of programming proficiency, we copied the data over to Google Sheets to create a plot. After that, we pivoted to the Julia programming language, because it allowed us to plot the dice rolls in one line of code and because of its high-performance numerical analysis and its ability to handle big data applications. At the same time, we were learning the basics of compression, in particular lossless compression. Data compression encodes information using fewer bits than the original representation. Lossless compression recognizes and eliminates statistical redundancy so that the data size is smaller than the original; it is extremely useful because it not only reduces the resources required to transmit and store data but also allows the data to be decompressed exactly. The genomes of several organisms were downloaded from Ensembl (https://ensembl.org).

We organized our goals into a pipeline. The first step was to download the genomes from the website onto our computers. Once the genomes were ready, we ran them through different compressors (Bzip2, Gzip, and Zstd), as well as an entropy function for predicting compressibility, to see which algorithm could most efficiently compress the genome files. After compression, the new file size was stored so that we could make plots showing how the different compressors performed and how genome length correlated with the compression ratio.

We decided to implement our pipeline in Julia because it is efficient and simple. As per our pipeline, we created a script that watches a directory containing uncompressed genomes in the FASTA (.fa) file format. When it detects a genome file that has not yet been analyzed, it first runs the file through a “sanitization” program: genome FASTA files are typically separated into lines using “\n” and/or “\r”, and the sanitization program optimizes compression by removing these unnecessary line-break characters. After this step, the script iterates through four different compression algorithms, running each on the source file and recording the output file size and compression ratio in an SQL database, which stores a table of results. Finally, another script pulls the data from the database and generates plots using multiple methods of categorization, as shown below.
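The core of this pipeline can be illustrated with a short sketch (our actual pipeline was written in Julia; the Python version below is for illustration only). The file name is hypothetical, and the convention that the entropy-based ratio is 8 bits per stored character divided by H(p) bits per base is our assumption here.

    # Sanitize a FASTA file, then compare compressor ratios with the entropy estimate H(p).
    import bz2, gzip, math
    from collections import Counter

    def sanitize(fasta_path):
        with open(fasta_path) as f:
            return "".join(line.strip() for line in f if not line.startswith(">"))

    def compression_ratio(seq, compress):
        raw = seq.encode()
        return len(raw) / len(compress(raw))

    def entropy_ratio(seq):
        counts = Counter(c for c in seq if c in "ACGT")
        total = sum(counts.values())
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        return 8.0 / h                       # 8 bits per character vs. H(p) bits per base

    seq = sanitize("genome.fa")              # hypothetical file name
    print("gzip :", compression_ratio(seq, gzip.compress))
    print("bzip2:", compression_ratio(seq, bz2.compress))
    print("H(p) :", entropy_ratio(seq))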

3. Results and Discussions

Our results suggest that our initial expectation, that larger genomes would be more compressible than smaller ones, was incorrect. In fact, our data showed that larger genomes tended to be less compressible.

The different colored dot sets in this plot represent the same genome files run through the various compression algorithms. We noticed that, compared to the H(p) entropy estimate, there was a small set of genomes that the other algorithms were able to compress much more efficiently. It is important to note that, while the other algorithms process any text they are given, the entropy calculation only considers A, C, T, and G characters. This might explain the large gap in some of the compression ratios, as some of the larger genomes contained many repeated blocks of “N”, which signifies an unknown part of a genome; this extra repetition could favor the general-purpose algorithms and cause higher efficiency.

This plot displays the compression ratios of the different genomes using the entropy function, grouped by the species’ kingdoms. In general, bacteria have the smallest starting file sizes and can be compressed by a factor of around 4.15, showing that the smallest files are among the most compressible. On the other hand, the Animalia kingdom had the largest starting file sizes and an average compression ratio of around 4.05. The two main outliers belong to the Protozoa kingdom and have a lot of repetition within their genomes. The higher the compression ratio, the higher the compressibility and the smaller the output file.

This plot displays the relation between the original file size (in bits) and the output file size of genomes after processing using compression algorithms. We found a near-linear relationship between input size and output size.

4. Conclusions

Our project has shown that DNA compression is very useful for determining how repetitive a genome is: the more repetitive the genome, the more compressible it is, which reduces the resources needed to store the DNA base pairs. From our plots we can conclude that the entropy measure H(p) gives the highest compression ratio and that a larger starting file size does not necessarily mean a higher compression ratio. In the future, DNA-specific compressors could be developed, which would benefit the genome sequencing industry by allowing genomes to be compressed even further than with general-purpose data compressors.

5. References

Haubold, Bernhard, and Thomas Wiehe. “How Repetitive Are Genomes?” ​BMC Bioinformatics​, BioMed Central, 22 Dec. 2006, www.ncbi.nlm.nih.gov/pmc/articles/PMC1769404/​.

“Data Compression.” ​Wikipedia​, Wikimedia Foundation, 17 July 2019,
en.wikipedia.org/wiki/Data_compression.

Probabilistic Models of Human Decision-Making With Time Pressure and Partial Information Using Game Interfaces, Reward Functions, and Markov Decision Processes

Journal for High Schoolers

Authors:

Karan Bhasin and Aditi Talati

Abstract:

While there is a breadth of research about the human decision-making process in game-theoretic situations, much of that research focuses on situations in which time pressure or lack of information does not limit the ability of humans to process the information they are given. The effect of having too little time to understand the available choices must be understood in order to accurately predict the behavior of humans in rapid decision-making situations, as an autonomous car must when processing the behavior of nearby human drivers. There are many proposed mathematical models of these decision-making processes, in which humans are assumed to follow certain reward functions with various probabilities. However, verifications of these models are inadequate, and it remains unclear which model works best.

In order to test these models, we have created multiple logical games where we collected data on the decisions players made and compared them to mathematical models of such decisions. We found that the prospect theory model was the most accurate for time constrained games, as previous literature predicted. However, since we could not compute reward values for the information constrained game, we could not compare it to these mathematical models. In the future, we would like to develop a way to analyze the expected rewards for this game so that we can compare it to such models.

1. Introduction

1.1 Overview

We have created multiple games which ask the player to make a computationally heavy decision under various constraints. Two of the games, blackjack and the counting game, asked the player to make a single decision under varying time constraints, while the grid game asked the player to travel from one corner of a grid to another. This was a long-horizon game, where each step the player took impacted the possible future steps. This was also done under various time constraints. The hover game was an information-constrained game, where knowing more information about the game state would also cost points to the player.

The data we collected from each game was compared to various function transformations for likelihood functions, using a Metropolis-Hastings algorithm to calculate the parameter set that maximizes the likelihood for these functions. These values were then re-entered into the function with a validation data set to find the likelihood of the data and relative accuracy of the model. We found that, in our games, the prospect theory model produced the most accurate results, which was expected, because it is the most flexible model of human decision-making.

1.2 Games

We implemented multiple computation-heavy games to gather data about user decisions based on time. In these games, the horizon is short and there are only a few actions possible, but deciding the optimal action requires a considerable amount of computation. The first game is called the counting game (Figure 1), and displays 5 square grids, each of which is 20 by 20 in size. Anywhere from 3-18 squares are changed to blue on each grid. The user must pick the grid with the maximum reward, where any grid with a multiple of 3 blue squares is a gain of the number of squares and all other grids are a loss of the number of squares. For example, if the displays have 5, 6, 10, 13, and 18 squares, the rewards would be -5, 6, -10, -13, and 18. The time constraint is how much time the user has to make their decision before the grids change to new displays with a new number of blue squares. We collected 100 data samples for each of 5 time constraints: 3, 5, 10, 15, and 20 seconds.

Figure 1: Counting game in JavaScript.

The second game is a variation of Blackjack (Figure 2), where the user has a deck of cards laid out in front of them and knows the final value of the opponent’s cards. The player has already been dealt two random cards and must decide whether to draw a third, knowing that if their final score (the sum of their card values) is greater than 21, they receive a negative reward. If it is greater than the opponent’s and less than or equal to 21, they collect a positive reward, and a smaller positive reward if their score is equal to the opponent’s. The player received a reward of 0 if they scored lower than their opponent; here, they had the option to play further, though any further moves were not considered in our data since the player had already seen the deck. For this game, we collected data for 2, 5, 10, and 30 seconds. We collected data for over 150 runs of the game for each time constraint, where the data consisted of the expected reward of each decision the player could make and the reward of the decision that the player chose.

Figure 2: Blackjack in JavaScript.

We designed another game, the grid game (Figure 3), with a grid of squares, where the user moves in a confined space from square to square. Each square on the grid has a value between -1 and 1, exclusive. The color of each square was decided using an RGB function and the value of the square, where positive values intensify a green color and negative values intensify a red color. The user must then get as many points as possible by moving along an acyclic path with the highest total reward. It is impossible for the user to visit a square that they have already visited. The user begins on the bottom left square of a 10 by 10 grid, which has a reward of 0, and must reach the top right square, which has a reward of 100. This game was a long-horizon game, and we collected data for 3, 5, 10, 15 and 25 seconds.

To investigate the effects of an information constraint on decision-making, we designed a hover game with 3 horizontal panels, each of which contains a block which is constantly moving a random step in either horizontal direction. The game begins with all of the blocks hidden. The user may hover over any panel and view the current location of the block in that panel, but at the cost of 1 point. The user may click to move a block back to the center of the panel, at a cost of 10 points. If the block hits either the right or left edge of the panel, the user loses 100 points. The game ends after a certain number of steps. This game demonstrates how information constraints can change the way in which a player makes decisions.

2. Methods

2.1 Value Iteration and Reinforcement Learning

For the one-horizon games, blackjack and the counting game, expected rewards were directly equal to the values we used in Metropolis-Hastings. For the long-horizon grid game, we generated a new grid of values in place of the grid of raw rewards for each experiment. We used this to establish that one move would affect all future rewards. In other words, the user is incentivized and encouraged to think ahead and plan their choices, rather than consider each move as a separate and unconnected decision. We then used these values with Metropolis-Hastings. Before using this data, we subtracted the mean of the values in each experiment from each value in the experiment for numerical stability.

The optimal path was calculated using value iteration [3]. Each tile was given a value, initially set equal to its reward; this was the value of the corresponding square in the grid game. Then, for each state s, the best state to move to (out of the immediately reachable states s') was calculated using

V(s) = R(s) + \gamma \max_{s'} V^{\star}(s') \qquad (1)

where R(s) is a function that retrieves the original reward at state s, \gamma is the discount factor, which we set to 0.999, and V^{\star}(s') is a function that retrieves the current value at state s'. The process is repeated over the entire grid until it converges, that is, until the change in the tile values is lower than a threshold, which we set to 0.001. At this point, the program may draw the optimal path, which is obtained by traveling to the tiles with the greatest values.
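A minimal sketch of this update for a grid with deterministic up/down/left/right moves is given below; it ignores the acyclic-path constraint and terminal handling, and the helper names are our own.

    # Repeated application of V(s) = R(s) + gamma * max over neighbors of V(s').
    import numpy as np

    def value_iteration(rewards, gamma=0.999, tol=1e-3):
        rows, cols = rewards.shape
        V = rewards.astype(float).copy()            # values start out equal to the rewards
        while True:
            V_new = V.copy()
            for r in range(rows):
                for c in range(cols):
                    neighbors = [(r + dr, c + dc)
                                 for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                                 if 0 <= r + dr < rows and 0 <= c + dc < cols]
                    best_next = max(V[nr, nc] for nr, nc in neighbors)
                    V_new[r, c] = rewards[r, c] + gamma * best_next
            if np.max(np.abs(V_new - V)) < tol:     # stop when the values have converged
                return V_new
            V = V_new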

We also collected data for another set of experiments, where we allowed the user to play the game with a stopwatch running, rather than with a time constraint. The objective was not to reach the goal in a certain amount of time, but to reach the goal by taking the optimal path. From this, we aimed to determine the average time needed for the user to discern the optimal path.

Figure 3: Sample grid game in JavaScript. (a) Random grid is generated, (b) user takes a path to the reward, (c) optimal path is generated.

2.2 Models

The Metropolis-Hastings algorithm [6] samples variables from an input distribution. The algorithm takes in a starting guess for those variables and, at each iteration, proposes a new value for each variable a random noise away, using a constant step size modulated by a Gaussian random variable [5]. If the probability of the new value under the given distribution is greater than that of the current value, the new value is added to a set of accepted values; if it is less, it is added with probability p = p_{\text{next}}/p_{\text{current}} . If the new value is accepted, it becomes the new current value; otherwise, the current value is added to the set and remains the current value. The algorithm repeats for a certain number of samples and then cleans the set.

To clean the set for the counting game and the grid game, we used an array of 100,000 samples, the first 10,000 of which were discarded. We then took the first of every 9 samples to get 1000 samples with lower autocorrelation. To clean the set for the game of blackjack, we took the last 25,000 of our (between 50,000 and 100,000) samples and used the first of every 50 samples to get 500 clean samples. The algorithm then averages all the values in the clean set to find the expected value of each variable.
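The following is a minimal Python sketch of the sampler as described above. The step size and the burn-in and thinning settings are exposed as parameters rather than fixed to the exact counts quoted in the text, and the function names are our own.

import math
import random

def metropolis_hastings(log_likelihood, start, step=0.1,
                        n_samples=100_000, burn_in=10_000, thin=90):
    # log_likelihood maps a list of parameter values to a log-likelihood.
    current = list(start)
    current_ll = log_likelihood(current)
    chain = []
    for _ in range(n_samples):
        # Propose a value a Gaussian-modulated step away from the current one.
        proposal = [x + step * random.gauss(0, 1) for x in current]
        proposal_ll = log_likelihood(proposal)
        # Always accept a more likely value; otherwise accept with
        # probability p_next / p_current.
        if random.random() < math.exp(min(0.0, proposal_ll - current_ll)):
            current, current_ll = proposal, proposal_ll
        chain.append(list(current))
    # "Clean" the chain: discard burn-in samples and thin to reduce autocorrelation.
    kept = chain[burn_in::thin]
    expected = [sum(values) / len(kept) for values in zip(*kept)]
    return kept, expected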

In an alternative method, the algorithm traversed the set in the same way as before, but stored only the variable set which produced the maximum log-likelihood output for the function, out of the values that it considered while traversing the possibilities for each variable.

The first model we tested is the noisy rational model. The log-likelihood function (which takes each reward R as an input and outputs the natural log of the probability that the player made a certain decision) for the noisy rational model can be expressed as

\ln \left( \frac{\exp(\beta R)}{\sum_{x=1}^n \exp(\beta R_x) } \right) \qquad (2)

where n is the number of choices, R is the reward of the chosen option, R_x is the reward of each available option (the points gained or lost), and \beta is the temperature coefficient [4].

Figure 4: sample \beta R vs R graph, where \beta = 0.25 .

Intuitively, with beta set to 1, the exponential makes every term positive, amplifies large positive rewards, and shrinks the magnitude of large negative rewards, and the fraction turns these terms into a probability of choosing each option. The beta value then controls the noise: a small beta makes the reward terms matter less and pushes each option toward an equal, random probability, while a large beta makes the rewards matter more and concentrates probability on the best options. For the blackjack game, the reward values in each trial were referenced to the worst option; that is, the value of the worst option was subtracted from each of the reward values.
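A Python sketch of Equation (2) on a single trial is given below. The log-sum-exp shift is a numerical-stability detail we add here, not part of the model, and the worst-option reference point mirrors the shift described above; the function name is hypothetical.

import math

def noisy_rational_log_likelihood(rewards, chosen_index, beta):
    # rewards: the reward of every available option in one trial.
    shifted = [r - min(rewards) for r in rewards]      # reference point: worst option
    scaled = [beta * r for r in shifted]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return scaled[chosen_index] - log_norm             # Equation (2)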

The log-likelihood function for the threshold model [2] can be expressed as

\ln \left( \frac{\exp(\beta \min\{R, \tau\} )}{\sum_{x=1}^n \exp(\beta \min\{R_x, \tau\}) } \right) \qquad (3)

where \tau is the threshold, so that each option with a reward greater than \tau is considered equivalent.

Figure 5: sample \beta \min\{R, \tau\} vs R graph, where \beta = 0.8 and \tau = 19.

This takes the same function as before, but adds the condition of a threshold, where options above a certain value are considered “good enough” and are seen as equivalent. A real-life example of this model is a driver who faces the possibility of an accident and has little time to decide how to act; they may drive off the road rather than change lanes, because with so little time, this action is “good enough.” As the threshold value tau decreases, more and more (worse) options are considered good enough, and as tau increases, the player’s discretion becomes finer. We considered two versions of the threshold model: one where we set beta to a given constant and varied only tau, and one where we varied both tau and beta. For the blackjack and counting games, the reward values in each trial were referenced to the worst option; that is, the value of the worst option was subtracted from each of the reward values.
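A corresponding sketch of Equation (3) only changes the scaling step: rewards are clipped at the threshold tau before the softmax, so options above tau become indistinguishable. As before, the function name and the structure are ours.

import math

def threshold_log_likelihood(rewards, chosen_index, beta, tau):
    shifted = [r - min(rewards) for r in rewards]      # reference point: worst option
    scaled = [beta * min(r, tau) for r in shifted]     # rewards above tau are equivalent
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return scaled[chosen_index] - log_norm             # Equation (3)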

The log-likelihood function for the prospect theory model can be expressed as

\ln \left( \frac{\exp(\beta f(R))}{\sum_{x=1}^n \exp(\beta f(R_x)) } \right) \qquad (4)

where the function f(R) is defined as

f(R) = \begin{cases} -\lambda(-R)^{\gamma}, & R<0 \\ R^{\alpha}, & R\ge 0 \end{cases},

where R is the reward, \alpha , \gamma , and \lambda are all variables being sampled, \alpha and \gamma are between 0 and 1, and \lambda is positive [9].

Figure 6: sample function f(R) where \lambda = 2.25, \gamma = 0.5, \alpha = 0.5.

These conditions were enforced by having the function output a log-likelihood of negative infinity whenever alpha, gamma, or lambda fell outside its given range. Since the function takes some root of all positive rewards, positive rewards seem larger than their actual value when the reward is small, and smaller than their actual value when the reward is large. Thus, small rewards seem important, but distinguishing between two larger rewards becomes difficult. We see a similar root function for the negative rewards, but the added coefficient lambda increases the absolute value of a negative reward. Thus, between a loss and a gain of equal size, the loss appears larger than the gain. This can be understood with a real-life example, where a person must choose between a 90% chance of gaining $1000 and a 100% chance of gaining $900. Prospect theory holds that humans are risk-averse with gains and would choose the sure gain; if the gains were instead losses, humans are risk-seeking and would take the gamble, hoping to avoid the loss. Here, we considered R to be relative to a given reference point. This reference point was the average of the two options for the blackjack game. For the counting game, the model was tested with a reference point of 0, a reference point equal to the average of all possible values in the set, and a reference point equal to the average of the values for each individual experiment. We found that the reference point of 0 gave the most accurate results.
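A Python sketch of Equation (4) is shown below. Returning negative infinity when a parameter leaves its range is how the constraints were encoded for Metropolis-Hastings; the reference point is passed in as a parameter because it differed between games, and whether beta is held fixed or sampled alongside the other parameters is left to the caller. Function names are our own.

import math

def prospect_value(r, alpha, gamma, lam):
    # Piecewise value function f(R): a root of gains, and an amplified root of losses.
    return -lam * (-r) ** gamma if r < 0 else r ** alpha

def prospect_log_likelihood(rewards, chosen_index, beta, alpha, gamma, lam, reference=0.0):
    # Reject parameter sets outside their allowed ranges with a log-likelihood of -inf.
    if not (0 < alpha < 1 and 0 < gamma < 1 and lam > 0):
        return float("-inf")
    scaled = [beta * prospect_value(r - reference, alpha, gamma, lam) for r in rewards]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return scaled[chosen_index] - log_norm             # Equation (4)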

2.3 Cross Validation

For each of the models, we used k-fold cross-validation on each of the time constraints: we randomized the data, divided it into a certain number of groups, and rotated through the process of using each group as the validation set and the others as the training set [7]. We fit the unknown variables of the model on the training set and then plugged the fitted values into the model on the validation set, so that we could calculate the log-likelihood of the validation data given those variable values. This gives an approximation of the relative accuracy of the models we used. From this, we took the average log-likelihood for each time constraint, and then used the averages to compute the overall log-likelihood for the model. Throughout this process, we used the log-likelihood in place of the likelihood, so that the values would be numerically easier to compute and compare.
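The procedure can be sketched in Python as follows, where fit and score stand in for the Metropolis-Hastings estimation and the model's log-likelihood on held-out trials; both names are placeholders rather than functions from the study's code.

import random

def k_fold_log_likelihood(trials, k, fit, score):
    # trials: list of (rewards, chosen_index) pairs for a single time constraint.
    # fit(training) returns fitted parameters; score(params, validation) returns
    # the total log-likelihood of the validation trials under those parameters.
    shuffled = trials[:]
    random.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    fold_scores = []
    for i in range(k):
        validation = folds[i]
        training = [t for j, fold in enumerate(folds) if j != i for t in fold]
        fold_scores.append(score(fit(training), validation))
    return sum(fold_scores) / k   # average validation log-likelihood for this constraint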

2.4 Information Constraint

In addition to the Java implementation of the hover game, we developed a version of the game in JavaScript (Figure 7), which could be played by a user through any web browser, and another version in Python (Figure 8), for which we trained an AI agent [1] to play the game optimally using an off-the-shelf Proximal Policy Optimization algorithm [8]. We ensured that the versions were identical by using the same dimensions and rules, and by verifying that they reached the same average score across hundreds of trials in which the game ran with no user or AI agent interference (no hovers or clicks, just crashes).

The game lasts for 300 steps, and each block moves once during every step. The distance that each block moves is a uniformly random integer between -15 and 15, inclusive. The distance between the right edge of the left wall and the left edge of the right wall is 480. A crash is detected when the center of a block reaches the inner edge of either wall.

Figure 7: User version of the hover game in JavaScript. In panels that are not currently being hovered over, faded red squares represent the location of the block at the most recent instance of hovering. If the user hovers over a panel, the square turns a darker shade of red.
Figure 8: AI agent version of the hover game in Python. The large circles represent the location of the piece in each panel at the most recent instance of hovering. If the user (AI) is currently hovering over a panel, the large circle in that panel turns blue. The small red circles represent the current location of every piece. This example represents a game with a score of -169 at 85 steps, where the agent is currently hovering over the middle panel.
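The following Python sketch reconstructs these dynamics and scoring rules from the description above. The starting positions, the order of hovering, clicking, and movement within a step, and the choice to recenter a block after a crash are our assumptions, not details taken from the implementations, and the names are hypothetical.

import random

PANEL_HALF_WIDTH = 240      # distance from the center to either inner wall edge (480 / 2)
STEP_RANGE = 15             # each block moves a uniform integer in [-15, 15] per step
HOVER_COST, CLICK_COST, CRASH_COST = 1, 10, 100

def play_hover_game(choose_hover, choose_click, n_panels=3, n_steps=300):
    # choose_hover(step, observed) and choose_click(step, observed) return a panel
    # index or None; observed holds the last position seen for each panel.
    positions = [0] * n_panels          # assumed: blocks start at the center
    observed = [None] * n_panels
    score = 0
    for step in range(n_steps):
        panel = choose_hover(step, list(observed))
        if panel is not None:
            score -= HOVER_COST
            observed[panel] = positions[panel]   # hovering reveals the current position
        clicked = choose_click(step, list(observed))
        if clicked is not None:
            score -= CLICK_COST
            positions[clicked] = 0
            observed[clicked] = 0
        for i in range(n_panels):
            positions[i] += random.randint(-STEP_RANGE, STEP_RANGE)
            if abs(positions[i]) >= PANEL_HALF_WIDTH:   # block reached a wall: crash
                score -= CRASH_COST
                positions[i] = 0                        # assumed: crashed block recenters
    return score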

3. Results

3.1 Optimal Parameters for Each Time Constraint

For the single-parameter mathematical models, clear trends emerge when we graph the parameter against the time constraint. The noisy rational model multiplies the reward by beta; a beta greater than 1 therefore amplifies the value of the reward, meaning that the player is more likely to choose the better decision, while a beta less than 1 shrinks each reward value, making the player’s decision more random. When the player has a shorter amount of time to play the game, beta should be smaller, and when the player has more time to consider the possible decisions, beta should be larger. Similarly, in the threshold model, any reward above some value, tau, is considered good enough; a smaller tau means the “good enough” value is worse, while a larger tau means the reward has to be substantial for the decision to be considered good enough. Again, when the player must make a decision with little time, the tau value should be lower.

For the counting game and blackjack, we graphed the values of beta for each time constraint for the noisy rational function and tau for each time constraint for the threshold function. We used the clean set from Metropolis-Hastings, and took the average value of the variable as well as the standard deviation of the clean set for the graphs.

Figure 9: Beta vs Time Graphs for (a) Counting Game (b) Blackjack.
Figure 10: Tau vs Time Graphs for (a) Counting Game (b) Blackjack.

3.2 K-Fold Validation Data

There were 5 subsets for the counting game and blackjack and 6 for the grid game. Each subset was used as the validation set exactly once. For each model, the algorithm was used for all the time constraints, and the maximum log-likelihood was obtained.

Figure 11: Blackjack Game Log-Likelihoods Per Time Constraint.
Figure 12: Counting Game Log-Likelihoods Per Time Constraint.
Figure 13: Grid Game Log-Likelihoods Per Time Constraint.

3.3 Time Needed to Discern Optimal Path

For this data, the user played the grid game until they had achieved the optimal path 30 times. This graph is a histogram of the time it took to reach the optimal path during each trial.

Figure 14: Number of Times vs Time Taken.

3.4 Hover Game Data

To compare the performance of the AI agent against the performance of a human user, we let the AI agent play 100 games. We then had two users, one who was experienced with the game and had played it many times before, and one who had never played it before, play the game a number of times. For all the tests, we collected the average score at each step and the standard deviation of the scores achieved at each step.

Figure 15: (a) Experienced user vs AI agent, (b) unacquainted user vs AI agent.

4. Discussion

4.1 Blackjack

In the blackjack game, the prospect theory model was the best model. This makes sense because prospect theory is widely regarded as the most accurate descriptive model of human decision-making. However, there was a concern that, with the data set given, the prospect theory model, with its many variables, might overfit to the training set, that is, adjust to the exact noise of the training set rather than the true function. This would reduce the likelihood of the validation set. Fortunately, this did not occur.

The noisy rational model did produce slightly better results than the combined threshold model in terms of validation likelihood, but because the two values are so similar, they may be considered approximately equal. This conclusion is reinforced by the tau and beta outputs for the combined threshold model. Because the combined threshold model combines the noisy rational model and the threshold model, its likelihood should be equal to or greater than that of both. The combined threshold model has a likelihood equal to the noisy rational model’s only if it maximizes the threshold and outputs the same beta as the noisy rational model, since this is equivalent to the best possible beta with no threshold. This is validated when we consider the data for the combined threshold and noisy rational models.

This data also explains why the threshold model is the least accurate model out of the four considered. The threshold model is not entirely relevant in this situation, possibly because the player is only choosing between two options, so increasing or reducing the positive reward value of the optimal option may not significantly change the player’s decision.

When we consider specifically the parameter versus time graph for the noisy rational function, the beta values for 5, 10, and 30 seconds are within one standard deviation of each other. This implies that although the player makes worse decisions when given two seconds, they have sufficient time within 5 seconds to consider each option to the maximum of their ability, and do not improve between 5 and 30 seconds. Note that the player is still not making perfect decisions at this point; even though they are given a significant amount of time, it is not nearly enough to mentally calculate the exact expected value of each possible decision, and such a calculation would be extremely difficult.

The parameter versus time graph for the threshold function is surprising, because the threshold for the 30-second games is actually lower than the threshold for the 10-second games, by more than an order of magnitude. However, the threshold values are close to zero for all of the time constraints, implying that the players make almost random decisions. This may partially be because the beta value was set to 1.2 to calculate the tau values in this model, while the true beta was closer to 0.2 for the time constraints given. The smaller beta would account for the element of randomness in the game. When the Metropolis-Hastings algorithm is run to optimize both tau and beta in the threshold function, the beta values are within a standard deviation of those in the noisy rational function, and the tau values are maximized. Thus, the threshold model is not a relevant model for describing the blackjack game.

4.2 Counting Game

Consistent with the results of blackjack, the prospect theory model was the best, and overfitting did not occur. One difference was that the combined model did produce slightly greater log-likelihoods than the noisy rational model. This is expected, as the combined model trains both beta and tau and is therefore more likely to find a parameter set that yields a greater log-likelihood. The threshold model was the worst in the counting game as well.

The beta and tau vs time constraint graphs demonstrate an increasing value for each variable with time. Moreover, the standard deviations ensure that the differences between these variable values at each time constraint are significant, as none of the points are within one standard deviation of each other. This makes sense for beta, because as the time constraint increases, the user makes more informed decisions that are less random, and therefore a greater beta value gives more weight to these decisions.

The tau graph is also reasonable, as allocating more time allows the user to make favorable moves more often. With these moves come higher scores, which means that the threshold at which the user’s score is considered “good enough” must increase. To interpret this graph, it is important to remember that we subtracted the minimum possible reward in each experiment from every value in that experiment; rather than the threshold being these tau values directly, the threshold is actually tau above the minimum for each trial. Thus, since the threshold is negative for 3 seconds, any value is good enough, and play at 3 seconds is essentially random. This makes sense, as 3 seconds is generally not enough time to distinguish the best choice. Moreover, since the threshold is about 55 for 20 seconds, no reward is considered “good enough”: the lowest possible reward in the counting game is -17, which puts the true threshold at about 38, an unattainable score. In other words, 20 seconds is generally enough time to distinguish the best choice.

From this graph it can be determined that while the threshold model is the least accurate of the four models we tested, it is still applicable in this game and accurately demonstrates the numerical difference in performance with varying time constraints.

4.3 Grid Game

The results of the grid game further reinforced that the prospect theory model is the most accurate in describing human decision-making in games. The results also demonstrate that the threshold model is less accurate. Unlike the other two games, however, the combined threshold model was not at least as good as the threshold and noisy rational models. This is due to overfitting, which occurred when the thresholds were maximized while training the combined model.

The data from the histogram is not informative, as too many experiments were discarded for the time results to be conclusive. However, if the experimental procedure is changed, this approach could generate useful results. A possible procedure would be to play the game with a certain time constraint and to increase the time constraint until optimal scores are consistently achieved. This could more accurately determine the time at which a human user can play optimally.

4.4 Hover Game

In the hover game, we have not yet created a mathematical method of analyzing the actions that the human players took. However, from the data we have from the AI playing the game, we can conclude that the AI hovers over some panel at almost every step and that, in the hundred trials we held, it never let any block reach a wall. From this, we can deduce that a relatively optimal strategy is to hover over one panel at every step, rotating between the panels, and to click a block whenever it could reach the edge of its panel within three steps, ensuring that it returns to the center. This could be further optimized by not hovering over panels whose blocks are known to be close to the center.
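A rough Python sketch of this deduced strategy, reusing the simulator and constants from the sketch in Section 2.4, is shown below. The three-step danger margin is our reading of the heuristic and is not taken from the trained agent.

def rotating_hover(step, observed):
    # Hover over one panel per step, cycling through the panels.
    return step % len(observed)

def click_if_near_wall(step, observed, margin=3 * STEP_RANGE):
    # Click the block just observed if it could reach a wall within about three steps.
    panel = step % len(observed)
    seen = observed[panel]
    if seen is not None and abs(seen) >= PANEL_HALF_WIDTH - margin:
        return panel
    return None

# Example: estimate this heuristic's average score over many simulated games.
# scores = [play_hover_game(rotating_hover, click_if_near_wall) for _ in range(100)]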

By comparing the performances of the inexperienced and experienced humans, it is clear that playing the game many times prior to data collection allowed for significant improvement. The experienced user’s average score was more than 240 points higher, suggesting that human users learn to play more rationally with time and experience.

The human player consistently scored worse than the AI agent even after gaining experience; this may be for several reasons. One is that the time constraints of the game prevented the human player from hovering over a panel at every step, something the AI could do. The human player played with the condition that the blocks would move one step every 0.3 seconds, and they could not click or hover over the panels more than once during that interval. Although a version of the game was modified to move the blocks every second, it was not used in the data-collection process because such a game would be too time-consuming to play over multiple trials. Another explanation may simply be that the human had developed a less effective strategy, or that their sense of when to click a block was less precise than the AI’s.

5. Conclusions

We found that for time-constrained games, the prospect theory model best describes human decision-making. Our results were empirically corroborated across hundreds of trials in two one-horizon games and one long-horizon game. We were further able to conclude that the threshold model is the least accurate and least applicable of the four models we tested. However, we found significant trends in the beta and tau values from the noisy rational and threshold models, numerically demonstrating the effect of a time constraint. In the future, we would like to corroborate our results with more experiments across other games. We also plan to see how real robots perform with these different models by applying them to games with physical movements.

For the information-constrained game, the expected reward for each move was unknown, so we were not able to fit it to a mathematical model. However, we were able to set up a playable user interface, train an AI agent, and compare their respective scores. Moreover, we gathered data including the current position of the block in all three panels, the score, and the action at each step. This data could be used to create a mathematical model that best fits these types of games. We would like to create a reward function that would calculate the overall value of each possible move the user could make, so that we could compute the rationality of the user’s behavior. Then, we could also consider the effects of different time constraints on the hover game. The results of the hover game can also be used to demonstrate the suboptimality of humans in other tasks, such as a human checking their mirrors while driving.

6. Acknowledgements

We thank Inder Bhasin and Vijay Talati for playing our games and helping us collect data. We would also like to acknowledge Cindy Nguyen and Professor Tsachy Weissman for organizing the SHTEM program, and Erdem Biyik and Professor Dorsa Sadigh for mentoring and advising us throughout this project.

7. References

[1] Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540.

[2] Halpern, J. Y., Pass, R., & Seeman, L. (2014). Decision Theory with Resource-Bounded Agents. Topics in cognitive science, 6(2), 245-257.

[3] Hartley, T., Mehdi, Q., & Gough, N. (2004). Applying Markov Decision Processes to 2D Real Time Games. University of Wolverhampton, School of Computing and Information Technology. Retrieved from http://hdl.handle.net/2436/31518.

[4] Luce, R. D. (2012). Individual choice behavior: A theoretical analysis. Courier Corporation.

[5] MacKay, D. J. (2003). Information theory, inference, and learning algorithms. Cambridge: Cambridge University Press.

[6] Metropolis-Hastings algorithm. (n.d.). Retrieved from https://python4mpia.github.io/fitting_data/Metropolis-Hastings.html.

[7] Refaeilzadeh, P., Tang, L., & Liu, H. (2009). Cross-Validation. In Liu, L., & Özsu, M. T. (Eds.), Encyclopedia of Database Systems. Springer, Boston, MA.

[8] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.

[9] Tversky, A., & Kahneman, D. (2000). Advances in Prospect Theory: Cumulative Representation of Uncertainty. Choices, Values, and Frames, 44-66. doi:10.1017/cbo9780511803475.004.