Alex Chin and Alan Yu
Georgina Cortez, Nicole Krukova, Thuan Le, and Suraj Thangellapally
Abstract:Nanopore technology is a modern way of sequencing DNA, in other words, it determines the order of nucleotides in a DNA strand . First, a DNA strand passes through a motor protein that serves as an inhibitor to improve the accuracy of the readings. Then, the strand runs through a nano-sized hole that is experiencing an ionic current. As the DNA strand passes through the nanopore, the individual nucleotides cause disruptions in the electric current which help determine the nucleic acid sequence . Approximately 250 bases can be sequenced per second by Nanopore technology . However, the results are “noisy” or not completely reliable because of several factors including the inconsistency of the nucleotides’ dwell times, which are the time periods of the central bases in the nanopore. The wavelength of the ionic current depends on the dwell time; inconsistent dwell times produce different wavelengths that make it difficult to accurately determine the nucleic acid sequence.
We believe the central base’s dwell time depends on the nucleotides present in the motor protein at that moment. The central base is the nucleotide currently in the nanopore whose dwell time we are focusing on. A k-mer (e.g. “AAT” or “CGATC”) consists of multiple nucleotides that correspond to the central base’s dwell time. To characterize the inconsistencies of the dwell times, we analyzed the data to find correlations between the k-mers and their dwell times. These correlations can help future researchers assign specific characteristics to k-mers and their impact upon dwell times.
We conducted our research with sequenced DNA data previously acquired from the lambda bacteria, which has approximately 48,000 nucleotides. Through Python, we manipulated the data to determine trends and correlations between the k-mers and their corresponding dwell times. Our main goal in the programming was to create visual representations of the data in the form of plots. These plots allowed us to discern noticeable trends in the data that we otherwise might not have observed. Our initial plots were difficult to analyze because there was too much data that made it nearly impossible to understand any trends. To resolve this issue, we organized the data into a more digestible format, by only plotting the average dwell time for each k-mer and sorting the points from lowest dwell time to highest. This allowed us to more easily spot any outliers or trends in the data and compare different k-mers.
The plot and table showed that the data for all of the bases was very similar as there were barely any differences between their means, standard deviations, etc. These results validated our belief that the dwell times did not depend on the base in the pore but rather that the dwell times are dependent on the following bases that are falling through the motor protein since it controls the speed of the DNA strand. To test this idea, we moved on to creating plots of different k-mer’s dwell times to see if any correlations became more apparent.
We began with an unshifted 3-mer as practice to construct our function and begin experimenting with associating k-mers to the central base’s dwell time. As expected, there were no significant trends in our data because the unshifted 3-mer has only one nucleotide following the central base. We believed that there would be correlations with a longer k-mer and its dwell time since, as our hypothesis claims, the central base’s dwell time depends on the nucleotides passing through the motor protein in that moment. Following on with this observation, we created a dataset with shifted 3-mers, where we shifted the 3-mer one space to the right.
Since the error bars for the dwell times of the 3-mer are so significant, the data for the 3-mer plot should not be used as it is unreliable. This further affirmed our focus on the 5-mer as it holds more reliable data.
We observed that the outlying dwell times below 7 units and above 9 units had a consecutive repetition of a base in their k-mers. The dwell times in these ranges varied greatly from the “normal” dwell time in the 7 to 9 unit range. The k-mers in this “normal” range had a minimal amount of repetition in comparison to the outlying ranges. This lead us to believe that the consecutive repetition of a nucleotide causes an irregular dwell time.
Similar to the previous sample of 5-mer data, there was a repetition of a single base in k-mers with irregular dwell times. In the 6.5 to 6.7 unit range of the dwell time there was a repetition of a single base in each k-mer. We also noticed that there was a higher prevalence of C’s in the lower spectrum of the dwell times. Moving into the higher 10 to 12 unit range of the dwell time, we observed that there was a consistent repetition of the nucleotide G in each k-mer. G’s are known to be particularly disruptive so we believe that the repetition of them caused the change in the dwell time. This data supported that the repetition of a nucleotide causes irregularities in the dwell times.
The first plot models that there is no difference between the prior k-mers and the 7-mer. This indicates that in order to find some trend, we need to increase the length of the k-mer and keep shifting it. There are 16,384 possible 7-mers combinations, so we created a plot of their averages so there would be a better visualization. The second plot above represents the averages of the first 10,000 bases of the lambda data collected, sorted from least to greatest. However, after analyzing the plot, we were sure that we would need a longer k-mer to spot any significant trend. For future reference, researchers should look to analyzing the repetition of nucleotides, focusing on specific ranges, and any other possible trends that could lead to a discovery.
Our project confirmed that the dwell time is not dependant on the central base but rather on the following bases in the motor protein. A significant discovery was that the consecutive repetition of a nucleotide causes irregular dwell times. The nucleotides G and C also appear to affect the consistency of the dwell time. With this information, future researchers should continue to create different shifts of k-mers and gather more data about the impact of base repetition on the dwell time. By understanding this relationship, researchers will be able to characterize and predict the effect a k-mer will have upon the dwell time. This can lead to more accurate readings of the sequenced DNA which will allow for the expansion of Nanopore technology.
Advancements in Nanopore technology will expand the applications of DNA sequencing in fields such as healthcare and data compression. Increased reliability will permit for the technology to have a more widespread role in personalized medicine, where treatment is unique to one’s genetics, and DNA data storage, an incredibly efficient way of storing a substantial amount of data in a microscopic space.
 Python: The Ultimate Beginner’s Guide!. Scotts Valley: CreateSpace Publishing, 2016.
 “Python Numpy Tutorial”, Cs231n.github.io, 2019. [Online]. Available: http://cs231n.github.io/python-numpy-tutorial/#numpy-arrays. [Accessed: 08- Jul- 2019].
 “Data Analysis and Visualization with Python for Social Scientists alpha: Extracting row and columns”, Datacarpentry.org, 2017. [Online]. Available: https://datacarpentry.org/python-socialsci/09-extracting-data/index.html. [Accessed: 08-Jul-2019].
 G. Templeton, “How DNA data storage works – ExtremeTech”, ExtremeTech, 2016. [Online]. Available: https://www.extremetech.com/extreme/231343-how-dna-data-storage-works-as-scientis ts-create-the-first-dna-ram. [Accessed: 08-Jul-2019].
 G. Templeton, “How DNA sequencing works – ExtremeTech”, ExtremeTech, 2015. [Online]. Available: https://www.extremetech.com/extreme/214647-how-does-dna-sequencing-work. [Accessed: 08-Jul-2019].
 “Replacing strings with numbers in Python for Data Analysis – GeeksforGeeks”, GeeksforGeeks, 2018. [Online]. Available: https://www.geeksforgeeks.org/replacing-strings-with-numbers-in-python-for-data-analysis/. [Accessed: 08-Jul-2019].
 N. Jetha, C. Feehan, M. Wiggin, V. Tabard-Cossa and A. Marziali, “Long Dwell-Time Passage of DNA through Nanometer-Scale Pores: Kinetics and Sequence Dependence of Motion”, Biophysical Journal, vol. 100, no. 12, pp. 2974-2980, 2011. Available: 10.1016/j.bpj.2011.05.007.
 T. Petrou, “Selecting Subsets of Data in Pandas: Part 1 – Dunder Data – Medium”, Medium, 2017. [Online]. Available: https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c. [Accessed: 11-Jul-2019].
 “How it works”, Oxford Nanopore Technologies, 2019. [Online]. Available: https://nanoporetech.com/how-it-works. [Accessed: 11-Jul-2019].
 S. Lynn, “Using iloc, loc, & ix to select rows and columns in Pandas DataFrames”, Shane Lynn, 2018. [Online]. Available: https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-an d-ix/#loc-selection. [Accessed: 11-Jul-2019].
 “Python Pandas Tutorial 7. Group By (Split Apply Combine)”, YouTube, 2017. [Online]. Available: https://www.youtube.com/watch?v=Wb2Tp35dZ-I. [Accessed: 14-Jul-2019].
 “How do I apply multiple filter criteria to a pandas DataFrame?”, YouTube, 2016. [Online]. Available: https://www.youtube.com/watch?v=YPItfQ87qjM. [Accessed: 14-Jul-2019].
 “Sequencing DNA (or RNA) | Real-time, Ultra Long-Reads, Scalable Technology from Oxford Nanopore”, YouTube, 2017. [Online]. Available: https://www.youtube.com/watch?v=GUb1TZvMWsw. [Accessed: 14-Jul-2019]
 “Introduction to nanopore sequencing”, YouTube, 2019. [Online]. Available:
https://www.youtube.com/watch?v=I9BOF8Hla5s. [Accessed: 14-Jul-2019].
 lambda data accessed from Weissman’s lab
Munir Bshara and My Pohl
Abstract:When two people carry an object together, they manage to change directions and avoid obstacles with minimal or even without verbal communication. Communication using nudges, forces and eye contact is often sufficient. In this paper, we study this nonverbal language with the aim to understand it better. This type of communication is useful also for artificial intelligent robots. The results of our study and further experiments may contribute to better communication between AI robots and humans as well as between robots. The experimental setup consisted of two persons carrying an object to a certain goal. Meanwhile data was collected from force sensing resistors attached onto the object and from a camera. The results showed that there seems to be different styles of communicating. To be able to apply this kind of communication onto robots we think it will be necessary to identify these different methods rather than collecting experimental data and calculate an average.
Materials. A grid made out of masking tape was used to outline the experimental area, a table which was the object the participants carried, four force sensing resistors to measure the different forces that were applied to the object, a web camera (logitech Pro Stream Webcam C922 1080HD) to record videos to later be able to analyze them in a program (OenPose). To get the forces we used 4 FRS’s connected to 4 breadboards which are then connected individually through a 3.3kΩ resistor and 3 breadboard cables to an Arduino UNO(REV3) which is connected to a pc through USB.
The pictures above show how the ZJchao Force Sensitive Resistor was attached to the UNO R3 board. The FSR was attached to the breadboard through a F-M Dupont Wire. The red and blue breadboard jumper wires are for 5V power and grounding while the yellow breadboard jumper wire is used for direct connection to the computer. The resistance of the resistor was 3.3kΩ. To convert the analog output to digital the Ardunio IDE was used and some code to retain the output of the force sensors.
Design. The experiment took place in a room with a grid on the floor with some fabric pieces of four different colors deployed in the squares of the grid. The grid had 32 squares (84), each square was 50cm50cm. In the ceiling there was a camera recording the procedure. Afterwards the videos were analyzed in a program named PoseNet which tracked the participants’ motions and trajectory. At the short side of the grid was the starting position of the table with four force sensing resistors attached.
On the short sides of the table there were handles with two force sensing resistors. The sensors were directly attached onto the table. On top of the sensors were the handles. The handles had a piece of eraser with the same area as the sensors attached to them. The handles were attached to the table with silver tape and so that the eraser touched the sensor. No pressure was being put onto the sensors while resting. The handles were loosely attached to the table.
The schematic above shows the connection between the FSR and the computer using the Arduino UNO which allows us to convert the analog signal from the FSR to a digital signal that can be printed. The relationship is generally linear from 50g and up, but note what the relationship does below 50g, and even more-so below 20g. These sensor’s have a turn-on threshold — a force that must be present before the resistance drops to a value below 10kΩ, where the relationship becomes more linear. To counteract this irregularity in the beginning of chart I altered the code. This alteration changes the parabolic reading of pressure put upon the force sensor to a linear one through dividing the force by a certain constant that was given to us.
Procedure. There were two participants at a time. Initially they got a general explanation of the experiment read out loud. They were told that they were supposed to carry a table together through a room and that they were not allowed to either turn around or speak with the other person. The experiment would be repeated three times with some small changes. The experiment was over when the table is put down on the floor. Further information was given individually.
These were the instructions that were given to the participating pairs on a paper.
Experiment 1, person 1:
“On the floor there will be a grid with some fabric pieces of different colors in it. The grid is the area where the experiment will take place. Remember, you are not allowed to speak with the other person or turn around.”
Experiment 1, person 2:
“On the floor there will be a grid with some fabric pieces of different colors in it. The grid is the area where the experiment will take place. Your task is to place the two legs of the table on your side into two squares with blue colored fabric pieces. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”
Experiment 2, person 1:
“Your task is to place the two legs of the table on your side into two squares with green colored fabric pieces. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”
Experiment 2, person 2:
“Your task is to place one of the two legs of the table on your side into a square with a yellow colored fabric piece in it. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”
Experiment 3, person 1:
“Your task is to place one of the two legs of the table on your side into a square with a yellow colored fabric piece in it, and the other into a square with a blue colored fabric piece. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”
Experiment 3, person 2:
“Your task is to place one of the two legs of the table on your side into a square with a white colored fabric piece in it, and the other into a square with a blue colored fabric piece. The other person doesn’t know about this and you will have to navigate the both of you so that you accomplish the mission. Remember, you are not allowed to speak with the other person or turn around.”
All the experiments were recorded on film using the logitech camera in the ceiling. These videos were after the experiment was done analyzed in OpenPose.Then we got the exact position data, movements and trajectories. During the experiment the force sensing resistors were read every .5 seconds. We took time on all the experiments using a stopwatch.
To analyze the force data I connected all four force sensors to their own breadboard and their own arduino UNO board, and through a usb connection to my computer I was able to use an Arduino IDE with some code to print out the force outputs. Then by using a script and vi I was able to easily isolate the forces and time allotted into a text editor.
Looking like this:
To have something to compare the results from the experiments with we made a baseline. The baseline is the same exact experiments, but with both participants knowing where they should place the table.
Participants. Participants were recruited from different departments at Stanford University (The Artificial Intelligence Laboratory, Wallenberg Hall and interns from the internship program “Stem to SHTEM”). All participants were between the ages 15 and 52. The average age was 21 years. 12 people participated in the experiment (5 female and 7 male). All participants participated voluntarily.
All trials were recorded with the webcam attached to the ceiling. Then Machine Learning Techniques were used to extract various body points. OpenPose + Tensorflow allowed us to track positions relative to the camera. A combination of python scripts and bash tools was needed to extract and format the data.
It seems like different types of people react differently to each of the experiments based off previous interactions. For example, a dad and his daughter moved the table calmly while a pair of friends made more powerful movements. Another thing that was being noticed is that people pulled more than they pushed.
In the graphs above we have calculated the difference between the measurements from the pushing and pulling sensors on each handle to get the resulting force. This has been done during the whole time the experiment lasted. Down in the right corner of the figure we have the baseline which can be used for comparison.
Not that surprising, the baseline solved the task faster than the others on all of the experiments. The baseline did also use more exact nudges rather than a lot of back and forth communication. However, the other participants who did not know exactly where the goal was also managed to complete the tasks well.
There is a pattern in the way the different groups solved the task. For example, group 3’s three graphs look a lot alike. The same is for group 2 and 1. More measurements are needed to establish reliable conclusions but it seems like there are a few different styles on how to solve these kinds of tasks. Maybe it would be good to try to identify these different styles rather than collecting data and calculate an average because a combination of working methods does not necessarily have to result in a new working one.
Humans have an advanced and efficient way of communicating with each other. The rapidly growing research area artificial intelligence would benefit from learning this kind of communication. In this paper we focused on the nonverbal communication via nudges when carrying an object. We have developed a methodology to measure push and pull forces that are being put onto an object that is being carried. The results from the experiments showed that the tasks were possible to solve but that it went more efficiently when both of the participants knew the goal. Another conclusion is that there seems to be different methods to solve these kinds of tasks and that they may depend on the relationship between the two participants. The different methods should be identified.
Limitations and future work. When we designed this experiment we wanted to eliminate all other ways to communicate except for nudges so that we can be sure that the measurements we get are reliable and correct. One limitation is that we did not exclude the possibility to communicate via eye contact. Future work should take eye contact into consideration. An easy way to solve this problem would be to make the participants wear capes or hats. The measurements we got from the force sensing resistors showed how the two participants pushed and pulled. Another dimension that should be added is turnings. Then we would get a more complete idea of how the exchange of information via nudges works. Measure turnings can easily be done with two handles on each side of the table placed in the four corners. (For this four more force sensing resistors are needed.) With four handles the measurements will show when someone is pushing more on one of the handles or pulling stronger on one side of the table which would make it possible to identify turnings. The complete product of this project would be to transfer our measurements onto AI robots so that they can be able to communicate more efficiently and more human-like.
We would like to thank Professor Weissman for allowing us to participate in the STEM to SHTEM program. Also many thanks to Mengxi Li and Minae Kwon for mentoring us in this project and finally a big thank you to all the participants who helped us gather data for this paper.
Force Sensitive Resistor Hookup Guide, learn.sparkfun.com/tutorials/force-sensitive-resistor-hookup-guide/all.
Kevin W. Rio, William H. Warren (2016). Interpersonal coordination in biological systems The emergence of collective locomotion.
Patrick Nalepka, Rachel W. Kallen, Anthony Chemero, Elliot Saltzman, and Michael J. Richardson(2017). Herd Those Sheep: Emergent Multiagent Coordination and Behavioral-Mode Switching.
Vivaan Mahtab, Ganesh Pimpale, Juan Aldama, and Paul Truong
Abstract:In 2018, Prof. Tsachy Weissman’s lab conducted research on using human semantics to compress an image. A ‘describer’ would provide links to image assets representing elements of the original image ; for example, given a picture of the savanna, the describer would provide images of grass and trees, among others. With the provided compiled components, the describer would use natural language to describe how a ‘reconstructor’ could combine the elements to create an image semantically similar to the original. The instructions and links constituted the compressed image. They called this approach Humans as Awesome Compressors (HAAC). While this compression scheme significantly reduced the size of an image, the natural language instructions were imprecise and could be shortened further. To solve this a scripting-based image editing software that tracks the actions of the describer and can execute them on the reconstructor side, thus improving precision. Future research could involve completely automated image compression through the use of artificial intelligence, and the implementation of compression using a neural network and creating a database of pre-segmented images. Code can be found at https://github.com/GaneshPimpale/AI-Compression.
1. Introduction and Problem Statement
There are a plethora of lossy and lossless compression algorithms which help in storing and communicating mass amounts of images. Different types of compression algorithms take advantage of different properties of an image to reduce file size but maintaining similarity to the original image. Some of the most common types of compression algorithms are JPEG , JPEG2000 , and WEBP . Most lossy compression techniques attempt to represent more image data with fewer data. For the JPEG compression algorithm, the image is converted to YCbCr, the image is downsampled, the DCT (Discrete Cosine Transform) algorithm is used to represent a finite array of values as multiple cosine functions, and the image is quantized to represent it in as few colors as possible. This process is very similar to WEBP, where the image is downsampled and brightness is decreased, but instead if using DCT, WEBP decomposes the image down into smaller blocks and then uses a prediction algorithm to represent more of the image with fewer values. These processes have major effects of certain properties of the image when a high compression ratio is in use. Namely: Blurring of the image, discoloring the image, and a decrease in resolution.
From figure 2 and 3, it is obvious to see that WEBP has significantly better performance than JPEG. In the examples from figure 2 the effects of JPEG on the image consist of miscoloring, noise, and over quantization. The size of the file decreased to around 140 kilobytes. Comparing this to WEBP, JPEG did much worse; the WEBP image only suffered some discoloration and change in noise with the file size below JPEG at around 130 kilobytes. From figure 3, the graph clearly shows that WEBP is more efficient than JPEG and JPEG2000, especially when working with smaller images.
Prof. Tsachy Weissman conducted research testing the abilities of human lossy compression . The research group was able to prove that the method of saving the important parts of an image is a viable way of storing it. The team harnessed two humans; one would encode the image by finding image elements, which are the important parts of an image, similar to the original image. The encoder would then communicate the image to the decoder using an online chat. The decoder would then redraw the image. The original image and the composite image would then be compared based on their file size, quality, and semantic similarity. However, this method of testing contains many flaws and there were many other variables that should have been considered. The first flaw is that directions can be executed in different manners by different people, the directions are quite sematic. Another issue is the fact that the language used to describe the image is nowhere near to concise, leading to larger file sizes as a result of an imperfect editing language.
None of the current standard algorithms consider the semantics of an image, they use the same compression algorithm on all parts of the image, even if it is the focus of the image or the background of the image making the entire image lose quality. Current research already done is currently not a practical solution to harness the idea of a semantic image compression technique. This project aims to automate the method of semantic image compression by creating a deterministic algorithm to reconstruct images in a repeatable manner.
2. Overview of Methods and Image Reconstruction
The overall goal was to be able to capture every editing command executed by the describer so that the instructions would be perfectly precise and reproducible. Thus, a custom script-based editor was created. To create a composite image, the describer creates a Python script where he imports element, performs per-element manipulation, such as resize, rotate and blur, and also places the element in the correct location and order on the canvas. He can run the script at any time and see the composite so far and check his work. When the describer is satisfied with the composite image, he runs the script through a parser which compiles it into a custom simplified command language to reduce file size and writes the output to a text file. This compressed text file of compiled commands constitutes the compressed image. To decompress and display the image, the text file is decompressed, then run through a parser to turn it back into a Python script. The script is executed, the composite image is recreated and saved.
In the previous work on HAAC, the describer would compile links to elements and generate text instructions on how to place the elements to create a composite image. However, this scheme had issues of repeatability and accuracy. Because natural language is poorly suited to precise descriptions of how to edit images, it was necessary for the describer to receive a video feed of the reconstructor’s work so that the describer could provide instructions on how to refine the composite image. Thus, image compression could not be effectively carried out with this scheme unless the describer and reconstructor were both in live communication with each other.
The compression scheme not only eliminates the need for live communication between describer and reconstructor, but it also eliminates the need for a reconstructor entirely. Rather than using natural language, the describer constructs the composite image himself via a Python script. The commands are recorded and can be executed to create an identically recreate the composite image without any further communication with the describer. Thus, the scheme takes advantage of human qualitative understanding to find relevant elements, edit and place them and a computer’s quantitative precision to capture exactly how the elements were edited and where they were placed.
To create a composite image, the elements used must be on transparent backgrounds. However, during testing, few elements actually came originally pre-isolated but rather had to be isolated from their original backgrounds using GIMP’s freeform crop tool. The experimental design was based on an assumption that a completed implementation of this compression scheme, there would be a repository of image elements with transparent backgrounds, as is discussed further in “Future Goals”. Thus, the same images as the previous research group were used, the elements were classified and isolated from their backgrounds as a pre-processing step. Therefore, the describer used elements with transparent backgrounds when writing the script to generate the composite image.
Additionally, while in last year’s research, elements were referred to using links, this year they were referred to using file paths. This was used to simplify the testing scheme, however, it does not skew the data because a URL shortener could be used to generate links shorter than the average file path used during testing. For example, the suffix to a Bitly link is only seven characters long, while the average file path in testing was greater than seven characters long .
To create a consistent method of scripting for the purpose of encoding and decoding, as well as to further compress a parser was designed to compress scripts into a constructed syntax-specific based language. This involved creating an overall method as to which parse the code that displayed the composite image which would provide full functionality while also reducing the script as much as possible. To account for overloaded functions and various images the parser would take in the entire script of elements and output a single string with the element image id and its corresponding functions given identification values and the parameters converted to hexadecimal per each identification with no strict limit on the number of identifications and quantity of parameters. This aforementioned string is stored as the represented composite image and is executable to be decoded and provide reconstruction. A flow chart of this process can be seen in figure 4.
It was considered to build off of existing image editing software. The optimal software would capture user actions in the GUI and save them as a script, as this would make the describer’s workflow much easier than manually writing the script. The options were narrowed to GIMP, Adobe Photoshop and ImageMagick as candidates because they all include both GUIs and scripting utilities. ImageMagick was quickly eliminated, however, because it is built for per-image editing, not placing and editing multiple images on the same canvas . And while GIMP and Photoshop could store UI commands like macros  and had editing histories , macros lacked the necessary flexibility and editing histories lacked the necessary precision. Existing plugins to capture image editing commands were either incomplete or generated so much data that the edit log would be larger than the original image. Thus, a research-specific editing software was designed because Python’s Pillow  image editing library included all the necessary image editing functionality while affording more flexibility, including the ability to add a GUI on top of the original program in the future.
3. Experimental Results
Table 1 shows the compression algorithm used compared to the file sizes for each of the images. The Original column consists of the original image. The WEBP column consists of images compressed with WEBP with the quality set to zero. The JPEG column has images compressed with JPEG at quality zero. The HAAC column consists of the file sizes that the research group last year achieved. HAAC + Algorithm is the file size of just the plain python script to build the image. HAAC + Algorithm + Parser is the size of the python script after being compressed further. The bar graph provided is a visual aid to Table 1. To make it easier to read, the vertical axis is put on a logarithmic scale.
Table 2 shows the deterministic differences between a compressed image and the original image. The WEBP images are compressed with a 0 quality and the JPEG images are compressed to a quality of 1. To compare the images, both the original image and the compressed image were resized to 1000 by 1000 pixels. On the resized images, an algorithm will compare each of the pixel values and then score them if they are within a certain boundary of each other. The bar graph above is a visual representation of Table 2.
4. Conclusions and Further Research
Through our research, we were able to create a compression algorithm capable of compressing an image by approximately 99.9884 percent. The average file size of all the images we compressed came out to be a total of 318 bytes from the original 2 million. However, there is the drawback that our images were deterministically the least similar to the original image, being on average, around 25.22 percent different.
In the previous research on HAAC, elements were referenced by linking to images on the internet. However, links are unreliable due to the compressed file possibly linking to images that are no longer available or whose URL was altered. Additionally, there exists the issue of isolating elements from a larger image. Freeform cropping elements from their source images was a time-intensive part of the compression process. Both these issues would theoretically be solved with an element database designed for the purpose of semantic compression. This would allow the ability to provide links to elements within the database with the assurance that they will be available later. Furthermore, elements would be isolated from their backgrounds to avoid the laborious step of freeform cropping when creating composites.
Further research could be conducted on the feasibility of the aforementioned database. A cost-efficiency analysis should be conducted to measure how large the database must be in order to provide enough elements to contain consistently similar elements to the original image while minimizing overall size. If the database is not exceedingly large, clients could also store a local copy, thus removing the transmission cost of downloading all elements required to recreate a composite image.
To build even further on the idea of complete automation would be to have an AI work to both segment and form instructions on how to recreate the image. Segmentation software is already a very common type of software and has already accumulated much research. On the other hand, creating directions is a field that does not yet have much research and is very important to make HAAC fully automated.
Thank you to all those who supported this project. Thank you to the mentors: Micheal Herrera, Shubham Chandak, Kedar Tatwawadi, and Tsachy Weissman. Thank you to the past researchers: Ashutosh Bhown, Soham Mukherjee, Sean Yang, Irena Fischer-Hwang, Shubham Chandak, Kedar Tatwawadi, Judith Fan and, Tsachy Weissman.
 Weissman, T., Tatwawadi, K., Fischer-Hwang, I., Chandak, S., Yang, S., Mukherjee, S., & Bhown, A.(2019, June). Towards improved lossy image compression: Human image reconstruction with public-domain images. Retrieved from Stanford Compression Forum website: https://compression.stanford.edu/human-compression
 David Taubman and Michael Marcellin, JPEG2000 Image Compression Fundamentals, Standards and Practice, Springer Publishing Company, Incorporated, 2013.
 Gregory K Wallace, “The JPEG still picture compression standard,” Communications of the ACM, vol. 34, no. 4, pp. 30–44, 1991.
 “WebP,” https://developers.google.com/speed/webp/, Accessed: 2018-10-16.
 [Webpage compressor]. (n.d.). Retrieved from bitly website: https://bitly.com/
 ImageMagick [Computer software]. (n.d.). Retrieved from https://imagemagick.org/index.php. Kiel, S. (n.d.). Automate Editing. Retrieved from gimp.org website: https://www.gimp.org/tutorials/Automate_Editing_in_GIMP/
 Script recorder (a.k.a. macro recorder) [Online forum post]. (2001). Retrieved from gitlab.gnome.org website: https://gitlab.gnome.org/GNOME/gimp/issues/8
 Creating actions. (n.d.). Retrieved from helpx.adobe.com website: https://helpx.adobe.com/photoshop/using/creating-actions.html
 Clark, A. (n.d.). Pillow. Retrieved from pillow.readthedocs.i website: https://pillow.readthedocs.io/en/stable/
 Yu, F. (2017). ResNet-152 in Keras [Computer software]. Retrieved from https://gist.github.com/flyyufelix/7e2eafb149f72f4d38dd661882c554a6
 https://pythonprogramming.net/programming-k-nearest-neighbors-machine-learning-tutorial/. (n.d.). Retrieved from pythonprogramming.net website:
 PyPE-python-photo-editor [Computer software]. (2016). Retrieved from https://github.com/corentinm68/PyPE-python-photo-editor
Hannah Chau, Tyler Pauly, Darin Phan, Roshan Prabhakar, Lian Quach, and Christina Vo
Abstract:Many lossy image compressors have been created, yet they are not programmed knowing which parts of an image are most important to humans. In order to build a lossy compressor that would take human perception into consideration, components of facial images that are fundamental for making a face distinguishable among others needs to be determined. In this paper, two separate compression methods will be described with regards to their importance, the method in which they were designed, and their contribution to the future of facial imaging compression. The first experiment was programmed using the Amazon Mechanical Turk platform, where users participated in a human-to-human interactive game. In this game, one participant described a targeted facial image out of a dataset to another participant using text instructions. The results would then provide data regarding the accuracy of the description and the most useful facial features that helped the second participant guess correctly. The second experiment used this data to create a program that provided a compressed facial image while preserving the important aspects of the face. Data were collected applying these two qualifications and subsequently applied to construct a human-centric compression algorithm.
Recently, compression, the reduction of space needed to store data, has become a big issue because of the significant need for more storage on devices. Lossy compression, which includes image compressors such as JPEG and WebP, reduces the amount of data by removing certain parts that the program deems unnecessary. Although the file becomes significantly smaller in size, this process often results in blurry and pixelated pictures. Furthermore, these lossy compressors are not human-centric. This means that they are not focused on compressing the parts of images that people care the least about, while the parts that are most important to human perception are kept. We aim to build such a human-centric lossy compressor that would remove data and ultimately affect the picture quality the least according to human perception. This would still compress the amount of data the image uses, but decrease the distortion by only losing data that humans would not notice in the image.
2. Previous Works
This current study expands on related works, one being experiments done on human compression and image reconstruction (Bhown, etc., 2018). The experiments involved a person with an image sending text messages with links and descriptions to another person in order to recreate the photo using only images off of the web and photoshop. The results showed that when people from Amazon Mechanical Turk judged the human recreation versus the lossy compressor WebP, the human recreation scored higher on all but one trial. That one trial was of a recreation of an image with a face. No photos on the internet of other people’s faces or facial landmarks combined were able to match the original picture, and therefore, made the person unrecognizable. Because faces are more complex than most images, we wanted to use our current experiments to determine the significant parts of faces that made them unique from one another and how that could be conserved during compression.
We incorporated some aspects of experiments done in the past based on graphical communication with constraints into our own in order to conclude which facial landmarks were the most salient. (Bergmann, T., Dale, R., & Lupyan, G., 2013). In their study, they created an online game in which participants from Amazon Mechanical Turk were involved in one of two trials: speaking or listening. In the speaking trial, one person would have five seconds to draw a picture with their mouse of the facial image given to them. In the listening trial, a different person would be shown the animation of the first person sketching this image. They would have to guess which face the sketch matched between the two images they were given.
3. Models and Materials
For the first experiment, we created a human-to-human interactive game using participants from Amazon Mechanical Turk. The game included two roles: describer and guesser. Roles were randomly assigned to each player. Before the user started, we had them enter demographic information (age, gender, race, region) to observe if it correlated to their accuracy on differentiating between the faces of the data set. From there, participants were then given instructions about the overview of the game; the describer would see eight similar faces and receive a specific target image to transcribe to the guesser, who would then choose the face they believe the sketch to be out of the data set.
The describer was shown one of the eight photos that the computer randomizes, along with the whole dataset, and was given either a time constraint or a word limit. With the time limit, the describer was told to type a description of the face in the box below to help the guesser distinguish this particular face from the other seven within 25 seconds. The description did not have to be incomplete sentences, and the describer could not type after the timer hit zero. With the word limit, the describer typed a description into the box and was not be able type after the word limit hit 15 words.
The guesser was then shown the description and the whole dataset of the eight faces. Then, participants were asked to match one of the faces to the one they believe the description was describing. Once they selected a face, they clicked submit within the 15 second time limit. The program then revealed if the guesser guessed correctly. From there, if the guess was correct, the participant was asked which facial feature or part of the description was the most helpful in determining which picture correlated with the description and repeat.
The goal of this experiment was to find the rate-distortion curve for humans and determine if it was more efficient than JPEG rate-distortion curve. Additionally, the intent was to further comprehend which distinguishable facial features are necessary to preserve when compressing a facial image.
For our second experiment, we created a human-centric compression program that compresses facial images, while preserving facial landmarks for recognizability. In this program, we utilized HAAR cascades to detect the jurisdiction of key facial features. We then implemented an adapted version of the K-nearest Neighbor algorithm to detect which of the fields found by the cascades were legitimate. After determining the location of key features, the image is scaled down to a smaller resolution, then resized back to its original resolution. The result at this point is a fully pixelated image. Finally, the regions defined by the HAAR Cascades are overlaid onto the pixelated image. The product is a pixelated image with the main facial landmarks clearly defined. If no faces are present in the image, the program returns a fully pixelated image.
To test the validity of this algorithm, the image reconstructions have been rated by human scorers based off how recognizable the Facial image is and how satisfied they are with the aesthetic appeal of the overall reconstruction.
We have yet to come to a solid conclusion, as our project has not received the approval of the Institutional Review Board to complete phase one of the experiment. Also, although the data has been collected for the image reconstructions, we have yet to understand the data and alter our algorithm accordingly. So far, our most promising result is the prototyped compression algorithm which preserves key facial landmarks.
6. Future Directions
We feel that the weakest part of our algorithm is the compression, after determining landmarks. Currently, we are pixelating the image in efforts to reduce the image size. Future work requires us to research better lossy compression methods which can be applied to entire images. This will allow us to achieve a better compression ratio while preserving the aesthetic and recognizability of the image. Another future step would be to find a better verification process for the results of the HAAR cascades. We are currently using K nearest neighbor to compare each result with a handmade database of many different landmark pictures in a time, and memory, expensive process.
We thank Prof. Tsachy Weissman for guiding our group and helping us to start this whole study. We would also like to thank Sara Abdali for also being a mentor to our group and for the multiple informative discussions. Lastly, we would like to thank Judith E. Fan, Kedar Tatwawadi, and Sophia Kivelson for providing feedback and answering any questions that we needed answered as a means to getting our study started.
Bergmann, T., Dale, R., & Lupyan, G. (2013). The Impact of Communicative Constraints on the Emergence of a Graphical Communication System. Proceedings of the Annual Meeting of the Cognitive Science Society, 35.
Fan, Judith & Hawkins, Robert & Wu, Mike & Goodman, Noah. (2019). Pragmatic inference and visual abstraction enable contextual flexibility during visual communication.
Suchow, Jordan & Peterson, Joshua & L Griffiths, Thomas. (2018). Learning a face space for experiments on human identity.
Bhown, et al. “Towards Improved Lossy Image Compression: Human Image Reconstruction with Public-Domain Images.”
Monroe, Will & Hawkins, Robert & D. Goodman, Noah & Potts, Christopher. (2017). Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding. Transactions of the Association for Computational Linguistics.
Hawkins, R.X., & Sano, M. (2019). Disentangling contributions of visual information and interaction history in the formation of graphical conventions.
Human Compression. (n.d.). Retrieved from https://compression.stanford.edu/human-compression
Panos, et al. “ShapeGlot: Learning Language for Shape Differentiation.” ArXiv.org, 8 May 2019, arxiv.org/abs/1905.02925.
Angelicadass. (n.d.). Humanæ – work in progress. Retrieved from https://humanae.tumblr.com/
“HAAR Cascades.” Haar Cascades, alereimondo.no-ip.org/OpenCV/34.
Opencv. “Opencv/Opencv.” HAAR Cascades – Github, 7 Feb. 2018, github.com/opencv/opencv/tree/master/data/haar cascades.