Hannah Chau, Tyler Pauly, Darin Phan, Roshan Prabhakar, Lian Quach, and Christina Vo
Abstract: Many lossy image compressors have been created, yet they are not designed with any knowledge of which parts of an image matter most to humans. To build a lossy compressor that takes human perception into account, we first need to determine which components of a facial image are fundamental to making a face distinguishable from others. In this paper, we describe two experiments: why each is important, how it was designed, and how it contributes to the future of facial image compression. The first experiment was a human-to-human interactive game run on the Amazon Mechanical Turk platform. In this game, one participant used text instructions to describe a target facial image, chosen from a dataset, to another participant. The results provide data on how accurate the descriptions were and which facial features most helped the second participant guess correctly. The second experiment used these data to create a program that compresses a facial image while preserving the important aspects of the face. Data were collected from these two experiments and applied to construct a human-centric compression algorithm.
1. Introduction
Compression, the reduction of the space needed to store data, has recently become increasingly important because devices need ever more storage. Lossy compressors, including image compressors such as JPEG and WebP, reduce the amount of data by removing parts of it that the program deems unnecessary. Although the file becomes significantly smaller, this process often results in blurry and pixelated pictures. Furthermore, these lossy compressors are not human-centric: they are not designed to discard the parts of an image that people care least about while keeping the parts that matter most to human perception. We aim to build a human-centric lossy compressor that removes data in the way that least degrades picture quality as judged by human perception. Such a compressor would still reduce the amount of data an image uses, but it would reduce perceived distortion by discarding only data that humans are unlikely to notice.
2. Previous Works
This study builds on related work, including experiments on human compression and image reconstruction (Bhown et al., 2018). In those experiments, a person holding an image sent text messages containing links and descriptions to another person, who tried to recreate the photo using only images found on the web and Photoshop. When workers on Amazon Mechanical Turk compared the human recreations with images compressed by the lossy compressor WebP, the human recreation scored higher on every trial but one. The exception was the recreation of an image containing a face: no combination of other people's faces or facial landmarks found online could match the original, so the person in the recreation was unrecognizable. Because faces are harder to reconstruct than most image content, we designed our experiments to determine which parts of a face make it distinguishable from others and how those parts could be preserved during compression.
To determine which facial landmarks are the most salient, we adapted aspects of earlier experiments on graphical communication under constraints (Bergmann, Dale, & Lupyan, 2013). In that study, the authors created an online game in which participants from Amazon Mechanical Turk took part in one of two kinds of trials: speaking or listening. In a speaking trial, a participant had five seconds to draw, with their mouse, the facial image they were given. In a listening trial, a different participant watched an animation of that sketch being drawn and had to guess which of two facial images it matched.
3. Models and Materials
For the first experiment, we created a human-to-human interactive game using participants from Amazon Mechanical Turk. The game had two roles, describer and guesser, and each player was assigned a role at random. Before starting, each participant entered demographic information (age, gender, race, region) so that we could check whether it correlated with their accuracy at distinguishing the faces in the dataset. Participants were then given an overview of the game: the describer would see eight similar faces and be assigned one target face to describe to the guesser, who would then choose, from the same eight faces, the one they believed the description referred to.
The describer was shown the target photo, chosen at random by the computer from the eight, along with the whole set, and was given either a time limit or a word limit. Under the time limit, the describer had 25 seconds to type a description into a text box that would help the guesser distinguish the target face from the other seven. The description did not have to be in complete sentences, and the describer could not type after the timer reached zero. Under the word limit, the describer typed a description into the box and could not type beyond 15 words.
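The two describer constraints can be illustrated with a small Python sketch. It is not the game's actual code, which is not shown here; the function names and the snapshot format (seconds since the round started, paired with the text box contents) are hypothetical.

```python
from typing import List, Tuple


def apply_word_limit(text: str, limit: int = 15) -> str:
    """Word-limit condition: keep only the first `limit` whitespace-separated
    words. Runs of whitespace are collapsed to single spaces."""
    return " ".join(text.split()[:limit])


def apply_time_limit(snapshots: List[Tuple[float, str]], limit_s: float = 25.0) -> str:
    """Time-limit condition: given (seconds since start, box contents) snapshots
    in time order, return what the box held when the timer reached zero."""
    accepted = ""
    for elapsed, text in snapshots:
        if elapsed > limit_s:
            break  # typing after the deadline is ignored
        accepted = text
    return accepted
```

For example, `apply_time_limit([(5.0, "big"), (20.0, "big nose"), (26.0, "big nose, glasses")])` keeps `"big nose"`, because the last edit arrived after the 25-second deadline.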
The guesser was then shown the description and the whole set of eight faces, and asked to select the face they believed the description referred to. The guesser had 15 seconds to select a face and click Submit. The program then revealed whether the guess was correct. If it was, the guesser was asked which facial feature or part of the description had been most helpful in identifying the matching picture, and the game then repeated.
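Answers to the "most helpful feature" question can be aggregated into a ranking of facial features. The sketch below assumes each round is recorded as a hypothetical `(guess_was_correct, feature_named)` pair and that answers are short free-text labels; both the record format and the case/whitespace normalization are assumptions, not part of the described game.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def helpful_feature_ranking(
    rounds: Iterable[Tuple[bool, Optional[str]]]
) -> List[Tuple[str, int]]:
    """Rank facial features by how often guessers named them as most helpful.

    The question is only asked after a correct guess, so rounds that were
    incorrect (or have no answer) are skipped. Answers are lower-cased and
    stripped so that " Eyes " and "eyes" count as the same feature.
    """
    counts = Counter(
        feature.strip().lower()
        for correct, feature in rounds
        if correct and feature
    )
    return counts.most_common()  # most frequently named feature first
```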
The goal of this experiment was to estimate a rate-distortion curve for humans describing faces and to determine whether it is more efficient than the rate-distortion curve of JPEG. A second goal was to better understand which distinguishing facial features must be preserved when compressing a facial image.
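A human rate-distortion curve needs a rate and a distortion for each trial. One simple way to define them, which is our assumption rather than something specified here, takes the rate as the size of the raw description in bits (8 bits per UTF-8 byte, with no entropy coding) and the distortion as the guessers' error rate, averaged over trials whose rates fall in the same bin. The `Trial` record and the `bin_bits` width are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Trial:
    description: str  # what the describer typed
    correct: bool     # whether the guesser picked the target face


def description_bits(text: str) -> int:
    """Rate of one description: raw UTF-8 size in bits (uncompressed)."""
    return 8 * len(text.encode("utf-8"))


def rate_distortion_points(trials: List[Trial], bin_bits: int = 80) -> List[Tuple[float, float]]:
    """Group trials into rate bins of width `bin_bits` and return
    (mean rate in bits, error rate) for each bin, ordered by rate."""
    bins: Dict[int, List[Trial]] = defaultdict(list)
    for trial in trials:
        bins[description_bits(trial.description) // bin_bits].append(trial)
    points = []
    for key in sorted(bins):
        group = bins[key]
        mean_bits = sum(description_bits(t.description) for t in group) / len(group)
        error_rate = sum(not t.correct for t in group) / len(group)
        points.append((mean_bits, error_rate))
    return points
```

Comparing such a curve with JPEG would also require a distortion measure for JPEG images on the same scale (for example, guessing accuracy on compressed faces), which this sketch does not address.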
For our second experiment, we created a human-centric compression program that compresses facial images while preserving facial landmarks for recognizability. In this program, we used Haar cascades to detect the regions containing key facial features. We then used an adapted version of the k-nearest neighbors (KNN) algorithm to determine which of the regions found by the cascades were genuine facial features. After the key features are located, the image is scaled down to a lower resolution and then resized back to its original resolution, which produces a fully pixelated image. Finally, the regions found by the Haar cascades are overlaid onto the pixelated image. The result is a pixelated image in which the main facial landmarks remain clearly defined. If no faces are present in the image, the program returns a fully pixelated image.
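The pixelate-and-overlay step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program: the image is a grayscale grid of integers, "scaling down" averages each `block`-by-`block` tile, "resizing back" repeats that average over the tile (nearest-neighbour upscaling), and landmark regions are `(x, y, width, height)` boxes, the layout that OpenCV's `CascadeClassifier.detectMultiScale` returns. The function names and the `block` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def pixelate(image: Image, block: int) -> Image:
    """Average each block-by-block tile (downscale), then fill the whole
    tile with that average (nearest-neighbour upscale to original size)."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y0 in range(0, height, block):
        for x0 in range(0, width, block):
            ys = range(y0, min(y0 + block, height))
            xs = range(x0, min(x0 + block, width))
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def overlay_regions(original: Image, background: Image, boxes: Sequence[Box]) -> Image:
    """Copy the original pixels inside each box onto a copy of the background."""
    out = [row[:] for row in background]
    for x, y, w, h in boxes:
        for row in range(y, min(y + h, len(out))):
            out[row][x:x + w] = original[row][x:x + w]
    return out


def compress_face(image: Image, landmark_boxes: Sequence[Box], block: int = 8) -> Image:
    """Pixelate the whole image, then restore the verified landmark regions.
    With no boxes (no face found), the result is the fully pixelated image."""
    return overlay_regions(image, pixelate(image, block), landmark_boxes)
```

In a full pipeline, the two resizes would more likely be done with an image library, for example OpenCV's `cv2.resize` with `INTER_AREA` for shrinking and `INTER_NEAREST` for enlarging.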
To evaluate this algorithm, human scorers rated the reconstructed images on how recognizable the face was and how satisfied they were with the aesthetic appeal of the overall reconstruction.
We have not yet reached a firm conclusion, because our project has not yet received Institutional Review Board approval to complete phase one of the experiment. In addition, although data have been collected on the image reconstructions, we have yet to analyze them and adjust our algorithm accordingly. So far, our most promising result is the prototype compression algorithm, which preserves key facial landmarks.
6. Future Directions
We consider the weakest part of our algorithm to be the compression step that follows landmark detection. Currently, we pixelate the image to reduce its size. Future work requires researching better lossy compression methods that can be applied to entire images, which should let us achieve a better compression ratio while preserving the aesthetic appeal and recognizability of the image. Another future step is to find a better way to verify the results of the Haar cascades. We currently use k-nearest neighbors to compare each detection with a handmade database of many different landmark pictures, a process that is expensive in both time and memory.
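The verification step can be illustrated with a plain k-nearest-neighbour check. The adaptation actually used is not described, so this sketch makes its own assumptions: each cascade detection has already been turned into a fixed-length feature vector (for instance, a resized and flattened pixel patch), the database holds vectors for known landmark pictures, and a detection is accepted when the mean distance to its `k` nearest examples is below a threshold. The function name and the `max_mean_distance` threshold are hypothetical. Each call computes a distance to every stored example, so the cost grows with the size of the database, which is consistent with the time and memory expense noted for the current approach.

```python
import heapq
import math
from typing import List, Sequence


def knn_is_landmark(candidate: Sequence[float],
                    database: List[Sequence[float]],
                    k: int = 3,
                    max_mean_distance: float = 1.0) -> bool:
    """Accept a detection if its k nearest landmark examples are close enough.

    Brute force: one Euclidean distance per stored example on every call.
    """
    if not database:
        return False  # nothing to compare against, so reject
    nearest = heapq.nsmallest(k, (math.dist(candidate, ex) for ex in database))
    return sum(nearest) / len(nearest) <= max_mean_distance
```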
We thank Prof. Tsachy Weissman for guiding our group and helping us start this study. We would also like to thank Sara Abdali for mentoring our group and for many informative discussions. Lastly, we would like to thank Judith E. Fan, Kedar Tatwawadi, and Sophia Kivelson for providing feedback and answering our questions as we got the study started.
Bergmann, T., Dale, R., & Lupyan, G. (2013). The Impact of Communicative Constraints on the Emergence of a Graphical Communication System. Proceedings of the Annual Meeting of the Cognitive Science Society, 35.
Fan, J., Hawkins, R., Wu, M., & Goodman, N. (2019). Pragmatic inference and visual abstraction enable contextual flexibility during visual communication.
Suchow, J., Peterson, J., & Griffiths, T. L. (2018). Learning a face space for experiments on human identity.
Bhown, et al. (2018). Towards improved lossy image compression: Human image reconstruction with public-domain images.
Monroe, W., Hawkins, R., Goodman, N. D., & Potts, C. (2017). Colors in context: A pragmatic neural model for grounded language understanding. Transactions of the Association for Computational Linguistics.
Hawkins, R. X., & Sano, M. (2019). Disentangling contributions of visual information and interaction history in the formation of graphical conventions.
Human Compression. (n.d.). Retrieved from https://compression.stanford.edu/human-compression
Achlioptas, P., et al. (2019, May 8). ShapeGlot: Learning language for shape differentiation. arXiv. Retrieved from arxiv.org/abs/1905.02925
Angelicadass. (n.d.). Humanæ – work in progress. Retrieved from https://humanae.tumblr.com/
HAAR Cascades. (n.d.). Retrieved from alereimondo.no-ip.org/OpenCV/34
OpenCV. (2018, February 7). opencv/opencv: HAAR cascades. GitHub. Retrieved from github.com/opencv/opencv/tree/master/data/haarcascades