Vivaan Mahtab, Ganesh Pimpale, Juan Aldama, and Paul Truong
Abstract:In 2018, Prof. Tsachy Weissman’s lab conducted research on using human semantics to compress an image. A ‘describer’ would provide links to image assets representing elements of the original image ; for example, given a picture of the savanna, the describer would provide images of grass and trees, among others. With the provided compiled components, the describer would use natural language to describe how a ‘reconstructor’ could combine the elements to create an image semantically similar to the original. The instructions and links constituted the compressed image. They called this approach Humans as Awesome Compressors (HAAC). While this compression scheme significantly reduced the size of an image, the natural language instructions were imprecise and could be shortened further. To solve this a scripting-based image editing software that tracks the actions of the describer and can execute them on the reconstructor side, thus improving precision. Future research could involve completely automated image compression through the use of artificial intelligence, and the implementation of compression using a neural network and creating a database of pre-segmented images. Code can be found at https://github.com/GaneshPimpale/AI-Compression.
1. Introduction and Problem Statement
There are a plethora of lossy and lossless compression algorithms which help in storing and communicating mass amounts of images. Different types of compression algorithms take advantage of different properties of an image to reduce file size but maintaining similarity to the original image. Some of the most common types of compression algorithms are JPEG , JPEG2000 , and WEBP . Most lossy compression techniques attempt to represent more image data with fewer data. For the JPEG compression algorithm, the image is converted to YCbCr, the image is downsampled, the DCT (Discrete Cosine Transform) algorithm is used to represent a finite array of values as multiple cosine functions, and the image is quantized to represent it in as few colors as possible. This process is very similar to WEBP, where the image is downsampled and brightness is decreased, but instead if using DCT, WEBP decomposes the image down into smaller blocks and then uses a prediction algorithm to represent more of the image with fewer values. These processes have major effects of certain properties of the image when a high compression ratio is in use. Namely: Blurring of the image, discoloring the image, and a decrease in resolution.
From figure 2 and 3, it is obvious to see that WEBP has significantly better performance than JPEG. In the examples from figure 2 the effects of JPEG on the image consist of miscoloring, noise, and over quantization. The size of the file decreased to around 140 kilobytes. Comparing this to WEBP, JPEG did much worse; the WEBP image only suffered some discoloration and change in noise with the file size below JPEG at around 130 kilobytes. From figure 3, the graph clearly shows that WEBP is more efficient than JPEG and JPEG2000, especially when working with smaller images.
Prof. Tsachy Weissman conducted research testing the abilities of human lossy compression . The research group was able to prove that the method of saving the important parts of an image is a viable way of storing it. The team harnessed two humans; one would encode the image by finding image elements, which are the important parts of an image, similar to the original image. The encoder would then communicate the image to the decoder using an online chat. The decoder would then redraw the image. The original image and the composite image would then be compared based on their file size, quality, and semantic similarity. However, this method of testing contains many flaws and there were many other variables that should have been considered. The first flaw is that directions can be executed in different manners by different people, the directions are quite sematic. Another issue is the fact that the language used to describe the image is nowhere near to concise, leading to larger file sizes as a result of an imperfect editing language.
None of the current standard algorithms consider the semantics of an image, they use the same compression algorithm on all parts of the image, even if it is the focus of the image or the background of the image making the entire image lose quality. Current research already done is currently not a practical solution to harness the idea of a semantic image compression technique. This project aims to automate the method of semantic image compression by creating a deterministic algorithm to reconstruct images in a repeatable manner.
2. Overview of Methods and Image Reconstruction
The overall goal was to be able to capture every editing command executed by the describer so that the instructions would be perfectly precise and reproducible. Thus, a custom script-based editor was created. To create a composite image, the describer creates a Python script where he imports element, performs per-element manipulation, such as resize, rotate and blur, and also places the element in the correct location and order on the canvas. He can run the script at any time and see the composite so far and check his work. When the describer is satisfied with the composite image, he runs the script through a parser which compiles it into a custom simplified command language to reduce file size and writes the output to a text file. This compressed text file of compiled commands constitutes the compressed image. To decompress and display the image, the text file is decompressed, then run through a parser to turn it back into a Python script. The script is executed, the composite image is recreated and saved.
In the previous work on HAAC, the describer would compile links to elements and generate text instructions on how to place the elements to create a composite image. However, this scheme had issues of repeatability and accuracy. Because natural language is poorly suited to precise descriptions of how to edit images, it was necessary for the describer to receive a video feed of the reconstructor’s work so that the describer could provide instructions on how to refine the composite image. Thus, image compression could not be effectively carried out with this scheme unless the describer and reconstructor were both in live communication with each other.
The compression scheme not only eliminates the need for live communication between describer and reconstructor, but it also eliminates the need for a reconstructor entirely. Rather than using natural language, the describer constructs the composite image himself via a Python script. The commands are recorded and can be executed to create an identically recreate the composite image without any further communication with the describer. Thus, the scheme takes advantage of human qualitative understanding to find relevant elements, edit and place them and a computer’s quantitative precision to capture exactly how the elements were edited and where they were placed.
To create a composite image, the elements used must be on transparent backgrounds. However, during testing, few elements actually came originally pre-isolated but rather had to be isolated from their original backgrounds using GIMP’s freeform crop tool. The experimental design was based on an assumption that a completed implementation of this compression scheme, there would be a repository of image elements with transparent backgrounds, as is discussed further in “Future Goals”. Thus, the same images as the previous research group were used, the elements were classified and isolated from their backgrounds as a pre-processing step. Therefore, the describer used elements with transparent backgrounds when writing the script to generate the composite image.
Additionally, while in last year’s research, elements were referred to using links, this year they were referred to using file paths. This was used to simplify the testing scheme, however, it does not skew the data because a URL shortener could be used to generate links shorter than the average file path used during testing. For example, the suffix to a Bitly link is only seven characters long, while the average file path in testing was greater than seven characters long .
To create a consistent method of scripting for the purpose of encoding and decoding, as well as to further compress a parser was designed to compress scripts into a constructed syntax-specific based language. This involved creating an overall method as to which parse the code that displayed the composite image which would provide full functionality while also reducing the script as much as possible. To account for overloaded functions and various images the parser would take in the entire script of elements and output a single string with the element image id and its corresponding functions given identification values and the parameters converted to hexadecimal per each identification with no strict limit on the number of identifications and quantity of parameters. This aforementioned string is stored as the represented composite image and is executable to be decoded and provide reconstruction. A flow chart of this process can be seen in figure 4.
It was considered to build off of existing image editing software. The optimal software would capture user actions in the GUI and save them as a script, as this would make the describer’s workflow much easier than manually writing the script. The options were narrowed to GIMP, Adobe Photoshop and ImageMagick as candidates because they all include both GUIs and scripting utilities. ImageMagick was quickly eliminated, however, because it is built for per-image editing, not placing and editing multiple images on the same canvas . And while GIMP and Photoshop could store UI commands like macros  and had editing histories , macros lacked the necessary flexibility and editing histories lacked the necessary precision. Existing plugins to capture image editing commands were either incomplete or generated so much data that the edit log would be larger than the original image. Thus, a research-specific editing software was designed because Python’s Pillow  image editing library included all the necessary image editing functionality while affording more flexibility, including the ability to add a GUI on top of the original program in the future.
3. Experimental Results
Table 1 shows the compression algorithm used compared to the file sizes for each of the images. The Original column consists of the original image. The WEBP column consists of images compressed with WEBP with the quality set to zero. The JPEG column has images compressed with JPEG at quality zero. The HAAC column consists of the file sizes that the research group last year achieved. HAAC + Algorithm is the file size of just the plain python script to build the image. HAAC + Algorithm + Parser is the size of the python script after being compressed further. The bar graph provided is a visual aid to Table 1. To make it easier to read, the vertical axis is put on a logarithmic scale.
Table 2 shows the deterministic differences between a compressed image and the original image. The WEBP images are compressed with a 0 quality and the JPEG images are compressed to a quality of 1. To compare the images, both the original image and the compressed image were resized to 1000 by 1000 pixels. On the resized images, an algorithm will compare each of the pixel values and then score them if they are within a certain boundary of each other. The bar graph above is a visual representation of Table 2.
4. Conclusions and Further Research
Through our research, we were able to create a compression algorithm capable of compressing an image by approximately 99.9884 percent. The average file size of all the images we compressed came out to be a total of 318 bytes from the original 2 million. However, there is the drawback that our images were deterministically the least similar to the original image, being on average, around 25.22 percent different.
In the previous research on HAAC, elements were referenced by linking to images on the internet. However, links are unreliable due to the compressed file possibly linking to images that are no longer available or whose URL was altered. Additionally, there exists the issue of isolating elements from a larger image. Freeform cropping elements from their source images was a time-intensive part of the compression process. Both these issues would theoretically be solved with an element database designed for the purpose of semantic compression. This would allow the ability to provide links to elements within the database with the assurance that they will be available later. Furthermore, elements would be isolated from their backgrounds to avoid the laborious step of freeform cropping when creating composites.
Further research could be conducted on the feasibility of the aforementioned database. A cost-efficiency analysis should be conducted to measure how large the database must be in order to provide enough elements to contain consistently similar elements to the original image while minimizing overall size. If the database is not exceedingly large, clients could also store a local copy, thus removing the transmission cost of downloading all elements required to recreate a composite image.
To build even further on the idea of complete automation would be to have an AI work to both segment and form instructions on how to recreate the image. Segmentation software is already a very common type of software and has already accumulated much research. On the other hand, creating directions is a field that does not yet have much research and is very important to make HAAC fully automated.
Thank you to all those who supported this project. Thank you to the mentors: Micheal Herrera, Shubham Chandak, Kedar Tatwawadi, and Tsachy Weissman. Thank you to the past researchers: Ashutosh Bhown, Soham Mukherjee, Sean Yang, Irena Fischer-Hwang, Shubham Chandak, Kedar Tatwawadi, Judith Fan and, Tsachy Weissman.
 Weissman, T., Tatwawadi, K., Fischer-Hwang, I., Chandak, S., Yang, S., Mukherjee, S., & Bhown, A.(2019, June). Towards improved lossy image compression: Human image reconstruction with public-domain images. Retrieved from Stanford Compression Forum website: https://compression.stanford.edu/human-compression
 David Taubman and Michael Marcellin, JPEG2000 Image Compression Fundamentals, Standards and Practice, Springer Publishing Company, Incorporated, 2013.
 Gregory K Wallace, “The JPEG still picture compression standard,” Communications of the ACM, vol. 34, no. 4, pp. 30–44, 1991.
 “WebP,” https://developers.google.com/speed/webp/, Accessed: 2018-10-16.
 [Webpage compressor]. (n.d.). Retrieved from bitly website: https://bitly.com/
 ImageMagick [Computer software]. (n.d.). Retrieved from https://imagemagick.org/index.php. Kiel, S. (n.d.). Automate Editing. Retrieved from gimp.org website: https://www.gimp.org/tutorials/Automate_Editing_in_GIMP/
 Script recorder (a.k.a. macro recorder) [Online forum post]. (2001). Retrieved from gitlab.gnome.org website: https://gitlab.gnome.org/GNOME/gimp/issues/8
 Creating actions. (n.d.). Retrieved from helpx.adobe.com website: https://helpx.adobe.com/photoshop/using/creating-actions.html
 Clark, A. (n.d.). Pillow. Retrieved from pillow.readthedocs.i website: https://pillow.readthedocs.io/en/stable/
 Yu, F. (2017). ResNet-152 in Keras [Computer software]. Retrieved from https://gist.github.com/flyyufelix/7e2eafb149f72f4d38dd661882c554a6
 https://pythonprogramming.net/programming-k-nearest-neighbors-machine-learning-tutorial/. (n.d.). Retrieved from pythonprogramming.net website:
 PyPE-python-photo-editor [Computer software]. (2016). Retrieved from https://github.com/corentinm68/PyPE-python-photo-editor