Deep Medical Image Encoding for Lossy Image Compression

EE376A (Winter 2019)
By Arjun Sawhney, Samir Sen and Meera Srinivasan


Medical image compression is crucial to the efficient storage and retrieval of medical images and important in reducing costs during medical processes. Standard image compression approaches are sometimes insufficient because medical images exhibit unique properties: they tend to be larger than typical images, and they encode highly sensitive, specific information that must remain high-resolution for medical professionals to make accurate diagnoses and inferences.

Additionally, medical images often share underlying characteristics. Similarity in the depiction of cells, nuclei, and tissue motivates us to explore deep learning architectures that are good candidates for efficiently learning these lower-level features and that show promise for lossy medical image compression without compromising the core integrity of the image. In this vein, our work is a preliminary investigation of the feasibility of a variety of deep learning approaches to medical image compression through image encoding. We implement and compare three encoding approaches: a traditional autoencoder, a convolutional autoencoder, and Deep Generative Compression [7].

Related Works:

Deep learning approaches have become common methods for image compression and for compression in general. Attempts initially started with basic unsupervised autoencoding models that use two layers to downsample to a smaller latent space and then try to recreate the original source; these have shown increased promise when extended to more complex models such as hybrid Generative Adversarial Networks (GANs) and Recurrent Neural Networks (RNNs), as shown by Agustsson et al. [1] and Toderici et al. [2].

However, much of the literature does not distinguish between classes and domains of images, and there is limited work on how variations of deep learning approaches perform on medical images, a focus of this work. Furthermore, much of the deep learning work on medical images adapts traditional feed-forward approaches, as outlined in the comparative study by Dabass et al. [3]. As such, we hypothesize that experimenting with convolutional autoencoders (which have shown promise in other image processing domains) could be particularly relevant.

Dataset and Task:

The dataset used in this study is from the Broad Institute [4], in particular the Broad Bioimage Benchmark Collection. The image set is described as being “of a Transfluor assay where an orphan GPCR is stably integrated into the b-arrestin GFP expressing U2OS cell line” and contains 144 8-bit TIFF images of size 512 by 512. Below is an example image from the dataset.

[Example image from the dataset]

For pre-processing, the ImageMagick package was used to resize the input images to 128 by 128 for computational tractability before the images were randomly partitioned into 100 training images and 44 test images. Note that we had explored sampling 128 by 128 windows of the larger images to augment our dataset, but to retain the completeness and integrity of the images, and based on discussion with our project mentor, we decided instead to resize the original images.
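We used ImageMagick for the actual resizing; purely as an illustration of the 512-to-128 reduction, here is a minimal block-average downsampling sketch in NumPy (ImageMagick's filtering is more sophisticated than this):

```python
import numpy as np

def downsample(img: np.ndarray, factor: int = 4) -> np.ndarray:
    """Block-average downsample, e.g. 512x512 -> 128x128 for factor=4."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

img = np.random.rand(512, 512)   # stand-in for one 8-bit TIFF image
small = downsample(img)          # shape (128, 128)
```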

Traditional Autoencoder:

The first encoding model was a traditional unsupervised, fully-connected autoencoder. To fit this architecture, the images, which had been converted into two-dimensional arrays of pixel values, were flattened into input vectors. These input vectors were fed into a deep autoencoder (with the architecture shown in the graphic below), whose encoding component learned a function g that downsampled the input vector x to a 2048-dimensional bottleneck representation g(x); the decoding component then learned a function f such that ||x - f(g(x))||^2 (the L2 loss) was minimized. Note that this model used the Adam optimization algorithm with learning rate 0.005 and ReLU activations for all layers. Full implementation details can be found in the code for the models, linked at the end of the post.

[Architecture of the traditional autoencoder]
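A toy-scale NumPy sketch of the forward pass and the objective (the layer sizes here are made up for illustration; the real model mapped 128*128 = 16384 inputs through a 2048-dimensional bottleneck, and Adam updated the weights to minimize the loss):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Toy sizes: a 1024-d "image" squeezed through a 64-d bottleneck and back.
sizes = [1024, 256, 64, 256, 1024]
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def autoencode(x):
    h = x
    for W in weights:
        h = relu(h @ W)          # forward pass: encoder g, then decoder f
    return h

x = rng.random(1024)             # flattened image
x_hat = autoencode(x)            # reconstruction f(g(x))
loss = np.mean((x - x_hat) ** 2) # the L2 objective, ||x - f(g(x))||^2 / n
```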

Convolutional Autoencoder:

The second encoding model was a deep convolutional model with three pairs of convolutional and pooling layers in the encoder and three pairs of convolutional and upsampling layers in the decoder; intuitively, it uses filters (windows of parameters) to interpret the image, downsample it to the latent space, and recreate it. To ensure a fair comparison between the two models, the bottleneck layer was constrained to have the same total number of dimensions as in Model 1, namely 16 by 16 by 8 (2048). The graphic below details the full convolutional model; consistent with Model 1, the Adam optimization algorithm was used with a learning rate of 0.005 and ReLU activations.

[Architecture of the convolutional autoencoder]
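To make the shape bookkeeping concrete, here is a minimal sketch of the encoder's three pooling stages and the decoder's matching upsampling stages (the convolutions themselves are omitted, and the 8 channels are assumed to come from a preceding convolution):

```python
import numpy as np

def pool2x2(x):
    """2x2 max-pooling on an (H, W, C) feature map."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample2x2(x):
    """Nearest-neighbour 2x upsampling, the shape-wise inverse of pool2x2."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

feat = np.random.rand(128, 128, 8)           # feature map after convolutions
z = pool2x2(pool2x2(pool2x2(feat)))          # encoder: (16, 16, 8) bottleneck
out = upsample2x2(upsample2x2(upsample2x2(z)))  # decoder path back to (128, 128, 8)
```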

Deep Compression GAN:

The third model deviates from the autoencoding approaches outlined above and uses a GAN-based architecture. We initially encode the image by performing convolutions on the original image set using techniques outlined in [7] (similar to those in the models above) and then update our encodings using the GAN. The proposed conditional GAN formulation differs from those above in that it feeds the learned compression of the medical image to a generator function, which predicts the pixels of the true medical image given its encoding. We train this generator with an accompanying discriminator function, which is given the fake and true images and computes the loss on the generated image using the traditional GAN loss [7]. Thus, we can optimize our encoder function by propagating this loss all the way back to the convolution weights. Below is the architecture of the Deep Compression GAN; note that an Adam optimizer was used with a learning rate of 0.002.

[Architecture of the Deep Compression GAN]
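As a sketch of the adversarial objective: the discriminator is pushed to score true images high and decoded images low, while the generator (and, through backpropagation, the encoder) is pushed to fool it. The scores below are hypothetical sigmoid outputs, not values from our model:

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy on sigmoid outputs, averaged over the batch."""
    return float(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean())

d_real = np.array([0.90, 0.80, 0.95])  # D(true image) for a batch of three
d_fake = np.array([0.20, 0.10, 0.30])  # D(G(encoding)) for the same batch

# Discriminator loss: push real scores toward 1 and fake scores toward 0.
d_loss = bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))
# Generator/encoder loss: push the decoded images' scores toward 1.
g_loss = bce(d_fake, np.ones_like(d_fake))
```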

Conclusion and Results:

The models were each evaluated on mean squared error (MSE), a metric that takes the squared L2 distance between the reconstructed and original images and divides it by the number of pixels.
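Concretely, for two images stored as arrays, the metric is just:

```python
import numpy as np

def mse(original, reconstruction):
    """Mean squared error: squared L2 distance divided by the pixel count."""
    diff = original.astype(float) - reconstruction.astype(float)
    return float(np.mean(diff ** 2))

a = np.zeros((128, 128))
b = np.full((128, 128), 2.0)
mse(a, a)  # 0.0 for a perfect reconstruction
mse(a, b)  # 4.0
```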

Overall, the results of the convolutional autoencoder are much more promising than those of the original autoencoder and the Deep Compression GAN. Interestingly, while the original autoencoder achieved the weakest results on both the training and validation sets, it showed the smallest gap between training and test results, perhaps indicating slight underfitting. Additionally, although the absolute mean squared error for the Deep Compression GAN was stronger than the traditional autoencoder's (though not as strong as the convolutional autoencoder's), it showed the weakest generalization between the train and test sets, perhaps indicating high variance. It is important to note, however, that GANs are notoriously unstable to train, so there could have been room for stronger results with more rigorous tuning. The full results table is shown below.

[Full results table]

The performance of the convolutional autoencoder is not surprising upon visual inspection of the dataset. The images share a lot of structure, with cells appearing in different areas of each image, so the translation invariance of convolution-based networks is clearly beneficial. Further, large portions of the images are almost entirely background, and the convolutional approach seems more robust in dealing with such cases.

We also notice that the GAN-based approach produced encodings that could be decoded with high accuracy with regard to the location of cell structures. The difficulty with the GAN model is that the generated images have high variability, resulting in cell images that are often more blurred than those from the other models. Another observation of interest is that the GAN's generated images contained regions of striations and pixelation. In future iterations, it would be beneficial to apply established GAN training heuristics, experiment with more sophisticated loss functions (such as those of the Wasserstein GAN and StackGAN), and train the model for longer on a larger repository of medical cell images.

Thus, we have investigated various ways to encode medical images for efficient and scalable storage across an ever-increasing volume of medical data, including 3D images and scans. By optimizing a generalizable encoding function for such image formats, we can run scripts that convert original full-pixel images into encoded bit streams, allowing a much larger set of medical images to be stored. Furthermore, the ability to build this mapping could greatly improve medical image search: health professionals could query a particular medical image for its nearest matches across the encoding space and their related patient records.

Future Work:

Considering that this work was very much a preliminary look at deep learning models for encoding medical images, there is large scope for additional work.

Firstly, it is important, given the motivation and applications of the work, to relate it directly to compression. Specifically, this would involve taking the encodings, performing lossless compression on them and then comparing the resulting file sizes against baselines such as JPEG.

Second, it is important to further experiment with different types of models for the encoding and compression problem. Specifically, variants on generative models like the Deep Compression GAN have shown much promise in the domain, so experimenting with the alternative generative approaches outlined by Tschannen et al. [5] and Duan et al. [6] would be a plausible next step.

As always, in a more immediate sense, it would be beneficial to experiment with hyperparameters to tune the approach for this particular dataset. Additionally, it would be worth testing the robustness and feasibility of the approach on other medical image datasets, such as X-ray images.

The code for our project can be found here:

Outreach Activity:

On March 14th, as part of EE376A, we attended the outreach compression night at Lucille M. Nixon Elementary School, where we tried to explain notions of image representation and similarity to elementary school students. In anticipation of the event, we planned an interactive activity along with a PowerPoint covering image representation and intuitive understandings of current methods of image comparison. A few of the slides are shown below.

Our intent was to use an activity to show students how notions of similarity or closeness between numbers extend to similar quantities among images, and to give them an appreciation for how images are represented in a computer. The exercises in the PowerPoint were designed to illuminate color image representation and lead into L1 and L2 distance definitions, with visualizations of these functions formed by the students using sour patch strands. Below are some of the slides that prompted these discussions.

[Slides on image representation and distance measures]

Our group then used an activity to showcase limitations of the L1 and L2 measures. The exercise involved a reference image and two adapted images; students were asked which of the adapted images was closest to the reference and were surprised by the outcome (caused by the large weighting on background pixels). We then used these limitations to motivate SSIM (structural similarity index) as an alternative approach and to get their thoughts on how they, as humans, compared the images.

[Slides from the similarity activity]

Overall, the students tended to be younger than expected, so we had to adapt our activity to individual students, often focusing on the concepts of image representation and how computers store color for the younger ones.

Outreach was a truly integral part of this class, and it was a wonderful experience to try to communicate our knowledge and enrich the lives of others through the event. We thank the wonderful course staff for the opportunity and for organizing it!

The best sour punch straw rendition of a lion by the kids! 🙂









Prequential Approach to Compression

EE376A (Winter 2019)

Jay Whang

Recent advances in deep learning have led to many powerful generative models. Among them, autoregressive models have shown much success, achieving state-of-the-art likelihoods on several benchmark datasets and generating high-quality samples. A particular application of interest for such models is compression: combining them with arithmetic coding, an entropy coder that relies on autoregressive conditionals p(x_{t+1}|\mathbf{x}^t). In this work, we present preliminary results on the compression rate of this scheme in the prequential setting, where the autoregressive predictor used by the arithmetic coder is trained during compression and decompression.
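To illustrate the prequential idea with something far simpler than a neural autoregressive model, here is a Laplace-smoothed Bernoulli predictor whose ideal arithmetic-coding cost is accumulated while the model trains on the symbols it has already coded. The decoder can run the same updates, so no side information is needed:

```python
import math

def prequential_code_length(bits):
    """Ideal code length (in bits) under an adaptive Bernoulli model p(x_{t+1}|x^t)."""
    counts = [1, 1]                    # Laplace pseudo-counts
    total = 0.0
    for b in bits:
        p = counts[b] / (counts[0] + counts[1])
        total += -math.log2(p)         # arithmetic-coding cost of this symbol
        counts[b] += 1                 # update the predictor after coding
    return total

prequential_code_length([0] * 100)     # far below 100 bits: the model adapts
prequential_code_length([0, 1] * 50)   # roughly 100 bits: nothing to exploit
```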

Report can be found at this link

Outreach Activity
Given the technical nature of my project, choosing what to present as part of my outreach effort at Nixon Elementary School was not an easy task. My primary goal was to bring something elementary school students would find exciting and engaging. This ultimately led me to give up on the idea of doing something related to compression; I simply could not find an interesting way to present it.

Instead, based on my personal experience of learning about generative models for the first time, I decided to show what kind of cool things generative models can create. The first example was easy to choose, since I already had a language model trained on novels as part of my project work. Rather than only showing texts (or even worse, fitting some 2D Gaussian and showing scatter plots of samples), however, I went on to add images as well — of handwriting (MNIST) and Pikachu.

One important aspect I wanted to display was the ability to use the latent space to interpolate between samples. This ultimately led me to use a GAN, so I could easily map noise to samples. This turned out to be a success, as many students and their parents were intrigued by the animation of smoothly-changing samples I generated. Overall, I’m glad I was able to showcase latent-variable models to the local community, and I hope this will lead to more interested future students.

My booth at the outreach event, showcasing samples from various modalities.
The event was packed with enthusiastic families.

Painting with the Theme of Information

EE376A (Winter 2019)

Author: Nate Gruver

Painting Description

Having done a lot of technical projects in the past two years, I was drawn to the opportunity to work on something a little more abstract and tangentially related to the course material. Painting is a growing hobby of mine, so I chose to craft a visual representation of information theory in the medium of paint. In particular, I strove to come at the theme of information from at least two angles:

  1. Information as communication through symbolic representation
  2. The modern abundance of data and the possibility of mass surveillance that this abundance creates

In the 18th Century, Jeremy Bentham conceived of an ideal prison in which a central tower monitors rooms surrounding it in a cylinder–see pictures below (top is an idealized drawing, bottom is a real panopticon built in Cuba):

Idealized Panopticon Design
Presidio Modelo in Cuba

The panopticon as a geometric form was a nice starting place for my painting for at least two reasons. First, it’s symbolic of centralized power and mass surveillance. Second, any visual depiction of one involves a grid of squares, which can then be used as containers for symbols. In my painting, I chose to break these squares into four triangles each and color them individually.

I conceived of a painting from the perspective of a cell in a panopticon, where a coloring of the grid encoded a message. I drew up the two initial sketches below:

An Initial Sketch
Another Sketch with Different Grid Breakdown

I ultimately decided on a design similar to the second sketch above and stretched a 66” by 56” canvas (roughly 5 ft by 5 ft). I opted for a large canvas to effectively create the visual appearance of looking out from a cell, and because working with large canvases can be a fun adventure in its own right.

Painting Process

There are many different kinds of paint, and for this purpose I needed something that would dry to a flat, durable surface so I could work on top of finished layers. As it turns out, regular house paint from the hardware store is best for this, so I mixed generic house paints with heavily pigmented artist-grade paints to create the ones I used on the canvas.

The first step was creating the grid-like pattern of the cells in black:

Creating the horizontal lines
The final horizontal lines
Creating the vertical lines

Once I had completed the grid, I began filling in the resulting quadrilaterals with triangles. Read clockwise around each quadrilateral and left-to-right across the quadrilaterals, the colors encode a secret message in base-8 ASCII.
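The digit stream itself is just ordinary octal ASCII; a small sketch shows how a message becomes base-8 digits (the painting's digit-to-color mapping stays its own secret, and the message here is a stand-in):

```python
def to_octal_digits(message):
    """Each ASCII character becomes three base-8 digits, e.g. 'H' = 0o110 -> 1, 1, 0."""
    digits = []
    for ch in message:
        digits.extend(int(d) for d in format(ord(ch), "03o"))
    return digits

to_octal_digits("HI")  # [1, 1, 0, 1, 1, 1]
```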

Applying the first color of triangles
Removing tape
The third and fourth colors
A version midway through

This process ended up being extremely involved (though rewarding), as there were approximately 1600 triangles to paint in. In fact, at the time of writing I haven’t fully completed the triangles and thus could not begin the process of painting the central tower. The pictures above, however, give a good taste of what the final painting will look like.

Outreach Summary

For my outreach event, I wanted to do something related to my painting, so I created a game roughly based on Pictionary. I also grew up on trick-based card games (ones involving bidding strategies), so I wanted to incorporate a bidding component into my game.

In my version of Pictionary, InfoPictionary, you wager how many squares of graph paper you need to communicate a clue. The squares of grid paper become bits, as you can either fill a square in totally black or leave it white. If your partner is able to guess the clue from the number of squares you decide on, you get a score bonus; otherwise, you get a score deduction. There are also more elaborate scoring rules in the style of Bridge, in which you must make hard-to-achieve bids in order to control a round of the game.

For the purposes of the outreach event, I converted the game to normal Pictionary for the younger kids, but explained the more elaborate version to older children and parents.


I’m extremely grateful to Tsachy Weissman for the opportunity to work on something a little more out-there for this class. It was really fun to read the textbook and watch lectures then just brainstorm wild painting ideas. It worked both sides of my brain in a truly refreshing way.

A Children’s Book on Entropy

EE376A (Winter 2019)

What I did

I wrote and illustrated a children’s book on entropy. The book is aimed at ~8-12 year olds.

Overview: Sister-and-brother duo Claude and Shannon get into antics together and discover how “entropy” works in the process. This playful book weaves the fundamental concepts of information theory into a story relatable to kids today. Claude and Shannon are curious, cunning kids who are sure to spark your young learner’s interest in entropy. It’s never too early to familiarize your little ones with the wonders of communication transmission! Give your child a head start grasping the stunning, simple concepts that underpin our technologically powered world.

Outreach activity

For the outreach activity, I read the book to the kids in the reading corner.

Information in DNA: Insertion and Deletion Codes

EE376A (Winter 2019)

In this post we’re interested in storing information in DNA. Why? Well, because it’s so small. No, seriously- DNA base pairs are something like a third of a nanometer (in length along the DNA strand). Meanwhile, bits of today’s hardware memory take space on the order of tens or hundreds of nanometers. That’s a pretty sizable jump in density, especially as building at smaller nodes becomes harder and harder. Ok, so putting data in DNA is a cool idea. However, DNA is susceptible to tricky kinds of errors: insertions and deletions. Let’s look at some of our information science knowledge and try to understand why these are so tricky.

The phrase “information science” already tells you that we’re going to be talking a lot about information. What information? Well, this is a central question to setting up any kind of communications scheme. Let’s say I tell you “hey, check out this blog post!” I have an idea in my head that might be described in terms of some number of neurons firing with certain frequency, and then the sentence leaves my mouth as a set of vibrations with particular length and frequency. Maybe these vibrations enter a text-to-speech program on my phone, and suddenly they’re 30 characters of text data, etc…

Ok, so to make things easier we define the setting we’re interested in, abstracting it to bits of data crossing a channel from sender to receiver. Now our problems pretty much always look something like this: I have n bits of message data and I want to encode and send it across the channel in a light-weight form, say k bits, so that you can decode what I sent. Usually we measure the rate of an encoding- how much information the receiver gets (k bits relative to n)- as well as how good the decoder is at its job.

Well cool! Except, we’re already kind of cheating! See, it’s important to our evaluations that we not be sneaking any information through a back door- and we are, in the form of high-level synchronization. Synchronization as a word describes things that act at the same time. In this context, it refers to the sender and receiver both knowing that the message is being sent, “message-level” synchronization. “Bit-level” synchronization, similarly, refers to the receiver knowing which received bits correspond to which sent encoded bits.

Message-level synchronization is often a reasonable real-world assumption, and bit-level synchronization can be accomplished in most contexts by timing recovery techniques. Such techniques give a picture of the sent encoded bits as well as timing gaps corresponding to erasures. However, the “reception” of DNA is disconnected from any kind of symbol timing: a portion of a message received as “AA” may have come from an encoded string “AAA”, “ACA”, “ATA”, “AGA”, or just “AA”. This gives us the concept of deletions, a form of synchronization error.

Deletions and insertions (the “opposite” kind of synchronization error) are both so tricky because they invalidate assumptions we usually like to make about information shared between sender and receiver. Synchronization codes have to ensure that instances of these errors on different members of their codebooks will still leave them unambiguous- even the single example in the previous paragraph should already give the reader a sense of how hard this task is.
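A quick script makes the ambiguity concrete: even a two-symbol received string has ten distinct one-symbol-longer sources over the DNA alphabet (the function name here is ours, just for illustration):

```python
def one_insertion_preimages(received, alphabet="ACGT"):
    """All strings one symbol longer than `received` that yield it after one deletion."""
    return {received[:i] + s + received[i:]
            for i in range(len(received) + 1)
            for s in alphabet}

candidates = one_insertion_preimages("AA")
# {"AAA", "ACA", "ATA", "AGA", "CAA", "GAA", "TAA", "AAC", "AAG", "AAT"}
```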

One way of approaching the problem, as taken by Wang, 2012, is with a concatenated code. The general idea of a concatenated code is to encode and decode twice: once with an outer code giving some set of properties, and again with an inner code giving some other set. In this case, the outer code is some standard “good” code like an LDPC or turbo code, making the usual assumptions of bit-level synchronization. We can then satisfy these assumptions with an inner synchronization code.

Figure 3.1, Wang, 2012- Π and Π^-1 are interleaving/deinterleaving blocks

In our EE376a project we follow Wang in implementing simple marker synchronization codes. In such a code, a particular sequence of “pilot” bits is inserted at a set interval throughout the input. Both sender and receiver know what and where these bits are, so the receiver can use them to help detect where insertions and deletions occurred, and then pass this information along to the decoder for the outer code.
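A toy sketch of such a marker code (the pilot pattern and interval below are made up for illustration, not Wang's parameters):

```python
MARKER = [0, 1]   # hypothetical pilot pattern
INTERVAL = 4      # insert the pilots after every 4 data bits

def add_markers(bits):
    out = []
    for i, b in enumerate(bits):
        out.append(b)
        if (i + 1) % INTERVAL == 0:
            out.extend(MARKER)
    return out

def strip_markers(received):
    """Remove pilots; a wrong pilot hints that an insertion/deletion happened nearby."""
    data, suspects = [], []
    i = 0
    while i < len(received):
        data.extend(received[i:i + INTERVAL])
        pilot = received[i + INTERVAL:i + INTERVAL + len(MARKER)]
        if pilot and pilot != MARKER:
            suspects.append(len(data))   # soft information for the outer decoder
        i += INTERVAL + len(MARKER)
    return data, suspects

sent = add_markers([1, 1, 0, 0, 1, 0, 1, 1])
strip_markers(sent)                  # clean channel: original bits, no suspects
strip_markers(sent[:2] + sent[3:])   # one deletion: wrong pilots flag suspect regions
```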

While our marker code implementations are sufficient to use in such a scheme, we continue to work on a complete concatenated encoding/decoding model to conduct error correction analysis. For future work, we hope to use these full models to replicate some of Wang’s simulated results, and if possible to apply the scheme in actual DNA coding in the lab.

While working on this project, we had the opportunity to engage in the information science outreach event at Nixon Elementary. Some of the concepts of synchronization and concatenated codes were likely to be a bit tricky for our audience, but we took the opportunity to teach about DNA as a form of information storage, with a focus on recovering information from incomplete sets.

For an activity, we gave the kids pictures of animals with erasures and asked them to draw what they thought should go in the missing parts. To some of them this was just a fun kind of puzzle game, but even so we’re happy that it got them thinking about recovering information from corrupted messages. For others, the activity seemed to prompt some really great questions about DNA’s role in conveying information in the body! One favorite was “how do the [ribosomes] know if the DNA is [mutated too much]?” which sparked an interesting discussion about probabilistic processes with several of the adults around as well.

Overall, this project was a great opportunity to question assumptions and learn new things, both about DNA and about synchronization codes. And hopefully it has been through this blog post for you as well! Next time you see a communication scheme, ask yourself how much implicit synchronization it relies on- and how it ensures that synchronization. And, for more reading on synchronization codes, check out some of our sources:
M. Mitzenmacher, 2019:
F. Wang, 2012: