Deep Medical Image Encoding for Lossy Image Compression

EE376A (Winter 2019)
By Arjun Sawhney, Samir Sen and Meera Srinivasan


Medical image compression is crucial to the efficient storage and retrieval of medical images and important in reducing costs during medical processes. Standard image compression approaches are sometimes insufficient because medical images exhibit unique properties: they tend to be larger than other images, since they often encode highly sensitive and specific information that must be stored at high resolution for medical professionals to make accurate diagnoses and inferences.

Medical images also often share underlying characteristics. The similarity in how cells, nuclei and tissue are depicted motivates us to explore deep learning architectures, which are good candidates for efficiently learning these lower-level features and show promise for lossy medical image compression that does not compromise the core integrity of the image. In this vein, our work is intended as a preliminary investigation of the feasibility of several deep learning approaches to medical image compression through image encoding. Specifically, we implement and compare three encoding approaches: a traditional autoencoder, a convolutional autoencoder and Deep Generative Compression [7].

Related Works:

Deep learning has become a common approach to image compression, and to compression in general. Early attempts used basic (unsupervised) autoencoders with two layers that downsample the input to a smaller latent space and then try to recreate the original source. These approaches have since shown increased promise when extended to more complex models such as hybrid Generative Adversarial Networks (GANs) and Recurrent Neural Networks (RNNs), as shown by Agustsson et al. [1] and Toderici et al. [2] respectively.

However, much of the literature does not distinguish between classes and domains of images, and there is limited literature on how variations of deep learning approaches work on medical images, the focus of this work. Furthermore, as the comparative study by Dabass et al. [3] outlines, much of the work on deep learning for medical images adapts traditional feed-forward approaches. We therefore hypothesize that experimenting with convolutional autoencoders, which have shown promise in other image processing domains, could be particularly relevant.

Dataset and Task:

The dataset used in this study comes from the Broad Bioimage Benchmark Collection of the Broad Institute [4]. The image set is described as “of a Transfluor assay where an orphan GPCR is stably integrated into the b-arrestin GFP expressing U2OS cell line” and contains 144 8-bit TIFF images of size 512 by 512. An example image from the dataset is shown below.

[Figure: example image from the dataset]

For pre-processing, the ImageMagick package was used to resize the input images to 128 by 128 for computational tractability, after which the images were randomly partitioned into 100 training images and 44 test images. We had also explored sampling 128 by 128 windows from the larger images to augment our dataset, but to retain the completeness and integrity of the images, and based on discussion with our project mentor, we decided to downsample the full images instead.
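The random partition step can be sketched with the standard library as below. This is a minimal sketch, not the project's code: the helper name `split_indices` and the fixed seed are assumptions added for reproducibility, and the ImageMagick resizing step is not reproduced here.

```python
import random

def split_indices(n_images, n_train, seed=0):
    """Randomly partition image indices into disjoint train and test sets.

    Hypothetical helper; the seed is an assumption (the post does not state one).
    """
    rng = random.Random(seed)
    indices = list(range(n_images))
    rng.shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

# 144 resized images -> 100 training and 44 test images, as described above
train_idx, test_idx = split_indices(144, 100)
print(len(train_idx), len(test_idx))  # 100 44
```

Splitting indices rather than image arrays keeps the sketch independent of how the TIFF files are loaded.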

Traditional Autoencoder:

The first encoding model was a traditional unsupervised, fully connected autoencoder. To fit this architecture, the images, which had been converted into two-dimensional arrays of pixel values, were flattened into input vectors (128 × 128 = 16,384 values each). These vectors were fed into a deep autoencoder (with the architecture shown in the graphic below) that learned an encoding function g mapping an input vector x to a 2048-dimensional bottleneck representation g(x), and a decoding function f such that the reconstruction loss ||x - f(g(x))||^2 (the squared L2 loss) was minimized. The model was trained with the Adam optimizer (learning rate 0.005) and used ReLU activations in all layers. Full implementation details are available in the code linked at the end of the post.

[Figure: architecture of the traditional autoencoder]
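To make the flatten, encode, decode and loss steps concrete, here is a minimal NumPy sketch of a forward pass. It assumes a single hidden (bottleneck) layer with He-style random initialisation; the actual model has more layers, and only the ReLU activations, the 2048-dimensional bottleneck and the squared L2 loss come from the post. Training with Adam is left to a deep learning framework and is not shown, and the demo uses small sizes so it runs quickly.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_autoencoder(input_dim, bottleneck_dim, seed=0):
    """Hypothetical one-bottleneck autoencoder: input -> bottleneck -> input."""
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0.0, np.sqrt(2.0 / input_dim), (input_dim, bottleneck_dim)),
        "b_enc": np.zeros(bottleneck_dim),
        "W_dec": rng.normal(0.0, np.sqrt(2.0 / bottleneck_dim), (bottleneck_dim, input_dim)),
        "b_dec": np.zeros(input_dim),
    }

def encode(params, x):
    # g(x): flattened image vector -> bottleneck code
    return relu(x @ params["W_enc"] + params["b_enc"])

def decode(params, code):
    # f(code): bottleneck code -> reconstructed image vector (ReLU output, as in the post)
    return relu(code @ params["W_dec"] + params["b_dec"])

def l2_loss(params, x):
    # ||x - f(g(x))||^2, summed over all pixels (and over the batch if x is 2-D)
    recon = decode(params, encode(params, x))
    return float(np.sum((x - recon) ** 2))

# Demo sizes only; the post's model uses input_dim = 128 * 128 = 16384 and a 2048-d bottleneck.
image = np.random.default_rng(1).random((16, 16))  # stand-in for a resized image
x = image.reshape(-1)                              # flatten the 2-D pixel array
params = init_autoencoder(x.size, 32)
print(encode(params, x).shape, l2_loss(params, x))
```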

Convolutional Autoencoder:

The second encoding model was a deep convolutional autoencoder with 3 pairs of convolutional and pooling layers in the encoder and 3 pairs of convolutional and upsampling layers in the decoder. Intuitively, the model uses learned filters (small windows of parameters) to interpret the image and downsample it to the latent space, and then to recreate it from that space. To ensure a fair comparison between the two models, the bottleneck layer was constrained to have the same total number of dimensions as that of the traditional autoencoder: 16 by 16 by 8 = 2048. The graphic below details the full convolutional model; consistent with the traditional autoencoder, it was trained with the Adam optimizer (learning rate 0.005) and used ReLU activations.

[Figure: architecture of the convolutional autoencoder]
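The shape arithmetic behind the matched bottleneck can be checked in a few lines. This sketch assumes 'same'-padded convolutions (which keep the side length), 2x2 pooling with stride 2 and 2x upsampling; these layer settings are assumptions rather than values read from the figure.

```python
def side_after_pooling(side, n_layers, factor=2):
    """Side length after n_layers of stride-2 pooling ('same' convs keep the size)."""
    for _ in range(n_layers):
        side //= factor
    return side

def side_after_upsampling(side, n_layers, factor=2):
    """Side length after n_layers of 2x upsampling."""
    return side * factor ** n_layers

input_side = 128
latent_side = side_after_pooling(input_side, 3)     # 128 -> 64 -> 32 -> 16
latent_channels = 8                                 # filters in the last encoder conv
latent_dims = latent_side * latent_side * latent_channels
dense_bottleneck = 2048                             # traditional autoencoder's bottleneck

print(latent_side, latent_dims, latent_dims == dense_bottleneck)  # 16 2048 True
print(side_after_upsampling(latent_side, 3))                      # 128
print(input_side * input_side // latent_dims)                     # 8 input pixels per latent value
```

Three halvings are exactly what is needed to go from 128 to 16, which is why three pooling layers pair naturally with an 8-channel bottleneck here.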

Deep Compression GAN:

The third model departs from the autoencoding approaches above and uses a GAN-based architecture. We first encode each image by applying convolutions, using techniques outlined in [7] (similar to those in the models above), and then refine these encodings through adversarial training. This conditional GAN formulation differs from the models above in that the learned compression of the medical image is fed to a generator, which predicts the pixels of the true medical image given its encoding. The generator is trained against a discriminator that is shown both generated and true images and computes the standard adversarial (binary cross-entropy) GAN loss on the generated image [7]; with an optimal discriminator, this objective corresponds to minimizing the Jensen-Shannon divergence between the distributions of generated and true images. Because this loss is propagated all the way back through the generator to the convolution weights, it also optimizes the encoder. The architecture of the Deep Compression GAN is shown below; an Adam optimizer with a learning rate of 0.002 was used.

[Figure: architecture of the Deep Compression GAN]
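The adversarial loss described above can be sketched from discriminator outputs alone. This is a minimal illustration of the standard binary cross-entropy GAN losses, using the common non-saturating generator variant; whether [7] uses exactly this variant, or adds a pixel-wise distortion term, is not stated in the post, so treat it as an assumption. The probabilities in the demo are made up.

```python
import math

EPS = 1e-12  # keeps log() finite when a probability is exactly 0 or 1

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy loss for the discriminator.

    d_real: discriminator probabilities that true images are real.
    d_fake: discriminator probabilities that generated images are real.
    """
    real_term = sum(-math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: rewards generated images the discriminator calls real.

    Since the generator's input is the learned encoding, gradients of this loss
    also reach the encoder's convolution weights.
    """
    return sum(-math.log(p + EPS) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs for two true and two generated images.
print(discriminator_loss([0.9, 0.8], [0.3, 0.1]))
print(generator_loss([0.3, 0.1]))
```

A discriminator that outputs 0.5 everywhere gives a loss of 2 ln 2, the value reached when generated and true images are indistinguishable to it.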

Conclusion and Results:

Each model was evaluated on mean squared error (MSE): the squared L2 distance between the reconstructed and original images, divided by the number of pixels in an image.
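As a worked example of the metric, the sketch below computes MSE for two tiny images stored as 2-D lists. The post does not say whether pixel values were rescaled (for example to [0, 1]) before computing MSE, which changes its magnitude; the sketch assumes raw pixel values.

```python
def mean_squared_error(original, reconstruction):
    """MSE between two equally sized images given as 2-D lists of pixel values."""
    if len(original) != len(reconstruction) or any(
        len(r1) != len(r2) for r1, r2 in zip(original, reconstruction)
    ):
        raise ValueError("images must have the same shape")
    squared_error = 0.0
    n_pixels = 0
    for row_a, row_b in zip(original, reconstruction):
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            n_pixels += 1
    return squared_error / n_pixels  # divide by the pixels of ONE image

orig = [[0, 10], [20, 30]]
recon = [[1, 10], [18, 30]]
print(mean_squared_error(orig, recon))  # (1 + 0 + 4 + 0) / 4 = 1.25
```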

Overall, the convolutional autoencoder's results are much more promising than those of the traditional autoencoder and the Deep Compression GAN. Interestingly, while the traditional autoencoder achieved the weakest results on both the training and test sets, it showed the smallest gap between training and test error, perhaps indicating slight underfitting. Additionally, although the Deep Compression GAN achieved a lower mean squared error than the traditional autoencoder (though higher than the convolutional autoencoder), it showed the largest gap between training and test error, perhaps indicating high variance. It is important to note, however, that GANs are notoriously unstable, so more rigorous tuning could potentially yield stronger results. The full results table is shown below.

[Table: full results (training and test mean squared error for each model)]

The performance of the convolutional autoencoder is not surprising upon visual inspection of the dataset. The images share a lot of structure, with cells often appearing in different areas of the image, so the weight-shared, translation-equivariant filters of convolutional networks (the same filter responds to a cell wherever it appears) seem very beneficial. Further, large portions of the images are almost entirely background, and the convolutional approach seems more robust in dealing with such cases.
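The translation property referred to above can be demonstrated in one dimension. In this sketch (pure Python, with a hypothetical 3-tap filter standing in for a learned one), shifting a "cell" across a zero background shifts the filter response by the same amount. The equality holds here because the shifted-out values are background zeros; pooling layers then add some local invariance on top of this equivariance.

```python
def conv1d_valid(signal, kernel):
    """'Valid' 1-D cross-correlation, the operation a convolutional layer computes."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def shift_right(values, n):
    """Shift a list right by n positions, filling with zeros (background)."""
    return [0] * n + values[:len(values) - n]

kernel = [1, 2, 1]                     # hypothetical learned filter
cell = [0, 0, 1, 3, 0, 0, 0, 0, 0, 0]  # a "cell" near the left edge
moved = shift_right(cell, 2)           # the same cell two pixels to the right

print(conv1d_valid(cell, kernel))   # [1, 5, 7, 3, 0, 0, 0, 0]
print(conv1d_valid(moved, kernel))  # [0, 0, 1, 5, 7, 3, 0, 0]
```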

We also notice that the GAN-based approach produced encodings that could be decoded with high accuracy with respect to the location of cell structures. The difficulty with the GAN model is that the generated images seem to have high variability, resulting in cell images that are often blurrier than those from the other models. Another observation of interest is that the images generated by the GAN architecture seemed to contain regions of striation and pixelation. To address this, it would be beneficial to adopt several widely used GAN training recommendations. In future iterations, it would also be beneficial to experiment with more sophisticated formulations (such as the Wasserstein GAN loss or the StackGAN architecture), as well as to train the model for longer and on a larger repository of medical cell images.

In this work, we have investigated several ways to encode medical images for efficient and scalable storage of ever-growing volumes of medical data, including 3D images and scans. By optimizing a generalizable encoding function for such image formats, full-resolution images could be converted into compact encoded bit streams, allowing a much larger set of medical images to be stored. Furthermore, such a mapping could greatly improve medical image search: health professionals could query a particular medical image for its nearest matches in the encoding space, along with their related patient records.
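The image-search idea can be sketched as a brute-force nearest-neighbour lookup over encodings. The image ids and 3-dimensional codes below are hypothetical stand-ins for real 2048-dimensional encodings; at scale, a linear scan like this would typically be replaced by an approximate nearest-neighbour index.

```python
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_matches(query_code, database, k=2):
    """Return the k (image_id, distance) pairs whose codes are closest to query_code.

    `database` maps a hypothetical image id to its encoding vector.
    """
    scored = [(image_id, l2_distance(query_code, code))
              for image_id, code in database.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

codes = {
    "scan_a": [0.1, 0.9, 0.0],
    "scan_b": [0.8, 0.1, 0.3],
    "scan_c": [0.2, 0.8, 0.1],
}
matches = nearest_matches([0.1, 0.88, 0.02], codes, k=2)
print([image_id for image_id, _ in matches])  # ['scan_a', 'scan_c']
```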

Future Work:

Since this work was a preliminary look at deep learning models for encoding medical images, there is considerable scope for further work.

First, given the motivation and applications of this work, it is important to relate it directly to compression. Specifically, this would involve taking the encodings, compressing them losslessly and then comparing the resulting file sizes against baselines such as JPEG.
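A first step toward that comparison might look like the sketch below: quantize a bottleneck code to 8-bit symbols and measure its size after lossless DEFLATE coding with Python's `zlib`. The value range, the 8-bit depth and the synthetic code are assumptions for illustration. Note that storing a 2048-dimensional code as 32-bit floats takes 8,192 bytes, only half of a raw 8-bit 128 by 128 image, so quantization and entropy coding are where most of the real savings would have to come from. A JPEG baseline would need an image codec and is not shown.

```python
import zlib

def quantize(code, lo, hi, levels=256):
    """Uniformly quantize floats in [lo, hi] to integers in [0, levels - 1].

    The range and 8-bit depth are assumptions for illustration.
    """
    step = (hi - lo) / (levels - 1)
    return [min(levels - 1, max(0, round((v - lo) / step))) for v in code]

def compressed_size(symbols):
    """Bytes needed after lossless DEFLATE (zlib) coding of 8-bit symbols."""
    return len(zlib.compress(bytes(symbols), 9))

# Hypothetical 2048-d bottleneck code with a lot of repetition (compresses well).
code = [0.0 if i % 4 else 1.5 for i in range(2048)]
symbols = quantize(code, lo=0.0, hi=3.0)

raw_image_bytes = 128 * 128   # one 8-bit 128x128 image
float_code_bytes = 2048 * 4   # the same code stored as float32
print(raw_image_bytes, float_code_bytes, compressed_size(symbols))
```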

Second, further experiments with different types of models for this encoding and compression problem are important. In particular, variants of generative models like the Deep Compression GAN have shown much promise in this domain, so experimenting with the alternative generative approaches outlined by Tschannen et al. [5] and Duan et al. [6] would be a plausible next step.

More immediately, it could be beneficial to experiment with hyperparameters to tune the approach for this particular dataset. It could also be beneficial to test the robustness and feasibility of the approach on other medical image datasets, such as X-ray images.

The code for our project can be found here:

Outreach Activity:

On March 14th, as part of EE376A, we attended an outreach compression night at Lucille M. Nixon Elementary School, where we tried to explain notions of image representation and similarity to elementary school students. In preparation for the event, we planned an interactive activity along with a PowerPoint presentation covering image representation and intuitive explanations of current methods of image comparison. A few of the slides are shown in-line below.

Our intent was to use an activity to help students see how notions of similarity or closeness between numbers extend to images, and to give them an appreciation of how images are represented in a computer. The exercises in the PowerPoint were designed to illuminate color image representation and lead into the definitions of L1 and L2 distance, which the students visualized using Sour Punch straws. Below are some of the slides that prompted these discussions.

[Figures: outreach slides on image representation and L1/L2 distances]
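For readers who want the definitions behind the activity, the sketch below treats a tiny color image as the list of its pixels' (R, G, B) values and computes L1 and L2 distances. The two-pixel "images" are hypothetical examples, not the ones used in the slides.

```python
import math

def l1_distance(a, b):
    """Sum of absolute differences between corresponding values."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Each color pixel is an (R, G, B) triple of 0-255 values; a 2-pixel image is
# the concatenation of its pixels' channel values.
red_then_blue = [255, 0, 0, 0, 0, 255]
dark_red_then_blue = [200, 0, 0, 0, 0, 255]
red_then_green = [255, 0, 0, 0, 255, 0]

print(l1_distance(red_then_blue, dark_red_then_blue))  # 55
print(l2_distance(red_then_blue, dark_red_then_blue))  # 55.0
print(l1_distance(red_then_blue, red_then_green))      # 510
```

When only one value differs, L1 and L2 agree; L2 grows more slowly than L1 when the difference is spread over several values.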

Our group then used an activity to showcase the limitations of the L1 and L2 measures. The exercise involved a reference image and two adapted images; students were asked which adapted image was closest to the reference and were surprised by the outcome, which was driven largely by the heavy weight these measures give to background pixels. We then used these limitations to motivate SSIM (the structural similarity index) as a possible alternative and asked students how they, as humans, compared the images.

[Figures: outreach slides on the limitations of L1/L2 distances and on SSIM]
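The background-weighting effect can be reproduced on a hypothetical toy image (not the one used at the event). Brightening every pixel keeps the "cell" intact but yields a larger L2 distance than deleting the cell, while SSIM ranks the two the other way. For brevity, SSIM is computed here with a single global window; standard SSIM averages the same formula over local windows, so this is a simplification.

```python
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def global_ssim(x, y, dynamic_range=255.0):
    """SSIM computed once over the whole image (simplified: no sliding window)."""
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

# Toy 10x10 image, flattened: background of 50 with a 4-pixel "cell" of 200.
reference = [50] * 100
for idx in (44, 45, 54, 55):
    reference[idx] = 200

brighter = [v + 40 for v in reference]  # every pixel +40: structure intact
cell_removed = [50] * 100               # the cell has disappeared

print(l2_distance(reference, brighter), l2_distance(reference, cell_removed))  # 400.0 300.0
print(global_ssim(reference, brighter) > global_ssim(reference, cell_removed))  # True
```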

Overall, the students tended to be younger than expected, so we adapted our activity to individual students, often focusing on image representation and on how computers represent color for the younger students.

Outreach was a truly integral part of this class, and it was a wonderful experience to communicate our knowledge and enrich the lives of others through the event. We thank the wonderful course staff for the opportunity and for organizing the event!

[Photo] The best Sour Punch straw rendition of a lion by the kids! 🙂