Image Compression and Restoration Using Neural Networks

EE376A (Winter 2019)

by Jia Zhuang and Xinyu Ren

Outreach Event: Human Vision Systems

Why do animals have different eyes and vision systems?

What is light? What is color? What can we see?

How do we process what we see?

Motivations and Introduction

Traditional image compression and restoration pipelines use a variety of algorithms that minimize pixel-wise losses and optimize for MSE and PSNR (peak signal-to-noise ratio). However, the resulting images are often still blurry, and their high-spatial-frequency components are not well reconstructed. Convolutional neural networks (CNNs) have been widely applied in many areas of computer vision and image processing to improve the quality of images and videos.

Much of the previous literature has focused on tuning the network architecture for specific goals. However, whatever the CNN architecture, the loss layer is equally important during restoration, since it serves as the learning target that drives training. In this project, we implement different loss functions in a Variational Autoencoder (VAE) and train the network on both the MNIST and CIFAR-10 datasets to demonstrate and compare how loss functions affect the resulting images.

Some Background

Variational Autoencoder Network (VAE)

VAEs are powerful generative models with diverse applications in image processing and image generation.
The core idea behind a VAE is not to replicate the input image exactly; instead, the decoder reconstructs the image from a code sampled at random from the latent space. VAEs are therefore designed to have continuous latent spaces that are easy to sample from.
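The sampling step can be sketched as follows. This is a minimal illustration that assumes the common Gaussian-latent formulation; the post does not spell out the encoder outputs, so `mu`, `log_var`, and both function names are hypothetical. The encoder is assumed to predict a mean and a log-variance per latent dimension, a code is drawn with the reparameterization trick, and a KL term pulls the latent distribution toward a standard normal, which is what keeps the latent space continuous.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Draw z ~ N(mu, exp(log_var)) via the reparameterization trick."""
    eps = rng.standard_normal(mu.shape)       # noise independent of the encoder
    return mu + np.exp(0.5 * log_var) * eps   # shift and scale the noise

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dims, averaged over the batch."""
    kl_per_dim = -0.5 * (1.0 + log_var - mu ** 2 - np.exp(log_var))
    return kl_per_dim.sum(axis=1).mean()

rng = np.random.default_rng(0)
mu = np.zeros((4, 2))        # hypothetical encoder output: batch of 4, 2 latent dims
log_var = np.zeros((4, 2))   # log-variance 0 means unit variance
z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

In a full VAE the training objective would add this KL term to one of the reconstruction losses discussed below; only the reconstruction term changes when the loss function is swapped.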

Loss Functions: L2 norm

The L2 loss penalizes the squared magnitude of every error directly, so large errors dominate the total.
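As a minimal sketch, assuming images are stored as floating-point arrays in [0, 1] and the loss is averaged over pixels (the post does not say whether a sum or a mean is used), the L2 loss and the PSNR mentioned in the introduction can be written as below. The names `l2_loss`, `psnr`, and `data_range` are illustrative.

```python
import numpy as np

def l2_loss(pred, target):
    """Mean squared error over all pixels."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return np.mean(diff ** 2)

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB, derived from the MSE."""
    return 10.0 * np.log10(data_range ** 2 / l2_loss(pred, target))

target = np.array([[0.0, 0.5], [1.0, 0.5]])
pred = np.array([[0.1, 0.5], [0.6, 0.5]])   # per-pixel errors: 0.1, 0, -0.4, 0
mse = l2_loss(pred, target)
quality_db = psnr(pred, target)
```

In this toy example the pixel that is off by 0.4 contributes sixteen times as much to the loss as the pixel that is off by 0.1, which is the squaring effect described above.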

Loss Functions: L1 norm

The L1 loss minimizes the absolute differences between estimated and target values, so it does not over-penalize large errors the way L2 sometimes does. L1 is a direct alternative to L2 that attempts to reduce the artifacts L2 introduces, and the two have different convergence properties.
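The difference in how the two losses treat large errors can be seen in a small sketch. The arrays below are made-up toy data, and both losses are assumed to be averaged over pixels.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error."""
    return np.mean(np.abs(pred - target))

def l2_loss(pred, target):
    """Mean squared error."""
    return np.mean((pred - target) ** 2)

target = np.zeros(4)
spread = np.array([0.5, 0.5, 0.5, 0.5])   # error spread evenly over all pixels
spike = np.array([2.0, 0.0, 0.0, 0.0])    # same total absolute error, in one pixel

l1_spread, l1_spike = l1_loss(spread, target), l1_loss(spike, target)
l2_spread, l2_spike = l2_loss(spread, target), l2_loss(spike, target)
```

L1 scores both predictions at 0.5, while L2 scores the evenly spread error at 0.25 and the single large error at 1.0, four times higher.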

Loss Functions: Structural Similarity Index Metric (SSIM)

The SSIM index is perceptually motivated: it quantifies the image degradation caused by data loss during compression. Computing SSIM requires looking not only at the current pixel of interest but also at its neighboring pixels.
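A minimal NumPy sketch of SSIM with a Gaussian-weighted neighborhood is shown below. It assumes single-channel images in [0, 1]; the window size of 11, the standard deviation of 1.5, and the constants `k1` and `k2` are commonly used defaults, not necessarily the values used in this project, and the function names are illustrative. When SSIM is used as a training loss, one typically minimizes 1 - SSIM.

```python
import numpy as np

def gaussian_kernel(size=11, sigma=1.5):
    """1-D Gaussian weights that sum to 1."""
    offsets = np.arange(size) - (size - 1) / 2.0
    weights = np.exp(-(offsets ** 2) / (2.0 * sigma ** 2))
    return weights / weights.sum()

def local_mean(img, kernel):
    """Gaussian-weighted neighborhood average (separable filter, valid region only)."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def ssim(x, y, data_range=1.0, size=11, sigma=1.5, k1=0.01, k2=0.03):
    """Mean SSIM over all window positions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    kernel = gaussian_kernel(size, sigma)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = local_mean(x, kernel), local_mean(y, kernel)
    var_x = local_mean(x * x, kernel) - mu_x ** 2      # local variance of x
    var_y = local_mean(y * y, kernel) - mu_y ** 2      # local variance of y
    cov_xy = local_mean(x * y, kernel) - mu_x * mu_y   # local covariance
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return ssim_map.mean()

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(32, 32))
noisy = np.clip(image + rng.normal(0.0, 0.2, size=image.shape), 0.0, 1.0)
score_same = ssim(image, image)
score_noisy = ssim(image, noisy)
```

Every value in `ssim_map` depends on a whole neighborhood of pixels through the local means, variances, and covariance, which is the neighbor dependence described above.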

Loss Functions: Multi-scale Structural Similarity Index Metric (MS-SSIM)

The choice of $\sigma_{G}$, the standard deviation of the Gaussian window used to weight neighboring pixels, affects the quality of SSIM results and can introduce extra artifacts if not chosen carefully. SSIM also does not account for real-life viewing conditions such as the image resolution and the distance between the viewer and the image. MS-SSIM is an extension of SSIM that evaluates the image at multiple scales.
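The multi-scale idea can be sketched as follows. This is a simplified illustration, not the project's implementation: it uses whole-image statistics instead of a Gaussian window, 2x2 average pooling between scales, and equal exponents of 1 for every scale, all of which are assumptions made to keep the example short. The function names are hypothetical.

```python
import numpy as np

def luminance_and_cs(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Luminance term and contrast-structure term from whole-image statistics."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    cs = (2 * cov_xy + c2) / (x.var() + y.var() + c2)
    return lum, cs

def halve(img):
    """Downsample by 2 in each direction using 2x2 average pooling."""
    h, w = img.shape
    h2, w2 = h - h % 2, w - w % 2   # drop an odd trailing row or column
    return img[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))

def ms_ssim_sketch(x, y, scales=3):
    """Product of contrast-structure terms over scales, with luminance at the coarsest."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    score = 1.0
    for level in range(scales):
        lum, cs = luminance_and_cs(x, y)
        if level == scales - 1:
            score *= lum * cs        # coarsest scale also contributes luminance
        else:
            score *= cs
            x, y = halve(x), halve(y)
    return score

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(32, 32))
noisy = image + rng.normal(0.0, 0.1, size=image.shape)
score_same = ms_ssim_sketch(image, image)
score_noisy = ms_ssim_sketch(image, noisy)
```

Each downsampling step roughly corresponds to viewing the image from farther away or at a lower resolution, so the combined score reflects several viewing conditions rather than a single one.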

Loss Functions: Cross Entropy

Cross entropy loss, also known as log loss, measures the divergence between two probability distributions and applies to models whose output is a probability value between 0 and 1. The metric takes in two distributions, a true distribution and an estimated distribution; the larger the difference between them, the larger the resulting cross entropy loss. Overall, cross entropy tends to converge quickly and is more likely to approach the global optimum.
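For image reconstruction, one common way to apply this is per-pixel binary cross entropy, treating each pixel intensity in [0, 1] as a probability. The sketch below assumes that formulation, which the post does not confirm; the clipping constant `eps` and the toy arrays are illustrative.

```python
import numpy as np

def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean per-pixel binary cross entropy; pred and target lie in [0, 1]."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    target = np.asarray(target, dtype=float)
    per_pixel = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return per_pixel.mean()

target = np.array([1.0, 0.0, 1.0, 0.0])
close = np.array([0.9, 0.1, 0.8, 0.2])   # estimate near the target
far = np.array([0.2, 0.8, 0.1, 0.9])     # estimate far from the target

loss_close = binary_cross_entropy(close, target)
loss_far = binary_cross_entropy(far, target)
```

The estimate that is far from the target receives the larger loss, and the logarithm makes confidently wrong pixels especially expensive.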

Loss Functions: Cascaded Metric: MS-SSIM and L2

Perceptual metrics such as SSIM and MS-SSIM are better at preserving contrast in high-frequency regions, but they are not sensitive to uniform biases. In contrast, L1 handles color and luminance better, since it weights errors equally regardless of local structure. To capture both characteristics, a combined loss function can also be considered.
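A weighted combination of a perceptual term and a pixel-wise term can be sketched as below. This is only an illustration: the mixing weight `alpha`, its default of 0.8, and the choice between an L1 and an L2 pixel term are assumptions, and a whole-image SSIM stands in for MS-SSIM to keep the example self-contained.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM from whole-image statistics (a stand-in for a windowed or multi-scale version)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    cs = (2 * cov_xy + c2) / (x.var() + y.var() + c2)
    return lum * cs

def combined_loss(pred, target, alpha=0.8, pixel="l1"):
    """alpha * (1 - SSIM) + (1 - alpha) * pixel-wise loss."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    perceptual = 1.0 - ssim_global(pred, target)
    diff = pred - target
    if pixel == "l1":
        pixel_term = np.abs(diff).mean()
    elif pixel == "l2":
        pixel_term = (diff ** 2).mean()
    else:
        raise ValueError("pixel must be 'l1' or 'l2'")
    return alpha * perceptual + (1.0 - alpha) * pixel_term

rng = np.random.default_rng(0)
target = rng.uniform(0.2, 0.8, size=(16, 16))
biased = target + 0.05                       # uniform brightness shift
ssim_term = 1.0 - ssim_global(biased, target)
l1_term = np.abs(biased - target).mean()
total = combined_loss(biased, target)
```

For the uniformly shifted image, the SSIM term comes out much smaller than the L1 term, which illustrates why a pixel-wise term is added to catch uniform biases.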

Comparing Some Results

L1 Results

L2 Results

SSIM Results

MS-SSIM Results

Cross Entropy Results

CIFAR-10 Dataset

For the CIFAR-10 dataset, selected objects and animals are shown here to compare how different loss functions change the output images. In this section, images are shown in the following order: original image, L1+SSIM, L2, and MS-SSIM.

Conclusion and Future Works

The loss function is a critical part of training the neural network and optimizing the output images. On the MNIST dataset, the different effects produced by different loss functions are clearly visible. In general, to the authors' knowledge, perceptual loss functions (SSIM, MS-SSIM) are better at producing rounded corners, while statistical loss functions ($L_{1}$, $L_{2}$, cross entropy, etc.) perform better on sharp corners.

The images produced by the network trained on CIFAR-10 are not as satisfying as expected and leave a lot of room for improvement. However, the image quality is limited by the capacity of the variational autoencoder network. Some interesting cascaded network architectures proposed in the literature demonstrate improved image compression and decompression quality. In “Generative Compression”, Santurkar et al. proposed replacing the decoder portion of a CAE network with a separately trained DCGAN. GANs are well known for producing high-quality images, and experimenting with different loss functions in this cascaded network may yield better image quality.


A technical report can be found here:
