by Benjamin Antin and Anthony Degleris

In a recent conversation with a neuroscience professor at Stanford, we found out that his lab __spends over eight-thousand dollars per month storing data__; a clear case for lossless compression! In the following blog post, we explore lossless compression for high-dimensional neural electrode recordings on a sample dataset from Neuropixels probes. We compare universal compressors, wavelet-based methods, and a simple delta coding scheme, and __achieve roughly 3x compression__ on our dataset. Our code can be found here.

## Neuropixel and the data storage problem

There is a growing understanding among neuroscientists that in order to understand the brain, __we simply need more data__. Starting in the 1940s, scientists have used recordings from single neurons (so-called single unit recordings) as a way to probe how the brain works. Since the first recordings were made using glass electrodes, the number of neurons from which we are able to record has doubled approximately every 7 years, following a Moore’s law-like pattern [Stevenson 2011, Nature Neuroscience].

The most recent of these advances are Neuropixels Probes. These are CMOS fabricated probes capable of recording from up to 384 channels (and thus from a similar number of individual neurons) simultaneously [Jun et al 2017, Nature]. Prior to this, state-of-the-art neural recording used MEMS fabricated arrays capable of recording a maximum of around 100 neurons. All this is to say that there is a data revolution doing on in neuroscience, and __storage will become a top priority__.

## Why we care about raw data

In many cases, neuroscientists are interested in looking at Action Potentials (APs): spikes in voltage across the cell-membrane which are used to signal across long distances. Instead of looking at the raw electrical signals, neuroscientists often split the data into short time-bins (about 1ms long) and binarize the measurements: “1” means that a spike occurred; “0” mean that there was no spike. These spikes tend to be *sparse,* meaning there are far spikes than there are bins. Because of the data’s sparsity, __binarized spike data is easier to store__ than raw electrical recordings: we can just store the indices where spikes occur and achieve orders of magnitude compression (10x or possibly even 100x).

However, there’s a growing movement toward including the study of Local-Field Potential (LFP) in the study of neural activity and behavior. Local Field Potential can roughly be thought of as the low-frequency components in neural recordings. Whereas APs are all-or-nothing events which travel long distances, LFP is gradated and local. LFP is more poorly understood than Action Potentials, but there is a growing movement to study and understand it [Herreras 2016, Frontiers in Neural Circuits]. The potential __role of LFP in future studies motivates our need to store raw neural recordings losslessly__, so that the raw experimental data is always available if future studies require it.

## The dataset

To explore lossless compression of neural recordings, we use a publicly-available dataset recorded using Neuropixels probes (http://data.cortexlab.net/singlePhase3/). The dataset is a recording from 384 channels in an awake mouse over the course of an hour. The raw electrical signals have been quantized to 16-bit integers, creating a matrix of size 384 by 1.8 million. The data takes up 1.3GB when stored uncompressed. Individual electrode channels are placed along the rows of the data matrix, while the columns of the data matrix are used to index time. From here on, we’ll refer to our data matrix as $latex \mathbf{X}$.

To give the reader a sense of the structure present in these data, a plot is shown below of 20 electrode channels over 2000 time-steps. Since the data is a time series, we see that there is strong correlation across time. Since adjacent electrodes are physically close to one another, we also note strong correlations between adjacent rows.

The time-series nature of our data makes it somewhat similar to EEG data. In the figure below, we show a sample EEG recording from multiple electrodes placed on the scalp.

Note that the EEG waveforms also exhibit correlations through time, as well as across adjacent channels. Of course, EEG recordings are not distributed exactly like neural recordings (they tend to have fewer channels, for instance). Still, the two types of data seem similar enough that we’d expect approaches to compressing EEG data to work well for compressing Neuropixel data. One study [Antoniol 1997, IEEE Transactions on Biomedical Engineering], demonstrated roughly 2x compression using simple delta coding combined with Huffman coding. This work, in part, motivates our use of delta coding in the next sections.

## Understanding the data

Since the data is quantized as 16-bit integers, each entry can take on $latex 2^{16}$ possible values, ranging from -32,768 to 32,767. However, not all these values are being used. In the below figures, we see that the histogram of raw values is concentrated in about 100 bins. Even better, the histogram of first differences is concentrated in just 40 bins. This is encouraging: __we may be able store the raw values or the first differences using fewer than 16 bits per integer.__

As a lower bound on the compression performance, we first calculate the first-order entropy. For this calculation, we treat every entry of X as an i.i.d sample from a discrete distribution. The first order entropy calculation gives **3.53 bits per symbol**. Base on this calculation, we shouldn’t hope for a universal compressor to do better than **306 MB** for the entire array. Similarly, a first order entropy calculation of the difference data gives **3.37 bits per symbol**, or **292 MB** for the entire file. Can we come close to achieving these rates?

## Writing a compressor: results

### 1) Baseline: off the shelf universal compressors

Our first approach was to simply compare different universal compressors on the raw data. We stuck to the basics: **gzip and bzip2**, both of which use modified versions of Lempel-Ziv compression. We achieved around 2-3x compression. Later, we tried other compressors (such as *sprintz* or *density*) but with limited success; no compressor outperformed bzip2. This gave us insight into what compression rates we could expected.

Method | File Size | Compression Rate |

Raw | 1332 MB | 1.00 |

gzip | 559 MB | 2.38 |

bzip2 | 423 MB | 3.15 |

### 2) Simple, effective and online: delta coding

**Delta coding** is a simple technique frequently used to compress time-series data. The idea isn’t super complicated: instead of encoding the raw values, we just encode the differences. Since the differences tend to be small, we can use just a few bits to encode each value. When the differences happen to be large, we instead encode the difference in a separate file.

There is an inherent tradeoff when applying this method: using smaller bit-widths to represent the differences means that the differences take up less space. However, since we need to store the outliers as well as their indices, having too many outliers does not work well. We experimented with 8-bit, 6-bit, and 4-bit representations. The results below are for the total file size after delta coding the data.

Method | File Size | Compression Rate |

Raw | 1332 MB | 1.00 |

4-bit diffs + bzip2 | 688 MB | 1.94 |

6-bit diffs + bzip2 | 418 MB | 3.19 |

8-bit diffs + bzip2 | 403 MB | 3.31 |

8-bit diffs (w/o compressor) | 694 MB | 1.92 |

One astounding fact is that the 8-bit differences achieved nearly a 2x compression rate before applying a universal compressors. Since the first difference is easily computed in practice, this means __delta coding has the potential to compress neural data online__, i.e. directly on the hardware while recording. This could allow neuroscientists to record and __transmit Neuropixel data under limited bandwidth conditions, possibly even wirelessly__.

### 3) Fancy stuff: integer wavelet compression

Inspired by delta coding, we wanted to find other methods for compressing data quickly and efficiently, attempting to achieve high compression rates with simple calculations in a possibly online setting. We chose to investigate the **integer wavelet transform**. The wavelet transform is a method for decomposing a signal into various temporal and frequency components, and often yields sparse representations [Mallat, A Wavelet Tour of Signal Processing]. These sparse representations lead to efficient compression of many signals — in fact, wavelet transforms underlie the JPEG 2000 image compression algorithm.

We decided to implement the integer-to-integer Haar transform (a very simple wavelet transform) [Fouad and Dansereau 2014, IJ Image, Graphics and Signal Processing]. After applying this transform to each electrode, we applied the bzip2 compressor and reduced the file to **502 MB**, no better than just compressing the raw data! __Wavelets, in this case, were a failed attempt.__

## Summary

So, what did we learn from this experiment?

- Universal compressors do pretty well on neural data, and are hard to improve upon.
- Simple delta coding achieves a reasonable compression rate, and can be implemented online to reduce the bandwidth needed for transmitting Neuropixel recordings.
- Although wavelet compression has been successful in the past, it appears to work poorly for this type of data.

In the future, we hope to explore several directions

- Exploiting correlations between electrodes.
- Advanced delta coding (e.g. second order differences or linear predictors).
- Advanced wavelet techniques (e.g. using more complex wavelets, transforming both dimensions).

Overall, our project highlights how Neuropixel and other modern neuroscience datasets present a new challenge for compression — can we compress this data live with simple algorithms like delta coding in order to transmit large amounts of data wirelessly? Can we reduce the growing storage costs related to neuroscience datasets? Is there an efficient representation of these recordings, and if so, will this representation tell us something fundamental about the brain? Questions like these fit naturally in an information theory framework, and our exploration is a first step in tackling this problem.

## Outreach

For our outreach project at Lucille Nixon High School, we designed a game called *Spy Messaging* in which the players had to communicate with each other through a noisy channel. The idea is that one player (the Spy) must communicate the location of the secret documents using three characters, each written on its own sticky note, to another player (the Analyst). The trick though, is that the game master (Anthony) could erase one of the letters at random by pulling away one of the sticky notes. This simulates a binary erasure channel, where the erasure probability is the probability that Anthony decides to be sneaky. Below is the description from our poster, describing how to play the game:

The game was played using a board showing many rooms (which we borrowed from the game of “Clue”).

At the beginning of the game, we place a yellow marker in one of the rooms, signifying the location of the secret documents. While we do so, the Analyst must turn around so that they can’t see where we are placing the marker. While the Analyst is turned around, the Spy encodes the location of the secret documents using three characters, each one on a sticky note (shown above). Once the Spy is done writing, we remove the yellow marker, and (optionally) erase one of the characters in the message.

The analyst must then decipher the location of the secrete documents from whatever characters remain. If they do so correctly, they win a prize — which was candy, of course.

As we guided students through the game, we had to adapt on the fly to different age groups and maturity levels. Older kids quickly learned to assign each room its own character, and then repeat that character three times (repetition coding). For our youngest participants, though, communicating a secret message even without any erasures was challenging enough. We found it helpful to have a parent or guardian be one of the partners, so that they could coach their child. Following the lead of other booths at the outreach event, we added a “Leaderboard” where students who won the game could write their name. For reasons we still don’t fully understand, kids *loved* writing their names down after they won.

At the end of the game, we explained to each participant how the game they had just played was related to information theory: every time our phones and computers send information, that information can get partially erased. By being clever about the way we transmit messages, we can send images, texts, and even movies without losing information. Although we never used the phrase “Binary Erasure Channel,” we think kids got the idea.