By Saarthak Sarup, Raman Vilkhu, and Louis Blankemeier

## Modeling the Vertebrate Retina

“To suppose that the eye with all its inimitable contrivances for adjusting the focus to different distances, for admitting different amounts of light, and for the correction of spherical and chromatic aberration, could have been formed by natural selection, seems, I confess, absurd in the highest degree…The difficulty of believing that a perfect and complex eye could be formed by natural selection , though insuperable by our imagination, should not be considered subversive of the theory.” – **Charles Darwin**

The sentiment behind this quote truly captures the miracle that is the human eye… even Darwin had to specifically address its unique case in his argument for natural selection. In this project, we aim to capture some of this complexity in the structure and function of the eye and study it under the lens of information theory.

#### A short biology lesson…

The human eye is composed of various complex structures, however the one studied and modeled throughout this blog is the retina. The retina is responsible for converting light responses captured by the cornea into electrical signals sent to the visual cortex in the brain via the optic nerve. As seen in Figure 1, the retina lies in the back of the eye and can be further decomposed into layers of neurons and photoreceptors. As a brief introduction to the biology, the light passes through the layers from the leftmost to the rightmost: nerve fiber layer (RNFL), retinal ganglion cells (RGC), amacrine cells (AC), bipolar cells (BC), horizontal cells (HC), photoreceptors (PR), and the retinal pigment epithelium (RPE). For the scope of this blog, it is sufficient having a high level understanding that the retina translates varying light responses to electrical spikes (known as action potentials) which can be transmitted along the neurons in the optic nerve to the visual cortex, where perception takes place. For those interested in these specific retina layers and their unique contributions to this encoding process, please refer to [4].

#### Communicating through the eye

The primary objective of the project was to model the eye under the lens of information theory and see what analytical results can be derived to answer questions such as: what is the maximum rate achievable by the eye? What is the best model for the encoding/decoding done in the human visual system? In order to study these overarching questions, we had to start by modeling the system as an information processing channel.

First, the eye takes in a light response captured at the cornea and encodes that into electrical spikes defined by a spiking rate. In an information theory sense, this is equivalent to encoding a source using some encoder that takes intensity and maps it to discrete spikes. Here, as seen in Figure 2, we assume that the eye does rate-encoding, where the majority of the useful “information” is captured by the spiking rate rather than precise spatio-temporal information about each individual spike. This particular assumption is one heavily debated in literature and the true encoding scheme used by the eye is still a mystery, however this assumption is nonetheless a good starting point. Additionally, for the scope of our project, we generated the visual inputs with known statistics matching that of natural scenes, as discussed in later sections. Therefore, mapping the biology to the information theory, the encoder in the system models the retina — mapping light responses to spikes.

In the visual system, the communication channel is represented by the optic nerve, which runs from the retina to the visual cortex in the brain. In our model, this communication channel is modeled as a parallel, inhomogeneous Poisson channel with some additive Poisson noise. Specifically, the number of parallel channels corresponds to the number of retinal ganglion cells connected to the optic nerve bundle, each transmitting some spike rate on the channel.

Finally, in our model, the communication channel feeds into a decoder which implements a strategy to capture the spike rates and reconstruct the perceived visual inputs from these estimated rates. In biology, this corresponds to the visual cortex of the brain, which perceives the spikes as some visual image.

Overall, this general framework of modelling the visual system as an information theory problem is particularly interesting because we can apply well-known information theory techniques to gain insights into otherwise mysterious processes.

## Joint Source-Channel Coding: A Review

Our model of the early visual system depicts both compression via neural downsampling and communication via Poisson spiking. All together, our work fits into the joint source-channel coding framework. The optimal strategy in this framework according to Shannon’s separation theorem is to independently optimize both our source coding compression scheme and our channel coding communication protocol.

#### Source coding for optimal compression

In source coding, we deal with a sequence of inputs , which is encoded by an index and then decoded by an estimate from the output alphabet . This procedure is outlined in Figure 4 below. An important quantity in this framework is the **compression-rate** , defined as the average number of bits per input symbol. According to Shannon’s source coding theorem, it is impossible to perfectly compress input data such that the compression-rate is less than the entropy of the source. In the case where lossy compression is tolerated, the tradeoff between minimizing the compression-rate and minimizing the loss is given by the **rate-distortion curve**. Points along this curve define the smallest achievable compression-rates (largest compression factors) for a given distortion .

#### Channel coding for reliable communication

Channel coding consists of finding a set of channel input symbol sequences such that each sequence is sufficiently “far” from the other sequences and each input sequence maps to a disjoint set of output sequences. In this case, the channel input sequence can be decoded without error. Shannon showed that for every channel, there exists some channel code such that bits can be sent per channel use. is known as the channel capacity. This idea can be generalized to a channel which takes a continuous waveform as input. In this case, channel coding consists of finding waveforms, which last for duration , and are spaced sufficiently far apart such that they can be decoded without error. This is shown in Figure 4 for the case of four independent Poisson channels. Here, the codebook consists of waveforms which take the form of pulse modulated signals. As discussed in sections “Channel Capacity Under Poisson Noise” and a “Mathematical Aside”, a code generated with such waveforms can reach capacity. Thus, each continuous input is mapped to a pulse modulated signal which is then transmitted across the Poisson channel.

## Compressing a Gaussian Source with Finite Bandwidth

How can we statistically characterize our visual inputs? In the space of continuous and unbounded alphabets, Gaussian distributions maximize the available entropy under a bounded power (technically, second moment) constraint. As a result, we take a Gaussian source to be a general description of our 2D image data. To model a general bandwidth assumption on our images, we employ shift-invariant autocorrelation function given by the squared-exponential kernel . This Gaussian process allows us to draw random images as -length vectors from , where the covariance matrix is parameterized by the length constant . This process gives a non-uniform power spectrum with analytic form given below:

To further simplify our analysis, we consider images in gray-scale, as color representation in the retinal code can be handled somewhat orthogonally to intensity by color-tuned photoreceptors and retinal ganglion cells. Example frames from this procedure can be seen in Fig. 5 below.

#### How good can we *theoretically* do?

When compressing a memoryless Gaussian source with a uniform power spectrum , the maximum achievable compression rate for a minimum mean-squared distortion is given by the well-known expression, .

For our non-memoryless case, the optimal compression strategy can be clarified in the spectral domain. Here, the single spatially-correlated source can be decomposed into infinitely many parallel independent sources, each with variance given by the power . This transformation inspires a reverse water-filling solution [7], where we use a given distortion budget to determine a threshold , such that compression is done by only encoding the components where . In this parametric form, the new rate-distortion curve is given by the following equations:

To reiterate an important point, these curves depend on the length scale of the source . In the limit where , our image becomes uniform and its power spectrum is concentrated at . This source distribution corresponds to the largest rate-distortion curve, since the all the input pixels can be encoded with the single uniform intensity and all the distortion is concentrated in the quantization of that value. In the limit where , our bandwidth becomes unbounded and we recover the white Gaussian source, with a constant power spectrum . At this extreme, each input pixel is uncorrelated with the other pixels, and for compression under the same fixed distortion , each symbol must share this budget with a more aggressive quantization. This intuition is validated in Fig. 6a, where the rate-distortion curves are computed for a variety of length-scales.

#### How good do we* practically* do?

Given these fundamental limits, we now describe our neural compression scheme and compare our results. In the pioneering work by Dr. Haldan Hartline in 1938 [3], researchers discovered evidence of a *receptive field* in retinal ganglion cells, corresponding to a particular region of sensory space in which a stimulus would modify the firing properties of the neuron. In more mathematical terms, this property is modeled through a convolutional spatial filter which the neuron applies to its input to determine its response.

We thus interpret each pixel value as the encoded firing intensity of the primary photoreceptors, which we compress by applying spatial filters at each neuron location . Our model uses an isotropic Gaussian for each neuron’s spatial filter, producing an output which is used as the neuron’s firing rate. If the neurons are spaced pixels apart, a image is compressed into a output, corresponding to a compression rate of .

To measure the distortion of this compression scheme, we propose decoding schemes for reconstructing the image. These schemes are not biologically motivated, as its unclear what neural process, if any, attempts to reconstruct the visual inputs to the photoreceptors. It is nevertheless instructive to consider how our biologically-inspired encoding scheme could be reversed. A simple procedure is one which deconvolves a neuron’s firing rate with its receptive field, producing an estimate . This decoding procedure is nearly optimal under the fewest assumptions on the input source distribution, as is shown in our appendix below. A more complex procedure which is allowed perfect knowledge of the distribution’s covariance and global firing rate information can decode with much lower distortion, though its estimate is difficult to express analytically and is instead the solution to a convex optimization problem. It’s derivation is tied closely to the minimization of the Mahalanobis distance , and is also shown more explicitly in the appendix below. An example compression and decompression is shown in Fig. 7.

#### Are we even measuring this right?

Commonly used image fidelity metrics, such as mean-squared error (MSE), are simple to calculate and have clear physical meaning, however these metrics often do not reflect *perceived* visual quality. Therefore, modern research has been done on fidelity metrics that incorporate high-level properties of the human visual system. One such metric is known as the structural similarity (SSIM) metric. For the scope of this blog post it is enough knowing that SSIM incorporates image properties such as averages, variances, co-variances, and dynamic range in order to better gauge perceived similarity between images.

## Channel Capacity under Poisson Noise

After compression, how well can this information be transmitted?

Here we outline some key results from Wyner [1][2]. The Poisson channel takes as input a continuous waveform input, , with domain . We assume the following constraints: and where . Now, the channel output is a Poisson process with intensity where represents additive Poisson noise. defines a Poisson counting function which gives the number of spikes received before time . More specifically, is a staircase function where each step represents detection of a spike at the output. We then have that where .

We now describe a general Poisson channel code. The input codebook consists of a set of M waveforms where which satisfy and . The output alphabet consists of the set where is the set of all for . The channel decoder is the mapping: . The output of the channel takes on a value from the output alphabet . is a staircase function with domain . We then have an error probability given by: . The rate, of the code in bits per second is therefore .

We say the is achievable if for all there exists a code such that and . The channel capacity is then the supremum of all such achievable rates. For any set of codes satisfying these constraints, we have the following capacity:

Here, , , and .

#### But what if we had *more* noise?

Interestingly, work by Frey [5] has shown that in the presence of random, nondeterministic background noise , feedback can *improve* channel capacity (under mild assumptions on the distribution of the noise). In particular the This is surprising because the Poisson channel is still memoryless, and in general, memoryless channels cannot increase their capacity with the addition of feedback. Nevertheless, biologically speaking, this is a more realistic setting, given that seemingly stochastic vesicle release appears to inject random noise into the propagation of spikes within the nervous system. Thus this is a particularly encouraging result from information theory as recurrent connections and backpropagating dendritic action potentials are a common feature within biological neural networks, and may jointly be facilitating this feedback mechanism to counter the intrinsic background noise.

## A Mathematical Aside

For the more mathematically inclined:

#### Optimal decoders

We first set up the problem. Each image frame is vectorized into -length vectors . The receptive field of neuron as a patch, when centered at its location within the frame, is also vectorized into an -length vector . In this framework, we can compute the rates of all neurons by arranging receptive field vectors into a matrix such that the rates are computed as .

This operation necessarily loses information as the solution to is underdetermined. A common technique in probabilistic inference is determining the maximum likelihood estimate (MLE). Since our source is a Gaussian source with and covariance (where the subscript designates its parametrization by the length constant ), we write out our log-likelihood function :

Within this expression we recognize the square of the Mahalanobis distance , which can be considered an expression of the distance between a point to the distribution . Rewriting this expression along with a constant term , we get:

With this, we see we can maximize the log-likelihood by minimizing the Mahalanobis distance subject to the constraint . Moreover, this is a simple convex optimization problem, since the matrix is a positive definite precision matrix.

This decoder is optimal under the assumption that the source is a stationary Gaussian distribution with , covariance , and the decoder has perfect knowledge of the distribution. We also consider the case where the decoder makes no assumptions on a finite bandwidth of the source, i.e., . The resulting objective corresponds to a least-norm problem on :

This optimization problem has an explicit solution , which we now compare to our naive deconvolution solution . When vectorized into the format of this framework, this is equivalent to . Now the comparison becomes more clear, since the least-norm solution and the deconvolution solution are equivalent if . This is the case when each neuron’s receptive field is unit length and pairwise orthogonal (), or in other words, have no overlap over the image.

Important to note is that this decoder is optimal in the noiseless channel case. In the case where the receiver’s estimate , the decompression process minimizes the objective in the nullspace of and fails to find the appropriately reconstruction. We thus modify our constraints to respect our noisy channel by assuming the true rate falls in the set . The new stochastic optimization problem is reformulated below, where the channel noise is known to take a Poisson distribution.

#### Channel capacity lower bound

We now derive an expression for the lower bound of the Poisson channel capacity [1][2]. The main idea of this proof is that we can analyze when is constrained to a subset all possible inputs to the Poisson channel. Clearly, if this constraint is lifted, the same distribution over the inputs can be achieved by setting for any which is not a member of the constrained subset. Therefore, when is constrained to a subset of inputs to the channel is a lower bound on the capacity of the Poisson channel.

Specifically, we impose the constraint that is piecewise constant so that it takes one of two values, 0 or A, over each interval . We then define the discrete signal such that if in the interval . Otherwise, . Additionally, we assume that the detector sees only the values . Thus, it sees the number of spikes received in the interval which we define as . Although there is a finite probability that , we take for all cases where . This is a reasonable approximation for small enough . Thus, is binary valued. We have now reduced the channel to a binary input, binary output channel characterized by . Any capacity achieved by this channel will be a lower bound on the capacity achieved by the general Poisson channel as we are dealing with a restriction of the input and output spaces. However, it will turn out that this bound is tight.

From the Poisson distribution, we have that . We have that . Therefore, . Likewise, . Now, the average power constraint takes the form . Where indexes the codeword. Therefore, the lower bound on capacity is given by where the maximum is taken over all distributions of . Clearly, must satisfy . Now we define , , and . We then have that where is the binary entropy. We then have . After making a few simplifying approximations, we have an expression which is convex in . We then maximize with respect to subject to the constraint . The mutual information is maximized when . We define . We then have that the capacity is lower bounded by where .

Taking our channel to consist of neurons, each modeled as an independent Poisson channel, we have the total capacity .

## Outreach Activity

For the outreach activity at Nixon Elementary school, our group decided to create a board game called **A Polar (codes) Expedition**. The purpose of this outreach activity was to help the kids walk away with an ability to XOR, and, furthermore, use that ability to decode a simple 4-bit polar code.

The overarching theme of the board game is that the player is trapped in a polar storm and needs to use polar codes to figure out where the rescue team is coming. The outreach activity was a huge success (see some pictures from the event below). It was especially amazing to see kids even smaller than we anticipated being the target age for our game (3rd and 4th graders) pick up XOR-ing really fast and walk through the game. Additionally, it was inspiring to see the kids so excited after solving the polar code and then realizing this decoding scheme is an actual one used in modern day. Overall, our group believes the outreach component of the course was truly a wholesome experience and we all came out of it hoping to continue taking part in STEM outreach in some capacity throughout the rest of our PhDs. It was special to see the genuine curiosity that the kids had while learning about polar codes through our game and end the end of the day it really reminded us why we are here to do a PhD in the first place — it boils down to the simple fact that we find these topics to be cool and interesting and we derive happiness from learning more and more about EE.

The transcript for our board game can be found at, please take a look if you would like to read the story arcs that the kids got to take on their polar expedition: https://docs.google.com/document/d/1gQk1USQOqbW8xdxMENY7QQ4kr4LFIazgFS149aJIhuQ/edit?usp=sharing

## References

[1] A.D. Wyner, “Capacity and error exponent for the direct detection photon channel–Part 1,” in *IEEE Trans. Inform. Theory*, vol. 34, pp. 1449-1461, 1988.

[2] A.D. Wyner, “Capacity and error exponent for the direct detection photon channel–Part 2,” in *IEEE Trans. Inform. Theory*, vol. 34, pp. 1462-1471, 1988.

[3] H.K. Hartline, “The response of single optic nerve fibers to illumination of the retina,” in *American Journal of Physiology*, vol. 121, no. 2, pp. 400-415, 1938.

[4] Kolb H, Fernandez E, Nelson R, editors. *Webvision: The Organization of the Retina and Visual System [Internet]*. Salt Lake City (UT): University of Utah Health Sciences Center; 1995-.

[5] M. Frey, “Information capacity of the Poisson channel,” in *IEEE Trans. Inform. Theory*, vol. 37, no. 2, pp. 244-256, 1991.

[6] Palanker D – Own work, CC BY-SA 4.0 *https://commons.wikimedia.org/w/index.php?curid=49156032*.

[7] R. Zamir, Y. Kochman, U. Erez, “Achieving the Gaussian rate-distortion function by prediction,” in *IEEE Trans. Inform. Theory*, vol. 54, no. 7, pp. 3354-3364, 2008.

[8] Z. Wang et al., “Image Quality Assessment: From Error Visibility to Structural Similarity,” in *IEEE Transactions on Image Processing*, vol. 13, no. 4, pp. 600-612, 2004.