Author: Nathan Kong
In the past decade, high-throughput training of deep learning models has bolstered their success in completing difficult tasks. Unfortunately, a theoretical understanding of why these models are so successful is still missing. Some work investigating why deep learning models generalize so well uses concepts from information theory and analyzes the information gain between the inputs (and outputs) and the internal representations. One problem with this approach concerns how mutual information is computed in deterministic neural networks. Goldfeld et al. (2018) developed a new method for estimating mutual information by analyzing information-theoretic quantities in noisy neural networks, and observed that the reduction in mutual information between the internal representation and the inputs (compression) is associated with the clustering of internal representations.
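To make the role of the noise concrete, the following is a brief sketch of why mutual information becomes well behaved in a noisy layer. It assumes additive isotropic Gaussian noise on a d-dimensional hidden layer; the symbols f, Z, β and d are introduced here for illustration and are not taken from the text above.

```latex
% Assumed noisy-layer model: deterministic map f plus independent Gaussian noise Z.
\[
  T = f(X) + Z, \qquad Z \sim \mathcal{N}(0, \beta^2 I_d), \qquad Z \text{ independent of } X
\]
% Since Z is independent of X, h(T | X) = h(Z), which is finite for \beta > 0.
\[
  I(X;T) = h(T) - h(T \mid X) = h(T) - \frac{d}{2}\log\!\left(2\pi e \beta^2\right)
\]
```

Under this assumption, the conditional term is a constant, so changes in I(X;T) during training come entirely from changes in h(T), the differential entropy of the noisy representation.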
In this work, we reproduce some simple empirical observations in Goldfeld et al. (2018). Furthermore, we conduct experiments that modify the data distribution, as previous work studying information flow in neural networks used a uniform input data distribution. We find that, for a single Gaussian data distribution and a non-saturating non-linearity in the hidden layer such as LeakyReLU, the internal representations do not cluster.
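As a rough illustration of the setting described above, the sketch below passes samples from a single Gaussian through one hidden layer with a LeakyReLU non-linearity and additive Gaussian noise. It is a minimal sketch, not the code used in the report: the function names, layer sizes, noise level, and the choice to add noise after the activation are all assumptions, and the weights are random rather than trained.

```python
import random


def leaky_relu(x, negative_slope=0.01):
    """Non-saturating non-linearity: identity for x >= 0, small slope otherwise."""
    return x if x >= 0 else negative_slope * x


def noisy_hidden_layer(x, weights, biases, noise_std, rng):
    """Hypothetical noisy layer: LeakyReLU(Wx + b) plus independent Gaussian noise."""
    outputs = []
    for w_row, b in zip(weights, biases):
        pre_activation = sum(w * xi for w, xi in zip(w_row, x)) + b
        # Assumption: noise is injected after the activation of each unit.
        outputs.append(leaky_relu(pre_activation) + rng.gauss(0.0, noise_std))
    return outputs


rng = random.Random(0)  # fixed seed so the sketch is repeatable
input_dim, hidden_dim, num_samples = 2, 3, 5

# Random (untrained) weights, for illustration only.
weights = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(hidden_dim)]
biases = [0.0] * hidden_dim

# Inputs drawn from a single standard Gaussian rather than a uniform distribution.
inputs = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(num_samples)]

representations = [
    noisy_hidden_layer(x, weights, biases, noise_std=0.1, rng=rng) for x in inputs
]
```

Plotting `representations` over the course of training is one way to look for the clustering discussed above; here the weights are fixed, so the sketch only shows how the noisy representations are produced.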
The code can be found here.
The report can be found here.
The main goal of my outreach was to introduce elementary school students to (biological) neural networks. In particular, I wanted to communicate with the students about how the brain processes objects in the visual field — starting from early visual cortex to higher visual cortex. To put this a little more concretely, what “parts” of the object are detected in early visual cortex and what “parts” are detected in higher visual cortex?
The activity was originally planned for two teams of four students, but given the circumstances, I had to change it slightly so that it was better suited to individual students. First, I had the students choose an object from the set shown in the image below without telling me what they chose. I then had the students draw “one part” of the object. It could be a small part of the object such as a door knob, or it could be the general shape of the object such as a circle (for the earth, sun, soccer ball, etc.). The goal was for me to guess the object based on the drawing.
If I could not guess the object, I had the student add “another part” of the object to his or her current drawing. For example, this could be another part of the house or the seams of the soccer ball. The additions to the drawing continued until I could successfully guess the chosen object.
The idea of having the students sequentially draw small parts of an object is motivated by the way the visual system extracts features from the visual field. In early visual cortex, features such as edges are extracted from the image. Using this information alone, it is very difficult to figure out the category of the object. However, as the image is further processed in the visual system, more complex features such as textures are extracted. In higher visual cortex, there are populations of neurons that respond specifically to faces or places. This is analogous to the activity I had the students carry out: as more and more features of the object were added to the drawing, it became easier for me to figure out the object's identity.