by Chelsea Sidrane and Ryan Holmdahl
What we did
In a fast-moving field such as computer science, it’s vital to stay aware of new advances. Most computer scientists, though, aren’t academics and may not have the background or time required to read and process formal research papers. In domains like deep learning, this issue is addressed by a thriving ecosystem of blog-style tutorials, where academics and practitioners write about new developments in a way that’s approachable to people with a non-academic or non-professional background. This ecosystem has so far proven to be a great supplement to formal publications, and has helped CS folks from diverse backgrounds stay up to date on the latest deep learning developments.
Outside of deep learning and a few other “hot” fields, this kind of blog ecosystem doesn’t really exist. We thought that information theory — a domain which impacts nearly everything computer scientists do — could really benefit from writing that falls somewhere between “Information In Small Bits: Information Theory for Kids” and Shannon’s “A Mathematical Theory of Communication”. Some non-threatening but still-informative blog posts could get people thinking more about information theory and deliver the field’s innovations to a bigger crowd.
The posts written for this project piggybacked off of the success of the deep learning blog ecosystem, focusing on topics that combine fundamental ideas of both information theory and deep learning. Hopefully, we can leverage the deep learning hype to get more people interested in information theory.
Post 1: Maximum-entropy Inverse Reinforcement Learning
Chelsea’s research field is reinforcement learning (RL), and she found that the papers and blog posts she read rarely discussed the information-theoretic ideas used in RL in any depth. To help remedy this gap in the (blog) literature, it seemed fitting to address a popular technique that borrows a great deal from information theory: Maximum Entropy Inverse Reinforcement Learning (MaxEnt IRL). Hopefully the blog post can give other RL researchers a deeper understanding of the information-theoretic ideas behind MaxEnt IRL.
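To give a flavor of the connection, the standard formulation of MaxEnt IRL (following Ziebart et al.) models the expert as sampling trajectories from the maximum-entropy distribution consistent with the observed feature counts, which makes higher-reward trajectories exponentially more likely:

```latex
P(\tau \mid \theta) \;=\; \frac{\exp\!\left(\theta^{\top} f_{\tau}\right)}{Z(\theta)}
```

Here $f_{\tau}$ is the feature vector of trajectory $\tau$, $\theta$ parameterizes the reward, and $Z(\theta)$ is the partition function; the "maximum entropy" in the name refers to this distribution being the least-committed one that matches the expert's feature expectations.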
Post 2: Lossless compression with neural networks
Most deep learning papers either present a new network architecture or apply an existing architecture to a new problem. Kedar Tatwawadi’s paper, DeepZip: Lossless Compression using Recurrent Neural Networks, managed to do both, describing a new neural network/arithmetic coder hybrid and then using that model to tackle lossless compression, a domain neural network research hasn’t often explored. It seemed like a great topic to excite deep learning folks while teaching them about key information theory problems and the cool tools used in the field. The post covers the broad task of compression, the challenges of lossless compression, and Tatwawadi’s neural-network-based approach to it. Hopefully, it can get readers thinking about new ways to leverage machine learning for central information theory problems.
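The core idea the post builds on can be sketched compactly: an arithmetic coder driven by a predictive model spends about $-\log_2 p$ bits on a symbol the model assigned probability $p$, so better prediction directly means smaller files. The snippet below is a minimal illustration of that bound, not DeepZip's actual implementation; the `laplace_predict` model is a stand-in for the recurrent network, and all names here are my own:

```python
import math

def code_length_bits(sequence, predict):
    """Ideal total code length: an arithmetic coder spends about
    -log2 p(symbol) bits per symbol under the model's prediction."""
    total = 0.0
    history = []
    for sym in sequence:
        p = predict(history, sym)
        total += -math.log2(p)
        history.append(sym)
    return total

def laplace_predict(history, sym, alphabet_size=2):
    """A simple adaptive stand-in for the neural predictor:
    Laplace-smoothed symbol counts over a binary alphabet."""
    return (history.count(sym) + 1) / (len(history) + alphabet_size)

biased = [0] * 90 + [1] * 10  # highly predictable data
print(code_length_bits(biased, laplace_predict))
# well under the 100 bits a memoryless uniform code would use
```

Swapping `laplace_predict` for a trained recurrent network is, at this level of abstraction, the move the paper makes: the coder stays the same, and compression improves exactly as much as prediction does.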
Post 3: Information Bottleneck in neural networks
For our third blog post, we stumbled upon a topic exactly at the intersection of information theory and deep learning: the information bottleneck (IB) debate. This ongoing academic debate concerns an information-theoretic theory of how to understand the training process of deep neural networks. We originally approached the post as a summary of the debate, focusing on one of the original papers proposing the theory, by Naftali Tishby, and on a paper by Ziv Goldfeld countering some of its arguments. Ultimately, though, we found that explaining all sides of the debate in detail was a monumental task. At least 10 papers are involved, along with additional talks following up on many of them. After beginning discussions with both Tishby and Goldfeld, we realized that a proper survey of the debate would likely require extensive conversations with many different authors, and that properly addressing all of the papers would require substantial study of the field to fully understand the arguments being made. We settled on summarizing the original IB theory itself and discussing the various questions that have been raised about applying it to neural networks. In the end, this post was a good lesson in scoping a project so that it’s appropriate in size.
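For readers unfamiliar with the object under debate: in Tishby's original formulation, the IB principle seeks a representation $T$ of the input $X$ that is as compressed as possible while staying informative about the label $Y$, trading the two off with a Lagrange multiplier $\beta$:

```latex
\min_{p(t \mid x)} \; I(X; T) \;-\; \beta \, I(T; Y)
```

Here $I(\cdot\,;\cdot)$ denotes mutual information; the debate is largely about whether the hidden layers of a deep network actually follow this compression-then-fitting story during training.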
My outreach event was a pattern-guessing game played with cards from Set:
Kids would play in pairs. One player would come up with a pattern (say, red-green-purple, or stripe-stripe-solid) and the other player would try to guess it. The first player couldn’t tell the other anything about the pattern; all they could do was show examples of it. After each example, the second player got to take a guess. If they guessed right, they won. If they didn’t, the first player picked a new example to show them, and they kept going until the guesser got the pattern.
The goal of the game was to get kids thinking about what makes an example “informative.” It’s a nice introduction to the idea that some messages, even if they’re the same length, have more information than others, and that a good example minimizes the guesser’s uncertainty about the space of possible patterns.
Overall, I’d say the players figured it out pretty quickly. Some, especially older kids, picked up on it immediately and could get it in two guesses (or one, in some luckier cases). Younger kids took a bit longer with it, but it was really great seeing them piece things together as they played. A lot of the time, they’d assemble three cards, look them over, think for a second, then swap out a card or two for better, more informative ones. The biggest source of confusion was probably the definition of a pattern. I assumed it would be interpreted as a sequence of three attribute values, given the few examples of patterns I’d tell them, but I guess my examples weren’t informative enough to explain it!
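The idea the game gets at can be made quantitative. Below is a toy sketch under my own simplifying assumptions (patterns are sequences of three values of a single attribute, and only color and shading patterns are allowed): showing one example triple rules out every pattern it doesn't exhibit, and the uncertainty removed, in bits, is the log of how much the hypothesis space shrinks.

```python
import math
from itertools import product

COLORS = ["red", "green", "purple"]
SHADINGS = ["solid", "striped", "open"]

# Hypothesis space: a pattern is a sequence of three values of one attribute.
PATTERNS = [("color", seq) for seq in product(COLORS, repeat=3)] + \
           [("shading", seq) for seq in product(SHADINGS, repeat=3)]

def consistent(pattern, cards):
    """A triple of (color, shading) cards exhibits a pattern if its
    values for that attribute appear in the pattern's order."""
    attr, seq = pattern
    idx = 0 if attr == "color" else 1
    return tuple(card[idx] for card in cards) == seq

def bits_gained(hypotheses, cards):
    """Uncertainty removed by showing one example, in bits."""
    remaining = [p for p in hypotheses if consistent(p, cards)]
    return math.log2(len(hypotheses) / len(remaining)), remaining

example = [("red", "solid"), ("green", "striped"), ("purple", "solid")]
gained, remaining = bits_gained(PATTERNS, example)
# One example cuts 54 hypotheses down to 2 (its own color sequence and
# its own shading sequence), gaining log2(27) ~ 4.75 bits.
print(gained, len(remaining))
```

A second example that shares only the true pattern with the first would then pin it down exactly, which is roughly what the sharpest players were doing in two guesses.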
For my outreach project I prepared a “Bill Nye”-style presentation for the elementary school students at the event. With the help of Noor, I gave a short and hopefully fun lesson on a key information theory concept, entropy, and talked about one of the major problems in information theory: the joint source-channel coding problem. I did this through everyday examples in the hope that it would make the material more relatable for the kids. I also told the audience a little bit about my field of aerospace engineering, as I thought this would be exciting for them (who doesn’t love rockets?). I closed the presentation by talking about a time that I struggled in physics as a freshman in college. Overall, my intention was to convey a little information theory to the assembled audience, and to present myself as someone that the elementary school students could see themselves in. (Err, and I also wanted to tell a few jokes about space :P).
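For readers who haven't met entropy before, here is a tiny numerical illustration (my own toy example, not the everyday examples from the presentation): entropy measures the average surprise of a random outcome, so a fair coin carries a full bit per flip, a predictable coin much less, and a certain outcome none at all.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum p * log2(p): average surprise per outcome."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))  # fair coin: 1.0 bit
print(entropy_bits([0.9, 0.1]))  # biased coin: about 0.47 bits
print(entropy_bits([1.0]))       # certain outcome: 0 bits
```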