Teaching Information Theory with Minimal Equations to a High Schooler

Blog, Information Theory (Winter 2021)

By: Rebecca Wong (rwong01@stanford.edu)

I had the opportunity to mentor a lovely high school senior, Sam, whom I was connected with through my old high school physics teacher. Sam is taking physics right now, but she intends to study political science in college and is interested in law and social issues. Even though her main interest is not in a hard science or engineering discipline, she is broadly curious and a voracious learner who loves exploring courses and topics outside of her immediate interests. I’m very grateful for that, as it made our conversations engaging and challenged me to translate very math-heavy topics into something she could understand and appreciate without the overhead of a math background.

Over the course of the quarter, we met weekly on Saturday afternoons, nominally for an hour, though it usually turned into something closer to an hour and a half. Besides talking about information theory, I took the opportunity to share my college experience with her, since she was wrapping up her college applications earlier in the quarter and is now waiting to hear back on results. She was also very interested in my entrepreneurial experience starting a company, which I enjoyed sharing, and I encouraged her to explore entrepreneurship further in college, wherever she ends up. We usually spent the first 20 minutes or so talking about how to make the most of college and its joys and challenges, drawing on our shared experience of the same high school and what it does and doesn’t prepare you for, and then we’d launch into our information theory topics.

Because Sam didn’t come from a particularly technical background, I spent the first few sessions motivating what information theory is, why it is useful, and providing a lot of examples of how different topics are used. For instance, I think this was her first exposure to the idea of compression and bits, as well as how computers and devices communicate with each other, so I gave examples of how zipping a file can make it smaller and easier to share with another person, or how images and video are compressed so they can be rapidly distributed across the internet for millions of people to view. I also motivated the fundamentals of information theory with the philosophical question ‘what is information, really?’ to start a conversation about whether information is tied to the physical manifestation of an object (I ultimately claimed it isn’t). A file on a USB drive, a file on a hard drive, and a printed copy all represent the same information.

I introduced the concept of bits and described computers as just a collection of tiny, tiny switches that could be either on or off, and we talked through why it was much easier to build a computer out of lots and lots of binary switches (and why binary was used) instead of switches with 10 different states (to use the decimal system). I wanted to make sure there was a strong motivation for why we would use and care about such an unfamiliar number system, and that it was not an arbitrary choice made just to make things harder than decimal. While I don’t think she was ever totally comfortable with arithmetic and logarithms in base 2, I think she understood enough of the basics of why this different system was used (and that there are other systems besides decimal).

Because reading numbers in binary was pretty difficult, I used this as a jumping-off point to talk about prefix codes. I described receiving an entire string of ones and zeros and showed how, without a prefix-free code (one in which no codeword is the beginning of another codeword), you would easily get confused and be unable to decode a message. With a prefix-free code, however, you can read even a very long and complex string without ambiguity. One difficulty I had in teaching these new concepts was figuring out whether I should go deeper and spend more time on these fundamentals or keep things high level to give a broader understanding. Ultimately, I chose the latter, opting to skim over some of the details where she didn’t seem to mind so we could build toward something more interesting.
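To make the decoding idea concrete, here is a minimal sketch in Python. The two code tables and the function names are my own hypothetical examples, not the ones we used in our sessions. The first table is prefix-free, so reading left to right always gives one answer; in the second, "0" is the beginning of "01", so the same bits can be read in more than one way.

```python
# Hypothetical code tables, chosen only for illustration.
PREFIX_FREE = {"0": "A", "10": "B", "110": "C", "111": "D"}
AMBIGUOUS = {"0": "A", "1": "B", "01": "C"}  # "0" is a prefix of "01"


def decode(bits, codebook):
    """Read bits left to right, emitting a symbol as soon as a codeword matches.

    This greedy strategy is only guaranteed to be correct for a prefix-free code.
    """
    symbols = []
    current = ""
    for bit in bits:
        current += bit
        if current in codebook:
            symbols.append(codebook[current])
            current = ""
    if current:
        raise ValueError("leftover bits that match no codeword: " + current)
    return "".join(symbols)


def all_parses(bits, codebook):
    """Return every way of splitting bits into codewords (as decoded strings)."""
    if not bits:
        return [""]
    results = []
    for word, symbol in codebook.items():
        if bits.startswith(word):
            for rest in all_parses(bits[len(word):], codebook):
                results.append(symbol + rest)
    return results


print(decode("0101100111", PREFIX_FREE))      # ABCAD
print(all_parses("0101100111", PREFIX_FREE))  # exactly one reading
print(all_parses("01", AMBIGUOUS))            # two readings: "AB" and "C"
```

With the prefix-free table, the receiver never has to look ahead or backtrack, which is the property that makes a long string of ones and zeros readable.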

One of the core concepts that we built up to was the “surprise” function and the notion of entropy in a probabilistic system. I think the intuition of comparing a fair coin flip (equally surprised by either outcome) with an unfair coin flip (less surprised by the more likely outcome and more surprised by the less likely outcome) made this concept easy to grasp. I challenged her to come up with a mathematical function that she thought could model this kind of relationship, and while she didn’t come back with a totally rigorous answer, she had enough intuition that a \frac{1}{x} form made sense, where x is the probability of the outcome. Taking the log of it, so that something with probability 1 has no surprise, was then relatively straightforward. That is how we arrived at our surprise function, \log \frac{1}{x}.
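As a small illustration, the surprise function can be sketched in a few lines of Python. The function name `surprise` and the choice of base 2 (so that the answer is measured in bits) are my assumptions for this sketch.

```python
import math


def surprise(p):
    """Surprise, in bits, of an outcome that happens with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be a probability in (0, 1]")
    return math.log2(1 / p)


print(surprise(1.0))  # a certain outcome carries no surprise
print(surprise(0.5))  # a fair coin flip: one bit of surprise
print(surprise(0.9))  # the likely side of an unfair coin: a small surprise
print(surprise(0.1))  # the unlikely side: a much larger surprise
```

The 1/p part makes rare outcomes more surprising, and the log is what brings a certain outcome down to exactly zero.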

Over the next couple of sessions, I introduced weighted averages, which were also new to her. I drew parallels to calculating a mean, but gave examples of unevenly distributed numbers to show how the average shifts compared with the uniform case. Since we had both watched The Social Dilemma, I gave the example of how an ad does not have an equal probability of being shown to every single person on Instagram or Facebook. Instead, there’s a weighting on, say, how likely someone is to like or click on that ad (based on the “personas” built up), and that feeds into the algorithms that decide which ads are shown to which people. We worked through how to represent this mathematically to build confidence in talking about non-uniform distributions, which would be important for later explanations of entropy. Finally, we constructed the equation for entropy as the “weighted average of the surprise function” by combining these two concepts. This was the most math-intensive part of our discussions, and I thought Sam handled all of the new ideas very well. I didn’t expect 100% comprehension, but it was great to see that she got most of the ideas.
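Here is a rough sketch of how the two ideas combine, again in Python. The function names and the example numbers are hypothetical, and I am assuming base-2 logarithms so that entropy comes out in bits.

```python
import math


def weighted_average(values, weights):
    """Average of values where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def entropy(probs):
    """Entropy in bits: each outcome's surprise, weighted by its probability.

    Assumes probs sum to 1. Outcomes with probability 0 are skipped, since
    they never occur and contribute nothing to the average.
    """
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


# Equal weights give the ordinary mean; uneven weights pull the average
# toward the heavily weighted value.
print(weighted_average([1, 2, 3], [1, 1, 1]))
print(weighted_average([1, 2, 3], [1, 1, 8]))

print(entropy([0.5, 0.5]))  # fair coin: 1 bit
print(entropy([0.9, 0.1]))  # unfair coin: less than 1 bit
```

The unfair coin has lower entropy than the fair one because, most of the time, you see the outcome that barely surprises you.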

In the last couple of sessions, I brought back high-level concepts and examples of how entropy tells us how much better we can (or can’t) get in our efforts to communicate more efficiently. I think the analogy to English text was interesting: if you can write the same idea in fewer words, we would say that the shorter description is more effective, and in general, great communicators are able to say exactly what is needed to convey an idea, no more, no less. No need to be loquacious if you don’t need to be (perhaps unlike this blog post). The intuition that you can’t get arbitrarily good (you can’t communicate nothing if you need to convey an idea) helped explain why a bound exists on the compression of information. There is something fundamental, almost like mass, to information that we can’t get rid of unless we start throwing out information.

Looking back on our sessions, they really challenged me to get creative about stripping the math out of our course material and homing in on the core ideas couched in the equations. I couldn’t fall back on mathematical definitions of concepts and had to dig deeper for something meaningful that could be understood without a coding or math background. At the same time, I took it as an opportunity to introduce some new math concepts that I don’t think she would be exposed to even in her calculus class, and I tried to motivate everything with a real-world application to show how real and relevant math is in making the technology in our daily lives possible.

One challenge I had was how to write something out and share it with Sam when I wanted to draw a diagram or show an image. I started to use Google Jamboard, which turned out to be a pretty handy tool, even though I was still limited to writing on the screen with my cursor. A tablet or something similar would have helped, as my cursor art is not that legible. Nonetheless, I think using Jamboard greatly improved our ability to communicate with each other and my ability to make sure she was following along. I would always ask whether she had any questions, but I also worried that she might not know how to ask a question without having the vocabulary for it. I tried to remedy this by asking her to explain a concept back to me as she understood it. That helped me see her thought process and the words she would use to describe something, so I could build on that basis as much as possible rather than trying to force my understanding and language onto her.

Sam was an awesome mentee and was incredibly patient and curious which made for very enjoyable sessions. I learned a lot about how to communicate technical ideas in a relatively non-technical manner and hope that my practical approach helped bring more understanding to how these ideas were grounded in reality. I don’t expect that she’ll suddenly become an engineering major, but I hope that I’ve opened a door for her curiosity to follow later on.
