Secret Codes

Information Theory (Winter 2020)

Ryan Stocking and Stephen Badger

Introduction & Motivation:

Our team was very excited to share our interest in information theory and engineering with the students of Ravenswood Middle School, so we opted for an outreach-based project. In addition to our video, we made a worksheet for the kids to follow, which provides more information and introduces extra concepts. We hope that our lesson teaches them about secret codes as well as how to make their messages safer. Combined with all of the other great outreach projects done this year, it can help them learn how entropy and information theory are used in technology that they use every day.


Different Cipher Explanations:

Let’s say you want to send this message to your friend by writing it in code.

MEET YOU LATER

Our first code is called the A1Z26 cipher (cipher is another word for code). It’s very simple. It uses the number 1 for A, 2 for B, all the way up to 26 for Z, which is where the name comes from.

Using this chart, our code would be 13 for M, 5 for E, all the way until 18 for R.

13 5 5 20   25 15 21   12 1 20 5 18

This looks like nonsense! Your friend can use this chart to decode the message, but anyone else who sees it won’t know what it means.

But if you don’t want other people to read your message, there’s a big problem. Anyone else can also use this chart to read your message. There’s only one way to code and decode the message, so it’s pretty easy to guess how to decode.
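For readers who like to program, the A1Z26 chart can be sketched in a few lines of Python. This is only an illustration: the function names a1z26_encode and a1z26_decode are made up for this example, and the choice of one space between numbers and three spaces between words is an assumption.

```python
def a1z26_encode(message):
    """Turn each letter into its position in the alphabet (A=1 ... Z=26)."""
    words = message.upper().split()
    # Assumed layout: one space between numbers, three spaces between words.
    return "   ".join(
        " ".join(str(ord(letter) - ord("A") + 1)
                 for letter in word if "A" <= letter <= "Z")
        for word in words
    )


def a1z26_decode(code):
    """Turn each number back into the letter at that position."""
    words = code.split("   ")
    return " ".join(
        "".join(chr(int(number) + ord("A") - 1) for number in word.split())
        for word in words
    )


print(a1z26_encode("MEET YOU LATER"))
print(a1z26_decode("13 5 5 20   25 15 21   12 1 20 5 18"))
```

Notice that neither function needs a secret: anyone who knows the chart can run the decoder, which is exactly the weakness described above.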

Our next code is a little trickier to figure out. It’s called a Caesar cipher. This time, we pick a secret number from 1 to 25 (adding 26 would just bring every letter back to itself). Let’s say we picked 5. We use the same chart as before, in three steps. Step 1 is to change each letter into its number form. Step 2 is to add our secret number. Step 3 is to report back the letter that’s above that number.

If the number we get in Step 2 was more than 26, we subtract 26 and then go to Step 3.

Let’s go through it step-by-step for our first few letters.

Our first letter is M. The number below M is 13. If we add 5, we get 18, which is below R. So we report back R. This is shown below.

If we do this for the rest of MEET, we will get RJJY. Then, when we get to the Y of YOU, we change it into 25 and add 5, getting 30. We subtract 26 and get 4, so we report back D.

The code we will get is below.

RJJY DTZ QFYJW
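The three steps can also be written as a short Python sketch. The names caesar_shift, caesar_encode, and caesar_decode are made up for this example, and the "subtract 26" rule is written with the remainder operator (%) so that the same function can shift backwards for decoding.

```python
def caesar_shift(message, secret_number):
    """Shift every letter by secret_number positions, wrapping around after Z."""
    result = []
    for character in message.upper():
        if "A" <= character <= "Z":
            number = ord(character) - ord("A") + 1      # Step 1: letter -> number
            number += secret_number                     # Step 2: add the secret number
            number = (number - 1) % 26 + 1              # wrap back into 1..26
            result.append(chr(number + ord("A") - 1))   # Step 3: number -> letter
        else:
            result.append(character)  # spaces and punctuation are left alone
    return "".join(result)


def caesar_encode(message, secret_number):
    return caesar_shift(message, secret_number)


def caesar_decode(code, secret_number):
    # Decoding is the same shift in the opposite direction.
    return caesar_shift(code, -secret_number)


print(caesar_encode("MEET YOU LATER", 5))   # RJJY DTZ QFYJW
print(caesar_decode("RJJY DTZ QFYJW", 5))   # MEET YOU LATER
```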

This time, our friend also needs to know our secret number to decode the message. Anyone who doesn’t know the secret number would have to try every possible shift. Since only 25 shifts actually change the message, this will slow them down, but not very much.
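To see how little work it takes to try every secret number, here is a rough Python sketch. The names shift_back and try_every_shift are made up for this example; a person (or another program) still has to look through the guesses and spot the one that reads like English.

```python
def shift_back(code, secret_number):
    """Undo a Caesar shift of secret_number positions."""
    letters = []
    for character in code.upper():
        if "A" <= character <= "Z":
            position = (ord(character) - ord("A") - secret_number) % 26
            letters.append(chr(position + ord("A")))
        else:
            letters.append(character)  # keep spaces and punctuation
    return "".join(letters)


def try_every_shift(code):
    """Return a dictionary mapping each shift from 1 to 25 to its decoding."""
    return {shift: shift_back(code, shift) for shift in range(1, 26)}


for shift, guess in try_every_shift("RJJY DTZ QFYJW").items():
    print(shift, guess)
```

Only one of the printed lines is readable, and it appears next to the secret number 5.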

Our last and most advanced code is the random substitution cipher. This code replaces every letter with a different letter, but the letters don’t have to be in order. For example, we might say that A is represented by a G, B is represented by P, and so on, choosing a different random letter for every other letter. No letter should be used twice. Our cipher might look like this.

Now, our secret code will be

NVVD SLC HGDVU

This code would be very hard to guess. You could choose any of 26 letters for A, then any of the 25 remaining letters for B, and so on. There are 26 x 25 x 24 x … x 2 x 1 possibilities for this code, or 403,291,461,126,605,635,584,000,000 (roughly a 4 followed by 26 zeros). This is more than most estimates of the number of stars in the observable universe, so you’re probably safe from anyone trying all of them.
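A random substitution cipher and the count of possible keys can be sketched in Python as follows. The helper names (make_random_key, substitute, reverse_key) and the seed value are made up for this example. The dictionary example_pairs holds only the letter pairs that can be read off the example message, so it is deliberately a partial key rather than the full chart.

```python
import math
import random
import string


def make_random_key(seed=None):
    """Pair every letter A-Z with a code letter, using each code letter once."""
    shuffled = list(string.ascii_uppercase)
    # A plain shuffle may occasionally leave a letter paired with itself;
    # the count of 26 x 25 x ... x 1 keys includes those cases too.
    random.Random(seed).shuffle(shuffled)
    return dict(zip(string.ascii_uppercase, shuffled))


def substitute(message, key):
    """Replace each letter using the key; anything not in the key is kept."""
    return "".join(key.get(character, character) for character in message.upper())


def reverse_key(key):
    """Swap the key around so it can be used for decoding."""
    return {coded: plain for plain, coded in key.items()}


# Partial key: only the pairs visible in the example message.
example_pairs = {"A": "G", "E": "V", "L": "H", "M": "N", "O": "L",
                 "R": "U", "T": "D", "U": "C", "Y": "S"}
print(substitute("MEET YOU LATER", example_pairs))   # NVVD SLC HGDVU

key = make_random_key(seed=2020)  # hypothetical key; the seed is arbitrary
secret = substitute("MEET YOU LATER", key)
print(secret, "->", substitute(secret, reverse_key(key)))

# Number of possible keys: 26 x 25 x 24 x ... x 1
print(math.factorial(26))  # 403291461126605635584000000
```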


Advanced Topic: Frequency Analysis

Unfortunately, our codes might not be as secure as we hope. One reason is frequency analysis. Notice that in our random substitution cipher, every time the same letter is encoded, it is represented by the same letter. In the example code, A will be represented by a G every time it comes up.

A savvy observer might notice that certain letters are more common in English writing than others. This chart shows roughly how common each letter is.

As you can see in the chart, the most common letter is E. Notice that this was also true in our message, since there were 3 E’s. By contrast, J, Q, X, and Z are quite rare.

The savvy observer might guess that the most common letter in the code corresponds to an E. In our code above, they might see the 3 V’s and correctly guess they correspond to E, and then they can decode any other V’s as E’s.

Our code also has two T’s which are represented by D’s. Since T is the second most common letter, they might be able to guess that when they see a D, they can decode it as a T.

If our message is long enough, frequency analysis is a powerful technique that gives someone a reasonably good chance of guessing at least part of the code.
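The counting that a savvy observer would do by hand can be sketched in Python. The function name letter_counts is made up for this example. It only counts letters; matching the counts to likely English letters is still left to the person reading them.

```python
from collections import Counter


def letter_counts(code):
    """Count how often each letter appears, ignoring spaces and punctuation."""
    return Counter(character for character in code.upper()
                   if "A" <= character <= "Z")


counts = letter_counts("NVVD SLC HGDVU")
print(counts["V"])  # 3, the most common letter, so a good guess for E
print(counts["D"])  # 2, the next most common, so a good guess for T
```

With a message this short the counts are only a hint. The longer the message, the more closely the counts tend to follow the usual pattern of English letters.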


Worksheet:

Example 1

What kind of cipher shifts every letter by the same number of positions?

Example 2

Decode this A1Z26 code: 14 9 3 5    10 15 2 !

Example 3

If a Caesar cipher encodes the word SECRET as HTRGTI, how many positions were shifted?

Using this shift, what is the decoding of YJAXJH?

Example 4

Use the random substitution cipher from the How To Guide to decode this message, and encode your response.

E   HLMV   RLDVF PVRGSFV:

Outreach at Ravenswood Middle School

Over the course of the quarter, students in EE276 have been preparing for an outreach event at Ravenswood Middle School. The class will be teaching a wide range of topics related to information theory. Some teams are talking about mapping political landscapes, while others delve into the theory of code breaking. There are groups demonstrating military applications in flying jets and others showing information theory through Fortnite. From all of us at Stanford, we are excited to share what we’ve learned this quarter and help inspire the next group of scientists and engineers!

We hope everyone is getting excited for a great afternoon!