Light Conversation: Transmitting the Entropy of Spoken English
By Adam Lavertu, Tymor Hamamsy, and Justin Cheung
This blog post is split into three parts.
Part 1 explains why we chose our subject matter, the entropy of spoken English, and is written with a general audience in mind (including Nixon elementary students)
Part 2 explains how we built it
Part 3 explains what we built
Part 1:
Why:
Information, what is it really? Claude Shannon, the creator of the field of information theory, described information as a reduction in uncertainty.
Is language, both spoken and written, information? Yes.
How do we measure information? Entropy. Entropy is our surprise about information.
Claude Shannon brought entropy to information theory. He loved entropy so much that he called his house the Entropy House.
Language can be described using entropy. Some things we say more often, and they don’t carry as much surprise.
Low entropy: “The”, “And”, “There”, “Password”
High entropy: “supercalifragilisticexpialidocious”, “xylophone”
For example, you probably wouldn’t want to use a low entropy word to be your password…
Claude Shannon saw that language was information, in fact, he saw that all language was statistical in nature, saying: “anyone speaking a language possesses, implicitly, an enormous knowledge of the statistics of the language. Familiarity with the words, idioms, cliches, and grammar enables him to fill in missing or incorrect letters in proof-reading, or to complete an unfinished phrase in conversation.”
See how well you can use your statistical understanding of English by filling in the blanks of this phrase:
_____ Shannon was the invento_ of entrop_, he is my her_, I want to study informati_ theory for the ____ of my life.
Claude Shannon also did this in his foundational paper: Prediction and Entropy of Printed English.
In his paper, Claude outlined it as below. The first line is the original text; the second line contains a dash for each letter that was correctly guessed using the statistical nature of english.
Shannon even built his own communication system for encoding (reducing) and decoding (predicting) messages using statistics.
So, how did we arrive at English, or any language at all, as a species?
We arrived at English by making restrictive rules: our language has a restricted vocabulary and grammar. We need to be predictable, and we need to be less surprising (lower entropy).
Shannon once said, “Why doesn’t anyone say XFOML RXKHRJFFJUJ?”
What is “free” communication in English? And are we ever not restricted? How random and surprising (high entropy) can your English be?
Claude Shannon was a codebreaker during World War II, so he knew better than anyone that you can break languages into pieces and look at the frequencies (the inverse of surprise) of words, pairs of letters, letters, or any other reduction of language.
But how do you bring all words/letters into one common scale?
After you capture the entropy, you can convert it into bits.
What is a bit? It can either be 1 or 0. Heads or Tails.
All information in the universe can be reduced to bits. Silicon Valley made its billions from bits. And Shannon invented the bit (with some help from his friends at Bell Labs).
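To make the idea of bits concrete, here is a tiny Python sketch of how a symbol’s surprise can be measured in bits as -log2 of its probability (the word probabilities below are made up for illustration, not taken from our corpus):

from math import log2

def surprise_bits(probability):
    # Surprise (self-information) of a symbol, in bits.
    return -log2(probability)

print(surprise_bits(0.5))        # a fair coin flip: exactly 1 bit
print(surprise_bits(0.05))       # a common word (illustrative probability): ~4.3 bits
print(surprise_bits(0.0000001))  # a very rare word: ~23.3 bits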
At the heart of information theory is the mathematical communication of information.
Here is Shannon’s famous diagram:
Before Claude Shannon, “everyone thought that communication was involved in trying to find ways of communicating written language, spoken language, pictures, video, and all of these different things— that all of these would require different ways of communicating. Claude said no, you can turn all of them into binary digits. And then you can find ways of communicating the binary digits.” -Shannon’s colleague and friend Robert Gallager
So what are we showing you here and what does it have to do with Shannon and entropy?
Part 2:
How:
We wanted the forms of the sculpture to feel rotationally isotropic and simultaneously convey a sense of directional flow. Initial form concepts included 2.5D geometries such as polygonal landscapes or geodesic domes, and 3D geometries such as ellipsoids and teardrops. We settled on spherical forms to maximize animation flexibility.
We built three spheres of different sizes, with diameters of 12”, 18” and 24”. These sizes corresponded to the relative number of symbols that each would be a canvas for – unigrams require a large canvas, bigrams a smaller one, and words the smallest.
Modelling:
The spheres were modelled as layered dodecahedrons in Rhinoceros 3D. They consist of an inner skeleton, which provides the main structural rigidity and a surface on which to lay the LED strips. Offset spines were attached to the middle of each side of the dodecahedrons to provide a surface offset from the LED planes. The resulting structure has the same geometry as the inner skeleton, offset by 2.5”. This outer geometry provides supports for sandblasted acrylic panels that act as the diffusion layer for the LEDs. The final geometry can be thought of as 12 panelled slices.
Fabrication:
All pieces were cut on the PRL laser cutter. The structure was cut out of 0.21” (nominal 0.25”) thick duron. The acrylic panels were cut from 0.06” extruded acrylic and sandblasted on one side.
Assembly:
The sculptures were designed to not require fasteners: the duron skeleton is press-fit together and has tabs that hold the acrylic windows in place. Once the skeleton was assembled, LED strips (WS2812B, 60 units/m) were adhered to the inner spines. Once cut to length, the diffusion panels were snapped into place.
LED Control:
We used individually addressable WS2812B LED strips controlled by AllPixel controllers. Each sphere had its own controller. These were all controlled from a single computer.
Calculation of symbol probabilities:
We used three symbol schemes for the language entropy study: unigrams (single characters), bigrams (two characters), and words. In order to calculate the surprise associated with each symbol, we sought a conversational corpus from which we could empirically calculate the probabilities. This led us to the Reddit comment corpus: we downloaded all ~42 million of the comments submitted to the site in January 2014 and used this text as the basis for calculating the empirical probability associated with each symbol.
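In outline, the counting step looked roughly like the sketch below. This is a simplified version of our pipeline, assuming the comments have already been extracted to a plain text file; the file name and the exact tokenization rules are illustrative.

import re
from collections import Counter
from math import log2

unigram_counts = Counter()
bigram_counts = Counter()
word_counts = Counter()

# Illustrative input: one Reddit comment per line, already extracted to plain text.
with open("reddit_comments_2014_01.txt", encoding="utf-8") as f:
    for comment in f:
        words = re.findall(r"[a-z']+", comment.lower())
        word_counts.update(words)
        for word in words:
            unigram_counts.update(word)  # single characters
            bigram_counts.update(word[i:i + 2] for i in range(len(word) - 1))  # character pairs

def surprise_table(counts):
    # Convert raw counts into empirical surprise values (in bits) per symbol.
    total = sum(counts.values())
    return {symbol: -log2(n / total) for symbol, n in counts.items()}

word_surprise = surprise_table(word_counts)
unigram_surprise = surprise_table(unigram_counts)
bigram_surprise = surprise_table(bigram_counts)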
Voice interaction system:
The end-to-end speech-to-surprise-light system was implemented in Python 3.6. We used the Python “SpeechRecognition” package and the Google speech-to-text API to capture audio and extract the associated text.
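The capture step is a thin wrapper around the SpeechRecognition package; a minimal sketch of it looks something like this (listen_and_transcribe is an illustrative name, not our exact code):

import speech_recognition as sr

recognizer = sr.Recognizer()

def listen_and_transcribe():
    # Record one utterance from the microphone and return the recognized text.
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=2)  # calibrate for background noise
        print("Say Something!")
        audio = recognizer.listen(source)  # records until the speaker pauses
    return recognizer.recognize_google(audio)  # send the audio to Google for transcription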
Programmatic control of the LED lights on all three spheres was achieved through the BiblioPixel library. We manually mapped the location of each LED within the sphere segments to the corresponding integer position of that LED on the LED strip. The input text was then tokenized according to the three different symbol schemes, and the LEDs on the corresponding sphere were set according to the surprise associated with each token, following the color scheme below:
A primary issue was the difference in symbol set sizes across the three schemes: for example, the input text “dog” yields a set of size 3 in the unigram scheme, size 2 in the bigram scheme, and size 1 in the word scheme, corresponding to 3, 2, and 1 potentially different LED colors. To adjust for this, we matched sphere size to the relative symbol set sizes (i.e. big sphere = unigrams, medium sphere = bigrams, small sphere = words). Tokens were then binned into the above color scheme according to their empirical surprise values. LEDs were set in blocks of 3 to ensure that colors were distinguishable after the diffusion layer.
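Putting the tokenization and binning together, a simplified sketch of the mapping from text to LED colors looks like this (the bin edges, palette, and function names are placeholders rather than our actual color scheme):

MAX_SURPRISE = 24.0  # illustrative upper bound on surprise, in bits
PALETTE = [(i * 20, 0, 255 - i * 20) for i in range(12)]  # 12 placeholder RGB colors

def tokenize(text, scheme):
    # Break the input text into symbols under one of the three schemes.
    words = text.lower().split()
    if scheme == "word":
        return words
    if scheme == "unigram":
        return [ch for word in words for ch in word]
    return [word[i:i + 2] for word in words for i in range(len(word) - 1)]  # bigrams

def color_for(token, surprise_table):
    # Bin a token's empirical surprise into one of the 12 palette colors.
    surprise = surprise_table.get(token, MAX_SURPRISE)  # unseen tokens are maximally surprising
    bin_index = min(int(surprise / MAX_SURPRISE * 12), 11)
    return PALETTE[bin_index]

def led_colors(text, scheme, surprise_table, leds_per_token=3):
    # Expand each token's color into a block of 3 LEDs so colors stay visible after diffusion.
    colors = []
    for token in tokenize(text, scheme):
        colors.extend([color_for(token, surprise_table)] * leds_per_token)
    return colors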
The final system is run from the command line. It defaults all LEDs to a single color (red for the beta test), and users prompt the system with the Enter key. The recording device first adjusts for ambient noise for two seconds, then prompts the user with “Say Something!”. Audio is recorded until the user stops speaking; the audio is then transformed to text and tokenized, and the associated color schemes are presented on each of the spheres. The colors stay on the spheres until the user presses the Enter key again, at which point the system resets and is ready for the next user.
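Pulling the pieces together, the interaction loop is roughly the sketch below. The helper names reuse the sketches above and are illustrative; set_sphere and reset_spheres stand in for the BiblioPixel calls that actually drive the strips.

IDLE_COLOR = (255, 0, 0)  # red, the default color used for the beta test

def set_sphere(sphere, colors):
    # Stand-in for the BiblioPixel calls that push a list of RGB colors to one sphere's strip.
    print(sphere, "sphere:", len(colors), "LEDs set")

def reset_spheres():
    # Stand-in for setting every LED on all three spheres back to the idle color.
    print("all spheres reset to", IDLE_COLOR)

SCHEMES = {"big": "unigram", "medium": "bigram", "small": "word"}
SURPRISE = {"unigram": unigram_surprise, "bigram": bigram_surprise, "word": word_surprise}

while True:
    input("Press Enter and say something...")
    text = listen_and_transcribe()  # from the speech-capture sketch above
    print("You said:", text)
    for sphere, scheme in SCHEMES.items():
        set_sphere(sphere, led_colors(text, scheme, SURPRISE[scheme]))
    input("Press Enter to reset...")
    reset_spheres()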
Part 3:
What:
We wanted to do a spin on Claude Shannon’s seminal “Prediction and Entropy of Printed English” by making a sculpture that reflects the prediction and entropy of spoken English. Our main goal was to convey the entropy of spoken language as surprise, and to show surprise with light. We also wanted to convey the statistical nature of English and how it can be deconstructed in so many different ways, using unigrams (individual letters), bigrams (pairs of letters), and words to convey this deconstruction. Entropy in information theory was coined by Claude Shannon, who was inspired by entropy in physics; the second law of thermodynamics states that the entropy of an isolated system never decreases. We wanted to convey that the (Shannon) entropy of the history of spoken English is also never decreasing, and that all language has non-negative entropy, so we keep records (as lights) of everything that was said in the past.
Here is a demo of some of the Nixon students using the system:
https://drive.google.com/file/d/1LA5Ewgu8c-z4Ul6TZW_DXrorXoBcUudP/view?usp=sharing
As a visitor approaches the sculpture, we prompt the visitor with a message to say something. We then send the recorded audio to the Google Cloud API to recognize the speech, and we display the spoken English on the computer for the visitor. An essential part of displaying the spoken English back to the visitor is to convey the concept of a noisy channel. For the Nixon students especially, this concept could not have been more real, because speech recognition models are trained mainly on adult voices and require clear, slow enunciation. For many of the Nixon students, the channel was so noisy that the meaning of their messages was completely distorted.
https://drive.google.com/file/d/1e4Yw4P6EQkKIQ0IRjBI7sIlnLoCZmUdu/view
After we receive the transcript of the spoken English, we break it into three channels, unigrams, bigrams, and words, and calculate the entropy of each channel. The corpus we use for frequencies is Reddit (conversational text): our backend stores the word, unigram, and bigram frequencies that form the basis for our entropy calculations.
Finally, we bin each of our entropy calculations into 12 bins, ranging from very low surprise (“the”, “and”) to very high surprise (“supercalifragilisticexpialidocious”), and we map every unit (letter, pair of letters, or word) to its respective orb (big, medium, or small, respectively), with every unit mapped to 3 LEDs.
Thanks for tuning in. It was a blast to make this and be a part of this class!