Humans vs. Animals: Who’s Best at Conversation?

EE376A (Winter 2019)

By Logan Spear, Grant Spellman, Heather Shen, and Andre Cornman


Introduction
Human conversation is made up of more than just the words we speak; our messages are enriched by ‘paralinguistics’, or modifications to the way we deliver our words (such as tone, volume, and pitch) and nonverbal cues (consider facial expressions, posture, and gaze). Together, these ‘superlinguistic’ alterations to our words can enhance and amplify our messages or change their meaning completely. But how can we measure and quantify these effects? For this final project, our group explored the complexity of human speech and communication as well as various channels (in addition to speech) for human conversation.

Outreach
Given that the audience at Nixon Elementary School was largely first through fifth graders, we narrowed our outreach event to focus on showing students how to visualize and compare various sounds.

We designed our project interface with two main panels: one for visualizing different animal sounds, and one for interactively visualizing short voice recordings (see Fig. 1). On the first panel, students could choose from a list of animals (monkey, duck, elephant, or dog), play a recorded clip of the selected animal, and visualize the sound using time and spectrogram plots. On the second panel, students could record a three-second clip of their voice and visualize it with the same time and spectrogram plots.

Figure 1: Left – the recorded noise and spectrogram of a monkey. Right – the recorded noise and spectrogram of a person talking.
Figure 2: Left – the recorded noise and spectrogram of a monkey. Right – the recorded noise and spectrogram of an undulating sound.

We found that most of the students easily understood how the amplitude of their voice affected the time plot. However, most students were not yet familiar with the idea of frequency, and being able to visualize it in the spectrogram was exciting and new to them! You may have heard some high-pitched noises — do not worry, that was just some excited young students learning the correlation between frequency and sound. Students were particularly excited to see an undulating noise visualized as a wave in the spectrogram (see Fig. 2) or a whistle transformed into a straight line over time in the frequency domain (see Fig. 3).

Figure 3: Left – the recorded noise and spectrogram of a duck. Right – the recorded noise and spectrogram of a whistle.
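As a rough illustration of what the spectrogram panel was showing, here is a minimal sketch (not our actual GUI code) using NumPy and SciPy. A synthetic steady tone stands in for a whistle; the sample rate and analysis window are our own choices for the example.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000                                        # sample rate in Hz (assumed)
t = np.linspace(0, 3.0, 3 * fs, endpoint=False)   # a three-second clip
clip = np.sin(2 * np.pi * 1000 * t)               # stand-in for a steady whistle

# Short-time Fourier analysis: rows of Sxx are frequencies, columns are
# time frames, and Sxx[i, j] is the power at frequency i during frame j.
freqs, times, Sxx = spectrogram(clip, fs=fs, nperseg=512)

# The "straight line" the students saw: the loudest frequency in every
# frame sits at the whistle's pitch.
dominant = freqs[np.argmax(Sxx, axis=0)]
```

Plotting `Sxx` against `times` and `freqs` (e.g. with `matplotlib.pyplot.pcolormesh`) gives the familiar spectrogram image; for a steady whistle, `dominant` stays flat across frames, while an undulating sound traces out a wave.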

To ground these concepts for students, we compared their own spectrograms with well-known animal noises, including a monkey, elephant, duck, and dog. Students were then able to try to emulate animal noises and see how that also might affect the spectrogram.
For students with a more advanced understanding of frequency (and parents curious about our project), we started to broach the idea of measuring the “complexity” of sound with simple metrics. For example, after thresholding the recorded sound to reduce noise, we displayed the minimum and maximum frequency as well as the tonal variance on our GUI. Although simplistic, these metrics began the conversation around what makes human language more complex than animal communication. For example, we were able to show that the frequency range of human speech was typically larger than that of the animals, with greater variance.
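The GUI metrics can be sketched along these lines. The relative-threshold scheme and its value here are our own guesses rather than the exact ones we shipped, but they show why a frequency sweep scores a wider range and higher tonal variance than a steady tone.

```python
import numpy as np
from scipy.signal import chirp, spectrogram

def sound_metrics(clip, fs, rel_threshold=0.01):
    """Return (min frequency, max frequency, tonal variance) of a clip.

    Bins quieter than `rel_threshold` times the loudest bin are treated
    as noise and ignored (a stand-in for the GUI's noise thresholding).
    """
    freqs, _, Sxx = spectrogram(clip, fs=fs, nperseg=512)
    active = Sxx > rel_threshold * Sxx.max()      # drop low-energy bins
    present = freqs[active.any(axis=1)]           # frequencies ever active
    dominant = freqs[np.argmax(Sxx, axis=0)]      # loudest frequency per frame
    return present.min(), present.max(), dominant.var()

fs = 16000
t = np.linspace(0, 3.0, 3 * fs, endpoint=False)
steady = np.sin(2 * np.pi * 440 * t)              # flat, whistle-like tone
sweep = chirp(t, f0=200, f1=3000, t1=3.0)         # rising frequency sweep
metrics_steady = sound_metrics(steady, fs)
metrics_sweep = sound_metrics(sweep, fs)
```

The sweep covers roughly 200–3000 Hz, so both its frequency range and the variance of its frame-by-frame dominant frequency come out far larger than the steady tone’s, mirroring the human-vs-animal comparison on the GUI.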

Photos from the outreach event at Nixon Elementary School

Slightly more technical stuff
Now, we’ll discuss some ideas we’ve come up with through the process of creating our outreach project, reading some papers, and discussing things over dinner.

Initially, we wanted to do a project in which we captured the “sophistication” or “complexity” of a signal (specifically an audio clip) in order to compare the potential capacities of different species. What we found out, rather quickly, is that that task is extremely difficult. Within our research, though, we did find a paper that sparked our interest, ultimately leading us to flesh out the ideas we outline below. Thanks to Ariana, our mentor, for sending us so many great sources, such as this paper.

The paper that sparked our adventure into the idea for a model of human communication was On the Information Rate of Speech Communication. In this paper, they propose a model of communication and use it to estimate the information rate of English speech. The authors assume no prior knowledge of how language works, and rely on recordings of different people speaking the same sentences. Most importantly, they assume that a message consists entirely of the string of words spoken, and has no dependence on any other factor. When we read it, we were like “yeah, okay, human communication (not just speech) is complicated, and you’re going to need to make some simplifying assumptions, but that one just seems too big.” That sentiment sparked our discussions on the topic of human communication, and eventually led to the (rough) idea of a model we present now.

Our Model
At the heart of our model is the idea that a message between humans is made up of more than just the words spoken. We believe that there are many “superlinguistic” methods of transmitting and receiving information, and our proposed model tries to account for those factors. At a high level, our model consists of a message that gets chopped up into pieces (some of which may be redundant), and then those pieces each get sent across a different channel to the receiver. The receiver then tries to reconstruct the original message given all of the pieces she received. The idea is that the channels represent the various forms of superlinguistic communication, and a true message is made up of more than just the words received over one channel.

We’ll now provide a deeper discussion of the message (which we ourselves still don’t fully understand), and then the channels (which, again, we don’t fully understand).

The Message
To begin with, we point out that messages, as we see them, can be very complicated, since it’s possible to overload single interactions with multiple meanings (some of which are even unintentional). We’ll explore two different example interactions which will motivate our discussion of messages and subsequent discussion of channels.

Imagine first what we consider to be a very simple, direct message: one friend saying to another “pass me the broccoli” at the dinner table. This is an example of one of the simplest messages we could come up with: the speaker’s main (and likely only) intent is for her friend to pass her the broccoli so that she can add some to her plate. In this context, this message makes practically no use of any of the superlinguistic channels of communication, and the primary message is very clear.

Now imagine the case when a friend stomps into the room, sits down heavily in the chair next to you, sighs loudly, and says “I’m exhausted.” In this case, we imagine that the primary message that the speaker is communicating is actually a plea for the listener to ask them what’s up and comfort them, not the mere statement that they’re exhausted. This (rather contrived) example is meant to demonstrate how much meaning can be conveyed through channels other than the selection of words spoken.

In general, we see a single message as something that can actually contain information or statements regarding many different things. Some general types of information within messages that we’ve come up with include information about the speaker, the speaker’s opinion of other people or things, a request of the listener, or an invitation to the listener. Honestly, trying to capture all of the different types of information that could be communicated is very difficult, and this list is extremely incomplete and messy. The point, though, is to convey how layered a message can be and to emphasize that there can be multiple discrete sub-messages contained within a single message. In general, we consider most messages to contain a primary message (the majority of what the speaker aims to communicate) and often a secondary message (usually an implication made by the speaker). The remaining information is general background on the context or character of the speaker, consisting mostly of things the listener infers instinctively rather than things the speaker transmits intentionally.

So, in summary, messages have the potential to be very complex because a single message can contain multiple discrete ideas. To make things at least a little tractable, we consider that most messages contain a primary message (the speaker’s intended message), possibly some subsequent messages (such as intentional suggestions by the speaker), and finally general information, which is what you learn about a speaker purely through interacting with them rather than by their intention (and which is often related to the nature of the speaker themselves).

The Channels
Now we’ll talk about the channels. In our model, the message itself gets broken up into pieces (not necessarily at random), and each piece is sent across a different channel to the receiver. These channels are meant to represent the various “superlinguistic” modes of communication: concrete things like tone, body language, and cadence, but also more abstract things like the speaker-listener relationship, the cultural backgrounds of the involved parties, and the conversational context. Again, this list is incomplete and rough, but the point is that communicating information relies on more abstract ideas than just the words, or even tone and body language. Furthermore, each channel is usually used to relay a certain type or part of the message, which is why the message is not simply broken up at random across these channels.

When considering the more abstract things, like cultural background, we recognize that it’s weird to consider those as a channel. As a result, we can also consider the case where these abstract things instead parametrize the other, more classic channels, rather than constituting a channel themselves. For instance, a culture that makes heavy use of sarcasm may interpret messages sent across the tonal or facial expression channels very differently than a culture in which sarcasm is rare.

To summarize the whole model: in communicating, the speaker (who may not even be speaking out loud) sends a message. That message in fact contains many messages, some which the speaker intends to convey and some which she doesn’t. We assume that the speaker usually has at least a primary message which she actually intends to communicate. That whole message is split up into various blocks (some of which may be redundant) and sent across multiple channels to the receiver. The receiver then reconstructs the whole message from the received transmissions. Hopefully, all the discussion leading up to this at least sort of convinced you of the nuance in communication that we hope this model is able to capture.
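To make the summary concrete, here is a purely illustrative toy in Python. The channel names, the routing of sub-messages to channels, and the erasure probability are all our own assumptions; the point is just that a redundantly sent primary message can survive even when individual channels drop their pieces.

```python
import random

def transmit(message, erasure_prob=0.3, rng=None):
    """Toy model: sub-messages routed over 'superlinguistic' channels.

    `message` maps sub-message kinds ("primary", "secondary",
    "background") to their content. The primary message is sent
    redundantly over two channels; each piece is independently lost
    with probability `erasure_prob`.
    """
    rng = rng or random.Random(0)  # seeded for repeatability
    sent = {
        "words": [("primary", message["primary"])],
        "tone": [("primary", message["primary"]),          # redundant copy
                 ("secondary", message.get("secondary"))],
        "body_language": [("background", message.get("background"))],
    }
    received = {}
    for channel, pieces in sent.items():
        for kind, piece in pieces:
            if piece is not None and rng.random() > erasure_prob:
                received.setdefault(kind, piece)  # keep first surviving copy
    return received

reconstruction = transmit({
    "primary": "pass me the broccoli",
    "secondary": "I'm hungry",
    "background": "the speaker is relaxed",
})
```

With two copies of the primary message in flight, the receiver misses it only if both channels fail; the most important part of the message is the most robust, which is the same effect redundancy has in classical channel coding.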

Suggested Experiments
Now we’ll talk about some of the weird experiments we came up with that would try to verify or flesh out this model. These ideas are more for fun and aren’t very rigorous; think of them as suggestions for how to begin thinking about experiments. For context, when coming up with these ideas, we thought there were two things we needed to do. The first is to come up with simple, repeatable messages, so that you can hold the message fixed and tweak other things instead. The second is to find ways to cut off certain channels, or at least to select which channels to funnel the message through. This would let you explore which parts of which messages are sent across different channels.

Regarding the control of the message, we thought that one was pretty difficult. However, we suspect that if you focus on something expository, like describing pictures or videos (as they do in the Humans are Awesome project), then you can remove most of the “fuzzy” part of messages. And that’s about the best we came up with.

For the control of channels, we basically have one idea that can be reused many times, which is to directly cut off or interfere with one of the obvious channels. For instance, put two people in separate rooms and have them communicate over an instant messenger, and compare that to people talking directly about the same thing. Or put two people in the same room, but have them keep using the instant messenger instead of talking, so they can see each other but aren’t speaking. Similarly, put two people in the same room and let them talk to each other, but blindfold them, or at least don’t let them see each other.

For testing those more abstract ideas (like cultural background or relationship), compare communication between a person and their friend or peer to the same person communicating with a non-peer. Similarly, compare the communication between two people from the same cultural background to two people that are from different cultural backgrounds (and also have relatively little exposure to the culture of their respective communication partner). Along similar lines, you could also explore the difference of language by having people communicate on the same topic twice, once each in different languages. Basically, once you get the idea, you can create plenty of your own experiments.

Conclusion
In conclusion: we naively tried to capture the “sophistication” of communication based on an audio clip and then convey that to elementary schoolers. We found out that’s really hard. For our outreach, we instead tried to give the students something they could explore intuitively, and we had a lot of fun with that (they did too, we hope). Through some reading on information-theoretic approaches to analyzing human speech, we came across an interesting paper that inspired us to come up with our own model of human communication. The model consists of a message which is split up and sent across multiple channels, reflecting the aspects of communication beyond simply the words used. We discussed the model and then presented some experiments people could try — experiments which would probably reveal a bunch of flaws in the model and hopefully suggest ways to refine it. Most importantly, though, we learned a lot and had a great time doing it all!
