A Picture Says a Thousand Words

Information Theory (Winter 2020)

Elizabeth Chen, Jingxiao Liu, Kailas Vodrahalli

Introduction

Let’s talk about pictograms: what are they and how well do they convey information?

Pictograms are symbols that transmit information through visual forms [1]. You might have seen, on the containers of certain chemicals, an icon of a flame. The icon indicates that the chemical is flammable, and is an example of a pictogram. Several writing systems, such as Egyptian hieroglyphs and Chinese, use pictograms as characters to deliver ideas. In this study, we aim to probe how well pictograms transmit information. More specifically, we will look at pictograms from oracle bone scripts, which are ancient Chinese writings inscribed on animal shells and bones [2], as well as pictograms from modern Chinese characters, and compare the two on their ability to convey meaning.

Figure 1. Oracle bone script character for “mountain”.

Consider the oracle bone script character for “mountain” shown in Figure 1, and then the modern Chinese character for “mountain” in Figure 2. How well can an individual with no prior knowledge of Chinese associate these characters with actual mountains? Which represents a mountain better: the oracle bone script or the modern character?

Figure 2. Modern Chinese character for “mountain”.

Our team tried to measure the “information distance” between the pictographic characters and the photographs of the objects they represent. We wanted to know which form of writing, the oracle bone scripts or modern Chinese, was the more effective medium for transmitting visual information.

Methods

We measured information distance using two methods:

Method 1: We asked people to match either oracle bone script or modern Chinese characters to photographs of the objects they represent. We hope that studying how often people correctly matched characters to images will give us insight into whether oracle bone scripts or modern Chinese characters more effectively represent visual forms.

Method 2: We trained a neural network to classify photographs of objects. In other words, after training, the neural network can take in, for instance, a photograph of a dog and output “dog”. We then gave the neural network oracle bone script characters and modern Chinese characters and studied its outputs. Here, the network’s performance on the pictographic characters serves as an indicator of how well each type of pictogram transmits the information found in a visual presentation of the object it represents.
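As a minimal sketch of this idea (the `model.predict_label` interface is hypothetical, standing in for whatever classifier we train):

```python
import numpy as np

def pictogram_accuracy(model, char_images, true_labels):
    """Accuracy of a photo-trained classifier on pictogram images.

    `model.predict_label` is assumed to return a class name for a
    single image. The higher this accuracy, the smaller our proxy
    "information distance" between the pictograms and natural images
    of the classes they represent.
    """
    predictions = [model.predict_label(img) for img in char_images]
    return np.mean([p == y for p, y in zip(predictions, true_labels)])
```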

Results

Method 1 Results

We used Amazon MTurk to obtain input from several hundred people. Each person was shown (1) a single grayscale image of 1 of 10 objects (bird, field, fish, moon, mountain, sheep, sun, tree, turtle, or water) along with (2) ten images of the ancient or modern Chinese character corresponding to these 10 objects. The MTurk worker was asked to “select the character image that most closely resembles [the natural image].” We additionally asked a survey question to determine whether the worker understood Chinese and filtered out those workers’ responses during analysis.

You can access the matching survey at the following links:

  • Image-to-Ancient Character Survey: bit.ly/info-theory-pictogram1
  • Image-to-Modern Character Survey: bit.ly/info-theory-pictogram2
  • Image-to-Ancient Character Solution: bit.ly/info-theory-pictogram-answers1
  • Image-to-Modern Character Solution: bit.ly/info-theory-pictogram-answers2

Here, we are essentially asking the workers to select the “best encoding” for a given natural image. We could have done the reverse and shown them a character, asking them to select the “best” natural image match, with one image sampled randomly from each class (this would be something like decoding). We chose our method so that we could collect responses per natural image (in the “decoding” method, results would be relative to 10-tuples of natural images; note the character images are fixed, so this is not an issue in our “encoding” method).

In total, we obtained 2,000 responses (2 character sets x 10 classes x 10 images per class x 10 responses per image/character set pair). After filtering out responses from workers who understood written Chinese, we had 1,816 responses remaining. All further analysis is done after this filtering step.
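As a sketch of this filtering step (the file and column names here are our assumptions, not the actual MTurk export schema):

```python
import pandas as pd

# Hypothetical file and column names; the real MTurk export differs.
responses = pd.read_csv("mturk_responses.csv")

# Keep only workers who reported not understanding written Chinese.
filtered = responses[responses["reads_chinese"] == "no"]

print(len(responses), len(filtered))  # 2000 raw, 1816 after filtering
```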

Our analysis is shown below. In Figure 3, we plot the observed distribution of selected symbols (solid blue) vs the true distribution (dotted black line) over both ancient and modern character sets. We note that some characters are selected less often than expected (field) while others compensate by being selected more often (fish).

Figure 3. Frequency of workers’ classifications by class (blue bars) and true frequency of natural images by class (black dotted lines). Note the true frequencies are near but not exactly 0.1; this is due to our post-collection filtering of responses from individuals who can read Chinese characters.
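A minimal matplotlib sketch of the comparison plotted in Figure 3, reusing the hypothetical `filtered` DataFrame from above:

```python
import matplotlib.pyplot as plt

classes = ["bird", "field", "fish", "moon", "mountain",
           "sheep", "sun", "tree", "turtle", "water"]

# Fraction of responses selecting each character (observed) and the
# fraction of natural images actually drawn from each class (true).
observed = (filtered["selected_class"].value_counts(normalize=True)
            .reindex(classes, fill_value=0))
true = (filtered["true_class"].value_counts(normalize=True)
        .reindex(classes, fill_value=0))

plt.bar(classes, observed, color="tab:blue", label="observed")
plt.plot(classes, true, "k--", label="true")
plt.ylabel("frequency")
plt.legend()
plt.show()
```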

In Figure 4, we plot the marginal distribution of selected symbols over the ancient character set and the modern character set. As one might expect, different characters are weighted differently between the two sets. For example, consider fish in the modern set and water in the ancient set. In both cases, the observed frequency is higher than the true frequency of the class, suggesting that fish (modern) and water (ancient) are characters “close” to (or, in other words, often mistaken as) other classes of natural images.

Figure 4. Frequency of workers’ classifications by class (blue bars), marginal distribution of workers’ classifications over the ancient character set (orange), marginal distribution over the modern character set (green), and true frequency of natural images by class (black dotted lines). Note the blue bars and black lines are the same as in Figure 3.

We now calculate the by-class accuracy of both character sets and plot the results in Figure 5. We first compute the maximum accuracy possible given the distributions from Figure 4 (i.e. $1 - \sum_{c \in \text{classes}} |\text{observed}_c - \text{expected}_c|$), and obtain 81.6% for the ancient set and 82.5% for the modern one.
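As a worked sketch of this bound (applying the formula above to made-up frequencies; the real values come from the survey distributions in Figure 4):

```python
import numpy as np

def max_accuracy(observed, expected):
    """Upper bound used in the text: 1 - sum_c |observed_c - expected_c|."""
    observed, expected = np.asarray(observed), np.asarray(expected)
    return 1.0 - np.abs(observed - expected).sum()

# Toy example with 3 classes and made-up frequencies:
print(max_accuracy([0.30, 0.35, 0.35], [1/3, 1/3, 1/3]))  # ~0.933
```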

Figure 5. Accuracy by class. Blue bars and orange bars are the by-class accuracies for ancient and modern character sets, respectively. Black dotted line is the maximum accuracy possible (1.0).

The true accuracy is 65.7% for the ancient characters and 18.5% for the modern ones. For every class, the ancient character has higher accuracy than the modern character. For classes such as “bird”, the accuracy gap between the two character sets is large, suggesting that the ancient bird character conveys more information in its shape than the modern version does. In contrast, the “field” class has similar accuracy for the modern and ancient characters, suggesting that a similar amount of information is conveyed by both. Both of these observations can be confirmed visually in Table 1, where the ancient character for “bird” looks like an actual bird and the modern and ancient characters for “field” are rather similar in appearance.
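These overall and by-class accuracies can be computed directly from the filtered responses; a sketch, again with the hypothetical column names from earlier:

```python
# Overall and by-class accuracy for each character set.
for charset in ["ancient", "modern"]:
    subset = filtered[filtered["character_set"] == charset]
    correct = subset["selected_class"] == subset["true_class"]
    print(charset, correct.mean())                       # overall accuracy
    print(correct.groupby(subset["true_class"]).mean())  # by-class accuracy
```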

Method 2 Results

We trained a convolutional neural network (CNN) to classify images of mountains, moons, trees, fields, and suns. After training, the network can take in a photo similar to the ones shown in Figure 6 and correctly identify it as a “mountain”, “moon”, “tree”, “field”, or “sun” based on the contents of the input image with more than 90 percent accuracy.

Figure 6. Sample input images of mountains, moons, trees, fields, and suns to give to the CNN.
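A rough sketch of this kind of training setup; the framework, architecture, and image size below are all illustrative assumptions, not necessarily the exact network we trained:

```python
import tensorflow as tf

CLASSES = ["mountain", "moon", "tree", "field", "sun"]

# A small CNN for 64x64 grayscale photos (sizes are assumptions).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(len(CLASSES), activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_images: float array of shape (N, 64, 64, 1) holding the photos;
# train_labels: integer class ids in [0, 5). Both are placeholders here.
model.fit(train_images, train_labels, epochs=10, validation_split=0.1)
```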

We treated the trained CNN as a proxy for an individual who knows what the 5 classes (mountain, moon, tree, field, and sun) look like in real life but has no knowledge of how they are written as Chinese characters. We then showed both the oracle bone script representations and the modern Chinese characters of the 5 classes to the trained neural network. The performance of the CNN is summarized in Tables 2 and 3.
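Evaluating the characters then amounts to running them through the trained model; a sketch continuing the assumptions above:

```python
import numpy as np

# char_images: array of shape (5, 64, 64, 1) holding one character set
# (ancient or modern), preprocessed identically to the photographs.
probs = model.predict(char_images)
predicted = [CLASSES[i] for i in np.argmax(probs, axis=1)]
print(predicted)  # compare against ["mountain", "moon", "tree", "field", "sun"]
```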

We see that the CNN guessed 2 out of 5 correctly when shown the oracle bone scripts, and also 2 out of 5 when shown the modern Chinese characters. This suggests that, when measured by the classification accuracy of a trained CNN, the information distance between oracle bone scripts and actual images is roughly the same as that between modern Chinese characters and actual images.

Conclusion

In this project, we tried to measure how well Chinese characters from two different time periods convey meaning, both by running surveys that asked people to match natural images to their corresponding character representations, and by studying whether a trained image-classifying CNN could correctly identify these characters.

From the survey results, we saw that people had an easier time associating natural images to ancient characters than they did associating natural images to modern characters. The trained image-classifying CNN performed similarly for both the oracle bone scripts and modern Chinese characters.

What are your thoughts? Which one spoke to you more: the oracle bone scripts or the modern Chinese characters?
