Breaking the Code – A Graphical Exploration Using Bar Charts

John Gabrosek
Department of Statistics
Grand Valley State University
1 Campus Drive
Allendale, MI 49401-9403

Michael E. Schuckers
Department of Statistics
410 Hodges Hall
West Virginia University
Morgantown, WV 26506

Statistics Teaching and Resource Library, October 25, 2001

© 2001 by John Gabrosek and Michael E. Schuckers, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

The statistical educator often finds it difficult to convey the beauty and power of descriptive graphical data summaries to her students. Breaking the Code actively engages students in constructing and interpreting bar charts. The activity requires students to describe data graphically, compare the frequency distributions depicted in two bar charts, construct and test a hypothesis, and communicate results.

The activity begins with an explanation of the Caesar Shift for message encryption (Singh, 1999). The Caesar Shift is a translation of the alphabet; for example, a five-letter shift would code the letter a as f, b as g, and so on through z as e. We describe a five-step process for decoding an encrypted message. First, groups of four students construct a frequency table of the letters in two lines of a coded message. Second, students construct a bar chart of the letter frequencies in a reference message, which stands in for the frequency of letters in the English language. Third, students create a bar chart of the coded message. Fourth, students visually compare the bar chart of the reference message (step 2) to the bar chart of the coded message (step 3). Based on this comparison, students hypothesize a shift. Fifth, students test the hypothesis by applying the shift to the coded message.
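The Caesar Shift described above can be sketched in a few lines of Python. This is an illustration only: the function names `caesar_shift` and `caesar_unshift` are our own, and the activity itself does not require any programming.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar_shift(text, shift):
    """Code each letter as the letter `shift` places later, wrapping z around to a."""
    coded = []
    for ch in text.lower():
        if ch in ALPHABET:
            coded.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            coded.append(ch)  # spaces and punctuation are left unchanged
    return "".join(coded)


def caesar_unshift(text, shift):
    """Undo a shift by moving each letter `shift` places earlier."""
    return caesar_shift(text, -shift)


print(caesar_shift("abz", 5))  # fge
```

Applying `caesar_unshift` with the same shift recovers the original message, which is what students do by hand in the fifth step.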

After decoding the message, students are asked a series of questions that assess their ability to see patterns. The questions are geared for higher levels of cognitive reasoning.

Key words: bar charts, Caesar Shift, encryption, testing hypotheses


The objectives of the Breaking the Code activity include:

  - Constructing frequency tables
  - Constructing bar charts
  - Comparing distributions by looking for patterns
  - Forming and testing a hypothesis
  - Understanding sampling variability
  - Explaining results of statistical procedures
  - Working cooperatively in a group
  - Using a statistical computer package (such as SPSS for Windows or Minitab)

Materials and Equipment

The following materials and equipment are needed for the Breaking the Code activity:

  - A classroom set of handouts, one for each student
  - A computer with statistical software (optional, but recommended)

Time Involved

This activity has been used for four semesters in a general education introductory statistics classroom. The activity has been assigned early in the semester when students are unfamiliar with a statistical computer package. The time involved has been as follows:

  - Step I of the activity – Allow the last 10 minutes of a class period
  - Steps II to V of the activity – Allow an entire 50-minute class period held in a computer lab

Depending on course structure, consider the following alternative approaches:

  1. Eliminate the computer portion of the assignment and have students produce bar charts by hand. The activity can be completed in a 50-minute class.
  2. Give a brief lecture on the decoding approach with a short example. Assign the activity as individual or group homework.

Regarding the Data and Graphs

Non-coded writing is used to produce a reference distribution for the frequency of letters in the English language. Any writing of at least 250 letters could be used. Below we give a writing sample. Because of sampling variability, the letter frequencies in this sample will not exactly match those of the English language as a whole, nor those of another writing sample. Students will compare bar charts of a coded message and the reference distribution to hypothesize the shift used to encode the message. The message that we used to generate the reference distribution is:

And, most importantly, I would like to thank my family for their unconditional love and generous support. Without the encouragement of my parents, Joseph and Ann, my brother, Joe, my sister-in-law, Jenna, my nieces, Gabrielle and Madison, and my sister, Anita, I could not have completed this work.

The frequency table of the reference message is:

Letter  Count    Letter  Count    Letter  Count    Letter  Count
A       19       B       2        C       5        D       10
E       23       F       3        G       3        H       8
I       17       J       3        K       3        L       11
M       12       N       23       O       21       P       6
Q       0        R       13       S       12       T       20
U       7        V       2        W       4        X       0
Y       8        Z       0

The bar chart of the reference message is:

The two lines of the coded message are:

Line 1: svukvujhsspunavaolmhyhdhfavduz
Line 2: uvddhypzkljshylkhukihaasljvtlkvdu

The frequency table of the coded message is:

Letter  Count    Letter  Count    Letter  Count    Letter  Count
A       5        B       0        C       0        D       5
E       0        F       1        G       0        H       8
I       1        J       3        K       5        L       5
M       1        N       1        O       1        P       2
Q       0        R       0        S       5        T       1
U       7        V       7        W       0        X       0
Y       3        Z       2
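Instructors who want to check a frequency table quickly, or prepare one for a different coded message, could tally the letters with a short script. The sketch below uses the two coded lines given above; the helper name `letter_counts` is our own and is not part of the activity.

```python
from collections import Counter
import string

LINE_1 = "svukvujhsspunavaolmhyhdhfavduz"
LINE_2 = "uvddhypzkljshylkhukihaasljvtlkvdu"


def letter_counts(text):
    """Return a count for every letter a through z, including letters that never occur."""
    tally = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    return {letter: tally[letter] for letter in string.ascii_lowercase}


coded_counts = letter_counts(LINE_1 + LINE_2)

# Print the table four letters per row, matching the layout used in the text.
letters = string.ascii_uppercase
for start in range(0, 26, 4):
    row = letters[start:start + 4]
    print("   ".join(f"{c} {coded_counts[c.lower()]:2d}" for c in row))
```

Keeping the zero counts in the dictionary matters here, because the empty categories are part of the pattern students compare.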

The bar chart of the coded message is:
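A plain-text version of such a chart can be sketched from the counts in the frequency table of the coded message. The function name `text_bar_chart` and the `#` bar style are illustrative choices of ours; in class the charts are produced by hand or with a package such as SPSS or Minitab.

```python
import string

# Counts for A through Z, copied from the frequency table of the coded message.
CODED_COUNTS = [5, 0, 0, 5, 0, 1, 0, 8, 1, 3, 5, 5, 1,
                1, 1, 2, 0, 0, 5, 1, 7, 7, 0, 0, 3, 2]


def text_bar_chart(counts):
    """Draw one row per letter: the letter, its count, and a bar of # marks."""
    rows = []
    for letter, n in zip(string.ascii_uppercase, counts):
        rows.append(f"{letter} {n:2d} {'#' * n}".rstrip())
    return "\n".join(rows)


print(text_bar_chart(CODED_COUNTS))
```

The same function could be applied to the reference counts so that the two charts can be read one above the other.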


After completing the activity, students should be able to interpret bar charts, state and test hypotheses (informally), and explain the concept of sampling variability (informally). Question 1 of the activity requires students to look for patterns in interpreting bar charts. Questions 1 and 2 informally assess understanding of the process used to formulate and test hypotheses. Question 3 addresses knowledge of sampling variability. On homework and exams, students should be required to interpret bar charts by looking for peaks, valleys, and unusual observations. Students should also be required to write about sampling variability and hypothesis testing. For example, students should be able to answer the following question:

You are given a six-sided die with each of the numbers 1,2,3,4,5,6 imprinted on one face.

  1. Discuss how you can determine whether the die is "fair." (By fair we mean that all six faces of the die are equally likely.)
  2. Suppose that you and I independently follow the procedure you outlined in part 1. Would you expect our results to be identical? Explain.
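For instructors who want a quick demonstration to accompany this question, a simulation along the following lines is one option. It is only a sketch under our own assumptions: the die is simulated as fair, each person makes 60 rolls, and the seeds 1 and 2 stand in for two people working independently.

```python
import random
from collections import Counter


def roll_counts(n_rolls, seed):
    """Roll a simulated fair die n_rolls times and count how often each face appears."""
    rng = random.Random(seed)
    tally = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return [tally[face] for face in range(1, 7)]


yours = roll_counts(60, seed=1)
mine = roll_counts(60, seed=2)

print("Face: ", list(range(1, 7)))
print("Yours:", yours)
print("Mine: ", mine)
```

Comparing the two rows of counts, each of which sums to 60, gives students a concrete case to discuss when they explain whether two runs of the same procedure should agree exactly.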

Teaching Notes

We have observed the following when using the activity in an introductory statistics classroom:

  - Students appreciate the hands-on nature. On a scale of 1 to 5 (1 = strongly disagree, 5 = strongly agree), the mean student response to the statement “The activity was more interesting than solely a lecture on bar charts” was 4.50.
  - Giving an example of a shift applied to a short message helps to avoid student confusion.
  - Keeping the required computer skills to a minimum allows students to focus on interpreting the bar charts.
  - Students will try to compare single peaks between the reference and the coded messages. We purposely chose a message whose most frequent letter, once decoded, differs from the most frequent letters in the reference message (a tie between e and n). A hint to “look for general patterns of peaks and valleys” will usually get students on the right track.
  - Closely monitoring each group’s progress is essential. Left unchecked, students will proceed far down an incorrect path, especially if they have hypothesized an incorrect shift.
  - We have used the same coded message for each group; however, there is no reason that the message, the shift, or both could not be varied from group to group.
  - Some computer packages (for example, SPSS for Windows) will not print a label on the horizontal axis of a bar chart for a category that has frequency 0. We recommend that students substitute 0.01 for 0 so that every letter still appears on the axis.
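The difference between matching single peaks and matching general patterns can be illustrated numerically. The sketch below is our own construction, not the procedure students follow: it scores each of the 26 possible shifts by multiplying the reference and coded counts bar by bar after sliding the coded chart back by that shift, then summing. The counts are copied from the two frequency tables, and all function names are hypothetical.

```python
import string

LETTERS = string.ascii_lowercase

# Counts for A through Z, copied from the two frequency tables.
REFERENCE = [19, 2, 5, 10, 23, 3, 3, 8, 17, 3, 3, 11, 12,
             23, 21, 6, 0, 13, 12, 20, 7, 2, 4, 0, 8, 0]
CODED = [5, 0, 0, 5, 0, 1, 0, 8, 1, 3, 5, 5, 1,
         1, 1, 2, 0, 0, 5, 1, 7, 7, 0, 0, 3, 2]


def single_peak_shifts():
    """Shifts implied by lining up only the tallest coded bar with each tallest reference bar."""
    coded_peak = CODED.index(max(CODED))
    ref_peaks = [i for i, n in enumerate(REFERENCE) if n == max(REFERENCE)]
    return [(coded_peak - p) % 26 for p in ref_peaks]


def pattern_score(shift):
    """Overlap of all peaks and valleys when the coded chart is slid back by `shift`."""
    return sum(REFERENCE[i] * CODED[(i + shift) % 26] for i in range(26))


def best_pattern_shift():
    """The shift whose overall pattern overlaps the reference chart most."""
    return max(range(26), key=pattern_score)


def decode(text, shift):
    """Test a hypothesized shift by moving each letter `shift` places earlier."""
    return "".join(LETTERS[(LETTERS.index(ch) - shift) % 26] for ch in text)


print(single_peak_shifts())   # [3, 20]
print(best_pattern_shift())   # 7
print(decode("svukvujhsspunavaolmhyhdhfavduz", best_pattern_shift()))
```

With these counts the two single-peak guesses and the whole-pattern guess disagree, and reading the decoded line shows at once which hypothesis produces English.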


Singh, S. (1999). The Code Book. New York: Doubleday.


Editor's note: Before 11-6-01, the "student's version" of an activity was called the "prototype".