# Lab/Activity: Breaking the Code – A Graphical Exploration using Bar Charts

## Group Members:  ______________________          _________________________

______________________           _________________________

Statement of the Problem: Codes have been used by governments for thousands of years to encrypt messages. Deciphered codes have decided the outcome of wars and uncovered assassination attempts.  A great deal of time and energy has been spent throughout history on developing and breaking codes.

One of the earliest codes was used by Julius Caesar to communicate with his armies. Known as the Caesar Shift, the code is a simple shifting of the alphabet. For instance, a shift of five letters would code the letter a as f, b as g,…, z as e. Your task is to decode the following message using bar charts.

Line 1:              svukvujhsspunavaolmhyhdhfavduz

Line 2:              uvddhypzkljshylkhukihaasljvtlkvdu

To break this code you will first create a bar chart of non-coded English. Then you will create a bar chart for the frequency of letters in Lines 1 and 2 above. You will compare the two bar charts and hypothesize a shift. (As noted above, a shift of 5 letters would mean that the ‘coded’ letter f would be replaced by an a.) Next, check to see if the shift that you hypothesized makes sense. That is, do you get recognizable words as you apply the shift to the coded message? If not, hypothesize a different shift.
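The shifting described above can be sketched in Python. This is only an illustration and not part of the original lab; the function name `caesar_shift` and the three-letter sample text are assumptions made for the example.

```python
import string

ALPHABET = string.ascii_lowercase  # 'abcdefghijklmnopqrstuvwxyz'


def caesar_shift(text: str, shift: int) -> str:
    """Shift each lowercase letter forward by `shift` places, wrapping z back to a.

    A negative shift moves letters backward, which undoes an encoding.
    Characters that are not lowercase letters are left unchanged.
    """
    shifted = []
    for ch in text:
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            shifted.append(ch)
    return "".join(shifted)


# With a shift of five, a is coded as f, b as g, and z as e.
coded = caesar_shift("abz", 5)
print(coded)                    # fge
print(caesar_shift(coded, -5))  # abz
```

Decoding is the same operation run in the opposite direction, which is why a single hypothesized shift is enough to read the whole message.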

Strategy to break the code: Using bar charts to break the code involves five steps:

·        Step I. Count the frequency of the letters in the coded message.

·        Step II. Create a bar chart of non-coded English.

·        Step III. Create a bar chart of the coded message.

·        Step IV. Compare the two bar charts and hypothesize the shift.

·        Step V. Apply the shift to the coded message.

Step I. Count the frequency of the letters in the coded message.

·        Split the four group members into pairs. The first pair will work with Line 1 of the coded message and fill in columns two and three (Line 1 Tally and Line 1 Frequency) of the table on the next page. The second pair will work with Line 2 of the coded message and fill in columns four and five (Line 2 Tally and Line 2 Frequency) of the table.

·        After both pairs are done, fill in column six (Total Frequency) of the table with the sum of Line 1 Frequency and Line 2 Frequency.

The frequency table of the coded message is:

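| Letter | Line 1 Tally | Line 1 Frequency | Line 2 Tally | Line 2 Frequency | Total Frequency | Decoded Letter |
|---|---|---|---|---|---|---|
| A | | | | | | |
| B | | | | | | |
| C | | | | | | |
| D | | | | | | |
| E | | | | | | |
| F | | | | | | |
| G | | | | | | |
| H | | | | | | |
| I | | | | | | |
| J | | | | | | |
| K | | | | | | |
| L | | | | | | |
| M | | | | | | |
| N | | | | | | |
| O | | | | | | |
| P | | | | | | |
| Q | | | | | | |
| R | | | | | | |
| S | | | | | | |
| T | | | | | | |
| U | | | | | | |
| V | | | | | | |
| W | | | | | | |
| X | | | | | | |
| Y | | | | | | |
| Z | | | | | | |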
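The hand tally in Step I can be cross-checked with a short Python sketch. It is offered as a check on your counting, not as a replacement for it; the helper name `letter_counts` is an assumption of this example.

```python
from collections import Counter
import string

LINE_1 = "svukvujhsspunavaolmhyhdhfavduz"
LINE_2 = "uvddhypzkljshylkhukihaasljvtlkvdu"


def letter_counts(text: str) -> dict:
    """Return a count for every letter a-z, including letters that never appear."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    return {letter: counts[letter] for letter in string.ascii_lowercase}


line_1_freq = letter_counts(LINE_1)
line_2_freq = letter_counts(LINE_2)

# One row per letter: Line 1 Frequency, Line 2 Frequency, Total Frequency.
print("Letter  Line 1  Line 2  Total")
for letter in string.ascii_lowercase:
    f1, f2 = line_1_freq[letter], line_2_freq[letter]
    print(f"{letter.upper():<6}  {f1:>6}  {f2:>6}  {f1 + f2:>5}")
```

The Line 1 and Line 2 columns should add up to the number of letters in each line, which is a quick way to catch a missed tally mark.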

Step II. Create a bar chart of a non-coded reference message in SPSS.

·        Open a blank worksheet to type in data.

·        Click on the variable view tab at the bottom of the screen. Change the name of Var0001 to alphabet.

·        Go back to the data view tab.

·        Type the lower case alphabet under the column labeled alphabet (a, b, c,…)

·        In the second column of the data worksheet, type in the counts from the reference message's frequency table given below. (Note. For any count of 0 use 0.01.)

·        Click on the variable view tab and rename this variable cntknwn.

·        Weight the data by the variable cntknwn. (In SPSS, Data → Weight Cases → Weight cases by → Frequency variable = cntknwn → OK.)

·        Make a bar chart. (In SPSS, Graphs → Bar → Simple → Summaries for groups of cases → Define → N of cases → Category Axis = alphabet → OK.)

·        Edit the bar chart as follows:

1.      Double click on the bar chart to get into the Chart Editor window.

2.      Change the x-axis so that the entire alphabet shows. (In SPSS, Chart → Axis → Category → OK → Labels → All labels → Continue → OK.)

3.      Add the title “Bar chart of reference message.” (In SPSS, Chart → Title → Type in “Bar chart of reference message” → OK.)

4.      Close the Chart Editor by clicking on the X in the upper right corner.

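Frequency Table for the reference message.

| Letter | Count |
|---|---|
| A | 19 |
| B | 2 |
| C | 5 |
| D | 10 |
| E | 23 |
| F | 3 |
| G | 3 |
| H | 8 |
| I | 17 |
| J | 3 |
| K | 3 |
| L | 11 |
| M | 12 |
| N | 23 |
| O | 21 |
| P | 6 |
| Q | 0 |
| R | 13 |
| S | 12 |
| T | 20 |
| U | 7 |
| V | 2 |
| W | 4 |
| X | 0 |
| Y | 8 |
| Z | 0 |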
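If SPSS is not available, roughly the same picture can be drawn as a text bar chart. The Python sketch below is an alternative to the SPSS steps, not a reproduction of them; `text_bar_chart` is a hypothetical helper name, and the counts are copied from the reference frequency table.

```python
# Counts copied from the frequency table for the reference message.
REFERENCE_COUNTS = {
    "a": 19, "b": 2, "c": 5, "d": 10, "e": 23, "f": 3, "g": 3, "h": 8,
    "i": 17, "j": 3, "k": 3, "l": 11, "m": 12, "n": 23, "o": 21, "p": 6,
    "q": 0, "r": 13, "s": 12, "t": 20, "u": 7, "v": 2, "w": 4, "x": 0,
    "y": 8, "z": 0,
}


def text_bar_chart(counts: dict, title: str) -> str:
    """Build a horizontal bar chart with one row per letter and one '#' per count."""
    rows = [title]
    for letter in sorted(counts):
        rows.append(f"{letter} | {'#' * counts[letter]} {counts[letter]}")
    return "\n".join(rows)


print(text_bar_chart(REFERENCE_COUNTS, "Bar chart of reference message"))
```

Every letter gets a row even when its count is 0, so the 0.01 substitution from the SPSS instructions is not needed in this sketch. The same function can be reused for the coded message in Step III by passing in your Total Frequency column.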

Step III. Create a bar chart of the coded message.

·        Return to the Data Window. (In SPSS, Window → 1 – SPSS Data Editor.)

·        In the third column of the data worksheet, type in the counts from your frequency table of the coded writing done in step I. (Note. For any count of 0 use 0.01.)

·        Click on the variable view tab and rename this variable cntcode.

·        Weight the data by the variable cntcode. (In SPSS, Data → Weight Cases → Weight cases by → Frequency variable = cntcode → OK.)

·        Make a bar chart. (In SPSS, Graphs → Bar → Simple → Summaries for groups of cases → Define → N of cases → Category Axis = alphabet → OK.)

·        Edit the bar chart as follows:

1.      Double click on the bar chart to get into the Chart Editor window.

2.      Change the x-axis so that the entire alphabet shows. (In SPSS, Chart → Axis → Category → OK → Labels → All labels → Continue → OK.)

3.      Add the title “Bar chart of coded message.” (In SPSS, Chart → Title → Type in “Bar chart of coded message” → OK.)

4.      Close the Chart Editor by clicking on the X in the upper right corner.

Step IV. Compare the two bar charts and hypothesize the shift. Fill in column 7 (Decoded Letter) of the table under Step I.
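One way to make the visual comparison concrete is to slide one bar chart under the other and score how well the tall bars line up at each of the 26 possible shifts. The Python sketch below does this with a sum of products. That scoring rule and the name `shift_scores` are assumptions of this example, not a method the lab prescribes.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase

# Counts copied from the reference frequency table in Step II, in order a..z.
REFERENCE = [19, 2, 5, 10, 23, 3, 3, 8, 17, 3, 3, 11, 12,
             23, 21, 6, 0, 13, 12, 20, 7, 2, 4, 0, 8, 0]

CODED_MESSAGE = ("svukvujhsspunavaolmhyhdhfavduz"
                 "uvddhypzkljshylkhukihaasljvtlkvdu")


def shift_scores(coded_text: str, reference: list) -> list:
    """Score each shift 0..25 by how well the two bar charts line up.

    For a shift s, reference letter number i is assumed to be coded as
    letter number (i + s) % 26. The score multiplies the paired counts and
    adds them up, so tall bars matched with tall bars give a large score.
    """
    tallies = Counter(coded_text)
    coded = [tallies[letter] for letter in ALPHABET]
    return [
        sum(reference[i] * coded[(i + s) % 26] for i in range(26))
        for s in range(26)
    ]


scores = shift_scores(CODED_MESSAGE, REFERENCE)
# Rank the shifts from best to worst alignment; the top few are the
# hypotheses worth checking by hand in Step V.
ranked = sorted(range(26), key=lambda s: scores[s], reverse=True)
print("Shifts to try first:", ranked[:3])
```

The top-ranked shift is still only a hypothesis. Step V is needed to confirm that it produces recognizable words.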

Step V.  Apply the shift to the coded message. Write the decoded message below.
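Applying a shift amounts to filling in the Decoded Letter column and then looking up every coded letter in it. A minimal Python sketch follows; the names `decoding_table` and `apply_shift` are hypothetical, and the shift of 5 is only the placeholder from the earlier example, not the answer.

```python
import string

ALPHABET = string.ascii_lowercase


def decoding_table(shift: int) -> dict:
    """Map each coded letter to its decoded letter for a hypothesized shift.

    With a shift of 5, the coded letter f maps back to a, g to b, and so on.
    """
    return {ALPHABET[(i + shift) % 26]: ALPHABET[i] for i in range(26)}


def apply_shift(coded_text: str, shift: int) -> str:
    """Decode text by looking up every letter in the decoding table."""
    table = str.maketrans(decoding_table(shift))
    return coded_text.translate(table)


hypothesized_shift = 5  # placeholder only; replace with your own hypothesis
print(apply_shift("svukvujhsspunavaolmhyhdhfavduz", hypothesized_shift))
print(apply_shift("uvddhypzkljshylkhukihaasljvtlkvdu", hypothesized_shift))
```

If the printed lines do not contain recognizable words, change `hypothesized_shift` and run the sketch again.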

Questions: After you have completed decoding the message, answer each of the following questions in complete sentences.

1.      Describe the process that you used to compare the two bar charts. Specifically,

a. Did you look at the highest peak in each bar chart? This does not work. Why?

b. What did you look for in the two bar charts that helped you to break the code?

2. Once you decided on a shift, describe how you were able to determine if your hypothesized shift was correct.

3. What would you expect to find if you compared the frequency table of the reference message given in step II and a frequency table for another piece of non-coded English of the same length? Explain.

The frequency table of the coded message is:

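| Letter | Line 1 Tally | Line 1 Frequency | Line 2 Tally | Line 2 Frequency | Total Frequency | Decoded Letter |
|---|---|---|---|---|---|---|
| A | | 3 | | 2 | 5 | T |
| B | | 0 | | 0 | 0 | U |
| C | | 0 | | 0 | 0 | V |
| D | | 2 | | 3 | 5 | W |
| E | | 0 | | 0 | 0 | X |
| F | | 1 | | 0 | 1 | Y |
| G | | 0 | | 0 | 0 | Z |
| H | | 4 | | 4 | 8 | A |
| I | | 0 | | 1 | 1 | B |
| J | | 1 | | 2 | 3 | C |
| K | | 1 | | 4 | 5 | D |
| L | | 1 | | 4 | 5 | E |
| M | | 1 | | 0 | 1 | F |
| N | | 1 | | 0 | 1 | G |
| O | | 1 | | 0 | 1 | H |
| P | | 1 | | 1 | 2 | I |
| Q | | 0 | | 0 | 0 | J |
| R | | 0 | | 0 | 0 | K |
| S | | 3 | | 2 | 5 | L |
| T | | 0 | | 1 | 1 | M |
| U | | 4 | | 3 | 7 | N |
| V | | 4 | | 3 | 7 | O |
| W | | 0 | | 0 | 0 | P |
| X | | 0 | | 0 | 0 | Q |
| Y | | 1 | | 2 | 3 | R |
| Z | | 1 | | 1 | 2 | S |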

Step V. Decoded Message

Line 1: London calling to the far away towns

Line 2: Now war is declared and battle come down

(This is from the song "London Calling" by the Clash.)

Question 1. (a) Comparing the highest peaks in the two bar charts would imply that the coded letter H should be decoded as E or N, the two most frequent letters in the reference message (23 each). However, neither decoding works. Two samples of writing should have similar but not identical frequency distributions, so the most frequent letter in one sample need not be the most frequent in the other. This is due to sampling variability.

Question 1. (b) The easiest way to break the code is to look for similar patterns in the two bar charts. Notice that the reference message has a high frequency region from a to e. That matches a high frequency region from h to l in the coded message. Thus, you would hypothesize a 7-letter shift that codes a as h, b as i, ... z as g.
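The answers to Question 1 can be checked with a short Python sketch that decodes Line 1 under each pairing discussed above. The helper names `implied_shift` and `decode` are assumptions of this example.

```python
import string

ALPHABET = string.ascii_lowercase
LINE_1 = "svukvujhsspunavaolmhyhdhfavduz"


def implied_shift(coded_letter: str, plain_letter: str) -> int:
    """Return the shift that would code `plain_letter` as `coded_letter`."""
    return (ALPHABET.index(coded_letter) - ALPHABET.index(plain_letter)) % 26


def decode(text: str, shift: int) -> str:
    """Move every letter back by `shift` places, wrapping a back to z."""
    return "".join(ALPHABET[(ALPHABET.index(ch) - shift) % 26] for ch in text)


# Pair the tallest coded bar (h) with each of the tallest reference bars
# (e and n), then with the start of the matching high-frequency region (a).
for plain in ("e", "n", "a"):
    shift = implied_shift("h", plain)
    print(f"h decoded as {plain}: shift {shift:2d} -> {decode(LINE_1, shift)}")
```

Only the pairing taken from the matching high-frequency regions gives readable text. The two peak-matching pairings give strings with no recognizable words.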

Question 2. Begin applying your hypothesized shift to the coded message. If you are getting recognizable words then you have correctly identified the shift employed to encrypt the message.

Question 3. Frequency distributions for two reference messages of equal length should be similar but not identical. Different messages will employ different words and hence have different letter frequencies. Again this is due to sampling variability.