Latest update Dec 19, 2022

Breaking the Cipher (Cryptanalysis)

One or two persons, level of ambition accordingly. No prior knowledge of Cryptography is required.

In this project you will be given a set encrypted texts (ciphertexts). Your task is to decrypt them to find the corresponding plaintext.

1. Caesar Cipher

This is one of the oldest and most known ciphers. It is a substitution cipher where each letter of the alphabet is shifted some c steps to the right.

Example: say we want to encrypt the plaintext: "THIS IS NOT VERY SECURE". First, pick an integer 0<c<26 (assuming we have 26 letters in the alphabet). Next, shift all letters c steps to the right. Doing this for c=7 yields:

Plaintext:  THIS IS NOT VERY SECURE
Ciphertext: AOPZ PZ UVA CLYF ZLJBYL

To decrypt we simply shift the ciphertext c steps to the left and we get the plaintext.

Your task is to write a program in F# that decodes a text file that has been encrypted using a Caesar cipher. You can assume that only the ASCII letters (A-Z) are shifted in the encrypted text, and that the text contains only capital letters. (Hint: consider how you can find c. How many different c's are there? Is it feasible to use brute force?)

Obviously your solution must involve the user. Your program should offer the user to check proposed decodings of the text, and to pick the correct one.

Test your solution using the file TEXT_1.txt. In this file, the text has been encrypted by shifting letters only: no other characters are encrypted.

2. Frequency analysis of mixed alphabet

The security of the Caesar cipher in 1 can be increased by instead of simply shifting the alphabet c positions, each letter l is replaced by the letter p(l) according to the permutation p. (A permutation is a function from a set to itself that is 1-1.) This means that an encrypted letter e is decoded into p-1(e), where p-1 is the inverse of p.

Example: we substitute letters in the regular alphabet (plaintext alphabet) according to the permutation (ciphertext alphabet) specified by the table below:

Plaintext  alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ_ (_ is whitespace)
Ciphertext alphabet: HCIJBDFXGSYEWUNL_PQRAKZMOTV

Using this ciphertext alphabet we encrypt the plaintext: "THIS IS A BIT MORE SECURE":

Plaintext:  THIS IS A BIT MORE SECURE
Ciphertext: RXGQVGQVHVCGRVWNPBVQBIAPB

The problem with this kind of encryption is that, although it scrambles the letters, it doesn't change the frequencies of which they appear. Therefore, if we analyze the frequency of each letter and compare it to the known letter frequency of the English language, we are given good hints as to the character mapping between the plaintext alphabet and the ciphertext alphabet. For instance, using knowledge about how often whitespace tends to appear in English texts, one could figure out from the above ciphertext that whitespace most likely is mapped to the letter 'V'.

Your task is to write a program that decrypts a text that has been encrypted using a mixed alphabet of the type described above. (Hint: search the web for information about the relative frequencies of letters in the English language).

Test your solution using the file TEXT_2.txt, which contains a cipher text that is encrypted using a mixed alphabet. In this file the text has been encrypted by permuting all characters in the file (not just the letters), including parentheses, periods, commas, etc.

Be aware: there can be random variations in the character frequencies in a given text, that make them deviate from them in a table. Also, frequency tables can be slightly different depending on what mix of texts they are based upon. Thus, one cannot expect a solution that is based just on execution frequencies to return a perfect decryption. Rather, one will have to work in an iterative manner like the following:

  1. Make a first guess of the encryption based on the frequencies. Most likely, the encryption will be wrong for some characters.
  2. Make guesses about the encryption of single characters. For instance, it may be that you can spot common English words that are "almost" right. You can then make a guess which characters to fill in, and you can change the assumed encryption for these and apply to the text to see whether the decryption became better. This should become increasingly simple as more and more of the encryption becomes known.
  3. You can also try to use frequency tables of character sequences to gather more information about the encryption. Such tables exist at least for length 2 and 3.

Thus, the software that provides your solution will work as a support for this process rather than providing the decrypted message fully automatically. In your report you should give a brief description how you worked your way to obtain the final clear text, and how your software supported this process.

The above is sufficient for one person. For two persons also implement:

3. Beating simple frequency analysis

Both the Caesar cipher in 1, and the mixed alphabet in 2 are vulnerable to simple frequency analysis. The Vigenère cipher tries to combat this by combining several Caesar ciphers. Recall the right shift offset c used in the Caesar cipher above. In the Vigenère cipher, we construct a 'key' K by selecting a sequence of these offsets. Let |K| be the key length (how many Caesar ciphers that are combined).

To encrypt plaintext we apply the first Caesar cipher to the first |K| characters. The next Caesar cipher to the next |K| characters, etc. After applying the last cipher the process restarts with the first.

Example: we select the key K = {4 1 7} of length 3. For each offset in the key, we construct a ciphertext alphabet by shifting the plaintext alphabet that many steps to the right.

K = {4 1 7}, |K| = 3
(Offset in-between parentheses)

Plaintext  alphabet (0): ABCDEFGHIJKLMNOPQRSTUVWXYZ_ (_ is whitespace)
Ciphertext alphabet (4): EFGHIJKLMNOPQRSTUVWXYZ_ABCD
Ciphertext alphabet (1): BCDEFGHIJKLMNOPQRSTUVWXYZ_A
Ciphertext alphabet (7): HIJKLMNOPQRSTUVWXYZ_ABCDEFG

Say we want to encrypt the plaintext: "THIS IS THE MOST SECURE YET", using the above key. The process would be:

  1. The 3 first characters "THI" are offset by 4.
  2. The next 3 characters "S I" are offset by 1.
  3. "S T" are offset by 7.
  4. "HE " are offset by 4.
  5. "MOS" are offset by 1.
  6. ...

Offsets:    444111777444111777444111777
Plaintext:  THIS IS THE MOST SECURE YET
Ciphertext: XLMTAJZG LIDNPT GZIGYSFAEL 

Your task is to implement this cipher (not to break it). That is, implement the encrypt and decrypt functions. The input to the encryption function is a plaintext and a key. It outputs a ciphertext. The input to the decryption function is a ciphertext and a key. It outputs the plaintext.

Note: there are several variations of the Vigenère cipher. You are free to implement one of your own choice. However, you need to describe your reasoning (in the report) if you decide on another variation than the above.


Björn Lisper
bjorn.lisper (at) mdu.se