Latest update May 28, 2015

To avoid spam, all mail addresses on this page have the "@" replaced by "#".

One or two persons, level of ambition accordingly. No prior knowledge of Cryptography is required.

In this project you will be given a set encrypted texts (ciphertexts). Your task is to decrypt them to find the corresponding plaintext.

This is one of the oldest and most known ciphers. It is a substitution cipher where each letter of the alphabet is shifted some c steps to the right.

**Example**: say we want to encrypt the plaintext: "THIS IS NOT VERY
SECURE". First, pick an integer 0<c<26 (assuming we have 26 letters in
the alhpabet). Next, shift all letters c steps to the right. Doing this for
c=7 yields:

Plaintext: THIS IS NOT VERY SECURE Ciphertext: AOPZ PZ UVA CLYF ZLJBYL

To decrypt we simply shift the ciphertext c steps to the left and we get the plaintext.

**Your task** is to write a program in F# that decodes a text file that
has been encrypted using a Caesar cipher. You can assume that only the ASCII
letters (A-Z) are shifted in the encrypted text, and that the text contains
only capital letters. (Hint: consider how you can find c. How many
different c's are there? Is it feasible to use brute force?)

Obviously your solution must involve the user. Your program should offer the user to check proposed decodings of the text, and to pick the correct one.

Test your solution using the file TEXT_1.txt. In this file, the text has been encrypted by shifting letters only: no other characters are encrypted.

The security of the Caesar cipher in 1 can be increased by instead of simply
shifting the alphabet c positions, each letter *l* is replaced by the
letter *p(l)* according to the permutation *p*. (A permutation is
a function from a set to itself that is 1-1.) This means that an encrypted
letter *e* is decoded into *p ^{-1}(e)*,
where

**Example**: we substitute letters in the regular alphabet (plaintext
alphabet) according to the permutation (ciphertext alphabet) specified by
the table below:

Plaintext alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ_ (_ is whitespace) Ciphertext alphabet: HCIJBDFXGSYEWUNL_PQRAKZMOTV

Using this ciphertext alphabet we encrypt the plaintext: "THIS IS A BIT MORE SECURE":

Plaintext: THIS IS A BIT MORE SECURE Ciphertext: RXGQVGQVHVCGRVWNPBVQBIAPB

The problem with this kind of encryption is that, although it scrambles the letters, it doesn't change the frequencies of which they appear. Therefore, if we analyze the frequency of each letter and compare it to the known letter frequency of the English language, we are given good hints as to the character mapping between the plaintext alphabet and the ciphertext alphabet. For instance, using knowledge about how often whitespace tends to appear in English texts, one could figure out from the above ciphertext that whitespace most likely is mapped to the letter 'V'.

**Your task** is to write a program that decrypts a text that has been
encrypted using a mixed alphabet of the type described above. (Hint: search
the web for information about the relative frequencies of letters in the
English language).

Test your solution using the file TEXT_2.txt, which contains a cipher text that is encrypted using a mixed alphabet. In this file the text has been encrypted by permuting all characters in the file (not just the letters), including parentheses, periods, commas, etc.

**Be aware**: there can be random variations in the character frequencies
in a given text, that make them deviate from them in a table. Also,
frequency tables can be slightly different depending on what mix of texts
they are based upon. Thus, one cannot expect a solution that is based just
on execution frequencies to return a perfect decryption. Rather, one will
have to work in an iterative manner like the following:

- Make a first guess of the encryption based on the frequencies. Most likely, the encryption will be wrong for some characters.
- Make guesses about the encryption of single characters. For instance, it may be that you can spot common English words that are "almost" right. You can then make a guess which characters to fill in, and you can change the assumed encryption for these and apply to the text to see whether the decryption became better. This should become increasingly simple as more and more of the encryption becomes known.
- You can also try to use frequency tables of character sequences to gather more information about the encryption. Such tables exist at least for length 2 and 3.

Thus, the software that provides your solution will work as a support for this process rather than providing the decrypted message fully automatically. In your report you should give a brief description how you worked your way to obtain the final clear text, and how your software supported this process.

The above is sufficient for one person. For two persons also implement:

Both the Caesar cipher in 1, and the mixed alphabet in 2 are vulnerable to simple frequency analysis. The Vigenère cipher tries to combat this by combining several Caesar ciphers. Recall the right shift offset c used in the Caesar cipher above. In the Vigenère cipher, we construct a 'key' K by selecting a sequence of these offsets. Let |K| be the key length (how many Caesar ciphers that are combined).

To encrypt plaintext we apply the first Caesar cipher to the first |K| characters. The next Caesar cipher to the next |K| characters, etc. After applying the last cipher the process restarts with the first.

**Example**: we select the key K = {4 1 7} of length 3. For each offset in
the key, we construct a ciphertext alphabet by shifting the plaintext
alphabet that many steps to the right.

K = {4 1 7}, |K| = 3

(Offset in-between parentheses)

Plaintext alphabet (0): ABCDEFGHIJKLMNOPQRSTUVWXYZ_ (_ is whitespace) Ciphertext alphabet (4): EFGHIJKLMNOPQRSTUVWXYZ_ABCD Ciphertext alphabet (1): BCDEFGHIJKLMNOPQRSTUVWXYZ_A Ciphertext alphabet (7): HIJKLMNOPQRSTUVWXYZ_ABCDEFG

Say we want to encrypt the plaintext: "THIS IS THE MOST SECURE YET", using the above key. The process would be:

- The 3 first characters "THI" are offset by 4.
- The next 3 characters "S I" are offset by 1.
- "S T" are offset by 7.
- "HE " are offset by 4.
- "MOS" are offset by 1.
- ...

Offsets: 444111777444111777444111777 Plaintext: THIS IS THE MOST SECURE YET Ciphertext: XLMTAJZG LIDNPT GZIGYSFAEL

**Your task** is to implement this cipher (not to break it). That is,
implement the encrypt and decrypt functions. The input to the encryption
function is a plaintext and a key. It outputs a ciphertext. The input to the
decryption function is a ciphertext and a key. It outputs the plaintext.

**Note**: there are several variations of the Vigenère cipher. You are
free to implement one of your own choice. However, you need to describe your
reasoning (in the report) if you decide on another variation than the above.

bjorn.lisper#mdh.se