Cryptanalysis
Congratulations on getting here!
Cryptanalysis is the breaking of codes and ciphers. Cryptanalysts try to break the codes and ciphers created by cryptographers.
Monoalphabetic Substitution Ciphers
The first cryptanalysts were Arab scholars of the Abbasid caliphate. Official documents and tax records were at that time protected by cipher alphabets, which could include other symbols as well as letters. Ciphers of this kind, in which each letter of the original is always replaced by the same letter or symbol, are called monoalphabetic substitution ciphers. Arab scholars studying the Koran and the Hadith used the frequency of words to try to decide which statements were actually attributable to the prophet Muhammad, and also noticed that some letters occurred more frequently than others. This frequency varies from language to language (as anyone who has played Scrabble with a foreign set will know!). So, if the language of the enciphered text is known, cryptanalysts will first try replacing the most frequently occurring symbol in the text with the most frequently occurring letter in the language, then the second most frequent symbol with the second most frequent letter, and so on. If a substitution does not make sense, the next most frequent letter is tried instead. This is far quicker than trying combinations at random.
The following list gives the approximate relative frequency of letters in the English language, in descending order:
e, t, a, o, i, n, s, h, r, d, l, c, u, m, w, f, g, y, p, b, v, k, j, x, q, z
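As a rough sketch of that first step, the Python below ranks the symbols of a ciphertext by how often they occur and pairs them with the English letters in the order listed above. The function names (rank_by_frequency, first_guess_key, apply_key) are made up for this illustration, and the code assumes that every non-space character in the ciphertext is a cipher symbol. The result is only a starting guess that will need correcting by hand.

```python
from collections import Counter

# English letters in the descending-frequency order listed above.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def rank_by_frequency(ciphertext):
    """Return the ciphertext's symbols, most frequent first."""
    # Assumption: every non-whitespace character is a cipher symbol.
    counts = Counter(ch for ch in ciphertext.lower() if not ch.isspace())
    # Sort by descending count; ties are broken by symbol so the result is repeatable.
    return [sym for sym, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def first_guess_key(ciphertext):
    """Pair each cipher symbol with the English letter of the same frequency rank."""
    return dict(zip(rank_by_frequency(ciphertext), ENGLISH_ORDER))

def apply_key(ciphertext, key):
    """Replace every symbol that has a guess; leave everything else unchanged."""
    return "".join(key.get(ch, ch) for ch in ciphertext.lower())
```

Calling apply_key(text, first_guess_key(text)) gives a first trial decipherment. Where it produces nonsense, swap the guesses for symbols of similar frequency and try again.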
Of course, a small message may be a poor sample unrepresentative of the language as a whole (for example "she sells sea shells on the sea shore" produces a decreasing frequency of s, e, h, l, a, o, n, t, r), but this method can yield fast results. The frequency of letters can differ within the same language, though. For example, Americans use more z's and fewer s's than the English (ending words in -ize instead of -ise) and they also use fewer u's (e.g. color instead of colour). I shall be referring to English as spoken in England throughout this site unless I specify otherwise.
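The tongue-twister count can be checked with a few lines of Python. This is just an illustrative sketch (letter_ranking is a name invented here); letters with equal counts are listed in the order they first appear in the text.

```python
from collections import Counter

def letter_ranking(text):
    """Return (letter, count) pairs, most frequent first."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    return counts.most_common()

sample = "she sells sea shells on the sea shore"
for letter, count in letter_ranking(sample):
    print(letter, count)
```

Here s comes out on top with 8 occurrences and e second with 7, so a cryptanalyst relying on frequency alone would begin by taking s for e.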
One can go further than just using the overall frequency of letters. For instance, only certain letters can appear as double letters, and some of those appear far more frequently than others: double a is rare, but appears in aardvark and Baal, whereas double e (seen), t (letter), f (offer), m (common), s (pass), and l and o (balloon) are quite common.
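Spotting doubled symbols in a ciphertext is easy to automate. The sketch below (doubled_symbols is a name chosen for this example) counts every place where a symbol is immediately followed by itself; a run of three identical symbols therefore counts as two doubles.

```python
from collections import Counter

def doubled_symbols(ciphertext):
    """Count how often each symbol appears twice in a row."""
    counts = Counter()
    # Compare each character with the one that follows it.
    for first, second in zip(ciphertext, ciphertext[1:]):
        if first == second and not first.isspace():
            counts[first] += 1
    return counts
```

Symbols that turn up doubled most often are candidates for letters such as e, t, l, o and s, while a symbol that is common but never doubled is more likely to be something like a or i.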
Some letters (particularly vowels, e for example) can appear next to virtually any other letter, whereas others are far more fussy (for instance, q is almost always followed by u, and h is far more likely to come before e than after it). Another thing to watch out for (where the letters are still in their original groupings) is single letter words. These are most likely to be I or a. Two and three letter words are also fairly easy to identify. Cryptographers are therefore advised to remove the spaces from their text, to make life harder for cryptanalysts. Every letter behaves a little differently, though, so cryptanalysts stand a good chance of solving most longer texts.
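Both of these clues can be collected mechanically when the word spacing has been left in. The following sketch uses two made-up helper names: single_letter_words lists the symbols that stand alone as words, and neighbour_variety counts how many different symbols each symbol is found next to.

```python
from collections import Counter, defaultdict

def single_letter_words(ciphertext):
    """Count the symbols that appear as one-symbol words (likely I or a)."""
    return Counter(word for word in ciphertext.split() if len(word) == 1)

def neighbour_variety(ciphertext):
    """For each symbol, count the distinct symbols found beside it within words."""
    neighbours = defaultdict(set)
    for word in ciphertext.split():
        for left, right in zip(word, word[1:]):
            neighbours[left].add(right)
            neighbours[right].add(left)
    return {symbol: len(others) for symbol, others in neighbours.items()}
```

A symbol with a large variety of neighbours is a good candidate for a vowel, while one that is nearly always followed by the same symbol may be q.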
Once you have got a few letters, you can start guessing words to get more letters. However, you should also fill in a table of the cipher alphabet against the plain alphabet, to see if it becomes obvious that the cryptographer has used a shift cipher or a keyphrase. Note that the keyphrase will not always start at the beginning of the alphabet.
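A minimal sketch of such a table is given below, assuming a cipher that uses only the 26 lowercase letters. The partial key is a dictionary from cipher letter to plain letter, and both function names are invented for this example. cipher_row prints the known cipher letters under the plain alphabet, and detect_shift reports the shift if every known pair is consistent with a single shift cipher.

```python
import string

def cipher_row(key):
    """Return the cipher alphabet lined up under a..z, with '.' for unknown letters."""
    plain_to_cipher = {plain: cipher for cipher, plain in key.items()}
    return "".join(plain_to_cipher.get(p, ".") for p in string.ascii_lowercase)

def detect_shift(key):
    """Return the shift if all known pairs agree on one, otherwise None."""
    shifts = {(ord(cipher) - ord(plain)) % 26 for cipher, plain in key.items()}
    return shifts.pop() if len(shifts) == 1 else None

# Hypothetical partial key: cipher h, w, d stand for plain e, t, a.
partial = {"h": "e", "w": "t", "d": "a"}
print(string.ascii_lowercase)
print(cipher_row(partial))
print(detect_shift(partial))
```

If detect_shift returns a number, the rest of the key follows at once. If it returns None, look along the row by eye: a keyphrase cipher tends to show a stretch of letters in alphabetical order after the keyphrase ends.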
Making Monoalphabetic Substitution Ciphers harder to crack
The frequency of letters is the major weakness of monoalphabetic substitution ciphers. There are several ways to get around this.
Nulls
Instead of using just 26 symbols, one for each letter, use more. The additional symbols should be inserted into the enciphered text at varying frequencies, but you and the recipient will know that they are blanks, and do not actually represent a letter from the original.
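The idea can be sketched in Python as follows. The null symbols, their weights and the insertion rate are all arbitrary choices made for this illustration, and the function names are hypothetical. The sender scatters nulls through the enciphered text, and the recipient, who knows which symbols are nulls, simply deletes them.

```python
import random

# Hypothetical null symbols and how often each should be used relative to the others.
NULLS = "#@%&"
NULL_WEIGHTS = [4, 3, 2, 1]

def add_nulls(ciphertext, rate=0.3, rng=None):
    """Insert a null after some symbols; 'rate' is the chance of inserting one."""
    rng = rng or random.Random()
    out = []
    for ch in ciphertext:
        out.append(ch)
        if rng.random() < rate:
            # Pick a null so that each one appears at a different frequency.
            out.append(rng.choices(NULLS, weights=NULL_WEIGHTS)[0])
    return "".join(out)

def strip_nulls(text):
    """Remove every null symbol, recovering the original ciphertext."""
    return "".join(ch for ch in text if ch not in NULLS)
```

Because the nulls are chosen with unequal weights, they blend into the frequency table instead of standing out as a group of equally common symbols.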
Misspelling
Use deliberately wrong spelling in the message before enciphering it, so long as the recipient will still be able to work out what you mean.
Code Words
Replace certain key words by another word or symbol (for instance replace submarine by ~, airplane by , truck by C, ship by $ and so on). Another symbol can be used to denote that the following letter is doubled. The problems with coding are working out codes in advance for everything you might need, and the difficulty and insecurity of carrying around a large codebook.
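A small sketch of the code word step, done before the message is enciphered, might look like this. It uses only two of the example symbols above, and the name encode_words and the tiny codebook are assumptions made for the illustration; a real codebook would be far larger.

```python
import re

# A tiny example codebook: plain word -> code symbol.
CODEBOOK = {"submarine": "~", "ship": "$"}

def encode_words(plaintext, codebook=CODEBOOK):
    """Replace whole words found in the codebook with their code symbols."""
    # \b keeps "ship" from matching inside longer words such as "shipment".
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, codebook)) + r")\b")
    return pattern.sub(lambda match: codebook[match.group(1)], plaintext)
```

For example, encode_words("the ship saw a submarine") gives "the $ saw a ~", which can then be enciphered with the cipher alphabet as usual.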
Nomenclatures
This system of encryption uses a few code words as well as a cipher alphabet.
