Substituting letter pairs (bigrams) is an encryption method that was already known in the 16th century. Is it still secure today?
To develop an encryption method that only requires pen and paper is a difficult task. Many ciphers of this kind, e.g. the Caesar cipher, the Vigenère cipher or the Playfair cipher have proven insecure. Among the best concepts known to me are the double columnar transposition and the One Time Pad. Book ciphers, encryption with a Rubik’s cube and encryption with a card deck (Solitaire) are interesting methods, too, as they only require unsuspicious everyday objects.
There’s another encryption method which only requires pen and paper but is still not easy to break: the bigram substitution. A bigram is a letter pair, e.g., CG, HE, JS or QW. The number of letter bigrams in the Latin alphabet is 26×26=676, ranging from AA to ZZ. A bigram substitution replaces each letter pair with another one (or with a symbol or with a number between 1 and 676). In order to use a bigram substitution, we need a substitution table with 676 entries.
Porta’s bigram substitution
The oldest bigram substitution I am a ware of is described in the marvelous book De Furtivis Literarum Notis written by 16th century cryptologist Giambattista della Porta. Porta uses a 20 letter alphabet. He therefore needs a substitution table with 400 entries. Here it is:
As can be seen, Porta substitutes each letter pair with a symbol. He must have been quite inventive to come up with 400 different symbols. For instance, the bigram IA is replaced with a symbol that looks like an X. The bigram VO is substituted with something resembling an O. Here’s a ciphertext Porta mentions in his book (the solution is available here).
Vigenère’s bigram substitution
Blaise de Vigenère invented a bigram substitution, too. Here is his table:
Vigenère replaces each bigram with a single letter or a letter followed by a dot, colon or semicolon. E.g., LM is substituted with “r.”.
RSHA bigram substitution
The following bigram substitution, which is described in David Kahn’s book The Codebreakers, was used by the Nazi authority Reichssicherheitshauptamt (RSHA):
How secure is a bigram substution?
It is clear that a bigram substitution can be broken with a bigram frequency analysis. Here are the most frequent English bigrams (according to Wikipedia):
th 1.52 en 0.55 ng 0.18 he 1.28 ed 0.53 of 0.16 in 0.94 to 0.52 al 0.09 er 0.94 it 0.50 de 0.09 an 0.82 ou 0.50 se 0.08 re 0.68 ea 0.47 le 0.08 nd 0.63 hi 0.46 sa 0.06 at 0.59 is 0.46 si 0.05 on 0.57 or 0.43 ar 0.04 nt 0.56 ti 0.34 ve 0.04 ha 0.56 as 0.33 ra 0.04 es 0.56 te 0.27 ld 0.02 st 0.55 et 0.19 ur 0.02
However, a frequency analysis will not be successful if the ciphertext is short enough. On the other hand, it doesn’t make much sense to use a bigram substitution for a ciphertext of, say, 400 letters because the One Time Pad is more efficient in such a case (dealing with a 400 letter One Time Pad key is certainly easier than using a substitution table with 676 entries).
For this reason, the bigram substitution is especially interesting for messages that contain between, say, 1500 and 5000 letters. The main question is if the bigram substitution is secure enough for messages of such a length or if a frequency analysis can break it. In order to answer this question, I have created two bigram challenges, one with a 2500 letter message and one with a 5000 letter message.