Substituting letter pairs (also known as bigrams or digraphs) is an encryption method invented in the 16th century. Can you break a new challenge I have made?
The bigram substitution is a manual encryption method with a history of well over 400 years. A bigram (also known as a digraph) is a pair of letters, such as CG, HE, JS or QW. The number of bigrams in the Latin alphabet is 26×26=676, ranging from AA to ZZ. A bigram substitution replaces each letter pair with another one (or with a symbol or with a number between 1 and 676). In order to use a bigram substitution, we need a substitution table with 676 entries.
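To make the idea concrete, here is a minimal Python sketch of a bigram-to-bigram substitution. The function names (make_table, substitute, invert), the seed value and the padding with X for odd-length messages are my own assumptions for illustration; none of the historical systems described in this post worked exactly like this.

```python
import random
import string
from itertools import product

def make_table(seed):
    """Return a random bigram-to-bigram table with 26*26 = 676 entries."""
    bigrams = ["".join(p) for p in product(string.ascii_uppercase, repeat=2)]
    shuffled = bigrams[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the table is reproducible
    return dict(zip(bigrams, shuffled))

def substitute(text, table):
    """Replace each letter pair of the text using the given table."""
    letters = [c for c in text.upper() if c in string.ascii_uppercase]
    if len(letters) % 2:
        letters.append("X")  # assumption: pad odd-length messages with X
    return "".join(table[a + b] for a, b in zip(letters[0::2], letters[1::2]))

def invert(table):
    """Swap keys and values to obtain the decryption table."""
    return {v: k for k, v in table.items()}

table = make_table(seed=2019)
ciphertext = substitute("Attack at dawn", table)
plaintext = substitute(ciphertext, invert(table))
```

Decrypting uses the same substitute function with the inverted table, so plaintext comes back as ATTACKATDAWN (spaces and case are lost, as in most manual ciphers).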
Porta’s bigram substitution
The oldest bigram substitution I am aware of is described in the book De Furtivis Literarum Notis, written by the 16th-century cryptologist Giambattista della Porta. Porta uses a 20-letter alphabet. He therefore needs a substitution table with 20×20=400 entries. Here it is:
As can be seen, Porta substitutes each letter pair with a symbol. He had to be quite inventive to come up with 400 different symbols. For instance, the bigram IA is replaced with a symbol that looks like an X. The bigram VO is substituted with something resembling an O. Here’s a ciphertext Porta provides in his book (the solution is available here).
Vigenère’s bigram substitution
Blaise de Vigenère invented a bigram substitution, too. Here’s his table:
Vigenère replaces each bigram with a single letter or a letter followed by a dot, colon or semicolon. E.g., LM is substituted with “r.”.
RSHA bigram substitution
The following bigram substitution, which is described in David Kahn’s book The Codebreakers, was used by the Nazi authority Reichssicherheitshauptamt (RSHA):
It is clear that a bigram substitution can be broken with bigram frequency analysis. Here are the most frequent English bigrams (according to Wikipedia):
th 1.52    en 0.55    ng 0.18
he 1.28    ed 0.53    of 0.16
in 0.94    to 0.52    al 0.09
er 0.94    it 0.50    de 0.09
an 0.82    ou 0.50    se 0.08
re 0.68    ea 0.47    le 0.08
nd 0.63    hi 0.46    sa 0.06
at 0.59    is 0.46    si 0.05
on 0.57    or 0.43    ar 0.04
nt 0.56    ti 0.34    ve 0.04
ha 0.56    as 0.33    ra 0.04
es 0.56    te 0.27    ld 0.02
st 0.55    et 0.19    ur 0.02
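For a bigram frequency analysis, the first step is to count how often each letter pair occurs in the ciphertext. The following Python sketch does this for non-overlapping pairs, which is how a bigram substitution has to be read. The function name bigram_frequencies is hypothetical, and the result is expressed in percent, which is presumably also the unit of the frequency list above.

```python
from collections import Counter

def bigram_frequencies(ciphertext):
    """Return (bigram, percentage) tuples, most frequent first."""
    letters = [c for c in ciphertext.upper() if c.isalpha()]
    # Non-overlapping pairs; a dangling last letter is ignored.
    pairs = ["".join(letters[i:i + 2]) for i in range(0, len(letters) - 1, 2)]
    counts = Counter(pairs)
    total = len(pairs)
    return [(bg, 100 * n / total) for bg, n in counts.most_common()]

# Toy input: the pairs are AB, AB and CD.
ranking = bigram_frequencies("ABABCD")
```

The most frequent ciphertext bigrams are then candidates for frequent plaintext bigrams such as TH or HE. With a short ciphertext, the ranking is only a rough guide, because many of the 676 possible bigrams appear once or not at all.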
An even more powerful method is hill climbing. As far as I can tell, hill climbing is the best approach to attack a bigram substitution. The best way to implement the fitness function of a bigram hill climber is probably to use tetragram (four-letter sequence) frequencies or some similar means.
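The following Python sketch shows what a very basic hill climber with a tetragram fitness function might look like. It is not the program used by any of the solvers mentioned in this post. All names (tetragram_model, fitness, hill_climb) are hypothetical, the tetragram statistics are taken from whatever sample text is passed in (it must contain at least four letters), and the ciphertext is assumed to be a non-empty string of uppercase letters of even length.

```python
import math
import random
import string
from collections import Counter
from itertools import product

ALL_BIGRAMS = ["".join(p) for p in product(string.ascii_uppercase, repeat=2)]

def tetragram_model(corpus):
    """Log10 tetragram probabilities from a sample text, plus a floor value."""
    letters = "".join(c for c in corpus.upper() if c in string.ascii_uppercase)
    counts = Counter(letters[i:i + 4] for i in range(len(letters) - 3))
    total = sum(counts.values())
    logp = {t: math.log10(n / total) for t, n in counts.items()}
    return logp, math.log10(0.01 / total)  # floor for unseen tetragrams

def fitness(text, model):
    """Sum of log probabilities of all overlapping tetragrams in the text."""
    logp, floor = model
    return sum(logp.get(text[i:i + 4], floor) for i in range(len(text) - 3))

def decrypt(pairs, key):
    return "".join(key[p] for p in pairs)

def hill_climb(ciphertext, model, steps=20000, seed=0):
    """Swap two table entries at random; keep the swap only if fitness improves."""
    rng = random.Random(seed)
    pairs = [ciphertext[i:i + 2] for i in range(0, len(ciphertext) - 1, 2)]
    used = sorted(set(pairs))  # only bigrams that occur can change the score
    shuffled = ALL_BIGRAMS[:]
    rng.shuffle(shuffled)
    key = dict(zip(ALL_BIGRAMS, shuffled))  # random start: cipher -> plain
    best = fitness(decrypt(pairs, key), model)
    for _ in range(steps):
        a = rng.choice(used)
        b = rng.choice(ALL_BIGRAMS)
        key[a], key[b] = key[b], key[a]
        candidate = fitness(decrypt(pairs, key), model)
        if candidate > best:
            best = candidate
        else:
            key[a], key[b] = key[b], key[a]  # undo the swap
    return key, best
```

A plain climber like this easily gets stuck in a local maximum. Practical solvers typically add random restarts or simulated annealing, use tetragram statistics from a large corpus, and avoid re-scoring the whole text after every swap.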
However, frequency analysis and hill climbing will only be successful if there is enough material to analyze, i.e., if the ciphertext is long enough. On the other hand, it doesn’t make much sense to use a bigram substitution for a ciphertext of, say, 400 letters, as dealing with a substitution table containing 676 entries is more cumbersome than handling a 400-letter one-time pad key.
For this reason, the bigram substitution seems to be especially interesting for messages that contain between, say, 1000 and 5000 letters. The main question is whether the bigram substitution is secure enough for messages of such a length. Not much has been published about this question in the literature. So, two years ago, I decided to take a first step toward finding the answer. For this purpose, I took two messages (one with 2500 and one with 5000 letters) and encrypted them. Subsequently, I published the two ciphertexts as challenges on my blog.
As usual, my readers solved both challenges within a few days. However, this time things proved a little more difficult than in most other cases. Blog reader Norbert Biermann found the solution of the 5000-letter version, still with a few mistakes, using hill climbing. Thomas Ernst published a few interesting word pattern considerations. Then Norbert provided a second, more sophisticated hill climbing result, which was almost error-free. Finally, Armin Krauß published the correct solution.
After the solution of the 5000-letter challenge had proven quite difficult, I expected that the 2500-letter ciphertext would not be solved so soon. However, I was wrong. Only a few days later, Norbert Biermann published the correct solution of the 2500-letter challenge, which he had again found with his hill climber. To my knowledge, this success still represents the world record in breaking bigram substitutions.
The bigram 1346 challenge
Two years after Norbert’s record, it is about time to start a new bigram challenge. This time, I took an English text consisting of 1346 letters as plaintext. Unlike last time, I didn’t replace bigrams with numbers, but bigrams with bigrams. For this reason, the ciphertext consists of letters, which have to be read pairwise. Here it is: