Blog reader Norbert Biermann has recently solved a bigram substitution ciphertext consisting of 1346 letters – the shortest one ever broken. Here’s a 1000-letter ciphertext of the same kind.
The bigram substitution is a manual encryption method with a history of over 500 years. A bigram (also known as a digraph) is a pair of letters, such as CG, HE, JS or QW. The number of bigrams in the Latin alphabet is 26×26=676, ranging from AA to ZZ. A bigram substitution replaces each letter pair with another one (or with a symbol or with a number between 1 and 676). In order to use a bigram substitution, we need a substitution table with 676 entries.
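To make the mechanics concrete, here is a minimal Python sketch of a bigram substitution that maps letter pairs to other letter pairs. The function names (`make_bigram_table`, `encrypt`, `decrypt`), the random table and the padding with X for odd-length messages are illustrative assumptions, not part of any historical system described in this post.

```python
import random
import string
from itertools import product

ALPHABET = string.ascii_uppercase


def make_bigram_table(seed=None):
    """Build a random one-to-one table over all 26*26 = 676 bigrams (hypothetical helper)."""
    rng = random.Random(seed)
    bigrams = ["".join(pair) for pair in product(ALPHABET, repeat=2)]
    shuffled = bigrams[:]
    rng.shuffle(shuffled)
    return dict(zip(bigrams, shuffled))


def encrypt(plaintext, table):
    """Replace each letter pair of the plaintext with its table entry."""
    letters = [c for c in plaintext.upper() if c in ALPHABET]
    if len(letters) % 2:
        letters.append("X")  # assumption: pad odd-length messages with X
    return "".join(
        table[letters[i] + letters[i + 1]] for i in range(0, len(letters), 2)
    )


def decrypt(ciphertext, table):
    """Invert the table and map each ciphertext pair back to plaintext."""
    inverse = {cipher: plain for plain, cipher in table.items()}
    return "".join(
        inverse[ciphertext[i:i + 2]] for i in range(0, len(ciphertext), 2)
    )
```

For example, `decrypt(encrypt("Hello world", table), table)` returns the plaintext with spaces removed and in upper case. A table that outputs symbols or numbers between 1 and 676 would work the same way; only the values of the dictionary change.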
Porta’s bigram substitution
The oldest bigram substitution I am aware of is described in the book De Furtivis Literarum Notis, written by the 16th-century cryptologist Giambattista della Porta. Porta uses a 20-letter alphabet, so his substitution table needs 20×20=400 entries. Here it is:
As can be seen, Porta substituted each letter pair with a symbol. He had to be quite inventive to come up with 400 different symbols. For instance, the bigram IA is replaced with a symbol that looks like an X. The bigram VO is substituted with something resembling an O.
Vigenère’s bigram substitution
Blaise de Vigenère invented a bigram substitution, too. Here’s his table:
Vigenère replaces each bigram with a single letter or a letter followed by a dot, colon or semicolon. E.g., LM is substituted with “r.”.
RSHA bigram substitution
The following bigram substitution, which is described in David Kahn’s book The Codebreakers, was used by the Nazi authority Reichssicherheitshauptamt (RSHA):
As far as I can tell, hill climbing is the best approach to attack a bigram substitution. The best way to implement the fitness function of a bigram hill climber is probably to use hexagram (letter six-tuple) frequencies or some similar means.
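The idea can be sketched in a few lines of Python. This is a simplified illustration under several assumptions, not the method used by any of the solvers mentioned in this post: the ciphertext consists of letter pairs and has even length, the n-gram statistics come from a reference text `corpus` you supply yourself, and the climber is purely greedy (a practical attack would likely add restarts or simulated annealing). All function names are hypothetical.

```python
import math
import random
import string
from collections import Counter

ALPHABET = string.ascii_uppercase


def ngram_log_probs(corpus, n):
    """Log-probabilities of letter n-grams in a reference text, plus a floor for unseen ones."""
    counts = Counter(corpus[i:i + n] for i in range(len(corpus) - n + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("corpus is shorter than n")
    floor = math.log(0.01 / total)  # assumption: small penalty value for unseen n-grams
    return {gram: math.log(c / total) for gram, c in counts.items()}, floor


def fitness(text, log_probs, floor, n):
    """Sum of n-gram log-probabilities; higher means more language-like."""
    return sum(log_probs.get(text[i:i + n], floor) for i in range(len(text) - n + 1))


def decode(cipher_pairs, key):
    """Apply a candidate key (cipher bigram -> plain bigram) to the ciphertext pairs."""
    return "".join(key[pair] for pair in cipher_pairs)


def hill_climb(ciphertext, corpus, n=6, steps=10000, seed=None):
    """Greedy hill climbing: swap two key entries, keep the swap only if fitness improves."""
    rng = random.Random(seed)
    all_bigrams = [a + b for a in ALPHABET for b in ALPHABET]
    cipher_pairs = [ciphertext[i:i + 2] for i in range(0, len(ciphertext), 2)]
    log_probs, floor = ngram_log_probs(corpus, n)

    # Start from a random one-to-one assignment of plain bigrams to cipher bigrams.
    plain = all_bigrams[:]
    rng.shuffle(plain)
    key = dict(zip(all_bigrams, plain))
    best = fitness(decode(cipher_pairs, key), log_probs, floor, n)

    used = sorted(set(cipher_pairs))  # only these entries affect the decryption
    for _ in range(steps):
        a = rng.choice(used)
        b = rng.choice(all_bigrams)
        if a == b:
            continue
        key[a], key[b] = key[b], key[a]
        score = fitness(decode(cipher_pairs, key), log_probs, floor, n)
        if score > best:
            best = score
        else:
            key[a], key[b] = key[b], key[a]  # undo a swap that did not help
    return key, best
```

With `n=6` the fitness function uses hexagram statistics as suggested above, which requires a large reference corpus to be meaningful. Recomputing the fitness of the whole text after every swap is slow; it is done here only to keep the sketch short.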
However, hill climbing will only be successful if there is enough material to analyze, i.e., if the ciphertext is long enough. But how much ciphertext is necessary to break a bigram substitution? Little has been published on this question. Three years ago, I decided to take a first step toward finding the answer. For this purpose, I took two messages – one with 2500 and one with 5000 letters – and encrypted them. Subsequently, I published the two ciphertexts as challenges on my blog.
As usual, my readers solved both challenges within a few days. However, this time things proved a little more difficult than in most cases. Blog reader Norbert Biermann found the solution to the 5000-letter version – still with a few mistakes – using hill climbing. Thomas Ernst published a few interesting word pattern considerations. Then Norbert provided a second, more sophisticated hill-climbing result, which was almost error-free. Finally, Armin Krauß published the correct solution.
After the 5000-letter challenge had proven quite difficult to solve, I expected that the 2500-letter ciphertext would not be solved so soon. However, I was wrong. Only a few days later, Norbert Biermann published the correct solution to the 2500-letter challenge, which he had again found with his hill climber. To my knowledge, this success represented the world record in breaking bigram substitutions.
The bigram 1346 challenge
Two years after Norbert’s record, I published another bigram challenge on this blog. This time, I took an English text consisting of 1346 letters as plaintext and called it the Bigram 1346 challenge. Contrary to last time, I didn’t replace bigrams with numbers but with other bigrams. For this reason, the ciphertext consisted of letters, which had to be read pair-wise. Here’s the challenge:
In August 2019, Norbert Biermann published the solution of the bigram 1346 challenge as a comment on my blog. With this success, Norbert set a new world record for the shortest bigram ciphertext ever broken. To my knowledge, this record is still valid today.
The Bigram 1000 Challenge
After Norbert had broken the bigram 1346 message, I decided to create a new, even shorter challenge. This time, I took a message with exactly 1000 letters. I encrypted it in the same way as the bigram 1346 challenge plaintext, calling it Bigram 1000 Challenge. Here’s the ciphertext: