Substituting letter pairs (also known as bigrams or digraphs) is an encryption method invented in the 16th century. Can you break a new challenge I have made?

Bigram substitution is a manual encryption method with a history of over 400 years. A bigram (also known as a digraph) is a pair of letters, such as CG, HE, JS or QW. The number of bigrams in the Latin alphabet is 26×26=676, ranging from AA to ZZ. A bigram substitution replaces each letter pair with another one (or with a symbol or with a number between 1 and 676). In order to use a bigram substitution, we need a substitution table with 676 entries.
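To make the idea concrete, here is a minimal Python sketch of a bigram-to-bigram substitution. The function names (`make_table`, `substitute`), the random table and the padding with X for an odd number of letters are assumptions made for illustration; none of the historical systems described below worked exactly like this.

```python
import random
from itertools import product
from string import ascii_uppercase

def make_table(seed=None):
    """Return a random bigram-to-bigram table with 26*26 = 676 entries."""
    bigrams = ["".join(p) for p in product(ascii_uppercase, repeat=2)]
    shuffled = bigrams[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(bigrams, shuffled))

def substitute(text, table):
    """Replace each letter pair of `text` by its table entry."""
    letters = [c for c in text.upper() if c in ascii_uppercase]
    if len(letters) % 2:
        letters.append("X")  # padding convention is an assumption
    pairs = ("".join(letters[i:i + 2]) for i in range(0, len(letters), 2))
    return "".join(table[p] for p in pairs)

table = make_table(seed=1)
inverse = {v: k for k, v in table.items()}  # decryption table

ciphertext = substitute("HELLO WORLD", table)
print(ciphertext)
print(substitute(ciphertext, inverse))  # HELLOWORLD
```

The table is a permutation of the 676 bigrams, so decryption is simply a lookup in the inverted table.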


Porta’s bigram substitution

The oldest bigram substitution I am aware of is described in the book De Furtivis Literarum Notis written by 16th century cryptologist Giambattista della Porta. Porta uses a 20 letter alphabet. He therefore needs a substitution table with 400 entries. Here it is:


As can be seen, Porta substitutes each letter pair with a symbol. He had to be quite inventive to come up with 400 different symbols. For instance, the bigram IA is replaced with a symbol that looks like an X. The bigram VO is substituted with something resembling an O. Here’s a ciphertext Porta provides in his book (the solution is available here).



Vigenère’s bigram substitution

Blaise de Vigenère invented a bigram substitution, too. Here’s his table:


Vigenère replaces each bigram with a single letter or a letter followed by a dot, colon or semicolon. E.g., LM is substituted with “r.”.


RSHA bigram substitution

The following bigram substitution, which is described in David Kahn’s book The Codebreakers, was used by the Nazi authority Reichssicherheitshauptamt (RSHA):


Two challenges

It is clear that a bigram substitution can be broken with bigram frequency analysis. Here are the most frequent English bigrams (according to Wikipedia):

th 1.52       en 0.55       ng 0.18
he 1.28       ed 0.53       of 0.16
in 0.94       to 0.52       al 0.09
er 0.94       it 0.50       de 0.09
an 0.82       ou 0.50       se 0.08
re 0.68       ea 0.47       le 0.08
nd 0.63       hi 0.46       sa 0.06
at 0.59       is 0.46       si 0.05
on 0.57       or 0.43       ar 0.04
nt 0.56       ti 0.34       ve 0.04
ha 0.56       as 0.33       ra 0.04
es 0.56       te 0.27       ld 0.02
st 0.55       et 0.19       ur 0.02
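A first step in such an analysis is to count how often each letter pair occurs in the ciphertext. The following Python sketch does this for non-overlapping pairs, which is how a bigram substitution splits the text. The function name `pair_frequencies` is made up for this example. Note that a table like the one above is presumably based on all overlapping pairs in running text, so the numbers are only a rough guide for the non-overlapping pairs of a ciphertext.

```python
from collections import Counter
from string import ascii_uppercase

def pair_frequencies(ciphertext):
    """Count non-overlapping letter pairs.

    Returns a list of (pair, percentage) tuples, most frequent first.
    A trailing single letter is ignored.
    """
    letters = "".join(c for c in ciphertext.upper() if c in ascii_uppercase)
    pairs = [letters[i:i + 2] for i in range(0, len(letters) - 1, 2)]
    counts = Counter(pairs)
    total = len(pairs)
    return [(pair, 100.0 * n / total) for pair, n in counts.most_common()]

# Toy input: the pair QW occurs three times out of five.
for pair, percent in pair_frequencies("QWXYQ WABQW"):
    print(pair, round(percent, 1))
```

In a long English message, the most frequent ciphertext pair is a good candidate for TH or HE, but with only a few hundred pairs the counts are too flat for this to work on its own.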

An even more powerful method is hill climbing. As far as I can tell, hill climbing is the best approach to attack a bigram substitution. The best way to implement the fitness function of a bigram hill climber is probably to use tetragram (letter four-tuple) frequencies or some similar means.
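The following Python sketch shows what such a hill climber could look like. It is not the program any of my readers used; it is a deliberately simple illustration. All names (`tetragram_model`, `fitness`, `hill_climb`) are invented here, and the tetragram statistics are assumed to come from a large English reference text that you have to supply yourself. The climber swaps two entries of the decryption table and keeps the swap if the tetragram score of the resulting plaintext does not get worse.

```python
import math
import random
from collections import Counter
from string import ascii_uppercase as ABC

def tetragram_model(corpus):
    """Log10 probabilities of letter four-tuples in a reference text.

    The corpus is assumed to be long English text (at least four letters).
    """
    letters = "".join(c for c in corpus.upper() if c in ABC)
    counts = Counter(letters[i:i + 4] for i in range(len(letters) - 3))
    total = sum(counts.values())
    floor = math.log10(0.01 / total)  # score for unseen tetragrams
    return {g: math.log10(n / total) for g, n in counts.items()}, floor

def fitness(plaintext, model, floor):
    """Sum of tetragram log-probabilities; higher means more English-like."""
    return sum(model.get(plaintext[i:i + 4], floor)
               for i in range(len(plaintext) - 3))

def decrypt(pairs, key):
    return "".join(key[p] for p in pairs)

def hill_climb(ciphertext, model, floor, steps=100_000, seed=None):
    """Return (best plaintext guess, its score) after `steps` swap attempts."""
    rng = random.Random(seed)
    letters = "".join(c for c in ciphertext.upper() if c in ABC)
    pairs = [letters[i:i + 2] for i in range(0, len(letters) - 1, 2)]
    alphabet = [a + b for a in ABC for b in ABC]
    targets = alphabet[:]
    rng.shuffle(targets)
    key = dict(zip(alphabet, targets))  # ciphertext pair -> plaintext pair
    used = sorted(set(pairs))           # only these pairs affect the score
    best = fitness(decrypt(pairs, key), model, floor)
    for _ in range(steps):
        a, b = rng.choice(used), rng.choice(alphabet)
        if a == b:
            continue
        key[a], key[b] = key[b], key[a]          # try a swap
        score = fitness(decrypt(pairs, key), model, floor)
        if score >= best:
            best = score                         # keep it
        else:
            key[a], key[b] = key[b], key[a]      # undo it
    return decrypt(pairs, key), best
```

This version recomputes the score of the whole text after every swap, which is slow but easy to follow. A serious attack would at least rescore only the affected positions and restart several times from different random tables, since a plain hill climber easily gets stuck in a local maximum.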

However, frequency analysis and hill climbing will only be successful if there is enough material to analyze, i.e., if the ciphertext is long enough. On the other hand, it doesn’t make much sense to use a bigram substitution for a ciphertext of, say, 400 letters, as dealing with a substitution table containing 676 entries is more complicated than handling a 400-letter one-time pad key.

For this reason, the bigram substitution seems to be especially interesting for messages that contain between, say, 1000 and 5000 letters. The main question is whether the bigram substitution is secure enough for messages of such a length. Not much has been published about this question in the literature. So, two years ago, I decided to take a first step towards finding the answer. For this purpose, I took two messages (one with 2500 and one with 5000 letters) and encrypted them. Subsequently, I published the two ciphertexts as challenges on my blog.

As usual, my readers solved both challenges within a few days. However, this time things proved a little more difficult than in most other cases. Blog reader Norbert Biermann found the solution of the 5000-letter version, still with a few mistakes, using hill climbing. Thomas Ernst published a few interesting word pattern considerations. Then Norbert provided a second, more sophisticated hill climbing result, which was almost error-free. Finally, Armin Krauß published the correct solution.

After the solution of the 5000 letter challenge had proven quite difficult, I expected that the 2500 letter ciphertext would not be solved so soon. However, I was wrong. Only a few days later, Norbert Biermann published the correct solution of the 2500 letter challenge, which he again had found with his hill climber. To my knowledge, this success still represents the world record in breaking bigram substitutions.


The bigram 1346 challenge

Two years after Norbert’s record, it is about time to start a new bigram challenge. This time, I took an English text consisting of 1346 letters as plaintext. Unlike last time, I replaced the bigrams not with numbers but with other bigrams. For this reason, the ciphertext consists of letters, which have to be read pairwise. Here it is:


Can a reader break this challenge? If so, he or she will set a new record.

Further reading: Can you solve this Cold War encryption challenge?


Comments (8)

  1. #1 Norbert
    14 July 2019

    @Klaus: Are you sure about 1374 letters? The ciphertext seems to consist of only 1346 letters.

  2. #2 Thomas
    14 July 2019

    I can see only 61 letters.

  3. #3 Gerd
    14 July 2019

    Thomas, the text is in one line, and only 61 letters are visible. Try to select it with the mouse and do a copy & paste; that will give you the whole ciphertext.

  4. #4 Thomas
    14 July 2019


    Thanks, now the ciphertext is visible on my tablet. I wonder why Klaus has put the text in one line.

  5. #5 Klaus Schmeh
    15 July 2019

    Sorry for the confusion. The ciphertext consists of only 1346 letters. I changed this and added line breaks to the ciphertext.

  6. #6 Norbert
    10 August 2019

    The Catharina was a British passenger ship that sank in the southern Atlantic Ocean in nineteen fifteen after colliding with fishing boat. Of the over two thousand passengers and crew aboard, more than one thousand five hundred died. The Catharina carried some of the wealthiest people in the world, as well as emigrants from Europe who were seeking a new life in America. The first-class accommodation was designed to be the pinnacle of comfort and luxury, with an on-board fitness center, swimming pool, libraries, high-class restaurants and opulent rooms. Although Catharina had safety features such as watertight compartments and remotely activated watertight doors, it only carried enough lifeboats for a thousand people – about half the number on board. On fifteen September the Catharina hit an fishing boat. Just under two hours after Catharina sank, the freight liner Tun??? [Tundra?] arrived and brought aboard an estimated thousand survivors. The disaster was met with world-wide shock. Public inquiries in France and Canada led to major improvements in maritime safety. Additionally, several new wireless regulations were passed around the world in an effort to learn from the many missteps in wireless communications. The wreck of Catharina was discovered in in nineteen ninety-five. The ship was split in two and is gradually disintegrating at a depth of almost four kilometers. Thousands of artefacts have been recovered and displayed at museums around the world. Catharina has become one of the most famous ships in history. Catharina is the second largest ocean liner wreck in the world, only beaten by her sister RMS Bellinda.

  7. #7 Thomas
    11 August 2019


    Great job! Congratulations!

  8. #8 Marc
    12 August 2019