Today’s blog post is not about an unsolved cryptogram. Instead, I need the help from my readers to solve a mathematical problem. The solution might be helpful to break encryptions.
Breaking a simple cipher usually works best if one counts the letter frequencies or guesses words. Readers of this blog have used these techniques hundreds of times to solve encrypted postcards, diaries, letters and other texts.
However, there are cases, in which frequency analysis and word guessing alone don’t work. For example, the cigaret case cryptogram …
… looks like a simple cipher (with a simple cipher I mean monoalphabetic substitution, also known as MASC), but all efforts to break it have failed. The same is true for the seal cryptogram.
Of course, it is possible that these two cryptograms have withstood all breaking attempts because they are no MASCs at all. Nevertheless, I wouldn’t reject the MASC hypothesis before a few additional examinations have been made.
As the most common codebreaking methods don’t work for these two cryptograms, it makes sense to look out for less common ones. The book Cryptanalysis (1939) by Helen Fouché Gaines introduces several of these.
Two of these tools are based on so-called “contacts”. The contacts of a letter in a certain text are the letters that are located directly before or after it appears (without a space in between).
If we look at the following message …
RED AND GOLD ARE ROYAL COLOURS
… the letter A has the following contacts: N, R, Y, L. The contacts of D are E and L. E has the contacts R and D (duplicates are not counted).
The number of contacts (without counting duplicates) a letter has in a certain text is named the variety of contacts. In the example above A has a variety of contacts of 4. The veariety of contacts of E is 2.
In an ordinary English text (the same holds for all other common languages) some letters are known to have a low variety of contacts. For instance, the Q is especially often contacted by the U. Other letters tend to have a greater variety of contacts. Usually, vowels have a higher variety of contacts than consonants.
As Fouché Gaines writes, in an (English) text of about 100 letters, most vowels “step up”. This means that the variety of contacts is higher than the number of appearances (in our example message the A steps up, as it appears three times and has four contacts). Consonants usually step down in a text of this length.
All these facts about contacts can be very helpful for breaking difficult MASCs. I can therefore only recommend reading the respective chapters (IX and X) of Fouché Gaines’ book. It is available online.
What Fouché Gaines does not describe is a measure for the contact variety. Counting the contacts alone is not enough, as this number is dependent from the length of the text. A number like contacts per 100 letters is useless, as the number of contacts is not proportional to the text length.
The information that a letter steps up in a cryptogram of 100 letters is worthless for a longer text (in a very long textevery letter steps down, as the number of appearances grows linearly with the text length, while the variety of contacts never exceeds 26).
So, what we need is a formula for the contact variety of a letter in a certain text. The result of this formula must be (statistically) independent from the text length. It must return a high value, if the variety of contacts is high, otherwise it must return a low value.
Once we have such a formula, we can apply it in different ways:
- We can calculate the contact variety measure for all letters in a typical English text.
- We can calculate the contact variety measure of left and right contacts in a typical English text separately.
- We can calculate the contact variety measures for texts in other languages.
All these values can be compared with the variety measures of an encrypted text. This might help to break it.
How can a formula for the contact variety look like? Any suggestions are welcome.
Further reading: Bigram substitution: An old and simple encryption algorithm that is hard to break