Magnus Ekhall from Sweden has solved my Playfair challenge from September 2019. With only 28 letters, this ciphertext is the shortest of its kind ever broken.
The Playfair cipher, which was invented by Charles Wheatstone in the 19th century, can be broken with hill climbing (enhanced with simulated annealing), provided that the ciphertext is long enough. Last year, George Lasry, …
… solved a Playfair cryptogram consisting of only 50 letters I had introduced on this blog. As far as I know, this success set a new world record in breaking a Playfair with a random matrix (US cryptanalyst Alf Monge broke a 30-letter Playfair in the 1930s, but this one was based on a matrix created with a keyword, which made things easier).
In order to check if it is possible to do better, I published a Playfair challenge based on a ciphertext with only 40 letters. This time, Nils Kopal solved it and improved the world record.
After Nils’ success, I decided to take a 30-letters plaintext for my next Playfair challenge. Here is the resulting ciphertext (it has 32 letters, because two additinal letters were added to avoid doublings and/or a single letter at the end):
This time it was Magnus Ekhall, who broke the challenge. Magnus is a Swedish computer engineer, working in software development. At the moment, his main interest is with the Turing bombes and the US Navy bombes.
The plaintext Magnus found was the following:
The first X was included to avoid a doubled T, the second one to make the number of letters even. The original plaintext is: TAKE THE LAST TRAIN TO YORK ON SUNDAY
With breaking this challenge, Magnus had set a new world record.
How the Playfair works
The Playfair cipher substitutes letter pairs. So, the cleartext needs to be written as a sequence of letter pairs (the following cleartext is a Shakespeare quote taken from Robert Thouless’ life-after-death experiment):
BA LM OF HU RT MI ND SG RE AT NA TU RE SS EC ON DC OU RS EC HI EF NO UR IS HE RI NL IF ES FE AS T
The Playfair cipher requires that no letter pair consist of two equal letters. Therefore, we add an X between the two Ss:
BA LM OF HU RT MI ND SG RE AT NA TU RE SX SE CO ND CO UR SE CH IE FN OU RI SH ER IN LI FE SF EA ST
If the number of letters in the cleartext is odd, another X needs to be added at the last position, but this is not the case here. Next, we set up a 5×5 matrix containing the letters of the alphabet in a random order (we identify the J with the I, in order to get a 25 letter alphabet):
S U R P I E A B C D F G H K L M N O Q T V W X Y Z
As you might have noticed, the letter order in the matrix has been derived with the keyword SURPRISE. It would be more secure to use a completely random order of the letters.
Now, we replace the cleartext letter pairs (BA, LM, OF, HU, …) according to the three Playfair rules. Here are the rules in a diagram:
Here are the same rules in text form (I refer to the letter pair to be replaced as XY):
- If X and Y are not in the same column and not in the same row (this is the most frequent case), form a rectangle and replace the two letters by the other two corner letters (the upper cleartext letter is replaced by the other upper letter in the rectangle, the lower cleartext letter by the lower one). For instance, LM becomes FT.
- If the two letters stand in the same row, each one is replaced by its right neighbor. Here, BA becomes CB.
- If the two letters stand in the same column, each one is replaced by its lower neighbor. In our example, AN becomes GW.
When we apply the Playfair rules on our cleartext with our 5×5 matrix, we get the following ciphertext:
CB FT MH GR IO TS TA UF SB DN WG NI SB RV EF BQ TA BQ RP EF BK SD GM NR PS RF BS UT TD MF EM AB IM
A new Playfair challenge
After Magnus had successfully attacked my 32-letters cryptogram, I decided to take a 28-letters cipherttext for my next challenge. Here’s the ciphertext CrypTool 2 delivered:
ZX LS EW HC HU CE LQ OE PN YR IW YC VQ LS
On Monday this week, I received another mail from Magnus. He wrote:
I’ve been looking into the Playfair challenge with 28 letters for a while. It is quite difficult: there are so many results that almost make sense, but very few that form complete and correct sentences. In any case, I have found one result which is so good that I at least would like to check whether it is correct or not (the key is TNPXIZUVSYWEFGHBOMALKCDQR):
STAY WHERE YOU ARE UNTIL THURSDAY
Magnus’s solution was correct. The Playfair world record was broken once again.
Thankfully, Magnus provided me a report on how he had broken the Playfair challenge. Here it is:
Earlier this year I managed to find the solution to the previous Playfair challenge which had 32 letters of cryptotext. I used then a combination of simulated annealing, with restarts and also tried to avoid falling into the same local maximums by remembering parts of the solution space that I had visited earlier. The scoring function was based on 5-grams.
When I started to have a look at this challenge, I noticed that with four letters less to work with, just 28, it became considerably more difficult. I print all cleartexts that score over a certain limit. The limit is set based on analysis of applying the scoring function on a large amount of English clear text, but still low enough so that I would not drown in output.
I moved from using 5-grams to 6-grams even though it would use a bit more memory and make the algorithm a bit slower.
With this setup I expect that only part of the correct clear text gets printed in the output: the optimisation can sometimes be “too” good.
I let this algorithm run for about 24 hours which resulted in about 100000 lines of output to sift through. Many of the output lines contains nothing but proper English words. Most of those does not form any sentences that makes sense.
I tried different techniques to sift through this and help me find output lines that seemed to make more sense, grammatically. By processing the English part of Project Gutenberg I built up a bi-gram table of words – all sequences of two words in all sentences.
With the word-bigrams I made another scoring function based on a non-overlapping version of the Aho-Corasick algorithm. This scoring function was used on the already generated output, just to sort it so that the log lines which contained most two-word bi-grams would come out on top.
This way I found a promising solution, about 2000 lines from the top:
I guessed that this was close to the solution and that the correct word in the beginning should be STAY instead of AWAY.
as a crib, the program quickly confirmed my guess and gave the correct solution:
As mentioned above, with 28 letters it is difficult to differentiate between the correct solution and false solutions which also consists of English words. I also feel that my method of solving this challenge should perhaps more correctly be named “brute force with simulated annealing support” rather than anything else. I have read many, many lines of gibberish lately …
Thanks Magnus! I hope it wasn’t too much gibberish you had to read. I can only congratulate you on this marvellous success!
Again, I’m overwhelmed by my reader’s skills in breaking difficult cryptograms. I’m proud that my challenges stimulate so much valuable research. Thanks to Magnus and all other codebreakers among my readers.
Further reading: A mird in the hand is worth two in the mush: Solving ciphers with Hill Climbing