# Can you break this WW2 spy cipher?

A recent book mentions a cipher used by a German spy during the Second World War. Can you break a message I encrypted with this method?

George Lasry, …

… a world-class codebreaker from Israel and reader of this blog, has made me aware of a book published in 2018: Codebreaker: The untold story of Richard Hayes by Marc McMenamin. This book is about Richard Hayes, a gifted polymath and cryptographer from Ireland, who was drafted by Irish intelligence services to track the movements of German spy Hermann Görtz.

### The cipher

The book contains a scan of the original cipher description Görtz used. The cipher is explained with the following plaintext:

THE SIXTH AMERICAN TANK DIVISION HAS UNBARKED X

The cipher requires a key word or key phrase. The description uses a phrase: THE NORWEGIAN COAST

In the first step, the key phrase is used to construct a 5×5 matrix (write the key phrase line-wise; if the same letter appears twice, omit the second one; fill the rest of the matrix with the remaining letters of the alphabet; the letter J is ommitted because it is identified with I).

```THENO
RWGIA
CSBDF
KLMPQ
UVXYZ```

Based on this matrix we construct a substitution table. It looks like this:

11=T, 12=H, 13=E, …

21=R, 22=W, …

With this substitution we encode the plaintext:

11 12 13    32 24 53 11 12    25 43 13 21 24 31 25 14    11 25 14 41    34 24 52 24 32 24 15 14    12 25 32    13 43 33 25 21 41 13 14    53

Next, we perform a columnar transposition. For this purpose we use the same password again (it would certainly be better to apply a new one now, but apparently the German cryptologists thought it was too complicated for a spy to remember two words or phrases). To prepare the transposition, we write the encoded message (11 12 13 32 …) row-wise below the passphrase:

```THENORWEGIANCOAST
11121332245311122
54313212431251411
25144134245224322
41514122532134333
252141131453```

Now we sort the table such that the top row is ordered alphabetically:

```AACEEGHINNOORSTTW
51112214231132123
14532443123121511
53214254424412223
23352513114413432
5  23154134 1 2 1```

In the last step we read out the ciphertext column-wise:

51525 1433 1523 13152 22423 24151 14515 43434 21411 32213 13444 1144 32111 2123 15242 2123 31321

Usually, the ciphertext is written in groups of five:

51525 14331 52313 15222 42324 15114 51543 43421 41132 21313 44411 44321 11212 31524 22123 31321

### The challenge

There are about a dozen German spy ciphers from World War 2 I am aware of (at the HistoCrypt 2017 I gave a presentation about German WW2 spy ciphers). This one is certainly one of the best (remember that a spy cipher had to be simple and that using a cipher tool, let alone an encryption machine, was prohibitive). Anyway, it is clear that this method is not unbreakable. To test the security of Görtz’s cipher, I have created a challenge. The following message is encrypted in exactly the way described above, but with a different key word or phrase:

11331 42451 31211 33233 11144 42433 55112 13455 24342 43342 32341 51242 12425 32511 42353 11554 53213 23512 43233 31521

Can you decipher this message?

Further reading: The Top 50 unsolved encrypted messages: 45. The World Record Challenge

## Kommentare (30)

1. #1 George Lasry
17. Januar 2019

English or German?

2. #2 George Lasry
17. Januar 2019

Can you also say something about the length of the transposition key?

3. #3 Thomas
17. Januar 2019

The only difference to the ADFGX cipher used in WWI is the use of an identical key word/phrase for both substitution and transposition. Thus the same algorithms that George applied in breaking ADFGVX ciphers should work.

4. #4 Klaus Schmeh
17. Januar 2019

@George: It’s English.

>Can you also say something
>about the length of the transposition key?
I took an expression with over 20 letters. A dictionary attack probably won’t work.

5. #5 Klaus Schmeh
17. Januar 2019

@Thomas: You’re right, this is pretty close to an ADFGX cipher.

6. #6 Magnus Ekhall
Borensberg
17. Januar 2019

When decoding such a message (given that you have the key), how would you do with the last line if it is shorter than the “matrix” rows?

Would such a message have been padded with nulls perhaps?

7. #7 Klaus Schmeh
17. Januar 2019

@Magnus:
>Would such a message have been padded with nulls perhaps?
No, this would make the cipher less secure.
Having an incomplete last line makes the decryption process more complex, but it still works. Before you decrypt you need to check which columns are shorter than the others.

8. #8 Thomas
18. Januar 2019

The first task is to find out the transposition key which is very tedious because the transposition effects a fractionation of the number pairs. I recommend George Lasry’s book “A Methodology for the Cryptanalysis of Classical Ciphers with Search Metaheuristics” (pdf: http://www.upress.uni-kassel.de/katalog/abstract.php?978-3-7376-0458-1, ) esp. the section on the ADFGVX cipher (pp 88-94). Since the methods devised by Painvin, Childs and Bauer deal with special cases, I cannot think of a pencil and paper attack. The best way might be a hill climing approach with a score function based on computing the indices of coincidence of pairs in adjacent columns as described by George in detail.

9. #9 George Lasry
18. Januar 2019

I am afraid the ciphertext might be way too short for an attack similar to the one on ADFGVX. It might be possible to take advantage of the shared key, but this is not clear. A dictionary attack using expressions extracted from a database of expressions might also work, but I suspect Klaus created his own unique expression 🙂

10. #10 Joe
Berlin
18. Januar 2019
11. #11 George Lasry
Givataim
18. Januar 2019

Klaus: Can we assume the keyword (or “key expression”) is no longer than 25 (otherwise, it is not clear how to fill the substitution key based on it).

In other words, the keyword length is > 20, and <= 25?

12. #12 George Lasry
Givataim
18. Januar 2019

One more question: is there a J in the key expression, and if yes, how should it handled when using the expression to create the transposition and substitution keys, e.g. just replaced by an I?

13. #13 Klaus Schmeh
18. Januar 2019

There is no J in the key. If there is one, it is treated like an I.

14. #14 George Lasry
Givataim
18. Januar 2019

Hill climbing does not work. This would require probably at least 800-1000 letters for a transposition key with 25 elements.

15. #15 George Lasry
19. Januar 2019

YESTERDAYSEVENBRITISHVESSELSLEFTDUBLINFORNEWCASTLE
The key:
WHATWEDIDONOURHOLIDAYS

Solved by scanning all English Wikipedia entries (from a 50GB dump) and testing all consecutive sets of words, which form an expression from 20 to 35 letters. Was found overnight, at the time it had covered 2% of the dump.

The expression was found here: https://en.wikipedia.org/wiki/1969_in_music

Seems to be the name of a song by a band called Fairport Convention.
@klaus – what this your inspiration?

All other attempts using hillclimbing, or methods similar to those I used for the ‘sibling’ of this cipher, ADFGVX, were ineffective. They could work only for much longer simulated samples.

From reading the relevant parts of the book Klaus mentioned, it is still not clear to me how it was historically solved.

16. #16 Thomas
19. Januar 2019

George:

Congratulations! Great job!

17. #17 Thomas
19. Januar 2019

According to the incomplete Google books excerpt, Hayes had found Görtz’s papers containing the key words during an interrogation. Maybe Klaus has the complete printed version at hand?

18. #18 George Lasry
19. Januar 2019

I bought the Amazon Kindle version, which, unfortunately, is not transferable (13\$ on the US site)

There are some conflicting fragments of stories in the text, some mentioning papers written by Görtz’s with encryption sheets, some mentioning a book, some hinting that the keywords were found, some others hinting that they were cryptanalytically recovered using frequency analysis. There is only one photo at the end of the book, written by Hayes, in which the solution of the substitution part is shown, using frequency analysis on a 200-something letters (twice in terms of digits).

But it is not really possible to understand what was the exact process without access to the primary sources. A good reason to visit Dublin 🙂

19. #19 George Lasry
19. Januar 2019

@Thomas – thanks!

20. #20 Klaus Schmeh
19. Januar 2019

@George: Congratulations! I’m once again impressed!
Yes, the key WHAT WE DID ON OUR HOLIDAYS is the title of an album by Fairport Convention, one of my favourite rock bands.
https://en.wikipedia.org/wiki/What_We_Did_on_Our_Holidays
George’s work shows that Görtz’s cipher was quite secure. The dictionary attack George used requires a computer and therefore was not available back in WW2. Even with a computer at hand the cipher is probably difficult to break if the passphrase is long enough and not listed in a source like Wikipedia.

21. #21 George Lasry
19. Januar 2019

Thanks Klaus for creating such high-quality and fun challenges.

Regarding the security of the cipher, it is comparable to ADFGX, and probably easier, as the keyword is shared between substitution and transposition. With enough material, historical attacks like Painvin’s (similar beginnings or endings), or statistical attacks (Friedman, Rives) also work, but for a single message, they don’t (and neither does modern hill climbing).

What’s still puzzles me is that in some places in the book, it could be understood that Hayes was able to recover a keyword from a single message, which sounds doubtful. If this is really the case, that would be a really impressive and unique achievement, surpassing Painvin’s work.

22. #22 Thomas
20. Januar 2019

@George #18

As to the sheet “Hayes’ codebreaking notes…” shown in the book’s appendix: What I don’t understand is why he added the frequency sums of the columns and the rows (1. c. + 1.r., 2. c + 2. r. and so on, results: 85, 75, 102, 119, 55) and calculated the average frequencies of the digit pairs and their corresponding reversals ( column ” Arranged in pairs”). I don’t see why these values are necessary for solving the monoalphabetic substitution part.

23. #23 Max Baertl
20. Januar 2019

Under the link https://www.rte.ie/radio1/doconone/2017/1003/909437-richard-hayes-nazi-codebreaker/ are several picture connected with Richard Hayes, one of the pictures seems like it is the third page of Hermann Goertz’s cipher description and indicates that the numbers must be converted back to letters after the Transposition step before sending them. So the encryption process doesn‘t end after the transposition step.

24. #24 George Lasry
21. Januar 2019

@thomas – not clear either to me, although Max’s comment could be relevant here. Note that Friedman and his team also used to fill squares of frequencies with column and row summaries.

@max – very interesting! that would add a new stage to the cipher and make it even more difficult (or easy?). I have contacted the author and asked whether he has more material. Will be very interesting to learn more, on the cipher and on Hayes’s codebeaking.

25. #25 Thomas
21. Januar 2019

The cipher mentioned by Max Baertl (converting the fractionated digit pairs to letters with the Polybius square) was already described as “Fractionation combined with columnar transposition” by Friedman in the 1931 edition of “Advanced Military Cryptography” (ch. 58, pp. 72, 73, https://archive.org/details/41748809078800/page/n73). The only difference was Friedman used a 6×6 square.

@George
I suppose you refer to Fig. 64 on p. 153 in Friedman’s “Military Cryptanalysis”, part IV, concerning the ADFGVX- cipher. There the column and row sums help distinguish between initial and final components of the number pairs, i.e. in an very early stage of the codebreaking process. If Hayes’s sheet had only to do with breaking the substitution part after stripping off the column transposition, this (and the calculation of the arithmetical means between frequencies of digit pairs and their reversals) would be superfluous. Maybe Hayes had discovered a method to distinguish between first and second digits without heaps of messages at hand?

The webpage linked by Max shows another sheet with Roman numerals, up till now I don’t see whether this has anything to do with Görtz’s cipher.

26. #26 Max Baertl
21. Januar 2019

@Thomas
The sheet I mean has the headline: „The XGX cipher (decoding)“ and is the 4th pic in the second line. It contains the following cipher text:

UVUM CVCC VWLS LXNU YDDR
KOWE EPAN MRTR GOIW HBER

27. #27 Max Baertl
21. Januar 2019

as the 1st picture of the cipher description posted by Klaus has the headline „The XGX cipher (coding)“ it seems that there is a connection between the 2 sheets of the cipher description posted by Klaus and the sheet I found under the Link.

28. #28 Thomas
21. Januar 2019

@Max
I’ve seen the sheet “XGX cipher”, as you said, it shows that after the transposition the digit pairs are converted to letters. What I was refering to is the other image in your link showing a sheet with Roman numerals. Obviously Hayes’s checked out certain assumptions. Whether this sheet belonged to the solution of Görtz’s cipher as the sheet “Hayes’s codebreaking notes…” in the book, isn’t yet clear.

29. #29 Thomas
22. Januar 2019

I’m pretty sure that the other sheet (the first image in the lower row in Max’s link) belongs to the codebreaking sheet in the book: The latter shows the letter q in cell 45 of the Polybius square, this is the case shown on the right half of the sheet in the link (“q=45, v in the key”). Since v is part of the key in this case, the 5th row consists of u,w,x,y,z, as shown in the book’s sheet. The passage “of the four HIKM two are 4, probably KM” shows his assumption that row 4 of the square contained KM and Q. Hayes’s examination (sheet in Max’s link) was apparently based on the assumption that the transposition rectangle contained 14 columns (Arabic numerals). In the parts with the Roman numerals he assigned certain letters of the lower part of the Polybius square to certain columns to examine several cases. Thus he might have tried to sort the columns in order to solve the transposition. I wonder whether the Roman numerals i to vii were parts of a decision tree distinguishing between several cases or were assigned to seven cryptograms.
The reversal thing shown in the book’s sheet could have served for separating side from top coordinates in the bipartite cipher similar to Painvin’s method solving ADFGX (see Kahn, Codebreakers, p. 343).

30. #30 Thomas
22. Januar 2019

The “Reminiscences of Dr Hayes” as part of the “Ms 22,984/6″belonging to the “Richard G. Hayes Papers” (National Irish Library): https://m.imgur.com/a/GMBu2
So he had 14 or 15 messages at his disposal what might have enabled him to carry out extensive frequency calculations to align the rectangle columns according to monoalphabeticity. Unfortunately, the excerpt doesn’t show any of the calculation sheets that are most probably contained in the Hayes’s papers (http://catalogue.nli.ie/Collection/vtls000183471).