The year 2018 has its first alleged Voynich Manuscript solution. This time, two researchers say that Hebrew is the language the enigmatic book was written in. What’s behind this new hypothesis?

To be honest, I don’t know how many solutions of the Voynich Manuscript have been published over the last decades. There must be at least 50, maybe even more.


A new solution?

According to reports by Fox News, The Daily Mail and others, yet another Voynich Manuscript solution (or at least a solution approach) has been put forward recently (thanks to blog reader George Keller for the hint). Here are the most important facts about it:

  • Who? The new alleged solution stems from Professor Greg Kondrak and graduate student Bradley Hauer from the University of Alberta, Canada. Both work in computer science with a focus on NLP (no, this is not Neuro-linguistic Programming, but Natural Language Processing). This background gives me hope that their work is not complete crap.
  • What? The two researchers say that the manuscript was written in Hebrew. I don’t know if this is a new hypothesis. Others have claimed that the language underlying this mysterious text is Latin, Greek, English, German, Italian, Armenian or Arabic – just to name a few.
  • Where was it published? As mentioned above, there are a number of press reports about Kondrak’s and Hauer’s solution. Luckily, there’s also a scientific publication. The two presented their research at the Association for Computational Linguistics Conference 2017. Their paper “Decoding Anagrammed Texts Written in an Unknown Language and Script” appeared in Transactions of the Association for Computational Linguistics (Volume 4, Issue 1).


What Kondrak and Hauer really did

To be fair, Kondrak and Hauer don’t claim to have solved the Voynich Manuscript (the Fox News headline “15th-century manuscript with ‘alien’ characters finally decoded” is therefore nonsense). What they did is well described in the abstract of their paper:

Algorithmic decipherment is a prime example of a truly unsupervised problem. The first step in the decipherment process is the identification of the encrypted language. We propose three methods for determining the source language of a document enciphered with a monoalphabetic substitution cipher. The best method achieves 97% accuracy on 380 languages. We then present an approach to decoding anagrammed substitution ciphers, in which the letters within words have been arbitrarily transposed. It obtains the average decryption word accuracy of 93% on a set of 50 ciphertexts in 5 languages. Finally, we report the results on the Voynich manuscript, an unsolved fifteenth century cipher, which suggest Hebrew as the language of the document.
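To make the two cipher types from the abstract concrete, here is a minimal Python sketch (my own illustration, not code from the paper). It enciphers a text with a random monoalphabetic substitution and then scrambles the letters within each word. The function names, the seed and the sample sentence are assumptions made for this example, and sorting the letters alphabetically is just one convenient stand-in for an "arbitrary" transposition.

```python
import random
import string


def make_key(seed=0):
    """Return a random monoalphabetic key: a permutation of a-z as a dict."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_lowercase, shuffled))


def substitute(text, key):
    """Replace every letter a-z by its image under the key; keep everything else."""
    return "".join(key.get(ch, ch) for ch in text.lower())


def anagram_words(text):
    """Transpose the letters within each word (here: sort them alphabetically)."""
    return " ".join("".join(sorted(word)) for word in text.split(" "))


# Hypothetical sample sentence, chosen only for illustration.
plaintext = "the priest and the people"
key = make_key(seed=42)
ciphertext = anagram_words(substitute(plaintext, key))
print(ciphertext)
```

Note that both occurrences of "the" still produce the same ciphertext word. Patterns like this survive the encryption and are the kind of thing a solving program can exploit.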

Algorithmic decipherment (i.e., letting a computer break an encrypted text without a human interfering) is indeed a very interesting topic. A number of articles about it have been published in the scientific journal Cryptologia (where it is referred to as “automated cryptanalysis”). As described on this blog before, hill climbing has been used for this purpose with great success.
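The basic idea of hill climbing on a monoalphabetic substitution cipher can be sketched in a few lines of Python. This is a toy version under stated assumptions, not the method used by Kondrak and Hauer or in any particular Cryptologia article: the tiny REFERENCE text, the bigram-based score function, the step count and all names are hypothetical choices for illustration.

```python
import math
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

# Toy reference text (an assumption); a real solver would use a large corpus.
REFERENCE = (
    "the quick brown fox jumps over the lazy dog and then the dog "
    "chases the fox through the garden while the people of the house "
    "watch them from the window and laugh at the sight"
)


def bigram_logprobs(text):
    """Log-probabilities of character bigrams plus a floor for unseen ones."""
    counts = Counter(zip(text, text[1:]))
    total = sum(counts.values())
    logp = {bg: math.log(n / total) for bg, n in counts.items()}
    return logp, math.log(0.01 / total)


LOGP, FLOOR = bigram_logprobs(REFERENCE)


def score(text):
    """Higher means the text looks more like the reference language."""
    return sum(LOGP.get(bg, FLOOR) for bg in zip(text, text[1:]))


def decrypt(ciphertext, key):
    """key[i] is the plaintext letter assigned to cipher letter ALPHABET[i]."""
    return ciphertext.translate(str.maketrans(ALPHABET, "".join(key)))


def hill_climb(ciphertext, steps=2000, seed=0):
    """Start from a random key; keep a letter swap only if the score improves."""
    rng = random.Random(seed)
    key = list(ALPHABET)
    rng.shuffle(key)
    best = score(decrypt(ciphertext, key))
    for _ in range(steps):
        i, j = rng.sample(range(26), 2)
        key[i], key[j] = key[j], key[i]
        candidate = score(decrypt(ciphertext, key))
        if candidate > best:
            best = candidate
        else:
            key[i], key[j] = key[j], key[i]  # undo the swap
    return key, best


# Encrypt a hypothetical message with a random secret key, then attack it.
secret = list(ALPHABET)
random.Random(1).shuffle(secret)
ciphertext = "the people watch the dog".translate(
    str.maketrans(ALPHABET, "".join(secret))
)
key, best = hill_climb(ciphertext)
print(decrypt(ciphertext, key), best)
```

With such a short message and such a small reference text, the result will usually not be the correct plaintext. The sketch only shows the mechanism: the score never gets worse from one step to the next. Serious implementations add random restarts and much better language statistics.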

Before Kondrak and Hauer published the paper mentioned above, they co-authored a scientific article about the algorithmic decipherment of monoalphabetic substitution ciphers (MASCs). I haven’t read it yet, but it looks quite interesting. As the abstract above shows, their current paper extends their algorithmic decipherment techniques with new methods for determining the cleartext language.
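One simple way to guess the cleartext language of a MASC is to compare sorted letter-frequency profiles: a substitution changes which letter has which frequency, but not the frequencies themselves. The following Python sketch illustrates this idea. It is a toy under stated assumptions and should not be read as the authors' exact method; the two sample sentences, the distance measure and all function names are hypothetical.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def sorted_profile(text, size=26):
    """Relative letter frequencies, sorted from most to least frequent."""
    letters = [ch for ch in text.lower() if ch in ALPHABET]
    if not letters:
        return [0.0] * size
    counts = sorted(Counter(letters).values(), reverse=True)
    freqs = [n / len(letters) for n in counts][:size]
    return freqs + [0.0] * (size - len(freqs))


def distance(p, q):
    """Sum of absolute differences between two profiles (smaller = more similar)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def identify(ciphertext, samples):
    """Return the language whose sample profile is closest to the ciphertext's."""
    target = sorted_profile(ciphertext)
    return min(
        samples,
        key=lambda lang: distance(target, sorted_profile(samples[lang])),
    )


# Tiny toy samples (assumptions); real profiles need far more text.
SAMPLES = {
    "english": "the manuscript was written by an unknown author in an unknown script",
    "german": "die handschrift wurde von einem unbekannten autor in einer unbekannten schrift verfasst",
}

# Encipher the English sample with a simple shift-by-three substitution.
shift3 = str.maketrans(ALPHABET, ALPHABET[3:] + ALPHABET[:3])
ciphertext = SAMPLES["english"].translate(shift3)
print(identify(ciphertext, SAMPLES))  # prints "english"
```

Here the ciphertext is the English sample itself, so its profile matches exactly. A real ciphertext would only match approximately, which is why large samples and a careful choice of distance measure matter.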

In the last chapter of their paper, Kondrak and Hauer apply their solution method to the Voynich Manuscript. This experiment can only be successful if the Voynich Manuscript was encrypted with a MASC – which is far from clear. At least, Kondrak’s and Hauer’s method delivers a result: Hebrew is the language that fits best. The first sentence of the manuscript might be:

She made recommendations to the priest, man of the house and me and people.


Serious research, but not a solution

Subsection 5.4 of Kondrak’s and Hauer’s paper is titled “Decipherment Experiments”. This headline describes exactly what is going on here. Two computational linguists ask themselves what happens if the text of the Voynich Manuscript is treated as a MASC encryption in an unknown language and fed to a MASC-solving program. One of the conclusions given in the paper reads as follows: “[Our work] can only be a starting point for scholars that are well-versed in the given language and historical period.” In other words: don’t trust this “solution”; it’s only experimental.


All in all, it should be clear: Kondrak’s and Hauer’s work should not be confused with the dozens of useless Voynich Manuscript solutions that have been proposed in the past. Instead, it is a piece of serious research on algorithmic decipherment, enhanced with a nice experiment, which should not be misunderstood as the definitive way to decipher the manuscript.

I hope we will see Greg Kondrak and Bradley Hauer at crypto history conferences in the near future.

Further reading: A test for checking whether a Voynich Manuscript solution is correct

Comments (10)

  1. #1 David Wilson
    28 January 2018

    Hebrew is written right to left. Isn’t Voynich written left to right?

  2. #2 Omnivor
    At the 'North Pole' of NRW
    28 January 2018

    Mirror writing, as second step of encryption?

  3. #3 Klaus Schmeh
    28 January 2018

    Bart Wenmeckers via Facebook:
    Good to see some positive research result rather than bold solved claims. I wish the two authors well.
    The voynich and to a lesser extent zodiac are tarred with bogus solve claims.

  4. #4 Klaus Schmeh
    28 January 2018

    Bart Wenmeckers via Facebook:
    John Reade posted an interesting article on linguistic programming

  5. #5 Jürgen Hermes
    28 January 2018

    Wrote some thoughts about the approach into my blog (in German, sorry): https://texperimentales.hypotheses.org/2396

  6. #6 Nikolai
    29 January 2018

    Good day!
    There is a key to the cipher of the Voynich manuscript. The manuscript was not written in Hebrew.
    The key to the cipher is placed in the manuscript itself. It is spread throughout the text. Part of the key hints is placed on sheet 14. With its help I was able to translate a few dozen words that are completely relevant to the themes of the sections.
    The Voynich manuscript is not written with letters. It is written in signs. The signs replace the letters of the alphabet of one of the ancient languages. Moreover, there are 2 levels of encryption in the text. I figured out the key, with which the following words could be read in the first section: hemp, wearing hemp; food, food (sheet 20 in the numbering on the Internet); to clean (gut), knowledge, perhaps the desire, to drink, sweet beverage (nectar), maturation (maturity), to consider, to believe (sheet 107); to drink; six; flourishing; increasing; intense; peas; sweet drink, nectar, etc. These are just the short words of 2-3 signs. Translating words with more than 2-3 signs requires knowledge of this ancient language. The fact is that some symbols represent two letters. In the end, a word consisting of three characters can fit up to six letters. Three letters are superfluous. In the end, you need six characters to define the semantic word of three letters. Of course, without knowledge of this language, this is very difficult even with a dictionary.
    If you are interested, I am ready to send more detailed information, including scans of pages showing the translated words.
    And most important. In the manuscript there is information about “the Holy Grail”.

  7. #7 Charlotte Auer
    29 January 2018

    From the point of view of computer science and computer-based cryptology, the approach of Kondrak/Hauer may well be very interesting (which I can’t really judge), but it is certainly not a sensible way to decipher the VMs.

    Here, once again, the fundamental difference between cryptology and cryptography becomes apparent. An essential prerequisite for an NLP analysis would be an indisputable transcription of the VMs, but no such transcription exists. All previous interpretations of the script, or Voynich alphabets such as EVA, have serious shortcomings because they ignore essential palaeographic aspects (e.g. ligatures, abbreviations etc.) and were in part transcribed with truly hair-raising arbitrariness. This alone is enough to make decoding by computer practically impossible, and I am not even listing a whole series of further reasons here.

    Now, one cannot and need not demand of computer scientists that they first spend years studying medieval manuscripts before daring such an experiment, but without a scholarly accepted transcription of the VMs it simply doesn’t work either. It may advance computer science, but it certainly won’t advance Voynich research. That the language underlying the codex cannot possibly be Hebrew (although individual Hebrew characters were popular in secret scripts at the time) could be justified in detail, but that would go beyond the scope of this comment.

  8. #8 Klaus Schmeh
    29 January 2018

    Gert Brantner via Facebook:
    the transcription, the transcription.. (ad nauseam)

  9. #9 Jürgen Hermes
    30 January 2018

    (me, on Twitter):
    If you have programmed an algorithm that is capable of classifying fruits, what would happen if you let it classify a football? Maybe it would say it’s a watermelon. Would you publish this result? And if a native speaker tells you that a sentence doesn’t make sense, will you ask Google Translate, because it might know better?

  10. #10 walim
    1 February 2018

    A great start in 2018. Last year we had only two rumors in the media about claimed Voynich solutions at all, if I remember correctly. And now already the first in January!