# Bigram substitution: An old and simple encryption algorithm that is hard to break

Substituting letter pairs (bigrams) is an encryption method that was already known in the 16th century. Is it still secure today?

To develop an encryption method that only requires pen and paper is a difficult task. Many ciphers of this kind, e.g. the Caesar cipher, the  Vigenère cipher or the Playfair cipher have proven insecure. Among the best concepts known to me are the double columnar transposition and the One Time Pad. Book ciphers, encryption with a Rubik’s cube and encryption with a card deck (Solitaire) are interesting methods, too, as they only require unsuspicious everyday objects.

There’s another encryption method which only requires pen and paper but is still not easy to break: the bigram substitution. A bigram is a letter pair, e.g., CG, HE, JS or QW. The number of letter bigrams in the Latin alphabet is 26×26=676, ranging from AA to ZZ. A bigram substitution replaces each letter pair with another one (or with a symbol or with a number between 1 and 676). In order to use a bigram substitution, we need a substitution table with 676 entries.

### Porta’s bigram substitution

The oldest bigram substitution I am a ware of is described in the marvelous book De Furtivis Literarum Notis written by 16th century cryptologist Giambattista della Porta. Porta uses a 20 letter alphabet. He therefore needs a substitution table with 400 entries. Here it is:

As can be seen, Porta substitutes each letter pair with a symbol. He must have been quite inventive to come up with 400 different symbols. For instance, the bigram IA is replaced with a symbol that looks like an X. The bigram VO is substituted with something resembling an O. Here’s a ciphertext Porta mentions in his book (the solution is available here).

### Vigenère’s bigram substitution

Blaise de Vigenère invented a bigram substitution, too. Here is his table:

Vigenère replaces each bigram with a single letter or a letter followed by a dot, colon or semicolon. E.g., LM is substituted with “r.”.

### RSHA bigram substitution

The following bigram substitution, which is described in David Kahn’s book The Codebreakers, was used by the Nazi authority Reichssicherheitshauptamt (RSHA):

### How secure is a bigram substution?

It is clear that a bigram substitution can be broken with a bigram frequency analysis. Here are the most frequent English bigrams (according to Wikipedia):

```th 1.52       en 0.55       ng 0.18
he 1.28       ed 0.53       of 0.16
in 0.94       to 0.52       al 0.09
er 0.94       it 0.50       de 0.09
an 0.82       ou 0.50       se 0.08
re 0.68       ea 0.47       le 0.08
nd 0.63       hi 0.46       sa 0.06
at 0.59       is 0.46       si 0.05
on 0.57       or 0.43       ar 0.04
nt 0.56       ti 0.34       ve 0.04
ha 0.56       as 0.33       ra 0.04
es 0.56       te 0.27       ld 0.02
st 0.55       et 0.19       ur 0.02```

However, a frequency analysis will not be successful if the ciphertext is short enough. On the other hand, it doesn’t make much sense to use a bigram substitution for a ciphertext of, say, 400 letters because the One Time Pad is more efficient in such a case (dealing with a 400 letter One Time Pad key is certainly easier than using a substitution table with 676 entries).

For this reason, the bigram substitution is especially interesting for messages that contain between, say, 1500 and 5000 letters. The main question is if the bigram substitution is secure enough for messages of such a length or if a frequency analysis can break it. In order to answer this question, I have created two bigram challenges, one with a 2500 letter message and one with a 5000 letter message.

## Kommentare (20)

1. #1 George Lasry
Israel
14. Februar 2017

A hill climber I wrote can solve simulated texts with at least 8000 bigrams (16000 plaintext monograms). It can find some barely readable fragments at 5000 (bigrams). For this challenge (2500 bigrams), I don’t think hill climbing would work, at least not the naive version. Another approach is needed.

2. #2 Thomas Ernst
Latrobe
14. Februar 2017

The “675” brings back memories. On the assumption that “q” doesn’t occur in your two cipher texts – although it certainly could – “quote”, “question”, et alia – you appear to be the first cryptographer to put into action Trithemius’ “Steganographia” III: improvedly so, i. e. by bigrams. In S III, Trithemius goes up to 700 / 725, and does not use the bottom rung 100 – 1; a peculiar steganographic obsession of his to start up late with numbers; idem in his numerical ciphers of the “Polygraphia”. – Am assuming that you gave the most frequent bigrams “th” and “he” two extra lines of substitution, i. e. additional homophones. – One of the computer gurus will have figured it out soon. But thank you for bringing JT S III back to life. Where he belongs!

3. #3 Thomas
14. Februar 2017

A count of number pairs would be more reliable than bigram frequencies. A simulated annealing algorithm swapping numbers combined with a tetragraph fitness function might be a good approach.

4. #4 Norbert
14. Februar 2017

I wrote a simulated annealing algorithm for the second challenge which apparently cracked it overnight, but … there remains much work of correcting and complementing “by hand” … Unless I’m very much mistaken the plaintext starts like this:

every time you login to a computer the operating system checks whether you

5. #5 Thomas
14. Februar 2017

@Norbert
Great!

Did you swap parts of the bigram table/rows/columns/single numbers?

6. #6 Norbert
14. Februar 2017

Must go to work now … I tried to complete the plaintext in a text editor, but in my hurry I made a mess out of it. I will come back to it tonight (unless someone has posted the complete plaintext by then).

@Thomas: Single numbers. I didn’t rely on random swaps (as there are too many possibilities) but made sure that every possible swap is tried several times per round. That actually made the difference, I assume.

7. #7 Thomas Ernst
Latrobe
14. Februar 2017

The sequence “131 191 496 481 209 238 218” occurs four times, the sequence “131 191 496 481 209 238” twice. Of English words that exist both in variations of 14 and 12 characters, unfortunately, there are many: “substitutional”, “substitution”; “intertextually”, “intertextual”. At least we may assume that 131 begins a word, and 218 denotes a nominal, adverbial or adjectival suffix. Alas – not enough for an algorithm …

8. #8 Thomas Ernst
Latrobe
14. Februar 2017

A few more repetitions in the “2500”: “205 221” (6), “266 565 646 189” (3), “602 387 602 214” (4), “131 191 496 481 209 238 218” (4), “131 191 496 481 209 238” (2), “290 626 7 193 249 46 169” (2), “290 626 7 193 249 281 (1). Which – should have mentioned it above – does not imply that the first bigram[m] of theses sequences constitutes the beginning of a word, since it can be an overlap of the last and first letter of two words, like “an[d t]he”. Still, it might be time to just head into the situation room …

9. #9 Norbert
14. Februar 2017

My head hurts … I decided that I won’t reach perfection regarding challenge #2 anyway, so this is my state of knowledge still written as bigrams. May a native speaker (and IT expert) correct my mistakes, fill the gaps and turn it into readable text 🙂

ev er yt im ey ou lo gi nt oa co mp ut er th eo pe ra ti ng sy st em ch ec ks wh et he ry ou ar ea na ut ho ri ze du se rw he ny ou wi th dr aw ca sh fr om an at am ac hi ne th em ac hi ne ch ec ks wh et he ry ou ar ea le gi ti ma te ac co un to wn er if yo ue nt er af ac to ry th eg at ek ee pe rc he ck sw he th er yo ua re an ap pr ov ed vi si to ro ft he fa ci li ty in ea ch of th es ec as es th er ei sc he ck to de te rm in et he au th en ti ci ty of th ec la im an da ut ho ri za ti on of th ea ct io nh ow ev er it is te ch ni ca ll ys uf fi ci en tt oc he ck th ev al id it yo ft he ri gh tt ol og in wi th dr aw ca sh or ac ce ss fa ci li ty wi th ou tr ev ea li ng yo ur id en ti ty te ch ni ca ll ys ta te di 095 an yo ft od ay sa ut he nt ic at io np ro ce ss es an id en ti ty of ap er so no rm ac hi ne is va li da te da lt ho ug hi ti sa ls op os si bl ec he ck on ly ap ar ti cu la ra tt ri bu te of an id en ti ty wi th ou td is cl os in gt he fu ll id en ti ty da ta va lu et yp ic al at tr ib ut es th at ca nb ev al id at ed ar ea ge ad dr es sd at aa dr es sp er mi ss io ns an dp ro cu ra ti on at tr ib ut eb as ed au th en ti ca ti on ha ss ev er al ad va nt ag es on ly pa rt ia li de nt it yd at ai sd is cl os ed an dl im it ed to sp ec if ic at tr ib ut es da ta pr ot ec ti on an at tr ib ut em ay ha ve av al id it yp er io dt ha ti sd if fe re nt fr om th eo ne of th ei de nt it ya na tt ri bu te ca nb ea ss er te dt oo rc al cu la te df ro ma 323 ea 173 ex is ti ng id en ti ty da ta an at tr ib ut ec an be ch an ge dw it ho ut ch an gi ng th ei de nt it yi ns om ec as es at tr ib ut es ca nt 478 ev al id at ed wi th th eu se of at tr ib ut es ta te me nt sf or ex am pl et he pr oc es so fa ge ve ri fi ca ti on us ua ll yd oe sn ot re qu ir et he sp ec if ic bi rt hd at ei ts el 207 ut ra th er ac al cu la te dv al ue ab ou tt he at tr ib ut es ta te me nt su ch as pe rs on is ov er ei gh tt ee na tt ri bu te sa re we ll su it ed fo rs ma rt ca rd au th en ti ca ti on es pe ci al ly in th ew 456 ld of ei dd oc um en ts fo ri ns ta nc et he pr oo fo fp er mi ss io no re nt it le me nt fo rd ri vi ng or pu rs ui ng ap ro fe ss io nd oc to rl aw ye ro rs im il ar ca nb ea ut he nt ic at ed pr oo fo fr eg is tr at io nf or pr oo fo fi ns ur an ce pr of es si on al or ga ni za ti on so rs im il ar ar ea dd it io na la sp ec ts of at tr ib ut ec al cu la ti on at tr ib ut es ha ve be en we ll kn ow nf or de ca de sa sa pa rt of at tr ib ut ec er ti fi ca te sa na tt ri bu te ce rt if ic at ea ss ta nd ar di ze di nt he 125 iv eo ni ne fr am ew or 244 an be th ou gh to fa sa di gi ta lc er ti fi ca te wi th ou ta pu bl ic ke yt he si gn at ur ec om es fr om ac er ti fi ca ti on au th or it yc ah ow ev er th is ki nd of at tr ib ut eh an dl in gd oe sn tp la ya ro le in mo de rn ei dc ar ds ys te ms on er ea so nf or th is is th at it is to oc om pl ic at ed to in 521 330 ea ca in ev er yc er ti fi ca te ge ne ra ti on in ad di ti on at tr ib ut ec er ti fi ca te sh av et 478 es to re do nt he ca rd du ri ng pe rs on al iz at io nw hi ch pr ev en ts th ep os si bi li ty of ch an gi ng th em du ri ng li fe ti me of th ed oc um en tf ur th er if ap ar ty is al lo we dt or ea dt he at tr ib ut ec er ti fi ca te al ls to re da tt ri bu te sa re di sc lo se dd ur in gt hi sp ro ce ss an dc an be ut il iz ed ev en ou ts id et he ca rd fo rt he se re as on sa tt ri bu te st od ay ar et yp ic al ly pr ov id ed by th ec ar di ts el ft he ym ay ei th er be si gn ed by th ec ar do rp ro vi de dt oe xt er na ls ys te ms vi aa na ut he nt ic at ed ch an ne la na tt ri bu te th at is pr ot ec te di ns uc ha wa yc an be re fe rr ed to as ac re de nt ia lb ec au se it so ri gi na li ty an da ut he nt ic it yc an be pr ov en cr ed en ti al sc re at ed in su ch aw ay in cl 006 eo 375 yt he ne ce ss ar ya tt ri bu te so ft he ca rd ho ld er an dw hi ch ex pl ic it pe rm is si on sa re 291 an te da ni nn ov at iv ew ay to ha nd le at tr ib ut es an dt oa ss ig np er mi ss io ns to th em ha sb ee ni mp le me nt ed in th eg er ma ni dc ar db ym ea ns of ad ed ic at ed ri gh ts ma na ge me nt an de nc od ed in se 154 de sc ri bi ng ce rt if ic at es pr es en to nt he ca rd ea ch at tr ib ut ec an on ly be re ad ou tb ya se rv ic ep ro vi de rw it ht he ex pl ic it co ns en to ft he ho ld er an dw it hd ed ic at ed pe rm is si on gi ve nb ya fe de ra lc af or th is pe rm is si on th es er vi ce pr ov id er ha st oa pp ly fo ra nd th ep er mi ss io nf or th eb us in es sc as ei sh an de do ve ri nf or mo fa ca rd ve ri fi ab le ce rt if ic at et ha ti sp ro ve nb yt he id ca rd du ri ng au th en ti ca ti on of th es er vi ce pr ov id er it is cl ea rt ha tc re de nt ia ls si gn ed by th ec ar da re mo re se cu re th an th os eo 375 yp ro vi de dv ia an au th en ti ca te dc ha nn el ho 247 he wr or 045 nd ho wn ew at tr ib ut es ar ec re at ed du ri ng th ev al id it yp er io do ft he do cu me nt is ex pl ai ne db el ow ev en wh en an at tr ib ut ei sp ro vi de db ya sm ar tc ar di ti so fc ou rs ei nd ep en de nt fr om it it is th er ew or ep os si bl et ha to ne se rv ic ep ro vi de rp as se sa na tt ri bu te to an ot he rs er vi ce wh en th is at tr ib ut ei sd ig it al ly si gn ed th er ec ei ve rc an be su re th at th ec on te nt is co rr ec ti ft he re is no di gi ta ls ig na tu re th er ec ei vi ng se rv ic ep ro vi de rh as to tr us tt he so ur ce of th ea tt ri bu te an dt he co am un ic at io nc ha nn el ho we ve rh an dl in ga tt ri bu te si nd ep en de nt ly fr om ac ar di so ft en no td es ir ab le as it ma ke st he ca rd ho ld er ca nn ot di re ct ly co nt ro la pi ec eo fp er so na li nf or ma ti on wh ic hm ay be sh ar ed wi th ot he rs er vi ce st he re fo re mo de rn at tr ib ut em an ag em en ti so rg an iz ed in au se rc en tr ic wa yt hi si sb as ed on pr oc es se si nw hi ch at tr ib ut es or as se rt ed st at em en ts ar ep ro vi de db yt he ca rd it se 154 of co ur se th is ap pr oa ch is mo re co mp le xt ha nt he co nc ep to 369 re el 480 lo at in ga tt ri bu te sb ut it re nd er sm or ed at ap ro te ct io na nd ah ig he rd eg re eo fp ri va cy wh il ea na tt ri bu te ca nb ew ri tt en to as ma rt ca rd du ri ng th ep ro du ct io np ro ce ss th ew 005 616 ot en ti al of th is te ch no lo 166 is on ly av ai la bl ei fe 198 st in 464 ar ds ca nb eu 054 at ed at tr ib ut es af te 633 ei ng is su ed fo rs uc ha po st is su an ce pr ov is io ni ng as pe ci al pr oc es si sn ec es sa ry th is pr oc es sm us te ns ur et ha to 375 ya tr us te da tt ri bu te pr ov id er ma wr ri te an at tr ib ut eo nt oa ca rd ch ip an at tr ib ut ep ro vi de ri st yp ic al ly tr ig ge re db ya se rv ic ep ro vi de rf or in st an ce an eg ov er 095 en tp or ta lo ra no 375 in es ho pi no rd er to ac hi ev ea us er ce nt ri ca tt ri bu te ma na ge me nt it is us ua ll yd es ir ed to se pa ra te se rv ic ep ro vi de rs fr om at tr ib ut ep ro vi de rs an 463 or eo ve rt om 255 et he am ut ua ll yi sv is ib le in or de rt os up po rt su ch ap ro ce ss th ee nh an ce dr ol ea ut he nt ic at io np ro to co le ra wa sd es ig ne de ra is at 094 ee pa rt yp ro to co lt ha ti sa pp li ed be tw ee na ch ip ca rd un de rt he co nt ro lo fi ts ho ld er vi as ma rt ca rd mi dd le wa re as er vi ce pr ov id er an da na tt ri bu te pr ov id er ge ne ra ll yt hi sp ro to co lc om bi ne sa us er ce nt er ed ap pr oa ch wi th th es er vi ce pr ov id er an dt he op er at or ha vi ng no di re ct co nn ec ti on wi th th es ys te ms

10. #10 George Lasry
14. Februar 2017

Very impressive, Norbert. You are so fast and efficient!. I am still stuck with solutions only for 5000 bigrams (twice the length). A couple of questions:
2) Did you normalize using a log function?
3) How did you build your database?
4) How exactly do you evaluate after each change? It is not needed to redecrypt the whole thing I guess.
5) What is your programming language?

Thx
George

11. #11 Armin
14. Februar 2017

Great job, Norbert!

Using your crib I found the following plaintext, that completes your solution:

12. #12 Klaus Schmeh
14. Februar 2017

@Norbert, Armin: Great job!

Here’s the cleartext:

YOUAREANAUTHORIZEDUSERWHENYOUWITHDRAWCASHFROMANATMMACHINETH
EMACHINECHECKSWHETHERYOUAREALEGITIMATEACCOUNTOWNERIFYOUENTE
RAFACTORYTHEGATEKEEPERCHECKSWHETHERYOUAREANAPPROVEDVISITORO
FTHEFACILITYINEACHOFTHESECASESTHEREISCHECKTODETERMINETHEAUT
HENTICITYOFTHECLAIMANDAUTHORIZATIONOFTHEACTIONHOWEVERITISTE
RAWCASHORACCESSFACILITYWITHOUTREVEALINGYOURIDENTITYTECHNICA
LLYSTATEDINMANYOFTODAYSAUTHENTICATIONPROCESSESANIDENTITYOFA
PERSONORMACHINEISVALIDATEDALTHOUGHITISALSOPOSSIBLECHECKONLY
APARTICULARATTRIBUTEOFANIDENTITYWITHOUTDISCLOSINGTHEFULLIDE
ESSDATAACCESSPERMISSIONSANDPROCURATIONATTRIBUTEBASEDAUTHENT
EDANDLIMITEDTOSPECIFICATTRIBUTESDATAPROTECTIONANATTRIBUTEMA
YHAVEAVALIDITYPERIODTHATISDIFFERENTFROMTHEONEOFTHEIDENTITYA
NTITYDATAANATTRIBUTECANBECHANGEDWITHOUTCHANGINGTHEIDENTITYI
NSOMECASESATTRIBUTESCANTOBEVALIDATEDWITHTHEUSEOFATTRIBUTEST
ATEMENTSFOREXAMPLETHEPROCESSOFAGEVERIFICATIONUSUALLYDOESNOT
REQUIRETHESPECIFICBIRTHDATEITSELFBUTRATHERACALCULATEDVALUEA
BOUTTHEATTRIBUTESTATEMENTSUCHASPERSONISOVEREIGHTTEENATTRIBU
TESAREWELLSUITEDFORSMARTCARDAUTHENTICATIONESPECIALLYINTHEFI
ELDOFEIDDOCUMENTSFORINSTANCETHEPROOFOFPERMISSIONORENTITLEME
NTFORDRIVINGORPURSUINGAPROFESSIONDOCTORLAWYERORSIMILARCANBE
AUTHENTICATEDPROOFOFREGISTRATIONFORPROOFOFINSURANCEPROFESSI
IBUTECERTIFICATESANATTRIBUTECERTIFICATEASSTANDARDIZEDINTHEX
TAPUBLICKEYTHESIGNATURECOMESFROMACERTIFICATIONAUTHORITYCAHO
WEVERTHISKINDOFATTRIBUTEHANDLINGDOESNTPLAYAROLEINMODERNEIDC
ARDSYSTEMSONEREASONFORTHISISTHATITISTOOCOMPLICATEDTOINVOLVE
TESHAVETOBESTOREDONTHECARDDURINGPERSONALIZATIONWHICHPREVENT
ATTRIBUTESAREDISCLOSEDDURINGTHISPROCESSANDCANBEUTILIZEDEVEN
OUTSIDETHECARDFORTHESEREASONSATTRIBUTESTODAYARETYPICALLYPRO
VIDEDBYTHECARDITSELFTHEYMAYEITHERBESIGNEDBYTHECARDORPROVIDE
DTOEXTERNALSYSTEMSVIAANAUTHENTICATEDCHANNELANATTRIBUTETHATI
SPROTECTEDINSUCHAWAYCANBEREFERREDTOASACREDENTIALBECAUSEITSO
RIGINALITYANDAUTHENTICITYCANBEPROVENCREDENTIALSCREATEDINSUC
HAWAYINCLUDEONLYTHENECESSARYATTRIBUTESOFTHECARDHOLDERANDWHI
CHEXPLICITPERMISSIONSAREGRANTEDANINNOVATIVEWAYTOHANDLEATTRI
BUTESANDTOASSIGNPERMISSIONSTOTHEMHASBEENIMPLEMENTEDINTHEGER
FDESCRIBINGCERTIFICATESPRESENTONTHECARDEACHATTRIBUTECANONLY
RANDWITHDEDICATEDPERMISSIONGIVENBYAFEDERALCAFORTHISPERMISSI
ONTHESERVICEPROVIDERHASTOAPPLYFORANDTHEPERMISSIONFORTHEBUSI
NESSCASEISHANDEDOVERINFORMOFACARDVERIFIABLECERTIFICATETHATI
SPROVENBYTHEIDCARDDURINGAUTHENTICATIONOFTHESERVICEPROVIDERI
TISCLEARTHATCREDENTIALSSIGNEDBYTHECARDAREMORESECURETHANTHOS
EONLYPROVIDEDVIAANAUTHENTICATEDCHANNELHOWTHEYWORKANDHOWNEWA
TTRIBUTESARECREATEDDURINGTHEVALIDITYPERIODOFTHEDOCUMENTISEX
PLAINEDBELOWEVENWHENANATTRIBUTEISPROVIDEDBYASMARTCARDITISOF
COURSEINDEPENDENTFROMITITISTHEREFOREPOSSIBLETHATONESERVICEP
ROVIDERPASSESANATTRIBUTETOANOTHERSERVICEWHENTHISATTRIBUTEIS
FTHEREISNODIGITALSIGNATURETHERECEIVINGSERVICEPROVIDERHASTOT
RUSTTHESOURCEOFTHEATTRIBUTEANDTHECOMMUNICATIONCHANNELHOWEVE
RHANDLINGATTRIBUTESINDEPENDENTLYFROMACARDISOFTENNOTDESIRABL
EASITMAKESTHECARDHOLDERCANNOTDIRECTLYCONTROLAPIECEOFPERSONA
LINFORMATIONWHICHMAYBESHAREDWITHOTHERSERVICESTHEREFOREMODER
NATTRIBUTEMANAGEMENTISORGANIZEDINAUSERCENTRICWAYTHISISBASED
ONPROCESSESINWHICHATTRIBUTESORASSERTEDSTATEMENTSAREPROVIDED
BYTHECARDITSELFOFCOURSETHISAPPROACHISMORECOMPLEXTHANTHECONC
EPTOFFREELYFLOATINGATTRIBUTESBUTITRENDERSMOREDATAPROTECTION
ANDAHIGHERDEGREEOFPRIVACYWHILEANATTRIBUTECANBEWRITTENTOASMA
RTCARDDURINGTHEPRODUCTIONPROCESSTHEFULLPOTENTIALOFTHISTECHN
OLOGYISONLYAVAILABLEIFEXISTINGCARDSCANBEUPDATEDATTRIBUTESAF
TERBEINGISSUEDFORSUCHAPOSTISSUANCEPROVISIONINGASPECIALPROCE
SSISNECESSARYTHISPROCESSMUSTENSURETHATONLYATRUSTEDATTRIBUTE
PROVIDERMAYWRITEANATTRIBUTEONTOACARDCHIPANATTRIBUTEPROVIDER
ISTYPICALLYTRIGGEREDBYASERVICEPROVIDERFORINSTANCEANEGOVERNM
ENTPORTALORANONLINESHOPINORDERTOACHIEVEAUSERCENTRICATTRIBUT
EMANAGEMENTITISUSUALLYDESIREDTOSEPARATESERVICEPROVIDERSFROM
ATTRIBUTEPROVIDERSANDMOREOVERTOMAKETHEMMUTUALLYINVISIBLEINO
RDERTOSUPPORTSUCHAPROCESSTHEENHANCEDROLEAUTHENTICATIONPROTO
COLERAWASDESIGNEDERAISATHREEPARTYPROTOCOLTHATISAPPLIEDBETWE
ENACHIPCARDUNDERTHECONTROLOFITSHOLDERVIASMARTCARDMIDDLEWARE
ASERVICEPROVIDERANDANATTRIBUTEPROVIDERGENERALLYTHISPROTOCOL
COMBINESAUSERCENTEREDAPPROACHWITHTHESERVICEPROVIDERANDTHEOP
ERATORHAVINGNODIRECTCONNECTIONWITHTHESYSTEMS

13. #13 Thomas
15. Februar 2017

Congratulations!

@Norbert, George, Armin or WIMC:

If I’m not mistaken (I’m no programmer) hill climbing and simulated annealing algorithms are “blindfolded” choosing a certain bigram substitution table (key). Wouldn’t it be even faster if the bigram frequencies of the ciphertext were taken into account from the beginning to rule out impossible and improbable substitutions? I think the problem might be to group the ciphertext and plaintext bigrams by frequencies before decrypting.

14. #14 Norbert
16. Februar 2017

@George: I am afraid my hit was by mere chance! Anyway, here is my “setup”: I used the quadgram freqencies provided here. The frequencies are logarithmized, and then the values for each quadgram of the tentative plaintext summed up. For the 2500-letter challenge, I also tried quintgrams and another approach, which is to compute the bigram distribution of the plaintext and compare it to the expected distribution (chi square). Neither approach has succeeded yet (the latter seems to be the most unusable, unless I implemented it wrongly). I did not optimize my code in terms of redundant recalculating of unchanged plaintext parts. I am happy if the programming takes not too much time, rather I allow my C++ code to run all night 😉

@Thomas: Depending on the details of the hill climbing algorithm, it is indeed often a good idea to start with a probable key, so the program may at first try to assign each cipher number to a bigram with a corresponding frequency. However, the simulated annealing algorithm will in its first stage turn any initial key into a “chaotic” state. But the programmer is free to decide that certain cipher numbers will be restricted to one or several plaintext bigrams. It’s not a big deal to integrate such a feature into the program. Maybe this is the way Armin used my plaintext suggestion for challenge #2 as a “crib”.

15. #15 George Lasry
16. Februar 2017

Hi Norbert

Many thanks for the details. I tried something very similar, but with hill climbing. Seems that SA is more appropriate. Congrats again!.

George

16. #16 Norbert
17. Februar 2017

At last, a simulated annealing approach with quintgram scoring and a guessed crib (“there are”) was successful toward challenge #1:

there are several technologies which can be used to prevent attacks on eids the most important ones are cryptography biometry secure storage and pin query in this work only cryptography is covered cryptography provides for a broad range of methods to improve the security of an eid among the attacks mentioned above complete fraud cloning alteration unauthorized reading eavesdropping and man in the middle can be prevented with cryptography usage by the wrong persons would be addressed with biometry while illegal obtaining is prevented not with technical but with organisational means for preventing denial of service and marking cryptography is also not the major technology however it is possible to abuse cryptographic tools to start denial of service or marking attacks especially the question of marking is very interesting because it is possible to use covert channels in cryptographic protocols for this purpose however this important question is out of scope in this work the first purpose of this paper is to classify the cryptographic techniques that make an eid secure these cryptographic techniques can be divided into three groups authentication of the chip unit authentication of the terminal unit and secure data exchange among the most important cryptographic tools used for securing eids are the ones that are used for authenticating the card chip itself chip authentication or the content of the card chip content authentication or a statement about the eid statement authentication these tools are summarised under authentication of the chip unit authentication of the chip unit can prevent complete fraud cloning and alteration it can be achieved with both asymmetric and symmetric cryptography this authentication is accomplished with a challenge response protocol the terminal unit sends a challenge and the chip replies with a response chip authentication can prevent complete fraud and cloning but cannot prevent alteration the challenge response protocol is performed with either a secret or a private key in both cases the key must be stored on the chip in a secure way in addition the chip must be capable of computing a challenge out of a response which means that a memory chip is not suitable in other words a microcontroller is necessary in the symmetric case the chip has stored a secret key e g an aes key which is used for the challenge response protocol both the card and the verifier need to know this key symmetric card authentication is mainly recommendable in environments where there are only few terminal units in case of large number of checking units the management of the secret keys gets complicated nevertheless there are card systems where a large number of cards and checking units only use symmetric keys e g aes ecards are often realised this way in case of asymmetric chip authentication the chip uses a private key e g rsa key for the challenge response protocol the terminal unit needs the public key usually provided as a digital

17. #17 Jim
17. Februar 2017

Well done, Norbert! I was still messing with initialization strategies without much success.

18. #18 Hendrik
18. Februar 2017

@Norbert Congratulations!
I tried it for myself with simulated annealing and quintgram scoring but was still struggling with local maximums and inefficent code 😉

19. #19 Klaus Schmeh
18. Februar 2017

Bart Wenmeckers via Faccebook:
He is too quick for me.. well done to Norbert. I am curious if my solver will break it.

20. #20 Klaus Schmeh
18. Februar 2017

@Norbert: Congratulations! Great job!
Your work proves that a bigram substitution applied on a 2500 letter cleartext can be broken. This means that this method should only be used if the cleartext is considerably shorter than 2500 letters. However, creating a table containg 676 bigrams (i. e. 1352 letters) to encrypt, say, a 1500 letter plaintext doesn’t make much sense, as one can use the One Time Pad instead. All in all, I see no reason to use the bigram substitution at all.