Substitution Cipher¶
Our goal here is to crack substitution ciphers using a Monte Carlo method (stochastic hill climb / Simulated Annealing).
A substitution cipher is one where you create a key that is a permutation of the alphabet (e.g. $A \mapsto K$, $B \mapsto Z$, etc.) Using this key, you can encode and decode this message. At first sight this might seem uncrackable by brute force -- your key is one permutation of $28!$ (26 letters plus a period and space punctuation).
This is a needle in an enormous haystack. If you could examine $10^{12}$ keys in a second (which is a generous overestimate), then it would still take you about a billion years to crack this code. Nevertheless, if you're sending sufficiently long (few paragraphs) of readable text data, this method is cracable in seconds using a randomized algorithm.
import numpy as np
from numpy import sqrt, pi, exp, log, floor, ceil, sin, cos
from numpy import linspace, logspace, arange, empty, zeros, ones, empty_like, zeros_like, ones_like, full, full_like
from numpy import diff, meshgrid, mean, std, argmin, array, eye, amin, amax, fmin, fmax
from numpy.linalg import norm
import re, multiprocessing as mp
from collections import defaultdict
from tqdm.notebook import tqdm, trange
rng = np.random.default_rng()
Encoding / decoding messages¶
Let's fix our alphabet to be uppercase letters, space and a period. (You coluld also mix case, use numbers, symbols, etc.). A few useful functions for us are:
- Sanitize arbitrary text to remove symbols outside our alphabet, convert case, etc.
- Encode / decode sanitized messages given a key
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ. '
list_alphabet = list(alphabet)
def sanitize(s):
s = re.sub( '[^A-Za-z. ]', ' ', s )
s = re.sub( r'\s+', ' ', s ).upper()
return list(s)
def make_index( A ):
return { s: n for n, s in enumerate(A) }
def encode( msg, key ):
return [ key[encode.idx[c]] for c in msg]
encode.idx = make_index( alphabet )
def decode( msg, key ):
idx = make_index( key )
return [alphabet[ idx[c] ] for c in msg]
I'm going to choose a passage from Sherlock Holmes as my message to send.
msg = sanitize( '''
To Sherlock Holmes she is always _the_ woman. I have seldom heard him
mention her under any other name. In his eyes she eclipses and
predominates the whole of her sex. It was not that he felt any emotion
akin to love for Irene Adler. All emotions, and that one particularly,
were abhorrent to his cold, precise but admirably balanced mind. He
was, I take it, the most perfect reasoning and observing machine that
the world has seen, but as a lover he would have placed himself in a
false position. He never spoke of the softer passions, save with a gibe
and a sneer. They were admirable things for the observer—excellent for
drawing the veil from men’s motives and actions. But for the trained
reasoner to admit such intrusions into his own delicate and finely
adjusted temperament was to introduce a distracting factor which might
throw a doubt upon all his mental results. Grit in a sensitive
instrument, or a crack in one of his own high-power lenses, would not
be more disturbing than a strong emotion in a nature such as his. And
yet there was but one woman to him, and that woman was the late Irene
Adler, of dubious and questionable memory.
I had seen little of Holmes lately. My marriage had drifted us away
from each other. My own complete happiness, and the home-centred
interests which rise up around the man who first finds himself master
of his own establishment, were sufficient to absorb all my attention,
while Holmes, who loathed every form of society with his whole Bohemian
soul, remained in our lodgings in Baker Street, buried among his old
books, and alternating from week to week between cocaine and ambition,
the drowsiness of the drug, and the fierce energy of his own keen
nature. He was still, as ever, deeply attracted by the study of crime,
and occupied his immense faculties and extraordinary powers of
observation in following out those clues, and clearing up those
mysteries which had been abandoned as hopeless by the official police.
From time to time I heard some vague account of his doings: of his
summons to Odessa in the case of the Trepoff murder, of his clearing up
of the singular tragedy of the Atkinson brothers at Trincomalee, and
finally of the mission which he had accomplished so delicately and
successfully for the reigning family of Holland. Beyond these signs of
his activity, however, which I merely shared with all the readers of
the daily press, I knew little of my former friend and companion.
One night—it was on the twentieth of March, 1888—I was returning from a
journey to a patient (for I had now returned to civil practice), when
my way led me through Baker Street. As I passed the well-remembered
door, which must always be associated in my mind with my wooing, and
with the dark incidents of the Study in Scarlet, I was seized with a
keen desire to see Holmes again, and to know how he was employing his
extraordinary powers. His rooms were brilliantly lit, and, even as I
looked up, I saw his tall, spare figure pass twice in a dark silhouette
against the blind. He was pacing the room swiftly, eagerly, with his
head sunk upon his chest and his hands clasped behind him. To me, who
knew his every mood and habit, his attitude and manner told their own
story. He was at work again. He had risen out of his drug-created
dreams and was hot upon the scent of some new problem. I rang the bell
and was shown up to the chamber which had formerly been in part my own.
''')
Let's choose a key randomly, and encrypt this message. (To ensure our functions work, let's also decode this message and see if everything looks OK)
key = rng.permutation( list_alphabet )
coded_msg = encode( msg, key )
print( ''.join( coded_msg[:400] ) )
print( ''.join( decode( coded_msg, key )[:400] ) )
LAVLCRQPTVIWLRVTKQCLCRQLMCLETDEJCLARQLDVKEYSLMLREGQLCQTFVKLRQEPFLRMKLKQYAMVYLRQPLHYFQPLEYJLVARQPLYEKQSLMYLRMCLQJQCLCRQLQITMBCQCLEYFLBPQFVKMYEAQCLARQLDRVTQLVNLRQPLCQUSLMALDECLYVALAREALRQLNQTALEYJLQKVAMVYLEWMYLAVLTVGQLNVPLMPQYQLEFTQPSLETTLQKVAMVYCLEYFLAREALVYQLBEPAMIHTEPTJLDQPQLEXRVPPQYALAVLRMCLIVTFLBPQIMCQLXHALEFKMPEXTJLXETEYIQFLKMYFSLRQLDECLMLAEWQLMALARQLKVCALBQPNQIALPQECVYMYOLEYFLVXCQPGMYOLKEIRMY TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER UNDER ANY OTHER NAME. IN HIS EYES SHE ECLIPSES AND PREDOMINATES THE WHOLE OF HER SEX. IT WAS NOT THAT HE FELT ANY EMOTION AKIN TO LOVE FOR IRENE ADLER. ALL EMOTIONS AND THAT ONE PARTICULARLY WERE ABHORRENT TO HIS COLD PRECISE BUT ADMIRABLY BALANCED MIND. HE WAS I TAKE IT THE MOST PERFECT REASONING AND OBSERVING MACHIN
Cracking algorithm¶
We are going to download a few long books from Project Gutenberg, and then scan them and record how frequently we get symbol sequences. That is, how often do we get each letter? How often do we get ab
? How often do se get abs
? Let's stop at sequences of a fixed length. (Length 3 worked perfectly fine for me; but I'm going up to length 6 here.)
%%time
lang_sample_file = 'data/few-books.txt'
with open( lang_sample_file ) as f:
lang_sample = f.read()
print( lang_sample[:500] )
The Project Gutenberg eBook of Moby-Dick; or The Whale, by Herman Melville This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located befor CPU times: user 7.37 ms, sys: 90 μs, total: 7.46 ms Wall time: 6.9 ms
%%time
s_lang_sample = sanitize( lang_sample )
print( ''.join( s_lang_sample[:1000] ) )
THE PROJECT GUTENBERG EBOOK OF MOBY DICK OR THE WHALE BY HERMAN MELVILLE THIS EBOOK IS FOR THE USE OF ANYONE ANYWHERE IN THE UNITED STATES AND MOST OTHER PARTS OF THE WORLD AT NO COST AND WITH ALMOST NO RESTRICTIONS WHATSOEVER. YOU MAY COPY IT GIVE IT AWAY OR RE USE IT UNDER THE TERMS OF THE PROJECT GUTENBERG LICENSE INCLUDED WITH THIS EBOOK OR ONLINE AT WWW.GUTENBERG.ORG. IF YOU ARE NOT LOCATED IN THE UNITED STATES YOU WILL HAVE TO CHECK THE LAWS OF THE COUNTRY WHERE YOU ARE LOCATED BEFORE USING THIS EBOOK. TITLE MOBY DICK OR THE WHALE AUTHOR HERMAN MELVILLE RELEASE DATE JUNE EBOOK MOST RECENTLY UPDATED AUGUST LANGUAGE ENGLISH CHARACTER SET ENCODING UTF PRODUCED BY DANIEL LAZARUS JONESEY AND DAVID WIDGER START OF THE PROJECT GUTENBERG EBOOK MOBY DICK OR THE WHALE MOBY DICK OR THE WHALE. BY HERMAN MELVILLE CONTENTS ETYMOLOGY. EXTRACTS SUPPLIED BY A SUB SUB LIBRARIAN . CHAPTER . LOOMINGS. CHAPTER . THE CARPET BAG. CHAPTER . THE SPOUTER INN. CHAPTER . THE COUNTERPANE. CHAPTER . BREAKFAS CPU times: user 62.9 ms, sys: 20.5 ms, total: 83.4 ms Wall time: 83.3 ms
# Create frequency table
def ngrams_freq( txt, sizes=[1, 2, 3, 4, 5, 6] ):
'''Counts frequencies with which letter combinations (of size specified by the sizes array) occur.
Returns freq where:
freq['asdf'] is the frequency of occurances of 'asdf'
Also freq['@sizes'] = sizes'''
if type( txt ) != str: txt = ''.join( txt )
idx = make_index( alphabet )
freq = defaultdict( lambda: 0 )
for N in tqdm(sizes):
total = len(txt) - N + 1
inc = 1/total
for n in trange( total ):
freq[ txt[n:n+N] ] += inc
freq['@sizes'] = sizes
return freq
# If this runs slow, then pass sizes=[4]; or sizes=[3,4].
# You can define a perfectly good fitness function just using frequencies of 3 (or 4) letter combinations
freq = ngrams_freq( s_lang_sample, sizes=[1,2,3,4,5,6] )
print( f'Generated {len( freq.keys() )} keys' )
0%| | 0/6 [00:00<?, ?it/s]
0%| | 0/1581649 [00:00<?, ?it/s]
0%| | 0/1581648 [00:00<?, ?it/s]
0%| | 0/1581647 [00:00<?, ?it/s]
0%| | 0/1581646 [00:00<?, ?it/s]
0%| | 0/1581645 [00:00<?, ?it/s]
0%| | 0/1581644 [00:00<?, ?it/s]
Generated 382346 keys
Question 1: Define a Fitness function¶
Once we have frequencies of $n$ letter combinations of symbols, we need to define the fitness of a given string of characters. It needs to be such that it is high for English language strings, and low otherwise. (I've exlcuded the definition of this function from the notebook; experiment around and think about it -- the better your fitness function is, the better the results will be)
Note: You can get perfectly good results using just frequencies of 4-symbol combinations; But feel free to combine frequencies of $n$-grams for a few different values of $n$ if you want to experiment. (It's more expensive.)
Warning: It's hard to think of what this function should be, and hard to implement it in a way that gives good results. But once you figure out what to do, the actual implementation is just a few lines of code.
def fitness( txt ):
# Computes fitness of a message "txt" given the English symbol sequence we computed above.
# For testing, I also recommend normalizing your fitness with the length of the message
# so it doesn't seem too large/small if different inputs have different lengths.
F = ... # Feel free to use the global variable freq we computed.
return F
# Compute fitness for a few test strings to see if we're "doing OK"
for s in [
'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate est. Extremum autem esse bonorum voluptatem ex infinito tempore aetatis percipi posse, quam ex hoc facillime perspici potest: Constituamus aliquem.',
'The Monte Carlo method uses random sampling to solve computational problems that would otherwise be intractable, and enables computers to model complex systems in nature that are otherwise too difficult to simulate. This course provides a first introduction to Monte Carlo methods from complementary theoretical and applied points of view, and will include implementation of practical algorithms. Topics include random number generation, sampling, Markov chains, Monte Carlo integration, stochastic processes, and applications in computational science. Students need a basic background in probability, multivariable calculus, and some coding experience in any language.',
'asdf wer b ewar pou bewp ear bewapoiu b awer wapeoi b asdlk;j wav asdlfk jwe bvasdlkjf awepoui b zxcnmvoiheyrhb zxljh bvpcxiuoear bv asdjhf b sdilufyh awe biodsy bscliewyaq qwpoi bv piozu bawe pkc,v uiad fa bpoiua bokased bweoaiu asd vgaweoiu vkljaewsou bpqp l;kj xcv,mj apo alsdkjf bpouw2 er bsdoiu wqaer bpoiu q2w[ sapoiu asdf bcpoviuqwer,k bpoieau aospiduf bopawiue rb olajkse fpou b,asejdpowe b poaiusde frboiu wqern bcxkl jx.;lkwaepo p[q sdvlk adpoiu a,vbmxc ;oij asdf apodsiu awklerj zspoiua ,bvlaksdj aqowieu salkjf bopsierlka lo;ajksdfopi uasd,fljk asld;kj fasld;jk faslkdjf o blzkj asldkjf abpoaj sdlkjf bxcl,m vpoiurewt,n bzmclkajd gfbol jkasdlf; jadslkj.',
'A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.[1][2][3] Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). It is named after the Russian mathematician Andrey Markov.',
]:
print( f'{fitness( sanitize(s) ):.3f} {s:.80}' )
-152.625 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor i -39.363 The Monte Carlo method uses random sampling to solve computational problems that -413.316 asdf wer b ewar pou bewp ear bewapoiu b awer wapeoi b asdlk;j wav asdlfk jwe bva -47.702 A Markov chain or Markov process is a stochastic model describing a sequence of
Question 2: Crack the cipher.¶
Given a key $X$, we can define a function $F$ by $$ F(X) = \operatorname{fitness}( \operatorname{decode}( \mathrm{cmsg}, X ), \mathrm{freq} ) $$ Now maximize $F$ using simulated annealing. (For fun, you can also try and maximize $F$ using a hill climb, and see the difference in performance between the two algorithms.)
In order to use a hill climb / simulated annealing, you need a notion of close by keys (or a proposal mechanism). We can do this by swapping two symbols. That is, choose $i$, $j$ randomly and define a new key $Y$ by
Y = X.copy()
Y[i], Y[j] = Y[j], Y[i]
Write a function crack
, maximizes the function $F$ above over all keys using simulated annealing.
def crack( cmsg, X, iters=range(2000), store_every=100, rng=rng ):
'''Crack a coded message "cmsg", with initial guess of the key to be X.
Does "iters" iterations, and stores results after every "store_every" iterations.
You can use the freq global variable we computed
'''
# Choose an annealing schedule
β = ... # β[0] should be small; β[ len(iters) - 1] should be large
Xn = [''.join(X)]
fnX = fitness( decode( cmsg, X ) )
for n in iters:
# Choose a close by key Y
# Decide if you want to accept / reject it using the Metropolis Hastings
# rule to sample from e^{+β[n]* F}. Note the + in the exponent -- you're maximizing F
# and not minimizing it
if n % store_every == store_every-1:
Xn.append( ''.join(X) )
Xn.append( ''.join(X) )
return Xn
cracked_key = crack( coded_msg, rng.permutation( list_alphabet ), iters=trange(5000) )
0%| | 0/5000 [00:00<?, ?it/s]
Let's see how well we did. Every 100 iterations, let's print the first line of the message decoded with our current guess of the key, the fitness (F) and the key distance (D), which is the number of characters we got wrong in our key.
key_dist = lambda s, t: sum( [a != b for a, b in zip( s, t )] )
for n, k in enumerate( cracked_key ):
dmsg = decode( coded_msg, k )
fn = fitness( dmsg )
print( f'F={fn:7.2f}, D={key_dist(k, key):2d}: ', ''.join( dmsg[:80] ) )
F=-499.49, D=26: VMOVLK AEOXNVKOEI LVLK VPLVWEUWGLVMK VUOIWY.VPVKWC VL EJOIVK WAJVKPIVI YMPOYVK A F=-457.06, D=26: TRHT POUZHD.TPHZQO T POTG TJZFJM TRPOTFHQJEITGTPJNOT OZAHQTPOJUATPGQTQOERGHETPOU F=-381.65, D=23: LHOLAY USODJLYOSM ALAY LNALESWEIALHY LWOMETZLNLYE. LA SCOMLY EUCLYNMLM THNOTLY U F=-370.23, D=26: OCLOHAT ULFJOALUMTHOHATONHOSUBSXHOCATOBLMSIDONOASETOHTURLMOATS ROANMOMTICNLIOAT F=-360.63, D=27: IRMILATUNMDZIAMNSTLILATIELI NP JLIRATIPMS OWIEIA CTILTNFMSIAT UFIAESISTOREMOIATU F=-352.03, D=27: IAEILSTNGEPYISEGMTLILSTIRLI GD BLIASTIDEM FVIRIS .TILTGOEMIST NOISRMIMTFAREFISTN F=-365.32, D=28: IEAILSTYFAKBISAFDTLILSTI LIRFGRPLIESTIGADRHUI ISR.TILTFMADISTRYMIS DIDTHE AHISTY F=-346.39, D=28: ERAEBITYNAPWEIANDTBEBITE BESNGSMBERITEGADSHFE EIS.TEBTNLADEITSYLEI DEDTHR AHEITY F=-345.56, D=27: ERAEWNTIYAMKENAYDTWEWNTE WESYCSUWERNTECADSHBE ENSFTEWTYLADENTSILEN DEDTHR AHENTI F=-334.95, D=25: ERSEDNTOISMJENSIPTDEDNTEADE IY GDERNTEYSP H.EAEN WTEDTILSPENT OLENAPEPTHRASHENTO F=-327.69, D=27: ERSEDNUOBSHJENSBTUDEDNUEADE BM KDERNUEMST CYEAEN ZUEDUBLSTENU OLENATETUCRASCENUO F=-315.64, D=27: ERSETNOUPSHVENSPDOTETNOEATE PM KTERNOEMSD CYEAEN GOETOPLSDENO ULENADEDOCRASCENOU F=-307.05, D=26: ERUETMOSPUBWEMUPDOTETMOEATE PN YTERMOENUD CVEAEM JOETOPLUDEMO SLEMADEDOCRAUCEMOS F=-293.48, D=28: ERUETSIMPUBJESUPDITETSIEATE PN OTERSIENUD CVEAES FIETIPLUDESI MLESADEDICRAUCESIM F=-288.75, D=26: EROETSINPOBXESOPDITETSIEATE PM VTERSIEMOD CYEAES FIETIPLODESI NLESADEDICRAOCESIN F=-281.40, D=25: RE TSICDEBJ SEDPIT TSI AT ODHOMT RSI HEPONY A SOFI TIDLEP SIOCL SAP PINRAEN SIC F=-262.33, D=21: RE TSIMLEBJ SELPIT TSI AT OLHOFT RSI HEPONY A SOVI TILDEP SIOMD SAP PINRAEN SIM F=-238.01, D=19: SI TREMLIBJ RILPET TRE AT OLGOYT SRE GIPONF A ROVE TELDIP REOMD RAP PENSAIN REM F=-224.12, D=17: SI TPEMLIB. PILRET TPE AT OLGOYT SPE GIRONC A POVE TELDIR PEOMD PAR RENSAIN PEM F=-194.57, D=14: SI THEMLIB. HILRET THE OT ALGAYT SHE GIRANC O HAVE TELDIR HEAMD HOR RENSOIN HEM F=-175.61, D=10: TI SHEMLIB. HILRES SHE OS ALGAYS THE GIRANW O HAVE SELDIR HEAMD HOR RENTOIN HEM F=-167.98, D= 9: TI SHEMLIBW HILRES SHE OS ALGAYS THE GIRAN. O HAVE SELDIR HEAMD HOR RENTOIN HEM F=-167.98, D= 9: TI SHEMLIBW HILRES SHE OS ALGAYS THE GIRAN. O HAVE SELDIR HEAMD HOR RENTOIN HEM F=-131.48, D= 7: TO SHEMLOBW HOLRES SHE IS ALGAYS THE GORAN. I HAVE SELDOR HEAMD HIR RENTION HEM F=-110.87, D= 5: TO SHEMLOCW HOLRES SHE IS ALGAYS THE GORAN. I HAVE SELDOR HEAMD HIR RENTION HEM F=-108.94, D= 4: TO SHEMLOCW HOLRES SHE IS ALKAYS THE KORAN. I HAVE SELDOR HEAMD HIR RENTION HEM F=-108.94, D= 4: TO SHEMLOCW HOLRES SHE IS ALKAYS THE KORAN. I HAVE SELDOR HEAMD HIR RENTION HEM F=-108.94, D= 4: TO SHEMLOCW HOLRES SHE IS ALKAYS THE KORAN. I HAVE SELDOR HEAMD HIR RENTION HEM F=-108.94, D= 4: TO SHEMLOCW HOLRES SHE IS ALKAYS THE KORAN. I HAVE SELDOR HEAMD HIR RENTION HEM F=-108.94, D= 4: TO SHEMLOCW HOLRES SHE IS ALKAYS THE KORAN. I HAVE SELDOR HEAMD HIR RENTION HEM F= -79.56, D= 2: TO SHEMLOCK HOLRES SHE IS ALWAYS THE WORAN. I HAVE SELDOR HEAMD HIR RENTION HEM F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER
Amazing! Despite seemingly insurmountable odds (1 in $10^{-28}$), we got it right in a few thousand iterations. You may want to experiment with the weights to see what works best. You can use just 3 letter sequences for instance, it it still usually works. I couldn't, however, get it to work by using only 2 letter sequences. (You can play with different messages / language samples to see how things work for you.)
Just for fun, I'm going to run this a few times and see how often we succeed starting with randomly chosen keys.
Since each run is independent of the other, I'm running them all in parallel using the python multiprocessing
module. One thing you should look out for here is to reseed your random number generator on each run. Otherwise each run will use the exact same random numbers, and produce identical results (just wasting computational effort). I'm doing this here using rng.spawn
, and passing the new random number generator to each crack attempt. You can run things sequentially if you're not too familiar with this. It will still work but be a bit slower.
You may want to see how much simulated annealing beats the stochastic hill climb by running this again without annealing. I found a hill climb worked about 70% of the time and Annealing (with the right schedule for β) worked about 90% of te time.
(You don't have to turn this part in with your homework as it takes a bit longer to finish running.)
%%time
def crack_mp( cmsg, n_iters=1000, n_reals=8 ):
rngs = rng.spawn(n_reals)
args = [ (cmsg, rng.permutation( list(alphabet) ), range(n_iters), 100, rngs[n] )
for n in range(n_reals) ]
with mp.Pool() as p:
res = p.starmap( crack, args )
return array(res)
K = crack_mp( coded_msg, 5000, 8*mp.cpu_count() )
CPU times: user 834 ms, sys: 53.6 ms, total: 887 ms Wall time: 42min 47s
n_cracked = 0
for k in K[:, -1]:
dmsg = decode( coded_msg, k )
fn = fitness( dmsg )
dist = key_dist( k, key )
if dist <= 3: n_cracked += 1
print( f'F={fn:7.2f}, D={key_dist(k, key):2d}: ', ''.join( dmsg[:80] ) )
print( f'\n\nSuccessfully cracked {n_cracked / K.shape[0] * 100 :.1f}% of the time' )
F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F=-297.89, D=21: ETNEIALOUNCKEANUYLIEIALESIERUDRMIETALEDNYR VESEAR.LEILUBNYEALROBEASYEYL TSN EALO F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F=-285.04, D=26: ERAELSITHAPZESAHMILELSIEOLE HD BLERSIEDAM UKEOES FIELIHNAMESI TNESOMEMIUROAUESIT F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F=-272.29, D=19: ST YIROUTCK ITUARY YIR NY LUFLMY SIR FTALE. N ILVR YRUGTA IRLOG INA ARESNTE IRO F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F=-283.46, D=26: ANUALT DOURKATUOM LALT AELAIOCIPLANT ACUMISWAEATI. AL OHUMAT IDHATEMAM SNEUSAT D F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER F= -20.30, D= 0: TO SHERLOCK HOLMES SHE IS ALWAYS THE WOMAN. I HAVE SELDOM HEARD HIM MENTION HER Successfully cracked 93.8% of the time