Tuesday, June 6, 2017

How to match word cipher text against reference text

In my last post, I mentioned that I want to develop some strategies for attacking word ciphers. I think such strategies would be generally useful for decipherments like the Rohonc text, where the semantic domain is known.

One way to do this would be to build a network of the relationships between words in a cipher text, and then find the best match between that network and a similar network built from a known text.
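The post does not say how the word relationships are measured. As one hypothetical way to build such a network, the sketch below represents each word by counts of the words immediately before and after it, and weights each pair of frequent words by the cosine similarity of those context vectors. The tokenizer, the one-word context window, and the names `context_vectors` and `word_network` are assumptions for illustration, not the method behind the graphs in this post.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed, deliberately simple tokenizer: lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def context_vectors(tokens):
    # For each word, count which words occur immediately to its left ("L:") and right ("R:").
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if i > 0:
            vectors[word]["L:" + tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            vectors[word]["R:" + tokens[i + 1]] += 1
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse count vectors (Counter lookups default to 0).
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def word_network(text, top_n=20):
    # Hypothetical network: nodes are the top_n most frequent words,
    # edge weights are context-vector similarities between each pair of them.
    tokens = tokenize(text)
    top_words = [w for w, _ in Counter(tokens).most_common(top_n)]
    vectors = context_vectors(tokens)
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for i, a in enumerate(top_words)
        for b in top_words[i + 1:]
    }
```

Running `word_network` separately on the plain text of a cipher and of a reference text would give two weighted networks of the same shape, which is what a matching step needs as input.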

I am trying these ideas out with chapters 81 and 87 of Melville's Moby Dick. You can see the basic idea if you look at the closest relationships between the top 20 words of each chapter:

Figure: Closest relationships between top 20 words in Chapter 81

Figure: Closest relationships between top 20 words in Chapter 87

You can see that in both chapters there is a little island of nouns and pronouns (whale, whales, it, he) and another little island of determiners (a, the, his). Most prepositions are connected to other prepositions, and the demonstratives "that" and "this" are connected to each other. The underlying data contain much more information about how close each relationship is, but these graphs do not show it.
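As a sketch of how such islands could be read off automatically, the code below links each word to the single other word it is most similar to, then follows those links to group words into connected components. It assumes a table of pairwise similarity scores keyed by word pairs; the function names are hypothetical, and this is not necessarily how the graphs above were drawn.

```python
from collections import defaultdict


def similarity(sim, a, b):
    # Assumed symmetric table: a pair may be stored as (a, b) or (b, a); missing pairs count as 0.
    return sim.get((a, b), sim.get((b, a), 0.0))


def closest_links(words, sim):
    # Link each word to the one other word it is most similar to (first one wins on ties).
    edges = set()
    for a in words:
        best = max((b for b in words if b != a), key=lambda b: similarity(sim, a, b))
        edges.add(frozenset((a, best)))
    return edges


def islands(words, edges):
    # Group words into connected components ("islands") of the closest-link graph.
    neighbours = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in words:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over the links
            word = stack.pop()
            if word in group:
                continue
            group.add(word)
            stack.extend(neighbours[word] - group)
        seen |= group
        groups.append(group)
    return groups
```

For example, with toy scores (made up, not taken from the chapters) where "whale"/"whales", "a"/"the", and "of"/"in" are the strongest pairs, the result is three islands of two words each. Because every word contributes exactly one link, no island can contain a single isolated word.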

The main problem I expect to run into is processing time. Luckily, in the quiet years since I last wrote about these things, I have learned to use cloud computing. Even so, it will be a big job, and it needs to be planned out carefully.
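To see why the job is large: with 20 words per text there are 20! (about 2.4 × 10^18) one-to-one ways to pair cipher words with reference words, far too many to try exhaustively. A common workaround, sketched here as an assumption rather than the approach this post commits to, is local search: hill climbing over pairwise swaps with random restarts. The mismatch score (sum of squared differences between corresponding edge weights) and the names `mismatch` and `match_networks` are hypothetical; both inputs are assumed to be symmetric similarity matrices of the same size.

```python
import random


def mismatch(cipher_sim, ref_sim, mapping):
    # Assumed score: sum of squared differences between edge weights when
    # cipher word i is paired with reference word mapping[i]. Lower is better.
    n = len(mapping)
    return sum(
        (cipher_sim[i][j] - ref_sim[mapping[i]][mapping[j]]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def match_networks(cipher_sim, ref_sim, restarts=20, seed=0):
    # Hill climbing: start from a random one-to-one mapping and keep swapping
    # pairs of assignments while any swap lowers the mismatch. Repeat from
    # several random starts and keep the best local optimum found.
    rng = random.Random(seed)
    n = len(cipher_sim)
    best_mapping, best_score = None, float("inf")
    for _ in range(restarts):
        mapping = list(range(n))
        rng.shuffle(mapping)
        score = mismatch(cipher_sim, ref_sim, mapping)
        improved = True
        while improved:
            improved = False
            for i in range(n):
                for j in range(i + 1, n):
                    mapping[i], mapping[j] = mapping[j], mapping[i]
                    new_score = mismatch(cipher_sim, ref_sim, mapping)
                    if new_score < score:
                        score, improved = new_score, True
                    else:
                        mapping[i], mapping[j] = mapping[j], mapping[i]  # undo the swap
        if score < best_score:
            best_mapping, best_score = mapping[:], score
    return best_mapping, best_score
```

Local search only guarantees that no single swap improves the final mapping, not that it is the global best, which is why independent restarts help. Restarts are also easy to spread across many cloud machines, since each one runs independently.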
