Tuesday, August 8, 2017

How many symbols are in a script?

Given Zipf's law, it should be possible to calculate the number of symbols in a script from the number of symbols that occur only once in a sample of text.
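A minimal sketch of one way to set up that calculation. It assumes that symbol frequencies follow Zipf's law with exponent 1 (the symbol of rank r has probability proportional to 1/r) and that each glyph in the text is drawn independently. Under those assumptions symbol i appears Binomial(n, p_i) times in a text of n glyphs, so the expected number of symbols seen exactly once is the sum over all symbols of n · p_i · (1 − p_i)^(n − 1). This computes only the expected value, not the full distribution, and is not necessarily the calculation used for the results in this post; the function names and the exponent parameter `s` are illustrative.

```python
def zipf_probabilities(inventory, s=1.0):
    """Probability of each rank 1..inventory under Zipf's law with exponent s (assumed s = 1)."""
    weights = [1.0 / rank ** s for rank in range(1, inventory + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_singletons(inventory, text_length, s=1.0):
    """Expected number of symbols that occur exactly once in a text of text_length glyphs.

    Symbol i appears Binomial(text_length, p_i) times, so the chance it appears
    exactly once is text_length * p_i * (1 - p_i) ** (text_length - 1).
    Summing over all symbols gives the expectation (linearity of expectation).
    """
    return sum(
        text_length * p * (1.0 - p) ** (text_length - 1)
        for p in zipf_probabilities(inventory, s)
    )


if __name__ == "__main__":
    # Expected singletons for a 1188-glyph text drawn from 1500 Zipf-distributed symbols.
    print(expected_singletons(1500, 1188))
```

For the Debosnys numbers, the printed value can be compared directly against the 277 glyphs observed only once.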

I worked out the math, but the code I wrote to do the calculation is very slow for scripts with a large number of symbols (like the Debosnys ciphers), so I also wrote a Monte Carlo simulation that produces an approximate answer much more quickly.
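A minimal Monte Carlo sketch of the same idea, again assuming Zipf exponent 1 and independently drawn glyphs: generate many random texts of the given length, count the symbols that occur exactly once in each, and average. The names `simulate_singletons`, `trials` and `seed` are hypothetical and not taken from the simulation described here. As `trials` grows, the average converges to the expected singleton count.

```python
import itertools
import random
from collections import Counter


def simulate_singletons(inventory, text_length, trials=200, s=1.0, seed=None):
    """Monte Carlo estimate of how many symbols occur exactly once.

    Draws `trials` random texts of `text_length` glyphs from a Zipf distribution
    over `inventory` symbols (rank r has weight 1 / r**s, assumed s = 1) and
    returns the average number of symbols seen exactly once per text.
    """
    rng = random.Random(seed)
    # Precomputed cumulative weights let random.choices skip re-accumulating them on every call.
    cum_weights = list(itertools.accumulate(1.0 / rank ** s for rank in range(1, inventory + 1)))
    total = 0
    for _ in range(trials):
        text = rng.choices(range(inventory), cum_weights=cum_weights, k=text_length)
        total += sum(1 for count in Counter(text).values() if count == 1)
    return total / trials


if __name__ == "__main__":
    print(simulate_singletons(1500, 1188, trials=200, seed=1))
```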

The Debosnys ciphers contain 1188 glyphs, of which 277 occur only once. For a text like this, we would expect a total glyph inventory of around 1500 symbols.
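To turn an observed singleton count into an inventory estimate, one option is to search for the smallest inventory whose expected singleton count reaches the observed value. The sketch below does this with a binary search over the expected-value formula. It assumes Zipf exponent 1, assumes (without proof) that the expected singleton count grows with inventory size, and uses an arbitrary upper bound of 20,000 symbols; `estimate_inventory` is a hypothetical helper, and it returns a point estimate only, with no uncertainty interval.

```python
def expected_singletons(inventory, text_length, s=1.0):
    """Expected count of symbols seen exactly once, Zipf exponent s (assumed s = 1)."""
    weights = [1.0 / rank ** s for rank in range(1, inventory + 1)]
    total = sum(weights)
    return sum(
        text_length * (w / total) * (1.0 - w / total) ** (text_length - 1)
        for w in weights
    )


def estimate_inventory(text_length, observed_singletons, s=1.0, upper=20_000):
    """Smallest inventory whose expected singleton count reaches the observed count.

    Binary search; assumes expected singletons increase with inventory size.
    `upper` is an arbitrary search bound, not a property of any real script.
    """
    if expected_singletons(upper, text_length, s) < observed_singletons:
        raise ValueError("observed singletons exceed what `upper` symbols can produce")
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_singletons(mid, text_length, s) < observed_singletons:
            lo = mid + 1  # too few singletons: the inventory must be larger
        else:
            hi = mid      # enough singletons: try a smaller inventory
    return lo


if __name__ == "__main__":
    # 1188 glyphs in total, 277 of which occur only once.
    print(estimate_inventory(1188, 277))
```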

Here is what the distribution looks like for texts 1188 glyphs long. The x-axis is the size of the symbol inventory, and the y-axis is the number of symbols that would appear only once in a text whose symbol frequencies follow Zipf's law.
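Data for a curve like this can be regenerated by simulation. The sketch below reports the 5th percentile, median and 95th percentile of simulated singleton counts for a sweep of inventory sizes, which shows the spread as well as the centre. Zipf exponent 1, the sweep (500 to 3000 symbols in steps of 500), the 200 trials per point and the helper name `singleton_band` are all assumptions, not the settings used for the original plot.

```python
import itertools
import random
import statistics
from collections import Counter


def singleton_band(inventory, text_length, trials=200, s=1.0, seed=0):
    """Return (5th percentile, median, 95th percentile) of simulated singleton counts."""
    rng = random.Random(seed)
    cum_weights = list(itertools.accumulate(1.0 / rank ** s for rank in range(1, inventory + 1)))
    samples = []
    for _ in range(trials):
        text = rng.choices(range(inventory), cum_weights=cum_weights, k=text_length)
        samples.append(sum(1 for count in Counter(text).values() if count == 1))
    cuts = statistics.quantiles(samples, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return cuts[0], statistics.median(samples), cuts[-1]


if __name__ == "__main__":
    for inventory in range(500, 3001, 500):
        low, mid, high = singleton_band(inventory, 1188)
        print(f"{inventory:5d} symbols: {low:6.1f} {mid:6.1f} {high:6.1f}")
```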

