Tuesday, August 8, 2017

How many symbols are in a script?

Given Zipf's law, it seems like you should be able to estimate the number of symbols in a script from the number of symbols that occur only once in a sample of text.

I worked out the math, but the code I've written to do the calculation is very slow when it comes to scripts with a large number of symbols (like Debosnys' cipher), so I also wrote a Monte Carlo simulation that can come up with an approximate answer much more quickly.

The Debosnys ciphers contain 1188 glyphs, of which 277 occur only once. For a text like this we would expect a total glyph inventory of around 1500 symbols.

Here is what the distribution looks like for texts of 1188 symbols. The x-axis is the inventory of symbols, and the y-axis is the number that would appear only once in a distribution that conforms to Zipf's law.
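The following is a minimal sketch of this kind of estimate, not the code used for this post. It assumes a plain Zipf distribution with exponent 1 over a fixed inventory, and all function names are hypothetical. The simulation draws random texts and counts the symbols that appear exactly once. The closed-form function gives only the expected count (each symbol with probability p contributes n * p * (1 - p)^(n - 1)) and says nothing about the spread around it.

```python
import random
from collections import Counter


def zipf_weights(inventory, exponent=1.0):
    """Unnormalized Zipf weights 1/rank**exponent for ranks 1..inventory."""
    return [1.0 / rank ** exponent for rank in range(1, inventory + 1)]


def simulate_hapax_count(inventory, text_length, rng, exponent=1.0):
    """Draw one random text and count the symbols that occur exactly once."""
    weights = zipf_weights(inventory, exponent)
    sample = rng.choices(range(inventory), weights=weights, k=text_length)
    counts = Counter(sample)
    return sum(1 for c in counts.values() if c == 1)


def mean_hapax_count(inventory, text_length, trials=200, seed=0, exponent=1.0):
    """Monte Carlo average of the once-only count over several random texts."""
    rng = random.Random(seed)
    total = sum(
        simulate_hapax_count(inventory, text_length, rng, exponent)
        for _ in range(trials)
    )
    return total / trials


def expected_hapax_count(inventory, text_length, exponent=1.0):
    """Expected once-only count: sum of n * p * (1 - p)**(n - 1) over symbols."""
    weights = zipf_weights(inventory, exponent)
    norm = sum(weights)
    n = text_length
    return sum(n * (w / norm) * (1.0 - w / norm) ** (n - 1) for w in weights)


def estimate_inventory(observed_hapaxes, text_length, candidates):
    """Pick the candidate inventory whose expected count is closest to the data."""
    return min(
        candidates,
        key=lambda v: abs(expected_hapax_count(v, text_length) - observed_hapaxes),
    )


if __name__ == "__main__":
    # Figures from the post: 1188 glyphs, 277 of them occurring once.
    guess = estimate_inventory(277, 1188, range(500, 3001, 50))
    print("closest candidate inventory:", guess)
    print("expected once-only count:", expected_hapax_count(guess, 1188))
    print("simulated once-only count:", mean_hapax_count(guess, 1188))
```

The candidate grid (500 to 3000 in steps of 50) is arbitrary. A different Zipf exponent would shift the estimate, so the exponent is left as a parameter.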


Monday, August 7, 2017

Another note on N-Glyphs

To test the hypothesis that the subglyph N represents nasalization of a vowel, I looked at the frequency of nasalized syllables in Baudelaire's Fleurs du Mal and compared it to the frequency of N subglyphs in the cipher poem.

I have a copy of Fleurs du Mal containing 3182 Alexandrine lines. (One poem in this copy is not an Alexandrine.) Among these lines, I count 6536 nasalized syllables, for an average of 2.05 nasalized syllables per line.

If the Debosnys cipher poem is a French Alexandrine, and the N subglyph represents nasalization (and is the only representation of nasalization), then we should expect to find a similar distribution of N subglyphs in the cipher poem.

Of the 20 lines of the cipher poem, I counted a total of 30 n-glyphs, so an average of 1.5 n-glyphs per line. More specifically, the number of n-glyphs per line was distributed as follows:

0 n-glyphs: 2 lines, 10%
1 n-glyph: 6 lines, 30%
2 n-glyphs: 9 lines, 45%
3 n-glyphs: 2 lines, 10%

Among the Alexandrine lines of Fleurs du Mal we have the following distribution:

0 nasalized vowels: 366 lines, 11.5%
1 nasalized vowel: 816 lines, 25.6%
2 nasalized vowels: 897 lines, 28.2%
3 nasalized vowels: 658 lines, 20.7%
4 nasalized vowels: 312 lines, 9.8%
5 nasalized vowels: 101 lines, 3.2%
6-7 nasalized vowels: 32 lines, 1.0%

This looks like a promising match, but obviously more work needs to be done.
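The two tallies above can be put side by side with a few lines of Python. This is only a sketch with hypothetical names. The counts are copied from the lists above. Because the cipher tallies as listed add up to 19 lines while the poem has 20, the function takes the line total as an explicit argument instead of assuming it.

```python
# Lines per number of nasalized vowels in the Alexandrines of Fleurs du Mal.
fleurs_tally = {"0": 366, "1": 816, "2": 897, "3": 658,
                "4": 312, "5": 101, "6-7": 32}

# Lines per number of n-glyphs in the cipher poem, as listed above.
cipher_tally = {"0": 2, "1": 6, "2": 9, "3": 2}


def percentages(tally, total_lines=None):
    """Convert line counts to percentages of total_lines (default: their sum)."""
    total = total_lines if total_lines is not None else sum(tally.values())
    return {label: 100.0 * count / total for label, count in tally.items()}


def print_comparison(reference, sample, sample_total):
    """Print both distributions, one row per category of the reference tally."""
    ref_pct = percentages(reference)
    sample_pct = percentages(sample, total_lines=sample_total)
    for label in reference:
        print(f"{label:>4}: reference {ref_pct[label]:5.1f}%  "
              f"cipher {sample_pct.get(label, 0.0):5.1f}%")


if __name__ == "__main__":
    print_comparison(fleurs_tally, cipher_tally, sample_total=20)
```

With only 20 cipher lines, each line is worth five percentage points, so small differences between the two columns should not be over-read.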

A look at N-Glyphs

In this post I'll take a look at a single class of Debosnys glyphs that I call "N-Glyphs", in hopes of ferreting out some details on how the cipher works.

N-glyphs are characterized by having a subglyph at the top that looks like a tilde (~), which I transliterate as N. The N subglyph has the following properties:
  • It cannot occur on its own, but only in combination with other subglyphs
  • It can only occur at the top of a glyph, or else directly under another N subglyph
  • Though N cannot occur on its own, N.N frequently occurs on its own
The following are examples of all of the types of N-glyphs that I have identified:


If we assume that these glyphs represent syllables, then the observed properties of the N subglyph may give us a clue as to what it represents.

The greatest challenge is to explain why N cannot occur on its own, but N.N can. However, many subglyphs can occur as pairs, and I think it is possible that pairs such as N.N, I.I and O.O may represent different subglyphs from the corresponding singles N, I and O. If we can accept that explanation, then two possibilities suggest themselves:

1. N is a consonant that can only occur in syllable-initial position.
  • It cannot occur on its own because it must be accompanied by a vowel to make a syllable
  • It can only occur at the top of a glyph because it is a consonant (such as French b or d) that can only occur in syllable-initial position.
2. N is a marker of a vocalic feature.
  • It cannot occur on its own because it is a feature of another subglyph (in this case a vowel) which must be present.
  • It only occurs at the top of a glyph because it is used as a suprasegmental mark
At the moment I'm favoring the idea that the N is a marker of nasalization, directly influenced by the use of the tilde in certain languages as a suprasegmental mark of nasalization (e.g. ã, ẽ, ĩ, õ, ũ). To test this theory, I will look at the frequency of the N subglyph in the cipher poem, and compare it to the frequency of nasalized syllables in a large set of French Alexandrine lines.

Friday, August 4, 2017

Debosnys Cipher Transcription Revision

My initial transcription of the Debosnys cipher texts allocated one transcription to one glyph, where I have defined a glyph as a cluster of graphemes bounded to the left and right by white space. So, for example, the "signature" line is analyzed as six glyphs:

 C2B2 XP NU ZOO OM2N SHI

With the Debosnys material, this yields a text of 1188 instances of 425 glyphs. That means a lot of the text will consist of glyphs that only occur once, which makes contextual analysis difficult. I thought it would be useful to be able to do some analysis on deconstructed glyphs as well. So I created a second transcription that looks at the internal structure of the glyphs:

 <C2 B2> <X DOT> <N U> <O Z O> <O2RNO> <CROSSB>
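The counts discussed above (instances, distinct glyphs, and glyphs that occur only once) could be computed from a whitespace-delimited transcription along these lines. This is a sketch with hypothetical names, and it assumes the first transcription is stored as plain text with one label per glyph.

```python
from collections import Counter


def glyph_statistics(transcription):
    """Count glyph instances, distinct glyphs, and glyphs occurring once."""
    tokens = transcription.split()  # one label per whitespace-bounded glyph
    counts = Counter(tokens)
    once_only = [glyph for glyph, n in counts.items() if n == 1]
    return {
        "instances": len(tokens),
        "distinct": len(counts),
        "once_only": len(once_only),
    }


if __name__ == "__main__":
    # The "signature" line from the first transcription.
    print(glyph_statistics("C2B2 XP NU ZOO OM2N SHI"))
```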

There is an order to the internal structures of glyphs. For example, using Backus-Naur form, you could describe a whole set of glyphs as follows:

<n-glyph> ::= N <n-tail>
<n-tail> ::= <n-medial> | <n-medial> <n-final>
<n-medial> ::= N | U | X | O
<n-final> ::= X | O
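As an illustration, the grammar above can be turned into a small recognizer. This is a sketch with hypothetical names. It assumes a glyph is written as its subglyph labels joined by dots, top to bottom, as in N.N.

```python
N_MEDIAL = {"N", "U", "X", "O"}  # <n-medial>
N_FINAL = {"X", "O"}             # <n-final>


def is_n_glyph(subglyphs):
    """Return True if the subglyph sequence matches <n-glyph>."""
    if len(subglyphs) not in (2, 3):
        return False  # N plus a medial, optionally followed by a final
    if subglyphs[0] != "N":
        return False
    if subglyphs[1] not in N_MEDIAL:
        return False
    if len(subglyphs) == 3 and subglyphs[2] not in N_FINAL:
        return False
    return True


def parse_glyph(text):
    """Split a dotted transliteration such as 'N.O.X' into subglyph labels."""
    return text.split(".")


if __name__ == "__main__":
    for glyph in ["N.N", "N.U", "N.O.X", "N", "N.O.N", "O.Z.O"]:
        print(glyph, is_n_glyph(parse_glyph(glyph)))
```

Note that the grammar accepts N.N but rejects a bare N, which matches the observation that N does not occur on its own.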

I am currently exploring the idea that these structures correspond to syllable structures, with subglyphs representing letters or phonemes.

The distribution of subglyphs follows Zipf's law, with the subglyph O being most common. In French, the most common letter is e, and there is a favorable comparison between the frequency of the O subglyph in the cipher poem and the vowel e in a comparable number of lines of Baudelaire's poetry.
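A rank-frequency table for subglyphs could be built from the second transcription roughly as follows. This is a sketch with hypothetical names. It assumes glyphs are wrapped in angle brackets with subglyph labels separated by spaces, and it treats an unspaced label such as CROSSB as a single unit, which may not match the intended analysis.

```python
import re
from collections import Counter


def subglyph_counts(transcription):
    """Count subglyph labels inside <...> groups of the second transcription."""
    counts = Counter()
    for body in re.findall(r"<([^<>]*)>", transcription):
        counts.update(body.split())
    return counts


def rank_frequency(counts):
    """Return (rank, label, frequency) rows, most frequent subglyph first."""
    return [
        (rank, label, freq)
        for rank, (label, freq) in enumerate(counts.most_common(), start=1)
    ]


if __name__ == "__main__":
    # The "signature" line from the second transcription.
    line = "<C2 B2> <X DOT> <N U> <O Z O> <O2RNO> <CROSSB>"
    for row in rank_frequency(subglyph_counts(line)):
        print(row)
```

Under Zipf's law the frequency at rank r should fall off roughly as 1/r, so plotting these rows on log-log axes is a quick visual check.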

More on that when I have time.