Friday, February 28, 2014

Chapter divisions, and questions about a solution

Last night, I went to bed wondering about a problem in one of the images in my last post. We have (apparently) a picture of Doubting Thomas, but the accompanying scriptural reference is (tentatively) John 22:


The problem is that the story of Doubting Thomas occurs in John 20, not 22. In fact, John only has 21 chapters.

I started to hunt about for some map that would show the relationship between modern chapters, Byzantine kephalaia, and whatever other system of capitulation might be out there, but no luck. As far as I can tell from looking at manuscripts, the Byzantine book of John was divided into 19 kephalaia.

One thing led to another, and the other thing led me to the Wikipedia article on the Rohoncz codex, where I saw that many of the things I've been writing about in this series of posts have apparently been discussed by several Hungarian researchers since 2010, notably Tokai, Király and Láng.

This is great progress.  When I first looked at the RC, the predominant theories regarding it were highly implausible. I created a Yahoo group in June 2005 and posted some ideas about numbers and episodes, but became frustrated with the fact that I couldn't reliably embed images in messages. I posted sporadically after that time, but eventually put it aside, to return to it only this year.

If this has already been solved (as the Wikipedia article suggests), then I think I need to go find another mystery to work on.

Luckily I have a stack of them.

However, the Wikipedia article suggests a couple of things about the solution with which I disagree, so I will consider the RC to be partially solved until I see the full solution.

Thursday, February 27, 2014

Scriptural references in the Rohonc Codex

In my last post, I briefly mentioned a formula used to introduce chapters or episodes. In this post, I will explore the idea that part of that formula contains a scriptural reference to the source of the episode.

The following image shows the basic layout of the episodic formula:


the episodic formula

The text in red is boilerplate, generally found in most instances of the episodic formula. The text in blue represents a small set (three?) of possible non-numeric values, and the text in green is a number.

I propose that these formulae contain a scriptural reference, with the text in blue being the name of a book (usually one of the gospels) and the numbers in green being a specific chapter of the book.

Delia Huegel identifies the following as a depiction of Doubting Thomas:


The episodic formula that accompanies this picture contains the number 22. Interestingly, it looks like it is meant to be read "two and twenty", since the lower-order 2 comes before the higher-order 20. However, there are a number of strange things that happen with the numbers in these formulae, so they will definitely bear further examination.

The three main (or perhaps only) "books" mentioned in the episodic formulae are these:




Note that each of these begins with a crossed character, like the character for "nine" in reverse. Following my theory that the crossed line indicates a ligature with t, I suggest that this character represents some cognate of the word saint, which is common (I think) to all of the candidate languages.

If these are the names of three of the gospels, one possibility would be that the last two are Luke and Mark (in some order) because they both end with the same triangle character, and the first is John, because it does not share an initial with Mark (and therefore is not Matthew).

If so, then I might need to scrap the theory I put forward in my last post suggesting that the triangle and circle represented the word for "day".

More bits from the Rohonc Codex

In this post, I'll propose a few more scattered readings for Rohonc words.

The following depiction of the crucifixion is accompanied by text in which there are two characters that look like the cross itself. I have highlighted one of them, which is accompanied by a prefixed character that curves over it:


In this context, it might make sense to say that the cross-shaped character is an ideogram for the cross itself, and the curved character is the preposition "on".

Note that the preposition "on" looks like an uncrossed version of the number nine. One possible way to interpret this is to say that the preposition here is Albanian në, "on", and by crossing the line it is changed to nëntë, "nine". For this gloss, and others in this post, I'll indicate how the phrase would read in Modern Albanian, Romanian, Hungarian and Croatian, for comparison.

A: në kryq
R: pe cruce
H: a kereszten
C: na križu
on the cross

We also find the "on" character in the text accompanying a depiction of the resurrection. (Credit for identifying this scene goes to Delia Huegel, who also identified many other images in the codex).


In this case, the "on" is followed by the number "three". Since we know that Jesus rose on the third day, it would make sense that this could mean "on the third day" or "in three days":

A: në tre ditë
R: în trei zile
H: három nap alatt
C: u tri dana
in three days

A: në ditën e tretë
R: în a treia zi
H: a harmadik napon
C: trećeg dana
on the third day

If we can read the triangle and circle as "day", then that may work with the formula shown in the following image. This formula occurs at the beginning of many episodes in the text:


In this case, we could read the opening formula as beginning "one day...", an expression that is not uncommon at the beginning of a story.

This last image holds a wealth of possibility. It shows a meeting between Jesus and someone whose name or title is given above his head. This name or title shares two characters with the opening formula, including the first character for the word "day". If the underlying language is Albanian or South Slavic, we would expect the person to have a d near the end of his name. If it is Romanian, we would expect a z, and if Hungarian an n.

Tuesday, February 25, 2014

Phonological features in the Rohonc Codex

I encourage anyone interested in the Rohonc Codex to visit the sites of Delia Huegel and Marius-Adrian Oancea. Their theories and observations regarding the text are far more developed than any I could put together.

My own ideas are poor and piecemeal, utterly unworthy of a second glance. But I'm a curmudgeon, so I'll persist in wasting digital space with them. In this post, I will make a suggestion about phonological features in the codex.

To begin with, I return to my proposals for Christ and Pilate. I had suggested the following readings:

Christ

Pontius / Pilate

Graphemically, these two names (or titles) share two common features: the first is a crossed line; the second is a trailing backwards c. In Latin, these two names (or titles) share common features as well. They both have a medial t and second-declension suffixes: Chris-t-us, Pon-t-i-us, Pila-t-us.

The crossed line is reminiscent of the Latin t, and the backwards c is reminiscent of the Greek s. However, there are only so many lines you can make with a pen, so it may just be coincidence.

Another symbol with a crossed line is the number nine, which I mentioned in my first post:

nine

Naturally, I am tempted to ask which candidate languages have a t in the name of the number nine. Bearing in mind my theory that this was produced somewhere within the sphere of Ottoman influence, along the general trajectory between Venice and Rechnitz, the main candidates would be Albanian (nëntë) and the South Slavic languages (devet).

The number ten looks like a cross +, but both Albanian (dhjetë) and South Slavic (deset) have t in their words for ten. Alternately, the + could just be a rotated Roman numeral X, so it does not constitute strong evidence one way or the other.

But suppose these are ligatures containing a letter t. The next step would be to remove the t and speculate about the phonological values of the remaining letters.

Tuesday, February 18, 2014

A preamble to a computer worm

I recently downloaded the source-code for 30 Chinese computer worms and Trojan horses. The code makes for interesting reading, but the comments are all in the GB2312 character set, so I have to convert to UTF-8 in order to read them.
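For anyone who wants to do the same conversion, here is a minimal Python sketch. The function names are my own invention, and the choice to replace undecodable bytes rather than stop with an error is an assumption; files that turn out to be in a superset encoding such as GBK or GB18030 would need a different codec name.

```python
from pathlib import Path


def gb2312_bytes_to_utf8(raw: bytes) -> bytes:
    """Re-encode GB2312 bytes as UTF-8.

    errors="replace" is an assumption: bytes that are not valid GB2312
    become U+FFFD instead of raising UnicodeDecodeError.
    """
    return raw.decode("gb2312", errors="replace").encode("utf-8")


def convert_file(src: Path, dst: Path) -> None:
    """Read a GB2312 source file and write a UTF-8 copy (hypothetical helper)."""
    dst.write_bytes(gb2312_bytes_to_utf8(src.read_bytes()))


# Round-trip a short comment to show the effect.
sample = "黑客".encode("gb2312")
print(gb2312_bytes_to_utf8(sample).decode("utf-8"))
```

Plain ASCII passes through unchanged, so the C code itself is untouched and only the comments are re-encoded.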

When these things first appeared in the wild, they had a deliberate anonymity. Their original developers had given them names like Golden Pig and Chinese Vampire, and adorned their code with comments to describe and explain their effects. But before releasing them, the developers stripped them of all of their identifying and explanatory information, and sent them out into the world nameless and unexplained.

Those who discovered and analyzed them gave them new names. They disassembled their code, but they couldn't recreate the comments and non-semantic details that the original developers created.

It is interesting to look at the original source code for some of these things, for the subtle details you would not see in disassembled code. In this post, I will just give the preamble that appears at the top of one source file that is part of something the author called the Chinese Vampire, written in 2008. Reading this feels kind of like reading the mummy's curse.

Chinese Vampire Source Code
Author: God of the Black Net
After you buy the source code, please do not casually distribute it. Please treasure the product of the author's labor.
If you get lost in the code, the coding style and comments are not generally to blame. Those that I have already changed are very good, quite clear and easy to understand.

It does not use any C++, just simple C code, but edit it using VC++6.0. Once you edit it, you can use it. It has already passed hundreds of tests, so it is quite perfect, and there is no need to edit it very much.
If you can't get rid of it, contact the author and ask for a special killer.

This comment reveals a couple of interesting details about the Chinese Hacker world, at least as it was in 2008 (six years ago, now). First, the Chinese Vampire was for sale, a stock tool that could be purchased and customized. Second, there was an expectation that the author should be remunerated for his hard work.

A neat footnote on function difficulty

In my last post, I argued that the difficulty of the most difficult functions in F increases in proportion to N^2 (where F is the set of binary functions that operate on values in [0..N-1]).

I was trying to think of a way to test this conclusion for a specific case, when I remembered an earlier post where I argued that these functions can all be represented as polynomials, as long as N is prime. This gives us a perfect test-case, because polynomials can be described as programs in the context of the previous post, where the instruction set consists of addition and multiplication.

So let's take the case where N = 23. The most complex polynomial representing a function in F will have 23^2 terms. In reality, the most efficient way to evaluate the function would be to preprocess x^2..x^22 and y^2..y^22. This wasn't something I accounted for in my model, but let's run with it anyway. Preprocessing would require 22 operations for x and 22 for y.

There would be 23^2 = 529 terms in the polynomial, and for each term we would need to perform two multiplications, so 1058 multiplications. We would need 528 additions to add the terms together.

In total, that would mean 22 + 22 + 1058 + 528 = 1630 operations would be required for the most complex function in F.

How does that compare to our prediction? For N = 23 and A = 2, the lower limit for the difficulty of functions in F would be (23^2 ln 23 - ln (23+2)) / (ln 4 + ln 2 + ln (23+2)) = 312.45.

So, in this test case, it seems the conclusion is right, since 1630 > 312.45. The general case for any prime N can be easily shown, where the difficulty for this particular case is d' = 2(N-1) + 3N^2 - 1.
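The arithmetic for this test case is easy to reproduce. The following is a small sketch, with function names of my own choosing, that follows the operation tally used in this post (N-1 preprocessing operations per variable, two multiplications per term, and N^2 - 1 additions) and the lower-limit expression from the previous post.

```python
import math


def polynomial_op_count(n: int) -> int:
    """Operations needed for a full n^2-term polynomial in x and y."""
    preprocessing = 2 * (n - 1)   # powers of x and of y, tallied as in the post
    multiplications = 2 * n * n   # two multiplications for each of the n^2 terms
    additions = n * n - 1         # adding the n^2 terms together
    return preprocessing + multiplications + additions


def difficulty_lower_limit(n: int, a: int) -> float:
    """Lower limit on difficulty for N = n values and A = a instructions."""
    numerator = n * n * math.log(n) - math.log(n + 2)
    denominator = math.log(4) + math.log(a) + math.log(n + 2)
    return numerator / denominator


print(polynomial_op_count(23))                  # 1630
print(round(difficulty_lower_limit(23, 2), 2))  # 312.45
```

The first function is just d' = 2(N-1) + 3N^2 - 1 split into its three parts, so it can be tried with other primes as well.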

The other thing that is interesting about this is that, for this particular instruction set, we can say something about the makeup of the function space in terms of difficulty. There are N^(N^2) functions in the set, of which (N-1)^(N^2) fall into the "most difficult" category. In other words, difficult functions vastly outnumber easy ones, with the ratio of "most difficult" functions to all others being 1 - ((N - 1)/N)^(N^2).

Monday, February 17, 2014

Difficulty of functions

This weekend I have been thinking about abstract ways to measure how difficult it is to evaluate functions given a hypothetical computer with a specific instruction set.

In particular, I want to see if I can find a lower limit of difficulty for the most difficult functions of a particular type. I thought I had it nailed last night, but as soon as I went to bed I realized I was wrong.

Imagine a computer with the following properties:

1. The computer's only data type is an integer in the set [0..N-1], called an int.

2. The computer's programs represent functions that take two ints and return an int. The set of all such functions is F.

3. The computer has an instruction set composed of A <= N+2 functions drawn from F. The computer can easily evaluate every function in its instruction set.

4. The computer's programs are finite function trees, so each program has a root function drawn from the instruction set, which takes two parameters. The two parameters of the root function may be function trees, constant ints, or the special symbols x and y which evaluate to the program's two input parameters. There is no recursion.

5. The difficulty of a program is the average number of operations required to evaluate the program for any possible input values x and y.

For simplicity, we initially assume that there is no lazy evaluation.
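To make the model concrete, here is one way the program trees could be sketched in Python. Everything named here (Node, evaluate, the choice of N = 5, and the two sample instructions) is an illustrative assumption rather than part of the definition above.

```python
from dataclasses import dataclass
from typing import Callable, Union

N = 5  # illustrative size of the int type [0..N-1]

Instruction = Callable[[int, int], int]


@dataclass(frozen=True)
class Node:
    """One instruction applied to two sub-programs."""
    op: Instruction
    left: "Tree"
    right: "Tree"


# A tree is an instruction node, a constant int, or the symbol "x" or "y".
Tree = Union[Node, int, str]


def evaluate(tree: Tree, x: int, y: int) -> tuple[int, int]:
    """Return (value, operations performed), with no lazy evaluation."""
    if isinstance(tree, Node):
        left_value, left_ops = evaluate(tree.left, x, y)
        right_value, right_ops = evaluate(tree.right, x, y)
        return tree.op(left_value, right_value), left_ops + right_ops + 1
    if tree == "x":
        return x, 0
    if tree == "y":
        return y, 0
    return tree, 0  # a constant int


# Two sample instructions, assumed for illustration only.
def add(a: int, b: int) -> int:
    return (a + b) % N


def mul(a: int, b: int) -> int:
    return (a * b) % N


# The program add(mul(x, x), y) uses two instructions.
program = Node(add, Node(mul, "x", "x"), "y")
print(evaluate(program, 3, 4))  # (3, 2)
```

Because every branch is always evaluated, the operation count is the same for every input, which is why difficulty reduces to the number of instructions in this setting.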

Given a computer like this, can we put a lower limit on the difficulty of the most difficult function in F?

If there is no lazy evaluation, then the difficulty of a function is proportional to the number of instructions. Given that there are N^(N^2) functions in F, if you could write maximally efficient programs to represent each of those functions, how big would the largest such efficient program need to be?

There are p_0 = N+2 distinct programs that could be written with zero instructions. These would be the programs that returned a constant value, or else x or y with no modification.

The number of distinct programs that can be written with one instruction is p_1 = A*(p_0 * p_0) = A(N+2)^2. That is, there are A possible instructions, which must take a program with zero instructions for each parameter.

The number of distinct programs that can be written with two instructions is p_2 = A*(p_0 * p_1 + p_1 * p_0) = 2A^2(N+2)^3. That is, the first parameter may be a zero-instruction tree, in which case the second parameter must be a 1-instruction tree, or vice-versa.

The number of distinct programs that can be written with three instructions is p_3 = A*(p_0 * p_2 + p_1 * p_1 + p_2 * p_0) = 5A^3(N+2)^4. The progression we are seeing here is that p_n = s_n A^n (N+2)^(n+1), where s_n is the number of different trees that can be formed with n binary functions.

There will be a lot of overlap in the program space, meaning there will be multiple programs that can evaluate to the same function. This means we can't say that n instructions can always represent p_n functions, but we can say that they will represent no more than p_n functions. Thus, for p_n = |F| = N^(N^2), we can be certain that the most complex program representing a function in F can be no smaller than n instructions.
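The counting argument above can be checked mechanically. The sketch below builds the program counts from the recurrence (a root instruction with the remaining instructions split between its two parameters) and compares them to the closed form, taking the tree counts to be the Catalan numbers 1, 1, 2, 5, 14, ... The function names and the sample values A = 2, N = 23 are my own choices.

```python
from math import comb


def program_counts(a: int, n: int, max_instructions: int) -> list[int]:
    """p[k] = number of distinct program trees with exactly k instructions."""
    p = [n + 2]  # zero instructions: the n constants plus x and y
    for k in range(1, max_instructions + 1):
        # Root instruction (a choices), i instructions on the left,
        # and the remaining k - 1 - i on the right.
        p.append(a * sum(p[i] * p[k - 1 - i] for i in range(k)))
    return p


def closed_form(a: int, n: int, k: int) -> int:
    """Catalan(k) * a^k * (n + 2)^(k + 1)."""
    catalan = comb(2 * k, k) // (k + 1)
    return catalan * a**k * (n + 2) ** (k + 1)


counts = program_counts(2, 23, 6)
print(counts[:4])  # [25, 1250, 125000, 15625000]
print(all(counts[k] == closed_form(2, 23, k) for k in range(7)))  # True
```

This only counts trees, of course. It says nothing about how many distinct functions those trees evaluate to, which is exactly the overlap problem.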

So the lower limit on the difficulty of the most difficult function in F is calculated as follows:

p_d > N^(N^2).
s_d A^d (N+2)^(d+1) > N^(N^2).

For large values of d, I think 3^d < s_d < 4^d. This needs to be proven, though. If it is true, we could fudge a bit and say

4^d A^d (N+2)^(d+1) > N^(N^2).
d ln 4 + d ln A + d ln (N+2) + ln (N+2) > N^2 ln N.
d ln 4 + d ln A + d ln (N+2) > N^2 ln N - ln (N+2).
d > (N^2 ln N - ln (N+2)) / (ln 4 + ln A + ln (N+2)).
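The guess about the size of the tree counts can at least be spot-checked. Assuming the tree counts are the Catalan numbers (which matches the 1, 2, 5 seen so far), this sketch looks for the first d where the count exceeds 3 to the power d and then tests both bounds over a range. It is a numerical sanity check, not a proof, and the function names are mine.

```python
from math import comb


def catalan(d: int) -> int:
    """Number of binary trees with d internal (instruction) nodes."""
    return comb(2 * d, d) // (d + 1)


def first_d_exceeding_three_power(limit: int = 100):
    """Smallest d below limit with 3^d < catalan(d), or None if there is none."""
    for d in range(limit):
        if 3**d < catalan(d):
            return d
    return None


def bounds_hold(d_min: int, d_max: int) -> bool:
    """Check 3^d < catalan(d) < 4^d for every d in [d_min, d_max]."""
    return all(3**d < catalan(d) < 4**d for d in range(d_min, d_max + 1))


print(first_d_exceeding_three_power())  # 17
print(bounds_hold(17, 200))             # True
```

So "large" appears to mean d of 17 or more for the lower bound, while the upper bound already holds from d = 1.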

Of course, there is a lot more to gnaw on here, like operation laziness and pre-evaluation, which might be real game-changers. But as we have it now, the difficulty of the most difficult functions in F increases in proportion to N^2.

Tuesday, February 11, 2014

A brief history of Chinese hacking: Part III

(The following draws extensively from an online text titled "The Record of X on the Rise of the Chinese Hacker", supplemented from other sources.)

In the last two posts, I have mentioned two galvanizing events for the Red Hacker movement: Violence against ethnic Chinese in Indonesia; and the NATO bombing of the Chinese embassy in Belgrade.

Two months after the bombing of the embassy in Belgrade, the government of Taiwan announced a 'Two States' policy, which undermined the long-held idea that China and Taiwan were a single country suffering a temporary disunion. Seasoned by the 1998 action against Indonesia and the May 1999 action against the United States, the Red Hacker apparatus was ready to turn and defend the honor of the motherland on the battlefield of Taiwan's networks.

They attacked the website of the Executive Yuan of Taiwan, as well as many other websites, deploying newly developed tools like Glacier (冰河, a trojan horse) for the first time, and NetSpy (a tool for uploading and downloading files from a server, apparently).

In 2000, the number of internet cafes mushroomed, and the hacker spectrum broadened. The old Black Hackers were still around, but the ready availability of technology led to a large number of careless, headstrong and unskilled teenagers pursuing the black hacker path. These "script kiddies" were nicknamed the Little Blacks (小黑黑) by an influential female hacker of the time named Wollf.

Alongside the Black and Red hackers, there also arose Blue Hackers (蓝客, lán kè), who were relatively unconcerned with cheap tricks and politics, and intensely passionate about computer security.

In 2001, after the South China Sea collision incident, a small American hacker group called PoizonBOx defaced at least a hundred Chinese websites, and reportedly 80,000 Chinese hackers returned fire beginning on May 4. Most of these were unskilled script kiddies, so the damage done did not reflect their large numbers, and some considered the action to be a farce. As far as I can tell, 100-600 websites were vandalized, and the White House website suffered a DOS attack that blocked access from May 4 to May 8.

In the years between 2000 and 2002, Chinese hackers created and released the Code Red, Code Blue and nimda computer worms. But many also undertook a serious discussion of the ethical dimensions of hacking, and of hacking culture. They began to discover and publish their own findings on network and software vulnerabilities, which have been picked up by international security research organizations.

Sunday, February 9, 2014

A brief history of Chinese hacking: Part II

(The following draws extensively from an online text titled "The Record of X on the Rise of the Chinese Hacker", supplemented from other sources.)

I ended the last post with the emergence of the Chinese hacktivist alliance in response to violence against ethnic Chinese in Indonesia in 1998. This era also saw the emergence of the Green Corps and the Chinese Green League. (I'm not sure what the significance of the color "green" is in these names, but I wonder if it doesn't relate to the color of CRT screens).

Webpages discussing the technical details of hacking began to proliferate, and Chinese hackers eagerly undertook to study the relevant technologies. The most famous hacker of this period may have been Xiǎo Róng (小榕), creator of tools like Stream of Light (流光, a vulnerability scanner), Tracing Snow (溯雪, a password cracker) and Chaos Knife (乱刀).

1999 saw a dramatic increase in the number of internet users in China, and it also saw the NATO bombing of the Chinese embassy in Belgrade, which many Chinese saw as a deliberate act of retribution on the part of the United States for China's criticism of NATO action in Yugoslavia.

The second day after the bombing of the Chinese embassy in Belgrade, the first Red Hacker website was born, initially called the Chinese Hacker's Rallying Point for the Motherland (中国红客之祖国团结阵线), and later renamed the Chinese Hacker's United Front for the Motherland (中国红客之祖国统一战线).

This site drew intense interest from Chinese citizens around the world, and the Red Hackers carried out widespread attacks on American websites and email servers.

Hacking tools created in this period included NetSpy (inspired by Cult of the Dead Cow's Back Orifice), Glacier (冰河, a trojan horse), Black Hole (黑洞), Network Thief (网络神偷), Gray Dove (灰鸽子), XSan and YAI.

Glacier, Black Hole and Network Thief are still considered by many to be essential tools for the Chinese hacker. "Official" development of Glacier has ceased, but users have forked off many versions of their own.


A brief history of Chinese hacking: Part I

(The following draws extensively from an online text titled "The Record of X on the Rise of the Chinese Hacker", supplemented from other sources.)

China's earliest online community arose in the mid-1990s, with a small number of people using PCs and dial-ups to interact with each other on bulletin-board systems. Between 1994 and 1996, BBS servers proliferated in major Chinese cities, and interest in copying software and breaking license controls on software also grew, creating the first generation of Chinese hackers.

Internet access came to China in 1996, and the BBS culture moved from dial-ups and isolated servers to the internet. It is interesting to me that the BBS format is incredibly prevalent on Chinese websites today, while it has been basically replaced by social networks in America. It was during this period that a man named Gao Chunhui created the first personal website in China, and it is said that his personal site at that time was dedicated to the topic of breaking software registration controls.

This era also saw a brief period of phreaking (电话飞客), but advances in telecom technology rapidly put an end to that.

In 1998, a Taiwanese student named Chen Ing-Hau released the Chernobyl virus, which caused billions in economic damage in mainland China. Because the author was a Taiwanese student, some Chinese users perceived the damage done by the Chernobyl virus as a politically motivated attack.

Also in 1998, amid the deepening Asian Financial Crisis, there was widespread violence against ethnic Chinese in Indonesia. Chinese internet users formed teams that flooded Indonesian government email accounts, and they tried to bring down Indonesian websites with ping-based DOS attacks. In order to coordinate these attacks, a group was formed called the Chinese Hacker Emergency Meeting Center (中国黑客紧急会议中心). This might be considered the first Chinese hacktivist alliance.

So, from the very beginning, Chinese hacking has been closely tied to nationalist sentiments.


Friday, February 7, 2014

The Chinese Hacker's Code

In 1984, Steven Levy suggested that there was a commonly understood but unwritten [American/European] "hacker code of ethics" that encompassed the values of sharing, openness, decentralization, free access to computers, and world improvement.

On many Chinese hacker sites I have found a written code of conduct, which is attributed to an influential Taiwanese hacker named CoolFire, who has his roots in the computer culture of the late 1990s. I will present that code of conduct below, but first I want to write out something about the connotations of the word "hacker" in Chinese.

The most common word for "hacker" in Chinese is 黑客, hēikè, derived phonetically from the English word "hacker". These two characters literally mean "black guest", which I think is a great way to describe a hacker's presence on your system. Unlike the English word "hacker", however, the Chinese hēikè seems to have a less negative, perhaps more ambiguous connotation.

The less common word is 骇客, hàikè, also derived phonetically from English "hacker", but with a literal meaning of "terrifying guest". This seems to be a more negative term, maybe more like cyber-criminal.

There is a strong association between hacking (hēikè) and patriotism in China, dating back to the earliest organizations and activities of hackers in the 1990s. This has given rise to another term, 红客, hóngkè, meaning "red guest". This is sometimes translated as "honker", but I'll render it as Red Hacker for now. (Not only does "honker" also mean someone from Hong Kong, but it sounds pejorative to me.)

Without further ado, here is a composite of the Chinese Hacker Code, drawn from several similar versions.

1. Do not sabotage any system. It will only bring you trouble.

2. Do not modify any system files. If you must do so to access a system, please restore them to their original state after you are done.

3. Do not casually hack a website and then tell friends whom you do not trust.

4. Do not talk about what you have hacked in a BBS or forum.

5. Do not use your real name when you post an article.

6. Do not leave your computer while you are actively engaged in invasion.

7. Do not invade or attack telecom/government organization servers.

8. Do not talk about what you have hacked over the phone.

9. Keep your notes in a safe place.

10. Read everything related to system security or vulnerabilities (learn English quickly!)

11. Do not delete or alter accounts on the systems you invade.

12. Do not modify system files, unless it is necessary to conceal your intrusion. In any case, maintain the security of the system, do not invade and disable the original security.

13. Do not share the accounts you have cracked with your friends.

14. Do not invade or destroy government organization servers.

15. If you can't program you can't be a good hacker.

16. Hackers are not "pirates".


Thursday, February 6, 2014

Motherlode

Last month, I idled away some hours trying to get some idea of the Chinese hacker culture. I assumed that, like the hackers I knew when I was younger, Chinese hackers would have some kind of specialized vocabulary, like a Chinese version of 1337. I figured if I could find a few terms in that specialized vocabulary, I might be able to do some narrow internet searches that would give me a general outline of the Chinese hacker world.

It didn't really work. I did get a really interesting look into how the Chinese government is handling cybersecurity, but I never found anything that really looked like a hacker site.

Today, by chance, I hit the motherlode. I found the type of site I was looking for, and the wealth of information available is a bit overwhelming. My experience with Chinese government sites has been that, after I access them a few times, they may drop off the net, especially if they are very interesting. I fear I won't have enough time to learn what I want to learn before this site goes away too.

If my luck holds, I will soon have much new and interesting information to blog about.

Monday, February 3, 2014

The proper names of the saha islands

In a couple of recent posts, I've been gnawing on the names of some islands at the mouth of the Tumen river, where the word saha appears to mean "island".

Initially I was working from European maps, which alternated between saba and saha, and I had thought that the word must be saba.

However, today I remembered a volume at the Bibliothèque nationale de France containing a long list of Manchu place names. I skimmed through it, and managed to find the page shown below, which now allows me to give the proper names for the islands, as transcribed from Manchu script.

The islands listed on d'Anville's map are (from South to North) Taitou saha, Siské loun, Tayam ou saha, Sarbatchou saba, Mama saba, and Youanga toun. The names, as transcribed from the page below in Manchu script, are Daidu saha, Sishe tun, Dayanggū saha, Sarbacu saha, Mama saha and Yohangga tun (not shown on this page, but on a later page).

Also listed on this page is a river called Fiya bira (bira meaning "river"), which bears a passing resemblance to the ethnic name Fiyaka.


Borrowing associative and commutative properties

I was trying to come up with a set of functions that had associative and commutative properties last week, and I found a way to "borrow" those properties from addition.

Start out with a function f(x) in Z_N that acts as a permutation of the elements [0..N-1]. Since f(x) is a permutation, there exists the inverse f^-1(x).

From these, generate a function F(x, y) = f(f^-1(x) + f^-1(y)). The new binary function F(x, y) has associative and commutative properties, which it has borrowed from addition. The commutative property is obvious, because f(f^-1(x) + f^-1(y)) = f(f^-1(y) + f^-1(x)), so F(x, y) = F(y, x). The associative property works like this:

F(F(x, y), z) = F(x, F(y, z))

Rewrite in terms of f(x) and f^-1(x):

f(f^-1(f(f^-1(x) + f^-1(y))) + f^-1(z)) = f(f^-1(x) + f^-1(f(f^-1(y) + f^-1(z))))

Since f^-1(f(x)) = x, we can simplify both sides as follows:

f(f^-1(x) + f^-1(y) + f^-1(z)) = f(f^-1(x) + f^-1(y) + f^-1(z))

The number of these functions is N!, i.e. the number of permutations of [0..N-1]. These functions have a zero value i (namely i = f(0)), such that F(x, i) = x. Since F is associative, for simplicity we can write F(x, y, z) for either F(F(x, y), z) or F(x, F(y, z)).

Suppose you try to build a multiplication-like function G from F, such that, for example, G(x, 4) = F(x, x, x, x). The resulting function is simply G(x, y) = f(y * f^-1(x)). Interestingly, this multiplication-like function is not guaranteed to be associative or commutative. In fact, it ends up being more like an exponent than multiplication.
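Here is a small brute-force sketch of the construction, in case anyone wants to play with it. The permutation [2, 0, 3, 1, 4] on N = 5 and all of the function names are arbitrary choices of mine; any permutation of [0..N-1] should behave the same way for F.

```python
from itertools import product


def make_borrowed_ops(perm: list[int]):
    """Build F and G from a permutation f of [0..N-1], given as a list."""
    n = len(perm)
    inverse = [0] * n
    for i, value in enumerate(perm):
        inverse[value] = i  # inverse[f(i)] = i

    def F(x: int, y: int) -> int:
        # Addition borrowed through f: f(f^-1(x) + f^-1(y)) in Z_N.
        return perm[(inverse[x] + inverse[y]) % n]

    def G(x: int, y: int) -> int:
        # F applied to y copies of x: f(y * f^-1(x)) in Z_N.
        return perm[(y * inverse[x]) % n]

    return F, G


def is_commutative(op, n: int) -> bool:
    return all(op(x, y) == op(y, x) for x, y in product(range(n), repeat=2))


def is_associative(op, n: int) -> bool:
    return all(
        op(op(x, y), z) == op(x, op(y, z))
        for x, y, z in product(range(n), repeat=3)
    )


perm = [2, 0, 3, 1, 4]  # an arbitrary example permutation, f(0) = 2
F, G = make_borrowed_ops(perm)
print(is_commutative(F, 5), is_associative(F, 5))  # True True
print(all(F(x, perm[0]) == x for x in range(5)))   # True
print(is_commutative(G, 5))                        # False
```

For this permutation G(0, 1) = 0 while G(1, 0) = 2, which is the kind of asymmetry you would expect from something exponent-like.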

If someone someday figures out how to calculate the discrete logarithm in polynomial time, and thereby break a number of existing cryptographic schemes, I wonder if other exponent-like functions could prove to be less easy to break.