Saturday, November 30, 2013

Functions with certain properties

I've been trying to figure out the general answer to this question. It seems like I should be able to sort it out, but I've just got a mental block.

For a set of values N, how many functions f exist, such that the following are true? (Forgive me if I make mistakes with the notation...I'm not a mathematician.)

f (a, b), c) = (a, c), b)


g : g (a, f (a, b)) = b

where a, b, c ← N

I've been able to determine the answer through brute-force exploration of the function space for the following sizes of N:

For |N| = 2, there are 2 functions with these properties.
For |N| = 3, there are 6 functions with these properties.
For |N| = 4, there are 96 functions with these properties.

It seems like it ought to be easy to say, for a given size |N|, how many of these functions there are. But I can't see it yet.

Note 12/3/2013: I figured it out.

Monday, November 25, 2013

Structure of political events

I've off and on been trying to find ways to create computational models of political events in order to predict their outcomes. I've started with an intuitive sense that events evolve in a structured way, and if we can predict the probability of the smaller steps in that evolution, we can predict the total outcome of the event.

For example, consider the question of whether Afghanistan will approve the bilateral security agreement offered by the United States. The Loya Jirga (assembly of elders) has just approved the agreement, and urged Karzai to sign it before the end of the year. In order for Karzai to sign it, the following internal events must occur:

E1: The Afghan National Assembly must approve the BSA. This will no doubt be influenced by the fact that the Loya Jirga has already approved it, since the Afghan constitution declares the Loya Jirga to be the highest expression of the will of the Afghan people. However, there is some question of whether the Loya Jirga was properly assembled according to the constitution, and some feeling that Karzai convened the Loya Jirga because he thought he could influence its decision more than he could influence the National Assembly directly. Let's say the odds of this are something like 0.9.

E2: Karzai must choose to sign the BSA before the end of the year. Currently he is calling for it to be signed next year.

Intuitively, the outcome of the total event should be something like:

PE1 * PE2

Where PE is the probability that event E will occur.

I do not know specifically why Karzai is holding out on signing the BSA this year. In the absence of knowledge, it is tempting to assign a 0.5 probability to PE2, giving final odds of 0.9 * 0.5 = 0.45. However, my recent experience is showing that there is another way to look at these questions.

In the most general description, event E2 is an irreversible state change. Once Karzai chooses to approve the BSA, he cannot (in a practical sense) reverse the approval and return to his original state. He could make his decision on any day, so PE2 should really decay towards zero according to a formula like the following:

PE2 = 1 - (1 - pE2)d

Where pE2 represents the odds on any given day that Karzai will change his mind, and d is the number of days remaining before the end of the year.

If delaying the BSA is good in itself for Karzai, then he will never change his mind, so we could say that pE2 is zero and so likewise is PE2. That might be the case if Karzai believes he could be charged with treason, assassinated, or otherwise subject to persecution/prosecution for signing the BSA.

On the other hand, Karzai may be threatening to delay signature of the BSA in order to extract some concession from the United States. In that case, for him to fail to sign the BSA by the end of the year signifies the failure of his gambit. The odds of his signature in that case are calculated very differently, so we should probably think about this as two separate events:

E2a: Karzai signs the BSA even though he believes he could be persecuted as a result.

E2b: Karzai signs the BSA after receiving a concession from the United States.

In order to calculate E2 from these two values, we need to decide the odds that either or both are true. Let's say there is are 0.1 odds that Karzai fears serious persecution and 0.9 odds that he is trying to wring out a concession:

PE2 = 0.1 PE2a + 0.9 PE2b

As the end of the year approaches, the value of Karzai's signature drops, so the value of what he expects in return should decrease.  Meanwhile, the value of what the US offers should gradually increase until it meets the value of what Karzai has to offer. If both players know each other well, then they have already calculated that they will reach an agreement before the end of the year, and the only question is whether they are wrong. In that case, PE2b should depend on the odds that the two players have correctly estimated each other's thresholds for negotiation.

PE2b = 1 - (1 - CE2b)d

Where CE2b represents the odds that the two parties have correctly assessed each other's positions and can reach an agreement on any given day.

So, if we estimate 0.9 odds that the National Assembly will concur with the decision of the Loya Jirga, and 0.8 odds that the two players have correctly estimated each other's positions, then the total likelihood of a timely signature is:

PEtotal = PE1 * (0.1 PE2a + 0.9 PE2b)
PEtotal = 0.9 * (0.1 * 0.0 + 0.9 * (1 - (1 - 0.8))d)

Under that model, the odds of Karzai signing the BSA hover at about 0.81 till the last three days of December, when they suddenly plummet to zero.



One of the things that bedevils me here, though, is the unknown intermediate steps. If I have time, I think I would like to see what kind of behavior emerges if I simulate situations where there are thousands of dependent steps and thousands of possible routes to a particular outcome. Do complex networks of interrelated events conform to a different set of rules en masse?

(Note added 12/17/2013: The Karzai example here does not work if, for example Karzai is engaged in negotiations with Iran. The reason is that the settled state of negotiations with the US gives Karzai the opportunity to develop other options and choose between them. So maybe we should expect that a less-than-satisfactory settled negotiation will usually stall while other options are developed.)

Sunday, November 10, 2013

A chartreuse analogy

This is a quick follow-up on the last post, where I sketched an idea for a functional programming language where the lambda was implicit.

First, let me repudiate the whole idea of the unresolve metafunction. There's really no need for it, and it just makes things messy. I'm going to go back and cross that bit out.

Second, it might have occurred to you that, if the lambda is implicit in the way I described, then there is no distinction between values and expressions that take no parameters. At first, that seems like a problem, but I don't think it is. In a pure functional context, what is really the difference?

Of course, even the purest functional language must eventually sully itself by making contact with this sordid and messy world, and there you might have a problem, since your parameterless functions need to be evaluated in order for any side-effects to occur.

Which brings me to an idea about how to separate the pure (and provable) side of a program from the impure, messy side: Rather than trying to incorporate both worlds in one language, let them be separate languages.

Functional languages are bad at being imperative, and imperative languages are bad at being functional. So strictly separate the two by writing the core code in a functional language, and accessing it from the real world via an imperative language. Naturally, you want to be able to compile the two together, and you want the languages to have common features, but let it be clear when you are coding in one or the other, and let the compiler enforce the purity of the functional side.

You could think of the functional (sub-)language as being like the eremitical side of a Carthusian monastery, and the imperative (sub-)language as being like the cenobitic side. In a Carthusian monastery you have the hermits, whose lives are dedicated to prayer and contemplation, and the lay monks, who serve the hermits and maintain the monastery. The two are strictly separated and work in different domains, but are integrated with each other in a symbiotic way.


Saturday, November 9, 2013

Implicit lambdas and thusness: a silly idea

When my brother and I were out camping with our dad, we used to sit around the fire at night and ask all kinds of "what if" questions.

Tonight, I've gotten the kids to sleep, gotten the dishes washed, and did a little amateur skimming of the Romani internet. (Who couldn't love a language that calls a black widow spider yalakráno, and a pear tree ambrolin?)

Now I am tired, and unable to do anything except ask "what if?"

What if you had a functional programming language where the lambda was implicit, so any expression with an unresolved term was a lambda expression, and the unresolved terms were the parameters?

Obviously, you would need a couple of things to make this work. First, you would need to do away with a lot of syntactic sugar, because your parser needs to be able to operate without necessarily knowing anything about the referents of the symbols it is operating on. Second, you would need to have some way to recognize which input values to a lambda expression match which parameters.

But both of those are just conventional matters, not real theoretical difficulties. Let's say, for the sake of this post, that we have a syntax like Lisp, and parameters are ordered as they appear in the source code.

Now for the second bit: What if you were able to reference a function from within itself using a special term, similar to the way you can use the variable this, self or me in object-oriented languages?

For the sake of this post, let's call this special term thus.

If you had a language with both of these features, you could write a lambda expression for the exponentiation function as follows:

(if (eq x 1) 1 (multiply x (thus (subtract x 1))))

We assume the functions if, eq, multiply, thus, 1 and subtract are known, so the unresolved term is x, which is the sole parameter to the expression.

Now, what if you could take a lambda expression in this language and override some of the known terms, producing functions with similar structure but different behavior? So, maybe have a metafunction called unresolve that returns a term of an expression from the known space to the parameter space. So, if you wanted a function like factorial that took the parameter x and also some function to use instead of multiply, you could say:

(unresolve factorial multiply)

Then you could substitute a different function, like divide:


((unresolve factorial multiply) divide)

These are the sorts of vapid ideas a tired mind produces.

Friday, November 8, 2013

FST via RDBMS

I was stuck in an airport a couple days ago with nothing to do, so I started a little project to pass the time. The challenge is this: Using a conventional database and a conventional programming language, what is the fastest program I can write that can take an inflected word in some given language and identify the root and the inflection(s)?

There are two ends of the spectrum in solutions. On one end, you have a "fat" solution involving a massive database containing every possible inflected form of every word in the language. On the other end, you have a "lean" solution with a smaller database of stems and inflectional information, and some complex code to use the two in combination.

The fat solution has the advantage of using mature indexing technology to efficiently search the massive database for the subject term.  Supposing you have a binary tree index, the cost of executing a query with the fat solution might be:

Tfat = Q + N log(si)

where Q is the fixed cost of opening a query to the database, N is the incremental cost of checking one node in the binary tree, and log(si) is the depth of the binary tree, based on the number of stems (s) and inflections (i).

However, for agglutinative languages this is a nightmare. Consider the standard Uyghur expression used when meeting someone new:

tonušqanliqimizdin xošalmen

The stem of the first word is ton, and the rest is all inflection (uš - qan - liq - imiz - din). A database containing all inflected forms of Uyghur verbs would be huge.

The leanest solution searches the stems and inflections separately. Since the inflections are a fixed set, it makes sense to factor them out first, then do a simple search of the stems.  The speed of the lean solution is the time required to query for all of the inflectional data (Ni), the time required to parse all possible inflections (P), and the time required to query roots (N logs):

Tlean = (Q + Ni) + P + (N logs)

which can be written as

Tlean = (Ni) + P + Tfat - N logi

In order for the lean solution to beat the fat solution, this has to be true:

Ni + P < N logi

That simply isn't doable, because Q, P and N are positive, and i > logi.  In order to get the lean solution to work, we've got to pre-fetch the inflectional data. In fact, we could write a routine to use the inflectional data to build a parser, in which case we're down to this:

Tlean = P + Tfat N logi

And now all we need is to make this work:

P < N logi

The fastest parser would probably be a finite state transducer. It seems to me you would end up with the FST traversing about the same number of states as the nodes traversed by the index search, but the FST has the advantage of running in memory, while the index search would have to deal with some of the necessary evils of relational database management (like query preparation, I/O buffering, and so forth) that are hiding in the value N.

Going back to the challenge requirement that this has to be implemented in a conventional programming language, the best marriage between conventional programming and an FST is probably found in regular expression processing, so that would probably be the tool to use for the parser.