Monday, February 17, 2014

Difficulty of functions

This weekend I have been thinking about abstract ways to measure how difficult it is to evaluate functions given a hypothetical computer with a specific instruction set.

In particular, I want to see if I can find a lower limit of difficulty for the most difficult functions of a particular type. I thought I had it nailed last night, but as soon as I went to bed I realized I was wrong.

Imagine a computer with the following properties:

1. The computer's only data type is an integer in the set [0..N-1], called an int.

2. The computer's programs represent functions that take two ints and return an int. The set of all such functions is F.

3. The computer has an instruction set composed of A <= N+2 functions drawn from F. The computer can easily evaluate every function in its instruction set.

4. The computer's programs are finite function trees, so each program has a root function drawn from the instruction set, which takes two parameters. The two parameters of the root function may be function trees, constant ints, or the special symbols x and y which evaluate to the program's two input parameters. There is no recursion.

5. The difficulty of a program is the number of operations required to evaluate it, averaged over all possible input values x and y.

For simplicity, we initially assume that there is no lazy evaluation.
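As a concrete illustration, the model above might be sketched in Python as follows. The representation, names, and toy instruction set are my own assumptions, not part of the post:

```python
# A minimal sketch of the machine model. A program is a nested tuple:
# an int constant, the symbol 'x' or 'y', or (op, left, right) where
# op indexes into the instruction set.

N = 4  # ints are drawn from [0..N-1]

# A toy instruction set of A = 2 functions drawn from F:
INSTRUCTIONS = [
    lambda a, b: (a + b) % N,  # addition mod N
    lambda a, b: (a * b) % N,  # multiplication mod N
]

def evaluate(program, x, y):
    """Evaluate a program tree; return (value, operations_used)."""
    if program == 'x':
        return x, 0
    if program == 'y':
        return y, 0
    if isinstance(program, int):
        return program, 0
    op, left, right = program
    a, ops_a = evaluate(left, x, y)
    b, ops_b = evaluate(right, x, y)
    return INSTRUCTIONS[op](a, b), ops_a + ops_b + 1  # +1 for this node

def difficulty(program):
    """Average operation count over all N*N inputs (property 5)."""
    total = sum(evaluate(program, x, y)[1]
                for x in range(N) for y in range(N))
    return total / (N * N)

# Example: (x + y) * 2, a two-instruction program.
prog = (1, (0, 'x', 'y'), 2)
```

Since every subtree is always evaluated, difficulty(prog) is exactly the number of instructions in the tree, independent of the inputs, which is the "no lazy evaluation" simplification at work.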

Given a computer like this, can we put a lower limit on the difficulty of the most difficult function in F?

If there is no lazy evaluation, then the difficulty of a function is proportional to the number of instructions. Given that there are N^(N^2) functions in F, if you could write maximally efficient programs to represent each of those functions, how big would the largest such efficient program need to be?

There are p0 = N+2 distinct programs that could be written with zero instructions. These would be the programs that returned a constant value, or else x or y with no modification.

The number of distinct programs that can be written with one instruction is p1 = A*(p0 * p0) = A(N+2)^2. That is, there are A possible instructions, which must take a program with zero instructions for each parameter.

The number of distinct programs that can be written with two instructions is p2 = A*(p0 * p1 + p1 * p0) = 2A^2(N+2)^3. That is, the first parameter may be a zero-instruction tree, in which case the second parameter must be a one-instruction tree, or vice versa.

The number of distinct programs that can be written with three instructions is p3 = A*(p0 * p2 + p1 * p1 + p2 * p0) = 5A^3(N+2)^4. The progression we are seeing here is that pn = sn * A^n * (N+2)^(n+1), where sn is the number of different trees that can be formed with n binary functions (the Catalan numbers: 1, 1, 2, 5, 14, ...).
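The recurrence is easy to sanity-check numerically. This little script (my own; N and A are arbitrary test values) confirms that pn = A * sum(pk * p(n-1-k)) reproduces pn = sn * A^n * (N+2)^(n+1) with sn the Catalan numbers:

```python
from math import comb

N, A = 5, 3  # arbitrary test values, not from the post

# pn = A * sum_{k=0}^{n-1} pk * p(n-1-k), seeded with p0 = N + 2
# (the N constants plus the symbols x and y).
p = [N + 2]
for n in range(1, 6):
    p.append(A * sum(p[k] * p[n - 1 - k] for k in range(n)))

# The n-th Catalan number counts binary trees with n internal nodes.
catalan = [comb(2 * n, n) // (n + 1) for n in range(6)]

for n in range(6):
    assert p[n] == catalan[n] * A**n * (N + 2)**(n + 1)
```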

There will be a lot of overlap in the program space, meaning there will be multiple programs that evaluate to the same function. This means we can't say that n instructions can always represent pn distinct functions, but we can say that they will represent no more than pn functions. Thus, for pn = |F| = N^(N^2), we can be certain that the most complex program representing a function in F can be no smaller than n instructions.
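The overlap is easy to see in a tiny case. In this sketch (my own; the post doesn't use this example), take N = 2 and a single NAND-like instruction, so there are p1 = A(N+2)^2 = 16 one-instruction programs, yet they collapse onto only 5 distinct functions:

```python
from itertools import product

N = 2
nand = lambda a, b: 1 - (a & b)  # the single instruction, so A = 1

# The p0 = N + 2 zero-instruction programs: constants 0..N-1 plus x and y.
leaves = list(range(N)) + ['x', 'y']

def as_table(left, right):
    """The function computed by nand(left, right), as a truth table
    over all (x, y) input pairs."""
    val = lambda leaf, x, y: {'x': x, 'y': y}.get(leaf, leaf)
    return tuple(nand(val(left, x, y), val(right, x, y))
                 for x, y in product(range(N), repeat=2))

# 16 one-instruction programs, but far fewer distinct functions:
tables = {as_table(l, r) for l, r in product(leaves, repeat=2)}
```

The 16 programs realize only the constant 0, the constant 1, NOT x, NOT y, and NAND(x, y), which is exactly the kind of collapse that makes pn an upper bound rather than an exact count.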

So the lower limit on the difficulty of the most difficult function in F is calculated as follows:

pd > N^(N^2).
sd * A^d * (N+2)^(d+1) > N^(N^2).

For large values of d, I think 3^d < sd < 4^d. This needs to be proven, though. If it is true, we could fudge a bit and say

4^d * A^d * (N+2)^(d+1) > N^(N^2).
d ln 4 + d ln A + d ln(N+2) + ln(N+2) > N^2 ln N.
d ln 4 + d ln A + d ln(N+2) > N^2 ln N - ln(N+2).
d > (N^2 ln N - ln(N+2)) / (ln 4 + ln A + ln(N+2)).
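Plugging sample numbers into that bound gives a feel for its size. The function name and the values N = 256, A = 16 below are my own illustration, not from the post:

```python
from math import log

def min_difficulty_bound(N, A):
    # d > (N^2 ln N - ln(N+2)) / (ln 4 + ln A + ln(N+2))
    return (N * N * log(N) - log(N + 2)) / (log(4) + log(A) + log(N + 2))

# For a byte-sized machine (N = 256) with 16 instructions, the hardest
# function already needs tens of thousands of instructions.
d = min_difficulty_bound(256, 16)
```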

Of course, there is a lot more to gnaw on here, like operation laziness and pre-evaluation, which might be real game-changers. But as we have it now, the difficulty of the most difficult functions in F increases in proportion to N^2.
