## February 2009 solution

Python code with example implementations of all of this month's solution functions is available here.

#### Part 1(a): Left shift

First of all, we note that x<<y equals x*(1<<y), so, given that multiplication is to be solved in part 1(c) we can restrict ourselves to finding the integer that encodes the array whose i'th element is 1<<b[i].

This is done in a very similar technique to that used in array look-up: substitute each single cell with width cells. Now, create two vectors, one where there is a running counter over the width elements, the other that has repetitions of b[i]. Equality will happen only in the b[i]'th element. This will transform each cell in the original vector to a vector of size width, where all cells are zero except the b[i]'th element, that equals one. Now, change the width of the cells to be 1. The result is the desired vector.

Below is a step-by-step example using a vector b=[1,4,2] encoded with width=5:
 b[i] [1,4,2] element-repeat b[i] [1,1,1,1,1,4,4,4,4,4,2,2,2,2,2] repeated counter [0,1,2,3,4,0,1,2,3,4,0,1,2,3,4] equality? [0,1,0,0,0,0,0,0,0,1,0,0,1,0,0]
Now, if we take the last vector and encode it using width=1, we get the number that is 001001000000010 in base two. Considering the same number in the original width=5, we get [2=1<<1,16=1<<4,4=1<<2].

This solution not only looks like array look-up, it can actually be implemented as such. The function "1<<a[i]" can be described as a look-up into the table [1<<0, 1<<1, 1<<2, ... , 1<<(width-1)] (with potentially zeros appended at the end). This table is encoded by the integer Σi=0width-1 2i2i*width, which equals Σi=0width-1 2i*(width+1), which is simply the all-ones vector of length width and width width+1.

#### Part 1(b): Right shift

Right shift seems a bit more complicated than left shift, but it really isn't. Merely note that it's fairly easy to right shift all the vector's elements by the same number (using a single right shift to the entire integer and one AND operation to mask out any overflows).

Also, note that x>>y equals (x<<z)>>(z+y) for any z. What we do, therefore, is to first calculate the left-shift by width-b[i] of the array elements, then right-shift all by width.

This assumes that all b[i] are smaller than or equal to width. If this isn't true, any shift right that is by width or larger will always yield the same result (zero), so we can simply left-shift by min(b[i],width).

#### Part 1(c): Multiplication

So far, we've considered our S=Σs[i]*2i*width construct by the name "vector", but, now that we know the width value can be altered freely in O(1) to make sure it's always large enough to avoid overflows, a better name for it may be "polynomial". The integer S can be thought to encode the polynomial Σs[i]*Xi. If S=Σs[i]*Xi and T=Σt[i]*Xi, then the multiplication result S*T equals the result of the polynomial multiplication of Σs[i]*Xi by Σt[i]*Xi.

Now, consider the Xsize coefficient of this result. It equals the sum of all s[i]*t[size-i]. All we need to do is to create S and T such that s[i]=a[i]*2i*width and t[i]=b[size-i], and then isolate this element, to reach the desired result. Both s[i] and t[i] are vectors easily created by width-change/folding operations.

#### Part 1(d): Integer division

Unlike left-shift/right-shift, division can't be simulated as a simple inversion of multiplication. This is because numbers may not divide evenly, causing difficult-to-manage overflows. We will opt for a different solution, instead.

We will utilize the parallelization properties of the computational model. Consider that the integer division answer can only be between 0 and a[i]. Let us first find the maximum of all a[i] (not a problem, as we already know how to perform an entire sort in O(1).) Next, let us create a counter running from 0 to max(a[i]), aligning a copy of each of the original a[i] and b[i] against each of the values of this counter. This is done by repeating either copies of the entire array or of each element inside it, both being functions we have shown to be tractable in O(1).

The 0 through max(a[i]) counter will serve as a series of "guesses" regarding the division answer. We will check whether a guess is correct by use of the inequality guess*b[i]≤a[i]<(guess+1)*b[i]. Once the positions of the correct guesses are found, we mask out everything else and fold the results into the correct positions in the output vector.

#### Part 1(e): Modulo taking

Once we have division, modulo is simple: a[i] mod b[i]=a[i]-(a[i] div b[i])*b[i].

Note: This month's riddle allows the convenience assumption that all inputs are positive and that outputs do not cause overflows. Would the question have become more difficult if these assumptions had been removed? The answer is no, though, of course, some special-case handling would have had to be inserted. Consider, for example, the solution for division. Internally, the solution for division uses multiplication. However, unlike in the answer to 1(c), the internal multiplication used here cannot assume lack of overflows or positive inputs. This means that in order to perform this internal multiplication we need to have a generic solution to multiplication without mitigating assumptions. As a result, some allowances are made. In particular, all vectors have their widths doubled prior to be sent to be multiplied. This is because the width needed in order to store a multiplication result cannot be larger than the sum of the widths of the operands.
Such logic works in every part and sub-part of this month's riddle. It is always possible to find a suitable width for the output, even if such a width were to be not given explicitly.

We leave to the reader the question of how to augment the solutions above to be able to handle vectors carrying values that are zero/negative.

#### Part 2: Factorial

The question asks the reader to show how "vectorized" factorial calculation can be O(1), but the vectorization part is really a misdirection: all one really needs to do is to calculate the integer encoding [0!,1!,2!,...,n!] for a given n. From this, the required function is simply a matter of array look-up, choosing n to be max(a[i]).

So, we don't really need to answer the question for any vector, but rather only for one specific vector, namely [0, 1, 2, ..., n]. Differently worded, we are considering the function

f(n)=Σi=0n i!2width*i

and are asking whether this function has a closed form formula, for sufficiently large values of width. (A closed form formula is a finite combination of basic functions. Given that our atomic functions are to be considered "basic", a "closed form formula" and an "O(1) calculation" both mean exactly the same thing.)

The surprising answer is that not only is this possible, but that every function that is known to be calculable by a specific computer has a closed form representation, or, alternatively worded, all functions that can be calculated by any specific computer can be calculated in the described computational model in O(1).

The full proof of this claim is given below in the section How to Solve the Halting Problem in O(1). Before going to show this general result, however, let's solve the specific case of the factorial calculation. This is done by use of a method very similar to that of the solution to division. In division, we had a finite number of guesses for the correct solution. All we had to do is to parallelize the test that checks which of the guesses is the correct one. Here, we will do the same, but instead of just guessing the final n! answer, we will guess the entire value of f(n). This value, too, has only a finite number of possibilities, because it is known to be representable by only width*(n+1) bits. All we have to do is to produce a vector including all such numbers (this is simply a counter going from 0 to 2width*(n+1).

To complete the solution we need to show that we can check, in O(1), whether a guess is correct. This is easy to do, as f(n) is the only vector of length n+1 to satisfy the induction assumptions

f(0)=1
For all 0<xn, f(x)=f(x-1)*x

The former can be checked in O(1) easily (and, in fact, we can even go so far as to only generate guesses that have this property). The latter condition can be worded for the entire vector together as

f(n) element-wise multiplied by (counter(n+1)>>width) = f(n) >> width

To exemplify, if n=5:
 f(n) [1, 1, 2, 6, 24, 120] counter(n+1) [0, 1, 2, 3, 4, 5] counter(n+1)>>width [1, 2, 3, 4, 5] multiplication [1*1, 1*2, 2*3, 6*4, 24*5] = f(n) >> width [1, 2, 6, 24, 120]
We generate all guesses, check in O(1) which one of them is the correct one and output it.

We leave the reader with the follow-up riddle of how to solve the same question, but using an algorithm that requires an amount of memory that is no more than exponential in the length of the output.

#### Bonus: How to solve the halting problem in O(1)

The solution above is a special case of a general result: consider f(n) not as a function but rather (as would normally be the more intuitive way) as a process: we start off with 0!=1, then iteratively calculate (a+1)!=a!*(a+1). Because we are able to verify, in O(1), the correctness of a single step in this iterative process, we can also vectorize this check and verify in O(1) the entire process. Consider what this means, if the process at hand is not the calculation of a factorial, but rather a simulation of a Turing machine, where each step in the calculation is a single advancement of the Turing machine. With the tools developed in part 1 of the riddle, it is easily possible to verify in O(1) that any given step of the advancement of the Turing machine is simulated accurately. Therefore, we can simulate all of it together, reaching the final output of the Turing machine (or the conclusion that it does not halt) all in O(1).

This may sound like a magical solution to the halting problem, but there's one small problem with it: recall the note at the end of part 1. It indicated that for all questions asked in both parts of this month's riddle it is possible to know in advance what a good width parameter is going to be. This is not true for Turing machine simulations.

So, the solution above is an O(1) solution to the halting problem, but not to the halting problem of generic Turing machines but rather to the much simpler case of Turing machines that use a bounded length tape (as well as for non-deterministic Turing machines that use a bounded length tape). These are computationally equivalent to actual computers, where the bounded length of the tape acts like the bounded memory/hard-disk of the computer. They are also equivalent to finite state machines (or to non-deterministic finite state machines).

As the total number of possible states for such a machine is known, it's possible to know in advance how many steps one needs to simulate before one is guaranteed to either reach a halting state or to get stuck in an infinite loop. The fact that for Turing machines with bounded length tape the halting problem can be solved is well known, but this month's riddle innovates over this fact by adding that the entire calculation needed for this can be performed in O(1) in the described computational model. Alternatively worded, the calculation has a closed form formula.

Furthermore, not only do all functions that can be calculated using any specific computer have closed-form formulae, but also, because anything calculated by any specific computer can be computed by a universal Turing machine with bounded length tape, all functions that can be calculated using any computer can actually be computed using the same closed form formula, with the difference only being the input parameters used. (The formula will need, at its input, information equivalent to the size of the tape and its initial contents. The former is equivalent to knowing the memory capabilities of the computer and the latter equivalent to knowing the function being calculated.)

The discussion above may have left the reader with the impression that everything can be tackled by our model in O(1), but this is not the case. What can't be calculated in O(1) in this model? One easy example is functions that are just too large. Consider a state of our computational model where all stored integers are smaller (in absolute value) than some n0. In the next computational step, for n0≠0, the largest value that any of the integers can have is n1=n0<<n0. Let us define a corresponding nk sequence, where each nk is calculated as nk-1<<nk-1. The value nk cannot be calculated in less than k steps, meaning O(k) time, not O(1).

However, all we need to do to circumvent this is to add to the input a number that is known to be larger than the final result (and, more generally, a number known to be larger than any intermediate result required in the process of calculation). So, computing A(n,n) from n is not possible in this model in O(1) (where A is the Ackermann function, a function that rises much more rapidly than nk). However, supposing that in some untold way we happen to obtain a number, M, that is known to satisfy the rather common property M>A(n,n) -- it doesn't need to satisfy any other requirement! -- then suddenly calculating A(n,n) as a function of both n and M is something that can be accomplished in the described computational model in O(1). All we needed to do is to add to the input a number, M, that satisfies a property that, mathematically speaking, is satisfied by almost every natural in the world.

One last word, before we leave this computational model entirely. Clearly, now that we're solving halting problems in O(1), this computational model has ceased to be an interesting computational model to ask complexity questions about. Nevertheless, in future "Using your Head is Permitted" questions where we will return to the subject of complexity, we will, once again, need a computational model with specific, explicitly mentioned assumptions, to work on. In all such future cases, unless explicitly stated otherwise, we will be working with a somewhat more tame computational model, which I call "The Usual Complexity Assumptions", described in full here. You will note that this model still includes O(1) addition, subtraction and bit shifts, so it's still possible to sort in it n integers in O(n), but such small quirks aside it behaves more or less as you'd expect.

I'm afraid I don't have an equivalently mathematically-neat formulation for what I mean in questions where I will ask for a "closed form solution", but readers should use their best judgment on this, and if your closed form solution is more than, say, two hundred symbols long, it probably isn't the closed form solution I had in mind when I wrote the question.

Lastly, but perhaps most importantly: I initially neglected to provide full credit for this riddle. I will correct this omission now. This riddle came to me following a conversation with my good friend Shachar Shemesh, during which he pointed out to me the casualness in which we use terms like O(n2)-time when we really mean O(n2) comparisons, where in truth the two mean different things. The same conversation yielded the December and the January riddles as well. Thanks Shachar!