## October 2017 solution

#### Riddle 1:

Surprisingly, the answer is that such a program exists. In fact, it is possible to solve the halting problem in constant time. This is done using the following piece of code, written here as pseudo-Python:

```python
def is_halting(prog):
    return floor(H * (1 << prog)) % 2
```

where a return code of 1 indicates a halting program and a 0 indicates a non-halting program. The variable 'prog' is the input program, suitably encoded as a natural.

What this program does is simply return the 'prog'th bit of H, where H is a real constant used by the program.
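To see the bit extraction at work, here is a small sketch in real (rather than pseudo-) Python. The helper name `bit_of` and the toy constant are mine: a finite binary fraction stands in for the uncomputable halting-oracle constant H.

```python
from math import floor

def bit_of(H, prog):
    # the exact expression from is_halting: extracts the prog'th
    # binary digit of H after the point
    return floor(H * (1 << prog)) % 2

# toy stand-in oracle: suppose programs 1..4 halt / don't halt as
# 1, 0, 1, 1 -- then H = 0.1011 in binary, i.e. 11/16
H = 11 / 16
```

With this H, `bit_of(H, 1)` through `bit_of(H, 4)` recover 1, 0, 1, 1, exactly the return values we chose to encode.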

The trick to this program is that, as written, the usual complexity assumptions explicitly allow the use of real constants. In practice, no such constants should be allowed in any reasonable computational model, for precisely the following reason: if we are looking for a program that takes a natural and returns a bit (such as "accept" or "don't accept"), any such program can be implemented in constant time using the procedure described above, simply by choosing the real number whose binary expansion corresponds to the desired return values. There is always exactly one such real number in the range [0,1).

One can argue that we will never be able to find the correct H value that will solve the halting problem, but the riddle merely asks if such a program exists, and the answer is that such a constant and such a program must certainly exist. If Using your Head is Permitted had continued publishing riddles about the usual complexity assumptions, I would have altered these slightly so as to explicitly disallow real constants to be used. (In some contexts, I have been known to also disallow all integer constants except for 0 and 1, for similar -- but not identical -- reasons.)

So, how does this construction overcome Turing's diagonalisation argument?

The answer is that it doesn't. This program cannot determine the halting properties of every program written using the usual assumptions. The trick is that the input is "a software program, suitably encoded as a natural": there are only as many encodable inputs as there are naturals. This is enough to encode all Turing machines and all programs that do not use real constants, but the number of programs that use real constants is much larger than the number of naturals, and these cannot all be encoded.

#### Riddle 2:

David Dowe sent this riddle in (it is a variation on one he heard in his student years), but neither he nor I managed to find a good solution to it. Happily, Tom Hutchcroft and Omer Angel stepped in to help us (thanks to you both!), and I can now report that the solution is as follows.

First off, without any further assumptions, we can explicitly construct a solution for the case n=2k+1. A line in ℝⁿ can be described as the set of points x+vt, for all real values t, where x and v are vector parameters describing the line.

For our purposes, it is enough to restrict xₙ to 0 and vₙ to 1. We must, however, assign a unique v value to every such x. Let us describe this by a function, f:ℝ²ᵏ→ℝ²ᵏ, such that for every x as above, the associated v value satisfies

(v₁, ..., v₂ₖ) = f(x₁, ..., x₂ₖ)

The condition of the riddle can be stated under these restrictions as f(a)-f(b) being linearly independent of a-b for every a and b. This is particularly easy to attain when f is a linear function, i.e. f(a)=M*a, for some matrix M. Here, the condition on f boils down to M not having any real eigenvalues.

When the dimension of M is even (in this case, 2k), constructing such a matrix with no real eigenvalues is straightforward. For example, we can pick

f(x₁, ..., x₂ₖ) = (x₂, -x₁, x₄, -x₃, ..., x₂ₖ, -x₂ₖ₋₁)

which is a matrix rotating each pair of dimensions by ninety degrees.
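This map is easy to check mechanically (the sketch below uses function names of my own choosing). Since f is linear, f(a)-f(b)=f(a-b); and the rotation sends every vector w to a vector of the same norm that is orthogonal to w, which can therefore never be a real multiple of w. This is exactly the required linear independence.

```python
def f(x):
    # rotate each consecutive pair of coordinates by ninety degrees:
    # (x1, x2) -> (x2, -x1), applied pair by pair
    out = []
    for i in range(0, len(x), 2):
        out += [x[i + 1], -x[i]]
    return out

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

w = [3.0, -1.0, 2.0, 5.0]   # an arbitrary difference vector a - b
```

For any such w, `dot(f(w), w)` is zero while f(w) has the same norm as w, so f(w) and w are linearly independent whenever w is non-zero.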

This, however, leaves open the case where n is even, a case where no such M can be found: the matrix then has odd dimension n-1, and because the non-real eigenvalues of a real matrix always come in conjugate pairs, it must have at least one real eigenvalue.

Another case that can be handled without further assumptions is n=2, where it is clearly impossible to construct a set of lines satisfying the requirements of the riddle.

We remain, therefore, with the case that n is even and greater than 2. We will show for this case that, assuming the axiom of choice, it is possible to construct a set of lines satisfying the desired conditions, but that it is not possible if the mapping is continuous in some neighbourhood N.

Let us first consider the case where the mapping is continuous in some neighbourhood N. Let x be in N and let v be the direction of the line going through x. Because the mapping is continuous in the neighbourhood, there is some stricter neighbourhood of x in which no line has a direction that is perpendicular to v.

For convenience, let us choose a frame of reference wherein x=(0, ..., 0) and v=(0, ..., 0, 1); then this condition becomes equivalent to our requirement from the previous portion of the analysis that we can assume xₙ=0 and vₙ=1.

As demonstrated before, any successful construction must now satisfy that for any x and y, f(y)-f(x) is linearly independent of y-x.

Let us fix x as above, and define g(y) as the normalised direction vector for the component of f(y)-f(x) that is perpendicular to y-x.

The impossibility of finding such a g is a consequence of the "hairy ball theorem": because n is even, any sphere around x is even-dimensional, and a continuous tangent vector field on an even-dimensional sphere must vanish somewhere. There must therefore be at least one y on any sphere around x where the perpendicular component vanishes and g cannot be normalised.

We will now show that when looking at discontinuous mappings, we can construct a solution to this problem by use of the axiom of choice (or even some weaker axioms). A method for doing so is as follows.

Take a bijection between ℝⁿ⁻¹ and the first ordinal of cardinality c (continuum). Now, go over all a values in ℝⁿ⁻¹ in the order of the bijection, and for each a pick an f(a) on the unit sphere that has not been excluded due to previous choices.

Will there always be a value remaining that can still be picked? Well, each previously chosen value generates a constraint on f(a), forbidding it from lying on a certain line. This excludes at most two values on the unit sphere. However, at every stage fewer than c values have been fixed previously, and therefore also fewer than c values are forbidden (the multiplication by 2 makes no difference to the cardinality), so there will always be more values that can be chosen.

The complete answer is therefore that, assuming the axiom of choice, all n values are possible except 2, whereas if the mapping is required to be continuous in some neighbourhood, the possible values are exactly the odd n.

Thanks again David for this riddle, and thanks Tom and Omer for the beautiful solution.

#### Riddle 3:

Yes, such a strategy exists.

Before describing it, here is a recap of how to solve the more standard riddle, where N guesses are allowed. (Kevin tells me he has heard this version from "either Ross Kang or Fiona Skerman".)

We choose to turn all lights off, except for the last (i.e., the N'th) light encountered in each row. That light we keep on. The gaoler now has the choice of turning the last light off, in which case our friend merely needs to test the N bulbs in the row where all are off, or the gaoler will turn the last light on, in which case there are N lights on, one in each row, and by guessing each in turn our friend can find the desired light bulb. It is possible to tell which is the case simply by counting the number of lights that are on.
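This recap can be simulated directly. The sketch below (all names mine) plays out the strategy on an N×N grid for a random pointing order, with the gaoler either flipping the last-set light or leaving it alone, and checks that the friend's N guesses always include the gaoler's last bulb.

```python
import random

def simulate(N, flip_last, seed=0):
    rng = random.Random(seed)
    order = list(range(N * N))          # bulb ids; bulb b sits in row b // N
    rng.shuffle(order)                  # the gaoler's pointing order
    state = [False] * (N * N)
    seen_in_row = [0] * N
    for b in order:
        seen_in_row[b // N] += 1
        # keep every light off except the last one encountered in its row
        state[b] = (seen_in_row[b // N] == N)
    last = order[-1]                    # the bulb the friend must find
    if flip_last:
        state[last] = not state[last]   # the gaoler's optional final flip
    # the friend counts the lights that are on
    on = [i for i, s in enumerate(state) if s]
    if len(on) == N:
        guesses = on                    # one candidate per row
    else:                               # N - 1 lights on: find the all-off row
        rows_with_light = {i // N for i in on}
        dark_row = next(r for r in range(N) if r not in rows_with_light)
        guesses = [dark_row * N + c for c in range(N)]
    return last in guesses and len(guesses) == N
```

Running `simulate(4, flip_last=True)` and `simulate(4, flip_last=False)` over many seeds exercises both branches of the friend's counting argument.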

With this in mind, we now turn to Kevin's variation on this riddle, wanting to solve the same problem with fewer guesses.

To do this, let us consider the following simpler problem: we have n light bulbs, and the gaoler points them to us in an order of his (or her) choosing, at which time we must choose whether to turn the pointed-to bulb on or off. We wish to choose a strategy for determining which lights to turn on and which off that will satisfy the following criteria.

1. The n lights will ultimately form a code word in an error-correcting code capable of correcting a single error.
2. The code word will encode the identity of k bulbs which are all not the last bulb to be pointed to. (It does not need to encode any ordering among the k.)

A good choice of k and n would be one that maximises k/n.

Note that once a solution with k₁ and n₁ and a solution with k₂ and n₂ are found, it is always possible to find a solution for k₁+k₂ and n₁+n₂ simply by appending the two vectors.

A corollary is that once a pair of k and n is found, one can find arbitrarily large solutions with the same k/n simply by joining t copies of the same construction.

Let (k,n) be one solution to the simpler problem, and let us now return to the original problem, but instead of solving it for an N*N grid we will solve it for a rectangular grid with n-k rows and n columns.

Our strategy will be as follows. Whenever the gaoler points to a bulb, we use the code-book that we devised previously to encode the identity of k bulbs that are not the last the gaoler pointed to among the n. However, when reaching the last bulb in any given row, we purposefully introduce an error in the code.

At the end of the process, there will either be n-k coding errors, in which case our friend will merely need to guess these, or there will be one fewer coding error, in which case this pinpoints the identity of a single row, and we can use the encoding in order to narrow the search for the last bulb chosen by the gaoler to only n-k possibilities.

In practice, it is unlikely that a set of N*N bulbs will admit an exact reshaping to (n-k)*n, but we don't actually need all rows to be of exactly the same length. All we need is to minimise the maximum between the number of rows and the maximum among all rows of the n-k value used in encoding the row. (Remark: A more general statement is that if M is the total number of light bulbs, whether or not it is a perfect square, then the function that maps M to the optimal number of guesses required is monotone increasing. This can be shown by noting that if M₁<M₂ then solving for M₁ can be no harder than solving for M₂ with an Oracle that provides us with the identity of the first M₂-M₁ light bulbs to be chosen, which can be no harder than solving for M₂ without such an Oracle.)

Solving by use of rows of unequal lengths is always possible. This can be demonstrated by the fact that the standard solution is merely the case where n=1 and k=0. The saving in using a particular choice of (k,n) pair compared to the standard solution is the multiplicative difference between n-k and √((n-k)·n).

Any k/n>0 will therefore solve the riddle, and the higher the value the greater the saving.

One simple solution can be found with k=1 and n=6 (recalling that once a particular k/n is found, the values of k and n can be increased arbitrarily by appending more copies of the same construction without this degrading the k/n value) by simply encoding, in a 1-error-correcting code of length 6, the identity of the first bulb to be pointed to.

Just for fun, here's a more involved code with k=4 and n=15. It is a Hamming Code in which one begins by turning the first 4 chosen lights off, then completes the rest of the code word to encode the position of these first four lights. The code itself is given here. The left column indicates (using its '1's) the identity of the first four bulbs to be pointed to; the right column shows the completion into a full code word, after the first four pointed-to lights have all been switched off.

This code was generated by the Python program given here.

However (for the bonus riddle), can we solve this problem in o(N)?

The answer is that we can, and this is due to the following simple observation: if it is possible to solve for M₁ bulbs with G₁ guesses and for M₂ bulbs with G₂ guesses, then it is possible to solve for M₁M₂ bulbs with G₁G₂ guesses.

What this means is that if there is a solution for some (M,G) pair, there is also a solution for (Mᵏ,Gᵏ) for any k, so if G=M^α then there is a solution in O(M^α)=O(N^β) for a general M (and N), where β=2α.

The original riddle showed how to do this with α=1/2; our improvement with (k,n) schemes works with α values of log(n-k)/log(n(n-k)). This places our (k,n)=(1,6) scheme at α≈0.473197 (so, ≈O(N^0.946)) and our improved (k,n)=(4,15) scheme at a slightly better α≈0.469628 (so, ≈O(N^0.939)). Both satisfy the o(N) requirement.
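These α values follow directly from using G = n-k guesses for M = n(n-k) bulbs, so that G = M^α. A two-line check (the helper name is mine):

```python
from math import log

def alpha(k, n):
    # a (k, n) scheme uses G = n - k guesses for M = n * (n - k) bulbs,
    # so G = M ** alpha with alpha = log(n - k) / log(n * (n - k))
    return log(n - k) / log(n * (n - k))
```

Both schemes come out strictly below the α=1/2 of the original construction, with (4,15) the slightly better of the two.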

How do we prove that there is a solution in G₁G₂ guesses?

Divide the M₁M₂ bulbs into M₁ groups of M₂ bulbs each. We know that there is some strategy affording a solution in only G₂ guesses for M₂ bulbs. We will employ this solution in each of the bulb groups separately.

Note, however, that this (M₂,G₂) solution assumes that the gaoler has control over the setting of the last bulb. In practice, in M₁-1 of the M₁ groups, we have control over this bulb. We will use our control in order to send an additional bit of information to our friend, and will do so in the following way.

We will treat each group of M₂ bulbs as though it were a single bulb. Whenever the last bulb is pointed to in any group, we know that this group is not going to contain the last bulb to be chosen. We will treat this as we do pointing to a single bulb out of a total of M₁ (where pointing to a bulb indicates that this bulb is not going to be the last to be chosen). If in the problem with M₁ bulbs we would have at this point chosen the light to be on, in the larger problem with M₁ groups of size M₂ we will either turn the light on or off, so that the total number of lights that are on in the group is odd. Conversely, if we would have chosen in the M₁-sized problem to turn the light off, we make the total number of on lights in the group even.

By using the (M₁,G₁) solution over the parity of the number of lights on in each group, we can ensure that, at the end, we can narrow the problem down to only G₁ potential groups where the desired last bulb can be, much like in the original M₁-sized problem we were able to narrow the position of the last bulb to only G₁ candidate positions.

We now go over each of these G₁ potential groups, and employ the (M₂,G₂) tactic on each of them, thus being able to rule out that they have the last bulb in only G₂ guesses each.

Thus, in a total of at most G₁G₂ guesses the desired bulb is found.

But is β=0.939 the lowest β for which there exists a solution in O(Nβ) guesses?

In the history of Using your Head is Permitted it was many times the case that readers sent me more elegant solutions than I had prepared originally, and I would publish these instead of my own solutions, but it was almost never the case that I would get sent a solution that is quantitatively better than what I had. How fitting it is therefore that for the very last part of the site's very last riddle Uoti Urpala sent me a solution that outdoes by a stretch this β=0.939, which is the best complexity I knew of for this riddle at the time of publication. Thanks, Uoti!

While I still do not know the best β possible, Uoti's strategy shows that the riddle can be solved with any β>2/3.

I will describe here a simplified, easier to explain, but not quite as efficient version of Uoti's strategy (while still obtaining the β>2/3 result). For those interested in Uoti's full strategy, I've made his full description available here. For all others: rest assured that any ingenious part of the construction is Uoti's, while any inefficiency, lack of clarity, or incorrectness is entirely my fault.

Here goes.

As before, the method is based on the idea of a code of length n that can encode the identity of k elements that are known to not be the last one set, and that can correct one error. We will construct such a code for which n-k is, up to some polylogarithmic terms, O(√n).

Readers are welcome to verify that such a code translates, in the overall scheme, to O(N^β) with β equal to 2/3 plus some asymptotically vanishing terms (due to the polylogarithmic terms in n-k), which is what we want to prove.

Our specific error correcting code requires several elements. First is the idea of a Hamming checksum.

To define the Hamming checksum, let us first describe a truncated Hamming code: to create a code of length n, we assign to each position a unique number from 1 to n, which we encode as a bit-string of length Lₙ = O(log(n)). A word is in the code if the xor of the bit-strings associated with the positions of its '1' bits is zero. The code can correct one error, because if any single bit is flipped, the xor becomes the bit-string associated with the corrupted position.

The Hamming checksum we will use will be the bits at positions associated with numbers that are powers of two. There are Lₙ of these, and regardless of the values assigned to any of the other bits in a word, one can complete it into a code word by a unique appropriate assignment of the checksum bits.
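The two mechanisms just described, the xor-of-positions membership test and the completion of a word via its power-of-two checksum bits, can be sketched in a few lines. Function names are mine, and this is the textbook truncated-Hamming construction rather than Uoti's full code:

```python
def syndrome(word):
    # xor of the 1-based positions of all set bits: zero for a code
    # word, otherwise the position of the single corrupted bit
    s = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            s ^= pos
    return s

def complete(word):
    # flip checksum bits (power-of-two positions) so that the
    # finished word has syndrome zero; flipping position p changes
    # the syndrome by xor p, so flipping the binary decomposition
    # of the current syndrome cancels it exactly
    w = list(word)
    s = syndrome(w)
    p = 1
    while p <= len(w):
        if s & p:
            w[p - 1] ^= 1
        p <<= 1
    return w
```

Completing a length-15 word and then flipping any single bit leaves a syndrome equal to that bit's position, which is precisely the one-error-correction property used above.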

Importantly, the bits can be assigned to the numbers 1 through n in any order, and in particular any subset of Lₙ bits can be used as checksum bits, as long as the decoder knows in advance which bits are which.

In our case, we will use other parts of the code word in order to encode which Lₙ bits are the Hamming checksum, and will take the other bits to be assigned such that both the checksum bits and the non-checksum bits are individually assigned monotone increasing values in the Hamming error correcting scheme.

Notably, the information of which bits are the checksum bits must itself be stored using a 1-error correcting code. For this we will use a simple (and inefficient) code: we will simply encode the information three times, using three non-overlapping sets of bits.

The full construction is as follows. We begin by describing it from the perspective of the decoder (say, our "friend"), then go back to show that it can be utilised in practice.

We take the n-length word and divide it into segments of size x (a length to be decided on later, but which will ultimately be more or less √n in size).

In a legal code-word in our scheme, all segments are very sparse (the portion of '0's tends to 1 as n grows to infinity), but some segments are more sparse than others. There will be, namely:

• One segment with between 0 and Lₙ ones. We name it "Type A".
• Three segments with between 2LₙLₓ and 3LₙLₓ ones, where Lₓ ≈ log(x) is the number of bits required to encode a number between 0 and x-1. We name these "Type B".
• All other segments with exactly C = 2LₙLₓ(Lₓ+1) ones. We name these "Type C".

Importantly, in a 1-corrupted code-word, these numbers may be off by ±1, but the differences between the number of set bits in each type of segment are significant enough that no single-bit error can cause ambiguity regarding which type each segment is.

Using a special technique, which we name subset encoding and define below, each of the three "Type B" segments encodes the same set of Lₙ numbers in {0, ..., x-1} (thus providing 1-error correction for this information). The set itself is interpreted as a set of Lₙ positions in the "Type A" segment, which, in turn, is used as the Hamming checksum for the entire code-word.

To make this into a (k,n) construction, we must specify which k elements are known not to be the last elements set, given a specific (legal) code-word. Here, these k are the zero positions in the "Type C" segments. As can be easily ascertained, if x is taken to be √n, n-k is on the order of √n times a polylogarithmic term.

From the perspective of the encoder, this is handled as follows.

Initially, the encoder uses the strategy that the first x-C bits selected in every segment are chosen to be 0, and all following bits in that segment are chosen to be 1. This continues until there are only 4 segments for which x-C bits have not yet been selected.

At this point, the encoder chooses arbitrarily one of the 4 segments to serve as "Type A" and the other three as "Type B"s. The remaining segments are the "Type C"s. Furthermore, the encoder chooses arbitrarily Lₙ as-yet unselected bits in the "Type A" segment, to be used as the Hamming checksum. The rest of the "Type A" bits will be set to zero. The rest of the "Type B" bits will be set by the subset encoding scheme, so as to encode the positions of the Hamming checksum bits, and the rest of the "Type C" bits will be set to 1, continuing the original strategy and ensuring that the "Type C" zeros are known to not contain the last selected bit.

We now complete the construction by describing subset encoding.

Subset encoding takes a string of x bits, of which all but A have already been fixed to zero and the remaining A have not been set yet, and selects values for those A bits in a way that encodes a desired bit-string. We will show that any bit-string of length A/(2(Lₓ+1)) can be encoded in this way, allowing for unambiguous decoding, even by a decoder who does not have the information of which A positions were the settable ones. Readers can confirm that these conditions are enough to satisfy the criteria of the overall algorithm described here.

Once again, we begin by describing the process from the point of view of the decoder.

To decode the word encoded by a string of x bits, we use the following procedure:

1. If the number of set bits is zero, the string encoded is the empty string.
2. If it is 2, the string encoded is '0'.
3. If it is 3, the string encoded is '1'.
4. If it is 4 or more, divide the input string into two halves, decode each half separately, and concatenate the two results.

In a legal encoding, the number of set bits is never 1, which is why this possibility is ignored.

In effect, we distribute the x bits along the leaves of a tree, where each leaf is associated with 0, 2 or 3 set bits. Importantly, a leaf with 0 set bits can be omitted from the tree altogether without this changing the decoding, for which reason it is not necessary for the encoder and decoder to use exactly the same tree decomposition.

Also, note that once the tree is pruned of its zero-weight leaves, the rest of the tree structure is unambiguous, because the smallest of the non-leaf nodes has a Hamming weight of 2+2=4, which is more than that of the heaviest leaf node (3). Together, these facts indicate that if the encoder encoded its message by use of leaves of weights 0, 2 and 3, the decoder will decode the message properly.

The encoder works in the following way: the string of x bits is recursively divided into halves, with division stopping whenever the number of settable bits in a node is 4 or fewer. If it is 2 or fewer, those bits are zeroed out. The remaining leaves have either 3 or 4 settable bits. We now prune these to either 2 or 3 set bits, according to the values we wish to encode.
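The encoder and decoder just described can be sketched as follows (names mine). The encoder recursively halves the string, zeroes out subtrees with at most 2 settable bits, and spends 2 or 3 set bits per message bit at the leaves; the decoder recovers the message from Hamming weights alone, without knowing which positions were settable.

```python
def subset_encode(settable, msg):
    # settable[i] is True if position i may still be set;
    # returns (bits, leftover) where leftover is the unencoded tail
    cnt = sum(settable)
    if cnt <= 2 or not msg:
        return [0] * len(settable), msg
    if cnt <= 4:
        want = 2 + (msg[0] == "1")   # 2 ones encode '0', 3 ones encode '1'
        bits, placed = [], 0
        for s in settable:
            use = s and placed < want
            bits.append(int(use))
            placed += use
        return bits, msg[1:]
    h = len(settable) // 2
    left, msg = subset_encode(settable[:h], msg)
    right, msg = subset_encode(settable[h:], msg)
    return left + right, msg

def subset_decode(bits):
    w = sum(bits)
    if w == 0:
        return ""
    if w == 1:
        raise ValueError("not a legal encoding")
    if w == 2:
        return "0"
    if w == 3:
        return "1"
    h = len(bits) // 2
    return subset_decode(bits[:h]) + subset_decode(bits[h:])
```

Note that the two routines need not agree on the exact tree: a subtree in which the encoder placed a single leaf of weight 2 or 3 is read by the decoder as a leaf outright, with the same result.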

How inefficient can this encoding be? If a string has at least 3 settable bits, it will eventually encode at least one bit of output. We can therefore take each bit of output and bound the maximum number of settable bits that may have been consumed along its encoding path from root to leaf. Multiplied by the number of encoded bits, this gives an upper bound on the number of settable bits needed for encoding.

According to this upper bound, each bit encoded requires at most 2(Lₓ+1) settable bits: the tree has Lₓ levels; at each intermediate level the wastage is at most 2 bits (because 3 bits are already enough to encode an output bit), and there are at most 4 settable bits at the leaf node.

Q.E.D.

As mentioned, it is at this time unknown (to me) whether and by how much β can be pushed down even further from this present 2/3 bound.

Thanks, Uoti for the beautiful construction, and thanks Kevin for this wonderful riddle.

And thank you to all readers, all who contributed riddles, all who contributed solutions, and all who supported this site throughout its run.

I will miss you all.

PS --

For anyone who is interested in reading other things I write, I have just now started a new web-column, called "Musings of a Junk Connoisseur", where I give movie and TV reviews. Right now there is only one review there (for "The Orville"), but I have quite a few reviews ready, which I'll be uploading over the next week or two, so have faith and some patience. The review page can be found here.

In one of the shortly-to-be-uploaded reviews I will also be explaining why it is that I felt the need to do this on a web-page of my own, rather than use established sites such as IMDb or Rotten Tomatoes.