Using your Head is Permitted

March 2008 solution

To reach a solution, we combine two ideas:
  1. Unary encoding, which is the encoding of the number n by the string containing n ones, satisfies that numeric sorting and lexicographic sorting coincide for it. If we concatenate a "0" suffix to all strings under this encoding, we add the further property that no two of the resulting strings are prefixes of each other.
  2. If E(n) is an encoding scheme for n where numeric and lexicographic ordering coincide, and if the encoding of no two numbers under E are prefixes of each other, then E(len(str(n)))+str(n) is another such encoding. (where "str(n)" is the standard decimal representation of n, "len" is the digit length of a string and "+" is string concatenation).
If we were to simply use unary encoding, we would reach a string of length O(n), which is too long for us. However, we can use the encoding of the first idea as the E(n) function of the second idea to get an O(log(n)) solution. Here is a Python implementation of encoding and decoding algorithms for it:

def encode(n):
  return "1"*len(str(n))+"0"+str(n)

def decode(s):
  return int(s[s.index("0")+1:])

However, these functions still don't provide an adequate solution, because the "length" component, encoded in unary, is of order O(log(n)), meaning that R(n) is O(log(n)), which is still too large. To solve the problem completely, we will re-utilize the second encoding idea, this time using the functions above as E(n):

def encode(n):
  return "1"*len(str(len(str(n))))+"0"+str(len(str(n)))+str(n)

def decode(s):
  return int(s[2*s.index("0")+1:])

This reduces R(n) to O(log(log(n))), and solves the problem.

For the bonus part, we want to prove that this complexity is the theoretical minimum. What follows is a proof that shows this for binary digit strings, but it is straightforward to extend it to any base, including to decimal representation.

Let us consider how many integers we can encode before we require one where R(n)>c, for some integer c.

Suppose all the numbers up to a certain digit length l can be encoded with R(n)≤c. Consider an arbitrary kl.

There are 2k-1 numbers of length exactly k in standard binary representation, all of which must be represented by strings of length between 0 and k+c. Consider the prefixes of length c+2 of these strings (where strings of length less than c+2 are their own prefixes). There are at most 2k-1-1 such strings that share any given prefix, for which reason at least two distinct prefixes must be used for the encoding of the numbers of length k.

Of these prefixes, none but the lexicographically largest can be the prefix of the encoding of any larger number. Therefore, each length k can be said to "exhaust" at least one prefix. There are only 2c+3-1 such prefixes, so after the first 2c+3-1 lengths they will all be exhausted. l must be smaller than 2c+3. The first n that requires R(n)>c is less than 22^(c+3). Changing the formula around, for any n there is at least one mn, such that R(m)≥log2(log2(n))-3, which is equal to log2(log256(n)), and therefore R(n)≥O(log(log(n)).

Q.E.D.

Back to riddle

Back to main page