Why Kolmogorov complexity is not computable - root cause and an example

Please check my informal introduction to "Kolmogorov complexity" if you are not familiar with the subject. In this article I will approach some less discussed ramifications of it and hopefully some new ideas: an example of a Kolmogorov compression that would require an arbitrary high amount of working memory before shrinking to the compressed string (conjecture).

1. Root cause why "Kolmogorov complexity" is uncomputable

Kolmogororov complexity is uncomputable because Turing machines have a halting problem :)

Let's start with the observation the the Halting problem of a Turing machine having a limited band is actually decidable, unlike the infinite band one. A "finite band Turing machine" is actually a finite state machine that will either stop without going through the same state, or will cycle infinitely. Having a limited number of possible states, it cannot run too much without repeating a previous state. As soon as the machine returns to a previous state, it will cycle forever. You can know from the start the maximum number of execution steps until it will either stop or will go into an infinite loop: the number of its possible states. So, you can algorithmically test if any Turing machine with a band having bounded length will stop or not.

Actually, you don't even need a certain constant band to solve the halting problem: it would be sufficient to have a function that would provide an upper band of the required memory for any certain program. With such a function, you would be sure that you can detect an infinite cycle or the stop in a finite number of steps - calculated using the value of that function. It turns out that such function is not computable either, see also the Bussy beaver problem.

Smallest program might require an arbitrary big memory on the band

We know from the mathematical proof that the Kolmogorov complexity is uncomputable. This implies that, in some cases, the smallest program that is compressing certain strings would require to create an arbitrary long state on the band in order to output the "string to be compressed".

Of course, a naive compression that only does << print "string to be compressed">> is only slightly bigger than the "string to be compressed" and does not need extra memory; only a real "space saver" compression can potentially need a huge amount of space on the band. The needed space can be so huge that it can overcome any mathematical function applied on the length of the "string to be compressed" or even on the number represented by the binary form of the string. Otherwise, we would be able to solve the halting problem and calculate the Kolmogorov complexity - that would contradict the theorem.

If I say, for example, that size("string to be compressed") taken to the power of 1000000000000000000 should be enough memory on the band for the smallest program... then there will be a string of a certain size that have the smallest compression that cannot fit into that size calculated by this formula. Take an even faster growing function and the shortest "compressing" program will need an even higher band storage - and no other "shortest program" will fit in that size. The required function would be like the mother of all possible mathematical functions, so it cannot be calculated ;)

That being said, it is not obvious how can we construct such example of compression that is (much) smaller than the "string to be compressed", that will finally leave on the band the "string to be compressed", requiring an arbitrary amount of temporary space on the band in the process -  without being able to run with less. See below an example, a conjecture for now.

2. Example of compression requiring an arbitrary big amount of temporary space

Idea: we need a program that starts smaller than the "string to be compressed", it creates a huge state on the band of the Turing machine - way bigger than the "string to be compressed", then finally ends generating the "string to be compressed" and getting rid of the extra temporary space.

If the final string is easily compressible by a Huffman code, it would be easy for the program to fit into a predictable memory size. Even if we find such compression that requires an arbitrary amount of memory, we cannot be sure that there is no other "even smaller" program that requires a predictable amount of memory. And we know that... we cannot know when we found the smallest one.

We need something that can scale. Let's assume that the compression ratio to the initial size can become so arbitrary small that it can really count as the "smallest program" that can generate that string (maybe minus a constant). A constant would be added anyway when changing the Turing machine of choice.

I think the decimal development of √2 (square root of 2) can be such a string, that can scale. We can take the first N decimals of √2 for any chosen N, no matter how big. Still, the same finite program can generate all these N decimals by implementing a series converging to √2. Alternatively, a program could just try each decimal, keeping the highest one that will not overcome 2 when squaring it (credits M.L.).

We observe that this program needs increasing additional memory space compared with the number N of decimals we want to calculate. Even the program the are picking digits needs a lot of additional space to calculate the square of the incomplete number. While we compute more decimal digits, the additional digits for decimal calculation increases by the power of 2. Without such additional digits, you cannot be sure that a surprise far transport from the digits that you ignored is not modifying a digit from the N you want to generate.

Another argument that any program will need this extra space is that, generating an increasing number of digits using a finite memory space would result in something that can be generated with a finite state machine, that would contradict the irrational nature of √2. Therefore, the extra space required by generating N digits of √2 will keep increasing when N increases.

In the case of √2, we can still find a cap to the number of extra digits we need in order to calculate N digits of √2 : this would be around N squared. This can become arbitrary larger than N, however it is not exactly what we were looking for, because a maximum memory size can be derived from N.

However, a similar approach might provide what we searched for by taking the decimal representation of a non-algebraic (transcendental) real number - where the error cannot be assessed by keeping a known number of decimals. I expect such numbers would need sometimes an arbitrary number of decimals for calculating the first N digits on the number, and that a memory cap cannot be calculated for them (conjecture).

If we would try to compress a long sequence of decimals that could belong to a transcendental number, we would need to try algorithms that would need an arbitrary amount of additional memory space. It is a bit easier if you guess the actual number and the numerical series that will finally generate the number containing as decimals the "string to be compressed". However it is not possible to determine the generating number by a deterministic process, and this is the main difficulty in measuring the "Kolmogorov complexity" of a string.

Update 2019-08-11

Transcendental Liouvill numbers will not have this property of requiring arbitrary big space on the band in order to computer the first N decimals. However, it seems likely that non-Liouvill numbers will exhibit this property, maybe all of them. One sketch of proof is: if you can only calculate a finite number of extra decimals in order to have the first N decimals exactly, you could approximate that number with a rational number, that would make it an Liouvill number. Therefore, the non-Liouvill numbers should require an arbitrary number of extra decimals in order to calculate the first N decimals.

How is this an example of non-computability of Kolmogorov complexity?

Let's say we want to compress a long string of N decimals. They are actually the first decimals of a non-Liouvill commutable number, however we don't know that. There is a simple program that approximates that number, and can generate the first N decimals. However, testing this program would require a memory (band) that is arbitrary bigger than N. We don't know how many decimals (memory) are enough before the first N decimals will converge to the string to be compressed.

If we guess what is the number (and the associated algorithm), we can easily prove this lower bound of compressing the above first N decimals. However, in general, if is impossible to problematically find such short program that would generate a huge string of N decimals that happens to be decimals of a non-Liouville number - if we don't guess it.

3. Conclusion

While strings of characters and numbers are not really the same animals, there seems to be a relation between number properties and the (decimal) representation of these numbers. Maybe this comes from the fact that most of the numbers we are using must have a kind of definition that can be put in relation to an algorithmic way to generate them. It is not the numbers that we generate, we actually generate a certain relative construction relatively to a certain reference. For example√2 is the diagonal of the square of side 1 expressed as sums of tens, tens of tens, etc:
1.4142 = 1 + 4 * 1/10  1*1/100 + 4*1/1000  + 2 * 1/10000

However, if a second square have the side length exactly that diagonal (√2), the square created with that side would have the diagonal of size √2 * √2 = 2, that can be represented without decimals. This only shows that the relation regarding the initial side (1) is more simple (twice the size) compared to the more complicated size relation between the diagonal of the square and the side of the square. This is also true for the second square, where the diagonal of size "2" have a complex size relation with its side of length "√2". Numbers are rational or irrational only regarding a certain reference.

Compression seems to be related to the power of identifying constructive algorithmic proceedings that can expand to certain observed sequences. Sometimes, these sequences that seem random can hide a simple order that waits to be discovered, and such discovery often needs a good dose of intuition.

Dear reader, please leave a message if you exist! ;) Also, please share this article if you find it interesting. Thank you.


Popular posts from this blog

Why Bitcoin will fail

Linux Ubuntu 19.04 full install on USB stick drive - with UEFI/Grub boot fix

Black Friday theory