Abstract
Kolmogorov complexity is the length of the ultimately compressed version of a file (i.e., anything which can be put in a computer). Formally, it is the length of a shortest program from which the file can be reconstructed. We discuss the incomputability of Kolmogorov complexity, which formal loopholes this leaves us with, recent approaches to compute or approximate Kolmogorov complexity, which approaches are problematic, and which approaches are viable.
Keywords: Kolmogorov complexity, incomputability, feasibility
1. Introduction
Recently there have been several proposals regarding how to compute or approximate in some fashion the Kolmogorov complexity function. There is a proposal that is popular as a reference in papers that do not care about theoretical niceties, and a couple of proposals that do make sense but are not readily applicable. Therefore, it is timely to survey the field and show what is and what is not proven.
The plain Kolmogorov complexity was defined in [1] and is denoted by $C$ in the text [2] and its earlier editions. It deals with finite binary strings, strings for short. Other finite objects can be encoded into single strings in natural ways. The following notions and notation may not be familiar to the reader so we will briefly discuss them. The length of a string $x$ is denoted by $l(x)$. The empty string of 0 bits is denoted by $\epsilon$. Thus, $l(\epsilon) = 0$. Let $x$ be a natural number or finite binary string according to the correspondence
$$(\epsilon, 0), (0, 1), (1, 2), (00, 3), (01, 4), (10, 5), (11, 6), \ldots$$
Then $l(x) = \lfloor \log (x+1) \rfloor$. The Kolmogorov complexity $C(x)$ of $x$ is the length of a shortest string $x^*$ such that $x$ can be computed from $x^*$ by a fixed universal Turing machine (of a special type called “optimal” to exclude undesirable such machines). In this way $C(x)$ is a definite natural number associated with $x$ and a lower bound on the length of a compressed version of it by any known or as yet unknown compression algorithm. We also use the conditional version $C(x|y)$.
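The correspondence between natural numbers and binary strings can be made concrete with a short sketch. The function names `nat_to_string` and `string_to_nat` are illustrative choices, not taken from the cited texts; the code only assumes the standard length-increasing lexicographic ordering of strings.

```python
def nat_to_string(n: int) -> str:
    """Map 0, 1, 2, 3, ... to '', '0', '1', '00', ... (length-increasing lexicographic order)."""
    if n < 0:
        raise ValueError("n must be a natural number")
    # n + 1 in binary always starts with '1'; dropping that leading bit gives the string.
    return bin(n + 1)[3:]


def string_to_nat(s: str) -> int:
    """Inverse of nat_to_string: put the leading '1' back and subtract 1."""
    return int("1" + s, 2) - 1


def length(n: int) -> int:
    """l(x) = floor(log2(x + 1)), computed with integer arithmetic only."""
    return (n + 1).bit_length() - 1


if __name__ == "__main__":
    for n in range(7):
        print(n, repr(nat_to_string(n)), length(n))
```

Using `bit_length` instead of a floating-point logarithm avoids rounding problems for large numbers.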
The papers by R.J. Solomonoff published in 1964, referenced as [3], contain informal suggestions about the incomputability of Kolmogorov complexity. Says Kolmogorov, “I came to similar conclusions [as Solomonoff], before becoming aware of Solomonoff’s work, in 1963–1964.” In his 1965 paper [1] Kolmogorov mentioned the incomputability of $C(x)$ without giving a proof: “[…] the function $C(x|y)$ cannot be effectively calculated (generally computable) even if it is known to be finite for all x and y.” We give the formal proof of incomputability and discuss recent attempts to compute the Kolmogorov complexity partially, a popular but problematic proposal and some serious options. The problems of the popular proposal are discussed at length, while the treatment of the serious options is mostly restricted to brief quotations, gleaned from the introductions of the articles involved, that explain the methods.
2. Incomputability
To find the shortest program (or rather its length) for a string $x$ we can run all programs to see which one halts with output $x$ and select the shortest. We need to consider only programs of length at most that of $x$ plus a fixed constant. The problem with this process is known as the halting problem [4]: some programs do not halt, and it is undecidable which ones they are. A further complication is that we must show there are infinitely many such strings $x$ for which $C(x)$ is incomputable.
The first written proof of the incomputability of Kolmogorov complexity was perhaps in [5] and we reproduce it here following [2] to show what is and what is not proved.
Theorem 1.
The function $C(x)$ is not computable. Moreover, no partial computable function $\phi(x)$ defined on an infinite set of points can coincide with $C(x)$ over the whole of its domain of definition.
Proof.
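We prove that there is no partial computable $\phi$ as in the statement of the theorem. Every infinite computably enumerable set contains an infinite computable subset, see e.g., [2]. Select an infinite computable subset $A$ in the domain of definition of $\phi$. The function $\psi(m) = \min \{x : C(x) \geq m, \; x \in A\}$ is (total) computable (since $C(x) = \phi(x)$ on $A$), and takes arbitrarily large values, since it can obviously not be bounded for infinitely many $x$. Also, by definition of $\psi$, we have $C(\psi(m)) \geq m$. On the other hand, $C(\psi(m)) \leq C_{\psi}(\psi(m)) + c_{\psi}$ by definition of $C$ (here $C_{\psi}$ is the complexity with $\psi$ as the decoding machine and $c_{\psi}$ is a constant depending only on $\psi$), and obviously $C_{\psi}(\psi(m)) \leq l(m)$. Hence, $m \leq \log m$ up to a constant independent of $m$, which is false from some $m$ onward. □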
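The chain of inequalities used in the proof can be laid out in one line. This is a sketch under the following assumed notation: $\psi(m)$ is the least $x \in A$ with $C(x) \geq m$, $C_{\psi}$ is the complexity with $\psi$ as the decoding machine, $c_{\psi}$ is a constant depending only on $\psi$, and $l(m)$ is the length of $m$ viewed as a binary string.

```latex
m \;\le\; C(\psi(m)) \;\le\; C_{\psi}(\psi(m)) + c_{\psi} \;\le\; l(m) + c_{\psi}
  \;=\; \lfloor \log (m+1) \rfloor + c_{\psi} .
```

The left side grows linearly in $m$ while the right side grows only logarithmically, so the inequality fails for all sufficiently large $m$.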
That was the bad news; the good news is that we can approximate $C(x)$.
Theorem 2.
There is a total computable function $\phi(t,x)$, monotonic decreasing in $t$, such that $\lim_{t \to \infty} \phi(t,x) = C(x)$.
Proof.
We define $\phi(t,x)$ as follows: For each $x$, we know that the shortest program for $x$ has length at most $l(x) + c$ with $c$ a constant independent of $x$. Run the reference Turing machine $U$ (an optimal universal one) for $t$ steps on each program $p$ of length at most $l(x) + c$. If for any such input $p$ the computation halts with output $x$, then define the value of $\phi(t,x)$ as the length of the shortest such $p$, otherwise equal to $l(x) + c$. Clearly, $\phi(t,x)$ is computable, total, and monotonically nonincreasing with $t$ (for all $x$, $\phi(t',x) \leq \phi(t,x)$ if $t' > t$). The limit exists, since for each $x$ there exists a $t$ such that $U$ halts with output $x$ after computing $t$ steps starting with input $p$ with $l(p) = C(x)$. □
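The construction in the proof can be illustrated with a small sketch. A real optimal universal machine is replaced here by a hypothetical toy machine `toy_machine` with two kinds of programs (a literal copy and a run-length program), so the numbers below say nothing about the true $C(x)$; the point is only how the approximation is defined and why it can only decrease as the step budget $t$ grows.

```python
from itertools import product
from typing import Optional

C_CONST = 1  # toy constant c: the literal program '0' + x always prints x


def toy_machine(p: str, t: int) -> Optional[str]:
    """Hypothetical stand-in for U: output of p if it halts within t steps, else None."""
    if len(p) >= 1 and p[0] == "0":
        # Literal program: '0' + w prints w, using one step per program bit.
        return p[1:] if len(p) <= t else None
    if len(p) >= 3 and p[0] == "1":
        # Run-length program: '1' + b + r prints bit b repeated int(r, 2) times.
        n = int(p[2:], 2)
        steps = len(p) + n  # short programs can take long to run
        return p[1] * n if steps <= t else None
    return None  # every other program never halts


def phi(t: int, x: str) -> int:
    """Length of the shortest program (of length <= l(x) + c) printing x within t steps."""
    bound = len(x) + C_CONST
    best = bound  # default value when nothing has halted with output x yet
    for length in range(bound + 1):
        for bits in product("01", repeat=length):
            if toy_machine("".join(bits), t) == x:
                best = min(best, length)
    return best


if __name__ == "__main__":
    x = "0" * 8
    print([phi(t, x) for t in (5, 9, 14, 30)])
```

For the string of eight zeros the estimate stays at the default value 9 until the budget reaches 14 steps, at which point the 6-bit run-length program `101000` has halted and the estimate drops to 6.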
One cannot decide, given $x$ and $t$, whether $\phi(t,x) = C(x)$. Since $\phi(t,x)$ is nonincreasing in $t$ and goes to the limit $C(x)$ for $t \to \infty$, if there were a decision procedure to test $\phi(t,x) = C(x)$, given $x$ and $t$, then we could compute $C(x)$. However, above we showed that $C$ is not computable.
However, this computable approximation has no convergence guarantees, as we show now. Let $g_1, g_2, \ldots$ be a sequence of functions. We call $f$ the limit of this sequence if $f(x) = \lim_{t \to \infty} g_t(x)$ for all $x$. The limit is computably uniform if for every rational $\varepsilon > 0$ there exists a $t(\varepsilon)$, where $t$ is a total computable function, such that $|f(x) - g_{t(\varepsilon)}(x)| \leq \varepsilon$, for all $x$. Let the sequence of one-argument functions $\psi_1, \psi_2, \ldots$ be defined by $\psi_t(x) = \phi(t,x)$, for each $t$ for all $x$. Clearly, $C$ is the limit of the sequence of $\psi$s. However, by Theorem 1, the limit is not computably uniform. In fact, by the well-known halting problem, for each $\varepsilon > 0$ and $t > 0$ there exist infinitely many $x$ such that $|C(x) - \psi_t(x)| > \varepsilon$. This means that for each $\varepsilon > 0$, for each $t$ there are many $x$s such that our estimate $\phi(t,x)$ overestimates $C(x)$ by an error of at least $\varepsilon$.
3. Computing the Kolmogorov Complexity
The incomputability of $C(x)$ does not mean that we cannot compute $C(x)$ for some $x$s. For example, if for individual string $x$ we have $C(C(x)|x) = c$ for some constant $c$, then this means that there is an algorithm of $c$ bits which computes $C(x)$ from $x$. We can express the incomputability of $C(x)$ in terms of $C(C(x)|x)$, which measures what we may call the “complexity of the complexity function.” Let $l(x) = n$. It is easy to prove the upper bound $C(C(x)|x) \leq \log n + O(1)$. However, it is quite difficult to prove the lower bound [6]: For each length $n$ there are strings $x$ of length $n$ such that
$$C(C(x)|x) \geq \log n - \log \log n - O(1),$$
or its improvement by a game-based proof in [7]: For each length $n$ there are strings $x$ of length $n$ such that
$$C(C(x)|x) \geq \log n - O(1).$$
This means that $x$ only marginally helps to compute $C(x)$; most information in $C(x)$ is extra information related to the halting problem.
One way to go about computing the Kolmogorov complexity for a few small values is as follows. For example, let $T_1, T_2, \ldots$ be an acceptable enumeration of Turing machines. Such an acceptable enumeration is a formal concept ([2] Exercise 1.7.6). Suppose we have a fixed reference optimal universal Turing machine $U$ in this enumeration. Let $U(i,p)$ simulate $T_i(p)$ for all indexes $i$ and (binary) programs $p$.
Run $T_i(p)$ for all $i$ and $p$ in the following manner. As long as $i$ is sufficiently small it is likely that $T_i(p) < \infty$ for all $p$ (the machine $T_i$ halts for every $p$). The Busy Beaver function $BB(n)$ was introduced in [8] and has as value the maximal running time of $n$-state Turing machines in quadruple format (see [8] or [2] for details). This function is incomputable and rises faster than any computable function of $n$.
Reference [9] supplies the maximal running time for halting machines for all $i < 5$, and for $i < 5$ it is decidable which machines halt. For $i \geq 5$ but still small there are heuristics [10,11,12,13]. A gigantic lower bound for all $i$ is given in [14]. Using Turing machines and programs with outcome the target string $x$ we can determine an upper bound on $C(x)$ for reference machine $U$ (by for each $T_i$ encoding $i$ in self-delimiting format). Please note that there exists no computable lower bound function approximating $C(x)$, since $C$ is incomputable and upper semicomputable. Therefore it cannot be lower semicomputable [2].
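The upper-bound idea can be sketched as follows. The table `HALTING_RUNS` of (machine index, program) pairs and their outputs is invented purely for illustration, and the self-delimiting code used here ($1^{l(b)} 0 b$ for the binary numeral $b$ of $i$) is one standard choice assumed for the sketch; the resulting number is an upper bound only up to the fixed overhead of the reference machine simulating $T_i$.

```python
from typing import Dict, Optional, Tuple


def self_delimiting(i: int) -> str:
    """Encode index i >= 1 as 1^{l(b)} 0 b, where b is i written in binary."""
    b = bin(i)[2:]
    return "1" * len(b) + "0" + b


def upper_bound(x: str, halting_runs: Dict[Tuple[int, str], str]) -> Optional[int]:
    """Shortest length of (self-delimiting i) + p over all known runs T_i(p) = x."""
    lengths = [
        len(self_delimiting(i)) + len(p)
        for (i, p), output in halting_runs.items()
        if output == x
    ]
    return min(lengths) if lengths else None  # None: no witness found so far


# Hypothetical results of running small machines to completion (illustration only).
HALTING_RUNS = {
    (1, ""): "0",
    (2, "1"): "0101",
    (5, ""): "0101",
    (3, "0101"): "0101",
}

if __name__ == "__main__":
    print(upper_bound("0101", HALTING_RUNS))
```

Adding more halting runs to the table can only lower the bound, which mirrors the fact that $C$ is approximable from above but not from below.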
For an approximation using small Turing machines we do not have to consider all programs. If I is the set of indexes of the Turing machines and P is the set of halting (or what we consider halting) programs then
with . Here we can use the computably invertible Cantor pairing function [15], which is defined by $(i,p) \mapsto \frac{1}{2}(i+p)(i+p+1) + p$, so that each pair of natural numbers $(i,p)$ is mapped to a natural number and vice versa. Since the Cantor pairing function is invertible, it must be one-to-one and onto: . Here is the desired set of applicable halting programs computing x, i.e., if either or is greater than some with while then we can discard the pair concerned from .
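A minimal sketch of the Cantor pairing function and its inverse follows, using the standard textbook definition $(a,b) \mapsto \frac{1}{2}(a+b)(a+b+1) + b$. The function names are illustrative, and the argument names `a` and `b` are placeholders for a machine index and a program read as a natural number.

```python
from math import isqrt
from typing import Tuple


def cantor_pair(a: int, b: int) -> int:
    """Map a pair of natural numbers to a single natural number."""
    return (a + b) * (a + b + 1) // 2 + b


def cantor_unpair(z: int) -> Tuple[int, int]:
    """Invert cantor_pair using integer arithmetic only."""
    w = (isqrt(8 * z + 1) - 1) // 2  # w = a + b, the index of the diagonal containing z
    t = w * (w + 1) // 2             # first value on that diagonal
    b = z - t
    a = w - b
    return a, b


if __name__ == "__main__":
    for z in range(6):
        print(z, cantor_unpair(z))
```

Enumerating $z = 0, 1, 2, \ldots$ and unpairing gives a systematic way to visit every (index, program) pair exactly once.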
4. Problematic Use of the Coding Theorem
Fix an optimal universal prefix Turing machine $U$. The Universal distribution (with respect to $U$) is $\mathbf{m}(x) = \sum_{p : U(p) = x} 2^{-l(p)}$, where $p$ is a program (without input) for $U$ that halts. The prefix complexity $K(x)$ is with respect to the same machine $U$. The complexity $K(x)$ is similar to $C(x)$ but such that the set of strings for which the Turing machine concerned halts is prefix-free (no program is a proper prefix of any other program). This leads to a slightly larger complexity: $K(x) \geq C(x)$. The Coding theorem [16] states $K(x) = -\log \mathbf{m}(x) + O(1)$. Since $-\log \mathbf{m}(x) < K(x)$ (the term $2^{-K(x)}$ contributes to the sum, and a shortest program is not the only program for $x$) we know that the $O(1)$ term is greater than 0.
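The relation between the sum defining the Universal distribution and the length of a shortest program can be checked on a toy example. The table `TOY_PREFIX_MACHINE` is a hypothetical, non-universal prefix machine invented for illustration; it only shows how the quantities are defined, not their true values.

```python
from math import log2
from typing import Dict

# Hypothetical prefix-free set of halting programs and their outputs.
TOY_PREFIX_MACHINE: Dict[str, str] = {"0": "a", "10": "b", "110": "a", "1110": "c"}


def is_prefix_free(programs) -> bool:
    """True if no program is a proper prefix of another program."""
    ps = list(programs)
    return not any(p != q and q.startswith(p) for p in ps for q in ps)


def m(x: str, machine: Dict[str, str]) -> float:
    """Sum of 2^{-l(p)} over all programs p that output x."""
    return sum(2.0 ** -len(p) for p, out in machine.items() if out == x)


def K(x: str, machine: Dict[str, str]) -> int:
    """Length of a shortest program that outputs x."""
    return min(len(p) for p, out in machine.items() if out == x)


if __name__ == "__main__":
    assert is_prefix_free(TOY_PREFIX_MACHINE)
    for x in "abc":
        print(x, K(x, TOY_PREFIX_MACHINE), -log2(m(x, TOY_PREFIX_MACHINE)))
```

Because the programs are prefix-free, the total mass is at most 1 by the Kraft inequality, and $-\log \mathbf{m}(x)$ never exceeds the length of a shortest program since that program alone contributes $2^{-K(x)}$ to the sum.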
In [17] it was proposed to compute the Kolmogorov complexity by experimentally approximating the Universal distribution and using the Coding theorem. This idea was used in several articles and applications. One of the most recent is [18]. It contains errors or inaccuracies, for example: “the shortest program” instead of “a shortest program,” “universal Turing machine” instead of “optimal universal Turing machine” and so on. Explanation: there can be more than one shortest program, and Turing machines can be universal in many ways. For instance, if $U$ is a universal Turing machine, the Turing machine $U'$ such that $U'(qq) = U(q)$ for every $q$ and $U'(r) = 0$ for every string $r \neq qq$ for some string $q$, is also universal. Yet if $U$ serves to define the Kolmogorov complexity $C(x)$ then $U'$ defines a complexity of $x$ equal to $2C(x)$, which means that the invariance theorem does not hold for Universal Turing machines that are not optimal.
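The effect of a universal but non-optimal machine can be illustrated with a toy sketch in which a machine is just a finite table from programs to outputs. The table `TOY_U` is hypothetical, and the doubled machine built here keeps only the inputs of the form $qq$; what the non-optimal machine does on other inputs is ignored in this sketch.

```python
from typing import Dict

# Hypothetical finite fragment of a machine U: program -> output.
TOY_U: Dict[str, str] = {"0": "a", "10": "b", "110": "a", "0111": "c"}


def doubled(machine: Dict[str, str]) -> Dict[str, str]:
    """Build the fragment of U' with U'(qq) = U(q); other inputs are omitted here."""
    return {q + q: out for q, out in machine.items()}


def complexity(x: str, machine: Dict[str, str]) -> int:
    """Length of a shortest program for x on the given machine."""
    return min(len(p) for p, out in machine.items() if out == x)


if __name__ == "__main__":
    u_prime = doubled(TOY_U)
    for x in "abc":
        print(x, complexity(x, TOY_U), complexity(x, u_prime))
```

Every complexity doubles under the modified machine, so the two complexities differ by more than any additive constant, which is what the invariance theorem rules out for optimal machines.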
Let us assume that the computer used in the experiments fills the rôle of the required optimal Universal Turing machine for the desired Kolmogorov complexity, the target string, and the universal distribution involved. However, the $O(1)$ term in the Coding theorem is mentioned but otherwise ignored in the experiments and conclusions about the value of the Kolmogorov complexity as reported in [17,18]. Yet the experiments only concern small values of the Kolmogorov complexity, say smaller than 20, so they are likely swamped by the constant hidden in the $O(1)$ term. Let us expand on this issue briefly. In the proof of the Coding theorem, see e.g., [2], a Turing machine $T$ is used to decode a complicated code. The machine $T$ is one of an acceptable enumeration of all Turing machines. The target Kolmogorov complexity $K$ is shown to be smaller than the complexity $K_T$ associated with $T$ plus a constant $c$ representing the number of bits to represent $T$ and other items: $K(x) \leq K_T(x) + c$. Because $T$ is complex, as it serves to decode this code, the constant $c$ is huge, i.e., much larger than, say, 100 bits. The values of $x$ for which $K(x)$ is approximated by [17,18] are at most 5 bits, i.e., at most 32. Unless there arises a way to prove the Coding theorem without the large constant $c$, this method does not seem to work. Other problems: The distribution $\mathbf{m}(x)$ is apparently used as , see ([19] Equation (6)) using a (noncomputable) enumeration of Turing machines that halt on empty input $\epsilon$. Therefore and with we have since . By definition however $\sum_x \mathbf{m}(x) \leq 1$: contradiction. It should be with as shown in ([2] pp. 270–271).
5. Natural Data
The Kolmogorov complexity of a file is a lower bound on the length of the ultimate compressed version of that file. We can approximate the Kolmogorov complexities involved by a real-world compressor. Since the Kolmogorov complexity is incomputable, in the approximation we never know how close we are to it. However, we assume in [20] that the natural data we are dealing with contain no complicated mathematical constructs like $\pi$ or Universal Turing machines, see [21]. In fact, we assume that the natural data we are dealing with contain primarily effective regularities that a good compressor finds. Under those assumptions the Kolmogorov complexity of the object is not much smaller than the length of the compressed version of the object.
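A minimal sketch of this practical approach follows, using the compressors in the Python standard library. The choice of `zlib`, `bz2`, and `lzma` and the helper name `compressed_length_bits` are assumptions made for illustration; they are not the specific compressors used in [20]. The returned number is only an upper-bound style estimate, up to the size of the fixed decompressor.

```python
import bz2
import lzma
import random
import zlib


def compressed_length_bits(data: bytes) -> int:
    """Smallest compressed size, in bits, over three standard-library compressors."""
    sizes = (
        len(zlib.compress(data, 9)),
        len(bz2.compress(data, 9)),
        len(lzma.compress(data)),
    )
    return 8 * min(sizes)


if __name__ == "__main__":
    regular = b"01" * 5000  # highly regular data
    rng = random.Random(0)  # fixed seed so the run is repeatable
    noisy = bytes(rng.getrandbits(8) for _ in range(10000))  # pseudo-random data
    print(compressed_length_bits(regular), compressed_length_bits(noisy))
```

Taking the minimum over several compressors can only tighten the estimate, since each compressed form is a valid description of the data. Note that the pseudo-random bytes do not compress even though a short program (the seed plus the generator) produces them, which is exactly the kind of regularity a real-world compressor cannot find.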
6. Safe Computations
A formal analysis of the intuitive idea in Section 5 was subsequently and independently given in [22]. From the abstract of [22]: “Kolmogorov complexity is an incomputable function. … By restricting the source of the data to a specific model class, we can construct a computable function to approximate it in a probabilistic sense: the probability that the error is greater than k decays exponentially with k.” This analysis is carried out but its application yielding concrete model classes is not.
7. Short Lists
Quoting from [23]: “Given that the Kolmogorov complexity is not computable, it is natural to ask if given a string x it is possible to construct a short list containing a minimal (plus possibly a small overhead) description of x. Bauwens, Mahklin, Vereshchagin and Zimand [24] and Teutsch [25] show that surprisingly, the answer is YES. Even more, in fact the short list can be computed in polynomial time. More precisely, the first reference showed that one can effectively compute lists of quadratic size guaranteed to contain a description of x whose size is additively $O(1)$ from a minimal one (it is also shown that it is impossible to have such lists shorter than quadratic), and that one can compute in polynomial-time lists guaranteed to contain a description that is additively $O(\log n)$ from minimal. Finally, Ref. [25] improved the latter result by reducing $O(\log n)$ to $O(1)$.” See also [26].
8. Conclusions
The review shows that the Kolmogorov complexity of a string is incomputable in general, but may be computable for some arguments. To compute or approximate the Kolmogorov complexity, several approaches have recently been proposed. The most popular of these is inspired by L.A. Levin’s Coding theorem and consists of taking the negative logarithm of the so-called universal probability of the string to obtain the Kolmogorov complexity of very short strings (this is not excluded by incomputability as we saw). This probability is approximated by the frequency distributions obtained from small Turing machines. As currently stated, the approach is problematic in the sense that it is only suggestive and cannot be proved correct. Nonetheless, some applications make use of it. Proper approaches either restrict the domain of strings of which the Kolmogorov complexity is desired (so that the incomputability turns into computability) or manage to restrict the Kolmogorov complexity of a string to an item in a small list of options (so that the Kolmogorov complexity has a certain finite probability).
Acknowledgments
Thank you for CWI’s support.
Funding
This research received no external funding.
Conflicts of Interest
The author declares no conflict of interest.
References
- 1. Kolmogorov A.N. Three approaches to the quantitative definition of information. Probl. Inform. Transm. 1965;1:1–7. doi: 10.1080/00207166808803030.
- 2. Li M., Vitányi P.M.B. An Introduction to Kolmogorov Complexity and Its Applications. 2008 ed. Springer; Cham, Switzerland; New York, NY, USA: 2008.
- 3. Solomonoff R.J. A formal theory of inductive inference, part 1 and part 2. Inform. Contr. 1964;7:224–254. doi: 10.1016/S0019-9958(64)90131-7.
- 4. Turing A.M. On computable numbers, with an application to the Entscheidungsproblem. Proc. Lond. Math. Soc. 1937;42:230–265. doi: 10.1112/plms/s2-42.1.230.
- 5. Zvonkin A.K., Levin L.A. The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms. Russ. Math. Surv. 1970;25:83–124. doi: 10.1070/RM1970v025n06ABEH001269.
- 6. Gács P. On the symmetry of algorithmic information. Soviet Math. Dokl. 1974;15:1477–1480.
- 7. Bauwens B., Shen A.K. Complexity of complexity and strings with maximal plain and prefix Kolmogorov complexity. J. Symbol. Logic. 2014;79:620–632. doi: 10.1017/jsl.2014.15.
- 8. Rado T. On non-computable functions. Bell Syst. Tech. J. 1962;41:877–884. doi: 10.1002/j.1538-7305.1962.tb00480.x.
- 9. Brady A.H. The determination of the value of Rado’s noncomputable function Sigma(k) for four-state Turing machines. Math. Comp. 1983;40:647–665. doi: 10.1090/S0025-5718-1983-0689479-6.
- 10. Harland J. Busy beaver machines and the observant otter heuristic (or how to tame dreadful dragons). Theor. Comput. Sci. 2016;646:61–85. doi: 10.1016/j.tcs.2016.07.016.
- 11. Kellett O. A Multi-Faceted Attack on the Busy Beaver Problem. Master’s Thesis. Rensselaer Polytechnic Institute; Troy, NY, USA: July 2005.
- 12. Marxen H., Buntrock J. Attacking the Busy Beaver 5. Bull. EATCS. 1990;40:247–251.
- 13. Michel P. Small Turing machines and generalized busy beaver competition. Theor. Comput. Sci. 2004;325:45–56. doi: 10.1016/j.tcs.2004.05.008.
- 14. Green M.W. A lower bound on Rado’s sigma function for binary Turing machines; Proceedings of the Fifth Annual Symposium on Switching Circuit Theory and Logical Design; Princeton, NJ, USA. 11–13 November 1964.
- 15. Cantor’s Pairing Function. Available online: https://en.wikipedia.org/wiki/Pairing_function (accessed on 19 February 2020).
- 16. Levin L.A. Laws of information conservation (non-growth) and aspects of the foundation of probability theory. Probl. Inform. Transm. 1974;10:206–210.
- 17. Zenil H. Une approche expérimentale à la théorie algorithmique de la complexité. Ph.D. Thesis. Lab. d’Informatique Fondamentale de Lille, Université des Sciences et Technologie de Lille, Lille I; Villeneuve-d’Ascq, France: 2011.
- 18. Soler-Toscano F., Zenil H., Delahaye J.P., Gauvrit N. Calculating Kolmogorov complexity from the output frequency distributions of small Turing machines. PLoS ONE. 2014;9:e96223. doi: 10.1371/journal.pone.0096223.
- 19. Soler-Toscano F., Zenil H. A computable measure of algorithmic probability by finite approximations with an application to integer sequences. arXiv. 2017;1504.06240. doi: 10.1155/2017/7208216.
- 20. Cilibrasi R.L., Vitányi P.M.B. Clustering by compression. IEEE Trans. Inf. Theory. 2005;51:1523–1545. doi: 10.1109/TIT.2005.844059.
- 21. Vitányi P.M.B. Similarity and denoising. Philos. Trans. R. Soc. A. 2013;371. doi: 10.1098/rsta.2012.0091.
- 22. Bloem P., Mota F., de Rooij S., Antunes L., Adriaans P. A safe approximation for Kolmogorov complexity; Proceedings of the International Conference on Algorithmic Learning Theory; Bled, Slovenia. 8–10 October 2014; pp. 336–350.
- 23. Zimand M. Short Lists with Short Programs in Short Time—A Short Proof; Proceedings of the Conference on Computability in Europe; Budapest, Hungary. 23–27 June 2014.
- 24. Bauwens B., Makhlin A., Vereshchagin N., Zimand M. Short lists with short programs in short time; Proceedings of the 2013 IEEE Conference on Computational Complexity; Stanford, CA, USA. 5–7 June 2013.
- 25. Teutsch J. Short lists for shortest programs in short time. Comput. Complex. 2014;23:565–583. doi: 10.1007/s00037-014-0090-3.
- 26. Teutsch J., Zimand M. A brief on short descriptions. ACM SIGACT News. 2016;47. doi: 10.1145/2902945.2902957.
