Previously on this blog (quite a while ago), we’ve investigated some simple ideas of using randomness in artistic design ( psychedelic art , and earlier randomized css designs ), and measuring the complexity of such constructions . Here we intend to give a more thorough and rigorous introduction to the study of the complexity of strings. This naturally falls into the realm of computability theory and complexity theory, and so we refer the novice reader to our other primers on the subject ( Determinism and Finite Automata , Turing Machines , and Complexity Classes ; but Turing machines will be the most critical to this discussion).
The Problem with Randomness
What we would really love to do is be able to look at a string of binary digits and decide how “random” it is. On one hand, this is easy to do at a high level. A child would have no difficulty deciding which of the following two strings is more random:
And yet, by the immutable laws of probability, each string has an equal chance (
) in being chosen at random from all sequences of 50 binary digits. So in a sense, the huge body of mathematics underlying probability has already failed us at this basic juncture because we cannot speak of how random one particular outcome of an experiment is. We need a new notion that overcomes this difficulty.
Definition: The Kolmogorov complexity of a string , denoted
is the length of the shortest program which outputs given no input.
While this definition is not rigorous enough to be of any use (we will reformulate it later), we can easily see why the first of the two strings above is less random. We can write a very short Python program that outputs it:
[code]print "01" * 25[/code] On the other hand, it is not hard to see that the shortest program that produces the second string is
[code]print "00011101001000101101001000101111010100000100111101"[/code] The dutiful reader will cry out in protest. How do you know that’s the shortest program? Why restrict to Python? This whole discussion is so arbitrary!
Indeed, this is probably not the strictly shortest Python program that prints out the string. In the following we’ll work entirely in binary, so the picky reader should interpret this as a print command referring to a block of binary memory. We will reify these ideas in full rigor shortly, but first let us continue with this naivete to make the coming definitions a bit easier to parse.
If we abstract away the lengths of these strings, we see that the length of the first program is