2015-06-11

Shortest program that prints the play "Othello" by William Shakespeare

We have this requirement: "Create the smallest program that will print the full text of the play Othello by William Shakespeare".

I should warn you that the requirement hides a trick and you will hate the solution. The program does not need to run efficiently. Actually, it does not need to run in reasonable time. It just has to be very short. And to be proven to answer the requirement, eventually... No external random source is allowed and no input from user.

You could, of course, do just a printf("[full text of play here]"), however it would not be very short, because of the size of the play that takes around 170KB

You might have already thought for a way to compress the play and uncompress it at run time. It might be possible, but it would be (unprovable) not short enough ;)

You might have thought about generating words and combine them randomly. It might work, however you don't have a random source. Also, you might wander how can you know you have generated the right text - you would need to store the text in the program. Well, actually, the requirement is not to print only the text of the play. You can print other texts before, as long as you eventually print the right text.

You could do a backtracking and generate all the possible combinations of words. It will surely print the play eventually, but how do you know when you printed the right text? This is actually an "acceptable" solution according to requirements,as the program is not required to stop with the right text. Can it be even shorter?

A very short program that satisfies the requirement is actually no more complex than a counter in binary, on enough many bits. Incrementing a counter stored on enough bits, you can generate all the possible bits combinations, and eventually it will reach the combinations of bits in the Othello play also. Actually it would generate any other poems and plays of Shakespeare. With "enough" time, it will generate all the movies that you have seen and that are not created yet. You just need a time that is more that the age of the Universe, probably ;)

One implementation, not minimal, just a prove of concept. If you can write a shorter one, just let me know:
$ cat universal.c
#include <stdio.h>
#include <string.h>

#define buf_size 200000
/*
 *   200KB should be enough:
 *  wc -c othello.txt
 *  167566 othello.txt
 */

int main(){
  unsigned char a[buf_size];
  memset(a,0,buf_size);
 
  int i=0;
  while(i < buf_size){
    i=0;
    while( (++a[i] == 0)){
      i++;
    }
    printf("%s\n", a);
  }
}
Let's run the program and check some simple, small, beginnings of phrases:
$ gcc -O3 universal.c
$ time ./a.out | grep -m 1 "Hi !"
Hi !
real 0m18s
$ time ./a.out | grep -m1 'Me ?'
Me ?
real    0m36s

You can try other short texts, 4 characters are generated in about 1 minute. For bigger length the time increases exponentially. But if you provide "enough" time...

$ time ./a.out  | grep -m 1 Othe
Othe
real    1m1s
$ time ./a.out | grep -m 1 Othel
Othel
real 209m17s
For an additional character ("Othell") it would take around 891 hours! The speed can be improved a lot, for example skipping the unprintable characters, limit the size of a word and line, limit the repeating of letters. But it will still take exponential time as text length increases. And the program will become more complex, that is against the problem's requirement.

I told you that you will hate the solution. It is not of any practical use. This blog is often about weird possibilities. However, this solution has some interesting philosophical ramifications:
  • As any digitized art work is virtually generated by such simple program, one could argue against the morality of copyright laws. Personally, I don't have a strong opinion on this, yet.
  • Information is relatively easy to create. It is more important to be able to tell when you have the valuable information. This is even more important these days, where we have a lot of non-trivial information available, but it is hard to chose what is valuable, relevant, useful.
  • Let's say you have many people that are generating various theories. It is not enough that one of them articulates the theory that would revolutionize the science. If is also needed for this theory to be identified as likely to be valuable, in order to be verified and then adopted mainstream.
  • You can automatically generate combinations of axioms and theorems, generating a huge number of demonstrable true statements. However, it is still not useful until you identify which of them are indeed useful.
  • If you remove the "printf()" from the above program, it will simulate a Turing machine, where the "a[]" buffer is the tape. Such program will eventually have "on the tape" (in memory) any information that you would want, any solution on any solvable problem. The inherent difficulty is not to put the information on the type, the difficulty is to decide to stop with the needed information. If you can find a simple equation that uniquely identifies a certain information, you can have a very good compression of that information. The compression will have the same order of magnitude as the equation representation in that language.
  • This program is (unprovable) much smaller the text's Kolmogorov complexity, that should be the minimum program that generates the text. The relaxation that permits this is the removal of the request that the machine should stop with that text.


No comments:

Post a Comment