javascriptalgorithmlanguage-agnosticgenetic-programming

Is it possible to enumerate computer programs?


Suppose you have to write a program that will test all programs in search of one that completes a specific task. For example, consider this JavaScript function:

function find_truth(){
    for(n=0;;++n){
        try {
            var fn = Function(string(n));
            if (fn() == 42)
                return fn;
        } catch() {
            continue;
        }
    }
}

As long as string(n) returns the nth string possible ("a", "b", "c", ... "aa", "ab" ...), this program would eventually output a function that evaluated to 42. The problem with this method is that it is enumerating strings that could or couldn't be a valid program. My question is: is it possible to enumerate programs themselves? How?


Solution

  • Yes, there are ways that this is possible. A few months ago I made a little experimental project to do something like it, which works reasonably well considering what it's doing. You give it a type and some tests to pass, and it searches for a suitable function that passes the tests. I never put in the work to make it mature, but you can see that it does in fact eventually generate functions that pass the tests in the case of the examples. Requiring the type was an essential component for the practicality of this search -- it drastically reduced the number of programs that needed to be tried.

    It is important to have a firm grasp of the theory here before making assertions about what is and is not possible -- there are many who will jump to the halting problem and tell you that it's impossible, when the truth is quite a bit more nuanced than that.

    What you cannot do to a turing complete language is decide that a program will definitely not ever generate 42 -- it could run forever trying, and you won't be able to tell whether it will eventually succeed until it does. There are some programs which you can tell will infinite loop, though, just not all of them. So if you have a table of programs, you might expect your enumerator program in the case of a non-turing complete language to be like this:

    Program |  P1  |  P2  |  P3  |  P4  |  P5  |  P6  |  P7  | ...
    42?     | No   | No   | No   | Yes  | No   | No   | No   | ...
    

    Whereas in a turing complete language, your output (at a given time) would be like this:

    Program |  P1  |  P2  |  P3  |  P4  |  P5  |  P6    |  P7  | ...
    42?     | No   | No   | Loop | Yes  | No   | Dunno  | No   | ...
    

    And later that Dunno might turn into a Yes or a No, or it might stay dunno forever -- you just dunno.

    This is all very theoretical, and actually generating programs in a turing complete language to search for ones that do a certain thing is pretty hard and takes a long time. Certainly you don't want to do it one by one, because the space is so big; you probably want to use a genetic search algorithm or something.

    Another subtle point in the theory: talking about programs which "generate 42" is missing some important detail. Usually when we talk about generating programs, we want it to generate a certain function, not just output something specific. And this is when things you might want to do get more impossible. If you have an infinite space of possible inputs -- say, inputting a single number, then

    There you have it, a daily dose of decidability theory phenomenology.

    If you are interested in doing it for real, look into genetic algorithms, and put timeouts on the functions you try and/or use a decidable (non-turing-complete) language.