This question was inspired by How to transform a flow chart into an implementation? which asks about ways of algorithmically eliminating the goto
statement from code. The answer to the general problem is described in this scientific paper.
I have implemented some code following the high-level sketch of Algorithm X from Knuth's The art of computer programming describing the generation of Lexicographic permutations with restricted prefixes (see p. 16 of this draft).
This is the corresponding flow chart of the above algorithm.
This might be a very clever and very efficient algorithm, but the structure of the code seems to be challenging to follow. I ended up using the good old goto
-style implementation:
//Algorithm X;
1:
initialize();
2:
enter_level(k);
3:
set(a[k],q);
if(test() == ok) {
if (k == n) {
visit();
goto 6;
}
goto 4;
}
goto 5;
4:
increase(k);
goto 2;
5:
increasev2(a[k]);
if (q != 0) {
goto 3;
}
6:
decrease(k);
if (k==0) {
goto 7;
}
set(p,u_k);
goto 5;
7:
return;
The question is: how can this code be refactored to eliminate all the goto
calls?
One (bogus) answer is a suggestion to "look up the cited scientific paper, and follow it line by line" - indeed, that is certainly a possibility. But this question is about what experienced programmers see instantly once they give a glance at this spaghetti code.
I am interested in the how of refactoring, step by step, more than just the code.
Note:
goto
jumps. Implementing the black-box functions initialize()
etc. would take only a few extra instructions, but those are irrelevant regarding the structure of the code. It is not important what is happening during the function calls, as the focus is now on the flow of the program.I sketched an algorithm for OP earlier at https://stackoverflow.com/a/36661381/120163
Found a better paper that discusses how to generate structured code while preserving the original control flow graph exactly:
W.D Maurer, "Generalized structured programs and loop trees", Science of Computer Programming, 2007
I followed that procedure (on paper, hope I did it right, looks OK at 2:40 am). His basic trick is to find strongly connected regions (cycles in the code); these will become loops; He then breaks that cycle by deleting an edge; this eventually becomes a loop backlink (restored when he is complete). The process is repeated until no more cycles can be found; what is left is essentially a structured program with identified loops. It is tricky to do this right; you really need an automated procedure. Your bit of code, although small, is still pretty nasty :-}
I cheated in one place. Maurer insists that forward gotos are OK, even into the middle of loop. If you buy that, then you can preserve the CFG exactly. If not, you have to handle the case where a loop has two or more entry points; your algorithm has such a loop. I solved the problem by coding the loop, and coding a loop-tail-end-fragment equivalent that acts like the first iteration of jump-into-the-middle, which is followed by the loop itself.
My notation is a little funny: most languages don't have "block{...}" constructs. [The one I code in (see bio) does]. Think of this as being an "execute one iteration" loop :-} I assume that blocks/loops have loop exits and loop continues. If you dont have these, you can simulate them with sufficient quantity of just block{ ... } and exit_block@N.
EDIT after accepted: In the light of day, I didn't do it right, I left out the of the while loop@3. I've patched that; the need for the block construct now goes away because I can exit the while loop@3 to achieve the same affect. Actually, the code reads a little better.
I left your numeric labels in, even where they aren't needed, for easier reference.
//Algorithm X;
1:
initialize();
2:
while (true) {
enter_level(k);
3:
while (true) {
set(a[k],q);
if (test() == ok) {
if (k != n) exit_while@3;
visit();
decrease(k); // replicate logic at 6 to avoid jumping into middle of 5 loop
if (k==0) return;
set(p,u_k);
}
5:
while (true) {
increasev2(a[k]);
if (q != 0) continue_while@3;
6:
decrease(k);
if (k==0) return;
set(p,u_k);
} // while(true)@5
} // while(true)@3
4:
increase(k);
} // while(true)@2
Unlike most other answers I've seen so far, this runs at the same speed as the original (no extra flags or flag checks).
@hatchet's answer is interesting; a) it is equally fast, b) he chose to handle the two entry loop by the same technique, but he chose the "other entry" as the loop top. He did something similar with the "enter_level(k)" operation at label 2.
It is interesting that all this structuring doesn't seem to help readability of this code one bit. Makes one wonder about the whole point of "structured programs". Maybe well-designed spaghetti isn't so bad :-}