step_n(0, I, I).
step_n(N, In, Out) :-
N > 0, plus(N1, 1, N), phase_step(In, T),
step_n(N1, T, Out).
phase_step
is a function that transforms data.
Will this step_n
run in almost the same memory as phase_step
? If not, how should I rewrite it to do so? Will this depend on phase_step having a single solution?
EDIT: After some debugging using prolog_current_frame
, I found out that if phase_step
is a simple function like Out is In + 1
, then optimization happens but not in my use case.
Why is TCO dependent on phase_step
predicate?
Will this depend on phase_step having a single solution?
Kind of, but a bit stronger still: It depends on phase_step
being deterministic, which means, not leaving any "choice points". A choice point is a future path to be explored; not necessarily one that will produce a further solution, but still something Prolog needs to check.
For example, this is deterministic:
phase_step_det(X, X).
It has a single solution, and Prolog does not prompt us for more:
?- phase_step_det(42, Out).
Out = 42.
The following has a single solution, but it is not deterministic:
phase_step_extrafailure(X, X).
phase_step_extrafailure(_X, _Y) :-
false.
After seeing the solution, there is still something Prolog needs to check. Even if we can tell by looking at the code that that something (the second clause) will fail:
?- phase_step_extrafailure(42, Out).
Out = 42 ;
false.
The following has more than one solution, so it is not deterministic:
phase_step_twosolutions(X, X).
phase_step_twosolutions(X, Y) :-
plus(X, 1, Y).
?- phase_step_twosolutions(42, Out).
Out = 42 ;
Out = 43.
Why is TCO dependent on phase_step predicate?
If there are further paths to be explored, then there must be data about those paths stored somewhere. That "somewhere" is some sort of stack data structure, and for every future path there must be a frame on the stack. This is why your memory usage grows. And with it, the computation time (the following uses copies of your step_n
with my corresponding phase_step
variants from above):
?- time(step_n_det(100_000, 42, Out)).
% 400,002 inferences, 0.017 CPU in 0.017 seconds (100% CPU, 24008702 Lips)
Out = 42 ;
% 7 inferences, 0.000 CPU in 0.000 seconds (87% CPU, 260059 Lips)
false.
?- time(step_n_extrafailure(100_000, 42, Out)).
% 400,000 inferences, 4.288 CPU in 4.288 seconds (100% CPU, 93282 Lips)
Out = 42 ;
% 100,005 inferences, 0.007 CPU in 0.007 seconds (100% CPU, 13932371 Lips)
false.
?- time(step_n_twosolutions(100_000, 42, Out)).
% 400,000 inferences, 4.231 CPU in 4.231 seconds (100% CPU, 94546 Lips)
Out = 42 ;
% 4 inferences, 0.007 CPU in 0.007 seconds (100% CPU, 548 Lips)
Out = 43 ;
% 8 inferences, 0.005 CPU in 0.005 seconds (100% CPU, 1612 Lips)
Out = 43 ;
% 4 inferences, 0.008 CPU in 0.008 seconds (100% CPU, 489 Lips)
Out = 44 ;
% 12 inferences, 0.003 CPU in 0.003 seconds (100% CPU, 4396 Lips)
Out = 43 ;
% 4 inferences, 0.009 CPU in 0.009 seconds (100% CPU, 451 Lips)
Out = 44 . % many further solutions
One way to explore this is using the SWI-Prolog debugger, which has a way of showing you alternatives (= choice points = future paths to be explored):
?- trace, step_n_det(5, 42, Out).
Call: (9) step_n_det(5, 42, _1496) ? skip % I typed 's' here.
Exit: (9) step_n_det(5, 42, 42) ? alternatives % I typed 'A' here.
[14] step_n_det(0, 42, 42)
Exit: (9) step_n_det(5, 42, 42) ? no debug % I typed 'n' here.
Out = 42 ;
false.
?- trace, step_n_extrafailure(5, 42, Out).
Call: (9) step_n_extrafailure(5, 42, _1500) ? skip
Exit: (9) step_n_extrafailure(5, 42, 42) ? alternatives
[14] step_n_extrafailure(0, 42, 42)
[14] phase_step_extrafailure(42, 42)
[13] phase_step_extrafailure(42, 42)
[12] phase_step_extrafailure(42, 42)
[11] phase_step_extrafailure(42, 42)
[10] phase_step_extrafailure(42, 42)
Exit: (9) step_n_extrafailure(5, 42, 42) ? no debug
Out = 42 ;
false.
All of those alternatives correspond to extra interpreter frames. If you use SWI-Prolog's visual debugger, it will also show you a graph representation of your stack, including all open choice points (though I've always found that hard to make sense of).
So if you want TCO and not grow the stack, you need your phase step to execute deterministically. You can do that by making the phase_step
predicate itself deterministic. You can also put a cut after the phase_step
call inside step_n
.
Here are the calls from above with a cut after each phase_step
:
?- time(step_n_det(100_000, 42, Out)).
% 400,001 inferences, 0.017 CPU in 0.017 seconds (100% CPU, 24204529 Lips)
Out = 42 ;
% 7 inferences, 0.000 CPU in 0.000 seconds (83% CPU, 737075 Lips)
false.
?- time(step_n_extrafailure(100_000, 42, Out)).
% 400,000 inferences, 0.023 CPU in 0.023 seconds (100% CPU, 17573422 Lips)
Out = 42 ;
% 5 inferences, 0.000 CPU in 0.000 seconds (93% CPU, 220760 Lips)
false.
?- time(step_n_twosolutions(100_000, 42, Out)).
% 400,000 inferences, 0.023 CPU in 0.023 seconds (100% CPU, 17732727 Lips)
Out = 42 ;
% 5 inferences, 0.000 CPU in 0.000 seconds (94% CPU, 219742 Lips)
false.
Do not place cuts blindly, only once you understand where and why you really need them. Note how in the extrafailure
case the cut only removes failures, but in the twosolutions
case it removes actual solutions.