I am testing the inductive capabilities of some theorem provers (such as Z3, Alt-Ergo, Vampire etc.) using the TPTP syntax. To my surprise, none of them managed to demonstrate the following simple conjecture:
tff(t1, type, (fun: $int > $int )).
tff(ax1, axiom, (
! [A: $int] : (
$less(A, 1) => (fun(A) = 123)
)
)).
tff(ax2, axiom, (
! [A: $int] : (
$greatereq(A, 1) => (fun(A) = fun($difference(A, 1)))
)
)).
tff(conj1, conjecture, ! [A: $int] : ($greatereq(A, 1) => (fun(A) = 123))).
% END OF SYSTEM OUTPUT
% RESULT: SOT_EWCr1V - Z3---4.8.9.0 says Timeout - CPU = 60.09 WC = 35.47
% OUTPUT: SOT_EWCr1V - Z3---4.8.9.0 says None - CPU = 0 WC = 0
This conjecture can be easily proven by induction, however it does not seem to be the case for the vast majority of theorem provers I have tested this on. Obviously, if I restrict the domain to only one element instead of the whole set of integers, the ATP succeeds because it only needs to check against a limited set of numbers:
tff(t1, type, (fun: $int > $int )).
tff(ax1, axiom, (
! [A: $int] : (
$less(A, 1) => (fun(A) = 123)
)
)).
tff(ax2, axiom, (
! [A: $int] : (
$greatereq(A, 1) => (fun(A) = fun($difference(A, 1)))
)
)).
tff(conj1, conjecture, ! [A: $int] : ((A = 5) => (fun(A) = 123))).
% END OF SYSTEM OUTPUT
% RESULT: SOT_j9liHr - Z3---4.8.9.0 says Theorem - CPU = 0.00 WC = 0.08
% OUTPUT: SOT_j9liHr - Z3---4.8.9.0 says Proof - CPU = 0.00 WC = 0.09
Is this a general limitation of automatic theorem provers? Is there any tool that performs well with induction?
PS: You can try it out using the following online tool: https://tptp.org/cgi-bin/SystemOnTPTP
PS2: The TPTP syntax manual can be found here: https://www.tptp.org/TPTP/TR/TPTPTR.shtml
This is expected. SMT solvers don't do induction out-of-the-box. You can "coax" them to do induction by proving the base case, then posing the induction-hypothesis and have them prove it using quantifiers; but that's usually a fool's errand. SMT solvers are simply not the right choice for doing inductive proofs. Here're some relevant prior discussions on stackoverflow regarding this matter:
And many others.
Having said that, the new define-fun-rec
construct in SMTLib allows for recursive definitions, whose proofs are naturally done by induction. So, I expect the community will be going towards that direction; adding inductive capabilities over time. For instance, see:
for a paper on how to do this properly in an SMT solver. As far as I know CVC5 has incorporated some of these ideas, but expecting out-of-the-box inductive proofs would be naive at this point. (See https://github.com/cvc5/cvc5/issues/1796)
So, long story short: No, SMT solvers don't do induction. You can coax them to do it, and there's recent work that add further capabilities, but a push-button experience is unlikely. If your goal is to reason about recursive definitions and hence you rely on induction, your best bet is to use a semi-automated theorem prover such as Isabelle, Coq, HOL, HOL-Light, ACLL2, Lean, etc.. all of which have strong facilities to do induction. Furthermore they incorporate SMT solvers as "tactics" so you get the best of both worlds in that sense. (i.e., use a manual strategy to dissect your proof to simple enough subgoals, handle induction etc, and ship the rest off to an SMT-solver.)