I'm trying to get a fuller picture of the use of the optimal substructure property in dynamic programming, yet I've gone blind on why we have to prove that any optimal solution to the problem contains within it optimal solutions to the sub-problems.
Wouldn't it be enough to show that some optimal solution to the problem has this property, and then use this to argue that since the solution built by our recursive algorithm is at least as good as an optimal solution, it will itself be optimal? In other words, I fail to spot where in the correctness argument for our algorithm we need that all optimal solutions contain optimal solutions to sub-problems.
To clarify:
The CLRS definition of optimal substructure says that "a problem exhibits optimal substructure if any optimal solution to the problem contains within it optimal solutions to subproblems".
Why wouldn't it not be enough to say that "a problem exhibits optimal substructure if some optimal solution to the problem contains within it optimal solutions to subproblems"?
I've been bothered a bit by this in my research on approximation algorithms, which involves dynamic programs that find approximately optimal solutions. The right way to think about the correctness of dynamic programs, I believe, is as a reduction (in the complexity theory sense) from a problem to a subproblem. This reduction often is applied recursively and with memoization, but those are details right now.
Let A be the problem and B be the subproblem. There's only one subproblem because we can combine multiple independent subproblems into one via a generalized Cartesian product. The reduction consists of two functions: f, from an A-instance to a B-instance, and h, from a B-solution to an A-solution. The correctness property that we need is that, for every function g from each B-instance to a corresponding optimal B-solution (the oracle), the composition h ∘ g ∘ f is a function from each A-instance to a corresponding optimal A-solution. (If h needs access to the A-instance, then extend B so that its instances contain an A-instance that must be copied verbatim into the corresponding B-solution.)
To make your point, for a particular A-instance and an optimal A-solution, there need not exist an oracle g such that the pipeline h ∘ g ∘ f produces that solution from the given instance. (In other words, h need not be surjective.) On the other hand, h must be able to handle every possible optimal B-solution from g for the B-instance constructed by f.
One common strategy for ensuring that h is correct is to find a "substructure" function k from A-solutions to B-solutions and a way to "splice" substructure, that is, a proof that, given an A-solution x and a B-solution y no worse than k(x), there exists an A-solution x' no worse than x such that k(x') = y. Then h can optimize over everything in the inverse image under k of its input. It is not necessary that splicing apply to all solutions x, just one optimal one.