I have an optimization problem that involves minimizing a function whose gradient I know, but whose value at any given point is unknown.
I'd like to optimize the function using BFGS, but all of the BFGS implementations I've found seem to require knowledge of the value of the objective, especially in the line search step. I've looked at both a Python (scipy) implementation and a C++ implementation of BFGS.
Obviously I can use gradient descent, but I'd prefer not to reinvent the wheel here.
Any ideas?
Some more detail: I want to minimize h, but I'm not given h directly. What I am given is h = f(g) and an explicit formula for g(x). f basically transforms the gradients of g in a tricky geometric way that is not too difficult to calculate, but impossible to integrate. So it's pretty straightforward to calculate the gradient of h(x), but hard to get explicit values for h(x).
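To make the shape of the problem concrete, here's a toy stand-in (none of this is my actual f or g; it just mimics the call structure):

```python
import numpy as np

def grad_g(x):
    # explicit formula for the gradient of g (toy stand-in)
    return 2.0 * x

def f_transform(dg, x):
    # stand-in for f: in the real problem this geometric transform is easy
    # to evaluate pointwise but impossible to integrate analytically
    return dg / (1.0 + x @ x)

def grad_h(x):
    # the gradient of h = f(g) is computable even though h itself is not
    return f_transform(grad_g(x), x)
```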
After spending some time thinking about this, I think the answer is to just adapt a quasi-Newton method like BFGS. The only place the function value enters into the BFGS computation is in the line search, specifically in checking the first Wolfe condition.
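For reference, the two Wolfe conditions on a step length $\alpha$ along a search direction $p_k$ are

$$f(x_k + \alpha p_k) \le f(x_k) + c_1 \alpha \, \nabla f(x_k)^T p_k \qquad \text{(sufficient decrease / Armijo)}$$

$$\nabla f(x_k + \alpha p_k)^T p_k \ge c_2 \, \nabla f(x_k)^T p_k \qquad \text{(curvature)}$$

with $0 < c_1 < c_2 < 1$. Only the first involves the value of $f$; the curvature condition is stated purely in terms of gradients, and so is the BFGS update itself (it only needs the gradient difference $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$ and the step $s_k = x_{k+1} - x_k$).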
I think the solution is to instead use a line search method that skips the first Wolfe condition (the Armijo rule) and enforces only the curvature condition, which can be checked with gradients alone.
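Here's a rough sketch of what I mean. The names and the naive step-doubling strategy are mine; it just hunts for a step length that flattens the directional derivative enough to pass the curvature test:

```python
import numpy as np

def curvature_only_line_search(grad, x, p, c2=0.9, alpha0=1.0, max_iter=20):
    """Find a step length satisfying only the curvature (second Wolfe)
    condition, using gradients but never the function value.

    grad : callable returning the gradient of the objective
    x    : current iterate
    p    : descent direction (e.g. -H @ grad(x) in BFGS)
    """
    dphi0 = grad(x) @ p                # slope at alpha = 0; negative for a descent direction
    alpha = alpha0
    for _ in range(max_iter):
        dphi = grad(x + alpha * p) @ p
        if dphi >= c2 * dphi0:         # curvature condition: slope has flattened enough
            return alpha
        alpha *= 2.0                   # slope still steeply negative, so the step is too short
    return alpha                       # give up and return the last trial step
```

The obvious caveat: without the Armijo test, nothing stops a badly scaled step from overshooting into a region where h has actually increased, so you give up some robustness in exchange for never needing h.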
I implemented it for BFGS in Python and C++: https://gist.github.com/rmcgibbo/4735287. On second thought, though, I think you could get the same result by supplying the BFGS routine with a function that is always decreasing (e.g. it keeps a counter of the number of times it has been called, and always returns a smaller number than it did the last time you called it). The decrease has to be big enough that you always pass the Armijo rule (http://en.wikipedia.org/wiki/Wolfe_conditions).
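A sketch of that trick against scipy (untested; `grad_h` here is a stand-in quadratic-bowl gradient, and the size of the per-call drop is an arbitrary guess that has to dominate the Armijo right-hand side):

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def always_decreasing():
    """Fake objective: every call returns a value much smaller than the last,
    so any Armijo-style sufficient-decrease test passes. The per-call drop
    (1e3 here) is arbitrary and needs to outweigh c1 * alpha * |grad . p|."""
    counter = itertools.count(1)
    return lambda x: -1e3 * next(counter)

# stand-in gradient of h(x) = 0.5 * ||x - 1||^2, pretending we can't
# actually evaluate h itself
grad_h = lambda x: x - 1.0

res = minimize(always_decreasing(), np.zeros(3), jac=grad_h, method='BFGS',
               options={'gtol': 1e-8})
print(res.x)  # termination is on the gradient norm, so res.fun is meaningless
```

One caveat I'm not sure about: scipy's line search also uses function values to interpolate trial step lengths, so the fake values may steer the step-length choice in odd ways, even though BFGS itself terminates on the gradient norm.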