openmdao

Is there a way to "clean up" an OpenMDAO component to be pickled after an optimization has been run?


Let's take the example below, but imagine that Comp is a much larger group than depicted. After the optimization is run, we want to be able to pickle comp and later unpickle it to call some of its functions that rely on get_val calls. This is for use in a web interface where the optimization is run in one callback and the object needs to be passed to other callbacks to explore the results.

import openmdao.api as om


class Comp(om.Group):

    def setup(self):
        desvars: om.IndepVarComp = self.add_subsystem('desvars', om.IndepVarComp(), promotes=['*'])
        desvars.add_output('x')
        self.add_subsystem('comp', om.ExecComp('y=x**2'), promotes=['*'])
        self.add_design_var('x')
        self.add_objective('y')


if __name__ == '__main__':
    prob = om.Problem()
    comp: Comp = prob.model.add_subsystem('comp', Comp())
    prob.driver = om.ScipyOptimizeDriver(optimizer='SLSQP')
    prob.setup()
    prob.run_driver()

    # Here we want to reduce the size of comp and pickle it

    # Something like the following line might be called in a separate callback after unpickling
    print(comp.get_val('y'))

Just pickling it works, but when comp is a sizable group with many subcomponents, the resulting object is huge. After setup but before run_driver, we usually see sizes of around 5-20 MB. After run_driver, this can skyrocket to over 500 MB, which makes passing the object between callbacks impractical.

Is there any way to "clean up" the comp object to bring it back to the size it was after the setup call, while retaining the values of the variables that were set by the driver?

Update:
I ultimately moved to the prob.load_case approach that Justin and swryan suggested. I was able to drop the case-recording runtime from >1 min to <10 secs by applying some of the suggestions below, but the biggest reduction came from changing the SQLite journal_mode, as shown in the snippet below:

recorder = om.SqliteRecorder(filepath, record_viewer_data=False)
prob.add_recorder(recorder)

prob.setup()
prob.final_setup()

# keep the SQLite rollback journal in memory instead of writing a
# .sql-journal file to disk for every transaction
with recorder.connection as c:
    c.execute('PRAGMA journal_mode = MEMORY')

This prevented a .sql-journal file from being generated hundreds of times as all the subsystems were looped over, and kept the journal in memory instead. This apparently makes the database a little more susceptible to corruption, but it made using the CaseRecorder feasible for my particular application.


Solution

  • The problem is that components don't actually contain their own state --- even though they kind of look like they do. They hold pointers to vector views that actually point back to the global, problem-level vectors. So when you pickle a component, you're actually saving data from the whole problem. In other words, components are not designed to work as standalone objects at all. They are children of the problem and need the surrounding problem to function properly. The increase in size you see is all the memory allocation and various other setup that the problem does on behalf of the component.

    However, if I understand your use case correctly, it seems like you only need to recall the data itself and don't need the ability to re-execute the model (e.g. call run_model or run_driver again) in order to plot the data. So trying to recreate the entire model object seems like overkill.

    Instead, I recommend you use the CaseRecorder and CaseReader objects. These are designed specifically to save and reload the numerical data. The reader even provides the same get_val interface (at the problem level) that the original problem does; a minimal sketch is shown below.
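
    A minimal sketch of that pattern, reusing Comp from the question (the cases.sql file name and the 'final' case name are arbitrary choices, not anything required by the API):

    import openmdao.api as om

    prob = om.Problem()
    prob.model.add_subsystem('comp', Comp())
    prob.driver = om.ScipyOptimizeDriver(optimizer='SLSQP')

    # attach a recorder to the problem so a case can be saved on demand
    prob.add_recorder(om.SqliteRecorder('cases.sql'))

    prob.setup()
    prob.run_driver()
    prob.record('final')   # save one problem-level case after the optimization
    prob.cleanup()

    # later, e.g. in another callback, reload just the data
    cr = om.CaseReader('cases.sql')
    case = cr.get_case('final')
    print(case.get_val('comp.y'))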

    Perhaps there is a reason you don't want to use the CaseRecorder though. In that case, the second-best option I can offer is to pickle the output vector of the Problem object. Then, when you need to reload the data later, you can re-instantiate and re-setup a clean Problem object and push the old output values back into it, after which you should be able to call get_val and set_val on the problem as before; a rough sketch of this idea follows. There is certainly a way to do this, because the [load_case][3] method does essentially exactly this. However, load_case takes a case object, which comes from a CaseReader. So again, I really recommend you use the CaseRecorder and CaseReader objects.
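
    For completeness, here is a rough sketch of that second option, assuming you only need the output values; the outputs.pkl file and the plain dict of {promoted_name: value} are just for illustration, not how load_case itself is implemented:

    import pickle

    # after run_driver: capture every output value by promoted name
    # (out_stream=None makes list_outputs return the data without printing it)
    prom_names = {meta['prom_name'] for _, meta in
                  prob.model.list_outputs(prom_name=True, out_stream=None)}
    saved = {name: prob.get_val(name) for name in prom_names}

    with open('outputs.pkl', 'wb') as f:
        pickle.dump(saved, f)

    # later, in another callback: build a clean problem and push the values back in
    new_prob = om.Problem()
    new_prob.model.add_subsystem('comp', Comp())
    new_prob.setup()
    new_prob.final_setup()

    with open('outputs.pkl', 'rb') as f:
        for name, val in pickle.load(f).items():
            new_prob.set_val(name, val)

    print(new_prob.get_val('comp.y'))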

    Since you have a web application, you could potentially even write your own CaseRecorder/Reader objects that work against your web database instead of a local SQLite file.

    Another strong advantage of the CaseRecorder method is that it gives the model builder fine-grained control over what data gets saved and when, including saving data at different levels of granularity at different times. For example, you can configure the driver to save only the design variables, constraints, and objective, so your optimization time-history won't be a massive amount of data. But you may also want one final case saved that contains EVERY variable in the model --- which would be too much data to save on every iteration --- so you can configure the Problem recording options to be broader and then manually trigger one case to be saved with all the variables using the prob.record('final') call. A sketch of that setup is shown below.
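
    As an illustration of that split, continuing from the question's example (the file names and the 'final' case name are arbitrary, and the exact recording-option defaults can vary between OpenMDAO versions, so treat this as a sketch):

    # lean per-iteration recording on the driver
    prob.driver.add_recorder(om.SqliteRecorder('driver_cases.sql'))
    prob.driver.recording_options['record_desvars'] = True
    prob.driver.recording_options['record_objectives'] = True
    prob.driver.recording_options['record_constraints'] = True
    prob.driver.recording_options['includes'] = []      # nothing beyond the above

    # broad, on-demand recording on the problem
    prob.add_recorder(om.SqliteRecorder('problem_cases.sql'))
    prob.recording_options['includes'] = ['*']          # every variable in the model

    prob.setup()
    prob.run_driver()
    prob.record('final')    # one big case containing all the variables
    prob.cleanup()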