
What does --passes do in Python VowpalWabbit?


The --passes flag sets the number of training passes. But it's unclear what a pass means when training a VW model from Python, for example inside a for loop.

E.g., if I'm training a model example by example in a for loop like this:

for line in train:
    model.learn(line)

How could there be multiple passes if each training sample is learned from only once?


Solution

  • In Python, the passes option only takes effect when the inbuilt driver is used. That only happens when a data file and/or passes is specified in the configuration for the VW object. It does not change the behavior of model.learn(line).

    This is the check (inside the Python code) that decides whether to run the inbuilt parser:

    class vw(pylibvw.vw):
        def __init__(self, arg_str=None, **kw):
            # ...
            ext_file_args = ['d', 'data', 'passes']
            if any(x in kw for x in ext_file_args):
                pylibvw.vw.run_parser(self)
    

    This is one of those confusing cases caused by the fact that VW was a command line tool first. This is definitely something it would be good to make clearer as we work on the bindings.
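
If you do want several passes while calling model.learn yourself, one option is to loop over the data more than once. The sketch below is only an illustration of that idea: learn_in_passes is a hypothetical helper (not part of VW), and CountingModel is a stand-in so the snippet runs without the library installed. It assumes nothing about the model beyond a learn(example) method, which is what the question's loop already relies on.

```python
def learn_in_passes(model, examples, passes):
    """Call model.learn on every example, repeated `passes` times.

    Hypothetical helper, not part of VW. `model` can be any object
    with a learn(example) method.
    """
    # Materialise the data so it can be iterated more than once
    # (a generator or open file would be exhausted after pass one).
    examples = list(examples)
    for _ in range(passes):
        for example in examples:
            model.learn(example)


class CountingModel:
    """Stand-in for a VW model; it just records what it was shown."""

    def __init__(self):
        self.seen = []

    def learn(self, example):
        self.seen.append(example)


train = ["1 | a b", "-1 | c d", "1 | a e"]
model = CountingModel()
learn_in_passes(model, train, passes=3)
print(len(model.seen))  # 9: each of the 3 examples was seen 3 times
```

With a real VW object you would pass it in place of CountingModel. This only reproduces the "see the data N times" part; it does not claim to match everything the inbuilt driver does when passes is set.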