
Using a precompiled version of Vowpal Wabbit - Downsides?


Due to the difficulty of compiling VW on a RHEL machine, I am opting to use a precompiled version of VW provided by Ariel Faigon (thank you!) here. I'm calling VW from Python, so I plan to use Python's subprocess module (I couldn't get the Python package to compile either). Are there any downsides to this approach? Would I see any performance lag?

Thank you so much for your help!


Solution

  • Feeding a live vowpal wabbit process via Python's subprocess is fine (and fast), as long as you don't start a new process per example and you avoid excessive context switches. In my experience, with this setup you can expect a throughput of ~500k features per second on typical dual-core hardware. This is not as fast as the ~5M features/sec (10x faster) that vw typically processes when not interacting with any other software (reading from file/cache), but it is good enough for most practical purposes. Note that the bottleneck in this setting would most likely be the processing done by the additional process, not vowpal-wabbit itself.

    It is recommended to feed vowpal-wabbit in batches (N examples at a time, instead of one at a time), both on input (feeding vw) and on output (reading vw responses). If you're using subprocess.Popen to connect to the process, make sure to pass a large bufsize; otherwise, by default, the Popen iterator would be line-buffered (one example at a time), which might result in a per-example context switch between the producer of examples and the consumer (vowpal wabbit).

    Assuming your vw command line is in vw_cmd, it would be something like:

    vw_proc = subprocess.Popen(vw_cmd,
                       stdin=subprocess.PIPE,
                       stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                       bufsize=1048576)
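
    As a sketch of the batched feeding pattern described above: write a whole batch of examples at once, then read the responses. The example lines and `vw_cmd` here are hypothetical; `cat` stands in for vw so the snippet is self-contained (vw would emit one prediction line per input example when run with a predictions output option).

```python
import subprocess

# 'cat' echoes one output line per input line, mimicking vw emitting
# one prediction per example. Replace with your actual vw command.
vw_cmd = ['cat']

# Hypothetical examples in VW input format.
examples = [
    '1 | price:0.23 sqft:0.25',
    '0 | price:0.18 sqft:0.15',
]

proc = subprocess.Popen(vw_cmd,
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        bufsize=1048576, text=True)

# Feed the whole batch in one write, then close stdin so the child
# flushes its output and exits; communicate() reads everything back.
out, _ = proc.communicate(''.join(line + '\n' for line in examples))
predictions = out.splitlines()
print(len(predictions))  # one response line per example
```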
    

    Generally, slowness can come from:

      • Starting a new vw process per example instead of keeping one live process
      • Line-buffered (one example at a time) I/O instead of a large bufsize
      • Excessive context switches between the producer of examples and the consumer (vw)

    So avoiding all the above pitfalls should give you the fastest throughput possible under the circumstances of having to interact with an additional process.
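
    For a long-running interactive loop (rather than a one-shot batch), a common pattern is to drain the child's output on a separate thread, so the child never blocks on a full stdout pipe while you are still writing to it. A minimal sketch, again using `cat` as a stand-in for the vw command:

```python
import subprocess
import threading

# 'cat' stands in for the vw command; replace with your vw_cmd.
proc = subprocess.Popen(['cat'],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        bufsize=1048576, text=True)

responses = []

def drain():
    # Consume responses as they arrive so the pipe never fills up
    # and stalls the child process.
    for line in proc.stdout:
        responses.append(line.rstrip('\n'))

reader = threading.Thread(target=drain)
reader.start()

# Feed examples in batches (hypothetical input lines), flushing
# once per batch rather than once per example.
for batch in (['1 | a:1', '0 | b:2'], ['1 | c:3']):
    proc.stdin.write(''.join(ex + '\n' for ex in batch))
    proc.stdin.flush()

proc.stdin.close()
reader.join()
print(len(responses))  # 3
```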