I'm writing a polars plugin that works, but never seems to use more than one CPU. The plugin's function is element-wise, and is marked as such in register_plugin_function
. What might I need to do to get this plugin to properly use parallelism?
It's a bit of a complex plugin, and I can't share the code. Some relevant information:
is_elementwise=True
in the register_plugin_function
call.pl.threadpool_size()
(Python-side) gives the right number, as does printing the pool size on the Rust side with polars_core::POOL
.Any thoughts?
Thanks to @jqurious for getting me to an answer!
I was able to get the plugin running in parallel by forcing it through a LazyFrame and collecting with .collect(engine="streaming")
. Instead of doing
df = df.with_columns(my_plugin(colname, arg))
I did
df = df.lazy().with_columns(my_plugin(colname, arg)).collect(engine="streaming")
and this worked as expected, giving me a ~30x speedup on a 32-core machine. I'm not sure if this is the way Polars intends plugins to work, but it did work.