
Do repeated calls to polars with_columns cause fragmenting?


Is there a penalty to calling with_columns many times in Polars? Does it lead to DataFrame "fragmenting"?

EDIT: I don't mean to distract with the term "fragmenting". My real question is: is there any performance penalty to calling with_columns many times instead of calling with_columns once with many columns?


Solution

  • I don't believe so. The reason this is slow in pandas is that it can trigger consolidation within the BlockManager. If you're inserting N columns, consolidation can make that take O(N^2) time because the blocks are copied repeatedly.

    Polars has no equivalent of BlockManager, and I don't believe there's any situation where it stores two columns in the same backing Arrow array. Adding a column therefore shouldn't force existing columns to be copied into a combined block.
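For contrast, here is a minimal sketch of the pandas pattern described above: inserting columns one at a time versus building them first and concatenating once with pd.concat(axis=1). The sizes (1,000 rows, 200 columns) are arbitrary assumptions chosen to pass the point where pandas starts emitting its "highly fragmented" PerformanceWarning; the exact threshold and whether the warning fires depend on the pandas version, so the script only counts warnings rather than relying on them.

```python
import warnings

import numpy as np
import pandas as pd

# Arbitrary sizes for illustration; large enough to create many blocks.
n_rows, n_cols = 1_000, 200
base = pd.DataFrame({"x": np.arange(n_rows)})

# Fragmenting pattern: each new column assignment adds another block
# to the DataFrame's internal BlockManager.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fragmented = base.copy()
    for i in range(n_cols):
        fragmented[f"c{i}"] = np.arange(n_rows) * i

# Batched alternative: build every new column first, then concatenate once.
new_cols = {f"c{i}": np.arange(n_rows) * i for i in range(n_cols)}
batched = pd.concat([base, pd.DataFrame(new_cols, index=base.index)], axis=1)

perf_warnings = [
    w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)
]
print(f"PerformanceWarnings raised while inserting: {len(perf_warnings)}")

# Both approaches produce the same data; only the internal layout differs.
pd.testing.assert_frame_equal(fragmented, batched)
```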