Is there a penalty to calling with_columns many times in Polars? Does it lead to DataFrame "fragmenting"?
EDIT: I don't mean to distract with the term "fragmenting". My real question is: is there any performance penalty to calling with_columns many times instead of calling with_columns once with many columns?
I don't believe so. The reason this pattern is bad in pandas is that it can trigger consolidation within the BlockManager. If you're inserting N columns one at a time, consolidation can make the whole operation take O(N^2) time, because the existing blocks get copied repeatedly.
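To make the O(N^2) claim concrete, here is a rough cost model in plain Python. It is only a sketch under a worst-case assumption (every insert re-copies all columns present so far); it is not pandas' actual BlockManager code, and the function names are made up for illustration.

```python
def copies_with_consolidation(n_columns: int, n_rows: int) -> int:
    """Count element copies if every insert re-copies all columns so far.

    Worst-case assumption for illustration only; real pandas does not
    necessarily consolidate on every single insert.
    """
    total = 0
    for columns_so_far in range(1, n_columns + 1):
        # Consolidating rewrites every column currently in the frame.
        total += columns_so_far * n_rows
    return total


def copies_without_consolidation(n_columns: int, n_rows: int) -> int:
    """Count element copies if each column is written once and never moved."""
    return n_columns * n_rows


if __name__ == "__main__":
    print(copies_with_consolidation(100, 1000))     # 5050000
    print(copies_without_consolidation(100, 1000))  # 100000
```

The first total grows as n_rows * N * (N + 1) / 2, which is where the quadratic term comes from; the second grows linearly in N.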
Polars lacks an equivalent of BlockManager. I don't believe there's any situation where it stores two columns within the same backing Arrow array.