Polars DataFrames have an is_empty method:
import polars as pl
df = pl.DataFrame()
df.is_empty() # True
df = pl.DataFrame({"a": [], "b": [], "c": []})
df.is_empty() # True
Polars LazyFrames have no such method, so I devised the following helper function:
def is_empty(data: pl.LazyFrame) -> bool:
    return (
        data.width == 0  # No columns
        or data.select(pl.len()).collect().item() == 0  # Columns exist, but have no rows
    )
other = pl.LazyFrame()
other.pipe(is_empty) # True
other = pl.LazyFrame({"a": [], "b": [], "c": []})
other.pipe(is_empty) # True
Is there a better way to do this? By better, I mean either without collecting, or less memory-intensive if collecting cannot be avoided.
As explained in the comments, "A LazyFrame doesn't have length. It is a promise on future computations. If we would do those computations implicitly, we would trigger a lot of work silently. IMO when the length is needed, you should materialize into a DataFrame and cache that DataFrame so that that work isn't done twice".
So calling collect is unavoidable, but you can limit the cost by collecting only the first row (if any) with LazyFrame.limit, as suggested by @Timeless:
import polars as pl
df = pl.LazyFrame()
df.limit(1).collect().is_empty() # True
df = pl.LazyFrame({"a": [], "b": [], "c": []})
df.limit(1).collect().is_empty() # True
df = pl.LazyFrame({col: range(100_000_000) for col in ("a", "b", "c")})
df.limit(1).collect().is_empty() # False, and only one row is collected