In Python Polars, I am trying to get the shrunk data type of a column using an expression, so that I can run validations against it.
For example, I would like to build an expression that allows me to do the following:
df = pl.DataFrame({"list_column": [[1, 2], [3, 4], [5, 6]]})
shape: (3, 1)
┌─────────────┐
│ list_column │
│ --- │
│ list[i64] │
╞═════════════╡
│ [1, 2] │
│ [3, 4] │
│ [5, 6] │
└─────────────┘
df.select(type_check = pl.lit((pl.col("list_column").shrink_dtype() == pl.List)))
shape: (3, 2)
┌─────────────┬────────────┐
│ list_column ┆ type_check │
│ --- ┆ --- │
│ list[i64] ┆ bool │
╞═════════════╪════════════╡
│ [1, 2] ┆ true │
│ [3, 4] ┆ true │
│ [5, 6] ┆ true │
└─────────────┴────────────┘
Is this something feasible?
No. First, the data type of list_column in your example is pl.List(pl.Int64()), so it would not compare equal to pl.List - Polars draws a strong distinction between different nested types, and shrink_dtype does not currently work for that case at all.
Secondly, the data type is always the same for all rows within a given column, so it does not make much sense to repeat the same check for every single row.
You can use df.collect_schema() to get a Schema object instead, which contains the data type for each column.
Alternatively, you might want to consider using dtype selectors if you want to perform different operations for each type.