
Python-Polars: Get column type using an expression


In Python-Polars, I am trying to get the shrunk data type of a column using an expression, so that I can run validations against it.

For example, I would like to build an expression that allows me to do the following:

import polars as pl

df = pl.DataFrame({"list_column": [[1, 2], [3, 4], [5, 6]]})
print(df)
shape: (3, 1)
┌─────────────┐
│ list_column │
│ ---         │
│ list[i64]   │
╞═════════════╡
│ [1, 2]      │
│ [3, 4]      │
│ [5, 6]      │
└─────────────┘

df.with_columns(type_check=pl.lit((pl.col("list_column").shrink_dtype() == pl.List)))

shape: (3, 2)
┌─────────────┬────────────┐
│ list_column ┆ type_check │
│ ---         ┆ ---        │
│ list[i64]   ┆ bool       │
╞═════════════╪════════════╡
│ [1, 2]      ┆ true       │
│ [3, 4]      ┆ true       │
│ [5, 6]      ┆ true       │
└─────────────┴────────────┘

Is this something feasible?


Solution

  • No. First, the data type of list_column in your example is pl.List(pl.Int64), not a bare pl.List: Polars tracks the inner type of nested types, and shrink_dtype currently does not shrink nested types at all.

    Second, the data type is always the same for every row within a given column, so it does not make much sense to compute it row by row.

    Instead, you can use df.collect_schema() to get a Schema object, which maps each column name to its data type.

    Alternatively, consider using dtype selectors (polars.selectors) if you want to perform different operations depending on the column type.