
Casting a Polars pl.Object column to pl.String raises a ComputeError


I have a pl.LazyFrame with a column of type Object that contains date representations as well as missing values (None).
As a first step, I would like to convert the column from Object to String, but this raises a ComputeError. I cannot figure out why. I suspect the None values are the cause, but I cannot drop them at this point.

import numpy as np
import polars as pl

rng = np.random.default_rng(12345)
df = pl.LazyFrame(
    data={
        "date": rng.choice(
            [None, "03.04.1998", "03.05.1834", "05.06.2025"], 100
        ),
    }
)
df.with_columns(pl.col("date").cast(pl.String)).collect()

Solution

  • When Polars assigns the pl.Object type it essentially means: "I do not understand what this is."

    By the time you end up with this type, it is generally too late to do anything useful with it.

    In this particular case, rng.choice (NumPy's Generator.choice) returns a NumPy array with dtype=object, because the input list mixes None with strings:

    >>> rng.choice([None, "foo"], 3)
    array([None, None, 'foo'], dtype=object)
    

    Polars has a native .sample() method that you could use to create your data instead:

    df = pl.select(date = 
        pl.Series([None, "03.04.1998", "03.05.1834", "05.06.2025"])
          .sample(100, with_replacement=True)
    )
    
    # shape: (100, 1)
    # ┌────────────┐
    # │ date       │
    # │ ---        │
    # │ str        │
    # ╞════════════╡
    # │ null       │
    # │ 05.06.2025 │
    # │ 03.05.1834 │
    # │ 03.04.1998 │
    # │ …          │
    # │ null       │
    # │ 03.04.1998 │
    # │ 03.05.1834 │
    # │ 03.04.1998 │
    # └────────────┘