Tags: dataframe, count, python-polars, explode, polars

Explode and duplicate rows in df and create a new column with the correct count of the values


As input, I have this dataframe:

numberType  NumberInst  Type
1           None        Car
2           1           Bus
3           1           Plane

I did this to explode and duplicate the rows:

df = df.with_columns(pl.col("NumberInst").alias("numberInstExclude"))
df = df.select(pl.exclude("numberInstExclude").repeat_by("numberType").explode())

The result I want is:

numberType  NumberInst  Type   NumberInstRep
1           None        Car    1
2           1           Bus    1
2           1           Bus    2
3           1           Plane  1
3           1           Plane  2
3           1           Plane  3

Any idea how I could get "NumberInstRep"? What I want to do is:

if NumberInst is None, then NumberInstRep = 1
else, NumberInstRep counts from 1 up to numberType

Thank you.


Solution

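  • You can use pl.int_ranges() to build, for each row, a list running from 1 up to that row's count, then call df.explode() on the new column so each list element becomes its own row. The example shortens the column names: type stands for numberType, inst for NumberInst, and category for Type.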

    >>> import polars as pl
    >>> df = pl.DataFrame({'type': [1, 2, 3], 'inst': [None, 1, 1], 'category': ['car', 'bus', 'plane']})
    >>> df.with_columns(pl.int_ranges(1, 1+pl.col('type')).alias('rep')).explode('rep')
    shape: (6, 4)
    ┌──────┬──────┬──────────┬─────┐
    │ type ┆ inst ┆ category ┆ rep │
    │ ---  ┆ ---  ┆ ---      ┆ --- │
    │ i64  ┆ i64  ┆ str      ┆ i64 │
    ╞══════╪══════╪══════════╪═════╡
    │ 1    ┆ null ┆ car      ┆ 1   │
    │ 2    ┆ 1    ┆ bus      ┆ 1   │
    │ 2    ┆ 1    ┆ bus      ┆ 2   │
    │ 3    ┆ 1    ┆ plane    ┆ 1   │
    │ 3    ┆ 1    ┆ plane    ┆ 2   │
    │ 3    ┆ 1    ┆ plane    ┆ 3   │
    └──────┴──────┴──────────┴─────┘