I have this dataframe as input:
numberType | NumberInst | Type |
---|---|---|
1 | None | Car |
2 | 1 | Bus |
3 | 1 | Plane |
I did this to duplicate and explode the rows:
df = df.with_columns(pl.col("NumberInst").alias("numberInstExclude"))
df = df.select(pl.exclude("numberInstExclude").repeat_by("numberType").explode())
The result I want is:
numberType | NumberInst | Type | NumberInstRep |
---|---|---|---|
1 | None | Car | 1 |
2 | 1 | Bus | 1 |
2 | 1 | Bus | 2 |
3 | 1 | Plane | 1 |
3 | 1 | Plane | 2 |
3 | 1 | Plane | 3 |
Any idea how I could get "NumberInstRep"? What I want is:
if NumberInst is None then NumberInstRep = 1
else NumberInstRep counts from 1 up to numberType
Thank you.
You can use pl.int_ranges() to create a list column with the right number of repetitions per row, then call df.explode() on that column:
>>> import polars as pl
>>> df = pl.DataFrame({'type': [1, 2, 3], 'inst': [None, 1, 1], 'category': ['car', 'bus', 'plane']})
>>> df.with_columns(pl.int_ranges(1, 1+pl.col('type')).alias('rep')).explode('rep')
shape: (6, 4)
┌──────┬──────┬──────────┬─────┐
│ type ┆ inst ┆ category ┆ rep │
│ ---  ┆ ---  ┆ ---      ┆ --- │
│ i64  ┆ i64  ┆ str      ┆ i64 │
╞══════╪══════╪══════════╪═════╡
│ 1    ┆ null ┆ car      ┆ 1   │
│ 2    ┆ 1    ┆ bus      ┆ 1   │
│ 2    ┆ 1    ┆ bus      ┆ 2   │
│ 3    ┆ 1    ┆ plane    ┆ 1   │
│ 3    ┆ 1    ┆ plane    ┆ 2   │
│ 3    ┆ 1    ┆ plane    ┆ 3   │
└──────┴──────┴──────────┴─────┘