pythondata-manipulationpython-polars

How to reshape/unpivot multiple columns in a dataset and split the resulting values into a different number of columns in Polars?


I'm trying to restructure a dataframe by combining values from multiple pairs of columns into a single column for each pair. For example, I've got a column called age_pre and a column called age_post, and I want to combine these two columns into one column called age. Same with the exposure_pre and exposure_post columns.

df = pl.from_repr("""
┌─────┬─────────┬──────────┬──────────────┬───────────────┐
│ id  ┆ age_pre ┆ age_post ┆ exposure_pre ┆ exposure_post │
│ --- ┆ ---     ┆ ---      ┆ ---          ┆ ---           │
│ i64 ┆ i64     ┆ i64      ┆ f64          ┆ f64           │
╞═════╪═════════╪══════════╪══════════════╪═══════════════╡
│ 123 ┆ 55      ┆ 56       ┆ 0.49         ┆ 0.51          │
│ 123 ┆ 56      ┆ 57       ┆ 0.49         ┆ 0.51          │
│ 456 ┆ 49      ┆ 50       ┆ 0.8          ┆ 0.2           │
│ 456 ┆ 50      ┆ 51       ┆ 0.8          ┆ 0.2           │
└─────┴─────────┴──────────┴──────────────┴───────────────┘
""")

My desired output would look like this

shape: (8, 3)
┌─────┬─────┬──────────┐
│ id  ┆ age ┆ exposure │
│ --- ┆ --- ┆ ---      │
│ i64 ┆ i64 ┆ f64      │
╞═════╪═════╪══════════╡
│ 123 ┆ 55  ┆ 0.49     │
│ 123 ┆ 56  ┆ 0.51     │
│ 123 ┆ 56  ┆ 0.49     │
│ 123 ┆ 57  ┆ 0.51     │
│ 456 ┆ 49  ┆ 0.8      │
│ 456 ┆ 50  ┆ 0.2      │
│ 456 ┆ 50  ┆ 0.8      │
│ 456 ┆ 51  ┆ 0.2      │
└─────┴─────┴──────────┘

I tried creating multiple unpivoted dataframes and using .hstack() to create a dataframe with my desired output, put this seems really inefficient. How can I do this with the Polars syntax?


Solution

  • You can use pl.concat_list then .explode.

    out = df.with_columns(
        pl.concat_list("age_pre", "age_post").alias("age"),
        pl.concat_list("exposure_pre", "exposure_post").alias("exposure"),
    ).select("id", "age", "exposure").explode("age", "exposure")
            
    print(out)
    
    shape: (8, 3)
    ┌─────┬─────┬──────────┐
    │ id  ┆ age ┆ exposure │
    │ --- ┆ --- ┆ ---      │
    │ i64 ┆ i64 ┆ f64      │
    ╞═════╪═════╪══════════╡
    │ 123 ┆ 55  ┆ 0.49     │
    │ 123 ┆ 56  ┆ 0.51     │
    │ 123 ┆ 56  ┆ 0.49     │
    │ 123 ┆ 57  ┆ 0.51     │
    │ 456 ┆ 49  ┆ 0.8      │
    │ 456 ┆ 50  ┆ 0.2      │
    │ 456 ┆ 50  ┆ 0.8      │
    │ 456 ┆ 51  ┆ 0.2      │
    └─────┴─────┴──────────┘