pandaspandas-melt

regex column selection inside pd.melt function


I have a very wide df with about 100 columns, within which there are several 'XYZ_rating' columns that I want to pivot longer.

df = pd.DataFrame({

    'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],

    'first_rating': [1, 2, 3, 1, 2, 3, 1, 2, 3],

    'second_rating': [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],

    'third_rating': [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9],

})

The df in question has about 100 other columns that I do not want to transform from wide to long. I also want to remove the '_rating' suffix from the the category string.

The solution I came up with didn't work and is roughly translated from my R background.

pd.melt(df, id_vars=str.contains('[^rating]'), value_vars=re.contains(`rating`), var_name='category', value_name='value')

Desired output df would look like:

id category rating 100 other columns...
1 first 1 ...
1 second 2.8 ...
1 third 3.4 ...
...

Solution

  • One option is to reshape with pivot_longer from pyjanitor, using names_sep to split into new columns:

    # pip install pyjanitor
    import pandas as pd
    import janitor
    (df
    .pivot_longer(
        # you could pass a regex here : 
        # re.compile(".+rating")
        column_names = "*rating", # uses the shell glob syntax
        names_to = ("category", ".value"), 
        names_sep = "_")
    )
        id category  rating
    0    1    first     1.0
    1    1    first     2.0
    2    1    first     3.0
    3    2    first     1.0
    4    2    first     2.0
    5    2    first     3.0
    6    3    first     1.0
    7    3    first     2.0
    8    3    first     3.0
    9    1   second     2.8
    10   1   second     2.9
    11   1   second     2.2
    12   2   second     2.0
    13   2   second     1.8
    14   2   second     1.9
    15   3   second     2.2
    16   3   second     2.3
    17   3   second     2.1
    18   1    third     3.4
    19   1    third     3.8
    20   1    third     2.9
    21   2    third     3.2
    22   2    third     2.8
    23   2    third     2.4
    24   3    third     3.3
    25   3    third     3.4
    26   3    third     2.9