pythonpandasdataframemeltpandas-melt

How do I melt a pandas dataframe?


On the tag, I often see users asking questions about melting dataframes, but existing questions* are often too specific or lacking explanation to be broadly useful. I am going to attempt a canonical Q&A (self-answer) on this topic. I am is going to clarify:

  1. What is melt?

  2. How do I use melt?

  3. When do I use melt?

Dataset

I will have all my answers on this dataset of random grades for random people with random ages (easier to explain for the answers :D):

import pandas as pd

df = pd.DataFrame({
    'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
    'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
    'English': ['C', 'B', 'B', 'A+', 'F', 'A'],
    'Age': [13, 16, 16, 15, 15, 13]})
>>> df
   Name Math English  Age
0   Bob   A+       C   13
1  John    B       B   16
2   Foo    A       B   16
3   Bar    F      A+   15
4  Alex    D       F   15
5   Tom    C       A   13

Problems

Problem 1

How do I melt a dataframe so that the original dataframe becomes the following?

    Name  Age  Subject Grade
0    Bob   13  English     C
1   John   16  English     B
2    Foo   16  English     B
3    Bar   15  English    A+
4   Alex   17  English     F
5    Tom   12  English     A
6    Bob   13     Math    A+
7   John   16     Math     B
8    Foo   16     Math     A
9    Bar   15     Math     F
10  Alex   17     Math     D
11   Tom   12     Math     C

I want to transpose this so that one column would be each subject and the other columns would be the repeated names of the students and their age and score.

Problem 2

Similar to Problem 1, but I want to make the Subject column only have Math; in other words, I want to filter out the English column:

   Name  Age Subject Grades
0   Bob   13    Math     A+
1  John   16    Math      B
2   Foo   16    Math      A
3   Bar   15    Math      F
4  Alex   15    Math      D
5   Tom   13    Math      C

Problem 3

How would I group the melt and order the students by their scores?

  value             Name                Subjects
0     A         Foo, Tom           Math, English
1    A+         Bob, Bar           Math, English
2     B  John, John, Foo  Math, English, English
3     C         Tom, Bob           Math, English
4     D             Alex                    Math
5     F        Bar, Alex           Math, English

I need it to be ordered and the names separated by comma and also the Subjects separated by comma in the same order respectively.

Problem 4

How would I unmelt a melted dataframe? Let's say I already melted this dataframe:

df = df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades')

To become:

    Name  Age  Subject Grades
0    Bob   13     Math     A+
1   John   16     Math      B
2    Foo   16     Math      A
3    Bar   15     Math      F
4   Alex   15     Math      D
5    Tom   13     Math      C
6    Bob   13  English      C
7   John   16  English      B
8    Foo   16  English      B
9    Bar   15  English     A+
10  Alex   15  English      F
11   Tom   13  English      A

Then how would I translate this back to the original dataframe?

Problem 5

How would I group by the names of the students and separate the subjects and grades by comma?

   Name        Subject Grades
0  Alex  Math, English   D, F
1   Bar  Math, English  F, A+
2   Bob  Math, English  A+, C
3   Foo  Math, English   A, B
4  John  Math, English   B, B
5   Tom  Math, English   C, A

Problem 6

How would I completely melt my dataframe - all columns as values?

     Column Value
0      Name   Bob
1      Name  John
2      Name   Foo
3      Name   Bar
4      Name  Alex
5      Name   Tom
6      Math    A+
7      Math     B
8      Math     A
9      Math     F
10     Math     D
11     Math     C
12  English     C
13  English     B
14  English     B
15  English    A+
16  English     F
17  English     A
18      Age    13
19      Age    16
20      Age    16
21      Age    15
22      Age    15
23      Age    13

* Some existing questions


Solution

  • Note for pandas versions < 0.20.0: I will be using df.melt(...) for my examples, but you will need to use pd.melt(df, ...) instead.

    Documentation

    Most of the solutions here will use melt:

    Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

    This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.

    See also:

    Logic to melting

    Melting combines multiple columns and converts the dataframe from wide to long. For the solution to Problem 1 (covered below), the steps are basically:

    1. First we have the original dataframe.

    2. The melt concatenates the Math and English columns and replicates the other column values (making the dataframe longer).

    3. Then it adds the column Subject which is the transformation of the Grades column headers.

    Image showing how the table transforms

    Solutions

    Problem 1

    This can be solved using DataFrame.melt:

    print(df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades'))
    

    With id_vars set to ['Name', 'Age'], then value_vars is automatically set to the other columns (['Math', 'English']), which are then transposed.

    You could also solve Problem 1 using stack like the below:

    print(
        df.set_index(["Name", "Age"])
        .stack()
        .reset_index(name="Grade")
        .rename(columns={"level_2": "Subject"})
        .sort_values("Subject")
        .reset_index(drop=True)
    )
    

    This code sets the Name and Age columns as the index and stacks the rest of the columns Math and English, and resets the index and assigns Grade as the column name, then renames the other column level_2 to Subject and then sorts by the Subject column, then finally resets the index again.

    Both of these solutions output:

        Name  Age  Subject Grade
    0    Bob   13  English     C
    1   John   16  English     B
    2    Foo   16  English     B
    3    Bar   15  English    A+
    4   Alex   17  English     F
    5    Tom   12  English     A
    6    Bob   13     Math    A+
    7   John   16     Math     B
    8    Foo   16     Math     A
    9    Bar   15     Math     F
    10  Alex   17     Math     D
    11   Tom   12     Math     C
    

    Problem 2

    This is similar to my first question, but this one I only one to filter in the Math columns, this time the value_vars argument can come into use, like the below:

    print(
        df.melt(
            id_vars=["Name", "Age"],
            value_vars="Math",
            var_name="Subject",
            value_name="Grades",
        )
    )
    

    Or we can also use stack with column specification:

    print(
        df.set_index(["Name", "Age"])[["Math"]]
        .stack()
        .reset_index(name="Grade")
        .rename(columns={"level_2": "Subject"})
        .sort_values("Subject")
        .reset_index(drop=True)
    )
    

    Both of these solutions give:

       Name  Age Subject Grade
    0   Bob   13    Math    A+
    1  John   16    Math     B
    2   Foo   16    Math     A
    3   Bar   15    Math     F
    4  Alex   15    Math     D
    5   Tom   13    Math     C
    

    Problem 3

    Problem 3 could be solved with melt and groupby, using the agg function with ', '.join, like the below:

    print(
        df.melt(id_vars=["Name", "Age"])
        .groupby("value", as_index=False)
        .agg(", ".join)
    )
    

    It melts the dataframe then groups by the grades and aggregates them and joins them by a comma.

    stack could be also used to solve this problem, with stack and groupby like the below:

    print(
        df.set_index(["Name", "Age"])
        .stack()
        .reset_index()
        .rename(columns={"level_2": "Subjects", 0: "Grade"})
        .groupby("Grade", as_index=False)
        .agg(", ".join)
    )
    

    This stack function just transposes the dataframe in a way that is equivalent to melt, then resets the index, renames the columns and groups and aggregates.

    Both solutions output:

      Grade             Name                Subjects
    0     A         Foo, Tom           Math, English
    1    A+         Bob, Bar           Math, English
    2     B  John, John, Foo  Math, English, English
    3     C         Bob, Tom           English, Math
    4     D             Alex                    Math
    5     F        Bar, Alex           Math, English
    

    Problem 4

    How would I unmelt a melted dataframe? Let's say I already melted this dataframe:

    df = df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades')
    

    This could be solved with pivot_table. We would have to specify the arguments values, index, columns and also aggfunc:

    print(
        df.pivot_table("Grades", ["Name", "Age"], "Subject", aggfunc="first")
        .reset_index()
        .rename_axis(columns=None)
    )
    

    Output:

       Name  Age English Math
    0  Alex   15       F    D
    1   Bar   15      A+    F
    2   Bob   13       C   A+
    3   Foo   16       B    A
    4  John   16       B    B
    5   Tom   13       A    C
    

    We first pivot the melted dataframe and then reset the index and remove the column axis name.

    See also:

    Problem 5

    Problem 5 could be solved with melt and groupby like the following:

    print(
        df.melt(id_vars=["Name", "Age"], var_name="Subject", value_name="Grades")
        .groupby("Name", as_index=False)
        .agg(", ".join)
    )
    

    That melts and groups by Name.

    Or you could stack:

    print(
        df.set_index(["Name", "Age"])
        .stack()
        .reset_index()
        .groupby("Name", as_index=False)
        .agg(", ".join)
        .rename({"level_2": "Subjects", 0: "Grades"}, axis=1)
    )
    

    Both codes output:

       Name       Subjects Grades
    0  Alex  Math, English   D, F
    1   Bar  Math, English  F, A+
    2   Bob  Math, English  A+, C
    3   Foo  Math, English   A, B
    4  John  Math, English   B, B
    5   Tom  Math, English   C, A
    

    Problem 6

    Problem 6 could be solved with melt and no column needed to be specified, just specify the expected column names:

    print(df.melt(var_name='Column', value_name='Value'))
    

    That melts the whole dataframe.

    Or you could stack:

    print(
        df.stack()
        .reset_index(level=1)
        .sort_values("level_1")
        .reset_index(drop=True)
        .set_axis(["Column", "Value"], axis=1)
    )
    

    Both codes output:

         Column Value
    0       Age    16
    1       Age    15
    2       Age    15
    3       Age    16
    4       Age    13
    5       Age    13
    6   English    A+
    7   English     B
    8   English     B
    9   English     A
    10  English     F
    11  English     C
    12     Math     C
    13     Math    A+
    14     Math     D
    15     Math     B
    16     Math     F
    17     Math     A
    18     Name  Alex
    19     Name   Bar
    20     Name   Tom
    21     Name   Foo
    22     Name  John
    23     Name   Bob