On the pandas tag, I often see users asking questions about melting dataframes, but existing questions* are often too specific or lacking explanation to be broadly useful. I am going to attempt a canonical Q&A (self-answer) on this topic. I am is going to clarify:
What is melt?
How do I use melt?
When do I use melt?
I will have all my answers on this dataset of random grades for random people with random ages (easier to explain for the answers :D):
import pandas as pd
df = pd.DataFrame({
'Name': ['Bob', 'John', 'Foo', 'Bar', 'Alex', 'Tom'],
'Math': ['A+', 'B', 'A', 'F', 'D', 'C'],
'English': ['C', 'B', 'B', 'A+', 'F', 'A'],
'Age': [13, 16, 16, 15, 15, 13]})
>>> df
Name Math English Age
0 Bob A+ C 13
1 John B B 16
2 Foo A B 16
3 Bar F A+ 15
4 Alex D F 15
5 Tom C A 13
How do I melt a dataframe so that the original dataframe becomes the following?
Name Age Subject Grade
0 Bob 13 English C
1 John 16 English B
2 Foo 16 English B
3 Bar 15 English A+
4 Alex 17 English F
5 Tom 12 English A
6 Bob 13 Math A+
7 John 16 Math B
8 Foo 16 Math A
9 Bar 15 Math F
10 Alex 17 Math D
11 Tom 12 Math C
I want to transpose this so that one column would be each subject and the other columns would be the repeated names of the students and their age and score.
Similar to Problem 1, but I want to make the Subject
column only have Math
; in other words, I want to filter out the English
column:
Name Age Subject Grades
0 Bob 13 Math A+
1 John 16 Math B
2 Foo 16 Math A
3 Bar 15 Math F
4 Alex 15 Math D
5 Tom 13 Math C
How would I group the melt and order the students by their scores?
value Name Subjects
0 A Foo, Tom Math, English
1 A+ Bob, Bar Math, English
2 B John, John, Foo Math, English, English
3 C Tom, Bob Math, English
4 D Alex Math
5 F Bar, Alex Math, English
I need it to be ordered and the names separated by comma and also the Subjects
separated by comma in the same order respectively.
How would I unmelt a melted dataframe? Let's say I already melted this dataframe:
df = df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades')
To become:
Name Age Subject Grades
0 Bob 13 Math A+
1 John 16 Math B
2 Foo 16 Math A
3 Bar 15 Math F
4 Alex 15 Math D
5 Tom 13 Math C
6 Bob 13 English C
7 John 16 English B
8 Foo 16 English B
9 Bar 15 English A+
10 Alex 15 English F
11 Tom 13 English A
Then how would I translate this back to the original dataframe?
How would I group by the names of the students and separate the subjects and grades by comma?
Name Subject Grades
0 Alex Math, English D, F
1 Bar Math, English F, A+
2 Bob Math, English A+, C
3 Foo Math, English A, B
4 John Math, English B, B
5 Tom Math, English C, A
How would I completely melt my dataframe - all columns as values?
Column Value
0 Name Bob
1 Name John
2 Name Foo
3 Name Bar
4 Name Alex
5 Name Tom
6 Math A+
7 Math B
8 Math A
9 Math F
10 Math D
11 Math C
12 English C
13 English B
14 English B
15 English A+
16 English F
17 English A
18 Age 13
19 Age 16
20 Age 16
21 Age 15
22 Age 15
23 Age 13
* Some existing questions
Convert columns into rows with Pandas: This one actually could be good, but some more explanation would be better.
Pandas Melt Function: A nice question, and the answer is good, but it's a bit too vague and doesn't have much explanation.
Melting a pandas dataframe: Also a nice answer! But it's only for that particular situation, which is pretty simple, only pd.melt(df)
Pandas dataframe use columns as rows (melt): Very neat! But the problem is that it's only for the specific question the OP asked, which is also required to use pivot_table
as well.
Note for pandas versions < 0.20.0: I will be using df.melt(...)
for my examples, but you will need to use pd.melt(df, ...)
instead.
Most of the solutions here will use melt
:
Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.
See also:
melt()
and wide_to_long()
— pandas User GuideMelting combines multiple columns and converts the dataframe from wide to long. For the solution to Problem 1 (covered below), the steps are basically:
First we have the original dataframe.
The melt concatenates the Math
and English
columns and replicates the other column values (making the dataframe longer).
Then it adds the column Subject
which is the transformation of the Grades
column headers.
This can be solved using DataFrame.melt
:
print(df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades'))
With id_vars
set to ['Name', 'Age']
, then value_vars
is automatically set to the other columns (['Math', 'English']
), which are then transposed.
You could also solve Problem 1 using stack
like the below:
print(
df.set_index(["Name", "Age"])
.stack()
.reset_index(name="Grade")
.rename(columns={"level_2": "Subject"})
.sort_values("Subject")
.reset_index(drop=True)
)
This code sets the Name
and Age
columns as the index and stacks the rest of the columns Math
and English
, and resets the index and assigns Grade
as the column name, then renames the other column level_2
to Subject
and then sorts by the Subject
column, then finally resets the index again.
Both of these solutions output:
Name Age Subject Grade
0 Bob 13 English C
1 John 16 English B
2 Foo 16 English B
3 Bar 15 English A+
4 Alex 17 English F
5 Tom 12 English A
6 Bob 13 Math A+
7 John 16 Math B
8 Foo 16 Math A
9 Bar 15 Math F
10 Alex 17 Math D
11 Tom 12 Math C
This is similar to my first question, but this one I only one to filter in the Math
columns, this time the value_vars
argument can come into use, like the below:
print(
df.melt(
id_vars=["Name", "Age"],
value_vars="Math",
var_name="Subject",
value_name="Grades",
)
)
Or we can also use stack
with column specification:
print(
df.set_index(["Name", "Age"])[["Math"]]
.stack()
.reset_index(name="Grade")
.rename(columns={"level_2": "Subject"})
.sort_values("Subject")
.reset_index(drop=True)
)
Both of these solutions give:
Name Age Subject Grade
0 Bob 13 Math A+
1 John 16 Math B
2 Foo 16 Math A
3 Bar 15 Math F
4 Alex 15 Math D
5 Tom 13 Math C
Problem 3 could be solved with melt
and groupby
, using the agg
function with ', '.join
, like the below:
print(
df.melt(id_vars=["Name", "Age"])
.groupby("value", as_index=False)
.agg(", ".join)
)
It melts the dataframe then groups by the grades and aggregates them and joins them by a comma.
stack
could be also used to solve this problem, with stack
and groupby
like the below:
print(
df.set_index(["Name", "Age"])
.stack()
.reset_index()
.rename(columns={"level_2": "Subjects", 0: "Grade"})
.groupby("Grade", as_index=False)
.agg(", ".join)
)
This stack
function just transposes the dataframe in a way that is equivalent to melt
, then resets the index, renames the columns and groups and aggregates.
Both solutions output:
Grade Name Subjects
0 A Foo, Tom Math, English
1 A+ Bob, Bar Math, English
2 B John, John, Foo Math, English, English
3 C Bob, Tom English, Math
4 D Alex Math
5 F Bar, Alex Math, English
How would I unmelt a melted dataframe? Let's say I already melted this dataframe:
df = df.melt(id_vars=['Name', 'Age'], var_name='Subject', value_name='Grades')
This could be solved with pivot_table
. We would have to specify the arguments values
, index
, columns
and also aggfunc
:
print(
df.pivot_table("Grades", ["Name", "Age"], "Subject", aggfunc="first")
.reset_index()
.rename_axis(columns=None)
)
Output:
Name Age English Math
0 Alex 15 F D
1 Bar 15 A+ F
2 Bob 13 C A+
3 Foo 16 B A
4 John 16 B B
5 Tom 13 A C
We first pivot the melted dataframe and then reset the index and remove the column axis name.
See also:
Problem 5 could be solved with melt
and groupby
like the following:
print(
df.melt(id_vars=["Name", "Age"], var_name="Subject", value_name="Grades")
.groupby("Name", as_index=False)
.agg(", ".join)
)
That melts and groups by Name
.
Or you could stack
:
print(
df.set_index(["Name", "Age"])
.stack()
.reset_index()
.groupby("Name", as_index=False)
.agg(", ".join)
.rename({"level_2": "Subjects", 0: "Grades"}, axis=1)
)
Both codes output:
Name Subjects Grades
0 Alex Math, English D, F
1 Bar Math, English F, A+
2 Bob Math, English A+, C
3 Foo Math, English A, B
4 John Math, English B, B
5 Tom Math, English C, A
Problem 6 could be solved with melt
and no column needed to be specified, just specify the expected column names:
print(df.melt(var_name='Column', value_name='Value'))
That melts the whole dataframe.
Or you could stack
:
print(
df.stack()
.reset_index(level=1)
.sort_values("level_1")
.reset_index(drop=True)
.set_axis(["Column", "Value"], axis=1)
)
Both codes output:
Column Value
0 Age 16
1 Age 15
2 Age 15
3 Age 16
4 Age 13
5 Age 13
6 English A+
7 English B
8 English B
9 English A
10 English F
11 English C
12 Math C
13 Math A+
14 Math D
15 Math B
16 Math F
17 Math A
18 Name Alex
19 Name Bar
20 Name Tom
21 Name Foo
22 Name John
23 Name Bob