I'm trying to manipulate a dataframe in Pandas and running into some issues. I've looked at some variants of the questions that have been asked here and most of them involve using pivot and discarding some of the existing columns, I'm wondering if there's a way around that.
I created some simple data as an illustration that's similar to my existing data:
import pandas as pd
raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
'Month': ["November", "November", "November", "December","December", "December"],
'Sales': [100, 150, 275, 200, 150, 150]}
frame = pd.DataFrame(raw_data, columns =raw_data.keys())
This produces a dataframe that looks like this:
OutputFrame What I want to do, is turn the months into columns, WHILE preserving the other data. So something like this:DesiredFrame
I've tried the suggestions to pivot from here: Pandas long to wide reshape, by two variables
I tried pivoting on months:
frame.pivot(columns = 'Month')
I tried adding more columns to see if it would clean up:
frame.pivot(columns = ('FirstName', 'LastName','Month'), values = 'Sales' )
In both cases I got some sort bizzare columns. I'm very curious to know what Pandas is doing here, but I have no idea how to make sense of this.
I figure I could loop through and just re-create the data, but I figure this has to be a better way?
You were actually almost all the way there with pivot()
. Specifying the index
will take you almost all the way there:
import pandas as pd
raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
'Month': ["November", "November", "November", "December","December", "December"],
'Sales': [100, 150, 275, 200, 150, 150]}
frame = pd.DataFrame(raw_data, columns =raw_data.keys())
df = frame.pivot(
index=["FirstName", "LastName", "Building"],
columns="Month",
values="Sales",
)
df
The only difference is that you will have a multi-level index in your dataframe. If you want to get exactly the desired output, you'd need to collapse the multi-index and rename the index (you can chain them as well)
import pandas as pd
raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
'Month': ["November", "November", "November", "December","December", "December"],
'Sales': [100, 150, 275, 200, 150, 150]}
frame = pd.DataFrame(raw_data, columns =raw_data.keys())
df = (
frame.pivot(
index=["FirstName", "LastName", "Building"],
columns="Month",
values="Sales"
)
.reset_index() # collapses multi-index
.rename_axis(None, axis=1) # renames index
)
df