pythonpandas

Pandas long to wide, WHILE preserving existing columns?


I'm trying to manipulate a dataframe in Pandas and running into some issues. I've looked at some variants of the questions that have been asked here and most of them involve using pivot and discarding some of the existing columns, I'm wondering if there's a way around that.

I created some simple data as an illustration that's similar to my existing data:

import pandas as pd

raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
            'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
            'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
            'Month': ["November", "November", "November", "December","December", "December"], 
              'Sales': [100, 150, 275, 200, 150, 150]}

frame = pd.DataFrame(raw_data, columns =raw_data.keys())

This produces a dataframe that looks like this:

OutputFrame What I want to do, is turn the months into columns, WHILE preserving the other data. So something like this:DesiredFrame

I've tried the suggestions to pivot from here: Pandas long to wide reshape, by two variables

I tried pivoting on months:

frame.pivot(columns = 'Month')

Fail1

I tried adding more columns to see if it would clean up:

frame.pivot(columns = ('FirstName', 'LastName','Month'), values = 'Sales' )

Fail2

In both cases I got some sort bizzare columns. I'm very curious to know what Pandas is doing here, but I have no idea how to make sense of this.

I figure I could loop through and just re-create the data, but I figure this has to be a better way?


Solution

  • You were actually almost all the way there with pivot(). Specifying the index will take you almost all the way there:

    import pandas as pd
    
    raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
                'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
                'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
                'Month': ["November", "November", "November", "December","December", "December"], 
                'Sales': [100, 150, 275, 200, 150, 150]}
    
    frame = pd.DataFrame(raw_data, columns =raw_data.keys())
    
    df = frame.pivot(
        index=["FirstName", "LastName", "Building"],
        columns="Month",
        values="Sales",
    )
    
    df
    

    The only difference is that you will have a multi-level index in your dataframe. If you want to get exactly the desired output, you'd need to collapse the multi-index and rename the index (you can chain them as well)

    import pandas as pd
    
    raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
                'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
                'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
                'Month': ["November", "November", "November", "December","December", "December"], 
                'Sales': [100, 150, 275, 200, 150, 150]}
    
    frame = pd.DataFrame(raw_data, columns =raw_data.keys())
    
    df = (
        frame.pivot(
            index=["FirstName", "LastName", "Building"],
            columns="Month",
            values="Sales"
        )
        .reset_index()              # collapses multi-index
        .rename_axis(None, axis=1)  # renames index
    )
    
    df