pythonpandasdataframemulti-index

How do I create a multi-index dataframe from tuples/lists


I want to create (for example) a small multi-index dataframe with 3 level index and one column of values. There will only be a handful of rows.

Value
Category Sub Category Sub Sub Category
Machine Car Sedan 10
Machine Bike 2 wheel 5
Machine Bike 3 wheel 4
Animal Donkey Big 2

A requirement is that the data has to be easy to enter, a row at a time, from left to right (not top to bottom), so my first step is to make a list of lists.

Then I can use DataFrame method and then the set-index method like this:

data=[["Machine","Car","Sedan",10], ["Machine","Bike","2 Wheel",5], ["Machine","Bike","3 Wheel",4], ["Animal","Donkey","Big",2]]
column_names=["Category","Sub Category","Sub Sub Category","Value"]
df=pd.DataFrame(data,columns=column_names)
df.set_index(["Category","Sub Category","Sub Sub Category"])

But is there a way to make the multi-index DataFrame directly from a list, or something like it?


Solution

  • You can create a MultiIndex from a list of tuples (or list of lists) with pandas.MultiIndex.from_tuples:

    pd.MultiIndex.from_tuples([lst[:3] for lst in data],
                              names=['Category', 'Sub Category', 'Sub Sub Category']
                             )
    

    Output:

    MultiIndex([('Machine',    'Car',   'Sedan'),
                ('Machine',   'Bike', '2 Wheel'),
                ('Machine',   'Bike', '3 Wheel'),
                ( 'Animal', 'Donkey',     'Big')],
               names=['Category', 'Sub Category', 'Sub Sub Category'])
    

    To create a DataFrame, you could split the sublists into index/values and pass them to the DataFrame constructor:

    *index, values = map(list, zip(*data))
    df = (pd.DataFrame({'Value': values}, index=index)
            .rename_axis(['Category', 'Sub Category', 'Sub Sub Category'])
         )
    

    Output:

                                            Value
    Category Sub Category Sub Sub Category       
    Machine  Car          Sedan                10
             Bike         2 Wheel               5
                          3 Wheel               4
    Animal   Donkey       Big                   2
    

    Your approach is also perfectly valid, you could just reuse the variable with the list of columns to use as index and chain the operations:

    df = (pd.DataFrame(data, columns=column_names)
            .set_index(column_names[:3])
         )