pythonpandascsv

Is there a way to include column index name with Pandas dataframe to CSV?


Is there a way to include the column (not rows!) index name in the output when calling Pandas' dataframe.to_csv() method? For example:

import pandas as pd

iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
pivot_iris = iris.pivot_table(index='species', columns='sepal_length', values='sepal_width')

print(pivot_iris.columns)
print(pivot_iris)

pivot_iris.to_csv('pivot_iris.csv', index=True, header=True)

After calling pivot, the column index name is set to sepal_length as you can see in the prints

Index([4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6,
       5.7, 5.8, 5.9, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7.0,
       7.1, 7.2, 7.3, 7.4, 7.6, 7.7, 7.9],
      dtype='float64', name='sepal_length')

and

sepal_length  4.3       4.4  4.5    4.6  4.7  ...  7.3  7.4  7.6   7.7  7.9
species                                       ...                          
setosa        3.0  3.033333  2.3  3.325  3.2  ...  NaN  NaN  NaN   NaN  NaN
versicolor    NaN       NaN  NaN    NaN  NaN  ...  NaN  NaN  NaN   NaN  NaN
virginica     NaN       NaN  NaN    NaN  NaN  ...  2.9  2.8  3.0  3.05  3.8

[3 rows x 35 columns]

Unfortunately the output file produced with to_csv() is missing the label in front of the column names:

species,4.30,4.40,4.50,4.60,4.70,4.80,4.90,5.00,5.10,5.20,5.30,5.40,5.50,5.60,5.70,5.80,5.90,6.00,6.10,6.20,6.30,6.40,6.50,6.60,6.70,6.80,6.90,7.00,7.10,7.20,7.30,7.40,7.60,7.70,7.90
setosa,3.00,3.03,2.30,3.33,3.20,3.18,3.20,3.36,3.60,3.67,3.70,3.66,3.85,,4.10,4.00,,,,,,,,,,,,,,,,,,,
versicolor,,,,,,,2.40,2.15,2.50,2.70,,3.00,2.44,2.82,2.82,2.67,3.10,2.80,2.88,2.55,2.70,3.05,2.80,2.95,3.07,2.80,3.10,3.20,,,,,,,
virginica,,,,,,,2.50,,,,,,,2.80,2.50,2.73,3.00,2.60,2.80,3.10,2.93,2.92,3.05,,3.04,3.10,3.13,,3.00,3.27,2.90,2.80,3.00,3.05,3.80

is there a way to include it?


Solution

  • You can't really include the index names in a CSV.

    What you could do is to create a MultiIndex:

    pivot_iris = (pd.concat({'sepal_length':
                             iris.pivot_table(index='species', columns='sepal_length', values='sepal_width')},
                            axis=1)
                      .rename_axis(columns=(None, 'species')).reset_index()
                 )
    
    pivot_iris.to_csv('pivot_iris.csv', index=False)
    

    Output:

    species,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length,sepal_length
    ,4.3,4.4,4.5,4.6,4.7,4.8,4.9,5.0,5.1,5.2,5.3,5.4,5.5,5.6,5.7,5.8,5.9,6.0,6.1,6.2,6.3,6.4,6.5,6.6,6.7,6.8,6.9,7.0,7.1,7.2,7.3,7.4,7.6,7.7,7.9
    setosa,3.0,3.03,2.3,3.32,3.2,3.18,3.2,3.36,3.6,3.67,3.7,3.66,3.85,,4.1,4.0,,,,,,,,,,,,,,,,,,,
    versicolor,,,,,,,2.4,2.15,2.5,2.7,,3.0,2.44,2.82,2.82,2.67,3.1,2.8,2.88,2.55,2.7,3.05,2.8,2.95,3.07,2.8,3.1,3.2,,,,,,,
    virginica,,,,,,,2.5,,,,,,,2.8,2.5,2.73,3.0,2.6,2.8,3.1,2.93,2.92,3.05,,3.04,3.1,3.13,,3.0,3.27,2.9,2.8,3.0,3.05,3.8