python pandas dataframe iteration

Pandas df.itertuples renaming dataframe columns when printing


I know that pandas' itertuples() normally returns the values of each row together with the column names, as follows:

import numpy as np
import pandas as pd

ab = pd.DataFrame(np.random.random([3, 3]), columns=['hi', 'low', 'med'])
for i in ab.itertuples():
    print(i)

and the output is as follows:

Pandas(Index=0, hi=0.05421443, low=0.2456833, med=0.491185)
Pandas(Index=1, hi=0.28670429, low=0.5828551, med=0.279305)
Pandas(Index=2, hi=0.53869406, low=0.3427290, med=0.750075)

However, I have no idea why it doesn't show the column names as I expected for another DataFrame of mine, shown below:

            us qqq equity  us spy equity
date                                    
2017-06-19            0.0            1.0
2017-06-20            0.0           -1.0
2017-06-21            0.0            0.0
2017-06-22            0.0            0.0
2017-06-23            1.0            0.0
2017-06-26            0.0            0.0
2017-06-27           -1.0            0.0
2017-06-28            1.0            0.0
2017-06-29           -1.0            0.0
2017-06-30            0.0            0.0

The above is a pandas DataFrame with Timestamps as the index, float64 values, and the list of strings ['us qqq equity','us spy equity'] as the columns.

When I do this:

for row in data.itertuples():
    print(row)

It shows the columns as _1 and _2 as follows:

Pandas(Index=Timestamp('2017-06-19 00:00:00'), _1=0.0, _2=1.0)
Pandas(Index=Timestamp('2017-06-20 00:00:00'), _1=0.0, _2=-1.0)
Pandas(Index=Timestamp('2017-06-21 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-22 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-23 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-26 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-27 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-28 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-29 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-30 00:00:00'), _1=0.0, _2=0.0)

Does anyone have any clue about what I have done wrong? Does it have to do with some variable referencing issue when creating the original dataframe? (Also, as a side question: I learnt from the community that the data generated by itertuples() should be tuples, but, as shown above, each row prints as a Pandas(...) object rather than a plain tuple, which I also checked with type(). Why is that?)

Thank you for all your patience as I am still trying to master the application of DataFrame.


Solution

  • This seems to be an issue with handling column names having spaces in them. If you replace the column names with different ones without spaces, it will work:

    df.columns = ['us_qqq_equity', 'us_spy_equity'] 
    # df.columns = df.columns.str.replace(r'\s+', '_', regex=True)  # Courtesy @MaxU  
    for r in df.head().itertuples():
        print(r)
    
    # Pandas(Index='2017-06-19', us_qqq_equity=0.0, us_spy_equity=1.0)
    # Pandas(Index='2017-06-20', us_qqq_equity=0.0, us_spy_equity=-1.0)
    # ...
    

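    Column names with spaces are not valid Python identifiers, so they cannot be used as field names in a named tuple. itertuples() therefore replaces them with positional names (_1, _2, ...). The renaming happens when each row tuple is built, not when it is printed. The Pandas(...) rows are still tuples: a named tuple is a subclass of tuple.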
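
To make the renaming concrete, here is a minimal sketch that rebuilds a small frame resembling the one in the question (the two rows and the names `df`, `Row` and `records` are illustrative assumptions, not the asker's actual code). It shows that the standard library's `namedtuple` with `rename=True` produces the same `_1`, `_2` field names, that each row is still a tuple, and one possible way to keep the original labels without renaming the columns.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the frame in the question (first two rows only).
df = pd.DataFrame(
    {"us qqq equity": [0.0, 0.0], "us spy equity": [1.0, -1.0]},
    index=pd.to_datetime(["2017-06-19", "2017-06-20"]),
)

# A plain namedtuple with rename=True replaces invalid field names
# (here, names containing spaces) with positional ones.
Row = namedtuple("Row", ["Index", "us qqq equity", "us spy equity"], rename=True)
fields = Row._fields  # ('Index', '_1', '_2')

# itertuples() yields rows with the same positional field names.
first = next(df.itertuples())
print(first._fields)  # ('Index', '_1', '_2')

# The row is a namedtuple, which is a subclass of tuple.
is_tuple = isinstance(first, tuple)  # True

# To keep the original labels, ask for plain tuples (name=None)
# and pair the values with df.columns yourself.
records = [
    dict(zip(df.columns, values))
    for values in df.itertuples(index=False, name=None)
]
print(records[0])  # {'us qqq equity': 0.0, 'us spy equity': 1.0}
```

Pairing the values with `df.columns` is only one option; renaming the columns as shown in the answer works just as well if attribute access such as `row.us_qqq_equity` is wanted.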