I know that pandas' itertuples() normally returns the values of each row together with the column names, as follows:
import numpy as np
import pandas as pd

ab = pd.DataFrame(np.random.random([3, 3]), columns=['hi', 'low', 'med'])
for i in ab.itertuples():
    print(i)
and the output is as follows:
Pandas(Index=0, hi=0.05421443, low=0.2456833, med=0.491185)
Pandas(Index=1, hi=0.28670429, low=0.5828551, med=0.279305)
Pandas(Index=2, hi=0.53869406, low=0.3427290, med=0.750075)
However, I have no idea why it doesn't show the column names as I expected for another DataFrame of mine, shown below:
            us qqq equity  us spy equity
date
2017-06-19            0.0            1.0
2017-06-20            0.0           -1.0
2017-06-21            0.0            0.0
2017-06-22            0.0            0.0
2017-06-23            1.0            0.0
2017-06-26            0.0            0.0
2017-06-27           -1.0            0.0
2017-06-28            1.0            0.0
2017-06-29           -1.0            0.0
2017-06-30            0.0            0.0
The above is a pandas DataFrame with Timestamps as the index, float64 values, and the list of strings ['us qqq equity', 'us spy equity'] as the columns.
When I do this:
for row in data.itertuples():
    print(row)
It shows the columns as _1 and _2 as follows:
Pandas(Index=Timestamp('2017-06-19 00:00:00'), _1=0.0, _2=1.0)
Pandas(Index=Timestamp('2017-06-20 00:00:00'), _1=0.0, _2=-1.0)
Pandas(Index=Timestamp('2017-06-21 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-22 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-23 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-26 00:00:00'), _1=0.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-27 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-28 00:00:00'), _1=1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-29 00:00:00'), _1=-1.0, _2=0.0)
Pandas(Index=Timestamp('2017-06-30 00:00:00'), _1=0.0, _2=0.0)
Does anyone have any idea what I have done wrong? Does it have to do with some variable referencing issue when creating the original DataFrame? (Also, as a side question: I learnt from the community that itertuples() should generate tuples, but, as shown above and as I checked with type(), the returned objects do not appear to be plain tuples?)
Thank you for all your patience as I am still trying to master the application of DataFrame.
This seems to be an issue with handling column names having spaces in them. If you replace the column names with different ones without spaces, it will work:
df.columns = ['us_qqq_equity', 'us_spy_equity']
# df.columns = df.columns.str.replace(r'\s+', '_', regex=True) # Courtesy @MaxU
for r in df.head().itertuples():
    print(r)
# Pandas(Index='2017-06-19', us_qqq_equity=0.0, us_spy_equity=1.0)
# Pandas(Index='2017-06-20', us_qqq_equity=0.0, us_spy_equity=-1.0)
# ...
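Column names that are not valid Python identifiers (for example, names containing spaces) cannot be used as namedtuple field names, so itertuples() replaces them with positional names such as _1 and _2 when it builds each row, not merely when printing.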
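If renaming the columns is not convenient, the values can still be matched back to the original labels. The following is a minimal sketch, assuming a small made-up frame that mirrors the one in the question (the data and the variable names `first`, `labelled`, and `plain` are illustrative only). It also touches on the side question: each row is a namedtuple, which is a subclass of tuple.

```python
import pandas as pd

# Hypothetical data mirroring the question's frame (two rows only).
df = pd.DataFrame(
    {"us qqq equity": [0.0, 0.0], "us spy equity": [1.0, -1.0]},
    index=pd.to_datetime(["2017-06-19", "2017-06-20"]),
)

# Take the first row produced by itertuples().
first = next(df.itertuples())

# Field names that are invalid identifiers become positional names.
fields = first._fields

# The row class is a namedtuple, so it is still a tuple.
is_tuple = isinstance(first, tuple)

# Pair the original column labels with the row values (position 0 is the index).
labelled = dict(zip(df.columns, first[1:]))

# Alternatively, ask for plain tuples with no index and no field names.
plain = list(df.itertuples(index=False, name=None))

print(fields)
print(is_tuple)
print(labelled)
print(plain)
```

Pairing `df.columns` with `first[1:]` relies on itertuples() keeping the column order, with the index in position 0 when `index=True` (the default).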