I have Excel files with multiple sheets, each of which looks a little like this (but much longer):
Sample CD4 CD8
Day 1 8311 17.3 6.44
8312 13.6 3.50
8321 19.8 5.88
8322 13.5 4.09
Day 2 8311 16.0 4.92
8312 5.67 2.28
8321 13.0 4.34
8322 10.6 1.95
The first column is actually four cells merged vertically.
When I read this using pandas.read_excel, I get a DataFrame that looks like this:
Sample CD4 CD8
Day 1 8311 17.30 6.44
NaN 8312 13.60 3.50
NaN 8321 19.80 5.88
NaN 8322 13.50 4.09
Day 2 8311 16.00 4.92
NaN 8312 5.67 2.28
NaN 8321 13.00 4.34
NaN 8322 10.60 1.95
How can I either get Pandas to understand merged cells, or quickly and easily remove the NaN and group by the appropriate value? (One approach would be to reset the index, step through to find the values and replace NaNs with values, pass in the list of days, then set the index to the column. But it seems like there should be a simpler approach.)
You could use the Series.fillna method to forword-fill in the NaN values:
df.index = pd.Series(df.index).fillna(method='ffill')
For example,
In [42]: df
Out[42]:
Sample CD4 CD8
Day 1 8311 17.30 6.44
NaN 8312 13.60 3.50
NaN 8321 19.80 5.88
NaN 8322 13.50 4.09
Day 2 8311 16.00 4.92
NaN 8312 5.67 2.28
NaN 8321 13.00 4.34
NaN 8322 10.60 1.95
[8 rows x 3 columns]
In [43]: df.index = pd.Series(df.index).fillna(method='ffill')
In [44]: df
Out[44]:
Sample CD4 CD8
Day 1 8311 17.30 6.44
Day 1 8312 13.60 3.50
Day 1 8321 19.80 5.88
Day 1 8322 13.50 4.09
Day 2 8311 16.00 4.92
Day 2 8312 5.67 2.28
Day 2 8321 13.00 4.34
Day 2 8322 10.60 1.95
[8 rows x 3 columns]