pandas average prefix suffix

Pandas: Average of columns with incremented names in the middle


I have the following data frame:

import pandas as pd

df_ex = pd.DataFrame({
    'alpha.1.try': [2, 4, 2.0, -0.5, 6, 120],
    'alpha.1.test': [1, 3, 4, 2, 40, 11],
    'alpha.1.sample': [3, 2, 3, 4, 2, 2],
    'alpha.3.try': [6, 2.2, 7, 0, 3, 3],
    'alpha.3.test': [12, 4, 7, -5, 5, 5],
    'alpha.3.sample': [2, 3, 8, 2, 12, 8],
    'alpha.5.try': [6, 2.2, 7, 0, 3, 3],
    'alpha.5.test': [12, 4, 11, -5, 5, 5],
    'alpha.5.sample': [2, 3, 8, 2, 12, 8]})
df_ex

|    |   alpha.1.try |   alpha.1.test |   alpha.1.sample |   alpha.3.try |   alpha.3.test |   alpha.3.sample |   alpha.5.try |   alpha.5.test |   alpha.5.sample |
|---:|--------------:|---------------:|-----------------:|--------------:|---------------:|-----------------:|--------------:|---------------:|-----------------:|
|  0 |           2   |              1 |                3 |           6   |             12 |                2 |           6   |             12 |                2 |
|  1 |           4   |              3 |                2 |           2.2 |              4 |                3 |           2.2 |              4 |                3 |
|  2 |           2   |              4 |                3 |           7   |              7 |                8 |           7   |             11 |                8 |
|  3 |          -0.5 |              2 |                4 |           0   |             -5 |                2 |           0   |             -5 |                2 |
|  4 |           6   |             40 |                2 |           3   |              5 |               12 |           3   |              5 |               12 |
|  5 |         120   |             11 |                2 |           3   |              5 |                8 |           3   |              5 |                8 |

In practice it could be quite large, and the column names would vary in number and suffix. Columns sharing the same prefix and suffix form a group to average across the numbers.

I would like to average prefix.1.suffix, prefix.3.suffix and prefix.5.suffix row by row and put these averages in a new column prefix.135.suffix.

I have tried

avg135 = df_ex.columns[df_ex.columns.str.contains('alpha.1') | df_ex.columns.str.contains('alpha.3') |
                       df_ex.columns.str.contains('alpha.5')].tolist()

to create a list of columns for slicing the data frame, because there could be more headers than the ones shown here and I want the option to select a subset. But the rest, grouping columns with the same suffix and averaging them, is beyond my programming skills.


Solution

  • You can use MultiIndex:

    # Split each column header into a 3-tuple, e.g.: ("alpha", "1", "try"),
    # ("alpha", "1", "test"), etc.
    df_ex.columns = pd.MultiIndex.from_tuples([col.split(".") for col in df_ex.columns])
    
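    # Group by prefix and suffix and take the mean of each column group.
    # groupby(..., axis=1) is deprecated in recent pandas versions, so
    # transpose, group on the index levels, and transpose back.
    result = df_ex.T.groupby(level=[0, 2]).mean().T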
    
    # Rename the resulting columns
    result.columns = [f"{a}.135.{b}" for a, b in result.columns]
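The question also asks for the option to average only a subset of the numbers and to keep the original columns. The following is a minimal sketch of one way that might be done without overwriting the headers. The helper name `average_numbered_columns` and its parameters `numbers` and `label` are hypothetical, and it assumes every header of interest has exactly the form prefix.number.suffix.

```python
import pandas as pd


def average_numbered_columns(df, numbers, label):
    """Average columns named prefix.number.suffix across the given numbers.

    Returns a new frame with one column per (prefix, suffix) pair,
    named prefix.label.suffix. The input frame is left unchanged.
    (Hypothetical helper, not part of pandas.)
    """
    wanted = {str(n) for n in numbers}

    # Split each header once; keep only three-part headers whose
    # middle part is one of the wanted numbers.
    parts = {col: col.split(".") for col in df.columns}
    selected = [col for col, p in parts.items()
                if len(p) == 3 and p[1] in wanted]

    # Map each selected column to its target name,
    # e.g. alpha.1.try -> alpha.135.try.
    target = {col: f"{parts[col][0]}.{label}.{parts[col][2]}"
              for col in selected}

    # Transpose so the headers become the index, group them by target
    # name, average each group, and transpose back.
    return df[selected].T.groupby(target).mean().T


# Small illustrative frame; alpha.7.try is ignored because 7 is not requested.
demo = pd.DataFrame({
    "alpha.1.try": [2.0, 4.0],
    "alpha.3.try": [6.0, 2.2],
    "alpha.5.try": [6.0, 2.2],
    "alpha.7.try": [100.0, 100.0],
})
averages = average_numbered_columns(demo, [1, 3, 5], "135")
combined = demo.join(averages)  # original columns plus alpha.135.try
```

Applied to a frame that still has its dotted string headers, `df_ex.join(average_numbered_columns(df_ex, [1, 3, 5], "135"))` would append the alpha.135.* columns next to the originals. Matching on the split parts rather than with `str.contains('alpha.1')` avoids accidentally picking up headers such as alpha.10.try.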