Consider the following input dataframe:
index | col_1 | col_2 |
1 | 1234 | 4567 |
2 | 3456 | 9453 |
Each column of the dataframe is a series (timeseries), and we want to do some computations that create series of length equal to the input (for example, computing the running mean of the series of last 5 samples (op_1), and of 10 samples (op_2).
Finally, the output should be grouped under the name of the column like shown below:
Output:
| col_1 | col_2 |
index | value opr_1 opr_2 | value opr_1 opr_2 |
1 | 1234 10 1 | 4567 22 13 |
2 | 3456 18 6 | 9453 21 4 |
This should allow me to access each original column's related computation under a single head col_1
.
Initially, I thought of increasing the level of the input dataframe manually as:
df.columns = pd.MultiIndex.from_product([df.columns, ['value']])
But, I cannot figure out how to run apply
on its second level alone (considering that I want to address the column as df['col_1']['value']
and then to put those values into the dataframe at the same level inside df['col_1']['op_1']
.
So, the second approach I tried was to create a dataframe for each operation as
op_1 = df.apply(lambda x: op_1_func(x, **params))
op_2 = df.apply(lambda x: op_2_func(x, **params))
And then merge the three dataframes to create the desired multilevel view. However, I cannot figure out a way to concat the dataframes to produce the desired output. Please help!
Given the following example:
import numpy as np
import pandas as pd
def opr1_func(x, y):
return sum(x) - y
def opr2_func(x, y):
return np.mean(x) - y
df = pd.DataFrame({"col1": [2, 4, 6, 8, 10], "col2": [6, 8, 12, 14, 16]})
Here is one way to do it using Pandas advanced indexing with hierarchical index:
df.columns = pd.MultiIndex.from_product([df.columns, ["value"]])
for col in ["col1", "col2"]:
for new_col, func in zip(["opr_1", "opr_2"], [opr1_func, opr2_func]):
df[(col, new_col)] = df[(col, "value")].rolling(2).apply(lambda x: func(x, 10))
df = df.reindex(
pd.MultiIndex.from_product([["col1", "col2"], ["value", "opr_1", "opr_2"]]), axis=1
)
Then:
print(df)
# Output
col1 col2
value opr_1 opr_2 value opr_1 opr_2
0 2 NaN NaN 6 NaN NaN
1 4 -4.0 -7.0 8 4.0 -3.0
2 6 0.0 -5.0 12 10.0 0.0
3 8 4.0 -3.0 14 16.0 3.0
4 10 8.0 -1.0 16 20.0 5.0