I have a Pandas dataframe that looks like:
Group_ID feature1 feature2 label
1 3 2 0
1 5 7 0
1 2 4 1
1 9 9 1
1 2 0 1
2 4 1 1
2 8 8 0
2 5 5 0
3 0 9 1
3 4 7 1
3 2 3 0
3 7 2 0
and for each feature i, I would like to create a new feature called featurei_rel
for i=1,2 using the following logic: it is given by feature i for that row divided by the mean of the smallest two feature i in the same group.
So for example, for feature1 row 1, the smallest two values in Group 1 are 2 and 2 (row 3 and 5 respectively), hence feature1_rel for row 1 is given by 3/((2+2)/2) = 3/2 = 1.5 and the desired result looks like
Group_ID feature1 feature2 feature1_rel feature2_rel label
1 3 2 3/((2+2)/2) 2/((0+2)/2) 0
1 5 7 5/((2+2)/2) 7/((0+2)/2) 0
1 2 4 2//((2+2)/2) 4/((0+2)/2) 1
1 9 9 9//((2+2)/2) 9/((0+2)/2) 1
1 2 0 2/((2+2)/2) 0/((0+2)/2) 1
2 4 1 4/((4+5)/2) 1/((1+5)/2) 1
2 8 8 8/((4+5)/2) 8/((1+5)/2) 0
2 5 5 5/((4+5)/2) 5/((1+5)/2) 0
3 0 9 0/((0+2)/2) 9/((2+3)/2) 1
3 4 7 4/((0+2)/2) 7/((2+3)/2) 1
3 2 3 2/((0+2)/2) 3/((2+3)/2) 0
3 7 2 7/((0+2)/2) 2/((2+3)/2) 0
So here is what I have tried:
# create columns to find the smallest element in a group
df['feature1min'] = df.groupby('Group_ID')['feature1'].transform('min')
# create columns to find the second smallest element in a group
df['feature1min2'] = df.groupby('Group_ID')['feature1'].nsmallest(2)
df['feature1_rel'] = df['feature1']/((df['feature1min'] + df['feature1min2']) / 2)
However, in my actual dataset, I have hundreds of features and millions of rows, so I was wondering is there any fast way to do it, thank you so much in advance.
IIUC, you can try:
col = pd.Index(['feature1', 'feature2'])
df[col+'_rel']= df.groupby('Group_ID')[col].transform(lambda x: x/x.nsmallest(2).mean())
Output:
Group_ID feature1 feature2 label feature1_rel feature2_rel
0 1 3 2 0 1.500000 2.000000
1 1 5 7 0 2.500000 7.000000
2 1 2 4 1 1.000000 4.000000
3 1 9 9 1 4.500000 9.000000
4 1 2 0 1 1.000000 0.000000
5 2 4 1 1 0.888889 0.333333
6 2 8 8 0 1.777778 2.666667
7 2 5 5 0 1.111111 1.666667
8 3 0 9 1 0.000000 3.600000
9 3 4 7 1 4.000000 2.800000
10 3 2 3 0 2.000000 1.200000
11 3 7 2 0 7.000000 0.800000