pythonpandasdataframegroup-byswitch-statement

Select mode of each group and assign back to the original dataframe on a condition


I have a dataframe like this:

df = pd.DataFrame({
    'id_vector': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'E', 'E', 'E', 'E'], 
    'svm_prediction': [0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
})

And I want to create a new column in which each value represent the most frequent value in the svm_prediction column and just for the specific id_vector group. For example if for id_vector 'A' the most frequent value in svm_prediction is 1, the new column will have all values 1, same for 0, in case the frequency is equal for both values then put 2 or in each row.

The result will be something like this where column new_label is added to df.

result = pd.DataFrame({
    'id_vector': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'E', 'E', 'E', 'E'], 
    'svm_prediction': [0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0], 
    'new_label': [0, 0, 0, 0, 0, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2]
})

I tried:

r.groupby('id_vector')['svm_prediction'].mode()

but an error is raise because groupby has no attribute mode

also:

r['average'] = r[['svm_prediction']].mode(axis=0)

gives NaN values for every row except the first one.


Solution

  • We can use np.where to match the case you want, and np.mean which will give 0.5 if zero's == one's and more than 0.5 if one's more and less than 0.5 for zero's more.

    df['new_label'] = df.groupby('id_vector').svm_prediction.transform(np.mean)
    df['new_label'] = np.where(df.new_label > 0.5 , 1, np.where(df.new_label < 0.5, 0, 2))
    
    
        id_vector   true_label  svm_prediction  new_label
    0   A   0   0   0
    1   A   0   0   0
    2   A   0   1   0
    3   A   0   1   0
    4   A   0   0   0
    5   B   0   0   2
    6   B   0   1   2
    7   B   0   0   2
    8   B   0   1   2
    9   C   0   1   1
    10  C   0   0   1
    11  C   0   1   1
    12  C   0   1   1
    13  D   1   1   0
    14  D   1   0   0
    15  D   1   0   0
    16  D   1   0   0
    17  E   1   0   2
    18  E   1   1   2
    19  E   1   1   2
    20  E   1   0   2