python pandas dataframe group-by

How to extract rows matching a specific condition from a dataframe (Python)?


I have the following dataset:

import pandas as pd

A = pd.DataFrame({'vol_num': 1.,
                  'vol_name': pd.Categorical(["test","train","tt","tn","se","train","tt","test","train","tt"]),
                  'lat': [0.188319,0.818803,0.087331,0.305681,0.871307,0.818803,0.087331,0.188319,0.818803,0.087331],
                  'lon': [0.959698,0.678901,0.961500,0.229158,0.947383,0.678901,0.961500,0.959698,0.678901,0.961500],
                  })

Every row with the same "vol_name" has the same "lat" and "lon".
I want to extract the "lat" and "lon" for the three most frequent "vol_name" values in my dataframe.

The following code gives me the three most frequent names:

A['vol_name'].value_counts().head(3)

tt       3
train    3
test     2
Name: vol_name, dtype: int64

However, I don't know how to get the "lat" and "lon" for each of them.

How can I get the following output, as a dataframe with 3 columns?

tt      0.087331    0.961500  
train   0.818803    0.678901
test    0.188319    0.959698

Thank you.

*my real dataset has over 500 rows.


Solution

  • First drop duplicate rows by vol_name, then reorder the rows by the index idx (the three most frequent names) with reindex, and finally drop the vol_num column:

    idx = A["vol_name"].value_counts().head(3).index
    
    A = (
        A.drop_duplicates("vol_name")
        .set_index(["vol_name"])
        .reindex(idx)
        .reset_index()
        .drop(columns="vol_num")  # the positional axis argument (.drop("vol_num", 1)) no longer works in pandas 2.x
    )
    
    print(A)
      vol_name       lat       lon
    0       tt  0.087331  0.961500
    1    train  0.818803  0.678901
    2     test  0.188319  0.959698
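
  • A possible variant, offered as a sketch rather than the only way to do it: group by vol_name, take the first lat/lon of each group, then select the three most frequent names. It assumes, as stated in the question, that lat and lon are constant within each vol_name. The names df, top and result are illustrative, and converting vol_name to plain strings is an assumption made here so that the group keys are only the names actually present.

```python
import pandas as pd

A = pd.DataFrame({
    "vol_num": 1.0,
    "vol_name": pd.Categorical(["test", "train", "tt", "tn", "se", "train", "tt", "test", "train", "tt"]),
    "lat": [0.188319, 0.818803, 0.087331, 0.305681, 0.871307, 0.818803, 0.087331, 0.188319, 0.818803, 0.087331],
    "lon": [0.959698, 0.678901, 0.961500, 0.229158, 0.947383, 0.678901, 0.961500, 0.959698, 0.678901, 0.961500],
})

# Assumption: the categorical dtype is not needed, so work on plain strings.
df = A.assign(vol_name=A["vol_name"].astype(str))

# Names of the three most frequent vol_name values, most frequent first.
top = df["vol_name"].value_counts().head(3).index.tolist()

result = (
    df.groupby("vol_name")[["lat", "lon"]]
    .first()          # lat/lon are assumed identical within a group
    .loc[top]         # keep only the top three, in frequency order
    .reset_index()
)

print(result)
```

    Because "tt" and "train" both occur three times, their relative order in the result depends on how value_counts breaks the tie.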