Get first and second highest values in pandas columns


I am using pandas to analyse some election results. I have a DF, Results, which has a row for each constituency and columns representing the votes for the various parties (over 100 of them):

In[60]: Results.columns
Out[60]: 
Index(['Constituency', 'Region', 'Country', 'ID', 'Type', 'Electorate',
       'Total', 'Unnamed: 9', '30-50', 'Above',
       ...
       'WP', 'WRP', 'WVPTFP', 'Yorks', 'Young', 'Zeb', 'Party', 'Votes',
       'Share', 'Turnout'],
      dtype='object', length=147) 

So...

In[63]: Results.head()
Out[63]: 
                         Constituency    Region   Country         ID    Type  \
PAID                                                                           
1                            Aberavon     Wales     Wales  W07000049  County   
2                           Aberconwy     Wales     Wales  W07000058  County   
3                      Aberdeen North  Scotland  Scotland  S14000001   Burgh   
4                      Aberdeen South  Scotland  Scotland  S14000002   Burgh   
5     Aberdeenshire West & Kincardine  Scotland  Scotland  S14000058  County   

      Electorate  Total  Unnamed: 9  30-50  Above    ...     WP  WRP  WVPTFP  \
PAID                                                 ...                       
1          49821  31523         NaN    NaN    NaN    ...    NaN  NaN     NaN   
2          45525  30148         NaN    NaN    NaN    ...    NaN  NaN     NaN   
3          67745  43936         NaN    NaN    NaN    ...    NaN  NaN     NaN   
4          68056  48551         NaN    NaN    NaN    ...    NaN  NaN     NaN   
5          73445  55196         NaN    NaN    NaN    ...    NaN  NaN     NaN   

      Yorks  Young  Zeb  Party  Votes     Share   Turnout  
PAID                                                       
1       NaN    NaN  NaN    Lab  15416  0.489040  0.632725  
2       NaN    NaN  NaN    Con  12513  0.415052  0.662230  
3       NaN    NaN  NaN    SNP  24793  0.564298  0.648550  
4       NaN    NaN  NaN    SNP  20221  0.416490  0.713398  
5       NaN    NaN  NaN    SNP  22949  0.415773  0.751528  

[5 rows x 147 columns]

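The per-constituency results for each party are given in the columns Results.loc[:, 'Unnamed: 9':'Zeb']

I can find the winning party (i.e. the party which polled the highest number of votes) and the number of votes it polled using:

# Per-party vote columns (.ix was removed in pandas 1.0; .loc slices by label)
RawResults = Results.loc[:, 'Unnamed: 9':'Zeb']
# Column name (party) holding the largest value in each row
Results['Party'] = RawResults.idxmax(axis=1)
# The largest value itself, i.e. the winner's vote count
Results['Votes'] = RawResults.max(axis=1).astype(int)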

But I also need to know how many votes the second-placed party got (and ideally its index/name). Is there any way in pandas to return the second-highest value, and the column it came from, across a set of columns for each row?


Solution

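  • To get the highest values in a single column, use nlargest():

    df['High'].nlargest(2)

    This returns the two largest values in column High, in descending order, labelled with the index of the rows they came from. NaN values are ignored.

    For the question as asked (the top two across many columns, for each row), call nlargest(2) on each row instead, for example through DataFrame.apply with axis=1. The second entry of the result is then the runner-up's vote count, and its label is the runner-up's column name.

    nsmallest() works the same way for the lowest values.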