Miller: getting the row containing the max (or min) value


Using Miller, I can easily find the maximum value in a column with stats1 -a max, but I would also like to get the whole row that contains that maximum.

Let's say I have this data in a file named data.csv:

year,value,country
2000,13,ES
2001,18,IT
2002,16,TZ
2003,14,TZ
2004,10,ES

I would like a Miller command that returns the row with the maximum value for each country (so something based on stats1 -a max -f value -g country):

year,value_max,country
2000,13,ES
2001,18,IT
2002,16,TZ

However, mlr --csv stats1 -a max -f value -g country returns only the country and value_max columns, not the year column.

I would like to do this in a single pass, as my data is quite large.

Thanks!


Solution

  • You could use the top verb with -a, which outputs the full records instead of only the top values:

    mlr --csv top -f value -a -g country data.csv >output.csv
    

    to get

    year,value,country
    2000,13,ES
    2001,18,IT
    2002,16,TZ
    

    Add the --min option to get the rows with the smallest values instead of the largest.
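
For anyone wondering why this fits the single-pass requirement, here is a minimal Python sketch of the idea: keep one "best row so far" per group while reading the file once. This is only an illustration of the logic, not Miller's actual implementation. The function name top_rows and its parameters are made up for this example, and keeping the first row seen on a tie is an assumption of the sketch.

```python
import csv
import io


def top_rows(lines, value_field, group_field, use_min=False):
    """Return, for each group, the full row holding the max (or min) value.

    Hypothetical helper for illustration only; reads the input once.
    """
    best = {}  # group key -> best row so far (dicts keep first-seen order)
    for row in csv.DictReader(lines):
        key = row[group_field]
        value = float(row[value_field])
        current = best.get(key)
        if current is None:
            best[key] = row
            continue
        current_value = float(current[value_field])
        # Strict comparison: on a tie, the first row seen is kept (assumption).
        is_better = value < current_value if use_min else value > current_value
        if is_better:
            best[key] = row
    return list(best.values())


# Sample data copied from the question.
DATA = """year,value,country
2000,13,ES
2001,18,IT
2002,16,TZ
2003,14,TZ
2004,10,ES
"""

for row in top_rows(io.StringIO(DATA), "value", "country"):
    print(",".join([row["year"], row["value"], row["country"]]))
```

Memory use grows with the number of distinct groups rather than with the number of rows, which is what makes this approach workable on large files. Passing use_min=True to the sketch mirrors the --min option.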