Tags: r, max, row, filtering, subset

Subset rows using slice_max() - how to use it in R?


I have to create a partial data set containing only the 20 days with the highest daily mean air temperature values for each year. My dataset looks like this:

date mean
1997-07-15 27.05292
1997-07-17 26.86542
1997-06-21 26.10958
1997-07-16 26.05833
1997-07-14 26.02500
1997-06-25 25.80125
1997-07-18 25.36208
1997-06-22 25.18875
1997-06-29 24.72333
1997-06-30 24.71000

...

I tried the code below, but it only picks the highest values out of all years combined and returns a data frame with just 20 rows, whereas I need the top 20 mean values for every year (1997–2010). My data is a plain data.frame, by the way. I would be very grateful if anyone could help; I just can't figure it out!

library(dplyr)

# ranks all rows together, so only 20 rows come back in total
top_20_per_year <- daily_mean_temp_sorted %>%
  slice_max(mean, n = 20)
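
A minimal sketch of why the attempt above returns 20 rows in total, using a small hypothetical `toy` data frame (not the real data): without any grouping, `slice_max()` ranks every row at once, so `n` caps the whole result rather than each year.

```r
library(dplyr)

# Hypothetical toy data: three days in each of two years
toy <- data.frame(
  date = as.Date(c("1997-07-15", "1997-07-17", "1997-06-21",
                   "1998-06-25", "1998-07-18", "1998-06-22")),
  mean = c(27.05, 26.87, 26.11, 25.80, 25.36, 25.19)
)

# No grouping: slice_max() ranks the whole table together,
# so n = 2 gives the 2 highest rows overall (both from 1997 here)
overall <- toy %>% slice_max(mean, n = 2)
overall
```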

Solution

  • Group by year with the by argument of slice_max() (available since dplyr 1.1.0). Example taking the top 2 mean values per year:

    library(tidyverse)
    
    df <- tribble(
      ~date, ~mean,
      "1997-07-15", 27.05292,
      "1997-07-17", 26.86542,
      "1997-06-21", 26.10958,
      "1997-07-16", 26.05833,
      "1997-07-14", 26.02500,
      "1998-06-25", 25.80125,
      "1998-07-18", 25.36208,
      "1998-06-22", 25.18875,
      "1998-06-29", 24.72333,
      "1998-06-30", 24.71000
    )
    
    df |> 
      mutate(date = ymd(date), year = year(date)) |> 
      slice_max(n = 2, order_by = mean, by = year)
    #> # A tibble: 4 × 3
    #>   date        mean  year
    #>   <date>     <dbl> <dbl>
    #> 1 1997-07-15  27.1  1997
    #> 2 1997-07-17  26.9  1997
    #> 3 1998-06-25  25.8  1998
    #> 4 1998-07-18  25.4  1998
    

    Created on 2024-04-29 with reprex v2.1.0
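
    The same idea applied to the original goal (20 days per year, 1997–2010), sketched with group_by() instead of by, which also works on dplyr versions older than 1.1.0. The data below is a simulated stand-in for daily_mean_temp_sorted, not real temperatures. with_ties = FALSE is an assumption about what you want: the default (with_ties = TRUE) can return more than 20 rows for a year when values tie at the cutoff.

```r
library(dplyr)

# Simulated stand-in for the real data: one random "mean" per day, 1997-2010
set.seed(1)
dates <- seq(as.Date("1997-01-01"), as.Date("2010-12-31"), by = "day")
daily_mean_temp_sorted <- data.frame(
  date = dates,
  mean = rnorm(length(dates), mean = 10, sd = 8)
)

top_20_per_year <- daily_mean_temp_sorted %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%  # extract year with base R
  group_by(year) %>%                                  # rank within each year
  slice_max(mean, n = 20, with_ties = FALSE) %>%      # exactly 20 rows per year
  ungroup()
```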