rdplyrpercentileecdf

Calculating percentile of values from separate grouped dataframes


I have two dataframes, one (df1) contains the minimum 'flow' value for specific events observed at different sites. The second dataframe (df2) contains the complete flow rime series.

df1 <- data.frame(Event = as.character(seq(1,9,1)), 
              Site_ID = rep(c("a","b","c"),each=3), 
              Flow = rnorm(9,2,1))

 df2 <- data.frame(Site_ID = rep(c("a","b","c"),each=20), 
              Flow = rnorm(60,2,1))

I'd like to use dplyr to go through each 'Flow' value in df1, and calculate its percentile value relative to the complete time series of the corresponding site in df2 using ecdf. Any help would be much appreciated. Thanks.


Solution

  • Here's a method using group_split and map2:

    library(tidyverse)
    
    df1 %>%
      mutate(quantile = map2(group_split(df1, Site_ID), 
                             group_split(df2, Site_ID), 
                             ~ecdf(.y$Flow)(.x$Flow)) |> unlist())
    #>   Event Site_ID     Flow quantile
    #> 1     1       a 2.004936     0.55
    #> 2     2       a 2.072371     0.55
    #> 3     3       a 1.767891     0.45
    #> 4     4       b 2.686329     0.70
    #> 5     5       b 1.517909     0.30
    #> 6     6       b 1.574355     0.35
    #> 7     7       c 1.606521     0.30
    #> 8     8       c 1.680232     0.30
    #> 9     9       c 1.604164     0.30