
Adding row names within the reframe() function in dplyr


I would like to know how to add a column of row names in combination with the reframe() function in dplyr. Here's toy data: a grouped dataset with three groups crossed with three conditions and a score for each observation.

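library(dplyr)   # group_by(), reframe()
library(tibble)  # add_column()

d <- data.frame(group = rep(letters[1:3], each = 30),
                condition = rep(1:3, times = 30),
                score = sample(1:10, size = 90, replace = TRUE))  # random, so values differ per run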

head(d)

#   group condition score
# 1     a         1     1
# 2     a         2     4
# 3     a         3     9
# 4     a         1     7
# 5     a         2     3
# 6     a         3     9

Now I summarise it using quantile() within reframe(), which produces three rows per unique combination of group and condition:

sumD <- d %>%
  group_by(group, condition) %>%
  reframe(sumScore = quantile(x = score,
                              probs = c(0.025, 0.5, 0.975)))

sumD

# A tibble: 27 × 3
#    group condition sumScore
#    <chr>     <int>    <dbl>
#  1 a             1     2.22
#  2 a             1     4.5 
#  3 a             1     9.33
#  4 a             2     4.22
#  5 a             2     7.5 
#  6 a             2    10   
#  7 a             3     2   
#  8 a             3     7   
#  9 a             3     9.78
# 10 b             1     1.68

What is missing is a label for each row. I can add one manually via add_column() and rep(), using length.out to repeat the sequence of labels until it matches the number of rows.

sumD %>%
  add_column(quantiles = rep(x = c("lowCI", 
                                   "median",
                                   "hiCI"),
                             length.out = nrow(sumD)))

#  A tibble: 27 × 4
#    group condition sumScore quantiles
#    <chr>     <int>    <dbl> <chr>    
#  1 a             1     2.22 lowCI    
#  2 a             1     4.5  median   
#  3 a             1     9.33 hiCI     
#  4 a             2     4.22 lowCI    
#  5 a             2     7.5  median   
#  6 a             2    10    hiCI     
#  7 a             3     2    lowCI    
#  8 a             3     7    median   
#  9 a             3     9.78 hiCI     
# 10 b             1     1.68 lowCI  

But I would like to skip the step where I need to name the object and then use nrow(foo) to specify the length. There are occasions where I do not know in advance how many rows the object I am creating will have, so I cannot supply length.out (or times) to rep(). And if you leave that argument out, rep() returns only three values and add_column() throws an error:

d %>%
  group_by(group, condition) %>%
  reframe(sumScore = quantile(x = score,
                              probs = c(0.025, 0.5, 0.975))) %>%
  add_column(quantiles = rep(x = c("lowCI",
                                   "median",
                                   "hiCI")))

Error in `add_column()`:
! New columns must be compatible with `.data`.
✖ New column has 3 rows.
ℹ `.data` has 27 rows.
Run `rlang::last_trace()` to see where the error occurred.

Does anyone know of a way either to supply row labels within the reframe() call, or alternatively to specify some kind of 'keep going till you reach the (unknown) end' argument in rep()? (Or, in fact, any other tidyverse-type solution.)


Solution

  • You can just add the labels to the reframe() call:

    library(dplyr)
    
    d %>%
      group_by(group, condition) %>%
      reframe(sumScore = quantile(x = score, probs = c(0.025, 0.5, 0.975)),
              quantile = c("lowCI", "median", "hiCI")) 
    
    # A tibble: 27 × 4
       group condition sumScore quantile
       <chr>     <int>    <dbl> <chr>   
     1 a             1     1.23 lowCI   
     2 a             1     6.5  median  
     3 a             1    10    hiCI    
     4 a             2     3.45 lowCI   
     5 a             2     9    median  
     6 a             2    10    hiCI    
     7 a             3     3    lowCI   
     8 a             3     6    median  
     9 a             3     9.78 hiCI    
    10 b             1     1    lowCI  
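
A variant of the call above, as a minimal sketch rather than the only way to do it: defining the probabilities and the labels once and checking that they have the same length keeps the two from drifting apart. The object names probs and labels are illustrative and not from the question. The sketch assumes dplyr 1.1.0 or later, where reframe() accepts .by in place of a separate group_by(); the fixed seed is only there to make the example reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only for reproducibility of the sketch
d <- data.frame(group = rep(letters[1:3], each = 30),
                condition = rep(1:3, times = 30),
                score = sample(1:10, size = 90, replace = TRUE))

# Illustrative names (not from the question): one label per probability
probs  <- c(0.025, 0.5, 0.975)
labels <- c("lowCI", "median", "hiCI")
stopifnot(length(probs) == length(labels))

# Both expressions are evaluated once per group and each has length 3,
# so every group contributes three labelled rows
sumD <- d %>%
  reframe(quantiles = labels,
          sumScore = quantile(score, probs = probs, names = FALSE),
          .by = c(group, condition))

head(sumD, 3)
```

Because the labels are produced inside reframe(), the total number of rows never has to be known in advance.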
    

    Alternatively, quantile() returns a named vector by default (names = TRUE), and those names can be used as labels:

    d %>%
      group_by(group, condition) %>%
      reframe(sumScore = quantile(x = score, probs = c(0.025, 0.5, 0.975)),
              quantiles = names(sumScore))
    
    # A tibble: 27 × 4
       group condition sumScore quantiles 
       <chr>     <int>    <dbl> <chr>
     1 a             1     1.23 2.5% 
     2 a             1     6.5  50%  
     3 a             1    10    97.5%
     4 a             2     3.45 2.5% 
     5 a             2     9    50%  
     6 a             2    10    97.5%
     7 a             3     3    2.5% 
     8 a             3     6    50%  
     9 a             3     9.78 97.5%
    10 b             1     1    2.5% 
    # ℹ 17 more rows
    # ℹ Use `print(n = ...)` to see more rows
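
On the 'keep going till you reach the end' part of the question, here is a sketch under two assumptions: the label order matches the order of probs, and every group contributes exactly three rows. Inside mutate(), n() returns the number of rows in the current data (ungrouped here, because reframe() drops the grouping), so rep() can fill the column without the intermediate object being named. The object name labels and the fixed seed are illustrative only.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, only for reproducibility of the sketch
d <- data.frame(group = rep(letters[1:3], each = 30),
                condition = rep(1:3, times = 30),
                score = sample(1:10, size = 90, replace = TRUE))

labels <- c("lowCI", "median", "hiCI")  # illustrative name, not from the question

sumD <- d %>%
  group_by(group, condition) %>%
  reframe(sumScore = quantile(score, probs = c(0.025, 0.5, 0.975))) %>%
  # n() is the row count of the reframed result, so the labels repeat to the end
  mutate(quantiles = rep(labels, length.out = n()))

head(sumD, 3)
```

This relies on row order, so labelling inside reframe() as shown in the answer is the safer choice when the row order or the number of rows per group could change.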