r dataframe function rstatix

Run a shapiro_test() within a function based on a dataframe


I have a dataframe with the columns Time_point_secs, Treatment and Pellet.

I want to test for normality before running statistics and then produce a line graph. I am writing a function so that I can reuse the same code for other columns in the dataframe (e.g. Pellet_count).

My function is:

line_graph<-function(var){

Normality<- df %>%
    group_by(Treatment, Time_point_secs) %>%
    filter(n_distinct(.data[[var]]) > 1) %>%
    shapiro_test(.data[[var]]) #rstatix package
  
  return(Normality)
 }

line_graph("Pellet")

But I get an error saying:

Error in `mutate()`:
ℹ In argument: `data = map(.data$data, .f, ...)`.
Caused by error in `map()`:
ℹ In index: 1.
Caused by error in `select()`:
! Can't subset columns that don't exist.
✖ Column `.data[["Pellet"]]` doesn't exist.

I've tried [[var]] and {{var}}, but neither works.


Solution

  • Embracing ({{var}}) should work, provided the function is called with an unquoted data-variable (line_graph(Pellet)) rather than with a character vector as in your example (line_graph("Pellet")).

    shapiro_test(.data[[var]]) fails because shapiro_test() expects "/../ One or more unquoted expressions (or variable names) separated by commas /../" for ...; the .data[[var]] expression is apparently converted to a literal column name instead, roughly as if you had called shapiro_test(vars = '.data[["Pellet"]]'). That matches the last line of the error: Column `.data[["Pellet"]]` doesn't exist.

    So either embrace the argument and pass it unquoted, or keep passing a string and hand it to shapiro_test() through its vars parameter:

    library(rstatix)
    library(dplyr)
    
    # use env-variable, call with character vector: f_envvar("var")
    f_envvar<-function(var){
      mtcars %>%
        group_by(am, gear) %>%
        filter(n_distinct(.data[[var]]) > 1) %>%
        shapiro_test(vars = var) #rstatix package
    }
    
    # use data-variable, call with unquoted column name: f_embrace(var)
    f_embrace<-function(var){
      mtcars %>%
        group_by(am, gear) %>%
        filter(n_distinct({{var}}) > 1) %>%
        shapiro_test({{var}}) #rstatix package
    }
    
    norm_envvar <- f_envvar("vs")
    norm_embrce <- f_embrace(vs)
    norm_envvar
    #> # A tibble: 3 × 5
    #>      am  gear variable statistic          p
    #>   <dbl> <dbl> <chr>        <dbl>      <dbl>
    #> 1     0     3 vs           0.499 0.00000348
    #> 2     1     4 vs           0.566 0.0000632 
    #> 3     1     5 vs           0.552 0.000131
    
    # check if identical:
    identical(norm_envvar, norm_embrce)
    #> [1] TRUE
    
    tibble(mtcars)
    #> # A tibble: 32 × 11
    #>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
    #>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
    #>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
    #>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
    # ...
    #> # ℹ 22 more rows
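
    Applied to the data from the question, the vars approach might look like the sketch below. The data frame is simulated, since the question does not show its data, and passing the data frame as a data argument (instead of reading a global df) is an added assumption rather than part of the original function.

    ```r
    library(dplyr)
    library(rstatix)

    # Simulated stand-in for the asker's data (assumption: 2 treatments,
    # 2 time points, 8 observations per group).
    set.seed(42)
    df <- data.frame(
      Treatment       = rep(c("Control", "Drug"), each = 16),
      Time_point_secs = rep(rep(c(0, 60), each = 8), times = 2),
      Pellet          = rnorm(32, mean = 10, sd = 2)
    )

    # `var` is a column name given as a string.
    line_graph <- function(data, var) {
      data %>%
        group_by(Treatment, Time_point_secs) %>%
        # drop groups where the column is constant, as in the question
        filter(n_distinct(.data[[var]]) > 1) %>%
        # pass the string through `vars` instead of `...`
        shapiro_test(vars = var)
    }

    normality <- line_graph(df, "Pellet")
    normality
    ```

    The same call should then work for another column by changing only the string, e.g. line_graph(df, "Pellet_count"), as long as that column exists in the data.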
    

    Backtrace for the original approach with shapiro_test(.data[[var]]):

    line_graph<-function(var){
      
      Normality<- mtcars %>%
        group_by(am, gear) %>%
        filter(n_distinct(.data[[var]]) > 1) %>%
        shapiro_test(.data[[var]]) #rstatix package
      
      return(Normality)
    }
    
    line_graph("vs")
    #> Error in `mutate()`:
    #> ℹ In argument: `data = map(.data$data, .f, ...)`.
    #> Caused by error in `map()`:
    #> ℹ In index: 1.
    #> Caused by error in `select()`:
    #> ! Can't subset columns that don't exist.
    #> ✖ Column `.data[["vs"]]` doesn't exist.
    #> Backtrace:
    #>      ▆
    #>   1. ├─global line_graph("vs")
    #>   2. │ └─... %>% shapiro_test(.data[[var]])
    #>   3. ├─rstatix::shapiro_test(., .data[[var]])
    #>   4. │ └─data %>% doo(shapiro_test, ..., vars = vars)
    #>   5. ├─rstatix::doo(., shapiro_test, ..., vars = vars)
    #>   6. │ └─... %>% mutate(data = map(.data$data, .f, ...))
    #>   7. ├─dplyr::mutate(., data = map(.data$data, .f, ...))
    #>   8. ├─dplyr:::mutate.data.frame(., data = map(.data$data, .f, ...))
    #>   9. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
    #>  10. │   ├─base::withCallingHandlers(...)
    #>  11. │   └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
    #>  12. │     └─mask$eval_all_mutate(quo)
    #>  13. │       └─dplyr (local) eval()
    #>  14. ├─purrr::map(.data$data, .f, ...)
    #>  15. │ └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
    #>  16. │   ├─purrr:::with_indexed_errors(...)
    #>  17. │   │ └─base::withCallingHandlers(...)
    #>  18. │   ├─purrr:::call_with_cleanup(...)
    #>  19. │   └─rstatix (local) .f(.x[[i]], ...)
    #>  20. │     └─data %>% select(!!!syms(vars))
    #>  21. ├─dplyr::select(., !!!syms(vars))
    #>  22. ├─dplyr:::select.data.frame(., !!!syms(vars))
    #>  23. │ └─tidyselect::eval_select(expr(c(...)), data = .data, error_call = error_call)
    #>  24. │   └─tidyselect:::eval_select_impl(...)
    #>  25. │     ├─tidyselect:::with_subscript_errors(...)
    #>  26. │     │ └─rlang::try_fetch(...)
    #>  27. │     │   └─base::withCallingHandlers(...)
    #>  28. │     └─tidyselect:::vars_select_eval(...)
    #>  29. │       └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
    #>  30. │         └─tidyselect:::eval_c(expr, data_mask, context_mask)
    #>  31. │           └─tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
    #>  32. │             └─tidyselect:::walk_data_tree(new, data_mask, context_mask)
    #>  33. │               └─tidyselect:::as_indices_sel_impl(...)
    #>  34. │                 └─tidyselect:::as_indices_impl(...)
    #>  35. │                   └─tidyselect:::chr_as_locations(x, vars, call = call, arg = arg)
    #>  36. │                     └─vctrs::vec_as_location(...)
    #>  37. └─vctrs (local) `<fn>`()
    #>  38.   └─vctrs:::stop_subscript_oob(...)
    #>  39.     └─vctrs:::stop_subscript(...)
    #>  40.       └─rlang::abort(...)
    

    Created on 2023-06-27 with reprex v2.0.2
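
    Because the goal in the question is to reuse the code for several columns, one possible way is to loop over column names with the string-based variant. This is only a sketch on mtcars; check_normality() and the chosen columns are illustrative names, not part of rstatix.

    ```r
    library(dplyr)
    library(rstatix)

    # Hypothetical helper: same pattern as f_envvar() above, with the
    # data frame passed in explicitly.
    check_normality <- function(data, var) {
      data %>%
        group_by(am, gear) %>%
        filter(n_distinct(.data[[var]]) > 1) %>%
        shapiro_test(vars = var)
    }

    # Columns to test (illustrative choice)
    cols <- c("mpg", "wt")

    # Run the test once per column and stack the results into one table;
    # the `variable` column records which column each row belongs to.
    results <- bind_rows(lapply(cols, function(v) check_normality(mtcars, v)))
    results
    ```

    Looping over strings keeps the filter step specific to each column, so a group is dropped only for the column that is constant within it.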