Tags: r, list, function, lapply, kruskal-wallis

How to lapply() a formula over a list of dataframes, or how to run kruskal.test() on each dataframe in a list


I have the data below and am trying to run kruskal.test() on each dataframe in a list:

df_list <- list(
  `1.3.A` = 
    tibble::tribble(
      ~Person, ~Height, ~Weight,
      "Alex",    175L,     75L,
      "Gerard",    180L,     85L,
      "Clyde",    179L,     79L,
      "Alex",    175L,     75L,
      "Gerard",    180L,     85L,
      "Clyde",    179L,     79L
    ),
  `2.2.A` = 
    tibble::tribble(
      ~Person, ~Height, ~Weight,
      "Alex",    175L,     75L,
      "Gerard",    180L,     85L,
      "Clyde",    179L,     79L,
      "Alex",    175L,     75L,
      "Gerard",    180L,     85L,
      "Clyde",    179L,     79L
    ), 
  `1.1.B` = 
    tibble::tribble(
      ~Person, ~Height, ~Weight,
      "Alex",    175L,     75L,
      "Gerard",    180L,     85L,
      "Clyde",    179L,     79L,
      "Alex",    175L,     75L,
      "Gerard",    180L,     85L,
      "Clyde",    179L,     79L
    )
)

I want to run kruskal.test() on each of these 3 dataframes, but after hours of searching I have not found a solution. I am new to R.

My failed attempts:

snake <- function(i){
  kruskal.test(df$Height ~ df$Person, data = i)
}
snail <- lapply(df_list, "[[", snake)


df_list %>% kruskal.test(df$Height ~ df$Person)

sapply(df_list, function(i) { kruskal.test(df$Height ~ df$Person, data = i)})


Map(function(x) kruskal.test(Height ~ Person), get(df_list))

Map(function(df_list, .f(kruskal.test(Height ~ Person)))

lapply(mget(df_list), function(x) kruskal.test(Height ~ Person))

bunny <- df_list %>%
  kruskal_test(df$Height ~ Person, data = .)

Summary: I am trying to run kruskal.test() over a list of dataframes. How can I pass a formula through lapply() or Map() so that kruskal.test() runs on each dataframe in the list?


Solution

  • Your code references an object called "df", which does not appear to exist. Also, when calling kruskal.test(formula, data), there is no need to reference the data frame inside the formula. When a "data" argument is supplied, the function looks up the formula's variables in that data first. In other words, if data frame "x" contains columns "Height" and "Person", then the following works:

    kruskal.test(Height ~ Person, data = x)
    

    In your example, drop the reference to df. The code below passes lapply() an anonymous function with an argument called "i"; lapply() calls it once per data frame in the list, and "i" is what the "data" argument then refers to:

    lapply(df_list, function(i) kruskal.test(Height ~ Person, data = i))
    
    $`1.3.A`
    
        Kruskal-Wallis rank sum test
    
    data:  Height by Person
    Kruskal-Wallis chi-squared = 5, df = 2, p-value = 0.08208
    
    
    $`2.2.A`
    
        Kruskal-Wallis rank sum test
    
    data:  Height by Person
    Kruskal-Wallis chi-squared = 5, df = 2, p-value = 0.08208
    
    
    $`1.1.B`
    
        Kruskal-Wallis rank sum test
    
    data:  Height by Person
    Kruskal-Wallis chi-squared = 5, df = 2, p-value = 0.08208
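
Since the question also asks about Map(), here is a minimal sketch of the same pattern with Map(), plus one way to collect just the p-values. It assumes a cut-down, base-R stand-in for df_list (two data frames, no Weight column); the names results, results_map and p_values are only illustrative.

```r
# Cut-down stand-in for the question's df_list (base R only, illustrative).
df_list <- list(
  `1.3.A` = data.frame(
    Person = rep(c("Alex", "Gerard", "Clyde"), times = 2),
    Height = rep(c(175L, 180L, 179L), times = 2)
  ),
  `2.2.A` = data.frame(
    Person = rep(c("Alex", "Gerard", "Clyde"), times = 2),
    Height = rep(c(175L, 180L, 179L), times = 2)
  )
)

# One test per data frame; the formula's variables are looked up in each `d`.
results <- lapply(df_list, function(d) kruskal.test(Height ~ Person, data = d))

# Map() takes the function first and the list second; with a single list
# argument it returns the same kind of named list as lapply().
results_map <- Map(function(d) kruskal.test(Height ~ Person, data = d), df_list)

# Each result is an "htest" object; extract its p-value into a named vector.
p_values <- vapply(results, function(res) res$p.value, numeric(1))
print(p_values)
```

vapply() is used instead of sapply() here so that the result is guaranteed to be a numeric vector with one value per data frame, named after the list elements.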