Consider a dataset "trees" which has several factors and two numerical continuous variables. Two of these variables, plot.type (with classes "one", "ten", "non") and demog (with classes "seed", "sap", "adult"), are used to subset the data. For every subset, Kruskal-Wallis tests should be run on the two continuous variables: count & prop, using the grouping variable guild (with classes "fast", "med", "slow").
Without individually writing out the test on each subset and subset-of-subset of the data, is there a quick way in R to accomplish this task and put calculated p values to a matrix?
Example data
library(dplyr)
set.seed(123)
Data <- tbl_df(
data.frame(
plot.type = as.factor(rep(c("one", "ten", "non"), each = 80)),
guild.type = as.factor(sample(c("fast", "med", "slow"), replace = TRUE, prob = c(.12, .28, .60))),
demog = as.factor(sample(c("seed", "sap", "adult"),, prob = c(.33, .33, .33))),
prop = runif(240, 0, 10),
count = runif(240, 0, 10))
)%>%
group_by(plot.type)
trees <- as.data.frame(Data)
Was looking at other similar questions:
Tried this:
library(dplyr)
library(tidyr)
nm1 <- names(lhs)[4:5]
f1 <- function(x,y) kruskal.test(x~y)$p.value
tree %>%
do({data.frame(Map(f1, .[nm1], list(.$compare_by)))}) %>%
unite(Data_subsets, demog, plot.type, sep="_")
But missing how to include the guild infos as the grouping variable for the KW tests. Perhaps I should try to transform guild to a wide format table here?
Example of desired output:
Data_subsets fast.stat fast.p med.stat med.p slow.stat slow.p
1 seed_one <value> <value> <value> <value> <value> <value>
2 seed_ten <value> <value> <value> <value> <value> <value>
3 seed_non <value> <value> <value> <value> <value> <value>
4 sap_one <value> <value> <value> <value> <value> <value>
5 sap_ten <value> <value> <value> <value> <value> <value>
6 sap_non <value> <value> <value> <value> <value> <value>
7 adult_one <value> <value> <value> <value> <value> <value>
8 adult_ten <value> <value> <value> <value> <value> <value>
9 adult_non <value> <value> <value> <value> <value> <value>
Any suggestions? Thanks in advance!
If I understand you correctly, you can just unite all three factors together, run the Kruskal-Wallis test on all the resulting group, then spread your table wide to have the fast, med, slow separated by columns.
library(dplyr)
set.seed(12)
Data <- tbl_df(
data.frame(
plot.type = as.factor(rep(c("one", "ten", "non"), each = 80)),
guild.type = as.factor(sample(c("fast", "med", "slow"), replace = TRUE, prob = c(.12, .28, .60))),
demog = as.factor(sample(c("seed", "sap", "adult"),, prob = c(.33, .33, .33))),
prop = runif(240, 0, 10),
count = runif(240, 0, 10))
)%>%
group_by(plot.type)
trees <- as.data.frame(Data)
library(dplyr)
library(tidyr)
trees %>%
unite(Data_subsets, demog, plot.type, guild.type, sep="_", remove = F) %>%
group_by(Data_subsets) %>%
summarise(stats=kruskal.test(count~prop)$stat, p = kruskal.test(count~prop)$p.value) %>%
separate(Data_subsets, c("demog", "plot.type", "guild.type")) %>%
unite(Subsets, demog, plot.type, sep="_") %>%
pivot_wider(id_cols = Subsets,
names_from = guild.type,
values_from = c("stats", "p"))
This would result in something like this:
A tibble: 9 x 7
Subsets stats_med stats_slow stats_fast p_med p_slow p_fast
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 adult_non 26 NA NA 0.463 NA NA
2 adult_one 26 NA NA 0.463 NA NA
3 adult_ten 25 NA NA 0.462 NA NA
4 sap_non NA 25 NA NA 0.462 NA
5 sap_one NA 26 NA NA 0.463 NA
6 sap_ten NA 26 NA NA 0.463 NA
7 seed_non NA NA 26 NA NA 0.463
8 seed_one NA NA 25 NA NA 0.462
9 seed_ten NA NA 26 NA NA 0.463
The NAs are just there because the data does not have that comparison as is. I would choose to keep the table without pivoting_wider. Something like this:
trees %>%
unite(Data_subsets, demog, plot.type, guild.type, sep="_", remove = F) %>%
group_by(Data_subsets) %>%
summarise(stats=kruskal.test(count~prop)$stat, p = kruskal.test(count~prop)$p.value) %>%
separate(Data_subsets, c("demog", "plot.type", "guild.type"))
results:
# A tibble: 9 x 5
demog plot.type guild.type stats p
<chr> <chr> <chr> <dbl> <dbl>
1 adult non med 266 4.38e-22
2 adult one med 266 4.38e-22
3 adult ten med 265 6.25e-22
4 sap non slow 265 6.25e-22
5 sap one slow 266 4.38e-22
6 sap ten slow 266 4.38e-22
7 seed non fast 266 4.38e-22
8 seed one fast 265 6.25e-22
9 seed ten fast 266 4.38e-22
I hope this helps.
H