I am trying to run Kruskal-Wallis tests on multiple columns of my example data frame (df) in R, but I am stuck with the following error:
Error in model.frame.default(formula = as.numeric(x) ~ as.factor(Groups), :
variable lengths differ (found for 'as.factor(Groups)')
Here is my example dataframe (df):
Groups Gene1 Gene2 Gene3 Gene4 Gene5 Gene6 Gene7 Gene8 Gene9 Gene10
Group1 120.67 69.33 1.24 2.31 0.39 6.57 2.49 383.84 415.23 NA
Group1 157 110.67 0.4 0.84 0.28 2.62 2.11 245.42 325.23 NA
Group1 113.5 66.75 1.07 4.53 0.33 2.37 2.35 421.25 352.03 73.51
Group1 131 79.67 1.13 5.03 0.72 3.36 2.24 305.32 432.81 71.11
Group1 120 79.67 0.91 3.84 0.74 3.77 1.92 298.91 382.43 66.49
Group2 125.67 83.67 2.07 1.73 0.38 3.89 2.09 233.81 377.21 72.1
Group2 103.33 68.67 1.01 4.89 0.3 4.5 1.75 231.5 381.73 53
Group2 121.33 74.67 0.54 2.39 3.95 3.7 2.46 310.66 355.97 143.61
Group2 136 83.67 1.6 1.75 0.32 5.17 2.36 410.21 389.62 170.34
Group2 143.67 71.33 0.56 1.22 0.26 4.48 2.62 294.01 491.57 96.72
Group2 134.67 69.67 0.85 1.77 0.45 3.58 2.44 236.61 441.32 69.06
Group2 158.33 98.33 0.87 3.69 0.51 2.53 2.6 257.66 396.96 41.94
Group2 147.33 88.33 NA NA NA NA NA NA NA NA
Group2 95.67 59 1.39 0.56 0.31 2.49 2.09 395.38 420.28 64.83
Group3 135 82 13.31 24.05 1.21 3.83 2.83 313.71 327.84 66.8
Group3 124.67 78 1.12 2 0.71 3.77 2.42 334.36 358.9 131.35
Group3 152 98.33 1.11 1.54 0.35 2.11 2.21 297.68 433.48 117.18
Group3 135.33 73.67 0.13 2.99 0.3 2.4 1.86 296.82 415.13 112.97
Group3 135.33 87 0.91 3.73 0.65 2.92 1.85 335.31 412.16 103.18
Group4 124.67 77.67 0.28 0.81 0.49 2.62 1.96 251.49 468.19 80.27
Group4 125.67 72.33 1.01 1.82 0.35 3.65 1.62 335.18 264.74 145.15
Group4 169 105 0.6 3.12 0.29 3.9 2.22 311.01 459.85 82.89
Group4 123.67 76.33 0.65 1.78 0.47 2.77 1.57 253.56 283.38 59.07
Group5 132.67 76.33 2.94 17.01 0.27 3.99 2.55 354.78 493.02 145.36
Group5 NA NA 1.34 1.42 0.4 4.21 2.02 243.26 345.2 43.91
Group5 144.33 75 NA NA 0.55 3.26 2.85 312.16 419.86 55.71
Group5 136.25 78.25 NA 1.32 0.65 3.63 1.52 267.13 256.18 53.49
Group5 123.67 69.33 1.81 1.52 0.67 3.89 2 303.89 346.57 112.16
Group5 116.67 66.33 0.7 1.68 0.27 3.55 2.16 284.96 407.04 102.97
Group5 136.67 76 2.68 4.3 0.33 7.36 2.26 237.28 423.29 88.65
Group6 122 63.33 0.87 4.2 0.17 3.92 2.11 159.04 300.24 60.13
Group6 130.67 82.67 0.8 1.85 1 5.26 2.46 388.61 558.51 66.76
Group6 136.33 70.33 0.54 2.26 0.35 NA NA 388.81 551.69 113.39
Group6 127.33 73 1.32 2.19 0.99 4.42 2.59 378.57 501.12 85.56
Group7 186.67 89.67 0.79 1.77 0.53 5.22 2.73 269.87 490.25 77.74
Group7 203 93 5.63 22.08 0.82 6.97 2.92 341.87 611.33 92.7
Group7 127 72.67 0.55 1.07 0.38 3.2 1.69 310.9 410.19 65.62
Group7 142 79.67 1.61 1.35 3.24 3.73 2.08 304.52 495.79 60.15
Here is my code:
kw.tests <- lapply(
  data[, -1],
  function(x) {
    kruskal.test(as.numeric(x) ~ as.factor(Groups), data = data_test, na.action = na.omit)
  }
)
Error in model.frame.default(formula = as.numeric(x) ~ as.factor(Groups), :
variable lengths differ (found for 'as.factor(Groups)')
The test runs fine when I run it on each gene individually, for example, for Gene1:
kruskal.test(Gene1 ~ as.factor(Groups), data = data_test, na.action=na.omit)
Kruskal-Wallis rank sum test
data: Gene1 by as.factor(Groups)
Kruskal-Wallis chi-squared = 5.6607, df = 6, p-value = 0.4622
However, I get this error when I use lapply or even a for loop. I have googled the error several times, but none of the answers I found have helped.
Here is the dput output of my data:
> dput(data_test)
structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L), .Label = c("Group1",
"Group2", "Group3", "Group4", "Group5", "Group6", "Group7"), class = "factor"),
Gene1 = c(120.67, 157, 113.5, 131, 120, 125.67, 103.33, 121.33,
136, 143.67, 134.67, 158.33, 147.33, 95.67, 135, 124.67,
152, 135.33, 135.33, 124.67, 125.67, 169, 123.67, 132.67,
NA, 144.33, 136.25, 123.67, 116.67, 136.67, 122, 130.67,
136.33, 127.33, 186.67, 203, 127, 142), Gene2 = c(69.33,
110.67, 66.75, 79.67, 79.67, 83.67, 68.67, 74.67, 83.67,
71.33, 69.67, 98.33, 88.33, 59, 82, 78, 98.33, 73.67, 87,
77.67, 72.33, 105, 76.33, 76.33, NA, 75, 78.25, 69.33, 66.33,
76, 63.33, 82.67, 70.33, 73, 89.67, 93, 72.67, 79.67), Gene3 = c(1.24,
0.4, 1.07, 1.13, 0.91, 2.07, 1.01, 0.54, 1.6, 0.56, 0.85,
0.87, NA, 1.39, 13.31, 1.12, 1.11, 0.13, 0.91, 0.28, 1.01,
0.6, 0.65, 2.94, 1.34, NA, NA, 1.81, 0.7, 2.68, 0.87, 0.8,
0.54, 1.32, 0.79, 5.63, 0.55, 1.61), Gene4 = c(2.31, 0.84,
4.53, 5.03, 3.84, 1.73, 4.89, 2.39, 1.75, 1.22, 1.77, 3.69,
NA, 0.56, 24.05, 2, 1.54, 2.99, 3.73, 0.81, 1.82, 3.12, 1.78,
17.01, 1.42, NA, 1.32, 1.52, 1.68, 4.3, 4.2, 1.85, 2.26,
2.19, 1.77, 22.08, 1.07, 1.35), Gene5 = c(0.39, 0.28, 0.33,
0.72, 0.74, 0.38, 0.3, 3.95, 0.32, 0.26, 0.45, 0.51, NA,
0.31, 1.21, 0.71, 0.35, 0.3, 0.65, 0.49, 0.35, 0.29, 0.47,
0.27, 0.4, 0.55, 0.65, 0.67, 0.27, 0.33, 0.17, 1, 0.35, 0.99,
0.53, 0.82, 0.38, 3.24), Gene6 = c(6.57, 2.62, 2.37, 3.36,
3.77, 3.89, 4.5, 3.7, 5.17, 4.48, 3.58, 2.53, NA, 2.49, 3.83,
3.77, 2.11, 2.4, 2.92, 2.62, 3.65, 3.9, 2.77, 3.99, 4.21,
3.26, 3.63, 3.89, 3.55, 7.36, 3.92, 5.26, NA, 4.42, 5.22,
6.97, 3.2, 3.73), Gene7 = c(2.49, 2.11, 2.35, 2.24, 1.92,
2.09, 1.75, 2.46, 2.36, 2.62, 2.44, 2.6, NA, 2.09, 2.83,
2.42, 2.21, 1.86, 1.85, 1.96, 1.62, 2.22, 1.57, 2.55, 2.02,
2.85, 1.52, 2, 2.16, 2.26, 2.11, 2.46, NA, 2.59, 2.73, 2.92,
1.69, 2.08), Gene8 = c(383.84, 245.42, 421.25, 305.32, 298.91,
233.81, 231.5, 310.66, 410.21, 294.01, 236.61, 257.66, NA,
395.38, 313.71, 334.36, 297.68, 296.82, 335.31, 251.49, 335.18,
311.01, 253.56, 354.78, 243.26, 312.16, 267.13, 303.89, 284.96,
237.28, 159.04, 388.61, 388.81, 378.57, 269.87, 341.87, 310.9,
304.52), Gene9 = c(415.23, 325.23, 352.03, 432.81, 382.43,
377.21, 381.73, 355.97, 389.62, 491.57, 441.32, 396.96, NA,
420.28, 327.84, 358.9, 433.48, 415.13, 412.16, 468.19, 264.74,
459.85, 283.38, 493.02, 345.2, 419.86, 256.18, 346.57, 407.04,
423.29, 300.24, 558.51, 551.69, 501.12, 490.25, 611.33, 410.19,
495.79), Gene10 = c(NA, NA, 73.51, 71.11, 66.49, 72.1, 53,
143.61, 170.34, 96.72, 69.06, 41.94, NA, 64.83, 66.8, 131.35,
117.18, 112.97, 103.18, 80.27, 145.15, 82.89, 59.07, 145.36,
43.91, 55.71, 53.49, 112.16, 102.97, 88.65, 60.13, 66.76,
113.39, 85.56, 77.74, 92.7, 65.62, 60.15)), class = "data.frame", row.names = c(NA,
-38L))
Any help would be appreciated. Thank you.
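You used the wrong dataset name in your lapply / apply call: it iterates over the columns of data, but Groups is taken from data_test. If those two objects have different numbers of rows, x and Groups differ in length, which is what the error reports. Iterating over data_test itself,
apply(data_test[, -1], 2, function(x) { kruskal.test(as.numeric(x) ~ as.factor(data_test$Groups)) })
works for me.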
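To illustrate the point, here is a minimal sketch using made-up toy data rather than the data from the question. The names toy and short are hypothetical stand-ins: toy plays the role of data_test, and short plays the role of a second object with a different number of rows. The sketch first reproduces the length mismatch, then runs the tests with lapply over the same data frame that supplies Groups.

```r
# Toy data standing in for data_test (values are made up; layout is Groups + gene columns)
set.seed(1)
toy <- data.frame(
  Groups = factor(rep(c("Group1", "Group2", "Group3"), each = 6)),
  Gene1  = rnorm(18, mean = 130, sd = 15),
  Gene2  = rnorm(18, mean = 80,  sd = 10),
  Gene3  = rnorm(18, mean = 1,   sd = 0.3)
)
toy$Gene3[c(2, 9)] <- NA  # a few missing values, as in the real data

# Hypothetical second object with fewer rows, to mimic mixing up two datasets
short <- toy[1:10, ]

# Columns come from 'short' (10 rows) but Groups comes from 'toy' (18 rows),
# so model.frame() complains; the message is captured instead of stopping the script
err_msg <- tryCatch(
  lapply(short[-1], function(x) kruskal.test(x ~ Groups, data = toy)),
  error = function(e) conditionMessage(e)
)
print(err_msg)

# Fix: iterate over the gene columns of the SAME data frame that supplies Groups
kw.tests <- lapply(toy[-1], function(x) kruskal.test(x ~ toy$Groups))

# Collect the p-values into a named numeric vector (one entry per gene column)
p.values <- sapply(kw.tests, function(res) res$p.value)
print(p.values)
```

Unlike apply, lapply does not convert the data frame to a matrix first, so the columns keep their original types and the as.numeric() call is not needed when the gene columns are already numeric. Rows with NA are dropped per test by the default na.action (na.omit, unless options("na.action") has been changed).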