I have a large dataset containing a categorical variable (Size) and a numeric variable (Fraktion) that I use to group the data. The remaining columns are analytical results (numeric values). Every value was sampled 3 times, and I need to calculate the mean of those three samples.
It looks more or less like this:
Size | Fraktion | Sample | Value1 | Value2 | ... |
---|---|---|---|---|---|
A | 1 | 1 | 3 | 2 | ... |
A | 1 | 2 | 4 | 4 | ... |
A | 1 | 3 | 2 | 1 | ... |
A | 2 | 1 | 1 | 5 | ... |
A | 2 | 2 | 3 | 7 | ... |
A | 2 | 3 | 4 | 5 | ... |
B | 1 | 1 | 2 | 3 | ... |
B | 1 | 2 | 3 | 2 | ... |
B | 1 | 3 | 4 | 2 | ... |
B | 1 | 3 | 2 | 4 | ... |
To calculate the means of the samples I used the summarise function of dplyr like this:

    mean_df <-
      df %>%
      group_by(Fraktion, Size) %>%
      summarise_all("mean")

I guess this might not be the most elegant way, as in the next step I have to remove the "Sample" column, but it worked for me. Now I want to calculate the standard deviation for each of these means and add it to the data frame.
I found this thread (Can I calculate the standard error of all columns with the "summarise_all" function in R dplyr) and tried to use the code provided by Ronak Shah in answer 3:

    mean_sd_df <-
      df %>%
      group_by(Fraktion, Size) %>%
      summarise_each(funs(mean, sd, se = sd(.)/sqrt(n())))

However, I get the following error:

    across() must only be used inside dplyr verbs.

Any idea what could be the issue?
The classic tidyverse way to do this is to first make your data tidy by pivoting the table to long format:

    library(tidyr)  # provides pivot_longer()
    df %>%
      pivot_longer(starts_with("Value"), names_to = "Measurement", values_to = "val")

and then summarize this long table:

    df %>%
      pivot_longer(starts_with("Value"), names_to = "Measurement", values_to = "val") %>%
      group_by(Fraktion, Size) %>%
      summarize(mean = mean(val), se = sd(val)/sqrt(n()))

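To see what grouping by Fraktion and Size alone does, here is a minimal sketch on toy data. The `toy` tibble (the first three rows of the table in the question), the `pooled` result, and the extra `n_obs` column are illustrative names, not part of the original code. With only these two grouping variables, the values of all "Value" columns in a group are pooled into one mean and one standard error.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: the first three rows of the table in the question
toy <- tibble(
  Size = "A", Fraktion = 1, Sample = 1:3,
  Value1 = c(3, 4, 2), Value2 = c(2, 4, 1)
)

pooled <- toy %>%
  pivot_longer(starts_with("Value"), names_to = "Measurement", values_to = "val") %>%
  group_by(Fraktion, Size) %>%
  summarize(
    n_obs = n(),                  # number of values that went into the summary
    mean  = mean(val),
    se    = sd(val) / sqrt(n()),
    .groups = "drop"
  )

# One row for the single (Fraktion, Size) group; n_obs is 6 because the
# 3 samples of Value1 and the 3 samples of Value2 are pooled together
print(pooled)
```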
If you want a separate summary for each "ValueX" column, add the name of the new column (Measurement) to the grouping:

    df %>%
      pivot_longer(starts_with("Value"), names_to = "Measurement", values_to = "val") %>%
      group_by(Fraktion, Size, Measurement) %>%
      summarize(mean = mean(val), se = sd(val)/sqrt(n()))

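If you would rather keep the wide layout from the question, a possible alternative is `across()` inside `summarise()`, which replaces the deprecated `summarise_each()` and `funs()`. This is only a sketch, assuming dplyr 1.0.0 or later; the toy `df` below (the first six rows of the table in the question) and the `{.col}_{.fn}` column naming are my own choices for illustration.

```r
library(dplyr)

# Hypothetical toy data: the first six rows of the table in the question
df <- tibble(
  Size     = rep("A", 6),
  Fraktion = rep(1:2, each = 3),
  Sample   = rep(1:3, times = 2),
  Value1   = c(3, 4, 2, 1, 3, 4),
  Value2   = c(2, 4, 1, 5, 7, 5)
)

mean_sd_df <- df %>%
  group_by(Fraktion, Size) %>%
  summarise(
    across(
      starts_with("Value"),          # only the Value columns, so Sample is skipped
      list(
        mean = ~ mean(.x),
        sd   = ~ sd(.x),
        se   = ~ sd(.x) / sqrt(n())  # n() is the number of rows in the current group
      ),
      .names = "{.col}_{.fn}"        # yields Value1_mean, Value1_sd, Value1_se, ...
    ),
    .groups = "drop"
  )

print(mean_sd_df)
```

Because only the columns matching `starts_with("Value")` are summarised, the "Sample" column does not appear in the result and does not have to be removed afterwards.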