I have a dataset that contains triplicate measurements of 16 different PAHs (pollutants) in 23 different samples. I would like to calculate the propagated error for the sum of the mean triplicate concentration of the 16 PAHs for each sample. Is there a function that does this in R? Here is the dataset for one of the samples I have:
Q <- biochar %>%
filter(pollutant_class == "PAH",
sample_ID == "WT-BC-700") %>%
select(sample_ID, pollutant, conc)
sample_ID pollutant conc
2 WT-BC-700 Acenaphthene 0.509
25 WT-BC-700 Acenaphthene 0.430
46 WT-BC-700 Acenaphthene 0.491
3 WT-BC-700 Acenaphthylene 3.410
26 WT-BC-700 Acenaphthylene 2.990
42 WT-BC-700 Acenaphthylene 3.500
27 WT-BC-700 Anthracene 1.810
36 WT-BC-700 Anthracene 2.090
40 WT-BC-700 Anthracene 2.020
20 WT-BC-700 Benz(a)anthracene 0.081
32 WT-BC-700 Benz(a)anthracene 0.087
48 WT-BC-700 Benz(a)anthracene 0.086
1 WT-BC-700 Benzo(a)pyrene 0.078
8 WT-BC-700 Benzo(a)pyrene 0.089
9 WT-BC-700 Benzo(a)pyrene 0.096
7 WT-BC-700 Benzo(b)fluoranthene 0.051
15 WT-BC-700 Benzo(b)fluoranthene 0.056
22 WT-BC-700 Benzo(b)fluoranthene 0.055
16 WT-BC-700 Benzo(ghi)perylene 0.022
24 WT-BC-700 Benzo(ghi)perylene 0.022
34 WT-BC-700 Benzo(ghi)perylene 0.020
6 WT-BC-700 Benzo(k)fluoranthene 0.067
21 WT-BC-700 Benzo(k)fluoranthene 0.071
39 WT-BC-700 Benzo(k)fluoranthene 0.080
4 WT-BC-700 Chrysene 0.093
5 WT-BC-700 Chrysene 0.097
23 WT-BC-700 Chrysene 0.084
17 WT-BC-700 Dibenz(ah)anthracene 0.007
19 WT-BC-700 Dibenz(ah)anthracene 0.009
37 WT-BC-700 Dibenz(ah)anthracene 0.009
11 WT-BC-700 Fluoranthene 1.590
33 WT-BC-700 Fluoranthene 1.440
47 WT-BC-700 Fluoranthene 1.620
30 WT-BC-700 Fluorene 0.850
41 WT-BC-700 Fluorene 0.877
45 WT-BC-700 Fluorene 0.782
18 WT-BC-700 Indeno(1,2,3-cd)pyrene 0.030
29 WT-BC-700 Indeno(1,2,3-cd)pyrene 0.033
43 WT-BC-700 Indeno(1,2,3-cd)pyrene 0.024
13 WT-BC-700 Naphthalene 5.620
14 WT-BC-700 Naphthalene 4.380
44 WT-BC-700 Naphthalene 5.730
10 WT-BC-700 Phenanthrene 6.640
28 WT-BC-700 Phenanthrene 7.080
35 WT-BC-700 Phenanthrene 5.930
12 WT-BC-700 Pyrene 0.986
31 WT-BC-700 Pyrene 1.190
38 WT-BC-700 Pyrene 1.110
I have created a summary table with the mean and standard deviation of each pollutant:
means <- Q %>%
group_by(sample_ID, pollutant) %>%
summarise(mean_conc = mean(conc),
sd_conc = sd(conc))
`summarise()` has grouped output by 'sample_ID'. You can override using the
`.groups` argument.
means
# A tibble: 16 × 4
# Groups: sample_ID [1]
sample_ID pollutant mean_conc sd_conc
<chr> <chr> <dbl> <dbl>
1 WT-BC-700 Acenaphthene 0.477 0.0414
2 WT-BC-700 Acenaphthylene 3.3 0.272
3 WT-BC-700 Anthracene 1.97 0.146
4 WT-BC-700 Benz(a)anthracene 0.0847 0.00321
5 WT-BC-700 Benzo(a)pyrene 0.0877 0.00907
6 WT-BC-700 Benzo(b)fluoranthene 0.054 0.00265
7 WT-BC-700 Benzo(ghi)perylene 0.0213 0.00115
8 WT-BC-700 Benzo(k)fluoranthene 0.0727 0.00666
9 WT-BC-700 Chrysene 0.0913 0.00666
10 WT-BC-700 Dibenz(ah)anthracene 0.00833 0.00115
11 WT-BC-700 Fluoranthene 1.55 0.0964
12 WT-BC-700 Fluorene 0.836 0.0490
13 WT-BC-700 Indeno(1,2,3-cd)pyrene 0.029 0.00458
14 WT-BC-700 Naphthalene 5.24 0.750
15 WT-BC-700 Phenanthrene 6.55 0.580
16 WT-BC-700 Pyrene 1.10 0.103
And then I created a summary of the total concentration of all the PAHs:
summary <- means %>%
group_by(sample_ID) %>%
summarise(sum_pollutant = sum(mean_conc))
summary
# A tibble: 1 × 2
sample_ID sum_pollutant
<chr> <dbl>
1 WT-BC-700 21.5
So my question is: how can I calculate the propagated error for sum_pollutant
in the summary
data frame?
I tried finding the answer in the propagate() package, but this got a bit too complex for me and I wasn't sure if this was the right place to look. I would like to use the following expression for the propagated error:
σx = sqrt(σa^2 + σb^2 + σc^2)
a, b, and c are measured variables and σa, σb, and σc are the standard deviations of those variables.
Is this what you're looking for? I just added a new variable to your summary
dataset.
summary <- means %>%
group_by(sample_ID) %>%
summarise(
sum_pollutant = sum(mean_conc),
propogated_error = sqrt(sum(sd_conc ^ 2))
)