rdplyrpropagation

Propagation of random errors in R


I have a dataset that contains triplicate measurements of 16 different PAHs (pollutants) in 23 different samples. I would like to calculate the propagated error for the sum of the mean triplicate concentration of the 16 PAHs for each sample. Is there a function that does this in R? Here is the dataset for one of the samples I have:

Q <- biochar %>% 
  filter(pollutant_class == "PAH",
         sample_ID == "WT-BC-700") %>% 
  select(sample_ID, pollutant, conc)

sample_ID              pollutant  conc
2  WT-BC-700           Acenaphthene 0.509
25 WT-BC-700           Acenaphthene 0.430
46 WT-BC-700           Acenaphthene 0.491
3  WT-BC-700         Acenaphthylene 3.410
26 WT-BC-700         Acenaphthylene 2.990
42 WT-BC-700         Acenaphthylene 3.500
27 WT-BC-700             Anthracene 1.810
36 WT-BC-700             Anthracene 2.090
40 WT-BC-700             Anthracene 2.020
20 WT-BC-700      Benz(a)anthracene 0.081
32 WT-BC-700      Benz(a)anthracene 0.087
48 WT-BC-700      Benz(a)anthracene 0.086
1  WT-BC-700         Benzo(a)pyrene 0.078
8  WT-BC-700         Benzo(a)pyrene 0.089
9  WT-BC-700         Benzo(a)pyrene 0.096
7  WT-BC-700   Benzo(b)fluoranthene 0.051
15 WT-BC-700   Benzo(b)fluoranthene 0.056
22 WT-BC-700   Benzo(b)fluoranthene 0.055
16 WT-BC-700     Benzo(ghi)perylene 0.022
24 WT-BC-700     Benzo(ghi)perylene 0.022
34 WT-BC-700     Benzo(ghi)perylene 0.020
6  WT-BC-700   Benzo(k)fluoranthene 0.067
21 WT-BC-700   Benzo(k)fluoranthene 0.071
39 WT-BC-700   Benzo(k)fluoranthene 0.080
4  WT-BC-700               Chrysene 0.093
5  WT-BC-700               Chrysene 0.097
23 WT-BC-700               Chrysene 0.084
17 WT-BC-700   Dibenz(ah)anthracene 0.007
19 WT-BC-700   Dibenz(ah)anthracene 0.009
37 WT-BC-700   Dibenz(ah)anthracene 0.009
11 WT-BC-700           Fluoranthene 1.590
33 WT-BC-700           Fluoranthene 1.440
47 WT-BC-700           Fluoranthene 1.620
30 WT-BC-700               Fluorene 0.850
41 WT-BC-700               Fluorene 0.877
45 WT-BC-700               Fluorene 0.782
18 WT-BC-700 Indeno(1,2,3-cd)pyrene 0.030
29 WT-BC-700 Indeno(1,2,3-cd)pyrene 0.033
43 WT-BC-700 Indeno(1,2,3-cd)pyrene 0.024
13 WT-BC-700            Naphthalene 5.620
14 WT-BC-700            Naphthalene 4.380
44 WT-BC-700            Naphthalene 5.730
10 WT-BC-700           Phenanthrene 6.640
28 WT-BC-700           Phenanthrene 7.080
35 WT-BC-700           Phenanthrene 5.930
12 WT-BC-700                 Pyrene 0.986
31 WT-BC-700                 Pyrene 1.190
38 WT-BC-700                 Pyrene 1.110

I have created a summary table with the mean and standard deviation of each pollutant:

means <- Q %>% 
  group_by(sample_ID, pollutant) %>% 
  summarise(mean_conc = mean(conc),
            sd_conc = sd(conc))

`summarise()` has grouped output by 'sample_ID'. You can override using the
`.groups` argument.

means 
# A tibble: 16 × 4
# Groups:   sample_ID [1]
   sample_ID pollutant              mean_conc sd_conc
   <chr>     <chr>                      <dbl>   <dbl>
 1 WT-BC-700 Acenaphthene             0.477   0.0414 
 2 WT-BC-700 Acenaphthylene           3.3     0.272  
 3 WT-BC-700 Anthracene               1.97    0.146  
 4 WT-BC-700 Benz(a)anthracene        0.0847  0.00321
 5 WT-BC-700 Benzo(a)pyrene           0.0877  0.00907
 6 WT-BC-700 Benzo(b)fluoranthene     0.054   0.00265
 7 WT-BC-700 Benzo(ghi)perylene       0.0213  0.00115
 8 WT-BC-700 Benzo(k)fluoranthene     0.0727  0.00666
 9 WT-BC-700 Chrysene                 0.0913  0.00666
10 WT-BC-700 Dibenz(ah)anthracene     0.00833 0.00115
11 WT-BC-700 Fluoranthene             1.55    0.0964 
12 WT-BC-700 Fluorene                 0.836   0.0490 
13 WT-BC-700 Indeno(1,2,3-cd)pyrene   0.029   0.00458
14 WT-BC-700 Naphthalene              5.24    0.750  
15 WT-BC-700 Phenanthrene             6.55    0.580  
16 WT-BC-700 Pyrene                   1.10    0.103  

And then I created a summary of the total concentration of all the PAHs:

summary <- means %>% 
  group_by(sample_ID) %>% 
  summarise(sum_pollutant = sum(mean_conc))

summary
# A tibble: 1 × 2
  sample_ID sum_pollutant
  <chr>             <dbl>
1 WT-BC-700          21.5

So my question is: how can I calculate the propagated error for sum_pollutant in the summary data frame?

I tried finding the answer in the propagate() package, but this got a bit too complex for me and I wasn't sure if this was the right place to look. I would like to use the following expression for the propagated error:

σx = sqrt(σa^2 + σb^2 + σc^2)

a, b, and c are measured variables and σa, σb, and σc are the standard deviations of those variables.


Solution

  • Is this what you're looking for? I just added a new variable to your summary dataset.

    summary <- means %>% 
      group_by(sample_ID) %>% 
      summarise(
        sum_pollutant = sum(mean_conc), 
        propogated_error = sqrt(sum(sd_conc ^ 2))
      )