I have a data table structured like this:
Molecular Formula AHBG13 AHBG15 AHBG16 AHBG19 AHBG20
C23H48NO7P 1767 4570 7052 8831 7921
C24H50NO7P 0 0 0 73331 0
C24H50NO7P 45335 48281 0 0 77333
C24H50NO7P 0 0 48537 0 0
C25H44NO7P 4190 3166 0 4903 6609
C26H48NO7P 5110 3102 0 8939 7772
C26H50NO7P 0 3021 1981 7770 5357
C26H52NO7P 6377 0 0 0 13176
C26H52NO7P 0 8245 7883 20806 0
C26H54NO7P 12049 27408 35437 45506 44353
C26H54NO7P 0 0 0 2123 2367
C26H54NO7P 0 1307 0 0 0
C28H50NO7P 0 6503 0 21058 10462
C28H50NO7P 7547 0 0 0 0
C28H50NO7P 6414 0 4990 0 0
C28H54NO7P 0 0 863 0 6411
C28H54NO7P 2107 4732 0 6778 0
C30H54NO7P 0 0 2185 8351 5841
C30H54NO7P 3809 0 0 0 0
C30H54NO7P 0 3886 0 0 0
where the first column is the molecular formula for a lipid, and the following columns are the concetrations of each lipid for my specific samples. However, there are repeats of certain lipids due to isomers of the same molecule being present, requiring me to sum these values using r. Is there a way to sum the concentrations of identical lipids across all samples? I'd like to create something like this:
Molecular Formula AHBG13 AHBG15 AHBG16 AHBG19 AHBG20
C23H48NO7P 1767 4570 7052 8831 7921
C24H50NO7P 45335 48281 48537 73331 77333
C25H44NO7P 4190 3166 0 4903 6609
C26H48NO7P 5110 3102 0 8939 7772
C26H50NO7P 0 3021 1981 7770 5357
C26H52NO7P 6377 8245 7883 20806 13176
C26H54NO7P 12049 28715 35437 47629 46720
Optimally something that can be added into a pipeline, but I'll take whatever I can get.
I tried using a various variations of summarize() and summarize_all(), but wasn't able to find a syntax that produces the desired result. I could use the summarize function to do what I want for each sample individually, with the amount of samples and data that I have it would save a significant amount of time if I could apply the function to multiple samples at once.
The closest I got was using
summarise(across(c("AHBG13","AHBG15","AHBG16,"AHBG19","AHBG20"), sum))
which resulted in the sum of all of the concentrations, like this:
AHBG13 AHBG15 AHBG16 AHBG19 AHBG20
Molecular Formula 94705 114221 108928 208396 187602
I think you should use .by = ...
, e.g.,
df %>%
summarise(across(everything(), sum), .by = MolecularFormula)
or simply applies aggregate
aggregate(. ~ MolecularFormula, df, sum)
which gives
MolecularFormula AHBG13 AHBG15 AHBG16 AHBG19 AHBG20
1 C23H48NO7P 1767 4570 7052 8831 7921
2 C24H50NO7P 45335 48281 48537 73331 77333
3 C25H44NO7P 4190 3166 0 4903 6609
4 C26H48NO7P 5110 3102 0 8939 7772
5 C26H50NO7P 0 3021 1981 7770 5357
6 C26H52NO7P 6377 8245 7883 20806 13176
7 C26H54NO7P 12049 28715 35437 47629 46720
8 C28H50NO7P 13961 6503 4990 21058 10462
9 C28H54NO7P 2107 4732 863 6778 6411
10 C30H54NO7P 3809 3886 2185 8351 5841