I have a dataset of abundance of organisms across sites processed by 3 different people that I would like to take from wide to long format using column headers as information to split into different ID variables. here is a snippet of the dataset:
> head(reads[,1:7])
TAXON ORRU_CL.rod ORRU_CL.louis ORRU_CL.john2 BONE_CL.rod BONE_CL.louis
1 ACENTRELLA 4 2 0 0 0
2 ACRONEURIA 0 0 0 0 0
3 ALLOCAPNIA 30 33 44 223 162
4 ALLOPERLA 0 0 0 0 0
5 AMPHINEMURA 912 659 805 6667 4905
6 ANOPHELES 0 0 0 0 0
BONE_CL.john2
1 0
2 0
3 206
4 0
5 6343
6 0
I want to keep TAXON as an id variable, but also have a "SITE" column which is represented by the variable before the "." (ORRU_CL or BONE_CL) and then a "USER" variable which is the name of the person processing (rod, louis or john2). I know that I can split by the "." but not sure how to incorporate this into the reshape2 melt
call.
With a simple melt call we get the long version with TAXON as the id variable, but Id like to make an additional two columns of "SITE" and "USER", splitting at the '.'
> long.reads<-melt(head(reads[,1:7]))
> long.reads
TAXON variable value
1 ACENTRELLA ORRU_CL.rod 4
2 ACRONEURIA ORRU_CL.rod 0
3 ALLOCAPNIA ORRU_CL.rod 30
4 ALLOPERLA ORRU_CL.rod 0
5 AMPHINEMURA ORRU_CL.rod 912
6 ANOPHELES ORRU_CL.rod 0
7 ACENTRELLA ORRU_CL.louis 2
8 ACRONEURIA ORRU_CL.louis 0
9 ALLOCAPNIA ORRU_CL.louis 33
10 ALLOPERLA ORRU_CL.louis 0
11 AMPHINEMURA ORRU_CL.louis 659
12 ANOPHELES ORRU_CL.louis 0
13 ACENTRELLA ORRU_CL.dada2 0
14 ACRONEURIA ORRU_CL.dada2 0
15 ALLOCAPNIA ORRU_CL.dada2 44
16 ALLOPERLA ORRU_CL.dada2 0
17 AMPHINEMURA ORRU_CL.dada2 805
18 ANOPHELES ORRU_CL.dada2 0
19 ACENTRELLA BONE_CL.rod 0
20 ACRONEURIA BONE_CL.rod 0
21 ALLOCAPNIA BONE_CL.rod 223
22 ALLOPERLA BONE_CL.rod 0
23 AMPHINEMURA BONE_CL.rod 6667
24 ANOPHELES BONE_CL.rod 0
25 ACENTRELLA BONE_CL.louis 0
26 ACRONEURIA BONE_CL.louis 0
27 ALLOCAPNIA BONE_CL.louis 162
28 ALLOPERLA BONE_CL.louis 0
29 AMPHINEMURA BONE_CL.louis 4905
30 ANOPHELES BONE_CL.louis 0
31 ACENTRELLA BONE_CL.dada2 0
32 ACRONEURIA BONE_CL.dada2 0
33 ALLOCAPNIA BONE_CL.dada2 206
34 ALLOPERLA BONE_CL.dada2 0
35 AMPHINEMURA BONE_CL.dada2 6343
36 ANOPHELES BONE_CL.dada2 0
Any suggestions?
Thanks for your help!
LP
Using tidyr::pivot_longer
you could split the column names on the .
like:
library(dplyr)
library(tidyr)
reads %>%
pivot_longer(-TAXON,
names_to = c('SITE','USER'),
names_sep = '\\.')
#> # A tibble: 36 × 4
#> TAXON SITE USER value
#> <chr> <chr> <chr> <int>
#> 1 ACENTRELLA ORRU_CL rod 4
#> 2 ACENTRELLA ORRU_CL louis 2
#> 3 ACENTRELLA ORRU_CL john2 0
#> 4 ACENTRELLA BONE_CL rod 0
#> 5 ACENTRELLA BONE_CL louis 0
#> 6 ACENTRELLA BONE_CL john2 0
#> 7 ACRONEURIA ORRU_CL rod 0
#> 8 ACRONEURIA ORRU_CL louis 0
#> 9 ACRONEURIA ORRU_CL john2 0
#> 10 ACRONEURIA BONE_CL rod 0
#> # ℹ 26 more rows
structure(list(TAXON = c("ACENTRELLA", "ACRONEURIA", "ALLOCAPNIA",
"ALLOPERLA", "AMPHINEMURA", "ANOPHELES"), ORRU_CL.rod = c(4L,
0L, 30L, 0L, 912L, 0L), ORRU_CL.louis = c(2L, 0L, 33L, 0L, 659L,
0L), ORRU_CL.john2 = c(0L, 0L, 44L, 0L, 805L, 0L), BONE_CL.rod = c(0L,
0L, 223L, 0L, 6667L, 0L), BONE_CL.louis = c(0L, 0L, 162L, 0L,
4905L, 0L), BONE_CL.john2 = c(0L, 0L, 206L, 0L, 6343L, 0L)), class = "data.frame", row.names = c(NA,
-6L))