I have many data.frames
(448) with the same columns names (9 in total), like this:
V1 V2 V4 ... V9
ENSG00000000003.15 TSPAN6 7095
ENSG00000000005.6 TNMD 4355
. . .
. . .
i want to create another data.frame
, maintaining the first 2 columns (V1 and V2), these are identical in every data.frame
, and merge V4 column (these are different for every data.frame
) from all data.frame
s, the rest of the columns should be excluded.
If it could be possible, I would like to rename the V4 columns to "sample1", "sample2" and so on, until 448.
So the final dataframe should be:
V1 V2 V4_1 V4_2 ... V4_448
ENSG00000000003.15 TSPAN6 7095 3856 .
ENSG00000000005.6 TNMD 4355 2976 .
. . . . .
. . . . .
I already did this:
reader <- function(f){
read.table(f, sep='\t', skip=6, header=FALSE)
}
files <- list.files(path,
recursive=TRUE, full.names=TRUE)
myfilelist <- lapply(files, reader)
But I don't know how to combine only these selected columns
This is the output of dput(lapply(myfilelist[1:2], head))
:
myfilelist <- list(structure(list(V1 = c("ENSG00000000003.15", "ENSG00000000005.6",
"ENSG00000000419.13", "ENSG00000000457.14", "ENSG00000000460.17",
"ENSG00000000938.13"), V2 = c("TSPAN6", "TNMD", "DPM1", "SCYL3",
"C1orf112", "FGR"), V3 = c("protein_coding", "protein_coding",
"protein_coding", "protein_coding", "protein_coding", "protein_coding"
), V4 = c(7094L, 2L, 4355L, 1149L, 372L, 585L), V5 = c(3573L,
1L, 2201L, 953L, 553L, 281L), V6 = c(3521L, 1L, 2154L, 883L,
579L, 308L), V7 = c(59.9764, 0.052, 138.3704, 6.4018, 2.3896,
6.6335), V8 = c(20.5827, 0.0178, 47.4859, 2.197, 0.8201, 2.2765
), V9 = c(22.2037, 0.0192, 51.2256, 2.37, 0.8847, 2.4558)), row.names = c(NA,
6L), class = "data.frame"), structure(list(V1 = c("ENSG00000000003.15",
"ENSG00000000005.6", "ENSG00000000419.13", "ENSG00000000457.14",
"ENSG00000000460.17", "ENSG00000000938.13"), V2 = c("TSPAN6",
"TNMD", "DPM1", "SCYL3", "C1orf112", "FGR"), V3 = c("protein_coding",
"protein_coding", "protein_coding", "protein_coding", "protein_coding",
"protein_coding"), V4 = c(2616L, 23L, 3746L, 1288L, 510L, 1578L
), V5 = c(1369L, 9L, 1876L, 1015L, 681L, 797L), V6 = c(1250L,
14L, 1871L, 984L, 693L, 782L), V7 = c(16.8063, 0.4541, 90.4417,
5.4531, 2.4895, 13.5969), V8 = c(4.8615, 0.1314, 26.1617, 1.5774,
0.7201, 3.9331), V9 = c(6.0158, 0.1625, 32.3733, 1.9519, 0.8911,
4.867)), row.names = c(NA, 6L), class = "data.frame"))
It's not clear to me how you want to join this list of data frames (are the first two columns identical?), but here is an option using left-join:
library(dplyr)
library(purrr)
imap(myfilelist, \(df, i) select(df, 1:2, "sample{i}" := 4) ) |>
reduce(left_join, by = join_by(V1, V2))
If the first two columns are identical across all data frames then you could bind them together:
library(dplyr)
library(purrr)
bind_cols(myfilelist[[1]][1:2],
imap(myfilelist, \(df, i) select(df, "sample{i}" := 4)) |> bind_cols())