I'm trying to import multiple CSV files in the RStudio while keeping their filenames.
library(readr)
library(dplyr)
library(purrr)
#importing all csv files at once
csv_files = list.files(pattern ="*Con.csv")
myfiles = lapply(csv_files , read.delim, header = TRUE, sep = "," )
#merging all files by identifiers
Samp_merg <- myfiles %>% reduce(full_join, by=c("chr", "start","end"))
After doing this I could import the files but the names of the files were missing from the list myfiles
.
myfiles <- dir(pattern = "*Con.csv", full.names = FALSE)
myfiles_data <- lapply(myfiles, data.table::fread)
# assign names to list items
names(myfiles_data) <- myfiles
#merging the files
dat_merg <- myfiles_data %>% reduce(full_join, by=c("chr", "start", "end"))
Here, using this script I can import the files by keeping their names in the myfiles_data
object. However, after joining by three identifiers I'm unable to retain their file names as column names. I want to keep the colname
of the merged df
as the individual file name without extension (.csv).
There are around 90 CSV files present in the directory with the same header.
$ls
01AvPMPpCon.csv
02AvPMPpCon.csv
03AvPMPpCon.csv
04AvPMPpCon.csv
05AvPMPpCon.csv
$head 01AvPMPpCon.csv
chr,start,end,CpG
chr1,2017424,2017750,10
chr1,24901325,24901700,11
chr1,24902268,24902701,25
chr1,24927215,24927416,4
chr1,26861926,26862173,5
chr1,26864186,26864613,15
chr1,35576334,35576451,3
chr1,36304606,36304817,7
At now, the merged file looks like this,
$head(dat_merg)
chr start end CpG.x CpG.y CpG.x.x CpG.y.y CpG.x.x.x CpG.y.y.y
1: chr1 3903250 3903277 4 NA NA NA 4 NA
2: chr1 4657240 4657314 3 NA NA NA NA NA
3: chr1 24900249 24900468 5 NA 5 NA NA NA
4: chr1 46484938 46485047 4 NA 4 NA NA NA
5: chr1 47223634 47223758 4 NA NA NA 4 4
6: chr1 66752822 66753167 12 12 NA NA 12 NA
So, my expected output should look like,
$head(dat_merg)
chr start end 01Av 02Av 03Av 04Av 05Av 06Av
1: chr1 3903250 3903277 4 NA NA NA 4 NA
2: chr1 4657240 4657314 3 NA NA NA NA NA
3: chr1 24900249 24900468 5 NA 5 NA NA NA
4: chr1 46484938 46485047 4 NA 4 NA NA NA
5: chr1 47223634 47223758 4 NA NA NA 4 4
6: chr1 66752822 66753167 12 12 NA NA 12 NA
What about pivot_wider()
instead of reduce(full_join, ...)
?
Prepare reprex, 99 4-row csv files:
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(readr)
library(purrr)
csv_ <- read_csv("chr,start,end,CpG
chr1,2017424,2017750,10
chr1,24901325,24901700,11
chr1,24902268,24902701,25
chr1,24927215,24927416,4
chr1,26861926,26862173,5
chr1,26864186,26864613,15
chr1,35576334,35576451,3
chr1,36304606,36304817,7", show_col_types = FALSE)
sprintf("%.2dAvPMPpCon.csv", 1:99) |>
walk(\(f_) slice_sample(csv_, n = 4) |> write_csv(f_))
read_csv()
can read from a list of files and stores file names in id column, c(chr, start, end)
will be used for pivot_wider()
id_cols
:
list.files(pattern ="*Con.csv") |>
read_csv(id = "src") |>
mutate(src = substr(src, 1, 4)) |>
pivot_wider(names_from = src, values_from = CpG)
#> Rows: 396 Columns: 5
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): chr
#> dbl (3): start, end, CpG
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Result:
#> # A tibble: 8 × 102
#> chr start end `01Av` `02Av` `03Av` `04Av` `05Av` `06Av` `07Av` `08Av`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 chr1 26864186 2.69e7 15 NA 15 15 NA NA NA NA
#> 2 chr1 26861926 2.69e7 5 5 NA 5 5 NA NA NA
#> 3 chr1 24902268 2.49e7 25 25 25 25 25 25 NA NA
#> 4 chr1 24927215 2.49e7 4 4 NA NA 4 NA 4 4
#> 5 chr1 35576334 3.56e7 NA 3 3 NA NA 3 3 NA
#> 6 chr1 2017424 2.02e6 NA NA 10 10 NA NA 10 10
#> 7 chr1 24901325 2.49e7 NA NA NA NA 11 11 11 11
#> 8 chr1 36304606 3.63e7 NA NA NA NA NA 7 NA 7
#> # ℹ 91 more variables: `09Av` <dbl>, `10Av` <dbl>, `11Av` <dbl>, `12Av` <dbl>,
#> # `13Av` <dbl>, `14Av` <dbl>, `15Av` <dbl>, `16Av` <dbl>, `17Av` <dbl>,
#> # `18Av` <dbl>, `19Av` <dbl>, `20Av` <dbl>, `21Av` <dbl>, `22Av` <dbl>,
#> # `23Av` <dbl>, `24Av` <dbl>, `25Av` <dbl>, `26Av` <dbl>, `27Av` <dbl>,
#> # `28Av` <dbl>, `29Av` <dbl>, `30Av` <dbl>, `31Av` <dbl>, `32Av` <dbl>,
#> # `33Av` <dbl>, `34Av` <dbl>, `35Av` <dbl>, `36Av` <dbl>, `37Av` <dbl>,
#> # `38Av` <dbl>, `39Av` <dbl>, `40Av` <dbl>, `41Av` <dbl>, `42Av` <dbl>, …
Created on 2024-01-18 with reprex v2.0.2