I want to import multiple .DAT files from a directory, store them as elements of a list, and then save them as .RDATA files.
I tried the following code:
files <- dir(pattern = "*.DAT")
library(tidyverse)
Data1 <-
files %>%
map(~ read.table(file = ., fill = TRUE))
This works for some files and fails for others. The files are also available on this link. I want to read all files and then save them as .RDATA files with the same names.
Since some of the data behind the link are a bit unclean, I will show the solution to the core problem of your question using this example data:
(name1 <- name2 <- name3 <- name4 <- name5 <- data.frame(matrix(1:12, 3, 4)))
# X1 X2 X3 X4
# 1 1 4 7 10
# 2 2 5 8 11
# 3 3 6 9 12
We save the data into a subdirectory of your working directory named "test".
l <- mget(ls(pattern="^name"))
DIR <- "test"
# dir.create(DIR) # leave out if dir already exists
sapply(seq_along(l), function(x) # seq_along() is also safe if the list is empty
write.table(l[[x]], file=paste0(DIR, "/", names(l)[x], ".dat"), row.names=FALSE))
Now let's look at what is inside "test".
dir(DIR)
# [1] "name1.dat" "name2.dat" "name3.dat" "name4.dat" "name5.dat"
Now we import the files in the directory by pattern. I use rio::import_list, which conveniently imports the files into a list and uses data.table::fread internally. Your own code would also work fine.
# rm(list=ls()) # commented out for user safety
L <- rio::import_list(paste0(DIR, "/", dir(DIR, pattern="\\.dat$")), format="tsv")
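For reference, the same import can be sketched in base R only, in the spirit of the code from the question. This is a minimal sketch, assuming the files are whitespace-delimited with a header row (as produced by write.table above); the temporary directory and the demo file names are made up for illustration.

```r
# Hypothetical demo directory with two small whitespace-delimited files
demo_dir <- file.path(tempdir(), "import_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_df <- data.frame(matrix(1:12, 3, 4))
write.table(demo_df, file.path(demo_dir, "demo1.dat"), row.names = FALSE)
write.table(demo_df, file.path(demo_dir, "demo2.dat"), row.names = FALSE)

# 'pattern' is a regular expression, not a glob;
# full.names = TRUE returns paths that can be passed on directly
paths <- dir(demo_dir, pattern = "\\.dat$", ignore.case = TRUE, full.names = TRUE)

# Read every file and name the list elements after the files (without extension)
L_base <- lapply(paths, read.table, header = TRUE)
names(L_base) <- tools::file_path_sans_ext(basename(paths))
```

Note that a glob such as "*.DAT" is not a valid way to express "ends in .DAT" as a regular expression, which is why the pattern is written as "\\.dat$" here.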
To save them as .Rdata we want to assign names dynamically, which we achieve with the list option within save().
sapply(seq_along(L), function(x) {
tmp <- L[[x]]
assign(names(L)[x], tmp)
save(list=names(L)[x], file=paste0(DIR, "/", names(L)[x], ".Rdata"))
})
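The same idea can be wrapped in a small helper that makes the environment explicit. This is only a sketch: `save_as`, `demo_list` and `out_dir` are hypothetical names that do not appear in the answer, and the list is made-up data standing in for L.

```r
# Hypothetical helper: save one object under a chosen name in its own .Rdata file
save_as <- function(obj, name, dir) {
  env <- new.env()
  assign(name, obj, envir = env)  # bind the object to the desired name
  path <- file.path(dir, paste0(name, ".Rdata"))
  save(list = name, envir = env, file = path)  # look the name up in 'env'
  invisible(path)
}

# Usage with a made-up list standing in for L
demo_list <- list(a = data.frame(x = 1:3), b = data.frame(y = 4:6))
out_dir <- file.path(tempdir(), "save_demo")
dir.create(out_dir, showWarnings = FALSE)
saved <- mapply(save_as, demo_list, names(demo_list),
                MoreArgs = list(dir = out_dir))
```

Passing `envir` to save() avoids relying on which environment the name happens to be assigned in, which is the subtle part of the sapply() version.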
When we list the directory we see that the files were created.
dir(DIR)
# [1] "name1.dat" "name1.Rdata" "name2.dat" "name2.Rdata" "name3.dat" "name3.Rdata"
# [7] "name4.dat" "name4.Rdata" "name5.dat" "name5.Rdata"
Now let's check whether the object names were also created correctly:
# rm(list=ls()) # commented out for user safety
load("test/name1.Rdata")
ls()
# [1] "name1"
name1
# X1 X2 X3 X4
# 1 1 4 7 10
# 2 2 5 8 11
# 3 3 6 9 12
Which is the case.
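If you would rather not clear the workspace for this check, load() can restore the objects into a separate environment instead. A minimal sketch, using a made-up object and a temporary file rather than the files from "test":

```r
# Made-up object and file, only for demonstration
demo_obj <- data.frame(matrix(1:12, 3, 4))
rdata_file <- tempfile(fileext = ".Rdata")
save(demo_obj, file = rdata_file)

# Load into a fresh environment so the workspace is left untouched
check_env <- new.env()
loaded <- load(rdata_file, envir = check_env)  # load() returns the restored names
loaded
head(check_env[[loaded]])
```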
Alternatively, we could attempt a more direct approach using rvest. First we fetch the data names:
library(rvest)
dat.names <- html_attr(html_nodes(read_html(
"https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/Hand.html"),
"a"), "href")
and create individual links:
# paste0() is vectorized, so no sapply() is needed
links <- paste0("https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/", dat.names)
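An index page may also contain links that are not data files, so it can help to filter the hrefs before building the URLs. The sketch below assumes that the data files end in .dat; the `hrefs` vector is invented for illustration and is not the actual content of the page.

```r
# Hypothetical hrefs as they might come back from html_attr(); not the real page content
hrefs <- c("clinical.dat", "../index.html", "READ.ME", "Survey.DAT")
base_url <- "https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/"

# Keep only entries ending in .dat (case-insensitive) and build full URLs
is_dat <- grepl("\\.dat$", hrefs, ignore.case = TRUE)
dat_links <- paste0(base_url, hrefs[is_dat])
```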
The remainder is basically the same as above:
DIR <- "test"
# dir.create(DIR) # leave out if dir already exists
library(rio)
system.time(L <- import_list(links, format="tsv") ) # this will take a minute
sapply(seq_along(L), function(x) {
tmp <- L[[x]]
assign(names(L)[x], tmp)
save(list=names(L)[x], file=paste0(DIR, "/", names(L)[x], ".Rdata"))
})
# rm(list=ls()) # commented out for user safety
load("test/clinical.Rdata") # test a data set
clinical
# V1 V2 V3
# 1 26 31 57
# 2 51 59 110
# 3 21 11 32
# 4 40 34 74
# 5 138 135 273
However, as noted at the beginning, some of the data are a bit unclean, so you will probably have to handle those files individually and adapt the code case by case.
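One way to find out which files need individual treatment is to wrap the reader in tryCatch() and collect the failures. This is a minimal sketch under the assumption that a problematic file raises an error in read.table; `read_safely` is a hypothetical helper, and the two demo paths are made up.

```r
# Hypothetical wrapper: return the data frame, or NULL (with a message) if reading fails
read_safely <- function(path, ...) {
  tryCatch(
    read.table(path, fill = TRUE, ...),
    error = function(e) {
      message("Could not read ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Demo: one readable file and one path that does not exist
good <- tempfile(fileext = ".dat")
write.table(data.frame(a = 1:2, b = 3:4), good, row.names = FALSE)
bad <- file.path(tempdir(), "does_not_exist.dat")

res <- lapply(c(good, bad), read_safely, header = TRUE)
failed <- vapply(res, is.null, logical(1))  # TRUE marks files to inspect by hand
```

Files that are read without an error can still be parsed incorrectly, so it is worth inspecting the dimensions and column types of the successful results as well.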