
Reading multiple .dat files as a list and saving as .RDATA files in R


I want to import multiple .DAT files from a directory as elements of a list and then save each of them as an .RDATA file.

I tried the following code:

files <- dir(pattern = "*.DAT")
library(tidyverse)
Data1 <- 
  files %>%
    map(~ read.table(file = ., fill = TRUE))

This works for some files and fails for others. The files are also available at this link. I want to read all the files and then save each one as an .RDATA file with the same name.


Solution

  • Since some of the data at the link are somewhat unclean, I'll demonstrate the solution to the core problem of your question using this example data:

    (name1 <- name2 <- name3 <- name4 <- name5 <- data.frame(matrix(1:12, 3, 4)))
    #   X1 X2 X3 X4
    # 1  1  4  7 10
    # 2  2  5  8 11
    # 3  3  6  9 12
    

    We save the data into a subdirectory of the working directory named "test".

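    l <- mget(ls(pattern = "^name"))
    DIR <- "test"
    dir.create(DIR, showWarnings = FALSE)  # no-op if the directory already exists
    # write tab-separated so the files match format = "tsv" on import below
    invisible(lapply(seq_along(l), function(x)
      write.table(l[[x]], file = file.path(DIR, paste0(names(l)[x], ".dat")),
                  sep = "\t", row.names = FALSE)))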
    

    Now we look at what's inside "test".

    dir(DIR)
    # [1] "name1.dat" "name2.dat" "name3.dat" "name4.dat" "name5.dat"
    

    Now we import the files in the directory by pattern. I use rio::import_list, which conveniently imports the files into a named list and uses data.table::fread internally. Your own code would also work fine, though.

    # rm(list=ls())  # commented out for user safety
    L <- rio::import_list(list.files(DIR, pattern = "\\.dat$", full.names = TRUE),
                          format = "tsv")
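If you would rather keep your own read.table() call, note that map() (like lapply()) returns an unnamed list, so the file names are lost before the saving step. A minimal base-R sketch that keeps them as list names, assuming tab-separated files with a header and a lowercase .dat extension as in the example above. It recreates two such files in a temporary directory (a stand-in for "test") so that it runs on its own:

```r
# Self-contained demo: write two example files to a temporary directory
dir_demo <- file.path(tempdir(), "test_named")
dir.create(dir_demo, showWarnings = FALSE)
for (nm in c("name1", "name2")) {
  write.table(data.frame(matrix(1:12, 3, 4)),
              file = file.path(dir_demo, paste0(nm, ".dat")),
              sep = "\t", row.names = FALSE)
}

# pattern is a regular expression, not a shell glob
files <- list.files(dir_demo, pattern = "\\.dat$", full.names = TRUE)

# Read every file; header = TRUE because write.table() wrote column names
L <- lapply(files, read.table, header = TRUE, sep = "\t", fill = TRUE)

# Name each element after its file, without directory and extension
names(L) <- tools::file_path_sans_ext(basename(files))

str(L, max.level = 1)
```

The list names are what the saving step later uses as object and file names.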
    

    To save them as .Rdata files, we need to assign the object names dynamically, which we achieve with the list argument of save().

    invisible(lapply(names(L), function(nm) {
      assign(nm, L[[nm]])  # creates e.g. an object "name1" in this function's environment
      save(list = nm, file = file.path(DIR, paste0(nm, ".Rdata")))
    }))
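The assign() call works because save() looks up the named object in the environment it is called from. An alternative (a design choice, not what the answer's code does) is to turn the whole list into an environment once with list2env() and point save() at it through its envir argument, so no temporary objects are created. In this sketch the list L and the output directory are stand-ins created for the demo:

```r
# Stand-in for the imported list from the previous step
L <- list(name1 = data.frame(matrix(1:12, 3, 4)),
          name2 = data.frame(matrix(13:24, 3, 4)))
out_dir <- file.path(tempdir(), "test_rdata")
dir.create(out_dir, showWarnings = FALSE)

# One environment holding every list element as a named object
env <- list2env(L, envir = new.env())

# Save each object to its own file; the saved object keeps the element's name
for (nm in names(L)) {
  save(list = nm, envir = env, file = file.path(out_dir, paste0(nm, ".Rdata")))
}

list.files(out_dir)
```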
    

    Listing the directory shows that the .Rdata files were created.

    dir(DIR)
    # [1] "name1.dat"   "name1.Rdata" "name2.dat"   "name2.Rdata" "name3.dat"   "name3.Rdata"
    # [7] "name4.dat"   "name4.Rdata" "name5.dat"   "name5.Rdata"
    

    Now let's check whether the object names were also created correctly:

    # rm(list=ls())  # commented out for user safety
    load("test/name1.Rdata")
    ls()
    # [1] "name1"
    name1
    #   X1 X2 X3 X4
    # 1  1  4  7 10
    # 2  2  5  8 11
    # 3  3  6  9 12
    

    As expected, the restored object is named name1.
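Instead of clearing the workspace with rm(list = ls()) before checking, you can load() into a fresh environment, which leaves your existing objects untouched. A small sketch, with a demo file written to the temporary directory in place of "test/name1.Rdata":

```r
# Demo: create, save, and remove an object named name1
name1 <- data.frame(matrix(1:12, 3, 4))
f <- file.path(tempdir(), "name1.Rdata")
save(name1, file = f)
rm(name1)

# Load into a new environment; load() returns the names of the restored objects
check <- new.env()
loaded <- load(f, envir = check)
loaded        # character vector of object names found in the file
check$name1   # the data frame; the global workspace is not modified
```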

     

    Add-on option

    Alternatively, we could take a more direct approach using rvest. First we fetch the data file names:

    library(rvest)
    dat.names <- html_attr(html_nodes(read_html(
      "https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/Hand.html"),
      "a"), "href")
    

    and build the full URL for each file:

    # paste0() is vectorised, so no sapply() is needed
    links <- paste0("https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/", dat.names)
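Scraping every "a" node returns all links on the page, not only data files. Assuming the data files are exactly the links ending in .dat (an assumption; check the page itself), a regular-expression filter can drop the rest before the URLs are built. The href values below are hypothetical stand-ins for dat.names:

```r
# Hypothetical stand-ins for the scraped href values
dat.names <- c("clinical.dat", "index.html", "NOTES.DAT", "mailto:someone")

# Keep only links ending in .dat, case-insensitively
dat.names <- dat.names[grepl("\\.dat$", dat.names, ignore.case = TRUE)]

# Build the full URLs from the filtered names
base_url <- "https://www2.stat.duke.edu/courses/Spring03/sta113/Data/Hand/"
links <- paste0(base_url, dat.names)
links
```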
    

    The remainder is basically the same as above:

    DIR <- "test"
    dir.create(DIR, showWarnings = FALSE)  # no-op if the directory already exists
    
    library(rio)
    system.time(L <- import_list(links, format = "tsv"))  # this will take a minute
    invisible(lapply(names(L), function(nm) {
      assign(nm, L[[nm]])
      save(list = nm, file = file.path(DIR, paste0(nm, ".Rdata")))
    }))
    
    # rm(list=ls())  # commented out for user safety
    load("test/clinical.Rdata")  # test a data set
    clinical
    #    V1  V2  V3
    # 1  26  31  57
    # 2  51  59 110
    # 3  21  11  32
    # 4  40  34  74
    # 5 138 135 273
    

    However, as noted at the beginning, some of these files are somewhat unclean, so you will probably have to handle them individually and adapt the code case by case.
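Since some files will fail to parse, one way to stop a single bad file from aborting the whole import is to wrap each read in tryCatch(), keep the successes, and collect the failures for individual inspection. A minimal base-R sketch: read_one() is a hypothetical helper name, and read.table(header = TRUE, fill = TRUE) stands in for whatever reader suits your files. The demo uses one readable file and one empty file, which read.table() rejects with an error:

```r
# Demo files: one readable, one empty
demo_dir <- file.path(tempdir(), "test_errors")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(a = 1:3, b = 4:6), file.path(demo_dir, "good.dat"),
            row.names = FALSE)
invisible(file.create(file.path(demo_dir, "empty.dat")))

files <- list.files(demo_dir, pattern = "\\.dat$", full.names = TRUE)

# Hypothetical helper: return the data frame, or the error object on failure
read_one <- function(f) {
  tryCatch(read.table(f, header = TRUE, fill = TRUE),
           error = function(e) e)
}

res <- lapply(files, read_one)
names(res) <- tools::file_path_sans_ext(basename(files))

# Split into successes and failures
failed <- vapply(res, inherits, logical(1), what = "error")
ok <- res[!failed]
names(ok)           # files that were read
names(res)[failed]  # files to inspect and handle individually
```

Only the elements of ok would then go through the saving step; the failed ones can be read with per-file settings.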