
fread does not read character vector


I am trying to download a list using R with the following code:

name <- paste0("https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx")
master <- readLines(url(name))
master <- master[grep("SC 13(D|G)", master)]
master <- gsub("#", "", master)
master_table <- fread(textConnection(master), sep = "|")

The final line returns an error. I verified that textConnection works as expected and I could read from it using readLines, but fread returns an error. read.table runs into the same problem.

Error in fread(textConnection(master), sep = "|") :  input= must be a single character string containing a file name, a system command containing at least one space, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or, the input data itself containing at least one \n or \r

What am I doing wrong?


Solution

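  1) fread rejects textConnection(master) because, as the error message says, its input must be a single character string (a file name, a shell command, a URL, or the data itself), not a connection. Collapsing the lines into one newline-separated string therefore removes the need for textConnection. Beyond that, paste0 is not needed in the first line and url(...) is not needed in the second, since readLines accepts a URL directly. The gsub can be omitted if we specify na.strings in fread. The input is limited to 1000 lines here only so that the example runs quickly.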

    library(data.table)
    
    name <- "https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx"
    master <- readLines(name, 1000)
    master <- master[grep("SC 13(D|G)", master)]
    master <- paste(master, collapse = "\n")
    master_table <- fread(master, sep = "|", na.strings = "")
    

  2) A second approach, which may be faster, is to download the file first and then have fread read it through a shell command that filters the lines, as shown.

    name <- "https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx"
    download.file(name, "master.txt")
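    # /R /C: makes findstr treat the quoted text as one regular expression
    # instead of two space-separated search terms ("SC" or "13[DG]")
    master_table <- fread('findstr /R /C:"SC 13[DG]" master.txt', sep = "|", na.strings = "")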
    

    The above is for Windows. For Linux with bash, replace the last line with:

    master_table <- fread("grep 'SC 13[DG]' master.txt", sep = "|", na.strings = "")
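
To see the collapsing idea from approach 1 without touching the network, here is a minimal sketch on a few made-up lines. The sample records are hypothetical and only assume the pipe-delimited layout of master.idx (CIK|Company Name|Form Type|Date Filed|Filename). The last call uses the text= argument, which recent data.table versions provide for passing a character vector of lines directly; if your installed version lacks it, the collapsed-string form is the fallback.

```r
library(data.table)

# Hypothetical sample lines in the assumed master.idx layout; values are made up
lines <- c(
  "1000001|EXAMPLE CORP|SC 13G|2016-02-01|edgar/data/1000001/example-a.txt",
  "1000002|SAMPLE INC|10-K|2016-02-02|edgar/data/1000002/example-b.txt",
  "1000003|DEMO LLC|SC 13D|2016-02-03|edgar/data/1000003/example-c.txt"
)

# Keep only the SC 13D / SC 13G filings, as in the question
keep <- grep("SC 13(D|G)", lines, value = TRUE)

# fread wants a single string, so join the lines with newlines
as_string <- paste(keep, collapse = "\n")
tab <- fread(as_string, sep = "|", header = FALSE, na.strings = "")

# Alternative (assumes a data.table version that has text=):
# pass the character vector itself, one element per line
tab2 <- fread(text = keep, sep = "|", header = FALSE, na.strings = "")
```

Both calls should yield the same two-row, five-column table; header = FALSE is set explicitly because the filtered lines no longer include the file's header row.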