I am trying to download a list using R with the following code:
name <- paste0("https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx")
master <- readLines(url(name))
master <- master[grep("SC 13(D|G)", master)]
master <- gsub("#", "", master)
master_table <- fread(textConnection(master), sep = "|")
The final line returns an error. I verified that textConnection works as expected and I could read from it using readLines, but fread returns an error. read.table runs into the same problem.
Error in fread(textConnection(master), sep = "|") : input= must be a single character string containing a file name, a system command containing at least one space, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or, the input data itself containing at least one \n or \r
What am I doing wrong?
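1) fread does not accept a connection object. As the error message says, its input must be a file name, a system command, a URL, or the data itself as a single string containing at least one \n or \r. In the first line we don't need paste0. In the next line we don't need url(...). We have also limited the input to 1000 lines so that the example runs in less time. We can omit the gsub if we specify na.strings in fread. Finally, collapsing the input to a single newline-separated string eliminates the need for textConnection in fread.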
library(data.table)
name <- "https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx"
master <- readLines(name, 1000)
master <- master[grep("SC 13(D|G)", master)]
master <- paste(master, collapse = "\n")
master_table <- fread(master, sep = "|", na.strings = "")
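As a variation on the approach above, recent versions of data.table also offer a text= argument to fread that takes a character vector of lines directly, which avoids both textConnection and the paste(..., collapse = "\n") step. The following is only a minimal sketch: the sample lines and the column names are made up for illustration (they follow the pipe-delimited layout of master.idx but are not real filings), so it runs without a network connection.

```r
library(data.table)

# Hypothetical sample lines in the pipe-delimited layout of master.idx
# (CIK|Company Name|Form Type|Date Filed|Filename); the values are made up.
master <- c(
  "1000001|EXAMPLE CORP A|SC 13G|2016-01-05|edgar/data/1000001/example-a.txt",
  "1000002|EXAMPLE CORP B|10-K|2016-01-06|edgar/data/1000002/example-b.txt",
  "1000003|EXAMPLE CORP C|SC 13D|2016-01-07|edgar/data/1000003/example-c.txt"
)

# Keep only the SC 13D / SC 13G filings, as in the answer above.
master <- grep("SC 13(D|G)", master, value = TRUE)

# text= accepts a character vector of lines, so neither textConnection()
# nor paste(collapse = "\n") is needed. The column names are assumptions.
master_table <- fread(
  text = master, sep = "|", header = FALSE, na.strings = "",
  col.names = c("cik", "company", "form_type", "date_filed", "filename")
)
```

Using text= also sidesteps an edge case of the collapsed-string form: if only one line survives the grep, the collapsed string contains no \n and fread would try to interpret it as a file name or command.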
2) A second approach, which may be faster, is to download the file first and then fread it as shown.
name <- "https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx"
download.file(name, "master.txt")
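# /C: makes findstr treat the quoted pattern as one search string; /R reads it as a regular expression
master_table <- fread('findstr /R /C:"SC 13[DG]" master.txt', sep = "|", na.strings = "")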
The above is for Windows. For Linux with bash replace the last line with:
master_table <- fread("grep 'SC 13[DG]' master.txt", sep = "|", na.strings = "")
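If the same script has to run on both Windows and Linux, the filter command can be chosen at run time. The sketch below is one possible way to do that, not a definitive implementation. It assumes a data.table version that supports the cmd= argument of fread, and it writes a small temporary file with made-up lines so that it is self-contained. The variable names tmp and filter_cmd are hypothetical. On Windows, /C: is used so that findstr treats the quoted pattern as a single search string rather than as several space-separated ones.

```r
library(data.table)

# Made-up stand-in for the downloaded master.txt, so the sketch runs offline.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "1000001|EXAMPLE CORP A|SC 13G|2016-01-05|edgar/data/1000001/example-a.txt",
  "1000002|EXAMPLE CORP B|10-K|2016-01-06|edgar/data/1000002/example-b.txt",
  "1000003|EXAMPLE CORP C|SC 13D|2016-01-07|edgar/data/1000003/example-c.txt"
), tmp)

# Pick the filtering command that matches the current platform.
filter_cmd <- if (.Platform$OS.type == "windows") {
  sprintf('findstr /R /C:"SC 13[DG]" "%s"', tmp)
} else {
  sprintf("grep 'SC 13[DG]' %s", shQuote(tmp))
}

# cmd= tells fread explicitly that the string is a shell command.
master_table <- fread(cmd = filter_cmd, sep = "|", header = FALSE, na.strings = "")
```

With the real data, tmp would simply be the path passed to download.file, for example "master.txt".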