I am trying to get companies and their filing information from EDGAR using the edgarWebR package. In particular, I want to use two functions from the package: filing_information and company_filings.
I have thousands of cik values in a different dataset, but neither function can handle a vector of cik values. Here is an example:
library(edgarWebR)
comp_file <- company_filings(c("1000045"), before = "20201231",
                             type = "10-K", count = 100,
                             page = 1)
head(comp_file)
accession_number act file_number filing_date accepted_date
1 <NA> 34 000-26680 2020-06-22 2020-06-22
2 <NA> 34 000-26680 2019-06-28 2019-06-28
3 <NA> 34 000-26680 2018-06-27 2018-06-27
4 <NA> 34 000-26680 2017-06-14 2017-06-14
5 <NA> 34 000-26680 2016-06-14 2016-06-14
6 <NA> 34 000-26680 2015-06-15 2015-06-15
href
1 https://www.sec.gov/Archives/edgar/data/1000045/000156459020030033/0001564590-20-030033-index.htm
2 https://www.sec.gov/Archives/edgar/data/1000045/000156459019023956/0001564590-19-023956-index.htm
3 https://www.sec.gov/Archives/edgar/data/1000045/000119312518205637/0001193125-18-205637-index.htm
4 https://www.sec.gov/Archives/edgar/data/1000045/000119312517203193/0001193125-17-203193-index.htm
5 https://www.sec.gov/Archives/edgar/data/1000045/000119312516620952/0001193125-16-620952-index.htm
6 https://www.sec.gov/Archives/edgar/data/1000045/000119312515223218/0001193125-15-223218-index.htm
type film_number
1 10-K 20977409
2 10-K 19927449
3 10-K 18921743
4 10-K 17910577
5 10-K 161712394
6 10-K 15931101
form_name
1 Annual report [Section 13 and 15(d), not S-K Item 405]
2 Annual report [Section 13 and 15(d), not S-K Item 405]
3 Annual report [Section 13 and 15(d), not S-K Item 405]
4 Annual report [Section 13 and 15(d), not S-K Item 405]
5 Annual report [Section 13 and 15(d), not S-K Item 405]
6 Annual report [Section 13 and 15(d), not S-K Item 405]
description size
1 <NA> 14 MB
2 <NA> 10 MB
3 <NA> 5 MB
4 <NA> 5 MB
5 <NA> 5 MB
6 <NA> 7 MB
I need to pass the href variable to the filing_information function.
I tried it this way:
file_info <- filing_information(comp_file$href)
but it does not work. I got this message:
Error in parse_url(url) : length(url) == 1 is not TRUE
I can do it by passing each href value individually, like this:
x <- "https://www.sec.gov/Archives/edgar/data/1000045/000156459020030033/0001564590-20-030033-index.htm"
file_info <- filing_information(x)
The same is true for the company_filings function, where I use only one cik, "1000045", but in another file I have thousands of cik values, and I want to run company_filings for all of them. Doing this manually is not feasible.
Does anybody have an idea how I can run these two functions on a LARGE vector automatically?
Thanks
In general, when a function (whether API-reaching or local) takes only one element as an argument, often the simplest way to "vectorize" it is to use a form of lapply:
companies <- c("1000045", "1000046", "1000047")
comp_file_list <- lapply(
  setNames(nm=companies),
  function(comp) company_filings(comp, before = "20201231",
                                 type = "10-K", count = 100,
                                 page = 1)
)
Technically, the setNames(nm=.) portion is a safeguard, letting us know which company id was used for each element. If the id is already included in the returned data, you can remove it.
Assuming that the return value is always a data.frame, you can either keep them in the list (and deal with them as a list of frames, cf. https://stackoverflow.com/a/24376207/3358227), or combine them into one much taller frame using one of:
# base R
comp_file_list <- Map(function(x, nm) transform(x, id = nm), comp_file_list, names(comp_file_list))
comp_files <- do.call(rbind, comp_file_list)
# dplyr/tidyverse
comp_files <- dplyr::bind_rows(comp_file_list, .id = "id")
# data.table
comp_files <- data.table::rbindlist(comp_file_list, idcol = "id")
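The same pattern covers the second step from the question, calling a one-URL-at-a-time function over the href column. The following is a minimal offline sketch, not the package's own code: fetch_one() is a hypothetical stand-in that only mimics the length-1 check behind the error in the question, and the example.com URLs are placeholders.

```r
# Hypothetical stand-in for a function that accepts exactly one URL,
# mimicking the length-1 check behind the error in the question.
fetch_one <- function(url) {
  stopifnot(length(url) == 1)
  data.frame(href = url, n_chars = nchar(url), stringsAsFactors = FALSE)
}

# Placeholder URLs (assumption); in practice this would be comp_files$href.
hrefs <- c("https://example.com/a-index.htm",
           "https://example.com/bb-index.htm")

# fetch_one(hrefs) would stop with an error; lapply calls it once per element.
info_list <- lapply(setNames(nm = hrefs), fetch_one)

# Stack the one-row frames into a single frame, one row per href.
info <- do.call(rbind, info_list)
rownames(info) <- NULL
```

With the real package, swapping fetch_one for filing_information and hrefs for comp_files$href should follow the same shape, assuming each call returns a data frame with the same columns.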
FYI, the second argument of lapply is a function, whose first argument is filled with each element of X (the first argument of lapply). Sometimes this can be just the function itself, as in
res <- lapply(companies, company_filings)
This is equivalent to
res <- lapply(companies, function(z) company_filings(z))
If you have a single set of arguments that must be applied to all calls, you can choose one of the following equivalent expressions:
res <- lapply(companies, company_filings, before = "20201231", type = "10-K", count = 100, page = 1)
res <- lapply(companies, function(z) company_filings(z, before = "20201231", type = "10-K", count = 100, page = 1))
If one (or more) of those arguments varies with each company, however, you need a different form. Let's assume that we have different before= arguments for each company:
befores <- c("20201231", "20201130", "20201031")
res <- Map(function(comp, bef) company_filings(comp, before=bef, type="10-K"),
           companies, befores)
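To see how Map pairs the arguments without touching the network, here is a small sketch. lookup_one() and the ids are hypothetical, made up for illustration; only Map itself is real base R.

```r
# Hypothetical stand-in: takes one id and one cutoff date per call.
lookup_one <- function(id, before) {
  data.frame(id = id, before = before, stringsAsFactors = FALSE)
}

ids     <- c("A1", "B2", "C3")
cutoffs <- c("20201231", "20201130", "20201031")

# Map recycles shorter arguments, so check that the lengths match first.
stopifnot(length(ids) == length(cutoffs))

# Map pairs ids[i] with cutoffs[i]; because ids is an unnamed character
# vector, its values become the names of the result.
res <- Map(function(id, bef) lookup_one(id, before = bef), ids, cutoffs)

names(res)
res[["B2"]]$before
```

Because the result is named by id, each element can be looked up directly, and the list can be stacked with any of the rbind-style approaches shown earlier.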
Basic error handling if you have ids/refs that fail the query:
res <- lapply(setNames(nm = companies), function(cmp) {
  tryCatch(
    company_filings(cmp, before=".."),
    error = function(e) e
  )
})
errors <- sapply(res, inherits, "error")
failures <- res[errors]
successes <- res[!errors]
good_returns <- do.call(rbind, successes)
names(failures)
# indicates which company ids failed, and the text of the error may
# indicate why they failed
Some options for the tryCatch(..., error=) argument:
error=identity returns the raw error, which is sometimes enough information
error=function(e) e does the same thing
error=function(e) conditionMessage(e) returns a character string, the message portion of the error
error=function(e) NULL ignores the error and returns NULL (or some constant) instead
You can also conditionally treat e, including patterns such as if (grepl("not found", conditionMessage(e))) {...} else NULL.
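As a sketch of the conditional handler, the example below drops "not found" failures quietly and keeps any other error for inspection. query_one() and its error messages are hypothetical stand-ins for a real query; the actual messages returned by a live request may differ.

```r
# Hypothetical stand-in that fails for some ids, in place of a real query.
query_one <- function(id) {
  if (id == "bad") stop("id not found")
  if (id == "odd") stop("server timeout")
  data.frame(id = id, stringsAsFactors = FALSE)
}

ids <- c("ok1", "bad", "odd", "ok2")

res <- lapply(setNames(nm = ids), function(id) {
  tryCatch(
    query_one(id),
    error = function(e) {
      # Return NULL for "not found"; return the error object otherwise.
      if (grepl("not found", conditionMessage(e))) NULL else e
    }
  )
})

is_error   <- vapply(res, inherits, logical(1), what = "error")
is_missing <- vapply(res, is.null, logical(1))

names(res)[is_error]     # ids with unexpected errors
names(res)[is_missing]   # ids that were simply not found
good <- do.call(rbind, res[!is_error & !is_missing])
```

Separating the two kinds of failure makes it easier to retry only the ids that hit an unexpected error.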