
How to get more than one company's information using edgarWebR


I am trying to get companies and their filing information from EDGAR using the edgarWebR package. In particular, I want to use two functions from the package: filing_information and company_filings.

I have thousands of CIKs in a different dataset, but neither function accepts a vector of CIKs. Here is an example:

library(edgarWebR)
comp_file <- company_filings(c("1000045"), before = "20201231",
                            type = "10-K",  count = 100,
                            page = 1)

head(comp_file)
  accession_number act file_number filing_date accepted_date
1             <NA>  34   000-26680  2020-06-22    2020-06-22
2             <NA>  34   000-26680  2019-06-28    2019-06-28
3             <NA>  34   000-26680  2018-06-27    2018-06-27
4             <NA>  34   000-26680  2017-06-14    2017-06-14
5             <NA>  34   000-26680  2016-06-14    2016-06-14
6             <NA>  34   000-26680  2015-06-15    2015-06-15
                                                                                               href
1 https://www.sec.gov/Archives/edgar/data/1000045/000156459020030033/0001564590-20-030033-index.htm
2 https://www.sec.gov/Archives/edgar/data/1000045/000156459019023956/0001564590-19-023956-index.htm
3 https://www.sec.gov/Archives/edgar/data/1000045/000119312518205637/0001193125-18-205637-index.htm
4 https://www.sec.gov/Archives/edgar/data/1000045/000119312517203193/0001193125-17-203193-index.htm
5 https://www.sec.gov/Archives/edgar/data/1000045/000119312516620952/0001193125-16-620952-index.htm
6 https://www.sec.gov/Archives/edgar/data/1000045/000119312515223218/0001193125-15-223218-index.htm
  type film_number
1 10-K    20977409
2 10-K    19927449
3 10-K    18921743
4 10-K    17910577
5 10-K   161712394
6 10-K    15931101
                                               form_name
1 Annual report [Section 13 and 15(d), not S-K Item 405]
2 Annual report [Section 13 and 15(d), not S-K Item 405]
3 Annual report [Section 13 and 15(d), not S-K Item 405]
4 Annual report [Section 13 and 15(d), not S-K Item 405]
5 Annual report [Section 13 and 15(d), not S-K Item 405]
6 Annual report [Section 13 and 15(d), not S-K Item 405]
  description  size
1        <NA> 14 MB
2        <NA> 10 MB
3        <NA>  5 MB
4        <NA>  5 MB
5        <NA>  5 MB
6        <NA>  7 MB

I need to pass the href column to the filing_information function.

I tried it this way:

file_info <- filing_information(comp_file$href)

but it does not work. I got this message:


Error in parse_url(url) : length(url) == 1 is not TRUE

I can do it by passing each href value individually, like this:

x <- "https://www.sec.gov/Archives/edgar/data/1000045/000156459020030033/0001564590-20-030033-index.htm"

file_info <- filing_information(x)

The same is true for the company_filings function: above I use only one CIK, "1000045", but in another file I have thousands of CIKs, and I want to run company_filings for all of them. Doing this manually is not feasible.

Does anybody have an idea how I can run these two functions on a LARGE vector automatically?

Thanks


Solution

  • In general, when a function (whether API-reaching or local) takes only one element as an argument, often the simplest way to "vectorize" it is to use a form of lapply:

    companies <- c("1000045", "1000046", "1000047")
    comp_file_list <- lapply(
      setNames(nm=companies),
      function(comp) company_filings(comp, before = "20201231",
                                     type = "10-K",  count = 100,
                                     page = 1)
    )
    

    Technically, the setNames(nm=.) portion is a safeguard: it names each element of the result, so we know which company id was used for it. If the id is already included in the returned data, you can drop it.
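To illustrate the same pattern for the href case from the question, and what setNames(nm = ) contributes, here is a minimal sketch. It assumes a hypothetical stand-in, fake_filing_information, which is not part of edgarWebR and makes no network request; it only mimics a function that accepts one URL at a time. The example.com URLs are placeholders.

```r
# Hypothetical stand-in for a scalar-only function such as filing_information();
# it is NOT part of edgarWebR and does not contact the SEC.
fake_filing_information <- function(url) {
  stopifnot(length(url) == 1)  # mimics the "one URL at a time" restriction
  data.frame(url = url, n_chars = nchar(url), stringsAsFactors = FALSE)
}

# Placeholder URLs, standing in for comp_file$href
hrefs <- c("https://example.com/a-index.htm", "https://example.com/bb-index.htm")

# setNames(nm = x) returns x with names equal to its own values
named_hrefs <- setNames(nm = hrefs)

# lapply() calls the function once per element and keeps the names
info_list <- lapply(named_hrefs, fake_filing_information)
names(info_list)
```

With the real package, the stand-in would presumably be swapped for filing_information and hrefs for comp_file$href; the names then record which URL produced each element.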

    Assuming that the return value is always a data.frame, you can either keep them in the list (and work with them as a list of frames, cf. https://stackoverflow.com/a/24376207/3358227), or combine them into one much taller frame using one of:

    # base R: add the list name as an "id" column to each frame, then stack them
    comp_files <- Map(function(x, nm) transform(x, id = nm), comp_file_list, names(comp_file_list))
    comp_files <- do.call(rbind, comp_files)
    
    # dplyr/tidyverse
    comp_files <- dplyr::bind_rows(comp_file_list, .id = "id")
    
    # data.table
    comp_files <- data.table::rbindlist(comp_file_list, idcol = "id")
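To see what the id column looks like after combining, here is a small self-contained sketch of the base R variant. The two frames are made-up data standing in for per-company results, not real EDGAR output.

```r
# Illustrative data only: two small frames standing in for per-company results
frames <- list(
  "1000045" = data.frame(type = c("10-K", "10-K"), stringsAsFactors = FALSE),
  "1000046" = data.frame(type = "10-K", stringsAsFactors = FALSE)
)

# Add the list name as an id column to each frame, then stack the frames
with_id <- Map(function(x, nm) transform(x, id = nm), frames, names(frames))
combined <- do.call(rbind, with_id)

# rbind() builds row names from the list names; reset them to 1..n
rownames(combined) <- NULL
combined
```

Every row keeps the id of the list element it came from, so the combined frame can still be filtered or grouped by company.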
    

    FYI, the second argument of lapply is a function whose first argument receives each element of X (the first argument of lapply) in turn. Sometimes this can be just the function itself, as in

    res <- lapply(companies, company_filings)
    

    This is equivalent to

    res <- lapply(companies, function(z) company_filings(z))
    

    If you have a single set of arguments that must be applied to all calls, you can choose one of the following equivalent expressions:

    res <- lapply(companies, company_filings, before = "20201231", type = "10-K",  count = 100, page = 1)
    res <- lapply(companies, function(z) company_filings(z, before = "20201231", type = "10-K",  count = 100, page = 1))
    

    If one (or more) of those arguments varies with each company, however, you need a different form. Let's assume that we have different before= arguments for each company,

    befores <- c("20201231", "20201130", "20201031")
    res <- Map(function(comp, bef) company_filings(comp, before=bef, type="10-K"),
               companies, befores)
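The pairing that Map performs can be checked without any network access. The sketch below assumes a hypothetical stand-in, fake_filings, that simply echoes its arguments; it is not an edgarWebR function.

```r
# Hypothetical stand-in for company_filings(); no network access
fake_filings <- function(cik, before) {
  data.frame(cik = cik, before = before, stringsAsFactors = FALSE)
}

ciks    <- c("1000045", "1000046", "1000047")
cutoffs <- c("20201231", "20201130", "20201031")

# Guard against silent recycling when the vectors differ in length
stopifnot(length(ciks) == length(cutoffs))

# Map() walks both vectors in parallel: first with first, second with second, ...
paired <- Map(function(cik, bef) fake_filings(cik, before = bef), ciks, cutoffs)

# Because the first vector is an unnamed character vector,
# Map() uses its values as the names of the result
names(paired)
```

The length check is a design choice worth keeping with real data: Map recycles shorter arguments, so mismatched vectors would otherwise pair ids with the wrong dates without any error.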
    

    Basic error handling if you have ids/refs that fail the query:

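    # name the input so that names(failures) reports the failing company ids
    res <- lapply(setNames(nm = companies), function(cmp) {
      tryCatch(
        company_filings(cmp, before=".."),
        error = function(e) e   # return the error object instead of stopping
      )
    })
    errors <- sapply(res, inherits, "error")
    failures <- res[errors]
    successes <- res[!errors]
    good_returns <- do.call(rbind, successes)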
    
    names(failures)
    # indicates which company ids failed, and the text of the error may
    # indicate why they failed
    

    Some options for the tryCatch(..., error=) argument:

    You can also conditionally treat e, including patterns such as if (grepl("not found", e)) {...} else NULL.
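As one possible reading of the conditional handling described above, the sketch below returns NULL for a "not found" error and keeps any other error object. It assumes a hypothetical stand-in, fake_filings, and a made-up error message; the messages the real API produces may differ.

```r
# Hypothetical stand-in that fails for one id; no network access.
# The "company not found" message is invented for this example.
fake_filings <- function(cik) {
  if (cik == "0000000") stop("company not found")
  data.frame(cik = cik, stringsAsFactors = FALSE)
}

ids <- c("1000045", "0000000")
res <- lapply(setNames(nm = ids), function(id) {
  tryCatch(
    fake_filings(id),
    error = function(e) {
      # conditionMessage() extracts the error text as a single string
      if (grepl("not found", conditionMessage(e))) NULL else e
    }
  )
})

# NULL marks "not found"; any other failure would still be an error object
not_found <- names(res)[vapply(res, is.null, logical(1))]
not_found
```

Which errors to swallow and which to keep is a policy decision; returning NULL only for a known, expected message keeps unexpected failures visible.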