
Why does purrr::map() not work with load() to import .Rda datasets?


I have downloaded some .Rda files using the getMasterIndex() function from the edgar package.

Now I am trying to load all of these files into RStudio using the following code:

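library(purrr)

# List all .Rda files in the download folder
paths <- list.files('Master Indexes', pattern = '[.]Rda$', full.names = TRUE)
# Load each file and try to stack the results into one data frame
files <- map(paths, load)
list_rbind(files)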

However, files contains only the following, even though there should be data in it:

[[1]]
[1] "year.master"

[[2]]
[1] "year.master"

[[3]]
[1] "year.master"

Running list_rbind(files) gives this error:

Error in `list_rbind()`:
! Each element of `x` must be either a data frame or `NULL`.
ℹ Elements 1, 2, and 3 are not.
Run `rlang::last_trace()` to see where the error occurred.

The last .Rda file does get loaded into RStudio, though, as an object named year.master.

I have also tried a for loop, but the results are the same.

I tried the approach from the question "Using purrr to load multiple rda files", but it does not work for me.

My goal is to put all of the .Rda files into a list and then combine them into a single data frame.


Solution


    Use this:

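    # load() runs inside the lambda, so year.master is local to that call and returned
    files <- map(paths, ~ {load(.x); year.master})
    # or map_dfr(), or list_rbind(files), if you want a data frame instead of a list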
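If you want one data frame at the end, here's a self-contained sketch of the same map() call followed by purrr::list_rbind() (which needs purrr 1.0.0 or later). It writes two small made-up .Rda files to a temporary folder, each holding a data frame saved as year.master as a stand-in for the edgar output; the column names and values are invented for illustration.

```r
library(purrr)

# Hypothetical stand-ins for the edgar master index files: each .Rda holds
# a data frame saved under the same name, year.master
dir_tmp <- file.path(tempdir(), "rda_demo")
dir.create(dir_tmp, showWarnings = FALSE)
for (yr in c(2006, 2022)) {
  year.master <- data.frame(year = yr, form = c("10-K", "8-K"))
  save(year.master, file = file.path(dir_tmp, paste0(yr, ".Rda")))
}
rm(year.master)

paths <- list.files(dir_tmp, pattern = "[.]Rda$", full.names = TRUE)

# load() runs inside the lambda, so year.master lives only in that call's
# environment and is returned as the lambda's value
files <- map(paths, ~ {load(.x); year.master})

# Name the list elements after the files, then stack them into one data frame;
# names_to adds a column recording which file each row came from
names(files) <- basename(paths)
combined <- list_rbind(files, names_to = "source_file")
combined
```

The names_to column is optional, but with many yearly files it makes it easy to trace each row back to its source.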
    

    Explanation

    Okay, so first of all: clear your environment (or save it and start a fresh session). If you're anything like me, you have so many objects in there that it's hard to see what has actually been loaded.

    Then run this code:

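    pacman::p_load(edgar, tidyverse)
    
    # getMasterIndex() expects a user agent of the form "Your Name Contact@domain.com"
    useragent <- "Your Name Contact@domain.com"
    getMasterIndex(2006, useragent)
    getMasterIndex(2022, useragent)
    
    # Collect the downloaded .Rda files
    paths <- dir("Master Indexes/", full.names = TRUE) |> grep(pattern = "\\.Rda$", value = TRUE)
    
    # load() returns the name of the loaded object ("year.master"), not the data
    example <- load(paths[1])
    
    # Load every file into the global environment
    files <- map(paths, ~ load(.x, .GlobalEnv))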
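To see the cause in isolation, this sketch saves a small made-up data frame as year.master and loads it back. The value of load() is a character vector of object names, which is exactly what ended up in files; the data itself is written into the target environment as a side effect.

```r
# Save a small hypothetical data frame under the name year.master
year.master <- data.frame(year = 2006, form = "10-K")
path <- tempfile(fileext = ".Rda")
save(year.master, file = path)
rm(year.master)

# load() returns the *names* of the restored objects, not the objects
returned <- load(path)
print(returned)            # [1] "year.master"

# The data frame itself was created in the calling environment as a side effect
exists("year.master")      # TRUE
class(year.master)         # "data.frame"
```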
    

    Afterwards, you'll see a few things in your environment, including a data frame called year.master.

    So even though you were (it seems) trying to load the data into the object files, it isn't actually stored there. Each file's data is restored under the name it was saved with, "year.master", and load() returns only that name as a character vector. That is why files ends up as a list of "year.master" strings.
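Since load() hands back the restored names, you don't have to hard-code year.master at all. Here's a sketch that fetches each object by the name load() returned; it assumes one object per file, and the two temporary files (with deliberately different object names) are made up for the demo.

```r
library(purrr)

# Two hypothetical files whose objects were saved under *different* names
first_part <- data.frame(x = 1:2)
second_part <- data.frame(x = 3:4)
path_a <- tempfile(fileext = ".Rda")
path_b <- tempfile(fileext = ".Rda")
save(first_part, file = path_a)
save(second_part, file = path_b)
rm(first_part, second_part)

# load() returns the restored names, so the function can fetch the object by
# name instead of hard-coding year.master (assumes one object per file)
files <- map(c(path_a, path_b), function(p) {
  obj_name <- load(p)                     # restores the object into this function's environment
  get(obj_name[1], envir = environment()) # look it up by name in that same environment
})

list_rbind(files)$x    # 1 2 3 4
```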

    From the documentation (?load):

    load(<file>) replaces all existing objects with the same names in the current environment (typically your workspace, .GlobalEnv) and hence potentially overwrites important data. It is considerably safer to use envir = to load into a different environment, or to attach(file) which load()s into a new entry in the search path.
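Following that advice to use envir =, here's a sketch that loads each file into its own fresh environment and converts it to a named list, so nothing touches the workspace and files with identically named objects can't clobber each other. The two temporary files are made up for the demo.

```r
library(purrr)

# Two hypothetical files that both store an object named year.master
paths <- character(2)
for (i in 1:2) {
  year.master <- data.frame(file_id = i)
  paths[i] <- tempfile(fileext = ".Rda")
  save(year.master, file = paths[i])
}
rm(year.master)

# Load each file into its own new environment, as ?load suggests, then
# turn that environment into a named list of everything the file contained
envs <- map(paths, function(p) {
  e <- new.env()
  load(p, envir = e)
  as.list(e)
})

# Nothing was written to the workspace, and both copies survive
exists("year.master", inherits = FALSE)   # FALSE
envs[[1]]$year.master$file_id             # 1
envs[[2]]$year.master$file_id             # 2
```

Returning the whole environment as a list also copes with files that contain more than one object.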

    In other words, because all the objects have the same name, running map(paths, ~ load(.x, .GlobalEnv)) will load all of them, but you'll only be left with the last one, because each file after the first overwrites the year.master that came before it.
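This is presumably also why the for loop from the question gave the same result. A loop works fine as long as each year.master is copied out before the next load() replaces it; this sketch uses made-up temporary files and base R's do.call(rbind, ...) in place of list_rbind().

```r
# Two hypothetical files that both store an object named year.master
paths <- character(2)
for (i in 1:2) {
  year.master <- data.frame(file_id = i)
  paths[i] <- tempfile(fileext = ".Rda")
  save(year.master, file = paths[i])
}
rm(year.master)

# Each load() overwrites year.master, so copy it into the list right away
files <- vector("list", length(paths))
for (i in seq_along(paths)) {
  load(paths[i])
  files[[i]] <- year.master
}

# Base-R equivalent of list_rbind(): stack the data frames
combined <- do.call(rbind, files)
combined$file_id   # 1 2
```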