
dplyr - Need to add the CSV filename after importing all files from a directory


I have used this code to import all CSV files from a directory:

library(purrr)
library(readr)

orgs <-
  list.files(pattern = "\\.csv$") %>%
  map_df(~read_csv(., col_types = cols(.default = "c")))

This successfully combined all the files in the directory into one data frame.

I am looking for a way to add the file name of each imported CSV file as an additional variable.

It should look something like this:

Imported Data   Filename
data 1          csv1.csv
data 2          csv2.csv

Solution

  • Something like this?

    lapply(list.files(pattern = "\\.csv$"), \(x) {
      read_csv(x, col_types = cols(.default = "c")) |>
        mutate(Filename = x)
    }) |>
      bind_rows()
    

    Returns (for the data generated below):

       Sepal.Length Sepal.Width Petal.Length Petal.Width Species Filename 
       <chr>        <chr>       <chr>        <chr>       <chr>   <chr>    
     1 5.1          3.5         1.4          0.2         setosa  iris1.csv
     2 4.9          3           1.4          0.2         setosa  iris1.csv
     3 4.7          3.2         1.3          0.2         setosa  iris1.csv
     4 4.6          3.1         1.5          0.2         setosa  iris1.csv
     5 5            3.6         1.4          0.2         setosa  iris1.csv
     6 5.4          3.9         1.7          0.4         setosa  iris2.csv
     7 4.6          3.4         1.4          0.3         setosa  iris2.csv
     8 5            3.4         1.5          0.2         setosa  iris2.csv
     9 4.4          2.9         1.4          0.2         setosa  iris2.csv
    10 4.9          3.1         1.5          0.1         setosa  iris2.csv
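
    Assuming readr 2.0.0 or later, read_csv() can also take the whole vector of paths at once and record the source file itself through its id argument, so no mutate() step is needed. A minimal sketch (the example files are written to a temporary directory only to keep it self-contained):

    ```r
    library(readr)

    # Same two example files as in the Data section, in a temp dir
    dir <- tempfile("csvs")
    dir.create(dir)
    write.csv(iris[1:5, ], file.path(dir, "iris1.csv"), row.names = FALSE)
    write.csv(iris[6:10, ], file.path(dir, "iris2.csv"), row.names = FALSE)

    files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

    # read_csv() reads all files in one call; `id` names the column
    # that stores the path each row was read from (placed first)
    orgs <- read_csv(files, id = "Filename", col_types = cols(.default = "c"))

    # Drop the directory part, keeping only the file name
    orgs$Filename <- basename(orgs$Filename)
    ```

    The id column holds each path as it was passed in, hence the basename() step; when list.files() is run in the working directory, as in the question, the paths are already bare file names.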
    

    Note that map_df() was superseded in purrr 1.0.0 in favour of map() followed by list_rbind().
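
    A sketch of that purrr-style replacement for the question's pipeline, assuming purrr 1.0.0 or later: name the paths by file name, map() over them, and let list_rbind() turn the list names into a column via names_to.

    ```r
    library(purrr)
    library(readr)

    # Same two example files as in the Data section, in a temp dir
    dir <- tempfile("csvs")
    dir.create(dir)
    write.csv(iris[1:5, ], file.path(dir, "iris1.csv"), row.names = FALSE)
    write.csv(iris[6:10, ], file.path(dir, "iris2.csv"), row.names = FALSE)

    files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

    orgs <- files |>
      # Name each path by its bare file name
      set_names(basename(files)) |>
      map(\(f) read_csv(f, col_types = cols(.default = "c"))) |>
      # names_to stores the list names in a new first column
      list_rbind(names_to = "Filename")
    ```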


    Data:

    library(dplyr)
    library(readr)
    
    # Write two small example CSV files to the working directory
    write.csv(iris[1:5,], "iris1.csv", row.names=FALSE)
    write.csv(iris[6:10,], "iris2.csv", row.names=FALSE)
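
    With base lapply(), the mutate() call can also be dropped by naming the input vector: lapply() keeps the names, and dplyr's bind_rows() turns them into a column through its .id argument. A sketch, again writing the same two files to a temporary directory so it runs on its own:

    ```r
    library(dplyr)
    library(readr)

    dir <- tempfile("csvs")
    dir.create(dir)
    write.csv(iris[1:5, ], file.path(dir, "iris1.csv"), row.names = FALSE)
    write.csv(iris[6:10, ], file.path(dir, "iris2.csv"), row.names = FALSE)

    files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
    names(files) <- basename(files)

    # lapply() carries the names of `files` over to the result list;
    # bind_rows(.id = ...) turns those names into a first column
    orgs <- lapply(files, \(x) read_csv(x, col_types = cols(.default = "c"))) |>
      bind_rows(.id = "Filename")
    ```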