I have one data frame that matches each `patient_id` to the name of a patient. Each patient has his/her own data file, `FirstNameLastName.csv`. To anonymize the data, I wrote the function `read_in`, which reads in each `FirstNameLastName.csv` and adds the specified `patient_id` to it.
For further analysis, I now want all the anonymized data in one data frame. I tried this using the `map_df()` function from the purrr package, but I am having trouble matching the ID to each `.csv` file that is read in. Could somebody help me fix that, so that the result is a data frame containing all the data with the respective ID?
```
> patient_names
  patient_id    patient_name
1          1     Tina Turner
2          2 Michael Jackson
3          3  Michael Jordan
4          4     Dom Toretto
5          5    Lebron James
```
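For a reproducible example, the lookup table above can be rebuilt in base R. The last step is a sketch of how each `FirstNameLastName.csv` name follows from `patient_name`, assuming the file name is simply the name with all whitespace removed (the vector name `filenames` is illustrative).

```r
# Rebuild the patient_names lookup table printed above
patient_names <- data.frame(
  patient_id   = 1:5,
  patient_name = c("Tina Turner", "Michael Jackson", "Michael Jordan",
                   "Dom Toretto", "Lebron James")
)

# Assumption: each file name is the patient_name with all whitespace removed
filenames <- paste0(gsub("\\s", "", patient_names$patient_name), ".csv")
print(filenames)
```

`filenames` lines up row by row with `patient_names`, so `filenames[i]` belongs to `patient_names$patient_id[i]`.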
Each patient file has the same two columns. Read in, one of them looks like this:

```
Year  Injury
<chr> <chr>
2020  Sprained Ankle
1990  Torn ACL
1995  Bruised Knee
2011  Sore Neck
2014  Headache
2019  Broken Leg
```
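My `read_in` function:

```r
read_in <- function(path, id = 1) {
  data <- readr::read_delim(path, delim = ";", col_names = TRUE)
  # Look up the ID by row position in patient_names and put it in the first column
  tibble::add_column(data, patient_id = patient_names[["patient_id"]][id], .before = 1)
}
```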
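A self-contained sketch of `read_in` run on a single temporary file. It assumes the lookup argument is named `id` (the name the function body uses), that the files are semicolon-delimited as in `read_in`, and it adds `col_types` so `Year` stays `<chr>` as in the printout; the file path and its two rows are made up for illustration.

```r
library(readr)
library(tibble)

# Lookup table, as printed above
patient_names <- data.frame(
  patient_id   = 1:5,
  patient_name = c("Tina Turner", "Michael Jackson", "Michael Jordan",
                   "Dom Toretto", "Lebron James")
)

# read_in with its argument named `id`, matching the name used in the body
read_in <- function(path, id = 1) {
  data <- read_delim(path, delim = ";", col_names = TRUE,
                     col_types = cols(.default = col_character()))  # keep Year as <chr>
  # Look up the ID by row position and put it in the first column
  add_column(data, patient_id = patient_names[["patient_id"]][id], .before = 1)
}

# Hypothetical semicolon-delimited file holding two of the rows shown above
path <- file.path(tempdir(), "LebronJames.csv")
writeLines(c("Year;Injury", "2020;Sprained Ankle", "1990;Torn ACL"), path)

one <- read_in(path, id = 5)
print(one)
```

Because `id` indexes `patient_names` by row position, the ID is only correct if the caller passes the row that belongs to the file being read.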
Reading one file with `read_in` gives:

```
  patient_id Year  Injury
       <int> <chr> <chr>
1          5 2020  Sprained Ankle
2          5 1990  Torn ACL
3          5 1995  Bruised Knee
4          5 2011  Sore Neck
5          5 2014  Headache
6          5 2019  Broken Leg
```
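I then tried to combine all files with `map_df()`, starting from the list of files:

```r
# "\\.csv$" matches only names ending in .csv (a bare "." matches any character)
list.files(path = "/directory", pattern = "\\.csv$", full.names = TRUE) %>%
```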
The combined output looks like this:

```
# A tibble: 1234 x 3
  patient_id Year  Injury
       <int> <chr> <chr>
1          1 2012  Ankle
2          1 2014  Broken Arm
3          1 1999  Concussion
4          1 1987  Broken Finger
...      ... ...   ...
```
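Try this approach:

```r
library(purrr)
library(readr)
filenames <- paste0(gsub('\\s', '', patient_names$patient_name), '.csv')
data <- map_df(filenames, read_delim, delim = ';', .id = 'patient_id')
```

The first line creates a vector of file names to read from, and `data` then holds the rows of all these files combined, with an extra column `patient_id` identifying the file each row came from. Because `filenames` is unnamed, `.id` stores each file's position in `filenames` (as a character string), which equals the real `patient_id` here only because the IDs run 1 to 5 in the same order as the rows of `patient_names`. The files are read with `read_delim(delim = ';')` because, as in `read_in`, they are semicolon-delimited; `read_csv` expects commas.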
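If the IDs are not simply 1, 2, 3, ... in row order, a sketch under the following assumptions stores the actual `patient_id` instead of the position: the vector of paths is named by ID, so `.id` picks up the names. The two patients, their IDs (17 and 42) and the file contents are hypothetical, written to a temporary directory; `map_df()` binds rows with `dplyr::bind_rows()`, so dplyr must be installed.

```r
library(purrr)
library(readr)

# Hypothetical lookup table: IDs deliberately not 1, 2, ... to show that
# the names (not the positions) end up in the patient_id column
patient_names <- data.frame(
  patient_id   = c(17L, 42L),
  patient_name = c("Tina Turner", "Michael Jackson")
)

# Hypothetical semicolon-delimited files in a temporary directory
tmp_dir <- tempdir()
writeLines(c("Year;Injury", "2012;Ankle", "2014;Broken Arm"),
           file.path(tmp_dir, "TinaTurner.csv"))
writeLines(c("Year;Injury", "2020;Sprained Ankle"),
           file.path(tmp_dir, "MichaelJackson.csv"))

# One path per patient, named by that patient's ID (names are coerced to character)
filenames <- file.path(tmp_dir,
                       paste0(gsub("\\s", "", patient_names$patient_name), ".csv"))
names(filenames) <- patient_names$patient_id

# .x is named, so .id stores each name (the patient_id) as a character string
data <- map_df(filenames, read_delim, delim = ";",
               col_types = cols(.default = col_character()),
               .id = "patient_id")
data$patient_id <- as.integer(data$patient_id)  # back to integer IDs
print(data)
```

In purrr 1.0.0 and later `map_df()` is superseded; `map()` followed by `list_rbind(names_to = "patient_id")` is the suggested replacement and keeps the same naming trick.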