I know this has been done before but I'm struggling to get it to work with my code. I have multiple .csv files with different headings/number of columns. The columns are in the same order and I really just need the first 3 columns and then I need to just replace the column headings so they are the same.
y_path <- "Y:\\Public\\xx\\DataLoggers\\WaterTemp\\DarkCanyon\\DC Trib 1A\\"
all_csv <- list.files(path = y_path, pattern = '.csv', full.names = TRUE)
open_csv <- lapply(all_csv, \(x) fread(x,skip=2))
one_df <- data.table::rbindlist(open_csv, fill=TRUE, idcol='names(all_csv)')
one_df
df1<-one_df %>% select(1:3)
df1
colnames(df1) <- c("numb","datetime","temp") #change column names
output:
numb datetime temp
<int> <char> <num>
1: 1 6/30/2014 13:37 17.272
2: 2 6/30/2014 14:37 18.438
3: 3 6/30/2014 15:37 18.152
4: 4 6/30/2014 16:37 18.176
5: 5 6/30/2014 17:37 18.200
---
46230: 915 07/29/24 01:41:52 PM 11.722
46231: 916 07/29/24 02:41:52 PM 12.401
46232: 917 07/29/24 03:41:52 PM 13.269
46233: 918 07/29/24 04:10:09 PM NA
46234: 919 07/29/24 04:10:12 PM NA
The problem is I really want to include the path because ideally I will grab all these files from within the WaterTemp folder instead of individually running code for each subfolder separately. Once I have the pathway I should be able to pull the folder level (site) name from it. I'm not sure how with the way I have coded this to include the pathway name in a new column.
What I want
numb datetime temp Site
<int> <char> <num> <char>
1: 1 6/30/2014 13:37 17.272 Y:\\Public\\Fisher Group\\DataLoggers\\WaterTemp\\DarkCanyon\\DC Trib 1A\\
2: 2 6/30/2014 14:37 18.438 Y:\\Public\\Fisher Group\\DataLoggers\\WaterTemp\\DarkCanyon\\DC Trib 1A\\
3: 3 6/30/2014 15:37 18.152 Y:\\Public\\Fisher Group\\DataLoggers\\WaterTemp\\DarkCanyon\\DC Trib 1A\\
4: 4 6/30/2014 16:37 18.176 Y:\\Public\\Fisher Group\\DataLoggers\\WaterTemp\\DarkCanyon\\DC Trib 1A\\
5: 5 6/30/2014 17:37 18.200 Y:\\Public\\Fisher Group\\DataLoggers\\WaterTemp\\DarkCanyon\\DC Trib 1A\\
---
46230: 915 07/29/24 01:41:52 PM 11.722 Y:\\Public\\Fisher Group\\DataLoggers\\WaterTemp\\DarkCanyon\\DC Trib 1A\\
46231: 916 07/29/24 02:41:52 PM 12.401 Y:\\Public\\Fisher Group\\DataLoggers\\WaterTemp\\DarkCanyon\\DC Trib 1A\\
46232: 917 07/29/24 03:41:52 PM 13.269 Y:\\Public\\Fisher Group\\DataLoggers\\WaterTemp\\DarkCanyon\\DC Trib 1A\\
46233: 918 07/29/24 04:10:09 PM NA Y:\\Public\\Fisher Group\\DataLoggers\\WaterTemp\\DarkCanyon\\DC Trib 1A\\
46234: 919 07/29/24 04:10:12 PM NA Y:\\Public\\Fisher Group\\DataLoggers\\WaterTemp\\DarkCanyon\\DC Trib 1A\\
Thank you, Amanda
You need to change the column names before using rbindlist
. Further, idcol=
is great, but unless I'm missing something, your all_csv
is not named, so the idcol is going to capture the index within the list, not the filename itself.
Perhaps something like this:
one_df <- lapply(
setNames(nm = all_csv),
\(x) setnames(fread(x, skip = 2, select = 1:3), c("numb","datetime","temp")
) |>
rbindlist(idcol = "names(all_csv)")
I'm using setNames(nm=x)
as short-hand of setNames(x,nm=x)
, once the list/vector has names, lapply
retains them.
Edit (slight code-golf), motivated by Friede:
one_df <- lapply(
setNames(nm = all_csv),
\(x) fread(x, skip = 2, col.names = c("numb", "datetime", "temp"), select = 1:3)
) |>
rbindlist(idcol = "names(all_csv)")