I have 100 hdf5 files in a folder. For a reproducible example let's consider only 2 files, namely:
> list.files(pattern="*.hdf5")
[1] "Cars_20160601_01.hdf5" "Cars_20160601_02.hdf5"
Each hdf5 file contains 2 groups, data and frame. I want to extract 2 objects from the data group, called VDS_Veh_Speed and VDS_Chassis_CG_Position. Similarly, the frame group contains 3 objects, but only the object frame is relevant in this group.
I want to read these files and extract the relevant variables described above.
library(rhdf5) # provides h5read()
# Create a list of all the hdf5 files
temp = list.files(pattern="*.hdf5")
# Read all files and create data frames from each using the file name as df name
for (i in unique(temp)){
  data <- h5read(file = i, name = "data") # ED data
  frame <- h5read(file = i, name = "frame") # Frame numbers
  ED <- data.frame(frames = frame$frame,
                   speed.kph.ED = round(data$VDS_Veh_Speed*1.46667*0.3048*3.6,2),
                   pedal_pos = data$CFS_Accelerator_Pedal_Position)#fps
  df <- h5read(file = i, name = "data/VDS_Chassis_CG_Position")
  df <- as.data.frame(df)
  colnames(df) <- c("y", "x", "z")
  df$speed <- ED$speed.kph.ED
  df$pedal_pos <- ED$pedal_pos
  df$file.ID <- i
  assign(i, df)
}
Now, because I have all the files in the Global environment, I removed the extra objects and only kept the new dfs:
# Remove extra objects
rm(data, df, ED, frame, i, temp)
Finally, I made a list of the dfs in the environment and then created a single data frame:
DF_obj <- lapply(ls(), get)
fdc <- do.call("rbind", DF_obj)
This works for me. But, as mentioned in the comments, assign() should be avoided. Also, I have to manually use rm(), without which this code won't work. Is there any way to avoid assign() in this context?
If you need the data files, here is the link to the 2 mentioned above: https://1drv.ms/f/s!AsMFpkDhWcnw6g7StJp9dzZ-nCr4
The answer is basically the same as your code, but with a couple of minor changes. We just use a list and do normal assignment to elements of the list rather than using assign() to create data frames in your global environment. This avoids potential bugs and name clashes, and removes the need for extensive clean-up.
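As a minimal illustration of the difference, here is a sketch with toy values instead of the HDF5 data (the names `results` and `nm` are hypothetical and not part of the original code). Indexing a list with a character string both stores the value and names the element, so nothing is created in the global environment except the list itself:

```r
# Toy values only; `results` and `nm` are made-up names for illustration
results <- list()
for (nm in c("a.hdf5", "b.hdf5")) {
  # Store each result as a named list element instead of calling assign(nm, ...)
  results[[nm]] <- data.frame(n = nchar(nm))
}
names(results)  # the element names come from the character index used above
```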
library(rhdf5) # provides h5read()
temp = list.files(pattern="*.hdf5")
df_list = list() # initialize a list
# Read all files into a list of data frames
for (i in unique(temp)){
  data <- h5read(file = i, name = "data") # ED data
  frame <- h5read(file = i, name = "frame") # Frame numbers
  ED <- data.frame(frames = frame$frame,
                   speed.kph.ED = round(data$VDS_Veh_Speed*1.46667*0.3048*3.6,2),
                   pedal_pos = data$CFS_Accelerator_Pedal_Position)#fps
  df <- h5read(file = i, name = "data/VDS_Chassis_CG_Position")
  df <- as.data.frame(df)
  colnames(df) <- c("y", "x", "z")
  df$speed <- ED$speed.kph.ED
  df$pedal_pos <- ED$pedal_pos
  # assign to the list. We can take care of the id cols automatically
  df_list[[i]] <- df
}
# name the list elements (not df); indexing by file name above already sets these
names(df_list) <- unique(temp)
fdc <- data.table::rbindlist(df_list, idcol = "file.ID")
Using data.table::rbindlist will be faster than using do.call(rbind), and it takes care of the ID column for us based on the names of the list.
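If data.table is not available, the same named-list pattern should also work in base R. The following is only a sketch under stated assumptions: small made-up data frames stand in for the per-file results, and `toy_list`, `bind_with_id` and `fdc_toy` are hypothetical names, not part of the original code.

```r
# Toy stand-ins for the per-file data frames (values are made up for illustration)
toy_list <- list(
  "Cars_20160601_01.hdf5" = data.frame(x = c(1, 2), speed = c(50.0, 51.5)),
  "Cars_20160601_02.hdf5" = data.frame(x = c(3, 4, 5), speed = c(60.0, 61.0, 62.5))
)

# Hypothetical helper: tag each data frame with its list name, then row-bind
bind_with_id <- function(dfs, idcol = "file.ID") {
  tagged <- Map(function(d, id) {
    d[[idcol]] <- rep(id, nrow(d))            # add the ID column
    d[c(idcol, setdiff(names(d), idcol))]     # move the ID column to the front
  }, dfs, names(dfs))
  out <- do.call(rbind, tagged)
  rownames(out) <- NULL  # drop the "<name>.1", "<name>.2" row names from rbind
  out
}

fdc_toy <- bind_with_id(toy_list)
```

The result has one row per input row, with the file name in the first column, which mirrors what the idcol argument of rbindlist produces. For 100 large files the data.table version is still likely to be the faster choice.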