I have a number of chromatograms saved as .csv files in a folder that look something like this:
time <- c(0.001575, 0.008775, 0.015975, 0.023175, 0.030375, 0.037575, 0.044775, 0.051975, 0.059175, 0.066375, 0.073575, 0.080776, 0.087976, 0.095176, 0.102376, 0.109576, 0.116776, 0.123976, 0.131176, 0.138376, 0.145576, 0.152776, 0.159976, 0.167176, 0.174376, 0.181576, 0.188776, 0.195976, 0.203176)
RID <- c(67.36, 66.39, 65.39, 64.41, 63.52, 62.76, 62.16, 61.76, 61.54, 61.53, 61.7, 62.05, 62.52, 63.09, 63.71, 64.33, 64.92, 65.46, 65.93, 66.32, 66.63, 66.87, 67.05, 67.18, 67.27, 67.32, 67.35, 67.37, 67.38)
dd <- data.frame(time, RID)
What I need to do, and am struggling to work out how, is to "reduce" the resolution of the dataset by averaging the data in bins of a given width, e.g. 0.05, turning that data frame into something like:
time | RID |
---|---|
0 | average of the RID data between time 0 and 0.05 |
0.05 | average of the RID data between time 0.05 and 0.1 |
0.1 | average of the RID data between time 0.1 and 0.15 |
and so on.
The only thing I know of that comes close to what I want is aggregate, but that would mean first creating a dummy time column with the times truncated to the desired resolution. It feels like there should be a more readily available solution, especially because I have an entire folder of csv files and need to automate the whole process for future studies.
For a single file
# Define bin width
bin_width <- 0.05
# Create bins using cut()
dd$bin <- cut(dd$time,
              breaks = seq(0, ceiling(max(dd$time) / bin_width) * bin_width, by = bin_width),
              include.lowest = TRUE, right = FALSE)
# Calculate means for each bin
result <- aggregate(RID ~ bin, data = dd, mean)
# Extract the lower bound of each bin as the new time
result$time <- as.numeric(sub("\\[([^,]*),.*", "\\1", result$bin))
result <- result[, c("time", "RID")]
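Parsing the bin start back out of the factor label works, but it depends on how `cut()` formats its labels. A minimal alternative sketch, using the example data from the question, computes the lower edge of each bin arithmetically with `floor()` and aggregates on that numeric column directly. The column name `bin_start` is only an illustrative choice, and the sketch assumes times are non-negative.

```r
# Example data from the question
time <- c(0.001575, 0.008775, 0.015975, 0.023175, 0.030375, 0.037575, 0.044775,
          0.051975, 0.059175, 0.066375, 0.073575, 0.080776, 0.087976, 0.095176,
          0.102376, 0.109576, 0.116776, 0.123976, 0.131176, 0.138376, 0.145576,
          0.152776, 0.159976, 0.167176, 0.174376, 0.181576, 0.188776, 0.195976,
          0.203176)
RID <- c(67.36, 66.39, 65.39, 64.41, 63.52, 62.76, 62.16, 61.76, 61.54, 61.53,
         61.7, 62.05, 62.52, 63.09, 63.71, 64.33, 64.92, 65.46, 65.93, 66.32,
         66.63, 66.87, 67.05, 67.18, 67.27, 67.32, 67.35, 67.37, 67.38)
dd <- data.frame(time, RID)

bin_width <- 0.05

# Lower edge of the bin each time point falls in, e.g. 0.066375 -> 0.05.
# round() removes floating-point noise such as 0.15000000000000002.
# Caveat: a time lying exactly on a bin edge may land in either neighbouring
# bin because of floating-point division.
dd$bin_start <- round(floor(dd$time / bin_width) * bin_width, 10)

# Average RID within each bin; the grouping column is already numeric
result <- aggregate(RID ~ bin_start, data = dd, FUN = mean)
names(result)[names(result) == "bin_start"] <- "time"
result
```

For this data the result has one row per occupied bin, with `time` holding the bin's lower edge, so no regular expression is needed to recover it.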
For multiple files
# Set working directory to your folder (or specify full path)
setwd("path/to/your/folder")
# Define bin width
bin_width <- 0.05
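# List all CSV files (pattern is a regular expression, not a glob)
files <- list.files(pattern = "\\.csv$")
# Skip outputs from a previous run so they are not binned again
files <- files[!grepl("_reduced\\.csv$", files)]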
# Process each file
for (file in files) {
  # Read the CSV
  dd <- read.csv(file)
  # Create bins
  dd$bin <- cut(dd$time,
                breaks = seq(0, ceiling(max(dd$time) / bin_width) * bin_width, by = bin_width),
                include.lowest = TRUE, right = FALSE)
  # Calculate means
  result <- aggregate(RID ~ bin, data = dd, mean)
  # Extract the lower bound of each bin as the new time
  result$time <- as.numeric(sub("\\[([^,]*),.*", "\\1", result$bin))
  result <- result[, c("time", "RID")]
  # Save the result (e.g., append "_reduced" to the filename);
  # the dot is escaped and anchored so only the extension is replaced
  output_file <- sub("\\.csv$", "_reduced.csv", file)
  write.csv(result, output_file, row.names = FALSE)
  cat("Processed:", file, "\n")
}
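Since the goal is to reuse this for future studies, the loop can also be wrapped in functions so that the folder and bin width become arguments and `setwd()` is not needed. The following is a sketch under a few assumptions: `bin_chromatogram()` and `reduce_folder()` are hypothetical names, every file is assumed to have `time` and `RID` columns with non-negative times, and `findInterval()` is swapped in for `cut()` so the bin index stays numeric.

```r
# Hypothetical helper: average RID in fixed-width time bins for one data frame
bin_chromatogram <- function(dd, bin_width = 0.05) {
  breaks <- seq(0, ceiling(max(dd$time) / bin_width) * bin_width, by = bin_width)
  # Index of the bin each time falls into; the last break is treated as closed
  idx <- findInterval(dd$time, breaks, rightmost.closed = TRUE)
  out <- aggregate(RID ~ bin, data = data.frame(bin = idx, RID = dd$RID), FUN = mean)
  # Report each bin by its lower edge
  data.frame(time = breaks[out$bin], RID = out$RID)
}

# Hypothetical driver: bin every CSV in in_dir and write "<name>_reduced.csv"
reduce_folder <- function(in_dir, out_dir = in_dir, bin_width = 0.05) {
  # Anchor the pattern so only names ending in ".csv" match
  files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
  # Skip outputs from a previous run
  files <- files[!grepl("_reduced\\.csv$", files)]
  for (f in files) {
    result <- bin_chromatogram(read.csv(f), bin_width)
    out_name <- sub("\\.csv$", "_reduced.csv", basename(f))
    write.csv(result, file.path(out_dir, out_name), row.names = FALSE)
  }
  invisible(files)
}
```

A call such as `reduce_folder("path/to/your/folder", bin_width = 0.05)` would then process the whole folder, and a different resolution only requires changing `bin_width`.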