r automation average

Averaging temporal series with fixed resolution in R


I have a number of chromatograms saved as .csv files in a folder that look something like

time <- c(0.001575, 0.008775, 0.015975, 0.023175, 0.030375, 0.037575, 0.044775, 0.051975, 0.059175, 0.066375, 0.073575, 0.080776, 0.087976, 0.095176, 0.102376, 0.109576, 0.116776, 0.123976, 0.131176, 0.138376, 0.145576, 0.152776, 0.159976, 0.167176, 0.174376, 0.181576, 0.188776, 0.195976, 0.203176)

RID <- c(67.36, 66.39, 65.39, 64.41, 63.52, 62.76, 62.16,61.76, 61.54,61.53,61.7,62.05,62.52, 63.09, 63.71, 64.33, 64.92, 65.46, 65.93, 66.32, 66.63, 66.87, 67.05, 67.18, 67.27, 67.32, 67.35, 67.37, 67.38)

dd<- data.frame(time, RID)

What I need to do, and am struggling to work out how to do, is reduce the resolution of the dataset by averaging the data in bins of a fixed width, e.g. 0.05, turning that data frame into something like

time RID
0 average of the RID data between time 0 and 0.05
0.05 average of the RID data between time 0.05 and 0.1
0.1 average of the RID data between time 0.1 and 0.15

and so on.

The closest thing I know of is aggregate, but that would mean first creating a dummy time column with the times truncated to the desired resolution. It feels like there should be a more direct solution, especially because I have an entire folder of csv files and need to automate the whole process for future studies.


Solution

  • For a single file

    # Define bin width
    bin_width <- 0.05
    
    # Create bins using cut()
    dd$bin <- cut(dd$time, breaks = seq(0, ceiling(max(dd$time)/bin_width)*bin_width, bin_width), 
                  include.lowest = TRUE, right = FALSE)
    
    # Calculate means for each bin
    result <- aggregate(RID ~ bin, data = dd, mean)
    
    # Extract the lower bound of each bin as the new time
    result$time <- as.numeric(sub("\\[([^,]*),.*", "\\1", result$bin))
    result <- result[, c("time", "RID")]
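
    A possible alternative, sketched here under the assumptions that the bins start at 0 and that dd has the time and RID columns from the question: compute each bin's lower bound arithmetically with floor() instead of parsing the interval labels that cut() produces (those labels are rounded according to cut()'s dig.lab argument). The column name bin_start is only illustrative.

    ```r
    # Sample data from the question
    time <- c(0.001575, 0.008775, 0.015975, 0.023175, 0.030375, 0.037575,
              0.044775, 0.051975, 0.059175, 0.066375, 0.073575, 0.080776,
              0.087976, 0.095176, 0.102376, 0.109576, 0.116776, 0.123976,
              0.131176, 0.138376, 0.145576, 0.152776, 0.159976, 0.167176,
              0.174376, 0.181576, 0.188776, 0.195976, 0.203176)
    RID <- c(67.36, 66.39, 65.39, 64.41, 63.52, 62.76, 62.16, 61.76, 61.54,
             61.53, 61.7, 62.05, 62.52, 63.09, 63.71, 64.33, 64.92, 65.46,
             65.93, 66.32, 66.63, 66.87, 67.05, 67.18, 67.27, 67.32, 67.35,
             67.37, 67.38)
    dd <- data.frame(time, RID)

    bin_width <- 0.05

    # Lower bound of the bin each time point falls into (bins assumed to start at 0)
    dd$bin_start <- floor(dd$time / bin_width) * bin_width

    # Mean RID per bin; the grouping column becomes the new time axis
    result <- aggregate(RID ~ bin_start, data = dd, FUN = mean)
    names(result)[names(result) == "bin_start"] <- "time"
    ```

    As with cut(), a time that lies exactly on a bin boundary can end up in the neighbouring bin because of floating-point rounding, so boundary values deserve a quick check.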
    

  • For multiple files

    # Set working directory to your folder (or specify full path)
    setwd("path/to/your/folder")
    
    # Define bin width
    bin_width <- 0.05
    
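    # List all CSV files (pattern is a regular expression, not a glob),
    # skipping any output written by a previous run
    files <- list.files(pattern = "\\.csv$")
    files <- files[!grepl("_reduced\\.csv$", files)]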
    
    # Process each file
    for (file in files) {
      # Read the CSV
      dd <- read.csv(file)
      
      # Create bins
      dd$bin <- cut(dd$time, breaks = seq(0, ceiling(max(dd$time)/bin_width)*bin_width, bin_width), 
                    include.lowest = TRUE, right = FALSE)
      
      # Calculate means
      result <- aggregate(RID ~ bin, data = dd, mean)
      result$time <- as.numeric(sub("\\[([^,]*),.*", "\\1", result$bin))
      result <- result[, c("time", "RID")]
      
      # Save the result (e.g., append "_reduced" to the filename)
      output_file <- sub("\\.csv$", "_reduced.csv", file)
      write.csv(result, output_file, row.names = FALSE)
      
      cat("Processed:", file, "\n")
    }
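
The loop above could also be wrapped in functions so the same code can be reused across studies. The following is a rough sketch rather than a finished pipeline: the function names bin_average and reduce_folder, the separate output folder, and the assumption that every file has time and RID columns are all illustrative. Writing to a separate folder keeps reduced files from being picked up as input on a later run.

```r
# Hypothetical helper: average RID in fixed-width time bins (bins start at 0)
bin_average <- function(dd, bin_width) {
  dd$bin_start <- floor(dd$time / bin_width) * bin_width
  result <- aggregate(RID ~ bin_start, data = dd, FUN = mean)
  names(result)[names(result) == "bin_start"] <- "time"
  result
}

# Hypothetical driver: reduce every CSV in input_dir and write it to output_dir
reduce_folder <- function(input_dir, output_dir, bin_width = 0.05) {
  dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(input_dir, pattern = "\\.csv$", full.names = TRUE)
  for (file in files) {
    result <- bin_average(read.csv(file), bin_width)
    out_name <- sub("\\.csv$", "_reduced.csv", basename(file))
    write.csv(result, file.path(output_dir, out_name), row.names = FALSE)
  }
  invisible(files)
}

# Example call (paths are placeholders):
# reduce_folder("path/to/your/folder", "path/to/your/folder/reduced", bin_width = 0.05)
```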