
How to find a peak and 2 valleys in my data?


[image: plot of the dataset]

Hi, I have a dataset (and corresponding plot) that looks a bit like this (it's a series of measurements over time). As you can see, it's full of noise (and this has actually already been "smoothed" with a rolling average).

I am trying to achieve 2 things:

  1. Find the first (and highest) peak and the 2 valleys around it. Only this one peak, not all peaks in the curve.

  2. Fit a line from the 1st valley to the peak, and from the peak to the 2nd valley, see example below (I think I have an idea of how to do this, so it's less important)

[image: example of the desired line fit]

I've tried some methods found online (like find_peaks from ggpmisc), but I've only been able to find all peaks and valleys, while I only need this specific one (that is the only true one).

Do you guys have any suggestions?

EDIT

If anyone's interested, I managed to do it by using pracma::findpeaks (it can also find valleys by putting a - sign before the variable of interest). I added TRUE/FALSE columns to track where peaks and valleys are to make it easier to plot these points later.

library(pracma)

# Find the single highest peak (column 2 of the result holds the row index)
peaks <- findpeaks(data$L.MEAN, npeaks = 1, minpeakdistance = 100, sortstr = TRUE)
data$is_peak <- FALSE
data$is_peak[peaks[, 2]] <- TRUE

# Find valleys by running findpeaks on the negated signal
valleys <- findpeaks(-data$L.MEAN, npeaks = 2, minpeakdistance = 100)
data$is_valley <- FALSE
data$is_valley[valleys[, 2]] <- TRUE
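
One thing to watch: npeaks = 2 returns two valleys from the whole series, and those are not guaranteed to be the two on either side of the peak. Below is a minimal sketch of one way to narrow the candidates, assuming you already have the peak's row index and a vector of candidate valley indices. The helper name surrounding_valleys and the example indices are made up for illustration; they are not part of pracma.

```r
# Hypothetical helper: pick the nearest candidate valley on each side of the peak.
surrounding_valleys <- function(valley_idx, peak_idx) {
  before <- valley_idx[valley_idx < peak_idx]
  after  <- valley_idx[valley_idx > peak_idx]
  c(
    before = if (length(before)) max(before) else NA_integer_,
    after  = if (length(after))  min(after)  else NA_integer_
  )
}

# Made-up indices, standing in for the index column of findpeaks(-data$L.MEAN)
valley_idx <- c(40L, 180L, 310L, 520L)
peak_idx   <- 250L
res <- surrounding_valleys(valley_idx, peak_idx)
res
```

In practice valley_idx would come from a findpeaks call without npeaks, so that every candidate valley is available to choose from.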

Solution

  • I'll derive some data to analyze:

    dat <- data.frame(x = seq(-pi, 6*pi, by=0.01))
    dat$y <- sin(dat$x) / ifelse(abs(dat$x) < 1e-9, 1, sqrt(abs(dat$x)))
    library(ggplot2)
    ggplot(dat, aes(x, y)) + geom_line()
    

    [image: dampened sine wave]

    Finding the max is easy with which.max:

    ymaxi <- which.max(dat$y)
    ymaxi
    # [1] 432
    dat$y[ymaxi + -1:1]
    # [1] 0.8512233 0.8512383 0.8511839
    
    ggplot(dat, aes(x, y)) +
      geom_line() +
      geom_point(data = ~ .[ymaxi,], color = "red")
    

    [image: dampened sine wave, the max value highlighted with a red dot]

    Finding the preceding/following valleys is a skosh more work:

    ymini1 <- ymaxi + 1L - which(diff(rev(dat$y[1:ymaxi])) > 0)[1]
    dat$y[ymini1 + -2:2]
    # [1] -0.8511520 -0.8512284 -0.8512356 -0.8511732 -0.8510408
    ymini2 <- which(diff(dat$y[-(1:ymaxi)]) > 0)[1] + ymaxi
    dat$y[ymini2 + -1:1]
    # [1] -0.4633072 -0.4633109 -0.4632688
    
    ggplot(dat, aes(x, y)) + geom_line() + geom_point(data = ~ .[c(ymini1, ymaxi, ymini2),], color = "red")
    

    [image: same dampened sine wave with one peak and two surrounding valleys highlighted]
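
With the three indices in hand, the two lines from the question (first valley to peak, peak to second valley) can be fitted segment by segment. The following is only a sketch: it rebuilds the sample data (a dampened sine wave starting at -pi), recomputes the indices, and fits an ordinary least-squares line to each segment with lm(). Whether a least-squares fit or a straight line through the endpoints is what's wanted is an assumption, and the names fit_up and fit_down are made up here.

```r
# Sample data: dampened sine wave
dat <- data.frame(x = seq(-pi, 6*pi, by = 0.01))
dat$y <- sin(dat$x) / ifelse(abs(dat$x) < 1e-9, 1, sqrt(abs(dat$x)))

# Peak and the valleys on either side of it
ymaxi  <- which.max(dat$y)
ymini1 <- ymaxi + 1L - which(diff(rev(dat$y[1:ymaxi])) > 0)[1]
ymini2 <- which(diff(dat$y[-(1:ymaxi)]) > 0)[1] + ymaxi

# Least-squares line over each segment
fit_up   <- lm(y ~ x, data = dat[ymini1:ymaxi, ])
fit_down <- lm(y ~ x, data = dat[ymaxi:ymini2, ])

# Slopes of the rising and falling segments
slope_up   <- coef(fit_up)[["x"]]
slope_down <- coef(fit_down)[["x"]]
c(up = slope_up, down = slope_down)
```

The fitted values, fitted(fit_up) and fitted(fit_down), line up with dat$x[ymini1:ymaxi] and dat$x[ymaxi:ymini2], so they can be overlaid on the plot as two extra line layers.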

    I'm defining "valley" as the point where the gradient (diff(.)) changes from negative to positive. With noisy data you may need some tolerance, so that the change has to hold for a number of points before it counts; this skips false valleys. There are many heuristics for this, and the right one depends mostly on the context of the data and your intent. For instance, you can look for the first difference above a certain threshold by changing > 0 to > 0.01 or similar, but this can miss a real valley if the slope after it is positive but very close to flat. Or you could look for n consecutive positive differences, which is a rolling-window problem and well served by zoo::rollapply, data.table::frollapply, or many other window functions. You could also use run-length encoding for this (R's rle function), perhaps something like:

    diffs <- diff(dat$y[-(1:ymaxi)])
    r <- rle(diffs > 0)
    r
    # Run Length Encoding
    #   lengths: int [1:6] 343 318 316 315 315 160
    #   values : logi [1:6] FALSE TRUE FALSE TRUE FALSE TRUE
    r$values[r$lengths < 3 & r$values] <- FALSE
    which(inverse.rle(r))[1] + ymaxi
    # [1] 776
    

    which happens to be the same as above, but would "ignore" positive-gradients that are only 1 or 2 points before going negative again.
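
The n-consecutive-positives idea can also be written as a rolling window. Here is a minimal base-R sketch, assuming n = 3 and using stats::embed() to build the windows in place of zoo::rollapply or data.table::frollapply (a substitution made here only to avoid extra packages). The names n, windows, all_up and valley2 are illustrative.

```r
# Sample data: dampened sine wave, and the index of its highest peak
dat <- data.frame(x = seq(-pi, 6*pi, by = 0.01))
dat$y <- sin(dat$x) / ifelse(abs(dat$x) < 1e-9, 1, sqrt(abs(dat$x)))
ymaxi <- which.max(dat$y)

n <- 3L                                  # assumed run length; tune to the noise level
diffs <- diff(dat$y[-(1:ymaxi)])         # gradients after the peak

# Each row of embed() holds n consecutive values (latest first);
# row i covers diffs[i], ..., diffs[i + n - 1]
windows <- embed(as.numeric(diffs > 0), n)
all_up  <- rowSums(windows) == n         # TRUE where all n gradients are positive

# The first all-positive window marks the start of a sustained rise
valley2 <- which(all_up)[1] + ymaxi
valley2
```

On this smooth sample it returns the same index as the plain diff(.) > 0 check. On noisy data the two differ whenever an upward blip shorter than n points occurs before the real valley.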