rdataframetime-series

Extract the timestamp of specific item from data frame/table


I have a data set composed from 2 columns and around 12K rows. It have this form

timestamp  track 
0.006       A
0.185       R
0.210       A
0.221  R
0.285  A
0.387  R
0.405  R
0.412  R
0.436  A
0.458  A
0.482  A

So what I want is first to extract all the timestamp where my track is equal to A, which I did easily using so base R function

track_data = subset(df, df$track == 'A')
track_time = track_data$V1

now I want to extract the timestamp of the last track = R just before the last track = An if any R is founded if not to put a timestamp = 0

so in my case for example it would be:

(0, 0.185, 0.221, 0.412, 0.412, 0.412, ...)

Obviously I should have a vector that contain as many element as the number of An in track vector.

I have tried to do a loop but the result is not compatible at all with what I am looking.

for (i in 1 : length(df$track)) {
 if (df$track[i] == 'A')
  x = last(df$track[i] == 'R')
 vect = c(vect,x)
 }

Solution

  • base R

    We can create a vector of just-R timestamps, then use zoo::na.locf (last observation carried forward) to cascade it into the A rows.

    ifelse(dat$track == "R", dat$timestamp, NA)
    #  [1]    NA 0.185    NA 0.221    NA 0.387 0.405 0.412    NA    NA    NA
    zoo::na.locf(ifelse(dat$track == "R", dat$timestamp, NA), na.rm = FALSE)
    #  [1]    NA 0.185 0.185 0.221 0.221 0.387 0.405 0.412 0.412 0.412 0.412
    dat$newcol <- zoo::na.locf(ifelse(dat$track == "R", dat$timestamp, NA), na.rm = FALSE)
    dat$newcol <- ifelse(is.na(dat$newcol), 0, dat$newcol)
    dat
    #    timestamp track newcol
    # 1      0.006     A  0.000
    # 2      0.185     R  0.185
    # 3      0.210     A  0.185
    # 4      0.221     R  0.221
    # 5      0.285     A  0.221
    # 6      0.387     R  0.387
    # 7      0.405     R  0.405
    # 8      0.412     R  0.412
    # 9      0.436     A  0.412
    # 10     0.458     A  0.412
    # 11     0.482     A  0.412
    subset(dat, track == "A")
    #    timestamp track newcol
    # 1      0.006     A  0.000
    # 3      0.210     A  0.185
    # 5      0.285     A  0.221
    # 9      0.436     A  0.412
    # 10     0.458     A  0.412
    # 11     0.482     A  0.412
    

    dplyr

    library(dplyr)
    dat %>%
      mutate(
        newcol = coalesce(
          zoo::na.locf(if_else(track == "R", timestamp, NA_real_), na.rm = FALSE),
          0)
      ) %>%
      filter(track == "A")
    

    coalesce is a function that returns the first non-NA of the arguments, so this replaces our ifelse(is.na(newcol),...) from base R.

    data.table

    library(data.table)
    datDT[, newcol := fcoalesce(
        nafill(fifelse(track == "R", timestamp, NA_real_), type = "locf"),
        0)]
    

    data.table has its own nafill that replaces the need for zoo::na.locf, as well as its own fifelse and fcoalesce.