I have a data set composed from 2 columns and around 12K rows. It have this form
timestamp track
0.006 A
0.185 R
0.210 A
0.221 R
0.285 A
0.387 R
0.405 R
0.412 R
0.436 A
0.458 A
0.482 A
So what I want is first to extract all the timestamp where my track is equal to A, which I did easily using so base R function
track_data = subset(df, df$track == 'A')
track_time = track_data$V1
now I want to extract the timestamp of the last track = R just before the last track = An if any R is founded if not to put a timestamp = 0
so in my case for example it would be:
(0, 0.185, 0.221, 0.412, 0.412, 0.412, ...)
Obviously I should have a vector that contain as many element as the number of An in track vector.
I have tried to do a loop but the result is not compatible at all with what I am looking.
for (i in 1 : length(df$track)) {
if (df$track[i] == 'A')
x = last(df$track[i] == 'R')
vect = c(vect,x)
}
We can create a vector of just-R timestamps, then use zoo::na.locf
(last observation carried forward) to cascade it into the A rows.
ifelse(dat$track == "R", dat$timestamp, NA)
# [1] NA 0.185 NA 0.221 NA 0.387 0.405 0.412 NA NA NA
zoo::na.locf(ifelse(dat$track == "R", dat$timestamp, NA), na.rm = FALSE)
# [1] NA 0.185 0.185 0.221 0.221 0.387 0.405 0.412 0.412 0.412 0.412
dat$newcol <- zoo::na.locf(ifelse(dat$track == "R", dat$timestamp, NA), na.rm = FALSE)
dat$newcol <- ifelse(is.na(dat$newcol), 0, dat$newcol)
dat
# timestamp track newcol
# 1 0.006 A 0.000
# 2 0.185 R 0.185
# 3 0.210 A 0.185
# 4 0.221 R 0.221
# 5 0.285 A 0.221
# 6 0.387 R 0.387
# 7 0.405 R 0.405
# 8 0.412 R 0.412
# 9 0.436 A 0.412
# 10 0.458 A 0.412
# 11 0.482 A 0.412
subset(dat, track == "A")
# timestamp track newcol
# 1 0.006 A 0.000
# 3 0.210 A 0.185
# 5 0.285 A 0.221
# 9 0.436 A 0.412
# 10 0.458 A 0.412
# 11 0.482 A 0.412
library(dplyr)
dat %>%
mutate(
newcol = coalesce(
zoo::na.locf(if_else(track == "R", timestamp, NA_real_), na.rm = FALSE),
0)
) %>%
filter(track == "A")
coalesce
is a function that returns the first non-NA
of the arguments, so this replaces our ifelse(is.na(newcol),...)
from base R.
library(data.table)
datDT[, newcol := fcoalesce(
nafill(fifelse(track == "R", timestamp, NA_real_), type = "locf"),
0)]
data.table
has its own nafill
that replaces the need for zoo::na.locf
, as well as its own fifelse
and fcoalesce
.