I have a list of times and values that are not standardized; in other words, the times at which values are recorded are not consistent across the data. Some elements may have values assigned at t=1,3,5 while others have values assigned at t=2,4,6,8. How can I convert these values to a standard m x n matrix format? I would be okay with filling in the blanks with NA, but ideally I would like to impute the missing values.
Example:
set.seed(1)
ids <- 1:5
# create a time-based function that generates a new value based on previous value
age_fn <- function(prior_value, age) prior_value - 1.2*age
my_list <- list()
for(i in 1:length(ids))
{
  # how many records for this id?
  N <- sample(c(2:4), 1, replace=TRUE)
  # for each record, assign a time-step
  time_step <- sample(c(1:3), N-1, replace=TRUE) # minus 1 because first record is at t=0
  # define time when values are recorded
  t <- rep(0, times=N)
  for(n in 2:N)
  {
    t[n] <- t[n-1] + time_step[n-1]
  }
  # assign values recorded at each time
  value <- rep(100, times=N)
  for(n in 2:N)
  {
    value[n] <- age_fn(value[n-1], t[n])
  }
  my_list[[i]] <- list(ids[i], t, value)
}
As an example, the third element has two values while the fourth has four, ranging from t=0 to t=7:
> my_list
[[3]]
[[3]][[1]]
[1] 3
[[3]][[2]]
[1] 0 3
[[3]][[3]]
[1] 100.0 96.4
[[4]]
[[4]][[1]]
[1] 4
[[4]][[2]]
[1] 0 2 4 7
[[4]][[3]]
[1] 100.0 97.6 92.8 84.4
I would like this as a 2x8 matrix [element, t], with values (or NA) filled in where they don't exist in the non-standard format:
          0   1     2    3    4    5    6    7
values1 100 100 100.0 96.4 96.4 96.4 96.4 96.4
values2 100 100  97.6 97.6 92.8 92.8 92.8 84.4
Edit: added set.seed for reproducibility, plus example output.
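You could easily do:
# needs R >= 4.1 for |> and \(l)
m = max(unlist(lapply(my_list, `[[`, 2)))  # largest recorded time
v = rep(NA, m+1)                            # template with one slot per time 0..m
sapply(my_list, \(l) {
  v[l[[2]] + 1] = l[[3]]  # time t is stored at position t + 1
  zoo::na.locf(v)         # carry the last observation forward
}) |>
  t() |>
  `dimnames<-`(list(paste0('values', sapply(my_list, '[[', 1)), 0:m))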
          0     1     2    3    4    5    6    7
values1 100 100.0 100.0 96.4 96.4 96.4 96.4 96.4
values2 100 100.0  97.6 97.6 97.6 97.6 97.6 97.6
values3 100 100.0 100.0 96.4 96.4 96.4 96.4 96.4
values4 100 100.0  97.6 97.6 92.8 92.8 92.8 84.4
values5 100  98.8  96.4 92.8 92.8 92.8 92.8 92.8
Without zoo::na.locf():
          0    1    2    3    4  5  6    7
values1 100   NA   NA 96.4   NA NA NA   NA
values2 100   NA 97.6   NA   NA NA NA   NA
values3 100   NA   NA 96.4   NA NA NA   NA
values4 100   NA 97.6   NA 92.8 NA NA 84.4
values5 100 98.8 96.4 92.8   NA NA NA   NA
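If the gaps should be imputed rather than left as NA, base R's approx() is one option that avoids the zoo dependency. This is only a sketch, assuming each element is list(id, t, value) with strictly increasing times, as in the question. The two-element demo_list is copied by hand from elements 3 and 4 of the question's output, and fill_one is a hypothetical helper name, not part of any package.

```r
# Hand-copied from elements 3 and 4 shown in the question
demo_list <- list(
  list(3, c(0, 3), c(100, 96.4)),
  list(4, c(0, 2, 4, 7), c(100, 97.6, 92.8, 84.4))
)

# Common time grid: 0 up to the largest recorded time
m <- max(unlist(lapply(demo_list, `[[`, 2)))
grid <- 0:m

# Hypothetical helper: evaluate one element on the common grid.
# method = "constant" carries the last value forward (same idea as na.locf);
# method = "linear" interpolates between recorded points.
# rule = 2 repeats the last recorded value beyond the final recorded time.
fill_one <- function(el, method = "constant") {
  approx(x = el[[2]], y = el[[3]], xout = grid, method = method, rule = 2)$y
}

row_names <- paste0("values", sapply(demo_list, `[[`, 1))

locf_mat <- t(sapply(demo_list, fill_one))
dimnames(locf_mat) <- list(row_names, grid)

lin_mat <- t(sapply(demo_list, fill_one, method = "linear"))
dimnames(lin_mat) <- list(row_names, grid)
```

Here locf_mat should reproduce the carried-forward layout requested in the question, while lin_mat fills the gaps by straight-line interpolation. Which of the two counts as a sensible imputation depends on the data.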
Do lapply( ... ) |> do.call(what='rbind') instead, if you prefer lapply() (I do).
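Spelled out, the lapply() variant might look like the sketch below. To keep it runnable without zoo, it swaps zoo::na.locf() for a small base-R stand-in (the hypothetical locf helper), which assumes the first slot (t = 0) is never NA. demo_list is again hand-copied from elements 3 and 4 of the question.

```r
# Hand-copied from elements 3 and 4 shown in the question
demo_list <- list(
  list(3, c(0, 3), c(100, 96.4)),
  list(4, c(0, 2, 4, 7), c(100, 97.6, 92.8, 84.4))
)

m <- max(unlist(lapply(demo_list, `[[`, 2)))

# Base-R stand-in for zoo::na.locf(): for each position, find the index of the
# most recent non-NA entry. Assumes v[1] is not NA (every series starts at t = 0).
locf <- function(v) v[cummax(seq_along(v) * (!is.na(v)))]

rows <- lapply(demo_list, function(l) {
  v <- rep(NA_real_, m + 1)  # one slot per time 0..m
  v[l[[2]] + 1] <- l[[3]]    # time t is stored at position t + 1
  locf(v)
})

# Each list entry is already one row, so rbind needs no transpose
res <- do.call(what = "rbind", rows)
dimnames(res) <- list(paste0("values", sapply(demo_list, `[[`, 1)), 0:m)
res
```

Because lapply() always returns a list, the row-binding step is explicit and the t() needed after sapply() goes away.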
Assumptions: ts for each id, id. We can optimise if the list is very long.
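One possible direction for a long list, offered as a sketch rather than the optimisation the answer had in mind: fill a preallocated matrix in a single assignment using a two-column (row, column) index, then carry values forward one time column at a time, so the loop runs over times rather than over list elements. It assumes non-negative integer times and a recorded value at t = 0 for every element; demo_list is hand-copied from the question.

```r
# Hand-copied from elements 3 and 4 shown in the question
demo_list <- list(
  list(3, c(0, 3), c(100, 96.4)),
  list(4, c(0, 2, 4, 7), c(100, 97.6, 92.8, 84.4))
)

ts   <- lapply(demo_list, `[[`, 2)
vals <- lapply(demo_list, `[[`, 3)
m    <- max(unlist(ts))

# Preallocate the full element x time matrix with NA
out <- matrix(NA_real_, nrow = length(demo_list), ncol = m + 1)
dimnames(out) <- list(paste0("values", sapply(demo_list, `[[`, 1)), 0:m)

# Two-column matrix index: row = element position, column = t + 1
idx <- cbind(rep(seq_along(ts), lengths(ts)), unlist(ts) + 1)
out[idx] <- unlist(vals)

# Carry forward column by column; each step is vectorised over all rows
for (j in seq_len(ncol(out))[-1]) {
  gap <- is.na(out[, j])
  out[gap, j] <- out[gap, j - 1]
}
out
```

Skipping the carry-forward loop leaves the NA-filled version. Whether this is actually faster than the sapply() approach for a given list is worth measuring rather than assuming.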