r statistics longitudinal

Group-based trajectory modeling in R


I have a dataset consisting of 3 variables: an ID, an outcome variable Y, and a time variable t.

It is in long form with each subject having several rows of registrations with the ordinal variable Y (1-5) and time of registration with the first registration being 0 and the rest in months since first registration. The time points are different for each subject.

I want to group the subjects based on similar trajectories of Y over t.

I have tried the gbmt package but the results seem nonsensical. I am now trying the flexmix package but I can't figure out how to plot the trajectories.

I am fairly new to statistics, R, and programming, so any help is welcome. I'm open to other packages if flexmix is not right for this.

This is a small data sample with the same structure: https://wetransfer.com/downloads/6528f6249eaf6cc2e488193c58aed84b20231208151530/e90fa7

This is the code for the small dataset.

library(dplyr)
library(flexmix)
library(ggplot2)

#fit model
fit <- flexmix(Y ~ t+I(t^2) | ID, data = data, k = 3)
summary(fit)
clusters(fit)

#add groups in dataset
data$group <- clusters(fit)

#add the change of Y
data <- data %>%
  group_by(ID) %>%
  mutate(change_Y = Y - first(Y))

# Calculate mean change in Y for each group at each time point
mean_change_data <- data %>%
  group_by(group, t) %>%
  summarise(mean_change_Y = mean(change_Y, na.rm = TRUE))

# Plot the mean change in Y for each group over t

 ggplot(mean_change_data, aes(x = t, y = mean_change_Y, group = group, color = as.factor(group))) +
  stat_smooth(method = "loess", se = TRUE, alpha = 0.2) + 
  labs(title = "Mean Change in Y Over Time",
       x = "t",
       y = "Mean Change in Y") +
  scale_color_discrete(name = "Group")

This is the resulting plot: [plot image]


Solution

  • flexmix fits finite mixtures of regression models, while gbmt is built specifically for group-based trajectory modeling. I am not sure about the particular strengths of either package, but an analysis like this can usually be done with more common packages.

    Edit: I can see a couple of issues here. 1) You fit the model on the original 'data', but then plotted a transformed version of it: Y baselined against each subject's first value and then averaged per group and time point (why is that needed?). Those values are no longer on the scale the model was fitted to, so the plot does not show what the model estimated. 2) To plot the model fitted on the original data, try the code below. I removed 'stat_smooth' because it is not relevant here: it fits a separate loess smoother of its own rather than showing the flexmix fit.

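    # fit is the flexmix model from the question; points are coloured by assigned cluster
    ggplot(data, aes(x = t, y = Y)) +
      geom_point(aes(col = as.factor(fit@cluster))) +
      # one line per component; geom_abline draws only the intercept and the linear t term
      geom_abline(data = as.data.frame(t(parameters(fit))),
                  aes(intercept = `coef.(Intercept)`, slope = `coef.t`,
                      col = as.factor(seq_along(sigma))))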
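
    Because the model in the question also has a t^2 term, straight lines cannot show the full fitted shape. The sketch below is one possible way to draw the quadratic curve of each component instead. It runs on simulated data (the data frame sim and its values are made up for illustration, not the asker's data), and it assumes the coefficient row names that parameters() returns for this formula.

    ```r
    library(flexmix)
    library(ggplot2)

    # Simulated stand-in data (hypothetical): 30 subjects, irregular time points,
    # three underlying trajectory shapes.
    set.seed(1)
    sim <- do.call(rbind, lapply(1:30, function(id) {
      t <- c(0, sort(runif(5, 1, 24)))
      g <- (id - 1) %% 3 + 1
      mu <- switch(g, 1 + 0.15 * t, 4 - 0.10 * t, 3 + 0 * t)
      data.frame(ID = id, t = t, Y = mu + rnorm(length(t), sd = 0.3))
    }))

    fit <- flexmix(Y ~ t + I(t^2) | ID, data = sim, k = 3)

    # One column of coefficients per component that survived the EM fit
    # (flexmix can return fewer than k components).
    p <- as.matrix(parameters(fit))

    # Evaluate each component's quadratic on a common time grid.
    grid <- seq(0, max(sim$t), length.out = 50)
    curves <- do.call(rbind, lapply(seq_len(ncol(p)), function(j) {
      data.frame(group = j, t = grid,
                 Y = p["coef.(Intercept)", j] + p["coef.t", j] * grid +
                     p["coef.I(t^2)", j] * grid^2)
    }))

    # Cluster assignment for every row of the data.
    sim$group <- clusters(fit)

    plt <- ggplot(sim, aes(t, Y, colour = factor(group))) +
      geom_point(alpha = 0.5) +
      geom_line(data = curves) +
      labs(colour = "Group")
    print(plt)
    ```

    The same pattern should carry over to the real data by replacing sim with the original, untransformed data frame.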
    

    Still, you should first standardize your trajectories, meaning put them on consistent time intervals (pick one, e.g., 1 month, since t is in months). You can use complete() from the tidyr package to fill in the time grid and na.approx() from the zoo package to interpolate Y within each 'ID'.

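    data <- data %>%
      group_by(ID) %>%
      tidyr::complete(t = seq(min(t), max(t), by = 1)) %>%  # regular monthly grid per subject
      arrange(t, .by_group = TRUE) %>%
      mutate(Y = zoo::na.approx(Y, x = t, na.rm = FALSE)) %>%  # linear interpolation over t
      ungroup()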
    

    Optionally, you can align all the trajectories by extending the last value of the shorter ones out to the length of the longest one. You can then calculate common summary metrics such as overall length, duration, and velocity.
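
    As a rough sketch of that alignment and summary step, assuming the data are already on a monthly grid: the toy data frame and the metric names (duration, net_change, path_length, mean_rate) are made up for illustration, and "velocity" is read here as net change per month.

    ```r
    library(dplyr)
    library(tidyr)

    # Hypothetical toy data: two subjects followed for different lengths of time.
    toy <- data.frame(
      ID = c(1, 1, 1, 2, 2, 2, 2, 2),
      t  = c(0, 1, 2, 0, 1, 2, 3, 4),
      Y  = c(1, 2, 3, 4, 4, 3, 2, 2)
    )

    # Extend every subject to the longest follow-up and carry the last Y forward.
    aligned <- toy %>%
      complete(ID, t = seq(0, max(t), by = 1)) %>%
      arrange(ID, t) %>%
      group_by(ID) %>%
      fill(Y, .direction = "down") %>%
      ungroup()

    # Per-subject summaries computed on the original (unextended) data.
    metrics <- toy %>%
      arrange(ID, t) %>%
      group_by(ID) %>%
      summarise(
        duration    = max(t) - min(t),       # months of follow-up
        net_change  = last(Y) - first(Y),    # end value minus start value
        path_length = sum(abs(diff(Y))),     # total movement in Y
        .groups = "drop"
      ) %>%
      mutate(mean_rate = net_change / duration)  # change in Y per month
    ```

    Computing the metrics before extending avoids counting the carried-forward values as real observations.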

    Your case does not seem to require finite mixture models. You can fit polynomial regressions with lmer() or glmer() from the lme4 package (or glm() from base R's stats package), which are much more widely used and better supported. See this book for the method: https://www.researchgate.net/publication/290152385_Growth_Curve_Analysis_and_Visualization_Using_R
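
    A minimal sketch of such a polynomial growth curve with lme4 is shown below. It uses simulated data (the sim data frame and its values are hypothetical), includes only a random intercept per subject, and treats Y as continuous, which is an assumption for a 1-5 ordinal scale.

    ```r
    library(lme4)

    # Simulated stand-in data (hypothetical): 40 subjects, irregular visits,
    # subject-specific intercepts around a common curved trend.
    set.seed(42)
    sim <- do.call(rbind, lapply(1:40, function(id) {
      t <- c(0, sort(runif(5, 1, 24)))
      b0 <- rnorm(1, sd = 0.5)
      data.frame(ID = id, t = t,
                 Y = 2 + b0 + 0.20 * t - 0.005 * t^2 + rnorm(length(t), sd = 0.3))
    }))

    # Quadratic growth curve with a random intercept per subject.
    m <- lmer(Y ~ t + I(t^2) + (1 | ID), data = sim)

    # Population-level intercept, slope and curvature.
    print(fixef(m))

    # Fitted values: population curve (no random effects) and subject-specific curves.
    sim$fitted_pop  <- predict(m, re.form = NA)
    sim$fitted_subj <- predict(m)
    ```

    Unlike flexmix, this does not assign subjects to groups. It describes one average trajectory plus subject-level deviations, so it only fits the goal if discrete groups are not actually required.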