dplyrgroup-bylinear-regressionancova

How do perform ANCOVA within one group of my dataset?


I am having trouble figuring out an appropriate way to perform subgroup analysis using ANCOVA. For my analysis, I want to compare the 6-month depression scores (six_PHQ) between two groups: intervention versus control (Study.Arm) according to marital status (Marital_Status). I am also controlling for baseline score (base_PHQ) as a covariate. These are the columns in the data frame PHQ1.

  dput(PHQ1[1:4, ])
structure(list(Patient = c("ASSC-011-JB", "ASSC-013-ME", "ASSC-019-JM", 
"ASSC-053-RH"), **Study.Arm = c("Control", "Control", "Control"**, 
**"Control")**, DOB = c("1968-01-11", "1939-06-12", "1955-03-24", 
"1944-05-04"), Enrolment.Date = c("2021-04-23", "2021-03-19", 
"2021-03-25", "2021-06-29"), Age_Enrollment = c(53L, 81L, 66L, 
77L), Age_Today = c(56L, 84L, 68L, 79L), Gender = c("m", "f", 
"f", "f"), **Marital_Status = structure(c(2L, 2L, 1L, 2L)**, levels = c("Married", 
"Not_Married"), class = "factor"), Highest_Education = c(4L, 
4L, 3L, 2L), Employment_Type = c(5L, 2L, 2L, 1L), Income_Annual = c(7L, 
2L, 1L, 1L), Canadian_Born = c(1L, 2L, 1L, 1L), Ethnicity = c("White", 
"White", "White", "White"), Number.of.Chronic.conditions = c(2L, 
5L, 5L, 6L), **base_PHQ = c(0L, 3L, 1L, 4L), six_PHQ = c(3L, 1L, 
0L, 2L)**, Change.Score = c(-3L, 2L, 1L, 2L), Clinical.Change = c("Minimal Increase", 
"Minimal Decrease", "Minimal Decrease", "Minimal Decrease"), 
    Change.of.5 = c("No Change", "No Change", "No Change", "No Change"
    )), row.names = c(NA, 4L), class = "data.frame").```

   
I tried using the `   group_by   ` function and making a lm model within that grouping, but this is not quite what I was hoping for:

`   Ancova_Marital <- PHQ1 %>%
   group_by(Marital_Status) %>%
do(model = lm(six_PHQ ~ Study.Arm + base_PHQ, data = PHQ1)) %>%
   summarise(
     Marital_Status = first(Marital_Status),
     beta = coef(model)[2],  # Slope coefficient (effect of covariate)
     p_value = summary(model)$coef[2, 4] # p-value for the slope coefficient 
   )    `

`   print(Ancova_Marital)
  Marital_Status   beta p_value
  <fct>           <dbl>   <dbl>
1 Married        -0.231   0.734
2 Not_Married    -0.231   0.734   `


With this method, I don't know how to take it a step further and use the ANCOVA code: `    Anova([aov model], type=3)    `. Apologies if this is unclear. I am new to using R and stack overflow. Thank you in advance. 

Solution

  • I'm not 100% sure I understand what exactly you are looking for but here is a suggestion for splitting your data based on Marital_Status and performing an ANCOVA for each subgroup:

    library(tidyverse)
    
    # Creating a sample dataset
    n <- 100 # number of observations
    set.seed(0) # seed for reproducibility
    data <- tibble(
      six_PHQ = sample(0:30, n, replace = T),
      Study.Arm = sample(c("Control", "Intervention"), n, replace = T),
      base_PHQ = sample(0:30, n, replace = T),
      Marital_Status = sample(c("Married", "Not_Married"), n, replace = T)
    )
    
    # Grouping the data by "Marital_Status" and creating nested data frames
    data_subgroups <- data %>% 
      group_by(Marital_Status) %>% 
      nest()
    
    # Custom function to perform ANCOVA analysis
    perform_ANCOVA <- function(df) {
      model <- lm(six_PHQ ~ Study.Arm + base_PHQ, data = df)
      anova_result <- anova(model)
      return(anova_result)
    }
    
    # Applying ANCOVA analysis to each subgroup
    result <- data_subgroups %>% 
      mutate(ANCOVA_result = map(data, perform_ANCOVA))
    

    Is this what you are looking for?

    Even though I don't know the study you are analyzing, you might think about putting everything into one model and maybe add an interaction with Marital_Status instead of subgrouping.