I am having trouble figuring out an appropriate way to perform subgroup analysis using ANCOVA. For my analysis, I want to compare the 6-month depression scores (six_PHQ) between two groups: intervention versus control (Study.Arm) according to marital status (Marital_Status). I am also controlling for baseline score (base_PHQ) as a covariate. These are the columns in the data frame PHQ1.
dput(PHQ1[1:4, ])
structure(list(Patient = c("ASSC-011-JB", "ASSC-013-ME", "ASSC-019-JM",
"ASSC-053-RH"), **Study.Arm = c("Control", "Control", "Control"**,
**"Control")**, DOB = c("1968-01-11", "1939-06-12", "1955-03-24",
"1944-05-04"), Enrolment.Date = c("2021-04-23", "2021-03-19",
"2021-03-25", "2021-06-29"), Age_Enrollment = c(53L, 81L, 66L,
77L), Age_Today = c(56L, 84L, 68L, 79L), Gender = c("m", "f",
"f", "f"), **Marital_Status = structure(c(2L, 2L, 1L, 2L)**, levels = c("Married",
"Not_Married"), class = "factor"), Highest_Education = c(4L,
4L, 3L, 2L), Employment_Type = c(5L, 2L, 2L, 1L), Income_Annual = c(7L,
2L, 1L, 1L), Canadian_Born = c(1L, 2L, 1L, 1L), Ethnicity = c("White",
"White", "White", "White"), Number.of.Chronic.conditions = c(2L,
5L, 5L, 6L), **base_PHQ = c(0L, 3L, 1L, 4L), six_PHQ = c(3L, 1L,
0L, 2L)**, Change.Score = c(-3L, 2L, 1L, 2L), Clinical.Change = c("Minimal Increase",
"Minimal Decrease", "Minimal Decrease", "Minimal Decrease"),
Change.of.5 = c("No Change", "No Change", "No Change", "No Change"
)), row.names = c(NA, 4L), class = "data.frame").```
I tried using the ` group_by ` function and making a lm model within that grouping, but this is not quite what I was hoping for:
` Ancova_Marital <- PHQ1 %>%
group_by(Marital_Status) %>%
do(model = lm(six_PHQ ~ Study.Arm + base_PHQ, data = PHQ1)) %>%
summarise(
Marital_Status = first(Marital_Status),
beta = coef(model)[2], # Slope coefficient (effect of covariate)
p_value = summary(model)$coef[2, 4] # p-value for the slope coefficient
) `
` print(Ancova_Marital)
Marital_Status beta p_value
<fct> <dbl> <dbl>
1 Married -0.231 0.734
2 Not_Married -0.231 0.734 `
With this method, I don't know how to take it a step further and use the ANCOVA code: ` Anova([aov model], type=3) `. Apologies if this is unclear. I am new to using R and stack overflow. Thank you in advance.
I'm not 100% sure I understand what exactly you are looking for but here is a suggestion for splitting your data based on Marital_Status
and performing an ANCOVA for each subgroup:
library(tidyverse)
# Creating a sample dataset
n <- 100 # number of observations
set.seed(0) # seed for reproducibility
data <- tibble(
six_PHQ = sample(0:30, n, replace = T),
Study.Arm = sample(c("Control", "Intervention"), n, replace = T),
base_PHQ = sample(0:30, n, replace = T),
Marital_Status = sample(c("Married", "Not_Married"), n, replace = T)
)
# Grouping the data by "Marital_Status" and creating nested data frames
data_subgroups <- data %>%
group_by(Marital_Status) %>%
nest()
# Custom function to perform ANCOVA analysis
perform_ANCOVA <- function(df) {
model <- lm(six_PHQ ~ Study.Arm + base_PHQ, data = df)
anova_result <- anova(model)
return(anova_result)
}
# Applying ANCOVA analysis to each subgroup
result <- data_subgroups %>%
mutate(ANCOVA_result = map(data, perform_ANCOVA))
Is this what you are looking for?
Even though I don't know the study you are analyzing, you might think about putting everything into one model and maybe add an interaction with Marital_Status
instead of subgrouping.