I am trying to run a two-sample t-test for a difference between a treatment and control group. Data is not paired. When I subset my original dataframe, I found that I have unequal sample sizes (not an issue by hand, but R seems to make it an issue). Here is my code:
CG<-subset(data,treat=="Control")
TG<-subset(data,treat!="Control")
agep <-t.test(CG$age~TG$age)$p.value
The error I get is the following:
Error in model.frame.default(formula = CG$age ~ TG$age) :
variable lengths differ (found for 'TG$age')
Yes! The lengths do differ. Not sure why that's a problem if I'm not running a paired test? Thanks in advance for any help.
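If the two groups are independent, unequal sample sizes are not a problem: you can compare their means in R with an unpaired two-sample t-test. The error comes from the formula CG$age ~ TG$age, which tells R to treat TG$age as a grouping variable for CG$age, so both must have the same length. Pass the two vectors as separate arguments instead.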
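For reference, here is a minimal sketch of how the formula interface of t.test is meant to be used: response ~ grouping variable, both taken from the same data frame. The data below are made up for illustration (12 controls, 8 treated), and the sketch assumes treat has exactly two levels, which the formula interface requires.

```r
# Hypothetical data: unequal group sizes, one row per subject.
set.seed(1)
data <- data.frame(
  treat = c(rep("Control", 12), rep("Treated", 8)),
  age   = c(rnorm(12, mean = 40, sd = 5), rnorm(8, mean = 43, sd = 5))
)

# Formula interface: response ~ group, evaluated inside `data`.
p_formula <- t.test(age ~ treat, data = data)$p.value

# Equivalent two-vector call on the subsets used in the question.
CG <- subset(data, treat == "Control")
TG <- subset(data, treat != "Control")
p_vectors <- t.test(CG$age, TG$age)$p.value

# Both calls run the same (Welch) test, so the p-values agree.
all.equal(p_formula, p_vectors)
```

Either form works with unequal group sizes; the formula form just avoids subsetting first.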
First, check whether your data are homoscedastic: are the variances homogeneous? We do this in R with Fisher's F-test, var.test(x, y).
CG <- subset(data, treat == "Control")
TG <- subset(data, treat != "Control")
var.test(CG$age, TG$age)
If p > 0.05, the F-test gives no evidence that the variances differ, so you can treat them as homogeneous. In this case, run a classic Student's two-sample t-test by setting the parameter var.equal = TRUE.
agep <- t.test(CG$age, TG$age, var.equal = TRUE)
If the F-test returns p < 0.05, you can assume that the variances of the two groups are different (heteroscedasticity). In this case, run Welch's t-test by setting var.equal = FALSE, which is also the default for t.test.
agep <- t.test(CG$age, TG$age, var.equal = FALSE)
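Putting the steps together, a rough sketch might look like the following. The vectors x and y are simulated stand-ins for CG$age and TG$age, and the 0.05 cutoff is simply the threshold used above, not a built-in setting.

```r
# Simulated stand-ins for CG$age and TG$age (lengths differ on purpose).
set.seed(42)
x <- rnorm(12, mean = 40, sd = 5)
y <- rnorm(8,  mean = 43, sd = 5)

# Step 1: F-test for equality of variances.
ftest <- var.test(x, y)

# Step 2: choose Student's (equal variances) or Welch's t-test.
equal_var <- ftest$p.value > 0.05
result <- t.test(x, y, var.equal = equal_var)

# Step 3: extract the p-value, as in the original question.
agep <- result$p.value
agep
```

Note that agep here holds only the p-value; drop the $p.value step to keep the full test object.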