I have a larger dataset and below is a subset of that data. The category is the dependent variable and Day_1 and Day_2 are independent variables.
ID <- c("e-1", "e-2", "e-3", "e-8", "e-9", "e-10", "e-13", "e-16", "e-17", "e-20")
Day_1 <- c(0.58, 0.62, 0.78, 0.18, 0.98, 0.64, 0.32, 0.54, 0.94, 0.87)
Day_2 <- c(0.58, 0.65, 0.25, 0.34, 0.17, 0.82, 0.67, 0.39, 0.49, 0.86)
Category <- c(1, 1, 0, 1, 0, 1, 1, 1, 0, 1)
df <- data.frame(ID, Day_1, Day_2, Category)
Because the sample sizes of Category 0 and Category 1 differ (3 rows for Category 0, 7 rows for Category 1), I want to perform a cross multiplication. That means repeating every Category 0 row 7 times and every Category 1 row 3 times, so that both categories end up with a sample size of 7 * 3 = 21. The final data frame should contain all the columns of 'df', plus the added rows.
How am I supposed to do this in R?
This might be the wrong approach, as you will increase the overall sample size and thus inflate the test statistic.
See this small example, also with a binary dependent variable. By doubling the sample size (without changing the proportions of "am") you get the same estimate but a smaller standard error and p-value.
summary(glm(am ~ mpg, mtcars, family='binomial'))
# Estimate Std. Error z value Pr(>|z|)
# mpg 0.3070 0.1148 2.673 0.00751 **
summary(glm(am ~ mpg, rbind(mtcars, mtcars), family='binomial'))
# Estimate Std. Error z value Pr(>|z|)
# mpg 0.30703 0.08121 3.781 0.000156 ***
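For reference, the literal row replication described in the question could be sketched as follows. This is only an illustration using the example data from the question (the names n, times and df_rep are made up here), and the caveat about inflated test statistics applies to any model fitted on the result.

```r
# Rebuild the example data from the question
ID <- c("e-1", "e-2", "e-3", "e-8", "e-9", "e-10", "e-13", "e-16", "e-17", "e-20")
Day_1 <- c(0.58, 0.62, 0.78, 0.18, 0.98, 0.64, 0.32, 0.54, 0.94, 0.87)
Day_2 <- c(0.58, 0.65, 0.25, 0.34, 0.17, 0.82, 0.67, 0.39, 0.49, 0.86)
Category <- c(1, 1, 0, 1, 0, 1, 1, 1, 0, 1)
df <- data.frame(ID, Day_1, Day_2, Category)

# Rows per category: 3 for "0", 7 for "1"
n <- table(df$Category)

# Repeat each row as many times as the *other* category has rows
times <- ifelse(df$Category == 0, n[["1"]], n[["0"]])

# Index the data frame with repeated row numbers, then reset row names
df_rep <- df[rep(seq_len(nrow(df)), times = times), ]
rownames(df_rep) <- NULL

# Both categories now have 7 * 3 = 21 rows
table(df_rep$Category)
```

Indexing with repeated row numbers keeps every column of df intact, so no merging step is needed afterwards.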
What you want are frequency weights, which you derive by dividing population proportions (which in your case are both .5) by sample proportions. You can use mapply for that.
mtcars <- transform(mtcars,
                    w=mapply(`/`,
                             c(`0`=.5, `1`=.5),
                             proportions(table(am)))[as.character(am)])
summary(glm(am ~ mpg, mtcars, weights=w, family='binomial'))
# Estimate Std. Error z value Pr(>|z|)
# mpg 0.3005 0.1123 2.676 0.00746 **
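Applied to the data frame from the question, the same weighting idea might look like the sketch below. It assumes target proportions of .5 for each category, as above. The names target, observed and w are illustrative, and prop.table is used in place of proportions only so that it also runs on R versions before 4.0.

```r
# Rebuild the example data from the question
ID <- c("e-1", "e-2", "e-3", "e-8", "e-9", "e-10", "e-13", "e-16", "e-17", "e-20")
Day_1 <- c(0.58, 0.62, 0.78, 0.18, 0.98, 0.64, 0.32, 0.54, 0.94, 0.87)
Day_2 <- c(0.58, 0.65, 0.25, 0.34, 0.17, 0.82, 0.67, 0.39, 0.49, 0.86)
Category <- c(1, 1, 0, 1, 0, 1, 1, 1, 0, 1)
df <- data.frame(ID, Day_1, Day_2, Category)

# Assumed target ("population") proportions: equal share per category
target <- c(`0` = 0.5, `1` = 0.5)

# Observed sample proportions: 0.3 for "0", 0.7 for "1"
observed <- prop.table(table(df$Category))

# Weight per category = target proportion / sample proportion
w <- target / as.numeric(observed[names(target)])

# Attach each row's weight by looking up its category
df$w <- unname(w[as.character(df$Category)])

# The weighted totals are now equal across categories, and the
# overall sum of weights still matches the original 10 rows
tapply(df$w, df$Category, sum)
sum(df$w)
```

The column w could then be passed to glm through its weights argument, in the same way as in the mtcars example, without adding any rows to the data.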