I am hoping to write a function in R that I can use to calculate a standard error estimate for the population percentages of various demographic factors. There are 1,045 people in my sample. My data frame is called NHIS1, and, for example, I would like to calculate the standard error for the proportion of the population that is white, and for the proportion that is Hispanic. The variables WHITE and HISP are binary 0/1 indicators. I calculated the population percentages with this code:
# sum(NHIS1$WHITE) = 637, nrow(NHIS1) = 1045, and sum(NHIS1$HISP) = 408
(sum(NHIS1$WHITE) / nrow(NHIS1)) * 100
(sum(NHIS1$HISP) / nrow(NHIS1)) * 100
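Because WHITE and HISP are 0/1 indicators, the mean of each column is already the proportion of 1s, so `colMeans()` can compute every percentage at once without typing in the counts. A minimal sketch; since NHIS1 itself isn't available here, the data frame below is a hypothetical stand-in that only reproduces the counts stated above (637 white and 408 Hispanic out of 1,045), not the real overlap between the two variables:

```r
# Hypothetical stand-in for NHIS1: only the marginal counts match the question
NHIS1 <- data.frame(
  WHITE = rep(c(1, 0), times = c(637, 408)),
  HISP  = rep(c(1, 0), times = c(408, 637))
)

# The mean of a 0/1 column is the proportion of 1s; times 100 gives percentages
pct <- colMeans(NHIS1[, c("WHITE", "HISP")]) * 100
round(pct, 2)  # roughly WHITE 60.96, HISP 39.04
```

With the real NHIS1, only the last two lines are needed, and more indicator columns can be added to the column list.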
I thought my function could look something like the one below, but I am not sure whether there is a better way to set this up so that R uses the proportions computed above instead of me plugging them in manually.
perc_SE <- function(p) { sqrt(p * (1 - p) / 1045) }  # p is a proportion (0 to 1), not a percentage
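The formula is the usual standard error of a sample proportion, SE = sqrt(p(1 - p)/n). For the white share, p = 637/1045 ≈ 0.6096, so SE ≈ sqrt(0.6096 × 0.3904 / 1045) ≈ 0.0151, or about 1.51 percentage points after multiplying by 100. One way (a sketch, not the only one) to avoid hard-coding p or 1045 is to let the function take the 0/1 column itself and derive both from it, then apply it to every indicator with `sapply()`. The name `se_pct` is hypothetical, and the data frame is again a stand-in built only from the stated counts:

```r
# Hypothetical stand-in for NHIS1 (only the counts from the question)
NHIS1 <- data.frame(
  WHITE = rep(c(1, 0), times = c(637, 408)),
  HISP  = rep(c(1, 0), times = c(408, 637))
)

# Standard error of a percentage, computed from a 0/1 indicator vector.
# Both p and n come from the data, so nothing is typed in by hand.
se_pct <- function(x) {
  x <- x[!is.na(x)]             # drop missing values before counting
  n <- length(x)
  p <- mean(x)                  # proportion of 1s
  100 * sqrt(p * (1 - p) / n)   # SE in percentage points
}

# Apply to several indicator columns at once
se <- sapply(NHIS1[, c("WHITE", "HISP")], se_pct)
round(se, 2)  # both about 1.51, since p * (1 - p) is the same for 0.61 and 0.39
```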
Thank you!
You could try using prop.test on a table of each column, which gives you the proportion as well as a 95% confidence interval. Just multiply these by 100 to get percentages:
prop.test(table(NIHS$WHITE))
#>
#> 1-sample proportions test with continuity correction
#>
#> data: table(NIHS$WHITE), null probability 0.5
#> X-squared = 3.6431, df = 1, p-value = 0.0563
#> alternative hypothesis: true p is not equal to 0.5
#> 95 percent confidence interval:
#> 0.4993011 0.5533346
#> sample estimates:
#> p
#> 0.5263941
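Passing the count of 1s and the sample size to `prop.test()`, instead of a table, makes it explicit which outcome is counted as the "success". A sketch using the counts from the question (637 white out of 1,045); the estimate and interval are stored in the returned object's `estimate` and `conf.int` components:

```r
# Counts from the question: 637 of the 1,045 respondents have WHITE == 1
x <- 637
n <- 1045

# With counts, x is unambiguously the number of successes
res <- prop.test(x, n)

res$estimate        # sample proportion, 637 / 1045
100 * res$conf.int  # 95% confidence interval, as percentages
```

`prop.test()` does not report a standard error, but for a sample of this size its 95% interval is close to p ± 1.96 × SE, so the interval and `sqrt(p * (1 - p) / n)` tell a consistent story.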
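One caveat: when prop.test is given a table with two entries, it treats the first entry as the number of successes. For a 0/1 variable, table() lists the 0s first, so the p above is the proportion of 0s rather than 1s, and the function below uses the same call. Passing the counts instead, as in prop.test(sum(x), length(x)), counts the 1s as successes.

If you want a simple function that returns the results as percentages, you can do this: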
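# note: this name masks base R's proportions() (available since R 4.0.0)
proportions <- function(x) {
  a <- prop.test(table(x))  # table(x) puts 0 before 1, so this estimates the share of 0s
  data.frame(Proportion = 100 * a$estimate,
             Lower_CI   = 100 * a$conf.int[1],
             Upper_CI   = 100 * a$conf.int[2])
}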
So now you can just do:
proportions(NIHS$WHITE)
#> Proportion Lower_CI Upper_CI
#> p 52.63941 49.93011 55.33346
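If you also want the standard error in the same table, and want to be sure the share of 1s is what gets estimated, a variant of the function could count the 1s itself and report the normal-approximation SE next to prop.test()'s interval. This is only a sketch: the name `pct_summary` is hypothetical, and the stand-in data again reproduce just the question's counts:

```r
# Hypothetical helper: percentage of 1s, its standard error, and a 95% CI
pct_summary <- function(x) {
  x <- x[!is.na(x)]          # ignore missing values
  n <- length(x)
  k <- sum(x == 1)           # number of 1s ("successes")
  a <- prop.test(k, n)       # counts, so k is unambiguously the successes
  p <- unname(a$estimate)
  data.frame(Percent  = 100 * p,
             SE       = 100 * sqrt(p * (1 - p) / n),
             Lower_CI = 100 * a$conf.int[1],
             Upper_CI = 100 * a$conf.int[2])
}

# Stand-in data with the question's counts
NHIS1 <- data.frame(
  WHITE = rep(c(1, 0), times = c(637, 408)),
  HISP  = rep(c(1, 0), times = c(408, 637))
)

# One row per indicator column
out <- do.call(rbind, lapply(NHIS1[, c("WHITE", "HISP")], pct_summary))
out
```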