r statistics confidence-interval rate proportions

Obtaining proportions and confidence intervals with prop.test() for multiple groups in R

I need to get the positivity rate by county for multiple drugs and multiple counties. The real data includes more drugs and more counties. The NA values mean that the test was not ordered so they should not be counted as the total number of samples.

Also, I need to use prop.test(x, y, conf.level = 0.95) to obtain the proportions and the confidence interval for each county.

Here is a mock data:

county <- c("Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange")
drug1 <- c("Positive", "Negative", "Negative", "Positive", NA, "Negative", "Positive", "Negative", "Positive", "Negative", "Positive", "Negative")
drug2 <- c("Positive", NA, "Negative", "Negative", "Negative", "Positive", "Positive", NA, "Negative", "Negative", "Negative", "Positive")
data <- data.frame(county, drug1, drug2)

The problem is that I cannot use a simple groupby and summarise to calculate the rate, like shown below. I have to use the prop.test() and get a rate and a confidence interval for each county.

data |> 
  group_by(county) |> 
  summarise(drug1 = sum(drug1=="Positive", na.rm = TRUE)/
              sum(!is.na(drug1)),
            drug2 = sum(drug2=="Positive", na.rm = TRUE)/
              sum(!is.na(drug2))
            ) |> 
  ungroup()

Solution

county <- c("Erie", "Orange", "Erie", "Orange", "Erie", "Orange", 
  "Erie", "Orange", "Erie", "Orange", "Erie", "Orange")
drug1 <- c("Positive", "Negative", "Negative", "Positive", NA, 
  "Negative", "Positive", "Negative", "Positive", "Negative", "Positive", "Negative")
drug2 <- c("Positive", NA, "Negative", "Negative", "Negative", 
  "Positive", "Positive", NA, "Negative", "Negative", "Negative", "Positive")
data1 <- data.frame(county, drug1, drug2)

It's better to keep the data in entirely "long" format. You can convert it by using f.ex. melt from reshape2.

library(reshape2)

data1 <- melt(data1, id="county", variable.name="drug")

Then you split into groups, by specifying factors, or by using a formula (as shown). lapply over the groups, sending a table of the values column to prop.test.

prop.res <- lapply(split(data1, ~ county+drug), function(x) prop.test(table(x$value)))

The result is a list of htest objects which you can extract values from by again using lapply. Either one at a time…

lapply(prop.res, `[[`, "p.value")
# $Erie.drug1
# Negative 
#    0.375 
# 
# $Orange.drug1
# Negative 
#  0.21875 
# 
# $Erie.drug2
# Negative 
#   0.6875 
# 
# $Orange.drug2
# [1] 1
#

Or multiple.

lapply(prop.res, `[`, c("p.value", "conf.int"))
# $Erie.drug1
# $Erie.drug1$p.value
# Negative 
#    0.375 
# 
# $Erie.drug1$conf.int
# [1] 0.0050507634 0.7164179361
# attr(,"conf.level")
# [1] 0.95
# 
# 
# $Orange.drug1
# $Orange.drug1$p.value
# Negative 
#  0.21875 
# 
# $Orange.drug1$conf.int
# [1] 0.35876542 0.99578926
# attr(,"conf.level")
# [1] 0.95
# 
# 
# $Erie.drug2
# $Erie.drug2$p.value
# Negative 
#   0.6875 
# …