I need to get the positivity rate by county for multiple drugs and multiple counties. The real data includes more drugs and more counties. The NA values mean that the test was not ordered so they should not be counted as the total number of samples.
Also, I need to use prop.test(x, y, conf.level = 0.95) to obtain the proportions and the confidence interval for each county.
Here is a mock data:
county <- c("Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange", "Erie", "Orange")
drug1 <- c("Positive", "Negative", "Negative", "Positive", NA, "Negative", "Positive", "Negative", "Positive", "Negative", "Positive", "Negative")
drug2 <- c("Positive", NA, "Negative", "Negative", "Negative", "Positive", "Positive", NA, "Negative", "Negative", "Negative", "Positive")
data <- data.frame(county, drug1, drug2)
The problem is that I cannot use a simple groupby and summarise to calculate the rate, like shown below. I have to use the prop.test() and get a rate and a confidence interval for each county.
data |>
group_by(county) |>
summarise(drug1 = sum(drug1=="Positive", na.rm = TRUE)/
sum(!is.na(drug1)),
drug2 = sum(drug2=="Positive", na.rm = TRUE)/
sum(!is.na(drug2))
) |>
ungroup()
county <- c("Erie", "Orange", "Erie", "Orange", "Erie", "Orange",
"Erie", "Orange", "Erie", "Orange", "Erie", "Orange")
drug1 <- c("Positive", "Negative", "Negative", "Positive", NA,
"Negative", "Positive", "Negative", "Positive", "Negative", "Positive", "Negative")
drug2 <- c("Positive", NA, "Negative", "Negative", "Negative",
"Positive", "Positive", NA, "Negative", "Negative", "Negative", "Positive")
data1 <- data.frame(county, drug1, drug2)
It's better to keep the data in entirely "long" format. You can convert it by using f.ex. melt
from reshape2
.
library(reshape2)
data1 <- melt(data1, id="county", variable.name="drug")
Then you split
into groups, by specifying factors, or by using a formula (as shown). lapply
over the groups, sending a table
of the values column to prop.test
.
prop.res <- lapply(split(data1, ~ county+drug), function(x) prop.test(table(x$value)))
The result is a list of htest
objects which you can extract values from by again using lapply
. Either one at a time…
lapply(prop.res, `[[`, "p.value")
# $Erie.drug1
# Negative
# 0.375
#
# $Orange.drug1
# Negative
# 0.21875
#
# $Erie.drug2
# Negative
# 0.6875
#
# $Orange.drug2
# [1] 1
#
Or multiple.
lapply(prop.res, `[`, c("p.value", "conf.int"))
# $Erie.drug1
# $Erie.drug1$p.value
# Negative
# 0.375
#
# $Erie.drug1$conf.int
# [1] 0.0050507634 0.7164179361
# attr(,"conf.level")
# [1] 0.95
#
#
# $Orange.drug1
# $Orange.drug1$p.value
# Negative
# 0.21875
#
# $Orange.drug1$conf.int
# [1] 0.35876542 0.99578926
# attr(,"conf.level")
# [1] 0.95
#
#
# $Erie.drug2
# $Erie.drug2$p.value
# Negative
# 0.6875
# …