I need to calculate an FDR variable per group, using an expected random distribution of p values (corresponds to the "Random" type).
library(dplyr)
library(data.table)
calculate_empirical_fdr = function(control_pVal, test_pVal) {
m_control = length(control_pVal)
m_test = length(test_pVal)
unlist(lapply(test_pVal, function(significance_threshold) {
m_control = length(control_pVal)
m_test = length(test_pVal)
FP_expected = length(control_pVal[control_pVal<=significance_threshold])*m_test/m_control # number of
expected false positives in a p-value sequence with the size m_test
S = length(test_pVal[test_pVal<=significance_threshold]) # number of significant hits (FP + TP)
return(FP_expected/S)
}))
}
An example dataset with groups I need to control for in the "Group" variable:
set.seed(42)
library(dplyr); library(data.table)
dataset_test = data.table(Type = c(rep("Random", 500),
rep("test1", 500),
rep("test2", 500)),
Group = sample(c("group1", "group2", "group3"), 1500, replace = T),
Pvalue = c(runif(n = 500),
rbeta(n = 500, shape1 = 1, shape2 = 4),
rbeta(n = 500, shape1 = 1, shape2 = 6))
)
I have found that the best way to use my function per group would be using a temporal variable where I can store the p values of the random type, but this does not work:
dataset_test %>%
group_by(Group) %>%
{filter(Type=="Random") %>% select(Pvalue) ->> control_set } %>%
group_by(Type, add = T) %>%
mutate(FDR_empirical = calculate_empirical_fdr(control_pVal = control_set,
test_pVal = Pvalue)) %>%
data.table()
Error in filter(Type == "Random") : object 'Type' not found
I understand that probably temporal vairables "do not see" the environment within the data.table, would be glad to hear any suggestions how to fix it.
You can do something like this, which filters the control group P-values using the data.table special .BY
setDT(dataset_test)
dataset_test[
i= Type!="Random",
j = FDR_empirical:=calculate_empirical_fdr(dataset_test[Type=="Random" & Group ==.BY$Group, Pvalue], Pvalue),
by = .(Group, Type)
]
Output:
Type Group Pvalue FDR_empirical
1: Random group1 0.70292111 NA
2: Random group1 0.72383117 NA
3: Random group1 0.76413459 NA
4: Random group1 0.87942702 NA
5: Random group2 0.71229213 NA
---
1496: test2 group3 0.34817178 0.3681791
1497: test2 group1 0.22419118 0.2308988
1498: test2 group3 0.07258545 0.2314655
1499: test2 group2 0.24687976 0.2849462
1500: test2 group1 0.12206777 0.1760657