I have a matrix that's 490 rows (features; F1..F490) and 350 columns (350 samples; s1..s350). The first columns look like this:
Drug T T T C T
Sample s1 s2 s3 s4 s5 .....
Pair 16 81 -16 32 -81 .....
Cond B D B B D .....
F1 34 23 12 9 .....
F2 78 11 87 10 .....
...
(There are missing data; that is expected.)
There are 2 conditions (B and D) and 2 drugs (T and C). The samples are paired. For example, s1 and s3 are paired because their Pair values are the same in absolute value.
What I'm trying to do is permute the drug labels 1000 times while preserving the pairing information (Pair value). A pair should always have the same condition (B in this case) and the same absolute Pair value (16 and -16 in this case), and both members must have the same drug label. For example, s1 and s3 are a pair: they have the same absolute Pair value, are both B, and both have the drug label T.
So 1 of the 1000 permuted files should look something like this for example:
Drug C T C T T
Sample s1 s2 s3 s4 s5 .....
Pair 16 81 -16 32 -81 .....
Cond B D B B D .....
F1 34 23 12 9 .....
F2 78 11 87 10 .....
...
I don't mind if the samples are not in order.
I've tried permute and sample (in R), but I can't find a way to do it while respecting the conditions described above. I'm sorry if this is obvious.
I want to use these permuted files (n=1000) for a downstream analysis that I have already coded.
Thank you very much for your input.
Given the data df, group by the absolute value of Pair and then sample/permute Drug for the grouped pairs. Finally, join back on the absolute value of Pair. Using dplyr:
t_df <- as.data.frame(t(df)) # transposed to use features as cols
t_df$Pair <- as.numeric(as.character(t_df$Pair)) # Pair must be numeric so abs() works
library(dplyr)
# Wrap this into a function to call/ permute 1000 times
df_out <- t_df %>% mutate(abs_pair = abs(Pair)) %>%
  group_by(abs_pair) %>% filter(row_number() == 1) %>% # keep one row per pair
  ungroup() %>% mutate(Permuted_drug = sample(Drug, n())) %>% # shuffle labels across pairs
  select(abs_pair, Permuted_drug) %>%
  inner_join(t_df %>% mutate(abs_pair = abs(Pair)), by = "abs_pair") # copy label to both members
df_out
# abs_pair Permuted_drug Drug Sample Pair Cond
# <dbl> <fct> <fct> <fct> <dbl> <fct>
#1 16 T T s1 16 B
#2 16 T T s3 -16 B
#3 81 C T s2 81 D
#4 81 C T s5 -81 D
#5 32 T C s4 32 B
Data Used:
df <- read.table(text = "Drug T T T C T
Sample s1 s2 s3 s4 s5
Pair 16 81 -16 32 -81
Cond B D B B D", row.names = 1)
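The comment in the code above suggests wrapping the permutation in a function and calling it 1000 times. A minimal sketch of that idea in base R follows, so it runs without extra packages. The helper name permute_drug_by_pair and the list name perms are hypothetical, and the sketch assumes every pair is identified by the absolute value of Pair and that the labels are shuffled across all pairs regardless of Cond.

```r
# Example metadata from the question, transposed so that samples are rows.
df <- read.table(text = "Drug T T T C T
Sample s1 s2 s3 s4 s5
Pair 16 81 -16 32 -81
Cond B D B B D", row.names = 1, stringsAsFactors = FALSE)
t_df <- as.data.frame(t(df), stringsAsFactors = FALSE)
t_df$Pair <- as.numeric(t_df$Pair)

# Hypothetical helper: shuffle one drug label per pair, then give that
# label to every member of the pair.
permute_drug_by_pair <- function(meta) {
  abs_pair  <- abs(meta$Pair)
  first     <- !duplicated(abs_pair)   # one representative row per pair
  pair_ids  <- abs_pair[first]
  pair_drug <- meta$Drug[first]
  # Index with sample.int() so a single pair is not misread as sample(1:x).
  shuffled  <- pair_drug[sample.int(length(pair_drug))]
  meta$Permuted_drug <- shuffled[match(abs_pair, pair_ids)]
  meta
}

set.seed(1)  # only to make the sketch reproducible
perms <- replicate(1000, permute_drug_by_pair(t_df), simplify = FALSE)
```

Each element of perms is one permuted copy of the sample metadata. Both members of a pair always receive the same Permuted_drug, and the number of T and C labels at the pair level is unchanged. The feature columns (F1, F2, ...) are carried along untouched if they are present in t_df.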