I have several character vectors, one per gene, each containing the names of the species in which that gene is found, and I made an UpSetR plot to show the number of species in common across genes. Now I'd like to do the opposite: plot the number of genes in common across species, but I don't know how to do it.
Example of what I have:
gene1 <- c("Panda", "Dog", "Chicken")
gene2 <- c("Human", "Panda", "Dog")
gene3 <- c("Human", "Panda", "Chicken")
...#About 20+ genes with 100+ species each
Example of what I would like to have as a result:
Panda <- c("gene1", "gene2", "gene3")
Dog <- c("gene1", "gene2")
Human <- c("gene2", "gene3")
Chicken <- c("gene1", "gene3")
...
I know it is conceptually easy, yet logistically more complicated. Can anyone give me a clue?
Thank you!
You can use unstack from base R:
unstack(stack(mget(ls(pattern="gene"))),ind~values)
$Chicken
[1] "gene1" "gene3"
$Dog
[1] "gene1" "gene2"
$Human
[1] "gene2" "gene3"
$Panda
[1] "gene1" "gene2" "gene3"
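One caveat: when every species happens to have the same number of genes, unstack returns a data frame rather than a list. Below is a minimal sketch that always yields a list. It assumes the gene vectors are collected in a named list (called genes here, a name chosen only for illustration) and uses split in place of unstack:

```r
# Hypothetical container: the gene vectors gathered in one named list
genes <- list(
  gene1 = c("Panda", "Dog", "Chicken"),
  gene2 = c("Human", "Panda", "Dog"),
  gene3 = c("Human", "Panda", "Chicken")
)

# Long format: one row per (species, gene) pair
long <- stack(genes)

# Group gene names by species; split() always returns a list
by_species <- split(as.character(long$ind), long$values)
```

Keeping the vectors in a named list also avoids relying on ls(pattern="gene"), which picks up any object whose name contains "gene".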
You can then assign each element of the result to the global environment with the list2env function.
Breakdown:
l = mget(ls(pattern = "gene"))  # collect all the gene vectors in a named list
m = unstack(stack(l), ind ~ values)  # stack into long format, then unstack to group genes by species
m
$Chicken
[1] "gene1" "gene3"
$Dog
[1] "gene1" "gene2"
$Human
[1] "gene2" "gene3"
$Panda
[1] "gene1" "gene2" "gene3"
list2env(m,.GlobalEnv)
Dog
[1] "gene1" "gene2"
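With 100+ species, list2env creates one global variable per species, so it may be easier to keep the result as a list and index it by name. The sketch below rebuilds the list m from the example so that it is self-contained, then counts the genes per species and finds the genes shared by two species:

```r
# Rebuild the species-to-genes list from the example (same content as m above)
m <- list(
  Chicken = c("gene1", "gene3"),
  Dog     = c("gene1", "gene2"),
  Human   = c("gene2", "gene3"),
  Panda   = c("gene1", "gene2", "gene3")
)

# Number of genes found in each species
n_genes <- lengths(m)

# Genes shared by two species, looked up by name
shared <- intersect(m[["Dog"]], m[["Panda"]])
```

Because m is a named list of character vectors, it should also work as input to UpSetR's fromList() for the second plot, though that step is not shown here.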