I've never made a plot like this before, so sorry as this is probably a basic question, but I am stuck on how to make a chord diagram and specifically get the outer sections to be my column headings (drug mechanisms) and the inner connections between the sections to be the rows (genes) which don't need to be named in the plot as there are so many.
My data is rows of genes that are marked as interacting with columns of drug mechanisms by zeros or ones.
For example a subset of my data looks like:
Gene Diuretic Beta_blocker ACE_inhibitor
Gene1 1 0 0
Gene2 0 0 1
Gene3 1 1 1
Gene4 0 1 1
My total data is actually 700 genes for 15 columns of drug mechanisms with all zeors and ones. I am currently just creating a chord diagram with:
df <- fread('df.csv')
df[is.na(df)] <- 0
df <- df %>% data.frame %>% set_rownames(.$Gene) %>% dplyr::select(-Gene)
mt <- as.matrix(df)
circos.par(gap.degree = 0.9) #set this as I was otherwise getting an error with my total data
chordDiagram(mt, transparency = 0.5)
With my total data this plot looks like:
I've been getting various errors with trying to get this plot to be 15 sections only (and even just trying to get the sections to have the column names).
Is there a way for me plot a chord diagram with the sections being representative of each column? Then for genes/rows that have an interaction (a 1 in the data) for that section and any other section to be shown in the chord diagram? I don't need the gene names to be visible, I am looking to just visualize the amount of overlap between my columns/sections.
Example input data (for which my problem would be trying to make only have 3 sections per each column to show their overlap):
df <- structure(list(Gene = c("Gene1", "Gene2", "Gene3", "Gene4"),
Diuretic = c(1L, 0L, 1L, 0L), Beta_blocker = c(0L, 0L, 1L,
1L), ACE_inhibitor = c(0L, 1L, 1L, 1L)), row.names = c(NA,
-4L), class = c("data.table", "data.frame")
If you have 15 different drug mechanisms, it would be best to count the genes that various mechanisms have in common, and use these as weightings for the links between drug effects.
Your sample data is too limited to give a feel for how this would look, but the code would be something like this:
new_df <-apply(df, 1, function(x) {
x <- names(df)[which(x == 1)]
m <- 1 - diag(length(x))
dimnames(m) <- list(x, x)
inds <- which(lower.tri(m), arr.ind = TRUE)
data.frame(from = x[inds[,1]], to = x[inds[,2]])}) %>%
bind_rows() %>%
mutate(wt = 1) %>%
group_by(from, to) %>%
summarize(wt = sum(wt), .groups = 'drop')
new_df
#> # A tibble: 3 x 3
#> from to wt
#> <chr> <chr> <dbl>
#> 1 ACE_inhibitor Beta_blocker 2
#> 2 ACE_inhibitor Diuretic 1
#> 3 Beta_blocker Diuretic 1
We can see that we have two genes that have a common action on ACE inhibitor and Beta blocker mechansim (which is what your table implies), and a single gene that links diuretic to both beta blocker and ACE inhibitor to diuretic.
This produces the following rather dull chord diagram:
chordDiagram(new_df)
However, if we make a sample data set that is of the same scale as your real data, we get a more satisfactory result:
set.seed(123)
big_dat <- as.data.frame(matrix(rbinom(15 * 700, 1, 0.5), 700),
row.names = paste0('Gene', 1:700)) %>%
setNames(c('ACE_inhibitor', 'Diuretic', 'Beta_Blocker',
'CCB', 'Nitrate', 'K_channel', 'Aldosterone_blocker',
'Vasodilator', 'PDEI', 'Central', 'Relaxant',
'ARB', 'Alpha_blocker', 'Dopaminergic', 'Unknown'))
big_df <- apply(big_dat, 1, function(x) {
x <- names(big_dat)[which(x == 1)]
m <- 1 - diag(length(x))
dimnames(m) <- list(x, x)
inds <- which(lower.tri(m), arr.ind = TRUE)
data.frame(from = x[inds[,1]], to = x[inds[,2]])}) %>%
bind_rows() %>%
mutate(wt = 1) %>%
subset(complete.cases(.)) %>%
group_by(from, to) %>%
summarize(wt = sum(wt), .groups = 'drop')
chordDiagram(big_df)