r dataframe bioconductor

Keep only columns with matching column names


I want to keep only the columns whose names are shared by all three objects, met.kirp.450, met.kirc.450, and met.kich.450 (each a SummarizedExperiment), before merging them column-wise.

First, I find the matching columns (match.col) and subset each object to them. Then I call SummarizedExperiment::cbind.

dflist <- list(met.kirp.450, met.kirc.450, met.kich.450)
match.col <- Reduce(function(x, y){intersect(x, names(y))}, dflist, init = names(dflist[[1]]))
met.kirp.450 <- met.kirp.450[match.col,]
met.kirc.450 <- met.kirc.450[match.col,]
met.kich.450 <- met.kich.450[match.col,]

met.kipan <- SummarizedExperiment::cbind(met.kirp.450, met.kirc.450, met.kich.450)

Error:

Error in .aggregate_and_align_all_colnames(all_colnames, strict.colnames = strict.colnames) : 
  the DFrame objects to combine must have the same column names

I can't include dput output because even a minimal dput(met.kirp.450[1,1]) is too long.

Data:

library(TCGAbiolinks)

query.met.kirp <- GDCquery(
  project = "TCGA-KIRP", 
  legacy = TRUE,
  data.category = "DNA methylation",
  platform = "Illumina Human Methylation 450"
)
GDCdownload(query.met.kirp)

query.met.kirc <- GDCquery(
  project = "TCGA-KIRC", 
  legacy = TRUE,
  data.category = "DNA methylation",
  platform = "Illumina Human Methylation 450"
)
GDCdownload(query.met.kirc)

query.met.kich <- GDCquery(
  project = "TCGA-KICH", 
  legacy = TRUE,
  data.category = "DNA methylation",
  platform = "Illumina Human Methylation 450"
)
GDCdownload(query.met.kich)

met.kirp.450 <- GDCprepare(
  query = query.met.kirp,
  save = TRUE, 
  save.filename = "kirpDNAmet450k.rda",
  summarizedExperiment = TRUE
)

met.kirc.450 <- GDCprepare(
  query = query.met.kirc,
  save = TRUE, 
  save.filename = "kircDNAmet450k.rda",
  summarizedExperiment = TRUE
)

met.kich.450 <- GDCprepare(
  query = query.met.kich,
  save = TRUE, 
  save.filename = "kichDNAmet450k.rda",
  summarizedExperiment = TRUE
)

Solution

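  • Because the objects are SummarizedExperiment objects, their column names are stored as the row names of colData (the column data), i.e. in y@colData@rownames; colnames(y) returns the same values. Intersect these names across the three objects, then subset on columns (the second index) rather than rows: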

    match.cols <- Reduce(function(x, y){intersect(x, y@colData@rownames)}, dflist, init = colnames(dflist[[1]]))
    
    met.kirp.450.new <- met.kirp.450[,match.cols]
    met.kirc.450.new <- met.kirc.450[,match.cols]
    met.kich.450.new <- met.kich.450[,match.cols]
    
    met.kipan <- SummarizedExperiment::cbind(met.kirp.450.new, met.kirc.450.new, met.kich.450.new)
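
As a rough illustration of the Reduce/intersect pattern used above, here is a minimal sketch on plain data frames. The toy data and the names df_a, df_b, df_c, shared_cols, and trimmed are assumptions made up for this example and are not part of the TCGA objects. rbind stands in for the combining step, because base R's rbind on data frames also refuses inputs whose column names differ.

```r
# Toy data frames standing in for the three objects (hypothetical values).
df_a <- data.frame(s1 = 1:2, s2 = 3:4, s3 = 5:6)
df_b <- data.frame(s2 = 7:8, s3 = 9:10, s4 = 11:12)
df_c <- data.frame(s3 = 13:14, s2 = 15:16, s5 = 17:18)

df_list <- list(df_a, df_b, df_c)

# Column names present in every element of the list.
# The order follows the first element.
shared_cols <- Reduce(intersect, lapply(df_list, colnames))

# Keep only the shared columns, in the same order, in each data frame.
trimmed <- lapply(df_list, function(df) df[, shared_cols, drop = FALSE])

# Every element now has identical column names, so combining succeeds.
combined <- do.call(rbind, trimmed)

print(shared_cols)
print(dim(combined))
```

Calling Reduce(intersect, ...) on the list of name vectors avoids the init argument and makes it explicit which names are being compared. With SummarizedExperiment objects, colnames() would take the place of the data-frame column names here.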