rpcalimma

Why does removeBatchEffect() not remove batch clustering in PCA?


I'm using limma::removeBatchEffect() to correct batch effects in RNA-seq data after variance-stabilizing transformation. However, my PCA still shows strong clustering by batch.

Here’s a minimal example of the code I'm using:

library(DESeq2)
library(limma)

# Example VST matrix from DESeq2
vsd_mat <- assay(vsd)  # genes × samples matrix

# Metadata with batch and biological group (e.g. cancer subtype)
design_mat <- model.matrix(~ category, data = metadata)

# Attempt batch correction
corrected <- removeBatchEffect(vsd_mat,
                                batch = metadata$batch,
                                design = design_mat)

After correction, I run PCA:

pca <- prcomp(t(corrected))
plot(pca$x[,1:2], col = metadata$batch)  # Still clusters by batch

Question:

Why is removeBatchEffect() not eliminating batch structure in PCA, and how can I improve this?


Solution

  • By default, removeBatchEffect only subtracts batch means. If your batches have different variances, you may also want to scale:

    corrected2 <- removeBatchEffect(vsd_mat,
                                    batch = metadata$batch,
                                    design = design_mat,
                                    scale  = TRUE)