data-visualization matrix chemistry information-extraction

How can I export a legible scatterplot matrix from RStudio when I have many variables?


I'm doing a chemical analysis and trying to make a scatterplot matrix from my chemistry data. I was able to create the matrix I wanted, but the individual scatterplots are too small to read on my screen. I am planning to print this matrix on a large scientific poster, but I don't know how to export it so that the plots are legible. Is there any way to make these plots legible? I used this code for the matrix:

scatterFull <- pairs(springwater[,1:53], panel=panel.smooth)

Here is the matrix that I got in the plot pane of RStudio. I know that the monitor size matters, which is why I can't see each plot clearly, but is there a way to solve this without changing the monitor?

I want this matrix to be as legible as the matrix below (that one contains only 10 elements, but I need all 53 elements in my matrix).

Please help me with this problem. Thank you!


Solution

  • I have had to deal with this problem for decades. By far the best way (in terms of ease and quality) in R is to output the graphic in PDF format. When you specify a large paper size, the graphic can be readable. It will be of higher quality than exporting the graphic shown in RStudio. Even so, you will likely have to experiment and possibly tinker with graphical elements such as text sizes and symbol sizes.

    Here's an example showing how the plot looks by default on a poster-size sheet (44 by 32 inches). The figure displays a portion of the output as rendered in Acrobat Reader at 200% magnification.

    Figure

    #
    # Generate sample data.
    #
    n <- 1e2                 # Number of points
    d <- 53                  # Number of variables
    mu <- runif(d)           # Variable log means
    sigma <- rbeta(d, 2, 2)  # Variable log sds
    X <- matrix(exp(rnorm(n*d, mu, sigma)), n, byrow=TRUE) # Column j uses mu[j] and sigma[j]
    X <- ifelse(X < 1, 0, X) # Censor all values at a detection limit of 1
    colnames(X) <- paste0("X", seq_len(ncol(X)))
    #
    # The proposed solution.
    #
    pdf("Pairs.pdf", width=44, height=32)
    pairs(X)
    dev.off()
    

    After running this code, open the output file Pairs.pdf in a pdf reader to view it.
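
    The tinkering with text and symbol sizes mentioned above might look something like the following sketch. The particular values of `pch`, `cex`, `gap`, and `cex.labels`, the stand-in data, and the output file name are all assumptions chosen for illustration; treat them as starting points to adjust by eye rather than as recommendations.

    ```r
    # Stand-in data (an assumption): 100 observations of 53 lognormal variables.
    set.seed(17)
    X <- matrix(exp(rnorm(100 * 53)), 100, 53)
    colnames(X) <- paste0("X", seq_len(ncol(X)))

    # Hypothetical output file, written to the session's temporary directory.
    out <- file.path(tempdir(), "Pairs-tuned.pdf")

    pdf(out, width = 44, height = 32)  # Poster-size sheet, in inches
    pairs(X,
          pch = 16,          # Solid dots instead of open circles
          cex = 0.4,         # Smaller plotting symbols
          gap = 0.25,        # Less white space between panels
          cex.labels = 1.5)  # Size of the variable names on the diagonal
    dev.off()
    ```

    Re-run this with different values and inspect the PDF each time until the panels read well at the intended print size.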

    If even this doesn't work, you will need to tile the output by looping over disjoint subsets of the variables and outputting a scatterplot matrix for each ordered pair of those subsets. But with 53 variables it ought to be good enough.
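
    The tiling idea could be sketched as follows. This is only one possible way to do it: `plot_tile`, `block_size`, and the output file name are hypothetical names introduced here, the data are a stand-in, and the panels are drawn with `par(mfrow=)` and `plot` rather than `pairs` so that every panel carries its own title.

    ```r
    # Stand-in data (an assumption): 100 observations of 53 lognormal variables.
    set.seed(17)
    d <- 53
    X <- matrix(exp(rnorm(100 * d)), 100, d)
    colnames(X) <- paste0("X", seq_len(d))

    # Split the variable indexes into disjoint blocks of at most block_size.
    block_size <- 14
    blocks <- split(seq_len(d), ceiling(seq_len(d) / block_size))

    # Draw one tile: variables in `rows` on the y axes, `cols` on the x axes.
    plot_tile <- function(X, rows, cols) {
      op <- par(mfrow = c(length(rows), length(cols)), mar = c(2, 2, 1.5, 0.5))
      on.exit(par(op))
      for (i in rows) for (j in cols) {
        if (i == j) {               # Diagonal panel: show the variable name only
          plot.new()
          text(0.5, 0.5, colnames(X)[i], cex = 2)
        } else {
          plot(X[, j], X[, i], pch = 16, cex = 0.5, xlab = "", ylab = "",
               main = paste(colnames(X)[i], "vs", colnames(X)[j]))
        }
      }
    }

    # One page per ordered pair of blocks, all in a single multi-page PDF.
    out <- file.path(tempdir(), "Pairs-tiled.pdf")
    pdf(out, width = 44, height = 32)
    for (rows in blocks) for (cols in blocks) plot_tile(X, rows, cols)
    dev.off()
    ```

    With 53 variables and blocks of 14, this yields four blocks and therefore sixteen pages, which can be printed separately and assembled.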


    If you wish to identify and display a subset of these panels, here's some basic code you can emulate. It uses a hard-coded threshold of 0.3 for the correlation coefficient; in practice, you would likely adjust this according to the data and your objectives.

    #
    # Display highly correlated pairs of variables.
    #
    R <- abs(cor(X))          # The correlation matrix
    i <- which(R > 0.3) - 1   # Indexes of large absolute correlations
    p <- rbind(i %% ncol(X), i %/% ncol(X)) + 1 # Indexes into the variable names
    p <- p[, p[1,] != p[2,], drop=FALSE]  # Remove the diagonals; keep p a matrix
    
    m <- ncol(p)
    if (m > 0) {
      nrow <- min(4, ceiling(sqrt(m/1.6)))
      ncol <- min(5, ceiling(sqrt(m*1.6)))
      par(mfrow = c(nrow, ncol))
      apply(p, 2, function(j) {
        plot(X[, j[1]], X[, j[2]], xlab=colnames(X)[j[1]], ylab=colnames(X)[j[2]])
      })
      par(mfrow=c(1,1))
    }
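
    The index arithmetic above lists every pair twice, once as (i, j) and once as (j, i). If that is unwanted, a variant along the following lines may help. It is a sketch under stated assumptions: the small data set is made up so that one pair is strongly correlated, and `pairs_found` is a hypothetical name. It relies on `which(..., arr.ind = TRUE)` to return row and column indexes directly and on `upper.tri` to keep each pair once.

    ```r
    # Stand-in data (an assumption): six variables, with X2 built from X1.
    set.seed(17)
    X <- matrix(rnorm(100 * 6), 100, 6)
    X[, 2] <- X[, 1] + rnorm(100, sd = 0.5)
    colnames(X) <- paste0("X", seq_len(ncol(X)))

    R <- abs(cor(X))                     # Absolute correlations
    R[!upper.tri(R)] <- 0                # Drop the diagonal and the lower triangle
    p <- which(R > 0.3, arr.ind = TRUE)  # One row per pair, columns "row" and "col"
    p <- p[order(R[p], decreasing = TRUE), , drop = FALSE]  # Strongest pairs first

    # A readable summary of the selected pairs.
    pairs_found <- data.frame(x = colnames(X)[p[, "row"]],
                              y = colnames(X)[p[, "col"]],
                              abs_r = R[p])
    print(pairs_found)
    ```

    Each row of `p` can then be passed to `plot` in the same way as the columns of `p` in the code above.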