
How to highlight a particular variable or individual in a PCA space in R


I am currently working on a large dataset (count data, species x samples) on which I performed a PCA. The result is a massive cloud of points, and I would like to color one given species to show where it sits in this cloud (species are my variables here). Here is what it looks like:

[Image: the resulting PCA plot, a dense cloud of points]

I use the factoextra package and visualize the variables with fviz_pca_var. Is there a way to select one particular species and display it in a color different from the others?

Thank you for your help


Solution

  • If it's just a single point you want to color, perhaps:

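    library(tidyverse)   # loads ggplot2, used for the extra layers
    library(factoextra)
    library(FactoMineR)
    
    data("iris")
    
    # One color per row: NA everywhere except the individual of interest
    iris$assigned_colors <- NA
    iris[9,]$assigned_colors <- "red"
    
    # PCA on the numeric columns only (drop Species and assigned_colors)
    iris.pca <- PCA(iris[,-c(5,6)], graph = FALSE)
    
    # Base plot of individuals, then overlay the highlighted point.
    # Rows with an NA color are not drawn by geom_point (hence the warning),
    # and scale_color_identity() uses the string "red" literally as the color.
    fviz_pca_ind(iris.pca,
                 geom = "point",
                 geom.ind = "point") +
      geom_point(aes(color = iris$assigned_colors)) +
      scale_color_identity()
    #> Warning: Removed 149 rows containing missing values (geom_point).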
    

    Created on 2022-07-08 by the reprex package (v2.0.1)

    You can also label specific points (i.e. just the point of interest) using this approach, e.g.

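    library(tidyverse)   # loads ggplot2, used for the extra layers
    library(factoextra)
    library(FactoMineR)
    
    data("iris")
    
    # Color only the individual of interest; all other rows stay NA
    iris$assigned_colors <- NA
    iris[9,]$assigned_colors <- "red"
    
    # Label only the individual of interest; all other rows stay NA
    iris$labels <- NA
    iris[9,]$labels <- "point of interest"
    
    # PCA on the numeric columns only (drop Species, assigned_colors, labels)
    iris.pca <- PCA(iris[,-c(5, 6, 7)], graph = FALSE)
    
    # Overlay the colored point and its text label; NA rows are skipped
    fviz_pca_ind(iris.pca,
                 geom = "point",
                 geom.ind = "point") +
      geom_point(aes(color = iris$assigned_colors)) +
      geom_text(aes(label = iris$labels), nudge_y = -0.2) +
      scale_color_identity()
    #> Warning: Removed 149 rows containing missing values (geom_point).
    #> Warning: Removed 149 rows containing missing values (geom_text).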
    

    Created on 2022-07-08 by the reprex package (v2.0.1)
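The question itself is about variables (species) drawn with fviz_pca_var, whereas the code above highlights an individual. Below is a minimal sketch for the variable case. It assumes that factoextra's `col.var` argument accepts a factor with one entry per active variable, combined with a `palette` of colors, as in factoextra's own examples that color variables by cluster group. The four iris measurement columns stand in for species here, and `target` is a hypothetical name to replace with the species of interest.

```r
library(FactoMineR)
library(factoextra)

data("iris")

# PCA on the four numeric columns; in the real data these would be species
iris.pca <- PCA(iris[, 1:4], graph = FALSE)

# Hypothetical variable to highlight: substitute your species name
target <- "Petal.Length"

# One factor entry per active variable, in the same order as the PCA output
var_names <- rownames(iris.pca$var$coord)
highlight <- factor(ifelse(var_names == target, target, "other"),
                    levels = c("other", target))

# Palette colors map to factor levels in order: "other" grey, target red
p <- fviz_pca_var(iris.pca,
                  col.var = highlight,
                  palette = c("grey70", "red"),
                  legend.title = "Highlight")
print(p)
```

With many species, the default arrow and text geoms can still be crowded, but the highlighted variable keeps its own color and legend entry.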
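For individuals, a similar sketch avoids the NA overlay (and its warning) by passing a factor to `col.ind`, the same pattern factoextra's examples use to color iris individuals by species. `target_row` is a hypothetical name; row 9 matches the individual used in the answer above.

```r
library(FactoMineR)
library(factoextra)

data("iris")
iris.pca <- PCA(iris[, 1:4], graph = FALSE)

# Hypothetical row index of the individual of interest
target_row <- 9

# One factor entry per individual: only target_row is "point of interest"
highlight <- factor(ifelse(seq_len(nrow(iris)) == target_row,
                           "point of interest", "other"),
                    levels = c("other", "point of interest"))

# Palette colors map to factor levels in order: "other" grey, target red
p <- fviz_pca_ind(iris.pca,
                  geom = "point",
                  col.ind = highlight,
                  palette = c("grey70", "red"),
                  legend.title = "Highlight")
print(p)
```

One trade-off: ggplot2 draws the points of a single layer in row order, so in a dense cloud the highlighted point may be hidden under points that come after it. The overlay approach above draws it in a separate layer on top and does not have this problem.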