rggplot2graphicspcaggbiplot

ggbiplot(): How to change the colour of the arrows and text using a function for a PCA biplot in R


Issue

I have used the function ggbiplot() to produce a PCA biplot for multivariate data (see diagram 1 - below)

I found this Stackoverflow post for specifying the colour/transparency/position of line segments in ggbiplot.

Therefore, I am trying to use the sample code in this post to change the colour of the arrows in the biplot and the text of the whistle parameter names to black (see my ideal output in diagram 2 - below).

When I run the geom_segment() function, I keep getting this error message (see below) and I cannot quite figure out how to run the functions ggbiplot2() and geom_segment() (see below) without getting error messages to produce a biplot with the desired output.

Error in layer(data = data, mapping = mapping, stat = stat, geom = GeomSegment,  : 
  object 'arrow.color' not found

Specifically, I would like the arrows to be black, boldface and longer and the text to be boldface and black with the whistle parameters text to be inside labels with a grey background (exactly the same as diagram 2 - see below).

Unfortunately, I cannot share my data, so I have produced a Dummy Data frame, so the resulting PCA biplot will probably look very different to diagram 1 (see below).

If anyone can help, I would be deeply appreciative.

Dummy Data

#Dummy data
#Create a cluster column with dummy data (clusters = 3)
f1 <- gl(n = 2, k=167.5); f1

#Produce a data frame for the dummy level data
f2<-as.data.frame(f1)

#Rename the column f2
colnames(f2)<-"Country"

#How many rows
nrow(f2)

#Rename the levels of the dependent variable 'Country' as classifiers
#prefer the inputs to be factors
levels(f2$Country) <- c("France", "Germany")

#Create random numbers
Start.Freq<-runif(335, min=1.195110e+02, max=23306.000000)
End.Freq<-runif(335, min=3.750000e+02, max=65310.000000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
Low.Freq<-runif(335, min=6.592500e+02, max=20491.803000)
High.Freq<-runif(335, min=2.051000e+03, max=36388.450000)
Peak.Freq<-runif(335, min=7.324220+02, max=35595.703000)
Center.Freq<-runif(335, min=2.190000e-02, max=3.155800)
Delta.Freq<-runif(335, min=1.171875+03, max=30761.719000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)

#Bind the columns together
Bind<-cbind(f2, Start.Freq, End.Freq,  Low.Freq, High.Freq, Peak.Freq,  Center.Freq, Delta.Freq, Delta.Time)

#Rename the columns 
colnames(Bind)<-c('Country', 'Low.Freq', 'High.Freq', 'Start.Freq', 'End.Freq', 'Peak.Freq', 'Center.Freq', 
                  'Delta.Freq', 'Delta.Time')

#Produce a dataframe
Whistle_Parameters<-as.data.frame(Bind)
Whistle_Parameters

Data Transformation

#Box Cox

#Create a dataframe format for the Yeo transform
Box<-as.data.frame(Whistle_Parameters)
Box

#Check the structure of the dataframe 'Box'
str(Box)

#Use the function powerTransform(), specifying family = "bcPower", to obtain an optimal Box Cox transformation
transform_Low.Freq.box=car::powerTransform(Box$Low.Freq, family= "bcPower")
transform_Low.Freq.box

transform_High.Freq.box=car::powerTransform(Box$High.Freq, family= "bcPower")
transform_High.Freq.box

transform_Start.Freq.box=car::powerTransform(Box$Start.Freq, family= "bcPower")
transform_Start.Freq.box

transform_End.Freq.box=car::powerTransform(Box$End.Freq, family= "bcPower")
transform_End.Freq.box

transform_Peak.Freq.box=car::powerTransform(Box$Peak.Freq, family= "bcPower")
transform_Peak.Freq.box

transform_Center.Freq.box=car::powerTransform(Box$Center.Freq, family= "bcPower")
transform_Center.Freq.box

transform_Delta.Freq.box=car::powerTransform(Box$Delta.Freq, family= "bcPower")
transform_Delta.Freq.box

transform_Delta.Time.box=car::powerTransform(Box$Delta.Time, family= "bcPower")
transform_Delta.Time.box

#Produce a dataframe object
Box_Cox_Transformation<-as.data.frame(stand_box)
Box_Cox_Transformation

PCA

#install.packages("devtools")
library(devtools)
install_github("vqv/ggbiplot")
library(ggbiplot)

#You can do a PCA to visualize the difference between the groups using the standardised box cox data
PCA=prcomp(Box_Cox_Transformation[2:8], center = TRUE, scale=TRUE, retx = T)
PCA

#Plot the names of the principal components
names(PCA)

#Print the attributes of the PCA
attributes(PCA)

#Summarise the importance of the components showing the standard deviation, proportion of variance, 
#and the cumulative proportion
summary(PCA)

#In the arglist in the function, add "name = expression" terms for color, line type and transparency ("alpha") for the arrows.
ggbiplot2 <- function (pcobj, choices = 1:2, scale = 1, pc.biplot = TRUE, 
                       obs.scale = 1 - scale, var.scale = scale, groups = NULL, 
                       ellipse = FALSE, ellipse.prob = 0.68, labels = NULL, labels.size = 3, 
                       alpha = 1, var.axes = TRUE, circle = FALSE, circle.prob = 0.69, 
                       varname.size = 3, varname.adjust = 1.5, varname.abbrev = FALSE, 
                       linetype = "solid",
                       alpha_arrow = c(0.2, 0.5, 1, 1),
                       arrow.color = c(muted("red"), "black", "red")) 
  
ggbiplot2


#Then search for the geom_segment part, and add arguments for color, linetype and alpha:

 g <- ggbiplot2 + geom_segment(data = df.v, aes(x = 0, y = 0, xend = xvar, yend = yvar),
                        arrow = arrow(length = unit(1/2, "picas")),
                        arrow.color = arrow.color, linetype = linetype, alpha = alpha_arrow)

Assign the edited function to a new name, e.g. ggbiplot2. Try it, where you set values other than the default for the arrows:

#Build a classification model for the PCA components
PCA_plot<-ggbiplot2(PCA, ellipse=TRUE, circle=TRUE, varname.adjust = 1.6, groups=Box_Cox_Transformation$Country,
                    arrow.color = "black", linetype = "solid", alpha_arrow = 0.5) +
                    ggtitle("PCA of Acoustic Parameters") +
                    theme(plot.title = element_text(hjust = 0.5)) +
                    theme_minimal() +
                    theme(panel.background = element_blank(), 
                    panel.grid.major = element_blank(), 
                    panel.grid.minor = element_blank(),
                    panel.border = element_blank()) + 
                    theme(axis.line.x = element_line(color="black", size = 0.8),
                    axis.line.y = element_line(color="black", size = 0.8)) 
            

PCA_plot

Diagram 1 - PCA_Plot without the function

enter image description here

Diagram 2 enter image description here


Solution

  • Unfortunately, despite the effort you put in to creating a dummy data set, the code you made to reproduce it contains errors. However, this seems a bit tangential to what you are asking here, which is to change the colors and weights of segments and text in the image produced by ggbiplot. To do this, we can simply use the example that comes with the package:

    library(ggbiplot)
    
    data(wine)
    wine.pca <- prcomp(wine, scale. = TRUE)
    
    p <- ggbiplot(wine.pca, obs.scale = 1, var.scale = 1, 
             groups = wine.class, ellipse = TRUE, circle = TRUE)
    
    p
    

    enter image description here

    The options for styling the plot within the function itself are somewhat limited, but since it produces a ggplot object, we can re-specify the necessary layers. The following code should work on any object output from ggbiplot. First we find the geom segment and geom text layers:

    seg <- which(sapply(p$layers, function(x) class(x$geom)[1] == 'GeomSegment'))
    txt <- which(sapply(p$layers, function(x) class(x$geom)[1] == 'GeomText'))
    

    We can change the colour and width of the segments by doing

    p$layers[[seg]]$aes_params$colour <- 'black'
    p$layers[[seg]]$aes_params$size <- 1
    

    To change the labels to have a gray background, we need to overwrite the geom_text layer with a geom_label layer:

    p$layers[[txt]] <- geom_label(aes(x = xvar, y = yvar, label = varname,
                                      angle = angle, hjust = hjust), 
                                  label.size = NA,
                                  data = p$layers[[txt]]$data, 
                                  fill = '#dddddd80')
    

    Now we can draw the plot with a clean modern theme:

    p + theme_minimal()
    

    enter image description here