r dplyr pca ggbiplot

Save intermediate list output in dplyr pipeline and map it back to another list further down the pipeline - R


I am running PCAs on groups in a data set using dplyr pipelines. I start with group_split(), so I am working with a list of data frames. To run prcomp(), only the numeric columns of each list element can be included, but I would like the factor column brought back in for plotting at the end. I have tried saving an intermediate output using {. ->> temp} partway through the pipeline, but since it is a list, I don't know how to index the grouping column when plotting.

library(tidyverse)
library(ggbiplot)

iris %>%
  group_split(Species, keep = T) %>% #group by species, one pca per species
  {. ->> temp} %>%  # save intermediate output to preserve species column for use in plotting later
  map(~.x %>% select_if(is.numeric) %>% select_if(~var(.) != 0) %>% 
        prcomp(scale. = TRUE))%>% #run pca on numeric columns only
  map(~ggbiplot(.x), label=temp$Species) #plot each pca, labeling points as species names from the temporary object

This produces one PCA plot for each species in the iris data set, but since temp$Species is NULL, the points are not labelled.


Solution

  • If you use map2() and pass the list of per-group Species vectors as the second input (.y), you get the result I think you want. Note that in your original code the label argument sat outside the ggbiplot() call, so it never reached ggbiplot() and was ignored.

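    library(tidyverse)
    library(ggbiplot)
    
    iris %>%
      group_split(Species, keep = TRUE) %>%   # one data frame per species
      {. ->> temp} %>%                        # save the split list for later use
      map(~.x %>% 
            select_if(is.numeric) %>%         # prcomp() needs numeric columns only
            select_if(~var(.) != 0) %>%       # drop zero-variance columns before scaling
            prcomp(scale. = TRUE)) %>% 
      map2(map(temp, "Species"), ~ggbiplot(.x, labels = .y))  # pair each PCA with its species labels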
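
    To see why pairing the two lists works, here is a minimal sketch. It leaves ggbiplot out and uses select(where(...)), which assumes dplyr 1.0 or later; the names groups, pcas, species_labels and aligned are only illustrative. map(groups, "Species") pulls the Species column out of each data frame, so it returns one label vector per group, in the same order as the list of PCAs:

    ```r
    library(dplyr)
    library(purrr)

    # One data frame per species
    groups <- iris %>% group_split(Species)

    # One PCA per group, run on the numeric columns only
    pcas <- map(groups, ~ .x %>%
                  select(where(is.numeric)) %>%
                  prcomp(scale. = TRUE))

    # One vector of species labels per group, in the same order as pcas
    species_labels <- map(groups, "Species")

    # Each PCA has one row of scores per observation, so the number of
    # score rows should equal the number of labels for that group
    aligned <- map2_lgl(pcas, species_labels, ~ nrow(.x$x) == length(.y))
    aligned
    ```

    Because both lists come from the same split, map2() can walk them in parallel and each plot receives the labels of its own group.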
    


    In response to your comment: if you want to add a third argument, you can use pmap() instead of map2(). In the example below, pmap() is passed a nested list holding the data for each ggbiplot() argument, and ..1, ..2 and ..3 refer to its first, second and third elements. Note I've changed the new variable so that it's a factor and not constant across groups.

    iris %>%
      mutate(new = factor(sample(1:3, 150, replace = TRUE))) %>%  # random factor, varies within groups
      group_split(Species, keep = TRUE) %>% 
      {. ->> temp} %>%                        # save the split list for later use
      map(~.x %>% 
            select_if(is.numeric) %>%
            select_if(~var(.) != 0) %>% 
            prcomp(scale. = TRUE)) %>% 
      list(map(temp, "Species"), map(temp, "new")) %>%  # list of: PCAs, labels, groups
      pmap(~ ggbiplot(pcobj = ..1, labels = ..2, groups = ..3))
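
    If the global assignment with ->> feels fragile, one alternative (a sketch, not the only way) is to run the PCA and build the plot inside a single map() call, so each data frame is still in scope when its labels are needed. This assumes dplyr 1.0 or later for where(), calls ggbiplot through ggbiplot:: rather than attaching it, and uses the hypothetical names plot_group and plots:

    ```r
    library(dplyr)
    library(purrr)

    set.seed(1)  # make the random 'new' column reproducible

    # Hypothetical helper: run the PCA and build the biplot for one group
    plot_group <- function(df) {
      pca <- df %>%
        select(where(is.numeric)) %>%        # numeric columns only
        select(where(~ var(.x) != 0)) %>%    # drop zero-variance columns
        prcomp(scale. = TRUE)
      # df is still available here, so no saved intermediate is needed
      ggbiplot::ggbiplot(pca, labels = df$Species, groups = df$new)
    }

    plots <- iris %>%
      mutate(new = factor(sample(1:3, 150, replace = TRUE))) %>%
      group_split(Species) %>%
      map(plot_group)
    ```

    plots is then a list with one biplot per species; printing plots[[1]] shows the first one.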
    
