r ggplot2 ggpubr geom-histogram

ggarrange not working with list of histograms and barplots, but histograms working correctly inside the function where they are appended to the list


I am trying to create a classPanelGraph for my Statistical Analysis assignment (reference for classPanelGraphs) in R. To do this, I have prepared a small test data frame with just 5 variables: x and y (draws from a random uniform and an exponential distribution), cat and cat2 (two random categorical variables, i.e. factors in R, with 3 levels in the case of cat and 2 in the case of cat2), and class, which holds the same values as cat2 and simulates the classes that a clustering method would create.

Then, I use a function that loops over the dataset's variables and creates a bar plot if the variable is a factor, and a histogram if it is numeric or integer. Finally, I add each of these plots to a list that the function returns.

However, when I run the function and pass the resulting list of plots to ggarrange from the ggpubr library, an error is raised for the histograms. It implies that the x variable declared in the plot's aesthetics is not continuous and so cannot be used with geom_histogram() in ggplot2.

The program being used is exactly this one:

library(ggplot2)
library(ggpubr)

#Creation of testing variables
x <- runif(1000)
y <- rexp(1000)
cat <- sample.int(3,1000,prob=c(0.6,0.3, 0.1), replace=TRUE)
class <- sample.int(2, 1000, replace=TRUE)


data <- data.frame(x,y, cat, class)

data[,"cat2"] <- data["class"]

data[,"cat"] <- as.factor(data[,"cat"])
data[,"class"] <- as.factor(data[,"class"])
data[,"cat2"] <- as.factor(data[,"cat2"])


classPanelGraph <- function(data){
  plots <- list()
  for (var in names(data)){
    if (is.factor(data[,var])){
      plots[[var]] <- ggplot(data=data,aes(x=data[,var]))+
        geom_bar()+
        facet_grid(class ~ .)+ ylab( "") + xlab(var)

    } else {
      error_plot<-ggplot(data=data, aes(x=data[,var]))+
        geom_histogram()+
        facet_grid(class ~ .)+ ylab("") + xlab(var)
      plots[[var]] <- error_plot
      print(error_plot)
    }
  }
  return(plots)
}


ll <- classPanelGraph(data)
ll[[1]]
ggarrange(plotlist=ll)

As can be seen and tested, the function prints the histogram labeled error_plot perfectly inside the function. But when I try to access it in the returned list, either through ggarrange() or by accessing the plot directly (ll[[1]]), I get the following error:

Error in `geom_histogram()`:
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in `setup_params()`:
! `stat_bin()` requires a continuous x aesthetic
✖ the x aesthetic is discrete.
ℹ Perhaps you want `stat="count"`?
Run `rlang::last_trace()` to see where the error occurred.

I am guessing it has something to do with how the plots are stored, or with a difference between the local environment inside the function, where the plot is created with no issues, and the global one, where it cannot be created. I have not found any other post or manual covering this exact issue. In case anyone is wondering, if only the categorical variables are considered, the program works perfectly.


Solution

  • I'm virtually certain this is a lazy evaluation problem.

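    The plots in your list are not fully evaluated when your function is called, but only when you access its results. aes() stores the expression data[,var] and evaluates it only when the plot is drawn. Inside the function, print(error_plot) draws the plot while var still holds the current column name, so it works. Once the function has returned, var is equal to the final column name in your data frame (cat2, a factor), and hence all of the plots in the list are drawn with this column as their x aesthetic, regardless of whether you actually wanted a histogram or a bar plot. That is why geom_histogram() complains about a discrete x.

    In R, a for loop reuses one var variable across all iterations, so every stored expression sees whatever value the loop left behind. The apply family calls your function once per element, and each call gets its own var. [This is one of the reasons why many people would say "If you're coding in R and thinking of using a for loop, there's probably a better way to do it...".]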
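
    The capture behaviour behind this can be reproduced without ggplot2. Below is a minimal base-R sketch in which plain closures stand in for the unevaluated aes() expression. The names cols, from_loop and from_lapply are made up for the illustration and are not part of your code. It assumes R 3.2.0 or later, where lapply forces the argument of each call.

    ```r
    cols <- c("x", "y", "cat")

    # for loop: every closure looks up the same `var`, which the loop overwrites
    from_loop <- list()
    for (var in cols) {
      from_loop[[var]] <- function() var
    }

    # lapply: each call of the outer function has its own `var`
    from_lapply <- lapply(cols, function(var) function() var)

    # Call the stored closures only now, after both constructs have finished
    loop_result   <- vapply(from_loop,   function(f) f(), character(1))
    lapply_result <- vapply(from_lapply, function(f) f(), character(1))

    print(loop_result)    # every element is "cat"
    print(lapply_result)  # "x" "y" "cat"
    ```

    The loop version reports the last column name for every element, which mirrors every plot ending up with the last column as its x aesthetic.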

    So, let's convert your function to an lapply call:

    plots <- lapply(
      names(data),
      function(var) {
        if (is.factor(data[,var])){
          ggplot(data=data,aes(x=data[,var]))+
            geom_bar()+
            facet_grid(class ~ .)+ ylab( "") + xlab(var)
        } else {
          ggplot(data=data, aes(x=data[,var]))+
            geom_histogram()+
            facet_grid(class ~ .)+ ylab("") + xlab(var)
        }
      }
    )
    plots
    [[1]]
    `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
    
    [[2]]
    `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
    
    [[3]]
    
    [[4]]
    
    [[5]]
    

    And four plots appear in the viewer pane:

    [Four images of the resulting plots]
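
    If you would rather keep the for loop, one option (a sketch, not the only possible fix) is to inject the column name into aes() with !! and rlang::sym(), so the mapping is fixed when the plot is created rather than when it is drawn. The data frame df and the function name class_panel_graph below are stand-ins that mirror your data, and bins = 30 is set only to silence the stat_bin() message.

    ```r
    library(ggplot2)

    # Stand-in for the question's data frame (same column names and types)
    set.seed(1)
    df <- data.frame(
      x     = runif(200),
      y     = rexp(200),
      cat   = factor(sample.int(3, 200, replace = TRUE, prob = c(0.6, 0.3, 0.1))),
      class = factor(sample.int(2, 200, replace = TRUE))
    )
    df$cat2 <- df$class

    class_panel_graph <- function(data) {
      plots <- list()
      for (var in names(data)) {
        # `!!` inserts the current column name as a symbol when aes() is called,
        # so the mapping no longer depends on the value of `var` at print time
        p <- ggplot(data, aes(x = !!rlang::sym(var))) +
          facet_grid(class ~ .) +
          ylab("") +
          xlab(var)
        plots[[var]] <- if (is.factor(data[[var]])) {
          p + geom_bar()
        } else {
          p + geom_histogram(bins = 30)
        }
      }
      plots
    }

    plots <- class_panel_graph(df)
    ```

    The list is named by column, and ggarrange(plotlist = plots) should then arrange it as in your original call.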