r ggplot2 facet mosaic-plot

Faceted mosaic plots with the same area scaling


I have data with two logical variables (a treatment variable and an outcome variable) and one categorical variable. I want to use a mosaic plot to visualize the logical dimensions, faceted by the categorical variable, especially because some subjects are represented in more than one category.

The problem is that the number of observations differs greatly between categories (by an order of magnitude). I want to keep the same scale across categories, so that the overall size of each facet is proportional to its number of observations.

[Image: faceted mosaic plot]

I'm using the ggmosaic package. I tried facet_wrap, but the facets are rescaled so that they aren't proportional to each other. I would like the area of the spe facet to be 10x the size of the ory facet to reflect the number of observations.

library(ggplot2)
library(ggmosaic)

# tx_fig_data: one row per observation, with columns permissive, tx and organism
total_plot <- ggplot(tx_fig_data) +
  geom_mosaic(aes(x = product(permissive), fill = tx), show.legend = FALSE) +
  theme_mosaic() +
  geom_mosaic_text(aes(x = product(permissive), fill = tx,
                       label = after_stat(.wt)), show.legend = FALSE) +
  facet_wrap(facets = vars(organism))

Solution

  • This strikes me as a helpful feature that, to my knowledge, isn't currently possible with the ggmosaic package. (Happy to stand corrected.)

    While this isn't a complete solution (e.g. I haven't hacked the y axis labels yet), we could create this effect somewhat manually with dplyr. Here, I draw each facet as a square whose area, relative to the largest facet, is proportional to its number of observations, so each side is scaled by the square root of that ratio. Alternatively, the facets could be scaled purely in x or y, or perhaps using some more complicated rubric, such as solving for what makes for the most "square" cells.

    In my example, I use mtcars where the x axis reflects counts of am values (0/1) first, and then those regions are split vertically based on vs values (0/1).

    EDIT - I have added a function to semi-manually add x labels. The process for determining y labels eludes me for the moment, as I'm not sure from which vertical slices we should extract our labels.


    library(dplyr); library(ggplot2)
    mtcars |>
      count(cyl, am, vs) |>
      arrange(cyl, am, vs) -> mtcars1  # <--- new example count data
    
    widths <- mtcars1 |>
      count(cyl, am, wt = n, name = "x_n") |>
      add_count(cyl, wt = x_n, name = "facet_scale") |>
      mutate(facet_scale = facet_scale / max(facet_scale)) |>
      mutate(x_width = x_n / sum(x_n) * facet_scale ^ 0.5, 
             x_mid = cumsum(x_width) - x_width/2,
             .by = cyl)
    
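    mtcars1 |>
      left_join(widths, by = c("cyl", "am")) |>  # explicit join keys
      mutate(y_height = n / sum(n) * facet_scale ^ 0.5, 
             y_mid = cumsum(y_height) - y_height/2,
             .by = c(cyl, am)) |>
      # COL = facet position in panel order (1, 2, 3 for cyl 4, 6, 8)
      mutate(COL = dense_rank(cyl)) -> staged
    
    # Build the x scale for the facet at position `pos`: one break per am column
    make_x_breaks <- function(pos) {
      df1 <- staged |>
        filter(COL == pos) |>
        distinct(COL, am, x_mid)
      scale_x_continuous(breaks = df1$x_mid, labels = df1$am)
    }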
    
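    staged |>
      ggplot(aes(x_mid, y_mid, width = x_width, height = y_height, 
                 fill = factor(vs), label = n)) +
      geom_tile() +
      geom_text() +
      # with free scales each panel would otherwise zoom to its own data;
      # forcing every panel to span 0-1 keeps the tile areas comparable
      expand_limits(x = c(0, 1), y = c(0, 1)) +
      facet_wrap(~cyl, scales = "free") +
      ggh4x::facetted_pos_scales(
        x = list(
          make_x_breaks(1),
          make_x_breaks(2),
          make_x_breaks(3))
        )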
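
    As a quick sanity check on the scaling, here is a minimal sketch that recomputes the tile sizes and confirms that each tile's area is proportional to its count. The helper tile_sizes() and its x_pow / y_pow arguments are hypothetical names I made up for illustration; they are not part of ggmosaic or ggh4x. The only assumption is that the two exponents sum to 1, which is what keeps a facet's total area proportional to its number of observations.

    ```r
    library(dplyr)

    counts <- mtcars |> count(cyl, am, vs)  # same count data as above

    # Hypothetical helper, not part of ggmosaic or ggh4x.
    # x_pow and y_pow split the facet scaling between width and height;
    # they must sum to 1 for tile area to stay proportional to n.
    tile_sizes <- function(counts, x_pow = 0.5, y_pow = 1 - x_pow) {
      counts |>
        mutate(facet_n = sum(n), .by = cyl) |>
        mutate(facet_scale = facet_n / max(facet_n)) |>
        mutate(x_n = sum(n), .by = c(cyl, am)) |>
        mutate(x_width  = x_n / facet_n * facet_scale^x_pow,
               y_height = n / x_n * facet_scale^y_pow,
               area     = x_width * y_height)
    }

    square <- tile_sizes(counts)             # square facets, as in the code above
    x_only <- tile_sizes(counts, x_pow = 1)  # full-height facets, narrower widths

    # Under either rule, area should equal n divided by the largest facet's count
    largest <- max(square$facet_n)
    stopifnot(
      isTRUE(all.equal(square$area, square$n / largest)),
      isTRUE(all.equal(x_only$area, x_only$n / largest))
    )
    ```

    With x_pow = 0.5 this reproduces the square facets used above; x_pow = 1 (or 0) gives the "scaled purely in x or y" variant. Which one reads best probably depends on how unbalanced the facets are.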