rggplot2likert

Plotting Likert-type questions with some non-respondents


I am trying to plot Likert-type data, and am using code adapted from this site: https://scisley.github.io/articles/ggplot-likert-plot/

The code to reproduce the example:

# load packages
library(dplyr)
library(ggplot2)
library(knitr)
library(tidyr)
library(scales)

# Simulation N responses
N <- 50
answers <- c("Strongly Disagree","Somewhat Disagree","Neither Agree nor Disagree",
              "Somewhat Agree", "Strongly Agree")
set.seed(12342)
d <- tibble(
  id = paste0("Respondent", 1:N),
  Q1 = sample(answers, N, replace=TRUE),
  Q2 = sample(answers, N, replace=TRUE),
  Q3 = sample(answers, N, replace=TRUE),
  Q4 = sample(answers, N, replace=TRUE),
  Q5 = sample(answers, N, replace=TRUE)
)

# reduce data
d.reduced <- d %>%
  select(-id) %>%
  gather("Q", "ans") %>%
  group_by(Q, ans) %>%
  summarize(n=n()) %>%
  mutate(per = n/sum(n),
         ans = factor(ans, levels=answers)) %>%
  arrange(Q, ans)

# create plot data
stage1 <- d.reduced %>%
  mutate(text = paste0(formatC(100 * per, format="f", digits=0), "%"),
         cs = cumsum(per),
         offset = sum(per[1:(floor(n()/2))]) + (n() %% 2)*0.5*(per[ceiling(n()/2)]),
         xmax = -offset + cs,
         xmin = xmax-per) %>%
  ungroup()

# order plot data
gap <- 0.2
stage2 <- stage1 %>%
  left_join(stage1 %>%
              group_by(Q) %>%
              summarize(max.xmax = max(xmax)) %>%
              mutate(r = row_number(max.xmax)),
            by = "Q") %>%
  arrange(desc(r)) %>%
  mutate(ymin = r - (1-gap)/2,
         ymax = r + (1-gap)/2)

# create plot
ggplot(stage2) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill=ans)) +
  geom_text(aes(x=(xmin+xmax)/2, y=(ymin+ymax)/2, label=text), size = 3) +
  scale_x_continuous("", labels=percent, breaks=seq(-1, 1, len=9), limits=c(-1, 1)) +
  scale_y_continuous("", breaks = 1:n_distinct(stage2$Q),
                     labels=rev(stage2 %>% distinct(Q) %>% .$Q)) +
  scale_fill_brewer("", palette = "BrBG")

The issue I have is that this requires all respondents to answer all questions, since it is assuming all rectangles will have the same length. If I have NAs, like I have in d3 below, the bars do not center properly in the plot.

# create data.frame with NAs
d2 <- apply (d[2:ncol(d)], 2,
             function(x) {x[sample( c(1:N), floor(N/10))] <- NA; x} )
d3 <- cbind(d[,1], d2) 

How can the code be adjusted to create a plot for d3 like we have for d above? I think I have isolated the issue to the offset in stage1 but I can't see how to adjust it to account for NAs.


Solution

  • If you want to remove NA values, do it at the summary stage before calculating percentages. To be honest, the code you have there seems a bit more complex than it needs to be. Here's a shorter version in a single pipeline that additionally handles NA values and puts the questions in the correct order:

    d3 %>%
      pivot_longer(-id, names_to = "Q", values_to = "ans") %>%
      count(Q, ans) %>%
      filter(complete.cases(.)) %>%
      mutate(percent = n/sum(n), .by = Q) %>%
      mutate(Q = factor(Q, colnames(d)[-1]),
             ans = factor(ans, answers)) %>%
      arrange(Q, ans) %>%
      mutate(xmin = cumsum(lag(percent, 1, 0)), 
             xmax = cumsum(percent), .by = Q) %>%
      mutate(offset = (xmax[3] + xmax[2])/2, .by = Q) %>%
      mutate(across(starts_with("xm"), ~.x - offset)) %>%
      ggplot(aes(y = Q)) +
      geom_linerange(aes(xmin = xmin, xmax = xmax, color = ans), linewidth = 30,
                     key_glyph = draw_key_rect) +
      geom_text(aes((xmin + xmax)/2, label = percent(percent, 1)), size = 3) +
      scale_x_continuous(NULL, labels = percent, breaks = seq(-1, 1, len = 9), 
                         limits = c(-1, 1)) +
      scale_color_brewer(NULL, palette = "BrBG")
    

    enter image description here