Tags: r, ggplot2, data-visualization, jitter

R control jitter function - avoid overplotting / non-random jitter


My problem seems simple: I am using ggplot2 with geom_jitter() to plot a variable (take my picture as an example).

geom_jitter() adds some random noise to the points (the variable is just called "1" in this example) to prevent overplotting. So I now have random noise in the y-direction, and what would otherwise be completely overplotted is clearly more visible.
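As a side note on the randomness itself, here is a minimal sketch that makes the jitter reproducible between renders by passing a seed to ggplot2's position_jitter(). It does not remove overlaps, it only stops the noise from changing on every redraw. The toy data frame is made up for illustration.

```r
library(ggplot2)

# Made-up toy data with several tied or nearly tied values
toy <- data.frame(id = "1", input = c(112, 112, 118, 118, 118, 230, 231, 405))

# geom_point() + position_jitter() is what geom_jitter() does internally;
# the seed fixes the random offsets so every render looks the same.
p <- ggplot(toy, aes(x = id, y = input)) +
  geom_point(position = position_jitter(width = 0.3, height = 0, seed = 42)) +
  coord_flip()
p
```

With height = 0 the values themselves are left untouched and only the position along the category axis is perturbed.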

But here is my question:

As you can see, there are still some points that overplot each other. In my example this could easily be prevented if the noise in the y-direction were not random, but consisted of more strategically placed offsets.

Can I somehow alter the geom_jitter() behavior, or is there a similar function in ggplot2 that does exactly this?
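To make "strategically placed offsets" concrete, here is a rough sketch of the idea in plain R: bin the values, then spread the points of each bin evenly around zero instead of drawing random noise. stack_offsets() is a hypothetical helper written for this illustration (it is not part of ggplot2), and the toy data and the step size of 0.05 are made up.

```r
library(ggplot2)

# Hypothetical helper: deterministic offsets instead of random jitter.
# Points falling into the same bin are spaced `step` apart, centred on 0.
stack_offsets <- function(values, binwidth, step = 0.05) {
  bin <- floor(values / binwidth)
  # position of each point within its bin, in order of appearance: 1, 2, 3, ...
  rank_in_bin <- ave(values, bin, FUN = seq_along)
  # number of points sharing the bin
  bin_size <- ave(values, bin, FUN = length)
  # centre the ranks so each bin's offsets are symmetric around 0
  (rank_in_bin - (bin_size + 1) / 2) * step
}

set.seed(1)  # only to make the toy data reproducible
toy <- data.frame(input = round(runif(60, 100, 600)))
toy$offset <- stack_offsets(toy$input, binwidth = 15)

ggplot(toy, aes(x = input, y = offset)) +
  geom_point() +
  ylim(-0.5, 0.5)
```

Points within a bin can still touch if the bin is wider than the drawn point size, so binwidth and step have to be tuned to the plot dimensions.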

Not really a minimal example, but also not too long:

library("imputeTS")
library("ggplot2")

data <- tsAirgap


# 2.1 Create required data

# Get all indices of the data that comes directly before and after an NA

na_indx_after <- which(is.na(data[1:(length(data) - 1)])) + 1
# Subsetting from index 2 already shifts every index by one, so no -1 is needed for "before"
na_indx_before <- which(is.na(data[2:length(data)]))

# Get the actual values to the indices and put them in a data frame with a label
before <- data.frame(id = "1", type = "before", input = na_remove(data[na_indx_before]))
after <- data.frame(id = "1", type = "after", input = na_remove(data[na_indx_after]))
all <- data.frame(id = "1", type = "source", input = na_remove(data))

# Get n values for the plot labels
n_before <- length(before$input)
n_all <- length(all$input)
n_after <- length(after$input)



# 2.4 Create dataframe for ggplot2

# join the data together in one dataframe
df <- rbind(before, after, all)


# Create the plot

gg <- ggplot(data = df) +
  geom_jitter(mapping = aes(x = id, y = input, color = type, alpha = type), width = 0.5, height = 0.5)

# Colors per group (trailing commas inside the calls removed)
gg <- gg + ggplot2::scale_color_manual(
  values = c("before" = "skyblue1", "after" = "yellowgreen", "source" = "gray66")
)

# Make the "source" points more transparent than the highlighted ones
gg <- gg + ggplot2::scale_alpha_manual(
  values = c("before" = 1, "after" = 1, "source" = 0.3)
)

gg + ggplot2::theme_linedraw() + theme(aspect.ratio = 0.5) + ggplot2::coord_flip()

So many good suggestions... Here is what Ben's suggestion would look like for my example:

I changed parts of my code to:

gg <- ggplot(data = df, aes(x = input,  color = type, fill = type, alpha = type)) +
  geom_dotplot(binwidth = 15) 


This would basically also work as intended for me. ggbeeswarm, as suggested by Jon, also worked great for my purpose.
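One thing to watch with geom_dotplot() and several groups: by default each group is stacked on its own, so dots of different types may still be drawn on top of each other. A minimal sketch, assuming made-up toy data instead of the df above, that uses the stackgroups and method arguments of geom_dotplot() to stack all groups into shared bins:

```r
library(ggplot2)

set.seed(1)  # only to make the toy data reproducible
toy <- data.frame(
  input = round(runif(90, 100, 600)),
  type  = rep(c("before", "after", "source"), each = 30)
)

p <- ggplot(toy, aes(x = input, fill = type, color = type)) +
  geom_dotplot(
    binwidth    = 15,
    method      = "histodot",  # fixed-width bins, like a histogram
    stackgroups = TRUE         # stack the groups on top of each other
  ) +
  # the count axis of a dot plot is not meaningful, so hide it
  scale_y_continuous(NULL, breaks = NULL)
p
```

The binwidth of 15 is simply carried over from the call above; it needs to match the data range and the plot size for the dots to sit next to each other without touching.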


Solution

  • I thought of a hack I really like, using ggrepel. It's normally used for labels, but nothing prevents you from turning the label into a point.

    df <- data.frame(x = rnorm(200),
                     col = sample(LETTERS[1:3], 200, replace = TRUE),
                     y = 1)
    
    ggplot(df, aes(x, y, label = "●", color = col)) + # using unicode black circle
      ggrepel::geom_text_repel(segment.color = NA, 
                               box.padding = 0.01, key_glyph = "point")
    


    A downside of this method is that ggrepel can take a long time for a large number of points, and it recalculates the positions differently each time you change the plot size. A faster alternative is ggbeeswarm::geom_quasirandom, which uses a deterministic process to define jitter that looks random.

    ggplot(df, aes(x,y, color = col)) +
      ggbeeswarm::geom_quasirandom(groupOnX = FALSE)
    

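To give an idea of what "a deterministic process that looks random" can mean, here is a small base R sketch of a van der Corput sequence, a well-known low-discrepancy sequence. This is only an illustration of the general idea and not ggbeeswarm's implementation, which also takes the local density of the data into account; van_der_corput() and the toy values are written for this example.

```r
# Van der Corput sequence: write i in the given base and mirror its digits
# around the radix point. The result fills (0, 1) evenly without repeating.
van_der_corput <- function(n, base = 2) {
  vapply(seq_len(n), function(i) {
    value <- 0
    denom <- 1
    while (i > 0) {
      denom <- denom * base
      value <- value + (i %% base) / denom
      i <- i %/% base
    }
    value
  }, numeric(1))
}

x <- c(2.1, 2.1, 2.1, 2.1, 5.0, 5.0)  # heavily tied toy values

# Rescale from (0, 1) to (-0.4, 0.4) so the offsets are centred on 0
offset <- (van_der_corput(length(x)) - 0.5) * 0.8

data.frame(x, offset)
```

Because the sequence is fixed, the same input always gets the same offsets, which is what keeps such a plot stable across redraws and resizes.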