I am trying to create a raincloud plot to show scores by sex; however, it is subgrouping the points based on their individual score values. I want it to look like this image, where `Petal.Length` is grouped by `Species` rather than by the length values themselves. This code has worked with other data sets, but I am not sure what the issue is here.
I have also checked whether the score scale is continuous or discrete, and it is continuous.
Here is the code I am using in R:
```r
dplyr::group_by(sex) %>%
  dplyr::mutate(
    mean = mean(score),
    se = sd(score) / sqrt(length(score)),
    sex_y = paste0(sex, "\n(", n(), ")")
  ) %>%
  ungroup() %>%
  ggplot(aes(x = NIH_score, y = sex_y)) +
  stat_slab(aes(fill = sex)) +
  geom_point(aes(color = sex), shape = 16,
             position = ggpp::position_jitternudge(height = 0.125, width = 0,
                                                   y = -0.125,
                                                   nudge.from = "jittered")) +
  scale_fill_brewer(palette = "Set1", aesthetics = c("fill", "color")) +
  geom_errorbar(aes(
    xmin = mean - 1.96 * se,
    xmax = mean + 1.96 * se
  ), width = 0.2) +
  stat_summary(fun = mean, geom = "point", shape = 16, size = 3.0) +
  theme_bw(base_size = 10) +
  theme(legend.position = "top") +
  labs(title = "Raincloud plot with ggdist", x = "score")
```
It's not that your data are being grouped by x-axis value. The bandwidth of the kernel density estimator is simply too small, so the slab shows a separate narrow peak at each distinct score instead of one smooth cloud.
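To see why a narrow bandwidth looks like "grouping by score", here is a minimal sketch using base R's `stats::density()` as a stand-in for the estimator behind `stat_slab()`. That substitution is an assumption: ggdist's default density estimator differs between versions, but its `adjust` argument is likewise a multiplier on the chosen bandwidth. `count_peaks()` is a hypothetical helper written only for this illustration.

```r
set.seed(1)
# Made-up integer scores from 2 to 8, like the data in the reproduction below
scores <- sample(2:8, 200, TRUE)

# Hypothetical helper: count local maxima that reach at least 10% of the
# tallest peak, ignoring tiny numerical ripples in the near-zero regions
count_peaks <- function(d) {
  y <- d$y
  idx <- which(diff(sign(diff(y))) == -2) + 1
  sum(y[idx] > 0.1 * max(y))
}

narrow <- density(scores, adjust = 0.1)  # bandwidth = 0.1 x the default
wide   <- density(scores, adjust = 2)    # bandwidth = 2 x the default

print(c(narrow_bw = narrow$bw, wide_bw = wide$bw))
print(c(narrow_peaks = count_peaks(narrow), wide_peaks = count_peaks(wide)))
```

With the small `adjust`, the curve has one peak per distinct score; with the larger one, those peaks merge, which is the difference between the two kinds of slab.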
Let's recreate your issue with essentially the same code but some made-up data, forcing a narrow bandwidth with `adjust = 0.1`:
```r
library(tidyverse)
library(ggdist)

set.seed(1)

df <- tibble(NIH_score = sample(2:8, 200, TRUE),
             sex = sample(c("Male", "Female"), 200, TRUE),
             score = NIH_score)

df %>%
  dplyr::group_by(sex) %>%
  dplyr::mutate(
    mean = mean(score),
    se = sd(score) / sqrt(length(score)),
    sex_y = paste0(sex, "\n(", n(), ")")
  ) %>%
  ungroup() %>%
  ggplot(aes(x = NIH_score, y = sex_y)) +
  stat_slab(aes(fill = sex), adjust = 0.1) +
  geom_point(aes(color = sex), shape = 16,
             position = ggpp::position_jitternudge(height = 0.125, width = 0,
                                                   y = -0.125,
                                                   nudge.from = "jittered")) +
  scale_fill_brewer(palette = "Set1", aesthetics = c("fill", "color")) +
  geom_errorbar(aes(
    xmin = mean - 1.96 * se,
    xmax = mean + 1.96 * se
  ), width = 0.2) +
  stat_summary(fun = mean, geom = "point", shape = 16, size = 3.0) +
  theme_bw(base_size = 10) +
  theme(legend.position = "top") +
  labs(title = "Raincloud plot with ggdist", x = "score")
```
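But if we instead increase the bandwidth by setting `adjust = 2` inside `stat_slab` (`adjust` multiplies the default bandwidth rather than setting it directly), the per-score spikes merge into one smooth cloud for each sex.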
It's not clear what it is about your settings or data that is giving such a narrow bandwidth (neither is included in your question), but you should be able to get the result you need by increasing `adjust`.
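A trimmed-down sketch of that fix, assuming the same made-up columns as the reproduction above. To keep it short and dependent only on ggplot2 and ggdist, the error bars and mean point are left out, and plain `position_nudge()` stands in for `ggpp::position_jitternudge()` (so the raw points are not jittered here); in your own plot you would keep those layers and only change `adjust`.

```r
library(ggplot2)
library(ggdist)

set.seed(1)
# Made-up data in the same shape as the reproduction above
df <- data.frame(NIH_score = sample(2:8, 200, TRUE),
                 sex = sample(c("Male", "Female"), 200, TRUE))

p <- ggplot(df, aes(x = NIH_score, y = sex)) +
  # adjust multiplies the estimator's default bandwidth: values > 1 smooth
  # more, while small values (e.g. 0.1) give a spike at each distinct score
  stat_slab(aes(fill = sex), adjust = 2) +
  # raw points nudged below each cloud (no jitter in this sketch)
  geom_point(aes(color = sex), shape = 16,
             position = position_nudge(y = -0.125)) +
  scale_fill_brewer(palette = "Set1", aesthetics = c("fill", "color")) +
  theme_bw(base_size = 10) +
  theme(legend.position = "top") +
  labs(x = "score")

print(p)
```

If `adjust = 2` is still too spiky (or too smooth) for your real data, try a few values; it is a relative multiplier, so its effect depends on the bandwidth the estimator picks for your scores.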