This is a different question but follows on from this: R boxplot Subset column based on value in another column
UPDATED
My dataset looks like this:
Term | Name | True | Result | Gender |
---|---|---|---|---|
T1 | Name1 | True | 4 | F |
T2 | Name2 | False | 6 | F |
T3 | Name3 | True | 5.5 | M |
T3 | Name4 | False | 4.6 | M |
The test dataset:
dataset_test <- structure(list(Term = c("T1", "T1", "T1", "T1", "T1", "T1", "T2",
"T2", "T2", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T3",
"T3", "T3"), Name = c("Name1", "Name2", "Name3", "Name4", "Name5",
"Name6", "Name5", "Name6", "Name7", "Name8", "Name9", "Name10",
"Name11", "Name12", "Name13", "Name14", "Name15", "Name16", "Name17",
"Name18"), TRUE. = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE,
TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
FALSE, TRUE, TRUE), Result = c(4, 5, 6, 4, 5, 6, 5.5, 4.6, 5.5,
4.6, 5, 5.2, 6, 5.5, 4, 5.5, 4.8, 5, 5, 4.4), Gender = c("F",
"F", "F", "M", "M", "M", "F", "F", "F", "F", "M", "M", "M", "F",
"F", "F", "F", "M", "M", "M")), class = "data.frame", row.names = c(NA,
-20L))
I have a box plot grouped by gender (code below). I want to highlight the points inside the correct gender's box, i.e. each point should line up with the box for the gender of its True record.
Solution, credited to chemdork123:
library(dplyr)
library(ggplot2)

dataset_test %>%
  group_by(Term) %>%
  filter(any(TRUE.)) %>%  # keep only terms with at least one TRUE record
  ggplot(aes(x = Term, y = Result, fill = Gender)) +
  scale_fill_brewer(palette = "Blues") +
  geom_boxplot(position = position_dodge(0.8)) +
  geom_point(  # add the highlight points
    data = subset(dataset_test, TRUE. == TRUE),
    aes(x = Term, y = Result), position = position_dodge(0.8),
    color = "blue", size = 4, show.legend = FALSE) +
  ggtitle("Distribution of results by term") +
  xlab("Term") + ylab("Result")
position_dodge now works perfectly when a term has True records for both genders, but it breaks when only one gender has a True record. Unfortunately, that is the main use case for this visualisation.
The code above produces this:
Again, any help would be greatly appreciated.
You were probably close: you need to use position_dodge on the geom_point() call. To be sure that the points align correctly with the boxplots, you should also explicitly define the width of position_dodge for the boxplot geom. I also include show.legend=FALSE for geom_point() here, since you likely don't want the blue dots in the legend as in your example:
library(dplyr)
library(ggplot2)

dataset_test %>%
  group_by(Term) %>%
  filter(any(TRUE.)) %>%  # keep only terms with at least one TRUE record
  ggplot(aes(x = Term, y = Result, fill = Gender)) +
  scale_fill_brewer(palette = "Blues") +
  geom_boxplot(position = position_dodge(0.8)) +  # same dodge width as the points
  geom_point(  # add the highlight points
    data = subset(dataset_test, TRUE. == TRUE),
    aes(x = Term, y = Result), position = position_dodge(0.8),
    color = "blue", size = 4, show.legend = FALSE) +
  ggtitle("Distribution of results by term") +
  xlab("Term") + ylab("Result")
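A possible follow-up for the case raised in the question, where a term has True records for only one gender: with its default settings position_dodge only counts the groups present in the highlight data at each x position, so a lone group is left at the centre of the term instead of being shifted onto its box. One workaround that is often suggested (a sketch, not checked against every ggplot2 version) is to pad the highlight data with an NA row for each missing Term/Gender combination, so that both dodge slots exist. The names highlight, slots and highlight_padded are made up for this example, and the data frame is a compact rebuild of the Term, TRUE., Result and Gender columns of dataset_test:

```r
# Compact rebuild of dataset_test (Name column omitted; values as in the question).
dataset_test <- data.frame(
  Term   = rep(c("T1", "T2", "T3"), times = c(6, 7, 7)),
  TRUE.  = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
             TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE,
             TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE),
  Result = c(4, 5, 6, 4, 5, 6, 5.5, 4.6, 5.5, 4.6, 5, 5.2, 6,
             5.5, 4, 5.5, 4.8, 5, 5, 4.4),
  Gender = rep(c("F", "M", "F", "M", "F", "M"), times = c(3, 3, 4, 3, 4, 3))
)

# Records to highlight (hypothetical helper name).
highlight <- subset(dataset_test, TRUE.)

# Every Term/Gender combination that gets a box in the plot.
slots <- unique(dataset_test[c("Term", "Gender")])

# Left join on Term and Gender: a combination without a TRUE record
# ends up as a single row with Result = NA.
highlight_padded <- merge(slots, highlight,
                          by = c("Term", "Gender"), all.x = TRUE)

# Rows that exist only as padding.
highlight_padded[is.na(highlight_padded$Result), c("Term", "Gender")]
```

In the plot, highlight_padded would replace the subset(...) call passed as data to geom_point(); adding na.rm = TRUE there should suppress the warning about removed rows. As far as I can tell, the NA rows are still present when the dodge is computed and are only dropped when the points are drawn, but it is worth checking the result on your ggplot2 version.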