I am trying to make a volcano plot (dot plot) where the points above a certain y value are colored in a gradient from red to green depending on their x value, and add a legend that specifies the number of points that are above these values.
I have code similar to this:
library(ggplot2)
library(dplyr)  # provides %>%
set.seed(123)
x <- runif(600, -3, 3)
y <- runif(600, 0, 0.6)
df <- data.frame(x, y)
df %>% ggplot(aes(x, -log10(y), color = x)) +
  geom_point() +
  geom_hline(yintercept = 1.3, color = "darkgrey") +
  scale_color_gradient(low = "red", high = "green")
Which (with my data) produces this plot:
But I want the legend to show the number of points with y>1.3 & x>0 and with y>1.3 & x<0 separately (instead of the color bar), and I want the points below the line to be black.
Could someone help me?
Thanks!!!!
One point to note: legends in ggplot are only set up to explain how the aesthetics are represented. For a legend to display results or data (such as a tally of your observations), you have to use a different approach from what is built into ggplot2.
With that being said, here's an example using a subset of the diamonds dataset.
Note I'm using a sample of the diamonds dataset because I'm lazy and did not want to wait for the 50000+ points of data to render. :/
library(ggplot2)  # also provides the diamonds dataset
library(dplyr)
set.seed(12345)
di <- diamonds[sample(1:nrow(diamonds), 5000),]
I'm going to set up the plot with depth on the x axis and price on the y axis. We will count the observations that have price > 6000, split into high depth (> mean depth) and low depth (at or below the mean depth). We'll use this table later.
# Row 1 is the FALSE group (low depth), row 2 is the TRUE group (high depth)
di.summary <- as.data.frame(
  di %>% dplyr::filter(price > 6000) %>%
    group_by(depth > mean(di$depth)) %>% tally()
)
chartTable <- cbind(c('Low\nDepth', 'High\nDepth'), di.summary[,2])
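For the simulated data in the question, the same kind of tally might look like the sketch below. I'm assuming the threshold applies to the plotted value -log10(y), since that is what the horizontal line at 1.3 is drawn against. The names neg_log_y, side and counts are just ones I made up for illustration.

```r
library(dplyr)

# Simulated data from the question
set.seed(123)
df <- data.frame(x = runif(600, -3, 3), y = runif(600, 0, 0.6))

# Keep only points above the line, then count them by the sign of x.
# Note: a point with x exactly 0 would land in the "x < 0" group here.
counts <- df %>%
  mutate(neg_log_y = -log10(y)) %>%
  filter(neg_log_y > 1.3) %>%
  count(side = ifelse(x > 0, "x > 0", "x < 0"))

counts
```

The result has one row per side with the count in column n, which is the same shape as di.summary above.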
The following code illustrates a method you can use in your chart to change the color of only certain points. In this case, I want only points above 6000 in price to be colored, and all other points to be shown as gray dots. The easiest way to do this is to have two geom_point calls that use different datasets. One has a color aesthetic applied (within aes()) and the other has a gray color specified outside the aes() function.
p <- ggplot(di, aes(depth, price)) +
  geom_point(data=di[which(di$price > 6000),], aes(color=depth), size=1) +
  geom_point(data=di[which(di$price <= 6000),], color='gray80', size=1) +
  geom_hline(yintercept=6000) +
  geom_vline(xintercept=mean(di$depth), linetype=2) +
  scale_color_gradient(high='red', low='green')
p
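Applied to the data in the question, the two-layer idea could be sketched roughly like this. The points at or below the line are drawn in black, and only the points above it get the red-to-green gradient. I've used guide = "none" to drop the color bar, since the question asks not to show it. The names neg_log_y, cutoff and p_volcano are my own, not something from the question.

```r
library(ggplot2)

# Simulated data from the question
set.seed(123)
df <- data.frame(x = runif(600, -3, 3), y = runif(600, 0, 0.6))
df$neg_log_y <- -log10(df$y)
cutoff <- 1.3  # height of the horizontal line in the question

p_volcano <- ggplot(df, aes(x, neg_log_y)) +
  # Layer 1: points at or below the line, fixed color set outside aes()
  geom_point(data = df[df$neg_log_y <= cutoff, ], color = "black") +
  # Layer 2: points above the line, color mapped to x inside aes()
  geom_point(data = df[df$neg_log_y > cutoff, ], aes(color = x)) +
  geom_hline(yintercept = cutoff, color = "darkgrey") +
  scale_color_gradient(low = "red", high = "green", guide = "none") +
  labs(y = "-log10(y)")
p_volcano
```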
In order to display the table in your plot, we are going to have to use a "grob" (short for "graphical object"). I'm going to convert the table using tableGrob from the gridExtra package. You then pass that grob to annotation_custom() and specify its location within your chart.
One more point is that we are planning to put the table in the lower right corner outside the plot area (below the legend). In order to do this, we need to make room for the table by adding a plot margin on the right. We also need to turn clipping off so that the annotation can be represented outside the plot area.
library(gridExtra)
p +
  coord_cartesian(clip='off') +
  theme(
    plot.margin = margin(0,40,0,0)
  ) +
  annotation_custom(
    grob=tableGrob(chartTable, theme=ttheme_default(base_size = 9)),
    xmin=74.5, xmax=76, ymin=0, ymax=5000
  )
You can use a similar approach for your data.
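As a rough sketch of what that might look like with the simulated data from the question: the table position (xmin, xmax, ymin, ymax) and the right margin below are guesses on my part and will almost certainly need tuning for your real data and plot size. The names above, count_table and p_table are made up for this example.

```r
library(ggplot2)
library(gridExtra)

# Simulated data from the question
set.seed(123)
df <- data.frame(x = runif(600, -3, 3), y = runif(600, 0, 0.6))
df$neg_log_y <- -log10(df$y)
cutoff <- 1.3

# Points above the line, and a two-row table of counts by sign of x
above <- df[df$neg_log_y > cutoff, ]
count_table <- cbind(
  c("x < 0", "x > 0"),
  c(sum(above$x < 0), sum(above$x > 0))
)

p_table <- ggplot(df, aes(x, neg_log_y)) +
  geom_point(data = df[df$neg_log_y <= cutoff, ], color = "black") +
  geom_point(data = above, aes(color = x)) +
  geom_hline(yintercept = cutoff, color = "darkgrey") +
  scale_color_gradient(low = "red", high = "green", guide = "none") +
  coord_cartesian(clip = "off") +            # allow drawing outside the panel
  theme(plot.margin = margin(5, 80, 5, 5)) + # make room on the right (guess)
  annotation_custom(
    grob = tableGrob(count_table, theme = ttheme_default(base_size = 9)),
    xmin = 3.5, xmax = 4.5, ymin = 1, ymax = 2  # placement is a guess
  )
p_table
```

Because the color bar is hidden here, the table effectively takes the place of the legend on the right side of the plot.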
An alternative to using a tableGrob is to show the tally of points via text annotations. I'll show an example of that here:
p +
  annotate(
    geom='label',
    x=min(di$depth), y=0.8*max(di$price),
    hjust=0,
    label=paste0('n=',di.summary[1,2])
  ) +
  annotate(
    geom='label',
    x=max(di$depth), y=0.8*max(di$price),
    hjust=1,
    label=paste0('n=',di.summary[2,2])
  )
While not your data, the above example should give you enough information to figure out how these can apply to your own data.
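If it helps, here is a minimal sketch of the annotation variant using the simulated data from the question. The label positions (top left and top right of the data range) are an arbitrary choice on my part, and n_neg, n_pos and p_labels are names I invented for the example.

```r
library(ggplot2)

# Simulated data from the question
set.seed(123)
df <- data.frame(x = runif(600, -3, 3), y = runif(600, 0, 0.6))
df$neg_log_y <- -log10(df$y)
cutoff <- 1.3

# Counts of points above the line on each side of x = 0
above <- df[df$neg_log_y > cutoff, ]
n_neg <- sum(above$x < 0)
n_pos <- sum(above$x > 0)

p_labels <- ggplot(df, aes(x, neg_log_y)) +
  geom_point(data = df[df$neg_log_y <= cutoff, ], color = "black") +
  geom_point(data = above, aes(color = x)) +
  geom_hline(yintercept = cutoff, color = "darkgrey") +
  scale_color_gradient(low = "red", high = "green", guide = "none") +
  # One label per side, anchored to the left and right edges of the data
  annotate("label", x = min(df$x), y = max(df$neg_log_y),
           hjust = 0, label = paste0("n=", n_neg)) +
  annotate("label", x = max(df$x), y = max(df$neg_log_y),
           hjust = 1, label = paste0("n=", n_pos))
p_labels
```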