I have found a problem when plotting hex graphs with ggplot2: when I use geom_text to add text to the graph, it takes a very long time to render!
Here is a minimal, self-contained, reproducible example that shows the problem:
library(ggplot2)
d1 <- ggplot(diamonds, aes(carat, price)) +
  geom_hex()
d2 <- d1 +
  geom_text(x = 3, y = 5000,
            label = "y = 0.0243x + 0.298\nR^2 = 0.648, p < 2.2e-16")
system.time({
print(d1)
})
# user system elapsed
# 0.11 0.01 0.13
system.time({
print(d2)
})
# user system elapsed
# 0.75 2.84 3.61
Even for this relatively small dataset, geom_text makes the runtime more than 27 times longer. In my real code, it increases the runtime from 3.65 s to 162.30 s (more than 44 times longer), and that is only until the run completes; the plot takes even longer to show up in the pane.
I'm not sure what is causing this, but adding text to a graphic seems like one of the most basic operations, so I sincerely hope this can be fixed. In addition, because of this very long running time, R often crashes when I try to stop the run, which is very frustrating. Is there a more appropriate way to interrupt the process and keep R running normally?
ggplot2 expects aesthetics to be the same length as the data. Because geom_text() inherits the plot's data (diamonds), it recycles your single string to every row of that data. We can look at the data that ggplot2 uses with ggplot2::ggplot_build():
dat <- ggplot_build(d2)
lapply(dat$data, head,2)
# [[1]]
# fill x y width height density ndensity count ncount PANEL group colour linewidth linetype alpha
# 1 #132B43 0.3206657 -0.000001 0.1603333 616.5667 1.853912e-05 0.0001721467 1 0.0001721467 1 -1 NA 0.5 1 NA
# 2 #53ABEE 0.2404990 533.962395 0.1603333 616.5667 1.030033e-01 0.9564468928 5556 0.9564468928 1 -1 NA 0.5 1 NA
# [[2]]
# PANEL group colour size angle hjust vjust alpha family fontface lineheight x y label
# 1 1 -1 black 3.88 0 0.5 0.5 NA 1 1.2 3 5000 y = 0.0243x + 0.298\nR^2 = 0.648, p < 2.2e-16
# 2 1 -1 black 3.88 0 0.5 0.5 NA 1 1.2 3 5000 y = 0.0243x + 0.298\nR^2 = 0.648, p < 2.2e-16
The first data frame contains the coordinates of each hexagon. The second contains the same text, repeated 53,940 times (the number of rows of diamonds).
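To confirm the recycling directly, a short sketch can count the rows of the text layer's built data and check that every row carries the same label. It rebuilds d1 and d2 from the question; the variable names built and text_rows are only for illustration.

```r
library(ggplot2)

d1 <- ggplot(diamonds, aes(carat, price)) +
  geom_hex()
d2 <- d1 +
  geom_text(x = 3, y = 5000,
            label = "y = 0.0243x + 0.298\nR^2 = 0.648, p < 2.2e-16")

# ggplot_build() returns the computed data for each layer;
# the second element belongs to the geom_text() layer
built <- ggplot_build(d2)
text_rows <- nrow(built$data[[2]])

text_rows == nrow(diamonds)          # one text row per row of diamonds
length(unique(built$data[[2]]$label)) # all rows share the same string
```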
One way to see this is to render both plots as svg strings. You will see that the d2 string contains the following two lines repeated nrow(diamonds) times:
<text x='404.35' y='510.41' text-anchor='middle' style='font-size: 11.04px; font-family: "Arimo";' textLength='144.40px' lengthAdjust='spacingAndGlyphs'>R^2 = 0.648, p < 2.2e-16</text>
<text x='404.35' y='494.51' text-anchor='middle' style='font-size: 11.04px; font-family: "Arimo";' textLength='115.77px' lengthAdjust='spacingAndGlyphs'>y = 0.0243x + 0.298</text>
You cannot see the 107,880 lines of text when you look at the plot, because each copy is drawn exactly on top of the previous one. However, they take much longer to render. If you save the plot in a format that describes how to draw it, such as svg, the file will be much bigger (20 MB compared with 85 kB). If you save it as a representation of pixels, e.g. png, you will not observe any difference in the output.
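A minimal sketch of the svg comparison, assuming the svglite package is installed (ggsave() uses it to write .svg files). The helpers save_svg() and count_label() are hypothetical, and the exact byte sizes will depend on package versions and the width and height chosen.

```r
library(ggplot2)
# Assumption: svglite is installed; ggsave() relies on it for .svg output

label <- "y = 0.0243x + 0.298\nR^2 = 0.648, p < 2.2e-16"
d1 <- ggplot(diamonds, aes(carat, price)) +
  geom_hex()
d2 <- d1 +
  geom_text(x = 3, y = 5000, label = label)

# Hypothetical helper: save a plot to a temporary svg file and return its path
save_svg <- function(p) {
  path <- tempfile(fileext = ".svg")
  ggsave(path, p, width = 7, height = 5)
  path
}

# Hypothetical helper: count occurrences of the label's first line in the file
count_label <- function(path) {
  svg <- paste(readLines(path, warn = FALSE), collapse = "\n")
  hits <- gregexpr("y = 0.0243x + 0.298", svg, fixed = TRUE)[[1]]
  sum(hits > 0)  # gregexpr() returns -1 when there is no match
}

svg1 <- save_svg(d1)
svg2 <- save_svg(d2)  # slow: one pair of <text> elements per data row

file.size(c(svg1, svg2))
c(count_label(svg1), count_label(svg2))
```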
You do not want to map the text to your data, so for this kind of thing you should use ggplot2::annotate() instead. As the docs state:
"The properties of the geoms are not mapped from variables of a data frame, but are instead passed in as vectors. This is useful for adding small annotations (such as text labels)".
d3 <- d1 + annotate(
"text",
x = 3,
y = 5000,
label = "y = 0.0243x + 0.298\nR^2 = 0.648, p < 2.2e-16"
)
This generates the text only once, so it does not cause the slowdown:
system.time(print(d1))
# user system elapsed
# 0.11 0.01 0.13
system.time(print(d3))
# user system elapsed
# 0.141 0.000 0.141
Similarly, unlike the 20 MB d2.svg, d3 saved as svg is 85 kB, the same size as d1.
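The same ggplot_build() check can confirm that the annotate() layer holds a single row. The sketch also shows an alternative that is not from the answer above: passing geom_text() its own one-row data frame with inherit.aes = FALSE, so it neither inherits diamonds nor the plot's carat/price mapping. The names label_df and d4 are hypothetical.

```r
library(ggplot2)

d1 <- ggplot(diamonds, aes(carat, price)) +
  geom_hex()

# annotate(): the layer gets its own one-row data built from the vectors
d3 <- d1 + annotate(
  "text",
  x = 3,
  y = 5000,
  label = "y = 0.0243x + 0.298\nR^2 = 0.648, p < 2.2e-16"
)

# Alternative (assumption, not from the answer): a one-row data frame
label_df <- data.frame(
  x = 3,
  y = 5000,
  label = "y = 0.0243x + 0.298\nR^2 = 0.648, p < 2.2e-16"
)
d4 <- d1 + geom_text(
  data = label_df,
  mapping = aes(x = x, y = y, label = label),
  inherit.aes = FALSE  # do not inherit diamonds' carat/price mapping
)

nrow(ggplot_build(d3)$data[[2]])
nrow(ggplot_build(d4)$data[[2]])
```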