I have some code that needs to plot data as multiple lines, some solid and some dashed. There is one line per "group", and the linewidth for that group is found in a column of my data frame. At some point, I started encountering an error saying that I couldn't have varying linewidth along a line when the "linetype" isn't "solid". I understand the intention of this error message, but there is never any variation in linewidth along any of my lines.
Stranger still, I struggled to reproduce the error while making a reprex. It turned out that simply renaming some of the groups from two "s" characters, like "ss_1", to one "s", like "s_1", avoided the error completely. This is bizarre. I've created a reproducible example below using ggplot2 version 3.5.1 (latest version) and R version 4.2.1 (old, I know). I start from a completely clear environment, for what it's worth.
Do you have any idea what's going on here?
I appreciate any ideas or observations you have. Thank you!
Edit: Commenters have reminded me that this code deviates from standard practice by placing "linewidth" outside of the aes(), where it is supposed to be. I got into this situation when I moved it outside the aes() to fix a different problem: the linewidths wouldn't render correctly unless I added a scale_linewidth(). Thanks to commenters insisting on proper practice, I searched and found the scale_linewidth() solution to my original problem. (You can reproduce that issue with the example here by moving linewidth=linewidth inside the aes(): the line widths will be wrong and won't change no matter what the values in the data frame are.)
However, I am still intensely curious about why my improper usage of "linetype" would be sensitive to the character change in the data frame. I think I have seen this type of problem before, and it made creating a reproducible example quite time-intensive because the error would flicker on and off for no apparent reason. Producing a reprex is a crucial step in problem solving, even if it's just for oneself. Although I was using ggplot2 improperly here, I thought that making a reprex would lead me to the solution; instead, it took several hours to figure out what conditions would make the error appear, since it seemed to come and go at random. I hope to learn something about this situation, or report a bug if necessary, so that I and others can fix our mistakes more quickly in the future. Thank you again for the comments and advice!
```r
library(ggplot2)

df1 = data.frame(year = c(rep(2020:2021, 2), 2020:2022),
                 groupid = c(rep("ss_1", 2), rep("ss_2", 2), rep("sim", 3)),
                 linewidth = c(rep(0.4, 4), rep(1, 3)),
                 value = 1:7,
                 linetype = c(rep("ss", 4), rep("sim", 3)))

# This call throws the error about varying linewidth along a non-solid line
ggplot() + geom_line(data=df1, aes(x=year,y=value,group=groupid,linetype = linetype), linewidth=df1$linewidth)
```
```r
df2 = data.frame(year = c(rep(2020:2021, 2), 2020:2022),
                 groupid = c(rep("s_1", 2), rep("s_2", 2), rep("sim", 3)),
                 linewidth = c(rep(0.4, 4), rep(1, 3)),
                 value = 1:7,
                 linetype = c(rep("s", 4), rep("sim", 3)))

# Same exact call as above, but this one runs without the error
ggplot() + geom_line(data=df2, aes(x=year,y=value,group=groupid,linetype = linetype), linewidth=df2$linewidth)
```
Edit: Solution to make linewidths correct:
```r
ggplot() +
  geom_line(data=df2, aes(x=year, y=value, group=groupid, linetype=linetype, linewidth=linewidth)) +
  scale_linewidth(range=c(0.4,1))
```
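Since the `linewidth` column already holds the widths you want, `scale_linewidth_identity()` may be closer to the intent than `scale_linewidth(range = c(0.4, 1))`: the identity scale uses the stored values as-is, whereas the range version rescales them and only reproduces 0.4 and 1 here because those happen to be the smallest and largest values. (Without any linewidth scale, the default continuous range of `c(1, 6)` is likely why the widths looked wrong.) A minimal sketch, assuming ggplot2 3.4 or later, applied to `df1`, the data set that triggered the error:

```r
library(ggplot2)

df1 = data.frame(year = c(rep(2020:2021, 2), 2020:2022),
                 groupid = c(rep("ss_1", 2), rep("ss_2", 2), rep("sim", 3)),
                 linewidth = c(rep(0.4, 4), rep(1, 3)),
                 value = 1:7,
                 linetype = c(rep("ss", 4), rep("sim", 3)))

# linewidth is mapped inside aes(), so each width stays attached to its row
# when ggplot2 groups and sorts the data; the identity scale then uses
# 0.4 and 1 as the actual widths instead of rescaling them
p = ggplot(df1, aes(x = year, y = value, group = groupid,
                    linetype = linetype, linewidth = linewidth)) +
  geom_line() +
  scale_linewidth_identity()

# Number of distinct widths per group; 1 everywhere means no line varies
ld = layer_data(p)
widths_per_group = tapply(ld$linewidth, ld$group, function(w) length(unique(w)))
widths_per_group
```

Because the widths travel with the data rather than being matched by position, renaming the groups can no longer change which width each line gets.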
> However, I am still intensely curious about why my improper usage of "linetype" would be sensitive to the character change in the data frame.
Jon already answered this in the comments: it comes down to grouping and row order.
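It's about `groupid` (as this is what you use for grouping) and `linewidth` (as this is a vector you pull in from the environment, not a column of the ggplot data). `geom_line()` sorts its data by group (and then by x), while the fixed `linewidth` values are matched to the rows purely by position. The values in your `df2$groupid` happen to be in sorted order already, so grouping does not change how the rows are arranged, and in this particular case you get away with referring to the external vector `df2$linewidth`.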
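The same point can be checked in base R without arranging the whole data frame. The sketch below uses a hypothetical helper, `order_preserved()`, on copies of the question's data trimmed to the relevant columns. It assumes, as `geom_line()` does, that rows are sorted by group and then by x; `method = "radix"` sorts strings in the C locale, like `dplyr::arrange()`.

```r
df1 = data.frame(year = c(rep(2020:2021, 2), 2020:2022),
                 groupid = c(rep("ss_1", 2), rep("ss_2", 2), rep("sim", 3)),
                 linewidth = c(rep(0.4, 4), rep(1, 3)))
df2 = data.frame(year = c(rep(2020:2021, 2), 2020:2022),
                 groupid = c(rep("s_1", 2), rep("s_2", 2), rep("sim", 3)),
                 linewidth = c(rep(0.4, 4), rep(1, 3)))

# Hypothetical helper: TRUE if sorting the rows by group, then by x,
# leaves them in their original order, i.e. an external per-row vector
# such as df$linewidth would still line up with the sorted rows
order_preserved = function(df) {
  sorted_rows = order(df$groupid, df$year, method = "radix")
  identical(sorted_rows, seq_len(nrow(df)))
}

order_preserved(df1)  # FALSE: the "sim" rows move to the top
order_preserved(df2)  # TRUE: the rows are already in group order
```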
To illustrate with `dplyr::arrange()`:
```r
library(dplyr)

df2 |>
  select(groupid, linewidth) |>
  arrange(groupid) |>
  bind_cols(df2_linewidth = df2$linewidth)
#>   groupid linewidth df2_linewidth
#> 1     s_1       0.4           0.4
#> 2     s_1       0.4           0.4
#> 3     s_2       0.4           0.4
#> 4     s_2       0.4           0.4
#> 5     sim       1.0           1.0
#> 6     sim       1.0           1.0
#> 7     sim       1.0           1.0
```
But when `df1` gets rearranged by `groupid`, its alignment with `df1$linewidth` breaks because of the choice and order of the `groupid` values in the input data: `"sim"` sorts before `"ss_1"` and `"ss_2"`, so the `sim` rows move to the top:
```r
df1 |>
  select(groupid, linewidth) |>
  arrange(groupid) |>
  bind_cols(df1_linewidth = df1$linewidth)
#>   groupid linewidth df1_linewidth
#> 1     sim       1.0           0.4
#> 2     sim       1.0           0.4
#> 3     sim       1.0           0.4
#> 4    ss_1       0.4           0.4
#> 5    ss_1       0.4           1.0
#> 6    ss_2       0.4           1.0
#> 7    ss_2       0.4           1.0
```
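In the output above, the two `ss_1` rows end up with widths 0.4 and 1.0, so that line (which gets a non-solid linetype) varies in width, which is exactly what the error complains about. The same misalignment can be seen inside ggplot2 itself with `layer_data()`. A minimal sketch, assuming ggplot2 3.5.x: `layer_data()` only builds the plot, so, as far as I can tell, it does not reach the drawing step where `geom_line()` raises the error.

```r
library(ggplot2)

df1 = data.frame(year = c(rep(2020:2021, 2), 2020:2022),
                 groupid = c(rep("ss_1", 2), rep("ss_2", 2), rep("sim", 3)),
                 linewidth = c(rep(0.4, 4), rep(1, 3)),
                 value = 1:7,
                 linetype = c(rep("ss", 4), rep("sim", 3)))

# The original (improper) call: linewidth passed as an external vector
p1 = ggplot() +
  geom_line(data = df1,
            aes(x = year, y = value, group = groupid, linetype = linetype),
            linewidth = df1$linewidth)

# Built layer data: rows are already sorted by group, and the external
# widths have been attached to those sorted rows by position
ld1 = layer_data(p1)
ld1[, c("group", "x", "linetype", "linewidth")]

# Number of distinct widths per group; a value above 1 means the width
# varies along that line
n_widths = tapply(ld1$linewidth, ld1$group, function(w) length(unique(w)))
n_widths
```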