r ggplot2 legend multiple-entries

Hide some legend entries in ggplot


I have the following lines of code:

ggplot() + 
  geom_line(data=TS_SimHeads_HOBS_final, aes(x=as.Date(Date), y=BH2672), color='red') + 
  geom_point(data=Hydro_dates_wellData_2014_2018, aes(x=as.Date(Date), y=BH2672), color='red') +
  geom_line(data=TS_SimHeads_HOBS_final, aes(x=as.Date(Date), y=BH3025), color='green') + 
  geom_point(data=Hydro_dates_wellData_2014_2018, aes(x=as.Date(Date), y=BH3025), color='green') +
  xlab("Date") + ylab("Head")
  #theme_bw() 

which generates the following plot:

[Image: plot produced by the code above]

What I am trying to do, so far unsuccessfully, is to show a legend for the lines only (the points are the experimental data and the lines are the simulated values). Some data for reproduction purposes:

Date BH2672 BH278 BH2978 BH2987 BH3025 BH312 BH3963 BH3962 BH3957
2014-02-19 31.28400 78.86755 5.671027 39.48419 53.60201 44.29516 69.23685 61.70843 56.13871
2014-02-20 30.76656 78.87344 5.656940 39.49012 53.56489 44.50679 69.50910 61.70638 56.09621
2014-02-21 30.43226 78.88097 5.642136 39.49902 53.56041 44.65761 69.65709 61.70126 56.04346
2014-02-22 30.16532 78.88979 5.643818 39.51101 53.56065 44.78333 69.75621 61.69643 55.99459
2014-02-23 29.93577 78.89954 5.650873 39.52544 53.55970 44.89429 69.82983 61.69332 55.95241
2014-02-24 29.73162 78.90991 5.658991 39.54147 53.55682 44.99520 69.88845 61.69236 55.91639


Solution

  • As is often the case, you first have to convert both of your datasets to long (tidy) format, e.g. with tidyr::pivot_longer. This gives a new column holding the variable names as categories, which can then be mapped to the color aesthetic. Doing so creates a legend automatically and also simplifies your code. If you want only the lines to appear in the legend, add show.legend = FALSE to geom_point. Finally, set your desired colors via scale_color_manual.

    As you provided only one dataset, I used it for both datasets, which shouldn't matter here. Also, to make my life a bit easier, I put the datasets in a named list:

    library(dplyr, warn.conflicts = FALSE)
    library(tidyr)
    library(ggplot2)
    
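    # Reshape both data frames to long format: one row per date and well
    data_list <- list(data = Hydro_dates_wellData_2014_2018, sim = TS_SimHeads_HOBS_final) %>%
      lapply(function(x) {
        x %>%
          select(Date, BH2672, BH3025) %>%
          mutate(Date = as.Date(Date)) %>%
          # Well names go to `name`, heads to `value` (the pivot_longer defaults)
          tidyr::pivot_longer(-Date)
      })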
    
    ggplot() +
      # Simulated heads as lines; these supply the legend entries
      geom_line(data = data_list$sim, aes(x = Date, y = value, color = name)) +
      # Observed heads as points, kept out of the legend
      geom_point(data = data_list$data, aes(x = Date, y = value, color = name), show.legend = FALSE) +
      scale_color_manual(values = c(BH2672 = "red", BH3025 = "green")) +
      labs(x = "Date", y = "Head")
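
    To see what the reshaping step produces, here is a minimal sketch on a three-row subset of the data from the question. The object names wide and long are only for illustration and are not part of the original code; the column names name and value are the pivot_longer defaults.

    ```r
    library(tidyr)

    # Three rows of the question's data, restricted to the two wells being plotted
    wide <- data.frame(
      Date   = as.Date(c("2014-02-19", "2014-02-20", "2014-02-21")),
      BH2672 = c(31.28400, 30.76656, 30.43226),
      BH3025 = c(53.60201, 53.56489, 53.56041)
    )

    # Stack the well columns: former column names land in `name`,
    # the head values in `value`
    long <- pivot_longer(wide, -Date)

    # Inspect the result: one row per date and well
    print(long)
    print(unique(long$name))
    ```

    Each date now appears once per well, and name is the categorical column that color = name maps to the legend.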
    

    DATA

    TS_SimHeads_HOBS_final <- structure(list(Date = c(
      "2014-02-19", "2014-02-20", "2014-02-21",
      "2014-02-22", "2014-02-23", "2014-02-24"
    ), BH2672 = c(
      31.284,
      30.76656, 30.43226, 30.16532, 29.93577, 29.73162
    ), BH278 = c(
      78.86755,
      78.87344, 78.88097, 78.88979, 78.89954, 78.90991
    ), BH2978 = c(
      5.671027,
      5.65694, 5.642136, 5.643818, 5.650873, 5.658991
    ), BH2987 = c(
      39.48419,
      39.49012, 39.49902, 39.51101, 39.52544, 39.54147
    ), BH3025 = c(
      53.60201,
      53.56489, 53.56041, 53.56065, 53.5597, 53.55682
    ), BH312 = c(
      44.29516,
      44.50679, 44.65761, 44.78333, 44.89429, 44.9952
    ), BH3963 = c(
      69.23685,
      69.5091, 69.65709, 69.75621, 69.82983, 69.88845
    ), BH3962 = c(
      61.70843,
      61.70638, 61.70126, 61.69643, 61.69332, 61.69236
    ), BH3957 = c(
      56.13871,
      56.09621, 56.04346, 55.99459, 55.95241, 55.91639
    )), class = "data.frame", row.names = c(
      NA,
      -6L
    ))
    
    Hydro_dates_wellData_2014_2018 <- TS_SimHeads_HOBS_final
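
    If reshaping is not an option, a different technique (not the approach used above) is to keep the wide data and put a constant string inside aes(), which ggplot2 treats as a category and therefore adds to the legend. The following is only a sketch under that assumption; sim and obs are hypothetical stand-ins for the two data frames, built from three rows of the question's data.

    ```r
    library(ggplot2)

    # Hypothetical stand-ins for the simulated and observed data frames
    sim <- data.frame(
      Date   = as.Date(c("2014-02-19", "2014-02-20", "2014-02-21")),
      BH2672 = c(31.28400, 30.76656, 30.43226),
      BH3025 = c(53.60201, 53.56489, 53.56041)
    )
    obs <- sim

    p <- ggplot() +
      # A constant string inside aes() acts as a category and gets a legend entry
      geom_line(data = sim, aes(x = Date, y = BH2672, color = "BH2672")) +
      geom_line(data = sim, aes(x = Date, y = BH3025, color = "BH3025")) +
      # Points use the same color scale but are left out of the legend keys
      geom_point(data = obs, aes(x = Date, y = BH2672, color = "BH2672"), show.legend = FALSE) +
      geom_point(data = obs, aes(x = Date, y = BH3025, color = "BH3025"), show.legend = FALSE) +
      # name = NULL drops the legend title; values assign a color to each category
      scale_color_manual(name = NULL, values = c(BH2672 = "red", BH3025 = "green")) +
      labs(x = "Date", y = "Head")

    print(p)
    ```

    This needs one layer per well and data frame, so it gets repetitive quickly; with more than a couple of wells the long-format version is likely the easier one to maintain.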