r ggplot2 scatter

Plot a one-vs-many scatter plot of actual and predicted values in R


For a sample data frame df, pred_value and real_value are the monthly predicted and actual values of a variable, and acc_level is the accuracy level of each prediction compared with the actual value for the corresponding month. The smaller the acc_level, the more accurate the prediction:

df <- structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L), .Label = c("2022/3/31", "2022/4/30", 
"2022/5/31"), class = "factor"), pred_value = c(2721.8, 2721.8, 
2705.5, 2500, 2900.05, 2795.66, 2694.45, 2855.36, 2300, 2799.82, 
2307.36, 2810.71, 3032.91), real_value = c(2736.2, 2736.2, 2736.2, 
2736.2, 2736.2, 2759.98, 2759.98, 2759.98, 2759.98, 3000, 3000, 
3000, 3000), acc_level = c(1L, 1L, 2L, 3L, 3L, 1L, 2L, 2L, 3L, 
2L, 3L, 2L, 1L)), class = "data.frame", row.names = c(NA, -13L
))

Out:

        date pred_value real_value acc_level
1  2022/3/31    2721.80    2736.20         1
2  2022/3/31    2721.80    2736.20         1
3  2022/3/31    2705.50    2736.20         2
4  2022/3/31    2500.00    2736.20         3
5  2022/3/31    2900.05    2736.20         3
6  2022/4/30    2795.66    2759.98         1
7  2022/4/30    2694.45    2759.98         2
8  2022/4/30    2855.36    2759.98         2
9  2022/4/30    2300.00    2759.98         3
10 2022/5/31    2799.82    3000.00         2
11 2022/5/31    2307.36    3000.00         3
12 2022/5/31    2810.71    3000.00         2
13 2022/5/31    3032.91    3000.00         1

I've plotted the predicted values with the code below:

library(ggplot2)
ggplot(df, aes(x=date, y=pred_value, color=acc_level)) +
  geom_point(size=2, alpha=0.7, position=position_jitter(w=0.1, h=0)) +
  theme_bw()

Beyond what I've done above, how can I also plot the actual values for each month as a red line with red points? Thanks.

Reference:

How to add 4 groups to make Categorical scatter plot with mean segments?


Solution

  • We can add the actuals using additional layers. To make the line show up, we need to specify that the points should be part of the same series.

    Because the x axis is discrete, ggplot assumes by default that the data points do not belong to the same group, so geom_line() has nothing to connect unless we set group = 1. Alternatively, we could deal with this by converting the date variable to a Date type, like with aes(x=as.Date(date)...

    library(ggplot2)
    ggplot(df, aes(x=date, y=pred_value, color=as.factor(acc_level))) +
      geom_point(size=2, alpha=0.7, position=position_jitter(w=0.1, h=0)) +
      geom_point(aes(y = real_value), size=2, color = "red") + 
      geom_line(aes(y = real_value, group = 1), color = "red") +
      scale_color_manual(values = c("yellow", "magenta", "cyan"),
                         name = "Acc Level") +
      theme_bw()
    

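The Date-based alternative mentioned above could look roughly like the sketch below. It is only an illustration under a few assumptions that are not in the original answer: the sample data is rebuilt compactly with data.frame(), the helper data frame `actuals` is a name introduced here so each month's actual value is drawn once instead of once per prediction, and the jitter width of 3 is an arbitrary choice (on a Date axis the width is measured in days).

```r
library(ggplot2)

# Same sample values as in the question, rebuilt compactly
df <- data.frame(
  date = factor(rep(c("2022/3/31", "2022/4/30", "2022/5/31"), times = c(5, 4, 4))),
  pred_value = c(2721.8, 2721.8, 2705.5, 2500, 2900.05, 2795.66, 2694.45,
                 2855.36, 2300, 2799.82, 2307.36, 2810.71, 3032.91),
  real_value = rep(c(2736.2, 2759.98, 3000), times = c(5, 4, 4)),
  acc_level = c(1L, 1L, 2L, 3L, 3L, 1L, 2L, 2L, 3L, 2L, 3L, 2L, 1L)
)

# Convert the factor to a real Date so the x axis becomes continuous
df$date <- as.Date(as.character(df$date), format = "%Y/%m/%d")

# One row per month for the actuals (hypothetical helper data frame)
actuals <- unique(df[, c("date", "real_value")])

p <- ggplot(df, aes(x = date, y = pred_value, color = as.factor(acc_level))) +
  # Jitter width is in days on a Date axis; 3 is an arbitrary choice
  geom_point(size = 2, alpha = 0.7,
             position = position_jitter(width = 3, height = 0)) +
  # With a continuous x axis the line connects without group = 1
  geom_line(data = actuals, aes(x = date, y = real_value),
            color = "red", inherit.aes = FALSE) +
  geom_point(data = actuals, aes(x = date, y = real_value),
             color = "red", size = 2, inherit.aes = FALSE) +
  # Put the axis breaks on the three month-end dates
  scale_x_date(breaks = actuals$date, date_labels = "%Y/%m/%d") +
  labs(color = "Acc Level") +
  theme_bw()
p
```

Compared with the discrete-axis version, this spaces the months by their real calendar distance, and the red line no longer needs an explicit group because the x values are continuous.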