Tags: r, ggplot2, plot, geom-text, ggrepel

Why is my ggplot generating phantom data?


My plot seems to be inserting data not in my df.
Code for my plot:

X <- ggplot(methods_per_CONSM_majors |> mutate(Majors = factor(Majors) |> fct_inorder(),
                    Measures = factor(Measures) |> fct_rev()), 
       aes(Majors, Measures, label = LCOs, color = Years)) + geom_text(size = 2.5) + geom_text_repel(box.padding = 1, max.overlaps = 5) +
  scale_y_discrete(guide = guide_axis(n.dodge = 8)) +
  labs(title = "TEMP:",
         subtitle = "TEMP") +
  xlab(label = "") +
  ylab(label = "") +
  scale_y_discrete(labels = function(x) str_wrap(x, width = 10)) +
  theme_minimal() +
  theme(panel.grid = element_line(
      size = (0.1), colour =
        "lightgrey")) +
  theme(legend.position = "top") + 
  guides(color = guide_legend(nrow = 1)) +
  theme(axis.text.x=element_text(angle=90, hjust=1)) +
  geom_text_repel()

X + labs(color=NULL) +
  scale_color_manual(values=c("gold3", "firebrick2", "blue", 'orchid', 'black'))

Here's the problem: Engineering (for example) has one datapoint in Rubric and it is 2.1. But when I plot, 2.1 appears three times for Engineering:

[Image: the resulting plot, in which the label 2.1 appears three times for Engineering]

I realize the plot isn't actually inserting data into my df, but what is causing the appearance of these additional, phantom data labels?


Data

methods_per_CONSM_majors <- data.frame(
  'Majors'=c('Science (Pre-Nursing)', 'Science (Pre-Nursing)', 'Science (Pre-Nursing)', 
             'Science (Pre-Nursing)', 'Science (Pre-Nursing)', 'Science (Pre-Nursing)', 'Science (Pre-Nursing)', 
             'Science (Pre-Nursing)', 'Science (Pre-Nursing)', 'Science (Pre-Nursing)', 'Engineering', 
             'Engineering', 'Biology', 'Biology', 'Biology', 'Biology', 'Biology', 'Biology', 'Biology', 
             'Biology', 'Biology', 'Biology', 'Cell Biology', 'Cell Biology', 'Cell Biology', 'Cell Biology', 
             'Cell Biology', 'Cell Biology', 'Cell Biology', 'Cell Biology', 'Cell Biology', 'Cell Biology', 
             'Macrobiology', 'Macrobiology', 'Macrobiology', 'Macrobiology', 'Macrobiology', 'Macrobiology', 
             'Macrobiology', 'Macrobiology', 'Macrobiology', 'Macrobiology',  'Computer Information Science',  
             'Computer Information Science',  'Computer Information Science',  'Computer Information Science',  
             'Computer Information Science',  'Computer Information Science',  'Computer Information Science',  
             'Computer Information Science',  'Computer Information Science',  'Computer Information Science', 
             'Computer Science', 'Computer Science', 'Computer Science', 'Computer Science', 'Computer Science', 
             'Computer Science', 'Computer Science', 'Computer Science', 'Computer Science', 'Computer Science', 
             'Environmental Science', 'Environmental Science', 'Environmental Science', 'Environmental Science', 
             'Environmental Science', 'Environmental Science', 'Environmental Science', 'Environmental Science', 
             'Environmental Science', 'Environmental Science', 'Health Sciences', 'Health Sciences', 'Health Sciences', 
             'Health Sciences', 'Kinesiology', 'Kinesiology', 'Kinesiology', 'Kinesiology', 'Kinesiology', 
             'Kinesiology', 'Kinesiology', 'Kinesiology', 'Kinesiology', 'Kinesiology', 'Mathematics', 'Mathematics', 
             'Mathematics', 'Mathematics', 'Mathematics', 'Mathematics', 'Mathematics', 'Mathematics', 'Mathematics', 
             'Mathematics', 'Mathematics', 'Mathematics', 'Mathematics', 'Mathematics', 'Mathematics', 'Mathematics', 
             'Mathematics', 'Mathematics', 'Mathematics', 'Mathematics', 'Mathematics', 'Mathematics', 'Mathematics',  
             'Natural Sciences', 'Natural Sciences', 'Natural Sciences', 'Natural Sciences', 'Natural Sciences', 
             'Natural Sciences', 'Natural Sciences', 'Natural Sciences', 'Natural Sciences', 'Natural Sciences', 
             'Natural Sciences', 'Natural Sciences', 'Natural Sciences'), 
  'Measures'=c('Research Project/Essay', 
               'Pre/Post', 'Rubric', 'Pre/Post', 'Rubric', 'Rubric', 'Pre/Post', 'Ill-Defined', 'Rubric', 'Rubric', 
               'Rubric', 'Exam', 'Exit Exam', 'Pre/Post', 'Quiz', 'Pre/Post', 'Pre/Post', 'Rubric', 'Exit Exam', 
               'Pre/Post', 'Rubric', 'Rubric', 'Exit Exam', 'Pre/Post', 'Quiz', 'Pre/Post', 'Pre/Post', 'Rubric', 
               'Exit Exam', 'Pre/Post', 'Rubric', 'Rubric', 'Exit Exam', 'Pre/Post', 'Quiz', 'Pre/Post', 'Pre/Post', 
               'Rubric', 'Exit Exam', 'Pre/Post', 'Rubric', 'Rubric', 'Rubric', 'Rubric', 'Exam', 'Rubric', 'Rubric', 
               'Rubric', 'Exam', 'Rubric', 'Evaluator Observation/ Assessment', 'Evaluator Observation/ Assessment', 
               'Rubric', 'Exams & Projects', 'Exam', 'Rubric', 'Exam', 'Rubric', 'Exam', 'Rubric', 'Evaluator Observation/ Assessment', 
               'Evaluator Observation/ Assessment', 'Observation/ Other', 'Ill-Defined', 'Rubric', 'Rubric', 'Rubric',
               'Rubric', 'Exams (Multiple)', 'Rubric', 'Exams (Multiple)', 'Rubric', 'Rubric', 'Exam', 'Rubric', 'Exam',
               'Pre/Post', 'Pre/Post', 'Rubric', 'Pre/Post', 'Rubric', 'Exam', 'Rubric', 'Rubric', 'Pre/Post', 'Rubric',
               'Exam', 'Ill-Defined', 'Exam', 'Exam', 'Exam', 'Rubric', 'Rubric', 'Rubric', 'Alumni Survey', 
               'Journal/ Reflection', 'Rubric', 'Rubric', 'Rubric', 'Rubric', 'Rubric', 'Exam', 'Exam', 'Exam', 'Exam', 
               'Exam', 'Exam', 'Rubric', 'Rubric', 'Exam', 'Ill-Defined', 'Ill-Defined', 'Rubric', 'Rubric', 'Exam', 
               'Quiz', 'Quiz', 'Rubric', 'Exam', 'Exam', 'Quiz', 'Rubric'), 
  'LCOs'=c('1.1', '2.1', '1.1', '2.1', '1.2', 
           '3.1', '2.1', '3.1', '1.2', '3.1', '2.1', '2.2', '1.1', '2.1', '1.2', '2.1', '1.2', '2.2', '1.1', '2.1', 
           '2.2', '3.2', '1.1', '2.1', '1.2', '2.1', '1.2', '2.2', '1.1', '2.1', '2.2', '3.2', '1.1', '2.1', '1.2', 
           '2.1', '1.2', '2.2', '1.1', '2.1', '2.2', '3.2', '2', '4', '1.1', '3.1', '2.2', '4.2', '2.1', '4.1', '3.1', 
           '4.2', '2', '4', '1.1', '3.1', '1.1', '2.2', '2.1', '4.1', '3.1', '4.2', '1.2', '3.1', '1.1', '2.1', '1.2', 
           '2.1', '1.2', '2.1', '1.2', '2.1', '1.1', '2.1', '1.2', '2.2', '2.2', '5.1', '1.1', '5.1', '3.1', '6.1', '4.1', 
           '7.1', '2.1', '5.1', '3.1', '4.1', '4.2', '3.2', '3.3', '5.1', '5.3', '5.2', '1.1', '1.2', '5.3', '5.2', '5.1', 
           '2.1', '2.2', '3.1', '3.2', '3.3', '3.1', '3.2', '3.3', '4.1', '4.2', '1.1', '1.2', '1.3', '4.1', '4.2', '2.1', 
           '3.6', '3.1', '4.1', '1.1', '2.1', '3', '5'), 
  'Years'=c('2017-2018', '2017-2018', '2018-2019', '2018-2019', 
            '2019-2020', '2019-2020', '2020-2021', '2020-2021', '2021-2022', '2021-2022', '2021-2022', '2021-2022', 
            '2017-2018', '2017-2018', '2018-2019', '2018-2019', '2019-2020', '2019-2020', '2020-2021', '2020-2021', 
            '2021-2022', '2021-2022', '2017-2018', '2017-2018', '2018-2019', '2018-2019', '2019-2020', '2019-2020', 
            '2020-2021', '2020-2021', '2021-2022', '2021-2022', '2017-2018', '2017-2018', '2018-2019', '2018-2019', 
            '2019-2020', '2019-2020', '2020-2021', '2020-2021', '2021-2022', '2021-2022', '2017-2018', '2017-2018', 
            '2018-2019', '2018-2019', '2019-2020', '2019-2020', '2020-2021', '2020-2021', '2021-2022', '2021-2022', 
            '2017-2018', '2017-2018', '2018-2019', '2018-2019', '2019-2020', '2019-2020', '2020-2021', '2020-2021', 
            '2021-2022', '2021-2022', '2017-2018', '2017-2018', '2018-2019', '2018-2019', '2019-2020', '2019-2020', 
            '2020-2021', '2020-2021', '2021-2022', '2021-2022', '2020-2021', '2020-2021', '2021-2022', '2021-2022', 
            '2017-2018', '2017-2018', '2018-2019', '2018-2019', '2019-2020', '2019-2020', '2020-2021', '2020-2021', 
            '2021-2022', '2021-2022', '2017-2018',  '2017-2018',  '2017-2018', '2018-2019', '2018-2019', '2018-2019',
            '2018-2019', '2018-2019', '2019-2020', '2019-2020', '2019-2020', '2019-2020', '2019-2020', '2020-2021', 
            '2020-2021', '2020-2021', '2020-2021', '2020-2021', '2021-2022', '2021-2022', '2021-2022', '2021-2022', 
            '2021-2022', '2017-2018', '2017-2018',  '2017-2018', '2017-2018',  '2017-2018', '2018-2019', '2018-2019', 
            '2019-2020', '2019-2020', '2020-2021', '2020-2021', '2021-2022', '2021-2022'))

Created on 2023-10-07 with reprex v2.0.2


Solution

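  • There are several errors in the posted code.
    The main error is the repeated layers and scales: geom_text_repel() is added twice on top of geom_text(), so the plot has three label layers and each label can be drawn up to three times, and scale_y_discrete() is added twice, so the second call replaces the first. There are also repeated labs(), xlab()/ylab() and theme() calls.
    In the code below, the data transformations are also done in a pipe before the call to ggplot(), not inside its data argument.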

    In the code below I start by defining a custom theme, so that the plot that follows is simpler and more readable.

    suppressPackageStartupMessages({
      library(dplyr)
      library(forcats)
      library(stringr)
      library(ggplot2)
      library(ggrepel)
    })
    
    theme_so_q77248246 <- function() {
      theme_minimal() %+replace%    # replace whole theme elements instead of merging into them
        theme(
          panel.grid = element_line(linewidth = 0.1, colour = "lightgrey"),
          legend.position = "top",
          axis.text.x=element_text(angle = 90, hjust = 1),
          axis.text.y = element_text(size = 5, color = "black")
        )
    }
    
    methods_per_CONSM_majors |> 
      mutate(Majors = factor(Majors) |> fct_inorder(),
             Measures = factor(Measures) |> fct_rev()) |>
      ggplot(aes(Majors, Measures, label = LCOs, color = Years)) + 
      geom_point(alpha = 0) + 
      geom_text(size = 2.5, show.legend = FALSE) +
      geom_text_repel(box.padding = 1, max.overlaps = 5, show.legend = FALSE) +
      scale_y_discrete(labels = function(x) str_wrap(x, width = 10)) +
      labs(title = "TEMP:", subtitle = "TEMP", x = "", y = "") +
      guides(color = guide_legend(
        nrow = 1, 
        override.aes = list(alpha = 1, size = 5)
      )) +
      theme_so_q77248246() -> X
    
    X + labs(color = NULL) +
      scale_color_manual(values=c("gold3", "firebrick2", "blue", 'orchid', 'black'))
    #> Warning: ggrepel: 117 unlabeled data points (too many overlaps). Consider
    #> increasing max.overlaps
    

    Created on 2023-10-07 with reprex v2.0.2
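
The repeated-layer diagnosis can be checked on a toy plot. This is a minimal sketch and not part of the original answer: the one-row data frame `toy` and the plot names `p_question` and `p_single` are made up for illustration, and it assumes ggplot2 and ggrepel are installed.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical toy data: a single row, so a correct plot shows exactly one label.
toy <- data.frame(
  Majors   = "Engineering",
  Measures = "Rubric",
  LCOs     = "2.1",
  stringsAsFactors = FALSE
)

base <- ggplot(toy, aes(Majors, Measures, label = LCOs))

# Same layer structure as the question:
# geom_text() once plus geom_text_repel() twice.
p_question <- base + geom_text() + geom_text_repel() + geom_text_repel()

# A single label layer.
p_single <- base + geom_text_repel()

# Number of layers in each plot; every layer draws its own copy of the label.
n_question <- length(p_question$layers)  # 3
n_single   <- length(p_single$layers)    # 1

# The label each layer of p_question will draw for the one data row.
labels_per_layer <- vapply(
  seq_len(n_question),
  function(i) as.character(layer_data(p_question, i)$label),
  character(1)
)

print(n_question)
print(n_single)
print(labels_per_layer)
```

Each text layer receives the full data, so the single row is labelled once per layer. By the same logic, a plot that keeps both geom_text() and geom_text_repel() still has two label layers, so it may be worth dropping one of them if every label should appear only once.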