r ggplot2 boxplot

How to fix aesthetic length error when making box plots


I'm making a boxplot from the below df (I'm sorry if this is the wrong way to post a dataframe. I just copied and pasted the output from the dput function). I've used this code to make the boxplot:

IPC_15 <- tidyr::pivot_longer(Income_percap_15, -c("State", "Counties"), names_to = "Income_Per_Capita", values_to = "num") %>% 
  ggplot(aes(x="", y = Income_percap_15)) + 
  geom_boxplot() + coord_cartesian(ylim = c(0, 52))
IPC_15 + labs(x = "State",
                y = "Income per Capita",
                title = "US Income per capita per state")

However I keep getting the error "Aesthetics must be either length 1 or the same as the data (52): y".

Any ideas how to fix this?

structure(list(State = structure(1:52, .Label = c("Alabama", 
"Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", 
"Delaware", "District of Columbia", "Florida", "Georgia", "Hawaii", 
"Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", 
"Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", 
"Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", 
"Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", 
"North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", 
"Pennsylvania", "Puerto Rico", "Rhode Island", "South Carolina", 
"South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", 
"Washington", "West Virginia", "Wisconsin", "Wyoming"), class = "factor"), 
    Counties = c(67L, 29L, 15L, 75L, 58L, 64L, 8L, 3L, 1L, 67L, 
    159L, 5L, 44L, 102L, 92L, 99L, 105L, 120L, 64L, 16L, 24L, 
    14L, 83L, 87L, 82L, 115L, 56L, 93L, 17L, 10L, 21L, 33L, 62L, 
    100L, 53L, 88L, 77L, 36L, 67L, 78L, 5L, 46L, 66L, 95L, 254L, 
    29L, 14L, 133L, 39L, 55L, 72L, 23L), Income_15 = c(20780.9402985075, 
    30332.9655172414, 21052.5333333333, 20072.0266666667, 27902.6034482759, 
    27747.25, 37025.125, 28952, 47675, 23501.8507462687, 20566.0062893082, 
    31892.6, 21451.1136363636, 25485.7156862745, 23977.0652173913, 
    26555.8686868687, 24953.0476190476, 20663.6083333333, 22064.609375, 
    25792.3125, 33073.2083333333, 35554.4285714286, 23662.2048192771, 
    27610.4252873563, 18805.0487804878, 21504.7826086957, 25020.6785714286, 
    26336.8494623656, 26317.7058823529, 31810.4, 36084.5238095238, 
    21789.4545454545, 28189.7580645161, 22514.36, 31900.5094339623, 
    24467.7727272727, 22811.8701298701, 24311.9166666667, 25952.223880597, 
    9617.66666666667, 35670.6, 21411.9565217391, 25334.8939393939, 
    21442.4210526316, 23551.7992125984, 22552.2413793103, 28487.2142857143, 
    27065.3909774436, 25734.4102564103, 21710.4181818182, 26250.7222222222, 
    29223.652173913)), row.names = c(NA, -52L), class = "data.frame")

Solution

  • This answer has several parts, since there are a number of separate points that will hopefully help you. I'll take them in order:

    Error Message Text and Meaning

    Your error message, "Aesthetics must be either length 1 or the same as the data (52): y", indicates that one of the aes() mappings does not supply a value for every row in your dataset. The number in parentheses (52) is the length the aesthetic "should" have, which is the number of observations in your dataset, and the ": y" at the end names the aesthetic at fault. A length-1 value such as x="" is fine: it is recycled for every row, which basically means "treat the entire dataframe as one group". So the problem is specifically y=Income_percap_15. After your pivot_longer call there is no column with that name, so ggplot2 presumably falls back to the Income_percap_15 object in your workspace, which is the original dataframe rather than a 52-element column. I think you want to use y=num there.
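
    A minimal sketch of that fix, using only three rows copied from the question's data (the real dataframe has 52) and an intermediate variable named long, which is an assumption made here so the reshaped columns can be inspected before plotting:

```r
library(tidyr)
library(ggplot2)

# Three rows copied from the question's dput output; the full data has 52.
Income_percap_15 <- data.frame(
  State     = c("Alabama", "Alaska", "Arizona"),
  Counties  = c(67L, 29L, 15L),
  Income_15 = c(20780.9402985075, 30332.9655172414, 21052.5333333333)
)

# Same reshape as in the question, stored so the result can be inspected.
# `long` is an illustrative name, not one used in the question.
long <- pivot_longer(
  Income_percap_15,
  -c("State", "Counties"),
  names_to  = "Income_Per_Capita",
  values_to = "num"
)

# Shows which columns exist after the reshape; the incomes are in `num`.
names(long)

# Map y to the column that actually holds the values.
p <- ggplot(long, aes(x = "", y = num)) +
  geom_boxplot()
```

    Printing p should draw a single box for the whole dataset without the length error, since num has one value per row.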

    Intended Aesthetics and your intended plot

    Your code maps x="" and y=Income_percap_15, which suggests you want one boxplot for the entire dataset. However, your labs() call suggests you want a boxplot for every state. While you can show the "single boxplot" for the entire dataset (aes(x="",...)), your data cannot give you a boxplot for every state. A boxplot represents the distribution of data, so you need multiple "y" values for every "x" value. In your dataframe, you only have one "y" value (Income per capita) for each "x" (State).
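
    One quick way to check this is to count the observations per state. The sketch below uses three rows from the question's data in long form; long and obs_per_state are names chosen here for illustration:

```r
# Three rows from the question's data, already in long form.
long <- data.frame(
  State = c("Alabama", "Alaska", "Arizona"),
  num   = c(20780.9402985075, 30332.9655172414, 21052.5333333333)
)

# Number of observations available for each state.
obs_per_state <- table(long$State)
obs_per_state

# TRUE when every state contributes exactly one value, in which case a
# per-state boxplot has no spread to draw.
all(obs_per_state == 1)
```

    If the same check on the full 52-row dataframe also returns TRUE, a per-state boxplot has nothing to summarise and a point per state is the more honest display.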

    Kinda problematic limits

    The limits you set (0 to 52) are applied to the y aesthetic, which appears to be intended for Income per capita. In your dataframe after the pivot_longer call, that is the "num" column, which has a minimum of about 9618 and a maximum of 47675 - clearly outside the limits you set. That means you'll see an empty plot. If you wanted the limit to apply to the x aesthetic (52 States), which I believe is your intention, it's not needed here - you only need to specify the correct aesthetic. Since you applied the limit to the y axis, I'm assuming you are looking for horizontally arranged boxplots. For that, you "flip" the axes with coord_flip().
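
    The mismatch can be confirmed before plotting. This sketch uses only the smallest and largest Income_15 values from the question's data; income, ylim_used and inside are illustrative names:

```r
# Smallest and largest Income_15 values in the question's data
# (Puerto Rico and District of Columbia).
income <- c(9617.66666666667, 47675)

# The window passed to coord_cartesian() in the question.
ylim_used <- c(0, 52)

# Actual span of the data.
range(income)

# TRUE if at least one value falls inside the requested window.
inside <- any(income >= ylim_used[1] & income <= ylim_used[2])
inside
```

    Here inside is FALSE, so nothing lands in the visible region. Dropping coord_cartesian() altogether, or passing a window that covers the data such as coord_cartesian(ylim = range(income)), avoids the empty panel.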

    The Final plot?

    Well, I wish I had better news, but as mentioned above, the per-state boxplot you intended is not really possible with the data you have. A version of your code that runs is below. Note that the resulting "boxplot" shows a line for every state, because for every state, n=1, so the "distribution" is not really a distribution. Note: assume here that df is your dataframe after the pivot_longer call:

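    library(ggplot2)

    # df is the dataframe after the pivot_longer call
    ggplot(data=df, aes(x=State, y = num)) +
        geom_boxplot() +
        coord_flip() +  # states on the vertical axis, income on the horizontal
        labs(y='Income per capita', title="US Income per capita per state") +
        theme(
            axis.text.y=element_text(size=7, vjust=0.3),
            plot.title=element_text(size=9)
        )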
    


    It actually doesn't look too bad to show "lines" instead of a "box" here, but you could make the same plot with geom_point or even geom_segment to get a similar, cleaner "line" look. Some other notes about the plot: