r ggplot2 as.date

My ggplot2 plots are ordering dates incorrectly (sorted by month, then year, when in the format %m/%y)


countrydf$dates <- format(as.Date(countrydf$date, format="%Y-%m-%d"), "%m/%y", ordered = T)
germanydf <- subset(countrydf,location == "Germany")
ggplot(germanydf, aes(x=dates, y=total_deaths)) + 
  geom_bar(aes(), stat = 'identity', position = 'dodge') +
  geom_smooth(method="lm", col="Grey", size=1) + 
  labs(title="Deaths vs date in Germany", subtitle="From ourworldindata.org", y="Total Deaths", x="Date")  +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=.5))

Even though I formatted the column as a date, the dates still appear in the order "01/2020, 01/2021, 01/2022" (incorrect) instead of "01/2020, 02/2020, 03/2020" (correct). Any help would be greatly appreciated!

Also, when I view the data frame, the rows are ordered correctly, so it is clearly a ggplot formatting issue.


Solution

  • The function format() returns a string, not a date. Thus, you get strings of the form 01/2020, 02/2020, etc. Naturally, these strings are not sorted by date; like any other strings, they are sorted alphanumerically, so 01/2021 comes before 02/2020.

    What exactly is happening? When ggplot takes your variable as x, it treats it like a factor, because it is a categorical (as opposed to continuous) variable. By default, factor levels are sorted alphanumerically.
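
A minimal sketch of that behaviour in base R, without ggplot2. The three sample dates are made up for illustration and merely stand in for countrydf$date; the point is only that the formatted labels are plain strings whose default factor levels follow string order rather than date order.

```r
# Hypothetical sample dates standing in for countrydf$date (an assumption)
dates <- as.Date(c("2020-01-15", "2020-02-15", "2021-01-15"))

# format() turns Date values into character strings
labels <- format(dates, "%m/%Y")
print(class(labels))        # "character"

# Strings sort alphanumerically, so the month part decides the order first
sorted_labels <- sort(labels)
print(sorted_labels)        # "01/2020" "01/2021" "02/2020"

# factor() takes its default levels from the sorted unique strings
default_levels <- levels(factor(labels))
print(default_levels)

# Sorting the Date values first and formatting afterwards stays chronological
chronological <- format(sort(dates), "%m/%Y")
print(chronological)        # "01/2020" "02/2020" "01/2021"
```

Comparing sorted_labels with chronological shows the mismatch from the plot: the string order places 01/2021 ahead of 02/2020, while the order of the underlying Date values does not.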

    Solutions: