countrydf$dates <- format(as.Date(countrydf$date, format="%Y-%m-%d"), "%m/%y", ordered = T)
germanydf <- subset(countrydf,location == "Germany")
ggplot(germanydf, aes(x=dates, y=total_deaths)) +
geom_bar(aes(), stat = 'identity', position = 'dodge') +
geom_smooth(method="lm", col="Grey", size=1) +
labs(title="Deaths vs date in Germany", subtitle="From ourworldindata.org", y="Total Deaths", x="Date") +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=.5))
Even though I formatted the column as a date, the x-axis still shows the dates as "01/2020, 01/2021, 01/2022" (incorrect) instead of "01/2020, 02/2020, 03/2020" (correct). Any help would be greatly appreciated!
Also, when I view the data frame, the rows are ordered correctly, so it seems to be a ggplot formatting issue.
The function format() returns a string, not a date. Thus, you get strings of the format 01/2020, 02/2020, etc. Naturally, these strings, when sorted, are not sorted by date but, just like any string, alphanumerically, and 01/2021 comes before 02/2020.
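A small sketch of this behaviour, using three made-up dates (not taken from the question's data), may make it easier to see. It only uses base R:

```r
# Hypothetical sample dates, chosen so that month and year orderings disagree
d <- as.Date(c("2020-01-15", "2020-02-15", "2021-01-15"))

# format() turns the Date vector into plain character strings
labels <- format(d, "%m/%Y")
class(labels)   # "character"

# Sorting strings compares them character by character, not as dates
sort(labels)    # "01/2020" "01/2021" "02/2020"
```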
What exactly is happening? When ggplot takes your variable as x, it converts it to a factor, because it is a categorical (as opposed to continuous) variable. By default, factor levels are sorted alphanumerically.
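The default level ordering can be checked directly in base R. The labels below are hypothetical and already listed in chronological order. The second call, which passes the levels explicitly, is shown only as an illustration of how the default can be overridden:

```r
# Hypothetical month/year labels, given in chronological order
labels <- c("01/2020", "02/2020", "01/2021")

# Default: levels are the sorted unique values, so the order is alphanumeric
default_levels <- levels(factor(labels))
default_levels   # "01/2020" "01/2021" "02/2020"

# Explicit levels keep the order in which the values first appear
ordered_levels <- levels(factor(labels, levels = unique(labels)))
ordered_levels   # "01/2020" "02/2020" "01/2021"
```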
Solutions:
Do not use the format function. ggplot understands dates. If you want, you can drop the day from the date (e.g. using lubridate).
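As a rough sketch of keeping real dates on the x-axis: the data frame below is made up for illustration (the values are not from ourworldindata.org), and the choices of lubridate::floor_date() to drop the day and scale_x_date() to control the labels are assumptions about how one might do it, not necessarily the only way:

```r
library(ggplot2)
library(lubridate)

# Hypothetical stand-in for the Germany subset of the question's data frame
germanydf <- data.frame(
  date = as.Date(c("2020-01-15", "2020-02-15", "2020-03-15",
                   "2020-04-15", "2020-05-15", "2020-06-15")),
  total_deaths = c(0, 0, 12, 3500, 7900, 8800)  # made-up numbers
)

# Keep the column as a Date; floor_date() sets every day to the 1st of its month
germanydf$month <- floor_date(germanydf$date, unit = "month")

p <- ggplot(germanydf, aes(x = month, y = total_deaths)) +
  geom_col() +
  # Format only the axis labels; the underlying data stay Dates and sort by time
  scale_x_date(date_breaks = "1 month", date_labels = "%m/%y") +
  labs(title = "Deaths vs date in Germany", y = "Total Deaths", x = "Date") +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust = 0.5))

print(p)
```

Because the x variable is a Date, ggplot uses a continuous date scale and the bars appear in chronological order regardless of how the labels are formatted.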