I have a tibble of train route data that records whether each rider was a member or not. In a ggplot bar chart I have the starting station name on x, the count on y, and the fill colour based on member status.
However, there are over 700 stations, so the chart is cluttered. I'd like to keep only the top 10 (most frequent) and the bottom 10 (least frequent) stations. The issue is that I don't think I can use the standard slice_min and slice_max functions, because there is no count column: I am relying on ggplot's default behaviour of computing the count for the y axis.
Is there a way to select the top 10 and bottom 10 counts so the chart isn't crowded? Additionally, I'd like to show the top and bottom as two subplots.
# A tibble: 6 × 3
  starting_station_name                ending_station_name                      member_status
  <chr>                                <chr>                                    <chr>
1 American University East Campus      39th & Veazey St NW                      member
2 Washington & Independence Ave SW/HHS Independence Ave & L'Enfant Plaza SW/DOE member
3 15th St & Massachusetts Ave SE       12th St & Pennsylvania Ave SE            member
4 New Hampshire Ave & Ward Pl NW       14th & Rhode Island Ave NW               casual
5 11th & Girard St NW                  Georgia & New Hampshire Ave NW           member
6 15th & W St NW                       California St & Florida Ave NW           member
The chart is built with this code:
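library(ggplot2)
# Keep columns 5, 7 and 8 of the cleaned data
rides_stations <- subset(rides_cleaned, select = c(5, 7, 8))
# geom_bar() counts the rows per station itself, so there is no count column
q1 <- ggplot(rides_stations, aes(x = starting_station_name, fill = member_status)) +
  geom_bar()
q1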
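library(dplyr); library(forcats); library(ggplot2)
# Toy data: 26 "stations" (letters) with unequal frequencies
data.frame(starting_station_name = sample(letters, 500, TRUE, prob = 26:1),
           member_status = sample(c("casual", "member"), 500, TRUE)) |>
  # Build the count column explicitly instead of relying on geom_bar()
  count(starting_station_name, member_status) |>
  mutate(starting_station_name = factor(starting_station_name) |>
           # Keep the 10 stations with the largest total n; lump the rest into "Other"
           fct_lump(n = 10, w = n) |>
           # Order stations from most to least rides
           fct_reorder(-n, sum)) |>
  # Drop the lumped "Other" level so only the top 10 remain
  filter(starting_station_name != "Other") |>
  ggplot(aes(starting_station_name, n, fill = member_status)) +
  geom_col()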
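The question also asks for the bottom 10 and for two subplots. One possible way to get both is sketched below, under some assumptions: rides_demo is made-up toy data standing in for rides_stations, and the names station_totals, ranked, plot_data, total and group are hypothetical helpers introduced only for this example. The idea is to count rides per station first, so slice_max and slice_min have a column to work on, then join the selected stations back to the rides and facet by group.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for rides_stations (hypothetical; letters act as stations)
set.seed(1)
rides_demo <- data.frame(
  starting_station_name = sample(letters, 500, TRUE, prob = 26:1),
  member_status = sample(c("casual", "member"), 500, TRUE)
)

# Total rides per station, ignoring member status
station_totals <- count(rides_demo, starting_station_name, name = "total")

# Pick the 10 busiest and 10 quietest stations and label each set
ranked <- bind_rows(
  slice_max(station_totals, total, n = 10, with_ties = FALSE) |>
    mutate(group = "Top 10"),
  slice_min(station_totals, total, n = 10, with_ties = FALSE) |>
    mutate(group = "Bottom 10")
) |>
  mutate(group = factor(group, levels = c("Top 10", "Bottom 10")))

# Keep only rides that start at one of the selected stations
plot_data <- inner_join(rides_demo, ranked, by = "starting_station_name")

# geom_bar() still does the per-bar counting; facet_wrap() gives the two panels
p <- ggplot(plot_data,
            aes(x = reorder(starting_station_name, -total), fill = member_status)) +
  geom_bar() +
  facet_wrap(~ group, scales = "free") +
  labs(x = "starting_station_name", y = "count")
p
```

with_ties = FALSE is used so each panel has exactly 10 stations even when several stations share the same count; with 700 real stations many will tie at the low end, so which of the tied ones appear in the bottom panel is arbitrary. scales = "free" lets each panel have its own x and y axes, which seems sensible when the two groups differ a lot in size.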