I'm working with a data frame of state-sponsored cyberattacks (my three main variables are Date, Sponsor and Victim). I want to create a geom_bar plot in which, for each year, the top five victims of cyberattacks appear.
I'm not sure how to produce a reproducible example for this. I made a version in which the overall top five victims appear, but it doesn't show how the targets change over the years.
library(tidyverse)

cyber %>%
  filter(Sponsor_sep == "China" &
           Victims_sep %in% c("United States", "China", "Japan", "South Korea", "India")) %>%
  ggplot() +
  geom_bar(mapping = aes(x = Year, fill = Victims_sep))
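The filter above hard-codes the five country names. As a sketch, the overall top five can instead be computed with dplyr's count() and slice_max(), so the list follows the data. The table cyber_toy is hypothetical toy data that only mirrors the OP's column names.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data with the same columns as the OP's `cyber` table
cyber_toy <- tibble(
  Year        = rep(c(2020, 2015), each = 7),
  Sponsor_sep = "China",
  Victims_sep = rep(c("Japan", "United States", "India", "China", "Vietnam", "Russia"),
                    times = c(4, 3, 2, 2, 2, 1))
)

# Overall top five victims of China-sponsored attacks, computed instead of typed in
top_overall <- cyber_toy %>%
  filter(Sponsor_sep == "China") %>%
  count(Victims_sep, sort = TRUE) %>%          # one row per victim, column n = attacks
  slice_max(n, n = 5, with_ties = TRUE) %>%    # keep the five largest counts
  pull(Victims_sep)

p <- cyber_toy %>%
  filter(Sponsor_sep == "China", Victims_sep %in% top_overall) %>%
  ggplot() +
  geom_bar(mapping = aes(x = Year, fill = Victims_sep))
```

This still ranks victims over the whole period, so it has the same limitation as the original plot; it only removes the hand-typed list.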
EDIT: Following @dandrews's comment, I created a sample:
cyber <- tibble::tibble(
Year = rep(c("2020", "2015", "2010", "2005"), c(73L, 53L, 9L, 4L)),
Sponsor_sep = rep("China", 139L),
Victims_sep = c(
"Japan", "Australia", "Asia", "Australia", "Asia", "China",
"China", "China", "United States", "United States", "China",
"Japan", "Australia", "Australia", "Australia", "India", "Kazakhstan",
"Kyrgyzstan", "Malaysia", "Russia", "Ukraine", "China", "United States",
"United States", "Vietnam", "United States", "United States",
"China", "China", "Malaysia", "Vietnam", "Asia", "China", "South Korea",
"Myanmar", "China", "Myanmar", "United States", "China", "Vatican City",
"China", "Vatican City", "China", "Japan", "Russia", "South Korea",
"Japan", "Russia", "South Korea", "Japan", "Russia", "South Korea",
"China", "International Organisations", "International Organisations",
"Japan", "China", "United States", "United States", "United States",
"United States", "United States", "Japan", "Russia", "South Korea",
"International Organisations", "International Organisations",
"Mongolia", "Mongolia", "Japan", "Asia", "Asia", "Mongolia",
"India", "Thailand", "South Korea", "Saudi Arabia", "Malaysia",
"United States", "Vietnam", "Cambodia", "Indonesia", "Myanmar",
"China", "Laos", "Singapore", "Phillipines", "India", "Thailand",
"South Korea", "Saudi Arabia", "Malaysia", "United States", "Vietnam",
"Cambodia", "Indonesia", "Myanmar", "China", "Laos", "Singapore",
"Phillipines", "Vietnam", "Vietnam", "Anthem", "United States",
"United Kingdom", "China", "United States", "United Kingdom",
"France", "United States", "United Kingdom", "France", "Thailand",
"United States", "United States", "United States", "United States",
"Malaysia", "Philippines", "India", "Indonesia", "United States",
"United States", "United States", "Australia", "United States",
"Asia", "India", "Australia", "United States", "United States",
"International Organisations", "United States", "International Organisations",
"United States", "United Kingdom", "United States", "United Kingdom"
)
)
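For reference, one standard way to turn an existing R object into code that others can paste, like the sample above, is base R's dput(). The sketch uses a small hypothetical tibble, cyber_head, in place of the real data; with the full data, dput(head(cyber, 20)) would be the analogous call.

```r
library(tibble)

# Hypothetical small slice standing in for the real `cyber` data
cyber_head <- tibble(
  Year        = c("2020", "2020", "2015"),
  Sponsor_sep = "China",
  Victims_sep = c("Japan", "Australia", "India")
)

# dput() prints R code that recreates the object; that printout is what gets shared
dput(cyber_head)

# Round trip: evaluating the printed code rebuilds the same object
shared_text <- paste(capture.output(dput(cyber_head)), collapse = "\n")
rebuilt <- eval(parse(text = shared_text))
```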
OK, I found this an interesting brain teaser, so I took a shot. I first created some data to work with; note that because these data were drawn randomly, the resulting figure is not very interesting. Still, the code seems to work, even if it is clunky.
library(tidyverse)

# Create the data and add some extra countries so the output varies
cyber <- tibble(Year = sample(seq(2005, 2022, 1), 50000, replace = TRUE),
                Victims_sep = sample(c("United States", "China", "Japan", "South Korea", "India",
                                       "England", "Spain", "Vietnam", "Canada", "France",
                                       "Bangladesh", "Taiwan", "Morocco"),
                                     50000, replace = TRUE))

# Original plot from OP but with more countries
cyber %>%
  ggplot() +
  geom_bar(mapping = aes(x = Year, fill = Victims_sep))
# New plot
cyber %>%
  group_by(Year, Victims_sep) %>%
  summarise(n = n()) %>% # get the number of attacks in each year for each country
  ungroup() %>%
  group_by(Year) %>%
  # get the attack counts of the countries ranked first through fifth in each year;
  # sort(n, partial = i)[i] returns the i-th smallest value without a full sort
  # (assumes every year has at least five distinct victims)
  mutate(max_victim = max(n),
         len = length(n),
         second = sort(n, partial = len - 1)[len - 1],
         third = sort(n, partial = len - 2)[len - 2],
         fourth = sort(n, partial = len - 3)[len - 3],
         fifth = sort(n, partial = len - 4)[len - 4]) %>%
  rowwise() %>%
  mutate(top5 = ifelse(n %in% max_victim:fifth, 1, 0)) %>% # flag counts between the 5th largest and the maximum
  filter(top5 == 1) %>% # keep only the flagged rows
  ggplot() +
  geom_col(mapping = aes(x = Year, y = n, fill = Victims_sep)) # geom_col uses the precomputed n as bar height
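The partial-sort lines above pick out the k-th largest count in each year, and the rowwise index then keeps counts between that value and the maximum. Because counts are whole numbers, that index amounts to n >= fifth, so the same rule can be sketched without rowwise(). kth_largest() is a hypothetical helper written for this illustration, and the counts are toy values.

```r
library(dplyr)

# Hypothetical helper: k-th largest value of x (duplicates counted), or 0 when x
# has fewer than k elements, mirroring the 0 fallback used for short years.
kth_largest <- function(x, k) {
  len <- length(x)
  if (len < k) return(0)
  # A partial sort only guarantees that position len - k + 1 holds the value
  # it would have after a full ascending sort, which is all we need here.
  sort(x, partial = len - k + 1)[len - k + 1]
}

# Toy per-year counts (hypothetical)
counts <- tibble(
  Year        = c(2015, 2015, 2015, 2015, 2015, 2015, 2010, 2010),
  Victims_sep = c("United States", "Vietnam", "India", "China", "United Kingdom",
                  "France", "United States", "United Kingdom"),
  n           = c(12, 4, 3, 3, 3, 2, 2, 2)
)

top5 <- counts %>%
  group_by(Year) %>%
  # same effect as n %in% max_victim:fifth for whole-number counts
  filter(n >= kth_largest(n, 5)) %>%
  ungroup()
```

In this toy table the fifth-largest 2015 count is 3, so France (2) is dropped, while 2010 has only two victims and the 0 threshold keeps both.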
UPDATE USING OP'S DATA
I think this works. Note that, with the data provided, a year's column can show more than five countries when several of them have the same number of attacks. For example, 2015 shows 8 countries because six of them are tied on 3 attacks each.
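The tie behaviour described above can be reproduced with dplyr's slice_max(): with_ties = TRUE (the default) keeps every country tied with fifth place, while with_ties = FALSE returns exactly five rows and silently drops some of the tied countries. The counts below are toy values chosen to contain a six-way tie at the fifth-largest count.

```r
library(dplyr)

# Toy counts for one year (hypothetical): six countries tie on 3 attacks
counts_2015 <- tibble(
  Victims_sep = c("United States", "Vietnam", "India", "Thailand", "Malaysia",
                  "Indonesia", "China", "United Kingdom", "France"),
  n = c(12, 4, 3, 3, 3, 3, 3, 3, 2)
)

# Keeps all countries tied with fifth place, like the answer's index (8 rows here)
with_ties <- slice_max(counts_2015, n, n = 5, with_ties = TRUE)

# Exactly five rows; which of the tied countries survive is arbitrary
exact_five <- slice_max(counts_2015, n, n = 5, with_ties = FALSE)
```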
cyber %>%
  mutate(Year = as.numeric(Year)) %>%
  group_by(Year, Victims_sep) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  group_by(Year) %>%
  # ifelse() guards years with fewer than five victims; a 0 threshold keeps them all
  mutate(max_victim = max(n),
         len = length(n),
         second = ifelse(len >= 2, sort(n, partial = len - 1)[len - 1], 0),
         third = ifelse(len >= 3, sort(n, partial = len - 2)[len - 2], 0),
         fourth = ifelse(len >= 4, sort(n, partial = len - 3)[len - 3], 0),
         fifth = ifelse(len >= 5, sort(n, partial = len - 4)[len - 4], 0)) %>%
  rowwise() %>%
  mutate(top5 = ifelse(n %in% max_victim:fifth, 1, 0)) %>%
  filter(top5 == 1) %>%
  ggplot() +
  geom_col(mapping = aes(x = Year, y = n, fill = Victims_sep))
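For comparison, here is a shorter sketch of the same per-year top five using count() and slice_max() (dplyr 1.0.0 or later) in place of the partial sorts and rowwise(). It runs on hypothetical simulated data, cyber_sim, laid out like the OP's sample; because with_ties = TRUE keeps every country tied with fifth place, it should keep the same rows as the update, ties included.

```r
library(dplyr)
library(ggplot2)

# Hypothetical simulated data in the OP's layout (Year as text, one row per attack)
set.seed(1)
cyber_sim <- tibble(
  Year        = as.character(sample(2005:2010, 120, replace = TRUE)),
  Sponsor_sep = "China",
  Victims_sep = sample(c("United States", "China", "Japan", "South Korea", "India",
                         "Vietnam", "Russia", "Australia"), 120, replace = TRUE)
)

top5_by_year <- cyber_sim %>%
  mutate(Year = as.numeric(Year)) %>%
  count(Year, Victims_sep) %>%              # one row per year and victim, column n
  group_by(Year) %>%
  slice_max(n, n = 5, with_ties = TRUE) %>% # five largest counts per year, ties kept
  ungroup()

p <- ggplot(top5_by_year) +
  geom_col(mapping = aes(x = Year, y = n, fill = Victims_sep))
```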