rgeom-bargeom

geom_bar: how to show only the x most frequent values?


I'm working with a data frame on state-sponsored cyberattacks (my three main variables are Date, Sponsor and Victim). I want to create a geom_bar plot that shows, for each year, the top five victims of cyberattacks.

I'm not sure how to produce a reproducible example for this. I made a version that shows the overall top 5 victims, but it doesn't reflect how the targets change over the years.


cyber%>%
  filter(Sponsor_sep == "China" & 
         Victims_sep %in% c("United States", "China", "Japan", "South Korea", "India"))%>%
  ggplot() + 
  geom_bar(mapping = aes(x = Year, fill = Victims_sep))

EDIT: I followed @dandrews's comment and created a sample:

cyber <- tibble::tibble(
  Year = rep(c("2020", "2015", "2010", "2005"), c(73L, 53L, 9L, 4L)),
  Sponsor_sep = rep("China", 139L),
  Victims_sep = c(
    "Japan", "Australia", "Asia", "Australia", "Asia", "China",
    "China", "China", "United States", "United States", "China",
    "Japan", "Australia", "Australia", "Australia", "India", "Kazakhstan",
    "Kyrgyzstan", "Malaysia", "Russia", "Ukraine", "China", "United States",
    "United States", "Vietnam", "United States", "United States",
    "China", "China", "Malaysia", "Vietnam", "Asia", "China", "South Korea",
    "Myanmar", "China", "Myanmar", "United States", "China", "Vatican City",
    "China", "Vatican City", "China", "Japan", "Russia", "South Korea",
    "Japan", "Russia", "South Korea", "Japan", "Russia", "South Korea",
    "China", "International Organisations", "International Organisations",
    "Japan", "China", "United States", "United States", "United States",
    "United States", "United States", "Japan", "Russia", "South Korea",
    "International Organisations", "International Organisations",
    "Mongolia", "Mongolia", "Japan", "Asia", "Asia", "Mongolia",
    "India", "Thailand", "South Korea", "Saudi Arabia", "Malaysia",
    "United States", "Vietnam", "Cambodia", "Indonesia", "Myanmar",
    "China", "Laos", "Singapore", "Phillipines", "India", "Thailand",
    "South Korea", "Saudi Arabia", "Malaysia", "United States", "Vietnam",
    "Cambodia", "Indonesia", "Myanmar", "China", "Laos", "Singapore",
    "Phillipines", "Vietnam", "Vietnam", "Anthem", "United States",
    "United Kingdom", "China", "United States", "United Kingdom",
    "France", "United States", "United Kingdom", "France", "Thailand",
    "United States", "United States", "United States", "United States",
    "Malaysia", "Philippines", "India", "Indonesia", "United States",
    "United States", "United States", "Australia", "United States",
    "Asia", "India", "Australia", "United States", "United States",
    "International Organisations", "United States", "International Organisations",
    "United States", "United Kingdom", "United States", "United Kingdom"
  ),
)

Solution

  • OK, I found this to be an interesting brain teaser, so I took a shot. I first created some data to work with, but note that because these data were drawn randomly, the resulting figure is not very interesting. However, the code seems to work even if it is clunky.

    library(tidyverse)
    
    # Create the data and add some extra countries so the output varies
    cyber <- tibble(Year = sample(seq(2005, 2022, 1), 50000, replace = TRUE),
                    Victims_sep = sample(c("United States", "China", "Japan", "South Korea", "India",
                                           "England", "Spain", "Vietnam", "Canada", "France", "Bangladesh", "Taiwan", "Morocco"),
                                         50000,
                                         replace = TRUE))
    
    # Original plot from OP but with more countries 
    cyber %>% 
    ggplot() + 
      geom_bar(mapping = aes(x = Year, fill = Victims_sep))
    
    # New plot
    cyber %>% 
      group_by(Year, Victims_sep) %>% 
      summarise(n = n()) %>% # get the number of attacks in each year for each country
      ungroup() %>% 
      group_by(Year) %>% 
      # get the attack counts of the most-attacked through fifth-most-attacked country in each year
      mutate(max_victim = max(n), 
             len = length(n),
             second = sort(n, partial = len - 1)[len - 1],
             third = sort(n, partial = len - 2)[len - 2],
             fourth = sort(n, partial = len - 3)[len - 3],
             fifth = sort(n, partial = len - 4)[len - 4]) %>% 
      rowwise() %>% 
      mutate(top5 = ifelse(n %in% max_victim:fifth, 1, 0)) %>% # flag counts between the maximum and the fifth-highest
      filter(top5 == 1) %>% # keep only the flagged rows
      ggplot() + 
      geom_col(mapping = aes(x = Year, y = n, fill = Victims_sep)) # use geom_col to plot the precomputed n
    

    UPDATE USING OP'S DATA

    I think this works. Note that with the data provided, a year can show more than 5 countries in its column when several countries are tied on the number of attacks. For example, 2015 shows 8 countries, because six of them are tied at the fifth-highest count (three attacks each).

    cyber %>% 
      mutate(Year = as.numeric(Year)) %>% 
      group_by(Year, Victims_sep) %>% 
      summarise(n = n()) %>% 
      ungroup() %>% 
      group_by(Year) %>% 
      mutate(max_victim = max(n),
             len = length(n),
             second = ifelse(len >= 2, sort(n, partial = len - 1)[len - 1], 0),
             third = ifelse(len >= 3, sort(n, partial = len - 2)[len - 2], 0),
             fourth = ifelse(len >= 4, sort(n, partial = len - 3)[len - 3], 0),
             fifth = ifelse(len >= 5, sort(n, partial = len - 4)[len - 4], 0)) %>%
      rowwise() %>% 
      mutate(top5 = ifelse(n %in% max_victim:fifth, 1, 0)) %>% 
      filter(top5 == 1) %>% 
      ggplot() + 
      geom_col(mapping = aes(x = Year, y = n, fill = Victims_sep))
    

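A more compact way to express the same per-year top-5 filter is sketched below, assuming dplyr 1.0.0 or later so that `slice_max()` is available. This is a swapped-in technique, not the code used above. The `toy` tibble and the names `counts`, `top5_ties` and `top5_strict` are made up for illustration. The `with_ties` argument controls whether countries tied at the fifth-highest count are all kept (as the code above does) or whether each year is cut off at exactly five rows.

```r
library(dplyr)
library(ggplot2)

# Toy data, invented purely for illustration (not the OP's data)
toy <- tibble::tibble(
  Year = rep(c(2015, 2020), c(11L, 3L)),
  Victims_sep = c(
    # 2015: one leader, two countries on 2 attacks, four tied on 1 attack
    rep("United States", 3), rep("Japan", 2), rep("India", 2),
    "China", "France", "Vietnam", "Russia",
    # 2020: fewer than five distinct victims
    rep("United States", 2), "China"
  )
)

# One row per Year/victim pair, with the number of attacks in column n
counts <- toy %>% count(Year, Victims_sep)

# Keep every country tied at the fifth-highest count (can exceed 5 rows per year)
top5_ties <- counts %>%
  group_by(Year) %>%
  slice_max(order_by = n, n = 5, with_ties = TRUE) %>%
  ungroup()

# Keep at most 5 rows per year; ties at the cutoff are dropped
top5_strict <- counts %>%
  group_by(Year) %>%
  slice_max(order_by = n, n = 5, with_ties = FALSE) %>%
  ungroup()

# geom_col plots the precomputed counts; factor(Year) gives one bar per year
p <- ggplot(top5_ties, aes(x = factor(Year), y = n, fill = Victims_sep)) +
  geom_col() +
  labs(x = "Year")
```

Call `print(p)` to draw the figure. With `with_ties = FALSE` the choice among tied countries depends on row order, so the version that keeps ties is probably the safer default when the counts are small.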