
Grouped bar plot in ggplot


I have a survey file in which the rows are observations and the columns are questions.

Here is some fake data showing what it looks like:

People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good

My aim is to create this kind of plot with ggplot2.

[Image: example of the desired grouped bar plot]

Here is how I load my fake data:

raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)

But if I choose count as Y, I run into trouble choosing the X and Group values. I don't know whether I can succeed without using reshape2. I have also tried to use reshape with the melt function, but I don't understand how to use it.


Solution

  • EDIT: Many years later

    For a pure ggplot2 + utils::stack() solution, see the answer by @markus!
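
    The linked answer is not reproduced here. A rough sketch of what a ggplot2 + utils::stack() approach could look like (an assumption, not necessarily @markus's code), with the six sample rows from the question typed in by hand so that nothing has to be downloaded:

```r
library(ggplot2)

# Six sample rows from the question, typed in by hand (assumption: the real
# file has the same columns). The second "People" column is named People.1,
# which is what read.csv() makes of a duplicated header.
raw <- data.frame(
  People   = paste0("P", 1:6),
  Food     = c("Very Bad", "Good", "Good", "Good", "Bad", "Bad"),
  Music    = c("Bad", "Good", "Bad", "Very Bad", "Good", "Good"),
  People.1 = c("Good", "Very Bad", "Good", "Very Good", "Very Good", "Very Good"),
  stringsAsFactors = FALSE
)

# stack() puts all answers into one column (values) and records the
# originating question in a second column (ind).
long <- utils::stack(raw[, c("Food", "Music", "People.1")])
long$values <- factor(long$values,
                      levels = c("Very Bad", "Bad", "Good", "Very Good"))

# geom_bar() counts the rows itself (stat = "count" is its default),
# so no separate summarising step is needed.
p <- ggplot(long, aes(x = ind, fill = values)) +
  geom_bar(position = "dodge")
print(p)
```

    Because the counting is left to geom_bar(), answer categories that never occur for a question simply get no bar.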


    A somewhat verbose tidyverse solution, with all non-base packages explicitly stated so that you know where each function comes from:

    library(magrittr) # needed for %>% if dplyr is not attached
    
    "http://pastebin.com/raw.php?i=L8cEKcxS" %>%
      utils::read.csv(sep = ",") %>%
      tidyr::pivot_longer(cols = c(Food, Music, People.1),
                          names_to = "variable",
                          values_to = "value") %>%
      dplyr::group_by(variable, value) %>%
      dplyr::summarise(n = dplyr::n()) %>%
      dplyr::mutate(value = factor(
        value,
        levels = c("Very Bad", "Bad", "Good", "Very Good"))
      ) %>%
      ggplot2::ggplot(ggplot2::aes(variable, n)) +
      ggplot2::geom_bar(ggplot2::aes(fill = value),
                        position = "dodge",
                        stat = "identity")
    

    The original answer:

    First you need to get the counts for each category, i.e. how many Bads, Goods and so on there are for each group (Food, Music, People). This can be done like so:

    raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS",sep=",")
    raw[,2]<-factor(raw[,2],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
    raw[,3]<-factor(raw[,3],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
    raw[,4]<-factor(raw[,4],levels=c("Very Bad","Bad","Good","Very Good"),ordered=FALSE)
    
    raw=raw[,c(2,3,4)] # getting rid of the "people" variable as I see no use for it
    
    freq=table(col(raw), as.matrix(raw)) # get the counts of each factor level
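
    To see what table(col(raw), as.matrix(raw)) produces, here is a small self-contained sketch on the six sample rows from the question (typed in by hand, so the counts are smaller than in the full file). The names lv and freq2 and the sapply() variant are additions for illustration, not part of the code above.

```r
lv <- c("Very Bad", "Bad", "Good", "Very Good")

# The three answer columns of the six sample rows, typed in by hand.
raw <- data.frame(
  Food   = c("Very Bad", "Good", "Good", "Good", "Bad", "Bad"),
  Music  = c("Bad", "Good", "Bad", "Very Bad", "Good", "Good"),
  People = c("Good", "Very Bad", "Good", "Very Good", "Very Good", "Very Good"),
  stringsAsFactors = FALSE
)

# col(raw) holds the column index of every cell and as.matrix(raw) the answer
# in that cell, so table() cross-tabulates "which question" by "which answer".
freq <- table(col(raw), as.matrix(raw))
print(freq)
# Rows are 1, 2, 3 (Food, Music, People). The columns come out alphabetically
# (Bad, Good, Very Bad, Very Good), so they have to be re-sorted afterwards.

# Illustrative variant: count each column against fixed levels, which keeps
# the intended answer order and the question names.
freq2 <- t(sapply(raw, function(x) table(factor(x, levels = lv))))
print(freq2)
```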
    

    Then you need to create a data frame out of it, melt it and plot it:

    library(reshape2)  # provides melt()
    library(ggplot2)   # provides ggplot() and geom_bar()
    
    Names=c("Food","Music","People")     # names of the three questions, in column order
    data=data.frame(cbind(freq),Names)   # combine them into a data frame
    data=data[,c(5,3,1,2,4)]             # sort columns: Names, Very.Bad, Bad, Good, Very.Good
    
    # melt the data frame for plotting
    data.m <- melt(data, id.vars='Names')
    
    # plot everything
    ggplot(data.m, aes(Names, value)) +
      geom_bar(aes(fill = variable), position = "dodge", stat="identity")
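
    If melt() is the unclear part: it stacks every column not listed in id.vars into a variable/value pair, one row per original cell. A minimal sketch, assuming the reshape2 package is installed and using the wide data frame printed further down (values typed in by hand):

```r
library(reshape2)

# Wide layout: one row per question, one column per answer category.
data <- data.frame(
  Names     = c("Food", "Music", "People"),
  Very.Bad  = c(7, 5, 6),
  Bad       = c(6, 5, 3),
  Good      = c(5, 7, 7),
  Very.Good = c(2, 3, 4)
)

# Long layout: one row per (Names, answer) pair, 3 questions x 4 answers = 12 rows.
data.m <- melt(data, id.vars = "Names")
print(head(data.m, 4))
str(data.m)
```

    The variable column is a factor whose levels follow the column order of the wide data frame, which is why the columns are sorted before melting: that order determines the order of the bars and of the legend.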
    

    Is this what you're after?

    [Image: the resulting grouped bar plot]

    To clarify a little bit: in the question "ggplot multiple grouping bar" you had a data frame that looked like this:

    > head(df)
      ID Type Annee X1PCE X2PCE X3PCE X4PCE X5PCE X6PCE
    1  1    A  1980   450   338   154    36    13     9
    2  2    A  2000   288   407   212    54    16    23
    3  3    A  2020   196   434   246    68    19    36
    4  4    B  1980   111   326   441    90    21    11
    5  5    B  2000    63   298   443   133    42    21
    6  6    B  2020    36   257   462   162    55    30
    

    Since you have numerical values in columns 4-9, which would later be plotted on the y axis, this can be easily transformed with reshape and plotted.
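
    For that older data frame, the reshape step might look like the sketch below. The six rows are copied from the head(df) output above; putting Annee on the x axis and splitting by Type with facet_wrap() is a guess at a reasonable layout, not something taken from the other question.

```r
library(reshape2)
library(ggplot2)

# The six rows shown by head(df), typed in by hand.
df <- data.frame(
  ID    = 1:6,
  Type  = rep(c("A", "B"), each = 3),
  Annee = rep(c(1980, 2000, 2020), times = 2),
  X1PCE = c(450, 288, 196, 111, 63, 36),
  X2PCE = c(338, 407, 434, 326, 298, 257),
  X3PCE = c(154, 212, 246, 441, 443, 462),
  X4PCE = c(36, 54, 68, 90, 133, 162),
  X5PCE = c(13, 16, 19, 21, 42, 55),
  X6PCE = c(9, 23, 36, 11, 21, 30)
)

# Columns 4-9 already hold the bar heights, so they only need to be stacked
# into variable/value pairs; no counting step is required.
df.m <- melt(df, id.vars = c("ID", "Type", "Annee"))

# Assumed layout: one group of bars per year, one panel per Type.
p <- ggplot(df.m, aes(factor(Annee), value)) +
  geom_bar(aes(fill = variable), position = "dodge", stat = "identity") +
  facet_wrap(~ Type)
print(p)
```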

    For our current data set we needed something similar, so we used freq=table(col(raw), as.matrix(raw)) and built a data frame from it to get this:

    > data
       Names Very.Bad Bad Good Very.Good
    1   Food        7   6    5         2
    2  Music        5   5    7         3
    3 People        6   3    7         4
    

    Just imagine you have Very.Bad, Bad, Good and so on instead of X1PCE, X2PCE, X3PCE. See the similarity? But we needed to create such a structure first, hence the freq=table(col(raw), as.matrix(raw)).
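
    If you would rather avoid reshape2 altogether, one possible shortcut (not part of the answer above) is that as.data.frame() on a table already returns the long layout that ggplot wants. A sketch on the six sample rows from the question, typed in by hand; the long data frame and its column names are chosen for illustration:

```r
library(ggplot2)

# The three answer columns of the six sample rows, typed in by hand.
raw <- data.frame(
  Food   = c("Very Bad", "Good", "Good", "Good", "Bad", "Bad"),
  Music  = c("Bad", "Good", "Bad", "Very Bad", "Good", "Good"),
  People = c("Good", "Very Bad", "Good", "Very Good", "Very Good", "Very Good"),
  stringsAsFactors = FALSE
)

freq <- table(col(raw), as.matrix(raw))

# as.data.frame() on a two-way table gives one row per cell,
# with the two classifying factors followed by the count.
long <- as.data.frame(freq)
names(long) <- c("Names", "variable", "value")
levels(long$Names) <- names(raw)   # 1, 2, 3 -> Food, Music, People
long$variable <- factor(long$variable,
                        levels = c("Very Bad", "Bad", "Good", "Very Good"))

p <- ggplot(long, aes(Names, value)) +
  geom_bar(aes(fill = variable), position = "dodge", stat = "identity")
print(p)
```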