rdataframeheaderreshapemelt

How to melt data in a data frame with a two-row header in R?


data <- data.frame(Q1_1 = c("abc",1,2,3,4),
                   Q1_2 = c("def",2,3,6,2),
                   Q2_1 = c("ghi",4,5,3,1),
                   Q2_2 = c("jkl",6,3,8,4),
                   ID = c("ID", "AA", "BB", "CC", "DD"),
                   Order = c("Order", "zz", "ss", "tt", "qq"))

The above resembles my data frame in terms of its layout, which should look like the one below. The actual data are saved as in a CSV file, once you read it into R with read.csv(), it would create a data frame like this.

  Q1_1 Q1_2 Q2_1 Q2_2 ID Order
1  abc  def  ghi  klm ID Order
2    1    2    4    6 AA    zz
3    2    3    5    3 BB    ss
4    3    6    3    8 CC    tt
5    4    2    1    4 DD    qq

The trickiest thing about it is that the data file has the first two rows as headers, and once it is read into R, it produces a data frame that presents the secondary header in the first row.

This is what I want to end up seeing. There are the codes that I manually wrote.

melted <- data.frame(response = c(1,2,3,4,2,3,6,2,4,5,3,1,6,3,8,4),
                    question_code = c("abc","abc","abc","abc",
                                      "def","def","def","def",
                                      "ghi","ghi","ghi","ghi",
                                      "jkl","jkl","jkl","jkl"),
                    question_number = c("Q1_1","Q1_1","Q1_1","Q1_1",
                                        "Q1_2","Q1_2","Q1_2","Q1_2",
                                        "Q2_1","Q2_1","Q2_1","Q2_1",
                                        "Q2_2","Q2_2","Q2_2","Q2_2"),
                    ID = c("AA", "BB", "CC", "DD",
                           "AA", "BB", "CC", "DD",
                           "AA", "BB", "CC", "DD",
                           "AA", "BB", "CC", "DD"),
                    Order = c("zz", "ss", "tt", "qq",
                              "zz", "ss", "tt", "qq",
                              "zz", "ss", "tt", "qq",
                              "zz", "ss", "tt", "qq"))

And it should look like this...

   response question_code question_number ID Order
1         1           abc            Q1_1 AA    zz
2         2           abc            Q1_1 BB    ss
3         3           abc            Q1_1 CC    tt
4         4           abc            Q1_1 DD    qq
5         2           def            Q1_2 AA    zz
6         3           def            Q1_2 BB    ss
7         6           def            Q1_2 CC    tt
8         2           def            Q1_2 DD    qq
9         4           ghi            Q2_1 AA    zz
10        5           ghi            Q2_1 BB    ss
11        3           ghi            Q2_1 CC    tt
12        1           ghi            Q2_1 DD    qq
13        6           jkl            Q2_2 AA    zz
14        3           jkl            Q2_2 BB    ss
15        8           jkl            Q2_2 CC    tt
16        4           jkl            Q2_2 DD    qq

So basically I want to stack the values ("response") into one column while creating four columns as references. Two of the four columns are question_code and question__name, which are transposed from the original two-header. The rest two column simply presents ID and Order (I do not know if the way I explained is understandable, but all the codes are here).

If the data file has only one row as header, I know how to do it with melt(), but this two-row header seems to largely complicates the situation.


Solution

  • data %>%
      mutate(across(starts_with('Q'), ~c(str_c(cur_column(),.[1], sep=':'), .[-1]))) %>%
      set_names(.[1,]) %>%
      slice(-1) %>%
      pivot_longer(starts_with('Q'), values_to = 'response', names_sep = ':', 
                   names_to = c("question_number", "question_code"))
    
    # A tibble: 16 × 5
       ID    Order question_number question_code response
       <chr> <chr> <chr>           <chr>         <chr>   
     1 AA    zz    Q1_1            abc           1       
     2 AA    zz    Q1_2            def           2       
     3 AA    zz    Q2_1            ghi           4       
     4 AA    zz    Q2_2            jkl           6       
     5 BB    ss    Q1_1            abc           2       
     6 BB    ss    Q1_2            def           3       
     7 BB    ss    Q2_1            ghi           5       
     8 BB    ss    Q2_2            jkl           3       
     9 CC    tt    Q1_1            abc           3       
    10 CC    tt    Q1_2            def           6       
    11 CC    tt    Q2_1            ghi           3       
    12 CC    tt    Q2_2            jkl           8       
    13 DD    qq    Q1_1            abc           4       
    14 DD    qq    Q1_2            def           2       
    15 DD    qq    Q2_1            ghi           1       
    16 DD    qq    Q2_2            jkl           4