r tidyverse data-conversion multilevel-analysis

How to convert wide data into long format for a cross-classified model [R, GLMM]


I would like to convert wide data to long data in R. My data set is for a cross-classified model that explores participants’ responses to each target item, where the items have different characteristics.


Although there are many tutorials on wide-to-long conversion, I could not find one that specifically explains the conversion for cross-classified models.

I would like to use tidyverse if possible for the sake of consistency.

My sample data is the following:

structure(list(item_name = c("x1", "x2", "participant_id", "1", 
"2", "3", "4", "5", "6", "7"), participant_variable_1 = c(NA, 
NA, NA, 20, 23, 21, 20, 19, 22, 30), condition = c(NA, NA, NA, 
"A", "B", "A", "B", "A", "B", "A"), t1.item1.test1 = c(1, 3, 
NA, 0, 1, 0, 1, 0, 0, 1), t1.item2.test1 = c(2, 2, NA, 0, 0, 
0, 1, 1, 0, 1), t1.item3.test1 = c(1, 3, NA, 0, 0, 0, 1, 0, 0, 
0), t1.item4.test1 = c(3, 1, NA, 1, 0, 0, 0, 1, 1, 0), t2.item1.test1 = c(1, 
3, NA, 0, 1, 1, 0, 1, 1, 1), t2.item2.test1 = c(2, 2, NA, 1, 
0, 1, 0, 1, 0, 1), t2.item3.test1 = c(1, 3, NA, 0, 0, 0, 1, 0, 
0, 0), t2.item4.test1 = c(3, 1, NA, 1, 1, 0, 1, 1, 1, 0), t1.item1.test2 = c(1, 
3, NA, 0, 1, 0, 1, 0, 0, 1), t1.item2.test2 = c(2, 2, NA, 0, 
0, 0, 1, 1, 0, 1), t1.item3.test2 = c(1, 3, NA, 0, 0, 0, 1, 0, 
0, 0), t1.item4.test2 = c(3, 1, NA, 1, 0, 0, 0, 1, 1, 0), t2.item1.test2 = c(1, 
3, NA, 0, 1, 1, 0, 1, 1, 1), t2.item2.test2 = c(2, 2, NA, 1, 
0, 1, 0, 1, 0, 1), t2.item3.test2 = c(1, 3, NA, 0, 0, 0, 1, 0, 
0, 0), t2.item4.test2 = c(3, 1, NA, 1, 1, 0, 1, 1, 1, 0)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

I would like to end up with long data that looks like the following:

[Image: the desired long-format data]

Please and thank you for your guidance!


Solution

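  • This answer relies heavily on the pivot_longer() and pivot_wider() functions, which were only in the development version of tidyr when it was written; they have since been released in tidyr 1.0.0. If your installed tidyr is older, you can get the development version with devtools::install_github("tidyverse/tidyr").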

    First we split the data into item and participant info - you're not really getting any benefit from storing both in the same table:

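    library(dplyr)  # provides %>% and rename()

    # dat is the tibble created by the structure(...) call in the question
    item_info = dat[1:2, ]                      # rows 1-2: item characteristics x1, x2
    participant_info = dat[4:nrow(dat), ] %>%   # rows 4+: one row per participant
        rename(participant_id = item_name)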
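
The split above depends on hard-coded row numbers. As a rough sketch of an alternative, assuming the item rows can be recognised by item_name being x1 or x2 and the participant rows by a non-missing condition, the same two tables could be built with filter(). Here dat_small is a hypothetical, trimmed stand-in for the question's data (three participants, two measurement columns), and the *_small names are illustrative only.

```r
library(dplyr)

# Hypothetical trimmed copy of the question's table, for illustration only
dat_small <- tibble(
  item_name = c("x1", "x2", "participant_id", "1", "2", "3"),
  participant_variable_1 = c(NA, NA, NA, 20, 23, 21),
  condition = c(NA, NA, NA, "A", "B", "A"),
  t1.item1 = c(1, 3, NA, 0, 1, 0),
  t1.item2 = c(2, 2, NA, 0, 0, 0)
)

# Item rows: identified by the names of the item characteristics
item_info_small <- dat_small %>%
  filter(item_name %in% c("x1", "x2"))

# Participant rows: the only rows with a non-missing condition
participant_info_small <- dat_small %>%
  filter(!is.na(condition)) %>%
  rename(participant_id = item_name)
```

This also drops the all-NA "participant_id" header row without having to know its position.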
    

    Then it's time for a lot of pivoting:

    # I have the dev version of tidyr so that is being loaded
    library(tidyverse)
    
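    # NOTE: the column range and names_pattern below assume measurement columns
    # named t1.item1 ... t2.item4; if yours carry a suffix (e.g. t1.item1.test1),
    # extend the range, the pattern, and the join keys to match.
    
    # Item characteristics: one row per time/item, with x1 and x2 as columns
    item_long = item_info %>%
        select(-participant_variable_1, -condition) %>%
        pivot_longer(
            cols = t1.item1:t2.item4,
            names_to = c("time", "item"),
            names_pattern = "t(\\d)\\.(item\\d)"
        ) %>%
        pivot_wider(names_from = item_name, values_from = value)
    
    # Responses: one row per participant/time/item
    participant_long = participant_info %>%
        pivot_longer(
            cols = t1.item1:t2.item4,
            names_to = c("time", "item"),
            names_pattern = "t(\\d)\\.(item\\d)",
            values_to = "response"
        )
    
    # Attach the item characteristics to every response
    combined = participant_long %>%
        left_join(item_long, by = c("item", "time"))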
    

    Result:

    > combined
    # A tibble: 56 x 8
       participant_id participant_variable_1 condition time  item  response    x1    x2
       <chr>                           <dbl> <chr>     <chr> <chr>    <dbl> <dbl> <dbl>
     1 1                                  20 A         1     item1        0     1     3
     2 1                                  20 A         1     item2        0     2     2
     3 1                                  20 A         1     item3        0     1     3
     4 1                                  20 A         1     item4        1     3     1
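
The sample data in the question names its measurement columns with a test suffix (for example t1.item1.test1), which the pipeline above does not parse. The following is a minimal sketch of how the same approach might be extended, assuming the suffix should become its own "test" column. dat_small is a hypothetical subset of the question's data (two participants, two items), and meas_pattern is an illustrative name, not part of the original answer.

```r
library(dplyr)
library(tidyr)  # pivot_longer()/pivot_wider() need tidyr >= 1.0.0

# Hypothetical subset of the question's data: 2 participants, 2 items, 2 times, 2 tests
dat_small <- tibble(
  item_name = c("x1", "x2", "participant_id", "1", "2"),
  participant_variable_1 = c(NA, NA, NA, 20, 23),
  condition = c(NA, NA, NA, "A", "B"),
  t1.item1.test1 = c(1, 3, NA, 0, 1),
  t1.item2.test1 = c(2, 2, NA, 0, 0),
  t2.item1.test1 = c(1, 3, NA, 0, 1),
  t2.item2.test1 = c(2, 2, NA, 1, 0),
  t1.item1.test2 = c(1, 3, NA, 0, 1),
  t1.item2.test2 = c(2, 2, NA, 0, 0),
  t2.item1.test2 = c(1, 3, NA, 0, 1),
  t2.item2.test2 = c(2, 2, NA, 1, 0)
)

# One regex used both to select the measurement columns and to split their names
meas_pattern <- "t(\\d)\\.(item\\d)\\.test(\\d)"

# Item characteristics: one row per time/item/test, x1 and x2 as columns
item_long <- dat_small %>%
  filter(item_name %in% c("x1", "x2")) %>%
  select(-participant_variable_1, -condition) %>%
  pivot_longer(
    cols = matches(meas_pattern),
    names_to = c("time", "item", "test"),
    names_pattern = meas_pattern
  ) %>%
  pivot_wider(names_from = item_name, values_from = value)

# Responses: one row per participant/time/item/test
participant_long <- dat_small %>%
  filter(!is.na(condition)) %>%
  rename(participant_id = item_name) %>%
  pivot_longer(
    cols = matches(meas_pattern),
    names_to = c("time", "item", "test"),
    names_pattern = meas_pattern,
    values_to = "response"
  )

# Attach the item characteristics to every response
combined <- participant_long %>%
  left_join(item_long, by = c("time", "item", "test"))
```

In this layout each row is one response, with participant_id and item available as two crossed grouping factors, which is the shape a cross-classified model typically expects. For instance, an lme4 formula along the lines of response ~ condition + (1 | participant_id) + (1 | item) would be one possible starting point, though the model specification is outside what the answer covers.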