rtidy

How can I spread values from multiple rows across multiple columns in R


I feel like this is a very simple procedure and yet I haven't managed to do it yet. I have a df like this:

book_title  author
book1       bob
book1       jane
book2       alice
book2       fred
book2       greg
book3       louise
book3       sam

And I would like to transform it to look like this:

book_title  author_1  author_2  author_3
book1       bob       jane      NA
book2       alice     fred      greg
book3       louise    sam       NA

I know that spread(), pivot_wider() and separate() are not going to be of much use. All help is greatly appreciated!


Solution

  • I've included three ways to do this: a base R approach (i.e. using no packages), a tidyverse method and a data.table solution.

    I wouldn't generally use base R for reshaping wide/long as I find the syntax a little confusing - although in this case it's not too bad. In my view, the tidyverse syntax is the most intuitive. However, with large datasets it can be much slower than data.table

    base R approach

    dt$author_num <- unlist(lapply(rle(dt$book_title)$lengths, seq))
    
    reshape(
        dt,
        idvar = "book_title",
        timevar = "author_num",
        direction = "wide",
        sep = "_"
    )
    
    #   book_title author_1 author_2 author_3
    # 1      book1      bob     jane     <NA>
    # 3      book2    alice     fred     greg
    # 6      book3   louise      sam     <NA>
    

    tidyverse approach

    library(dplyr)
    library(tidyr)
    
    dt  |>
      group_by(book_title)  |>
      mutate(author_num = paste0(
        "author_", row_number()
      ))  |>
      pivot_wider(
        id_cols = book_title,
        names_from = author_num,
        values_from = author
      )
    
    # # A tibble: 3 × 4
    # # Groups:   book_title [3]
    #   book_title author_1 author_2 author_3
    #   <chr>      <chr>    <chr>    <chr>   
    # 1 book1      bob      jane     NA      
    # 2 book2      alice    fred     greg    
    # 3 book3      louise   sam      NA      
    

    data.table approach

    Here is a method using the data.table::dcast function:

    library(data.table)
    setDT(dt)
    
    dt[,
        author_num := paste0("author_", seq_len(.N)),
        by = book_title
    ]
    
    dcast(dt, book_title ~ author_num, value.var = "author")
    
    #    book_title author_1 author_2 author_3
    #        <char>   <char>   <char>   <char>
    # 1:      book1      bob     jane     <NA>
    # 2:      book2    alice     fred     greg
    # 3:      book3   louise      sam     <NA>
    

    Original data

    dt  <- read.table(text = "book_title  author
    book1       bob
    book1       jane
    book2       alice
    book2       fred
    book2       greg
    book3       louise
    book3       sam", h = T)