I feel like this is a very simple procedure and yet I haven't managed to do it yet. I have a df like this:
book_title author
book1 bob
book1 jane
book2 alice
book2 fred
book2 greg
book3 louise
book3 sam
And I would like to transform it to look like this:
book_title author_1 author_2 author_3
book1 bob jane NA
book2 alice fred greg
book3 louise sam NA
I know that spread(), pivot_wider() and separate() are not going to be of much use. All help is greatly appreciated!
I've included three ways to do this: a base R approach (i.e. using no packages), a tidyverse
method and a data.table
solution.
I wouldn't generally use base R for reshaping wide/long as I find the syntax a little confusing - although in this case it's not too bad. In my view, the tidyverse
syntax is the most intuitive. However, with large datasets it can be much slower than data.table
dt$author_num <- unlist(lapply(rle(dt$book_title)$lengths, seq))
reshape(
dt,
idvar = "book_title",
timevar = "author_num",
direction = "wide",
sep = "_"
)
# book_title author_1 author_2 author_3
# 1 book1 bob jane <NA>
# 3 book2 alice fred greg
# 6 book3 louise sam <NA>
library(dplyr)
library(tidyr)
dt |>
group_by(book_title) |>
mutate(author_num = paste0(
"author_", row_number()
)) |>
pivot_wider(
id_cols = book_title,
names_from = author_num,
values_from = author
)
# # A tibble: 3 × 4
# # Groups: book_title [3]
# book_title author_1 author_2 author_3
# <chr> <chr> <chr> <chr>
# 1 book1 bob jane NA
# 2 book2 alice fred greg
# 3 book3 louise sam NA
Here is a method using the data.table::dcast
function:
library(data.table)
setDT(dt)
dt[,
author_num := paste0("author_", seq_len(.N)),
by = book_title
]
dcast(dt, book_title ~ author_num, value.var = "author")
# book_title author_1 author_2 author_3
# <char> <char> <char> <char>
# 1: book1 bob jane <NA>
# 2: book2 alice fred greg
# 3: book3 louise sam <NA>
dt <- read.table(text = "book_title author
book1 bob
book1 jane
book2 alice
book2 fred
book2 greg
book3 louise
book3 sam", h = T)