r dataframe transactions arules

How can I transform this dataframe (with more than 2 columns) into transactions in R?


I have a dataframe that contains:

userID song   sex
1      songA  M
2      songB  F
1      songC  M
2      songA  F 
...    ...    ...

So each row records a song listened to by a user. I want to use "arules", but first I need to transform this dataframe into transactions. I've searched a lot, but I don't know if my idea is wrong, because I haven't found an answer yet. I've found solutions such as using split to create a list of lists with all the songs listened to by each user, but if I do that I'll lose the sex information and only get rules like {songA,songB} -> {songZ}. I want to generate rules like {songA,songC,M} -> {songZ} (using the sex information). I don't know whether my idea is wrong and this is simply not possible. Any ideas?

Thanks.


Solution

  • If you're looking at associations, you'll generally want to reshape your data into a long dataframe, with an ID column, and another column for your binary item attributes.

    There are many ways to reshape your data into that form. For your example, I reshaped it with tidyverse's pivot_longer() and added distinct() so that each user's gender isn't stated multiple times.

    txt = "
    userID song   sex
    1      songA  M
    2      songB  F
    1      songC  M
    2      songA  F "
    df <- read.table(text = txt, header = TRUE)
    
    library(tidyverse)
    df %>%
      pivot_longer(cols = c(song, sex)) %>%
      distinct()
    #> # A tibble: 6 x 3
    #>   userID name  value
    #>    <int> <chr> <fct>
    #> 1      1 song  songA
    #> 2      1 sex   M    
    #> 3      2 song  songB
    #> 4      2 sex   F    
    #> 5      1 song  songC
    #> 6      2 song  songA
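
From a long table like the one above, one possible way to get to an arules transactions object is to collect each user's items into a list and coerce it. This is only a minimal sketch in base R plus arules, not necessarily how the answer continued: the `item` column name and the `sex=` prefix (used so the attribute is easy to tell apart from song names in the rules) are my own assumptions.

```r
library(arules)

df <- data.frame(
  userID = c(1, 2, 1, 2),
  song   = c("songA", "songB", "songC", "songA"),
  sex    = c("M", "F", "M", "F")
)

# Long format: one (userID, item) pair per row.
# The sex attribute is added as an extra item; the "sex=" prefix is an
# illustrative choice, not something arules requires.
long <- unique(rbind(
  data.frame(userID = df$userID, item = df$song),
  data.frame(userID = df$userID, item = paste0("sex=", df$sex))
))

# One character vector of items per user, then coerce to transactions
baskets <- split(long$item, long$userID)
trans <- as(baskets, "transactions")

inspect(trans)
```

Each user becomes one transaction holding their songs plus a single sex item, so passing `trans` to `apriori()` can produce rules whose left-hand side mixes songs and sex. The support and confidence thresholds would need tuning for real data.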