I have a dataframe that contains:
userID song sex
1 songA M
2 songB F
1 songC M
2 songA F
... ... ...
So each row records a song listened to by a user.
I want to use "arules", but first I need to transform this dataframe into transactions. I've searched a lot, but I don't know whether my idea is wrong, because I haven't found an answer yet.
I've found solutions such as using split to create a list of lists with all the songs listened to by each user, but if I do that I'll lose the sex information. I'll only get rules like {songA,songB} -> {songZ}.
I want to generate rules like {songA,songC,M} -> {songZ} (using the sex information). I don't know if my idea is wrong and this isn't possible.
Any idea?
Thanks.
If you're looking at associations, you'll generally want to reshape your data into a long dataframe with an ID column and a second column holding the item attributes (here, each song and the user's sex).
There are many ways to reshape your data into that form. For your example I reshaped it with tidyverse and also added a distinct() call so that each user's sex isn't stated multiple times.
txt = "
userID song sex
1 songA M
2 songB F
1 songC M
2 songA F "
df <- read.table(text = txt, header = TRUE)
library(tidyverse)
df %>%
  pivot_longer(cols = c(song, sex)) %>%
  distinct()
#> # A tibble: 6 x 3
#>   userID name  value
#>    <int> <chr> <fct>
#> 1      1 song  songA
#> 2      1 sex   M
#> 3      2 song  songB
#> 4      2 sex   F
#> 5      1 song  songC
#> 6      2 song  songA
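From there, one possible way to get to actual transactions is to collect each user's items into a list and coerce it with as(..., "transactions") from arules. This is only a minimal sketch on the four example rows: the "sex=" prefix on the item label and the support/confidence thresholds are my own illustrative choices, not something arules requires, and it builds the per-user lists with base R rather than from the tibble above.

```r
library(arules)

df <- read.table(text = "
userID song sex
1 songA M
2 songB F
1 songC M
2 songA F", header = TRUE, stringsAsFactors = FALSE)

# One item vector per user: the songs plus a single labelled sex item.
# The "sex=" prefix is an illustrative choice so the item can't be
# confused with a song name.
items_by_user <- lapply(split(df, df$userID), function(d) {
  unique(c(d$song, paste0("sex=", d$sex)))
})

# Coerce the named list to an arules transactions object
# (the list names become the transaction IDs).
trans <- as(items_by_user, "transactions")
inspect(trans)

# Thresholds are arbitrary here; tune them for real data.
rules <- apriori(trans, parameter = list(support = 0.5, confidence = 0.8))
inspect(rules)
```

Because the sex item sits in the same transaction as the songs, it can appear on the left-hand side of a rule alongside them, which is the kind of rule you described.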