rdataframemarket-basket-analysis

Is there an R function to add columns by multipling value of an other column?


I want to do an association analysis, but need to bring my data frame into the right format, which only shows transactions. 1) How can I multiply my "Sub Category" column by the amount of "Quantity" column?

2) How can I group the Transactions by Order-ID?

I have this df:

 `Order ID`        `Sub-Category` `Quantity`
  <chr>              <chr>             <dbl>

1 CA-2017-152156    Bookcases             2

2 CA-2017-152156    Chairs                3

3 CA-2017-138688    Labels                2

1) I want this:

  `Order ID`     `Sub-Category` `Sub-Category2`   `Sub-Category3`
  <chr>          <chr>             <chr>           <chr>

1 CA-2017-152156 Bookcases        Bookcases         NULL

2 CA-2017-152156 Chairs             Chairs          Chairs

3 CA-2017-138688 Labels            Labels           NULL

(After that I want to combine same Order IDs. E.g. row 1 and 2. Do you have a hint for that?) Thank you!


Solution

  • The following answers to point 1).

    Max <- max(df1$Quantity)
    res <- lapply(seq_len(nrow(df1)), function(i){
      c(rep(as.character(df1[i, 2]), df1[i, 3]), rep(NA, Max - df1[i, 3]))
    })
    res <- cbind(df1[1], do.call(rbind, res))
    names(res)[-1] <- paste0(names(df1)[2], names(res)[-1])
    
    res
    #        Order ID Sub-Category1 Sub-Category2 Sub-Category3
    #1 CA-2017-152156     Bookcases     Bookcases          <NA>
    #2 CA-2017-152156        Chairs        Chairs        Chairs
    #3 CA-2017-138688        Labels        Labels          <NA>
    

    Data in dput format.

    df1 <-
    structure(list(`Order ID` = structure(c(2L, 2L, 1L), 
    .Label = c("CA-2017-138688", "CA-2017-152156"), 
    class = "factor"), `Sub-Category` = structure(1:3, 
    .Label = c("Bookcases", "Chairs", "Labels"), class = 
    "factor"), Quantity = c(2L, 3L, 2L)), class = "data.frame", 
    row.names = c("1", "2", "3"))