Tags: r, powerbi, apriori, market-basket-analysis

Power BI - R Script Visual - Apriori


I'm using an R script visual in Power BI. The code below runs fine in R, but it fails with the errors shown when I run it in the Power BI R script visual. I want to show the results from apriori in a visual. I tried two variants, marked #test1 and #test2 in the code; both work in R but not in Power BI. Any thoughts?

If I try #test2:

library(Matrix)
library(arules)
library(plyr)
library(gridExtra)

df_itemList <- ddply(dataset,c("SALESID"),function(df1)paste(df1$ITEMID))
#test1
#df_itemList  = sapply(df_itemList , function(x) gsub(" ", ",", x))
#basket_rules <- apriori(df_itemList, parameter = list(sup=0.1,conf=0.5,target="rules", maxlen=5));

#test2
txn = read.transactions(df_itemList, rm.duplicates = TRUE, format = "basket", sep = ",", cols = 1);
basket_rules <- apriori(txn, parameter = list(sup=0.1,conf=0.5,target="rules", maxlen=5));

df_basket <- as(basket_rules,"data.frame")
grid.table(df_basket)

Error Message: R script error.

Attaching package: 'arules'

The following objects are masked from 'package:base':

abbreviate, write

Error in readLines(file, encoding = encoding) : 'con' is not a connection
Calls: read.transactions -> lapply -> readLines
Execution halted

If I try #test1...

library(Matrix)
library(arules)
library(plyr)
library(gridExtra)

df_itemList <- ddply(dataset,c("SALESID"),function(df1)paste(df1$ITEMID))
#test1
df_itemList  = sapply(df_itemList , function(x) gsub(" ", ",", x))
basket_rules <- apriori(df_itemList, parameter = list(sup=0.1,conf=0.5,target="rules", maxlen=5));

#test2
#txn = read.transactions(df_itemList, rm.duplicates = TRUE, format = "basket", sep = ",", cols = 1);
#basket_rules <- apriori(txn, parameter = list(sup=0.1,conf=0.5,target="rules", maxlen=5));

df_basket <- as(basket_rules,"data.frame")
grid.table(df_basket)

Then I get the error below.

Error Message: R script error.

Attaching package: 'arules'

The following objects are masked from 'package:base':

abbreviate, write

Error in asMethod(object) : column(s) 2, 3, 4 not logical or a factor. Discretize the columns first.
Calls: apriori -> as -> asMethod
Execution halted


Solution

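  • read.transactions expects a file name or connection, not a data frame, which is why it fails on the dataset that Power BI passes in. In a Power BI R script, the way around this is to convert the data frame to a logical matrix and then coerce that to the transactions class. This bypasses exporting to a CSV and reading it back in with read.transactions. Reference here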

    library(Matrix)
    library(arules)
    library(plyr)
    library(dplyr)
    library(gridExtra)
    
    itemList <- dataset
    #itemList <- read.csv("ItemListAll.csv", header=TRUE, sep=",")
    
    # Converting to a Matrix ####
    itemList$const = TRUE
    
    # Remove duplicates (assign back to itemList so the reshape sees the de-duplicated rows)
    dim(itemList)
    itemList <- unique(itemList)
    dim(itemList)
    
    # Need to reshape the matrix
    itemList_max_prep <- reshape(data = itemList,
                               idvar = "SALESID",
                               timevar = "ITEMID",
                               direction = "wide")
    
    # Drop the SALESID
    itemList_matrix <- as.matrix(itemList_max_prep[,-1])
    
    # Clean up the missing values to be FALSE
    itemList_matrix[is.na(itemList_matrix)] <- FALSE
    
    # Clean up names
    colnames(itemList_matrix) <- gsub(x=colnames(itemList_matrix),
                                   pattern="const\\.", replacement="")
    
    itemList_trans <- as(itemList_matrix,"transactions")
    
    #inspect(itemList_trans)
    
    basket_rules <- apriori(itemList_trans, parameter = list(sup=0.01,conf=0.5,target="rules", minlen=3));
    df_basket <- as(basket_rules,"data.frame")
    df_basket$support <- ceiling(df_basket$support * 100)
    df_basket$confidence<- ceiling(df_basket$confidence * 100)
    df_basket$lift<- round(df_basket$lift, digits = 2)
    df_basket <- df_basket[rev(order(df_basket$support)),];
    grid.table(head(df_basket));
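
To see what the matrix step produces without a real dataset, here is a minimal sketch on invented data. The SALESID and ITEMID column names come from the question; the values and the variable names (item_matrix, hits, txn_from_list, rules_df) are hypothetical, as are the thresholds. It builds the logical order-by-item matrix directly instead of going through reshape, which is a swapped-in way of reaching the same matrix, not the code from the answer. It also shows coercing a list of item vectors to transactions as an alternative. The arules part is skipped if the package is not installed.

```r
# Toy stand-in for the `dataset` data frame that Power BI passes to the script.
# Column names follow the question; the values are invented for illustration.
dataset <- data.frame(
  SALESID = c("S1", "S1", "S2", "S2", "S2", "S3", "S3"),
  ITEMID  = c("A",  "B",  "A",  "B",  "C",  "A",  "A"),
  stringsAsFactors = FALSE
)

# One row per order, one column per item, everything FALSE to start with.
orders <- unique(dataset$SALESID)
items  <- sort(unique(dataset$ITEMID))
item_matrix <- matrix(FALSE, nrow = length(orders), ncol = length(items),
                      dimnames = list(orders, items))

# Flag each (order, item) pair that occurs; repeated pairs just set TRUE again,
# so the duplicate "A" in S3 needs no separate de-duplication step.
hits <- cbind(match(dataset$SALESID, orders), match(dataset$ITEMID, items))
item_matrix[hits] <- TRUE

# The arules part only runs if the package is installed.
if (requireNamespace("arules", quietly = TRUE)) {
  suppressPackageStartupMessages(library(arules))

  # Same coercion as in the answer: logical matrix -> transactions.
  txn <- as(item_matrix, "transactions")

  # Alternative: a list with one de-duplicated item vector per order.
  txn_from_list <- as(lapply(split(dataset$ITEMID, dataset$SALESID), unique),
                      "transactions")

  # Thresholds are arbitrary, picked for the toy data.
  rules <- apriori(txn,
                   parameter = list(support = 0.5, confidence = 0.5,
                                    target = "rules", minlen = 2),
                   control = list(verbose = FALSE))
  rules_df <- as(rules, "data.frame")
}
```

In the Power BI visual, rules_df would still need to be drawn, for example with grid.table as in the answer, because the visual only displays what the script plots.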