data-mining weka apriori

How to set up a CSV or TXT file for loading into Weka?


How should a TXT or CSV file be set up for loading into Weka in order to use Apriori? I have tried setting it up as binary data, but the associations don't seem to come out correctly. Assuming my database transactions are simple, like the ones below, what would be the correct way to create a CSV or TXT file for Weka? The first column is the transaction ID and the second is the set of items in that transaction.

1 --- {M,O,N,K,E,Y}
2 --- {D,O,N,K,E,Y}
3 --- {M,A,K,E}
4 --- {C,O,O,K,I,E}
5 --- {D,O,O,D,L,E}


Solution

  • Weka comes with an example dataset, supermarket, which is already in the right format for using Apriori for market basket analysis (this article uses it).

    Since Weka does not handle a variable number of attributes per row, each item that can be bought gets its own column. If the item was bought in a given transaction, a t (= true) is stored; otherwise a ? (= missing value) is stored.

    In your case, you would have to do something similar: e.g., create a CSV spreadsheet with a separate column for each item, filling a cell with t if the transaction contains that item and leaving it empty otherwise. For example:

    id,A,C,D,E,I,K,L,M,N,O,Y
    1,,,,t,,t,,t,t,t,t
    2,,,t,t,,t,,,t,t,t
    3,t,,,t,,t,,t,,,
    4,,t,,t,t,t,,,,t,
    5,,,t,t,,,t,,,t,
    

    You can then load the dataset in the Weka Explorer and save it as ARFF (which will use ? for the missing values).

    However, Apriori only handles nominal attributes, and your ID attribute is numeric. You can either delete that attribute before running Apriori or turn it into a nominal attribute using the NumericToNominal filter in the Preprocess panel.
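
If you have more than a handful of transactions, writing the t/empty table by hand gets tedious. Below is a minimal Python sketch (standard library only) that builds the same layout from the transactions in the question. The function names (`to_table`, `to_csv`, `to_arff`) and the relation name `transactions` are made up for this illustration and are not part of Weka. It assumes each item is a single character, as in the question, and it leaves out the id column, since that column would have to be removed or converted before running Apriori anyway. The ARFF variant writes ? for missing values, in the same style as the supermarket dataset described above.

```python
import csv
import io

# Transactions from the question: id -> items.
# Assumption: every item is a single letter, so a string works as a basket.
transactions = {
    1: "MONKEY",
    2: "DONKEY",
    3: "MAKE",
    4: "COOKIE",
    5: "DOODLE",
}


def to_table(transactions):
    """Return (items, rows): one column per distinct item, 't' where present."""
    items = sorted({item for basket in transactions.values() for item in basket})
    rows = []
    for _, basket in sorted(transactions.items()):
        present = set(basket)  # repeated items (e.g. the two O's) collapse here
        rows.append(["t" if item in present else "" for item in items])
    return items, rows


def to_csv(transactions):
    """CSV text with a header row and empty cells for absent items."""
    items, rows = to_table(transactions)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(items)
    writer.writerows(rows)
    return buf.getvalue()


def to_arff(transactions, relation="transactions"):
    """ARFF text with one nominal attribute {t} per item and '?' for absent items."""
    items, rows = to_table(transactions)
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {item} {{t}}" for item in items]
    lines += ["", "@data"]
    lines += [",".join(cell or "?" for cell in row) for row in rows]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_csv(transactions))
    print(to_arff(transactions))
```

Redirecting either output to a file (for example a hypothetical transactions.csv or transactions.arff) should give you something you can open in the Weka Explorer. Since every column is either t or missing, all attributes should come in as nominal.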