I'm using the latest trunk version of Mahout's PFP Growth implementation on a Hadoop cluster to find frequent patterns in the MovieLens dataset. In a previous step I converted the dataset into a list of transactions, since that is the input format the PFP Growth algorithm needs.
However, the output I get is unexpected.
For example, for item 1017 the only frequent pattern is:
1017 ([100, 1017],50)
I would also expect a pattern like ([1017], X) with X >= 50 in that line.
I also tested a small example input:
1,2,3
1,2,3
1,3
and the output I get is
1 ([1, 3],3), ([1],3), ([1, 3, 2],2)
2 ([1, 3, 2],2)
3 ([1, 3],3), ([1, 3, 2],2)
Patterns like ([1, 2],2) are missing.
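To make the expectation concrete, here is a minimal brute-force sketch (not Mahout code) that counts the support of every itemset in the three example transactions. The threshold `min_support = 2` is an assumption chosen for illustration; it is not taken from the actual job configuration.

```python
from itertools import combinations

# The three example transactions from above.
transactions = [{1, 2, 3}, {1, 2, 3}, {1, 3}]
min_support = 2  # assumed threshold, for illustration only

# Every distinct item that occurs in any transaction.
items = sorted(set().union(*transactions))

# Count the support of every non-empty itemset by brute force.
frequent = {}
for size in range(1, len(items) + 1):
    for candidate in combinations(items, size):
        count = sum(1 for t in transactions if set(candidate) <= t)
        if count >= min_support:
            frequent[candidate] = count

for itemset, count in frequent.items():
    print(list(itemset), count)
```

With this threshold all seven non-empty itemsets are frequent, including (1, 2) and (2, 3) with support 2, which do not appear in the output above.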
What is wrong?
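The reason is that Mahout's FP-Growth implementation does not output a subset of a frequent pattern unless the subset's support is greater than that of the pattern itself. In the example above, ([1, 2],2) has the same support as ([1, 3, 2],2), so it is left out. It's described here: http://www.searchworkings.org/forum/-/message_boards/view_message/396093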
I need to rewrite the code for my use.
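As an alternative to changing the Mahout code, the missing subsets could be recovered by post-processing the output. The following is a minimal sketch, not part of Mahout: `expand_closed_patterns` is a hypothetical helper, and it assumes the output contains every pattern whose support is not matched by a larger pattern. Under that assumption, the support of any subset is the largest support among the reported patterns that contain it.

```python
from itertools import combinations

def expand_closed_patterns(patterns):
    """Hypothetical helper: expand (items, support) pairs into all subsets.

    Each subset gets the maximum support of any reported pattern
    that contains it.
    """
    support = {}
    for items, count in patterns:
        unique_items = sorted(set(items))
        for size in range(1, len(unique_items) + 1):
            for subset in combinations(unique_items, size):
                key = frozenset(subset)
                # Keep the highest support seen so far for this subset.
                if count > support.get(key, 0):
                    support[key] = count
    return support

# Patterns reported for item 1 in the small example above.
reported = [([1, 3], 3), ([1], 3), ([1, 3, 2], 2)]
all_patterns = expand_closed_patterns(reported)

for itemset, count in all_patterns.items():
    print(sorted(itemset), count)
```

For the small example this yields ([1, 2],2) and ([2, 3],2) in addition to the reported patterns. Note that the number of subsets grows exponentially with pattern length, so this is only practical for short patterns, and it cannot recover anything if the output was truncated.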