apache hadoop data-mining mahout

Wrong output of Mahout's PFPGrowth algorithm?


I'm using the latest trunk version of Mahout's PFP Growth implementation on top of a Hadoop cluster to determine frequent patterns in the MovieLens dataset. In a previous step I converted the dataset to a list of transactions, since that is the input format the PFP Growth algorithm needs.

However, the output I get is unexpected.

For example, for item 1017 the only frequent pattern is:

1017 ([100, 1017],50)

I would also expect a pattern like ([1017],X) with X >= 50 on that line.

I also tested an example input:

1,2,3

1,2,3

1,3

and the output I get is:

1 ([1, 3],3), ([1],3), ([1, 3, 2],2)

2 ([1, 3, 2],2)

3 ([1, 3],3), ([1, 3, 2],2)

Patterns such as ([1, 2],2) are missing.
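For reference, this is what I would expect for the sample input. It is only a brute-force sketch in Python, not Mahout code: the function name `frequent_itemsets` and the minimum support of 2 are my own assumptions, and the approach is exponential in the transaction length, so it is only usable on toy data like this.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Count every non-empty subset of every transaction (brute force)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    # Keep only the itemsets that occur in at least min_support transactions.
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


transactions = [[1, 2, 3], [1, 2, 3], [1, 3]]
result = frequent_itemsets(transactions, min_support=2)  # assumed threshold

for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(list(itemset), support)
# [1] 3
# [2] 2
# [3] 3
# [1, 2] 2
# [1, 3] 3
# [2, 3] 2
# [1, 2, 3] 2
```

So a full enumeration contains seven itemsets, including [1, 2] and [2, 3] with support 2, neither of which appears in the Mahout output.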

What is wrong?


Solution

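  • The reason is that the FP-Growth implementation does not output a subset of a frequent pattern unless the subset's support is greater than that of the pattern itself. For example, ([1, 2],2) is omitted because it is a subset of ([1, 3, 2],2) and has the same support of 2. It's described here: http://www.searchworkings.org/forum/-/message_boards/view_message/396093

    To get every frequent subset listed explicitly, I need to rewrite the code for my use.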
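An alternative to changing the Mahout code might be to post-process its output. The following is a minimal Python sketch under one important assumption: the output contains every pattern that has no equally frequent superset. In that case the support of any omitted subset equals the highest support among the reported patterns that contain it. The function name `expand_patterns` is hypothetical, and the input is simply the line for item 1 from the question, typed in by hand. If the output was truncated (for example by a cap on the number of patterns kept per item), some supports would be underestimated or missing.

```python
from itertools import combinations


def expand_patterns(patterns):
    """Map each non-empty subset of the reported patterns to the highest
    support among the reported patterns that contain it."""
    supports = {}
    for items, support in patterns:
        items = sorted(set(items))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                if support > supports.get(subset, 0):
                    supports[subset] = support
    return supports


# Patterns reported for item 1 in the question: (items, support).
mahout_output = [([1, 3], 3), ([1], 3), ([1, 3, 2], 2)]
expanded = expand_patterns(mahout_output)

for itemset, support in sorted(expanded.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(list(itemset), support)
```

On the sample data this recovers ([1, 2],2) and ([2, 3],2) alongside the patterns that were already reported. Note that the number of subsets grows exponentially with pattern length, so for long patterns it is better to look up supports on demand than to expand everything.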