I have a problem finding "correct" associations in production data.
The data looks like this:
A;B;C;D;E;F;G
1;0;1;0;0;0;0
0;1;0;0;0;0;0
0;0;0;1;0;0;0
0;0;1;0;1;0;0
1;0;0;0;0;0;0
0;0;0;0;0;1;0
0;0;0;0;0;0;1
1;0;1;0;0;0;0
(Of course, I have many more steps and rows.)
Here A, B, C, etc. are production steps. 0 means that a worker did not perform this production step, and 1 means that the step was performed by a worker. For example, the first row, 1;0;1;0;0;0;0, means that steps A and C were performed at the same time by one worker, and the second row, 0;1;0;0;0;0;0, means that a worker (perhaps a different one) performed only production step B.
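Read this way, each row is really a "basket" of the steps that were performed, and the 0s carry no information of their own. A minimal sketch, assuming the semicolon-separated layout above with a header row of step names, that turns every row into the set of steps marked 1 (the names `DATA` and `to_transactions` are illustrative, not from Weka):

```python
# Sample table from the question, header row first.
DATA = """A;B;C;D;E;F;G
1;0;1;0;0;0;0
0;1;0;0;0;0;0
0;0;0;1;0;0;0
0;0;1;0;1;0;0
1;0;0;0;0;0;0
0;0;0;0;0;1;0
0;0;0;0;0;0;1
1;0;1;0;0;0;0"""


def to_transactions(text):
    """Return one frozenset of performed step names (the 1s) per data row."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    steps = lines[0].split(";")
    transactions = []
    for line in lines[1:]:
        flags = line.split(";")
        # Keep only the steps whose flag is "1"; 0s never become items.
        transactions.append(frozenset(s for s, f in zip(steps, flags) if f == "1"))
    return transactions


transactions = to_transactions(DATA)
print(sorted(transactions[0]))  # ['A', 'C']
print(sorted(transactions[1]))  # ['B']
```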
It turns out that some production steps are usually performed simultaneously by the same worker, like steps A and C in the example above (2 of the 3 rows containing A also contain C). To find which steps tend to be performed together, I applied the Apriori algorithm.
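Apriori does not need the 0s at all if each row is treated as a basket of performed steps. A minimal sketch of the level-wise Apriori search (join, prune, count support) over such baskets, written in plain Python rather than Weka; the hard-coded rows are the eight sample rows, and the 0.2 minimum support is an arbitrary assumption:

```python
from itertools import combinations

# The eight sample rows, each as the set of steps marked 1.
TRANSACTIONS = [
    frozenset({"A", "C"}), frozenset({"B"}), frozenset({"D"}), frozenset({"C", "E"}),
    frozenset({"A"}), frozenset({"F"}), frozenset({"G"}), frozenset({"A", "C"}),
]


def apriori(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for all frequent itemsets.

    Only performed steps are items, so no itemset can ever mean "step X = 0".
    """
    n = len(transactions)

    def support(itemset):
        # Fraction of rows that contain every step in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    singles = {frozenset([i]) for t in transactions for i in t}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


for itemset, sup in apriori(TRANSACTIONS, min_support=0.2).items():
    print(sorted(itemset), sup)
```

On the sample rows only {A}, {C} and {A, C} reach 20% support; the single occurrences of B, D, E, F and G are filtered out.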
I hoped to get rules like "If there is a 1 in column A, there is likely a 1 in column C". Instead, Apriori found "cool" rules that basically say there are a lot of 0s in the table, for example "If there is a 0 in columns A and G, there is likely a 0 in column E". Thanks, Sherlock.
I need the algorithm to focus on rules about where the 1s are in the table, not the 0s. Any rule that looks at 0s can be ignored. I only want rules about 1s, because I want to know which production steps tend to be performed together; I don't care which steps are not performed together (0s), because obviously most steps are not performed simultaneously.
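Once the rows are baskets of 1s, a rule such as "A=1 => C=1" is just a confidence ratio: the number of rows with both steps divided by the number of rows with the left-hand step. A minimal sketch for single-step rules on the sample rows; the thresholds `min_support=0.2` and `min_confidence=0.5` are arbitrary assumptions, not Weka's defaults:

```python
from collections import Counter
from itertools import combinations

# Sample rows, each written as the set of steps marked 1 (0s are simply absent).
TRANSACTIONS = [
    {"A", "C"}, {"B"}, {"D"}, {"C", "E"},
    {"A"}, {"F"}, {"G"}, {"A", "C"},
]


def pair_rules(transactions, min_support=0.2, min_confidence=0.5):
    """Return (lhs, rhs, support, confidence) for rules "lhs=1 => rhs=1"."""
    n = len(transactions)
    # How many rows contain each step, and each unordered pair of steps.
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (x, y), both in pair_counts.items():
        if both / n < min_support:
            continue  # pair too rare to trust, like Apriori's support threshold
        for lhs, rhs in ((x, y), (y, x)):
            confidence = both / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, both / n, confidence))
    # Strongest rules first.
    return sorted(rules, key=lambda r: r[3], reverse=True)


for lhs, rhs, support, confidence in pair_rules(TRANSACTIONS):
    print(f"{lhs}=1 => {rhs}=1  support={support:.3f}  confidence={confidence:.3f}")
```

On the sample rows this prints exactly two rules, A=1 => C=1 and C=1 => A=1, each with support 0.250 and confidence 0.667. The C/E pair occurs only once, so the support threshold drops it before its 100% confidence E=1 => C=1 rule can look impressive.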
Does anybody have an idea how to find associations between 1s instead of 0s?
I use Weka software to do the data mining.
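Apriori has no notion of what the labels represent; they are just strings.
Have you tried the -Z option, which treats the first label of each attribute as missing? If the attributes are declared as {0,1}, 0 is the first label, so it is ignored and only the 1s can show up in the rules.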
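For -Z to hide the 0s, 0 has to be the first declared value of every attribute, which is easiest to guarantee by writing the ARFF header yourself. A minimal sketch, assuming the semicolon-separated layout from the question; the function name `to_arff` and the relation name `production_steps` are hypothetical:

```python
# Sample table from the question, header row first.
DATA = """A;B;C;D;E;F;G
1;0;1;0;0;0;0
0;1;0;0;0;0;0
0;0;0;1;0;0;0
0;0;1;0;1;0;0
1;0;0;0;0;0;0
0;0;0;0;0;1;0
0;0;0;0;0;0;1
1;0;1;0;0;0;0"""


def to_arff(text, relation="production_steps"):
    """Convert the 0/1 table to ARFF with every step declared as nominal {0,1}.

    Listing 0 first makes it the first label of each attribute, which is the
    value Apriori's -Z option treats as missing.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    steps = lines[0].split(";")
    out = [f"@relation {relation}", ""]
    out += [f"@attribute {s} {{0,1}}" for s in steps]  # {{ }} emits literal braces
    out += ["", "@data"]
    out += [line.replace(";", ",") for line in lines[1:]]  # ARFF data is comma-separated
    return "\n".join(out) + "\n"


print(to_arff(DATA))
```

Redirecting the output to a file (for example, a hypothetical `steps.arff`) gives a dataset where every attribute has exactly the labels 0 and 1, in that order, so running Apriori with -Z should only produce rules of the form "X=1 ==> Y=1".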