
Representation of sequential rules in data mining (sequence pattern mining)


In sequence pattern mining, I have a doubt about what the following notations mean:

Actual doubt

Do rule 1, {a, b} → {f}, and rule 2, <{a, b}> → <{f}>, mean the same thing?

Also, do ⇒ and → have the same meaning when used to represent rules?

I have seen both representations and am unsure whether they mean the same thing or are different.

My observation

Having noticed that the rule 2 representation is sometimes used for partially-ordered sequential rules, I want to know whether:

  1. Rule 1 represents a standard sequential rule, where {a,b} must co-occur in at least one grouping/element of a sequence, before {f} occurs in some grouping/element.

    Take the sequence <{a,c,b,d}, {g,h}, {b,f}>. I am unsure whether rule 1 is followed here; it seems to me that it is.

    Also take the sequence <{a,c,d},{b,f}>. I am unsure whether rule 1 is followed here; it seems to me that it is not.

  2. Rule 2 represents a partially-ordered sequential rule, where {a,b} must occur in some grouping/element of a sequence, in any order, before {f} occurs in some grouping/element.

If I am correct, this rule converted to standard sequential rules is:

<{a},{b}> → <{f}>
<{b},{a}> → <{f}>
<{a,b}> → <{f}>
<{b,a}> → <{f}>

Take the sequence <{b,c,a,d}, {g,h}, {b,f}>. I am unsure whether rule 2 is followed here; it seems to me that it is.

Also take the sequence <{b,c,d},{f,g}>. I am unsure whether rule 2 is followed here; it seems to me that it is not.

Also take the sequence <{b,c,d},{a,f}>. I am unsure whether rule 2 is followed here; it seems to me that it is not, according to my list of derived standard sequential rules. If f is present in a group containing a or b, then f never follows them but rather occurs at the same time, which violates the rule.

Clarification

It would help if somebody could clarify these doubts for a beginner who has just started digging into data mining concepts.

Pardon me for any silly mistakes in the question!


Solution

  • I see that you are referring to standard sequential rules (SSR) vs partially-ordered sequential rules (POSR). And from your tweet, I see that you are taking these concepts from the video "Introduction to sequential rules", of which I am the author.

    I will answer your questions. First, let me clarify that there are different types of sequential rules used by different authors in data mining, and that different authors may use different notations. For example, an author may use one type of arrow in a paper, while another author may use a different type of arrow, or may adopt slightly different definitions for the same notation.

    Having said that, to answer your question, I will assume that

    Rule 1 {a, b} → {f} is a partially-ordered sequential rule (POSR)

    and

    Rule 2 <{a, b}> → <{f}> is a standard sequential rule (SSR).

    Now, if we agree on this notation, I can answer the follow-up questions.

    Rule 1 and Rule 2 do not have the same meaning.

    The first rule means that if A and B are observed in ANY order then F will appear after. There are three possibilities that are all acceptable. The first one is that A and B appear at the same time, and are followed by F. The second one is that A appears, then B appears, and then F appears. And the third one is that B appears, A appears and then F. These three cases are represented by the Rule 1.
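    The three cases above can be checked mechanically. Below is a minimal Python sketch, assuming a sequence is modelled as a list of sets (one set per itemset) and that the rule holds when the sequence can be cut into a "before" part containing every left-side item and an "after" part containing every right-side item. The function name `matches_posr` and the toy sequences are made up for illustration and do not come from any library.

```python
from typing import List, Set

def matches_posr(seq: List[Set[str]], left: Set[str], right: Set[str]) -> bool:
    """Partially-ordered rule check (illustrative assumption, see lead-in).

    The rule holds if there is a cut point k such that every item of `left`
    occurs somewhere in seq[:k] and every item of `right` occurs in seq[k:].
    """
    for k in range(1, len(seq)):
        before = set().union(*seq[:k])   # all items seen up to the cut
        after = set().union(*seq[k:])    # all items seen after the cut
        if left <= before and right <= after:
            return True
    return False

rule_left, rule_right = {"a", "b"}, {"f"}

# Case 1: A and B at the same time, then F
print(matches_posr([{"a", "b"}, {"f"}], rule_left, rule_right))
# Case 2: A, then B, then F
print(matches_posr([{"a"}, {"b"}, {"f"}], rule_left, rule_right))
# Case 3: B, then A, then F
print(matches_posr([{"b"}, {"a"}, {"f"}], rule_left, rule_right))
# Counter-example: F occurs at the same time as B, not after it
print(matches_posr([{"a"}, {"b", "f"}], rule_left, rule_right))
```

    Under these assumptions, the first three calls return True and the last one returns False, because no cut point puts both A and B strictly before F.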

    The second rule means that if A and B appear at the same time, then F will appear after. This rule is much stricter: it is not satisfied if A appears before B, and it is not satisfied if B appears before A.
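    The stricter behaviour of the second rule can be sketched in Python as well. This sketch assumes that a standard sequential rule holds when its left-side itemsets followed by its right-side itemsets occur, in that order, as a subsequence of the sequence. The names `contains` and `matches_ssr` are hypothetical helpers written for this example only.

```python
from typing import List, Set

def contains(seq: List[Set[str]], pattern: List[Set[str]]) -> bool:
    """True if each itemset of `pattern` is a subset of a distinct itemset
    of `seq`, with the matched itemsets in the same order (greedy scan)."""
    pos = 0
    for itemset in pattern:
        # advance to the next itemset of seq that contains this pattern itemset
        while pos < len(seq) and not itemset <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True

def matches_ssr(seq: List[Set[str]],
                left: List[Set[str]],
                right: List[Set[str]]) -> bool:
    """Standard rule check (illustrative assumption): left then right."""
    return contains(seq, left + right)

left, right = [{"a", "b"}], [{"f"}]

# A and B at the same time, then F: the rule is satisfied
print(matches_ssr([{"a", "b"}, {"f"}], left, right))
# A before B: not satisfied
print(matches_ssr([{"a"}, {"b"}, {"f"}], left, right))
# B before A: not satisfied
print(matches_ssr([{"b"}, {"a"}, {"f"}], left, right))
```

    With this definition only the first call returns True, which is the difference from the partially-ordered rule.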

    Now, let's clarify the meaning of a sequence such as <{a,c,b,d}, {g,h}, {b,f}> from your example. This sequence means that A, B, C and D were observed at the SAME TIME. After that, G and H were observed at the same time. And after that, B and F were observed at the same time.
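    One way to make this reading concrete is to model a sequence as a Python list of sets: the list keeps the order in time, while each set holds items observed together and has no internal order. This is only an illustrative encoding, and `format_sequence` is a made-up helper for printing.

```python
from typing import List, Set

def format_sequence(seq: List[Set[str]]) -> str:
    """Print a sequence in <{...}, {...}> notation with items sorted."""
    return "<" + ", ".join("{" + ",".join(sorted(s)) + "}" for s in seq) + ">"

seq = [{"a", "c", "b", "d"}, {"g", "h"}, {"b", "f"}]

# Same itemsets written with the items in another order: still the same sequence
same = [{"b", "c", "a", "d"}, {"h", "g"}, {"f", "b"}]

# Same itemsets in another time order: a different sequence
swapped = [{"g", "h"}, {"a", "c", "b", "d"}, {"b", "f"}]

print(format_sequence(seq))   # <{a,b,c,d}, {g,h}, {b,f}>
print(seq == same)            # True
print(seq == swapped)         # False
```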

    Now let me answer your observations:

    The rule {a, b} → {f} does appear in the sequence <{a,c,b,d}, {g,h}, {b,f}>. A and B are observed at the same time in the first itemset, and F is observed later, in the third itemset. The second B, which occurs at the same time as F, does not matter, because the B in the first itemset already satisfies the left side of the rule.

    The rule {a, b} → {f} does not appear in the sequence <{a,c,d},{b,f}>. Here the only B occurs at the same time as F, so F does not appear after B. But according to rule 1, F should appear after both A and B.

    About observation 2, if the rule {a, b} → {f} is a partially-ordered sequential rule, you can indeed view it as replacing these rules:

    <{a},{b}> → <{f}> A followed by B, followed by F

    <{b},{a}> → <{f}> B followed by A, followed by F

    <{a,b}> → <{f}> A and B are at the same time, followed by F

    But this one is unnecessary:

    <{b,a}> → <{f}>

    because it has the same meaning as <{a,b}> → <{f}> . By this I mean that both <{b,a}> → <{f}> and <{a,b}> → <{f}> have the same meaning that A and B are at the same time, followed by F.
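    The list of replaced rules can also be generated automatically. The sketch below enumerates every way of splitting the left side {a, b} into a sequence of non-empty itemsets. It uses `frozenset` so that {a,b} and {b,a} are recognised as the same itemset. The function `ordered_partitions` is a hypothetical helper, and only the left side of the rule is expanded here.

```python
from typing import FrozenSet, List, Tuple

def ordered_partitions(items: FrozenSet[str]) -> List[Tuple[FrozenSet[str], ...]]:
    """All ways to split `items` into an ordered sequence of non-empty itemsets."""
    if not items:
        return [()]
    pool = sorted(items)
    result = []
    # pick every non-empty subset as the first itemset, then split the rest
    for mask in range(1, 2 ** len(pool)):
        first = frozenset(x for i, x in enumerate(pool) if (mask >> i) & 1)
        for rest in ordered_partitions(items - first):
            result.append((first,) + rest)
    return result

# {a,b} and {b,a} are the same itemset
print(frozenset({"a", "b"}) == frozenset({"b", "a"}))  # True

left_sides = ordered_partitions(frozenset({"a", "b"}))
for left in left_sides:
    text = ", ".join("{" + ",".join(sorted(s)) + "}" for s in left)
    print("<" + text + "> → <{f}>")
print(len(left_sides))  # 3
```

    For {a, b} this yields exactly three left sides, <{a},{b}>, <{b},{a}> and <{a,b}>, matching the three rules listed above.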

    Now let me answer your other questions.

    The rule <{a, b}> → <{f}> appears in <{b,c,a,d}, {g,h}, {b,f}>. The reason is that A and B appear at the same time and are followed by F. Here, pay attention that even though the first itemset is written {b,c,a,d}, the meaning is that A, B, C and D appeared at the same time. In other words, the order of the items inside {} does not matter. I think this is what is creating confusion in your understanding.

    The rule <{a, b}> → <{f}> does not appear in <{b,c,d},{f,g}>. The reason is that there is no A in that sequence.

    The rule <{a, b}> → <{f}> does not appear in <{b,c,d},{a,f}>, because according to the rule A and B should appear at the same time and then be followed by F. But in that sequence, B appears first and is then followed by A and F at the same time. This does not match!
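    These three answers can be reproduced with a short Python sketch. It assumes, as in this answer, that <{a, b}> → <{f}> holds when some itemset contains both A and B and a strictly later itemset contains F. The function `rule2_holds` is written for this example only.

```python
from typing import List, Set

def rule2_holds(seq: List[Set[str]]) -> bool:
    """Check <{a,b}> → <{f}>: an itemset with both a and b, then a later one with f."""
    for i, itemset in enumerate(seq):
        if {"a", "b"} <= itemset:
            # look for f strictly after position i
            return any("f" in later for later in seq[i + 1:])
    return False

examples = [
    [{"b", "c", "a", "d"}, {"g", "h"}, {"b", "f"}],
    [{"b", "c", "d"}, {"f", "g"}],
    [{"b", "c", "d"}, {"a", "f"}],
]

results = [rule2_holds(seq) for seq in examples]
print(results)  # [True, False, False]
```

    Stopping at the first itemset that contains both A and B is enough here, because starting from the earliest such itemset leaves the most room for F to appear afterwards.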