algorithmpattern-matchingsuffix-tree

Is there a ready-to-use unsupervised multi-string-based pattern discovery library/software?


strace is a command that traces system calls and signals. An example of its output:

poll([{fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=30, events=POLLIN}, {fd=31, events=POLLIN}, {fd=104, events=POLLIN}], 5, 11) = 0 (Timeout)
recvmsg(30, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f946e0c56e8, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x7f946e0c5698, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(30, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=30, events=POLLIN}, {fd=31, events=POLLIN}, {fd=104, events=POLLIN}], 5, 0) = 0 (Timeout)
recvmsg(30, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f946e0c56e8, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x7f946e0c5698, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(30, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(31, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)

It is highly patterned - is there a ready-to-use software that can read the input above, then unsupervisedly recognizes the patterns, like:

====================================================
Patterns
====================================================

Pattern 1 (P1): {fd=$1, events=$2}

P2: P1
    where $2=POLLIN

P3: [P2a, P2b, P2c, P2d, P2e]
    where a.$1=$1, b.$1=$2, c.$1=$3, d.$1=$4, e.$1=$5

P4: poll(P3, $6, $7) = 0 (Timeout)
    where P3.$1=$1 P3.$2=$2 P3.$3=$3 P3.$4=$4 P3.$5=$5

P6: recvmsg($1, {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)

P7: futex($1, FUTEX_WAKE_PRIVATE, $2) = 1

P8:
    P4a
    P6b
    P7c
    P7d
    P6e
    P6f
where a.$1=$1 a.$2=$2 ...


====================================================
Output
====================================================

P8 where $1=11, $2=12, ...
P8 where $1=11, $2=12, ...

Is there an universe ready-to-use unsupervised one that is already implentmented?


Solution

  • The compression algorithms are good examples. See the theory and implementations of deflate, gzip, xz and lzma.