Something that can convert
r"a+|(?:ab+c)"
to
{
(1, 'a') : [2, 3],
(2, 'a') : [2],
(3, 'b') : [4, 3],
(4, 'c') : [5]
}
or something similar
and accepting in 2 or 5
i have some code that will do this. it's not well documented and it's not supported, but if you're interested you're welcome to look at it.
the library is called rxpy and the repository is http://code.google.com/p/rxpy
the routine that does parsing is parse_pattern at http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/parser/pattern.py#871
if you call repr(...)
on the result from that you get a graph in the "dot language" - https://en.wikipedia.org/wiki/DOT_language
for example, see the tests as http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/parser/_test/parser.py#47
to show what i mean ,let's look at the test at http://code.google.com/p/rxpy/source/browse/rxpy/src/rxpy/parser/_test/parser.py#234 which is for 'ab*c'
:
"""digraph {
0 [label="a"]
1 [label="...*"]
2 [label="b"]
3 [label="c"]
4 [label="Match"]
0 -> 1
1 -> 2
1 -> 3
3 -> 4
2 -> 1
}"""
that starts at 0
which can match an "a" to go to state 1
. from there you can match a "b" to go to state 2
or a "c" to go to state 3
. state 2
then has a transition back to 1
that can consume another "b", etc etc. it's a bit ugly to read by hand, but when the test fails you get a little graph displayed on the screen.
the library also has various "engines" which will match strings against this graph (and so do regular expression matching). but it is much slower than the python library (because it is pure python).
this is not supported and may not be very clear - sorry - but i think it's close to what you want and you're welcome to use it if it's useful (MPL or LGPL licence).