I am trying to implement a regex to NFA converter. I have most of the code written, but I am struggling to find a way to build a graph with a cycle given my representation for states (nodes) and edges.
My graph representation is as follows:
type state =
| State of int * edge list (* Node ID and outgoing edges *)
| Match (* Match state for the NFA: no outgoing edges *)
and edge =
| Edge of state * string (* End state and label *)
| Epsilon of state (* End state *)
My function to convert a regex to an NFA is basically a pattern match on the type of regex, taking in the regex type and the "final state" (where all the outgoing edges for the NFA will go) and returning the "start state" of the (partially built) NFA for that regex. The NFA fragments are built by returning a State constructed with its outgoing edge list, where each edge's end state is constructed via a recursive call.
Most of the code is easy, but I am having trouble building the NFA for Kleene star and +, which require cycles in the graph. Given my representation I end up with something like:
let rec regex2nfa regex final_state =
match regex with
... (* Other cases... *)
| KleeneStar(re) ->
let s = State(count, [Epsilon(regex2nfa r s); Epsilon(final_state)]) in
s
Obviously this doesn't compile as s is undefined at this point. However I also cannot add the "rec" keyword because the type checker will (rightfully) reject such a recursively defined type, and I can't get around this by using Lazy because forcing the evaluation of "s" will recursively force it (again and again...). Basically I have a chicken and egg problem here - I need to pass the "state" reference before it is fully constructed to a another state that will have an edge back to it, but of course the original state must be fully constructed to be passed in the recursive call.
Is there anyway way to do this without using references/mutable records? I would really like to keep this as functional as possible but I don't see a way around this given the situation... Anyone have suggestions?
You can create data structures with cycles without explicit references, using lazy types or functions. Indeed, both of them hides some form of mutability.
Here is an example of a simplest lazy structure, that is more complex than a list
type 'a tree = 'a tr Lazy.t
and 'a tr = Stem of 'a * 'a tree * 'a tree
let rec tree_with_loop : int tree =
lazy (Stem (42,tree_with_loop,tree_with_loop))
But, you should understand, that with this kind of structures (i.e., those that contains cycles) you're stepping to an infirm ground of infinity, as all your traversing functions now diverge.
And here is the same example, but without lazy:
type 'a tree = unit -> 'a tr
and 'a tr = Stem of 'a * 'a tree * 'a tree
let rec tree_with_loop : int tree =
fun () -> Stem (42,tree_with_loop,tree_with_loop)
And here is an example of a slightly less infinite tree:
type 'a tree = 'a tr Lazy.t
and 'a tr =
| Node of 'a
| Tree of 'a tree * 'a tree
let rec tree_with_loop : int tree =
lazy (Tree (tree_with_loop,
lazy (Node 42)))