CodeMirror uses modes to tokenise its code. It breaks the document up into lines and turns each line into a stream, which is then passed to the predefined mode. A mode can handle constructs that span multiple lines by using its state parameter.
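To make the stream-and-state idea concrete, here is a minimal TypeScript sketch of how I understand it. It does not use CodeMirror itself: `LineStream`, `toyMode` and `highlight` are hypothetical names I made up, and the only thing borrowed from CodeMirror is the general shape of a mode (a start state plus a `token(stream, state)` function).

```typescript
// Hypothetical stand-in for a CodeMirror-style stream: a cursor over one line.
class LineStream {
  pos = 0;
  constructor(readonly text: string) {}

  eol(): boolean {
    return this.pos >= this.text.length;
  }

  // Consume `s` if the line continues with it at the cursor.
  match(s: string): boolean {
    if (this.text.slice(this.pos, this.pos + s.length) === s) {
      this.pos += s.length;
      return true;
    }
    return false;
  }

  // Move the cursor to the next `s`, or to the end of the line if there is none.
  skipTo(s: string): boolean {
    const i = this.text.indexOf(s, this.pos);
    this.pos = i === -1 ? this.text.length : i;
    return i !== -1;
  }
}

interface State {
  inComment: boolean;
}

// A toy mode that only knows about block comments.
const toyMode = {
  startState: (): State => ({ inComment: false }),

  // Consume one token from the stream and return its style (null = plain text).
  token(stream: LineStream, state: State): string | null {
    if (!state.inComment) {
      if (!stream.match("/*")) {
        stream.skipTo("/*"); // plain text up to the next comment opener
        return null;
      }
      state.inComment = true;
    }
    // Inside a comment: look for the closer; if absent, the comment
    // stays open and the state carries that fact to the next line.
    if (stream.skipTo("*/")) {
      stream.match("*/");
      state.inComment = false;
    }
    return "comment";
  },
};

type StyledToken = [string, string | null];

function highlight(lines: string[]): StyledToken[][] {
  const state = toyMode.startState(); // one state object, reused for every line
  return lines.map((line) => {
    const stream = new LineStream(line);
    const tokens: StyledToken[] = [];
    while (!stream.eol()) {
      const start = stream.pos;
      const style = toyMode.token(stream, state);
      tokens.push([line.slice(start, stream.pos), style]);
    }
    return tokens;
  });
}

console.log(highlight(["a /* open", "still */ b"]));
```

On the two-line input above, the `still */` at the start of the second line comes out styled as a comment only because `state.inComment` was still set when the first line ended.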
It seems ACE has a similar method.
Neither of these methods uses RegExp inherently (though whoever writes the mode can of course use RegExps inside it).
From what I've read of Atom's code and style, it calls its syntax highlighters grammars, and they closely resemble the grammars from TextMate. These grammars resemble JSON objects that contain class names and RegExps (see how to write a TextMate grammar).
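As an illustration of what such a grammar looks like as data, here is a small TypeScript sketch. The field names (`scopeName`, `patterns`, `name`, `match`) follow the TextMate convention, but the grammar itself (`source.toy`) and the `tokenizeWithMatchRules` function are hypothetical. JavaScript's built-in `RegExp` is also an assumption on my part: it stands in for whatever regex engine the editor really uses.

```typescript
interface MatchRule {
  name: string;  // scope name, which ends up as the class name(s) on the token
  match: string; // regular expression source, stored as a string as in JSON
}

interface Grammar {
  scopeName: string;
  patterns: MatchRule[];
}

// Hypothetical toy grammar, written the way a TextMate-style grammar is laid out.
const toyGrammar: Grammar = {
  scopeName: "source.toy",
  patterns: [
    { name: "comment.line.number-sign.toy", match: "#.*$" },
    { name: "constant.numeric.toy", match: "\\b\\d+\\b" },
    { name: "keyword.control.toy", match: "\\b(?:if|else|while)\\b" },
  ],
};

interface Token {
  value: string;
  scopes: string[];
}

// Tokenise one line using only single-line `match` rules.
function tokenizeWithMatchRules(grammar: Grammar, line: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < line.length) {
    // Pick the rule that matches earliest at or after `pos`;
    // on a tie, the rule listed first in the grammar wins.
    let best: { rule: MatchRule; index: number; text: string } | null = null;
    for (const rule of grammar.patterns) {
      const re = new RegExp(rule.match, "g");
      re.lastIndex = pos;
      const m = re.exec(line);
      if (m && m[0].length > 0 && (best === null || m.index < best.index)) {
        best = { rule, index: m.index, text: m[0] };
      }
    }
    if (best === null) {
      // Nothing else matches: the rest of the line only has the base scope.
      tokens.push({ value: line.slice(pos), scopes: [grammar.scopeName] });
      break;
    }
    if (best.index > pos) {
      tokens.push({ value: line.slice(pos, best.index), scopes: [grammar.scopeName] });
    }
    tokens.push({ value: best.text, scopes: [grammar.scopeName, best.rule.name] });
    pos = best.index + best.text.length;
  }
  return tokens;
}

console.log(tokenizeWithMatchRules(toyGrammar, "if x 42 # note"));
```

This only covers rules that fit on one line. Rules that open on one line and close on another need some state carried between lines, which this sketch deliberately leaves out.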
I can't for the life of me figure out how exactly Atom parses the code, keeps its state between lines, and extends tokens through various scopes.
If someone could point me in the right direction that would be great.
The question was answered here.
Atom uses its first-mate module, which relies on oniguruma for parsing regular expressions.
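For anyone wondering how a grammar-driven tokeniser can keep state and nest scopes, here is a rough TypeScript sketch of the general technique: a stack of open rules that is handed from one line to the next. This is not first-mate's code or API. Every name in it (`Rule`, `tokenizeLine`, `source.toy`) is hypothetical, and it uses JavaScript's `RegExp` rather than oniguruma. It also assumes every `begin` and `end` pattern matches at least one character.

```typescript
interface Rule {
  name: string;      // scope added while this rule is open
  begin: string;     // regex source that opens the rule
  end: string;       // regex source that closes it
  patterns?: Rule[]; // rules allowed to open inside this one
}

interface Token {
  value: string;
  scopes: string[];
}

// Find the first match of `source` in `line` at or after `from`.
function find(source: string, line: string, from: number): RegExpExecArray | null {
  const re = new RegExp(source, "g");
  re.lastIndex = from;
  return re.exec(line);
}

// Tokenise one line, starting from the rules left open by the previous line.
function tokenizeLine(line: string, ruleStack: Rule[], topLevel: Rule[]) {
  const stack = ruleStack.slice(); // copy, so the caller's stack is untouched
  const tokens: Token[] = [];
  let pos = 0;

  // Emit line[pos, end) with the scopes of every rule currently open.
  const emit = (end: number): void => {
    if (end > pos) {
      const scopes = ["source.toy"].concat(stack.map((r) => r.name));
      tokens.push({ value: line.slice(pos, end), scopes });
    }
    pos = end;
  };

  while (pos < line.length) {
    const top: Rule | undefined = stack[stack.length - 1];
    const candidates = top ? top.patterns || [] : topLevel;
    const endMatch = top ? find(top.end, line, pos) : null;

    // Earliest rule that could open from here.
    let nextBegin: { rule: Rule; m: RegExpExecArray } | null = null;
    for (const rule of candidates) {
      const m = find(rule.begin, line, pos);
      if (m && (nextBegin === null || m.index < nextBegin.m.index)) {
        nextBegin = { rule, m };
      }
    }

    if (endMatch && (nextBegin === null || endMatch.index <= nextBegin.m.index)) {
      emit(endMatch.index + endMatch[0].length); // content plus closing delimiter
      stack.pop();
    } else if (nextBegin) {
      emit(nextBegin.m.index);                   // text before, in the outer scopes
      stack.push(nextBegin.rule);
      emit(nextBegin.m.index + nextBegin.m[0].length); // opening delimiter
    } else {
      emit(line.length);                         // nothing opens or closes here
    }
  }
  return { tokens, ruleStack: stack };
}

// Usage: feed each line the stack returned by the previous one.
const rules: Rule[] = [{ name: "comment.block.toy", begin: "/\\*", end: "\\*/" }];
let stack: Rule[] = [];
for (const line of ["a /* b", "c */ d"]) {
  const result = tokenizeLine(line, stack, rules);
  stack = result.ruleStack;
  console.log(result.tokens, "open rules:", stack.length);
}
```

The returned `ruleStack` is the whole state: after the first line it holds the open comment rule, so the second line starts out inside `comment.block.toy` until `*/` pops it. An editor built this way could also cache the stack per line and only re-tokenise from the first line whose incoming stack changed, though that part is my guess at the design rather than anything confirmed here.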