c++ regex

convert/compile regular expressions to C code


I am on a memory-limited system, and boost::regex is too large. What options exist for compiling my regular expressions straight to C/C++, and how many KB of code size should I expect? The goal is to reduce memory use and code size as much as possible.

I am looking for under 100 KB of code size and the same in memory usage. Boost regex appears to be approximately 470 KB, which is too large.


Solution

  • lex (and flex) produce table-driven lexers, which are generally pretty small; they go back to the days when a machine with 100 kB of memory would have been considered a supercomputer :) The basic flex code skeleton is tiny (a few kB). The size of the tables depends on how many token types you have and how complicated the regular expressions are, but the tables for a simple flex scanner are typically a few kB as well.

    However, if you're not using them to build an interpreter or compiler, they do have a couple of annoying characteristics. First, they insist on doing your input and buffering for you, which is nice if you're always reading from a file but less convenient if your input is coming from a socket or terminal (or, worse, is being preprocessed by some kind of translator). Second, they are designed for an environment where you have a few simple token types and a separate parser (hence yacc or bison) that is responsible for interpreting the sequence of tokens. You could certainly use these tools to parse something like HTTP, and you might even find that you've learned some useful new skills.

    There is a tool called re2c (i.e. regular expression to C) which you might find a little more comfortable. Unlike lex, it generates the matcher directly as customized C code rather than as tables, which is quite a bit bulkier but arguably runs slightly faster. I had quite a lot of success with it some years back. It was hosted on SourceForge at the time; the project is still maintained and can now be found at re2c.org.

    Good luck.
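To make the table-driven versus direct-coded distinction a bit more concrete, here is a small hand-written sketch in C. It is not the output of flex or re2c (both generate considerably more elaborate code); it only illustrates the two styles for one assumed example pattern, `[0-9]+\.[0-9]+`, matched against a whole NUL-terminated string. The names `match_table`, `match_direct`, `classify` and `transitions` are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Character classes and DFA states for the pattern [0-9]+\.[0-9]+ */
enum { C_DIGIT, C_DOT, C_OTHER, N_CLASSES };
enum { S_START, S_INT, S_DOT, S_FRAC, S_DEAD, N_STATES };

/* Table-driven style: all of the pattern's logic lives in this
 * 5 x 3 byte table; the driver loop below never changes. */
static const unsigned char transitions[N_STATES][N_CLASSES] = {
    /* S_START */ { S_INT,  S_DEAD, S_DEAD },
    /* S_INT   */ { S_INT,  S_DOT,  S_DEAD },
    /* S_DOT   */ { S_FRAC, S_DEAD, S_DEAD },
    /* S_FRAC  */ { S_FRAC, S_DEAD, S_DEAD },
    /* S_DEAD  */ { S_DEAD, S_DEAD, S_DEAD },
};

/* Map an input character to its character class. */
static int classify(char c)
{
    if (c >= '0' && c <= '9') return C_DIGIT;
    if (c == '.') return C_DOT;
    return C_OTHER;
}

/* Returns 1 if the whole string matches, 0 otherwise. */
int match_table(const char *s)
{
    assert(s != NULL);
    int state = S_START;
    for (; *s != '\0'; ++s)
        state = transitions[state][classify(*s)];
    return state == S_FRAC;   /* S_FRAC is the only accepting state */
}

/* Direct-coded style: the same automaton, but each state has become
 * a piece of straight-line code instead of a row in a table. */
int match_direct(const char *s)
{
    assert(s != NULL);
    if (*s < '0' || *s > '9') return 0;      /* need at least one digit */
    while (*s >= '0' && *s <= '9') ++s;      /* rest of the integer part */
    if (*s != '.') return 0;                 /* mandatory dot */
    ++s;
    if (*s < '0' || *s > '9') return 0;      /* need at least one digit */
    while (*s >= '0' && *s <= '9') ++s;      /* rest of the fraction */
    return *s == '\0';                       /* must consume everything */
}
```

Both functions accept strings such as "3.14" and reject "42", "1." and ".5". In the first version, a more complicated pattern mostly grows the table; in the second, it grows the code itself, which is roughly the size trade-off described above.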