c++parsinggrammarrecursive-descentlr-grammar

How does gcc/clang parse c++ if it can't be parsed by a LR(1) parser?


GCC/Clang are handwritten parsers. I read a post saying that C++ can't be parsed by an LR(1) parser (Why can't C++ be parsed with a LR(1) parser?). If so, how come GCC/Clang are handwritten recursive descent parsers, when LR(1) is more powerful than recursive descent?


Solution

  • GCC/Clang are not strict recursive descent parsers; they allow backtracking (reparsing an arbitrarily-long text), among other deviations from theoretical purity. Backtracking parsers are strictly more powerful than non-backtracking parsers (but at the cost of no longer being linear time).

    The real complexities in parsing C++ vanish when the question is phrased this way. If you strip C++ down to the superset described by the BNF in Appendix A, then you basically just need to be prepared to consider several alternative parses. You can do that through backtracking -- trying one possibility and then if that fails, trying some other ones -- or with parallel exploration, using GLR/GLL or some other variant; nothing too painful. But the real work still needs to be confronted: