c++ parsing antlr antlr4 grammar

How do I build the C++ grammar provided in the ANTLR grammars repository?


I would like to build a C++ parser, written in C++, using ANTLR4. I noticed the grammars in the official ANTLR grammars repository on GitHub and downloaded them. Opening the CPP folder inside, I see a CPP14Parser and a CPP14Lexer, and there is another CPP folder with a CPPParser in it. I've been going through the documentation for days, but it seems to be outdated. I tried running antlr4, but I kept getting errors when compiling the generated code. However, the CMakeLists.txt and main.cpp (basically all the other files) I wrote do work with an older version of the grammar, where things are cleaner; that is the version I built successfully. Can someone please teach me how to build the latest version of the grammar? Please let me know if any more specific context is needed! Thanks in advance!

Edit: The steps I followed are listed below. I installed the older version of CPP14.g4 (as described above) and wrote my own CMakeLists.txt:

cmake_minimum_required(VERSION 3.14)
project(CPPparser)
set(CMAKE_CXX_STANDARD 17)
include_directories(
        ${PROJECT_SOURCE_DIR}/generated/
        ${PROJECT_SOURCE_DIR}/cppruntime/src/
        ${PROJECT_SOURCE_DIR}/src/
)
set(src_dir
        ${PROJECT_SOURCE_DIR}/generated/CPP14Lexer.cpp
        ${PROJECT_SOURCE_DIR}/generated/CPP14Parser.cpp
        ${PROJECT_SOURCE_DIR}/generated/CPP14Visitor.cpp
        ${PROJECT_SOURCE_DIR}/generated/CPP14BaseVisitor.cpp
 )
file(GLOB antlr4-cpp-src
    ${PROJECT_SOURCE_DIR}/cppruntime/src/*.cpp
    ${PROJECT_SOURCE_DIR}/cppruntime/src/atn/*.cpp
    ${PROJECT_SOURCE_DIR}/cppruntime/src/dfa/*.cpp
    ${PROJECT_SOURCE_DIR}/cppruntime/src/internal/*.cpp
    ${PROJECT_SOURCE_DIR}/cppruntime/src/misc/*.cpp
    ${PROJECT_SOURCE_DIR}/cppruntime/src/support/*.cpp
    ${PROJECT_SOURCE_DIR}/cppruntime/src/tree/*.cpp
    ${PROJECT_SOURCE_DIR}/cppruntime/src/tree/pattern/*.cpp
    ${PROJECT_SOURCE_DIR}/cppruntime/src/tree/xpath/*.cpp
)
add_library (antlr4-cpp-runtime ${antlr4-cpp-src})
add_executable(CPPparser ${src_dir} src/main.cpp)
target_link_libraries(CPPparser antlr4-cpp-runtime)

My main.cpp is in a /src folder, /generated holds the files generated by running antlr4 -Dlanguage=Cpp -visitor, and /cppruntime holds the runtime from the official ANTLR GitHub repository. I then run the following commands on Ubuntu:

mkdir build && cd build
cmake ..
make

This works perfectly fine. However, when I try to compile the latest grammar files provided on GitHub using the same steps, I get tons of errors when running the make command. The errors are listed below:

: error: expected class-name before ‘{’ token
   12 | class  CPP14Parser : public CPP14ParserBase {
      |                                             ^
/home/user/CPnew/generated/CPP14Parser.h:118:3: error: ‘CPP14Parser::~CPP14Parser()’ marked ‘override’, but does not override
  118 |   ~CPP14Parser() override;
      |   ^
/home/user/CPnew/generated/CPP14Parser.h:120:15: error: ‘std::string CPP14Parser::getGrammarFileName() const’ marked ‘override’, but does not override
  120 |   std::string getGrammarFileName() const override;
      |               ^~~~~~~~~~~~~~~~~~
/home/user/CPnew/generated/CPP14Parser.h:122:27: error: ‘const antlr4::atn::ATN& CPP14Parser::getATN() const’ marked ‘override’, but does not override
  122 |   const antlr4::atn::ATN& getATN() const override;
      |                           ^~~~~~
/home/user/CPnew/generated/CPP14Parser.h:124:35: error: ‘const std::vector<std::__cxx11::basic_string<char> >& CPP14Parser::getRuleNames() const’ marked ‘override’, but does not override
  124 |   const std::vector<std::string>& getRuleNames() const override;
      |                                   ^~~~~~~~~~~~
/home/user/CPnew/generated/CPP14Parser.h:126:34: error: ‘const antlr4::dfa::Vocabulary& CPP14Parser::getVocabulary() const’ marked ‘override’, but does not override
  126 |   const antlr4::dfa::Vocabulary& getVocabulary() const override;
      |                                  ^~~~~~~~~~~~~
/home/user/CPnew/generated/CPP14Parser.h:128:34: error: ‘antlr4::atn::SerializedATNView CPP14Parser::getSerializedATN() const’ marked ‘override’, but does not override
  128 |   antlr4::atn::SerializedATNView getSerializedATN() const override;
      |                                  ^~~~~~~~~~~~~~~~
/home/user/CPnew/generated/CPP14Parser.h:3895:8: error: ‘bool CPP14Parser::sempred(antlr4::RuleContext*, size_t, size_t)’ marked ‘override’, but does not override
 3895 |   bool sempred(antlr4::RuleContext *_localctx, size_t ruleIndex, size_t predicateIndex) override;
      |        ^~~~~~~
/home/user/CPnew/cppruntime/src/CPP14ParserBase.cpp:5:6: error: ‘CPP14ParserBase’ has not been declared
    5 | bool CPP14ParserBase::IsPureSpecifierAllowed()
      |      ^~~~~~~~~~~~~~~
/home/user/CPnew/cppruntime/src/CPP14ParserBase.cpp: In function ‘bool IsPureSpecifierAllowed()’:
/home/user/CPnew/cppruntime/src/CPP14ParserBase.cpp:9:18: error: invalid use of ‘this’ in non-member function
    9 |         auto x = this->getRuleContext(); // memberDeclarator
      |                  ^~~~

In case it's needed, my main.cpp is below:

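#include <fstream>
#include <iostream>
#include "CPP14Lexer.h"
#include "CPP14Parser.h"
using namespace antlr4;

int main(int argc, const char* argv[]) {
    // Expect the path of the file to parse as the only argument.
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <file>" << std::endl;
        return 1;
    }
    std::ifstream ifs(argv[1]);
    if (!ifs) {
        std::cerr << "Cannot open " << argv[1] << std::endl;
        return 1;
    }
    ANTLRInputStream input(ifs);

    CPP14Lexer lexer(&input);
    CommonTokenStream tokens(&lexer);

    CPP14Parser parser(&tokens);
    // Start rule as spelled in the older CPP14.g4; newer versions of the
    // grammar may name it translationUnit.
    tree::ParseTree* tree = parser.translationunit();

    if (parser.getNumberOfSyntaxErrors() > 0) {
        std::cout << "File syntax error" << std::endl;
        return 1;
    }

    // Print every token, then the parse tree.
    tokens.fill();
    for (auto t : tokens.getTokens()) {
        std::cout << t->toString() << std::endl;
    }
    std::cout << tree->toStringTree(&parser) << std::endl << std::endl;

    return 0;
}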

TL;DR: If possible, please guide me on how to build the latest version of the ANTLR grammar. Sorry if I'm not making things clear or am making a silly mistake; it's my first question here.


Solution

  • Step 1 -- clone the repo

    $ git clone https://github.com/antlr/grammars-v4.git
    Cloning into 'grammars-v4'...
    remote: Enumerating objects: 50618, done.
    remote: Counting objects: 100% (1907/1907), done.
    remote: Compressing objects: 100% (1285/1285), done.
    remote: Total 50618 (delta 686), reused 1618 (delta 510), pack-reused 48711
    Receiving objects: 100% (50618/50618), 47.50 MiB | 23.79 MiB/s, done.
    Resolving deltas: 100% (27107/27107), done.
    Updating files: 100% (9413/9413), done.
    

    Step 2 -- navigate to cpp and check desc.xml to see whether the target is available for the grammar

    $ cd grammars-v4/cpp/
    $ cat desc.xml
    <?xml version="1.0" encoding="UTF-8" ?>
    <desc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../_scripts/desc.xsd">
       <targets>Cpp;CSharp;Dart;Go;Java;JavaScript;Python3;Antlr4ng</targets>
    </desc>
    

    The desc.xml lists the targets that build and are "working". A grammar directory sometimes contains support files for a target that is not listed; in that case the target may not build, or the resulting parser may be too slow to use in practice.

    For the cpp grammar, the Cpp target is listed, so the target "works".

    Step 3 -- copy target support files into directory containing grammar

    $ cp Cpp/* .
    

    You have to copy the files into the directory that contains the grammar. Otherwise, you will get compilation errors like the ones in the question, where CPP14ParserBase is not found.

    Step 4 -- transform the grammar using the Python script provided

    $ python transformGrammar.py
    Altering .\CPP14Lexer.g4
    Writing ...
    Altering .\CPP14Parser.g4
    Writing ...
    

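    The grammar contains "actions", which are pieces of target-specific code, and in the repo they are written in Java syntax. For the Cpp target you must convert this. to the C++ syntax this->, which is what the script does. A grep shows the result: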

    $ grep this *.g4
    CPP14Lexer.g4:This: 'this';
    CPP14Parser.g4: * Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
    CPP14Parser.g4: * The above copyright notice and this permission notice shall be included in all copies or
    CPP14Parser.g4:        | { this->IsPureSpecifierAllowed() }? pureSpecifier
    CPP14Parser.g4:        | { this->IsPureSpecifierAllowed() }? virtualSpecifierSeq pureSpecifier
    

    Step 5 -- write a driver and build script for the grammar

    The repo contains just the grammar. It does not contain any driver code, because people want to package the parser in a number of different ways. You need to write this code yourself. In addition, you need a build script so that the build is repeatable and error-free. This is why people use CMake.

    Short of that, you can run the following commands to get the code to compile. You will still need to write the driver and decide how you want to package and link the ANTLR Cpp runtime (static vs. dynamic libraries, etc.).

    a) Clone the antlr4 repo

    $ pushd ../../
    $ git clone https://github.com/antlr/antlr4.git
    Cloning into 'antlr4'...
    remote: Enumerating objects: 134889, done.
    remote: Counting objects: 100% (158/158), done.
    remote: Compressing objects: 100% (82/82), done.
    remote: Total 134889 (delta 62), reused 110 (delta 47), pack-reused 134731
    Receiving objects: 100% (134889/134889), 68.26 MiB | 23.52 MiB/s, done.
    Resolving deltas: 100% (79495/79495), done.
    Updating files: 100% (2273/2273), done.
    $ popd
    

    b) Run antlr4 tool to generate the parser source code

    $ antlr4 -Dlanguage=Cpp CPP14Lexer.g4 CPP14Parser.g4
    

    c) Compile using the GNU compiler (-c produces object files only; linking happens once you have a driver and a runtime library)

    $ g++ -std='c++17' -pthread -c -I../../antlr4/runtime/Cpp/runtime/src/ -g *.cpp