c++ treesitter ast-grep

Why is the last statement ignored in ast-grep / tree-sitter with C++ in compound_statement?


In this playground, the last statement is always ignored and not captured. Why is this?

https://ast-grep.github.io/playground.html#eyJtb2RlIjoiQ29uZmlnIiwibGFuZyI6ImNwcCIsInF1ZXJ5IjoidXNpbmcgbmFtZXNwYWNlICRBOyIsInJld3JpdGUiOiJ1c2luZyBuYW1lc3BhY2UgZm9vOjokQTsiLCJjb25maWciOiJcbmlkOiB0ZXN0YmFzZV9pbml0aWFsaXplclxubGFuZ3VhZ2U6IENQUFxucnVsZTpcbiAgcGF0dGVybjpcbiAgICBzZWxlY3RvcjogY29tcG91bmRfc3RhdGVtZW50XG4gICAgY29udGV4dDogXCJBOjpBKCkgOiBmb28oKSB7ICQkJEJPRFlTVFVGRiB9XCJcbmZpeDogfC1cbiAge1xuICAgIGYoKTtcbiAgICAkJCRCT0RZU1RVRkY7XG4gIH0iLCJzb3VyY2UiOiJBOjpBKClcbiAgOiBiYXNlOjpDbGFzcyhhLCBiLCBjKSB7IFxuICAgICBhO1xuICAgICBiO1xuICAgICBjO1xuICB9In0=

Test C++ code:

A::A()
  : base::Class(a, b, c) { 
     a;
     b;
     c;
  }

Test rules:

id: testbase_initializer
language: CPP
rule:
  pattern:
    selector: compound_statement
    context: "A::A() : foo() { $$$BODYSTUFF }"
fix: |-
  {
    f();
    $$$BODYSTUFF;
  }

BODYSTUFF captures a; b; but not c;. Why? For reference, the unreduced test case is this:

id: testbase_initializer
language: CPP
rule:
  pattern:
    selector: compound_statement
    context: "A::A() : foo() { $$$BODYSTUFF }"

  follows:
    kind: field_initializer_list
    has:
      pattern: 
        selector: field_initializer
        context: "A::A() : TestBase($NAME, $DETAILS, $ID) { }"
fix: |-
  { 
    setName($NAME);
    setId($ID);

    $$$BODYSTUFF
  }

It was intended to move part of the base class initializer into the body, but the fix always discarded the last body statement!


Solution

  • This is a tricky issue with the tree-sitter parser.

    { $$$A } is not valid C++ syntax, as you can see in the attachment (note that the "show full tree" option is toggled on).

    While ast-grep recommends that a pattern be valid syntax, it is permissive and will try its best to perform a reasonable match.

    $$$BODYSTUFF is parsed as a type_identifier followed by a missing ;. As a result, the last statement is wrongly matched against the missing semicolon and is left out of the capture.

    A workaround for now is to use a shorter variable name such as $$$B, which makes tree-sitter parse it as a single ERROR node.

    See https://ast-grep.github.io/playground.html#eyJtb2RlIjoiQ29uZmlnIiwibGFuZyI6ImNwcCIsInF1ZXJ5IjoieyBcbiAgICAkJCRCT0RZU1RVRkZcbn0iLCJyZXdyaXRlIjoidXNpbmcgbmFtZXNwYWNlIGZvbzo6JEE7IiwiY29uZmlnIjoiXG5pZDogdGVzdGJhc2VfaW5pdGlhbGl6ZXJcbmxhbmd1YWdlOiBDUFBcbnJ1bGU6XG4gIHBhdHRlcm46XG4gICAgc2VsZWN0b3I6IGNvbXBvdW5kX3N0YXRlbWVudFxuICAgIGNvbnRleHQ6IFwieyAkJCRCIH1cIlxuZml4OiB8LVxuICB7XG4gICAgZigpO1xuICAgICQkJEJcbiAgfSIsInNvdXJjZSI6IkE6OkEoKVxuICA6IGJhc2U6OkNsYXNzKGEsIGIsIGMpIHsgXG4gICAgIGE7XG4gICAgIGI7XG4gICAgIGM7XG4gIH0ifQ==

    (Attachment: playground dump)
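
    For convenience, the rule encoded in the linked playground is reproduced below as a sketch of the workaround. It only renames the metavariable to $$$B and shortens the context to the bare block; f() is the placeholder call from the question's reduced fix. Whether the same rename also works with the question's longer context ("A::A() : foo() { ... }") is an assumption that should be checked in the playground.

    ```yaml
    id: testbase_initializer
    language: CPP
    rule:
      pattern:
        selector: compound_statement
        # Short metavariable name, so that tree-sitter parses it as a single ERROR node
        context: "{ $$$B }"
    fix: |-
      {
        f();
        $$$B
      }
    ```

    Toggling the "show full tree" option on the pattern is a quick way to confirm how a given metavariable name is parsed before relying on the capture.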