javaantlr4postfix-notationinfix-notation

Infix -> postfix translator in antlr4


I need to define the semantic actions required to translate infix notation to postfix notation by embedding semantic actions as Java code in the g4 grammar file.

For example, when I try to input "a + b * c", I want it to print out a b c * +. But mine is: a b + c *

It happens only when there are * and /

This is my grammar file

grammar Simple;
// productions for syntax analysis
start returns [String node]: e=expr  <EOF>              {$node = $e.node;};

expr returns [String node]: t=term r=rest               {$node = $t.node + " " + $r.node;};

rest returns [String node]: '*' t=term r=rest           {$node = $t.node + " " + $r.node + "*" ;}
                          | '/' t=term r=rest           {$node = $t.node + " " + $r.node + "/" ;}
                          | '+' t=term r=rest           {$node = $t.node + "+"  + $r.node ;}
                          | '-' t=term r=rest           {$node = $t.node + "-" + $r.node   ;}
                          |                              {$node = " ";};

term returns [String node]: '(' e=expr ')'              {$node = $e.node;}
                           |Id                          {$node = $Id.text;}
                           | Num                        {$node = $Num.text;};


// productions for lexical analysis
Id  : [a-z]+ ;
Num : [0-9]+ ;
WS  : (' ' | '\t' | '\r'| '\n'|) -> skip;

This is my main file

import java.io.*;
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.ParseTree;

public class SimpleMain {
    public static void main(final String[] args) throws IOException {
        CharStream charStream =  CharStreams.fromFileName("src/test.expression");
        SimpleLexer lexer = new SimpleLexer(charStream);
        SimpleParser parser = new SimpleParser(new CommonTokenStream(lexer));
        String postfix = parser.start().node; /* Run translator */
        System.out.println(postfix);
    }
}

Solution

  • I'm not really sure what you're attempting to accomplish with the (op) t=term r=rest style of rules. They won't match the infix operator syntax you have specified for your input. (It looks a bit like it might be a mis-translation from a non-antler grammar, but I'd just be guessing)

    First, a couple of things to watch out for:

    rest returns [String node]: '*' t=term r=rest           {$node = $t.node + " " + $r.node + "*" ;}
                              | '/' t=term r=rest           {$node = $t.node + " " + $r.node + "/" ;}
                              | '+' t=term r=rest           {$node = $t.node + "+"  + $r.node ;}
                              | '-' t=term r=rest           {$node = $t.node + "-" + $r.node   ;}
                              |                              {$node = " ";};  // 
    

    The last alternative is empty and will match "nothing". This can be used to indicate that the contents of a rule may be optional, but you don't want that here.

    WS  : (' ' | '\t' | '\r'| '\n'|) -> skip; 
    

    The |) creates an empty alternative and ANTLR warns you that this is not a good idea in a Lexer rule. WS: [ \t\r\n] -> skip; makes this simpler.

    With those things in mind, This seems to give the correct output:

    grammar Simple
        ;
    // productions for syntax analysis
    start
        returns[String node]: e = expr <EOF> {$node = $e.node;};
    
    expr
        returns[String node]
        : lhs = expr op = ('*' | '/') rhs = expr 
            {$node = $lhs.node + " " + $rhs.node + " " + $op.text ; }
        | lhs = expr op = ('+' | '-') rhs = expr 
            {$node = $lhs.node + " "  + $rhs.node + " " + $op.text ; }
        | '(' e = expr ')' {$node = $e.node;}
        | Id {$node = $Id.text;}
        | Num {$node = $Num.text;}
        ;
    
    term
        returns[String node]
        : '(' e = expr ')' {$node = $e.node;}
        | Id {$node = $Id.text;}
        | Num {$node = $Num.text;}
        ;
    
    // productions for lexical analysis 
    Id:  [a-z]+;
    Num: [0-9]+;
    WS:  [ \t\r\n] -> skip;
    

    output for a + b * c is a b c * +

    And output for a + b * c / d is a b c * d / +