
Variable-Length Operators In Reverse Polish Notation (Postfix)


Background: In traditional Reverse Polish Notation (RPN), every operator has a fixed length (arity). This makes RPN easy for code to evaluate and manipulate, because every token, expression, and subexpression is self-contained: one can blindly replace the y in x y * with y 1 + to get x y 1 + *, which is another valid expression that does exactly what you want it to do. Here is an interactive demo of a simple RPN calculator with named-variable support. Note that the demos only try to present the gist of an algorithm; they do not represent production code.

var rpn = prompt("Please enter RPN string, where each token is " +
  "separated by a space", "x 1 x + * 2 /").trim().split(/\s+/);

var stack = [], variables = [], values = [];
for (let i = 0, len = rpn.length|0; i < len; i=i+1|0) {
    if (/^(\d+\.?\d*|\.\d+)$/.test(rpn[i])) { // rejects "" and a lone "."
        stack.push( rpn[i] );
    } else if (/^[a-z]$/i.test(rpn[i])) {
        stack.push( rpn[i] );
        if (!~variables.indexOf(rpn[i])) variables.push( rpn[i] );
    } else {
        if(stack.length<2)throw Error("No operand for " + rpn[i]);
        const firstPop = stack.pop(); // right operand (length checked above)
        stack.push( "(" + stack.pop() + rpn[i] + firstPop + ")" );
    }
}
if (stack.length !== 1) throw Error("Invalid RPN got: " + stack);

for (let i = 0, len = variables.sort().length|0; i < len; i=i+1|0)
    values[i] = +prompt(variables[i] + " = ", Math.random()*10|0);

variables.push("'use strict';return(" + stack.pop() + ")");
alert("Result: " + Function.apply(0, variables).apply(0, values));
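To make the substitution property concrete, here is a minimal sketch that evaluates fixed-arity RPN directly instead of building a string. The names `OPS`, `evalRpn`, and `substitute` are hypothetical helpers introduced for illustration and are not part of the demo above; variables are assumed to be supplied in a `Map`.

```javascript
// Binary operators, all with a fixed arity of two.
const OPS = new Map([
    ["+", (a, b) => a + b],
    ["-", (a, b) => a - b],
    ["*", (a, b) => a * b],
    ["/", (a, b) => a / b],
]);

// Evaluate an array of RPN tokens; `env` is a Map from variable name to number.
function evalRpn(tokens, env) {
    const stack = [];
    for (const tok of tokens) {
        if (OPS.has(tok)) {
            if (stack.length < 2) throw new Error("No operand for " + tok);
            const right = stack.pop(), left = stack.pop();
            stack.push(OPS.get(tok)(left, right));
        } else if (env.has(tok)) {
            stack.push(env.get(tok));
        } else {
            const n = Number(tok);
            if (tok === "" || Number.isNaN(n)) throw new Error("Unknown token " + tok);
            stack.push(n);
        }
    }
    if (stack.length !== 1) throw new Error("Invalid RPN, got: " + stack);
    return stack[0];
}

// Replace every occurrence of the token `name` with the token array `replacement`.
// No surrounding context has to be inspected, which is the point of the property.
function substitute(tokens, name, replacement) {
    return tokens.flatMap(tok => (tok === name ? replacement : [tok]));
}

const env = new Map([["x", 3], ["y", 4]]);
const original = "x y *".split(" ");
const rewritten = substitute(original, "y", "y 1 +".split(" "));

console.log(rewritten.join(" "));      // x y 1 + *
console.log(evalRpn(original, env));   // 12
console.log(evalRpn(rewritten, env));  // 15
```

The substitution is a plain token-level splice, and the result is still valid RPN. That is the property the solutions below try to preserve.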

Question: How can RPN be modified or adapted to accommodate variable-length "operators" (think functions)?

Research and proposed solutions: I am using RPN as an intermediate representation of code before it is translated into a specific target language. I want to preserve as much of the usefulness and ease of RPN as possible while still representing variable-length operators. I devised four solutions and implemented the first three as rather simplistic demos below.

  1. A special ARGUMENTS_BEGIN prefix operator (we'll use # for the purposes of this question). This solution departs from traditional RPN in that it adds a prefix operator to denote where the arguments begin. The arguments list then grows automatically as tokens are added, and debugging is easier because no malformed token substitution can disrupt the arguments list, which makes errors easier to localize. On the other hand, manipulating arguments could become more complex, since more code is needed to handle cases such as nested function calls, though I am not sure what other complications could arise. My guess is that I will run into obstacles parsing a syntax that mixes prefix and postfix operators. Direct evaluation is also more difficult, because back-tracking or a separate stack is needed to locate the start of the arguments.

var rpn = prompt("Please enter a RPN string, where each token is " +
  "separated by a space", "# # x 210 gcd x 6 * 126 gcd").trim()
  .split(/\s+/);

var stack = [], variables = [], values = [];
for (let i = 0, len = rpn.length|0; i < len; i=i+1|0) {
    if (/^(\d+\.?\d*|\.\d+)$/.test(rpn[i])) { // rejects "" and a lone "."
        stack.push( rpn[i] );
    } else if (/^[a-z]$/i.test(rpn[i])) {
        stack.push( rpn[i] );
        if (!~variables.indexOf(rpn[i])) variables.push( rpn[i] );
    } else if (/^[a-z]\w*$/i.test(rpn[i])) {
        const s = stack.lastIndexOf("#");
        if(s<0) throw Error("No start of arguments to " + rpn[i]);
        stack.push( rpn[i]+"(" + stack.splice(s).slice(1) + ")" );
    } else if (rpn[i] === '#') {
        stack.push( '#' ); // sparks a syntax error if misused
    } else {
        if(stack.length<2)throw Error("No operand for " + rpn[i]);
        const firstPop = stack.pop();
        stack.push( "(" + stack.pop() + rpn[i] + firstPop + ")" );
    }
}
if (stack.length !== 1) throw Error("Invalid RPN got: " + stack);

for (let i = 0, len = variables.sort().length|0; i < len; i=i+1|0)
    values[i] = +prompt(variables[i] + " = ", Math.random()*10|0);

variables.push( "gcd" );
values.push( function gcd(a, b) {return b ? gcd(b, a % b) : a;} );

variables.push("'use strict';return(" + stack.pop() + ")");
alert("Result: " + Function.apply(0, variables).apply(0, values));
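The "separate stack" mentioned for direct evaluation could look roughly like the following sketch. It keeps a second stack, `marks`, that records the height of the value stack at each `#`, so a function call knows where its arguments start without back-tracking. All names here (`evalWithMarkers`, `FUNCS`, `BINARY`, `marks`) are assumptions made for illustration, and `gcd` is made variadic purely so that calls with any number of arguments evaluate.

```javascript
function gcd2(a, b) { return b ? gcd2(b, a % b) : a; }

// Variable-length functions (hypothetical table for this sketch).
const FUNCS = new Map([
    ["gcd", (...args) => args.reduce((acc, n) => gcd2(acc, n), 0)],
    ["sum", (...args) => args.reduce((acc, n) => acc + n, 0)],
]);

const BINARY = new Map([
    ["+", (a, b) => a + b],
    ["-", (a, b) => a - b],
    ["*", (a, b) => a * b],
    ["/", (a, b) => a / b],
]);

function evalWithMarkers(tokens, env) {
    const stack = []; // operand values
    const marks = []; // value-stack height recorded at each "#"
    for (const tok of tokens) {
        if (tok === "#") {
            marks.push(stack.length);
        } else if (FUNCS.has(tok)) {
            if (marks.length === 0) throw new Error("No start of arguments to " + tok);
            const args = stack.splice(marks.pop()); // everything pushed since the matching "#"
            stack.push(FUNCS.get(tok)(...args));
        } else if (BINARY.has(tok)) {
            // A binary operator may not reach below the innermost open "#".
            const floor = marks.length ? marks[marks.length - 1] : 0;
            if (stack.length - floor < 2) throw new Error("No operand for " + tok);
            const right = stack.pop(), left = stack.pop();
            stack.push(BINARY.get(tok)(left, right));
        } else if (env.has(tok)) {
            stack.push(env.get(tok));
        } else {
            const n = Number(tok);
            if (tok === "" || Number.isNaN(n)) throw new Error("Unknown token " + tok);
            stack.push(n);
        }
    }
    if (marks.length !== 0) throw new Error("Unmatched # in input");
    if (stack.length !== 1) throw new Error("Invalid RPN, got: " + stack);
    return stack[0];
}

const result = evalWithMarkers(
    "# # x 210 gcd x 6 * 126 gcd".split(" "),
    new Map([["x", 14]])
);
console.log(result); // 14
```

Because each `#` records a stack height, an operator that tries to consume values from outside the current arguments list is reported at the offending token, which is one way to get the error localization described above.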

  2. Comma operators to group arguments together (we'll use , to group the last two items and ~ to denote a zero-length group for the purposes of this question). This solution is traditional RPN except for slightly special handling of the comma and zero-group operators. Every variable-length operator is treated as having a length of one (zero arguments are represented with ~). Commas build arguments lists out of two items, each of which can be an ordinary token, an arguments list, or a zero-group operator. Advantages include easy manipulation and parsing of the code, compliance with the simplicity of RPN, and preservation of the token independence of RPN. The disadvantage is that the RPN is harder to debug, because a single malformed token can upset an entire arguments list and snowball out of control, with no way to detect whether it was deliberate or accidental.

var rpn = prompt("Please enter RPN string, where each token is " +
  "separated by a space", "x 6 * 126 , 210 , gcd ~ PI %")
  .trim().split(/\s+/);

var stack = [], variables = [], values = [];
for (let i = 0, len = rpn.length|0; i < len; i=i+1|0) {
    if (/^(\d+\.?\d*|\.\d+)$/.test(rpn[i])) { // rejects "" and a lone "."
        stack.push( rpn[i] );
    } else if (/^[a-z]$/i.test(rpn[i])) {
        stack.push( rpn[i] );
        if (!~variables.indexOf(rpn[i])) variables.push( rpn[i] );
    } else if (/^[a-z]\w*$/i.test(rpn[i])) {
        if(stack.length<1)throw Error("No operand for " + rpn[i]);
        stack.push( rpn[i] + "(" + stack.pop() + ")" );
    } else if (rpn[i] === ',') {
        if(stack.length<2)throw Error("No operand for " + rpn[i]);
        const p2 = "" + stack.pop(), p1 = "" + stack.pop();
        stack.push( p1 && p2 ? p1 + "," + p2 : p1 || p2 );
    } else if (rpn[i] === '~') {
        stack.push( "" ); // zero-length group
    } else {
        if(stack.length<2)throw Error("No operand for " + rpn[i]);
        const firstPop = stack.pop(); // right operand (length checked above)
        stack.push( "(" + stack.pop() + rpn[i] + firstPop + ")" );
    }
}
if (stack.length !== 1) throw Error("Invalid RPN got: " + stack);

for (let i = 0, len = variables.sort().length|0; i < len; i=i+1|0)
    values[i] = +prompt(variables[i] + " = ", Math.random()*10|0);

variables.push( "gcd", "PI" );
values.push( function gcd(a, b) {return b ? gcd(b, a % b) : a;} );
values.push( function PI() {return Math.PI;} );

variables.push("'use strict';return(" + stack.pop() + ")");
alert("Result: " + Function.apply(0, variables).apply(0, values));
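For comparison, the comma scheme can also be evaluated directly rather than by string concatenation. The following is only a sketch under assumptions: arguments lists are represented by a hypothetical `ArgList` wrapper so they can be told apart from plain numbers, and `gcd` is again made variadic for the sake of the example.

```javascript
// Wrapper that marks a stack entry as an arguments list rather than a value.
class ArgList {
    constructor(items) { this.items = items; }
}
// A plain value counts as a one-element arguments list.
const itemsOf = v => (v instanceof ArgList ? v.items : [v]);

function gcd2(a, b) { return b ? gcd2(b, a % b) : a; }
const FUNCS = new Map([
    ["gcd", (...args) => args.reduce((acc, n) => gcd2(acc, n), 0)],
    ["PI", () => Math.PI],
]);
const BINARY = new Map([
    ["+", (a, b) => a + b],
    ["-", (a, b) => a - b],
    ["*", (a, b) => a * b],
    ["/", (a, b) => a / b],
    ["%", (a, b) => a % b],
]);

function evalWithCommas(tokens, env) {
    const stack = [];
    const pop = tok => {
        if (stack.length === 0) throw new Error("No operand for " + tok);
        return stack.pop();
    };
    for (const tok of tokens) {
        if (tok === "~") {
            stack.push(new ArgList([])); // zero-length group
        } else if (tok === ",") {
            const right = pop(tok), left = pop(tok);
            // Joining flattens both sides, so lists never nest.
            stack.push(new ArgList([...itemsOf(left), ...itemsOf(right)]));
        } else if (FUNCS.has(tok)) {
            stack.push(FUNCS.get(tok)(...itemsOf(pop(tok)))); // always pops exactly one entry
        } else if (BINARY.has(tok)) {
            const right = pop(tok), left = pop(tok);
            if (left instanceof ArgList || right instanceof ArgList)
                throw new Error("Arguments list used as operand of " + tok);
            stack.push(BINARY.get(tok)(left, right));
        } else if (env.has(tok)) {
            stack.push(env.get(tok));
        } else {
            const n = Number(tok);
            if (tok === "" || Number.isNaN(n)) throw new Error("Unknown token " + tok);
            stack.push(n);
        }
    }
    if (stack.length !== 1 || stack[0] instanceof ArgList)
        throw new Error("Invalid RPN: expected a single value");
    return stack[0];
}

const value = evalWithCommas(
    "x 6 * 126 , 210 , gcd ~ PI %".split(" "),
    new Map([["x", 7]])
);
console.log(value === 42 % Math.PI); // true
```

Every function call pops exactly one stack entry here, so the fixed-arity shape of traditional RPN is kept. The cost is the one named above: a stray `,` silently changes which values end up in the list.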

  3. The operator intrinsically stores its length (we'll append a number to the function name for the purposes of this question). This solution inherits all of the advantages of traditional RPN. It also keeps the reading side of the parser simple, and it makes debugging easier because new arguments cannot be inserted accidentally. However, it makes manipulating and generating RPN code more complex. Updating and generating arguments lists is difficult because this solution gives up the token independence of RPN: adding an argument (and thereby changing the arity) requires two actions and one lookup (versus the traditional one action and zero lookups): (1) insert the argument, (2) look up the position of the variable-length operator, and (3) update the length of the operator.

var rpn = prompt("Please enter RPN string, where each token is " +
  "separated by a space", "x 210 gcd2 x 6 * 126 gcd3").trim()
  .split(/\s+/);

var stack = [], variables = [], values = [];
for (let i = 0, len = rpn.length|0, m; i < len; i=i+1|0) {
    if (/^(\d+\.?\d*|\.\d+)$/.test(rpn[i])) { // rejects "" and a lone "."
        stack.push( rpn[i] );
    } else if (/^[a-z]$/i.test(rpn[i])) {
        stack.push( rpn[i] );
        if (!~variables.indexOf(rpn[i])) variables.push( rpn[i] );
    } else if (m = rpn[i].match(/^([a-z]+)(\d+)$/i)) {
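        const n = +m[2]; // number of arguments encoded in the token
        if (stack.length < n) throw Error("No operand for " + rpn[i]);
        // splice(-0) would remove the whole stack, so count from the end
        stack.push( m[1] + "(" + stack.splice(stack.length - n) + ")" );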
    } else {
        if(stack.length<2)throw Error("No operand for " + rpn[i]);
        const firstPop = stack.pop(); // right operand (length checked above)
        stack.push( "(" + stack.pop() + rpn[i] + firstPop + ")" );
    }
}
if (stack.length !== 1) throw Error("Invalid RPN got: " + stack);

for (let i = 0, len = variables.sort().length|0; i < len; i=i+1|0)
    values[i] = +prompt(variables[i] + " = ", Math.random()*10|0);

variables.push( "gcd" );
values.push( function gcd(a, b) {return b ? gcd(b, a % b) : a;} );

variables.push("'use strict';return(" + stack.pop() + ")");
alert("Result: " + Function.apply(0, variables).apply(0, values));
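The extra bookkeeping this scheme needs when an argument is added can be sketched as follows. This is an illustration under assumptions, not part of the demo: `arityOf`, `findConsumer`, and `addArgumentAfter` are hypothetical helpers, and, as in the demo, any token of the form letters followed by digits is taken to be a call with that many arguments.

```javascript
const BINARY = new Set(["+", "-", "*", "/", "%"]);

// Operands consumed by a token: N for "nameN", 2 for binary operators, 0 otherwise.
function arityOf(tok) {
    const m = /^[a-z]+(\d+)$/i.exec(tok);
    if (m) return Number(m[1]);
    return BINARY.has(tok) ? 2 : 0;
}

// Find the operator that consumes the value produced by the subexpression
// ending at `index`, by scanning forward and counting values stacked above it.
function findConsumer(tokens, index) {
    let above = 0;
    for (let j = index + 1; j < tokens.length; j++) {
        const arity = arityOf(tokens[j]);
        if (arity > above) return j;  // reaches down to our value
        above = above - arity + 1;    // consumes `arity` values, pushes one
    }
    throw new Error("Value at index " + index + " is never consumed");
}

// Add `argTokens` (one complete subexpression) as a new argument right after
// the argument that ends at `index`.
function addArgumentAfter(tokens, index, argTokens) {
    const call = findConsumer(tokens, index);            // (2) look up the operator
    const m = /^([a-z]+)(\d+)$/i.exec(tokens[call]);
    if (!m) throw new Error(tokens[call] + " is not a variable-length call");
    const result = tokens.slice();
    result[call] = m[1] + (Number(m[2]) + 1);             // (3) update its length
    result.splice(index + 1, 0, ...argTokens);            // (1) insert the argument
    return result;
}

const tokens = "x 210 gcd2 x 6 * 126 gcd3".split(" ");
const updated = addArgumentAfter(tokens, 2, ["y"]);
console.log(updated.join(" ")); // x 210 gcd2 y x 6 * 126 gcd4
```

The forward scan in `findConsumer` is the lookup that traditional RPN and the comma scheme avoid, since in those forms inserting the tokens is the whole edit.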

  4. Nested arrays on the stack (no demo possible). This solution involves storing the arguments in a list before the operator on the stack, which makes direct execution of the code very easy. However, this violates the entire precept and advantage of RPN, which is to have a flat list of items. Perhaps, if lists were only one level deep, there would not be too much of a problem; however, for my use case, I would end up with deeply nested lists. Thus, manipulation of the RPN and generation of the RPN becomes very difficult.

Extrapolation of the single question: Are there any other possible solutions to this problem? What is the standard (most-used) solution to this problem? Are there fundamental problems with my solutions (please provide counterexamples)? Did I overlook some pros/cons of my solutions? Could my solutions' algorithms be improved?


Solution

  • I think you have covered the options already: if you have to be able to pass a variable-length list of arguments, then either your language needs to have a native data structure allowing the whole list to be a single value on the stack (i.e. nested lists as in #4, or a simulacrum of them as in #2 where lists are represented as strings, comma-separated, and cannot contain other lists), or otherwise the list elements must be separate values on the stack. In that case the variable length must be determined either by a sentinel (as in #1) or a length field (as in #3). That seems exhaustive.

    As for advantages and disadvantages:

    Personally, I favour option #4 because a programming language is not very useful for general purposes if it doesn't have lists/arrays as first-class objects. I am not sure exactly what you mean by "this violates the entire precept and advantage of RPN [...] manipulation of the RPN and generation of the RPN becomes very difficult." It is quite possible to have a syntax for creating lists and nested lists in a concatenative language like RPN.

    Taking my own toy language fffff as an example, the code [1 2 3]. creates a sequence by opening a new stack with the [ operator, pushing the literal values 1, 2 and 3 to this new stack, and then closing the new stack with the ]. operator, which also pushes a reference to the new stack onto the previously-current stack. This obeys the concatenative property: if, for example, the function three_and_close is defined as doing 3 ]. then the code [1 2 three_and_close has the same behaviour as the original code, so factoring out parts of the code is still just as easy as in standard RPN.
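The stack-of-stacks idea can be sketched in a few lines of JavaScript. This is not fffff's actual implementation, only an illustration of the mechanism described above: the function `run`, the `words` table, and the requirement that tokens be separated by spaces are all assumptions made for the sketch.

```javascript
// Run space-separated tokens; `words` maps a user-defined name to its token list.
function run(tokens, words = new Map()) {
    let current = [];   // the stack values are currently pushed onto
    const parents = []; // stacks suspended by "["

    const exec = toks => {
        for (const tok of toks) {
            if (tok === "[") {
                parents.push(current);  // open a new stack
                current = [];
            } else if (tok === "].") {
                if (parents.length === 0) throw new Error("No open stack to close");
                const closed = current;
                current = parents.pop();
                current.push(closed);   // push a reference to the closed stack
            } else if (words.has(tok)) {
                exec(words.get(tok));   // a word behaves exactly like its body spliced in
            } else {
                const n = Number(tok);
                if (tok === "" || Number.isNaN(n)) throw new Error("Unknown token " + tok);
                current.push(n);
            }
        }
    };

    exec(tokens);
    if (parents.length !== 0) throw new Error("Unclosed [");
    return current;
}

const words = new Map([["three_and_close", ["3", "]."]]]);

const direct = run("[ 1 2 3 ].".split(" "));
const factored = run("[ 1 2 three_and_close".split(" "), words);

console.log(JSON.stringify(direct));   // [[1,2,3]]
console.log(JSON.stringify(factored)); // [[1,2,3]]
```

Since `[` and `].` are ordinary tokens with a stack effect, a list is built by the same left-to-right execution as everything else. That is why a suffix such as `3 ].` can be factored into a word without changing the result.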