I am currently facing an issue where I have an array of different strings, each representing some part of a logical and/or comparison operation, for example: ['Name', '=', 'John', 'and', 'LastName', '!=', 'Doe']
Given this, I would need to construct a tree structure that would look something like:
Very basic tree
With this, each represents a node, with the bottom ones being LeafNodes and the top being a CompoundNode (which if there is only one node, is a LeafNode itself). The issue I am facing is that a simple compound query is easy enough, but there can be infinite leaf nodes, with the following expression resulting in 4 leaf nodes under the 'and': Name = John and LastName != Doe and City != Amsterdam and BirthYear = 2000
The problem I am facing, is I am unsure how to algorithmically approach this. What makes something even more challenging, is if say given something like this: Price > 10 or (Category = 'Electronics' and Quantity < 100)
, it would create this structure:
Slightly more complex tree So the problem can get infinitely more difficult, and this is purely based on user input. The order of precedence and constructing the tree itself are something I can't figure out how to do recursively. I know I need to traverse the string, find leaf nodes and then attach to compound nodes (if there are any) but I am really battling with breaking down this issue into progressively more complex building blocks to solve this.
I have the following typescript classes to construct the nodes:
// Define the types of operators
type LogicalOperator = 'and' | 'or' | 'not';
type ComparisonOperator = 'eq' | 'ne' | 'gt' | 'ge' | 'lt' | 'le' | 'like' | 'ilike' | 'any';
// Interface for internal nodes (logical operators)
interface LogicalNode {
operator: LogicalOperator;
children: FilterNode[]; // Can be any number of children nodes
// The above would mean 0 as left, 1 as further right etc
}
// Interface for leaf nodes (comparison)
interface ComparisonNode {
value: string | number; // Value of the leaf node
field: string; // Field name for comparison nodes
operator: ComparisonOperator; // Operator for comparison nodes
}
// Union type representing all possible node types
type FilterNode = LogicalNode | ComparisonNode;
I have this working for very simple compound queries here (I cut off quite a bit of code to give the gist, and yes it is a quite ugly at the moment - I want to reapproach this differently):
let i = 1; // First token is always not a comparison or logical operator
while (i < tokens.length) {
if (COMPARISON_OPERATORS.indexOf(tokens[i]) !== -1) {
// Will always be left/right operands in this case
leafNodes.push(this.constructLeafNode(tokens[i - 1], tokens[i], tokens[i + 1]));
i = i + 2;
}
if (LOGICAL_OPERATORS.indexOf(tokens[i]) !== -1) {
// First leaf node should already be pushed so now we get the 2nd
leafNodes.push(
this.constructLeafNode(tokens[i + 1], tokens[i + 2], tokens[i + 3])
);
this.constructCompoundNode(leafNodes)
leafNodes = [] // reset the leaf nodes
i = i + 4;
}
}
My question is, and I wish I remembered more from my 2nd year data structures and algorithms course here, but what algorithm / structures would be best suited for this? This does not work when parenthesis is involved or when there is both and / or in the query
Here is an implementation with precedence from left to right in case two different logical operators are used one after the other. In other words, the logical operators all have the same precedence and are left-associative.
const LOGICAL_OPERATORS = ["and", "or", "not"] as const;
type LogicalOperator = (typeof LOGICAL_OPERATORS)[number];
const isLogicalOperator = (x: any): x is LogicalOperator => LOGICAL_OPERATORS.includes(x);
const COMPARISON_OPERATORS = ["=", "!=", ">", ">=", "<", "<="]
type ComparisonOperator = (typeof COMPARISON_OPERATORS)[number];
const isComparisonOperator = (x: any): x is ComparisonOperator => COMPARISON_OPERATORS.includes(x);
interface LogicalNode {
operator: LogicalOperator;
children: FilterNode[];
}
interface ComparisonNode {
value: string | number;
field: string;
operator: ComparisonOperator;
}
type FilterNode = LogicalNode | ComparisonNode;
function createCompoundNode(operator: string, ...operands: FilterNode[]): LogicalNode {
if (!isLogicalOperator(operator)) throw Error("Not a logical operator");
return {
operator,
children: operands
}
}
function createLeafNode(field: string, operator: string, value: string): ComparisonNode {
if (!isComparisonOperator(operator)) throw Error("Not a comparison operator");
return {
operator,
field,
value: isNaN(+value) ? value : +value
}
}
function tokenize(s: string): string[] {
return s.match(/[-+]?\d+(?:\.\d+)?|\w+|'[^']*'|[<>!=]+|\S/g) ?? [];
}
function parse(tokens: string[]): FilterNode {
let i = 0;
const getToken = (): string | undefined => tokens[i++];
const getTerm = (): FilterNode => {
const token = getToken()!;
if (token === "(") {
return getExpression(")");
}
if (token === "not") { // Unary operator
return createCompoundNode("not", getTerm());
}
return createLeafNode(token, getToken()!, getToken()!);
};
const getExpression = (terminal?: string): FilterNode => {
const term = getTerm()!;
let operator = getToken();
if (operator === terminal) {
return term;
}
let expr = createCompoundNode(operator!, term, getTerm());
for (operator = getToken(); operator !== terminal; operator = getToken()) {
if (operator !== expr.operator) {
// Apply left associativity
expr = createCompoundNode(operator!, expr, getTerm());
} else {
expr.children.push(getTerm());
}
}
return expr;
};
return getExpression(undefined);
}
// Demo
const s = "Price > 10 or X != -1.95 or (Category = 'Electronics' and Quantity < 100) and AAA <= 9";
const tokens = tokenize(s);
const tree = parse(tokens);
console.log(JSON.stringify(tree, null, 2));
The output of the example run is:
{
"operator": "and",
"children": [
{
"operator": "or",
"children": [
{
"field": "Price",
"operator": ">",
"value": "10"
},
{
"field": "X",
"operator": "!=",
"value": "-1.95"
},
{
"operator": "and",
"children": [
{
"field": "Category",
"operator": "=",
"value": "'Electronics'"
},
{
"field": "Quantity",
"operator": "<",
"value": "100"
}
]
}
]
},
{
"field": "AAA",
"operator": "<=",
"value": "9"
}
]
}