
Conflicts When Trying to Prevent SQL Clauses from Entering the Cypher Parser


I am working on a project to add support for Cypher queries in psql for Apache AGE. Currently, to run a Cypher query against a graph in Apache AGE, we have to embed it in an SQL query. For example:

SELECT * FROM cypher('graph_name', $$
MATCH (v)
RETURN v
$$) as (v agtype);

With the new support, we only need to type MATCH (v) RETURN v; to get the same result. To achieve this, we implemented a HandleCypherCmds function and call it from psql's mainloop.c, in the branch that handles the PSCAN_SEMICOLON scan result.

Here is the relevant code:

/*
 * Send command if semicolon found, or if end of line and we're in
 * single-line mode.
 */
if (scan_result == PSCAN_SEMICOLON ||
    (scan_result == PSCAN_EOL && pset.singleline))
{
    /*
     * Save line in history.  We use history_buf to accumulate
     * multi-line queries into a single history entry.  Note that
     * history accumulation works on input lines, so it doesn't
     * matter whether the query will be ignored due to \if.
     */
    if (pset.cur_cmd_interactive && !line_saved_in_history)
    {
        pg_append_history(line, history_buf);
        pg_send_history(history_buf);
        line_saved_in_history = true;
    }

    /* execute query unless we're in an inactive \if branch */
    if (conditional_active(cond_stack))
    {
        /*
         * Handle Cypher commands.  This is a temporary keyword check:
         * only queries that start with one of these words are treated
         * as Cypher.
         */
        if (pg_strncasecmp(query_buf->data, "MATCH", 5) == 0 ||
            pg_strncasecmp(query_buf->data, "OPTIONAL", 8) == 0 ||
            pg_strncasecmp(query_buf->data, "EXPLAIN", 7) == 0 ||
            pg_strncasecmp(query_buf->data, "CREATE", 6) == 0)
        {
            cypherCmdStatus = HandleCypherCmds(scan_state,
                                               cond_stack,
                                               query_buf,
                                               previous_buf);

            success = cypherCmdStatus != PSQL_CMD_ERROR;

            /* convert the Cypher query into an SQL command and send it */
            if (cypherCmdStatus == PSQL_CMD_SEND)
                success = SendQuery(convert_to_psql_command(query_buf->data));
        }
        else
            success = SendQuery(query_buf->data);

        slashCmdStatus = success ? PSQL_CMD_SEND : PSQL_CMD_ERROR;
        pset.stmt_lineno = 1;

        /* transfer query to previous_buf by pointer-swapping */
        {
            PQExpBuffer swap_buf = previous_buf;

            previous_buf = query_buf;
            query_buf = swap_buf;
        }
        resetPQExpBuffer(query_buf);

        added_nl_pos = -1;
        /* we need not do psql_scan_reset() here */
    }
    else
    {
        /* if interactive, warn about non-executed query */
        if (pset.cur_cmd_interactive)
            pg_log_error("query ignored; use \\endif or Ctrl-C to exit current \\if block");
        /* fake an OK result for purposes of loop checks */
        success = true;
        slashCmdStatus = PSQL_CMD_SEND;
        pset.stmt_lineno = 1;
        /* note that query_buf doesn't change state */
    }
}
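The code above calls convert_to_psql_command, whose body is not shown here. As a rough illustration only, a helper of that kind might wrap the bare Cypher text in the cypher() call from the first example. The sketch below assumes the caller supplies the graph name and fixes the output columns to (v agtype); wrap_cypher_query is a hypothetical name, not the AgeSQL implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch of a convert_to_psql_command-style helper: wrap a bare
 * Cypher query in the cypher() call from the first example.
 *
 * Assumptions: the graph name is passed in explicitly (a real version would
 * take it from psql state), the output columns are fixed to "v agtype" (a
 * real version would derive them from the RETURN clause), graph_name is a
 * trusted identifier (it is not quoted or escaped), and the Cypher text does
 * not itself contain "$$".
 *
 * Returns a malloc'd string that the caller must free, or NULL on failure.
 */
char *
wrap_cypher_query(const char *graph_name, const char *cypher)
{
    static const char fmt[] =
        "SELECT * FROM cypher('%s', $$ %.*s $$) as (v agtype);";
    size_t len;
    int needed;
    char *sql;

    assert(graph_name != NULL && cypher != NULL);

    /* Drop the ';' that ended the psql command, plus trailing whitespace. */
    len = strlen(cypher);
    while (len > 0 && (cypher[len - 1] == ';' ||
                       isspace((unsigned char) cypher[len - 1])))
        len--;

    /* First pass measures the result, second pass writes it. */
    needed = snprintf(NULL, 0, fmt, graph_name, (int) len, cypher);
    if (needed < 0)
        return NULL;

    sql = malloc((size_t) needed + 1);
    if (sql == NULL)
        return NULL;

    snprintf(sql, (size_t) needed + 1, fmt, graph_name, (int) len, cypher);
    return sql;
}
```

For example, wrap_cypher_query("graph_name", "MATCH (v) RETURN v;") returns the first example's SQL on a single line, and the caller frees it. If convert_to_psql_command allocates its result the same way, the SendQuery(convert_to_psql_command(...)) call would need to keep the pointer and free it afterwards.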

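Currently, the code uses a temporary keyword check to keep SQL statements out of the Cypher parser, because passing an SQL statement to the Cypher parser produces syntax errors. This check is not practical to maintain: it only works if the user writes a well-formed Cypher clause, and any statement that begins with one of the listed keywords, including SQL statements such as CREATE TABLE or EXPLAIN SELECT, is still sent to the Cypher parser. I also tried relying on variables set by the parser, but the query has to go through the Cypher parser before those variables are set, which produces the same syntax errors.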

I have been unable to find a solution to this problem. Could someone please assist me in implementing this feature?


Solution

  • This problem is now solved. In case anyone is interested, this is how we resolved it:

    All SQL and Cypher clauses now enter the Cypher parser. In the parser, we have boolean variables that distinguish SQL statements from Cypher clauses. The following example shows the rules for the DROP clause:

    drop_clause:
        DROP GRAPH if_exists_opt IDENTIFIER cascade_opt { graph_name = $4; drop_graph = true; }
        | DROP VLABEL IDENTIFIER cascade_opt { label_name = $3; drop_label = true; }
        | DROP ELABEL IDENTIFIER cascade_opt { label_name = $3; drop_label = true; }
        | DROP CONSTRAINT identifier_opt ON IDENTIFIER assert_patt_opt
        ;
    

    The variables (e.g. drop_graph) are set to true only when the query matches a Cypher rule, such as DROP GRAPH graph_name. If the query is an SQL statement, such as DROP TABLE table_name, the variables remain false, the parser returns false, and the statement continues to the SQL parser. The complete code is in the AgeSQL repository, in mainloop.c and cypher.y.
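A much-simplified model of this flag-based routing is sketched below. It is not cypher.y: a hand-written sscanf() recognizer stands in for the Bison grammar and covers only DROP GRAPH, DROP VLABEL and DROP ELABEL without IF EXISTS or CASCADE. CypherFlags, Route and route_query are hypothetical names. The point is that the routing decision comes from flags set by matched rules rather than from the first keyword.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for the variables set by grammar actions in cypher.y. */
typedef struct
{
    bool drop_graph;
    bool drop_label;
    char name[64];              /* graph_name or label_name */
} CypherFlags;

typedef enum { ROUTE_SQL, ROUTE_CYPHER } Route;

/*
 * Mimics only "DROP GRAPH|VLABEL|ELABEL <identifier>".  A flag is set only
 * when one of those rules matches; otherwise the query is left for SQL.
 */
Route
route_query(const char *query, CypherFlags *flags)
{
    char kw1[16], kw2[16], ident[64];

    assert(query != NULL && flags != NULL);
    memset(flags, 0, sizeof(*flags));

    if (sscanf(query, "%15s %15s %63[A-Za-z0-9_]", kw1, kw2, ident) == 3 &&
        strcasecmp(kw1, "DROP") == 0)
    {
        if (strcasecmp(kw2, "GRAPH") == 0)
            flags->drop_graph = true;
        else if (strcasecmp(kw2, "VLABEL") == 0 ||
                 strcasecmp(kw2, "ELABEL") == 0)
            flags->drop_label = true;

        if (flags->drop_graph || flags->drop_label)
            snprintf(flags->name, sizeof(flags->name), "%s", ident);
    }

    /* No matched Cypher rule means the flags stay false: send it as SQL. */
    return (flags->drop_graph || flags->drop_label) ? ROUTE_CYPHER : ROUTE_SQL;
}
```

Here route_query("DROP GRAPH my_graph;", &flags) returns ROUTE_CYPHER with drop_graph set and name "my_graph", while route_query("DROP TABLE table_name;", &flags) leaves both flags false and returns ROUTE_SQL, so the statement can be sent to the server unchanged.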