The Cypher query language was made popular by Neo4j and is being standardized as openCypher. The openCypher page mentions SQL/GQL, but the GQL page was last updated in 2019. Meanwhile, the SQL:2023 standard includes a section on property graph queries called SQL/PGQ. Unfortunately SQL:2023 is not public, so I can only guess about the details. Then there is PGQL, driven by Oracle. PGQL seems to be the same as SQL/PGQ and its syntax looks like a subset of Cypher, but I would not be surprised to find expressions in each of these languages that cannot be used in the others.
I suppose very simple query expressions such as `MATCH (a:b {c:42})` can safely be used across all of these languages, but what about more complex properties, quoted strings, list values, return types, limits, etc.? Is there a safe and formally defined subset of Cypher queries with the same semantics across these query languages?
I am a developer at Oracle and have been leading the PGQL effort and am lately focusing on the SQL/PGQ implementation in Oracle Database.
The example you gave (`MATCH (a:b {c:42})`) is valid Cypher but is not valid SQL or PGQL.
SQL has standardized this as `MATCH (a IS b WHERE a.c = 42)`.
Here, `a` is a vertex variable declaration, `b` is a label expression introduced by the keyword `IS`, and `a.c = 42` is a search condition that references property `c` of vertex `a`.
> Is there a safe and formally defined subset of Cypher queries with same semantics across these query languages?
There is not a single (full) query that is valid in all these languages, if only because SQL queries use `SELECT` (or `COLUMNS`) where Cypher uses `RETURN` and `WITH`.
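As a rough sketch of that difference, here is the example from the question written as a complete SQL/PGQ query, with the Cypher form in comments for comparison. The graph name `my_graph` is a hypothetical placeholder, and the query assumes the graph has vertices labelled `b` with a property `c`.

```sql
-- Cypher (for comparison only; not valid SQL):
--   MATCH (a:b {c: 42}) RETURN a.c

-- SQL/PGQ: the pattern sits inside GRAPH_TABLE, COLUMNS projects
-- values out of each match, and SELECT consumes them like any table.
-- my_graph is a hypothetical graph name.
SELECT c
FROM GRAPH_TABLE ( my_graph
  MATCH (a IS b WHERE a.c = 42)
  COLUMNS (a.c AS c)
);
```

The pattern itself differs only in how the label and the property filter are spelled. The projection is where the two languages diverge structurally.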
But when focusing only on graph pattern matching, there is significant overlap between all these languages, and the differences are mostly minor and syntactic, so it is simple to migrate between them. BTW, the SQL/PGQ specification is technically public, but accessing it does require a fee. The same will be the case for GQL once it is published here.
But different vendors will publish the documentation of their own implementations. Here are the best references for Oracle Database:
Outside of graph pattern matching there is not a lot of overlap between the languages.
In SQL, since graphs are view-like objects on top of tables, users perform `INSERT`/`UPDATE`/`DELETE` operations against the underlying tables of the graph. For example, you can query a graph and insert into its underlying tables like this: `INSERT INTO ... FROM GRAPH_TABLE ( my_graph MATCH ... )`.
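A minimal sketch of what such a statement might look like when written out in full. All names here are assumptions for illustration: `my_graph` is a placeholder graph, `person_names` is a hypothetical target table, and `id` and `name` are hypothetical properties of `person` vertices.

```sql
-- Hypothetical target table: person_names(person_id, name).
-- GRAPH_TABLE yields ordinary rows, so it can feed INSERT ... SELECT
-- like any other table expression.
INSERT INTO person_names (person_id, name)
SELECT person_id, name
FROM GRAPH_TABLE ( my_graph
  MATCH (v IS person)
  COLUMNS (v.id AS person_id, v.name AS name)
);
```

Because the graph is only a view over tables, the inserted rows appear in the graph when the target is one of its underlying tables.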
In SQL, properties are statically typed, but there is JSON (and XML) to handle semi-structured data, and it can easily be combined with property graphs. For example, JSON dot-notation access inside a property graph query looks like this: `MATCH (v IS person) WHERE v.address.street_name = 'Monroe Avenue'`.
SQL includes many important database functionalities that are not specific to graphs but are nevertheless useful, if not essential, to users of graphs: privileges, constraints, triggers, views, a rich set of data types with accompanying expressions and predicates, etc. From our project experience, much of the data preparation required to create a good graph model is very efficiently done in SQL. A fair amount of this functionality is lacking from languages that haven't been around as long as SQL.
For PGQL, many of the language constructs from SQL/PGQ have already been added to Oracle’s implementation, while the original specification still works to ensure backwards compatibility. The plan is to allow seamless migration between the two platforms.