Tags: neo4j, gremlin, tinkerpop, janusgraph, graphml

Gremlin: Read edge GraphML file and node GraphML file in separate queries


I have two files, nodes.xml and edges.xml, that I want to load with g.io(<file name>).read().iterate().

The nodes.xml file contains the nodes of the graph I want to load. Its contents are:

<?xml version='1.0' encoding='utf-8'?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
    <key id="labelV" for="node" attr.name="labelV" attr.type="string" />
    <key id="name" for="node" attr.name="name" attr.type="string" />
    <key id="age" for="node" attr.name="age" attr.type="int" />
    <graph id="G" edgedefault="directed">
        <node id="1">
            <data key="labelV">person</data>
            <data key="name">marko</data>
            <data key="age">29</data>
        </node>
        <node id="2">
            <data key="labelV">person</data>
            <data key="name">vadas</data>
            <data key="age">27</data>
        </node>
    </graph>
</graphml>

The edges.xml file contains the edges of the graph I want to load. Its contents are:

<?xml version='1.0' encoding='utf-8'?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
    <key id="labelE" for="edge" attr.name="labelE" attr.type="string" />
    <key id="weight" for="edge" attr.name="weight" attr.type="double" />
    <graph id="G" edgedefault="directed">
        <edge id="7" source="1" target="2">
            <data key="labelE">knows</data>
            <data key="weight">0.5</data>
        </edge>
    </graph>
</graphml>

I want to load the nodes first by running g.io('nodes.xml').read().iterate() and then the edges by running g.io('edges.xml').read().iterate(). However, when I load edges.xml, it creates new nodes instead of adding the edges to the nodes created earlier.

Is it possible to load the nodes first and then the edges in separate queries with a similarly simple command in Gremlin? I know this can be done with more complex queries that read edges.xml and create the edges one by one, but I'm wondering if there is something easier. Also, I don't want to load a single file containing all the nodes and edges.


Solution

  • I'm afraid that the GraphMLReader doesn't work that way. It's not designed to read into an existing graph. I honestly can't remember if this was done purposefully or not.

    The code isn't too complicated, though, so you could probably modify it to work the way you want. In the GraphMLReader source you can see where the code checks the vertex cache for the id. That cache is empty on your second execution because it is only filled by new vertex additions: it doesn't remember any vertices from your first run, and it doesn't read from the graph directly on your second run. Simply change that logic to better suit your needs.
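
If modifying the reader isn't an option, the edge-by-edge route mentioned in the question can be kept fairly small. The following is only a rough sketch, not a patched GraphMLReader: it parses the edges file on the client with Python's standard-library xml.etree.ElementTree and attaches each edge to vertices that already exist. The helper names parse_edges and add_edges are hypothetical, g is assumed to be a gremlin-python traversal source, and the sketch assumes the graph kept the GraphML node ids as vertex ids. Some graph providers assign their own ids on load, in which case you would have to look the vertices up by a property instead.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def parse_edges(graphml_text, label_key="labelE"):
    """Return one dict per <edge> element: id, source, target, label, properties."""
    root = ET.fromstring(graphml_text)
    # Map each <key id="..."> to its declared attr.type so values can be converted.
    types = {k.get("id"): k.get("attr.type") for k in root.iter(GRAPHML_NS + "key")}
    casts = {
        "int": int,
        "long": int,
        "float": float,
        "double": float,
        "boolean": lambda s: s.strip().lower() == "true",
    }
    edges = []
    for e in root.iter(GRAPHML_NS + "edge"):
        label, props = "edge", {}  # "edge" is the fallback label used by this sketch
        for d in e.findall(GRAPHML_NS + "data"):
            key, text = d.get("key"), d.text or ""
            if key == label_key:
                label = text
            else:
                # Unknown or string types are kept as plain strings.
                props[key] = casts.get(types.get(key), str)(text)
        edges.append({
            "id": e.get("id"),
            "source": e.get("source"),
            "target": e.get("target"),
            "label": label,
            "properties": props,
        })
    return edges


def add_edges(g, edges, to_id=str):
    """Attach parsed edges to vertices that already exist in the graph.

    ASSUMPTION: vertex ids in the graph match the GraphML node ids, and to_id
    converts the id string into whatever type the graph stores (e.g. int).
    """
    for e in edges:
        t = (g.V(to_id(e["source"])).as_("src")
              .V(to_id(e["target"]))
              .addE(e["label"]).from_("src"))
        for key, value in e["properties"].items():
            t = t.property(key, value)
        t.iterate()
```

With a connected traversal source, usage would look something like add_edges(g, parse_edges(open('edges.xml').read())). Note that if either endpoint is missing, the traversal finds nothing and no edge is created, so it is worth checking the edge count afterwards.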