Tags: graph, arangodb, pyarango

Best way to import bulk data into ArangoDB


I'm currently working on an ArangoDB proof of concept (POC). Document creation with PyArango is very slow: it takes about 5 minutes to insert 300 documents. I've pasted the rough code below; please let me know if there is a better way to speed this up:

with open('abc.csv') as fp:
    for line in fp:
        # Drop the trailing newline so it does not end up in the last field
        dataList = line.rstrip("\n").split(",")

        aaa = dbObj['aaa'].createDocument()
        bbb = dbObj['bbb'].createDocument()
        ccc = dbObj['ccc'].createEdge()

        bbb['bbb'] = dataList[1]
        aaa['aaa'] = dataList[0]
        aaa._key = dataList[0]

        # Each save() sends its own request to the server
        aaa.save()
        bbb.save()

        ccc.links(aaa, bbb)
        ccc['related_to'] = "gfdgf"
        ccc['weight'] = 0

        ccc.save()

The collections are created with the code below:

 dbObj.createCollection(className='aaa', waitForSync=False)

Solution

  • Regarding your problem with batch mode in the ArangoDB Java driver: if you know the key attributes of the vertices, you can build each document handle yourself as "collectionName" + "/" + "documentKey", so the edge can be created without waiting for the vertex results. Example:

    // Queue all following requests instead of sending them one by one
    arangoDriver.startBatchMode();

    for (String line : lines)
    {
      String[] data = line.split(",");

      BaseDocument device = new BaseDocument();
      BaseDocument phyAddress = new BaseDocument();

      // Handle = collection name + "/" + document key
      String keyDevice = data[0];
      String handleDevice = "DeviceId/" + keyDevice;

      device.setDocumentKey(keyDevice);
      device.addAttribute("device_id", data[0]);

      String keyPhyAddress = data[1];
      String handlePhyAddress = "PhysicalLocation/" + keyPhyAddress;

      phyAddress.setDocumentKey(keyPhyAddress);
      phyAddress.addAttribute("address", data[1]);

      final DocumentEntity<BaseDocument> from = arangoDriver.graphCreateVertex("testGraph", "DeviceId", device, null);
      final DocumentEntity<BaseDocument> to = arangoDriver.graphCreateVertex("testGraph", "PhysicalLocation", phyAddress, null);

      // The edge refers to the vertices by the handles built above,
      // not by the (unused) results of the vertex calls
      arangoDriver.graphCreateEdge("testGraph", "DeviceId_PhysicalLocation", null, handleDevice, handlePhyAddress, null, null);
    }

    // Send all queued requests to the server
    arangoDriver.executeBatch();
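The same handle-building idea can be sketched in Python for the collections from the question. This is only a rough illustration using the standard library: the function name `build_bulk_payload` and the sample rows are hypothetical, and it assumes both CSV columns are valid ArangoDB document keys (the question only sets `_key` on `aaa`). It prepares the vertex and edge documents in memory, with `_from` and `_to` built as "collectionName/documentKey", so that they could then be sent to the server in a few bulk requests rather than one request per document. How they are sent is left out here.

```python
import csv
import io


def build_bulk_payload(rows, from_collection="aaa", to_collection="bbb"):
    """Turn (from_key, to_key) rows into vertex and edge documents.

    Hypothetical helper: assumes both columns are usable as document keys.
    """
    from_docs, to_docs, edges = {}, {}, []
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        from_key, to_key = row[0].strip(), row[1].strip()

        # Keyed by _key so a vertex that appears on several rows is kept once
        from_docs[from_key] = {"_key": from_key, "aaa": from_key}
        to_docs[to_key] = {"_key": to_key, "bbb": to_key}

        # Document handle = collection name + "/" + document key
        edges.append({
            "_from": f"{from_collection}/{from_key}",
            "_to": f"{to_collection}/{to_key}",
            "related_to": "gfdgf",
            "weight": 0,
        })
    return list(from_docs.values()), list(to_docs.values()), edges


# Made-up sample data standing in for abc.csv
sample = io.StringIO("dev1,addr1\ndev2,addr1\n")
aaa_docs, bbb_docs, ccc_edges = build_bulk_payload(csv.reader(sample))
```

Because the handles are computed from the keys, the edges do not depend on any value returned by the server when the vertices are created, which is what makes batching possible in the first place.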