I am trying to import many thousands of records into ArangoDB. I am attempting to use the batch request feature described at https://docs.arangodb.com/3.11/develop/http/batch-requests/ to send a combination of PUT and POST requests that either insert new records or update existing ones. My end solution needs to run from a Python script, presumably using pyArango. I have created a sample HTTP request that looks something like the following:
POST http://<arango_server>:8529/_db/myDB/_api/batch
Content-Type: multipart/form-data; boundary=P1X7QNCB
Content-Length: <calculated by python or REST Client>
Authorization: Basic <calculated by python requests session or REST Client>
--P1X7QNCB
Content-type: application/x-arango-batchpart
Content-Id: 1
POST /_api/document/model/foo HTTP/1.1
{"data": "bar"}
--P1X7QNCB
I have not been able to get ArangoDB to process this successfully. I have tried using Python similar to the following (it generates the above request, even if my approximation of the code below has typos):
url = "/_api/document/" + collection + "/" + nodeKey + " HTTP/1.1"
postString = ("--P1X7QNCB\r\n"
"Content-type: application/x-arango-batchpart\r\n"
"Content-Id: " + str(counter) + "\r\n"
"\r\n"
"\r\n"
"PUT " + url+ "\r\n\r\n\r\n" + json.dumps(nodeData) + "\r\n")
batchHeaders = {"Content-Type": "multipart/form-data; boundary=P1X7QNCB"}
response = self.db.connection.session.post(self.db.URL + "/batch", data=postString, headers=batchHeaders)
and using a REST client where I manually post the content. In both cases I get the following response back:
{"error":true,"errorMessage":"invalid multipart message received","code":400,"errorNum":400}
And the following is logged in the arango log file:
WARNING received a corrupted multipart message
Is it obvious to anyone what I am doing wrong, or where I can look for more details on why ArangoDB is rejecting the requests?
Thanks!
ArangoDB returns this error when it tries to extract the next part of a multipart MIME container and fails.
You should inspect your boundary strings and check that the last one properly terminates the container with two trailing dashes (--). For the request above, the final line should be --P1X7QNCB-- rather than --P1X7QNCB.
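A minimal sketch of assembling such a body in Python, assuming the part layout shown in the ArangoDB batch documentation (part headers, a blank line, the embedded request line, a blank line, then the JSON body). The function name build_batch_body and the sample operations are hypothetical. Only the standard library is used, and sending the result is left to the existing session.post call from the question.

```python
import json

CRLF = "\r\n"


def build_batch_body(operations, boundary="P1X7QNCB"):
    """Build a multipart batch body from (method, path, document) tuples.

    Hypothetical helper: the layout is an assumption based on the
    example in the ArangoDB batch request documentation.
    """
    parts = []
    for content_id, (method, path, document) in enumerate(operations, start=1):
        parts.append(
            "--" + boundary + CRLF
            + "Content-Type: application/x-arango-batchpart" + CRLF
            + "Content-Id: " + str(content_id) + CRLF
            + CRLF  # one blank line ends the part headers
            + method + " " + path + " HTTP/1.1" + CRLF
            + CRLF  # one blank line ends the embedded request headers
            + json.dumps(document) + CRLF
        )
    # The closing delimiter carries two extra trailing dashes.
    return "".join(parts) + "--" + boundary + "--" + CRLF


body = build_batch_body([
    ("POST", "/_api/document/model", {"data": "bar"}),
    ("PUT", "/_api/document/model/foo", {"data": "baz"}),
])
print(body)
```

The returned string can be passed as data= together with the same Content-Type header (multipart/form-data; boundary=P1X7QNCB) that the question already sets.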
ngrep or Wireshark tend to be very useful for inspecting what programs really send (it may not be what you think), or for capturing samples of how other programs do it.
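Before reaching for a packet capture, a quick local check of the string the script builds can catch the boundary problems described above. The following is only a sketch: check_multipart is a hypothetical helper, it looks at the opening delimiter, the closing delimiter, and line endings, and it is not a full multipart parser. The sample body approximates what the Python code in the question produces for a single record.

```python
def check_multipart(body, boundary):
    """Return a list of boundary-related problems found in a multipart body."""
    problems = []
    delimiter = "--" + boundary
    lines = body.split("\r\n")
    if lines[0] != delimiter:
        problems.append("body does not start with the boundary delimiter")
    # Ignore trailing line breaks, then look at the last non-empty line.
    last_line = body.rstrip("\r\n").split("\r\n")[-1]
    if last_line != delimiter + "--":
        problems.append("closing delimiter '--%s--' is missing" % boundary)
    # Any LF left after removing CRLF pairs is a bare line feed.
    if "\n" in body.replace("\r\n", ""):
        problems.append("bare LF line endings found; expected CRLF")
    return problems


# Approximation of the body generated by the code in the question.
question_body = (
    "--P1X7QNCB\r\n"
    "Content-type: application/x-arango-batchpart\r\n"
    "Content-Id: 1\r\n"
    "\r\n"
    "\r\n"
    "PUT /_api/document/model/foo HTTP/1.1\r\n\r\n\r\n"
    '{"data": "bar"}\r\n'
)

print(check_multipart(question_body, "P1X7QNCB"))
print(check_multipart(question_body + "--P1X7QNCB--\r\n", "P1X7QNCB"))
```

An empty list only means these three checks passed; a capture of the real traffic remains the authoritative view of what reached the server.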