
MongoDB 4.4, Java driver 4.2.3 - InsertManyResult.getInsertedIds() not returning IDs for all inserted documents


I am trying to retrieve the _id values of inserted documents after a successful insertMany operation. To achieve this I am using InsertManyResult.getInsertedIds(). While this approach works most of the time, there are cases where not all _id values are retrieved.

I am not sure if I am doing something wrong, but I would assume that InsertManyResult.getInsertedIds() returns the _id of every document inserted.

Problem details

I am inserting 1000 documents into MongoDB in two batches of 500 documents. Each document is approximately 1 MB in size.

After a batch is inserted using insertMany, I attempt to read the _id values via InsertManyResult.getInsertedIds() and save them to a collection for later use.

I would assume that after inserting 500 documents via InsertMany the InsertManyResult.getInsertedIds() would return 500 _id values. It is however returning only 16 _id values out of 500.

When I check the Mongo collection directly via Mongo Shell I see that all records were successfully inserted. There are 1000 documents in my test collection. I am just unable to get the _id of all the inserted documents via InsertManyResult.getInsertedIds(). I only get 32 _id values for the 1000 documents inserted.

JSON structure

To replicate the issue I use exactly one JSON document, approximately 1 MB in size, which looks like this.

{
  "textVal" : "RmKHtEMMzJDXgEApmWeoZGRdZJZerIj1",
  "intVal" : 161390623,
  "longVal" : "98213019054010317",
  "timestampVal" : "2020-12-31 23:59:59.999",
  "numericVal" : -401277306,
  "largeArrayVal" : [ "MMzJDXg", "ApmWeoZGRdZJZerI", "1LhTxQ", "adprPSb1ZT", ..., "QNLkBZuXenmYE77"]
}

Note that the largeArrayVal key holds almost all of the data. I have omitted most of its values for readability.

Sample code

The code below parses the JSON shown above into a Document, which is then inserted into MongoDB via insertMany. After that is done I try to get the inserted _id values using InsertManyResult.getInsertedIds().

private static final int MAX_DOCUMENTS = 1000;
private static final int BULK_SIZE = 500;

private static List<ObjectId> insertBatchReturnIds(List<Document> insertBatch)
{
  List<ObjectId> insertedIds = new ArrayList<ObjectId>();
  InsertManyResult insertManyResult;

  insertManyResult = mongoClient.getDatabase(MONGO_DATABASE).getCollection(MONGO_COLLECTION).insertMany(insertBatch);
  insertManyResult.getInsertedIds().forEach((k,v) -> insertedIds.add(v.asObjectId().getValue()));

  System.out.println("Batch inseted:");
  System.out.println(" - Was acknowladged: " + Boolean.toString(insertManyResult.wasAcknowledged()).toUpperCase());
  System.out.println(" - InsertManyResult.getInsertedIds().size(): " + insertManyResult.getInsertedIds().size());

  return insertedIds;
}

private static void insertDocuments()
{
  int documentsInserted = 0;
  List<Document> insertBatch = new ArrayList<Document>();
  List<ObjectId> insertedIds = new ArrayList<ObjectId>();
  final String largeJson = loadLargeJsonFromFile("d:\\test-sample.json");

  System.out.println("Starting INSERT test...");
  while (documentsInserted < MAX_DOCUMENTS)
  {
    insertBatch.add(Document.parse(largeJson));
    documentsInserted++;

    if (documentsInserted % BULK_SIZE == 0)
    {
     insertedIds.addAll(insertBatchReturnIds(insertBatch));
     insertBatch.clear();
    }
  }
  if (insertBatch.size() > 0)
    insertedIds.addAll(insertBatchReturnIds(insertBatch));
  System.out.println("INSERT test finished");

  System.out.println(String.format("Expected IDs retrieved: %d. Actual IDs retrieved: %d.", MAX_DOCUMENTS, insertedIds.size()));
  if (insertedIds.size() != MAX_DOCUMENTS)
    throw new IllegalStateException("Not all _ID were returned for each document in batch");
}

Sample output

Starting INSERT test...
Batch inseted:
 - Was acknowladged: TRUE
 - InsertManyResult.getInsertedIds().size(): 16
Batch inseted:
 - Was acknowladged: TRUE
 - InsertManyResult.getInsertedIds().size(): 16
INSERT test finished
Expected IDs retrieved: 1000. Actual IDs retrieved: 32.
Exception in thread "main" java.lang.IllegalStateException: Not all _ID were returned for each document in batch
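
To see which documents of a batch have no reported _id, the returned map can be compared against the batch positions. The driver documents getInsertedIds() as a map from the index of the inserted document to its _id. The helper below is a minimal sketch built on that; missingIndexes and the stand-in data in main are hypothetical names for illustration, not part of the driver, and the map is typed loosely so the snippet runs without the driver on the classpath.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InsertedIdCheck {

    // Hypothetical helper (not a driver API): returns the batch positions
    // for which the id map has no entry.
    static List<Integer> missingIndexes(Map<Integer, ?> insertedIds, int batchSize) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            if (!insertedIds.containsKey(i)) {
                missing.add(i);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-in data: ids reported only for positions 2 and 3 of a 4-document batch.
        Map<Integer, String> reported = new HashMap<>();
        reported.put(2, "id-2");
        reported.put(3, "id-3");
        System.out.println(missingIndexes(reported, 4)); // prints [0, 1]
    }
}
```

In the code above this would be called as missingIndexes(insertManyResult.getInsertedIds(), insertBatch.size()), since Map<Integer, BsonValue> matches the Map<Integer, ?> parameter.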

My questions

  1. Is InsertManyResult.getInsertedIds() meant to return _id for all documents inserted?
  2. Is the way I am using InsertManyResult.getInsertedIds() correct?
  3. Could the size of the inserted JSON be a factor here?
  4. How should I use InsertManyResult to get _id for inserted documents?

Note

I am aware that I can either read _id from each Document, since it is the driver that generates it, or select _id after the documents were inserted.
I would like to know how this can be achieved using InsertManyResult.getInsertedIds(), as it seems to be made for this purpose.


Solution

  • This is a bug in the Java driver, and it's being tracked in https://jira.mongodb.org/browse/JAVA-4436 (reported on January 5, 2022).
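
While the bug is present in the driver version in use, one workaround along the lines of the question's note is to assign _id on the client before calling insertMany and collect the values from the documents themselves. The driver keeps an _id that is already set on a document. The sketch below is an assumption-laden illustration, not the driver's API: assignAndCollectIds is a hypothetical helper written against Map<String, Object>, which org.bson.Document implements, and main uses plain maps and UUIDs as stand-ins so the snippet runs without the driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

final class ClientSideIds {

    // Hypothetical helper (not a driver API): gives every document that lacks
    // an _id a freshly generated one and returns all _id values in batch order.
    static List<Object> assignAndCollectIds(List<? extends Map<String, Object>> batch,
                                            Supplier<?> idFactory) {
        List<Object> ids = new ArrayList<>(batch.size());
        for (Map<String, Object> doc : batch) {
            if (!doc.containsKey("_id")) {
                doc.put("_id", idFactory.get());
            }
            ids.add(doc.get("_id"));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Stand-in batch: plain maps instead of org.bson.Document,
        // UUIDs instead of ObjectId.
        List<Map<String, Object>> batch = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("intVal", i);
            batch.add(doc);
        }
        List<Object> ids = assignAndCollectIds(batch, UUID::randomUUID);
        System.out.println("IDs collected: " + ids.size()); // prints "IDs collected: 500"
    }
}
```

With the driver, the call would be assignAndCollectIds(insertBatch, ObjectId::new), placed immediately before insertMany(insertBatch). The returned list then does not depend on InsertManyResult.getInsertedIds() at all.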