I am trying to retrieve the values of _id for inserted documents after a successful InsertMany operation. To achieve this, I am using InsertManyResult.getInsertedIds(). While this approach works most of the time, there are cases where not all _id values are retrieved.
I am not sure whether I am doing something wrong, but I would assume that InsertManyResult.getInsertedIds() returns the _id of every inserted document.
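In the 4.x Java driver, getInsertedIds() returns a Map<Integer, BsonValue> that maps each document's index in the list passed to insertMany to its _id. The assumption above can therefore be written down as a check: a batch of n documents should yield the keys 0 through n-1. The following is a driver-free sketch; a plain Map stands in for the driver's result, the class and method names are hypothetical, and the choice of positions 0-15 for the 16 reported ids is arbitrary (the question does not say which 16 were returned).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InsertedIdsCheck {
    // Returns every batch position in [0, batchSize) that has no entry in insertedIds.
    // insertedIds mirrors the shape of InsertManyResult.getInsertedIds(): index -> id.
    static List<Integer> missingIndexes(Map<Integer, ?> insertedIds, int batchSize) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            if (!insertedIds.containsKey(i)) {
                missing.add(i);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical partial result: only 16 of 500 positions reported.
        Map<Integer, String> partial = new HashMap<>();
        for (int i = 0; i < 16; i++) {
            partial.put(i, "id-" + i);
        }
        List<Integer> missing = missingIndexes(partial, 500);
        System.out.println("Missing ids: " + missing.size()); // Missing ids: 484
    }
}
```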
I am inserting 1000 documents into MongoDB in two batches of 500 documents. Each document is approximately 1 MB in size.
After each batch is inserted using InsertMany, I read the _id values via InsertManyResult.getInsertedIds() and save them to a collection for later use.
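The batching described above (1000 documents, flushed to InsertMany every 500) can also be expressed by slicing a list into fixed-size sublists. A minimal sketch; the helper name partition is hypothetical, and integers stand in for the parsed Document instances.

```java
import java.util.ArrayList;
import java.util.List;

public class Batching {
    // Splits items into consecutive batches of at most batchSize elements.
    // Each batch is a copy, so later changes to items do not affect it.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add(i);
        }
        List<List<Integer>> batches = partition(docs, 500);
        System.out.println(batches.size() + " batches of " + batches.get(0).size()); // 2 batches of 500
    }
}
```

One design note: building every document up front and then partitioning keeps all of them in memory at once (on the order of 1 GB for 1000 documents of ~1 MB each), whereas filling a buffer and clearing it after each flush only holds one batch at a time.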
I would expect InsertManyResult.getInsertedIds() to return 500 _id values after inserting 500 documents via InsertMany. However, it returns only 16 of the 500.
When I check the collection directly in the Mongo Shell, I see that all records were inserted successfully: there are 1000 documents in my test collection. I am just unable to get the _id of every inserted document via InsertManyResult.getInsertedIds(); I only get 32 _id values for 1000 inserted documents.
To reproduce the issue, I use a single JSON document of approximately 1 MB that looks like this:
```
{
  "textVal" : "RmKHtEMMzJDXgEApmWeoZGRdZJZerIj1",
  "intVal" : 161390623,
  "longVal" : "98213019054010317",
  "timestampVal" : "2020-12-31 23:59:59.999",
  "numericVal" : -401277306,
  "largeArrayVal" : [ "MMzJDXg", "ApmWeoZGRdZJZerI", "1LhTxQ", "adprPSb1ZT", ..., "QNLkBZuXenmYE77"]
}
```
Note that the key largeArrayVal holds almost all of the data; I have omitted most of its values for readability.
The code below parses the JSON shown above into a Document, which is then inserted into MongoDB via InsertMany. After that, I try to get the inserted _id values using InsertManyResult.getInsertedIds().
```java
import com.mongodb.client.result.InsertManyResult;
import java.util.ArrayList;
import java.util.List;
import org.bson.Document;
import org.bson.types.ObjectId;

// mongoClient, MONGO_DATABASE, MONGO_COLLECTION and loadLargeJsonFromFile
// are defined elsewhere in the program (not shown).
private static final int MAX_DOCUMENTS = 1000;
private static final int BULK_SIZE = 500;

// Inserts one batch and returns the _id values reported by InsertManyResult.
private static List<ObjectId> insertBatchReturnIds(List<Document> insertBatch)
{
    List<ObjectId> insertedIds = new ArrayList<>();
    InsertManyResult insertManyResult = mongoClient.getDatabase(MONGO_DATABASE)
            .getCollection(MONGO_COLLECTION)
            .insertMany(insertBatch);
    // getInsertedIds() maps each document's index in insertBatch to its _id.
    insertManyResult.getInsertedIds().forEach((k, v) -> insertedIds.add(v.asObjectId().getValue()));
    System.out.println("Batch inserted:");
    System.out.println(" - Was acknowledged: " + Boolean.toString(insertManyResult.wasAcknowledged()).toUpperCase());
    System.out.println(" - InsertManyResult.getInsertedIds().size(): " + insertManyResult.getInsertedIds().size());
    return insertedIds;
}

// Parses the same ~1 MB JSON MAX_DOCUMENTS times and inserts it in batches of BULK_SIZE.
private static void insertDocuments()
{
    int documentsInserted = 0;
    List<Document> insertBatch = new ArrayList<>();
    List<ObjectId> insertedIds = new ArrayList<>();
    final String largeJson = loadLargeJsonFromFile("d:\\test-sample.json");
    System.out.println("Starting INSERT test...");
    while (documentsInserted < MAX_DOCUMENTS)
    {
        insertBatch.add(Document.parse(largeJson));
        documentsInserted++;
        // Flush a full batch, then reuse the buffer for the next one.
        if (documentsInserted % BULK_SIZE == 0)
        {
            insertedIds.addAll(insertBatchReturnIds(insertBatch));
            insertBatch.clear();
        }
    }
    // Flush any remainder when MAX_DOCUMENTS is not a multiple of BULK_SIZE.
    if (insertBatch.size() > 0)
        insertedIds.addAll(insertBatchReturnIds(insertBatch));
    System.out.println("INSERT test finished");
    System.out.println(String.format("Expected IDs retrieved: %d. Actual IDs retrieved: %d.", MAX_DOCUMENTS, insertedIds.size()));
    if (insertedIds.size() != MAX_DOCUMENTS)
        throw new IllegalStateException("Not all _ID were returned for each document in batch");
}
```
Output:
```
Starting INSERT test...
Batch inserted:
 - Was acknowledged: TRUE
 - InsertManyResult.getInsertedIds().size(): 16
Batch inserted:
 - Was acknowledged: TRUE
 - InsertManyResult.getInsertedIds().size(): 16
INSERT test finished
Expected IDs retrieved: 1000. Actual IDs retrieved: 32.
Exception in thread "main" java.lang.IllegalStateException: Not all _ID were returned for each document in batch
```
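Rather than collecting whatever the map happens to contain, the id extraction in insertBatchReturnIds could verify the result before trusting it, so a short result fails at the batch that produced it. A driver-free sketch: a generic T stands in for BsonValue/ObjectId, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderedIds {
    // Converts an index -> id map into a list ordered by insert position.
    // Throws if any position 0..batchSize-1 is missing, so an incomplete
    // result is reported immediately instead of silently shrinking the list.
    static <T> List<T> idsInInsertOrder(Map<Integer, T> insertedIds, int batchSize) {
        List<T> ordered = new ArrayList<>(batchSize);
        for (int i = 0; i < batchSize; i++) {
            T id = insertedIds.get(i);
            if (id == null) {
                throw new IllegalStateException(
                        "No _id reported for document at index " + i + " of " + batchSize);
            }
            ordered.add(id);
        }
        return ordered;
    }

    public static void main(String[] args) {
        Map<Integer, String> complete = new HashMap<>();
        complete.put(2, "c");
        complete.put(0, "a");
        complete.put(1, "b");
        System.out.println(idsInInsertOrder(complete, 3)); // [a, b, c]
    }
}
```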
- Is InsertManyResult.getInsertedIds() meant to return the _id of every inserted document?
- Is my usage of InsertManyResult.getInsertedIds() correct?
- Is there another way to use InsertManyResult to get the _id of the inserted documents?
I am aware that I could either read _id from each Document after Document.parse, since it is the driver that generates it, or query the _id values after the documents have been inserted. I would like to know how this can be achieved using InsertManyResult.getInsertedIds(), as it seems to be made for exactly this purpose.
This is a bug in the Java driver, and it's being tracked in https://jira.mongodb.org/browse/JAVA-4436 (reported on January 5, 2022).
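Until a driver version containing the fix is in use, one workaround along the lines the question already mentions is to assign _id on the client before calling insertMany, so the ids are known without relying on InsertManyResult. With the real driver this would mean calling doc.putIfAbsent("_id", new ObjectId()) on each org.bson.Document (Document implements Map<String, Object>); the driver only generates an _id when one is absent. The sketch below is driver-free: a LinkedHashMap stands in for Document and a UUID string stands in for ObjectId, both substitutions made purely for illustration, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class PreassignIds {
    // Ensures every document has an "_id" before insertion and returns the ids
    // in list order. Existing _id values are kept; missing ones are generated.
    // UUID strings stand in for org.bson.types.ObjectId in this driver-free sketch.
    static List<Object> assignIds(List<Map<String, Object>> batch) {
        List<Object> ids = new ArrayList<>(batch.size());
        for (Map<String, Object> doc : batch) {
            doc.putIfAbsent("_id", UUID.randomUUID().toString());
            ids.add(doc.get("_id"));
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> batch = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("intVal", i);
            batch.add(doc);
        }
        List<Object> ids = assignIds(batch);
        // With the real driver, collection.insertMany(batch) would follow here.
        System.out.println(ids.size()); // 500
    }
}
```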