I have php + yii2 + mongodb running on linux in virtualbox (10 cores, 12 GB RAM). yii2-mongodb is installed. I have a csv file that contains 100.000+ records in 2 columns: custom identifier (not _id), and a number. I would like to add the number to the documents in the collection in mongodb where the custom identifier matches the document. I was trying several different approaches, like 1. $collection->update()
in the while(fgetcsv)
loop, 2. tried to create an array in while()
like:
$command = $conn->createCommand();
while(fgetcsv...) {
$command->addUpdate(['identifier' => $csvrow[0]], ['number' => $csvrow[1]]);
}
$command->executeBatch('collection');
it works but it is terribly slow. 10-20 records per second. I have tried to add a text index to the custom identifier in mongodb compass (I have 1(asc), -1(desc), 2dsphere and text), got no performance boost there. What am I missing?
Looks like you did not create an index or you created a wrong index. Use this one
db.collection.createIndexes([ { identifier: 1 } ], { unique: true })
By using bulkWrite it should get even faster. In order to avoid "out of memory" problems I would suggest
$ops = [];
while(fgetcsv...) {
array_push($ops, [ 'updateOne' => [ [ 'identifier' => $csvrow[0] ], [ '$set' => [ 'number' => $csvrow[1] ] ] ] ]);
if (count($ops)) > 10000 {
$conn->createCommand()->addUpdate($ops)->executeBatch('collection');
$ops = [];
}
}
if (count($ops)) > 0
$conn->createCommand()->addUpdate($ops)->executeBatch('collection');