node.jsmongodbmongoosehigh-load

Change string indexes to be ObjectID ones in large MongoDB instance


So, I've git a large production database dump with _id field as strings. Different collections got those string's length different. There're a lot of relations there. I need a way to change string _ids to ObjectId ones.

What I've tried already:

1) Looking mongoose/mongodb documentation for single command to do that failed

2) node.js migration script that grabs all the entries in one collection and wraps string id into ObjectId just fails either because of stack overflow FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory if we're trying to delete-and-recreate approach or with an error about bad string lenght and impossibility to create ObjectId from that string.

Will attach data samples and/or mongoose schema a bit later.


Solution

  • A simple and inefficient cursor solution to avoid JavaScript heap out of memory is to serialize everytyhing. Of course in order to edit the _id you have to create a new document and remove the old one.

    const cursor = Model.find().lean().cursor();
    let doc;
    while ((doc = await cursor.next())) {
      // The string must be a valid ObjectId, otherwhise it won't work
      if (typeof doc._id === 'string') {
        let newId = new mongoose.Types.ObjectId(doc._id);
        let newDoc = new Model(Object.assign({}, doc, {_id: newId}));
        await newDoc.save();
        await Model.remove({_id: doc._id});
      }
    }
    

    However, if you have errors about incorrect ids, it may be beacause the string id are not actually stringified version of mongo ObjectId. In such case, the relation cannot be preserved.