Tags: php, mongodb, php-mongodb, schema-design

Porting a MySQL app to MongoDB and accommodating the _id field


I have a PHP app which can hook into multiple backends (currently MySQL or XML) and which I'm trying to make work with MongoDB as well. One seemingly minor issue that I'm struggling with is that Mongo mandates that '_id' be the name of the primary key. The backends are generally fairly well abstracted by the application, but the application does need to work with the id on a fairly regular basis, so I access it with code like $result['id'], which up until now has abstracted well.

But now I'm faced with trying to (efficiently!) handle a DB that won't let me use 'id' as its primary key, and I'm not sure what the best option is. Here's what I've thought of so far:

  1. Just leave Mongo's '_id' value alone, and set an application variable $id to "id" or "_id" depending on the backend. Application code would then access the ID field as $result[$id] rather than the hard-coded $result['id']. Note that all other fields are accessed by their string names, such as $result['user'], so this would break with the norms in the application.

    Pros: allows MongoCursor objects to be returned directly when possible, ensuring minimal memory usage and fast access to the data.

    Cons: not backwards-compatible, would require (a fairly tedious) refactoring of any code using this application.

  2. Wrap the returned MongoCursor objects in a new iterator class that returns each item with "_id" mapped to "id" appropriately. The application would map inbound calls to "id" back to "_id" when communicating with Mongo.

    Pros: Retains most of the memory efficiency of 1. while preventing most backwards-compatibility issues.

    Cons: Not totally sure how to implement such an object, not convinced it could be done cleanly, or that it would actually work as well as I'm imagining. To do it right, I'd want to implement similar iterator wrappers for the other backends as well.

  3. Load results into memory with iterator_to_array(), as described in the documentation for MongoCollection::find(), do the appropriate transforms, and return the array.

    Pros: conceptually simpler than 2., would work well with the rest of my app.

    Cons: obviously a poor choice in terms of memory. Not the end of the world given the app, but still not ideal.
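For what it's worth, option 2 can probably be sketched with PHP's built-in IteratorIterator, which wraps any Traversable and lets you override current(). The class name IdRenamingIterator is made up for illustration, and an ArrayIterator stands in for the MongoCursor so the snippet runs without a database. It assumes each document comes back as a plain array, which is an assumption about the driver's configuration rather than something stated above.

```php
<?php
// Hypothetical wrapper (not part of any driver): renames '_id' to 'id'
// on each document lazily, as it is read from the wrapped iterator.
class IdRenamingIterator extends IteratorIterator
{
    #[\ReturnTypeWillChange]
    public function current()
    {
        $doc = parent::current();
        if (is_array($doc) && array_key_exists('_id', $doc)) {
            // Put 'id' first, keep the remaining fields, then drop '_id'.
            $doc = ['id' => $doc['_id']] + $doc;
            unset($doc['_id']);
        }
        return $doc;
    }
}

// Stand-in for a cursor; a real cursor would be passed in the same way.
$cursor = new ArrayIterator([
    ['_id' => 'a1', 'user' => 'alice'],
    ['_id' => 'b2', 'user' => 'bob'],
]);

foreach (new IdRenamingIterator($cursor) as $row) {
    echo $row['id'], ' ', $row['user'], PHP_EOL;
}
```

Only one document is transformed at a time, so the memory profile should stay close to option 1. The same wrapper shape could be reused for other backends by swapping the key being renamed.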

Do any of these options stand out as particularly reasonable and robust solutions to this issue? Can anyone suggest other alternatives for handling primary keys in a backend-agnostic way? Additional backends are a possibility in the future, so issues or benefits of a given method related to other data storage systems are also welcome.

I'm currently leaning towards 2., but I welcome your thoughts.


Solution

  • I ended up going with option 2, wrapping the iterator and hiding the existing '_id' field. Not clean, and not elegant, but the best I could do with what I had. Working on this really made me pine for Python.
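The other half of option 2, mapping inbound "id" back to "_id" before talking to Mongo, might look roughly like the sketch below. The function name toMongoCriteria is hypothetical, and it only handles a top-level 'id' key; nested conditions (for example inside query operators) would need extra handling that is not shown here.

```php
<?php
// Hypothetical helper: translate application-level criteria into a
// Mongo-style query by renaming a top-level 'id' key to '_id'.
function toMongoCriteria(array $criteria): array
{
    if (array_key_exists('id', $criteria)) {
        $criteria['_id'] = $criteria['id'];
        unset($criteria['id']);
    }
    return $criteria;
}

// The rest of the application keeps using 'id'; only the Mongo backend
// sees '_id'. The result would then be passed to the collection's find().
$query = toMongoCriteria(['id' => 'a1', 'user' => 'alice']);
```

Keeping the translation in one small function means the MySQL and XML backends never need to know about '_id' at all.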