mongodbaggregation-frameworkpymongoupdatesfindandmodify

How to update MongoDB collection from aggregation results, set values within array


Using PyMongo, I'm inner joining MongoDB documents in one collection (refer_docs) with documents in another collection (test), and I want to find those documents in which the values from a field in each document of the join don't match, and I want to update those values. The refer_docs document should say that the test document it points to has the correct document type.

I can find those documents, but I'd like to do the update without looping through a list of the aggregation results and running updates one at a time.

Is there a Mongo/PyMongo method to use an aggregate pipeline as a query within an update, or to use $set within aggregate()? Or?

The pipeline starts as a join between collections with documents like the following examples:

import pymongo
mongo_client = MongoClient('localhost:27017')
osm_db = mongo_client.osm
refer_docs_col = osm_db["refer_docs"]
test_col = osm_db["test"]

# test collection:
test_col.insert_many(
[
 {'_id': '611868136', 'doc_type': 'way'},
 {'_id': '5792648632', 'doc_type': 'node'},
 {'_id': '611868133', 'doc_type': 'node'},
 {'_id': '1', 'doc_type': 'node'}
]
)

# refer_docs collection:
refer_docs_col.insert_many(
[
 {'_id': '8483444',
  'refs': [{'ref': '611868136', 'ref_type': 'way'},
           {'ref': '5792648632', 'ref_type': 'node'},
           {'ref': '611868133', 'ref_type': 'way'}],
  'doc_type': 'relation'}
]
)

Now, here's the pipeline and one attempt at updating the collection, which is essentially looping through the results, and which doesn't work:

pipeline = [
    { "$unwind" : "$refs" },
    {
        "$lookup" : {
            "from" : "test",
            "localField" : "refs.ref",
            "foreignField" : "_id",
            "as" : "ref_doc"
        }
    },
    { "$match" : { "ref_doc" : { "$ne" : [] } } },
    { "$unwind" : "$ref_doc"},
    { "$project" : { "_id" : 1, "refs" : 1, "ref_doc.doc_type" : 1, 
                     "cmp" : { "$cmp" : [ "$refs.ref_type",
                                          "$ref_doc.doc_type" ] } } },
    { "$match" : { "cmp" : { "$ne" : 0 } } },
]

result = [
 refer_docs_col.find_one_and_update( doc,
    { "$set" : { "refs.ref_type" : "$ref_doc.doc_type" } } ) \
 for doc in refer_docs_col.aggregate(pipeline)
]

refer_docs_col.find_one( { "_id" : "8483444" } )

This doesn't work, but I want to see that the document has been updated so that the ref_type of ref "611586133" is now "node":

{'_id': '8483444',
 'refs': [{'ref': '611868136', 'ref_type': 'way'},
          {'ref': '5792648632', 'ref_type': 'node'},
          {'ref': '611868133', 'ref_type': 'node'}],
 'doc_type': 'relation'}

Solution

  • This worked.

    Here, I used "refs" : mismatch["refs"] to find the array element in the filter, and "refs.$" to set the found array element.

    for mismatch in result: # Loop aggregation result docs
        filtr = { "_id" : mismatch["_id"] ,
                  "refs" : mismatch["refs"]}
        update = { "$set" : { "refs.$" : { 
            "ref" : mismatch["ref_doc"]["_id"],
            "ref_type" : mismatch["ref_doc"]["doc_type"] } } }
        doc = refer_docs_col.update_one(filtr, update)
    
    refer_docs_col.find_one( { "_id" : "8483444" } )
    

    Output:

    {'_id': '8483444',
     'refs': [{'ref': '611868136', 'ref_type': 'way'},
      {'ref': '5792648632', 'ref_type': 'node'},
      {'ref': '611868133', 'ref_type': 'node'}],
     'doc_type': 'relation'}