Here's what my records look like:
I have obtained this by using collection.find().limit(1)
[
{
"_id": {"$oid": "..."},
"husband.firstName": "John",
"husband.secondName": "Smith",
"wife.firstName": "Alice",
"wife.secondName": "Watson",
"...": "...",
}
]
The husband and wife fields only contains firstName and secondName
I want to count how common each husband and wife name combos are.
I imagine the results in the form of something like:
[
{
"husband.firstName": "John",
"husband.secondName": "Smith",
"wife.firstName": "Alice",
"wife.secondName": "Watson",
"count": "456",
},
{
"husband.firstName": "Jack",
"husband.secondName": "Smith",
"wife.firstName": "Alice",
"wife.secondName": "Watson",
"count": "123",
}
]
I'm using Python and pymongo so I have tried the following:
pipeline = [
{
"$group": {
"_id": {
"husband": "$husband",
"wife": "$wife"
},
"count": {"$sum": 1}
}
},
{
"$sort": {"count": -1}
},
]
However this returns an empty result of:
[
{
"_id": "{}",
"count": 47553
}
]
I tried also grouping by the fields separately but the result was the same.
As pointed out in the comments by @cmgchess, the main difficulty for your case is the dot in your field name. You may want to refactor your schema to something like below:
{
"husband": {
"firstName": "John",
"secondName": "Smith"
},
"wife": {
"firstName": "Alice",
"secondName": "Watson"
}
}
Nevertheless, for your current schema, you may workaround it through usage of $getField
.
db.collection.aggregate([
{
"$group": {
"_id": {
"husband": {
"firstName": {
"$getField": {
"field": "husband.firstName",
"input": "$$ROOT"
}
},
"secondName": {
"$getField": {
"field": "husband.secondName",
"input": "$$ROOT"
}
}
},
"wife": {
"firstName": {
"$getField": {
"field": "wife.firstName",
"input": "$$ROOT"
}
},
"secondName": {
"$getField": {
"field": "wife.secondName",
"input": "$$ROOT"
}
}
}
},
"count": {
"$sum": 1
}
}
},
{
"$sort": {
"count": -1
}
}
])