MongoDB Aggregate: How to preserve/protect fields during multiple $group and $project?
This article talks about an alternative way of preserving fields in response data of the aggregate query instead of writing it again and again.
In MongoDB aggregate queries we usually work with a few fields in each document. But we still need other fields for some purpose. In projection and grouping, we, again and again, need to write these fields in order to preserve/protect them.
But there are better and smarter ways to preserve other fields and do the calculations at the same time.
Let's begin with an example
Consider example documents in "movies" collection
// movies collection
[
{
"_id": ObjectId("61a0bc5e8a9fa2e12f5580c0"),
"name": "Sholay", "year": "1975", "director": "Ramesh Sippy",
"ratings": [{ "name": "Simran", "rating": 4 }, { "name": "Santosh", "rating": 5 }, { "name": "Rahul", "rating": 4 }],
"boxOffice": [{ "place": "India", "amount": 40000 }, { "place": "worldwide", "amount": 60000 }]
},
{
"_id": ObjectId("61a0bc5e8a9fa2e12f5580c1"),
"name": "Border", "year": "1997", "director": "JP Dutta",
"ratings": [{ "name": "Simran", "rating": 3 }, { "name": "Santosh", "rating": 4 }, { "name": "Rahul", "rating": 5 }],
"boxOffice": [{ "place": "India", "amount": 50000 }, { "place": "worldwide", "amount": 70000 }]
},
{
"_id": ObjectId("61a0bc5e8a9fa2e12f5580c2"),
"name": "Gadar", "year": "2001", "director": "Anil Sharma",
"ratings": [{ "name": "Simran", "rating": 4 }, { "name": "Santosh", "rating": 2 }, { "name": "Rahul", "rating": 5 }],
"boxOffice": [{ "place": "India", "amount": 30000 }, { "place": "worldwide", "amount": 50000 }]
}
]
What query does?
Task is to accomplish these 3 things:
- Get the average rating of each movie.
- Get total earnings (
boxOffice
"India" + "worldwide") of each movie.- Get all the other fields.
Logic simply is
Split the
ratings
array in different documents{ $unwind: "$ratings" }
Response is:
{
"_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
------Other fields------
"ratings" : { "name" : "Simran", "rating" : 4.0 }
},
{
"_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
------Other fields------
"ratings" : { "name" : "Santosh", "rating" : 5.0 }
},
{
"_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
------Other fields------
"ratings" : { "name" : "Rahul", "rating" : 4.0 }
} ...so on
- Re-group documents with unique
_id
in 3 steps- Get the average of ratings.
- Merge ratings into an array so that we have our array as before.
- Get other fields while grouping to preserve them.
{
$group: {
_id: "$_id",
// Getting the average
avg_ratings: { $avg: "$ratings.rating" },
// Push ratings into an array to get the actual `ratings` array
ratings: { $push: "$ratings" },
// Preserve other fields
name: { $last: "$name" },
year: { $last: "$year" },
director: { $last: "$director" },
boxOffice: { $last: "$boxOffice" },
}
}
Response is:
{
"_id" : ObjectId("61a0bc5e8a9fa2e12f5580c2"),
"avg_ratings" : 3.66666666666667,
"ratings" : [
{ "name" : "Simran", "rating" : 4.0 },
{ "name" : "Santosh", "rating" : 2.0 },
{ "name" : "Rahul", "rating" : 5.0 }
],
"name" : "Gadar",
"year" : 2001.0,
"director" : "Anil Sharma",
"boxOffice" : [
{ "place" : "India", "amount" : 30000.0 },
{ "place" : "worldwide", "amount" : 50000.0 }
]
} ... so on
- Now apply the same procedure to get total earnings. (Split and regroup with adding amount and preserve other fields).
Split:
{ $unwind: "$boxOffice" }
Re-group:
{
$group: {
_id: "$_id",
// Getting the sum of earnings in India and world wide
earnings: { $sum: "$boxOffice.amount" },
// Push ratings into an array to get the actual `ratings` array
boxOffice: { $push: "$boxOffice" },
// Preserve other fields
name: { $last: "$name" },
year: { $last: "$year" },
director: { $last: "$director" },
ratings: { $last: "$ratings" },
// Preserve average ratings also
avg_ratings: { $last: "$avg_ratings" }
}
}
Response is:
{
"_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
"earnings" : 100000.0,
"boxOffice" : [
{ "place" : "India", "amount" : 40000.0 },
{ "place" : "worldwide", "amount" : 60000.0 }
],
"name" : "Sholay",
"year" : 1975.0,
"director" : "Ramesh Sippy",
"ratings" : [
{ "name" : "Simran", "rating" : 4.0 },
{ "name" : "Santosh", "rating" : 5.0 },
{ "name" : "Rahul", "rating" : 4.0 }
],
"avg_ratings" : 4.33333333333333
} ... so on
The issue with above code
By this approach, you must have noticed that in order to preserve all other fields on which we are not really operating, we still have to write them again and again.
So, to reduce these steps,
we can simply store the entire document into an object and keep performing operations on those specific fields required.
Let me show you how...
Move the entire document into a separate object along with fields you want to operate on.
{ $project: { movieObj: "$$ROOT", ratings: 1, boxOffice: 1 } }
$$ROOT
is the root object in each document.
This is how the response will look like
{
"_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
"ratings" : [
{ "name" : "Simran", "rating" : 4.0 },
{ "name" : "Santosh", "rating" : 5.0 },
{ "name" : "Rahul", "rating" : 4.0 }
],
"boxOffice" : [
{ "place" : "India", "amount" : 40000.0 },
{ "place" : "worldwide", "amount" : 60000.0 }
],
"movieObj" : {
"_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
"name" : "Sholay",
"year" : "1975",
-----Other fields-----
}
}, ... so on
It's time to split and regroup. Ang the magic happens when we regroup it.
{ $unwind: "$ratings" }, { $group: { _id: "$_id", avg_ratings: { $avg: "$ratings.rating" }, boxOffice: { $last: "$boxOffice" }, movieObj: { $last: "$movieObj" } } }
As all fields are already preserved inside
movieObj
object, we don't have to worry about it.Same happens in the other two cases, while grouping to get total earnings. It is shown in the final query. This is how one of the documents in response will look like
{
"_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
"earnings" : 100000.0,
"avg_ratings" : 4.33333333333333,
"movieObj" : {
"_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
"name" : "Sholay",
"year" : "1975",
"director" : "Ramesh Sippy",
"ratings" : [
{ "name" : "Simran", "rating" : 4.0 },
{ "name" : "Santosh", "rating" : 5.0 },
{ "name" : "Rahul", "rating" : 4.0 }
],
"boxOffice" : [
{ "place" : "India", "amount" : 40000.0 },
{ "place" : "worldwide", "amount" : 60000.0 }
]
}
}
Now, we need to bring back our actual document which is embedded in
movieObj
object. But there are other fields as well likeavg_ratings
andearnings
. So, we need to merge this object with the root object and remove themovieObj
as its use is finished.Merge
movieObj
with root object and replace root.{ $replaceRoot: { newRoot: { $mergeObjects: ["$$ROOT", "$movieObj"] } } }
Remove
movieObj
.{ $project: { movieObj: 0 } }
Response:
[{
"_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
"earnings" : 100000.0,
"avg_ratings" : 4.33333333333333,
"name" : "Sholay",
"year" : "1975",
"director" : "Ramesh Sippy",
"ratings" : [
{ "name" : "Simran", "rating" : 4.0 },
{ "name" : "Santosh", "rating" : 5.0 },
{ "name" : "Rahul", "rating" : 4.0 }
],
"boxOffice" : [
{ "place" : "India", "amount" : 40000.0 },
{ "place" : "worldwide", "amount" : 60000.0 }
]
}, ... so on]
Final Query
db.getCollection('movies').aggregate([
{ $project: { movieObj: "$$ROOT", ratings: 1, boxOffice: 1 } },
{ $unwind: "$ratings" },
{
$group: {
_id: "$_id",
avg_ratings: { $avg: "$ratings.rating" },
boxOffice: { $last: "$boxOffice" },
movieObj: { $last: "$movieObj" }
}
},
{ $unwind: "$boxOffice" },
{
$group: {
_id: "$_id",
earnings: { $sum: "$boxOffice.amount" },
avg_ratings: { $last: "$avg_ratings" },
movieObj: { $last: "$movieObj" }
}
},
{ $replaceRoot: { newRoot: { $mergeObjects: ["$$ROOT", "$movieObj"] } } },
{ $project: { movieObj: 0 } }
])
Conclusion
Aggregation to accumulate data in MongoDB allows you to use multiple methods. It is important to keep optimizing your queries to get better performance and code readability. There can be even smarter ways. All you need to do is keep looking at your queries and make sure that you follow fundamental principles of Computer Science like KISS (Keep it simple silly) & DRY (Do not repeat yourself).
That's it! ๐
Thank you for reading ๐ฅ๐ฅ๐ฅ, and I hope you found this article useful in your path to get better working with aggregate queries ๐ฅ
If you have any questions or just wanna chat, feel free to DM me. You can also respond to me on my social media.
Twitter Profile: @RahulChouhan97
LinkedIn Profile: Rahul Chouhan
Github Profile - Rahul-Geeks
Thank you. โค๏ธ
References
$unset MongoDB - docs.mongodb.com/manual/reference/operator/..