MongoDB Aggregate: How to preserve/protect fields during multiple $group and $project?

MongoDB Aggregate: How to preserve/protect fields during multiple $group and $project?

This article talks about an alternative way of preserving fields in response data of the aggregate query instead of writing it again and again.

ยท

7 min read

In MongoDB aggregate queries we usually work with a few fields in each document. But we still need other fields for some purpose. In projection and grouping, we, again and again, need to write these fields in order to preserve/protect them.

But there are better and smarter ways to preserve other fields and do the calculations at the same time.

Let's begin with an example

Consider example documents in "movies" collection

// movies collection
[
  { 
    "_id": ObjectId("61a0bc5e8a9fa2e12f5580c0"),
    "name": "Sholay", "year": "1975", "director": "Ramesh Sippy", 
    "ratings": [{ "name": "Simran", "rating": 4 }, { "name": "Santosh", "rating": 5 }, { "name": "Rahul", "rating": 4 }],
    "boxOffice": [{ "place": "India", "amount": 40000 }, { "place": "worldwide", "amount": 60000 }] 
  },
  {
    "_id": ObjectId("61a0bc5e8a9fa2e12f5580c1"),
    "name": "Border", "year": "1997", "director": "JP Dutta",
    "ratings": [{ "name": "Simran", "rating": 3 }, { "name": "Santosh", "rating": 4 }, { "name": "Rahul", "rating": 5 }],
    "boxOffice": [{ "place": "India", "amount": 50000 }, { "place": "worldwide", "amount": 70000 }] 
  },
  {
    "_id": ObjectId("61a0bc5e8a9fa2e12f5580c2"),
    "name": "Gadar", "year": "2001", "director": "Anil Sharma",
    "ratings": [{ "name": "Simran", "rating": 4 }, { "name": "Santosh", "rating": 2 }, { "name": "Rahul", "rating": 5 }],
    "boxOffice": [{ "place": "India", "amount": 30000 }, { "place": "worldwide", "amount": 50000 }] 
  }
]

What query does?

Task is to accomplish these 3 things:

  • Get the average rating of each movie.
  • Get total earnings (boxOffice "India" + "worldwide") of each movie.
  • Get all the other fields.

Logic simply is

  • Split the ratings array in different documents

    { $unwind: "$ratings" }
    

Response is:

{
    "_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
    ------Other fields------
    "ratings" : { "name" : "Simran", "rating" : 4.0 }
},
{
    "_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
    ------Other fields------
    "ratings" : { "name" : "Santosh", "rating" : 5.0 }
},
{
    "_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
    ------Other fields------
    "ratings" : { "name" : "Rahul", "rating" : 4.0 }
} ...so on
  • Re-group documents with unique _id in 3 steps
    1. Get the average of ratings.
    2. Merge ratings into an array so that we have our array as before.
    3. Get other fields while grouping to preserve them.
{ 
   $group: { 
    _id: "$_id",

    // Getting the average
    avg_ratings: { $avg: "$ratings.rating" },

    // Push ratings into an array to get the actual `ratings` array
    ratings: { $push: "$ratings" },

    // Preserve other fields
    name: { $last: "$name" },
    year: { $last: "$year" },
    director: { $last: "$director" },
    boxOffice: { $last: "$boxOffice" },
   }
}

Response is:

{
    "_id" : ObjectId("61a0bc5e8a9fa2e12f5580c2"),
    "avg_ratings" : 3.66666666666667,
    "ratings" : [ 
        { "name" : "Simran", "rating" : 4.0 }, 
        { "name" : "Santosh", "rating" : 2.0 }, 
        { "name" : "Rahul", "rating" : 5.0 }
    ],
    "name" : "Gadar",
    "year" : 2001.0,
    "director" : "Anil Sharma",
    "boxOffice" : [ 
        { "place" : "India", "amount" : 30000.0 }, 
        { "place" : "worldwide", "amount" : 50000.0 }
    ]
} ... so on
  • Now apply the same procedure to get total earnings. (Split and regroup with adding amount and preserve other fields).

Split:

{ $unwind: "$boxOffice" }

Re-group:

{ 
   $group: { 
    _id: "$_id",

    // Getting the sum of earnings in India and world wide
    earnings: { $sum: "$boxOffice.amount" },

    // Push ratings into an array to get the actual `ratings` array
    boxOffice: { $push: "$boxOffice" },

    // Preserve other fields
    name: { $last: "$name" },
    year: { $last: "$year" },
    director: { $last: "$director" },
    ratings: { $last: "$ratings" },

    // Preserve average ratings also
    avg_ratings: { $last: "$avg_ratings" }
   }
}

Response is:

{
    "_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
    "earnings" : 100000.0,
    "boxOffice" : [ 
        { "place" : "India", "amount" : 40000.0 }, 
        { "place" : "worldwide", "amount" : 60000.0 }
    ],
    "name" : "Sholay",
    "year" : 1975.0,
    "director" : "Ramesh Sippy",
    "ratings" : [ 
        { "name" : "Simran", "rating" : 4.0 }, 
        { "name" : "Santosh", "rating" : 5.0 }, 
        { "name" : "Rahul", "rating" : 4.0 }
    ],
    "avg_ratings" : 4.33333333333333
} ... so on

The issue with above code

By this approach, you must have noticed that in order to preserve all other fields on which we are not really operating, we still have to write them again and again.

So, to reduce these steps,

we can simply store the entire document into an object and keep performing operations on those specific fields required.

Let me show you how...

modern problems meme.jpg

  • Move the entire document into a separate object along with fields you want to operate on.

    { $project: { movieObj: "$$ROOT", ratings: 1, boxOffice: 1 } }
    

$$ROOT is the root object in each document.

This is how the response will look like

{
    "_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
    "ratings" : [ 
        { "name" : "Simran", "rating" : 4.0 }, 
        { "name" : "Santosh", "rating" : 5.0 }, 
        { "name" : "Rahul", "rating" : 4.0 }
    ],
    "boxOffice" : [ 
        { "place" : "India", "amount" : 40000.0 }, 
        { "place" : "worldwide", "amount" : 60000.0 }
    ],
    "movieObj" : {
        "_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
        "name" : "Sholay",
        "year" : "1975",
        -----Other fields-----
    }
}, ... so on
  • It's time to split and regroup. Ang the magic happens when we regroup it.

      { $unwind: "$ratings" },
      { 
         $group: { 
              _id: "$_id",
             avg_ratings: { $avg: "$ratings.rating" },
             boxOffice: { $last: "$boxOffice" },
             movieObj: { $last: "$movieObj" }
         }
      }
    

    As all fields are already preserved inside movieObj object, we don't have to worry about it.

  • Same happens in the other two cases, while grouping to get total earnings. It is shown in the final query. This is how one of the documents in response will look like

{
    "_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
    "earnings" : 100000.0,
    "avg_ratings" : 4.33333333333333,
    "movieObj" : {
        "_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
        "name" : "Sholay",
        "year" : "1975",
        "director" : "Ramesh Sippy",
        "ratings" : [ 
            { "name" : "Simran", "rating" : 4.0 }, 
            { "name" : "Santosh", "rating" : 5.0 }, 
            { "name" : "Rahul", "rating" : 4.0 }
        ],
        "boxOffice" : [ 
            { "place" : "India", "amount" : 40000.0 }, 
            { "place" : "worldwide", "amount" : 60000.0 }
        ]
    }
}
  • Now, we need to bring back our actual document which is embedded in movieObj object. But there are other fields as well like avg_ratings and earnings. So, we need to merge this object with the root object and remove the movieObj as its use is finished.

    1. Merge movieObj with root object and replace root.

      { $replaceRoot: { newRoot: { $mergeObjects: ["$$ROOT", "$movieObj"] } } }
      
    2. Remove movieObj.

      { $project: { movieObj: 0 } }
      

Response:

[{
    "_id" : ObjectId("61a0bc5e8a9fa2e12f5580c0"),
    "earnings" : 100000.0,
    "avg_ratings" : 4.33333333333333,
    "name" : "Sholay",
    "year" : "1975",
    "director" : "Ramesh Sippy",
    "ratings" : [ 
        { "name" : "Simran", "rating" : 4.0 }, 
        { "name" : "Santosh", "rating" : 5.0 }, 
        { "name" : "Rahul", "rating" : 4.0 }
    ],
    "boxOffice" : [ 
        { "place" : "India", "amount" : 40000.0 }, 
        { "place" : "worldwide", "amount" : 60000.0 }
    ]
}, ... so on]

Final Query

db.getCollection('movies').aggregate([
    { $project: { movieObj: "$$ROOT", ratings: 1, boxOffice: 1 } },
    { $unwind: "$ratings" },
    { 
       $group: { 
           _id: "$_id",
           avg_ratings: { $avg: "$ratings.rating" },
           boxOffice: { $last: "$boxOffice" },
           movieObj: { $last: "$movieObj" }
       }
    },
    { $unwind: "$boxOffice" },
    { 
       $group: { 
            _id: "$_id",
            earnings: { $sum: "$boxOffice.amount" },
            avg_ratings: { $last: "$avg_ratings" },
            movieObj: { $last: "$movieObj" }
       }
    },
    { $replaceRoot: { newRoot: { $mergeObjects: ["$$ROOT", "$movieObj"] } } },
    { $project: { movieObj: 0 } }
])

Conclusion

Aggregation to accumulate data in MongoDB allows you to use multiple methods. It is important to keep optimizing your queries to get better performance and code readability. There can be even smarter ways. All you need to do is keep looking at your queries and make sure that you follow fundamental principles of Computer Science like KISS (Keep it simple silly) & DRY (Do not repeat yourself).

That's it! ๐Ÿ˜‰

Thank you for reading ๐Ÿ”ฅ๐Ÿ”ฅ๐Ÿ”ฅ, and I hope you found this article useful in your path to get better working with aggregate queries ๐Ÿ”ฅ

If you have any questions or just wanna chat, feel free to DM me. You can also respond to me on my social media.

Twitter Profile: @RahulChouhan97

LinkedIn Profile: Rahul Chouhan

Github Profile - Rahul-Geeks

Thank you. โค๏ธ

References

$unset MongoDB - docs.mongodb.com/manual/reference/operator/..

ย