A Complete Guide - MongoDB Use Cases for Aggregation Pipelines
MongoDB Use Cases for Aggregation Pipelines
1. Data Transformation and Filtering
Use Case: Data cleaning and preprocessing are crucial stages in data analysis. Aggregation pipelines can perform these tasks efficiently by filtering out unwanted data and transforming the existing data into a suitable format.
Important Information:
- $match: Filters documents that pass the specified condition.
- $project: Reshapes each document in the stream, such as by adding new fields or removing existing fields.
- $redact: Restricts the documents to the set of fields that match the given condition.
- $replaceRoot: Replaces the input document with the specified document.
Example:
db.sales.aggregate([
{ $match: { status: "A" } },
{ $project: { _id: 0, item: 1, price: 1, quantity: 1 } }
])
This pipeline filters documents with a status of "A" and projects only the item
, price
, and quantity
fields.
2. Data Aggregation and Grouping
Use Case: Summarizing and aggregating large datasets is essential for generating insights and driving business decisions. Aggregation pipelines can perform complex grouping operations.
Important Information:
- $group: Groups input documents by a specified expression and outputs to the next stage a document for each distinct grouping.
- $accumulator: Adds user-defined accumulators to the
$group
stage. - $bucket: Categorizes incoming documents into groups based on specified boundaries.
Example:
db.sales.aggregate([
{ $group: { _id: "$item", totalQuantity: { $sum: "$quantity" } } }
])
This pipeline groups documents by the item
field and sums up the quantity
for each item.
3. Data Sorting and Limiting
Use Case: Sorting and limiting results are common requirements in data analysis to focus on top-performing or bottom-performing items.
Important Information:
- $sort: Sorts all input documents and returns them in sorted order.
- $limit: Limits the number of documents passed to the next stage in the pipeline.
- $skip: Skips a specified number of documents and passes the remaining documents to the next stage.
Example:
db.sales.aggregate([
{ $sort: { totalQuantity: -1 } },
{ $limit: 5 }
])
This pipeline sorts the documents by totalQuantity
in descending order and limits the results to the top 5.
4. Joining Data from Different Collections
Use Case: Data often resides in multiple collections or databases. Aggregation pipelines can perform lookups to combine data from different collections.
Important Information:
- $lookup: Performs a left outer join to another collection in the same database to filter in documents from the “joined” collection for processing.
- $graphLookup: Recursively joins subdocuments to the current document from the same collection.
Example:
db.orders.aggregate([
{
$lookup:
{
from: "customers",
localField: "cust_id",
foreignField: "customer_id",
as: "customer_info"
}
}
])
This pipeline joins the orders
collection with the customers
collection based on the cust_id
and customer_id
fields.
5. Calculating Running Totals and Moving Averages
Use Case: Running totals and moving averages are useful for analyzing trends over time, such as stock price movements or sales performance.
Important Information:
- $addFields: Adds new fields to documents.
$addFields
outputs documents that contain all existing fields from the input documents and newly added fields. - $window: Performs operations on a set of documents in a defined window and outputs the result for each document individually.
$window
can be used to calculate running totals and moving averages.
Example:
db.sales.aggregate([
{
$window:
{
output:
{
runningTotal: { $sum: "$value" },
movingAvg: { $avg: "$value", window: { documents: [-2, 2] } }
}
}
}
])
This pipeline calculates a running total and a moving average with a window of 5 documents centered on the current document.
6. Geospatial Aggregation
Use Case: Geospatial data is critical in location-based services, logistics, and urban planning. Aggregation pipelines can perform geospatial operations directly within MongoDB.
Important Information:
- $geoNear: Outputs documents in order of nearest to farthest from a specified point.
- $geoWithin: Selects documents with geospatial data within a specified shape.
- $geoIntersects: Selects documents with geospatial data that intersect with a specified shape.
Example:
db.places.aggregate([
{
$geoNear:
{
near: { type: "Point", coordinates: [ -73.99279 , 40.719296 ] },
distanceField: "dist.calculated",
maxDistance: 2000,
spherical: true
}
}
])
This pipeline retrieves locations within a maximum distance of 2000 meters from the specified point.
7. Data Analysis and Reporting
Use Case: Generating business reports and performing data analysis are vital for making informed decisions. Aggregation pipelines can provide the required data in real-time.
Important Information:
- $facet: Performs multiple aggregation operations in parallel.
- $bucketAuto: Categorizes incoming documents into a specific number of groups based on a specified expression.
- $out: Writes the result of the aggregation pipeline to a specified collection.
Example:
db.orders.aggregate([
{
$facet:
{
byCategory:
[
{ $group: { _id: "$category", totalSales: { $sum: "$amount" } } },
{ $sort: { totalSales: -1 } },
{ $limit: 5 }
],
salesByDay:
[
{ $group: { _id: { $dayOfYear: "$orderDate" }, totalSales: { $sum: "$amount" } } },
{ $sort: { _id: 1 } },
{ $limit: 12 }
]
}
}
])
This pipeline generates two reports: the top 5 categories by total sales and the total sales by day for the last 12 days.
Conclusion
Online Code run
Step-by-Step Guide: How to Implement MongoDB Use Cases for Aggregation Pipelines
Use Case 1: Calculating Total Sales for a Product
Scenario: Suppose you have a collection named orders
with the following documents. You want to calculate the total sales for each product.
{ "_id": 1, "product_id": "A101", "quantity": 3, "price": 10 }
{ "_id": 2, "product_id": "B202", "quantity": 5, "price": 20 }
{ "_id": 3, "product_id": "A101", "quantity": 2, "price": 10 }
{ "_id": 4, "product_id": "C303", "quantity": 4, "price": 15 }
{ "_id": 5, "product_id": "B202", "quantity": 3, "price": 20 }
Objective: Calculate the total sales (quantity * price
) for each product.
Steps:
- Match: Filter the documents (if needed, but in this case, all documents are relevant).
- Group: Group by
product_id
and calculate the total sales. - Project: Format the output.
Aggregation Pipeline:
db.orders.aggregate([
{
$group: {
_id: "$product_id",
totalSales: { $sum: { $multiply: ["$quantity", "$price"] } }
}
},
{
$project: {
_id: 0,
product_id: "$_id",
totalSales: 1
}
}
])
Output:
{ "product_id": "A101", "totalSales": 50 }
{ "product_id": "B202", "totalSales": 170 }
{ "product_id": "C303", "totalSales": 60 }
Use Case 2: Filtering and Sorting Data
Scenario: Suppose you have a collection named transactions
with the following documents. You want to filter transactions for a specific year, sort them by amount in descending order, and limit the results to the top 5.
{ "_id": 1, "account_id": "123", "year": 2021, "amount": 150 }
{ "_id": 2, "account_id": "456", "year": 2022, "amount": 200 }
{ "_id": 3, "account_id": "789", "year": 2021, "amount": 250 }
{ "_id": 4, "account_id": "123", "year": 2021, "amount": 300 }
{ "_id": 5, "account_id": "456", "year": 2021, "amount": 350 }
{ "_id": 6, "account_id": "789", "year": 2022, "amount": 100 }
Objective: Filter transactions for the year 2021, sort them by amount in descending order, and limit the results to the top 5.
Steps:
- Match: Filter documents for the year 2021.
- Sort: Sort the documents by
amount
in descending order. - Limit: Limit the results to the top 5.
Aggregation Pipeline:
db.transactions.aggregate([
{
$match: { year: 2021 }
},
{
$sort: { amount: -1 }
},
{
$limit: 5
}
])
Output:
{ "_id": 5, "account_id": "456", "year": 2021, "amount": 350 }
{ "_id": 4, "account_id": "123", "year": 2021, "amount": 300 }
{ "_id": 3, "account_id": "789", "year": 2021, "amount": 250 }
{ "_id": 1, "account_id": "123", "year": 2021, "amount": 150 }
Use Case 3: Grouping by Multiple Fields and Using Accumulators
Scenario: Suppose you have a collection named sales
with the following documents. You want to group sales by region
and product
, calculate the total sales amount, and find the average sales amount per product.
{ "_id": 1, "region": "North", "product": "Widget", "amount": 200 }
{ "_id": 2, "region": "South", "product": "Widget", "amount": 150 }
{ "_id": 3, "region": "North", "product": "Gadget", "amount": 250 }
{ "_id": 4, "region": "South", "product": "Gadget", "amount": 100 }
{ "_id": 5, "region": "North", "product": "Widget", "amount": 300 }
{ "_id": 6, "region": "South", "product": "Widget", "amount": 175 }
Objective: Group sales by region
and product
, calculate the total sales amount, and find the average sales amount per product.
Steps:
- Group: Group by
region
andproduct
, and calculate total and average sales. - Project: Format the output.
Aggregation Pipeline:
db.sales.aggregate([
{
$group: {
_id: { region: "$region", product: "$product" },
totalSales: { $sum: "$amount" },
averageSales: { $avg: "$amount" }
}
},
{
$project: {
_id: 0,
region: "$_id.region",
product: "$_id.product",
totalSales: 1,
averageSales: 1
}
}
])
Output:
{ "region": "North", "product": "Widget", "totalSales": 500, "averageSales": 250 }
{ "region": "South", "product": "Widget", "totalSales": 325, "averageSales": 162.5 }
{ "region": "North", "product": "Gadget", "totalSales": 250, "averageSales": 250 }
{ "region": "South", "product": "Gadget", "totalSales": 100, "averageSales": 100 }
Use Case 4: Joining Collections (Lookup)
Scenario: Suppose you have two collections named orders
and customers
. You want to join these collections to retrieve customer information for each order.
Collections:
orders:
{ "_id": 1, "customer_id": "C100", "product": "Widget", "amount": 200 }
{ "_id": 2, "customer_id": "C101", "product": "Gadget", "amount": 150 }
{ "_id": 3, "customer_id": "C100", "product": "Widget", "amount": 300 }
customers:
{ "_id": "C100", "name": "Alice", "email": "alice@example.com" }
{ "_id": "C101", "name": "Bob", "email": "bob@example.com" }
Objective: Join the orders
and customers
collections to retrieve customer information for each order.
Steps:
- Lookup: Join the
customers
collection with theorders
collection based oncustomer_id
. - Project: Format the output.
Aggregation Pipeline:
db.orders.aggregate([
{
$lookup: {
from: "customers",
localField: "customer_id",
foreignField: "_id",
as: "customer_info"
}
},
{
$unwind: "$customer_info"
},
{
$project: {
_id: 0,
order_id: "$_id",
product: 1,
amount: 1,
customer_name: "$customer_info.name",
customer_email: "$customer_info.email"
}
}
])
Output:
Top 10 Interview Questions & Answers on MongoDB Use Cases for Aggregation Pipelines
1. Data Transformation and Formatting
Question: How can MongoDB Aggregation Pipelines be used to format and transform raw data into a more readable format?
Answer: The $project
stage is used to select and reshape fields to be returned from the pipeline. You can also use $addFields
to add new fields or modify existing ones.
Example: Suppose you have a collection employees
with document fields firstName
, lastName
, and salary
. You want to create a full name field combining firstName
and lastName
.
db.employees.aggregate([
{
$project: {
_id: 0, // exclude _id
fullName: { $concat: ["$firstName", " ", "$lastName"] },
salary: 1
}
}
]);
2. Data Filtering
Question: How can Aggregation Pipelines filter documents to return only those that meet specific criteria?
Answer: The $match
stage is used to filter documents based on query conditions. Similar to a find()
query, it is useful for narrowing down your dataset.
Example: To find all employees earning more than $70,000.
db.employees.aggregate([
{
$match: { salary: { $gt: 70000 } }
}
]);
3. Grouping Data
Question: How does MongoDB handle data grouping within a pipeline?
Answer: The $group
stage groups documents by a certain key and can optionally aggregate data based on various accumulators like $sum
, $avg
, $max
, etc.
Example: Group employees by department and find the average salary in each department.
db.employees.aggregate([
{
$group: {
_id: "$department",
avgSalary: { $avg: "$salary" }
}
}
]);
4. Sorting and Limiting Results
Question: Can Aggregation Pipelines sort and limit the number of documents returned?
Answer: Yes, the $sort
and $limit
stages serve these purposes. $sort
organizes documents in a specified order while $limit
restricts the number of documents passed to the next stage.
Example: List the top 5 highest-paid employees.
db.employees.aggregate([
{
$sort: { salary: -1 }
},
{
$limit: 5
}
]);
5. Data Lookup and Joining
Question: How can Aggregation Pipelines perform complex lookups and joins between different collections?
Answer: The $lookup
stage facilitates left outer joins, while $unwind
can deconstruct arrays into multiple documents. These are useful for combining data from related collections.
Example: List all employees along with their associated projects from the projects
collection.
db.employees.aggregate([
{
$lookup: {
from: "projects",
localField: "_id",
foreignField: "employeeId",
as: "assignedProjects"
}
}
]);
6. Conditional Logic and Computations
Question: How can Aggregation Pipelines incorporate conditional logic and computations?
Answer: Operators like $cond
, $switch
, $ifNull
, and $add
can be used to perform logic and arithmetic operations.
Example: Add a new field to indicate whether an employee is high earning (over $100,000).
db.employees.aggregate([
{
$addFields: {
isHighEarning: { $cond: { if: { $gt: ["$salary", 100000] }, then: true, else: false } }
}
}
]);
7. Handling Null and Missing Values
Question: How can Aggregation Pipelines manage null or missing values?
Answer: Operators such as $ifNull
and $filter
allow for handling missing or null values efficiently.
Example: Set salary
to 0 for any null or missing value.
db.employees.aggregate([
{
$addFields: {
salary: { $ifNull: ["$salary", 0] }
}
}
]);
8. Complex Data Extraction Using $unwind
and $group
Question: How can Aggregation Pipelines handle arrays and extract complex data?
Answer: $unwind
splits the documents into multiple ones based on an array field. After $unwind
, operations like $group
can aggregate data across the expanded documents.
Example: Find out how many projects each employee is assigned to.
db.projects.aggregate([
{
$unwind: "$employeeIds"
},
{
$group: {
_id: "$employeeIds",
projectCount: { $sum: 1 }
}
}
]);
9. Pipeline Optimization and Indexing
Question: How can you improve the performance of MongoDB Aggregation Pipelines?
Answer: Proper indexing can significantly boost the performance of $match
stages. Utilizing indexes on fields that are frequently used in filtering, sorting, and grouping operations can substantially speed up the aggregation processes.
Example: Ensure there is an index on department
for the grouping query from earlier.
db.employees.createIndex({ department: 1 });
10. Data Export and Analysis
Question: Can Aggregation Pipelines aid in exporting and analyzing data?
Answer: Aggregation Pipelines can output data into various formats for further analysis in other tools via $out
to another collection, $merge
to merge with another collection, or $export
(in newer versions) to export results to a file or other destinations.
Example: Export the aggregated data to a CSV file using the MongoDB Shell.
Login to post a comment.