Aggregation in MongoDB: Pipeline & Syntax
Updated on Jul 03, 2023 | 10 min read | 6.0k views
MongoDB is a document-oriented, non-relational (NoSQL) database built for high-volume data storage. Its basic unit is the document, a set of key-value pairs, and documents are stored in collections. It was first released in 2009 and has since become widely adopted.
Aggregation in MongoDB is a framework for performing computational tasks on documents in one or more MongoDB collections. It is an effective way of generating reports or a handful of summary metrics from many documents. The framework is called aggregation because it combines multiple documents into a single, unified result.
Aggregation in MongoDB is built primarily around the pipeline. The underlying concept is that input is taken from a MongoDB collection and the documents pass through a series of stages that finally produce a unified output. The idea is very similar to pipes in a Unix shell such as Bash, where the output of one command becomes the input of the next.
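The pipe analogy can be made concrete with a small plain-JavaScript sketch. It does not use MongoDB at all: the `runPipeline` helper, the stage functions, and the sample `orders` array are hypothetical names chosen for illustration, and each "stage" is simply a function that takes an array of documents and returns a new one.

```javascript
// Hypothetical in-memory "collection"; the field names are made up for illustration.
const orders = [
  { item: "pen", qty: 5, price: 2 },
  { item: "book", qty: 1, price: 12 },
  { item: "pen", qty: 3, price: 2 },
];

// Each stage takes an array of documents and returns a new array.
const keepPens = (docs) => docs.filter((d) => d.item === "pen");
const addTotal = (docs) => docs.map((d) => ({ ...d, total: d.qty * d.price }));
const sumTotals = (docs) => [
  { _id: null, total: docs.reduce((acc, d) => acc + d.total, 0) },
];

// Feed the output of one stage into the next, like `cmd1 | cmd2 | cmd3` in a shell.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const pipelineResult = runPipeline(orders, [keepPens, addTotal, sumTotals]);
console.log(pipelineResult);
```

The stages know nothing about each other; only the order in the array decides how the data flows, which is the property the aggregation pipeline relies on.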
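Aggregation operations process many documents and return computed results. They can be used to group values from multiple documents together, to perform operations on the grouped data and return a single result, and to analyze how data changes over time.
To run aggregation operations in MongoDB, you can use aggregation pipelines, which are the preferred approach, or single-purpose aggregation methods such as estimatedDocumentCount() and distinct().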
There are many reasons why this database system is widely used, and the aggregation framework is one of its most notable features.
There can be times when millions of embedded documents need to be processed. Handled naively, this can exhaust the server's resources and cause the process to terminate. This constraint motivated a more efficient approach that scans and processes the documents together.
The aggregation operation was therefore designed to process documents in stages and return the cumulative effect as a single result. This staged way of generating results addresses the problem of very large document counts, which is why the aggregation framework is essential.
The framework can perform many query operations across different documents at once. It closely resembles queries in relational databases.
A pipeline is a sequence of stages, each designed to perform a separate task, that together accomplish one goal. In MongoDB aggregation, the pipeline carries out the computation and transforms the documents. Documents from a MongoDB collection are given as input, and each stage performs a particular task on them.
The results are then combined and cumulative metrics are calculated, which are returned as output. The output is quite similar to query results from a relational database: a stream of documents that can be processed further. It can later be used for report generation or for building websites.
Each stage therefore acts as a processing unit. For every stage after the first, the output of the previous stage is its input. Filters can also be added at the initial stage to reduce the input. Stages are designed to be generic and are controlled through parameters. Changing these parameters changes the result of the stage, which is how a generic stage is tailored to the task one is interested in.
There can be situations where the same type of stage appears several times in one pipeline. For example, a filter may be placed at the start so that the entire collection does not pass through, and after some processing another filter may be needed for a different criterion.
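As a rough illustration of repeating a stage type, the sketch below writes a pipeline in MongoDB's syntax with two $match stages, one before and one after a $group. To keep it runnable without a server, it is evaluated by a toy interpreter in plain JavaScript; the sample `customers` data, the `plan` field, and the `matches` and `runStage` helpers are all assumptions made for this example, not part of MongoDB.

```javascript
// Hypothetical sample data standing in for a `customers` collection.
const customers = [
  { name: "A", zip: 700068, plan: "gold" },
  { name: "B", zip: 700068, plan: "gold" },
  { name: "C", zip: 700068, plan: "basic" },
  { name: "D", zip: 700001, plan: "gold" },
];

// The pipeline uses $match twice: once before grouping and once after,
// on the Count field that the $group stage creates.
const twoFilterPipeline = [
  { $match: { zip: 700068 } },
  { $group: { _id: "$plan", Count: { $sum: 1 } } },
  { $match: { Count: { $gte: 2 } } },
];

// Toy evaluator, NOT MongoDB's implementation. It supports only equality and
// $gte in $match, and a "$field" key with { $sum: 1 } counters in $group.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) =>
    cond !== null && typeof cond === "object" && "$gte" in cond
      ? doc[field] >= cond.$gte
      : doc[field] === cond
  );
}

function runStage(docs, stage) {
  if (stage.$match) return docs.filter((d) => matches(d, stage.$match));
  if (stage.$group) {
    const { _id, ...accumulators } = stage.$group;
    const groups = new Map();
    for (const doc of docs) {
      const key = doc[_id.slice(1)]; // "$plan" -> doc.plan
      if (!groups.has(key)) groups.set(key, { _id: key });
      const group = groups.get(key);
      for (const name of Object.keys(accumulators)) {
        group[name] = (group[name] || 0) + 1; // only { $sum: 1 } is handled
      }
    }
    return [...groups.values()];
  }
  throw new Error("unsupported stage in this sketch");
}

const twoFilterResult = twoFilterPipeline.reduce(runStage, customers);
console.log(twoFilterResult);
```

Against a real deployment, the same `twoFilterPipeline` array would be passed to db.customers.aggregate(...), assuming a customers collection with these fields exists.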
Syntax
There is a specific format in which aggregation queries are built. The general syntax is shown below, with field and value as placeholders.
db.Collection_Name.aggregate([
  { $match: { field: value } },      // keep only the matching documents
  { $group: { _id: "$field" } },     // group the remaining documents by a field
  { $sort: { _id: 1 } }              // 1 = ascending, -1 = descending
]);
Pipeline Commands
1. Matching: This is the filtering stage. It removes the documents that are not of interest. This command closely resembles the WHERE clause in SQL.
db.customers.aggregate([
  { $match: { "zip": 700068 } }
]);
2. Grouping: After filtering, the documents usually need to be grouped. Grouping forms subsets of the collection by clustering documents that share a common value, so that the same operation can be performed on each subset.
db.customers.aggregate([
  { $match: { "zip": 700068 } },
  {
    $group: {
      _id: null,
      Count: {
        $sum: 1
      }
    }
  }
]);
3. Sort: This helps to sort the documents in ascending or descending order based on any specific query field.
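db.customers.aggregate([
  { $match: { "zip": 700068 } },
  {
    $group: {
      _id: "$zip",
      Count: {
        $sum: 1
      }
    }
  },
  {
    $sort: { _id: -1 }
  }
]);
This will sort the grouped documents by zip code in descending order. After $group the zip code is stored in the _id field, so the $sort stage refers to _id rather than zip.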
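The direction values used by $sort (1 for ascending, -1 for descending) can be imitated in plain JavaScript. The sketch below is only an approximation for a single numeric field; the `sortBy` helper and the `places` sample data are hypothetical and do not come from MongoDB.

```javascript
// Hypothetical documents with a numeric zip field.
const places = [
  { name: "A", zip: 700001 },
  { name: "B", zip: 700068 },
  { name: "C", zip: 700020 },
];

// Imitates a single-field $sort specification such as { zip: -1 }.
// Only one numeric field is handled in this sketch.
function sortBy(docs, spec) {
  const [[field, direction]] = Object.entries(spec);
  // Copy first so the input array is left untouched.
  return [...docs].sort((a, b) => direction * (a[field] - b[field]));
}

const descending = sortBy(places, { zip: -1 });
const ascending = sortBy(places, { zip: 1 });
console.log(descending.map((d) => d.zip), ascending.map((d) => d.zip));
```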
Each stage begins with a stage operator, such as $match, $group, $sort, $project, $limit, $skip, or $unwind.
Expressions: An expression refers to a field in the input documents. For example, in { $group: { _id: "$id", total: { $sum: "$fare" } } }, $id and $fare are expressions.
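To show what resolving such field expressions amounts to, here is a plain-JavaScript imitation of that $group stage. It is a simplified sketch rather than MongoDB's own logic: the `resolve` and `groupAndSum` helpers and the `rides` sample data are assumptions for illustration, and only the `id` and `fare` field names come from the example above.

```javascript
// Hypothetical documents; only the `id` and `fare` field names come from the text.
const rides = [
  { id: "u1", fare: 10 },
  { id: "u2", fare: 7 },
  { id: "u1", fare: 5 },
];

// Resolve a field-path expression such as "$fare" against one document.
function resolve(expression, doc) {
  return doc[expression.slice(1)];
}

// Plain-JavaScript imitation of
//   { $group: { _id: "$id", total: { $sum: "$fare" } } }
function groupAndSum(docs, idExpression, sumExpression) {
  const totals = new Map();
  for (const doc of docs) {
    const key = resolve(idExpression, doc);
    totals.set(key, (totals.get(key) || 0) + resolve(sumExpression, doc));
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
}

const fareTotals = groupAndSum(rides, "$id", "$fare");
console.log(fareTotals);
```

The leading dollar sign is what marks "$id" as a reference to a field value rather than a literal string.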
Aggregation runs in memory, and each stage may use at most 100 MB of RAM. If a stage goes beyond this limit, the database returns an error. If the limit cannot be avoided, you can opt to page to disk.
However, this has the drawback of making you wait a little longer, since working on disk is slower than working in memory. To select the page-to-disk method, you only need to set the allowDiskUse option to true:
db.collectionName.aggregate(pipeline, { allowDiskUse : true })
Keep in mind that shared services may not always offer this option. The Atlas M0, M2, and M5 clusters, for instance, disable it. Each document returned by an aggregation query is limited to 16 MB, whether it is read through a cursor or written to another collection via $out.
A result document cannot, therefore, exceed the largest permitted size for a MongoDB document. The result set as a whole may be larger than this, provided it is returned as a cursor rather than as a single document.
The $match stage lets you select only the documents in a collection that you are interested in. It does this by filtering out the documents that do not fit the criteria.
In the example that follows, we only want to proceed with the documents whose country field is Spain and whose city field is Salamanca. Each command ends with .pretty() to get readable output.
db.universities.aggregate([
{ $match : { country : 'Spain', city : 'Salamanca' } }
]).pretty()
The output is…
{
"_id" : ObjectId("5b7d9d9efbc9884f689cdba9"),
"country" : "Spain",
"city" : "Salamanca",
"name" : "USAL",
"location" : {
"type" : "Point",
"coordinates" : [
-5.6722512,
17,
40.9607792
]
},
"students" : [
{
"year" : 2014,
"number" : 24774
},
{
"year" : 2015,
"number" : 23166
},
{
"year" : 2016,
"number" : 21913
},
{
"year" : 2017,
"number" : 21715
}
]
}
{
"_id" : ObjectId("5b7d9d9efbc9884f689cdbaa"),
"country" : "Spain",
"city" : "Salamanca",
"name" : "UPSA",
"location" : {
"type" : "Point",
"coordinates" : [
-5.6691191,
17,
40.9631732
]
},
"students" : [
{
"year" : 2014,
"number" : 4788
},
{
"year" : 2015,
"number" : 4821
},
{
"year" : 2016,
"number" : 6550
},
{
"year" : 2017,
"number" : 6125
}
]
}
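When a $match document lists several fields, as the query above does, a document must satisfy all of them to pass. The plain-JavaScript sketch below imitates that behavior for simple equality conditions only. The `matchAll` helper is hypothetical, the first two sample documents reuse the names from the output above, and the third is invented to show a non-matching case.

```javascript
// The first two documents reuse names from the output above; the third is hypothetical.
const universities = [
  { country: "Spain", city: "Salamanca", name: "USAL" },
  { country: "Spain", city: "Salamanca", name: "UPSA" },
  { country: "Spain", city: "Madrid", name: "Example University" },
];

// Every field listed in the query must be equal in the document (logical AND).
// Only plain equality is handled in this sketch.
function matchAll(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
}

const inSalamanca = matchAll(universities, { country: "Spain", city: "Salamanca" });
console.log(inSalamanca.map((u) => u.name));
```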
In this era of Big Data, non-relational databases are very useful for handling large data sets. Data science and software development teams now use MongoDB routinely. It can be used from popular languages such as Java, JavaScript, Python, and many others. Knowledge of MongoDB and a sound grasp of the aggregation framework can help build a rewarding career.