
Aggregation
Stages

FILE  18_aggregation_stages
TOPIC  $match · $project · $sort · $limit/$skip · $addFields · $count · $out · $merge
LEVEL  Foundation
01
$match
Filter documents — the most important stage for performance
filter

$match filters the document stream using the same query syntax as find(). Documents that do not match the condition are removed from the pipeline. It is the primary performance lever — place it as early as possible.

Basic Syntax

{ $match: { <query> } }

// Equality filter
{ $match: { status: "active" } }

// With operators
{ $match: { score: { $gte: 80 }, status: "published" } }

// Compound — AND (implicit)
{ $match: { category: "tech", views: { $gt: 1000 } } }

// Compound — OR
{ $match: { $or: [ { status: "active" }, { featured: true } ] } }
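The two compound filters above behave like ordinary boolean predicates. A plain-JavaScript sketch of the semantics (illustration only, not how the server evaluates queries):

```javascript
// Implicit AND: every field condition must hold.
const implicitAnd = d => d.category === "tech" && d.views > 1000;
// $or: at least one branch must hold.
const orFilter = d => d.status === "active" || d.featured === true;

const docs = [
  { category: "tech", views: 1500, status: "draft",  featured: true  },
  { category: "tech", views: 500,  status: "active", featured: false },
];

console.log(docs.filter(implicitAnd).length);  // 1 — only the first doc passes both conditions
console.log(docs.filter(orFilter).length);     // 2 — each doc satisfies one branch
```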

Using $expr in $match

$expr allows aggregation expression operators inside a $match condition — enabling comparisons between two fields in the same document.

// Find orders where discount > 20% of total
{ $match: { $expr: { $gt: ["$discount", { $multiply: ["$total", 0.2] }] } } }

// Find documents where endDate field is after startDate
{ $match: { $expr: { $gt: ["$endDate", "$startDate"] } } }

// $expr cannot use indexes on the compared fields in general,
// but CAN use an index if the expression is { $eq: ["$field", value] }
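The field-to-field comparison that $expr enables can be sketched in plain JavaScript, where each document is tested against its own fields (illustration of the semantics, not server code):

```javascript
// Emulates { $match: { $expr: { $gt: ["$discount", { $multiply: ["$total", 0.2] }] } } }
const orders = [
  { _id: 1, total: 100, discount: 25 },  // 25 > 20 → matches
  { _id: 2, total: 100, discount: 15 },  // 15 > 20 fails → filtered out
  { _id: 3, total: 50,  discount: 11 },  // 11 > 10 → matches
];

const matched = orders.filter(doc => doc.discount > doc.total * 0.2);
console.log(matched.map(d => d._id));  // [ 1, 3 ]
```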

$text Search in $match

// Full-text search — $text must be the FIRST stage and requires a text index
db.articles.aggregate([
  { $match: { $text: { $search: "mongodb aggregation" } } },
  { $sort: { score: { $meta: "textScore" } } },
  { $limit: 10 }
])
DANGER
A $match placed at the very beginning of a pipeline can use a collection index, drastically reducing the number of documents processed by subsequent stages. A $match placed after $group or $unwind runs on the transformed stream with no index — potentially filtering millions of intermediate documents.

$match vs find() Filter

| Aspect | find() filter | $match in pipeline |
|---|---|---|
| Syntax | Identical query syntax | Identical query syntax |
| Index usage | Always (if available) | Only at first stage |
| $expr support | Yes | Yes |
| Field availability | Original document fields | Fields at that pipeline position |
| Can appear multiple times | N/A | Yes — filter at any stage |
02
$project
Include, exclude, rename, and compute fields
shape

$project shapes each document in the stream — controlling which fields pass through, renaming fields, and computing new fields using expression operators.

Include and Exclude Syntax

// Include specific fields (1 = include)
{ $project: { name: 1, email: 1 } }
// Output: { _id: ..., name: ..., email: ... }
// _id is included unless explicitly excluded

// Exclude specific fields (0 = exclude)
{ $project: { password: 0, __v: 0 } }
// Output: all fields EXCEPT password and __v

// Exclude _id
{ $project: { name: 1, email: 1, _id: 0 } }
// Only exception: _id can be excluded while other fields are included
WARN
You cannot mix include (1) and exclude (0) in the same $project, except for the _id field. Use either all-includes (pass-list) or all-excludes (block-list). Mixing the two raises a projection error.
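The include-list semantics can be sketched as a small helper: keep only the listed fields, plus _id unless it is explicitly excluded with 0 (illustration only; the server implements this natively):

```javascript
// Minimal sketch of include-list projection with the _id default.
function project(doc, spec) {
  const out = {};
  if (spec._id !== 0 && "_id" in doc) out._id = doc._id;  // _id survives unless excluded
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

const user = { _id: 7, name: "Asha", email: "a@x.io", password: "s3cret" };
console.log(project(user, { name: 1, email: 1 }));
// { _id: 7, name: 'Asha', email: 'a@x.io' }
console.log(project(user, { name: 1, _id: 0 }));
// { name: 'Asha' }
```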

Renaming Fields

// Rename "firstName" to "first" and "lastName" to "last"
{
  $project: {
    first:  "$firstName",
    last:   "$lastName",
    _id:    0
  }
}
// Assigning a field path expression renames the field

Computing New Fields

// Compute fullName, discounted price, and status badge
{
  $project: {
    fullName:  { $concat: ["$firstName", " ", "$lastName"] },
    price:     { $multiply: ["$basePrice", 0.9] },  // 10% discount
    isExpired: { $lt: ["$expiryDate", "$$NOW"] },    // boolean expression
    _id:       0
  }
}

Nested Field Access and Dot Notation

// Include only nested fields
{ $project: { "address.city": 1, "address.zip": 1, _id: 0 } }

// Promote a nested field to the top level
{ $project: { city: "$address.city", _id: 0 } }

$project vs $addFields

| | $project | $addFields / $set |
|---|---|---|
| Passes through all fields? | No — explicit control | Yes — additive |
| Can exclude fields | Yes | No |
| Can add computed fields | Yes | Yes |
| Best when | Precise output shape needed | Just adding/overwriting a few fields |
03
$sort
Order documents — blocking stage with Top-K optimization
order

$sort reorders all documents in the stream. It is a blocking stage — it must consume the entire input before it can emit the first output document. The exception is the Top-K optimization.

Syntax

{ $sort: { field1: 1, field2: -1 } }
// 1 = ascending, -1 = descending
// Multi-field sort: primary sort on field1, tiebreak on field2

// Sort by score descending, then by _id ascending for stable tie-breaking
{ $sort: { score: -1, _id: 1 } }

// Sort by nested field
{ $sort: { "address.city": 1 } }

// Sort by text score (full-text search relevance)
{ $sort: { score: { $meta: "textScore" } } }

Top-K Optimization

When $sort is immediately followed by $limit, MongoDB applies a Top-K optimization — instead of sorting all documents, it maintains a sorted heap of only K items in memory, never loading more than K documents simultaneously.

// Top-K pattern — sort + limit together = memory-efficient
db.products.aggregate([
  { $sort:  { price: -1 } },   // sort descending by price
  { $limit: 10 }               // Top-K: only 10 items in memory at a time
])
// MongoDB recognizes the pattern and optimizes automatically
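The idea behind Top-K can be sketched with a bounded candidate list: never hold more than K documents, discarding the worst candidate on each insertion. A real implementation uses a heap; this plain-JavaScript version (an illustration, not MongoDB's code) shows the principle:

```javascript
// Keep at most k candidates; everything else is discarded immediately.
function topK(docs, k, cmp) {
  const best = [];                      // always sorted by cmp, length <= k
  for (const doc of docs) {
    best.push(doc);
    best.sort(cmp);
    if (best.length > k) best.pop();    // evict the worst candidate
  }
  return best;
}

const prices = [5, 42, 17, 99, 3, 64].map(p => ({ price: p }));
const top3 = topK(prices, 3, (a, b) => b.price - a.price);  // price descending
console.log(top3.map(d => d.price));  // [ 99, 64, 42 ]
```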
TIP
The optimizer coalesces a $sort with a later $limit as long as no intervening stage changes the number of documents (a $unwind or $group between them breaks the optimization). Still, place $limit immediately after $sort when you only need the top N results; it makes the intent explicit and guarantees the optimization applies.

Index Usage with $sort

// $sort at the START of a pipeline (or right after $match) can use an index
db.orders.aggregate([
  { $match:  { status: "active" } },
  { $sort:   { createdAt: -1 } },    // uses index on { status, createdAt }
  { $limit:  20 }
])

// $sort AFTER $group — no index possible (operating on computed data)
db.orders.aggregate([
  { $group: { _id: "$region", total: { $sum: "$amount" } } },
  { $sort:  { total: -1 } }      // in-memory sort only — no index available
])

Sort Stability

MongoDB does not guarantee the relative order of documents with equal sort key values; ties may come back in any order, and the order can differ between runs. For fully deterministic output, add _id as a final tiebreaker, since _id is always unique.
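The tiebreaker pattern can be sketched as a plain JavaScript comparator (illustration only): sort by score descending, and break equal scores on the unique _id so the output is reproducible.

```javascript
// Multi-key comparator: score descending, then _id ascending as tiebreaker.
const docs = [
  { _id: "b", score: 90 },
  { _id: "a", score: 90 },
  { _id: "c", score: 95 },
];

docs.sort((x, y) => (y.score - x.score) || (x._id < y._id ? -1 : 1));
console.log(docs.map(d => d._id));  // [ 'c', 'a', 'b' ] — the 90s tie is broken by _id
```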

| Scenario | Memory Usage | Notes |
|---|---|---|
| $sort alone (all docs) | Up to 100MB (soft limit per stage) | Blocking — must consume all input first |
| $sort + $limit (Top-K) | Only K documents in heap | Optimization applied automatically |
| $sort with allowDiskUse:true | Spills to disk past 100MB | Slower but handles large datasets |
04
$limit & $skip
Control result count and offset — skip() degrades at scale
pagination

$limit

{ $limit: N }
// Pass only the first N documents downstream — all others are discarded

// Get top 5 highest-rated products
db.products.aggregate([
  { $sort:  { rating: -1 } },
  { $limit: 5 }
])

$skip

{ $skip: N }
// Discard the first N documents — pass the rest downstream

// Page 3 with 10 items per page
db.articles.aggregate([
  { $sort:  { createdAt: -1 } },
  { $skip:  20 },               // skip pages 1 and 2
  { $limit: 10 }                // take page 3
])
DANGER
$skip scans and discards the skipped documents — it does not jump to a position. Skipping 10,000 documents means 10,000 documents must pass through the pipeline to be discarded. Performance degrades linearly. For deep pagination use keyset/cursor-based pagination with _id: { $gt: lastId } instead.

Correct Order: $sort → $skip → $limit

// CORRECT order — sort THEN skip THEN limit
db.col.aggregate([
  { $sort:  { name: 1 } },
  { $skip:  40 },
  { $limit: 20 }
])

// WRONG — $limit before $skip cuts the dataset before you can skip
db.col.aggregate([
  { $sort:  { name: 1 } },
  { $limit: 20 },   // ← only 20 docs remain
  { $skip:  40 }    // ← tries to skip 40 from only 20 — returns empty!
])

Keyset Pagination Alternative

// Efficient — O(log n) using index on _id
db.posts.aggregate([
  { $match: { _id: { $gt: lastSeenId } } },  // bookmark from previous page
  { $sort:  { _id: 1 } },
  { $limit: 20 }
])
// No skip, no linear scan — index lookup directly to the bookmark position
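The full paging loop can be sketched in memory: each request starts strictly after the last _id of the previous page (the "bookmark"), so no documents are scanned and discarded. A plain-JavaScript illustration:

```javascript
// Keyset pagination over an in-memory array — the bookmark replaces $skip.
const all = [1, 2, 3, 4, 5, 6, 7].map(n => ({ _id: n }));

function page(afterId, size) {
  return all
    .filter(d => d._id > afterId)     // $match: { _id: { $gt: lastSeenId } }
    .sort((a, b) => a._id - b._id)    // $sort: { _id: 1 }
    .slice(0, size);                  // $limit
}

const p1 = page(0, 3);                      // first page: _id 1..3
const p2 = page(p1[p1.length - 1]._id, 3);  // bookmark = last _id of page 1
console.log(p2.map(d => d._id));  // [ 4, 5, 6 ]
```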
| | skip() Pagination | Keyset Pagination |
|---|---|---|
| Deep page performance | O(n) — degrades | O(log n) — stable |
| Jump to any page | Yes | Sequential only |
| Stable under inserts | Duplicates/skips | Stable |
05
$addFields / $set
Add or overwrite fields while keeping all others
additive

$addFields and $set are exact aliases — they add new fields or overwrite existing fields while passing all other fields through unchanged. Unlike $project, they never drop fields.

Basic Usage

// $addFields and $set are identical in behavior
{ $addFields: { newField: expression } }
{ $set:       { newField: expression } }  // alias — prefer $set in newer code (MongoDB 4.2+)

// Add a computed field without losing originals
db.orders.aggregate([
  {
    $addFields: {
      totalWithTax: { $multiply: ["$total", 1.18] },   // new field
      status:       "processed"                         // overwrite existing field
    }
  }
])
// All original fields are preserved; totalWithTax is added; status is overwritten
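The additive behavior resembles a shallow object merge: all original fields pass through, and the listed fields are added or overwritten. A plain-JavaScript sketch of the semantics (illustration only; the tax value is rounded to avoid float noise):

```javascript
const order = { _id: 1, total: 100, status: "new", customer: "C42" };

const result = {
  ...order,                                       // originals pass through
  totalWithTax: Math.round(order.total * 1.18),   // new computed field
  status: "processed",                            // overwrite existing field
};

console.log(result);
// { _id: 1, total: 100, status: 'processed', customer: 'C42', totalWithTax: 118 }
```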

Overwriting Nested Fields

// Add a field inside a nested object
db.users.aggregate([
  {
    $addFields: {
      "profile.score": { $add: ["$profile.score", 10] }
    }
  }
])

// Replace a nested object entirely
db.users.aggregate([
  {
    $addFields: {
      address: {
        city:    "$address.city",
        country: "India",           // add new sub-field
        zip:     "$address.zip"
      }
    }
  }
])

Removing Fields with $unset

$unset is the complement of $addFields — it removes specific fields while passing everything else through.

// Remove one or more fields
{ $unset: "password" }
{ $unset: ["password", "ssn", "__v"] }

// Equivalent $project
{ $project: { password: 0, ssn: 0, __v: 0 } }
// $unset is more readable when removing a few fields from a large document
TIP
Use $set / $addFields when you want to augment documents mid-pipeline (e.g., compute a field for sorting, then remove it later with $unset or $project). The combination $set → sort/filter → $project is a common pattern for temp computed fields.
06
$count
Count documents passing through the pipeline
count

$count outputs a single document containing the count of documents that reach this stage. It replaces the pattern of $group: { _id:null, n:{$sum:1} } with a cleaner syntax.

Syntax and Usage

{ $count: "fieldName" }
// Outputs: { fieldName: N } — a single document with the count

// Count published tech articles
db.articles.aggregate([
  { $match: { status: "published", category: "tech" } },
  { $count: "publishedTechCount" }
])
// Output: [ { publishedTechCount: 142 } ]

// Count per-group — $count alone doesn't group; use $group for that
db.orders.aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$region", orderCount: { $sum: 1 } } },
  { $sort:  { orderCount: -1 } }
])

$count vs countDocuments()

| | { $count: "n" } | countDocuments() |
|---|---|---|
| Result | Single pipeline document | Returns a number directly |
| Use in pipeline | Yes — a stage | No — separate collection method |
| Applies pipeline filters? | Yes — counts what reaches stage | Yes — filter argument |
| Best for | Counting mid-pipeline after transforms | Simple filtered count queries |
07
$out & $merge
Write pipeline results to a collection
output

Both stages persist pipeline results to a MongoDB collection. They must be the last stage in a pipeline. The key difference: $out replaces the entire target collection atomically; $merge performs an upsert merge.

$out — Replace Collection

// Simple $out — write to a collection in the same database
{ $out: "reportCollection" }

// Extended form — specify database
{ $out: { db: "reporting", coll: "monthly_summary" } }

// Usage
db.orders.aggregate([
  { $match:  { status: "completed", year: 2024 } },
  { $group:  { _id: "$region", total: { $sum: "$amount" } } },
  { $sort:   { total: -1 } },
  { $out:    "regional_totals_2024" }
])
// regional_totals_2024 is REPLACED atomically with the new results
// Existing indexes on the target collection are preserved; the pipeline
// fails if the new documents violate any unique index (including _id)
WARN
$out replaces the entire target collection atomically: all existing documents are deleted, while existing indexes survive the swap (and the pipeline fails if the new documents violate a unique index). If you need to add/update rather than replace, use $merge instead.

$merge — Upsert into Collection

{
  $merge: {
    into:           "targetCollection",    // or { db: "...", coll: "..." }
    on:             "_id",                  // match key (default: "_id")
    whenMatched:    "merge",               // merge | replace | keepExisting | fail | pipeline
    whenNotMatched: "insert"               // insert | discard | fail
  }
}

// Incrementally update a running summary
db.dailyOrders.aggregate([
  { $group: { _id: "$productId", sold: { $sum: "$qty" } } },
  {
    $merge: {
      into:           "productTotals",
      on:             "_id",
      whenMatched:    "merge",     // merges fields from new doc into existing doc
      whenNotMatched: "insert"     // new products are inserted
    }
  }
])

$merge whenMatched Options

| whenMatched value | Behavior |
|---|---|
| "merge" | Merge new document fields into existing (existing fields not in new doc survive) |
| "replace" | Replace the entire matched document with the new one |
| "keepExisting" | Keep the existing document, discard the new one |
| "fail" | Throw an error if a match is found |
| [ pipeline ] | Apply an aggregation pipeline to produce the final merged document |
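The three simple whenMatched modes can be sketched as functions over a matched (existing, incoming) pair; a plain-JavaScript illustration of the semantics, not the server implementation:

```javascript
const existing = { _id: "p1", sold: 10, category: "toys" };
const incoming = { _id: "p1", sold: 25 };

const modes = {
  merge:        (old, nu) => ({ ...old, ...nu }),  // old fields survive unless overwritten
  replace:      (old, nu) => nu,                   // incoming document wins entirely
  keepExisting: (old, nu) => old,                  // incoming document discarded
};

console.log(modes.merge(existing, incoming));
// { _id: 'p1', sold: 25, category: 'toys' } — category survives, sold is overwritten
console.log(modes.keepExisting(existing, incoming).sold);  // 10
```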

$out vs $merge

| | $out | $merge |
|---|---|---|
| Existing documents | All dropped (full replace) | Updated or preserved (merge logic) |
| Indexes on target | Preserved (unique indexes enforced) | Preserved |
| Incremental updates | No | Yes |
| Target can be same collection | No (same DB allowed) | Yes |
| MongoDB version | 2.6+ | 4.2+ |