
Aggregation
Stages

FILE  18_aggregation_stages
TOPIC  $match · $project · $sort · $limit/$skip · $addFields · $count · $out · $merge
LEVEL  Foundation
01
$match
Filter documents — the most important stage for performance
filter

$match filters the document stream using the same query syntax as find(). Documents that do not match the condition are removed from the pipeline. It is the primary performance lever — place it as early as possible.

Basic Syntax

{ $match: { <query> } }

// Equality filter
{ $match: { status: "active" } }

// With operators
{ $match: { score: { $gte: 80 }, status: "published" } }

// Compound — AND (implicit)
{ $match: { category: "tech", views: { $gt: 1000 } } }

// Compound — OR
{ $match: { $or: [ { status: "active" }, { featured: true } ] } }
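The two compound filters above behave like ordinary boolean predicates. A plain-JavaScript sketch of the semantics (illustration only, not how the server evaluates queries):

```javascript
// Implicit AND: every field condition must hold.
const implicitAnd = d => d.category === "tech" && d.views > 1000;
// $or: at least one branch must hold.
const orFilter = d => d.status === "active" || d.featured === true;

const docs = [
  { category: "tech", views: 1500, status: "draft",  featured: true  },
  { category: "tech", views: 500,  status: "active", featured: false },
];

console.log(docs.filter(implicitAnd).length);  // 1 — only the first doc passes both conditions
console.log(docs.filter(orFilter).length);     // 2 — each doc satisfies one branch
```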

Using $expr in $match

$expr allows aggregation expression operators inside a $match condition — enabling comparisons between two fields in the same document.

// Find orders where discount > 20% of total
{ $match: { $expr: { $gt: ["$discount", { $multiply: ["$total", 0.2] }] } } }

// Find documents where endDate field is after startDate
{ $match: { $expr: { $gt: ["$endDate", "$startDate"] } } }

// $expr cannot use indexes on the compared fields in general,
// but CAN use an index if the expression is { $eq: ["$field", value] }
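The field-to-field comparison that $expr enables can be sketched in plain JavaScript, where each document is tested against its own fields (illustration of the semantics, not server code):

```javascript
// Emulates { $match: { $expr: { $gt: ["$discount", { $multiply: ["$total", 0.2] }] } } }
const orders = [
  { _id: 1, total: 100, discount: 25 },  // 25 > 20 → matches
  { _id: 2, total: 100, discount: 15 },  // 15 > 20 fails → filtered out
  { _id: 3, total: 50,  discount: 11 },  // 11 > 10 → matches
];

const matched = orders.filter(doc => doc.discount > doc.total * 0.2);
console.log(matched.map(d => d._id));  // [ 1, 3 ]
```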

$text Search in $match

// Full-text search — $text must be the FIRST stage and requires a text index
db.articles.aggregate([
  { $match: { $text: { $search: "mongodb aggregation" } } },
  { $sort: { score: { $meta: "textScore" } } },
  { $limit: 10 }
])
DANGER
A $match placed at the very beginning of a pipeline can use a collection index, drastically reducing the number of documents processed by subsequent stages. A $match placed after $group or $unwind runs on the transformed stream with no index — potentially filtering millions of intermediate documents.

$match vs find() Filter

| Aspect | find() filter | $match in pipeline |
|---|---|---|
| Syntax | Identical query syntax | Identical query syntax |
| Index usage | Always (if available) | Only at first stage |
| $expr support | Yes | Yes |
| Field availability | Original document fields | Fields at that pipeline position |
| Can appear multiple times | N/A | Yes — filter at any stage |
02
$project
Include, exclude, rename, and compute fields
shape

$project shapes each document in the stream — controlling which fields pass through, renaming fields, and computing new fields using expression operators.

Include and Exclude Syntax

// Include specific fields (1 = include)
{ $project: { name: 1, email: 1 } }
// Output: { _id: ..., name: ..., email: ... }
// _id is included unless explicitly excluded

// Exclude specific fields (0 = exclude)
{ $project: { password: 0, __v: 0 } }
// Output: all fields EXCEPT password and __v

// Exclude _id
{ $project: { name: 1, email: 1, _id: 0 } }
// Only exception: _id can be excluded while other fields are included
WARN
You cannot mix include (1) and exclude (0) in the same $project, except for the _id field. Use either all-includes (pass-list) or all-excludes (block-list). Mixing the two raises a projection error.
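The include-list semantics can be sketched as a small helper: keep only the listed fields, plus _id unless it is explicitly excluded with 0 (illustration only; the server implements this natively):

```javascript
// Minimal sketch of include-list projection with the _id default.
function project(doc, spec) {
  const out = {};
  if (spec._id !== 0 && "_id" in doc) out._id = doc._id;  // _id survives unless excluded
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

const user = { _id: 7, name: "Asha", email: "a@x.io", password: "s3cret" };
console.log(project(user, { name: 1, email: 1 }));
// { _id: 7, name: 'Asha', email: 'a@x.io' }
console.log(project(user, { name: 1, _id: 0 }));
// { name: 'Asha' }
```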

Renaming Fields

// Rename "firstName" to "first" and "lastName" to "last"
{
  $project: {
    first:  "$firstName",
    last:   "$lastName",
    _id:    0
  }
}
// Assigning a field path expression renames the field

Computing New Fields

// Compute fullName, discounted price, and status badge
{
  $project: {
    fullName:  { $concat: ["$firstName", " ", "$lastName"] },
    price:     { $multiply: ["$basePrice", 0.9] },  // 10% discount
    isExpired: { $lt: ["$expiryDate", "$$NOW"] },    // boolean expression
    _id:       0
  }
}

Nested Field Access and Dot Notation

// Include only nested fields
{ $project: { "address.city": 1, "address.zip": 1, _id: 0 } }

// Promote a nested field to the top level
{ $project: { city: "$address.city", _id: 0 } }

$project vs $addFields

| | $project | $addFields / $set |
|---|---|---|
| Passes through all fields? | No — explicit control | Yes — additive |
| Can exclude fields | Yes | No |
| Can add computed fields | Yes | Yes |
| Best when | Precise output shape needed | Just adding/overwriting a few fields |
03
$sort
Order documents — blocking stage with Top-K optimization
order

$sort reorders all documents in the stream. It is a blocking stage — it must consume the entire input before it can emit the first output document. The exception is the Top-K optimization.

Syntax

{ $sort: { field1: 1, field2: -1 } }
// 1 = ascending, -1 = descending
// Multi-field sort: primary sort on field1, tiebreak on field2

// Sort by score descending, then by _id ascending for stable tie-breaking
{ $sort: { score: -1, _id: 1 } }

// Sort by nested field
{ $sort: { "address.city": 1 } }

// Sort by text score (full-text search relevance)
{ $sort: { score: { $meta: "textScore" } } }

Top-K Optimization

When $sort is immediately followed by $limit, MongoDB applies a Top-K optimization — instead of sorting all documents, it maintains a sorted heap of only K items in memory, never loading more than K documents simultaneously.

// Top-K pattern — sort + limit together = memory-efficient
db.products.aggregate([
  { $sort:  { price: -1 } },   // sort descending by price
  { $limit: 10 }               // Top-K: only 10 items in memory at a time
])
// MongoDB recognizes the pattern and optimizes automatically
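The idea behind Top-K can be sketched with a bounded candidate list: never hold more than K documents, discarding the worst candidate on each insertion. A real implementation uses a heap; this plain-JavaScript version (an illustration, not MongoDB's code) shows the principle:

```javascript
// Keep at most k candidates; everything else is discarded immediately.
function topK(docs, k, cmp) {
  const best = [];                      // always sorted by cmp, length <= k
  for (const doc of docs) {
    best.push(doc);
    best.sort(cmp);
    if (best.length > k) best.pop();    // evict the worst candidate
  }
  return best;
}

const prices = [5, 42, 17, 99, 3, 64].map(p => ({ price: p }));
const top3 = topK(prices, 3, (a, b) => b.price - a.price);  // price descending
console.log(top3.map(d => d.price));  // [ 99, 64, 42 ]
```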
TIP
The optimizer coalesces a $sort with a later $limit as long as no intervening stage changes the number of documents (a $unwind or $group between them breaks the optimization). Still, place $limit immediately after $sort when you only need the top N results; it makes the intent explicit and guarantees the optimization applies.

Index Usage with $sort

// $sort at the START of a pipeline (or right after $match) can use an index
db.orders.aggregate([
  { $match:  { status: "active" } },
  { $sort:   { createdAt: -1 } },    // uses index on { status, createdAt }
  { $limit:  20 }
])

// $sort AFTER $group — no index possible (operating on computed data)
db.orders.aggregate([
  { $group: { _id: "$region", total: { $sum: "$amount" } } },
  { $sort:  { total: -1 } }      // in-memory sort only — no index available
])

Sort Stability

MongoDB does not guarantee the relative order of documents with equal sort key values; ties may come back in any order, and the order can differ between runs. For fully deterministic output, add _id as a final tiebreaker, since _id is always unique.
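The tiebreaker pattern can be sketched as a plain JavaScript comparator (illustration only): sort by score descending, and break equal scores on the unique _id so the output is reproducible.

```javascript
// Multi-key comparator: score descending, then _id ascending as tiebreaker.
const docs = [
  { _id: "b", score: 90 },
  { _id: "a", score: 90 },
  { _id: "c", score: 95 },
];

docs.sort((x, y) => (y.score - x.score) || (x._id < y._id ? -1 : 1));
console.log(docs.map(d => d._id));  // [ 'c', 'a', 'b' ] — the 90s tie is broken by _id
```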

| Scenario | Memory Usage | Notes |
|---|---|---|
| $sort alone (all docs) | Up to 100MB (soft limit per stage) | Blocking — must consume all input first |
| $sort + $limit (Top-K) | Only K documents in heap | Optimization applied automatically |
| $sort with allowDiskUse:true | Spills to disk past 100MB | Slower but handles large datasets |
04
$limit & $skip
Control result count and offset — skip() degrades at scale
pagination

$limit

{ $limit: N }
// Pass only the first N documents downstream — all others are discarded

// Get top 5 highest-rated products
db.products.aggregate([
  { $sort:  { rating: -1 } },
  { $limit: 5 }
])

$skip

{ $skip: N }
// Discard the first N documents — pass the rest downstream

// Page 3 with 10 items per page
db.articles.aggregate([
  { $sort:  { createdAt: -1 } },
  { $skip:  20 },               // skip pages 1 and 2
  { $limit: 10 }                // take page 3
])
DANGER
$skip scans and discards the skipped documents — it does not jump to a position. Skipping 10,000 documents means 10,000 documents must pass through the pipeline to be discarded. Performance degrades linearly. For deep pagination use keyset/cursor-based pagination with _id: { $gt: lastId } instead.

Correct Order: $sort → $skip → $limit

// CORRECT order — sort THEN skip THEN limit
db.col.aggregate([
  { $sort:  { name: 1 } },
  { $skip:  40 },
  { $limit: 20 }
])

// WRONG — $limit before $skip cuts the dataset before you can skip
db.col.aggregate([
  { $sort:  { name: 1 } },
  { $limit: 20 },   // ← only 20 docs remain
  { $skip:  40 }    // ← tries to skip 40 from only 20 — returns empty!
])

Keyset Pagination Alternative

// Efficient — O(log n) using index on _id
db.posts.aggregate([
  { $match: { _id: { $gt: lastSeenId } } },  // bookmark from previous page
  { $sort:  { _id: 1 } },
  { $limit: 20 }
])
// No skip, no linear scan — index lookup directly to the bookmark position
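The full paging loop can be sketched in memory: each request starts strictly after the last _id of the previous page (the "bookmark"), so no documents are scanned and discarded. A plain-JavaScript illustration:

```javascript
// Keyset pagination over an in-memory array — the bookmark replaces $skip.
const all = [1, 2, 3, 4, 5, 6, 7].map(n => ({ _id: n }));

function page(afterId, size) {
  return all
    .filter(d => d._id > afterId)     // $match: { _id: { $gt: lastSeenId } }
    .sort((a, b) => a._id - b._id)    // $sort: { _id: 1 }
    .slice(0, size);                  // $limit
}

const p1 = page(0, 3);                      // first page: _id 1..3
const p2 = page(p1[p1.length - 1]._id, 3);  // bookmark = last _id of page 1
console.log(p2.map(d => d._id));  // [ 4, 5, 6 ]
```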
| | skip() Pagination | Keyset Pagination |
|---|---|---|
| Deep page performance | O(n) — degrades | O(log n) — stable |
| Jump to any page | Yes | Sequential only |
| Stable under inserts | Duplicates/skips | Stable |
05
$addFields / $set
Add or overwrite fields while keeping all others
additive

$addFields and $set are exact aliases — they add new fields or overwrite existing fields while passing all other fields through unchanged. Unlike $project, they never drop fields.

Basic Usage

// $addFields and $set are identical in behavior
{ $addFields: { newField: expression } }
{ $set:       { newField: expression } }  // alias — prefer $set in newer code (MongoDB 4.2+)

// Add a computed field without losing originals
db.orders.aggregate([
  {
    $addFields: {
      totalWithTax: { $multiply: ["$total", 1.18] },   // new field
      status:       "processed"                         // overwrite existing field
    }
  }
])
// All original fields are preserved; totalWithTax is added; status is overwritten
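The additive behavior resembles a shallow object merge: all original fields pass through, and the listed fields are added or overwritten. A plain-JavaScript sketch of the semantics (illustration only; the tax value is rounded to avoid float noise):

```javascript
const order = { _id: 1, total: 100, status: "new", customer: "C42" };

const result = {
  ...order,                                       // originals pass through
  totalWithTax: Math.round(order.total * 1.18),   // new computed field
  status: "processed",                            // overwrite existing field
};

console.log(result);
// { _id: 1, total: 100, status: 'processed', customer: 'C42', totalWithTax: 118 }
```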

Overwriting Nested Fields

// Add a field inside a nested object
db.users.aggregate([
  {
    $addFields: {
      "profile.score": { $add: ["$profile.score", 10] }
    }
  }
])

// Replace a nested object entirely
db.users.aggregate([
  {
    $addFields: {
      address: {
        city:    "$address.city",
        country: "India",           // add new sub-field
        zip:     "$address.zip"
      }
    }
  }
])

Removing Fields with $unset

$unset is the complement of $addFields — it removes specific fields while passing everything else through.

// Remove one or more fields
{ $unset: "password" }
{ $unset: ["password", "ssn", "__v"] }

// Equivalent $project
{ $project: { password: 0, ssn: 0, __v: 0 } }
// $unset is more readable when removing a few fields from a large document
TIP
Use $set / $addFields when you want to augment documents mid-pipeline (e.g., compute a field for sorting, then remove it later with $unset or $project). The combination $set → sort/filter → $project is a common pattern for temp computed fields.
06
$count
Count documents passing through the pipeline
count

$count outputs a single document containing the count of documents that reach this stage. It replaces the pattern of $group: { _id:null, n:{$sum:1} } with a cleaner syntax.

Syntax and Usage

{ $count: "fieldName" }
// Outputs: { fieldName: N } — a single document with the count

// Count published tech articles
db.articles.aggregate([
  { $match: { status: "published", category: "tech" } },
  { $count: "publishedTechCount" }
])
// Output: [ { publishedTechCount: 142 } ]

// Count per-group — $count alone doesn't group; use $group for that
db.orders.aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$region", orderCount: { $sum: 1 } } },
  { $sort:  { orderCount: -1 } }
])

$count vs countDocuments()

| | { $count: "n" } | countDocuments() |
|---|---|---|
| Result | Single pipeline document | Returns a number directly |
| Use in pipeline | Yes — a stage | No — separate collection method |
| Applies pipeline filters? | Yes — counts what reaches stage | Yes — filter argument |
| Best for | Counting mid-pipeline after transforms | Simple filtered count queries |
07
$out & $merge
Write pipeline results to a collection
output

Both stages persist pipeline results to a MongoDB collection. They must be the last stage in a pipeline. The key difference: $out replaces the entire target collection atomically; $merge performs an upsert merge.

$out — Replace Collection

// Simple $out — write to a collection in the same database
{ $out: "reportCollection" }

// Extended form — specify database
{ $out: { db: "reporting", coll: "monthly_summary" } }

// Usage
db.orders.aggregate([
  { $match:  { status: "completed", year: 2024 } },
  { $group:  { _id: "$region", total: { $sum: "$amount" } } },
  { $sort:   { total: -1 } },
  { $out:    "regional_totals_2024" }
])
// regional_totals_2024 is REPLACED atomically with the new results
// Existing indexes on the target collection are preserved; the pipeline
// fails if the new documents violate any unique index (including _id)
WARN
$out replaces the entire target collection atomically: all existing documents are deleted, while existing indexes survive the swap (and the pipeline fails if the new documents violate a unique index). If you need to add/update rather than replace, use $merge instead.

$merge — Upsert into Collection

{
  $merge: {
    into:           "targetCollection",    // or { db: "...", coll: "..." }
    on:             "_id",                  // match key (default: "_id")
    whenMatched:    "merge",               // merge | replace | keepExisting | fail | pipeline
    whenNotMatched: "insert"               // insert | discard | fail
  }
}

// Incrementally update a running summary
db.dailyOrders.aggregate([
  { $group: { _id: "$productId", sold: { $sum: "$qty" } } },
  {
    $merge: {
      into:           "productTotals",
      on:             "_id",
      whenMatched:    "merge",     // merges fields from new doc into existing doc
      whenNotMatched: "insert"     // new products are inserted
    }
  }
])

$merge whenMatched Options

| whenMatched value | Behavior |
|---|---|
| "merge" | Merge new document fields into existing (existing fields not in new doc survive) |
| "replace" | Replace the entire matched document with the new one |
| "keepExisting" | Keep the existing document, discard the new one |
| "fail" | Throw an error if a match is found |
| [ pipeline ] | Apply an aggregation pipeline to produce the final merged document |
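The three simple whenMatched modes can be sketched as functions over a matched (existing, incoming) pair; a plain-JavaScript illustration of the semantics, not the server implementation:

```javascript
const existing = { _id: "p1", sold: 10, category: "toys" };
const incoming = { _id: "p1", sold: 25 };

const modes = {
  merge:        (old, nu) => ({ ...old, ...nu }),  // old fields survive unless overwritten
  replace:      (old, nu) => nu,                   // incoming document wins entirely
  keepExisting: (old, nu) => old,                  // incoming document discarded
};

console.log(modes.merge(existing, incoming));
// { _id: 'p1', sold: 25, category: 'toys' } — category survives, sold is overwritten
console.log(modes.keepExisting(existing, incoming).sold);  // 10
```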

$out vs $merge

| | $out | $merge |
|---|---|---|
| Existing documents | All dropped (full replace) | Updated or preserved (merge logic) |
| Indexes on target | Preserved (unique indexes enforced) | Preserved |
| Incremental updates | No | Yes |
| Target can be same collection | No (same DB allowed) | Yes |
| MongoDB version | 2.6+ | 4.2+ |