← back

Compound
Indexes

FILE  25_compound_indexes
TOPIC  Prefix Rule · ESR Rule · Sort Direction · Covered Queries · Field Order Strategy
LEVEL  Intermediate
01
What is a Compound Index
Multi-field B-Tree — field order is critical
concept

A compound index is a single B-Tree built on two or more fields. The index stores entries sorted by the first field, then by the second field within each first-field group, and so on. Because the sort order is determined by field sequence, the order of fields in a compound index definition directly determines what queries it can serve.

// Create a compound index — field order matters
db.orders.createIndex(
  { status: 1, customerId: 1, createdAt: -1 },
  { name: "idx_status_customer_created" }
)

// Up to 32 fields in a single compound index
// Direction (1/-1) matters for sort operations; irrelevant for single-field equality

How the B-Tree is Structured

For an index on { status: 1, customerId: 1 }, the B-Tree sorts all documents first by status, then by customerId within each status group. This means the index can efficiently locate a specific { status, customerId } pair, or all documents for a particular status — but it cannot efficiently find all documents for a particular customerId without knowing the status first.

// Conceptual layout of { status: 1, customerId: 1 } index:
//   "active"   → C001 → doc ptr
//   "active"   → C002 → doc ptr
//   "active"   → C003 → doc ptr
//   "closed"   → C001 → doc ptr
//   "closed"   → C004 → doc ptr
//   "pending"  → C002 → doc ptr
//
// Can scan: { status: "active" } — contiguous block
// Cannot scan: { customerId: "C001" } — scattered throughout
NOTE
A compound index on { a, b, c } partially replaces the need for standalone indexes on { a } and { a, b } — it serves all three. However, it does NOT replace { b }, { c }, or { b, c }. Always verify with explain() that the index is actually used for each query pattern.
02
The Prefix Rule
An index serves a query only if the query uses a left-prefix of the index fields
prefix

A compound index on { a, b, c } can serve any query that references a contiguous left-prefix of its fields — starting from the leftmost field. Skipping the first field makes the index unusable for filtering (though it may still be used for sort).

Prefix Coverage Table

For index { status: 1, customerId: 1, createdAt: -1 }:

Query FilterUses Index?Why
{ status: "active" }YesLeft prefix: status only
{ status: "active", customerId: "C1" }YesLeft prefix: status + customerId
{ status: "active", customerId: "C1", createdAt: {$gte:...} }YesFull index used
{ customerId: "C1" }NoSkips status (leftmost field)
{ customerId: "C1", createdAt: {$gte:...} }NoSkips status
{ status: "active", createdAt: {$gte:...} }PartialUses status prefix only; createdAt not contiguous
// Index: { status: 1, customerId: 1, createdAt: -1 }

// ✅ IXSCAN — uses prefix { status }
db.orders.find({ status: "active" })

// ✅ IXSCAN — uses prefix { status, customerId }
db.orders.find({ status: "active", customerId: "C001" })

// ❌ COLLSCAN — customerId alone, no status prefix
db.orders.find({ customerId: "C001" })

// ⚠️ Partial IXSCAN — status prefix allows index entry,
// but createdAt is range-filtered in-memory after status scan
db.orders.find({ status: "active", createdAt: { $gte: ISODate("2024-01-01") } })
WARN
The prefix rule applies to the filter fields, not the query field order in your code. MongoDB's query optimizer normalizes the query filter — writing { customerId: "C1", status: "active" } is identical to { status: "active", customerId: "C1" }. What matters is which fields are present, not the order they appear in your find() call.

When a Non-Prefix Field IS Used (Sort Only)

// Index: { status: 1, createdAt: -1 }
// Query filters on status (prefix), sorts on createdAt (second field)
db.orders
  .find({ status: "active" })
  .sort({ createdAt: -1 })
// ✅ Both filter AND sort served by the index — no in-memory sort
03
The ESR Rule
Equality → Sort → Range: optimal field ordering for compound indexes
strategy

The ESR Rule is the most important compound index design principle. It prescribes the optimal order for fields in a compound index when a query combines equality filters, sort fields, and range filters.

  • E — Equality: Fields tested with exact match ({ field: value }) go first
  • S — Sort: Fields used in .sort() go in the middle
  • R — Range: Fields in range queries ($gt, $lt, $in, $regex) go last

Why ESR Works

Equality fields placed first collapse the index scan to the smallest possible contiguous block. Sort fields in the middle eliminate in-memory sorting because the index is already ordered for that range. Range fields at the end are applied after equality narrows the scan, minimising the range scan width.

// Query: active orders for customer C001, sorted by date, created after Jan 2024
// Fields: status (equality), customerId (equality), createdAt (sort), amount (range)

// ❌ BAD order — range before sort
db.orders.createIndex({ status: 1, amount: 1, createdAt: -1, customerId: 1 })

// ✅ GOOD order using ESR rule:
//   E: status, customerId  (equality)
//   S: createdAt           (sort)
//   R: amount              (range)
db.orders.createIndex(
  { status: 1, customerId: 1, createdAt: -1, amount: 1 },
  { name: "idx_esr_orders" }
)

// The query this index serves optimally:
db.orders
  .find({
    status:     "active",           // E
    customerId: "C001",             // E
    amount:     { $gt: 100 }        // R
  })
  .sort({ createdAt: -1 })          // S

Range before Sort — Why It Breaks Sort Elimination

// Index: { status: 1, amount: 1, createdAt: -1 }
// Query: { status: "active", amount: { $gt: 100 } }, sort: { createdAt: -1 }
//
// Problem: After scanning { status: "active", amount: >100 },
// the remaining entries are NOT sorted by createdAt — amount range
// scattered the createdAt values across the scan range.
// Result: in-memory SORT stage required (blocking, up to 100MB)

// Fix: move sort field before range field (ESR)
// Index: { status: 1, createdAt: -1, amount: 1 }
// Now createdAt is contiguous within each status group → sort eliminated
TIP
When there is no sort in the query, the ESR rule simplifies to ER: equality fields first, range fields last. The sort slot is empty. If there is no range either, just put all equality fields in selectivity order (highest cardinality first).
04
Sort Direction Compatibility
Index direction must match or be the exact inverse of the sort
direction

For a compound index to eliminate an in-memory sort, the sort direction of the query must exactly match the index field directions, or be the exact inverse of them. Any partial mismatch forces an in-memory sort.

Matching Rules

// Index: { createdAt: -1, score: 1 }

// ✅ Exact match — no in-memory sort
.sort({ createdAt: -1, score: 1 })

// ✅ Exact inverse — index traversed backwards, no in-memory sort
.sort({ createdAt: 1, score: -1 })

// ❌ Partial mismatch — in-memory sort required
.sort({ createdAt: -1, score: -1 })   // score direction flipped
.sort({ createdAt: 1, score: 1 })     // score direction flipped

Direction Matters Only for Multi-Field Sort

// Index: { status: 1, createdAt: -1 }

// Equality on status → only createdAt sort remains
// Single remaining sort field → direction irrelevant (B-Tree traversed either way)
db.orders.find({ status: "active" }).sort({ createdAt: 1 })   // ✅ no sort stage
db.orders.find({ status: "active" }).sort({ createdAt: -1 })  // ✅ no sort stage

// No equality prefix → sort on both fields
// Both directions must match or be exact inverse
db.orders.find({}).sort({ status: 1, createdAt: -1 })   // ✅ exact match
db.orders.find({}).sort({ status: -1, createdAt: 1 })   // ✅ exact inverse
db.orders.find({}).sort({ status: 1, createdAt: 1 })    // ❌ partial mismatch

Designing for Bidirectional Sort Needs

// If you need BOTH { createdAt -1, score -1 } AND { createdAt 1, score 1 }:
// → One index handles both (exact match + exact inverse)
db.posts.createIndex({ createdAt: -1, score: -1 })    // serves both sort directions

// If you need BOTH { createdAt -1, score 1 } AND { createdAt 1, score -1 }:
// → Same: one index handles both (exact match + exact inverse)
db.posts.createIndex({ createdAt: -1, score: 1 })     // serves both

// If you need BOTH { createdAt -1, score 1 } AND { createdAt -1, score -1 }:
// → These are NOT inverses → need TWO separate indexes
Index DirectionSort QueryIn-Memory Sort?
{ a: 1, b: 1 }{ a: 1, b: 1 }No — exact match
{ a: 1, b: 1 }{ a: -1, b: -1 }No — exact inverse
{ a: 1, b: 1 }{ a: 1, b: -1 }Yes — partial mismatch
{ a: 1, b: 1 }{ a: -1, b: 1 }Yes — partial mismatch
05
Covered Queries
Zero document fetches — results served entirely from the index
covered

A covered query is one where every field in the filter, every field in the projection, and (for compound indexes) every field in the sort is present in the index. MongoDB can return results entirely from the index without reading any documents from disk — totalDocsExamined: 0 in explain output.

Requirements for a Covered Query

  • All fields in the query filter are in the index
  • All fields in the projection are in the index
  • _id must be explicitly excluded from the projection (_id: 0) unless _id is in the index
  • The query must not touch an array field (multikey index) — multikey indexes cannot cover queries
  • The document must not contain the queried field as an array
// Index: { email: 1, name: 1, role: 1 }

// ✅ COVERED — all filter + projection fields in index; _id excluded
db.users.find(
  { email: "user@example.com" },       // filter field: in index
  { name: 1, role: 1, _id: 0 }         // projected fields: in index; _id excluded
)
// explain(): totalDocsExamined: 0 — never touched disk documents

// ❌ NOT COVERED — _id is included by default
db.users.find(
  { email: "user@example.com" },
  { name: 1, role: 1 }                 // _id implicitly included, not in index
)

// ❌ NOT COVERED — projecting a field not in the index
db.users.find(
  { email: "user@example.com" },
  { name: 1, createdAt: 1, _id: 0 }   // createdAt not in index
)

Verifying Coverage with explain()

// Run executionStats and look for totalDocsExamined: 0
db.users.find(
  { email: "user@example.com" },
  { name: 1, role: 1, _id: 0 }
).explain("executionStats")
// Key indicators of a covered query:
//   winningPlan.stage: "PROJECTION_COVERED"  ← or IXSCAN with no FETCH
//   totalDocsExamined: 0                     ← zero disk reads
//   totalKeysExamined: 1                     ← index keys scanned
TIP
Covered queries are the highest efficiency tier available. For hot read paths (e.g., session lookups, API key validation, user profile fetches for auth), designing specifically for covered queries can eliminate disk I/O entirely. The index must be in WiredTiger cache for this to be fast — verify with db.collection.stats().indexSizes.

Multikey Indexes Cannot Cover Queries

// If any document has the indexed field as an array,
// MongoDB marks the index as multikey — covered queries are then impossible
db.products.createIndex({ tags: 1, name: 1 })

// { tags: ["a", "b"], name: "Widget" } → index is multikey
db.products.find(
  { tags: "a" },
  { name: 1, _id: 0 }
)
// totalDocsExamined will be > 0 even with the index — multikey always fetches docs
06
Field Order Strategy
Designing compound indexes for real query patterns
design

There is no universal "best" compound index — the right index depends entirely on the actual query patterns it must serve. The following principles guide field order decisions.

Step-by-Step Design Process

  • 1. Identify the query pattern: list all filter fields, sort fields, and projection fields
  • 2. Apply ESR: Equality → Sort → Range
  • 3. Within equality group: put highest-selectivity (most distinct values) fields first to narrow the scan fastest
  • 4. Verify with explain(): confirm IXSCAN, check totalDocsExamined vs nReturned
// Scenario: e-commerce orders dashboard
// Query: find "completed" orders for region "UK", sorted by amount desc, in last 30 days
db.orders.find(
  {
    status:  "completed",                            // E — low cardinality (5 values)
    region:  "UK",                                   // E — medium cardinality (~20 values)
    createdAt: { $gte: ISODate("2024-01-01") }       // R — high cardinality (range)
  }
).sort({ amount: -1 })                              // S

// ESR order: status (E), region (E), amount (S), createdAt (R)
db.orders.createIndex(
  { status: 1, region: 1, amount: -1, createdAt: 1 },
  { name: "idx_status_region_amount_created" }
)

High-Selectivity First Within Equality Fields

// Two equality fields: userId (millions of users) vs status (5 values)
// userId has WAY higher cardinality → put it FIRST

// ✅ BETTER — userId first narrows to a single user's docs immediately
db.orders.createIndex({ userId: 1, status: 1 })

// ❌ WORSE — status first groups by 5 buckets, then scans within bucket for userId
db.orders.createIndex({ status: 1, userId: 1 })

// Exception: if queries almost always include status but userId is optional,
// status first might serve more query patterns via prefix rule

One Index for Multiple Query Patterns

// Index: { userId: 1, status: 1, createdAt: -1 }
// This single index serves ALL of these query patterns:

db.orders.find({ userId: "U1" })                          // prefix: userId
db.orders.find({ userId: "U1", status: "active" })         // prefix: userId + status
db.orders.find({ userId: "U1" }).sort({ createdAt: -1 })  // sort elimination
db.orders.find({ userId: "U1", status: "active" })
  .sort({ createdAt: -1 })                                 // full ESR use

// Does NOT serve (no prefix):
db.orders.find({ status: "active" })                       // COLLSCAN
db.orders.find({ createdAt: { $gte: ... } })               // COLLSCAN
DANGER
Do not create a compound index for each individual query pattern. Each index has a write penalty and consumes RAM. Instead, design one well-ordered compound index that serves multiple related query patterns via the prefix rule. Aim for the minimal set of indexes that covers all critical query patterns.
07
Quick Reference
Rules, patterns, and common mistakes at a glance
reference

Compound Index Rules Summary

RuleDescriptionConsequence of Violation
Prefix RuleQuery must reference left-prefix of index fieldsCOLLSCAN — index not used
ESR OrderEquality → Sort → Range field orderingIn-memory sort required, larger index scan
Sort DirectionQuery sort must match or be exact inverse of indexIn-memory sort required
Covered QueryAll filter + projection fields in index; _id: 0Document fetch required (totalDocsExamined > 0)
No Parallel ArraysCompound index can have at most one array fieldError: "cannot index parallel arrays"
Max 32 FieldsMax 32 fields in a single compound indexError on creation

Common Mistakes

MistakeProblemFix
Range field before sort fieldCannot eliminate in-memory sortMove sort field before range field (ESR)
Low-selectivity field firstLarge initial scan before narrowingUse most selective equality field first
Projecting unlisted fieldCovered query broken — triggers FETCHAdd field to index or remove from projection
Forgetting _id: 0Covered query broken — _id not in indexAdd _id: 0 to projection
Index per query patternExcessive write penalty, RAM wasteOne well-designed index serves multiple patterns

ESR Quick Classifier

// For each field in your query, classify it:
//   field: "exact_value"       → E (equality)
//   field: { $gt/$lt/... }     → R (range)
//   field: { $in: [v1, v2] }   → R (treated as range for ESR purposes)
//   .sort({ field: 1/-1 })     → S (sort)
//   field: { $regex: /.../ }   → R (range)
//
// Build index: all E fields first, then S fields, then R fields
// Within E group: highest cardinality field first // Within R group: field most likely to reduce result set first