25 · Compound Indexes

What is a Compound Index

Multi-field B-Tree — field order is critical

concept

A compound index is a single B-Tree built on two or more fields. The index stores entries sorted by the first field, then by the second field within each first-field group, and so on. Because the sort order is determined by field sequence, the order of fields in a compound index definition directly determines what queries it can serve.

// Create a compound index — field order matters
db.orders.createIndex(
  { status: 1, customerId: 1, createdAt: -1 },
  { name: "idx_status_customer_created" }
)

// Up to 32 fields in a single compound index
// Direction (1/-1) matters for sort operations; irrelevant for single-field equality

How the B-Tree is Structured

For an index on { status: 1, customerId: 1 }, the B-Tree sorts all documents first by status, then by customerId within each status group. This means the index can efficiently locate a specific { status, customerId } pair, or all documents for a particular status — but it cannot efficiently find all documents for a particular customerId without knowing the status first.

// Conceptual layout of { status: 1, customerId: 1 } index:
//   "active"   → C001 → doc ptr
//   "active"   → C002 → doc ptr
//   "active"   → C003 → doc ptr
//   "closed"   → C001 → doc ptr
//   "closed"   → C004 → doc ptr
//   "pending"  → C002 → doc ptr
//
// Can scan: { status: "active" } — contiguous block
// Cannot scan: { customerId: "C001" } — scattered throughout

NOTE

A compound index on { a, b, c } partially replaces the need for standalone indexes on { a } and { a, b } — it serves all three. However, it does NOT replace { b }, { c }, or { b, c }. Always verify with explain() that the index is actually used for each query pattern.

The Prefix Rule

An index serves a query only if the query uses a left-prefix of the index fields

prefix

A compound index on { a, b, c } can serve any query that references a contiguous left-prefix of its fields — starting from the leftmost field. Skipping the first field makes the index unusable for filtering (though it may still be used for sort).

Prefix Coverage Table

For index { status: 1, customerId: 1, createdAt: -1 }:

Query Filter	Uses Index?	Why
`{ status: "active" }`	Yes	Left prefix: status only
`{ status: "active", customerId: "C1" }`	Yes	Left prefix: status + customerId
`{ status: "active", customerId: "C1", createdAt: {$gte:...} }`	Yes	Full index used
`{ customerId: "C1" }`	No	Skips status (leftmost field)
`{ customerId: "C1", createdAt: {$gte:...} }`	No	Skips status
`{ status: "active", createdAt: {$gte:...} }`	Partial	Uses status prefix only; createdAt not contiguous

// Index: { status: 1, customerId: 1, createdAt: -1 }

// ✅ IXSCAN — uses prefix { status }
db.orders.find({ status: "active" })

// ✅ IXSCAN — uses prefix { status, customerId }
db.orders.find({ status: "active", customerId: "C001" })

// ❌ COLLSCAN — customerId alone, no status prefix
db.orders.find({ customerId: "C001" })

// ⚠️ Partial IXSCAN — status prefix allows index entry,
// but createdAt is range-filtered in-memory after status scan
db.orders.find({ status: "active", createdAt: { $gte: ISODate("2024-01-01") } })

WARN

The prefix rule applies to the filter fields, not the query field order in your code. MongoDB's query optimizer normalizes the query filter — writing { customerId: "C1", status: "active" } is identical to { status: "active", customerId: "C1" }. What matters is which fields are present, not the order they appear in your find() call.

When a Non-Prefix Field IS Used (Sort Only)

// Index: { status: 1, createdAt: -1 }
// Query filters on status (prefix), sorts on createdAt (second field)
db.orders
  .find({ status: "active" })
  .sort({ createdAt: -1 })
// ✅ Both filter AND sort served by the index — no in-memory sort

The ESR Rule

Equality → Sort → Range: optimal field ordering for compound indexes

strategy

The ESR Rule is the most important compound index design principle. It prescribes the optimal order for fields in a compound index when a query combines equality filters, sort fields, and range filters.

E — Equality: Fields tested with exact match ({ field: value }) go first
S — Sort: Fields used in .sort() go in the middle
R — Range: Fields in range queries ($gt, $lt, $in, $regex) go last

Why ESR Works

Equality fields placed first collapse the index scan to the smallest possible contiguous block. Sort fields in the middle eliminate in-memory sorting because the index is already ordered for that range. Range fields at the end are applied after equality narrows the scan, minimising the range scan width.

// Query: active orders for customer C001, sorted by date, created after Jan 2024
// Fields: status (equality), customerId (equality), createdAt (sort), amount (range)

// ❌ BAD order — range before sort
db.orders.createIndex({ status: 1, amount: 1, createdAt: -1, customerId: 1 })

// ✅ GOOD order using ESR rule:
//   E: status, customerId  (equality)
//   S: createdAt           (sort)
//   R: amount              (range)
db.orders.createIndex(
  { status: 1, customerId: 1, createdAt: -1, amount: 1 },
  { name: "idx_esr_orders" }
)

// The query this index serves optimally:
db.orders
  .find({
    status:     "active",           // E
    customerId: "C001",             // E
    amount:     { $gt: 100 }        // R
  })
  .sort({ createdAt: -1 })          // S

Range before Sort — Why It Breaks Sort Elimination

// Index: { status: 1, amount: 1, createdAt: -1 }
// Query: { status: "active", amount: { $gt: 100 } }, sort: { createdAt: -1 }
//
// Problem: After scanning { status: "active", amount: >100 },
// the remaining entries are NOT sorted by createdAt — amount range
// scattered the createdAt values across the scan range.
// Result: in-memory SORT stage required (blocking, up to 100MB)

// Fix: move sort field before range field (ESR)
// Index: { status: 1, createdAt: -1, amount: 1 }
// Now createdAt is contiguous within each status group → sort eliminated

TIP

When there is no sort in the query, the ESR rule simplifies to ER: equality fields first, range fields last. The sort slot is empty. If there is no range either, just put all equality fields in selectivity order (highest cardinality first).

Sort Direction Compatibility

Index direction must match or be the exact inverse of the sort

direction

For a compound index to eliminate an in-memory sort, the sort direction of the query must exactly match the index field directions, or be the exact inverse of them. Any partial mismatch forces an in-memory sort.

Matching Rules

// Index: { createdAt: -1, score: 1 }

// ✅ Exact match — no in-memory sort
.sort({ createdAt: -1, score: 1 })

// ✅ Exact inverse — index traversed backwards, no in-memory sort
.sort({ createdAt: 1, score: -1 })

// ❌ Partial mismatch — in-memory sort required
.sort({ createdAt: -1, score: -1 })   // score direction flipped
.sort({ createdAt: 1, score: 1 })     // score direction flipped

Direction Matters Only for Multi-Field Sort

// Index: { status: 1, createdAt: -1 }

// Equality on status → only createdAt sort remains
// Single remaining sort field → direction irrelevant (B-Tree traversed either way)
db.orders.find({ status: "active" }).sort({ createdAt: 1 })   // ✅ no sort stage
db.orders.find({ status: "active" }).sort({ createdAt: -1 })  // ✅ no sort stage

// No equality prefix → sort on both fields
// Both directions must match or be exact inverse
db.orders.find({}).sort({ status: 1, createdAt: -1 })   // ✅ exact match
db.orders.find({}).sort({ status: -1, createdAt: 1 })   // ✅ exact inverse
db.orders.find({}).sort({ status: 1, createdAt: 1 })    // ❌ partial mismatch

Designing for Bidirectional Sort Needs

// If you need BOTH { createdAt -1, score -1 } AND { createdAt 1, score 1 }:
// → One index handles both (exact match + exact inverse)
db.posts.createIndex({ createdAt: -1, score: -1 })    // serves both sort directions

// If you need BOTH { createdAt -1, score 1 } AND { createdAt 1, score -1 }:
// → Same: one index handles both (exact match + exact inverse)
db.posts.createIndex({ createdAt: -1, score: 1 })     // serves both

// If you need BOTH { createdAt -1, score 1 } AND { createdAt -1, score -1 }:
// → These are NOT inverses → need TWO separate indexes

Index Direction	Sort Query	In-Memory Sort?
`{ a: 1, b: 1 }`	`{ a: 1, b: 1 }`	No — exact match
`{ a: 1, b: 1 }`	`{ a: -1, b: -1 }`	No — exact inverse
`{ a: 1, b: 1 }`	`{ a: 1, b: -1 }`	Yes — partial mismatch
`{ a: 1, b: 1 }`	`{ a: -1, b: 1 }`	Yes — partial mismatch

Covered Queries

Zero document fetches — results served entirely from the index

covered

A covered query is one where every field in the filter, every field in the projection, and (for compound indexes) every field in the sort is present in the index. MongoDB can return results entirely from the index without reading any documents from disk — totalDocsExamined: 0 in explain output.

Requirements for a Covered Query

All fields in the query filter are in the index
All fields in the projection are in the index
_id must be explicitly excluded from the projection (_id: 0) unless _id is in the index
The query must not touch an array field (multikey index) — multikey indexes cannot cover queries
The document must not contain the queried field as an array

// Index: { email: 1, name: 1, role: 1 }

// ✅ COVERED — all filter + projection fields in index; _id excluded
db.users.find(
  { email: "user@example.com" },       // filter field: in index
  { name: 1, role: 1, _id: 0 }         // projected fields: in index; _id excluded
)
// explain(): totalDocsExamined: 0 — never touched disk documents

// ❌ NOT COVERED — _id is included by default
db.users.find(
  { email: "user@example.com" },
  { name: 1, role: 1 }                 // _id implicitly included, not in index
)

// ❌ NOT COVERED — projecting a field not in the index
db.users.find(
  { email: "user@example.com" },
  { name: 1, createdAt: 1, _id: 0 }   // createdAt not in index
)

Verifying Coverage with explain()

// Run executionStats and look for totalDocsExamined: 0
db.users.find(
  { email: "user@example.com" },
  { name: 1, role: 1, _id: 0 }
).explain("executionStats")
// Key indicators of a covered query:
//   winningPlan.stage: "PROJECTION_COVERED"  ← or IXSCAN with no FETCH
//   totalDocsExamined: 0                     ← zero disk reads
//   totalKeysExamined: 1                     ← index keys scanned

TIP

Covered queries are the highest efficiency tier available. For hot read paths (e.g., session lookups, API key validation, user profile fetches for auth), designing specifically for covered queries can eliminate disk I/O entirely. The index must be in WiredTiger cache for this to be fast — verify with db.collection.stats().indexSizes.

Multikey Indexes Cannot Cover Queries

// If any document has the indexed field as an array,
// MongoDB marks the index as multikey — covered queries are then impossible
db.products.createIndex({ tags: 1, name: 1 })

// { tags: ["a", "b"], name: "Widget" } → index is multikey
db.products.find(
  { tags: "a" },
  { name: 1, _id: 0 }
)
// totalDocsExamined will be > 0 even with the index — multikey always fetches docs

Field Order Strategy

Designing compound indexes for real query patterns

design

There is no universal "best" compound index — the right index depends entirely on the actual query patterns it must serve. The following principles guide field order decisions.

Step-by-Step Design Process

1. Identify the query pattern: list all filter fields, sort fields, and projection fields
2. Apply ESR: Equality → Sort → Range
3. Within equality group: put highest-selectivity (most distinct values) fields first to narrow the scan fastest
4. Verify with explain(): confirm IXSCAN, check totalDocsExamined vs nReturned

// Scenario: e-commerce orders dashboard
// Query: find "completed" orders for region "UK", sorted by amount desc, in last 30 days
db.orders.find(
  {
    status:  "completed",                            // E — low cardinality (5 values)
    region:  "UK",                                   // E — medium cardinality (~20 values)
    createdAt: { $gte: ISODate("2024-01-01") }       // R — high cardinality (range)
  }
).sort({ amount: -1 })                              // S

// ESR order: status (E), region (E), amount (S), createdAt (R)
db.orders.createIndex(
  { status: 1, region: 1, amount: -1, createdAt: 1 },
  { name: "idx_status_region_amount_created" }
)

High-Selectivity First Within Equality Fields

// Two equality fields: userId (millions of users) vs status (5 values)
// userId has WAY higher cardinality → put it FIRST

// ✅ BETTER — userId first narrows to a single user's docs immediately
db.orders.createIndex({ userId: 1, status: 1 })

// ❌ WORSE — status first groups by 5 buckets, then scans within bucket for userId
db.orders.createIndex({ status: 1, userId: 1 })

// Exception: if queries almost always include status but userId is optional,
// status first might serve more query patterns via prefix rule

One Index for Multiple Query Patterns

// Index: { userId: 1, status: 1, createdAt: -1 }
// This single index serves ALL of these query patterns:

db.orders.find({ userId: "U1" })                          // prefix: userId
db.orders.find({ userId: "U1", status: "active" })         // prefix: userId + status
db.orders.find({ userId: "U1" }).sort({ createdAt: -1 })  // sort elimination
db.orders.find({ userId: "U1", status: "active" })
  .sort({ createdAt: -1 })                                 // full ESR use

// Does NOT serve (no prefix):
db.orders.find({ status: "active" })                       // COLLSCAN
db.orders.find({ createdAt: { $gte: ... } })               // COLLSCAN

DANGER

Do not create a compound index for each individual query pattern. Each index has a write penalty and consumes RAM. Instead, design one well-ordered compound index that serves multiple related query patterns via the prefix rule. Aim for the minimal set of indexes that covers all critical query patterns.

Quick Reference

Rules, patterns, and common mistakes at a glance

reference

Compound Index Rules Summary

Rule	Description	Consequence of Violation
Prefix Rule	Query must reference left-prefix of index fields	COLLSCAN — index not used
ESR Order	Equality → Sort → Range field ordering	In-memory sort required, larger index scan
Sort Direction	Query sort must match or be exact inverse of index	In-memory sort required
Covered Query	All filter + projection fields in index; _id: 0	Document fetch required (totalDocsExamined > 0)
No Parallel Arrays	Compound index can have at most one array field	Error: "cannot index parallel arrays"
Max 32 Fields	Max 32 fields in a single compound index	Error on creation

Common Mistakes

Mistake	Problem	Fix
Range field before sort field	Cannot eliminate in-memory sort	Move sort field before range field (ESR)
Low-selectivity field first	Large initial scan before narrowing	Use most selective equality field first
Projecting unlisted field	Covered query broken — triggers FETCH	Add field to index or remove from projection
Forgetting `_id: 0`	Covered query broken — _id not in index	Add `_id: 0` to projection
Index per query pattern	Excessive write penalty, RAM waste	One well-designed index serves multiple patterns

ESR Quick Classifier

// For each field in your query, classify it:
//   field: "exact_value"       → E (equality)
//   field: { $gt/$lt/... }     → R (range)
//   field: { $in: [v1, v2] }   → R (treated as range for ESR purposes)
//   .sort({ field: 1/-1 })     → S (sort)
//   field: { $regex: /.../ }   → R (range)
//
// Build index: all E fields first, then S fields, then R fields
// Within E group: highest cardinality field first
// Within R group: field most likely to reduce result set first