31 · Relationships

Cardinality Overview

Relationship size determines schema strategy

MongoDB has no enforced foreign keys and no SQL-style JOIN syntax (the $lookup aggregation stage is the closest equivalent), so relationships are modeled through embedding and referencing. The right choice depends heavily on the relationship's cardinality (how many documents exist on each side) and on access patterns.

Cardinality	Scale	Primary Strategy	Example
One-to-One (1:1)	1 child per parent	Embed or merge	User ↔ UserProfile
One-to-Few (1:few)	2–20 children	Embed array	User → Addresses
One-to-Many (1:N)	Hundreds–thousands	Reference with child-side key	Post → Comments
One-to-Squillions (1:N+)	Millions+	Reference + bucket/subset	Server → Log entries
Many-to-Many (M:N)	Both sides multiple	Array of IDs on one/both sides	Students ↔ Courses

NOTE

MongoDB's document model makes 1:1 and 1:few relationships fast through embedding, because one read returns everything. The complexity appears at 1:N and M:N, where query simplicity must be traded off against document size (a single BSON document is capped at 16 MB).
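
As a rough illustration, the cardinality table can be read as a lookup from expected child count to primary strategy. The sketch below is hypothetical: the function name suggestStrategy and the exact cut-offs between rows (20 as the upper end of "few", one million as the start of "squillions") are assumptions chosen to fit the table's ranges, not rules enforced by MongoDB, and access patterns can still override the result.

```javascript
// Hypothetical helper: maps an estimated number of children per parent
// to the primary strategy from the cardinality table.
// The thresholds are illustrative assumptions, not MongoDB limits.
function suggestStrategy(childrenPerParent) {
  if (childrenPerParent <= 1) return "embed or merge";                      // 1:1
  if (childrenPerParent <= 20) return "embed array";                        // 1:few
  if (childrenPerParent < 1000000) return "reference with child-side key";  // 1:N
  return "reference + bucket/subset";                                       // 1:N+
}

console.log(suggestStrategy(5));   // embed array
console.log(suggestStrategy(250)); // reference with child-side key
```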

One-to-One (1:1)

Prefer embedding: two collections are rarely right for 1:1

A one-to-one relationship means each document in collection A corresponds to exactly one document in collection B (and vice versa). In a relational database these would often be separate tables; in MongoDB, merge them into a single document.

// ❌ Relational-style split — unnecessary round-trip
// users:    { _id: 1, email: "alice@example.com" }
// profiles: { userId: 1, bio: "...", avatar: "url", twitter: "@alice" }

// ✅ MongoDB-style — one document (unless profile is large and rarely read)
{
  _id:     ObjectId("..."),
  email:   "alice@example.com",
  profile: {
    bio:     "MongoDB enthusiast",
    avatar:  "https://cdn.example.com/alice.jpg",
    twitter: "@alice"
  }
}
// Single document read → complete user + profile
// Single atomic write → update user + profile together

When 1:1 Split is Justified

The secondary data is very large but rarely read (e.g., full resume PDF content vs user summary)
The secondary data has different access control (e.g., sensitive fields in a separate collection with restrictive read access)
The secondary data is only needed in one specific endpoint while the primary is loaded everywhere
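
When one of these conditions applies, one common way to split is to reuse the same _id in both collections, so the secondary document can be fetched by primary key. The sketch below is plain JavaScript with no database connection. The helper name splitUser, the resumeText field, and the userDetails collection named in the comments are all hypothetical.

```javascript
// Hypothetical helper: splits one user document into a small, frequently read
// document and a large, rarely read document that share the same _id.
function splitUser(user) {
  const { resumeText, ...summary } = user;   // resumeText is an assumed large field
  return {
    summary,                                 // would be stored in db.users
    details: { _id: user._id, resumeText }   // would be stored in db.userDetails (assumed name)
  };
}

const { summary, details } = splitUser({
  _id: 1,
  email: "alice@example.com",
  resumeText: "full resume text"
});
console.log(summary);                     // { _id: 1, email: 'alice@example.com' }
console.log(details._id === summary._id); // true
```

Sharing the _id avoids a separate reference field and a separate index, since _id is always indexed.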

One-to-Few (1:few)

Embed the bounded array — simplest, fastest pattern

One-to-few means a parent has a small, bounded number of children (typically 2–20). This is the classic embedding case — no separate collection needed.

// User → Addresses (a user has at most 5 delivery addresses)
{
  _id:   ObjectId("u1"),
  name:  "Alice",
  email: "alice@example.com",
  addresses: [
    { label: "home",    street: "123 Main",  city: "NYC",      zip: "10001", isDefault: true },
    { label: "office",  street: "456 5th Ave", city: "NYC",    zip: "10018", isDefault: false }
  ]
}

// Add an address:
db.users.updateOne(
  { _id: userId },
  { $push: { addresses: newAddress } }
)

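// Make "home" the default address and clear the flag on all others:
db.users.updateOne(
  { _id: userId, "addresses.label": "home" },
  { $set: {
      "addresses.$[target].isDefault": true,
      "addresses.$[other].isDefault":  false
  } },
  { arrayFilters: [ { "target.label": "home" }, { "other.label": { $ne: "home" } } ] }
)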

// Remove an address by label:
db.users.updateOne(
  { _id: userId },
  { $pull: { addresses: { label: "office" } } }
)
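
The "at most 5 delivery addresses" rule is not enforced by $push itself. One way to keep the array bounded is to add a guard to the update filter, sketched below under stated assumptions: the helper name buildBoundedPush is hypothetical, and the guard relies on the fact that the path addresses.4 exists only once the array already holds five elements.

```javascript
// Hypothetical helper: builds the filter and update for a $push that only
// matches while the array has fewer than `max` elements.
// "addresses.<max - 1>" exists only when the array already holds `max` items.
function buildBoundedPush(userId, newAddress, max = 5) {
  return {
    filter: { _id: userId, [`addresses.${max - 1}`]: { $exists: false } },
    update: { $push: { addresses: newAddress } }
  };
}

const { filter, update } = buildBoundedPush("u1", { label: "gym", city: "NYC" });
console.log(JSON.stringify(filter)); // {"_id":"u1","addresses.4":{"$exists":false}}
// In mongosh this would be used as: db.users.updateOne(filter, update)
```

With this filter, a matchedCount of 0 would mean either that the user does not exist or that the address limit has been reached.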

Order → Line Items (1:few with calculated total)

{
  _id:    ObjectId("o1"),
  userId: ObjectId("u1"),
  status: "placed",
  lineItems: [
    { productId: ObjectId("p1"), name: "Widget", qty: 2, unitPrice: 19.99, subtotal: 39.98 },
    { productId: ObjectId("p2"), name: "Gadget", qty: 1, unitPrice: 49.99, subtotal: 49.99 }
  ],
  subtotal: 89.97,
  tax:       7.20,
  total:    97.17
}
// Atomic: update line item AND order total in one write
// No second collection needed — line items have no life outside the order
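
The stored totals have to be recomputed whenever a line item changes. A minimal sketch of that calculation follows. The helper name computeTotals is hypothetical, and the 8% tax rate is an assumption inferred from the sample order rather than something the schema states. Amounts are converted to integer cents to avoid floating-point drift.

```javascript
// Hypothetical helper: recomputes order totals from line items.
// Works in integer cents; the default 8% tax rate is an assumption
// inferred from the sample order.
function computeTotals(lineItems, taxRate = 0.08) {
  const toCents = (amount) => Math.round(amount * 100);
  const subtotalCents = lineItems.reduce(
    (sum, item) => sum + item.qty * toCents(item.unitPrice), 0);
  const taxCents = Math.round(subtotalCents * taxRate);
  return {
    subtotal: subtotalCents / 100,
    tax: taxCents / 100,
    total: (subtotalCents + taxCents) / 100
  };
}

const totals = computeTotals([
  { name: "Widget", qty: 2, unitPrice: 19.99 },
  { name: "Gadget", qty: 1, unitPrice: 49.99 }
]);
console.log(totals); // { subtotal: 89.97, tax: 7.2, total: 97.17 }
```

The resulting values could be written with $set in the same updateOne call that changes the line item, which keeps the order document consistent in a single atomic write.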

One-to-Many (1:N)

Child-side reference with index — avoid embedding unbounded arrays

One-to-many: one parent, hundreds-to-thousands of children. The children are stored in their own collection with a reference back to the parent. Always index the reference field.

// Blog post → Comments (hundreds of comments per post)

// posts collection (lean — no embedded comments)
{
  _id:          ObjectId("post1"),
  title:        "MongoDB Relationships",
  body:         "...",
  authorId:     ObjectId("user1"),
  commentCount: 247,       // computed field — avoids COUNT query
  createdAt:    ISODate("2024-03-01")
}

// comments collection (child-side reference)
{ _id: ObjectId("c1"), postId: ObjectId("post1"), userId: ObjectId("u2"),
  text: "Great article!", createdAt: ISODate("2024-03-02"), likes: 12 }
{ _id: ObjectId("c2"), postId: ObjectId("post1"), userId: ObjectId("u3"),
  text: "Very helpful",   createdAt: ISODate("2024-03-02"), likes: 5 }

// Critical: index on the foreign key field
db.comments.createIndex({ postId: 1, createdAt: -1 })   // for pagination
db.comments.createIndex({ userId: 1, createdAt: -1 })   // for "user's comments"

// Fetch comments for a post (paginated):
db.comments.find({ postId: ObjectId("post1") })
  .sort({ createdAt: -1 })
  .skip(page * 20)
  .limit(20)

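// New comment: insert it and bump the parent's count in one transaction.
// Operations join the transaction only when issued through the session's
// database handle (transactions also require a replica set or sharded cluster).
const session = db.getMongo().startSession()
const sdb = session.getDatabase(db.getName())
session.withTransaction(() => {
  sdb.comments.insertOne(newComment)
  sdb.posts.updateOne({ _id: postId }, { $inc: { commentCount: 1 } })
})
session.endSession()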

WARN

Without an index on the child-side reference field (postId), every query for comments on a post is a full collection scan — O(N) across ALL comments in the database, not just the post's comments. This index is not optional.
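
The same { postId: 1, createdAt: -1 } index can also serve range-based ("keyset") pagination, which avoids the cost of skip() walking past every skipped entry on deep pages. The sketch below only builds the query parts. The helper name buildCommentPage and its parameters are hypothetical, and it assumes createdAt values are unique within a post.

```javascript
// Hypothetical helper: builds a range-based page query that the
// { postId: 1, createdAt: -1 } index can serve without skipping documents.
// `lastSeen` is the createdAt value of the last comment on the previous page.
function buildCommentPage(postId, lastSeen = null, pageSize = 20) {
  const filter = { postId };
  if (lastSeen !== null) filter.createdAt = { $lt: lastSeen };
  return { filter, sort: { createdAt: -1 }, limit: pageSize };
}

const firstPage = buildCommentPage("post1");
const nextPage = buildCommentPage("post1", new Date("2024-03-02T00:00:00Z"));
console.log(firstPage.filter);                            // { postId: 'post1' }
console.log(nextPage.filter.createdAt.$lt.toISOString()); // 2024-03-02T00:00:00.000Z
// In mongosh: db.comments.find(p.filter).sort(p.sort).limit(p.limit)
```

If several comments can share a createdAt value, a tie-breaker such as _id would need to be added to both the sort and the range condition.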

Many-to-Many (M:N)

Array of IDs — avoid junction collections when possible

Many-to-many: a student can enrol in many courses; a course has many students. MongoDB can often avoid junction tables by storing an array of IDs on one or both sides.

// Approach 1: Array of IDs on the "many" side that's bounded
// A student takes ≤ 20 courses at a time → embed courseIds in student
{
  _id:            ObjectId("s1"),
  name:           "Alice",
  enrolledCourses: [ObjectId("c1"), ObjectId("c2"), ObjectId("c3")]
}

// courses collection stays lean (no student array — could be 50,000 students)
{ _id: ObjectId("c1"), title: "MongoDB Fundamentals", seats: 300, enrolled: 247 }

// Index for reverse lookup: "which students are in course C1?"
// The index becomes multikey automatically because enrolledCourses is an array.
db.students.createIndex({ enrolledCourses: 1 })

// Find all courses for a student (one read for the student, one $in query for the courses):
const student = db.students.findOne({ _id: studentId })
const courses = db.courses.find({ _id: { $in: student.enrolledCourses } }).toArray()

// Find all students in a course:
db.students.find({ enrolledCourses: ObjectId("c1") }).sort({ name: 1 })

// Approach 2: Bidirectional arrays (when both sides are bounded)
// Tags ↔ Articles (tag has array of articleIds; article has array of tagIds)
{ _id: ObjectId("t1"), name: "mongodb", articleIds: [ObjectId("a1"), ObjectId("a3")] }
{ _id: ObjectId("a1"), title: "...",    tagIds:     [ObjectId("t1"), ObjectId("t2")] }
// Fast lookups either way — but both arrays must be kept in sync on updates

// Approach 3: Junction collection (best when relationship has its own attributes)
// Enrollment has: enrolledAt, grade, status — not just a link
{
  _id:        ObjectId("e1"),
  studentId:  ObjectId("s1"),
  courseId:   ObjectId("c1"),
  enrolledAt: ISODate("2024-01-15"),
  grade:      "A",
  status:     "active"
}
db.enrollments.createIndex({ studentId: 1, courseId: 1 }, { unique: true })
db.enrollments.createIndex({ courseId: 1 })
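
For Approach 2, keeping both arrays in sync means every link or unlink touches two documents. The sketch below shows one possible shape for the pair of updates. The helper name buildLinkOps and the collection names articles and tags are assumptions, since the source does not name them.

```javascript
// Hypothetical helper: builds the two updates needed to link an article and a
// tag when both sides store an array of IDs (Approach 2).
// $addToSet leaves each array unchanged if the link already exists.
function buildLinkOps(articleId, tagId) {
  return [
    { collection: "articles", filter: { _id: articleId }, update: { $addToSet: { tagIds: tagId } } },
    { collection: "tags",     filter: { _id: tagId },     update: { $addToSet: { articleIds: articleId } } }
  ];
}

for (const op of buildLinkOps("a1", "t2")) {
  console.log(op.collection, JSON.stringify(op.update));
}
// articles {"$addToSet":{"tagIds":"t2"}}
// tags {"$addToSet":{"articleIds":"a1"}}
```

Running both updates inside one transaction would prevent the two arrays from diverging if the second write fails.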

Tree Structures

Parent reference · Children array · Materialized path · Nested sets

Hierarchical data (categories, org charts, comment threads, file systems) can be modeled in several ways, each optimized for different query patterns.

Parent Reference (simplest)

// Each node stores its parent's _id
{ _id: 1, name: "Electronics",    parentId: null }
{ _id: 2, name: "Phones",         parentId: 1 }
{ _id: 3, name: "Laptops",        parentId: 1 }
{ _id: 4, name: "Android Phones", parentId: 2 }

db.categories.createIndex({ parentId: 1 })

// Get direct children of "Electronics":
db.categories.find({ parentId: 1 })

// Traverse ancestors of "Android Phones" (requires $graphLookup or app-side recursion)
db.categories.aggregate([
  { $match: { _id: 4 } },
  { $graphLookup: {
      from:             "categories",
      startWith:        "$parentId",
      connectFromField: "parentId",
      connectToField:   "_id",
      as:               "ancestors"
  } }
])
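
The app-side recursion alternative can be sketched against an in-memory copy of the sample categories. The helper name ancestorsOf is hypothetical. Against a real database, each step of the loop would be one lookup by _id, so the cost grows with the depth of the tree.

```javascript
// In-memory copy of the sample categories, keyed by _id.
const categories = new Map([
  [1, { _id: 1, name: "Electronics",    parentId: null }],
  [2, { _id: 2, name: "Phones",         parentId: 1 }],
  [3, { _id: 3, name: "Laptops",        parentId: 1 }],
  [4, { _id: 4, name: "Android Phones", parentId: 2 }]
]);

// Hypothetical helper: follows parentId links from a node up to the root.
// Against a real database each step would be one findOne({ _id: parentId }).
function ancestorsOf(nodes, id) {
  const result = [];
  let current = nodes.get(id);
  while (current && current.parentId !== null) {
    current = nodes.get(current.parentId);
    if (current) result.push(current.name);
  }
  return result;
}

console.log(ancestorsOf(categories, 4)); // [ 'Phones', 'Electronics' ]
```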

Materialized Path (best for subtree queries)

// Store the full path from root as a string
{ _id: 1, name: "Electronics",    path: "," }
{ _id: 2, name: "Phones",         path: ",1," }
{ _id: 3, name: "Laptops",        path: ",1," }
{ _id: 4, name: "Android Phones", path: ",1,2," }

// Index on path for prefix queries
db.categories.createIndex({ path: 1 })

// Get ALL descendants of "Electronics" (id=1) in one query.
// Anchoring with ^ makes this a true prefix match that can use the path index:
db.categories.find({ path: /^,1,/ })

// Get direct children of "Phones" (id=2):
db.categories.find({ path: /^,1,2,$/ })
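
Two small helpers make the path convention explicit: a child's path is its parent's path followed by the parent's id and a trailing comma, and a descendants query is a regex anchored at the start of that string. Both function names below are hypothetical. The escaping step is a precaution for ids that might contain regex metacharacters, which the numeric sample ids do not.

```javascript
// Hypothetical helpers for the materialized-path layout.

// Path that a child of `parent` would store: parent's path plus parent's id.
function childPathOf(parent) {
  return `${parent.path}${parent._id},`;
}

// Anchored regex matching every descendant of `node`.
function descendantsRegex(node) {
  const prefix = childPathOf(node).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${prefix}`);
}

const phones = { _id: 2, name: "Phones", path: ",1," };
console.log(childPathOf(phones));                    // ,1,2,
console.log(descendantsRegex(phones).test(",1,2,")); // true  (Android Phones)
console.log(descendantsRegex(phones).test(",1,"));   // false (Laptops)
```

The trailing comma matters: without it, a prefix for id 2 would also match paths that pass through id 21.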

Children Array

// Each node stores direct children IDs
{ _id: 1, name: "Electronics",    children: [2, 3] }
{ _id: 2, name: "Phones",         children: [4] }
{ _id: 3, name: "Laptops",        children: [] }
{ _id: 4, name: "Android Phones", children: [] }
// Fast: get direct children via single doc read
// Slow: traverse subtree requires multiple reads
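
To make the subtree cost concrete, the sketch below walks the tree level by level over an in-memory stand-in that mirrors the category tree used in this section. The helper name subtreeOf is hypothetical. Each loop iteration stands for one find({ _id: { $in: ids } }) round trip, so the number of reads grows with the depth of the subtree.

```javascript
// In-memory stand-in for a categories collection that uses children arrays.
const nodes = new Map([
  [1, { _id: 1, name: "Electronics",    children: [2, 3] }],
  [2, { _id: 2, name: "Phones",         children: [4] }],
  [3, { _id: 3, name: "Laptops",        children: [] }],
  [4, { _id: 4, name: "Android Phones", children: [] }]
]);

// Hypothetical helper: collects all descendants one level at a time.
// Each loop iteration stands for one find({ _id: { $in: ids } }) round trip.
function subtreeOf(store, rootId) {
  const names = [];
  let roundTrips = 0;
  let ids = store.get(rootId).children;
  while (ids.length > 0) {
    roundTrips += 1;
    const level = ids.map((id) => store.get(id));
    names.push(...level.map((n) => n.name));
    ids = level.flatMap((n) => n.children);
  }
  return { names, roundTrips };
}

const subtree = subtreeOf(nodes, 1);
console.log(subtree.names);      // [ 'Phones', 'Laptops', 'Android Phones' ]
console.log(subtree.roundTrips); // 2
```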

Reference Patterns Summary

Which side holds the reference and when

Parent-Side Reference (embed foreign key in parent)

// Parent holds the references: good when the parent is frequently read,
// the child count is small and bounded, and lookups go from parent → child
// Product → Parts: product stores an array of partIds
{ _id: ObjectId("prod1"), name: "Widget", partIds: [ObjectId("p1"), ObjectId("p2")] }
db.parts.find({ _id: { $in: product.partIds } })

Child-Side Reference (embed foreign key in child)

// Child holds the reference — good when: many children per parent,
// queries go from child → parent, children need independent access
// Comment → Post: comment stores postId
{ commentId: "cm1", postId: ObjectId("post1"), text: "..." }
db.comments.find({ postId: postId }).sort({ createdAt: -1 })

Deep Comparison

Attribute	Parent-Side	Child-Side
Lookup direction	Parent → find child(ren)	Child → find parent
Unbounded growth	Problem (array grows)	No problem (index on child)
Delete parent	Must clean up orphan children	Children reference missing parent
Count children	array.length (fast)	countDocuments() or computed field
Add/remove child	$push/$pull on parent doc	insert/delete child doc
Best for	1:1, 1:few, M:N (bounded)	1:many, 1:squillions
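
The "Delete parent" row implies application-side cleanup, because MongoDB does not cascade deletes. A minimal sketch for the child-side case follows, using the posts and comments collections from the 1:N section. The helper name buildCascadeDelete and the shape of the returned operation list are hypothetical.

```javascript
// Hypothetical helper: lists the operations for deleting a post together with
// its comments when the reference lives on the child side.
// Children are listed first, so an interruption between the two steps leaves
// a post with fewer comments rather than comments pointing at a missing post.
function buildCascadeDelete(postId) {
  return [
    { collection: "comments", op: "deleteMany", filter: { postId } },
    { collection: "posts",    op: "deleteOne",  filter: { _id: postId } }
  ];
}

for (const step of buildCascadeDelete("post1")) {
  console.log(step.collection, step.op, JSON.stringify(step.filter));
}
// comments deleteMany {"postId":"post1"}
// posts deleteOne {"_id":"post1"}
```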