
Modeling Relationships

FILE  31_relationships
TOPIC  1:1 · 1:Few · 1:Many · Many:Many · Tree Structures · Cardinality Guide
LEVEL  Intermediate
01
Cardinality Overview
Relationship size determines schema strategy
concept

MongoDB does not enforce foreign keys. The $lookup aggregation stage can join collections, but relationships are primarily modeled through embedding and referencing. The right choice depends heavily on the relationship's cardinality (how many documents exist on each side) and on the application's access patterns.

Cardinality | Scale | Primary Strategy | Example
One-to-One (1:1) | 1 child per parent | Embed or merge | User ↔ UserProfile
One-to-Few (1:few) | 2–20 children | Embed array | User → Addresses
One-to-Many (1:N) | Hundreds–thousands | Reference with child-side key | Post → Comments
One-to-Squillions (1:N+) | Millions+ | Reference + bucket/subset | Server → Log entries
Many-to-Many (M:N) | Multiple on both sides | Array of IDs on one or both sides | Students ↔ Courses
NOTE
MongoDB's document model makes 1:1 and 1:few relationships cheap. Embedding returns the parent and its children in a single read and updates them in a single atomic write. The complexity appears at 1:many and many:many, where query simplicity must be balanced against document size (capped at 16 MB per document).
02
One-to-One (1:1)
Prefer embedding: two collections are rarely the right model for 1:1
1:1

A one-to-one relationship means each document in collection A corresponds to exactly one document in collection B, and vice versa. A relational database would typically store these in separate tables. In MongoDB, merge them into a single document.

// ❌ Relational-style split — unnecessary round-trip
// users:    { _id: 1, email: "alice@x.com" }
// profiles: { userId: 1, bio: "...", avatar: "url", twitter: "@alice" }

// ✅ MongoDB-style — one document (unless profile is large and rarely read)
{
  _id:     ObjectId("..."),
  email:   "alice@example.com",
  profile: {
    bio:     "MongoDB enthusiast",
    avatar:  "https://cdn.example.com/alice.jpg",
    twitter: "@alice"
  }
}
// Single document read → complete user + profile
// Single atomic write → update user + profile together
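If an existing dataset is already split relational-style, merging it is a one-off transformation. The sketch below is a minimal version in plain JavaScript. It assumes in-memory arrays stand in for the `users` and `profiles` collections, and `embedProfiles` is a hypothetical helper. A real migration would read through a driver and write the result back with bulk updates.

```javascript
// Relational-style rows, mirroring the split example above (illustrative data).
const users = [{ _id: 1, email: "alice@example.com" }];
const profiles = [
  { userId: 1, bio: "MongoDB enthusiast", avatar: "https://cdn.example.com/alice.jpg", twitter: "@alice" },
];

// Embed each profile into its user; the userId link becomes redundant and is dropped.
function embedProfiles(users, profiles) {
  const byUserId = new Map(profiles.map(({ userId, ...rest }) => [userId, rest]));
  return users.map((u) => (byUserId.has(u._id) ? { ...u, profile: byUserId.get(u._id) } : { ...u }));
}

const merged = embedProfiles(users, profiles);
console.log(JSON.stringify(merged[0]));
// {"_id":1,"email":"alice@example.com","profile":{"bio":"MongoDB enthusiast","avatar":"https://cdn.example.com/alice.jpg","twitter":"@alice"}}
```

Users without a matching profile are copied unchanged, so their `profile` field is simply absent.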

When 1:1 Split is Justified

  • The secondary data is very large but rarely read (e.g., full resume PDF content vs user summary)
  • The secondary data has different access control (e.g., sensitive fields in a separate collection with restrictive read access)
  • The secondary data is only needed in one specific endpoint while the primary is loaded everywhere
03
One-to-Few (1:few)
Embed the bounded array — simplest, fastest pattern
1:few

One-to-few means a parent has a small, bounded number of children (typically 2–20). This is the classic embedding case — no separate collection needed.

// User → Addresses (a user has at most 5 delivery addresses)
{
  _id:   ObjectId("u1"),
  name:  "Alice",
  email: "alice@example.com",
  addresses: [
    { label: "home",    street: "123 Main",  city: "NYC",      zip: "10001", isDefault: true },
    { label: "office",  street: "456 5th Ave", city: "NYC",    zip: "10018", isDefault: false }
  ]
}

// Add an address:
db.users.updateOne(
  { _id: userId },
  { $push: { addresses: newAddress } }
)

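// Make "office" the default and clear the flag on every other address (one atomic write):
db.users.updateOne(
  { _id: userId, "addresses.label": "office" },
  { $set: { "addresses.$[target].isDefault": true, "addresses.$[other].isDefault": false } },
  { arrayFilters: [ { "target.label": "office" }, { "other.label": { $ne: "office" } } ] }
)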

// Remove an address by label:
db.users.updateOne(
  { _id: userId },
  { $pull: { addresses: { label: "office" } } }
)
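The address updates maintain two invariants that the application owns. The array stays bounded, and exactly one address is the default. The in-memory sketch below models both, assuming the limit of 5 from the example comment. `addAddress`, `setDefaultAddress`, and `removeAddress` are hypothetical helpers that mirror `$push`, the default switch, and `$pull`. Against a real collection, the bound can be enforced in the same write by adding `"addresses.4": { $exists: false }` to the `$push` filter. Once a fifth address exists, the update then matches nothing.

```javascript
// Limit taken from the "at most 5 delivery addresses" comment above.
const MAX_ADDRESSES = 5;

// Like $push, but refuses to grow past the bound. New addresses start non-default.
function addAddress(user, address) {
  if (user.addresses.length >= MAX_ADDRESSES) {
    throw new Error(`user ${user._id} already has ${MAX_ADDRESSES} addresses`);
  }
  user.addresses.push({ ...address, isDefault: false });
}

// Flag exactly one address as default and clear the flag everywhere else.
function setDefaultAddress(user, label) {
  if (!user.addresses.some((a) => a.label === label)) return false;
  for (const a of user.addresses) a.isDefault = a.label === label;
  return true;
}

// Like $pull: drop every address whose label matches.
function removeAddress(user, label) {
  user.addresses = user.addresses.filter((a) => a.label !== label);
}

const alice = {
  _id: "u1",
  addresses: [
    { label: "home", city: "NYC", isDefault: true },
    { label: "office", city: "NYC", isDefault: false },
  ],
};
addAddress(alice, { label: "gym", city: "NYC" });
setDefaultAddress(alice, "office");
removeAddress(alice, "home");
console.log(alice.addresses.map((a) => `${a.label}:${a.isDefault}`).join(", "));
// office:true, gym:false
```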

Order → Line Items (1:few with calculated total)

{
  _id:    ObjectId("o1"),
  userId: ObjectId("u1"),
  status: "placed",
  lineItems: [
    { productId: ObjectId("p1"), name: "Widget", qty: 2, unitPrice: 19.99, subtotal: 39.98 },
    { productId: ObjectId("p2"), name: "Gadget", qty: 1, unitPrice: 49.99, subtotal: 49.99 }
  ],
  subtotal: 89.97,
  tax:       7.20,
  total:    97.17
}
// Atomic: update line item AND order total in one write
// No second collection needed — line items have no life outside the order
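The example totals are consistent: 2 × 19.99 = 39.98, 39.98 + 49.99 = 89.97, and 89.97 + 7.20 = 97.17. One way to keep them consistent is to recompute every derived field from the line items before the single-document write. The sketch below does this under two assumptions. Prices are converted to integer cents to avoid floating-point drift. The tax rate is 8%, which the source does not state but which reproduces the 7.20 above. `priceOrder` is a hypothetical helper.

```javascript
// Assumed rate: not stated in the source, chosen because it reproduces tax: 7.20.
const TAX_RATE = 0.08;

// Recompute every derived field from the line items, working in integer cents.
function priceOrder(lineItems) {
  let subtotalCents = 0;
  const items = lineItems.map((li) => {
    const lineCents = Math.round(li.unitPrice * 100) * li.qty;
    subtotalCents += lineCents;
    return { ...li, subtotal: lineCents / 100 };
  });
  const taxCents = Math.round(subtotalCents * TAX_RATE);
  return {
    lineItems: items,
    subtotal: subtotalCents / 100,
    tax: taxCents / 100,
    total: (subtotalCents + taxCents) / 100,
  };
}

const order = priceOrder([
  { productId: "p1", name: "Widget", qty: 2, unitPrice: 19.99 },
  { productId: "p2", name: "Gadget", qty: 1, unitPrice: 49.99 },
]);
console.log(order.subtotal, order.tax, order.total); // 89.97 7.2 97.17
```

Write the returned `lineItems`, `subtotal`, `tax`, and `total` back in one `updateOne` on the order. A write to a single document is atomic, so readers never see items and totals that disagree.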
04
One-to-Many (1:N)
Child-side reference with index — avoid embedding unbounded arrays
1:N

One-to-many: one parent, hundreds-to-thousands of children. The children are stored in their own collection with a reference back to the parent. Always index the reference field.

// Blog post → Comments (hundreds of comments per post)

// posts collection (lean — no embedded comments)
{
  _id:          ObjectId("post1"),
  title:        "MongoDB Relationships",
  body:         "...",
  authorId:     ObjectId("user1"),
  commentCount: 247,       // denormalized counter: avoids a countDocuments() call on every read
  createdAt:    ISODate("2024-03-01")
}

// comments collection (child-side reference)
{ _id: ObjectId("c1"), postId: ObjectId("post1"), userId: ObjectId("u2"),
  text: "Great article!", createdAt: ISODate("2024-03-02"), likes: 12 }
{ _id: ObjectId("c2"), postId: ObjectId("post1"), userId: ObjectId("u3"),
  text: "Very helpful",   createdAt: ISODate("2024-03-02"), likes: 5 }

// Critical: index on the foreign key field
db.comments.createIndex({ postId: 1, createdAt: -1 })   // for pagination
db.comments.createIndex({ userId: 1, createdAt: -1 })   // for "user's comments"

// Fetch comments for a post (paginated; skip() still walks the skipped entries, so deep pages get slower):
db.comments.find({ postId: ObjectId("post1") })
  .sort({ createdAt: -1 })
  .skip(page * 20)
  .limit(20)

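// New comment + counter bump in one transaction (Node.js driver; `client`, `db`, `newComment`, `postId` assumed in scope)
const session = client.startSession()
try {
  await session.withTransaction(async () => {
    await db.collection("comments").insertOne(newComment, { session })
    await db.collection("posts").updateOne({ _id: postId }, { $inc: { commentCount: 1 } }, { session })
  })
} finally {
  await session.endSession()
}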
WARN
Without an index on the child-side reference field (postId), every query for comments on a post is a full collection scan — O(N) across ALL comments in the database, not just the post's comments. This index is not optional.
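The sketch below shows why the WARN matters by counting how many comments each approach examines. It is an in-memory analogy, not a model of how MongoDB stores B-tree indexes. `buildPostIndex` groups comments by `postId` and sorts each group newest first, roughly what `{ postId: 1, createdAt: -1 }` provides. The helper names are hypothetical, and the data set (1,000 comments over 50 posts) is made up.

```javascript
// Full scan: every comment in the collection is examined.
function findByScan(comments, postId) {
  let examined = 0;
  const hits = comments.filter((c) => {
    examined++;
    return c.postId === postId;
  });
  return { hits, examined };
}

// Rough stand-in for { postId: 1, createdAt: -1 }: grouped by postId, newest first.
function buildPostIndex(comments) {
  const index = new Map();
  for (const c of comments) {
    if (!index.has(c.postId)) index.set(c.postId, []);
    index.get(c.postId).push(c);
  }
  for (const list of index.values()) list.sort((a, b) => b.createdAt - a.createdAt);
  return index;
}

// Indexed lookup with skip/limit paging. Skipped entries are still walked,
// so they count toward "examined".
function findByIndex(index, postId, page = 0, pageSize = 20) {
  const list = index.get(postId) || [];
  const hits = list.slice(page * pageSize, (page + 1) * pageSize);
  return { hits, examined: Math.min(list.length, (page + 1) * pageSize) };
}

// Made-up data: 1,000 comments spread evenly over 50 posts, one minute apart.
const comments = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  postId: `post${i % 50}`,
  createdAt: new Date(Date.UTC(2024, 2, 1) + i * 60000),
}));

const scan = findByScan(comments, "post1");
const idx = findByIndex(buildPostIndex(comments), "post1");
console.log(scan.examined, idx.examined); // 1000 20
```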
05
Many-to-Many (M:N)
Array of IDs — avoid junction collections when possible
M:N

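Many-to-many: a student can enrol in many courses, and a course has many students. Instead of a junction table, MongoDB lets you store an array of IDs on one or both sides. A junction collection is still the better fit when the link carries its own attributes.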

// Approach 1: Array of IDs on whichever side is bounded
// A student takes ≤ 20 courses at a time → embed courseIds in student
{
  _id:            ObjectId("s1"),
  name:           "Alice",
  enrolledCourses: [ObjectId("c1"), ObjectId("c2"), ObjectId("c3")]
}

// courses collection stays lean (no student array — could be 50,000 students)
{ _id: ObjectId("c1"), title: "MongoDB Fundamentals", seats: 300, enrolled: 247 }

// Index for reverse lookup: "which students are in course C1?"
db.students.createIndex({ enrolledCourses: 1 })   // indexing an array field makes this a multikey index automatically

// Find all courses for a student (two queries: one student read, then one $in lookup):
const student = db.students.findOne({ _id: studentId })
const courses = db.courses.find({ _id: { $in: student.enrolledCourses } }).toArray()

// Find all students in a course:
db.students.find({ enrolledCourses: ObjectId("c1") }).sort({ name: 1 })

// Approach 2: Bidirectional arrays (when both sides are bounded)
// Tags ↔ Articles (tag has array of articleIds; article has array of tagIds)
{ _id: ObjectId("t1"), name: "mongodb", articleIds: [ObjectId("a1"), ObjectId("a3")] }
{ _id: ObjectId("a1"), title: "...",    tagIds:     [ObjectId("t1"), ObjectId("t2")] }
// Fast lookups either way — but both arrays must be kept in sync on updates
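Keeping the two arrays in sync means every link or unlink touches two documents. Against a real deployment, those two updates (for example `$addToSet` on each side) would typically run inside a multi-document transaction so that the pair commits or aborts together. The sketch below only models the bookkeeping in memory. `linkTag`, `unlinkTag`, and `inSync` are hypothetical helpers, and the `t2` and `a3` documents are made up to complete the example.

```javascript
// In-memory stand-ins for the tags and articles collections (illustrative data).
const tags = new Map([
  ["t1", { name: "mongodb", articleIds: new Set(["a1", "a3"]) }],
  ["t2", { name: "nosql", articleIds: new Set(["a1"]) }],
]);
const articles = new Map([
  ["a1", { title: "...", tagIds: new Set(["t1", "t2"]) }],
  ["a3", { title: "...", tagIds: new Set(["t1"]) }],
]);

// Both sides change together; Sets make repeats harmless, like $addToSet.
function linkTag(tagId, articleId) {
  tags.get(tagId).articleIds.add(articleId);
  articles.get(articleId).tagIds.add(tagId);
}

function unlinkTag(tagId, articleId) {
  tags.get(tagId).articleIds.delete(articleId);
  articles.get(articleId).tagIds.delete(tagId);
}

// Invariant: every link appears on both sides.
function inSync() {
  for (const [tagId, tag] of tags) {
    for (const articleId of tag.articleIds) {
      const article = articles.get(articleId);
      if (!article || !article.tagIds.has(tagId)) return false;
    }
  }
  for (const [articleId, article] of articles) {
    for (const tagId of article.tagIds) {
      const tag = tags.get(tagId);
      if (!tag || !tag.articleIds.has(articleId)) return false;
    }
  }
  return true;
}

linkTag("t2", "a3");
linkTag("t2", "a3"); // repeated link: no duplicate
unlinkTag("t1", "a1");
console.log(inSync()); // true
```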

// Approach 3: Junction collection (best when the relationship has its own attributes)
// Enrollment has: enrolledAt, grade, status — not just a link
{
  _id:        ObjectId("e1"),
  studentId:  ObjectId("s1"),
  courseId:   ObjectId("c1"),
  enrolledAt: ISODate("2024-01-15"),
  grade:      "A",
  status:     "active"
}
db.enrollments.createIndex({ studentId: 1, courseId: 1 }, { unique: true })
db.enrollments.createIndex({ courseId: 1 })
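The unique compound index is what stops a student from being enrolled twice in the same course. The in-memory sketch below models that behavior, assuming string IDs. The `Enrollments` class and its methods are hypothetical, with Maps standing in for the two indexes.

```javascript
// In-memory stand-in for the enrollments collection. The byPair Map plays the
// role of the unique { studentId, courseId } index; byCourse plays { courseId: 1 }.
class Enrollments {
  constructor() {
    this.byPair = new Map();
    this.byCourse = new Map();
  }

  enroll(studentId, courseId, enrolledAt = new Date()) {
    const key = `${studentId}|${courseId}`;
    if (this.byPair.has(key)) {
      // MongoDB would reject the insert with a duplicate key error (E11000).
      throw new Error(`duplicate enrollment ${key}`);
    }
    const doc = { studentId, courseId, enrolledAt, grade: null, status: "active" };
    this.byPair.set(key, doc);
    if (!this.byCourse.has(courseId)) this.byCourse.set(courseId, []);
    this.byCourse.get(courseId).push(doc);
    return doc;
  }

  // The relationship carries its own attributes, so they are updated in place.
  setGrade(studentId, courseId, grade) {
    const doc = this.byPair.get(`${studentId}|${courseId}`);
    if (!doc) return false;
    doc.grade = grade;
    return true;
  }

  studentsIn(courseId) {
    return (this.byCourse.get(courseId) || []).map((d) => d.studentId);
  }
}

const enrollments = new Enrollments();
enrollments.enroll("s1", "c1", new Date("2024-01-15"));
enrollments.enroll("s2", "c1");
enrollments.enroll("s1", "c2");
enrollments.setGrade("s1", "c1", "A");
console.log(JSON.stringify(enrollments.studentsIn("c1"))); // ["s1","s2"]
```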
06
Tree Structures
Parent reference · Materialized path · Children array
trees

Hierarchical data (categories, org charts, comment threads, file systems) can be modeled in several ways, each optimized for different query patterns.

Parent Reference (simplest)

// Each node stores its parent's _id
{ _id: 1, name: "Electronics",    parentId: null }
{ _id: 2, name: "Phones",         parentId: 1 }
{ _id: 3, name: "Laptops",        parentId: 1 }
{ _id: 4, name: "Android Phones", parentId: 2 }

db.categories.createIndex({ parentId: 1 })

// Get direct children of "Electronics":
db.categories.find({ parentId: 1 })

// Traverse ancestors of "Android Phones" (requires $graphLookup or app-side recursion):
db.categories.aggregate([
  { $match: { _id: 4 } },
  { $graphLookup: {
      from:             "categories",
      startWith:        "$parentId",
      connectFromField: "parentId",
      connectToField:   "_id",
      as:               "ancestors",
      depthField:       "depth"   // 0 = direct parent
  }
}])
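When `$graphLookup` is not an option, the app-side recursion is a simple loop over `parentId` links. The sketch below uses an in-memory array as a stand-in for the collection. Against a server, each step would be one `findOne({ _id: parentId })` round-trip, so a node at depth d costs d reads. `ancestors` and `children` are hypothetical helpers.

```javascript
// In-memory stand-in for the categories collection shown above.
const categories = [
  { _id: 1, name: "Electronics", parentId: null },
  { _id: 2, name: "Phones", parentId: 1 },
  { _id: 3, name: "Laptops", parentId: 1 },
  { _id: 4, name: "Android Phones", parentId: 2 },
];
const byId = new Map(categories.map((c) => [c._id, c]));

// Walk parentId links upward; each step would be one findOne({ _id }) on a server.
function ancestors(id) {
  const chain = [];
  const seen = new Set([id]); // guards against accidental cycles in bad data
  let node = byId.get(id);
  while (node && node.parentId !== null) {
    if (seen.has(node.parentId)) throw new Error(`cycle at ${node.parentId}`);
    seen.add(node.parentId);
    node = byId.get(node.parentId);
    if (node) chain.push(node.name);
  }
  return chain; // nearest ancestor first
}

// Direct children: the equivalent of find({ parentId }).
function children(parentId) {
  return categories.filter((c) => c.parentId === parentId).map((c) => c.name);
}

console.log(JSON.stringify(ancestors(4))); // ["Phones","Electronics"]
console.log(JSON.stringify(children(1)));  // ["Phones","Laptops"]
```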

Materialized Path (best for subtree queries)

// Store the full path from root as a string
{ _id: 1, name: "Electronics",    path: "," }
{ _id: 2, name: "Phones",         path: ",1," }
{ _id: 3, name: "Laptops",        path: ",1," }
{ _id: 4, name: "Android Phones", path: ",1,2," }

// Index on path for prefix queries
db.categories.createIndex({ path: 1 })

// Get ALL descendants of "Electronics" (id=1) in one query:
db.categories.find({ path: /^,1,/ })   // anchored prefix regex, so the path index can be used

// Get direct children of "Phones" (id=2):
db.categories.find({ path: /^,1,2,$/ })
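The paths follow one rule: a child's path is its parent's path plus the parent's `_id` and a comma. Both queries then reduce to string checks on that subtree prefix. The sketch below is a minimal version in plain JavaScript, with an in-memory array standing in for the collection. `childPath`, `addNode`, `descendants`, and `directChildren` are hypothetical helpers.

```javascript
// A child's path is its parent's path followed by the parent's _id and a comma.
function childPath(parent) {
  return `${parent.path}${parent._id},`;
}

// In-memory stand-in for the categories collection, starting from the root.
const nodes = [{ _id: 1, name: "Electronics", path: "," }];

function addNode(_id, name, parentId) {
  const parent = nodes.find((n) => n._id === parentId);
  nodes.push({ _id, name, path: childPath(parent) });
}

addNode(2, "Phones", 1);
addNode(3, "Laptops", 1);
addNode(4, "Android Phones", 2);

// All descendants: paths that start with the node's subtree prefix
// (same idea as an anchored prefix regex on the path field).
function descendants(node) {
  const prefix = childPath(node);
  return nodes.filter((n) => n.path.startsWith(prefix)).map((n) => n.name);
}

// Direct children: paths exactly equal to the subtree prefix.
function directChildren(node) {
  const prefix = childPath(node);
  return nodes.filter((n) => n.path === prefix).map((n) => n.name);
}

const [electronics, phones] = nodes;
console.log(JSON.stringify(descendants(electronics))); // ["Phones","Laptops","Android Phones"]
console.log(JSON.stringify(directChildren(phones)));   // ["Android Phones"]
```

The commas on both sides of each ID matter. The prefix `,1,` cannot accidentally match nodes under ID 11, whose paths start with `,11,`.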

Children Array

// Each node stores direct children IDs
{ _id: 1, name: "Electronics", children: [2, 3] }
{ _id: 2, name: "Phones",      children: [4] }
{ _id: 4, name: "Android",     children: [] }
// Fast: get direct children via single doc read
// Slow: traverse subtree requires multiple reads
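With children arrays, a subtree costs one read per tree level when each level's IDs are fetched in a batch (for example with `$in`). The sketch below is a minimal breadth-first version over in-memory documents. `subtree` is a hypothetical helper, and the `Laptops` node is added to complete the example. `reads` counts batched lookups, not documents.

```javascript
// In-memory stand-in for the categories collection using children arrays.
const tree = new Map([
  [1, { _id: 1, name: "Electronics", children: [2, 3] }],
  [2, { _id: 2, name: "Phones", children: [4] }],
  [3, { _id: 3, name: "Laptops", children: [] }],
  [4, { _id: 4, name: "Android", children: [] }],
]);

// Breadth-first walk: each loop iteration stands for one batched read of a level.
function subtree(rootId) {
  const names = [];
  let frontier = [...tree.get(rootId).children];
  let reads = 1; // reading the root document itself
  while (frontier.length > 0) {
    reads++;
    const level = frontier.map((id) => tree.get(id));
    names.push(...level.map((n) => n.name));
    frontier = level.flatMap((n) => n.children);
  }
  return { names, reads };
}

console.log(JSON.stringify(subtree(1))); // {"names":["Phones","Laptops","Android"],"reads":3}
```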
07
Reference Patterns Summary
Which side holds the reference and when
reference

Parent-Side Reference (embed foreign key in parent)

// Parent holds the reference(s): good when the parent is read frequently,
// the child count is small and bounded, and lookups go parent → child
// Student → Courses: the student stores an array of courseIds
{ _id: ObjectId("s1"), enrolledCourses: [ObjectId("c1"), ObjectId("c2")] }
db.courses.find({ _id: { $in: student.enrolledCourses } })

Child-Side Reference (embed foreign key in child)

// Child holds the reference: good when there are many children per parent,
// children need independent access, and each child can reach its parent directly
// Comment → Post: comment stores postId (index it, as in section 04)
{ _id: ObjectId("cm1"), postId: ObjectId("post1"), text: "..." }
db.comments.find({ postId: postId }).sort({ createdAt: -1 })

Deep Comparison

Attribute | Parent-Side | Child-Side
Lookup direction | Parent → find child(ren) | Child → find parent
Unbounded growth | Problem (array grows) | No problem (one document per child, indexed reference)
Delete parent | Must clean up orphaned children | Children are left referencing a missing parent
Count children | array.length (fast) | countDocuments() or a computed field
Add/remove child | $push/$pull on the parent doc | insert/delete the child doc
Best for | 1:1, 1:few, M:N (bounded) | 1:many, 1:squillions