
MongoDB
Introduction

FILE 01_introduction
TOPIC Why NoSQL · ACID · Use Cases
LEVEL Foundation
01
Why NoSQL?
The problems NoSQL was built to solve
Foundation

NoSQL emerged because traditional relational databases hit hard walls when internet-scale workloads demanded more than rigid schemas, vertical scaling, and complex joins could provide.

1. Horizontal Scaling

SQL problem: Vertical scaling (bigger server) hits a physical and cost ceiling. A single machine can only grow so much.

NoSQL solution: Sharding — data is split across hundreds of cheap commodity servers. MongoDB distributes data using a shard key, allowing near-linear throughput growth.
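
As a hedged mongosh sketch (the database, collection, and key names are illustrative, and the commands assume a sharded cluster reached through mongos):

```javascript
// Illustrative: shard an orders collection on a hashed customer id
sh.enableSharding("shop")                                    // allow sharding for the shop database
sh.shardCollection("shop.orders", { customerId: "hashed" })  // hashed key spreads writes across shards
```

A hashed shard key trades range-query locality for an even write distribution across shards.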

TIP: Real example: Netflix uses Apache Cassandra to store viewing histories across regions. MongoDB is used by companies like EA, Adobe, and Cisco for horizontally scaled workloads.

2. Flexible Schema

SQL problem: Every record must fit the same table structure. Adding a new field requires ALTER TABLE — which can lock production databases.

MongoDB solution: Documents in the same collection can have completely different fields. A shirt and a router can live in the same products collection without a schema migration.

// SQL: shirt and router need same columns — many NULLs
// MongoDB: each document stores only what it needs
{ _id: 1, type: "shirt", size: "L", fabric: "cotton" }
{ _id: 2, type: "router", voltage: "5V", ports: 4, wifi: "6E" }

3. Performance & Low Latency

NoSQL avoids expensive joins by denormalizing — storing related data together. A single read returns an entire embedded object instead of joining 5 tables.

Where it shines: Real-time leaderboards (Redis), social feeds (MongoDB), sensor logs (Cassandra).

4. High Availability

MongoDB replicates data across a replica set (primary + secondaries). If the primary goes down, an automatic election promotes a secondary — typically within 10–30 seconds.

NOTE: Zero-downtime failover is built in. Unlike a single SQL server, a replica set keeps serving reads even during a primary failover.
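
To see this in practice, a quick mongosh sketch for checking which member is currently primary (the field names follow rs.status() output):

```javascript
// Summarize replica set members and their roles (run in mongosh)
rs.status().members.map(m => ({ name: m.name, state: m.stateStr }))
// one member reports "PRIMARY", the others "SECONDARY"
```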

5. Developer Velocity

MongoDB stores BSON (Binary JSON) — the same format your Node.js/Python app already uses. No ORM mapping, no impedance mismatch between object models and table rows.

TIP: MERN/MEAN stack benefit: JSON flows from the browser to Node.js to MongoDB. The data model is consistent end to end; no translation layer needed.
02
MongoDB vs SQL
Terminology and conceptual differences
Comparison
SQL Concept | MongoDB Equivalent | Key Difference
Database | Database | Same concept
Table | Collection | No enforced schema in a collection
Row | Document | BSON object, 16MB max, flexible fields
Column | Field | Fields can differ per document
JOIN | $lookup | Joins are aggregation pipeline stages, not native
Primary Key | _id | Auto-generated ObjectId if not provided
Foreign Key | Manual reference | No enforcement; app logic must maintain integrity
INDEX | Index | Same concept, same B-tree underneath
VIEW | Read-only view / materialized view | MongoDB 3.4+ supports read-only views
Stored Procedure | Atlas Functions / app logic | No stored procedures; use the app layer
ALTER TABLE | No equivalent needed | Add new fields anytime, no migration
NULL | null or missing field | A missing field ≠ a null field in queries
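
The NULL distinction is a frequent source of bugs, so here is a hedged mongosh sketch (the users collection and phone field are hypothetical):

```javascript
// { phone: null } matches documents where phone is null OR the field is missing
db.users.find({ phone: null })

// Match only documents where the field is truly absent
db.users.find({ phone: { $exists: false } })

// Match only an explicit null (field present, value null)
db.users.find({ phone: { $type: "null" } })
```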

Document Model: Embed vs Reference

// EMBED — good when data is small and always needed together
{
  _id: ObjectId("..."),
  name: "Alice",
  address: { street: "123 MG Road", city: "Mumbai", pin: "400001" }
  // address always fetched with user — single query
}

// REFERENCE — good when data is large or queried independently
// user document:
{ _id: ObjectId("user1"), name: "Alice" }

// orders collection (references user):
{ _id: ObjectId("ord1"), userId: ObjectId("user1"), total: 999 }
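
When you reference, related documents can still be stitched together at query time; a sketch using the user/orders example above:

```javascript
// Join each user with their referenced orders via a $lookup pipeline stage
db.users.aggregate([
  { $lookup: {
      from: "orders",          // collection holding the references
      localField: "_id",       // user _id ...
      foreignField: "userId",  // ... matched against orders.userId
      as: "orders"             // matching orders land in this array
  } }
])
```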
RULE: Golden rule: Embed when you always read the data together. Reference when the sub-document grows without bound (e.g., unlimited orders per user).
03
ACID Compliance
Single-document vs multi-document guarantees
Transactions
TIP: MongoDB is ACID compliant. Single-document operations have always been atomic; multi-document ACID transactions were added in MongoDB 4.0 (2018).

Single-Document ACID (Always)

Every operation on a single document is atomic — it either fully succeeds or fully fails. No partial writes. No dirty reads.

// This updateOne is atomic — all three fields update or none do
db.orders.updateOne(
  { _id: orderId },
  {
    $set: { status: "shipped" },
    $inc: { shipCount: 1 },
    $currentDate: { shippedAt: true }
  }
)

Multi-Document Transactions (MongoDB 4.0+)

Group operations across multiple documents or collections in an all-or-nothing transaction.

const session = db.getMongo().startSession();
session.startTransaction();
try {
  const accounts = session.getDatabase("bank").accounts;
  accounts.updateOne({ _id: "alice" }, { $inc: { balance: -500 } });
  accounts.updateOne({ _id: "bob"   }, { $inc: { balance: +500 } });
  session.commitTransaction();  // Both succeed or neither does
} catch(e) {
  session.abortTransaction();   // Rolls back all changes
} finally {
  session.endSession();
}

ACID Properties Mapped

Property | What it means | How MongoDB provides it
Atomicity | All-or-nothing | WiredTiger undo logs; a transaction abort reverts everything
Consistency | Valid state to valid state | Schema validation rules enforced; index consistency maintained
Isolation | Concurrent txns don't interfere | MVCC snapshot isolation: each transaction sees a consistent point-in-time snapshot
Durability | Committed data survives crashes | WAL journaling plus checkpoints every 60s; the journal is replayed on restart
PERF: Multi-document transactions add latency (roughly 3–5× slower than non-transactional writes). Prefer embedding related data in one document to avoid needing transactions at all.
04
Under the Hood
WiredTiger · MVCC · Journaling
Internals

WiredTiger Storage Engine

MongoDB's default storage engine since 3.2. Provides document-level concurrency (not collection-level locks like MMAPv1).

  • Compression: Snappy (default) or zlib/zstd; typically reduces storage by ~60–80%
  • Cache: defaults to 50% of (RAM − 1 GB), with a 256 MB minimum (configurable). Hot data stays in memory
  • B-tree indexes: same structure as SQL indexes; O(log n) lookup
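
A hedged mongosh sketch of the index point (collection and field names are illustrative):

```javascript
// Create a B-tree index on email, unique like SQL's UNIQUE INDEX
db.users.createIndex({ email: 1 }, { unique: true })

// Verify the planner uses it: the winning plan should show IXSCAN, not COLLSCAN
db.users.find({ email: "alice@example.com" }).explain("executionStats")
```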

MVCC (Multi-Version Concurrency Control)

When a transaction starts, WiredTiger creates a point-in-time snapshot. The transaction reads from this snapshot — other concurrent writes are invisible until commit.

  • Readers don't block writers; writers don't block readers
  • Conflict detection: if two transactions modify the same document, one gets a WriteConflict error and must retry
  • Optimistic concurrency — no pessimistic locking by default

Write-Ahead Logging (WAL / Journal)

Every write is first recorded in the journal on disk before being applied to data files. On crash:

  1. MongoDB restarts and finds the last valid checkpoint (created every 60s)
  2. Replays journal entries since that checkpoint
  3. All committed transactions are fully restored
TIP: Use writeConcern: { j: true } to wait for the journal flush before a write is acknowledged. Acknowledging without waiting for the journal (j: false) is faster but risks losing the last ~100 ms of writes on a crash.
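
In code, a journaled write concern looks like this (a sketch; the collection and document are illustrative):

```javascript
// Acknowledge only after the write has reached the on-disk journal
db.payments.insertOne(
  { orderId: 42, amount: 999 },
  { writeConcern: { w: "majority", j: true } }
)
```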

Oplog (Operations Log)

A special capped collection on each replica set member that records all write operations. Secondaries tail the primary's oplog and replay operations to stay in sync.

use local
db.oplog.rs.find().sort({ $natural: -1 }).limit(5)
// Shows the 5 most recent write operations replicated to secondaries
05
When to Use What
SQL vs MongoDB decision guide
Decision

Use SQL When

Scenario | Why SQL Wins
Banking / FinTech transfers | Multi-table ACID is critical; money cannot "disappear"
Inventory with complex joins | Products ↔ Suppliers ↔ Categories ↔ Orders: many foreign keys
CRM / ERP (Salesforce, SAP) | Highly structured, predictable data; complex reporting with GROUP BY
Data warehouses | Columnar SQL (BigQuery, Redshift) is optimized for analytics scans

Use MongoDB When

Scenario | Why MongoDB Wins
Social media feeds / posts | Posts have varying structure (text, video, poll, link)
Product catalogs (e-commerce) | Shirts have size/fabric; routers have ports/voltage: schema-flexible
Real-time analytics / dashboards | Aggregation pipeline; no JOIN cost; pre-computed with $out/$merge
IoT / sensor data | High write throughput; time-series collections (MongoDB 5.0+)
Content management | Nested content, variable metadata, versioning
Session / cache store | TTL indexes auto-expire documents
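
The session/cache case relies on TTL indexes, which are worth a quick sketch (the collection, field, and expiry values are illustrative):

```javascript
// TTL index: documents are removed roughly expireAfterSeconds after createdAt
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// This session document will be deleted about an hour after insertion
db.sessions.insertOne({ token: "abc123", createdAt: new Date() })
```

The TTL monitor runs about once a minute, so expiry is approximate, not instant.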
06
Edge Cases & Gotchas
Things that trip up developers
Edge Cases

Database Visibility

use newDB             // Switches context but does NOT create the DB
show dbs              // newDB won't appear here yet!
db.col.insertOne({})  // NOW the database is created and visible
NOTE: MongoDB lazily creates databases and collections. use newDB merely sets the context; the database only materializes on the first write.

Referential Integrity — None!

// If you delete a user, their orders still reference the deleted _id
db.users.deleteOne({ _id: ObjectId("user1") })
// orders: { userId: ObjectId("user1") } — orphaned! MongoDB doesn't cascade
DANGER: MongoDB has no foreign key constraints. Orphaned references are your application's responsibility. Use change streams or application-level cascades to clean up.
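
One way to implement that cleanup is a change stream that watches for deletions; a sketch assuming a long-running worker process and a replica set (change streams require one):

```javascript
// Cascade deletes from users to orders at the application level
const stream = db.users.watch([ { $match: { operationType: "delete" } } ]);
while (stream.hasNext()) {
  const change = stream.next();
  // documentKey._id is the _id of the deleted user document
  db.orders.deleteMany({ userId: change.documentKey._id });
}
```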

16MB Document Limit

// Storing unlimited comments embedded in a post document
{ _id: postId, content: "...", comments: [ ...potentially thousands... ] }
// This will eventually hit the 16MB BSON limit and throw:
// BSONObj size: 16777217 is invalid. Limit: 16777216 bytes

// Fix: store comments in a separate collection
db.comments.insertOne({ postId: postId, text: "...", createdAt: new Date() })

Transactions Require Replica Set

WARN: Multi-document transactions require a replica set. A standalone mongod does not support transactions; even a single-node deployment must be initialized as a one-member replica set (rs.initiate()).

WriteConflict in Concurrent Transactions

// Two transactions updating the same document:
// Transaction A: $inc { balance: -100 }
// Transaction B: $inc { balance: -200 } (simultaneous)
// One succeeds; the other gets WriteConflict (code 112)
// Application MUST retry transactions on TransientTransactionError

if (error.hasErrorLabel('TransientTransactionError')) {
  // safe to retry the whole transaction
}
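
The retry itself can be wrapped in a small helper; a plain-JavaScript sketch in which the error label check mirrors the driver convention, and the transaction body is simulated rather than a real MongoDB call:

```javascript
// Retry a transaction body while it fails with a transient (retryable) error.
function withRetry(txnBody, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return txnBody();                       // commit succeeded
    } catch (error) {
      lastError = error;
      const labels = error.errorLabels || []; // drivers expose hasErrorLabel(); simulated here
      if (!labels.includes("TransientTransactionError")) throw error; // not retryable
      // transient conflict (e.g. WriteConflict): loop and retry the whole transaction
    }
  }
  throw lastError;                            // gave up after maxAttempts
}

// Simulated transaction that write-conflicts once, then succeeds
let calls = 0;
const result = withRetry(() => {
  calls += 1;
  if (calls === 1) {
    const err = new Error("WriteConflict");
    err.errorLabels = ["TransientTransactionError"];
    throw err;
  }
  return "committed";
});
console.log(result, calls); // committed 2
```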

Schema Flexibility ≠ Schema-less

TIP: MongoDB is schema-flexible, not schema-less. Production systems should enforce structure with JSON Schema validation ($jsonSchema) to prevent bad writes from slipping in silently.
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      required: ["name", "email"],
      properties: {
        email: { bsonType: "string", pattern: "^.+@.+$" },
        age:   { bsonType: "int", minimum: 0 }
      }
    }
  },
  validationAction: "error"  // rejects invalid writes
})