
MongoDB
Introduction

FILE 01_introduction
TOPIC Why NoSQL · ACID · Use Cases
LEVEL Foundation
01
Why NoSQL?
The problems NoSQL was built to solve
Foundation

NoSQL emerged because traditional relational databases hit hard walls when internet-scale workloads demanded more than rigid schemas, vertical scaling, and complex joins could provide.

1. Horizontal Scaling

SQL problem: Vertical scaling (bigger server) hits a physical and cost ceiling. A single machine can only grow so much.

NoSQL solution: Sharding — data is split across hundreds of cheap commodity servers. MongoDB distributes data using a shard key, allowing near-linear throughput growth.
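
As a hedged mongosh sketch (the database, collection, and key names are illustrative, and the commands assume a sharded cluster reached through mongos):

```javascript
// Illustrative: shard an orders collection on a hashed customer id
sh.enableSharding("shop")                                    // allow sharding for the shop database
sh.shardCollection("shop.orders", { customerId: "hashed" })  // hashed key spreads writes across shards
```

A hashed shard key trades range-query locality for an even write distribution across shards.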

TIP: Real example: Netflix uses Apache Cassandra to store viewing histories across regions. MongoDB is used by companies like EA, Adobe, and Cisco for horizontally scaled workloads.

2. Flexible Schema

SQL problem: Every record must fit the same table structure. Adding a new field requires ALTER TABLE — which can lock production databases.

MongoDB solution: Documents in the same collection can have completely different fields. A shirt and a router can live in the same products collection without a schema migration.

// SQL: shirt and router need same columns — many NULLs
// MongoDB: each document stores only what it needs
{ _id: 1, type: "shirt", size: "L", fabric: "cotton" }
{ _id: 2, type: "router", voltage: "5V", ports: 4, wifi: "6E" }

3. Performance & Low Latency

NoSQL avoids expensive joins by denormalizing — storing related data together. A single read returns an entire embedded object instead of joining 5 tables.

Where it shines: Real-time leaderboards (Redis), social feeds (MongoDB), sensor logs (Cassandra).

4. High Availability

MongoDB replicates data across a replica set (primary + secondaries). If the primary goes down, an automatic election promotes a secondary — typically within 10–30 seconds.

NOTE: Zero-downtime failover is built in. Unlike a single SQL server, a replica set keeps serving reads even during a primary failover.
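
To see this in practice, a quick mongosh sketch for checking which member is currently primary (the field names follow rs.status() output):

```javascript
// Summarize replica set members and their roles (run in mongosh)
rs.status().members.map(m => ({ name: m.name, state: m.stateStr }))
// one member reports "PRIMARY", the others "SECONDARY"
```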

5. Developer Velocity

MongoDB stores BSON (Binary JSON) — the same format your Node.js/Python app already uses. No ORM mapping, no impedance mismatch between object models and table rows.

TIP: MERN/MEAN stack benefit: JSON flows from the browser to Node.js to MongoDB. The data model is consistent end to end; no translation layer needed.
02
MongoDB vs SQL
Terminology and conceptual differences
Comparison
SQL Concept | MongoDB Equivalent | Key Difference
Database | Database | Same concept
Table | Collection | No enforced schema in a collection
Row | Document | BSON object, 16MB max, flexible fields
Column | Field | Fields can differ per document
JOIN | $lookup | Joins are aggregation pipeline stages, not native
Primary Key | _id | Auto-generated ObjectId if not provided
Foreign Key | Manual reference | No enforcement; app logic must maintain integrity
INDEX | Index | Same concept, same B-tree underneath
VIEW | Read-only view / materialized view | MongoDB 3.4+ supports read-only views
Stored Procedure | Atlas Functions / app logic | No stored procedures; use the app layer
ALTER TABLE | No equivalent needed | Add new fields anytime, no migration
NULL | null or missing field | A missing field ≠ a null field in queries
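
The NULL distinction is a frequent source of bugs, so here is a hedged mongosh sketch (the users collection and phone field are hypothetical):

```javascript
// { phone: null } matches documents where phone is null OR the field is missing
db.users.find({ phone: null })

// Match only documents where the field is truly absent
db.users.find({ phone: { $exists: false } })

// Match only an explicit null (field present, value null)
db.users.find({ phone: { $type: "null" } })
```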

Document Model: Embed vs Reference

// EMBED — good when data is small and always needed together
{
  _id: ObjectId("..."),
  name: "Alice",
  address: { street: "123 MG Road", city: "Mumbai", pin: "400001" }
  // address always fetched with user — single query
}

// REFERENCE — good when data is large or queried independently
// user document:
{ _id: ObjectId("user1"), name: "Alice" }

// orders collection (references user):
{ _id: ObjectId("ord1"), userId: ObjectId("user1"), total: 999 }
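
When you reference, related documents can still be stitched together at query time; a sketch using the user/orders example above:

```javascript
// Join each user with their referenced orders via a $lookup pipeline stage
db.users.aggregate([
  { $lookup: {
      from: "orders",          // collection holding the references
      localField: "_id",       // user _id ...
      foreignField: "userId",  // ... matched against orders.userId
      as: "orders"             // matching orders land in this array
  } }
])
```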
RULE: Golden rule: Embed when you always read the data together. Reference when the sub-document grows without bound (e.g., unlimited orders per user).
03
ACID Compliance
Single-document vs multi-document guarantees
Transactions
TIP: MongoDB is ACID compliant. Single-document operations have always been atomic; multi-document ACID transactions were added in MongoDB 4.0 (2018).

Single-Document ACID (Always)

Every operation on a single document is atomic — it either fully succeeds or fully fails. No partial writes. No dirty reads.

// This updateOne is atomic — all three fields update or none do
db.orders.updateOne(
  { _id: orderId },
  {
    $set: { status: "shipped" },
    $inc: { shipCount: 1 },
    $currentDate: { shippedAt: true }
  }
)

Multi-Document Transactions (MongoDB 4.0+)

Group operations across multiple documents or collections in an all-or-nothing transaction.

const session = db.getMongo().startSession();
session.startTransaction();
try {
  const accounts = session.getDatabase("bank").accounts;
  accounts.updateOne({ _id: "alice" }, { $inc: { balance: -500 } });
  accounts.updateOne({ _id: "bob"   }, { $inc: { balance: +500 } });
  session.commitTransaction();  // Both succeed or neither does
} catch(e) {
  session.abortTransaction();   // Rolls back all changes
} finally {
  session.endSession();
}

ACID Properties Mapped

Property | What it means | How MongoDB provides it
Atomicity | All-or-nothing | WiredTiger undo logs; a transaction abort reverts everything
Consistency | Valid state to valid state | Schema validation rules enforced; index consistency maintained
Isolation | Concurrent txns don't interfere | MVCC snapshot isolation: each transaction sees a consistent point-in-time snapshot
Durability | Committed data survives crashes | WAL journaling plus checkpoints every 60s; the journal is replayed on restart
PERF: Multi-document transactions add latency (roughly 3–5× slower than non-transactional writes). Prefer embedding related data in one document to avoid needing transactions at all.
04
Under the Hood
WiredTiger · MVCC · Journaling
Internals

WiredTiger Storage Engine

MongoDB's default storage engine since 3.2. Provides document-level concurrency (not collection-level locks like MMAPv1).

  • Compression: Snappy (default) or zlib/zstd; typically reduces storage by ~60–80%
  • Cache: defaults to 50% of (RAM − 1 GB), with a 256 MB minimum (configurable). Hot data stays in memory
  • B-tree indexes: same structure as SQL indexes; O(log n) lookup
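
A hedged mongosh sketch of the index point (collection and field names are illustrative):

```javascript
// Create a B-tree index on email, unique like SQL's UNIQUE INDEX
db.users.createIndex({ email: 1 }, { unique: true })

// Verify the planner uses it: the winning plan should show IXSCAN, not COLLSCAN
db.users.find({ email: "alice@example.com" }).explain("executionStats")
```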

MVCC (Multi-Version Concurrency Control)

When a transaction starts, WiredTiger creates a point-in-time snapshot. The transaction reads from this snapshot — other concurrent writes are invisible until commit.

  • Readers don't block writers; writers don't block readers
  • Conflict detection: if two transactions modify the same document, one gets a WriteConflict error and must retry
  • Optimistic concurrency — no pessimistic locking by default

Write-Ahead Logging (WAL / Journal)

Every write is first recorded in the journal on disk before being applied to data files. On crash:

  1. MongoDB restarts and finds the last valid checkpoint (created every 60s)
  2. Replays journal entries since that checkpoint
  3. All committed transactions are fully restored
TIP: Use writeConcern: { j: true } to wait for the journal flush before a write is acknowledged. Acknowledging without waiting for the journal (j: false) is faster but risks losing the last ~100 ms of writes on a crash.
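
In code, a journaled write concern looks like this (a sketch; the collection and document are illustrative):

```javascript
// Acknowledge only after the write has reached the on-disk journal
db.payments.insertOne(
  { orderId: 42, amount: 999 },
  { writeConcern: { w: "majority", j: true } }
)
```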

Oplog (Operations Log)

A special capped collection on each replica set member that records all write operations. Secondaries tail the primary's oplog and replay operations to stay in sync.

use local
db.oplog.rs.find().sort({ $natural: -1 }).limit(5)
// Shows the 5 most recent write operations replicated to secondaries
05
When to Use What
SQL vs MongoDB decision guide
Decision

Use SQL When

Scenario | Why SQL Wins
Banking / FinTech transfers | Multi-table ACID is critical; money cannot "disappear"
Inventory with complex joins | Products ↔ Suppliers ↔ Categories ↔ Orders: many foreign keys
CRM / ERP (Salesforce, SAP) | Highly structured, predictable data; complex reporting with GROUP BY
Data warehouses | Columnar SQL (BigQuery, Redshift) is optimized for analytics scans

Use MongoDB When

Scenario | Why MongoDB Wins
Social media feeds / posts | Posts have varying structure (text, video, poll, link)
Product catalogs (e-commerce) | Shirts have size/fabric; routers have ports/voltage: schema-flexible
Real-time analytics / dashboards | Aggregation pipeline; no JOIN cost; pre-computed with $out/$merge
IoT / sensor data | High write throughput; time-series collections (MongoDB 5.0+)
Content management | Nested content, variable metadata, versioning
Session / cache store | TTL indexes auto-expire documents
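
The session/cache case relies on TTL indexes, which are worth a quick sketch (the collection, field, and expiry values are illustrative):

```javascript
// TTL index: documents are removed roughly expireAfterSeconds after createdAt
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// This session document will be deleted about an hour after insertion
db.sessions.insertOne({ token: "abc123", createdAt: new Date() })
```

The TTL monitor runs about once a minute, so expiry is approximate, not instant.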
06
Edge Cases & Gotchas
Things that trip up developers
Edge Cases

Database Visibility

use newDB             // Switches context but does NOT create the DB
show dbs              // newDB won't appear here yet!
db.col.insertOne({})  // NOW the database is created and visible
NOTE: MongoDB lazily creates databases and collections. use newDB merely sets the context; the database only materializes on the first write.

Referential Integrity — None!

// If you delete a user, their orders still reference the deleted _id
db.users.deleteOne({ _id: ObjectId("user1") })
// orders: { userId: ObjectId("user1") } — orphaned! MongoDB doesn't cascade
DANGER: MongoDB has no foreign key constraints. Orphaned references are your application's responsibility. Use change streams or application-level cascades to clean up.
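
One way to implement that cleanup is a change stream that watches for deletions; a sketch assuming a long-running worker process and a replica set (change streams require one):

```javascript
// Cascade deletes from users to orders at the application level
const stream = db.users.watch([ { $match: { operationType: "delete" } } ]);
while (stream.hasNext()) {
  const change = stream.next();
  // documentKey._id is the _id of the deleted user document
  db.orders.deleteMany({ userId: change.documentKey._id });
}
```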

16MB Document Limit

// Storing unlimited comments embedded in a post document
{ _id: postId, content: "...", comments: [ ...potentially thousands... ] }
// This will eventually hit the 16MB BSON limit and throw:
// BSONObj size: 16777217 is invalid. Limit: 16777216 bytes

// Fix: store comments in a separate collection
db.comments.insertOne({ postId: postId, text: "...", createdAt: new Date() })

Transactions Require Replica Set

WARN: Multi-document transactions require a replica set. A standalone mongod does not support transactions; even a single-node deployment must be initialized as a one-member replica set (rs.initiate()).

WriteConflict in Concurrent Transactions

// Two transactions updating the same document:
// Transaction A: $inc { balance: -100 }
// Transaction B: $inc { balance: -200 } (simultaneous)
// One succeeds; the other gets WriteConflict (code 112)
// Application MUST retry transactions on TransientTransactionError

if (error.hasErrorLabel('TransientTransactionError')) {
  // safe to retry the whole transaction
}
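
The retry itself can be wrapped in a small helper; a plain-JavaScript sketch in which the error label check mirrors the driver convention, and the transaction body is simulated rather than a real MongoDB call:

```javascript
// Retry a transaction body while it fails with a transient (retryable) error.
function withRetry(txnBody, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return txnBody();                       // commit succeeded
    } catch (error) {
      lastError = error;
      const labels = error.errorLabels || []; // drivers expose hasErrorLabel(); simulated here
      if (!labels.includes("TransientTransactionError")) throw error; // not retryable
      // transient conflict (e.g. WriteConflict): loop and retry the whole transaction
    }
  }
  throw lastError;                            // gave up after maxAttempts
}

// Simulated transaction that write-conflicts once, then succeeds
let calls = 0;
const result = withRetry(() => {
  calls += 1;
  if (calls === 1) {
    const err = new Error("WriteConflict");
    err.errorLabels = ["TransientTransactionError"];
    throw err;
  }
  return "committed";
});
console.log(result, calls); // committed 2
```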

Schema Flexibility ≠ Schema-less

TIP: MongoDB is schema-flexible, not schema-less. Production systems should enforce structure with JSON Schema validation ($jsonSchema) to prevent bad writes from slipping in silently.
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      required: ["name", "email"],
      properties: {
        email: { bsonType: "string", pattern: "^.+@.+$" },
        age:   { bsonType: "int", minimum: 0 }
      }
    }
  },
  validationAction: "error"  // rejects invalid writes
})