Documents & BSON

FILE 03_documents_bson
TOPIC BSON Types · ObjectId · Structure · Naming Rules
LEVEL Foundation
01
JSON vs BSON
Why MongoDB uses binary encoding
Encoding

MongoDB stores data as BSON (Binary JSON) — a superset of JSON that adds richer type information and encodes data in binary for efficiency.

Feature | JSON | BSON
Format | Human-readable text (UTF-8) | Binary, not human-readable
Numbers | Single generic "Number" type | Int32, Int64, Double, Decimal128
Dates | Strings (no native Date type) | Native 64-bit UTC timestamp
Binary | Base64 encode workaround | Native BinData type
ObjectId | Not available | Native 12-byte type
Speed | Slower (text parsing) | Faster (binary traversal)
Size | Smaller for simple strings | Smaller for typed numeric data
NOTE: The MongoDB wire protocol and storage both use BSON. When you use the shell or a driver, it serializes your JavaScript objects to BSON and deserializes responses from BSON transparently.
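
To make the binary layout concrete, here is a minimal sketch of a BSON encoder written for Node.js. It is not the driver's implementation: the function name encodeBson is hypothetical, and it handles only string and 32-bit integer values. It follows the layout in the public BSON specification (a 4-byte little-endian size, typed elements, and a terminating zero byte).

```javascript
// Illustrative sketch only: encodes flat documents whose values are strings or
// 32-bit integers. Real drivers support every BSON type and nested documents.
function encodeBson(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const name = Buffer.from(key + "\0", "utf8"); // field name as a C string
    if (typeof value === "string") {
      const text = Buffer.from(value + "\0", "utf8");
      const length = Buffer.alloc(4);
      length.writeInt32LE(text.length); // string length includes the trailing NUL
      parts.push(Buffer.from([0x02]), name, length, text); // 0x02 = string
    } else if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      const number = Buffer.alloc(4);
      number.writeInt32LE(value);
      parts.push(Buffer.from([0x10]), name, number); // 0x10 = int32
    } else {
      throw new TypeError(`unsupported value for field "${key}"`);
    }
  }
  const body = Buffer.concat(parts);
  const size = Buffer.alloc(4);
  size.writeInt32LE(4 + body.length + 1); // size prefix + elements + terminator
  return Buffer.concat([size, body, Buffer.from([0x00])]);
}

const bytes = encodeBson({ hello: "world" });
console.log(bytes.toString("hex")); // 160000000268656c6c6f0006000000776f726c640000
console.log(bytes.length);                               // 22 bytes as BSON
console.log(JSON.stringify({ hello: "world" }).length);  // 17 characters as JSON
```

For this small string-only document the BSON form is larger than the JSON text, which matches the "Size" row above: the length prefixes and type tags pay off in traversal speed and typed values, not always in size.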
02
BSON Data Types
Complete type reference with usage notes
Types
Type | BSON Alias | Example | When to Use
String | string | "MongoDB" | Names, text, slugs
Int32 | int | NumberInt(42) | Counts, small IDs (fits in 32 bits)
Int64 | long | NumberLong("9007199254740993") | Large counters, timestamps-as-int (pass the value as a string; a bare number literal above 2^53 loses precision)
Double | double | 3.14 | Default for JS numbers; scores, percentages
Decimal128 | decimal | NumberDecimal("19.99") | Financial values (money): exact decimal precision
Boolean | bool | true / false | Flags, toggles, pass/fail
Date | date | new Date() | Timestamps, event dates; use new Date(), not strings
ObjectId | objectId | ObjectId("...") | Default _id, unique identifier
Array | array | [1, "two", true] | Tags, skills, phone numbers, ordered lists
Object | object | {city:"Mumbai"} | Embedded sub-documents, nested config
Null | null | null | Explicit absence; optional fields
Binary | binData | BinData(0,"...") | Images, encrypted data, raw bytes
Regular Expression | regex | /pattern/i | Pattern storage; use $regex operator to query
Timestamp | timestamp | Timestamp(1,1) | Internal MongoDB use (oplog); avoid in application data
MinKey / MaxKey | minKey / maxKey | MinKey() / MaxKey() | Comparison anchors for shard ranges
Undefined | undefined | (none) | DEPRECATED; use null instead

Critical: Double vs Decimal128

// WRONG for money — binary float has precision issues:
{ price: 19.99 }  // stored as Double: 19.989999999999998...

// CORRECT for money — exact decimal:
{ price: NumberDecimal("19.99") }  // stored as Decimal128: exactly 19.99

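// Verify the difference. The JavaScript + operator cannot add NumberDecimal
// values, so let the server add them with $add:
db.test.insertOne({ d: 0.1 + 0.2, a: NumberDecimal("0.1"), b: NumberDecimal("0.2") })
db.test.aggregate([{ $project: { _id: 0, d: 1, dec: { $add: ["$a", "$b"] } } }])
// d: 0.30000000000000004 (Double drift)
// dec: NumberDecimal("0.3") (Decimal128, exact)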
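
The drift comes from binary floating point itself, not from MongoDB, so it can be reproduced in plain JavaScript. The sketch below contrasts Double addition with exact base-10 addition on scaled integers, which is the idea behind a decimal type. The helpers toScaled, fromScaled, and addDecimals are hypothetical names for this illustration, handle only non-negative values, and are not how Decimal128 is implemented.

```javascript
// Binary floating point cannot represent 0.1 or 0.2 exactly.
console.log(0.1 + 0.2);         // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3); // false

// Parse a non-negative decimal string into an integer count of 10^-scale units.
function toScaled(text, scale) {
  const match = /^(\d+)(?:\.(\d+))?$/.exec(text);
  if (!match) throw new TypeError(`not a plain non-negative decimal: ${text}`);
  const fraction = match[2] ?? "";
  if (fraction.length > scale) throw new RangeError(`too many fraction digits: ${text}`);
  return BigInt(match[1] + fraction.padEnd(scale, "0"));
}

// Format the scaled integer back into a decimal string.
function fromScaled(value, scale) {
  const digits = value.toString().padStart(scale + 1, "0");
  return scale === 0 ? digits : `${digits.slice(0, -scale)}.${digits.slice(-scale)}`;
}

// Exact addition: all arithmetic happens on integers, so nothing is rounded.
function addDecimals(a, b, scale = 2) {
  return fromScaled(toScaled(a, scale) + toScaled(b, scale), scale);
}

console.log(addDecimals("0.10", "0.20"));  // "0.30"
console.log(addDecimals("19.99", "0.01")); // "20.00"
```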

Date Gotcha

// WRONG — stores a string, not a Date object
{ createdAt: "2024-01-15" }              // type: string
{ createdAt: Date() }                    // Date() without 'new' returns a STRING

// CORRECT — stores actual BSON Date (64-bit UTC milliseconds)
{ createdAt: new Date() }                // type: date — sortable, queryable with $gt/$lt
{ createdAt: ISODate("2024-01-15") }     // same as new Date("2024-01-15")

// String dates break date range queries:
db.orders.find({ createdAt: { $gte: new Date("2024-01-01") } })
// → returns 0 results if createdAt is stored as string
WARN: Always use new Date() (with new) to store dates. Date() without new returns a plain string in JavaScript.
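
The difference between Date() and new Date() is plain JavaScript behavior, so it can be checked outside the shell. This short sketch also shows why string dates sort badly; the day-first strings are made-up sample values.

```javascript
const asString = Date();               // called as a function: returns a string
const asDate = new Date("2024-01-15"); // constructed: a Date object at UTC midnight

console.log(typeof asString);        // "string"
console.log(asDate instanceof Date); // true
console.log(asDate.toISOString());   // "2024-01-15T00:00:00.000Z"

// Strings compare character by character, so day-first strings sort wrongly.
const earlier = "15/01/2024"; // 15 January 2024
const later = "02/02/2024";   // 2 February 2024
console.log(earlier < later); // false, because "1" sorts after "0"

// Date objects compare by their millisecond value, so the order is correct.
console.log(new Date("2024-01-15") < new Date("2024-02-02")); // true
```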

Full Document Example with Common Types

db.products.insertOne({
  _id: ObjectId(),                         // ObjectId
  name: "Wireless Headphones",             // String
  price: NumberDecimal("2499.99"),         // Decimal128 (money)
  stock: NumberInt(150),                   // Int32
  inStock: true,                           // Boolean
  tags: ["electronics", "audio", "sony"],  // Array
  specs: { brand: "Sony", weight: "250g" }, // Object (embedded doc)
  createdAt: new Date(),                   // Date
  image: BinData(0, "base64=="),           // Binary
  discount: null                           // Null (no current discount)
})
03
ObjectId Deep Dive
Structure · Methods · Custom _id patterns
ObjectId

Structure — 12 Bytes, 24 Hex Chars

// 507f1f77bcf86cd799439011
// ├── bytes 0-3  ──┤ ├─ 4-8 ─┤ ├9-11┤
//  timestamp(4B)   random(5B) counter(3B)
//  seconds since     per-process  increments per
//  Unix epoch        random seed  ObjectId created
Bytes | Size | Content | Purpose
0–3 | 4 bytes | Unix timestamp (seconds) | Embeds creation time; enables time-range queries via _id
4–8 | 5 bytes | Random value per process | Machine/process uniqueness, no central coordinator needed
9–11 | 3 bytes | Incrementing counter (random start) | Uniqueness within same second on same process
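
The byte layout above can be decoded by hand. The following sketch splits a 24-character hex string into its three parts using only string slicing; parseObjectId is a hypothetical helper for illustration, not a shell or driver function.

```javascript
// Split an ObjectId hex string into timestamp, random value, and counter.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) throw new TypeError("expected 24 hex characters");
  const seconds = parseInt(hex.slice(0, 8), 16);  // bytes 0-3: seconds since epoch
  return {
    createdAt: new Date(seconds * 1000),
    random: hex.slice(8, 18),                     // bytes 4-8: per-process random value
    counter: parseInt(hex.slice(18, 24), 16),     // bytes 9-11: incrementing counter
  };
}

const parts = parseObjectId("507f1f77bcf86cd799439011");
console.log(parts.createdAt.toISOString()); // "2012-10-17T21:13:27.000Z"
console.log(parts.random);                  // "bcf86cd799"
console.log(parts.counter);                 // 4427793
```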

ObjectId Methods

// Create new ObjectId
const id = new ObjectId()
const id2 = ObjectId()    // same thing — 'new' is optional in mongosh

// Extract creation timestamp (free, no extra field needed)
id.getTimestamp()         // → ISODate("2024-01-15T10:30:00Z")

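// String representations (output shown for mongosh; the legacy mongo shell differs)
id.toString()             // → "507f1f77bcf86cd799439011" (legacy shell: 'ObjectId("507f1f77bcf86cd799439011")')
id.toHexString()          // → "507f1f77bcf86cd799439011"
id.str                    // → "507f1f77bcf86cd799439011" (legacy shell attribute)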

// Query by ObjectId (MUST wrap string in ObjectId())
db.users.findOne({ _id: ObjectId("507f1f77bcf86cd799439011") })
// String "507f1f7..." ≠ ObjectId("507f1f7...") — type mismatch!

// Time-range query using _id (no createdAt field needed)
const start = ObjectId.createFromTime(new Date("2024-01-01").getTime() / 1000)
const end   = ObjectId.createFromTime(new Date("2024-02-01").getTime() / 1000)
db.orders.find({ _id: { $gte: start, $lt: end } })
// Returns all orders created in January 2024
TIP: Free timestamp: every ObjectId-based _id already embeds its creation time, to the second, as seen by the client or server that generated it. Don't duplicate it with a createdAt field unless you need sub-second precision or plan to sort by creation time with a custom index.
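
A boundary value for such a time-range query is just the timestamp followed by zero bytes. As a sketch of what a helper like createFromTime produces, the hypothetical function below builds the 24-character hex string in plain JavaScript.

```javascript
// Build the smallest ObjectId hex string for a given instant:
// 4 timestamp bytes followed by 8 zero bytes.
function objectIdHexFromDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  if (!Number.isFinite(seconds) || seconds < 0 || seconds > 0xffffffff) {
    throw new RangeError("date is outside the 32-bit ObjectId timestamp range");
  }
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

console.log(objectIdHexFromDate(new Date("2024-01-01"))); // "659200800000000000000000"
console.log(objectIdHexFromDate(new Date("2024-02-01"))); // "65badf000000000000000000"
```

In the shell, each string can be wrapped as ObjectId("...") and used as the $gte or $lt bound on _id.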

Custom _id Types

// Any unique BSON value can be _id, except an array, regex, or undefined
db.countries.insertOne({ _id: "US", name: "United States" })     // String
db.products.insertOne({ _id: NumberInt(1001), name: "Widget" }) // Integer
db.sessions.insertOne({ _id: UUID(), token: "abc..." })          // UUID
db.config.insertOne({ _id: { type: "limit", env: "prod" } })    // Compound object

// Compound _id (natural unique key, no separate unique index needed; field order is significant)
db.enrollments.insertOne({ _id: { userId: "u1", courseId: "c1" }, grade: "A" })
WARN: _id is immutable. Once set, it cannot be changed. Attempting $set: { _id: newValue } throws: "Performing an update on the path '_id' would modify the immutable field '_id'".
04
Document Structure
Hierarchy · nesting · field ordering
Structure

The MongoDB Hierarchy

MongoDB Server
 └── Database (e.g., EcommerceDB)
      └── Collection (e.g., products)
           └── Document (e.g., { name: "iPhone", price: 999 })
                └── Fields + Values (BSON key-value pairs)

Document Rules

  • Max size: 16 MB per document. Use GridFS for larger files (images, videos).
  • _id is mandatory: Auto-generated ObjectId if not provided. Cannot be an array.
  • _id is always first: Field order is preserved except _id always comes first in storage.
  • Field names: Must be UTF-8 strings without null characters. Top-level names should not start with $ (reserved for operators; MongoDB 5.0 and later accept such names only with restrictions).
  • Nesting: Up to 100 levels of nesting are supported, but deeply nested documents are hard to index and query.
  • Schema-flexible: Documents in the same collection can have completely different fields.

Dot Notation for Nested Access

// Document:
{
  _id: 1,
  user: { name: "Alice", address: { city: "Mumbai", pin: "400001" } },
  scores: [85, 92, 78]
}

// Query nested field:
db.col.find({ "user.address.city": "Mumbai" })   // quotes required

// Update nested field (doesn't overwrite sibling fields):
db.col.updateOne({ _id: 1 }, { $set: { "user.address.city": "Delhi" } })
// Only changes city — name, pin, everything else preserved

// WRONG — replaces entire address object:
db.col.updateOne({ _id: 1 }, { $set: { "user.address": { city: "Delhi" } } })
// pin field is now GONE

// Access array by index:
db.col.find({ "scores.0": 85 })                  // first element is 85
db.col.updateOne({ _id: 1 }, { $set: { "scores.1": 95 } })  // update index 1
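
The difference between setting a dotted path and replacing the parent object can be mimicked on a plain JavaScript object. The sketch below is a simplified model for illustration, not the server's update logic; getPath and setPath are hypothetical helpers, and the error text is made up.

```javascript
// Read a nested value by dot path; returns undefined if any step is missing.
function getPath(doc, path) {
  return path.split(".").reduce((node, key) => (node == null ? undefined : node[key]), doc);
}

// Set a nested value by dot path, creating missing parents but refusing to
// descend into null or scalar parents.
function setPath(doc, path, value) {
  const keys = path.split(".");
  let node = doc;
  for (const key of keys.slice(0, -1)) {
    if (node[key] === null || typeof node[key] !== "object") {
      if (node[key] !== undefined) throw new Error(`cannot create field under "${key}"`);
      node[key] = {}; // missing parent: create an empty embedded document
    }
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
  return doc;
}

const doc = {
  _id: 1,
  user: { name: "Alice", address: { city: "Mumbai", pin: "400001" } },
  scores: [85, 92, 78],
};

setPath(doc, "user.address.city", "Delhi");
console.log(getPath(doc, "user.address.pin")); // "400001", sibling preserved
console.log(getPath(doc, "scores.0"));         // 85, array index as a path step

setPath(doc, "user.address", { city: "Pune" }); // whole-object replacement
console.log(getPath(doc, "user.address.pin")); // undefined, pin is gone
```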

Field Order Preservation

// MongoDB preserves field insertion order (except _id always goes first)
db.test.insertOne({ b: 2, a: 1, c: 3 })
db.test.findOne({})
// → { _id: ObjectId("..."), b: 2, a: 1, c: 3 }
// Note: _id appears FIRST even though it wasn't in inserted position
NOTE: Do not rely on non-_id field order for business logic. Use sort() for ordering results, not field insertion position.
05
Naming Rules
Database · collection · field constraints
Rules
Entity | Rules | Anti-patterns
Database | Case-insensitive (names cannot differ only by case), fewer than 64 characters. Forbidden: /\. "$*<>:|? (Windows), /\. "$ (Unix). Conventional: lowercase or CamelCase. | My Database (space), DB$Name
Collection | Must start with _ or a letter (a–z, A–Z). Cannot start with system. (reserved). Cannot contain $ or null characters. Max 120 bytes including the database prefix (raised to 255 bytes for unsharded collections in MongoDB 4.4). | system.users, 2024stats, my$col
Field | Must be a UTF-8 string. Top-level names cannot start with $ (reserved for operators). A dot in a query or update path means nested-field access, so avoid dots inside field names. _id is reserved and always unique. | $price, fields with . in name
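
The rules in the table can be turned into quick client-side checks. This sketch encodes only the constraints listed above (using the stricter Unix character set for databases); the function names are hypothetical, and the server remains the final authority on what it accepts.

```javascript
// Collection name: starts with _ or a letter, not in the system. namespace,
// and free of $ and null characters.
function isValidCollectionName(name) {
  return (
    typeof name === "string" &&
    /^[A-Za-z_]/.test(name) &&
    !name.startsWith("system.") &&
    !name.includes("$") &&
    !name.includes("\0")
  );
}

// Database name: non-empty, fewer than 64 bytes, and free of / \ . space " $ NUL.
function isValidDatabaseName(name) {
  return (
    typeof name === "string" &&
    name.length > 0 &&
    Buffer.byteLength(name, "utf8") < 64 &&
    !/[\/\\. "$\0]/.test(name)
  );
}

console.log(isValidCollectionName("orders"));       // true
console.log(isValidCollectionName("system.users")); // false
console.log(isValidCollectionName("2024stats"));    // false
console.log(isValidDatabaseName("My Database"));    // false
console.log(isValidDatabaseName("ecommerce"));      // true
```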

Reserved Databases

// MongoDB auto-creates these — never drop or modify them:
admin   // Authentication, authorization, server commands
local   // Replication oplog, never replicated to secondaries
config  // Sharding metadata for mongos router
DANGER: Never run db.dropDatabase() on admin or local. This can corrupt your replica set configuration or lose all user credentials.
06
Edge Cases
null · missing · NaN · type mismatches
Edge Cases

null vs Missing Field

// Two different documents:
{ _id: 1, name: "Alice", phone: null }   // field EXISTS but is null
{ _id: 2, name: "Bob"  }                 // phone field is ABSENT (missing)

// This query matches BOTH:
db.users.find({ phone: null })           // null OR missing — returns both docs

// To match ONLY explicitly null (not missing):
db.users.find({ phone: { $eq: null, $exists: true } })   // returns _id:1 only

// To match ONLY missing (not even null):
db.users.find({ phone: { $exists: false } })             // returns _id:2 only
WARN: { field: null } is one of MongoDB's most common gotchas: it matches both null values AND missing fields. Always use $exists when you need precision.
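
The three query shapes above correspond to three different predicates. The sketch below models them on plain JavaScript objects so the distinction is easy to test without a database; the predicate names and the third sample user are made up for illustration.

```javascript
const users = [
  { _id: 1, name: "Alice", phone: null }, // field exists, value is null
  { _id: 2, name: "Bob" },                // field is missing
  { _id: 3, name: "Cara", phone: "555-0100" }, // hypothetical extra document
];

// Models { phone: null }: null OR missing.
const nullOrMissing = (doc, field) => !(field in doc) || doc[field] === null;
// Models { phone: { $eq: null, $exists: true } }: explicitly null only.
const explicitNull = (doc, field) => field in doc && doc[field] === null;
// Models { phone: { $exists: false } }: missing only.
const missingOnly = (doc, field) => !(field in doc);

const ids = (predicate) => users.filter((doc) => predicate(doc, "phone")).map((doc) => doc._id);

console.log(ids(nullOrMissing)); // [ 1, 2 ]
console.log(ids(explicitNull));  // [ 1 ]
console.log(ids(missingOnly));   // [ 2 ]
```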

NaN (Not a Number)

// NaN is a valid BSON Double value
db.data.insertOne({ value: NaN })

// Query for NaN — must use exact equality
db.data.find({ value: NaN })            // works
db.data.find({ value: { $gt: NaN } })   // returns nothing — comparisons don't work

// $inc on NaN stays NaN
db.data.updateOne({ value: NaN }, { $inc: { value: 1 } })
// value is still NaN — NaN + 1 = NaN
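
Because NaN usually enters a collection through application arithmetic, it can help to catch it before the write. The sketch below shows the relevant JavaScript behavior and a small guard; findNaNFields is a hypothetical helper that checks top-level fields only. Note that JavaScript's === never matches NaN, unlike the equality query shown above.

```javascript
console.log(NaN === NaN);            // false: strict equality never matches NaN
console.log(Number.isNaN(NaN));      // true: the reliable test
console.log(NaN > 0, NaN < 0);       // false false: ordering comparisons fail
console.log(Number.isNaN(NaN + 1));  // true: arithmetic on NaN stays NaN

// List the top-level fields of a document whose value is NaN.
function findNaNFields(doc) {
  return Object.keys(doc).filter((key) => Number.isNaN(doc[key]));
}

const reading = { sensor: "t1", value: 0 / 0, note: "abc" };
console.log(findNaNFields(reading)); // [ 'value' ]
```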

Type Sensitivity in Queries

// Price stored as string by mistake:
db.products.insertOne({ name: "Widget", price: "499" })   // string!

// This query returns nothing:
db.products.find({ price: { $gt: 100 } })   // 100 is number; "499" is string

// Find and fix type mismatches:
db.products.find({ price: { $type: "string" } }).forEach(doc => {
  // Prices are money, so convert to Decimal128 rather than a Double
  db.products.updateOne({ _id: doc._id }, { $set: { price: NumberDecimal(doc.price) } })
})
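
Two JavaScript habits make this mistake easy to miss: the language coerces strings in comparisons, and parseFloat accepts trailing garbage. The sketch below contrasts that with a same-type comparison (a rough model of how the query above behaves) and a stricter parser. The names gtSameType and parsePrice are hypothetical.

```javascript
console.log("499" > 100); // true in JavaScript: the string is coerced to a number

// Rough model of a type-sensitive comparison: values of different types never match.
function gtSameType(value, bound) {
  return typeof value === typeof bound && value > bound;
}
console.log(gtSameType("499", 100)); // false
console.log(gtSameType(499, 100));   // true

// Strict conversion: accept only plain non-negative decimals, otherwise null.
function parsePrice(text) {
  const trimmed = String(text).trim();
  return /^\d+(\.\d+)?$/.test(trimmed) ? Number(trimmed) : null;
}
console.log(parseFloat("12abc")); // 12: parseFloat silently drops the suffix
console.log(parsePrice("12abc")); // null
console.log(parsePrice("499"));   // 499
```

Running a check like parsePrice before a bulk conversion makes it possible to report unparseable values instead of storing a wrong number.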

Duplicate Key on _id: null

// null is a valid _id — but only ONE document can have _id:null
db.test.insertOne({ _id: null, x: 1 })   // succeeds
db.test.insertOne({ _id: null, x: 2 })   // E11000 duplicate key error!

Nested Field on null Parent

// Document: { _id: 1, address: null }
db.users.updateOne({ _id: 1 }, { $set: { "address.city": "Mumbai" } })
// Error: "Cannot create field 'city' in element {address: null}"
// Fix: first set the parent object, then set the nested field
db.users.updateOne({ _id: 1 }, { $set: { address: {} } })
db.users.updateOne({ _id: 1 }, { $set: { "address.city": "Mumbai" } })