MongoDB Schema Design Patterns
MongoDB’s flexible schema is both a blessing and a curse. After designing schemas for a range of applications, I’ve learned that good MongoDB schema design calls for different thinking than relational modeling: the structure follows how the application reads and writes data, not normalization rules.
Embedding vs Referencing
When to Embed
Embed when:
- Data is accessed together
- One-to-few relationships
- Data doesn’t change frequently
// User with addresses (one-to-few)
{
_id: ObjectId("..."),
name: "John Doe",
email: "john@example.com",
addresses: [
{
street: "123 Main St",
city: "New York",
zip: "10001"
},
{
street: "456 Oak Ave",
city: "Boston",
zip: "02101"
}
]
}
When to Reference
Reference when:
- Many-to-many relationships
- Data changes frequently
- Data is large or grows without bound
- Data needs to be queried independently
// Users collection
{
_id: ObjectId("user1"),
name: "John Doe",
email: "john@example.com"
}
// Orders collection
{
_id: ObjectId("order1"),
userId: ObjectId("user1"), // Reference
items: [
{ productId: ObjectId("prod1"), quantity: 2 },
{ productId: ObjectId("prod2"), quantity: 1 }
],
total: 99.99
}
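The two checklists above can be condensed into a rough decision helper. This is a hypothetical sketch: the `chooseModel` function, its input fields, and the rule ordering are illustrative assumptions, not a MongoDB API, and real designs weigh these factors rather than applying them mechanically.

```javascript
// Hypothetical helper encoding the embed-vs-reference checklists.
// Input fields are illustrative assumptions, not part of any MongoDB API.
function chooseModel({
  cardinality,       // "few" | "many" | "many-to-many"
  accessedTogether,  // is the child data usually read with the parent?
  changesOften,      // is the child data updated frequently on its own?
  largeOrUnbounded,  // could the child data grow large or without bound?
  queriedAlone       // do queries target the child data independently?
}) {
  // Any "reference when" condition wins: embedding would duplicate shared data,
  // force frequent rewrites of the parent, or grow documents toward 16MB.
  if (cardinality === "many-to-many" || changesOften || largeOrUnbounded || queriedAlone) {
    return "reference";
  }
  // Otherwise embed small, co-accessed, stable data (one-to-few).
  if (cardinality === "few" && accessedTogether) {
    return "embed";
  }
  return "reference";
}

// Addresses: few per user, read with the user, rarely edited.
console.log(chooseModel({ cardinality: "few", accessedTogether: true, changesOften: false, largeOrUnbounded: false, queriedAlone: false })); // → embed
// Orders: many per user, growing over time, queried on their own.
console.log(chooseModel({ cardinality: "many", accessedTogether: false, changesOften: false, largeOrUnbounded: true, queriedAlone: true })); // → reference
```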
One-to-Many Patterns
Pattern 1: Embedding (Small N)
// Good for small, bounded arrays
{
_id: ObjectId("blog1"),
title: "MongoDB Patterns",
author: "John Doe",
comments: [
{
author: "Alice",
text: "Great article!",
date: ISODate("2017-11-15")
},
{
author: "Bob",
text: "Very helpful",
date: ISODate("2017-11-16")
}
]
}
Pattern 2: Child Documents Referencing the Parent (Large N)
// Parent document
{
_id: ObjectId("blog1"),
title: "MongoDB Patterns",
author: "John Doe"
}
// Child documents
{
_id: ObjectId("comment1"),
blogId: ObjectId("blog1"),
author: "Alice",
text: "Great article!",
date: ISODate("2017-11-15")
}
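// Join the comments in at read time with $lookup (available since MongoDB 3.2)
// An index on comments.blogId keeps this join efficient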
db.blogs.aggregate([
{ $match: { _id: ObjectId("blog1") } },
{
$lookup: {
from: "comments",
localField: "_id",
foreignField: "blogId",
as: "comments"
}
}
]);
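Conceptually, this `$lookup` stage is a left outer equality join: every blog document is kept, and each gets a `comments` array holding the comment documents whose `blogId` equals its `_id` (an empty array if there are none). The plain-JavaScript sketch below mimics that behavior over in-memory arrays to make the semantics concrete; it uses strings in place of ObjectIds, the second blog is made up, and it is not how the server executes the stage.

```javascript
// In-memory illustration of $lookup { from, localField, foreignField, as }.
// Plain strings stand in for ObjectIds.
function lookup(docs, foreignDocs, localField, foreignField, as) {
  // Group foreign documents by join key (loosely what an index on blogId provides)
  const byKey = new Map();
  for (const f of foreignDocs) {
    const key = f[foreignField];
    if (!byKey.has(key)) byKey.set(key, []);
    byKey.get(key).push(f);
  }
  // Left outer join: every input document survives, with a (possibly empty) array
  return docs.map(d => ({ ...d, [as]: byKey.get(d[localField]) || [] }));
}

const blogs = [
  { _id: "blog1", title: "MongoDB Patterns" },
  { _id: "blog2", title: "Indexing Basics" } // hypothetical blog with no comments
];
const comments = [
  { _id: "comment1", blogId: "blog1", author: "Alice", text: "Great article!" },
  { _id: "comment2", blogId: "blog1", author: "Bob", text: "Very helpful" }
];

const joined = lookup(blogs, comments, "_id", "blogId", "comments");
console.log(joined[0].comments.length); // → 2
console.log(joined[1].comments.length); // → 0
```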
Pattern 3: Arrays of References
// For many-to-many relationships: each post stores an array of tag _ids
// Tags collection
{
_id: ObjectId("tag1"),
name: "mongodb"
}
// Posts collection
{
_id: ObjectId("post1"),
title: "MongoDB Guide",
tagIds: [ObjectId("tag1"), ObjectId("tag2")]
}
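With an array of references, the two common reads are "which tags does this post have?" and "which posts carry this tag?". In the shell these would be `db.tags.find({ _id: { $in: post.tagIds } })` and `db.posts.find({ tagIds: tagId })`; an equality match on an array field matches if any element equals the value, and an index on `tagIds` (a multikey index) supports the second query. The sketch below reproduces both reads over in-memory arrays, with strings standing in for ObjectIds; the second post and the name of `tag2` are made up for illustration.

```javascript
// In-memory versions of the two many-to-many reads.
// Strings stand in for ObjectIds; post2 and tag2's name are hypothetical.
const tags = [
  { _id: "tag1", name: "mongodb" },
  { _id: "tag2", name: "schema-design" }
];
const posts = [
  { _id: "post1", title: "MongoDB Guide", tagIds: ["tag1", "tag2"] },
  { _id: "post2", title: "Schema Tips", tagIds: ["tag2"] }
];

// Like db.tags.find({ _id: { $in: post.tagIds } })
function tagsForPost(post) {
  const wanted = new Set(post.tagIds);
  return tags.filter(t => wanted.has(t._id)).map(t => t.name);
}

// Like db.posts.find({ tagIds: tagId }): matches if any array element equals tagId
function postsWithTag(tagId) {
  return posts.filter(p => p.tagIds.includes(tagId)).map(p => p.title);
}

console.log(tagsForPost(posts[0])); // → [ 'mongodb', 'schema-design' ]
console.log(postsWithTag("tag2"));  // → [ 'MongoDB Guide', 'Schema Tips' ]
```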
Denormalization Patterns
Pattern 1: One-Way Embedding
// Embed frequently accessed data
// User document
{
_id: ObjectId("user1"),
name: "John Doe",
email: "john@example.com"
}
// Order document (embeds user name)
{
_id: ObjectId("order1"),
userId: ObjectId("user1"),
userName: "John Doe", // Denormalized
items: [...],
total: 99.99
}
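The price of copying `userName` into orders is that a rename must be propagated to every copy. In the shell that is one multi-document update, for example `db.orders.updateMany({ userId: user._id }, { $set: { userName: newName } })`, which is atomic per document but not across the whole set, so readers may briefly see a mix of old and new names. The sketch below shows the same fan-out over in-memory arrays; the `renameUser` helper is hypothetical.

```javascript
// Hypothetical rename that keeps denormalized copies in sync. Mirrors:
//   db.users.updateOne({ _id: userId }, { $set: { name: newName } })
//   db.orders.updateMany({ userId }, { $set: { userName: newName } })
function renameUser(users, orders, userId, newName) {
  const user = users.find(u => u._id === userId);
  if (!user) throw new Error(`unknown user ${userId}`);
  user.name = newName;

  // Count only documents that actually change, like modifiedCount
  let modified = 0;
  for (const order of orders) {
    if (order.userId === userId && order.userName !== newName) {
      order.userName = newName; // every denormalized copy must be rewritten
      modified++;
    }
  }
  return modified;
}

const users = [{ _id: "user1", name: "John Doe" }];
const orders = [
  { _id: "order1", userId: "user1", userName: "John Doe", total: 99.99 },
  { _id: "order2", userId: "user1", userName: "John Doe", total: 149.99 }
];

console.log(renameUser(users, orders, "user1", "John Q. Doe")); // → 2
```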
Pattern 2: Two-Way Embedding
// User document
{
_id: ObjectId("user1"),
name: "John Doe",
recentOrders: [
{ orderId: ObjectId("order1"), total: 99.99 },
{ orderId: ObjectId("order2"), total: 149.99 }
]
}
// Order document
{
_id: ObjectId("order1"),
userId: ObjectId("user1"),
userName: "John Doe",
items: [...],
total: 99.99
}
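Two-way embedding means each new order writes to two documents, and `recentOrders` has to stay bounded. MongoDB's `$push` accepts the `$each` and `$slice` modifiers together, so a shell update such as `db.users.updateOne({ _id: userId }, { $push: { recentOrders: { $each: [entry], $slice: -5 } } })` appends the new entry and keeps only the last five. The limit of five is an assumption; the sketch below reproduces that append-then-trim behavior in plain JavaScript.

```javascript
// Plain-JS equivalent of { $push: { recentOrders: { $each: [entry], $slice: -limit } } }.
// The default limit of 5 is an illustrative assumption.
function pushRecent(list, entry, limit = 5) {
  const next = [...list, entry];                        // $each appends to the end
  return next.slice(Math.max(next.length - limit, 0));  // negative $slice keeps the last `limit`
}

let recentOrders = [];
for (let i = 1; i <= 7; i++) {
  recentOrders = pushRecent(recentOrders, { orderId: `order${i}`, total: i * 10 });
}
console.log(recentOrders.map(o => o.orderId));
// → [ 'order3', 'order4', 'order5', 'order6', 'order7' ]
```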
Precomputed Patterns
Pattern 1: Precomputed Aggregates
// Product document with precomputed stats
{
_id: ObjectId("prod1"),
name: "Laptop",
price: 999.99,
stats: {
totalSales: 1250,
totalRevenue: 1249987.50,
averageRating: 4.5,
reviewCount: 342
}
}
// Update stats atomically on each sale ($inc adds to the stored values)
db.products.updateOne(
{ _id: ObjectId("prod1") },
{
$inc: {
"stats.totalSales": 1,
"stats.totalRevenue": 999.99
}
}
);
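Counters such as `totalSales` map directly onto `$inc`, but `averageRating` cannot be incremented. One common approach, assumed here rather than taken from the schema above, is to store a running `ratingSum` next to `reviewCount`, increment both on each review, and derive the average from them. The sketch applies that idea in plain JavaScript; `addReview` and `ratingSum` are hypothetical names.

```javascript
// Hypothetical review update: keep a running sum so the average can be derived.
// In the shell the increments would be:
//   db.products.updateOne({ _id }, { $inc: { "stats.ratingSum": rating, "stats.reviewCount": 1 } })
function addReview(stats, rating) {
  const ratingSum = stats.ratingSum + rating;
  const reviewCount = stats.reviewCount + 1;
  return {
    ...stats,
    ratingSum,
    reviewCount,
    averageRating: ratingSum / reviewCount // recomputed from the sums, never $inc'd
  };
}

let stats = { ratingSum: 0, reviewCount: 0, averageRating: 0 };
for (const rating of [5, 4, 3, 4]) {
  stats = addReview(stats, rating);
}
console.log(stats.averageRating); // → 4
```

Computing the average at read time from `ratingSum / reviewCount` avoids needing a second write just to store it.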
Pattern 2: Bucket Pattern
// Store time-series data in buckets
{
_id: ObjectId("sensor1"),
sensorId: "temp-sensor-1",
metadata: { location: "Room 101" },
measurements: [
{
timestamp: ISODate("2017-11-15T10:00:00Z"),
temperature: 72.5
},
{
timestamp: ISODate("2017-11-15T10:05:00Z"),
temperature: 73.1
}
// ... up to 1000 measurements per document
]
}
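Writes into buckets usually target the current open bucket and start a new one once it is full. A common shell formulation, assumed here since the document above has no `count` field, adds a `count` to each bucket and runs `db.readings.updateOne({ sensorId, count: { $lt: 1000 } }, { $push: { measurements: m }, $inc: { count: 1 } }, { upsert: true })`: when no bucket for the sensor has room, the filter matches nothing and the upsert creates a new bucket. The sketch below simulates that behavior in memory with a cap of 3 so the rollover is visible; the `readings` collection name and the `count` field are hypothetical.

```javascript
// In-memory simulation of bucketed inserts with a per-bucket cap.
// The cap of 3 (instead of 1000) and the `count` field are illustrative assumptions.
const BUCKET_CAP = 3;

function addMeasurement(buckets, sensorId, measurement) {
  // Like the filter { sensorId, count: { $lt: BUCKET_CAP } }
  let bucket = buckets.find(b => b.sensorId === sensorId && b.count < BUCKET_CAP);
  if (!bucket) {
    // Upsert path: no open bucket, so start a new one
    bucket = { sensorId, count: 0, measurements: [] };
    buckets.push(bucket);
  }
  bucket.measurements.push(measurement); // $push
  bucket.count += 1;                     // $inc
  return bucket;
}

const buckets = [];
const start = Date.parse("2017-11-15T10:00:00Z");
for (let i = 0; i < 7; i++) {
  addMeasurement(buckets, "temp-sensor-1", {
    timestamp: new Date(start + i * 5 * 60 * 1000), // one reading every 5 minutes
    temperature: 72.5 + i * 0.1
  });
}
console.log(buckets.map(b => b.count)); // → [ 3, 3, 1 ]
```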
Polymorphic Pattern
// Different document types in same collection
// Content collection
[
{
_id: ObjectId("content1"),
type: "article",
title: "MongoDB Guide",
body: "...",
author: "John Doe"
},
{
_id: ObjectId("content2"),
type: "video",
title: "MongoDB Tutorial",
url: "https://...",
duration: 600,
author: "John Doe"
},
{
_id: ObjectId("content3"),
type: "podcast",
title: "MongoDB Podcast",
audioUrl: "https://...",
transcript: "...",
author: "John Doe"
}
]
// Query by type
db.content.find({ type: "article" });
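Because all three shapes share `title` and `author`, listing pages can query the collection as a whole, while type-specific code branches on the `type` discriminator. A small illustrative sketch of that branching follows; the `summarize` function is hypothetical, and it assumes `duration` is in seconds, which the example documents do not state.

```javascript
// Hypothetical renderer that branches on the `type` discriminator.
function summarize(item) {
  // Shared fields are safe to read for every type
  const base = `${item.title} by ${item.author}`;
  switch (item.type) {
    case "article":
      return `${base} (article)`;
    case "video":
      // Assumes duration is stored in seconds
      return `${base} (video, ${Math.round(item.duration / 60)} min)`;
    case "podcast":
      return `${base} (podcast${item.transcript ? ", transcript available" : ""})`;
    default:
      // Unknown types may come from newer writers or bad data; don't crash on them
      return `${base} (unknown type: ${item.type})`;
  }
}

const content = [
  { type: "article", title: "MongoDB Guide", body: "...", author: "John Doe" },
  { type: "video", title: "MongoDB Tutorial", url: "https://...", duration: 600, author: "John Doe" },
  { type: "podcast", title: "MongoDB Podcast", audioUrl: "https://...", transcript: "...", author: "John Doe" }
];

for (const item of content) console.log(summarize(item));
// → MongoDB Guide by John Doe (article)
// → MongoDB Tutorial by John Doe (video, 10 min)
// → MongoDB Podcast by John Doe (podcast, transcript available)
```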
Extended Reference Pattern
// Store frequently accessed fields with reference
// Order document
{
_id: ObjectId("order1"),
userId: ObjectId("user1"),
// Extended reference
user: {
_id: ObjectId("user1"),
name: "John Doe",
email: "john@example.com"
},
items: [
{
productId: ObjectId("prod1"),
productName: "Laptop", // Denormalized
price: 999.99,
quantity: 1
}
]
}
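A useful side effect of copying `productName` and `price` into the order is that the order records what the customer actually paid, even if the product is repriced later. The sketch below builds such an order from in-memory user and product records; `buildOrder`, its inputs, and the user's `passwordHash` field are hypothetical, and only the fields shown in the order document above are copied.

```javascript
// Hypothetical order builder using the extended reference pattern.
function buildOrder(orderId, user, cart, productsById) {
  const items = cart.map(({ productId, quantity }) => {
    const product = productsById.get(productId);
    if (!product) throw new Error(`unknown product ${productId}`);
    // Snapshot name and price at order time
    return { productId, productName: product.name, price: product.price, quantity };
  });
  return {
    _id: orderId,
    userId: user._id,
    // Extended reference: only the user fields order pages need
    user: { _id: user._id, name: user.name, email: user.email },
    items
  };
}

const user = { _id: "user1", name: "John Doe", email: "john@example.com", passwordHash: "..." };
const productsById = new Map([["prod1", { _id: "prod1", name: "Laptop", price: 999.99 }]]);

const order = buildOrder("order1", user, [{ productId: "prod1", quantity: 1 }], productsById);
productsById.get("prod1").price = 899.99; // a later price change
console.log(order.items[0].price);          // → 999.99
console.log("passwordHash" in order.user);  // → false
```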
Schema Versioning
// Add version field for schema migrations
{
_id: ObjectId("user1"),
schemaVersion: 2,
name: "John Doe",
email: "john@example.com",
// New fields in v2
preferences: {
theme: "dark",
notifications: true
}
}
// Migration script (Node.js driver; `db` is a connected Db instance)
async function migrateToV3(db) {
// One server-side updateMany instead of a find() + per-document update loop
const result = await db.collection("users").updateMany(
{ schemaVersion: 2 },
{
$set: {
schemaVersion: 3,
profile: {
bio: "",
avatar: null
}
}
}
);
return result.modifiedCount;
}
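A bulk migration is not the only option: because every document carries `schemaVersion`, the application can also upgrade documents lazily as it reads them and write the upgraded version back when convenient. The sketch below assumes that documents without a `schemaVersion` field are version 1 and that version 2 defaults `preferences` to `{ theme: "light", notifications: true }`; both are hypothetical, while the version 3 `profile` default matches the migration script.

```javascript
// Hypothetical read-time upgrader; each step moves a document up one version.
const upgrades = {
  // v1 -> v2: assumed defaults for the preferences introduced in v2
  1: doc => ({ ...doc, schemaVersion: 2, preferences: { theme: "light", notifications: true } }),
  // v2 -> v3: same defaults as migrateToV3
  2: doc => ({ ...doc, schemaVersion: 3, profile: { bio: "", avatar: null } })
};
const CURRENT_VERSION = 3;

function upgradeUser(doc) {
  let current = { ...doc, schemaVersion: doc.schemaVersion || 1 }; // missing => v1 (assumption)
  while (current.schemaVersion < CURRENT_VERSION) {
    const step = upgrades[current.schemaVersion];
    if (!step) throw new Error(`no upgrade from v${current.schemaVersion}`);
    current = step(current);
  }
  return current;
}

const legacy = { _id: "user1", name: "John Doe", email: "john@example.com" };
const upgraded = upgradeUser(legacy);
console.log(upgraded.schemaVersion);     // → 3
console.log(upgraded.preferences.theme); // → light
```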
Indexing Strategies
// Compound indexes for common queries
db.orders.createIndex({ userId: 1, createdAt: -1 });
// Text index for search
db.articles.createIndex({
title: "text",
content: "text"
});
// Geospatial index
db.locations.createIndex({ location: "2dsphere" });
// TTL index for expiration
db.sessions.createIndex(
{ createdAt: 1 },
{ expireAfterSeconds: 3600 }
);
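With the TTL index above, a session becomes eligible for deletion once `createdAt` plus `expireAfterSeconds` lies in the past. The indexed field must hold a BSON date (or an array of dates), and removal is not instant: the background task that deletes expired documents runs roughly every 60 seconds, so documents can briefly outlive their expiry. The sketch below only computes eligibility; `expiresAt` and `isExpired` are hypothetical helpers, useful for treating a not-yet-deleted session as invalid in application code.

```javascript
// Hypothetical check mirroring the TTL rule: expired once createdAt + ttl is reached.
const SESSION_TTL_SECONDS = 3600; // matches expireAfterSeconds above

function expiresAt(session, ttlSeconds = SESSION_TTL_SECONDS) {
  if (!(session.createdAt instanceof Date)) {
    // TTL indexes skip documents whose field is not a date; they never expire
    return null;
  }
  return new Date(session.createdAt.getTime() + ttlSeconds * 1000);
}

function isExpired(session, now = new Date()) {
  const at = expiresAt(session);
  // Treat the session as dead even if the TTL task has not removed it yet
  return at !== null && at <= now;
}

const session = { _id: "s1", createdAt: new Date("2017-11-15T10:00:00Z") };
console.log(expiresAt(session).toISOString());                      // → 2017-11-15T11:00:00.000Z
console.log(isExpired(session, new Date("2017-11-15T10:30:00Z")));  // → false
console.log(isExpired(session, new Date("2017-11-15T11:00:30Z")));  // → true
```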
Best Practices
- Design for queries - Structure based on access patterns
- Embed for one-to-few - Reference for one-to-many
- Denormalize carefully - Balance consistency vs performance
- Use appropriate indexes - Support your queries
- Watch document size - BSON documents are capped at 16MB
- Version your schema - Plan for migrations
- Precompute aggregates - For frequently accessed data
- Use buckets - For time-series data
Common Anti-Patterns
Anti-Pattern 1: Massive Arrays
// BAD: Unbounded array growth
{
_id: ObjectId("user1"),
logins: [
// Could grow to millions
]
}
// GOOD: Store each login as its own document in a separate collection (or use buckets)
{
_id: ObjectId("login1"),
userId: ObjectId("user1"),
timestamp: ISODate("2017-11-15T10:00:00Z")
}
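With one document per login, per-user questions become indexed queries or aggregations instead of scans of an ever-growing array. For instance, logins per user could be computed with `db.logins.aggregate([{ $group: { _id: "$userId", count: { $sum: 1 } } }])`, and an index on `{ userId: 1, timestamp: -1 }` would serve "latest logins for a user". The sketch below performs the same grouping in memory over hypothetical login records; the `logins` collection name is an assumption.

```javascript
// In-memory equivalent of { $group: { _id: "$userId", count: { $sum: 1 } } }.
function countLoginsByUser(logins) {
  const counts = new Map();
  for (const { userId } of logins) {
    counts.set(userId, (counts.get(userId) || 0) + 1);
  }
  // Shape results like $group output documents
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

const logins = [
  { _id: "login1", userId: "user1", timestamp: new Date("2017-11-15T10:00:00Z") },
  { _id: "login2", userId: "user2", timestamp: new Date("2017-11-15T11:00:00Z") },
  { _id: "login3", userId: "user1", timestamp: new Date("2017-11-16T09:00:00Z") }
];

console.log(JSON.stringify(countLoginsByUser(logins)));
// → [{"_id":"user1","count":2},{"_id":"user2","count":1}]
```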
Anti-Pattern 2: Over-Normalization
// BAD: Too many references (like relational DB)
{
_id: ObjectId("order1"),
userId: ObjectId("user1"),
itemIds: [ObjectId("item1"), ObjectId("item2")]
}
// GOOD: Embed when appropriate
{
_id: ObjectId("order1"),
userId: ObjectId("user1"),
items: [
{ productId: ObjectId("prod1"), name: "Laptop", price: 999.99 }
]
}
Conclusion
MongoDB schema design requires:
- Understanding access patterns
- Choosing embedding vs referencing
- Strategic denormalization
- Proper indexing
- Schema versioning
Design for your queries, not for normalization. In my experience, these patterns hold up in production on collections with millions of documents.
MongoDB schema patterns from November 2017, covering common design patterns for NoSQL databases.