MongoDB Schema Design in Production

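MongoDB's flexible document model is a double-edged sword. Designed well, it delivers excellent performance; designed poorly, it leads to slow queries and bloated documents.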

Design Around Access Patterns

Unlike SQL, where you start by normalizing the data, MongoDB schema design is driven by access patterns:

Which queries run most often?
What is the read-to-write ratio?
How does the data change over time?

Embedding vs. Referencing

// Embedding: all data lives in one document
// Best for: data that is always accessed together, bounded in size, one-to-few
{
  _id: ObjectId("..."),
  name: "John Doe",
  addresses: [
    { type: "home", street: "123 Main St", city: "Austin" },
    { type: "work", street: "456 Corp Ave", city: "Austin" }
  ],
  preferences: { notifications: true, theme: "dark" }
}

// Referencing: foreign-key style
// Best for: large or unbounded data, independent access, many-to-many
{
  _id: ObjectId("..."),
  name: "John Doe",
  order_ids: [ObjectId("..."), ObjectId("...")]
}
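
When orders are referenced rather than embedded, reading a user together with their orders takes a join step. The following is a minimal sketch of that read, assuming the referenced documents live in a collection named orders and the users in a collection named users (both names are hypothetical; neither is given above). It only builds the $lookup pipeline, so it runs without a database.

```javascript
// Builds an aggregation pipeline that loads one user plus the orders
// referenced by its order_ids array.
// Assumption: the referenced documents live in a collection called "orders".
function buildUserWithOrdersPipeline(userId) {
  return [
    { $match: { _id: userId } },
    {
      $lookup: {
        from: "orders",          // hypothetical collection name
        localField: "order_ids", // array of ObjectIds on the user document
        foreignField: "_id",
        as: "orders"
      }
    }
  ];
}

const userOrdersPipeline = buildUserWithOrdersPipeline("user-1");
// Usage in mongosh (requires a live database):
// db.users.aggregate(buildUserWithOrdersPipeline(someUserId))
```

The extra $lookup stage is the cost of referencing: the embedded version returns everything in a single document read.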

The Bucket Pattern for Time-Series Data

Group measurements into time buckets to cut the document count dramatically:

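// One document per measurement = about 1 million documents per sensor per year

// Bucket pattern: one document per hour = 8,760 documents per sensor per year (roughly 114x fewer)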
{
  sensor_id: "s1",
  date: ISODate("2026-01-01T00:00:00"),
  nMeasurements: 60,
  measurements: [23.4, 23.5, 23.6],
  summary: { min: 23.1, max: 23.8, avg: 23.5 }
}

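// Efficient update: append to the existing bucket, or create one via upsert if none is open
// (summary.avg is not maintained by this update; recompute it on read or keep a running sum)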
db.sensor_data.updateOne(
  {
    sensor_id: "s1",
    date: ISODate("2026-01-01T00:00:00"),
    nMeasurements: { $lt: 60 }
  },
  {
    $push: { measurements: 23.7 },
    $inc: { nMeasurements: 1 },
    $min: { "summary.min": 23.7 },
    $max: { "summary.max": 23.7 }
  },
  { upsert: true }
)
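
The update above hard-codes the bucket date and the value. A small sketch of how an application might derive them per reading is shown here. The helper names are hypothetical, and the running summary.sum field is an assumption added so the average can be derived at read time; it is not part of the document shown above.

```javascript
// Truncates a timestamp to the start of its UTC hour; this is the bucket key.
function bucketStart(timestamp) {
  const d = new Date(timestamp.getTime());
  d.setUTCMinutes(0, 0, 0); // zero out minutes, seconds and milliseconds
  return d;
}

// Builds the filter, update and options for appending one reading to its
// hourly bucket.
// Assumption: a running summary.sum is kept so the average can be computed
// as summary.sum / nMeasurements when the bucket is read.
function buildBucketUpsert(sensorId, timestamp, value, maxPerBucket = 60) {
  return {
    filter: {
      sensor_id: sensorId,
      date: bucketStart(timestamp),
      nMeasurements: { $lt: maxPerBucket }
    },
    update: {
      $push: { measurements: value },
      $inc: { nMeasurements: 1, "summary.sum": value },
      $min: { "summary.min": value },
      $max: { "summary.max": value }
    },
    options: { upsert: true }
  };
}

const bucketOp = buildBucketUpsert("s1", new Date("2026-01-01T10:37:12Z"), 23.7);
// Usage in mongosh (requires a live database):
// db.sensor_data.updateOne(bucketOp.filter, bucketOp.update, bucketOp.options)
```

Truncating in UTC keeps bucket boundaries independent of the server's or client's local time zone.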

Native Time-Series Collections (MongoDB 5.0+)

db.createCollection("sensor_readings", {
  timeseries: {
    timeField: "timestamp",
    metaField: "sensor_id",
    granularity: "minutes"
  },
  expireAfterSeconds: 7776000  // 90-day TTL
})

// Efficient hourly aggregation
db.sensor_readings.aggregate([
  { $match: { sensor_id: "temp_001", timestamp: { $gte: new Date("2026-01-01") } } },
  {
    $group: {
      _id: { $dateTrunc: { date: "$timestamp", unit: "hour" } },
      avgTemp: { $avg: "$temperature" },
      count: { $sum: 1 }
    }
  },
  { $sort: { _id: 1 } }
])
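
The expireAfterSeconds value is simply the retention period expressed in seconds. The sketch below, using hypothetical helper names, derives it from a number of days and shapes a reading so that its fields match the timeField and metaField configured above and the temperature field used in the aggregation.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Converts a retention period in days to the value expected by
// expireAfterSeconds.
function retentionSeconds(days) {
  return days * SECONDS_PER_DAY;
}

// Shapes one reading for the sensor_readings collection.
function toReading(sensorId, temperature, at = new Date()) {
  return { timestamp: at, sensor_id: sensorId, temperature };
}

const ttlSeconds = retentionSeconds(90);
const reading = toReading("temp_001", 23.4, new Date("2026-01-01T00:05:00Z"));
// Usage in mongosh (requires a live database):
// db.sensor_readings.insertOne(reading)
```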

The Outlier Pattern

Handle the occasional document that breaks the usual pattern:

// Typical post: comments embedded
{ _id: ObjectId("post_1"), comments: [...up to 50], has_extra: false }

// Viral post: spills over into a separate collection
{ _id: ObjectId("post_2"), comments: [...first 50], has_extra: true }

// Overflow collection holds the extra comments
{ post_id: ObjectId("post_2"), comments: [...next batch] }

async function getComments(postId) {
  const post = await db.posts.findOne({ _id: postId });
  if (!post) return [];  // unknown post id
  // The flag is stored as has_extra in the documents above
  if (!post.has_extra) return post.comments;
  const overflow = await db.overflow.find({ post_id: postId }).toArray();
  return [...post.comments, ...overflow.flatMap(o => o.comments)];
}
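
The read path above has a matching write path. A minimal sketch of the routing decision is shown below, assuming the same 50-comment limit and the posts and overflow collection names used above; the function name and the shape of the returned plan are hypothetical.

```javascript
const MAX_EMBEDDED_COMMENTS = 50; // matches the limit used above

// Decides where a new comment should be written.
// `post` is the post document as currently stored.
function planCommentWrite(post, comment) {
  if (post.comments.length < MAX_EMBEDDED_COMMENTS) {
    return {
      collection: "posts",
      filter: { _id: post._id },
      update: { $push: { comments: comment } }
    };
  }
  // The embedded array is full: route the comment to the overflow collection.
  // The caller also applies markPost to the post so readers know to look there.
  return {
    collection: "overflow",
    filter: { post_id: post._id },
    update: { $push: { comments: comment } },
    markPost: { $set: { has_extra: true } }
  };
}

const smallPlan = planCommentWrite({ _id: "p1", comments: ["a"] }, "b");
const fullPost = { _id: "p2", comments: new Array(50).fill("x") };
const viralPlan = planCommentWrite(fullPost, "y");
```

Checking the array length on the client is racy under concurrent writes. One way to guard the embedded write on the server is to add a condition such as "comments.49": { $exists: false } to the filter, so the $push only applies while the array holds fewer than 50 entries.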

Schema Validation

db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "name", "created_at"],
      properties: {
        email: {
          bsonType: "string",
          pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
        },
        role: { enum: ["user", "admin", "moderator"] },
        age: { bsonType: "int", minimum: 0, maximum: 150 }
      }
    }
  },
  validationAction: "error"
})
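
createCollection only covers new collections. For a collection that already holds data, a validator can be attached with the collMod command. The sketch below builds such a command with a hypothetical helper. Starting with validationLevel "moderate" and validationAction "warn" is one cautious rollout choice, not something prescribed above: violations are logged instead of rejected until the existing data has been cleaned up.

```javascript
// Builds a collMod command that attaches a $jsonSchema validator to an
// existing collection in "log only" mode.
function buildValidatorRollout(collectionName, jsonSchema) {
  return {
    collMod: collectionName,
    validator: { $jsonSchema: jsonSchema },
    validationLevel: "moderate", // skip updates to already-invalid documents
    validationAction: "warn"     // log violations instead of rejecting writes
  };
}

const rolloutCmd = buildValidatorRollout("users", {
  bsonType: "object",
  required: ["email", "name", "created_at"],
  properties: { role: { enum: ["user", "admin", "moderator"] } }
});
// Usage in mongosh (requires a live database):
// db.runCommand(rolloutCmd)
```

Once the logs show no more violations, the same command can be reissued with validationAction set to "error".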

Indexing Best Practices

// Compound index for a common query
db.orders.createIndex({ user_id: 1, status: 1, created_at: -1 })

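// Partial index: active records only
// (partialFilterExpression supports $exists: true but not $exists: false,
// so this assumes a boolean is_deleted field that is set on every document)
db.users.createIndex(
  { email: 1 },
  {
    partialFilterExpression: { is_deleted: false },
    unique: true
  }
)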

// Text search with per-field weights
db.articles.createIndex(
  { title: "text", content: "text" },
  { weights: { title: 10, content: 1 } }
)

// Analyze query performance
db.orders.find({ user_id: "123" }).explain("executionStats")
// Look for: IXSCAN rather than COLLSCAN
// totalDocsExamined should be close to nReturned
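
The two checks in the comments above can be automated. The following sketch walks an explain result and reports whether an index scan was used and how many documents were examined per document returned. The function names and the sample object are hypothetical. It assumes the classic plan shape with stage and inputStage fields; some server versions nest the plan differently, which this sketch does not handle.

```javascript
// Collects every stage name in a plan tree (stage / inputStage / inputStages).
function collectStages(plan, acc = []) {
  if (!plan || typeof plan !== "object") return acc;
  if (plan.stage) acc.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, acc);
  if (Array.isArray(plan.inputStages)) {
    plan.inputStages.forEach(p => collectStages(p, acc));
  }
  return acc;
}

// Summarizes the parts of an explain("executionStats") result named above.
function summarizeExplain(explain) {
  const stats = explain.executionStats;
  const stages = collectStages(explain.queryPlanner.winningPlan);
  return {
    usesIndex: stages.includes("IXSCAN"),
    collectionScan: stages.includes("COLLSCAN"),
    // Documents examined per document returned; close to 1 is the goal.
    examinedPerReturned:
      stats.nReturned === 0 ? null : stats.totalDocsExamined / stats.nReturned
  };
}

// Hand-written sample in the classic explain shape, for illustration only.
const sampleExplain = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN" } }
  },
  executionStats: { nReturned: 20, totalDocsExamined: 20 }
};
const explainSummary = summarizeExplain(sampleExplain);
```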

Anti-Patterns to Avoid

// Avoid: unbounded arrays (documents are capped at 16 MB)
{ liked_posts: [/* millions of IDs */] }
// Fix: store likes in a separate collection

// Avoid: monotonically increasing shard keys (they create write hotspots)
// Fix: hashed sharding
db.adminCommand({
  shardCollection: "mydb.events",
  key: { user_id: "hashed" }
})
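
For the unbounded-array fix, one possible shape for the separate likes collection is sketched here. The collection name likes, the field names, and the helper are all assumptions made for illustration; the text above only says that likes should move to their own collection.

```javascript
// One small document per like instead of an ever-growing array on the user.
function likeDocument(userId, postId, at = new Date()) {
  return { user_id: userId, post_id: postId, created_at: at };
}

// A unique compound index stops the same user liking the same post twice
// and also serves "which posts did this user like?" lookups by user_id.
const likesIndex = {
  keys: { user_id: 1, post_id: 1 },
  options: { unique: true }
};

const like = likeDocument("u1", "p9", new Date("2026-01-01T00:00:00Z"));
// Usage in mongosh (requires a live database):
// db.likes.createIndex(likesIndex.keys, likesIndex.options)
// db.likes.insertOne(like)
```

Each like stays a few dozen bytes regardless of how many posts a user likes, so no single document approaches the 16 MB limit.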

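Good MongoDB schema design means thinking about access patterns first, then choosing the right embedding or referencing strategy and the indexes to support it.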
