
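Database Sharding: Strategies and Implementation
When to Shard
Before sharding, try:
- Vertical scaling (a bigger machine)
- Read replicas (to scale reads)
- Caching (to reduce database load)
- Archiving (moving old data to cold storage)
- Table partitioning (splitting tables within a single database)
Shard only when a single node can no longer keep up with the write throughput, or when the data volume exceeds what one node can practically store.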

Sharding Strategies

Range-Based Sharding
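```python
# Each entry is a half-open range [start, end) mapped to a shard
SHARD_RANGES = [
    (0, 10_000_000, 'shard-1'),
    (10_000_000, 20_000_000, 'shard-2'),
    (20_000_000, float('inf'), 'shard-3'),
]

def get_shard(user_id: int) -> str:
    for start, end, shard in SHARD_RANGES:
        if start <= user_id < end:
            return shard
    # Without this, out-of-range IDs (e.g. negatives) would silently return None
    raise ValueError(f"no shard covers user_id {user_id}")
```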
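Pros: simple; range queries are efficient. Cons: risk of hotspots (for example, newly created, sequentially assigned IDs all land on the last shard); rebalancing is difficult.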
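
One common way to handle a hot range is to split it and move only the affected keys. The sketch below is an illustration, not the article's method: it stores only sorted lower bounds so lookups use binary search, and the class `RangeShardMap` and its `split` method are hypothetical names.

```python
import bisect

class RangeShardMap:
    """Hypothetical range map: a shard owns keys from its start up to the next start."""

    def __init__(self, boundaries: list[tuple[int, str]]):
        self.starts = [start for start, _ in boundaries]
        self.shards = [shard for _, shard in boundaries]
        if self.starts != sorted(self.starts):
            raise ValueError("boundaries must be sorted by start")

    def get_shard(self, key: int) -> str:
        # bisect_right finds the last range whose start <= key in O(log n)
        idx = bisect.bisect_right(self.starts, key) - 1
        if idx < 0:
            raise ValueError(f"key {key} is below the first range")
        return self.shards[idx]

    def split(self, at: int, new_shard: str) -> None:
        # Keys >= `at` in the containing range move to new_shard;
        # only that one range needs data migration.
        idx = bisect.bisect_right(self.starts, at)
        if idx == 0 or self.starts[idx - 1] == at:
            raise ValueError("split point must fall strictly inside an existing range")
        self.starts.insert(idx, at)
        self.shards.insert(idx, new_shard)

m = RangeShardMap([(0, 'shard-1'), (10_000_000, 'shard-2'), (20_000_000, 'shard-3')])
m.split(25_000_000, 'shard-4')  # split the open-ended (hot) tail range
print(m.get_shard(15_000_000), m.get_shard(30_000_000))
```

The split only edits the metadata; actually copying rows from `shard-3` to `shard-4` is a separate migration step not shown here.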

Hash-Based Sharding
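```python
import hashlib

def get_shard(user_id: str, num_shards: int = 4) -> int:
    # A stable hash (unlike Python's per-process randomized hash()) so that
    # every application server maps the same key to the same shard
    hash_val = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return hash_val % num_shards
```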
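Pros: even distribution, no hotspots from sequential keys. Cons: range queries must hit every shard; changing the number of shards remaps most keys.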
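
To make the remapping cost concrete: with `hash % N` placement, a key stays put when going from N to N+1 shards only if `h % N == h % (N + 1)`, which for uniformly distributed hashes holds for about 1/(N+1) of keys. The sketch below measures this on synthetic keys; the key format `user-{i}` is an assumption for illustration.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    # Same scheme as get_shard: MD5 of the key, modulo the shard count
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_shards

def fraction_moved(keys: list[str], old_n: int, new_n: int) -> float:
    # Share of keys whose shard changes when the shard count changes
    moved = sum(shard_for(k, old_n) != shard_for(k, new_n) for k in keys)
    return moved / len(keys)

keys = [f"user-{i}" for i in range(10_000)]
print(f"4 -> 5 shards: {fraction_moved(keys, 4, 5):.2f} of keys move")
```

Going from 4 to 5 shards should move roughly 80% of the keys, which is the motivation for consistent hashing.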

Consistent Hashing
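```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, replicas: int = 150):
        self.ring: dict[int, str] = {}    # ring position -> node name
        self.sorted_keys: list[int] = []  # sorted ring positions
        self.replicas = replicas          # virtual nodes per physical node

    def add_node(self, node: str) -> None:
        # Place `replicas` virtual nodes for this node around the ring
        for i in range(self.replicas):
            key = int(hashlib.md5(f"{node}:{i}".encode()).hexdigest(), 16)
            self.ring[key] = node
            bisect.insort(self.sorted_keys, key)

    def get_node(self, key: str) -> str:
        if not self.sorted_keys:
            raise LookupError("ring has no nodes")
        hash_key = int(hashlib.md5(key.encode()).hexdigest(), 16)
        # First virtual node clockwise from the key's position, wrapping around
        idx = bisect.bisect(self.sorted_keys, hash_key) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

# When a node is added, only about 1/N of the keys need to be remapped
ring = ConsistentHashRing()
ring.add_node('shard-1')
ring.add_node('shard-2')
```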
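
The "about 1/N" claim can be checked empirically. The sketch below re-implements the same ring so it runs on its own, adding a hypothetical `remove_node` method, and measures how many synthetic keys move when a fourth node joins. Keys can only move onto the new node, because the new node's virtual nodes are the only new points on the ring.

```python
import bisect
import hashlib

def _hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class VNodeRing:
    """Same scheme as ConsistentHashRing, plus a hypothetical remove_node."""

    def __init__(self, replicas: int = 150):
        self.replicas = replicas
        self.ring: dict[int, str] = {}
        self.sorted_keys: list[int] = []

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            k = _hash(f"{node}:{i}")
            self.ring[k] = node
            bisect.insort(self.sorted_keys, k)

    def remove_node(self, node: str) -> None:
        for i in range(self.replicas):
            k = _hash(f"{node}:{i}")
            del self.ring[k]
            self.sorted_keys.pop(bisect.bisect_left(self.sorted_keys, k))

    def get_node(self, key: str) -> str:
        if not self.sorted_keys:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect(self.sorted_keys, _hash(key)) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

ring = VNodeRing()
for name in ("shard-1", "shard-2", "shard-3"):
    ring.add_node(name)

keys = [f"user-{i}" for i in range(10_000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("shard-4")
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(f"{len(moved) / len(keys):.2f} of keys moved (ideal: 1/4)")
```

More virtual nodes per physical node (`replicas`) tighten the spread around the ideal share, at the cost of a larger ring to search.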

Application-Level Sharding
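```typescript
import { createHash } from 'crypto';

// Minimal placeholder types; a real database driver's query API will differ
interface User { id: string; }
interface Order { id: string; created_at: Date; }
interface Database {
  query<T>(sql: string, params: unknown[]): Promise<T[]>;
}

class ShardedDatabase {
  constructor(private shards: Database[]) {}

  private getShard(key: string): Database {
    const hash = createHash('md5').update(key).digest('hex');
    // Use the first 32 bits of the digest, modulo the shard count
    const index = parseInt(hash.substring(0, 8), 16) % this.shards.length;
    return this.shards[index];
  }

  async findUser(userId: string): Promise<User | null> {
    const rows = await this.getShard(userId).query<User>('SELECT * FROM users WHERE id = $1', [userId]);
    return rows[0] ?? null;
  }

  // Fan-out (scatter-gather) query across all shards
  async findOrdersByDate(date: Date): Promise<Order[]> {
    const results = await Promise.all(
      this.shards.map(s => s.query<Order>('SELECT * FROM orders WHERE created_at > $1', [date]))
    );
    return results.flat();
  }
}
```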
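
A fan-out query that also needs ordering and a limit cannot simply concatenate results: each shard must return its own top `limit` rows (the global top rows could all live on one shard), and the application merges them. The sketch below shows this in Python with in-memory lists standing in for shards; `query_shard` and `find_orders_since` are hypothetical names, and rows are simplified to `(created_at, order_id)` tuples.

```python
import asyncio
import heapq
import itertools
from datetime import date

Row = tuple[date, str]  # (created_at, order_id)

async def query_shard(rows: list[Row], since: date, limit: int) -> list[Row]:
    # Stand-in for: SELECT ... WHERE created_at > $1 ORDER BY created_at LIMIT $2
    await asyncio.sleep(0)  # placeholder for network I/O
    return sorted(r for r in rows if r[0] > since)[:limit]

async def find_orders_since(shards: list[list[Row]], since: date, limit: int) -> list[Row]:
    # Scatter: query every shard concurrently; each result is already sorted
    per_shard = await asyncio.gather(*(query_shard(s, since, limit) for s in shards))
    # Gather: k-way merge of the sorted lists, then apply the global limit
    return list(itertools.islice(heapq.merge(*per_shard), limit))

shards = [
    [(date(2024, 1, 5), "o1"), (date(2024, 3, 1), "o4")],
    [(date(2024, 2, 10), "o2"), (date(2023, 12, 31), "o0")],
    [(date(2024, 2, 20), "o3")],
]
print(asyncio.run(find_orders_since(shards, date(2024, 1, 1), limit=3)))
```

Because every shard is queried, latency is bounded by the slowest shard; this is the cost that a well-chosen shard key tries to avoid for common queries.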
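The choice of shard key is critical. Choose it based on access patterns (common queries should ideally hit a single shard), cardinality (enough distinct values to spread across all shards), and distribution (values spread evenly, without a few dominant keys).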
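
Cardinality and distribution can be checked offline against a sample of real rows before committing to a key. The sketch below is one possible heuristic, not a complete evaluation (it ignores access patterns): it hashes each candidate column with the hash-sharding scheme above and reports the number of distinct values plus a skew ratio (largest shard load divided by the mean load). The sample data and column names are made up for illustration.

```python
import hashlib
from collections import Counter

def shard_of(value: str, num_shards: int) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % num_shards

def evaluate_shard_key(rows: list[dict], key: str, num_shards: int = 4) -> dict:
    values = [str(r[key]) for r in rows]
    counts = Counter(shard_of(v, num_shards) for v in values)
    loads = [counts.get(i, 0) for i in range(num_shards)]
    return {
        "cardinality": len(set(values)),
        # 1.0 means perfectly even; num_shards means everything on one shard
        "skew": max(loads) / (len(values) / num_shards),
    }

# Hypothetical sample: many distinct users, but only two countries (70% / 30%)
rows = [{"user_id": i, "country": "US" if i % 10 < 7 else "DE"} for i in range(10_000)]
print(evaluate_shard_key(rows, "user_id"))  # high cardinality, low skew
print(evaluate_shard_key(rows, "country"))  # cardinality 2, high skew
```

A low-cardinality key such as `country` caps the useful shard count at the number of distinct values, and one dominant value becomes a hotspot no matter how many shards exist.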