
可观测性:使用 OpenTelemetry 实现分布式追踪
可观测性的三大支柱
日志: 发生了什么(离散事件)
"用户 123 在 10:23:45 登录失败"
指标: 多少/频率(聚合数值)
HTTP 请求/秒、错误率、P99 延迟
追踪: 时间花在哪里(分布式请求流)
请求:250ms 总计
API 网关:5ms
用户服务:30ms
数据库:25ms
订单服务:195ms
数据库:150ms
支付 API:40ms

OpenTelemetry 设置
// tracing.ts - 必须在应用之前导入
import { NodeSDK } from '@opentelemetry/sdk-node';
import { Resource } from '@opentelemetry/resources';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';
const sdk = new NodeSDK({
resource: new Resource({ 'service.name': 'user-service', 'service.version': '1.2.3' }),
traceExporter: new OTLPTraceExporter({ url: 'http://jaeger:4318/v1/traces' }),
instrumentations: [new HttpInstrumentation(), new ExpressInstrumentation()],
});
sdk.start();

手动插桩
import { trace, SpanStatusCode } from '@opentelemetry/api';
const tracer = trace.getTracer('user-service');
async function processOrder(orderId: string): Promise<Order> {
return tracer.startActiveSpan('processOrder', async (span) => {
span.setAttributes({ 'order.id': orderId });
try {
const order = await fetchOrder(orderId);
await chargePayment(order);
span.setStatus({ code: SpanStatusCode.OK });
return order;
} catch (err) {
span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
span.recordException(err);
throw err;
} finally {
span.end();
}
});
}

上下文传播
import { propagation, context } from '@opentelemetry/api';
// 生产者:将追踪上下文注入 Kafka 消息头
async function publishEvent(event: object): Promise<void> {
const headers: Record<string, string> = {};
propagation.inject(context.active(), headers); // 添加 traceparent 头
await kafka.producer.send({ topic: 'events', messages: [{ value: JSON.stringify(event), headers }] });
}
// 消费者:提取并继续追踪
async function handleMessage(message: KafkaMessage): Promise<void> {
const parentContext = propagation.extract(context.active(), message.headers ?? {});
await context.with(parentContext, () =>
tracer.startActiveSpan('handleEvent', span => {
processEvent(JSON.parse(message.value!.toString()));
span.end();
})
);
}
采样
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-node';
const sdk = new NodeSDK({
sampler: new ParentBasedSampler({
root: new TraceIdRatioBasedSampler(0.1), // 采样 10% 的追踪
}),
});
// 如果父 span 被采样,子 span 也会被采样(分布式追踪保持完整)
Jaeger 快速开始
# docker-compose.yml
services:
jaeger:
image: jaegertracing/all-in-one:latest
ports:
- "16686:16686" # UI
- "4318:4318" # OTLP HTTP
environment:
COLLECTOR_OTLP_ENABLED: true
分布式追踪回答了“为什么这个请求慢?”的问题,跨越微服务边界。