正在加载,请稍候…

可观测性:Node.js 的日志、指标和链路追踪

在 Node.js 服务中实现可观测性的三大支柱:使用 Winston 进行结构化日志、Prometheus 自定义指标以及 OpenTelemetry 分布式

可观测性:Node.js 的日志、指标和链路追踪

可观测性:Node.js 的日志、指标和链路追踪

可观测性的三大支柱:日志(发生了什么)、指标(多少/多快)、链路追踪(在哪里)。

使用 Winston 进行结构化日志

import winston from 'winston';

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    process.env.NODE_ENV === 'production'
      ? winston.format.json()
      : winston.format.prettyPrint()
  ),
  defaultMeta: {
    service: 'api-service',
    version: process.env.APP_VERSION,
  },
  transports: [
    new winston.transports.Console(),
    new winston.transports.File({ filename: 'logs/error.log', level: 'error' }),
  ],
});

// 使用结构化日志
logger.info('User registered', {
  userId: user.id,
  email: user.email,
  duration: Date.now() - startTime,
});

logger.error('Payment failed', {
  orderId: order.id,
  amount: order.total,
  error: error.message,
});

可观测性:Node.js 的日志、指标和链路追踪插图

请求上下文传播

import { AsyncLocalStorage } from 'async_hooks';

interface RequestContext {
  requestId: string;
  userId?: string;
  traceId?: string;
}

const requestStorage = new AsyncLocalStorage<RequestContext>();

// 设置上下文的中间件
app.use((req, res, next) => {
  const context: RequestContext = {
    requestId: req.headers['x-request-id'] as string ?? crypto.randomUUID(),
    userId: req.user?.id,
  };
  requestStorage.run(context, next);
});

// 在调用链的任何位置访问上下文
function log(message: string, meta?: object) {
  const ctx = requestStorage.getStore();
  logger.info(message, { ...meta, ...ctx });
}

可观测性:Node.js 的日志、指标和链路追踪插图

OpenTelemetry 设置

import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

// 在应用代码之前初始化
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4318/v1/traces',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: 'api-service',
});

sdk.start();

可观测性:Node.js 的日志、指标和链路追踪插图

自定义 Span

import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('api-service');

async function processOrder(orderId: string) {
  return tracer.startActiveSpan('processOrder', async (span) => {
    span.setAttributes({
      'order.id': orderId,
      'order.type': 'online',
    });

    try {
      const order = await orderRepo.findById(orderId);
      span.setAttribute('order.total', order.total);

      const result = await paymentService.charge(order);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      span.recordException(error as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
      throw error;
    } finally {
      span.end();
    }
  });
}

Prometheus 指标

import { register, Counter, Histogram, Gauge } from 'prom-client';

// 定义指标
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5],
});

const httpRequestTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
});

// 中间件
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const labels = { method: req.method, route: req.route?.path ?? req.path, status_code: res.statusCode };
    httpRequestDuration.observe(labels, duration);
    httpRequestTotal.inc(labels);
  });
  next();
});

// 暴露指标端点
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});

三大支柱共同提供了理解和调试生产系统所需的一切。