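Terraform State Management: Remote State, State Locking, Workspaces, and Modular IaC Best Practices

Master production-grade Terraform state management: configure S3/GCS remote backends with DynamoDB locking, isolate environments with workspaces, and organize infrastructure with reusable modules.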

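Terraform state is the source of truth for your infrastructure. Poorly managed state leads to configuration drift, duplicated resources, and deployment conflicts. This guide covers production-grade state management patterns, from configuring remote backends to organizing complex infrastructure with reusable modules.

Why State Management Matters

Terraform state tracks the mapping between your configuration and the real resources it manages. Without proper state management:

  • Team members overwrite each other's changes
  • State is corrupted when an apply is interrupted
  • Sensitive values persist locally in plaintext
  • Multiple environments share one state and contaminate each other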

Remote State Backends

S3 Backend with DynamoDB Locking

# backend.tf
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "production/networking/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/your-key-id"

    # DynamoDB table for state locking
    dynamodb_table = "terraform-state-locks"
  }
}

Create the S3 bucket and DynamoDB table with a separate bootstrap configuration:

# bootstrap/main.tf
resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-terraform-state"
  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "state_versioning" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "state_encryption" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_dynamodb_table" "state_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
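
The backend block above hard-codes the state key, so every component needs its own copy of that file. One common alternative is partial backend configuration: leave `key` out of the block and pass it to `terraform init` with `-backend-config`. The sketch below assumes that setup. The helper name `init_backend` and the `<environment>/<component>/terraform.tfstate` key layout are illustrative choices modeled on the key used above, not something Terraform requires.

```shell
# Hypothetical helper: initialize the backend for one environment/component.
# Assumes the backend "s3" block omits `key` (partial configuration).
# Usage: init_backend ENVIRONMENT COMPONENT
init_backend() {
  # -reconfigure discards the backend settings cached in .terraform/, so the
  # same working directory can be pointed at a different key without Terraform
  # offering to migrate state between the two.
  terraform init -input=false -reconfigure \
    -backend-config="key=$1/$2/terraform.tfstate"
}

# Example: init_backend production networking
```

Keeping the bucket, region, and lock table in the block and only the key on the command line means one backend file can serve every component.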

GCS Backend (Google Cloud)

terraform {
  backend "gcs" {
    bucket  = "my-terraform-state"
    prefix  = "production/networking"
  }
}

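Cloud Storage supports object versioning, which you should enable on the state bucket, and the gcs backend handles state locking natively, so no separate lock table is needed.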

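State Locking in Depth

State locking prevents concurrent operations from corrupting state: while one plan or apply holds the lock, any other run against the same state waits or fails.

Manual Unlocking and State Inspection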

# Force-unlock a stuck lock (use with caution)
terraform force-unlock LOCK_ID

# List the resources tracked in the current state
terraform state list

# Show specific resource state
terraform state show aws_instance.web_server
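
`terraform force-unlock` needs the ID of the stuck lock. When a run fails to acquire the lock, Terraform prints a `Lock Info:` block that includes an `ID:` line. As a rough sketch, the hypothetical helper below pulls that ID out of captured output. It assumes the output layout shown in the comment, which may differ between Terraform versions, and the ID in the comment is only an example value.

```shell
# Hypothetical helper: read Terraform output on stdin, print the lock ID.
# Assumed input layout (from a failed lock acquisition):
#   Lock Info:
#     ID:        9db590f1-b6fe-c5f2-2678-8804f089deba
#     Path:      ...
extract_lock_id() {
  # Print the second field of the first line whose first field is "ID:".
  # Exit non-zero when no such line is found.
  awk '$1 == "ID:" { print $2; found = 1; exit } END { exit !found }'
}

# Example: terraform plan -no-color 2>&1 | extract_lock_id
```

Confirm that no other run is still active before passing the ID to `terraform force-unlock`.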

Lock Timeout Configuration

terraform {
  backend "s3" {
    # ... bucket config ...
    dynamodb_table = "terraform-state-locks"
  }
}

# Shell: keep retrying for up to 300 seconds to acquire the lock before failing
terraform apply -lock-timeout=300s

Workspaces for Environment Isolation

Workspaces let a single Terraform configuration manage multiple environments, each with its own state file.

Basic Workspace Operations

# Create workspaces
terraform workspace new staging
terraform workspace new production

# Switch workspace
terraform workspace select production

# List workspaces
terraform workspace list

# Show current workspace
terraform workspace show
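
In scripts, `terraform workspace select` fails if the workspace does not exist yet. A minimal sketch of a select-or-create step, using a hypothetical helper name `use_workspace`:

```shell
# Hypothetical helper: switch to a workspace, creating it on first use.
# `terraform workspace new` also switches to the workspace it creates.
use_workspace() {
  terraform workspace select "$1" 2>/dev/null || terraform workspace new "$1"
}

# Example: use_workspace staging
```

This keeps a pipeline step idempotent: the first run creates the workspace and later runs simply select it.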

Workspace-Aware Configuration

locals {
  environment = terraform.workspace
  
  instance_type = {
    staging    = "t3.micro"
    production = "t3.xlarge"
  }
  
  min_capacity = {
    staging    = 1
    production = 3
  }
}

resource "aws_autoscaling_group" "app" {
  min_size = local.min_capacity[local.environment]
  
  launch_template {
    id = aws_launch_template.app.id
    version = "$Latest"
  }
}

resource "aws_instance" "bastion" {
  instance_type = local.instance_type[local.environment]
  
  tags = {
    Name        = "bastion-${local.environment}"
    Environment = local.environment
  }
}
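
The map lookups above fail for any workspace without an entry, including the `default` workspace that every configuration starts in. A pre-flight check in a wrapper script can catch this before `plan` runs. The helper name and the hard-coded workspace list below are assumptions chosen to match the `staging` and `production` keys in the locals:

```shell
# Hypothetical pre-flight check: refuse to continue in an unmapped workspace.
require_known_workspace() {
  current=$(terraform workspace show) || return 1
  case $current in
    staging|production)
      return 0 ;;
    *)
      echo "workspace '$current' has no entry in the locals maps" >&2
      return 1 ;;
  esac
}

# Example: require_known_workspace && terraform plan
```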

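Per-Workspace Backend Keys

terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"

    # Prefix for non-default workspaces; "env:" is the default
    workspace_key_prefix = "env:"
  }
}

Terraform does not substitute a placeholder inside key. For every workspace other than default, the S3 backend stores state at <workspace_key_prefix>/<workspace name>/<key>, so with the settings above the production workspace's state lands at env:/production/network/terraform.tfstate. The default workspace uses key as written.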
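
As a sketch of how the S3 backend lays out workspace state, the helper below computes the object key for a given workspace. The function name `state_object_key` is hypothetical; the layout follows the backend's `workspace_key_prefix` setting, which defaults to `env:`.

```shell
# Hypothetical helper: print the S3 object key the backend uses for a workspace.
# Usage: state_object_key WORKSPACE KEY [WORKSPACE_KEY_PREFIX]
state_object_key() {
  workspace=$1
  key=$2
  prefix=${3:-env:}   # backend default for workspace_key_prefix

  if [ "$workspace" = "default" ]; then
    # The default workspace is stored at the configured key, unprefixed.
    printf '%s\n' "$key"
  else
    printf '%s/%s/%s\n' "$prefix" "$workspace" "$key"
  fi
}

state_object_key production network/terraform.tfstate
# prints env:/production/network/terraform.tfstate
state_object_key default network/terraform.tfstate
# prints network/terraform.tfstate
```

Knowing the real object path matters when writing bucket policies or reading another workspace's state through a remote state data source.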

Modular Infrastructure

Module Structure

infrastructure/
  modules/
    vpc/
      main.tf
      variables.tf
      outputs.tf
      versions.tf
    eks/
      main.tf
      variables.tf
      outputs.tf
    rds/
      main.tf
      variables.tf
      outputs.tf
  environments/
    staging/
      main.tf
      terraform.tfvars
    production/
      main.tf
      terraform.tfvars

VPC Module

# modules/vpc/variables.tf
variable "cidr_block" {
  description = "VPC CIDR block"
  type        = string

  validation {
    condition     = can(cidrnetmask(var.cidr_block))
    error_message = "Must be a valid CIDR block."
  }
}

variable "availability_zones" {
  description = "List of AZs"
  type        = list(string)
}

variable "environment" {
  type = string
}

# modules/vpc/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.environment}-vpc"
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.cidr_block, 4, count.index)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.environment}-private-${var.availability_zones[count.index]}"
    Type = "private"
  }
}

# modules/vpc/outputs.tf
output "vpc_id" {
  value       = aws_vpc.main.id
  description = "VPC ID"
}

output "private_subnet_ids" {
  value       = aws_subnet.private[*].id
  description = "List of private subnet IDs"
}
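
The private subnets above are carved out with `cidrsubnet(var.cidr_block, 4, count.index)`: four bits are borrowed from the host part, so a /16 yields /20 subnets, and `count.index` selects which one. The arithmetic can be sketched in shell as follows. `cidrsubnet_ipv4` is a hypothetical, IPv4-only re-implementation for illustration, not Terraform's own code.

```shell
# Sketch of cidrsubnet(prefix, newbits, netnum) for IPv4 prefixes only.
# Usage: cidrsubnet_ipv4 10.0.0.0/16 4 1
cidrsubnet_ipv4() {
  base=${1%/*}               # address part, e.g. 10.0.0.0
  len=${1#*/}                # prefix length, e.g. 16
  newbits=$2
  netnum=$3
  newlen=$((len + newbits))  # e.g. 16 + 4 = 20

  # netnum must fit in the borrowed bits, and the result must fit in 32 bits.
  [ "$newlen" -le 32 ] && [ "$netnum" -lt $((1 << newbits)) ] || return 1

  # Split the dotted quad into octets and pack them into one integer.
  o1=${base%%.*}; rest=${base#*.}
  o2=${rest%%.*}; rest=${rest#*.}
  o3=${rest%%.*}; o4=${rest#*.}
  ip=$(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))

  # Write netnum into the bits between the old and new prefix lengths.
  ip=$(( ip | (netnum << (32 - newlen)) ))

  printf '%d.%d.%d.%d/%d\n' \
    $(( (ip >> 24) & 255 )) $(( (ip >> 16) & 255 )) \
    $(( (ip >> 8) & 255 ))  $(( ip & 255 )) "$newlen"
}

for i in 0 1 2; do
  cidrsubnet_ipv4 10.0.0.0/16 4 "$i"
done
# prints 10.0.0.0/20, 10.0.16.0/20, 10.0.32.0/20 (one per line)
```

With three availability zones the module therefore uses the first three of the sixteen available /20 blocks, which leaves room for public or database subnets in the same VPC.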

Using Modules

# environments/production/main.tf
module "vpc" {
  source = "../../modules/vpc"

  cidr_block         = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  environment        = "production"
}

module "eks" {
  source = "../../modules/eks"

  cluster_name    = "production-cluster"
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnet_ids
  node_count      = 3
  node_type       = "m5.xlarge"
}

module "rds" {
  source = "../../modules/rds"

  identifier     = "production-db"
  engine         = "postgres"
  engine_version = "16.2"
  instance_class = "db.r6g.xlarge"
  vpc_id         = module.vpc.vpc_id
  subnet_ids     = module.vpc.private_subnet_ids
}

Remote State Data Source

Share outputs between configurations by reading another configuration's state:

# Read VPC outputs from another state file
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "production/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  subnet_id = data.terraform_remote_state.networking.outputs.private_subnet_ids[0]
}

State Migration

Importing Existing Resources

# Import an existing EC2 instance
terraform import aws_instance.web i-1234567890abcdef0

# Import with resource address in module
terraform import module.eks.aws_eks_cluster.main my-cluster

Moving Resources in State

# Move resource to new address (no destroy/recreate)
terraform state mv aws_instance.old_name aws_instance.new_name

# Move resource into a module
terraform state mv aws_security_group.app module.app.aws_security_group.main

Removing Resources from State

# Remove without destroying (manage outside Terraform)
terraform state rm aws_instance.legacy
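
Commands such as `terraform state mv` and `terraform state rm` rewrite the state directly, so it is prudent to keep a copy first. A minimal sketch using `terraform state pull`, which prints the current state to stdout; the helper name `backup_state` and the default file name pattern are hypothetical:

```shell
# Hypothetical helper: save a copy of the current state before editing it.
# Usage: backup_state [FILE]   (prints the backup path on success)
backup_state() {
  out=${1:-state-backup-$(date +%Y%m%d%H%M%S).tfstate}
  if terraform state pull > "$out"; then
    printf '%s\n' "$out"
  else
    rm -f "$out"   # do not leave an empty or partial backup behind
    return 1
  fi
}

# Example: backup_state && terraform state rm aws_instance.legacy
```

If an edit goes wrong, `terraform state push` can upload a saved copy back to the backend. Treat the backup file like the state itself, since it contains the same sensitive values.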

Sensitive Values in State

variable "db_password" {
  type      = string
  sensitive = true
}

output "db_connection_string" {
  value     = "postgresql://admin:${var.db_password}@${aws_db_instance.main.endpoint}/app"
  sensitive = true  # prevents printing in console output
}

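sensitive = true only redacts the value from CLI output; the value is still stored in plaintext in the state. Always use an encrypted remote backend.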

Drift Detection

# Check for drift without applying
terraform plan -detailed-exitcode
# Exit code 0: no changes
# Exit code 1: error
# Exit code 2: changes present

# Update state to match reality (replaces the deprecated `terraform refresh`)
terraform apply -refresh-only

Integrate it into CI:

- name: Detect drift
  run: |
    terraform init
    terraform plan -detailed-exitcode
  continue-on-error: false
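
Most CI runners treat any non-zero exit status as a failure, so the step above fails both when Terraform errors out and when it merely finds drift (exit code 2). If the pipeline should react differently to the two cases, a small wrapper can separate them. This is a sketch; the function name `check_drift` and its output labels are hypothetical.

```shell
# Hypothetical wrapper around `terraform plan -detailed-exitcode`.
# Prints "clean", "drift", or "error"; returns non-zero only on a real error.
check_drift() {
  rc=0
  # Capture the status explicitly so the function also works under `set -e`.
  terraform plan -detailed-exitcode -input=false >/dev/null || rc=$?
  case $rc in
    0) echo "clean" ;;
    2) echo "drift" ;;
    *) echo "error"; return 1 ;;
  esac
}

# Example: [ "$(check_drift)" = "drift" ] && echo "open a ticket"
```

A scheduled job can then alert on `drift` while reserving hard failures for `error`.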

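Conclusion

Terraform state management is the foundation of reliable infrastructure as code. Remote backends with locking prevent race conditions and data loss. Workspaces give each environment its own state. Modular configuration enables code reuse across projects and teams. Remote state data sources allow loose coupling between infrastructure components. With these patterns in place, your IaC can scale from a single developer to an entire platform engineering team.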