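Terraform State Management: Remote State, State Locking, Workspaces, and Modular IaC Best Practices
Terraform state is the source of truth for your infrastructure. Mismanaged state leads to configuration drift, duplicated resources, and conflicting deployments. This guide covers production-grade state management patterns, from configuring remote backends to organizing complex infrastructure with reusable modules.
Why State Management Matters
Terraform state tracks the mapping between your configuration and the real resources it manages. Without proper state management:
- Team members overwrite each other's changes
- State is corrupted when an apply is interrupted
- Sensitive values persist locally in plaintext
- Multiple environments share one state and contaminate each other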

Remote State Backends
S3 Backend with DynamoDB Locking
# backend.tf
terraform {
  backend "s3" {
    bucket     = "my-terraform-state"
    key        = "production/networking/terraform.tfstate"
    region     = "us-east-1"
    encrypt    = true
    kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/your-key-id"
    # DynamoDB table for state locking
    # (Terraform 1.10+ also supports S3-native locking via use_lockfile)
    dynamodb_table = "terraform-state-locks"
  }
}
Create the S3 bucket and the DynamoDB table with a separate bootstrap configuration:
# bootstrap/main.tf
resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-terraform-state"
  lifecycle {
    prevent_destroy = true
  }
}
resource "aws_s3_bucket_versioning" "state_versioning" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "state_encryption" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}
resource "aws_dynamodb_table" "state_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  attribute {
    name = "LockID"
    type = "S"
  }
}
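GCS Backend (Google Cloud)
terraform {
  backend "gcs" {
    bucket = "my-terraform-state"
    prefix = "production/networking"
  }
}
The GCS backend supports state locking natively, so no separate lock table is needed. Object versioning is a bucket-level setting that is off by default; enable it on the state bucket so that earlier state versions can be recovered.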
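State Locking in Depth
State locking prevents concurrent operations from corrupting state. Terraform acquires the lock before any operation that could write state and releases it when the operation finishes.
Manual Unlocking and State Inspection
# Force-unlock a stuck lock (use with caution: confirm no other run still holds it)
terraform force-unlock LOCK_ID
# List the resources tracked in state
terraform state list
# Show the state of a specific resource
terraform state show aws_instance.web_server
Lock Timeout Configuration
terraform {
  backend "s3" {
    # ... bucket config ...
    dynamodb_table = "terraform-state-locks"
  }
}
# The lock timeout is a CLI flag, not a backend setting.
# Wait up to 300 seconds for the lock instead of failing immediately.
terraform apply -lock-timeout=300s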
Workspaces for Environment Isolation
Workspaces let a single Terraform configuration manage multiple environments, each with its own state file.

Basic Workspace Operations
# Create workspaces
terraform workspace new staging
terraform workspace new production
# Switch workspace
terraform workspace select production
# List workspaces
terraform workspace list
# Show current workspace
terraform workspace show
Workspace-Aware Configuration
locals {
  environment = terraform.workspace
  instance_type = {
    staging    = "t3.micro"
    production = "t3.xlarge"
  }
  min_capacity = {
    staging    = 1
    production = 3
  }
}
resource "aws_autoscaling_group" "app" {
  # Other required arguments (max_size, subnets, ...) omitted for brevity
  min_size = local.min_capacity[local.environment]
  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }
}
resource "aws_instance" "bastion" {
  # ami and networking arguments omitted for brevity
  instance_type = local.instance_type[local.environment]
  tags = {
    Name        = "bastion-${local.environment}"
    Environment = local.environment
  }
}
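The map lookups above fail when the selected workspace has no entry in the maps, which includes the built-in default workspace. A pre-flight check can catch that before a plan runs. The following is a minimal sketch; the function name require_known_workspace and its hard-coded list of environments are assumptions for illustration, not part of Terraform.

```shell
# Hypothetical pre-flight check: succeed only if the given workspace name
# is one of the environments the locals maps define values for.
require_known_workspace() {
  workspace="$1"
  case "$workspace" in
    staging|production)
      return 0
      ;;
    *)
      echo "workspace '$workspace' has no entry in the locals maps" >&2
      return 1
      ;;
  esac
}

# Usage (assumes the terraform CLI is installed and the directory is initialized):
#   require_known_workspace "$(terraform workspace show)" && terraform plan
```

Keeping the check outside Terraform means a wrong workspace is rejected before any state is read or locked.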
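Per-Workspace Backend Keys
terraform {
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
    # Optional: prefix used for non-default workspaces (defaults to "env:")
    workspace_key_prefix = "env:"
  }
}
The key is a fixed string; Terraform does not substitute placeholders inside it. For every workspace other than default, the S3 backend stores state at <workspace_key_prefix>/<workspace name>/<key>, for example env:/production/network/terraform.tfstate. The default workspace uses the key exactly as written.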
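For the S3 backend, the object path of a workspace's state is derived from the configured key, the workspace name, and workspace_key_prefix (default "env:"). That layout can be sketched as a small helper, which is handy when locating a state object in the bucket by hand. The function name state_object_key is hypothetical, and network/terraform.tfstate is only an example key.

```shell
# Sketch: compute the S3 object key the s3 backend uses for a workspace.
# Arguments: workspace name, configured key, optional workspace_key_prefix.
state_object_key() {
  workspace="$1"
  key="$2"
  prefix="${3:-env:}"
  if [ "$workspace" = "default" ]; then
    # The default workspace is stored at the configured key itself.
    printf '%s\n' "$key"
  else
    # Other workspaces are stored under <prefix>/<workspace>/<key>.
    printf '%s/%s/%s\n' "$prefix" "$workspace" "$key"
  fi
}

state_object_key default network/terraform.tfstate
state_object_key production network/terraform.tfstate
```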
Modular Infrastructure
Module Structure
infrastructure/
  modules/
    vpc/
      main.tf
      variables.tf
      outputs.tf
      versions.tf
    eks/
      main.tf
      variables.tf
      outputs.tf
    rds/
      main.tf
      variables.tf
      outputs.tf
  environments/
    staging/
      main.tf
      terraform.tfvars
    production/
      main.tf
      terraform.tfvars
VPC Module
# modules/vpc/variables.tf
variable "cidr_block" {
  description = "VPC CIDR block"
  type        = string
  validation {
    # cidrnetmask only accepts IPv4 CIDR notation, so can() is false for anything else
    condition     = can(cidrnetmask(var.cidr_block))
    error_message = "Must be a valid CIDR block."
  }
}
variable "availability_zones" {
  description = "List of AZs"
  type        = list(string)
}
variable "environment" {
  description = "Environment name used in resource names and tags"
  type        = string
}
# modules/vpc/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.cidr_block
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = {
    Name        = "${var.environment}-vpc"
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}
resource "aws_subnet" "private" {
  # One private subnet per availability zone
  count  = length(var.availability_zones)
  vpc_id = aws_vpc.main.id
  # Adds 4 bits to the VPC prefix: a /16 VPC yields /20 subnets
  cidr_block        = cidrsubnet(var.cidr_block, 4, count.index)
  availability_zone = var.availability_zones[count.index]
  tags = {
    Name = "${var.environment}-private-${var.availability_zones[count.index]}"
    Type = "private"
  }
}
# modules/vpc/outputs.tf
output "vpc_id" {
  value       = aws_vpc.main.id
  description = "VPC ID"
}
output "private_subnet_ids" {
  value       = aws_subnet.private[*].id
  description = "List of private subnet IDs"
}
Using the Modules
# environments/production/main.tf
module "vpc" {
  source             = "../../modules/vpc"
  cidr_block         = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
  environment        = "production"
}
module "eks" {
  source       = "../../modules/eks"
  cluster_name = "production-cluster"
  vpc_id       = module.vpc.vpc_id
  subnet_ids   = module.vpc.private_subnet_ids
  node_count   = 3
  node_type    = "m5.xlarge"
}
module "rds" {
  source         = "../../modules/rds"
  identifier     = "production-db"
  engine         = "postgres"
  engine_version = "16.2"
  instance_class = "db.r6g.xlarge"
  vpc_id         = module.vpc.vpc_id
  subnet_ids     = module.vpc.private_subnet_ids
}

Remote State Data Source
Share outputs between configurations. Only the root module outputs of the other configuration are exposed:
# Read VPC outputs from another state file
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "production/networking/terraform.tfstate"
    region = "us-east-1"
  }
}
resource "aws_instance" "app" {
  # ami, instance_type and other arguments omitted for brevity
  subnet_id = data.terraform_remote_state.networking.outputs.private_subnet_ids[0]
}
State Migration
Importing Existing Resources
# Import an existing EC2 instance
terraform import aws_instance.web i-1234567890abcdef0
# Import with resource address in module
terraform import module.eks.aws_eks_cluster.main my-cluster
Moving Resources Within State
# Move resource to new address (no destroy/recreate)
terraform state mv aws_instance.old_name aws_instance.new_name
# Move resource into a module
terraform state mv aws_security_group.app module.app.aws_security_group.main
Removing Resources from State
# Remove without destroying (manage outside Terraform)
terraform state rm aws_instance.legacy
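Sensitive Values in State
variable "db_password" {
  type      = string
  sensitive = true
}
output "db_connection_string" {
  value     = "postgresql://admin:${var.db_password}@${aws_db_instance.main.endpoint}/app"
  sensitive = true # prevents printing in console output
}
Even with sensitive = true, the value is still stored in plaintext in the state; the flag only redacts it from CLI output. Always use an encrypted remote backend and restrict who can read it.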
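One way to confirm this is to search a pulled copy of the state for a known secret value. The sketch below assumes the state has already been saved locally, for example with terraform state pull; the function name state_contains_value is hypothetical. Values containing quotes or backslashes are JSON-escaped in the state file, so a literal search may miss them.

```shell
# Hypothetical helper: report whether a literal value appears anywhere
# in a state file. grep -F treats the value as a fixed string, and -q
# suppresses output so that only the exit status is used.
state_contains_value() {
  state_file="$1"
  value="$2"
  grep -Fq -- "$value" "$state_file"
}

# Usage (assumes terraform is initialized against the backend):
#   terraform state pull > state.json
#   state_contains_value state.json "$TF_VAR_db_password" && echo "secret found in state"
#   rm -f state.json   # the local copy holds the same plaintext secrets
```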
Drift Detection
# Check for drift without applying
terraform plan -detailed-exitcode
# Exit code 0: no changes
# Exit code 1: error
# Exit code 2: changes present
# Update state to match reality (replaces the deprecated `terraform refresh`)
terraform apply -refresh-only
Integrate it into CI:
- name: Detect drift
  run: |
    terraform init
    terraform plan -detailed-exitcode
  continue-on-error: false
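Because exit code 2 is non-zero, the step above fails when changes are present as well as on real errors. If the pipeline should treat the two cases differently, a wrapper script can branch on the code. A minimal sketch, with the function name classify_plan_exit and its labels as assumptions:

```shell
# Map the exit code of `terraform plan -detailed-exitcode` to a label.
classify_plan_exit() {
  case "$1" in
    0) echo "no-changes" ;;
    2) echo "changes-present" ;;
    *) echo "error" ;;
  esac
}

# Usage (assumes terraform is installed and initialized). Capturing the
# code with `|| code=$?` keeps a shell running with `set -e` from
# aborting before the code can be inspected:
#   code=0
#   terraform plan -detailed-exitcode || code=$?
#   status=$(classify_plan_exit "$code")
```

A CI job could then open a ticket or post a notification on changes-present while still failing hard on error.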
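Conclusion
Terraform state management is the foundation of reliable infrastructure as code. Remote backends with locking prevent race conditions and data loss. Workspaces give each environment its own state. Modular configuration lets projects and teams reuse code. Remote state data sources keep infrastructure components loosely coupled. With these patterns in place, your IaC can scale from a single developer to an entire platform engineering team.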