
Testing and Debugging Your Terraform Code: From Local Experiments to Production Confidence

Your complete guide to validating, testing, and debugging Terraform configurations—catching errors before they reach production and resolving issues when they inevitably occur.

📅 Published: Feb 2026
⏱️ Estimated Reading Time: 26 minutes
🏷️ Tags: Terraform Testing, Debugging, Terratest, CI/CD, Infrastructure Testing, Error Resolution


🐞 Introduction: Why Testing Infrastructure Code Matters

The Infrastructure Testing Paradox

You wouldn't deploy application code without tests. Yet infrastructure code is often deployed with nothing more than a hopeful terraform plan.

This is terrifying for three reasons:

1. Infrastructure failures are catastrophic. A bug in your Terraform can delete production data, expose sensitive information, or create security vulnerabilities that persist for years. Application bugs are annoying; infrastructure bugs are business-ending.

2. Infrastructure changes affect everything. A misconfigured network security group impacts every application running in that VPC. A broken IAM policy blocks every service that depends on it.

3. Infrastructure is stateful. You can't just "redeploy" and hope—you have to clean up the broken state first. Recovering from a bad apply can take days or weeks.

The paradox: Infrastructure is harder to test than application code, but more critical to get right.


The Testing Pyramid for Infrastructure

Just like application testing, infrastructure testing follows a pyramid pattern:

text
              /\
             /  \
            /    \
           / MANUAL \          <-- Production verification, canaries
          / EXPLORATORY \      <-- Expensive, slow, rare
         /_______________\
        /                 \
       /    INTEGRATION    \   <-- Real infrastructure, isolated environment
      /       TESTS         \  <-- Slower, more expensive, comprehensive
     /_______________________\
    |                         |
    |     CONTRACT TESTS      |  <-- Module interface validation; fast, focused, API-level
    |_________________________|
    |                         |
    |     STATIC ANALYSIS     |  <-- Linting, formatting, security scanning; fastest, cheapest
    |_________________________|

Each level has different tradeoffs:

  • Bottom (Static Analysis): Seconds to run, catches syntax errors and known bad patterns

  • Middle (Contract Tests): Minutes to run, ensures modules work as advertised

  • Upper (Integration Tests): 5-30 minutes to run, validates actual infrastructure behavior

  • Top (Manual): Hours to days, catches the unexpected

Professional Terraform teams test at ALL levels, not just one.


🔍 Static Analysis: Catching Errors Before They Happen

What Static Analysis Can (and Can't) Do

✅ CAN catch:

  • Syntax errors and invalid HCL

  • Formatting inconsistencies

  • Known security misconfigurations

  • Deprecated resource arguments

  • Missing required variables

  • Invalid variable types

❌ CAN'T catch:

  • Logic errors (wrong CIDR calculation)

  • Provider API issues (resource limits, permissions)

  • Runtime failures (timeouts, dependencies)

  • Integration problems (this works in dev but not prod)

Static analysis is your first line of defense. It's fast, cheap, and should run on every commit.
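The cheapest way to guarantee these checks run on every commit is a pre-commit hook. Here is a minimal sketch of a `.pre-commit-config.yaml`, assuming the pre-commit framework and the community pre-commit-terraform hooks (pin `rev` to whatever release your team standardizes on):

```yaml
# .pre-commit-config.yaml — run static checks before every commit
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.86.0  # example pin; choose a current release
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
```

Run `pre-commit install` once per clone and the hooks fire automatically on `git commit`.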


Command 1: terraform fmt — Consistent Style

bash
# Check formatting without changing files
terraform fmt -check -recursive

# Automatically fix formatting issues
terraform fmt -recursive

# Exit codes:
# 0 = all files formatted
# 1 = errors (invalid syntax)
# 3 = some files need formatting

Why this matters: Consistent formatting isn't aesthetic—it's cognitive. When every Terraform file looks the same, reviewers focus on logic, not layout.

CI Integration:

yaml
# .github/workflows/terraform.yml
- name: Check Terraform Formatting
  run: terraform fmt -check -recursive
  working-directory: ./terraform

Command 2: terraform validate — Syntax and Internal Consistency

bash
# Basic validation
terraform validate

# JSON output for CI
terraform validate -json

# Exit codes:
# 0 = valid
# 1 = invalid

What it checks:

  • ✅ Valid HCL syntax

  • ✅ Referenced variables exist

  • ✅ Referenced resources/modules exist

  • ✅ Provider requirements satisfied

  • ❌ Does NOT check against cloud provider APIs

Always run this before pushing code.


Command 3: terraform init -backend=false — Module Validation

bash
# Initialize without configuring the backend (faster for CI)
terraform init -backend=false

Why: Many validation errors come from missing or misconfigured modules. Running init ensures modules and providers are downloaded and available. (The old -get-plugins flag was removed in Terraform 0.15; init always handles provider installation now.)


Command 4: terraform validate with Custom Conditions

Terraform 1.2 and later support custom conditions (preconditions and postconditions):

hcl
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t2.micro"
  
  # PRECONDITION: Check BEFORE creation
  lifecycle {
    precondition {
      condition     = var.environment != "prod" || var.instance_count >= 3
      error_message = "Production environments require at least 3 instances."
    }
  }
}

data "aws_ami" "ubuntu" {
  most_recent = true
  
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }
  
  owners = ["099720109477"]
  
  # POSTCONDITION: Check AFTER reading data
  lifecycle {
    postcondition {
      condition     = self.architecture == "x86_64"
      error_message = "Only x86_64 AMIs are supported."
    }
  }
}

This is validation expressed WITHIN your configuration. Preconditions and data-source postconditions are evaluated during plan whenever their values are known; postconditions on managed resources are checked during apply, after the resource is created or updated.


Tool: tflint — Terraform-Specific Linter

bash
# Install tflint
brew install tflint  # macOS
curl -s https://raw.githubusercontent.com/terraform-linters/tflint/master/install_linux.sh | bash

# Basic scan
tflint

# Scan with configuration
tflint --config .tflint.hcl

# Scan all modules recursively
tflint --recursive

.tflint.hcl configuration:

hcl
plugin "aws" {
  enabled = true
  version = "0.21.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"

  # Deep checking queries the AWS API and requires credentials
  # (this replaces the removed --deep CLI flag)
  deep_check = false
}

rule "aws_instance_invalid_type" {
  enabled = true
}

rule "aws_s3_bucket_name" {
  enabled = true
}

rule "terraform_deprecated_index" {
  enabled = true
}

rule "terraform_comment_syntax" {
  enabled = true
}

What it catches that validate doesn't:

  • Invalid instance types for region

  • Deprecated resource syntax

  • Best practice violations

  • Provider-specific validation


🔧 Unit and Contract Testing: Testing Modules in Isolation

What Are Contract Tests?

Contract tests verify that your module behaves as advertised—without creating real infrastructure. They check:

  1. Input contract: Variables have correct types, descriptions, validation

  2. Output contract: Outputs exist and have correct types

  3. Resource contract: Resources are created with expected attributes

  4. Error contract: Module fails gracefully with invalid inputs

These tests are fast and focused. Note that terraform test performs a real apply by default; set command = plan in a run block when assertions only need plan-time values, so the test finishes in seconds without creating infrastructure.
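As a concrete example of the input and error contracts, here is a sketch of a variable that validates its own input so bad values fail fast (names are illustrative, matching the VPC module used throughout):

```hcl
variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string

  validation {
    # can() returns false instead of raising an error when parsing fails
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "VPC CIDR block must be a valid IPv4 CIDR range."
  }
}

output "vpc_cidr_block" {
  description = "The CIDR block actually assigned to the VPC"
  value       = aws_vpc.this.cidr_block
}
```

A test that passes `vpc_cidr = "invalid"` now fails at validation time with a predictable message, which is exactly what the error-contract tests below assert on.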


Testing with terraform test (Terraform 1.6+)

Terraform now includes native testing capabilities:

hcl
# tests/vpc_test.tftest.hcl
run "test_vpc_basic" {
  # Override variables for this test
  variables {
    vpc_name     = "test-vpc"
    environment  = "test"
    vpc_cidr     = "10.0.0.0/16"
  }
  
  # Verify outputs
  assert {
    condition     = output.vpc_id != null
    error_message = "VPC ID should not be null"
  }
  
  assert {
    condition     = output.vpc_cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR block should match input"
  }
}

run "test_vpc_production_requirements" {
  # These attributes are known at plan time, so plan is enough
  command = plan

  variables {
    vpc_name    = "prod-vpc"
    environment = "prod"
  }
  
  # Verify production requirements
  assert {
    condition     = aws_vpc.this.enable_dns_hostnames == true
    error_message = "Production VPC must have DNS hostnames enabled"
  }
  
  assert {
    condition     = length(aws_subnet.public) >= 2
    error_message = "Production VPC requires at least 2 public subnets"
  }
}

run "test_vpc_invalid_cidr" {
  # This test should fail
  command = plan
  
  variables {
    vpc_cidr = "invalid"
  }
  
  expect_failures = [
    var.vpc_cidr,  # Expect validation to fail
  ]
}
bash
# Run all tests
terraform test

# Run specific test file
terraform test -filter=tests/vpc_test.tftest.hcl

# Verbose output
terraform test -verbose

Testing with Terratest (Go)

For more complex testing scenarios, Terratest is the industry standard.

go
// test/vpc_test.go
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVPCModule(t *testing.T) {
	t.Parallel()

	terraformOptions := &terraform.Options{
		// The path to where your Terraform code is located
		TerraformDir: "../examples/basic-vpc",

		// Variables to pass to the Terraform module
		Vars: map[string]interface{}{
			"vpc_name":    "test-vpc",
			"environment": "test",
			"vpc_cidr":    "10.0.0.0/16",
		},

		// Disable color output for CI
		NoColor: true,
	}

	// Clean up everything at the end
	defer terraform.Destroy(t, terraformOptions)

	// Initialize and apply
	terraform.InitAndApply(t, terraformOptions)

	// Verify outputs
	vpcID := terraform.Output(t, terraformOptions, "vpc_id")
	assert.NotEmpty(t, vpcID)
	
	vpcCIDR := terraform.Output(t, terraformOptions, "vpc_cidr_block")
	assert.Equal(t, "10.0.0.0/16", vpcCIDR)
	
	subnetIDs := terraform.OutputList(t, terraformOptions, "public_subnet_ids")
	assert.Len(t, subnetIDs, 2)
}

func TestVPCInvalidInput(t *testing.T) {
	t.Parallel()

	terraformOptions := &terraform.Options{
		TerraformDir: "../examples/basic-vpc",
		Vars: map[string]interface{}{
			"vpc_cidr": "invalid", // Should cause validation error
		},
	}

	// This should fail
	_, err := terraform.InitAndPlanE(t, terraformOptions)
	assert.Error(t, err)
	assert.Contains(t, err.Error(), "VPC CIDR block must be a valid IPv4 CIDR range")
}

Run with:

bash
go test -v ./test -timeout 30m

Testing Module Interfaces

A good module test suite validates the contract, not just the implementation:

go
// Terratest has no built-in helpers for reading a module's variables
// and outputs, so this sketch uses HashiCorp's terraform-config-inspect
// library (github.com/hashicorp/terraform-config-inspect/tfconfig).
func TestModuleInterface(t *testing.T) {
	t.Parallel()

	// Parse the module's interface without running Terraform
	module, diags := tfconfig.LoadModule("../modules/aws-vpc")
	if diags.HasErrors() {
		t.Fatalf("failed to parse module: %s", diags)
	}

	// Check that required variables exist
	requiredVars := []string{"vpc_name", "environment"}
	for _, v := range requiredVars {
		_, exists := module.Variables[v]
		assert.True(t, exists, "Required variable '%s' missing", v)
	}

	// Check that expected outputs exist
	expectedOutputs := []string{"vpc_id", "vpc_cidr_block", "public_subnet_ids"}
	for _, o := range expectedOutputs {
		_, exists := module.Outputs[o]
		assert.True(t, exists, "Expected output '%s' missing", o)
	}
}

🧪 Integration Testing: Testing with Real Infrastructure

Why Integration Tests Matter

Static analysis and contract tests catch syntax errors and interface issues. But they don't tell you if your configuration actually works.

Integration tests create REAL infrastructure, validate its behavior, and destroy it. They are:

  • Slow (minutes to hours)

  • Expensive (cloud resources cost money)

  • Essential (the only way to know it works)


Test Environment Strategy

hcl
# test/environments/integration/main.tf

provider "aws" {
  region = var.aws_region
}

# Use random suffix for unique resource names
resource "random_string" "suffix" {
  length  = 6
  special = false
  upper   = false
}

locals {
  test_name = "tftest-${random_string.suffix.result}"
}

# Deploy the module under test
module "vpc" {
  source = "../../../modules/aws-vpc"
  
  name        = local.test_name
  environment = "test"
  vpc_cidr    = var.vpc_cidr
}

# Test-specific validation resources
resource "null_resource" "validate_vpc" {
  triggers = {
    vpc_id = module.vpc.vpc_id
  }
  
  provisioner "local-exec" {
    command = <<EOF
      aws ec2 describe-vpcs --vpc-ids ${module.vpc.vpc_id} --region ${var.aws_region}
    EOF
  }
}

output "vpc_id" {
  value = module.vpc.vpc_id
}

Test Lifecycle with Terratest

go
package test

import (
	"fmt"
	"testing"
	"time"

	"github.com/gruntwork-io/terratest/modules/aws"
	"github.com/gruntwork-io/terratest/modules/random"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestVPCIntegration(t *testing.T) {
	t.Parallel()

	// Generate unique identifier for this test run
	testName := fmt.Sprintf("tftest-%s", random.UniqueId())

	// AWS region for test
	awsRegion := "us-west-2"

	terraformOptions := &terraform.Options{
		TerraformDir: "../test/environments/integration",

		Vars: map[string]interface{}{
			"test_name":  testName,
			"aws_region": awsRegion,
			"vpc_cidr":   "10.0.0.0/16",
		},

		EnvVars: map[string]string{
			"AWS_DEFAULT_REGION": awsRegion,
		},

		// Retry on known flaky errors
		MaxRetries:         3,
		TimeBetweenRetries: 5 * time.Second,
		
		NoColor: true,
	}

	// Clean up resources at the end
	defer terraform.Destroy(t, terraformOptions)

	// Apply the Terraform code
	terraform.InitAndApply(t, terraformOptions)

	// Get VPC ID from outputs
	vpcID := terraform.Output(t, terraformOptions, "vpc_id")

	// Verify the VPC exists in AWS and carries the expected tags.
	// Terratest's Vpc struct exposes Id, Name, Subnets, and Tags;
	// for attributes like CIDR blocks or DNS settings, query the
	// AWS SDK directly.
	vpc := aws.GetVpcById(t, vpcID, awsRegion)
	assert.Equal(t, vpcID, vpc.Id)
	assert.Equal(t, testName, vpc.Tags["Name"])
	assert.Equal(t, "test", vpc.Tags["Environment"])
}

func TestVPCSubnets(t *testing.T) {
	t.Parallel()

	testName := fmt.Sprintf("tftest-%s", random.UniqueId())
	awsRegion := "us-west-2"

	terraformOptions := &terraform.Options{
		TerraformDir: "../test/environments/integration",
		Vars: map[string]interface{}{
			"test_name":            testName,
			"aws_region":           awsRegion,
			"vpc_cidr":             "10.0.0.0/16",
			"public_subnet_cidrs":  []string{"10.0.1.0/24", "10.0.2.0/24"},
			"private_subnet_cidrs": []string{"10.0.10.0/24", "10.0.20.0/24"},
		},
	}
	}

	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	// Get subnet IDs
	publicSubnetIDs := terraform.OutputList(t, terraformOptions, "public_subnet_ids")
	privateSubnetIDs := terraform.OutputList(t, terraformOptions, "private_subnet_ids")

	// Verify correct number of subnets
	assert.Len(t, publicSubnetIDs, 2)
	assert.Len(t, privateSubnetIDs, 2)

	// Verify each subnet routes as expected. Terratest's
	// aws.IsPublicSubnet checks the subnet's route table for an
	// internet gateway route.
	for _, subnetID := range publicSubnetIDs {
		assert.True(t, aws.IsPublicSubnet(t, subnetID, awsRegion))
	}

	for _, subnetID := range privateSubnetIDs {
		assert.False(t, aws.IsPublicSubnet(t, subnetID, awsRegion))
	}
}

Testing Stateful Resources

Some resources (databases, load balancers) are harder to test because they're slow to provision and have side effects.

go
// Note: in addition to the Terratest imports, this test needs
// github.com/jmoiron/sqlx and a Postgres driver (import _ "github.com/lib/pq").
func TestRDSPostgreSQL(t *testing.T) {
	t.Parallel()

	testName := fmt.Sprintf("tftest-%s", random.UniqueId())
	dbPassword := random.UniqueId() // Random password for each test run

	terraformOptions := &terraform.Options{
		TerraformDir: "../test/environments/rds-test",
		Vars: map[string]interface{}{
			"test_name":         testName,
			"database_name":     "testdb",
			"database_user":     "testuser",
			"database_password": dbPassword,
		},
	}

	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	// Get database endpoint. Apply blocks until the instance reports
	// "available", but DNS and security-group propagation can lag, so
	// allow a short grace period.
	endpoint := terraform.Output(t, terraformOptions, "database_endpoint")
	port := terraform.Output(t, terraformOptions, "database_port")
	time.Sleep(1 * time.Minute)

	// Adjust sslmode to match your instance configuration
	connectionString := fmt.Sprintf("postgres://testuser:%s@%s:%s/testdb?sslmode=disable",
		dbPassword, endpoint, port)

	// Try to connect and run a query
	db, err := sqlx.Connect("postgres", connectionString)
	if err != nil {
		t.Fatalf("could not connect to database: %v", err)
	}
	defer db.Close()

	var result int
	err = db.Get(&result, "SELECT 1")
	assert.NoError(t, err)
	assert.Equal(t, 1, result)
}

Cleaning Up Failed Tests

The cardinal rule of integration testing: ALWAYS clean up your resources.

go
func TestWithCleanup(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: ".",
	}

	// This will run even if the test fails partway through
	defer func() {
		terraform.Destroy(t, terraformOptions)

		// Verify nothing remains in state. Terratest has no StateList
		// helper, so shell out to `terraform state list` instead.
		remaining, err := terraform.RunTerraformCommandAndGetStdoutE(
			t, terraformOptions, "state", "list")
		if err == nil && len(remaining) > 0 {
			t.Logf("Warning: resources remain in state after destroy: %s", remaining)
		}
	}()

	terraform.InitAndApply(t, terraformOptions)
	// ... test logic ...
}

🐛 Debugging Terraform: When Things Go Wrong

The Debugging Mindset

Terraform errors can be cryptic. The key is knowing WHERE to look.

Debugging hierarchy:

  1. Terraform's error message — Start here, but it's often incomplete

  2. terraform plan output — Shows what Terraform THINKS will happen

  3. Terraform logs — Set TF_LOG to see EVERYTHING

  4. Provider API logs — CloudTrail, CloudWatch, etc.

  5. Manual verification — Check the actual infrastructure state


Level 1: Terraform Logs (TF_LOG)

This is your most powerful debugging tool.

bash
# Set log level (TRACE, DEBUG, INFO, WARN, ERROR)
export TF_LOG=DEBUG

# Optionally save to file
export TF_LOG_PATH=terraform-debug.log

# Run command
terraform apply

# Disable logging when done
unset TF_LOG
unset TF_LOG_PATH

What you'll see in DEBUG logs:

  • HTTP requests/responses to provider APIs

  • State read/write operations

  • Graph building and evaluation

  • Resource lifecycle events

What you'll see in TRACE logs:

  • EVERYTHING, including function calls and variable values

  • Extremely verbose (can be gigabytes for large applies)

  • Use only when DEBUG isn't enough


Level 2: Plan Analysis

Sometimes the error is in what Terraform WANTS to do, not what it's doing.

bash
# Save plan to file
terraform plan -out=plan.tfplan

# Convert to human-readable JSON
terraform show -json plan.tfplan | jq '.' > plan.json

# Inspect specific resource changes
terraform show -json plan.tfplan | jq '.resource_changes[] | select(.type == "aws_instance")'

# Show plan in machine-readable format
terraform show -json plan.tfplan | jq '.resource_changes[] | {address, change}'

Common issues revealed by plan analysis:

  • Unexpected resource replacements (force-new)

  • Dependencies you didn't know existed

  • Incorrect count or for_each evaluation

  • Data source staleness
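Replacement detection is mechanical enough to automate. Below is a hypothetical stdlib-only Go helper (not part of Terraform or Terratest) that reads the JSON produced by terraform show -json plan.tfplan and lists resources scheduled for destroy-and-recreate:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resourceChange mirrors just the fields we need from the
// `terraform show -json` output format.
type resourceChange struct {
	Address string `json:"address"`
	Change  struct {
		Actions []string `json:"actions"`
	} `json:"change"`
}

type planJSON struct {
	ResourceChanges []resourceChange `json:"resource_changes"`
}

// replacedAddresses returns the addresses of resources the plan will
// destroy and recreate (their action list contains both "delete" and "create").
func replacedAddresses(raw []byte) ([]string, error) {
	var plan planJSON
	if err := json.Unmarshal(raw, &plan); err != nil {
		return nil, err
	}
	var out []string
	for _, rc := range plan.ResourceChanges {
		hasDelete, hasCreate := false, false
		for _, a := range rc.Change.Actions {
			switch a {
			case "delete":
				hasDelete = true
			case "create":
				hasCreate = true
			}
		}
		if hasDelete && hasCreate {
			out = append(out, rc.Address)
		}
	}
	return out, nil
}

func main() {
	// In practice, read this from `terraform show -json plan.tfplan`
	sample := []byte(`{"resource_changes":[
		{"address":"aws_instance.web","change":{"actions":["delete","create"]}},
		{"address":"aws_s3_bucket.logs","change":{"actions":["update"]}}]}`)

	addrs, err := replacedAddresses(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(addrs) // only aws_instance.web is being replaced
}
```

Failing a CI job when this list is non-empty turns a surprise replacement into a reviewable event.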


Level 3: State Inspection

When Terraform's behavior doesn't match reality, inspect the state.

bash
# List all resources in state
terraform state list

# Show detailed attributes of a specific resource
terraform state show aws_instance.web[0]

# Pull raw state JSON
terraform state pull | jq '.resources[] | select(.type == "aws_s3_bucket")'

# Compare state to reality: plan -refresh-only shows drift,
# apply -refresh-only accepts it into state without touching resources
terraform plan -refresh-only
terraform apply -refresh-only

Level 4: Provider-Specific Debugging

AWS:

bash
# Provider HTTP traffic appears in Terraform's own logs
export TF_LOG=DEBUG            # or TF_LOG_PROVIDER=DEBUG for provider logs only
export AWS_SDK_LOAD_CONFIG=1

# Check CloudTrail for API calls
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceName,AttributeValue=i-1234567890abcdef0

Kubernetes:

bash
# Kubernetes API traffic appears in provider DEBUG logs
export TF_LOG=DEBUG

# kubectl verbosity is a flag, not an environment variable
kubectl describe <kind> <name> -n <namespace> -v=6

# Check Kubernetes events
kubectl get events --all-namespaces --watch

Common Errors and How to Debug Them

Error 1: "Error creating: InvalidParameterCombination"

text
Error: Error creating DB Instance: InvalidParameterCombination:
No subnets found for the DB subnet group.

Debug approach:

bash
# 1. Check subnet IDs
terraform state show aws_db_subnet_group.main

# 2. Verify subnets exist in AWS
aws ec2 describe-subnets --subnet-ids subnet-12345 subnet-67890

# 3. Check availability zones
aws ec2 describe-availability-zones --region us-west-2

# 4. Ensure subnets are in different AZs
aws rds describe-db-subnet-groups --db-subnet-group-name main

Error 2: "Invalid for_each argument"

text
Error: Invalid for_each argument
The given "for_each" argument value is unsuitable: 
the "for_each" value must be a map or set of strings.

Debug approach:

hcl
# Add output to see what the value actually is
output "debug_for_each_value" {
  value = var.user_map  # Check if this is map or set
}

# Convert if needed
resource "aws_iam_user" "this" {
  for_each = toset(var.user_list)  # Convert list to set
  name     = each.key
}

Error 3: "Provider doesn't support resource"

text
Error: Invalid resource type
The provider hashicorp/aws does not support resource type
"aws_s3_bucket_lifecycle_configuration".

Debug approach:

bash
# 1. Check provider version
terraform version
cat .terraform.lock.hcl | grep aws

# 2. Update to a newer provider version
terraform init -upgrade

# 3. Check the provider docs: some resources only exist in newer major
#    versions (aws_s3_bucket_lifecycle_configuration, for example, was
#    introduced in AWS provider v4)

Error 4: "Context deadline exceeded"

text
Error: timeout while waiting for state to become 'success'

Debug approach:

hcl
# Increase timeouts
resource "aws_db_instance" "main" {
  # ... other config ...
  
  timeouts {
    create = "60m"
    update = "60m"
    delete = "60m"
  }
}

🧰 Terraform Console: Interactive Debugging

Your REPL for Terraform

terraform console is an interactive shell for evaluating Terraform expressions.

bash
terraform console
> var.environment
"dev"

> aws_vpc.main.cidr_block
"10.0.0.0/16"

> local.instance_count[terraform.workspace]
3

> [for i in range(3): "subnet-${i}"]
[
  "subnet-0",
  "subnet-1", 
  "subnet-2",
]

> exit

Use cases:

  • Testing complex for expressions

  • Verifying variable interpolation

  • Debugging cidrsubnet calculations

  • Checking function behavior
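For example, you can check subnet math interactively before committing it (the values below are the standard cidrsubnet and cidrhost results):

```text
terraform console
> cidrsubnet("10.0.0.0/16", 8, 1)
"10.0.1.0/24"

> cidrsubnet("10.0.0.0/16", 8, 2)
"10.0.2.0/24"

> cidrhost("10.0.1.0/24", 5)
"10.0.1.5"

> exit
```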


🤖 CI/CD Testing Pipelines

Comprehensive Test Pipeline

yaml
name: Terraform CI/CD Pipeline

on:
  pull_request:
    branches: [ main ]
  push:
    branches: [ main ]

jobs:
  static-analysis:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v2
      with:
        terraform_version: 1.6.0
    
    - name: Terraform Format
      run: terraform fmt -check -recursive
    
    - name: Terraform Init
      run: terraform init -backend=false
      working-directory: ./environments/dev
    
    - name: Terraform Validate
      run: terraform validate
      working-directory: ./environments/dev
    
    - name: TFLint
      uses: terraform-linters/setup-tflint@v3
      with:
        tflint_version: latest
    
    - name: Run TFLint
      run: tflint --recursive
      working-directory: ./
    
    - name: Checkov
      uses: bridgecrewio/checkov-action@master
      with:
        directory: ./
        framework: terraform
        soft_fail: false

  unit-tests:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v2
    
    - name: Run Terraform Tests
      run: terraform test -verbose
      working-directory: ./test/unit

  integration-tests:
    runs-on: ubuntu-latest
    environment: test
    steps:
    - uses: actions/checkout@v3
    
    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v2
    
    - name: Setup Go
      uses: actions/setup-go@v4
      with:
        go-version: '1.21'
    
    - name: Configure AWS Credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/terraform-test-role
        aws-region: us-west-2
    
    - name: Run Terratest
      run: go test -v ./test/integration -timeout 30m
    
    - name: Notify Slack
      if: failure()
      uses: slackapi/slack-github-action@v1.24.0
      with:
        payload: |
          {
            "text": "❌ Integration tests failed for ${{ github.repository }}@${{ github.ref }}"
          }
      env:
        SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

  plan:
    runs-on: ubuntu-latest
    needs: [static-analysis, unit-tests, integration-tests]
    environment: ${{ github.ref == 'refs/heads/main' && 'prod' || 'dev' }}
    steps:
    - uses: actions/checkout@v3
    
    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v2
    
    - name: Configure AWS Credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/terraform-${{ github.ref == 'refs/heads/main' && 'prod' || 'dev' }}-role
    
    - name: Terraform Init
      run: terraform init
      working-directory: ./environments/${{ github.ref == 'refs/heads/main' && 'prod' || 'dev' }}
    
    - name: Terraform Plan
      id: plan
      run: terraform plan -no-color
      working-directory: ./environments/${{ github.ref == 'refs/heads/main' && 'prod' || 'dev' }}
    
    - name: Comment Plan
      uses: actions/github-script@v6
      if: github.event_name == 'pull_request'
      with:
        script: |
          const output = `#### Terraform Plan 📖
          
          <details><summary>Show Plan</summary>
          
          \`\`\`terraform\n
          ${process.env.PLAN}
          \`\`\`
          
          </details>
          
          *Pushed by: @${{ github.actor }}, Action: \`${{ github.event_name }}\`*`;
          
          github.rest.issues.createComment({
            issue_number: context.issue.number,
            owner: context.repo.owner,
            repo: context.repo.repo,
            body: output
          })
      env:
        PLAN: ${{ steps.plan.outputs.stdout }}

  apply:
    runs-on: ubuntu-latest
    needs: [plan]
    if: github.ref == 'refs/heads/main'
    environment: prod
    concurrency: production
    steps:
    - uses: actions/checkout@v3
    
    - name: Setup Terraform
      uses: hashicorp/setup-terraform@v2
    
    - name: Configure AWS Credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/terraform-prod-role
    
    - name: Terraform Init
      run: terraform init
      working-directory: ./environments/prod
    
    - name: Terraform Apply
      run: terraform apply -auto-approve
      working-directory: ./environments/prod

📋 Terraform Testing Checklist

Static Analysis

  • terraform fmt -check -recursive passes

  • terraform validate passes

  • tflint passes with no errors

  • checkov passes with no high/critical violations

  • tfsec passes with no high/critical violations

  • Pre-commit hooks configured for all developers

  • Preconditions/postconditions defined for critical resources

Unit/Contract Tests

  • Module variables have proper type constraints

  • Module outputs are documented and tested

  • terraform test runs in CI pipeline

  • Invalid inputs produce expected errors

  • Edge cases (empty lists, null values) handled gracefully

Integration Tests

  • Isolated test environment (separate account/VPC)

  • Resources uniquely named to avoid conflicts

  • Automatic cleanup on test completion/failure

  • Timeouts configured to prevent hung tests

  • Idempotency verified (apply twice, no changes)

  • Destructive changes tested in isolation

Debugging Capabilities

  • Team knows how to enable TF_LOG

  • State inspection commands documented

  • Common error patterns documented in runbook

  • terraform console used for complex expression testing

CI/CD

  • Static analysis runs on every commit

  • Unit tests run on every PR

  • Integration tests run before merge to main

  • Plan output posted to PR for review

  • Production apply requires manual approval

  • Failed tests block merge


🎓 Summary: Test Early, Test Often, Test Real

Testing infrastructure code is harder than testing application code, but the consequences of failure are much higher.

Test Level          | Time       | Cost | Confidence | Frequency
--------------------|------------|------|------------|---------------------
Static Analysis     | Seconds    | $0   | Low        | Every commit
Contract Tests      | Seconds    | $0   | Medium     | Every PR
Integration Tests   | Minutes    | $    | High       | Every merge to main
Production Canaries | Continuous | $$   | Highest    | After deploy

The most important testing principle: Shift left. Find issues as early as possible, when they're cheap to fix and haven't affected users.

The second most important principle: Test what you deploy; deploy what you test. Your test environment should mirror production as closely as possible. The resources you test should be the same artifacts you promote.


🔗 Master Terraform Testing with Hands-on Labs

Theory is essential, but testing skills are built through practice—and failure—in safe environments.

👉 Practice Terraform testing, debugging, and validation in our interactive labs at:
https://devops.trainwithsky.com/

Our platform provides:

  • Static analysis and linting challenges

  • Contract test implementation exercises

  • Integration test environment setup

  • Debugging real failure scenarios

  • CI/CD pipeline configuration

  • Multi-environment testing strategies


Frequently Asked Questions

Q: How much testing is enough?

A: There's no universal answer, but a good heuristic: If a failure would cause significant business impact, it should have automated tests at multiple levels. Critical infrastructure (IAM, networking, databases) deserves full integration tests. Simple, low-risk resources may only need static analysis.

Q: Should I test community modules?

A: You should absolutely test how community modules behave in YOUR environment. Even well-tested modules can have unexpected interactions with your existing infrastructure, compliance requirements, or usage patterns.

Q: How do I test destructive changes?

A: Use a completely isolated test environment. Create a clone of your production configuration with different resource names. Test the destructive change there first. If it works, you can apply with confidence in production.

Q: Why does terraform plan sometimes show changes when I haven't changed anything?

A: This is "drift." Something changed outside of Terraform. Common causes:

  • Manual changes in the console/CLI

  • Automatic updates (Lambda runtimes, AMIs)

  • Configuration drift in modules

  • Provider API changes

Use terraform plan -refresh-only to see the drift, and terraform apply -refresh-only to accept it into state without changing real resources.

Q: How do I test modules with complex dependencies?

A: Use dependency injection in your tests. Your test configuration should create all required dependencies before calling the module under test. Terratest makes this pattern easy with multiple Terraform options.
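The same pattern is available in Terraform's native test framework, where later run blocks consume outputs of earlier ones. A sketch (file and module paths are illustrative):

```hcl
# tests/with_dependencies.tftest.hcl
run "create_network" {
  # A fixture module that provisions the dependency first
  module {
    source = "./tests/fixtures/network"
  }
}

run "module_under_test" {
  variables {
    # Inject the dependency's output into the module under test
    vpc_id = run.create_network.vpc_id
  }

  assert {
    condition     = output.vpc_id == run.create_network.vpc_id
    error_message = "Module must deploy into the injected VPC"
  }
}
```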

Q: What's the best way to learn Terraform debugging?

A: Break things intentionally. Create a module with a deliberate error, then practice finding it using logs, state inspection, and console evaluation. Do this until the process becomes muscle memory.

Q: Should I use terraform plan as a test?

A: plan is NOT a test—it's a prediction. It tells you what Terraform THINKS will happen based on its current state and configuration. It doesn't verify that the resources will work correctly, only that they can be created.


Struggling with a specific Terraform error? Not sure how to test a particular resource? Share your debugging challenge in the comments below—our community of Terraform practitioners has seen (and fixed) almost every error! 💬
