Monday, November 10, 2025

Terraform Data Sources and Dependencies: Implicit vs. Explicit

Published on: November 3, 2023 | Author: DevOps Engineering Team

Mastering Terraform Data & Dependencies

Welcome to Part 6 of our Terraform Mastery Series! As your infrastructure grows, you'll need to reference existing resources and manage the relationships between components. Data sources and dependency management are how Terraform lets you build interconnected systems whose parts are created in the right order.

What are Data Sources?

Data sources allow Terraform to fetch and reference information from outside your configuration. Unlike resources, data sources don't create or manage infrastructure; they only read existing data.

Data Source Purpose

  • Fetch information about existing infrastructure
  • Reference resources created outside Terraform
  • Get dynamic data from cloud providers
  • Share information between configurations
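
Sharing information between configurations is commonly done with the terraform_remote_state data source. The following is a minimal sketch, assuming a separate network configuration keeps its state in an S3 bucket and exposes a vpc_id output. The bucket name, key, region, and security group name are hypothetical.

```hcl
# Read the outputs of another configuration's state (read-only).
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"   # hypothetical bucket
    key    = "network/terraform.tfstate" # hypothetical key
    region = "us-east-1"                 # hypothetical region
  }
}

# Use an output published by the network configuration.
resource "aws_security_group" "shared_example" {
  name   = "shared-example"
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```

Only values that the other configuration declares as root-level outputs are visible this way.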

When to Use Data Sources

  • Referencing existing VPCs and subnets
  • Getting the latest AMI IDs
  • Reading existing security group rules
  • Fetching availability zone information

Data Source vs Resource

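Resources create and manage infrastructure. Data sources only read existing information. Data sources are declared with a data block instead of a resource block, and their attributes are referenced with a data. prefix, for example data.aws_ami.latest_ubuntu.id.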
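
To make the contrast concrete, here is a small sketch that places a managed resource next to a read-only data source. The bucket name is a hypothetical placeholder; aws_caller_identity is used only because it needs no arguments.

```hcl
# Managed: Terraform creates, updates, and destroys this bucket.
resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket" # hypothetical name
}

# Read-only: Terraform only looks up the account behind the current credentials.
data "aws_caller_identity" "current" {}

# Resource attributes are referenced as <TYPE>.<NAME>.<ATTRIBUTE>.
output "bucket_arn" {
  value = aws_s3_bucket.logs.arn
}

# Data source attributes are referenced as data.<TYPE>.<NAME>.<ATTRIBUTE>.
output "account_id" {
  value = data.aws_caller_identity.current.account_id
}
```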

Common Data Source Examples

Let's explore the most frequently used data sources:

Latest AMI Lookup

data "aws_ami" "latest_ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.latest_ubuntu.id
  instance_type = "t3.micro"
}

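Use Case: Always deploy with the latest Ubuntu AMI without hardcoding the ID. Note that when a newer image is published, the resolved ID changes and the next apply will plan to replace the instance.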
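
If new instances should pick up the latest image but existing ones should not be replaced when a newer AMI appears, one option is to ignore later changes to the ami argument. This is a sketch that reuses data.aws_ami.latest_ubuntu from the example above; the resource name pinned_web is hypothetical.

```hcl
resource "aws_instance" "pinned_web" {
  ami           = data.aws_ami.latest_ubuntu.id
  instance_type = "t3.micro"

  lifecycle {
    # Keep the AMI the instance was created with, even if the
    # data source later resolves to a newer image.
    ignore_changes = [ami]
  }
}
```

The trade-off is that image updates then have to be rolled out deliberately, for example by replacing the instance on purpose.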

Existing VPC Reference

data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = ["main-vpc"]
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }

  tags = {
    Type = "private"
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.latest_ubuntu.id # AMI lookup from the previous example
  subnet_id     = data.aws_subnets.private.ids[0]
  instance_type = "t3.medium"
}

Use Case: Deploy resources into an existing VPC infrastructure.
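
Rather than relying on whichever subnet happens to be first in the list, the IDs can drive a for_each so that each matching subnet gets one instance. This is a sketch that reuses data.aws_subnets.private and data.aws_ami.latest_ubuntu from the examples above; the resource name app_per_subnet is hypothetical.

```hcl
resource "aws_instance" "app_per_subnet" {
  # One instance per private subnet, keyed by subnet ID.
  for_each = toset(data.aws_subnets.private.ids)

  ami           = data.aws_ami.latest_ubuntu.id
  instance_type = "t3.medium"
  subnet_id     = each.value
}
```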

Security Group Reference

data "aws_security_group" "web_sg" {
  filter {
    name   = "group-name"
    values = ["web-security-group"]
  }

  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }
}

resource "aws_instance" "web" {
  ami                    = data.aws_ami.latest_ubuntu.id # AMI lookup from the first example
  vpc_security_group_ids = [data.aws_security_group.web_sg.id]
  instance_type          = "t3.micro"
}

Use Case: Attach instances to existing security groups managed by another team.

Availability Zone Information

data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "private" {
  count = 2

  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "private-subnet-${count.index + 1}"
  }
}

Use Case: Distribute resources across available AZs dynamically.
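
As an alternative to building the CIDR string by hand, the built-in cidrsubnet function can derive each subnet range from the VPC's own block. This sketch assumes the same aws_vpc.main (with a 10.0.0.0/16 block) and the availability zone data source above; the resource name private_computed is hypothetical.

```hcl
resource "aws_subnet" "private_computed" {
  count = 2

  vpc_id = aws_vpc.main.id

  # Adds 8 bits to the /16 prefix, giving 10.0.0.0/24 and 10.0.1.0/24.
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "private-subnet-${count.index + 1}"
  }
}
```

This computes the same ranges as the string-interpolated version, so it would replace that block rather than sit alongside it.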

Implicit Dependencies Explained

Terraform automatically detects dependencies when you reference one resource from another. This is called implicit dependency.

Creation order: VPC → Subnet → EC2 Instance

Basic Resource Reference

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "web" {
  vpc_id     = aws_vpc.main.id # Implicit dependency
  cidr_block = "10.0.1.0/24"
}

Terraform knows to create the VPC before the subnet because the subnet references the VPC ID.

Chained Dependencies

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id # Depends on VPC
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id # Depends on VPC

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id # Depends on IGW
  }
}

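Terraform builds a dependency graph from these references and resolves the whole chain automatically: the VPC is created first, then the internet gateway, then the route table.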
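
References do not have to be direct to count. A value passed through a local still carries its dependency, as this sketch shows. It assumes the aws_vpc.main resource above; the local and subnet names are hypothetical.

```hcl
locals {
  # The local only forwards the value; the dependency on the VPC travels with it.
  vpc_id = aws_vpc.main.id
}

resource "aws_subnet" "db" {
  vpc_id     = local.vpc_id # Still an implicit dependency on aws_vpc.main
  cidr_block = "10.0.2.0/24"
}
```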

Explicit Dependencies with depends_on

Sometimes a dependency exists that Terraform can't see, because no attribute of one resource is referenced by the other. Use the depends_on meta-argument to declare it explicitly.

Creation order: IAM Role → EC2 Instance → Application Config

When to Use depends_on

  • Resources that don't directly reference each other
  • Side-effect dependencies
  • Modules that rely on another module's side effects without consuming its outputs
  • When Terraform can't infer the relationship

Common Scenarios

  • IAM policies that must be attached before an instance boots
  • Database initialization scripts
  • Resource creation ordering requirements
  • Cross-module dependencies
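
A typical ordering requirement that no attribute reference expresses is a NAT gateway that needs the VPC's internet gateway to exist first. The AWS provider documentation recommends an explicit dependency for this case. The sketch below uses hypothetical resource names, and the domain argument assumes AWS provider v5 or later.

```hcl
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_internet_gateway" "example" {
  vpc_id = aws_vpc.example.id
}

resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.example.id
  cidr_block = "10.0.0.0/24"
}

resource "aws_eip" "nat" {
  domain = "vpc" # assumes AWS provider v5+

  # Nothing here references the gateway, so the ordering must be stated.
  depends_on = [aws_internet_gateway.example]
}

resource "aws_nat_gateway" "example" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id

  # The NAT gateway needs the internet gateway to exist, but takes no
  # argument that points at it.
  depends_on = [aws_internet_gateway.example]
}
```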

IAM Role and Instance Profile

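resource "aws_iam_role" "ec2_role" {
  name = "ec2-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

# Illustrative policy: software on the instance needs these permissions
# as soon as it boots, but the instance never references this resource.
resource "aws_iam_role_policy" "ec2_policy" {
  name = "ec2-policy"
  role = aws_iam_role.ec2_role.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action   = "s3:GetObject"
        Effect   = "Allow"
        Resource = "*"
      }
    ]
  })
}

resource "aws_iam_instance_profile" "ec2_profile" {
  name = "ec2-instance-profile"
  role = aws_iam_role.ec2_role.name
}

resource "aws_instance" "web" {
  ami                  = "ami-123456" # placeholder AMI ID
  iam_instance_profile = aws_iam_instance_profile.ec2_profile.name # Implicit dependency
  instance_type        = "t3.micro"

  # Explicit dependency - the policy must be attached before the instance boots.
  # The instance profile is already covered by the reference above.
  depends_on = [aws_iam_role_policy.ec2_policy]
}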

Cross-Module Dependencies

module "network" {
  source = "./modules/network"

  vpc_cidr     = "10.0.0.0/16"
  subnet_count = 2
}

module "database" {
  source = "./modules/database"

  # Referencing module.network outputs already creates the dependency,
  # so no depends_on is needed here.
  vpc_id     = module.network.vpc_id
  subnet_ids = module.network.private_subnet_ids
}

module "application" {
  source = "./modules/application"

  vpc_id       = module.network.vpc_id
  subnet_ids   = module.network.public_subnet_ids
  database_url = module.database.connection_string

  # The inputs above only wait for the specific outputs they reference.
  # depends_on waits for everything in the database module (for example,
  # a hypothetical initialization step), at the cost of more conservative
  # planning. Use it only when that is really required.
  depends_on = [module.database]
}
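
Values cross module boundaries through outputs, and a caller that references an output waits for the resources that output depends on. As a sketch of what the network module might expose, the file below is hypothetical: the original does not show the module's contents, and the internal resource names aws_vpc.this and aws_subnet.private are assumptions.

```hcl
# modules/network/outputs.tf (hypothetical contents)

output "vpc_id" {
  description = "ID of the VPC created by this module"
  value       = aws_vpc.this.id
}

output "private_subnet_ids" {
  description = "IDs of the private subnets created by this module"
  value       = aws_subnet.private[*].id
}
```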

Common Dependency Pitfalls

Avoid these common mistakes when working with dependencies:

Circular Dependencies
# This will fail with a cycle error: each group references the other.
resource "aws_security_group" "web" {
  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.lb.id]
  }
}

resource "aws_security_group" "lb" {
  egress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.web.id]
  }
}

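Solution: Define the security groups without inline rules that reference each other, and add the cross-references as separate rule resources (such as aws_security_group_rule) so that neither group depends on the other.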
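
One way to break the cycle is sketched below: both groups are created empty, and the rules that connect them live in standalone aws_security_group_rule resources. The group names are hypothetical, and with no vpc_id set the groups land in the default VPC.

```hcl
resource "aws_security_group" "web" {
  name = "web" # hypothetical name
}

resource "aws_security_group" "lb" {
  name = "lb" # hypothetical name
}

# Allow the load balancer group to reach the web group on port 80.
resource "aws_security_group_rule" "web_from_lb" {
  type                     = "ingress"
  from_port                = 80
  to_port                  = 80
  protocol                 = "tcp"
  security_group_id        = aws_security_group.web.id
  source_security_group_id = aws_security_group.lb.id
}

# Allow the load balancer group to send traffic to the web group.
# For egress rules, source_security_group_id names the destination group.
resource "aws_security_group_rule" "lb_to_web" {
  type                     = "egress"
  from_port                = 80
  to_port                  = 80
  protocol                 = "tcp"
  security_group_id        = aws_security_group.lb.id
  source_security_group_id = aws_security_group.web.id
}
```

Each rule depends on both groups, but the groups no longer depend on each other, so the graph has no cycle.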

Overusing depends_on
# Unnecessary depends_on
resource "aws_instance" "web" {
  ami           = "ami-123456"
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.web.id
  
  # This is redundant!
  depends_on = [aws_subnet.web]
}

Solution: Let Terraform handle implicit dependencies when possible.

Data Source and Dependency Best Practices

Data Source Guidelines

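  • Use specific filters so each data source matches exactly the object you expect
  • Handle empty results: most singular data sources fail the plan when nothing matches, while plural ones such as aws_subnets return an empty list
  • Use data sources (with an aliased provider) for cross-account references
  • Data sources are re-read on every plan, so look each value up once and reuse it through locals instead of declaring duplicate lookups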
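
One way to handle a lookup that may come back empty is a precondition, which turns a confusing index error into a clear message. This sketch assumes Terraform 1.2 or later and reuses data.aws_subnets.private and data.aws_ami.latest_ubuntu from the earlier examples; the resource name guarded_app is hypothetical.

```hcl
resource "aws_instance" "guarded_app" {
  ami           = data.aws_ami.latest_ubuntu.id
  instance_type = "t3.medium"
  subnet_id     = data.aws_subnets.private.ids[0]

  lifecycle {
    # Checked before the arguments above are evaluated.
    precondition {
      condition     = length(data.aws_subnets.private.ids) > 0
      error_message = "No private subnets were found in the selected VPC."
    }
  }
}
```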

Dependency Management

  • Prefer implicit dependencies over explicit ones
  • Use depends_on sparingly and document why
  • Inspect dependency chains with terraform graph, which prints the graph in DOT format
  • Break circular dependencies by restructuring

Real-World Implementation

Here's a complete example showing data sources and dependencies working together:

Complete Web Application Stack

# Get the latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

# Reference existing VPC and subnets
data "aws_vpc" "selected" {
  tags = {
    Environment = "production"
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.selected.id]
  }

  tags = {
    Type = "private"
  }
}

# Create application resources
resource "aws_security_group" "app" {
  vpc_id = data.aws_vpc.selected.id
  # ... security group rules
}

resource "aws_instance" "app_server" {
  count = 2

  # Spread instances across the private subnets, wrapping around if needed
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.medium"
  subnet_id     = data.aws_subnets.private.ids[count.index % length(data.aws_subnets.private.ids)]

  # This reference is an implicit dependency: the security group is
  # created before the instances, so no depends_on is needed.
  vpc_security_group_ids = [aws_security_group.app.id]

  tags = {
    Name = "app-server-${count.index + 1}"
  }
}

Key Takeaways

  • Data sources fetch information about existing infrastructure
  • Implicit dependencies are automatically detected through references
  • Explicit dependencies with depends_on handle special cases
  • Avoid circular dependencies by designing resource relationships carefully
  • Use terraform graph to visualize and debug dependency chains

In our next tutorial, we'll explore Terraform Modules, where you'll learn how to create reusable, composable infrastructure components that can be shared across your organization.

