Terraform Data Sources and Dependencies: Implicit vs. Explicit

Published on: November 3, 2023 | Author: DevOps Engineering Team

Mastering Terraform Data & Dependencies

Welcome to Part 6 of our Terraform Mastery Series! As your infrastructure grows more complex, you'll need to reference existing resources and control the order in which components are created. Data sources and dependency management are the tools Terraform gives you for both.

What You'll Learn

What are Data Sources?
Common Data Source Examples
Implicit Dependencies Explained
Explicit Dependencies with depends_on
Common Dependency Pitfalls
Data Source and Dependency Best Practices
Real-World Implementation

What are Data Sources?

Data sources let Terraform fetch and reference information that is defined outside your configuration. Unlike resources, data sources don't create or manage infrastructure; they only read existing data.

Data Source Purpose

Fetch information about existing infrastructure
Reference resources created outside Terraform
Get dynamic data from cloud providers
Share information between configurations
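
To illustrate the last point, one configuration can read another configuration's outputs through the terraform_remote_state data source. The sketch below assumes the other configuration keeps its state in an S3 backend and declares an output named vpc_id; the bucket, key, region, and output name are hypothetical.

```hcl
# Read the outputs of a separately managed "network" configuration.
# Bucket, key, and region are placeholders for illustration.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

# Use an output from that state (assumes the network configuration
# declares an output named "vpc_id").
resource "aws_security_group" "shared" {
  name   = "shared-sg"
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```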

When to Use Data Sources

Referencing existing VPCs and subnets
Getting the latest AMI IDs
Reading existing security group rules
Fetching availability zone information

Data Source vs Resource

Resources create and manage infrastructure. Data sources only read existing information. Data sources are declared with the data block instead of resource, and their attributes are referenced with a data. prefix, for example data.aws_vpc.main.id.
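
The difference is easiest to see side by side. This is a minimal sketch, not taken from the original examples: the bucket name is a placeholder, and it assumes the AWS provider is already configured.

```hcl
# Data block: reads the AWS account that Terraform is authenticated against.
# Nothing is created or changed.
data "aws_caller_identity" "current" {}

# Resource block: creates and manages a bucket. The bucket name is a
# placeholder built from the account ID read above.
resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-${data.aws_caller_identity.current.account_id}"
}
```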

Common Data Source Examples

Let's explore the most frequently used data sources:

Latest AMI Lookup

data "aws_ami" "latest_ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.latest_ubuntu.id
  instance_type = "t3.micro"
}

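Use Case: Deploy with the most recent Ubuntu 20.04 AMI without hardcoding the ID. Keep in mind that a newly published AMI changes the ID, so the next plan will propose replacing the instance.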
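
Because changing the ami argument forces a replacement, a lookup with most_recent = true can cause an instance to be recreated whenever a newer image is published. If that is not what you want, one option is to ignore later AMI changes. A minimal sketch, using a hypothetical resource name web_pinned:

```hcl
resource "aws_instance" "web_pinned" {
  ami           = data.aws_ami.latest_ubuntu.id
  instance_type = "t3.micro"

  lifecycle {
    # Keep the AMI the instance was created with, even when the data
    # source later resolves to a newer image.
    ignore_changes = [ami]
  }
}
```

The trade-off is that existing instances no longer pick up new images automatically; only newly created instances use the latest AMI.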

Existing VPC Reference

data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = ["main-vpc"]
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }

  tags = {
    Type = "private"
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.latest_ubuntu.id   # AMI lookup from the previous example
  subnet_id     = data.aws_subnets.private.ids[0] # First matching subnet
  instance_type = "t3.medium"
}

Use Case: Deploy resources into an existing VPC infrastructure.
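
Indexing with ids[0] uses only one of the matching subnets. If you want one instance in every private subnet, for_each over the ID list is one way to do it. This sketch uses a hypothetical resource name and assumes the subnet lookup can be resolved during planning, which is the case when the VPC already exists:

```hcl
resource "aws_instance" "app_per_subnet" {
  # data.aws_subnets.private.ids is a list, so convert it to a set for for_each.
  for_each = toset(data.aws_subnets.private.ids)

  ami           = data.aws_ami.latest_ubuntu.id
  subnet_id     = each.value
  instance_type = "t3.medium"
}
```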

Security Group Reference

data "aws_security_group" "web_sg" {
  filter {
    name   = "group-name"
    values = ["web-security-group"]
  }

  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }
}

resource "aws_instance" "web" {
  ami                    = data.aws_ami.latest_ubuntu.id # AMI lookup from the first example
  vpc_security_group_ids = [data.aws_security_group.web_sg.id]
  instance_type          = "t3.micro"
}

Use Case: Attach instances to existing security groups managed by another team.

Availability Zone Information

data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "private" {
  count = 2

  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.${count.index}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    Name = "private-subnet-${count.index + 1}"
  }
}

Use Case: Distribute resources across available AZs dynamically.
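
The same idea can be extended to cover every available zone instead of a fixed count of 2. The sketch below is an assumption-laden variant: it uses a hypothetical resource name and assumes aws_vpc.main has the CIDR block 10.0.0.0/16.

```hcl
resource "aws_subnet" "private_per_az" {
  # One subnet for every AZ the data source returns.
  count = length(data.aws_availability_zones.available.names)

  vpc_id = aws_vpc.main.id

  # cidrsubnet adds 8 bits to the VPC prefix: with a 10.0.0.0/16 VPC this
  # yields 10.0.0.0/24, 10.0.1.0/24, and so on.
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]
}
```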

Implicit Dependencies Explained

Terraform automatically detects dependencies when you reference one resource from another. This is called an implicit dependency.

VPC → Subnet → EC2 Instance

Basic Resource Reference

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "web" {
  vpc_id     = aws_vpc.main.id # Implicit dependency
  cidr_block = "10.0.1.0/24"
}

Terraform knows to create the VPC before the subnet because the subnet references the VPC ID.

Chained Dependencies

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id # Depends on VPC
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id # Depends on VPC

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id # Depends on IGW
  }
}

Terraform resolves chains like this automatically: it creates the VPC first, then the internet gateway, and finally the route table, which references both.
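
References do not have to be direct to count. Terraform also follows them through local values, so the ordering is preserved in the sketch below. The security group name is hypothetical.

```hcl
locals {
  # The local value carries the reference to aws_vpc.main.
  vpc_id = aws_vpc.main.id
}

resource "aws_security_group" "internal" {
  name   = "internal"
  vpc_id = local.vpc_id # Still an implicit dependency on aws_vpc.main
}
```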

Explicit Dependencies with depends_on

Sometimes Terraform can't detect a dependency automatically, because one resource relies on another's behavior without referencing any of its attributes. Use depends_on to declare such dependencies explicitly.

IAM Role → EC2 Instance → Application Config

When to Use depends_on

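Resources that don't directly reference each other
Side-effect dependencies, where one resource relies on another's behavior rather than its attributes
Modules that rely on another module without consuming any of its outputs
When Terraform can't infer the relationship from references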

Common Scenarios

IAM policies that must be in place before an instance boots
Database initialization scripts
Resource creation ordering requirements
Cross-module dependencies that are not expressed through outputs

IAM Role and Instance Profile

resource "aws_iam_role" "ec2_role" {
  name = "ec2-role"
  
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_instance_profile" "ec2_profile" {
  name = "ec2-instance-profile"
  role = aws_iam_role.ec2_role.name
}

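resource "aws_instance" "web" {
  ami                  = "ami-123456" # Placeholder AMI ID
  iam_instance_profile = aws_iam_instance_profile.ec2_profile.name
  instance_type        = "t3.micro"

  # The reference above already orders creation after the instance profile.
  # depends_on is shown here for its syntax; it adds value when the
  # dependency is hidden, such as a policy the instance needs at boot.
  depends_on = [aws_iam_instance_profile.ec2_profile]
}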
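
depends_on matters most when the dependency is hidden. A common case is an inline role policy that software on the instance needs as soon as it starts: the instance references the instance profile but no attribute of the policy, so Terraform has no way to infer the ordering. The sketch below builds on aws_iam_role.ec2_role and aws_iam_instance_profile.ec2_profile from the example above; the policy name, bucket ARN, resource names, and AMI ID are placeholders.

```hcl
# Permissions the software on the instance needs as soon as it boots.
resource "aws_iam_role_policy" "ec2_s3_read" {
  name = "ec2-s3-read"
  role = aws_iam_role.ec2_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action   = "s3:GetObject"
        Effect   = "Allow"
        Resource = "arn:aws:s3:::example-bucket/*"
      }
    ]
  })
}

resource "aws_instance" "worker" {
  ami                  = "ami-123456" # Placeholder AMI ID
  iam_instance_profile = aws_iam_instance_profile.ec2_profile.name
  instance_type        = "t3.micro"

  # No attribute of the policy is referenced here, so Terraform cannot
  # infer that the instance needs it. depends_on makes the ordering explicit.
  depends_on = [aws_iam_role_policy.ec2_s3_read]
}
```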

Cross-Module Dependencies

module "network" {
  source = "./modules/network"

  vpc_cidr     = "10.0.0.0/16"
  subnet_count = 2
}

module "database" {
  source = "./modules/database"

  vpc_id     = module.network.vpc_id
  subnet_ids = module.network.private_subnet_ids

  # The output references above already order this module after the
  # resources behind those outputs. depends_on additionally waits for
  # everything in the network module.
  depends_on = [module.network]
}

module "application" {
  source = "./modules/application"

  vpc_id       = module.network.vpc_id
  subnet_ids   = module.network.public_subnet_ids
  database_url = module.database.connection_string

  # Waits for every resource in both modules, not only the ones behind
  # the outputs referenced above.
  depends_on = [module.network, module.database]
}
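
Module-level depends_on is most useful when a module consumes no outputs from the module it relies on. The sketch below assumes a hypothetical monitoring module that should only be applied once the application stack exists.

```hcl
module "monitoring" {
  source = "./modules/monitoring" # Hypothetical module path

  # No outputs of module.application are consumed here, so there is no
  # implicit dependency. depends_on delays everything in this module
  # until the application module has been applied.
  depends_on = [module.application]
}
```

Because depends_on on a module applies to everything inside it, Terraform may defer reading data sources in that module until apply time, which makes plans less precise. That is one more reason to prefer output references where they exist.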

Common Dependency Pitfalls

Avoid these common mistakes when working with dependencies:

Circular Dependencies

# This will fail with a cycle error: each group references the other
resource "aws_security_group" "web" {
  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.lb.id]
  }
}

resource "aws_security_group" "lb" {
  egress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.web.id]
  }
}

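Solution: Define the groups without inline cross-references and add the rules as separate rule resources, or restructure your security groups so that references flow in one direction only.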
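
One way to break the cycle is to move the cross-references out of the groups and into standalone rule resources. The sketch below assumes a recent AWS provider version that includes the aws_vpc_security_group_ingress_rule and aws_vpc_security_group_egress_rule resources (older configurations often use aws_security_group_rule for the same purpose); the group names are placeholders.

```hcl
# Create both groups without inline rules that reference each other.
resource "aws_security_group" "web" {
  name = "web"
}

resource "aws_security_group" "lb" {
  name = "lb"
}

# Web servers accept HTTP from the load balancer group.
resource "aws_vpc_security_group_ingress_rule" "web_from_lb" {
  security_group_id            = aws_security_group.web.id
  referenced_security_group_id = aws_security_group.lb.id
  from_port                    = 80
  to_port                      = 80
  ip_protocol                  = "tcp"
}

# The load balancer may send HTTP to the web group.
resource "aws_vpc_security_group_egress_rule" "lb_to_web" {
  security_group_id            = aws_security_group.lb.id
  referenced_security_group_id = aws_security_group.web.id
  from_port                    = 80
  to_port                      = 80
  ip_protocol                  = "tcp"
}
```

Both groups can now be created independently, and each rule depends on the two groups rather than the groups depending on each other.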

Overusing depends_on

# Unnecessary depends_on
resource "aws_instance" "web" {
  ami           = "ami-123456"
  instance_type = "t3.micro"
  subnet_id     = aws_subnet.web.id
  
  # This is redundant!
  depends_on = [aws_subnet.web]
}

Solution: Let Terraform handle implicit dependencies when possible.

Data Source and Dependency Best Practices

Data Source Guidelines

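Use specific filters so that a singular data source matches exactly one object
Handle cases where a data source might find nothing; many singular AWS data sources fail the plan when no match exists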
Use data sources for cross-account references
Store a data source result in a local value when several resources reuse it
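
For the "might not find anything" case, a plural data source such as aws_subnets returns an empty list rather than failing, so a later ids[0] would produce an unhelpful index error. A postcondition is one way to fail early with a clear message. This sketch assumes Terraform 1.2 or later, reuses data.aws_vpc.main from the earlier example, and uses a hypothetical data source name:

```hcl
data "aws_subnets" "private_checked" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }

  tags = {
    Type = "private"
  }

  lifecycle {
    # Fail with a clear message instead of an index error later on.
    postcondition {
      condition     = length(self.ids) > 0
      error_message = "No subnets tagged Type=private were found in the selected VPC."
    }
  }
}
```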

Dependency Management

Prefer implicit dependencies over explicit ones
Use depends_on sparingly and document why
Inspect dependency chains with terraform graph, which prints the dependency graph in DOT format
Break circular dependencies by restructuring

Real-World Implementation

Here's a complete example showing data sources and dependencies working together:

Complete Web Application Stack

# Get the latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

# Reference existing VPC and subnets
data "aws_vpc" "selected" {
  tags = {
    Environment = "production"
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.selected.id]
  }

  tags = {
    Type = "private"
  }
}

# Create application resources
resource "aws_security_group" "app" {
  vpc_id = data.aws_vpc.selected.id
  # ... security group rules
}

resource "aws_instance" "app_server" {
  count = 2

  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.medium"
  subnet_id     = data.aws_subnets.private.ids[count.index % length(data.aws_subnets.private.ids)]
  
  vpc_security_group_ids = [aws_security_group.app.id]

  tags = {
    Name = "app-server-${count.index + 1}"
  }

  # No depends_on needed: the aws_security_group.app.id reference above
  # already makes Terraform create the security group first
}

Key Takeaways

Data sources fetch information about existing infrastructure
Implicit dependencies are automatically detected through references
Explicit dependencies with depends_on handle special cases
Avoid circular dependencies by designing resource relationships carefully
Use terraform graph to visualize and debug dependency chains

In our next tutorial, we'll explore Terraform Modules, where you'll learn how to create reusable, composable infrastructure components that can be shared across your organization.
