1400 lines of Terraform

Terraform has been on my radar for a while now, ever since I was introduced to the beauty that is infrastructure as code (IaC).
Of course, each cloud provider has its own flavour of IaC: AWS has CloudFormation (CF), which I've had good experience with, and Azure has Azure Resource Manager (ARM) templates, which I don't have any production experience with.
Consolidating on a multi-platform tool like Terraform of course has its own pros and cons, which I won't detail here. However, I have found that it is possible to do things in Terraform that are not possible in CF, which is what finally drove me to explore Terraform.

These are my learnings to date after a measly 1400 lines of HCL.

Remote backends

Terraform stores the state of the managed environment in .tfstate files. By default, these live in the current directory. This presents problems in multi-user or multi-machine environments.

Enter, remote backends.
These are remote file repositories where Terraform will push and pull environment state from.
There are a wide variety of supported remote backends, s3 and azurerm are the ones I’ve used to date and can confirm work well and are simple to configure.

S3

To configure an S3 remote backend, we define a backend "s3" block inside the top-level terraform block.

terraform {
  backend "s3" {
    bucket         = "terraform-state-unique-key-name" #1
    key            = "project_name_resources_key_name.tfstate" #2
    region         = "ap-southeast-2"
    dynamodb_table = "terraform-locks-unique-key-name" #3
    encrypt        = true
    profile        = "aws_profile_name" #4
  }
}
  1. bucket - S3 bucket names must be globally unique
  2. key - the name for your .tfstate file
  3. The s3 backend also requires a DynamoDB table for lock management; a sketch of one follows below
  4. I always use named CLI profiles for all AWS accounts
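
The lock table itself has to exist before the backend can use it, so it is typically created once, out-of-band or in a separate bootstrap template. A minimal sketch, assuming the table name from the backend block above; note that the s3 backend requires the partition key to be a string named LockID:

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks-unique-key-name"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the s3 backend expects exactly this key name

  attribute {
    name = "LockID"
    type = "S"
  }
}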

Unfortunately, at this stage, backends do not support variables or string interpolation, so you cannot keep things DRY by reusing, for example, your profile or region within the backend configuration.
There are tools, such as Terragrunt, which are supposed to simplify this; however, I've not used any to date.
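
For completeness, the azurerm backend I mentioned earlier is configured in much the same way. A minimal sketch, with illustrative resource names:

terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-state-rg"
    storage_account_name = "tfstateuniquename" # storage account names must be globally unique
    container_name       = "tfstate"
    key                  = "project_name_resources.tfstate"
  }
}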

Workspaces

Workspaces are invaluable for managing multiple deployments of the same stack. Out of the box, Terraform operates in a single workspace named default.
I encourage anyone starting out to immediately create a new workspace; dev, test, staging and prod are obvious candidates.
If you work in the default workspace and later need to move to a named workspace, migrating your state across is not trivial.
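
If you do find yourself in that position, one workaround (a sketch I have not battle-tested) is to pull the state down and push it into the new workspace:

terraform state pull > default.tfstate   # snapshot the default workspace state
terraform workspace new dev              # create and switch to the new workspace
terraform state push default.tfstate     # may require -force if the state lineage differs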

To create a new workspace:

terraform workspace new dev
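
To list the available workspaces (the current one is marked with an asterisk) and switch between them:

terraform workspace list
terraform workspace select dev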

Variables

Variables are well supported and I'll not go over the basics here.

There are 2 scenarios that I’ve found useful that bear mentioning.

1. Terraform remote state

A remote state can be defined where all variables are pushed to a backend as outputs and then imported into another template. This has the advantage of not having to manage local .tfvars files or pass variables on the command line.
It does come at the expense of a little extra complexity though, as each variable modification has to be deployed to a separate template and then (my preference) redeclared in the consuming template.

NOTE: Any variables will be stored in state, so if you supply passwords or other secrets on the command line, they will be stored in your local or remote state.
See Sensitive Data in State for more.

Produce variables

variable "region" {
  type = "string"
}

variable "profile" {
  default = "profile_name"
}

variable "rds_cluster_name" {
  type = "string"
}

provider "aws" {
  profile = "${var.profile}"
  region  = "${var.region}"
}


terraform {
  backend "s3" {
    bucket         = "terraform-state-unique-key-name"
    key            = "project_name_variables.tfstate"
    region         = "ap-southeast-2"
    dynamodb_table = "terraform-locks-unique-key-name"
    encrypt        = true
    profile        = "profile_name"
  }
}

output "rds_cluster_name" {
  value = "${var.rds_cluster_name}"
}

To deploy, we run terraform apply in the working directory of our variables .tf file.
This will output the rds_cluster_name and store it in our remote backend bucket terraform-state-unique-key-name.
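
Since region and rds_cluster_name have no defaults, they need to be supplied at apply time; something along these lines (the values are illustrative):

terraform init
terraform workspace new dev
terraform apply -var="region=ap-southeast-2" -var="rds_cluster_name=myproject-dev-rds"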

Consume variables

To consume our variables, we declare a terraform_remote_state data block.


variable "profile" {
  default = "profile_name"
}

variable "region" {
  default = "ap-southeast-2"
}

data "terraform_remote_state" "variables" {
  backend = "s3"
  config = {
    bucket  = "terraform-state-unique-key-name"
    key     = "env:/${terraform.workspace}/project_name_variables.tfstate"
    region  = "${var.region}" #NOTE: string interpolation is supported for terraform_remote_state blocks
    profile = "${var.profile}"
  }
}

In the above block, note that we use the current workspace name as part of the key (..env:/${terraform.workspace}/..), to ensure we pick up the correct variables for the current workspace.

To use our variables, we can access them using the data.terraform_remote_state.{remote_state_name}.outputs.{variable_name} syntax.
This is quite verbose, so I redeclare them inside a locals block, after which they can be accessed using the local.{local_name} syntax.

locals {
  rds_cluster_name = data.terraform_remote_state.variables.outputs.rds_cluster_name
}

#...snip...

resource "aws_rds_cluster" "rds_cluster" {
  cluster_identifier = "${local.rds_cluster_name}"
#...snip...
}

This is quite a lengthy process; however, it's a one-time setup and I find it quite manageable after that.

2. Merging configuration

Another way to manage variables is to commit them to your VCS in a multi workspace configuration and then create a merged setting per this comment.

To do this, we create a folder structure as below

--environments
----dev
------tfsettings.yaml
----test
------tfsettings.yaml
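
Each tfsettings.yaml holds the values for its workspace. An illustrative dev example, using the keys referenced elsewhere in this post:

rds_cluster_name: dev-rds-cluster
ec2_web_key_name: dev-web-key
project: myproject
environment: dev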

Then we merge the default and per-environment settings from tfsettings.yaml.

locals {
  # Initialize any settings you plan to use, to avoid a "This object does not
  # have an attribute named ..." error. You could also guard each access with
  # a conditional, but defaults are generally easier.
  default_tfsettings = {
    rds_cluster_name = ""
  }

  tfsettingsfile        = "./environments/${terraform.workspace}/tfsettings.yaml"
  tfsettingsfilecontent = fileexists(local.tfsettingsfile) ? file(local.tfsettingsfile) : "NoTFSettingsFileFound: true"
  tfworkspacesettings   = yamldecode(local.tfsettingsfilecontent)
  tfsettings            = merge(local.default_tfsettings, local.tfworkspacesettings)
}
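
The merged values are then read through local.tfsettings, for example:

resource "aws_rds_cluster" "rds_cluster" {
  cluster_identifier = local.tfsettings.rds_cluster_name
  #...snip...
}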

This is slightly easier to set up, though it has the side effect of committing our variables to the VCS; whether that is acceptable is a choice for the maintainer.

Importing

In some cases, when retrofitting Terraform management onto an existing environment, we may already have resources we wish to keep.
In that case, we can use the terraform import command to import them into our script.
This has the effect of bringing them into our existing state. Any further application of our template will modify them to match the state specified in the template, e.g. tags.

Example

Given we have the below resource block

resource "aws_instance" "ec2_web" {
  ami                         = "ami-123456"
  instance_type               = "t2.micro"
  vpc_security_group_ids      = ["${aws_vpc.vpc.default_security_group_id}"]
  subnet_id                   = "${aws_subnet.private_subnet_az1.id}"
  associate_public_ip_address = false
  key_name                    = "${local.tfsettings.ec2_web_key_name}"

  tags        = merge(local.project_tags, { Name = "${upper(local.tfsettings.project)}-${upper(local.tfsettings.environment)} Web Server" })
  volume_tags = local.project_tags
}

If this instance already exists, we can import it using the statement below

terraform import aws_instance.ec2_web i-123456

Then when we next run terraform plan, we’ll see any discrepancies between our imported instance and our templated resource and can run terraform apply to make any changes required.

Prevent destroy

In some cases, we may have resources we never wish to destroy.
In this case, we can mark them with the lifecycle { prevent_destroy = true } attribute which will then throw an error if an operation requests the destruction of the resource.
Unfortunately, it's not possible to instruct Terraform to simply skip such a resource; instead, it must be removed from state using the terraform state rm {address} command.
For subsequent use, it must then be imported again.
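
A minimal sketch, applied to the RDS cluster from earlier:

resource "aws_rds_cluster" "rds_cluster" {
  cluster_identifier = local.rds_cluster_name
  #...snip...

  lifecycle {
    prevent_destroy = true # any plan that would destroy this resource now errors
  }
}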

Locks

In some cases, Terraform will leave a template locked, for example because the process has crashed or been aborted by the user.
When this happens, a message including the lock ID will be printed to the console.

Error: Error locking state: Error acquiring the state lock: ConditionalCheckFailedException: The conditional request failed
        status code: 400, request id: 757HE8RDTAENR9NLH2Q40S24C3VV4KQNSO5AEMVJF66Q9ASUAAJG
Lock Info:
  ID:        5ba82c90-deb3-5052-8d52-27ece381cf5e
  Path:      #########.tfstate
  Operation: OperationTypePlan
  Who:       ####
  Version:   0.12.8
  Created:   2019-09-22 00:59:38.4504328 +0000 UTC
  Info:

Once we have the lock id and we’ve confirmed no one else is operating on the same template, we can then use the force-unlock command to unlock our template.

terraform force-unlock 5ba82c90-deb3-5052-8d52-27ece381cf5e

Then we should be able to successfully redo our previous operation.

Custom commands

One of the great things I've found is that although some operations may be missing from the Terraform provider for your cloud platform of choice, custom CLI commands can be run to fill in the gaps.
I recently had to provision an aws_ec2_client_vpn_endpoint for a customer.
When I had to apply custom security groups to the VPN endpoint, I found that the AWS provider does not support this at time of writing.

To get around this limitation, we can simply invoke the aws ec2 apply-security-groups-to-client-vpn-target-network command via a local-exec provisioner on a null_resource:

resource "null_resource" "vpn_subnets_association" {
  provisioner "local-exec" {
    when    = "create"
    command = "aws ec2 apply-security-groups-to-client-vpn-target-network --client-vpn-endpoint-id ${aws_ec2_client_vpn_endpoint.vpn.id} --vpc-id ${aws_vpc.vpc.id} --security-group-ids ${aws_security_group.sg.id} --profile ${var.profile}"
  }

I have found, though, that this does not reapply if another security group is added. In that case, as the command is idempotent, we can simply remove the resource from our state and reapply, as shown below.
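
terraform state rm null_resource.vpn_subnets_association
terraform apply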

Cloud

Terraform also has a Cloud offering. While I have not actively used it, it looks very promising for independents or small teams.
If there is any interest, I can do another write-up of the Cloud offering.

Conclusion

Terraform is an amazing, fully featured tool even in just its open source offering.
With such a large feature offering, there is bound to be a certain amount of complexity and troubleshooting required.
I’ve found to date that Terraform is an extremely well thought out and engineered product and thoroughly recommend it to any cloud practitioners.

We truly stand on the shoulders of giants. Thank you.

Happy Terraforming :)
