Running Terraform in a Lambda Function

Alan Raison
8th June 2022

I recently set up a Terraform project which I wanted to run on a regular schedule. There are a number of ways to achieve this, but I decided to package the project as a Lambda function and schedule it with Amazon EventBridge events. This is how I achieved it and what I learned along the way.

We will create the Lambda function from a Docker container image and have it communicate with the Lambda runtime API directly.

Suitability

First things first: I would not normally advocate running Terraform as a Lambda function. One of the pricing factors of Lambda is how long the function runs for, and there is currently a 15-minute maximum execution time. Terraform can take a long time to complete, depending on the types of resources it is creating, and for certain AWS services this can be far longer than 15 minutes. For my use case, however, terraform apply comfortably completes within 2 minutes, so it is well within these limits.

If the run time were longer, AWS CodeBuild would probably be a more suitable solution.

Getting Started

As with any Terraform project, the state should be stored in a secure, persistent location such as S3, and there should be a locking mechanism such as a DynamoDB table to prevent concurrent updates. See the Terraform documentation for details. An IAM role must be created with policies that allow reading from and writing to the state bucket and the lock table, and that allow creating, querying and deleting any resources the project manages.
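As a sketch, a partial S3 backend configuration might look like the following; the key and region are illustrative, and the bucket and lock table names are deliberately left out here because we will supply them at run time later:

terraform {
  backend "s3" {
    # bucket and dynamodb_table are passed in at run time
    # via -backend-config flags (see the CMD instruction below)
    key    = "scheduled-project/terraform.tfstate"
    region = "eu-west-2"
  }
}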

Docker

Once these are in place, we can create a container image to hold our terraform code. We base the image on AWS's Amazon Linux 2 base image for custom Lambda runtimes:

FROM public.ecr.aws/lambda/provided:al2

We need to install the terraform executable for the Lambda architecture (either amd64 or arm64, your choice), and to unpack it we must install the unzip package with yum. It is also a good idea to check the SHA256 sum of the download against the expected value, to verify that it has not been corrupted:

ARG ARCH=amd64
ARG TERRAFORM_VERSION=1.1.7
ARG TERRAFORM_SHASUM=e4add092a54ff6febd3325d1e0c109c9e590dc6c38f8bb7f9632e4e6bcca99d4

RUN yum install -y unzip \
  && curl -fsSL -o /tmp/terraform.zip https://releases.hashicorp.com/terraform/${TERRAFORM_VERSION}/terraform_${TERRAFORM_VERSION}_linux_${ARCH}.zip \
  && echo "${TERRAFORM_SHASUM}  /tmp/terraform.zip" | sha256sum -c \
  && unzip -d /usr/local/bin /tmp/terraform.zip \
  && rm /tmp/terraform.zip

The checksums of all of the 1.1.7 terraform packages can be found on the download site.

Copy the root terraform module into the container and initialise the project, to download providers and modules ahead of time and speed up execution of the Lambda:

COPY *.tf ./

RUN terraform init -backend=false

You can now test the container by building and running it with your favourite container runtime. The base image defines its own entrypoint, so override it to get a shell, e.g.:

docker build -t terraform .
docker run --entrypoint sh terraform -c "terraform init && terraform plan"

You may wish to pass environment variables into the terraform project; this can be achieved with the -e parameter to docker run. Don’t forget that any Terraform variable can be specified through an environment variable starting with TF_VAR_; for example, if you have the terraform declaration variable "some_variable" { ... }, then a value can be passed in using the environment variable TF_VAR_some_variable.
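Putting that together (the value here is purely illustrative):

docker run -e TF_VAR_some_variable=some-value --entrypoint sh terraform -c "terraform init && terraform plan"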

Preparing for Lambda

Terraform usually writes its working data to the .terraform directory under the current working directory, but the Lambda runtime mounts the filesystem read-only, with only /tmp writable, so we must also set the TF_DATA_DIR environment variable in the Dockerfile:
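ENV TF_DATA_DIR=/tmp

Note that if the Lambda function is run frequently, warm invocations reuse the same container, so this directory may be shared between invocations.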

To set the default command for the Docker image, add a CMD instruction. The shell form is used here so that the environment variables are expanded when the command runs:

CMD terraform init -input=false \
  -backend-config=bucket=${BACKEND_CONFIG_BUCKET} \
  -backend-config=dynamodb_table=${BACKEND_CONFIG_DYNAMODB_TABLE} \
  && terraform apply -input=false -auto-approve

At this point I created a Lambda function from this container, so that I could check that it ran and debug it. Build the image and push it to a container registry such as ECR (you can follow another tutorial if you don’t know how). I created a private ECR repository with terraform:

resource "aws_ecr_repository" "my_tf_function" {
  name = "my-tf-function"
}

The Lambda function can then be created with a terraform resource similar to the following:

resource "aws_lambda_function" "my_tf_function" {
  architectures = ["x86_64"]
  image_uri     = "${aws_ecr_repository.my_tf_function.repository_url}:latest"
  role          = aws_iam_role.terraform.arn
  function_name = "MyTerraformFunction"
  package_type  = "Image"
  timeout       = 300
  memory_size   = 512

  environment {
    variables = {
      BACKEND_CONFIG_BUCKET         = var.backend_config_bucket
      BACKEND_CONFIG_DYNAMODB_TABLE = var.backend_config_dynamodb_table
    }
  }
}
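Here aws_iam_role.terraform refers to the role described in Getting Started. As a minimal sketch, its trust policy must allow the Lambda service to assume it; the role name is illustrative, and the policies granting state and resource access would be attached separately:

resource "aws_iam_role" "terraform" {
  name = "my-tf-function-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}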

The memory_size needed to be higher than the default 128 MB, otherwise the terraform process ran out of memory. This seems to be related to the size of the state file (i.e. the number of resources being managed), so you may need to adjust it to your particular needs.
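To run the function on a schedule, as set out at the start, an EventBridge rule can be pointed at it. A minimal sketch, with an illustrative rule name and rate:

resource "aws_cloudwatch_event_rule" "schedule" {
  name                = "run-terraform"
  schedule_expression = "rate(1 day)"
}

resource "aws_cloudwatch_event_target" "run_terraform" {
  rule = aws_cloudwatch_event_rule.schedule.name
  arn  = aws_lambda_function.my_tf_function.arn
}

resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.my_tf_function.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.schedule.arn
}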

Lambda API

If you run the Lambda function like this, there is a problem: it always reports that it has failed! This is because the Lambda runtime isn’t aware of the exit status of the Docker CMD. To fix this, we add an entrypoint script which communicates with the Lambda runtime API:

#!/bin/bash
# Exit on any error, including an error anywhere in a pipeline
set -eo pipefail

# Fetch the next invocation from the Lambda runtime API; the request ID
# is returned in the Lambda-Runtime-Aws-Request-Id response header
REQUEST_ID=$(curl -X GET -sI "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next" \
  | awk -v FS=": " -v RS="\r\n" '/Lambda-Runtime-Aws-Request-Id/{print $2}')
export REQUEST_ID

# Report a failed invocation to the Lambda runtime API
function error {
  curl -s -d "ERROR" "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/${REQUEST_ID}/error"
}

trap error ERR

terraform init -input=false \
  -backend-config=bucket=${BACKEND_CONFIG_BUCKET} \
  -backend-config=dynamodb_table=${BACKEND_CONFIG_DYNAMODB_TABLE}

terraform apply -input=false -auto-approve

# Report a successful invocation to the Lambda runtime API
curl -s -d "SUCCESS" "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/${REQUEST_ID}/response"

First we must look up the request ID of the current invocation, since this is how we identify the invocation with the Lambda API.

Next we create a function to run if the script encounters an error; this posts the word “ERROR” to the error API.

We then run our terraform commands; having set the -eo pipefail flags earlier, the script exits with an error if any command fails, including within a pipeline. Finally, we post the word “SUCCESS” to the response API, to indicate that the function has finished successfully.

All standard output and standard error content is written to CloudWatch Logs, for reference.

We now replace the previous CMD instruction in our Dockerfile with the instruction ENTRYPOINT [ "/bin/bash", "entrypoint.sh" ] and upload the new image to our repository. Since the Lambda function uses the latest tag on the image, it will run the new version the next time it is refreshed.
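For completeness, assuming entrypoint.sh sits next to the Dockerfile, the end of the Dockerfile would then look something like this:

COPY entrypoint.sh ./

ENTRYPOINT [ "/bin/bash", "entrypoint.sh" ]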

Summary

I have demonstrated how to package a Terraform module as a Docker container and how to create a Lambda function from it. I also showed how to wrap the terraform commands so that the success or failure of the process is reported correctly back to the Lambda runtime API.
