How to manage Lambda log groups with Terraform

Lambda logs are stored forever by default. This is a bit too long.

Tamás Sallai

Lambda logging

Lambda automatically creates all the log resources a function needs, so logs are kept permanently even if you do nothing special to enable them. This is convenient, as logging is an essential debugging tool, and you don't want to discover that you have no records just when you need them.

Lambda puts its logs straight into CloudWatch Logs, which organizes them in two layers. The upper layer is the log group, which contains log streams, and each stream in turn is a container for log events. With Lambda, there is one log group for each function, and multiple streams are created under it, (at least) one for each version.

The Lambda service creates these resources, but it needs permission to do so. The AWSLambdaBasicExecutionRole managed policy defines the basic permissions for functions, and it allows creating each of the log resources:

"Document": {
  "Statement": [
    {
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ],
  "Version": "2012-10-17"
}

If you attach this managed policy or a policy with these permissions, the Lambda service creates all resources related to logs.
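In Terraform, attaching the managed policy could look like the sketch below. The resource names `lambda_assume`, `lambda_exec`, and `basic_execution` are assumptions made for illustration and are not taken from the article's repository; only the policy ARN is fixed by AWS.

```hcl
# Trust policy that lets the Lambda service assume the role.
# All resource names in this sketch are hypothetical.
data "aws_iam_policy_document" "lambda_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "lambda_exec" {
  assume_role_policy = data.aws_iam_policy_document.lambda_assume.json
}

# Attach the AWS-managed policy that grants the three logs actions shown above.
resource "aws_iam_role_policy_attachment" "basic_execution" {
  role       = aws_iam_role.lambda_exec.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}
```

Note that the managed policy always includes logs:CreateLogGroup, so it only fits the default, Lambda-managed case.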

Problems with Lambda-managed logs

There are two problems with the default approach. First, the log group created by Lambda does not set any expiration for the log messages. Never losing logs is a safe default, but since storing them incurs costs, keeping them forever is unnecessary in most cases. Unfortunately, there is no way to control this default.

The second problem is that if you use an IaC solution like Terraform or CloudFormation to deploy the architecture, the log group is not managed by it. When you destroy the stack, the function is gone, but the log group it created is not. All the log messages are kept there, forever.

From a cost perspective, it is not a big problem as these logs tend to be small. But clutter is costly in the long term, as everything that queries the log groups will need to filter out an increasing amount of noise. A better approach would be to make sure the logs are gone when the function is deleted, except for cases when you configure it otherwise.

And this is possible with Terraform, but as usual, it needs some planning. In this article, you'll learn how log group creation works and how to make Terraform manage the log group's lifecycle so that you have full control over whether it survives the function.

Sample code

I'll use the code from this GitHub repository. It covers all 3 use-cases that are described below.

Region

Since the examples use Terraform as well as the AWS CLI, there are two environment variables to control the region.

AWS_REGION is for Terraform, while AWS_DEFAULT_REGION is for the CLI. Make sure to set both to the same region:

export AWS_REGION=eu-west-1 && export AWS_DEFAULT_REGION=$AWS_REGION

The default case: Lambda-managed log group

First, let's see how Lambda works by default! This is the setup most functions use.

The function needs permission to create all log resources, either via the managed policy or a custom policy:

data "aws_iam_policy_document" "lambda_exec_role_policy" {
  statement {
    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents"
    ]
    resources = [
      "arn:aws:logs:*:*:*"
    ]
  }
}
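A policy document on its own grants nothing until it is attached to the function's execution role. A minimal sketch of that wiring follows; the role and policy resource names are hypothetical, and the article's repository may structure this differently.

```hcl
# Hypothetical execution role that the Lambda service can assume.
resource "aws_iam_role" "lambda_exec" {
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

# Inline policy rendered from the policy document defined above.
resource "aws_iam_role_policy" "lambda_exec_role" {
  role   = aws_iam_role.lambda_exec.id
  policy = data.aws_iam_policy_document.lambda_exec_role_policy.json
}
```

The function would then reference the role through its `role` argument, for example `role = aws_iam_role.lambda_exec.arn`.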

Then deploy the stack, and invoke the function:

terraform apply
FUNCTION_NAME=$(terraform output function_name)

aws lambda invoke --function-name $FUNCTION_NAME --payload '{"param": "World"}' >(cat)
{"value":"Hello World"}{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}

Inspecting the log group for the function:

aws logs describe-log-groups --log-group-name-prefix /aws/lambda/$FUNCTION_NAME
{
	"logGroups": [
		{
			"logGroupName": "/aws/lambda/d696e739f1a18cda-function",
			"creationTime": 1576232093472,
			"metricFilterCount": 0,
			"arn": "arn:aws:logs:eu-west-1:123456789012:log-group:/aws/lambda/d696e739f1a18cda-function:*",
			"storedBytes": 0
		}
	]
}

The log group is created automatically, and it does not expire (the retentionInDays parameter is not set).

Since it is not managed by Terraform, destroying the stack does not delete the log group:

terraform destroy
aws logs describe-log-groups --log-group-name-prefix /aws/lambda/$FUNCTION_NAME
{
	"logGroups": [
		{
			"logGroupName": "/aws/lambda/d696e739f1a18cda-function",
			"creationTime": 1576232093472,
			"metricFilterCount": 0,
			"arn": "arn:aws:logs:eu-west-1:123456789012:log-group:/aws/lambda/d696e739f1a18cda-function:*",
			"storedBytes": 0
		}
	]
}

Solutions

Disabled logging

A straightforward solution is to not let Lambda create a log group at all. No log group remains after deleting the function, but it also means there is no logging. Interestingly, this does not affect the function's execution in any way; the logs just don't go anywhere.

I don't recommend using this, but it's interesting to see that it works.

Use a role that lacks the CreateLogGroup permission:

data "aws_iam_policy_document" "lambda_exec_role_policy" {
  statement {
    actions = [
      "logs:CreateLogStream",
      "logs:PutLogEvents"
    ]
    resources = [
      "arn:aws:logs:*:*:*"
    ]
  }
}

Create the stack:

terraform apply
FUNCTION_NAME_WO_PERM=$(terraform output function_name_without_createloggroup)

The function still works:

aws lambda invoke --function-name $FUNCTION_NAME_WO_PERM --payload '{"param": "World"}' >(cat)
{"value":"Hello World"}{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}

But there is no log group:

aws logs describe-log-groups --log-group-name-prefix /aws/lambda/$FUNCTION_NAME_WO_PERM
{
	"logGroups": []
}

Unsurprisingly, deleting the stack does not leave a log group:

terraform destroy
aws logs describe-log-groups --log-group-name-prefix /aws/lambda/$FUNCTION_NAME_WO_PERM
{
	"logGroups": []
}

Terraform-managed log group

This solution builds on the previous step. If you want Terraform to manage the log group resource, you need to make sure Lambda won't create it accidentally.

This means that, just like before, you need to remove the CreateLogGroup permission from the function's role:

data "aws_iam_policy_document" "lambda_exec_role_policy" {
  statement {
    actions = [
      "logs:CreateLogStream",
      "logs:PutLogEvents"
    ]
    resources = [
      "arn:aws:logs:*:*:*"
    ]
  }
}

Making Terraform manage the log group allows you to define its parameters, such as retention_in_days (shown as retentionInDays in the CLI output), which controls the auto-expiration of log messages.

The log group has a fixed name of /aws/lambda/<function name>, and this is the only thing that connects it to the function itself.

resource "aws_cloudwatch_log_group" "loggroup" {
  name              = "/aws/lambda/${aws_lambda_function.lambda.function_name}"
  retention_in_days = 14
}
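Because the name above is derived from the function resource, Terraform creates the function first and the log group second. A variation, sketched here under assumptions, derives both names from a shared local value and uses `depends_on` so that the log group exists before the function can be invoked. The local `function_name`, the file `lambda.zip`, the handler, the runtime, and the role `aws_iam_role.lambda_exec` are placeholders, not values from the article's repository.

```hcl
locals {
  # Hypothetical function name shared by both resources.
  function_name = "example-function"
}

resource "aws_cloudwatch_log_group" "loggroup" {
  name              = "/aws/lambda/${local.function_name}"
  retention_in_days = 14
}

resource "aws_lambda_function" "lambda" {
  function_name = local.function_name
  filename      = "lambda.zip"                 # placeholder deployment package
  handler       = "index.handler"              # placeholder handler
  runtime       = "nodejs20.x"                 # placeholder runtime
  role          = aws_iam_role.lambda_exec.arn # role assumed to be defined elsewhere

  # Create the log group before the function so it is in place on first invoke.
  depends_on = [aws_cloudwatch_log_group.loggroup]
}
```

Either ordering works as long as the role lacks CreateLogGroup; the explicit dependency only ensures that the first invocation already has somewhere to log.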

Creating the stack and invoking the function works:

terraform apply
FUNCTION_NAME_WO_PERM_W_RES=$(terraform output function_name_without_createloggroup_with_resource)

aws lambda invoke --function-name $FUNCTION_NAME_WO_PERM_W_RES --payload '{"param": "World"}' >(cat)
{"value":"Hello World"}{
    "StatusCode": 200,
    "ExecutedVersion": "$LATEST"
}

Also, there is a log group that gives the function a place to log:

aws logs describe-log-groups --log-group-name-prefix /aws/lambda/$FUNCTION_NAME_WO_PERM_W_RES
{
	"logGroups": [
		{
			"logGroupName": "/aws/lambda/without-createloggroup-with-resource-d696e739f1a18cda-function",
			"creationTime": 1576232060718,
			"retentionInDays": 14,
			"metricFilterCount": 0,
			"arn": "arn:aws:logs:eu-west-1:123456789012:log-group:/aws/lambda/without-createloggroup-with-resource-d696e739f1a18cda-function:*",
			"storedBytes": 0
		}
	]
}

Note the retentionInDays value, which reflects the retention_in_days property in the Terraform config.

Destroying the stack and inspecting the log group again:

terraform destroy
aws logs describe-log-groups --log-group-name-prefix /aws/lambda/$FUNCTION_NAME_WO_PERM_W_RES
{
	"logGroups": []
}

The log group is gone along with the function.
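If the logs should outlive the function in some environments, the same resource can opt out of deletion. This is a minimal sketch, assuming a recent AWS provider version that supports the `skip_destroy` argument on aws_cloudwatch_log_group; the variable `keep_logs` is hypothetical.

```hcl
variable "keep_logs" {
  description = "Hypothetical switch: keep the log group when the stack is destroyed."
  type        = bool
  default     = false
}

resource "aws_cloudwatch_log_group" "loggroup" {
  name              = "/aws/lambda/${aws_lambda_function.lambda.function_name}"
  retention_in_days = 14

  # When true, terraform destroy only removes the group from the state
  # and leaves it (and its log events) in CloudWatch Logs.
  skip_destroy = var.keep_logs
}
```

With the default of `false`, the behavior matches the run above: the log group is deleted together with the stack.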

Lambda@Edge

While this works for normal Lambda functions, it isn't enough for Lambda@Edge logging. Since an edge function runs in different regions, it creates a log group named /aws/lambda/us-east-1.<function name> in each region where it receives a request. This means Terraform would have to create a log group in every region, which is not supported at this moment.
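A single resource cannot span regions, but individual regions can still be covered by hand with aliased providers. The sketch below assumes two example regions and hypothetical provider aliases and resource names; it is not exhaustive, since an edge function can log in any region where requests are served.

```hcl
# One aliased provider per region to cover (two shown as examples).
provider "aws" {
  alias  = "eu_west_1"
  region = "eu-west-1"
}

provider "aws" {
  alias  = "ap_southeast_1"
  region = "ap-southeast-1"
}

# Edge log groups use the us-east-1.<function name> naming in every region.
resource "aws_cloudwatch_log_group" "edge_eu_west_1" {
  provider          = aws.eu_west_1
  name              = "/aws/lambda/us-east-1.${aws_lambda_function.lambda.function_name}"
  retention_in_days = 14
}

resource "aws_cloudwatch_log_group" "edge_ap_southeast_1" {
  provider          = aws.ap_southeast_1
  name              = "/aws/lambda/us-east-1.${aws_lambda_function.lambda.function_name}"
  retention_in_days = 14
}
```

Each additional region needs its own provider block and its own resource, which is why this approach does not scale to every region.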

Conclusion

By removing the CreateLogGroup permission and adding an aws_cloudwatch_log_group resource with the correct name, Terraform can manage all Lambda logging resources. This enables you to define the properties of the log resources as well as clean them up when the stack is deleted.

January 21, 2020