How to clean up Lambda logs

Lambda keeps its logs forever by default. Learn how to reduce the clutter.

Tamás Sallai

Log clutter

An annoying feature of Lambda is that its logs accumulate over time. The messages are pushed to CloudWatch Logs, where they are stored in log groups, but nothing clears those groups when the function is gone. In fact, no retention period is set by default, meaning all the logs are kept forever.

This does not cost much during development, as a few invocations use an insignificant amount of storage. But the growing number of log groups makes it harder to navigate the CloudWatch Logs console, and the CLI needs to download more data while paginating through results. It is clutter, which has its costs even if not in storage.

In this article, I'll introduce a simple Node.js-based tool to detect Lambda log groups whose functions are already deleted.

How Lambda logging works

Lambda log groups

Fortunately, Lambda uses a predictable naming pattern. This makes it easy to spot which log groups are created for a function and which have a different purpose. This naming pattern means that /aws/lambda/<function name> belongs to <function name>.

If <function name> does not exist anymore, the log group becomes a candidate for deletion. We just need to get all the log groups with the /aws/lambda/ prefix along with all the functions and match them.
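The matching step can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual code: the helper name findOrphanedLogGroups and the sample names are made up, and it only covers regional functions in a single region.

```javascript
// Hypothetical helper; names and sample data are illustrative only.
const LAMBDA_LOG_PREFIX = "/aws/lambda/";

// Returns the log groups under /aws/lambda/ that have no function of the same name.
function findOrphanedLogGroups(logGroupNames, functionNames) {
  const existing = new Set(functionNames);
  return logGroupNames
    // Groups outside the Lambda prefix serve a different purpose, so skip them.
    .filter((name) => name.startsWith(LAMBDA_LOG_PREFIX))
    // Keep only the groups whose function is gone.
    .filter((name) => !existing.has(name.slice(LAMBDA_LOG_PREFIX.length)));
}

const orphaned = findOrphanedLogGroups(
  ["/aws/lambda/resize-image", "/aws/lambda/old-report", "/aws/apigateway/welcome"],
  ["resize-image"]
);
console.log(orphaned); // [ '/aws/lambda/old-report' ]
```

In a real run, both input lists would come from the AWS APIs, one pair per region.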

Lambda@Edge logs

One thing that complicates this process is that Lambda@Edge logs are stored in the region of the request instead of the function. These logs are prefixed with the function's region, so a function that is deployed to the us-east-1 region (the only one supported at the time of writing) named EdgeTest will log to a group called /aws/lambda/us-east-1.EdgeTest in all regions. In effect, we need to consider all functions in all regions.
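To take the Edge naming into account, the log group name can be split into an optional region prefix and the function name. The sketch below is an assumption-laden illustration: the function names parseLambdaLogGroup and hasMatchingFunction are hypothetical, and it assumes that function names never contain a dot, so a dot after the prefix marks an Edge log group.

```javascript
// Assumption: function names contain no dots, so a dot after the prefix
// separates the function's home region from its name (Lambda@Edge).
function parseLambdaLogGroup(logGroupName) {
  const prefix = "/aws/lambda/";
  if (!logGroupName.startsWith(prefix)) return null;
  const rest = logGroupName.slice(prefix.length);
  const dot = rest.indexOf(".");
  return dot === -1
    ? { edge: false, functionName: rest }
    : { edge: true, functionRegion: rest.slice(0, dot), functionName: rest.slice(dot + 1) };
}

// functionsByRegion maps a region to the function names deployed there.
// logRegion is the region where the log group itself was found.
function hasMatchingFunction(logGroupName, logRegion, functionsByRegion) {
  const parsed = parseLambdaLogGroup(logGroupName);
  if (parsed === null) throw new Error("not a Lambda log group: " + logGroupName);
  // Edge groups point back to the function's own region, not the log's region.
  const region = parsed.edge ? parsed.functionRegion : logRegion;
  return (functionsByRegion[region] || []).includes(parsed.functionName);
}

const functionsByRegion = { "us-east-1": ["EdgeTest"], "eu-west-1": ["resize-image"] };
console.log(hasMatchingFunction("/aws/lambda/us-east-1.EdgeTest", "eu-west-1", functionsByRegion)); // true
console.log(hasMatchingFunction("/aws/lambda/EdgeTest", "eu-west-1", functionsByRegion)); // false
```

This is why the function lists of all regions are needed before any single region's log groups can be judged.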

Log group attributes

Log groups have two important attributes. The first is the number of stored bytes, which indicates how much storage the group consumes. This is an eventually consistent value, so it needs some time to reflect recent changes.

The second one is the retention period, defined in days. Any log entry that is older than this is automatically deleted.

Find unused log groups

To get the log groups without a matching Lambda function, taking into account both regional and Edge functions, use this script:

npx https://github.com/sashee/unused_lambda_logs

This generates a JSON structure with the log groups grouped by regions:

{
	"eu-west-1": [
		{
			"logGroupName": "/aws/lambda/aa-LambdaFunction-B0TYC8BX9VBT",
			"storedBytes": 375
		},
		{
			"logGroupName": "/aws/lambda/ab-LambdaFunction-5JBV23QSN6A5",
			"storedBytes": 146279,
			"retentionInDays": 180
		}
	],
	"us-east-1": [
		...
	]
}

If you prefer a columnar format that is better suited for CLI tools, use jq to transform it:

npx https://github.com/sashee/unused_lambda_logs | \
	jq -r 'to_entries |
		map(.key as $region |
			.value |
			map("\($region)\t\(.logGroupName)\t\(.storedBytes)\t\(.retentionInDays)")
		) |
		flatten |
		.[]' | \
	column -t

The resulting columns are region, log group name, stored bytes, and retention in days:

eu-west-1  /aws/lambda/aa-LambdaFunction-B0TYC8BX9VBT                        375     null
eu-west-1  /aws/lambda/ab-LambdaFunction-5JBV23QSN6A5                        146279  180
eu-west-1  /aws/lambda/custom1-CustomResourceLambdaFunction-1VO9GXJ7UQVTB    13500   null
eu-west-1  /aws/lambda/ff1-LambdaFunction-1GCJW7VUIBATW                      744     null
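If jq is not at hand, the same flattening can be sketched in JavaScript. This is only an illustration of the transformation: the function name toRows is hypothetical, and the sample report reuses the two entries shown in the JSON above.

```javascript
// Hypothetical helper: turns the region-keyed report into tab-separated rows.
function toRows(report) {
  return Object.entries(report).flatMap(([region, groups]) =>
    groups.map((group) =>
      // A missing retentionInDays is printed as "null", mirroring the jq output.
      [region, group.logGroupName, group.storedBytes, group.retentionInDays ?? null]
        .map(String)
        .join("\t")
    )
  );
}

const report = {
  "eu-west-1": [
    { logGroupName: "/aws/lambda/aa-LambdaFunction-B0TYC8BX9VBT", storedBytes: 375 },
    { logGroupName: "/aws/lambda/ab-LambdaFunction-5JBV23QSN6A5", storedBytes: 146279, retentionInDays: 180 },
  ],
};

const rows = toRows(report);
console.log(rows.join("\n"));
```

The explicit String conversion matters here, because Array.prototype.join would otherwise render a null value as an empty string.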

Filter to no retention period

To get a list of potentially unused log groups that have no retention period set:

npx https://github.com/sashee/unused_lambda_logs | \
	jq -r 'to_entries |
		map(.key as $region |
			.value |
			.[] |
			select(has("retentionInDays") | not) |
			"\($region)\t\(.logGroupName)"
		) |
		.[]'

These should be the ones that are not used but will be stored forever:

eu-west-1       /aws/lambda/aa-LambdaFunction-B0TYC8BX9VBT
eu-west-1       /aws/lambda/custom1-CustomResourceLambdaFunction-1VO9GXJ7UQVTB
eu-west-1       /aws/lambda/ff1-LambdaFunction-1GCJW7VUIBATW

To set the retention period to 180 days for all of them, pipe the output to the AWS CLI. Remove the | sh part first to validate what it will do before you run it.

npx https://github.com/sashee/unused_lambda_logs | \
	jq -r 'to_entries |
		map(.key as $region |
			.value |
			.[] |
			select(has("retentionInDays") | not) |
			"\($region)\t\(.logGroupName)"
		) |
	.[]' \
	| awk '{print "aws logs --region " $1 " put-retention-policy --log-group-name \"" $2 "\" --retention-in-days 180"}' \
	| sh

Filter to no stored bytes

To get the list of empty (no bytes are stored) log groups:

npx https://github.com/sashee/unused_lambda_logs | \
	jq -r 'to_entries |
		map(.key as $region |
			.value |
			.[] |
			select(.storedBytes == 0) |
			"\($region)\t\(.logGroupName)"
		) |
		.[]'

These groups have neither a function logging to them nor any log entries, so they should be safe to delete.

eu-central-1    /aws/lambda/tt4-LambdaFunction-RE48D3AAQ9ZF
eu-central-1    /aws/lambda/tt5-LambdaFunction-MI1G8IVCUL1D
eu-central-1    /aws/lambda/tt5-RandomString-1J9QG6IX-RandomStringLambdaFuncti-10OLP6DKWD6E6

To delete them, pipe the output to the AWS CLI. Again, remove the | sh part first and check the generated commands to make sure you really want those groups to be gone.

npx https://github.com/sashee/unused_lambda_logs | \
	jq -r 'to_entries |
		map(.key as $region |
			.value |
			.[] |
			select(.storedBytes == 0) |
			"\($region)\t\(.logGroupName)"
		) |
		.[]' \
	| awk '{print "aws logs --region " $1 " delete-log-group --log-group-name \"" $2 "\""}' \
	| sh

Safely delete log groups

If there is a retention period set and the number of stored bytes is 0, there were no writes in that period. When there is also no matching Lambda function to write there, the log group should be safe to delete.

If there are log messages but no retention period, you should set sensible limits. What is a sensible limit? It depends. For a production function, that might be years. For one that is used only during development, a few weeks should be plenty. As a rule of thumb, I believe 180 days should be a good default.

A safe process should be as follows:

  • find all log groups without a matching function
  • set a retention period if there is none set
  • delete the log group if the number of stored bytes is 0

Doing this periodically would guarantee that no logs will be stuck forever after the Lambda function is deleted.
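The decision for a single orphaned log group can be sketched as a small function. This is a hypothetical planner, not part of the tool: the name planCleanup and the action labels are made up, the 180-day default follows the rule of thumb above, and checking for emptiness before retention is a design choice of this sketch (there is no point in configuring a group that is about to be deleted).

```javascript
// Rule-of-thumb default from the article; adjust to your own needs.
const DEFAULT_RETENTION_DAYS = 180;

// Hypothetical planner for a log group that has no matching function.
// Expects the same fields as the tool's JSON output.
function planCleanup(group) {
  if (group.storedBytes === 0) {
    return { action: "delete" };
  }
  if (group.retentionInDays === undefined) {
    return { action: "set-retention", days: DEFAULT_RETENTION_DAYS };
  }
  // Entries will expire on their own; a later run deletes the empty group.
  return { action: "keep" };
}

const candidates = [
  { logGroupName: "/aws/lambda/aa-LambdaFunction-B0TYC8BX9VBT", storedBytes: 375 },
  { logGroupName: "/aws/lambda/ab-LambdaFunction-5JBV23QSN6A5", storedBytes: 146279, retentionInDays: 180 },
  { logGroupName: "/aws/lambda/tt4-LambdaFunction-RE48D3AAQ9ZF", storedBytes: 0 },
];

for (const group of candidates) {
  console.log(group.logGroupName + ": " + planCleanup(group).action);
}
```

With the sample data, the first group gets a retention period, the second is left alone, and the third is marked for deletion.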

July 23, 2019