How to handle timeouts in Lambda functions

Promise-based timeout mitigation in AWS Lambda functions

Tamás Sallai

Lambda function timeouts

A Lambda function has a timeout parameter, and when that time is up, the Lambda runtime terminates the execution. This can happen at any point in the function, which might leave a multi-step operation half-finished.

For example, the following function performs multiple steps one after the other: it first copies an S3 object, then deletes one, and finally reads a value from DynamoDB:

await s3.copyObject({
	/* ... */
}).promise();
await s3.deleteObject({
	/* ... */
}).promise();

return dynamodb.getItem({
	/* ... */
}).promise();

When this function is called, one of multiple things can happen:

  • none of the operations are run
  • the S3 object is copied
  • the S3 object is copied and the other one is deleted
  • the S3 object is copied, the other one is deleted, and the value is read from DynamoDB

Apart from the machine crashing, which you cannot do anything about, a partial execution can be caused by timeouts. Maybe the S3 or the DynamoDB service is experiencing degradation, and that makes a call take longer than usual. Fortunately, there are ways to handle timeout errors in the function code.

Remaining time

The first thing is to know how much time is left for the execution. Maybe the execution reaches the critical section with a significant delay which leaves less time for the function to finish.

The second argument of the handler function is the context object:

module.exports.index = async (event, context) => {
	// ...
}

This has a number of properties. Among them is a function called getRemainingTimeInMillis which returns how many milliseconds the function has before it will be terminated.

module.exports.index = async (event, context) => {
	const remainingTime = context.getRemainingTimeInMillis();
	// ...
}

This allows the function to adapt to the remaining time instead of some hardcoded value.

const remainingTime = context.getRemainingTimeInMillis();
try {
	// run the work for at most remainingTime - 1000 milliseconds
}catch(e) {
	// pseudocode: timeoutError stands for "e signals a timeout"
	if (timeoutError) {
		// handle timeout
		// there is ~1 second remaining
	}
}
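One way to derive the time limit is a small helper that subtracts a reserve from the remaining time. The sketch below is only an illustration and not part of the Lambda API: timeBudget and fakeContext are hypothetical names, and fakeContext imitates nothing but getRemainingTimeInMillis so that the code can run outside Lambda.

```javascript
// Hypothetical helper: how long the work may run if reserveMs is kept
// back for handling a timeout. Never returns a negative number.
const timeBudget = (context, reserveMs = 1000) =>
	Math.max(0, context.getRemainingTimeInMillis() - reserveMs);

// Stand-in for the Lambda context object, for running the sketch locally.
// timeoutMs plays the role of the function's configured timeout.
const fakeContext = (timeoutMs) => {
	const deadline = Date.now() + timeoutMs;
	return {getRemainingTimeInMillis: () => deadline - Date.now()};
};

// About 2 seconds of budget when 3 seconds are left
console.log(timeBudget(fakeContext(3000)));
// The reserve is larger than the time left, so the budget is clamped to 0
console.log(timeBudget(fakeContext(500)));
```

Clamping to zero makes the "no time left" case explicit, so the caller can skip the work instead of starting it with a negative limit.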

AWS SDK timeouts

But how to enforce this value in the function? By default, asynchronous operations run until completion without any regard to timing.

Network calls usually have some retry policy and give up eventually. The AWS SDK functions support setting timeouts.

const s3WithLimit = new AWS.S3({
	httpOptions: {
		timeout: context.getRemainingTimeInMillis() - 1000,
	},
	maxRetries: 1,
});
// copy with timeout
await s3WithLimit.copyObject({
	Bucket: process.env.BUCKET,
	Key: key,
	CopySource: encodeURI(`${process.env.BUCKET}/${key}`),
}).promise();

It works, but there are several problems with this. First, timeouts are set on the service level, that is, on the client object. Setting a separate value for each call needs a lot of boilerplate code.

Second, and more importantly, there is not much documentation on how to configure it right. There are separate limits for the socket and the response timings, and I couldn’t find a setting that limits the total time. Also, setting maxRetries to 1 restricts the SDK to a single retry, which all but disables the auto-retry behavior and can have unintended side effects.

There is a GitHub issue and an article about this, but not much else.
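To illustrate the boilerplate involved, here is a sketch of a helper that builds the client options from the remaining time. clientOptionsFor is a hypothetical name, and the way the budget is split between connectTimeout and timeout (both AWS SDK v2 httpOptions) is an arbitrary assumption made for this example. Neither value is presented here as a cap on the total time of a call, retries included.

```javascript
// Hypothetical helper: builds AWS SDK v2 client options from the time left.
// The 1000 ms connect limit and the reserve are assumptions for this sketch.
const clientOptionsFor = (remainingMs, reserveMs = 1000) => {
	// Keep at least 1 ms: a timeout of 0 disables the timeout on Node sockets
	const budget = Math.max(1, remainingMs - reserveMs);
	return {
		httpOptions: {
			// limit for establishing the connection
			connectTimeout: Math.min(1000, budget),
			// limit applied to the request once connected
			timeout: budget,
		},
		maxRetries: 1,
	};
};

console.log(clientOptionsFor(3000));

// In the handler (not run here), a new client is needed for every budget:
// const s3WithLimit = new AWS.S3(
// 	clientOptionsFor(context.getRemainingTimeInMillis())
// );
```

Even with a helper like this, a new client object has to be created whenever the limit changes, which is the boilerplate problem described above.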

Promise-based timeout handling

Fortunately, Lambda functions support async/await and Promises, which makes all the JavaScript async tools available. With them, a general-purpose timeout function is only a few lines:

const timeout = (prom, time, exception) => {
	let timer;
	return Promise.race([
		prom,
		new Promise((_r, rej) => timer = setTimeout(rej, time, exception))
	]).finally(() => clearTimeout(timer));
}

For more info on how it works, see the article on JavaScript Promise timeouts.

To use it, call it with the Promise and the number of milliseconds for the timeout. In the following code, the await throws if s3.copyObject does not finish in 1.5 seconds. Since no third argument is passed, the thrown value is undefined:

await timeout(s3.copyObject({
	Bucket: process.env.BUCKET,
	Key: key,
	CopySource: encodeURI(`${process.env.BUCKET}/${key}`),
}).promise(), 1500);

To tell a timeout apart from other failures, pass a custom error value as the third argument of the function. The Promise is rejected with that value when the timer fires first. Using a Symbol here helps distinguish it from the “normal” errors.

When combined with the context.getRemainingTimeInMillis function, this structure handles timeouts reliably and adapts to the time that is actually left:

const timeoutError = Symbol();
try {
	await timeout(
		s3.copyObject({
			Bucket: process.env.BUCKET,
			Key: key,
			CopySource: encodeURI(`${process.env.BUCKET}/${key}`),
		}).promise(),
		context.getRemainingTimeInMillis() - 1000,
		timeoutError,
	);
}catch(e) {
	if (e !== timeoutError) {
		throw e;
	}
	// handle timeout error
	// there is ~1 second left
}
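The pattern can also be tried without any AWS access. The following sketch repeats the timeout helper and replaces the S3 call with slowCall, a hypothetical stand-in that resolves after a given delay. The attempt function is likewise only an illustration of the try/catch structure above.

```javascript
// Same helper as above, repeated so the sketch runs on its own.
const timeout = (prom, time, exception) => {
	let timer;
	return Promise.race([
		prom,
		new Promise((_r, rej) => timer = setTimeout(rej, time, exception))
	]).finally(() => clearTimeout(timer));
};

// Hypothetical stand-in for a service call that takes ms milliseconds.
const slowCall = (ms, value) =>
	new Promise((res) => setTimeout(res, ms, value));

const timeoutError = Symbol("timeout");

// Resolves to the call's result or to "timed out".
// Any other error is passed on to the caller.
const attempt = async (callMs, limitMs) => {
	try {
		return await timeout(slowCall(callMs, "ok"), limitMs, timeoutError);
	}catch(e) {
		if (e !== timeoutError) {
			throw e;
		}
		return "timed out";
	}
};

attempt(10, 200).then(console.log); // the call wins the race
attempt(200, 50).then(console.log); // the timer wins the race
```

Because the Symbol is compared by identity, an error coming from the call itself can never be mistaken for the timeout.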

And since it’s Promise-based, it works with async functions too. This structure can group multiple calls:

const timeoutError = Symbol();
try {
	const item = await timeout(
		(async () => {
			await s3.copyObject({
				/* ... */
			}).promise();
			await s3.deleteObject({
				/* ... */
			}).promise();

			return dynamodb.getItem({
				/* ... */
			}).promise();
		})(),
		context.getRemainingTimeInMillis() - 1000,
		timeoutError
	);
}catch(e) {
	if (e !== timeoutError) {
		throw e;
	}
	// handle timeout error
	// there is ~1 second left
}
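When several calls share one limit, the timeout handler usually needs to know how far the sequence got. One possible approach, sketched below, is to record each finished step. runSteps and step are hypothetical names, and the delays stand in for the real S3 and DynamoDB calls. Note that Promise.race only stops waiting: calls that have already started are not cancelled.

```javascript
// Same helper as above, repeated so the sketch runs on its own.
const timeout = (prom, time, exception) => {
	let timer;
	return Promise.race([
		prom,
		new Promise((_r, rej) => timer = setTimeout(rej, time, exception))
	]).finally(() => clearTimeout(timer));
};

// Hypothetical stand-in: a step that resolves with value after ms.
const step = (ms, value) => () =>
	new Promise((res) => setTimeout(res, ms, value));

const timeoutError = Symbol("timeout");

// Runs the steps in order under one shared limit and reports which
// steps had finished when the limit was reached.
const runSteps = async (steps, limitMs) => {
	const completed = [];
	try {
		const result = await timeout(
			(async () => {
				let last;
				for (const [name, fn] of steps) {
					last = await fn();
					completed.push(name);
				}
				return last;
			})(),
			limitMs,
			timeoutError,
		);
		return {timedOut: false, completed, result};
	}catch(e) {
		if (e !== timeoutError) {
			throw e;
		}
		// Snapshot: the remaining steps are not cancelled and may still
		// finish after this point.
		return {timedOut: true, completed: [...completed]};
	}
};

// The delete step is too slow for the 100 ms limit
runSteps([
	["copy", step(20, "copied")],
	["delete", step(200, "deleted")],
	["read", step(20, "value")],
], 100).then(console.log);
```

With the list of completed steps, the handler can decide in the remaining second whether to report a partial result or to record what still needs to be done.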

Conclusion

AWS Lambda functions can be terminated prematurely. This can happen due to machine failure, but reaching the timeout is also a possibility. The runtime provides a way to check the remaining execution time, and combining that with a Promise-based timeout construct makes it possible to run mitigation logic before the function is terminated.

24 November 2020