How to delay calling a Lambda function using Step Functions
Call a state machine with a wait task first
Delaying
Lambda is event-driven, meaning a function starts running when an event triggers it. Because of this, it has no built-in way to delay an event and start processing at a later time.
While the function could wait as part of its code, that is expensive, as you pay for the total runtime of the function multiplied by its memory allocation. Also, the maximum time a function can run is 15 minutes. In theory, you could use a small Lambda for waiting and have it call itself to get around the hard execution limit, but that is more of an interesting development challenge than a realistic approach.
In AWS, you can use Step Functions to solve this instead. It is an orchestration service that lets you define a workflow as a state machine that can use other services and built-in operations. One of these built-in operations is the Wait task, which pauses the process for a configurable amount of time. What makes this a viable solution is that, unlike a Lambda function, a Standard workflow is billed per state transition, so you don't pay for the time spent waiting.
A delayer state machine has two states: a Wait and a Task that calls a Lambda function.
The Wait task
This task stops the execution for some time. The interesting part here is how to define how long that pause should be.
The task supports a constant number of seconds via the Seconds parameter, but it can also take a path that references its input.
A state machine-based orchestration service defines not just which tasks to run but also what data flows between them. In this simple example, the state machine gets some input and passes it to the Wait task, which then uses it to know how long to wait.
{
  "States": {
    "Wait": {
      "Type": "Wait",
      "SecondsPath": "$.delay_seconds",
      "Next": "..."
    }
  }
}
The parameter is SecondsPath, and it defines a path: $.delay_seconds. This looks up the delay_seconds field in the input object, which must be a number, and waits for that many seconds. The great thing about this structure is that the caller defines the delay instead of it being hardcoded.
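To make the lookup concrete, here is a minimal Python sketch of what resolving a path such as $.delay_seconds against the execution input amounts to. It is only an illustration: resolve_path is a hypothetical helper that handles plain dotted paths, not the JSONPath implementation Step Functions itself uses.

```python
def resolve_path(path, data):
    """Resolve a simple dotted path such as "$.delay_seconds" against a dict.

    Hypothetical helper for illustration only: it supports "$" and
    "$.field.subfield", not full JSONPath.
    """
    if path == "$":
        return data
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    value = data
    for key in path[2:].split("."):
        value = value[key]  # raises KeyError if the field is missing
    return value


# The same shape of input the caller passes when starting an execution.
execution_input = {"delay_seconds": 5, "test": "value"}
delay = resolve_path("$.delay_seconds", execution_input)
print(delay)
```

The other fields in the input, such as test here, are ignored by the lookup; only the referenced field decides the delay.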
Calling the Lambda
The second state is a Task that calls a Lambda function. It needs the function's ARN, as usual, and the state machine also needs the lambda:InvokeFunction permission.
To get that permission, the state machine uses an IAM role that trusts the states.amazonaws.com service and has an identity-based policy that allows lambda:InvokeFunction on the function.
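As a rough sketch of what those two policy documents could look like, here they are written as Python dicts so they can be printed as JSON. FUNCTION_ARN is a placeholder and the variable names are made up for illustration; how the documents are attached to the role (Terraform in this example's case) is not shown.

```python
import json

# Placeholder: substitute the ARN of the function the Task state calls.
FUNCTION_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME"

# Trust policy: lets the Step Functions service assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "states.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Identity-based policy: allows invoking that one function and nothing else.
invoke_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": FUNCTION_ARN,
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
print(json.dumps(invoke_policy, indent=2))
```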
States definition
Combining all of the above, the final states definition:
{
  "StartAt": "Wait",
  "States": {
    "Wait": {
      "Type": "Wait",
      "SecondsPath": "$.delay_seconds",
      "Next": "Call function"
    },
    "Call function": {
      "Type": "Task",
      "Resource": "${aws_lambda_function.lambda.arn}",
      "End": true
    }
  }
}
It starts at the Wait task, which pauses the execution based on the delay_seconds input. When it wakes up, it moves to the Call function state, which calls the Lambda function. When that finishes, the execution ends.
Testing
Let's use the AWS CLI to call the state machine and start an execution:
aws stepfunctions start-execution \
  --state-machine-arn "$(terraform output -raw delayer_arn)" \
  --input '{"delay_seconds": 5, "test": "value"}'
Since the example code is deployed via Terraform, the delayer_arn output is the ARN of the state machine. The --input argument then defines an object with a delay_seconds and a test value.
The logs show that the Lambda function is called after the delay, and that it gets all the inputs in the event argument.
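As a minimal sketch of the receiving side, assuming a Python runtime (the handler name and the return value are illustrative, not taken from the deployed function), a handler that only logs its input is enough to see this:

```python
import json


def handler(event, context):
    # The Wait state passes its input through unchanged, so the event here
    # is the same object that was given to start-execution.
    print(json.dumps(event))
    return {"received": event}


# Local check with the input from the test above; no AWS call is made.
result = handler({"delay_seconds": 5, "test": "value"}, None)
```

Both delay_seconds and test arrive in the event, so the caller can send arbitrary data along with the delay.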
Absolute timing
Instead of waiting for N seconds, the Wait task also supports a timestamp, and it halts the execution until that time is reached. These are the Timestamp option and its TimestampPath counterpart:
{
  "Wait": {
    "Type": "Wait",
    "TimestampPath": "$.at",
    "Next": "Call function"
  }
}
The timestamp format is ISO 8601 with some additional restrictions, for example it must not contain a numeric offset. I found that the %Y-%m-%dT%H:%M:%SZ format works fine when the timestamp is in the UTC timezone.
To test it, call the state machine with a timestamp that is 10 seconds from now:
aws stepfunctions start-execution \
  --state-machine-arn "$(terraform output -raw scheduler_arn)" \
  --input "{\"at\": \"$(TZ='UTC' date --date="10 seconds" +"%Y-%m-%dT%H:%M:%SZ")\", \"test\":\"value\"}"
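The --date option used above is specific to GNU date. Where that is not available, the same input can be built in other ways; the following is a minimal Python sketch using only the standard library. The wait_timestamp helper and its parameters are made up for illustration.

```python
import json
from datetime import datetime, timedelta, timezone


def wait_timestamp(seconds_from_now, now=None):
    """Return a UTC timestamp in the %Y-%m-%dT%H:%M:%SZ format.

    `now` is injectable so the function can be checked deterministically;
    by default the current time is used.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Normalize to UTC so the trailing "Z" is accurate.
    target = now.astimezone(timezone.utc) + timedelta(seconds=seconds_from_now)
    return target.strftime("%Y-%m-%dT%H:%M:%SZ")


# Build the JSON document to pass as the --input value.
execution_input = json.dumps({"at": wait_timestamp(10), "test": "value"})
print(execution_input)
```

The printed string can then be passed as the --input value when starting the execution.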
The inspector shows the steps taken, and the logs show that the function was called after the delay.