How to use the PhysicalResourceId for CloudFormation Custom Resources

The PhysicalResourceId is an oft-overlooked feature of Custom Resources. This is a mistake.

Tamás Sallai

Background

Working with custom resources in CloudFormation is mostly a straightforward task. Once you have the template for the Lambda function and the necessary permissions set up, it is mostly copy-paste, and handling the lifecycle is a matter of reading the API documentation. But it took me a lot of time to wrap my head around the PhysicalResourceId and what the best candidates for this parameter are.

At first, I didn't pay much attention to it. That resulted in a lot of frustration, and I soon realized that overlooking its importance causes a lot of headaches down the road. Numerous times I watched the underlying resource simply disappear, then had to wait for CloudFormation to offer the option to delete the stack without properly cleaning everything up.

After giving it much thought, I identified two distinct use-cases that should cover pretty much every scenario.

But first and foremost, do not use the context.logStreamName from the official examples. It will appear to be working but it will break eventually.

What is the PhysicalResourceId

First, let's recap what exactly the PhysicalResourceId is and what it is good for.

It is a parameter that you get with the lifecycle events and that you also return in your response. The value of event.PhysicalResourceId during each event:

1) Create

Empty. Plain and simple.

2) Update

The value is what you returned in the Create step (or in an earlier Update that replaced the resource).

But depending on what you return here, two things can happen:

If you return the same one, nothing else happens. It will be a regular update.

But if you return a different one, CloudFormation will issue a Delete with the old parameters. In effect, this will be a replace.

3) Delete

The value will be the id of the resource being deleted: the one you returned in the Create step or, after a replace, in the Update that replaced it.
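The three cases can be summed up in a small model. This is only an illustration of the behavior described above, not CloudFormation code, and followUp is a hypothetical helper name:

```javascript
// Hypothetical model of how CloudFormation reacts to the id a handler returns.
// requestType: "Create" | "Update" | "Delete"
// incomingId: event.PhysicalResourceId (not set for Create)
// returnedId: the PhysicalResourceId in the handler's response
const followUp = (requestType, incomingId, returnedId) => {
	if (requestType === "Update" && returnedId !== incomingId) {
		// A different id turns the update into a replace:
		// a Delete for the old id follows
		return {type: "replace", deleteIssuedFor: incomingId};
	}
	return {type: requestType.toLowerCase(), deleteIssuedFor: null};
};

console.log(followUp("Update", "id-1", "id-1")); // regular update
console.log(followUp("Update", "id-1", "id-2")); // replace, Delete for id-1
```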

Uses

As you can see, the PhysicalResourceId is used for two things:

First, it communicates a piece of information between the lifecycle steps. What you returned in the Create step, you'll get in the Update and the Delete.

And second, it controls whether a resource is updated or replaced. And this can cause problems, as in some cases it can delete the underlying resource with all its state.
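The second point also suggests why context.logStreamName is risky. The log stream name belongs to the execution environment of the Lambda function, not to the resource, so an Update served by a different environment can return a different value, which CloudFormation treats as a replace. A minimal sketch, assuming the handler returns the log stream name for every request type; the log stream names are made up:

```javascript
// Anti-pattern: the id comes from the invocation context, not from the resource
const naiveId = (event, context) => context.logStreamName;

// Two hypothetical invocations for the same resource,
// served by different execution environments
const onCreate = naiveId(
	{RequestType: "Create"},
	{logStreamName: "2019/01/16/[$LATEST]aaaa"}
);
const onUpdate = naiveId(
	{RequestType: "Update", PhysicalResourceId: onCreate},
	{logStreamName: "2019/01/16/[$LATEST]bbbb"}
);

// The ids differ, so this Update would be handled as a replace
console.log(onCreate === onUpdate); // false
```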

What to set it to, then?

Scenario #1: The resource provides the id

This is the usual case. When you create a resource, it usually returns an id. For example, when a GitHub gist is created, you issue a POST and the API returns an id:

POST /gists

And in the response, there is a property called id:

{
	"id": "aa5a315d61ae9438b18d",
}

This is a critical piece of information, as you'll need it later for updating/deleting the gist:

DELETE /gists/:gist_id

As it identifies the underlying resource (the gist), it is a perfect candidate for the PhysicalResourceId. In the Create step, return the id:

const id = createGist(...);

Then for Updates, you can either return the same one (update) or create a new gist and return the id of that (replace):

const existingId = event.PhysicalResourceId;

if (...) {
	const newId = createGist(...);

	// ... return the newId
} else {
	// ... return the existingId
}

The important thing here is not to delete the existing resource. When a new PhysicalResourceId is returned, CloudFormation will issue a Delete, and that should take care of cleaning it up.
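Put together, a handler for this scenario could look roughly like the sketch below. The in-memory gists map and the createGist, updateGist and deleteGist helpers are hypothetical stand-ins that only make the example runnable (real API calls would be asynchronous), and the Public property is an assumed example of something that can not be changed in place:

```javascript
// Hypothetical in-memory stand-in for the gist API
const gists = new Map();
let counter = 0;
const createGist = (props) => {
	counter += 1;
	const id = `gist-${counter}`;
	gists.set(id, props);
	return id;
};
const updateGist = (id, props) => { gists.set(id, props); };
const deleteGist = (id) => { gists.delete(id); };

// Returns the PhysicalResourceId to send back to CloudFormation
const handle = (event) => {
	const props = event.ResourceProperties;
	switch (event.RequestType) {
		case "Create":
			return createGist(props);
		case "Update":
			// Assumption: a changed Public value needs a new gist
			if (props.Public !== event.OldResourceProperties.Public) {
				// Replace: create the new one, do not touch the old one
				return createGist(props);
			}
			updateGist(event.PhysicalResourceId, props);
			return event.PhysicalResourceId;
		case "Delete":
			// Also called for the old id after a replace
			deleteGist(event.PhysicalResourceId);
			return event.PhysicalResourceId;
		default:
			throw new Error(`Unknown RequestType: ${event.RequestType}`);
	}
};
```

The Update branch never deletes anything. The cleanup of a replaced gist only happens when the Delete with the old id arrives.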

In this scenario, unfortunately, you cannot detect the case when something goes wrong and CloudFormation retries creating a resource that already exists. You can then end up with an orphaned resource, i.e. one that is not managed by any CloudFormation resource.

Scenario #2: You supply the id

The other use-case is when you specify the id of the underlying resource yourself. This is the case for S3 objects and, to stay with GitHub, for repositories. A repository is identified by the account name and the repository name.

When you create a repository, you provide the name in the body:

POST /user/repos

{
	...
	name: "<name>",
}

Then when you want to delete it, you only need the name again (and the account name, but you can get that from a separate call):

DELETE /repos/:owner/:repo

Make the name unique

A common mistake is to define the name fully in the CloudFormation template and use it directly. While this gives the user of the template full control over the naming, it easily causes irreversible problems. The problematic cases are renaming the resource in the template and copy-pasting it under a different logical id while keeping the same name:

Repo1:
  Properties:
    Name: repo
Repo2:
  Properties:
    Name: repo

If both resources map to the same repository, updates become unreliable and the stack can easily get stuck.

To avoid the clashing of names, instead of naming it repo, add something unique, such as repo-bf3f42.

How should you create the second part?

First of all, don't make it random. It is an easy choice (those characters look random, right?), but it makes your function non-idempotent. If there is a failure, CloudFormation retries creating the resource, and the failure can be caused by something outside your control, such as a fault in the machine running the Lambda code. If you introduce randomness, you can end up with orphaned resources. While this should be rare, it's better to be prepared for it.

Instead of a random value, let's use a pseudo-random part derived from a subset of the parameters available to the function. As a bare minimum, you should use the StackId and the LogicalResourceId. Together, these two make the resource unique and make sure the names cannot clash.

You can use a hash function for the calculation, like this one:

const getId = (event) => require('crypto').createHash('md5').update(`${event.StackId}-${event.LogicalResourceId}`).digest("hex").substring(0, 7);
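
As a quick check of the idea, the sketch below builds the repository name from the template's Name and the calculated part. getRepoName is a hypothetical helper and the StackId is made up; getId is repeated only so the example runs on its own:

```javascript
const crypto = require("crypto");

// Same calculation as above
const getId = (event) => crypto.createHash("md5")
	.update(`${event.StackId}-${event.LogicalResourceId}`)
	.digest("hex")
	.substring(0, 7);

// Hypothetical helper: the name of the repository backing the resource
const getRepoName = (event) =>
	`${event.ResourceProperties.Name}-${getId(event)}`;

// Made-up events for two resources that share the same Name
const repo1 = {
	StackId: "arn:aws:cloudformation:eu-west-1:123456789012:stack/example/11111111",
	LogicalResourceId: "Repo1",
	ResourceProperties: {Name: "repo"},
};
const repo2 = {...repo1, LogicalResourceId: "Repo2"};

// A retried Create calculates the same name again
console.log(getRepoName(repo1) === getRepoName({...repo1})); // true
// Different logical ids are expected to give different names
console.log(getRepoName(repo1), getRepoName(repo2));
```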

Controlled replace

If you use only the StackId and the LogicalResourceId as inputs to the hash, the resource will never be replaced, only updated. In some cases, you might want to replace the underlying resource instead, for example when a property cannot be updated in place.

To control when a replace should be done instead of an update, add the relevant parameters to the hash when you calculate the id:

const getId = (event, ResourceProperties) => require('crypto').createHash('md5').update(`${event.StackId}-${event.LogicalResourceId}-${ResourceProperties.Name}`).digest("hex").substring(0, 7);

But don't forget to check whether the id has changed, and create a new resource if it has:

const oldId = getId(event, event.OldResourceProperties);
const newId = getId(event, event.ResourceProperties);
if (oldId !== newId) {
	// create a new resource
} else {
	// use the existing resource
}
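
A fuller sketch of how this check could sit inside a handler is shown below. The repos set is a hypothetical in-memory stand-in for the repositories API, keyed by the calculated id, and real API calls would be asynchronous:

```javascript
const crypto = require("crypto");

// Same calculation as above: the Name is part of the hash
const getId = (event, ResourceProperties) => crypto.createHash("md5")
	.update(`${event.StackId}-${event.LogicalResourceId}-${ResourceProperties.Name}`)
	.digest("hex")
	.substring(0, 7);

// Hypothetical in-memory stand-in for the repositories API
const repos = new Set();

// Returns the PhysicalResourceId to send back to CloudFormation
const handle = (event) => {
	switch (event.RequestType) {
		case "Create": {
			// A retried Create calculates the same id, nothing gets orphaned
			const id = getId(event, event.ResourceProperties);
			repos.add(id);
			return id;
		}
		case "Update": {
			const oldId = getId(event, event.OldResourceProperties);
			const newId = getId(event, event.ResourceProperties);
			if (oldId !== newId) {
				// Replace: create the new repository and leave the old one,
				// a Delete with the old id follows
				repos.add(newId);
			}
			// Otherwise the existing repository would be updated in place
			return newId;
		}
		case "Delete":
			repos.delete(event.PhysicalResourceId);
			return event.PhysicalResourceId;
		default:
			throw new Error(`Unknown RequestType: ${event.RequestType}`);
	}
};
```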

What about the name?

So, what should go into the PhysicalResourceId?

Take a resource that is defined like this:

Repo1:
  Properties:
    Name: repo

Then you can choose between two approaches:

Either add the name to the hash and use the variable part as the id (bf3f42) or use the full name of the resource (repo-bf3f42).

In both cases the effect is the same: whenever the name is changed, the resource will be replaced. Personally, I prefer the second approach, as that will name the PhysicalResourceId the same as the underlying resource.
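
The two approaches side by side, as a minimal sketch. shortHash, variablePartAsId and fullNameAsId are hypothetical names, and the event is made up:

```javascript
const crypto = require("crypto");

// Short deterministic hash of the given parts
const shortHash = (...parts) => crypto.createHash("md5")
	.update(parts.join("-"))
	.digest("hex")
	.substring(0, 7);

// Approach 1: the Name goes into the hash, the variable part alone is the id
const variablePartAsId = (event) =>
	shortHash(event.StackId, event.LogicalResourceId, event.ResourceProperties.Name);

// Approach 2: the full name of the underlying resource is the id
const fullNameAsId = (event) =>
	`${event.ResourceProperties.Name}-${shortHash(event.StackId, event.LogicalResourceId)}`;

// Made-up event for illustration
const event = {
	StackId: "arn:aws:cloudformation:eu-west-1:123456789012:stack/example/11111111",
	LogicalResourceId: "Repo1",
	ResourceProperties: {Name: "repo"},
};

console.log(variablePartAsId(event)); // 7 hex characters
console.log(fullNameAsId(event)); // "repo-" followed by 7 hex characters
```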

Conclusion

Choosing the PhysicalResourceId is critical when you define your own custom resources. It can make or break the reliability of your code, but it is often overlooked.

I hope this guide helps you choose the right one for your use-case.

January 16, 2019