Why a multi-account setup is essential for secure systems

Development and production are to be separated by account boundaries

Motivation

All too often I see companies open an AWS account, jump right in, and start deploying systems. This is fine while the project is in the development phase, without production resources like databases filled with customer data or third-party integrations. But time goes on, and at some point the system under development goes into production. Everybody is happy that it finally happened, and quite possibly relieved that the time pressure has eased, but few think about how it will stay secure in the long run. More often than not, this single account keeps being used for development right next to the production resources.

And the problem is that this setup cannot be secure.

Single-account setup

If you use only a single account, that usually means the production resources live alongside the development ones, and especially alongside the developer users themselves. This scenario is the default: you get one account when you sign up, but you need to take action to get another one.

Why it’s bad for developers

The main question here is whether developers have access to the production resources. The executive thinking is that developers might need something from there, for example to reproduce a bug, so give them all the access they might require.

But if developers have access to production resources then you put immense trust in them. And it is a bad situation for everyone.

Development is about experimentation. A new database might need to be spun up or torn down. Or a process may go wrong and consume more resources than it is supposed to. But if you put a developer in a position where he can accidentally wreak havoc on the business, that limits this experimentation.

On the other hand, do you really want to give all your secrets and data to every new employee, even if they are just onboarding? You wouldn’t give your confidential documents to them, so why treat customer data any different? Not to mention data privacy concerns.

If your developers don’t have access to the production resources, that means you utilize some sort of access control to lock down your precious data. That’s a good start, but unfortunately it is practically impossible to do so in a way that still allows free experimentation.

Let’s see an example!

Let’s say your production environment uses a Lambda function to access some data, for example to read something from an S3 bucket. To allow it to do so, you define an execution role with the necessary permissions and assign it to the function.

But what prevents the developer from deploying a different Lambda function that uses the same role? Well, you might limit the developer account so that it cannot pass that role (the iam:PassRole permission). This is a good second step, but what prevents the same developer from creating a new role, or attaching a policy, that grants back the access you just took away? If you restrict this too, then you limit the developer account in a way that interferes with the experiments. As an alternative, you can use Permission Boundaries, but now you’re entering intricate territory.
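To make the "second step" concrete, here is a minimal sketch of a policy document that denies passing one specific role. The account ID, role name, and helper function are hypothetical and chosen only for illustration; the snippet just builds and prints the JSON, it does not call AWS.

```python
import json

# Hypothetical ARN of the production Lambda execution role (illustrative only).
PROD_ROLE_ARN = "arn:aws:iam::111122223333:role/prod-lambda-execution-role"


def deny_pass_role_policy(role_arn: str) -> dict:
    """Build an IAM policy document that denies passing the given role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyPassingProductionRole",
                "Effect": "Deny",
                "Action": "iam:PassRole",
                "Resource": role_arn,
            }
        ],
    }


policy = deny_pass_role_policy(PROD_ROLE_ARN)
print(json.dumps(policy, indent=2))
```

Note that this closes only one path. A developer who can still create roles or edit policies in the same account may be able to reach the same data another way, which is exactly the difficulty described above.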

In short, it is insanely hard to find all cracks in permissions.

If that same production Lambda connects to a database and the credentials can be extracted from the code (sloppy dev practice) then even read-only access is an information leak.

There is also the notion that a user effectively has access to everything that the services he can access have access to. If you have no access to a bucket but you can SSH into an instance that does, then in effect you have access to the bucket. Mapping all these transitive permissions is also extremely hard, and they change constantly.
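One way to picture transitive permissions is as reachability in a graph. The sketch below is a toy model, not a real IAM analysis: the node names and edges are made up for illustration, and a real account would have far more edges that change over time.

```python
from collections import deque

# Hypothetical "can act through" edges; every name here is illustrative.
ACCESS = {
    "developer": ["ec2-instance"],      # developer can SSH into the instance
    "ec2-instance": ["instance-role"],  # the instance runs with this role
    "instance-role": ["s3-bucket"],     # the role is allowed to read the bucket
}


def effective_access(start: str, graph: dict) -> set:
    """Return everything reachable from `start` by following access edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


reachable = effective_access("developer", ACCESS)
print(sorted(reachable))
```

Even though there is no direct edge from the developer to the bucket, the bucket ends up in the reachable set, which is the effect described above.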

Also, don’t rely on CloudTrail as an effective deterrent. For a competent attacker, it’s all too easy to plausibly deny any wrongdoing and blame an unknown third party. You might see that your developer is the culprit, but if he contends that his access keys were compromised, you cannot really do anything about it.

Why it’s bad for operations

For the operations team, it’s an even more problematic setup. Ops requires guarantees, and how can you guarantee that a resource is protected and is accessible only in some predefined ways and by a predefined set of people?

Since it is nearly impossible to track down effective permissions for people who need to experiment with a wide range of resource types (see above), it is similarly nearly impossible to make invariant assertions about the production environment.

Multi-account setup

A multi-account setup solves all the above problems. Developers are free to experiment in their own sandboxed accounts with a well-defined blast radius. Granted, they can still start costly resources and keep them running, but that is nowhere near what a lost database would cost.

Ops can have guarantees, as only a fraction of the employees (even from their own ranks) can have any access to the production account. And since they don’t need to be able to experiment they can be locked down to specific workflows. No IAM access for most of the team simplifies the permissions a lot.

When CI/CD is bad for security

There is a scenario where privilege escalation can happen even in a multi-account setup. If the codebase (or part of it) is automatically deployed from a repository, then by the logic of transitive permissions, anybody who has push access also has access to the production resources. This can compromise security and break the access control guarantees that would otherwise be present.

Conclusion

While systems tend to evolve naturally toward a single-account setup, it is a flawed way to organize resources. When you go live, plan how continued development will take place and how to keep it secure.

14 May 2019