9 tips for working with AWS IAM
Lessons learned with AWS security configurations
The IAM service is the central place to define access control in an AWS account. It defines the users and other identities and what they can do. It has quite a few parts and it's easy to make mistakes in the configurations.
This article is a collection of lessons learned and some tips to help avoid common mistakes when working with the IAM service.
1. Make a user for everybody (a.k.a. don't share secrets)
I have seen, on multiple occasions, a new developer being given access by someone emailing them the root account password. This goes against all security best practices. First, you cannot limit what individuals have access to, as everybody is using the same credentials. Second, CloudTrail logs don't show who is using the shared identity, so when something bad happens it's much harder to identify the cause.
Considering that creating IAM users takes just a few clicks and you can equally easily attach the AdministratorAccess policy, it's an easily solvable problem. And by creating a user for everyone, you can make sure only the intended person knows the password.
2. Use IAM roles when possible
Roles provide temporary credentials: the access keys you get by assuming a role have an expiration time. In contrast, IAM users have permanent credentials, as their access keys are valid until explicitly revoked.
As role credentials expire automatically, losing them is a lot less of a problem than losing user keys. And since roles require a process to get and refresh keys, hardcoding them is not practical.
Roles have their specific use-cases, and in some scenarios they can't be used. But wherever they can, they are the superior choice: for example, when giving an EC2 instance or a Lambda function access to resources, or when implementing cross-account access.
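To make the cross-account case a bit more concrete: a role carries a trust policy that names who may assume it. The sketch below only builds such a document as a Python dict. The function name and the account ID are placeholders made up for illustration, and a real setup would usually narrow the principal further.

```python
import json


def cross_account_trust_policy(trusted_account_id: str) -> dict:
    """Build a trust policy that lets principals in another account assume a role.

    The trusted account still has to grant its own users sts:AssumeRole
    on this role before anybody can actually use it.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Hypothetical account ID passed in by the caller.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def to_json(policy: dict) -> str:
    """Serialize a policy dict to the JSON text IAM expects."""
    return json.dumps(policy, indent=2)
```

Calling to_json(cross_account_trust_policy("111122223333")) gives the JSON that would go into the role's trust relationship.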
3. Minimize the number of policies
IAM policies are the documents that describe who can do what. They are not complicated, but if your account has 100 of them then they can get out of control. By slashing the number of policies you make the security of the account easier to reason about.
There are several ways to implement access control in fewer policies. For example, user groups allow a single policy to be attached to multiple users. Or you can use roles that have the permissions and allow certain people to assume that role. Or you can use tags to define access (also known as Attribute-Based Access Control, or ABAC for short).
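As an illustration of the tag-based approach, a single policy can allow actions only when the resource carries the same tag value as the caller. This is a minimal sketch: the tag key "project" and the EC2 actions are assumptions chosen for the example, not something every service supports in the same way.

```python
def abac_policy(tag_key: str, actions: list) -> dict:
    """Allow the given actions only on resources whose tag matches the caller's tag."""
    # IAM policy variable that resolves to the caller's own tag value at request time.
    principal_tag = "${aws:PrincipalTag/" + tag_key + "}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "aws:ResourceTag/" + tag_key: principal_tag
                    }
                },
            }
        ],
    }


# Example: one policy covering every team, keyed on a hypothetical "project" tag.
policy = abac_policy("project", ["ec2:StartInstances", "ec2:StopInstances"])
```

With this shape, adding a new team means tagging its users and resources rather than writing another policy.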
4. Implement least privilege
The principle of least privilege is to grant only the permissions that are necessary to perform the work and cut everything else. The AdministratorAccess managed policy is good for people who need wide-ranging permissions, but it's unnecessary for a Lambda function that only needs access to a DynamoDB table.
Implementing least privilege sounds sensible, but it's quite hard to implement and maintain. My advice is to start with programmatic access, such as Lambda functions and EC2 instances, where the set of required services is easier to identify.
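For the Lambda-and-DynamoDB case mentioned above, a least-privilege policy could look roughly like the following sketch. The function name, the table name, and the chosen actions are assumptions for illustration. The real list depends on what the function actually calls.

```python
def table_access_policy(region: str, account_id: str, table_name: str) -> dict:
    """Grant a handful of DynamoDB actions on exactly one table and nothing else."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the calls this hypothetical function makes.
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
                # Scoped to a single table instead of "*".
                "Resource": table_arn,
            }
        ],
    }


policy = table_access_policy("eu-west-1", "111122223333", "orders")
```

Note that querying a secondary index would need the index ARN (the table ARN followed by /index/ and the index name) added to Resource.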
5. Use Conditions sparingly
I really wanted to like the Condition element in IAM policies. It offers fine-grained control over who can do what, and on paper it sounds good.
With Conditions, it is possible to restrict access to an S3 bucket to the company IP address, or to require an MFA device.
But in reality, I spent too much time implementing tricky Conditions before I realized that it's just not worth it. There is no visibility into why something is not working, and there are so many exceptions that it's a frustrating experience.
There is a set of things that can be implemented using policy Conditions. And there is a much smaller set of things that can be implemented using policy Conditions and for which the AWS docs have an article showing how to do it. Use Conditions only for cases from this smaller set.
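The two cases named earlier, IP restriction and MFA, belong to that smaller, documented set. A rough sketch of the statements follows, written for an identity-based policy. The bucket name and CIDR range are placeholders, and the function name is made up for this example.

```python
def s3_guardrail_statements(bucket: str, allowed_cidrs: list) -> list:
    """Deny S3 access to one bucket from outside given IP ranges or without MFA."""
    resources = [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]
    return [
        {
            "Sid": "DenyOutsideCompanyNetwork",
            "Effect": "Deny",
            "Action": "s3:*",
            "Resource": resources,
            # Matches when the caller's IP is NOT in any of the allowed ranges.
            "Condition": {"NotIpAddress": {"aws:SourceIp": allowed_cidrs}},
        },
        {
            "Sid": "DenyWithoutMfa",
            "Effect": "Deny",
            "Action": "s3:*",
            "Resource": resources,
            # IfExists also catches requests where the MFA key is missing entirely.
            "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
        },
    ]


statements = s3_guardrail_statements("example-bucket", ["203.0.113.0/24"])
```

Even this simple case has one of the exceptions mentioned above: aws:SourceIp is not available for requests that arrive through a VPC endpoint, so the first statement would deny those as well.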
6. Use resource-based policies too
Resource-based policies allow you to control access from the resource-side. For example, an S3 bucket policy can allow or deny users access to its contents.
Of course, if a user has no permissions to an S3 bucket then they cannot access it. But adding a resource-based policy as well implements defense-in-depth: when access is denied from the resource-side, accidentally adding a permission to a user won't expose the contents.
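One possible shape for such a resource-side deny is a bucket policy that rejects every principal except an explicit allow-list. This is only a sketch under assumptions: the role ARN is a placeholder, the function name is hypothetical, and matching on aws:PrincipalArn is one of several ways to express it.

```python
def deny_all_except(bucket: str, allowed_principal_arns: list) -> dict:
    """Build a bucket policy that denies S3 access to everyone not on the list."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyEveryoneElse",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # The deny applies when the caller's ARN matches none of these.
                "Condition": {
                    "ArnNotEquals": {"aws:PrincipalArn": allowed_principal_arns}
                },
            }
        ],
    }


policy = deny_all_except(
    "example-bucket", ["arn:aws:iam::111122223333:role/report-reader"]
)
```

A deny like this also applies to administrators in the account, so the allow-list has to be chosen with care before the policy is attached.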
7. Use a multi-account setup
You can (and should) enable Organizations and create new AWS member accounts. This helps with compartmentalization: if you put account boundaries between resources that are not tightly coupled, a breach in one part won't affect the rest of the systems. For example, if the company is working on two separate products, you can move them to separate accounts. This helps tremendously with the overall security.
Also, using AWS member accounts opens the door for using SCPs (Service Control Policies). With this policy type, you can put restrictions on the account from outside the account. For example, you can disable entire services, such as when a backend does not need SageMaker, or even just some expensive action, such as removing the ability for a dev account to buy reserved instances (that can be a multi-million dollar mistake, by the way). Or you can disable entire regions to make it less likely to have some accidental (or malicious) resources in one that you don't check.
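The three restrictions described above could be combined into one SCP along these lines. Treat it as a sketch: the region list and the set of exempted global actions are assumptions that need to be adapted, since a blanket region deny would otherwise also block global services such as IAM.

```python
def dev_account_scp(allowed_regions: list, exempt_global_actions: list) -> dict:
    """Build an SCP that blocks an unused service, a costly action, and unused regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnusedService",
                "Effect": "Deny",
                "Action": "sagemaker:*",
                "Resource": "*",
            },
            {
                "Sid": "DenyReservedInstancePurchase",
                "Effect": "Deny",
                "Action": "ec2:PurchaseReservedInstancesOffering",
                "Resource": "*",
            },
            {
                "Sid": "DenyOtherRegions",
                "Effect": "Deny",
                # Everything except the exempted global actions is region-checked.
                "NotAction": exempt_global_actions,
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": allowed_regions}
                },
            },
        ],
    }


# Illustrative values only; the exemption list here is deliberately short.
scp = dev_account_scp(["eu-west-1"], ["iam:*", "organizations:*", "sts:*"])
```

SCPs only set the outer limit of what an account can do. Identities in the account still need their own IAM policies to be allowed anything.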
8. Check if policies and keys are still needed
It's a best practice to regularly go through all the security-related configurations and see if you still need them. AWS provides several "last used" reports, such as for user logins, access keys, and more recently for policy elements.
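One of these reports is the IAM credential report, a CSV with a row per user. The sketch below assumes the CSV text has already been downloaded (for example with boto3's get_credential_report) and flags active access keys that have not been used recently. The 90-day threshold and the function name are arbitrary choices for this example.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def stale_access_keys(report_csv: str, now: datetime, max_age_days: int = 90) -> list:
    """Return (user, key slot, last used) for active keys unused for max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        for slot in ("1", "2"):  # each user can have up to two access keys
            if row.get(f"access_key_{slot}_active") != "true":
                continue
            last_used = row.get(f"access_key_{slot}_last_used_date", "N/A")
            if last_used in ("N/A", "", None):
                stale.append((row["user"], f"access_key_{slot}", "never used"))
                continue
            # Assumes timezone-aware ISO 8601 timestamps, as in the report.
            used_at = datetime.fromisoformat(last_used.replace("Z", "+00:00"))
            if used_at < cutoff:
                stale.append((row["user"], f"access_key_{slot}", last_used))
    return stale


sample = (
    "user,access_key_1_active,access_key_1_last_used_date,"
    "access_key_2_active,access_key_2_last_used_date\n"
    "alice,true,2023-12-20T10:00:00+00:00,false,N/A\n"
    "bob,true,2023-01-05T10:00:00+00:00,true,N/A\n"
)
result = stale_access_keys(sample, datetime(2024, 1, 1, tzinfo=timezone.utc))
```

With the made-up sample rows, only bob's two keys are flagged: one last used almost a year before the reference date, the other active but never used.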
9. Be careful with anonymous access
Hardly a week passes by without a breach involving an open S3 bucket. These attacks are the easiest you can imagine: the company allowed anonymous access to an S3 bucket, put a ton of sensitive data into it, then an attacker came and downloaded everything. It's surprisingly common, so AWS put some additional controls in place, and now you need to disable the Block Public Access settings first. But this kind of breach still happens.
My favorite story is the Twilio security incident. They made the bucket that hosts their public SDK world-writable, and somebody came and overwrote it with a malicious version. This SDK then immediately compromised all the sites that loaded it.
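A quick way to spot the most obvious cases of anonymous access is to scan a bucket policy for Allow statements with a wildcard principal. The following is a rough heuristic sketch with a made-up function name, not a replacement for Block Public Access or a proper policy analyzer. It skips statements that have a Condition, so it can miss policies that are still effectively public.

```python
def anonymous_allow_statements(policy: dict) -> list:
    """Return Allow statements that grant access to everyone without any Condition."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        aws = principal.get("AWS") if isinstance(principal, dict) else None
        is_anonymous = (
            principal == "*"
            or aws == "*"
            or (isinstance(aws, list) and "*" in aws)
        )
        if is_anonymous and "Condition" not in stmt:
            flagged.append(stmt)
    return flagged


# Hypothetical bucket policy: world-writable objects, like in the story above.
example = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-sdk-bucket/*",
        },
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/deployer"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-sdk-bucket/*",
        },
    ],
}
flagged = anonymous_allow_statements(example)
```

Here only the first statement is flagged, since the second one names a specific role.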