How to auto-disable leaked IAM credentials

Attach the AWSDenyAll policy when a user makes a request that is denied

Tamás Sallai
July 30, 2024

I'm building a personal backup solution based on AWS S3. Part of that is giving a local process running on my laptop and phone access to read and write the contents of a bucket. The way to do it is to create an IAM user, generate an access key for it, and store it locally.

But then I started thinking: how best to secure these credentials?

In this article I'll describe a mechanism that automatically disables an IAM user when it makes a request that is denied. You'll learn in which cases this is a good solution and what the alternatives are.

What does not work

First, let's talk about why I chose an IAM user and why I'm concerned about its keys getting leaked.

Protecting client secrets locally

The first line of defense is to protect the keys locally. This includes several steps, such as making the script owned by root and not editable by the user, and using systemd credentials to store the keys encrypted:

# encrypting credentials
systemd-creds encrypt --name=aws_secret_access_key plaintext.txt aws_secret_access_key

# using in a unit file
LoadCredentialEncrypted=aws_secret_access_key:/etc/restic_configs/%i/aws_secret_access_key
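
When the unit starts, systemd decrypts the credential and exposes it to the process under $CREDENTIALS_DIRECTORY. As a sketch of how the backup script could consume it (here I assume it simply exports the standard AWS environment variable; the real script may do this differently):

# reading the decrypted value inside the service
export AWS_SECRET_ACCESS_KEY="$(cat "$CREDENTIALS_DIRECTORY/aws_secret_access_key")"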

But no matter how much security is put in place on the device itself, it does not change the fundamental truth: the key is stored on the device in a recoverable way. A breach that gives an attacker root access lets them steal it.

Temporary credentials

The root of the problem is that the access key is permanent: if somebody manages to steal it, it is compromised for good. So, why not use temporary credentials? After all, using roles instead of users is a best practice.

For example, a Lambda function needs access to read objects from an S3 bucket. Permission-wise, this means a role attached to the Lambda function and given the necessary permissions. Then the Lambda runtime automatically fetches the short-term keys and makes them available to the function:

// Look, no credentials!

import {S3Client, GetObjectCommand} from "@aws-sdk/client-s3";

const object = await new S3Client().send(new GetObjectCommand({
	Bucket,
	Key,
}));

So, why can't something similar work in my case?

Short-term credentials still require a trust anchor: something that authorizes their issuance. In the case of the Lambda function, this trust comes from configuring the role with the function ("passing the role").
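
The trust anchor here is the role's trust policy, a document like the one below stating that the Lambda service is allowed to assume the role (the same statement appears later in this article's CloudFormation template):

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Principal": {
				"Service": "lambda.amazonaws.com"
			},
			"Action": "sts:AssumeRole"
		}
	]
}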

But my laptop is outside the AWS cloud, so there is no inherent trust anchor. That's why it needs some secret stored on-device.

What about Roles Anywhere?

Roles Anywhere is a newer AWS feature that uses certificates to issue short-term AWS credentials. A certificate can be short-term, so why won't it work here?

It only adds another layer and does not change the big picture: something permanent needs to be stored on the device, and it does not matter whether that is the AWS key itself or a certificate that can be exchanged for such keys.

Two types of IAM users

I found that there are two types of IAM users: ones used by people, and ones used by machines. For example, a new developer joins the team and you want to give them access to some resources. The easy way is to create an IAM user, attach some policies, then allow console access. Of course, configuring AWS SSO is more scalable, but most teams starting out with AWS begin with a single account and IAM users.

Requests sent by an actual person are inherently messy: clicking around the console or trying out different commands with the CLI will touch resources outside the allowed set.

Just opening the management console fetches data from a rather wide set of services.

As a result, denied requests are normal for humans.

The other type of IAM user is a machine user, such as in my case. I write the code that interacts with AWS, and I define what the code does. If I define that it only reads an object from S3, it won't go around fetching other things:

import {S3Client, GetObjectCommand} from "@aws-sdk/client-s3";

const object = await new S3Client().send(new GetObjectCommand({
	Bucket: process.env.BUCKET,
	Key,
}));

That means that if I configure the policies correctly for the exact use-case, then under normal circumstances no requests will be denied.

This is the root of the idea behind this solution: when a machine user makes a request that is denied, something is wrong, most likely because the credentials were stolen. And for damage control, an automation can disable the user's access.

GetCallerIdentity

There is a very special action: sts:GetCallerIdentity. It returns information about the current caller, such as the account ID and the user's ARN.
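
For example, calling it with the AWS CLI prints a document like the one below:

$ aws sts get-caller-identity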

{
	"UserId": "AIDA3A...",
	"Account": "7572...",
	"Arn": "arn:aws:iam::7572...:user/testing-User-SdzLVYqjlzW4"
}

It is special because it can not be denied, and it is usually used to check whether the AWS CLI is configured properly.

Since it prints information about the account, it is usually the first thing somebody runs after discovering an AWS credential. That makes it a good indicator of a breach, in addition to the access denied errors. And, fortunately, it is logged in CloudTrail:

{
	"eventVersion": "1.08",
	"userIdentity": {
		"type": "IAMUser",
		"..."
	},
	"eventTime": "2024-08-08T10:50:33Z",
	"eventSource": "sts.amazonaws.com",
	"eventName": "GetCallerIdentity",
	"..."
}

Architecture

So, how to build auto-remediation?

The core of the solution is CloudTrail. It records the requests going to the AWS account and writes them to S3 and CloudWatch Logs.

Once the logs are in CloudWatch, the next step is a metric filter that tracks denied requests made by the user. Since each log event is a JSON document, writing a filter is easy.

With a metric that tracks the number of denied requests, an alarm can trigger a Lambda function. This is the auto-remediation part: the function can attach a policy to the user that denies all future requests.
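
The policy in question is the AWS-managed AWSDenyAll. What the function does is roughly equivalent to running this CLI command by hand (the user name here is the one from the test later in this article):

$ aws iam attach-user-policy --user-name test-User-s4G9yXBxXESF --policy-arn arn:aws:iam::aws:policy/AWSDenyAll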

One downside is latency, as CloudTrail events are not delivered instantly. In practice, the user is disabled a few minutes after the offending request was made.

Implementation

An initial requirement is that a trail is configured with a CloudWatch Logs log group.
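
I manage the trail outside of this stack. For reference, a minimal sketch of such a trail in CloudFormation could look like this, where TrailBucket, TrailLogGroup, and TrailLogsRole stand in for pre-existing resources:

Trail:
  Type: AWS::CloudTrail::Trail
  Properties:
    IsLogging: true
    S3BucketName: !Ref TrailBucket
    CloudWatchLogsLogGroupArn: !GetAtt TrailLogGroup.Arn
    CloudWatchLogsRoleArn: !GetAtt TrailLogsRole.Arn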

As this is configured separately, it is a parameter for the CloudFormation stack:

Parameters:
  CloudTrailLogGroup:
    Type: String
    Description: "The CloudWatch log group where CloudTrail writes the logs"

Next, a metric filter emits a metric data point when an access denied error (or a GetCallerIdentity call) is logged for this user:

LoginMetricFilter:
  Type: AWS::Logs::MetricFilter
  Properties:
    LogGroupName: !Ref CloudTrailLogGroup
    FilterPattern: !Sub '{($.errorCode = "AccessDenied" || $.eventName = "GetCallerIdentity") && $.userIdentity.arn = "${User.Arn}"}'
    MetricTransformations:
    - MetricValue: 1
      MetricNamespace: !Sub "${AWS::StackName}"
      MetricName: !Sub "AccessDenieds-${User}"

Note the FilterPattern: it matches events where the errorCode is AccessDenied or the eventName is GetCallerIdentity, and where the userIdentity.arn is the user's ARN.

Here's an example event:

{
	"eventVersion": "1.08",
	"userIdentity": {
		"type": "IAMUser",
		"principalId": "AIDAUB3O2IQ5EEQKVGBG2",
		"arn": "arn:aws:iam::278868411450:user/test-User-s4G9yXBxXESF",
		"accountId": "278868411450",
		"accessKeyId": "AKIAUB3O2IQ5MQXZAOHI",
		"userName": "test-User-s4G9yXBxXESF"
	},
	"eventTime": "2024-07-25T10:52:35Z",
	"eventSource": "lambda.amazonaws.com",
	"eventName": "ListFunctions20150331",
	"awsRegion": "eu-central-1",
	"sourceIPAddress": "81.183.220.118",
	"userAgent": "aws-cli/2.17.14 md/awscrt#0.20.11 ua/2.0 os/linux#6.9.10-arch1-1 md/arch#x86_64 lang/python#3.12.4 md/pyimpl#CPython cfg/retry-mode#standard md/installer#source md/distrib#arch md/prompt#off md/command#lambda.list-functions",
	"errorCode": "AccessDenied",
	"errorMessage": "User: arn:aws:iam::278868411450:user/test-User-s4G9yXBxXESF is not authorized to perform: lambda:ListFunctions on resource: * because no identity-based policy allows the lambda:ListFunctions action",
	"requestParameters": null,
	"responseElements": null,
	"requestID": "66e4c7f5-7156-4216-b688-236ffd132e58",
	"eventID": "3f882e4b-cb95-4fdd-acb2-236344b89b9c",
	"readOnly": true,
	"eventType": "AwsApiCall",
	"managementEvent": true,
	"recipientAccountId": "278868411450",
	"eventCategory": "Management",
	"tlsDetails": {
		"tlsVersion": "TLSv1.3",
		"cipherSuite": "TLS_AES_128_GCM_SHA256",
		"clientProvidedHostHeader": "lambda.eu-central-1.amazonaws.com"
	}
}

Then an alarm is configured when this metric goes above 0:

AccessDeniedAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    ComparisonOperator: GreaterThanThreshold
    EvaluationPeriods: 1
    Threshold: 0
    Namespace: !Sub "${AWS::StackName}"
    MetricName: !Sub "AccessDenieds-${User}"
    Statistic: Maximum
    Period: 60
    TreatMissingData: notBreaching
    ActionsEnabled: true
    AlarmActions:
    - !GetAtt LambdaFunction.Arn

Lambda

The Lambda function needs several resources.

First, a log group:

LambdaLogGroup: 
  Type: AWS::Logs::LogGroup
  Properties: 
    RetentionInDays: 365

Then a role that allows logging and attaching the AWSDenyAll policy to the user:

LambdaExecutionRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
      - Effect: Allow
        Principal:
          Service:
          - lambda.amazonaws.com
        Action:
        - sts:AssumeRole
    Policies:
      - PolicyName: allow-logs
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - 'logs:PutLogEvents'
            - 'logs:CreateLogStream'
            Resource: !Sub "${LambdaLogGroup.Arn}:*"
          - Effect: Allow
            Action:
            - 'iam:AttachUserPolicy'
            Resource: !GetAtt User.Arn
            Condition:
              StringEquals:
                "iam:PolicyARN": "arn:aws:iam::aws:policy/AWSDenyAll"

Then the function itself:

LambdaFunction:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: nodejs20.x
    Role: !GetAtt LambdaExecutionRole.Arn
    Handler: index.handler
    ReservedConcurrentExecutions: 1
    Timeout: 30
    Code:
      ZipFile: |
        const {IAMClient, AttachUserPolicyCommand} = require("@aws-sdk/client-iam");
        exports.handler = async (event) => {
          const res = await new IAMClient().send(new AttachUserPolicyCommand({
            UserName: process.env.UserName,
            PolicyArn: "arn:aws:iam::aws:policy/AWSDenyAll",
          }));
          console.log(res);
        };
    LoggingConfig:
      LogGroup: !Ref LambdaLogGroup
      LogFormat: JSON
    Environment:
      Variables:
        UserName: !Ref User

Finally, a permission for the alarm to call this function:

InvokeLambdaPermission: 
  Type: AWS::Lambda::Permission
  Properties: 
    FunctionName: !GetAtt LambdaFunction.Arn
    Action: "lambda:InvokeFunction"
    Principal: "lambda.alarms.cloudwatch.amazonaws.com"
    SourceArn: !GetAtt AccessDeniedAlarm.Arn

Testing

Let's see how it works in action!

Create an access key for the user for testing.
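
One way to do that is with the CLI; the user name below is the one generated by my test stack:

$ aws iam create-access-key --user-name test-User-s4G9yXBxXESF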

The user can list S3 buckets:

$ aws s3 ls
2020-06-22 21:11:15 cf-templates-c2cqm2z6vtrr-eu-central-1
2019-06-29 09:40:07 cf-templates-c2cqm2z6vtrr-us-east-1

But listing Lambda functions is denied:

$ aws lambda list-functions

An error occurred (AccessDeniedException) when calling the ListFunctions operation: User: arn:aws:iam::278868411450:user/test-User-s4G9yXBxXESF is not authorized to perform: lambda:ListFunctions on resource: * because no identity-based policy allows the lambda:ListFunctions action

This triggers the alarm within a few minutes.

Now the AWSDenyAll policy is attached to the user automatically.
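
A quick way to verify is to list the user's attached policies; with the remediation in place, the output should contain something like this (illustrative, not a capture from the original test):

$ aws iam list-attached-user-policies --user-name test-User-s4G9yXBxXESF
{
	"AttachedPolicies": [
		{
			"PolicyName": "AWSDenyAll",
			"PolicyArn": "arn:aws:iam::aws:policy/AWSDenyAll"
		}
	]
}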

And even the S3 action is denied:

$ aws s3 ls

An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied
