CloudTrail's horrible developer experience

Why is it so hard to react to events in an account?

Tamás Sallai
5 mins
Photo by Tim Johnson on Unsplash

CloudTrail

CloudTrail has probably the worst developer experience I've had with AWS recently. CloudTrail itself is an immensely useful service: it gives insight into the events happening in an account, such as who changed which resource, who logged in to the console, and, to some extent, who accessed what.

To be fair, CloudTrail's primary use-case is to provide data for forensic analysis, and it's rather good at that. With a multi-region organization trail you only need to set it up once and it will happily log everything into an S3 bucket. When there is a need, the logs will be in one place.

But then CloudTrail has a secondary use-case: providing insight into what is happening in the account, possibly with automation on top to react to certain events. Two examples I needed in the past: in one case I wanted a notification when somebody signs in to the console, and in another I wanted to disable an access key when it makes a request that is denied.

CloudTrail is entirely capable of handling these cases, as it has all the events. But the developer experience ruins the show.
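To make the access key example concrete, here is a minimal sketch of the reaction logic, assuming the function already receives a CloudTrail record as a dict. The function names are hypothetical, and the error code matching (a prefix of "AccessDenied" or a suffix of "UnauthorizedOperation") is an assumption that may not cover every service. The `iam_client` argument is expected to be a `boto3.client("iam")`.

```python
def denied_access_key(record):
    """Return (user_name, access_key_id) if the record is a denied request
    made with a long-term IAM user access key, otherwise None."""
    error = record.get("errorCode", "")
    # Assumption: denied calls surface as AccessDenied* or *UnauthorizedOperation.
    if not (error.startswith("AccessDenied") or error.endswith("UnauthorizedOperation")):
        return None
    identity = record.get("userIdentity", {})
    if identity.get("type") != "IAMUser":
        return None
    user_name = identity.get("userName")
    key_id = identity.get("accessKeyId", "")
    # Long-term access keys start with AKIA; temporary credentials (ASIA...)
    # cannot be disabled through UpdateAccessKey.
    if not user_name or not key_id.startswith("AKIA"):
        return None
    return user_name, key_id


def disable_key_if_denied(iam_client, record):
    """Deactivate the access key behind a denied request. Returns True if disabled."""
    target = denied_access_key(record)
    if target is None:
        return False
    user_name, key_id = target
    iam_client.update_access_key(
        UserName=user_name, AccessKeyId=key_id, Status="Inactive"
    )
    return True
```

The decision logic is kept separate from the IAM call so it can be checked against sample records without touching a real account.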

Hidden events in EventBridge

What makes a developer's life miserable is surprises and a lack of visibility into how things work. And CloudTrail is guilty on both counts.

CloudTrail is the source of the account events, but it does not allow reacting to them; you need a separate service for that. All CloudTrail allows is:

  • save the events to an S3 bucket
  • write to a CloudWatch Log Group

At least that's what the console shows when you configure a trail. But there is a hidden third integration:

  • forward events to EventBridge

It does not show up on the CloudTrail side, but on the EventBridge side there is an event type called "AWS API Call via CloudTrail" listed for the services in the examples.

Apparently, this includes events for all services, but I couldn't stop wondering whether this event type is really included for all services in the list.
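As a rough illustration, a rule that reacts to this event type needs an event pattern along the following lines. This is a sketch, not a complete setup: the helper name and the IAM `CreateAccessKey` example are assumptions chosen for illustration, while `detail-type`, `eventSource`, and `eventName` are the fields these events carry.

```python
import json


def cloudtrail_api_call_pattern(event_source, event_names):
    """Build an EventBridge event pattern for API calls recorded by CloudTrail."""
    return {
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            # e.g. "iam.amazonaws.com", "s3.amazonaws.com"
            "eventSource": [event_source],
            # API action names as they appear in the CloudTrail record
            "eventName": list(event_names),
        },
    }


# Hypothetical example: match the creation of IAM access keys.
pattern = cloudtrail_api_call_pattern("iam.amazonaws.com", ["CreateAccessKey"])
pattern_json = json.dumps(pattern, indent=2)
print(pattern_json)
```

Note that console sign-ins use a different detail-type ("AWS Console Sign In via CloudTrail"), so they need their own pattern.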

How do events end up in EventBridge? When you configure a trail in the region, it automatically sends events to EventBridge. So:

  • no trail => no events
  • trail => events

Of course, there is no indication whether events are coming or not; the only difference is that the rules don't match.

Next, I would assume that since EventBridge depends on a trail sending events, the events sent depend on the trail configuration. But that's not the case: a multi-region trail won't send events happening in other regions. Again, the only indication of this is that the rules won't match.

The next surprise is that an EventBridge rule can be in 3 states:

  • DISABLED
  • ENABLED
  • ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS

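If you simply set the rule to ENABLED, it won't get every CloudTrail event: as far as I can tell from the documentation, read-only management events (such as Describe, List, and Get calls) are delivered only in the third state.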
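A minimal sketch of creating a rule in the third state could look like this. The helper names are hypothetical; `events_client` is assumed to be a `boto3.client("events")`, whose `put_rule` call accepts `Name`, `EventPattern`, and `State`.

```python
import json

RULE_STATES = (
    "DISABLED",
    "ENABLED",
    "ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS",
)


def build_put_rule_args(name, pattern,
                        state="ENABLED_WITH_ALL_CLOUDTRAIL_MANAGEMENT_EVENTS"):
    """Build the keyword arguments for the EventBridge PutRule call."""
    if state not in RULE_STATES:
        raise ValueError(f"unknown rule state: {state}")
    return {
        "Name": name,
        # PutRule expects the pattern as a JSON string, not a dict
        "EventPattern": json.dumps(pattern),
        "State": state,
    }


def create_rule(events_client, name, pattern):
    """Create or update the rule; events_client is assumed to be boto3.client('events')."""
    return events_client.put_rule(**build_put_rule_args(name, pattern))
```

Defaulting to the third state is a deliberate choice here: a rule that silently matches nothing is harder to debug than one that matches too much.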

Pricing does not help

CloudTrail's pricing model also makes little sense for this use-case. To get events delivered to EventBridge you need a trail in the same region. And by best practice, there should be a multi-region organization trail enabled in the account to collect forensic data. Moreover, only the first copy of the events is free; all others come with a price tag.

So, let's say the main trail is deployed in us-east-1 but your application is in eu-west-1. Here, you face a tough choice:

  • no trail => no events
  • trail => events, but with an extra cost for everything happening in the account

This pricing model encourages users to avoid creating extra trails, which makes sense, but then there are no events either.

CloudWatch Logs

So, what's the solution? The best I could come up with to date is to use CloudWatch Logs and configure a metric filter with an alarm. This provides all events the trail is configured for and gives excellent visibility into what the events look like.

But it feels needlessly complex:

  • CloudTrail records the events
  • writes them to CloudWatch Logs
  • a metric filter matches the incoming events
  • and publishes a metric
  • then an alarm picks up these values and notifies an SNS topic or a Lambda function

All to react to an event coming from CloudTrail.
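The chain above can be sketched with two API calls, here for the console sign-in case. This is an illustration under assumptions: the function names, the one-minute period, and the threshold are hypothetical choices, and `logs_client` and `cloudwatch_client` are assumed to be `boto3.client("logs")` and `boto3.client("cloudwatch")`.

```python
def metric_filter_args(log_group, name, namespace):
    """Arguments for CloudWatch Logs PutMetricFilter: count console logins."""
    return {
        "logGroupName": log_group,
        "filterName": name,
        # JSON filter pattern evaluated against each CloudTrail record
        "filterPattern": '{ $.eventName = "ConsoleLogin" }',
        "metricTransformations": [
            {"metricName": name, "metricNamespace": namespace, "metricValue": "1"}
        ],
    }


def alarm_args(name, namespace, topic_arn):
    """Arguments for CloudWatch PutMetricAlarm: fire on any matching event."""
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": name,
        "Statistic": "Sum",
        "Period": 60,  # seconds; assumed granularity
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # no log lines means no data points, which should not trigger the alarm
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


def deploy(logs_client, cloudwatch_client, log_group, name, namespace, topic_arn):
    """Create the metric filter and the alarm that watches its metric."""
    logs_client.put_metric_filter(**metric_filter_args(log_group, name, namespace))
    cloudwatch_client.put_metric_alarm(**alarm_args(name, namespace, topic_arn))
```

One limitation of this approach is visible in the sketch: the alarm only knows that something matched, so the notification does not carry the event itself.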

Cross-account?

But even this falls short in one common scenario: detecting events in a member account.

The trail is configured in the management account, so the CloudWatch Log Group is in the management account as well. So how could a member account react to an event happening in that account without replicating the events?

Well, there is not much it can do at the moment.

What could help

The best would be if CloudTrail got a notification feature that supports fine-grained filtering similar to EventBridge. That would eliminate replicating all events just to detect a few, so it could be deployed wherever it's needed, even in member accounts.

Moreover, it would be nice if the default 90-day retention feature could also support more fine-grained queries. The current lookup-events API supports only a handful of fields.
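To illustrate the limitation, here is a sketch of finding denied requests for one access key. The lookup can filter on only one of a fixed set of attribute keys, so the `errorCode` check has to happen on the client side. The function names are hypothetical, `cloudtrail_client` is assumed to be a `boto3.client("cloudtrail")`, and pagination is left out for brevity.

```python
import json

# Attribute keys accepted by LookupEvents; anything else needs client-side filtering.
LOOKUP_KEYS = {
    "EventId", "EventName", "ReadOnly", "Username",
    "ResourceType", "ResourceName", "EventSource", "AccessKeyId",
}


def build_lookup_args(key, value, max_results=50):
    """Build the arguments for a LookupEvents call with a single attribute."""
    if key not in LOOKUP_KEYS:
        raise ValueError(f"lookup-events cannot filter on {key}")
    return {
        "LookupAttributes": [{"AttributeKey": key, "AttributeValue": value}],
        "MaxResults": max_results,
    }


def denied_events(cloudtrail_client, access_key_id):
    """Return the records of failed calls made with the given access key."""
    response = cloudtrail_client.lookup_events(
        **build_lookup_args("AccessKeyId", access_key_id)
    )
    # The full record is returned as a JSON string in the CloudTrailEvent field.
    records = [json.loads(e["CloudTrailEvent"]) for e in response["Events"]]
    # errorCode is not a lookup attribute, so filter after fetching.
    return [r for r in records if "errorCode" in r]
```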

September 17, 2024
