AWS: How to query the available CPU credits for t2/t3 instances

How to know how much CPU power your burstable instance has


Motivation

The instances in the t2 and t3 instance families, i.e. the instance types that start with either t2. or t3., are burstable. That means the instance collects CPU credits over time that can be used later. If in the long run you use less than what you earn (the baseline performance, between 5% and 40% per vCPU depending on the instance type), you won’t even notice how this system works. But if you use more than that, the instance will either get throttled (standard mode) or you’ll get charged for the excess usage (unlimited mode).

It is useful to know how much horsepower an instance has left. Unfortunately, from inside the instance it is hard to tell whether it is being overused.

But as this data can be queried from CloudWatch, it is just a matter of scripting to get an up-to-date overview of the numbers.

CPU credit balance

The credit balance is automatically posted to CloudWatch by AWS under the AWS/EC2 namespace. It is reported at a fixed interval of 5 minutes, which cannot be lowered even with detailed monitoring.

The metric that keeps track of the available credits is called CPUCreditBalance.

When using the AWS CLI, the dimension parameter can be used to filter for the instance ID, like this: --dimensions Name=InstanceId,Value=$INSTANCE_ID. But how do you know the instance ID? It can be queried from other APIs, or if you request it from the instance itself then you can use the metadata service: INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id).
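On instances where the metadata service only accepts IMDSv2 (session-token) requests, the plain curl call above is rejected with a 401 response. A minimal sketch of the token-based variant follows; the function name imds_get and the 60-second token lifetime are my own choices for illustration.

```shell
# Hypothetical helper: fetch one metadata path using an IMDSv2 session token.
# Usage: imds_get instance-id
imds_get() {
	# Request a short-lived session token first (60 seconds is an arbitrary choice).
	token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
		-H "X-aws-ec2-metadata-token-ttl-seconds: 60")
	# Pass the token along with the actual metadata request.
	curl -s -H "X-aws-ec2-metadata-token: $token" \
		"http://169.254.169.254/latest/meta-data/$1"
}

# On an EC2 instance:
# INSTANCE_ID=$(imds_get instance-id)
```

The same helper works for any other metadata path, so it can replace every direct call to the metadata service in the script.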

Another set of required parameters is the date interval, consisting of the start and the end time. Both are expected in ISO-8601 format, which GNU date produces readily with date --iso-8601=seconds.

The end-time is the current time so that it returns the most recent data point: --end-time $(date --iso-8601=seconds).

The start-time should be at least 5 minutes in the past: --start-time $(date --iso-8601=seconds -d "10 mins ago"). It is better to specify a somewhat longer interval and then sort and filter the results than to use one that is too short and risk getting no data point back at all. The only exception is when the instance was recently started, in which case there will be no metrics available.
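As a quick sanity check, the two timestamps can be computed up front and inspected before they go into the CLI call. This sketch assumes GNU date (the -d and --iso-8601 options are not available in BSD/macOS date); the variable names START_TIME and END_TIME are only illustrative.

```shell
# Ten-minute window ending now, in ISO-8601 with second precision (GNU date).
START_TIME=$(date --iso-8601=seconds -d "10 mins ago")
END_TIME=$(date --iso-8601=seconds)

echo "querying from $START_TIME to $END_TIME"
```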

The --statistics parameter defines how the data points are aggregated. Since the credit metrics are reported every 5 minutes, --period 300 puts a single raw data point into each period, so no real aggregation happens. It therefore makes no difference which statistic you choose, apart from SampleCount, which returns the number of data points instead of their value. I’ll use Maximum in these examples. Just make sure to read the matching field from the response.

The CloudWatch CLI call is then:

aws cloudwatch get-metric-statistics \
	--namespace AWS/EC2 \
	--metric-name CPUCreditBalance \
	--start-time $(date --iso-8601=seconds -d "10 mins ago") \
	--end-time $(date --iso-8601=seconds) \
	--period 300 \
	--statistics Maximum \
	--dimensions Name=InstanceId,Value=$INSTANCE_ID

This returns JSON similar to the following:

{
    "Label": "CPUCreditBalance",
    "Datapoints": [
        {
            "Timestamp": "2019-04-13T08:59:00Z",
            "Maximum": 104.93587318333333,
            "Unit": "Count"
        },
        {
            "Timestamp": "2019-04-13T09:04:00Z",
            "Maximum": 105.02401506666666,
            "Unit": "Count"
        }
    ]
}

Notice that there are 2 data points in this example. With some jq magic it is easy to extract the data with the highest timestamp:

| jq '.Datapoints | sort_by(.Timestamp | fromdateiso8601) | .[-1].Maximum'
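To try the filter without calling CloudWatch, it can be run against a saved copy of the response. The sketch below uses the sample output from above with the data points deliberately out of order; the `// empty` suffix is my own addition, so that an empty Datapoints array produces no output rather than the string null.

```shell
# Sample response, abbreviated from the output shown above.
RESPONSE='{
  "Label": "CPUCreditBalance",
  "Datapoints": [
    {"Timestamp": "2019-04-13T09:04:00Z", "Maximum": 105.02401506666666, "Unit": "Count"},
    {"Timestamp": "2019-04-13T08:59:00Z", "Maximum": 104.93587318333333, "Unit": "Count"}
  ]
}'

# The newest data point wins, regardless of the order in the response.
LATEST=$(echo "$RESPONSE" |
	jq '.Datapoints | sort_by(.Timestamp | fromdateiso8601) | .[-1].Maximum // empty')

# An empty Datapoints array yields no output instead of "null".
NO_DATA=$(echo '{"Datapoints": []}' |
	jq '.Datapoints | sort_by(.Timestamp | fromdateiso8601) | .[-1].Maximum // empty')

echo "latest: ${LATEST:-n/a}, no data: ${NO_DATA:-n/a}"
```

If your CLI version prints timestamps with a numeric offset instead of a trailing Z, fromdateiso8601 may reject them. Sorting on the raw .Timestamp string is a possible alternative in that case, since all timestamps in one response share the same format.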

The full script that returns the CPU credits for an instance is:

CPU_POSITIVE_BALANCE=$(aws cloudwatch get-metric-statistics \
		--namespace AWS/EC2 \
		--metric-name CPUCreditBalance \
		--start-time $(date --iso-8601=seconds -d "10 mins ago") \
		--end-time $(date --iso-8601=seconds) \
		--period 300 \
		--statistics Maximum \
		--dimensions Name=InstanceId,Value=$INSTANCE_ID |
	jq '.Datapoints | sort_by(.Timestamp | fromdateiso8601) | .[-1].Maximum')

Surplus balance

If you have unlimited mode enabled then you can also account for the surplus balance. This is a similar metric, but instead of tracking the remaining CPU credits it tracks the credits the instance has spent beyond its balance, i.e. the negative balance.

The concept is the same, but the metric name is CPUSurplusCreditBalance:

CPU_SURPLUS_BALANCE=$(aws cloudwatch get-metric-statistics \
		--namespace AWS/EC2 \
		--metric-name CPUSurplusCreditBalance \
		--start-time $(date --iso-8601=seconds -d "10 mins ago") \
		--end-time $(date --iso-8601=seconds) \
		--period 300 \
		--statistics Maximum \
		--dimensions Name=InstanceId,Value=$INSTANCE_ID |
	jq '.Datapoints | sort_by(.Timestamp | fromdateiso8601) | .[-1].Maximum')

Then to calculate the real credit balance, subtract the surplus balance from the positive one. This can be safely done as one of the two metrics is always 0.

CPU_BALANCE=$(echo "$CPU_POSITIVE_BALANCE - $CPU_SURPLUS_BALANCE" | bc)
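The two queries differ only in the metric name, so they could be folded into one function. The sketch below is one way to do that; latest_metric is a hypothetical name, and the body is the same CLI call and jq filter as above with the metric name and the instance ID passed in as arguments.

```shell
# Hypothetical helper: print the newest value of one AWS/EC2 metric.
# $1 = metric name, $2 = instance ID
latest_metric() {
	aws cloudwatch get-metric-statistics \
		--namespace AWS/EC2 \
		--metric-name "$1" \
		--start-time "$(date --iso-8601=seconds -d "10 mins ago")" \
		--end-time "$(date --iso-8601=seconds)" \
		--period 300 \
		--statistics Maximum \
		--dimensions "Name=InstanceId,Value=$2" |
	jq '.Datapoints | sort_by(.Timestamp | fromdateiso8601) | .[-1].Maximum'
}

# Usage, mirroring the variables above:
# CPU_POSITIVE_BALANCE=$(latest_metric CPUCreditBalance "$INSTANCE_ID")
# CPU_SURPLUS_BALANCE=$(latest_metric CPUSurplusCreditBalance "$INSTANCE_ID")
```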

Maximum CPU credits

The maximum amount of CPU credits affects how much the instance can accumulate as well as when you’ll be charged for the surplus credits. I consider it vital info.

The maximum value is how many credits the instance gets in 24 hours. There are tables in the AWS docs, but I couldn’t find them in a parseable format.

To remedy this, I made a JSON file that you can use to query this data. If you see any discrepancy between the docs and the JSON, please open a PR.

With this repository in place you can use it to get the hourly collected credits for burstable instance types. Calculating the maximum is then just a matter of multiplication:

CPU_CREDITS_PER_HOUR=$(curl -s https://sashee.github.io/aws-data/burstable_instances_cpu_credit_per_hour.json | \
	jq --arg instanceType "$INSTANCE_TYPE" '.[$instanceType]')

MAX_CREDITS=$(echo "$CPU_CREDITS_PER_HOUR * 24" | bc)

Where do you get the instance type? As with the instance ID, it is available from the metadata service: INSTANCE_TYPE=$(curl -s http://169.254.169.254/latest/meta-data/instance-type).
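To see the lookup and the multiplication without touching the network, here is a sketch against a local stand-in for the JSON file. The two entries are values taken from the AWS documentation tables (6 credits per hour for t2.micro, 12 for t3.micro), and the flat type-to-number shape is inferred from the jq filter above. Doing the multiplication inside jq rather than bc is my own shortcut.

```shell
# Stand-in for the downloaded file: instance type -> credits earned per hour.
CREDITS_JSON='{"t2.micro": 6, "t3.micro": 12}'
INSTANCE_TYPE="t3.micro"

# Look up the hourly rate and multiply by 24 hours in a single jq call.
# Unknown instance types produce no output instead of "null".
MAX_CREDITS=$(echo "$CREDITS_JSON" |
	jq --arg instanceType "$INSTANCE_TYPE" '(.[$instanceType] // empty) * 24')

echo "$INSTANCE_TYPE can accumulate up to $MAX_CREDITS credits"
```

For a t3.micro this gives 12 * 24 = 288 credits.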

Printing

Now that you have both the CPU_BALANCE and the MAX_CREDITS, it is time to print them:

printf "%.0f/%d" "$CPU_BALANCE" "$MAX_CREDITS"

This will print an informative status, for example:

105/288

Conclusion

With some scripting you can get the up-to-date status of your instance. It can be easily integrated with tmux, zsh, or any other tool you use, and it gives a convenient indication of where you stand in terms of bursting capabilities.

16 April 2019