How to keep accurate counts in DynamoDB

Use transactions to atomically update items and keep a record of their numbers

Author's image
Tamás Sallai
6 mins

Querying counts

You have a DynamoDB table containing the registered users. But how many users you have? With an SQL-based database, that would be a simple SELECT COUNT(*) FROM users. But DynamoDB is not that and it does have native support for efficient counting.

At first sight, it might seem like a deficiency. But the above SQL query hides iterating over all the records in a table, which is O(n) so what is one command might not be efficient at all. This also happens with offset pagination. Since DynamoDB does not support anything besides the scan that may perform poorly when the number of rows in a table begins to grow, it has no count(*) functionality.

A bad solution in DynamoDB is to use a scan. While it accurately returns the number of elements in a table (or a subset of it) without storing it as a separate entry, it does so by reading all the items in the table. While it works for small tables, the read cost goes through the roof for any realistic production app. So don't be tempted to use it even if it looks good on demos.

Another suboptimal solution (though it's mentioned in many places) is to use a DynamoDB stream to call a Lambda function that keeps a value stored somewhere in the database up-to-date with the changes. A stream collects modified (added, deleted, and changed) elements and sends them to a target that can be a variety of things inside AWS, such as an SNS topic, an SQS queue, or a Lambda function. A piece of code can then use these change items to increment/decrement a number.

This solution has some nice properties. It decouples the logic of keeping a count from the main application logic. The app code does not need any changes, it writes the user records as before and it can also read the number of items. All the magic is encapsulated in the stream configuration.

But it has two significant downsides.

First, it is async, meaning that the count is updated later than the item. This can lead to inconsistencies as the latest changes might not be reflected in the derived value.

And second, it's not easy to make sure that the updater script runs to completion every single time. With a DDB stream + Lambda function solution you introduce quite a bit of moving parts, each having some guarantees but errors might happen. Lambda has an at-least-once processing model, but even that can fail, sending the call to the DLQ. And since errors accumulate, even an occasional mistake makes a permanent offset between the value and the records in the table.

Despite these downsides, it is a good solution in many cases, especially when you only need an approximation. In that case, the asynchronicity and the occasional errors can be tolerated.

But with DynamoDB transactions, we can do better. In this article, you'll learn how to keep accurate counts in the application logic.

Keeping counts with transactions

With transactions, you can group multiple change operations and they are guaranteed to either succeed or fail together. This allows inserting/removing an item from the users table and simultaneously updating a count value to be an atomic operation.

There are 3 operations we need to implement to provide counting functionality: initialization, adding, and removing elements.

Initialization

Since all the other operations increase or decrease a number in the counts table, we need to make sure there is a number there. This has to be done once, usually when the app is started.

First, we need to check whether the count is there, and add if not:

const item = await client.send(new GetItemCommand({
	TableName: COUNTS_TABLE,
	Key: marshall({type: "users"}),
}));

if (!item.Item) {
	// initialize count to 0
	await client.send(new PutItemCommand({
		TableName: COUNTS_TABLE,
		Item: marshall({type: "users", count: 0}),
		ConditionExpression: "attribute_not_exists(#pk)",
		ExpressionAttributeNames: {"#pk": "type"},
	}));
}

The ConditionExpression: "attribute_not_exists(#pk)" makes sure that the database is in the state we expect it to be. As there are no transactions between the commands (the GetItemCommand and the PutItemCommand), another thread or app might have inserted a value. This check makes sure that it's not overwritten.

Adding an item

To add an item that requires changing the count value, we need a transaction to modify both at the same time.

The ConditionExpression: attribute_not_exists(#pk) is also important here as we need to make sure that the user item is indeed inserted into the table. Without that, it might overwrite an existing record and still increase the count.

To modify a number by a given value, DynamoDB supports an UpdateExpression. This has the advantage of multiple transactions can write it simultaneously without the need for ConditionExpressions.

// the user object
const item = ...;

// insert new user
await client.send(new TransactWriteItemsCommand({
	TransactItems: [
		{
			Put: {
				TableName: USERS_TABLE,
				ConditionExpression: "attribute_not_exists(#pk)",
				ExpressionAttributeNames: {"#pk": "ID"},
				Item: marshall(item),
			}
		},
		{
			Update: {
				TableName: COUNTS_TABLE,
				UpdateExpression: "ADD #count :count",
				ExpressionAttributeNames: {"#count": "count"},
				ExpressionAttributeValues: marshall({":count": 1}),
				Key: marshall({type: "users"}),
			}
		}
	]
}));

The above code can only insert a new user item but it can not update one. This is because of the ConditionExpression: "attribute_not_exists(#pk)". To implement modifications, use a PutItemCommand with a condition to make sure the item exists:

// the user object
const item = ...;

// update existing user
await client.send(new PutItemCommand({
	TableName: USERS_TABLE,
	ConditionExpression: "attribute_exists(#pk)",
	ExpressionAttributeNames: {"#pk": "ID"},
	Item: marshall(item),
}));

Deleting an item

Deletion works similar to addition, with the exception that it needs a Delete operation.

It needs to do 3 things:

  • check that the item exists (ConditionExpression: "attribute_exists(#pk)")
  • decrement the count
  • delete the item
await client.send(new TransactWriteItemsCommand({
	TransactItems: [
		{
			Delete: {
				TableName: USERS_TABLE,
				ConditionExpression: "attribute_exists(#pk)",
				ExpressionAttributeNames: {"#pk": "ID"},
				Key: marshall({ID}),
			}
		},
		{
			Update: {
				TableName: COUNTS_TABLE,
				UpdateExpression: "ADD #count :count",
				ExpressionAttributeNames: {"#count": "count"},
				ExpressionAttributeValues: marshall({":count": -1}),
				Key: marshall({type: "users"}),
			}
		}
	]
}));

Conclusion

Keeping accurate counts is not easy in DynamoDB but it can be done using transactions. Make sure to use the correct ConditionExpressions in every operation to avoid accumulating errors when multiple processes are changing the same data.

March 9, 2021

Free PDF guide

Sign up to our newsletter and download the "Git Tips and Tricks" guide.


In this article