How to keep accurate counts in DynamoDB
Use transactions to atomically update items and keep a record of their numbers
Querying counts
You have a DynamoDB table containing the registered users. But how many users you have? With an SQL-based database, that would be a simple SELECT COUNT(*) FROM users
. But DynamoDB is not that and it does have native support for efficient counting.
At first sight, it might seem like a deficiency. But the above SQL query hides iterating over all the records in a table, which is O(n) so what is one command
might not be efficient at all. This also happens with
offset pagination. Since DynamoDB
does not support anything besides the scan
that may perform poorly when the number of rows in a table begins to grow, it has no count(*)
functionality.
A bad solution in DynamoDB is to use a scan
. While it accurately returns the number of elements in a table (or a subset of it) without storing it as a
separate entry, it does so by reading all the items in the table. While it works for small tables, the read cost goes through the roof for any realistic
production app. So don't be tempted to use it even if it looks good on demos.
Another suboptimal solution (though it's mentioned in many places) is to use a DynamoDB stream to call a Lambda function that keeps a value stored somewhere in the database up-to-date with the changes. A stream collects modified (added, deleted, and changed) elements and sends them to a target that can be a variety of things inside AWS, such as an SNS topic, an SQS queue, or a Lambda function. A piece of code can then use these change items to increment/decrement a number.
This solution has some nice properties. It decouples the logic of keeping a count from the main application logic. The app code does not need any changes, it writes the user records as before and it can also read the number of items. All the magic is encapsulated in the stream configuration.
But it has two significant downsides.
First, it is async, meaning that the count is updated later than the item. This can lead to inconsistencies as the latest changes might not be reflected in the derived value.
And second, it's not easy to make sure that the updater script runs to completion every single time. With a DDB stream + Lambda function solution you introduce quite a bit of moving parts, each having some guarantees but errors might happen. Lambda has an at-least-once processing model, but even that can fail, sending the call to the DLQ. And since errors accumulate, even an occasional mistake makes a permanent offset between the value and the records in the table.
Despite these downsides, it is a good solution in many cases, especially when you only need an approximation. In that case, the asynchronicity and the occasional errors can be tolerated.
But with DynamoDB transactions, we can do better. In this article, you'll learn how to keep accurate counts in the application logic.
Keeping counts with transactions
With transactions, you can group multiple change operations and they are guaranteed to either succeed or fail together. This allows inserting/removing an item from the users table and simultaneously updating a count value to be an atomic operation.
There are 3 operations we need to implement to provide counting functionality: initialization, adding, and removing elements.
Initialization
Since all the other operations increase or decrease a number in the counts table, we need to make sure there is a number there. This has to be done once, usually when the app is started.
First, we need to check whether the count is there, and add if not:
const item = await client.send(new GetItemCommand({
TableName: COUNTS_TABLE,
Key: marshall({type: "users"}),
}));
if (!item.Item) {
// initialize count to 0
await client.send(new PutItemCommand({
TableName: COUNTS_TABLE,
Item: marshall({type: "users", count: 0}),
ConditionExpression: "attribute_not_exists(#pk)",
ExpressionAttributeNames: {"#pk": "type"},
}));
}
The ConditionExpression: "attribute_not_exists(#pk)"
makes sure that the database is in the state we expect it to be. As there are no transactions between
the commands (the GetItemCommand
and the PutItemCommand
), another thread or app might have inserted a value. This check makes sure that it's not
overwritten.
Adding an item
To add an item that requires changing the count value, we need a transaction to modify both at the same time.
The ConditionExpression: attribute_not_exists(#pk)
is also important here as we need to make sure that the user item is indeed inserted into the table.
Without that, it might overwrite an existing record and still increase the count.
To modify a number by a given value, DynamoDB supports an UpdateExpression
. This has the advantage of multiple transactions can write it simultaneously
without the need for ConditionExpression
s.
// the user object
const item = ...;
// insert new user
await client.send(new TransactWriteItemsCommand({
TransactItems: [
{
Put: {
TableName: USERS_TABLE,
ConditionExpression: "attribute_not_exists(#pk)",
ExpressionAttributeNames: {"#pk": "ID"},
Item: marshall(item),
}
},
{
Update: {
TableName: COUNTS_TABLE,
UpdateExpression: "ADD #count :count",
ExpressionAttributeNames: {"#count": "count"},
ExpressionAttributeValues: marshall({":count": 1}),
Key: marshall({type: "users"}),
}
}
]
}));
The above code can only insert a new user item but it can not update one. This is because of the ConditionExpression: "attribute_not_exists(#pk)"
. To
implement modifications, use a PutItemCommand
with a condition to make sure the item exists:
// the user object
const item = ...;
// update existing user
await client.send(new PutItemCommand({
TableName: USERS_TABLE,
ConditionExpression: "attribute_exists(#pk)",
ExpressionAttributeNames: {"#pk": "ID"},
Item: marshall(item),
}));
Deleting an item
Deletion works similar to addition, with the exception that it needs a Delete operation.
It needs to do 3 things:
- check that the item exists (
ConditionExpression: "attribute_exists(#pk)"
) - decrement the count
- delete the item
await client.send(new TransactWriteItemsCommand({
TransactItems: [
{
Delete: {
TableName: USERS_TABLE,
ConditionExpression: "attribute_exists(#pk)",
ExpressionAttributeNames: {"#pk": "ID"},
Key: marshall({ID}),
}
},
{
Update: {
TableName: COUNTS_TABLE,
UpdateExpression: "ADD #count :count",
ExpressionAttributeNames: {"#count": "count"},
ExpressionAttributeValues: marshall({":count": -1}),
Key: marshall({type: "users"}),
}
}
]
}));
Conclusion
Keeping accurate counts is not easy in DynamoDB but it can be done using transactions. Make sure to use the correct ConditionExpression
s in every
operation to avoid accumulating errors when multiple processes are changing the same data.