The surprising properties of DynamoDB pagination

When limits and filters mess up the results

Author's image
Tamás Sallai
6 mins

DynamoDB uses token-based pagination. This means when a query or a scan operation potentially return more results then the result contains a LastEvaluatedKey field with some value. To fetch the next page, pass that value as the ExclusiveStartKey in a separate query. Do this until the LastEvaluatedKey is undefined, and you can be sure you fetched all items.

How it works is logical and simple. But when you start using other DynamoDB features, namely the Limit and the FilterExpression for queries and scans, things become more interesting.

In this article, we'll look into a specific case where at first the results do not make much sense. Then we'll look into how these features work and see why this is a feature of DynamoDB and not a bug.

The base query

Let's run this DynamoDB query and inspect the results:

import { DynamoDBClient, QueryCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient();
const command = new QueryCommand({
	TableName: process.env.TABLE,
	IndexName: "groupId",
	KeyConditionExpression: "#groupId = :groupId",
	ExpressionAttributeNames: {"#groupId": "group_id", "#status": "status"},
	ExpressionAttributeValues: {
		":groupId": {S: "group1"},
		":status": {S: "ACTIVE"},
	},
	ScanIndexForward: false,
	FilterExpression: "#status = :status",
});
console.log(await client.send(command));

This returns a single item with some additional metadata:

{
  '$metadata': {
    "..."
  },
  Items: [
    {
      "..."
    }
  ],
  LastEvaluatedKey: undefined,
}

Limits

Let's see what happens if we add different Limit arguments to the query!

Limit: 1

const command = new QueryCommand({
	TableName: process.env.TABLE,
	IndexName: "groupId",
	KeyConditionExpression: "#groupId = :groupId",
	ExpressionAttributeNames: {"#groupId": "group_id", "#status": "status"},
	ExpressionAttributeValues: {
		":groupId": {S: "group1"},
		":status": {S: "ACTIVE"},
	},
	ScanIndexForward: false,
	FilterExpression: "#status = :status",
	// Add a limit
	Limit: 1,
});

The result now has zero items!

{
	"Count": 0,
	"Items": [],
	"LastEvaluatedKey": {
		"..."
	},
	"ScannedCount": 1
}

Even when fetching the next page, there are no items, but a different LastEvaluatedKey. Only the third page returns an item.

Stranger still, the third query returns a pagination token, but the fourth query returns zero results.

Limit: 2

const command = new QueryCommand({
// ...
	Limit: 2,
});

No items, but a LastEvaluatedKey:

{
	"Count": 0,
	"Items": [],
	"LastEvaluatedKey": {
		"..."
	},
	"ScannedCount": 2
}

Limit: 3

const command = new QueryCommand({
// ...
	Limit: 3,
});

We have a result!

{
	"Count": 1,
	"Items": [{
		"..."
	}],
	"LastEvaluatedKey": {
		"..."
	},
	"ScannedCount": 3
}

Repeating the query with the pagination token returns no items and no LastEvaluatedKey either:

{
	"Count": 0,
	"Items": [],
	"ScannedCount": 0
}

Limit: 4

const command = new QueryCommand({
// ...
	Limit: 4,
});

We have a result!

{
	"Count": 1,
	"Items": [{
		"..."
	}],
	"ScannedCount": 3
}

With this limit we came back to the original response: 1 item, no pagination token.

Seeing these results, there are several pressing questions we can ask. Why the matching item not in the result when the Limit: 1? Why is there a pagination token when the response already contains all items, such as in the Limit: 3 scenario?

Investigation

First, let's see what items are in the table!

The users are stored with a group_id and a status field. There is also a last_active and that is the sort key. This defines the ordering of the items.

The query expression defines the group_id, so the results only contain items where the group is group1. This is why all the returned users are in this group.

Then the filter expression makes sure that only active users are returned.

For the first query, DynamoDB fetches a page of users that match the query expression. This means all three users are fetched, then the filter expression drops inactive ones.

This is why this query returns 1 item and no pagination token: DynamoDB knows there are no more users, as it reached the end of the table.

Limit: 1

When the Limit is 1, DynamoDB fetches only 1 item. The query expression is effective before the fetch, it will still skip users not in group1. But since the filter expressions runs after, the result will have zero items and a pagination token.

Limit: 2

When Limit is 2, it works almost the same. The only difference is that DynamoDB fetches 2 items, then drops both.

Limit: 3

When the Limit is 3, things change a bit. DynamoDB gets 3 items, but the last one is ACTIVE. This means the filter expression drops only 2 items and returns 1. But since the query did not reach the end of the table, there will be a pagination token.

The next query returns zero results, but also no pagination token. This means DynamoDB reached the end of the table.

Limit: 4

Finally, when Limit is 4, DynamoDB fetches all results and also reaches the end of the table. This means the result will contain the single result and no pagination token.

Takeaways

The filter expression is run after DynamoDB fetches the results, so it is almost the same as doing it on the client-side, the only difference is the saved bytes on the network layer.

Then Limit is applied for the fetch, which is before the filtering. This means there are no guarantees that a Limit will contain all matching items. As we've seen previously, it is possible that a result set contains zero items and multiple steps of pagination is needed to get the first result.

This is especially important when the DynamoDB query is behind an abstraction. For example, if the query is sent by an AppSync resolver and clients only interface with the GraphQL endpoint then they might not be aware that the limit works this way. Worse still, a seemingly simple change of adding a filter expression to that resolver changes the behavior of the GraphQL query. Because of this, it is important to communicate the lack of guarantees.

Using filter expressions with limit can easily affect performance. If, let's say, there are 10 non-matching item then a Limit: 1 query needs to send 11 requests to get the first result. Without Limit, DynamoDB usually fetches enough items that it at least won't impact the number of roundtrips.

Then, getting a LastEvaluatedKey does not mean there are more items to return. It can happen that DynamoDB did not reach the end of the table and a subsequent request is needed to make sure no elements are left out. This might result in strange behaviors such as showing a "load more results" link but when clicked it shows nothing.

And finally, the query expression is more efficient as it does not even touch the items that do not match. So if you can organize the table or index in a way that you can integrate the filtering in the query expression, then most of the surprises caused by pagination and limiting won't happen.

July 12, 2022
In this article