The slow evolution of AppSync RDS support

How AppSync slowly got the ability to use relational databases

Author's image
Tamás Sallai
5 mins
Photo by Natalia Y. on Unsplash

When I started working with AppSync at the fall of 2020 it already supported all important data sources: reading and writing data in DynamoDB tables, calling Lambda functions, sending generic HTTP requests, and also working with relational databases. While the syntax was a bit awkward back then due to the choice of VTL for resolvers, the integration itself was solid.

Except for RDS. That was terrible.

Over the years, AppSync gradually got features to ease working with RDS data sources, and over a series of long steps today it's on par with other data sources.

In this article I recollect the important steps along the way.

Origins: VTL + variableMap

Originally, AppSync only supported VTL (Velocity Template Language) for writing resolver code. That was used to generate a stringified JSON that will then be parsed back to JSON and sent to the data source. It was hard-to-read and generally error-prone, so while it's still supported by AppSync, you should use only JavaScript resolvers.

The RDS data source needed a statements and a variableMap for the SQL request:

{
	"version": "2018-05-29",
	"statements": [
		"SELECT * FROM UserGroup WHERE id = :ID"
	],
	"variableMap": {
		":ID": $util.toJson($ctx.args.id)
	}
}

This looks like a parameterized statement but it had a fatal flaw: it used string concatenation without any escaping when the variables were merged back to the statement.

Yes, that's an SQL injection in 2020:

query MyQuery {
	groupById_unsafe(id: "a' OR '1' = '1' LIMIT 1 OFFSET 1; -- ") {
		id
		name
	}
}

This was a known issue. A ticket was raised because special characters break the query:

However if input parameters author or title contains a signle quote, this will break VTL.

What is the best way to work around this?

Then another ticket was more about the security aspect:

While there is a variableMap field which looks like it should be parameterised, it simply concatenates the strings and is just as vulnerable to SQLi. There is a broad misconception about this in the community (#60 (comment)) as this separation is the standard way of parametrising queries in all other SQL libraries (including boto3).

The official documentation mentioned this but put all the burden on the user:

We strongly encourage the developers to sanitize the arguments of GraphQL operations.

I wrote about this in my AppSync book:

This is a huge oversight from AWS, especially as the Data API that the RDS data source uses has everything to solve this.

...

While it works for known cases, this is not a proper solution, just a temporary one. Custom SQL input sanitization is generally not recommended as it can miss attack vectors. Use the above at your own risk and monitor the ticket.

Interim #1: JavaScript resolvers

Then AppSync added support for JavaScript resolvers in 2022. While its rollout was staged with first only supporting pipeline resolver functions, it was already a huge improvement over VTL.

As for the RDS data source, the documentation was entirely missing.

But nevertheless the same structure worked in JS as with VTL, albeit with the same escaping problem:

return {
	version: "2018-05-29",
	statements: [
		"SELECT * FROM UserGroup WHERE id = :ID"
	],
	variableMap: {
		":ID": ctx.args.id,
	}
}

Interim #2: Parameterized queries

In the meantime, AWS fixed the variableMap vulnerability by sending a parameterized query instead of simply concatenating the strings. I don't know when that happened, but that fixed the VTL resolvers.

Now, the documentation says:

AWS AppSync will use the variable map values to construct the SqlParameterized queries that will be sent to the Amazon Aurora Serverless Data API. The SQL statements are executed with parameters provided in the variable map, which eliminates the risk of SQL injection.

The present: tagged templates

Then the latest addition to the RDS data source is the support for tagged templates. This takes advantage of JavaScript's ability to define a function for a template string.

import {util} from "@aws-appsync/utils";
import {sql, createPgStatement, toJsonObject} from "@aws-appsync/utils/rds";

export function request(ctx) {
	return createPgStatement(sql`
		SELECT * FROM "user_group" WHERE id = ${ctx.args.id}
	`);
}

This provides a modern, easy-to-use and safe way to write SQL queries against RDS data sources.

The future: Data API availability?

So, is this the end of the road and RDS is "complete"?

Well, so-so. AppSync support is good, so I don't expect any new big features in this department.

The problem is the Data API. This provides a stateless request-based interface instead of the traditional connection-based one and since AppSync is serverless it can't maintain a long-running connection. That's why it needs the Data API.

The alternative would be the "sit down before you open the pricing page" RDS Proxy.

But where the Data API is available is very spotty. Just look at the table that shows along the different dimensions what is available.

There are multiple dimensions here. The serverless version (v1 or v2) governs how the instances scale and how it is priced, where generally newer is better. Then MySQL vs Postgres is the database engine and there are different versions of both. Finally, different regions support different combinations.

If you look at the table you'll see how sparse it is:

  • Serverless v2 with MySQL is entirely missing
  • Aurora MySQL v3 is entirely missing
  • Serverless v2 is only supported in 4 regions in total (N. Virginia, Oregon, Tokyo, Frankfurt)

With this, I think the next frontier of the evolution of the AppSync RDS support is to increase the availability of the Data API so that it can be used in more configurations.

June 25, 2024