How to run SQL scripts against the RDS Data API with Terraform
Initialize tables and data with a null resource and a local-exec provisioner
Initialize an RDS cluster
When you create an RDS database it does not contain tables or data. In a production environment this is the expected behavior: the schema is usually not managed as part of the infrastructure, since it has a different lifecycle. But for a demo or example code, it is perfectly reasonable to want to deploy everything with as few commands as possible.
So, is there a way to run SQL commands that create tables and example data against an RDS cluster as part of a terraform apply?
Yes, but it requires several building blocks to implement. And to make things more interesting, I needed to use the Data API, which means there is no direct connection to the database; only the AWS CLI's rds-data command is usable.
Database setup
Since the Data API requires a Secret to hold the password, the database setup needs at least three components:
- The RDS cluster
- The Secret
- And the initial password
# secret to store the password
resource "aws_secretsmanager_secret" "db-pass" {
name = "db-pass-${random_id.id.hex}"
}
# initial value
resource "aws_secretsmanager_secret_version" "db-pass-val" {
secret_id = aws_secretsmanager_secret.db-pass.id
secret_string = jsonencode(
{
username = aws_rds_cluster.cluster.master_username
password = aws_rds_cluster.cluster.master_password
engine = "mysql"
host = aws_rds_cluster.cluster.endpoint
}
)
}
# rds cluster
resource "aws_rds_cluster" "cluster" {
engine = "aurora-mysql"
engine_version = "5.7.mysql_aurora.2.07.1"
engine_mode = "serverless"
database_name = "mydb"
master_username = "admin"
master_password = random_password.db_master_pass.result
enable_http_endpoint = true
skip_final_snapshot = true
scaling_configuration {
min_capacity = 1
}
}
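The config above references two helper resources that are not shown: a random_id that keeps the secret name unique and a random_password for the master password. A minimal sketch of these, assuming the hashicorp/random provider (the names must match the references above):
# random suffix to keep the secret name unique
resource "random_id" "id" {
  byte_length = 8
}
# generated master password
# (special characters are disabled, as some of them, such as '/', '@', and '"', are not allowed in RDS passwords)
resource "random_password" "db_master_pass" {
  length  = 16
  special = false
}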
Null resource with local-exec
The local-exec provisioner allows arbitrary commands to run locally as part of the deploy. This is the most versatile way to extend Terraform's functionality, but it comes at the cost of adding more dependencies to the machine that runs the deploy.
The null resource is a construct similar to other resources, but it does not provision anything in the cloud. In this scenario, its only purpose is to provide a "container item" for the local-exec provisioner.
The combined structure:
resource "null_resource" "db_setup" {
provisioner "local-exec" {
command = <<-EOF
# ...command
EOF
environment = {
# ...variables
}
}
}
SQL file
The SQL statements that create the tables and insert data are located in a separate file (initial.sql):
DROP TABLE IF EXISTS TEST;
CREATE TABLE IF NOT EXISTS TEST(
id INT NOT NULL AUTO_INCREMENT,PRIMARY KEY(id)
);
INSERT INTO TEST (id) VALUES ('1');
INSERT INTO TEST (id) VALUES ('2');
Note that it drops the table on the first line and recreates and repopulates it on the following lines. This is convenient when experimenting, but be aware that any existing data is lost whenever this file is re-run.
Environment variables
The execute-statement call requires three things from the database:
- The cluster's ARN
- The database name
- The secret's ARN
To pass these to the local-exec provisioner, use the environment block:
resource "null_resource" "db_setup" {
provisioner "local-exec" {
environment = {
DB_ARN = aws_rds_cluster.cluster.arn
DB_NAME = aws_rds_cluster.cluster.database_name
SECRET_ARN = aws_secretsmanager_secret.db-pass.arn
}
# ...
}
# ...
}
Command
The command is where the magic happens. It reads the initial.sql file, transforms its format, then runs aws rds-data execute-statement for each statement found:
while read line; do
  echo "$line"
  aws rds-data execute-statement --resource-arn "$DB_ARN" --database "$DB_NAME" --secret-arn "$SECRET_ARN" --sql "$line"
done < <(awk 'BEGIN{RS=";\n"}{gsub(/\n/,""); if(NF>0) {print $0";"}}' initial.sql)
The awk part splits the file by ;\n, which is usually the end of a statement (there might be some edge cases where this breaks). Then it joins each statement into a single line so that read line can consume them one by one. Finally, it prints only the non-empty lines.
The while read line loop goes through each statement and runs the AWS CLI command.
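For the initial.sql above, the awk stage joins the multi-line CREATE TABLE into a single line, so the loop gets exactly one statement per iteration. Running just that part on the example file produces:
$ awk 'BEGIN{RS=";\n"}{gsub(/\n/,""); if(NF>0) {print $0";"}}' initial.sql
DROP TABLE IF EXISTS TEST;
CREATE TABLE IF NOT EXISTS TEST(id INT NOT NULL AUTO_INCREMENT,PRIMARY KEY(id));
INSERT INTO TEST (id) VALUES ('1');
INSERT INTO TEST (id) VALUES ('2');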
Interpreter
Terraform uses /bin/sh by default, and that shell does not support the <(...) process substitution. Because of this, the provisioner needs a different interpreter:
resource "null_resource" "db_setup" {
provisioner "local-exec" {
interpreter = ["bash", "-c"]
# ...
}
# ...
}
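The difference is easy to check from a terminal. On a system where /bin/sh is dash (as on many Linux distributions), process substitution is a syntax error, while bash handles it:
$ sh -c 'cat <(echo hello)'
sh: 1: Syntax error: "(" unexpected
$ bash -c 'cat <(echo hello)'
hello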
Rerun after change
By default, nothing checks whether the null resource has changed and should be redeployed. As a result, once the script runs, it never runs again.
Terraform supports triggers that allow fine-grained control over when the resource should be redeployed. In this case, it should rerun whenever initial.sql changes, so a trigger with the file's hash is a good solution:
resource "null_resource" "db_setup" {
triggers = {
file = filesha1("initial.sql")
}
# ...
}
The finished script
resource "null_resource" "db_setup" {
triggers = {
file = filesha1("initial.sql")
}
provisioner "local-exec" {
command = <<-EOF
while read line; do
echo "$line"
aws rds-data execute-statement --resource-arn "$DB_ARN" --database "$DB_NAME" --secret-arn "$SECRET_ARN" --sql "$line"
done < <(awk 'BEGIN{RS=";\n"}{gsub(/\n/,""); if(NF>0) {print $0";"}}' initial.sql)
EOF
environment = {
DB_ARN = aws_rds_cluster.cluster.arn
DB_NAME = aws_rds_cluster.cluster.database_name
SECRET_ARN = aws_secretsmanager_secret.db-pass.arn
}
interpreter = ["bash", "-c"]
}
}
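After terraform apply finishes, the result can be verified with the same Data API. This is a sketch that assumes the three values are exported to the shell (for example, from terraform outputs):
aws rds-data execute-statement \
  --resource-arn "$DB_ARN" \
  --database "$DB_NAME" \
  --secret-arn "$SECRET_ARN" \
  --sql "SELECT * FROM TEST"
If everything worked, the response contains the two inserted rows.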