How to run SQL scripts against the RDS Data API with Terraform
Initialize tables and data with a null resource and a local-exec provisioner
Initialize an RDS cluster
When you create an RDS database it does not contain tables or data. In a production environment this is the expected behavior: the schema is usually not managed as part of the infrastructure, since it has a different lifecycle. But for a demo or example code, it is perfectly reasonable to want to deploy everything with as few commands as possible.
So, is there a way to run SQL commands that create tables and example data against an RDS cluster as part of a terraform apply?
Yes, but it requires several building blocks to implement. And to make things more interesting, I needed to use the Data API, which means there is no direct connection to the database; only the AWS CLI's rds-data command is usable.
Database setup
Since the Data API requires a Secret to hold the password, the database setup needs at least three components:
- The RDS cluster
- The Secret
- And the initial password
# secret to store the password
resource "aws_secretsmanager_secret" "db-pass" {
name = "db-pass-${random_id.id.hex}"
}
# initial value
resource "aws_secretsmanager_secret_version" "db-pass-val" {
secret_id = aws_secretsmanager_secret.db-pass.id
secret_string = jsonencode(
{
username = aws_rds_cluster.cluster.master_username
password = aws_rds_cluster.cluster.master_password
engine = "mysql"
host = aws_rds_cluster.cluster.endpoint
}
)
}
# rds cluster
resource "aws_rds_cluster" "cluster" {
engine = "aurora-mysql"
engine_version = "5.7.mysql_aurora.2.07.1"
engine_mode = "serverless"
database_name = "mydb"
master_username = "admin"
master_password = random_password.db_master_pass.result
enable_http_endpoint = true
skip_final_snapshot = true
scaling_configuration {
min_capacity = 1
}
}
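The config above references two helper resources that are not shown: a random_id that keeps the secret name unique and a random_password for the master password. A minimal sketch of these, assuming the hashicorp/random provider (the names must match the references above):
# random suffix to keep the secret name unique
resource "random_id" "id" {
  byte_length = 8
}
# generated master password
# (special characters are disabled, as some of them, such as '/', '@', and '"', are not allowed in RDS passwords)
resource "random_password" "db_master_pass" {
  length  = 16
  special = false
}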
Null resource with local-exec
The local-exec provisioner allows arbitrary commands to run locally as part of the deploy. This is the most versatile way to extend Terraform's functionality, but it comes at the cost of adding more dependencies to the machine that runs the deploy.
The null resource is a construct similar to other resources, but it does not provision anything in the cloud. In this scenario, its only purpose is to provide a "container item" for the local-exec provisioner.
The combined structure:
resource "null_resource" "db_setup" {
provisioner "local-exec" {
command = <<-EOF
# ...command
EOF
environment = {
# ...variables
}
}
}
SQL file
The SQL statements that create the tables and insert data are located in a separate file (initial.sql):
DROP TABLE IF EXISTS TEST;
CREATE TABLE IF NOT EXISTS TEST(
id INT NOT NULL AUTO_INCREMENT,PRIMARY KEY(id)
);
INSERT INTO TEST (id) VALUES ('1');
INSERT INTO TEST (id) VALUES ('2');
Note that it drops the table on the first line and recreates and repopulates it on the following lines. This is convenient when experimenting, but be aware that any existing data is lost whenever this file is re-run.
Environment variables
The execute-statement call requires three things from the database:
- The cluster's ARN
- The database name
- The secret's ARN
To pass these to the local-exec provisioner, use the environment block:
resource "null_resource" "db_setup" {
provisioner "local-exec" {
environment = {
DB_ARN = aws_rds_cluster.cluster.arn
DB_NAME = aws_rds_cluster.cluster.database_name
SECRET_ARN = aws_secretsmanager_secret.db-pass.arn
}
# ...
}
# ...
}
Command
The command is where the magic happens. It reads the initial.sql file, transforms its format, then runs aws rds-data execute-statement for each statement found:
while read line; do
  echo "$line"
  aws rds-data execute-statement --resource-arn "$DB_ARN" --database "$DB_NAME" --secret-arn "$SECRET_ARN" --sql "$line"
done < <(awk 'BEGIN{RS=";\n"}{gsub(/\n/,""); if(NF>0) {print $0";"}}' initial.sql)
The awk part splits the file by ;\n, which is usually the end of a statement (there might be some edge cases where this breaks). Then it joins each statement into a single line so that read line can consume them one by one. Finally, it prints only the non-empty lines.
The while read line loop goes through each statement and runs the AWS CLI command.
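For the initial.sql above, the awk stage joins the multi-line CREATE TABLE into a single line, so the loop gets exactly one statement per iteration. Running just that part on the example file produces:
$ awk 'BEGIN{RS=";\n"}{gsub(/\n/,""); if(NF>0) {print $0";"}}' initial.sql
DROP TABLE IF EXISTS TEST;
CREATE TABLE IF NOT EXISTS TEST(id INT NOT NULL AUTO_INCREMENT,PRIMARY KEY(id));
INSERT INTO TEST (id) VALUES ('1');
INSERT INTO TEST (id) VALUES ('2');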
Interpreter
Terraform uses /bin/sh by default, and that shell does not support the <(...) process substitution. Because of this, the provisioner needs a different interpreter:
resource "null_resource" "db_setup" {
provisioner "local-exec" {
interpreter = ["bash", "-c"]
# ...
}
# ...
}
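The difference is easy to check from a terminal. On a system where /bin/sh is dash (as on many Linux distributions), process substitution is a syntax error, while bash handles it:
$ sh -c 'cat <(echo hello)'
sh: 1: Syntax error: "(" unexpected
$ bash -c 'cat <(echo hello)'
hello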
Rerun after change
By default, nothing checks whether the null resource has changed and should be redeployed. As a result, once the script runs, it never runs again.
Terraform supports triggers that allow fine-grained control over when the resource should be redeployed. In this case, it should rerun whenever initial.sql changes, so a trigger with the file's hash is a good solution:
resource "null_resource" "db_setup" {
triggers = {
file = filesha1("initial.sql")
}
# ...
}
The finished script
resource "null_resource" "db_setup" {
triggers = {
file = filesha1("initial.sql")
}
provisioner "local-exec" {
command = <<-EOF
while read line; do
echo "$line"
aws rds-data execute-statement --resource-arn "$DB_ARN" --database "$DB_NAME" --secret-arn "$SECRET_ARN" --sql "$line"
done < <(awk 'BEGIN{RS=";\n"}{gsub(/\n/,""); if(NF>0) {print $0";"}}' initial.sql)
EOF
environment = {
DB_ARN = aws_rds_cluster.cluster.arn
DB_NAME = aws_rds_cluster.cluster.database_name
SECRET_ARN = aws_secretsmanager_secret.db-pass.arn
}
interpreter = ["bash", "-c"]
}
}
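After terraform apply finishes, the result can be verified with the same Data API. This is a sketch that assumes the three values are exported to the shell (for example, from terraform outputs):
aws rds-data execute-statement \
  --resource-arn "$DB_ARN" \
  --database "$DB_NAME" \
  --secret-arn "$SECRET_ARN" \
  --sql "SELECT * FROM TEST"
If everything worked, the response contains the two inserted rows.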