How to set up Amazon RDS password rotation with Terraform

Set up Secrets Manager password rotation with a VPC and a Lambda function

Tamás Sallai

Database passwords in the state file

When you create an RDS cluster, you need to define a master username and a master password. This is not a problem when a trusted person types the password into the Console manually, but it is a problem when you use Terraform, because the state file then contains the password in plain text:

Terraform state with the master password

The proper solution is to set up state management for Terraform so that the state file is not readable by everyone, as detailed in the Terraform documentation. While this definitely solves the problem, it feels like overkill for simple projects, so I looked for a different solution.
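For reference, a minimal sketch of what such state management could look like with an encrypted S3 backend. The bucket name and the key are hypothetical placeholders, and the bucket is assumed to exist already with access restricted to trusted principals:

```hcl
terraform {
  backend "s3" {
    # Hypothetical bucket, assumed to be created and locked down beforehand
    bucket = "my-terraform-state"
    # Hypothetical path of the state object inside the bucket
    key    = "rds-rotation/terraform.tfstate"
    region = "eu-central-1"
    # Encrypt the state object at rest
    encrypt = true
  }
}
```

With a backend like this the password still ends up in the state, but the state is no longer a local file that anyone with access to the project directory can read.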

I used the RDS Data API, which relies on Secrets Manager to store the current version of the password. For example, starting a transaction takes the cluster ARN and the secret ARN:

aws rds-data begin-transaction --resource-arn "..." --database "mydb" --secret-arn "..."

This makes things a bit easier: as long as the password set in the database and the one stored in the secret match, the Data API works.

Password rotation with RDS

Secrets Manager supports password rotation. This gave me an idea: why not set up rotation so that the password in the Terraform state is no longer valid? Changing passwords from time to time is a best practice anyway, so this feels like a good solution.

It turned out that "supported by AWS" does not mean "easy" by any means. It took me quite some time to figure out every detail of the configuration to make it work.

First, I used the Data API, which does not go through the usual networking paths of a VPC but uses a different plane. Since that means the database does not need to be reachable, I did not want to expose it just for the sake of rotation. This turned out to be a complication later on.

Second, rotating a secret is a complicated process: Secrets Manager invokes a Lambda function multiple times, once for each step of the rotation (createSecret, setSecret, testSecret, and finishSecret). Luckily, AWS provides a template for this as well as implementations for different scenarios (such as this one for MySQL). Moreover, there are templates you can deploy directly.

Theoretically, these ready-made solutions should make it easy to implement everything. But combined with the Data API, things become complicated.

Changing a password can be done in two distinct ways: one is to use the ModifyDBCluster AWS API. The other one is to connect to the database, log in with the current credentials then issue a SET PASSWORD command. While the first approach does not rely on connectivity to the database, the second one does.

The AWS scripts implement the second approach, so the Lambda function they use requires a connection to the database as well as to the Secrets Manager service, as described in the documentation. Because of this, rotation requires a somewhat complicated VPC-based networking setup just to wire everything together.

To be usable with the AWS-provided scripts, the secret itself needs to encode everything required to change the password. The structure the rotation function expects is described in the documentation:

{
  "engine": "mysql",
  "host": "<instance host name/resolvable DNS name>",
  "username": "<username>",
  "password": "<password>",
  "dbname": "<database name. If not specified, defaults to None>",
  "port": "<TCP port number. If not specified, defaults to 3306>"
}

For example:

{
	"engine": "mysql",
	"host": "tf-20220529115812774900000003.cluster-csmcnafptpfv.eu-central-1.rds.amazonaws.com",
	"password": "nAt?3e4dh~6f*=M}9,nsMYSLw(=nnx?x",
	"username": "admin"
}

As everything is in the secret, the function only needs the secret ARN: from the secret it can read the host, the username, and the current password. It can then log in, change the password, and save the new one in the secret.

Networking configuration

Since I did not want to expose the database to the Internet, I had to use a VPC and wire everything together inside it. This requires quite a few parts:

2 subnets (RDS needs at least 2)
The rotator function configured with an endpoint in one of the subnets
RDS cluster configured for the subnets
Interface endpoint for Secrets Manager so the Lambda can call it
Security Group so that things can talk to each other

With this, the database is still private and reachable only via the Data API, as it was originally. The Lambda function has access only to the parts it needs to communicate with (Secrets Manager and the database). And since the rotator runs immediately after rotation is enabled, the password in the Terraform state is no longer valid.

Let's see how to configure the different parts with Terraform!

VPC, Subnets, and Security Group

resource "aws_vpc" "db" {
  cidr_block           = "10.0.0.0/16"
  # needed for the interface endpoint
  enable_dns_support   = true
  enable_dns_hostnames = true
}

# query the AZs
data "aws_availability_zones" "available" {
  state = "available"
}

# create 2 subnets
resource "aws_subnet" "db" {
  count             = 2
  vpc_id            = aws_vpc.db.id
  cidr_block        = cidrsubnet(aws_vpc.db.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]
}

# allow data flow between the components
resource "aws_security_group" "db" {
  vpc_id = aws_vpc.db.id

  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = [aws_vpc.db.cidr_block]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = [aws_vpc.db.cidr_block]
  }
}

This creates two Subnets in two different Availability Zones, as RDS requires. It then creates a Security Group that allows all traffic inside the VPC.
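To see how cidrsubnet carves the two subnet ranges out of the VPC range, here is a small standalone sketch. The output name is made up for illustration, and the literal prefix is the same value as the VPC's cidr_block above:

```hcl
# cidrsubnet(prefix, newbits, netnum) extends the /16 prefix by 8 bits,
# giving /24 networks, and selects the netnum-th network of that size
output "subnet_cidrs_example" {
  value = [
    cidrsubnet("10.0.0.0/16", 8, 0), # "10.0.0.0/24"
    cidrsubnet("10.0.0.0/16", 8, 1), # "10.0.1.0/24"
  ]
}
```

Using count.index as the netnum is what gives each Subnet its own non-overlapping /24 range.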

RDS Cluster

# initial password
resource "random_password" "db_master_pass" {
  length           = 40
  special          = true
  min_special      = 5
  override_special = "!#$%^&*()-_=+[]{}<>:?"
}

# the secret
resource "aws_secretsmanager_secret" "db-pass" {
  name = "db-pass-${random_id.id.hex}"
}

# initial version
resource "aws_secretsmanager_secret_version" "db-pass-val" {
  secret_id = aws_secretsmanager_secret.db-pass.id
  # encode in the required format
  secret_string = jsonencode(
    {
      username = aws_rds_cluster.cluster.master_username
      password = aws_rds_cluster.cluster.master_password
      engine   = "mysql"
      host     = aws_rds_cluster.cluster.endpoint
    }
  )
}

# add the cluster to the 2 subnets
resource "aws_db_subnet_group" "db" {
  subnet_ids = aws_subnet.db[*].id
}

resource "aws_rds_cluster" "cluster" {
  engine                 = "aurora-mysql"
  engine_version         = "5.7.mysql_aurora.2.07.1"
  engine_mode            = "serverless"
  database_name          = "mydb"
  master_username        = "admin"
  master_password        = random_password.db_master_pass.result
  enable_http_endpoint   = true
  skip_final_snapshot    = true
  # attach the security group
  vpc_security_group_ids = [aws_security_group.db.id]
  # deploy to the subnets
  db_subnet_group_name   = aws_db_subnet_group.db.name
}

This part creates a Secret and a Secret Version. The latter contains the initial password to the database as well as the other required information for the rotation function.
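The documented secret structure also lists the optional dbname and port fields. A sketch of how they could be included is below. The local name is hypothetical, and passing the port as Terraform's numeric attribute rather than as a string is an assumption about what the rotation function accepts:

```hcl
locals {
  # Hypothetical local: the same keys as the secret above plus the optional ones
  db_secret_value = {
    engine   = "mysql"
    host     = aws_rds_cluster.cluster.endpoint
    username = aws_rds_cluster.cluster.master_username
    password = aws_rds_cluster.cluster.master_password
    # Optional fields from the documented structure
    dbname = aws_rds_cluster.cluster.database_name
    port   = aws_rds_cluster.cluster.port
  }
}

# The secret version would then use:
#   secret_string = jsonencode(local.db_secret_value)
```

Without these fields the rotation function falls back to the defaults named in the documented structure, so adding them is only needed when the defaults do not fit.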

Then it creates a Subnet Group with the two Subnets. Finally, it creates the cluster with the Security Group and the Subnet Group attached.
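The snippets in this article reference random_id.id for unique names, but its definition is not shown here. A minimal definition might look like the following. The byte length is an assumption, and any value that yields sufficiently unique names would do:

```hcl
# Hypothetical definition of the random_id.id referenced by the other snippets
resource "random_id" "id" {
  # 8 random bytes, exposed as 16 hex characters through the hex attribute
  byte_length = 8
}
```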

Secrets Manager Endpoint

resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = aws_vpc.db.id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.secretsmanager"
  vpc_endpoint_type   = "Interface"
  private_dns_enabled = true
  # deploy in the first subnet
  subnet_ids          = [aws_subnet.db[0].id]
  # attach the security group
  security_group_ids  = [aws_security_group.db.id]
}

This part creates an Interface Endpoint for Secrets Manager so the rotator Lambda can call the service to get and change the secret value.

Rotator stack

# find the details by id
data "aws_serverlessapplicationrepository_application" "rotator" {
  application_id = "arn:aws:serverlessrepo:us-east-1:297356227824:applications/SecretsManagerRDSMySQLRotationSingleUser"
}

data "aws_partition" "current" {}
data "aws_region" "current" {}

# deploy the cloudformation stack
resource "aws_serverlessapplicationrepository_cloudformation_stack" "rotate-stack" {
  name             = "Rotate-${random_id.id.hex}"
  application_id   = data.aws_serverlessapplicationrepository_application.rotator.application_id
  semantic_version = data.aws_serverlessapplicationrepository_application.rotator.semantic_version
  capabilities     = data.aws_serverlessapplicationrepository_application.rotator.required_capabilities

  parameters = {
    # secrets manager endpoint
    endpoint            = "https://secretsmanager.${data.aws_region.current.name}.${data.aws_partition.current.dns_suffix}"
    # a random name for the function
    functionName        = "rotator-${random_id.id.hex}"
    # deploy in the first subnet
    vpcSubnetIds        = aws_subnet.db[0].id
    # attach the security group so it can communicate with the other components
    vpcSecurityGroupIds = aws_security_group.db.id
  }
}

The aws_serverlessapplicationrepository_application data source finds the application by its ID, then the aws_serverlessapplicationrepository_cloudformation_stack creates a CloudFormation stack. It passes the endpoint, functionName, the vpcSubnetIds, and the vpcSecurityGroupIds parameters. This creates a Lambda function with an endpoint in the correct VPC with permissions to talk to both the database and the Secrets Manager Endpoint.
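The vpcSubnetIds parameter above receives a single Subnet ID. If the function should be able to run in both Subnets, one option is to pass a comma-separated list. This is only a sketch: it assumes the application accepts a comma-delimited list for this parameter, which is worth checking against the application's template, and the local name is made up for illustration:

```hcl
locals {
  # Hypothetical local: both Subnet IDs joined into one comma-separated string
  rotator_subnet_ids = join(",", aws_subnet.db[*].id)
}

# Inside the parameters map of the stack, the assignment would then be:
#   vpcSubnetIds = local.rotator_subnet_ids
```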

Rotation

resource "aws_secretsmanager_secret_rotation" "rotation" {
  # reference secret_id through the secret version so that the version is deployed before rotation is set up
  secret_id           = aws_secretsmanager_secret_version.db-pass-val.secret_id
  rotation_lambda_arn = aws_serverlessapplicationrepository_cloudformation_stack.rotate-stack.outputs.RotationLambdaARN

  rotation_rules {
    automatically_after_days = 30
  }
}

Finally, this part configures rotation for the secret. Notice that the secret_id references the aws_secretsmanager_secret_version instead of the aws_secretsmanager_secret. This makes sure the initial password is in place before rotation is set up, so that the rotator Lambda can read the current value. Without it, there is a race condition: if rotation is enabled first, the Lambda throws an error.
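The same ordering could also be expressed with an explicit dependency. The variant below is a sketch of that alternative, not the configuration used in this article. It references the secret directly and uses depends_on to wait for the initial version:

```hcl
resource "aws_secretsmanager_secret_rotation" "rotation" {
  # reference the secret itself this time
  secret_id           = aws_secretsmanager_secret.db-pass.id
  rotation_lambda_arn = aws_serverlessapplicationrepository_cloudformation_stack.rotate-stack.outputs.RotationLambdaARN

  rotation_rules {
    automatically_after_days = 30
  }

  # make Terraform create the initial secret version before enabling rotation
  depends_on = [aws_secretsmanager_secret_version.db-pass-val]
}
```

Both forms should produce the same dependency graph. The implicit reference is shorter, while depends_on states the intent more visibly.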

Testing

After running terraform apply and waiting for the deployment to finish, the CloudFormation stack is created:

CloudFormation stack for the rotation function

The rotator function is run with all the steps needed to change the password:

Rotation is configured, and clicking "Rotate secret immediately" changes the stored password after a few seconds:

Costs

A secret costs $0.40 per month. The Interface endpoint costs $0.01 per hour, which is about $7.20 for a 30-day month. I think the RDS endpoint is part of the database and as such does not incur additional costs, but I'm not 100% sure about that.

The Lambda, the VPC, the resources inside it, and the various API calls have some costs too, but these should be negligible as rotation happens rarely.

In total, while this solution adds some fixed costs to the database, they remain manageable.