How to set up Amazon RDS password rotation with Terraform
Set up Secrets Manager password rotation with a VPC and a Lambda function
Database passwords in the state file
When you create an RDS cluster, you need to define a master username and password. This is not a problem when a trusted person types it into the Console manually, but it is a problem when you use Terraform, because in that case the state file contains the password in plain text.
The proper solution is to implement state management for Terraform so that the state file is not readable by everyone, as detailed in the Terraform documentation. While this definitely solves the problem, it feels like a bit of overkill for simple projects. So I looked for a different solution.
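As a rough illustration of what state management could look like, the sketch below stores the state in an encrypted S3 bucket instead of a local file. The bucket name, the key, and the region are placeholders I made up, and access to the bucket still has to be restricted with IAM for this to protect the password.

```hcl
terraform {
  # keep the state in S3 instead of a local terraform.tfstate file
  backend "s3" {
    bucket  = "my-terraform-state"             # hypothetical bucket name
    key     = "rds-rotation/terraform.tfstate" # hypothetical object key
    region  = "eu-central-1"                   # assumption: region of the example
    encrypt = true                             # server-side encryption for the state object
  }
}
```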
I used the RDS Data API, which relies on Secrets Manager to store the current version of the password. For example, starting a transaction takes the cluster ARN and the secret ARN:
aws rds-data begin-transaction --resource-arn "..." --database "mydb" --secret-arn "..."
This makes things a bit easier: as long as the password set in the database and the one in the secret match, the Data API works.
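To get the two ARNs out of Terraform, a pair of outputs is probably enough. This is a minimal sketch that uses the resource names defined later in this article; the output names are my own choice.

```hcl
# ARN to pass as --resource-arn
output "cluster_arn" {
  value = aws_rds_cluster.cluster.arn
}

# ARN to pass as --secret-arn
output "secret_arn" {
  value = aws_secretsmanager_secret.db-pass.arn
}
```

After terraform apply, the values can be read with terraform output -raw cluster_arn and terraform output -raw secret_arn.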
Password rotation with RDS
Secrets Manager supports password rotation. This gave me an idea: why not set up rotation so that the password in the Terraform state is no longer valid? It is a best practice to change passwords from time to time, so this feels like a good solution.
It turned out that "supported by AWS" does not mean "easy" by any means. It took me quite some time to figure out every detail of the configuration to make it work.
First, I used the Data API, which does not go through the usual networking paths of a VPC but uses a different plane. Since that means there is no need to make the database reachable, I did not want to expose it just because of rotation. This turned out to be a complication later on.
Second, rotating a secret is a complicated process, consisting of Secrets Manager invoking a Lambda function multiple times with different steps. Luckily, AWS provides a template for this as well as implementations for different scenarios (such as this one for MySQL). Moreover, there are templates you can deploy directly.
Theoretically, these ready-made solutions should make it easy to implement everything. But combined with the Data API, things become complicated.
Changing a password can be done in two distinct ways: one is to use the ModifyDBCluster AWS API. The other one is to connect to the database, log in with the current credentials, then issue a SET PASSWORD command. While the first approach does not rely on connectivity to the database, the second one does.
The AWS scripts implement the second approach, so the Lambda they use requires a connection to the database as well as to the Secrets Manager service, as described in the documentation. Because of this, rotation requires a somewhat complicated VPC-based networking setup just to wire everything together.
To be usable with the AWS-provided scripts, the secret itself needs to encode everything required to change the password. The structure expected by the rotation function is described in the documentation:
{
  "engine": "mysql",
  "host": "<instance host name/resolvable DNS name>",
  "username": "<username>",
  "password": "<password>",
  "dbname": "<database name. If not specified, defaults to None>",
  "port": "<TCP port number. If not specified, defaults to 3306>"
}
For example:
{
  "engine": "mysql",
  "host": "tf-20220529115812774900000003.cluster-csmcnafptpfv.eu-central-1.rds.amazonaws.com",
  "password": "nAt?3e4dh~6f*=M}9,nsMYSLw(=nnx?x",
  "username": "admin"
}
As everything is in the secret, the function only needs the secret ARN, then it can read the host, the username, and the current password. Then it can log in, change the password, then save the new one in the secret.
Networking configuration
Since I did not want to expose the database to the Internet, I had to use a VPC and wire everything together inside it. This requires quite a few parts:
- 2 subnets (RDS needs at least 2)
- The rotator function configured with an endpoint in one of the subnets
- RDS cluster configured for the subnets
- Interface endpoint for Secrets Manager so the Lambda can call it
- Security Group so that things can talk to each other
With this, the database is still private and reachable only via the Data API as it was originally. The Lambda function has only limited access to the parts it needs to communicate with (Secrets Manager and the database). And since the rotator runs immediately after being enabled, the password in the Terraform state is not valid.
Let's see how to configure the different parts with Terraform!
VPC, Subnets, and Security Group
resource "aws_vpc" "db" {
  cidr_block = "10.0.0.0/16"
  # needed for the interface endpoint
  enable_dns_support   = true
  enable_dns_hostnames = true
}
# query the AZs
data "aws_availability_zones" "available" {
  state = "available"
}
# create 2 subnets
resource "aws_subnet" "db" {
  count             = 2
  vpc_id            = aws_vpc.db.id
  cidr_block        = cidrsubnet(aws_vpc.db.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]
}
# allow data flow between the components
resource "aws_security_group" "db" {
  vpc_id = aws_vpc.db.id
  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = [aws_vpc.db.cidr_block]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = [aws_vpc.db.cidr_block]
  }
}
This part creates two Subnets in two different Availability Zones, as RDS has this requirement. Then it creates a Security Group that allows all traffic inside the VPC.
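The group above allows every protocol and port inside the VPC. If that feels too broad, a stricter variant could open only the MySQL port and HTTPS. The sketch below is untested and rests on two assumptions: the cluster listens on the default port 3306, and the Data API keeps working without any ingress rule because it does not use the VPC networking path. The resource name db_strict is made up.

```hcl
# hypothetical stricter variant of the security group above
resource "aws_security_group" "db_strict" {
  vpc_id = aws_vpc.db.id

  # MySQL traffic from the rotator Lambda to the cluster
  ingress {
    from_port   = 3306
    to_port     = 3306
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.db.cidr_block]
  }
  # HTTPS traffic from the rotator Lambda to the Secrets Manager endpoint
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.db.cidr_block]
  }
  # outgoing MySQL connections, needed by the rotator Lambda
  egress {
    from_port   = 3306
    to_port     = 3306
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.db.cidr_block]
  }
  # outgoing HTTPS connections, needed by the rotator Lambda
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.db.cidr_block]
  }
}
```

Since the same group is attached to the cluster, the endpoint, and the Lambda in this article, it has to carry both the ingress and the egress rules.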
RDS Cluster
# initial password
resource "random_password" "db_master_pass" {
  length           = 40
  special          = true
  min_special      = 5
  override_special = "!#$%^&*()-_=+[]{}<>:?"
}
# the secret
resource "aws_secretsmanager_secret" "db-pass" {
  name = "db-pass-${random_id.id.hex}"
}
# initial version
resource "aws_secretsmanager_secret_version" "db-pass-val" {
  secret_id = aws_secretsmanager_secret.db-pass.id
  # encode in the required format
  secret_string = jsonencode(
    {
      username = aws_rds_cluster.cluster.master_username
      password = aws_rds_cluster.cluster.master_password
      engine   = "mysql"
      host     = aws_rds_cluster.cluster.endpoint
    }
  )
}
# add the cluster to the 2 subnets
resource "aws_db_subnet_group" "db" {
  subnet_ids = aws_subnet.db[*].id
}
resource "aws_rds_cluster" "cluster" {
  engine               = "aurora-mysql"
  engine_version       = "5.7.mysql_aurora.2.07.1"
  engine_mode          = "serverless"
  database_name        = "mydb"
  master_username      = "admin"
  master_password      = random_password.db_master_pass.result
  enable_http_endpoint = true
  skip_final_snapshot  = true
  # attach the security group
  vpc_security_group_ids = [aws_security_group.db.id]
  # deploy to the subnets
  db_subnet_group_name = aws_db_subnet_group.db.name
}
This part creates a Secret and a Secret Version. The latter contains the initial password to the database as well as the other required information for the rotation function.
Then it creates a Subnet Group with the two Subnets. Finally, it creates a cluster with the configuration.
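The snippets in this article reference random_id.id.hex for unique names, but its definition is not shown. A minimal sketch of what it might look like is below; the byte_length value is my assumption. The resource comes from the same hashicorp/random provider as random_password.

```hcl
# random suffix used as random_id.id.hex in the resource names
resource "random_id" "id" {
  byte_length = 8 # assumption: any reasonable length works here
}
```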
Secrets Manager Endpoint
resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = aws_vpc.db.id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.secretsmanager"
  vpc_endpoint_type   = "Interface"
  private_dns_enabled = true
  # deploy in the first subnet
  subnet_ids = [aws_subnet.db[0].id]
  # attach the security group
  security_group_ids = [aws_security_group.db.id]
}
This part creates an Interface Endpoint for Secrets Manager so the rotator Lambda can call the service to get and change the secret value.
Rotator stack
# find the details by id
data "aws_serverlessapplicationrepository_application" "rotator" {
  application_id = "arn:aws:serverlessrepo:us-east-1:297356227824:applications/SecretsManagerRDSMySQLRotationSingleUser"
}
data "aws_partition" "current" {}
data "aws_region" "current" {}
# deploy the cloudformation stack
resource "aws_serverlessapplicationrepository_cloudformation_stack" "rotate-stack" {
  name             = "Rotate-${random_id.id.hex}"
  application_id   = data.aws_serverlessapplicationrepository_application.rotator.application_id
  semantic_version = data.aws_serverlessapplicationrepository_application.rotator.semantic_version
  capabilities     = data.aws_serverlessapplicationrepository_application.rotator.required_capabilities
  parameters = {
    # secrets manager endpoint
    endpoint = "https://secretsmanager.${data.aws_region.current.name}.${data.aws_partition.current.dns_suffix}"
    # a random name for the function
    functionName = "rotator-${random_id.id.hex}"
    # deploy in the first subnet
    vpcSubnetIds = aws_subnet.db[0].id
    # attach the security group so it can communicate with the other components
    vpcSecurityGroupIds = aws_security_group.db.id
  }
}
The aws_serverlessapplicationrepository_application data source finds the application by its ID, then the aws_serverlessapplicationrepository_cloudformation_stack creates a CloudFormation stack. It passes the endpoint, functionName, vpcSubnetIds, and vpcSecurityGroupIds parameters. This creates a Lambda function with an endpoint in the correct VPC with permissions to talk to both the database and the Secrets Manager Endpoint.
Rotation
resource "aws_secretsmanager_secret_rotation" "rotation" {
  # secret_id through the secret_version so that it is deployed before setting up rotation
  secret_id           = aws_secretsmanager_secret_version.db-pass-val.secret_id
  rotation_lambda_arn = aws_serverlessapplicationrepository_cloudformation_stack.rotate-stack.outputs.RotationLambdaARN
  rotation_rules {
    automatically_after_days = 30
  }
}
Finally, this part configures rotation for the secret. Notice that the secret_id references the aws_secretsmanager_secret_version instead of the aws_secretsmanager_secret. This makes sure the initial password is in place before the rotation is set up so that the rotator Lambda can read the current value. Without it, there is a race condition: if the rotation is enabled first, the Lambda throws an error.
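The same ordering can also be expressed with an explicit depends_on. The sketch below is an alternative to the block above, not an addition to it, and I have not compared the two in practice; it only makes the dependency more visible to a reader.

```hcl
resource "aws_secretsmanager_secret_rotation" "rotation" {
  secret_id           = aws_secretsmanager_secret.db-pass.id
  rotation_lambda_arn = aws_serverlessapplicationrepository_cloudformation_stack.rotate-stack.outputs.RotationLambdaARN

  rotation_rules {
    automatically_after_days = 30
  }

  # explicit ordering: wait for the initial secret value before enabling rotation
  depends_on = [aws_secretsmanager_secret_version.db-pass-val]
}
```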
Testing
After running terraform apply and waiting for the deployment to finish, the CloudFormation stack is created.
The rotator function is run with all the steps needed to change the password.
Rotation is configured, and clicking "Rotate secret immediately" changes the stored password after a few seconds.
Costs
A secret costs $0.40 / month. The Interface endpoint costs $0.01 / hour, which is about $7.20 / month (720 hours). I think the RDS endpoint is part of the database and as such it does not incur additional costs, but I'm not 100% sure about that.
The Lambda, the VPC, the resources inside, and various calls have some costs, but should be negligible as rotation happens rarely.
In total, while this solution adds some fixed costs to the database, they stay at a manageable level.