How CloudFront routing works
How CloudFront uses path-based routing to select where to forward a request
CloudFront is a proxy that sits between the users and the backend servers, called origins. When a request comes in, CloudFront forwards it to one of the origins.
Let's see which parts of the distribution configuration decide how the routing happens!
Path-based routing
Without CloudFront, each origin has its own domain name or IP address, and clients connect to it directly. For example, EC2 servers can have Elastic IPs, and an API Gateway has its own domain under https://<id>.execute-api.<region>.amazonaws.com.
But when these services are behind CloudFront, clients reach all of them through a single domain, either the default <id>.cloudfront.net or a custom one. Because of this, host-based routing is not possible.
Routing in CloudFront is based on the path of the request. A request that goes to https://<id>.cloudfront.net/api/users has the path /api/users. This is the distinguishing factor that decides which backend server the request goes to.
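To make "the path of the request" concrete, here is a minimal Python sketch that extracts it from a URL. The distribution domain and the `request_path` helper are made up for illustration. The sketch assumes the query string is not part of the path used for routing.

```python
from urllib.parse import urlsplit


def request_path(url: str) -> str:
    """Return the path component of a URL, ignoring host and query string."""
    # An empty path (e.g. "https://example.com") is treated as "/".
    return urlsplit(url).path or "/"


# Hypothetical distribution domain, used only for illustration.
url = "https://d111111abcdef8.cloudfront.net/api/users?page=2"
print(request_path(url))  # /api/users
```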
Cache behavior configuration
Cache behaviors are the unit of configuration that decides what happens with an incoming request. They define how to transform the request and the response, how to cache, what to include or exclude, and most importantly, which origin to forward to.
Each behavior has a path pattern that defines which paths it can handle. This is a filter expression: an incoming request either matches the pattern or it does not.
A path pattern supports the * and ? wildcards, where the former matches 0 or more characters and the latter exactly one. This is not a regex engine, so don't plan to write complicated patterns here.
Usually, path patterns fall into one of three categories:
- Exact matching: robots.txt, 404.html
- Start of the path: /api/*, /files/*
- End of the path, usually the extension: *.jpg, *.html
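The wildcard rules and the three categories above can be approximated in a few lines of Python. This is a sketch of the matching semantics, not CloudFront's actual implementation. The `path_pattern_matches` helper is hypothetical, and it assumes that a leading slash in the pattern is optional and that matching is case-sensitive. It needs Python 3.9+ for `str.removeprefix`.

```python
import re


def path_pattern_matches(pattern: str, path: str) -> bool:
    """Check a request path against a pattern that uses only * and ?."""
    # Assumption: a leading slash is optional in the pattern, so strip one
    # from both sides before comparing.
    pattern = pattern.removeprefix("/")
    path = path.removeprefix("/")

    regex_parts = []
    for char in pattern:
        if char == "*":
            regex_parts.append(".*")  # 0 or more characters
        elif char == "?":
            regex_parts.append(".")  # exactly one character
        else:
            regex_parts.append(re.escape(char))  # everything else is literal

    # fullmatch: the pattern must cover the whole path, not just a part of it.
    return re.fullmatch("".join(regex_parts), path, flags=re.DOTALL) is not None


examples = [
    ("robots.txt", "/robots.txt"),  # exact match
    ("/api/*", "/api/users"),  # start of the path
    ("*.jpg", "/images/cat.jpg"),  # end of the path
    ("/img/?.png", "/img/ab.png"),  # ? needs exactly one character, so no match
]
for pattern, path in examples:
    print(f"{pattern!r} vs {path!r}: {path_pattern_matches(pattern, path)}")
```

Only `*` and `?` are translated, and every other character is escaped. Regex syntax in a pattern therefore has no special meaning, which mirrors the "not a regex engine" point.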
Precedence
As more than one cache behavior can match a given path (/api/image.jpg is matched by both /api/* and *.jpg), CloudFront needs to break this tie. Because of this, there is an ordering between the behaviors.
There is exactly one behavior that has the default (*) path pattern, which is called the default cache behavior. It matches all requests and is always the last one.
When a request reaches the distribution, CloudFront starts from the top of the list and compares the request path with the path pattern of each cache behavior in order. The first one that matches wins.
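The first-match-wins rule can be sketched as a simple loop over an ordered list. The `matches` and `first_matching_pattern` helpers are hypothetical, and the matcher is a simplified stand-in for CloudFront's own (assuming an optional leading slash and only the * and ? wildcards). It needs Python 3.9+.

```python
import re


def matches(pattern: str, path: str) -> bool:
    """Simplified wildcard match: * is 0 or more characters, ? is exactly one."""
    regex = "".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c)
        for c in pattern.removeprefix("/")
    )
    return re.fullmatch(regex, path.removeprefix("/"), flags=re.DOTALL) is not None


def first_matching_pattern(ordered_patterns: list[str], path: str) -> str:
    """Walk the behaviors from the top and return the first pattern that matches."""
    for pattern in ordered_patterns:
        if matches(pattern, path):
            return pattern
    # Unreachable when the list ends with the default (*) behavior.
    raise LookupError("no default (*) behavior configured")


# The default behavior is always last.
behaviors = ["/api/*", "*.jpg", "*"]
print(first_matching_pattern(behaviors, "/api/image.jpg"))  # /api/*
print(first_matching_pattern(["*.jpg", "/api/*", "*"], "/api/image.jpg"))  # *.jpg
print(first_matching_pattern(behaviors, "/index.html"))  # *
```

Swapping the first two behaviors changes which one handles /api/image.jpg, so the order of the list is part of the routing configuration.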
Origin
Each cache behavior defines an origin via its Origin ID. The first matching behavior's origin will be used for the request.
To find the origin configuration, look up the origin whose Origin ID matches. That origin's configuration contains the domain to which CloudFront forwards the request.
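The lookup from a matched behavior to its origin can be sketched as follows. The class and field names are simplified stand-ins for the distribution configuration, and the origin IDs and domains are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Origin:
    origin_id: str
    domain_name: str  # where CloudFront forwards the request


@dataclass
class CacheBehavior:
    path_pattern: str
    target_origin_id: str  # refers to an Origin by its ID


def origin_for(behavior: CacheBehavior, origins: list[Origin]) -> Origin:
    """Return the origin whose ID matches the behavior's Origin ID."""
    for origin in origins:
        if origin.origin_id == behavior.target_origin_id:
            return origin
    raise LookupError(f"no origin with ID {behavior.target_origin_id!r}")


# Hypothetical origins; the domains are placeholders.
origins = [
    Origin("api", "abc123.execute-api.us-east-1.amazonaws.com"),
    Origin("static", "example-bucket.s3.amazonaws.com"),
]

# Assume this behavior was already selected as the first match for the request.
matched = CacheBehavior(path_pattern="/api/*", target_origin_id="api")
print(origin_for(matched, origins).domain_name)
```

Because behaviors reference origins by ID, several behaviors can share one origin without duplicating its configuration.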