How CloudFront routing works

How CloudFront uses path-based routing to select where to forward a request

Tamás Sallai

CloudFront is a proxy that sits between the users and the backend servers, called origins. When a request comes in, CloudFront forwards it to one of the origins.

Let’s see which parts of the distribution configuration decide how the routing happens!

Path-based routing

Without CloudFront, each origin has its own name or IP address where it can be accessed, and clients connect to them directly. For example, EC2 servers can have Elastic IPs, and an API Gateway has its own domain under https://<id>.execute-api.<region>.amazonaws.com.

Diagram: Direct access. The visitor connects directly to the EC2 servers, the S3 bucket with static assets, and the API Gateway, each on its own address (10.0.0.1, example.com, ...execute-api.amazonaws.com).

But when these services are behind CloudFront, they share a single domain: either the default <id>.cloudfront.net or a custom one. Because of this, CloudFront cannot route requests to origins based on the hostname.

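Diagram: Path-based routing. The visitor connects only to example.com, which is served by CloudFront edge locations. CloudFront then forwards each request to the EC2 servers, the S3 bucket with static assets, or the API Gateway based on the request path, such as /api for the API.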

Routing in CloudFront is based on the path of the request. A request that goes to https://<id>.cloudfront.net/api/users has the path /api/users. This is the distinguishing factor that decides which backend server the request goes to.
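To make "the path" concrete, here is a minimal Python sketch that extracts the part of a request URL used for routing. It assumes, for illustration, that the host (shared by every origin) and the query string play no part in selecting a behavior; the distribution domain d111111abcdef8.cloudfront.net and the helper name routing_path are placeholders.

```python
from urllib.parse import urlsplit


def routing_path(url: str) -> str:
    """Return the path component of a request URL (hypothetical helper)."""
    # The host is identical for all origins behind the distribution, and the
    # query string is a separate URL component, so only .path is kept.
    return urlsplit(url).path


print(routing_path("https://d111111abcdef8.cloudfront.net/api/users?limit=10"))
# prints /api/users
```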


Cache behavior configuration

Cache behaviors are the unit of configuration that decides what happens with an incoming request. They define how to transform the request and the response, how to cache, what to include or exclude, and, most importantly, which origin to forward to.

Each behavior has a path pattern that defines which paths it can handle. This is a filter expression: an incoming request either matches the pattern or it does not.

A path pattern supports the * and ? wildcards, where the former matches 0 or more characters and the latter exactly one. This is not a regex engine, so don’t plan to write complicated patterns here.

Usually, path patterns fall into one of three categories:

  • Exact matching: robots.txt, 404.html
  • Start of the path: /api/*, /files/*
  • End of the path, usually the extension: *.jpg, *.html
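The wildcard rules and the three categories can be tried out with a small matcher. This is a sketch under stated assumptions, not CloudFront's actual implementation: * matches any run of characters (including /, which is why *.jpg also matches /api/image.jpg), ? matches exactly one character, matching is case-sensitive, and a pattern that starts with neither / nor * is treated as if it started with /. The function name matches is hypothetical.

```python
import re


def matches(pattern: str, path: str) -> bool:
    """Check a path against a *-and-? wildcard pattern (simplified model)."""
    # Assumption: a pattern like "robots.txt" is treated as "/robots.txt".
    if not pattern.startswith(("/", "*")):
        pattern = "/" + pattern
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # zero or more characters, slashes included
        elif ch == "?":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    # fullmatch: the whole path must match, not just a prefix of it
    return re.fullmatch("".join(parts), path) is not None


print(matches("robots.txt", "/robots.txt"))      # True  (exact matching)
print(matches("/api/*", "/api/users"))           # True  (start of the path)
print(matches("*.jpg", "/api/image.jpg"))        # True  (end of the path)
print(matches("/files/?.txt", "/files/ab.txt"))  # False (? is one character)
```

Translating each wildcard to its regex equivalent and escaping everything else keeps characters like . literal, so *.jpg does not accidentally match /api/imagexjpg.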

Precedence

As more than one cache behavior can match a given path (/api/image.jpg is matched by both /api/* and *.jpg), CloudFront needs to break the tie. To do this, the behaviors are ordered.

There is exactly one behavior with the default (*) path pattern, which is called the default cache behavior. It matches all requests, and it is always the last one.

When a request reaches the distribution, CloudFront starts from the top and tries to match the path patterns for each cache behavior. The first one that matches wins.
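The first-match-wins rule can be modelled as a loop over an ordered list, with the default behavior as the fallback. In this minimal sketch, the Behavior type, its name field, and the select_behavior function are hypothetical, and the matcher uses the same simplified wildcard assumptions as before (* spans any characters, ? is one character, a missing leading / is added). It shows how reordering the same two behaviors changes which one handles /api/image.jpg.

```python
import re
from typing import List, NamedTuple


class Behavior(NamedTuple):
    path_pattern: str
    name: str  # hypothetical label, only used to tell behaviors apart


def matches(pattern: str, path: str) -> bool:
    # Simplified wildcard rules: * = zero or more characters, ? = exactly one
    if not pattern.startswith(("/", "*")):
        pattern = "/" + pattern
    regex = "".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern
    )
    return re.fullmatch(regex, path) is not None


def select_behavior(ordered: List[Behavior], default: Behavior, path: str) -> Behavior:
    # Try the behaviors from the top; the first matching path pattern wins.
    for behavior in ordered:
        if matches(behavior.path_pattern, path):
            return behavior
    # The default (*) behavior is always last and matches every request.
    return default


default = Behavior("*", "default")
api = Behavior("/api/*", "api")
images = Behavior("*.jpg", "images")

print(select_behavior([api, images], default, "/api/image.jpg").name)  # api
print(select_behavior([images, api], default, "/api/image.jpg").name)  # images
print(select_behavior([api, images], default, "/index.html").name)     # default
```

Keeping the default behavior outside the ordered list guarantees it can only ever be evaluated last, which mirrors the rule that it is always at the bottom.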


Origin

Each cache behavior defines an origin via its Origin ID. The first matching behavior’s origin will be used for the request.

To find the origin configuration, select the origin with the matching Origin ID. This configuration contains the domain name to which CloudFront forwards the request.
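Putting the pieces together, the whole lookup goes from path to cache behavior to origin domain. The sketch below uses a hypothetical, heavily trimmed configuration whose keys (Origins, Id, DomainName, CacheBehaviors, PathPattern, TargetOriginId, DefaultCacheBehavior) follow the shape of CloudFront's DistributionConfig; the origin IDs and domains are made up, the origin_domain function is hypothetical, and the wildcard matcher is the same simplified model as above.

```python
import re
from typing import Any, Dict

# Hypothetical, trimmed-down distribution config. The keys mirror the
# CloudFront DistributionConfig structure; the IDs and domains are made up.
CONFIG: Dict[str, Any] = {
    "Origins": {"Items": [
        {"Id": "api-origin", "DomainName": "abc123.execute-api.us-east-1.amazonaws.com"},
        {"Id": "assets-origin", "DomainName": "my-assets.s3.amazonaws.com"},
    ]},
    "CacheBehaviors": {"Items": [
        {"PathPattern": "/api/*", "TargetOriginId": "api-origin"},
    ]},
    "DefaultCacheBehavior": {"TargetOriginId": "assets-origin"},
}


def matches(pattern: str, path: str) -> bool:
    # Simplified wildcard rules: * = zero or more characters, ? = exactly one
    if not pattern.startswith(("/", "*")):
        pattern = "/" + pattern
    regex = "".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern
    )
    return re.fullmatch(regex, path) is not None


def origin_domain(config: Dict[str, Any], path: str) -> str:
    # 1. The first matching cache behavior wins; the default one is the fallback.
    behavior = next(
        (b for b in config["CacheBehaviors"]["Items"] if matches(b["PathPattern"], path)),
        config["DefaultCacheBehavior"],
    )
    # 2. Find the origin whose Id equals the behavior's TargetOriginId.
    origins = {o["Id"]: o for o in config["Origins"]["Items"]}
    return origins[behavior["TargetOriginId"]]["DomainName"]


print(origin_domain(CONFIG, "/api/users"))   # abc123.execute-api.us-east-1.amazonaws.com
print(origin_domain(CONFIG, "/index.html"))  # my-assets.s3.amazonaws.com
```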

18 September 2020