How CloudFront determines the origin request URL

How the origin config, the cache behavior, and the viewer request determine the URL sent to the origin

Tamás Sallai

CloudFront transforms the requests it sends to the origins. How it does so depends on the origin config, the cache behavior config, and the viewer request path (the one that the visitor sends to CloudFront). Among these transformations, the construction of the URL is the most important, especially the path of the origin request.

The path of the request selects the directory and the object key for an S3 origin: /index.html is a different object than /bucket/index.html. In REST APIs, the path selects the resource to query: /user is a different resource than /group.

In this article, we’ll look into what configurations in CloudFront influence the request it sends to the origin. We’ll discuss the path in detail as that can easily cause problems.

Origin config

When you add an origin, a few settings affect the URL sent to that origin. These specify the beginning of the URL (scheme, host, port, and the beginning of the path), and since they are properties of the origin, they affect all requests sent to it.

The requests to this origin go to https://example.com:443/path.

Let’s break down each part!

The scheme comes from the Origin Protocol Policy. It can be HTTP only, HTTPS only, or it can match the viewer request. In this case, it is HTTPS only, so the URL starts with https://.

The domain is straightforward: it’s the Origin Domain Name.

The port is selected from the two port settings, HTTP Port and HTTPS Port, according to the scheme in use. Since HTTPS is configured for this origin, the HTTPS Port is the effective one here, which is 443. As this is the default for HTTPS connections, it is also the port used if you don’t specify one at all: https://example.com is the same as https://example.com:443.

Finally, the Origin Path defines the start of the path. Every request to this origin will start with this.
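To make the rules above concrete, here is a minimal Python sketch of how the fixed part of the origin URL could be assembled. The OriginConfig class, its field names, and the policy strings are assumptions made for illustration; they are not CloudFront API names, and the function only models the behavior described in this section.

```python
from dataclasses import dataclass


@dataclass
class OriginConfig:
    # Hypothetical model of the origin settings discussed above.
    domain_name: str
    protocol_policy: str  # "http-only", "https-only", or "match-viewer"
    http_port: int = 80
    https_port: int = 443
    origin_path: str = ""


DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_base_url(origin: OriginConfig, viewer_scheme: str = "https") -> str:
    """Return the scheme, host, port, and Origin Path part of the origin URL."""
    if origin.protocol_policy == "http-only":
        scheme = "http"
    elif origin.protocol_policy == "https-only":
        scheme = "https"
    else:
        # "match-viewer": reuse the scheme of the viewer request
        scheme = viewer_scheme

    # The scheme decides which of the two port settings is effective.
    port = origin.https_port if scheme == "https" else origin.http_port

    # A default port can be left out: https://example.com == https://example.com:443
    host = origin.domain_name
    if port != DEFAULT_PORTS[scheme]:
        host = f"{host}:{port}"

    return f"{scheme}://{host}{origin.origin_path}"


example = OriginConfig("example.com", "https-only", origin_path="/path")
print(origin_base_url(example))
```

With the settings used in this article (HTTPS only, port 443, Origin Path of /path), the sketch yields https://example.com/path, which is equivalent to https://example.com:443/path.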


Viewer request path

The rest of the path comes from the viewer request. Its path is appended to the Origin Path and that makes the full path.

A few examples (the /path comes from the Origin Path setting):

  • <id>.cloudfront.net => /path/
  • <id>.cloudfront.net/index.html => /path/index.html
  • <id>.cloudfront.net/api/users => /path/api/users
  • <id>.cloudfront.net/folder/file.txt => /path/folder/file.txt

You can see that the /path that comes from the origin settings is always present and the path from the incoming request is copied in full.
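The mapping in the examples above can be sketched as a small Python function. The function name is hypothetical, and the code only models the concatenation described here; it is not how CloudFront is implemented.

```python
def origin_request_path(origin_path: str, viewer_path: str) -> str:
    """Append the viewer request path to the Origin Path."""
    # A request to the bare domain has the path "/".
    if not viewer_path.startswith("/"):
        viewer_path = "/" + viewer_path
    # Drop a trailing slash from the Origin Path to avoid "//" in the result.
    return origin_path.rstrip("/") + viewer_path


for viewer_path in ["/", "/index.html", "/api/users", "/folder/file.txt"]:
    print(viewer_path, "=>", origin_request_path("/path", viewer_path))
```

An empty Origin Path leaves the viewer path unchanged, so a request for /index.html reaches the origin as /index.html.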

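[Figure: Request forwarding. The visitor's requests pass through the cache behaviors to the origins. /api/user matches the /api/* behavior and is forwarded to the API origin (API Gateway) as /api/user; /index.html matches the * behavior and is forwarded to the bucket origin (S3) as /index.html.]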

Cache behavior config

And we’ve arrived at the part that causes the most confusion. How does the cache behavior affect the path?

Every behavior has a Path Pattern that defines what paths it can serve.

Under a distribution, the list of cache behaviors gives an overview of these path patterns.

The Path Pattern does not directly influence the path of the origin request but it does define which behaviors can handle the request. And since the cache behavior selects the origin, the viewer request path will always match the behavior’s path pattern.

For example, if the behavior has a path pattern of /api/* and the origin path is /path, then all origin request paths will start with /path/api/.

A few examples:

  • /api/users => /path/api/users
  • /api/ => /path/api/
  • / => handled by a different cache behavior
  • /users => handled by a different cache behavior
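The selection step in these examples can be sketched in Python as follows. This is a simplified model under stated assumptions: behaviors are checked in order and the first match wins, * matches zero or more characters, ? matches exactly one character, and matching is case-sensitive. The origin names "api" and "bucket" and both function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a path pattern with * and ? wildcards into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # zero or more characters
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def select_behavior(behaviors: List[Tuple[str, str]], viewer_path: str) -> Optional[str]:
    """Return the origin of the first behavior whose pattern matches the path."""
    for pattern, origin_name in behaviors:
        if pattern_to_regex(pattern).fullmatch(viewer_path):
            return origin_name
    return None


# The default behavior (*) comes last and catches everything else.
behaviors = [("/api/*", "api"), ("*", "bucket")]

for path in ["/api/users", "/api/", "/", "/users"]:
    print(path, "=>", select_behavior(behaviors, path))
```

Note that the function only picks the origin. The viewer path is passed on unchanged, which is why a request matched by /api/* still carries /api/ at the start of its path when it reaches the origin.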

In many cases, this can be a problem: two origins that both expect requests at the root path (/) cannot be easily integrated into a single CloudFront distribution.

Let’s say there are two S3 buckets, and you want to host their contents under /bucket1 and /bucket2. The two cache behaviors for these buckets have path patterns of /bucket1/* and /bucket2/*. When a request hits the first behavior, CloudFront sends the origin request to <bucket1>/bucket1/..., and for the second behavior, it sends it to <bucket2>/bucket2/.... To serve files from these buckets, you need to put the contents into a bucket1 folder in the first bucket and a bucket2 folder in the second.

Unfortunately, CloudFront does not provide an easy way to remove parts of the path from the origin request. You can use Lambda@Edge, but that is a complicated solution for a seemingly simple problem.
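For illustration, a Lambda@Edge function attached as an origin request trigger could strip the prefix before the request is forwarded. The following Python handler is a minimal sketch: the PREFIX value is a hypothetical example taken from the two-bucket scenario, and deployment details (IAM role, function version, association with the cache behavior) are left out.

```python
# Hypothetical prefix to remove; it matches the /bucket1/* path pattern.
PREFIX = "/bucket1"


def handler(event, context):
    """Origin request trigger that removes PREFIX from the request path."""
    # Lambda@Edge passes the request under Records[0].cf.request.
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]

    # Only rewrite when the prefix is a whole path segment,
    # so that /bucket10/file.txt is left untouched.
    if uri == PREFIX or uri.startswith(PREFIX + "/"):
        request["uri"] = uri[len(PREFIX):] or "/"

    # Returning the request tells CloudFront to forward it to the origin.
    return request
```

With this in place, a viewer request for /bucket1/index.html would reach the bucket as /index.html, at the cost of an extra function to deploy and maintain.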


Conclusion

The URL CloudFront sends to the origin is determined by the origin config and the viewer request. While the cache behavior does not change the request path, it still determines which requests it can handle. As a consequence, the requests sent to the origin will match the path pattern of the behavior.

25 September 2020