Alexey Baltacov for AWS Community Builders

Posted on Mar 22 • Originally published at linkedin.com

AWS WAF Rate Limiting Based on Origin Response

#aws #waf #cloudfront #security

Introduction

You have a public website fronted by Amazon CloudFront that serves static files from S3. Customers access these files via direct URLs and must be able to download any file at any time without interference. At the same time, you want to stop malicious actors from crawling your entire bucket.

The Challenge

Goal: Prevent automated scanning of all URLs while still allowing legitimate customers unlimited downloads of the specific files they need.
Constraint: No user login or authentication. Files are freely downloadable, so you cannot simply gate them behind a sign-in flow.

Why Plain AWS WAF Rate Limiting Is Not Enough

AWS WAF lets you define rate-limit rules keyed by source IP or by fingerprinting mechanisms such as JA3 and JA4. In theory, you could set:

A low limit such as 10 requests per minute, which blocks scanners effectively but risks blocking legitimate high-throughput customers.
A high limit, which lets scanners creep through, especially if attackers distribute requests across IPs or devices.

The result is an uncomfortable trade-off: too low hurts real users, too high fails to stop attackers.

AWS WAF's View of Origin Responses and ATP Rules

By default, custom AWS WAF rules only inspect request attributes. They do not know whether your origin returned 200 OK or 404 Not Found.

Among the built-in AWS WAF rules, response inspection is limited to fraud-control managed rule groups such as Account Takeover Prevention (ATP). Those require you to map login fields and are designed for authentication endpoints, not static file downloads.

Why Lambda@Edge Alone Cannot Solve It

Lambda@Edge runs per request and has no built-in shared global state. It cannot maintain counters across all executions, so by itself it cannot enforce a global request threshold.
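
To make the limitation concrete, here is a minimal sketch that models each execution environment as an object with its own in-memory counter. The EdgeInstance class, the round-robin distribution, and the numbers are hypothetical and only illustrate why local counters undercount global traffic.

```javascript
// Hypothetical model: each edge execution environment keeps its own counter.
class EdgeInstance {
  constructor() {
    this.count = 0; // lives only in this instance's memory
  }

  handle() {
    this.count += 1;
    return this.count;
  }
}

// Spread `total` requests round-robin over `instanceCount` isolated instances
// and report the highest count any single instance observed.
function maxLocalCount(total, instanceCount) {
  const instances = Array.from({ length: instanceCount }, () => new EdgeInstance());
  for (let i = 0; i < total; i += 1) {
    instances[i % instanceCount].handle();
  }
  return Math.max(...instances.map((inst) => inst.count));
}

const LIMIT = 10;
const seen = maxLocalCount(40, 8); // 40 requests over 8 instances: 5 each
console.log(seen > LIMIT ? 'limit detected' : 'limit missed'); // limit missed
```

Forty requests arrive in total, yet no single instance ever sees more than five, so a per-instance check against a limit of 10 never fires.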

A Hybrid Approach: WAF + Lambda@Edge

You can combine WAF's global counting capabilities with Lambda@Edge's ability to modify HTTP responses.

1) Primary WAF Rate-Limit Rule (Soft Threshold)

Type: Rate-based statement, for example 10 requests per 5 minutes
Action: Count (not Block)
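Custom request header: Insert X-RateLimit-Exceeded: true

With a Count action, AWS WAF can only insert headers into the request, not the response, and it prefixes custom header names with x-amzn-waf-. So if you specify X-RateLimit-Exceeded, downstream components will see this request header: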

x-amzn-waf-x-ratelimit-exceeded
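
As a small illustration of the naming rule, the helper below derives the key to look up in a Lambda@Edge headers map, where CloudFront presents header names in lowercase. The function name wafHeaderKey is hypothetical.

```javascript
const WAF_HEADER_PREFIX = 'x-amzn-waf-';

// Returns the lowercase key under which a WAF-inserted header is expected
// to appear in a Lambda@Edge headers map.
function wafHeaderKey(configuredName) {
  return (WAF_HEADER_PREFIX + configuredName).toLowerCase();
}

console.log(wafHeaderKey('X-RateLimit-Exceeded')); // x-amzn-waf-x-ratelimit-exceeded
```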

2) Secondary WAF Rate-Limit Rule (Hard Threshold)

Type: Rate-based statement with a much higher threshold, for example 1,000 requests per 5 minutes
Action: Block

This immediately stops heavy-volume attacks at the WAF layer, prevents excessive Lambda@Edge invocations, and reduces cost.
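
The two rules can be sketched as plain objects in the shape of the WAFv2 Rule structure, ready to be placed in a Web ACL's Rules array. This is a sketch under assumptions: the rule names, priorities, metric names, and the choice of IP as the aggregation key are illustrative, and the Count rule is assumed to insert its header through custom request handling.

```javascript
// Hypothetical helper: builds both rate-based rules as WAFv2-style objects.
// Names, priorities and metric names are illustrative, not from the article.
function buildRateRules({ softLimit = 10, hardLimit = 1000, windowSec = 300 } = {}) {
  const visibility = (metricName) => ({
    SampledRequestsEnabled: true,
    CloudWatchMetricsEnabled: true,
    MetricName: metricName,
  });

  const rateStatement = (limit) => ({
    RateBasedStatement: {
      Limit: limit,
      EvaluationWindowSec: windowSec,
      AggregateKeyType: 'IP',
    },
  });

  return [
    {
      // Hard threshold: terminate the request at the WAF layer.
      Name: 'hard-rate-limit',
      Priority: 0,
      Statement: rateStatement(hardLimit),
      Action: { Block: {} },
      VisibilityConfig: visibility('hardRateLimit'),
    },
    {
      // Soft threshold: let the request through, but tag it with a header.
      Name: 'soft-rate-limit',
      Priority: 1,
      Statement: rateStatement(softLimit),
      Action: {
        Count: {
          CustomRequestHandling: {
            InsertHeaders: [{ Name: 'X-RateLimit-Exceeded', Value: 'true' }],
          },
        },
      },
      VisibilityConfig: visibility('softRateLimit'),
    },
  ];
}

console.log(JSON.stringify(buildRateRules(), null, 2));
```

Count is a non-terminating action, so the relative order of the two rules does not change the outcome; the block rule is listed first here only so that heavy offenders are rejected as early as possible.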

3) Lambda@Edge Function (Origin Response Trigger)

Trigger: CloudFront Origin Response event

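exports.handler = async (event) => {
  const { request, response } = event.Records[0].cf;

  // AWS WAF inserts custom headers into the request, prefixed with x-amzn-waf-.
  // CloudFront lowercases header keys in the event object.
  const flag = request.headers['x-amzn-waf-x-ratelimit-exceeded'];
  const overSoftLimit = Boolean(flag && flag[0].value === 'true');

  // Only error responses count against the client. Successful responses
  // (200, 206 range requests, 304 revalidations) pass through untouched.
  const isError = Number(response.status) >= 400;

  if (overSoftLimit && isError) {
    // Replace the origin's error with a generated 429 response.
    return {
      status: '429',
      statusDescription: 'Too Many Requests',
      headers: {
        'content-type': [{ key: 'Content-Type', value: 'text/html' }]
      },
      body: '<html><body><h1>Rate Limit Reached</h1><p>Please try again later.</p></body></html>'
    };
  }

  // Otherwise return the origin response unchanged.
  return response;
};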
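
To exercise a function like this locally before deploying, you can build a minimal origin-response event by hand. The sketch below is hypothetical: makeOriginResponseEvent and readWafFlag are illustrative helpers, the URI is made up, and it assumes the WAF-inserted header arrives on the request object of the event.

```javascript
const FLAG_KEY = 'x-amzn-waf-x-ratelimit-exceeded';

// Builds a minimal CloudFront origin-response event for local testing.
// Only the fields this pattern reads are populated.
function makeOriginResponseEvent({ status, flagged }) {
  const requestHeaders = {};
  if (flagged) {
    requestHeaders[FLAG_KEY] = [{ key: FLAG_KEY, value: 'true' }];
  }
  return {
    Records: [{
      cf: {
        config: { eventType: 'origin-response' },
        request: { method: 'GET', uri: '/files/example.bin', headers: requestHeaders },
        response: { status: String(status), statusDescription: '', headers: {} },
      },
    }],
  };
}

// Reads the soft-limit flag the same way a handler would.
function readWafFlag(event) {
  const flag = event.Records[0].cf.request.headers[FLAG_KEY];
  return Boolean(flag && flag[0].value === 'true');
}

console.log(readWafFlag(makeOriginResponseEvent({ status: 404, flagged: true })));  // true
console.log(readWafFlag(makeOriginResponseEvent({ status: 404, flagged: false }))); // false
```

Passing such an event to your handler in a unit test lets you check each combination of flag and status without generating real traffic.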

Associate and Deploy

Attach your Web ACL, containing both rate-limit rules and any ATP group you choose to use, to the CloudFront distribution.
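Deploy the Lambda@Edge function through the CloudFront console or the AWS CLI. Lambda@Edge functions must be created in us-east-1 and associated with the distribution by published version number.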

Testing

Legitimate Access

Repeatedly fetch an existing file. The soft-limit counter will increment, but users will still receive the file until the hard threshold is crossed.

Scanning Attempts

Request many non-existent URLs. Once the client passes the soft threshold, each error response is replaced with your custom 429 page. Extreme traffic volumes hit the hard threshold and are blocked at WAF before Lambda@Edge runs.
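
The expected outcomes of both test scenarios can be summarized with a simplified model of the two-tier decision. This is a sketch, not how AWS WAF is implemented: real rate counts are approximate and lag slightly, the thresholds are the example values from above, and treating only 4xx and 5xx statuses as errors is an assumption.

```javascript
// Simplified model: which response a client sees, given how many requests it
// has made in the current window and what the origin would return.
function outcome(requestsInWindow, originStatus, { soft = 10, hard = 1000 } = {}) {
  if (requestsInWindow > hard) {
    return 'blocked by WAF (403)'; // never reaches the origin or Lambda@Edge
  }
  const flagged = requestsInWindow > soft; // Count rule tags the request
  if (flagged && originStatus >= 400) {
    return '429 from Lambda@Edge';
  }
  return `origin response ${originStatus}`;
}

console.log(outcome(5, 404));    // origin response 404
console.log(outcome(50, 200));   // origin response 200
console.log(outcome(50, 404));   // 429 from Lambda@Edge
console.log(outcome(5000, 200)); // blocked by WAF (403)
```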

Benefits of This Pattern

Precision: Rate limits are tied to actual Not Found or error responses.
User Experience: Legitimate customers getting 200 OK responses are not blocked unless they truly exceed your thresholds.
Cost Efficiency: High-volume attacks are stopped before Lambda@Edge runs.
AWS-native design: Uses AWS WAF, CloudFront, and Lambda@Edge without adding external state stores or proxy layers.

References

Originally published on LinkedIn on May 8, 2025.

Top comments (8)

Warren Parad AWS Community Builders • Mar 22

What's the point of returning a 429 on legitimate 404 response codes? Why would this be better than just the rate limiting rule?

Alexey Baltacov AWS Community Builders • Mar 22 • Edited

Sometimes you still want to tell the real user the real reason.
Sometimes you want to hide the real issue from the user.
But you are right: you can return any error code that suits you and your use case.

Regarding regular rate limiting: it is unaware of origin responses.

Warren Parad AWS Community Builders • Mar 22

What do you mean by the real reason?

If you have already completely correctly handled the full request, and generated a response code, I can't fathom a reason to return a 429 at that moment. Returning a 429 will force a retry, which will create more load. I'm just going to go out and say, I can't think of any reason why this implementation should ever be used.

If you wanted to rate limit requests, you would just move the actual BLOCK to the incoming WAF request, and not even execute your code. If you have already executed your code, and you want to rate limit, you can return a 429 from inside your code. Once you have moved outside to the origin response, the WAF has decided not to rate limit AND your application has decided not to rate limit, so returning a 429 response is just going to be completely unnecessary and, worse, negatively impactful both for your users and your service.

Alexey Baltacov AWS Community Builders • Mar 22

In my use case, the backend for this origin is S3, with a large number of files that have random names.

What I needed was to allow users to download a file from a specific link they receive, as many times as needed — even 100 times per minute.

At the same time, I wanted to block any attempts to guess the random names of other files after only a very small number of attempts (around 3–5).

So you are probably right that returning 429 may create more traffic than 503. That is exactly why I shared the function code — you can implement whatever behavior makes the most sense here.

Warren Parad AWS Community Builders • Mar 22

Two things. First, for blog posts, I highly recommend starting with the problem statement that represents the business use case. This helps to orient readers.

Second, S3 presigned GET requests or CloudFront signed URLs both solve this problem for you, as you don't even need to have your WAF consider this scenario. S3 requests will go directly through AWS infra, and CloudFront will directly block any request that isn't signed.

Using a WAF here and the CF Function handling adds unnecessary complexity.

Alexey Baltacov AWS Community Builders • Mar 22

You can use the mechanism described in the blog in many different scenarios, not just with S3. The origin does not have to be hosted on AWS.

It is also not limited to static URLs — it can be applied anywhere you need visibility into the origin’s response.

Warren Parad AWS Community Builders • Mar 22

I think I would recommend coming up with one real world use case then where that would actually be the case, because it feels mostly theoretical rather than something that would be done in practice. But that's just me.

Alexey Baltacov AWS Community Builders • Mar 23 • Edited

The one that comes to mind immediately is a DoS scenario, where your application starts returning errors after a minimal number of specially crafted requests, which you want to block without affecting legitimate users. Of course, not with a 429 in that case.

Actually, you probably know that, for example, in F5 this functionality (being aware of origin responses) is used in WAF protections, while in AWS WAF it is found only in relatively rare cases, such as ATP or login page protection.