Rate limiting is a control mechanism that dictates how often a user can access an API within a certain time frame. It prevents abuse and ensures fair resource usage among multiple clients.

1. What is Rate Limiting?
In the context of REST APIs, rate limiting refers to controlling the number of requests a client can make to an API within a specified period. If the request count exceeds the threshold defined by the rate limiter, the excess calls are blocked. Rate limiting is vital for maintaining server performance, ensuring fair usage across users/clients, and protecting against abuse (such as Denial of Service (DoS) attacks, brute-force attacks, data scraping, etc.) or traffic spikes that could overload the system.
Rate limiting is most important for paid APIs because they often operate on a pay-per-call or quota-based pricing model. When clients make frequent or high-volume API requests, rate limiting helps to avoid unintentional overages that can lead to unexpectedly high bills.
The following rules can be examples of rate limiting an API:
- A client can send no more than 20 requests per second.
- A client can send no more than 1000 requests within a minute.
- A client can send no more than 100,000 requests per day.
When the rate limit is reached, the server generally responds with HTTP status code 429 (Too Many Requests). The response headers should communicate the rate-limit condition(s) and include a Retry-After header indicating how long the client should wait before making a new request.
Rate limiting is NOT always based on the number of requests made to the server. Some providers instead meter the server resources consumed by requests.
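The per-second and per-minute rules above can be enforced with a simple counter keyed by client and time window. Below is a minimal illustrative sketch (the `FixedWindowLimiter` class and its method names are hypothetical, not from any particular library):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window`-second window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)  # (client_id, window_index) -> count

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        window_index = int(now // self.window)
        key = (client_id, window_index)
        if self.counters[key] >= self.limit:
            return False  # excess call is blocked
        self.counters[key] += 1
        return True

# "A client can send no more than 20 requests per second":
limiter = FixedWindowLimiter(limit=20, window_seconds=1)
```

In production, the counter dictionary would need periodic cleanup of old windows (or a store with key expiry, such as Redis) to avoid unbounded memory growth.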
2. HTTP Response Code and Rate Limit Headers
Suppose we have an API that allows up to 100 requests per minute, and the client has exceeded this limit. After the limit is breached, the client receives an HTTP 429 response with headers like the following:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1691172000
{
  "message": "Rate limit exceeded. Please wait before making more requests."
}
This response lets the client know they’ve exceeded the allowed rate and provides a precise wait time before retrying.
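A well-behaved client should honor that wait time. Note that Retry-After may carry either a delay in seconds or an HTTP-date, so the client has to handle both forms. The helper below is an illustrative sketch (the function name is hypothetical):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value, now=None):
    """Return how many seconds to wait, given a Retry-After header value.

    The header may carry either a delay in seconds ("60") or an
    HTTP-date ("Wed, 21 Oct 2023 07:28:00 GMT").
    """
    now = now or datetime.now(timezone.utc)
    try:
        return max(0, int(value))          # delta-seconds form
    except ValueError:
        retry_at = parsedate_to_datetime(value)  # HTTP-date form
        return max(0.0, (retry_at - now).total_seconds())
```

A retry loop would call this after every 429 response and `time.sleep()` for the returned number of seconds before retrying.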
The following table summarizes the standard Retry-After header and the additional headers (from the IETF's proposed RateLimit header fields) used to communicate rate limits and related information:
| HTTP Header | Description | Example Value |
|---|---|---|
| Retry-After | Amount of time (in seconds) or a specific date/time when the client should retry after hitting the limit. | 60 or Wed, 21 Oct 2023 07:28:00 GMT |
| RateLimit-Limit | Indicates the request quota in the time window. | 200 |
| RateLimit-Remaining | Indicates remaining requests quota in the current window. | 50 |
| RateLimit-Reset | Indicates the time remaining in the current window, specified in seconds. | 120 (seconds) |
Apart from standard headers, we can use custom rate-limit headers to convey specific information based on the requirements, such as:
| Custom Header | Description | Example Value |
|---|---|---|
| X-RateLimit-Type | Specifies the rate limit type, such as “user” for user-specific limits or “application” for client application-wide limits. | user / application |
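On the server side, these headers are typically assembled from the limiter's current state before the response is sent. A minimal illustrative sketch (the helper name and its parameters are hypothetical):

```python
def rate_limit_headers(limit, remaining, reset_seconds, limit_type="user"):
    """Build standard RateLimit-* headers plus the custom X-RateLimit-Type."""
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(remaining),
        "RateLimit-Reset": str(reset_seconds),
        "X-RateLimit-Type": limit_type,
    }
    if remaining == 0:
        # Quota exhausted: tell the client when it may retry.
        headers["Retry-After"] = str(reset_seconds)
    return headers
```

Attaching these headers to every response (not only 429s) lets well-behaved clients pace themselves before they ever hit the limit.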
3. Types of Rate Limits
In industry, the most popular rate limit strategy is time-based. In time-based rate limiting, clients are allowed to make a certain number of requests within a timeframe. We can further divide such limits into four categories:
| Type of Rate Limit | Description | Example |
|---|---|---|
| Fixed window | Dividing time into fixed windows and allowing a certain number of requests in each window. | Max 200 requests per minute |
| Sliding window | Smoothly limits based on a rolling time period to balance sudden spikes in usage. | Max 1,000 requests over any 10-minute span |
| Token bucket | Maintains a “bucket of tokens” where each token represents the capacity to execute a single request. The bucket is refilled with a fixed strategy. | Bucket of 100 tokens, refills 1 per second |
| Leaky bucket | Incoming requests join a queue (the bucket) and are released at a fixed rate, resembling water leaking out of a bucket at a steady pace. | Queue drained at 10 requests per second |
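The token bucket row above can be made concrete with a few lines of code. This is an illustrative sketch (the `TokenBucket` class is hypothetical, and the caller supplies the clock so the behavior is deterministic):

```python
class TokenBucket:
    """Bucket of `capacity` tokens, refilled at `refill_rate` tokens per second.

    Each request consumes one token; an empty bucket means the request is rejected.
    """

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.rate = refill_rate
        self.tokens = float(capacity)  # start full, allowing an initial burst
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because the bucket starts full, a client can burst up to `capacity` requests at once, then is throttled to the steady refill rate, which is exactly what distinguishes token bucket from a strict fixed window.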
To choose the best rate-limiting approach, organizations need to consider a few key things:
- The typical amount of traffic their system handles
- The kind of experience they want users to have
- And how well they want to protect against potentially harmful activities
The goal is to safeguard the system without making it difficult for genuine users to access the API. It’s also a good practice to periodically review and adjust rate limits to keep them effective against new risks or unusual traffic patterns.
4. Tools to Implement Rate Limiting
Perhaps the most critical decision when implementing rate limiting is where to implement it. We can implement rate limiting at the application level (in application source code), at the API gateway level (alongside the load balancer), or with a dedicated third-party tool built for this purpose. Each pattern has trade-offs in scalability, ease of implementation, and cost, so selecting the right one depends on your application's specific requirements.
- Small applications may work well with simple, in-memory counters or database-backed rate limits.
- Distributed applications may demand distributed solutions such as API gateways.
- High-volume APIs may mandate dedicated tools or CDN-based limits to handle scalability and ensure performance.
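For the first bullet, a sliding-window log is a common in-memory starting point. The sketch below is illustrative (the class name is hypothetical); in a distributed deployment the timestamp log would live in a shared store such as Redis (e.g., a sorted set) instead of process memory:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests within any rolling `window`-second span."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # timestamps of recent accepted requests

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Drop timestamps that have fallen out of the rolling window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```

Unlike a fixed window, this cannot be gamed by bursting at a window boundary, at the cost of storing one timestamp per accepted request.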
The following table provides an overview of different rate-limiting options, from simple setups for small apps to advanced solutions:
| Application Type | Pattern | Description | Tools/Technologies |
|---|---|---|---|
| Small Applications | In-Memory | Stores request counters in memory. | Local data structures (e.g., HashMap in Java, Python dictionaries) |
| Small Applications | Database-backed | Stores request data in a relational database. | MySQL, PostgreSQL, SQLite |
| Distributed Applications | Distributed Cache | Stores request counters in a distributed cache shared across servers. | Redis, Memcached |
| Distributed Applications | Edge / CDN | Limits requests at the network edge via CDNs or proxy servers. | Cloudflare, Akamai, Fastly |
| Distributed Applications | API Gateway | Enforces rate limits through an API gateway, handling traffic and managing quotas. | AWS API Gateway, Kong Gateway, Google Cloud Endpoints |
| High-Volume APIs | Dedicated Rate Limiting Services | Specialized tools for high-traffic APIs with advanced throttling and monitoring. | Throttlestop, TrafficGuard, Rate-Limiter-Flex |
| High-Volume APIs | Premium API Gateway Plans | API gateways with advanced rate-limiting features designed for high-traffic, enterprise-level APIs. | Apigee, MuleSoft API Manager, Azure API Management |
Refer to each tool's documentation for detailed setup and configuration information.
5. Best Practices
Consider the following best practices as per your application needs:
- Apply Limits at Different Levels: Implement rate limits at the user, IP, or API key level to gain flexibility. For example, user-level limits help manage individual abuse, while IP-level limits protect against DDoS attacks. Application-level limits can be applied for overall traffic control.
- Implement Graceful Error Handling: When a user exceeds the rate limit, respond with a clear 429 Too Many Requests status and provide rate-limit headers to improve the user experience and encourage respectful usage patterns.
- Monitor and Log Requests: Keep track of request patterns, exceeded limits, and blocked requests to analyze trends and adjust rate-limit policies as needed.
- Plan for Burst Control Mechanisms: Allow occasional bursts by setting burst limits within a fixed period. For example, raise limits during seasonal peaks (such as festive periods) to accommodate legitimate spikes without impacting user experience.
- Set Up Alerts: Configure alerts to notify users when they approach or hit the limit, and suggest upgrading their plan to avoid further throttling.
- Test, Test, Test: Test rate limits across various client types and quotas to ensure that limits are correctly applied at all levels.
Following these best practices can help ensure that rate limiting is effectively implemented and enhances the user experience while protecting system resources.
Useful Resources:
- https://developers.cloudflare.com/waf/rate-limiting-rules/request-rate/
- https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28
- https://techdocs.akamai.com/purge-cache/reference/rate-limit-response-examples
- https://docs.stripe.com/rate-limits
- https://datatracker.ietf.org/doc/html/rfc6585#page-3
- https://github.com/Salah856/System-Design/blob/main/Design%20Rate%20Limiter.md