Rate limiting is a control mechanism that dictates how often a user can access an API within a certain time frame. It prevents abuse and ensures fair resource usage among multiple clients.

1. What is Rate Limiting?
In the context of REST APIs, rate limiting refers to controlling the number of requests a client can make to an API within a specified period. If the request count exceeds the threshold defined by the rate limiter, the excess calls are blocked. Rate limiting is vital for maintaining server performance, ensuring fair usage across users/clients, and protecting against abuse (such as Denial of Service (DoS) attacks, brute-force attacks, data scraping, etc.) or traffic spikes that could overload the system.
Rate limiting is most important for paid APIs because they often operate on a pay-per-call or quota-based pricing model. When clients make frequent or high-volume API requests, rate limiting helps to avoid unintentional overages that can lead to unexpectedly high bills.
The following rules can be examples of rate limiting an API:
- A client can send no more than 20 requests per second.
- A client can send no more than 1000 requests within a minute.
- A client can send no more than 100,000 requests per day.
When the rate limit is reached, the server generally responds with HTTP status code 429 (Too Many Requests). The response headers should communicate the rate-limit condition(s) and include a Retry-After header indicating how long the client should wait before making a new request.
Rate limiting is NOT always based on the number of requests made to the server. Some providers instead meter the server resources consumed by requests.
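The per-second and per-minute rules above can be enforced with a simple counter keyed by client and time window. Below is a minimal illustrative sketch (the `FixedWindowLimiter` class and its method names are hypothetical, not from any particular library):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window`-second window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)  # (client_id, window_index) -> count

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        window_index = int(now // self.window)
        key = (client_id, window_index)
        if self.counters[key] >= self.limit:
            return False  # excess call is blocked
        self.counters[key] += 1
        return True

# "A client can send no more than 20 requests per second":
limiter = FixedWindowLimiter(limit=20, window_seconds=1)
```

In production, the counter dictionary would need periodic cleanup of old windows (or a store with key expiry, such as Redis) to avoid unbounded memory growth.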
2. HTTP Response Code and Rate Limit Headers
Suppose we have an API that allows up to 100 requests per minute, and the client has exceeded this limit. After the limit is breached, the client receives an HTTP 429 response with headers like the following:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1691172000
{
  "message": "Rate limit exceeded. Please wait before making more requests."
}
This response lets the client know they’ve exceeded the allowed rate and provides a precise wait time before retrying.
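A well-behaved client should honor that wait time. Note that Retry-After may carry either a delay in seconds or an HTTP-date, so the client has to handle both forms. The helper below is an illustrative sketch (the function name is hypothetical):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value, now=None):
    """Return how many seconds to wait, given a Retry-After header value.

    The header may carry either a delay in seconds ("60") or an
    HTTP-date ("Wed, 21 Oct 2023 07:28:00 GMT").
    """
    now = now or datetime.now(timezone.utc)
    try:
        return max(0, int(value))          # delta-seconds form
    except ValueError:
        retry_at = parsedate_to_datetime(value)  # HTTP-date form
        return max(0.0, (retry_at - now).total_seconds())
```

A retry loop would call this after every 429 response and `time.sleep()` for the returned number of seconds before retrying.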
The following table summarizes the standard Retry-After header and the additional headers (from the IETF's proposed RateLimit header fields) used to communicate rate limits and related information:
| HTTP Header | Description | Example Value |
|---|---|---|
| Retry-After | Amount of time (in seconds) or a specific date/time when the client should retry after hitting the limit. | 60 or Wed, 21 Oct 2023 07:28:00 GMT |
| RateLimit-Limit | Indicates the request quota in the time window. | 200 |
| RateLimit-Remaining | Indicates remaining requests quota in the current window. | 50 |
| RateLimit-Reset | Indicates the time remaining in the current window, specified in seconds. | 120 (seconds) |
Apart from standard headers, we can use custom rate-limit headers to convey specific information based on the requirements, such as:
| Custom Header | Description | Example Value |
|---|---|---|
| X-RateLimit-Type | Specifies the rate limit type, such as “user” for user-specific limits or “application” for client application-wide limits. | user / application |
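On the server side, these headers are typically assembled from the limiter's current state before the response is sent. A minimal illustrative sketch (the helper name and its parameters are hypothetical):

```python
def rate_limit_headers(limit, remaining, reset_seconds, limit_type="user"):
    """Build standard RateLimit-* headers plus the custom X-RateLimit-Type."""
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(remaining),
        "RateLimit-Reset": str(reset_seconds),
        "X-RateLimit-Type": limit_type,
    }
    if remaining == 0:
        # Quota exhausted: tell the client when it may retry.
        headers["Retry-After"] = str(reset_seconds)
    return headers
```

Attaching these headers to every response (not only 429s) lets well-behaved clients pace themselves before they ever hit the limit.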
3. Types of Rate Limits
In industry, the most popular rate limit strategy is time-based. In time-based rate limiting, clients are allowed to make a certain number of requests within a timeframe. We can further divide such limits into four categories:
| Type of Rate Limit | Description | Example |
|---|---|---|
| Fixed window | Dividing time into fixed windows and allowing a certain number of requests in each window. | Max 200 requests per minute |
| Sliding window | Smoothly limits based on a rolling time period to balance sudden spikes in usage. | Max 1,000 requests over any 10-minute span |
| Token bucket | Maintains a “bucket of tokens” where each token represents the capacity to execute a single request. The bucket is refilled with a fixed strategy. | Bucket of 100 tokens, refills 1 per second |
| Leaky bucket | Incoming requests join a queue (the bucket) and are released at a fixed rate, resembling water leaking out of a bucket at a steady pace. | Queue drained at 10 requests per second |
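The token bucket row above can be made concrete with a few lines of code. This is an illustrative sketch (the `TokenBucket` class is hypothetical, and the caller supplies the clock so the behavior is deterministic):

```python
class TokenBucket:
    """Bucket of `capacity` tokens, refilled at `refill_rate` tokens per second.

    Each request consumes one token; an empty bucket means the request is rejected.
    """

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.rate = refill_rate
        self.tokens = float(capacity)  # start full, allowing an initial burst
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because the bucket starts full, a client can burst up to `capacity` requests at once, then is throttled to the steady refill rate, which is exactly what distinguishes token bucket from a strict fixed window.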
To choose the best rate-limiting approach, organizations need to consider a few key things:
- The typical amount of traffic their system handles
- The kind of experience they want users to have
- And how well they want to protect against potentially harmful activities
The goal is to safeguard the system without making it difficult for genuine users to access the API. It’s also a good practice to periodically review and adjust rate limits to keep them effective against new risks or unusual traffic patterns.
4. Tools to Implement Rate Limiting
Perhaps the most critical decision when implementing rate limiting is where to implement it. We can implement rate limiting at the application level (in application source code), at the API gateway level (alongside the load balancer), or with a dedicated third-party tool built for this purpose. Each pattern has trade-offs in scalability, ease of implementation, and cost, so selecting the right one depends on your application's specific requirements.
- Small applications may work well with simple, in-memory counters or database-backed rate limits.
- Distributed applications may demand distributed solutions such as API gateways.
- High-volume APIs may mandate dedicated tools or CDN-based limits to handle scalability and ensure performance.
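For the first bullet, a sliding-window log is a common in-memory starting point. The sketch below is illustrative (the class name is hypothetical); in a distributed deployment the timestamp log would live in a shared store such as Redis (e.g., a sorted set) instead of process memory:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests within any rolling `window`-second span."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # timestamps of recent accepted requests

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Drop timestamps that have fallen out of the rolling window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```

Unlike a fixed window, this cannot be gamed by bursting at a window boundary, at the cost of storing one timestamp per accepted request.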
The following table provides an overview of different rate-limiting options, from simple setups for small apps to advanced solutions:
| Application Type | Pattern | Description | Tools/Technologies |
|---|---|---|---|
| Small Applications | In-Memory | Stores request counters in memory. | Local data structures (e.g., HashMap in Java, Python dictionaries) |
| Small Applications | Database-backed | Stores request data in a relational database. | MySQL, PostgreSQL, SQLite |
| Distributed Applications | Distributed Cache | Stores request counters in a distributed cache shared across servers. | Redis, Memcached |
| Distributed Applications | Edge / CDN | Limits requests at the network edge via CDNs or proxy servers. | Cloudflare, Akamai, Fastly |
| Distributed Applications | API Gateway | Enforces rate limits through an API gateway, handling traffic and managing quotas. | AWS API Gateway, Kong Gateway, Google Cloud Endpoints |
| High-Volume APIs | Dedicated Rate Limiting Services | Specialized tools for high-traffic APIs with advanced throttling and monitoring. | Throttlestop, TrafficGuard, Rate-Limiter-Flex |
| High-Volume APIs | Premium API Gateway Plans | API gateways with advanced rate-limiting features designed for high-traffic, enterprise-level APIs. | Apigee, MuleSoft API Manager, Azure API Management |
Refer to each tool's documentation for detailed setup and configuration information.
5. Best Practices
Consider the following best practices as per your application needs:
- Apply Limits at Different Levels: Implement rate limits at the user, IP, or API key level to gain flexibility. For example, user-level limits help manage individual abuse, while IP-level limits protect against DDoS attacks. Application-level limits can be applied for overall traffic control.
- Implement Graceful Error Handling: When a user exceeds the rate limit, respond with a clear 429 Too Many Requests status and provide rate-limit headers to improve the user experience and encourage respectful usage patterns.
- Monitor and Log Requests: Keep track of request patterns, exceeded limits, and blocked requests to analyze trends and adjust rate-limit policies as needed.
- Plan for Burst Control Mechanisms: Allow occasional bursts by setting burst limits within a fixed period. For example, raise limits during seasonal peaks (such as festive periods) to accommodate legitimate spikes without impacting user experience.
- Set Up Alerts: Configure alerts to notify users when they approach or hit the limit, and suggest upgrading their plan to avoid further throttling.
- Test, Test, Test: Test rate limits across various client types and quotas to ensure that limits are correctly applied at all levels.
Following these best practices can help ensure that rate limiting is effectively implemented and enhances the user experience while protecting system resources.
Useful Resources:
- https://developers.cloudflare.com/waf/rate-limiting-rules/request-rate/
- https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28
- https://techdocs.akamai.com/purge-cache/reference/rate-limit-response-examples
- https://docs.stripe.com/rate-limits
- https://datatracker.ietf.org/doc/html/rfc6585#page-3
- https://github.com/Salah856/System-Design/blob/main/Design%20Rate%20Limiter.md