Implementing Rate Limiting: Patterns and Best Practices
We didn’t implement rate limiting because we read a security best practices checklist. We implemented it because one enthusiastic customer — let’s call them a “power user” — wrote a script that hit our login endpoint 50,000 times in an hour and took the API down for everyone else.
That’s the moment rate limiting stops being theoretical. It’s not about punishing users. It’s about making sure one actor can’t starve the rest. It’s about protecting your database from itself. It’s about sleeping through the night without wondering if someone’s scraping your entire dataset at 3 AM.
After rate limiting APIs handling millions of requests per day, here’s what actually scales — the algorithms, the Redis patterns, and the client-facing details that separate “we blocked you” from “here’s when you can try again.”
Why Rate Limiting Exists
Rate limiting is traffic control for your API. Without it:
- One abusive client becomes everyone’s outage
- Your cloud bill becomes a function of someone else’s bad loop
- Fair usage stops being fair
- Login endpoints become brute-force welcome mats
With it, you cap damage, preserve capacity for legitimate users, and get a lever for tiered pricing (free gets 100/hour, enterprise gets 10,000/hour). Everybody wins except the script kiddie. They get a 429 and a Retry-After header.
The Algorithm Zoo: Pick Your Poison
Fixed Window: Simple, Sneaky
Count requests per time bucket. Easy to implement. Has a famous flaw:
```python
import time

import redis

redis_client = redis.Redis(host='localhost', port=6379, db=0)


def fixed_window_rate_limit(key, limit, window):
    """
    Fixed window: e.g. 100 requests per 60-second window
    Problem: allows up to 200 requests across a window boundary
    """
    current_window = int(time.time() / window)
    redis_key = f"rate_limit:{key}:{current_window}"

    # INCR and EXPIRE in one MULTI/EXEC transaction, so a crash between
    # them can't leave behind a counter that never expires
    pipe = redis_client.pipeline()
    pipe.incr(redis_key)
    pipe.expire(redis_key, window)
    current, _ = pipe.execute()

    return current <= limit
```
At the boundary between windows, a client can send 100 requests at 0:59 and 100 more at 1:00. You allowed 200 in two seconds while thinking you allowed 100 per minute. Fixed windows are fine for coarse protection. Don’t use them for strict quotas.
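A minimal in-memory replay makes the boundary problem concrete. It is a sketch, not production code: fixed_window_allowed is a hypothetical helper that applies the same bucketing rule as the Redis version (bucket id = whole windows since the epoch) to a list of made-up timestamps.

```python
from collections import defaultdict


def fixed_window_allowed(timestamps, limit, window):
    """Replay timestamps (seconds) through a fixed-window counter; return how many pass."""
    counts = defaultdict(int)
    allowed = 0
    for t in timestamps:
        bucket = int(t // window)  # same bucket id as int(time.time() / window)
        counts[bucket] += 1
        if counts[bucket] <= limit:
            allowed += 1
    return allowed


# 100 requests between 0:59.00 and 0:59.99, then 100 between 1:00.00 and 1:00.99
burst = [59.0 + i / 100 for i in range(100)] + [60.0 + i / 100 for i in range(100)]
print(fixed_window_allowed(burst, limit=100, window=60))  # 200
```

Every request passes because the two halves of the burst land in different buckets, each of which is individually under the limit.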
Sliding Window Log: Accurate, Memory-Hungry
Track every request timestamp. Precise. Expensive:
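```python
import uuid


def sliding_window_log_rate_limit(key, limit, window):
    """
    Sliding window log: track every allowed request
    Accurate but uses more memory
    """
    now = time.time()
    redis_key = f"rate_limit:{key}"

    # Remove entries older than the window
    redis_client.zremrangebyscore(redis_key, 0, now - window)

    # Count requests still inside the window.
    # Check-then-add is NOT atomic: two concurrent requests can both pass
    # this check. The Lua version under Distributed Rate Limiting fixes that.
    current = redis_client.zcard(redis_key)

    if current < limit:
        # Unique member, so two requests with the same timestamp don't collide
        redis_client.zadd(redis_key, {f"{now}:{uuid.uuid4().hex}": now})
        redis_client.expire(redis_key, int(window))
        return True

    return False
```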
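Every allowed request is a sorted set entry, so each key holds up to the limit's worth of members. Generous limits across many users add up fast: 10,000/hour for 100,000 active users is up to a billion entries. Great for low-volume, high-stakes endpoints (login, password reset). Overkill for general API traffic.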
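Replaying the same boundary burst through an in-memory sliding window log shows both the accuracy and the memory bound. SlidingWindowLog is a hypothetical stand-in for the sorted set: a deque of allowed timestamps, trimmed the way ZREMRANGEBYSCORE trims scores at or below now - window.

```python
from collections import deque


class SlidingWindowLog:
    """In-memory sliding window log: one stored timestamp per allowed request."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.log = deque()

    def allow(self, now):
        # Evict timestamps that fell out of the window (like ZREMRANGEBYSCORE 0 now-window)
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False


burst = [59.0 + i / 100 for i in range(100)] + [60.0 + i / 100 for i in range(100)]
limiter = SlidingWindowLog(limit=100, window=60)
print(sum(limiter.allow(t) for t in burst))  # 100: the second half is rejected
print(len(limiter.log))                      # 100: storage never exceeds the limit
```

Only allowed requests are stored, so the cost scales with limit times active keys, not with raw request volume.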
Sliding Window Counter: The Sweet Spot
Approximate a sliding window by summing several small fixed sub-windows. Good enough for almost everything:
```python
def sliding_window_counter_rate_limit(key, limit, window, sub_windows=10):
    """
    Sliding window counter: approximate a sliding window
    by summing the most recent fixed sub-windows
    """
    now = time.time()
    sub_window_size = window / sub_windows
    current_sub_window = int(now / sub_window_size)

    # Increment the current sub-window (rejected requests count too)
    redis_key = f"rate_limit:{key}:{current_sub_window}"
    pipe = redis_client.pipeline()
    pipe.incr(redis_key)
    pipe.expire(redis_key, int(window))
    pipe.execute()

    # Fetch all sub-window counters in one round trip; missing keys come back as None
    keys = [f"rate_limit:{key}:{current_sub_window - i}" for i in range(sub_windows)]
    total = sum(int(count) for count in redis_client.mget(keys) if count)

    return total <= limit
```
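Ten sub-windows keep the error small: a request is forgotten somewhere between 0.9 and 1.0 windows after it arrived, so the effective window drifts by at most one sub-window (10%). You pay for ten small counters per key instead of one entry per request. This is where most APIs should start.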
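A minimal in-memory copy of the sub-window logic shows where that drift comes from. SubWindowCounter is a hypothetical name and the clock is passed in explicitly; the bucketing mirrors the Redis function. A request is dropped when its whole sub-window ages out, not exactly one window after its own timestamp.

```python
from collections import defaultdict


class SubWindowCounter:
    """In-memory version of the Redis sub-window counter (10 buckets per window)."""

    def __init__(self, limit, window, buckets=10):
        self.limit = limit
        self.buckets = buckets
        self.size = window / buckets
        self.counts = defaultdict(int)

    def count(self, now):
        current = int(now / self.size)
        return sum(self.counts[current - i] for i in range(self.buckets))

    def allow(self, now):
        # Count first, then check, like the Redis version (rejected requests count too)
        self.counts[int(now / self.size)] += 1
        return self.count(now) <= self.limit


c = SubWindowCounter(limit=100, window=60)
c.allow(5.9)          # one request late in sub-window 0, which covers [0, 6) seconds
print(c.count(59.9))  # 1: still counted
print(c.count(60.0))  # 0: forgotten only 54.1 seconds after it arrived
```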
Token Bucket: When Bursts Are Features, Not Bugs
Some use cases want to allow short bursts — a user loads a dashboard that fires 20 parallel requests. Token bucket says “you can spike, but you refill slowly.”
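Stripped of Redis, the refill rule is a few lines of arithmetic. This InMemoryTokenBucket is a hypothetical single-process sketch that takes the clock as an argument so the behavior is easy to follow: tokens refill lazily at refill_rate per second, capped at capacity.

```python
class InMemoryTokenBucket:
    """Token bucket refill math without Redis; the caller supplies the clock."""

    def __init__(self, capacity, refill_rate, now=0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.tokens = float(capacity)   # start full
        self.last_refill = now

    def consume(self, now, tokens=1):
        # Lazily refill for the time elapsed since the previous call
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


bucket = InMemoryTokenBucket(capacity=20, refill_rate=1)
print(sum(bucket.consume(now=0.0) for _ in range(25)))  # 20: the burst fits, 5 rejected
print(bucket.consume(now=0.5))                          # False: only half a token back
print(bucket.consume(now=1.0))                          # True: one full token refilled
```

The Redis version runs the same read, refill, consume, write sequence, but inside a Lua script so concurrent requests can't interleave.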
```python
import time


class TokenBucket:
    def __init__(self, redis_client, key, capacity, refill_rate):
        """
        capacity: Maximum tokens (burst size)
        refill_rate: Tokens added per second
        """
        self.redis = redis_client
        self.key = f"token_bucket:{key}"
        self.capacity = capacity
        self.refill_rate = refill_rate

    # Lua script for atomic read-refill-consume-write
    LUA_SCRIPT = """
    local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'last_refill')
    local capacity = tonumber(ARGV[1])
    local now = tonumber(ARGV[2])
    local refill_rate = tonumber(ARGV[3])
    local requested = tonumber(ARGV[4])

    -- A missing bucket starts full
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    -- Refill tokens (guard against clock skew between app servers)
    local elapsed = math.max(0, now - last_refill)
    tokens = math.min(capacity, tokens + (elapsed * refill_rate))

    -- Check if enough tokens
    local allowed = 0
    if tokens >= requested then
        tokens = tokens - requested
        allowed = 1
    end

    redis.call('HMSET', KEYS[1], 'tokens', tokens, 'last_refill', now)
    -- Keep the key long enough to refill completely, then let it expire
    redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refill_rate) + 1)
    -- Return tokens as a string: Redis truncates Lua numbers to integers
    return {allowed, tostring(tokens)}
    """

    def consume(self, tokens=1):
        now = time.time()
        allowed, remaining = self.redis.eval(
            self.LUA_SCRIPT, 1, self.key,
            self.capacity, now, self.refill_rate, tokens,
        )
        remaining = float(remaining)
        # Seconds until enough tokens have refilled for this request
        retry_after = max(0.0, (tokens - remaining) / self.refill_rate)
        return {
            'allowed': allowed == 1,
            'remaining': int(remaining),
            'retry_after': retry_after,
        }


# Usage
def handle_request(redis_client):
    bucket = TokenBucket(redis_client, 'user:123', capacity=100, refill_rate=10)
    result = bucket.consume(tokens=5)
    if not result['allowed']:
        return f"Rate limit exceeded. Try again in {result['retry_after']:.0f} seconds"
    return "OK"  # process the request here
```
The Lua script is non-negotiable. Without atomic read-modify-write, two concurrent requests both see 1 token left and both proceed. You’ve built a rate limiter that doesn’t rate limit.
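The failure mode is easy to reproduce in-process. In this sketch (all class names hypothetical), NaiveBucket reads the token count and writes it back in a separate step, like an HMGET followed by an HMSET issued from application code; a threading.Barrier forces both requests to read before either writes. AtomicBucket does the same check under a lock, the single-process analogue of running it as one Lua script.

```python
import threading


class NaiveBucket:
    """Read, then write in a separate step: no atomicity."""

    def __init__(self, tokens, barrier=None):
        self.tokens = tokens
        self.barrier = barrier

    def consume(self):
        seen = self.tokens          # step 1: read (like HMGET)
        if self.barrier:
            self.barrier.wait()     # both requests have read before either writes
        if seen >= 1:
            self.tokens = seen - 1  # step 2: write (like HMSET)
            return True
        return False


class AtomicBucket:
    """Same logic, but check-and-decrement happens under one lock."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.lock = threading.Lock()

    def consume(self):
        with self.lock:
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


def run_two(bucket):
    """Send two concurrent requests and return their sorted results."""
    results = []
    threads = [threading.Thread(target=lambda: results.append(bucket.consume()))
               for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


print(run_two(NaiveBucket(1, threading.Barrier(2))))  # [True, True]: 1 token, 2 admitted
print(run_two(AtomicBucket(1)))                       # [False, True]
```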
Distributed Rate Limiting: Multiple Servers, One Counter
In-memory rate limiting per server means a user with a 100 req/min limit gets 100 per server. Three servers? 300. Redis centralizes the count:
```python
import time
import uuid


class DistributedRateLimiter:
    # Sliding window log, executed atomically inside Redis
    LUA_SCRIPT = """
    local key = KEYS[1]
    local limit = tonumber(ARGV[1])
    local window = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local member = ARGV[4]

    -- Clean up old entries
    redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

    -- Count current requests
    local current = redis.call('ZCARD', key)

    if current < limit then
        -- Unique member, so same-timestamp requests don't overwrite each other
        redis.call('ZADD', key, now, member)
        redis.call('EXPIRE', key, math.ceil(window))
        return {1, limit - current - 1, window}
    end

    -- Get oldest request to calculate reset time
    local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
    local reset_time = 0
    if oldest[2] then
        reset_time = math.ceil(tonumber(oldest[2]) + window - now)
    end
    return {0, 0, reset_time}
    """

    def __init__(self, redis_client):
        self.redis = redis_client

    def check_rate_limit(self, identifier, limit, window):
        """
        Distributed rate limiting using Redis
        Works across multiple application servers
        """
        key = f"rate_limit:{identifier}"
        now = time.time()
        member = f"{now}:{uuid.uuid4().hex}"

        allowed, remaining, reset_time = self.redis.eval(
            self.LUA_SCRIPT, 1, key, limit, window, now, member
        )

        return {
            'allowed': allowed == 1,
            'remaining': remaining,
            'reset_time': reset_time,  # seconds from now, not a timestamp
        }
```
Redis becomes a single point of failure. Run it with replication or accept that rate limiting disappears when Redis dies — and decide whether that’s fail-open (allow traffic) or fail-closed (block everyone). We chose fail-open for availability, fail-closed for auth endpoints.
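One way to make that policy explicit is to wrap every limiter call. This is a sketch under assumptions: with_failure_policy is a hypothetical helper, and the caller decides which exceptions mean "the backend is down". With redis-py that would be redis.exceptions.ConnectionError and redis.exceptions.TimeoutError; the demo uses Python's built-in ConnectionError so it runs without Redis.

```python
from typing import Callable


def with_failure_policy(check: Callable[[], dict], *, fail_open: bool,
                        backend_errors: tuple) -> dict:
    """Run a rate-limit check; if the backend is unreachable, apply the chosen policy."""
    try:
        return check()
    except backend_errors:
        # Backend down: fail-open admits the request, fail-closed rejects it.
        # 'degraded' lets you log and alert on how often this happens.
        return {'allowed': fail_open, 'remaining': None,
                'reset_time': None, 'degraded': True}


def redis_down():
    raise ConnectionError("Redis unreachable")


print(with_failure_policy(redis_down, fail_open=True,
                          backend_errors=(ConnectionError,))['allowed'])   # True
print(with_failure_policy(redis_down, fail_open=False,
                          backend_errors=(ConnectionError,))['allowed'])  # False
```

General routes would pass fail_open=True and auth routes fail_open=False, for example with_failure_policy(lambda: limiter.check_rate_limit(key, 5, 300), fail_open=False, backend_errors=(redis.exceptions.ConnectionError, redis.exceptions.TimeoutError)). Errors outside backend_errors, such as bugs in your own code, still propagate.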
Middleware: Where Limits Meet Requests
Express.js
```javascript
const express = require('express');
const redis = require('redis');
const { RateLimiterRedis } = require('rate-limiter-flexible');

const app = express();

// node-redis v3-style options; disabling the offline queue makes
// commands fail fast instead of piling up while Redis is down
const redisClient = redis.createClient({
  host: process.env.REDIS_HOST,
  port: process.env.REDIS_PORT,
  enable_offline_queue: false,
});

// Create rate limiter
const rateLimiter = new RateLimiterRedis({
  storeClient: redisClient,
  keyPrefix: 'rl',
  points: 100, // Number of requests
  duration: 60, // Per 60 seconds
});

const rateLimiterMiddleware = async (req, res, next) => {
  // Use user ID if authenticated, otherwise IP address
  const key = req.user?.id || req.ip;
  try {
    await rateLimiter.consume(key);
    next();
  } catch (rejRes) {
    if (rejRes instanceof Error) {
      // Redis error, not a rate limit: fail open
      next();
      return;
    }
    // Rate limit exceeded
    const retryAfter = Math.ceil(rejRes.msBeforeNext / 1000);
    res.set('Retry-After', String(retryAfter));
    res.status(429).json({ error: 'Too many requests', retryAfter });
  }
};

app.use('/api/', rateLimiterMiddleware);
```
rate-limiter-flexible handles the Redis plumbing. Roll your own if you enjoy debugging race conditions.
Laravel
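```php
<?php

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Redis;
use Illuminate\Support\Str;

class RateLimitMiddleware
{
    public function handle(Request $request, Closure $next, $limit = 60, $window = 60)
    {
        $key = $this->resolveRequestSignature($request);
        $redis = Redis::connection();

        $lua = "
            local key = KEYS[1]
            local limit = tonumber(ARGV[1])
            local window = tonumber(ARGV[2])
            local now = tonumber(ARGV[3])
            local member = ARGV[4]

            redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
            local current = redis.call('ZCARD', key)

            if current < limit then
                redis.call('ZADD', key, now, member)
                redis.call('EXPIRE', key, window)
                return {1, limit - current - 1, 0}
            end

            local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
            local retry_after = window
            if oldest[2] then
                retry_after = math.ceil(tonumber(oldest[2]) + window - now)
            end
            return {0, 0, retry_after}
        ";

        // Sub-second score plus a unique member: with time() as both score and
        // member, every request in the same second collapses into one entry
        $now = microtime(true);
        $member = $now . ':' . Str::random(8);
        $result = $redis->eval($lua, 1, "rate_limit:{$key}", $limit, $window, $now, $member);

        if ($result[0] == 0) {
            return response()->json([
                'error' => 'Rate limit exceeded'
            ], 429)->header('Retry-After', (string) $result[2]);
        }

        $response = $next($request);
        // headers->set() works on every response type, including streamed and file responses
        $response->headers->set('X-RateLimit-Remaining', (string) $result[1]);

        return $response;
    }

    protected function resolveRequestSignature(Request $request)
    {
        // Use user ID if authenticated, otherwise IP
        return $request->user()
            ? "user:{$request->user()->id}"
            : "ip:{$request->ip()}";
    }
}
```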
Authenticated users get rate limited by user ID. Anonymous traffic by IP. Shared NAT offices will share a bucket — document that or offer API keys.
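Offering API keys changes the key-resolution order. A minimal sketch, with rate_limit_key as a hypothetical helper: prefer an API key when one is sent, then the user ID, then the IP. Hashing the API key is our own assumption, meant to keep raw credentials out of Redis key names and monitoring output.

```python
import hashlib
from typing import Optional


def rate_limit_key(api_key: Optional[str], user_id: Optional[int], ip: str) -> str:
    """Pick the most specific identity available: API key, then user, then IP."""
    if api_key:
        # Hash so the raw key never appears in Redis or in dashboards
        digest = hashlib.sha256(api_key.encode()).hexdigest()[:16]
        return f"key:{digest}"
    if user_id is not None:
        return f"user:{user_id}"
    return f"ip:{ip}"


print(rate_limit_key(None, 42, "203.0.113.7"))    # user:42
print(rate_limit_key(None, None, "203.0.113.7"))  # ip:203.0.113.7
```

Two people behind the same office NAT with separate API keys now get separate buckets, even when neither is logged in.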
Strategies Beyond “100 Per Minute for Everyone”
Tiered Limits
Free users and enterprise customers shouldn’t share the same bucket:
```python
def get_user_rate_limit(user_id, user_tier):
    """
    Different limits based on user tier
    """
    limits = {
        'free': {'limit': 100, 'window': 3600},          # 100/hour
        'premium': {'limit': 1000, 'window': 3600},      # 1000/hour
        'enterprise': {'limit': 10000, 'window': 3600},  # 10000/hour
    }
    return limits.get(user_tier, limits['free'])
```
Rate limits become a product feature. “Upgrade for higher limits” is a sentence your sales team can use.
Per-Endpoint Limits
Login and search shouldn’t share limits. Login gets brute-forced. Search gets scraped:
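```python
# Different limits for different endpoints
ENDPOINT_LIMITS = {
    '/api/login': {'limit': 5, 'window': 300},    # 5 per 5 minutes
    '/api/search': {'limit': 30, 'window': 60},   # 30 per minute
    '/api/data': {'limit': 100, 'window': 3600},  # 100 per hour
}
DEFAULT_LIMIT = {'limit': 60, 'window': 60}


def endpoint_rate_limit_middleware(limiter, endpoint, identifier):
    """limiter: a DistributedRateLimiter (or anything with check_rate_limit)."""
    limits = ENDPOINT_LIMITS.get(endpoint, DEFAULT_LIMIT)
    # Put the endpoint in the key so login and search get separate counters;
    # unlisted paths share one 'default' counter instead of one per path
    bucket = endpoint if endpoint in ENDPOINT_LIMITS else 'default'
    return limiter.check_rate_limit(
        f"{identifier}:{bucket}", limits['limit'], limits['window']
    )
```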
Tight limits on auth endpoints. Generous limits on read-heavy data endpoints. Obvious in hindsight, often skipped in implementation.
Adaptive Limits
When the system is drowning, lower limits. When it’s idle, loosen up:
```python
import os


class AdaptiveRateLimiter:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.base_limit = 100
        self.min_limit = 10
        self.max_limit = 1000

    def get_limit(self, identifier):
        """
        Adjust limit based on system load
        """
        load_avg = self.get_system_load()

        if load_avg > 0.8:
            # High load - reduce limits
            return max(self.min_limit, int(self.base_limit * 0.5))
        elif load_avg < 0.3:
            # Low load - increase limits
            return min(self.max_limit, int(self.base_limit * 1.5))
        else:
            return self.base_limit

    def get_system_load(self):
        # 1-minute load average per CPU (Unix only);
        # swap in a metric from your monitoring system if you have one
        return os.getloadavg()[0] / os.cpu_count()
```
Adaptive limiting is a circuit breaker for your API. Use carefully — users notice when limits change mid-session.
Headers: Tell Clients What’s Happening
A 429 without context is a support ticket. Headers turn confusion into self-service:
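```python
import math
import time

from flask import g, jsonify


def rate_limit_headers(remaining, reset_time, limit, limited=False):
    """reset_time: Unix timestamp at which the limit resets."""
    headers = {
        'X-RateLimit-Limit': str(limit),
        'X-RateLimit-Remaining': str(remaining),
        'X-RateLimit-Reset': str(int(reset_time)),
    }
    if limited:
        # Retry-After belongs on 429s: round up, never negative
        headers['Retry-After'] = str(max(0, math.ceil(reset_time - time.time())))
    return headers


# Usage in Flask. Assumes an existing app, the rate_limit decorator and
# get_rate_limit_info helper, a hypothetical load_data(), and an auth
# layer that sets g.user (Flask's request object has no user attribute).
@app.route('/api/data')
@rate_limit(limit=100, window=3600)
def data_endpoint():  # must not be named get_data: it would call itself
    info = get_rate_limit_info(g.user.id)
    response = jsonify({'data': load_data()})

    for key, value in rate_limit_headers(
        info['remaining'],
        info['reset_time'],
        info['limit'],
    ).items():
        response.headers[key] = value

    return response
```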
X-RateLimit-Remaining on successful responses lets well-behaved clients throttle themselves. Retry-After on 429s tells them exactly when to come back. This is API hygiene.
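On the client side, honoring Retry-After has one detail that is easy to miss: HTTP allows the value to be either a number of seconds or an HTTP-date. seconds_until_retry is a hypothetical client helper that handles both using only the standard library.

```python
import email.utils
from datetime import datetime, timezone
from typing import Optional


def seconds_until_retry(retry_after: str, now: Optional[datetime] = None) -> float:
    """Parse Retry-After: delta-seconds ("120") or an HTTP-date."""
    retry_after = retry_after.strip()
    if retry_after.isdigit():
        return float(retry_after)
    # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    when = email.utils.parsedate_to_datetime(retry_after)
    now = now or datetime.now(timezone.utc)
    # A date already in the past means "retry now"
    return max(0.0, (when - now).total_seconds())


print(seconds_until_retry("120"))  # 120.0
```

A well-behaved client sleeps for that long before retrying, instead of hammering the endpoint with immediate retries that each earn another 429.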
Testing: Prove It Before Production Proves It For You
```python
import time
import unittest

import redis

# Assumes DistributedRateLimiter is importable from your code
# and a local Redis instance is available to the test suite


class TestRateLimiter(unittest.TestCase):
    def setUp(self):
        self.redis = redis.Redis()
        self.limiter = DistributedRateLimiter(self.redis)
        self.key = 'test_user'
        # Start every test with an empty bucket, even after a failed run
        self.redis.delete(f"rate_limit:{self.key}")

    def test_allows_requests_within_limit(self):
        limit = 10
        window = 60

        # Make requests up to limit
        for _ in range(limit):
            result = self.limiter.check_rate_limit(self.key, limit, window)
            self.assertTrue(result['allowed'])

        # Next request should be blocked
        result = self.limiter.check_rate_limit(self.key, limit, window)
        self.assertFalse(result['allowed'])

    def test_resets_after_window(self):
        limit = 10
        window = 1  # 1 second for testing

        # Exhaust limit, and confirm it really is exhausted
        for _ in range(limit):
            self.limiter.check_rate_limit(self.key, limit, window)
        result = self.limiter.check_rate_limit(self.key, limit, window)
        self.assertFalse(result['allowed'])

        # Wait for window to expire
        time.sleep(window + 0.1)

        # Should allow again
        result = self.limiter.check_rate_limit(self.key, limit, window)
        self.assertTrue(result['allowed'])
```
Test the boundary. Test reset timing. Test concurrent access if you’re not using Lua. The limiter that works in unit tests and fails under parallel load is more common than you’d think.
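For the concurrency case, a small harness that fires calls from a thread pool and counts how many were allowed is enough to catch a limiter that over-admits. count_allowed_concurrently and LockedCounterLimiter are hypothetical; in a real suite you would pass something like lambda: limiter.check_rate_limit(key, 10, 60)['allowed'] against a test Redis instead of the in-process stand-in.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedCounterLimiter:
    """In-process stand-in: allows `limit` calls in total, guarded by a lock."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0
        self.lock = threading.Lock()

    def check(self):
        with self.lock:
            if self.used < self.limit:
                self.used += 1
                return True
            return False


def count_allowed_concurrently(check, requests=200, workers=20):
    """Call `check` `requests` times from `workers` threads; count the True results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: check(), range(requests)))
    return sum(results)


print(count_allowed_concurrently(LockedCounterLimiter(limit=10).check))  # 10
```

The assertion that matters is exact equality with the limit: anything above it means two requests raced through the same slot.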
What We Actually Recommend
Skip fixed windows for anything strict. Start with sliding window counter for general API protection. Reach for token bucket when bursts are legitimate. Always use Redis (or equivalent) for distributed deployments. Always use Lua scripts for atomicity. Always return headers.
Different limits per tier and per endpoint — login is not search. Monitor rate limit hits; a spike in 429s on /api/login is someone trying something. Graceful degradation on Redis failure is a policy decision, not an implementation detail.
Rate limiting isn’t about saying no. It’s about making “no” rare, fair, and survivable — for your users, your infrastructure, and your on-call rotation.
Rate limiting patterns using Redis, reflecting best practices from late 2016. Algorithm tradeoffs and header conventions remain standard; managed API gateways now offer built-in rate limiting for teams that prefer not to roll their own.