API Gateway Patterns: Rate Limiting, Caching, and Authentication
Before we had an API gateway, our microservices architecture looked like a strip mall with fifteen separate entrances — each with its own lock, its own rate limit (or lack thereof), and its own idea of what “authenticated” meant. Frontend clients talked to five different base URLs. Mobile apps cached credentials in three places. And one enthusiastic partner’s integration test DDoS’d our user service because nothing was throttling at the edge.
The API gateway became the single front door: routing, auth, rate limiting, caching, and the cross-cutting concerns that don’t belong duplicated in every service. Not because gateways are trendy — because “every team implements JWT validation slightly differently” is a security incident waiting for a calendar invite.
Here’s what we built and what we’d do again.
What an API Gateway Actually Does
An API gateway sits between clients and backend services. It:
- Routes requests to the right microservice
- Handles authentication and authorization once, centrally
- Protects backends from abuse (rate limiting)
- Caches responses that don’t need to hit origin every time
- Provides a single URL for clients, even as backends move and multiply
Think of it as a concierge: clients ask the concierge; the concierge knows which room to send them to, checks their credentials, and won’t let them order room service 500 times per minute.
Request Routing: One URL, Many Backends
The simplest gateway is a reverse proxy with opinions. Express + http-proxy-middleware gets you surprisingly far:
// Express.js gateway
const express = require('express');
const { createProxyMiddleware } = require('http-proxy-middleware');
const app = express();
// Route to user service
app.use('/api/users', createProxyMiddleware({
  target: 'http://user-service:3000',
  changeOrigin: true,
  pathRewrite: {
    '^/api/users': ''
  }
}));
// Route to order service
app.use('/api/orders', createProxyMiddleware({
  target: 'http://order-service:3000',
  changeOrigin: true,
  pathRewrite: {
    '^/api/orders': ''
  }
}));
app.listen(8080);
Clients see api.example.com/api/users. Behind the scenes, the gateway strips the /api/users prefix and forwards the rest of the path to user-service:3000. When you split the monolith further, clients don’t care: the gateway routing table changes, not the mobile app.
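To make the prefix-stripping concrete, here is a minimal sketch of the lookup a gateway performs before proxying. The `routes` table and `resolveUpstream` helper are hypothetical names for illustration; the prefixes and targets simply mirror the example above, and the function assumes it receives a path without a query string.

```javascript
// Hypothetical routing table; prefixes and targets mirror the example above.
const routes = [
  { prefix: '/api/users', target: 'http://user-service:3000' },
  { prefix: '/api/orders', target: 'http://order-service:3000' },
];

// Resolve an incoming path (no query string) to an upstream URL,
// or null if no route matches.
function resolveUpstream(path) {
  // Longest prefix first, so a more specific route wins over a shorter one.
  const sorted = [...routes].sort((a, b) => b.prefix.length - a.prefix.length);
  for (const { prefix, target } of sorted) {
    // Match on a segment boundary: '/api/users' must not match '/api/users-export'.
    if (path === prefix || path.startsWith(prefix + '/')) {
      const rest = path.slice(prefix.length) || '/';
      return target + rest;
    }
  }
  return null;
}
```

Matching on a segment boundary is a deliberate choice here: a bare `startsWith(prefix)` would quietly route `/api/users-export` to the user service.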
Rate Limiting: Protecting Backends From Enthusiasm
Not all traffic is malicious. Some is just… enthusiastic. Integration tests, retry loops, scrapers, that one client who polls every 100ms “for real-time feel.” Rate limiting is how you say “we love your business, but please breathe.”
Token Bucket: Smooth Burst Handling
The token bucket algorithm refills tokens at a steady rate and allows bursts up to bucket capacity. It’s intuitive and works well per-user or per-API-key:
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillRate = refillRate; // tokens per second
    this.lastRefill = Date.now();
  }
  // Add the tokens earned since the last call, capped at capacity
  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }
  // Take tokens if available; returns false when the caller should be throttled
  consume(tokens = 1) {
    this.refill();
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true;
    }
    return false;
  }
}
// Per-user rate limiting
const userBuckets = new Map();
function rateLimitMiddleware(req, res, next) {
  const userId = req.user?.id || req.ip;
  if (!userBuckets.has(userId)) {
    userBuckets.set(userId, new TokenBucket(100, 10)); // 100 tokens, 10/sec
  }
  const bucket = userBuckets.get(userId);
  if (bucket.consume()) {
    next();
  } else {
    // Seconds until one full token is available again
    const retryAfter = Math.ceil((1 - bucket.tokens) / bucket.refillRate);
    res.setHeader('Retry-After', retryAfter);
    res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfter
    });
  }
}
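The burst-then-steady behaviour is easier to see with numbers. The sketch below is a variant of the bucket with an injectable clock, an assumption made purely so the run is deterministic; `ClockedTokenBucket`, `fakeTime`, and `admitted` are illustrative names, not part of the gateway code.

```javascript
// Variant of the token bucket with an injectable clock (an assumption made
// purely so the example is deterministic).
class ClockedTokenBucket {
  constructor(capacity, refillRate, now) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillRate = refillRate; // tokens per second
    this.now = now;
    this.lastRefill = now();
  }
  consume(tokens = 1) {
    const t = this.now();
    const elapsed = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = t;
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true;
    }
    return false;
  }
}

let fakeTime = 0; // milliseconds
const bucket = new ClockedTokenBucket(100, 10, () => fakeTime);

// Count how many of `n` back-to-back requests are admitted.
function admitted(n) {
  let ok = 0;
  for (let i = 0; i < n; i++) {
    if (bucket.consume()) ok++;
  }
  return ok;
}

const burst = admitted(150);          // full bucket: 100 pass, 50 are rejected
fakeTime += 1000;                     // one second later, 10 tokens have refilled
const afterOneSecond = admitted(150); // 10 pass
fakeTime += 60000;                    // a long idle period refills only up to capacity
const afterIdle = admitted(150);      // 100 pass, not 600
```

So with capacity 100 and a refill rate of 10 per second, a client can spend 100 requests at once but settles to 10 per second if it keeps hammering.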
In-memory buckets work for single-instance gateways, though the Map grows with every new user or IP unless you evict idle entries. The moment you scale horizontally, you need shared state. Hello, Redis.
Redis-Based Rate Limiting: Works Across Gateway Replicas
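const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);
// Fixed-window counter: at most `limit` requests per window per client
async function rateLimit(req, res, next) {
  const key = `rate_limit:${req.user?.id || req.ip}`;
  const limit = 100;
  const windowSeconds = 60;
  try {
    const current = await redis.incr(key);
    if (current === 1) {
      // First request in this window: start the countdown
      await redis.expire(key, windowSeconds);
    }
    // The TTL is needed on both paths; fall back to the full window if the key has no expiry
    const ttl = await redis.ttl(key);
    const resetSeconds = ttl > 0 ? ttl : windowSeconds;
    res.setHeader('X-RateLimit-Limit', limit);
    res.setHeader('X-RateLimit-Remaining', Math.max(0, limit - current));
    res.setHeader('X-RateLimit-Reset', Date.now() + (resetSeconds * 1000));
    if (current > limit) {
      res.setHeader('Retry-After', resetSeconds);
      return res.status(429).json({
        error: 'Rate limit exceeded',
        retryAfter: resetSeconds
      });
    }
    next();
  } catch (err) {
    // Surface Redis failures to the Express error handler instead of an unhandled rejection
    next(err);
  }
}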
Return 429 with Retry-After headers. Well-behaved clients back off. The ones that don’t? That’s what WAF rules and IP blocks are for.
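For the client side of that contract, here is a rough sketch of what "backing off" can look like. `retryDelayMs` is a hypothetical helper; it assumes `headers` is a plain object with lowercase names, and the 1 second base and 30 second cap on the fallback backoff are arbitrary choices.

```javascript
// Hypothetical client-side helper: how long to wait before retrying.
// `headers` is assumed to be a plain object with lowercase header names.
function retryDelayMs(status, headers, attempt, nowMs = Date.now()) {
  if (status !== 429 && status !== 503) return null; // not a "slow down" response
  const raw = headers['retry-after'];
  if (raw !== undefined) {
    // Retry-After is either a number of seconds or an HTTP date.
    if (/^\d+$/.test(String(raw).trim())) return Number(raw) * 1000;
    const date = Date.parse(raw);
    if (!Number.isNaN(date)) return Math.max(0, date - nowMs);
  }
  // No usable header: exponential backoff (1s, 2s, 4s, ...) capped at 30s.
  return Math.min(30000, 1000 * 2 ** attempt);
}
```

A production client would usually add jitter on top of this so that a fleet of throttled clients does not retry in lockstep.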
Caching: Stop Asking the Database the Same Question
Some endpoints get called thousands of times per second with identical parameters. /api/users/123 doesn’t change every millisecond. Caching at the gateway offloads backends and cuts latency dramatically.
In-Memory Cache: Simple and Fast (Single Instance)
const NodeCache = require('node-cache');
const cache = new NodeCache({ stdTTL: 300 }); // 5 minutes
function cacheMiddleware(ttl = 300) {
  return (req, res, next) => {
    // Only GET responses are safe to reuse
    if (req.method !== 'GET') {
      return next();
    }
    const key = req.originalUrl || req.url;
    // Check cache (node-cache returns undefined on a miss)
    const cached = cache.get(key);
    if (cached !== undefined) {
      return res.json(cached);
    }
    // Override res.json to cache successful responses only
    const originalJson = res.json.bind(res);
    res.json = function (data) {
      if (res.statusCode >= 200 && res.statusCode < 300) {
        cache.set(key, data, ttl);
      }
      return originalJson(data);
    };
    next();
  };
}
// Usage
app.get('/api/users/:id', cacheMiddleware(600), async (req, res) => {
  const user = await userService.getUser(req.params.id);
  res.json(user);
});
Cache invalidation caveat: Gateway caching works best for read-heavy, rarely-changing data. User profiles? Maybe. Account balances? Be very careful. When in doubt, short TTLs and explicit cache-bust headers on writes.
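One way to picture "short TTLs plus busting on writes" is the sketch below. `TtlCache` is a hypothetical stand-in written for this example (it is not node-cache), with an injectable clock so expiry can be shown without waiting; the cache keys are assumed to be request URLs.

```javascript
// Minimal TTL cache with an injectable clock (hypothetical helper, not node-cache).
class TtlCache {
  constructor(now = Date.now) {
    this.now = now;
    this.entries = new Map(); // key -> { value, expiresAt }
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop lazily on read
      return undefined;
    }
    return entry.value;
  }
  set(key, value, ttlSeconds) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
  // Called after a successful write: drop the resource and anything beneath it,
  // including variants that differ only by query string.
  invalidate(path) {
    let removed = 0;
    for (const key of this.entries.keys()) {
      const keyPath = key.split('?')[0];
      if (keyPath === path || keyPath.startsWith(path + '/')) {
        this.entries.delete(key);
        removed++;
      }
    }
    return removed;
  }
}

let clock = 0; // milliseconds
const profileCache = new TtlCache(() => clock);
profileCache.set('/api/users/123', { name: 'Ada' }, 60);
profileCache.set('/api/users/123?fields=name', { name: 'Ada' }, 60);
profileCache.set('/api/users/456', { name: 'Lin' }, 60);

// A PUT /api/users/123 succeeded, so both cached views of user 123 are dropped.
const dropped = profileCache.invalidate('/api/users/123');
```

The linear scan in `invalidate` is fine for a sketch; with a large cache you would want an index from resource path to keys.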
Redis Cache: Shared Across Gateway Instances
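const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);
async function cacheMiddleware(req, res, next) {
  // Only GET responses are safe to reuse
  if (req.method !== 'GET') {
    return next();
  }
  const key = `cache:${req.originalUrl}`;
  try {
    // Check cache (Redis returns null on a miss)
    const cached = await redis.get(key);
    if (cached !== null) {
      return res.json(JSON.parse(cached));
    }
  } catch (err) {
    // A cache outage should not take the endpoint down; fall through to origin
    console.error('Cache read failed', err);
  }
  // Override res.json; write to Redis in the background so the response is not delayed
  const originalJson = res.json.bind(res);
  res.json = function (data) {
    if (res.statusCode >= 200 && res.statusCode < 300) {
      redis.setex(key, 300, JSON.stringify(data)) // 5 min TTL
        .catch((err) => console.error('Cache write failed', err));
    }
    return originalJson(data);
  };
  next();
}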
Only cache GET requests. Never cache responses that vary by auth header unless your cache key includes the user identity.
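Those two rules can be folded into the function that builds the cache key. The following is a sketch under assumptions: `buildCacheKey` is a hypothetical helper, `req` is assumed to look like an Express request after auth middleware has run, and hashing the user id is a choice made here so raw identifiers do not show up in key listings.

```javascript
const crypto = require('crypto');

// Hypothetical key builder: returns null when the response must not be cached.
// `req` is assumed to look like an Express request after auth has run.
function buildCacheKey(req) {
  if (req.method !== 'GET') return null;
  const hasCredentials = Boolean(req.headers.authorization || req.headers['x-api-key']);
  if (!hasCredentials) return `cache:public:${req.originalUrl}`;
  // Credentials present but no resolved identity: safer to skip the cache entirely.
  if (!req.user || !req.user.id) return null;
  // Hash the id so raw identifiers do not end up in cache key listings.
  const who = crypto.createHash('sha256').update(String(req.user.id)).digest('hex').slice(0, 16);
  return `cache:user:${who}:${req.originalUrl}`;
}
```

Per-user keys keep responses from leaking between accounts, at the cost of a lower hit rate; for endpoints that are truly identical for everyone, the public key is the one that pays off.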
Authentication: Validate Once at the Door
Duplicating JWT validation in twelve microservices means twelve places to forget the expiry check. Do it at the gateway; pass trusted identity downstream.
JWT Validation
const jwt = require('jsonwebtoken');
function authMiddleware(req, res, next) {
  // Expect "Authorization: Bearer <token>"
  const [scheme, token] = (req.headers.authorization || '').split(' ');
  if (scheme !== 'Bearer' || !token) {
    return res.status(401).json({ error: 'No token provided' });
  }
  let decoded;
  try {
    // Pin the algorithm; HS256 is assumed here because a shared secret is used
    decoded = jwt.verify(token, process.env.JWT_SECRET, { algorithms: ['HS256'] });
  } catch (error) {
    return res.status(401).json({ error: 'Invalid token' });
  }
  req.user = decoded;
  next();
}
// Usage
app.get('/api/users/me', authMiddleware, (req, res) => {
  res.json(req.user);
});
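Downstream services should receive identity via headers (X-User-Id, X-User-Roles) injected by the gateway, which must first strip any client-supplied copies of those headers so they cannot be spoofed. Services should not re-validate the JWT unless you need defense-in-depth for highly sensitive operations.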
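A small sketch of that header handoff, assuming the decoded JWT payload carries `id` and an optional `roles` array (your claims may be named differently). `buildDownstreamHeaders` is a hypothetical helper, and `incoming` is assumed to be a plain object with lowercase header names, which is how Node exposes them.

```javascript
// Headers only the gateway may set. Names follow the text above.
const IDENTITY_HEADERS = ['x-user-id', 'x-user-roles'];

// Build the header set forwarded upstream. `incoming` is assumed to be a plain
// object with lowercase names; `user` is the decoded JWT payload, assumed to
// carry `id` and an optional `roles` array.
function buildDownstreamHeaders(incoming, user) {
  const headers = {};
  for (const [name, value] of Object.entries(incoming)) {
    // Drop anything the client sent under a trusted name.
    if (IDENTITY_HEADERS.includes(name)) continue;
    headers[name] = value;
  }
  if (user) {
    headers['x-user-id'] = String(user.id);
    headers['x-user-roles'] = (user.roles || []).join(',');
  }
  return headers;
}
```

The stripping step matters as much as the injection: without it, any client could send `X-User-Id: admin` and hope a downstream service believes it. This only holds if backends are unreachable except through the gateway.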
API Key Authentication for Partners
const apiKeys = new Map([
  ['key-123', { userId: 'user-1', permissions: ['read', 'write'] }],
  ['key-456', { userId: 'user-2', permissions: ['read'] }]
]);
function apiKeyMiddleware(req, res, next) {
  const apiKey = req.headers['x-api-key'];
  if (!apiKey) {
    return res.status(401).json({ error: 'API key required' });
  }
  const keyData = apiKeys.get(apiKey);
  if (!keyData) {
    return res.status(401).json({ error: 'Invalid API key' });
  }
  req.apiKey = keyData;
  next();
}
In production, API keys live in a database with rotation, revocation, and per-key rate limits — not a hardcoded Map. But the middleware shape is the same.
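As a rough idea of what the lookup behind that middleware might look like, here is a sketch that stores only a SHA-256 hash of each key and honours a revocation flag. The `keyStore` Map stands in for a database table, and the `revoked` and `rateLimit` fields are hypothetical names chosen for this example.

```javascript
const crypto = require('crypto');

const hashKey = (key) => crypto.createHash('sha256').update(key).digest('hex');

// Stand-in for a database table, keyed by the SHA-256 of the API key so that
// plaintext keys are never stored. Records and field names are hypothetical.
const keyStore = new Map([
  [hashKey('key-123'), { userId: 'user-1', permissions: ['read', 'write'], revoked: false, rateLimit: 100 }],
  [hashKey('key-456'), { userId: 'user-2', permissions: ['read'], revoked: true, rateLimit: 20 }],
]);

// Returns the key record, or null if the key is unknown or revoked.
function lookupApiKey(presentedKey) {
  if (typeof presentedKey !== 'string' || presentedKey.length === 0) return null;
  const record = keyStore.get(hashKey(presentedKey));
  if (!record || record.revoked) return null;
  return record;
}
```

Rotation then becomes "insert the new hash, mark the old one revoked after a grace period," and the per-key `rateLimit` can feed straight into the bucket or counter for that key.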
Request/Response Transformation: Consistency at the Edge
Clients appreciate predictable response envelopes. Request IDs make debugging possible when logs span a dozen services:
const crypto = require('crypto');
function transformRequest(req, res, next) {
  // Add request ID
  req.id = crypto.randomUUID();
  // Log request
  console.log(`[${req.id}] ${req.method} ${req.path}`);
  // Wrap every JSON response in a consistent envelope
  const originalJson = res.json.bind(res);
  res.json = function (data) {
    const transformed = {
      requestId: req.id,
      timestamp: new Date().toISOString(),
      data: data
    };
    return originalJson(transformed);
  };
  next();
}
Propagate req.id to downstream services via X-Request-Id header. When a user reports “it broke at 3:15,” you grep one ID instead of reconstructing a distributed trace from vibes.
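Propagation can be sketched as two small helpers: one that decides which ID to use, one that attaches it to the upstream request. Both names are hypothetical, and the rule for accepting a caller-supplied ID (8 to 128 characters from a conservative alphabet) is an assumption, included so arbitrary client input does not end up in log lines.

```javascript
const crypto = require('crypto');

// Reuse a caller-supplied ID only if it looks sane; otherwise mint a new one.
// The 8-128 character limit and allowed characters are assumptions.
function resolveRequestId(incomingHeaders) {
  const supplied = incomingHeaders['x-request-id'];
  if (typeof supplied === 'string' && /^[A-Za-z0-9._-]{8,128}$/.test(supplied)) {
    return supplied;
  }
  return crypto.randomUUID();
}

// Headers to attach to the proxied request so downstream logs share the ID.
function withRequestId(upstreamHeaders, requestId) {
  return { ...upstreamHeaders, 'x-request-id': requestId };
}
```

Whether to trust an inbound ID at all is a judgment call: reusing it helps partners correlate their logs with yours, while always minting your own guarantees uniqueness.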
Load Balancing: Don’t Send Everyone to the Same Pod
Round-robin across healthy backend instances is the default for a reason — it’s simple and mostly works:
const servers = [
  'http://user-service-1:3000',
  'http://user-service-2:3000',
  'http://user-service-3:3000'
];
let current = 0;
function getNextServer() {
  const server = servers[current];
  current = (current + 1) % servers.length;
  return server;
}
app.use('/api/users', createProxyMiddleware({
  target: servers[0], // Fallback only; the router below picks the actual target
  changeOrigin: true,
  router: () => getNextServer() // Round-robin, evaluated on every request
}));
For production, health-check-aware load balancing (via Kong, nginx, or cloud load balancers) skips unhealthy backends automatically. Sending traffic to a pod that’s mid-crash helps nobody.
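If you want a feel for what "health-check-aware" means before reaching for Kong or nginx, here is a minimal sketch. `HealthAwareRoundRobin` is a hypothetical class; how health is determined (probe endpoint, interval, thresholds) is deliberately left out, and `markHealthy`/`markUnhealthy` are the hooks a health checker would call.

```javascript
// Round-robin picker that skips backends marked unhealthy. How health is
// determined is left out; markHealthy/markUnhealthy are hypothetical hooks
// a health checker would call.
class HealthAwareRoundRobin {
  constructor(servers) {
    this.servers = servers;
    this.healthy = new Set(servers);
    this.index = 0;
  }
  markUnhealthy(server) {
    this.healthy.delete(server);
  }
  markHealthy(server) {
    if (this.servers.includes(server)) this.healthy.add(server);
  }
  // Returns the next healthy server, or null when none are available.
  next() {
    for (let i = 0; i < this.servers.length; i++) {
      const candidate = this.servers[this.index];
      this.index = (this.index + 1) % this.servers.length;
      if (this.healthy.has(candidate)) return candidate;
    }
    return null;
  }
}

const pool = new HealthAwareRoundRobin([
  'http://user-service-1:3000',
  'http://user-service-2:3000',
  'http://user-service-3:3000',
]);
pool.markUnhealthy('http://user-service-2:3000');
const picks = [pool.next(), pool.next(), pool.next(), pool.next()];
// picks alternates between user-service-1 and user-service-3
```

When `next()` returns null the gateway has nowhere to send the request, which is the moment to answer 503 rather than guess.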
Circuit Breakers at the Gateway: Fail Fast Before Backends Drown
When a backend is melting down, the gateway should stop forwarding traffic — not queue infinite requests:
class CircuitBreaker {
  constructor(service, options = {}) {
    this.service = service;
    this.failureThreshold = options.failureThreshold || 5;
    this.timeout = options.timeout || 60000; // ms to stay OPEN before probing
    this.state = 'CLOSED';
    this.failures = 0;
    this.nextAttempt = Date.now();
  }
  async call(...args) {
    if (this.state === 'OPEN') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is OPEN');
      }
      // Cool-down elapsed: let a trial request through
      this.state = 'HALF_OPEN';
    }
    try {
      const result = await this.service(...args);
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }
  onSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }
  onFailure() {
    this.failures++;
    if (this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.timeout;
    }
  }
}
// Usage: wrap in an arrow function so getUser keeps its `this` binding
const userServiceBreaker = new CircuitBreaker((id) => userService.getUser(id));
app.get('/api/users/:id', async (req, res) => {
  try {
    const user = await userServiceBreaker.call(req.params.id);
    res.json(user);
  } catch (error) {
    res.status(503).json({ error: 'Service unavailable' });
  }
});
Return 503 with a clear message instead of hanging until the client times out. Your mobile app’s retry logic will thank you — if you’ve taught it to respect 503s.
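To give clients something concrete to respect, the 503 can carry a Retry-After derived from the breaker's cool-down. This is a sketch, assuming a breaker object that exposes `state` and `nextAttempt` (a millisecond timestamp) like the class above; `breakerRejection` is a hypothetical helper.

```javascript
// Given a breaker exposing `state` and `nextAttempt` (ms timestamp), build the
// response the gateway would send instead of calling the backend.
// Returns null when the call should be allowed through.
function breakerRejection(breaker, nowMs = Date.now()) {
  if (breaker.state !== 'OPEN' || nowMs >= breaker.nextAttempt) return null;
  // Round up so clients never retry before the breaker is ready to probe.
  const retryAfter = Math.max(1, Math.ceil((breaker.nextAttempt - nowMs) / 1000));
  return {
    status: 503,
    headers: { 'Retry-After': String(retryAfter) },
    body: { error: 'Service unavailable', retryAfter },
  };
}
```

A route handler could check this before invoking the breaker and reply immediately, so a throttled client and a tripped breaker both speak the same Retry-After language.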
Off-the-Shelf Gateways: When Roll-Your-Own Gets Old
We started with Express. We migrated to Kong when plugin management, admin API, and rate-limiting plugins outweighed the simplicity of a custom server.
# kong.yml
_format_version: "1.1"
services:
  - name: user-service
    url: http://user-service:3000
    routes:
      - name: user-route
        paths:
          - /api/users
    plugins:
      - name: rate-limiting
        config:
          minute: 100
          hour: 1000
          policy: local
      - name: jwt
        config:
          secret_is_base64: false
      - name: proxy-cache # Kong's response caching plugin
        config:
          strategy: memory
          cache_ttl: 300
consumers:
  - username: api-consumer
    keyauth_credentials:
      - key: api-key-123
Rate limiting, JWT validation, and response caching — configured declaratively, no middleware spaghetti.
AWS API Gateway: When You’re Already on AWS
# serverless.yml
service: api-gateway
provider:
  name: aws
  runtime: nodejs14.x
  apiGateway:
    restApiId: ${self:custom.apiId}
    restApiRootResourceId: ${self:custom.rootResourceId}
functions:
  users:
    handler: handlers/users.handler
    events:
      - http:
          path: /api/users/{proxy+}
          method: ANY
          authorizer: aws_iam
          # Assumption: per-event throttling is handled by a plugin; core Serverless
          # Framework sets throttle limits under provider.apiGateway.usagePlan
          throttling:
            burstLimit: 200
            rateLimit: 100
Managed throttling, IAM auth, and Lambda integration — less ops, more vendor coupling. Tradeoffs, as always.
What We Learned Running Gateways in Production
Rate limiting at the edge saved us more than once — set it before you need it, not after a partner’s load test. Cache read-heavy endpoints aggressively but invalidate carefully; stale user data causes support tickets, stale product catalogs cause shrug emojis. Auth belongs at the gateway for external clients; internal service-to-service calls need their own trust model (mTLS, service tokens). Log every request with a correlation ID — you will need it. Circuit breakers on unhealthy backends prevent cascade failures. And version your APIs (/v1/, /v2/) so you can migrate clients without a big-bang deploy.
The Bottom Line
An API gateway isn’t ceremony — it’s the choke point where you enforce the rules everyone agreed on in architecture reviews but nobody implemented in their service. One URL for clients. One place for auth. One place to say “slow down” before your database catches fire.
Build it early enough to matter, keep it thin enough to maintain, and resist the urge to put business logic in it. The gateway is a bouncer, not a chef.
Written October 2018, covering production gateway patterns with Express, Kong, and AWS API Gateway. Gateway technology has exploded since (Envoy, Istio, APISIX) — the patterns remain; evaluate managed vs self-hosted for your scale.