Building Production-Ready Rate Limiting Middleware in Go
Rate limiting is your application’s first line of defense against abuse, whether from malicious actors launching denial-of-service attacks, buggy clients stuck in retry loops, or legitimate users inadvertently overwhelming your system. Without proper rate limiting, a single misbehaving client can bring down your entire service, impacting all users and potentially costing your business significant revenue and reputation.
In this comprehensive guide, we’ll build a production-grade rate limiting middleware for Go HTTP servers that protects your APIs while maintaining performance and flexibility. We’ll start with fundamental concepts and progress to sophisticated patterns used by companies serving millions of requests per second.
Why Rate Limiting Matters
Protecting Service Availability: A single client making thousands of requests per second can exhaust your server’s resources—CPU, memory, database connections, and bandwidth. Rate limiting ensures fair resource distribution across all users.
Cost Control: Cloud infrastructure bills scale with usage. Without rate limiting, runaway clients can generate massive costs. One misconfigured script hitting your API continuously could result in thousands of dollars in unexpected charges.
Security Defense: Rate limiting is crucial for defending against:
- Brute force attacks: Limiting login attempts prevents password guessing
- Credential stuffing: Slowing down automated account takeover attempts
- API scraping: Preventing competitors from harvesting your data
- DDoS attacks: Mitigating distributed denial-of-service attempts
Quality of Service: Rate limits help maintain consistent response times for all users. Without them, a few heavy users can slow down the experience for everyone else.
Business Model Enforcement: For tiered API products, rate limiting ensures customers stay within their plan limits, protecting your revenue model and preventing abuse of free tiers.
Prerequisites
Before implementing rate limiting, you should understand:
- Go fundamentals: Goroutines, channels, and concurrent programming patterns
- HTTP basics: Request/response cycle, status codes, and headers
- Middleware concepts: How middleware chains work in HTTP handlers
- Token bucket algorithm: The rate limiting strategy we’ll implement
- Concurrency primitives: Mutexes, atomic operations, and synchronization
Required packages:
go get golang.org/x/time/rate
This guide uses Go 1.21+ but is compatible with Go 1.13+.
Understanding Rate Limiting Algorithms
Before writing code, let’s understand the token bucket algorithm, which powers golang.org/x/time/rate.
The Token Bucket Model:
Imagine a bucket that holds tokens. Each token represents permission to make one request:
- Tokens are added at a constant rate (the refill rate)
- The bucket has a maximum capacity (the burst size)
- Each request consumes one token
- If no tokens are available, the request is denied
For example, a rate limiter with rate=10 and burst=20 means:
- Tokens refill at 10 per second
- The bucket can hold maximum 20 tokens
- A client can make 20 requests instantly (burst)
- After bursting, they’re limited to 10 requests per second
- Tokens accumulate when not used, up to the burst limit
This algorithm is preferred because it:
- Allows bursts: Legitimate clients can handle temporary spikes
- Prevents sustained abuse: Long-term rate is strictly enforced
- Is simple and efficient: Requires minimal state per client
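To make the model concrete, here is a minimal token bucket written with only the standard library. It is an illustrative sketch, not how golang.org/x/time/rate is implemented internally; the names bucket, newBucket, allow, and simulate are hypothetical, and the clock is passed in explicitly so the behavior is deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a minimal, non-thread-safe token bucket for illustration only.
type bucket struct {
	rate   float64   // tokens added per second
	burst  float64   // maximum number of tokens the bucket can hold
	tokens float64   // tokens currently available
	last   time.Time // when tokens were last refilled
}

// newBucket starts with a full bucket, so an initial burst is allowed.
func newBucket(rate float64, burst int, now time.Time) *bucket {
	return &bucket{rate: rate, burst: float64(burst), tokens: float64(burst), last: now}
}

// allow refills the bucket for the time elapsed since the previous call,
// caps it at the burst size, and then tries to take one token.
func (b *bucket) allow(now time.Time) bool {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.burst {
			b.tokens = b.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// simulate sends `attempts` back-to-back requests, lets `idleSeconds` of
// simulated time pass, and sends `attempts` more. It returns how many
// requests were allowed in each round.
func simulate(rate float64, burst, attempts, idleSeconds int) (first, second int) {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC) // arbitrary fixed clock
	b := newBucket(rate, burst, start)
	for i := 0; i < attempts; i++ {
		if b.allow(start) {
			first++
		}
	}
	later := start.Add(time.Duration(idleSeconds) * time.Second)
	for i := 0; i < attempts; i++ {
		if b.allow(later) {
			second++
		}
	}
	return first, second
}

func main() {
	first, second := simulate(10, 20, 30, 1)
	fmt.Printf("allowed immediately: %d, allowed after 1s idle: %d\n", first, second)
}
```

With rate=10 and burst=20, 30 back-to-back requests let 20 through (the burst), and after one idle second only 10 more tokens have been refilled.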
Building a Basic Rate Limiting Middleware
Let’s start with a simple per-IP rate limiter:
package middleware
import (
"log"
"net"
"net/http"
"sync"
"golang.org/x/time/rate"
)
// IPRateLimiter manages rate limiters for multiple clients
type IPRateLimiter struct {
limiters map[string]*rate.Limiter
mu sync.RWMutex
rate rate.Limit // requests per second
burst int // maximum burst size
}
// NewIPRateLimiter creates a new IP-based rate limiter
func NewIPRateLimiter(r rate.Limit, b int) *IPRateLimiter {
return &IPRateLimiter{
limiters: make(map[string]*rate.Limiter),
rate: r,
burst: b,
}
}
// GetLimiter returns the rate limiter for the provided IP address
func (i *IPRateLimiter) GetLimiter(ip string) *rate.Limiter {
i.mu.Lock()
defer i.mu.Unlock()
limiter, exists := i.limiters[ip]
if !exists {
limiter = rate.NewLimiter(i.rate, i.burst)
i.limiters[ip] = limiter
}
return limiter
}
// RateLimitMiddleware creates HTTP middleware that rate limits by IP
func (i *IPRateLimiter) RateLimitMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Extract IP address from request
ip, _, err := net.SplitHostPort(r.RemoteAddr)
if err != nil {
log.Printf("Error parsing IP address: %v", err)
http.Error(w, "Internal Server Error", http.StatusInternalServerError)
return
}
// Get or create rate limiter for this IP
limiter := i.GetLimiter(ip)
// Check if request is allowed
if !limiter.Allow() {
http.Error(w, http.StatusText(http.StatusTooManyRequests), http.StatusTooManyRequests)
return
}
// Request is allowed, proceed to next handler
next.ServeHTTP(w, r)
})
}
This basic implementation provides:
- Per-IP rate limiting: Each IP address gets its own limiter
- Thread safety: RWMutex protects concurrent access to the limiters map
- Automatic limiter creation: New IPs get limiters on first request
- Simple denial: Blocked requests receive 429 Too Many Requests
However, this basic version has critical flaws we’ll address in production-ready implementations.
Extracting Client IP Addresses Correctly
Getting the real client IP is tricky when your application sits behind proxies or load balancers:
package middleware
import (
"net"
"net/http"
"strings"
)
// GetClientIP extracts the real client IP from the request
// considering proxies and load balancers
func GetClientIP(r *http.Request) string {
// Check X-Forwarded-For header (set by proxies)
xff := r.Header.Get("X-Forwarded-For")
if xff != "" {
// X-Forwarded-For can contain multiple IPs: "client, proxy1, proxy2"
// Take the first one (the original client)
ips := strings.Split(xff, ",")
if len(ips) > 0 {
// Trim whitespace from IP
clientIP := strings.TrimSpace(ips[0])
// Validate it's a real IP
if net.ParseIP(clientIP) != nil {
return clientIP
}
}
}
// Check X-Real-IP header (set by some proxies)
xri := r.Header.Get("X-Real-IP")
if xri != "" {
if net.ParseIP(xri) != nil {
return xri
}
}
// Fall back to RemoteAddr
ip, _, err := net.SplitHostPort(r.RemoteAddr)
if err != nil {
// If no port, RemoteAddr might be just the IP
return r.RemoteAddr
}
return ip
}
// GetClientIPWithTrustedProxies safely extracts client IP
// only when the request comes through trusted proxies
func GetClientIPWithTrustedProxies(r *http.Request, trustedProxies []string) string {
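remoteIP, _, err := net.SplitHostPort(r.RemoteAddr)
if err != nil {
// RemoteAddr had no port; use it as-is
remoteIP = r.RemoteAddr
}
// Build a set of trusted proxy IPs for quick lookup
trusted := make(map[string]bool, len(trustedProxies))
for _, proxy := range trustedProxies {
trusted[proxy] = true
}
// Only trust X-Forwarded-For if the direct peer is a trusted proxy
if !trusted[remoteIP] {
return remoteIP
}
// Walk the chain right to left: entries appended by trusted proxies are
// reliable, while anything further left may have been set by the client.
// The first valid, untrusted IP from the right is the client address.
ips := strings.Split(r.Header.Get("X-Forwarded-For"), ",")
for idx := len(ips) - 1; idx >= 0; idx-- {
ip := strings.TrimSpace(ips[idx])
if net.ParseIP(ip) == nil {
break // missing or malformed entry: stop trusting the header
}
if !trusted[ip] {
return ip
}
}
// Otherwise use the direct connection IP
return remoteIP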
}
Security consideration: Never blindly trust X-Forwarded-For without validating the request comes from a trusted proxy. Attackers can spoof this header to bypass IP-based rate limiting.
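Trusted proxies are often described as address ranges (for example a load balancer subnet) rather than single IPs. The sketch below shows one way to check an address against CIDR ranges with the standard net/netip package (Go 1.18+). The helper names parseTrustedProxies and isTrustedProxy are hypothetical, and the ranges in main are placeholders.

```go
package main

import (
	"fmt"
	"net/netip"
)

// parseTrustedProxies converts CIDR strings into prefixes once, at startup.
func parseTrustedProxies(cidrs []string) ([]netip.Prefix, error) {
	prefixes := make([]netip.Prefix, 0, len(cidrs))
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, fmt.Errorf("invalid trusted proxy range %q: %w", c, err)
		}
		prefixes = append(prefixes, p)
	}
	return prefixes, nil
}

// isTrustedProxy reports whether ip falls inside any trusted range.
// Unparseable addresses are never trusted.
func isTrustedProxy(ip string, trusted []netip.Prefix) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false
	}
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 the same as 10.0.0.1
	for _, p := range trusted {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	// Placeholder ranges; substitute the addresses of your own proxies.
	trusted, err := parseTrustedProxies([]string{"10.0.0.0/8", "192.168.1.10/32"})
	if err != nil {
		panic(err)
	}
	fmt.Println(isTrustedProxy("10.1.2.3", trusted))    // inside 10.0.0.0/8
	fmt.Println(isTrustedProxy("203.0.113.7", trusted)) // outside every range
}
```

Parsing the ranges once and reusing the prefixes keeps the per-request check to a few comparisons.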
Implementing Memory-Efficient Cleanup
Our basic rate limiter has a memory leak: limiters are never removed. If your API receives requests from millions of IPs, memory usage grows unbounded.
package middleware
import (
"sync"
"time"
"golang.org/x/time/rate"
)
// visitor represents a rate-limited client
type visitor struct {
limiter *rate.Limiter
lastSeen time.Time
}
// IPRateLimiterWithCleanup manages rate limiters with automatic cleanup
type IPRateLimiterWithCleanup struct {
visitors map[string]*visitor
mu sync.RWMutex
rate rate.Limit
burst int
}
// NewIPRateLimiterWithCleanup creates a rate limiter with background cleanup
func NewIPRateLimiterWithCleanup(r rate.Limit, b int) *IPRateLimiterWithCleanup {
limiter := &IPRateLimiterWithCleanup{
visitors: make(map[string]*visitor),
rate: r,
burst: b,
}
// Start cleanup goroutine
go limiter.cleanupVisitors()
return limiter
}
// GetLimiter returns the rate limiter for an IP, creating if necessary
func (i *IPRateLimiterWithCleanup) GetLimiter(ip string) *rate.Limiter {
i.mu.Lock()
defer i.mu.Unlock()
v, exists := i.visitors[ip]
if !exists {
limiter := rate.NewLimiter(i.rate, i.burst)
i.visitors[ip] = &visitor{
limiter: limiter,
lastSeen: time.Now(),
}
return limiter
}
// Update last seen time
v.lastSeen = time.Now()
return v.limiter
}
// cleanupVisitors removes inactive visitors to prevent memory leaks
func (i *IPRateLimiterWithCleanup) cleanupVisitors() {
ticker := time.NewTicker(5 * time.Minute)
defer ticker.Stop()
for range ticker.C {
i.mu.Lock()
// Remove visitors inactive for more than 10 minutes
threshold := time.Now().Add(-10 * time.Minute)
for ip, v := range i.visitors {
if v.lastSeen.Before(threshold) {
delete(i.visitors, ip)
}
}
i.mu.Unlock()
}
}
This cleanup strategy:
- Runs periodically: Checks every 5 minutes
- Removes stale entries: Deletes visitors inactive for 10+ minutes
- Balances memory and accuracy: Keeps recent visitors for smooth rate limiting
- Uses minimal resources: Simple background goroutine with low overhead
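One thing the cleanup goroutine above lacks is a way to stop, which matters in tests and when limiters are created and discarded at runtime. The following standard-library sketch separates the sweep logic from the ticker loop and adds a stop signal. The type visitorTable and its methods are hypothetical stand-ins for the visitors map; the limiter itself is left out to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// visitorTable tracks when each key was last seen.
type visitorTable struct {
	mu       sync.Mutex
	lastSeen map[string]time.Time
	ttl      time.Duration
	done     chan struct{}
	stopOnce sync.Once
}

func newVisitorTable(ttl time.Duration) *visitorTable {
	return &visitorTable{
		lastSeen: make(map[string]time.Time),
		ttl:      ttl,
		done:     make(chan struct{}),
	}
}

// touch records activity for key at the given time.
func (t *visitorTable) touch(key string, now time.Time) {
	t.mu.Lock()
	t.lastSeen[key] = now
	t.mu.Unlock()
}

// sweep deletes entries idle for longer than ttl and returns how many were removed.
func (t *visitorTable) sweep(now time.Time) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	removed := 0
	for key, seen := range t.lastSeen {
		if now.Sub(seen) > t.ttl {
			delete(t.lastSeen, key)
			removed++
		}
	}
	return removed
}

// run sweeps on every tick until Stop is called.
func (t *visitorTable) run(interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case now := <-ticker.C:
			t.sweep(now)
		case <-t.done:
			return
		}
	}
}

// Stop ends the background goroutine; it is safe to call more than once.
func (t *visitorTable) Stop() {
	t.stopOnce.Do(func() { close(t.done) })
}

// demoSweep exercises sweep with a fixed clock: one entry is 11 minutes
// idle (removed), the other only 3 minutes idle (kept).
func demoSweep() (removed, remaining int) {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	t := newVisitorTable(10 * time.Minute)
	t.touch("198.51.100.1", start)
	t.touch("198.51.100.2", start.Add(8*time.Minute))
	removed = t.sweep(start.Add(11 * time.Minute))
	return removed, len(t.lastSeen)
}

func main() {
	removed, remaining := demoSweep()
	fmt.Printf("removed=%d remaining=%d\n", removed, remaining)

	table := newVisitorTable(10 * time.Minute)
	go table.run(5 * time.Minute)
	table.Stop() // the goroutine exits instead of leaking
}
```

Keeping sweep separate from the ticker also makes the expiry rule testable without waiting for real time to pass.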
Adding Informative Rate Limit Headers
Good APIs inform clients about rate limits through HTTP headers, following industry standards:
package middleware
import (
"fmt"
"net/http"
"time"
)
// RateLimitMiddlewareWithHeaders adds rate limit information to responses
func (i *IPRateLimiterWithCleanup) RateLimitMiddlewareWithHeaders(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ip := GetClientIP(r)
limiter := i.GetLimiter(ip)
// Reserve a token; the reservation tells us how long we would have to wait
reservation := limiter.Reserve()
if !reservation.OK() {
// Only happens when burst is less than 1, so no request can ever be served
http.Error(w, "Rate limit configuration error", http.StatusInternalServerError)
return
}
// Set standard rate limit headers
w.Header().Set("X-RateLimit-Limit", fmt.Sprintf("%d", i.burst))
if delay := reservation.Delay(); delay > 0 {
// Rate limit exceeded: hand the reserved token back and reject
reservation.Cancel()
retryAfter := time.Now().Add(delay)
w.Header().Set("X-RateLimit-Remaining", "0")
w.Header().Set("X-RateLimit-Reset", fmt.Sprintf("%d", retryAfter.Unix()))
w.Header().Set("Retry-After", fmt.Sprintf("%d", int(delay.Seconds())+1))
http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
return
}
// The reservation already consumed one token, so do not call Allow() again.
// Tokens() is a snapshot; concurrent requests can change it immediately.
tokens := limiter.Tokens()
if tokens < 0 {
tokens = 0
}
w.Header().Set("X-RateLimit-Remaining", fmt.Sprintf("%d", int(tokens)))
// Reset time is when the bucket will be full again (approximate).
// Dividing in floating point avoids a divide-by-zero for rates below 1/s.
resetTime := time.Now()
if i.rate > 0 {
missing := float64(i.burst) - tokens
resetTime = resetTime.Add(time.Duration(missing / float64(i.rate) * float64(time.Second)))
}
w.Header().Set("X-RateLimit-Reset", fmt.Sprintf("%d", resetTime.Unix()))
next.ServeHTTP(w, r)
})
}
These headers help clients:
- Understand their limits: X-RateLimit-Limit shows the maximum rate
- Avoid hitting limits: X-RateLimit-Remaining helps clients pace requests
- Know when to retry: X-RateLimit-Reset and Retry-After indicate when limits reset
- Build smarter clients: Applications can implement exponential backoff using this data
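On the client side, these headers can drive the retry schedule. The sketch below is one possible approach, not a standard algorithm: it honors Retry-After when the server sends it in the delay-seconds form and otherwise falls back to capped exponential backoff. The function names and the cap parameter are hypothetical, and a production client would normally add random jitter.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// retryAfterSeconds parses the delay-seconds form of Retry-After.
// The header may also carry an HTTP date; that form is not handled here.
func retryAfterSeconds(value string) (int, bool) {
	secs, err := strconv.Atoi(strings.TrimSpace(value))
	if err != nil || secs < 0 {
		return 0, false
	}
	return secs, true
}

// backoffSeconds returns how long to wait before retry number `attempt`
// (starting at 0). It prefers the server's Retry-After value and otherwise
// doubles the wait on each attempt (1s, 2s, 4s, ...) up to maxSeconds.
func backoffSeconds(attempt int, retryAfter string, maxSeconds int) int {
	if secs, ok := retryAfterSeconds(retryAfter); ok {
		return secs
	}
	wait := 1
	for i := 0; i < attempt && wait < maxSeconds; i++ {
		wait *= 2
	}
	if wait > maxSeconds {
		wait = maxSeconds
	}
	return wait
}

func main() {
	fmt.Println(backoffSeconds(0, "5", 60)) // server asked for 5 seconds
	fmt.Println(backoffSeconds(3, "", 60))  // no header: 1 -> 2 -> 4 -> 8
	fmt.Println(backoffSeconds(10, "", 60)) // capped at the maximum
}
```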
Implementing Per-User Rate Limiting
IP-based rate limiting doesn’t work well for:
- Mobile users: IPs change frequently as users move between networks
- Corporate networks: Many users share the same IP
- Authenticated APIs: You want to limit by user account, not IP
Here’s how to implement flexible per-user rate limiting:
package middleware
import (
"context"
"net/http"
"sync"
"time"
"golang.org/x/time/rate"
)
// UserRateLimiter manages rate limits by user ID
type UserRateLimiter struct {
limiters map[string]*visitor
mu sync.RWMutex
rate rate.Limit
burst int
}
// NewUserRateLimiter creates a user-based rate limiter
func NewUserRateLimiter(r rate.Limit, b int) *UserRateLimiter {
limiter := &UserRateLimiter{
limiters: make(map[string]*visitor),
rate: r,
burst: b,
}
go limiter.cleanupVisitors()
return limiter
}
func (u *UserRateLimiter) GetLimiter(userID string) *rate.Limiter {
u.mu.Lock()
defer u.mu.Unlock()
v, exists := u.limiters[userID]
if !exists {
limiter := rate.NewLimiter(u.rate, u.burst)
u.limiters[userID] = &visitor{
limiter: limiter,
lastSeen: time.Now(),
}
return limiter
}
v.lastSeen = time.Now()
return v.limiter
}
func (u *UserRateLimiter) cleanupVisitors() {
ticker := time.NewTicker(5 * time.Minute)
defer ticker.Stop()
for range ticker.C {
u.mu.Lock()
threshold := time.Now().Add(-10 * time.Minute)
for userID, v := range u.limiters {
if v.lastSeen.Before(threshold) {
delete(u.limiters, userID)
}
}
u.mu.Unlock()
}
}
// UserRateLimitMiddleware creates middleware for authenticated endpoints
func (u *UserRateLimiter) UserRateLimitMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Extract user ID from context (set by auth middleware)
userID, ok := GetUserIDFromContext(r.Context())
if !ok {
// No user ID means unauthenticated request
// You might want IP-based rate limiting for these
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
limiter := u.GetLimiter(userID)
if !limiter.Allow() {
http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
return
}
next.ServeHTTP(w, r)
})
}
// Context helpers
type contextKey string
const userIDKey contextKey = "userID"
func GetUserIDFromContext(ctx context.Context) (string, bool) {
userID, ok := ctx.Value(userIDKey).(string)
return userID, ok
}
func SetUserIDInContext(ctx context.Context, userID string) context.Context {
return context.WithValue(ctx, userIDKey, userID)
}
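The middleware above rejects unauthenticated requests outright, and its comment suggests IP-based limiting as an alternative. One way to combine the two, sketched here under the assumption that a single limiter map is keyed by string, is to pick the key from the user ID when there is one and from the client IP otherwise. The function limiterKey and its prefixes are hypothetical.

```go
package main

import "fmt"

// limiterKey picks the identity to rate limit on. Authenticated requests are
// keyed by user ID; everything else falls back to the client IP. The prefixes
// keep a user ID that happens to look like an IP from sharing a bucket with it.
func limiterKey(userID string, authenticated bool, clientIP string) string {
	if authenticated && userID != "" {
		return "user:" + userID
	}
	return "ip:" + clientIP
}

func main() {
	fmt.Println(limiterKey("alice", true, "203.0.113.7")) // keyed by user
	fmt.Println(limiterKey("", false, "203.0.113.7"))     // keyed by IP
}
```

In the middleware, the two arguments would come from GetUserIDFromContext and GetClientIP, and the resulting key would be passed to GetLimiter.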
Creating Tiered Rate Limits
Production APIs often need different rate limits for different user tiers:
package middleware
import (
"context"
"net/http"
"sync"
"golang.org/x/time/rate"
)
// RateLimitTier defines a rate limit tier
type RateLimitTier struct {
Name string
Rate rate.Limit
Burst int
}
// Common tier definitions
var (
FreeTier = RateLimitTier{
Name: "free",
Rate: rate.Limit(10), // 10 requests per second
Burst: 20,
}
ProTier = RateLimitTier{
Name: "pro",
Rate: rate.Limit(100), // 100 requests per second
Burst: 200,
}
EnterpriseTier = RateLimitTier{
Name: "enterprise",
Rate: rate.Limit(1000), // 1000 requests per second
Burst: 2000,
}
)
// TieredRateLimiter manages different rate limits per tier
type TieredRateLimiter struct {
limiters map[string]*rate.Limiter
mu sync.RWMutex
tiers map[string]RateLimitTier
}
// NewTieredRateLimiter creates a rate limiter with tier support
func NewTieredRateLimiter(tiers []RateLimitTier) *TieredRateLimiter {
tierMap := make(map[string]RateLimitTier)
for _, tier := range tiers {
tierMap[tier.Name] = tier
}
return &TieredRateLimiter{
limiters: make(map[string]*rate.Limiter),
tiers: tierMap,
}
}
// GetLimiter returns a limiter for the user's tier
func (t *TieredRateLimiter) GetLimiter(userID string, tierName string) *rate.Limiter {
key := userID + ":" + tierName
t.mu.RLock()
limiter, exists := t.limiters[key]
t.mu.RUnlock()
if exists {
return limiter
}
// Create new limiter for this user and tier
tier, ok := t.tiers[tierName]
if !ok {
// Default to free tier if tier not found
tier = FreeTier
}
t.mu.Lock()
defer t.mu.Unlock()
// Double-check after acquiring write lock
if limiter, exists := t.limiters[key]; exists {
return limiter
}
limiter = rate.NewLimiter(tier.Rate, tier.Burst)
t.limiters[key] = limiter
return limiter
}
// TieredRateLimitMiddleware applies appropriate rate limit based on user tier
func (t *TieredRateLimiter) TieredRateLimitMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
userID, ok := GetUserIDFromContext(r.Context())
if !ok {
http.Error(w, "Unauthorized", http.StatusUnauthorized)
return
}
// Get user's tier from context (set by auth middleware)
tier, ok := GetUserTierFromContext(r.Context())
if !ok {
tier = "free" // Default to free tier
}
limiter := t.GetLimiter(userID, tier)
if !limiter.Allow() {
w.Header().Set("X-RateLimit-Tier", tier)
http.Error(w, "Rate limit exceeded for your tier", http.StatusTooManyRequests)
return
}
w.Header().Set("X-RateLimit-Tier", tier)
next.ServeHTTP(w, r)
})
}
// Context helpers for tier
const userTierKey contextKey = "userTier"
func GetUserTierFromContext(ctx context.Context) (string, bool) {
tier, ok := ctx.Value(userTierKey).(string)
return tier, ok
}
func SetUserTierInContext(ctx context.Context, tier string) context.Context {
return context.WithValue(ctx, userTierKey, tier)
}
Best Practices
Choose Appropriate Limits: Don’t set limits arbitrarily. Analyze your traffic patterns:
- Start with generous limits and tighten based on abuse patterns
- Monitor 95th and 99th percentile request rates
- Different endpoints may need different limits (writes vs. reads)
Communicate Limits Clearly: Document rate limits in your API documentation:
- State exact limits (requests per second/minute/hour)
- Explain what counts as a request
- Describe burst allowances
- Provide example code for handling rate limits
Implement Graceful Degradation:
// Allow some requests through even when rate limited (probabilistic)
func (i *IPRateLimiter) AllowWithProbability(ip string, probability float64) bool {
limiter := i.GetLimiter(ip)
if limiter.Allow() {
return true
}
// Even if rate limited, allow with some probability
if rand.Float64() < probability {
return true
}
return false
}
Use Distributed Rate Limiting for Multi-Server Deployments:
For applications running on multiple servers, use Redis-based rate limiting:
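// Pseudo-code for Redis-based rate limiting (fixed-window counter)
func (r *RedisRateLimiter) Allow(ctx context.Context, key string) bool {
// INCR plus EXPIRE implements a fixed window, not a sliding window
count, err := r.client.Incr(ctx, key).Result()
if err != nil {
// On Redis error, fail open (allow request)
return true
}
if count == 1 {
// Set expiration on first request in window.
// INCR and EXPIRE are separate commands here; if the process dies between
// them the key never expires, so production code should run both
// atomically (for example in a Lua script).
r.client.Expire(ctx, key, r.window)
}
return count <= r.limit
}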
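The INCR-plus-EXPIRE pattern counts requests in a window that opens on a key's first request and closes when the key expires, which makes it a fixed-window counter. The in-memory sketch below mirrors that logic without Redis to show its main trade-off: a client can spend one window's quota just before expiry and a fresh quota right after. The type fixedWindow and the helper boundaryBurst are hypothetical, and time is passed in as Unix seconds to keep the run deterministic.

```go
package main

import "fmt"

// fixedWindow mimics INCR + EXPIRE: each key has a counter and an expiry.
// It is not safe for concurrent use; it only illustrates the counting rule.
type fixedWindow struct {
	limit   int
	window  int64            // window length in seconds
	count   map[string]int   // requests seen in the current window
	expires map[string]int64 // Unix second at which the current window ends
}

func newFixedWindow(limit int, windowSeconds int64) *fixedWindow {
	return &fixedWindow{
		limit:   limit,
		window:  windowSeconds,
		count:   make(map[string]int),
		expires: make(map[string]int64),
	}
}

// allow increments the counter for key and reports whether the request is
// within the limit. now is the current time in Unix seconds.
func (f *fixedWindow) allow(key string, now int64) bool {
	if now >= f.expires[key] {
		// The key has "expired": this request opens a fresh window.
		f.count[key] = 0
		f.expires[key] = now + f.window
	}
	f.count[key]++
	return f.count[key] <= f.limit
}

// boundaryBurst opens a window at t=0, then counts how many requests are
// allowed in the one-second span around the window's expiry.
func boundaryBurst(limit int, windowSeconds int64) int {
	f := newFixedWindow(limit, windowSeconds)
	f.allow("client", 0) // first request opens the window [0, windowSeconds)
	allowed := 0
	for i := 0; i < limit-1; i++ { // rest of the quota, just before expiry
		if f.allow("client", windowSeconds-1) {
			allowed++
		}
	}
	for i := 0; i < limit; i++ { // a full quota again, right after expiry
		if f.allow("client", windowSeconds) {
			allowed++
		}
	}
	return allowed
}

func main() {
	fmt.Println(boundaryBurst(5, 60)) // requests allowed between t=59 and t=60
}
```

With a limit of 5 per 60 seconds, 9 requests get through within about one second at the boundary. A token bucket or a sliding-window scheme smooths this out at the cost of more state per key.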
Monitor Rate Limiting Effectiveness:
// Track rate limiting metrics
type RateLimitMetrics struct {
TotalRequests int64
RateLimitedCount int64
mu sync.Mutex
}
func (m *RateLimitMetrics) RecordRequest(limited bool) {
m.mu.Lock()
defer m.mu.Unlock()
m.TotalRequests++
if limited {
m.RateLimitedCount++
}
}
func (m *RateLimitMetrics) GetRateLimitPercentage() float64 {
m.mu.Lock()
defer m.mu.Unlock()
if m.TotalRequests == 0 {
return 0
}
return float64(m.RateLimitedCount) / float64(m.TotalRequests) * 100
}
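Because these metrics are two independent counters, the mutex can be replaced with atomic integers so that recording a request never blocks. This is a sketch using atomic.Int64 from sync/atomic (Go 1.19+); the type atomicMetrics is hypothetical. The percentage is read from two separate loads, so under concurrent writes it is a close approximation rather than an exact snapshot.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// atomicMetrics counts total and rate-limited requests without a mutex.
type atomicMetrics struct {
	total   atomic.Int64
	limited atomic.Int64
}

// record notes one request and whether it was rate limited.
func (m *atomicMetrics) record(limited bool) {
	m.total.Add(1)
	if limited {
		m.limited.Add(1)
	}
}

// limitedPercent returns the share of requests that were rate limited.
func (m *atomicMetrics) limitedPercent() float64 {
	total := m.total.Load()
	if total == 0 {
		return 0
	}
	return float64(m.limited.Load()) / float64(total) * 100
}

func main() {
	var m atomicMetrics
	m.record(false)
	m.record(false)
	m.record(false)
	m.record(true)
	fmt.Printf("%.1f%% of requests were rate limited\n", m.limitedPercent())
}
```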
Provide Bypass Mechanisms:
// Allow whitelisted IPs to bypass rate limiting
type WhitelistedRateLimiter struct {
limiter *IPRateLimiter
whitelist map[string]bool
mu sync.RWMutex
}
func (w *WhitelistedRateLimiter) IsWhitelisted(ip string) bool {
w.mu.RLock()
defer w.mu.RUnlock()
return w.whitelist[ip]
}
func (w *WhitelistedRateLimiter) Allow(ip string) bool {
if w.IsWhitelisted(ip) {
return true
}
return w.limiter.GetLimiter(ip).Allow()
}
Common Pitfalls and How to Avoid Them
Pitfall: Memory Leaks from Unlimited Limiter Growth
Without cleanup, your limiters map grows forever as new IPs connect.
Solution: Implement periodic cleanup as shown in our IPRateLimiterWithCleanup or use LRU caches with size limits.
Pitfall: Race Conditions in Concurrent Access
Unsynchronized access to the limiters map from multiple goroutines is a data race, and the Go runtime aborts the program with a fatal error such as "concurrent map writes".
Solution: Always protect shared state with mutexes. Use RWMutex when reads vastly outnumber writes for better performance.
Pitfall: Trusting Client-Provided Headers Blindly
Attackers can spoof X-Forwarded-For to bypass IP-based rate limiting.
Solution: Only trust forwarding headers when requests come from known proxies. Validate proxy IPs before trusting client IP headers.
Pitfall: Rate Limiting Before Authentication
This seems logical but can cause issues:
// Bad: rate limit before auth
router.Use(rateLimitMiddleware)
router.Use(authMiddleware)
// Better: auth first, then rate limit by user
router.Use(authMiddleware)
router.Use(userRateLimitMiddleware)
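Rate limiting after authentication allows per-user limits and keeps one abusive client from exhausting an IP-based limit that legitimate users behind the same shared IP depend on. Unauthenticated endpoints such as login still need IP-based limits, as in Use Case 1 below.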
Pitfall: Setting Burst Too Low
A burst of 1 or 2 makes your API frustrating for legitimate clients experiencing network retries or parallel requests.
Solution: Set burst to 2-3x your sustained rate to handle legitimate traffic spikes while still preventing abuse.
Pitfall: Not Testing Under Load
Rate limiting behavior differs under load due to GC pressure, mutex contention, and CPU saturation.
Solution: Load test your rate limiter with realistic traffic patterns before production deployment.
Real-World Use Cases
Use Case 1: Protecting Login Endpoints
Authentication endpoints need aggressive rate limiting to prevent credential stuffing:
// Strict rate limiting for authentication
authLimiter := NewIPRateLimiter(rate.Limit(0.1), 3) // 1 request per 10 seconds, burst of 3
router.Handle("/api/login", authLimiter.RateLimitMiddleware(
http.HandlerFunc(loginHandler),
))
router.Handle("/api/register", authLimiter.RateLimitMiddleware(
http.HandlerFunc(registerHandler),
))
This allows 3 quick attempts (maybe user fat-fingered their password) but throttles sustained brute force attacks to 6 attempts per minute.
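The numbers above follow from a simple bound: over any period, a token bucket admits at most the burst plus the tokens refilled during that period. The small helper below, a hypothetical function named maxAttempts, works this out for the login limiter's settings (rate 0.1 per second, burst 3).

```go
package main

import (
	"fmt"
	"math"
)

// maxAttempts returns an upper bound on the requests a token bucket admits
// from one client over `seconds`: the initial burst plus refilled tokens.
// The small epsilon guards against floating-point rounding just below a
// whole number of tokens.
func maxAttempts(ratePerSecond float64, burst int, seconds float64) int {
	return burst + int(math.Floor(ratePerSecond*seconds+1e-9))
}

func main() {
	fmt.Println(maxAttempts(0.1, 3, 60))    // first minute: 3 + 6
	fmt.Println(maxAttempts(0.1, 3, 3600))  // first hour: 3 + 360
	fmt.Println(maxAttempts(0.1, 3, 86400)) // first day: 3 + 8640
}
```

So a single IP gets at most 9 attempts in the first minute and roughly 8,600 per day, which is worth comparing against your password policy and any account lockout rules.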
Use Case 2: API Gateway Rate Limiting
An API gateway serving multiple backend services needs flexible rate limiting:
type APIGateway struct {
defaultLimiter *IPRateLimiter
endpointLimits map[string]*IPRateLimiter
}
func (g *APIGateway) GetLimiterForEndpoint(path string) *IPRateLimiter {
if limiter, exists := g.endpointLimits[path]; exists {
return limiter
}
return g.defaultLimiter
}
func (g *APIGateway) RateLimitMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
limiter := g.GetLimiterForEndpoint(r.URL.Path)
ip := GetClientIP(r)
if !limiter.GetLimiter(ip).Allow() {
http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
return
}
next.ServeHTTP(w, r)
})
}
Use Case 3: Webhook Delivery Rate Limiting
When delivering webhooks to customer endpoints, rate limit per destination to avoid overwhelming customer servers:
type WebhookRateLimiter struct {
limiters map[string]*rate.Limiter // key: customer endpoint URL
mu sync.RWMutex
}
func (w *WebhookRateLimiter) DeliverWebhook(url string, payload []byte) error {
limiter := w.GetLimiter(url)
// Wait for permission if rate limited
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := limiter.Wait(ctx); err != nil {
return fmt.Errorf("rate limit wait timeout: %w", err)
}
// Deliver webhook
return sendWebhook(url, payload)
}
Use Case 4: Freemium Model with Progressive Throttling
Gently throttle free tier users as they approach limits rather than hard blocking:
func (t *TieredRateLimiter) AllowWithWarning(userID, tier string) (allowed bool, nearLimit bool) {
limiter := t.GetLimiter(userID, tier)
// Allow() checks and consumes a token in one atomic step
if !limiter.Allow() {
// Already rate limited
return false, false
}
// Check if approaching limit (fewer than 20% of burst tokens left)
tierConfig, ok := t.tiers[tier]
if !ok {
tierConfig = FreeTier // same fallback GetLimiter uses for unknown tiers
}
nearLimit = limiter.Tokens() < float64(tierConfig.Burst)*0.2
return true, nearLimit
}
Performance Considerations
Memory Usage:
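Each rate.Limiter holds a mutex, the rate and burst, the current token count, and two timestamps, roughly 80 bytes on 64-bit platforms. Add the map entry, the IP string key, and (in the cleanup variant) the visitor wrapper, and a realistic budget is well over 100 bytes per tracked client. For 1 million active IPs, plan for memory on the order of 100-200MB rather than tens of megabytes, and confirm the figure with a heap profile.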
CPU Impact:
Rate limiting adds negligible CPU overhead per request:
- Map lookup: O(1) average case
- Mutex lock/unlock: ~20-50 nanoseconds on modern hardware
- Token calculation: Simple arithmetic
Benchmark results on typical hardware:
BenchmarkRateLimiter-8 5000000 250 ns/op 0 B/op 0 allocs/op
That’s 4 million rate limit checks per second per core—more than sufficient for most applications.
Scaling to High Traffic:
For services handling millions of requests per second:
Use sharding: Partition limiters across multiple maps with separate mutexes:
type ShardedRateLimiter struct {
shards []*IPRateLimiter
shardMask uint32
}
func (s *ShardedRateLimiter) getShard(ip string) *IPRateLimiter {
hash := fnv32a(ip)
return s.shards[hash&s.shardMask]
}
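The snippet above calls fnv32a without defining it. A minimal version can be built on the standard hash/fnv package, as sketched below. The helper shardIndex is a hypothetical name, and it assumes the shard count is a power of two so that count minus one works as a bit mask, matching the shardMask field.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnv32a hashes a string with 32-bit FNV-1a from the standard library.
func fnv32a(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s)) // Write on a hash.Hash never returns an error
	return h.Sum32()
}

// shardIndex maps a key to a shard. shardCount must be a power of two so
// that (shardCount - 1) is a valid bit mask.
func shardIndex(key string, shardCount uint32) uint32 {
	return fnv32a(key) & (shardCount - 1)
}

func main() {
	for _, ip := range []string{"192.168.1.1", "192.168.1.2", "203.0.113.7"} {
		fmt.Printf("%s -> shard %d\n", ip, shardIndex(ip, 16))
	}
}
```

The same IP always maps to the same shard, so its limiter state stays in one place while lock contention is spread across shards.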
Consider atomic operations: For very simple counters, sync/atomic can outperform mutexes:
type AtomicCounter struct {
value int64
reset int64 // Unix timestamp
}
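func (a *AtomicCounter) Increment() int64 {
now := time.Now().Unix()
reset := atomic.LoadInt64(&a.reset)
// CompareAndSwap lets exactly one goroutine start the new window;
// a plain load-then-store would let several goroutines reset it at once.
if now > reset && atomic.CompareAndSwapInt64(&a.reset, reset, now+60) {
atomic.StoreInt64(&a.value, 1)
return 1
}
// Increments racing with a reset may be lost, so the count is approximate
return atomic.AddInt64(&a.value, 1)
}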
Use Redis for distributed rate limiting: For multi-server deployments, centralized state in Redis prevents bypassing limits by distributing requests across servers.
Testing Approach
Unit Tests for Basic Functionality:
package middleware
import (
"fmt"
"net/http"
"net/http/httptest"
"sync"
"testing"
"time"
"golang.org/x/time/rate"
)
func TestRateLimiter_Allow(t *testing.T) {
limiter := NewIPRateLimiter(rate.Limit(1), 2) // 1 per second, burst 2
// Should allow first 2 requests (burst)
l := limiter.GetLimiter("192.168.1.1")
if !l.Allow() {
t.Error("First request should be allowed")
}
if !l.Allow() {
t.Error("Second request should be allowed")
}
// Third request should be denied
if l.Allow() {
t.Error("Third request should be denied")
}
// After 1 second, one more request should be allowed
time.Sleep(1 * time.Second)
if !l.Allow() {
t.Error("Request after delay should be allowed")
}
}
func TestRateLimiter_DifferentIPs(t *testing.T) {
limiter := NewIPRateLimiter(rate.Limit(1), 1)
// Different IPs should have independent limits
ip1 := limiter.GetLimiter("192.168.1.1")
ip2 := limiter.GetLimiter("192.168.1.2")
if !ip1.Allow() {
t.Error("IP1 first request should be allowed")
}
if !ip2.Allow() {
t.Error("IP2 first request should be allowed")
}
// Both should be rate limited now
if ip1.Allow() {
t.Error("IP1 second request should be denied")
}
if ip2.Allow() {
t.Error("IP2 second request should be denied")
}
}
Integration Tests for Middleware:
func TestRateLimitMiddleware_Integration(t *testing.T) {
limiter := NewIPRateLimiter(rate.Limit(2), 2)
handler := limiter.RateLimitMiddleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
w.Write([]byte("OK"))
}))
// Simulate multiple requests from same IP
for i := 0; i < 2; i++ {
req := httptest.NewRequest("GET", "/", nil)
req.RemoteAddr = "192.168.1.1:1234"
w := httptest.NewRecorder()
handler.ServeHTTP(w, req)
if w.Code != http.StatusOK {
t.Errorf("Request %d: expected status 200, got %d", i+1, w.Code)
}
}
// Third request should be rate limited
req := httptest.NewRequest("GET", "/", nil)
req.RemoteAddr = "192.168.1.1:1234"
w := httptest.NewRecorder()
handler.ServeHTTP(w, req)
if w.Code != http.StatusTooManyRequests {
t.Errorf("Expected status 429, got %d", w.Code)
}
}
Load Testing:
func TestRateLimiter_Concurrent(t *testing.T) {
limiter := NewIPRateLimiter(rate.Limit(100), 100)
const numGoroutines = 100
const requestsPerGoroutine = 1000
var wg sync.WaitGroup
wg.Add(numGoroutines)
for i := 0; i < numGoroutines; i++ {
go func(id int) {
defer wg.Done()
ip := fmt.Sprintf("192.168.1.%d", id)
for j := 0; j < requestsPerGoroutine; j++ {
limiter.GetLimiter(ip).Allow()
}
}(i)
}
wg.Wait()
// Run with go test -race; the test passes if no data races or panics are reported
}
Benchmark Tests:
func BenchmarkRateLimiter_SingleIP(b *testing.B) {
limiter := NewIPRateLimiter(rate.Limit(1000000), 1000000) // High limit
l := limiter.GetLimiter("192.168.1.1")
b.ResetTimer()
for i := 0; i < b.N; i++ {
l.Allow()
}
}
func BenchmarkRateLimiter_ManyIPs(b *testing.B) {
limiter := NewIPRateLimiter(rate.Limit(1000000), 1000000)
b.ResetTimer()
for i := 0; i < b.N; i++ {
ip := fmt.Sprintf("192.168.1.%d", i%1000)
limiter.GetLimiter(ip).Allow()
}
}
func BenchmarkRateLimiter_Concurrent(b *testing.B) {
limiter := NewIPRateLimiter(rate.Limit(1000000), 1000000)
b.RunParallel(func(pb *testing.PB) {
ip := "192.168.1.1"
l := limiter.GetLimiter(ip)
for pb.Next() {
l.Allow()
}
})
}
Conclusion
Rate limiting is essential infrastructure for any production API. A well-implemented rate limiter protects your service from abuse, ensures fair resource allocation, and maintains quality of service for all users.
Key takeaways:
- Choose the right algorithm: Token bucket balances burst allowance with sustained rate enforcement
- Clean up properly: Implement periodic cleanup to prevent memory leaks
- Be informative: Add standard rate limit headers to help clients adapt
- Scale appropriately: Use IP-based limiting for public endpoints, user-based for authenticated APIs
- Test thoroughly: Unit tests, integration tests, and load tests ensure reliability
- Monitor continuously: Track rate limiting effectiveness and adjust limits based on real usage
- Fail safely: When in doubt, allow requests rather than blocking legitimate traffic
The patterns and code in this guide handle common scenarios, but your specific requirements may differ. Always profile your implementation under realistic load, monitor production behavior, and iterate based on actual usage patterns.
Rate limiting is not a one-time implementation—it requires ongoing tuning as your traffic patterns evolve. Start with the generous limits recommended under Best Practices, monitor carefully, and tighten based on data rather than assumptions.
Additional Resources
Libraries and Tools:
- golang.org/x/time/rate - Official Go rate limiting package
- tollbooth - HTTP rate limiter middleware
- redis-rate - Redis-based distributed rate limiting
- ulule/limiter - Dead simple rate limiter with multiple storage backends
Further Reading:
- Rate Limiting in Distributed Systems - Stripe’s approach
- Token Bucket Algorithm - Algorithm fundamentals
- RFC 6585 - Additional HTTP status codes including 429
- API Rate Limiting Best Practices - NGINX guide
API Rate Limit Standards:
- IETF Draft: RateLimit Headers - Standardized headers
- GitHub API Rate Limiting - Real-world example
- Twitter API Rate Limits - Tiered approach
Advanced Topics:
- Distributed rate limiting with Redis
- Adaptive rate limiting based on system load
- Rate limiting in microservices architectures
- DDoS mitigation strategies
- Circuit breakers and rate limiting combined
Read More
How to Rate Limit HTTP Requests by Alex Edwards - An excellent introduction to rate limiting in Go that inspired this comprehensive guide. Alex’s tutorial provides a concise foundation for understanding rate limiting fundamentals.