Building Production-Ready Rate Limiting Middleware in Go

Rate limiting is your application’s first line of defense against abuse, whether from malicious actors launching denial-of-service attacks, buggy clients stuck in retry loops, or legitimate users inadvertently overwhelming your system. Without proper rate limiting, a single misbehaving client can bring down your entire service, impacting all users and potentially costing your business significant revenue and reputation.

In this comprehensive guide, we’ll build a production-grade rate limiting middleware for Go HTTP servers that protects your APIs while maintaining performance and flexibility. We’ll start with fundamental concepts and progress to sophisticated patterns used by companies serving millions of requests per second.

Why Rate Limiting Matters

Protecting Service Availability: A single client making thousands of requests per second can exhaust your server’s resources—CPU, memory, database connections, and bandwidth. Rate limiting ensures fair resource distribution across all users.

Cost Control: Cloud infrastructure bills scale with usage. Without rate limiting, runaway clients can generate massive costs. One misconfigured script hitting your API continuously could result in thousands of dollars in unexpected charges.

Security Defense: Rate limiting is crucial for defending against:

  • Brute force attacks: Limiting login attempts prevents password guessing
  • Credential stuffing: Slowing down automated account takeover attempts
  • API scraping: Preventing competitors from harvesting your data
  • DDoS attacks: Mitigating distributed denial-of-service attempts

Quality of Service: Rate limits help maintain consistent response times for all users. Without them, a few heavy users can slow down the experience for everyone else.

Business Model Enforcement: For tiered API products, rate limiting ensures customers stay within their plan limits, protecting your revenue model and preventing abuse of free tiers.

Prerequisites

Before implementing rate limiting, you should understand:

  • Go fundamentals: Goroutines, channels, and concurrent programming patterns
  • HTTP basics: Request/response cycle, status codes, and headers
  • Middleware concepts: How middleware chains work in HTTP handlers
  • Token bucket algorithm: The rate limiting strategy we’ll implement
  • Concurrency primitives: Mutexes, atomic operations, and synchronization

Required packages:

go get golang.org/x/time/rate

The examples in this guide were written against Go 1.21 but remain compatible with Go 1.13 and later.

Understanding Rate Limiting Algorithms

Before writing code, let’s understand the token bucket algorithm, which powers golang.org/x/time/rate.

The Token Bucket Model:

Imagine a bucket that holds tokens. Each token represents permission to make one request:

  1. Tokens are added at a constant rate (the refill rate)
  2. The bucket has a maximum capacity (the burst size)
  3. Each request consumes one token
  4. If no tokens are available, the request is denied

For example, a rate limiter with rate=10 and burst=20 means:

  • Tokens refill at 10 per second
  • The bucket can hold maximum 20 tokens
  • A client can make 20 requests instantly (burst)
  • After bursting, they’re limited to 10 requests per second
  • Tokens accumulate when not used, up to the burst limit

This algorithm is preferred because it:

  • Allows bursts: Legitimate clients can handle temporary spikes
  • Prevents sustained abuse: Long-term rate is strictly enforced
  • Is simple and efficient: Requires minimal state per client
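
To make these mechanics concrete, here is a minimal single-goroutine sketch of a token bucket. It is illustrative only; golang.org/x/time/rate, used throughout this guide, implements the same idea with proper concurrency and reservations:

// tokenBucket is a minimal, single-goroutine illustration of the algorithm.
// Not safe for concurrent use; prefer golang.org/x/time/rate in real code.
type tokenBucket struct {
    tokens float64   // currently available tokens, capped at burst
    burst  float64   // maximum bucket capacity
    rate   float64   // tokens added per second
    last   time.Time // time of the last refill
}

func (b *tokenBucket) allow(now time.Time) bool {
    // Refill proportionally to elapsed time, capped at burst
    b.tokens += now.Sub(b.last).Seconds() * b.rate
    if b.tokens > b.burst {
        b.tokens = b.burst
    }
    b.last = now

    // Consume one token if available
    if b.tokens >= 1 {
        b.tokens--
        return true
    }
    return false
}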

Building a Basic Rate Limiting Middleware

Let’s start with a simple per-IP rate limiter:

package middleware

import (
    "log"
    "net"
    "net/http"
    "sync"

    "golang.org/x/time/rate"
)

// IPRateLimiter manages rate limiters for multiple clients
type IPRateLimiter struct {
    limiters map[string]*rate.Limiter
    mu       sync.RWMutex
    rate     rate.Limit // requests per second
    burst    int        // maximum burst size
}

// NewIPRateLimiter creates a new IP-based rate limiter
func NewIPRateLimiter(r rate.Limit, b int) *IPRateLimiter {
    return &IPRateLimiter{
        limiters: make(map[string]*rate.Limiter),
        rate:     r,
        burst:    b,
    }
}

// GetLimiter returns the rate limiter for the provided IP address
func (i *IPRateLimiter) GetLimiter(ip string) *rate.Limiter {
    i.mu.Lock()
    defer i.mu.Unlock()

    limiter, exists := i.limiters[ip]
    if !exists {
        limiter = rate.NewLimiter(i.rate, i.burst)
        i.limiters[ip] = limiter
    }

    return limiter
}

// RateLimitMiddleware creates HTTP middleware that rate limits by IP
func (i *IPRateLimiter) RateLimitMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Extract IP address from request
        ip, _, err := net.SplitHostPort(r.RemoteAddr)
        if err != nil {
            log.Printf("Error parsing IP address: %v", err)
            http.Error(w, "Internal Server Error", http.StatusInternalServerError)
            return
        }

        // Get or create rate limiter for this IP
        limiter := i.GetLimiter(ip)

        // Check if request is allowed
        if !limiter.Allow() {
            http.Error(w, http.StatusText(http.StatusTooManyRequests), http.StatusTooManyRequests)
            return
        }

        // Request is allowed, proceed to next handler
        next.ServeHTTP(w, r)
    })
}
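
Wiring the middleware into a server takes a single wrap; a minimal sketch (the module path and handler are placeholders):

package main

import (
    "net/http"

    "golang.org/x/time/rate"

    "example.com/yourapp/middleware" // placeholder module path
)

func main() {
    limiter := middleware.NewIPRateLimiter(rate.Limit(10), 20) // 10 rps, burst 20

    mux := http.NewServeMux()
    mux.HandleFunc("/api/data", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("OK"))
    })

    http.ListenAndServe(":8080", limiter.RateLimitMiddleware(mux))
}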

This basic implementation provides:

  • Per-IP rate limiting: Each IP address gets its own limiter
  • Thread safety: a mutex guards concurrent access to the limiters map (the tiered limiter later adds a read-optimized RLock fast path)
  • Automatic limiter creation: New IPs get limiters on first request
  • Simple denial: Blocked requests receive 429 Too Many Requests

However, this basic version has critical flaws we’ll address in production-ready implementations.

Extracting Client IP Addresses Correctly

Getting the real client IP is tricky when your application sits behind proxies or load balancers:

package middleware

import (
    "net"
    "net/http"
    "strings"
)

// GetClientIP extracts the real client IP from the request
// considering proxies and load balancers
func GetClientIP(r *http.Request) string {
    // Check X-Forwarded-For header (set by proxies)
    xff := r.Header.Get("X-Forwarded-For")
    if xff != "" {
        // X-Forwarded-For can contain multiple IPs: "client, proxy1, proxy2"
        // Take the first one (the original client)
        ips := strings.Split(xff, ",")
        if len(ips) > 0 {
            // Trim whitespace from IP
            clientIP := strings.TrimSpace(ips[0])
            // Validate it's a real IP
            if net.ParseIP(clientIP) != nil {
                return clientIP
            }
        }
    }

    // Check X-Real-IP header (set by some proxies)
    xri := r.Header.Get("X-Real-IP")
    if xri != "" {
        if net.ParseIP(xri) != nil {
            return xri
        }
    }

    // Fall back to RemoteAddr
    ip, _, err := net.SplitHostPort(r.RemoteAddr)
    if err != nil {
        // If no port, RemoteAddr might be just the IP
        return r.RemoteAddr
    }

    return ip
}

// GetClientIPWithTrustedProxies safely extracts client IP
// only when the request comes through trusted proxies
func GetClientIPWithTrustedProxies(r *http.Request, trustedProxies []string) string {
    remoteIP, _, err := net.SplitHostPort(r.RemoteAddr)
    if err != nil {
        // RemoteAddr may already be a bare IP (common in tests)
        remoteIP = r.RemoteAddr
    }
    
    // Check if request comes from a trusted proxy
    isTrusted := false
    for _, proxy := range trustedProxies {
        if remoteIP == proxy {
            isTrusted = true
            break
        }
    }

    // Only trust X-Forwarded-For if from trusted proxy
    if isTrusted {
        xff := r.Header.Get("X-Forwarded-For")
        if xff != "" {
            ips := strings.Split(xff, ",")
            if len(ips) > 0 {
                return strings.TrimSpace(ips[0])
            }
        }
    }

    // Otherwise use the direct connection IP
    return remoteIP
}

Security consideration: Never blindly trust X-Forwarded-For without validating that the request arrived via a trusted proxy; attackers can spoof this header to bypass IP-based rate limiting. Even behind a trusted proxy, the first entry in the list is client-supplied, since honest proxies only append to the header. A more defensive approach is to walk the list from the right, as sketched below.
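
A minimal sketch of that right-to-left walk (the helper name and trusted set are illustrative):

// rightmostUntrustedIP scans X-Forwarded-For from the right and returns the
// first address that is not one of our proxies: entries to its left are
// client-controlled and cannot be trusted.
func rightmostUntrustedIP(xff string, trusted map[string]bool) string {
    ips := strings.Split(xff, ",")
    for i := len(ips) - 1; i >= 0; i-- {
        ip := strings.TrimSpace(ips[i])
        if !trusted[ip] {
            return ip
        }
    }
    return "" // every hop was one of our trusted proxies
}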

Implementing Memory-Efficient Cleanup

Our basic rate limiter has a memory leak: limiters are never removed. If your API receives requests from millions of IPs, memory usage grows unbounded.

package middleware

import (
    "sync"
    "time"

    "golang.org/x/time/rate"
)

// visitor represents a rate-limited client
type visitor struct {
    limiter  *rate.Limiter
    lastSeen time.Time
}

// IPRateLimiterWithCleanup manages rate limiters with automatic cleanup
type IPRateLimiterWithCleanup struct {
    visitors map[string]*visitor
    mu       sync.RWMutex
    rate     rate.Limit
    burst    int
}

// NewIPRateLimiterWithCleanup creates a rate limiter with background cleanup
func NewIPRateLimiterWithCleanup(r rate.Limit, b int) *IPRateLimiterWithCleanup {
    limiter := &IPRateLimiterWithCleanup{
        visitors: make(map[string]*visitor),
        rate:     r,
        burst:    b,
    }

    // Start cleanup goroutine
    go limiter.cleanupVisitors()

    return limiter
}

// GetLimiter returns the rate limiter for an IP, creating if necessary
func (i *IPRateLimiterWithCleanup) GetLimiter(ip string) *rate.Limiter {
    i.mu.Lock()
    defer i.mu.Unlock()

    v, exists := i.visitors[ip]
    if !exists {
        limiter := rate.NewLimiter(i.rate, i.burst)
        i.visitors[ip] = &visitor{
            limiter:  limiter,
            lastSeen: time.Now(),
        }
        return limiter
    }

    // Update last seen time
    v.lastSeen = time.Now()
    return v.limiter
}

// cleanupVisitors removes inactive visitors to prevent memory leaks
func (i *IPRateLimiterWithCleanup) cleanupVisitors() {
    ticker := time.NewTicker(5 * time.Minute)
    defer ticker.Stop()

    for range ticker.C {
        i.mu.Lock()
        
        // Remove visitors inactive for more than 10 minutes
        threshold := time.Now().Add(-10 * time.Minute)
        for ip, v := range i.visitors {
            if v.lastSeen.Before(threshold) {
                delete(i.visitors, ip)
            }
        }
        
        i.mu.Unlock()
    }
}

This cleanup strategy:

  • Runs periodically: Checks every 5 minutes
  • Removes stale entries: Deletes visitors inactive for 10+ minutes
  • Balances memory and accuracy: Keeps recent visitors for smooth rate limiting
  • Uses minimal resources: Simple background goroutine with low overhead
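
One caveat: the goroutine above runs for the life of the process. If the limiter can be torn down (tests, graceful shutdown), give it a stop signal; a sketch, assuming you add a stop channel to the struct and close it on shutdown:

// cleanupVisitorsWithStop is a variant that exits when stop is closed
func (i *IPRateLimiterWithCleanup) cleanupVisitorsWithStop(stop <-chan struct{}) {
    ticker := time.NewTicker(5 * time.Minute)
    defer ticker.Stop()

    for {
        select {
        case <-stop:
            return
        case <-ticker.C:
            i.mu.Lock()
            threshold := time.Now().Add(-10 * time.Minute)
            for ip, v := range i.visitors {
                if v.lastSeen.Before(threshold) {
                    delete(i.visitors, ip)
                }
            }
            i.mu.Unlock()
        }
    }
}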

Adding Informative Rate Limit Headers

Good APIs inform clients about rate limits through HTTP headers. The X-RateLimit-* headers used below are a widely adopted de facto convention (an IETF draft standardizes similar RateLimit-* fields):

package middleware

import (
    "fmt"
    "net/http"
    "time"
)

// RateLimitMiddlewareWithHeaders adds rate limit information to responses
func (i *IPRateLimiterWithCleanup) RateLimitMiddlewareWithHeaders(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        ip := GetClientIP(r)
        limiter := i.GetLimiter(ip)

        // Reserve a token. If it is immediately available we keep the
        // reservation (consuming the token); otherwise we cancel it and
        // reject. Using a single reservation avoids the race of checking
        // availability and consuming in two separate calls.
        reservation := limiter.Reserve()
        if !reservation.OK() {
            // Only possible when burst is 0; treat as a configuration error
            http.Error(w, "Rate limit configuration error", http.StatusInternalServerError)
            return
        }

        if delay := reservation.Delay(); delay > 0 {
            // Rate limit exceeded: return the token and tell the client when to retry
            reservation.Cancel()
            retryAfter := time.Now().Add(delay)

            w.Header().Set("X-RateLimit-Limit", fmt.Sprintf("%d", i.burst))
            w.Header().Set("X-RateLimit-Remaining", "0")
            w.Header().Set("X-RateLimit-Reset", fmt.Sprintf("%d", retryAfter.Unix()))
            w.Header().Set("Retry-After", fmt.Sprintf("%d", int(delay.Seconds())+1))

            http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
            return
        }

        // The reservation consumed a token; report the limiter's state.
        // Tokens() is a snapshot that refills continuously, so these
        // values are advisory approximations.
        w.Header().Set("X-RateLimit-Limit", fmt.Sprintf("%d", i.burst))

        remaining := int(limiter.Tokens())
        if remaining < 0 {
            remaining = 0
        }
        w.Header().Set("X-RateLimit-Remaining", fmt.Sprintf("%d", remaining))

        // Approximate time until the bucket refills completely
        refill := time.Duration(float64(i.burst-remaining) / float64(i.rate) * float64(time.Second))
        w.Header().Set("X-RateLimit-Reset", fmt.Sprintf("%d", time.Now().Add(refill).Unix()))

        next.ServeHTTP(w, r)
    })
}

These headers help clients:

  • Understand their limits: X-RateLimit-Limit shows the maximum rate
  • Avoid hitting limits: X-RateLimit-Remaining helps clients pace requests
  • Know when to retry: X-RateLimit-Reset and Retry-After indicate when limits reset
  • Build smarter clients: Applications can implement exponential backoff using this data
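
On the client side, these headers enable polite retry behavior. A sketch of a caller honoring Retry-After (assumes an idempotent GET with no request body, since a consumed body cannot be resent; requires "errors", "net/http", "strconv", and "time"):

// doWithRetry retries a request up to three times, sleeping for the
// server-advertised Retry-After duration between attempts.
func doWithRetry(client *http.Client, req *http.Request) (*http.Response, error) {
    for attempt := 0; attempt < 3; attempt++ {
        resp, err := client.Do(req)
        if err != nil {
            return nil, err
        }
        if resp.StatusCode != http.StatusTooManyRequests {
            return resp, nil
        }
        resp.Body.Close()

        seconds, _ := strconv.Atoi(resp.Header.Get("Retry-After"))
        if seconds <= 0 {
            seconds = 1 // fall back to a short wait if the header is absent
        }
        time.Sleep(time.Duration(seconds) * time.Second)
    }
    return nil, errors.New("still rate limited after retries")
}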

Implementing Per-User Rate Limiting

IP-based rate limiting doesn’t work well for:

  • Mobile users: IPs change frequently as users move between networks
  • Corporate networks: Many users share the same IP
  • Authenticated APIs: You want to limit by user account, not IP

Here’s how to implement flexible per-user rate limiting:

package middleware

import (
    "context"
    "net/http"
    "sync"
    "time"

    "golang.org/x/time/rate"
)

// UserRateLimiter manages rate limits by user ID
type UserRateLimiter struct {
    limiters map[string]*visitor
    mu       sync.RWMutex
    rate     rate.Limit
    burst    int
}

// NewUserRateLimiter creates a user-based rate limiter
func NewUserRateLimiter(r rate.Limit, b int) *UserRateLimiter {
    limiter := &UserRateLimiter{
        limiters: make(map[string]*visitor),
        rate:     r,
        burst:    b,
    }
    go limiter.cleanupVisitors()
    return limiter
}

func (u *UserRateLimiter) GetLimiter(userID string) *rate.Limiter {
    u.mu.Lock()
    defer u.mu.Unlock()

    v, exists := u.limiters[userID]
    if !exists {
        limiter := rate.NewLimiter(u.rate, u.burst)
        u.limiters[userID] = &visitor{
            limiter:  limiter,
            lastSeen: time.Now(),
        }
        return limiter
    }

    v.lastSeen = time.Now()
    return v.limiter
}

func (u *UserRateLimiter) cleanupVisitors() {
    ticker := time.NewTicker(5 * time.Minute)
    defer ticker.Stop()

    for range ticker.C {
        u.mu.Lock()
        threshold := time.Now().Add(-10 * time.Minute)
        for userID, v := range u.limiters {
            if v.lastSeen.Before(threshold) {
                delete(u.limiters, userID)
            }
        }
        u.mu.Unlock()
    }
}

// UserRateLimitMiddleware creates middleware for authenticated endpoints
func (u *UserRateLimiter) UserRateLimitMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Extract user ID from context (set by auth middleware)
        userID, ok := GetUserIDFromContext(r.Context())
        if !ok {
            // No user ID means unauthenticated request
            // You might want IP-based rate limiting for these
            http.Error(w, "Unauthorized", http.StatusUnauthorized)
            return
        }

        limiter := u.GetLimiter(userID)
        if !limiter.Allow() {
            http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
            return
        }

        next.ServeHTTP(w, r)
    })
}

// Context helpers
type contextKey string

const userIDKey contextKey = "userID"

func GetUserIDFromContext(ctx context.Context) (string, bool) {
    userID, ok := ctx.Value(userIDKey).(string)
    return userID, ok
}

func SetUserIDInContext(ctx context.Context, userID string) context.Context {
    return context.WithValue(ctx, userIDKey, userID)
}
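
The user ID is expected to be placed in the context by an authentication middleware earlier in the chain. A hypothetical sketch (lookupUserID stands in for your token validation):

// AuthMiddleware extracts a bearer token, resolves it to a user ID, and
// stores the ID in the request context for downstream middleware.
func AuthMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        token := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
        userID, err := lookupUserID(token) // placeholder: your token validation
        if err != nil {
            http.Error(w, "Unauthorized", http.StatusUnauthorized)
            return
        }
        next.ServeHTTP(w, r.WithContext(SetUserIDInContext(r.Context(), userID)))
    })
}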

Creating Tiered Rate Limits

Production APIs often need different rate limits for different user tiers:

package middleware

import (
    "context"
    "net/http"
    "sync"

    "golang.org/x/time/rate"
)

// RateLimitTier defines a rate limit tier
type RateLimitTier struct {
    Name  string
    Rate  rate.Limit
    Burst int
}

// Common tier definitions
var (
    FreeTier = RateLimitTier{
        Name:  "free",
        Rate:  rate.Limit(10),  // 10 requests per second
        Burst: 20,
    }
    ProTier = RateLimitTier{
        Name:  "pro",
        Rate:  rate.Limit(100), // 100 requests per second
        Burst: 200,
    }
    EnterpriseTier = RateLimitTier{
        Name:  "enterprise",
        Rate:  rate.Limit(1000), // 1000 requests per second
        Burst: 2000,
    }
)

// TieredRateLimiter manages different rate limits per tier
type TieredRateLimiter struct {
    limiters map[string]*rate.Limiter
    mu       sync.RWMutex
    tiers    map[string]RateLimitTier
}

// NewTieredRateLimiter creates a rate limiter with tier support
func NewTieredRateLimiter(tiers []RateLimitTier) *TieredRateLimiter {
    tierMap := make(map[string]RateLimitTier)
    for _, tier := range tiers {
        tierMap[tier.Name] = tier
    }

    return &TieredRateLimiter{
        limiters: make(map[string]*rate.Limiter),
        tiers:    tierMap,
    }
}

// GetLimiter returns a limiter for the user's tier
func (t *TieredRateLimiter) GetLimiter(userID string, tierName string) *rate.Limiter {
    key := userID + ":" + tierName

    t.mu.RLock()
    limiter, exists := t.limiters[key]
    t.mu.RUnlock()

    if exists {
        return limiter
    }

    // Create new limiter for this user and tier
    tier, ok := t.tiers[tierName]
    if !ok {
        // Default to free tier if tier not found
        tier = FreeTier
    }

    t.mu.Lock()
    defer t.mu.Unlock()

    // Double-check after acquiring write lock
    if limiter, exists := t.limiters[key]; exists {
        return limiter
    }

    limiter = rate.NewLimiter(tier.Rate, tier.Burst)
    t.limiters[key] = limiter
    return limiter
}

// TieredRateLimitMiddleware applies appropriate rate limit based on user tier
func (t *TieredRateLimiter) TieredRateLimitMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        userID, ok := GetUserIDFromContext(r.Context())
        if !ok {
            http.Error(w, "Unauthorized", http.StatusUnauthorized)
            return
        }

        // Get user's tier from context (set by auth middleware)
        tier, ok := GetUserTierFromContext(r.Context())
        if !ok {
            tier = "free" // Default to free tier
        }

        limiter := t.GetLimiter(userID, tier)
        if !limiter.Allow() {
            w.Header().Set("X-RateLimit-Tier", tier)
            http.Error(w, "Rate limit exceeded for your tier", http.StatusTooManyRequests)
            return
        }

        w.Header().Set("X-RateLimit-Tier", tier)
        next.ServeHTTP(w, r)
    })
}

// Context helpers for tier
const userTierKey contextKey = "userTier"

func GetUserTierFromContext(ctx context.Context) (string, bool) {
    tier, ok := ctx.Value(userTierKey).(string)
    return tier, ok
}

func SetUserTierInContext(ctx context.Context, tier string) context.Context {
    return context.WithValue(ctx, userTierKey, tier)
}
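
Putting it together, authentication resolves the user and tier before the limiter runs; a wiring sketch (dataHandler and the AuthMiddleware sketched earlier are placeholders):

tiered := NewTieredRateLimiter([]RateLimitTier{FreeTier, ProTier, EnterpriseTier})

mux := http.NewServeMux()
mux.HandleFunc("/api/v1/data", dataHandler) // placeholder handler

// Auth runs first so the user ID and tier are in the context
// by the time the tiered limiter inspects the request.
handler := AuthMiddleware(tiered.TieredRateLimitMiddleware(mux))
http.ListenAndServe(":8080", handler)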

Best Practices

Choose Appropriate Limits: Don’t set limits arbitrarily. Analyze your traffic patterns:

  • Start with generous limits and tighten based on abuse patterns
  • Monitor 95th and 99th percentile request rates
  • Different endpoints may need different limits (writes vs. reads)

Communicate Limits Clearly: Document rate limits in your API documentation:

  • State exact limits (requests per second/minute/hour)
  • Explain what counts as a request
  • Describe burst allowances
  • Provide example code for handling rate limits

Implement Graceful Degradation:

// Allow some requests through even when rate limited (probabilistic).
// Requires "math/rand" among the imports.
func (i *IPRateLimiter) AllowWithProbability(ip string, probability float64) bool {
    limiter := i.GetLimiter(ip)
    
    if limiter.Allow() {
        return true
    }
    
    // Even if rate limited, allow with some probability
    if rand.Float64() < probability {
        return true
    }
    
    return false
}

Use Distributed Rate Limiting for Multi-Server Deployments:

For applications running on multiple servers, use Redis-based rate limiting:

// Pseudo-code for Redis-based rate limiting
func (r *RedisRateLimiter) Allow(key string) bool {
    // Use Redis INCR with EXPIRE for a fixed-window counter
    count, err := r.client.Incr(ctx, key).Result()
    if err != nil {
        // On Redis error, fail open (allow request)
        return true
    }
    
    if count == 1 {
        // Set expiration on first request in window
        r.client.Expire(ctx, key, r.window)
    }
    
    return count <= r.limit
}

Monitor Rate Limiting Effectiveness:

// Track rate limiting metrics
type RateLimitMetrics struct {
    TotalRequests    int64
    RateLimitedCount int64
    mu               sync.Mutex
}

func (m *RateLimitMetrics) RecordRequest(limited bool) {
    m.mu.Lock()
    defer m.mu.Unlock()
    
    m.TotalRequests++
    if limited {
        m.RateLimitedCount++
    }
}

func (m *RateLimitMetrics) GetRateLimitPercentage() float64 {
    m.mu.Lock()
    defer m.mu.Unlock()
    
    if m.TotalRequests == 0 {
        return 0
    }
    return float64(m.RateLimitedCount) / float64(m.TotalRequests) * 100
}

Provide Bypass Mechanisms:

// Allow whitelisted IPs to bypass rate limiting
type WhitelistedRateLimiter struct {
    limiter   *IPRateLimiter
    whitelist map[string]bool
    mu        sync.RWMutex
}

func (w *WhitelistedRateLimiter) IsWhitelisted(ip string) bool {
    w.mu.RLock()
    defer w.mu.RUnlock()
    return w.whitelist[ip]
}

func (w *WhitelistedRateLimiter) Allow(ip string) bool {
    if w.IsWhitelisted(ip) {
        return true
    }
    return w.limiter.GetLimiter(ip).Allow()
}
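
Construction is straightforward; for example, internal monitoring hosts might bypass limits entirely (the addresses are illustrative):

wl := &WhitelistedRateLimiter{
    limiter: NewIPRateLimiter(rate.Limit(10), 20),
    whitelist: map[string]bool{
        "10.0.0.5": true, // internal health checker
        "10.0.0.6": true, // metrics scraper
    },
}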

Common Pitfalls and How to Avoid Them

Pitfall: Memory Leaks from Unlimited Limiter Growth

Without cleanup, your limiters map grows forever as new IPs connect.

Solution: Implement periodic cleanup as shown in our IPRateLimiterWithCleanup or use LRU caches with size limits.

Pitfall: Race Conditions in Concurrent Access

Multiple goroutines accessing the limiters map simultaneously causes panics.

Solution: Always protect shared state with mutexes. Use RWMutex when reads vastly outnumber writes for better performance.

Pitfall: Trusting Client-Provided Headers Blindly

Attackers can spoof X-Forwarded-For to bypass IP-based rate limiting.

Solution: Only trust forwarding headers when requests come from known proxies. Validate proxy IPs before trusting client IP headers.

Pitfall: Rate Limiting Before Authentication

This seems logical but can cause issues:

// Bad: rate limit before auth
router.Use(rateLimitMiddleware)
router.Use(authMiddleware)

// Better: auth first, then rate limit by user
router.Use(authMiddleware)
router.Use(userRateLimitMiddleware)

Rate limiting after authentication enables per-user limits and prevents attackers on shared IPs from exhausting an IP-based limit and locking out legitimate users behind the same address.
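
That said, the auth endpoints themselves still need protection, as Use Case 1 below shows. Many services therefore layer both: a generous IP limit in front absorbs unauthenticated floods, and a per-user limit applies behind auth. A sketch combining the limiters built earlier (apiHandler and the AuthMiddleware sketch are placeholders):

ipLimiter := NewIPRateLimiter(rate.Limit(100), 200)   // loose, anti-flood
userLimiter := NewUserRateLimiter(rate.Limit(10), 20) // per-account quota

handler := ipLimiter.RateLimitMiddleware(
    AuthMiddleware(
        userLimiter.UserRateLimitMiddleware(apiHandler), // placeholder handler
    ),
)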

Pitfall: Setting Burst Too Low

A burst of 1 or 2 makes your API frustrating for legitimate clients experiencing network retries or parallel requests.

Solution: Set burst to 2-3x your sustained rate to handle legitimate traffic spikes while still preventing abuse.

Pitfall: Not Testing Under Load

Rate limiting behavior differs under load due to GC pressure, mutex contention, and CPU saturation.

Solution: Load test your rate limiter with realistic traffic patterns before production deployment.

Real-World Use Cases

Use Case 1: Protecting Login Endpoints

Authentication endpoints need aggressive rate limiting to prevent credential stuffing:

// Strict rate limiting for authentication
authLimiter := NewIPRateLimiter(rate.Limit(0.1), 3) // 1 request per 10 seconds, burst of 3

router.Handle("/api/login", authLimiter.RateLimitMiddleware(
    http.HandlerFunc(loginHandler),
))

router.Handle("/api/register", authLimiter.RateLimitMiddleware(
    http.HandlerFunc(registerHandler),
))

This allows 3 quick attempts (maybe user fat-fingered their password) but throttles sustained brute force attacks to 6 attempts per minute.

Use Case 2: API Gateway Rate Limiting

An API gateway serving multiple backend services needs flexible rate limiting:

type APIGateway struct {
    defaultLimiter *IPRateLimiter
    endpointLimits map[string]*IPRateLimiter
}

func (g *APIGateway) GetLimiterForEndpoint(path string) *IPRateLimiter {
    if limiter, exists := g.endpointLimits[path]; exists {
        return limiter
    }
    return g.defaultLimiter
}

func (g *APIGateway) RateLimitMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        limiter := g.GetLimiterForEndpoint(r.URL.Path)
        ip := GetClientIP(r)
        
        if !limiter.GetLimiter(ip).Allow() {
            http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
            return
        }
        
        next.ServeHTTP(w, r)
    })
}
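
Constructing the gateway with endpoint-specific limits might look like this (paths and numbers are illustrative):

gateway := &APIGateway{
    defaultLimiter: NewIPRateLimiter(rate.Limit(100), 200),
    endpointLimits: map[string]*IPRateLimiter{
        "/api/orders": NewIPRateLimiter(rate.Limit(10), 20), // writes: stricter
        "/api/search": NewIPRateLimiter(rate.Limit(50), 100),
    },
}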

Use Case 3: Webhook Delivery Rate Limiting

When delivering webhooks to customer endpoints, rate limit per destination to avoid overwhelming customer servers:

type WebhookRateLimiter struct {
    limiters map[string]*rate.Limiter // key: customer endpoint URL
    mu       sync.RWMutex
}

func (w *WebhookRateLimiter) DeliverWebhook(url string, payload []byte) error {
    limiter := w.GetLimiter(url)
    
    // Wait for permission if rate limited
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    
    if err := limiter.Wait(ctx); err != nil {
        return fmt.Errorf("rate limit wait timeout: %w", err)
    }
    
    // Deliver webhook
    return sendWebhook(url, payload)
}
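
GetLimiter follows the same pattern as the per-IP limiter, keyed by destination URL; a sketch (the per-destination rate of 1 rps with burst 5 is an assumption):

func (w *WebhookRateLimiter) GetLimiter(url string) *rate.Limiter {
    w.mu.Lock()
    defer w.mu.Unlock()

    limiter, ok := w.limiters[url]
    if !ok {
        // Assumed default: 1 delivery per second with a burst of 5 per endpoint
        limiter = rate.NewLimiter(rate.Limit(1), 5)
        w.limiters[url] = limiter
    }
    return limiter
}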

Use Case 4: Freemium Model with Progressive Throttling

Gently throttle free tier users as they approach limits rather than hard blocking:

func (t *TieredRateLimiter) AllowWithWarning(userID, tier string) (allowed bool, nearLimit bool) {
    limiter := t.GetLimiter(userID, tier)

    // Reserve a token; keep the reservation if it is immediately usable,
    // otherwise cancel it and report the request as rate limited.
    reservation := limiter.Reserve()
    if !reservation.OK() {
        return false, false
    }

    if reservation.Delay() > 0 {
        // Already rate limited; return the unused token
        reservation.Cancel()
        return false, false
    }

    // The reservation consumed a token. Warn when the remaining tokens
    // fall within 20% of the tier's burst capacity.
    tierConfig := t.tiers[tier]
    nearLimit = limiter.Tokens() < float64(tierConfig.Burst)*0.2

    return true, nearLimit
}

Performance Considerations

Memory Usage:

Each rate.Limiter holds a mutex, the configured limit and burst, the current token count, and two timestamps, roughly 80 bytes of state. The map entry adds the IP key string plus bucket and pointer overhead, so budget on the order of 150-200 bytes per tracked IP. For 1 million active IPs that is roughly 150-200MB; measure with pprof under realistic traffic rather than relying on estimates.

CPU Impact:

Rate limiting adds negligible CPU overhead per request:

  • Map lookup: O(1) average case
  • Mutex lock/unlock: ~20-50 nanoseconds on modern hardware
  • Token calculation: Simple arithmetic

Benchmark results on typical hardware:

BenchmarkRateLimiter-8    5000000    250 ns/op    0 B/op    0 allocs/op

That’s 4 million rate limit checks per second per core—more than sufficient for most applications.

Scaling to High Traffic:

For services handling millions of requests per second:

Use sharding: Partition limiters across multiple maps with separate mutexes:

import "hash/fnv"

// ShardedRateLimiter spreads limiters across shards to reduce mutex contention
type ShardedRateLimiter struct {
    shards    []*IPRateLimiter
    shardMask uint32 // len(shards) must be a power of two; mask = len(shards)-1
}

func (s *ShardedRateLimiter) getShard(ip string) *IPRateLimiter {
    h := fnv.New32a()
    h.Write([]byte(ip))
    return s.shards[h.Sum32()&s.shardMask]
}

Consider atomic operations: For very simple counters, sync/atomic can outperform mutexes:

// AtomicCounter implements a fixed one-minute window without a mutex
type AtomicCounter struct {
    value int64
    reset int64 // Unix timestamp at which the window rolls over
}

func (a *AtomicCounter) Increment() int64 {
    now := time.Now().Unix()
    reset := atomic.LoadInt64(&a.reset)

    if now > reset {
        // Roll the window over. CompareAndSwap ensures only one goroutine
        // performs the reset; counts near the boundary may be slightly off,
        // which is acceptable for rate limiting.
        if atomic.CompareAndSwapInt64(&a.reset, reset, now+60) {
            atomic.StoreInt64(&a.value, 1)
            return 1
        }
    }

    return atomic.AddInt64(&a.value, 1)
}

Use Redis for distributed rate limiting: For multi-server deployments, centralized state in Redis prevents bypassing limits by distributing requests across servers.

Testing Approach

Unit Tests for Basic Functionality:

package middleware

import (
    "fmt"
    "net/http"
    "net/http/httptest"
    "sync"
    "testing"
    "time"

    "golang.org/x/time/rate"
)

func TestRateLimiter_Allow(t *testing.T) {
    limiter := NewIPRateLimiter(rate.Limit(1), 2) // 1 per second, burst 2

    // Should allow first 2 requests (burst)
    l := limiter.GetLimiter("192.168.1.1")
    if !l.Allow() {
        t.Error("First request should be allowed")
    }
    if !l.Allow() {
        t.Error("Second request should be allowed")
    }

    // Third request should be denied
    if l.Allow() {
        t.Error("Third request should be denied")
    }

    // After 1 second, one more request should be allowed
    time.Sleep(1 * time.Second)
    if !l.Allow() {
        t.Error("Request after delay should be allowed")
    }
}

func TestRateLimiter_DifferentIPs(t *testing.T) {
    limiter := NewIPRateLimiter(rate.Limit(1), 1)

    // Different IPs should have independent limits
    ip1 := limiter.GetLimiter("192.168.1.1")
    ip2 := limiter.GetLimiter("192.168.1.2")

    if !ip1.Allow() {
        t.Error("IP1 first request should be allowed")
    }
    if !ip2.Allow() {
        t.Error("IP2 first request should be allowed")
    }

    // Both should be rate limited now
    if ip1.Allow() {
        t.Error("IP1 second request should be denied")
    }
    if ip2.Allow() {
        t.Error("IP2 second request should be denied")
    }
}

Integration Tests for Middleware:

func TestRateLimitMiddleware_Integration(t *testing.T) {
    limiter := NewIPRateLimiter(rate.Limit(2), 2)

    handler := limiter.RateLimitMiddleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        w.Write([]byte("OK"))
    }))

    // Simulate multiple requests from same IP
    for i := 0; i < 2; i++ {
        req := httptest.NewRequest("GET", "/", nil)
        req.RemoteAddr = "192.168.1.1:1234"
        w := httptest.NewRecorder()

        handler.ServeHTTP(w, req)

        if w.Code != http.StatusOK {
            t.Errorf("Request %d: expected status 200, got %d", i+1, w.Code)
        }
    }

    // Third request should be rate limited
    req := httptest.NewRequest("GET", "/", nil)
    req.RemoteAddr = "192.168.1.1:1234"
    w := httptest.NewRecorder()

    handler.ServeHTTP(w, req)

    if w.Code != http.StatusTooManyRequests {
        t.Errorf("Expected status 429, got %d", w.Code)
    }
}

Load Testing:

func TestRateLimiter_Concurrent(t *testing.T) {
    limiter := NewIPRateLimiter(rate.Limit(100), 100)

    const numGoroutines = 100
    const requestsPerGoroutine = 1000

    var wg sync.WaitGroup
    wg.Add(numGoroutines)

    for i := 0; i < numGoroutines; i++ {
        go func(id int) {
            defer wg.Done()
            ip := fmt.Sprintf("192.168.1.%d", id)

            for j := 0; j < requestsPerGoroutine; j++ {
                limiter.GetLimiter(ip).Allow()
            }
        }(i)
    }

    wg.Wait()
    // Test passes if no race conditions or panics occur
}

Benchmark Tests:

func BenchmarkRateLimiter_SingleIP(b *testing.B) {
    limiter := NewIPRateLimiter(rate.Limit(1000000), 1000000) // High limit
    l := limiter.GetLimiter("192.168.1.1")

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        l.Allow()
    }
}

func BenchmarkRateLimiter_ManyIPs(b *testing.B) {
    limiter := NewIPRateLimiter(rate.Limit(1000000), 1000000)

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ip := fmt.Sprintf("192.168.1.%d", i%1000)
        limiter.GetLimiter(ip).Allow()
    }
}

func BenchmarkRateLimiter_Concurrent(b *testing.B) {
    limiter := NewIPRateLimiter(rate.Limit(1000000), 1000000)

    b.RunParallel(func(pb *testing.PB) {
        ip := "192.168.1.1"
        l := limiter.GetLimiter(ip)
        for pb.Next() {
            l.Allow()
        }
    })
}

Conclusion

Rate limiting is essential infrastructure for any production API. A well-implemented rate limiter protects your service from abuse, ensures fair resource allocation, and maintains quality of service for all users.

Key takeaways:

  • Choose the right algorithm: Token bucket balances burst allowance with sustained rate enforcement
  • Clean up properly: Implement periodic cleanup to prevent memory leaks
  • Be informative: Add standard rate limit headers to help clients adapt
  • Scale appropriately: Use IP-based limiting for public endpoints, user-based for authenticated APIs
  • Test thoroughly: Unit tests, integration tests, and load tests ensure reliability
  • Monitor continuously: Track rate limiting effectiveness and adjust limits based on real usage
  • Fail safely: When in doubt, allow requests rather than blocking legitimate traffic

The patterns and code in this guide handle common scenarios, but your specific requirements may differ. Always profile your implementation under realistic load, monitor production behavior, and iterate based on actual usage patterns.

Rate limiting is not a one-time implementation—it requires ongoing tuning as your traffic patterns evolve. Start with conservative limits, monitor carefully, and adjust based on data rather than assumptions.

Additional Resources

Advanced Topics:

  • Distributed rate limiting with Redis
  • Adaptive rate limiting based on system load
  • Rate limiting in microservices architectures
  • DDoS mitigation strategies
  • Circuit breakers and rate limiting combined

Read More

How to Rate Limit HTTP Requests by Alex Edwards: an excellent, concise introduction to rate limiting in Go that inspired this guide.