How to Implement Rate Limiting in Rust

Your API is open for business

Your API handles traffic. Then a script starts hammering the login endpoint five hundred times a second. Your database connection pool exhausts. Legitimate users get 503 errors. The server crashes. You didn't implement rate limiting. Rate limiting isn't just a security feature. It's your server's circuit breaker. It stops abuse before it kills your resources.

The token bucket analogy

Rate limiting controls how many requests a client can make within a time window. In the simplest scheme, a fixed window counter, the server tracks a count per client identifier, usually an IP address or API key. When the count exceeds the limit, the server returns HTTP 429 Too Many Requests. The counter resets when the window expires.

Think of a token bucket. The bucket holds a fixed number of tokens. Each request consumes one token. Tokens refill at a steady rate. If the bucket is empty, requests are rejected. The bucket size controls burst capacity. The refill rate controls sustained throughput. A bucket of size 10 with a refill rate of 1 per second allows a burst of 10 requests, then enforces a steady limit of 1 request per second. This model handles traffic spikes gracefully while protecting the backend.
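
The bucket described above can be sketched in a few lines of standard-library Rust. This is a minimal illustration, not how any particular crate implements it. The TokenBucket type and its method names are made up for this example, and the current time is passed in as a parameter (an assumption made here so the behavior can be checked without sleeping).

```rust
use std::time::Instant;

/// A single token bucket. `capacity` bounds the burst;
/// `refill_per_sec` bounds sustained throughput.
struct TokenBucket {
    capacity: f64,
    refill_per_sec: f64,
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts full, so a new client can burst immediately.
    fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        Self {
            capacity: capacity as f64,
            refill_per_sec,
            tokens: capacity as f64,
            last_refill: now,
        }
    }

    /// Credits tokens for the time elapsed since the last call,
    /// capped at `capacity`, then tries to take one token.
    fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill);
        self.tokens = (self.tokens + elapsed.as_secs_f64() * self.refill_per_sec)
            .min(self.capacity);
        // Never move the refill timestamp backwards.
        self.last_refill = self.last_refill.max(now);

        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With TokenBucket::new(10, 1.0, now), ten back-to-back calls to try_acquire succeed, the eleventh fails, and one more request is admitted for each second that passes afterwards.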

Minimal counter implementation

Start with the core logic. A rate limiter needs a way to count requests per client and reject excess traffic. This example shows a simple counter using a HashMap and a Mutex. It demonstrates the mechanics without time windows or burst handling.

use std::collections::HashMap;
use std::sync::Mutex;

/// Tracks request counts per client key.
struct SimpleRateLimiter {
    // Mutex protects the map from concurrent access across threads.
    requests: Mutex<HashMap<String, u32>>,
    limit: u32,
}

impl SimpleRateLimiter {
    fn new(limit: u32) -> Self {
        Self {
            // Initialize empty map. Keys are client identifiers.
            requests: Mutex::new(HashMap::new()),
            limit,
        }
    }

    /// Returns true if the request is allowed, false if rate limited.
    fn check(&self, key: &str) -> bool {
        // Lock the mutex to safely read and write the map.
        let mut map = self.requests.lock().unwrap();
        
        // entry() creates the key with value 0 if missing.
        // This avoids a second lookup compared to get_mut().
        let count = map.entry(key.to_string()).or_insert(0);
        
        if *count < self.limit {
            *count += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let limiter = SimpleRateLimiter::new(3);
    
    // First three requests succeed.
    assert!(limiter.check("192.168.1.1"));
    assert!(limiter.check("192.168.1.1"));
    assert!(limiter.check("192.168.1.1"));
    
    // Fourth request is rejected.
    assert!(!limiter.check("192.168.1.1"));
    
    // Different client is tracked independently.
    assert!(limiter.check("10.0.0.5"));
}

The Mutex ensures only one thread updates the map at a time. It is also what makes the code compile: check takes &self, and Rust will not let you mutate a plain HashMap through a shared reference. The entry API is the idiomatic way to update a map. It performs the lookup once and hands back mutable access, avoiding the second hash and lookup that get followed by insert would cost. This counter works, but it never resets. Add a time window or switch to a token bucket for production use.
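
One way to make the counter reset is to store the start of each client's window next to its count. The sketch below is one possible shape, assuming a fixed window per client. The names FixedWindowLimiter, Window, and check_at are invented for this example, and check_at takes the current time as an argument so it can be exercised without waiting for a real window to pass.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Per-client state: when the current window began and how many
/// requests have been counted in it.
struct Window {
    started: Instant,
    count: u32,
}

struct FixedWindowLimiter {
    windows: Mutex<HashMap<String, Window>>,
    limit: u32,
    window: Duration,
}

impl FixedWindowLimiter {
    fn new(limit: u32, window: Duration) -> Self {
        Self {
            windows: Mutex::new(HashMap::new()),
            limit,
            window,
        }
    }

    /// Same contract as `check`, with the clock supplied by the caller.
    fn check_at(&self, key: &str, now: Instant) -> bool {
        let mut map = self.windows.lock().unwrap();
        let entry = map
            .entry(key.to_string())
            .or_insert(Window { started: now, count: 0 });

        // Start a fresh window once the old one has expired.
        if now.saturating_duration_since(entry.started) >= self.window {
            entry.started = now;
            entry.count = 0;
        }

        if entry.count < self.limit {
            entry.count += 1;
            true
        } else {
            false
        }
    }

    /// Convenience wrapper that uses the real clock.
    fn check(&self, key: &str) -> bool {
        self.check_at(key, Instant::now())
    }
}
```

A fixed window is simple, but a client can send a full quota at the end of one window and another at the start of the next. The token bucket avoids that edge effect.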

Production-grade rate limiting with Governor

Real applications need time windows, burst handling, and efficient concurrent access. The governor crate implements the token bucket algorithm with high performance. It works with any framework and handles the math for you. This example integrates governor with actix-web.

use actix_web::{web, App, HttpRequest, HttpResponse, HttpServer};
use governor::{DefaultKeyedRateLimiter, Quota, RateLimiter};
use std::sync::Arc;

/// Application state shared across all requests.
struct AppState {
    // Arc allows the limiter to be cloned cheaply and shared across async tasks.
    limiter: Arc<DefaultKeyedRateLimiter<String>>,
}

/// Handler that checks the rate limit before processing.
async fn hello(req: HttpRequest, data: web::Data<AppState>) -> HttpResponse {
    // Use the peer IP as the rate limit key. peer_addr() is None when the
    // address is unavailable; behind a reverse proxy it is the proxy's address.
    let key = match req.peer_addr() {
        Some(addr) => addr.ip().to_string(),
        None => return HttpResponse::BadRequest().body("Unknown client address"),
    };
    let limiter = &data.limiter;

    // check_key returns Ok(()) if allowed, Err(_) if rate limited.
    // Governor handles the token bucket logic internally.
    match limiter.check_key(&key) {
        Ok(_) => HttpResponse::Ok().body("Hello"),
        Err(_) => HttpResponse::TooManyRequests().body("Slow down"),
    }
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Quota: 10 requests per second, with burst capacity of 10.
    // per_second sets the refill rate; allow_burst sets the bucket size.
    // Both take a NonZeroU32, so a zero quota cannot be constructed.
    let ten = std::num::NonZeroU32::new(10).unwrap();
    let quota = Quota::per_second(ten).allow_burst(ten);
    
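    // Keyed limiter tracks state per IP automatically.
    // Build it once so every worker shares the same buckets.
    let limiter: Arc<DefaultKeyedRateLimiter<String>> = Arc::new(RateLimiter::keyed(quota));

    HttpServer::new(move || {
        App::new()
            // app_data shares state with handlers.
            // Data::new wraps the state for concurrent access.
            .app_data(web::Data::new(AppState {
                // Cloning the Arc copies a pointer, not the limiter.
                limiter: Arc::clone(&limiter),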
            }))
            .route("/", web::get().to(hello))
    })
    .bind("127.0.0.1:8080")?
    .run()
    .await
}

The Quota defines the rate limit policy. per_second sets the refill rate. allow_burst sets the bucket size. The KeyedRateLimiter maintains separate buckets for each key. When you call check_key, governor consumes a token if available. Tokens refill automatically based on elapsed time. The Arc wrapper allows the limiter to be shared across the server's worker threads without cloning the internal state. Governor handles the math. You handle the keys.

Pitfalls and compiler errors

Rate limiting introduces concurrency and state management challenges. Watch for these common issues.

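Unbounded memory growth happens when you track clients without expiration. A HashMap grows indefinitely as new IPs appear, and old entries never get removed. governor's keyed limiter does not evict keys on its own either, but it provides retain_recent() and shrink_to_fit(), which you can call from a periodic task to drop buckets that are indistinguishable from fresh ones. With a custom map, implement a TTL with a background cleanup task. Unbounded maps are silent killers: nothing fails until the process runs out of memory.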
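
For a hand-rolled map, the cleanup step can be as small as one retain call. The sketch below assumes each entry records when its client was last seen. ClientEntry and evict_stale are hypothetical names for this example, and the caller decides how often to run the sweep (for instance from a background thread or an interval task).

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-client record: the request count plus the last
/// time this client was seen.
struct ClientEntry {
    count: u32,
    last_seen: Instant,
}

/// Removes every client not seen within `ttl` and returns how many
/// entries were evicted.
fn evict_stale(
    clients: &mut HashMap<String, ClientEntry>,
    ttl: Duration,
    now: Instant,
) -> usize {
    let before = clients.len();
    clients.retain(|_, entry| now.saturating_duration_since(entry.last_seen) < ttl);
    before - clients.len()
}
```

The sweep walks the whole map, so run it on a timer rather than on every request, and pick a TTL at least as long as the rate limit window so active clients keep their counts.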

Distributed systems break local counters. If your service runs behind a load balancer with multiple instances, each server maintains its own counter. A client can hit the limit on server A and still have quota on server B. Local rate limiting only works for single-instance deployments. Use a distributed store like Redis to enforce limits across instances.
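
The difference between local and shared counting can be shown without a real Redis server. In the sketch below, CounterStore is a hypothetical trait standing in for whatever store holds the counts, and InMemoryStore is an in-process stand-in for a shared store such as Redis. It is not a Redis client. Window expiry is left out to keep the example short.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical abstraction over wherever the counts live.
trait CounterStore {
    /// Atomically increments the counter for `key` and returns the new value.
    fn incr(&self, key: &str) -> u64;
}

/// In-memory stand-in for a shared store such as Redis.
#[derive(Default)]
struct InMemoryStore {
    counts: Mutex<HashMap<String, u64>>,
}

impl CounterStore for InMemoryStore {
    fn incr(&self, key: &str) -> u64 {
        let mut map = self.counts.lock().unwrap();
        let n = map.entry(key.to_string()).or_insert(0);
        *n += 1;
        *n
    }
}

/// One service instance. It holds no counts of its own; it only
/// compares the store's answer against the limit.
struct Instance<S: CounterStore> {
    store: Arc<S>,
    limit: u64,
}

impl<S: CounterStore> Instance<S> {
    fn check(&self, key: &str) -> bool {
        self.store.incr(key) <= self.limit
    }
}
```

When two Instance values share one store, a client's requests count against a single quota no matter which instance serves them. Give each instance its own store and the same client gets the full quota on each one, which is the failure described above.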

Lock contention kills performance. A global Mutex forces every request to wait for the lock, and high traffic causes latency spikes. governor updates each bucket with atomic operations and, by default, stores keyed state in a sharded concurrent map, which keeps contention low. Avoid a single global Mutex in hot paths. Sharing also has an ownership side: if you move the limiter into one closure or task and then use it again elsewhere, the compiler rejects the code with E0382 (use of moved value). Wrap shared state in Arc or web::Data so every worker holds a cheap handle to the same limiter.
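
To illustrate the sharding idea in general terms, the sketch below splits the earlier counter across several independently locked maps, so requests for different clients usually take different locks. It is a teaching example with invented names (ShardedCounter, shard_index), not a description of governor's internals.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// A counter split across several maps, each behind its own lock.
struct ShardedCounter {
    shards: Vec<Mutex<HashMap<String, u32>>>,
    limit: u32,
}

impl ShardedCounter {
    fn new(shard_count: usize, limit: u32) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count)
            .map(|_| Mutex::new(HashMap::new()))
            .collect();
        Self { shards, limit }
    }

    /// Hashes the key to pick a shard. The same key always maps to
    /// the same shard, so its count stays consistent.
    fn shard_index(&self, key: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.shards.len()
    }

    /// Locks only the shard that owns `key`.
    fn check(&self, key: &str) -> bool {
        let mut map = self.shards[self.shard_index(key)].lock().unwrap();
        let count = map.entry(key.to_string()).or_insert(0);
        if *count < self.limit {
            *count += 1;
            true
        } else {
            false
        }
    }
}
```

Requests for the same key still serialize on one lock, which is what keeps the count exact. The gain is that unrelated clients no longer queue behind each other.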

Status codes matter. Return 429 Too Many Requests when rate limited. Include Retry-After headers to tell clients when to try again. Returning 503 or 400 confuses clients and breaks retry logic. Follow HTTP standards.
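
Retry-After accepts either an HTTP date or a whole number of seconds, so a sub-second wait has to be rounded up rather than truncated to zero. The helper below is a framework-free sketch with made-up function names. It builds the raw response head as text purely to show the header in context.

```rust
use std::time::Duration;

/// Rounds a wait time up to whole seconds, because the delay form of
/// Retry-After is a non-negative integer number of seconds.
fn retry_after_secs(wait: Duration) -> u64 {
    if wait.subsec_nanos() > 0 {
        wait.as_secs() + 1
    } else {
        wait.as_secs()
    }
}

/// Builds the status line and headers of a 429 response as raw
/// HTTP/1.1 text, with an empty body.
fn too_many_requests(wait: Duration) -> String {
    format!(
        "HTTP/1.1 429 Too Many Requests\r\nRetry-After: {}\r\nContent-Length: 0\r\n\r\n",
        retry_after_secs(wait)
    )
}
```

In a real handler you would set the same header through your framework's response builder instead of formatting text by hand. The wait time itself comes from the limiter, for example the time until the next token is available.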

When to use what

Use governor when you need a flexible, framework-agnostic rate limiter with token bucket semantics and high performance. Use actix-web-ratelimit when you want a drop-in middleware for Actix Web without configuring the algorithm yourself. Use a custom HashMap implementation when you have unique requirements that existing crates don't cover, like database-backed limits or complex tiered quotas. Reach for a single std::sync::Mutex around a HashMap only for prototypes, tests, and low-traffic services; under heavy load, prefer sharded or lock-free structures. Pick a distributed store like Redis when your service runs behind a load balancer and limits must be enforced across multiple instances.

Don't reinvent the token bucket. Use a crate.

Summary

Rate limiting acts like a bouncer at a club, letting only a certain number of people in per minute to keep things safe. In Rust, you use a library to count how many times a user hits your server and block them if they go over the limit. This prevents your application from crashing due to too much traffic or malicious attacks.