How to Implement a Rate Limiter in Async Rust

The bouncer at the async door

You are running a weather API. A script starts hammering your endpoint every millisecond. Your database connection pool fills up, legitimate users get timeouts, and your server starts eating RAM like it is a buffet. You need a bouncer at the door. In async Rust, that bouncer has to be fast, it cannot block the event loop, and it needs to track state across many concurrent requests without turning your code into a deadlock nightmare.

Rate limiting is the practice of counting how many requests a client makes within a time window and rejecting requests that go over the limit. In a single-threaded program, a plain counter is enough. In async Rust, many tasks run on a few OS threads. They share memory. They race. If two tasks updated an unsynchronized request count at the exact same moment, updates would be lost or the data would be corrupted. Rust refuses to compile that kind of sharing, which forces you to be explicit about how you share and protect state.

The standard pattern uses three building blocks: Arc to share ownership, Mutex to protect mutations, and a HashMap to track each client. You wrap the map in the mutex, wrap the mutex in the arc, and pass the arc to every task that needs to check the limit.

The sliding window algorithm

Rate limiters often implement a sliding window. Instead of resetting the count at the top of every minute, the window moves forward with time. If the limit is five requests per minute, the bouncer looks at the last sixty seconds. If a client made four requests at 12:00:55 and one at 12:01:05, the bouncer sees five requests in the window and blocks the next one. At 12:01:55 the four oldest requests fall out of the window, the count drops to one, and the client is allowed in again.

Think of a nightclub with a guest list. The bouncer checks the list. If a name appears too many times in the last hour, they get turned away. The list is shared by all bouncers. If two bouncers try to update the list at the exact same moment, they need a rule to take turns, or the list gets corrupted. In Rust, that rule is a Mutex. The shared list is the HashMap. The ability for multiple bouncers to hold the list is the Arc.
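
The window check is small enough to sketch in isolation. The helper below is hypothetical (allow_at is an illustrative name, not part of the limiter in this article) and it takes plain second offsets instead of real clock readings, so the 12:00:55 example can be replayed by hand.

```rust
/// Hypothetical helper: decides whether a request arriving at `now` fits
/// inside a sliding window. All times are plain seconds chosen by the caller.
fn allow_at(history: &mut Vec<u64>, now: u64, window_secs: u64, max_requests: usize) -> bool {
    // Drop every timestamp that has aged out of the window.
    history.retain(|&t| now.saturating_sub(t) < window_secs);

    // Admit and record the request only if there is room left.
    if history.len() < max_requests {
        history.push(now);
        true
    } else {
        false
    }
}
```

Counting seconds from 12:00:00, a history of four requests at 55 admits a fifth at 65, rejects a sixth at 70, and admits again at 115 once the four oldest entries have aged out.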

Minimal implementation

Here is a rate limiter that tracks timestamps per client. It uses a Mutex to guard the map and a HashMap to store the history. The is_allowed method cleans up old timestamps, checks the count, and records the new request if permitted.

use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{SystemTime, UNIX_EPOCH};

/// Tracks request timestamps to enforce rate limits per client.
///
/// This struct uses a sliding window algorithm. It keeps a list of
/// timestamps for each client and removes entries outside the window.
struct RateLimiter {
    /// Shared map of client IDs to their request history.
    /// The Mutex ensures only one task modifies the map at a time.
    requests: Mutex<HashMap<String, Vec<u64>>>,
    /// Maximum requests allowed within the window.
    max_requests: usize,
    /// Duration of the sliding window in seconds.
    window_seconds: u64,
}

impl RateLimiter {
    /// Creates a new rate limiter with the given constraints.
    fn new(max_requests: usize, window_seconds: u64) -> Self {
        Self {
            // Initialize the map inside a Mutex for thread safety.
            requests: Mutex::new(HashMap::new()),
            max_requests,
            window_seconds,
        }
    }

    /// Checks if a client is allowed to make a request.
    ///
    /// Returns true if the request is within limits.
    /// Returns false if the client has exceeded the threshold.
    fn is_allowed(&self, client_id: &str) -> bool {
        // Get current time as seconds since epoch.
        // SystemTime is the wall clock, which can jump due to NTP.
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .as_secs();

        // Lock the map to read and update timestamps safely.
        // This blocks the current thread until the lock is acquired.
        let mut map = self.requests.lock().unwrap();
        
        // Get or create the entry for this client.
        // or_default creates an empty history if the client is new.
        let entry = map.entry(client_id.to_string()).or_default();

        // Remove timestamps outside the current window.
        // retain keeps only elements where the closure returns true.
        // saturating_sub avoids an underflow if the wall clock jumped backward.
        entry.retain(|&t| now.saturating_sub(t) < self.window_seconds);

        // Check if the client is under the limit.
        let allowed = entry.len() < self.max_requests;
        
        // Record the request only if it is allowed.
        if allowed {
            entry.push(now);
        }
        
        allowed
    }
}

The Mutex guarantees that retain, len, and push happen atomically relative to other tasks. You lock the map, do the work, and the lock drops when map goes out of scope. The critical section is short. It touches memory, does math, and updates a vector. No I/O happens while the lock is held. This keeps the lock contention low.

Convention aside: The community prefers Arc::clone(&limiter) over limiter.clone() when cloning an Arc. Both compile and both work. The explicit form signals to the reader that you are copying a pointer and bumping the reference count, not deep-copying the data. It prevents confusion when scanning code.

Sharing across tasks

Rust's ownership rule says every value has exactly one owner. You cannot move the same RateLimiter into two different async tasks. The compiler rejects that with E0382 (use of moved value). You need a way to share ownership. Arc provides atomic reference counting. Every time you clone an Arc, the count goes up. When an Arc is dropped, the count goes down. When the count hits zero, the inner value is freed.

Wrap the limiter in Arc and clone the arc for each task. The tasks share the underlying data. The Mutex inside protects the data from concurrent writes.

#[tokio::main]
async fn main() {
    // Create the limiter and wrap it in Arc for shared ownership.
    let limiter = Arc::new(RateLimiter::new(5, 60));

    // Spawn multiple tasks to simulate concurrent requests.
    let mut handles = vec![];
    
    for i in 0..10 {
        // Clone the Arc to share the limiter with the new task.
        // The reference count increments, keeping the data alive.
        let limiter = Arc::clone(&limiter);
        
        let handle = tokio::spawn(async move {
            // Simulate different clients.
            let client_id = format!("user_{}", i % 3);
            
            // Check the rate limit.
            if limiter.is_allowed(&client_id) {
                println!("Request {} from {} allowed", i, client_id);
            } else {
                println!("Request {} from {} blocked", i, client_id);
            }
        });
        
        handles.push(handle);
    }

    // Wait for all tasks to finish.
    for handle in handles {
        handle.await.unwrap();
    }
}

The tokio::spawn function requires the future it receives to be Send and 'static, because the runtime may move it between worker threads. The async move block owns an Arc<RateLimiter>, and Arc<T> is Send only when T is both Send and Sync. RateLimiter qualifies: Mutex<T> is Send and Sync whenever T is Send, and HashMap<String, Vec<u64>>, usize, and u64 are all Send. The chain holds. If you tried to use Rc instead of Arc, the compiler would reject the code with E0277 because Rc is not Send. Rc uses non-atomic reference counting, which is unsafe across threads.
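
You can ask the compiler to prove the Send bound without spawning anything. This is a minimal sketch: Limiter is a stand-in with the same field type as the struct above, and is_send is a hypothetical helper that compiles only when its type parameter is Send.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in with the same shared-state field as the limiter above.
#[allow(dead_code)]
struct Limiter {
    requests: Mutex<HashMap<String, Vec<u64>>>,
}

/// Hypothetical helper: the call compiles only if `T` is Send.
/// The return value exists only so the call can sit inside an assertion.
fn is_send<T: Send>() -> bool {
    true
}

/// Forces the compiler to check the bound for the shared limiter.
fn check_bounds() -> bool {
    // Swapping Arc for std::rc::Rc here fails to compile with E0277.
    is_send::<Arc<Limiter>>()
}
```

The check costs nothing at runtime. All the work happens in the trait solver, so a failing bound shows up as a compile error next to the type that caused it.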

The time trap

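The example uses SystemTime. This is a common mistake in rate limiters. SystemTime represents the wall clock. The operating system adjusts the wall clock using NTP, so it can jump forward or backward. If it jumps backward, a stored timestamp can be later than now. An unchecked now - t on u64 then underflows: a debug build panics, and a release build wraps around to a huge number, so retain throws the history away and a burst gets through. A saturating subtraction avoids the underflow, but the old entries now look brand new, so the window stretches and clients stay blocked longer than they should. A forward jump does the opposite: every entry looks expired at once and the window resets.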

Use std::time::Instant for relative timing. Instant is monotonic. It never goes backward. It measures elapsed time since some unspecified point in the past. Rate limiting cares about duration, not wall clock time. Replace SystemTime with Instant, store Instant values in the history, and keep the window as a Duration. The logic stays the same, but the behavior becomes robust against clock adjustments.

Ah-ha reveal: SystemTime lies. A rate limiter needs elapsed time, not the time of day, so measure with Instant.
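
Here is a sketch of what the swap might look like. InstantLimiter and is_allowed_at are illustrative names, not part of the original example. Passing the clock reading in as a parameter is an added assumption that makes the window logic testable without sleeping.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Sketch of the limiter rebuilt on the monotonic clock.
struct InstantLimiter {
    requests: Mutex<HashMap<String, Vec<Instant>>>,
    max_requests: usize,
    window: Duration,
}

impl InstantLimiter {
    fn new(max_requests: usize, window: Duration) -> Self {
        Self {
            requests: Mutex::new(HashMap::new()),
            max_requests,
            window,
        }
    }

    /// Normal entry point: reads the monotonic clock.
    fn is_allowed(&self, client_id: &str) -> bool {
        self.is_allowed_at(client_id, Instant::now())
    }

    /// Same check with the clock reading supplied by the caller.
    fn is_allowed_at(&self, client_id: &str, now: Instant) -> bool {
        let mut map = self.requests.lock().unwrap();
        let entry = map.entry(client_id.to_string()).or_default();

        // saturating_duration_since returns zero instead of panicking
        // if `t` happens to be later than `now`.
        entry.retain(|&t| now.saturating_duration_since(t) < self.window);

        let allowed = entry.len() < self.max_requests;
        if allowed {
            entry.push(now);
        }
        allowed
    }
}
```

Instant values cannot be serialized or compared across processes, which is fine here because the history lives only in memory.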

Pitfalls and production realities

The minimal example works for a demo. Production code needs more care.

Unbounded memory growth is the first concern. The HashMap never forgets. A client's history is pruned only when that same client sends another request, so if user_123 makes a request and never returns, that entry stays in the map forever. Over months, the map grows. You need a cleanup strategy. Add a background task that periodically scans the map and removes stale entries. Or use an LruCache from the lru crate to cap the number of tracked clients.
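
A periodic sweep could look roughly like the sketch below. purge_stale is a hypothetical helper that works on the same map type as the example, with timestamps as plain seconds. How often to call it, and from which task, is left open.

```rust
use std::collections::HashMap;

/// Hypothetical sweep: prunes expired timestamps, then drops every client
/// whose history is empty. Returns how many clients were removed.
fn purge_stale(map: &mut HashMap<String, Vec<u64>>, now: u64, window_secs: u64) -> usize {
    let before = map.len();
    map.retain(|_, history| {
        // Keep only the timestamps still inside the window.
        history.retain(|&t| now.saturating_sub(t) < window_secs);
        // Returning false removes the client from the map.
        !history.is_empty()
    });
    before - map.len()
}
```

In the limiter this would run while holding the same Mutex as is_allowed, so a sweep over a very large map lengthens the critical section. Running it rarely is one way to keep that cost down.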

Lock poisoning is the second concern. If a task panics while holding the Mutex, the lock becomes poisoned. Every later call to lock() returns Err with a PoisonError. The unwrap() in the example then panics again. You get a cascade failure. Handle the error. Log it. Recover the guard with PoisonError::into_inner if the data is still trustworthy. Or use lock().expect("Mutex poisoned") to fail fast with a clear message. In a rate limiter, a poisoned mutex usually means a bug in the critical section. Fix the bug.
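
One possible recovery path is sketched below. lock_or_recover is a hypothetical helper, and it assumes the map is still usable after a panic, which holds only if the critical section cannot leave it half-updated.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, MutexGuard};

type History = HashMap<String, Vec<u64>>;

/// Hypothetical helper: takes the lock even if a previous holder panicked.
/// Assumption: the map is still consistent after the panic.
fn lock_or_recover(requests: &Mutex<History>) -> MutexGuard<'_, History> {
    requests.lock().unwrap_or_else(|poisoned| {
        // Log the event, then take the guard out of the PoisonError.
        eprintln!("rate limiter mutex was poisoned; recovering");
        poisoned.into_inner()
    })
}
```

Recovering this way keeps the service answering requests, but it also hides the original panic from later callers, so the log line matters.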

Blocking the runtime is the third concern. The example uses std::sync::Mutex. This is correct for fast critical sections. The lock is held for microseconds. Blocking the OS thread is cheaper than yielding the async task. If you move I/O inside the lock, you block the thread for milliseconds or seconds. The runtime has fewer threads to run other tasks. Your latency spikes. Keep the lock short. Do I/O outside the lock.

Convention aside: The same instinct that keeps unsafe blocks small applies to locks too. Keep the critical section as small as possible. Drop the guard before you do the actual work.
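
The sketch below shows the shape of a short critical section. check_then_respond is a hypothetical handler step, and it uses a plain per-client counter instead of the sliding window to keep the example small.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical handler step: the guard lives only inside the inner block,
/// so the lock is released before any slow work starts.
fn check_then_respond(
    counts: &Mutex<HashMap<String, u32>>,
    client_id: &str,
    limit: u32,
) -> String {
    let allowed = {
        let mut map = counts.lock().unwrap();
        let count = map.entry(client_id.to_string()).or_insert(0);
        if *count < limit {
            *count += 1;
            true
        } else {
            false
        }
    }; // The guard is dropped here, before the response is built.

    // Slow work such as logging or I/O belongs here, outside the lock.
    if allowed {
        format!("200 OK for {}", client_id)
    } else {
        format!("429 Too Many Requests for {}", client_id)
    }
}
```

The inner block returns only a bool, so nothing borrowed from the map can escape and keep the lock alive by accident.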

Compiler errors will guide you. If you forget to wrap the limiter in Arc and try to move it into multiple tasks, you get E0382. If you try to share an Rc across threads, you get E0277. If you try to mutate the map without locking, the code does not compile, because the Mutex hands out the map only through a lock guard. Trust the compiler. It usually has a point.

Decision matrix

Use std::sync::Mutex when the critical section is fast and contains no I/O. The lock is held for microseconds, so blocking the OS thread is cheaper than yielding the async task.

Use tokio::sync::Mutex when you must perform I/O or await another future while holding the lock. The async mutex yields the task instead of blocking the thread, keeping the runtime responsive.

Use Arc<T> when multiple async tasks need to share ownership of the rate limiter across threads. Arc provides thread-safe reference counting.

Use Rc<T> only in single-threaded runtimes where thread safety is unnecessary. This is rare in production async code.

Reach for a dedicated crate such as governor when you need more advanced features like token buckets, distributed limiting, or persistence. Rolling your own works for simple sliding windows, but mature libraries have already dealt with edge cases like clock skew and cleanup.

Don't fight the compiler here. Reach for Arc and Mutex. Keep the lock fast. Clean up the map. Your rate limiter will hold the line.

Recap

A rate limiter acts like a bouncer at a club, counting how many times a specific person enters within a set time window. If they try to enter too often, the bouncer turns them away until enough old entries fall out of the window. You use this to protect your server from being overwhelmed by too many requests from a single client.