How to Implement the Middleware Pattern in Rust

Implement the middleware pattern in Rust with structs and traits for the request handlers, and use atomic types to cache values such as connection identifiers for services like a DNS resolver or network server.

When decorators don't exist

You are building a network service. Every incoming request needs to check authentication, attach metadata, and route to the correct backend. In Python or JavaScript, you would wrap functions in decorators or higher-order functions and let the runtime handle the plumbing. Rust gives you no decorators. You get explicit state, zero-cost abstractions, and a compiler that refuses to build middleware that shares state unsafely across threads.

The checkpoint model

Middleware is just a series of checkpoints. Picture a package moving through a sorting facility. Each station scans the label, applies a routing sticker, verifies the destination, and passes it to the next belt. In Rust, you do not hide this logic behind magic. You build it out of structs, traits, and explicit state management. The pattern relies on intercepting a request, performing work, and forwarding it. The real challenge is sharing state across those checkpoints without causing data races or blocking the event loop.
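The checkpoint idea can be sketched with one trait and a list of boxed handlers. This is a minimal illustration, not a framework API: the `Request` shape, the `Middleware` trait, the two handler structs, and `run_pipeline` are all hypothetical names chosen for this example.

```rust
/// Hypothetical request type for this sketch: a path plus collected headers.
#[derive(Debug, Default)]
pub struct Request {
    pub path: String,
    pub headers: Vec<(String, String)>,
}

/// One checkpoint: takes the request, does its work, and hands it on.
pub trait Middleware {
    fn handle(&self, req: Request) -> Request;
}

/// Checkpoint that stamps a made-up request ID header.
pub struct StampRequestId;

impl Middleware for StampRequestId {
    fn handle(&self, mut req: Request) -> Request {
        req.headers
            .push(("x-request-id".to_string(), "demo-1".to_string()));
        req
    }
}

/// Checkpoint that normalizes the path by trimming one trailing slash.
pub struct TrimTrailingSlash;

impl Middleware for TrimTrailingSlash {
    fn handle(&self, mut req: Request) -> Request {
        if req.path.len() > 1 && req.path.ends_with('/') {
            req.path.pop();
        }
        req
    }
}

/// Runs the request through every checkpoint in order.
pub fn run_pipeline(steps: &[Box<dyn Middleware>], req: Request) -> Request {
    steps.iter().fold(req, |req, step| step.handle(req))
}
```

Passing `vec![Box::new(StampRequestId), Box::new(TrimTrailingSlash)]` and a request for `/users/` to `run_pipeline` yields the path `/users` with one header attached. Reordering the checkpoints only means reordering the vector.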

Rust handles shared state through ownership rules and synchronization primitives. When multiple middleware steps need to read the same configuration or connection identifier, a plain reference is often not enough: worker threads and spawned tasks usually need data they own or that lives for the whole program, and a shared reference alone does not let you fill in a value later. You need a type that guarantees thread-safe reads and a well-defined initialization. That is where atomic caching and lazy initialization come in. The compiler forces you to declare your synchronization strategy upfront. You cannot accidentally share mutable state across threads without picking a primitive that the type system understands.

The minimal cache

Here is the core mechanism for caching a service connection identifier. The pattern uses an atomic integer to store the ID after the first successful lookup. Subsequent calls read the cached value without locks.

use std::sync::atomic::{AtomicU32, Ordering};

/// Caches a connection ID after the first successful service lookup.
pub struct ConnectionCache {
    /// Stores the resolved connection identifier. Zero means uninitialized.
    id: AtomicU32,
}

impl ConnectionCache {
    /// Creates a new cache with an uninitialized state.
    pub const fn new() -> Self {
        // const fn allows compile-time initialization in static contexts.
        Self { id: AtomicU32::new(0) }
    }

    /// Returns the cached ID or initializes it on first call.
    pub fn get_or_init(&self, resolver: impl FnOnce() -> u32) -> u32 {
        // Fast path: check if we already have a value.
        // Relaxed ordering is safe because we only care about the integer itself.
        let cached = self.id.load(Ordering::Relaxed);
        if cached != 0 {
            return cached;
        }

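        // Slow path: resolve the ID and store it.
        // Note: a resolver that returns 0 is never cached, because 0 marks "uninitialized".
        let new_id = resolver();
        // Store the result. Relaxed is sufficient because no other data depends on this value.
        self.id.store(new_id, Ordering::Relaxed);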
        new_id
    }
}

Ordering::Relaxed tells the compiler and CPU that the operation must be atomic but need not be ordered relative to other memory accesses. The integer itself is still read and written indivisibly. Since the ID is independent of other shared state, relaxed ordering gives us the cheapest possible load and store operations. The convention in Rust is to keep atomic operations as narrow as possible. If you only need a number, use AtomicU32. Do not reach for a Mutex just to be safe. Keep the synchronization surface as small as the problem requires.

What happens under the hood

When the cache is constructed, ConnectionCache holds a single atomic integer set to zero. The first call to get_or_init reads zero, so the cached != 0 check fails. The resolver function runs, contacts the service registry, and returns a numeric identifier. The cache stores that number. Every subsequent call hits the fast path. The CPU reads the cached integer directly from memory. No locks are acquired. No threads are blocked.
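The sequence above can be traced on a single thread by counting how often the resolver runs. The sketch below repeats a compact copy of the ConnectionCache listing so it compiles on its own; `trace_calls` and the ID value 7 are hypothetical and exist only for this demonstration.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU32, Ordering};

/// Compact copy of the cache from the listing above.
pub struct ConnectionCache {
    id: AtomicU32,
}

impl ConnectionCache {
    pub const fn new() -> Self {
        Self { id: AtomicU32::new(0) }
    }

    pub fn get_or_init(&self, resolver: impl FnOnce() -> u32) -> u32 {
        let cached = self.id.load(Ordering::Relaxed);
        if cached != 0 {
            return cached; // fast path
        }
        let new_id = resolver(); // slow path
        self.id.store(new_id, Ordering::Relaxed);
        new_id
    }
}

/// Calls the cache three times on one thread and reports
/// (last ID returned, number of times the resolver actually ran).
pub fn trace_calls() -> (u32, u32) {
    let cache = ConnectionCache::new();
    let resolver_runs = Cell::new(0u32);
    let mut last = 0;
    for _ in 0..3 {
        last = cache.get_or_init(|| {
            resolver_runs.set(resolver_runs.get() + 1);
            7 // hypothetical ID from the service registry
        });
    }
    (last, resolver_runs.get())
}
```

On a single thread `trace_calls()` returns `(7, 1)`: the resolver runs on the first call only, and the two later calls return the stored value.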

At compile time, Rust checks that anything shared across threads implements the Send and Sync traits, which AtomicU32 does. The atomic load and store are typically inlined down to plain machine instructions. If you try to share a non-thread-safe type such as Cell<u32> across threads, the compiler rejects you with E0277 (trait bound not satisfied). If you try to mutate a plain u32 through a shared reference, you get a borrow checker error instead. The type system forces you to choose the right synchronization primitive before you write a single line of runtime logic. Trust the borrow checker here. It is preventing a race condition you would spend days debugging.

A realistic pipeline

A real middleware pipeline chains multiple handlers. Each handler receives a request, optionally modifies it, and passes it down. The connection cache lives in shared state that all handlers can read.

use std::sync::Arc;

/// Represents a simplified request flowing through middleware.
pub struct Request {
    pub path: String,
    pub connection_id: u32,
}

/// A middleware step that attaches a cached connection ID to the request.
pub struct AttachConnection {
    cache: Arc<ConnectionCache>,
}

impl AttachConnection {
    /// Takes a cache already wrapped in Arc so it can be shared across threads.
    pub fn new(cache: Arc<ConnectionCache>) -> Self {
        Self { cache }
    }

    /// Processes the request by injecting the cached connection ID.
    pub fn handle(&self, mut req: Request) -> Request {
        // Resolve the ID lazily. The closure normally runs only on the first call,
        // though concurrent first calls may each run it.
        let id = self.cache.get_or_init(|| {
            // Simulate a service discovery call.
            42
        });
        req.connection_id = id;
        req
    }
}

The Arc wrapper allows multiple middleware steps to share the cache without copying it. Each step gets its own pointer to the same heap allocation. Cloning the Arc increments the reference count and dropping a handler decrements it. When the last pointer drops, the cache is freed. The cache lives exactly as long as the pipeline needs it. This avoids global state while keeping initialization centralized. The community convention is to use Arc::clone(&cache) explicitly when sharing state, rather than cache.clone(). The explicit form signals to readers that you are cloning the pointer, not the underlying data.
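A small sketch of that sharing, using a bare AtomicU32 as a stand-in for the cache so the example stays short. The function name `share_across_threads` and the value 42 are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `workers` threads that each hold their own Arc pointer to one
/// shared atomic. Returns the values they read and the strong count
/// observed after every worker has been joined.
pub fn share_across_threads(workers: usize) -> (Vec<u32>, usize) {
    // Stand-in for the cache: a single atomic already holding an ID.
    let cache = Arc::new(AtomicU32::new(42));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Clones the pointer and bumps the reference count;
            // the atomic itself is not copied.
            let cache = Arc::clone(&cache);
            thread::spawn(move || cache.load(Ordering::Relaxed))
        })
        .collect();

    let seen: Vec<u32> = handles
        .into_iter()
        .map(|handle| handle.join().expect("worker panicked"))
        .collect();

    // Each worker dropped its clone when it finished, so only `cache` remains.
    (seen, Arc::strong_count(&cache))
}
```

Every worker reads the same value, and once the workers are joined the count is back to one, which is the original pointer held by the caller.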

Where patterns break

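The check-then-store pattern with AtomicU32 and Relaxed ordering works for simple identifiers. It does not extend to complex structures. An AtomicU32 can hold only a u32, so a resolver that returns a String or a database pool needs different storage. Safe Rust will not let you write to a plain non-atomic field from several threads without synchronization; the compiler rejects it. The danger appears if you reach for unsafe code to hand-roll the same trick. A Relaxed flag does not guarantee that other threads see the fully written value, so you get a data race, which is undefined behavior and can surface as corrupted memory or crashes at runtime.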

Another common trap is assuming the slow path runs only once. If two threads call get_or_init simultaneously before the first store completes, both might execute the resolver. For connection IDs, running the resolver twice is harmless as long as it returns the same value. For expensive resources, it wastes work and can leave a duplicate resource behind. The standard library provides std::sync::OnceLock (stable since Rust 1.70) to solve this. It guarantees exactly one initialization with proper synchronization. Use it when the initialization cost matters.
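Here is a minimal sketch of the OnceLock approach for a non-numeric value. `BackendCache`, the address string, and the `init_runs` counter are hypothetical; the counter exists only to make the single initialization observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

/// Hypothetical expensive resource: a backend address resolved once.
pub struct BackendCache {
    address: OnceLock<String>,
    /// Counts initializer runs, for demonstration only.
    init_runs: AtomicUsize,
}

impl BackendCache {
    pub const fn new() -> Self {
        Self {
            address: OnceLock::new(),
            init_runs: AtomicUsize::new(0),
        }
    }

    /// Returns the cached address, resolving it on first use.
    pub fn address(&self) -> &str {
        self.address.get_or_init(|| {
            self.init_runs.fetch_add(1, Ordering::Relaxed);
            // Placeholder for a real service lookup.
            String::from("10.0.0.7:8080")
        })
    }

    /// Number of times the initializer has run so far.
    pub fn init_runs(&self) -> usize {
        self.init_runs.load(Ordering::Relaxed)
    }
}

/// Races `workers` threads on the first call and returns how many
/// times the initializer ran.
pub fn race_first_call(workers: usize) -> usize {
    let cache = BackendCache::new();
    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| {
                assert_eq!(cache.address(), "10.0.0.7:8080");
            });
        }
    });
    cache.init_runs()
}
```

However many threads race on the first call, OnceLock::get_or_init runs one initializer and makes the others wait for its result, so `race_first_call` reports a single run. The trade-off is that late arrivals block briefly during initialization instead of duplicating the work.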

If you try to borrow the cache mutably while it is also borrowed immutably, the compiler rejects you with E0502 (cannot borrow as mutable because it is also borrowed as immutable). Trying to mutate it through an Arc fails as well, typically with E0596 (cannot borrow as mutable). Rust forces you to pick a synchronization strategy upfront. Do not fight the borrow checker here. Reach for Arc and atomic types, or switch to OnceLock.

Testing middleware in Rust requires isolating state. You cannot rely on global variables. Build your pipeline with dependency injection. Pass mock caches and mock resolvers into your handlers. The compiler will verify that your types match. If you accidentally swap a Request with a Response, you get E0308 (mismatched types) before the test runner even starts. Treat type mismatches as design feedback. Fix the interface, not the test.
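One way to sketch that injection is to put service discovery behind a trait and hand the handler a test double. Everything here is an assumption for illustration: the `Resolver` trait, `MockResolver`, and this generic variant of `AttachConnection` are not part of the earlier listings.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical seam for service discovery so tests can swap in a fake.
pub trait Resolver {
    fn resolve(&self) -> u32;
}

/// Test double: returns a fixed ID and records how often it was asked.
pub struct MockResolver {
    id: u32,
    calls: AtomicU32,
}

impl MockResolver {
    pub fn new(id: u32) -> Self {
        Self { id, calls: AtomicU32::new(0) }
    }

    pub fn calls(&self) -> u32 {
        self.calls.load(Ordering::Relaxed)
    }
}

impl Resolver for MockResolver {
    fn resolve(&self) -> u32 {
        self.calls.fetch_add(1, Ordering::Relaxed);
        self.id
    }
}

pub struct Request {
    pub path: String,
    pub connection_id: u32,
}

/// Handler that is generic over its resolver instead of hard-coding one.
pub struct AttachConnection<R: Resolver> {
    resolver: R,
    cached: AtomicU32, // zero means uninitialized, as in the cache above
}

impl<R: Resolver> AttachConnection<R> {
    pub fn new(resolver: R) -> Self {
        Self { resolver, cached: AtomicU32::new(0) }
    }

    pub fn handle(&self, mut req: Request) -> Request {
        let mut id = self.cached.load(Ordering::Relaxed);
        if id == 0 {
            id = self.resolver.resolve();
            self.cached.store(id, Ordering::Relaxed);
        }
        req.connection_id = id;
        req
    }

    /// Exposes the injected resolver so a test can inspect it.
    pub fn resolver(&self) -> &R {
        &self.resolver
    }
}
```

A test can build `AttachConnection::new(MockResolver::new(9))`, push two requests through `handle`, and assert both that each carries the ID 9 and that `resolver().calls()` is 1. No global state is involved, so tests can run in parallel without interfering.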

Choosing the right tool

Use AtomicU32 caching when you need a fast, lock-free read path for a single numeric identifier and duplicate initialization is harmless. Use OnceLock<T> when you need to initialize complex state exactly once with guaranteed synchronization and a cheap read path once the value is set. Use trait-based middleware chains when you want composable, testable request handlers that can be reordered without changing implementation details. Use async runtimes like Tokio when your middleware needs to yield to the scheduler between steps or wait for network responses. Reach for plain function composition when the pipeline is short and does not require shared mutable state.
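For the last case, a short pipeline with no shared state can be built from ordinary functions. The following is a sketch under assumed names: `compose`, `normalize_path`, `attach_fixed_connection`, and `pipeline` are hypothetical helpers, and the fixed ID 42 stands in for a real lookup.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    pub path: String,
    pub connection_id: u32,
}

/// Step one: lowercases the path.
pub fn normalize_path(mut req: Request) -> Request {
    req.path = req.path.to_lowercase();
    req
}

/// Step two: attaches a fixed, made-up connection ID.
pub fn attach_fixed_connection(mut req: Request) -> Request {
    req.connection_id = 42;
    req
}

/// Chains two functions so the output of `first` feeds `second`.
pub fn compose<A, B, C>(
    first: impl Fn(A) -> B,
    second: impl Fn(B) -> C,
) -> impl Fn(A) -> C {
    move |input| second(first(input))
}

/// The whole pipeline as a single callable value.
pub fn pipeline() -> impl Fn(Request) -> Request {
    compose(normalize_path, attach_fixed_connection)
}
```

Calling `pipeline()` on a request for `/Users` returns the path `/users` with connection ID 42. There are no traits, boxes, or synchronization here, which is the point: the steps are pure functions, so nothing needs to be shared.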

Where to go next