How to Use Middleware in Rust Web Frameworks

The request pipeline problem

You are building a REST API. Every route needs to log the incoming method and path. Every route needs to verify a bearer token before touching your database. Every route needs to attach CORS headers to the response. You write the first handler. It works. You write the second. You copy the logging code. You write the third. You copy the auth check. By the fifth route, your handlers look like Swiss cheese. Business logic is buried under boilerplate. A single change to the auth flow means editing twelve files.

Rust web frameworks solve this with middleware. Middleware sits between the raw HTTP request and your route handler. It intercepts the request, does its job, passes it down the chain, catches the response on the way back, and modifies it if needed. You write the cross-cutting concern once. You attach it to routes or the entire server. Your handlers stay clean.

What middleware actually is

Think of middleware as a series of toll booths on a highway. A car enters the first booth. The attendant checks the license plate, stamps a receipt, and waves the car forward. The car hits the second booth. The attendant checks the receipt, verifies the destination, and adds a lane assignment sticker. Finally, the car reaches its exit ramp. On the way back, each booth might collect a fee or update a log.

In Rust web frameworks, the car is the Request. The exit ramp is your route handler. Each booth is a middleware layer. The framework gives you a standardized way to wrap handlers in these layers. The most common foundation is the tower crate, which defines a Service trait. A Service takes a request, returns a Future that resolves to a response or an error, and can be wrapped by other services. Axum builds its middleware system directly on tower. Actix Web defines its own Service and Transform traits with a similar shape, and Rocket uses a separate mechanism called fairings. The examples in this article use Axum.

You do not need to implement tower::Service manually for most web tasks. Frameworks provide builder patterns or macros that generate the boilerplate. You focus on the transformation logic. The framework handles the async plumbing, the type erasure, and the chain execution.
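Even if you never implement the trait by hand, it helps to see the shape of the composition. The following is a deliberately simplified, synchronous model rather than tower's actual API: the real tower::Service is asynchronous and also has a poll_ready method. The names Service, Echo, and Uppercase are hypothetical and exist only for this sketch.

```rust
/// A simplified, synchronous stand-in for the idea behind tower::Service.
/// The real trait returns a Future and also exposes poll_ready.
trait Service<Req> {
    type Response;
    fn call(&self, req: Req) -> Self::Response;
}

/// The innermost service: plays the role of a route handler.
struct Echo;

impl Service<String> for Echo {
    type Response = String;

    fn call(&self, req: String) -> String {
        format!("echo: {req}")
    }
}

/// A middleware is a service that owns another service.
struct Uppercase<S> {
    inner: S,
}

impl<S> Service<String> for Uppercase<S>
where
    S: Service<String, Response = String>,
{
    type Response = String;

    fn call(&self, req: String) -> String {
        // Transform the request, then delegate to the wrapped service.
        self.inner.call(req.to_uppercase())
    }
}
```

Because Uppercase<S> is itself a Service, it can be wrapped by another layer in exactly the same way, which is what makes the chain composable.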

A minimal middleware in action

Here is a complete logging middleware for Axum. It captures the method and path before the request reaches your handler, then prints them together with the status code after the handler returns.

use axum::{
    extract::Request,
    middleware::Next,
    response::Response,
};

/// Logs the incoming request method/path and the outgoing status code.
pub async fn log_request(
    req: Request,
    next: Next,
) -> Response {
    // Extract metadata before passing the request down.
    let method = req.method().clone();
    let path = req.uri().path().to_string();
    
    // Call the next layer in the chain. This returns a Future.
    let response = next.run(req).await;
    
    // Read the status code from the response on the way back up.
    let status = response.status();
    println!("[{}] {} -> {}", method, path, status);
    
    // Return the response unchanged.
    response
}

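The signature follows the convention that axum::middleware::from_fn expects (shown here for Axum 0.7 and later, where Request and Next take no type parameters). The last two parameters are the Request and Next, which represents the rest of the middleware chain plus your actual handler. Any extractors you need go before them. The return type is Response here, but any type that implements IntoResponse works. The function must be async because it awaits the downstream chain.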

When you attach this to a route with .layer(middleware::from_fn(log_request)), Axum wraps your handler with a generated service that calls log_request. The next.run(req).await line is the handoff. Nothing happens in your handler until that line executes. When the handler finishes, next.run resolves to a Response. Your middleware can inspect it, modify headers, change the body, or return it as-is.

How the chain executes

Middleware does not run in parallel. It runs in a strict forward-and-backward sequence. The framework builds a stack. The outermost layer is the first to receive the request. It decides whether to short-circuit (return early) or call next.run(). If it calls next.run(), control passes to the next layer. This continues until the final handler executes.

Once the handler produces a Response, control unwinds. The innermost middleware gets the response first. It can modify it and pass it back. The next layer out receives the modified response. This continues until the outermost layer returns the final Response to the HTTP server.

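This stack behavior matters for error handling. If a middleware panics or returns an error response before calling next.run(), neither the handler nor any layer inside it runs. If the handler fails, every layer that called next.run() and is waiting on it sees the result on the way back. In Axum that result is an ordinary Response with an error status code, because handler errors are converted into responses. A layer only sees failures that originate inside it: when an outer layer short-circuits, the inner layers never learn about the request at all.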
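The forward-and-backward order is easier to see in a framework-free model. The sketch below is a simplification under stated assumptions, not Axum code: requests are reduced to a path, responses to a status code, and the functions logging, auth, handler, and run are hypothetical names chosen for illustration.

```rust
/// The rest of the chain: takes a path and a trace log, returns a status code.
type Next<'a> = &'a dyn Fn(&str, &mut Vec<String>) -> u16;

/// Innermost stage: the route handler.
fn handler(_path: &str, trace: &mut Vec<String>) -> u16 {
    trace.push("handler".to_string());
    200
}

/// Inner layer: short-circuits with 401 unless the path is public.
fn auth(path: &str, trace: &mut Vec<String>, next: Next<'_>) -> u16 {
    if !path.starts_with("/public") {
        trace.push("auth: rejected".to_string());
        return 401; // next is never called, so the handler never runs
    }
    trace.push("auth: before".to_string());
    let status = next(path, trace);
    trace.push("auth: after".to_string());
    status
}

/// Outer layer: records entry and exit around everything inside it.
fn logging(path: &str, trace: &mut Vec<String>, next: Next<'_>) -> u16 {
    trace.push("logging: before".to_string());
    let status = next(path, trace);
    trace.push(format!("logging: after {status}"));
    status
}

/// Glue: the part of the chain that sits inside the logging layer.
fn auth_then_handler(path: &str, trace: &mut Vec<String>) -> u16 {
    auth(path, trace, &handler)
}

/// Runs logging -> auth -> handler and returns the status with the trace.
fn run(path: &str) -> (u16, Vec<String>) {
    let mut trace = Vec::new();
    let status = logging(path, &mut trace, &auth_then_handler);
    (status, trace)
}
```

Calling run("/public/info") records the stages in the order logging, auth, handler, auth, logging. Calling run("/admin") stops at auth, yet the outer logging layer still observes the 401 on the way back.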

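Convention aside: in the tower ecosystem, the common type-erased error is Box<dyn std::error::Error + Send + Sync>, exported as tower::BoxError. It erases concrete types so different layers can fail with different errors without breaking the chain. Axum itself requires the services it runs to be infallible, so a failure has to become a response at some point, either by returning a type that implements IntoResponse or by wrapping a fallible layer in axum::error_handling::HandleErrorLayer.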
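As a small illustration of how the boxed error type lets unrelated failures share one signature, here is a sketch that uses only the standard library. The functions check_token, parse_limit, and process, and the MissingToken type, are hypothetical; only the shape of the alias matches tower::BoxError.

```rust
use std::error::Error;
use std::fmt;

/// Same definition as the alias tower exports as tower::BoxError.
type BoxError = Box<dyn Error + Send + Sync>;

/// Hypothetical custom error raised by an auth step.
#[derive(Debug)]
struct MissingToken;

impl fmt::Display for MissingToken {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "missing token")
    }
}

impl Error for MissingToken {}

/// Hypothetical auth step: fails with a custom error type.
fn check_token(token: Option<&str>) -> Result<&str, MissingToken> {
    token.ok_or(MissingToken)
}

/// Hypothetical parsing step: fails with a standard library error type.
fn parse_limit(raw: &str) -> Result<u32, std::num::ParseIntError> {
    raw.parse::<u32>()
}

/// Both concrete errors convert into BoxError through the ? operator.
fn process(token: Option<&str>, raw_limit: &str) -> Result<u32, BoxError> {
    check_token(token)?;
    let limit = parse_limit(raw_limit)?;
    Ok(limit)
}
```

The caller of process sees a single error type and can still print or log the underlying message, at the cost of losing the concrete type unless it downcasts.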

Building a realistic chain

Real applications rarely use a single middleware. You typically chain logging, authentication, rate limiting, and compression. Here is how you compose them in Axum using the middleware module and tower-http for common utilities.

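use axum::{
    extract::Request,
    http::{header::AUTHORIZATION, StatusCode},
    middleware::{self, Next},
    response::Response,
    routing::get,
    Router,
};
// tower-http features needed: trace, plus a compression algorithm such as compression-gzip.
use tower_http::compression::CompressionLayer;
use tower_http::trace::TraceLayer;

/// Returns a configured router with a realistic middleware stack.
pub fn create_app() -> Router {
    // Define the routes that require authentication.
    let protected = Router::new()
        .route("/hello", get(|| async { "Hello from the handler" }))
        // Added first, so this layer sits closest to the handler.
        .layer(middleware::from_fn(auth_middleware))
        // Added second, so tracing wraps auth and sees rejected requests too.
        .layer(TraceLayer::new_for_http());

    // Mount the protected routes under /api and compress every response.
    Router::new()
        .nest("/api", protected)
        .layer(CompressionLayer::new())
}

/// Verifies that a bearer token is present. Returns early if it is not.
async fn auth_middleware(
    req: Request,
    next: Next,
) -> Result<Response, (StatusCode, &'static str)> {
    // Read the Authorization header and require the "Bearer " scheme.
    let has_bearer = req
        .headers()
        .get(AUTHORIZATION)
        .and_then(|value| value.to_str().ok())
        .is_some_and(|value| value.starts_with("Bearer "));

    // Short-circuit if the header is missing or malformed.
    if !has_bearer {
        return Err((StatusCode::UNAUTHORIZED, "Missing or malformed token"));
    }

    // A real service would validate the token itself at this point.
    // Pass the request down the chain.
    Ok(next.run(req).await)
}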

The layer method composes middleware. Each call wraps everything added before it, so the last layer you add is the first to see the request. The order matters. TraceLayer sits outside auth_middleware, so it logs both successful and failed requests. CompressionLayer sits at the top, so it compresses the final response regardless of which route handled it.
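The wrapping order of repeated layer calls can be modeled without any framework. The App type below is a hypothetical stand-in for a router, and its layer method only mimics the wrapping behavior described above; it is a sketch, not how Axum is implemented.

```rust
/// A boxed callable that appends the stages it passes through to a trace.
type Stage = Box<dyn Fn(&mut Vec<&'static str>)>;

/// Hypothetical stand-in for a router with a single handler.
struct App {
    stack: Stage,
}

impl App {
    fn new() -> Self {
        App {
            stack: Box::new(|trace: &mut Vec<&'static str>| trace.push("handler")),
        }
    }

    /// Wraps everything added so far, so the newest layer becomes the outermost.
    fn layer(self, name: &'static str) -> Self {
        let inner = self.stack;
        App {
            stack: Box::new(move |trace: &mut Vec<&'static str>| {
                trace.push(name);
                inner(trace);
            }),
        }
    }

    /// Sends one request through the stack and returns the order of stages.
    fn call(&self) -> Vec<&'static str> {
        let mut trace = Vec::new();
        (self.stack)(&mut trace);
        trace
    }
}
```

With App::new().layer("auth").layer("trace").layer("compression"), a request passes through compression, then trace, then auth, then the handler: the reverse of the order in which the layers were added.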

You can apply middleware globally, per-route, or per-nested router. Global layers run for every request. Route-specific layers only run when that exact path matches. Nested layers run for all routes under a prefix. Pick the narrowest scope that satisfies your requirement. Wider scopes increase latency for unrelated endpoints.

Where things break

Middleware introduces async boundaries and trait constraints that trip up new Rust developers. The compiler will catch most mistakes, but the error messages can feel dense.

If you forget to await the next.run(req) call, you get a type mismatch. The function expects Response, but you are returning a Future. The compiler rejects this with E0308 (mismatched types). You must await the future to actually execute the downstream chain.
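The reason a missing await matters is that Rust futures are lazy: creating one runs none of its body. The sketch below demonstrates this with the standard library only. The function run_downstream is a hypothetical stand-in for next.run(req), and block_on is a minimal executor that is adequate only for futures that never wait on real I/O.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for next.run(req): the body runs only when the future is polled.
async fn run_downstream(handler_runs: &Cell<u32>) -> u16 {
    handler_runs.set(handler_runs.get() + 1);
    200
}

/// A waker that does nothing, enough for futures that are ready immediately.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor: polls the future in a loop until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Returns (handler runs before driving the future, status, runs after).
fn demo() -> (u32, u16, u32) {
    let handler_runs = Cell::new(0);
    let pending = run_downstream(&handler_runs); // created, but not executed
    let before = handler_runs.get();
    let status = block_on(pending); // driving the future runs the body
    let after = handler_runs.get();
    (before, status, after)
}
```

In a real middleware the .await on next.run(req) plays the role of block_on here: it is the point at which the downstream chain actually executes.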

If you try to store a non-Send type in shared state and pass it across async boundaries, the compiler rejects you with E0277 (trait bound not satisfied). Web servers spawn tasks on a multi-threaded runtime. Every piece of state shared across middleware must implement Send + Sync. Use Arc<Mutex<T>> or Arc<RwLock<T>> for mutable shared state. Use Arc<T> for read-only state.
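A minimal sketch of the Arc<Mutex<T>> pattern, using plain threads in place of a web runtime. AppState, record_request, and simulate are hypothetical names. std::thread::spawn demands a Send closure, which is the same kind of bound a multi-threaded async runtime places on spawned tasks, so swapping Arc for Rc here would be rejected with E0277.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared state: a request counter visible to every worker.
#[derive(Default)]
struct AppState {
    requests: Mutex<u64>,
}

/// Stand-in for a middleware body that updates shared state.
fn record_request(state: &AppState) {
    let mut count = state.requests.lock().unwrap();
    *count += 1;
}

/// Simulates `workers` threads that each handle `per_worker` requests.
fn simulate(workers: usize, per_worker: usize) -> u64 {
    let state = Arc::new(AppState::default());

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each worker gets its own handle to the same state.
            let state = Arc::clone(&state);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    record_request(&state);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Copy the value out so the lock guard is released before returning.
    let total = *state.requests.lock().unwrap();
    total
}
```

The mutex guarantees that no increment is lost, so the total always equals workers multiplied by per_worker.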

Blocking the executor is the silent killer. If your middleware performs a heavy CPU calculation or a synchronous database query directly, it stalls the worker thread it runs on, and every other task scheduled on that thread waits. The runtime has a limited number of worker threads, so a few blocked calls under load can starve the whole server. Offload blocking work to a dedicated thread pool, for example with tokio::task::spawn_blocking, and await the result so the worker can serve other requests in the meantime.

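Another common trap is consuming the request body too early. The Request body is a stream. Once you read it, it is gone, and a body cannot be cloned. If your middleware needs to inspect the JSON payload, split the request with into_parts, buffer the body with axum::body::to_bytes (which takes a size limit), and rebuild the request from the buffered bytes with Request::from_parts and Body::from before calling next.run. Buffering holds the whole payload in memory, so set the limit deliberately. When only the handler needs the payload, leave the body untouched and let the handler's extractor read it.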
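The read-once behavior can be reproduced with any standard library reader. In the sketch below, Body is a hypothetical alias for a boxed reader that stands in for a streaming request body; it is a model of the problem, not Axum's body type. One middleware forwards the drained stream, the other buffers the bytes and rebuilds a fresh body.

```rust
use std::io::{Cursor, Read};

/// Stand-in for a streaming body: a reader that can only be consumed once.
type Body = Box<dyn Read>;

/// Downstream handler: reads whatever body it is given.
fn handler(mut body: Body) -> String {
    let mut text = String::new();
    body.read_to_string(&mut text).unwrap();
    text
}

/// Buggy middleware: drains the stream, then forwards the drained stream.
fn inspect_and_forward_drained(mut body: Body) -> (usize, String) {
    let mut seen = Vec::new();
    body.read_to_end(&mut seen).unwrap();
    (seen.len(), handler(body))
}

/// Fixed middleware: buffers the bytes, then rebuilds a fresh body from them.
fn inspect_and_forward_rebuilt(mut body: Body) -> (usize, String) {
    let mut seen = Vec::new();
    body.read_to_end(&mut seen).unwrap();
    let len = seen.len();
    (len, handler(Box::new(Cursor::new(seen))))
}
```

Both middlewares see the full payload, but only the second one leaves anything for the handler to read.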

Treat the middleware chain as a contract. If you modify the request, document it. If you short-circuit, return a clear status code. If you add headers, prefix them with your service name to avoid collisions. Trust the borrow checker here. It will force you to make ownership decisions explicit.

Choosing your middleware pattern

Use framework-specific middleware functions when you need simple request/response transformations like logging, timing, or header injection. Use tower-http crates when you need battle-tested utilities like compression, CORS, tracing, or request id generation. Use custom tower::Service implementations when you are building a reusable library that must work across multiple frameworks. Use per-route middleware when the logic only applies to a single endpoint or a small group of endpoints. Use global middleware when the behavior is mandatory for every request, such as security headers or server-wide metrics. Use nested router middleware when you are grouping endpoints by domain, like /api/v1 or /admin.

Write your own middleware only when existing crates do not cover your exact requirement. The tower ecosystem is mature. Reinventing rate limiting or authentication from scratch can introduce subtle concurrency bugs. Reach for tower and tower-http first. Fall back to custom code when you have a verified performance bottleneck or a highly specific business rule.

Keep your middleware thin. A middleware should do one thing well. If you find yourself writing conditional logic that branches into three different behaviors, split it into three layers. Composition is cheaper than complex branching. The stack unwinds predictably. Debugging becomes linear.

Where to go next

Start small. Attach log_request to a single route with middleware::from_fn and watch the output as requests come in. Then swap it for TraceLayer from tower-http, put an auth layer on a nested router, and confirm the ordering by sending a request without a token. When you need behavior that must work across several frameworks, read the tower::Service documentation and implement the trait directly.