How to implement a health check endpoint in Rust

Implement a Rust health check endpoint using Axum by defining a GET route that returns a 200 OK status.

The monitoring gap

A container restarts after a routine deployment. The orchestrator immediately routes production traffic to it. The service crashes because the database connection pool is still initializing. The load balancer keeps sending requests. Error rates spike. Users see timeouts.

Infrastructure does not guess whether a service is ready. It polls. A health check endpoint gives the orchestrator a deterministic signal: the process is alive, the HTTP stack is functional, and the critical dependencies are connected. Without it, you are flying blind during deployments and crashes.

What a health check actually does

Health checks answer two distinct questions. Liveness checks whether the process is running and not deadlocked. If the answer is no, the orchestrator kills and restarts the container. Readiness checks whether the service can actually handle traffic. If the answer is no, the orchestrator removes the instance from the load balancer until it recovers.

Think of a car engine. Liveness is asking whether the engine is turning over. Readiness is asking whether the fuel line is clear, the transmission is engaged, and the brakes are released. A car can have a running engine but still be unsafe to drive. A service can be alive but unable to process requests.

HTTP is the standard protocol for these checks because every load balancer, service mesh, and orchestrator already speaks it. You return a 200 OK when everything is green. You return a 503 Service Unavailable when dependencies are down. The status code does the talking.
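The mapping from dependency state to status code can be sketched without any web framework. This is a minimal illustration, not Axum's API: `DependencyCheck` and `readiness_status` are hypothetical names, and the checks are assumed to have already run.

```rust
/// Result of checking one dependency. The type and field names are
/// illustrative assumptions, not part of Axum.
struct DependencyCheck {
    name: &'static str,
    healthy: bool,
}

/// Returns 200 when every dependency is healthy, otherwise 503 together
/// with the names of the dependencies that failed.
fn readiness_status(checks: &[DependencyCheck]) -> (u16, Vec<&'static str>) {
    let failing: Vec<&'static str> = checks
        .iter()
        .filter(|check| !check.healthy)
        .map(|check| check.name)
        .collect();

    // Any failing dependency means the instance should not receive traffic.
    let code = if failing.is_empty() { 200 } else { 503 };
    (code, failing)
}
```

Returning the failing names alongside the code keeps the status code as the signal for the orchestrator while leaving something useful for a human to log.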

Infrastructure doesn't guess. It polls.

The bare minimum

Start with a route that returns a static string. The framework handles the rest.

use axum::{routing::get, Router};

/// Returns a plain text OK response for liveness probes.
async fn health_check() -> &'static str {
    "OK"
}

#[tokio::main]
async fn main() {
    // Build the router and attach the handler to the path.
    let app = Router::new().route("/health", get(health_check));

    // Bind to all interfaces on port 3000.
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000")
        .await
        .unwrap();

    // Run the server; this future resolves only when the server stops.
    axum::serve(listener, app).await.unwrap();
}

Add axum = "0.7" and tokio = { version = "1", features = ["full"] } to your Cargo.toml. The handler returns &'static str. Axum implements IntoResponse for it, which means the framework automatically wraps the string in an HTTP response with a 200 status and a text/plain content type.

Keep it fast. Keep it deterministic.

Walking through the request

When a probe hits http://localhost:3000/health, the router matches the path against the registered routes. It finds the get handler and schedules the async function on the Tokio runtime. The function returns immediately because it does no I/O. The runtime hands the return value to Axum's response layer.

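Axum already knows the return type at compile time. It is &'static str. It applies the IntoResponse trait implementation. The trait converts the string into a Response struct with headers, status, and body. Hyper, the HTTP library underneath Axum, writes the bytes to the TCP socket. Under HTTP/1.1 keep-alive the connection may stay open for the next probe; otherwise it closes. The handler's share of the cycle takes microseconds.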
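To watch that cycle from the probe's side, here is a minimal sketch of an HTTP probe built only on the standard library. `probe_status` is a hypothetical helper, not part of Axum or any orchestrator. It assumes a plain HTTP/1.1 server and reads nothing beyond the status line.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Sends `GET <path>` to `addr` (for example "127.0.0.1:3000") and returns
/// the numeric status code from the first line of the response.
fn probe_status(addr: &str, path: &str) -> io::Result<u16> {
    let mut stream = TcpStream::connect(addr)?;
    // A probe should give up quickly instead of hanging on a stalled service.
    stream.set_read_timeout(Some(Duration::from_secs(2)))?;
    stream.set_write_timeout(Some(Duration::from_secs(2)))?;

    // `Connection: close` asks the server to close after this one response.
    let request = format!(
        "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        path, addr
    );
    stream.write_all(request.as_bytes())?;

    // The status line looks like `HTTP/1.1 200 OK`.
    let mut status_line = String::new();
    BufReader::new(stream).read_line(&mut status_line)?;

    status_line
        .split_whitespace()
        .nth(1)
        .and_then(|code| code.parse::<u16>().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "malformed status line"))
}
```

With the server from the previous section running, `probe_status("127.0.0.1:3000", "/health")` should come back with the 200 that Axum attaches to the string response.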

Notice the async fn signature. Even though this handler does nothing async, Axum requires all handlers to be async. The framework uses the async boundary to manage request lifetimes, extractors, and middleware uniformly. You do not need to add .await inside the function unless you actually perform asynchronous work.

If you try to return a type that does not implement IntoResponse, the compiler rejects you with E0277 (trait bound not satisfied). The error message usually points at the route registration where the handler is passed to get. Fix the return type or wrap it in axum::Json, axum::response::Html, or a custom response builder.

Trust the trait system. It enforces the contract before the server starts.

Adding real dependencies

A static string only proves the process is alive. Production services need readiness checks. You must verify that the database, cache, or external API is reachable before accepting traffic.

Axum uses dependency injection through the State extractor. You define a struct that holds your shared resources, clone it into the router, and extract it in handlers.

use axum::{
    extract::State,
    http::StatusCode,
    response::Json,
    routing::get,
    Router,
};
use serde::Serialize;
use std::sync::Arc;

/// Holds shared application dependencies.
#[derive(Clone)]
struct AppState {
    db_url: String,
}

/// Structured response for monitoring dashboards.
#[derive(Serialize)]
struct HealthResponse {
    status: String,
    database: bool,
}

/// Checks database connectivity and returns JSON.
async fn readiness_check(
    State(state): State<Arc<AppState>>,
) -> Result<Json<HealthResponse>, StatusCode> {
    // Simulate a connection check. Replace with real pool ping.
    let db_reachable = !state.db_url.is_empty();

    if db_reachable {
        // Return 200 with structured payload.
        Ok(Json(HealthResponse {
            status: "ok".to_string(),
            database: true,
        }))
    } else {
        // Return 503 to signal the orchestrator to stop routing traffic.
        Err(StatusCode::SERVICE_UNAVAILABLE)
    }
}

#[tokio::main]
async fn main() {
    // Wrap state in Arc for thread-safe sharing across async tasks.
    let state = Arc::new(AppState {
        db_url: "postgres://localhost:5432/app".to_string(),
    });

    // Layer the state into the router so handlers can extract it.
    let app = Router::new()
        .route("/ready", get(readiness_check))
        .with_state(state);

    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}

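The State extractor pulls the shared data from the router. You do not reach for global variables. Globals hide a handler's dependencies and make tests hard to isolate. The type named in State<...> must match the type passed to with_state exactly: if you pass an Arc<AppState>, the handler extracts State<Arc<AppState>>. The Arc wrapper lets every request share one copy of the configuration, so each request clones a cheap reference-counted pointer instead of the whole struct. Axum requires the state type to implement Clone. Arc<T> always does, so #[derive(Clone)] on AppState is only strictly needed when you pass AppState to with_state directly. This example also needs serde = { version = "1", features = ["derive"] } in Cargo.toml.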

Convention aside: the community prefers explicit State extraction over hidden globals. If a handler needs a dependency, it asks for it in the signature. The compiler verifies the contract. Tests can inject mock state without touching environment variables.
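One way to make that testability concrete is to hide the dependency behind a trait and keep the readiness decision in a plain function. The sketch below is framework-independent and rests on assumptions: `Dependency`, `FakeDatabase`, `readiness_code`, and the generic `AppState<D>` are hypothetical names, and the check is synchronous for brevity, whereas a real pool ping would be async.

```rust
/// Anything that can report whether it is reachable. Hypothetical trait.
trait Dependency {
    fn is_reachable(&self) -> bool;
}

/// A generic variant of the state struct, so tests can swap the dependency.
struct AppState<D: Dependency> {
    database: D,
}

/// The readiness decision, kept separate from any HTTP handler.
fn readiness_code<D: Dependency>(state: &AppState<D>) -> u16 {
    if state.database.is_reachable() {
        200
    } else {
        503
    }
}

/// Test double that reports whatever the test tells it to.
struct FakeDatabase {
    up: bool,
}

impl Dependency for FakeDatabase {
    fn is_reachable(&self) -> bool {
        self.up
    }
}
```

A test builds `AppState { database: FakeDatabase { up: false } }` and asserts that `readiness_code` returns 503, with no environment variables and no running database.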

Inject your dependencies. Never hide them.

Where things break

Health checks look simple until they interact with real systems. Three patterns cause production incidents.

Blocking the async runtime is the most common mistake. If you call a synchronous database driver inside the handler, you tie up a Tokio worker thread. Under load, the thread pool exhausts. The entire service stalls. Use async drivers like sqlx or tokio-postgres. If you must call a blocking library, wrap it in tokio::task::spawn_blocking.

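Returning the wrong status code confuses orchestrators and the people reading dashboards. A 500 Internal Server Error conventionally means the handler itself failed. A 503 Service Unavailable means the service is alive but temporarily unable to handle requests. Return 503 from a readiness probe when dependencies are down. Many probes only distinguish success from failure, so the 500 versus 503 distinction mostly helps load balancers and humans that inspect the code. A liveness probe should fail only when the process is truly broken, because a failure there triggers a restart.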
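How a probe reads the code depends on the orchestrator. Kubernetes HTTP probes, for example, count any status from 200 up to but not including 400 as success and everything else as failure. A small sketch of that rule, with `ProbeOutcome` and `classify` as hypothetical names:

```rust
/// What a probe concludes from one HTTP response.
#[derive(Debug, PartialEq)]
enum ProbeOutcome {
    Success,
    Failure,
}

/// Classifies a status code the way Kubernetes HTTP probes do:
/// 200 through 399 is a success, anything else is a failure.
fn classify(status: u16) -> ProbeOutcome {
    if (200..400).contains(&status) {
        ProbeOutcome::Success
    } else {
        ProbeOutcome::Failure
    }
}
```

Under this rule 500 and 503 are both plain failures, so what matters most is which endpoint returns them: a failing readiness probe removes the instance from rotation, while a failing liveness probe restarts it.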

Forgetting Clone on the state type triggers E0277 (trait bound not satisfied) at compile time. Axum requires the type passed to with_state to implement Clone because it clones the state for each request. The error usually points to the with_state call or the handler signature. Add #[derive(Clone)] to the struct, or wrap it in Arc, which is always Clone, and the compiler accepts the code.

The compiler will catch the missing Clone. The other two mistakes compile cleanly, so review for them.

Choosing your approach

Use a static string response when you only need to verify the process is alive and the HTTP stack is functioning. Use a JSON payload with version and uptime when your monitoring dashboard expects structured data for alerting and trend analysis. Use dependency-aware checks when you need to distinguish between a crashed process and a temporarily unreachable database. Use separate /live and /ready endpoints when your orchestrator supports distinct liveness and readiness probes. Reach for plain text when the probe is a simple HTTP status check from a legacy load balancer that ignores the body. Reach for axum::Json when modern agents parse the response for metrics.
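For the version-and-uptime variant, the data itself needs nothing beyond the standard library. The sketch below assumes hypothetical `Uptime` and `HealthInfo` types. In an Axum handler you would likely derive Serialize on `HealthInfo`, wrap it in axum::Json, and keep the `Uptime` value in shared state.

```rust
use std::time::Instant;

/// Payload a dashboard might consume. Field names are illustrative.
struct HealthInfo {
    version: &'static str,
    uptime_secs: u64,
}

/// Records when the process started so handlers can report uptime.
struct Uptime {
    started: Instant,
}

impl Uptime {
    /// Call once at startup, before the server begins accepting requests.
    fn start() -> Self {
        Self { started: Instant::now() }
    }

    /// Builds the current payload. The version comes from Cargo at compile
    /// time and falls back to "unknown" when built without Cargo.
    fn snapshot(&self) -> HealthInfo {
        HealthInfo {
            version: option_env!("CARGO_PKG_VERSION").unwrap_or("unknown"),
            uptime_secs: self.started.elapsed().as_secs(),
        }
    }
}
```

Instant is monotonic, so the reported uptime does not jump when the system clock is adjusted.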

Match the endpoint to the probe. Don't overengineer a pulse check.

Where to go next