How to Build a Health Check Endpoint in Rust

When the server is alive but the service is dead

You deploy your Rust service. The container starts. The load balancer sends a request to /health. It gets a 200 OK. Traffic flows. Two seconds later, the database connection times out. Every user request fails. The load balancer still thinks the service is healthy because the process is running. A health check that only verifies the process is breathing is a trap. It tells you the server exists, but not whether it can actually do its job.

A proper health check endpoint answers two distinct questions. Liveness asks, "Is the process stuck?" Readiness asks, "Can I accept requests?" Orchestrators like Kubernetes use these signals to restart containers or route traffic. If you mix them up, you get restart loops or traffic directed to broken services. Your health check needs to reflect the actual state of your application, not just the state of the operating system process.

Liveness versus readiness

Liveness probes check if the application is responsive. If the event loop is blocked, or the process has deadlocked, the liveness probe fails. The orchestrator kills the container and starts a fresh one. This recovers from hangs.

Readiness probes check if the application can handle traffic. This includes checking dependencies. If the database is down, the cache is unreachable, or the configuration hasn't loaded, the readiness probe fails. The orchestrator removes the container from the load balancer. Traffic stops flowing to that instance. The container stays alive. When the dependency recovers, the readiness probe passes, and traffic resumes.
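The two probe outcomes can be modeled as a small decision function. This is a simplified, std-only sketch: Action and orchestrator_action are hypothetical names, and real orchestrators add settings such as failure thresholds and initial delays that this ignores.

```rust
/// What a simplified orchestrator does with probe results (hypothetical model).
/// Real orchestrators also apply failure thresholds and delays, ignored here.
#[derive(Debug, PartialEq)]
enum Action {
    /// Liveness failed: kill the container and start a fresh one.
    Restart,
    /// Alive but not ready: keep the container, stop routing traffic to it.
    RemoveFromLoadBalancer,
    /// Alive and ready: route traffic normally.
    RouteTraffic,
}

fn orchestrator_action(live: bool, ready: bool) -> Action {
    match (live, ready) {
        // A failed liveness probe wins: a stuck process cannot serve anyway.
        (false, _) => Action::Restart,
        (true, false) => Action::RemoveFromLoadBalancer,
        (true, true) => Action::RouteTraffic,
    }
}

fn main() {
    // Database down, process fine: the container stays up but gets no traffic.
    println!("{:?}", orchestrator_action(true, false)); // RemoveFromLoadBalancer
}
```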

Separating these concerns prevents the "crash loop of death." Suppose your app depends on a database that is temporarily unavailable, and the database check is wired into the liveness probe. The check fails immediately on startup, so the orchestrator restarts the container, again and again. The container never stays up long enough to reconnect. Putting the dependency check in the readiness probe lets the container stay up while it waits for the dependency, while a liveness check ensures you don't stay up forever if the process is truly broken.
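A toy simulation makes the crash loop concrete. It assumes a made-up model: the app needs connect_ticks ticks of uptime before its connection is ready, and one probe runs per tick. Wiring, Outcome, and simulate are hypothetical names, and none of this is Kubernetes' actual logic.

```rust
/// Where the database check is wired (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Wiring {
    DbInLiveness,
    DbInReadiness,
}

/// How often the container restarted and how many ticks it could serve traffic.
#[derive(Debug, PartialEq)]
struct Outcome {
    restarts: u32,
    ready_ticks: u32,
}

/// `db_up[t]` says whether the database is reachable at tick `t`.
/// `connect_ticks` is how long the app must be up before its connection
/// is established (a stand-in for pool setup and retries).
fn simulate(wiring: Wiring, db_up: &[bool], connect_ticks: u32) -> Outcome {
    let mut uptime = 0;
    let mut out = Outcome { restarts: 0, ready_ticks: 0 };
    for &up in db_up {
        // The app has a working connection only once it has been up long enough.
        let connected = up && uptime >= connect_ticks;
        if !connected && wiring == Wiring::DbInLiveness {
            // Liveness failed: restart, discarding all progress toward connecting.
            out.restarts += 1;
            uptime = 0;
            continue;
        }
        if connected {
            out.ready_ticks += 1;
        }
        uptime += 1;
    }
    out
}

fn main() {
    let db_always_up = [true; 10];
    println!("{:?}", simulate(Wiring::DbInLiveness, &db_always_up, 2));
    println!("{:?}", simulate(Wiring::DbInReadiness, &db_always_up, 2));
}
```

With the database up the whole time and connect_ticks set to 2, the liveness wiring restarts on every tick and never serves traffic, while the readiness wiring serves from the third tick onward.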

Minimal health check

Start with a simple endpoint that returns a success status. This covers the liveness case. The load balancer sees a 200 and knows the process is responding.

use actix_web::{web, App, HttpResponse, HttpServer, Responder};

/// Returns a 200 OK with a JSON payload indicating the service is running.
async fn health_check() -> impl Responder {
    // Return 200 OK. The body is optional for simple probes,
    // but JSON helps debugging and monitoring tools.
    HttpResponse::Ok().json(serde_json::json!({
        "status": "healthy"
    }))
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            // Register the route. GET /health maps to the handler.
            .route("/health", web::get().to(health_check))
    })
    .bind("127.0.0.1:8080")?
    .run()
    .await
}

The handler returns impl Responder. Responder is an Actix trait, and impl Responder means "some concrete type that implements it," so Actix knows the return value can convert itself into an HTTP response. HttpResponse::Ok().json(...) builds a 200 response and serializes the data to JSON. The serde_json::json! macro builds a JSON value from JSON-like literal syntax. It's convenient for quick examples. The example needs actix-web and serde_json listed as dependencies in Cargo.toml.

Convention aside: In production code, prefer defining a struct over using the json! macro. A struct gives you compile-time safety. If you typo a key name in the macro, nothing fails on the server; the mistake surfaces only when a client looks for the expected field and doesn't find it. With a struct, the field names live in one definition, and a typo where you construct the value is a compile error. It also makes it easier to add fields later without hunting through macro invocations.
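A sketch of the struct approach, kept std-only so it compiles without extra crates. In a real Actix service you would derive serde::Serialize on the struct and pass it to .json(...); here a hand-written to_json stands in for that. Status, HealthBody, and to_json are hypothetical names.

```rust
/// The only status values this service reports (hypothetical).
/// A typo such as `Status::Helthy` is a compile error, unlike a typo
/// inside a json! literal.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Healthy,
    Unhealthy,
}

impl Status {
    fn as_str(self) -> &'static str {
        match self {
            Status::Healthy => "healthy",
            Status::Unhealthy => "unhealthy",
        }
    }
}

/// Response body for the health endpoint. With serde you would derive
/// Serialize instead; this hand-written version keeps the sketch std-only.
struct HealthBody {
    status: Status,
}

impl HealthBody {
    /// The key name "status" is spelled in exactly one place.
    fn to_json(&self) -> String {
        format!("{{\"status\":\"{}\"}}", self.status.as_str())
    }
}

fn main() {
    let body = HealthBody { status: Status::Healthy };
    println!("{}", body.to_json()); // {"status":"healthy"}
}
```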

How the routing works

Actix-web builds a routing tree from the closure you pass to HttpServer::new. That closure is an application factory: Actix calls it once per worker thread, and each call builds a complete App. Inside the closure, App::new() creates the app builder. The .route() method attaches a handler to a path and method. web::get() restricts the route to HTTP GET requests. to(health_check) points to the handler function.

When a request arrives, Actix matches the path and method against the routing tree. If it finds a match, it calls the handler. The handler runs asynchronously. async fn means the function can yield control back to the executor while waiting for I/O. This keeps the server responsive even under load. If you block the handler with synchronous code, you stall the worker thread it runs on, and every other request assigned to that worker has to wait.
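To make the match-then-dispatch idea concrete, here is a toy router in plain Rust. It is not how Actix implements routing (Actix also supports path parameters, guards, and scopes); Router, route, and dispatch are hypothetical names used only for illustration.

```rust
/// A handler here is a plain function returning a status code and a body.
type Handler = fn() -> (u16, String);

fn health_check() -> (u16, String) {
    (200, r#"{"status":"healthy"}"#.to_string())
}

/// Toy router (hypothetical): a list of (method, path, handler) entries.
struct Router {
    routes: Vec<(&'static str, &'static str, Handler)>,
}

impl Router {
    fn new() -> Self {
        Router { routes: Vec::new() }
    }

    /// Registers a handler, loosely like `.route("/health", web::get().to(...))`.
    fn route(mut self, method: &'static str, path: &'static str, handler: Handler) -> Self {
        self.routes.push((method, path, handler));
        self
    }

    /// Finds the entry whose method and path both match exactly, then calls it.
    fn dispatch(&self, method: &str, path: &str) -> (u16, String) {
        for (m, p, handler) in &self.routes {
            if *m == method && *p == path {
                return handler();
            }
        }
        // This toy answers 404 for every miss; real frameworks may
        // distinguish a wrong method from an unknown path.
        (404, String::new())
    }
}

fn main() {
    let router = Router::new().route("GET", "/health", health_check);
    println!("{:?}", router.dispatch("GET", "/health"));
    println!("{:?}", router.dispatch("GET", "/missing"));
}
```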

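The bind call attaches the server to a network socket at the given address. run starts the server and returns a future that completes when the server shuts down; awaiting it keeps main alive while the workers listen for connections and dispatch them to the routing tree. Note that 127.0.0.1 accepts only local connections. Inside a container, an orchestrator's HTTP probes typically connect to the container's network address rather than its loopback interface, so bind to 0.0.0.0 there.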
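Under the hood, bind and run come down to a listening socket and an accept loop. The sketch below builds both with std::net only, to show what a probe actually exchanges with the server. It is a bare-bones illustration, not a substitute for Actix: it spawns a thread per connection and handles nothing but GET /health. The functions handle and start are hypothetical.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Answers one connection: 200 for GET /health, 404 for anything else.
fn handle(mut stream: TcpStream) -> std::io::Result<()> {
    // Read the request line, e.g. "GET /health HTTP/1.1", then drain the
    // headers so closing the socket does not reset the connection.
    let request_line = {
        let mut reader = BufReader::new(&stream);
        let mut first = String::new();
        reader.read_line(&mut first)?;
        let mut header = String::new();
        while reader.read_line(&mut header)? > 0 && header != "\r\n" {
            header.clear();
        }
        first
    };
    let (status, body) = if request_line.starts_with("GET /health ") {
        ("200 OK", r#"{"status":"healthy"}"#)
    } else {
        ("404 Not Found", "")
    };
    write!(
        stream,
        "HTTP/1.1 {status}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}

/// Binds a socket (like `.bind`) and spawns an accept loop (like `.run`).
/// Returns the bound address so callers can find an OS-chosen port.
fn start(addr: &str) -> std::io::Result<SocketAddr> {
    let listener = TcpListener::bind(addr)?;
    let local = listener.local_addr()?;
    thread::spawn(move || {
        for stream in listener.incoming().flatten() {
            // One thread per connection keeps a slow client from blocking others.
            thread::spawn(move || {
                let _ = handle(stream);
            });
        }
    });
    Ok(local)
}

fn main() -> std::io::Result<()> {
    // Port 0 asks the OS for any free port.
    let addr = start("127.0.0.1:0")?;
    let mut client = TcpStream::connect(addr)?;
    client.write_all(b"GET /health HTTP/1.1\r\nHost: localhost\r\n\r\n")?;
    let mut response = String::new();
    std::io::Read::read_to_string(&mut client, &mut response)?;
    // Prints the status line: HTTP/1.1 200 OK
    println!("{}", response.lines().next().unwrap_or(""));
    Ok(())
}
```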

Adding dependency checks

A liveness check is rarely enough. You need to verify that your dependencies are available. This turns the endpoint into a readiness probe. To check dependencies, the handler needs access to the database pool, cache client, or other resources. Actix provides web::Data for sharing state with handlers.

use actix_web::{web, App, HttpResponse, HttpServer, Responder};
use std::sync::Arc;

/// Hypothetical stand-in for a real connection pool.
/// In real code, this would be a type like sqlx::Pool<Postgres>.
struct DatabasePool;

impl DatabasePool {
    fn new() -> Self {
        DatabasePool
    }

    /// Stub that always succeeds. A real check would run a cheap
    /// query such as SELECT 1 on a pooled connection.
    async fn ping(&self) -> Result<(), String> {
        Ok(())
    }
}

// Define the state structure.
// Arc allows shared ownership across threads safely.
struct AppState {
    db_pool: Arc<DatabasePool>,
}

/// Checks the database connection and returns 503 if it fails.
async fn health_check(data: web::Data<AppState>) -> impl Responder {
    // Attempt a lightweight ping to the database.
    // This verifies the connection is alive without heavy queries.
    if data.db_pool.ping().await.is_err() {
        // Return 503 Service Unavailable.
        // The load balancer will stop sending traffic to this instance.
        return HttpResponse::ServiceUnavailable().json(serde_json::json!({
            "status": "unhealthy",
            "details": "database connection failed"
        }));
    }

    // All checks passed. Return 200 OK.
    // The timestamp uses the chrono crate, which must be in Cargo.toml.
    HttpResponse::Ok().json(serde_json::json!({
        "status": "healthy",
        "timestamp": chrono::Utc::now().to_rfc3339()
    }))
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Initialize the database pool once, outside the factory closure.
    let pool = Arc::new(DatabasePool::new());

    HttpServer::new(move || {
        App::new()
            // Share state with all handlers via web::Data.
            // web::Data is a thin wrapper around Arc; it adds no locking.
            // The closure runs once per worker, so clone the Arc each time.
            .app_data(web::Data::new(AppState { db_pool: pool.clone() }))
            .route("/health", web::get().to(health_check))
    })
    .bind("127.0.0.1:8080")?
    .run()
    .await
}

The web::Data<AppState> argument in the handler is how Actix injects shared state. Actix extracts it from the application data registered with .app_data and passes it to any handler that asks for it. The AppState struct holds the database pool. Arc wraps the pool so multiple threads can share it. web::Data is itself a thin wrapper around Arc, so it adds shared ownership but no locking: handlers get read-only access. If you need to mutate shared state, put a Mutex, RwLock, or atomic inside it yourself.
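Because web::Data adds no locking, mutable shared state needs its own synchronization. A common pattern is a readiness flag that a background task flips once startup work finishes. This std-only sketch uses a bare Arc in place of web::Data; AppState, its ready field, and readiness_status are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical shared state: a readiness flag flipped once startup
/// work (loading config, warming a pool) is done.
struct AppState {
    ready: AtomicBool,
}

/// What a readiness handler would compute from the shared flag.
fn readiness_status(state: &AppState) -> u16 {
    if state.ready.load(Ordering::Acquire) {
        200
    } else {
        503
    }
}

fn main() {
    // Arc alone gives shared, read-only access; the AtomicBool is what
    // makes mutation through a shared reference safe.
    let state = Arc::new(AppState { ready: AtomicBool::new(false) });
    println!("before startup: {}", readiness_status(&state)); // 503

    // A background task marks the service ready.
    let writer = Arc::clone(&state);
    thread::spawn(move || writer.ready.store(true, Ordering::Release))
        .join()
        .unwrap();

    println!("after startup: {}", readiness_status(&state)); // 200
}
```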

The handler calls data.db_pool.ping().await. This is an asynchronous check. It yields control while waiting for the database response. If the ping fails, the handler returns 503 Service Unavailable. This status code tells the load balancer that the service is temporarily unable to handle requests. The load balancer removes the instance from the rotation. When the database recovers, the next health check succeeds, and traffic resumes.
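Real services usually have more than one dependency. A minimal sketch, assuming each check has already produced a pass/fail result: readiness is the AND of all of them, and any failure maps to 503. Check and readiness are hypothetical names; in an Actix handler, each ok value would come from an awaited ping.

```rust
/// Outcome of one named dependency check (hypothetical).
struct Check {
    name: &'static str,
    ok: bool,
}

/// Readiness is the AND of every dependency check. Returns the status code
/// for the probe plus the names of failed checks, for logging.
fn readiness(checks: &[Check]) -> (u16, Vec<&'static str>) {
    let failed: Vec<&'static str> = checks.iter().filter(|c| !c.ok).map(|c| c.name).collect();
    // Any failure means 503, so the load balancer stops routing here
    // until every dependency is back.
    let code = if failed.is_empty() { 200 } else { 503 };
    (code, failed)
}

fn main() {
    // Hypothetical results; a real handler would await each check.
    let checks = [
        Check { name: "database", ok: true },
        Check { name: "cache", ok: false },
    ];
    let (code, failed) = readiness(&checks);
    println!("{code} {failed:?}"); // 503 ["cache"]
}
```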

Convention aside: Keep health checks lightweight. A ping or a simple SELECT 1 is sufficient. Avoid complex queries that lock tables or consume significant resources. Health checks run frequently. If they become expensive, they degrade the performance of the service they are supposed to monitor.
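If even a cheap check runs more often than you want, one option is to cache its result for a short time. This sketch (CachedCheck is a hypothetical name) reuses the last result until a TTL expires. The trade-off is staleness: the probe can report an old result for up to one TTL. In a multi-threaded server you would also need to share the cache, for example behind a Mutex.

```rust
use std::time::{Duration, Instant};

/// Caches the last result of `check` for `ttl` (hypothetical helper).
struct CachedCheck<F: FnMut() -> bool> {
    check: F,
    ttl: Duration,
    last: Option<(Instant, bool)>,
}

impl<F: FnMut() -> bool> CachedCheck<F> {
    fn new(check: F, ttl: Duration) -> Self {
        CachedCheck { check, ttl, last: None }
    }

    /// Returns the cached result if it is younger than `ttl`,
    /// otherwise runs the real check and stores the new result.
    fn get(&mut self) -> bool {
        if let Some((at, ok)) = self.last {
            if at.elapsed() < self.ttl {
                return ok; // Fresh enough: skip the real check.
            }
        }
        let ok = (self.check)();
        self.last = Some((Instant::now(), ok));
        ok
    }
}

fn main() {
    let mut calls = 0;
    // Stand-in for an expensive dependency check.
    let mut cached = CachedCheck::new(
        || {
            calls += 1;
            true
        },
        Duration::from_secs(5),
    );
    for _ in 0..3 {
        cached.get();
    }
    drop(cached);
    println!("real checks run: {calls}"); // real checks run: 1
}
```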

Pitfalls and compiler errors

Health checks introduce subtle bugs if you aren't careful. The most common issue is blocking the async runtime. If you use a synchronous database driver inside an async handler, you block the thread. Actix runs each worker on its own thread. Blocking that thread starves every other request on the worker. The server becomes unresponsive. The health check itself might time out, causing the orchestrator to restart the container. Always use async drivers in async handlers, or move unavoidable blocking calls off the worker with actix_web::web::block, which runs a closure on a separate thread pool.
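A related defense is to put a deadline on each dependency check, so a hung database reports unhealthy quickly instead of stalling the probe. This std-only sketch runs the check on a separate thread and waits with recv_timeout; check_with_timeout is a hypothetical name. Do not call it from an async handler: recv_timeout blocks the calling thread, which is exactly the problem described above. In async code, wrapping the ping future in tokio::time::timeout plays the same role.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `check` on its own thread and gives up after `timeout` (hypothetical helper).
/// Caveat: a timed-out thread is not killed; it finishes in the background.
fn check_with_timeout<F>(check: F, timeout: Duration) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may be gone after a timeout; ignore the send error.
        let _ = tx.send(check());
    });
    match rx.recv_timeout(timeout) {
        Ok(result) => result,
        Err(mpsc::RecvTimeoutError::Timeout) => Err("check timed out".to_string()),
        // The sender was dropped without sending, e.g. the check panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => Err("check panicked".to_string()),
    }
}

fn main() {
    // Hypothetical checks: one fast, one that hangs longer than the deadline.
    let fast = check_with_timeout(|| Ok(()), Duration::from_millis(200));
    let slow = check_with_timeout(
        || {
            thread::sleep(Duration::from_secs(2));
            Ok(())
        },
        Duration::from_millis(200),
    );
    println!("{fast:?} {slow:?}");
}
```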

Another pitfall is leaking sensitive information. Health check endpoints are often exposed to load balancers and monitoring systems. If you include version numbers, internal IP addresses, or configuration details in the JSON response, you risk exposing them to attackers. Keep the response minimal. Return only what is necessary for monitoring.
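One way to keep the response minimal is to separate what you log from what you return. In this sketch, the raw error text (which might include hostnames or connection strings) goes into an internal log line, and the public body carries only the overall status. CheckResult, public_body, and log_line are hypothetical names, and the address in the example error is made up.

```rust
/// Full result of one dependency check, including raw error text (hypothetical).
struct CheckResult {
    name: &'static str,
    error: Option<String>,
}

/// Public body: only the overall status, nothing about internals.
fn public_body(results: &[CheckResult]) -> String {
    let healthy = results.iter().all(|r| r.error.is_none());
    let status = if healthy { "healthy" } else { "unhealthy" };
    format!(r#"{{"status":"{status}"}}"#)
}

/// Internal log line: full detail, never sent to clients.
fn log_line(r: &CheckResult) -> String {
    match &r.error {
        Some(e) => format!("check {} failed: {}", r.name, e),
        None => format!("check {} ok", r.name),
    }
}

fn main() {
    // Made-up failure whose message contains an internal address.
    let results = [CheckResult {
        name: "database",
        error: Some("connect to 10.0.3.7:5432 refused".to_string()),
    }];
    println!("log:  {}", log_line(&results[0]));
    println!("body: {}", public_body(&results)); // body: {"status":"unhealthy"}
}
```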

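Compiler errors often appear when sharing state. The closure passed to HttpServer::new is an Fn closure, because Actix calls it once per worker. If you capture a value with move and then move it out again inside the closure, for example by passing it straight to web::Data::new, the compiler rejects you with E0507 (cannot move out of a captured variable in an Fn closure). The first call would consume the value, leaving nothing for the next worker. The fix is to share ownership: wrap the state in Arc (or create the web::Data outside the closure, since it is cheap to clone) and clone it inside the closure, so each worker gets its own handle to the same data.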
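The factory shape can be reproduced in plain Rust to see why cloning works and moving does not. run_workers is a hypothetical, simplified model of the contract HttpServer::new imposes (an Fn closure that is Send, Clone, and 'static, called once per worker thread); it is not Actix's implementation.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a database pool.
struct DatabasePool;

/// Per-worker state, like the AppState passed to web::Data.
struct AppState {
    db_pool: Arc<DatabasePool>,
}

/// Simplified model of an app factory: each worker thread gets its own
/// clone of `factory` and calls it once to build its state.
fn run_workers<F>(workers: usize, factory: F) -> Vec<AppState>
where
    F: Fn() -> AppState + Send + Clone + 'static,
{
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let factory = factory.clone();
            thread::spawn(move || factory())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let pool = Arc::new(DatabasePool);
    let captured = Arc::clone(&pool);
    // Compiles: each call clones the captured Arc instead of moving it out.
    let states = run_workers(4, move || AppState { db_pool: Arc::clone(&captured) });
    // Would not compile (E0507): `move || AppState { db_pool: captured }`
    // moves the captured value out of an Fn closure.
    let shared = states.iter().all(|s| Arc::ptr_eq(&s.db_pool, &pool));
    println!("{} workers, one shared pool: {}", states.len(), shared); // 4 workers, one shared pool: true
}
```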

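If your state contains types that are not thread-safe, the compiler rejects you with E0277 (trait bound not satisfied). HttpServer::new requires its factory closure to be Send, because Actix moves a copy of the closure into each worker thread, so everything the closure captures must be Send. An Arc<T> is Send only when T is both Send and Sync. Send means the value can be transferred between threads. Sync means the value can be shared between threads via references. If you capture an Rc instead of an Arc, you get this error: Rc's reference count is not atomic, so it implements neither trait. Use Arc for state shared across workers.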

// This state is not thread-safe.
// Rc is neither Send nor Sync.
struct BadState {
    data: std::rc::Rc<String>,
}

// Capturing it in the HttpServer::new factory closure is rejected
// with E0277, because that closure must be Send:
// let state = BadState { ... };
// HttpServer::new(move || { /* uses state */ App::new() })

Fix the error by replacing Rc with Arc. Arc provides atomic reference counting that is safe for concurrent access.
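A cheap way to catch thread-safety problems early is a compile-time assertion. The generic function below accepts only types that are Send + Sync, so a state type that sneaks in an Rc fails to build right where it is checked. assert_send_sync and GoodState are hypothetical names.

```rust
use std::sync::Arc;
use std::thread;

/// Compiles only if T can be moved to and shared between threads.
fn assert_send_sync<T: Send + Sync>() {}

/// Hypothetical state type using Arc, which is thread-safe.
struct GoodState {
    data: Arc<String>,
}

fn main() {
    // Compiles: Arc<String> is Send + Sync because String is.
    assert_send_sync::<GoodState>();
    // assert_send_sync::<std::rc::Rc<String>>(); // E0277 if uncommented

    let state = Arc::new(GoodState { data: Arc::new("ready".to_string()) });
    let worker = Arc::clone(&state);
    let len = thread::spawn(move || worker.data.len()).join().unwrap();
    println!("{len}"); // 5
}
```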

Decision matrix

Use a simple status endpoint when you only need to verify the process is running and the framework is initialized. Use a readiness endpoint with dependency checks when your service relies on external resources like databases, caches, or message queues. Use separate liveness and readiness paths when your orchestrator supports distinct probes, allowing you to restart a stuck process without removing it from the load balancer during a brief dependency blip. Use structured JSON responses when your monitoring system parses health details for alerting or dashboards. Use plain text or empty bodies when you want to minimize response size for high-frequency probes from load balancers.

Don't lie to the load balancer. If your service can't handle requests, return 503. The health check is a promise. Keep it accurate.

Where to go next

A health check endpoint is a specific URL your application exposes to tell external systems whether it is running correctly. It acts like a heartbeat, letting load balancers, orchestrators, and monitoring tools verify your service before sending it traffic. Start with the simple liveness endpoint, add a readiness check for each dependency your service cannot work without, and wire both into your orchestrator's probes and your load balancer's health check settings.