How to Add Metrics to a Rust Application (prometheus, metrics crate)


You've built a Rust service. It handles requests, processes data, and sits in production. Everything looks fine until Tuesday at 3 AM when the latency spikes and the dashboard goes red. You check the logs. The logs show individual errors, but they don't show the trend. You need numbers. You need to know how many requests hit the endpoint, how long they took, and how much memory the queue is eating. That's where metrics come in.

Logs are a transcript of everything that happened. Metrics are the gauges on the dashboard. You don't read the transcript to see if you're running out of gas; you look at the fuel gauge. In Rust, a widely used way to handle this is the metrics crate ecosystem. It gives you a unified API to record data and a set of exporters to send that data to backends like Prometheus, StatsD, or Datadog.

The abstraction layer

Rust's crate ecosystem loves abstractions, and metrics are no exception. The metrics crate defines the interface. It provides macros like counter!, gauge!, and histogram! that you sprinkle throughout your code. The metrics crate itself doesn't know anything about Prometheus. It just sends events to a global recorder.

You pair metrics with an exporter crate, like metrics-exporter-prometheus. The exporter implements the recorder trait and formats the data for the backend. This split is intentional. You can write your instrumentation once and swap the exporter later without touching your application code. Want to move from Prometheus to StatsD? Change the dependency and the initialization. The counter! calls stay the same.

Convention aside: Treat metrics as a core dependency and the exporter as an optional or feature-gated dependency. This keeps the instrumentation decoupled from the transport. If you're building a library, depend only on metrics. Let the binary crate choose the exporter.
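To make the interface/recorder split concrete, here is a deliberately simplified model of the facade pattern in plain std Rust. The trait `SimpleRecorder`, the functions `install` and `increment`, and the `InMemoryRecorder` type are hypothetical stand-ins, not the metrics crate's real `Recorder` API. The point is the shape: one global slot, installed once, with instrumentation calls that silently do nothing until something is installed.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical, simplified recorder interface (the real metrics::Recorder trait differs).
pub trait SimpleRecorder: Send + Sync {
    fn increment_counter(&self, name: &str, value: u64);
}

/// The single global slot that instrumentation calls dispatch to.
static RECORDER: OnceLock<Box<dyn SimpleRecorder>> = OnceLock::new();

/// Installs a recorder; only the first call succeeds.
pub fn install(recorder: Box<dyn SimpleRecorder>) -> Result<(), &'static str> {
    RECORDER.set(recorder).map_err(|_| "a recorder is already installed")
}

/// What application code calls; a silent no-op until a recorder is installed.
pub fn increment(name: &str, value: u64) {
    if let Some(recorder) = RECORDER.get() {
        recorder.increment_counter(name, value);
    }
}

/// One possible "exporter": keeps totals in a shared in-memory map.
#[derive(Clone, Default)]
pub struct InMemoryRecorder {
    totals: Arc<Mutex<HashMap<String, u64>>>,
}

impl InMemoryRecorder {
    /// Reads the current total for a counter name (0 if never incremented).
    pub fn total(&self, name: &str) -> u64 {
        self.totals.lock().unwrap().get(name).copied().unwrap_or(0)
    }
}

impl SimpleRecorder for InMemoryRecorder {
    fn increment_counter(&self, name: &str, value: u64) {
        *self.totals.lock().unwrap().entry(name.to_string()).or_insert(0) += value;
    }
}
```

Swapping exporters in this model means passing a different type to `install`; the `increment` calls don't change, which is the same property that lets the `counter!` calls stay put when you change exporters.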

The three metric types

Before writing code, pick the right type. The metrics crate gives you three: counters, gauges, and histograms. Using the wrong type can make your data misleading.

Counters only go up. They measure cumulative totals. Request counts, error counts, bytes sent. You increment a counter. You never decrement it. If the process restarts, the counter resets to zero. That's expected. Prometheus's rate() and increase() functions detect the drop and treat it as a reset, so the rates you compute over time stay correct.
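A simplified model of how a backend turns raw counter samples into an increase while tolerating restarts: whenever a sample is lower than the one before it, assume the process restarted from zero and count the new value as the increase. This mirrors the reset handling behind Prometheus's rate() and increase() but leaves out their extrapolation; the function names are illustrative.

```rust
/// Total increase across scraped counter samples, treating any drop as a restart from zero.
pub fn increase(samples: &[u64]) -> u64 {
    samples
        .windows(2)
        .map(|pair| {
            let (prev, next) = (pair[0], pair[1]);
            if next >= prev {
                next - prev
            } else {
                // The counter went down, so the process restarted and has counted `next` since.
                next
            }
        })
        .sum()
}

/// Average per-second rate over the time window the samples cover.
pub fn per_second_rate(samples: &[u64], window_secs: f64) -> f64 {
    increase(samples) as f64 / window_secs
}
```

For samples `[10, 15, 20, 3, 8]` the increase is 5 + 5 + 3 + 5 = 18: the drop from 20 to 3 is read as a restart, not as a negative change.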

Gauges go up and down. They measure a point-in-time value. Current memory usage, active connections, queue length. You set a gauge to a value, or increment and decrement it. The value represents the state right now. Gauges are great for health checks. If the "active connections" gauge suddenly drops to zero on a busy service, something is probably wrong.
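A gauge fits naturally into an RAII guard: raise it when a resource is acquired and lower it when the guard is dropped, so early returns and unwinding panics can't leave it inflated. The sketch below uses a std `AtomicI64` as a stand-in gauge so it runs without an exporter; with the metrics crate, `open` and `drop` would call `gauge!("active_connections").increment(1.0)` and `.decrement(1.0)` instead. `ConnectionGuard` and `ACTIVE_CONNECTIONS` are hypothetical names.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical stand-in gauge: a signed value that can move in both directions.
pub static ACTIVE_CONNECTIONS: AtomicI64 = AtomicI64::new(0);

/// RAII guard: +1 when a connection opens, -1 when the guard is dropped.
pub struct ConnectionGuard;

impl ConnectionGuard {
    pub fn open() -> Self {
        ACTIVE_CONNECTIONS.fetch_add(1, Ordering::Relaxed);
        ConnectionGuard
    }
}

impl Drop for ConnectionGuard {
    fn drop(&mut self) {
        ACTIVE_CONNECTIONS.fetch_sub(1, Ordering::Relaxed);
    }
}

/// Point-in-time reading: the value a scrape would report right now.
pub fn active_connections() -> i64 {
    ACTIVE_CONNECTIONS.load(Ordering::Relaxed)
}
```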

Histograms measure distributions. You use them for durations and sizes. How long does a request take? What's the 99th percentile? Histograms bucket values into ranges. The backend calculates percentiles from the buckets. If you care about latency, you need a histogram. A counter of "total request time" hides the outliers. A histogram shows you the slow requests that are killing your users.
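To see why a histogram exposes what an average hides, here is a simplified model of Prometheus-style cumulative buckets plus a quantile estimate that interpolates linearly inside the bucket holding the target rank, which is roughly what Prometheus's histogram_quantile does. The function names, bucket bounds, and sample data are made up for illustration.

```rust
/// Cumulative, Prometheus-style bucket counts: entry i counts observations <= bounds[i].
pub fn bucketize(values: &[f64], bounds: &[f64]) -> Vec<u64> {
    bounds
        .iter()
        .map(|&bound| values.iter().filter(|&&v| v <= bound).count() as u64)
        .collect()
}

/// Estimates quantile `q` (0.0..=1.0) from cumulative buckets by linear interpolation
/// inside the bucket containing the target rank. Assumes the first bucket starts at 0.
/// Returns None if there are no observations or the rank lies above the last finite bound.
pub fn estimate_quantile(q: f64, bounds: &[f64], cumulative: &[u64], total: u64) -> Option<f64> {
    if total == 0 {
        return None;
    }
    let rank = q * total as f64;
    let (mut lower_bound, mut lower_count) = (0.0, 0u64);
    for (&upper_bound, &count) in bounds.iter().zip(cumulative) {
        if count as f64 >= rank {
            let in_bucket = (count - lower_count) as f64;
            if in_bucket == 0.0 {
                return Some(upper_bound);
            }
            let fraction = (rank - lower_count as f64) / in_bucket;
            return Some(lower_bound + (upper_bound - lower_bound) * fraction);
        }
        lower_bound = upper_bound;
        lower_count = count;
    }
    None
}
```

With 90 requests at 0.05 s and 10 at 2.0 s, the mean is about 0.245 s and looks healthy, while the estimated p99 lands near 4.6 s. The estimate overshoots the true 2.0 s because the slow requests fall into the wide 1.0 to 5.0 bucket, which is why bucket boundaries matter.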

Minimal setup

Here's the smallest working example. It initializes a Prometheus exporter and records a counter.

use metrics::counter;
use metrics_exporter_prometheus::PrometheusBuilder;

fn main() {
    // Create the Prometheus exporter and install it as the global recorder.
    // This must happen before any metrics macros are called.
    let handle = PrometheusBuilder::new()
        .install_recorder()
        .expect("Failed to install Prometheus recorder");

    // Increment a counter with a label.
    // The macro returns a Counter handle; increment accepts any u64 amount.
    counter!("http_requests_total", "method" => "GET").increment(1);

    // Render the metrics in Prometheus text format.
    // In a real app, you'd serve this string via an HTTP endpoint.
    println!("{}", handle.render());
}

The PrometheusBuilder::new().install_recorder() call does two things. It builds the recorder that aggregates metrics, and it installs that recorder as the process-wide global recorder. It does not start an HTTP server. When you call counter!, the macro looks up the global recorder and hands it the call. Because there is exactly one global recorder and it must be safe to share across threads, any thread can call the macros without passing a recorder around.

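If you forget to install the recorder, the macros fall back to a no-op recorder. The code compiles and runs, but no data is collected, and anything recorded before installation is lost as well. You get silent data loss. Always check the result of install_recorder: it returns an error if another global recorder is already installed.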

Convention aside: Use expect or unwrap on install_recorder in main. If the recorder fails to install, the application shouldn't start. Metrics are often critical for observability. Failing fast is better than running blind.

What happens under the hood

When you call counter!("http_requests_total", "method" => "GET"), the macro builds a key from the metric name and labels and asks the global recorder to register a counter for that key, or to return the existing one. The result is a Counter handle, and increment updates the value behind it. The recorder aggregates the data. For Prometheus, the exporter maintains a set of time series in memory. Each unique combination of name and labels is a separate time series.

The aggregation happens in the exporter. The metrics crate doesn't store data; it just hands each call to the recorder. This keeps the overhead low. Metric names and label values written as string literals are &'static str, so they are borrowed without allocating. The exporter handles the hashing and storage.

Thread safety is built in. The global recorder is shared across threads. The exporter uses internal synchronization to protect its state. You can call counter! from multiple threads concurrently. The increments are atomic. You don't need locks or Mutex wrappers around your metrics calls.

Realistic instrumentation

In a real application, you instrument request handlers and background tasks. Here's how you might measure an HTTP request.

use metrics::{counter, histogram};
use std::time::Instant;

/// Handles an incoming HTTP request and records metrics.
fn handle_request(method: &str, path: &str) {
    // Record the request count immediately.
    // This ensures we count the request even if it fails later.
    // Label values must be &'static str or an owned String, so the
    // borrowed parameters are converted with to_owned().
    counter!("http_requests_total", "method" => method.to_owned(), "path" => path.to_owned())
        .increment(1);

    // Measure the duration of the work.
    let start = Instant::now();

    // Simulate work.
    do_expensive_work();

    // Record the duration in seconds, the base unit Prometheus naming conventions recommend.
    let duration = start.elapsed().as_secs_f64();
    histogram!("http_request_duration_seconds", "method" => method.to_owned()).record(duration);
}

fn do_expensive_work() {
    // Placeholder for actual logic.
}

The counter records the request count. The histogram records the duration. Notice the labels. We include method and path. This lets us slice the data. We can see how many GET requests hit /api/users. We can see the latency distribution for POST requests.
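One weakness of timing with `start` and `elapsed` inline is that an early return or `?` skips the `histogram!` call. A drop guard records the duration however the function exits. The sketch below is generic over a callback so it runs on its own; `DropTimer` is a hypothetical name, and in a handler the callback would be something like `|secs| histogram!("http_request_duration_seconds").record(secs)`.

```rust
use std::time::Instant;

/// Hypothetical guard that runs `on_finish` with the elapsed seconds when it is dropped.
pub struct DropTimer<F: FnMut(f64)> {
    start: Instant,
    on_finish: F,
}

impl<F: FnMut(f64)> DropTimer<F> {
    pub fn new(on_finish: F) -> Self {
        Self { start: Instant::now(), on_finish }
    }
}

impl<F: FnMut(f64)> Drop for DropTimer<F> {
    fn drop(&mut self) {
        // Runs on normal return, early return, and unwinding alike.
        let secs = self.start.elapsed().as_secs_f64();
        (self.on_finish)(secs);
    }
}
```

Bind the guard to a named variable such as `_timer`. Writing `let _ = DropTimer::new(...)` drops it immediately and records a near-zero duration.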

Labels are powerful, but they come with a cost. Each unique label set creates a new time series. If you add a label for user_id, you create a time series for every user. If you have a million users, you create a million time series, multiplied by every other label combination. This is called cardinality explosion. The exporter and the Prometheus server both keep an entry for every active series in memory, so high cardinality eats RAM and slows down queries.

Convention aside: Never put high-cardinality data in labels. Avoid user_id, email, request_id, or ip_address. Use labels for static or low-cardinality dimensions like method, route, status_code, or region. For paths, label with the route template (/api/users/:id) rather than the raw URL, which can embed IDs. If you need to track per-user metrics, use a database or a different tool. Metrics are for aggregates, not identifiers.
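One way to keep a path label bounded is to map raw request paths onto a fixed set of route templates before recording. A minimal sketch with a hypothetical `route_label` function and made-up routes; most web frameworks can give you the matched route template directly, which is preferable when available.

```rust
/// Maps a raw request path onto a small, fixed set of route labels.
pub fn route_label(path: &str) -> &'static str {
    let segments: Vec<&str> = path.trim_matches('/').split('/').collect();
    match segments.as_slice() {
        ["api", "users"] => "/api/users",
        // Numeric IDs collapse into one template instead of one series per user.
        ["api", "users", id] if !id.is_empty() && id.chars().all(|c| c.is_ascii_digit()) => {
            "/api/users/:id"
        }
        // Anything unexpected shares a single catch-all label.
        _ => "other",
    }
}
```

Because it returns `&'static str`, the result can go straight into a label, as in `counter!("http_requests_total", "route" => route_label(path))`, without allocating.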

Pitfalls and compiler errors

Metrics in Rust are generally safe, but there are traps.

If you pass the wrong type to a histogram, the compiler catches you. Histograms expect numeric values. If you try to record a string, you get a trait bound error.

// This fails to compile.
histogram!("duration").record("not a number");

The compiler rejects this with E0277 (trait bound not satisfied). The record method accepts values that implement the metrics crate's IntoF64 trait, such as f64, and &str doesn't implement it. The error points at the argument to record. Fix it by passing a number.

Another pitfall is label cardinality. The compiler can't check cardinality. You have to reason about it. If you add a label for every unique value, you can run both your service and your Prometheus server out of memory. Review your labels carefully. Ask yourself: "How many unique values will this label have?" If the answer is "unbounded," remove the label.

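You might also run into issues with string allocations. The metrics crate stores label values as a SharedString, a copy-on-write string type. A string literal (&'static str) is borrowed without allocating. A value computed at runtime, such as a borrowed &str parameter, has to be passed as an owned String, which means an allocation on every call.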

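// Good: a string literal is &'static str, so it is borrowed without allocating.
counter!("requests", "status" => "200").increment(1);

// Allocates: String::from builds a new String on every call, and the label takes ownership of it.
let status = String::from("200");
counter!("requests", "status" => status).increment(1);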

The second example works, but it allocates memory for every call. In a hot path, this adds up. Use string literals for fixed values, and when a runtime value comes from a small, known set, map it to a &'static str first.
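For example, a numeric HTTP status can be turned into a static label without ever building a String. A minimal sketch with a hypothetical `status_class` helper; grouping by class also keeps the label's cardinality to a handful of values.

```rust
/// Converts a numeric HTTP status into a static, low-cardinality label.
pub fn status_class(status: u16) -> &'static str {
    match status {
        100..=199 => "1xx",
        200..=299 => "2xx",
        300..=399 => "3xx",
        400..=499 => "4xx",
        500..=599 => "5xx",
        _ => "other",
    }
}
```

Usage would look like `counter!("http_requests_total", "status" => status_class(code)).increment(1)`, where the label borrows a literal instead of allocating.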

Convention aside: Use metrics::describe_counter! and similar macros at startup, after installing the recorder, to add help text and unit information. The Prometheus exporter turns each description into the metric's # HELP line, which makes your metrics self-documenting. It's a small step that pays off when you're debugging at 3 AM.

Serving the metrics

The PrometheusHandle gives you a render() method that returns a string. install_recorder doesn't start a server; it only aggregates and formats the data, so you're responsible for exposing that string at /metrics. (PrometheusBuilder also has an install method that runs its own HTTP listener; with install_recorder, serving is up to you.)

In a web framework like axum or actix-web, you create a handler that calls handle.render() and returns the result with the correct content type. The content type should be text/plain; version=0.0.4; charset=utf-8. Prometheus scrapers expect this format.

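use actix_web::{get, web, App, HttpResponse, HttpServer, Responder};
use metrics_exporter_prometheus::{PrometheusBuilder, PrometheusHandle};

#[get("/metrics")]
async fn metrics_endpoint(handle: web::Data<PrometheusHandle>) -> impl Responder {
    // Render the metrics and return them with the Prometheus text-format content type.
    HttpResponse::Ok()
        .content_type("text/plain; version=0.0.4; charset=utf-8")
        .body(handle.render())
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Install once, then share the handle with every worker via app data (example address).
    let handle = PrometheusBuilder::new().install_recorder().expect("failed to install recorder");
    let data = web::Data::new(handle);
    HttpServer::new(move || App::new().app_data(data.clone()).service(metrics_endpoint))
        .bind(("127.0.0.1", 8080))?.run().await
}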

You share the PrometheusHandle through app data. The handler receives it as web::Data and renders the metrics on every scrape. The work per scrape is small: the exporter has already aggregated the data, and render formats it as text.

When to use what

The Rust metrics ecosystem has options. Pick the right tool for your needs.

Use the metrics crate when you want a unified API that can swap backends. You can pick metrics-exporter-prometheus, metrics-exporter-statsd, or a custom exporter without changing your instrumentation code. This is a sensible default for most applications.

Use the prometheus crate directly when you need Prometheus-specific features that the abstraction layer doesn't expose, like custom collectors or direct control over registries. You trade portability for full access to the Prometheus client model. This is rare. The metrics crate covers most use cases.

Use opentelemetry when you need to correlate metrics with traces and logs. OpenTelemetry gives you a single standard for all telemetry signals, making it easier to send data to backends that support the OTLP protocol. If you're already using OpenTelemetry for tracing, adding metrics is straightforward.

Use raw counters in your own struct when a hot path can't afford even the per-call recorder lookup and you want plain atomics you control, publishing their totals periodically. This is rare. The metrics crate is usually fast enough. Only go this route if profiling shows the recorder is the bottleneck.
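A minimal sketch of that route, assuming a hypothetical `HotPathStats` struct: plain `AtomicU64` fields updated with relaxed ordering, plus a snapshot that a background task could publish periodically, for example through a counter handle's `absolute` method.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical hand-rolled counters for a hot path, updated without any recorder lookup.
#[derive(Default)]
pub struct HotPathStats {
    packets: AtomicU64,
    bytes: AtomicU64,
}

impl HotPathStats {
    /// Records one packet; Relaxed is enough because each counter is an independent total.
    pub fn record_packet(&self, len: u64) {
        self.packets.fetch_add(1, Ordering::Relaxed);
        self.bytes.fetch_add(len, Ordering::Relaxed);
    }

    /// Reads both totals for publishing. The two loads are not one atomic snapshot,
    /// which is acceptable for monotonically increasing counters.
    pub fn snapshot(&self) -> (u64, u64) {
        (
            self.packets.load(Ordering::Relaxed),
            self.bytes.load(Ordering::Relaxed),
        )
    }
}
```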

Labels let you slice your data. Use them to group, not to identify. High cardinality kills Prometheus.

Where to go next

Adding metrics to a Rust application gives you a /metrics endpoint that counts events such as requests and errors and tracks how long they take. Prometheus scrapes that endpoint on a schedule, and a dashboard built on top of Prometheus turns the numbers into graphs you can check to verify your app's health. Reach for this approach when you need to monitor software performance in near real time.