How to Measure and Reduce Memory Usage in Rust Programs

Compile Rust programs with cargo build --release to optimize memory usage and performance.

The memory mystery in Rust

You ported a data processing script from Python to Rust. The CPU usage dropped by half, which felt great. Then you checked the system monitor. The Rust process is eating 400 megabytes of RAM for a dataset that fits in 5 megabytes. Something is wrong. Rust promises zero-cost abstractions and fine-grained control over memory, so why is your program hoarding bytes like a dragon?

The answer usually hides in two places. First, you are likely measuring a debug build, which includes overhead that distorts memory usage. Second, Rust gives you the tools to avoid allocations, but it doesn't force you to use them. Coming from Python or JavaScript, you are used to the runtime managing memory for you. In Rust, you make the choices. If you choose the easy path, the memory usage climbs.

Debug mode is lying to you

Rust's default build mode prioritizes fast compilation and easy debugging. It disables the optimizations that shrink memory footprint and speed up execution. The compiler leaves in debug assertions, integer overflow checks, and unoptimized code paths. A Vec requests the same heap capacity in either mode, but unoptimized code makes copies and keeps temporaries alive that an optimized build would eliminate. Functions aren't inlined, so stack frames are larger and the call stack grows deeper.

Build your program with cargo build --release to see the real numbers. This command enables LLVM optimizations that inline functions, unroll loops, and eliminate dead code. The resulting binary typically uses less memory and runs faster. Never judge your memory usage based on a debug build. The overhead can be massive.

cargo build --release

The release binary lives in target/release. Run that binary to get accurate metrics. If your memory usage is still high after switching to release mode, the problem is in your code, not the compiler.

Release mode is the baseline. Never optimize for a build configuration you won't ship.
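One way to guard against measuring the wrong build is to have the program report how it was compiled. The sketch below relies on cfg!(debug_assertions), which Cargo enables by default in the dev profile and disables in the release profile. A custom profile can change that, so treat the check as a heuristic. The function name build_kind is hypothetical.

```rust
/// Reports which kind of build this is, based on debug assertions.
/// Assumption: the profiles use Cargo's defaults, where only `dev`
/// enables debug assertions.
fn build_kind() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    if build_kind() == "debug" {
        eprintln!("warning: debug build; memory numbers will be misleading");
    }
    println!("build kind: {}", build_kind());
}
```

Printing this at startup of a benchmark run makes it harder to compare numbers from different build modes by accident.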

Measuring allocations

You can't reduce what you don't measure. Guessing where memory goes leads to wasted effort. Rust provides several tools to track allocations. The choice depends on your platform and how much detail you need.

The dhat crate is the easiest way to start. It instruments your allocator and prints a summary when the program exits. Add dhat to your dependencies and set it as the global allocator.

// Add dhat to Cargo.toml dependencies.
// This crate replaces the default allocator to track allocations.
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    // Start the profiler. It captures heap allocations.
    // The summary prints automatically when the program exits.
    let _profiler = dhat::Profiler::new_heap();

    // Your code runs here.
    let data = vec![0u8; 1000000];
    println!("Allocated {} bytes", data.len());
}

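When you run this, dhat prints a short summary to stderr: total bytes and blocks allocated, the heap size at its peak, and the heap size at exit. It also writes a dhat-heap.json file. Load that file into the DHAT viewer to see the allocation sites and the call stacks that requested the memory.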
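To get a feel for what an instrumenting allocator does, here is a minimal sketch that uses only the standard library. It is not how dhat is implemented; it only illustrates the principle of wrapping the system allocator. The names CountingAlloc, BYTES_REQUESTED, ALLOC_CALLS, and bytes_requested are hypothetical, and the counter is cumulative, so it does not report the live heap size.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts requests.
struct CountingAlloc;

static BYTES_REQUESTED: AtomicUsize = AtomicUsize::new(0);
static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        BYTES_REQUESTED.fetch_add(layout.size(), Ordering::Relaxed);
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        // Forward the actual work to the system allocator.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Total bytes requested so far (cumulative, not the live heap size).
fn bytes_requested() -> usize {
    BYTES_REQUESTED.load(Ordering::Relaxed)
}

fn main() {
    let before = bytes_requested();
    let data = vec![0u8; 1_000_000];
    // black_box keeps the optimizer from removing the allocation.
    std::hint::black_box(&data);
    let after = bytes_requested();
    println!("requested {} bytes in between", after - before);
    println!("total alloc calls: {}", ALLOC_CALLS.load(Ordering::Relaxed));
}
```

A real profiler also records a backtrace per allocation, which is what lets it attribute bytes to source lines.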

For deeper analysis, use heaptrack on Linux. It records every allocation and deallocation with timestamps. You can visualize the data in heaptrack_gui to see memory usage over time, or get a text summary from heaptrack_print. This helps identify leaks and spikes. On macOS, the leaks command-line tool or Instruments serve a similar purpose.

Measure before you optimize. Profilers show you the truth; intuition often lies.

The String trap

The most common source of memory bloat in Rust is String. In Python, strings are immutable and often interned. In Rust, String is a growable, heap-allocated buffer. Every String owns its data. If you have a struct with ten string fields, you have ten separate heap allocations.

struct User {
    // Each String allocates memory on the heap.
    // This duplicates data if the source is already in memory.
    name: String,
    email: String,
    bio: String,
}

If you parse a JSON file or read a config, the data already lives in memory. Creating String fields copies that data into new allocations. You end up with the original buffer plus a copy of every field, so the text is held in memory twice, on top of the allocator's bookkeeping for each small allocation.
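The cost of owned fields can be checked directly. The sketch below is illustrative only: the helper heap_bytes and the sample record are hypothetical. It prints the size of the String handle and the heap capacity the three copies reserve.

```rust
use std::mem::size_of;

struct User {
    name: String,
    email: String,
    bio: String,
}

/// Heap bytes reserved by the three fields (capacity, not just length).
fn heap_bytes(user: &User) -> usize {
    user.name.capacity() + user.email.capacity() + user.bio.capacity()
}

fn main() {
    // Hypothetical source record that is already in memory.
    let source = "John Doe,john@example.com,Writes Rust";
    let mut fields = source.split(',');
    let user = User {
        name: fields.next().unwrap().to_string(),
        email: fields.next().unwrap().to_string(),
        bio: fields.next().unwrap().to_string(),
    };
    println!("handle size: {} bytes per String", size_of::<String>());
    println!("struct size: {} bytes", size_of::<User>());
    println!(
        "heap copies: {} bytes for {} source bytes",
        heap_bytes(&user),
        source.len()
    );
}
```

Each String handle is three machine words (pointer, capacity, length), and that is before counting the copied bytes on the heap.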

The fix is to borrow data instead of owning it. Use &str with lifetimes. A &str is a slice: a pointer and a length. It points to existing data without allocating.

struct User<'a> {
    // &str borrows data. No heap allocation occurs.
    // The struct cannot outlive the source text.
    name: &'a str,
    email: &'a str,
    bio: &'a str,
}

The struct now holds three pointer-and-length pairs and allocates nothing for the text itself. The tradeoff is lifetime management. The User cannot outlive the source string. If you try to return a User that references a local variable, the compiler rejects it with E0515 (cannot return value referencing local variable).

fn parse_user<'a>() -> User<'a> {
    let text = String::from("John Doe");
    // Error E0515: cannot return value referencing local variable `text`.
    // The returned User borrows from `text`,
    // but `text` is dropped at the end of the function.
    User { name: &text, email: "", bio: "" }
}

The compiler protects you from dangling pointers. If you need to return data that outlives the source, you must own it. Use String when you need to modify the text or when the data must live longer than the source. Use &str when you only read and the source lives longer.
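For contrast, here is a sketch of a version that compiles: the caller owns the source text and the parser only hands back slices of it. The comma-separated input format and the Option return type are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
struct User<'a> {
    name: &'a str,
    email: &'a str,
    bio: &'a str,
}

/// Splits a comma-separated line into a User that borrows from `line`.
/// Returns None if fewer than three fields are present.
fn parse_user(line: &str) -> Option<User<'_>> {
    let mut fields = line.splitn(3, ',');
    Some(User {
        name: fields.next()?.trim(),
        email: fields.next()?.trim(),
        bio: fields.next()?.trim(),
    })
}

fn main() {
    let source = String::from("John Doe, john@example.com, Writes Rust");
    let user = parse_user(&source).expect("three fields");
    // `user` holds slices of `source`; nothing was copied.
    println!("{} <{}>: {}", user.name, user.email, user.bio);
}
```

Because the returned User borrows from the argument, the compiler ties its lifetime to source, and the caller decides how long the text stays alive.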

Borrow when you can. Own when you must. Every allocation has a cost.

Vectors and capacity

Vectors grow dynamically. When you push an element and the vector is full, it requests a larger buffer, copies the data, and frees the old buffer, unless the allocator can extend the buffer in place. The growth strategy is not specified, but the current standard library doubles the capacity. This means a Vec<i32> that ends up with 10,000 elements will typically have requested buffers of 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, and 16384 elements. The total memory touched is roughly twice the final capacity, and the final buffer still has room for 6,384 elements that are never used.

This wastes memory during growth and consumes CPU cycles for copying. If you know the size upfront, reserve the capacity.

fn build_list() -> Vec<i32> {
    // Vec::new starts with zero capacity.
    // Pushing triggers reallocations as the vector grows.
    let mut list = Vec::new();
    for i in 0..10000 {
        list.push(i);
    }
    list
}

fn build_list_optimized() -> Vec<i32> {
    // Reserve capacity upfront.
    // No reallocations occur during the loop.
    let mut list = Vec::with_capacity(10000);
    for i in 0..10000 {
        list.push(i);
    }
    list
}

The optimized version allocates once, with room for the 10,000 elements it needs. If you read data from a file or database and know the count, use with_capacity. If the size is unknown, Vec::new is fine. The overhead of reallocation is usually acceptable for small vectors.
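You can observe the difference by watching the capacity while pushing. The helper below is a hypothetical sketch; the exact number of capacity changes without reserving depends on the standard library's growth policy, so the code reports it rather than assuming a value.

```rust
/// Pushes `n` integers and counts how often the capacity changed.
fn count_capacity_changes(n: usize, reserve: bool) -> usize {
    let mut list: Vec<usize> = if reserve {
        Vec::with_capacity(n)
    } else {
        Vec::new()
    };
    let mut last = list.capacity();
    let mut changes = 0;
    for i in 0..n {
        list.push(i);
        if list.capacity() != last {
            // The vector had to grow its buffer.
            changes += 1;
            last = list.capacity();
        }
    }
    changes
}

fn main() {
    println!("Vec::new: {} capacity changes", count_capacity_changes(10_000, false));
    println!("with_capacity: {} capacity changes", count_capacity_changes(10_000, true));
}
```

With the capacity reserved upfront, the count is zero, because with_capacity guarantees room for at least the requested number of elements.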

After filling a vector, you might have excess capacity. If you pass the vector to another system or serialize it, the extra capacity stays allocated. Call shrink_to_fit to return the excess memory to the allocator.

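let mut data: Vec<u32> = Vec::with_capacity(10000);
// Fill data with 100 elements; 9,900 slots stay unused.
data.extend(0..100);
data.shrink_to_fit();
// Capacity is now close to 100 (never below the length).
// The allocator may keep a little slack; the rest is released.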

Don't fight the compiler here. Reach for with_capacity when you know the size.

Smart pointers and sharing

Sometimes you need multiple owners. Rc<T> and Arc<T> provide reference counting. They share a heap allocation and track how many pointers exist. When the count drops to zero, the data is freed.

Rc<T> is for single-threaded code. Arc<T> is for multi-threaded code and uses atomic operations, which add overhead. If you don't need thread safety, Rc<T> is lighter.

use std::rc::Rc;

fn main() {
    // Rc::new puts the value on the heap.
    // The reference count starts at one.
    let data = Rc::new(vec![1, 2, 3]);

    // Clone the pointer, not the data.
    // The reference count increments to two.
    let data2 = Rc::clone(&data);
}

Convention aside: write Rc::clone(&data) instead of data.clone(). Both compile and work identically. The explicit form signals to readers that you are cloning the pointer, not the underlying data. data.clone() looks like a deep clone but isn't. The community prefers the explicit form to avoid confusion.
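The reference count can be inspected with Rc::strong_count, and Rc::ptr_eq confirms that two handles share one allocation. The function observe_counts is a hypothetical wrapper used here only to make the sequence easy to follow.

```rust
use std::rc::Rc;

/// Returns the strong count before cloning, after cloning, and after
/// dropping the second handle.
fn observe_counts() -> (usize, usize, usize) {
    let data = Rc::new(vec![1, 2, 3]);
    let before = Rc::strong_count(&data);

    let data2 = Rc::clone(&data);
    // Both handles point at the same allocation; the Vec was not copied.
    assert!(Rc::ptr_eq(&data, &data2));
    let shared = Rc::strong_count(&data);

    drop(data2);
    let after = Rc::strong_count(&data);
    (before, shared, after)
}

fn main() {
    // Prints (1, 2, 1).
    println!("{:?}", observe_counts());
}
```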

Smart pointers add overhead. Each Rc or Arc handle is a single pointer, but the heap allocation it points to stores a strong count and a weak count next to the value. That is two extra machine words per allocation, plus the allocation itself. If you have millions of small objects wrapped in Rc, the counters and allocator overhead can dominate. Consider using indices into a central arena or Vec if you need to share many small values.
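A minimal sketch of the index-based alternative follows. The Arena and Id types are hypothetical and deliberately bare: values live in one Vec, and a 4-byte index stands in for a pointer. The tradeoff is that items are freed only when the whole arena is dropped, and nothing stops an Id from being used with the wrong arena.

```rust
/// Hypothetical 4-byte handle into an arena.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Id(u32);

/// Hypothetical arena: all values live in one contiguous Vec.
struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }

    /// Stores a value and returns its index as a handle.
    fn insert(&mut self, value: T) -> Id {
        assert!(self.items.len() < u32::MAX as usize, "arena is full");
        let id = Id(self.items.len() as u32);
        self.items.push(value);
        id
    }

    /// Looks up a value; panics if the Id came from another arena
    /// and is out of range.
    fn get(&self, id: Id) -> &T {
        &self.items[id.0 as usize]
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.insert("alpha");
    let b = arena.insert("beta");
    let also_a = a; // Copying an Id touches no reference count.
    println!("{} {} {}", arena.get(a), arena.get(b), arena.get(also_a));
    println!(
        "Id: {} bytes, Rc<u32> handle: {} bytes",
        std::mem::size_of::<Id>(),
        std::mem::size_of::<std::rc::Rc<u32>>()
    );
}
```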

Counter-intuitive but true: the more you use smart pointers, the harder it becomes to track where data lives. Prefer references when possible.
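When threads do need to share data, the Arc case looks like the sketch below. The function parallel_sum and its striding scheme are hypothetical; the point is that each thread receives a cloned handle while the vector itself exists once.

```rust
use std::sync::Arc;
use std::thread;

/// Sums `data` on `workers` threads that all read one shared allocation.
/// Assumes `workers` is at least 1.
fn parallel_sum(data: Vec<u64>, workers: usize) -> u64 {
    assert!(workers > 0, "need at least one worker");
    let shared = Arc::new(data);
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            // Clone the handle, not the vector; this bumps an atomic counter.
            let shared = Arc::clone(&shared);
            // Worker `w` sums elements w, w + workers, w + 2 * workers, ...
            thread::spawn(move || shared.iter().skip(w).step_by(workers).sum::<u64>())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .sum()
}

fn main() {
    let total = parallel_sum((1..=100).collect(), 4);
    println!("total = {}", total);
}
```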

Clone on Write

Sometimes you want to borrow data, but you might need to modify it later. Cloning upfront wastes memory if you never modify. Not cloning upfront forces a clone when you do modify, which might happen at an inconvenient time.

Cow<str> (Clone on Write) solves this. It is an enum that holds either borrowed data or owned data. You can treat it like a string. When you need to modify it, calling to_mut clones the borrowed data into an owned value first; if the data is already owned, no clone happens.

use std::borrow::Cow;

fn process(text: &str) -> Cow<'_, str> {
    if text.contains("bad") {
        // Allocation happens only when we need to modify.
        // The Cow takes ownership of the new string.
        Cow::Owned(text.replace("bad", "good"))
    } else {
        // No allocation. We borrow the original slice.
        // The Cow holds a reference to the input.
        Cow::Borrowed(text)
    }
}

This pattern is common in parsers and text processors. Most inputs don't need modification, so you save allocations. The few that do get cloned on demand.
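The same idea can be written with Cow::to_mut, which performs the clone lazily. The function with_trailing_newline is a hypothetical example, not part of any library.

```rust
use std::borrow::Cow;

/// Ensures the text ends with a newline, allocating only if it must.
fn with_trailing_newline(text: &str) -> Cow<'_, str> {
    let mut out = Cow::Borrowed(text);
    if !text.ends_with('\n') {
        // to_mut clones the borrowed slice into a String on first use.
        out.to_mut().push('\n');
    }
    out
}

fn main() {
    for input in ["done\n", "pending"] {
        match with_trailing_newline(input) {
            Cow::Borrowed(s) => println!("borrowed: {:?}", s),
            Cow::Owned(s) => println!("owned: {:?}", s),
        }
    }
}
```

Callers that only read the result can use it as a &str through deref and never need to know which variant they received.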

Decision matrix

Use cargo build --release when you measure memory or performance for production. Debug builds include overhead that distorts results.

Use Vec::with_capacity when you know the approximate size before filling the vector. This avoids repeated reallocations while the vector fills.

Use &str when you only read text and the source data lives longer than the reference. This avoids heap allocations entirely.

Use String when you need to modify text or own the data independently of its source.

Use Cow<str> when you sometimes borrow and sometimes modify, and you want to avoid cloning until the modification happens.

Use heaptrack or dhat when you need to identify which lines of code are allocating the most memory.

Use shrink_to_fit when a vector has grown large and you are done adding elements, but you want to return the excess memory to the allocator.

Use Rc<T> when you need shared ownership in a single-threaded context. It has less overhead than Arc<T>.

Use Arc<T> when multiple threads need to share data. It provides thread-safe reference counting.

Trust the borrow checker. It usually has a point. If it rejects a borrow, the lifetime is too short.

Where to go next