How to Use cargo-bloat to Find What's Contributing to Binary Size

Use `cargo-bloat` to analyze your binary's size by running it as a wrapper around `cargo build`, which generates a report listing functions and crates sorted by the space they occupy in the executable.

When your binary weighs more than your code

You just finished a command-line tool. It parses arguments, fetches data, and formats output. You run cargo build --release and check the file size. Twelve megabytes. Your JavaScript equivalent bundles to three hundred kilobytes. You did not write twelve megabytes of source code. The compiler did.

Rust trades runtime overhead for compile-time work. Every generic function gets expanded for every type you use it with. Every dependency you pull in brings its own compiled machine code. The linker stitches it all together, and suddenly your tiny script looks like a desktop application. You need to know what is actually inside that file before you start guessing.

cargo-bloat acts as a magnifying glass for your compiled output. It runs a build, reads the resulting binary, and maps out exactly which functions and crates are consuming space. You stop guessing and start targeting.

How the tool reads your binary

Compiled Rust code lives in an object file format like ELF on Linux or Mach-O on macOS. These formats contain a symbol table: a directory of every function, global variable, and data section the linker decided to keep. Each entry has a starting address and a length. Add up the lengths, and you get the size.

cargo-bloat wraps cargo build. It compiles your project, then parses that symbol table. It groups symbols by crate, calculates percentages, and demangles the compiler's internal names into something readable. The output is a ranked list. The biggest offenders sit at the top.
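
The grouping step can be sketched in a few lines of Rust. This is a minimal illustration, not cargo-bloat's actual implementation: the Symbol struct, the sample names and sizes, and the rule that the crate name is whatever precedes the first :: are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one demangled symbol-table entry.
struct Symbol {
    name: &'static str,
    size: u64, // length of the symbol in bytes
}

/// Sum symbol sizes per crate and return (crate, bytes, percent), largest first.
/// Assumption: the crate name is the text before the first `::`.
fn size_by_crate(symbols: &[Symbol]) -> Vec<(String, u64, f64)> {
    let total: u64 = symbols.iter().map(|s| s.size).sum();
    let mut per_crate: BTreeMap<&str, u64> = BTreeMap::new();
    for sym in symbols {
        let krate = sym.name.split("::").next().unwrap_or(sym.name);
        *per_crate.entry(krate).or_insert(0) += sym.size;
    }
    let mut rows: Vec<(String, u64, f64)> = per_crate
        .into_iter()
        .map(|(krate, bytes)| {
            let pct = if total == 0 {
                0.0
            } else {
                bytes as f64 * 100.0 / total as f64
            };
            (krate.to_string(), bytes, pct)
        })
        .collect();
    // Biggest contributors first, like the ranked report.
    rows.sort_by(|a, b| b.1.cmp(&a.1));
    rows
}

/// Made-up symbols and sizes, purely for illustration.
fn sample_symbols() -> Vec<Symbol> {
    vec![
        Symbol { name: "regex::compile", size: 600 },
        Symbol { name: "std::fmt::write", size: 300 },
        Symbol { name: "regex::exec", size: 400 },
        Symbol { name: "my_tool::main", size: 200 },
        Symbol { name: "std::io::stdout", size: 500 },
    ]
}

fn main() {
    for (krate, bytes, pct) in size_by_crate(&sample_symbols()) {
        println!("{pct:5.1}% {bytes:>8} B  {krate}");
    }
}
```

A real tool also has to parse the binary format and demangle names before it can group anything. The sketch starts after those steps.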

Install it once and keep it in your toolchain:

cargo install cargo-bloat

Run it against your project to see the default breakdown:

# Analyze the optimized build, not the debug build
# Group results by crate instead of by function
cargo bloat --release --crates

The --release flag is essential for meaningful results. Debug builds contain unoptimized machine code with little inlining or dead-code removal, so their function sizes say little about what you actually ship. --crates groups the output by crate instead of by function. Without it, cargo-bloat lists individual functions, showing the 20 largest by default.

Trust the symbol table. It shows you exactly what the linker kept.

Tracking down generic bloat

Rust's generics are zero-cost abstractions. The cost is paid at compile time through monomorphization. If you write a function that works with T, and you call it with String, Vec<u8>, and Option<i32>, the compiler generates three separate copies of that function in the final binary. Each copy has its own symbol. Each copy takes up space.

When cargo-bloat shows dozens of nearly identical function names with different type parameters, you are looking at monomorphization bloat. The compiler did exactly what you asked. It specialized the code for maximum speed. Speed sometimes costs size.

Filter the output to see the worst offenders:

# Omit --crates to list individual functions from every crate
# -n limits the output to the fifteen largest entries
cargo bloat --release -n 15

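Scan the names. If you see my_crate::process::<std::string::String> and my_crate::process::<alloc::vec::Vec<u8>> sitting side by side, you have duplication. Depending on the symbol mangling scheme, the type parameters may not be spelled out, and the copies show up as the same name repeated; --full-fn prints the hash suffix that tells them apart. You can reduce the duplication by taking a trait object instead of a generic parameter. A function that accepts &dyn Processor or Box<dyn Processor> is compiled once and dispatches through a vtable instead of being copied per type. You trade a tiny runtime indirection for a smaller binary.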

Here is how the bloat manifests in code and how to constrain it:

// Generic version: compiler generates a separate copy for every T
fn process_generic<T: std::fmt::Display>(item: T) {
    // Each concrete T used at a call site gets its own instantiation
    // The Display bound is checked at compile time; calls are statically dispatched
    println!("Processing: {}", item);
}

// Trait object version: single compiled function with runtime dispatch
fn process_dynamic(item: &dyn std::fmt::Display) {
    // The compiler generates exactly one function body
    // Runtime uses a vtable pointer to resolve the Display method
    println!("Processing: {}", item);
}

fn main() {
    // Calling the generic version with three types instantiates it three times
    // (Vec<T> does not implement Display, so a float is the third type here)
    process_generic(42);
    process_generic("hello");
    process_generic(2.5);

    // Calling the dynamic version reuses the same compiled function
    process_dynamic(&42);
    process_dynamic(&"hello");
    process_dynamic(&2.5);
}

If you switch to trait objects and the compiler complains about missing trait bounds, you will see E0277 (trait bound not satisfied). That error means the type you passed does not implement the trait you requested. Fix the bound, not the binary size. Measure the impact first. Sometimes the compiler's dead code elimination already dropped the unused copies.
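
The Box<dyn Processor> form mentioned earlier can be sketched as follows. The Processor trait and the Upper and Reverse types are hypothetical names invented for this example; the point is only that run_all is compiled once, however many implementors exist.

```rust
/// Hypothetical trait standing in for whatever behavior the generic code needs.
trait Processor {
    fn process(&self, input: &str) -> String;
}

struct Upper;
struct Reverse;

impl Processor for Upper {
    fn process(&self, input: &str) -> String {
        input.to_uppercase()
    }
}

impl Processor for Reverse {
    fn process(&self, input: &str) -> String {
        input.chars().rev().collect()
    }
}

/// Not generic, so there is a single compiled body.
/// Each step is reached through its vtable at runtime.
fn run_all(steps: &[Box<dyn Processor>], input: &str) -> String {
    let mut current = input.to_string();
    for step in steps {
        current = step.process(&current);
    }
    current
}

fn main() {
    let steps: Vec<Box<dyn Processor>> = vec![Box::new(Upper), Box::new(Reverse)];
    println!("{}", run_all(&steps, "bloat"));
}
```

Each Box adds a heap allocation and each call goes through a pointer, so this is a trade worth measuring with cargo bloat before and after rather than applying everywhere.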

Stop expanding generics blindly. Box the heavy ones.

Real-world dependency trimming

Heavy dependencies rarely pull in their entire codebase by accident. They pull in features. The regex crate, for example, enables Unicode support and its performance optimizations by default, and its documentation notes that turning off features you do not need reduces binary size. serde works differently: its data formats live in separate crates such as serde_json, so a format you never add as a dependency never reaches your binary, and derive is an opt-in feature.

Check your Cargo.toml for feature flags. Most well-maintained crates split functionality behind optional features. Disable what you do not use. Run cargo bloat again. Watch the percentage drop.

[dependencies]
# "derive" is opt-in; listing it adds to serde's defaults rather than replacing them
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
# default-features = false switches off the default set; re-enable only what you need
regex = { version = "1", default-features = false, features = ["std"] }

Note that listing features does not turn the defaults off. The default feature set is often the broadest one a crate offers, and it stays enabled until you write default-features = false. After that, list only what you need, such as std or derive. With regex, check that your patterns still compile afterwards, since some syntax depends on the Unicode features.
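
Seen from inside a dependency, a feature is usually just conditional compilation. The sketch below is a hypothetical crate layout, not code from any real library: the json and yaml modules, the toy field counters, and the yaml feature name are assumptions. In a real crate the feature would also be declared under [features] in that crate's Cargo.toml.

```rust
mod json {
    /// Toy "parser" that counts `:` separators. Illustrative only.
    pub fn count_fields(input: &str) -> usize {
        input.matches(':').count()
    }
}

// Compiled only when the hypothetical `yaml` feature is enabled.
// With the feature off, none of this module reaches the binary.
#[cfg(feature = "yaml")]
mod yaml {
    /// Toy "parser" that counts lines containing a `:`. Illustrative only.
    pub fn count_fields(input: &str) -> usize {
        input.lines().filter(|line| line.contains(':')).count()
    }
}

/// Dispatch on a format name; formats that were not compiled in are errors.
pub fn count_fields(format: &str, input: &str) -> Result<usize, String> {
    match format {
        "json" => Ok(json::count_fields(input)),
        #[cfg(feature = "yaml")]
        "yaml" => Ok(yaml::count_fields(input)),
        other => Err(format!("support for `{other}` was not compiled in")),
    }
}

fn main() {
    println!("{:?}", count_fields("json", r#"{"a":1,"b":2}"#));
    println!("{:?}", count_fields("yaml", "a: 1"));
}
```

Built without the feature, the yaml module and its match arm do not exist as far as the compiler is concerned, which is why disabling a feature can remove whole rows from a cargo bloat report.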

Convention aside: write Rc::clone(&data) instead of data.clone() when working with reference counting. The explicit form signals to readers that you are cloning a pointer, not the underlying data. The same clarity applies to dependency features. List them explicitly. Do not rely on defaults.

Audit your Cargo.toml like you audit your source code. Unused features are dead weight.

When the symbol table lies

cargo-bloat reads the symbol table, and the symbol table only describes what survived optimization and linking. If you enable Link Time Optimization (LTO), the compiler inlines across crate boundaries, merges identical functions, and sometimes removes symbols entirely. Inlined code is counted under its caller, so a crate can look smaller in the report while its code still sits in the binary under another name. Read per-crate numbers from an LTO build with that in mind.
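
The attribution effect is easy to reproduce on a small scale. In this sketch the two checksum functions are hypothetical and do identical work; only the inlining hint differs. The attributes are hints to the compiler, so treat the described outcome as the usual case rather than a guarantee.

```rust
/// Asked to stay a separate function, so it normally keeps its own symbol
/// and appears as its own row in a size report.
#[inline(never)]
fn checksum_outlined(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

/// Asked to be copied into each caller, so its bytes are normally
/// attributed to whichever function called it.
#[inline(always)]
fn checksum_inlined(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

fn main() {
    let data = b"cargo-bloat";
    println!("outlined: {}", checksum_outlined(data));
    println!("inlined:  {}", checksum_inlined(data));
}
```

Running cargo bloat --release on a project like this would typically show a row for checksum_outlined and none for checksum_inlined, even though both contribute machine code.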

You might also see garbled names like _ZN4core3ptr87drop_in_place... instead of readable function signatures. cargo-bloat demangles symbols itself rather than calling an external tool, so installing LLVM will not change its output. Mangled names in the report are more likely symbols its demangler does not recognize, such as ones from non-Rust code or from a mangling scheme newer than your installed version. Updating with cargo install cargo-bloat --force is a reasonable first step.

Do not confuse binary size with runtime memory usage. cargo-bloat measures the executable file on disk. It does not measure heap allocations, stack frames, or memory mapped files. A small binary can still leak memory. A large binary can run efficiently. They are different problems.
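
A small sketch makes the distinction concrete. It reads the size of the running executable from disk and then allocates a buffer at runtime. The 16 MiB figure is arbitrary, and executable_size is a helper name made up for this example.

```rust
use std::{env, fs, io};

/// Size of the currently running executable on disk, in bytes.
fn executable_size() -> io::Result<u64> {
    let path = env::current_exe()?;
    Ok(fs::metadata(path)?.len())
}

fn main() -> io::Result<()> {
    let on_disk = executable_size()?;

    // Allocated at runtime; none of these bytes are stored in the executable file.
    let heap = vec![0u8; 16 * 1024 * 1024];

    println!("executable on disk:     {on_disk} bytes");
    println!("heap buffer at runtime: {} bytes", heap.len());
    Ok(())
}
```

cargo-bloat only explains the first number. The second one needs a memory profiler.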

Run cargo bloat before enabling LTO. Compare the reports. Optimize the source first.

Stripping vs analyzing

Running strip on a release binary removes debug symbols, DWARF information, and sometimes the symbol table itself. The file shrinks. The program still runs. This is a blunt instrument. It hides the problem rather than solving it.

Use strip when you are deploying to embedded systems with strict flash limits and you have already optimized the source code. Use cargo-bloat when you want to understand why the binary is large in the first place. Stripping a bloated binary just gives you a smaller bloated binary.

Convention aside: keep unsafe blocks as small as possible, a habit often described as minimizing the unsafe surface. The same discipline applies to binary size. Keep your dependency surface small. Keep your generic surface small. Every byte you add is a byte you have to justify.

Do not strip your way out of a design problem. Fix the design.

Decision matrix

Use cargo-bloat when you need to identify which functions or dependencies are inflating your release binary.

Use cargo-udeps when you suspect your Cargo.toml lists dependencies that your code never uses.

Use strip when you have already optimized your codebase and need to remove debug metadata for deployment.

Use trait objects or Box<dyn Trait> when monomorphization creates dozens of near-identical function copies that dominate the size report.

Use explicit feature flags in Cargo.toml, together with default-features = false, when a dependency pulls in parsers, formats, or compatibility layers you do not need.

Reach for cargo bloat --release --split-std when you want the standard library broken out into core, alloc, and its other component crates, or --filter with a crate name when you want the report restricted to one crate. When you are building for bare metal or microcontrollers, pass the same --target triple you build with.

Where to go next