How to Hash Data in Rust (SHA-256, Blake3, etc.)

Hash data in Rust using the sha2 crate for SHA-256 or blake3 crate for Blake3 by initializing a hasher, updating it with data, and finalizing the result.

When you need a fingerprint, not a summary

You download a Linux kernel image from a mirror. The website lists a SHA-256 checksum next to the download link. You run a tool, get a string of hex characters, and compare them. They match. You trust the file. If they don't match, you delete the file and look for another mirror. The hash is your proof that the data hasn't been corrupted or tampered with.

Hashing turns arbitrary data into a fixed-size fingerprint. The function is deterministic: the same input always produces the same output. It has the avalanche effect: change one bit in the input, and the output changes completely. It is one-way: you cannot reconstruct the input from the hash. And it resists collisions: it is computationally infeasible to find two different inputs that produce the same hash.

Rust's standard library does not include cryptographic hashing. The ecosystem relies on crates, and most of them share one convention. The RustCrypto crates, such as sha2 and sha3, expose their hashers through the Digest trait, which gives you a uniform API across those algorithms. The blake3 crate ships its own Hasher type with the same new, update, finalize shape, and implements the digest traits only behind an optional feature. Either way, you create a hasher, feed it data, and finalize the result, so swapping algorithms rarely means rewriting your logic.

A hash is a commitment. Change one bit of input, and the output changes completely.

The Digest trait unifies the ecosystem

The RustCrypto hashing crates implement the Digest trait from the digest crate. This trait defines three core methods: new to create a fresh hasher, update to feed data, and finalize to compute the result. Because every such hasher exposes the same methods, you can write a function that is generic over D: Digest and works with multiple algorithms.

Add the sha2 crate to your Cargo.toml for SHA-256 support. The examples here assume the 0.10 release series, for example sha2 = "0.10" under [dependencies]. The crate re-exports the Digest trait, so you can import it directly.

use sha2::{Sha256, Digest};

fn main() {
    // Create a new hasher instance. This initializes a small fixed-size state; nothing is heap-allocated.
    let mut hasher = Sha256::new();

    // Feed data into the hasher. You can call update multiple times.
    // The trait accepts &str, &[u8], Vec<u8>, and more via AsRef<[u8]>.
    hasher.update(b"hello world");

    // Finalize computes the hash and returns a fixed-size array.
    // This consumes the hasher, so you cannot reuse it afterward.
    let result = hasher.finalize();

    // The result is a GenericArray. Use {:x} for lowercase hex output.
    println!("{:x}", result);
}

The update method takes impl AsRef<[u8]>. This means you can pass byte slices, strings, vectors, and other types without manual conversion. The hasher processes the data in fixed-size blocks internally, so you can pass everything in a single update call when it is already in memory, or feed it in small pieces as it arrives. The result is identical.

Stick to the Digest trait. It lets you swap algorithms without rewriting your hashing logic.

Lifecycle: new, update, finalize

The hasher lifecycle follows a strict pattern. new initializes the internal state; nothing is heap-allocated. For SHA-256, this includes a buffer for partial blocks and the current hash values. update processes input bytes. If the input doesn't fill a complete block, the hasher stores the remainder in its buffer. Subsequent calls continue from where the last one left off. finalize pads the data according to the algorithm's rules, processes the final blocks, and returns the digest.

The finalize method takes self by value. It consumes the hasher. This is a deliberate design choice. Once you have the hash, the internal state is no longer needed. Consuming the hasher prevents accidental reuse, which could lead to subtle bugs where you think you are hashing fresh data but are actually appending to a previous computation.

If you try to use the hasher after finalizing, the compiler rejects the code with error E0382: the hasher has been moved into finalize and can no longer be used. You must create a new instance to hash again.

use sha2::{Sha256, Digest};

fn main() {
    let mut hasher = Sha256::new();
    hasher.update(b"first part");

    // Finalize consumes the hasher.
    let _hash1 = hasher.finalize();

    // ERROR[E0382]: borrow of moved value: `hasher`
    // hasher.update(b"second part");
}

The Digest trait also provides finalize_reset for hashers that support resetting, which includes Sha256. This method returns the hash and resets the internal state, so you can reuse the same hasher instance. A hasher is a small fixed-size value with no heap allocation, so creating a new one is cheap, and in most Rust code finalize_reset is unnecessary. The clarity of a fresh instance usually outweighs the micro-optimization.

The compiler forces you to decide: keep the hasher alive and keep feeding it, or take the result and give up the hasher. The only ways around that are finalize_reset or cloning the hasher before you finalize it.

Realistic pattern: Hashing a file stream

Real applications rarely hash small strings in memory. You hash files, network streams, or large buffers. The pattern involves reading data in chunks and updating the hasher incrementally. This keeps memory usage constant regardless of file size.

The critical detail is slicing the buffer by the number of bytes actually read. The read method returns the number of bytes placed in the buffer, and that number can be smaller than the buffer on any call, most often on the last chunk of a file. The rest of the buffer still holds stale bytes from earlier reads, or the initial zeros if nothing has overwritten them yet. Hashing the full buffer instead of the valid slice mixes that leftover data into the digest and breaks verification.

use sha2::{Sha256, Digest};
use std::fs::File;
use std::io::{BufReader, Read};

fn hash_file(path: &str) -> Result<String, std::io::Error> {
    // Open the file and wrap it in a buffered reader for efficiency.
    let file = File::open(path)?;
    let mut reader = BufReader::new(file);

    // Create the hasher.
    let mut hasher = Sha256::new();

    // Buffer for reading chunks. 8KB is a reasonable default.
    let mut buffer = [0u8; 8192];

    loop {
        // Read bytes into the buffer. Returns the number of bytes read.
        let bytes_read = reader.read(&mut buffer)?;

        // Stop when no more bytes are available.
        if bytes_read == 0 {
            break;
        }

        // Update the hasher with only the valid bytes.
        // Slicing by bytes_read prevents hashing garbage data.
        hasher.update(&buffer[..bytes_read]);
    }

    // Finalize and return the hex string.
    let digest = hasher.finalize();
    Ok(format!("{:x}", digest))
}

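The BufReader can reduce system calls by reading larger blocks from the OS and serving smaller chunks to your loop. It helps most when your own reads are small. When the destination buffer is at least as large as BufReader's internal buffer, as the 8 KB buffer here is with the usual default capacity, reads go straight to the file and the wrapper adds little. The hasher processes each chunk as it arrives. The loop terminates when read returns zero, indicating end-of-file. Error handling propagates I/O errors up the stack.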

Always slice by bytes_read. Hashing the full buffer after a short read mixes stale data into the digest and breaks verification.

Blake3 vs SHA-256: Speed and standards

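SHA-256 is the industry standard. It appears in TLS certificates, Bitcoin, package checksums, and countless protocols. Git is a common counterexample: it still defaults to SHA-1, with SHA-256 available as an alternative object format. You use SHA-256 when compatibility is required. The sha2 crate provides a well-optimized implementation that can use hardware SHA instructions on CPUs that have them, but a single SHA-256 computation is inherently sequential: each block depends on the previous one, so one input cannot be spread across SIMD lanes or cores.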

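Blake3 is the modern performance-oriented choice. Its tree structure lets it process many chunks of one input in parallel across SIMD lanes, and optionally across threads. On CPUs without hardware SHA instructions it can be several times faster than SHA-256, sometimes around ten times; where SHA extensions are available the gap narrows, so benchmark on your own hardware. It also supports incremental (streaming) input. The blake3 crate has its own Hasher type rather than going through Digest by default; the digest traits are implemented only behind the optional traits-preview feature. The inherent API has the same new, update, finalize shape as sha2, with two differences: finalize borrows the hasher instead of consuming it, and it returns a blake3::Hash, which you print with Display or to_hex rather than {:x}.

use blake3::Hasher;

fn main() {
    // blake3::Hasher has its own inherent methods; no trait import is needed.
    let mut hasher = Hasher::new();

    // Feed data like you would with SHA-256. This update takes a byte slice.
    hasher.update(b"hello world");

    // Finalize returns a blake3::Hash and leaves the hasher usable.
    let result = hasher.finalize();

    // Hash prints as lowercase hex through Display.
    println!("{}", result);
}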

The blake3 crate also provides convenience functions like blake3::hash(data) for one-shot hashing. This avoids manual hasher management when you have all the data in memory. For streaming scenarios, create a Hasher and call update as data arrives.

Many Rust projects reach for Blake3 when no external standard mandates SHA-256. The performance difference is not theoretical, and it can change how you design data pipelines. If you are hashing large files or high-frequency payloads, Blake3 can reduce CPU load and latency noticeably.

If you have a choice, pick Blake3. The performance gain is often large, and the API is nearly identical.

Pitfalls: Reuse, encoding, and the wrong hash

Hashing code has a few common traps. The first is reusing a hasher without resetting. If you create a hasher outside a loop and call update inside, you are appending data across iterations. The hash will accumulate all previous data. This is almost never what you want. Create a fresh hasher inside the loop, or use finalize_reset if you prefer to keep a single instance.

The second trap is hex encoding. With sha2 0.10, finalize returns a GenericArray<u8, U32>. This type does not implement Display, so {} will not compile, but it does implement the hex formatting traits: {:x} produces lowercase hex and {:X} produces uppercase. For code that also needs to parse hex back into bytes, the hex crate provides hex::encode and hex::decode; decode returns a Result, so malformed input surfaces as an error you can handle. The hex crate is also more explicit than format strings.

The third trap is confusing cryptographic hashing with hash table hashing. Rust's std::hash::Hash trait is used by HashMap and HashSet. It is optimized for speed and distribution, not security. The default hasher is currently a SipHash variant with a random per-map seed, which resists hash-flooding attacks but does not provide cryptographic guarantees, and the exact algorithm is not guaranteed to stay the same across Rust releases. Its output is only 64 bits, so collisions can be found by brute force, and an attacker who knows the seed can craft them directly. You cannot use std::hash::Hash for integrity checks, digital signatures, or content addressing. Always use a cryptographic hasher such as sha2 or blake3 for security-sensitive work.
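
The difference in output size is easy to see with the standard library alone. In the sketch below, table_hash is a hypothetical helper built on DefaultHasher; the value it returns is an implementation detail of the standard library and should never be stored or compared across Rust versions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash a value the way a hash table would (hypothetical helper).
/// The result is a 64-bit number, not a cryptographic digest.
fn table_hash<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let h = table_hash(&"hello world");
    // 8 bytes of output, compared with 32 bytes for SHA-256 or Blake3.
    assert_eq!(std::mem::size_of_val(&h), 8);
    println!("{:016x}", h);
}
```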

Never use std::hash::Hash for integrity or authenticity. That trait is built for hash tables, and it offers no cryptographic collision resistance.

Decision: Which hasher to pick

Use sha2 when you need compatibility with existing systems, protocols, or compliance requirements that mandate SHA-256. Use blake3 when you control the protocol and want maximum throughput, especially for large files or high-frequency hashing. Use std::collections::HashMap with its default hasher when you are building in-memory data structures and need fast lookups, not cryptographic integrity.

Where to go next

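Hashing turns any amount of data into a fixed-size digest that acts like a fingerprint. You use it to verify that files haven't been corrupted or tampered with. Think of it like a digital seal that breaks if the contents change. Do not store passwords as a bare SHA-256 or Blake3 hash: both are designed to be fast, which makes guessing passwords cheap for an attacker. That job calls for a dedicated password-hashing function such as Argon2, which is a good topic to explore next.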