How to Memory-Map Files in Rust (memmap2)

Use the memmap2 crate to map files directly into memory for fast, array-like access in Rust.

When the file is too big to load

You have a 4GB binary log file. You need to jump to offset 1.5GB to read a record, then back to 100MB for a header, then to the end for a checksum. Loading the whole file into a Vec<u8> costs 4GB of RAM before you read a single record. Reading chunks with seek and read turns your code into a state machine of buffer management. You want the file to look like a slice of bytes in memory. You want to write data[offset] and have it just work.

Memory mapping gives you exactly that. It makes the OS pretend the file is RAM. You get random access with zero manual I/O. The file stays on disk until you touch it. This is lazy loading at the hardware level.

The OS pretends the file is RAM

Memory mapping tricks the CPU into thinking a file on disk is actually a region of memory. The OS reserves a range of virtual addresses and links them to the file on disk. It does not load any data yet.

When you access a byte in that range, the CPU checks whether the page is present. If it is not, the CPU raises a page fault and hands control to the OS. The OS reads the corresponding page from the disk, maps it to your virtual address, and resumes your code. This happens transparently. You just index the slice and the hardware handles the rest.

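This approach shines for large files. Creating the mapping for a 1TB file typically takes microseconds because no data is transferred. You only pay for the pages you actually touch. The OS manages the cache. If memory gets tight, the OS drops clean pages and re-reads them from the file the next time you touch them. Once a page is cached, reads run at RAM speed, and the size you can address is bounded by storage and virtual address space rather than by physical memory.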
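
To make "you only pay for the pages you touch" concrete, here is a small sketch that counts how many pages a read spans. The function name pages_touched is hypothetical, and the 4096-byte page size used in the usage note is an assumption; real code should ask the OS for the page size.

```rust
/// Returns how many pages a read of `len` bytes starting at `offset` spans.
/// `page_size` is supplied by the caller; 4096 is common but not universal.
fn pages_touched(offset: u64, len: u64, page_size: u64) -> u64 {
    if len == 0 {
        return 0;
    }
    let first_page = offset / page_size;
    let last_page = (offset + len - 1) / page_size;
    last_page - first_page + 1
}
```

With 4096-byte pages, a 64-byte record at offset 1,500,000,000 sits inside a single page, so the first read of that record costs one page fault no matter how large the file is. A 2-byte read that starts on the last byte of a page spans two.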

Mapping a file

The memmap2 crate is the standard tool for this. The original memmap crate is no longer maintained. memmap2 is the maintained fork and keeps a very similar API. Add it to your dependencies:

[dependencies]
memmap2 = "0.9"

Here is the minimal pattern to map a file and read a byte:

use memmap2::MmapOptions;
use std::fs::File;

fn main() {
    // Open the file for reading.
    let file = File::open("data.bin").expect("file not found");

    // Map the file into memory.
    // This requires unsafe because another process (or this one) could
    // modify or truncate the file while the mapping is alive.
    let mmap = unsafe {
        MmapOptions::new()
            .map(&file)
            .expect("failed to map")
    };

    // Access bytes directly like a slice.
    // The Mmap struct implements Deref<Target=[u8]>, so slice methods work.
    println!("First byte: {}", mmap[0]);
    println!("Length: {}", mmap.len());
}

Convention: Keep the unsafe block as small as possible. Only wrap the map call. Indexing and slicing the resulting Mmap needs no unsafe. If you wrap the usage in unsafe too, you hide bugs.

The mapping does not depend on the File handle. You can drop the File immediately after mapping. The OS keeps the underlying file referenced until the mapping itself is dropped. This is a useful detail for structuring code. You don't need to keep the File alive manually.

Don't fight the compiler here. The unsafe block is the price for crossing into OS territory. Pay it once and use the safe wrapper.

What happens under the hood

File::open gives Rust a file descriptor. MmapOptions::new().map asks the OS to create a mapping. The OS reserves virtual address space and returns a pointer. memmap2 wraps that pointer in an Mmap struct.

When you access mmap[0], the Deref implementation turns it into a slice access. If the page is not loaded, the CPU triggers a page fault. The OS catches the fault, reads the disk page, maps it, and resumes execution. This is why the first access to a region is slower than subsequent accesses. The data is now in the page cache.

The unsafe block is necessary because Rust cannot guarantee that the file stays unchanged while you hold the mapping. A &[u8] promises that the bytes behind it do not change, but another process can write to the file or truncate it at any time. If the file is truncated, accessing the part of the mapping past the new end can crash your program with a bus error. Rust requires you to acknowledge this risk explicitly.

Ah-ha: You can map the same file multiple times. Each mapping gets its own virtual address range. This is useful for sharing data between threads or processes without copying. Shared mappings of the same file are backed by the same physical pages in the page cache, so the data exists in RAM only once.

Real-world usage: subset mapping

Mapping the entire file is not always what you want. Maybe you only need the last kilobyte of a log file. Or you need a specific chunk of a game asset. MmapOptions lets you map a subset using offset and len.

use memmap2::MmapOptions;
use std::fs::File;
use std::path::Path;

/// Reads the version string from a binary file header.
/// Expects a 4-byte little-endian length followed by that many bytes of string data.
fn read_version(path: &Path) -> Result<String, Box<dyn std::error::Error>> {
    let file = File::open(path)?;

    // Map the whole file. The mapping stays valid on its own, so `file`
    // could be dropped right after this call.
    let mmap = unsafe {
        MmapOptions::new().map(&file)?
    };

    // Check for minimum size to avoid panic on indexing.
    if mmap.len() < 4 {
        return Err("file too small".into());
    }

    // Read the length of the version string (u32 LE).
    let len = u32::from_le_bytes(mmap[0..4].try_into()?) as usize;

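    // Check bounds before slicing. checked_add guards against overflow
    // on 32-bit targets, where `len` can be close to usize::MAX.
    let end = 4usize.checked_add(len).ok_or("version length overflows")?;
    if mmap.len() < end {
        return Err("version string extends beyond file".into());
    }

    let version_bytes = &mmap[4..end];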

    Ok(String::from_utf8_lossy(version_bytes).into_owned())
}

/// Maps only the last 1024 bytes of a file.
/// This is efficient for reading footers or checksums.
fn read_footer(path: &Path) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    let metadata = file.metadata()?;
    let file_len = metadata.len();

    if file_len < 1024 {
        return Ok(Vec::new());
    }

    // Calculate the offset of the last 1024 bytes.
    // The OS only maps at page-aligned offsets. memmap2 rounds the offset
    // down internally and still exposes a slice that starts at `offset`.
    let offset = file_len - 1024;

    let mmap = unsafe {
        MmapOptions::new()
            .offset(offset)
            .len(1024)
            .map(&file)?
    };

    // Copy the data out. The Mmap is a view, not owned data.
    Ok(mmap.to_vec())
}

Convention: Call drop(file) right after mapping if you want to release the handle early, or bind it to a name like _file if you want to show that it is deliberately kept until the end of the scope. Note that let _ = file; does neither: the _ pattern binds nothing and moves nothing, so the statement has no effect. In the read_footer example, the mapping stays valid without the handle, so file can be dropped as soon as map returns.

Check lengths before indexing. The OS won't save you from a panic. If you index out of bounds, Rust panics. If the file is truncated, the OS sends a signal. Always validate sizes.
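
One way to make those checks hard to forget is to write the parsing helpers against a plain &[u8] and return Option instead of indexing directly. Because Mmap derefs to [u8], the same helpers work on a mapping or on an in-memory buffer in tests. The names read_u32_le and read_length_prefixed are hypothetical, and the [u32 LE length][payload] layout is the one assumed by read_version above.

```rust
/// Reads a little-endian u32 at `offset`, or returns None if out of range.
fn read_u32_le(data: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    // `get` returns None instead of panicking when the range is out of bounds.
    let b = data.get(offset..end)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns the payload of a record laid out as [u32 LE length][payload].
/// The returned slice borrows from `data`; nothing is copied.
fn read_length_prefixed(data: &[u8], offset: usize) -> Option<&[u8]> {
    let len = read_u32_le(data, offset)? as usize;
    let start = offset.checked_add(4)?;
    let end = start.checked_add(len)?;
    data.get(start..end)
}
```

Calling read_length_prefixed(&mmap, 0) on a mapping would return the version bytes without a copy, and a short or corrupt file yields None rather than a panic. This does not protect against truncation after the mapping is created; it only validates against the mapping's length.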

Pitfalls and compiler errors

Memory mapping is powerful but has traps.

File truncation. If another process truncates the file while you hold a mapping, accessing the invalid region causes a bus error (SIGBUS on Unix). Your program crashes. Rust cannot prevent this. The unsafe block acknowledges this risk. Checking the file size before each access is not enough, because the file can shrink between the check and the read. If truncation is a real possibility, coordinate with the writer, for example with file locks, or only map files that are never modified in place.

Offset alignment. The underlying OS call only accepts offsets that are a multiple of the page size (or of the allocation granularity on Windows). memmap2 hides this: it rounds the offset down, maps the slightly larger region, and returns a slice that starts at exactly the offset you asked for. You do not need to align offsets yourself, but a small unaligned window still pulls in at least one whole page. Check the documentation for offset behavior in the version you use.
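
The alignment arithmetic itself is short. The sketch below shows the general technique of rounding an offset down to a page boundary and widening the length to compensate. It is not memmap2's actual code; the name aligned_window is hypothetical and the page size is passed in as an assumption.

```rust
/// Computes the page-aligned window that covers `len` bytes at `offset`.
/// Returns (aligned_offset, aligned_len, delta), where `delta` is how far
/// the requested data starts inside the aligned window.
fn aligned_window(offset: u64, len: u64, page_size: u64) -> (u64, u64, u64) {
    let delta = offset % page_size;
    let aligned_offset = offset - delta;
    let aligned_len = len + delta;
    (aligned_offset, aligned_len, delta)
}
```

For the last 1024 bytes of a 4GiB file with 4096-byte pages, the requested offset is 4,294,966,272. The aligned window starts 3072 bytes earlier and is 4096 bytes long, so the footer occupies the tail of exactly one page.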

Compiler errors. If you call map outside an unsafe block, the compiler rejects you with E0133 (call to unsafe function requires unsafe function or block). You get the same error code for dereferencing a raw pointer, but Mmap wraps the pointer safely, so you rarely see that form. If you try to assign an Mmap to a &[u8] without borrowing it, you get E0308 (mismatched types). Use &mmap[..], or write &mmap and rely on deref coercion.
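
The two working forms can be tried without a real mapping. The sketch below uses FakeMap, a hypothetical stand-in that derefs to [u8] the same way the text says Mmap does; first_byte and demo are likewise illustrative names.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for a mapping: owns bytes and derefs to [u8].
struct FakeMap(Vec<u8>);

impl Deref for FakeMap {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        &self.0
    }
}

/// A helper written against a plain slice, as parsing code usually is.
fn first_byte(data: &[u8]) -> Option<u8> {
    data.first().copied()
}

fn demo(map: &FakeMap) -> (Option<u8>, usize) {
    // Passing the FakeMap by value where a &[u8] is expected would be E0308.
    let coerced: &[u8] = map; // &FakeMap -> &[u8] via deref coercion
    let explicit: &[u8] = &map[..]; // explicit full-range slice
    (first_byte(coerced), explicit.len())
}
```

Both bindings point at the same bytes. The explicit &map[..] form is useful when type inference has nothing to coerce toward, for example when passing the value to a generic function.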

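Thread safety. Mmap is Send and Sync. You can share it across threads for reading. MmapMut is also Send and Sync, but writing goes through &mut self, so the borrow checker allows either many readers or one writer at a time. To write from several threads, wrap it in a Mutex or hand out disjoint sub-slices with split_at_mut. This matches the ownership model. Shared reads are safe. Exclusive writes require exclusive access.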
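
Shared reads need nothing beyond a borrowed slice. The sketch below splits a byte slice into chunks and sums each chunk on its own scoped thread. It takes a plain &[u8], so it would accept a borrowed mapping as well as an ordinary buffer. The name parallel_sum is hypothetical, and it assumes Rust 1.63 or later for std::thread::scope.

```rust
use std::thread;

/// Sums all bytes by splitting the slice into chunks, one scoped thread each.
fn parallel_sum(data: &[u8], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Round up so every byte lands in a chunk; max(1) avoids a zero chunk size.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently, then join them.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(|&b| b as u64).sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .sum::<u64>()
    })
}
```

Scoped threads borrow the data, so there is no need for Arc or cloning. With a real mapping, each worker would fault in only the pages of its own chunk.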

Counter-intuitive but true: the more you use unsafe, the harder the rest of your code becomes to reason about. Keep the mapping creation isolated. Use the safe Mmap API everywhere else.

Choosing the right tool

Use memmap2 when you need random access to large files without loading everything into RAM. Use memmap2 when you want to parse binary formats with zero-copy slicing. Use memmap2 when you need to share file data between threads or processes efficiently.

Use std::fs::read when the file fits comfortably in memory and you only need to read it once. Use std::fs::read when you want simplicity and don't care about memory usage.

Use BufReader when you are streaming data sequentially and want to minimize system calls. Use BufReader when you are processing text line-by-line or reading a stream that is not seekable.
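
For comparison, here is a minimal sketch of the streaming case. It counts lines that contain a substring and never holds more than one line plus the internal buffer in memory. The function name count_matching_lines is hypothetical; it accepts any Read, so a File, a socket, or stdin all work.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Counts lines containing `needle`, reading the source sequentially.
fn count_matching_lines<R: Read>(source: R, needle: &str) -> io::Result<usize> {
    let reader = BufReader::new(source);
    let mut count = 0;
    for line in reader.lines() {
        // `lines` yields io::Result<String>; propagate read or UTF-8 errors.
        if line?.contains(needle) {
            count += 1;
        }
    }
    Ok(count)
}
```

Nothing here needs random access, which is why a mapping would add risk (the unsafe block, the truncation hazard) without adding anything in return.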

Use MmapMut when you need to modify the file contents in place with zero-copy writes. Use MmapMut when you are implementing a database or cache that updates files directly.

Use File::seek and read when you are on a platform with strict memory limits where mapping might fail or consume too much virtual address space. Use File::seek and read when you need to interleave reads from multiple files without holding mappings open.
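
The seek-and-read alternative is less painful than it sounds when each access is wrapped in one helper. This is a minimal sketch, with the hypothetical name read_at, that works on any seekable source, including std::fs::File.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Reads exactly `len` bytes starting at `offset` from any seekable source.
/// Returns an UnexpectedEof error if the source ends before `len` bytes.
fn read_at<S: Read + Seek>(source: &mut S, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    source.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; len];
    source.read_exact(&mut buf)?;
    Ok(buf)
}
```

Unlike a mapping, each call copies the bytes into a fresh Vec and costs a pair of system calls. In exchange, a truncated file surfaces as an ordinary io::Error instead of a signal.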

Where to go next