The file is text, right?
You're writing a tool that reads a configuration file. You need the text inside config.toml so you can parse it. You grab std::fs::read_to_string, pass the path, and boom: you have a String. It feels like magic. Until you try to read a 500-megabyte log file and your process holds all of it in RAM at once. Or until you point it at a PNG image and the call fails with an error about invalid UTF-8. Reading a file to a string is the "Hello World" of file I/O, but the details matter when the file gets big or weird.
Rust gives you a few ways to read files. The right choice depends on whether you care about memory usage, whether the data is actually text, and whether you need to process the content in chunks. Picking the wrong tool leads to out-of-memory crashes or confusing encoding errors.
The one-shot read
For most standard cases, std::fs::read_to_string is the answer. It opens the file, reads the entire content, validates that the bytes are valid UTF-8, and returns the text as a String inside an io::Result. If anything goes wrong, you get an Err instead.
use std::fs;
use std::io;

/// Reads a config file and returns its contents as a String.
fn read_config(path: &str) -> io::Result<String> {
    // read_to_string handles opening, reading, and UTF-8 validation in one shot.
    // It returns a Result because the file might not exist or might be unreadable.
    fs::read_to_string(path)
}

fn main() -> io::Result<()> {
    // The ? operator returns early from main with the error if the read fails.
    // This is the idiomatic pattern for CLI tools.
    let contents = read_config("config.toml")?;
    // len() returns the byte length, not the character count.
    // For ASCII the two match; for non-ASCII text such as emoji,
    // the byte length is larger.
    println!("File has {} bytes", contents.len());
    Ok(())
}
Convention aside: Returning io::Result<()> from main is a common pattern for command-line tools. It lets you use the ? operator throughout main without writing match blocks. If an error occurs, main returns it, and the process prints the error's Debug representation to standard error and exits with a non-zero status code.
Trust the Result. If you unwrap blindly, you're just hoping the file exists.
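One way to act on the Result instead of unwrapping it is to branch on the error kind. The sketch below assumes a hypothetical helper, read_config_or_default, that falls back to a default when the file is missing. The name and the fallback policy are illustrations, not something the standard library provides.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Hypothetical helper: returns the file's contents, or `default` if the
/// file does not exist. Any other error is passed back to the caller.
fn read_config_or_default(path: &Path, default: &str) -> io::Result<String> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(contents),
        // A missing file is expected here, so fall back to the default.
        Err(err) if err.kind() == ErrorKind::NotFound => Ok(default.to_string()),
        // Permission problems, invalid UTF-8, and so on are still errors.
        Err(err) => Err(err),
    }
}
```

Only the NotFound case is swallowed. Everything else still reaches the caller, so a config file with broken permissions does not silently turn into defaults.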
How read_to_string works
read_to_string does more than just copy bytes. It performs a sequence of steps that optimize for both speed and safety.
First, it opens the file. It then queries the file's metadata (a stat-style call on most operating systems). If the OS reports a file size, read_to_string uses it as a hint and reserves that much capacity in the String up front. This avoids the "grow the vector" overhead where the buffer is reallocated repeatedly as data arrives. In the common case, the buffer is allocated once.
If the file size is unknown, such as when reading from a pipe or standard input, read_to_string falls back to a growing buffer strategy. It starts with a small allocation and expands as needed. This handles streams gracefully, though it may involve a few memory copies.
After reading the bytes, read_to_string runs a UTF-8 validator over the buffer. Rust's String type has a hard contract: it must always contain valid UTF-8. This isn't a suggestion. The standard library enforces it at the API boundary: every safe way to build a String from raw bytes validates them first. If the file contains a byte sequence that violates UTF-8 rules, the function returns an io::Error with kind InvalidData. You cannot create a String with invalid UTF-8 using safe Rust. This prevents a whole class of bugs where code assumes text encoding and crashes on garbage bytes.
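The steps above can be approximated with public APIs. This is a simplified sketch of the idea, not the standard library's actual source; the function name read_to_string_sketch is made up for illustration.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Simplified approximation of the steps described above.
/// Not the standard library's real implementation.
fn read_to_string_sketch(path: &Path) -> io::Result<String> {
    // Step 1: open the file.
    let mut file = File::open(path)?;

    // Step 2: ask the OS for the file size and use it as a capacity hint.
    // If metadata is unavailable, start empty and let the String grow.
    let size_hint = file.metadata().map(|m| m.len() as usize).unwrap_or(0);
    let mut contents = String::with_capacity(size_hint);

    // Step 3: read everything. Read::read_to_string validates UTF-8 and
    // returns an error of kind InvalidData if the bytes are not valid.
    file.read_to_string(&mut contents)?;
    Ok(contents)
}
```

The size is only a hint: if the file grows between the metadata call and the read, the String simply reallocates.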
Convention aside: read_to_string accepts any type that implements AsRef<Path>. This includes &str, String, &Path, and PathBuf. Taking a generic parameter bounded by AsRef<Path> is a common Rust API convention. It makes your API flexible: callers don't need to convert arguments to PathBuf before calling the function, because the function calls as_ref() itself to get a &Path.
Don't fight the compiler here. Reach for AsRef<Path> in your own functions to match the standard library.
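Here is a minimal sketch of that signature in your own code. The helper name file_len is hypothetical; the only real APIs used are AsRef<Path> and fs::read_to_string.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper that accepts anything convertible to a &Path
/// and returns the file's length in bytes.
fn file_len<P: AsRef<Path>>(path: P) -> io::Result<usize> {
    // as_ref() performs the conversion explicitly; it is an ordinary
    // trait method call, not an implicit coercion.
    let contents = fs::read_to_string(path.as_ref())?;
    Ok(contents.len())
}
```

With this signature, file_len("notes.txt"), file_len(String::from("notes.txt")), and file_len(PathBuf::from("notes.txt")) all compile without any conversion at the call site.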
When the file gets big
Loading a 2-gigabyte file into a String is a bad idea. Your process asks for one massive block of memory. If the system is under pressure, the allocation can fail, or the OS may kill the process. Or worse, it succeeds, but the machine starts swapping and your app slows to a crawl.
For large files, you want to stream the data. std::io::BufReader wraps a file handle and provides buffered reading. It fetches data in chunks from the OS, reducing system call overhead. More importantly, it lets you process the file line-by-line or chunk-by-chunk without holding the entire content in memory.
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Counts non-empty lines in a large file without loading it all into memory.
fn count_lines(path: &str) -> io::Result<usize> {
    // Open the file first. This gives us a handle to the OS resource.
    let file = File::open(path)?;
    // Wrap in BufReader to reduce system calls.
    // Many small unbuffered reads are slow; BufReader fetches larger chunks.
    let reader = BufReader::new(file);
    // lines() returns an iterator over io::Result<String>.
    // Only one line is held at a time, so memory use is bounded by the
    // longest line rather than by the file size.
    let mut count = 0;
    for line in reader.lines() {
        // Propagate read errors (including invalid UTF-8) instead of dropping them.
        if !line?.is_empty() {
            count += 1;
        }
    }
    Ok(count)
}
Convention aside: lines() is a method of the BufRead trait, and it strips the line ending (\n or \r\n) from each line. If you need the delimiter, use read_line() instead. The community convention is to use lines() for iteration and read_line() only when the delimiter matters. You will also see filter_map(|line| line.ok()) used to skip unreadable lines, but be careful with it: it silently discards I/O errors, and if the reader fails on every call, the iterator can keep yielding errors forever (Clippy's lines_filter_map_ok lint warns about this). Propagating the error with ? is the safer default.
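When the delimiter does matter, read_line can be driven by hand. The following is a minimal sketch, assuming a hypothetical function count_crlf_lines that counts lines ending in a Windows-style \r\n. It takes any BufRead, so it works on a BufReader<File> as well as an in-memory buffer.

```rust
use std::io::{self, BufRead};

/// Hypothetical example: counts lines that end with "\r\n".
/// lines() would hide this, because it strips the line ending.
fn count_crlf_lines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    // One buffer is reused for every line to avoid repeated allocation.
    let mut line = String::new();
    let mut count = 0;
    loop {
        // read_line appends to the buffer, so clear it before each call.
        line.clear();
        // A return value of 0 bytes means end of input.
        if reader.read_line(&mut line)? == 0 {
            break;
        }
        if line.ends_with("\r\n") {
            count += 1;
        }
    }
    Ok(count)
}
```

Forgetting line.clear() is the usual mistake with this loop: read_line appends, so the buffer would accumulate the whole file.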
Stream the data. Your RAM will thank you.
Pitfalls and errors
The biggest trap is assuming every file is text. If you point read_to_string at a JPEG, a compiled binary, or a Latin-1 text file that contains non-ASCII characters, it will fail. This isn't a compile-time error; it surfaces at runtime as an io::Error with kind InvalidData. The message usually mentions "stream did not contain valid UTF-8".
If you need to read binary data, use std::fs::read instead. It returns a Vec<u8>, which is a byte buffer. You can process the bytes directly or convert them to a String later if you handle the encoding yourself.
use std::fs;

/// Reads binary data into a Vec<u8>.
fn read_binary(path: &str) -> std::io::Result<Vec<u8>> {
    // fs::read returns a Vec<u8>, which can hold any byte sequence.
    // It does not validate UTF-8.
    fs::read(path)
}
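If you later want text out of those bytes, one option is a lossy conversion. The sketch below uses two hypothetical helpers, bytes_to_text and read_text_lossy. Note that this does not decode Latin-1 or any other encoding; it only replaces invalid UTF-8 sequences with the replacement character U+FFFD, so information is lost. Real transcoding needs an encoding library, which is outside this sketch.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: converts bytes to text, replacing invalid UTF-8
/// sequences with U+FFFD (the replacement character).
fn bytes_to_text(bytes: Vec<u8>) -> String {
    match String::from_utf8(bytes) {
        // Valid UTF-8: the buffer is reused without copying.
        Ok(text) => text,
        // Invalid UTF-8: borrow the original bytes back and convert lossily.
        Err(err) => String::from_utf8_lossy(err.as_bytes()).into_owned(),
    }
}

/// Hypothetical helper: reads any file and returns best-effort text.
fn read_text_lossy(path: &Path) -> io::Result<String> {
    Ok(bytes_to_text(fs::read(path)?))
}
```

This is reasonable for displaying or logging unknown input. It is a poor fit for data you need to round-trip, because the replaced bytes cannot be recovered.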
Another pitfall is error handling. If you use the ? operator in a function that doesn't return a Result or Option, the compiler rejects it with error E0277: the ? operator can only be used in a function that returns Result or Option. (If the function does return a Result, but with a different error type, E0277 instead complains that From<std::io::Error> is not implemented for that error type.) The fix is to change the function signature to return io::Result<T>, or to handle the error with match.
fn bad_main() {
    // This won't compile.
    // Error E0277: the `?` operator can only be used in a function
    // that returns Result or Option.
    let _ = std::fs::read_to_string("file.txt")?;
}
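For comparison, here is a sketch of the match-based fix for a function that deliberately does not return a Result. The name report_len and its bool return value are illustrative choices.

```rust
use std::fs;

/// Hypothetical helper: prints the file's length and reports success.
/// It returns bool rather than Result, so it cannot use `?` on the
/// io::Result and handles both cases with match instead.
fn report_len(path: &str) -> bool {
    match fs::read_to_string(path) {
        Ok(contents) => {
            println!("{} has {} bytes", path, contents.len());
            true
        }
        Err(err) => {
            // io::Error implements Display, so it can be printed directly.
            eprintln!("could not read {}: {}", path, err);
            false
        }
    }
}
```

Changing the signature to return io::Result<usize> and using ? is usually shorter. The match form is for places where the error has to be dealt with on the spot.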
Binary data breaks read_to_string. Use fs::read for bytes.
Choosing your tool
Rust provides multiple ways to read files. The decision depends on your memory budget, the data type, and how you plan to process the content.
Use std::fs::read_to_string when you need the entire file content as a String and the file size is reasonable. This is the idiomatic choice for config files, small templates, and anything that fits comfortably in memory. It pre-allocates based on file size and validates UTF-8 in one pass.
Use std::fs::read when you are dealing with binary data or non-UTF-8 text. This returns a Vec<u8>, which is a byte buffer. You can convert it to a String later if you handle the encoding yourself, or process it as raw bytes.
Use BufReader with lines() when you need to process a large file line-by-line to keep memory usage low. This streams the data, allocating only one line at a time. It's the right tool for log analysis, CSV processing, or any file where you don't need the whole content in memory at once.
Use File::open with manual reads when you need fine-grained control over the read loop, such as reading fixed-size chunks or implementing a custom parser that doesn't fit the line-based model.
Reach for std::io::Read::read_to_string on a generic reader when your code needs to work with any input source, not just files. This lets you write functions that accept File, Cursor<Vec<u8>>, or TcpStream interchangeably.
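The last two options can be sketched together, since both work on any type that implements Read. The function names read_all and count_byte and the 4096-byte buffer size are assumptions made for illustration.

```rust
use std::io::{self, Read};

/// Works with any reader: File, Cursor<Vec<u8>>, TcpStream, and so on.
/// Unlike fs::read_to_string, Read::read_to_string appends to a String
/// supplied by the caller and returns the number of bytes read.
fn read_all<R: Read>(mut reader: R) -> io::Result<String> {
    let mut text = String::new();
    reader.read_to_string(&mut text)?;
    Ok(text)
}

/// Manual read loop with a fixed-size buffer: counts how often `needle`
/// occurs without holding more than one chunk in memory.
fn count_byte<R: Read>(mut reader: R, needle: u8) -> io::Result<usize> {
    let mut buf = [0u8; 4096]; // arbitrary chunk size for this sketch
    let mut count = 0;
    loop {
        let n = match reader.read(&mut buf) {
            // Zero bytes means end of input.
            Ok(0) => break,
            Ok(n) => n,
            // An interrupted read is not fatal; try again.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Only the first n bytes of the buffer hold fresh data.
        count += buf[..n].iter().filter(|&&b| b == needle).count();
    }
    Ok(count)
}
```

Because both functions are generic over Read, they can be tested with an in-memory Cursor and used unchanged on a File in production.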
Pick the tool that matches your memory budget and data type.