When a single directory isn't enough
You're building a tool that needs to process every file in a project. Maybe it's a linter scanning for .rs files, a backup script archiving documents, or a build system finding assets. The directory structure isn't flat. Files live in nested folders, submodules, and vendor directories. You could write a recursive function yourself, managing a stack of paths, handling OS-specific separators, and wrestling with symlink loops. Or you could use walkdir.
The walkdir crate gives you an iterator over a directory tree. It handles the recursion, error reporting, and configuration so you can focus on what to do with each entry. You get a clean, lazy sequence of files and directories, one at a time.
How the iterator model works
walkdir turns a hierarchical directory tree into a flat stream of entries. Think of it like a tour guide walking through a house. The guide enters the front door, visits the first room, then dives into the first closet in that room. If the closet has shelves, the guide checks those. When the closet is empty, the guide backtracks to the room, checks the next closet, and so on. This is a depth-first traversal.
The iterator yields one DirEntry per step. It doesn't load the entire tree into memory. It visits nodes on demand. Memory use therefore scales mainly with the depth of the tree, not with the total number of entries, so walking a directory with millions of files does not require buffering them all. The crate uses std::fs::read_dir under the hood but adds the recursion logic, configuration options, and consistent error handling.
The iterator is lazy. Nothing happens until you pull entries from it. You can chain methods like filter, map, and take without triggering the walk. This lets you compose complex queries efficiently.
Don't buffer the whole tree unless you need to. Process entries as they arrive.
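To make the traversal model concrete, here is a minimal sketch of a lazy depth-first walker written against std::fs::read_dir alone. It is not walkdir's actual implementation: LazyWalk and its field are hypothetical names, and the sketch omits depth limits, sorting, and link following. Unlike WalkDir with default settings, it does not yield the root itself. It only illustrates how a stack of open directory handles can produce one entry per call to next().

```rust
use std::fs::{self, ReadDir};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical lazy depth-first walker built only on `std::fs::read_dir`.
/// It keeps one open `ReadDir` handle per directory level currently being visited.
struct LazyWalk {
    stack: Vec<ReadDir>,
}

impl LazyWalk {
    /// Opens the root directory. No child entries are read yet.
    fn new(root: &Path) -> io::Result<Self> {
        Ok(LazyWalk { stack: vec![fs::read_dir(root)?] })
    }
}

impl Iterator for LazyWalk {
    type Item = io::Result<PathBuf>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Look at the deepest directory still being read; stop when none is left.
            let top = self.stack.last_mut()?;
            match top.next() {
                // This directory is exhausted: backtrack to its parent.
                None => {
                    self.stack.pop();
                }
                Some(Err(e)) => return Some(Err(e)),
                Some(Ok(entry)) => {
                    let path = entry.path();
                    // `DirEntry::file_type` does not follow symlinks, so links are
                    // yielded but never descended into.
                    match entry.file_type() {
                        Ok(ft) if ft.is_dir() => match fs::read_dir(&path) {
                            // Descend: the next call reads from this child first.
                            Ok(children) => self.stack.push(children),
                            Err(e) => return Some(Err(e)),
                        },
                        Ok(_) => {}
                        Err(e) => return Some(Err(e)),
                    }
                    return Some(Ok(path));
                }
            }
        }
    }
}
```

Because the stack holds one handle per level currently being visited, memory grows with depth rather than with the number of entries, and a chain such as LazyWalk::new(root)?.take(3) reads only as much of the tree as it needs.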
Minimal example
Here's the standard pattern for walking a directory and filtering results. This code finds all Markdown files in a src directory, skipping the root directory itself.
```rust
use walkdir::WalkDir;

fn main() {
    // WalkDir::new creates a configuration object. It does not start walking yet.
    // into_iter() consumes the configuration and returns the iterator.
    for entry in WalkDir::new("src")
        .min_depth(1) // Skip the "src" directory entry. Start with its children.
        .into_iter()
        .filter_map(|e| e.ok()) // Convert Result<DirEntry> to Option<DirEntry>, discarding errors.
    {
        // entry is a DirEntry.
        // Check if the file has a .md extension.
        if entry.path().extension().map_or(false, |ext| ext == "md") {
            println!("Found: {}", entry.path().display());
        }
    }
}
```
The filter_map(|e| e.ok()) chain is a community convention. WalkDir yields Result<DirEntry, Error>. If the walker hits a permission error, or a broken symlink while following links, the iterator yields an Err. The filter_map converts Ok(entry) to Some(entry) and Err(e) to None, so the loop skips errors silently. This is concise and works well for simple scripts where you just want to process accessible files. If you need to log errors, this pattern hides them.
Walking through the mechanics
Let's break down what happens in the minimal example.
WalkDir::new("src") builds a WalkDir struct. It stores the root path and default settings. The defaults include follow_links(false), so symbolic links are reported as entries but never traversed, which rules out infinite loops from circular symlinks. They also include min_depth(0) and max_depth(usize::MAX), meaning the walk includes the root and goes as deep as the tree does.
.min_depth(1) updates the configuration. The walker will skip the root entry and start yielding children. This is useful when you want to process contents but ignore the container directory.
.into_iter() consumes the WalkDir and returns an IntoIter iterator. The iterator holds the state of the walk: the current stack of directories to visit, the depth counter, and the configuration.
The for loop pulls entries one by one. Each entry is a Result<DirEntry, Error>. The filter_map handles the Result. If the entry is Ok, the loop body runs. If it's Err, the entry is dropped.
Inside the loop, entry.path() returns a &Path. extension() extracts the file extension as an Option<&OsStr>. map_or(false, |ext| ext == "md") checks if the extension exists and equals "md". If true, the path is printed. display() formats the path for human output, handling non-UTF8 characters gracefully.
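The extension check is easy to get subtly wrong, so here is a small sketch that isolates it using only std::path. The helper names has_extension and has_extension_ignore_case are hypothetical, introduced for illustration. The sketch shows that the comparison is case-sensitive, that only the last extension counts, and that a dotfile such as .md has no extension at all.

```rust
use std::path::Path;

/// Hypothetical helper: true if `path` has exactly the extension `wanted`
/// (given without the leading dot). The comparison is case-sensitive.
fn has_extension(path: &Path, wanted: &str) -> bool {
    path.extension().map_or(false, |ext| ext == wanted)
}

/// Hypothetical variant that ignores ASCII case. Extensions that are not
/// valid UTF-8 never match.
fn has_extension_ignore_case(path: &Path, wanted: &str) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case(wanted))
}
```

Inside a walk, the same check could be applied as has_extension(entry.path(), "md").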
Check file_type() before calling metadata(). DirEntry caches the file type. Calling file_type() is cheap after the first access. metadata() may trigger a disk syscall. If you only need to know if an entry is a file or directory, file_type() is sufficient.
Realistic example
In production code, you often need more control. You might want to log errors, limit depth, respect symlinks carefully, and collect results. This example shows a robust file finder function.
```rust
use std::path::Path;
use walkdir::{DirEntry, WalkDir};

/// Finds all files with a specific extension, logging errors to stderr.
///
/// Returns a vector of DirEntry objects for matching files.
/// Errors are printed but do not stop the traversal.
fn find_files(root: &Path, extension: &str) -> Vec<DirEntry> {
    let mut results = Vec::new();
    // follow_links(false) is the default. It prevents infinite loops from circular symlinks.
    // max_depth(10) acts as a safety valve against unexpectedly deep trees.
    for entry in WalkDir::new(root)
        .max_depth(10)
        .into_iter()
    {
        match entry {
            Ok(entry) => {
                // Skip directories. We only care about regular files.
                // file_type() is cached in DirEntry, so this is cheap.
                if entry.file_type().is_file() {
                    // Check the extension.
                    if entry.path().extension().map_or(false, |ext| ext == extension) {
                        results.push(entry);
                    }
                }
            }
            Err(e) => {
                // Log the error. WalkDir reports errors for entries it couldn't read.
                // e.path() is an Option: the path that caused the issue, when known.
                eprintln!("Warning: could not access {:?}: {}", e.path(), e);
            }
        }
    }
    results
}
```
This function iterates over the directory tree, handling errors explicitly. It logs warnings for inaccessible paths but continues walking. This is important for tools that process large trees; a single permission error shouldn't abort the entire operation. The max_depth(10) limit prevents resource exhaustion on pathological directory structures.
The DirEntry objects are stored in the result vector. DirEntry owns its path data. You can move DirEntry values around without worrying about lifetimes. If you need just the path, you can call entry.into_path() to consume the entry and extract the PathBuf. This is useful when you're moving files or passing paths to other functions that take ownership.
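Convention aside: follow_links(false) is the default for a reason. Following links can lead the walk outside the root you passed in and can visit the same files through more than one path. When links are followed, walkdir detects a link that resolves to one of its own ancestors and reports it as an error instead of looping forever, but you still have to handle that error. Only enable follow_links(true) if you trust the directory structure.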
Always set max_depth in production tools. A misconfigured symlink or a massive directory tree can exhaust resources without a limit.
Pitfalls and compiler errors
walkdir is straightforward, but a few traps exist.
Symlink loops. If you enable follow_links(true) and the directory contains a symlink that points back to an ancestor, walkdir detects the cycle and yields an Err for that entry rather than descending into it again. Error::loop_ancestor() returns the ancestor path involved. If you discard errors with filter_map(|e| e.ok()), these loops are skipped silently. By default symlinks are yielded as entries but not followed, which avoids the problem entirely.
Error handling. The iterator yields Result<DirEntry, Error>, not DirEntry. If you try to collect it directly into a Vec<DirEntry>, the compiler rejects the code because a Vec<DirEntry> cannot be built from an iterator of Result values. You must handle the Result first. Use filter_map(|e| e.ok()) to ignore errors, match to handle them one by one, or collect into Result<Vec<DirEntry>, Error> to stop at the first failure. If you unwrap inside the loop, the first error panics and ends the walk.
into_iter consumes the walker. into_iter() takes the WalkDir configuration by value, so you can't reuse the walker after calling it. WalkDir has no borrowing iter() method. If you need to walk the same directory multiple times with the same settings, construct a fresh WalkDir for each pass, for example from a small helper function that applies your configuration.
Path ownership. entry.path() returns a borrowed &Path. The path data lives inside the DirEntry. If you store the &Path somewhere, you must keep the DirEntry alive. If you need an owned path, call entry.into_path() to consume the entry and get a PathBuf. This is useful when you're collecting paths and want to drop the DirEntry metadata.
Order is not guaranteed. walkdir does not sort entries by default. The order depends on the filesystem and OS. If you need deterministic order, use .sort_by_file_name(), which takes no arguments and orders the entries within each directory by file name. Sorting adds overhead because each directory's entries must be read in full before any are yielded, so only enable it when order matters.
Treat errors as part of the data stream. Handle them explicitly. Don't unwrap in a loop.
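One way to keep errors in the data stream is to separate successes from failures in a single pass. The sketch below uses only the standard library and is generic over any stream of Result values; split_results and all_or_first_error are hypothetical helper names. Since WalkDir implements IntoIterator with Result items, a WalkDir value can be passed to either function directly.

```rust
/// Hypothetical helper: splits any stream of results into successes and failures,
/// preserving the order in which each arrived.
fn split_results<T, E>(items: impl IntoIterator<Item = Result<T, E>>) -> (Vec<T>, Vec<E>) {
    let mut oks = Vec::new();
    let mut errs = Vec::new();
    for item in items {
        match item {
            Ok(value) => oks.push(value),
            Err(err) => errs.push(err),
        }
    }
    (oks, errs)
}

/// Fail-fast alternative: collecting into `Result<Vec<T>, E>` stops at the first
/// error and returns it, discarding the successes gathered so far.
fn all_or_first_error<T, E>(items: impl IntoIterator<Item = Result<T, E>>) -> Result<Vec<T>, E> {
    items.into_iter().collect()
}
```

The first form suits tools that should report every problem at the end of a run. The second suits cases where a partial result would be misleading.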
When to use walkdir
Pick the right tool for the scope of your traversal.
Use walkdir when you need recursive traversal with control over depth, link following, and error handling. It's the standard choice for walking directory trees in Rust.
Use std::fs::read_dir when you only need to list the contents of a single directory without recursion. It's built into the standard library and avoids an external dependency.
Use the glob crate when you need to match paths against shell-style patterns like **/*.rs. It supports glob syntax and expands wildcards.
Use the ignore crate when you need to respect .gitignore rules and skip hidden files automatically. It's designed for tools that should behave like git or find with ignore support.
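For comparison, the single-directory case needs nothing beyond the standard library. This is a minimal sketch; list_dir is a hypothetical name, and the sort is added only to make the output order predictable.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: lists the immediate children of `dir` without recursing.
/// The first I/O error aborts the listing and is returned to the caller.
fn list_dir(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut paths = Vec::new();
    for entry in fs::read_dir(dir)? {
        paths.push(entry?.path());
    }
    // read_dir makes no ordering promise, so sort for stable results.
    paths.sort();
    Ok(paths)
}
```

Subdirectories appear in the result as entries, but their contents do not.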
Reach for walkdir for general recursive walks. It's fast, safe, and well-maintained.