When one core isn't enough
You have a vector of ten thousand items. A simple for loop processes them one by one. It takes three seconds. You know your CPU has eight cores sitting idle. You want all eight cores to work on the loop at once. In Python, you'd wrestle with multiprocessing and serialization headaches. In Rust, you add a crate and change one method call. The loop runs in parallel without you managing threads.
Rayon brings data-parallelism to Rust iterators. It lets you write parallel code that looks almost identical to serial code. The compiler and the library handle the thread management, synchronization, and load balancing. You focus on the transformation logic.
How Rayon works under the hood
Rayon turns your iterator into a parallel machine. Think of a restaurant kitchen during the rush. One chef can only cook so many dishes. Rayon is like hiring a team of sous-chefs who share the same prep station. When the head chef gets swamped, the sous-chefs grab orders and start cooking. If one sous-chef finishes early, they steal an order from a busy colleague. The kitchen stays busy, and orders come out faster.
Rayon does this with your data. It splits your collection into chunks, hands them to threads in a pool, and lets threads steal work from each other to keep the load balanced. The key mechanism is work-stealing. Rayon maintains a global thread pool. Each thread has a deque of tasks. When a thread runs out of work, it steals a task from the back of another thread's deque. This strategy minimizes contention. Threads mostly work on their own data. Stealing only happens when necessary. The result is high throughput with low overhead.
Rayon also fuses operations. If you chain filter and map, Rayon runs them in a single parallel pass. It does not create intermediate collections. This keeps memory usage low and cache performance high. The library adapts to your data. If items have varying costs, Rayon splits chunks dynamically to ensure no thread sits idle while others are busy.
Work-stealing keeps every core busy without you writing a scheduler.
Minimal example
The fastest way to parallelize a loop is to swap iter() for par_iter(). Rayon provides extension traits that add parallel methods to standard collections.
use rayon::prelude::*;

fn main() {
    // Create a collection large enough to benefit from parallelism.
    // Small collections incur overhead that outweighs speed gains.
    // i64 is used because the total (4_999_950_000) overflows i32.
    let numbers: Vec<i64> = (1..100_000).collect();

    // par_iter() creates a parallel iterator over the Vec.
    // Rayon splits the items into chunks and distributes them.
    let sum: i64 = numbers.par_iter().sum();
    println!("Sum: {}", sum);
}
The community convention is to import rayon::prelude::* at the top of the file. This brings par_iter, par_bridge, and other extension traits into scope without cluttering imports. You rarely see explicit imports of ParallelIterator.
One method call unlocks your CPU. The result is identical; the speed is not.
Walkthrough of execution
When you call par_iter(), Rayon does not spawn a new thread for every item. Spawning threads is expensive, and creating thousands of them would exhaust system resources or waste cycles on context switching. Rayon instead uses a global thread pool that is created lazily, the first time parallel work runs, and then reused for the life of the program. By default the pool has one thread per logical CPU core. You can configure this, for example with the RAYON_NUM_THREADS environment variable, but the default works for most workloads.
Rayon breaks your iterator into chunks, and the chunk size is determined dynamically. Rather than fixing a chunk size up front, Rayon recursively splits the input in half and keeps splitting while idle threads are stealing work. This adaptive splitting keeps the load balanced even if items have varying costs. Threads process their chunks independently. If a thread finishes early, it looks for work to steal from other threads. This work-stealing loop continues until all chunks are done.
Rayon merges the results. For operations like sum, Rayon computes a partial sum for each chunk and adds the partial sums together. The merge step is cheap compared with the per-item work. Rayon also handles the type system. par_iter() returns a type that implements ParallelIterator, a different trait from the standard Iterator. The methods look the same, but the implementation runs across threads. You can move between the two worlds: par_bridge() turns a serial Iterator into a parallel one, and collecting a parallel iterator into a Vec gives you a collection you can iterate serially again. This lets you parallelize specific parts of a pipeline while keeping others serial.
Rayon scales with your hardware. Add cores, get speed. No code changes.
Realistic example
Parallel iteration shines when you process large datasets with independent operations. Here is a pipeline that filters and aggregates data across threads.
use rayon::prelude::*;

/// Analyze a batch of log entries in parallel.
/// Counts errors and warnings across multiple threads.
fn analyze_logs(entries: &[String]) -> (usize, usize) {
    // par_iter() enables parallel traversal of the slice.
    // The closures capture no external state, so they are Send + Sync.
    entries
        .par_iter()
        // filter() runs in parallel.
        // Only items matching the predicate proceed.
        .filter(|entry| entry.contains("ERROR") || entry.contains("WARN"))
        // map() transforms the filtered items.
        // Each thread processes its chunk of filtered items.
        .map(|entry| {
            if entry.contains("ERROR") {
                (1, 0)
            } else {
                (0, 1)
            }
        })
        // reduce() combines results from all threads.
        // The first closure produces the identity value (0, 0);
        // Rayon may call it more than once.
        // The second closure adds two tuples together.
        .reduce(
            || (0, 0),
            |acc, item| (acc.0 + item.0, acc.1 + item.1),
        )
}

fn main() {
    let logs = vec![
        "INFO: Started".to_string(),
        "ERROR: Disk full".to_string(),
        "WARN: Low memory".to_string(),
        "ERROR: Timeout".to_string(),
    ];
    let (errors, warnings) = analyze_logs(&logs);
    println!("Errors: {}, Warnings: {}", errors, warnings);
}
The reduce method is the parallel counterpart of the serial fold, with one difference: its accumulator has the same type as the items. Rayon's own fold is different. It produces one accumulator per chunk, and the accumulator may have a different type from the items, which is useful for building local buffers. Rayon calls the reduce closure multiple times to merge partial results, and it may call the identity closure more than once. For a deterministic result, the closure must be associative and the identity must be a true identity value, such as (0, 0) for tuple addition.
Use reduce when you need to aggregate values across threads. Use fold when you want to collect items into a local buffer per thread and merge later.
Pitfalls and compiler errors
Rayon requires your data to be safe to move across threads. If you capture an Rc inside a parallel closure, the compiler rejects it with E0277 (trait bound not satisfied). Rc is not thread-safe. Swap it for Arc to share references across threads.
Parallel iteration adds overhead. Splitting work and merging results takes time. If the collection is small, or if the work per item is tiny, the parallel version can be slower than the serial one. Rayon's adaptive splitting avoids cutting the input into needlessly small pieces, but the cost of the parallel machinery still exists. Profile your code, and compare serial and parallel timings on realistic input before committing to the parallel version.
Order is not guaranteed for side effects. par_iter().for_each() does not run in order. The closure executes on whichever thread grabs the chunk. If you print inside for_each, lines from different threads appear in an unpredictable order. If you need ordered output, compute the results with map(), collect them into a Vec, and print serially afterward. collect::<Vec<_>>() preserves the original order.
Rayon uses a global thread pool. You can configure the pool size with rayon::ThreadPoolBuilder and its build_global() method. This must be done before any parallel work starts, because build_global() returns an error once the global pool has been initialized. The convention is to let Rayon pick the size. Only override it if you have specific constraints, like limiting threads in a container or dedicating cores to a specific task.
Parallelism is not a free lunch. Measure the overhead before you parallelize.
When to use Rayon
Use par_iter() when you have a large collection and an embarrassingly parallel operation where each item is independent. Use par_iter() when profiling shows the loop is the bottleneck and the work per item is substantial enough to outweigh parallel overhead. Use iter() when the collection is small or the operation is trivial; the serial loop avoids thread coordination costs. Use par_iter_mut() when you need to modify items in place and the mutation of one item does not depend on another. Use manual threads or std::thread::scope when you need fine-grained control over thread creation, or when the work cannot be expressed as an iterator pipeline.
Pick the tool that matches the complexity. Rayon wins for data-parallel loops.