How to Use Rayon for Easy Parallelism in Rust

When one core isn't enough

You're processing a million rows of CSV data. The loop runs for forty seconds. You watch the CPU monitor: one core is pinned at 100%, the other fifteen are napping. You know the work is independent. Row 42 doesn't care about row 43. You just want all those cores to help out without writing thread pools, channels, or mutexes.

Rayon turns that single-core bottleneck into a multi-core pipeline with one method call.

What Rayon actually does

Rayon is a data-parallelism library. It takes a collection and a set of operations, splits the work across your CPU cores, and reassembles the result. You write the logic as if it runs on one thread. Rayon handles the splitting, the synchronization, and the joining.

Think of a restaurant kitchen during the lunch rush. The head chef has a stack of twenty identical orders for burgers. Instead of cooking them one by one, the chef hands five burgers to each of four line cooks. Everyone grills at the same time. When the burgers are done, the head chef plates them. Rayon is the head chef. You provide the recipe and the ingredients. Rayon manages the line cooks.

You don't create threads manually. You don't pass messages between tasks. You call par_iter() on a collection, chain adapters like map and filter, finish with a consumer like sum or collect, and Rayon executes the chain in parallel.

Getting started

Add Rayon to the [dependencies] section of Cargo.toml. Rayon's API is stable across the 1.x series; this article uses 1.10.

[dependencies]
rayon = "1.10"

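Import the prelude. It brings Rayon's parallel-iterator traits (ParallelIterator, IntoParallelIterator, IntoParallelRefIterator, and related traits) into scope, which is what adds par_iter() and into_par_iter() to standard collections such as Vec and slices.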

use rayon::prelude::*;

fn main() {
    let data = vec![1, 2, 3, 4, 5];
    
    // par_iter() returns a parallel iterator.
    // sum() is a consumer that triggers execution.
    let sum: i32 = data.par_iter().sum();
    
    println!("Sum: {}", sum);
}

The syntax is identical to sequential iterators. The only difference is par in the name. Rayon follows the convention of mirroring the standard library API. If you know iter().map().collect(), you know par_iter().map().collect().

How the execution works

Rayon iterators are lazy. Calling par_iter() doesn't start any threads. It returns an iterator object that describes the work. The actual computation happens when you call a consumer method like sum(), collect(), for_each(), or reduce().

Under the hood, Rayon uses a work-stealing thread pool. The first time you run parallel code, Rayon lazily creates a global pool with one thread per logical CPU by default (the RAYON_NUM_THREADS environment variable overrides this). Because the pool is global and lives for the rest of the program, you don't pay the cost of thread creation on every parallel call.

When you invoke a consumer, Rayon recursively splits the data into smaller pieces. Each thread in the pool takes a piece and starts processing. If a thread runs out of work, it steals a pending piece from a busy thread. Because each thread mostly works from its own local queue, all cores stay fed without every thread contending on a single shared work queue.

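Work-stealing also balances uneven workloads: if some items take longer than others, idle threads pick up the remaining pieces. In most cases you don't need to tune chunk sizes; Rayon adapts to your data and your hardware.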

Transforming data in parallel

The most common pattern is transforming a collection: read each item, compute a new value, and collect the results. into_par_iter() consumes the collection and hands each closure an owned item, as in this example that turns a Vec<i32> into a Vec<i64>.

use rayon::prelude::*;

fn main() {
    let numbers = vec![1, 2, 3, 4, 5];
    
    // into_par_iter() takes ownership of the vector.
    // Each item is moved into the closure.
    let doubled: Vec<i64> = numbers
        .into_par_iter()
        .map(|n| n as i64 * 2)
        .collect();
        
    println!("{:?}", doubled);
}

The collect() method reassembles the results into a Vec, and Rayon preserves the logical order of the output. Even though threads finish their pieces in no particular order, the final vector lists items in the same order as the input.

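Use into_par_iter() when you no longer need the original collection, or when items are expensive to clone (such as String) and you want to move them into the output. Use par_iter() when you only need to read the data, and par_iter_mut() when you want to modify elements in place.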

Reductions and accumulators

Summing is a reduction. You take many values and combine them into one. Rayon provides sum(), min(), max(), and product(). For custom reductions, use reduce().

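The reduce() method takes two closures. The first returns the identity (neutral) element, like zero for addition; Rayon may call it once per piece of work, so it must not change the result. The second merges two partial results and must be associative, because Rayon groups the operations differently depending on how it splits the data.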

use rayon::prelude::*;

fn main() {
    let data = vec![1, 2, 3, 4, 5];
    
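    // copied() turns each &i32 into an i32 so items match the identity's type.
    // reduce combines partial results from threads.
    // The first argument is a closure that returns the neutral element.
    // The second argument is the merge function.
    let total: i32 = data.par_iter().copied().reduce(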
        || 0,
        |left, right| left + right
    );
    
    println!("Total: {}", total);
}

Rayon splits the data, reduces each piece locally starting from the identity, and then calls your merge function to combine the partial sums. No locks are involved: each thread works on its own partial result, and results meet only when they are merged.

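For more complex accumulation, use fold() followed by reduce(). fold() builds a local accumulator for each piece of work (roughly one per thread, though Rayon may create more). reduce() merges those accumulators at the end. This pattern is useful when building collections or other structures in parallel, especially when creating a fresh accumulator for every item would be wasteful.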

use rayon::prelude::*;

fn main() {
    let data = vec![1, 2, 3, 4, 5];
    
    // fold creates a Vec per piece of work (roughly one per thread).
    // reduce merges the Vecs into one.
    let result: Vec<i32> = data.par_iter()
        .fold(
            || vec![],
            |mut acc, &item| {
                acc.push(item * 2);
                acc
            }
        )
        .reduce(
            || vec![],
            |mut left, right| {
                left.extend(right);
                left
            }
        );
        
    println!("{:?}", result);
}

The fold closure runs on each piece of work independently, building a local vector. The reduce closure then merges the vectors from different pieces. This pattern scales well because threads never contend for a shared resource.

Use fold and reduce when you need to build a collection or complex structure in parallel. It avoids the performance hit of locking a shared accumulator.

Dynamic parallelism with scopes

Sometimes you don't know the work ahead of time. You might need to spawn tasks dynamically based on data. Rayon provides scope for this. A scope lets you spawn parallel tasks that can spawn more tasks, all within a bounded region.

fn main() {
    // scope creates a region for dynamic parallelism.
    // All spawned tasks must finish before scope returns.
    rayon::scope(|s| {
        // Spawn a task that runs in parallel.
        s.spawn(|_| {
            println!("Task A running");
        });
        
        // Spawn another task. A and B may print in either order.
        s.spawn(|_| {
            println!("Task B running");
        });
    });
    
    println!("All tasks finished");
}

Scopes are useful for recursive algorithms or when the number of tasks depends on runtime data. The scope guarantees that every spawned task completes before rayon::scope returns. That guarantee is what lets spawned tasks safely borrow local variables from the enclosing function: no task can outlive the data it references.

Scopes add flexibility. Use them when the parallelism structure isn't known at compile time.

Pitfalls and compiler errors

Rayon doesn't bypass the borrow checker. Closures passed to parallel iterators must be safe to share across threads, so the compiler stops you if a closure tries to mutate captured state.

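If you try to mutate a shared variable inside a parallel loop, the compiler rejects the code. Methods like for_each and map require the closure to implement Fn (plus Send and Sync), because several threads may call it at the same time. A closure that assigns to a captured variable only implements FnMut, so rustc reports error E0594 (cannot assign to a captured variable in a Fn closure).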

use rayon::prelude::*;

fn main() {
    let mut total = 0;
    
    // This fails. for_each requires an Fn closure, because several
    // threads may call it at once, but this closure assigns to `total`.
    // Compiler error E0594: cannot assign to `total`, as it is a captured variable in a `Fn` closure
    let v = vec![1, 2, 3];
    v.par_iter().for_each(|_| total += 1);
}

The fix is to avoid shared mutable state. Use reduce or fold to accumulate results locally per thread. If you absolutely must share state, wrap it in Mutex or AtomicUsize, but prefer functional patterns whenever possible.

Rayon has overhead. Splitting work, stealing tasks, and joining results all take time. If your loop body is trivial, Rayon can be slower than sequential code, so the work per item needs to be significant. If map does one addition, stick to sequential iteration. If map resizes an image or runs heavy math, Rayon usually wins.

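Parallel iteration doesn't guarantee the order in which items are processed. Rayon works on pieces of the collection concurrently, so side effects inside for_each or map, such as printing or writing to a file, happen in an unpredictable order. collect() still preserves the logical order of the results, so compute in parallel and perform order-sensitive side effects sequentially afterwards. If you need each item's position, use enumerate() to carry the index along (and sort by it later if necessary), or stick to sequential code.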

Profile before parallelizing. Rayon helps only when the math adds up.

When to use Rayon

Use Rayon for data-parallel tasks where each item is processed independently and the work per item is heavy enough to justify thread overhead.

Use sequential iterators when the loop body is trivial, the collection is small, or operations depend on the result of previous items.

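Use std::thread when you need task parallelism, where a few different functions run concurrently on their own threads. Use an async runtime like tokio for I/O-bound work like network requests, where tasks spend most of their time waiting rather than computing.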

Use crossbeam or std::sync::mpsc when you need fine-grained control over channels and thread scheduling, or when building a custom parallel runtime.

Pick the tool that matches the dependency graph. Independent data gets Rayon. Dependent data gets sequential code.

Where to go next

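Rayon lets you run code on every CPU core without writing threading code yourself. You hand it a collection, describe the per-item work with familiar iterator methods, and Rayon decides how to split the work and combine the results. Reach for it whenever you have a large batch of independent, CPU-heavy items and want the job done faster. From here, the Rayon API documentation is the natural next stop, particularly rayon::join for divide-and-conquer algorithms, par_sort for parallel slice sorting, and ThreadPoolBuilder for controlling how many threads Rayon uses.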