How to Share State Between Threads Without a Mutex (Atomics)

The scoreboard problem

You are running a trivia night. Ten tables are shouting answers. You need a scoreboard that updates instantly as answers come in. If there is a single pen, every table has to wait for it to arrive, write its score, and pass it on. That is a bottleneck. You want a digital board where any table can press a button and the number jumps up immediately, without anyone waiting for a turn.

That is what atomics give you. They let threads update shared values like counters or flags without locking a mutex, so no thread waits in a queue for its turn. You get thread-safe mutation at roughly the cost of a single hardware instruction.

Atomics are indivisible operations

An atomic type is a value that supports operations which happen in one indivisible step. The CPU guarantees that no other thread can see the value in the middle of the operation. It is like a light switch that snaps from off to on. You never see it halfway.

In Rust, Mutex<T> protects any type by locking it. Atomics are standard library types in std::sync::atomic, such as AtomicUsize or AtomicBool, that carry their own thread-safety guarantees. You do not lock them. You call a method like fetch_add or store, and the hardware handles the rest.
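One detail makes this possible: atomic methods such as fetch_add and store take &self, not &mut self. This is interior mutability, and it is why a shared reference is enough to change the value. A minimal sketch; the helper name `record_hit` is hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical helper: bumps a counter through a *shared* reference.
/// No `&mut` is needed because `fetch_add` takes `&self`.
fn record_hit(hits: &AtomicUsize) -> usize {
    // Returns the value the counter held *before* the increment.
    hits.fetch_add(1, Ordering::SeqCst)
}

fn main() {
    let hits = AtomicUsize::new(0);
    let a = &hits; // two shared borrows alive at once,
    let b = &hits; // which an API taking `&mut` would forbid
    record_hit(a);
    record_hit(b);
    println!("hits = {}", hits.load(Ordering::SeqCst));
}
```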

To share an atomic across threads created with thread::spawn, you typically wrap it in Arc. Arc handles the reference counting for the pointer, while the atomic handles the safety of the value inside. Arc does not make data Send or Sync on its own: Arc<T> is Send and Sync only when T is both. AtomicUsize is, so Arc<AtomicUsize> can move into other threads, and the atomic ensures that mutations to the value do not race.
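Arc is needed with thread::spawn because the closure must own everything it touches. It is not the only option: std::thread::scope (stable since Rust 1.63) joins every thread it spawns before returning, so those threads can borrow a local atomic directly. A minimal sketch; the helper name `count_in_scope` is hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: spawns `n` scoped threads that each increment a
/// counter borrowed from this stack frame. No Arc is needed because
/// `thread::scope` joins every thread before it returns.
fn count_in_scope(n: usize) -> usize {
    let counter = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..n {
            // Each closure borrows `counter`; the scope guarantees the
            // borrow cannot outlive the local variable.
            s.spawn(|| {
                counter.fetch_add(1, Ordering::SeqCst);
            });
        }
    }); // all scoped threads have been joined here
    counter.load(Ordering::SeqCst)
}

fn main() {
    println!("Result: {}", count_in_scope(10));
}
```

Use Arc when threads must outlive the function that creates them; use a scope when they do not.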

Trust the hardware for simple values. Trust the mutex for complex ones.

Minimal counter

Here is a counter shared across ten threads. Each thread increments the counter. No mutex is involved.

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // Arc wraps the atomic so multiple threads can own the pointer.
    // The atomic itself handles thread-safe mutations.
    let counter = Arc::new(AtomicUsize::new(0));
    let mut handles = vec![];

    for _ in 0..10 {
        // Clone the Arc, not the atomic.
        // This bumps the reference count, not the counter value.
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            // fetch_add increments and returns the old value.
            // SeqCst is the strongest ordering, safe for most cases.
            counter.fetch_add(1, Ordering::SeqCst);
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    // Load the final value to print it.
    println!("Result: {}", counter.load(Ordering::SeqCst));
}

Convention aside: use Arc::clone(&counter) instead of counter.clone(). Both compile and both work. The explicit form signals to readers that you are cloning the reference, not deep-copying the data. It prevents the "wait, did this copy the whole struct?" question during code review.

No lock. No wait. Just math.

What happens under the hood

When you call Arc::new, the AtomicUsize lands on the heap. The reference count starts at one. Inside the loop, Arc::clone creates a new pointer to the same heap allocation and increments the reference count. The threads get their own Arc handles.

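When a thread calls fetch_add, the CPU performs a read-modify-write as one indivisible step: it reads the value, adds one, and writes it back, while preventing other cores from changing the value in between. On x86 this is a single locked instruction; some other architectures use a short load-linked/store-conditional retry loop. No lock is acquired, no thread is put to sleep waiting for its turn, and there is no queue. Under heavy contention the cores still compete for the same cache line, so an atomic is cheap, not free.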
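To see why a single read-modify-write matters, compare it with a separate load followed by a store. Each call is atomic on its own, but another thread can write between them, and its update gets overwritten. This is a logic race, not undefined behavior. A minimal sketch; both function names are hypothetical, and `racy_total` exists only as a counterexample.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Counterexample: load and store are each atomic, but the pair is not.
/// Two threads can load the same value and both store value + 1,
/// so increments can be lost. The result is never *more* than expected.
fn racy_total(threads: usize, per_thread: usize) -> usize {
    let counter = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    let v = counter.load(Ordering::SeqCst);
                    counter.store(v + 1, Ordering::SeqCst);
                }
            });
        }
    });
    counter.load(Ordering::SeqCst)
}

/// Correct version: fetch_add reads, adds, and writes in one atomic step.
fn atomic_total(threads: usize, per_thread: usize) -> usize {
    let counter = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    counter.fetch_add(1, Ordering::SeqCst);
                }
            });
        }
    });
    counter.load(Ordering::SeqCst)
}

fn main() {
    println!("racy:   {}", racy_total(8, 100_000));
    println!("atomic: {}", atomic_total(8, 100_000));
}
```

The racy version may happen to print the full count on a quiet machine; the point is that nothing guarantees it.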

The fetch_add method returns the value the atomic held before the addition. That makes it useful for handing out unique IDs: every caller receives a different old value, so no two callers ever get the same number. You get the old state, so you can react to it.
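A unique-ID generator is the textbook use of that return value. This sketch assumes a program-wide `static` counter, which also shows a third way to share an atomic without Arc. `NEXT_ID`, `next_id`, and `ids_from_threads` are hypothetical names. Relaxed is used because no other memory depends on the ID; SeqCst would also be correct.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Hypothetical global ID source. A `static` atomic needs no Arc:
// it lives for the whole program, so every thread can reach it.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

/// Returns an ID that no other caller has received.
/// Relaxed is enough: only the counter itself matters here.
fn next_id() -> usize {
    NEXT_ID.fetch_add(1, Ordering::Relaxed)
}

/// Hypothetical helper: collects IDs handed out on several threads.
fn ids_from_threads(threads: usize, per_thread: usize) -> Vec<usize> {
    let mut handles = Vec::new();
    for _ in 0..threads {
        handles.push(thread::spawn(move || {
            (0..per_thread).map(|_| next_id()).collect::<Vec<usize>>()
        }));
    }
    handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect()
}

fn main() {
    let ids = ids_from_threads(4, 1_000);
    let unique: HashSet<usize> = ids.iter().copied().collect();
    println!("{} ids, {} unique", ids.len(), unique.len());
}
```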

When a thread's closure finishes, its Arc is dropped, decrementing the reference count. The AtomicUsize stays alive until the last Arc is gone, and then its memory is reclaimed automatically.
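You can watch this lifecycle with Arc::strong_count. The sketch below records the count after creation, after one clone, and after the cloning thread has been joined. The helper name `refcount_stages` is hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: returns
/// (count after new, count after clone, count after join, counter value).
fn refcount_stages() -> (usize, usize, usize, usize) {
    let counter = Arc::new(AtomicUsize::new(0));
    let after_new = Arc::strong_count(&counter); // only `counter` owns it

    let worker_handle = Arc::clone(&counter);
    let after_clone = Arc::strong_count(&counter); // same allocation, two owners

    let t = thread::spawn(move || {
        worker_handle.fetch_add(1, Ordering::SeqCst);
        // `worker_handle` is dropped when this closure returns.
    });
    t.join().unwrap();
    let after_join = Arc::strong_count(&counter); // back to a single owner

    (after_new, after_clone, after_join, counter.load(Ordering::SeqCst))
}

fn main() {
    println!("{:?}", refcount_stages());
}
```

The clone changed the reference count, never the counter value; only fetch_add touched that.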

Real-world flag

Counters are the classic example, but flags are often more useful. An AtomicBool lets you signal state changes between threads. Here is a worker loop that stops when a flag flips.

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// A worker that checks a shared flag to decide whether to keep running.
fn worker(flag: Arc<AtomicBool>, id: usize) {
    // Loop until the flag is set to false.
    // Relaxed is sufficient here: the flag guards no other data,
    // so we only need its current value.
    while flag.load(Ordering::Relaxed) {
        println!("Worker {id} is busy...");
        thread::sleep(Duration::from_millis(100));
    }
    println!("Worker {id} stopping.");
}

fn main() {
    // `running` starts true so the workers keep looping until it flips.
    let running = Arc::new(AtomicBool::new(true));
    let mut handles = vec![];

    for id in 0..3 {
        let flag = Arc::clone(&running);
        handles.push(thread::spawn(move || worker(flag, id)));
    }

    // Let workers run for a bit.
    thread::sleep(Duration::from_secs(1));

    // Flip the flag to false. `store` overwrites the value atomically;
    // each worker notices on a later load.
    running.store(false, Ordering::SeqCst);

    for handle in handles {
        handle.join().unwrap();
    }
}

The store method writes a new value; the load method reads it. Each worker polls the flag once per loop iteration. After the main thread calls store, the workers see false on a subsequent load, in practice the next one, so each worker stops within roughly one sleep interval (about 100 ms here).

One store. Every worker hears it.

The ordering trap

Atomics are not just about updating a value. They are also about memory ordering. The Ordering parameter tells the compiler and CPU how this operation relates to other memory accesses. Getting this wrong leads to bugs that may appear only under heavy load, or only on weakly ordered hardware such as ARM.

Ordering::SeqCst is the conservative choice. Rust has no default ordering; every atomic call names one explicitly. SeqCst stands for "sequentially consistent": all threads agree on a single total order of every SeqCst operation. It matches the intuition of a single timeline. Use it unless you have measured a performance problem and understand the alternatives.
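The classic test of that single timeline is the store-buffering pattern. Two threads each set their own flag, then read the other's. With SeqCst on all four operations, the later of the two loads in the total order must see the other thread's store, so both reads returning false is impossible. With Release stores and Acquire loads instead, that both-false outcome is allowed, and x86's store buffer can produce it in practice. A minimal sketch; the helper name `store_buffer_round` is hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Hypothetical helper: one round of the store-buffering test.
/// Returns (what thread A saw of y, what thread B saw of x).
fn store_buffer_round() -> (bool, bool) {
    let x = AtomicBool::new(false);
    let y = AtomicBool::new(false);
    thread::scope(|s| {
        let a = s.spawn(|| {
            x.store(true, Ordering::SeqCst);
            y.load(Ordering::SeqCst)
        });
        let b = s.spawn(|| {
            y.store(true, Ordering::SeqCst);
            x.load(Ordering::SeqCst)
        });
        (a.join().unwrap(), b.join().unwrap())
    })
}

fn main() {
    for _ in 0..1_000 {
        let (a_saw_y, b_saw_x) = store_buffer_round();
        // SeqCst forbids the outcome where neither thread saw the other.
        assert!(a_saw_y || b_saw_x);
    }
    println!("no round saw both flags unset");
}
```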

Ordering::Relaxed guarantees only that the operation on this one atomic is indivisible. It makes no promises about how other memory accesses are ordered around it. If you use Relaxed to set a flag that guards other data, another thread might see the flag update but still read stale values for the data. The compiler will not stop you: safe Rust rules out data races, but not this kind of logic race.

Ordering::Acquire and Ordering::Release work in pairs. When an Acquire load reads the value written by a Release store to the same atomic, every write the storing thread made before that store is visible to the loading thread after the load. This is lighter than SeqCst but requires careful reasoning.
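Here is the pairing as a publish/consume sketch. One thread writes a payload, then sets a ready flag with Release; the other spins on the flag with Acquire, then reads the payload. The payload is itself an AtomicUsize so the sketch stays in safe Rust; the same pattern applies when the payload is ordinary data. The helper name `publish_and_read` is hypothetical.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: one thread publishes a payload, another waits
/// for the ready flag and then reads the payload.
fn publish_and_read() -> usize {
    let data = AtomicUsize::new(0);
    let ready = AtomicBool::new(false);
    thread::scope(|s| {
        s.spawn(|| {
            // 1. Write the payload. Relaxed is fine for this store...
            data.store(42, Ordering::Relaxed);
            // 2. ...because this Release store publishes every write above it.
            ready.store(true, Ordering::Release);
        });
        let reader = s.spawn(|| {
            // Spin until the flag is set. An Acquire load that reads `true`
            // synchronizes with the Release store above.
            while !ready.load(Ordering::Acquire) {
                hint::spin_loop();
            }
            // Guaranteed to see 42, never the stale 0.
            data.load(Ordering::Relaxed)
        });
        reader.join().unwrap()
    })
}

fn main() {
    println!("reader saw {}", publish_and_read());
}
```

If both flag operations were Relaxed, the reader could see ready == true and still read 0.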

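If you put a plain usize inside an Arc and try to mutate it, the compiler rejects the code. Arc hands out only shared references, so taking a mutable borrow fails with E0596 (cannot borrow data in an `Arc` as mutable). Switching to Arc<Cell<usize>> or Arc<RefCell<usize>> and moving it into a thread fails with E0277 instead, because those types are not Sync. The type system forces you to pick the right tool. In safe Rust, atomics are the standard way to mutate a shared integer without a lock.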

Profile before you relax. SeqCst is your friend.

Pitfalls and limits

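Atomics have strict limits. You cannot put a String, Vec, or custom struct inside an atomic. The standard library provides atomic versions of bool, the integer types up to 64 bits, and raw pointers (AtomicPtr), and which widths are available depends on the target platform. If you need to share complex state, use Mutex or RwLock.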

You cannot perform compound operations atomically without help. If you need to read a value, compute something, and write the result back only if the value hasn't changed in the meantime, you need compare_exchange. Because another thread may change the value first, you wrap compare_exchange in a loop that retries with the fresh value on failure. It is powerful but error-prone. Most code does not need this.
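A typical compare_exchange loop is a counter that must never pass a cap. This is a minimal sketch; `increment_up_to` and `race_to_cap` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: increments `counter` unless it already equals `max`.
/// Returns true if this call performed the increment.
fn increment_up_to(counter: &AtomicUsize, max: usize) -> bool {
    let mut current = counter.load(Ordering::SeqCst);
    loop {
        if current >= max {
            return false; // at the cap: leave the value alone
        }
        // Write current + 1 only if the value is still `current`.
        match counter.compare_exchange(current, current + 1, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(_) => return true,
            // Another thread got there first; retry with the value it left.
            Err(actual) => current = actual,
        }
    }
}

/// Hypothetical driver: many threads race to push the counter to `max`.
/// Returns (successful increments, final value).
fn race_to_cap(threads: usize, attempts: usize, max: usize) -> (usize, usize) {
    let counter = AtomicUsize::new(0);
    let wins = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..attempts {
                    if increment_up_to(&counter, max) {
                        wins.fetch_add(1, Ordering::SeqCst);
                    }
                }
            });
        }
    });
    (wins.load(Ordering::SeqCst), counter.load(Ordering::SeqCst))
}

fn main() {
    println!("{:?}", race_to_cap(8, 100, 50));
}
```

The standard library's fetch_update wraps this same retry loop around a closure. Inside a loop you can also use compare_exchange_weak, which may fail spuriously but can be cheaper on some architectures.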

Do not use atomics for complex logic. The mental model for memory orderings is harder than the mental model for a mutex. If you can use a mutex, use a mutex. Atomics are for when the mutex gets in the way.

When to use atomics

Use AtomicUsize or AtomicBool when you need a counter, a flag, or a simple state machine that fits in a single machine word.
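A small state machine fits in an AtomicU8, with compare_exchange guarding each transition so that only one thread can win it. This sketch assumes a job that must start at most once; `Job`, its states, and `count_starters` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::thread;

// Hypothetical states for a job that must start at most once.
const IDLE: u8 = 0;
const RUNNING: u8 = 1;
const DONE: u8 = 2;

struct Job {
    state: AtomicU8,
}

impl Job {
    fn new() -> Self {
        Job { state: AtomicU8::new(IDLE) }
    }

    /// IDLE -> RUNNING. Exactly one caller can win this transition.
    fn try_start(&self) -> bool {
        self.state
            .compare_exchange(IDLE, RUNNING, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok()
    }

    /// RUNNING -> DONE. Fails if the job never started.
    fn finish(&self) -> bool {
        self.state
            .compare_exchange(RUNNING, DONE, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok()
    }

    fn state(&self) -> u8 {
        self.state.load(Ordering::SeqCst)
    }
}

/// Hypothetical driver: many threads try to start the same job.
/// Returns (number of winners, whether finish succeeded, final state).
fn count_starters(threads: usize) -> (usize, bool, u8) {
    let job = Job::new();
    let wins = thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..threads {
            handles.push(s.spawn(|| job.try_start()));
        }
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .filter(|won| *won)
            .count()
    });
    let finished = job.finish();
    (wins, finished, job.state())
}

fn main() {
    println!("{:?}", count_starters(16));
}
```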

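Use AtomicPtr when you are building a lock-free data structure and need to swap pointers atomically. The swap itself is safe, but dereferencing the pointer requires unsafe code, and deciding when the old pointee can be freed while other threads may still be reading it is the hard part.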

Reach for Mutex<T> when the shared state is complex, like a Vec or a HashMap, or when you need to perform multiple reads and writes as a single transaction.
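A transfer between two balances is the kind of transaction two separate atomics cannot express: each update would be atomic, but another thread could observe the money missing from one account before it appears in the other. With both balances behind one Mutex, the check and both writes happen under a single lock. `Ledger` and `run_transfers` are hypothetical names.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical two-account ledger. Both balances live behind one Mutex,
/// so nobody can observe a half-finished transfer.
struct Ledger {
    balances: Mutex<[i64; 2]>,
}

impl Ledger {
    fn transfer(&self, from: usize, to: usize, amount: i64) -> bool {
        let mut b = self.balances.lock().unwrap();
        if b[from] < amount {
            return false; // insufficient funds: nothing changes
        }
        b[from] -= amount;
        b[to] += amount;
        true
    } // the lock is released when `b` goes out of scope

    fn total(&self) -> i64 {
        let b = self.balances.lock().unwrap();
        b[0] + b[1]
    }
}

/// Hypothetical driver: four threads shuffle money in both directions.
fn run_transfers() -> (i64, [i64; 2]) {
    let ledger = Arc::new(Ledger { balances: Mutex::new([100, 100]) });
    let mut handles = Vec::new();
    for i in 0..4 {
        let ledger = Arc::clone(&ledger);
        handles.push(thread::spawn(move || {
            // Even-numbered threads move 0 -> 1, odd-numbered 1 -> 0.
            let (from, to) = if i % 2 == 0 { (0, 1) } else { (1, 0) };
            for _ in 0..1_000 {
                ledger.transfer(from, to, 1);
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    let snapshot = *ledger.balances.lock().unwrap();
    (ledger.total(), snapshot)
}

fn main() {
    println!("{:?}", run_transfers());
}
```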

Pick RwLock<T> when you have many readers and few writers, and readers hold the lock long enough that letting them run in parallel outweighs RwLock's extra overhead compared with Mutex.
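A read-mostly settings table is the usual RwLock shape: many threads take read() guards at the same time, and an occasional writer takes write(), which waits until no read guard is held. A minimal sketch; `read_timeouts` and the setting names are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

/// Hypothetical read-mostly settings table shared by many reader threads.
fn read_timeouts(readers: usize) -> Vec<Option<u32>> {
    let settings: RwLock<HashMap<&str, u32>> =
        RwLock::new(HashMap::from([("timeout_ms", 500)]));
    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..readers {
            // Any number of `read()` guards can be held at the same time.
            handles.push(s.spawn(|| {
                settings.read().unwrap().get("timeout_ms").copied()
            }));
        }
        // `write()` waits until no read guard is held, then has exclusive access.
        settings.write().unwrap().insert("retries", 3);
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    println!("{:?}", read_timeouts(8));
}
```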

Avoid atomics for complex logic. The cognitive cost of reasoning about orderings outweighs the performance gain for most applications.

Reach for the mutex first. Atomics are for when the mutex gets in the way.

Where to go next

Atomic types let multiple threads safely update a single value without a lock. Think back to the trivia scoreboard: any table can press its button at any moment, and the hardware makes sure no press is lost or counted twice. Use atomics when you need to share a simple value like a counter or flag between threads and want less overhead than a Mutex.