When two threads stare at each other forever
You have two threads. One needs to update a user's balance and a transaction log. The other needs to update the log and the balance. Thread A grabs the balance lock. Thread B grabs the log lock. Now Thread A waits for the log. Thread B waits for the balance. Both threads sit there, holding what the other needs, waiting for what the other holds. The program freezes.
This is a deadlock. It is the concurrency equivalent of two people standing in a narrow hallway, both stepping to the right to let the other pass, and then both stepping to the left, forever. Neither can move. Neither can yield. The system makes zero progress.
Why the compiler stays silent
Rust's borrow checker guarantees memory safety. It ensures you never use a reference after the data is gone. It ensures you never have two mutable references to the same data at the same time. It does not guarantee freedom from deadlocks.
The borrow checker operates at compile time. It sees types and lifetimes. It does not see the runtime schedule of threads. When you call Mutex::lock, the compiler sees a function that returns a guard. It checks that the guard implements the right traits. It does not track the order in which different threads acquire locks. Deadlocks are a runtime phenomenon caused by the interleaving of thread execution. The compiler cannot predict which thread will run next.
You can write perfectly valid Rust code that deadlocks instantly. The type system will accept it. The borrow checker will smile. The program will hang.
The borrow checker protects your memory. It does not protect your logic from circular waits. You are the deadlock police.
The minimal deadlock
Here is the classic pattern. Two resources, two threads, opposite lock orders.
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
// Create shared resources wrapped in Arc for thread-safe reference counting
let resource_a = Arc::new(Mutex::new(0));
let resource_b = Arc::new(Mutex::new(0));
// Spawn thread 1: locks A, then B
let a1 = Arc::clone(&resource_a);
let b1 = Arc::clone(&resource_b);
let handle1 = thread::spawn(move || {
// Acquire lock A first
let mut guard_a = a1.lock().unwrap();
// Simulate work to increase chance of interleaving
thread::sleep(std::time::Duration::from_millis(10));
// Now try to acquire lock B
let mut guard_b = b1.lock().unwrap();
*guard_a += 1;
*guard_b += 1;
});
// Spawn thread 2: locks B, then A
let a2 = Arc::clone(&resource_a);
let b2 = Arc::clone(&resource_b);
let handle2 = thread::spawn(move || {
// Acquire lock B first
let mut guard_b = b2.lock().unwrap();
// Simulate work
thread::sleep(std::time::Duration::from_millis(10));
// Now try to acquire lock A
let mut guard_a = a2.lock().unwrap();
*guard_a += 1;
*guard_b += 1;
});
handle1.join().unwrap();
handle2.join().unwrap();
}
Run this code and watch your CPU usage drop to zero while the process hangs. That is the signature of a deadlock.
Here is what happens inside. Thread 1 starts, acquires resource_a. Thread 2 starts, acquires resource_b. Thread 1 sleeps, releasing the CPU. Thread 2 sleeps, releasing the CPU. Thread 1 wakes up, tries to lock resource_b. It blocks because Thread 2 holds it. Thread 2 wakes up, tries to lock resource_a. It blocks because Thread 1 holds it. Both threads are now blocked forever. The join calls in main wait for the threads to finish. They never finish. The program is stuck.
Fix 1: Lock ordering
The most common fix is a total ordering on locks. Pick an arbitrary rule. Lock A always comes before Lock B. Every thread follows this rule. If a thread needs B and A, it still acquires A first. This breaks the circular wait condition.
// ... setup ...
let handle2 = thread::spawn(move || {
// Thread 2 also acquires A first, even though it conceptually needs B
let mut guard_a = a2.lock().unwrap();
thread::sleep(std::time::Duration::from_millis(10));
let mut guard_b = b2.lock().unwrap();
*guard_a += 1;
*guard_b += 1;
});
// ...
With this change, Thread 2 can never hold B while waiting for A. It must acquire A first. If Thread 1 holds A, Thread 2 waits. Thread 1 eventually gets B, does its work, and releases both. Thread 2 then acquires A, acquires B, and proceeds. No cycle. No deadlock.
When you have many locks, assigning a manual order can be tedious. A common technique is to sort locks by their memory address. Pointer addresses provide a total order. You can collect the locks you need, sort them, and then acquire them in order.
// Collect locks into a vector
let mut locks = vec![Arc::clone(&resource_a), Arc::clone(&resource_b)];
// Sort by pointer address to establish a deterministic order
locks.sort_by_key(|l| Arc::as_ptr(l));
// Acquire in sorted order
let mut guard1 = locks[0].lock().unwrap();
let mut guard2 = locks[1].lock().unwrap();
This approach works well when you have a dynamic set of locks. It ensures every thread acquires the same set of locks in the same order, regardless of how the locks were passed in.
Enforce a global lock order and stick to it. If you can define a "less than" relationship between your locks, deadlocks vanish.
Fix 2: Try-lock and backoff
Sometimes you cannot define a global order. Maybe the locks come from a dynamic graph where dependencies shift at runtime. Maybe the order depends on user input. In these cases, use try_lock. It returns a Result immediately instead of blocking. If it fails, you release what you hold and retry.
loop {
// Try to acquire A
let guard_a = a1.try_lock().unwrap();
// Try to acquire B
match b1.try_lock() {
Ok(guard_b) => {
// Got both, do work
*guard_a += 1;
*guard_b += 1;
break;
}
Err(_) => {
// Failed to get B. Drop A and retry.
// Dropping guard_a releases the lock immediately
drop(guard_a);
// Back off to avoid spinning and burning CPU
thread::sleep(std::time::Duration::from_micros(100));
}
}
}
This pattern avoids permanent hangs. If a thread cannot get a lock, it gives up and tries again later. The other thread eventually finishes and releases its locks. The retrying thread picks them up.
There is a pitfall here. If both threads retry at exactly the same time, they can keep bumping into each other forever. This is called livelock. Add random jitter to the sleep duration to break the symmetry.
Use try_lock when you cannot guarantee order, but always include a backoff strategy. Spinning without yielding turns a deadlock into a CPU fire.
Fix 3: Scope and granularity
Hold locks for the minimum time necessary. If you lock A, do some calculation, then lock B, you are vulnerable. The longer you hold a lock, the higher the chance another thread will need it.
// Bad: Lock A, do work, Lock B
let mut guard_a = a.lock().unwrap();
let result = expensive_computation(); // Held A during this!
let mut guard_b = b.lock().unwrap();
*guard_a = result;
// Good: Compute first, then lock
let result = expensive_computation();
let mut guard_a = a.lock().unwrap();
let mut guard_b = b.lock().unwrap();
*guard_a = result;
Rust's MutexGuard drops the lock when it goes out of scope. Use inner blocks to limit scope explicitly. This is idiomatic Rust. It makes the lifetime of the lock visible to readers.
{
let mut guard = lock.lock().unwrap();
// Critical section
} // Lock released here
Shrink your critical sections. The less time you hold a lock, the less likely you are to collide with another thread.
Pitfalls: Poisoning and recursion
Rust's Mutex has a feature called poisoning. If a thread panics while holding a lock, the lock becomes poisoned. Subsequent calls to lock return Err(PoisonError). This prevents other threads from using data that might be in an inconsistent state.
You must handle this. If you call unwrap() on a poisoned lock, your thread panics too. This can cascade and kill your entire program.
match lock.lock() {
Ok(guard) => {
// Use guard normally
}
Err(poisoned) => {
// Recover the guard. The data might be inconsistent,
// but the lock is usable again.
let guard = poisoned.into_inner();
// Use guard, perhaps with validation
}
}
Decide upfront whether a poisoned lock is a fatal error or a recoverable condition. In many systems, you can recover the guard and continue, assuming the data corruption is localized. In safety-critical code, you might treat poisoning as a system failure.
Another pitfall is recursive locking. Rust's Mutex is not recursive. If you try to lock it again from the same thread, it deadlocks immediately. The thread waits for itself. Use a recursive mutex wrapper if you absolutely need re-entrancy, though re-entrancy is often a code smell. It usually means your lock granularity is too coarse.
Handle PoisonError explicitly. A panicked thread does not release its locks; it marks them as toxic. Your code must decide whether to treat a poisoned lock as a fatal error or recover the data.
Decision: Choosing your strategy
Use lock ordering when you have a fixed set of resources and can define a consistent acquisition hierarchy. Use try_lock with backoff when lock dependencies are dynamic or circular by nature, and you can tolerate retry logic. Use channels when threads need to exchange data rather than share mutable state; message passing eliminates the need for locks entirely. Use atomic types for simple counters or flags where you need lock-free updates without the overhead of a Mutex. Use RwLock when you have many readers and few writers, but remember that RwLock can still deadlock if you nest locks incorrectly.
Pick the tool that matches your access pattern. If you are reaching for locks, ask if you could pass a message instead.