How to Build Recursive Data Structures in Rust (Linked Lists, Trees)

The infinite box problem

You're building a parser for a nested configuration format. You define a Config struct. It has a name and a list of sub-configs. You write sub_configs: Vec<Config>. You hit compile. Rust rejects you. The error isn't about your logic. It's about size. The compiler stops you because Config would have to contain itself, which means it would have to contain itself, forever. The size would be infinite. The compiler refuses to allocate infinite stack space for a single variable.

This hits every Rustacean early. You want a linked list, a tree, or a graph. You try to nest the type inside itself. The compiler says no. The solution isn't to fight the compiler. The solution is to break the cycle. You don't put the next node inside the current node. You put a pointer to the next node inside. The pointer has a fixed size. The actual data lives on the heap. Rust gives you tools for this: Box<T>, Rc<T>, and Option.

Breaking the chain with `Box`

Think of a struct like a physical box. The compiler needs to know exactly how big the box is so it can set aside the right amount of space on the stack. If the box contains another box of the exact same type, that inner box also contains another box, and so on. The outer box would have to be infinitely large to hold the infinite chain of inner boxes.

Box<T> solves this. Box is a smart pointer. It owns data on the heap. The Box itself is just a pointer. Pointers have a fixed size. On a 64-bit system, a pointer is 8 bytes. The compiler doesn't care how big the data inside the box is. It only cares about the pointer. When you write Box<List>, you're telling the compiler: "The next node lives somewhere else. I'm just holding an address to it." The address has a finite size. The recursion breaks.

#[derive(Debug)]
enum List {
    // Cons holds a value and a pointer to the next node.
    // Box<List> moves the next node to the heap.
    // The Box itself has a fixed size (a pointer).
    Cons(i32, Box<List>),
    // Nil marks the end of the list.
    // This variant has no recursive field, so it has a fixed size.
    Nil,
}

fn main() {
    // Build the list from the inside out.
    // Each Box::new allocates memory on the heap.
    let list = List::Cons(1, Box::new(
        List::Cons(2, Box::new(List::Nil))
    ));
    
    println!("{:?}", list);
}

The compiler calculates the size of List. It looks at Cons. Cons contains an i32 (4 bytes) and a Box<List>. Box<List> is a pointer (8 bytes). The size of Cons is 4 + 8 plus some padding and a discriminant to track which variant is active. It's finite. Nil has no fields. It's tiny. The enum size is the largest variant plus the discriminant. The data structure can grow infinitely deep, but the type definition remains finite.

Break the cycle with a pointer. The compiler needs a finite size, and Box gives it one.

Structs, enums, and the base case

The example above uses an enum. This is idiomatic for algebraic data types. Cons and Nil represent two distinct states: a node with data and a node that ends the list. You can also use a struct with Option.

#[derive(Debug)]
struct Node {
    value: i32,
    // Option handles the base case.
    // None means there is no next node.
    // Some(Box<Node>) means there is a next node on the heap.
    next: Option<Box<Node>>,
}

fn main() {
    let node = Node {
        value: 1,
        next: Some(Box::new(Node {
            value: 2,
            next: None,
        })),
    };
    
    println!("{:?}", node);
}

Option serves two purposes here. It breaks the recursion by wrapping the pointer. It also represents the absence of a next node. None is the base case. This pattern is common when the node always exists but the link might not. Enums are better when the variants represent fundamentally different shapes. Structs with Option are better when the shape is uniform but fields are optional.

Pick the shape that matches your data. Use enums for distinct states. Use structs with Option when the node always exists but the link might not.

Shared ownership with `Rc`

Box gives you single ownership. One parent owns the child. What if multiple parents need to point to the same child? This happens in graphs, syntax trees, or UI component trees where a widget is reused in multiple places. Box can't handle this. Dropping one Box would free the data, invalidating the other pointers.

Use Rc<T> (Reference Counted) for shared ownership. Rc keeps a count of how many references point to the data. When you clone an Rc, you don't copy the data. You bump the counter. When an Rc is dropped, the counter decrements. When the counter hits zero, the data is freed.

use std::rc::Rc;

#[derive(Debug)]
struct Node {
    value: i32,
    // Children are shared.
    // Multiple nodes can point to the same child.
    children: Vec<Rc<Node>>,
}

fn main() {
    // Create a shared leaf node.
    // Rc::new puts the node on the heap and sets the reference count to 1.
    let shared_leaf = Rc::new(Node {
        value: 42,
        children: vec![],
    });

    // Create two parents that share the leaf.
    // Rc::clone bumps the reference count.
    // The leaf now has three owners: shared_leaf, parent_a, and parent_b.
    let parent_a = Rc::new(Node {
        value: 10,
        children: vec![Rc::clone(&shared_leaf)],
    });

    let parent_b = Rc::new(Node {
        value: 20,
        children: vec![Rc::clone(&shared_leaf)],
    });

    println!("Leaf value: {}", shared_leaf.value);
}

Community convention prefers Rc::clone(&shared_leaf) over shared_leaf.clone(). Both compile. Both work. The explicit form signals to readers that you're cloning the reference, not the data. shared_leaf.clone() looks like it might deep-copy the node, which can confuse reviewers. Write Rc::clone to be clear.

Shared ownership costs memory and CPU. Pay that cost only when multiple owners are real.

Interior mutability and `RefCell`

Rc gives you shared ownership, but the data is immutable. You can't modify a node through an Rc. If you need to update values in a shared tree, you need interior mutability. RefCell<T> moves borrow checking from compile time to runtime. It allows you to mutate data even when you only have an immutable reference to the container.

Combine Rc and RefCell for mutable shared structures.

use std::rc::Rc;
use std::cell::RefCell;

#[derive(Debug)]
struct Node {
    value: i32,
    // RefCell allows mutation without borrowing the whole tree.
    // Rc allows multiple owners.
    children: Vec<Rc<RefCell<Node>>>,
}

fn main() {
    let leaf = Rc::new(RefCell::new(Node {
        value: 42,
        children: vec![],
    }));

    let root = Rc::new(RefCell::new(Node {
        value: 10,
        children: vec![Rc::clone(&leaf)],
    }));

    // Modify the leaf through the root's reference.
    // borrow_mut checks at runtime that no one else is holding a reference.
    let mut leaf_borrow = leaf.borrow_mut();
    leaf_borrow.value = 100;
    
    println!("Updated value: {}", leaf_borrow.value);
}

RefCell tracks borrows at runtime. If you call borrow_mut while an immutable borrow is active, the program panics. This is a runtime guarantee, not a compile-time one. Use RefCell when the borrow checker is too strict for your access patterns, and you can guarantee safety through logic.

Treat RefCell panics as logic errors. If the borrow panics, your access pattern is broken.

Pitfalls and errors

Recursive structures introduce specific failure modes. Watch for these.

Infinite size error. If you forget the pointer wrapper, the compiler rejects you with E0072 (recursive type has infinite size). You must use Box, Rc, or a raw pointer.

Reference cycles. Rc prevents leaks by dropping data when the count hits zero. If two nodes point to each other, the count stays above zero even when the external references are gone. The memory leaks. Use Weak pointers to break cycles. Weak doesn't increment the reference count. It's an observer. When all Rc owners are gone, the data drops, and Weak pointers become invalid.

Runtime panics. RefCell panics on borrow violations. This is harder to debug than compile errors. Keep RefCell usage localized. Don't wrap the entire world in RefCell. Use it only where interior mutability is necessary.

Cache locality. Recursive structures allocate each node on the heap. This scatters data in memory. Traversing a linked list causes cache misses. If performance matters, consider flat arrays or arena allocators. Pointers are flexible, but they hurt cache performance.

Debug recursion limit. std::fmt::Debug has a recursion limit. If you print a huge recursive structure, it truncates the output to prevent stack overflow during formatting. This is a safety feature. Don't rely on println!("{:?}", huge_tree) to debug deep structures. Write a custom iterator or printer.

Treat the Debug limit as a hint. Deep structures need custom inspection tools.

Decision matrix

Use Box<T> when you have a single owner and need to break recursion. Use Box<T> for linked lists and trees where each node is owned by exactly one parent. Use Box<T> when you want minimal overhead and strict ownership.

Use Rc<T> when multiple parts of your data structure need to share ownership of a node. Use Rc<T> for graphs or syntax trees where subtrees are referenced from multiple locations. Use Rc<T> when the data is immutable after creation.

Use Rc<RefCell<T>> when you need shared ownership and interior mutability. Use Rc<RefCell<T>> for mutable graphs where you update node values without reborrowing the entire structure. Use Rc<RefCell<T>> when the borrow checker rejects your access pattern but you can prove safety at runtime.

Use Arc<T> when the recursive structure crosses thread boundaries. Use Arc<T> for concurrent parsers or multi-threaded caches. Use Arc<T> when you need atomic reference counting.

Use Weak<T> to break reference cycles. Use Weak<T> for parent pointers in trees where the child owns the parent reference but the parent shouldn't keep the child alive. Use Weak<T> to avoid leaks in bidirectional graphs.

Use Option<Box<T>> when the recursive link is optional. Use Option<Box<T>> in structs to represent a nullable next pointer. Use Option to handle the base case explicitly.

Reach for plain references when lifetimes are simple. The pointer alternative is rarely worth it for simple recursion.

Where to go next

Rust needs to know exactly how much memory a variable uses before the program runs. A recursive structure like a linked list could theoretically be infinite, making its size impossible to calculate. By wrapping the next item in a Box, you move that part to the heap where the size doesn't matter, leaving only a fixed-size pointer on the stack.