How to Use pest for PEG Parsing in Rust

When regex hits the wall

You are building a tool that needs to understand a custom format. Maybe a config file for a game engine, a mini-language for a build script, or a domain-specific query syntax. You could write a recursive descent parser by hand, managing state and backtracking manually. Or you could reach for pest.

pest is a Parsing Expression Grammar (PEG) library for Rust. You describe the structure of your text in a dedicated grammar file, and the crate generates the parser for you at compile time. Regex is great for matching patterns, but it falls apart when you need to build a tree of meaning. pest gives you that tree: every matched rule becomes a node that borrows its text from the input rather than copying it.

PEGs and the pest workflow

PEG stands for Parsing Expression Grammar. It is a formalism for describing languages in which choice is ordered, so a grammar is never ambiguous. If two alternatives in a choice could match the same input, the one listed first wins. This makes PEGs easier to reason about than traditional context-free grammars, which can be ambiguous and require separate disambiguation logic.

Think of a PEG as a decision tree that the parser walks. At each choice, it tries the alternatives in order. If one matches, it commits to that alternative and moves on. If not, it backtracks to where the choice started and tries the next one. pest turns this decision tree into Rust code at compile time, so no grammar is interpreted at run time, and the parser is still defined in a readable, declarative syntax.
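
To make "first alternative wins" concrete, here is a minimal hand-written sketch of ordered choice in plain Rust. It is not how pest is implemented. The helper names literal and ordered_choice are hypothetical and exist only for this illustration.

```rust
/// Try to match `lit` at the start of `input`.
/// Returns the number of bytes consumed on success.
fn literal(input: &str, lit: &str) -> Option<usize> {
    if input.starts_with(lit) {
        Some(lit.len())
    } else {
        None
    }
}

/// PEG ordered choice: try each alternative in order and commit to the
/// first one that matches. Later alternatives are never consulted once
/// an earlier one succeeds, even if they would match more input.
fn ordered_choice(input: &str, alternatives: &[&str]) -> Option<usize> {
    alternatives.iter().find_map(|lit| literal(input, lit))
}
```

With the alternatives ["if", "ifx"], the input "ifx" consumes only 2 bytes, because "if" is listed first and succeeds. Swapping the order consumes all 3. The same reasoning applies to a pest grammar: put the longer alternative before any alternative that is a prefix of it.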

The workflow has three parts. You add pest and pest_derive to your dependencies. You write a .pest file that defines your grammar rules. You use the #[derive(Parser)] macro in Rust to generate the parser struct. The macro reads the grammar file, generates a Rule enum, and implements the parsing logic.

Minimal example

Start by adding the dependencies. You need both pest for the runtime and pest_derive for the macro. The versions must match.

[dependencies]
pest = "2.7"
pest_derive = "2.7"

Create a grammar file named grammar.pest in your src directory, next to main.rs. Define the rules. The WHITESPACE rule is special. It must be named exactly WHITESPACE for pest to apply it automatically between the elements of sequences and repetitions. The _ prefix makes it a silent rule, meaning it matches but does not appear in the parse tree.

// grammar.pest

// WHITESPACE is a silent rule. It gets consumed automatically between tokens.
// The name must be exactly WHITESPACE for auto-consumption to work.
WHITESPACE = _{ " " | "\t" | "\n" }

// start is the entry point. It matches one or more numbers or words.
start = { (number | word)+ }

// @ makes a rule atomic: no implicit WHITESPACE is inserted between its parts,
// and any rules it calls do not produce their own pairs.
number = @{ ASCII_DIGIT+ }
word = @{ ASCII_ALPHA+ }

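Import the grammar into your Rust code. The path in the #[grammar] attribute is resolved relative to your crate's src directory, not relative to the current source file. The derive macro implements the parser for the MyParser struct you declare and generates the Rule enum. Calling MyParser::parse also requires the pest::Parser trait to be in scope.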

use pest::Parser;
use pest_derive::Parser;

/// A parser generated from grammar.pest.
#[derive(Parser)]
#[grammar = "grammar.pest"]
struct MyParser;

fn main() {
    // Parse the input string starting from the 'start' rule.
    // unwrap() panics on error; in real code, handle the Result.
    let pairs = MyParser::parse(Rule::start, "hello 123").unwrap();

    // Iterate over the top-level pairs.
    for pair in pairs {
        // as_rule() returns the Rule variant.
        println!("{:?}", pair.as_rule());
    }
}

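Run the code. The top-level iterator holds a single pair for the rule you asked for, so the output is one line. The word and number pairs are its children. Rule variants carry the same names as the grammar rules, lowercase included.

start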

Name your whitespace rule WHITESPACE. If you misspell it, your parser will reject valid input with confusing errors.

Walking the parse tree

When you call MyParser::parse, pest returns a Pairs iterator. Each item in the iterator is a Pair. A Pair represents a matched rule. It contains the rule name, the span of text it matched, and any inner pairs if the rule has children.

You can extract the matched string with as_str(). You can get the byte range with as_span(). You can iterate over children with into_inner(). The into_inner method consumes the pair and returns an iterator over its direct children.

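use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "grammar.pest"]
struct MyParser;

fn main() {
    let mut pairs = MyParser::parse(Rule::start, "hello 123").unwrap();

    // The top-level iterator yields one pair: the start rule itself.
    let start = pairs.next().unwrap();

    // into_inner() descends to the word and number pairs.
    for pair in start.into_inner() {
        // pair.as_rule() is Rule::word or Rule::number.
        // pair.as_str() is "hello" or "123".
        println!("Rule: {:?}, Text: '{}'", pair.as_rule(), pair.as_str());
    }
}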

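The Rule enum is generated alongside the parser struct, in the same module, so it is in scope wherever the struct is defined. From another module, import it like any other item in that module. Its variants carry the exact names used in the grammar, which is why Rust code refers to Rule::start or Rule::word rather than to CamelCase names.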

Realistic example: Config parser

A minimal example shows the mechanics. A realistic example shows how to extract data. Consider a simple key-value config format. Each line has a key, an equals sign, and a value.

// config.pest

// Newlines are not whitespace here: they separate entries.
WHITESPACE = _{ " " | "\t" }

// SOI and EOI ensure the entire input is consumed.
// Without them, trailing garbage would be silently ignored.
// Each entry must be followed by a line break or the end of input.
start = { SOI ~ NEWLINE* ~ (entry ~ (NEWLINE+ | &EOI))* ~ EOI }

entry = { key ~ "=" ~ value }

// Atomic rules for tokens.
key = @{ ASCII_ALPHA+ }
value = @{ ASCII_ALPHANUMERIC+ }

The SOI and EOI rules match the start and end of input. They anchor the parser. If the input has extra characters at the end, the parser fails. This prevents silent failures where trailing garbage is ignored.

In Rust, parse the input and extract the key-value pairs.

use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "config.pest"]
struct ConfigParser;

fn parse_config(input: &str) -> Result<Vec<(String, String)>, pest::error::Error<Rule>> {
    let mut result = Vec::new();

    // Parse the input. Returns Result<Pairs, Error>.
    // On success the iterator holds exactly one pair: 'start'.
    let start = ConfigParser::parse(Rule::start, input)?
        .next()
        .expect("a successful parse yields the start pair");

    // The children of 'start' are the 'entry' pairs plus a final EOI pair.
    // into_inner() consumes the pair and returns its children.
    for entry in start.into_inner() {
        // Skip EOI: it produces a pair but has no children.
        if entry.as_rule() != Rule::entry {
            continue;
        }

        // String literals such as "=" do not produce pairs,
        // so 'entry' has exactly two children: key and value.
        let mut inner = entry.into_inner();
        let key = inner.next().expect("entry has a key").as_str().to_string();
        let value = inner.next().expect("entry has a value").as_str().to_string();

        result.push((key, value));
    }
    Ok(result)
}

fn main() {
    let input = "name = Alice\nage = 30";
    match parse_config(input) {
        Ok(pairs) => println!("{:?}", pairs),
        Err(e) => eprintln!("Parse error: {}", e),
    }
}

The pest error type includes line and column information. It also lists the expected rules at the error location. This makes debugging grammar issues straightforward.

Anchor your start rule with SOI and EOI. A parser that ignores trailing garbage is a parser that hides bugs.

Pitfalls and conventions

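Left recursion is a common trap. If you write expr = { expr ~ "+" ~ term }, the parser would try to match expr, which tries to match expr, forever. pest detects left recursion when the grammar is compiled and rejects it with a macro error. Rewrite the rule as a repetition and resolve operator precedence in Rust.

Precedence and associativity are not expressed in the grammar. They are handled in Rust code by pest::pratt_parser::PrattParser, which supersedes the older pest::prec_climber::PrecClimber. The grammar only describes a flat sequence of terms and operators. When you build the PrattParser, you list operators from lowest to highest precedence, and it folds the flat sequence into correctly nested results.

expr = { term ~ (bin_op ~ term)* }
// Silent, so the children of expr are numbers, nested exprs, and operators.
term = _{ number | "(" ~ expr ~ ")" }
bin_op = _{ add | subtract | multiply | divide }
add = { "+" }
subtract = { "-" }
multiply = { "*" }
divide = { "/" }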

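Atomic rules (@) and silent rules (_) have specific behaviors. Inside an atomic rule, pest does not insert implicit WHITESPACE or COMMENT between elements, and rules called from it do not produce their own pairs. The $ modifier (compound-atomic) also disables implicit whitespace but keeps the inner pairs. The ! modifier switches a rule called from an atomic context back to normal, non-atomic behavior. Silent rules match but do not appear in the parse tree, and the _ prefix does not make a rule atomic. Use @ for tokens that should not contain whitespace. Use _ for WHITESPACE and for grouping rules you do not want in the tree.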

The WHITESPACE rule must be named exactly WHITESPACE. pest looks for this specific name to enable automatic whitespace skipping. If you name it SPACE or ws, the parser will treat whitespace as a syntax error unless you explicitly handle it in every rule.

Test your grammar with edge cases. A parser that accepts 1+2 but rejects 1 + 2 is a parser that will break in production.

Decision: pest vs alternatives

Use pest when you need a readable grammar file and generated parser for a custom DSL or config format. Use nom when you need maximum control over parsing logic and want to write parsers directly in Rust without a separate grammar file. Use lalrpop when you are building a complex language with LR parsing requirements and need detailed error recovery. Use regex when you only need to extract simple patterns and don't care about the structure of the result. Use a hand-written recursive descent parser when the grammar is trivial and adding a dependency feels like overkill.

Pick the tool that matches your grammar's complexity. If your grammar fits in a paragraph, pest is your friend. If it requires a PhD in formal languages, look elsewhere.

Where to go next

pest lets you describe a format in a grammar file and generates the parser from it at compile time, so you can build a custom language, parse configuration files, or analyze structured text without writing the parsing logic by hand. From here, take the minimal grammar above, anchor it with SOI and EOI, and grow it one rule at a time, testing each addition against edge cases.