How to use regex in Rust

Parsing logs with regex

You are building a tool to scan server logs. The logs contain a chaotic mix of timestamps, IP addresses, and error codes. You need to pull out every 404 error from the last hour and extract the requested URLs. In Python, you would grab the re module and write a quick script. In Rust, the approach feels different. The standard library has no regex module, so you add the regex crate, and its API does not let you throw a string pattern at text and hope for the best. You must compile the pattern into a Regex first. This step feels like overhead until you realize it is exactly what makes regex in Rust fast and predictable.

The compiled machine

Rust treats regular expressions as compiled objects. You do not match a string against a string. You match a string against a Regex object that holds the optimized state of your pattern. This separation is the core of the regex crate.

Think of a regex pattern like a blueprint for a machine. In many languages, you hand the blueprint to the factory every time you want to build a part. The factory reads the blueprint, builds the machine, makes the part, and throws the machine away. In Rust, you hand the blueprint to the factory once. The factory builds a permanent machine. Every time you need a part, you just run the machine. No re-reading the blueprint. No rebuilding.

The crate is built on finite automata rather than backtracking. Under the hood it chooses among several engines, including a lazily built deterministic finite automaton for raw scanning speed and nondeterministic simulations when it needs to track capture groups. For a given pattern, search time is linear in the length of the input, so you will not hit catastrophic backtracking where a malicious input freezes your program. The trade-off is that features which require backtracking, such as lookaround and backreferences, are not supported. The crate is also Unicode-aware by default: it matches on code points, so a match never splits a multi-byte character unless you explicitly opt out.

Compile once. Match many. That is the contract.

Minimal example

Here is the basic workflow. After adding the regex crate as a dependency (for example with cargo add regex), you create a Regex from a pattern string, then use methods like is_match or replace_all to process text.

use regex::Regex;

fn main() {
    // Compile the pattern once. The raw string avoids escaping backslashes.
    // unwrap() is safe here because the pattern is hardcoded and valid.
    let re = Regex::new(r"\berror\b").unwrap();

    let log_line = "System check: no error found. Status: OK.";

    // is_match returns a boolean. It is the fastest way to check for presence.
    if re.is_match(log_line) {
        println!("Found an error!");
    } else {
        println!("Clean.");
    }

    // replace_all returns a Cow<str>.
    // If no replacement happens, it returns the original slice without allocating.
    let cleaned = re.replace_all(log_line, "WARNING");
    println!("{}", cleaned);
}

Convention aside: the community prefers Regex::new with unwrap() only for patterns hardcoded in the source. If the pattern comes from user input or a config file, handle the Result properly. A bad pattern should fail at startup or surface as an error value, not panic deep in a request handler.

Hardcode patterns safely. Unwrap is fine when the pattern lives in your source code.

How the match happens

When you call Regex::new, the crate parses your pattern string. It checks for syntax errors. If the pattern is valid, it builds an internal state machine. This object is what you store in re. The compilation step takes microseconds for simple patterns and milliseconds for complex ones. It is cheap, but it is not free.

When you call is_match, the state machine walks over your input text. It tracks which states are active as it consumes characters. If it reaches an accepting state, it returns true. Because the machine is pre-built, this walk is very efficient. Character classes are Unicode-aware by default. For example, \w matches Unicode word characters, not just ASCII letters. This is usually what you want, but it can be a surprise if you are expecting ASCII-only behavior. Note that the crate does not perform Unicode normalization: a precomposed é and an e followed by a combining accent are different inputs.

If you need ASCII-only matching, you can use the (?-u) flag to disable Unicode mode. Most users do not need this. Trust the automaton. It will not backtrack into a time sink.

Extracting data with captures

Checking if a pattern exists is useful. Extracting data is where regex shines. Use captures to get the matched groups.

use regex::Regex;

fn main() {
    // Named groups make the code self-documenting.
    // (?P<name>...) syntax is supported.
    let re = Regex::new(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})").unwrap();
    let text = "Event on 2023-10-27 was great.";

    // captures returns an Option<Captures>.
    // It is None if no match, Some if a match exists.
    if let Some(caps) = re.captures(text) {
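        // name returns an Option<Match> for a named group
        // (get does the same for a numeric index).
        // Use map to extract the string slice safely. The unwrap cannot fail
        // here: these groups always participate when the pattern matches.
        let year = caps.name("year").map(|m| m.as_str()).unwrap();
        let month = caps.name("month").map(|m| m.as_str()).unwrap();
        let day = caps.name("day").map(|m| m.as_str()).unwrap();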

        println!("Date: {}-{}-{}", year, month, day);
    }
}

The captures method returns an Option. This forces you to handle the case where the pattern does not match. You cannot accidentally dereference a null match. Individual groups are optional too: caps.get(1) looks a group up by index and caps.name("year") looks it up by name, and both return an Option<Match> because a group may not participate in the match. Named groups are the convention for readable code. They make refactoring easier and eliminate magic numbers.

Named groups make your code self-documenting. Use them.

Realistic example: URL router

Here is a scenario closer to real code. You are building a simple URL router that matches patterns and extracts parameters.

use regex::Regex;
use std::collections::HashMap;

fn main() {
    // Store patterns and their handlers.
    // In a real app, you would use a struct or enum for handlers.
    let routes: Vec<(Regex, &str)> = vec![
        (Regex::new(r"^/user/(?P<id>\d+)$").unwrap(), "get_user"),
        (Regex::new(r"^/post/(?P<slug>[a-z-]+)$").unwrap(), "get_post"),
    ];

    let requests = vec!["/user/123", "/post/rust-regex-guide", "/unknown/path"];

    for path in requests {
        let mut matched = false;
        for (re, handler) in &routes {
            if let Some(caps) = re.captures(path) {
                // Build a params map from named captures.
                let mut params = HashMap::new();
                for name in re.capture_names().flatten() {
                    if let Some(m) = caps.name(name) {
                        params.insert(name.to_string(), m.as_str().to_string());
                    }
                }
                println!("Route: {} -> Handler: {} Params: {:?}", path, handler, params);
                matched = true;
                break;
            }
        }
        if !matched {
            println!("404: {}", path);
        }
    }
}

This example shows how to iterate over multiple patterns, extract named captures, and build a parameter map. The capture_names method helps you iterate over the groups dynamically. This pattern is common in web frameworks and CLI tools.

Let the pattern name its parameters. The capture names become the keys of your map.

Global patterns and caching

If you have a pattern you use throughout your application, compile it once and share it. Do not compile it in every function call.

use std::sync::LazyLock;
use regex::Regex;

// LazyLock compiles the regex the first time it is accessed.
// The result is cached for the lifetime of the program.
static DATE_PATTERN: LazyLock<Regex> = LazyLock::new(|| {
    Regex::new(r"\d{4}-\d{2}-\d{2}").unwrap()
});

fn check_date(text: &str) -> bool {
    // The first access compiles the regex.
    // Later accesses reuse the already compiled value.
    DATE_PATTERN.is_match(text)
}

Convention aside: lazy_static was the standard for years. Modern Rust prefers std::sync::LazyLock (stable since Rust 1.80) or once_cell::sync::Lazy. LazyLock is built into the standard library, so it adds no dependency, and after initialization each access costs only a cheap check. Ditch lazy_static for new code unless you are stuck on an old compiler.

LazyLock is the modern standard. Ditch lazy_static for new code.

Pitfalls and compiler errors

Regex in Rust has a few traps. Avoid them by understanding the types and the UTF-8 assumption.

Recompiling in a loop kills performance. If you put Regex::new inside a loop, you pay the compilation cost every iteration. Compile once, reuse many times.

// BAD: Compiles the regex on every iteration.
// for line in lines {
//     let re = Regex::new(r"\d+").unwrap();
//     if re.is_match(line) { /* ... */ }
// }

// GOOD: Compile outside the loop.
let re = Regex::new(r"\d+").unwrap();
// for line in lines { if re.is_match(line) { /* ... */ } }

Binary data does not fit the default API. regex::Regex searches a &str, and a &str must be valid UTF-8, so you cannot hand it binary files, network packets, or text in other encodings without converting first. Switch to regex::bytes::Regex for that data. It works the same way but operates on &[u8] instead of &str, and it lets you disable Unicode mode to match arbitrary bytes.

The replace method only replaces the first match. If you expect all occurrences to change, you will be surprised. Use replace_all for global replacement.

Type mismatches happen when you forget to handle the result. Regex::new returns a Result. If you forget to unwrap or handle the result, the compiler rejects you with E0308 (mismatched types). You tried to stuff a Result<Regex, Error> into a Regex variable.

// E0308: mismatched types
// found `Result<regex::Regex, regex::Error>`, expected `regex::Regex`
let re: Regex = Regex::new(r"\d+");

Method not found errors appear when you try to call is_match on a String directly. You get E0599 (no method named is_match found for struct String). You need a Regex object.

Check the return type. Regex::new gives you a Result, not a Regex.

Decision: regex vs alternatives

Regex is powerful, but it is not always the right tool. Pick the right tool for the job.

Use regex when you need pattern matching with capture groups, alternation, or quantifiers on UTF-8 text. It is the standard choice for almost all Rust projects.

Reach for str::contains or str::starts_with when you are checking for a literal substring. These methods live in the standard library and skip pattern compilation entirely. Do not pull in a regex engine to check if a string contains "error".

Pick the glob crate when you are matching file paths with wildcards. Patterns like *.md or src/**/*.rs are path globs, not regex. The glob crate handles OS-specific path rules correctly.

Use regex-automata when you are writing a high-performance search tool and need fine-grained control over the automaton state. This is the engine under regex, but it exposes a lower-level, more complex API for streaming and multi-pattern matching.

Consider regex-lite when binary size and compile time matter more than raw speed and you can live without full Unicode support. It keeps the linear-time guarantee but drops most of the optimizations and Unicode features. Neither regex nor regex-lite supports lookarounds or backreferences. If you need those, you need a backtracking engine such as the fancy-regex crate, and you give up the linear-time guarantee.

Match the tool to the job. Regex is powerful, but literal checks are faster.
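
For comparison, the literal checks need nothing beyond the standard library. The helper names and the choice of '#' as a comment marker are assumptions made for this sketch.

```rust
/// True if the line contains the literal text "error" anywhere.
/// Unlike \berror\b, this also matches inside longer words such as "errors".
fn is_error_line(line: &str) -> bool {
    line.contains("error")
}

/// True if the line is a comment once leading whitespace is ignored.
fn is_comment(line: &str) -> bool {
    line.trim_start().starts_with('#')
}

fn main() {
    for line in ["disk error on /dev/sda", "  # maintenance window", "all good"] {
        println!(
            "{:?}: error={} comment={}",
            line,
            is_error_line(line),
            is_comment(line)
        );
    }
}
```

If you later need word boundaries or alternatives, that is the point to move the check to a compiled Regex.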

Where to go next

Regex lets you find and replace text patterns in your code, like cleaning up formatting or renaming files. Think of it as a powerful search tool that understands complex rules instead of just exact words. You define the rule once, then use it to fix or check many strings at once.