Parsing with nom
You're building a CLI tool that reads a configuration file. Each line looks like port = 8080 or host = "localhost". You try regex. It works for one field. It falls apart when you need to handle optional whitespace, quoted strings, and comments. You try writing a state machine. It's fifty lines of index tracking and match statements. You want to describe the shape of the data and get the values back.
nom gives you that. It lets you build parsers by composing small, reusable functions. You define the grammar in code, and the library handles the traversal. You write a parser for a number. You write a parser for a key. You snap them together with combinators to parse a key-value pair. The result is code that reads like the grammar specification.
Parsers are functions
A parser in nom is a function that takes input and returns a result. On success, the result is a tuple. The first element is the remaining input. The second element is the value you parsed. On failure, the result is an error that says where parsing stopped. This design makes parsers composable. You can chain them. If the first parser succeeds, it hands the rest of the input to the second parser. If it fails, the whole chain fails.
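To make that shape concrete, here is a dependency-free sketch of two hand-written parsers chained together. Nothing in it comes from nom: `ParseResult`, `literal`, `number`, and `version` are hypothetical names, and the `String` error is a simplification standing in for nom's richer error type.

```rust
/// Hypothetical stand-in for nom's IResult:
/// on success, (remaining input, parsed value).
type ParseResult<'a, T> = Result<(&'a str, T), String>;

/// Consumes a required literal prefix and returns it as the value.
fn literal<'a>(expected: &str, input: &'a str) -> ParseResult<'a, &'a str> {
    match input.strip_prefix(expected) {
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {expected:?}")),
    }
}

/// Consumes one or more ASCII digits and returns them as a number.
fn number(input: &str) -> ParseResult<'_, u32> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return Err("expected digits".to_string());
    }
    let value = input[..end].parse::<u32>().map_err(|e| e.to_string())?;
    Ok((&input[end..], value))
}

/// Chains the two parsers: each one receives what the previous one left over.
/// The `?` operator makes the whole chain fail as soon as one step fails.
fn version(input: &str) -> ParseResult<'_, u32> {
    let (rest, _) = literal("v", input)?;
    let (rest, n) = number(rest)?;
    Ok((rest, n))
}
```

Calling `version("v42 beta")` yields `Ok((" beta", 42))`, while `version("x42")` fails at the first step and never runs `number`. nom's combinators do this same bookkeeping for you.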
Think of a parser as a filter in a pipeline. The input flows in. The parser consumes what it matches. It passes the rest downstream. Combinators are the connectors. They take parsers and return new parsers. tag creates a parser that matches a literal string. map takes a parser and transforms its output. alt tries a list of parsers until one succeeds. You snap these together to describe complex structures without writing a single loop.
Treat every parser as a pure function. If it doesn't return the remaining input, it's not a nom parser.
Minimal example
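The core type is IResult. It's an alias for Result<(I, O), nom::Err<E>>. I is the input type. O is the output type. E is the error type. In most cases, I is &str or &[u8]. O is whatever you parsed. E defaults to nom::error::Error<I>, which records where parsing failed and which kind of parser failed. The examples here call combinator results directly, as in map(...)(input). That is the nom 7 style. On nom 8, you invoke them through Parser::parse instead.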
use nom::{bytes::complete::tag, combinator::map, IResult};

/// Parses the literal string "hello" and returns a fixed greeting.
fn parse_hello(input: &str) -> IResult<&str, &str> {
    // map takes a parser and a transformation function.
    // tag("hello") consumes "hello" from the input.
    // The closure |_| "greeting" ignores the matched text and returns a constant.
    map(tag("hello"), |_| "greeting")(input)
}

fn main() {
    let result = parse_hello("hello world");
    // IResult returns the remaining input and the parsed value.
    // The remaining input is " world". The value is "greeting".
    assert_eq!(result, Ok((" world", "greeting")));
}
tag("hello") looks for the literal bytes hello. It consumes them. It returns the rest of the string. map wraps tag. It runs tag, gets the result, and applies the closure. The closure ignores the matched text and returns "greeting". The final result is Ok((" world", "greeting")). The space remains. The value is transformed.
If the input doesn't start with hello, tag returns an error. map propagates that error. The parser fails fast.
Realistic example
Real parsers handle structure. You often need to parse a key, a delimiter, and a value. nom provides separated_pair for this. It parses the first element, then a separator, then the second element. It returns a tuple of the two values.
use nom::{
    bytes::complete::{tag, take_while1},
    character::complete::{space0, u32},
    sequence::{preceded, separated_pair},
    IResult,
};

/// Parses a configuration line like "port = 8080".
/// Returns the key as a string slice and the value as a u32.
fn parse_config(input: &str) -> IResult<&str, (&str, u32)> {
    // separated_pair parses A, then a separator, then B.
    // It returns (A, B) and drops the separator's output.
    separated_pair(
        // take_while1 consumes one or more characters matching the predicate.
        // This captures alphanumeric keys.
        take_while1(|c: char| c.is_alphanumeric()),
        // The separator: optional space, then "=", then optional space.
        // preceded(a, b) runs a, then b, and keeps only b's output.
        preceded(space0, preceded(tag("="), space0)),
        // u32 is a built-in parser for unsigned integers.
        // It consumes digits and returns a u32 value.
        u32,
    )(input)
}

fn main() {
    let result = parse_config("port = 8080\n");
    // The parser consumes "port = 8080".
    // It returns the remaining "\n" and the tuple ("port", 8080).
    assert_eq!(result, Ok(("\n", ("port", 8080))));
}
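take_while1 consumes characters as long as the predicate returns true. It stops when the predicate fails, and it fails itself if it matched nothing. It returns the slice of consumed characters. u32 is a built-in parser. It consumes digits and returns a u32. preceded runs two parsers in sequence and keeps only the second output. Here it assembles the separator from optional space, the equals sign, and more optional space. separated_pair then discards the separator's output entirely. This is how you handle delimiters without cluttering the result.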
Convention aside: nom parsers return references into the input by default. This avoids allocation. The key "port" is a slice of the original string. If you need owned data, use map to clone or convert. Keep parsers returning references as long as possible, and convert to owned types at the boundary. This keeps the parser fast and memory-efficient.
Nest parsers like Russian dolls. The outer parser handles the structure. The inner parsers handle the details.
Pitfalls and errors
The biggest pitfall is mixing complete and streaming parsers. nom has two modes. complete parsers assume the input is fully available. streaming parsers handle partial input. If you use a streaming parser on a complete string, you can get Err::Incomplete when the input ends in the middle of a match. The parser assumes more data is coming, so it asks for more instead of reporting a mismatch. The result is a runtime error that looks like a logic bug.
Convention: Use bytes::complete and character::complete for parsing files or strings loaded in memory. Use bytes::streaming and character::streaming for network streams or interactive input.
If you forget to return IResult, the compiler rejects you with E0308 (mismatched types). Parsers must return IResult, not a bare value. If you try to use a parser on a type that doesn't implement the required trait, you get E0277 (trait bound not satisfied). Check your imports. nom parsers are generic over the input type. Passing String instead of &str causes type errors. Borrow it first with &s or s.as_str().
Pick complete or streaming at the start. Switching halfway through breaks the contract.
When to use nom
Use nom when you need to parse structured text with a custom grammar, like a config file, a mini-language, or a log format that regex can't handle cleanly.
Use serde with a format crate such as serde_json or toml when you are parsing standard formats like JSON, TOML, or YAML; nom is overkill for formats that already have robust, optimized parsers.
Use regex when you just need to extract a pattern from a string without building a full grammar; regex is faster to write for simple matching tasks.
Use a hand-written state machine when profiling shows nom is too slow for a tight inner loop and you need absolute control over memory allocation and branching.
Reach for nom when the grammar is yours. Reach for libraries when the format is standard.