How to split a string

When splitting gets tricky

You're building a CLI tool that reads a configuration file. The file contains lines like database_host=localhost and log_level=debug. You need to extract the keys and values. You grab a line, call split('='), and suddenly you're juggling iterators, string slices, and lifetime errors. Or maybe you're processing a CSV where split(',') produces empty strings for missing fields, and you have to filter them out manually.

String splitting looks trivial in languages that treat strings as arrays of characters. Rust forces you to think about memory, ownership, and exactly how you want the data cut. Strings in Rust are UTF-8 byte sequences, not character arrays. Splitting must respect Unicode boundaries, and the result is an iterator of slices, not a vector of new strings. This design keeps memory usage low and performance high, but it requires understanding the mechanics.

Rust treats strings as byte sequences with Unicode guarantees, so splitting requires respecting character boundaries and ownership rules.

How splitting works in Rust

Splitting a string in Rust does not allocate new memory. It does not copy characters into new buffers. Instead, it returns an iterator that yields &str slices. A &str is a fat pointer: it contains a pointer to the start of the substring and its length in bytes. These slices point directly into the original string.

Think of a string as a long ribbon. Splitting doesn't cut the ribbon into separate pieces. It marks where each segment begins and ends and lets you look at one segment at a time. The ribbon stays whole. If you want separate pieces, you have to copy each segment onto its own card, which means allocating String objects. Until you do that, you're just viewing parts of the original data.
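To see that the slices really are views into the original buffer, here is a minimal sketch that compares pointers. The helper offset_in is hypothetical and only makes sense when the slice was produced from the given parent string.

```rust
/// Hypothetical helper: byte offset of `slice` inside `parent`.
/// Assumes `slice` was obtained by slicing or splitting `parent`.
fn offset_in(parent: &str, slice: &str) -> usize {
    slice.as_ptr() as usize - parent.as_ptr() as usize
}

fn main() {
    let text = String::from("alpha beta gamma");

    for word in text.split(' ') {
        // Each slice is a (pointer, length) pair whose pointer lands inside `text`.
        println!(
            "{word:?} starts at byte {} and is {} bytes long",
            offset_in(&text, word),
            word.len()
        );
    }
}
```

The offsets come out as 0, 6, and 11, which are positions in the one existing buffer rather than addresses of fresh copies.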

The iterator is a lazy promise. It does zero work until you pull the next value, keeping your code efficient by default.

Minimal example: words and whitespace

The most common splitting task is breaking text into words. Rust provides split_whitespace for this. It handles all Unicode whitespace characters, including spaces, tabs, newlines, and carriage returns. It also skips empty tokens, so multiple spaces in a row don't produce empty strings.

fn main() {
    let text = "hello world wonderful world";

    // split_whitespace returns an iterator over &str slices.
    // No new memory is allocated. The slices point into `text`.
    for word in text.split_whitespace() {
        println!("{word}");
    }
}

The output prints each word on a separate line. The for loop consumes the iterator, calling next() until it returns None. Each iteration yields a &str that references the original text.

Convention aside: prefer split_whitespace over split(' ') for word tokenization. split(' ') only matches the space character. It misses tabs and newlines, and it produces empty strings for consecutive spaces. split_whitespace is the robust default that handles real-world text correctly.
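The difference is easiest to see side by side. This small sketch runs both methods on an input with a doubled space and a tab; the sample string is made up for illustration.

```rust
fn main() {
    // Two spaces after "a", then a tab before "c".
    let text = "a  b\tc";

    // split(' ') matches only the space character and keeps empty segments.
    let by_space: Vec<&str> = text.split(' ').collect();
    assert_eq!(by_space, ["a", "", "b\tc"]);

    // split_whitespace treats any run of Unicode whitespace as one separator.
    let by_whitespace: Vec<&str> = text.split_whitespace().collect();
    assert_eq!(by_whitespace, ["a", "b", "c"]);

    println!("{by_space:?}");
    println!("{by_whitespace:?}");
}
```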

Walkthrough: the iterator and the slice

When you call text.split_whitespace(), Rust creates a SplitWhitespace struct. This struct holds a reference to the string and the current position in the scan. It implements the Iterator trait.

The for loop calls next() on the iterator. The first call scans forward, skipping any whitespace characters. It finds the start of "hello". It scans forward again until it hits whitespace or the end of the string. It calculates the length of "hello". It returns Some(&str) containing the pointer and length. The loop body runs and prints "hello".

The next call to next() continues from where it left off. It skips the space, finds "world", and returns it. This continues until the iterator reaches the end of the string. The final call returns None, and the loop terminates.
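The same sequence can be driven by hand, which is roughly what the for loop does on your behalf. A short sketch:

```rust
fn main() {
    let text = "hello world";
    let mut words = text.split_whitespace();

    // Each call scans only as far as the end of the next word.
    assert_eq!(words.next(), Some("hello"));
    assert_eq!(words.next(), Some("world"));

    // The string is exhausted, so the iterator reports that it is done.
    assert_eq!(words.next(), None);

    println!("stepped through every word by hand");
}
```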

At no point did Rust allocate memory for the words. The &str values are just views into text. If text goes out of scope, all the slices become invalid. The borrow checker enforces this. You cannot return a slice from a function if the original string is local to that function. The compiler will reject the code with E0515 (cannot return value referencing local variable). The slice points to memory that will be freed when the function returns.
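Two common ways to stay on the right side of this rule are sketched below. Both function names are hypothetical: the first borrows from the caller's string, and the second owns its text and therefore returns an owned String.

```rust
/// Borrows from the caller's string, so the returned slice stays valid
/// for as long as `line` does.
fn first_word(line: &str) -> Option<&str> {
    line.split_whitespace().next()
}

/// Builds its text locally, so it hands back an owned String.
/// Returning a slice of `local` instead would be rejected by the compiler.
fn first_word_of_greeting() -> Option<String> {
    let local = String::from("hello world");
    let first = local.split_whitespace().next().map(String::from);
    first
}

fn main() {
    let line = String::from("  alpha beta");
    println!("{:?}", first_word(&line));
    println!("{:?}", first_word_of_greeting());
}
```

The first version costs nothing beyond the scan. The second pays for one allocation, which is the price of letting the result outlive the string it came from.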

Trust the borrow checker. If the compiler complains about lifetimes, your slice is trying to outlive the data it points to. Fix the scope, don't fight the error.

Realistic example: parsing config lines

In real applications, you often need to split on a specific delimiter and extract structured data. A common pattern is parsing key=value pairs. Using split works, but it cuts at every = it finds, so a value that itself contains = is broken into extra pieces, and you still have to pull two items out of the iterator and check that both exist. A better tool is split_once.

split_once finds the first occurrence of the delimiter and returns an Option<(&str, &str)>. It stops scanning after the first match, so everything after the first delimiter stays together in the second half. It also returns a tuple directly, so there is no iterator to drive by hand.

/// Parses a "key=value" line into a tuple of references.
fn parse_entry(line: &str) -> Option<(&str, &str)> {
    // split_once stops at the first delimiter.
    // It avoids scanning the rest of the string.
    // Returns Option<(&str, &str)> for safe handling.
    line.split_once('=')
}

fn main() {
    let config = "host=localhost";

    // Pattern match on the Option to handle missing delimiters.
    if let Some((key, value)) = parse_entry(config) {
        println!("Key: {key}, Value: {value}");
    }
}

The function returns Option because the delimiter might not exist. If the line is just host, split_once returns None. The caller handles this safely with pattern matching.
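Real config files tend to have stray spaces, malformed lines, and values that contain the delimiter. The sketch below extends the idea with a hypothetical parse_trimmed_entry helper; trimming both halves is an assumption about the file format, not something split_once does for you.

```rust
/// Hypothetical helper: parses "key=value" and trims whitespace around both parts.
/// Only the first '=' is treated as the separator.
fn parse_trimmed_entry(line: &str) -> Option<(&str, &str)> {
    let (key, value) = line.split_once('=')?;
    Some((key.trim(), value.trim()))
}

fn main() {
    let lines = ["database_host=localhost", "log_level = debug", "query=a=b", "host"];

    for line in lines {
        match parse_trimmed_entry(line) {
            Some((key, value)) => println!("{key} -> {value}"),
            None => println!("skipping malformed line: {line:?}"),
        }
    }
}
```

Note that "query=a=b" yields the value "a=b", because only the first delimiter counts.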

Convention aside: use split_once for single delimiters in structured data. It scans no further than the first match and is clearer in intent. If you only need the first split, split is the wrong tool. It builds an iterator that implies you might consume multiple segments. split_once signals that you expect exactly one split point.

Use split_once for single delimiters. It stops scanning early and returns a clean tuple, saving CPU cycles and boilerplate.

Pitfalls and compiler errors

Splitting strings introduces a few common pitfalls. Understanding these prevents bugs and compiler errors.

Collecting into owned strings

The iterator yields &str slices. If you need owned String values, you must convert them. A frequent mistake is trying to collect directly.

let text = "hello world";
// This fails to compile.
let words: Vec<String> = text.split_whitespace().collect();

The compiler rejects this with E0277 (trait bound not satisfied). Vec<String> does not implement FromIterator<&str>, so collect has no way to build it from an iterator of &str, and Rust never converts &str to String implicitly. You must explicitly map the slices to owned strings.

// Map each &str to a String using String::from.
let words: Vec<String> = text.split_whitespace().map(String::from).collect();

This allocates a new String for each word. The allocation cost is real. Only collect if you need to store the words beyond the lifetime of the original string. Streaming through the iterator avoids allocation entirely.
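As a rough illustration of the alternatives, the sketch below counts matches by streaming and, separately, collects borrowed slices. The function count_word is a hypothetical name chosen for this example.

```rust
/// Counts occurrences of `target` by streaming through the iterator.
/// Nothing is allocated: each word is compared in place and then dropped.
fn count_word(text: &str, target: &str) -> usize {
    text.split_whitespace().filter(|word| *word == target).count()
}

fn main() {
    let text = "hello world wonderful world";
    println!("'world' appears {} times", count_word(text, "world"));

    // A middle ground: Vec<&str> allocates one buffer for the vector,
    // while the words themselves still live inside `text`.
    let words: Vec<&str> = text.split_whitespace().collect();
    println!("{words:?}");
}
```

Collecting into Vec<&str> is often enough when you need indexing or sorting but the original string stays alive for the whole operation.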

Empty segments and delimiters

The split method preserves empty segments. If the string starts with the delimiter, ends with the delimiter, or has consecutive delimiters, split produces empty strings.

let csv = "a,,b,";
// split produces ["a", "", "b", ""]
for field in csv.split(',') {
    println!("{field:?}");
}

The output includes empty strings for the missing field and the trailing comma. If you want to discard the trailing empty segment, use split_terminator.

// split_terminator discards the trailing empty segment.
// Produces ["a", "", "b"]
for field in csv.split_terminator(',') {
    println!("{field:?}");
}

split_terminator treats the delimiter as a terminator rather than a separator. It's useful for CSV parsing where a trailing comma indicates the end of the record, not an extra empty field.
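If you would rather drop every empty segment, not only the trailing one, a filter on the iterator does it. This is a minimal sketch with a hypothetical helper name, and it assumes field positions do not matter, since removing empties shifts the fields that follow.

```rust
/// Hypothetical helper: keeps only the non-empty fields of a comma-separated record.
/// Assumes positions are irrelevant, because dropped fields shift the rest.
fn non_empty_fields(record: &str) -> Vec<&str> {
    record.split(',').filter(|field| !field.is_empty()).collect()
}

fn main() {
    let csv = "a,,b,";
    println!("{:?}", non_empty_fields(csv));
}
```

For positional data such as real CSV columns, keeping the empty strings is usually the safer choice.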

Keeping the delimiter

Sometimes you need the delimiter itself. split_inclusive returns segments that include the delimiter.

let log = "ERROR: disk full";
// split_inclusive keeps the delimiter with each segment.
// Produces ["ERROR:", " disk full"]
for part in log.split_inclusive(':') {
    println!("{part:?}");
}

This is handy for log parsing where you want to preserve the log level attached to the message.
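Another place split_inclusive fits is splitting on newlines while keeping each line ending, so the pieces can be joined back into the original text. A small sketch with made-up input:

```rust
fn main() {
    let text = "first\nsecond\nthird";

    // Each segment keeps its trailing '\n'; the last one has none to keep.
    let lines: Vec<&str> = text.split_inclusive('\n').collect();
    assert_eq!(lines, ["first\n", "second\n", "third"]);

    // Because no bytes were discarded, concatenation restores the input.
    let rebuilt: String = lines.concat();
    assert_eq!(rebuilt, text);

    println!("{lines:?}");
}
```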

UTF-8 boundaries

Rust strings are UTF-8. Splitting respects character boundaries. You cannot split on arbitrary byte indices. If you try to slice a string at a byte index that falls in the middle of a multi-byte character, the program panics at runtime. The split methods never hit this problem, because a pattern only ever matches whole characters, so every slice they return starts and ends on a valid boundary. Never assume byte indices align with character indices.
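When you do need to cut at a byte position, the standard library lets you check first. The sketch below uses is_char_boundary and get, plus a hypothetical prefix_up_to helper that backs off to the nearest valid boundary instead of panicking.

```rust
/// Hypothetical helper: longest prefix of `text` that fits in `max_bytes`
/// without cutting a character in half.
fn prefix_up_to(text: &str, max_bytes: usize) -> &str {
    let mut end = max_bytes.min(text.len());
    // Index 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}

fn main() {
    // U+00E9 (e with acute accent) takes two bytes, at indices 1 and 2.
    let text = "h\u{e9}llo";

    assert!(text.is_char_boundary(1));
    assert!(!text.is_char_boundary(2));

    // get returns None instead of panicking on a bad boundary.
    assert_eq!(text.get(0..2), None);
    assert_eq!(text.get(0..3), Some("h\u{e9}"));

    println!("{}", prefix_up_to(text, 2));
}
```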

Decision: choosing the right split method

Rust provides several splitting methods. Each matches a specific data pattern. Pick the one that aligns with your requirements.

Use split_whitespace when tokenizing words and ignoring all Unicode whitespace variants automatically.

Use split when you need precise control over delimiters and want to preserve empty segments for downstream processing.

Use split_once when parsing structured data with a single separator, like key=value pairs or host:port addresses.

Use split_terminator when the delimiter marks the end of a segment and a trailing delimiter should not produce an empty final segment.

Use split_inclusive when you need to retain the delimiter with each segment for context preservation in logs or formatted data.
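For a quick comparison, this sketch runs the delimiter-based methods on one made-up record and split_whitespace on a padded string, with the results written as assertions.

```rust
fn main() {
    let record = "x,y,,";

    // split keeps every empty segment, including the trailing one.
    let all: Vec<&str> = record.split(',').collect();
    assert_eq!(all, ["x", "y", "", ""]);

    // split_terminator drops only the final empty segment.
    let terminated: Vec<&str> = record.split_terminator(',').collect();
    assert_eq!(terminated, ["x", "y", ""]);

    // split_inclusive leaves each delimiter attached to its segment.
    let inclusive: Vec<&str> = record.split_inclusive(',').collect();
    assert_eq!(inclusive, ["x,", "y,", ","]);

    // split_once cuts at the first delimiter and nowhere else.
    assert_eq!(record.split_once(','), Some(("x", "y,,")));

    // split_whitespace ignores leading, trailing, and repeated whitespace.
    let words: Vec<&str> = "  x   y ".split_whitespace().collect();
    assert_eq!(words, ["x", "y"]);

    println!("all comparisons hold");
}
```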

Match the method to the data shape. Precision beats brute force.

Where to go next

The methods covered here are the ones you will reach for most often. The standard library's str type has more in the same family, including splitn to cap the number of pieces, rsplit and rsplit_once to work from the end of the string, and lines for line-by-line input. All of them return slices that borrow from the original string, so the same ownership and UTF-8 rules apply.