How to Get a Substring (Slice) from a String in Rust

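You can slice a `String` or `&str` with a range, but the range is measured in bytes, not characters, and both ends must fall on UTF-8 character boundaries or the program panics. You also cannot index a single position with `s[i]`. To slice by character count, use `char_indices()` to translate character positions into valid byte offsets; when you already have a byte offset, use `split_at` or `get`.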

The UTF-8 boundary trap

You write let sub = &text[0..8]; because you want the first eight characters. The code compiles without warnings. You run the program with the input "Hello, 世界". The program crashes with a panic: byte index 8 is not a char boundary.

You didn't do anything wrong in your head. You asked for eight characters. Rust interpreted your request as eight bytes, and eight bytes isn't always eight characters. The first seven characters ("Hello, ") take one byte each, but '世' occupies bytes 7, 8, and 9, so byte 8 lands in the middle of it. Rust refuses to hand you invalid UTF-8 data. Slicing a string in Rust requires byte offsets that align exactly with character starts. If you slice in the middle of a character, you would create a broken string, and the runtime stops you before that broken string can corrupt your program.

Bytes, characters, and the cost of safety

Rust stores String data as UTF-8 encoded bytes. UTF-8 is a variable-width encoding. ASCII characters like a, z, 0, and punctuation take one byte. Most accented Latin letters, and Greek and Cyrillic characters, take two bytes. Characters from many other scripts, like Chinese, Japanese, or Korean, typically take three bytes. Emoji and rare symbols can take four bytes.
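To see these widths directly, the sketch below asks each character how many bytes it needs using the standard char::len_utf8 method. The helper name byte_widths is hypothetical, chosen for this example. It also confirms that the per-character widths add up to len(), which counts bytes.

```rust
/// Hypothetical helper: pairs each char in `text` with its UTF-8 width in bytes.
fn byte_widths(text: &str) -> Vec<(char, usize)> {
    text.chars().map(|ch| (ch, ch.len_utf8())).collect()
}

fn main() {
    let text = "aé世🦀";
    for (ch, width) in byte_widths(text) {
        println!("{ch:?} takes {width} byte(s)");
    }
    // The widths sum to len(), because len() counts bytes, not chars
    let total: usize = byte_widths(text).iter().map(|&(_, w)| w).sum();
    assert_eq!(total, text.len()); // 1 + 2 + 3 + 4 = 10
}
```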

When you index a string with [start..end], the indices are byte offsets, not character counts. At runtime, the indexing operation checks that start and end lie within the string and fall on character boundaries. If either check fails, the program panics.
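Because the check is about byte offsets, it can help to list which offsets pass it. This sketch uses the standard str::is_char_boundary method; the boundaries helper is hypothetical. For "Hello, 世界", offsets 8, 9, 11, and 12 fall inside a three-byte character, so slicing at any of them would panic.

```rust
/// Hypothetical helper: every byte offset that is safe to slice at.
/// is_char_boundary is true at 0, at text.len(), and at the first byte of each char.
fn boundaries(text: &str) -> Vec<usize> {
    (0..=text.len()).filter(|&i| text.is_char_boundary(i)).collect()
}

fn main() {
    let text = "Hello, 世界";
    // Prints [0, 1, 2, 3, 4, 5, 6, 7, 10, 13]
    println!("{:?}", boundaries(text));
}
```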

Think of a string as a row of houses. In some languages, every house is the same width. You can jump to house five by counting five units. In Rust, houses vary in width. Some are one brick wide, some are three. If you count five bricks, you might land in the middle of a wide house. Rust won't let you break the wall. You have to count houses to find the right wall.

This design guarantees that every &str in Rust is valid UTF-8. You never have to check for encoding errors when you receive a string slice. The safety check happens once, at the slice point. The cost is that you must calculate byte offsets carefully when you work with character counts.

Finding the byte offset with char_indices

When you need to slice by character count, use char_indices(). This iterator yields pairs of (byte_offset, char) as it decodes the string. You can advance the iterator to the character you want, grab the byte offset, and use that offset for slicing.

fn main() {
    let text = "Hello, 世界";
    // char_indices yields (byte_offset, char) for each character
    // .nth(7) skips seven characters and returns the 8th (index 7), which is '世'
    // This stops early and avoids iterating the whole string
    let byte_offset = text.char_indices()
        .nth(7)
        .map(|(i, _)| i) // Extract the byte index from the tuple
        .unwrap_or(text.len()); // Fallback to end if string is shorter

    // Slice from the calculated byte offset to the end
    // This is safe because char_indices only yields valid boundaries
    let sub = &text[byte_offset..];
    println!("{sub}"); // Output: 世界
}

The iterator decodes UTF-8 one character at a time and tracks the current byte position. nth(7) discards the first seven items and returns the eighth, so you get the byte offset where the eighth character begins: 7, the start of '世'. The slice &text[byte_offset..] starts at that boundary, so the runtime boundary check passes.
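The same idea extends to a character range. The sketch below, with the hypothetical name substring_by_chars, converts both ends of a start..end character range to byte offsets. It chains text.len() onto the offsets so that an end equal to the character count maps to the end of the string, and it returns None instead of panicking when the range runs past the end. Each conversion walks the string from the start, so a call is O(N).

```rust
use std::iter;

/// Hypothetical helper: slice `text` by character positions `start..end`.
/// Returns None if start > end or either end exceeds the character count.
fn substring_by_chars(text: &str, start: usize, end: usize) -> Option<&str> {
    if start > end {
        return None;
    }
    // Byte offset of char `n`; n == char count maps to text.len()
    let byte_of = |n: usize| {
        text.char_indices()
            .map(|(i, _)| i)
            .chain(iter::once(text.len()))
            .nth(n)
    };
    let (s, e) = (byte_of(start)?, byte_of(end)?);
    // Both offsets came from char_indices or text.len(), so they are boundaries
    Some(&text[s..e])
}

fn main() {
    let text = "Hello, 世界";
    println!("{:?}", substring_by_chars(text, 7, 9)); // Some("世界")
    println!("{:?}", substring_by_chars(text, 0, 8)); // Some("Hello, 世")
    println!("{:?}", substring_by_chars(text, 0, 10)); // None
}
```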

Convention aside: Use char_indices() when you need the byte offset. Use chars() when you only need the characters and don't care about positions. chars() drops the byte index, which saves a tiny amount of overhead, but char_indices() is the right tool when you need to slice.

Slicing is cheap once you have the byte offset. Creating the slice costs a constant-time boundary check plus a new pointer and length into the existing data; no bytes are copied. The work happens in finding the offset, not in creating the slice.
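One way to see the difference is to take the first N characters in two ways. In this sketch (both function names are hypothetical), first_n_owned decodes characters and collects them into a new String, which allocates and copies. first_n_borrowed uses char_indices to find the byte offset and slices, so the result points into the original buffer, which the as_ptr comparison confirms.

```rust
/// Hypothetical: copies the first `n` chars into a newly allocated String.
fn first_n_owned(text: &str, n: usize) -> String {
    text.chars().take(n).collect()
}

/// Hypothetical: borrows the first `n` chars as a slice of `text` (no copy).
fn first_n_borrowed(text: &str, n: usize) -> &str {
    let end = text.char_indices().nth(n).map(|(i, _)| i).unwrap_or(text.len());
    &text[..end]
}

fn main() {
    let text = "Hello, 世界";
    let owned = first_n_owned(text, 8);
    let borrowed = first_n_borrowed(text, 8);
    assert_eq!(owned, borrowed); // both are "Hello, 世"
    // The slice starts at the same address as the original string
    assert_eq!(borrowed.as_ptr(), text.as_ptr());
    println!("{owned} | {borrowed}");
}
```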

The O(1) alternative: split_at

If you already know the byte offset, use split_at(). This method splits the string into two parts at a given byte index. It checks the boundary and returns a tuple of two slices. It runs in O(1) time because it doesn't iterate the string. It just validates the index and adjusts pointers.

fn main() {
    let text = "Rust is great";
    // Byte index 4 is the space after "Rust"
    // split_at checks the boundary and returns two views
    // This is O(1) and does not allocate or copy data
    let (prefix, suffix) = text.split_at(4);

    println!("Prefix: {prefix}"); // "Rust"
    println!("Suffix: {suffix}"); // " is great"
}

split_at panics if the index is past the end of the string or not on a character boundary. Use it when you are confident the offset is correct, such as when you parsed the offset from a fixed-width header or calculated it from previous safe operations.
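When the offset comes from input you don't control, a small guard turns the panic into an Option. The sketch below is a hypothetical wrapper named try_split_at that checks str::is_char_boundary before calling split_at. Rust 1.80 and later also provide str::split_at_checked with this behavior; the wrapper simply avoids depending on a particular toolchain version.

```rust
/// Hypothetical helper: like str::split_at, but returns None instead of
/// panicking when `mid` is past the end or inside a multi-byte character.
fn try_split_at(text: &str, mid: usize) -> Option<(&str, &str)> {
    // is_char_boundary returns false when mid > text.len(), so one check covers both cases
    if text.is_char_boundary(mid) {
        Some(text.split_at(mid))
    } else {
        None
    }
}

fn main() {
    let text = "Hello, 世界";
    println!("{:?}", try_split_at(text, 7)); // Some(("Hello, ", "世界"))
    println!("{:?}", try_split_at(text, 8)); // None: inside '世'
    println!("{:?}", try_split_at(text, 99)); // None: past the end
}
```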

If you need to handle invalid offsets gracefully, use get() instead. get() returns an Option<&str>. It returns None if the range is out of bounds or crosses a character boundary. This lets you handle errors without panicking.

fn main() {
    let text = "Hello, 世界";
    // Byte index 7 is the start of '世'
    // Byte index 10 is one past the end of '世' (the start of '界')
    // get returns Option<&str> to handle invalid ranges safely
    let sub = text.get(7..10);

    match sub {
        Some(s) => println!("Found: {s}"), // "世"
        None => println!("Invalid range"),
    }

    // Byte index 8 is inside '世', so this returns None
    let bad = text.get(8..10);
    println!("Bad range: {bad:?}"); // None
}

Convention aside: The community prefers get() over is_char_boundary() followed by a slice. get() combines the check and the slice in one call. It's clearer and less error-prone. If you see is_char_boundary() in code, it's usually because the author needs to validate an offset without creating a slice, such as when building a custom parser state machine.
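A typical case for is_char_boundary is fitting text into a byte budget, such as a fixed-size field, where you want the longest prefix that does not cut a character in half. The function below, truncate_to_bytes, is a hypothetical sketch: it starts at the byte limit and steps backward until it reaches a boundary. The loop always terminates because offset 0 is always a boundary.

```rust
/// Hypothetical helper: longest prefix of `text` that fits in `max_bytes`
/// bytes without splitting a multi-byte character.
fn truncate_to_bytes(text: &str, max_bytes: usize) -> &str {
    if max_bytes >= text.len() {
        return text;
    }
    let mut end = max_bytes;
    // Step back until `end` is the first byte of a char (0 always qualifies)
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}

fn main() {
    let text = "Hello, 世界";
    println!("{:?}", truncate_to_bytes(text, 9)); // "Hello, " (bytes 8 and 9 are inside '世')
    println!("{:?}", truncate_to_bytes(text, 10)); // "Hello, 世"
}
```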

Trust get() for safe slicing. It performs the boundary check and the slice in a single call.

Realistic example: parsing a log line

Real code often needs to extract substrings from structured text. Consider a log parser that extracts the username from a line formatted as "USER:alice:logged_in". The username length varies, but the delimiters are fixed bytes. You can find the delimiters by byte offset and slice safely.

/// Extracts the username from a log line with format "USER:username:action"
/// Returns None if the line has fewer than two colons or the username is empty
/// (the "USER" prefix itself is not checked)
fn extract_username(line: &str) -> Option<&str> {
    // Find the first colon
    let first_colon = line.find(':')?;
    // Find the second colon after the first one
    let second_colon = line[first_colon + 1..].find(':')?;

    // Calculate byte offsets relative to the start of the string
    // first_colon is the index of ':'
    // username starts at first_colon + 1
    let start = first_colon + 1;
    // second_colon is relative to the slice, so add first_colon + 1
    let end = first_colon + 1 + second_colon;

    // Use get to slice safely
    // This handles cases where offsets are out of bounds
    line.get(start..end).filter(|s| !s.is_empty())
}

fn main() {
    let log = "USER:alice:logged_in";
    match extract_username(log) {
        Some(user) => println!("User: {user}"), // "alice"
        None => println!("Invalid log line"),
    }
}

The find() method returns byte offsets. It scans for the delimiter byte. Once you have the offsets, you use get() to extract the substring. The filter() call ensures the username isn't empty. This function handles variable-length usernames and returns None for malformed input.

Don't assume find() returns character indices. It returns byte offsets. The offsets work directly with get() because both operate on bytes.
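For a format this regular, the standard str::strip_prefix and str::split_once methods can do the offset bookkeeping for you. The sketch below, extract_username_split, is an alternative to the function above, not a drop-in replacement: unlike extract_username, it also rejects lines that do not start with "USER:". Both methods return &str slices on character boundaries, so no manual index arithmetic is needed.

```rust
/// Alternative sketch: parse "USER:username:action" without manual offsets.
/// Unlike extract_username, this also requires the literal "USER:" prefix.
fn extract_username_split(line: &str) -> Option<&str> {
    // strip_prefix returns the rest of the line, or None if the prefix is missing
    let rest = line.strip_prefix("USER:")?;
    // split_once splits at the first ':' and drops the delimiter
    let (user, _action) = rest.split_once(':')?;
    // Reject an empty username, matching the original filter
    (!user.is_empty()).then_some(user)
}

fn main() {
    for line in ["USER:alice:logged_in", "USER:José:logged_out", "USER::idle", "ADMIN:bob:x"] {
        println!("{line} -> {:?}", extract_username_split(line));
    }
}
```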

Pitfalls and performance traps

The len() method returns the byte length, not the character count. If you use len() to calculate a character-based slice, you will get wrong results. For "Hello, 世界", len() returns 13. The character count is 9. Using len() as a character count, or a character count as a byte offset, produces wrong offsets and boundary panics.

fn main() {
    let text = "Hello, 世界";
    // len() returns bytes, not characters
    println!("Bytes: {}", text.len()); // 13
    println!("Chars: {}", text.chars().count()); // 9

    // This panics because 9 is not a valid byte boundary
    // let sub = &text[0..9]; // Panic: byte index 9 is not a char boundary
}

Performance trap: Calling char_indices().nth(n) inside a loop creates an O(N²) algorithm. Each call to nth(n) iterates from the start of the string to index n. If you do this for every character, you rescan the string repeatedly.

// BAD: O(N²) performance
fn print_chars_slow(text: &str) {
    for i in 0..text.chars().count() {
        // nth(i) iterates from start every time
        let byte = text.char_indices().nth(i).map(|(b, _)| b).unwrap();
        let ch = text.chars().nth(i).unwrap();
        println!("{byte}: {ch}");
    }
}

// GOOD: O(N) performance
fn print_chars_fast(text: &str) {
    // Iterate once; each step yields the byte offset and the char together
    for (byte, ch) in text.char_indices() {
        println!("{byte}: {ch}");
    }
}

Use a single char_indices() loop when you need to process multiple characters. Cache the byte offsets if you need random access later. Don't count on the compiler to eliminate the repeated scans; structure the loop so it never rescans the string.
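Caching can be as simple as recording every character's byte offset once. The CharIndex type below is a hypothetical sketch, not a standard library type: building it costs one O(N) pass and one usize per character, and after that any character-range slice is two Vec lookups. That trade of memory for time pays off when you slice the same string many times.

```rust
/// Hypothetical index: byte offset of every char, built in one O(N) pass.
struct CharIndex<'a> {
    text: &'a str,
    // offsets[k] is the byte offset of char k; the final entry is text.len()
    offsets: Vec<usize>,
}

impl<'a> CharIndex<'a> {
    fn new(text: &'a str) -> Self {
        let mut offsets: Vec<usize> = text.char_indices().map(|(i, _)| i).collect();
        offsets.push(text.len());
        CharIndex { text, offsets }
    }

    /// Number of chars in the indexed text
    fn char_len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// O(1) slice by char positions; None if reversed or out of range
    fn slice(&self, start: usize, end: usize) -> Option<&'a str> {
        if start > end || end > self.char_len() {
            return None;
        }
        let text: &'a str = self.text;
        Some(&text[self.offsets[start]..self.offsets[end]])
    }
}

fn main() {
    let index = CharIndex::new("Hello, 世界");
    println!("{}", index.char_len()); // 9
    println!("{:?}", index.slice(7, 9)); // Some("世界")
    println!("{:?}", index.slice(5, 10)); // None
}
```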

Compiler error: If you try to slice a String and assign it to a String, you get E0308 (mismatched types). Slicing produces a &str, not a String. You must call .to_string() or .into() to convert the slice to an owned string.

fn main() {
    let text = String::from("Hello");
    // E0308: mismatched types
    // expected `String`, found `&str`
    // let sub: String = &text[0..5];

    // Correct: convert slice to owned String
    let sub: String = text[0..5].to_string();
    println!("{sub}");
}

Slicing is cheap. Converting to String allocates memory and copies the bytes. Work with slices wherever you can, and convert to String only when you need ownership.
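A common way to follow this advice is to have functions return a &str that borrows from their input and let the caller decide when to pay for a copy. In this sketch, first_word is a hypothetical example function. The slice shares the original buffer, as the as_ptr check shows; calling to_string() produces an owned copy that can outlive the original String.

```rust
/// Hypothetical example: the first whitespace-separated word, borrowed from `text`.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let kept: String;
    {
        let text = String::from("Rust is great");
        let word: &str = first_word(&text); // no allocation
        assert_eq!(word.as_ptr(), text.as_ptr()); // same buffer as `text`
        kept = word.to_string(); // allocates an owned copy
    } // `text` is dropped here; `word` cannot be used past this point
    println!("{kept}"); // the owned copy is still valid
}
```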

Decision matrix

Use char_indices() when you need to convert a character count into a byte offset for slicing.

Use split_at(byte_offset) when you already have a valid byte offset and want to split the string into two parts without allocating.

Use get(start..end) when you have byte offsets that might be invalid and need to handle the failure gracefully without panicking.

Use chars() when you only need to iterate over characters and don't care about byte positions.

Use is_char_boundary(byte_offset) when you must validate a byte offset before performing a raw slice operation in a low-level context.

Use find() and rfind() when you need to locate delimiters by byte offset for structured parsing.

Use to_string() on a slice only when you need an owned String. Slices are &str and don't allocate.

Where to go next