How to iterate over characters

When bytes break your text

You ask a user for their name. They type "José". You grab the string, loop over the bytes, and print each one as a character. You get J, o, s, and then two bytes that look like random noise. The console spits out J o s Ã ©. The user stares at the screen. The name is broken.

Bytes are not characters. Rust strings are UTF-8, which means a single character can take one, two, three, or four bytes. If you iterate over bytes, you slice through the middle of emojis, accented letters, and Chinese characters. You need to iterate over char values instead.
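Here is a minimal sketch of both loops over "José", using only the standard library. The helper bytes_as_chars is hypothetical: it casts each byte to char, which treats every byte as a separate Latin-1 code point and reproduces the broken output from the story.

```rust
/// Hypothetical helper: casts every byte to `char`, the way a naive byte loop
/// prints text. Each byte becomes its own Latin-1 code point.
fn bytes_as_chars(s: &str) -> String {
    s.bytes().map(|b| b as char).collect()
}

fn main() {
    let name = "José";

    // 'é' (U+00E9) is stored as two UTF-8 bytes: 0xC3 0xA9.
    let bytes: Vec<u8> = name.bytes().collect();
    println!("{:?}", bytes); // [74, 111, 115, 195, 169]

    // .chars() decodes those bytes back into four complete characters.
    let chars: Vec<char> = name.chars().collect();
    println!("{:?}", chars); // ['J', 'o', 's', 'é']

    // The naive byte loop turns 0xC3 0xA9 into 'Ã' and '©'.
    println!("{}", bytes_as_chars(name)); // JosÃ©
}
```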

How UTF-8 and `chars()` work

A String in Rust is a buffer of bytes that is guaranteed to hold valid UTF-8. It doesn't record where one character ends and the next begins. UTF-8 is a variable-length encoding. The letter A takes one byte. The letter é takes two. The character 世 takes three. The emoji 🦀 takes four. If you treat the string like a fixed-width array, you'll cut characters in half.
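To check those sizes yourself, a small sketch like this one prints the UTF-8 length and the encoded bytes of each character. char::len_utf8 and char::encode_utf8 are standard methods; the wrapper utf8_bytes is a hypothetical convenience for this example.

```rust
/// Hypothetical helper: encodes one char into its UTF-8 bytes using a
/// stack buffer (4 bytes is the most any char can need).
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    for c in ['A', 'é', '世', '🦀'] {
        // len_utf8() reports how many bytes this char occupies in UTF-8.
        println!("{} -> {} byte(s): {:02X?}", c, c.len_utf8(), utf8_bytes(c));
    }
}
```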

chars() acts like a decoder. It reads the bytes, looks at the first byte to figure out how many more bytes belong to that character, groups them together, and hands you a complete char. It's the difference between reading a stream of raw pixels and reading the words on the page.

/// Demonstrates iterating over characters correctly.
fn main() {
    let text = "Hello, 世界!";
    
    // .chars() returns an iterator over Unicode scalar values.
    // This handles multi-byte characters like '世' correctly.
    for c in text.chars() {
        println!("Char: {}", c);
    }
}

When you call .chars(), Rust creates an iterator. This iterator holds a cursor pointing to the start of the string. Every time the loop asks for the next item, the iterator peeks at the current byte. If the top bit is zero, it's a one-byte ASCII character. The iterator grabs that byte, advances the cursor by one, and returns the character.

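If the top bits signal a multi-byte sequence, the iterator reads the continuation bytes that belong to it, assembles the Unicode code point, advances the cursor past the whole sequence, and returns the result. It doesn't need to validate anything along the way: a str is guaranteed to be valid UTF-8, so malformed bytes are rejected earlier, when the string is built. String::from_utf8 returns an error for invalid input, and String::from_utf8_lossy substitutes the U+FFFD replacement character. The loop never sees the raw bytes. It only sees valid char values.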
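The leading-byte logic can be sketched by hand. sequence_len and char_starts below are hypothetical helpers written for illustration, not the standard library's implementation (the real Chars iterator is more optimized), but on valid UTF-8 they should land on the same character boundaries, which main checks against str::char_indices().

```rust
/// Hypothetical helper: length of a UTF-8 sequence, judged only from the
/// high bits of its first byte. Returns None for continuation bytes
/// (10xxxxxx) and bytes that can never start a sequence. This is a
/// bit-pattern check only; a real validator also rejects 0xC0, 0xC1,
/// and leading bytes above 0xF4.
fn sequence_len(first: u8) -> Option<usize> {
    match first {
        0x00..=0x7F => Some(1), // 0xxxxxxx: ASCII
        0xC0..=0xDF => Some(2), // 110xxxxx
        0xE0..=0xEF => Some(3), // 1110xxxx
        0xF0..=0xF7 => Some(4), // 11110xxx
        _ => None,              // 10xxxxxx continuation, or 11111xxx
    }
}

/// Hypothetical helper: moves a cursor through the bytes and records the
/// byte offset where each character starts.
fn char_starts(s: &str) -> Vec<usize> {
    let bytes = s.as_bytes();
    let mut starts = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        starts.push(i);
        // A &str is always valid UTF-8, so the cursor only lands on leading bytes.
        i += sequence_len(bytes[i]).expect("valid UTF-8 never starts with a continuation byte");
    }
    starts
}

fn main() {
    let text = "Hello, 世界!";
    let ours = char_starts(text);
    // char_indices() yields (byte_offset, char) pairs from the std iterator.
    let from_std: Vec<usize> = text.char_indices().map(|(i, _)| i).collect();
    assert_eq!(ours, from_std);
    println!("{:?}", ours); // [0, 1, 2, 3, 4, 5, 6, 7, 10, 13]
}
```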

Bytes are not characters. Trust the iterator.
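Trusting the iterator is safe because validation happens when raw bytes become a string, not during iteration. This sketch, assuming input arrives as a Vec<u8>, contrasts the strict String::from_utf8 with String::from_utf8_lossy; decode_lossy is a hypothetical wrapper.

```rust
/// Hypothetical helper: decodes bytes, replacing malformed sequences with
/// U+FFFD REPLACEMENT CHARACTER instead of failing.
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    // 0xC3 0xA9 is a complete 'é'.
    let good = vec![b'J', b'o', b's', 0xC3, 0xA9];
    // 0xC3 announces a two-byte sequence, but 'x' is not a continuation byte.
    let bad = vec![b'J', b'o', b's', 0xC3, b'x'];

    // Strict decoding: a String is only created if every byte checks out.
    for input in [good, bad.clone()] {
        match String::from_utf8(input) {
            Ok(s) => println!("ok: {}", s),
            Err(e) => println!("error: {}", e),
        }
    }

    // Lossy decoding: the stray 0xC3 becomes U+FFFD, and 'x' survives.
    println!("lossy: {}", decode_lossy(&bad)); // lossy: Jos�x
}
```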

The `char` type costs memory

The char type is a fixed-size value. It takes four bytes of memory. This might seem wasteful compared to a u8, but it's necessary. Unicode code points range up to 0x10FFFF, which requires 21 bits. Rust rounds char up to 32 bits for alignment and simplicity. A char can hold any Unicode scalar value: any code point from 0 to 0x10FFFF except the surrogate range 0xD800 to 0xDFFF.
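A quick way to confirm both the size and the valid range is std::mem::size_of together with char::from_u32, which returns None for values that aren't Unicode scalar values. The is_scalar_value helper is hypothetical.

```rust
use std::mem::size_of;

/// Hypothetical helper: is `n` a Unicode scalar value, i.e. a valid char?
fn is_scalar_value(n: u32) -> bool {
    char::from_u32(n).is_some()
}

fn main() {
    // Every char is 4 bytes, whether it holds 'A' or '🦀'.
    println!("size_of::<char>() = {}", size_of::<char>()); // 4

    // The highest scalar value is U+10FFFF.
    println!("char::MAX = U+{:X}", char::MAX as u32); // U+10FFFF

    // Surrogates (0xD800..=0xDFFF) and anything past 0x10FFFF are rejected.
    for n in [0x41, 0xD800, 0x1F980, 0x10FFFF, 0x110000] {
        println!("0x{:X} -> {}", n, is_scalar_value(n));
    }
}
```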

This has a memory cost. If you collect characters into a Vec<char>, you expand the data. A string of 100 ASCII characters takes 100 bytes in a String. The same characters in a Vec<char> take 400 bytes.
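The arithmetic can be checked directly. This sketch counts only the heap payload of the text, ignoring spare capacity and each container's fixed-size header; string_payload and vec_char_payload are hypothetical helpers.

```rust
use std::mem::size_of;

/// Hypothetical helper: bytes of UTF-8 text (String's header and spare
/// capacity are not counted).
fn string_payload(s: &str) -> usize {
    s.len()
}

/// Hypothetical helper: bytes used after collecting into Vec<char>,
/// at 4 bytes per char (again ignoring header and spare capacity).
fn vec_char_payload(s: &str) -> usize {
    let v: Vec<char> = s.chars().collect();
    v.len() * size_of::<char>()
}

fn main() {
    let ascii = "a".repeat(100);
    println!("{} vs {}", string_payload(&ascii), vec_char_payload(&ascii)); // 100 vs 400

    // For CJK text the gap is smaller: 3 bytes per char in UTF-8 vs 4 as char.
    println!("{} vs {}", string_payload("世界"), vec_char_payload("世界")); // 6 vs 8
}
```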

Keep your data in String or &str. Iterate with .chars() when you need to process the content. Don't convert to Vec<char> just to hold the data. The community convention is to treat char as a transient value during iteration, not a storage format. If you need random access to characters, you're fighting the encoding. Restructure the logic around an iterator, or, if you truly need repeated random access, collect into a Vec<char> once and accept the memory cost.
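As an illustration of staying iterator-first, here are two hypothetical tasks that work on a &str without building a Vec<char>. The second relies on Chars being double-ended, so it can read from the back of the string.

```rust
/// Hypothetical task: count uppercase letters straight from the &str.
fn count_uppercase(s: &str) -> usize {
    s.chars().filter(|c| c.is_uppercase()).count()
}

/// Hypothetical task: the last character, decoded from the back.
/// Chars implements DoubleEndedIterator, so nothing is collected.
fn last_char(s: &str) -> Option<char> {
    s.chars().next_back()
}

fn main() {
    println!("{}", count_uppercase("Hello Ärger")); // 2
    println!("{:?}", last_char("café")); // Some('é')
    println!("{:?}", last_char("")); // None
}
```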

Pitfalls that trip up beginners

Indexing strings fails

You cannot index into a string with text[i]. The compiler rejects this because str does not implement Index<usize>, and its help message typically points you toward .chars().nth() or .bytes().nth() instead. The reasoning is that indexing is expected to be a constant-time operation, which UTF-8 cannot guarantee: you have to walk the bytes to find the Nth character.

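If you need the character at position N, use .chars().nth(n). It returns an Option<char>, which is None when the string has fewer characters. It decodes from the start of the string on every call, so it's linear time, not constant time. Accept the cost. Indexing strings is a trap. Walk the iterator.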
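A minimal sketch of the alternatives, using a hypothetical char_at helper. It also shows byte-range slicing with str::get, which compiles but returns None when the range would split a character.

```rust
/// Hypothetical helper: the character at zero-based position `n`, if any.
/// This decodes from the start of the string, so it is O(n).
fn char_at(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    let text = "héllo";
    // let c = text[1]; // does not compile: str cannot be indexed by an integer

    println!("{:?}", char_at(text, 1)); // Some('é')
    println!("{:?}", char_at(text, 10)); // None

    // Byte ranges must fall on char boundaries; .get() returns None
    // instead of panicking when they don't.
    println!("{:?}", text.get(0..1)); // Some("h")
    println!("{:?}", text.get(0..2)); // None: byte 2 is inside 'é'
}
```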

`to_uppercase` returns an iterator

This is a surprise for many. char::to_uppercase() returns a ToUppercase iterator, not a char. Why? Because one character can expand into multiple characters during case conversion. The German sharp s ß becomes SS.

If you write let upper: char = c.to_uppercase();, the compiler rejects it with a type mismatch. You must consume the iterator. let upper: String = c.to_uppercase().collect(); is the safe pattern. If you expect a single character and are willing to ignore expansion, use .next() instead: c.to_uppercase().next() gives you an Option<char> holding the first character of the expansion.
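Both patterns look like this in practice, using ß because its uppercase form is two characters. upper_string is a hypothetical helper; str::to_uppercase is the standard method for converting a whole string.

```rust
/// Hypothetical helper: collects the full uppercase mapping of one char.
fn upper_string(c: char) -> String {
    c.to_uppercase().collect()
}

fn main() {
    let c = 'ß';
    // let upper: char = c.to_uppercase(); // type mismatch: ToUppercase is not char

    println!("{}", upper_string(c)); // SS

    // .next() yields Option<char>; here it silently drops the second 'S'.
    println!("{:?}", c.to_uppercase().next()); // Some('S')

    // For whole strings, str::to_uppercase returns a String directly.
    println!("{}", "straße".to_uppercase()); // STRASSE
}
```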

Case conversion can expand characters. Handle the iterator.

Comparing `char` to `&str`

Comparing a char to a string slice fails. c == "a" triggers E0308 (mismatched types). You must compare char to char using c == 'a'. The compiler is strict here because &str can contain multiple characters or be empty. The types are fundamentally different.
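When you genuinely need to compare a char with string data, one option is to check that the string holds exactly that one character. is_exactly is a hypothetical helper; the last line shows that many str methods, such as starts_with, accept a char as the pattern.

```rust
/// Hypothetical helper: true if `s` is exactly the one character `c`.
fn is_exactly(s: &str, c: char) -> bool {
    let mut it = s.chars();
    it.next() == Some(c) && it.next().is_none()
}

fn main() {
    let c = 'a';
    // if c == "a" {} // E0308: mismatched types

    println!("{}", c == 'a'); // true
    println!("{}", is_exactly("a", c)); // true
    println!("{}", is_exactly("ab", c)); // false
    println!("{}", is_exactly("", c)); // false

    // Many str methods take a char directly as the pattern.
    println!("{}", "apple".starts_with(c)); // true
}
```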

Realistic example: validating input

Let's build a password validator. It needs to check the length, reject emojis, and ensure the input contains only alphanumeric characters and underscores. This is a common pattern in web backends and CLI tools.

/// Validates a password string based on simple rules.
fn is_valid_password(password: &str) -> bool {
    // .chars() gives us Unicode characters.
    // .count() consumes the iterator to get the character count.
    // This counts characters, not bytes, which matches user expectations.
    if password.chars().count() < 8 {
        return false;
    }

    // .any() returns true as soon as one character matches.
    // Rough emoji check: most emoji live in U+1F300..=U+1FAFF, and many older
    // symbols sit in U+2600..=U+27BF. These ranges are an approximation that
    // also catch some non-emoji symbols; for production, rely on real Unicode
    // emoji data rather than hand-written ranges.
    let looks_like_emoji = |c: char| {
        ('\u{1F300}'..='\u{1FAFF}').contains(&c) || ('\u{2600}'..='\u{27BF}').contains(&c)
    };
    if password.chars().any(looks_like_emoji) {
        return false;
    }

    // .all() returns true if every character satisfies the predicate.
    // is_alphanumeric() handles Unicode letters and digits correctly.
    // Emoji are not alphanumeric, so this check would also reject them;
    // the explicit check above states the rule up front.
    password.chars().all(|c| c.is_alphanumeric() || c == '_')
}

fn main() {
    let tests = [
        "rust_is_cool",      // Valid
        "short",             // Too short
        "has_emoji_🦀",      // Emoji rejected
        "123_456_789",       // Valid
        "café_crème",        // Valid: accented letters are alphanumeric
    ];

    for test in tests {
        let result = is_valid_password(test);
        println!("{} -> {}", test, result);
    }
}

The validator uses .chars().count() for length. This counts characters, not bytes. A user typing "café" sees four characters, and .chars().count() returns 4. .len() returns 5, because é takes two bytes, so a byte-based length check would drift from what the user sees.
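To make the drift concrete, this sketch compares a character-based and a byte-based minimum-length check on a seven-character string that contains one accented letter. Both helpers are hypothetical and exist only for the comparison.

```rust
/// Hypothetical helper: character-based minimum length, as in the validator.
fn long_enough_chars(s: &str, min: usize) -> bool {
    s.chars().count() >= min
}

/// Hypothetical helper: byte-based version, shown only for contrast.
fn long_enough_bytes(s: &str, min: usize) -> bool {
    s.len() >= min
}

fn main() {
    let word = "café";
    println!("{} chars, {} bytes", word.chars().count(), word.len()); // 4 chars, 5 bytes

    // Seven characters but eight bytes: only the byte check lets it through.
    let pw = "abcdefé";
    println!("chars: {}", long_enough_chars(pw, 8)); // chars: false
    println!("bytes: {}", long_enough_bytes(pw, 8)); // bytes: true
}
```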

The validator also uses .all() and .any(). These methods consume the iterator, return a boolean, and short-circuit. .any() returns true as soon as it finds an emoji. .all() returns false as soon as it finds an invalid character. This is efficient. You don't scan the whole string if you can fail fast.
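You can observe the short-circuiting by counting how many characters the iterator actually hands out. This sketch uses Iterator::inspect for the count; check_counting is a hypothetical helper that repeats the validator's .all() predicate.

```rust
/// Hypothetical helper: the validator's .all() check, plus a count of how
/// many characters were examined before it returned.
fn check_counting(s: &str) -> (bool, usize) {
    let mut examined = 0;
    let ok = s
        .chars()
        .inspect(|_| examined += 1) // runs once per char pulled by .all()
        .all(|c| c.is_alphanumeric() || c == '_');
    (ok, examined)
}

fn main() {
    // '!' is the third character, so .all() stops there.
    println!("{:?}", check_counting("ab!cdefgh")); // (false, 3)
    // A fully valid string has to be scanned to the end.
    println!("{:?}", check_counting("abc_123")); // (true, 7)
}
```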

When to use what

Use .chars() when you need to process individual Unicode scalar values, such as counting letters, checking for specific symbols, or validating input content.

Use .bytes() when you are performing binary-safe operations, parsing protocols, or need maximum performance on ASCII-only data where you can guarantee no multi-byte sequences exist.

Use .split() or .lines() when you need to break the string into substrings based on delimiters rather than examining individual characters.

Use the unicode-segmentation crate when you need grapheme clusters, the units a user perceives as single characters, such as laying out text in a terminal or handling joined emoji sequences like 👨‍👩‍👧‍👦.
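The standard library alone can show why chars() is too fine-grained for this job. This sketch counts chars and bytes for a joined family emoji and for a letter built from a base character plus a combining accent. Splitting text into grapheme clusters is what the unicode-segmentation crate adds (for example through its graphemes method), which this example does not use.

```rust
/// Family emoji: four people joined by three ZERO WIDTH JOINERs (U+200D).
const FAMILY: &str = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}";

/// 'e' followed by U+0301 COMBINING ACUTE ACCENT: displays like "café".
const DECOMPOSED: &str = "cafe\u{301}";

fn main() {
    // Usually rendered as one glyph, yet it is seven chars and 25 bytes.
    println!("{} chars, {} bytes", FAMILY.chars().count(), FAMILY.len()); // 7 chars, 25 bytes

    // Four visible letters, but five chars.
    println!("{} chars", DECOMPOSED.chars().count()); // 5 chars
}
```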

Pick the tool that matches the granularity of your problem.

Where to go next

In Rust, strings are made of bytes, but characters can be larger than one byte. The .chars() method decodes the string into individual characters (Unicode scalar values) so you can process them one by one. Think of it like reading a sentence word by word instead of letter by letter: you never cut a word in half.