When data needs a home
You are writing a script to parse a log file. You see a stream of lines. You need to count how many times each error code appears. You need to store the raw lines for a debug dump. You need to track the most recent timestamp. You reach for a variable, but Rust asks: what kind of box does this data live in?
The standard library gives you a toolbox, not a single hammer. Picking the wrong tool makes your code slow, hard to read, or impossible to compile. A Vec is not a HashMap. A String is not a Vec<u8>. The compiler forces you to declare your intent. This explicitness prevents bugs where you assume order in a map or treat text as raw bytes.
Collections as storage shapes
Think of collections as different types of storage units optimized for specific access patterns.
A Vec<T> is a row of lockers where the order matters. Locker 0 is next to Locker 1. You can grab Locker 5 instantly because you just count five steps from the start. The memory is contiguous. This makes Vec incredibly fast for iteration and indexing. It also means inserting in the middle is expensive; you have to shift every locker after the insertion point.
A HashMap<K, V> is a warehouse with a magical index card system. You don't care about order. You care that if you ask for "Widget A", the system hands you "Widget A" without checking every other widget. The map uses a hash function to turn your key into a number, then uses that number to find a bucket. Lookups are average O(1). Insertions are average O(1). But the memory layout is scattered. You cannot predict iteration order.
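A String is a Vec<u8> with a safety harness. It stores bytes on the heap, just like a Vec. But it guarantees that every byte sequence is valid UTF-8. This means you can display it as text safely. It also means you cannot write arbitrary bytes into it without validation. The harness costs nothing for ordinary mutation, because the safe methods only accept char and &str values that are already valid UTF-8; validation runs only when you convert raw bytes into a String. The guarantee prevents mojibake and lets string-handling code assume well-formed input.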
Minimal examples with intent
Every collection has a creation pattern and an access pattern. The code reveals the shape.
use std::collections::HashMap;

fn main() {
    // Vec keeps order. Indexing is O(1).
    // Use the vec! macro for initialization; it's idiomatic and concise.
    let mut tasks = vec!["write code", "fix bug", "deploy"];

    // Push is amortized O(1). The Vec grows its capacity as needed.
    tasks.push("write tests");

    // Indexing panics on out-of-bounds. Use .get() for safe access.
    println!("First task: {}", tasks[0]);

    // HashMap maps keys to values. Iteration order is unspecified.
    // Use HashMap::new() for empty maps.
    let mut config = HashMap::new();

    // Insert returns Some(old_value) if the key already existed, None otherwise.
    config.insert("timeout", 30);
    config.insert("retries", 3);

    // Access by key. Indexing works for reads, but panics if the key is missing.
    println!("Timeout: {}", config["timeout"]);

    // String is text: heap-allocated bytes guaranteed to be valid UTF-8.
    let name = String::from("Rustacean");
    println!("Hello, {}", name);
}
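The comments above point to .get() as the non-panicking alternative to indexing. Here is a minimal sketch of that pattern for both Vec and HashMap. The helper names task_at and setting_or are hypothetical, chosen only for this illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns the task at `index`, or None if out of bounds.
fn task_at<'a>(tasks: &[&'a str], index: usize) -> Option<&'a str> {
    tasks.get(index).copied()
}

/// Hypothetical helper: looks up a setting, falling back to a default.
fn setting_or(config: &HashMap<&str, i32>, key: &str, default: i32) -> i32 {
    config.get(key).copied().unwrap_or(default)
}

fn main() {
    let tasks = vec!["write code", "fix bug", "deploy"];

    // tasks[10] would panic; get() returns None instead.
    match task_at(&tasks, 10) {
        Some(task) => println!("Task: {}", task),
        None => println!("No task at that index"),
    }

    let mut config = HashMap::new();
    config.insert("timeout", 30);

    // config["retries"] would panic because the key is missing.
    println!("Retries: {}", setting_or(&config, "retries", 3));
}
```

Both get() calls return an Option, so the missing case has to be handled explicitly instead of surfacing as a runtime panic.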
What happens under the hood
When you create a Vec, Rust allocates a chunk of memory on the heap as soon as it needs room for elements. The Vec struct on the stack holds three values: a pointer to the heap, the length (how many items are valid), and the capacity (how much space is reserved). This triple is why Vec is fast. Accessing vec[i] is a bounds check against the length followed by pointer arithmetic. No loops. No searching.
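The pointer, length, and capacity triple can be observed from safe code. The sketch below prints the size of the Vec handle and the difference between length and capacity. The helper len_and_capacity is a hypothetical name, and the three-word handle size is what current compilers produce, not a documented layout guarantee.

```rust
use std::mem::size_of;

/// Hypothetical helper: reports (length, capacity) of a vector.
fn len_and_capacity<T>(v: &Vec<T>) -> (usize, usize) {
    (v.len(), v.capacity())
}

fn main() {
    // On current compilers the handle is three machine words:
    // pointer, length, capacity. This is an observation, not a guarantee.
    println!("size of Vec<u64> handle: {} bytes", size_of::<Vec<u64>>());
    println!("size of usize: {} bytes", size_of::<usize>());

    // Reserve room for at least 8 items, then store only 3.
    let mut v: Vec<u64> = Vec::with_capacity(8);
    v.extend([10, 20, 30]);

    // Length counts stored items; capacity counts reserved slots.
    let (len, cap) = len_and_capacity(&v);
    println!("len = {}, capacity = {}", len, cap);

    // Indexing is a bounds check plus an offset from the start pointer.
    println!("v[1] = {}", v[1]);
}
```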
When you push to a Vec and the capacity is full, Rust allocates a larger block, copies the data, and frees the old block. The current implementation roughly doubles the capacity each time, although the exact growth strategy is not guaranteed. This makes push amortized O(1). The occasional reallocation is expensive, but it happens rarely enough that the average cost stays low. If you know the size ahead of time, use Vec::with_capacity so that no reallocation happens until you exceed that capacity.
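To see the effect of reserving space up front, the following sketch counts how often the capacity changes while pushing the same number of items into a default Vec and a pre-sized one. The helper count_capacity_changes is hypothetical, and the count for Vec::new() depends on the standard library's growth strategy, so no specific number is claimed for it.

```rust
/// Hypothetical helper: pushes `n` items and counts how often the capacity changed.
fn count_capacity_changes(mut v: Vec<u32>, n: u32) -> usize {
    let mut changes = 0;
    let mut last = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != last {
            // A capacity change means the Vec reallocated its buffer.
            changes += 1;
            last = v.capacity();
        }
    }
    changes
}

fn main() {
    let grown = count_capacity_changes(Vec::new(), 1_000);
    let reserved = count_capacity_changes(Vec::with_capacity(1_000), 1_000);

    println!("Vec::new(): {} capacity changes", grown);
    println!("Vec::with_capacity(1_000): {} capacity changes", reserved);
}
```

The pre-sized vector reports zero changes, because with_capacity guarantees room for at least that many elements without reallocating.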
A HashMap is a hash table. Since Rust 1.36 the standard library implementation is a port of Google's SwissTable design (the hashbrown crate), which replaced the earlier Robin Hood hashing scheme. When you insert a key, Rust computes a hash, uses it to pick a starting group of slots, and probes further groups if that one is full. The default hasher is currently SipHash 1-3, which is designed to resist hash-flooding denial-of-service attacks. This adds a small cost to hashing but protects against malicious inputs.
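The first step of a lookup, turning a key into a number, can be reproduced with the same RandomState type that HashMap uses by default. This is a simplified sketch: hash_with and bucket_for are hypothetical helpers, and the modulo in bucket_for only illustrates the idea of mapping a hash to a slot; it is not how the standard library selects slots internally.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hypothetical helper: hashes `key` with a hasher built from `state`.
fn hash_with<K: Hash + ?Sized>(state: &RandomState, key: &K) -> u64 {
    let mut hasher = state.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: maps a hash onto one of `buckets` slots (illustration only).
fn bucket_for(hash: u64, buckets: u64) -> u64 {
    hash % buckets
}

fn main() {
    // Each RandomState carries randomly chosen keys for the hasher.
    let state = RandomState::new();

    let a = hash_with(&state, "timeout");
    let b = hash_with(&state, "timeout");

    // Same key and same state give the same hash, so a lookup finds the entry.
    assert_eq!(a, b);
    println!("slot (of 16): {}", bucket_for(a, 16));

    // A different RandomState is seeded differently, which is what makes
    // the hash values hard for an attacker to predict.
    let other = RandomState::new();
    println!("same hash under another state: {}", hash_with(&other, "timeout") == a);
}
```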
String validation happens when raw bytes become a String, not on every mutation. Pushing a char or a &str needs no check, because those types are already guaranteed to be valid UTF-8. If you have raw bytes, String::from_utf8 validates them and returns an error if they are malformed. from_utf8_unchecked skips the check, but it is an unsafe function: you must guarantee validity yourself. The community convention is to avoid from_utf8_unchecked unless profiling shows the validation is a bottleneck and you can write a // SAFETY: comment proving validity.
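As a small illustration of the checked conversion from bytes to text, the sketch below feeds valid and invalid byte sequences through String::from_utf8 and shows from_utf8_lossy as a fallback. The helper name bytes_to_text is hypothetical.

```rust
/// Hypothetical helper: converts raw bytes to text, reporting invalid input.
fn bytes_to_text(bytes: Vec<u8>) -> Result<String, String> {
    String::from_utf8(bytes).map_err(|e| format!("invalid UTF-8: {}", e))
}

fn main() {
    // "café" encoded as UTF-8: the é takes two bytes (0xC3 0xA9).
    let valid = vec![0x63, 0x61, 0x66, 0xC3, 0xA9];
    println!("{:?}", bytes_to_text(valid));

    // 0xFF never appears in valid UTF-8, so validation fails.
    let invalid = vec![0x63, 0xFF];
    println!("{:?}", bytes_to_text(invalid));

    // from_utf8_lossy replaces invalid sequences with U+FFFD instead of failing.
    println!("{}", String::from_utf8_lossy(&[0x63, 0xFF]));
}
```

The checked path never produces a String that violates the UTF-8 guarantee; it either succeeds or hands the problem back to the caller.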
Realistic scenario: word frequency counter
You need to count word frequencies in a text. This requires reading tokens, normalizing them, and aggregating counts. A HashMap is the natural choice for aggregation. A Vec might hold the raw lines if you need to preserve them.
use std::collections::HashMap;

/// Counts word frequencies in a text.
/// Returns a map from lowercase word to count.
fn count_words(text: &str) -> HashMap<String, usize> {
    // HashMap::new() starts empty and grows on demand.
    // If you have a rough estimate of unique words, HashMap::with_capacity
    // avoids rehashing during the loop.
    let mut counts = HashMap::new();

    for word in text.split_whitespace() {
        // Trim punctuation and convert to lowercase.
        let clean: String = word
            .trim_matches(|c: char| !c.is_alphanumeric())
            .to_lowercase();

        // Skip empty strings after trimming.
        if clean.is_empty() {
            continue;
        }

        // The entry API is the community standard for "get or create".
        // It avoids double hashing: one for contains_key, one for insert.
        *counts.entry(clean).or_insert(0) += 1;
    }

    counts
}

fn main() {
    let text = "Rust is fast. Rust is safe. Rust is fast.";
    let counts = count_words(text);

    // Iterate and print. Iteration order is unspecified and can change between runs.
    for (word, count) in &counts {
        println!("{}: {}", word, count);
    }
}
The entry API is a convention that pays off. It computes the hash once and handles the insertion or update in a single step. Using contains_key followed by insert hashes the key and searches the table twice. In hot loops over large maps the difference can be measurable, so profile if it matters.
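To make the comparison concrete, here is a sketch of the same counting logic written both ways. The function names count_two_step and count_entry are hypothetical; both produce identical maps and differ only in how many lookups each word costs.

```rust
use std::collections::HashMap;

/// Two-step version: checks for the key, then looks it up again to update.
fn count_two_step(words: &[&str]) -> HashMap<String, usize> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for &word in words {
        if counts.contains_key(word) {
            // Second lookup for the same key.
            *counts.get_mut(word).unwrap() += 1;
        } else {
            counts.insert(word.to_string(), 1);
        }
    }
    counts
}

/// Entry version: one lookup handles both the insert and the update.
fn count_entry(words: &[&str]) -> HashMap<String, usize> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for &word in words {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let words = ["rust", "is", "fast", "rust", "is", "safe"];
    let a = count_two_step(&words);
    let b = count_entry(&words);
    assert_eq!(a, b);

    // Sort by word so the output is stable across runs.
    let mut sorted: Vec<_> = b.iter().collect();
    sorted.sort();
    for (word, count) in sorted {
        println!("{}: {}", word, count);
    }
}
```

One trade-off to keep in mind: entry() takes the key by value, so the entry version allocates a String for every word, even when the key already exists. The two-step version only allocates on first insert.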
Pitfalls and compiler errors
Collections enforce rules at compile time. Breaking these rules produces errors that guide you to the correct structure.
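HashMap keys must implement Eq and Hash. Vec<T> qualifies whenever T does, so Vec<i32> and String are both valid keys. Floating-point types are the classic counterexample: f64 implements neither Eq nor Hash, because NaN is not equal to itself. If you try to use an f64 (or a Vec<f64>) as a key, the compiler rejects the insert with an unsatisfied trait bound error (typically E0277). Use an integer representation of the value, or a wrapper type with well-defined equality, instead. If you need a custom struct as a key, derive PartialEq, Eq, and Hash.
use std::collections::HashMap;

fn main() {
    // Vec<i32> is a valid key: i32 implements Eq and Hash, so Vec<i32> does too.
    let mut by_vec: HashMap<Vec<i32>, &str> = HashMap::new();
    by_vec.insert(vec![1, 2, 3], "value");
    println!("{:?}", by_vec.get(&vec![1, 2, 3]));

    // f64 implements neither Eq nor Hash, so this insert does not compile.
    // let mut by_float = HashMap::new();
    // by_float.insert(1.5_f64, "value");
}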
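A custom struct becomes usable as a key once it derives the required traits. The sketch below uses a hypothetical Endpoint type and a hypothetical request_counts helper to show the derive line and a lookup by struct value.

```rust
use std::collections::HashMap;

/// Hypothetical key type for illustration.
/// Hash and Eq are required for keys; Eq in turn requires PartialEq.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Endpoint {
    host: String,
    port: u16,
}

/// Hypothetical helper: counts how often each endpoint appears.
fn request_counts(hits: &[Endpoint]) -> HashMap<Endpoint, u32> {
    let mut counts: HashMap<Endpoint, u32> = HashMap::new();
    for hit in hits {
        *counts.entry(hit.clone()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let api = Endpoint { host: "api.example".to_string(), port: 443 };
    let web = Endpoint { host: "web.example".to_string(), port: 80 };

    let counts = request_counts(&[api.clone(), web.clone(), api.clone()]);

    // Two endpoints are the same key only if every field matches.
    println!("{:?} -> {}", api, counts[&api]);
    println!("{:?} -> {}", web, counts[&web]);
}
```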
Borrowing a Vec while mutating it causes conflicts. If you hold a reference to an element and then push to the Vec, the compiler rejects you with E0502 (cannot borrow as mutable because it is also borrowed as immutable). The push might trigger a reallocation, which invalidates the reference. The borrow checker protects you from dangling pointers.
fn main() {
    let mut v = vec![1, 2, 3];
    let first = &v[0];
    // E0502: v.push might reallocate, invalidating `first`.
    // v.push(4);
    println!("{}", first);
}
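One common way to satisfy the borrow checker here is to copy the value out so that no reference into the Vec is alive when push runs. This is a minimal sketch with a hypothetical helper push_copy_of_first. It works because i32 is Copy; for non-Copy element types you would clone the element or store its index instead.

```rust
/// Hypothetical helper: appends a copy of the first element, if there is one.
fn push_copy_of_first(v: &mut Vec<i32>) -> Option<i32> {
    // Dereferencing copies the i32; the shared borrow ends with this statement.
    let first = *v.first()?;
    // No reference into `v` is alive, so a reallocation here is harmless.
    v.push(first);
    Some(first)
}

fn main() {
    let mut v = vec![1, 2, 3];
    let first = push_copy_of_first(&mut v);
    println!("{:?} {:?}", first, v);
}
```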
String indexing is not supported. You cannot use s[0] on a String. The compiler rejects this (E0277) because String does not implement Index<usize>. Characters in UTF-8 can be multi-byte, so a byte offset does not necessarily land on a character boundary, and finding the nth character requires walking the string from the start. Slicing by a byte range such as &s[0..3] is allowed, but it panics at runtime if either end falls inside a multi-byte character. Use chars() or bytes() to access content.
fn main() {
    let s = String::from("café");
    // E0277: String cannot be indexed by an integer.
    // let c = s[1];
    // Use chars() for character iteration.
    let first_char = s.chars().next().unwrap();
    println!("{}", first_char);
}
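The difference between byte positions and character positions is easiest to see on a string with a multi-byte character. The sketch below uses a hypothetical helper nth_char and the non-panicking str::get for byte-range slicing. The string is written with an escape so that é is unambiguously the single code point U+00E9.

```rust
/// Hypothetical helper: returns the character at position `n`,
/// counting characters, not bytes. This walks the string from the start.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    // "café" with é as U+00E9, which is two bytes in UTF-8.
    let s = String::from("caf\u{e9}");

    // 4 characters, 5 bytes.
    println!("chars = {}, bytes = {}", s.chars().count(), s.len());

    // Character access by position.
    println!("{:?}", nth_char(&s, 3));

    // Byte-range slicing with get(): Some when both ends are on
    // character boundaries, None when a boundary splits a character.
    println!("{:?}", s.get(0..3));
    println!("{:?}", s.get(0..4));
}
```

Here s.get(0..3) yields the slice "caf", while s.get(0..4) yields None because byte 4 sits in the middle of é. The indexing form &s[0..4] would panic in the same situation.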
Decision matrix: picking the right collection
- Use Vec<T> when you need a growable list with fast random access by index and iteration order matters.
- Use String when you need a growable buffer of valid UTF-8 text.
- Use HashMap<K, V> when you need average O(1) lookups, insertions, and deletions by key, and you don't care about iteration order.
- Use BTreeMap<K, V> when you need keys sorted or range queries like "give me all keys between A and Z".
- Use HashSet<T> when you need to check membership or remove duplicates without storing extra values.
- Use VecDeque<T> when you need fast pushes and pops from both ends.
- Avoid LinkedList<T> in almost all cases; Vec<T> is usually faster and uses less memory because of cache locality and the per-node pointer overhead of a linked list.
Reach for Vec::with_capacity when you know the approximate size ahead of time. Reach for HashMap::with_capacity when you expect many insertions and want to avoid rehashing. Reach for the entry API when you need to get or create a value in a map. Reach for BTreeMap when you need deterministic iteration order based on key comparison.
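The less common collections in this list can be sketched briefly, reusing the error-code theme from the introduction. The helpers codes_in_range and dedupe are hypothetical names; the status codes are made-up sample data.

```rust
use std::collections::{BTreeMap, HashSet, VecDeque};

/// Hypothetical helper: codes within [low, high], in sorted order.
/// Note that range() panics if low > high.
fn codes_in_range(counts: &BTreeMap<u16, usize>, low: u16, high: u16) -> Vec<u16> {
    counts.range(low..=high).map(|(code, _)| *code).collect()
}

/// Hypothetical helper: removes duplicates while keeping first-seen order.
fn dedupe(items: &[u16]) -> Vec<u16> {
    let mut seen = HashSet::new();
    // insert() returns true only the first time a value is added.
    items.iter().copied().filter(|item| seen.insert(*item)).collect()
}

fn main() {
    // BTreeMap keeps keys sorted, which enables range queries.
    let mut counts = BTreeMap::new();
    for code in [500u16, 404, 200, 503, 404] {
        *counts.entry(code).or_insert(0usize) += 1;
    }
    println!("4xx codes: {:?}", codes_in_range(&counts, 400, 499));

    // HashSet tracks membership for de-duplication.
    println!("unique: {:?}", dedupe(&[500, 404, 200, 503, 404]));

    // VecDeque supports cheap pushes and pops at both ends.
    let mut recent: VecDeque<u16> = VecDeque::new();
    recent.push_back(200);
    recent.push_back(404);
    recent.push_front(500);
    println!("front: {:?}, back: {:?}", recent.pop_front(), recent.pop_back());
}
```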
Where to go next
- How to Convert Between Collection Types in Rust
- How to remove duplicates from Vec
- How to find min max in Vec
Pick the collection that matches your access pattern. The compiler will force you to be honest about it. Trust the borrow checker when it blocks a mutation; it's saving you from a reference left dangling by a reallocation.