How to Use Phantom Types for Type-Level Programming
You are writing a configuration loader. You have a Config struct that starts as raw bytes. You parse those bytes into structured fields. Once parsed, the config is ready to use. If someone tries to call config.save() before parsing finishes, the program crashes or writes garbage. In Python, you would add an is_parsed boolean and check it at runtime. In Rust, you can make the type itself change. An unparsed config is a different type than a parsed one. The compiler refuses to let you call save() on the wrong type. This is the power of phantom types.
Phantom types let you encode state, units, or capabilities directly into the type system. The compiler enforces the rules at compile time. You eliminate entire classes of runtime bugs. The technique relies on PhantomData, a zero-sized type that carries type information without storing any data.
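The configuration scenario above can be sketched in a few lines. This is an illustration only: the text names just Config and save(), so the Raw and Parsed tags, the from_bytes and parse methods, and the toy key=value format are all assumptions made for the example.

```rust
use std::marker::PhantomData;

// Hypothetical state tags; they are never instantiated.
struct Raw;
struct Parsed;

struct Config<State> {
    bytes: Vec<u8>,
    fields: Vec<(String, String)>,
    _state: PhantomData<State>,
}

impl Config<Raw> {
    fn from_bytes(bytes: Vec<u8>) -> Self {
        Config { bytes, fields: Vec::new(), _state: PhantomData }
    }

    // Consumes the raw config and returns a value of a different type.
    // The "parser" is a toy: one `key = value` pair per line.
    fn parse(self) -> Config<Parsed> {
        let text = String::from_utf8_lossy(&self.bytes).into_owned();
        let fields = text
            .lines()
            .filter_map(|line| line.split_once('='))
            .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
            .collect();
        Config { bytes: self.bytes, fields, _state: PhantomData }
    }
}

impl Config<Parsed> {
    // `save` exists only on the parsed type. Here it just renders a String.
    fn save(&self) -> String {
        self.fields.iter().map(|(k, v)| format!("{k}={v}\n")).collect()
    }
}

fn main() {
    let raw = Config::<Raw>::from_bytes(b"name = demo\nport = 8080\n".to_vec());
    // raw.save(); // does not compile: `save` is not defined for Config<Raw>
    let parsed = raw.parse();
    print!("{}", parsed.save());
}
```

The is_parsed flag from the Python version has no counterpart here. The information it carried lives in the type parameter instead.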
The concept: tags that vanish at runtime
A phantom type is a tag you stick on a struct to tell the compiler about its state or constraints. The tag takes up zero bytes in memory. It exists only while the code is compiling. Once the binary runs, the tag is gone.
Think of a file folder in a bureaucracy. The folder contains the same papers whether it is labeled "DRAFT" or "FINAL". The label does not change the contents. It changes how the clerk handles the folder. A clerk cannot sign a "DRAFT" folder. The label enforces a rule. In Rust, the label is a generic type parameter, and PhantomData is the mechanism that makes the compiler respect the label.
You define a struct with a generic parameter that does not appear in any field. The compiler complains because it cannot determine how the struct relates to that type. You add a PhantomData<T> field to satisfy the compiler. The field stores nothing and is never read at runtime. It only informs the compiler about variance, drop checking, and auto-traits such as Send and Sync.
use std::marker::PhantomData;
// State tags: empty structs that exist only as types
struct Uninitialized;
struct Initialized;
// The wrapper carries a state tag
struct Sensor<T, State> {
    value: Option<T>,
    // PhantomData tells the compiler this struct conceptually holds a State.
    // It takes zero space. It only affects compile-time checks.
    _state: PhantomData<State>,
}
fn main() {
    // PhantomData is zero-sized. The struct layout is identical regardless of State.
    println!("Size of Sensor<f32, Uninitialized>: {}", std::mem::size_of::<Sensor<f32, Uninitialized>>());
    println!("Size of Sensor<f32, Initialized>: {}", std::mem::size_of::<Sensor<f32, Initialized>>());
}
The compiler treats Sensor<f32, Uninitialized> and Sensor<f32, Initialized> as completely distinct types. You cannot pass one where the other is expected. The state is baked into the type.
The tag is invisible at runtime. It only guides the compiler.
Minimal example: enforcing a state machine
Phantom types shine when you have a state machine where the state is known at compile time. You can define methods only on specific states. You can write transition functions that consume one state and return another. The compiler forces the code to follow the valid path.
use std::marker::PhantomData;
struct Uninitialized;
struct Initialized;
struct Sensor<T, State> {
    value: Option<T>,
    _state: PhantomData<State>,
}
// Methods available only on Uninitialized sensors
impl<T> Sensor<T, Uninitialized> {
    fn new() -> Self {
        Sensor {
            value: None,
            _state: PhantomData,
        }
    }
    // calibrate consumes the uninitialized sensor and returns an initialized one.
    // The type changes. The compiler enforces this transition.
    fn calibrate(self, value: T) -> Sensor<T, Initialized> {
        Sensor {
            value: Some(value),
            _state: PhantomData,
        }
    }
}
// Methods available only on Initialized sensors
impl<T> Sensor<T, Initialized> {
    fn read(&self) -> Option<&T> {
        // Safe to read because the type guarantees initialization.
        self.value.as_ref()
    }
}
fn main() {
    let sensor = Sensor::<f32, Uninitialized>::new();
    // This line would fail to compile:
    // sensor.read();
    // error[E0599]: no method named `read` found for struct `Sensor<f32, Uninitialized>`
    let sensor = sensor.calibrate(42.0);
    println!("Reading: {:?}", sensor.read());
}
The calibrate method takes self by value. It consumes the Uninitialized sensor. It returns an Initialized sensor. You cannot call read before calibrate. The compiler rejects the code with a clear error. The type system models the workflow.
The type changes. The flow is enforced.
Realistic example: a database connection
A common use case is a connection object that must go through a sequence of steps. You create a connection. You connect to the server. You authenticate. Only then can you query. Phantom types make it impossible to query before authenticating.
use std::marker::PhantomData;
// State tags for the connection lifecycle
struct Unconnected;
struct Connected;
struct Authenticated;
struct DbConn<State> {
    host: String,
    // In a real app, this would hold the socket or handle.
    // Here we use a placeholder to focus on the type structure.
    _handle: Option<String>,
    _state: PhantomData<State>,
}
impl DbConn<Unconnected> {
    fn new(host: &str) -> Self {
        DbConn {
            host: host.to_string(),
            _handle: None,
            _state: PhantomData,
        }
    }
    fn connect(self) -> DbConn<Connected> {
        // Simulate connection logic
        DbConn {
            host: self.host,
            _handle: Some("socket_handle".to_string()),
            _state: PhantomData,
        }
    }
}
impl DbConn<Connected> {
    // The credentials are unused in this simulation, hence the underscore prefixes.
    fn auth(self, _user: &str, _pass: &str) -> DbConn<Authenticated> {
        // Simulate authentication
        DbConn {
            host: self.host,
            _handle: self._handle,
            _state: PhantomData,
        }
    }
}
impl DbConn<Authenticated> {
    fn query(&self, sql: &str) {
        println!("Executing query on {}: {}", self.host, sql);
    }
}
fn main() {
    let conn = DbConn::<Unconnected>::new("localhost:5432");
    // conn.query("SELECT 1");
    // error[E0599]: no method named `query` found for struct `DbConn<Unconnected>`
    let conn = conn.connect();
    // conn.query("SELECT 1");
    // error[E0599]: no method named `query` found for struct `DbConn<Connected>`
    let conn = conn.auth("admin", "secret");
    conn.query("SELECT 1");
}
The compiler forces the sequence. You cannot skip connect. You cannot skip auth. The method query exists only on DbConn<Authenticated>. If you try to call it on the wrong state, the compiler rejects you. This pattern prevents subtle bugs where code assumes a connection is ready when it is not.
Convention aside: The community often names the phantom field _marker or _state. The underscore prefix suppresses the dead-code warning for a field that is never read. The name signals to readers that the field is a marker, not real data. Stick to _marker for variance markers and _state for state machines. It helps readers scan the code.
You can't query before you authenticate. The compiler won't let you.
Pitfalls and compiler errors
Phantom types are safe, but they introduce subtle rules. The compiler checks variance and auto-traits based on the phantom data. If you get these wrong, you get errors that look unrelated to your logic.
If you forget to add PhantomData for a generic parameter, the compiler rejects the code with E0392 (parameter is never used). The compiler requires every type and lifetime parameter of a struct to be used by at least one field. PhantomData satisfies this requirement.
struct Bad<T> {
    value: i32,
    // Missing PhantomData<T>
}
// Error[E0392]: parameter `T` is never used
Variance is the trickier part. Variance describes how subtyping relationships, which in Rust mostly means lifetimes, propagate through generic types. PhantomData<T> tells the compiler to treat the struct as if it owns a T. PhantomData<&'a T> tells the compiler to treat the struct as if it borrows a T for 'a. Both are covariant in T. PhantomData<fn(T)> is contravariant in T, and PhantomData<*mut T> is invariant. Pick the marker that matches how the struct actually uses the parameter.
This matters most when a struct has a lifetime parameter that no field mentions, for example because it stores a raw pointer instead of a reference. PhantomData<T> says nothing about 'a, so the compiler rejects the struct with E0392 (parameter 'a is never used). The fix is to use PhantomData<&'a T>, which uses the lifetime and tells the compiler the struct only borrows.
use std::marker::PhantomData;
// Does not compile: the marker mentions T but never uses 'a.
// Error[E0392]: parameter `'a` is never used
struct Borrowed<'a, T> {
    _marker: PhantomData<T>,
}
// Correct version: the marker uses both 'a and T and models a borrow.
struct BorrowedCorrect<'a, T> {
    _marker: PhantomData<&'a T>,
}
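A borrow-shaped marker earns its keep when the struct stores a raw pointer, which carries no lifetime of its own. The following is a minimal sketch under that assumption. RawRef and its methods are hypothetical names chosen for the example.

```rust
use std::marker::PhantomData;

// Hypothetical wrapper: stores a raw pointer but should behave like `&'a T`.
struct RawRef<'a, T> {
    ptr: *const T,
    // Without this field, `'a` would be unused and the struct would not compile.
    // The marker ties the struct to the borrow it was created from.
    _marker: PhantomData<&'a T>,
}

impl<'a, T> RawRef<'a, T> {
    fn new(value: &'a T) -> Self {
        RawRef { ptr: value as *const T, _marker: PhantomData }
    }

    fn get(&self) -> &'a T {
        // SAFETY: `ptr` was created from a `&'a T`, so it is valid and
        // immutable for the whole of `'a`.
        unsafe { &*self.ptr }
    }
}

fn main() {
    let number = 7;
    let r = RawRef::new(&number);
    println!("{}", r.get());
    // Moving or dropping `number` while `r` is still in use is rejected by
    // the borrow checker, just as it would be for a plain `&number`.
}
```

The marker adds no bytes: RawRef is the size of one pointer. All it contributes is the lifetime relationship.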
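Auto-traits like Send and Sync are also affected. If T is not Send, a struct with PhantomData<T> is not Send. This matters for threading. If you want the struct to be Send and Sync regardless of T, use PhantomData<fn() -> T>: function pointers are always Send and Sync, and the marker stays covariant in T. This is appropriate only when the struct truly stores no T. PhantomData<*const T> does the opposite. It makes the struct neither Send nor Sync, which is useful when you want to opt out. Neither marker requires unsafe on its own. For most cases, let the phantom data dictate the auto-traits. It keeps the code safe.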
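The effect of the marker on Send can be checked with a small compile-time probe. This sketch compares two tag-only structs. OwnsTag, FnTag, and assert_send are hypothetical names. PhantomData<fn() -> T> is one common choice for a marker that does not inherit auto-traits from T, because function pointers are Send and Sync whatever their return type.

```rust
use std::marker::PhantomData;
use std::rc::Rc;

// Hypothetical tag-only wrappers. Neither stores a `T`.
struct OwnsTag<T> {
    _marker: PhantomData<T>, // Send/Sync only if T is
}

struct FnTag<T> {
    id: u32,
    _marker: PhantomData<fn() -> T>, // Send + Sync for every T
}

// Compiles only when `T: Send`.
fn assert_send<T: Send>() {}

fn main() {
    // Rc<i32> is not Send, yet FnTag<Rc<i32>> is.
    assert_send::<FnTag<Rc<i32>>>();
    assert_send::<OwnsTag<i32>>();
    // assert_send::<OwnsTag<Rc<i32>>>(); // does not compile: Rc<i32> is not Send

    // The tag can cross a thread boundary because it holds no Rc at all.
    let tag = FnTag::<Rc<i32>> { id: 1, _marker: PhantomData };
    let handle = std::thread::spawn(move || {
        let moved = tag; // move the whole struct into the thread
        moved.id
    });
    println!("id from thread: {}", handle.join().unwrap());
}
```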
Respect the variance. The compiler is watching your phantom data.
Decision matrix
Phantom types are a powerful tool, but they are not always the right choice. Use them when the benefits outweigh the complexity.
Use phantom types for state machines where the state is known at compile time and the set of states is small. The compiler enforces the transitions. You eliminate invalid states.
Use phantom types to encode units or dimensions. A Length<Meters> type is distinct from Length<Feet>. The compiler prevents adding meters to feet. This catches physics bugs at compile time.
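A minimal sketch of the units idea, assuming f64 storage and a single explicit conversion. The text gives only Length, Meters, and Feet. The new and to_meters methods and the _unit field name are assumptions for the example.

```rust
use std::marker::PhantomData;
use std::ops::Add;

// Unit tags: never instantiated, used only as type parameters.
struct Meters;
struct Feet;

struct Length<Unit> {
    value: f64,
    _unit: PhantomData<Unit>,
}

impl<Unit> Length<Unit> {
    fn new(value: f64) -> Self {
        Length { value, _unit: PhantomData }
    }
}

// Addition is defined only between two lengths with the same unit tag.
impl<Unit> Add for Length<Unit> {
    type Output = Length<Unit>;
    fn add(self, rhs: Self) -> Self::Output {
        Length::new(self.value + rhs.value)
    }
}

impl Length<Feet> {
    // An explicit conversion is the only way to cross units.
    fn to_meters(self) -> Length<Meters> {
        Length::new(self.value * 0.3048)
    }
}

fn main() {
    let a = Length::<Meters>::new(2.0);
    let b = Length::<Feet>::new(10.0);
    // let bad = a + b; // does not compile: expected Length<Meters>, found Length<Feet>
    let total = a + b.to_meters();
    println!("{:.3} m", total.value);
}
```

At runtime both types are a bare f64. The unit exists only in the signatures.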
Use PhantomData to fix variance or auto-traits when you have a generic parameter that does not appear in fields. This is a technical necessity, not a design choice. The compiler requires it.
Reach for runtime enums when the state can change dynamically or the set of states is large. Enums are flexible and easy to match. Phantom types add boilerplate and can make the API harder to use.
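For contrast, here is a sketch of the same connection lifecycle with the state held in an enum field. DynConn and ConnState are hypothetical names. The point is that every check moves to runtime and surfaces as a Result rather than a compile error.

```rust
// Hypothetical runtime-checked connection, for comparison with DbConn<State>.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConnState {
    Unconnected,
    Connected,
    Authenticated,
}

struct DynConn {
    host: String,
    state: ConnState,
}

impl DynConn {
    fn new(host: &str) -> Self {
        DynConn { host: host.to_string(), state: ConnState::Unconnected }
    }

    // The state is an ordinary value, so it can change in place.
    fn connect(&mut self) {
        self.state = ConnState::Connected;
    }

    fn auth(&mut self) -> Result<(), String> {
        match self.state {
            ConnState::Connected => {
                self.state = ConnState::Authenticated;
                Ok(())
            }
            other => Err(format!("cannot authenticate while {:?}", other)),
        }
    }

    fn query(&self, sql: &str) -> Result<String, String> {
        match self.state {
            ConnState::Authenticated => {
                Ok(format!("Executing query on {}: {}", self.host, sql))
            }
            other => Err(format!("cannot query while {:?}", other)),
        }
    }
}

fn main() {
    let mut conn = DynConn::new("localhost:5432");
    // This compiles. The mistake is reported only when the line runs.
    println!("{:?}", conn.query("SELECT 1"));
    conn.connect();
    conn.auth().expect("auth should succeed after connect");
    println!("{:?}", conn.query("SELECT 1"));
}
```

This version can keep connections in different states in one Vec<DynConn> and can move backwards, for example on a disconnect. The price is that callers must handle the error path everywhere.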
Reach for Option<T> or Result<T, E> for simple initialized or error cases where the type does not need to change. These types are idiomatic and well-supported. Phantom types are overkill for a single flag.
Phantom types turn runtime bugs into compile-time errors. Use them when the cost of a wrong state is high.