When text gets too heavy
You are building a service that processes thousands of events per second. Every millisecond of parsing latency compounds into dropped requests. Every extra byte on the wire increases bandwidth costs. You started with JSON because it is familiar, but the payload sizes are bloated and the string parsing is eating your CPU budget. You need a binary format that stays compact, deserializes fast, and integrates cleanly with Rust's type system. MessagePack and CBOR are the two standard answers.
How binary formats actually work
Text-based formats like JSON or YAML convert your data into human-readable strings. A number like 42 becomes two characters. A boolean true becomes four characters. Field names are repeated for every single record. Binary formats skip the text translation entirely. They write raw bytes directly to memory or disk. Think of JSON as mailing a letter with a detailed hand-written description of every item inside. MessagePack and CBOR are like shipping a pre-packed crate with a standardized barcode. The receiver scans the header, knows exactly how many bytes to read for each field, and reconstructs the object without guessing.
MessagePack focuses on speed and simplicity. It mirrors JSON's data model but replaces quotes, braces, and textual numbers with compact type markers. Field names are optional in practice: rmp-serde writes structs as positional arrays by default, and rmp_serde::to_vec_named keeps the names when you want them. CBOR is an IETF standard (RFC 8949) designed for constrained environments. It defines tags that annotate values with extra meaning, which is how it covers timestamps, big integers, and an explicit self-describing marker. Both integrate with Serde through format-specific crates that implement the Serializer and Deserializer traits. Serde provides the trait definitions and the derive macros. The format crates provide the byte-level encoding rules. The two pieces snap together at compile time.
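To make the size difference concrete, here is a minimal sketch that hand-encodes a two-field record both ways using only the standard library. The helper names encode_json and encode_msgpack are made up for illustration, and the encoder covers only the array, unsigned-integer, and boolean markers from the MessagePack spec. In real code a format crate does this for you.

```rust
/// Hypothetical helper: renders the record as JSON text.
fn encode_json(id: u32, active: bool) -> Vec<u8> {
    format!("{{\"id\":{},\"active\":{}}}", id, active).into_bytes()
}

/// Hypothetical helper: writes the same record as the MessagePack
/// array [id, active], with no field names on the wire.
fn encode_msgpack(id: u32, active: bool) -> Vec<u8> {
    // fixarray marker: high nibble 0x9, low nibble is the element count (2)
    let mut out = vec![0x92];
    match id {
        // positive fixint: the value is its own marker
        0..=0x7f => out.push(id as u8),
        // uint 8: marker plus one payload byte
        0x80..=0xff => {
            out.push(0xcc);
            out.push(id as u8);
        }
        // uint 16: marker plus two big-endian payload bytes
        0x100..=0xffff => {
            out.push(0xcd);
            out.extend_from_slice(&(id as u16).to_be_bytes());
        }
        // uint 32: marker plus four big-endian payload bytes
        _ => {
            out.push(0xce);
            out.extend_from_slice(&id.to_be_bytes());
        }
    }
    // booleans are single marker bytes
    out.push(if active { 0xc3 } else { 0xc2 });
    out
}
```

For id 101 and active true, the JSON text is 24 bytes and the MessagePack array is 3 bytes (0x92, 0x65, 0xc3). The gap narrows once field names are kept, which is the trade-off between positional and named encodings.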
Stop treating JSON as the default. Switch to binary when payload size or parsing speed matters.
The minimal setup
Add the format crates to your dependencies. rmp-serde handles MessagePack. ciborium handles CBOR. Serde itself needs the derive feature to generate the boilerplate implementations for your structs.
[dependencies]
serde = { version = "1.0", features = ["derive"] }
rmp-serde = "1.3"
ciborium = "0.2"
The basic workflow is identical for both formats. You define a struct, derive the traits, and call the format crate's serialization functions.
use serde::{Deserialize, Serialize};
use std::io::Cursor;

/// A simple configuration record to serialize
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Config {
    id: u32,
    active: bool,
}

fn main() {
    let original = Config { id: 101, active: true };

    // MessagePack serializes directly to a byte vector
    // The function allocates a Vec<u8> and writes the raw bytes
    let mp_bytes = rmp_serde::to_vec(&original).unwrap();
    // Deserialization reads the slice and reconstructs the struct
    let mp_restored: Config = rmp_serde::from_slice(&mp_bytes).unwrap();

    // CBOR uses a reader/writer pattern for streaming compatibility
    // We allocate a buffer to receive the encoded bytes
    let mut cbor_buf = Vec::new();
    // into_writer accepts any type implementing std::io::Write
    ciborium::into_writer(&original, &mut cbor_buf).unwrap();
    // from_reader requires a type implementing std::io::Read
    // Cursor wraps the buffer so it satisfies Read; a plain byte slice
    // such as cbor_buf.as_slice() implements Read as well
    let cbor_restored: Config = ciborium::from_reader(Cursor::new(&cbor_buf)).unwrap();

    assert_eq!(original, mp_restored);
    assert_eq!(original, cbor_restored);
}
Real-world payloads and streaming
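Real data rarely stays flat. You will encounter nested structs, optional fields, and enums. Both formats handle them automatically, but the byte layout differs. Both are also self-describing at the value level: every value starts with a type marker, so either format can be decoded into a generic value type (ciborium::value::Value, or rmpv::Value for MessagePack) without knowing the Rust struct ahead of time. CBOR goes one step further with tags, which attach semantic meaning such as "this value is a timestamp" to the data that follows. Deserializing into a concrete struct, as the example here does, requires the target type in both formats.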
use serde::{Deserialize, Serialize};
use std::io::Cursor;

/// Represents different server states
#[derive(Serialize, Deserialize, Debug, PartialEq)]
enum ServerState {
    Idle,
    Processing { queue_depth: usize },
    Offline,
}

/// A more complex payload with nested data and optional fields
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct ServerSnapshot {
    state: ServerState,
    uptime_seconds: u64,
    // None is encoded as a single nil (MessagePack) or null (CBOR) byte;
    // Some(v) is encoded as v itself
    debug_token: Option<String>,
}

fn serialize_snapshot() -> Result<(), Box<dyn std::error::Error>> {
    let snapshot = ServerSnapshot {
        state: ServerState::Processing { queue_depth: 42 },
        uptime_seconds: 86400,
        debug_token: Some("alpha-test".to_string()),
    };

    // rmp-serde encodes an enum as its variant tag followed by the variant's fields
    // The derive macro generates the Serialize impl automatically
    let mp_data = rmp_serde::to_vec(&snapshot)?;
    println!("MessagePack size: {} bytes", mp_data.len());

    // ciborium writes structs as maps keyed by field name, so its output
    // is typically larger than rmp-serde's positional-array form
    // into_writer appends the encoded bytes to the Vec we pass in
    let mut cbor_data = Vec::new();
    ciborium::into_writer(&snapshot, &mut cbor_data)?;
    println!("CBOR size: {} bytes", cbor_data.len());

    // Verify round-trip integrity
    // Deserialization returns an error if the bytes do not match the struct
    let mp_back: ServerSnapshot = rmp_serde::from_slice(&mp_data)?;
    let cbor_back: ServerSnapshot = ciborium::from_reader(Cursor::new(&cbor_data))?;
    assert_eq!(snapshot, mp_back);
    assert_eq!(snapshot, cbor_back);
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    serialize_snapshot()
}
When you call rmp_serde::to_vec, Serde visits every field in ServerSnapshot. It asks the MessagePack serializer how to encode a u64, an enum, and an Option. The serializer writes the exact byte sequence defined by the MessagePack spec. With the default to_vec, the struct becomes a positional array: no quotes, no struct field names, just markers and values. Deserialization reverses the process. from_slice reads the first byte, sees an array marker announcing three elements, then decodes each one in turn: the enum variant and its payload, the integer, and the optional string. Integers are stored in the narrowest width that fits, so 86400 occupies one marker byte plus four payload bytes rather than a full eight.
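The marker-then-payload decoding step can be sketched by hand. This is an illustration using only the standard library, not how rmp-serde is implemented: decode_uint is a hypothetical function that understands nothing but MessagePack's unsigned-integer markers.

```rust
/// Hypothetical decoder: reads one MessagePack unsigned integer from the
/// front of `input`. Returns the value and the number of bytes consumed,
/// or None if the input is truncated or starts with a different marker.
fn decode_uint(input: &[u8]) -> Option<(u64, usize)> {
    let (&marker, rest) = input.split_first()?;
    // The marker byte tells us how many payload bytes follow.
    let width = match marker {
        0x00..=0x7f => return Some((marker as u64, 1)), // positive fixint
        0xcc => 1,                                      // uint 8
        0xcd => 2,                                      // uint 16
        0xce => 4,                                      // uint 32
        0xcf => 8,                                      // uint 64
        _ => return None,                               // not an unsigned integer
    };
    // `get` returns None instead of panicking when the payload is cut short.
    let payload = rest.get(..width)?;
    // The payload is big-endian; fold the bytes into a u64.
    let value = payload.iter().fold(0u64, |acc, &b| (acc << 8) | b as u64);
    Some((value, 1 + width))
}
```

Feeding it the bytes 0xce, 0x00, 0x01, 0x51, 0x80 yields the value 86400 with five bytes consumed. A truncated input such as 0xcd, 0x01 yields None rather than a panic.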
CBOR follows the same trait-based pipeline but exposes a different API surface. into_writer and from_reader accept anything that implements std::io::Write and std::io::Read. This design choice lets you stream CBOR data directly to a file, a network socket, or a memory buffer without building an intermediate vector first. rmp-serde's to_vec and from_slice are the simpler entry points because they assume you want everything in memory at once; for streams, the crate also provides rmp_serde::encode::write and rmp_serde::decode::from_read. For in-memory CBOR work, pass a byte slice such as buf.as_slice(), which implements Read directly, or wrap the buffer in Cursor when you want an explicit, seekable stream. If you pass a &Vec<u8> directly to ciborium::from_reader, the compiler rejects it with E0277 (trait bound not satisfied), because Vec<u8> itself does not implement Read.
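The Read bound is easy to explore without any format crate. In this sketch, drain is a hypothetical stand-in for a reader-based decoder: it has the same kind of generic bound as a from_reader function, but simply copies the bytes out.

```rust
use std::io::{Cursor, Read};

/// Hypothetical stand-in for a reader-based decoder: accepts any `Read`
/// and drains it into a buffer.
fn drain<R: Read>(mut reader: R) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    reader.read_to_end(&mut out)?;
    Ok(out)
}

/// Hands the same in-memory buffer to `drain` in two different ways.
fn read_both_ways(buf: &Vec<u8>) -> std::io::Result<(Vec<u8>, Vec<u8>)> {
    // A byte slice implements `Read` directly.
    let via_slice = drain(buf.as_slice())?;
    // `Cursor` implements `Read` too and also tracks a seekable position.
    let via_cursor = drain(Cursor::new(buf))?;
    // `drain(buf)` would not compile: `&Vec<u8>` does not implement `Read`.
    Ok((via_slice, via_cursor))
}
```

Both calls return the same bytes. The slice form is the shortest, and the Cursor form is worth keeping when the code later needs Seek or wants to make the stream-like intent obvious.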
Keep your format crates pinned to known versions. The wire formats are fixed by their specs, but how a crate maps Rust types onto them can change between releases.
Pitfalls and compiler friction
The most common mistake is forgetting the #[derive(Serialize, Deserialize)] attribute. The compiler will reject the code with E0277 (trait bound not satisfied). Serde cannot guess how to pack your custom type without the generated implementation. Add the derive macro and the error disappears.
Another trap is mixing up the API styles. rmp-serde's from_slice takes a byte slice. ciborium's from_reader takes a reader. If you pass a Vec<u8> or a &Vec<u8> to a reader function, you get E0277 (trait bound not satisfied), because the compiler expects something that implements std::io::Read. Pass a &[u8], use Cursor for in-memory buffers, or use std::fs::File for disk I/O. Keeping Cursor explicit signals to readers that you are treating a byte buffer as a stream, not as a raw buffer.
Version drift causes silent breakage. Serde 1.0 is stable, but format crates update independently. Commit your Cargo.lock, or pin exact versions in Cargo.toml, and run cargo update deliberately. Schema changes are not handled for you either. If you add a field to a struct and try to deserialize old data, both MessagePack and CBOR will fail unless the new field is marked #[serde(default)]. The compiler cannot catch this. The failure surfaces at runtime as an Err from the deserializer, and it becomes a panic only if you unwrap it. Add #[serde(default)] to fields introduced after data has been written and handle missing data gracefully.
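A hand-rolled decoder shows why old payloads fail and what a default buys you. This is only an analogy built on the standard library: Job and decode_job are hypothetical, the decoder understands a fixarray of small positive integers and nothing else, and the lenient flag plays the role that #[serde(default)] plays in derived code.

```rust
#[derive(Debug, PartialEq)]
struct Job {
    id: u8,
    retries: u8, // field added in a later version of the struct
}

/// Hypothetical decoder for a positional record of positive fixints.
/// With `lenient` set, a missing trailing field falls back to its default.
fn decode_job(input: &[u8], lenient: bool) -> Result<Job, String> {
    let (&marker, rest) = input.split_first().ok_or("empty input")?;
    // fixarray markers are 0x90..=0x9f; the low nibble is the element count.
    if marker & 0xf0 != 0x90 {
        return Err(format!("expected a fixarray marker, got {:#04x}", marker));
    }
    let len = (marker & 0x0f) as usize;
    let fields = rest.get(..len).ok_or("truncated input")?;
    if fields.iter().any(|&b| b > 0x7f) {
        return Err("this sketch only understands positive fixints".to_string());
    }
    let id = *fields.first().ok_or("missing field `id`")?;
    let retries = match fields.get(1) {
        Some(&value) => value,
        // Old payloads were written before `retries` existed.
        None if lenient => u8::default(),
        None => return Err("missing field `retries`".to_string()),
    };
    Ok(Job { id, retries })
}
```

An old one-element payload (0x91, 0x07) decodes to a Job with retries 0 when lenient is true and returns an error when it is false. Either way the caller gets a Result to act on, not a crash.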
Error handling matters. unwrap() hides deserialization failures. In production code, propagate errors with ? or map them to your application's error type. A malformed byte stream should never crash your service.
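Here is a minimal sketch of that propagation pattern using only the standard library. DecodeError, AppError, decode_flag, and handle_request are all hypothetical names, and the decoder recognizes just MessagePack's two boolean markers. With a real format crate you would write the same From impl for the crate's own error type.

```rust
/// Hypothetical low-level error, standing in for a format crate's error type.
#[derive(Debug, PartialEq)]
enum DecodeError {
    Empty,
    UnexpectedMarker(u8),
}

/// Hypothetical application-level error.
#[derive(Debug, PartialEq)]
enum AppError {
    BadPayload(DecodeError),
}

// This conversion is what lets `?` turn a DecodeError into an AppError.
impl From<DecodeError> for AppError {
    fn from(err: DecodeError) -> Self {
        AppError::BadPayload(err)
    }
}

/// Decodes a single MessagePack boolean (0xc3 is true, 0xc2 is false).
fn decode_flag(input: &[u8]) -> Result<bool, DecodeError> {
    match input.first().copied() {
        Some(0xc3) => Ok(true),
        Some(0xc2) => Ok(false),
        Some(other) => Err(DecodeError::UnexpectedMarker(other)),
        None => Err(DecodeError::Empty),
    }
}

/// Request handler: malformed input becomes an Err value, never a panic.
fn handle_request(payload: &[u8]) -> Result<&'static str, AppError> {
    let enabled = decode_flag(payload)?; // converted through the From impl
    Ok(if enabled { "enabled" } else { "disabled" })
}
```

A caller can match on AppError::BadPayload to log and reject the request while the service keeps running.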
Treat deserialization errors as data validation failures. Log the payload and reject the request.
Choosing your format
Use MessagePack when you need fast, compact serialization and your data model closely matches JSON. Use MessagePack when you are building internal microservices where both sides share the exact same Rust codebase. Use CBOR when you need an IETF-standard format that supports timestamps, big integers, or self-describing tags. Use CBOR when ciborium's reader/writer API fits how you already move data to disk or over a network. Use JSON when human readability matters more than payload size or parsing latency.
Pick the format that matches your I/O pattern. Do not optimize for bytes you will never count.