When Rust isn't enough
You're optimizing a tight loop in a game engine. The profiler screams that a specific math operation is eating 40% of your frame budget. You check the assembly output and see the compiler generated a sequence of three instructions. You know the CPU has a single instruction that does exactly this, but Rust doesn't expose it. Or you're writing a driver for a custom microcontroller and need to set a bit in a hardware register that has no memory-mapped address. You've reached the edge of the abstraction. You need to inject raw assembly instructions directly into your Rust code.
Inline assembly lets you embed CPU instructions inside a Rust function. It bridges the gap between Rust's safety guarantees and the raw capabilities of the hardware. The tool is the asm! macro. It gives you the power to talk to the metal, but it requires you to negotiate carefully with the compiler. You must describe exactly what your assembly does, or the optimizer will make assumptions that break your code.
The black box contract
Think of the compiler as a master architect building a house. It knows exactly where every brick goes and how to arrange them for maximum strength. Inline assembly is like handing the architect a pre-fabricated steel beam and saying, "Put this here, and don't touch it." The architect has to make sure the walls align with the beam, but it can't change the beam's internal structure.
You have to tell the architect what the beam needs and what it produces. You specify which registers hold your data, which registers get trashed, and whether the code touches memory. The compiler treats your assembly block as a black box. It doesn't know what the instructions do internally. It only knows what you tell it about inputs, outputs, and side effects. If you lie to the compiler, the optimizer will reorder code or cache values in ways that corrupt your assembly's state. The compiler trusts your description, not your assembly.
Minimal example
Inline assembly has been stable since Rust 1.59, so no feature flag is needed. Import the macro with use std::arch::asm; (or core::arch::asm in no_std code). The macro takes a template string, a list of operands, and an optional options clause. The template uses placeholders like {0} and {1} to refer to operands by position. The example below targets x86-64.
use std::arch::asm;
fn add_asm(a: i32, b: i32) -> i32 {
    // Holds the output. The asm! block initializes it.
    let result: i32;
    unsafe {
        // SAFETY: The assembly is pure, reads only its inputs, and writes only the output register.
        // No memory is accessed, and no registers are clobbered beyond the output.
        asm!(
            // x86 add is two-operand: the first register is both a source and the destination.
            // The :e modifier prints the 32-bit register name (for example eax), matching i32.
            "add {0:e}, {1:e}",
            inout(reg) a => result, // Compiler loads 'a' into a register and reads the sum back from it.
            in(reg) b,              // Compiler loads 'b' into another register.
            options(pure, nomem, nostack) // Tells the compiler this is safe to optimize aggressively.
        );
    }
    result
}
fn main() {
    println!("Result: {}", add_asm(10, 20));
}
The inout(reg) a => result operand tells the compiler to load a into a register of its choosing and, after the block, read result back from that same register. This matches how add works on x86: the destination register is also the first source. The in(reg) constraint tells the compiler to load b into another register. The options clause is crucial. nostack means the assembly doesn't push anything onto the stack. pure means the block has no side effects and its outputs depend only on its inputs. nomem means the block neither reads nor writes memory; the weaker readonly promises only that it does not write memory. pure must be combined with one of those two. These options let the compiler reorder and optimize around the block. Without them, the compiler assumes the worst: the block may read or write any memory and have other side effects, so the compiler keeps it in place and does not move memory accesses across it.
The constraints are the contract. Break them, and the black box explodes.
Anatomy of the asm! macro
The asm! macro has three parts: the template, the operands, and the options. Understanding each part prevents subtle bugs.
Constraints
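Constraints tell the compiler how to handle each operand. The most common register class is reg, which asks for a general-purpose register. Rust's asm! has no memory-operand constraint; to work on memory, pass a pointer in a register and dereference it in the template. If you need a particular register, name it as a string, as in in("rax") or out("rdx"). Explicitly named registers cannot be referenced through placeholders, so write the register name directly in the template.
Use inout when an operand is both an input and an output. This tells the compiler to load the value into a register, let the assembly modify it, and write the result back to the same location. The form inout(reg) a => b reads from a and writes to b. Use lateout when the assembly writes the output only after it has read all of its inputs, which allows the compiler to reuse an input's register for the output. inlateout is the same relaxation applied to inout. The mul instruction is a typical case: it reads its operands before it writes rax and rdx.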
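As an illustration of explicit registers and late outputs, here is a minimal sketch of a 64x64 to 128-bit multiply on x86-64. The function name mul_wide and the portable fallback are assumptions made for this example; the mul instruction itself implicitly reads rax and writes rdx:rax.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Full 64x64 -> 128-bit multiply using the x86-64 `mul` instruction.
/// `mul_wide` is a hypothetical name used only for this sketch.
#[cfg(target_arch = "x86_64")]
pub fn mul_wide(a: u64, b: u64) -> u128 {
    let lo: u64;
    let hi: u64;
    // SAFETY: `mul` reads rax and the register operand, then writes rdx:rax.
    // Both written registers are declared as outputs, and no memory is touched.
    unsafe {
        asm!(
            "mul {0}",
            in(reg) a,                // any general-purpose register
            inlateout("rax") b => lo, // rax is read first, written afterwards
            lateout("rdx") hi,        // rdx is written only after the inputs are read
            options(pure, nomem, nostack)
        );
    }
    ((hi as u128) << 64) | lo as u128
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn mul_wide(a: u64, b: u64) -> u128 {
    (a as u128) * (b as u128)
}
```

Because rdx is only a late output, the compiler is free to place a in rdx, which saves a register without changing the result.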
Options
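Options control how the compiler optimizes around the assembly. nostack is safe to use when the assembly doesn't push or pop values. pure is safe when the assembly has no side effects and its outputs depend only on its inputs; it must be paired with nomem or readonly. nomem is safe when the assembly neither reads nor writes memory. readonly is safe when the assembly may read memory but never writes it. preserves_flags tells the compiler that the assembly doesn't modify CPU flags, which allows better optimization of surrounding code that checks flags. noreturn marks a block that never falls through to the code after it. There is no naked option; naked functions use their own attribute and the naked_asm! macro.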
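To make the memory options concrete, the following sketch loads a u32 through a pointer on x86-64. Because the block reads memory but never writes it, it is marked readonly rather than nomem. The name load_u32 and the fallback are assumptions for this example.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Loads a u32 through a pointer with a single `mov`.
/// `load_u32` is a hypothetical name used only for this sketch.
#[cfg(target_arch = "x86_64")]
pub fn load_u32(src: &u32) -> u32 {
    let value: u32;
    // SAFETY: `src` is a valid, aligned reference, and the block only reads from it.
    unsafe {
        asm!(
            "mov {out:e}, dword ptr [{ptr}]",
            out = out(reg) value,
            ptr = in(reg) src as *const u32,
            // readonly: the block reads memory but never writes it.
            // preserves_flags: `mov` leaves the CPU flags untouched.
            options(pure, readonly, nostack, preserves_flags)
        );
    }
    value
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn load_u32(src: &u32) -> u32 {
    *src
}
```

Marking this block nomem instead would be a lie: the compiler could then assume the result never depends on what src points to.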
Convention: Pick one syntax and stick to it. On x86, asm! uses Intel syntax by default, and options(att_syntax) switches a block to AT&T syntax. The community leans toward Intel syntax on x86 for readability, and staying with the default means no extra option is needed. Don't mix syntaxes in the same file.
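For comparison, here is the same 32-bit addition written once in each syntax on x86-64. This is a sketch only; the names add_intel and add_att are assumptions for this example. Note that the operand order flips: Intel syntax puts the destination first, AT&T puts it last.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Intel syntax (the asm! default on x86): destination first.
#[cfg(target_arch = "x86_64")]
pub fn add_intel(a: u32, b: u32) -> u32 {
    let sum: u32;
    // SAFETY: register-only arithmetic; no memory access, no stack use.
    unsafe {
        asm!(
            "add {0:e}, {1:e}",
            inout(reg) a => sum,
            in(reg) b,
            options(pure, nomem, nostack)
        );
    }
    sum
}

/// AT&T syntax: source first, destination last, with a size suffix on the mnemonic.
#[cfg(target_arch = "x86_64")]
pub fn add_att(a: u32, b: u32) -> u32 {
    let sum: u32;
    // SAFETY: same contract as add_intel.
    unsafe {
        asm!(
            "addl {1:e}, {0:e}",
            inout(reg) a => sum,
            in(reg) b,
            options(att_syntax, pure, nomem, nostack)
        );
    }
    sum
}

/// Portable fallbacks so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_intel(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn add_att(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}
```

Both functions wrap on overflow, as the hardware add does.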
Syntax
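The template string uses {n} placeholders for positional operands and {name} placeholders for named ones. You can add modifiers to the placeholders to choose which name of the allocated register is printed. On x86-64, {0:r} prints the 64-bit name (rax), {0:e} the 32-bit name (eax), {0:x} the 16-bit name (ax), and {0:l} the low 8-bit name (al). {0:h} prints the high-byte name (ah) and is only available with the reg_abcd register class. Other architectures define their own modifiers; on AArch64, for example, {0:w} prints the 32-bit w register. These modifiers help when the instruction requires a specific register size.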
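A small sketch of the byte-level modifiers on x86-64: the same operand is printed once under its high-byte name and once under its low-byte name. The reg_abcd class restricts allocation to rax, rbx, rcx, or rdx, the only registers that have a high-byte name. The function name dup_low_byte is an assumption for this example.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Copies the low byte of `x` into its high byte (0x00ab becomes 0xabab).
/// `dup_low_byte` is a hypothetical name used only for this sketch.
#[cfg(target_arch = "x86_64")]
pub fn dup_low_byte(x: u16) -> u16 {
    let mut x = x;
    // SAFETY: a register-to-register byte move; no memory, stack, or flags involved.
    unsafe {
        asm!(
            // {0:h} prints e.g. "ah" and {0:l} prints e.g. "al" for the same register.
            "mov {0:h}, {0:l}",
            inout(reg_abcd) x,
            options(pure, nomem, nostack, preserves_flags)
        );
    }
    x
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn dup_low_byte(x: u16) -> u16 {
    (x << 8) | (x & 0x00ff)
}
```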
Realistic patterns
Inline assembly often appears in performance-critical code or hardware interaction. A common pattern is modifying a value in place. The inout constraint handles this cleanly.
use std::arch::asm;
fn square_in_place(val: &mut i32) {
    unsafe {
        // SAFETY: We are modifying the value pointed to by val via a register.
        // The assembly reads the value, squares it (wrapping on overflow), and writes it back.
        // No memory is accessed directly; the dereference happens via the register constraint.
        asm!(
            "imul {0:e}, {0:e}", // :e prints the 32-bit register name, matching i32.
            inout(reg) *val, // Load *val, square it, write back to the same register.
            options(pure, nomem, nostack) // pure must be paired with nomem or readonly.
        );
    }
}
The inout(reg) *val syntax tells the compiler to load *val into a register, pass it as both input and output, and write the result back to *val. The compiler handles the dereference. You don't need to manage the memory access manually.
Convention: Wrap asm! in a safe function when the safety contract is obvious. This hides the unsafe from callers and centralizes the proof. If the assembly is simple and the constraints are correct, you can provide a safe wrapper that guarantees safety. This follows the "minimum unsafe surface" rule. Keep the unsafe block as small as possible.
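One way to apply this convention is sketched below: a safe function whose single unsafe block wraps one bswap instruction on x86-64, with a plain Rust fallback on every other architecture. The name swap_bytes_asm is an assumption for this example, and in real code u32::swap_bytes already compiles to the same instruction.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Reverses the byte order of `x`. Safe to call with any input.
/// `swap_bytes_asm` is a hypothetical name used only for this sketch.
#[cfg(target_arch = "x86_64")]
pub fn swap_bytes_asm(x: u32) -> u32 {
    let mut x = x;
    // SAFETY: `bswap` only rewrites the one register declared as inout.
    // It touches no memory, no stack, and no flags, so every u32 input is valid.
    unsafe {
        asm!(
            "bswap {0:e}",
            inout(reg) x,
            options(pure, nomem, nostack, preserves_flags)
        );
    }
    x
}

/// Portable fallback: callers see the same safe signature on every target.
#[cfg(not(target_arch = "x86_64"))]
pub fn swap_bytes_asm(x: u32) -> u32 {
    x.swap_bytes()
}
```

The unsafe block covers only the asm! call, and the SAFETY comment states why no input can violate the contract.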
Naked functions
Sometimes you need total control. No prologue, no epilogue. The #[unsafe(naked)] attribute, stable since Rust 1.88, combined with the naked_asm! macro lets you write a function that is purely assembly. This is essential for bootloaders, exception handlers, and interrupt vectors. The compiler generates zero code around your assembly. You are responsible for everything, including returning.
use std::arch::naked_asm;
#[unsafe(naked)]
unsafe extern "C" fn idle_loop() {
    // This is a naked function. The compiler generates no prologue or epilogue,
    // and the body must be a single naked_asm! invocation.
    // The assembly must handle the return or loop forever.
    // hlt is privileged, so callers must be running in ring 0.
    naked_asm!(
        "2:",
        "hlt",
        "jmp 2b"
    )
}
A naked_asm! block never falls through to compiler-generated code, so it takes no noreturn option. In an ordinary asm! block, options(noreturn) tells the compiler that control never reaches the code after the block, which lets it skip preserving state for that path. The loop label is 2 rather than 1 because, in Intel syntax, labels made only of the digits 0 and 1 can be misread as binary literals. Naked functions are rare. Use them only when you need to control the exact instruction sequence, such as in firmware or kernel code.
Treat the naked function as a raw instruction stream. If you forget to return or loop, the CPU will execute garbage.
Pitfalls and errors
Inline assembly is stable, but the macro is not in the prelude. You must import it with use std::arch::asm;. If you forget, the compiler rejects the code with an error that it cannot find the macro asm in this scope. Every asm! invocation must also sit inside an unsafe block or unsafe function.
The bigger danger is lying to the compiler. If you write memory but mark the block readonly, or touch memory at all but mark it nomem, the compiler might assume memory hasn't changed and skip reloading a value. The surrounding code then works with stale data. If you modify a register that the compiler is using for something else and you don't declare it as an output or clobber, you corrupt the compiler's state. The result is undefined behavior. The compiler won't catch this. You have to audit every instruction against the ABI and the constraints.
Another pitfall is using the wrong constraint. If you use reg but the instruction implicitly uses a specific register, such as mul writing rdx:rax or a variable shift reading its count from cl, the compiler may place your operand in a different register. In the best case the assembler rejects the operand. In the worst case the code assembles and silently reads or overwrites the wrong register. Always check the instruction manual for register requirements.
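A variable shift on x86-64 is a compact example of an instruction with a fixed register requirement: the shift count must be in cl. The sketch below pins the count with an explicit register operand; the function name shl_var is an assumption for this example.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Shifts `value` left by `count` bits, using the hardware's count masking (mod 32).
/// `shl_var` is a hypothetical name used only for this sketch.
#[cfg(target_arch = "x86_64")]
pub fn shl_var(value: u32, count: u8) -> u32 {
    let mut value = value;
    // SAFETY: register-only arithmetic. `shl` modifies flags, so
    // preserves_flags is deliberately not set.
    unsafe {
        asm!(
            "shl {0:e}, cl",
            inout(reg) value,
            in("cl") count, // the instruction only accepts cl as a variable count
            options(pure, nomem, nostack)
        );
    }
    value
}

/// Portable fallback with the same count masking.
#[cfg(not(target_arch = "x86_64"))]
pub fn shl_var(value: u32, count: u8) -> u32 {
    value.wrapping_shl(count as u32)
}
```

Because cl is claimed explicitly, the compiler will not allocate value to rcx.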
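Treat your clobber declarations as a legal document. Rust's asm! has no separate clobber list: a clobbered register is declared as an output whose value is discarded, such as out("rcx") _, or through clobber_abi for a whole calling convention. If a register isn't declared, the compiler assumes you didn't touch it.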
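In Rust's asm!, a scratch register is declared as an output whose value is discarded. The sketch below multiplies by six on x86-64 using one temporary register chosen by the compiler; the name times_six is an assumption for this example.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Computes x * 6 (wrapping) as (x << 2) + (x << 1), using one scratch register.
/// `times_six` is a hypothetical name used only for this sketch.
#[cfg(target_arch = "x86_64")]
pub fn times_six(x: u64) -> u64 {
    let mut x = x;
    // SAFETY: only the two declared registers are modified; no memory is touched.
    unsafe {
        asm!(
            "mov {tmp}, {x}",
            "shl {tmp}, 1",   // tmp = 2 * x
            "shl {x}, 2",     // x = 4 * x
            "add {x}, {tmp}", // x = 6 * x
            x = inout(reg) x,
            tmp = out(reg) _, // declared clobber: the compiler picks the register
            options(pure, nomem, nostack)
        );
    }
    x
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn times_six(x: u64) -> u64 {
    x.wrapping_mul(6)
}
```

Writing to a hard-coded register such as rcx without an out("rcx") _ operand would be exactly the kind of undeclared clobber this section warns about.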
Decision matrix
Use asm! when you need a specific CPU instruction that has no Rust intrinsic and no safe equivalent, such as a niche SIMD instruction or a privileged operation.
Use std::arch intrinsics when the instruction is common and available as a compiler intrinsic, like _mm256_add_ps for AVX math. Intrinsics are portable across Rust versions and often get better optimization than raw assembly, because the compiler understands what they compute.
Use safe Rust when the performance difference is negligible or when the logic can be expressed with standard operators. The compiler generates excellent code for most algorithms, and readable code beats micro-optimizations in 99% of projects.
Use FFI when you need to call existing C or assembly libraries rather than writing assembly inline. Wrapping a C function is often easier than reimplementing the logic in asm!.
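To show what the intrinsics route looks like, here is a minimal sketch that adds two four-lane float vectors. It uses the SSE intrinsic _mm_add_ps rather than the AVX _mm256_add_ps, because SSE is part of the x86-64 baseline and needs no runtime feature detection. The function name add4 and the fallback are assumptions for this example.

```rust
/// Adds two 4-lane f32 vectors with SSE intrinsics instead of inline assembly.
/// `add4` is a hypothetical name used only for this sketch.
#[cfg(target_arch = "x86_64")]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
    let mut out = [0.0f32; 4];
    // SAFETY: SSE is always available on x86-64, and every pointer refers to
    // four valid f32 values. The unaligned load/store variants are used, so
    // no alignment requirement applies.
    unsafe {
        let sum = _mm_add_ps(_mm_loadu_ps(a.as_ptr()), _mm_loadu_ps(b.as_ptr()));
        _mm_storeu_ps(out.as_mut_ptr(), sum);
    }
    out
}

/// Portable fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Unlike an asm! block, the compiler can see through these calls, so it remains free to fold constants, reorder, or fuse them with surrounding code.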
Inline assembly is a scalpel, not a hammer. Use it only when you have a precise target.