The C++ Wall
You spent weeks training a neural network in Python. The accuracy is great. Now you need to ship it inside a Rust microservice that handles thousands of requests per second. You open your editor, type extern "C", and stare at a blank line. PyTorch doesn't give you a C function to call. It's a C++ beast with no stable C API. You can't just link against it and run.
PyTorch is written in C++. C++ compilers mangle function names to encode their namespaces and argument types. A function such as torch::jit::load appears in the compiled binary as a symbol starting with _ZN5torch3jit4load..., not as load. Rust's FFI expects C-style symbols. It cannot guess the mangled names. Even if you could, the C++ Application Binary Interface is not guaranteed to stay stable across compilers, standard library versions, and library releases. Calling C++ directly from Rust is a maintenance trap that breaks with every update.
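To make the mangling concrete, here is a minimal sketch of how the Itanium C++ ABI (the scheme used by GCC and Clang) encodes a namespaced name. The helper mangle_nested_name is hypothetical and covers only the nested-name prefix. A real compiler also appends an encoding of the parameter types and applies substitutions, which this sketch leaves out.

```rust
/// Builds the Itanium-style prefix for a namespaced C++ name.
/// Each path component is written as its byte length followed by the
/// component itself, and the whole sequence is wrapped in `_ZN` ... `E`.
/// Illustrative only: parameter types, templates, and substitutions
/// are not handled.
fn mangle_nested_name(path: &[&str]) -> String {
    let mut out = String::from("_ZN");
    for part in path {
        // Length prefix, then the identifier: "torch" becomes "5torch".
        out.push_str(&part.len().to_string());
        out.push_str(part);
    }
    // `E` closes the nested name; the parameter encoding would follow it.
    out.push('E');
    out
}
```

Calling mangle_nested_name(&["torch", "jit", "load"]) yields _ZN5torch3jit4loadE. The length prefixes are why Rust cannot simply declare the plain name in an extern block: the symbol it needs depends on the full C++ signature.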
The community solved this with two bridges. The Python bridge uses pyo3 to call the Python interpreter, which then talks to PyTorch. The C++ bridge uses torch-sys to wrap the C++ core with a C shim, giving Rust access to the raw library. The Python bridge is the standard path for deploying models. The C++ bridge is for building Rust-native machine learning engines.
Pick the bridge that matches your goal. Don't try to build a third bridge unless you enjoy reinventing wheels.
Two Bridges: Python vs C++
The Python bridge relies on pyo3. Python has a stable C API. PyTorch exposes itself to Python. Rust calls Python, Python calls PyTorch. The chain is stable and well-supported. You get access to every PyTorch feature, every model zoo script, and every community hack. The trade-off is that you need a Python runtime and you pay the overhead of crossing the Python boundary.
The C++ bridge relies on torch-sys. This crate ships a C shim that re-exports parts of the PyTorch C++ API as C symbols, together with the matching Rust extern declarations. You get direct access to the C++ API without Python. The trade-off is complexity. You deal with C++ memory management, ABI stability, and a steeper learning curve. The tch-rs crate sits on top of torch-sys to provide a safer, more ergonomic API.
Use the Python bridge when you want to deploy existing models quickly. Use the C++ bridge when you need maximum performance and want to avoid the Python dependency entirely.
The Standard Path: pyo3
Most Rust projects that use PyTorch reach for pyo3. It handles the Foreign Function Interface details, manages the Global Interpreter Lock, and converts types between Rust and Python automatically. You write Rust code that looks like Python code, but with Rust's type safety and performance.
Add pyo3 to your Cargo.toml. The extension-module feature is essential if you are building a Python extension. It tells pyo3 not to link against libpython. The extension instead resolves Python symbols from the interpreter that loads it, which avoids conflicts with the host Python interpreter.
[dependencies]
# The examples in this section use the Bound API as it looks in pyo3 0.23.
pyo3 = { version = "0.23", features = ["extension-module"] }
Convention aside: The community builds pyo3 projects with maturin rather than calling cargo build directly. maturin drives cargo, handles the Python packaging, builds the shared library, and installs it into the virtual environment. Run maturin develop for fast iteration. It saves you from fighting linker flags on macOS and Windows.
Minimal Example
Here is a minimal function that loads a PyTorch model and returns a score. The code runs inside a Python module. The #[pyfunction] attribute exposes the function to Python. The Python<'_> argument proves you hold the GIL.
use pyo3::prelude::*;

/// Run inference on a loaded PyTorch model.
#[pyfunction]
fn run_model(py: Python<'_>) -> PyResult<f64> {
    // Import the torch module from the running Python interpreter.
    // This is equivalent to `import torch` in Python.
    let torch = py.import("torch")?;
    // Load the model file. This calls torch.load("model.pt").
    // The ? operator converts Python exceptions into Rust errors.
    // The leading underscore marks the binding as intentionally unused.
    let _model = torch.getattr("load")?.call1(("model.pt",))?;
    // Return a dummy score for the example.
    // In real code, you would call the model on an input tensor here.
    Ok(0.95)
}

/// Define the Python module structure.
#[pymodule]
fn my_module(m: &Bound<'_, PyModule>) -> PyResult<()> {
    // Expose the Rust function to Python.
    // wrap_pyfunction creates a Python callable from the Rust function.
    m.add_function(wrap_pyfunction!(run_model, m)?)?;
    Ok(())
}
The Python<'_> token is your passport. No token, no Python calls. The compiler rejects any attempt to call py.import or model.call_method without that token in scope. This prevents data races where multiple threads try to execute Python bytecode simultaneously.
How the GIL Token Works
Python has a Global Interpreter Lock. Only one thread can execute Python bytecode at a time. pyo3 represents this lock with the Python<'_> token. You acquire the token with Python::with_gil or receive it as an argument in #[pyfunction]. The token has a lifetime. The compiler ensures you cannot use the token after you release the lock.
Ah-ha reveal: The token isn't just a value. It's a proof that you hold the GIL. The compiler uses it to prevent you from calling Python from a thread that doesn't own the lock. The token is not Send, so if you spawn a Rust thread and try to use the outer py inside it, the code won't compile. Acquire the GIL inside the new thread with Python::with_gil instead. To share Python objects across threads, move a PyObject, not the token.
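The token-as-proof idea can be sketched with the standard library alone. The names below (INTERPRETER_LOCK, Token, with_lock, import_module) are hypothetical stand-ins, not pyo3's types, and the real implementation differs in important ways. For example, pyo3's lock acquisition can be nested, while this sketch would deadlock if with_lock were called from inside its own closure.

```rust
use std::marker::PhantomData;
use std::sync::Mutex;

/// Stand-in for the interpreter lock. Hypothetical, not pyo3's type.
static INTERPRETER_LOCK: Mutex<()> = Mutex::new(());

/// Zero-sized proof that the current thread holds the lock.
/// The raw-pointer marker makes the token neither `Send` nor `Sync`,
/// so it cannot be moved into another thread.
#[derive(Clone, Copy)]
struct Token<'lock> {
    _marker: PhantomData<(&'lock (), *mut ())>,
}

/// Acquires the lock, runs `f` with a token, then releases the lock.
/// The higher-ranked lifetime keeps the token from escaping the closure,
/// because the return type `R` cannot mention `'lock`.
fn with_lock<F, R>(f: F) -> R
where
    F: for<'lock> FnOnce(Token<'lock>) -> R,
{
    let _guard = INTERPRETER_LOCK.lock().unwrap();
    f(Token { _marker: PhantomData })
}

/// An operation that is only callable while the lock is held.
/// The token parameter is the compile-time evidence; it carries no data.
fn import_module(_token: Token<'_>, name: &str) -> String {
    format!("<module '{}'>", name)
}

/// Usage: the only way to obtain a token is to go through `with_lock`.
fn load_torch() -> String {
    with_lock(|token| import_module(token, "torch"))
}
```

Trying to move token into std::thread::spawn inside the closure is rejected by the compiler, which is the same class of error the paragraph above describes.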
When you are done with Python calls, you can release the GIL to let other threads run. Use py.allow_threads to run Rust code without holding the lock. This is critical for performance. If you hold the GIL while doing heavy Rust computation, every other Python thread in the process waits until you finish.
// Release the GIL while doing heavy Rust work.
// The closure runs without the lock.
let result = py.allow_threads(|| {
    // Heavy Rust computation here.
    // No Python calls allowed inside this block.
    42
});
Trust the borrow checker here. If the compiler complains about the GIL token, it's protecting you from a deadlock or a crash.
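The release-compute-reacquire pattern behind allow_threads can be illustrated with a plain Mutex. This is a sketch under stated assumptions: release_while and demo are hypothetical helpers, and a std Mutex stands in for the GIL. It only shows that the lock is free for other users while the heavy work runs.

```rust
use std::sync::{Mutex, MutexGuard};

/// Drops the guard, runs `work` without the lock, then takes the lock again.
/// Returns the fresh guard together with the result of `work`.
fn release_while<'a, T, F, R>(
    lock: &'a Mutex<T>,
    guard: MutexGuard<'a, T>,
    work: F,
) -> (MutexGuard<'a, T>, R)
where
    F: FnOnce() -> R,
{
    drop(guard); // Release: other threads may now acquire the lock.
    let result = work(); // Heavy computation runs unlocked.
    (lock.lock().unwrap(), result) // Reacquire before returning.
}

/// Reports whether the lock was busy while held, whether it was free
/// during the work, and the value the work computed.
fn demo() -> (bool, bool, u64) {
    let lock = Mutex::new(());
    let guard = lock.lock().unwrap();
    // While the guard is alive, nobody else can take the lock.
    let blocked_while_held = lock.try_lock().is_err();
    let (guard, (free_during_work, sum)) = release_while(&lock, guard, || {
        // Inside the closure the lock is available to others.
        let free = lock.try_lock().is_ok();
        let sum: u64 = (1..=1_000u64).sum();
        (free, sum)
    });
    drop(guard);
    (blocked_while_held, free_during_work, sum)
}
```

The closure cannot touch the protected state because it never receives a guard, which mirrors the rule that no Python calls are allowed inside allow_threads.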
Realistic Example: Model Lifecycle
Loading a model from disk is expensive. You don't want to load it on every request. You need to load it once and reuse it. pyo3 provides PyObject for this. PyObject is a reference-counted handle to a Python object. It keeps the object alive even when you drop the typed reference.
use pyo3::prelude::*;

/// Holds a PyTorch model loaded from disk.
struct ModelRunner {
    // PyObject stores a reference to the Python object.
    // It keeps the model alive across multiple calls.
    model: PyObject,
}

impl ModelRunner {
    /// Load the model once during initialization.
    fn new(py: Python<'_>) -> PyResult<Self> {
        let torch = py.import("torch")?;
        // Load the model and store it as a generic Python object.
        // into() converts the typed reference into a PyObject.
        let model = torch.getattr("load")?.call1(("model.pt",))?;
        Ok(Self { model: model.into() })
    }

    /// Run inference on input data.
    fn predict(&self, py: Python<'_>, input: f64) -> PyResult<f64> {
        // Convert the stored PyObject back to a usable reference.
        // bind(py) requires the GIL token to access the object.
        let model = self.model.bind(py);
        // Call the forward pass.
        // call_method invokes model.forward(input).
        // Assumption: this model accepts a plain float and returns a scalar;
        // most real models expect a tensor instead.
        let result = model.call_method("forward", (input,), None)?;
        // Extract the result as a Rust f64.
        // extract() handles the type conversion.
        result.extract()
    }
}
Convention aside: Always store PyObject for cross-call state. Storing a Bound<'py, PyAny> or another GIL-bound reference won't work because its lifetime is tied to the GIL scope it was created in. PyObject owns a reference count and survives GIL drops.
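The load-once, reuse-everywhere lifecycle is independent of pyo3 and can be sketched with the standard library. Everything here is a hypothetical stand-in: Model is a fake model with a single scale factor, load_model simulates the expensive disk load, and LOAD_COUNT exists only to show that the load happens once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a loaded model; here just a scale factor.
struct Model {
    scale: f64,
}

/// Counts how often the expensive load actually runs.
static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Process-wide slot that is filled on first use and then reused.
static MODEL: OnceLock<Model> = OnceLock::new();

/// Simulates the expensive load from disk.
fn load_model() -> Model {
    LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
    Model { scale: 2.0 }
}

/// Returns the shared model, loading it only on the first call.
fn model() -> &'static Model {
    MODEL.get_or_init(load_model)
}

/// Per-request entry point: no loading happens here after the first call.
fn predict(input: f64) -> f64 {
    model().scale * input
}
```

A struct field, as in ModelRunner, and a global cell, as in this sketch, are two ways to get the same effect. The struct keeps ownership explicit, while the global is convenient when request handlers cannot easily share state.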
Memory and Data Transfer
Passing small values like floats is cheap. Passing large tensors requires care. PyTorch tensors live in buffers managed by PyTorch's allocator, on the CPU or on a GPU. Rust vectors live on the Rust heap. Copying data between them is slow.
The standard approach is to use NumPy arrays as the bridge. NumPy support is not part of pyo3 itself; it comes from the separate numpy crate (rust-numpy), which builds on pyo3. You can hand a Rust Vec to NumPy without copying by transferring ownership, or borrow a Rust slice from a contiguous NumPy array. This zero-copy transfer is essential for performance.
use numpy::PyReadonlyArray1;
use pyo3::prelude::*;

/// Double every element of a NumPy array, reading its buffer in place.
fn process_numpy(arr: PyReadonlyArray1<'_, f64>) -> PyResult<Vec<f64>> {
    // Borrow the array's buffer as a Rust slice without copying.
    // as_slice() returns an error if the array is not contiguous.
    let data = arr.as_slice()?;
    // Process the data in Rust. The output is a new Vec, so only this step allocates.
    let result: Vec<f64> = data.iter().map(|x| x * 2.0).collect();
    Ok(result)
}
Pitfall: The NumPy array must stay alive while you use the slice. If the array were freed first, the slice would point to freed memory. The numpy crate's PyReadonlyArray1 ties the slice's lifetime to the array borrow, so the compiler enforces this for you. Don't try to manage raw pointers manually.
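How a borrowed view avoids both the copy and the dangling pointer can be shown without NumPy. The types below (ArrayOwner, ReadonlyView) are hypothetical stand-ins for an array object and its read-only borrow. This is a sketch of the lifetime pattern, not the numpy crate's implementation.

```rust
/// Hypothetical owner of a numeric buffer, standing in for an array object.
struct ArrayOwner {
    data: Vec<f64>,
}

/// Read-only view that borrows from the owner instead of copying.
struct ReadonlyView<'a> {
    data: &'a [f64],
}

impl ArrayOwner {
    /// Hands out a view whose lifetime is tied to `self`.
    fn readonly(&self) -> ReadonlyView<'_> {
        ReadonlyView { data: &self.data }
    }
}

impl<'a> ReadonlyView<'a> {
    /// Exposes the underlying buffer as a slice; no bytes are moved.
    fn as_slice(&self) -> &'a [f64] {
        self.data
    }
}

/// Processes the borrowed data; only the output Vec allocates.
fn double_all(view: &ReadonlyView<'_>) -> Vec<f64> {
    view.as_slice().iter().map(|x| x * 2.0).collect()
}

/// Confirms the view points at the owner's buffer rather than a copy.
fn shares_buffer(owner: &ArrayOwner) -> bool {
    let view = owner.readonly();
    std::ptr::eq(view.as_slice().as_ptr(), owner.data.as_ptr())
}
```

Dropping the owner while a view is still in use does not compile, which is the guarantee that makes the zero-copy path safe.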
Pitfalls and Errors
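Missing the extension-module feature causes link problems in extension builds. Without it, pyo3 links your library against libpython, which can fail at link time on platforms whose Python ships without a shared libpython, or conflict with the interpreter that later imports the module. The reverse mistake also bites: with the feature enabled, a Rust binary or a cargo test run that embeds Python fails to link because the Python symbols are undefined. Enable the feature for Python extensions and leave it off when embedding.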
Calling Python without the GIL is a compile error. The compiler rejects code that tries to use py.import or model.call_method without a Python<'_> token. You'll see an error about missing the token or a lifetime mismatch. Acquire the token with Python::with_gil or pass it from the function argument.
ABI mismatches happen when your Rust binary is built against a different Python version than the runtime. PyTorch binaries are picky about Python versions and system libraries. Test with the exact Python version your deployment uses. Use abi3 in pyo3 if you need to support multiple Python versions, but be aware that some features are restricted.
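Python exceptions do not cause panics on their own. pyo3 converts them into PyResult errors. Use the ? operator to propagate them to the caller; when the error leaves a #[pyfunction], pyo3 raises it again as a Python exception. A panic only happens if you call unwrap() or expect() on a failed PyResult, and pyo3 then surfaces that panic in Python as a PanicException. Always handle PyResult.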
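The propagation pattern itself is plain Rust. The sketch below uses a hypothetical ModelError in place of a Python exception and a fake forward function in place of a model call, so it runs without pyo3. The shape of the code is the same when the error type is PyErr.

```rust
use std::fmt;

/// Hypothetical error standing in for a raised Python exception.
#[derive(Debug, PartialEq)]
struct ModelError(String);

impl fmt::Display for ModelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "model error: {}", self.0)
    }
}

impl std::error::Error for ModelError {}

/// Fake model call: rejects NaN, otherwise halves the input.
fn forward(input: f64) -> Result<f64, ModelError> {
    if input.is_nan() {
        Err(ModelError("input is NaN".to_string()))
    } else {
        Ok(input * 0.5)
    }
}

/// Each `?` returns early with the error instead of panicking.
fn predict_pair(a: f64, b: f64) -> Result<f64, ModelError> {
    let first = forward(a)?;
    let second = forward(b)?;
    Ok(first + second)
}
```

Replacing either ? with unwrap() would turn the NaN case into a panic, which is the situation to avoid at the Python boundary.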
Don't fight the compiler here. If pyo3 rejects your code, it's usually preventing a segfault or a deadlock. Read the error message. It tells you exactly what went wrong.
Decision Matrix
Use pyo3 when you need to run existing PyTorch models written in Python and want to avoid rewriting the inference logic. Use pyo3 with the extension-module feature when you are building a Python extension that calls Rust, rather than a Rust binary that calls Python. Use pyo3 with allow_threads when you have heavy Rust computation that should run without blocking the Python interpreter.
Use torch-sys when you are building a Rust-native machine learning library and need direct access to the C++ API without the Python overhead. Use tch-rs when you want C++ performance with Rust ergonomics and don't want to write unsafe blocks manually. Use a Rust-native framework like candle or burn when you want full Rust control, no Python dependency, and better integration with Rust's type system.
Reach for plain Rust math libraries when your model is simple enough to implement from scratch. PyTorch adds significant dependency weight. If you only need matrix multiplication and softmax, a lightweight crate is faster and easier to deploy.
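To give a sense of how little code the simple case needs, here is a std-only sketch of a matrix-vector product followed by a numerically stable softmax. The function names and the row-major layout are assumptions made for this example. A production version would more likely use an established linear algebra crate.

```rust
/// Multiplies a row-major `rows x cols` matrix by a vector of length `cols`.
fn matvec(matrix: &[f64], rows: usize, cols: usize, x: &[f64]) -> Vec<f64> {
    assert_eq!(matrix.len(), rows * cols, "matrix size mismatch");
    assert_eq!(x.len(), cols, "vector size mismatch");
    if cols == 0 {
        // Empty rows: every dot product is zero.
        return vec![0.0; rows];
    }
    matrix
        .chunks(cols)
        .map(|row| row.iter().zip(x).map(|(w, v)| w * v).sum::<f64>())
        .collect()
}

/// Softmax with the maximum subtracted first, so large logits do not overflow.
fn softmax(logits: &[f64]) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|z| (z - max).exp()).collect();
    let total: f64 = exps.iter().sum();
    exps.iter().map(|e| e / total).collect()
}

/// A single linear layer followed by softmax: class probabilities.
fn classify(weights: &[f64], rows: usize, cols: usize, features: &[f64]) -> Vec<f64> {
    softmax(&matvec(weights, rows, cols, features))
}
```

With identity weights and two equal features, classify returns equal probabilities for both classes, which is a quick sanity check before wiring in real weights.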
Counter-intuitive but true: the more you use pyo3, the harder it is to reason about performance. Python overhead adds up. Profile your code. If the Python boundary is the bottleneck, switch to tch-rs or a native framework.
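A first measurement does not need a full profiler. The sketch below is a hypothetical helper, time_calls, that times repeated calls of any closure with std::time::Instant. The closure stands in for whatever crosses the Python boundary in your code. Treat the numbers as a rough signal, since a dedicated benchmarking tool controls for warm-up and noise far better.

```rust
use std::time::{Duration, Instant};

/// Calls `call` `iterations` times and returns the total elapsed time
/// together with the sum of the returned values. Summing the results
/// keeps the compiler from optimizing the calls away.
fn time_calls<F>(mut call: F, iterations: u32) -> (Duration, f64)
where
    F: FnMut() -> f64,
{
    let start = Instant::now();
    let mut checksum = 0.0;
    for _ in 0..iterations {
        checksum += call();
    }
    (start.elapsed(), checksum)
}

/// Average time per call, or zero when nothing was measured.
fn per_call(total: Duration, iterations: u32) -> Duration {
    if iterations == 0 {
        Duration::ZERO
    } else {
        total / iterations
    }
}
```

Timing the same workload once through the Python boundary and once in pure Rust gives a direct estimate of what the boundary costs per request.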