Overview of Machine Learning Libraries in Rust

Why Rust for Machine Learning?

You spent weeks training a model in Python. The accuracy is great. Now the product manager asks you to run it on a Raspberry Pi Zero, or embed it in a C++ game engine, or ship it as a static binary that doesn't require a 300MB Python runtime. You try to bundle the model, and the deployment script fails because the target machine doesn't have libpython3.9.so. You need something smaller, faster, and safer. That's where Rust enters the machine learning conversation.

Rust doesn't replace Python for data exploration. Python has the notebooks, the interactive plots, and the instant gratification of pip install. Rust shines when you move from experimentation to production. It gives you zero-cost abstractions, memory safety without a garbage collector, and the ability to compile to WebAssembly for the browser or bare metal for embedded devices.

The trade-off is ecosystem maturity. Python's machine learning stack has about a decade's head start. Rust's stack is younger, moving faster, and more fragmented. You won't find a single std::ml module. Instead, you navigate a vibrant collection of community crates, each with a specific focus. You pick the tools that match your problem, and you often end up with a system that is substantially faster and lighter to deploy than the Python equivalent.

The Landscape: Crates, not a Standard Library

Rust's standard library focuses on systems programming. It provides strings, collections, I/O, and concurrency primitives. It deliberately leaves domain-specific logic to the crate ecosystem. Machine learning is no exception.

This design is a feature. It means the ecosystem evolves rapidly without waiting for RFCs or stabilization cycles. New algorithms can land as crates soon after a paper appears. It also means you have to make choices. There is no "official" way to do linear regression or train a neural network. You choose based on your needs: classical algorithms, deep learning, inference speed, or hardware portability.

Think of Python's ML ecosystem as a massive hardware store. You walk in, grab a drill, a saw, and a hammer, and you're building. Rust's ecosystem is more like a machine shop. You have the raw steel (types), the precision lathes (traits), and a growing collection of specialized tools (crates). You can buy a pre-made drill (linfa), or you can forge a custom one that fits your exact workflow. The tools are sharper and lighter, but you have to know how to hold them.

The Foundation: ndarray

Before you train a model, you need to handle data. Much of machine learning comes down to linear algebra: you need tensors, matrices, and efficient numerical operations. In Rust, that foundation is ndarray.

ndarray is the workhorse of the ecosystem. It provides N-dimensional arrays with a safe API, broadcasting, and slicing. It plays the role numpy plays in Python, though its API is stricter. Arrays are row-major by default, which matches numpy's default (C) order and helps with interoperability. Many other crates, linfa among them, build directly on ndarray, and others offer conversions to and from it.

use ndarray::array;

fn main() {
    // Create a 2x3 matrix. ndarray uses row-major order by default.
    // This matches numpy, which helps when porting code.
    let a = array![[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]];

    // Create a 3x2 matrix whose top 2x2 block is the identity.
    let b = array![[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]];

    // In ndarray, `*` is element-wise (as in numpy), so the
    // matrix product needs an explicit call to `dot`.
    let c = a.dot(&b);

    // c is the 2x2 matrix [[1.0, 2.0], [4.0, 5.0]].
    println!("{:?}", c);
}
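The row-major comment in that example has a concrete meaning. As a std-only sketch (plain Rust, not ndarray's API; `offset` and `row_major_strides` are hypothetical helpers), here is where element (row, col) lives in a flat buffer when each row is stored contiguously:

```rust
/// Flat index of element (row, col) in a row-major matrix with `ncols` columns.
/// Each row is contiguous, so stepping along a row advances by 1 and
/// stepping down a row advances by `ncols`.
fn offset(row: usize, col: usize, ncols: usize) -> usize {
    row * ncols + col
}

/// Row-major strides, in elements, for a matrix with `ncols` columns:
/// [step to next row, step to next column].
fn row_major_strides(ncols: usize) -> [usize; 2] {
    [ncols, 1]
}

fn main() {
    // The 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored as one flat buffer.
    let data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let ncols = 3;

    // Element (1, 2) sits at 1 * 3 + 2 = 5, which holds the value 6.
    let idx = offset(1, 2, ncols);
    println!("(1, 2) -> flat index {idx}, value {}", data[idx]);
    println!("strides: {:?}", row_major_strides(ncols));
}
```

For a standard-layout 2x3 ndarray array, the `strides()` method should report the same `[3, 1]`; a transposed view from `.t()` swaps the strides instead of copying the data.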

ndarray is safe. Accessing an out-of-bounds index panics at runtime instead of segfaulting. Those bounds checks cost a little, and iterators and whole-array operations avoid most of them. For the heavy lifting, matrix multiplication is delegated to the matrixmultiply crate, which uses SIMD kernels, or optionally to a BLAS library. You'll rarely need to reach for unsafe unless you are writing a custom low-level kernel.
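Plain Rust slices give the same guarantee, which makes it easy to see without ndarray. The sketch below uses a hypothetical `get` helper over a row-major buffer to contrast a checked lookup that returns `Option` (similar in spirit to ndarray's own `get` method) with direct indexing, which panics:

```rust
use std::panic;

/// Checked lookup into a row-major buffer with `ncols` columns.
/// Returns None for an out-of-range (row, col) instead of reading
/// whatever memory happens to follow the buffer.
fn get(data: &[f64], ncols: usize, row: usize, col: usize) -> Option<f64> {
    if col >= ncols {
        return None; // a too-large column would silently alias the next row
    }
    data.get(row * ncols + col).copied()
}

fn main() {
    let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // 2x3, row-major

    println!("{:?}", get(&data, 3, 1, 2)); // Some(6.0)
    println!("{:?}", get(&data, 3, 2, 0)); // None: row 2 does not exist

    // Plain indexing is also bounds-checked: it panics (printing a message
    // to stderr) instead of reading past the end of the allocation.
    let result = panic::catch_unwind(|| data[10]);
    println!("out-of-bounds index panicked: {}", result.is_err());
}
```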

Treat ndarray as the lingua franca of Rust math. If a library doesn't support it, you'll spend half your time writing converters.

Classical ML: linfa

For classical algorithms like linear regression, k-means, support vector machines, and decision trees, the go-to crate is linfa. The name is Italian for "sap"; the project describes itself as kin in spirit to Python's scikit-learn, and its fit/predict workflow will feel familiar, which makes porting code straightforward.

linfa uses Rust's trait system to provide a unified interface. Training goes through the Fit trait and inference through the Predict trait, both in linfa::traits and re-exported by linfa::prelude. This means you can write generic code that works with any model that implements them. The companion linfa-datasets crate ships built-in datasets such as Iris, wine quality, and diabetes, each behind its own feature flag, which are useful for quick experiments.

// Cargo.toml (versions are illustrative; check crates.io):
//   linfa = "0.7"
//   linfa-linear = "0.7"
//   linfa-datasets = { version = "0.7", features = ["diabetes"] }
use linfa::prelude::*;
use linfa_linear::LinearRegression;

fn main() {
    // Load a built-in dataset from the linfa-datasets crate.
    // Iris is a classification set, so a regression example uses diabetes.
    let dataset = linfa_datasets::diabetes();

    // Split 80/20 into training and validation sets.
    // split_with_ratio keeps the original order; shuffle first if that matters.
    let (train, valid) = dataset.split_with_ratio(0.8);

    // Train the model. The Fit trait plays the role of scikit-learn's fit.
    let model = LinearRegression::new()
        .fit(&train)
        .unwrap();

    // Predict on the validation set and score the predictions with R^2.
    let predictions = model.predict(&valid);
    let r2 = predictions.r2(&valid).unwrap();

    println!("R^2 on validation set: {:.3}", r2);
}
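The trait-based design is what lets you write training code once for any model. The std-only sketch below uses hypothetical, heavily simplified `Fit` and `Predict` traits (linfa's real traits are generic over record, target, and error types) with two toy models, a mean baseline and one-feature least squares, plus one generic function that accepts either:

```rust
/// Hypothetical simplified traits: a learner is fitted into a model,
/// and the model predicts.
trait Fit {
    type Model: Predict;
    fn fit(&self, x: &[f64], y: &[f64]) -> Self::Model;
}

trait Predict {
    fn predict(&self, x: &[f64]) -> Vec<f64>;
}

/// Baseline learner: always predicts the mean of the training targets.
struct MeanBaseline;
struct FittedMean(f64);

impl Fit for MeanBaseline {
    type Model = FittedMean;
    fn fit(&self, _x: &[f64], y: &[f64]) -> FittedMean {
        FittedMean(y.iter().sum::<f64>() / y.len() as f64)
    }
}

impl Predict for FittedMean {
    fn predict(&self, x: &[f64]) -> Vec<f64> {
        vec![self.0; x.len()]
    }
}

/// Ordinary least squares with one feature: y ~ slope * x + intercept.
struct SimpleLinear;
struct FittedLinear {
    slope: f64,
    intercept: f64,
}

impl Fit for SimpleLinear {
    type Model = FittedLinear;
    fn fit(&self, x: &[f64], y: &[f64]) -> FittedLinear {
        let n = x.len() as f64;
        let mean_x = x.iter().sum::<f64>() / n;
        let mean_y = y.iter().sum::<f64>() / n;
        let cov: f64 = x.iter().zip(y).map(|(a, b)| (a - mean_x) * (b - mean_y)).sum();
        let var: f64 = x.iter().map(|a| (a - mean_x).powi(2)).sum();
        let slope = cov / var;
        FittedLinear { slope, intercept: mean_y - slope * mean_x }
    }
}

impl Predict for FittedLinear {
    fn predict(&self, x: &[f64]) -> Vec<f64> {
        x.iter().map(|a| self.slope * a + self.intercept).collect()
    }
}

/// Works with any learner: fit, predict, and return the mean squared error.
fn train_and_score<L: Fit>(learner: &L, x: &[f64], y: &[f64]) -> f64 {
    let model = learner.fit(x, y);
    let preds = model.predict(x);
    preds.iter().zip(y).map(|(p, t)| (p - t).powi(2)).sum::<f64>() / y.len() as f64
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0];
    let y = [3.0, 5.0, 7.0, 9.0]; // exactly y = 2x + 1
    println!("mean baseline MSE: {}", train_and_score(&MeanBaseline, &x, &y));
    println!("linear MSE: {}", train_and_score(&SimpleLinear, &x, &y));
}
```

On y = 2x + 1 the linear model fits exactly (MSE 0) while the baseline scores an MSE of 5, and `train_and_score` never names either concrete type.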

linfa focuses on CPU performance. It doesn't support GPU training out of the box. The crate is still maturing, and APIs can shift between minor versions. The community is active, and the roadmap includes more algorithms and better documentation.

Check the version. linfa is still on 0.x releases, where a minor bump (0.6 to 0.7, for example) may carry breaking changes that move or rename the items you import. Commit your Cargo.lock and move up one minor version at a time.

Deep Learning: burn and candle

Deep learning in Rust is split between two major projects: burn and candle. They solve different problems.

burn is a portable deep learning framework. It supports multiple backends, including CPU, CUDA, Metal, and WebGPU, so you can write your model once and run it on a GPU, a mobile device, or in the browser. Models are plain Rust structs with a forward method, much like PyTorch's nn.Module. burn provides automatic differentiation, optimizers, and data loaders. It is a strong choice if you need to train models or deploy to diverse hardware.

candle is a minimalist, inference-focused framework created by Hugging Face. It prioritizes speed and simplicity for running pre-trained models. It has excellent support for transformers and integrates tightly with the Hugging Face ecosystem. If your goal is to run a large language model or a vision model with minimal latency, candle is a natural fit.

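use burn::module::{Module, Param};
use burn::tensor::{backend::Backend, Tensor};

// A simple linear layer: output = input x weight + bias.
// burn modules are generic over a Backend, so one definition runs on any backend.
// The derive macro generates the boilerplate for saving and loading weights
// and for moving parameters between devices.
#[derive(Module, Debug)]
pub struct Linear<B: Backend> {
    weight: Param<Tensor<B, 2>>, // shape [d_in, d_out]
    bias: Param<Tensor<B, 1>>,   // shape [d_out]
}

impl<B: Backend> Linear<B> {
    pub fn forward(&self, input: Tensor<B, 2>) -> Tensor<B, 2> {
        // [batch, d_in] x [d_in, d_out] -> [batch, d_out].
        // unsqueeze reshapes the bias to [1, d_out] so it broadcasts over the batch.
        input.matmul(self.weight.val()) + self.bias.val().unsqueeze()
    }
}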

burn's #[derive(Module)] macro is more than a convention. It generates the code needed to save and load model weights and to move parameters between devices. You'll see this pattern in almost every burn example. It's a small detail that pays off when you scale up.

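Pick the backend early. If you hard-code one concrete backend type throughout your model code, switching from CPU to GPU later means touching every place that creates tensors. Keeping modules and functions generic over B: Backend confines the choice to one spot, typically a type alias plus the device you pass in.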
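The payoff of staying generic can be shown without burn itself. In this std-only sketch, `Backend`, `NaiveCpu`, `WideAccumCpu`, and `Neuron` are all hypothetical stand-ins (burn's real `Backend` trait is far richer): the model code is written once, and the backend is selected purely by a type.

```rust
use std::marker::PhantomData;

/// Hypothetical backend abstraction: just enough to compute a dot product.
trait Backend {
    const NAME: &'static str;
    fn dot(a: &[f32], b: &[f32]) -> f32;
}

/// Straightforward f32 accumulation.
struct NaiveCpu;

impl Backend for NaiveCpu {
    const NAME: &'static str = "naive-cpu";
    fn dot(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}

/// Accumulates in f64, standing in for "different hardware, same math".
struct WideAccumCpu;

impl Backend for WideAccumCpu {
    const NAME: &'static str = "wide-accum-cpu";
    fn dot(a: &[f32], b: &[f32]) -> f32 {
        let acc: f64 = a.iter().zip(b).map(|(x, y)| f64::from(*x) * f64::from(*y)).sum();
        acc as f32
    }
}

/// A single output neuron, written once and generic over the backend.
struct Neuron<B: Backend> {
    weights: Vec<f32>,
    bias: f32,
    _backend: PhantomData<B>,
}

impl<B: Backend> Neuron<B> {
    fn new(weights: Vec<f32>, bias: f32) -> Self {
        Neuron { weights, bias, _backend: PhantomData }
    }

    fn forward(&self, input: &[f32]) -> f32 {
        B::dot(&self.weights, input) + self.bias
    }
}

// Switching backends for `neuron` below is a one-line change to this alias.
type MyBackend = NaiveCpu;

fn main() {
    let weights: Vec<f32> = vec![0.5, -1.0, 2.0];
    let input: [f32; 3] = [2.0, 1.0, 0.5];

    // 0.5 * 2 - 1 * 1 + 2 * 0.5 + 0.25 = 1.25 on either backend.
    let neuron: Neuron<MyBackend> = Neuron::new(weights.clone(), 0.25);
    println!("{}: {}", MyBackend::NAME, neuron.forward(&input));

    let wide: Neuron<WideAccumCpu> = Neuron::new(weights, 0.25);
    println!("{}: {}", WideAccumCpu::NAME, wide.forward(&input));
}
```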

Pitfalls and Gotchas

Rust's compiler is strict, and machine learning code triggers many of its checks. Here are the common traps.

Trait bounds. You'll see E0277 ("the trait bound ... is not satisfied") often. It appears when you pass a type to a function that expects a trait implementation the type doesn't have. For example, linfa's fit method expects a dataset type that pairs records with targets, not a bare array. If you pass a raw ndarray, the compiler rejects it. The fix is to wrap your data in the expected type, for example with Dataset::new(records, targets), or to implement the required trait for your own type.
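A minimal reproduction of the pattern and its fix, std-only and with hypothetical names (`LabeledData` and `Dataset` here are not linfa's types): the function demands a trait, the raw `Vec` doesn't implement it, and a small wrapper does.

```rust
/// Hypothetical trait a training routine might require.
trait LabeledData {
    fn features(&self) -> &[Vec<f64>];
    fn targets(&self) -> &[f64];
}

/// Wrapper pairing raw feature rows with one target per row.
struct Dataset {
    x: Vec<Vec<f64>>,
    y: Vec<f64>,
}

impl Dataset {
    fn new(x: Vec<Vec<f64>>, y: Vec<f64>) -> Self {
        assert_eq!(x.len(), y.len(), "need exactly one target per sample");
        Dataset { x, y }
    }
}

impl LabeledData for Dataset {
    fn features(&self) -> &[Vec<f64>] {
        &self.x
    }
    fn targets(&self) -> &[f64] {
        &self.y
    }
}

/// Accepts only types that implement LabeledData; returns (samples, features).
fn summarize<D: LabeledData>(data: &D) -> (usize, usize) {
    let n_features = data.features().first().map_or(0, |row| row.len());
    (data.targets().len(), n_features)
}

fn main() {
    let raw: Vec<Vec<f64>> = vec![vec![1.0, 2.0], vec![3.0, 4.0]];

    // summarize(&raw);
    // ^ rejected with error[E0277]: Vec<Vec<f64>> does not implement LabeledData.

    let data = Dataset::new(raw, vec![0.0, 1.0]);
    let (n_samples, n_features) = summarize(&data);
    println!("{n_samples} samples, {n_features} features");
}
```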

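Shape mismatches. ndarray encodes the number of dimensions in the type (an Array2 is always two-dimensional), but the length of each axis is only known at runtime. If you call dot on a 2x3 matrix and another 2x3 matrix, the inner dimensions (3 and 2) disagree and the program panics. The compiler can't catch this. Check shapes where data enters your program, and put assertions in your data loading code so it fails fast with a clear message.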
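One way to fail fast is to validate shapes before any arithmetic happens. This std-only sketch multiplies row-major buffers and returns a hypothetical `ShapeError` instead of panicking when the inner dimensions or the buffer lengths disagree:

```rust
/// Hypothetical error carrying both operand shapes for a readable message.
#[derive(Debug, PartialEq)]
struct ShapeError {
    left: (usize, usize),
    right: (usize, usize),
}

/// Row-major (m x k) times (k x n) -> (m x n), with shapes validated up front.
fn matmul(
    a: &[f64],
    a_shape: (usize, usize),
    b: &[f64],
    b_shape: (usize, usize),
) -> Result<Vec<f64>, ShapeError> {
    let (m, k) = a_shape;
    let (k2, n) = b_shape;
    // Reject mismatched inner dimensions and buffers that don't match their shapes.
    if k != k2 || a.len() != m * k || b.len() != k2 * n {
        return Err(ShapeError { left: a_shape, right: b_shape });
    }
    let mut out = vec![0.0; m * n];
    for i in 0..m {
        for p in 0..k {
            let a_ip = a[i * k + p];
            for j in 0..n {
                out[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    Ok(out)
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // 2x3
    let b = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]; // 3x2
    println!("{:?}", matmul(&a, (2, 3), &b, (3, 2))); // Ok([1.0, 2.0, 4.0, 5.0])

    // 2x3 times 2x3: inner dimensions 3 and 2 disagree, so this is an Err.
    println!("{:?}", matmul(&a, (2, 3), &a, (2, 3)));
}
```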

The Python gap. Many datasets and preprocessing tools are Python-only. pyo3 bridges the two languages: it lets you call Python code from Rust and build Python extension modules in Rust. It's powerful but adds complexity. When calling into Python, you manage an embedded interpreter and must hold the GIL before touching Python objects. Reach for pyo3 only when there is no Rust-native alternative.

Unsafe code. Some crates use unsafe for performance or to talk to hardware. ndarray uses unsafe internally, for example for raw-pointer element access, and burn's GPU backends ultimately reach driver APIs through unsafe FFI in their dependencies. This is normal. The community norm is to keep the unsafe surface minimal: small blocks, each with a comment explaining why it is sound. You rarely need to write unsafe yourself unless you are implementing a custom kernel.
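What "small and well-documented" can look like, as a std-only sketch with a hypothetical `trace` function: the safe wrapper validates the shape once, and a single unsafe call carries a SAFETY comment proving its index is in bounds. In a loop this simple the optimizer may remove the bounds checks on its own, so treat it as an illustration of the pattern, not a recipe for speed.

```rust
/// Sum of the main diagonal of an n x n row-major matrix.
/// Returns None if the buffer length is not n * n.
fn trace(data: &[f64], n: usize) -> Option<f64> {
    // Validate once, in safe code. checked_mul guards against n * n overflowing.
    if n.checked_mul(n) != Some(data.len()) {
        return None;
    }
    let mut sum = 0.0;
    for i in 0..n {
        // SAFETY: data.len() == n * n (checked above), and for i < n the index
        // i * n + i is at most (n - 1) * n + (n - 1) = n * n - 1, so it is in bounds.
        sum += unsafe { *data.get_unchecked(i * n + i) };
    }
    Some(sum)
}

fn main() {
    let m = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]; // 3x3, row-major
    println!("{:?}", trace(&m, 3)); // diagonal 1 + 5 + 9
    println!("{:?}", trace(&m[..8], 3)); // wrong length, so None
}
```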

The compiler catches type errors, but it won't save you from a shape mismatch in a matrix multiply. Validate your data shapes before you train.

Decision Matrix

Use linfa when you need classical algorithms like linear regression, k-means, or SVMs and want a scikit-learn-like API.

Use burn when you are building deep learning models that need to run on multiple backends, including WebGPU for the browser or CUDA for GPUs.

Use candle when your priority is fast inference of transformer models or Hugging Face models with minimal overhead.

Use ndarray when you need numerical computing primitives, matrix operations, or a common data format to pass between libraries.

Use pyo3 when you must integrate Rust into an existing Python data science pipeline or need access to a Python-only dataset loader.

Use an ONNX Runtime binding such as the ort crate when you have a model trained in Python: export it to ONNX and run it in Rust without retraining.

The ecosystem moves fast. Pin your versions in Cargo.toml and test upgrades deliberately.

Where to go next

Rust leaves machine learning to crates rather than the standard library, so your next step is choosing them. Start with ndarray to get comfortable with arrays and shapes, add linfa for standard algorithms, and move to burn or candle once you need neural networks, using the decision matrix to pick. As with any toolbox, you choose the specific tool for the job rather than one giant tool that does everything.