How to Handle Strings Between Rust and JavaScript in WASM

Use `wasm-bindgen` to automatically handle the conversion between Rust `String` and JavaScript `String`, as it manages the necessary memory allocation and encoding for you.

The Border Crossing Problem

You wrote a Rust function that greets a user. You call it from JavaScript. Instead of "Hello, Alice", you get a crash or a string full of garbage bytes. The problem isn't your logic. It's the border crossing. Rust and JavaScript speak different languages for text, and they live in different memory worlds. You need a translator that handles the encoding and the memory allocation, or the data gets lost in transit.

WebAssembly modules run in a sandbox with a single block of memory called linear memory. It is a flat array of bytes. JavaScript has its own heap with garbage collection. When you pass a string from JS to Rust, you cannot just pass a pointer. A JS string is an opaque object on the garbage-collected heap, and WASM code can only address its own linear memory. A WASM pointer is just a byte offset into that linear memory, so JS can read the bytes there but does not get a string object for free. You have to copy the data across the boundary. You also have to convert the encoding. JavaScript uses UTF-16. Rust uses UTF-8. If you copy bytes directly, emojis and non-ASCII characters break.

The standard solution is wasm-bindgen. It generates the glue code that allocates memory, copies bytes, converts encoding, and cleans up. You write normal Rust. You call normal JS. The bridge handles the rest.

How the Bridge Works

Think of wasm-bindgen as a customs agent at a secure facility. JavaScript lives outside the facility. Rust lives inside. When JS wants to send a string to Rust, it hands the string to the agent. The agent checks the string, converts it to the facility's standard format, allocates a crate in the warehouse, copies the contents, and hands Rust a receipt. Rust uses the data. When Rust returns a string, the agent does the reverse. It allocates a crate outside, copies the data back, converts the format, and hands JS the result. The agent then destroys the crates.

This process happens automatically when you use the #[wasm_bindgen] attribute. The attribute tells the compiler to generate the agent code. You don't write the agent code. You don't manage the crates. You just declare what goes across the border.

The Standard Tool: wasm-bindgen

Here is the minimal setup. You define a function with #[wasm_bindgen]. You accept a String. You return a String.

use wasm_bindgen::prelude::*;

/// Greet a user by name.
#[wasm_bindgen]
pub fn greet(name: String) -> String {
    // wasm-bindgen allocates memory in WASM, copies the JS string here,
    // and converts UTF-16 to UTF-8 automatically.
    format!("Hello, {} from Rust!", name)
}

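On the JavaScript side, you import the function and call it. The glue code runs behind the scenes. The import below assumes a bundler-style build, which is wasm-pack's default target. With the web target you also have to await the generated init function before calling greet.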

import { greet } from './pkg';

const message = greet("Alice");
console.log(message); // "Hello, Alice from Rust!"

The community convention is to use wasm-pack for building. It runs cargo build, generates the JS glue, and sets up the package. Don't fight the toolchain. Use wasm-pack unless you have a specific reason to build the JS bindings by hand.

Let the runtime handle the bytes. Your job is the logic.

Inside the Glue Code

Understanding what happens under the hood helps you debug and optimize. When you call greet("Alice"), the wasm-bindgen runtime intercepts the call. It sees a string argument. It allocates a buffer in the WASM linear memory. It iterates over the JS string, decoding UTF-16 code units and encoding them as UTF-8 bytes. It writes those bytes into the buffer. It passes a pointer and length to your Rust function.

Your Rust function sees a normal String. It builds the result. The return path reverses the process. The runtime allocates a JS string, copies the UTF-8 bytes from WASM memory, converts to UTF-16, and returns the JS string. The WASM buffer is freed.
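
The inbound and outbound paths can be modelled in plain Rust without any WASM tooling. The sketch below is a toy model, not wasm-bindgen's actual implementation. LinearMemory, pass_string_to_wasm, get_string_from_wasm, and round_trip are hypothetical names. A Vec<u8> stands in for linear memory, and a slice of u16 code units stands in for the JS string.

```rust
/// A toy stand-in for WASM linear memory: one flat byte array.
/// (Hypothetical type for illustration; not part of wasm-bindgen.)
struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    fn new() -> Self {
        LinearMemory { bytes: Vec::new() }
    }

    /// "Allocate" by appending to the end; returns (offset, length).
    fn write(&mut self, data: &[u8]) -> (usize, usize) {
        let ptr = self.bytes.len();
        self.bytes.extend_from_slice(data);
        (ptr, data.len())
    }

    /// Read a UTF-8 string back out from an (offset, length) pair.
    fn read_str(&self, ptr: usize, len: usize) -> &str {
        std::str::from_utf8(&self.bytes[ptr..ptr + len]).expect("valid UTF-8")
    }
}

/// Inbound path: UTF-16 code units (the JS side) become UTF-8 bytes in memory.
fn pass_string_to_wasm(mem: &mut LinearMemory, js_units: &[u16]) -> (usize, usize) {
    let utf8 = String::from_utf16(js_units).expect("well-formed UTF-16");
    mem.write(utf8.as_bytes())
}

/// Outbound path: UTF-8 bytes in memory become UTF-16 code units for JS.
fn get_string_from_wasm(mem: &LinearMemory, ptr: usize, len: usize) -> Vec<u16> {
    mem.read_str(ptr, len).encode_utf16().collect()
}

/// The function in the middle sees an ordinary Rust string.
fn greet(name: &str) -> String {
    format!("Hello, {} from Rust!", name)
}

/// Drive one full call: JS string in, Rust runs, JS string out.
fn round_trip(js_input: &str) -> String {
    let mut mem = LinearMemory::new();
    let units: Vec<u16> = js_input.encode_utf16().collect();
    let (ptr, len) = pass_string_to_wasm(&mut mem, &units);
    let reply = greet(mem.read_str(ptr, len));
    let (out_ptr, out_len) = mem.write(reply.as_bytes());
    let out_units = get_string_from_wasm(&mem, out_ptr, out_len);
    String::from_utf16(&out_units).expect("well-formed UTF-16")
}
```

The model never frees anything, which is the part the real glue code adds on top: each buffer it allocates is released once the call is done.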

Encoding matters. JavaScript strings use UTF-16. Rust strings use UTF-8. An emoji like 🦀 is two code units in JS but four bytes in Rust. If you manually copy bytes without conversion, the emoji breaks. wasm-bindgen handles this transparently. It transcodes the data. Manual code must handle the transcode, or your crab emoji becomes garbage.
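
The size difference, and what a copy without transcoding does, can be checked with the standard library alone. This is a minimal sketch. The helper names are made up for the example, and naive_copy deliberately simulates the broken approach by dumping UTF-16 code units as little-endian bytes and reading them as UTF-8.

```rust
/// Number of UTF-16 code units, which is what JavaScript's `.length` reports.
fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count()
}

/// Number of UTF-8 bytes, which is what Rust's `str::len` reports.
fn utf8_len(s: &str) -> usize {
    s.len()
}

/// Simulate a copy with no transcoding: write each UTF-16 code unit as two
/// raw little-endian bytes, then try to interpret the result as UTF-8.
fn naive_copy(s: &str) -> Result<String, std::string::FromUtf8Error> {
    let raw: Vec<u8> = s.encode_utf16().flat_map(|unit| unit.to_le_bytes()).collect();
    String::from_utf8(raw)
}
```

For the crab emoji the naive copy is rejected as invalid UTF-8. For pure ASCII input it is accepted but comes back with a NUL byte after every character, which is the quieter kind of garbage.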

Real Code: Parsing and Returning

Real code often processes text. You might count words, validate input, or format output. Using &str for arguments is a common choice for read-only functions. It does not skip the copy: in current wasm-bindgen versions the glue code still encodes the JS string as UTF-8 and writes it into WASM memory, just as it does for String. The difference is ownership. With &str, Rust only borrows the buffer, and it is freed when the call returns. With String, your function owns the buffer and can store or move it.

use wasm_bindgen::prelude::*;

/// Count words in a string that is only borrowed for the call.
#[wasm_bindgen]
pub fn count_words(text: &str) -> usize {
    // &str signals that this function only reads the input.
    // The borrow is valid only for the duration of this call.
    text.split_whitespace().count()
}

/// Return a structured result as a JSON string.
#[wasm_bindgen]
pub fn analyze(text: String) -> String {
    // We own the String here, so we can store it or move it.
    // This is necessary if we need to keep the data after the call.
    let words = text.split_whitespace().count();
    let chars = text.chars().count();
    
    // Serialize to JSON manually for this example.
    // In real code, use serde_json for complex structures.
    format!("{{\"words\": {}, \"chars\": {}}}", words, chars)
}

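If you try to return a &str from a #[wasm_bindgen] function, the build fails. The macro does not accept borrowed return types, and even in plain Rust, returning a reference to a string built inside the function fails with E0515 (cannot return a reference to a local variable or temporary value). The string would be dropped when the function ends, leaving a dangling pointer. Return a String instead. The bridge needs owned data to copy back to JavaScript.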

Return owned data. The bridge needs something to hold onto.
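
As a small illustration in plain Rust, both functions below borrow their input and hand back an owned String. The #[wasm_bindgen] attribute is left off so the snippet compiles without the crate, and the function names are made up for this example.

```rust
/// Borrow the input, return a new owned String.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

/// Even when the result is just a slice of the input, convert it to an
/// owned String before returning, so nothing borrowed crosses the boundary.
fn first_word(text: &str) -> String {
    text.split_whitespace().next().unwrap_or("").to_owned()
}
```

The extra to_owned costs one allocation, which is the price of giving the bridge data that outlives the call.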

When Things Go Wrong

Pitfalls appear when you ignore the boundary rules or try to manage memory manually.

Returning references. If you try to return a &str, the build stops you. Strings must be owned when crossing the boundary. A reference to a string created inside the function would point to data that is destroyed on return, which is what error E0515 reports in plain Rust. Return a String instead.

Manual memory leaks. If you drop wasm-bindgen and use raw pointers, you become the memory manager. Every allocation in WASM needs a deallocation. If JS forgets to call free, the WASM heap grows until the browser kills the tab. There is no garbage collector in WASM linear memory. You must export a free function and call it from JavaScript.

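use std::alloc::{self, Layout};

/// Allocate a buffer of `len` bytes in WASM memory.
/// Returns a null pointer if `len` is zero or the allocation fails.
#[no_mangle]
pub extern "C" fn allocate_buffer(len: usize) -> *mut u8 {
    // A zero-sized allocation is undefined behavior with the global
    // allocator, so reject it here instead of trusting the caller.
    if len == 0 {
        return std::ptr::null_mut();
    }
    let layout = match Layout::from_size_align(len, 1) {
        Ok(layout) => layout,
        Err(_) => return std::ptr::null_mut(),
    };
    // SAFETY: layout has a non-zero size.
    // The caller must later pass the returned pointer and the same len
    // to free_buffer exactly once.
    unsafe { alloc::alloc(layout) }
}

/// Free a buffer allocated by allocate_buffer.
///
/// # Safety
/// 1. ptr must be a pointer returned by allocate_buffer, or null.
/// 2. ptr must not be used after this call.
/// 3. len must match the length used during allocation.
#[no_mangle]
pub unsafe extern "C" fn free_buffer(ptr: *mut u8, len: usize) {
    // Nothing was allocated for a null pointer or a zero length.
    if ptr.is_null() || len == 0 {
        return;
    }
    // SAFETY: the caller guarantees ptr and len describe a live
    // allocation made by allocate_buffer with alignment 1.
    unsafe {
        let layout = Layout::from_size_align_unchecked(len, 1);
        alloc::dealloc(ptr, layout);
    }
}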
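
The same ownership rule applies when Rust hands a string out as a raw pointer and length. The sketch below is one possible shape for that handoff, using only the standard library. export_string and reclaim_string are hypothetical names, and a real export would also need #[no_mangle] wrappers and a way to return two values to JavaScript.

```rust
/// Hand a string's bytes to the caller as (pointer, length).
/// Ownership leaves Rust here: nothing frees these bytes until
/// `reclaim_string` is called with the same pair.
fn export_string(s: String) -> (*mut u8, usize) {
    let boxed: Box<[u8]> = s.into_bytes().into_boxed_slice();
    let len = boxed.len();
    let ptr = Box::into_raw(boxed) as *mut u8;
    (ptr, len)
}

/// Take ownership back so Rust can drop the bytes.
///
/// # Safety
/// `ptr` and `len` must come from one earlier `export_string` call,
/// and the pair must not be used again afterwards.
unsafe fn reclaim_string(ptr: *mut u8, len: usize) -> String {
    let slice = std::ptr::slice_from_raw_parts_mut(ptr, len);
    // SAFETY: the caller guarantees the pair describes a live boxed slice.
    let boxed: Box<[u8]> = unsafe { Box::from_raw(slice) };
    // SAFETY: the bytes came from a String, so they are valid UTF-8,
    // assuming the caller did not modify them in between.
    unsafe { String::from_utf8_unchecked(boxed.into_vec()) }
}
```

If the caller never passes the pair back to reclaim_string, those bytes stay allocated for the lifetime of the module, which is the leak described above.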

Encoding mismatches. If you manually copy bytes, you must handle UTF-8 vs UTF-16. JavaScript uses TextDecoder to read UTF-8 from WASM memory. Rust must write valid UTF-8. If you write raw bytes that aren't valid UTF-8, TextDecoder substitutes U+FFFD replacement characters by default, or throws a TypeError if it was created with the fatal option. wasm-bindgen avoids this by doing the conversion for you.
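
The same choice between failing and substituting exists on the Rust side when it reads bytes that JavaScript wrote into linear memory. A minimal sketch with the standard library, where read_strict and read_lossy are names invented for this example:

```rust
use std::borrow::Cow;

/// Strict read: reject the buffer if it is not valid UTF-8.
fn read_strict(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Lossy read: keep going, replacing each invalid sequence with U+FFFD.
fn read_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}
```

Strict reading surfaces the bug at the boundary. Lossy reading hides it until someone notices the replacement characters, so it is usually the better fit only for logging or display.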

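Type mismatches. Inside Rust, passing a number where a string is expected is rejected by the compiler with E0308 (mismatched types). On the JavaScript side there is no compiler to stop you: a call like greet(42) only fails, or misbehaves, at runtime inside the glue code. If you use TypeScript, the .d.ts file that wasm-pack generates declares name as a string, so the mistake is caught at type-check time.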

If you manage the memory, you own the leaks. There is no garbage collector in WASM linear memory.

Choosing Your Strategy

Pick the tool that matches your boundary. Safety first, optimization later.

Use wasm-bindgen with String arguments when you need ownership of the input or the simplest integration path. The runtime handles allocation, encoding, and cleanup automatically.

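Use wasm-bindgen with &str arguments when the string is only needed during the function call. The input is still copied into WASM memory, but the signature makes the read-only intent explicit and lets other Rust code call the function with a borrowed string. This suits read-only operations like parsing or validation.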

Use wasm-bindgen with JsValue when you need to pass arbitrary JavaScript values, including strings, objects, or functions, without strict type checking at compile time.

Use manual memory management with *const c_char only when you are integrating with a legacy C library inside WASM or building a custom binding layer where wasm-bindgen is unavailable. You must implement a matching free function and call it from JavaScript to prevent leaks.

Use serde_json for complex data structures instead of serializing strings manually. Strings are error-prone for nested data; JSON provides a standard format that both Rust and JavaScript handle efficiently.

Where to go next