Calling Rust from Python

Using PyO3 and maturin, it’s very easy to call Rust code from Python. I’m mostly following the guide at pyo3.rs, but leaving out some things related to Python environments.

1 Steps

  1. Install maturin. I use the Arch package but you can also do a pip install in the environment below.

  2. Make sure you have a lib target, and add cdylib as a crate-type.

    [lib]
    crate-type = ["cdylib", "rlib"]
    
  3. Add pyo3 as a dependency:

    [dependencies]
    pyo3 = { version = "0.22.2", features = ["extension-module"] }
    
  4. Create a python environment

    python -m venv .env
    
  5. Wrap the functions you want to export. I like to put them in a separate src/py.rs module [1]. The PyO3 guide lists how Rust types map to Python types. For example, for my minimizers repository:

    use super::*;
    use pyo3::prelude::*;
    
    #[pyfunction]
    pub fn generate_random_string(n: usize, sigma: usize) -> PyResult<Vec<u8>> {
        Ok(super::generate_random_string(n, sigma))
    }
    
    /// Take the minimizer configuration (a Rust enum) JSON-encoded, so that we
    /// don't have to pass around complex types.
    /// The text is automatically converted between a Rust `Vec<u8>` and a Python list.
    /// Returns a simple float.
    #[pyfunction]
    fn density(scheme: &str, text: Vec<u8>, w: usize, k: usize, sigma: usize) -> PyResult<f64> {
        let tp: MinimizerType = serde_json::from_str(scheme).unwrap();
        let density = tp.stats(text, w, k, sigma).0;
        Ok(density)
    }
    
    /// A Python module. The name of this function must match the `lib.name`
    /// setting in the `Cargo.toml`, else Python will not be able to import the
    /// module.
    #[pymodule]
    fn minimizers(m: &Bound<'_, PyModule>) -> PyResult<()> {
        // Add our functions to the module.
        m.add_function(wrap_pyfunction!(generate_random_string, m)?)?;
        m.add_function(wrap_pyfunction!(density, m)?)?;
        Ok(())
    }
    
  6. Build the Python module. Make sure to include -r to build in release mode.

    source .env/bin/activate && maturin develop -r
    
    🐍 Found CPython 3.12 at /home/philae/git/eth/git/minimizers/.env/bin/python
       Compiling minimizers v0.1.0 (/home/philae/git/eth/git/minimizers)
        Finished `release` profile [optimized + debuginfo] target(s) in 5.81s
    📦 Built wheel to /tmp/.tmpq2msxE/minimizers-0.1.0-cp312-cp312-linux_x86_64.whl
    ✏️  Setting installed package as editable
    🛠 Installed minimizers-0.1.0
    
  7. I like to keep my Python files separate, so create a py/ directory. From there, symlink the generated library:

    mkdir py
    cd py
    ln -s ../.env/lib/python3.12/site-packages/minimizers
    
  8. Now we can run the following Python code:

    import minimizers
    n = 10000000
    sigma = 4
    w = 12
    k = 12
    text = minimizers.generate_random_string(n, sigma)
    minimizers.density('{"minimizer_type": "ModSampling", "k0": 4}', text, w, k, sigma)
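
If the import in the snippet above fails, a likely cause is that the interpreter is not the one from the virtual environment that maturin develop installed into. Here is a minimal sketch for checking this from Python, using only the standard library. The helper name `extension_available` is made up for illustration and is not part of PyO3 or maturin.

```python
import importlib.util
import sys


def extension_available(name: str) -> bool:
    """Return True if this interpreter can find a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # sys.prefix points into .env when the virtual environment is active.
    print("interpreter prefix:", sys.prefix)
    print("minimizers importable:", extension_available("minimizers"))
```

If the second line prints False, activate the environment with source .env/bin/activate and rerun maturin develop -r.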
    

1.1 Using kwargs

Passing arguments JSON-encoded is somewhat ugly. It is nicer to pass the name of the scheme as a string and any additional parameters as keyword arguments (as in k0=4 at the end of the call). The new Rust code then looks like this:

use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;
use pyo3::types::PyDict;

/// Get an integer parameter from the given optional dictionary.
fn get(dict: Option<&Bound<'_, PyDict>>, key: &str) -> PyResult<usize> {
    Ok(dict
        .ok_or_else(|| {
            PyValueError::new_err(format!(
                "Missing minimizer parameter {key}. Add {key}=<val>."
            ))
        })?
        .get_item(key)?
        .ok_or_else(|| {
            PyValueError::new_err(format!(
                "Missing minimizer parameter {key}. Add {key}=<val>."
            ))
        })?
        .extract()?)
}

/// `tp` is the main type of scheme being used.
/// Additional parameters can be passed using the `params` kwargs.
#[pyfunction]
#[pyo3(signature = (tp, text, w, k, sigma, **params))]
fn density(
    tp: &str,
    text: Vec<u8>,
    w: usize,
    k: usize,
    sigma: usize,
    params: Option<&Bound<'_, PyDict>>,
) -> PyResult<f64> {
    let scheme: super::MinimizerType = match tp {
        "Minimizer" => super::MinimizerType::Minimizer,
        "Miniception" => super::MinimizerType::Miniception {
            k0: get(params, "k0")?,
        },
        // other variants omitted
        _ => return Err(PyValueError::new_err("Invalid minimizer type")),
    };
    let density = scheme.stats(&text, w, k, sigma).0;
    Ok(density)
}

Now, we can call our function in a much cleaner way:

-minimizers.density('{"minimizer_type": "ModSampling", "k0": 4}', text, w, k, sigma)
+minimizers.density("ModSampling", text, w, k, sigma, k0 = 4)
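
A lighter alternative, if you would rather keep the JSON-based Rust function from the Steps section unchanged, is to build the JSON string on the Python side. This is only a sketch: `scheme_json` is a hypothetical helper, and it assumes the `minimizer_type` tag and the parameter names shown in the JSON example earlier.

```python
import json


def scheme_json(tp: str, **params: int) -> str:
    """Encode a scheme name plus keyword parameters in the JSON layout used earlier."""
    if "minimizer_type" in params:
        # The tag is reserved for the scheme name given as the first argument.
        raise ValueError("minimizer_type is set through the first argument")
    return json.dumps({"minimizer_type": tp, **params})


if __name__ == "__main__":
    # Prints {"minimizer_type": "ModSampling", "k0": 4}
    print(scheme_json("ModSampling", k0=4))
```

The result can be passed as the first argument of the JSON-based density function, for example minimizers.density(scheme_json("ModSampling", k0=4), text, w, k, sigma). The trade-off is that unknown schemes or missing parameters are then only detected when Rust parses the JSON, rather than being reported through a tailored ValueError.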

2 TODOs

  • Figure out hot-reloading of the library after recompilation. So far, neither %autoreload nor importlib seems to work.

  • As far as I understand, inputs and outputs are converted between Rust and Python representations on every invocation. When passing large texts, such as a human genome, it’s probably nicer to store them as opaque types instead. That way Python can’t look inside them (and thus can’t read or write them), but we don’t have to pay the price of converting.

    This should be possible using PyCapsule.
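
Regarding the hot-reloading item above: until that is figured out, one workaround is to run each experiment in a fresh interpreter, so that the newly built library is always loaded from scratch. Here is a minimal sketch using only the standard library. `run_fresh` is a hypothetical helper name, and the approach sidesteps reloading rather than solving it.

```python
import subprocess
import sys


def run_fresh(code: str) -> str:
    """Run `code` in a new Python process and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", code],  # same interpreter, so the same virtual environment
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # After each `maturin develop -r`, a new process imports the rebuilt module.
    print(run_fresh("import sys; print(sys.version_info.major)"))
```

Each call pays the interpreter start-up cost again, so this mostly suits scripts and is less convenient in a long-running notebook session.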


  1. Note that neither the module nor the functions inside it have to be pub.