A small trick for a simple interaction between Rust and C++


At work, I'm rewriting some messy C++ code in Rust.

Because of heavy use of callbacks (sigh), Rust sometimes calls C++, and C++ sometimes calls Rust. This works because both languages can expose functions through a C API that the other language can call.

That covers functions, but what about C++ methods? Here's a little trick that lets you rewrite one C++ method at a time, without a headache. And by the way, this works regardless of the language you rewrite the project in; it doesn't have to be Rust!

The trick

  1. Make the C++ class a standard-layout class. This notion is defined by the C++ standard. Simply put, it makes a C++ class look like a plain C struct, with some leeway: for example, a standard-layout class can still use inheritance (with restrictions) and a few other features. It does require all data members to have the same access control (for example, all private). Most importantly, virtual methods are not allowed. This limitation doesn't bother me, because I never use virtual methods and they are my least favorite feature in any programming language.

  2. Create a Rust struct with exactly the same layout as the C++ class.

  3. Create a Rust function with the C calling convention whose first argument is (a reference to) the Rust struct you just created. Now you can access every member of the C++ class from Rust!

Note: Depending on the C++ code you’re working with, the first step may be trivial or impossible. This depends on the number of virtual methods used and other factors.

In my case, there were several virtual methods that could be successfully made non-virtual.

Sound too abstract? Let’s look at an example!

Example

Here is our C++ class User. It stores a name, a UUID, and a comment count. A user can write comments (just strings), which we print to the screen:

// Path: user.cpp

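#include <cstdint>
#include <cstdio>
#include <cstdlib> // arc4random_buf (macOS/BSD, glibc >= 2.36)
#include <string>

class User {
  std::string name;
  uint64_t comments_count;
  uint8_t uuid[16];

public:
  User(std::string name_) : name{name_}, comments_count{0} {
    arc4random_buf(uuid, sizeof(uuid));
  }

  void write_comment(const char *comment, size_t comment_len) {
    printf("%s (", name.c_str());
    // Print the UUID as hex, one byte at a time (no zero padding).
    for (size_t i = 0; i < sizeof(uuid); i += 1) {
      printf("%x", uuid[i]);
    }
    printf(") says: %.*s\n", (int)comment_len, comment);

    comments_count += 1;
  }

  uint64_t get_comment_count() const { return comments_count; }
};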

Let's first make sure the class really is standard-layout. Let's add this check to the constructor (you can place it anywhere, but the constructor is a perfectly suitable place):

// Path: user.cpp

    // Requires #include <type_traits>.
    static_assert(std::is_standard_layout_v<User>);

And... the project builds successfully!

Now on to the second step: let's define an equivalent struct on the Rust side.

Let's create a new library project in Rust:

$ cargo new --lib user-rs-lib

Let's place our Rust struct in src/lib.rs.

We need to be careful about the alignment and order of the fields. For this, we mark the struct as repr(C) so that the Rust compiler uses the same layout as C would:

// Path: ./user-rs-lib/src/lib.rs

#[repr(C)]
pub struct UserC {
    pub name: [u8; 32],
    pub comments_count: u64,
    pub uuid: [u8; 16],
}

Note that the fields of the Rust struct can be named differently if you want.

It is also important to note that std::string is represented here as an opaque 32-byte array. This is because on my machine, with my standard library, sizeof(std::string) is 32. This is not guaranteed by the standard (libstdc++ uses 32 bytes on 64-bit platforms, while libc++ uses 24), so this approach makes the code not very portable. We'll look at possible workarounds for this limitation at the end. I wanted to show that using standard library types does not prevent a class from being a standard-layout class, but that it does create some difficulties.
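One more detail: [u8; 32] has an alignment of 1, whereas std::string is pointer-aligned. In UserC this happens not to matter, because the following u64 field gives the whole struct 8-byte alignment anyway. A minimal sketch of a sturdier opaque placeholder, assuming a 32-byte, 8-byte-aligned std::string (as with libstdc++ on a 64-bit target); OpaqueStdString is a hypothetical name, and it is worth checking how cbindgen renders the alignment attribute before relying on it:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical opaque stand-in for `std::string`. Rust never reads it; it only
/// has to occupy the same number of bytes with the same alignment.
/// Assumption: 32 bytes, 8-byte aligned (libstdc++ on a 64-bit target).
#[repr(C, align(8))]
pub struct OpaqueStdString {
    _bytes: [u8; 32],
}

#[repr(C)]
pub struct UserC {
    pub name: OpaqueStdString,
    pub comments_count: u64,
    pub uuid: [u8; 16],
}

// Fail the build if the placeholder stops matching the assumed layout.
const _: () = assert!(size_of::<OpaqueStdString>() == 32);
const _: () = assert!(align_of::<OpaqueStdString>() == 8);
```

The explicit alignment would matter if the opaque field ever followed a small field such as a u8: a plain byte array would then start right after it, while the C++ compiler would pad std::string up to its alignment.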

Let’s forget about this obstacle for now.

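Now we can write a stub for the Rust function that will be the equivalent of the C++ method (with the Rust 2024 edition, which recent versions of cargo new select by default, the attribute has to be written #[unsafe(no_mangle)] instead):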

// Path: ./user-rs-lib/src/lib.rs

#[no_mangle]
pub extern "C" fn RUST_write_comment(user: &mut UserC, comment: *const u8, comment_len: usize) {
    todo!()
}

Now let's use the cbindgen tool to generate a header corresponding to this Rust code.

$ cargo install cbindgen
$ cbindgen -v src/lib.rs --lang=c++ -o ../user-rs-lib.h

And we get the following header:

// Path: user-rs-lib.h

#include <cstdarg>
#include <cstdint>
#include <cstdlib>
#include <ostream>
#include <new>

struct UserC {
  uint8_t name[32];
  uint64_t comments_count;
  uint8_t uuid[16];
};

extern "C" {

void RUST_write_comment(UserC *user, const uint8_t *comment, uintptr_t comment_len);

} // extern "C"

Now let’s go back to C++, include this C header, and add some checks to make sure the layouts do match. We put these checks in the constructor again:

#include <cstddef>     // offsetof
#include <type_traits> // std::is_standard_layout_v

#include "user-rs-lib.h"

class User {
  // [..]

public:
  User(std::string name_) : name{name_}, comments_count{0} {
    arc4random_buf(uuid, sizeof(uuid));

    static_assert(std::is_standard_layout_v<User>);
    static_assert(sizeof(std::string) == 32);
    static_assert(sizeof(User) == sizeof(UserC));
    static_assert(offsetof(User, name) == offsetof(UserC, name));
    static_assert(offsetof(User, comments_count) ==
                  offsetof(UserC, comments_count));
    static_assert(offsetof(User, uuid) == offsetof(UserC, uuid));
  }

  // [..]
};

Thanks to this, we are sure that the in-memory layout of the C++ class and the Rust structure match. We could generate all of these checks using a macro or code generator, but for the purposes of this article we can do it manually.
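The same numbers can also be pinned down on the Rust side, so that editing the Rust struct alone breaks the Rust build instead of silently diverging. A minimal sketch using std::mem::offset_of! (stable since Rust 1.77); the expected values are an assumption derived from the 32-byte std::string mentioned above and would need adjusting on other platforms:

```rust
use std::mem::{offset_of, size_of};

#[repr(C)]
pub struct UserC {
    pub name: [u8; 32],
    pub comments_count: u64,
    pub uuid: [u8; 16],
}

// Compile-time mirrors of the C++ static_asserts. The expected values assume
// a 32-byte std::string: 32 (name) + 8 (comments_count) + 16 (uuid) = 56.
const _: () = assert!(size_of::<UserC>() == 56);
const _: () = assert!(offset_of!(UserC, name) == 0);
const _: () = assert!(offset_of!(UserC, comments_count) == 32);
const _: () = assert!(offset_of!(UserC, uuid) == 40);
```

Unlike the C++ checks, these cannot compare against the C++ class directly; they only fix the numbers both sides are expected to agree on.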

Now let's rewrite the C++ method in Rust. We'll leave out the name field for now because it is a bit problematic; later we'll see how Rust can still use it:

// Path: ./user-rs-lib/src/lib.rs

#[no_mangle]
pub extern "C" fn RUST_write_comment(user: &mut UserC, comment: *const u8, comment_len: usize) {
    // Safety: the caller must pass a valid, non-null pointer to `comment_len` bytes.
    let comment = unsafe { std::slice::from_raw_parts(comment, comment_len) };
    // Unlike `from_utf8_unchecked`, this is not undefined behavior on invalid UTF-8.
    let comment_str = String::from_utf8_lossy(comment);
    println!("({:x?}) says: {}", user.uuid.as_slice(), comment_str);

    user.comments_count += 1;
}

We want to build a static library, so let's tell cargo by adding the following lines to Cargo.toml:

[lib]
crate-type = ["staticlib"]

And now let's build the library:

$ cargo build
# This is our artifact:
$ ls target/debug/libuser_rs_lib.a

We can now call our Rust function from C++ in main, albeit with some awkward casts:

// Path: user.cpp

int main() {
  User alice{"alice"};
  const char msg[] = "hello, world!";
  alice.write_comment(msg, sizeof(msg) - 1);

  printf("Comment count: %lu\n", alice.get_comment_count());

  RUST_write_comment(reinterpret_cast<UserC *>(&alice),
                     reinterpret_cast<const uint8_t *>(msg), sizeof(msg) - 1);
  printf("Comment count: %lu\n", alice.get_comment_count());
}

And link (manually) our new Rust library with our C++ program:

$ clang++ user.cpp ./user-rs-lib/target/debug/libuser_rs_lib.a
$ ./a.out
alice (336ff4cec0a2ccbfc0c4e4cb9ba7c152) says: hello, world!
Comment count: 1
([33, 6f, f4, ce, c0, a2, cc, bf, c0, c4, e4, cb, 9b, a7, c1, 52]) says: hello, world!
Comment count: 2

The output differs slightly for the UUID, because the Rust implementation uses the default Debug formatting for slices, but the content is the same.
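If identical output matters (for example, to diff logs before and after a rewrite), the Rust side can reproduce the C++ formatting. Judging from the output above, the C++ loop prints each byte with %x, i.e. lowercase hex without zero padding (0x09 shows up as 9). A small sketch; uuid_hex is a hypothetical helper, not part of the code above:

```rust
use std::fmt::Write;

/// Format a UUID like the C++ `write_comment` output: each byte as lowercase
/// hex with no zero padding, mirroring `printf("%x", uuid[i])`.
pub fn uuid_hex(uuid: &[u8; 16]) -> String {
    let mut out = String::with_capacity(32);
    for byte in uuid {
        // Writing into a String cannot fail, so unwrap never panics here.
        write!(out, "{:x}", byte).unwrap();
    }
    out
}
```

Using {:02x} instead would give the conventional fixed-width 32-character form, but it would no longer match the C++ output for bytes below 0x10.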

A few thoughts:

  • The calls alice.write_comment(..) and RUST_write_comment(alice, ..) are strictly equivalent. In fact, in a pure C++ codebase the compiler turns the first call into the second, as you can see in the generated assembly: the object's address is passed as a hidden first argument (this). So our Rust function just mimics what the C++ compiler would do anyway. However, we are free to put the User argument at any position in the function. In other words, we rely on the API of the C++ method, not on its ABI.

  • The Rust implementation can freely read and modify private members of the C++ class, such as the comments_count field, which C++ code can only read through a getter but which Rust accesses as if it were public. This is because public/private modifiers are just rules enforced by the C++ compiler. Your CPU doesn't know or care about them: bytes are just bytes. If you can access the bytes at runtime, it doesn't matter that they were marked "private" in the source code.

We're forced to use tedious casts, which is fine: we really are reinterpreting memory of one type (User) as another (UserC). This is legitimate because the C++ class is a standard-layout class, which gives it a predictable, C-compatible memory layout. If it weren't, this would be undefined behavior that might work on some platforms and break on others.

Accessing std::string from Rust

std::string should be treated as an opaque type from Rust’s point of view, because its representation may differ between platforms or even compiler versions, so we cannot accurately describe its layout.

But we do want to access the underlying bytes of the string, so we need a helper function in C++ that extracts them for us.

Rust first. We define a helper type, ByteSliceView, which is a pointer and a length (analogous to std::string_view in recent versions of C++ and to &[u8] in Rust), and our Rust function now takes an additional parameter, name:

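// Akin to `&[u8]`, for C.
#[repr(C)]
pub struct ByteSliceView {
    pub ptr: *const u8,
    pub len: usize,
}

#[no_mangle]
pub extern "C" fn RUST_write_comment(
    user: &mut UserC,
    comment: *const u8,
    comment_len: usize,
    name: ByteSliceView, // New parameter.
) {
    // Safety: the caller must pass valid, non-null pointers to the given number of bytes.
    let comment = unsafe { std::slice::from_raw_parts(comment, comment_len) };
    let comment_str = String::from_utf8_lossy(comment);
    let name = unsafe { std::slice::from_raw_parts(name.ptr, name.len) };
    let name_str = String::from_utf8_lossy(name);
    println!("{} ({:x?}) says: {}", name_str, user.uuid.as_slice(), comment_str);

    user.comments_count += 1;
}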
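Since raw pointer/length pairs now show up in two places, it can help to centralize the conversion back to a slice. A sketch of one way to do it, assuming only the ByteSliceView definition above; from_slice and as_bytes are hypothetical helper names. Note that std::slice::from_raw_parts requires a non-null pointer even when the length is 0, which the helper guards against:

```rust
/// Akin to `&[u8]`, for C.
#[repr(C)]
pub struct ByteSliceView {
    pub ptr: *const u8,
    pub len: usize,
}

impl ByteSliceView {
    /// Build a view over a Rust slice (handy for tests, or for passing bytes to C++).
    pub fn from_slice(bytes: &[u8]) -> Self {
        ByteSliceView { ptr: bytes.as_ptr(), len: bytes.len() }
    }

    /// Turn the view back into a slice.
    ///
    /// # Safety
    /// If `ptr` is non-null, it must point to `len` initialized bytes that stay
    /// alive and unmodified for the caller-chosen lifetime `'a`.
    pub unsafe fn as_bytes<'a>(&self) -> &'a [u8] {
        if self.ptr.is_null() || self.len == 0 {
            // `from_raw_parts` forbids null pointers even for empty slices.
            &[]
        } else {
            unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
        }
    }
}
```

With this in place, the body of RUST_write_comment could read the name with a single unsafe { name.as_bytes() } call.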

We rerun cbindgen, and now C++ has access to the ByteSliceView type. So we write a helper function that converts a std::string into this type, and pass the additional parameter to the Rust function (we also define a trivial getter get_name() for User, because name is still private):

// Path: user.cpp

// Designated initializers (.ptr = ...) are a C++20 feature.
ByteSliceView get_std_string_pointer_and_length(const std::string &str) {
  return {
      .ptr = reinterpret_cast<const uint8_t *>(str.data()),
      .len = str.size(),
  };
}

// In main:
int main() {
  // [..]
  RUST_write_comment(reinterpret_cast<UserC *>(&alice),
                     reinterpret_cast<const uint8_t *>(msg), sizeof(msg) - 1,
                     get_std_string_pointer_and_length(alice.get_name()));
}

We recompile and rerun, and lo and behold, the Rust implementation now prints the name:

alice (69b7c41491ccfbd28c269ea4091652d) says: hello, world!
Comment count: 1
alice ([69, b7, c4, 14, 9, 1c, cf, bd, 28, c2, 69, ea, 40, 91, 65, 2d]) says: hello, world!
Comment count: 2

Alternatively, if we can't or don't want to change the Rust function's signature, we can give the C++ helper get_std_string_pointer_and_length the C calling convention and have it take a void pointer, so that Rust can call the helper itself, at the cost of several casts to and from void*.

Improving the std::string situation

  • Instead of modeling std::string as an array of bytes whose size depends on the platform, we could move this field to the end of the C++ class and remove it from the Rust struct entirely (it isn't used there anyway). This breaks the equality sizeof(User) == sizeof(UserC); it becomes sizeof(User) - sizeof(std::string) == sizeof(UserC). The layout is then exactly the same between C++ and Rust up to the last field, which is perfectly fine. However, this breaks the ABI if external users depend on the exact layout of the C++ class. C++ constructors also need to be adapted, since members are initialized in declaration order. This approach is essentially analogous to flexible array members in C.

  • If memory allocation is cheap, we can store the name as a pointer: std::string *name; on the C++ side, and as a void pointer on the Rust side: name: *const std::ffi::c_void, since a pointer has the same size and alignment in C++ and Rust on any given platform. The advantage is that Rust can then access the data in the std::string by calling a C++ helper function with the C calling convention. However, some may dislike having a raw pointer in C++.
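To make the first option concrete, here is a sketch of what the Rust side could shrink to, assuming name has been moved to the end of the C++ class (a hypothetical reordering, not the code shown earlier). UserCPrefix and bump_comment_count are illustrative names. A nice side effect is that the Rust struct no longer depends on sizeof(std::string); the C++ check becomes the subtraction quoted above, which additionally assumes no padding is inserted before the trailing std::string:

```rust
use std::mem::{offset_of, size_of};

/// Rust view of the leading fields of `User`, assuming the C++ class was
/// reordered to: uint64_t comments_count; uint8_t uuid[16]; std::string name;
/// The trailing std::string is deliberately not modeled here.
#[repr(C)]
pub struct UserCPrefix {
    pub comments_count: u64,
    pub uuid: [u8; 16],
}

// Expected C++ counterpart (checked on the C++ side, not here):
//   static_assert(sizeof(User) - sizeof(std::string) == sizeof(UserCPrefix));
const _: () = assert!(size_of::<UserCPrefix>() == 24);
const _: () = assert!(offset_of!(UserCPrefix, uuid) == 8);

// The real C++ object is larger than this struct, so Rust should only ever
// handle it behind a reference or pointer, never move or copy it by value.
pub fn bump_comment_count(user: &mut UserCPrefix) {
    user.comments_count += 1;
}
```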

Conclusion

We have successfully rewritten a C++ class method. This is a great technique, because in real code a C++ class can have hundreds of methods, and we can rewrite them one by one without breaking or touching the others.

A big caveat: the more C++-specific features and standard library types a class uses, the harder this technique is to apply, because it requires helper functions to convert between types and/or lots of tedious casting. If the C++ class is really a C struct in disguise and uses only C types, it's very simple.

That said, I’ve used this technique a lot at work and really appreciate its relative simplicity and incremental approach.

All of this could, in theory, be automated, for example by using tree-sitter or libclang to work with the C++ AST:

  1. Add a check to the C++ class constructor to make sure it is a standard-layout class, for example: static_assert(std::is_standard_layout_v<User>); If the check fails, we skip this class: it needs manual intervention.

  2. Generate an equivalent Rust struct, such as UserC.

  3. For each field of the C++ class/Rust struct, add a check to make sure the layout is the same: static_assert(sizeof(User) == sizeof(UserC)); static_assert(offsetof(User, name) == offsetof(UserC, name)); If a check fails, we stop here.

  4. For each C++ method, generate an empty equivalent Rust function, e.g. RUST_write_comment.

  5. The developer implements the Rust function. Or an AI. Or something else.

  6. For each C++ call site, replace the C++ method call with a call to the Rust function: alice.write_comment(..); becomes RUST_write_comment(alice, ..);

  7. Remove the rewritten C++ methods.
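Step 3 is the most mechanical of these. As a rough illustration, here is a hypothetical Rust helper that emits those static_asserts from a list of field names. In a real tool the field list would come from libclang or tree-sitter; here it is simply passed in, and the sketch assumes the Rust fields reuse the C++ names:

```rust
/// Hypothetical generator for the C++ layout checks of steps 1 and 3.
/// Assumes the Rust struct uses the same field names as the C++ class.
pub fn layout_asserts(cpp_class: &str, rust_struct: &str, fields: &[&str]) -> String {
    let mut out = String::new();
    out.push_str(&format!("static_assert(std::is_standard_layout_v<{cpp_class}>);\n"));
    out.push_str(&format!("static_assert(sizeof({cpp_class}) == sizeof({rust_struct}));\n"));
    // One offsetof comparison per field, in declaration order.
    for field in fields {
        out.push_str(&format!(
            "static_assert(offsetof({cpp_class}, {field}) == offsetof({rust_struct}, {field}));\n"
        ));
    }
    out
}
```

The generated lines would then be pasted into the constructor, as done by hand earlier.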

And voila, the project has been rewritten!
