This is too dangerous for C++

This is too dangerous for C++

Some patterns became possible to use practically only thanks to Rust’s memory safety, but in C++ they are too dangerous. The article gives one such example.

Working on an internal library written in Rust, I created a type of errors for the parser that should be able to do Clone without duplicating internal data. In Rust, this requires a reference-count pointer (referencecounted pointer) like Rc.

So I wrote my fallible type, used it as a fallible function error variant, and moved on.

struct Error {
    data: Rc<ExpensiveToCloneDirectly>,
}

pub type Response = Result<Output, Error>;

fn parse(input: Input) -> Response {
    todo!()
}

Later we noticed that parsing takes a long time for some types of input, so I decided to send the input to another thread using a pipe and receive the response through another pipe so that these long operations don’t block the main thread.

enum Command {
    Input(Input),
    Exit,
}

pub enum RequestStatus {
    Completed(Response),
    Running,
}

pub struct Parser {
    command_sender: Sender<Command>,
    response_receiver: Receiver<(Input, Response)>,
    cached_result: HashMap<Input, RequestStatus>,
}

impl Parser {
    pub fn new() -> Self {
        let (command_sender, command_receiver) = channel::<Command>();
        let (response_sender, response_receiver) = channel::<(Input, Response)>();

        std::thread::spawn(move || loop {
            match command_receiver.recv() {
                Ok(Command::Input(input)) => {
                    let response = parse(input);
                    let _ = response_sender.send((input, response));
                }
                Ok(Command::Exit) => break,
                Err(_) => break,
            }
        });

        Self {
            command_sender,
            response_receiver,
            cached_result: HashMap::default(),
        }
    }

    pub fn request_parsing(&mut self, input: Input) -> RequestStatus {
        // накачиваем ранее полученные ответы
        while let Ok((input, response)) = self.response.receiver.try_recv() {
            self.cached_result
                .insert(input, RequestStatus::Completed(response));
        }

        let response = match self.cached_result.entry(input) {
            Entry::Vacant(entry) => {
                self.command_sender
                    .send(Command::Input(entry.key()))
                    .unwrap();
                entry.insert(RequestStatus::Running)
            }
            Entry::Occupied(entry) => entry.into_mut(),
        };
        response.clone()
    }
}

However, when I made this change I saw the following error:

error[E0277]: `Rc<String>` cannot be sent between threads safely
   --> src/main.rs:58:32
    |
58  |               std::thread::spawn(move || loop {
    |  _____________------------------_^
    | |             |
    | |             required by a bound introduced by this call
59  | |                 match command_receiver.recv() {
60  | |                     Ok(Command::Input(input)) => {
61  | |                         let response = maybe_make(input);
...   |
68  | |                 }
69  | |             });
    | |_____________^ `Rc<String>` cannot be sent between threads safely
    |
    = help: within `(&'static str, Result<worker::Output, worker::Error>)`, the trait `Send` is not implemented for `Rc<String>`
note: required because it appears within the type `Error`
   --> src/main.rs:17:16
    |
17  |     pub struct Error {
    |                ^^^^^
note: required because it appears within the type `Result<Output, Error>`
   --> /home/dureuill/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs:502:10
    |
502 | pub enum Result<T, E> {
    |          ^^^^^^
    = note: required because it appears within the type `(&str, Result<Output, Error>)`
    = note: required for `Sender<(&'static str, Result<worker::Output, worker::Error>)>` to implement `Send`
note: required because it's used within this closure
   --> src/main.rs:58:32
    |
58  |             std::thread::spawn(move || loop {
    |                                ^^^^^^^
note: required by a bound in `spawn`
   --> /home/dureuill/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:683:8
    |
680 | pub fn spawn<F, T>(f: F) -> JoinHandle<T>
    |        ----- required by a bound in this function
...
683 |     F: Send + 'static,
    |        ^^^^ required by this bound in `spawn`

As explained by the compiler, the error is caused by the fact that the type Rc does not support forwarding between threads, as this will cause a data race. And indeed, counting links in Rc is not done atomically, which would be thread-safe, but uses regular integer operations.

To provide thread-safe reference counting, Rust has another type called Arcwhich uses atomic reference counting (atomic refference counting). To change the code to use ArcIt is enough to do the following:

diff --git a/src/main.rs b/src/main.rs
index 04ec0d0..fd4b447 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -3,9 +3,9 @@ use std::{io::Write, time::Duration};
 mod parse {
     use std::{
         collections::{hash_map::Entry, HashMap},
-        rc::Rc,
         sync::{
             mpsc::{channel, Receiver, Sender},
+            Arc,
         },
         time::Duration,
     };
@@ -15,13 +15,13 @@ mod parse {

     #[derive(Clone, Debug)]
     pub struct Error {
-        data: Rc<ExpensiveToCloneDirectly>,
+        data: Arc<ExpensiveToCloneDirectly>,
     }

     impl Error {
         fn new(data: ExpensiveToCloneDirectly) -> Self {
             Self {
-                data: Rc::new(data),
+                data: Arc::new(data),
             }
         }
     }

(Test this code online)

As long as I didn’t need the link counting to be atomic, I could use Rc. When I needed thread safety, the compiler forced me to switch to Arc and to atomic reference counting. This is an illustration of the old principle “don’t pay for what you don’t use”.

This principle is also very close to C++ developers, however, unlike Rust, C++ has in its standard library only generic pointers with atomic reference counting equivalent to Arcbut not Rc. We always have to pay for atomicity, even if we don’t use it. Providing two classes was considered but rejected, in part because it was considered too dangerous (“Code written with out of sync”) shared_ptr can be used in threaded code and cause difficult-to-debug problems without displaying warnings).

Since Rust tracks them at compile time, they are safe.

Some implementations of the C++ standard library attempted to recover lost performance in some limited situations (such as when the program as a whole is not multithreaded), and this had a funny effect on microbenchmarks.

However, security is not guaranteed

Unfortunately, the caveats made by C++ (the constant use of atomic reference counting) are still not enough to shared_ptr was safe in a multi-threaded context, because developers still have to consider a couple of features that allow you to shoot yourself in the foot.

shared_ptr is thread-safe on copy but not on assignment

This problem is quite subtle and I honestly don’t think I’ve ever run into it, but I’ll mention it for clarity because sometimes people confuse it with the second problem.

Can take shared_ptrmake a copy of it by calling its copy constructor in a thread-safe way. However, one copy cannot be made shared_ptr common to several threads. Imagine you have a struct containing a shared pointer shared by multiple threads, and a method on that struct that overrides that shared pointer. If this method is called asynchronously by multiple threads, this will result in undefined behavior.

Apparently this problem was considered serious enough to add C++20 for std::atomic<std::shared_ptr> partial specialization of templates. However, I do not advise using it! Instead, limit the shared pointer to one thread and send copies to other threads as needed.

Since assignment requires an exclusive reference or an object with ownership, Rust statically disallows assignment Arcwhich is used by multiple threads, avoiding this problem at compile time.

The object pointed to still requires synchronization

IN shared_ptr only reference counting is atomic, but writing and reading from different streams of the object being pointed to requires its own synchronization. There is a danger in this, because there is a temptation to cut backshared_ptr is a pointer with a count of safe references to streams “to”shared_ptr is a thread-safe pointer with reference counting,” although only the former is true.

While this may seem obvious to experienced developers, in practice I’ve seen this problem a lot, and it always seems to be with junior developers. I’ve never seen this mistake made by knowledgeable developers refactoring their code to add threads.

Of course, Rust has the same content requirements Arcbut thanks to the trait Send and trait Sync and also because the counter Arc provides a generic reference to its contents, an asynchronous write and read of the object addressed by the pointer is a compile-time error.

Rust achieves this result thanks to the borrow checker and its type system. It is the only language I work with that can statically prevent data races.

Related posts