Rust is good / Hebrew

Rust is good / Hebrew

As you know, html developers miss multithreading. While reading Mara Bos’ gorgeous book about “atomics and loops” the other day, I came across an interesting programming language: rust lang. According to the Internet, the learning curve grows to infinity, but for multi-threaded programming, the choice seems normal. Here the question is whether it is possible and, most importantly, whether it is convenient to use growth for business logic (that is, for production).

Summary: macros, enemy compiler, friend compiler, unsafe and miri, multistream, options, iters, match.

Let’s be honest, the features were tested on an educational project, but as happens with all projects (educational projects), they were rewritten dozens of times.

Macros

The design of any business logic is usually done with a small improvement for the bright future. Addition of new features occurs if you greatly multiply, by inheritance, copy-paste, expansion of the global hash table or “add here and there” a couple of if-elif constructions. In general, it’s about the balance between reuse and copy-paste. If you need to inherit from something, it is probably better to use a capacitive dsl design. The functional meaning is the same, but it looks better due to locality and incoherence. For example, the macro tree!

let expr = tree![
    | BracketLeft, Expr, BracketRight, Term
    | Number, Term
    | Never
];

let term = tree![
    | Operator, Expr
];

The output is two trees that match, for example, the input string 5+5+5 or (2+2)*2. The engine waits for the first column of the macro, and if there is a match, it continues along the selected row. Constituent tokens (such as Expr or Term) are revealed in yet another tree. Using a macro is a bit more complicated:

// grammar for javascript object literals

let map = HashMap::from([
	(Expr, tree![
	    | CurlyBracketLeft, Object, CurlyBracketRight
	    | SquareBracketLeft, Array, SquareBracketRight
	    | OtherExpr
	]),
	(Object, tree![
	    | Variable, Value
	    | String, Value
	    | SquareBracketLeft, Variable, SquareBracketRight, Value
	]),
	(Value, tree![
	    | Colon, Expr, Term
	    | Never
	]),
	(Term, tree![
	    | Comma, Object
	]),
]);

The macro code seems confusing, but then you get used to it :). Here we match on the vertical bar and fill the tree with tokens row by row.

macro_rules! tree {
    ($(| $($x:ident),+ $(,)?)+) => {
        {
            let mut tt = TokenTree::new();
            $(
                let v = vec![$(Token::$x,)*];
                tt.add_right(v[0]);
                for x in v.into_iter().skip(1) {
                    tt.add_left(x, Token::Never);
                }
            )*
            tt
        }
    }
}

Another example of a compile time macro

macro_rules! _reverse_vec {
    ([$first:ident $($rest:ident)*] $($acc:ident)*) => {
        _reverse_vec!([$($rest)*] $first $($acc)*)
    };
    ([] $($x:ident)*) => {
        vec![$(Token::$x,)*]
    }
}

Good compiler, bad compiler

When using the entry api for hash tables, the compiler first likes to quarrel, especially on the js of developers who use objects (they are maps) almost everywhere:

// javascript code for if-elif construction

const x = ({
	plus: func,
	minus: func2,
}[operator] || defFunc)(operand, operand2);

The Entry api accepts a lambda for object modification:

map.entry("key")
   .and_modify(|e| { *e += 1 })
   .or_insert(42); 

In the lambda above, you want to stuff as much code as possible, like in python or javascript, but in growth, the compiler will most likely be unhappy and will greatly reduce the range of possible actions, since the argument is passed by mutable reference. In fact, the compiler helps, or wants to help. Error messages are very good. Not much time, and you are already bored by the compiler to grow in other programming languages. On the Internet, someone said that pair programming is growing for free – and it is. Affine types (type system) allows (rather forces, but these are subtleties) to look at the structure of the code from a slightly different angle, until, frankly, it is not clear whether it is good or bad, but something new is the right direction.

Rc & Arc

Just in case, the standard growth library has a nice set of primitives for working in a multi-threaded environment.

Our test project parses JavaScript files, in principle, using only the following structure:

type Choice = Option<Word>;

#[derive(Default, PartialEq, PartialOrd, Debug, Clone)]
struct Word(Token, Rc<Choice>, Rc<Choice>);

Logic, in two words – we move to the left (select the second element of the Word structure) if the input token matches the expected token, otherwise – we move to the right (the third element of the Word structure). The Rc (reference count) type is needed here to store our Word structure in several places at once (for example, for tree iteration and hash table matching).

Most likely, we will want to parse files in parallel. There is an excellent Rayon library in growth that seems to be made for these purposes. That is, we have a list of files, and instead of input.iter() we write par_iter() -> profit (almost).

use rayon::prelude::*;

fn par(input: Vec<PathBuf>) {
    let tt = token_tree();
    input
        .par_iter()
        .map(|path| {
            let str = fs::read_to_string(path).expect("Unable to read file");
            let data = str.bytes().peekable();
            let v = tokens(data).into_iter().rev().collect();
            parse(v, &tt)
        })
        .collect()
}

Do not forget to replace the Rc type with Arc (Atomically Reference Counted) in the structure above, and the system works in a multithreaded environment. Profit

unsafe and miri

Using smart pointers (rc, arc) for immutable structures is probably overhead, sometimes simple pointers are enough. Incrementing a pointer read is an unsafe operation, and it’s best, frankly, to stay away from unsafe blocks. There is an excellent book on the Internet “Learn Rust With Entirely Too Many Linked Lists”, which will help make life with unsafe a little easier. As a result, we replace Arc with ordinary pointers, without losing the ability to parse files in parallel:

use std::ptr::NonNull;
use std::marker::PhantomData;

pub struct Tree<T> { /* omitted */ }

unsafe impl<T: Send> Send for Tree<T> {}
unsafe impl<T: Sync> Sync for Tree<T> {}

type Link<T> = Option<NonNull<Node<T>>>;

struct Node<T> {
    elem: T,
    left: Link<T>,
    right: Link<T>,
}

#[derive(Default)]
pub struct Cursor<'a, T> {
    current: Link<T>,
    _foo: PhantomData<&'a T>, 
}

impl<'a, T> Cursor<'a, T> {
    pub fn get(&self) -> Option<&'a T> {
        unsafe {
            self.current.map(|node| &(*node.as_ptr()).elem)
        }
    }

    pub fn left(&mut self) {
        unsafe {
            if let Some(node) = self.current.take() {
                self.current = (*node.as_ptr()).left;
            }
        }
    }

    pub fn right(&mut self) {
        unsafe {
            if let Some(node) = self.current.take() {
                self.current = (*node.as_ptr()).right;
            }
        }
    }
}

Here, the Tree structure (representing a tree of grammars) rummages between streams, and the Cursor (our pointer) does not rummage. In other words, the Cursor is created in the scope of the execution thread, but the tree is one for all threads and must be immutable. PhantomData is a so-called marker that intuitively shows what type of data this structure has (useful when working with pointers). It seems to work, but unsafe code is not checked by the compiler, and we have already decided that the compiler is a friend. This is where the Miri project (An experimental interpreter for Rust’s mid-level intermediate representation) comes to the rescue. Dudes seem to be trying to test unsafe code, and they seem to be getting something. Miri says everything is ok with unsafe code, but swears at the Rayon library (parallel calculations). The author on the Internet says that everything is ok, but he does not know how to persuade Miri. Well, let’s leave it to the conscience of the library author and move on to iterators.

iters, options, match

Iterators came from Haskell, if not Haskell at all – they are still well done, very good api. For example, in python, please, without a console, remember the position of the index when enumerate(list)? In growth, in my opinion, the answer is obvious:

for (x,i) in v.iter().zip(0..) {
	println!("val {x} - index {i}");
}

Or the Option type – which is implicitly imported into the scope of each file – is just beautiful. The next time you want to set -1 as no value, please remember Option. And finally, match:

let token = match ch as char {
    '0'..='9' => Token::Number,
	'A'..='Z' | 'a'..='z' | '_' => Token::Variable,
    '*' | '+' | '-' | '/' | '&' | '|' | '%' | '^' => Token::Operator,
    _ => panic!("no match"),
};

It seems that the creators of growth wanted Haskell in production, but in such a way that, like, no one would have guessed. Idea approx.

Related posts