Simple Lambda Calculus

In the last article, we implemented the unification algorithm in Rust. Now let’s apply it to a real example – a simple lambda calculus.

Syntax

The syntax of expressions in our simple lambda calculus will look like this:

<expr> ::= <identifier>
        |  \ <identifier> . <expr>
        | ( <expr> <expr> )

Here, \ <identifier> . <expr> is the syntax for lambdas:

\x.x
 ^ lambda parameter
   ^ the “returned” value
\x.\y.x
\x.\y.y
\a.((\x.\y.y) a)

And <expr> <expr> for applications:

\x.\y.(x y)
       ^ lambda
         ^ argument

Now that we have familiarized ourselves with the syntax, we can proceed to the sweetest part: implementation!

Tokenization (scanning, or lexical analysis, or …)

The first thing we need to do with the source code in our simple programming language is convert it into a list of tokens, i.e. tokenize it:

A token – here it can be thought of as a grammatical unit of the language: different kinds of punctuation, names (identifiers), sometimes whitespace, keywords, and so on.

Let’s define the token structure of our language:

#[derive(Debug, PartialEq, Eq, Clone, Hash)]
pub enum Token {
    Lambda, // `\`
    OpenParent, // `(`
    CloseParent, // `)`
    Dot, // `.`
    Name(String), // identifier
}

Let’s implement tokenization!

pub fn tokenize(source: &str) -> Vec<Token> {
    let mut chars = source.chars().peekable();
    let mut tokens = Vec::new();

    while let Some(c) = chars.next() {
        match c {
            '(' => tokens.push(Token::OpenParent),
            ')' => tokens.push(Token::CloseParent),
            '\\' => tokens.push(Token::Lambda),
            '.' => tokens.push(Token::Dot),
            _ => {

Here we handle the tokens consisting of a single character – the punctuation. Now for the most interesting part – the identifiers:

                if c.is_whitespace() {
                    // ignore
                } else if c.is_alphabetic() || c == '_' {
                    ...
                } else {
                    panic!("unexpected character `{}`", c);
                }
            }
        }
    }

    tokens
}

Here we do several things. First, we skip whitespace in our program (it doesn’t matter here, even from the point of view of the syntax). Second, if we encounter an invalid character, we throw an error (a panic for now, which we’ll fix later). Third, we process identifiers:

                    let mut name = String::new();

                    name.push(c);

                    while let Some(&c) = chars.peek() {
                        if c.is_alphanumeric() || c == '_' {
                            name.push(c);
                            chars.next();
                        } else {
                            break;
                        }
                    }

                    tokens.push(Token::Name(name));

While the next character is a digit, a letter, or '_', we keep going and append characters to our name string; otherwise, we break out of the loop and move on to scanning the next tokens.
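
As a quick sanity check, here is a hypothetical snippet (assuming the tokenize function above is in scope; Token derives PartialEq, so the comparison compiles):

// In Rust source, "\\x.x" is the string `\x.x`.
assert_eq!(
    tokenize("\\x.x"),
    vec![
        Token::Lambda,
        Token::Name("x".to_owned()),
        Token::Dot,
        Token::Name("x".to_owned()),
    ]
);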

Parser

Now that we have a list of tokens (which is easier to work with than the source code), we need to do some parsing:

A parser takes a list of tokens and turns it into an abstract syntax tree, or AST.

The first thing we need to do is define the AST for expressions:

#[derive(Debug, PartialEq, Eq, Clone, Hash)]
pub enum Expression {
    Lambda {
        parameter: String,
        value: Box<Self>,
    },
    Apply {
        callee: Box<Self>,
        argument: Box<Self>,
    },
    Variable {
        name: String,
    },
}

I hope there are no questions here.
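
For example, the identity function \x.x corresponds to this tree (a hand-built sketch):

let identity = Expression::Lambda {
    parameter: "x".to_owned(),
    value: Box::new(Expression::Variable {
        name: "x".to_owned(),
    }),
};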

Now we need a structure that holds a peekable iterator over the tokens, so that we can walk through them:

struct ParseState<I>
where
    I: Iterator<Item = Token>,
{
    tokens: Peekable<I>,
}

impl<I> ParseState<I>
where
    I: Iterator<Item = Token>,
{
    fn new(tokens: I) -> Self {
        Self {
            tokens: tokens.peekable(),
        }
    }

Now we can get to the most interesting part! Let’s start with expressions enclosed in parentheses:

    fn parse_parenthesized_expression(&mut self) -> Expression {
        let inner_expression = self.parse_expression();

        if !matches!(self.tokens.next(), Some(Token::CloseParent)) {
            panic!("expected `)`");
        }

        inner_expression
    }

We don’t check for the opening parenthesis here, because it is assumed to have already been consumed by the time this function is called.

Now the lambda processing:

    fn parse_lambda(&mut self) -> Expression {
        let parameter = match self.tokens.next() {
            Some(Token::Name(name)) => name,
            _ => panic!("expected identifier"),
        };

        if !matches!(self.tokens.next(), Some(Token::Dot)) {
            panic!("expected `.`");
        }

        let value = self.parse_primary_expression();

        Expression::Lambda {
            parameter,
            value: Box::new(value),
        }
    }

Here it is likewise assumed that the \ has already been consumed.

Notice the parse_primary_expression method here. It essentially parses all expressions except application (<expr> <expr>), because, for example, here:

\x.x a

We don’t parse x a as a single unit; we take only x as the lambda’s body. This leads to a completely different result, since \x.x and a end up as two separate expressions.
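
In other words, the input above parses as the lambda applied to a. Schematically (a sketch in terms of the Expression variants from earlier, not exact Debug output):

Apply {
    callee: Lambda { parameter: "x", value: Variable { name: "x" } },
    argument: Variable { name: "a" },
}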

Let’s implement this method!

    fn parse_primary_expression(&mut self) -> Expression {
        match self.tokens.next() {
            Some(Token::Lambda) => self.parse_lambda(),
            Some(Token::OpenParent) => self.parse_parenthesized_expression(),
            Some(Token::Name(name)) => Expression::Variable { name },
            _ => panic!("expected expression"),
        }
    }

Here we call all the functions we wrote earlier, and also handle plain variables.

Now we need to process application (<expr> <expr>):

    fn parse_expression(&mut self) -> Expression {
        let mut left = self.parse_primary_expression();

        while !matches!(
              self.tokens.peek(), 
              Some(Token::CloseParent) | None) {
            let argument = self.parse_primary_expression();

            left = Expression::Apply {
                callee: Box::new(left),
                argument: Box::new(argument),
            };
        }

        left
    }
}

Here we keep parsing expressions until we encounter a closing parenthesis, or until we reach the end of the input:

          while !matches!(
              self.tokens.peek(), 
              Some(Token::CloseParent) | None) { ... }

And, most naturally, we parse the argument of the application and store it all in the AST:

            let argument = self.parse_primary_expression();

            left = Expression::Apply {
                callee: Box::new(left),
                argument: Box::new(argument),
            };
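
Before we wrap up the parser, a quick hypothetical sanity check (assuming everything above lives in one module): applications chained by whitespace associate to the left, so f a b parses the same as ((f a) b):

// Both parse to Apply { callee: Apply { f, a }, argument: b }.
let spaced = ParseState::new(tokenize("f a b").into_iter()).parse_expression();
let explicit = ParseState::new(tokenize("((f a) b)").into_iter()).parse_expression();
assert_eq!(spaced, explicit);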

Thus, we have implemented a simple parser for the lambda calculus in ~130 lines of code. And now, what we’ve been waiting for so long:

Type inference

Remember how in the last article I talked about type constraints? Well, for the sake of simplicity, we will have only one kind of constraint – equality:

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
enum Constraint {
    Eq { left: Arc<Type>, right: Arc<Type> },
}

That is, as explained previously, this constraint requires that the two types be equal. Accordingly, unification will, as before, take the left and right types and give us new substitutions.
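
For instance, here is a hypothetical constraint (assuming the Type, TypeVariable, and tconst! definitions from the last article, plus use std::sync::Arc;):

// Require that the placeholder type variable ?1 is equal to `int`.
let constraint = Constraint::Eq {
    left: Arc::new(Type::Variable(TypeVariable(1))),
    right: tconst!("int"),
};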

Now, since we’re starting to work with scopes, we need somewhere to store the type information. For now, we’ll do it in the most obvious way:

#[derive(Debug, Clone, PartialEq, Eq, Default)]
struct Environment {
    variables: HashMap<String, Arc<Type>>,
}

impl Environment {
    fn new() -> Self {
        Self::default()
    }
}

Now let’s create a context for our type inference:

#[derive(Debug, PartialEq, Eq)]
struct InferenceContext<'env> {
    constraints: Vec<Constraint>,
    environment: &'env mut Environment,
    last_type_variable_index: usize,
}

impl<'env> InferenceContext<'env> {
    fn new(environment: &'env mut Environment) -> Self {
        Self {
            constraints: Vec::new(),
            environment,
            last_type_variable_index: 0,
        }
    }

As you can see, there is an interesting field here: last_type_variable_index. It is needed to create new, unique type variables. Remember those?

Let’s write functions that generate fresh type variables and the placeholder types themselves:

    fn fresh_type_variable(&mut self) -> TypeVariable {
        self.last_type_variable_index += 1;
        TypeVariable(self.last_type_variable_index)
    }

    fn type_placeholder(&mut self) -> Arc<Type> {
        Arc::new(Type::Variable(self.fresh_type_variable()))
    }
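
As a small illustration (hypothetical, assuming everything lives in one module and TypeVariable derives Debug and PartialEq as in the last article):

let mut environment = Environment::new();
let mut context = InferenceContext::new(&mut environment);

// Each call bumps the counter, so placeholders never collide.
assert_eq!(context.fresh_type_variable(), TypeVariable(1));
assert_eq!(context.fresh_type_variable(), TypeVariable(2));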

And now for the finishing touches. Let’s write a function called infer_type. It will return the type of a given expression:

    fn infer_type(&mut self, expression: &Expression) -> Arc<Type> {
        match expression {
            Expression::Variable { name } => self
                .environment
                .variables
                .get(name)
                .unwrap_or_else(|| panic!("unbound variable: {}", name))
                .clone(),

The first thing we handle is the simplest case: if the expression is a variable, we take its type information from the scope.

            Expression::Lambda {
                parameter,
                value,
            } => {
                let parameter_type = self.type_placeholder();

                self.environment
                    .variables
                    .insert(parameter.clone(), parameter_type.clone());

                let value_type = self.infer_type(value);

                Arc::new(Type::Constructor(TypeConstructor {
                    name: "Function".to_owned(),
                    generics: vec![parameter_type, value_type],
                }))
            }

Things are more interesting with lambdas. First of all, we do not know the type of the lambda parameter. The only thing we can find out is the type of the value it “returns”, so that is what we infer:

                let value_type = self.infer_type(value);

Accordingly, for the type of the parameter we leave a type variable for now (which can later be resolved using substitutions) and, of course, add it to the current scope, so that the parameter can be used when inferring the returned value:

                let parameter_type = self.type_placeholder();

                self.environment
                    .variables
                    .insert(parameter.clone(), parameter_type.clone());

Note that we return a type constructor with the name Function:

                Arc::new(Type::Constructor(TypeConstructor {
                    name: "Function".to_owned(),
                    generics: vec![parameter_type, value_type],
                }))

In this case, the first generic argument is the type of the lambda parameter, and the second is the type of the return value.
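
For example, for the identity function \x.x (a sketch in the ?n placeholder notation used below): the parameter gets a fresh placeholder ?1, the body is just that parameter, so the lookup returns the same ?1, and the inferred type is:

\x.x : Function<?1, ?1>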

Now let’s start analyzing applications:

            Expression::Apply { callee, argument } => {
                let callee_type = self.infer_type(callee);
                let argument_type = self.infer_type(argument);

                let return_type = self.type_placeholder();

                self.constraints.push(Constraint::Eq {
                    left: callee_type.clone(),
                    right: Arc::new(Type::Constructor(TypeConstructor {
                        name: "Function".to_owned(),
                        generics: vec![argument_type.clone(), return_type.clone()],
                    })),
                });

                return_type
            }
        }
    }

First, we infer the types of the callee and the argument. In theory, we could read the return type straight out of the callee’s function type, if it weren’t for type variables: the callee’s type may itself still be an unresolved placeholder. Instead of such a naive approach, we mark the return value as a fresh type variable and use a type constraint:

                self.constraints.push(Constraint::Eq {
                    left: callee_type.clone(),
                    right: Arc::new(Type::Constructor(TypeConstructor {
                        name: "Function".to_owned(),
                        generics: vec![argument_type.clone(), return_type.clone()],
                    })),
                });

And from the fact that Function<A, ?t> = Function<A, B>, we firstly learn the type of the return value (?t = B), and secondly check the types for mismatches.
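
To make this concrete, here is a sketch of the constraints generated for \a.(add a a) (the expression we will run below), assuming add: Function<int, Function<int, int>> and the parameter placeholder a: ?1:

(add a)     ==>  Function<int, Function<int, int>> = Function<?1, ?2>
((add a) a) ==>  ?2 = Function<?1, ?3>

Unifying the first constraint gives ?1 = int and ?2 = Function<int, int>; the second then gives ?3 = int, so the whole lambda gets the type Function<?1, ?3> = Function<int, int>.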

Congratulations, we’re almost done!

Now we need a function that walks through the type constraints and applies the unify function we wrote in the last article, in order to obtain substitutions for the type variables:

    fn solve(self) -> HashMap<TypeVariable, Arc<Type>> {
        let mut substitutions = HashMap::new();

        for constraint in self.constraints {
            match constraint {
                Constraint::Eq { left, right } => {
                    unify(left, right, &mut substitutions);
                }
            }
        }

        substitutions
    }
}

And now the sweetest part: let’s check whether our algorithm works:

fn main() {
    let mut environment = Environment::new();

    environment.variables.insert(
        "add".to_owned(),
        tconst!(
            "Function",
            tconst!("int"),
            tconst!("Function", tconst!("int"), tconst!("int"))
        ),
    );

    let expression = ParseState::new(tokenize("\\a.(add a a)").into_iter()).parse_expression();

    let mut context = InferenceContext::new(&mut environment);
    let ty = context.infer_type(&expression);
    let substitutions = context.solve();

    println!("{:?}", ty.substitute(&substitutions));
}

As we can see, the algorithm figured out that the type of the expression \a.(add a a), given that add is Function<int, Function<int, int>>, is Function<int, int>, which sounds logical:

Constructor(TypeConstructor { name: "Function", generics: [Constructor(TypeConstructor { name: "int", generics: [] }), Constructor(TypeConstructor { name: "int", generics: [] })] })

GitHub

All the source code I provide in the articles can now be found here.

End

Thank you very much for your attention! In the following articles, we will talk about more advanced methods of implementing type inference, interesting features and much more!
