what it is and how to use

what it is and how to use

Hello, Khabriv residents!

I’m Dima, a Python developer from 21YARD, a construction contractor search service.

In the article I will talk about the pattern Interpreter. Let’s figure out when to use it, what concepts are behind it. After that, we use the pattern to write a program to solve mathematical expressions. Search for the code in the repository using the link.

When to use

Interpreter used when it is necessary to process lexical constructions with certain grammatical rules. As an example – regular expressions.

Examples of situations in which the pattern is useful:

  • when a language needs to be developed to describe operations or behavior;

  • when you often have to process complex expressions;

  • when the grammar of the system is often supplemented with new rules.

The interpreter makes it easy to add new rules and symbols to the grammar of the language. This makes the system flexible and extensible. Also, the interpreter simplifies the processing of complex expressions: it allows you to break complex expressions into simple parts and process them step by step.

Concepts at the core

The interpreter is based on three main concepts:

  • lexical processor: analyzes an expression and creates from it tokens. A token is an object that stores the value of a literal and its type;

  • token handler: builds an abstract syntax tree from tokens. The tree is built according to the priority of the operations in the expression. The root of the tree is the last operation in the expression;

  • interpreter itself: interprets the user expression. If the expression has no errors, processes it. Otherwise, it reports an error.

Realization

Let’s write an interpreter of mathematical expressions that will process user input and return an answer.

The interpreter will support:

  • Basic arithmetic operations: addition (+), subtraction (-), multiplication

  • distribution (/), exponentiation (^);

  • Execution of operations taking into account the priority;

  • The ability to use parentheses to determine the priority of operations;

  • Basic math functions such as sin, cos, tan, log, sqrt, and exp;

  • Variables: the user will be able to define variables, assign them values ​​and use them in subsequent operations;

Error handling.

Lexical analysis of the expressionWe will describe the types of tokens that determine the grammar of the expression language (src/enums.py

class TokenTypesEnum(str, Enum):
	NUMBER = 'number'
	PLUS = 'plus'
	MINUS = 'minus'
	STAR = 'star'
	SLASH = 'slash'
	CARET = 'caret'
	LEFT_PARENTHESIS = 'left_parenthesis'
	RIGHT_PARENTHESIS = 'right_parenthesis'
	EOF = 'EOF'

):Let’s create a token class that will store a literal and its type (src/tokens.py

@dataclass
class Token:
	type: TokenTypesEnum
	literal: str

	def __str__(self) -> str:
    	return f'{self.__class__.__name__}({self.type}, {self.literal})'

): It is necessary to divide the mathematical expression into tokens. For this, it is necessary to define grammatical rules. We use it for this RegEx– regular expressions (src/config.py

LEXICAL_RULES: Dict[TokenTypesEnum, str] = {
	TokenTypesEnum.NUMBER: r'(\d+(\.\d+)?)',
	TokenTypesEnum.PLUS: r'(\+)',
	TokenTypesEnum.MINUS: r'(\-)',
	TokenTypesEnum.STAR: r'(\*)',
	TokenTypesEnum.SLASH: r'(/)',
	TokenTypesEnum.CARET: r'(\^)',
	TokenTypesEnum.LEFT_PARENTHESIS: r'(\()',
	TokenTypesEnum.RIGHT_PARENTHESIS: r'(\))'
}

):A mathematical expression must be converted into a set of tokens. To do this, we implement a lexical processor (src/lexical_processor.py

class LexicalProcessor(Processor):

	def __init__(self) -> None:
    	self._expression: str=""
    	self._results: List[Token] = []

	def process_expression(self, expression: str) -> List[Token]:
    	"""
    	Processes expression and returns the list of tokens generated from expression, if expression is valid.
    	"""

    	# Initializes for each new expression processing:
    	self._expression = expression
    	self._results = []

    	self._extract_regex_pattern_from_expression()

    	# Add a token symbolizing the end of the line for further operations on the preprocessed expression:
    	self._results.append(
        	Token(
            	literal="",
            	type=TokenTypesEnum.EOF
        	)
    	)

    	return self._results

	def _extract_regex_pattern_from_expression(self) -> None:
    	"""
    	Extracts the RegEx pattern from expression, starting from the beginning.
    	If one of RegEx patterns is found in the expression, the corresponding token will be created.
    	In other case, expression is not valid and ExpressionSyntaxError will be raised.

    	After token is created, the expression is reduced by the characters used for tokenization
    	and processed recursively.
    	"""

    	while len(self._expression) > 0:
        	max_rule: TokenTypesEnum = TokenTypesEnum.EOF
        	max_lit: str=""
        	self._expression = self._expression.strip()

        	# Finding the longest part of an expression using RegEx:
        	for rule in LEXICAL_RULES.keys():
            	regex_pattern: re.Pattern[str] = re.compile(pattern=LEXICAL_RULES[rule])
            	regex_match: Optional[re.Match[str]] = regex_pattern.match(string=self._expression)
            	if (regex_match is not None) and (len(regex_match.group(0)) > len(max_lit)):
                	max_lit = regex_match.group(0)
                	max_rule = rule

        	if max_rule == TokenTypesEnum.EOF:
            	raise ExpressionSyntaxError()

        	self._expression = self._expression[len(max_lit):]  # Reduce the expression by the processed part
        	self._results.append(
            	Token(
                	literal=max_lit,
                	type=max_rule
            	)
        	)

): Method self._extract_regex_pattern_from_expression

removes spaces from both sides of an expression. It then processes the first literal in the expression, determining its type using grammar rules. If the literal does not match any of the rules, then the expression is incorrect and an error will be generated. If the expression is correct, a new token will be created, and the expression will be trimmed to the processed part. The remaining part of the expression is reprocessed. After all literals in the expression are processed, a typed token is createdTokenTypesEnum.EOF

. It signals the end of the statement and the necessary token processing in the future.

Token processing

The purpose of processing tokens is to create an abstract syntactic tree based on them. The tree is built according to priorities in mathematical expression.Let’s introduce tree classes, expressions and specific types of possible expressions (src/expressions.py

dataclass
class TreeNode:
	"""
	Abstract Syntax Tree.
	"""

	pass


@dataclass
class Expression(TreeNode):
	pass


@dataclass
class UnaryOperation(Expression):
	"""
	-(2 + 3), where minus is operation and (2 + 3) is an expression.
	"""

	operation: str
	expression: Expression


@dataclass
class BinaryOperation(Expression):
	"""
	1 * 2 + 3 * 3, where "1 * 2" is left expression, "+" is operation and "3 * 3" is right expression.
	"""

	operation: str
	left: Expression
	right: Expression


@dataclass
class Number(Expression):
	value: float

):

The simplest expression is a number. A unary expression sets the sign of another expression. The most complex expression is binary. A binary expression has a child left and right expression and an operation to perform on them.We implement a token handler that builds an abstract syntactic tree (src/tokens_parser.py

class TokensParser(Parser):
	"""
	Creates Abstract Syntax Tree according to operations priority in provided expression.

	program := computation
	computation := term ( (PLUS | MINUS) term )*
	term := unary ( (STAR | SLASH ) unary )*
	unary := PLUS unary | MINUS unary | exponentiation
	exponentiation := atom CARET unary | atom
	atom := LEFT_PARENTHESIS computation RIGHT_PARENTHESIS | number
	number := INT
	"""

	def __init__(self) -> None:
    	self._tokens: List[Token] = []
    	self._next_token_index: int = 0

	def parse(self, tokens: List[Token]) -> Expression:
    	"""
    	Parses the expression, created by user.
    	"""

    	# Initializes for each new parsing process:
    	self._next_token_index = 0
    	self._tokens = tokens

    	computation: Expression = self._parse_computation()
    	self._get_next_token(expected_token_type=TokenTypesEnum.EOF)
    	return computation

	def _parse_computation(self) -> Expression:
    	result: Expression = self._parse_term()
    	while (next_token_type := self._get_next_token_type()) in {TokenTypesEnum.PLUS, TokenTypesEnum.MINUS}:
        	operation: str = OPERATIONS[next_token_type]
        	self._get_next_token(expected_token_type=next_token_type)
        	right: Expression = self._parse_term()
        	result = BinaryOperation(operation=operation, left=result, right=right)

    	return result

	def _parse_term(self) -> Expression:
    	"""
    	Parses an expression with multiplications and divisions.
    	"""

    	result: Expression = self._parse_unary()
    	while (next_token_type := self._get_next_token_type()) in {TokenTypesEnum.STAR, TokenTypesEnum.SLASH}:
        	operation: str = OPERATIONS[next_token_type]
        	self._get_next_token(expected_token_type=next_token_type)
        	right: Expression = self._parse_unary()
        	result = BinaryOperation(operation=operation, left=result, right=right)

    	return result

	def _parse_unary(self) -> Expression:
    	"""
    	Parses a unary operator.
    	"""

    	if (next_token_type := self._get_next_token_type()) in {TokenTypesEnum.PLUS, TokenTypesEnum.MINUS}:
        	operation: str = OPERATIONS[next_token_type]
        	self._get_next_token(expected_token_type=next_token_type)
        	expression: Expression = self._parse_unary()
        	return UnaryOperation(operation=operation, expression=expression)
    	else:  # No unary operators in sight.
        	return self._parse_exponentiation()

	def _parse_exponentiation(self) -> Expression:
    	"""
    	Parses a caret operator.
    	"""

    	expression: Expression = self._parse_atom()
    	next_token_type: TokenTypesEnum = self._get_next_token_type()
    	if next_token_type == TokenTypesEnum.CARET:
        	self._get_next_token(expected_token_type=TokenTypesEnum.CARET)
        	right: Expression = self._parse_unary()
        	expression = BinaryOperation(operation=OPERATIONS[next_token_type], left=expression, right=right)

    	return expression

	def _parse_atom(self) -> Expression:
    	"""
    	Parses a parenthesised expression or a number.
    	"""

    	expression: Expression
    	if self._get_next_token_type() == TokenTypesEnum.LEFT_PARENTHESIS:
        	self._get_next_token(expected_token_type=TokenTypesEnum.LEFT_PARENTHESIS)
        	expression = self._parse_computation()
        	self._get_next_token(expected_token_type=TokenTypesEnum.RIGHT_PARENTHESIS)
    	else:
        	expression = self._parse_number()

    	return expression

	def _parse_number(self) -> Number:
    	return Number(float(self._get_next_token(expected_token_type=TokenTypesEnum.NUMBER).literal))

	def _get_next_token(self, expected_token_type: TokenTypesEnum) -> Token:
    	"""
    	Returns next token if token's type matches expected token type.

    	In other case raises error.
    	"""

    	next_token: Token = self._tokens[self._next_token_index]
    	self._next_token_index += 1

    	if next_token.type != expected_token_type:
        	raise ParseError(f'Expected {expected_token_type}, but received {next_token!r}.')

    	return next_token

	def _get_next_token_type(self) -> TokenTypesEnum:
    	"""
    	Checks type of upcoming token without consuming it.
    	"""

    	return self._tokens[self._next_token_index].type

):

Prioritizing operations and waiting for the next type of token is the basis of processing. If the token received is not the one expected, an error will be thrown.

InterpreterThe interpreter works with a lexical handler, a token parser, and commands through abstractions. This allows you to change the implementation of these objects, without the need to change the code of the interpreter itself ( src/interfaces.py andsrc/commands/interfaces.py

class Parser(ABC):

	@abstractmethod
	def parse(self, tokens: List[Token]) -> Expression:
    	raise NotImplementedError


class Processor(ABC):

	@abstractmethod
	def process_expression(self, expression: str) -> List[Token]:
    	raise NotImplementedError

class BaseCommand(ABC):

	def __init__(self, a: float, b: float) -> None:
    	self._a: float = a
    	self._b: float = b

	@abstractmethod
	def execute(self) -> float:
    	raise NotImplementedError


class MathCommand(ABC):

	def __init__(self, value: float) -> None:
    	self._value: float = value

	@abstractmethod
	def execute(self) -> float:
    	raise NotImplementedError

): Each command is represented by a separate object and is a simplified version of the pattern Team( src/commands/base_commands.py andsrc/commands/math_commands.py

class MultiplyCommand(BaseCommand):

	def execute(self) -> float:
    	return self._a * self._b


class DivideCommand(BaseCommand):

	def execute(self) -> float:
    	try:
        	return self._a / self._b
    	except ZeroDivisionError:
        	raise CustomZeroDivisionError()


class SubtractCommand(BaseCommand):

	def execute(self) -> float:
    	return self._a - self._b


class AddCommand(BaseCommand):

	def execute(self) -> float:
    	return self._a + self._b


class ExponentialCommand(BaseCommand):

	def execute(self) -> float:
    	return self._a ** self._b

class SqrtCommand(MathCommand):

	def execute(self) -> float:
    	return math.sqrt(self._value)


class SinCommand(MathCommand):

	def execute(self) -> float:
    	return math.sin(self._value)


class CosCommand(MathCommand):

	def execute(self) -> float:
    	return math.cos(self._value)


class LogCommand(MathCommand):

	def execute(self) -> float:
    	return math.log(self._value)


class TanCommand(MathCommand):

	def execute(self) -> float:
    	return math.tan(self._value)


class ExpCommand(MathCommand):

	def execute(self) -> float:
    	return math.exp(self._value)

):Let’s associate commands with symbols that represent them in a mathematical expression (src/config.py


BASE_COMMANDS: Dict[str, Type[BaseCommand]] = {
	'+': AddCommand,
	'-': SubtractCommand,
	'*': MultiplyCommand,
	'/': DivideCommand,
	'^': ExponentialCommand,
}

MATH_COMMANDS: Dict[str, Type[MathCommand]] = {
	'sin': SinCommand,
	'cos': CosCommand,
	'tan': TanCommand,
	'log': LogCommand,
	'exp': ExpCommand,
	'sqrt': SqrtCommand,
}

):

  1. The work of the interpreter of mathematical expressions is implemented through the following steps:

  2. Get user input;

  3. Verify that the user’s input is correct, or notify the user of an error. Valid user input includes a variable, an assignment sign, and an expression to assign. The variable name consists only of letters;

  4. Replace all variables in the expression with their values. This is possible if the variables exist in storage and have been previously computed based on other user expressions;

  5. Calculate the value of mathematical functions and replace them in the expression with the obtained value. The argument for the mathematical function is calculated in steps #5, #6, and #7;

  6. Convert an expression into a set of tokens using a word processor;

  7. Build a syntax tree based on tokens using a token handler;

Calculate the value of each node of the tree recursively in ascending order and get the final value of the expression.We implement an interpreter class that performs all these steps in the required order (src/interpreter.py

class MathOperationsInterpreter:

	def __init__(
        	self,
        	interpreter_base_commands: Dict[str, Type[BaseCommand]],
        	interpreter_math_commands: Dict[str, Type[MathCommand]],
        	parser: Parser,
        	lexical_processor: Processor
	) -> None:

    	self._base_commands: Dict[str, Type[BaseCommand]] = interpreter_base_commands
    	self._math_commands: Dict[str, Type[MathCommand]] = interpreter_math_commands
    	self._parser: Parser = parser
    	self._lexical_processor: Processor = lexical_processor

    	# Storage for executed expressions, which can be user in future expressions:
    	self._user_variables: Dict[str, float] = {}

	def interpret(self, user_input: str) -> None:
    	"""
    	1) Receives user input and checks it validity;
    	2) Interprets the expression in the user input and executes it;
    	3) Assigns executed result of the expression to a given variable.
    	"""

    	try:
        	key: str
        	expression: str
        	key, expression = self._validate_user_input(user_input=user_input.lower())

        	expression_result: float = self._execute(expression=expression)
        	self._user_variables[key] = expression_result
    	except (
            	ParseError,
            	ExpressionSyntaxError,
            	IncorrectVariableAssignmentError,
            	UnknownExpressionTypeError,
            	CustomZeroDivisionError
    	) as e:
        	print(e)

	def _validate_user_input(self, user_input: str) -> Tuple[str, str]:
    	"""
    	Validates user input. If input is invalid, raises IncorrectVariableAssignmentError.

    	User input must contain a variable, an assignment sign, and an assignment expression.
    	Variable must contain only alphabetic characters.

    	Examples of correct user input are:
    	1) "x = 2";
    	2) "x= 2";
    	3) "x=2";
    	"""

    	user_input_sep: str=" "
    	values: List[str] = user_input.split(user_input_sep)

    	# Check, if input is "variable = expression":
    	user_variable: str
    	expression: str
    	equal_sign: str="="
    	if values[0].endswith(equal_sign):
        	user_variable = values[0][: -1]  # Variable name without "=" symbol
        	expression = user_input_sep.join(values[1:])
    	elif len(values) == 1 and equal_sign in values[0]:
        	use_var_index: int = values[0].find(equal_sign)
        	user_variable = values[0][: use_var_index]
        	expression = values[0][use_var_index + 1:]
    	elif values[1] == equal_sign:
        	user_variable = values[0]
        	expression = user_input_sep.join(values[2:])
    	else:
        	raise IncorrectVariableAssignmentError()

    	if not user_variable.isalpha():
        	raise IncorrectVariableAssignmentError()

    	expression = self._substitute_user_variables(expression=expression)
    	return user_variable, expression

	def _substitute_user_variables(self, expression: str) -> str:
    	"""
    	Substitutes already interpreted variable in expression if exists.
    	"""

    	for key in self._user_variables.keys():
        	if key in expression:
            	expression = expression.replace(key, str(self._user_variables[key]))

    	return expression

	def _execute(self, expression: str) -> float:
    	"""
    	1) All basic mathematical functions, such as sin, cos, tan, log, sqrt and exp,
    	are executed if any in the expression;
    	2) Generates tokens from the expression for parsing purpose;
    	3) Creates AST (Abstract Syntax Tree) on tokens basis;
    	4) Recursively calculates the value of a parsed expression based on the AST.
    	"""

    	expression = self._execute_math_operations(expression=expression)
    	tokens: List[Token] = self._lexical_processor.process_expression(expression=expression)

    	try:
        	operations_tree: TreeNode = self._parser.parse(tokens=tokens)
    	except ParseError:
        	raise ExpressionSyntaxError()

    	return self._calculate_node_value(node=operations_tree)

	def _execute_math_operations(self, expression: str) -> str:
    	"""
    	Searches for a math function in expression.
    	If such function is found, calculates its value and replaces the subexpression related to the
    	math function with the calculated value.

    	Example:
    	:param expression: "sqrt(25) + 5 * 2"
    	:return: "5.0 + 5 * 2"
    	"""

    	for math_command in self._math_commands.keys():
        	math_command_index: int = expression.find(math_command)
        	if math_command_index != -1:  # Not found
            	math_command_expression: str
            	postfix: str
            	math_command_expression, postfix = self._extract_expression_from_parentless(
                	expression=expression[math_command_index + len(math_command):]
            	)

            	command = self._math_commands[math_command](
                	value=self._execute(
                    	expression=math_command_expression
                	)
            	)

            	prefix: str = expression[: math_command_index]
            	expression = prefix + str(command.execute()) + postfix

    	return expression

	@staticmethod
	def _extract_expression_from_parentless(expression: str) -> Tuple[str, str]:
    	"""
    	Extracts subexpression from parentless in original expression for math functions purposes.
    	If original expression is not valid or there is no parentless in it, raises ExpressionSyntaxError.
    	Returns extracted subexpression and the remaining part of original expression.

    	Example:
    	:param expression: "(5 + 2) * 3"
    	:return: ("5 + 2", " * 3")
    	"""

    	opening_parenthesis: str="("
    	closing_parenthesis: str=")"
    	if not expression.startswith(opening_parenthesis):
        	raise ExpressionSyntaxError()

    	# Starting from -1, because it's index and will be straightaway incremented
    	# at first iteration of for loop below:
    	parentless_expression_end_index: int = -1

    	parentless_stack: List[str] = []
    	for index in range(len(expression)):
        	parentless_expression_end_index += 1
        	symbol: str = expression[index]
        	if symbol == opening_parenthesis:
            	parentless_stack.append(symbol)
        	elif symbol == closing_parenthesis:
            	parentless_stack.pop(0)

        	if len(parentless_stack) == 0:
            	break

    	if len(parentless_stack) != 0:
        	raise ExpressionSyntaxError()

    	# Expression and postfix should be without parentless from original expression:
    	parentless_expression: str = expression[1: parentless_expression_end_index]
    	postfix: str = expression[parentless_expression_end_index + 1:]
    	return parentless_expression, postfix

	def _calculate_node_value(self, node: TreeNode) -> float:
    	"""
    	1) Gets an AST tree node, which is one of next types: UnaryOperation, BinaryOperation or Number;
    	2) If node is a Number type - returns it, else recursively calculates expression, stored in node.
    	"""

    	command: BaseCommand
    	if isinstance(node, UnaryOperation):
        	command = self._base_commands[node.operation](
            	a=0,  # Unary operation has only one part of expression
            	b=self._calculate_node_value(node=node.expression)
        	)

        	return command.execute()
    	elif isinstance(node, BinaryOperation):
        	command = self._base_commands[node.operation](
            	a=self._calculate_node_value(node=node.left),
            	b=self._calculate_node_value(node=node.right)
        	)

        	return command.execute()
    	elif isinstance(node, Number):
        	return node.value
    	else:
        	raise UnknownExpressionTypeError()

	def get_result(self) -> Optional[float]:
    	"""
    	Returns the value for a result variable if it exists.

    	This variable signals that all expressions have been interpreted and variable storage is no longer required.
    	"""

    	result: Optional[float] = self._user_variables.get(RESULT_VARIABLE)
    	if result:
        	self._user_variables.clear()

    	return result

): To get the result of an expression using all variables – the final expression must be assigned to a variableresult

.
See you in the next issues.

I will be grateful for support and constructive criticism!

Related posts