Assembly language for high-level programmers: conditional constructs

In the previous article, we got acquainted with the basics of assembly language syntax and managed to write a program using just two instructions. An impressive result!

In this lesson, we’ll learn new instructions and use that knowledge to translate our first high-level construct into assembly language: the conditional.

▍ Control transfer instructions

The sequence of instructions that the CPU decodes and executes is called the instruction stream. You can picture it as an array whose indices are the addresses of the instructions [1].

The instruction currently being executed is the one whose address is stored in the rip register; that is why rip is called the instruction pointer.

In pseudocode, the execution of the program should look something like this:

while (!exited) {
  // Fetch the instruction at `registers.rip`
  instruction = instruction_stream[registers.rip]
  // Execute the instruction and get back
  // the address of the next instruction.
  next_pointer = instruction.execute()
  // Assign the new address to `rip`
  // so that the next iteration fetches a new instruction.
  registers.rip = next_pointer
  // Handle side effects (we won't cover these)
}

Most often, execution is linear: instructions are executed one after another, top to bottom, in the order in which they are written. However, some instructions can break this order; they are called control transfer instructions (CTIs).

The CTIs we are interested in fall into two categories, conditional and unconditional; they are what makes control flow possible in assembly language, because they allow instructions to be executed out of sequence. Another type of CTI is the software interrupt; we will not cover these explicitly [2] because they are closely tied to operating systems and are beyond the scope of this series of articles.

The first CTI we will look at is jmp (jump).

▍ Unconditional jumps

Jumps let execution continue from anywhere in the instruction stream. All a jump needs to do is change rip, after which, on the next cycle, the CPU fetches the instruction at the new address.
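To see how a jump amounts to nothing more than overwriting rip, here is a minimal Python sketch of the loop from the pseudocode above. The tuple-based instruction format and the names `run` and `program` are illustrative assumptions, not real x86 encoding.

```python
# Toy model of the fetch-execute loop. The instruction format
# (opcode, operand) is an assumption made for this sketch only.

def run(instruction_stream):
    """Execute toy instructions until 'exit'; return the addresses visited."""
    rip = 0                                  # instruction pointer: index into the stream
    trace = []
    while True:
        op, arg = instruction_stream[rip]    # fetch the instruction at rip
        trace.append(rip)
        if op == "exit":
            return trace
        if op == "jmp":
            rip = arg                        # a jump overwrites rip with its target
        else:                                # any other instruction ("nop" here)
            rip += 1                         # linear flow: move to the next address

program = [
    ("jmp", 2),      # address 0: skip over address 1
    ("nop", None),   # address 1: never executed
    ("nop", None),   # address 2
    ("exit", None),  # address 3
]
print(run(program))  # [0, 2, 3]
```

Address 1 never shows up in the trace: the jump at address 0 changed rip before the loop could reach it.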

Syntactically, a jump looks like this:

jmp label

where the operand denotes the target instruction.

In most cases, the target instruction is identified by a label; in natural language, the instruction above can be read as “continue execution from the instruction labeled label”.

The assembler, i.e., the software that converts an assembly language program into machine code, turns labels into numeric addresses in the instruction stream; at run time, that address is assigned to the rip register.

In fact, numeric and relative addresses are also valid jump targets, but they are more convenient for machines than for people. For example, compilers with optimization flags and disassemblers tend to use numeric addresses rather than labels.
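The label-to-address conversion can be sketched in a few lines of Python. This is only a rough model: it assumes every instruction occupies exactly one address slot (real x86 instructions vary in length), and the function name `resolve_labels` is hypothetical.

```python
# Sketch of label resolution, assuming one address slot per instruction.

def resolve_labels(lines):
    """Return (instructions, labels) with jmp targets replaced by addresses."""
    labels = {}
    instructions = []
    # Pass 1: record the address that each label refers to.
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)
        elif line:
            instructions.append(line)
    # Pass 2: substitute numeric addresses for label operands of jmp.
    resolved = []
    for ins in instructions:
        parts = ins.split()
        if parts[0] == "jmp" and parts[1] in labels:
            ins = f"jmp {labels[parts[1]]}"
        resolved.append(ins)
    return resolved, labels

source = """
_start:
    jmp print_msg
exit:
    syscall
print_msg:
    syscall
    jmp exit
"""
resolved, labels = resolve_labels(source.splitlines())
print(labels)    # {'_start': 0, 'exit': 1, 'print_msg': 2}
print(resolved)  # ['jmp 2', 'syscall', 'syscall', 'jmp 1']
```

Two passes are needed because a jump may refer to a label that is defined further down in the source, as `jmp print_msg` does here.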

Attentive readers will have noticed that the jump we described does not depend on any condition: if the program reaches this line, it performs the jump. That is why such an instruction is called unconditional.

Let’s look at an example.

We will reuse the “hello world” program from the first lesson. Let’s make it easier to read by using jumps to split the code into fragments. Along the way, we will define named constants to get rid of the magic numbers in our code.

section .data
  ; As before, we define the constant msg
  msg db `Hello, World!\n`
  ; This time we also define its length here,
  ; along with other constants to improve readability.
  ; The `equ` (equals) directive is used to define
  ; numeric constants.
  len       equ 14 ; buffer length
  sys_write equ 1  ; identifier of the write system call
  sys_exit  equ 60 ; identifier of the exit system call
  stdout    equ 1  ; file descriptor for stdout

section .text
  global _start
_start:
  ; Jumps can be confusing. To make this introduction
  ; easier, we use the markers (1), (2) ...
  ; to describe the steps of the code and their order.

  ; (1) Here we jump straight to the code
  ; that prints the message. The destination is the
  ; `print_msg` label, so execution continues
  ; from the line right below it.
  ; Go to (2) to see how
  ; this story unfolds.
  jmp print_msg

exit:
  ; (3) We already know the drill: using `sys_exit` from
  ; the block above, we invoke the exit system call
  ; to leave the program with status code 0.
  mov rax, sys_exit
  mov rdi, 0
  syscall

print_msg:
  ; (2) After the `jmp`, we run the same
  ; routine we defined in the first lesson.
  ; Feel free to go back if you don't remember exactly
  ; what the registers below are for.
  mov rax, sys_write
  mov rdi, stdout
  mov rsi, msg
  mov rdx, len
  syscall

  ; We are done printing; it's time to exit
  ; the program. Again, we use a jump to run
  ; the block at the `exit` label.
  ;
  ; Note that if we did not jump somewhere else,
  ; the program would not exit, even though there is no
  ; code left to execute! It would linger in limbo
  ; and sooner or later trigger a segmentation fault.
  ; Comment out the next line if you want to check this.
  ;
  ; Done? Seen the program break? Great!
  ; Now let's fix it by jumping to the `exit` label.
  ; Head to (3) to see the end of this short story about jumps.
  jmp exit

▍ Conditional jumps

As you might have guessed, conditional control flow is implemented using conditional CTIs, and conditional jumps in particular. Don’t worry: we have already laid the groundwork, and conditional jumps are just an extension of the same concept.

When working with high-level languages, you might be used to flexible conditional statements like if, unless and when. Assembly language takes a different approach. Instead of a few general-purpose conditional statements, it has many specialized commands for specific checks.

Fortunately, these commands have a logical name structure that makes them easy to remember.

Let’s consider an example:

jne label

Here label points to an instruction in our code, just like with unconditional jumps. In natural language, this can be read as “Jump to label if Not Equal”.

The most common conditional jump mnemonics follow the same naming pattern [3]. Here are some more examples:

  • je label: “jump if equal”,
  • jge label: “jump if greater or equal”,
  • jnz label: “jump if not zero”.

These instructions do exactly what their names say: if the condition is met, the program jumps to the target label. If not, it simply continues with the next line. As with unconditional jumps, the jump target can also be specified numerically.

You might ask: “equal to what?”, “greater than what?”, “zero compared to what?”

Let’s answer these questions by digging into the mechanics of comparisons in assembly language and introducing a special register that plays the central role in this process: the eflags register.

▍ Flags

eflags is a 32-bit register that stores various flags. Unlike general-purpose registers, eflags is read bit by bit, and each bit position denotes a specific flag. You can think of these flags as a set of boolean values built right into the CPU. When a bit is 1, the corresponding flag is set (true); when it is 0, the flag is cleared (false).

Flags serve many different purposes [4], but all we care about is that they provide context after an operation. For example, if the result of an addition is zero, the overflow flag (OF) can tell us whether it is a true zero or the product of an overflow. Flags matter to us because they are how assembly language stores the results of comparisons.

In this section, we will only consider the following flags:

  • the zero flag (ZF) is 1 when the result of the operation is zero,
  • the sign flag (SF) is 1 when the result of the operation is negative.
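To make “read bit by bit” concrete, here is a minimal Python sketch that extracts these two flags from an integer standing in for eflags. The bit positions (6 for ZF, 7 for SF) follow the EFLAGS layout in the Intel manual and are not stated in this article; the helper name `flag` and the sample value are illustrative assumptions.

```python
ZF_BIT = 6  # zero flag position in EFLAGS (per the Intel SDM layout)
SF_BIT = 7  # sign flag position in EFLAGS

def flag(eflags, bit):
    """Return True when the given bit of the eflags value is set."""
    return bool((eflags >> bit) & 1)

# Hypothetical register value in which only ZF is set.
eflags = 1 << ZF_BIT
print(flag(eflags, ZF_BIT))  # True
print(flag(eflags, SF_BIT))  # False
```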

The cmp (compare) instruction is one of the standard ways of performing comparisons:

cmp rax, rbx

This instruction subtracts the second operand from the first and discards the result, only setting the flags. For example:

  • If the operands are equal, the zero flag (ZF) takes the value 1.
  • If the first operand is greater than the second, the sign flag (SF) takes the value 0 (provided the subtraction does not overflow).
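The behavior of cmp on these two flags can be modeled with a short Python sketch. It assumes 64-bit registers and tracks only ZF and SF; the function name `cmp_flags` is hypothetical, and the other flags that the real instruction updates are left out.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide

def cmp_flags(a, b):
    """Model `cmp a, b`: compute a - b, keep only the (ZF, SF) flags."""
    result = (a - b) & MASK64   # wrap to 64 bits; the value itself is discarded
    zf = int(result == 0)       # ZF = 1 when the difference is zero
    sf = result >> 63           # SF = top (sign) bit of the difference
    return zf, sf

print(cmp_flags(150, 150))  # (1, 0): operands are equal
print(cmp_flags(200, 150))  # (0, 0): first operand is greater
print(cmp_flags(100, 150))  # (0, 1): first operand is smaller
```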

With this in mind, the meaning of conditional jumps starts to make sense:

  • “jump if equal” (je) means “jump if ZF=1”,
  • “jump if greater or equal” (jge) means “jump if SF=0” (strictly speaking, the CPU checks SF=OF, which reduces to SF=0 when the subtraction does not overflow),
  • “jump if not zero” (jnz) means “jump if ZF=0”. [5]
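The list above can be written down as three small Python predicates over the flags. This is a simplified sketch: it assumes signed overflow never happens, so `jge` looks only at SF, whereas the real instruction compares SF with OF. The function names mirror the mnemonics but are otherwise illustrative.

```python
# Jump conditions over the two flags discussed in this section.
# Assumption: no signed overflow, so jge can ignore OF.

def je(zf, sf):
    """Jump if equal: taken when ZF is set."""
    return zf == 1

def jnz(zf, sf):
    """Jump if not zero: taken when ZF is clear."""
    return zf == 0

def jge(zf, sf):
    """Jump if greater or equal: taken when the result is not negative."""
    return sf == 0

# Flags as `cmp` leaves them when the first operand is smaller:
# the difference is non-zero (ZF=0) and negative (SF=1).
zf, sf = 0, 1
print(je(zf, sf), jnz(zf, sf), jge(zf, sf))  # False True False
```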

▍ Finally, conditional constructs

We are finally ready to write conditional constructs in assembly language. Hooray!

Consider the following pseudocode:

if rax == rbx 
  success()
else
  error()

In assembly language, we can express this logic like this:

; Compare the values in rax and rbx
cmp rax, rbx
; If they are equal, jump to `success`
je success
; Otherwise, jump to `error`
jmp error

This assembly code first compares the values of the rax and rbx registers using the cmp instruction. It then uses a conditional jump and an unconditional jump (je and jmp) to steer program execution based on the result of the comparison.

Let’s look at another example; we have had enough of “hello world”. This time we will build a serious piece of software that performs an addition and checks whether the result is what we expect. Very serious.

section .data
  ; First of all, we define constants
  ; to improve readability
  sys_exit  equ 60
  sys_write equ 1
  stdout    equ 1
  
  ; Here we set the parameters of our program.
  ; We add `a` and `b` and expect the result
  ; to equal the value of the `expected` constant.
  a         equ 100
  b         equ 50
  expected  equ 150

  ; If the sum is correct, we want to show
  ; the user a message
  msg       db  `Correct!\n`
  msg_len   equ 9

section .text
global _start

_start: 
  ; We use the `add` instruction, which sums
  ; two integer values. Here both operands are
  ; registers, so we first copy the constants
  ; into the `rax` and `rbx` registers
  mov rax, a
  mov rbx, b
  
  ; Here is our new instruction!
  ; It uses the arithmetic capabilities of the
  ; CPU to sum the operands and stores
  ; the result in `rax`.
  ; In a high-level language it would look like this:
  ;    rax = rax + rbx
  add rax, rbx

  ; Here we use the `cmp` (compare) instruction
  ; to check whether rax == expected
  cmp rax, expected
  
  ; `je` means "jump if equal", so if the sum
  ; (in `rax`) equals `expected` (a constant), we jump
  ; to the `correct` label
  je correct

  ; If the result is wrong, we jump
  ; to the `exit_1` label to exit with status code 1
  jmp exit_1

exit_1:
  ; Same as in the previous lesson,
  ; but now we use status code 1,
  ; conventionally used to signal errors.
  mov rax, sys_exit
  mov rdi, 1
  syscall

correct:
  ; We already know this block: here we
  ; make the `write` system call to print a message
  ; telling the user that the sum is correct.
  mov rax, sys_write
  mov rdi, stdout
  mov rsi, msg
  mov rdx, msg_len
  syscall
  ; After printing the message, we can jump to
  ; `exit_0`, which exits with status
  ; code 0, meaning success
  jmp exit_0

exit_0:
  ; This is the same code we have seen in all
  ; the previous exercises; it should look familiar.
  mov rax, sys_exit
  mov rdi, 0
  syscall

▍ Conclusion

So, we have mastered the fundamental building blocks of assembly language control flow.

We learned about control transfer instructions (CTIs) through unconditional and conditional jumps. We saw how the instruction pointer (rip) controls program execution and how jumps manipulate that flow. We studied the eflags register and its role in comparisons, and saw how the zero flag (ZF) and sign flag (SF) relate to conditional jumps. By combining the cmp instruction with jumps, we built the assembly language equivalent of conditionals from high-level languages.

Jumps provide the simplest form of control flow, but they make code harder to follow. In the next article, we’ll look at the equivalent of functions: a way to execute code located elsewhere while preserving the linear flow. You will find that this approach resembles procedural code in high-level languages and that it makes assembly code more understandable and organized.


1. This abstraction is not completely detached from reality. Emulators, i.e., software that emulates a system on different hardware, usually represent instruction streams as arrays. If you are interested in emulation, CHIP-8 is a good place to start, with this guide as an introduction.

2. I say “explicitly” because, for example, the syscall instruction may trigger an interrupt. The interaction between operating systems and user programs is a wonderful world of its own, too big to explore in these articles. If you are interested, read any book on operating systems. I personally recommend OSTEP, and this section in particular.

3. For a complete overview, see the “Jump if Condition is Met” section of the Intel Software Developer Manuals (SDM).

4. A complete list can be found in the “EFLAGS Register” section of the Intel Software Developer Manuals (SDM).

5. Note that checking for equality and checking for zero are essentially the same thing.
