Assembly language for high-level programmers: conditional constructs
In the previous article, we got acquainted with the basics of assembly language syntax and managed to write a program using just two instructions. An impressive result!
In this lesson, we’ll learn new instructions and use that knowledge to translate our first high-level construct into assembly language: the conditional.
▍ Control transfer instructions
The sequence of instructions that the CPU decodes and executes is called the instruction stream. You can picture it as an array whose indices are the addresses of the instructions [1].
The instruction currently being executed is the one whose address is stored in the rip register; that is why rip is called the instruction pointer.
In pseudocode, program execution looks roughly like this:
while (!exited) {
    // Fetch the instruction that `registers.rip` points to
    instruction = instruction_stream[registers.rip]
    // Execute the instruction; it returns
    // the address of the next instruction.
    next_pointer = instruction.execute()
    // Assign the new address to `rip` so that
    // the next iteration fetches a new instruction.
    registers.rip = next_pointer
    // Handle side effects (we won't cover these)
}
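The loop above can be turned into a small runnable model. The sketch below uses Python rather than assembly, and everything in it (the `run` function, the tuple-based instruction format, the `print`, `jmp`, and `exit` opcodes) is an assumption made for illustration; it is not how a real CPU or any particular emulator works.

```python
def run(instruction_stream):
    """Execute a toy instruction stream and return the values it printed."""
    output = []
    rip = 0            # toy instruction pointer: an index into the list
    exited = False
    while not exited:
        # Fetch the instruction that rip points to.
        opcode, operand = instruction_stream[rip]
        # By default execution is linear: the next instruction follows.
        next_pointer = rip + 1
        if opcode == "print":
            output.append(operand)
        elif opcode == "jmp":
            # A control transfer overrides the default next address.
            next_pointer = operand
        elif opcode == "exit":
            exited = True
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        rip = next_pointer
    return output


program = [
    ("jmp", 3),          # 0: skip the next two instructions
    ("print", "never"),  # 1: never reached
    ("exit", None),      # 2
    ("print", "hello"),  # 3
    ("jmp", 2),          # 4: go back to the exit instruction
]
print(run(program))
```

Each iteration either advances `rip` by one or replaces it with an explicit target, which is the only difference between linear execution and a control transfer in this model.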
Most of the time, execution is linear: instructions run one after another, top to bottom, in the order in which they are written. Some instructions, however, can break this order; they are called control transfer instructions (CTIs).
The CTIs we care about fall into two categories, conditional and unconditional; they are what makes control flow possible in assembly language by allowing non-sequential execution of instructions. Another kind of CTI is the software interrupt; we won’t cover interrupts explicitly [2] because they are closely tied to operating systems and are beyond the scope of this series.
The first CTI we’ll explore is jmp (jump).
▍ Unconditional jumps
Jumps let execution continue from anywhere in the instruction stream. All a jump does is change rip; on the next cycle, the CPU fetches the instruction at the new address.
Syntactically, a jump looks like this:
jmp label
where the operand denotes the target instruction.
In most cases, the target is given as a label; in plain English, the instruction above reads: “Continue execution from the instruction labeled label”.
The assembler, i.e., the software that converts an assembly language program into machine code, translates labels into numeric addresses in the instruction stream; at run time, that address is assigned to the rip register.
In fact, numeric addresses and relative offsets are also valid jump targets, but they are more convenient for machines than for people. For example, compilers with optimization flags and disassemblers prefer numeric addressing over labels.
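To make label resolution concrete, here is a minimal two-pass sketch in Python. It is a toy: the `resolve_labels` function and its input format are hypothetical, and addresses are simply list indices, whereas a real assembler works with byte addresses and typically encodes jumps as offsets relative to rip.

```python
def resolve_labels(lines):
    """Toy two-pass assembler step: replace jmp labels with numeric addresses."""
    labels = {}
    instructions = []
    # Pass 1: record which instruction index each label points to.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)
        else:
            instructions.append(line.split())
    # Pass 2: substitute the recorded address for every label operand.
    resolved = []
    for parts in instructions:
        if parts[0] == "jmp":
            resolved.append(("jmp", labels[parts[1]]))
        else:
            resolved.append(tuple(parts))
    return resolved


source = """
_start:
    jmp print_msg
exit:
    syscall
print_msg:
    syscall
    jmp exit
"""
print(resolve_labels(source.splitlines()))
```

Two passes are needed because a jump may refer to a label defined further down, as `jmp print_msg` does here.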
Attentive readers will have noticed that the jump we described does not depend on any condition: if the program reaches this line, it jumps. That is why such an instruction is called unconditional.
Let’s look at an example.
We’ll reuse the “hello world” program from the first lesson and make it easier to read by using jumps to split the code into fragments. Along the way, we’ll add numeric constants to get rid of the magic numbers in our code.
section .data
    ; As before, we define the constant msg
    msg db `Hello, World!\n`
    ; This time we also define its length here,
    ; along with other constants that improve readability.
    ; The `equ` (equals) directive is used to define
    ; numeric constants.
    len equ 14          ; buffer length
    sys_write equ 1     ; identifier of the write system call
    sys_exit equ 60     ; identifier of the exit system call
    stdout equ 1        ; file descriptor for stdout
section .text
global _start
_start:
    ; Jumps can be confusing. To make this introduction
    ; easier, we use the markers (1), (2) ...
    ; to describe the steps of the code and their order.
    ; (1) Here we jump straight to the code
    ; that prints the message. The destination is the
    ; `print_msg` label, so execution continues
    ; from the line right below it.
    ; Head over to (2) to see
    ; how this story unfolds.
    jmp print_msg
exit:
    ; (3) We already know the drill: using `sys_exit` from
    ; the block above, we can invoke the exit system call
    ; to leave the program with status code 0.
    mov rax, sys_exit
    mov rdi, 0
    syscall
print_msg:
    ; (2) After the `jmp`, we run the same
    ; routine we defined in the first lesson.
    ; Feel free to go back if you don't remember exactly
    ; what the registers below are for.
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, msg
    mov rdx, len
    syscall
    ; We are done printing; it is time to exit
    ; the program. We use a jump again to execute
    ; the block at the `exit` label.
    ;
    ; Note that if we did not jump anywhere,
    ; even with no code left to execute,
    ; the program would not exit! It would be stuck in limbo
    ; and sooner or later trigger a segmentation fault.
    ; Comment out the next line if you want to check.
    ;
    ; Done? Saw the program break? Great!
    ; Now let's fix it by jumping to the `exit` label.
    ; Head over to (3) to see the end of this short story about jumps.
    jmp exit
▍ Conditional jumps
As you might have guessed, we implement conditional control flow with conditional CTIs, and with conditional jumps in particular. Don’t worry: we’ve already laid the groundwork, and conditional jumps are just an extension of the same jump concept.
When working with high-level languages, you may be used to flexible conditional statements such as if, unless, and when. Assembly language takes a different approach: instead of a few general-purpose conditional statements, it has many specialized instructions for specific checks.
Fortunately, these instructions follow a logical naming scheme that makes them easy to remember.
Let’s consider an example:
jne label
Here, label points to an instruction in our code, just as with unconditional jumps. In plain English, this reads as “Jump to label if Not Equal”.
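The list below presents the most common letters used in conditional jump mnemonics [3]:
- j: jump (always the first letter),
- n: not,
- z: zero,
- e: equal,
- g: greater than,
- l: less than.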
Here are some more examples:
- je label: “jump if equal”,
- jge label: “jump if greater than or equal”,
- jnz label: “jump if not zero”.
These instructions do exactly what their names say: if the condition is met, the program jumps to the target label; if not, it simply continues with the next line. As with unconditional jumps, the target can also be specified numerically.
You might ask: “Equal to what?”, “Greater than what?”, “Zero compared to what?”
Let’s answer these questions by diving into the mechanics of comparisons in assembly language and introducing a special register that plays the central role in this process: the eflags register.
▍ Flags
eflags is a 32-bit register that stores various flags. Unlike general-purpose registers, eflags is read bit by bit, and each bit position corresponds to a specific flag. You can think of these flags as a set of boolean values built right into the CPU: when a bit is 1, the corresponding flag is true, and when it is 0, the flag is false.
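Reading a register “bit by bit” can be sketched in Python with a shift and a mask. The bit positions used below (ZF at bit 6, SF at bit 7, OF at bit 11) are the ones documented for EFLAGS; the `read_flag` and `describe` helpers and the sample value are made up for illustration.

```python
ZF_BIT = 6    # zero flag
SF_BIT = 7    # sign flag
OF_BIT = 11   # overflow flag


def read_flag(eflags, bit):
    """Return True when the given bit of the eflags value is 1."""
    return bool((eflags >> bit) & 1)


def describe(eflags):
    """Decode the three flags discussed in this article."""
    return {
        "ZF": read_flag(eflags, ZF_BIT),
        "SF": read_flag(eflags, SF_BIT),
        "OF": read_flag(eflags, OF_BIT),
    }


# A sample value with bits 11 and 6 set and every other bit cleared.
eflags = 0b0000_1000_0100_0000
print(describe(eflags))
```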
Flags serve many different purposes [4], but all we care about here is that they provide context after an operation. For example, if the result of an addition is zero, the overflow flag (OF) can tell us whether it is a genuine zero or the product of an overflow. Flags matter to us because they are how assembly language stores the results of comparisons.
In this section, we will only consider the following flags:
- the zero flag (ZF), which is 1 when the result of an operation is zero,
- the sign flag (SF), which is 1 when the result of an operation is negative.
The cmp (compare) instruction is one of the standard ways of performing comparisons:
cmp rax, rbx
This instruction subtracts the second operand from the first and, instead of saving the result, sets the flags accordingly. For example:
- If the operands are equal, the zero flag (ZF) is set to 1.
- If the first operand is greater than the second, the sign flag (SF) is set to 0 (assuming the subtraction does not overflow).
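The effect of cmp on the flags can be modeled in a few lines of Python. This is a sketch under assumptions: `cmp_flags` is a hypothetical helper, it treats its arguments as 64-bit register values, and it tracks only ZF, SF, and OF.

```python
MASK64 = (1 << 64) - 1   # keep values within 64 bits, like a register
SIGN_BIT = 1 << 63       # the most significant bit holds the sign


def cmp_flags(a, b):
    """Model `cmp a, b`: subtract, throw the result away, keep the flags."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    zf = int(result == 0)
    sf = int(bool(result & SIGN_BIT))
    # Signed overflow on subtraction: the operands have different signs
    # and the sign of the result differs from the sign of the first operand.
    of = int(bool((a ^ b) & (a ^ result) & SIGN_BIT))
    return {"ZF": zf, "SF": sf, "OF": of}


print(cmp_flags(5, 5))         # equal operands
print(cmp_flags(7, 3))         # first operand greater
print(cmp_flags(3, 7))         # first operand smaller
print(cmp_flags(-2**63, 1))    # subtraction overflows
```

The last call shows why overflow matters: the first operand is the smallest 64-bit signed value, yet SF comes out as 0 because the subtraction wraps around, and OF is set to signal it.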
With this knowledge, the meaning of conditional jumps starts to make sense:
- “jump if equal” (je) means “jump if ZF = 1”,
- “jump if greater than or equal” (jge) means “jump if SF = 0 or ZF = 1” (this ignores overflow; the exact condition is SF = OF),
- “jump if not zero” (jnz) means “jump if ZF = 0” [5].
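These rules can be written down as a small lookup table. The Python sketch below is illustrative only: the `CONDITIONS` table and `would_jump` helper are hypothetical names, and the flag values are supplied by hand as they would be left by a preceding cmp. The jge entry uses the exact rule, SF = OF, which reduces to SF = 0 whenever no overflow occurred.

```python
# Each mnemonic maps to a predicate over the flags left by a previous cmp.
CONDITIONS = {
    "je":  lambda f: f["ZF"] == 1,
    "jz":  lambda f: f["ZF"] == 1,          # same check as je
    "jne": lambda f: f["ZF"] == 0,
    "jnz": lambda f: f["ZF"] == 0,          # same check as jne
    "jge": lambda f: f["SF"] == f["OF"],
    "jl":  lambda f: f["SF"] != f["OF"],
}


def would_jump(mnemonic, flags):
    """Return True when the conditional jump would be taken."""
    return CONDITIONS[mnemonic](flags)


# Flags as left by `cmp` for three typical cases (no overflow involved).
equal = {"ZF": 1, "SF": 0, "OF": 0}     # e.g. cmp 5, 5
greater = {"ZF": 0, "SF": 0, "OF": 0}   # e.g. cmp 7, 3
less = {"ZF": 0, "SF": 1, "OF": 0}      # e.g. cmp 3, 7

for name, flags in [("equal", equal), ("greater", greater), ("less", less)]:
    taken = [m for m in ("je", "jge", "jnz") if would_jump(m, flags)]
    print(name, taken)
```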
▍ Finally, conditionals
We are finally ready to write conditional constructs in assembly language. Hooray!
Consider the following pseudocode:
if rax == rbx
    success()
else
    error()
In assembly language, we can express this logic like this:
; Compare the values in rax and rbx
cmp rax, rbx
; If they are equal, jump to `success`
je success
; Otherwise, jump to `error`
jmp error
This assembly code first compares the values of the rax and rbx registers using the cmp instruction. It then uses a conditional jump (je) and an unconditional jump (jmp) to steer program execution based on the result of the comparison.
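To watch this pattern execute step by step, here is a toy interpreter in Python. It is only a sketch: the `run` function, the tuple encoding, and the `halt` pseudo-instruction (which stands in for the body of each branch) are assumptions made for this illustration, and only the zero flag is modeled.

```python
def run(program, rax, rbx):
    """Interpret a tiny cmp/je/jmp program and return the branch it reached."""
    # Map each label name to the index of the entry that declares it.
    labels = {op[1]: i for i, op in enumerate(program) if op[0] == "label"}
    zf = 0
    rip = 0
    while True:
        op = program[rip]
        rip += 1                      # default: fall through to the next line
        if op[0] == "cmp":
            zf = int(rax == rbx)      # only ZF matters for je
        elif op[0] == "je":
            if zf == 1:
                rip = labels[op[1]]   # condition met: jump
        elif op[0] == "jmp":
            rip = labels[op[1]]       # always jump
        elif op[0] == "halt":
            return op[1]
        # "label" entries do nothing when executed


program = [
    ("cmp",),               # cmp rax, rbx
    ("je", "success"),      # je success
    ("jmp", "error"),       # jmp error
    ("label", "success"),
    ("halt", "success"),
    ("label", "error"),
    ("halt", "error"),
]
print(run(program, 1, 1))
print(run(program, 1, 2))
```

When the values differ, `je` is not taken, execution falls through to `jmp error`, and the `success` block is skipped entirely.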
Let’s look at another example; enough with “hello world”. This time we will write a serious piece of software that performs an addition and checks whether the result is the expected one. Very serious.
section .data
    ; First of all, we define constants
    ; to improve readability
    sys_exit equ 60
    sys_write equ 1
    stdout equ 1
    ; Here we define the parameters of our program.
    ; We add `a` and `b` and expect the result
    ; to equal the value of the constant `expected`.
    a equ 100
    b equ 50
    expected equ 150
    ; If the sum is correct, we want to show
    ; the user a message
    msg db `Correct!\n`
    msg_len equ 9
section .text
global _start
_start:
    ; We use the `add` instruction, which sums
    ; two integers. Here `add` takes registers as
    ; operands, so we copy the constants
    ; into the `rax` and `rbx` registers
    mov rax, a
    mov rbx, b
    ; Here is our new instruction!
    ; It uses the arithmetic capabilities of the
    ; CPU to sum the operands and stores
    ; the result in `rax`.
    ; In a high-level language this would look like:
    ; rax = rax + rbx
    add rax, rbx
    ; Here we use the `cmp` (compare) instruction
    ; to check whether rax == expected
    cmp rax, expected
    ; `je` means "jump if equal", so if the sum
    ; (in `rax`) equals `expected` (a constant), we jump
    ; to the `correct` label
    je correct
    ; If the result is wrong, we jump
    ; to the `exit_1` label to exit with status code 1
    jmp exit_1
exit_1:
    ; Same as in the previous lesson,
    ; but this time we use status code 1,
    ; conventionally used to signal errors.
    mov rax, sys_exit
    mov rdi, 1
    syscall
correct:
    ; We are already familiar with this block: here we
    ; make the `write` system call to print a message
    ; telling the user that the sum is correct.
    mov rax, sys_write
    mov rdi, stdout
    mov rsi, msg
    mov rdx, msg_len
    syscall
    ; After printing the message we can jump to
    ; `exit_0`, which exits with status code 0,
    ; meaning success
    jmp exit_0
exit_0:
    ; This is the same code we have seen in all the
    ; previous exercises; it should look familiar.
    mov rax, sys_exit
    mov rdi, 0
    syscall
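For comparison, here is roughly the same logic in a high-level language. This Python sketch is an approximation rather than an exact translation: the `check_sum` function is a hypothetical name, and it returns the exit status and the text that would be written to stdout instead of making system calls.

```python
A = 100
B = 50
EXPECTED = 150


def check_sum(a, b, expected):
    """Return (exit_status, output), mirroring the assembly program's branches."""
    total = a + b                 # add rax, rbx
    if total == expected:         # cmp rax, expected / je correct
        return 0, "Correct!\n"    # correct: write, then exit_0
    return 1, ""                  # exit_1: no output, status 1


status, output = check_sum(A, B, EXPECTED)
print(status, repr(output))
```

The single `if` here corresponds to the `cmp` and `je` pair, and each `return` corresponds to one of the two exit blocks.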
▍ Conclusion
So, we have mastered the fundamental building blocks of control flow in assembly language.
We learned about control transfer instructions (CTIs) through unconditional and conditional jumps. We saw how the instruction pointer (rip) drives program execution and how jumps manipulate that flow. We studied the eflags register and its role in comparisons, and saw how the zero flag (ZF) and the sign flag (SF) relate to conditional operations. By combining the cmp instruction with jumps, we built the assembly language equivalent of the conditionals found in high-level languages.
Jumps enable the simplest kind of control flow, but they make code harder to follow. In the next article, we’ll look at the equivalent of functions: a way to execute code located elsewhere while preserving a linear flow. You will find that this approach resembles procedural code in high-level languages and that it makes assembly code more understandable and organized.
1. This abstraction is not entirely detached from reality. Emulators, i.e., software that emulates a system on different hardware, usually represent instruction streams as arrays. If you are interested in emulation, CHIP-8 is a good place to start, with this guide as an introduction. ↩
2. I say “explicitly” because, for example, the syscall instruction may result in an interrupt. The interaction between operating systems and user programs is a wonderful world of its own, too large to explore in these articles. If you are interested, read any book on operating systems. I personally recommend OSTEP, and this section in particular. ↩
3. For a complete overview, see the “Jump if Condition Is Met” section of the Intel Software Developer Manuals (SDM). ↩
4. The complete list can be found in the “EFLAGS Register” section of the Intel Software Developer Manuals (SDM). ↩
5. Note that checking for equality and checking for zero are essentially the same thing. ↩