Why it matters and why a new language, “Argentum”, had to be created
Any modern programming language is based on a reference model that describes the data structures applications will operate on. It determines how objects refer to each other, at what point an object can be deleted, and when and how an object can be modified.
Status Quo
Most modern programming languages are built using one of three reference models:
The first category is languages with manual management of object lifetimes. Examples: C, C++, Zig. In these languages, objects are allocated and deallocated manually, and a pointer is simply a memory address that carries no obligations.
The second category includes languages based on reference counting. These are Objective-C, Swift, partially Rust, C++ when using smart pointers, and some others. These languages let you automate the removal of unnecessary objects to a certain extent, but this comes at a price. In a multithreaded environment, the reference counters must be atomic, which is expensive. In addition, reference counting cannot eliminate all kinds of garbage. When object A refers to object B and object B refers back to object A, such a cyclic structure cannot be removed by reference counting. Languages such as Rust and Swift introduce additional non-owning references that solve the problem of cycles at the cost of complicating the object model and syntax.
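The cycle problem can be observed directly. The following is a minimal sketch, assuming CPython, whose primary mechanism is reference counting; its backup cycle collector is switched off for the duration of the experiment so that only counting is at work. The `Node` class and both helper functions are hypothetical names made up for this illustration.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.peer = None  # an ordinary (owning) reference by default


def strong_cycle_survives():
    """True if an A <-> B pair outlives its last external reference."""
    a, b = Node("A"), Node("B")
    a.peer, b.peer = b, a          # A owns B and B owns A: a cycle
    probe = weakref.ref(a)         # lets us observe A without owning it
    del a, b
    return probe() is not None


def weak_back_link_survives():
    """Same pair, but the back link B -> A is non-owning."""
    a, b = Node("A"), Node("B")
    a.peer = b
    b.peer = weakref.ref(a)        # non-owning back reference
    probe = weakref.ref(a)
    del a, b
    return probe() is not None


gc.disable()                       # leave only reference counting active
try:
    leaked_with_strong_cycle = strong_cycle_survives()
    leaked_with_weak_back_link = weak_back_link_survives()
finally:
    gc.enable()
    gc.collect()                   # clean up the cycle we leaked on purpose
```

With counting alone, the strongly linked pair stays in memory, while the pair with a non-owning back link is freed as soon as the last external reference disappears.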
Most modern programming languages fall into the third category. These are languages with automatic garbage collection: Java, JavaScript, Kotlin, Python, Lua… In these languages, unnecessary objects are removed automatically, but there is a catch. The garbage collector consumes a lot of memory and CPU time. It kicks in at unpredictable moments and pauses the main program, sometimes completely, for the whole duration of its work, and sometimes partially. Garbage collection without pauses does not exist. Only an algorithm that scans all of memory and stops the application for the entire duration of its work can guarantee that all garbage is collected. In real life, such collectors have not been used for a long time because of their inefficiency. In modern systems, some garbage objects are never deleted at all.
In addition, the very definition of an unnecessary object needs clarification. Suppose, for example, that we have a GUI application and we remove from a form a control that is subscribed to a timer event. The control cannot be deleted, simply because somewhere in the timer object there is still a reference to it, and the garbage collector will not treat such an object as garbage.
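A minimal sketch of this situation, assuming CPython. The `Timer`, `Control` and `Form` classes are hypothetical stand-ins for a GUI toolkit, not any real library's API.

```python
import gc
import weakref


class Timer:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)


class Control:
    def __init__(self, timer):
        # The bound method keeps a strong reference to `self`.
        timer.subscribe(self.on_tick)

    def on_tick(self):
        pass


class Form:
    def __init__(self):
        self.controls = []


timer = Timer()
form = Form()
form.controls.append(Control(timer))
probe = weakref.ref(form.controls[0])   # observe the control without owning it

form.controls.clear()                   # the control is removed from the form...
gc.collect()
control_still_alive = probe() is not None   # ...but the timer still reaches it

timer.subscribers.clear()               # the link must be broken by hand
gc.collect()
released_after_unsubscribe = probe() is None
```

The collector is behaving correctly here: the control is reachable, so it is not garbage. The programmer still has to remember to break the link manually.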
As mentioned above, each of the three reference models has its own drawbacks. In the first case, we have holes in memory safety and memory leaks. In the second case, we have slow operation in a multithreaded environment and memory leaks due to cycles. In the third case, we get sporadic pauses of the program, heavy consumption of memory and CPU time, and the need to manually break references when an object becomes unnecessary. In addition, systems with reference counting and garbage collection do not let you manage the lifetime of other resources, such as open file descriptors or identifiers of windows, processes, fonts, etc. These methods manage memory only. There is another problem with garbage-collected systems: virtual memory. When a software system accumulates garbage and then scans memory in order to free it, swapping part of the address space out to external storage can completely kill the program's performance. Therefore, garbage collection is not compatible with virtual memory.
In short, the problems are real, and the current methods of solving them have flaws.
A possible solution
Let’s try to build a reference model free from the above-mentioned shortcomings. First, we need to gather requirements: look at how objects are actually used in programs, what the programmer expects from an object hierarchy, and how the programmer's work can be made easier and automated without loss of performance.
Our industry has accumulated a wealth of experience in designing data models. We can say that this experience is summarized in the Unified Modeling Language (UML). UML introduces three types of relationships between objects: association, composition and aggregation.
- Association: one object knows about another. Association does not imply ownership.
- Composition: one object exclusively owns another. For example, a wheel can be in only one car at a time.
- Aggregation: multiple ownership. For example, many people share the name Andriy.
Let’s analyze it using more specific examples:
A database owns its tables, records, counters and stored procedures. A table owns its records, column metadata, and indexes. A record owns its fields. Another example: a user interface form owns its controls, and a list owns its elements. A document owns its style sheets and pages, which in turn own the elements on the page; text blocks own paragraphs, which own characters. All these relationships are compositions. A composition always forms a tree-like structure in which each object has exactly one owner, and the object exists only as long as that owner refers to it. We always know when an object needs to be removed, so such references need neither a garbage collector nor reference counting.
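The single-owner rule can be sketched as a runtime check. Python has no notion of ownership, so the `Node` class and its `has_owner` flag below are hypothetical bookkeeping invented for this illustration; a language built around composition would enforce the same rule at compile time.

```python
class Node:
    """A tree node that may be owned by at most one parent at a time."""

    def __init__(self, name):
        self.name = name
        self.has_owner = False
        self.children = []

    def add(self, child):
        # Composition: refuse to take an object that someone else owns.
        if child.has_owner:
            raise ValueError(f"{child.name} already has an owner")
        child.has_owner = True
        self.children.append(child)
        return child

    def remove(self, child):
        # Releasing the child makes it available to a new owner.
        self.children.remove(child)
        child.has_owner = False
        return child


car_a, car_b = Node("car A"), Node("car B")
wheel = car_a.add(Node("wheel"))

try:
    car_b.add(wheel)               # the wheel is still mounted on car A
    second_owner_rejected = False
except ValueError:
    second_owner_rejected = True

car_b.add(car_a.remove(wheel))     # moving the wheel is fine
```

Because every object has at most one owner, the structure stays a tree, and removing a node determines the fate of its whole subtree.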
Examples of association: paragraphs of a document refer to styles; some records of a table refer to records of other tables (suppose we have an advanced relational database in which such relationships are encoded with a special data type rather than with foreign keys); a control on a GUI form is linked through a data chain to another control in some reactive application. All these relationships are implemented with non-owning pointers. They do not prevent the object from being deleted, but the deletion must be handled in some way to ensure memory safety, and the language must reject any attempt to use such a reference without first checking whether the object has been lost.
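A minimal sketch of a checked association, assuming CPython and using the standard `weakref` module as a stand-in for a non-owning pointer. The `Style` and `Paragraph` classes and the `style_name` method are hypothetical. Unlike the language described here, Python does not force the check; the sketch simply performs it by convention.

```python
import gc
import weakref


class Style:
    def __init__(self, name):
        self.name = name


class Paragraph:
    def __init__(self, text, style):
        self.text = text
        self._style = weakref.ref(style)   # association: does not own the style

    def style_name(self, default="default"):
        style = self._style()   # a temporary strong reference, or None
        if style is None:       # the target was deleted: react explicitly
            return default
        return style.name


styles = {"heading": Style("heading")}      # the dictionary owns the style
paragraph = Paragraph("Intro", styles["heading"])

name_before = paragraph.style_name()
del styles["heading"]                       # the owner drops the style
gc.collect()
name_after = paragraph.style_name()
```

The paragraph never keeps the style alive, and every access goes through a check that has a defined outcome when the target is gone.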
Aggregation is a dangerous thing. All industry experience shows that only immutable objects can safely be aggregated. For example, strings are immutable in Java, so many objects can refer to the same string. If an object has many owners from different hierarchies, changing it will lead to the saddest consequences in the most unexpected places. Therefore, the programming language should prohibit aggregation of mutable objects. Is it possible to exclude aggregation entirely, as Google's coding style guide recommends? That would be too radical. For example, the popular flyweight design pattern is built on aggregation. In addition, aggregation helps a lot in a multithreaded environment, where immutable objects can travel safely between threads.
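A minimal sketch of aggregating an immutable object, using a frozen dataclass from the Python standard library. The `Style` class and the label dictionaries are hypothetical examples; the point is only that a shared object cannot be changed under its other owners, and that a "change" produces a new object instead.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Style:
    font: str
    size: int
    weight: int


normal = Style("times", 40, 600)

# Two owners share the same immutable instance (flyweight-style).
labels = [
    {"text": "Hello", "style": normal},
    {"text": "World", "style": normal},
]

try:
    normal.weight = 800            # in-place mutation of a shared object
    mutation_rejected = False
except FrozenInstanceError:
    mutation_rejected = True

# Build a modified copy and re-point one owner at it.
bold = replace(normal, weight=800)
labels[1]["style"] = bold
```

The first label is unaffected by the change made for the second one, which is exactly the guarantee that makes sharing safe.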
Interestingly, a hierarchy of immutable objects connected by aggregating references cannot contain cycles. Each immutable object begins its life as mutable, because the object needs to be filled with data and only then “frozen”, making it immutable. Mutable objects, as we have already seen, form a tree and cannot contain cycles, and a frozen object can no longer be modified to point at anything created later. Therefore, in correctly organized object hierarchies, cycles can arise only through non-owning associative references. Owning composition references always form a tree. Aggregating references form a directed acyclic graph (DAG). And none of these structures needs a garbage collector.
It turns out that the main problem with existing reference models is that they allow you to create data structures that contradict the best practices of our industry and, as a side effect, create a bunch of memory safety problems and memory leaks. Languages using these models then heroically deal with the consequences of their architecture without addressing the root causes.
If you design a programming language that will:
- support UML references at the declarative level,
- use these declarations to automatically generate all operations on objects (copying, destruction, transfer between threads, etc.),
- enforce, at compile time, the rules for using these references (single owner, immutability, checking for object loss, etc.),
…then such a language will ensure memory safety, the absence of memory leaks and the absence of garbage collector overhead, and will significantly simplify the programmer’s life. And since objects will be deleted at predictable moments, resource management can be tied to object lifetimes.
Implementation
In the experimental programming language Argentum, the idea of UML references is implemented as follows:
- A class field marked with “&” is a non-owning reference (association).
- A field marked with “*” is a shared reference to an immutable object (aggregation).
- All other reference fields are compositions (such a field is the only owner of a mutable object).
Example:
class Scene {
    // The `elements` field owns an `Array` object, which owns multiple `SceneItem`s
    // This is composition
    elements = Array(SceneItem);
    // The `focused` field refers to an arbitrary `SceneItem`, without ownership
    // This is association
    focused = &SceneItem;
}
interface SceneItem { ... }
class Style { ... }
class Label {
    +SceneItem;  // Inheritance
    text = "";   // Composition: the string belongs to the label
    // The `style` field refers to an immutable `Style` instance
    // Its immutability lets it be shared among everyone who refers to it
    // This is aggregation
    style = *Style;
}
Continuation of the example, creating objects:
// Create an object of class Scene and store it in a variable
// `root` is a composite reference (with a single owner)
root = Scene;
// Create an object of class Style; fill it by calling initialization methods;
// freeze it, turning it into an immutable object with the *-operator
// `normal` is an aggregating reference
// (other references may refer to the same Style instance)
normal = *Style.font(times).size(40).weight(600);
// Create an object of class Label, initialize its fields and insert it into the scene
root.elements.add(
    Label.at(10, 20).setText("Hello").setStyle(normal));
// Set up the reference from the scene to the `Label`
root.focused := &root.elements[0];
The data structure we have built provides several important integrity guarantees:
root.elements.add(root.elements[0]);
// compilation error: a `Label` object can have only one owner
normal.weight := 800;
// compilation error: `normal` is an immutable object
root.focused.hide();
// compilation error: no check for, or reaction to, the dangling `focused` reference
But Argentum does not merely watch over the programmer (and slap their wrists). It also helps. Let’s try to fix the compilation errors:
root.elements.add(@root.elements[0]);
// @ is the deep-copy operator.
// This line adds a _copy_ of the Label to the scene;
// the copy refers to a _copy_ of the text but shares the same Style object.
normal := *(@normal).weight(800);
// Make a mutable copy of the Style object,
// change its weight,
// freeze the copy (make it immutable)
// and let `normal` refer to it.
root.focused? _.hide();
// Protect the object behind the associative reference from deletion,
// and if it is not empty, call its method,
// then release the protection.
All operations of copying, freezing, unfreezing, deleting, transferring between threads, etc. are performed automatically. The compiler constructs these operations using composition-aggregation-association declarations in object fields.
For example, if you write:
newScene := @root;
…then a full copy of the scene will be made, with its internal references properly set up.
Note that:
- all subobjects that must have a single owner are copied in cascade,
- objects marked as shared (Style) are not copied,
- in the copy of the scene, the focused field correctly refers to the copy of the label.
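The same three properties can be imitated with Python's standard `copy.deepcopy`, which is only an analogy and not how Argentum implements its `@` operator. In this sketch the classes mirror the article's example, `focused` is an ordinary Python reference for simplicity, and the shared `Style` opts out of copying through a `__deepcopy__` hook; all of this is an assumption made for illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Style:
    font: str
    size: int

    def __deepcopy__(self, memo):
        return self                # immutable and shared: never duplicated


@dataclass
class Label:
    text: str
    style: Style


@dataclass
class Scene:
    elements: list = field(default_factory=list)
    focused: Optional[Label] = None


normal = Style("times", 40)
root = Scene()
root.elements.append(Label("Hello", normal))
root.focused = root.elements[0]

# deepcopy keeps a memo of already copied objects, so the internal
# reference `focused` is redirected to the copied label.
new_scene = copy.deepcopy(root)
```

After the copy, `new_scene` has its own label, its `focused` field points into the copy rather than into the original scene, and both scenes share one `Style` instance.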
Automation of key operations on object hierarchies in Argentum:
- provides memory safety,
- ensures no memory leaks (and makes the garbage collector unnecessary),
- guarantees the timely deletion of objects, which lets you automatically manage resources other than RAM through RAII: automatically close files, sockets, handles, etc.,
- guarantees that the logical structure of the object model is never corrupted,
- relieves the programmer of the routine implementation of these operations.
Interim conclusions
The Argentum language is built on a reference model that is new to programming languages yet already well known from UML, and that is free from the limitations and shortcomings of garbage collection, reference counting, and manual memory management. To date, the language already has: parameterized classes and interfaces; multithreading; control structures based on optional data types; fast type casting and very fast interface method calls; modularity and FFI. It provides memory safety and type safety, and is free of memory leaks, data races, and deadlocks. It uses LLVM for code generation and builds standalone executables.
Argentum is an experimental language. There is a lot to be done in it: numerous code generation optimizations, bugfixes, test coverage, stack unwinding, better debugger support, syntactic sugar, etc., but it is developing quickly and the listed improvements are a matter of the near future.
Argentum project home page with demo and tutorials: aglang.org.
The next article will cover the semantics of operations on UML pointers and their implementation in Argentum.