Old codebases as classic literature

Fragments of surviving Greek papyri, source

The familiar principle of being well-read holds for language and literature: the more good books a person reads, the larger their vocabulary and the wider their horizons. Conceptual thinking develops, and literacy improves on its own, without any textbooks.

Do we apply this principle in teaching programming?

▍ Linux 0.01

Of course, studying classic programs in the original is useful for general development. For example, the guts of Linux 0.01 show what competent systems programming looks like and also make the workings of the modern Linux kernel clearer. We see where it all started: a modest 8,670 lines of code (not counting blank lines), just 66 system calls, tight optimization for the Intel 386 architecture, support for PC/AT devices only, and so on.

Reading this code (the sources are in the file linux-0.01.tar.gz) is a pleasure comparable to reading Shakespeare. Some sections are surprisingly laconic. Take the five lines that handle a kernel panic:

volatile void panic(const char * s)
{
	printk("Kernel panic: %s\n\r",s);
	for(;;);
}

An error message – and the system hangs, nothing more.

Or take the fragment that initializes all the subsystems in init/main.c, a file that still exists in the Linux kernel and still initializes it. All this is very interesting and lets you touch history, as if you were leafing through medieval chronicles with your own hands. Then again, ancient chronicles were kept as scrolls, so “leafing through” is not quite the right phrase: scrolls are scrolled, hence the modern term “scrolling”.

In any case, these are valuable historical artifacts and, at the same time, pleasant and sometimes funny reading. Take this touching comment from the author in the kernel code:

 * For those with more memory than 8 Mb - tough luck. I've
 * not got it, why should you :-) The source is here. Change
 * it. (Seriously - it shouldn't be too difficult. ...

A young Linus Torvalds (37 years old) gives a lecture about Git in 2007 (still without glasses)

▍ Open source is good for everyone

Open, clear, and well-documented source code is very important for posterity. First, it matters for quality code support. Later it will matter to history and to the fans who want to run the program decades later but cannot do so without the sources. For example, last year the source code of the classic game Wipeout by Psygnosis leaked online, including the original PSX version and the Windows port.

As a result, within a year fans rewrote the game from scratch, including the renderer, physics, memory management, sound effects, and everything else. They have released modern versions for Windows, Linux, macOS, and WASM/WebGL (the latter runs in the browser).

That is, thanks to the leak and the forced opening of the sources, the legendary racing simulator got a second life. The new versions of Wipeout are even better than the original.

▍ Ephemeral work

If we compare old codebases with classical literature, it is worth remembering that only 1% of ancient literature has survived to this day.

Existing software is likely to suffer a similar fate. In a few centuries, 99.9% of it will have disappeared not only from active use but even from archives. In this sense, one can only sympathize with software architects: unlike real architects, their work is not material, has no particular importance, and will soon disappear from reality without a trace.

It is interesting to watch historians and archaeologists try to reconstruct, character by character, the ancient scrolls that were poorly preserved after the eruption of Vesuvius in 79 AD.

Fragments of Greek papyri have been preserved. On the left – detection of ink traces using machine learning methods, on the right – the real state of affairs according to IR imaging, source

Likewise, modern digital archaeologists are trying to compile the old Linux 0.01 sources with modern compilers. It turned out that GCC is not that backward compatible … Still, the kernel was successfully adapted to a modern toolchain based on the GCC 4.x compiler. The prepared images work fine with modern programs such as bash-3.2, coreutils-6.9, dietlibc-0.31 (instead of glibc), bin86-0.16.17, make-3.81, ncurses-2.0.7, and vim-7.1.

There is an opinion that code is not really literature but rather a cipher, understood only by the “initiated” after special training and immersion in the context. We do not read it, we decode it.

As already mentioned, reading improves thinking, develops a person's character, and helps them write better texts, letters, and poems. Unfortunately, in programming things work a little differently. Reading alone is not enough here.

There is some sense in this. No one disputes that open, accessible code with detailed documentation and comments is good for everyone. But you can hardly learn programming fully from such code alone. Yes, you can pick up some unusual techniques and see something funny, but that is not enough to learn from. Besides, codebases age. What can really be learned from the assembly code of the Apollo Guidance Computer? None of it is used in practice anymore, like much else from old programs. Today these are just historical artifacts, useful only for general development. Moreover, there are not that many examples of really good code, and you will not learn from bad (ordinary) code.

▍ Code as literature

On the other hand, Donald Knuth promoted the concept of “literate programming” (LP). It is a programming and documentation methodology in which a program consists of natural language prose interspersed with macro substitutions and code in programming languages.

Knuth formulated the concept in 1981 while developing the TeX computer typesetting system. Accordingly, he published his programs as books: for example, TeX and METAFONT (in PDF format). These are real books, literature that is pleasant to read even on paper. True, they have no pictures, but they were prepared personally by the author.

The LP methodology is regularly used in science to prepare reproducible research and ensure open access to data. Literate programming tools are used by millions of developers today, especially in data science. For example, the supplementary materials for a modern data science article are mostly text with code snippets and explanations, plus illustrations and graphs (the results of running that code).

Everything as bequeathed by the teacher (Donald Knuth).

What distinguishes the literate programming paradigm from traditional development is that the program is written in the order dictated by the developer's logic and flow of thought, not in the order of execution or the order imposed by the compiler. That is, human thought comes first, not the structure of machine code. This requires special tooling (macros) to hide abstractions and the traditional source code. As a result, the program reads more like an essay.

In the latest generation, LP tools are completely independent of the programming language.
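The idea of writing fragments in the order of thought and letting a tool reassemble them for the compiler can be illustrated with a toy "tangle" step. The sketch below is only an illustration under stated assumptions: the `struct chunk` layout, the `tangle` function, and the `<<name>>` reference syntax (borrowed loosely from noweb-style tools) are hypothetical and are not taken from Knuth's WEB or any real LP tool. A real tool would also have to tell chunk references apart from C's `<<` shift operator, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One named fragment of a literate program (hypothetical layout). */
struct chunk {
    const char *name;
    const char *body;   /* may reference other chunks as <<name>> */
};

/* Look up a chunk whose name equals the first `len` bytes of `name`. */
static const struct chunk *find_chunk(const struct chunk *chunks, size_t n,
                                      const char *name, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (strlen(chunks[i].name) == len &&
            strncmp(chunks[i].name, name, len) == 0)
            return &chunks[i];
    return NULL;
}

/* Append `len` bytes to `out`, keeping it NUL-terminated; -1 if it does not fit. */
static int append(char *out, size_t cap, size_t *used,
                  const char *src, size_t len)
{
    if (*used + len + 1 > cap)
        return -1;
    memcpy(out + *used, src, len);
    *used += len;
    out[*used] = '\0';
    return 0;
}

/* Copy `body` to `out`, replacing each <<name>> with that chunk's expansion.
 * The depth limit is a crude guard against chunks that reference each other. */
static int expand(const struct chunk *chunks, size_t n, const char *body,
                  char *out, size_t cap, size_t *used, int depth)
{
    if (depth > 16)
        return -1;
    while (*body) {
        const char *open = strstr(body, "<<");
        const char *close = open ? strstr(open + 2, ">>") : NULL;
        if (!close)     /* no further references: copy the rest verbatim */
            return append(out, cap, used, body, strlen(body));
        if (append(out, cap, used, body, (size_t)(open - body)) != 0)
            return -1;
        const struct chunk *c =
            find_chunk(chunks, n, open + 2, (size_t)(close - open - 2));
        if (!c || expand(chunks, n, c->body, out, cap, used, depth + 1) != 0)
            return -1;
        body = close + 2;
    }
    return 0;
}

/* Expand the chunk named `root` into `out`. Returns 0 on success, -1 on an
 * unknown chunk, overly deep nesting, or an output buffer that is too small. */
int tangle(const struct chunk *chunks, size_t n, const char *root,
           char *out, size_t cap)
{
    size_t used = 0;
    assert(chunks && root && out && cap > 0);
    out[0] = '\0';
    const struct chunk *c = find_chunk(chunks, n, root, strlen(root));
    if (!c)
        return -1;
    return expand(chunks, n, c->body, out, cap, &used, 0);
}
```

To use it, the author lists chunks in whatever order the explanation calls for, for instance a "report errors" chunk and a "halt" chunk first and only then a "panic" chunk whose body is `void panic(const char *s)\n{\n<<report errors>><<halt>>}\n`. Calling `tangle(doc, 3, "panic", out, sizeof out)` then produces the function text in the order the compiler needs.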

▍ Examples of code to read

For reading in your free time, we can recommend the following classic “works”:

Many of the programs mentioned above were written by a single author in his own style, so they really do read like fiction. Some were maintained for decades by a single maintainer who made 99% of all changes, as Bram Moolenaar did.
