BCPL: the language that C grew out of

While preparing for Go interviews, I noticed that one of its creators was Ken Thompson. I vaguely remembered that he had also stood near the origins of the C language, but without details. It actually went something like this: Martin Richards wrote BCPL, Ken Thompson reworked it into B, discarding the “junk” and improving the syntax, and Dennis Ritchie added a variety of types to produce the C language more or less as we know it.

So I decided to look into BCPL: how similar it is to C and how it differs. I want to share a brief overview in the form of a comparison with C. You can “feel” it yourself if you wish 🙂

A little background

“High-level” programming languages began with FORTRAN in 1957, soon followed by LISP and Algol by 1960 (BASIC, which appeared in 1964, closely resembled FORTRAN). Lisp still lives on in various dialects. Unlike the others, it was interpreted (and allowed self-modifying code, etc.), but its parenthesized syntax always surprises at first. Algol, on the other hand, is very easy to confuse with Pascal: it is where the “block” structure of code first appeared, with all those begin-end pairs (which C-like languages replaced with curly braces). In fact, Algol became the “grandfather” of all languages with “Pascal-like” and “C-like” syntax.

In 1963, CPL (Combined Programming Language) was devised, in which, among other things, begin-end was replaced with shorter character sequences. This language had an unusual fate, however: it was designed to be quite complex, aimed at scientific computing among other uses, and as it happened a full-fledged compiler only appeared in 1970. The much simpler BCPL, created on the basis of CPL, was ready earlier (1967)! Moreover, B (1969), the aforementioned simplification of BCPL, was ready too, and its “typed” version called C appeared in 1972. As a result, CPL itself saw little use and soon practically disappeared.

It seems to me there is a lesson here: design simpler, build faster. Long projects can arrive too late.

First look

get "LIBHDR"                // include the header file

let start() be $(           // start instead of main, and the brackets are a bit different
    writes("Hi, People!")   // writes = Write String
$)

The “hello world” program looks so much like ordinary C that there is little to add. Semicolons at the end of a line are optional and are only needed to separate statements on the same line. The brackets around the function body could have been omitted in this example: let start() be writes("...") works too. The let keyword is used to declare both variables and functions.

Let’s look at a program that reads two numbers and adds them:

get "LIBHDR"

let start() be $(
    let a, b = 0, 0
    a := readn()
    b := readn()
    writen(a + b)
$)

As mentioned, let declares variables. Initial values are mandatory, which is generally convenient (in this example the zeros are not really needed, since both variables are overwritten right away by readn()). Obviously readn() reads a number and writen(...) prints a number to the console. Note the assignment operator := (Pascal/Algol style) as opposed to the equals sign used for initialization, which is reminiscent of the situation in Go…

But the main thing is that variables have no type! This is not because the language has dynamic types or duck typing. There is simply only one type, albeit in several guises:

  • it is an integer the size of a machine word (say, 32 bits)

  • or that number is a pointer: a memory address, for example the start of an array

  • an array can hold text data; strings are usually packed

  • and of course a pointer can point to a function

Thus a single type serves for all of this, much like our int.

It is also worth noting that several values can be assigned in one statement, for example a, b := readn(), readn(). This is not the same as in Python, though: the assignments are simply performed one after another, so you cannot swap two variables this way.

OBCPL Compiler

A brief aside in case you already want to try it: several BCPL implementations for modern machines can be found online. One of the most effective is OBCPL, which exists in several slightly different versions, for example https://github.com/SergeGris/BCPL-compiler. That is where I took it from; it compiles easily under Linux / FreeBSD.

I didn’t find any online options, so I added the same compiler to the CodeAbbey site (my site with programming problems, now open source). You do need to log in to reach the “launcher”, but registration is simple and requires neither a phone number nor a valid email.

Control structures

Looking at the assortment of conditionals and loops, it becomes clear why Ken Thompson decided to simplify things a little. Here are the conditional statements, three of them instead of one:

  • if A then B – a conditional statement without an alternative

  • test A then B else C – a conditional statement with an alternative; it uses a different keyword (test instead of if)

  • unless A then C – an “if-not” conditional statement

Note that parentheses around the condition expressions are not needed. The word then can almost always be omitted. Why was unless needed? The thing is that the language has AND/OR/NOT, but they are bitwise only! Later, unless migrated to Perl, for example.

Of course, B and C can be blocks of several statements enclosed in $( ... $) brackets, as in the languages we are used to.

And here are the cycles. There are many of them:

  • while E do C and also until E do C – “while” and “while-not” loops with a precondition

  • C repeat – an infinite loop; oddly enough, the keyword comes after the body

  • C repeatwhile E and of course C repeatuntil E – loops with a postcondition (E still needs no parentheses)

  • for N = E1 to E2 by K do C – a “for” loop with a local variable N; the “by K” part (the loop step) can be omitted, in which case the step defaults to 1

As with conditionals, the loop body C can be wrapped in “dollar” brackets if it contains several statements; the word do is optional.

Loops also support break and loop; the latter is the analogue of continue.

The familiar return is used to leave a function early, but if the function returns a value, it becomes resultis. The function definition itself must then use an equals sign and the keyword valof instead of be. A factorial example looks like this:

let fact(n) = valof $(
    resultis n = 0 -> 1, fact(n-1) * n
$)

The ternary operator is used here. Its form differs slightly from the one in C: A -> B, C, where B or C is evaluated depending on whether A is zero. In general, valof and resultis form a construct that can also be used as a standalone block rather than only as a function body, which can be quite handy!

Arrays and Pointers

get "LIBHDR"

let start() be $(
    let a = vec 9
    for i = 0 to 9 a!i := readn()
    for i = 9 to 0 by -1 $(
        writen(a!i)
        wrch('*N')
    $)
$)

Here we declare an array with the keyword vec. Apparently it reserves memory on the stack, among the other local variables, for the required number of cells. It is a little confusing that you specify the “last index” rather than the total size (i.e. 9 instead of 10 for ten cells).

An array element is addressed with an exclamation mark instead of square brackets (I wouldn’t be surprised if many keyboards back then had neither curly nor square brackets). That is, a!i is the i-th element of array a. We haven’t seen the function wrch(...) yet, but it obviously prints a character (an analogue of putc). Special characters are marked with an asterisk instead of the usual backslash; in this case it is a newline.

Pointers, as in C, are closely related to arrays. The @ symbol gets the address of a variable, and ! does the opposite, “dereferencing” a pointer, for example:

let a, b = 0, 0
a := 15
b := @a          // b now points to a
!b := 51         // store a new number at the address held in b
writen(a)        // of course it ended up in a

Thus the exclamation mark comes in two varieties. If it is “unary”, placed before a variable name, it is a “dereference”. If it sits between two expressions, as with an array element, it is really “syntactic sugar”:

a!i          take an element from the array by index, like a[i] in C
!(a + i)     add the index to the pointer and dereference it, like *(a + i)

These two lines are identical, and the same holds in C. It follows that the operands can be swapped: i!a. That also works in C, at least if the warnings about it are turned off.

A rudimentary form of structures is built on the same mechanism. You can declare constants and use them as offsets when dereferencing:

let person.age, person.iq = 0, 1
let a = vec 10
a!person.age := 25
a!person.iq := 113

Note that the dot is simply a valid character in an identifier; it has nothing to do with creating a structure. Unfortunately, with this approach you still have to watch carefully that the fields of one structure in the array do not overlap the fields of the next. The author explains this technique in great detail in his book.

Dynamic arrays are not part of the basic language standard but, according to the author, are present in almost every implementation. For example, OBCPL provides the functions getvec and freevec for them (analogues of malloc and free).

As mentioned in passing above, pointers to functions are used easily and naturally; unlike in C, you don’t have to wrestle with the type declaration:

    let blaha = 0
    blaha := writes
    blaha("muha")

Conclusion

There are many things we haven’t looked at yet: declaring constants, global and static variables, working with strings (which are stored in an array the “Pascal” way, with a length counter at the beginning), and the various functions for working with streams (files), which essentially switch the current input and output. But it seems to me that this is enough for a first acquaintance 🙂

The language turned out to be quite successful in the sense that it allowed writing fairly low-level things while providing good portability for high-level programs. In particular, as I mentioned earlier, the first MUD was written in it; the source can be found on GitHub (and you can play it on british-legends.com).

We can see how many ideas migrated to C (and beyond) with minimal changes. At the same time, it is impressive how many “opportunities for improvement” the author left for the future developers of B and C.

The idea of “one type” was inconvenient in the days of 8- and 16-bit home computers, but now, for example when programming for ARM, it seems relevant again.

An interesting and important feature of the language is that it was compiled in three stages (often by three separate programs, as in OBCPL):

  • first, the text is split into tokens, from which a syntax tree representing the program is built

  • the syntax tree is turned into portable O-code, a kind of assembly language with a relatively small number of operations

  • and only then is the O-code actually compiled into native code

This approach was later widely used in other languages as well: it turned out that to port the language to another platform it is enough to rewrite only a third of the compiler (the translation of O-code).

By the way, the author describes this in detail in a separate part of his book (“BCPL: The Language and Its Compiler”), which may therefore still be relevant today for those who enjoy inventing new languages.
