Demon Knight, Intel Xeon Phi in 2024, part 0

The image is NOT generated by a neural network

You definitely know about Intel processors. You’ve probably heard something about Intel graphics cards. But almost no one knows about this device. Three hundred watts, two hundred and fifty cores (virtual ones, counting hardware threads), an (almost) x86 architecture. This monster from 2014 is called Intel Knights Corner, or Intel Xeon Phi, and it is a coprocessor card for a PCI-E slot.

The lazy hunt for this device on flea-market sites lasted several years. After the boom of Chinese “motherboards” for the Xeon E5 v3, we wanted something even weirder. Ding. A cashless payment later, the device is waiting at the pickup point. It looks stern. It smells of burning (it did not die in the Forbidden Forest, did it?). But the seller assured me that everything works. Curiosity won over greed. Let’s trust the seller; stranger things have happened.

The only system requirements are a 400-watt power supply and the Above 4G Decoding option in the BIOS (the setting that ReBAR also depends on). I plugged it in and forgot about it… no, I didn’t: it is impossible to forget about this turbine vacuum cleaner. It is clearly designed for server use, so off to the balcony it goes. It runs hot even at idle.

It shows 61 real cores

Unfortunately, it did not work over Thunderbolt in an external graphics card enclosure; keep this in mind before buying. Listings do appear on flea markets from time to time, so seize the moment. By the way, judging by its appearance (on the Internet the card is blue, while mine is gray), I got an engineering sample. The manufacturer strictly prohibited mining on this device. Good thing it is already 2024 and the temptation is small.

Without retelling long advertising articles: the Intel Xeon Phi is a mini-computer of its own, with more than 200 virtual computing cores and 8 or 16 gigabytes of RAM. This mini-computer runs GNU/Linux and presents itself to the host machine as a 40 Gbit/s network adapter. That is enough for specially compiled programs to upload instructions and data into the device’s memory and run them there in massively parallel fashion on all those cores. And everything was supposed to be fast. Very fast.
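
To make "compiled in a special way" a little more concrete, here is a minimal sketch of what an offloaded loop might look like. It assumes Intel's offload pragma (`#pragma offload target(mic)`) together with OpenMP, and it is not taken from any program actually run on this card. The function name `sum_of_squares` is made up for illustration. Both pragmas are guarded by macros, so an ordinary compiler simply builds a serial host version of the same loop.

```c
/* Sum of squares 0^2 + 1^2 + ... + (n-1)^2.
 * Hypothetical example: the name and the workload are illustrative only. */
long long sum_of_squares(int n)
{
    long long sum = 0;
    int i;

    /* Assumption: with Intel's compiler and offload enabled, this pragma asks
     * for the next statement to run on the coprocessor; the scalars it uses
     * (n, sum) are expected to be copied to the card and back by default. */
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic)
#endif
    /* With OpenMP enabled, the iterations are split across the available
     * hardware threads and the partial sums are combined at the end. */
#ifdef _OPENMP
#pragma omp parallel for reduction(+:sum)
#endif
    for (i = 0; i < n; i++) {
        sum += (long long)i * i;
    }
    return sum;
}
```

Calling `sum_of_squares(4)` adds 0 + 1 + 4 + 9. Without the Intel toolchain the result is the same, only computed on the host in a single thread.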

This scheme requires Intel’s proprietary compiler. It is not easy to find, because support for the device has ended. web.archive.org came to the rescue: three hours of (very slow) downloading, and I have the distribution with the drivers. A little more time, and an old version of Intel Parallel Studio XE 2017 is there as well. The host machine runs Windows, sorry.

Working

We will skip the boring part. The device really works: the network adapter is up and it answers pings. SSH access works. A bit of netsh network magic, and the internet, the LAN and even NFS mounts work too. Everything needed for a manual launch. That use case was also provided for: uploading and running programs directly over ssh.

Unfortunately, Visual Studio 2017 (with which the authors of the Intel Parallel Studio distribution promised compatibility) refused to see the Intel compiler, so for now I have managed to compile only the simplest program, and only from the console. But it really works and prints the cherished line “Hello, World!” in the Xeon Phi console.

Hello world!
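
For readers who want to repeat the experiment, a "simplest program" of this kind could look roughly like the sketch below. It is not the exact source used above. It assumes that Intel's compiler defines the `__MIC__` macro when building for the coprocessor (on Linux hosts a native build is usually requested with the `-mmic` option; the Windows spelling may differ). The helper names `build_target` and `format_greeting` are hypothetical.

```c
#include <stdio.h>

/* Reports which target this translation unit was built for.
 * Assumption: __MIC__ is predefined only when compiling for the Xeon Phi. */
const char *build_target(void)
{
#ifdef __MIC__
    return "Xeon Phi (MIC)";
#else
    return "host";
#endif
}

/* Writes the greeting into buf and returns the number of characters
 * snprintf would have written (excluding the terminating zero). */
int format_greeting(char *buf, size_t size)
{
    return snprintf(buf, size, "Hello, World! (built for: %s)", build_target());
}
```

A `main` that calls `format_greeting` and prints the buffer with `puts` is all that remains. The resulting binary is then copied to the card and started over ssh, as described above.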

The device has no persistent storage for user data, so every start is effectively from scratch. At boot, a “firmware” image of the user file system is loaded onto the device, built from the files and folders listed (by hand, yes) in text files on the host machine. Well, thanks for that at least.

Unfortunately (or fortunately), I am a simple soul and know very little about the C ecosystem. The naïve hope that smart compilers from 2017 would do everything by themselves did not come true. The wish to build llama.cpp, a popular runner for language models, for this miracle of technology remained unfulfilled; no one has written about doing it (me included). It seems that inference of budget neural networks is where this device could find its otherworldly niche.

Windows is partly “to blame” for the failure: many of the successful guides on the Internet assume a host machine running CentOS or Debian with a specially built gcc compiler. That was the next step in my plan to feed the old knight in the corner (Knights Corner is the device’s codename), but just as I decided to experiment further, the New Year’s break ended and I had to return to 2024, with its Kotlin, Angular and other boring time-to-salary converters.

And if someone does know how to compile llama.cpp for the mic architecture, write in the comments. Maybe there will be enough material for the next part. As already said, with the demon knight you never know.

And most importantly: take care of yourself and your loved ones.
