192 cores per processor – AMD releases the EPYC Turin server CPUs

Lisa Su has been at the wheel of AMD for a decade, and in that time she has not only pulled the company out of crisis but turned it into a genuine market leader. Under her leadership, AMD went from outsider to serious player in the server processor market, where Intel now finds itself not so much playing catch-up as, given its current state, fighting for survival with all its might.

And so, the event long awaited by everyone who follows the server hardware market has finally arrived: on October 10, AMD presented its new EPYC 9005 “Turin” series of server processors. Want 384 threads per CPU? Here you go, an EPYC with 192 cores, though with a caveat inherited from the EPYC 9004 “Bergamo” line: these are energy-efficient Zen5c cores with less cache, all for the sake of multithreading and cramming as many cores as possible into a single package. 128-core variants with full-fledged Zen5 cores and up to half a gigabyte of L3 cache were also announced.

However, there is much more to discuss than cores and cache, and we will go through the rest of the innovations in more detail below.

It will be hot

The new processors show an impressive jump over their predecessors – up to 17% higher IPC in cloud and enterprise workloads and up to 37% in AI workloads – at a fairly reasonable price: from $527 for the 8-core model to $14,800 for the 192-core monster, albeit one built from the smaller cores.

However, an increase in core count and transistor density inevitably drives up TDP. Despite the move to 3nm and 4nm process nodes, the processor package area has stayed the same. The result is a non-trivial task: how do you efficiently remove 0.5 kW of heat through a relatively small contact area?

IPC growth in the new generation of AMD EPYC Turin.

This level of heat output may mark the server segment's transition to the era of liquid cooling, and we may soon see top-end EPYC models shipping with integrated liquid cooling systems. As for classic air cooling in 1U servers, it seems engineers will have to work hard to squeeze a sufficiently capable heatsink in there.

A server with an EPYC 9965 being prepared for shipment – as you can see, liquid cooling is used here.

But let’s look at it from the other side. The EPYC 9005 series offers server solutions with exceptional performance. For example, dual-socket servers with AMD EPYC 9965 processors provide 1.7 times more performance per system watt than Intel Xeon 8592+ processors when running the SPECpower test.

Moreover, replacing 100 old dual-socket Intel Xeon 8280 servers with just 14 new AMD EPYC 9655 servers can provide comparable performance while using up to 86% fewer servers and consuming 69% less power. Achieving the same level of performance would take 35 Intel Xeon 8592+ based servers. As for 6th-generation Xeon, though, AMD provided no comparison data in its brochures, which is surprising.
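Since consolidation percentages like these are easy to misread, here is a minimal sketch of the arithmetic behind them. Only the 100-to-14 server count comes from the claim above; the per-server wattages are purely illustrative assumptions, not figures from AMD's materials.

```python
# Rough sanity check of the consolidation claim above.
# The per-server wattages are illustrative assumptions, NOT AMD's figures.

old_servers = 100   # dual-socket Xeon 8280 machines being replaced
new_servers = 14    # dual-socket EPYC 9655 machines (AMD's claim)

server_reduction = 1 - new_servers / old_servers
print(f"Server count reduction: {server_reduction:.0%}")   # -> 86%

old_watts_per_server = 600    # assumed average wall power of an old box
new_watts_per_server = 1300   # assumed: far fewer boxes, but hotter CPUs

old_total_kw = old_servers * old_watts_per_server / 1000
new_total_kw = new_servers * new_watts_per_server / 1000
print(f"Estimated power reduction: {1 - new_total_kw / old_total_kw:.0%}")
# -> about 70% with these example wattages
```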

So yes, the processors run hot, but they let you significantly cut the number of servers and the overall power consumption of a data center while delivering the same or even more compute. How exactly server OEMs will solve the problem of cooling all this power, we will find out soon.

A slide describing the capabilities of Turin on the Zen5 and Zen5c architectures.

Zen5 is pure power

The new Zen5 architecture is not just another step but a real leap forward. AMD managed not only to cram more transistors onto the die but also to seriously optimize how they work. The result? Frequencies up to 5 GHz, which for a server processor sounds rather dubious: all that heat has to be dissipated somewhere, and all the cores are unlikely to run at such a frequency at the same time. Single-core performance also matters less in the server segment than multithreading, although it certainly helps workloads like databases.

Presentation of the Turin Classic and Turin Dense chiplet layout.

Zen5C – density and efficiency

But AMD decided that this was not enough, and so Zen5c appeared – the younger sibling of the big Zen5, but with serious ambitions. Less cache, but more cores packed in instead, and now we have 192 cores in one processor. Of course, they are not as powerful as full-fledged Zen5 cores, but in heavily multi-threaded workloads quantity simply prevails over quality.

Each Zen5c CCD contains 16 cores, each with 1 MB of L2 cache, and a total of 32 MB of L3 cache per die. To build processors with more than 128 cores, up to 12 such dies can be connected to the I/O die, resulting in up to 192 cores per processor for ultra-dense, high-performance systems.
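To make the layout concrete, here is a short sketch that adds up cores and cache for the two packages. The Zen5c figures come straight from the description above; the split of the 128-core classic part into 16 dies of 8 Zen5 cores each is an assumption, chosen to be consistent with the 128 cores and roughly half a gigabyte of L3 mentioned earlier.

```python
# Per-package totals for the two Turin layouts. Zen5c ("Dense") numbers are
# from the text above; the Turin Classic split (16 CCDs x 8 Zen5 cores,
# 32 MB L3 per CCD) is an assumption consistent with the quoted 128 cores
# and ~0.5 GB of L3.

def package_totals(ccds, cores_per_ccd, l2_mb_per_core, l3_mb_per_ccd):
    cores = ccds * cores_per_ccd
    return {
        "cores": cores,
        "threads": cores * 2,                 # SMT, two threads per core
        "L2_MB": cores * l2_mb_per_core,
        "L3_MB": ccds * l3_mb_per_ccd,
    }

print("Turin Dense  :", package_totals(12, 16, 1, 32))  # 192 cores, 384 MB L3
print("Turin Classic:", package_totals(16, 8, 1, 32))   # 128 cores, 512 MB L3
```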

Appearance of EPYC 9965 processors with 192 physical cores and 384 threads.

Memory and buses are the most important

And what about RAM? Here the new EPYCs are also strong, though nothing fantastic: 12 channels of DDR5 at up to 6400 MT/s. The EPYC 9005 supports up to 6 TB of DDR5-6000 memory, which gives a maximum theoretical throughput of 576 GB/s per socket (a quick check of that figure is sketched below). That will matter most for applications sensitive to memory bandwidth, such as in-memory databases.
And speaking of bandwidth, let's also note the 160 PCIe 5.0 lanes, so the interconnect is definitely not a bottleneck when attaching the latest server GPUs.
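For readers who want to see where the 576 GB/s figure comes from, here is a quick check of the peak theoretical rate (decimal gigabytes, one 64-bit data path per channel, overheads ignored):

```python
# Peak theoretical memory bandwidth per socket for 12 channels of DDR5-6000.
channels = 12
transfers_per_second = 6000e6   # DDR5-6000 -> 6000 MT/s per channel
bytes_per_transfer = 8          # 64-bit channel data width

bandwidth_gb_s = channels * transfers_per_second * bytes_per_transfer / 1e9
print(f"{bandwidth_gb_s:.0f} GB/s per socket")   # -> 576 GB/s
```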

Server system based on EPYC 9005.

EPYC vs Xeon

The new EPYC's direct competitors are Intel's Xeon 6700E and 6900P processors, released a little earlier. But once again AMD gives its blue rival little chance of winning: more cores, higher frequencies, more modern memory – the EPYC 9005 outclasses the Xeon 6 generation across the board. This is especially noticeable in AI-related tasks, where EPYC shows itself to be a real performance monster.

AMD estimates that dual-socket servers with 192-core EPYC 9965s demonstrate 2.68x higher throughput compared to 64-core Intel Xeon 8592+ when running SPECrate2017_int_base.

Advantage in business workloads

When it comes to real-world business applications, the EPYC 9005 also delivers impressive results. A dual-socket server based on 192-core AMD EPYC 9965 processors achieves 2.2 times more critical-jOPS than one based on 64-core Intel Xeon 8592+ in the SPECjbb2015 Multi-JVM benchmark.

For MySQL workloads based on the TPC-C benchmark, dual-socket 192-core AMD EPYC 9965 servers deliver up to 2.9 times more transactions per second compared to 64-core Intel Xeon 8592+.

AI – trying to jump on a moving train?

It is worth noting separately how AMD is positioning the new processors for artificial intelligence tasks. EPYC 9005 does not just support AI computing, it becomes a real foundation for creating powerful AI systems. The ability to connect a bunch of specialized accelerators via PCI-E 5.0, combined with a huge number of cores and fast memory, makes these processors an ideal choice for creating an infrastructure for the most demanding AI applications.

The EPYC 9005 provides up to ~2.7x higher throughput for AI inference tasks such as XGBoost on the Higgs boson dataset compared to the Intel Xeon 8592+. This makes them an excellent choice for a wide range of AI tasks, from image classification to natural language processing.

Why anyone needs this, when inference on a GPU or an NPU/TPU is orders of magnitude faster than on even the most multi-core CPU, traditionally remains a mystery.

Optimization for GPU systems

However, the new processors really do shine as a complement to powerful GPUs in AI-related tasks. AMD has optimized several EPYC 9005 models for use as host processors in GPU systems. For example, using two high-frequency AMD EPYC 9575F chips as hosts for 8 GPU accelerators yields roughly 15% faster training time on Llama 3.1-8B compared to two Intel Xeon 8592+.

ASRock Rack's portfolio of platforms based on the EPYC 9005.

The red road to innovation

AMD EPYC processors split the CPU core blocks and the I/O functions into separate chiplets that can be designed on their own schedules and manufactured on processes suited to their roles. From generation to generation, the CPU dies have shrunk as photolithography has advanced. Today the 'Zen 5' cores are built on a 4nm process, the 'Zen 5c' cores on a 3nm process, while the I/O die remains on the 6nm process carried over from the previous generation.

Implementation of interprocessor communication in a two-socket system based on EPYC Turin.

This approach is more flexible and dynamic than trying to build every processor function on a single manufacturing technology. With a modular approach, CPU and I/O chiplets can be mixed and matched to create specialized processors that precisely match workload requirements, from high-performance 192-core parts down to parts for systems that need as few as eight cores.

So, what's the bottom line?

AMD has once again proved that it can not merely compete with Intel but set the tone in the server processor market. The trend is the same as in previous generations, only the scale is larger: more transistors, more cores, more cache, and more heat.

Well, now let's wait for the new EPYCs to start appearing in data centers around the world; before long they will make their way to our servers here at ServerFlow as well. The main thing is not to forget to upgrade the air conditioning in the server room – at 0.5 kilowatts per processor, it is all too easy to end up with a sauna instead of a server room.

What remains a mystery is how Intel will respond to this, and whether its next Xeons will once again be outclassed by EPYC across the board before they even launch. And will Intel even survive the release of the new EPYCs? Everyone is welcome to discuss this in the comments!
