
Dardel Fastest in Sweden

Gert Svensson, PDC

Dardel is now the fastest high-performance computing (HPC) system in Sweden and fifth on the worldwide Green500 list of the most energy-efficient supercomputers (www.top500.org/lists/green500/2022/11)! In recent months, Dardel has undergone a number of expansions and upgrades. The most significant is the installation of 56 graphics processing unit (GPU) nodes, which are now in operation and undergoing acceptance tests.

As previously announced, Dardel was planned to have 56 GPU nodes, each with four AMD Instinct™ MI250X GPUs, and all of those nodes have now been installed. This significantly increased the speed of the system: the GPU nodes achieved 8.2 petaflops on the High-Performance Linpack (HPL) benchmark, more than tripling Dardel's HPL performance.
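To put the 8.2 petaflops in perspective, the short Python sketch here divides the HPL result evenly across the 56 GPU nodes and their 224 GPUs. It is only an average: HPL is run on the partition as a whole, so the per-node and per-GPU figures are illustrative rather than measured, and the function name `hpl_breakdown` is our own.

```python
# Back-of-the-envelope breakdown of the GPU partition's HPL result.
# Inputs are the figures quoted in the article; the per-node and per-GPU
# values are simple averages, not measured numbers.

GPU_NODES = 56        # GPU nodes installed in Dardel
GPUS_PER_NODE = 4     # AMD Instinct MI250X GPUs per node
HPL_PETAFLOPS = 8.2   # HPL result reported for the GPU partition


def hpl_breakdown(nodes: int, gpus_per_node: int, hpl_pflops: float) -> dict:
    """Return the GPU count and the average HPL rate per node and per GPU (TFLOPS)."""
    total_gpus = nodes * gpus_per_node
    hpl_tflops = hpl_pflops * 1000.0  # 1 petaflop/s = 1000 teraflop/s
    return {
        "total_gpus": total_gpus,
        "tflops_per_node": hpl_tflops / nodes,
        "tflops_per_gpu": hpl_tflops / total_gpus,
    }


if __name__ == "__main__":
    result = hpl_breakdown(GPU_NODES, GPUS_PER_NODE, HPL_PETAFLOPS)
    print(f"GPUs in partition:    {result['total_gpus']}")
    print(f"Average HPL per node: {result['tflops_per_node']:.1f} TFLOPS")
    print(f"Average HPL per GPU:  {result['tflops_per_gpu']:.1f} TFLOPS")
```

Running it reports 224 GPUs, with averages of roughly 146.4 TFLOPS per node and 36.6 TFLOPS per GPU.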

One of Dardel’s GPU boards

Dardel is now in 68th place on the latest TOP500 list, which was released in November (see www.top500.org/lists/top500/2022/11). Note that the GPU partition of Dardel is listed as “Dardel GPU” in both the TOP500 and Green500 lists to distinguish it from the earlier CPU-only phase of the system, which was previously listed simply as “Dardel”. The CPU partition of the system is now listed as “Dardel CPU” and is in 345th place on the TOP500 list. As mentioned earlier, “Dardel GPU” is in fifth place on the Green500 list, and, interestingly, positions two to seven on that list are all held by HPE Cray EX systems with AMD Instinct™ MI250X GPUs. This strongly indicates that this type of architecture is highly energy efficient, as measured in floating-point operations per second per watt.
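The Green500 ranking divides a system's sustained HPL rate by the average power it draws during the run, and reports the result as GFlops per watt (equivalently, gigaflops per joule). A minimal sketch of that ratio follows; it assumes the HPL result is given in petaflop/s and the average power in kilowatts, and the function name and choice of units are ours, not part of the Green500 methodology.

```python
def energy_efficiency_gflops_per_watt(rmax_pflops: float, power_kw: float) -> float:
    """Green500-style efficiency: sustained HPL rate divided by average power draw.

    rmax_pflops: HPL result in petaflop/s (assumed unit)
    power_kw:    average power during the HPL run, in kilowatts (assumed unit)
    Returns gigaflop/s per watt, i.e. gigaflops per joule.
    """
    if power_kw <= 0:
        raise ValueError("power must be positive")
    gflops = rmax_pflops * 1e6  # 1 petaflop/s = 1e6 gigaflop/s
    watts = power_kw * 1e3      # 1 kW = 1e3 W
    return gflops / watts
```

Because the metric is a rate divided by a power, a faster system only ranks higher on the Green500 if its power draw grows more slowly than its HPL performance.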

When the previous issue of the PDC newsletter was published, most researchers had been migrated from PDC’s previous systems to Dardel, although the Scania partition of Dardel was not yet fully operational. Thanks to the upgrades and installations, 240 of the Dardel CPU nodes are now serving Scania’s research and development. Recently, 468 nodes were added to the system for SNIC academic researchers, including eight two-terabyte “Giant” nodes. In addition, twelve nodes dedicated to research by the Department of Astronomy at Stockholm University have been installed.

This table shows the number and types of nodes now in the Dardel system.
                     Number of CPU nodes
Node type   Memory   SNIC initial  Industry/business  SU Astronomy  SNIC extra  Total  GPU nodes
Thin        256 GB   488           36                 0             212         736    0
Large       512 GB   20            236                12            248         516    56
Huge        1 TB     8             0                  0             0           8      0
Giant       2 TB     2             0                  0             8           10     0
TOTAL       -        518           272                12            468         1270   56
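As a quick consistency check, the table can be held as a small Python data structure and its row and column totals recomputed. They match the Total column and TOTAL row: 736, 516, 8 and 10 nodes by node type; 518, 272, 12 and 468 by allocation; 1270 CPU nodes and 56 GPU nodes overall. The variable and function names are illustrative only.

```python
# CPU node counts from the table, keyed by node type (illustrative names).
# Column order: SNIC initial, Industry/business, SU Astronomy, SNIC extra.
CPU_NODES = {
    "Thin (256 GB)":  [488, 36, 0, 212],
    "Large (512 GB)": [20, 236, 12, 248],
    "Huge (1 TB)":    [8, 0, 0, 0],
    "Giant (2 TB)":   [2, 0, 0, 8],
}
GPU_NODES_BY_TYPE = {"Thin (256 GB)": 0, "Large (512 GB)": 56, "Huge (1 TB)": 0, "Giant (2 TB)": 0}
ALLOCATIONS = ["SNIC initial", "Industry/business", "SU Astronomy", "SNIC extra"]


def row_totals(table: dict) -> dict:
    """Total number of CPU nodes for each node type (the table's Total column)."""
    return {node_type: sum(counts) for node_type, counts in table.items()}


def column_totals(table: dict) -> dict:
    """Total number of CPU nodes for each allocation (the table's TOTAL row)."""
    # zip(*rows) regroups the per-type lists into per-allocation columns.
    return dict(zip(ALLOCATIONS, (sum(col) for col in zip(*table.values()))))


if __name__ == "__main__":
    for node_type, total in row_totals(CPU_NODES).items():
        print(f"{node_type:15} {total:5d} CPU nodes")
    cols = column_totals(CPU_NODES)
    for name, total in cols.items():
        print(f"{name:18} {total:5d}")
    print("All CPU nodes:", sum(cols.values()))
    print("All GPU nodes:", sum(GPU_NODES_BY_TYPE.values()))
```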

The entire software stack of Dardel is being upgraded to a more recent release called Strawberry; at the time of writing, this release was being tested on an “island” of the Dardel system. In addition, the access mechanism for the Lustre file storage system has been changed from Remote Direct Memory Access (RDMA) to the Transmission Control Protocol/Internet Protocol (TCP/IP) so that it will be compatible with future software versions. This may temporarily reduce disk speed until the disk system software is also updated, after which the speed should be similar to or better than before.

PDC has been working with Hewlett Packard Enterprise (HPE) to make all these significant changes to Dardel while minimising the effects on system operation. Most of the work has been carried out without affecting the researchers using the system.

There are still some major hardware and software upgrades ahead of us. The interconnect in Dardel will soon be upgraded to the next generation of the Slingshot network, with a speed of 200 gigabits per second. All the Slingshot cards and some of the cables will have to be replaced, which will require a week of system downtime.

Other upcoming upgrades include expanding the Lustre disk system by 50%, in terms of both storage capacity and speed, and upgrading the disk system software to a new version. As mentioned earlier, that software upgrade should optimize TCP/IP access so that access speeds are similar to those previously achieved with RDMA. The Dardel system software will also be upgraded to several new releases that will provide a lot of new functionality, especially for supporting the GPUs.