The 58th edition of the TOP500 saw little change in the Top10. The Microsoft Azure system called Voyager-EUS2 was the only machine to shake up the top spots, claiming No. 10. Based on a 48-core AMD EPYC processor running at 2.45 GHz, paired with an NVIDIA A100 GPU with 80 GB of memory, Voyager-EUS2 also uses Mellanox HDR InfiniBand for data transfer.
While there were no other changes to the positions of the systems in the Top10, Perlmutter at NERSC improved its performance to 70.9 Pflop/s. Housed at the Lawrence Berkeley National Laboratory, Perlmutter’s increased performance couldn’t move it from its previously held No. 5 spot.
Fugaku continues to hold the No. 1 position that it first earned in June 2020. Its HPL benchmark score is 442 Pflop/s, which exceeds the performance of Summit at No. 2 by about 3x. Installed at the Riken Center for Computational Science (R-CCS) in Kobe, Japan, it was co-developed by Riken and Fujitsu and is based on Fujitsu’s custom ARM A64FX processor. Fugaku also uses Fujitsu’s Tofu D interconnect to transfer data between nodes.
In single or further-reduced precision, which are often used in machine learning and AI applications, Fugaku has a peak performance above 1,000 PFlop/s (1 Exaflop/s). As a result, Fugaku is often introduced as the first “Exascale” supercomputer.
While there were also reports about several Chinese systems reaching Exaflop level performance, none of these systems submitted an HPL result to the TOP500.
Here’s a summary of the systems in the Top10:
Fugaku remains the No. 1 system. It has 7,630,848 cores, which allowed it to achieve an HPL benchmark score of 442 Pflop/s. This puts it about 3x ahead of the No. 2 system in the list.
Summit, an IBM-built system at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA, remains the fastest system in the U.S. and holds the No. 2 spot worldwide. It has a performance of 148.8 Pflop/s on the HPL benchmark, which is used to rank the TOP500 list. Summit has 4,356 nodes, each housing two Power9 CPUs with 22 cores each and six NVIDIA Tesla V100 GPUs, each with 80 streaming multiprocessors (SMs). The nodes are linked together with a Mellanox dual-rail EDR InfiniBand network.
Sierra, a system at the Lawrence Livermore National Laboratory, CA, USA, is at No. 3. Its architecture is very similar to that of the No. 2 system, Summit. It is built with 4,320 nodes, each with two Power9 CPUs and four NVIDIA Tesla V100 GPUs. Sierra achieved 94.6 Pflop/s.
Sunway TaihuLight, a system developed by China’s National Research Center of Parallel Computer Engineering & Technology (NRCPC) and installed at the National Supercomputing Center in Wuxi, in China’s Jiangsu province, is listed at the No. 4 position with 93 Pflop/s.
Perlmutter at No. 5 was newly listed in the TOP10 last June. It is based on the HPE Cray “Shasta” platform and is a heterogeneous system with AMD EPYC based nodes and 1,536 NVIDIA A100 accelerated nodes. Perlmutter improved its performance to 70.9 Pflop/s.
Selene, now at No. 6, is an NVIDIA DGX A100 SuperPOD installed in-house at NVIDIA in the USA. The system is based on an AMD EPYC processor with NVIDIA A100 for acceleration and a Mellanox HDR InfiniBand as a network. It achieved 63.4 Pflop/s.
Tianhe-2A (Milky Way-2A), a system developed by China’s National University of Defense Technology (NUDT) and deployed at the National Supercomputer Center in Guangzhou, China, is now listed as the No. 7 system with 61.4 Pflop/s.
A system called “JUWELS Booster Module” is No. 8. The BullSequana system built by Atos is installed at the Forschungszentrum Juelich (FZJ) in Germany. Like the Selene system, it uses an AMD EPYC processor with NVIDIA A100 for acceleration and a Mellanox HDR InfiniBand as a network. This system is the most powerful system in Europe, with 44.1 Pflop/s.
HPC5 at No. 9 is a PowerEdge system built by Dell and installed by the Italian company Eni S.p.A. It achieves a performance of 35.5 Pflop/s using NVIDIA Tesla V100 accelerators and a Mellanox HDR InfiniBand network.
Voyager-EUS2, a Microsoft Azure system installed at Microsoft in the U.S., is the only new system in the TOP10. It achieved 30.05 Pflop/s and is listed at No. 10. The architecture is based on a 48-core AMD EPYC processor running at 2.45 GHz, working together with an NVIDIA A100 GPU with 80 GB of memory, and uses Mellanox HDR InfiniBand for data transfer.
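To put the gaps between these systems in perspective, the HPL scores quoted in the summary can be compared directly. The short Python sketch below is only an illustration: the scores are copied from the text above, while the dictionary and function names are hypothetical and not part of any TOP500 tooling.

```python
# HPL (Rmax) scores in Pflop/s, copied from the Top10 summary in this article.
RMAX_PFLOPS = {
    "Fugaku": 442.0,
    "Summit": 148.8,
    "Sierra": 94.6,
    "Sunway TaihuLight": 93.0,
    "Perlmutter": 70.9,
    "Selene": 63.4,
    "Tianhe-2A": 61.4,
    "JUWELS Booster Module": 44.1,
    "HPC5": 35.5,
    "Voyager-EUS2": 30.05,
}


def ratio_to_leader(scores):
    """Return, for each system, how many times faster the top score is."""
    leader = max(scores.values())
    return {name: leader / value for name, value in scores.items()}


if __name__ == "__main__":
    ratios = ratio_to_leader(RMAX_PFLOPS)
    for name, score in RMAX_PFLOPS.items():
        print(f"{name:<24}{score:>8.2f} Pflop/s   leader is {ratios[name]:.2f}x")
```

For Summit the ratio comes out just under 3, which is the "3x" lead cited for Fugaku.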
Rmax and Rpeak values are in TFlops. For more details about other fields, check the TOP500 description.
Rpeak values are calculated using the advertised clock rate of the CPU. For the efficiency of the systems you should take into account the Turbo CPU clock rate where it applies.
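As a rough illustration of how Rpeak and efficiency relate, the sketch below computes a theoretical peak as cores x clock rate x floating-point operations per cycle, then divides Rmax by it. The core count and Rmax for Fugaku come from this article. The 2.2 GHz clock and the 32 double-precision operations per cycle per core are assumptions that do not appear in the article, and the function names are hypothetical.

```python
def rpeak_tflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in TFlop/s: cores x clock (GHz) x flops per cycle per core."""
    gflops = cores * clock_ghz * flops_per_cycle
    return gflops / 1000.0  # GFlop/s -> TFlop/s


def hpl_efficiency(rmax_tflops, rpeak_tflops_value):
    """Fraction of the theoretical peak that the HPL run actually achieved."""
    return rmax_tflops / rpeak_tflops_value


# From the article: Fugaku core count and HPL score (442 Pflop/s = 442,000 TFlop/s).
FUGAKU_CORES = 7_630_848
FUGAKU_RMAX_TFLOPS = 442_000.0

# ASSUMPTIONS, not stated in the article: advertised clock and flops per cycle.
ASSUMED_CLOCK_GHZ = 2.2
ASSUMED_FLOPS_PER_CYCLE = 32

if __name__ == "__main__":
    peak = rpeak_tflops(FUGAKU_CORES, ASSUMED_CLOCK_GHZ, ASSUMED_FLOPS_PER_CYCLE)
    eff = hpl_efficiency(FUGAKU_RMAX_TFLOPS, peak)
    print(f"Rpeak ~ {peak:,.0f} TFlop/s, HPL efficiency ~ {eff:.1%}")
```

Because Rpeak scales linearly with the clock rate, substituting a higher Turbo clock raises Rpeak and lowers the computed efficiency, which is why the note above asks readers to take the Turbo rate into account.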