Fugaku Holds Top Spot, Exascale Remains Elusive

FRANKFURT, Germany; BERKELEY, Calif.; and KNOXVILLE, Tenn. - The 57th edition of the TOP500 saw little change in the Top10. The only new entry in the Top10 is the Perlmutter system at NERSC, located at the DOE's Lawrence Berkeley National Laboratory. The machine is based on the HPE Cray "Shasta" platform and is a heterogeneous system with both GPU-accelerated and CPU-only nodes. Perlmutter achieved 64.6 Pflop/s, putting the supercomputer at No. 5 on the new list.

The Japanese supercomputer Fugaku held onto the top spot on the list. Codeveloped by RIKEN and Fujitsu, Fugaku has an HPL benchmark score of 442 Pflop/s, roughly three times that of the No. 2 system, Summit. The machine is based on Fujitsu's custom ARM A64FX processor. What's more, in single or further reduced precision, which is often used in machine learning and AI, Fugaku's peak performance is actually above an exaflop. This has led some to describe the machine as the first "Exascale" supercomputer. Fugaku has already demonstrated this new level of performance on the new HPL-AI benchmark, with 2 Eflop/s.

Beyond the Top10, several Microsoft Azure and Amazon EC2 cloud instances placed fairly high on the list. Pioneer-EUS at No. 24 and Pioneer-WUS2 at No. 27 run on Azure, while the Amazon EC2 Instance Cluster at No. 41 runs on Amazon EC2.

Here is a summary of the systems in the Top10:

Fugaku remains the No. 1 system. With 7,630,848 cores, it achieved an HPL benchmark score of 442 Pflop/s, putting it roughly 3x ahead of the No. 2 system on the list.
Summit, an IBM-built system at Oak Ridge National Laboratory (ORNL) in Tennessee, USA, remains the fastest system in the U.S. and holds the No. 2 spot worldwide with a performance of 148.8 Pflop/s on the HPL benchmark, which is used to rank the TOP500 list. Summit has 4,356 nodes, each housing two 22-core Power9 CPUs and six NVIDIA Tesla V100 GPUs, each GPU with 80 streaming multiprocessors (SMs). The nodes are linked together with a Mellanox dual-rail EDR InfiniBand network.
Sierra, a system at Lawrence Livermore National Laboratory in California, USA, is at No. 3. Its architecture is very similar to that of the No. 2 system, Summit. It is built with 4,320 nodes, each with two Power9 CPUs and four NVIDIA Tesla V100 GPUs. Sierra achieved 94.6 Pflop/s.
Sunway TaihuLight, a system developed by China's National Research Center of Parallel Computer Engineering & Technology (NRCPC) and installed at the National Supercomputing Center in Wuxi, which is in China's Jiangsu province, is listed at the No. 4 position with 93 Pflop/s.
Perlmutter at No. 5 is new in the Top10. It is based on the HPE Cray "Shasta" platform and is a heterogeneous system with AMD EPYC-based nodes and 1,536 NVIDIA A100-accelerated nodes. Perlmutter achieved 64.6 Pflop/s.
Selene, now at No. 6, is an NVIDIA DGX A100 SuperPOD installed in-house at NVIDIA in the USA. The system is based on AMD EPYC processors, with NVIDIA A100 GPUs for acceleration and Mellanox HDR InfiniBand as its network, and achieved 63.4 Pflop/s.
Tianhe-2A (Milky Way-2A), a system developed by China's National University of Defense Technology (NUDT) and deployed at the National Supercomputer Center in Guangzhou, China, is now listed as the No. 7 system with 61.4 Pflop/s.
A system called "JUWELS Booster Module" is at No. 8. The BullSequana system built by Atos is installed at Forschungszentrum Juelich (FZJ) in Germany. Like Selene, it uses AMD EPYC processors with NVIDIA A100 GPUs for acceleration and Mellanox HDR InfiniBand as its network. It is the most powerful system in Europe, with 44.1 Pflop/s.
HPC5 at No. 9 is a PowerEdge system built by Dell and installed by the Italian company Eni S.p.A. It achieves 35.5 Pflop/s using NVIDIA Tesla V100 GPUs as accelerators and Mellanox HDR InfiniBand as its network.
Frontera, a Dell C6420 system installed at the Texas Advanced Computing Center of the University of Texas, is now listed at No. 10. It achieved 23.5 Pflop/s using 448,448 of its Intel Xeon cores.
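The "3x" figure for Fugaku is easy to check from the Rmax values quoted above. The short Python sketch below divides Fugaku's score by every other Top10 score and also sums Nos. 2 through 10; the dictionary and helper names are illustrative only, not part of any TOP500 tool.

```python
# HPL Rmax values in Pflop/s for the June 2021 Top10, as quoted in the summary above.
# TOP10_RMAX and ratio_to_leader are hypothetical names used only for this illustration.
TOP10_RMAX = {
    "Fugaku": 442.0,
    "Summit": 148.8,
    "Sierra": 94.6,
    "Sunway TaihuLight": 93.0,
    "Perlmutter": 64.6,
    "Selene": 63.4,
    "Tianhe-2A": 61.4,
    "JUWELS Booster Module": 44.1,
    "HPC5": 35.5,
    "Frontera": 23.5,
}


def ratio_to_leader(rmax: dict[str, float]) -> dict[str, float]:
    """Return how many times larger the No. 1 score is than each system's score."""
    leader = max(rmax.values())
    return {name: leader / value for name, value in rmax.items()}


if __name__ == "__main__":
    for name, r in ratio_to_leader(TOP10_RMAX).items():
        print(f"{name:<22} {TOP10_RMAX[name]:6.1f} Pflop/s  No. 1 is {r:5.2f}x this")
    # Combined score of Nos. 2-10, for comparison with Fugaku alone
    rest = sum(TOP10_RMAX.values()) - TOP10_RMAX["Fugaku"]
    print(f"Nos. 2-10 combined: {rest:.1f} Pflop/s")
```

Summit's ratio comes out at about 2.97, which rounds to the "3x" quoted above. Nos. 2 through 10 together total about 628.9 Pflop/s, so Fugaku alone delivers roughly 70% of their combined score.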

Other TOP500 highlights

Although there wasn't much change in the Top10, this edition of the list still holds some interesting findings. To begin with, there appears to be a marked increase in the use of AMD processors. Perlmutter, for instance, uses the AMD EPYC 7763, and Selene, at No. 6, uses the AMD EPYC 7742. Selene, an NVIDIA DGX A100 SuperPOD installed at NVIDIA in the USA, was pushed down to its current ranking by the arrival of Perlmutter.

Another point of interest is that this list includes fewer systems in China than might be expected. Chinese machines accounted for 186 supercomputers on the TOP500 list, down significantly from 212 on the 56th edition. There is no definitive explanation for this drop yet, but it is worth noting.

Even so, as last year, China still has more systems on the list (186) than any other country. The USA had 123 systems, up from 113 on the previous list. Despite having fewer machines, the USA easily outstripped China in performance: USA systems have an aggregate performance of 856.8 Pflop/s, compared with 445.3 Pflop/s for China's machines.
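To make the USA-versus-China comparison concrete, the sketch below derives per-system averages and list-to-list changes from the counts and aggregates quoted above. A per-system average is a crude measure (a handful of very large machines can dominate it), and the CountryStats class is a hypothetical helper written only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class CountryStats:
    """Illustrative container for one country's TOP500 figures (hypothetical helper)."""
    name: str
    systems: int             # systems on the 57th (June 2021) list
    prev_systems: int        # systems on the 56th list
    aggregate_pflops: float  # summed HPL Rmax in Pflop/s

    def per_system(self) -> float:
        """Average Rmax per system, in Pflop/s."""
        return self.aggregate_pflops / self.systems

    def change_pct(self) -> float:
        """Change in system count relative to the previous list, in percent."""
        return 100.0 * (self.systems - self.prev_systems) / self.prev_systems


usa = CountryStats("USA", 123, 113, 856.8)
china = CountryStats("China", 186, 212, 445.3)

if __name__ == "__main__":
    for c in (usa, china):
        print(f"{c.name}: {c.systems} systems ({c.change_pct():+.1f}%), "
              f"{c.per_system():.2f} Pflop/s per system on average")
    print(f"USA/China aggregate ratio: {usa.aggregate_pflops / china.aggregate_pflops:.2f}")
```

With these inputs the USA averages about 6.97 Pflop/s per system against about 2.39 for China, and its aggregate is roughly 1.92 times China's; China's system count fell by about 12.3% while the USA's rose by about 8.8%.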

There also wasn't much change in the variety of system interconnects. Ethernet is still used in about half of the systems (245), InfiniBand in about a third (169), and Omni-Path in fewer than one in ten (42); only one system relies on Myrinet. Custom interconnects account for 37 systems, and proprietary networks for 6.
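The interconnect counts quoted above add up to exactly 500, so they appear to cover the whole list. A small sketch that turns them into percentage shares (the dictionary and the shares helper are illustrative, built only from the figures in the paragraph):

```python
# System counts per interconnect family, as quoted in the paragraph above.
INTERCONNECTS = {
    "Ethernet": 245,
    "InfiniBand": 169,
    "Omni-Path": 42,
    "Custom": 37,
    "Proprietary": 6,
    "Myrinet": 1,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of all listed systems that use each interconnect family."""
    total = sum(counts.values())
    return {family: 100.0 * n / total for family, n in counts.items()}


if __name__ == "__main__":
    print("total systems:", sum(INTERCONNECTS.values()))
    for family, pct in shares(INTERCONNECTS).items():
        print(f"{family:<12} {INTERCONNECTS[family]:>3}  {pct:5.1f}%")
```

Ethernet comes out at 49.0%, InfiniBand at 33.8% and Omni-Path at 8.4%, matching the "about half", "about a third" and "fewer than one in ten" descriptions.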

Green500 results

The Green500 showed steady progress, but nothing indicated a big step toward newer technologies.

The No. 1 spot on the Green500 went to MN-3 from Preferred Networks in Japan. Knocked from the top of the last list by the NVIDIA DGX SuperPOD in the US, MN-3 has reclaimed its crown. The system relies on the MN-Core chip, an accelerator optimized for matrix arithmetic, as well as a Xeon Platinum 8260M processor. MN-3 achieved a power efficiency of 29.70 gigaflops/watt and is ranked No. 337 on the TOP500.

HiPerGator AI of the University of Florida in the USA is now No. 2 on the Green500, with a power efficiency of 29.52 gigaflops/watt. This NVIDIA machine has 138,880 cores, far more than any other machine in the Green500 Top5. Like many other systems on the list, it uses an AMD processor, specifically the AMD EPYC 7742. With overall performance that outpaces most of its Green500 competitors, it is no surprise that this machine holds the No. 22 spot on the TOP500 list.

The Wilkes-3 system at the University of Cambridge in the U.K. took the No. 3 spot. A Dell EMC machine, it achieved an efficiency of 28.14 gigaflops/watt. Like HiPerGator AI, it relies on an AMD EPYC processor, in this case the AMD EPYC 7763. The system is ranked No. 101 on the TOP500 list.

The MeluXina machine of LuxProvide holds the No. 4 spot on the Green500. It is also unique as the only machine on the list from Luxembourg. Like many other machines, MeluXina uses an AMD EPYC processor. Its efficiency was measured at 26.96 gigaflops/watt, and it is ranked No. 37 on the TOP500.

Rounding out the Green500 Top5 is the NVIDIA DGX SuperPOD. Built by NVIDIA in the USA, this machine also relies on an AMD EPYC processor. Ranked No. 216 on the TOP500, it achieved an efficiency of 26.20 gigaflops/watt.

Outside of the Top5, a notable development is the Perlmutter system. Besides being the only new machine in the TOP500 Top10, it claims the No. 6 spot on the Green500. A power efficiency of 25.55 gigaflops/watt is impressive for a system that delivers 64.6 Pflop/s.
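Power efficiency relates HPL performance to power draw, so Perlmutter's two figures imply an average power level. The sketch below performs that unit conversion under a loudly stated assumption: that the 25.55 gigaflops/watt figure and the 64.6 Pflop/s score come from the same run, which may not match the Green500 measurement rules exactly. The exaflop lines are hypothetical extrapolations, not reported results, and implied_power_mw is an illustrative name.

```python
def implied_power_mw(rmax_pflops: float, gflops_per_watt: float) -> float:
    """Average power in MW implied by an HPL score and an efficiency figure.

    Assumes both numbers describe the same run (an assumption, not Green500 methodology).
    """
    flops = rmax_pflops * 1e15             # Pflop/s -> flop/s
    flops_per_watt = gflops_per_watt * 1e9  # Gflop/s per W -> flop/s per W
    return flops / flops_per_watt / 1e6     # W -> MW


if __name__ == "__main__":
    print(f"Perlmutter: ~{implied_power_mw(64.6, 25.55):.2f} MW")
    # Hypothetical: a 1 Eflop/s (1000 Pflop/s) HPL run at the same efficiencies
    print(f"1 Eflop/s at 25.55 GF/W: ~{implied_power_mw(1000.0, 25.55):.1f} MW")
    print(f"1 Eflop/s at 29.70 GF/W (MN-3's figure): ~{implied_power_mw(1000.0, 29.70):.1f} MW")
```

This gives about 2.53 MW for Perlmutter. If that efficiency could be held at scale, a hypothetical 1 Eflop/s HPL run would need roughly 39.1 MW (about 33.7 MW at MN-3's 29.70 gigaflops/watt), one way to see why an exaflop on HPL is still a stretch.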

HPCG Results

The TOP500 list has incorporated results from the High-Performance Conjugate Gradient (HPCG) benchmark, which provides an alternative metric for assessing supercomputer performance and is meant to complement the HPL measurement.

Fugaku, the clear TOP500 leader, repeated last year's result of 16.0 HPCG-petaflops. Summit likewise held at last year's 2.93 HPCG-petaflops, and the new Perlmutter system scored 1.91 HPCG-petaflops.
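One way to see why HPCG complements HPL is to express each HPCG score as a fraction of the same system's HPL score. The sketch below does this for the three systems quoted in this article; the dictionary and function name are illustrative.

```python
# (HPL Rmax, HPCG) in Pflop/s, as quoted in this article.
RESULTS = {
    "Fugaku": (442.0, 16.0),
    "Summit": (148.8, 2.93),
    "Perlmutter": (64.6, 1.91),
}


def hpcg_fraction(hpl: float, hpcg: float) -> float:
    """HPCG performance as a percentage of the HPL result."""
    return 100.0 * hpcg / hpl


if __name__ == "__main__":
    for name, (hpl, hpcg) in RESULTS.items():
        print(f"{name:<11} HPCG = {hpcg_fraction(hpl, hpcg):.2f}% of HPL")
```

The results are about 3.62% for Fugaku, 1.97% for Summit and 2.96% for Perlmutter: HPCG's sparse, memory-bound workload reaches only a few percent of the HPL figure, and the ordering between systems can differ from the HPL ordering in relative terms.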

HPL-AI Results

The HPL-AI benchmark seeks to highlight the convergence of HPC and artificial intelligence (AI) workloads based on machine learning and deep learning by solving a system of linear equations using novel, mixed-precision algorithms that exploit modern hardware.
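The core idea behind mixed-precision solvers of this kind can be sketched with classic iterative refinement: do the expensive O(n³) LU factorization in low precision, then recover double-precision accuracy with cheap O(n²) residual corrections. The Python/NumPy sketch below is a generic textbook version, not the HPL-AI reference implementation, which defines its own matrix generator and refinement rules. The function names are hypothetical, float32 stands in for the even lower precisions (such as half precision) that accelerators exploit, and the sketch assumes a diagonally dominant matrix so that LU without pivoting is safe.

```python
import numpy as np


def lu_nopivot(a: np.ndarray) -> np.ndarray:
    """Doolittle LU without pivoting; L (unit diagonal) and U are packed into one array.

    Assumes a diagonally dominant matrix so that no pivoting is needed.
    Arithmetic happens in a's dtype (float32 in the solver below).
    """
    lu = a.copy()
    n = lu.shape[0]
    for k in range(n - 1):
        lu[k + 1:, k] /= lu[k, k]                                     # column k of L
        lu[k + 1:, k + 1:] -= np.outer(lu[k + 1:, k], lu[k, k + 1:])  # Schur complement update
    return lu


def lu_solve(lu: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Solve L U x = b by forward then backward substitution, in lu's dtype."""
    n = lu.shape[0]
    y = b.astype(lu.dtype)              # astype returns a copy, so b is untouched
    for i in range(n):                  # forward: L y = b (unit diagonal)
        y[i] -= lu[i, :i] @ y[:i]
    for i in range(n - 1, -1, -1):      # backward: U x = y, overwriting y with x
        y[i] = (y[i] - lu[i, i + 1:] @ y[i + 1:]) / lu[i, i]
    return y


def mixed_precision_solve(a, b, tol=1e-12, max_iter=20):
    """Factor in float32, then refine the solution toward float64 accuracy."""
    lu32 = lu_nopivot(a.astype(np.float32))      # O(n^3) work, low precision
    x = lu_solve(lu32, b).astype(np.float64)     # initial low-accuracy solution
    for _ in range(max_iter):
        r = b - a @ x                            # residual in float64, O(n^2)
        if np.linalg.norm(r, np.inf) <= tol * np.linalg.norm(b, np.inf):
            break
        x += lu_solve(lu32, r).astype(np.float64)  # correction reuses the float32 factors
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    # Off-diagonal entries lie in [-1, 1) and the diagonal is about n: diagonally dominant
    a = rng.uniform(-1.0, 1.0, (n, n)) + n * np.eye(n)
    b = rng.standard_normal(n)
    x = mixed_precision_solve(a, b)
    print("relative residual:", np.linalg.norm(a @ x - b, np.inf) / np.linalg.norm(b, np.inf))
```

The design choice is that the float32 factors are computed once and reused for every correction, so nearly all of the arithmetic runs at the faster, lower precision while the final answer still meets a double-precision residual check.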

As on the TOP500, RIKEN's Fugaku leads the pack here. As noted earlier, some consider this machine the first "Exascale" supercomputer, and it demonstrated this new level of performance on the HPL-AI benchmark with 2 Eflop/s.

About the TOP500 List

The first version of what became today's TOP500 list started as an exercise for a small conference in Germany in June 1993. Out of curiosity, the authors decided to revisit the list in November 1993 to see how things had changed. About that time, they realized they might be onto something and decided to continue compiling the list, which is now a much-anticipated, much-watched and much-debated twice-yearly event.

The TOP500 list is compiled by Erich Strohmaier and Horst Simon of Lawrence Berkeley National Laboratory; Jack Dongarra of the University of Tennessee, Knoxville; and Martin Meuer of ISC Group, Germany.
