Highlights - November 2020

This is the 56th edition of the TOP500.

After a make-over of the Top10 in June, we again see some interesting changes driven by two system upgrades (#1 Fugaku and #5 Selene) and two new systems (#7 JUWELS Booster Module and #10 Dammam-7). The full list, however, recorded the smallest number of new entries since the project started in 1993.

Supercomputer Fugaku, a system based on Fujitsu’s custom ARM A64FX processor, remains No. 1. It is installed at the RIKEN Center for Computational Science (R-CCS) in Kobe, Japan, the location of the former K-Computer. It was co-developed in close partnership by RIKEN and Fujitsu and uses Fujitsu’s Tofu D interconnect to transfer data between nodes. It increased in size by about 5%, which allowed it to improve its HPL benchmark score to 442 Pflop/s, roughly 3x the score of No. 2 Summit.

In half precision (16-bit floating-point arithmetic), which is often used in machine learning and AI applications, its peak performance is actually above 2,000 PFlop/s (= 2 Exaflop/s), and because of this it is often introduced as the first ‘Exascale’ supercomputer. Fugaku already demonstrated this new level of performance on the new HPL-AI benchmark, which was measured in June 2020 at just over 1.4 Exaflops. It has now increased its performance on this benchmark to 2 Exaflops! These are the first measurements above 1 Exaflop for any precision on any type of hardware, and with this Fugaku is ushering in the age of Exaflops! (https://www.r-ccs.riken.jp/en/)

Here is a brief summary of the systems in the Top10:

Fugaku remains the No. 1 system. It grew slightly in size from 7,299,072 cores to 7,630,848 cores, which allowed it to improve its HPL benchmark score from 416 Pflop/s to 442 Pflop/s. This puts it roughly 3x ahead of the No. 2 system in the list.
Summit, an IBM-built system at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA, remains the fastest system in the U.S. and holds the No. 2 spot worldwide with a performance of 148.6 Pflop/s on the HPL benchmark, which is used to rank the TOP500 list. Summit has 4,356 nodes, each one housing two Power9 CPUs with 22 cores each and six NVIDIA Tesla V100 GPUs, each with 80 streaming multiprocessors (SM). The nodes are linked together with a Mellanox dual-rail EDR InfiniBand network.
Sierra, a system at the Lawrence Livermore National Laboratory, CA, USA, is now at No. 3. Its architecture is very similar to that of the #2 system, Summit. It is built with 4,320 nodes, each with two Power9 CPUs and four NVIDIA Tesla V100 GPUs. Sierra achieved 94.6 Pflop/s.
Sunway TaihuLight, a system developed by China’s National Research Center of Parallel Computer Engineering & Technology (NRCPC) and installed at the National Supercomputing Center in Wuxi, in China's Jiangsu province, is listed at the No. 4 position with 93 Pflop/s.
Selene at No. 5 is an NVIDIA DGX A100 SuperPOD installed in-house at NVIDIA in the USA. It was listed as No. 7 in June and has doubled in size, which allowed it to move up the list by two positions. The system is based on AMD EPYC processors with NVIDIA A100 GPUs for acceleration and Mellanox HDR InfiniBand as its network, and it achieved 63.4 Pflop/s after its upgrade.

Rank	Site	System	Cores	Rmax (PFlop/s)	Rpeak (PFlop/s)	Power (kW)
1	RIKEN Center for Computational Science Japan	Supercomputer Fugaku - Supercomputer Fugaku, A64FX 48C 2.2GHz, Tofu interconnect D Fujitsu	7,630,848	442.01	537.21	29,899
2	DOE/SC/Oak Ridge National Laboratory United States	Summit - IBM Power System AC922, IBM POWER9 22C 3.07GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband IBM	2,414,592	148.60	200.79	10,096
3	DOE/NNSA/LLNL United States	Sierra - IBM Power System AC922, IBM POWER9 22C 3.1GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband IBM / NVIDIA / Mellanox	1,572,480	94.64	125.71	7,438
4	National Supercomputing Center in Wuxi China	Sunway TaihuLight - Sunway MPP, Sunway SW26010 260C 1.45GHz, Sunway NRCPC	10,649,600	93.01	125.44	15,371
5	NVIDIA Corporation United States	Selene - NVIDIA DGX A100, AMD EPYC 7742 64C 2.25GHz, NVIDIA A100, Mellanox HDR Infiniband Nvidia	555,520	63.46	79.22	2,646
6	National Super Computer Center in Guangzhou China	Tianhe-2A - TH-IVB-FEP Cluster, Intel Xeon E5-2692v2 12C 2.2GHz, TH Express-2, Matrix-2000 NUDT	4,981,760	61.44	100.68	18,482
7	Forschungszentrum Juelich (FZJ) Germany	JUWELS Booster Module - Bull Sequana XH2000 , AMD EPYC 7402 24C 2.8GHz, NVIDIA A100, Mellanox HDR InfiniBand/ParTec ParaStation ClusterSuite EVIDEN	449,280	44.12	70.98	1,764
8	Eni S.p.A. Italy	HPC5 - PowerEdge C4140, Xeon Gold 6252 24C 2.1GHz, NVIDIA Tesla V100, Mellanox HDR Infiniband DELL	669,760	35.45	51.72	2,252
9	Texas Advanced Computing Center/Univ. of Texas United States	Frontera - Dell C6420, Xeon Platinum 8280 28C 2.7GHz, Mellanox InfiniBand HDR DELL	448,448	23.52	38.75
10	Saudi Aramco Saudi Arabia	Dammam-7 - Cray CS-Storm, Xeon Gold 6248 20C 2.5GHz, NVIDIA Tesla V100 SXM2, InfiniBand HDR 100 HPE	672,520	22.40	55.42

Tianhe-2A (Milky Way-2A), a system developed by China’s National University of Defense Technology (NUDT) and deployed at the National Supercomputer Center in Guangzhou, China, is now listed as the No. 6 system with 61.4 Pflop/s.
A new supercomputer, known as the JUWELS Booster Module, debuts at number seven on the list. The Atos-built BullSequana machine was recently installed at the Forschungszentrum Jülich (FZJ) in Germany. It is part of a modular system architecture, and a second, Xeon-based JUWELS Module is listed separately on the TOP500 at position 44. These modules are integrated by using the ParTec Modulo Cluster Software Suite. The Booster Module uses AMD EPYC processors with NVIDIA A100 GPUs for acceleration, similar to the number five Selene system. Running by itself, the JUWELS Booster Module was able to achieve 44.1 HPL petaflops, which makes it the most powerful system in Europe.
HPC5, a Dell PowerEdge system installed by the Italian company Eni S.p.A., is ranked 8th. It achieves a performance of 35.5 petaflops using Intel Xeon Gold CPUs and NVIDIA Tesla V100 GPUs. It is the most powerful system in the list used for commercial purposes at a customer site.
Frontera, a Dell C6420 system, was installed at the Texas Advanced Computing Center of the University of Texas last year and is now listed at No. 9. It achieved 23.5 Pflop/s using 448,448 of its Intel Xeon cores.
The second new system in the TOP10 is Dammam-7, listed at No. 10. It is installed at Saudi Aramco in Saudi Arabia and is the second commercial installation in the current TOP10. This is fairly unusual, as the TOP10 is usually populated solely with systems installed at government-funded research centers. The HPE Cray CS-Storm system uses NVIDIA Tesla V100 GPUs for acceleration and an InfiniBand network. It was measured at 22.4 Pflop/s on the HPL benchmark, which now marks the entry level to the TOP10.
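
The Fugaku figures quoted in this summary ("about 5%" growth, "3x" ahead of Summit) can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable and function names are made up for this example, the numbers are copied from the Top10 summary and table, and Rmax/Rpeak are assumed to be in Pflop/s, as the prose states.

```python
# Figures copied from the Top10 summary and table; all names are illustrative.
fugaku_cores_june = 7_299_072
fugaku_cores_nov = 7_630_848
fugaku_hpl_june = 416.0     # Pflop/s, June 2020
fugaku_hpl_nov = 442.01     # Pflop/s, November 2020 (Rmax)
fugaku_rpeak = 537.21       # Pflop/s, theoretical peak (Rpeak)
summit_hpl = 148.60         # Pflop/s (Rmax)


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


core_growth = percent_change(fugaku_cores_june, fugaku_cores_nov)
hpl_growth = percent_change(fugaku_hpl_june, fugaku_hpl_nov)
lead_over_summit = fugaku_hpl_nov / summit_hpl
hpl_efficiency = fugaku_hpl_nov / fugaku_rpeak * 100.0  # Rmax as % of Rpeak

print(f"Core count growth:   {core_growth:.2f} %")
print(f"HPL score growth:    {hpl_growth:.2f} %")
print(f"Lead over Summit:    {lead_over_summit:.2f}x")
print(f"HPL efficiency:      {hpl_efficiency:.1f} % of Rpeak")
```

Under these inputs the core count grows by a little under 5% and the lead over Summit comes out slightly below 3x, which is consistent with the rounded statements in the text.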

Highlights from the List

A total of 147 systems on the list are using accelerator/co-processor technology, up from 146 six months ago. None of these use NVIDIA Hopper chips, 6 use NVIDIA Ampere, and none use AMD Instinct.
Intel continues to provide the processors for the largest share (91.80 %) of TOP500 systems, down from 94.20 % six months ago. 21 (4.20 %) of the systems in the current list use AMD processors, up from 2.00 % six months ago.
Supercomputer Fugaku maintains the lead in HPCG performance, followed by the two top DOE systems, Summit and Sierra, in the #2 and #3 spots.
The entry level to the list moved up to the 1.32 Pflop/s mark on the Linpack benchmark.
The last system on the newest list was listed at position 463 in the previous TOP500.
Total combined performance of all 500 exceeded the Exaflop barrier with now 2.43 exaflop/s (Eflop/s) up from 2.21 exaflop/s (Eflop/s) 6 months ago.
The entry point for the TOP100 increased to 3.15 Pflop/s.
The average concurrency level in the TOP500 is 144,932 cores per system up from 142,320 six months ago.
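
The percentages and averages in this list all relate to the fixed list size of 500 systems. As a rough sketch (the helper name and variables are invented for this example, and the inputs are the rounded figures quoted above), the shares can be converted back and forth like this:

```python
LIST_SIZE = 500  # the TOP500 always contains 500 systems


def share_percent(count, total=LIST_SIZE):
    """Share of the list, in percent, represented by `count` systems."""
    return count / total * 100.0


amd_share = share_percent(21)            # AMD-based systems
accelerated_share = share_percent(147)   # systems with accelerators/co-processors

# Going the other way: a percentage share back to a system count.
intel_systems = round(91.80 / 100.0 * LIST_SIZE)

# Averages scale back to totals; both results are approximate because the
# published inputs are rounded.
approx_total_cores = 144_932 * LIST_SIZE
avg_pflops_per_system = 2.43 * 1000.0 / LIST_SIZE  # 1 Eflop/s = 1000 Pflop/s

print(f"AMD share:           {amd_share:.2f} %")
print(f"Accelerated share:   {accelerated_share:.2f} %")
print(f"Intel systems:       {intel_systems}")
print(f"Total cores (approx): {approx_total_cores:,}")
print(f"Average performance: {avg_pflops_per_system:.2f} Pflop/s per system")
```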

General Trends

Installations by countries/regions:

TOP 10 HPC manufacturer:

TOP 10 Interconnect Technologies:

TOP 10 Processor Technologies:

Green500

The data collection and curation of the Green500 project has been integrated with the TOP500 project. This allows submissions of all data through a single webpage at http://top500.org/submit

Rank	TOP500 Rank	System	Cores	Rmax (PFlop/s)	Power (kW)	Energy Efficiency (GFlops/watts)
1	170	NVIDIA DGX SuperPOD - NVIDIA DGX A100, AMD EPYC 7742 64C 2.25GHz, NVIDIA A100, Mellanox HDR Infiniband , Nvidia NVIDIA Corporation United States	19,840	2.36	90	26.195
2	330	MN-3 - MN-Core Server, Xeon Platinum 8260M 24C 2.4GHz, Preferred Networks MN-Core, MN-Core DirectConnect , Preferred Networks Preferred Networks Japan	1,664	1.65	65	26.039
3	7	JUWELS Booster Module - Bull Sequana XH2000 , AMD EPYC 7402 24C 2.8GHz, NVIDIA A100, Mellanox HDR InfiniBand/ParTec ParaStation ClusterSuite , EVIDEN Forschungszentrum Juelich (FZJ) Germany	449,280	44.12	1,764	25.008
4	146	Spartan2 - Bull Sequana XH2000 , AMD EPYC 7402 24C 2.8GHz, NVIDIA A100, Mellanox HDR Infiniband , EVIDEN Atos France	23,040	2.57	106	24.262
5	5	Selene - NVIDIA DGX A100, AMD EPYC 7742 64C 2.25GHz, NVIDIA A100, Mellanox HDR Infiniband , Nvidia NVIDIA Corporation United States	555,520	63.46	2,646	23.983
6	239	A64FX prototype - Fujitsu A64FX, Fujitsu A64FX 48C 2GHz, Tofu interconnect D , Fujitsu Fujitsu Numazu Plant Japan	36,864	2.00	118	16.876
7	29	AiMOS - IBM Power System AC922, IBM POWER9 20C 3.45GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband , IBM Rensselaer Polytechnic Institute Center for Computational Innovations (CCI) United States	130,000	8.34	512	16.285
8	8	HPC5 - PowerEdge C4140, Xeon Gold 6252 24C 2.1GHz, NVIDIA Tesla V100, Mellanox HDR Infiniband , DELL Eni S.p.A. Italy	669,760	35.45	2,252	15.740
9	458	Satori - IBM Power System AC922, IBM POWER9 20C 2.4GHz, Infiniband EDR, NVIDIA Tesla V100 SXM2 , IBM MIT/MGHPCC Holyoke, MA United States	23,040	1.46	94	15.574
10	1	Supercomputer Fugaku - Supercomputer Fugaku, A64FX 48C 2.2GHz, Tofu interconnect D , Fujitsu RIKEN Center for Computational Science Japan	7,630,848	442.01	29,899	15.418

The most energy-efficient system and No. 1 on the Green500 is a new NVIDIA DGX SuperPOD from NVIDIA installed at NVIDIA Corp., United States. It achieved a power efficiency of 26.195 GFlops/Watt during its 2.356 Pflop/s Linpack performance run. It is listed at position 170 in the TOP500.
The previous No. 1 and now No. 2 on the Green500 is an MN-Core Server from Preferred Networks installed at Preferred Networks, Japan. It achieved a power efficiency of 26.04 GFlops/Watt during its 1.652 Pflop/s Linpack performance run. It is listed at position 330 in the TOP500.
In third position is the JUWELS Booster Module by Atos installed at Forschungszentrum Juelich (FZJ) in Germany. It achieves 25 GFlops/Watt energy efficiency. It is on position 7 in the TOP500.
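
The energy-efficiency column is, in essence, HPL performance divided by power draw. The sketch below recomputes it from the rounded Rmax and Power values in the table above; the function name is hypothetical, and Rmax is assumed to be in Pflop/s and power in kW. The Fugaku row is included to show that the listed efficiency does not always follow from these two columns, presumably because it comes from a separately measured run.

```python
def gflops_per_watt(rmax_pflops, power_kw):
    """Energy efficiency in GFlops/Watt from Rmax (Pflop/s) and power (kW)."""
    gflops = rmax_pflops * 1e6   # 1 Pflop/s = 1,000,000 Gflop/s
    watts = power_kw * 1e3       # 1 kW = 1,000 W
    return gflops / watts


# (Rmax in Pflop/s, power in kW, listed efficiency in GFlops/Watt)
rows = {
    "Selene": (63.46, 2646, 23.983),
    "JUWELS Booster Module": (44.12, 1764, 25.008),
    "Supercomputer Fugaku": (442.01, 29899, 15.418),
}

computed = {name: gflops_per_watt(rmax, kw) for name, (rmax, kw, _) in rows.items()}

for name, (_, _, listed) in rows.items():
    print(f"{name}: computed {computed[name]:.3f}, listed {listed:.3f} GFlops/Watt")
```

For Selene and the JUWELS Booster Module the recomputed values agree with the listed ones to within rounding, while for Fugaku the recomputed value is lower than the listed one.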

HPCG Results

The Top500 list now includes the High-Performance Conjugate Gradient (HPCG) Benchmark results.

Rank	TOP500 Rank	System	Cores	Rmax (PFlop/s)	HPCG (TFlop/s)
1	1	Supercomputer Fugaku - Supercomputer Fugaku, A64FX 48C 2.2GHz, Tofu interconnect D , RIKEN Center for Computational Science Japan	7,630,848	442.01	16004.50
2	2	Summit - IBM Power System AC922, IBM POWER9 22C 3.07GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband , DOE/SC/Oak Ridge National Laboratory United States	2,414,592	148.60	2925.75
3	3	Sierra - IBM Power System AC922, IBM POWER9 22C 3.1GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband , DOE/NNSA/LLNL United States	1,572,480	94.64	1795.67
4	5	Selene - NVIDIA DGX A100, AMD EPYC 7742 64C 2.25GHz, NVIDIA A100, Mellanox HDR Infiniband , NVIDIA Corporation United States	555,520	63.46	1622.51
5	7	JUWELS Booster Module - Bull Sequana XH2000 , AMD EPYC 7402 24C 2.8GHz, NVIDIA A100, Mellanox HDR InfiniBand/ParTec ParaStation ClusterSuite , Forschungszentrum Juelich (FZJ) Germany	449,280	44.12	1275.36
6	10	Dammam-7 - Cray CS-Storm, Xeon Gold 6248 20C 2.5GHz, NVIDIA Tesla V100 SXM2, InfiniBand HDR 100 , Saudi Aramco Saudi Arabia	672,520	22.40	881.40
7	8	HPC5 - PowerEdge C4140, Xeon Gold 6252 24C 2.1GHz, NVIDIA Tesla V100, Mellanox HDR Infiniband , Eni S.p.A. Italy	669,760	35.45	860.32
8	19	TOKI-SORA - PRIMEHPC FX1000, A64FX 48C 2.2GHz, Tofu interconnect D , Japan Aerospace eXploration Agency Japan	276,480	16.59	614.22
9	13	Trinity - Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Intel Xeon Phi 7250 68C 1.4GHz, Aries interconnect , DOE/NNSA/LANL/SNL United States	979,072	20.16	546.12
10	33	Plasma Simulator - SX-Aurora TSUBASA A412-8, Vector Engine Type10AE 8C 1.58GHz, Infiniband HDR 200 , National Institute for Fusion Science (NIFS) Japan	34,560	7.89	529.16

Supercomputer Fugaku is now the leader on the HPCG benchmark with 16 PFlop/s.
The two DOE systems Summit at ORNL and Sierra at LLNL are now at second and third position on the HPCG benchmark. Summit achieved 2.93 HPCG-Pflop/s and Sierra 1.80 HPCG-Pflop/s.
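
Because the HPCG results are given in TFlop/s while the HPL results are quoted in Pflop/s, comparing the two requires a unit conversion. The following sketch (with illustrative names, using the values from the HPCG table above) expresses each system's HPCG result as a percentage of its HPL score:

```python
# (HPL Rmax in Pflop/s, HPCG in TFlop/s), copied from the HPCG table.
systems = {
    "Fugaku": (442.01, 16004.50),
    "Summit": (148.60, 2925.75),
    "Sierra": (94.64, 1795.67),
}


def hpcg_percent_of_hpl(hpl_pflops, hpcg_tflops):
    """HPCG result as a percentage of the HPL result."""
    hpl_tflops = hpl_pflops * 1000.0  # 1 Pflop/s = 1000 TFlop/s
    return hpcg_tflops / hpl_tflops * 100.0


fractions = {name: hpcg_percent_of_hpl(hpl, hpcg) for name, (hpl, hpcg) in systems.items()}

for name, pct in fractions.items():
    print(f"{name}: HPCG is {pct:.2f} % of HPL")
```

With these inputs all three systems reach only a few percent of their HPL score on HPCG.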

About the TOP500 List

The first version of what became today’s TOP500 list started as an exercise for a small conference in Germany in June 1993. Out of curiosity, the authors decided to revisit the list in November 1993 to see how things had changed. About that time they realized they might be onto something and decided to continue compiling the list, which is now a much-anticipated, much-watched and much-debated twice-yearly event.