Nov. 30, 2016
By: Michael Feldman
The next-generation MareNostrum 4 supercomputer at the Barcelona Supercomputing Center (BSC) will have an unusual configuration, to say the least. Instead of selecting a single architecture for the new machine, the powers that be at BSC have decided to build a system that consists of multiple platforms, incorporating a diverse set of technologies from IBM, Lenovo, Fujitsu, Intel and NVIDIA.
According to the BSC press announcement, the 13-petaflop MareNostrum 4 will be split into two parts. The “general purpose” part will be a standard cluster provided by Lenovo, consisting of 48 racks (3,400 nodes) of Intel Xeon-based servers, with an aggregate memory capacity of 390 terabytes. Peak performance is anticipated to be 11 petaflops. That’s 10 times more powerful, flops-wise, than the current MareNostrum 3 supercomputer, an IBM iDataPlex machine installed in 2012 and upgraded in 2013.
Source: Barcelona Supercomputing Center
The second part of the new supercomputer will be made up of three different clusters, each of which will incorporate a different set of “emerging technologies.” One of the clusters will be an IBM system that is based on heterogeneous nodes housing Power9 CPUs and NVIDIA GPUs. BSC indicates this cluster will use the same components that IBM is using for the Summit and Sierra supercomputers that will be deployed by the US Department of Energy (DOE) next year. That would mean this system will get the future Volta GPUs from NVIDIA, as well as EDR InfiniBand from Mellanox. Peak performance is expected to be 1.5 petaflops.
Another cluster will be powered by Xeon Phi silicon, in this case Knights Landing and Knights Hill processors. Curiously, the Knights Landing nodes will be provided by Fujitsu and the Knights Hill nodes will be provided by Lenovo. Again, BSC has taken its cue from the DOE here, which is using this processor set in the Theta (Knights Landing) and Aurora (Knights Hill) supercomputers. This dual-Xeon Phi system will deliver 500 peak teraflops.
The final cluster will employ the vector-enhanced ARMv8 processors that Fujitsu is developing for its Post-K exascale supercomputer. BSC is characterizing this as a “prototype machine,” which suggests a pre-production version of the Post-K platform, the final version of which is not scheduled to be deployed until after 2020. Like the Xeon Phi cluster, this ARM-based system is expected to deliver 500 peak teraflops.
The rationale for deploying these smaller, more exotic clusters in conjunction with the standard Xeon-based Lenovo system is to evaluate the most likely technologies that will power BSC’s next-generation supercomputer after MareNostrum 4. The larger Lenovo cluster, which makes up about 80 percent of the performance of the aggregate system, will be used for most of the production work at BSC.
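For readers who want to check the arithmetic, here is a minimal sketch in Python that adds up the peak figures quoted above and works out each cluster's share. The cluster labels and the `share_of_total` helper are illustrative names, not anything from BSC's announcement. The four quoted figures sum to slightly more than the 13-petaflop headline number, presumably because the headline is rounded.

```python
# Peak performance figures (in petaflops) as quoted in this article.
# The dictionary keys are illustrative labels, not official system names.
peak_pflops = {
    "Lenovo Xeon (general purpose)": 11.0,
    "IBM Power9 + NVIDIA GPU": 1.5,
    "Xeon Phi (Knights Landing / Knights Hill)": 0.5,
    "Fujitsu ARMv8 prototype": 0.5,
}


def share_of_total(figures):
    """Return each entry's fraction of the summed peak performance."""
    total = sum(figures.values())
    return {name: value / total for name, value in figures.items()}


total_pflops = sum(peak_pflops.values())
shares = share_of_total(peak_pflops)

print(f"Aggregate peak: {total_pflops:.1f} petaflops")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

By this tally the Lenovo cluster accounts for a little over 81 percent of the aggregate peak, consistent with the "about 80 percent" figure.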
BSC is receiving €30 million from the Spanish government to purchase MareNostrum 4, as well as upgrade some of the associated power and cooling infrastructure. An additional €4 million has been allocated for the system’s 10-petabyte disk storage system.
BSC has not specified a timeframe for installation of the various clusters, but given their reliance on future processors not expected to be commercially available until the second half of 2017, it’s unlikely that the project will be complete before the end of next year.