Oct. 31, 2018
By: Michael Feldman
The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory has purchased a Cray supercomputer that will be one of the world’s first Shasta-class systems to come online.
The machine, known as Perlmutter (named after the lab’s Nobel Prize-winning astrophysicist Saul Perlmutter), comes with a price tag of $146 million. That includes the cost of the machine plus multiple years of service and support from Cray.
When it comes online in 2020, Perlmutter is expected to deliver more than three times the computational performance currently installed at NERSC. Given that most of the current capacity is provided by the 27.9-petaflop Cori supercomputer, we can assume the new system will top out at close to 100 peak petaflops.
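As a rough check on that estimate, the arithmetic can be sketched in a few lines of Python. This is only a back-of-envelope calculation: the function name is hypothetical, and treating Cori as the sole installed system is a simplifying assumption, since the article does not quantify NERSC’s remaining capacity (which would push the figure higher).

```python
# Back-of-envelope estimate using only the figures quoted in the article.
CORI_PEAK_PFLOPS = 27.9   # Cori's peak performance, per the article
SPEEDUP_FLOOR = 3.0       # "more than three times" the installed performance


def implied_floor_pflops(installed_pflops: float, factor: float = SPEEDUP_FLOOR) -> float:
    """Lower bound on peak petaflops implied by an 'at least factor x' claim.

    Assumption: `installed_pflops` stands in for all of NERSC's current
    capacity; here only Cori is counted, so the true floor is somewhat higher.
    """
    return installed_pflops * factor


floor = implied_floor_pflops(CORI_PEAK_PFLOPS)
print(f"Implied floor from Cori alone: {floor:.1f} petaflops")
```

Counting Cori alone gives a floor in the mid-80s of petaflops, which is consistent with a system that lands somewhat below 100 peak petaflops once the rest of NERSC’s installed capacity and the “more than” qualifier are taken into account.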
According to Nick Wright, the lead for NERSC’s Advanced Technology Group, the system will be a mix of CPU-only nodes and GPU-accelerated nodes, with more of the former than the latter. The GPUs will be sourced from NVIDIA, and will most likely be the company’s next-generation Tesla offering. AMD will be providing the CPUs, presumably based on its third-generation EPYC processors (codenamed Milan). The CPU-only nodes will be dual-socket servers, while the NVIDIA-accelerated nodes will have a single CPU and four GPUs. The system will be hooked together with Cray’s Slingshot interconnect, a technology the company is introducing in conjunction with its Shasta platform.
Perlmutter represents the lab’s NERSC-9 machine, which as recently as 2017 was going to deliver over 100 petaflops of capacity. That suggests NERSC originally had another design in mind, perhaps, like Cori, based on Intel silicon. Since Intel dropped its Xeon Phi processor line and is way behind schedule even getting its 10nm Xeon CPUs into the field, it’s possible the lab had to turn to AMD and NVIDIA to meet the system’s planned deployment date.
By the time Perlmutter is up and running in 2020, Cori will be four years old and likely near the end of its useful life. As such, the new system will be the principal supercomputer NERSC’s 7,000-plus users will turn to for their large-scale HPC jobs. In addition to its increased computational power, Perlmutter will be equipped with an integrated 30PB all-flash filesystem. That will make it much more capable than its predecessor at processing the kind of massive datasets that researchers are now using to feed their scientific simulations, analytics, and machine learning workloads.
After Perlmutter will come NERSC-10, which is slated to be the lab’s first exascale machine. That system is tentatively planned to be installed in 2024.