Aug. 7, 2017
By: Michael Feldman
Oak Ridge National Laboratory has begun to install Summit, the IBM-NVIDIA-powered system that is likely to become the most powerful supercomputer in the world when completed.
The news comes courtesy of Oak Ridge Today, which reported that the first cabinets for Summit arrived last Monday (July 31). According to ORNL spokesperson Morgan McCorkle, once the crates are unpacked, crews will begin installing the internal computational and networking components and hooking them into the power and cooling infrastructure at the Oak Ridge Leadership Computing Facility (OLCF).
Installation is expected to take six months or more, with the system slated to become generally available to scientific users by January 2019. However, select application developers at the Department of Energy and a handful of universities will get a crack at it well before that. McCorkle told TOP500 News that the pre-production Summit will be available via the Center for Accelerated Application Readiness, an early-access program designed to allow developers to port and optimize grand challenge codes for Summit’s new CPU-GPU architecture.
All of that suggests that the system may not be up and running until well into 2018, and will not turn up in the TOP500 list until next June. At that point, absent another surprise from China, it still has an excellent chance of unseating the current supercomputing champ, TaihuLight. That system has a peak performance of 125.4 petaflops and a Linpack result of 93 petaflops. Later this year, China is expected to deploy Tianhe-2a, a supercomputer expected to deliver around 100 petaflops, although, as we reported back in January, that number could rise in concert with China’s ambition to own the number one spot.
Officially, Summit is expected to be 5 to 10 times as powerful as Titan, ORNL’s current top system. Titan is currently ranked number four on the TOP500, with a Linpack mark of 17.6 petaflops (from 27.1 peak petaflops). Given that Summit will comprise approximately 4,600 nodes, each containing six 7.5-teraflop NVIDIA V100 GPUs and two IBM Power9 CPUs, its aggregate peak performance should be well north of 200 petaflops. The GPUs alone account for roughly 207 petaflops (4,600 nodes × 6 GPUs × 7.5 teraflops).
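The peak estimate is easy to check with a few lines of arithmetic. The Python sketch below uses only the figures quoted in this article; the variable and function names are illustrative, and the Power9 CPUs are left out because no per-CPU figure is given here.

```python
# Back-of-the-envelope check using the figures quoted in the article.
# All names are illustrative; CPU contribution is omitted (no figure given).
NODES = 4600                  # approximate node count
GPUS_PER_NODE = 6
TFLOPS_PER_GPU = 7.5          # quoted V100 peak per GPU
TITAN_PEAK_PFLOPS = 27.1
TITAN_LINPACK_PFLOPS = 17.6


def gpu_peak_pflops(nodes, gpus_per_node, tflops_per_gpu):
    """Aggregate GPU peak in petaflops (1 petaflop = 1,000 teraflops)."""
    return nodes * gpus_per_node * tflops_per_gpu / 1000


summit_gpu_peak = gpu_peak_pflops(NODES, GPUS_PER_NODE, TFLOPS_PER_GPU)
ratio_to_titan = summit_gpu_peak / TITAN_PEAK_PFLOPS
titan_efficiency = TITAN_LINPACK_PFLOPS / TITAN_PEAK_PFLOPS

print(f"Summit GPU peak: {summit_gpu_peak:.1f} petaflops")
print(f"Ratio to Titan peak: {ratio_to_titan:.1f}x")
print(f"Titan Linpack efficiency: {titan_efficiency:.0%}")
```

On these numbers the GPU-only peak is about 7.6 times Titan's peak, which sits inside the official 5-to-10-times range. Titan's Linpack result is about 65 percent of its peak, a reminder that a measured Linpack score will land below whatever peak figure Summit ends up with.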
Another possibility is that ORNL will run Linpack on a partially completed Summit in October or November. By that point, the system may be large enough to recapture the top supercomputing spot for the US. A possible glitch is that IBM has not officially launched its Power9 processor, and is not expected to do so until early 2018. But some number of chips will certainly be available before that, and, in fact, it’s unlikely that IBM would be shipping crates of Power9 servers to Oak Ridge without their CPUs.
Regardless of who is at the top of the supercomputing heap, Summit will be a unique resource for the DOE and its research community. Besides providing unprecedented amounts of computational capacity for traditional HPC applications, it will offer the largest platform in the world for deep learning workloads. Assuming the system is configured as advertised, it will deliver something like 3.3 exaflops of deep learning performance (mixed 16/32-bit precision math). That’s thanks to the Tensor Cores in the V100 GPUs, which were specifically bred to accelerate the type of matrix operations involved in this kind of software. As a result, Summit will be an exceptional resource for testing the limits of the neural networks used for deep learning.
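To see where a figure like 3.3 exaflops comes from, it helps to work backward to the per-GPU rate it implies. The sketch below does that with the node and GPU counts quoted earlier in the article; the names are illustrative, and treating the 7.5-teraflop figure as the double-precision baseline is an assumption, since the article does not state the precision.

```python
# Work backward from the quoted system-wide deep learning figure.
# Names are illustrative; inputs are the numbers quoted in the article.
NODES = 4600                  # approximate node count
GPUS_PER_NODE = 6
DL_EXAFLOPS = 3.3             # quoted mixed-precision deep learning figure
BASE_TFLOPS_PER_GPU = 7.5     # quoted V100 peak (assumed double precision)

total_gpus = NODES * GPUS_PER_NODE

# 1 exaflop = 1,000,000 teraflops
implied_dl_tflops_per_gpu = DL_EXAFLOPS * 1_000_000 / total_gpus
speedup_over_base = implied_dl_tflops_per_gpu / BASE_TFLOPS_PER_GPU

print(f"Total GPUs: {total_gpus}")
print(f"Implied deep learning rate: {implied_dl_tflops_per_gpu:.1f} teraflops per GPU")
print(f"Ratio to quoted peak: {speedup_over_base:.1f}x")
```

The implied rate comes out just under 120 teraflops per GPU, in line with the roughly 120 teraflops NVIDIA has quoted for V100 Tensor Core operations (a number not given in this article). That is about 16 times the 7.5-teraflop figure used for the Linpack-style estimate.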
Summit is also the last stop on the way to exascale, at least for the gang at Oak Ridge. Given the cadence of supercomputer upgrades at the DOE, the next big deployment at ORNL will almost certainly be an exascale machine – perhaps the first in the US. Whether that turns out to be a future implementation of Summit’s CPU-GPU architecture, or something else entirely, remains to be seen.
Image source: Oak Ridge Leadership Computing Facility (OLCF), distributed under Creative Commons license