Nov. 27, 2017
By: Michael Feldman
Inspur has been tasked with building a one-petaflop supercomputer for Central China Normal University (CCNU), which will use the system to host both traditional HPC applications and deep learning workloads.
When installed, the supercomputer will comprise 18 AGX-2 servers, each outfitted with eight of NVIDIA’s new V100 GPUs and two Intel Skylake processors. The 144 GPUs will deliver more than one petaflop of peak double-precision performance and more than 17 petaflops of deep learning performance, the latter courtesy of the V100’s specialized Tensor Cores. The servers will be hooked together with Mellanox EDR InfiniBand.
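The aggregate figures follow from simple multiplication, and a rough sketch makes the arithmetic explicit. The server and GPU counts come from the announcement; the per-GPU peak rates (7.5 teraflops double precision, 120 teraflops on the Tensor Cores) are assumptions taken from NVIDIA's launch-time V100 figures, not numbers stated by Inspur, and the function name is purely illustrative.

```python
# Back-of-the-envelope peak performance for the CCNU system.
# Counts below come from the announcement.
SERVERS = 18
GPUS_PER_SERVER = 8

# ASSUMPTION: per-GPU peak rates in teraflops, based on NVIDIA's
# launch-time V100 figures. They are not given in the announcement.
FP64_TFLOPS_PER_GPU = 7.5
TENSOR_TFLOPS_PER_GPU = 120.0


def cluster_peak_pflops(servers, gpus_per_server, tflops_per_gpu):
    """Return the aggregate GPU peak in petaflops (hypothetical helper)."""
    total_gpus = servers * gpus_per_server
    return total_gpus * tflops_per_gpu / 1000.0  # teraflops -> petaflops


total_gpus = SERVERS * GPUS_PER_SERVER
fp64_pflops = cluster_peak_pflops(SERVERS, GPUS_PER_SERVER, FP64_TFLOPS_PER_GPU)
tensor_pflops = cluster_peak_pflops(SERVERS, GPUS_PER_SERVER, TENSOR_TFLOPS_PER_GPU)

print(f"GPUs: {total_gpus}")
print(f"Peak double precision: {fp64_pflops:.2f} petaflops")
print(f"Peak Tensor Core: {tensor_pflops:.2f} petaflops")
```

Under those assumed rates, the GPUs alone account for roughly 1.08 petaflops of double precision and 17.28 petaflops of Tensor Core throughput, which is consistent with the "more than one" and "more than 17" petaflops quoted. The contribution of the Skylake CPUs is ignored here.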
The AGX-2 was launched in May 2017, and is being touted by Inspur as “an ultra-high density AI computing server to accelerate Artificial Intelligence.” Besides the eight-GPU capability, the server can support up to two terabytes of main memory, four 100Gbps EDR ports, and eight NVMe/SAS/SATA drives. A GPU expansion box can also be added to provide up to 16 GPUs per dual-socket server. The AGX-2 can be configured with either the PCIe V100 or the NVLink 2.0-enabled version. The CCNU machine will be equipped with the NVLink GPUs.
According to the press release, the system will be used by CCNU researchers to run physics modeling and autonomous driving applications. Inspur claims that VASP (Vienna Ab initio Simulation Package), a physics and material science code, can be accelerated by a factor of eight using an AGX-2 server equipped with a single V100 GPU, compared to a conventional CPU-based system. The Inspur box can also deliver 1898 images per second using the ImageNet dataset for deep learning training. Using a TensorFlow-trained GoogLeNet model, a 1.87x speedup is achieved on the V100-powered server, compared to a similarly equipped system with NVIDIA’s previous generation P100 GPUs.
Besides the hardware, Inspur is also supplying its ClusterEngine system management software, as well as AIStation, the company’s new deep learning cluster management tool. AIStation was announced at the GPU Technology Conference in May (GTC 2017), and is designed to speed up deep learning training applications across GPU-powered clusters.
Inspur has 56 systems on the latest TOP500 list, 55 of which are installed at Internet service companies. The remaining system is deployed at Qingdao National Laboratory for Marine Science and Technology. Twenty-two of the Inspur machines achieved more than a petaflop on Linpack, although none are accelerated by GPUs.
The university plans to upgrade the supercomputer to a multi-petaflop system in the future.