Nov. 8, 2018
By: Michael Feldman
AMD has announced the MI50 and MI60, the company’s latest Radeon Instinct GPUs aimed at high performance computing, deep learning, rendering, and cloud computing.
Based on TSMC’s 7nm process technology, the two Vega20-generation GPUs offer terascale-level performance across these workloads. The MI60 is the more powerful of the two, delivering 7.4 FP64 teraflops for HPC, 14.7 FP32 teraflops and 29.5 FP16 teraflops for deep learning training, and 59 INT8 teraops for deep learning inferencing. The MI50 delivers about 10 percent less performance than the MI60 across the board.
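For readers who like to check the math, the quoted throughput numbers line up with the chip’s widely reported configuration. The sketch below is a back-of-the-envelope calculation rather than anything from AMD’s announcement, and it assumes the commonly cited Vega20 figures of 64 compute units (4,096 stream processors) and a peak engine clock of roughly 1.8 GHz.

```python
# Back-of-the-envelope peak-throughput check for the MI60.
# Assumed (not from the announcement): 4,096 stream processors, ~1.8 GHz peak clock.
stream_processors = 4096
peak_clock_ghz = 1.8

fp32_tflops = stream_processors * 2 * peak_clock_ghz / 1000  # 2 ops per FMA per clock
fp64_tflops = fp32_tflops / 2        # Vega20 runs FP64 at half the FP32 rate
fp16_tflops = fp32_tflops * 2        # packed FP16 doubles throughput
int8_tops   = fp32_tflops * 4        # INT8 runs at four times the FP32 rate

print(f"FP32: {fp32_tflops:.1f} teraflops")   # ~14.7
print(f"FP64: {fp64_tflops:.1f} teraflops")   # ~7.4
print(f"FP16: {fp16_tflops:.1f} teraflops")   # ~29.5
print(f"INT8: {int8_tops:.1f} teraops")       # ~59
```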
Both, however, are a good deal more powerful than the first-generation Radeon Instinct GPUs that AMD introduced last year (the MI25, MI8, and MI6), which were aimed exclusively at deep learning acceleration. The most powerful of those, the MI25, delivers 12.3 FP32 teraflops and 24.6 FP16 teraflops, but only 768 FP64 gigaflops.
The more important distinction between the MI60 and the MI50 is the on-package memory, in this case the HBM2 stacked memory that is now fairly commonplace in datacenter GPUs. The MI60 offers 32 GB of this memory, while the MI50 comes with 16 GB. In both cases, AMD is claiming a record 1 TB/sec of memory bandwidth, roughly twice the speed of the HBM2 memory on the MI25 board. Despite the differences in memory capacity and computational performance, both boards carry a TDP of 300 watts.
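The 1 TB/sec figure follows directly from the memory configuration. As a rough illustration only, the stack count and pin speed below are typical Vega20/HBM2 values rather than numbers quoted in the announcement.

```python
# Rough HBM2 bandwidth calculation, assuming four stacks, 1,024-bit interfaces,
# and an effective 2 Gb/s per pin (typical values, not quoted by AMD).
stacks = 4
bits_per_stack = 1024
effective_gbps_per_pin = 2.0

bandwidth_gb_s = stacks * bits_per_stack * effective_gbps_per_pin / 8
print(f"Aggregate HBM2 bandwidth: {bandwidth_gb_s:.0f} GB/s")  # 1024 GB/s, i.e. ~1 TB/s
```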
A noteworthy feature of these second-generation products is support for AMD’s Infinity Fabric, which can connect as many as four GPUs in peer-to-peer fashion. Each GPU is outfitted with two 100 GB/sec Infinity Fabric links, delivering a total of 200 GB/sec of bandwidth. The MI60 and MI50 are also the first GPUs able to communicate with their CPU host over a PCIe 4.0 link, which provides twice the bandwidth of PCIe 3.0. Conveniently, AMD’s upcoming Zen 2 EPYC CPUs, codenamed Rome, will also support PCIe 4.0. We’re not sure why AMD didn’t extend Infinity Fabric to CPU-to-GPU communication as well, since it would provide about three times the bandwidth of a PCIe 4.0 x16 link.
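The “about three times” estimate is easy to reproduce. The sketch below assumes a PCIe 4.0 x16 link at 16 GT/sec per lane with 128b/130b encoding and counts both directions, so it can be compared against the 200 GB/sec aggregate quoted for the two Infinity Fabric links; the exact ratio depends on how overheads are counted.

```python
# Rough comparison of Infinity Fabric vs. PCIe 4.0 x16 bandwidth.
# Assumptions: 16 lanes at 16 GT/s with 128b/130b encoding, counted bidirectionally.
lanes = 16
gt_per_s = 16.0
encoding_efficiency = 128 / 130

pcie4_per_direction = lanes * gt_per_s * encoding_efficiency / 8  # ~31.5 GB/s
pcie4_both_directions = pcie4_per_direction * 2                   # ~63 GB/s
infinity_fabric = 2 * 100                                         # two 100 GB/s links

print(f"PCIe 4.0 x16 (both directions): {pcie4_both_directions:.0f} GB/s")
print(f"Infinity Fabric (two links):    {infinity_fabric} GB/s")
print(f"Ratio: {infinity_fabric / pcie4_both_directions:.1f}x")   # ~3.2x
```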
The raw numbers for the MI50 and MI60 stack up pretty well against NVIDIA’s latest and greatest datacenter GPU, the Tesla V100. That accelerator tops out at 7.8 FP64 teraflops, 15.7 FP32 teraflops, and 125 Tensor teraflops – the latter the result of the V100’s Tensor Cores, which are designed specifically for deep learning computations. Those numbers are for the NVLink variant of the GPU; the PCIe version is about 10 percent less performant. The V100’s HBM2 memory bandwidth is 900 GB/sec, just slightly less than that of its AMD counterpart. NVIDIA is still faster in GPU-to-GPU communications, with its 300 GB/sec NVLink technology, but it remains limited by its use of PCIe 3.0 for GPU-to-CPU communications (the exception being IBM’s Power processors, which support NVLink natively).
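The eye-catching 125 Tensor teraflops figure comes from the V100’s dedicated matrix hardware rather than its general-purpose cores. Using NVIDIA’s published configuration of 640 Tensor Cores and a boost clock of about 1.53 GHz, with each core performing a 4x4x4 mixed-precision matrix multiply-accumulate (64 FMAs, or 128 floating point operations) per clock, the arithmetic works out as follows.

```python
# Where the V100's ~125 Tensor teraflops come from (NVIDIA's published configuration).
tensor_cores = 640
ops_per_core_per_clock = 64 * 2   # 64 FMAs per clock, 2 ops each
boost_clock_ghz = 1.53

tensor_tflops = tensor_cores * ops_per_core_per_clock * boost_clock_ghz / 1000
print(f"Tensor throughput: {tensor_tflops:.0f} teraflops")  # ~125
```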
At least from a hardware perspective, the MI50 and MI60 boards appear to offer substantial competition to NVIDIA, which has essentially had the HPC/deep learning accelerator market to itself. As always, one of the big questions will be software support. AMD’s strategy for datacenter GPUs has always been to take an open source approach for developers, and that’s what it has done here. AMD announced a new version of its ROCm open software platform that supports the new Radeon Instinct boards and all their bells and whistles, including optimizations for deep learning operations (DLOPS).
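For developers, the practical upshot of the ROCm approach is that the new boards are meant to slot into existing deep learning workflows. As a hedged illustration only: on a ROCm-enabled build of PyTorch (one of the frameworks AMD has been porting to ROCm), a Radeon Instinct GPU shows up through the same device interface that CUDA builds use, so code along these lines should run largely unchanged.

```python
# Minimal sketch, assuming a ROCm-enabled PyTorch build is installed; on such
# builds the ROCm/HIP backend is exposed through the familiar "cuda" device API.
import torch

if torch.cuda.is_available():                 # True on ROCm builds with a supported GPU
    device = torch.device("cuda")             # maps to the ROCm/HIP backend here
    a = torch.randn(4096, 4096, dtype=torch.float16, device=device)
    b = torch.randn(4096, 4096, dtype=torch.float16, device=device)
    c = a @ b                                 # FP16 matrix multiply, the kind of op DLOPS targets
    torch.cuda.synchronize()
    print("Ran an FP16 matmul on", torch.cuda.get_device_name(0))
else:
    print("No ROCm- or CUDA-capable GPU visible to this PyTorch build")
```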
Perhaps the biggest question will be pricing. A likely strategy would be for AMD to offer the best price-performance, undercutting NVIDIA’s Tesla accelerators with aggressive pricing in the same manner the company has done with its EPYC CPUs. That may be a big determining factor in whether the big buyers of these types of GPUs – principally hyperscale cloud companies and supercomputing labs – embrace them. We’ll know soon enough. The MI60 is scheduled to ship by the end of 2018, while the MI50 is expected to ship by the end of Q1 2019.