Feb. 6, 2017
By: Michael Feldman
NVIDIA has launched the Quadro GP100, the workstation version of its Pascal Tesla P100 product. The new five-teraflop GPU card is designed for desktop users doing HPC simulations, image rendering, and deep learning work.
The GP100 has the same feature set as the Tesla P100, namely 16GB of high bandwidth memory (HBM2), NVLink connectivity and more flops than most desktop users know what to do with. In that department, the GP100 offers 5 teraflops for 64-bit data (double precision), 10 teraflops for 32-bit (single precision), and 20 teraflops for 16-bit (half precision). The last of these is especially relevant for deep learning applications, which can make good use of half precision floating point arithmetic.
Putting a couple of these cards into a workstation gives a developer access to as many flops as would be available in a small CPU-based cluster – say, five or six servers – with the added benefit of lower precision floating point operations. Lenovo is promising to support as many as three GP100 GPUs in its ThinkStation P910, which would provide over 60 teraflops of deep learning capability on a desktop.
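The multi-card figures are simple multiplication, and a minimal Python sketch makes the arithmetic explicit. It uses the rounded per-card peaks quoted above and assumes perfect linear scaling across cards, which real applications rarely achieve. The names `PEAK_TFLOPS` and `aggregate_peak_tflops` are illustrative only and do not come from NVIDIA or Lenovo.

```python
# Peak throughput per Quadro GP100 card, in teraflops, as quoted in this article
# (rounded figures; the dictionary name and keys are illustrative assumptions).
PEAK_TFLOPS = {
    "fp64": 5.0,   # double precision
    "fp32": 10.0,  # single precision
    "fp16": 20.0,  # half precision
}


def aggregate_peak_tflops(num_cards: int, precision: str) -> float:
    """Theoretical peak for num_cards cards, assuming ideal linear scaling."""
    if num_cards < 0:
        raise ValueError("num_cards must be non-negative")
    return num_cards * PEAK_TFLOPS[precision]


if __name__ == "__main__":
    # Three cards, the maximum Lenovo quotes for the ThinkStation P910.
    for precision in PEAK_TFLOPS:
        total = aggregate_peak_tflops(3, precision)
        print(f"3 x GP100 at {precision}: {total:.0f} teraflops peak")
```

With the rounded 20-teraflop half precision figure, three cards come to 60 teraflops; the "over 60" quoted for the P910 presumably reflects a slightly higher unrounded per-card number.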
That’s a nice-sized sandbox for AI enthusiasts, but also for HPC application developers who desire the convenience of a personal computing environment. That would be true whether or not one intends to scale their code to an honest-to-god supercomputer in the future. The limitation, of course, is that your code must be suitable for GPUs, which nowadays applies to practically every deep learning code invented and most of the top HPC application packages as well.
Of course, a GP100-powered workstation is also going to provide plenty of computational horsepower for more conventional users running CAD, image/video rendering or other visual computing applications, but for these workloads there is probably less need for the HBM2 and NVLink capabilities. For these graphically oriented applications, NVIDIA offers the P2000, P1000, P600, and P400, which are Pascal GPUs without the HBM2 and NVLink features, and in most cases, fewer flops.
Even with the GP100, the NVLink feature will only be utilized to speed up GPU-to-GPU data transfers, since the host processor in all the workstations will almost certainly be some sort of x86 CPU, none of which support NVLink. Having this support at the host end can significantly speed up access to main memory, which tends to be a choke point for many GPU applications since it’s limited by the relatively slow speed of the PCIe connection. Currently the only CPU that supports NVLink is the Power8 processor. If IBM or some other enterprising OpenPower vendor were interested, they could build a Power8-GP100 desktop system that would represent a unique solution for really high-end users.
In the meantime, users can at least look forward to putting tens of teraflops onto their Windows or Linux desktop. The new Quadro products are scheduled to show up in systems starting in March from Dell, HP, Lenovo, and Fujitsu. Prices of the new cards and workstations have yet to be revealed.