June 1, 2017
By: Michael Feldman
NVIDIA is hooking up with four of the world’s largest original design manufacturers (ODMs) to help accelerate adoption of its GPUs in hyperscale datacenters. The new partner program gives Foxconn, Inventec, Quanta and Wistron early access to the HGX reference architecture, NVIDIA’s server design for machine learning acceleration.
HGX is an attempt to establish an industry-standard GPU box that maximizes computational density for machine learning workloads. It uses NVIDIA’s most advanced GPUs, namely the Tesla P100, and soon, the Tesla V100. It glues eight of these into an NVLink cube mesh, and uses PCIe switching to allow CPUs to dynamically connect to them. Examples of this architecture include Microsoft's Project Olympus HGX-1 chassis, Facebook's Big Basin system, and NVIDIA’s own DGX-1 server.
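To make that design a bit more concrete, here is a rough Python sketch of the eight-GPU layout as described above: two quads of GPUs meshed together over NVLink, with PCIe switches providing the CPU attachment points. The exact link wiring and the two-switch PCIe fan-out shown here are simplifying assumptions for illustration, not NVIDIA’s published schematic.

```python
# Illustrative sketch (assumed topology, not an official NVIDIA spec): a
# simplified model of an eight-GPU NVLink cube mesh, with GPUs also hanging
# off PCIe switches so host CPUs can reach whichever GPUs a workload needs.

from itertools import combinations

NUM_GPUS = 8

def nvlink_cube_mesh():
    """Return the set of NVLink-connected GPU pairs.

    Assumed layout: two fully connected quads (GPUs 0-3 and 4-7), plus one
    link joining each GPU to its counterpart in the other quad.
    """
    links = set()
    for quad in ([0, 1, 2, 3], [4, 5, 6, 7]):
        links.update(combinations(quad, 2))        # each quad is fully meshed
    links.update((i, i + 4) for i in range(4))     # cross-quad links
    return links

# Hypothetical PCIe layout: two switches, each fanning out to four GPUs,
# so host CPUs can be connected to the GPUs dynamically.
PCIE_SWITCHES = {"switch0": [0, 1, 2, 3], "switch1": [4, 5, 6, 7]}

if __name__ == "__main__":
    links = nvlink_cube_mesh()
    # A Tesla P100 exposes four NVLink ports, so in this model every GPU
    # should appear in exactly four links.
    for gpu in range(NUM_GPUS):
        degree = sum(gpu in pair for pair in links)
        assert degree == 4, f"GPU {gpu} has {degree} NVLink links"
    print(f"{len(links)} NVLink links, 4 per GPU")
    print("PCIe attachment:", PCIE_SWITCHES)
```

The point of the sketch is simply that the GPUs talk to each other over the NVLink fabric, while the CPUs sit on the other side of the PCIe switches, which is what lets the designs below treat the GPU complex as a resource that host servers attach to.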
Facebook’s Big Basin and Microsoft’s HGX-1 systems are GPU-only boxes, which rely on external CPU servers as hosts. Since the processor and co-processor are disaggregated, applications can adjust the GPU-to-CPU ratio as needed. In most machine learning situations, you want a rather high ratio of GPUs to CPUs, since most of the processing ends up on the graphics chip. And in hyperscale/cloud datacenters, you also want the flexibility of allocating these resources dynamically as workloads shift around.
The DGX-1 server is a different animal altogether. It’s a stand-alone machine learning appliance and includes two Xeon processors, along with the same eight-GPU NVLink mesh of its hyperscale cousins. As such, it’s not meant for cloud duty, but rather for businesses, research organizations, and software development firms that want an in-house machine learning box. SAP is the most prominent commercial buyer of the DGX-1, at least of those revealed publicly. But NVIDIA never intended to sell boatloads of these systems, especially since a lot of customers would prefer to rent machine learning cycles from cloud providers.
That’s why the ODM partnership could end up paying big dividends. These manufacturers already have the inside track with hyperscale customers, who have figured out that they can use these companies to get exactly the gear they want, and at sub-OEM pricing. ODMs are also more nimble than traditional server-makers, inasmuch as they can shorten the design-to-production timeline. That makes them better suited to the nearly continuous upgrade cycle of these mega-datacenters.
Given that the HGX-1 is manufactured by Foxconn subsidiary Ingrasys and the Big Basin system is built by Quanta, it’s a logical step for NVIDIA to bring the other big ODMs, Inventec and Wistron, into the fold. The goal is to bring a wider range of HGX-type machinery to market and make it available to hyperscale customers beyond just Microsoft and Facebook.
The other aspect of this is that NVIDIA would like to solidify its dominance with machine learning customers before Intel brings its AI-optimized silicon to market. Startup companies like Wave Computing and Graphcore are also threatening to challenge NVIDIA with their own custom chips. Establishing an industry-standard architecture before these competing solutions get market traction would help NVIDIA maintain its leadership.
To some extent, NVIDIA is also competing with some of its biggest customers, like Google and Microsoft, both of which are building AI clouds based on their own technologies. In the case of Google, it’s their Tensor Processing Unit (TPU), which the search giant has upgraded for an expanded role that threatens NVIDIA directly. Meanwhile, Microsoft is filling out its AI infrastructure with an FPGA-based solution that, likewise, could sideline NVIDIA GPUs in Azure datacenters.
The prospect of using the future Tesla V100 GPUs in HGX platforms actually intensifies the competition, since these upcoming processors are built for both neural net training and inferencing. Although NVIDIA already builds its own inferencing-oriented GPUs (the M4 and M40, followed by the P4 and P40), inferencing is also performed by regular CPUs and FPGAs, not to mention Google’s TPUs, running in regular cloud servers.
Inferencing has somewhat different requirements than training, especially with regard to minimizing latency, but with the Volta architecture and the V100, NVIDIA thinks it has designed a solution capable of doing both, and doing so competitively. From a hyperscale company’s point of view, there are some obvious advantages in separating inferencing, and certainly training, infrastructure from the rest of the server farm, not the least of which is being able to deploy and run machine learning gear more flexibly. And since these upcoming V100 GPUs will be used by hyperscale companies for training, they are also likely to get a shot at some of those same companies’ inference workloads.
Finally, if NVIDIA manages to establish HGX as the standard GPU architecture for AI clouds, it makes its own recently announced GPU cloud platform more attractive. Since NVIDIA’s cloud stack of machine learning libraries and frameworks runs on top of other people’s infrastructure, pushing its HGX architecture into the ecosystem would make NVIDIA’s job of supporting the various hardware solutions that much simpler. It would also make it easier for customers to switch cloud providers without having to tweak their own software.
We’ll be able to tell if these ODM relationships pay off when we start seeing additional HGX solutions coming to market and being adopted by various cloud providers. As NVIDIA likes to remind us, its GPUs are used in the world’s top 10 hyperscale businesses today. If all goes as planned, someday it will be able to make the same claim for HGX.