Oct. 11, 2016
By: Michael Feldman
Intel announced it is sampling its Stratix 10 FPGAs, the latest family of field programmable gate arrays that are designed to accelerate a number of datacenter workloads. The new devices, which Intel is calling “the most significant FPGA innovations in over a decade,” offer advanced features like embedded 64-bit ARM processors, second-generation High Bandwidth Memory (HBM2), and DSP blocks.
Image: Intel
The Stratix 10 represents Intel’s first serious foray into the FPGA-accelerated datacenter since it acquired Altera last year for $16.7 billion. When it initiated that purchase, the chipmaker boldly predicted that 30 percent of datacenter servers would be equipped with FPGAs by 2020. Although such a level of adoption is hard to envision today, the introduction of the Stratix 10 sets the stage for what is likely to be an even more aggressive strategy going forward.
The new Stratix chips will be manufactured with the company’s 14nm FinFET technology, which, according to Intel, provides five times the logic element density of the previous-generation Stratix V devices built on TSMC’s 28nm process node. The chipmaker is also claiming 70 percent lower power draw than the Stratix V for equivalent performance. Wattage aside, the Stratix 10 is supposed to deliver twice the throughput of its predecessor.
Part of this is undoubtedly related to the smaller transistor sizes on the new silicon, but when you throw in the DSP add-in, that’s bound to have a big impact on performance-per-watt as well. The DSP component in isolation offers 10 teraflops of single precision floating point and 80 gigaflops per watt according to the hardware specification.
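As a rough sanity check, the two DSP figures quoted above can be combined to estimate the power draw they imply. The sketch below assumes that the 10 teraflops and the 80 gigaflops per watt describe the same operating point, which the specification as quoted here does not state; the function and constant names are illustrative only.

```python
def implied_power_watts(peak_gflops: float, gflops_per_watt: float) -> float:
    """Return the power draw implied by a peak-throughput and an efficiency figure."""
    if gflops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return peak_gflops / gflops_per_watt


# Figures quoted in the article for the DSP component in isolation.
DSP_PEAK_GFLOPS = 10_000.0   # 10 teraflops, single precision
DSP_GFLOPS_PER_WATT = 80.0   # quoted efficiency

# Assumption: both figures refer to the same operating point.
dsp_power_watts = implied_power_watts(DSP_PEAK_GFLOPS, DSP_GFLOPS_PER_WATT)
print(f"Implied DSP power at peak: {dsp_power_watts:.0f} W")
```

If the two numbers were measured at different clock rates or device configurations, the implied wattage would not hold.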
The HBM2 is also central to better performance and energy efficiency, and definitely places the Stratix 10 in the upper echelons of reconfigurable computing devices. The 3D memory technology will offer up to 1 terabyte/second of bandwidth to feed the FPGA, DSP, and ARM CPU. The use of this technology suggests an affinity with other modern accelerator architectures like NVIDIA’s and AMD’s newest GPUs, as well as Intel’s own Knights Landing Xeon Phi.
Image: Intel
However, the server applications Intel is targeting with the Stratix 10 family are somewhat tangential to those of the other accelerators. Intel believes workloads such as signal processing, data compression, data encryption, storage management, and video encoding are a good fit; in truth, though, that covers practically any server-side application where data throughput is the driving criterion. With the DSP unit offering lots of hardwired flops, these devices can also be used for high performance computing.
Here it’s worth mentioning that Intel is not promoting the Stratix 10 for HPC, and for somewhat obvious reasons (cough, Xeon Phi). Although being restricted to single precision limits the DSP’s usefulness in many traditional scientific simulations, it can still be used in things like seismic processing, genomic analysis, and deep learning. And setting the DSP aside, the FPGA logic elements themselves can be used to implement scientific algorithms at any precision one chooses. Regardless, the Stratix 10 is unlikely to find a wide audience in HPC, given the challenges of programming FPGAs, but it may find some high-value use cases in financial services, oil & gas exploration, medical imaging, and perhaps even deep learning.
Some of the early samples of the Stratix 10 are undoubtedly in the hands of Microsoft, the company that has built the largest FPGA-equipped cloud on the planet, its Azure-based “AI supercomputer.” Azure currently has an exa-op’s worth of Stratix V FPGAs in its datacenters, and if we are to believe Intel, Microsoft could double that performance with a rolling upgrade to the Stratix 10.
Even though Intel is just getting around to sampling the chips, Stratix 10 has been in the works for a while, certainly since before the company purchased Altera. That may help explain why, for instance, the embedded processor is an ARM CPU rather than an Intel x86 processor. Likewise, the choice of HBM2 instead of Intel’s native hybrid memory cube (HMC) technology may also be a relic of a pre-acquisition design decision. On the other hand, this could be an interesting opportunity for Intel to test the waters on technologies that it may want to adopt more fully in the future.
The ARM CPU, by the way, won’t be part of every Stratix 10 product; it will be limited to the SoC variant. The CPU being used here is a quad-core Cortex-A53, a 64-bit ARM design that is cast as a low-cost core for standalone entry-level platforms. In this context, it is available for users building a more hardened system that is independent of external host processors. Theoretically, it also offers the best energy efficiency, assuming all your application code maps to the ARM CPU and FPGA logic elements.
It’s tempting to think that Intel will cannibalize its Xeon server business with every Stratix 10 sold, with or without the ARM processor. To some extent that could be true, but since these new high-end devices are likely to have price points and margins similar to those of the Xeon line, it’s probably six of one, half a dozen of the other. In any case, it’s all revenue going to Intel’s bottom line.