Jan. 13, 2017
By: Michael Feldman
The release date for AMD’s Zen-based “Naples” CPU is still a few months away, but details about the new high performance server chip are already leaking into the public domain. Some of these specs are available in a recent report published at WCCFtech. Although much remains to be revealed, Naples is shaping up to be the first credible Xeon competitor that Intel has encountered in several years.
The Naples server processors will sport up to 32 cores, each of which will run two threads simultaneously. That's just four more cores than Intel's upcoming 28-core/56-thread Skylake-EP processors, which are expected later this year, but on compute-intensive tasks those extra cores could make a notable difference. The addition of simultaneous multithreading (SMT) to the Zen architecture puts it in line with Intel's more established Hyper-Threading (HT) technology, although for HPC applications this feature is often switched off.
Naples will also provide eight channels for DDR4 memory, against the six channels supported in the Skylake parts. More channels are nearly always better, especially for the types of data-demanding applications that are a hallmark of HPC – weather forecasting, quantum chemistry, seismic analysis, astrophysics, and molecular dynamics, to name a few.
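To put the channel counts in perspective, here is a rough peak-bandwidth calculation. It is a sketch under a loudly stated assumption: the report gives no memory speeds, so both chips are hypothetically assumed to run DDR4-2400. Each DDR4 channel is 64 bits (8 bytes) wide, so peak bandwidth per socket is channels × transfer rate × 8 bytes. Real sustained bandwidth will be lower than these theoretical peaks.

```python
# HYPOTHETICAL: both chips assumed to run DDR4-2400; the report gives no memory speeds.
TRANSFERS_PER_SEC = 2400e6  # DDR4-2400 = 2400 million transfers per second
BYTES_PER_TRANSFER = 8      # a DDR4 channel is 64 bits (8 bytes) wide


def peak_bandwidth_gbs(channels: int, transfers_per_sec: float = TRANSFERS_PER_SEC) -> float:
    """Theoretical peak memory bandwidth for one socket, in GB/s (10**9 bytes/s)."""
    return channels * transfers_per_sec * BYTES_PER_TRANSFER / 1e9


naples = peak_bandwidth_gbs(8)   # eight channels
skylake = peak_bandwidth_gbs(6)  # six channels
print(f"Naples:  {naples:.1f} GB/s")        # 153.6 GB/s
print(f"Skylake: {skylake:.1f} GB/s")       # 115.2 GB/s
print(f"Ratio:   {naples / skylake:.2f}x")  # 1.33x
```

Whatever memory speed the shipping parts actually support, the ratio stays at 8/6, or about 1.33x, as long as both chips run at the same speed.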
On the L3 cache front, Naples also comes out on top, with up to 64MB. That's about 1.65 times the capacity of the Skylake Xeon's 38.75MB of L3 cache (although only about half of the Power9's 120MB). Here, though, cache design can make a big difference. AMD's Bulldozer CPU had plenty of cache as well, but its performance lagged behind its Intel Sandy Bridge Xeon competition. For what it's worth, AMD is promising a 5X speedup for the L3 cache in Zen, as well as a 2X speedup for the L1 and L2 caches.
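The capacity comparison is simple arithmetic on the figures reported above. The sketch below also divides total L3 by core count, a per-core view that is our own illustration rather than something from the report. It says nothing about latency or bandwidth, which, as noted, is where cache design really matters.

```python
# L3 capacities (MB) and core counts as reported in the article.
l3_mb = {"Naples": 64.0, "Skylake Xeon": 38.75, "Power9": 120.0}
cores = {"Naples": 32, "Skylake Xeon": 28}

naples_vs_skylake = l3_mb["Naples"] / l3_mb["Skylake Xeon"]
naples_vs_power9 = l3_mb["Naples"] / l3_mb["Power9"]
print(f"Naples / Skylake L3: {naples_vs_skylake:.2f}x")  # 1.65x
print(f"Naples / Power9 L3:  {naples_vs_power9:.2f}x")   # 0.53x

# Per-core share of L3 (illustrative only; ignores how the cache is actually partitioned).
for chip, n in cores.items():
    print(f"{chip}: {l3_mb[chip] / n:.2f} MB of L3 per core")  # 2.00 and 1.38
```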
At the more fundamental level of the core architecture, Zen adds a number of new capabilities. Besides the aforementioned SMT feature, which offers two threads per core, Zen also enlarges the instruction queues for both integer and floating point operations. Each of the two floating point units on the new core encompasses 4 pipes and 128-bit floating point multiply-accumulate (FMAC) units. In addition, there are two floating point addition (FADD) and two multiplication (FMUL) units per FPU. Keep in mind, though, that Skylake will support AVX-512, the extra-wide vector instructions that until now were available only on Intel's Xeon Phi HPC processors. Nonetheless, given the large number of cores on Naples, the CPU should offer quite a bit of floating point performance, especially if there are product SKUs with decent clock rates.
Speaking of which: according to the WCCFtech writeup, the base clock on a 32-core Naples CPU will be 1.4 GHz, with the turbo frequency topping out at 2.8 GHz. That seems reasonably competitive, inasmuch as the 28-core Skylake CPUs are slated to run at 1.5 or 1.8 GHz. Power draw looks to be comparable as well, at least on a per-core basis.
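One crude way to see why these clocks look competitive is to multiply core count by base clock. This "core-GHz" figure is our own rough proxy, not a metric from the report. It ignores work done per clock (including Skylake's AVX-512), turbo behavior, and memory effects, so treat it only as a sanity check on the aggregate clock budget.

```python
# Crude aggregate-clock proxy: cores x base clock. Ignores per-clock throughput,
# vector width (e.g. AVX-512), and turbo behavior, so it is only a rough sanity check.
def core_ghz(cores: int, base_ghz: float) -> float:
    return cores * base_ghz


naples_base = core_ghz(32, 1.4)   # 32-core Naples at 1.4 GHz base
skylake_low = core_ghz(28, 1.5)   # 28-core Skylake at 1.5 GHz
skylake_high = core_ghz(28, 1.8)  # 28-core Skylake at 1.8 GHz

print(f"Naples  32 x 1.4 GHz: {naples_base:.1f} core-GHz")   # 44.8
print(f"Skylake 28 x 1.5 GHz: {skylake_low:.1f} core-GHz")   # 42.0
print(f"Skylake 28 x 1.8 GHz: {skylake_high:.1f} core-GHz")  # 50.4
```

On this measure the 32-core Naples part falls between the two reported Skylake frequencies.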
To some extent, AMD's more competitive specs are likely the result of reaching parity with Intel on process technology. Both the Naples and the Skylake Xeon CPUs will be manufactured with 14nm FinFET technology, with AMD employing GlobalFoundries for the work and Intel using its own fabs. It wasn't that long ago (up until 2014) that Intel had what seemed like an insurmountable lead in process technology. But over the last few years, the other semiconductor makers have caught up, even as the physics involved in keeping Moore's Law on track has become more challenging. Although process technologies at a given node do not have identical performance characteristics from one manufacturer to the next, the fact is that Intel's advantage in transistor shrinkage can no longer be counted on to give its Xeon silicon an edge in speed or energy efficiency.
Perhaps Naples's biggest differentiator with regard to Skylake is connectivity. Naples will support 128 lanes of PCIe Gen3, easily surpassing the 48 lanes Intel is promising on its competing Skylake parts. By the way, IBM's upcoming Power9 chips will also support 48 lanes, but in this case they are PCIe Gen4, which runs at twice the per-lane data rate of PCIe Gen3.
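Lane count and per-lane speed can be folded into a single aggregate figure. The sketch below relies on two facts from the PCIe specifications: Gen3 signals at 8 GT/s per lane and Gen4 at 16 GT/s, and both use 128b/130b line encoding. The result is per-direction bandwidth before packet and protocol overhead, so real throughput will be somewhat lower.

```python
# PCIe Gen3 signals at 8 GT/s per lane and Gen4 at 16 GT/s; both use 128b/130b encoding.
GIGATRANSFERS_PER_S = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130


def pcie_gbytes_per_s(lanes: int, gen: int) -> float:
    """Per-direction bandwidth in GB/s after line encoding, before packet/protocol overhead."""
    gbits_per_lane = GIGATRANSFERS_PER_S[gen] * ENCODING_EFFICIENCY  # one bit per transfer
    return lanes * gbits_per_lane / 8  # 8 bits per byte


for chip, lanes, gen in [("Naples", 128, 3), ("Skylake", 48, 3), ("Power9", 48, 4)]:
    print(f"{chip}: {lanes} x Gen{gen} = {pcie_gbytes_per_s(lanes, gen):.1f} GB/s per direction")
# Naples: ~126.0, Skylake: ~47.3, Power9: ~94.5
```

By this measure, Naples' larger lane count outweighs Power9's faster Gen4 links in aggregate, although any single Gen4 link still carries twice the bandwidth of a Gen3 link of the same width.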
Image: WCCFtech
The huge number of PCIe lanes on Naples is there to maximize access to network, storage, and accelerator devices. Regarding the latter, AMD envisions servers with lots of GPUs attached to their CPUs. A 1U server reference design for Naples has up to four GPUs, two InfiniBand EDR adapters, and four NVMe devices. A 2U design raises the GPU count to six, with one InfiniBand adapter and three NVMe devices, although some configurations support up to eight GPUs. The 1U design targets maximum compute density, while the 2U design, by virtue of its greater number of GPUs, is meant to deliver maximum performance per node.
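A quick lane budget shows why 128 lanes matter for these reference designs. The per-device link widths below are HYPOTHETICAL, typical slot widths (x16 for each GPU and EDR InfiniBand adapter, x4 for each NVMe drive), not published details of AMD's designs. Real boards may also use PCIe switches to share lanes.

```python
# HYPOTHETICAL link widths: typical slot widths, not published details of AMD's designs.
LANES_PER_DEVICE = {"gpu": 16, "infiniband_edr": 16, "nvme": 4}
NAPLES_LANES = 128  # PCIe Gen3 lanes reported for Naples


def lanes_needed(devices: dict[str, int]) -> int:
    """Total lanes consumed if every device gets a dedicated link of the assumed width."""
    return sum(LANES_PER_DEVICE[kind] * count for kind, count in devices.items())


designs = {
    "1U": {"gpu": 4, "infiniband_edr": 2, "nvme": 4},
    "2U": {"gpu": 6, "infiniband_edr": 1, "nvme": 3},
    "eight GPUs only": {"gpu": 8},
}
for name, devices in designs.items():
    used = lanes_needed(devices)
    status = "fits" if used <= NAPLES_LANES else "over budget"
    print(f"{name}: {used} of {NAPLES_LANES} lanes ({status})")
# 1U: 112, 2U: 124, eight GPUs only: 128
```

Under these assumptions, both reference designs fit within 128 lanes without a switch. Eight x16 GPUs alone would consume every lane, which suggests why the eight-GPU configurations are the exception.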
As we recently reported, AMD’s new Vega GPUs are also due out in the same general timeframe as the Naples CPUs, so the company is surely hoping to see system providers delivering all-AMD boxes in the traditional HPC server space as well as the burgeoning deep learning space. It's easy to imagine we'll see at least a few supercomputers this year powered by the Naples-Vega tandem.
None of this settles whether Naples processors will be better or worse than the next-generation Xeon CPUs. And, of course, pricing for these chips has yet to be determined. None of that will be known until actual products are released into the wild. At this point, the first Naples chips are expected to become available sometime in the first half of 2017. Meanwhile, Intel is planning for the first Skylake Xeon processors to hit the streets in mid-2017. This year, the battle for x86 preeminence starts anew.