AMD Takes Aim at Performance Leadership with Next-Generation EPYC Processor

Nov. 7, 2018

By: Michael Feldman

AMD has offered a tantalizing preview of “Rome,” the Zen 2 EPYC processor that will offer up to 64 cores and four times the floating point performance of its predecessor. If the claims hold up, the second-generation processor has a shot at being the highest-performing datacenter CPU in 2019.

 

AMD CEO Lisa Su holding up a pre-production Rome processor. Source: AMD

 

The Rome preview was part of AMD’s Next Horizon Event that took place in San Francisco on Tuesday. Before we get to the ramifications for HPC users, here’s a list of feature enhancements that AMD CTO Mark Papermaster outlined for the second-generation EPYC:

  • Improved branch prediction
  • Better instruction pre-fetching
  • Optimized instruction cache
  • Larger op cache
  • Support for PCIe 4.0 (doubles PCIe 3.0 bandwidth)
  • Added hardware Spectre mitigations and increased flexibility of memory encryption
  • Increased dispatch/retire bandwidth
  • Doubled floating point load/store bandwidth
  • Doubled floating point width (to 256 bits)
  • Doubled maximum core count (to 64)

Those last four apply to Rome’s floating point performance, which is of particular interest to HPC customers, who for more than a decade have turned to Intel Xeon as the go-to CPU for delivering maximum flops. That could soon change. Thanks to the doubling of both the core count and the floating point width, AMD is claiming Rome will deliver four times the flops per socket of its first-generation EPYC offering.

A little context: Intel’s top-of-the-line 28-core Xeon Skylake processor currently offers about three times the floating point performance of the first-generation 32-core EPYC 7601 processor. A few days ago, Intel claimed that its upcoming 48-core HPC Xeon processor, known as Cascade Lake Advanced Performance (AP), would deliver 3.4 times the Linpack floating point performance of this same EPYC chip. Considering what AMD is now claiming, Rome should be able to match or exceed Intel’s highest-performing offering in 2019. If you’ve been tracking the x86 wars for the last couple of decades, you’ll know such a development is an exceedingly rare occurrence.
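To make the comparison concrete, here is a rough sketch of the arithmetic behind those claims. It normalizes everything to the EPYC 7601 and assumes peak flops scale linearly with core count and vector width at an unchanged clock speed, which is a simplification and not something AMD has confirmed. The function name and the dictionary are illustrative only, and the Intel figures are the vendor claims quoted above, not measurements.

```python
# Relative per-socket floating point throughput, normalized to the
# first-generation EPYC 7601 (= 1.0). All figures are vendor claims
# quoted in the article, not benchmark results.
EPYC_7601 = 1.0


def scaled_flops(base, core_factor, width_factor):
    """Scale peak flops by core count and vector width.

    Assumes clock speed and per-core efficiency are unchanged,
    which is an illustrative assumption, not an AMD figure.
    """
    return base * core_factor * width_factor


# Rome: cores double from 32 to 64, FP width doubles from 128 to 256 bits.
rome = scaled_flops(EPYC_7601, core_factor=64 / 32, width_factor=256 / 128)

relative = {
    "EPYC 7601 (32 cores)": EPYC_7601,
    "Xeon Skylake (28 cores)": 3.0,      # "about three times" per the article
    "Cascade Lake AP (48 cores)": 3.4,   # Intel's Linpack claim
    "Rome (64 cores)": rome,
}

# Print the parts from slowest to fastest.
for name, value in sorted(relative.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:.1f}x")
```

Under those assumptions Rome lands ahead of both Intel parts, but real results will depend on shipping clock frequencies and memory behavior, neither of which has been disclosed.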

AMD is also claiming the platform’s overall compute performance will be double that of its first-generation EPYC, delivering more instructions per cycle along with up to 400 GB/sec of memory bandwidth. Papermaster summed it up this way: “We’re in the business of high performance.”

To highlight the chip’s floating point performance, AMD ran the standard C-Ray rendering benchmark on-stage at the event, using a single-socket server outfitted with a pre-production version of Rome. When matched against a dual-socket Xeon Platinum 8180M server, the AMD box ran the benchmark to completion first.

Key to this improved performance is AMD’s choice to manufacture Rome on TSMC’s 7nm process technology, which gave chip designers a larger transistor budget while also offering increased performance and performance per watt. According to Papermaster, this was something of a gamble, given the challenges of implementing such a complex design on a leading-edge semiconductor manufacturing technology.

When AMD decided to go with 7nm during Rome’s design, the company expected to reach transistor parity in 2019 with Intel, which was planning to manufacture Skylake’s successor on its own 10nm process (basically equivalent to TSMC’s 7nm technology). But because of continued delays in Intel’s 10nm effort, AMD lucked out; it will be shipping some of the fastest and most power-efficient silicon in the datacenter next year.

Another key element of Rome’s design is its modular architecture, which uses AMD’s second-generation Infinity Fabric as the interconnect that links up all the processor’s component dies – what AMD refers to as chiplets. There are a couple of advantages to this approach: (1) a 64-core CPU can be manufactured more cheaply as a multi-chip package, because a smaller die is less likely to contain a manufacturing defect, and (2) less critical dies can be implemented on less expensive process technologies. For example, the I/O and Infinity Fabric chiplets in Rome are implemented on a 14nm process, since unlike the execution units, they don’t need the same level of transistor density.
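The cost argument for smaller dies can be illustrated with a simple Poisson yield model, a common textbook approximation rather than anything AMD presented. The sketch below compares the wafer area consumed per working package for one large die versus the same silicon split into several chiplets. The total area, defect density, and chiplet count are hypothetical values chosen only for illustration, and the model ignores packaging and interconnect costs.

```python
import math


def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of dies with zero defects under a simple Poisson model."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_unit(total_area_cm2, num_dies, defects_per_cm2):
    """Wafer area consumed per working package.

    Assumes each die is tested before assembly, so only good dies
    are packaged and a defect scraps one die rather than the whole CPU.
    """
    die_area = total_area_cm2 / num_dies
    good_fraction = poisson_yield(die_area, defects_per_cm2)
    return num_dies * die_area / good_fraction


# Hypothetical inputs, not AMD or TSMC figures.
TOTAL_AREA_CM2 = 6.0     # combined silicon area of the compute logic
DEFECT_DENSITY = 0.2     # defects per square centimeter
NUM_CHIPLETS = 8         # illustrative split

monolithic = silicon_per_good_unit(TOTAL_AREA_CM2, 1, DEFECT_DENSITY)
chiplets = silicon_per_good_unit(TOTAL_AREA_CM2, NUM_CHIPLETS, DEFECT_DENSITY)

print(f"Monolithic die: {monolithic:.2f} cm^2 of wafer per good CPU")
print(f"{NUM_CHIPLETS} chiplets:     {chiplets:.2f} cm^2 of wafer per good CPU")
```

With these made-up inputs the chiplet approach wastes far less silicon, and the gap widens as defect density rises, which is typically the case on a new process node.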

Rome is currently sampling with customers and is expected to ship sometime in 2019, at which point we’ll find out about all the product specs: SKUs, clock frequencies, cache sizes, pricing, and so on.

Beyond that, AMD is already putting the finishing touches on its Zen 3 EPYC processor, which is slated to be manufactured on a 7nm++ process and launched in 2020. A Zen 4 processor is also in the works, although no timeline was given for its debut.