June 15, 2017
By: Michael Feldman
The US Department of Energy announced that it will funnel $258 million into six tech companies as part of its PathForward program to develop new HPC technologies for exascale supercomputers. The awardees include AMD, Cray, HPE, IBM, Intel, and NVIDIA.
The funding, which is being managed under the DOE’s Exascale Computing Project (ECP), will be spread over the next three years and is expected to result in hardware that can be applied to exascale-capable systems suitable for the energy agency’s work. The six companies will shoulder at least 40 percent of the R&D cost, bringing the total investment to at least $430 million.
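For readers who want to check the $430 million figure, here is a minimal sketch of the arithmetic, assuming the DOE's $258 million covers the remaining 60 percent of the total when the vendors contribute exactly 40 percent. The function and variable names are illustrative and do not come from the DOE announcement.

```python
def total_investment(doe_funding_musd: float, vendor_share: float) -> float:
    """Total R&D spend (in $ millions) when vendors cover `vendor_share` of it.

    Assumption: the DOE funding makes up the remaining (1 - vendor_share).
    """
    return doe_funding_musd / (1.0 - vendor_share)


doe = 258.0                          # DOE PathForward funding, in $ millions
total = total_investment(doe, 0.40)  # vendors cover 40 percent of the cost
vendor = total - doe                 # implied vendor contribution

print(f"total: ${total:.0f}M, vendor contribution: ${vendor:.0f}M")
```

Because 40 percent is the minimum vendor share, both figures are lower bounds: a larger vendor share would push the total higher.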
“The PathForward program is critical to the ECP’s co-design process, which brings together expertise from diverse sources to address the four key challenges: parallelism, memory and storage, reliability and energy consumption,” said ECP Director Paul Messina, in a prepared statement. “The work funded by PathForward will include development of innovative memory architectures, higher-speed interconnects, improved reliability systems, and approaches for increasing computing power without prohibitive increases in energy demand. It is essential that private industry play a role in this work going forward: advances in computer hardware and architecture will contribute to meeting all four challenges.”
The rationale for injecting government money into these companies for technologies and products they’re likely already working on is that some of this R&D work may not come to fruition by the time the DOE begins ordering its first exascale supercomputers in 2019. When applying for the PathForward work, each vendor had to make the case that its technology roadmap needed to be accelerated to meet that deadline.
At the same time, the vendors also had to convince the DOE that the resulting products would have commercial viability in their HPC business. (In general, the government is not interested in buying one-off products with no commercial future.) This is a little trickier than it sounds, since a product that may be suitable for exascale computing may attract little interest from other customers, even years after the technology becomes established in the upper rungs of supercomputing.
For a company like Cray, that’s not a problem, since the majority of their business is in this elite realm of HPC. It's reasonable to assume their PathForward money will go into accelerating the development of a future XC supercomputer platform of some sort. But for vendors like HPE or IBM, their volume business is in the enterprise space. It’s not entirely clear what IBM will be working on for PathForward, but it’s a fair bet that it will be some advanced version of the IBM Power/NVIDIA GPU “Minsky” server that is the basis for the pre-exascale Summit and Sierra supercomputers for the DOE. For PathForward, HPE appears to be focusing on its Memory-Driven Computer architecture, aka “The Machine.” It features technologies such as photonics interconnects, a new memory fabric, and advanced non-volatile memory, all of which can be folded into the company’s HPC product line.
For the chip vendors, it’s safe to assume that the PathForward money will be applied to future HPC processors whose predecessors are already in the pipeline today. For NVIDIA, that almost certainly means a post-Volta GPU of some sort, and for Intel, a future Xeon Phi beyond the third-generation Knights Hill. AMD is the odd one out here, since its current crop of chips is used very little in HPC machinery right now. However, its APU technology, which integrates x86 CPUs and Radeon GPUs on the same die, has a lot of potential as an energy-efficient platform for heterogeneous computing, given the right software support. It would make sense that the DOE wants to accelerate this product line for exascale computing, especially given that no other chip vendor is currently pursuing this type of processing design.
The PathForward contracts will end in 2019, which corresponds to the time when the DOE labs will begin acquiring exascale systems for delivery in the 2022-2023 timeframe. However, at least one exascale system of “novel architecture” is scheduled to be delivered in 2021. Of course, that could theoretically apply to any of the hardware being developed under PathForward. A clearer picture of which specific technologies these six vendors are pursuing for this work will likely emerge over the next year. Stay tuned.