Nov. 14, 2018
By: Andrew Jones, VP of HPC Consulting & Services at NAG
It used to be easy to understand what supercomputing was: powerful computers running science and engineering applications. We might have debated how powerful a computer had to be to count as a supercomputer or HPC system. We might have quibbled about whether the science or engineering simulation was "big enough". But these debates were really just pedantic types keeping themselves busy. Generally, there was a workable consensus on what supercomputing was.
However, as we look at the non-traditional machines being stuffed into recent TOP500 lists, the role of cloud computing, and the surge of AI and data analytics stories, the arguments about what constitutes HPC have become louder.
Why do we care what gets classed as HPC? Apart from the inherent urge of humans to categorize our world into tribes, silos and labels?
Consider the TOP500 list, which has been used for over 25 years to track the HPC community. It's not perfect, of course, but it would be totally useless for the HPC community if positions 1-500 were filled with the many clusters making up the enormous Google, AWS, and Azure compute empires. For example, in my review of the June 2018 list, I discussed how the inclusion of 124 duplicate entries and 238 “cloud” systems potentially distorted the statistics of the list. However, it is far from clear how to moderate such entries effectively and fairly.
The visible behavior of the HPC community, as reflected in conferences, the media, funding agencies, and grant writing, is driven by the urge to appear trendy and to chase new topics such as AI, cloud, FPGAs, and quantum computing. Is the disproportionate prominence of these topics effectively starving the traditional science and engineering applications of conference opportunities, R&D focus, funding, or future talent? We must find a way to ensure that the vast swathes of “traditional” HPC use cases still get the attention that they need.
But can we really draw clear lines between applications of supercomputers that are HPC and those that aren’t? For example, consider data analytics. The intelligence agencies have always been accepted as supercomputing sites, and yet I suspect their dominant applications are some form of data analytics rather than CFD or molecular dynamics. The very large HPC clusters in the finance sector might be running partial differential equations, but these model market data rather than scientific simulations. The same goes for the large-scale data processing of the particle physics community. Indeed, you could probably have a healthy debate about where seismic processing (arguably the largest commercial HPC use case) fits on the data-processing-versus-scientific-simulation spectrum. In practice, HPC centers across academia and industry are blurring the lines and working with both traditional HPC user communities and new application areas such as machine learning.
So, if we can’t mark our turf based on application areas, what about scale?
We could probably get a very strong community consensus that running an application on a single desktop CPU is not HPC. The same goes if that single CPU is in a server down the hall. Why should that change if the application is being run on a single CPU of a machine that just happens to have thousands of similar CPUs being used by other people and applications? Following this path leads us to the conclusion that merely running a code on a node of a supercomputer or cluster doesn’t intrinsically make it “supercomputing”. The application has to make some meaningful use of the supercomputer’s differentiated capabilities (e.g., scale, interconnect, parallel processing) to earn the supercomputing label.
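To make that distinction concrete, here is a minimal sketch, purely illustrative and not drawn from any particular system or application mentioned above: a tiny MPI program (standard MPI calls, built with mpicc and launched with mpirun) whose result depends on the ranks communicating over the interconnect. Run on a single core it is just another serial job; spread across many nodes, it at least exercises the capabilities that set a supercomputer apart.

    /* Illustrative sketch only: each rank computes a partial value and the
     * ranks combine their results over the interconnect.
     * Build: mpicc -o sum sum.c
     * Run:   mpirun -n 4 ./sum   (or across many nodes via the site's scheduler) */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

        /* Each rank contributes a partial result (here, simply its own rank id). */
        double local = (double)rank;
        double total = 0.0;

        /* The reduction is where the ranks genuinely use the interconnect. */
        MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Sum across %d ranks: %.0f\n", size, total);

        MPI_Finalize();
        return 0;
    }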
But how much scale is enough? Two nodes? One hundred nodes? What if the application runs on a single GPU instead of a single CPU?
To resolve this, perhaps I can turn to a definition of HPC that I often use when helping my consulting clients: focus on the user impact. If it is "super" for that user, meaning a step change from the other computing or modelling they could do, then it counts as HPC. So even a single GPU or a handful of compute nodes could qualify here. But to earn the label supercomputing, it has to pass a more stringent test: it must not only be a step change in capability for that specific user, but also be recognizable as a step change for most users in that discipline.
In the same way, using “the cloud” doesn’t by itself make something high performance computing. Obviously, running one VM in the cloud isn’t HPC, and neither is running a handful of VMs. Using the cloud for technical computing at scale, or in some “super” way, will probably earn the label HPC. Cloud computing is just another means of delivering compute capability, like an “on-prem” cluster or a supercomputer. Like any other HPC delivery solution, cloud brings benefits (such as business flexibility or capacity) and limitations (such as performance or different operational behaviors), and, depending on your scenario, it can be either more expensive or the best-value solution.
In summary, any application has the right to be called HPC, as long as there is a distinct element of scale or performance. So, data analytics and machine learning simply become other use cases alongside climate simulations, aerodynamics, and materials science. Cloud is just another means of bringing specific business benefits to the HPC landscape. And so on.
My HPC consulting team is lucky enough to advise and help customers across all the application areas and technologies mentioned here, and we have found a simple way forward on this labeling issue. We encourage people to stop stressing over the sanctity of the HPC silo and instead get on with bringing the right mix of technologies and processes together to deliver the best impact for each business or science need, using HPC or whatever other label the user best understands.
Andrew Jones can be contacted via twitter (@hpcnotes), via LinkedIn (https://www.linkedin.com/in/andrewjones/), or via the TOP500 editor.