June 12, 2017
By: Michael Feldman
When assessing the fastest supercomputers in the world, system performance is king, while the I/O componentry that feeds these computational beasts often escapes notice. But a small group of storage devotees working on a project at the Virtual Institute for I/O (VI4IO) wants to change that.
VI4IO is a non-profit organization whose mission is to build a global community of I/O enthusiasts devoted to raising the visibility of high-performance storage and providing resources for both users and buyers. It does this through outreach and information exchanges at conferences like ISC and SC, and it maintains a website to help spread the word.
An important element of VI4IO’s mission now involves the creation of a High Performance Storage List (HPSL), also known as the IO-500. Like its TOP500 counterpart, the list purports to track the top systems in the world, but in this case from the perspective of storage. The TOP500, you’ll note, collects no information on this critical subsystem.
Essentially, the IO-500 provides a collection of I/O metrics and other data associated with some of the largest and fastest storage systems on the planet. The effort is being spearheaded by Julian Kunkel, a researcher at DKRZ (the German Climate Computing Center), along with Jay Lofstead of Sandia National Labs and John Bent of Seagate Government Solutions.
Both performance and capacity data are captured, along with other relevant information. Since the work to compile this data began just over a year ago, the list today contains a mere 33 entries. The eventual goal is to provide a knowledge base for the top 500 storage systems and track them over time to provide a historical reference, as has been done with the TOP500 list.
Kunkel says the motivation to compile the list came from the desire to provide a central data repository for these big storage systems -- information that is now spread across hundreds of websites in different formats, languages, and levels of detail. Another incentive for the list was to create some standard I/O benchmarks that would be widely accepted by storage makers and users. According to Kunkel, a lot of people are doing great work in measuring and analyzing storage systems, but they tend to be isolated and work off their own metrics.
Although it’s loosely based on the TOP500 concept, the IO-500 data is compiled quite differently. For starters, there is no formal submission process. Individuals familiar with the storage at their own sites can input and edit the metrics and other data via a wiki website. So rather than a new list being released every six months, this one is updated continuously.
Such data, by definition, is difficult to verify, but the list makers encourage submitters to include references to web pages or public presentations to back up the credibility of their submission. “The integrity of the people is the key,” admits Kunkel.
The current list is very much a work in progress. Much of it has been compiled by Kunkel himself, with some help from graduate students, based on online material or correspondence with system owners. Even among the 33 current entries, none has a complete profile. Part of this is because many of these storage systems aren’t documented in much detail. But most of the missing data can be attributed to the fact that the list allows for just about any attribute you can think of – from metadata rate and cache size to procurement costs and annual TCO.
Mandatory data is limited to things like the name of the institution, the name of the supercomputer, the storage capacity, and the storage system name (actually the file system name, since, unlike the supercomputers themselves, storage systems are usually unnamed).
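To make the split between required and optional fields concrete, here is a minimal sketch of what an entry might look like as a plain data record. The field names and values are hypothetical, chosen purely for illustration, and do not reflect the HPSL’s actual schema.

    # A hypothetical HPSL-style entry; field names and values are illustrative,
    # not the wiki's real schema.
    mandatory_fields = {
        "institution": "Example Computing Center",   # name of the institution
        "supercomputer": "ExampleMachine",           # name of the supercomputer
        "file_system": "example-lustre",             # storage (file system) name
        "storage_capacity_PB": 50,                   # storage capacity in petabytes
    }

    optional_fields = {
        "metadata_rate_kops": None,     # left blank until a submitter fills it in
        "cache_size_TB": None,
        "procurement_cost_usd": None,
        "annual_tco_usd": None,
    }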
One of the unique strengths of the IO-500 list is that it’s interactive. You can click on the site, the system, or the file system to reveal more detailed aspects of those areas. These secondary pages are not just a collection of metrics; they can also provide explanations of how those metrics were derived. You can also select non-mandatory data fields to be included in the list, like sustained read performance and cache size.
What is especially useful is the ability to re-sort the list by clicking on any one of the metrics-based fields – storage capacity (the default) or any of the non-mandatory metrics selected. Even if you’re not interested in storage per se, you can re-sort the list based on metrics like system peak performance or memory capacity.
There’s also a “derived metrics” page where you can generate correlations between different storage aspects or other elements of the system. So, for example, you could compute things like the ratio of storage to memory capacity or the I/O performance per drive, and then sort on that metric. There’s a wealth of possibilities for different types of analysis.
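As a rough illustration of how such a derived metric could be computed and sorted on, here is a short sketch with made-up numbers; the field names and figures are hypothetical and not drawn from the actual list.

    # Hypothetical entries with made-up figures, purely to illustrate a derived metric.
    systems = [
        {"name": "SystemA", "storage_capacity_PB": 40.0, "memory_capacity_PB": 1.0},
        {"name": "SystemB", "storage_capacity_PB": 20.0, "memory_capacity_PB": 0.25},
    ]

    # Derived metric: ratio of storage capacity to memory capacity.
    for s in systems:
        s["storage_to_memory"] = s["storage_capacity_PB"] / s["memory_capacity_PB"]

    # Re-sort on the derived metric, largest ratio first.
    systems.sort(key=lambda s: s["storage_to_memory"], reverse=True)

    for s in systems:
        print(s["name"], round(s["storage_to_memory"], 1))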
The current weakness of the list, besides the paucity of entries, is the lack of standard metrics. Unlike the TOP500 with its High Performance Linpack (HPL), there is no standard set of storage performance benchmarks mandated by the list. As a result, the various submitted metrics, like peak I/O or sustained reads and writes, may not be directly comparable between systems.
To rectify that, the IO-500 team has come up with three performance benchmarks: a metadata or small object benchmark, an “optimistic” I/O benchmark, and a “pessimistic” I/O benchmark. The pessimistic benchmark is still under development.
At some point, they would like to distill all this work into a single metric, which would likely combine the three benchmarks, weighted in some manner that makes sense. That would provide a standard performance measurement on which to rank storage systems, analogous to HPL in the TOP500 rankings.
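How the three scores would be folded together has not been decided; one plausible approach, offered here only as an assumption rather than the team’s design, would be a weighted geometric mean of the metadata score and the two I/O scores, along these lines:

    # A sketch of one possible combination (an assumption, not the IO-500 team's
    # design): a weighted geometric mean of the three benchmark scores.
    def combined_score(metadata, optimistic_io, pessimistic_io,
                       weights=(1.0, 1.0, 1.0)):
        scores = (metadata, optimistic_io, pessimistic_io)
        total = sum(weights)
        result = 1.0
        for score, weight in zip(scores, weights):
            result *= score ** (weight / total)
        return result

    # Made-up example: 50 kIOPS metadata, 900 GB/s optimistic, 120 GB/s pessimistic.
    print(combined_score(50.0, 900.0, 120.0))

A geometric mean has the appealing property that no single benchmark can dominate the combined figure, but again, the actual weighting scheme remains an open question for the IO-500 team.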
The immediate challenge, though, is to get more people involved in submitting storage entries, since the principal focus right now is to collect enough data to give the list the critical mass to make it a worthwhile resource. That’s one reason why Kunkel and his two IO-500 cohorts are hosting a BoF session at the ISC High Performance conference next week. Also to be discussed will be how the standard benchmark efforts are progressing, although according to Kunkel, there’s no rush to force something on the community.
“There have been previous efforts at developing an I/O benchmark, and they all failed,” he says. “And that’s why we are going a bit slower. We don’t want this one to fail.”