By Brian Sparks, UCF Consortium Marketing Working Group Chair
Modern HPC systems combine very large numbers of compute elements with extremely low-latency interconnection networks. To exploit the capabilities of these architectures and meet their scalability demands, communication software needs to scale and give applications adequate functionality to express their parallelism. Moreover, communication software should add as little overhead as possible so that it does not compromise the native performance of the interconnection network. These requirements make the design of high-performance communication software extremely intricate: it must keep its memory footprint, instruction counts, and cache activity low while meeting stringent performance targets.
High-level programming models for communication (e.g., MPI, SHMEM) can be built on top of middleware such as Portals, GASNet, UCCS, and ARMCI, or they can use lower-level network-specific interfaces, often provided by the vendor. While the former offers high-level communication abstractions and portability across different systems, the latter offers proximity to the hardware and minimizes the overhead of multiple software layers. An effort to combine the advantages of both is UCX, a communication framework for high-performance computing systems. UCX is a project of the UCF Consortium. In recognition of its importance to the future of HPC technologies and applications, UCX received the 2019 R&D 100 Award.
New technologies continue to be developed to support the migration of the data center architecture from the old CPU-centric concept to the data-centric concept. To help drive this effort, the UCF created a new project and announced it at ISC 2020: the Open Smart Network Application Programming Interface (OpenSNAPI) project. The goal of this new project is to help expand the applicability and portability of emerging use cases for smart networking and computational storage in order to enhance supercomputing performance, offload security or virtualization functions, increase storage performance, and more.
The new OpenSNAPI project has quickly picked up momentum and we are seeing new development and contributions to OpenSNAPI across the industry. Developers are seeing for themselves the efficiency gains achieved by offloading network processing to smart adapters, as well as experiencing the incredible flexibility and performance available for other offload activities, such as persistent memory storage. For example, the UCF recently collaborated with Arm and accepted the open source contribution of an OpenSHMEM-based I/O research extension to access persistent memory storage. Provided to the OpenSNAPI project, this open source software enables Smart Networking Adapters to provide real-time access to large datasets and deliver higher application performance for latency-sensitive applications such as fraud detection, cybersecurity analysis, web-scale personalization, and Internet of Things (IoT).
Other projects that the UCF Consortium has spurred are the Unified Collective Communication (UCC) project and the High Performance Compute Availability (HPCA) Benchmark project. The goal of UCC is to provide highly performant and scalable collective operations by leveraging scalable and topology-aware algorithms, software implementation techniques, and In-Network Computing hardware acceleration engines. UCC collaborates with the UCX project and uses UCX’s highly performant point-to-point communication operations and library utilities.
The HPCA Benchmark project is an effort to create a new metric for ranking HPC and AI system performance and capabilities, and is intended as a complement to existing benchmarks, such as HPCC and HPCG. HPCA is designed to exercise computational, data and network access patterns for measuring the compute availability of the system.
All of these projects are seeing tremendous amounts of activity and development, and we look forward to sharing it all with you next month during our upcoming Annual Meeting and Workshop (Nov. 30 – Dec. 3, 2020). The annual “digital” meeting and workshop will cover multiple UCF Consortium topics and projects, including UCX, UCX for Apache Spark, UCC, OpenSNAPI, and other RDMA development, usage, and future directions.
Register today for free and take advantage of this great opportunity to connect with UCF experts and gain insights from AMD, ANL, Arm, Facebook, Huawei, LANL, NVIDIA, ORNL, and more on their development work, future directions, and projects within the UCF Consortium.
We look forward to more exciting contributions to all our projects and invite you and your organization to be a part of the data-centric revolution. Learn more about joining the UCF and becoming a member today.