
Designing Energy-Aware MPI Communication Library: Opportunities and Challenges

By: TOP500 Team

Dhabaleswar K. (DK) Panda, Ohio State University

Power is considered the major impediment to designing next-generation exascale systems. In recent years, the TOP500 and Green500 lists have been focusing on both performance and power consumption. To address the power challenge, researchers and engineers are proposing solutions along multiple directions, including: 1) exploring revolutionary architectures that compute at near-threshold voltage (NTV) to minimize leakage power; 2) developing user-controlled mechanisms (power levers) such as dynamic voltage and frequency scaling (DVFS) and core-idling; 3) increasing the efficiency of cooling subsystems; 4) extending job schedulers and resource management schemes to optimize energy consumption; 5) optimizing the throughput of a system under a strict power budget; and 6) reducing the energy consumption of an application by optimizing its computation kernels and increasing data locality.

However, without exception, all of these approaches treat the communication runtime as a black box with regard to energy consumption. Several of these techniques use DVFS to reduce the energy consumption of the communication phase of an application. However, such coarse-grained approaches lead to inefficient communication performance and hence increase the total execution time of the application. This leads to an open challenge: Can new techniques be designed to reduce the energy consumption of communication runtimes themselves? If feasible, such techniques have the potential to deliver significant energy savings in conjunction with complementary state-of-the-art techniques on next-generation exascale systems.
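For context, the DVFS lever mentioned above is typically exercised on Linux through the cpufreq sysfs interface. The following is a minimal sketch of such a lever; the sysfs path is standard, but the example assumes the "userspace" governor is active and the process has write permission, and the frequency values are purely illustrative:

```c
#include <stdio.h>

/* Write a target frequency (in kHz) for one core via the cpufreq
 * "userspace" governor. Returns -1 if the file cannot be opened
 * (wrong governor, missing permission, or no cpufreq support). */
static int set_cpu_khz(int cpu, long khz)
{
    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed", cpu);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%ld", khz);
    fclose(f);
    return 0;
}

/* e.g., drop core 0 to 1.2 GHz around a communication phase,
 * then restore a nominal 2.4 GHz:
 *     set_cpu_khz(0, 1200000);
 *     ... communication phase ...
 *     set_cpu_khz(0, 2400000);                                  */
```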
 
The Message Passing Interface (MPI) is the de facto communication runtime for most current-generation HPC systems. In a recently published paper [1], selected as a finalist in the best student paper category at SC15, authors from The Ohio State University and Pacific Northwest National Laboratory asked two fundamental questions: 1) Can MPI communication runtimes be designed to be energy-aware? and 2) Can energy be saved during MPI calls without a loss in performance?

To answer these questions, the authors proposed a set of designs that exploit the slack in MPI calls to save energy by applying a low-power lever such as DVFS and/or core-idling. The challenge, however, is determining when to apply a power lever so that energy savings are maximized with no impact on performance. The authors analyzed the behavior of the different internal communication protocols used by MPI and proposed a set of designs to achieve fine-grained performance-energy trade-offs.
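To make the idea concrete, below is a minimal sketch of exploiting slack inside a blocking wait. This is an illustration of the general technique, not the authors' actual design: set_power_lever() is a hypothetical stand-in for a DVFS or core-idling mechanism such as the one sketched above.

```c
#include <mpi.h>

/* Hypothetical lever: a real design would trigger DVFS or core-idling
 * here; this stub only marks where the transitions would happen. */
static void set_power_lever(int low)
{
    (void)low;
}

int energy_aware_wait(MPI_Request *req, MPI_Status *status)
{
    int done = 0;
    MPI_Test(req, &done, status);    /* fast path: no slack, skip the lever */
    if (done)
        return MPI_SUCCESS;

    set_power_lever(1);              /* slack detected: enter low-power state */
    int rc = MPI_Wait(req, status);  /* wait for the slow peer at reduced power */
    set_power_lever(0);              /* restore nominal state before returning */
    return rc;
}
```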

The design also incorporates a user-defined parameter that sets a threshold on the maximum allowed performance degradation. The proposed designs save as much energy as possible inside the MPI communication runtime while guaranteeing that performance degradation never exceeds the user-specified value. For instance, with the Graph500 application kernel, the authors demonstrated that the MPI runtime can achieve 41 percent energy savings with minimal impact on performance (less than 4 percent) at 2,048 processes.
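A hypothetical sketch of how such a bound might gate the lever decision: engage the lever only when the estimated slack covers the lever's switching overhead, and when that overhead, amortized over the operation, stays within the user's budget. The names and the simple cost model here are assumptions for illustration, not MVAPICH2-EA internals.

```c
/* User-specified bound on tolerated slowdown, e.g., at most 5 percent. */
static double max_degradation = 0.05;

/* Engage the lever only if (a) the expected slack is long enough to
 * amortize the cost of switching power states, and (b) that switching
 * cost, relative to the operation's expected duration, stays within
 * the user's degradation budget. */
int should_apply_lever(double est_slack_us, double lever_overhead_us,
                       double op_time_us)
{
    return est_slack_us > lever_overhead_us &&
           lever_overhead_us <= max_degradation * op_time_us;
}
```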

The above study was done by modifying the MVAPICH2 MPI runtime [2]. Subsequent to this study, the MVAPICH2 team members have incorporated the proposed designs into an initial production-ready energy-aware runtime, known as MVAPICH2-EA [3]. In order to measure energy savings for MPI applications using the MVAPICH2-EA stack, the MVAPICH2 team members have also designed an OSU Energy Monitoring Tool (OEMT) [4]. Both MVAPICH2-EA and OEMT are publicly available. The MVAPICH2 team members are also exploring designs of collective algorithms using new transport protocols to save energy on InfiniBand clusters [5].
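As a rough illustration of how per-node energy can be measured around an MPI region (a generic sketch, not OEMT itself), Linux exposes cumulative RAPL package-energy counters, in microjoules, under the powercap sysfs tree:

```c
#include <stdio.h>

/* Read the cumulative package-energy counter in microjoules.
 * Note: the counter wraps periodically, and a multi-node measurement
 * must aggregate one reading per node, not per MPI rank. */
static long long read_energy_uj(void)
{
    FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
    long long uj = -1;
    if (f) {
        if (fscanf(f, "%lld", &uj) != 1)
            uj = -1;
        fclose(f);
    }
    return uj;
}

/* usage:
 *     long long e0 = read_energy_uj();
 *     ... communication phase ...
 *     long long e1 = read_energy_uj();
 *     printf("energy: %.3f J\n", (e1 - e0) / 1e6);              */
```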


[1] A. Venkatesh, A. Vishnu, K. Hamidouche, N. Tallent, D. K. Panda, D. Kerbyson, and A. Hoisie, A Case for Application-Oblivious Energy-Efficient MPI Runtime, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15), Nov 2015 (Best Student Paper Finalist)

[2] http://mvapich.cse.ohio-state.edu/

[3] http://mvapich.cse.ohio-state.edu/downloads/

[4] http://mvapich.cse.ohio-state.edu/tools/oemt/

[5] H. Subramoni, A. Venkatesh, K. Hamidouche, K. Tomko, and D. K. Panda, Impact of InfiniBand DC Transport Protocol on Energy Consumption of All-to-all Collective Algorithms, 23rd International Symposium on High-Performance Interconnects (HOTI 2015), Aug 2015