Publications
Sunil Kumar, "Power Scheduling on Multicore Multiprocessor Systems for Maximizing Throughput and Fairness", in PhD Symposium, Euro-Par'25, Dresden, Germany, August 2025.
[abstract]
Hardware overprovisioning is a widely used technique that improves the average power utilization of computing systems by capping the processor’s power consumption. However, applying uniform power caps across sockets in multiprocessor systems can degrade the performance of co-running applications because their workloads vary. Existing solutions primarily aim to enhance power utilization and system throughput, but they ignore fairness when sharing surplus power between applications. They also rely on the frequency settings the processor selects under a power cap, which may be suboptimal. This thesis addresses these limitations through a two-stage power management approach.
First, it proposes a system that dynamically and bidirectionally redistributes surplus power among applications to improve overall system throughput while ensuring fairness. When the system identifies an application with surplus power, it donates that power to other applications exhibiting high CPU utilization. When donor applications later enter a high-CPU-utilization phase, they are rewarded by having a portion of the transferred power returned to them. Second, it improves performance by configuring optimal frequency settings for each application under a power cap rather than relying on default processor settings. Once performance is optimized, the system further reallocates power to balance performance gains across applications, which improves fairness between them. The proposed system is evaluated on a quad-socket, 72-core Intel Xeon server using diverse HPC application mixes and power cap settings. The results demonstrate significant improvements in both system throughput and application-level fairness.
Sunil Kumar and Vivek Kumar, "KarmaPM: Reward-Driven Power Manager", in Proceedings of the European Conference on Parallel Processing (Euro-Par'25), Dresden, Germany, August 2025.
[abstract], [paper]
Hardware overprovisioning is a widely used technique that improves the average power utilization of computing systems by capping the processor’s power consumption. However, applying a uniform power cap across the sockets of a multiprocessor system can significantly impact co-running applications because their workloads vary. This paper introduces KarmaPM, a novel power management library for co-running applications on multiprocessor systems. KarmaPM is independent of the parallel programming model and is based on application power donation phases. It dynamically redistributes power bidirectionally across the sockets to improve overall system throughput for co-running applications while maintaining fairness between them.
KarmaPM periodically profiles the CPU utilization of each socket. When it detects a socket underutilizing its CPU resources, it donates the surplus power from this donor socket to the other sockets (receivers) exhibiting high CPU utilization. When donor sockets enter a high-CPU-utilization phase, KarmaPM employs a reward power scheme that returns a portion of the transferred power to them. We evaluated KarmaPM across various exascale proxy application mixes and power caps on a four-socket, 72-core Intel Cooper Lake server. Our results show that KarmaPM improved the geometric mean system throughput by 13.2% at a lower power cap and by 6.6% at a higher power cap. Compared to an existing power manager, KarmaPM improved the geometric mean system throughput by 12.5% and 4.4% at these respective power caps.
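The donate-then-reward cycle described in this abstract can be illustrated with a minimal sketch. It is not KarmaPM's implementation or API: the `Socket` fields, the utilization thresholds, the reward fraction, and the minimum cap are all hypothetical values chosen only to make the idea concrete.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions, not values from the paper.
LOW_UTIL = 0.5         # below this, a socket is treated as a potential donor
HIGH_UTIL = 0.9        # at or above this, a socket is treated as busy
REWARD_FRACTION = 0.5  # share of outstanding donated power returned per interval
MIN_CAP_W = 60.0       # a donor's cap is never lowered below this

@dataclass
class Socket:
    cap_w: float             # current power cap (watts)
    used_w: float            # measured power draw (watts)
    util: float              # CPU utilization in [0, 1]
    donated_w: float = 0.0   # power given away and not yet returned
    received_w: float = 0.0  # power currently held from donors

def rebalance(sockets):
    """Run one profiling interval: reward former donors, then donate surplus."""
    # Reward phase: a former donor that is now busy gets part of its power back,
    # taken proportionally from the sockets that currently hold donated power.
    for s in sockets:
        if s.util >= HIGH_UTIL and s.donated_w > 0:
            holders = [r for r in sockets if r is not s and r.received_w > 0]
            pool = sum(r.received_w for r in holders)
            reward = min(REWARD_FRACTION * s.donated_w, pool)
            if reward <= 0:
                continue
            for r in holders:
                take = reward * r.received_w / pool
                r.cap_w -= take
                r.received_w -= take
            s.cap_w += reward
            s.donated_w -= reward
    # Donation phase: an underutilized socket splits its surplus among busy ones.
    for s in sockets:
        if s.util < LOW_UTIL:
            surplus = s.cap_w - max(s.used_w, MIN_CAP_W)
            receivers = [r for r in sockets if r is not s and r.util >= HIGH_UTIL]
            if surplus <= 0 or not receivers:
                continue
            share = surplus / len(receivers)
            for r in receivers:
                r.cap_w += share
                r.received_w += share
            s.cap_w -= surplus
            s.donated_w += surplus
```

In this sketch the sum of all caps stays constant, so power only moves between sockets and the system-wide budget is never exceeded. A real power manager would additionally have to read utilization and apply caps through a platform interface, which is omitted here.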
Sunil Kumar, Vivek Kumar, and Sridutt Bhalachandra, "RPM: Reward Power Manager for Power Distribution over a Cluster", in Student Research Symposium, HiPC'24.
[paper]
Sunil Kumar, Vivek Kumar, and Sridutt Bhalachandra, "Energy Efficiency under Limited Power Budget", in Student Research Symposium, HiPC'22.
[paper]
Sunil Kumar, Akshat Gupta, Vivek Kumar, and Sridutt Bhalachandra, "Cuttlefish: Library for Achieving Energy Efficiency in Multicore Parallel Programs", in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'21), St. Louis, MO, USA, November 2021.
[abstract], [paper]
A tightly capped power budget is a challenge for exascale computing. Dynamic Voltage and Frequency Scaling (DVFS) and Uncore Frequency Scaling (UFS) are two widely used techniques for limiting an HPC application’s energy footprint. However, existing approaches fail to provide a unified solution that works across different types of parallel programming models and applications.
This paper proposes Cuttlefish, a programming-model-oblivious C/C++ library for achieving energy efficiency in multicore parallel programs running on Intel processors. An online profiler periodically reads model-specific registers to discover a running application’s memory access pattern. Using a combination of DVFS and UFS, Cuttlefish then dynamically adapts the processor’s core and uncore frequencies, thereby improving energy efficiency. In an evaluation on a 20-core Intel Xeon processor using a set of widely used OpenMP benchmarks that exercise several irregular-tasking and work-sharing pragmas, Cuttlefish achieves geometric mean energy savings of 19.4% with a 3.6% slowdown.
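To make the idea of adapting core and uncore frequencies to the memory access pattern more concrete, here is a minimal sketch of a common heuristic. It is not Cuttlefish's actual policy or API: the metric (last-level cache misses per instruction), the threshold, the frequency ranges, and the function name `choose_frequencies` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical frequency ranges in GHz; real limits depend on the processor.
CORE_MIN_GHZ, CORE_MAX_GHZ = 1.2, 2.3
UNCORE_MIN_GHZ, UNCORE_MAX_GHZ = 1.2, 3.0

# Hypothetical threshold: intervals with at least this many cache misses per
# instruction are treated as memory-bound.
MEM_BOUND_THRESHOLD = 0.01

@dataclass(frozen=True)
class FreqSetting:
    core_ghz: float
    uncore_ghz: float

def choose_frequencies(llc_misses: int, instructions: int) -> FreqSetting:
    """Pick core and uncore frequencies for the next interval from counter deltas."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    misses_per_instruction = llc_misses / instructions
    if misses_per_instruction >= MEM_BOUND_THRESHOLD:
        # Memory-bound: cores mostly wait on memory, so lower the core frequency
        # and keep the uncore (caches, memory controller) fast.
        return FreqSetting(CORE_MIN_GHZ, UNCORE_MAX_GHZ)
    # Compute-bound: keep the cores fast and save energy in the uncore.
    return FreqSetting(CORE_MAX_GHZ, UNCORE_MIN_GHZ)
```

For example, `choose_frequencies(5_000_000, 100_000_000)` falls in the memory-bound branch under these assumed values, while `choose_frequencies(10_000, 100_000_000)` falls in the compute-bound branch. A two-way split is the simplest possible policy; a production library would likely use finer-grained steps and measured feedback.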