ACMP06 - Understanding HMM Performance for Enhanced HPC Portability
Description
Heterogeneous Memory Management (HMM) simplifies programming for heterogeneous systems, making High-Performance Computing (HPC) devices more accessible to domain scientists; however, it suffers from slow performance compared to other memory management approaches. HMM is an infrastructure provided by Linux that enables simpler, more universal use of non-conventional memory, allowing developers to use multiple devices without runtime APIs for memory allocation and data transfer. This simplification benefits domain scientists by reducing code complexity, easing transitions between systems, and enabling quicker adaptation of legacy code to complex heterogeneous systems, such as those found in HPC centers.

Currently, HMM performs slowly compared to both explicit memory management and Unified Virtual Memory (UVM), a similar infrastructure provided by NVIDIA for its GPUs. UVM requires driver-specific APIs for allocation, but once allocated, the memory can be used by any device. Given the similarity between HMM and UVM, we expect any performance differences to stem from an improper UVM driver implementation, challenges in using HMM correctly, or inefficient algorithms introduced by the abstraction. In our work, we conduct experiments to identify the root causes of these suspected issues and provide insights into their impact.
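To make the contrast concrete, the sketch below shows the three memory-management paths in a minimal CUDA program. It is illustrative only: the kernel, sizes, and variable names are ours, not from this work, and the HMM path assumes a Linux kernel with HMM enabled plus a GPU and driver that support direct access to system-allocated (malloc'd) memory.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Toy kernel: scale every element of x by a.
__global__ void scale(double *x, double a, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const size_t n = 1 << 20;
    const size_t bytes = n * sizeof(double);
    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);

    // (1) Explicit management: separate host/device buffers, manual copies.
    double *h = (double *)malloc(bytes);
    for (size_t i = 0; i < n; ++i) h[i] = 1.0;
    double *d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    scale<<<blocks, threads>>>(d, 2.0, n);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);

    // (2) UVM: a driver-specific allocation call, but a single pointer valid
    //     on host and device; the driver migrates pages on demand.
    double *u = nullptr;
    cudaMallocManaged(&u, bytes);
    for (size_t i = 0; i < n; ++i) u[i] = 1.0;
    scale<<<blocks, threads>>>(u, 2.0, n);
    cudaDeviceSynchronize();
    cudaFree(u);

    // (3) HMM-backed system allocation: ordinary malloc'd memory handed
    //     directly to the kernel, with no runtime allocation or copy APIs.
    //     Assumes HMM support in the kernel, driver, and GPU.
    double *s = (double *)malloc(bytes);
    for (size_t i = 0; i < n; ++i) s[i] = 1.0;
    scale<<<blocks, threads>>>(s, 2.0, n);
    cudaDeviceSynchronize();

    printf("h[0]=%.1f u and s paths complete\n", h[0]);
    free(s);
    free(h);
    return 0;
}

The application logic is identical in all three paths; only the allocation and transfer boilerplate changes. That is why the open question our experiments target is HMM's performance gap rather than its programmability.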