P34 - Optimizing the ECsim Plasma Code for Exascale Architectures: GPU Acceleration, Portability, and Scalability

Description

This work presents the adaptation of the plasma code ECsim to future exascale architectures. The code has three main blocks: the particle mover, moment gathering, and the field solver. The first two are the most computationally demanding, so we focused on optimizing them for GPU acceleration using OpenACC directives. Our approach prioritized GPU readiness with minimal code restructuring. The legacy CPU code makes extensive use of C++ structures and templates, which hinder a seamless GPU implementation. To overcome this, we managed data transfers manually through CUDA API calls. Performance profiling on NVIDIA GPUs shows a speedup of 5x to 9x over the CPU implementation in a node-to-node comparison. Scaling tests on multiple supercomputers demonstrate ECsim's scalability, with above 80% efficiency on up to 1024 GPUs in both weak and strong scaling tests for adequately sized problems.

We further extended this work to also support OpenMP target directives. Because of the memory management strategy adopted for the GPU port, this extension required minimal effort, and it enhances the portability of ECsim across GPU architectures. A comparative analysis on NVIDIA GPUs highlights the code's portability and shows a significant speedup over the CPU with OpenMP target directives as well. Similar work is underway on an AMD GPU system at EuroHPC.
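As a rough illustration of the directive-based approach described above, the following minimal sketch offloads a single particle loop with either OpenACC or OpenMP target directives, selected at compile time. Everything in it is an assumption made for illustration: the `Particles` structure, the `push_particles` function, and the simple explicit push in a uniform field are hypothetical and are not ECsim's data layout or mover. For brevity the sketch lets the directives move the data through `copy` and `map` clauses, rather than managing transfers through CUDA API calls as the abstract describes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays particle container (not ECsim's layout).
struct Particles {
    std::vector<double> x;  // positions
    std::vector<double> v;  // velocities
};

// Advance every particle by one step in a uniform electric field E:
//   v += (q/m) * E * dt;   x += v * dt
// This is a deliberately simple explicit push, NOT ECsim's mover.
void push_particles(Particles& p, double q_over_m, double E, double dt) {
    assert(p.x.size() == p.v.size());
    const std::size_t n = p.x.size();

    // Directives work on raw pointers and array sections, so the C++
    // containers are unwrapped before the offloaded loop.
    double* x = p.x.data();
    double* v = p.v.data();

    // The same loop body is annotated for whichever model the compiler
    // enables; with neither enabled, it runs as a plain serial CPU loop.
#if defined(_OPENACC)
    #pragma acc parallel loop copy(x[0:n], v[0:n])
#elif defined(_OPENMP)
    #pragma omp target teams distribute parallel for map(tofrom: x[0:n], v[0:n])
#endif
    for (std::size_t i = 0; i < n; ++i) {
        v[i] += q_over_m * E * dt;
        x[i] += v[i] * dt;
    }
}
```

If the arrays were instead allocated on the device by hand, the data clauses would plausibly be replaced by OpenACC's `deviceptr` clause or OpenMP's `is_device_ptr` clause, leaving the loop body unchanged. That is one way a single memory management strategy could serve both directive models.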
