P21 - Graph Abstraction for Efficient Scheduling of Asynchronous Workloads on GPU
Description
Many computational physics simulations need to efficiently execute asynchronous workloads (FEM assembly, linear algebra, etc.) that can be organised as a Directed Acyclic Graph (DAG). Scheduling these workloads ad hoc adds complexity to the code and may not fully exploit the available execution resources (e.g. a multi-GPU node). By contrast, structuring the code around a graph abstraction exposes the whole computational graph to the compiler/driver ahead of execution, enabling optimisations that are not possible when tasks are launched one by one. Such a graph abstraction should therefore be expressible either at compile time or at runtime, and it must be mappable to the most suitable backend scheduler to maximise resource usage. We contribute to the Kokkos implementation of this graph abstraction, which enables performance-portable, single-source code. More specifically, this poster will focus on recent contributions to Kokkos::Graph that bring it closer to P2300, the C++ std::execution proposal for managing asynchronous execution on generic execution resources. We will demonstrate the benefits of using Kokkos::Graph in terms of both performance and software design, and present several examples of varying complexity, including a FEM simulation of electromagnetic wave scattering.
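To illustrate why declaring the whole DAG before execution helps a scheduler, here is a minimal, single-threaded C++ sketch. It is an assumption-laden illustration, not the Kokkos::Graph API: the names TaskGraph, add_node, add_edge, submit and example_assembly_then_solve are hypothetical. Tasks and dependencies are recorded first; only submit() runs them, using Kahn's topological sort grouped into levels. All nodes in one level are mutually independent, so a real backend (e.g. one stream per GPU) could dispatch them concurrently. Here they simply run in order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal task graph (NOT the Kokkos::Graph API).
struct TaskGraph {
  struct Node {
    std::string name;
    std::function<void()> work;          // may be empty (no-op node)
    std::vector<std::size_t> successors; // nodes that depend on this one
    std::size_t num_predecessors = 0;
  };
  std::vector<Node> nodes;

  // Declare a task without running it; returns its id.
  std::size_t add_node(std::string name, std::function<void()> work) {
    Node n;
    n.name = std::move(name);
    n.work = std::move(work);
    nodes.push_back(std::move(n));
    return nodes.size() - 1;
  }

  // Declare that `to` may only start after `from` has finished.
  void add_edge(std::size_t from, std::size_t to) {
    assert(from != to && "a self-loop would make the graph cyclic");
    nodes.at(from).successors.push_back(to);
    nodes.at(to).num_predecessors++;
  }

  // Execute the whole graph at once (Kahn's algorithm, level by level).
  // Returns the levels: each level holds ids whose predecessors all lie in
  // earlier levels, i.e. tasks that could run concurrently.
  std::vector<std::vector<std::size_t>> submit() {
    std::vector<std::size_t> remaining(nodes.size());
    std::vector<std::size_t> ready;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
      remaining[i] = nodes[i].num_predecessors;
      if (remaining[i] == 0) ready.push_back(i);
    }
    std::vector<std::vector<std::size_t>> levels;
    std::size_t executed = 0;
    while (!ready.empty()) {
      std::vector<std::size_t> next;
      for (std::size_t id : ready) {
        if (nodes[id].work) nodes[id].work(); // sequential stand-in for a backend
        ++executed;
        for (std::size_t s : nodes[id].successors)
          if (--remaining[s] == 0) next.push_back(s);
      }
      levels.push_back(ready);
      ready = std::move(next);
    }
    if (executed != nodes.size()) throw std::runtime_error("graph has a cycle");
    return levels;
  }
};

// Hypothetical FEM-like example: two independent assembly kernels feed one solve.
std::vector<std::vector<std::size_t>>
example_assembly_then_solve(std::vector<std::string>& log) {
  TaskGraph g;
  auto a = g.add_node("assemble_A", [&log] { log.push_back("assemble_A"); });
  auto b = g.add_node("assemble_b", [&log] { log.push_back("assemble_b"); });
  auto s = g.add_node("solve", [&log] { log.push_back("solve"); });
  g.add_edge(a, s);
  g.add_edge(b, s);
  return g.submit(); // levels: {assemble_A, assemble_b}, then {solve}
}
```

Because the dependencies are known before anything runs, the scheduler can see that the two assembly tasks share a level and may overlap, whereas code that launches each kernel as soon as it is written cannot exploit that without manual bookkeeping.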