P11 - Enabling Lattice QCD Normalizing Flows in HPC Infrastructures


Description

The Horizon Europe project interTwin aims to develop a prototype of a multidisciplinary Digital Twin Engine applicable across a broad spectrum of scientific disciplines, including High Energy Physics (HEP) and the environmental and climate sciences. As part of this effort, we explore the extent to which Machine Learning (ML) methods can speed up lattice gauge theory simulations in challenging regions of parameter space, where Monte Carlo methods suffer from severe critical slowing down. The overall goal is to progress towards the design of a digital twin of a HEP detector, in which lattice QCD could in the future provide realistic simulations of the Standard Model. We exploit tools developed within interTwin, notably itwinai, to scale up our simulations and deploy them on HPC systems, while also enabling several additional code features. The itwinai toolkit provides functionality for distributed machine learning on HPC and supports several distributed frameworks (DeepSpeed, Horovod, and PyTorch DistributedDataParallel), which implement different GPU-to-GPU communication protocols suited to different infrastructures. In addition, itwinai offers a profiling feature built on the PyTorch profiler backend, which makes it possible to separate the time spent in communication from the time spent in computation. This lets us identify bottlenecks and optimize the code accordingly.
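
The abstract does not specify how the ML model is coupled to the Monte Carlo chain. One widely used approach for normalizing flows in lattice field theory is to use the trained flow as the proposal distribution of an independence Metropolis-Hastings sampler. Each proposal is drawn independently of the current state, so autocorrelations are governed by the acceptance rate rather than by local update dynamics, and the accept/reject step keeps the sampling exact even when the flow is imperfect. The Python sketch below illustrates only that accept/reject logic on a hypothetical one-dimensional toy problem: the "flow" is a stand-in Gaussian with the wrong width, and `action`, `flow_sample`, and `flow_mcmc` are illustrative names, not part of itwinai or of the authors' code.

```python
import math
import random


def action(x: float) -> float:
    """Hypothetical toy action S(x) = x^2 / 2, so the target is p(x) ~ exp(-S(x))."""
    return 0.5 * x * x


def flow_sample(rng: random.Random, sigma: float) -> tuple[float, float]:
    """Stand-in for a trained flow (an assumption): draw x ~ N(0, sigma^2), return (x, log q(x))."""
    x = rng.gauss(0.0, sigma)
    log_q = -0.5 * (x / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return x, log_q


def flow_mcmc(n_steps: int, sigma: float = 1.5, seed: int = 0) -> tuple[list[float], float]:
    """Independence Metropolis-Hastings with flow proposals.

    A proposal x' replaces the current state x with probability
    min(1, w(x') / w(x)), where log w = -S - log q. This corrects for the
    mismatch between the proposal q and the target p.
    """
    rng = random.Random(seed)
    x, log_q = flow_sample(rng, sigma)
    log_w = -action(x) - log_q
    chain: list[float] = []
    accepted = 0
    for _ in range(n_steps):
        x_new, log_q_new = flow_sample(rng, sigma)
        log_w_new = -action(x_new) - log_q_new
        # exp(min(0, .)) is at most 1, so there is no overflow; rng.random() lies in [0, 1).
        if rng.random() < math.exp(min(0.0, log_w_new - log_w)):
            x, log_w = x_new, log_w_new
            accepted += 1
        chain.append(x)  # on rejection the current state is repeated
    return chain, accepted / n_steps


if __name__ == "__main__":
    chain, acc = flow_mcmc(20_000)
    mean = sum(chain) / len(chain)
    var = sum((c - mean) ** 2 for c in chain) / len(chain)
    print(f"acceptance={acc:.2f} mean={mean:.3f} var={var:.3f}")
```

In a real lattice application, `x` would be a full gauge-field configuration and `log_q` would come from the flow's change-of-variables formula. Training such a flow is the kind of GPU workload that a toolkit such as itwinai is designed to distribute.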

Presenter(s)

Matteo Bunino, CERN

Matteo Bunino earned a double MSc degree in Data Science and Computer Engineering from the Polytechnic University of Turin (Italy) and EURECOM (France). He worked at Huawei's Munich Research Center (MRC) on AI-powered malware analysis, using reinforcement learning, NLP, and graph machine learning. Matteo is currently a fellow in the IT department at CERN, where he works on interTwin, a European project aimed at developing a unified digital twin engine (DTE) for science. In particular, Matteo is the main developer of "itwinai", a toolkit for advanced MLOps on cloud and HPC that aims to simplify access to large-scale distributed ML and hyperparameter optimization for scientific use cases. Matteo is also part of CERN openlab, where he is investigating digital twin applications with NVIDIA Omniverse and benchmarking heterogeneous computing.

Marina Marinkovic, ETH Zurich

Marina Marinkovic is an assistant professor in computational physics at ETH Zurich, where she has led a High Performance Computational Physics research group since February 2021. While studying theoretical physics at the University of Belgrade, Marinkovic spent the final year of her undergraduate degree (2008-2009) at DESY Zeuthen and the University of Graz as a scholarship holder of the Austrian agency for international mobility and cooperation in education, science and research. She obtained her PhD in the Computational Physics Group of Humboldt University of Berlin (2009-2013) and from there went on to a postdoctoral position at the University of Southampton (2012-2014). In recognition of her strong potential for independence and leadership in theoretical particle physics, Marinkovic was awarded a prestigious CERN fellowship (2014-2017), followed by the Hitachi Assistant Professorship at Trinity College Dublin (2016-2019) and a junior professorship at LMU Munich (2020-2021). Besides her work on theoretical aspects of lattice gauge theories, Marinkovic has experience in porting and optimising physics codes on IBM Blue Gene/P at Jülich Supercomputing Centre, HLRN in Berlin and Hannover, HLRS in Stuttgart, Blue Gene/Q in Edinburgh, Altamira in Spain, CSCS in Switzerland, the TH cluster at CERN, the Research IT facilities at TCD in Ireland, and LRZ near Munich, Germany.

Authors