Minisymposium Presentation

DGEMM Emulation Using INT8 Matrix Engines and its Rounding Error Analysis

Wednesday, June 18, 2025
10:00 - 10:30 CEST
Climate, Weather and Earth Sciences
Chemistry and Materials
Computer Science and Applied Mathematics
Humanities and Social Sciences
Engineering
Life Sciences
Physics

Presenter

Yuki Uchino - RIKEN Center for Computational Science

Yuki Uchino is a postdoctoral researcher at RIKEN R-CCS. He received his Ph.D. in engineering from Shibaura Institute of Technology in 2024. His research interests include reliable computing, numerical linear algebra, and highly accurate algorithms.

Description

Modern architectures are equipped with high-performance matrix engines optimized for the low-precision matrix multiplications used in machine learning models. Fully leveraging these engines is the key to achieving superior performance in numerical algorithms. This study designs methods for emulating DGEMM (double-precision general matrix multiplication) using INT8 matrix engines. The Ozaki scheme, a highly accurate matrix multiplication algorithm based on error-free transformations, performs a higher-precision matrix multiplication through multiple lower-precision matrix multiplications and higher-precision matrix additions. Ootomo et al. implemented the Ozaki scheme using INT8 matrix engines with the aim of achieving both sufficient accuracy and high performance. We propose alternative approaches that improve performance by reducing the numbers of lower-precision matrix multiplications and higher-precision matrix additions. Numerical experiments demonstrate the accuracy of the results, and performance benchmarks evaluate the proposed approaches. These approaches are expected to be even more efficient on next-generation architectures. We also provide a rounding error analysis of the proposed methods.
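As a rough illustration of the slicing idea behind the Ozaki scheme, the NumPy sketch below splits each FP64 matrix into small-integer slices that fit in int8, multiplies the slices with exact integer accumulation, and sums the partial products in FP64. It is a minimal sketch under stated assumptions, not the implementation of Ootomo et al. or of the proposed methods: the function names `split_int8` and `emulated_dgemm`, the 6-bit slice width, and the default of 8 slices are illustrative choices, and the INT8 matrix engine is only mimicked by casting slices to int32 before the product.

```python
import numpy as np


def split_int8(X, num_slices, bits=6, axis=1):
    """Split X into int8 slices along rows (axis=1) or columns (axis=0).

    Returns (slices, e) such that, up to the truncated tail,
    X ~= 2**e * sum_i 2**(-bits * (i + 1)) * slices[i].
    Hypothetical helper written for this illustration.
    """
    amax = np.max(np.abs(X), axis=axis, keepdims=True)
    amax = np.where(amax == 0, 1.0, amax)   # avoid trouble with all-zero rows/columns
    _, e = np.frexp(amax)                   # amax = m * 2**e with 0.5 <= m < 1
    R = np.ldexp(X, -e)                     # exact power-of-two scaling, |R| < 1
    slices = []
    for _ in range(num_slices):
        R = R * 2.0 ** bits                 # exact: shift the next `bits` bits up
        S = np.trunc(R)                     # integer part, |S| <= 2**bits - 1
        slices.append(S.astype(np.int8))
        R = R - S                           # exact remainder, |R| < 1 again
    return slices, e


def emulated_dgemm(A, B, num_slices=8, bits=6):
    """Approximate A @ B using products of int8 slices (illustrative only)."""
    A_sl, eA = split_int8(A, num_slices, bits, axis=1)   # row-wise exponents
    B_sl, eB = split_int8(B, num_slices, bits, axis=0)   # column-wise exponents
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(num_slices):
        # Skip pairs with i + j >= num_slices: their contribution is tiny.
        for j in range(num_slices - i):
            # Stand-in for an int8 matrix engine with int32 accumulation.
            # With 6-bit slices each term is below 2**12, so the sum is exact
            # in int32 for any realistic inner dimension.
            P = A_sl[i].astype(np.int32) @ B_sl[j].astype(np.int32)
            # Higher-precision (FP64) scaling and accumulation.
            C += np.ldexp(P.astype(np.float64), -bits * (i + j + 2))
    return np.ldexp(C, eA + eB)             # undo the row and column scaling


rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, (16, 32))
B = rng.uniform(-1.0, 1.0, (32, 8))
C = emulated_dgemm(A, B)
print("max abs difference from A @ B:", np.max(np.abs(C - A @ B)))
```

With `s` slices per matrix, this sketch performs `s * (s + 1) / 2` integer products (36 for `s = 8`) and the same number of FP64 accumulations, which gives a sense of why reducing both counts matters for performance.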

Authors