Minisymposium Presentation
Deep Neural Network Inference with Analog In-Memory Computing

Presenter
Manuel Le Gallo joined IBM Research Europe in 2013, where he is currently a Staff Research Scientist in the In-Memory Computing group of the Zurich laboratory. His main research interest is in using phase-change memory devices for non-von Neumann computing. He has co-authored more than 100 scientific papers in journals and conferences, holds 35 granted patents, and has given 15 invited talks. He was appointed IBM Master Inventor in 2019 and 2024 for significant contributions to intellectual property and is a recipient of the MIT Technology Review's 2020 Innovators Under 35 award.
Description
The need to repeatedly shuttle synaptic weight values between memory and processing units has been a key source of energy inefficiency in hardware implementations of artificial neural networks. Analog in-memory computing (AIMC) with spatially instantiated synaptic weights holds great promise for overcoming this challenge by performing matrix-vector multiplications directly within the network weights stored on a chip when executing an inference workload. In this talk, I will first present our latest multi-core AIMC chip in 14-nm complementary metal–oxide–semiconductor (CMOS) technology with backend-integrated phase-change memory (PCM). The fully integrated chip features 64 256×256 AIMC cores interconnected via an on-chip communication network. Experimental inference results on ResNet and LSTM networks will be presented, with all the computations associated with the weight layers and the activation functions implemented on-chip. I will then present our open-source toolkit (https://aihw-composer.draco.res.ibm.com/) for simulating inference and training of neural networks with AIMC. Finally, I will present our latest architectural solutions to increase the weight capacity of AIMC chips towards supporting large language models, as well as alternative solutions suited for low-power edge computing applications.
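As a flavor of the kind of simulation the open-source toolkit enables, the sketch below shows a minimal analog inference layer in Python. It assumes the toolkit underlying the composer interface is the IBM Analog Hardware Acceleration Kit ("aihwkit"); the 256×256 layer size and the noise-model parameter are illustrative choices, not values taken from the talk.

```python
import torch
from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import InferenceRPUConfig
from aihwkit.inference import PCMLikeNoiseModel

# Inference-only analog tile configuration with a PCM-like noise model,
# so the matrix-vector multiplication is simulated with device non-idealities.
rpu_config = InferenceRPUConfig()
rpu_config.noise_model = PCMLikeNoiseModel(g_max=25.0)  # g_max in microsiemens (illustrative)

# A single analog layer: the 256x256 weight matrix resides on a simulated
# crossbar tile, and the forward pass is the in-memory matrix-vector multiply.
layer = AnalogLinear(256, 256, bias=True, rpu_config=rpu_config)

x = torch.rand(1, 256)
y = layer(x)  # analog matrix-vector multiplication with simulated PCM noise
print(y.shape)
```

In a full workflow, such analog layers replace their digital counterparts in a PyTorch network so that inference accuracy can be evaluated under the non-idealities of the analog hardware.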