Minisymposium Presentation
From Reactive Debugging to Proactive Detection: ML for Performance-Aware Software Development
Presenter
Dr. Tanzima Islam is an Assistant Professor in the Department of Computer Science at Texas State University. She earned her Ph.D. in Computer Engineering from Purdue University and served as a postdoctoral scholar at Lawrence Livermore National Laboratory (LLNL). Her research centers on developing software tools and data-driven techniques to identify and alleviate performance challenges in high-performance computing (HPC) applications. Dr. Islam's contributions have been widely recognized, including a 2025 NSF CAREER Award, a 2022 DOE Early Career Award, a 2019 R&D 100 Award, an LLNL Science and Technology Award, and the Presidential Seminar Award from Texas State University. Her research has attracted funding from federal agencies, national laboratories, and industry partners, including the National Science Foundation, the U.S. Department of Energy, LLNL, ORNL, and AMD. She is also a DOE SRP fellow and has initiated collaborations with Brookhaven, LLNL, LBNL, and ORNL, engaging as visiting faculty, presenting invited seminars, and mentoring graduate students to support workforce development. Beyond academia, Dr. Islam co-founded BWCSE (https://bwcse.wordpress.com), the first research and mentoring platform in Bangladesh dedicated to providing research and career development opportunities for female computer science and engineering students in developing countries.
Description
Software performance evolves over time, yet traditional debugging and profiling remain reactive, costly, and disconnected from development workflows. Performance drift, the gradual degradation in execution efficiency caused by code modifications, often goes undetected until it causes significant slowdowns, forcing late-stage debugging and costly fixes. This talk presents a vision for AI/ML-driven proactive performance-drift detection, where models continuously monitor software evolution and identify inefficiencies before they degrade execution. By combining static analysis (abstract syntax trees) with dynamic insights from nightly tests, this framework enables early detection of performance-impacting changes. Traditional ML approaches require full model retraining whenever code changes or new runtime data become available, making them impractical for fast-moving development cycles. Few-shot learning eliminates this overhead by allowing models to update incrementally with minimal new data. Attention-based representation learning further enhances interpretability by prioritizing performance-critical features, enabling more targeted interventions. This framework supports two key decision-making processes: (1) developers receive automated feedback on whether a code change improves or degrades performance, enabling early intervention; and (2) the resulting insights guide hardware configuration choices and runtime parameter tuning. The approach can be integrated seamlessly into CI/CD pipelines so that software not only remains correct but also stays efficient as it evolves.
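To make the idea of combining static (AST-derived) features with nightly-test runtime data concrete, the following is a minimal illustrative sketch, not the presented framework: it uses simple hand-picked structural counts and a fixed slowdown threshold in place of the few-shot and attention-based models described in the talk, and the function names and tolerance value are hypothetical placeholders.

```python
# Minimal sketch: flag a potential performance drift in a CI-style check by
# pairing simple AST-derived static features with runtimes from nightly tests.
# This is an illustration only; the actual framework uses learned models.
import ast

def static_features(source: str) -> dict:
    """Count a few structural features that often correlate with execution cost."""
    tree = ast.parse(source)
    counts = {"loops": 0, "calls": 0, "max_depth": 0}

    def visit(node, depth):
        if isinstance(node, (ast.For, ast.While)):
            counts["loops"] += 1
        if isinstance(node, ast.Call):
            counts["calls"] += 1
        counts["max_depth"] = max(counts["max_depth"], depth)
        for child in ast.iter_child_nodes(node):
            visit(child, depth + 1)

    visit(tree, 0)
    return counts

def drift_detected(baseline_runtime: float, new_runtime: float,
                   old_src: str, new_src: str, tolerance: float = 0.10) -> bool:
    """Flag a change when measured runtime regresses beyond the tolerance and
    the static structure grew in a way that could explain the slowdown."""
    slowdown = (new_runtime - baseline_runtime) / baseline_runtime
    old_f, new_f = static_features(old_src), static_features(new_src)
    structure_grew = any(new_f[k] > old_f[k] for k in old_f)
    return slowdown > tolerance and structure_grew

if __name__ == "__main__":
    old = "def f(xs):\n    return sum(xs)\n"
    new = ("def f(xs):\n"
           "    total = 0\n"
           "    for x in xs:\n"
           "        for y in xs:\n"
           "            total += x * y\n"
           "    return total\n")
    # Runtimes would come from nightly test measurements; values here are illustrative.
    print(drift_detected(baseline_runtime=1.00, new_runtime=1.35,
                         old_src=old, new_src=new))
```

In the envisioned framework, the threshold-based rule above would be replaced by models that update incrementally from a few new nightly-test samples and that weight performance-critical code features via attention, so the same check can run automatically on every commit within a CI/CD pipeline.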