Who am I?
I am a fourth-year PhD student at NC State, working with Dr. Marcelo D’Amorim as a member of the Software Engineering Group. My area of research is Software Testing.
Research Interests
My research focuses on testing learning-enabled systems. I work on testing techniques to find bugs in deep learning libraries such as PyTorch, TensorFlow, and JAX. To date, I have submitted 94 bug reports to PyTorch and TensorFlow, 74 of which have been confirmed (list of bug-reports). I also work on generating complex test cases for Autonomous Driving Systems (ADS).
Publications
- BugsInDLLs: A Database of Reproducible Bugs in Deep Learning Libraries to Enable Systematic Evaluation of Testing Techniques
M M Abid Naziri, Aman Kumar Singh, Feiran Qin, Benjamin Wu, Saikat Dutta, Marcelo d’Amorim
We introduce BugsInDLLs, a curated database of 112 reproducible bugs from popular deep learning libraries like TensorFlow and PyTorch. This benchmark provides the research community with a standard resource to systematically evaluate and improve bug-finding techniques.
Published in ISSTA 2025, Tool Demonstration
[PDF] [Tool]
- Evaluating the Effectiveness of Neurosymbolic Constraint Learning for Testing Deep Learning Library APIs
M M Abid Naziri, Shinhae Kim, Feiran Qin, Saikat Dutta, Marcelo d’Amorim
This paper introduces a novel neurosymbolic technique to test Deep Learning library APIs by dynamically learning their input constraints. By uniquely combining a grammar-guided Large Language Model with an SMT solver, the technique generates more valid and diverse test inputs than prior approaches. Our method significantly improves API and code coverage and has already found 23 new bugs in PyTorch and TensorFlow, 11 of which have been confirmed.
Submitted
- Testing Autonomous Driving Systems with Focused Misbehavior Forecasting
M M Abid Naziri, Stefano Carlo Lambertenghi, Andrea Stocco, Marcelo d’Amorim
This paper introduces a testing technique for autonomous driving systems that identifies potential failures by forecasting and fuzzing “near-miss” events in simulation. By using a misbehavior forecaster to target high-risk scenarios, our approach makes testing more efficient and effective. In our evaluation using the CARLA simulator, our tool, Foresee, finds up to 128% more failures than baselines while being up to 2.49x faster, and it improves the bug-finding capability of state-of-the-art fuzzers by over 93%.
Submitted
- Evaluating the Effectiveness of Coverage-Guided Fuzzing for Testing Deep Learning Library APIs
Feiran Qin, M M Abid Naziri, Saikat Dutta, Marcelo d’Amorim
This work presents the first in-depth study confirming the effectiveness of Coverage-Guided Fuzzing (CGF) for testing Deep Learning library APIs. We introduce FlashFuzz, a novel tool that makes this possible by using Large Language Models (LLMs) to automatically synthesize and repair the required test harnesses. Our approach vastly outperforms state-of-the-art fuzzers in code coverage (up to +212%) and speed (up to 1182x), leading to the discovery of 42 new bugs in PyTorch and TensorFlow.
Submitted
[Preprint] [PDF]