Mahmud Ahmed Usman1 and Muhammad Tella2
1 Department of Management and Information Technology, Faculty of Management Sciences, Abubakar Tafawa Balewa University, Bauchi, Nigeria
2 Department of Management and Information Technology, Faculty of Management Sciences, Abubakar Tafawa Balewa University, Bauchi, Nigeria
Original language: English
Copyright © 2026 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Explainable Artificial Intelligence (XAI) is essential for deploying complex Computer Vision (CV) models in domains such as medical diagnosis, where transparency and accountability are required. This paper explores a hybrid interpretability framework that balances faithfulness (how well an explanation reflects the model's actual decision) against computational efficiency. We assess three main families of XAI methods: attribution-based (Grad-CAM), perturbation-based (RISE), and transformer-based attention methods. Our evaluation shows that perturbation-based methods such as RISE achieve the highest fidelity (Insertion AUC 0.727, Pointing Game accuracy 91.9%) but are too slow for real-time clinical use (0.05 FPS). Transformer-based attention methods, by contrast, align more closely with expert annotations in medical tasks (IoU 0.099) while running at a moderate speed (25.0 FPS). We therefore propose combining the localisation accuracy of attention-based methods with the runtime efficiency required in clinical settings to produce high-quality, clinically useful saliency maps for medical diagnosis.
Author Keywords: XAI, medical imaging, computer vision, attention, efficiency.
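As a concrete illustration of the fidelity metric reported in the abstract, the following is a minimal sketch of the Insertion AUC procedure (Petsiuk et al., RISE): pixels are revealed on top of a blank baseline in order of decreasing saliency, and the model's confidence curve is averaged. All names here (the `model` callable, the grayscale-image assumption, the step count) are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def insertion_auc(model, image, saliency, baseline=None, steps=20):
    """Insertion-style faithfulness score: progressively reveal the most
    salient pixels over a baseline and average the model's confidence.
    A higher score means the saliency map better identifies the pixels
    the model actually relies on. `model` is any callable mapping an
    HxW array to a scalar confidence (illustrative interface)."""
    h, w = image.shape
    if baseline is None:
        baseline = np.zeros_like(image)            # blank starting canvas
    order = np.argsort(saliency.ravel())[::-1]     # most salient pixels first
    canvas = baseline.copy().ravel()
    flat = image.ravel()
    chunk = max(1, len(order) // steps)
    scores = []
    for i in range(0, len(order) + 1, chunk):
        canvas[order[:i]] = flat[order[:i]]        # reveal top-i pixels
        scores.append(model(canvas.reshape(h, w)))
    # Mean confidence over the reveal curve ~ normalised area under it.
    return float(np.mean(scores))
```

A toy check: with a confidence function that is just the image mean, a saliency map equal to the image itself (bright pixels ranked first) should score higher than its negation (bright pixels ranked last), since revealing informative pixels early lifts the confidence curve sooner.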