Explainable and Automated Pneumonia Detection from Chest X-Rays using CNNs
DOI:
https://doi.org/10.29304/jqcsm.2026.18.12528

Keywords:
Grad-CAM, Integrated Gradients

Abstract
Chest X-ray is the most common examination for thoracic disease, but its interpretation remains subject to inter-observer variability and workload constraints. This paper presents a reproducible deep learning pipeline, based on DenseNet-121, for distinguishing pneumonia from normal chest X-rays. The dataset was drawn from the NIH ChestX-ray14 corpus, downsampled to 8,500 frontal radiographs (1,050 pneumonia-positive, 7,450 normal), and split at the patient level into training, validation, and test sets. Preprocessing comprised grayscale normalization, resizing, and targeted augmentation; training used early stopping, learning-rate scheduling, class weighting, and post-hoc probability calibration. On the held-out test set, the model achieved ROC-AUC 0.87, PR-AUC 0.72, overall accuracy 93.2%, sensitivity 82.8%, and specificity 94.6%. Calibration improved the Brier score from 0.042 to 0.019 and yielded well-fitted reliability curves. Interpretability was built into inference via Grad-CAM and Integrated Gradients, with explanation faithfulness checked quantitatively (deletion AUC = 0.84, insertion AUC = 0.87, sanity-check pass rate = 98%, pointing-game hit rate = 76%). These results indicate that CNN-based diagnosis can achieve accuracy, interpretability, and reproducibility simultaneously; the proposed framework therefore provides a white-box baseline for clinical examination and for future multi-label thoracic disease detection extensions.
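Two of the pipeline steps named in the abstract, class weighting for the imbalanced split and Brier-score evaluation of probability calibration, can be sketched as follows. This is a minimal illustration, not the authors' code: the class counts mirror the dataset sizes quoted above, while the probabilities and labels are invented toy values.

```python
import numpy as np

def class_weights(counts):
    """Inverse-frequency class weights, normalized so they average to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def brier_score(y_true, p_pred):
    """Mean squared error between predicted probability and binary label."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return float(np.mean((p_pred - y_true) ** 2))

# Counts from the abstract: 7,450 normal vs 1,050 pneumonia-positive.
weights = class_weights([7450, 1050])
print(weights)  # the minority (pneumonia) class receives a ~7x larger weight

# Toy labels/probabilities: a lower Brier score indicates better calibration.
print(brier_score([0, 0, 0, 1], [0.1, 0.2, 0.05, 0.7]))
```

Weighting the loss this way counteracts the roughly 7:1 class imbalance; the Brier score then measures how well the calibrated output probabilities match observed outcome frequencies.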
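The deletion metric reported for explanation faithfulness can be sketched in a few lines. This is an illustrative implementation, not the authors': pixels are zeroed in decreasing order of attributed saliency, the model is re-scored after each chunk, and the area under the resulting score curve is the deletion AUC (a faithful saliency map drives the score down quickly). The `model` and saliency map here are toy stand-ins.

```python
import numpy as np

def deletion_auc(model, image, saliency, steps=10):
    """Area under the model-score curve as salient pixels are removed."""
    order = np.argsort(-saliency.ravel())        # most salient pixels first
    x = image.copy().ravel()
    scores = [model(x.reshape(image.shape))]
    chunk = max(1, len(order) // steps)
    for i in range(0, len(order), chunk):
        x[order[i:i + chunk]] = 0.0              # "delete" by zeroing pixels
        scores.append(model(x.reshape(image.shape)))
    scores = np.asarray(scores)
    # Trapezoidal rule over the normalized deletion fraction [0, 1].
    return float(((scores[:-1] + scores[1:]) / 2.0).mean())

# Toy model: the score is just mean intensity, so removing bright pixels
# (which the fake saliency map ranks highest) lowers the score fastest.
model = lambda img: float(img.mean())
img = np.random.rand(8, 8)
print(deletion_auc(model, img, saliency=img))
```

The insertion AUC quoted in the abstract is the mirror image: pixels are added back in saliency order to a blank image, and a high area under that curve indicates a faithful explanation.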
License
Copyright (c) 2026 Manaaf Abdulredha Yassen

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.