Abstract
Recent advances in medical imaging and natural language processing create new opportunities for automated diagnostic support. Chest X-rays (CXRs) remain the most common imaging modality for screening pulmonary, cardiovascular, and systemic diseases. However, the growing volume of studies and free-text reports can overwhelm clinicians. We propose a unified multimodal retrieval and prediction framework that jointly leverages DICOM-format CXRs and radiology reports by projecting visual and textual features into a shared semantic space. Trained with contrastive and multi-label objectives, the system supports disease classification, case-based retrieval, and explanation of its predictions. Experiments on the Open-I dataset demonstrate strong performance, with a macro AUROC of 0.95 and a macro F1-score of 0.71 across 22 diagnostic categories. The retrieval module attains high ranking quality (MRR and nDCG both above 0.93) with sub-millisecond query latency. A quantitative explainability analysis further shows strong agreement between attention-based and gradient-based attribution maps (Pearson ρ ≈ 0.92), supporting trustworthy clinical decision-making.
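The abstract reports ranking quality (MRR, nDCG) and attribution agreement (Pearson ρ) without defining them. The sketch below shows how these quantities are commonly computed. It is not the authors' evaluation code. The binary relevance rule `shares_label` (a retrieved case counts as relevant if it shares at least one diagnostic label with the query) is a hypothetical assumption. The abstract does not state the actual relevance definition, the nDCG cutoff k, or how attribution maps are compared, so treat all three as assumptions.

```python
import math
from typing import AbstractSet, Sequence


def shares_label(query_labels: AbstractSet[str], case_labels: AbstractSet[str]) -> int:
    """Hypothetical binary relevance: 1 if the two label sets overlap, else 0."""
    return int(bool(query_labels & case_labels))


def reciprocal_rank(relevances: Sequence[int]) -> float:
    """1 / (1-based rank of the first relevant result), or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: Sequence[Sequence[int]]) -> float:
    """MRR: reciprocal rank averaged over all queries."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the top-k results, with a log2(position + 1) discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering of the same list.

    Simplification: the ideal ranking is built only from the retrieved list, which
    assumes every relevant case in the collection appears among the results.
    """
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two flattened attribution maps of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    query = {"cardiomegaly", "effusion"}
    ranked_cases = [{"normal"}, {"effusion"}, {"cardiomegaly", "edema"}]
    rels = [shares_label(query, case) for case in ranked_cases]  # [0, 1, 1]
    print(mean_reciprocal_rank([rels]))  # 0.5
    print(ndcg(rels, k=3))
    print(pearson([0.1, 0.4, 0.9], [0.2, 0.5, 0.8]))
```

In practice, MRR is averaged over every query in the test set, and nDCG is usually reported at a fixed cutoff such as k = 10. The cutoff used in the reported results is not given here.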
Phu Duc Do, Hao Ngoc Nguyen Van, Viet Hoai Vo (2026). Computers in Biology and Medicine, 208, p. 111667.


