A unified multimodal framework for chest X-ray retrieval and disease prediction for clinical decision support

Abstract

Recent advances in medical imaging and natural language processing create new opportunities for automated diagnostic support. Chest X-rays (CXRs) remain the most common imaging modality for screening pulmonary, cardiovascular, and systemic diseases; however, the growing volume of studies and free-text reports can overwhelm clinicians. We propose a unified multimodal retrieval and prediction framework that jointly leverages DICOM-format CXRs and radiology reports by projecting visual and textual features into a shared semantic space. Trained with contrastive and multi-label objectives, the system supports disease classification, case-based retrieval, and explainable predictions. Experiments on the Open-I dataset demonstrate strong performance, with a macro AUROC of 0.95 and a macro F1-score of 0.71 across 22 diagnostic categories. The retrieval module attains high ranking quality (MRR and nDCG both above 0.93) with sub-millisecond query latency. Quantitative explainability analysis further shows strong agreement between attention-based and gradient-based attribution maps (Pearson ρ ≈ 0.92), supporting trustworthy clinical decision-making.
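For readers unfamiliar with the ranking metrics quoted above, the following is a minimal sketch of how MRR and nDCG are commonly computed from per-query relevance lists. It is not the authors' evaluation code: the function names, the binary relevance labels, the optional cutoff `k`, and the toy queries are all assumptions made for illustration, and the abstract does not state which cutoff or relevance grading the paper uses.

```python
import math


def reciprocal_rank(ranked_relevance):
    """Return 1/rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """Average the reciprocal rank over all queries (MRR)."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


def dcg(ranked_relevance, k=None):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    items = ranked_relevance if k is None else ranked_relevance[:k]
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(items, start=1))


def ndcg(ranked_relevance, k=None):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(ranked_relevance, reverse=True), k)
    return dcg(ranked_relevance, k) / ideal if ideal > 0 else 0.0


# Hypothetical toy data: each list holds the relevance of retrieved cases,
# in ranked order, for one query (1 = shares a diagnosis, 0 = does not).
queries = [[1, 0, 1], [0, 1, 0], [0, 0, 0]]

mrr = mean_reciprocal_rank(queries)
mean_ndcg = sum(ndcg(q) for q in queries) / len(queries)
```

Both metrics lie in [0, 1] and reward placing relevant cases near the top of the ranking; MRR looks only at the first relevant hit, whereas nDCG accounts for every relevant item and its position.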

Phu Duc Do, Hao Ngoc Nguyen Van, Viet Hoai Vo (2026). Computers in Biology and Medicine, 208, p. 111667.

DOI: https://doi.org/10.1016/j.compbiomed.2026.111667