Rapid Machine Learning-Driven Detection of Pesticides and Dyes Using Raman Spectroscopy

SDG4 - Quality Education
SDG9 - Technology, Innovation and Infrastructure Development

Abstract

The extensive use of pesticides and synthetic dyes poses critical threats to food safety, human health, and environmental sustainability, necessitating rapid and reliable detection methods. Raman spectroscopy offers molecularly specific fingerprints but suffers from spectral noise, fluorescence background, and band overlap, which limit its real-world applicability. Here, we propose MLRaman, a deep learning framework that combines ResNet-18 feature extraction with advanced classifiers (XGBoost, SVM, and their hybrid integration) to detect pesticides and dyes from Raman spectra. With the CNN–XGBoost model, MLRaman achieved a predictive accuracy of 97.4% and a perfect AUC of 1.0, while the CNN–SVM model provided competitive results with robust class-wise discrimination. Dimensionality reduction analyses (PCA, t-SNE, UMAP) confirmed the separability of Raman embeddings across 10 analytes, comprising 7 pesticides and 3 dyes. Finally, we developed a user-friendly Streamlit application for real-time prediction, which successfully identified unseen Raman spectra from both our independent experiments and literature sources, underscoring strong generalization capacity. This study establishes MLRaman as a scalable, practical model for multiresidue contaminant monitoring, with significant potential for deployment in food safety and environmental surveillance.
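
The two-stage design described in the abstract (a feature extractor followed by a separate classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: PCA stands in for the ResNet-18 feature extractor so that the example stays small, the classifier is a scikit-learn SVM, and the spectra, band positions, noise levels, and class count are all invented for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical Raman shift axis in cm^-1 (range and resolution are assumptions).
shifts = np.linspace(400.0, 1800.0, 500)


def synthetic_spectrum(peak_centers):
    """Toy spectrum: Gaussian bands + sloping fluorescence-like background + noise."""
    bands = sum(np.exp(-0.5 * ((shifts - c) / 8.0) ** 2) for c in peak_centers)
    slope = (shifts - shifts[0]) / (shifts[-1] - shifts[0])
    background = rng.uniform(0.0, 0.5) * slope
    noise = rng.normal(0.0, 0.05, shifts.size)
    return bands + background + noise


# Three made-up "analytes", each defined by invented band positions.
classes = {0: [620, 1180, 1510], 1: [780, 1360, 1650], 2: [520, 1000, 1580]}
X = np.array([synthetic_spectrum(p) for p in classes.values() for _ in range(60)])
y = np.repeat(list(classes), 60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stage 1: PCA as a stand-in feature extractor (the paper uses ResNet-18).
# Stage 2: an SVM classifier trained on the extracted features.
model = make_pipeline(PCA(n_components=10), SVC(kernel="rbf"))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

On real spectra the feature extractor and classifier would be swapped for the ones named in the abstract, and performance would depend on the measured data rather than on these well-separated toy bands.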

Thai Binh, Q.T., Thuan Phuoc, L., Xuan Hai, P., Phan, T.B., Thu, V.T.H. and Tuan Hung, N. (2026) Journal of Chemical Information and Modeling, 66(7), pp. 3803–3813.

DOI: https://doi.org/10.1021/acs.jcim.6c00396