Abstract:
The structural diversity of emerging contaminants and the absence of analytical standards for certain compounds restrict traditional targeted approaches to substances with predefined reference standards. Consequently, high-resolution mass spectrometry (HRMS)-based suspect and non-targeted screening has become indispensable for the comprehensive identification of emerging contaminants in environmental matrices. However, traditional analysis methods struggle to process the massive datasets generated by HRMS, and complex mass spectrometry data analysis and substance identification have become core challenges in environmental analytical chemistry. As a powerful tool for data processing and pattern recognition, machine learning offers considerable potential for improving the efficiency and accuracy of suspect and non-targeted screening of emerging contaminants. In this paper, we systematically review recent advances and innovative applications of machine learning across the full suspect and non-targeted screening workflow, focusing on raw mass spectrometry data pre-processing, intelligent molecular formula assignment, retention time prediction, and quantitative concentration analysis, and we illustrate how conventional machine learning and deep learning algorithms improve screening efficiency and accuracy. Future efforts should prioritize integrating machine learning into the entire suspect and non-targeted screening workflow to enable more holistic investigations of the environmental exposure characteristics of emerging contaminants.