Abstract:
As the number and variety of chemical substances continue to grow and their potential environmental risks intensify, traditional toxicity testing methods can no longer meet the demands of high-throughput screening and systematic risk assessment. Artificial intelligence (AI) technologies, particularly big data and machine learning, have shown great promise in predicting the toxicity of chemical substances. In this paper, we systematically review advances in the application of key AI technologies to the construction of chemical toxicity prediction models, covering data collection, cleaning, and preprocessing; molecular descriptor computation; feature extraction and selection; model training and validation; definition of the model applicability domain; and interpretability analysis. Moreover, by integrating major domestic research findings in AI-assisted toxicity prediction with our team's research practice, we present our team's key achievements in data standardization, molecular feature engineering, model development, applicability domain characterization, and model interpretability. Finally, we propose future development directions that address the challenges facing current toxicity prediction models: data heterogeneity, multimodal data fusion, prediction of complex toxicity endpoints, and interpretation of predicted results. This paper aims to promote the in-depth application of AI technology in chemical toxicity prediction and to provide a theoretical foundation and technical support for the efficient and reliable risk assessment of environmental pollutants.