(M-040) Machine Learning and Chemical Language Model-Based QSAR Models for Predicting Plasma and Tissue Half-Lives of Drugs in Cattle Across Various Administration Routes

Monday, October 20, 2025

7:00 AM - 5:00 PM MDT

Location: Colorado A

Zhicheng Zhang – Department of Environmental and Global Health – University of Florida; Lisa Tell – Department of Medicine and Epidemiology – University of California-Davis; Zhoumeng Lin – Department of Environmental and Global Health – University of Florida

Author(s)

Zhicheng Zhang

PhD student
University of Florida, Florida, United States

Disclosure(s):

Zhicheng Zhang: No financial relationships to disclose

Objectives: This study aimed to develop and rigorously evaluate both traditional machine learning (ML)-based and advanced chemical language model-based quantitative structure–activity relationship (QSAR) models for predicting plasma and tissue half-lives of drugs administered through various routes in cattle.

Methods: Data were drawn from the Food Animal Residue Avoidance Databank (FARAD) Comparative Pharmacokinetic Database [1,2], focusing on non-compartmental elimination half-lives in plasma and tissue for drugs administered to cattle by the intravenous, oral, intramuscular, and subcutaneous routes. Two types of QSAR models were developed: (1) 21 traditional ML models, in which four machine learning algorithms (random forest, support vector regression, k-nearest neighbors, and deep neural network [DNN]) [3,4] were implemented with five types of molecular descriptors (RDKit descriptors, extended-connectivity fingerprints [ECFP6], functional-class fingerprints [FCFP6], molecular access system [MACCS] fingerprints, and a comprehensive descriptor combination) [5,6,7]; and (2) a descriptor-free ImprovedChemBERTa model, in which ChemBERTa [8], a transformer-based model designed specifically for chemical informatics tasks, was fine-tuned on our dataset, taking SMILES strings as input and predicting half-lives directly. Internal validation used 5-fold cross-validation, and external validation used an independent test dataset. Performance was measured by the coefficient of determination (R²) and root mean square error (RMSE) [9]. Applicability domain assessments were performed using Williams plots to identify the chemical space within which predictions are reliable [10].
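The validation scheme described above (5-fold cross-validation scored by R² and RMSE) can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the function names, the shuffling seed, the toy half-life values, and the mean-predictor baseline are all assumptions made for illustration.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]


def mean_and_sd(scores):
    """Mean and sample standard deviation of per-fold scores."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
    return mean, math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical half-lives in hours; NOT data from the study.
    y = [2.1, 3.4, 5.0, 7.7, 1.2, 9.8, 4.4, 6.1, 12.5, 8.3]
    fold_r2 = []
    for train_idx, val_idx in k_fold_indices(len(y), k=5):
        # Placeholder model: predict the training-set mean for every compound.
        baseline = sum(y[i] for i in train_idx) / len(train_idx)
        y_val = [y[i] for i in val_idx]
        fold_r2.append(r2_score(y_val, [baseline] * len(y_val)))
    print("cross-validated R2 (mean, SD):", mean_and_sd(fold_r2))
```

Reporting the mean and standard deviation across folds matches the "R² ± SD" form used in the Results; any real model would replace the mean-predictor placeholder.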

Results: Among the traditional ML-QSAR approaches, the DNN model using the combined molecular descriptors achieved the highest predictive performance, with an external test R² of 0.45 and an internal 5-fold cross-validation R² of 0.50 ± 0.14. ImprovedChemBERTa outperformed all traditional models, with an external test R² of 0.72 and an internal cross-validation R² of 0.74. Applicability domain analysis supported the reliability of ImprovedChemBERTa: over 90% of compound predictions fell within the established applicability domain.
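For readers unfamiliar with Williams plots, the sketch below shows one common way to compute the share of predictions inside an applicability domain. The abstract does not state the thresholds used, so the warning leverage h* = 3(p + 1)/n, the ±3 limit on standardized residuals, the intercept column in the design matrix, and all function names are assumptions based on a widely used QSAR convention, not the authors' settings.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leverages(X_train, X_query):
    """Leverage h = x^T (X^T X)^-1 x of each query row against the training set."""
    design = [[1.0] + list(row) for row in X_train]  # intercept column (assumption)
    p1 = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(p1)] for a in range(p1)]
    inv = invert(xtx)
    result = []
    for row in X_query:
        x = [1.0] + list(row)
        result.append(sum(x[a] * inv[a][b] * x[b]
                          for a in range(p1) for b in range(p1)))
    return result


def warning_leverage(n_train, n_descriptors):
    """Conventional warning leverage h* = 3(p + 1) / n (assumed threshold)."""
    return 3.0 * (n_descriptors + 1) / n_train


def standardized_residuals(y_true, y_pred):
    """Residuals divided by their sample standard deviation (a simplification)."""
    res = [t - p for t, p in zip(y_true, y_pred)]
    mean = sum(res) / len(res)
    sd = math.sqrt(sum((r - mean) ** 2 for r in res) / (len(res) - 1))
    return [r / sd for r in res]


def fraction_in_domain(h, std_res, h_star, res_limit=3.0):
    """Share of compounds with h <= h* and |standardized residual| <= res_limit."""
    inside = [hi <= h_star and abs(ri) <= res_limit for hi, ri in zip(h, std_res)]
    return sum(inside) / len(inside)
```

A compound with leverage above h* lies far from the training descriptors, so its prediction is an extrapolation; a large standardized residual flags a response outlier. How descriptors would be defined for a descriptor-free model such as ImprovedChemBERTa is not specified in the abstract.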

Conclusions: Our evaluation underscores the advantages of chemical language model-based QSAR methodologies over traditional descriptor-based ML approaches. By processing raw chemical notation (SMILES strings) directly, without relying on explicit molecular descriptors, ImprovedChemBERTa demonstrated notably higher predictive accuracy and robust generalizability. This study supports the potential of descriptor-free, chemical language-based QSAR models in veterinary pharmacokinetics to enhance regulatory capabilities for ensuring the safety and efficacy of drugs and the safety of animal-derived food products.

Citations: 1. Riviere JE, Craigmill AL, Sundlof SF. 1986. Food Animal Residue Avoidance Databank (FARAD): An automated pharmacologic databank for drug and chemical residue avoidance. Journal of Food Protection. 49(10):826-830.
2. Riviere JE, Tell LA, Baynes RE, Vickroy TW, Gehring R. 2017. Guide to FARAD resources: Historical and future perspectives. Journal of the American Veterinary Medical Association. 250(10):1131-1139.
3. Lin Z, Chou W-C, Cheng Y-H, He C, Monteiro-Riviere NA, Riviere JE. 2022. Predicting nanoparticle delivery to tumors using machine learning and artificial intelligence approaches. International Journal of Nanomedicine. 1365-1379.
4. Wu P-Y, Chou W-C, Wu X, Kamineni VN, Kuchimanchi Y, Tell LA, Maunsell FP, Lin Z. 2025a. Development of machine learning-based quantitative structure–activity relationship models for predicting plasma half-lives of drugs in six common food animal species. Toxicological Sciences. 203(1):52-66.
5. Chung E, Russo DP, Ciallella HL, Wang Y-T, Wu M, Aleksunes LM, Zhu H. 2023. Data-driven quantitative structure–activity relationship modeling for human carcinogenicity by chronic oral exposure. Environmental Science & Technology. 57(16):6573-6588.
6. Ciallella HL, Russo DP, Aleksunes LM, Grimm FA, Zhu H. 2021. Predictive modeling of estrogen receptor agonism, antagonism, and binding activities using machine- and deep-learning approaches. Laboratory Investigation. 101(4):490-502.
7. Jia X, Wen X, Russo DP, Aleksunes LM, Zhu H. 2022. Mechanism-driven modeling of chemical hepatotoxicity using structural alerts and an in vitro screening assay. Journal of Hazardous Materials. 436:129193.
8. Ahmad W, Simon E, Chithrananda S, Grand G, Ramsundar B. 2022. ChemBERTa-2: Towards chemical foundation models. arXiv preprint arXiv:2209.01712.
9. Mi K, Chou W-C, Chen Q, Yuan L, Kamineni VN, Kuchimanchi Y, He C, Monteiro-Riviere NA, Riviere JE, Lin Z. 2024. Predicting tissue distribution and tumor delivery of nanoparticles in mice using machine learning models. Journal of Controlled Release. 374:219-229.
10. Wang Y-n, Chen J, Li X, Wang B, Cai X, Huang L. 2009. Predicting rate constants of hydroxyl radical reactions with organic pollutants: Algorithm, validation, applicability domain, and mechanistic interpretation. Atmospheric Environment. 43(5):1131-1135.

Keywords: Half-life, Quantitative structure–activity relationship models (QSAR models), Machine learning