Emily Nieves: No financial relationships to disclose
Objectives: Building mathematical models for drug development requires substantial expertise in the relevant biology, drug characteristics, and modeling techniques. This study aims to improve both the efficiency and the quality of model development by implementing an AI agent system that extends Delineate's literature-processing capabilities to generate models from published literature and clinical/preclinical data.
Methods: The previously established Delineate literature copilot includes specialized tools for quantitative pharmacology, featuring a fine-tuned embedding model and AI-powered plot digitization and comprehension [1]. We extended Delineate to support end-to-end generation of new models through an AI agent system. This fully automated system comprises multiple specialized agents, including a modeling agent versed in methodological best practices, a biologist agent that evaluates biological plausibility and completeness, a coding agent that implements the model, and a code review agent that checks correctness and executability. To assess system accuracy while minimizing human-in-the-loop bias, we designed an experiment in which the only inputs were queries, publications on the relevant biological mechanisms, and realistically simulated patient data; no prior modeling literature on the specific topics was supplied. We tested four case studies: a QSP model of GLP-1 and glucose/insulin dynamics, a QSP model of cytokine release following CAR-T cell therapy, a PBPK model of tissue delivery of lipid nanoparticle-encapsulated nucleic acids, and an exposure-response analysis of trastuzumab deruxtecan.
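As a rough illustration of how a multi-agent system like this might hand work between its agents, the following sketch chains four stub functions named after the agents described in the Methods. The sequential order, the shared Draft record, the retry loop, and the review check (which here only verifies that the generated code compiles as Python) are all assumptions made for illustration; Delineate's actual orchestration and its language-model calls are not described in this abstract.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Draft:
    """Hypothetical shared state passed from agent to agent."""
    query: str
    model_spec: str = ""
    biology_notes: list[str] = field(default_factory=list)
    code: str = ""
    review_passed: bool = False
    history: list[str] = field(default_factory=list)


Agent = Callable[[Draft], Draft]


def modeling_agent(d: Draft) -> Draft:
    # Stub: a real agent would propose a model structure from the literature.
    d.model_spec = f"ODE model for: {d.query}"
    d.history.append("modeling")
    return d


def biologist_agent(d: Draft) -> Draft:
    # Stub: a real agent would critique biological plausibility and completeness.
    d.biology_notes.append("check mechanism completeness")
    d.history.append("biologist")
    return d


def coding_agent(d: Draft) -> Draft:
    # Stub: emits a placeholder implementation string.
    d.code = f"# implementation of {d.model_spec}"
    d.history.append("coding")
    return d


def code_review_agent(d: Draft) -> Draft:
    # Stub review: "executable" is approximated as "compiles as Python source".
    try:
        compile(d.code, "<generated>", "exec")
        d.review_passed = True
    except SyntaxError:
        d.review_passed = False
    d.history.append("review")
    return d


# Assumed fixed order; the abstract does not specify how agents are sequenced.
PIPELINE: tuple[Agent, ...] = (
    modeling_agent,
    biologist_agent,
    coding_agent,
    code_review_agent,
)


def run_pipeline(query: str, max_rounds: int = 3) -> Draft:
    """Run all agents in order, repeating until the review passes or rounds run out."""
    d = Draft(query=query)
    for _ in range(max_rounds):
        for agent in PIPELINE:
            d = agent(d)
        if d.review_passed:
            break
    return d
```

For example, `run_pipeline("GLP-1 and glucose/insulin dynamics")` passes review on the first round with the stub agents, so each agent runs exactly once. Keeping all intermediate state in a single record makes each agent's contribution easy to inspect, which is one way to support the scientist-in-the-loop review suggested in the Conclusions.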
Results: We compared Delineate-generated models against published models with identical objectives and input data. We generated ten models for each test case and scored each one with a rubric that evaluated structural equation similarity, biological relevance, parameter selection, alignment with published approaches, and adherence to best practices. Average scores were 68.5%, 71.5%, 73.6%, and 81.3% for the GLP-1, CAR-T, nucleic acid, and exposure-response case studies, respectively. Most deductions resulted from equations that were simplified relative to the published ground-truth models.
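The per-case averages can be read as a two-stage aggregation: score each generated model against the rubric, then average across the models generated for that case study. A minimal sketch follows, assuming each of the five criteria is scored out of 10 points with equal weight. The actual point scales and weights are not stated in this abstract, and the function names are hypothetical.

```python
from statistics import mean

# Criteria named in the abstract; equal weighting is an assumption.
CRITERIA = (
    "structural_equation_similarity",
    "biological_relevance",
    "parameter_selection",
    "alignment_with_published_approaches",
    "best_practice_adherence",
)


def model_score(points: dict[str, float], max_points: float = 10.0) -> float:
    """Percentage score for one generated model (assumed equal weights, 0-10 scale)."""
    missing = set(CRITERIA) - set(points)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    total = sum(points[c] for c in CRITERIA)
    return 100.0 * total / (max_points * len(CRITERIA))


def case_study_average(models: list[dict[str, float]]) -> float:
    """Mean percentage score across all generated models for one case study."""
    return mean(model_score(m) for m in models)
```

With toy inputs that are not study data, a model scoring 8/10 on every criterion gets 80%, a model scoring 6/10 on every criterion gets 60%, and their case-study average is 70%.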
Conclusions: While the AI-generated models showed promising performance, our findings suggest that the best outcomes will come from a scientist-in-the-loop approach, in which modeling experts guide the AI system in refining and adapting modeling approaches to maximize practical utility.
Citations: [1] Nieves E. et al. “Delineate: a Literature Co-Pilot for Quantitative Systems Pharmacology,” 2024