(S-030) Automated Generation of Computational Models from Scientific Publications Using AI
Sunday, October 19, 2025
7:00 AM - 5:00 PM MDT
Location: Colorado A
Jean-Baptiste Gourlet – Scientific Software Development – Nova In Silico; Elodie Prevot – Scientific Software Engineering – Nova In Silico; Jérémy Villard – Scientific Software Engineering – Nova In Silico
Objectives: To introduce a novel AI-driven workflow that automates the conversion of scientific publications into computational models, with the aim of making the implementation of models from existing literature more reproducible and more efficient.
Methods:
- AI Agent Integration: Utilizing an AI agent within the jinkō modeling and simulation platform to process scientific publications containing model descriptions, including LaTeX-formatted equations and parameter data.
- OCR and Data Extraction: Employing the Mathpix API to perform Optical Character Recognition (OCR) on PDFs, extracting LaTeX equations and parameter tables.
- Iterative Model Construction: Leveraging Large Language Models (LLMs) to interact with jinkō's API, iteratively building and validating the computational model through function calls and embedded tools. The current focus is on ODE-based models.
- Unit Analysis and Annotation: Extracting unit information from source papers, converting it into a standardized SI-compliant format for automated unit checking within the modeling platform, and annotating equations with links to their specific locations in the source document for traceability.
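The iterative construction step can be pictured as a propose-and-validate loop. The sketch below is only an illustration under stated assumptions: the tool names (`add_species`, `add_parameter`, `add_ode`), the `Model` container, and the `scripted_proposer` that stands in for the LLM are all hypothetical and do not reflect jinkō's actual API. Each round applies the proposed tool calls, validates the model, and feeds any errors back to the proposer.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Model:
    species: dict = field(default_factory=dict)     # name -> initial value
    parameters: dict = field(default_factory=dict)  # name -> value
    odes: dict = field(default_factory=dict)        # species name -> RHS expression


# Hypothetical tools the agent may call (assumed names, not a real API).
def add_species(model, name, initial):
    model.species[name] = initial


def add_parameter(model, name, value):
    model.parameters[name] = value


def add_ode(model, species, rhs):
    model.odes[species] = rhs


TOOLS = {"add_species": add_species, "add_parameter": add_parameter, "add_ode": add_ode}


def validate(model):
    """Return a list of problems; an empty list means the model is consistent."""
    errors = []
    known = set(model.species) | set(model.parameters)
    for target, rhs in model.odes.items():
        if target not in model.species:
            errors.append(f"ODE target '{target}' is not a declared species")
        # Every identifier on the right-hand side must be declared.
        for symbol in sorted(set(re.findall(r"\b[A-Za-z_]\w*", rhs))):
            if symbol not in known:
                errors.append(f"undefined symbol '{symbol}' in d{target}/dt")
    for name in model.species:
        if name not in model.odes:
            errors.append(f"species '{name}' has no ODE")
    return errors


def scripted_proposer(errors):
    """Stand-in for an LLM: drafts the model first, then reacts to errors."""
    if errors is None:
        return [
            ("add_species", {"name": "A", "initial": 10.0}),
            ("add_ode", {"species": "A", "rhs": "-k_el * A"}),
        ]
    calls = []
    for err in errors:
        if "'k_el'" in err:
            calls.append(("add_parameter", {"name": "k_el", "value": 0.3}))
    return calls


def build(proposer, max_rounds=5):
    """Apply proposed tool calls and re-validate until the model is consistent."""
    model, errors = Model(), None
    for round_no in range(1, max_rounds + 1):
        for tool_name, args in proposer(errors):
            TOOLS[tool_name](model, **args)
        errors = validate(model)
        if not errors:
            return model, round_no
    raise RuntimeError(f"model still invalid after {max_rounds} rounds: {errors}")


model, rounds = build(scripted_proposer)
print(rounds, model.parameters)  # 2 {'k_el': 0.3}
```

In this toy run the first draft omits the parameter `k_el`, the validator reports it, and the second round supplies it. A real agent would replace the scripted proposer with LLM function calls and the validator with the platform's own checks.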
Results: User feedback indicates that the workflow reduces the time needed to import a model from the scientific literature from several days to a few hours, most of which is spent on unit adjustments. This streamlines the validation and use of published models. Models built through this workflow can be used directly within jinkō or exported in interoperable formats such as SBML or Julia code, facilitating broader reuse. Current development focuses on increasing interaction between the AI agent and the modeler to improve the precision of information extraction.
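To illustrate the kind of unit adjustment that remains, here is a minimal sketch of SI normalization and a dimensional check for one ODE term. It assumes a small hand-written unit table and a simple `a/b*c^n` unit syntax; the names `UNITS`, `to_si`, and `rate_is_consistent` are hypothetical and this is not the platform's unit checker.

```python
import re

# symbol -> (factor to SI, exponents of the base dimensions (kg, m, s, mol))
UNITS = {
    "kg": (1.0, (1, 0, 0, 0)),
    "g": (1e-3, (1, 0, 0, 0)),
    "mg": (1e-6, (1, 0, 0, 0)),
    "m": (1.0, (0, 1, 0, 0)),
    "L": (1e-3, (0, 3, 0, 0)),
    "mL": (1e-6, (0, 3, 0, 0)),
    "s": (1.0, (0, 0, 1, 0)),
    "min": (60.0, (0, 0, 1, 0)),
    "h": (3600.0, (0, 0, 1, 0)),
    "mol": (1.0, (0, 0, 0, 1)),
}

# One token: optional '*' or '/', a unit symbol, optional integer power.
TOKEN = re.compile(r"([*/]?)\s*([A-Za-z]+)(?:\^(-?\d+))?")


def to_si(unit):
    """Parse e.g. 'mg/L/h' or 'L*h^-1' into (SI scale factor, dimension exponents).

    A leading '1', as in '1/h', is simply skipped by the tokenizer.
    """
    scale, dims = 1.0, [0, 0, 0, 0]
    for op, symbol, power in TOKEN.findall(unit):
        factor, base = UNITS[symbol]  # KeyError flags a unit missing from the table
        exponent = int(power) if power else 1
        if op == "/":
            exponent = -exponent
        scale *= factor ** exponent
        dims = [d + b * exponent for d, b in zip(dims, base)]
    return scale, tuple(dims)


def rate_is_consistent(state_unit, factor_units):
    """Check that a product term of dX/dt has the dimension of X per time."""
    _, state_dims = to_si(state_unit)
    expected = tuple(d - t for d, t in zip(state_dims, (0, 0, 1, 0)))
    actual = [0, 0, 0, 0]
    for unit in factor_units:
        _, dims = to_si(unit)
        actual = [a + d for a, d in zip(actual, dims)]
    return tuple(actual) == expected


print(to_si("mg/L/h")[1])                             # (1, -3, -1, 0)
print(rate_is_consistent("mg/L", ["1/h", "mg/L"]))    # True
print(rate_is_consistent("mg/L", ["L/h", "mg/L"]))    # False
```

The last call fails because a clearance in L/h multiplied by a concentration gives mass per time, not concentration per time; the term would need dividing by a volume. Mismatches of this kind are what a modeler would resolve by hand after import.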
Conclusions: Integrating AI agents with OCR and LLM capabilities facilitates the automated generation of computational models from scientific publications. This approach improves reproducibility, accelerates model implementation, and provides researchers with efficient tools to leverage existing literature in their computational work.