Emily Nieves: No financial relationships to disclose
Large Language Models (LLMs) can accelerate repetitive, detail-intensive tasks through their natural language processing capabilities. In this talk, we introduce Delineate, a novel platform that uses LLMs to systematically create fit-for-purpose datasets from research literature. Delineate combines domain-specific fine-tuned language models, which extract clinical trial information into a structured representation, with deep learning computer vision models, which digitize quantitative data from plots and graphs, so that comprehensive datasets can be compiled with high fidelity. The platform integrates with established biomedical databases such as PubMed and ClinicalTrials.gov and uses semantic search to identify papers that meet detailed study inclusion criteria. We will present case studies in which Delineate was used to create large-scale datasets formatted for Model-Based Meta-Analysis (MBMA) studies, and we will quantitatively compare its extraction accuracy against GPT-based models and other general-purpose language models. We will also showcase an additional use case: using Delineate to create a custom database from over 900 research papers to accelerate AI model training.