(S-032) Exploring Python to Prepare CDISC-Compliant Analysis Datasets for Pharmacometric Analyses

Sunday, October 19, 2025

7:00 AM - 5:00 PM MDT

Location: Colorado A

Casey McBride – Bristol-Myers Squibb; Yu Liu – Bristol-Myers Squibb; Renuka Hegde – Bristol-Myers Squibb; Erin Dombrowsky – Bristol-Myers Squibb; Lu Chen – Bristol-Myers Squibb

Author(s)


Renuka Hegde (she/her/hers)

Associate Director
Bristol Myers Squibb
Monroe Township, New Jersey, United States

Disclosure(s):

Renuka Hegde: No financial relationships to disclose

Objectives: A high-quality, analysis-ready dataset is crucial for any pharmacometric analysis. While SAS and R have traditionally been used to create Population Pharmacokinetic (popPK), Pharmacokinetic-Pharmacodynamic (PKPD), and Exposure-Response (ER) datasets, this abstract explores Python as an alternative programming language for generating CDISC-compliant, analysis-ready pharmacometric datasets.

Methods: Using Jupyter Notebook with the Python 3 kernel and widely used packages such as pandas and NumPy, together with the os and datetime modules, we developed Python programs following the CDISC Implementation Guide to create popPK, PKPD, and ER datasets from ADaM source data in sas7bdat format. We highlight visual and tabular quality control checks integrated within Jupyter Notebook. We will discuss programming steps including datetime processing of SAS-formatted source data, flagging of records with issues, derivation of relative time variables, imputation of missing dosing times, and derivation of baseline covariates and of ER endpoints such as progression-free survival and immune-mediated adverse events. Python functions used to convert lab units and add variable labels to the output dataset will be presented. Python code to handle the stacked structure of events in popPK and PKPD datasets will also be shown.
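As an illustration only, several of the steps named above can be sketched with pandas. This is not the authors' code: the toy records, the imputation rule (first dose plus nominal day offset), the sort convention, the conversion table, and all column and function names (ADTM_SAS, NOMDAY, DTIMPFL, AFRLT, sas_datetime_to_pandas, convert_lab_units) are assumptions chosen for the example. It also assumes the datetime column arrives as numeric SAS seconds since 1960-01-01, which may not be the case if the reader has already converted it.

```python
import numpy as np
import pandas as pd

SAS_EPOCH = pd.Timestamp("1960-01-01")  # SAS datetimes count seconds from this date


def sas_datetime_to_pandas(seconds: pd.Series) -> pd.Series:
    """Convert numeric SAS datetimes to pandas datetimes (missing stays NaT)."""
    return pd.to_timedelta(seconds, unit="s") + SAS_EPOCH


# Hypothetical source records; the second dose of subject 001 has no datetime.
doses = pd.DataFrame({
    "USUBJID": ["001", "001", "002"],
    "ADTM_SAS": [1893484800.0, np.nan, 1893574800.0],
    "NOMDAY": [1, 8, 1],  # nominal study day of each dose
    "AMT": [100.0, 100.0, 100.0],
})
obs = pd.DataFrame({
    "USUBJID": ["001", "001", "002"],
    "ADTM_SAS": [1893492000.0, 1894087800.0, 1893585600.0],
    "DV": [12.5, 3.1, 8.0],
})
doses["ADTM"] = sas_datetime_to_pandas(doses["ADTM_SAS"])
obs["ADTM"] = sas_datetime_to_pandas(obs["ADTM_SAS"])

# Flag missing dose datetimes, then impute them with an assumed rule:
# first observed dose of the subject plus the nominal day offset.
doses["DTIMPFL"] = np.where(doses["ADTM"].isna(), "Y", "")
first_dose = doses.groupby("USUBJID")["ADTM"].transform("min")
imputed = first_dose + pd.to_timedelta(doses["NOMDAY"] - 1, unit="D")
doses["ADTM"] = doses["ADTM"].fillna(imputed)

# Stack dosing (EVID=1) and observation (EVID=0) events into one dataset.
doses["EVID"] = 1
obs["EVID"] = 0
poppk = pd.concat([doses, obs], ignore_index=True, sort=False)
poppk["DTIMPFL"] = poppk["DTIMPFL"].fillna("")

# Relative time from first dose, in hours.
first = doses.groupby("USUBJID")["ADTM"].min()
elapsed = poppk["ADTM"] - poppk["USUBJID"].map(first)
poppk["AFRLT"] = elapsed.dt.total_seconds() / 3600

# Order events in time within subject; doses precede observations on ties
# (an assumed convention).
poppk = poppk.sort_values(
    ["USUBJID", "ADTM", "EVID"], ascending=[True, True, False]
).reset_index(drop=True)

# Hypothetical conversion table: (parameter, from_unit, to_unit) -> factor.
LAB_FACTORS = {
    ("CREAT", "mg/dL", "umol/L"): 88.4,
    ("ALB", "g/dL", "g/L"): 10.0,
}


def convert_lab_units(value, param, from_unit, to_unit):
    """Convert a lab value between units using the lookup table above."""
    if from_unit == to_unit:
        return value
    return value * LAB_FACTORS[(param, from_unit, to_unit)]


# pandas has no native variable labels; DataFrame.attrs (an experimental
# pandas feature) is one place to keep them until the dataset is exported.
poppk.attrs["labels"] = {
    "AFRLT": "Actual Relative Time from First Dose (h)",
    "EVID": "Event ID",
}

print(poppk[["USUBJID", "ADTM", "EVID", "AMT", "DV", "AFRLT", "DTIMPFL"]])
```

In a notebook, displaying the stacked frame after each step doubles as a tabular quality control check, for example to confirm that imputed records carry the flag and that relative times increase within each subject.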

Results: Python successfully produced CDISC-compliant, analysis-ready popPK, PKPD, and ER datasets. The use of Jupyter Notebooks facilitated both the quality control of source and output datasets and seamless, programmer-friendly documentation of the dataset creation process. Python effectively managed complex algorithms related to source data issues and the intricacies of programming pharmacometric datasets.

Conclusions: Python is a suitable and versatile environment for programming CDISC-compliant popPK, PKPD, and ER datasets and merits further exploration. More research is required into ensuring reproducibility across Python and package versions. Python's flexibility, open-source nature, and widespread use make it a promising tool for pharmacometric programming.
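One small, commonly used step toward reproducibility is to record the interpreter and package versions alongside each output dataset. The sketch below is an assumption about how that might look, not something described in the abstract; the function name environment_snapshot and the default package list are hypothetical.

```python
import platform
from importlib import metadata


def environment_snapshot(packages=("pandas", "numpy")):
    """Return the Python version and the installed version of each package."""
    snapshot = {"python": platform.python_version()}
    for name in packages:
        try:
            snapshot[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            snapshot[name] = "not installed"
    return snapshot


# Printing this in the first notebook cell keeps the versions with the run.
print(environment_snapshot())
```

Saving the snapshot next to the dataset makes it possible to tell later whether two runs that differ were produced with different package versions.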

Citations: Yan, Y., Sukumar, P., & Thanneer, N. (2023). Using R to Create Population Pharmacokinetic Data Set. In Proceedings of the PharmaSUG 2023 Conference. Retrieved from https://pharmasug.org/proceedings/2023/SD/PharmaSUG-2023-SD-257.pdf
Clinical Data Interchange Standards Consortium (CDISC). (2023, October 6). Basic Data Structure for ADaM popPK Implementation Guide v1.0. Retrieved from https://www.cdisc.org/standards/foundational/adam/basic-data-structure-adam-poppk-implementation-guide-v1-0

Keywords: python, open source, pharmacometric programming