Principal Scientist | QSP(T) & qAOP Lead, ESQlabs GmbH, France
Disclosure(s):
Alexander Kulesza, PhD: No relevant disclosure to display
Objectives: While there is a clear need for better characterization of the predictive performance and qualification of mechanistic models (including physiologically based pharmacokinetic and quantitative systems pharmacology models), there is still no consensus on best practice for the prospective definition of assessment metrics and goals as a function of the question of interest that the modeling platform is supposed to address [1,2]. To support the discussion and definition of such best practice, we present a theoretical study gauging a variety of metrics on the example of a theoretical PBPK-based DDI prediction validation.
Methods: We first set up a well-controlled toy model mimicking a prototypical DDI, compare assessment metrics, and introduce uncertainty in the validation set as an additional dimension to assess. We then formally assess bootstrapped statistical tests for detecting relevant DDIs in a simulation study and empirically relate other, also frequently used, goodness-of-fit metrics to these tests. Finally, we illustrate the essence of this concept on a PBPK model, using the Itraconazole-Midazolam DDI [3] as a use case for both the original and a theoretically weaker interaction.
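A toy model of the kind described can be sketched with the standard mechanistic static equation for reversible inhibition of a single metabolic pathway; the parameter values below (Ki, fm, inhibitor concentration) and the 30% log-normal uncertainty on Ki are illustrative assumptions, not the abstract's actual model or data.

```python
# Minimal sketch: propagate assumed 30% uncertainty on an in vitro Ki through
# the mechanistic static DDI equation (single reversibly inhibited pathway).
# All parameter values are hypothetical and chosen for illustration only.
import numpy as np

def aucr_static(inhibitor_conc, ki, fm):
    """Victim AUC ratio for reversible inhibition of one metabolic pathway."""
    return 1.0 / (fm / (1.0 + inhibitor_conc / ki) + (1.0 - fm))

rng = np.random.default_rng(42)
ki_nominal = 0.01   # assumed in vitro Ki (uM), illustrative
fm = 0.9            # assumed fraction metabolized via the inhibited pathway
conc = 0.1          # assumed unbound inhibitor concentration (uM)

# 30% log-normal uncertainty on Ki, mirroring the error level discussed
ki_samples = ki_nominal * rng.lognormal(mean=0.0, sigma=0.3, size=5000)
aucr_samples = aucr_static(conc, ki_samples, fm)

print(aucr_static(conc, ki_nominal, fm))         # nominal AUC-ratio prediction
print(np.percentile(aucr_samples, [2.5, 97.5]))  # propagated uncertainty interval
```

For a strong interaction (inhibitor concentration well above Ki, as here), the propagated interval stays far from the no-interaction region, which is one way to see why moderate Ki uncertainty poses little risk for strong perpetrators.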
Results: Our results indicate that the size of the validation set is a critical component for establishing credibility, as the confidence interval around the validated predictions needs to be gauged against the prediction task. While n-fold thresholds, for example, are not directly linked to the question of interest, a formal statistical test can be related to it more directly: for a well-established clinical criterion such as bioequivalence, classifying a DDI as relevant or not (outside or within the bioequivalence limits) is a suitable binary test, with AUROC being a common metric representing model classification performance in line with this test. We find that Bland-Altman analysis can be related to this classification including uncertainty, and that the statistical significance of the AUROC can be checked using bootstrapping techniques (also as a function of sample size and error level). Other metrics, such as observed-predicted correlation coefficients, can then be linked empirically to the formal test, and acceptability thresholds can be formally justified by simulation. For the PBPK DDI example, an assumed uncertainty of 30% on in vitro Ki values carries only little risk of misprediction for strong perpetrators, but the error may become relevant for perpetrators weaker than Itraconazole.
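The bootstrapped AUROC check described above can be sketched as follows. The data are synthetic (not from the cited study), DDI "relevance" is defined as the observed AUC ratio falling outside the 0.80-1.25 bioequivalence limits, and the classifier score is an assumed distance of the prediction from the no-interaction point; the AUROC is computed from ranks, equivalent to the Mann-Whitney U statistic.

```python
# Hedged sketch: bootstrap a confidence interval for the AUROC of a binary
# "DDI relevant" test (relevant = observed AUC ratio outside 0.80-1.25).
# All data below are simulated for illustration; the 30% prediction error
# mirrors the uncertainty level discussed in the abstract.
import numpy as np

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U / (n_pos * n_neg)); assumes no ties."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
n = 40
true_aucr = rng.lognormal(mean=0.0, sigma=0.5, size=n)   # synthetic observed ratios
predicted = true_aucr * rng.lognormal(0.0, 0.3, size=n)  # predictions with ~30% error
relevant = ((true_aucr < 0.80) | (true_aucr > 1.25)).astype(int)
score = np.abs(np.log(predicted))                        # distance from no interaction

point = auroc(score, relevant)
boot = [auroc(score[idx], relevant[idx])
        for idx in (rng.integers(0, n, n) for _ in range(2000))
        if relevant[idx].sum() not in (0, n)]            # resample needs both classes
lo, hi = np.percentile(boot, [2.5, 97.5])
print(point, (lo, hi))  # AUROC is "significant" if the interval excludes 0.5
```

Rerunning the resampling for different validation-set sizes n and error levels sigma gives the kind of sample-size and error-level dependence the abstract refers to.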
Conclusions: Validation of mechanistic model platforms should include uncertainty assessments to prospectively judge the risk of incorrect predictions. What constitutes an incorrect prediction should be derived from the question of interest (and the model risk), and acceptability thresholds could be informed by simulation studies exploring the expected variability and experimental uncertainty of critical parameters as well as worst-case scenarios. Once formally established, other surrogate metrics can be derived empirically.
Citations: [1]. Musuamba, F. T. et al. Moving Toward a Question‐Centric Approach for Regulatory Decision Making in the Context of Drug Assessment. Clin Pharma and Therapeutics 114, 41–50 (2023). [2]. Musuamba, F. T. et al. Scientific and regulatory evaluation of mechanistic in silico drug and disease models in drug development: Building model credibility. CPT Pharmacometrics Syst Pharmacol 10, 804–825 (2021). [3]. Itraconazole-Midazolam-DDI PBPK model on github.com, https://github.com/Open-Systems-Pharmacology/Itraconazole-Midazolam-DDI.