Abstract
During the drug development process, potency testing plays an important role in the quality assessment required for the manufacturing and marketing of biologics. Due to multiple operational and biological factors, higher variability is usually observed in bioassays than in physicochemical methods. In this paper, we discuss different sources of bioassay variability and how this variability can be statistically estimated. In addition, we propose an algorithm to estimate the variability of reportable results associated with different numbers of runs and their corresponding out-of-specification (OOS) rates under a given specification. Numerical experiments are conducted on multiple assay formats to elucidate the empirical distribution of bioassay variability.
Keywords: bioassay, CMC statistics, linear mixed model, method variability, potency
1. Introduction
Testing potency, defined as the therapeutic activity of the drug product as indicated by appropriate laboratory tests or by adequately developed and controlled clinical data [1], is a legal requirement for lot release testing of biologics intended for human administration, both at the clinical trial and postcommercial approval stages. Potency tests are a vital part of the process control strategy for biologics, as they provide a quantitative measure of the drug's intended biological activity (as defined by its mechanism of action [MoA]). They can be used not only for lot release and stability testing, contributing to the determination of the product's shelf life, but also for other associated chemistry, manufacturing, and controls (CMC) development activities such as upstream, downstream, and formulation process development, product characterization, and the choice of a dosage form.
Determining, controlling, and mitigating method variability is particularly important for potency assays due to the inherent variability of experimental biological systems. Additionally, because of the product-specific nature of these methods, they typically must be developed from scratch as part of CMC drug development activities. Unlike compendial methods, they rarely benefit from years of multicompany improvement and international standardization aimed at increasing measurement consistency and reducing variability. The variability of each potency assay should be assessed periodically: first during assay development, then as part of phase-appropriate method validation (also known as qualification), monitored throughout clinical stage testing support (trending), confirmed during full method validation, and finally trended during commercial lot testing. While there are numerous publications outlining best practices for potency assay design [2, 3, 4, 5], few describe methodology for deriving assay run variability and its impact on the accuracy of the reportable potency result and the likelihood of not meeting the product specification.
This paper aims to address these needs. In Section 2, we briefly introduce different types of potency assays and describe how relative potency (RP) is derived from a model fit of dose–response data and how potency method activities fit into the drug development process. In Section 3, a general theoretical framework for the classical linear mixed model is provided, and estimates of different sources of variability on multiple assay datasets are discussed. In Section 4, we show that a reportable result has less variability than an individual run result, and we develop an algorithm to estimate the variability of reportable results associated with different numbers of runs and their corresponding out-of-specification (OOS) rates. In Section 5, an assay example demonstrates the relationship among assay variability, reportable results, and OOS rate. Finally, a summary of the topics discussed in this paper is provided in Section 6.
2. Potency Assay and Relative Measurement
2.1. Types of Potency Assays
Measuring biological properties, unlike many physicochemical attributes, requires product‐specific methods (Figure 1) that include but are not limited to the following:
immunoassays that measure drug–target binding, where a target is presented as a recombinant protein (non–cell‐based target binding assays) or presented at the cell surface (cell‐based target binding assays),
enzymatic assays that measure reaction rates when the drug or target has an MoA‐relevant enzymatic activity,
other functional in vitro assays that look at downstream effect of drug–target interaction, for example, non–cell‐based competition assays that measure interference of the drug's target binding to its interacting ligand, or cell‐based reporter assays that measure activation/inhibition of specific cellular pathways due to drug–target (e.g., cell receptor) interaction,
most rarely, animal assays that measure an animal's response to the drug (e.g., an elicited immune response).
FIGURE 1. Common types of potency methods. In the given examples, the depicted biologic (Drug) is a human monoclonal antibody. (a) Non–cell‐based target binding assay, where a recombinant target is coated on a microtiter plate and the amount of bound drug is detected with an anti‐human detection antibody. (b) Cell‐based target binding assay, where cells expressing a target are grown on a microtiter plate and the amount of bound drug is detected with an anti‐human detection antibody. (c) Reporter assay, where cells expressing a target are grown on a microtiter plate. Cells are engineered to express a reporter protein (e.g., luciferase) when a given signaling pathway is activated. Drug binding to the target activates downstream cellular signaling, triggering synthesis of the reporter protein. (d) Animal potency assay measuring immune response levels in rodent serum post drug administration.
From the assay variability perspective, designing simpler potency control systems (e.g., non–cell‐based target binding assays) is preferred. However, this is often not possible because of regulatory expectations for a potency assay, or a matrix of assays, to capture the complete MoA of the drug when scientifically feasible. In many cases, this means introducing a functional, often cell‐based, potency method for release of material used in pivotal clinical trials and then retaining it for commercial lot release.
2.2. Assay Variability, RP, and Reportable Result
Instead of absolute quantification of biological activity, which in many instances is hard to interpret or reliably reproduce, the output of most potency assays is relative to a standard and reported as %RP. The relative measurement is derived from a pairwise comparison of modeled fits of the dose‐dependent responses of a reference standard (RS, a well‐characterized drug lot of known potency) and a test sample (Figure 2). A four‐parameter logistic (4PL) fit, which models a symmetric S‐shaped curve, is most commonly used, but other fits (e.g., 5PL or a linear fit with parallel line analysis) are also possible depending on the experimental setup [6]. Parallelism criteria control the similarity of the curve shapes of the dose‐dependent responses of the RS and test samples so that the horizontal shift of the curves on the dose scale can be used to measure changes in potency, typically via the estimated concentration at the 50% response point on the X‐axis, or EC50 (Figure 2). Regardless of the chosen fit type, the fundamental assumption of parallelism needs to be met for meaningful derivation of %RP in a standardized and reproducible manner.
FIGURE 2. Deriving the relative potency of a test sample with the use of a reference standard. (a) Response data from dose–response series are plotted and fit to a model. (b) The model fit for the RS is evaluated. (c) Evaluation of parallelism between the RS and test sample unconstrained fits. For a 4PL fit, the a (lower asymptote), b (curve steepness), and d (upper asymptote) parameters are evaluated. (d) If parallelism criteria are met, %RP is derived from the horizontal shift of the curve (for a 4PL fit, the difference in the c parameter, the 50% effective concentration).
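To make the derivation concrete, the following is a minimal R sketch (ours, not from the paper) of deriving %RP from two unconstrained 4PL fits using the self-starting logistic `SSfpl()` from base R. The simulated doses, responses, and parameter names are illustrative only; a real workflow would apply predefined parallelism criteria to the a, b, and d parameters before reporting %RP.

```r
# Minimal sketch: %RP from two unconstrained 4PL fits on simulated data.
# SSfpl() models A + (B - A) / (1 + exp((xmid - input) / scal)), with
# input on the log-dose scale, so xmid is the log EC50.
set.seed(42)
log_dose  <- log(rep(2^(0:7), each = 2))   # 8-point dilution series, in duplicate
resp_rs   <- 10 + 90 / (1 + exp((log(16) - log_dose) / 0.5)) + rnorm(16, sd = 2)
resp_test <- 10 + 90 / (1 + exp((log(20) - log_dose) / 0.5)) + rnorm(16, sd = 2)

fit_rs   <- nls(resp_rs   ~ SSfpl(log_dose, a, d, xmid, scal))
fit_test <- nls(resp_test ~ SSfpl(log_dose, a, d, xmid, scal))

# Horizontal shift on the log-dose scale gives relative potency:
# RP = EC50_RS / EC50_test = exp(log EC50_RS - log EC50_test).
rp_pct <- 100 * exp(coef(fit_rs)["xmid"] - coef(fit_test)["xmid"])
rp_pct  # ~80% here, since the simulated test EC50 (20) exceeds the RS EC50 (16)
```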
The relative measurement of potency against an RS helps control intra‐lab (e.g., day‐to‐day and analyst‐to‐analyst) and interlab assay variability. The experimental design of potency methods is inherently linked to the variability of the final assay format, but since the development of potency assays is not in the scope of this publication, common practices are only briefly outlined. A common framework for developing and evaluating analytical procedures is termed analytical quality by design (AQbD [7]). A design of experiments (DoE) approach is used to find optimal parameter ranges, identify parameter interactions in a multivariate fashion, and test method robustness. For potency assays, intra‐assay variability is often additionally controlled by the replication strategy; having multiple dilution series within an assay run that are combined to form a model fit allows more precise measurement, helps control sample preparation errors (especially in the case of independently prepared dilution series), and allows statistically driven identification and exclusion of outlier data [8].
An equally important activity is to tailor the number of assay runs used for the derivation of the reportable potency value for test samples. An assay run is defined as a time‐limited experiment, typically performed by one analyst and executed as defined in the method standard operating procedure (SOP), that generates a single %RP value for a test sample. A single %RP value is generated only when an assay run is considered valid, which means that both system suitability criteria (including parallelism testing between the RS and the assay control, a well‐characterized material of known potency) and test sample acceptance criteria (including RS versus test sample parallelism testing) are met. A reportable potency value for a given potency method can either come from one valid assay run, that is, a single %RP value, or be averaged over multiple %RP values coming from multiple different assay runs. Selecting an appropriate number of assay runs for a reportable %RP is an important way to control the accuracy and precision of assay results for sample testing. More details on statistical methods for variability estimation are discussed in Section 3, and the algorithm to generate reportable results is provided in Section 4.
2.3. Potency Assay Activities in the Drug Development Process
The development of a potency method begins after lead drug candidate selection in late preclinical phases, following AQbD practices. The assay development phase finishes when the assay meets internal quality targets (especially the variability assessment across a chosen range) in prequalification assay runs. Assay qualification (also known as phase‐appropriate validation), performed in the good manufacturing practice (GMP) testing laboratory, provides the first formal assessment of assay variability and supports investigational new drug (IND) or clinical trial application (CTA) first time in human (FTIH) filings, required for the start of clinical Phase 1 studies (Figure 3). A potency assay is qualified across a range that fully covers the planned specification range for lot release and stability testing, ideally extending both above and below the specifications. Method robustness is evaluated before full method validation, a laboratory study performed in the GMP setting that verifies that the potency assay procedure meets predefined analytical characteristics (including various components of assay variability) for its intended use (lot release and stability testing).
FIGURE 3. Potency assay activities during the drug development product life cycle. BLA, biologics license application; IND, investigational new drug application; OOS, out‐of‐specification; Ph1, clinical Phase 1; PPQ, process performance qualification. Method qualification is also often described as phase‐appropriate method validation. Refer to the main text for a detailed description.
Potency assay development activities are typically connected with manufacturing process development (Figure 3). A newly developed potency method supports process development activities in the preclinical phase and process improvements in the clinical phases. The validated potency method is then used in process performance qualification (PPQ), the qualification of the commercial manufacturing process for biologics. PPQ must be conducted in time for the Biologics License Application (BLA), which is required for the introduction of the drug to the market. Established manufacturing quality systems allow for continued verification of the process in the commercial setting, as well as monitoring of the performance of analytical methods, including potency assays.
All lot release analytical methods must have appropriate acceptance criteria captured in the product specifications. Conformance of the manufactured material with the set specifications is an important part of the process control strategy to ensure lot‐to‐lot consistency and product quality [9]. The acceptance criteria must be set tightly enough to ensure product safety and efficacy, yet wide enough to allow for variation in the measured results. This variation consists of two parts: method variability (uncertainty of the measured %RP relative to the true %RP value) and process capability (differences in true %RP between manufactured lots following the same process). For FTIH filings, typical numerical ranges of %RP are wider because of limited understanding of method variability and of the performance of the process, which often gets modified or optimized throughout the clinical phases (Figure 3). Often, a “platform” potency acceptance criterion is used, justified by the sponsor's experience of testing previous products that followed similar manufacturing processes and were tested by similar types of potency methods (Section 2.1). The acceptance criterion is progressively tightened in time for the BLA filing based on actual manufacturing process and method performance, drawing mainly on PPQ, relevant clinical experience, and method validation. Intrinsically connected to a specification range is the probability of obtaining an OOS result. This probability is an important driver of the product specification, of assessing the suitability of the developed potency method based on method variability estimated during method qualification (see Section 3.1), and of choosing the number of assay runs for a reportable value (see Section 4.1). Too large an assay variability may require method redevelopment, as it may not be feasible to set acceptable acceptance criteria with a sufficiently low OOS rate (see Section 4.2).
3. Assay Variability Estimation
In most cases, assay variability needs to be evaluated during the validation of analytical procedures. This applies both to full method validation and to method qualification supporting FTIH filings. Typical validation characteristics include accuracy, precision, specificity, linearity, and range. Among these, the precision of an assay demonstrates the closeness of agreement between a series of measurements obtained from multiple sampling of the same homogeneous sample under the prescribed conditions [10]. It is usually computed as the standard deviation or coefficient of variation (CV) of a series of measurements. In general, precision is considered at three levels:
Repeatability: the variability of an assay under the same operating conditions over a short interval of time;
Intermediate precision: the variability of an assay within a laboratory, usually including variability caused by different days, analysts, equipment, and so forth;
Reproducibility: the variability of an assay between different laboratories.
3.1. Linear Mixed Model
As demonstrated above, there are different sources of variability in potency assays. In practice, the experimental data can be generated by the same or different analysts, days, laboratories, and so forth. Due to the complex hierarchical structures of bioassay data, the linear mixed model is a popular choice to analyze experimental results and estimate assay variability. The general form of the linear mixed model [11] can be represented as

$$y = X\beta + Zb + \epsilon, \tag{1}$$

where $y$ is the vector of experimental responses, $X$ is the design matrix of fixed effects, and $Z$ is the design matrix of random effects. Design matrices $X$ and $Z$ should reflect the hierarchical structure of the experiment with respect to fixed effects and random effects, respectively. Moreover, $\beta$ is the vector of fixed effect coefficients, $b$ is the vector of random effects, and $\epsilon$ is the error term. In the classical linear mixed model, it is usually assumed that

$$b \sim N(0, G), \quad \epsilon \sim N(0, R), \quad \mathrm{Cov}(b, \epsilon) = 0. \tag{2}$$

Based on the above model form and assumptions, the corresponding conditional distribution follows

$$y \mid b \sim N(X\beta + Zb,\ R), \tag{3}$$

and the marginal distribution follows

$$y \sim N(X\beta,\ ZGZ^\top + R). \tag{4}$$

In the classical linear mixed model, the parameters that need to be estimated from experimental data are the fixed effect coefficients $\beta$ and all the unknown parameters in the covariance matrices $G$ and $R$. Let $\theta$ represent the collection of all the unknown parameters in the covariance matrices $G$ and $R$. For simplicity of notation, let $V(\theta) = ZGZ^\top + R$. Assuming that $\theta$ is known and by maximizing the joint likelihood of $y$ and $b$, it can be shown that

$$\hat{\beta}(\theta) = \left(X^\top V(\theta)^{-1} X\right)^{-1} X^\top V(\theta)^{-1} y, \qquad \hat{b}(\theta) = G Z^\top V(\theta)^{-1}\left(y - X\hat{\beta}(\theta)\right). \tag{5}$$

Then the restricted maximum likelihood (REML) estimate of $\theta$ is

$$\hat{\theta} = \arg\max_{\theta \in \Theta}\ \ell_R(\theta \mid y), \tag{6}$$

where $\Theta$ is the parameter space of $\theta$ and $\ell_R$ denotes the restricted log-likelihood. Thus, the fixed effect coefficients estimate is

$$\hat{\beta} = \left(X^\top V(\hat{\theta})^{-1} X\right)^{-1} X^\top V(\hat{\theta})^{-1} y. \tag{7}$$
In summary, this is the general framework of the classical linear mixed model. There has been increasing interest in linear mixed models in the pharmaceutical industry [12, 13, 14, 15], since they are a powerful tool for analyzing experimental data with complex hierarchical structures. The fitting methods for linear mixed models are also well developed: various statistical software packages, such as SAS, SPSS, STATA, and R, provide model fitting for classical and general linear mixed models. Within R, many packages can fit linear mixed models; the two most popular choices are nlme and lme4. In this paper, we mainly use lme4 to fit linear mixed models and generate numerical results.
3.2. Variability Estimates on Historical Assay Datasets
In this section, we apply linear mixed models to real‐world assay data to assess the variability of different potency methods. To develop an understanding of the empirical distribution of assay variability, we analyzed a set of publishable assays developed for potency testing of AstraZeneca biologics over the course of the last 20 years. The presented data originate from assay qualifications performed to evaluate product‐specific potency assay performance to support release and stability testing of clinical material for FTIH studies, and at later clinical stages of the product life cycle when more MoA‐reflective potency assays were introduced post‐FTIH. The large majority of the collated potency assays were developed to measure the potency of monoclonal antibodies (mAbs) and other fragment crystallizable region (Fc)‐containing molecules (bispecifics, antibody–drug conjugates [ADCs], and Fc‐fusion proteins). Assay performance was evaluated across ranges fully covering lot release and stability specifications (e.g., 50%–150% RP), and qualification designs followed regulatory guidance [16] to assess various analytical characteristics of interest, such as linearity, accuracy, and precision.
Intermediate precision and repeatability are the two main types of variability of interest in a qualification study. Intermediate precision is typically assessed through an analysis of the variance components associated with different analysts, runs, days, and instruments within the same lab. Due to the limited number of assay runs in a qualification study, some random effects are usually confounded with each other, and the individual estimate for each random effect (e.g., analysts, days, and instruments) is not always reliable. A simple and practical solution to this issue is to combine the random effects from all these factors into the run‐to‐run variability. In other words, based on the characteristics of the qualification data in our historical assays, the following linear mixed model is chosen to assess assay variability:
$$y_{ij} = \mu_j + r_i + \epsilon_{ij}, \tag{8}$$

where $y_{ij}$ denotes the observed RP value at the $j$th expected potency level in the $i$th run, $\mu_j$ denotes the fixed effect of the $j$th expected potency level, $r_i$ denotes the random effect of the $i$th run, and $\epsilon_{ij}$ is the error term. It is assumed that

$$r_i \sim N(0, \sigma_{\mathrm{run}}^2), \quad \epsilon_{ij} \sim N(0, \sigma_e^2), \tag{9}$$

where $\sigma_{\mathrm{run}}^2$ represents the run‐to‐run variability and $\sigma_e^2$ represents the within‐run variability. Under this model form, intermediate precision can be estimated by the total variability $\sigma_{\mathrm{run}}^2 + \sigma_e^2$, and repeatability can be estimated by the within‐run variability $\sigma_e^2$. If model diagnostics (e.g., a plot of conditional Pearson residuals versus fitted values) show obvious heterogeneous variance, other appropriate covariance structures can be defined for the error term. If the qualification data do not meet the standard assumptions of the classical linear mixed model (e.g., the random effect and residual error are not independent), an extended linear mixed model can be used.
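As a concrete illustration, the following minimal R sketch (ours; the data and column names are illustrative) fits model (8) with lme4 and extracts the variance components. We fit on the log scale so that standard deviations translate directly into CVs, which is one common convention; the transformation used in practice should match the assay's analysis procedure.

```r
# Minimal sketch of fitting model (8) with lme4; data are simulated.
# 'rp' is the measured %RP, 'level' the expected potency level (fixed
# effect), and 'run' the assay run (random effect).
library(lme4)

set.seed(7)
qual <- expand.grid(run = factor(1:6), level = factor(c(50, 75, 100, 125, 150)))
run_eff <- rnorm(6, sd = 0.08)              # run-to-run effect on the log scale
qual$rp <- as.numeric(as.character(qual$level)) *
           exp(run_eff[qual$run] + rnorm(nrow(qual), sd = 0.05))

# Fixed effect per expected level (no intercept), random intercept per run.
fit <- lmer(log(rp) ~ 0 + level + (1 | run), data = qual)
vc  <- as.data.frame(VarCorr(fit))

cv_run   <- 100 * sqrt(exp(vc$vcov[vc$grp == "run"]) - 1)       # run-to-run CV%
cv_e     <- 100 * sqrt(exp(vc$vcov[vc$grp == "Residual"]) - 1)  # repeatability CV%
cv_total <- 100 * sqrt(exp(sum(vc$vcov)) - 1)                   # intermediate precision CV%
```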
The linear mixed model (8) is fitted to the qualification data of 47 historical assays via the lme4 package in R. Estimates of intermediate precision and repeatability are shown in Figures 4 and 5, respectively. Some scientists define repeatability as the variability of an assay observed only at the 100% potency level; thus, for each assay, another estimate of repeatability using only 100% potency level data is also provided in Figure 5. These repeatability estimates are computed directly via the coefficient of variation formula, as a comparison to the linear mixed model estimates. Figures 4 and 5 support the following findings. Both intermediate precision and repeatability tend to have a skewed distribution with a heavy right tail, or possibly a small peak in the right tail, which is consistent with our expectations: based on prior experience with bioassays, it is rare to see a potency assay with an intermediate precision CV% below 2.5, and a highly variable assay is not unusual for complex experimental systems. This leads to the asymmetry and heavy tail of the empirical distribution of assay variability. Another observation on the repeatability estimates is that there is no significant evidence that either estimation method systematically over- or underestimates relative to the other. One might suspect that data from different potency levels have different variability and that using only the 100% level data would lead to a lower estimate of repeatability. However, this is not the case in our empirical data, as shown in Figure 5. In fact, in 27 of the 47 assays, repeatability estimated on the 100% potency data alone is higher than the estimate using all potency data. This might be because fewer data are available at the 100% level, while the other estimate uses all the data across potency levels to generate an averaged estimate. Choosing the estimation method based on the definition of repeatability for the assay and the design of the qualification study is recommended.
FIGURE 4. Analysis of intermediate precision (CV%) on 47 historical potency methods. (a) For each method, the solid bar represents the point estimate of intermediate precision and the error bar represents the one‐sided 95% upper confidence limit. (b) Histogram of the estimated intermediate precisions and the corresponding density function.
FIGURE 5. Estimates of repeatability (CV%) on 47 historical potency methods. (a) For each assay, the blue bar represents the estimate from the linear mixed model using all assay data across different potency levels, while the coral bar represents the estimate via the coefficient of variation formula on 100% potency level data only. (b) Histogram and density function of repeatability estimates based on data across different potency levels.
Furthermore, we are also interested in exploring assay variability across different types of potency methods. Figure 6 demonstrates the estimated assay variability for four categories: non–cell‐based binding assays, cell‐based binding assays, cell‐based reporter assays, and other cell‐based functional assays, where the complexity of the assay increases in this order. The examined methods might differ in the final assay format (e.g., intra‐assay replication strategy, number of dilution series, or dilution points within a series), but all their assay designs followed AQbD practices. Based on the categorized results, assay variability appears to correlate with general assay complexity. The majority of non–cell‐based and cell‐based target binding assays tend to have smaller variability than the functional assays (cell‐based reporter assays and other cell‐based functional assays). This might be because functional assays quantify functional consequences of the drug–target interaction, which rely not only on drug–target binding but also on downstream signaling events (e.g., receptor activation leading to transcription factor activation) and subsequent cellular processes (e.g., cell death or proliferation). These often come with increased complexity of the measured biological processes and more complex assay readouts, which inevitably increases assay variability.
FIGURE 6. Box plot of assay variability based on different assay types. For both intermediate precision and repeatability, assay variability tends to increase with the complexity of the assay.
4. Impact of Assay Variability on Reportable Value and OOS Rate
4.1. Variability of Reportable Results
GMP standards require all analytical methods in the official control strategy that are used to assess the quality of manufactured products, including potency methods, to meet appropriate standards for accuracy and precision. As already mentioned, good assay design practices minimize the intra‐run variability of biological systems. Another equally important practice for reducing the variability of assay results is to generate a reportable result as the average of multiple individual run results. For independent individual results, it is intuitive that the averaged value has less variability; for correlated results, this intuition is less straightforward. The following is a statistical justification of why reportable results have no more variability than individual run results.
Proposition 1
Let $X_1, \ldots, X_n$ be the accuracy results from their corresponding assay runs $1, \ldots, n$, where

$$E(X_i) = \mu, \quad \mathrm{Var}(X_i) = \sigma^2, \quad i = 1, \ldots, n. \tag{10}$$

Let $\bar{X}$ be the reportable result of these assay runs, where

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i. \tag{11}$$

Then

$$\mathrm{Var}(\bar{X}) \le \mathrm{Var}(X_i) = \sigma^2. \tag{12}$$

The variability of the reportable result can be computed as

$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) \tag{13}$$

$$= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} \mathrm{Cov}(X_i, X_j). \tag{14}$$

Based on the Cauchy–Schwarz inequality, $\mathrm{Cov}(X_i, X_j) \le \sqrt{\mathrm{Var}(X_i)\,\mathrm{Var}(X_j)}$, so Equation (13) can be bounded as

$$\mathrm{Var}(\bar{X}) \le \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sqrt{\mathrm{Var}(X_i)\,\mathrm{Var}(X_j)} \tag{15}$$

$$= \frac{1}{n^2}\cdot n^2 \sigma^2 \tag{16}$$

$$= \sigma^2. \tag{17}$$

Therefore, the variability of reportable results is no larger than the variability of individual assay run results (and is strictly smaller unless all runs are perfectly correlated). Note that this statement also holds if CV% is chosen as the measure of variability, since

$$\mathrm{CV\%}(\bar{X}) = \frac{\sqrt{\mathrm{Var}(\bar{X})}}{\mu} \times 100 \le \frac{\sigma}{\mu} \times 100 = \mathrm{CV\%}(X_i), \tag{18}$$

where $\mu$ is nonnegative based on its scientific interpretation.
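The inequality can also be checked numerically. The following short R simulation (ours, with an assumed exchangeable correlation between runs) compares the variance of individual run results with the variance of their average:

```r
# Numerical check of Proposition 1 on simulated, correlated run results.
set.seed(1)
n_runs <- 3; rho <- 0.4; mu <- 100; sigma <- 8
# Exchangeable covariance: sigma^2 on the diagonal, rho * sigma^2 off it.
S <- sigma^2 * (rho + (1 - rho) * diag(n_runs))
U <- chol(S)
runs <- matrix(rnorm(1e5 * n_runs), ncol = n_runs) %*% U + mu

reportable <- rowMeans(runs)
var(runs[, 1])   # approximately sigma^2 = 64
var(reportable)  # approximately sigma^2 * (1 + (n - 1) * rho) / n = 38.4
```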
To decide how many runs are appropriate for reportable results, estimates of the variability of reportable results are needed. One could simply compute the variability of $\bar{X}$ as $\sigma^2/n$, where $\sigma^2$ can be estimated using the residual error from a linear mixed model. However, this approach relies on the independence assumption among $X_1, \ldots, X_n$, which might not hold due to the complex hierarchical structures of bioassay data. For example, the same analyst might perform several assay runs on the same day, and these measurements are not rigorously independent from a statistical perspective. In addition, the corresponding confidence interval for the variability of reportable results is not straightforward under this approach. For these reasons, we use bootstrapping to simulate reportable results from individual run results, so that variability estimates can be derived from empirical distributions. We propose an algorithm to achieve this goal, and its idea is straightforward. First, given available assay accuracy results (e.g., from an assay qualification study), the dataset is sampled with replacement to obtain a large number of simulated results from the assay. Second, within each resample, reportable results are generated based on the number of runs of interest. Third, the variability of these reportable results is calculated within each resample. Finally, statistics of the variability of reportable results are derived from the empirical distribution across the simulated assay data. The pseudocode of this approach is provided in Algorithm 1.
ALGORITHM 1. Estimation of variability of different reportable results and their corresponding OOS rates.
Inputs: assay accuracy results $X = (X_1, \ldots, X_n)$; number of simulations $B$; maximum number of runs of interest for reportable results $K$; upper specification limit USL; lower specification limit LSL; product variability $\sigma_{\mathrm{prod}}$.

1. Perform bootstrap with replacement on the accuracy data to generate a simulation matrix $X^{*}$ with $B$ rows, where row $b$, $X^{*}_{b} = (X^{*}_{b1}, \ldots, X^{*}_{bn})$, is one resample of size $n$.
2. for $k = 1$ to $K$ do
3.   for $b = 1$ to $B$ do
4.     for $i$ in seq(1, $n - k + 1$, by = $k$) do
5.       compute a reportable result as the average of $X^{*}_{bi}, \ldots, X^{*}_{b,i+k-1}$
6.     end for
7.     $\mathrm{CV}_{bk} \leftarrow$ CV% of the reportable results in resample $b$
8.     $\mathrm{OOS}_{bk} \leftarrow$ OOS rate from Equations (19)–(21), where $\sigma_R$ is the standard deviation of the reportable results in resample $b$
9.   end for
10.  $\widehat{\mathrm{CV}}_{k} \leftarrow$ the median of $\{\mathrm{CV}_{bk}\}$; $\mathrm{UCL}_{k} \leftarrow$ the 95th percentile of $\{\mathrm{CV}_{bk}\}$
11.  $\widehat{\mathrm{OOS}}_{k} \leftarrow$ the median of $\{\mathrm{OOS}_{bk}\}$; its 95% confidence limits $\leftarrow$ the 2.5th and 97.5th percentiles of $\{\mathrm{OOS}_{bk}\}$
12. end for

Outputs: for $k = 1, \ldots, K$, estimates of the variability of reportable results calculated on $k$ runs and their corresponding OOS rates.
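For concreteness, the following compact R sketch (our reading of Algorithm 1; the function and variable names are ours) implements the bootstrap estimation of the CV% of reportable results. The OOS step in lines 8 and 11 of the pseudocode can be added with an OOS-rate function such as the one sketched in Section 4.2.

```r
# Sketch of Algorithm 1 (CV part): for each number of runs k, return the
# median and 95th percentile of the bootstrap CV% of reportable results.
boot_reportable_cv <- function(x, B = 10000, K = 4, seed = 123) {
  set.seed(seed)
  n <- length(x)
  out <- data.frame(k = 1:K, cv_median = NA_real_, cv_ucl95 = NA_real_)
  for (k in 1:K) {
    cv_b <- numeric(B)
    for (b in 1:B) {
      xs  <- sample(x, n, replace = TRUE)      # one bootstrap resample
      idx <- seq(1, n - k + 1, by = k)         # non-overlapping groups of k runs
      rep_res <- vapply(idx, function(i) mean(xs[i:(i + k - 1)]), numeric(1))
      cv_b[b] <- 100 * sd(rep_res) / mean(rep_res)  # CV% of reportable results
    }
    out$cv_median[k] <- median(cv_b)
    out$cv_ucl95[k]  <- quantile(cv_b, 0.95)
  }
  out
}
```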
To demonstrate the performance of Algorithm 1, we implemented it on the historical assay data; the estimated variability of reportable results associated with different numbers of runs is provided in Figure 7. As shown, within each assay the variability of reportable results is smaller than the variability of one‐run results. These estimates also provide numerical evidence for Proposition 1.
FIGURE 7. Estimates of variability of reportable results using different numbers of runs. Within each assay, the variability of reportable results is smaller than the variability of individual run results, as demonstrated in Proposition 1.
The following are some remarks on Algorithm 1. First, CV% is output as the variability of reportable results in the pseudocode, but standard deviation estimates can also be used as the measure of variability, depending on the user's preference. In addition, the algorithm uses a default 95% confidence level and a one‐sided upper confidence limit for CV, but two‐sided limits or different confidence levels can be derived by choosing appropriate percentiles. Furthermore, Algorithm 1 also provides estimates of OOS rates, which will be discussed in Section 4.2. For succinctness, the pseudocode for estimating both quantities is presented in the same algorithm, but Algorithm 1 can also be used solely for estimating the variability of reportable results, in which case USL, LSL, and $\sigma_{\mathrm{prod}}$ are not required as inputs.
4.2. OOS Rate Estimation
Following the recommended approach in regulatory guidance [16], the OOS rate can be estimated as

$$\mathrm{OOS\ rate} = 2\,\Phi(-3\,C_{pm}), \tag{19}$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution and $C_{pm}$ is the Taguchi capability measure index defined as

$$C_{pm} = \frac{\mathrm{USL} - \mathrm{LSL}}{6\sqrt{\sigma_R^2 + \sigma_{\mathrm{prod}}^2 + \mathrm{RB}^2}}. \tag{20}$$

In the definition of $C_{pm}$, USL is the upper specification limit, LSL is the lower specification limit, $\sigma_R$ is the variability of the reportable result, $\sigma_{\mathrm{prod}}$ is the variability of the corresponding product, and RB is the relative bias between observed potency and expected potency,

$$\mathrm{RB} = \frac{\text{observed potency} - \text{expected potency}}{\text{expected potency}} \times 100\%. \tag{21}$$
During OOS rate estimation, USL and LSL are provided and are usually determined by business needs. As discussed before, the variability of reportable results $\sigma_R$ can be estimated using Algorithm 1. In most cases, the data used to estimate $\sigma_R$ and RB come from the same lot, in order to avoid additional variation from lot‐to‐lot variability. There are different heuristic methods to estimate the product variability $\sigma_{\mathrm{prod}}$. One of them is to estimate it based on prior knowledge derived from historical product variability; for example, an estimate of product variability can be obtained using available data from products manufactured by a similar process. Given USL, LSL, and $\sigma_{\mathrm{prod}}$, estimates of the OOS rate of different reportable results can be derived using Algorithm 1.
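As a minimal illustration, Equations (19) and (20) translate directly into R (a sketch under the definitions above, with all inputs on the %RP scale):

```r
# OOS rate (%) from Equations (19)-(21); all inputs on the %RP scale.
# usl/lsl: specification limits; sigma_r: variability of the reportable
# result; sigma_prod: product variability; rb: relative bias in percent.
oos_rate <- function(usl, lsl, sigma_r, sigma_prod, rb = 0) {
  cpm <- (usl - lsl) / (6 * sqrt(sigma_r^2 + sigma_prod^2 + rb^2))
  100 * 2 * pnorm(-3 * cpm)
}

oos_rate(usl = 130, lsl = 70, sigma_r = 10.89, sigma_prod = 5)
# ~1.2%, in the same range as the one-run row of Table 3
```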
The impact of the variability of reportable results on the OOS rate can be seen directly in Equations (19) and (20). Holding the other variables constant, lower variability of reportable results leads to a smaller OOS rate estimate. In particular, when an assay has a relatively high relative bias or a tight specification, controlling the variability of reportable results is especially important. Figure 8 illustrates such examples. As can be seen, an individual run result has a higher chance of falling outside the specification, since it has higher variability than reportable results calculated from multiple assay runs.
FIGURE 8. A comparison between an individual run result and a reportable result calculated from multiple runs. (a) Both results have high relative bias, but the probability of the reportable result being contained within the specification (70, 130) is higher since the reportable result has less variability. (b) When the potency assay specification is tight, the reportable result also has a lower OOS rate, due to its smaller variability.
In this paper, we focus on estimating the variability of reportable results and their corresponding OOS rates. In the other direction, given an acceptable OOS rate, upper bounds on RB and on the variability of reportable results can be derived. These upper bounds can be used as acceptance criteria in the qualification and validation of potency methods; this topic goes beyond the scope of this paper, and details are not discussed here. In addition, this framework is the recommended approach in the USP guidance on assay validation [16]. If other formulas are preferred for estimating the OOS rate, Algorithm 1 can be updated accordingly. For example, another OOS rate estimation method is demonstrated in USP <1010> [17], which can be used as a comparison method.
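To make the inverse direction concrete, the following small sketch (ours, not from the paper) inverts Equations (19) and (20) to obtain the largest variability of reportable results compatible with a target OOS rate, holding RB and $\sigma_{\mathrm{prod}}$ fixed:

```r
# Invert Equations (19)-(20): largest sigma_R compatible with a target
# OOS rate (in %), holding rb and sigma_prod fixed. Our own sketch.
max_sigma_r <- function(oos_target, usl, lsl, sigma_prod, rb = 0) {
  cpm_min  <- -qnorm(oos_target / 200) / 3      # from OOS% = 200 * pnorm(-3 * cpm)
  tau2_max <- ((usl - lsl) / (6 * cpm_min))^2   # maximum allowed total variance
  v <- tau2_max - sigma_prod^2 - rb^2
  if (v <= 0) return(NA_real_)                  # target unattainable at this bias
  sqrt(v)
}

max_sigma_r(oos_target = 0.3, usl = 130, lsl = 70, sigma_prod = 5)
# ~8.8: the largest reportable-result variability meeting a 0.3% OOS target
```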
5. Numerical Example
In this section, we use one of the historical methods to briefly demonstrate the workflow described in this paper. The example presented here is a cell‐based reporter assay, and its design follows regulatory guidance [2].
A summary of the measured %RP values from the qualification study of this assay, conducted to support the FTIH filing, is provided in Table 1. As discussed in Section 3, estimates of assay variability can be derived by fitting a linear mixed model to these qualification results, and the corresponding statistical estimates are provided in Table 2. The estimated CV% of intermediate precision is 10.89, which implies that the variability of this assay is not trivial. To reduce the variability of measured potency results for this assay, reportable results can be computed to support testing of lot‐release and stability samples. Algorithm 1 can be implemented on the accuracy results derived from Table 1, and the variability of reportable results associated with different numbers of runs can be estimated. Figure 9 illustrates the variability estimates of reportable results associated with different numbers of runs and their corresponding one‐sided 95% upper confidence limits. In this example, the estimate of product variability is 5%, based on prior knowledge of a similar previous product. The OOS rates can then be estimated under given specifications, and the results are presented in Table 3. Considering the OOS rate estimates and the expected specification, business decisions can be made on how many runs are appropriate for computing reportable results for this assay. For example, if the expected specification is (70%, 130%) and the target OOS rate is <0.3%, then the reportable result might be recommended as the average of results from two runs of this assay.
TABLE 1.
Example of experimental results (measured %RP) in the qualification study of a potency assay. Columns indicate the expected potency level (%RP).

| Run | 50 | 75 | 100 | 125 | 150 |
|---|---|---|---|---|---|
| 1 | 45.1 | 67.9 | 92.9 | 127.5 | 164.9 |
| 2 | 48.2 | 72.5 | 101.8 | 125.5 | 180.8 |
| 3 | 45.8 | 72.8 | 104.6 | 129.3 | 156.1 |
| 4 | 56.0 | 79.1 | 114.2 | 157.3 | 194.0 |
| 5 | 48.3 | 78.7 | 98.7 | 128.3 | 153.9 |
| 6 | 41.9 | 70.4 | 94.7 | 108.5 | 123.7 |
TABLE 2.
Estimates of variability components in this assay.
| Variability component | CV% estimate | One‐sided 95% UCL |
|---|---|---|
| Total | 10.89 | 17.99 |
| Run‐to‐run | 8.94 | 13.08 |
| Residual error | 6.21 | 8.43 |
FIGURE 9. Variability of reportable results calculated based on different numbers of runs in this assay. The solid bar represents the estimate of CV%, and the error bar represents the one‐sided 95% upper confidence limit of CV%.
TABLE 3.
Statistics of OOS rates for different reportable results under different specifications.
| Specification (%) | Number of runs for reportable results | OOS rate median (%) | OOS rate 95% LCL (%) | OOS rate 95% UCL (%) |
|---|---|---|---|---|
| (60, 140) | 1 | 0.107 | 0.002 | 0.846 |
| (60, 140) | 2 | 0.002 | 1.665E-05 | 0.129 |
| (60, 140) | 3 | 9.119E-05 | 1.503E-08 | 0.034 |
| (60, 140) | 4 | 1.149E-05 | 3.997E-10 | 0.016 |
| (70, 130) | 1 | 1.413 | 0.123 | 4.829 |
| (70, 130) | 2 | 0.131 | 0.002 | 1.579 |
| (70, 130) | 3 | 0.023 | 1.557E-04 | 0.720 |
| (70, 130) | 4 | 0.007 | 1.961E-05 | 0.465 |
| (80, 120) | 1 | 10.186 | 3.123 | 18.799 |
| (80, 120) | 2 | 3.210 | 0.478 | 10.759 |
| (80, 120) | 3 | 1.409 | 0.136 | 7.320 |
| (80, 120) | 4 | 0.803 | 0.052 | 5.916 |
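Putting the pieces together, the workflow of this example can be approximated with the hypothetical boot_reportable_cv and oos_rate sketches from Sections 4.1 and 4.2. The accuracy results are the Table 1 measurements expressed as a percentage of their expected levels; exact values will differ from Tables 2 and 3 depending on bootstrap details.

```r
# Approximate reproduction of this example's workflow using the sketches
# from Sections 4.1 and 4.2 (names and details are ours, not the paper's).
rp <- c(45.1, 67.9,  92.9, 127.5, 164.9,   # run 1, levels 50-150
        48.2, 72.5, 101.8, 125.5, 180.8,   # run 2
        45.8, 72.8, 104.6, 129.3, 156.1,   # run 3
        56.0, 79.1, 114.2, 157.3, 194.0,   # run 4
        48.3, 78.7,  98.7, 128.3, 153.9,   # run 5
        41.9, 70.4,  94.7, 108.5, 123.7)   # run 6
level <- rep(c(50, 75, 100, 125, 150), times = 6)
accuracy <- 100 * rp / level               # accuracy results (% of expected)

cv_tab <- boot_reportable_cv(accuracy, B = 10000, K = 4)  # Figure 9 analogue
# Feed each k's CV% (approximately the SD in %RP points near 100%) into the
# OOS formula with sigma_prod = 5 from prior knowledge:
oos_rate(usl = 130, lsl = 70, sigma_r = cv_tab$cv_median, sigma_prod = 5)
```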
6. Summary
In this paper, we provided a brief introduction to potency methods and elucidated the importance of assay variability assessment in the drug development process. In addition, the theoretical framework of the linear mixed model was discussed to illustrate an appropriate statistical method for estimating assay variability, and empirical estimates of historical assay variability were demonstrated, which can serve as prior knowledge for the evaluation of prospective potency methods. Moreover, we proposed a bootstrapping method to estimate the variability of reportable results associated with different numbers of runs and discussed the benefits of using this approach for reportable result calculation in lot release and stability sample testing. Finally, we used one real‐world example to demonstrate the good practices on assay variability discussed in this paper.
In summary, it is vital to measure, monitor, and control the variability of every potency method so that it can be an effective part of the control system for the quality and lot‐to‐lot consistency of biologics. Estimates of assay variability can be used to guide the choice of an appropriate number of assay runs for deriving reportable results under a given product specification and a target OOS rate in the drug development process.
Conflicts of Interest
Hang Li, Tomasz M. Witkos, Scott Umlauf, and Christopher Thompson are employees of AstraZeneca and may hold AstraZeneca stock or stock options.
Funding: This work was supported by AstraZeneca.
Data Availability Statement
Research data are not shared.
References
- 1. FDA, “Biological Products: General, 21 C.F.R. 210.3(b)(16),” 2023, https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?fr=210.3.
- 2. USP, “<1032> Design and Development of Biological Assays,” 10.31003/USPNF_M1354_01_01.
- 3. Robinson C., Sadick M., Deming S., Estdale S., Bergelson S., and Little L., “Assay Acceptance Criteria for Multiwell‐Plate‐Based Biological Potency Assays: Draft for Consultation,” Bioprocess International 12 (2014): 30–41.
- 4. White J. R., Abodeely M., Ahmed S., et al., “Best Practices in Bioassay Development to Support Registration of Biopharmaceuticals,” BioTechniques 67, no. 3 (2019): 126–137.
- 5. Little T. A., “Essentials in Bioassay Design and Relative Potency Determination,” BioPharm International 29, no. 4 (2016): 49–52.
- 6. USP, “<1034> Analysis of Biological Assays,” 10.31003/USPNF_M5677_02_01.
- 7. ICH, “Q14 Analytical Procedure Development. Draft Version,” 2022, https://database.ich.org/sites/default/files/ICH_Q14_Document_Step2_Guideline_2022_0324.pdf.
- 8. USP, “<111> Design and Analysis of Biological Assays,” 10.31003/USPNF_M98860_03_01.
- 9. ICH, “Q6B Specifications: Test Procedures and Acceptance Criteria for Biotechnological/Biological Products,” 1999, https://www.ema.europa.eu/en/documents/scientific-guideline/ich-q-6-b-test-procedures-acceptance-criteria-biotechnological/biological-products-step-5_en.pdf.
- 10. ICH, “Q2 (R1) Validation of Analytical Procedures: Text and Methodology,” 2005, https://database.ich.org/sites/default/files/Q2%28R1%29%20Guideline.pdf.
- 11. Laird N. M. and Ware J. H., “Random‐Effects Models for Longitudinal Data,” Biometrics 38, no. 4 (1982): 963–974.
- 12. Thai H. T., Mentré F., Holford N. H., Veyrat‐Follet C., and Comets E., “A Comparison of Bootstrap Approaches for Estimating Uncertainty of Parameters in Linear Mixed‐Effects Models,” Pharmaceutical Statistics 12, no. 3 (2013): 129–140, 10.1002/pst.1561.
- 13. Brown H. and Prescott R., Applied Mixed Models in Medicine, 3rd ed. (West Sussex, UK: John Wiley & Sons, 2015).
- 14. Francq B. G., Lin D., and Hoyer W., “Confidence and Prediction in Linear Mixed Models: Do Not Concatenate the Random Effects. Application in an Assay Qualification Study,” Statistics in Biopharmaceutical Research 12, no. 3 (2020): 262–272, 10.1080/19466315.2020.1776762.
- 15. Duan J., Levine M., Luo J., and Qu Y., “Estimation of Group Means in Generalized Linear Mixed Models,” Pharmaceutical Statistics 19, no. 5 (2020): 646–661, 10.1002/pst.2022.
- 16. USP, “<1033> Biological Assay Validation,” 10.31003/USPNF_M912_01_01.
- 17. USP, “<1010> Analytical Data—Interpretation and Treatment,” 10.31003/USPNF_M99740_05_01.