Skip to main content
Springer logoLink to Springer
. 2024 Mar 15;86(4):40. doi: 10.1007/s11538-024-01274-4

Nonlinear Regression Modelling: A Primer with Applications and Caveats

Timothy E O’Brien 1,, Jack W Silcox 2
PMCID: PMC10943168  PMID: 38489047

Abstract

Use of nonlinear statistical methods and models are ubiquitous in scientific research. However, these methods may not be fully understood, and as demonstrated here, commonly-reported parameter p-values and confidence intervals may be inaccurate. The gentle introduction to nonlinear regression modelling and comprehensive illustrations given here provides applied researchers with the needed overview and tools to appreciate the nuances and breadth of these important methods. Since these methods build upon topics covered in first and second courses in applied statistics and predictive modelling, the target audience includes practitioners and students alike. To guide practitioners, we summarize, illustrate, develop, and extend nonlinear modelling methods, and underscore caveats of Wald statistics using basic illustrations and give key reasons for preferring likelihood methods. Parameter profiling in multiparameter models and exact or near-exact versus approximate likelihood methods are discussed and curvature measures are connected with the failure of the Wald approximations regularly used in statistical software. The discussion in the main paper has been kept at an introductory level and it can be covered on a first reading; additional details given in the Appendices can be worked through upon further study. The associated online Supplementary Information also provides the data and R computer code which can be easily adapted to aid researchers to fit nonlinear models to their data.

Supplementary Information

The online version contains supplementary material available at 10.1007/s11538-024-01274-4.

Keywords: Bioassay, Dose–response, Likelihood ratio test, Profile likelihood confidence interval, Relative potency, Wald statistic

Introduction

Facilitated by readily-available statistical software, practitioners in fields as diverse as agronomy, biochemistry, biomedicine, drug development, engineering, environmental science, neuroscience, pharmacology and toxicology fit nonlinear models to their data to help answer their research questions. In addition to providing parsimonious data fits, nonlinear models are preferred to empirical models since the associated nonlinear model parameters typically have meaningful practical interpretations. For example, two drugs or experimental conditions are often compared in terms of half maximal effective concentration (EC50) or median lethal doses (LD50), which can be modelled as nonlinear dose–response model parameters. Mechanistic nonlinear models are often chosen based on underlying subject-matter knowledge such as Michaelis–Menten enzyme kinetic theory or dose response modelling methods (Bates and Watts 2007; Finney 1978; Govindarajulu 2001; Hubert 1992; Miguez et al. 2020; Ratkowsky 1983; Seber and Wild 1989). Faced with a plethora of experimental design and modelling methods, even statistically-savvy subject-matter practitioners may be unaware of key nonlinear methods and important requirements and cautions associated with nonlinear regression hypothesis testing methodologies and confidence interval estimation techniques.

Using straightforward nonlinear regression models and illustrations, this article overviews and illustrates useful nonlinear regression methods, underscores problems associated with commonly-used Wald statistic test p-values and confidence intervals (Wald 1943), and demonstrates the preference for exact likelihood-based confidence intervals over Wald intervals. Specifically, as highlighted below, p-values provided by statistical software packages are often based on the Wald approximation and can be grossly inaccurate for small- to moderately-sized studies. For example, Wald-based confidence intervals for nonlinear model parameters with nominal labels of 95% may have actual coverage levels of 75% or even lower. Conversely, readily-available exact or near-exact likelihood-based intervals generally show good agreement between nominal and actual coverage levels.

This nonlinear modelling introductory discussion, which builds upon readers background in linear methods (Draper and Smith 1998; Kleinbaum et al. 2014; Mendenhall and Sincich 2020), also provides the basis for further consideration of additional topics including dose response modelling, high-throughput screening methods, compartmental models based on differential equation(s) and other multivariate nonlinear models, computational algorithms and starting values, and further explorations of curvature measures. Given the evolution away from hypothesis testing approaches to estimation methods (Halsey 2019; Krzywinski and Altman 2013; Meeker and Escobar 1995), the focus here is largely on accurate confidence interval methods instead of p-values.

The article is structured as follows. To provide important context, Sect. 2 introduces simple motivating nonlinear model examples which highlight both nonlinear modelling in practice and underscores key differences with linear models. Section 3 overviews general nonlinear regression methods, makes connections to and contrasts with linear models, discusses parameter profiling in multiparameter models, nonlinear model selection, model fitting algorithms, and starting value selection. Section 4 provides additional exemplary nonlinear illustrations and extensions. In Sect. 5, we give important concluding remarks and discussion. The Appendices provide additional details and illustrations regarding the effects of curvature on nonlinear modelling, the Fieller-Creasy ratio of means example, comparisons of the F-based and the asymptotic likelihood tests and intervals, and comments and caveats regarding overfitting. Further, the R computer code (R Core Team 2020) used in the data analyses is given in the Supplementary Information. These R programs are easily adapted to help practitioners fit meaningful nonlinear models to their data.

Motivating Illustrations

The two key motivating examples introduced and discussed here illustrate the basic use of nonlinear modelling and demonstrate some of the widespread use of these methods.

Example 1

For a single substrate, Michaelis–Menten enzyme kinetics theory (Michaelis and Menten 1913; Bates and Watts 2007) can be used to model the connection between the velocity of an enzymatic reaction (in counts per min2) to the substrate concentration (in ppm). To illustrate, consider the simulated data plotted in the left panel of Fig. 1. The chosen design involves the chosen substrate concentrations x=0.02,0.04,0.06,0.08 and 0.10 replicated three times. The reader can see that there is an obvious curve in the data and a researcher may be inclined to use a linear approach by modelling the data with a polynomial regression that includes a quadratic term in which the predictor is the squared substrate concentrations. This approach may help the model to successfully fit the data by accounting for the curvature seen but polynomial regression is notoriously difficult to interpret. Instead, if a researcher were to use a nonlinear approach that was motivated by theory, such as the Michaelis–Menten enzyme kinetic theory, they would have a much more useful and interpretable model.

Fig. 1.

Fig. 1

Two motivating example plots. Left panel: Plot of simulated data (filled circles), fitted two-parameter Michaelis–Menten model function and estimated EC50 point (large, filled triangle). Right panel: Plot of fungal growth data (filled circles), fitted line and estimated IC50 point (large, filled triangle)

The classical Michaelis–Menten model function is given by

ηx,θ=θ1xθ2+x 1

Here, the model parameters are θ=θ1θ2, where θ1 is the upper asymptote (also called the ultimate velocity parameter) and θ2 is the EC50 or IC50 (sometimes also called the half-velocity) parameter. This follows since when x=θ2, we obtain ηx,θ=12θ1. By connecting the substrate concentrations (xi) with the velocity reaction rate (yij) data using the additive model expression yij=ηxi,θ+εij and least-squares estimation (see Sect. 3 and the R-code in the Supplementary Information), we obtain the parameter estimates, θ^1=209.868 and θ^2=0.0647. This fitted Michaelis–Menten model function is the nonlinear curve plotted in the left panel of Fig. 1. This model predicts that the enzyme velocity levels off at an ultimate velocity (upper asymptote) of almost 210 counts per min2. It also predicts that for a substrate concentration of 0.0647 ppm, the predicted velocity is approximately half of 210 or about 105 counts per min2. Note that the half-velocity point (0.0647,104.9) is the plotted solid triangle and is highlighted by the dashed vertical and horizontal lines. ■

The above simple illustration demonstrates the common application of nonlinear modelling in practical applications. A researcher now has easy-to-interpret parameter estimates that fit within a theoretical model that was motivated by expert background knowledge rather than some arbitrary quadratic term from a polynomial linear regression that is difficult to interpret. The next example shows that nonlinear modelling is also encountered when fitting a linear model but where interest centers on a nonlinear function of the linear model parameters.

Example 2

The data plotted in the right panel of Fig. 1 are adapted from a regression study (Bowers et al. 1986; Samuels et al. 2016) relating laetisaric acid concentration (the independent variable) to fungal growth (the dependent variable) in P. ultimum, the plant pathogen. These n=6 data points are plotted along with the fitted ordinary least-squares regression line. Indeed, a simple linear regression could be used to describe the relationship between laetisaric acid concentration and fungal growth in P. ultimum. However, the main goal of this particular study, as stated by the authors, was to estimate the acid concentration which “inhibits growth of P. ultimum by 50 percent” (Bowers et al. 1986, p. 106). Nonlinear regression modeling can help the authors meet this goal much more precisely. Rather than fitting a linear regression model, yi=α+βxi+εi, the researchers could fit a nonlinear model that directly estimates the half maximal inhibitory concentration (sometimes referred to as IC50). In the following nonlinear model, this IC50 parameter is represented by theta (θ) and the expected zero-concentration value (i.e., the intercept) is represented by alpha (α):

ηx,θ=α1-x2θ 2

We can fit this model using the”nls” function in R (for more details, please see the code in the Supplementary Information). This model fit shows that the parameter estimate for alpha (i.e., the intercept) isα^=32.64. In other words, the expected fungal growth with no applied laetisaric acid is estimated to be 32.64. Our model fit also estimates thatθ^=22.33. In other words, the concentration of laetisaric acid that will inhibit P. ultimum growth by half is 22.33. Indeed, if the reader plugs 22.33 into x in our fitted model, the expected value ofy, ory^50=ηθ^=16.32, is indeed half of α^ (i.e., the estimated intercept).

The observant reader may note that the fit for this model (depicted as a black line in the right panel of Fig. 1) is indeed a straight line. That is, if a person were to fit the data using the”lm” function in R, they would end up with essentially the same line as the one seen in Fig. 1. So, it is natural to ask why this model is considered nonlinear considering it produces a fit that is a straight line. As discussed further in Sect. 3, nonlinearity refers to whether or not the parameters of the model enter the model linearly or nonlinearly. In the Eq. (2), note that the slope (parameter expression which multipliesx) is-α2θ, which is nonlinear in the model parameters. Therefore, the model introduced in Eq. (2) is nonlinear.

The reader may next ask what benefit they gain from fitting a nonlinear model in this scenario since the produced model fit is the same line that a linear model would produce. When fitting a linear model in this case, the parameters that are estimated are the intercept (which is also estimated in the Eq. (2) model) and the slope of the line. This slope estimate would tell the researcher how much growth would change, on average, for every one unit increase in laetisaric acid concentration. This may be of interest to the researcher. But, as discussed above, the main goal of the study, as stated by the authors, was to investigate at which concentration of acid they may expect to see a 50% decline in growth. This so-called IC50 value (i.e., theta) could be determined from the linear model but the researcher would be unable to perform hypothesis testing or to calculate confidence intervals without making often inaccurate simplifying approximations. With the nonlinear model, we are directly estimating the IC50 parameter and, therefore, may directly test hypotheses and estimate confidence intervals.

In Fig. 1’s right panel, the point (θ^,y^50) (the filled triangle) as well as the corresponding vertical and horizontal (dashed) line segments are also plotted. As mentioned above, because we now have a direct estimate of the IC50 from our model we now calculate a confidence interval for this estimate that is so important for our study. ■

In this paper, we will discuss two different approaches to creating confidence intervals: the traditional Wald approach and the likelihood-based approach. As noted in the next section, for linear models, the two approaches generally give the same estimates for confidence intervals. But, as we will detail in Sect. 3 and elsewhere in the paper, likelihood-based confidence intervals are typically preferable when using nonlinear models.

Key Nonlinear Regression Methods and Results

In this section, we briefly introduce and develop key nonlinear regression results. Additional details, including theoretical results, are given in general nonlinear texts (Bates and Watts 2007; Ratkowsky 1983; Seber and Wild 1989) and in subject-matter works from an array of fields including agronomy (Miguez et al. 2020), animal science (Gonçalves et al. 2016), immunology (Bursa et al. 2020), and pupillometry (Bartošová et al. 2018; Rollins et al. 2014; You et al. 2021). Before addressing nonlinear models, we first illustrate various linear models.

For i=1,2,n, the usual (homoscedastic normal) simple linear regression model is written yi=α+βxi+εi with εiiidN0,σ2 where “iid” denotes ‘independent and identically distributed’. For the model function ηxi,θ=α+βxi and model function parameter vector θ=αβ, the general structure of this model is

yi=ηxi,θ+εi 3

The multiple linear regression model function is ηxi,θ=α+β1xi1+β2xi2++βp-1xip-1 for θT=α,β1,,βp-1 and the quadratic regression model is ηxi,θ=α+β1xi+β2xi2.

Perhaps surprisingly, even though the quadratic regression model (and other polynomial models) is sometimes used to account for curves observed in data, it is actually a linear model. In Sect. 2, we also saw a case in which we used a nonlinear model to fit a straight line to data with no observable curvature in it (see Example 2). So, what exactly do we mean when we call a model “nonlinear” since it does not necessarily refer to the shape that we see in the data? We next define and illustrate nonlinearity in regression modelling.

What Makes a Nonlinear Model Function Nonlinear?

For the homoscedastic normal model given in Eq. (3), the model function ηxi,θ with parameters θT=θ1,θ2,,θp is characterized as nonlinear if the (partial) derivative of ηxi,θ with respect to at least one of the parameters includes at least one of the model parameters. For example, for the Michaelis–Menten model function from Example 1 in Sect. 2, ηx,θ=θ1xθ2+x, note that both the partial derivatives, ηθ1=xθ2+x and ηθ2=-θ1xθ2+x2, contain model parameters, so this model function is nonlinear. Similarly, for Example 2 with model function ηx,θ=α1-x2θ, this model function is nonlinear since at least one of the two derivatives, ηα=1-x2θ and ηθ=αx2θ2, contain model parameters—in this case, both partial derivatives do. On the other hand, the quadratic model function ηx,θ=α+β1x+β2x2 has derivatives ηα=1,ηβ1=x and ηβ2=x2, and since none of these contain model parameters, the quadratic model is indeed a linear model. In sum, nonlinearity assesses the manner in which the model function parameters enter the model function—not how the explanatory variable(s) enter.

Another way in which nonlinear models are encountered is when a linear model is fit but where interest focuses on a nonlinear function of the model parameters. This was the case for Example 2 where researchers fit the simple linear model function ηx,θ=α+βx but where their focus was on estimating the IC50 parameter (θ). This parameter is such thatα2=α+βθ. Solving forβ, we getβ=-α2θ, and when this value is substituted into ηx,θ=α+βx with θ=αβ,this yieldsηx,θ=α1-x2θwithθ=αθ.

Another example is when researchers fit the quadratic regression model and wish to estimate the input value x where the model function achieves its maximum or minimum. Using basic calculus, this value, denoted δ, is such that β1+2β2δ=0, so that β1=-2β2δ. The model function can then be rewritten ηx,θ=α-2β2δx+β2x2. This new way of writing the original quadratic model function, called a model reparameterization, yields a nonlinear model. It has the clear advantage of making the parameter of interest be an inherent model parameter so as to more easily obtain accurate point and interval estimates.

Parameter Estimation: Point Estimates and Standard Errors

Parameter estimation for the homoscedastic normal nonlinear models considered here can be achieved using maximum likelihood estimation, or equivalently, least squares estimation. For Sθ given below in Eq. (5), the corresponding log-likelihood is written,

LLθ=-n2logσ2-12σ2Sθ 4

Since the model function parameters only appear in the Sθ term, maximum likelihood estimates (denoted MLEs) can be found by minimizing the sum of squares function,

Sθ=i=1nεi2=i=1nyi-ηxi,θ2 5

Least-squares estimates (denoted LSEs) are those parameter values that minimize Sθ for each of the p model function parameters. In other words, the goal of least squares estimation is to find parameter estimates that minimize the difference between observed values of y (denoted as yi) and model-predicted values of y (typically denoted as y^i). We denote the MLE/LSE parameter vector by θ^, and, when transposed, we can write θ^T=θ^1,θ^2,,θ^p. So for nonlinear model parameters, MLEs and LSEs are indeed the same.

Under standard regularity conditions, least-squares parameter estimates for these model function parameters are obtained by differentiating Sθ with respect to the p model parameters, setting these derivatives to zero, and solving the resulting p so-called normal equations. These normal equations are S(θ)θ1=0,S(θ)θ2=0,,S(θ)θp=0, and they can also be written,

i=1nηxi,θ^θ1ei=0,i=1nηxi,θ^θ2ei=0,,i=1nηxi,θ^θpei=0, 6

where for i=1,2,,n, the model residuals are ei=yi-ηxi,θ^. Note that in general for nonlinear model functions, Eq. (6) is a nonlinear system of p equations in p unknowns (i.e., the model function parameters). Since the system of normal equations can be written more concisely in matrix form, we introduce the n×p so-called Jacobian matrix,

X=ηx1,θ^θ1ηx1,θ^θ2ηx1,θ^θpηx2,θ^θ1ηx2,θ^θ2ηx2,θ^θpηxn,θ^θ1ηxn,θ^θ2ηxn,θ^θp 7

Using this notation, the normal equations system of p equations from Eq. (6) can be rewritten,

XTe=0, 8

where eT=e1,e2,,en. Readers familiar with linear models (where the model parameter vector is often denoted β) will recognize that here, e=y-Xβ^ and the normal equations are XTy-Xβ^=0 or XTXβ^=XTy. We emphasize that this latter expression holds only for linear models whereas Eq. (8) holds for the more general nonlinear model situation.

For the nonlinear models considered here, parameter estimates are obtained by solving the (nonlinear) normal equation system in Eq. (8) and this in general involves using numerical methods and computer algorithms such as the root-finding methods available in freeware packages such as R (see the Supplementary Information). The following two illustrations are provided to demonstrate the use of these nonlinear normal equations in nonlinear parameter estimation.

Example 3

Similar to the Michaelis-Menton model function in Example 1, consider the one-parameter (i.e., p=1) model function (which is nonlinear since the derivative contains the parameter θ),

ηx,θ=xθ+x 9

Notice that this model function has y=1 as its upper asymptote and the model parameter θ is the model EC50 since for x=θ,ηx,θ=12. For this single parameter nonlinear illustration, we use the simulated n=4 data points, xi,yi=0,0.037,2,0.209,(4,0.519) and (6,0.430). Since ηθ=-xθ+x2, we obtain the estimate of θ by substituting the xi,yi data values into the single normal equation,

x1θ+x12y1-x1θ+x1++x4θ+x42y4-x4θ+x4=0 10

This expression is a nonlinear equation in the model function parameter, θ. The “uniroot” function in R (see the Supplementary Information) is used to obtain the LSE which here is θ^=5.8698. ■

Example 2 (continued)

For the two-parameter vector θ=αθ, the model function for this illustration is ηx,θ=α1-x2θ and so the sum of squares function is Sθ=i=16yi-α1-xi2θ2. Differentiating with respect to the model parameters gives the expressions,

S(θ)α=-2i=161-xi2θyi-α1-xi2θ 11

and

S(θ)θ=-2i=16αxi2θ2yi-α1-xi2θ 12

When these two equations are set equal to zero, we obtain the two nonlinear normal equation system in two unknowns (α^ and θ^), i.e., the normal equations. For the data used in this example, the numerical algorithm yields the LSEs α^=32.639 and θ^=22.327 as reported previously. ■

The manner of finding standard errors for model function parameters is similar to that used for linear models but for nonlinear models it is based on the following approximation. The first order (and also asymptotic) variance–covariance matrix associated with the LSE parameter vector estimate θ^ is s2(XTX)-1 where the mean-square error (MSE) is s2=Sθ^n-p and the Jacobian matrix X is given in Eq. (7). The diagonal elements of s2(XTX)-1 are the squares of the standard errors (SEs) of the parameter estimates θ^. Note that for linear models, these standard errors are exact but for nonlinear models, they are based on a first-order (or large-sample) approximation.

To illustrate these results for Example 3, since here s2=Sθ^n-p=0.022173=0.00739 and XTX is the scalar i=14xi2θ^+xi4=0.004542, the standard error associated with θ^=5.8698 is SE=0.007390.004542=1.6269=1.2755. Analogous results can be obtained for Examples 1 and 2 but in these cases, since XTX is of dimension 2×2, matrix inversion is used to find XTX-1 and thereby the corresponding standard errors associated with the LSE parameter estimates.

Parameter Estimation: Interval Estimates

In statistical methodology, confidence interval strategies and methodologies can be obtained by “inverting” a test statistic. For example, in many single parameter situations (such as for a paired t-tests or regression through the origin), the null hypothesis H0:θ=θ0 can be tested using the test statistic,

t=θ^-θ0SE 13

Here, θ^ and SE are the corresponding parameter estimate and standard error. Under certain normal theory assumptions, this so-called Wald test statistic follows a t-distribution with n-1 degrees of freedom (Wald 1943). The test statistic is rearranged and solved for θ0 to produce the associated Wald 1-α100% confidence interval for θ, viz, θ^±tα/2,(n-1)×SE. Here, tα/2,(n-1) is the t-distribution quantile with n-1 degrees of freedom which puts area α/2 in both the lower and the upper tails. Similarly, for the p-dimensional parameter case where p>1, the corresponding Wald confidence interval (WCI) for parameter θi, i=1,2,,p, is

θ^i±tα/2,n-p×SEi 14

As noted in the previous section, SEi is the square root of the ith diagonal element of the variance–covariance matrix s2XTX-1. The degrees of freedom of the t quantile here is n-p.

For this one-parameter hypothesis test H0:θ=θ0, a rival test statistic to the above Wald test statistic is the likelihood-based F-statistic,

F=Sθ0-Sθ^s2 15

When H0 is true, this test statistic has the F distribution with 1 and (n-1) degrees of freedom. Inverting this test statistic gives the likelihood-based confidence interval (LBCI) which, since s2=Sθ^n-1 here, consists of the values of θ such that

Sθ=Sθ^1+F1,(n-1)n-1=Sθ^1+tn-12n-1 16

In this expression, Sθ=i=1nyi-ηxi,θ2 is the sum of squares function introduced and discussed previously. Note that Eq. (16) also makes use of the fact that the square of a t-quantile with k degrees of freedom is equal to the corresponding F-quantile with 1 and k degrees of freedom. Once the data have been obtained and used to estimate the single model parameter θ, and once the confidence level has been set, the right-hand side of Eq. (16) is a positive number. And, per this equation, finding the values of θ for which Sθ is equal to that positive number is again a nonlinear root-finding undertaking which generally uses numerical methods to solve. The following example provides an illustration of these Wald and likelihood-based confidence interval methodologies.

Example 3 (continued).

As reported in Sect. 3.2, for the n=4 simulated data points and the one-parameter model function given in Eq. (9), the LSE parameter estimate and standard error are θ^=5.8698 and SE=1.2755 respectively. The 95% t-quantile, obtained using the R command, qt(0.975,3), is 3.1824, and so the 95% (Wald) WCI is 5.8698±3.1824×1.2755 or(1.8106,9.9291). Finding the (likelihood-based) LBCI is a little more challenging since it is the interval of θ values between the interval end-point values for whichSθ=Sθ^1+tn-12n-1=0.09702. Using the “uniroot” R function employed in the Supplementary Information, the LBCI here is the values of θ in the interval(2.9960,12.7122). This simple illustration demonstrates a pronounced difference between these two types of 95% confidence intervals for these data. For example, the test that H0:θ=11 would be retained using the likelihood method but rejected using the Wald approach. The opposite conclusion would follow in testingH0:θ=2. We emphasize that whereas these two types of intervals coincide exactly for linear models, this is clearly not the case for nonlinear models. ■

We next demonstrate how the Wald approach supplies an approximation to the likelihood approach. Applying the first-order Taylor series approximation of the one parameter model function about θ^ and substituting this approximation in the sum-of-squares function, we obtain

ηx,θηx,θ^+ηx,θ^θθ-θ^
yi-ηxi,θyi-ηxi,θ^-ηxi,θ^θθ-θ^

So,

i=1nyi-ηxi,θ2i=1nyi-ηxi,θ^2+i=1nηxi,θ^θ2θ-θ^2 17

The last line in Eq. (17) follows from the penultimate line by squaring, summing over i from 1 to n, and noting that the cross-product term is zero by the normal equation. This last line of Eq. (17) shows that subject to the assumed first-order approximation, Sθ is approximately equal to the constant Sθ^ plus a quadratic expression in θ. When combined with Eq. (16), this provides the Wald interval given in Eq. (14). This demonstrates that Wald intervals are quadratic approximations to the true sum of squares function and result from an initial first-order approximation. These results are illustrated as follows.

Example 3 (continued).

For the given data, the sum of squares function Sθ is plotted in the left panel of Fig. 2 using the solid curve and where the filled circle is the point(θ^,Sθ^)=(5.8698,0.02217). Also plotted is the horizontal cut line at y=Sθ^1+tn-12n-1=0.09702 obtained from Eq. (16). The intersection of Sθ and the cut line gives the endpoints of the LBCI,(2.9960,12.7122), as indicated by the filled squares in the left panel. Also plotted is the (Wald) quadratic approximation as the dashed parabola; the intersection of this quadratic approximation and the cut line gives the endpoints of the WCI,(1.8106,9.9291). Although the exact shape of the sum of squares function Sθ is not relevant to the practitioner, what is important is that the Wald method is based on a linear approximation (which when squared gives the parabola) and that the two methods generally differ for nonlinear models.

Fig. 2.

Fig. 2

Sum of squares plots. Left panel: For Example 3 data, Michaelis–Menten-like sum of squares plot (solid curve) and Wald approximation (dashed parabola), least-squares parameter estimate (filled circle), LBCI endpoints (filled squares) and WCI endpoints (filled triangles). Right panel: For Example 2 data, laetisaric acid shifted profiled sum of squares plot (solid curve) and Wald approximation (dashed parabola), least-squares parameter estimate (filled circle), PLCI endpoints (filled squares) and WCI endpoints (filled triangles)

Before leaving this example, two other comments are important to note. First, notice that the LSE point estimate is the same whether we use the true Sθ function or the quadratic approximation. This is to be expected since the linear approximation takes place at θ=θ^, where the two functions are equal. Second, using Eq. (16), the cut-line for the 90% confidence intervals is easily computed to be y=0.06310. At this lower height (and lower confidence level), note that the difference between the Wald and likelihood-based intervals is less pronounced. Thus in general, the higher the confidence level, the greater will generally be the divergence between the two intervals for nonlinear models. ■

As noted in Eq. (14), Wald confidence interval methods for the multi-parameter case of p>1 model parameters are straightforward. Methods for obtaining likelihood intervals involve the technique of parameter profiling which we now discuss. First, partition the p-dimensional parameter vector θ as θ1θ2 where θ1 contains p1 model parameters and θ2 contains p2 model parameters so that p1+p2=p. To test the subset null hypothesis H0:θ2=θ20, the likelihood-based F test statistic (i.e., the counterpart of Eq. (15)) is

F=Sθ~1,θ20-Sθ^/p2s2=Sθ~1,θ20-Sθ^/p2Sθ^/(n-p) 18

Under the null hypothesis, this test statistic follows the Fp2,(n-p) distribution, that is, the F distribution with p2 and (n-p) degrees of freedom. In Eq. (18), θ~1 minimizes S(θ) subject to the constraint H0:θ2=θ20. This technique of removing so-called nuisance parameters (i.e., θ1 here) by constrained optimization is the mentioned profiling technique. Note that the restricted (constrained) estimate θ~1 is in general not equal to the unrestricted (LSE) estimate θ^1. Furthermore, since our interest is in obtaining a confidence interval instead of a region, let p2=1 so that θ2 is the single parameter of interest. Then, the result in Eq. (18) is

Sθ~1,θ20-Sθ^Sθ^/(n-p)F1,(n-p) 19

Inverting this expression gives the profile likelihood confidence interval (PLCI) as the set of θ2 values which solve the (typically nonlinear) equation

S(θ~1,θ2)=Sθ^1+Fα,1,(n-p)n-p=Sθ^1+tα/2,(n-p)2n-p 20

We underscore that θ~1=θ~1(θ2) in Eq. (20) is the value of the remaining (p-1)-vector θ1 in θ=θ1θ2 that minimizes S(θ) subject to the constraint that θ2 is the given fixed value. In certain instances, algebraic results can be derived to obtain the θ~1 vector in closed-form, but in the general situation, numerical methods are required.

Direct comparison of the LBCI expression in Eq. (16) with the PLCI expression in Eq. (20) highlights the fact that both approaches use root finding methods to find the corresponding confidence intervals. But the PLCI equation also involves constrained optimization to remove the remaining (so-called ‘nuisance’) parameters. The following example illustrates the PLCI method for a situation where an exact algebraic result is available.

Example 2 (continued).

As noted above, the key parameter in this example is the IC50 parameter, θ, so that the intercept parameter α is treated as a nuisance parameter. That is, α is a parameter which must be estimated but which is not the main focus of the study. The intercept α is profiled out by fixing θ and setting to zero only the partial derivative S(θ)α in Eq. (11). This gives the profiled (conditional) parameter estimate,

α~=α~θ=i=16yi1-xi2θi=161-xi2θ2 21

This profiled parameter estimate is then substituted into the sum of squares function to obtain the profiled sum of squares function,

Sθ=Sα~,θ=i=16yi-α~1-xi2θ2 22

For the given data, this profiled sum of squares function is plotted in the right panel of Fig. 2 using the solid curve and where the filled circle is the point(θ^,Sθ^)=(22.327,-109.5497). For ease in computations, this profiled sum of squares function has been shifted down here by the amount Sθ^1+tn-22n-2 so the horizontal cut line of Eq. (20) is then y=0. The intersection of the shifted profiled Sθ function and the horizontal cut line gives the endpoints of the PLCI,(15.9176,43.9640), as indicated by the filled squares in the figure. Also plotted is the (Wald) quadratic approximation as the dashed parabola; the intersection of this quadratic approximation and the cut line gives the endpoints of the WCI,(12.5786,32.0756). ■

The plots in the two panels of Fig. 2 look very similar but note that the plots in the left panel are for the sum of squares function, whereas in the right panel they correspond to the profiled sum of squares function. Regardless, in both cases, the Wald and likelihood curves and intervals are observed to differ appreciably.

Examples 1 and 2 (continued).

To summarize the previous findings for Sect. 2’s motivating examples and to introduce the topic of the next section, we briefly return to these two 2-parameter examples now displayed in the panels of Fig. 3.

Fig. 3.

Fig. 3

Two motivating example plots with confidence intervals. Left panel: Plot of simulated data (small, filled circles), fitted two-parameter Michaelis–Menten model function, estimated EC50 point (large, filled triangle), 95% Wald confidence interval (WCI) (short-dashed line segment between large, filled squares) and 95% profile likelihood confidence interval (PLCI) (long-dashed line segment between large, filled circles). Right panel: Plot of fungal growth data (small, filled circles), fitted line, estimated IC50 point (large, filled triangle), 95% Wald confidence interval (WCI) (short-dashed line segment between large, filled squares) and 95% profile likelihood confidence interval (PLCI) (long-dashed line segment between large, filled circles). The confidence intervals are spanned by the horizontal lines at the bottom of the panels

In both of these examples, the Wald intervals are (by construction) symmetric about the respective LSEs whereas the profile likelihood intervals are shifted to the right. ■

We next turn to giving practical reasons for preferring one of these confidence interval methods over the other. Then, in Sect. 3.5, we discuss nonlinear model selection and computational algorithms.

Deciding Which Confidence Interval Method is Preferred and Why

For nonlinear model function parameter estimation, F-statistic likelihood-based confidence intervals are generally preferred to Wald methods for several important reasons. These reasons, discussed further below, have been underscored by several notable works (Bates and Watts 2007; Clarke 1987; Cook and Witmer 1985; Donaldson and Schnabel 1987; Evans et al. 1996; Faraggi et al. 2003; Haines et al. 2004; Pawitan 2013; Peddada and Haseman 2005; Ratkowsky 1983; Seber and Wild 1989). As regards the notation used here, Wald confidence intervals (WCIs) are those given in Eq. (14) and likelihood confidence intervals are either likelihood-based confidence intervals (LBCIs) in the one-parameter setting such as in Eq. (16) or profile-likelihood confidence intervals (PLCIs) for the multiparameter setting such as for the two 2-parameter motivating examples displayed in Fig. 3 and as in Eq. (20).

There are several important reasons why likelihood confidence interval methods (LBCIs and PLCIs) are preferred over WCIs for nonlinear modelling. One reason is that likelihood intervals generally have much better agreement between nominal (i.e., assumed) confidence levels and actual confidence levels. Several works (Clarke 1987; Donaldson and Schnabel 1987; Evans et al. 1996; Faraggi et al. 2003; Ratkowsky 1983; Seber and Wild 1989) have used computer simulations to demonstrate that, provided the model/data’s intrinsic curvature (discussed in Appendix A.1) is reasonably low, likelihood intervals typically demonstrate good agreement between the chosen nominal (e.g., 95%) and the actual confidence level. Results for Wald intervals, however, can be quite disappointing. For example, simulation studies for some reported homoscedastic normal nonlinear models have found that the “observed coverage for a nominally 95% [Wald] confidence interval is as low as 75.0%, 44.0%, and 10.8%” (Donaldson and Schnabel 1987, p.76), depending on the data and chosen model function.

The superior coverage of likelihood methods is not surprising and is to be expected since for homoscedastic normal nonlinear models, the likelihood test statistics given in Eqs. (15), (18), and (19)—and the associated likelihood confidence intervals—are exact or very nearly so. Theoretical results show that the only difference between these F-statistic results and exact results (including p-values and coverage probabilities) depends upon the model’s intrinsic curvature (discussed in Appendix A.1). Further, intrinsic curvature is often negligible for nonlinear models in practice (Bates and Watts 2007; Clarke 1987; Ratkowsky 1983; Seber and Wild 1989). Wald confidence intervals, on the other hand, are also affected by so-called parameter-effects curvature (also discussed in Appendix A.1), which can be appreciable for many nonlinear model-dataset situations.

Another reason likelihood methods are preferred is that they more accurately reflect the information in the data. For example, for the two models and datasets of Fig. 3, the WCIs are symmetric whereas the PLCIs are shifted to the right. Since most of the datapoints (e.g., five of the six data points in the right panel) lie to the left of the estimated IC50, the relative amount of ‘information’ in the data about the IC50 parameters is higher on the left of the IC50 estimate and lower on the right side. So the PLCIs are more reasonable for these datasets and models since less information is contained in the data on the right side of the estimated parameters and so the PLCIs extend further on the right-hand side. More generally, since WCIs for nonlinear model parameters are always symmetric and PLCIs can and often are asymmetric, PLCIs can more accurately reflect any information/precision imbalances in the data regarding specific parameter values.

In sum, for all practical purposes, the F-based likelihood methods used here are essentially exact (see Appendices A.1 and A.2). On the other hand, Wald methods for nonlinear model parameters are based on the asymptotic (valid for large-sample) normality of the model parameter estimate. Since this approximation breaks down for many nonlinear models with small-to-moderate datasets, these Wald methods should only be used with caution or avoided altogether—and this includes the commonly reported Wald p-values given by some popular statistical software packages.

Nonlinear Model Function Selection and Computational Methods

As regards model function selection, our preference is to use mechanistic models instead of empirical models whenever possible. Mechanistic models are those chosen based on the subject-matter knowledge of the relevant system or phenomenon under study, whereas empirical models are those often chosen based on providing a good fit to the study data. In early-stage studies of two or more quantitative factors, empirical modelling sometimes includes response surface modelling such as quadratic (or higher-order) polynomial fitting. As expert knowledge of the system grows, focus often shifts to nonlinear modelling, such as using dose response or similar (nonlinear) model(s).

Mechanistic nonlinear model functions are sometimes based on a system of one or more differential equation(s). These are equations that model rates of change in the given system. These so-called compartmental models are popular in fields including chemical kinetics, ecology, pharmacology (including pharmacokinetics and pharmacodynamics) and toxicology (Bates and Watts 2007; Seber and Wild 1989). For example, the exponential decay model function for the population of an ecosystem at time t,

Pt=P0e-rt, 23

is a solution of the differential equation (with given initial condition),

dP(t)dt=-rP(t),P0=P0 24

This differential equation posits that the rate of decrease in the population at time t is proportional to the size of the population at that time. For another example, if the rate of change of the size of a biological culture is assumed to grow rapidly at first up to a point and then decrease (e.g., with increased competition), a commonly-assumed differential equation with ‘half-life’ condition is,

dP(t)dt=θ3θ1P(t)θ1-P(t),Pθ2=θ12 25

A solution of this differential equation is the three-parameter, normally distributed logistic growth model function,

P(t)=θ11+e-θ3t-θ2 26

Other models, such as the intermediate-product model in pharmacokinetics (Bates and Watts 2007), involve systems of two or more differential equations.

Parameter estimation for nonlinear models is generally achieved using iterative methods such the Newton–Raphson method (Ratkowsky 1983) or some variant thereof. This method involves successively substituting linear approximations such as those used in Eq. (17) into the sum of squares function and/or normal equations. This process is repeated “until convergence,” meaning until the changes in the objective function between iterations are below some chosen threshold. These computational algorithms have been implemented into the NLIN, NLP and NLMIXED procedures in SAS, the “nls,” “nlmer” and “gnm” functions in R, and other software packages such as GAUSS, Minitab, PRISM, STATA, etc. Paramount to this process is the necessity of well-chosen starting points, which is best achieved by first understanding the roles of the individual model function parameters and plotting the given data. Further details of computation aspects for nonlinear modelling can be found in nonlinear regression texts (Bates and Watts 2007; Ratkowsky 1983; Seber and Wild 1989).

Additional Nonlinear Illustrations

The following examples further serve to illustrate the wide-ranging applications of nonlinear modelling and are included for readers wishing for additional examples.

Example 4

The nonlinear model discussed here is a segmented regression function model (also called a broken-stick, piecewise, or change-point model). This model function is used in data science and application fields such as agronomy, economics, engineering, environmental studies, and medicine (Seber and Wild 1989). The data examined here (Anderson and Nelson 1975) and graphed in Fig. 4 relate average corn yields (the outcome variable) to the amount of nitrogen fertilizer applied (the input variable). Following the authors, the linear-plateau segmented model is fitted here, and the corresponding fitted linear-plateau curve is also superimposed in the figure.

Fig. 4.

Fig. 4

Simple Spline Fit. Corn yield versus nitrogen fertilizer data (six filled circles), fitted linear-plateau segmented curve, and estimated knot or join-point (filled square)

The linear-plateau model used here has parameter vector θT=α,β,κ and is written

ηx,θ=α+βx,forxκyMAX=α+βκ,forx>κ 27

This model function can also be written

ηx,θ=α+βxIxκ+α+βκIx>κ 28

Here IC is an indicator function equal to one when condition C in the subscript is true and equal to zero otherwise. In accordance with the underlying (agricultural) subject-matter reasoning used by the authors and as observing in the data plotted in Fig. 4, this chosen model is continuous at the unknown join or transition point x=κ. This is a nonlinear model since the transition point (also called a knot), κ, is a model parameter to be estimated, and for x>κ, the derivative ηβ=κ contains a model parameter.

Using the given R code (see the Supplementary Information), for these data the parameter estimates are α^=60.90,β^=0.22 and κ^=101.28, so the maximum corn yield is estimated to be y^MAX=60.90+0.22×101.28=83.58. The point (κ^,y^MAX) is the solid square plotted in Fig. 4. The 95% profile likelihood confidence interval for the transition point κ is (78.72,143.48).

Although sound reasons were already given in Sect. 3.4 for avoiding the use of Wald methods, we re-emphasize that caution is given (Hinkley 1969; Seber and Wild 1989) to avoid using Wald-based methods for such segmented models since the required asymptotical normality approximation for κ^ can often be quite poor; with such the small sample size of n=6, likelihood methods are instead recommended. This caution regarding WCIs should especially be borne in mind when using spline models with unknown knots such as in fitting smoothing splines and generalized additive models popular in the domains of predictive modelling and machine learning (James et al. 2021). ■

Example 5.

Estimation of the ratio of two homoscedastic independent-sample normal means, referred to as the Fieller-Creasy problem (Cook and Witmer 1985; Creasy 1954; Fieller 1954), is the focus of this next illustration. For n1+n2=n, let y11,y12,,y1n1 denote the n1 group 1 independent measurements and y21,y22,,y2n2 denote the n2 group 2 independent measurements. The nonlinear Fieller-Creasy model function is written.

ηx,θ=θ1x+θ1θ21-x 29

In Eq. (29), x=1 for group 1 observations and x=0 for group 2 observations. Thus, θT=θ1,θ2, θ1=μ1 and θ2=μ2/μ1. It follows that θ2 is the parameter of interest since it is the ratio of the two means and θ1 is the nuisance parameter; following Sect. 3.3, θ1 is removed by parameter profiling so as to find the PLCI for the ratio parameter, θ2.

To illuminate use of these methods here, we use the simulated dataset wherein the n1=3 group 1 response values are y1j=3,4,5 and the n2=8 group 2 response values are y2j=6,6,7,8,8,9,10,10. Clearly, θ^1=y¯1=4 and, since y¯2=8, θ^2=8/4=2. With S(θ^)=20, the unbiased estimator of σ2 is the mean-square error (MSE), s2=2011-2=2.22.

Using the results given in Appendix A.3, the (1-α)100% Wald confidence interval (WCI) for θ2 is

θ^2±stα/2,(n-2)θ^1n1+n2θ^22n1n2 30

Likewise, the Appendix A.3 results are used to show that the profile likelihood confidence interval (PLCI) for θ2 is

θ^21-c±stα/2,(n-2)1-cθ^1n11-c+n2θ^22n1n2 31

In this expression, c=s2tα/2,(n-2)2n1θ^12, and c is in the interval, 0<c<1. Thus, the center of the PLCI, θ^21-c, is shifted to the right of the center of the WCI, θ^2.

For the given data, the 95% WCI is (0.98,3.02) and the 95% PLCI is (1.30,3.94). The rightward shift of the PLCI vis-à-vis the WCI is notable here. Also, whereas the Wald approach would retain equal means (i.e., value of one is retained for the ratio parameter, θ2), the likelihood approach clearly rejects this claim. ■

In the following continuation of Example 1, we extend the original illustration to comparing two curves and calculate a relative potency parameter based on the ratio methodology of the previous illustration.

Example 1 continued.

The original Example 1 enzyme kinetic data analyzed previously and displayed in the left panel of Fig. 1 are for samples untreated with an antibiotic; the averages of the three (same concentration) replicates of these data are plotted in Fig. 5 using the small, filled triangles. In a spirit similar to other works (Bates and Watts 2007, p. 269), additional enzyme velocity measurements were made (also in triplicate) using the same substrate concentrations but for samples treated with the antibiotic. Averages of these replicates are also shown in Fig. 5 using the small, filled circles. (The fitted curves in the figure will be discussed below.) Using the relevant kinetics nonlinear modelling, researchers are interested in determining and quantifying the effect of the antibiotic on enzymatic activity.

Fig. 5.

Fig. 5

Treated and Untreated Enzyme Kinetic Model Fits. Average enzyme velocity versus concentration for antibiotic treated data (small, filled circles) with fitted common-upper-asymptote Michaelis–Menten curve (dashed curve) and untreated data (small, filled triangles) with fitted common-upper-asymptote Michaelis–Menten curve (solid curve). Also shown are estimated EC50 points: treated (larger, filled circle) and untreated (larger, filled triangle)

To enable testing between the treated and untreated groups, the Michaelis–Menten model function of Eq. (1) is modified to fit both groups simultaneously using the model function,

ηx,θ=θ1TDT+θ1UDUxθ2TDT+θ2UDU+x 32

In this expression, DT=1 for samples in the treated group and DT=0 for samples in the untreated group. Analogously, since DU=1-DT, one obtains DU=1 for samples in the untreated group and DU=0 for samples in the treated group. With θT=(θ1T,θ1U,θ2T,θ2U), this model function expression is equal to ηx,θ=θ1Txθ2T+x for samples in the treated group and ηx,θ=θ1Uxθ2U+x for samples in the untreated group, and so, as before, the θ1 and θ2 parameters are the respective upper asymptote and EC50 parameters.

For the given data, the LSE parameter estimates are θ^1T=214.6,θ^1U=209.9,θ^2T=0.03712 and θ^2U=0.06472. The global test of one curve for both treatment groups, H0:θ1T=θ1U,θ2T=θ2U, is soundly rejected with F=25.76,p<0.0001, but the claim of equal upper asymptotes, H0:θ1T=θ1U,p=0.89, is retained. (These results can be verified by running the R code in the Supplementary Information.)

The reduced two-group Michaelis–Menten model function with common upper asymptote is given by the expression,

ηx,θ=θ1xθ2TDT+θ2UDU+x=θ1xθ2TDT+ρθ2TDU+x 33

Note that this expression is equal to ηx,θ=θ1xθ2T+x for the treated group and ηx,θ=θ1xθ2U+x for the untreated group, and the commonality of the upper asymptotes is noted. The connection between the right-hand expression of Eq. (33) is given by the relation,

ρ=θ2Uθ2T 34

The so-called relative potency parameter ρ in Eq. (34) is the ratio of the respective EC50 parameters, and it is in this context that this illustration mirrors Example 5; note too that by making it an explicit model function parameter, we can readily obtain an accurate (likelihood-based) confidence interval.

When the model function in Eq. (33) is fit to these data, the fitted curves are shown in Fig. 5 for the treated (top) and untreated (bottom) groups. For these data, the LSE estimate of ρ isρ^=1.8275, so the substrate is approximately 1.8 times more potent for the treated group than for the untreated group. Further, the 95% PLCI forρ,(1.5274,2.2366), lies entirely above one, thereby establishing that the substrate is significantly more potent for the treated group than for the untreated group. ■

Example 6.

Examined here are dose–response data (Seefeldt et al. 1995) relating yield dry weight of biotype C wild oat Avena fatua (the response variable in g) to herbicide dose (the explanatory variable in kg ai/ha). These data are plotted in Fig. 6 with the raw data shown in the left panel and with the log-yield data plotted in the right panel. We use here the four-parameter log-logistic (LL4) model function (Seefeldt et al. 1995),

ηx,θ=θ2+θ1-θ21+xθ3θ4,θT=θ1,θ2,θ3,θ4 35

Fig. 6.

Fig. 6

Wild Oat Dry Weight Dose Response Fits. Left panel: Original dry weight yield data plotted versus herbicide dose with heteroskedastic (variance function modelled) LL4 model fit (solid curve). Right panel: (Natural) Log-transformed dry weight yield data plotted versus herbicide dose with homoskedastic LL4 model fit (solid curve)

In this model function, which is also called the Hill equation or the Morgan-Mercer-Flodin family (Seber and Wild 1989), θ3 is the ED50 (50% effective dose) parameter and θ4 is the slope parameter. For θ4>0, θ1 is the ‘upper asymptote,’ or the expected response when x=0 and θ2 is the lower asymptote or the expected response for very large dose (i.e., for x). To establish that θ3 is the ED50 parameter, note that when x=θ3, the expected response is indeed θ1+θ22, the average of the two asymptotes.

In viewing the non-constant variance of the original data in the left panel of Fig. 5, we can fit the LL4 model function using log-yield as the response variable, and this fitted model function is superimposed as the solid curve in the right panel plot. Alternatively, after applying the log-transformation to both left and right sides of the equation, the log-yields could be fit using the logarithm of the LL4 model function. In this instance, the results in both cases are very similar. This practice of transforming the response variable (e.g., log-transformation here) with or without transformation of the model function, and then fitting the additive homoskedastic normal nonlinear model of Eq. (3), is quite commonly-used in practice. But, whether this is a sound practice depends on whether selected variance-stabilizing transformation (such as logarithm, square-root, etc.) is a good choice for the given dataset and model function. As such, we next consider an alternative strategy.

Although it falls outside of the constant-variance normal additive paradigm of Eq. (3), another option is to fit the additive LL4 model function to model the un-transformed responses and to also model the variances using a variance function such as

varyij=σ2ηρxi,θ 36

In Eq. (36), in addition to the variance parameter, σ2, an additional parameter, ρ, has been included as the power of the mean model function, ηx,θ. If ρ=0, then Eq. (36) reduces to the usual homoskedastic case where varyij=σ2 of Eq. (3). Whenever ρ>0, this variance function holds that the variance (i.e., the spread of the data response values) decreases with the mean, and this behavior is indeed observed in the left panel of Fig. 5 since the variance of the responses is higher when the average yield is higher and lower when the average yield is lower. For the data plotted in the left panel of Fig. 6, the maximum-likelihood estimate of ρ is ρ^=1.4707, and the test of H0:ρ=0 is rejected (p<0.0001). Using results in (Seber and Wild 1989), the estimate of ρ^1.5 suggests that the fourth-root transformation (y1/4) may have been a better choice for these data than the log-transformation used above. For these data, however, since the results are very similar, the homoskedastic normal nonlinear fit shown in the right panel of Fig. 6 (for the log-transformed data) is deemed to be sufficient. ■

Discussion and Final Thoughts

Before the advent of sufficient computing power and model-fitting methods, nonlinear models—often derived and based on sound expert-knowledge and theory—were historically fit by using linearization methods. This technique ignores the overall additive model structure given in Eq. (3) and the underlying model assumptions. For example, for the Michaelis–Menten model and function, y=θ1xθ2+x+ε, if this expression is replaced with the approximation yθ1xθ2+x, algebraic manipulation leads to the expression x/yθ2/θ1+1/θ1x. With some further substitutions, the right-hand side of this expression is of the form α+βx and so linear models were then fit. Often, the resulting transformation introduced additional problems such as non-constant variance, lack-of-fit, and challenges in obtaining confidence intervals for the original model parameters. Although several authors (Currie 1982; Seber and Wild 1989) clearly warn against using such linearization methods, without introductory guides such as the current work, practitioners may not yet be aware of these problems.

In addition to the nonlinear regression methods and examples provided here, interested readers may wish to more fully explore topics such as further heteroskedastic (variance function) modelling, bioassay and synergy modelling (Lee et al. 2007; Lynch et al. 2016; Sims and O’Brien 2011; Straetemans et al. 2005; Tallarida 2000; Wheeler et al. 2006; White et al. 2019), multivariate, compartmental, and generalized nonlinear models, related experimental design considerations (Kim et al. 2021; O’Brien et al. 2010, O’Brien and Silcox 2021), and additional curvature examples (Seber and Wild 1989). Other notable recent application fields include the use of high-throughput dose response methods to evaluate compounds as potential antiviral drugs to treat COVID-19 patients (Chen et al. 2022) and modelling to assess enzymatic activity in viral proteins comparing SARS-CoV with SARS-CoV-2 (O’Brien et al. 2021).


Acknowledgements

The authors express their gratitude for the thoughtful suggestions and comments from the Editor and two anonymous reviewers, which led to a significant improvement in the quality of this work. T.E.O. expresses his appreciation to the J. William Fulbright Board for granting his Fulbright Traditional Scholar Award at Budapesti Műszaki és Gazdaságtudományi Egyetem (BME) in Budapest, Hungary, the BME Mathematics and Data Science teams for cordially hosting his 2021 Fulbright visit, and to his BME Fall 2021 semester class for insightful questions and discussions related to some of this material.

Appendix A.1. Why Wald and Likelihood Confidence Interval Methods Differ: Statistical Curvature and Model Function Reparameterizations

Although the question of why Wald and likelihood intervals differ is easy to pose, it is somewhat challenging to answer mathematically. As such, the discussion here is kept at the conceptual level. Further details are given in the references, and readers wishing to understand these details (which are based on differential geometry and nonlinear statistical theory) are encouraged to consult these sources.

Agreement or disagreement of Wald and likelihood-based confidence intervals can be assessed using so-called ‘nonlinearity’ or ‘curvature’ measures introduced in several works (Bates and Watts 2007; Clarke 1987; Haines et al. 2004; Ratkowsky 1983; Seber and Wild 1989). There are two main types of curvature measures: intrinsic (denoted IN) and parameter-effects (denoted PE) curvatures. Both IN and PE curvatures characterize nonlinear attributes of a model/dataset’s expectation surface. Although no hard-and-fast rules exist regarding these IN and PE measures, general rule-of-thumb guidelines have been given to decide when nonlinearity is problematic (Bates and Watts 2007). Furthermore, reparameterizing a model function can reduce the disagreement between likelihood and Wald intervals: PE curvature, which captures how the parameters enter the model function, can be reduced by reparameterization, whereas what remains after all such parameter-effects curvature is removed is the intrinsic curvature (IN), which cannot be reduced and is inherent to the model function and data. Further, as pointed out in (Seber and Wild 1989, p. 195), since likelihood methods are “invariant to reparameterizations, we can assume that a transformation has been found to remove the parameter-effects curvature,” and thus they are “only affected by intrinsic curvature, which is often negligible.” So, in practice, the main reason likelihood and Wald intervals differ is PE curvature, and this in turn is related to the manner in which distances are calculated when finding confidence intervals. This distance metric is straightforward to assess for linear models, where both IN and PE measures are zero, but is more challenging for nonlinear models. In light of the linear approximation used in Wald methods, the metric used for Wald distance and interval calculations is often inaccurate.

The following one-parameter example develops and illustrates the intrinsic (IN) and parameter-effects (PE) curvature measures. It also underscores the manner in which Wald confidence intervals approximate likelihood intervals and highlights how and when these intervals can diverge.

Example 7.

The homoskedastic normal one-parameter simple exponential model function, used here and in drug studies involving pharmacokinetic one-compartment modelling (Bailer and Portier 1990), is given by the expression

η(x, θ) = e^{−θx},  θ > 0   (37)

This one-parameter model function is fitted here to simulated data with n = 2 points, (x₁, y₁) = (0.50, 0.93) and (x₂, y₂) = (4.0, 0.025). (Although it would be ill-advised to use such a small sample size in practice, choosing only two data points facilitates viewing the so-called one-dimensional expectation surface in this two-dimensional space.) These data and model yield the LSE/MLE θ̂ = 0.50, the 80% likelihood-based confidence interval (LBCI) of (0.12, 2.25), and the 80% Wald confidence interval (WCI) of (−0.37, 1.37).
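
These results can be reproduced in R with a few lines; a minimal sketch follows, where confint() computes the profile-based (F/t) likelihood interval used throughout this work. (Profiling with a single residual degree of freedom can be numerically delicate, so the reported endpoints should match the quoted values only approximately.)

x <- c(0.50, 4.0)
y <- c(0.93, 0.025)
fit <- nls(y ~ exp(-theta * x), start = list(theta = 1))
coef(fit)                               # LSE/MLE: theta-hat approx 0.50
est <- coef(fit)
se  <- summary(fit)$coefficients[, "Std. Error"]
est + c(-1, 1) * qt(0.90, df = 1) * se  # 80% WCI: approx (-0.37, 1.37)
confint(fit, level = 0.80)              # 80% LBCI: approx (0.12, 2.25)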

To appreciate how and why likelihood and Wald intervals differ, the left panel of Fig. 7 plots the one-dimensional expectation surface (E) in the two-dimensional sample space for this model function and the given data (see also Bates and Watts 2007). Since here η₁ = e^{−0.5θ} and η₂ = e^{−4θ}, the expectation surface is given by the equation η₂ = η₁⁸. The expectation surface E is mapped out as θ ranges from 0 to ∞, as indicated by the plotted θ values. Intrinsic curvature (IN) assesses the degree to which E deviates from a straight line here, and from a plane or hyper-plane in the general p > 1 case. Given the pronounced bending in E in Fig. 7, it is not surprising that the realized value of IN, calculated here to be IN = 1.44, exceeds the suggested 0.30 threshold value (Bates and Watts 2007). (Details of these calculations are omitted here and can be found in the references.) Thus, IN is deemed to be “significant” in this instance. We again underscore that for linear models IN = 0, since then the expectation surface is planar (or a straight line when p = 1).

Fig. 7.

Fig. 7

Illustration of Intrinsic (IN) and Parameter Effects (PE) Curvatures for Simple Exponential Example. Plots of the expectation surfaces (E) for one-parameter simple exponential model function/design for original (θ) parameterization (left panel) and reciprocal (γ=1/θ) parameterization (right panel). The data point is labelled “y (data)”. The selected points on each expectation surface are indexed by the indicated parameter values

The expectation surface given in the right panel of Fig. 7 is also for these data but corresponds to use of the reciprocal model function parameterization, η₂(x, γ) = e^{−x/γ}, instead of the original model function parameterization, η(x, θ) = e^{−θx}. For η₂(x, γ), the intrinsic curvature is again IN = 1.44; this follows from the invariance of IN to the one-to-one reparameterization γ = 1/θ.

The data point (y1,y2)=(0.93,0.025) is also displayed in both panels of Fig. 7. The least-squares estimate points, θ^=0.5013 in the left panel and γ^=1.9948 in the right panel, correspond to the point on E nearest to the data point.

The second curvature measure, parameter-effects (PE) curvature, assesses the extent to which the spacing of the parameter points on E near the LSE point is “regular” (i.e., equi-spaced in this one-parameter case). The spacing in the left panel is distorted: although the numerical distance between 1 and 0.5013 is about the same as that between 0.5013 and 0 (both are about 0.5), the distances between the corresponding labelled points in the left panel of Fig. 7 are quite different. That is, the arc length of the segment of E from the θ = 1 point to θ̂ = 0.50 is much less than the arc length of the segment of E from θ̂ = 0.50 to θ = 0. Since the PE measure evaluates this spacing, it is not surprising that the left panel’s value, PE = 2.43, greatly exceeds the suggested cut-off value of 0.30. The PE situation for the right panel of Fig. 7 in the neighborhood of γ̂ = 1.99 is not as problematic since the spacing near γ̂ appears more regular. It is not surprising, then, that the right panel’s parameter-effects curvature, PE = 1.03, although still above the 0.30 cut-off, is less than half that of the left panel. Clearly, for this model and data, both the IN and PE curvature measures are high, and so the Wald approximation will be poor, as noted above. In practice, these curvature values can be reduced by increasing the sample size. We next delve deeper into understanding likelihood and Wald confidence interval methods and the discrepancies between the two.

Using the given data and the original model function, η(x, θ) = e^{−θx}, Fig. 8 enables visualization of the discrepancies between Wald and likelihood confidence intervals. The 80% confidence circle centered at the data point (labelled Y) is obtained from Eq. (5). Instead of plotting this equation in the one-dimensional parameter space, it is viewed here in the n = 2-dimensional sample space, and so it is the (circular) set of (η₁, η₂) values for which

(0.93 − η₁)² + (0.025 − η₂)² = S(θ̂)[1 + t²_{0.10,1}] = 0.3669 = (0.6057)²   (38)
Fig. 8.

Fig. 8

Distinguishing Wald and Likelihood Confidence Intervals. Plots of the expectation surface for the (original parameterization) simple exponential model and chosen design (curved surface E, the same as given in the left panel of Fig. 7), the data point (labelled Y), and the point on E corresponding to the least-squares estimate θ̂ = 0.5013 (labelled E). Also plotted are the tangent-line approximation to E at this point (the dot-dashed line segment connecting points C and D) and the 80% confidence circle centered at Y. The points on E labelled A and B correspond to the likelihood confidence interval endpoints; the points on the tangent-line approximation corresponding to the Wald confidence interval endpoints are labelled C and D. A regular-grid approximation on the tangent-line approximation is indicated using filled squares

The tangent line to E at the point θ̂ = 0.5013 is also plotted in Fig. 8. Its slope is d(η₁⁸)/dη₁ = 8η₁⁷ evaluated at η̂₁ = 0.78, namely 1.38, and it is given by the equation η₂ = −0.94 + 1.38η₁. To facilitate understanding of parameter-effects curvature, superimposed on this tangent line in Fig. 8 is a series of points (plotted as filled squares) which have regularly-spaced θ values, chosen here to be θ = −0.25, 0, 0.25, …, 1, 1.25.

As noted above, the intrinsic curvature measure (IN) assesses the discrepancy between the actual expectation surface E and the tangent-line approximation. In this case, the discrepancy is pronounced and is reflected in the observed IN = 1.44 exceeding the 0.30 cut-off. The parameter-effects curvature measure (PE), on the other hand, measures the difference between the actual spacing of the θ values on E (see the left panel of Fig. 7) and the regularly-spaced filled-square θ values superimposed on the tangent line in Fig. 8. This difference is also pronounced and is reflected in the calculated value, PE = 2.43, which also exceeds 0.30. (In models with more than one model function parameter, PE assesses the degree to which the θ values on E near θ̂ behave like a regular grid.)

Figure 8 shows that the intersection of the 80% confidence circle given in Eq. (38) with the expectation surface E occurs at point A (for which η₁ = 0.94 and η₂ = 0.63) and point B (for which η₁ = 0.32 and η₂ = 0.0001). Using the model function η = e^{−θx} and solving for the corresponding values of θ, these points give the 80% LBCI, (0.12, 2.25). The intersection of the 80% confidence circle with the tangent-line approximation occurs at point C (for which η₁ = 1.12 and η₂ = 0.60) and point D (for which η₁ = 0.44 and η₂ = −0.33). Using the linear approximation to the model function, η(x, θ) ≈ e^{−θ̂x} − x e^{−θ̂x}(θ − θ̂), and solving for θ, these points give the 80% WCI, (−0.37, 1.37). This demonstrates how Wald and likelihood confidence intervals are obtained for nonlinear models and how and why they differ.

This simple, one-parameter, small-sample (n = 2) illustration demonstrates that Wald confidence intervals rest on two approximations, which may or may not be adequate for a given nonlinear model and dataset. The first approximation replaces the actual expectation surface with its tangent line (or tangent plane or hyper-plane). The second replaces the actual spacing of the parameter values on the expectation surface near the parameter estimate with a regular grid, and uses this approximate regular grid to measure distances. For the reciprocal parameterization used above for this model (with expectation surface in the right panel of Fig. 7), the spacing of the parameter values on the expectation surface in the vicinity of γ̂ is more regular, and so the PE curvature is lower. Since commonly-used statistical software typically does not indicate when these approximations hold or fail, practitioners are wise to bear them in mind when using (approximate) Wald confidence intervals and p-values. ■
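
For readers wishing to compute these diagnostics for their own models, MASS::rms.curv() computes the Bates and Watts root-mean-square intrinsic and parameter-effects curvatures from an nls() fit whose model function carries gradient and Hessian attributes (built with deriv3()). The sketch below applies it to Example 7; the reported values should be comparable to the IN and PE values quoted above, though scaling conventions can differ.

library(MASS)
expn <- deriv3(~ exp(-theta * x), "theta",
               function(x, theta) NULL)   # attaches gradient and Hessian attributes
x <- c(0.50, 4.0)
y <- c(0.93, 0.025)
fit <- nls(y ~ expn(x, theta), start = list(theta = 1))
rms.curv(fit)   # RMS parameter-effects and intrinsic curvatures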

For the laetisaric acid illustration in Example 2 (Sect. 2) and the Fieller-Creasy illustration in Example 5 (Sect. 4), the intrinsic curvature measures are exactly zero. This follows since the chosen model functions are transformations of linear models, the IN measure is invariant to reparameterization of the model function, and IN is zero for linear models. For example, Example 2’s model function, η(x, θ) = α(1 − x/(2θ)), is a one-to-one transformation of the usual linear regression model function, η(x, θ) = α + βx, using the non-singular transformation from (α, β) to (α, θ) with θ = −α/(2β). This means that differences between PLCIs and WCIs for these examples result only from parameter-effects curvature, and that PLCIs are exact for these illustrations.

In addition to the two overarching reasons given in Sect. 3.4 for preferring likelihood intervals over Wald intervals, a third reason relates to model reparameterization: likelihood intervals are invariant to reparameterizations, even nonlinear ones, whereas Wald intervals are not. To illustrate, recall that the model function used in Example 3 is η₁(x, θ) = x/(θ + x), where the model parameter θ is the EC50. The LSE is θ̂ = 5.8698, and the likelihood and Wald confidence intervals for θ are (2.9960, 12.7122) and (1.8106, 9.9291), respectively. To assess overlap, the intersection of these intervals is (2.9960, 9.9291) and their union is (1.8106, 12.7122). These latter intervals have respective lengths 6.9331 and 10.9016, so an assessment of overlap of the LBCI and WCI (Haines et al. 2004) is 6.9331/10.9016 = 63.60%, showing a fair amount of difference between the two intervals. A simple modification of the original model function is η₂(x, φ) = φx/(1 + φx), so that this new model function’s parameter φ is the reciprocal EC50, since substituting φ = 1/θ recovers the original model function. When this modified model function is fit to the data, we obtain φ̂ = 0.1704 (which equals 1/θ̂), and the respective likelihood and Wald confidence intervals for φ are (0.07867, 0.3338) and (0.05255, 0.2882). Notice first that (except for round-off error) the reciprocals of the η₂(x, φ) LBCI endpoints, 1/0.07867 = 12.7113 and 1/0.3338 = 2.9960, coincide with the LBCI endpoints for η₁(x, θ) given above. This invariance does not hold for WCIs, since 1/0.05255 = 19.029 ≠ 9.9291 and 1/0.2882 = 3.4698 ≠ 1.8106. Further, for these data and η₂(x, φ), the overlap assessment of the LBCI and WCI is 0.2095/0.2812 = 74.50%; since this value exceeds the 63.60% overlap assessment for the original parameterization, there is more agreement here between the LBCI and WCI. Thus, in general and also in multidimensional (p > 1) situations, likelihood intervals are invariant to one-to-one parameter transformations, and some reparameterizations yield closer agreement between likelihood and Wald intervals than others.
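
This invariance is easy to verify numerically; the sketch below uses simulated stand-in data (Example 3’s actual data are in the Supplementary Information, so the interval endpoints will differ from those quoted above), and confint() returns profile-likelihood intervals whose endpoints transform exactly as the parameter does.

set.seed(3)
conc <- rep(c(1, 2, 5, 10, 20, 40), each = 2)          # hypothetical concentrations
resp <- conc / (6 + conc) + rnorm(12, 0, 0.03)         # hypothetical responses
fit1 <- nls(resp ~ conc / (theta + conc), start = list(theta = 5))
fit2 <- nls(resp ~ phi * conc / (1 + phi * conc), start = list(phi = 0.2))
confint(fit1)            # profile likelihood CI for theta (the EC50)
rev(1 / confint(fit2))   # reciprocal of phi's profile CI: matches theta's CI
## Wald intervals (coef +/- t * SE), by contrast, do not transform this way.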

Appendix A.2. Distinguishing Two Likelihood Approaches and Penalizing for Overfitting

An internet search of the term “profile likelihood confidence interval” yields several references to asymptotic likelihood tests and intervals (Royston 2007; Stryhn and Christensen 2003; Venzon and Moolgavkar 1988). The intentional focus in our work has been solely on model function parameters in homoskedastic normal nonlinear models, using the F-based (exact or near-exact) likelihood expressions. Along these lines, we next distinguish these two likelihood approaches and discuss penalizing for overfitting related to profiled-out model function parameters.

In contrast with the F-statistic likelihood approach used in this paper, the approximate (or asymptotic) likelihood-based test of H₀: θ = θ₀ is based on twice the change in the log-likelihood (denoted LL),

2[LL(θ̂_MLE) − LL(θ₀)]   (39)

(Recall that the log-likelihood for the normal distribution is given in Eq. (4).) In Eq. (39), θ̂_MLE maximizes LL(θ). In very general situations, this asymptotic (i.e., valid for large sample sizes) test statistic approximately follows the chi-square distribution with p degrees of freedom. In a manner similar to Eq. (16), this test statistic can be “inverted” to obtain approximate likelihood intervals (see Eq. (40) below). The asymptotic likelihood approach is commonly used in generalized linear, survival, and longitudinal modelling. For the homoskedastic normal cases considered here, these large-sample likelihood intervals and tests generally differ from the F-statistic likelihood methods, and the F-statistic likelihood methods are preferred for the reasons given later in this section (and also due to increased power).

When comparing the F-based likelihood approach used in this work with the approximate likelihood approach in Eq. (39), in addition to better coverage, another important reason for preferring the F-based approach is that it levies a penalty for overfitting, which we demonstrate as follows. Using model parameter profiling and inverting Eq. (39), the approximate profile likelihood confidence interval (APLCI) for the key parameter θ₂ is the set of θ₂ values for which

LL(θ̃₁, θ₂) = LL(θ̂) − (1/2) χ²_{α,1}   (40)

Here, χ²_{α,1} is the (1 − α)100% quantile of the chi-square distribution with one degree of freedom; that is, P(χ² > χ²_{α,1}) = α. On the other hand, as stated in the main text and repeated here, the (exact or near-exact) F-statistic-based profile likelihood confidence interval (PLCI) is the set of θ₂ values which solve the equation

S(θ̃₁, θ₂) = S(θ̂)[1 + t²_{α/2,(n−p)}/(n−p)]   (41)

Since the APLCI approach is based on the chi-square quantile with one degree of freedom, it treats the single-parameter and multi-parameter situations the same. That is, in the multiparameter case, the approximate approach profiles out the nuisance parameter(s) and treats the resulting profile equation as a one-parameter likelihood equation, so it levies no penalty for estimating the profiled parameters. The preferred PLCI approach of Eq. (41), on the other hand, is based on the t distribution with (n − p) degrees of freedom, which penalizes for the estimation of the (p − 1) other (i.e., nuisance) parameters. Indeed, this PLCI statistic calibrates the result since, for fixed n and α, the term

t²_{α/2,(n−p)}/(n−p)   (42)

increases with the number of parameters, p. This means that with a larger number of nonlinear model parameters to estimate, the cut-line of the profile function will be higher, and the resulting profile confidence interval will be wider (i.e., more conservative), thereby reflecting the penalty for estimating a larger number of parameters. The interested reader can visualize these results by examining Fig. 2.
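
A quick numerical check of this penalty (a sketch with an arbitrary n = 20 and α = 0.05) shows the term in Eq. (42) growing with p:

n <- 20; alpha <- 0.05; p <- 1:5
round(qt(1 - alpha/2, n - p)^2 / (n - p), 4)
## approximately 0.2306 0.2452 0.2618 0.2809 0.3029, increasing with p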

Other works (Fears et al. 1996; Pawitan 2000) have highlighted Wald confidence interval anomalies; however, their examples have been less clear-cut and have been based on the approximate likelihood test. Focusing on variance component estimation in the one-way random-effects analysis of variance (ANOVA) model, these works underscore the inadequacy of the Wald method using both simulation and this approximate or large-sample likelihood approach. Estimation of the variance component in this one-way random-effects ANOVA case is problematic since the null hypothesis value (i.e., the zero in the expression H₀: σ₀² = 0) is on the boundary of the parameter space, and so the likelihood ratio statistic has a mixture distribution (Chernoff 1954; Molenberghs and Verbeke 2007). As such, although the random-effects ANOVA illustration does underscore Wald inadequacies, it can easily confuse practitioners, who may confound boundary issues with Wald statistic caveats and so may not realize the far-reaching nature of these Wald inadequacies. Our chosen nonlinear regression examples, on the other hand, underscore some nuances associated with nonlinear modelling and clearly show the extent of the preference for F-based likelihood methods over Wald methods.

In sum, with strong evidence in favor of likelihood methods over Wald methods, and of the F-statistic-based likelihood methods considered here over the APLCI approach in Eq. (40), it is surprising that some software packages still report Wald (and sometimes approximate likelihood) p-values and intervals for homoskedastic normal nonlinear models.

Appendix A.3. Some Detailed Results Related to the Fieller-Creasy Problem

For the Fieller-Creasy model function given in Eq. (29), the sum-of-squares function is

S(θ) = Σ_{j=1}^{n₁} (y_{1j} − θ₁)² + Σ_{j=1}^{n₂} (y_{2j} − θ₁θ₂)²   (43)

Differentiating Eq. (43) with respect to the model parameters and setting the derivatives to zero, the LSE parameter estimates are θ̂₁ = ȳ₁ and θ̂₂ = ȳ₂/ȳ₁, and the residual sum-of-squares is S(θ̂) = Σ_{j=1}^{n₁} (y_{1j} − ȳ₁)² + Σ_{j=1}^{n₂} (y_{2j} − ȳ₂)². We restrict attention here to situations where θ₁, θ₂, ȳ₁ and ȳ₂ are all positive, since absolute value terms can be substituted otherwise. The n × 2 Jacobian matrix X has its first n₁ rows equal to one in column one and zero in column two, and its last n₂ rows equal to θ₂ in column one and θ₁ in column two. It follows that

XᵀX = ( n₁ + n₂θ₂²   n₂θ₁θ₂
        n₂θ₁θ₂       n₂θ₁² )   (44)

For this multi-parameter (p=2) situation, the likelihood-based confidence region is the set of θ such that

S(θ) = S(θ̂)[1 + (p/(n − p)) F_{α,p,(n−p)}]   (45)

In this instance, with α = 0.05, the right-hand side of this expression is S(θ̂)[1 + (p/(n − p)) F_{α,p,(n−p)}] = 20[1 + (2/9) × 4.2565] = 38.92. Further, the (Wald) approximation to S(θ) − S(θ̂) here is

(n₁ + n₂θ̂₂²)(θ₁ − θ̂₁)² + 2n₂θ̂₁θ̂₂(θ₁ − θ̂₁)(θ₂ − θ̂₂) + n₂θ̂₁²(θ₂ − θ̂₂)²   (46)

For these data, the 95% likelihood confidence region (solid region) and the 95% Wald confidence region (dashed ellipse) are plotted in the left panel of Fig. 9. The plotted central point in the figure is the least-squares estimate, θ̂ᵀ = (4, 2). These regions are ‘joint’ or ‘simultaneous’ confidence regions for the parameter vector θᵀ = (θ₁, θ₂), and, in a similar manner to the confidence interval case, simulation results (Donaldson and Schnabel 1987; Seber and Wild 1989) highlight the superiority of likelihood regions over Wald regions in the sense of better agreement between nominal and actual coverage probabilities (e.g., 95%).

Fig. 9.

Fig. 9

Confidence Regions and Intervals for the Fieller-Creasy Ratio of Means. Left panel: 95% joint confidence regions for the Fieller-Creasy data: likelihood-based region (solid region) and Wald approximate region (dashed ellipse). Right panel: profile likelihood sum-of-squares (SSE) curve for the ratio-of-means parameter (solid curve), Wald approximation SSE curve (dashed parabola), labelled horizontal 95% and 99% cut-lines, and confidence intervals: the 95% likelihood interval is (A95, B95) and the 95% Wald interval is (C95, D95), corresponding to the lower circles; the 99% likelihood interval is (A99, B99) and the 99% Wald interval is (C99, D99), corresponding to the upper circles

Another way to obtain the Wald approximate region in Eq. (46) is to observe that, for this model function, the first-order (planar) Taylor approximation of η(x, θ) about the point (θ̂₁, θ̂₂) is

η(x, θ) = θ₁ ≈ θ̂₁ + (θ₁ − θ̂₁)   (group 1)
η(x, θ) = θ₁θ₂ ≈ θ̂₁θ̂₂ + θ̂₂(θ₁ − θ̂₁) + θ̂₁(θ₂ − θ̂₂)   (group 2)   (47)

Substituting these approximations into the sum of squares expression in Eqs. (5) and (45) gives the Wald confidence region expression given in Eq. (46).

To obtain the PLCI for θ₂, we profile out θ₁ by setting the corresponding normal equation to zero,

Σ_{j=1}^{n₁} (y_{1j} − θ₁) + θ₂ Σ_{j=1}^{n₂} (y_{2j} − θ₁θ₂) = 0   (48)

Next, when θ₂ is taken as fixed and this expression is solved for θ₁, we obtain the constrained optimum value,

θ̃₁ = (n₁θ̂₁ + n₂θ̂₁θ̂₂θ₂)/(n₁ + n₂θ₂²) = [n₁/(n₁ + n₂θ₂²)] θ̂₁ + [n₂θ₂²/(n₁ + n₂θ₂²)] (θ̂₂/θ₂) θ̂₁   (49)

Interestingly, Eq. (49) is of the form ωθ̂₁ + (1 − ω)(θ̂₂/θ₂)θ̂₁, a weighted sum of the LSE θ̂₁ and the modified estimator (θ̂₂/θ₂)θ̂₁, with ω = n₁/(n₁ + n₂θ₂²). The LSE weight ω increases as n₁ increases relative to n₂ or as θ₂ = μ₂/μ₁ decreases; moreover, as θ₂ nears θ̂₂, the modified estimator (θ̂₂/θ₂)θ̂₁ nears θ̂₁ itself.

Next, the constrained optimum value θ̃₁ of Eq. (49) is substituted into the profile sum-of-squares expression to obtain

S(θ̃₁, θ₂) − S(θ̂) = [n₁n₂θ̂₁²/(n₁ + n₂θ₂²)] (θ₂ − θ̂₂)²   (50)

The Wald counterpart to the profile expression in Eq. (50) is very similar but with the right-hand side instead equal to

[n₁n₂θ̂₁²/(n₁ + n₂θ̂₂²)] (θ₂ − θ̂₂)²   (51)

This subtle but crucial change to Eq. (50), whereby θ₂² in the denominator is replaced by the estimated value θ̂₂², accounts for the difference between WCIs and PLCIs for this model. These details supply the justification for the WCI and PLCI expressions given in Eqs. (30) and (31).

For the given data, these shifted profile expressions are plotted in the right-hand panel of Fig. 9. The profile sum-of-squares (SSE) is the solid curve and the Wald approximation is the dashed parabola; the minimum occurs at the point (θ̂₂, 0). The cut-lines in Fig. 9’s right-hand panel occur at s²t²_{α/2,(n−2)} = (20/9) t²_{α/2,9}, which equals 11.37 for 95% and 23.47 for 99%. For the given data, the 95% PLCI, (1.30, 3.94), corresponds to points A95 and B95 in the right panel of Fig. 9, and the 95% WCI, (0.98, 3.02), to points C95 and D95. Thus, for these data, the Wald interval contains the value one and so would retain the hypothesis of equal means (H₀: θ₂ = 1), but since the PLCI does not contain one, the equal-means hypothesis would be rejected. Further, the 99% PLCI, (1.11, 6.71) (points A99 to B99 in the graph), represents a large rightward shift from the 99% WCI, (0.54, 3.46) (points C99 to D99 in the graph). Shifts in the center of PLCIs (versus WCIs), as well as lengthening or shortening of these intervals, are related to skewness and curvature measures (Clarke 1987; Haines et al. 2004; Ratkowsky 1983).
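
The PLCI here can also be obtained directly in R. In the hedged sketch below, y1 and y2 are placeholder vectors for the two groups’ responses (the actual data are in the Supplementary Information), and confint() performs the profiling of θ₁ automatically.

dat <- data.frame(y = c(y1, y2),
                  x = rep(c(1, 0), c(length(y1), length(y2))))  # x = 1 for group 1
fit <- nls(y ~ th1 * x + th1 * th2 * (1 - x), data = dat,
           start = list(th1 = 4, th2 = 2))
confint(fit, "th2", level = 0.95)    # PLCI for the ratio theta2; cf. (1.30, 3.94)
summary(fit)$coefficients["th2", ]   # estimate and SE behind the WCI, cf. (0.98, 3.02)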

It is worth underscoring the following regarding model reparameterization here. Again with x = 1 for group 1 observations and x = 0 for group 2 observations, we now rewrite (or reparameterize) the Fieller-Creasy model function in Eq. (29) as

η₂(x, θ) = θ₁x + (θ₁/θ_{2R})(1 − x)   (52)

So η₂(x, θ) = θ₁ = μ₁ for group 1 (as before), but now η₂(x, θ) = θ₁/θ_{2R} for those in group 2. Thus, θ_{2R} = μ₁/μ₂ is the reciprocal of the original θ₂ = μ₂/μ₁. In this reciprocal case, the 95% PLCI for θ_{2R} is (0.25, 0.77), which is precisely the reciprocal of the 95% PLCI for θ₂, (1.30, 3.94). This again underscores the fact that PLCIs are invariant to one-to-one (even nonlinear) transformations. The 95% WCI for θ_{2R} is (0.25, 0.75), whereas the reciprocal of this WCI, (1.33, 4.07), differs substantially from the original 95% WCI for θ₂, (0.98, 3.02). This invariance of the PLCI to reparameterization, and lack of invariance of the WCI, re-emphasizes the important advantage of likelihood methods over Wald methods. Indeed, the near-coincidence of the PLCI and WCI on the reciprocal scale (i.e., for θ_{2R}) but not on the original scale (i.e., for θ₂) also highlights the statistical curvature attributes associated with this dataset and these models. We observed similar results in Sect. 3.4 for the single-parameter model function and LBCI used in Example 3, but here these results extend to the p = 2 case and the PLCI situation.

Author Contributions

TEO conceptualized, supervised, analyzed data, and wrote the initial draft of this manuscript. Both authors discussed and contributed to the extensive revisions. JWS also reviewed and modified all examples and R code.

Data Availability

All data used in this article are provided in the R code given in the Supplementary Information, and all needed permissions have been secured and granted.

Declarations

Conflict of interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Anderson RL, Nelson LA. A family of models involving intersecting straight lines and concomitant experimental designs useful in evaluating response to fertilizer nutrients. Biometrics. 1975;31:303–318. doi:10.2307/2529422
2. Bailer AJ, Portier CJ. A note on fitting one-compartment models: non-linear least squares versus linear least squares using transformed data. J Appl Toxicol. 1990;10(4):303–306. doi:10.1002/jat.2550100413
3. Bartošová O, Bonnet C, Ulmanová M, et al. Pupillometry as an indicator of L-DOPA dosages in Parkinson’s disease patients. J Neural Transm. 2018;125:699–703. doi:10.1007/s00702-017-1829-1
4. Bates DM, Watts DG. Nonlinear regression analysis and its applications. New York: Wiley; 2007.
5. Bowers WS, Hoch HC, Evans PH, Katayama M. Thallophytic allelopathy: isolation and identification of laetisaric acid. Science. 1986;232:105–106. doi:10.1126/science.232.4746.105
6. Bursa F, Yellowlees A, Bishop A, et al. Estimation of ELISA results using a parallel curve analysis. J Immunol Meth. 2020. doi:10.1016/j.jim.2020.112836
7. Chen KY, Krischuns T, Ortega VL, et al. A highly sensitive cell-based luciferase assay for high-throughput automated screening of SARS-CoV-2 nsp5/3CLpro inhibitors. Antiviral Res. 2022. doi:10.1101/2021.12.18.473303
8. Chernoff H. On the distribution of the likelihood ratio. Ann Math Stat. 1954;25:573–578. https://www.jstor.org/stable/2236839
9. Clarke GPY. Marginal curvatures and their usefulness in the analysis of nonlinear regression models. J Amer Statist Assoc. 1987;82(399):844–850. doi:10.1080/01621459.1987.10478507
10. Cook RD, Witmer JA. A note on parameter-effects curvature. J Amer Statist Assoc. 1985;80(392):872–878. doi:10.1080/01621459.1985.10478196
11. Creasy MA. Limits for the ratio of means. J Roy Statist Soc Ser B. 1954;16(2):186–194.
12. Currie DJ. Estimating the Michaelis-Menten parameters: bias, variance and experimental design. Biometrics. 1982;38(4):907–919. doi:10.2307/2529871
13. Donaldson JR, Schnabel RB. Computational experience with confidence regions and confidence intervals for nonlinear least squares. Technometrics. 1987;29(1):67–82. doi:10.1080/00401706.1987.10488184
14. Draper NR, Smith H. Applied regression analysis. New York: Wiley; 1998.
15. Evans MA, Kim HM, O’Brien TE. An application of profile-likelihood confidence interval to capture-recapture estimators. J Agric Biol Envir Stat. 1996;1(1):131–140. doi:10.2307/1400565
16. Faraggi D, Izikson P, Reiser B. Confidence intervals for the 50 per cent response dose. Stat Med. 2003;22(12):1977–1988. doi:10.1002/sim.1368
17. Fears TR, Benichou J, Gail MH. A reminder of the fallibility of the Wald statistic. Amer Statist. 1996;50(3):226–227. doi:10.1080/00031305.1996.10474384
18. Fieller EC. Some problems in interval estimation. J Roy Statist Soc Ser B. 1954;16(2):175–185.
19. Finney DJ. Statistical method in biological assay. 3rd ed. London: Charles Griffin; 1978.
20. Gonçalves MAD, Bello NM, Dritz SS, et al. An update on modeling dose-response relationships: accounting for correlated data structure and heterogeneous error variance in linear and nonlinear mixed models. J Anim Sci. 2016;94(5):1940–1950. doi:10.2527/jas.2015-0106
21. Govindarajulu Z. Statistical techniques in bioassay. 2nd ed. Basel: Karger; 2001.
22. Haines LM, O’Brien TE, Clarke GPY. Kurtosis and curvature measures for nonlinear regression models. Stat Sinica. 2004;14(2):547–570. https://www.jstor.org/stable/24307208
23. Halsey LG. The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum? Biol Lett. 2019;15:20190174. doi:10.1098/rsbl.2019.0174
24. Hinkley DV. Inference about the intersection in two-phase regression. Biometrika. 1969;56(3):495–504. doi:10.1093/biomet/56.3.495
25. Hubert JJ. Bioassay. 3rd ed. Dubuque: Kendall/Hunt; 1992.
26. James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning: with applications in R. 2nd ed. New York: Springer; 2021. https://hastie.su.domains/ISLR2/ISLRv2_website.pdf
27. Kim SB, Kim DS, Magana-Ramirez C. Applications of statistical experimental designs to improve statistical inference in weed management. PLoS ONE. 2021. doi:10.1371/journal.pone.0257472
28. Kleinbaum DG, Kupper LL, Nizam A, Rosenberg ES. Applied regression analysis and other multivariable methods. 5th ed. Boston: Cengage; 2014.
29. Krzywinski M, Altman N. Error bars. Nat Methods. 2013;10:921–922. doi:10.1038/nmeth.2659
30. Lee JJ, Kong M, Ayers GD, Lotan R. Interaction index and different methods for determining drug interaction in combination therapy. J Biopharm Stat. 2007;17:461–480. doi:10.1080/10543400701199593
31. Lynch N, Hoang T, O’Brien TE. Acute toxicity of binary-metal mixtures of copper, zinc, and nickel to Pimephales promelas: evidence of more-than-additive effect. Environ Toxicol Chem. 2016;35(2):446–457. doi:10.1002/etc.3204
32. Meeker WQ, Escobar LA. Teaching about approximate confidence regions based on maximum likelihood estimation. Am Stat. 1995;49(1):48–53. doi:10.1080/00031305.1995.10476112
33. Mendenhall W, Sincich T. A second course in statistics: regression analysis. 8th ed. Boston: Prentice Hall; 2020.
34. Michaelis L, Menten ML. Die Kinetik der Invertinwirkung. Biochem Z. 1913;49:333–369.
35. Miguez F, Archontoulis S, Dokoohaki H. Nonlinear regression models and applications in applied statistics. In: Glaz B, Yeater KM, editors. Agricultural, biological and environmental sciences. New York: Wiley; 2020. pp. 401–447.
36. Molenberghs G, Verbeke G. Likelihood ratio, score, and Wald tests in a constrained parameter space. Am Stat. 2007;61(1):22–27. doi:10.1198/000313007X171322
37. O’Brien TE, Silcox J. Efficient experimental design for dose response modelling. J Appl Stat. 2021;48:2864–2888. doi:10.1080/02664763.2021.1880556
38. O’Brien TE, Jamroenpinyo S, Bumrungsup C. Curvature measures for nonlinear regression models using continuous designs with applications to optimal design. Involve J Math. 2010;3(3):317–332. doi:10.2140/involve.2010.3.317
39. O’Brien A, Chen D-Y, Hackbart M, et al. Detecting SARS-CoV-2 3CLpro expression and activity using a polyclonal antiserum and a luciferase-based biosensor. Virology. 2021;556:73–78. doi:10.1016/j.virol.2021.01.010
40. Pawitan Y. A reminder of the fallibility of the Wald statistic: likelihood explanation. Amer Stat. 2000;54(1):54–56. doi:10.1080/00031305.2000.10474509
41. Pawitan Y. In all likelihood: statistical modelling and inference using likelihood. Oxford: Oxford University Press; 2013.
42. Peddada SD, Haseman JK. Analysis of nonlinear regression models: a cautionary note. Dose Response. 2005;3:342–352. doi:10.2203/dose-response.003.03.005
43. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2020. https://www.R-project.org/
44. Ratkowsky DA. Nonlinear regression modeling: a unified practical approach. New York: Marcel Dekker; 1983.
45. Rollins MD, Feiner JR, Lee JM, et al. Pupillary effects of high-dose opioid quantified with infrared pupillometry. Anesthesiology. 2014;121(5):1037–1044. doi:10.1097/ALN.0000000000000384
46. Royston P. Profile likelihood for estimation and confidence intervals. Stata J. 2007;7(3):376–387. doi:10.1177/1536867X0700700305
47. Samuels ML, Witmer JA, Schaffner AA. Statistics for the life sciences. 5th ed. Boston: Pearson; 2016.
48. Seber GAF, Wild CJ. Nonlinear regression. New York: Wiley; 1989.
49. Seefeldt SS, Jensen JE, Fuerst EP. Log-logistic analysis of herbicide dose-response relationships. Weed Technol. 1995;9:218–227. doi:10.1017/S0890037X00023253
50. Sims SR, O’Brien TE. Mineral oil and aliphatic alcohols: toxicity and analysis of synergistic effects on German cockroaches (Dictyoptera: Blattellidae). J Econ Entomol. 2011;104(5):1680–1686. doi:10.1603/EC10440
51. Straetemans R, O’Brien T, Wouters L, et al. Design and analysis of drug combination experiments. Biom J. 2005;47(3):299–308. doi:10.1002/bimj.200410124
52. Stryhn H, Christensen J. Confidence intervals by the profile likelihood method, with applications in veterinary epidemiology. In: Proc 10th Intl Symp Vet Epi Econ; 2003. pp. 208–210. https://gilvanguedes.com/wp-content/uploads/2019/05/Profile-Likelihood-CI.pdf. Accessed 14 May 2023
53. Tallarida RJ. Drug synergism and dose-effect data analysis. Boca Raton: Chapman and Hall/CRC; 2000.
54. Venzon DJ, Moolgavkar SH. A method for computing profile-likelihood-based confidence intervals. J Roy Stat Soc C. 1988;37(1):87–94. doi:10.2307/2347496
55. Wald A. Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans Am Math Soc. 1943;54:426–482. doi:10.1090/S0002-9947-1943-0012401-3
56. Wheeler MW, Park RM, Bailer AJ. Comparing median lethal concentration values using confidence interval overlap or ratio tests. Environ Toxicol Chem. 2006;25:1441–1444. doi:10.1897/05-320R.1
57. White JR, Abodeely M, Ahmed S, et al. Best practices in bioassay development to support registration of biopharmaceuticals. Biotechniques. 2019;67:126–137. doi:10.2144/btn-2019-0031
58. You S, Hong J-H, Yoo J. Analysis of pupillometer results according to disease stage in patients with Parkinson’s disease. Sci Rep. 2021;11:17880. doi:10.1038/s41598-021-97599-4
