Abstract
Calibration curves are an important part of many measurement processes. The user of a fitted calibration curve must know its precision and accuracy, and these can be determined in a timely fashion by using the data iteratively. This paper gives a method that divides the data into training and test groups. The test group is iteratively checked to see that a prechosen nominal confidence interval probability of coverage is met. If on the basis of this check the calibration experiment is carried to completion, the nominal probability level is shown to remain valid.
Keywords: constants, measurements, observations, probability, statistics, statistical methods
1. Introduction
Calibration curves are an important part of many measurement processes. The user of a fitted calibration curve must know its precision and accuracy [1], and these can be determined in a timely fashion by using the data iteratively. This paper gives a method that divides the data into training (calibration curve-producing) and test (check) groups. The test group is iteratively checked to see that a prechosen nominal confidence interval probability of coverage is met. If on the basis of this check the calibration experiment is carried to completion, the nominal probability level is shown to remain valid.
We assume that the measurement process has negligible drift. This is only partially checked by the iterative calibration technique; of course, routine application of control chart procedures is a must [2].
It is also assumed that many measurements are taken between calibrations. Under this circumstance particularly appropriate statistical calibration procedures are found in Scheffé [3], Lieberman, Miller, and Hamilton [4], and Knafl, Sacks, Spiegelman, and Ylvisaker [5]. We concentrate on the Scheffé procedure; it is demonstrated on an engineering example in Lechner, Reeve, and Spiegelman [6].
All of these procedures produce interval estimates such that, in the long run, a proportion 1−α of them contain the true value, with probability 1−δ. The two probability levels α and δ are chosen by the calibrator.
In order to describe the iterative procedure, the necessary notation is given.
2. Notation and Method
There are two fundamental variables: Y, which is a nonstandard measurement of a property, and x, which is an exact standard or certified value of a possibly different property. For the example in section 4, x represents the gravimetric value (mass) of liquid in a tank (fig. 1) and Y represents differential pressure.
Figure 1–
Calibrated tank located at NBS. A cubic model was used to correspond to linear deformation of all the tank walls.
These two variables are related by the equation Y = Hβ + σe, where the terms of the equation are defined below. The other observables are Y*₁, …, Y*ₖ, and they correspond to unknown values x*₁, …, x*ₖ.

Here Y is an n×1 vector of observations, H is an n×p full rank matrix whose i-th row is hᵗ(xᵢ) = (h₁(xᵢ), …, hₚ(xᵢ)), e is an n×1 vector of independent and identically distributed standard normal random variables having mean zero and covariance matrix V(e) = Iₙ, and σ is the standard deviation. The Y*ᵢ are post-calibration observations and the goal is to estimate their associated x*ᵢ; in this paper the x*ᵢ are taken to be unknown constants.

The calibration curve is denoted by m(x) = hᵗ(x)β; it is taken to be monotonic. Let the least squares estimate of m(x) be denoted by m̂(x) and its variance by σ²s²(x). Initially we assume that σ² is known. We discuss estimating σ² in section 4.
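For a model of this form the least squares quantities are standard: with β̂ = (HᵗH)⁻¹HᵗY, one has m̂(x) = hᵗ(x)β̂ and s²(x) = hᵗ(x)(HᵗH)⁻¹h(x). A minimal numerical sketch (Python/NumPy; the polynomial basis and function names are our own illustration, not part of the paper):

```python
import numpy as np

def design_row(x, p):
    # Row h^t(x) = (1, x, ..., x^(p-1)): a polynomial basis, used here purely
    # for illustration; the paper's h_j(x) may be any known functions.
    return np.array([x**j for j in range(p)], dtype=float)

def fit_calibration(xs, ys, p):
    """Least squares fit of m(x) = h^t(x) beta from calibration data (xs, ys).

    Returns m_hat (the fitted curve) and s2, where
    s2(x) = h^t(x) (H^t H)^(-1) h(x), so that Var(m_hat(x)) = sigma^2 * s2(x).
    """
    H = np.vstack([design_row(x, p) for x in xs])
    M = np.linalg.inv(H.T @ H)
    beta_hat = M @ H.T @ np.asarray(ys, dtype=float)
    m_hat = lambda x: float(design_row(x, p) @ beta_hat)
    s2 = lambda x: float(design_row(x, p) @ M @ design_row(x, p))
    return m_hat, s2
```

For data lying exactly on a straight line (p = 2) the fit reproduces it; e.g. fit_calibration([0, 1, 2, 3], [2, 5, 8, 11], p=2) gives m_hat(1.5) = 6.5.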
Data are nearly always collected sequentially; therefore it makes sense to analyze them sequentially. Once the measurement process is out of control additional measurements are of value only in identifying the problem. If a reasonable statistical procedure is available for iteratively analyzing the data, as is the procedure outlined in this section, then it should be used. This will help identify out-of-control situations early.
Of course the ability to detect out-of-control situations depends on the calibration design, i.e., x-values used for the calibration. Such designs have been discussed in detail for linear spline calibration curves [7]. As a byproduct of the present investigation we show the soundness of the advice in the cited work against using the exact optimal design. In fact efficiency under an assumed model and an ability to check when this model holds are competing demands.
A procedure is given for checking in an ad hoc fashion the validity of the previous assumptions. The checks are deliberately for coverage probabilities rather than directly for the assumptions; i.e., the stated 1−α uncertainty level is checked. This is an indirect check on the underlying assumptions. If an assumption is marginally violated and yet the 1−α level is met, the author sees little reason to doubt the calibration procedure. If the nominal level is not met, then the calibrator is expected to at least check his measurement procedure and possibly reset his equipment. The novelty of this procedure is that if the experiment is carried to completion, the nominal levels 1−α and 1−δ remain valid.
Our procedure is as follows:
Step 1. After a reasonable amount of data is collected, the data are divided into two groups, SG1 and SG2, and each new data point is placed in one of the two groups. Each group should contain approximately half the data, although under some circumstances other divisions are reasonable (see section 4). The partitioning of the available data can be done randomly or according to a well chosen statistical sampling plan (see the comments at the end of this section).
Step 2. Choose the probability levels 1−α and 1−δ. In order to simplify the notation, anything calculated only from the data in SG1 has subscript 1; anything calculated from all the data has no subscript.
Step 3. Using only data from SG1, determine the least squares estimate m̂₁(x) of m(x) and its variance σ²s₁²(x).
Step 4. From the data in SG1, form the Scheffé upper and lower curves U(x) and L(x). The rationale is given in Scheffé [3] and Lechner, Reeve, and Spiegelman [6]. For all x,

U(x) = m̂₁(x) + σ[zα + √(χ²p,δ) s₁(x)],  L(x) = m̂₁(x) − σ[zα + √(χ²p,δ) s₁(x)].

Here zα is the two-tailed α point of a standard normal, and χ²p,δ is the upper δ point of the chi-squared distribution with p degrees of freedom.
Step 5 (optional). Calculate the minimum of s₁(x) over the calibration region, and denote min s₁(x) by s. Redefine zα to be the solution q of the equation

Φ(q) + Φ(q + 2s√(χ²p,δ)) − 1 = 1 − α.  (1)

As explained in [5], this step reduces the conservativeness of the Scheffé procedure while maintaining the validity of the probability statements.
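As a sketch of step 4, the band below uses the familiar Scheffé-type form U(x), L(x) = m̂₁(x) ± σ[zα + √(χ²p,δ) s₁(x)]; the exact construction should be taken from [3] and [6], and the function names and SciPy calls here are our own:

```python
import numpy as np
from scipy.stats import norm, chi2

def scheffe_bounds(m1_hat, s1, sigma, alpha, delta, p):
    # z_alpha: two-tailed alpha point of the standard normal.
    z_alpha = norm.ppf(1.0 - alpha / 2.0)
    # c^2: upper delta point of chi-squared with p degrees of freedom.
    c = np.sqrt(chi2.ppf(1.0 - delta, df=p))
    half_width = lambda x: sigma * (z_alpha + c * s1(x))
    U = lambda x: m1_hat(x) + half_width(x)
    L = lambda x: m1_hat(x) - half_width(x)
    return L, U
```

The optional step 5 refinement would then replace z_alpha by the smaller solution q before forming the band.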
Step 6. For each (xi, Yi) in SG2 check whether or not xi ∈ [L⁻¹(Yi), U⁻¹(Yi)]. Let Ti = 1 if xi ∈ [L⁻¹(Yi), U⁻¹(Yi)] and Ti = 0 otherwise.

Recall that both m̂₁(x) and m(x) are linear combinations of the hj(x), j = 1, …, p. Let θᵗ = (θ₁, …, θp) be a vector parameter in Rᵖ and g(x,θ) = hᵗ(x)θ. Let p(x,θ) = Φ([U(x) − g(x,θ)]/σ) − Φ([L(x) − g(x,θ)]/σ), the probability that an observation taken at x falls between the Scheffé curves when the true calibration curve is g(·,θ). Finally, denote the likelihood conditioned on SG1, and thus also on L and U, by

L(T,θ) = ∏i∈SG2 p(xi,θ)^Ti [1 − p(xi,θ)]^(1−Ti).

From the likelihood L(T,θ), get the maximum likelihood estimator θ̂ for θ and compute the maximum likelihood estimator p(x,θ̂) for p(x,θ). Check whether or not p(x,θ̂) ≥ 1 − α for all x in the calibration region. If p(x,θ̂) < 1 − α for some x, consider the measurement process possibly defective. If p(x,θ̂) ≥ 1 − α for all x and the calibration experiment is not finished, collect the next data point and return to step 1.
Many scientists may not have the computer programs readily available to form the efficient maximum likelihood estimator of p(x,θ). In these cases we recommend using local averaging or otherwise smoothed estimates (see Stone [8] or Collomb [9]). In particular we recommend a nearest neighbor approach: choose a number k and then, at each point x in the calibration region, average the Ti values corresponding to the k xi values closest to x. For small samples little is known about choosing k; however, in large samples a value of k approximately equal to n^(2/5) should be satisfactory.
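The recommended nearest neighbor smoothing of the Ti is easy to implement; a sketch (Python/NumPy; the names are our own, and the band functions L and U are assumed to come from step 4):

```python
import numpy as np

def coverage_indicators(xs2, ys2, L, U):
    # T_i = 1 when the SG2 point lies inside the band: L(x_i) <= Y_i <= U(x_i),
    # which for a monotone increasing curve is x_i in [L^-1(Y_i), U^-1(Y_i)].
    return np.array([1.0 if L(x) <= y <= U(x) else 0.0
                     for x, y in zip(xs2, ys2)])

def nearest_neighbor_p(x, xs2, T, k):
    # Average the T_i over the k SG2 design points closest to x;
    # the text suggests k on the order of n^(2/5) for large samples.
    idx = np.argsort(np.abs(np.asarray(xs2, dtype=float) - x))[:k]
    return float(np.mean(T[idx]))
```

For instance, with a band of [−1, 1] everywhere and SG2 points (0, 0), (1, 0), (2, 2), (3, 0), the indicators are 1, 1, 0, 1 and the k = 2 estimate at x = 0 is 1.0.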
This procedure provides a balanced check on whether the conservativeness of the Scheffé procedure, together with any failure of the model to hold exactly, seriously alters the intended uncertainty level 1−α. The larger the sample size, the less conservative the Scheffé procedure.
Comments about design
As previously stated, the Scheffé procedure is very conservative when s₁(x) is large. Therefore, some of the best diagnostic information comes from data where s₁(x) = s, its minimum. The optimal (D-optimal) design takes observations where s(x) is at a maximum. Thus for a straight line the optimum design has observations only at the ends of the calibration region, yet some of the best diagnostic information occurs at x-values in the middle and will be missed with this design.
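This trade-off can be seen in a small computation for the straight-line case (a sketch; the four-point design and the numbers are our own illustration, not from the paper):

```python
import numpy as np

def s2_line(design_xs, x):
    # s^2(x) = h^t(x) (H^t H)^(-1) h(x) for the straight-line model h^t(x) = (1, x).
    H = np.column_stack([np.ones(len(design_xs)), design_xs])
    M = np.linalg.inv(H.T @ H)
    h = np.array([1.0, x])
    return float(h @ M @ h)

# D-optimal design for a straight line on [0, 1]: all observations at the ends.
endpoint_design = [0.0, 0.0, 1.0, 1.0]
# s^2(x) peaks at the ends and bottoms out at the middle, so the most
# informative check points (where s1(x) = s) are exactly where this design
# never takes data.
```

Here s²(0) = s²(1) = 0.5 while s²(0.5) = 0.25, so under the endpoint design the region where s₁(x) attains its minimum s, where the checks are most informative, contains no observations at all.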
Comments about subgroups
Often the calibrator will have a good understanding about the possible malfunction of his measurement system. Then a choice of subgroups will be clear. He should feel free to choose as many combinations as he likes. The validity of uncertainty statements for completed calibrations remains. The check procedures are ad hoc, and if many checks are performed, he should expect some of them to indicate a possible malfunction of his system. The interpretation of these ad hoc checks requires sound scientific and engineering judgment. Some possible choices of subgroups are:
If we are mainly interested in detecting drift then SG1 should contain the older measurements and SG2 the newer ones. If we want to check run-to-run variability, SG1 and SG2 should not contain observations from the same run.
Suppose we want to check whether or not m(x) has the assumed form over a subinterval [a,b]. Then SG1 should not contain (if possible) observations with x-values in [a,b].
3. Analysis
We show that if σ is known all the Ti are independent of m̂(x); in this case, our iterative check does not affect the coverage probabilities when the model defined in section 2 holds.

Theorem. When σ is known the statistics Ti are independent of m̂(x).

Proof: For each (xi, Yi) in SG2,

Ti = 1 if and only if xi ∈ [L⁻¹(Yi), U⁻¹(Yi)].  (2)

Clearly eq (2) is equivalent to L(xi) ≤ Yi ≤ U(xi), i.e., |Yi − m̂₁(xi)| ≤ σ[zα + √(χ²p,δ) s₁(xi)].

Given the least squares estimate β̂ for β based on all the data, Cov(β̂, Yi) = σ²(HᵗH)⁻¹h(xi); similarly Cov(β̂, m̂₁(xi)) = σ²(HᵗH)⁻¹h(xi).

Thus, Yi − m̂₁(xi) is uncorrelated with m̂(x). Since the Yi − m̂₁(xi) and m̂(x) are jointly normal, the Ti are independent of m̂(x). Q.E.D.
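The uncorrelatedness used in the proof can be checked by simulation. The following Monte Carlo sketch uses a straight-line model with simulated standard normal errors (the design, group sizes, and seed are our own choices, not the tank data); it estimates Cov(m̂(x), Yi − m̂₁(xi)) for a point in SG2:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsq_line(xs, ys):
    # Straight-line least squares fit; returns (intercept, slope).
    H = np.column_stack([np.ones(len(xs)), xs])
    return np.linalg.lstsq(H, ys, rcond=None)[0]

xs = np.linspace(0.0, 1.0, 10)   # SG1 = first 5 points, SG2 = last 5
x_eval, i = 0.3, 7               # evaluate m_hat at x_eval; pick SG2 point i

m_hat_vals, resid_vals = [], []
for _ in range(20000):
    ys = 1.0 + 2.0 * xs + rng.standard_normal(10)   # true m(x) = 1 + 2x, sigma = 1
    b1 = lsq_line(xs[:5], ys[:5])                   # SG1-only fit -> m1_hat
    b = lsq_line(xs, ys)                            # full-data fit -> m_hat
    m_hat_vals.append(b[0] + b[1] * x_eval)
    resid_vals.append(ys[i] - (b1[0] + b1[1] * xs[i]))

cov = np.cov(m_hat_vals, resid_vals)[0, 1]   # should be near zero
```

With the seed shown, the estimated covariance should fall within Monte Carlo noise of zero, consistent with the theorem.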
Suppose σ is not known but is estimated independently of the fitted calibration curve. Then if the upper and lower bounds in Scheffé [3] are modified as he indicated, the desired uncertainty statements still apply. This follows from the fact that all of the Ti remain independent of m̂(x). This was not obvious to the author, so the details are included.

Let Ti be modified to incorporate the replacement of σ by its estimate σ̂₁; see Scheffé [3] for details. It is to be shown that the Ti are independent of m̂(x). Divide all sides of the modified eq (2) by σ̂₁. After some algebra the resulting condition can be written as a function of the ratio of two independent chi-squares whose sum is proportional to σ̂². By applying standard change-of-variable techniques, it can be seen that this sum is independent of the ratio. Finally, the normalized residuals are uniformly distributed on a unit sphere and are independent of their length. Q.E.D.
4. Example
The pressure-mass calibration example is based upon data collected under the direction of J. Whetstone of NBS. The tank is of an experimental nature and is located in the fluid mechanics building at the National Bureau of Standards. The calibration curve relates pressure and mass measurements. In the region where the tank is used the calibration curve is hypothesized to be a straight line. However, due to bowing of the tank walls, C. P. Reeve of NBS’ Statistical Engineering Division and the author felt that a cubic model was more appropriate. This model corresponds to linear deformation of all the tank walls.
The calculations made were done using the updated version of the program fully documented in Lechner, Reeve, and Spiegelman [10]. The updated program allows designation of training and test samples and automatically indicates whether or not a test point is in the calibration interval. Further information about this modification can be obtained from the author or C. P. Reeve.
The data are shown in table 1. In figure 2 residuals from the five runs are shown. Clearly run 2 is quite different from the others. However, as figure 3 indicates, the third run is also quite different from runs 1, 4, and 5.
Table 1.
Mass-pressure calibration data.
Mass | Pressure | Run |
---|---|---|
567.004 | 2.06534 | 1 |
567.2 | 2.0655 | 3 |
567.22 | 2.05974 | 2 |
585.772 | 2.32647 | 4 |
586.091 | 2.32747 | 3 |
604.913 | 2.58939 | 5 |
604.964 | 2.5881 | 3 |
623.878 | 2.84772 | 3 |
680.441 | 3.62457 | 1 |
680.693 | 3.61958 | 2 |
699.204 | 3.88191 | 4 |
718.321 | 4.14248 | 5 |
737.333 | 4.39982 | 3 |
793.881 | 5.17109 | 1 |
794.134 | 5.16728 | 2 |
812.658 | 5.4279 | 4 |
831.74 | 5.68723 | 5 |
850.749 | 5.94467 | 3 |
907.347 | 6.71461 | 1 |
907.572 | 6.71065 | 2 |
926.108 | 6.97103 | 4 |
Figure 2–
Residuals from runs 1–5.
Figure 3–
Residuals from runs 1,3,4, and 5.
In all cases σ² is estimated from the data. For the data on hand, if SG1 contains any data points from run 2 then p(x,θ̂) is identically one; that is, the Scheffé intervals include all the data in SG2. This is true regardless of how many points are in SG1, provided it is five or more. (Note: five is the minimum number of observations needed to fit the cubic model and still estimate σ².) If all of the points from run 2 are in SG2 then the Scheffé intervals cover none of them. In particular, if SG2 contains only the data from run 2, p(x,θ̂) is identically zero; see figure 4.
Figure 4–
Summary of cross-validation results. Data from runs 1, 3, 4, and 5 are shown in SG1; data from run 2 are shown in SG2. A value bigger than 1 in absolute value indicates an x value outside the calibration interval.
Note that in typical cross validation procedures a fixed number of observations, usually one, is dropped out at a time and the procedure checked [11]. If this is done then the estimate of p(x,θ) is identically one. It can be shown that even if four or five observations are dropped out at one time the resulting average estimate of p(x,θ) will be nearly one. Thus, it appears that in this case purposeful choice of SG1 and SG2 is important.
5. Conclusions and Summary
It is important to find out early whether or not a calibration procedure is in control. In particular, for the example in section 4, had the new procedure been applied the experiment might have been terminated as a failure after run 3. Alternatively, one additional run might have been collected to compensate for run 2. Surely something different would have been done. Clearly, too, the Scheffé procedure is conservative enough to account for some unmodeled run-to-run variation, as in run 3.
Thus an iterative calibration can provide insight into the calibration procedure in a timely fashion without doing too much violence to the final uncertainty statements.
Acknowledgments
The author thanks J. Whetstone for providing the data and insight into his calibration system. The data were jointly examined with C. P. Reeve who has written a program to implement many of the procedures shown in this paper.
Biography
About the Author, Paper: C. H. Spiegelman is with the Statistical Engineering Division in NBS’ Center for Applied Mathematics. His work was partially supported under Office of Naval Research contract N00014-83-k-0005.
Footnotes
Figures in brackets indicate literature references at the end of this paper.
References
- [1] Eisenhart C. Realistic evaluation of the precision and accuracy of instrument calibration systems. Journal of Research NBS 67C: 161–187; 1963.
- [2] Parobeck P.; Tomb T. F.; Ku H.; Cameron J. Measurement assurance program for weighings of respirable coal mine dust samples. Journal of Quality Technology 13, No. 3: 157–165; 1981.
- [3] Scheffé H. A statistical theory of calibration. Annals of Statistics 1: 1–37; 1973.
- [4] Lieberman G. J.; Miller R. G.; Hamilton M. A. Unlimited simultaneous discrimination intervals in regression. Biometrika 54: 133–145; 1967.
- [5] Knafl G.; Sacks J.; Spiegelman C.; Ylvisaker D. Nonparametric calibration. Accepted for publication in Technometrics; 1984.
- [6] Lechner J. A.; Reeve C. P.; Spiegelman C. H. An implementation of the Scheffé approach to calibration using spline functions, illustrated by a pressure-volume calibration. Technometrics 24, No. 3: 229–234; 1982.
- [7] Spiegelman C. H.; Studden W. J. Design aspects of Scheffé’s calibration theory using linear splines. Journal of Research NBS 85: 295–304; 1980.
- [8] Stone C. J. Consistent nonparametric regression. Annals of Statistics 5: 595–645; 1977.
- [9] Collomb G. Estimation non paramétrique de la régression: revue bibliographique. International Statistical Review 49, No. 1: 75–93; 1981.
- [10] Lechner J. A.; Reeve C. P.; Spiegelman C. H. A new method of assigning uncertainty in volume calibration. NBSIR 80–2151, 101 pages; 1980.
- [11] Golub G.; Heath M.; Wahba G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21: 215–223; 1979.