Abstract
According to the Guide to the Expression of Uncertainty in Measurement (GUM), a result of measurement consists of a measured value together with its associated standard uncertainty. The measured value and the standard uncertainty are interpreted as the expected value and the standard deviation of a state-of-knowledge probability distribution attributed to the measurand. We discuss the term metrological compatibility introduced by the International Vocabulary of Metrology, third edition (VIM3) for lack of significant differences between two or more results of measurement for the same measurand. Sometimes a combined result of measurement from multiple evaluations of the same measurand is needed. We propose an approach for determining a combined result which is metrologically compatible with the contributing results.
Keywords: consensus value, inter-laboratory evaluations, uncertainty in measurement
1. Introduction
A function of various calibration laboratories, measurement standards organizations, national metrology institutes (NMIs), and international organizations such as the International Bureau of Weights and Measures (BIPM), the International Organization for Standardization (ISO), the International Organization of Legal Metrology (OIML), and the International Electrotechnical Commission (IEC) is to ensure that the differences between measured values for the same measurand determined in various places, at various times, and by various measurement procedures are insignificant. Without this assurance, the world’s commerce, trade, manufacturing, engineering, and scientific research would be chaotic.
The older thinking about uncertainty in measurement, based on statistical error analysis, is inappropriate for the rapidly advancing science and technology of measurement. Therefore the world’s leading authorities in metrology developed a new concept of uncertainty in measurement. This concept is described in the Guide to the Expression of Uncertainty in Measurement (GUM) [1] and extended in the International Vocabulary of Metrology, third edition (VIM3) [2]. In accordance with the GUM and the VIM3, a result of measurement is generally expressed as a pair of values: a measured quantity value and its associated standard uncertainty. The measured value and the standard uncertainty together represent a range of values being attributed to the measurand [2, Sec. 2.9]. Suppose [x1, u(x1)], …, [xn, u(xn)] are n different results of measurement for a common measurand believed to be sufficiently stable, where x1, …, xn are the measured values and u(x1), …, u(xn) are the corresponding standard uncertainties. In the GUM concept of uncertainty, a measured value xi and its associated standard uncertainty u(xi) are regarded, respectively, as the expected value and the standard deviation of an incompletely determined state-of-knowledge probability density function (pdf) attributed to the common measurand, for i = 1, 2, …, n [1].
Since the era of error analysis, metrologists have used the Birge chi-square test of statistical consistency to decide whether the differences between two or more measured values x1, …, xn are insignificant (Fig. 1). The Birge test is based on regarding the measured values x1, …, xn as realizations of random variables drawn from normal (Gaussian) sampling pdfs with unknown but equal expected values and known standard deviations [3]. When the measured values are correlated they are regarded as realizations of a random vector drawn from a joint n-variate normal distribution with a known variance-covariance matrix, referred to as a normal consistency model. To assess statistical consistency of a set of measured values x1, …, xn, a common practice is to pretend that the standard uncertainties u(x1), …, u(xn) are the known standard deviations of the presumed normal sampling pdfs of x1, …, xn. It has previously been pointed out [4] that the Birge test and the concept of statistical consistency motivated by it do not apply to the results of measurement based on the GUM.
Fig. 1.
Illustration of the classical approach to statistically evaluating and testing consistency of multiple measurements of the same measurand presuming a randomly disturbed measurement process. Symbols: Y – (joint) measurand, Xi – indicated quantities, ξ – possible values of the quantities Xi, q1,…,qn – vectors of the repeated observations qij where qi = [qi1, …,qik]T, gXi (ξ|qi) – pdf for the quantity Xi given the data qi, hXi (qi) – frequency distribution of the data qi.
Recently, the VIM3 [2] introduced the idea of metrological compatibility, which can be used to assess the significance of the differences between two or more results of measurement for the same measurand (Fig. 2). As noted in [4], the concept of metrological compatibility fits with the GUM and it can be used to assess the significance of differences between results based on the GUM for the same measurand. In Sec. 2, we discuss the VIM3 definition of metrological compatibility and its consequences in more detail than done in [4]. In this paper we propose an approach for determining a combined result which is metrologically compatible with the contributing results whether or not the results as available were compatible. When a set of results for the same measurand turns out to be incompatible, the seemingly anomalous results must be investigated. In Sec. 3, we discuss the importance of documenting information which may be needed in such investigations. Sometimes multiple evaluations of the same measurand need to be combined. A legitimate combined result must be metrologically compatible with the contributing results. In Sec. 4, we propose an approach for determining a combined result which is metrologically compatible with the contributing results. In Sec. 5, we illustrate the proposed approach using published data from an interlaboratory evaluation of the same measurand. A brief summary is given in Sec. 6.
Fig. 2.
Illustration of an uncertainty approach to the metrological evaluation and test of compatibility of two measurements of the same measurand taking all known influences on the measurement processes into consideration. Symbols: Y – (joint) measurand, Y1, Y2 – measurand of the measurements, X10, X20, – indicated quantities, q1,q2 – vectors of the repeated observations qij where qi = [qi1, …, qik]T, X1i, X2i – influence quantities with state of knowledge distributions. The elements in the grey blocks have been introduced by the GUM.
2. The VIM3 Concept of Metrological Compatibility
Generally, the measurand (quantity intended to be measured) is a property of a material or of a phenomenon. In many scientific, industrial, and commercial measurements, the measurand is sufficiently stable between multiple evaluations. Our primary interest is in such applications. Suppose two or more measurement procedures are used to measure the same measurand. The measurement procedures may be (i) applications of the same method of measurement at different times or (ii) different implementations of a given method in different places or (iii) different methods.
A measured quantity value is a number together with a metrological reference (unit of measurement) expressing the magnitude of the quantity [2, Secs. 1.19 and 2.10] relative to the reference. A measured value must be traceable to a recognized metrological reference for it to be widely communicable. According to VIM3, two or more results of measurement for the same measurand are metrologically comparable if they are metrologically traceable to the same metrological reference [2, Sec. 2.46]. Metrological comparability does not imply that the measured values have similar magnitudes. The VIM3 concept of metrological compatibility applies only to those results of measurement which are metrologically comparable.
We assume that all results [x1, u(x1)], …, [xn, u(xn)] for a common measurand are traceable to the same metrological reference and hence they are metrologically comparable. Following the GUM [1], we use the symbol Xi for a variable with a state-of-knowledge pdf represented by the result [xi, u(xi)], for i = 1, 2, …, n. The measured value xi is regarded as the expected value E(Xi) and the standard uncertainty u(xi) is regarded as the standard deviation S(Xi) of the pdf of Xi for i = 1, 2, …, n. In the mainstream GUM, the pdf of Xi is incompletely determined; the only thing reliably known about the pdf of Xi is the expected value E(Xi) = xi and the standard deviation S(Xi) = u(xi), for i = 1, 2, …, n.
2.1 Metrological Compatibility of Two Particular Results
Metrological compatibility is defined for two results at a time. In the mainstream GUM, the difference X1 − X2 is a variable with an incompletely determined state-of-knowledge pdf for the difference between the values attributed by the two results [x1, u(x1)] and [x2, u(x2)] to the common measurand. The expected value and the standard deviation of the pdf of X1 − X2 are, respectively, E(X1 − X2) = x1 − x2 and S(X1 − X2) = √[u2(x1) + u2(x2) − 2r(x1, x2)u(x1)u(x2)], where r(x1, x2) is the correlation coefficient between X1 and X2. Following the GUM, we use the symbol u(x1 − x2) for the standard deviation S(X1 − X2).
According to the VIM3 [2, Sec. 2.47], two metrologically comparable results [x1, u(x1)] and [x2, u(x2)] for a measurand, supposed to be stable, are metrologically compatible if |x1 − x2| ≤ κ × u(x1 − x2) for a chosen threshold κ. According to the VIM3 [2, Sec. 2.47, Note 1], if two measurements for a common measurand, thought to be constant, are not metrologically compatible then there are two possibilities: (i) one or both of the measurements are incorrect (e.g., one or both of the measurement uncertainties are assessed as being too small) or (ii) the measurand changed between measurements.
We can use the VIM3 concept of metrological compatibility as a criterion to assess the significance of the differences between metrologically comparable results of measurement for the same measurand. In the mainstream GUM, the state-of-knowledge pdf represented by a result [xi, u(xi)], for i = 1, 2, …, n, is incompletely determined. Therefore, we need a quantitative measure for the difference between two fixed known results [x1, u(x1)] and [x2, u(x2)], each consisting of a measured value with standard uncertainty. Let us define a ζ-function, denoted by ζ (Δ), as
$$\zeta(\Delta) = \frac{|E(\Delta)|}{S(\Delta)} \tag{1}$$
The value ζ (Δ) is a measure for the significance of the difference Δ. Even when a complete state-of-knowledge pdf of Δ is assumed, the metric (1) can be used to judge the significance of the difference. Based on this metric we can restate the VIM3 definition of metrological compatibility as follows [4]:
Definition: Two metrologically comparable results [x1, u(x1)] and [x2, u(x2)] for the same measurand are said to be metrologically compatible if
$$\zeta(x_1 - x_2) = \frac{|x_1 - x_2|}{u(x_1 - x_2)} \le \kappa \tag{2}$$
for a chosen value of some threshold κ, where
$$u(x_1 - x_2) = \sqrt{u^2(x_1) + u^2(x_2) - 2\,r(x_1, x_2)\,u(x_1)\,u(x_2)} \tag{3}$$
and r(x1, x2) is the correlation coefficient between the variables X1 and X2 with state-of-knowledge pdfs represented by the results [x1, u(x1)] and [x2, u(x2)].
In this definition, the value of κ is a chosen threshold for declaring metrological compatibility (lack of significant difference) of two results. Values of ζ (x1−x2) larger than κ are regarded as significant. The results are compatible when the difference between the measured values x1 and x2 is insignificant in view of the standard uncertainties u(x1) and u(x2).
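The pairwise criterion can be sketched in a few lines of Python. This is a minimal illustration, not part of the original paper: the function names `u_diff`, `zeta`, and `compatible` and the numerical values are hypothetical, and κ = 2 is only the conventional default.

```python
import math

def u_diff(u1, u2, r=0.0):
    """Standard uncertainty u(x1 - x2) of the difference of two results.

    r is the correlation coefficient r(x1, x2); r = 0 means uncorrelated.
    Note: r = 1 with u1 == u2 gives zero, so zeta would be undefined.
    """
    return math.sqrt(u1**2 + u2**2 - 2.0 * r * u1 * u2)

def zeta(x1, u1, x2, u2, r=0.0):
    """zeta-value |x1 - x2| / u(x1 - x2) for two results."""
    return abs(x1 - x2) / u_diff(u1, u2, r)

def compatible(x1, u1, x2, u2, r=0.0, kappa=2.0):
    """True if the two results are metrologically compatible for threshold kappa."""
    return zeta(x1, u1, x2, u2, r) <= kappa

# Hypothetical results: u(x1 - x2) = sqrt(0.09 + 0.16) = 0.5
print(zeta(10.0, 0.3, 10.5, 0.4))        # about 1.0, compatible for kappa = 2
print(compatible(10.0, 0.3, 11.5, 0.4))  # zeta is about 3.0, so False
```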
The VIM3 does not discuss how the threshold κ should be determined. A proper choice of the threshold κ is to a large extent a matter of agreement because it requires accepting the economic consequences of that choice. A conventional value of the threshold κ in metrology is two.
If a larger value of κ were agreed upon, small differences would no longer be detectable. This would be a disadvantage in applications where detecting small differences is important. If a smaller value of κ were agreed upon, many small differences would be declared significant even though they might be only a consequence of noisy measurements, and the economic consequences would be borne by the metrological community trying to provide compatible measurement systems.
2.2 Metrological Compatibility of a Set of Results
According to the VIM3 [2, Sec. 2.47], a set of comparable results [x1, u(x1)], …, [xn, u(xn)], where n ≥ 2, is metrologically compatible if every one of the n(n − 1)/2 pairs of results [xi, u(xi)] and [xj, u(xj)], for i, j = 1, 2, …, n and i < j, is metrologically compatible. We can use expression (2) in this case by replacing x1 with xi and x2 with xj.
If for all pairs of results the values of ζ(xi − xj) are smaller than or equal to a chosen threshold κ then the set of results [x1, u(x1)], [x2, u(x2)], …, [xn, u(xn)] is metrologically compatible.
We can say that the differences between the measured values x1, …, xn are insignificant in view of the uncertainties u(x1), …, u(xn).
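As an illustration, the n(n − 1)/2 pairwise checks can be enumerated directly. The sketch below assumes uncorrelated results (all r(xi, xj) = 0); the helper name `incompatible_pairs` and the data are hypothetical.

```python
import math
from itertools import combinations

def zeta(xi, ui, xj, uj, r=0.0):
    """zeta-value of the difference between two results."""
    return abs(xi - xj) / math.sqrt(ui**2 + uj**2 - 2.0 * r * ui * uj)

def incompatible_pairs(results, kappa=2.0):
    """Return (i, j, zeta) for every pair i < j whose zeta-value exceeds kappa.

    results is a list of (x, u) tuples; correlations are assumed to be zero.
    An empty return value means the whole set is metrologically compatible.
    """
    bad = []
    for (i, (xi, ui)), (j, (xj, uj)) in combinations(enumerate(results), 2):
        z = zeta(xi, ui, xj, uj)
        if z > kappa:
            bad.append((i, j, z))
    return bad

# Hypothetical set of three results: the third disagrees with the other two.
results = [(10.0, 0.3), (10.5, 0.4), (12.0, 0.4)]
for i, j, z in incompatible_pairs(results):
    print(f"results {i} and {j}: zeta = {z:.2f}")
```

Listing the offending pairs, rather than returning a single yes/no answer, points to the results that need investigation.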
Note 1: A conventional idea that if the number n of the measured values x1, …, xn is large, it is natural to expect one or more of them to be significantly different from the rest comes from the theory of sampling from probability distributions having long tails which extend, for example, beyond two standard deviations. If the measurement procedures are properly carried out and the results of measurement are properly evaluated according to the GUM taking into account all important influence quantities, then a set of results for the same measurand should be metrologically compatible. When some results of measurement seem anomalous, they require explanation rather than acceptance. Often, anomalous results are a consequence of missing important influence quantities.
2.3 Metrological Compatibility With a Reference Result
Suppose that in addition to the n measurement procedures, which yield the comparable results [x1, u(x1)], [x2, u(x2)], …, [xn, u(xn)], where n ≥ 2, the same measurand is measured by a higher echelon measurement procedure (or laboratory) yielding the reference result [xR, u(xR)], where xR is the reference value with standard uncertainty u(xR). Alternatively, the common measurand may be a certified reference material of reference value xR with standard uncertainty u(xR), which are not revealed before all n results of measurement are reported. We will use the symbol XR for a variable with a state-of-knowledge pdf represented by the result [xR, u(xR)]. In general, the uncertainty u(xR) associated with the reference value xR is smaller than the uncertainties u(x1), …, u(xn) associated with the measured values x1, …, xn.
If for all differences between the results xi and value xR, the values ζ (xi − xR) are smaller than or equal to a chosen threshold κ then the set of results [x1, u(x1)], [x2, u(x2)], …, [xn, u(xn)] is metrologically compatible with the reference value xR. We can say that the differences between the measured values x1, …, xn and the reference value xR are insignificant in view of the uncertainties u(x1), …, u(xn) and u(xR).
One should not confuse the value ζ (xi − xR) for the difference between the results [xi, u(xi)] and [xR, u(xR)] with En-values, which do not seem to be uniquely defined.1
2.4 Metrological Compatibility With a Combined Result
Sometimes the results [x1, u(x1)], [x2, u(x2)], …, [xn, u(xn)], where n ≥ 2, need to be combined to determine a combined result [xC, u(xC)], where xC is the combined value and u(xC) is the standard uncertainty associated with xC. We will use the symbol XC for a variable with a state-of-knowledge pdf represented by [xC, u(xC)]. In accordance with the GUM, the combined variable XC for a value of the measurand should be defined as a measurement function of the input variables X1, …, Xn. Often, XC is set as a convex linear combination of X1, …, Xn with non-negative weights a1, …, an which sum up to one. Thus often a measurement function for XC is of the form
$$X_C = \sum_{i=1}^{n} a_i X_i \tag{4}$$
where ai ≥ 0 and Σiai = 1, for i = 1, 2, …, n. Since (4) is a linear function in Xi the expected value E(XC) of XC is the combined value xC, where
$$x_C = \sum_{i=1}^{n} a_i x_i \tag{5}$$
and the standard deviation S(XC) of XC is the standard uncertainty u(xC) where
$$u^2(x_C) = \sum_{i=1}^{n} a_i^2\,u^2(x_i) + 2\sum_{i<j} a_i a_j\,r(x_i, x_j)\,u(x_i)\,u(x_j) \tag{6}$$
If the individual measurement procedures are all uncorrelated then the cross-product term in (6) is zero.
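A short sketch of this calculation follows. It assumes that the variance of the convex combination takes the usual GUM form for a linear measurement function (a sum of squared weighted uncertainties plus a cross-product term over pairs i < j); the function name `combine`, the optional correlation matrix `r`, and the data are hypothetical.

```python
import math

def combine(x, u, a, r=None):
    """Combined value and standard uncertainty of a convex linear combination.

    x, u : measured values and standard uncertainties
    a    : non-negative weights summing to one
    r    : optional symmetric correlation matrix (list of lists);
           None means the inputs are treated as uncorrelated.
    """
    n = len(x)
    if len(u) != n or len(a) != n:
        raise ValueError("x, u and a must have the same length")
    if any(ai < 0 for ai in a) or abs(sum(a) - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and sum to one")
    x_c = sum(ai * xi for ai, xi in zip(a, x))
    var = sum((ai * ui) ** 2 for ai, ui in zip(a, u))
    if r is not None:
        # Cross-product term over pairs i < j; vanishes for uncorrelated inputs.
        for i in range(n):
            for j in range(i + 1, n):
                var += 2.0 * a[i] * a[j] * r[i][j] * u[i] * u[j]
    return x_c, math.sqrt(var)

x, u = [10.0, 10.4], [0.3, 0.4]          # hypothetical data
print(combine(x, u, [0.5, 0.5]))         # arithmetic average
w = [1.0 / ui**2 for ui in u]            # inverse-variance weights
print(combine(x, u, [wi / sum(w) for wi in w]))  # weighted mean
```

With uncorrelated inputs the weighted mean never has a larger standard uncertainty than the arithmetic average, which the two printed results illustrate.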
If ai = 1/n for i = 1, 2, …, n, then XC reduces to the arithmetic average XA = (1/n) Σi Xi. The expected value E(XA) is xA = (1/n) Σi xi and the standard deviation S(XA) denoted by u(xA) can be determined from (6). If the pdfs for X1, …, Xn are uncorrelated, then
$$u^2(x_A) = \frac{1}{n^2}\sum_{i=1}^{n} u^2(x_i) \tag{7}$$
If ai = wi/Σi wi, where wi = 1/u2(xi) then XC reduces to the weighted mean XW = Σi wi Xi/Σi wi with weights inversely proportional to the variances u2(x1), …, u2(xn). The expected value E(XW) is xW = Σi wi xi/Σi wi and the standard deviation S(XW) denoted by u(xW) can be determined from (6). If the pdfs for X1, …, Xn are uncorrelated, then
$$u^2(x_W) = \frac{1}{\sum_{i=1}^{n} w_i} = \frac{1}{\sum_{i=1}^{n} 1/u^2(x_i)} \tag{8}$$
If for all differences between the results xi and the combined value xC, the values ζ (xi − xC) are smaller than or equal to a chosen threshold κ then the set of results [x1, u(x1)], [x2, u(x2)], …, [xn, u(xn)] is metrologically compatible with the combined value xC. Then we can say that the differences between the measured values x1, …, xn and the combined value xC are insignificant in view of the uncertainties u(x1), …, u(xn).
In evaluating u(xi − xC) the correlation coefficient between Xi and XC must be included because the pdfs of Xi and XC are always correlated, for i = 1, 2, …, n. For example, if the pdfs for X1, …, Xn are uncorrelated, then the variance, V(Xi − XC), denoted by u2(xi − xC) is
$$u^2(x_i - x_C) = (1 - 2a_i)\,u^2(x_i) + u^2(x_C) \tag{9}$$
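The covariance step behind this variance can be written out explicitly. The following is a sketch under the stated assumption that the pdfs for X1, …, Xn are uncorrelated and that XC is the convex combination with weights a1, …, an:

```latex
\begin{align*}
\operatorname{Cov}(X_i, X_C)
  &= \operatorname{Cov}\Bigl(X_i, \sum_{j=1}^{n} a_j X_j\Bigr) = a_i\,u^2(x_i), \\
u^2(x_i - x_C)
  &= u^2(x_i) + u^2(x_C) - 2\operatorname{Cov}(X_i, X_C)
   = (1 - 2a_i)\,u^2(x_i) + u^2(x_C).
\end{align*}
```

For the inverse-variance weights ai = wi/Σi wi the product ai u2(xi) equals 1/Σi wi, which is the variance of the weighted mean, so the covariance term removes twice that variance.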
If ai = 1/n, for i = 1, 2, …, n, then xC reduces to the arithmetic average xA = (1/n) Σi xi and the uncertainty u(xi − xC) given in (9) reduces to u(xi − xA), where
$$u^2(x_i - x_A) = \left(1 - \frac{2}{n}\right)u^2(x_i) + u^2(x_A) \tag{10}$$
If ai = wi/Σi wi, where wi = 1/u2(xi), for i = 1, 2, …, n, then xC reduces to the weighted mean xW = Σi wixi/Σi wi and the uncertainty u(xi − xC) given in (9) reduces to u(xi − xW), where
$$u^2(x_i - x_W) = u^2(x_i) - u^2(x_W) \tag{11}$$
If the uncertainties u(x1), u(x2), …, u(xn) were all equal to u(x), say, then xW reduces to xA and u2(xW) reduces to u2(xA) = u2(x)/n. Then both (10) and (11) reduce to
$$u^2(x_i - x_A) = u^2(x_i - x_W) = \frac{n-1}{n}\,u^2(x) \tag{12}$$
Note 2: Sometimes, the standard uncertainties u(x1), u(x2), …, u(xn) are not all reliably determined. Also, the standard uncertainties are frequently inappropriate bases for assigning the weights a1, a2, …, an to the measured values x1, x2, …, xn to determine a combined result. Therefore the weighted mean xW may be inappropriate for combining the values. Thus, in our view, the arithmetic mean xA should be regarded as a default combined value.
3. Information Needed to Determine Sources of Incompatibility
A purpose of assessing metrological compatibility is to demonstrate lack of significant difference between the results of measurement for a common measurand. If a set of results turns out to be metrologically incompatible then the measurement procedures and calculations underlying the seemingly anomalous results should be investigated. Every result of measurement should have supporting documents which include the measurement function (measurement equation) and complete uncertainty budget. If the influence quantities, uncertainty components, and correlation coefficients identified in the uncertainty budget are reasonable then in search of the possible sources of incompatibility one must look into potential influence quantities not included in the uncertainty budget.
Investigations to determine the sources of incompatibility are generally done in retrospect long after completing the measurements. Therefore investigators need detailed descriptions of what was actually done during measurement. Often, metrologists do not have enough time and resources to document in sufficient detail for retrospective investigation what was actually done in a particular application of the measurement procedure. In the absence of such documentation it may be difficult to determine possible sources of incompatibility.
Note 3: We hope that in the not too distant future, metrologists and information technology experts would collaborate to develop tools which make it easier for metrologists to document in real time the actual measurement procedure while the measurements are being done. Such documentation should be helpful in identifying all potentially important influence quantities.
4. Determination of a Combined Value and Its Associated Uncertainty
Even when the common measurand is sufficiently stable, the results [x1, u(x1)], …, [xn, u(xn)] can exhibit large variation. Metrological incompatibility occurs when some or all results (measured values or standard uncertainties) are improperly determined. Frequently, improper results are a consequence of missing important influence quantities. For example, in many chemical measurements, the measurand is the amount of one component in a sample of multi-component material. The other components can interfere with the measurements. Frequently, it is impossible to know all potential interferences. Therefore, it is difficult to be sure that all significant influence quantities have been accounted for in determining the measured values and uncertainties.
For a combined result [xC, u(xC)] to be legitimate it should be metrologically compatible with the contributing results of measurement [x1, u(x1)], …, [xn, u(xn)]. Therefore we propose the following principle.
Principle for combining multiple results for the same measurand: Determine the combined result [xC, u(xC)] from the expressions (5) and (6) as recommended in the GUM. If the results [x1, u(x1)], [x2, u(x2)], …, [xn, u(xn)] are metrologically compatible with the combined result [xC, u(xC)], then u(xC) is a valid expression for the standard uncertainty associated with xC. If the results [x1, u(x1)], [x2, u(x2)], …, [xn, u(xn)] are metrologically incompatible with the combined result [xC, u(xC)], then the seemingly anomalous results should be investigated. Until the investigation resolves the anomalous results, in the absence of additional knowledge, all results in a metrologically incompatible set should be regarded with suspicion. To determine a legitimate combined result, we propose that the measured values x1, …, xn should be sustained and each of the uncertainties u(x1), u(x2), …, u(xn) should be enlarged just enough to make the results [x1, u(x1)], [x2, u(x2)], …, [xn, u(xn)] metrologically compatible with the combined result [xC, u(xC)].
This approach was first proposed in [5] and has recently been used in [6]. Thus we define variables Y1, …, Yn with corrected state-of-knowledge pdfs for the common measurand as follows
$$Y_i = X_i + \delta X_i, \quad i = 1, 2, \ldots, n \tag{13}$$
where δX1, …, δXn are correction variables. Then a measurement function for the combined variable YC is
$$Y_C = \sum_{i=1}^{n} a_i Y_i = \sum_{i=1}^{n} a_i\,(X_i + \delta X_i) \tag{14}$$
where ai ≥ 0 and Σi ai = 1, for i = 1, 2, …, n, and the pdfs for the correction variables δX1, …, δXn are mutually independent and independent of the pdfs for X1, …, Xn. The pdfs assigned to the correction variables δX1, …, δXn express the limits of knowledge. Thus, we assign zero expected values and the same variance u2(δ) to each of the correction variables δX1, …, δXn. Thus the expected value E(δXi) is zero and the variance V(δXi) is u2(δ), for i = 1, 2, …, n. It follows from (13) that the expected value yi and the variance u2(yi) of the pdf for Yi are
$$y_i = E(Y_i) = x_i \tag{15}$$
and
$$u^2(y_i) = V(Y_i) = u^2(x_i) + u^2(\delta) \tag{16}$$
for i = 1, 2, …, n.
We propose that the variance u2(δ) should be set just large enough to make the results [y1, u(y1)], [y2, u(y2)], …, [yn, u(yn)] compatible with the result [yC, u(yC)]. As discussed in Sec. 2.4, the results [y1, u(y1)], [y2, u(y2)], …, [yn, u(yn)] are compatible with [yC, u(yC)] when
$$\zeta(y_i - y_C) = \frac{|y_i - y_C|}{u(y_i - y_C)} \le \kappa \tag{17}$$
or equivalently
$$(y_i - y_C)^2 \le \kappa^2\,u^2(y_i - y_C) \tag{18}$$
for all i = 1, 2, …, n. From (15), we have
$$y_C = \sum_{i=1}^{n} a_i y_i = \sum_{i=1}^{n} a_i x_i = x_C \tag{19}$$
and
$$y_i - y_C = x_i - x_C \tag{20}$$
From the appendix, we have
$$u^2(y_i - y_C) = u^2(x_i - x_C) + \Bigl(1 - 2a_i + \sum_{j=1}^{n} a_j^2\Bigr)u^2(\delta) \tag{21}$$
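One way to obtain a variance of this kind is sketched below. It uses only the assumptions stated above: each Yi is Xi plus a correction δXi, and the corrections are mutually independent, independent of X1, …, Xn, and share the variance u2(δ).

```latex
\begin{align*}
Y_i - Y_C &= (X_i - X_C) + \Bigl(\delta X_i - \sum_{j=1}^{n} a_j\,\delta X_j\Bigr), \\
V\Bigl(\delta X_i - \sum_{j=1}^{n} a_j\,\delta X_j\Bigr)
  &= \Bigl[(1 - a_i)^2 + \sum_{j \ne i} a_j^2\Bigr]u^2(\delta)
   = \Bigl(1 - 2a_i + \sum_{j=1}^{n} a_j^2\Bigr)u^2(\delta), \\
u^2(y_i - y_C) &= u^2(x_i - x_C) + \Bigl(1 - 2a_i + \sum_{j=1}^{n} a_j^2\Bigr)u^2(\delta).
\end{align*}
```

For equal weights ai = 1/n the multiplier of u2(δ) is (n − 1)/n.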
Therefore, the criterion of compatibility (18) is equivalent to
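$$(x_i - x_C)^2 \le \kappa^2\Bigl[u^2(x_i - x_C) + \Bigl(1 - 2a_i + \sum_{j=1}^{n} a_j^2\Bigr)u^2(\delta)\Bigr] \tag{22}$$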
for all i = 1, 2, …, n. It follows that
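$$u^2(\delta) \ge \frac{(x_i - x_C)^2/\kappa^2 - u^2(x_i - x_C)}{1 - 2a_i + \sum_{j=1}^{n} a_j^2} \tag{23}$$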
for all i = 1, 2, …, n. Thus, if u2(δ) is chosen as
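$$u^2(\delta) = \max\Bigl\{0,\ \max_{1 \le i \le n}\frac{(x_i - x_C)^2/\kappa^2 - u^2(x_i - x_C)}{1 - 2a_i + \sum_{j=1}^{n} a_j^2}\Bigr\} \tag{24}$$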
then each of the corrected measured values y1, …, yn would be metrologically compatible with the combined measured value yC. If the measured values x1, …, xn are compatible with the combined measured value xC then none of the n quantities in the curly braces of (24) is positive and u2(δ) = 0. In that case the measurement function (14) reduces to (5) and the uncertainty associated with the combined measured value xC is given by (6).
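The choice of u2(δ) can be sketched as follows. This is an illustrative reading of the procedure, assuming uncorrelated inputs and assuming that the variance of yi − yC equals u2(xi − xC) plus the factor (1 − 2ai + Σj aj2) times u2(δ); the function name `enlargement_variance` and the data are hypothetical.

```python
def enlargement_variance(x, u, a, kappa=2.0):
    """Smallest common variance u^2(delta) >= 0 that makes every corrected
    result compatible with the combined value (uncorrelated inputs assumed).
    """
    x_c = sum(ai * xi for ai, xi in zip(a, x))          # combined value
    var_c = sum((ai * ui) ** 2 for ai, ui in zip(a, u))  # u^2(x_C), no correlations
    sum_a2 = sum(ai**2 for ai in a)
    needed = 0.0                                         # zero if already compatible
    for xi, ui, ai in zip(x, u, a):
        var_diff = (1.0 - 2.0 * ai) * ui**2 + var_c      # u^2(x_i - x_C)
        coeff = 1.0 - 2.0 * ai + sum_a2                  # multiplier of u^2(delta)
        if coeff <= 0.0:
            # Happens only when one weight equals one; then y_i - y_C is zero.
            continue
        bound = ((xi - x_c) ** 2 / kappa**2 - var_diff) / coeff
        needed = max(needed, bound)
    return needed

# Hypothetical data with equal weights: the third value is discrepant.
x, u, a = [10.0, 10.2, 12.0], [0.2, 0.2, 0.2], [1 / 3] * 3
print(enlargement_variance(x, u, a))   # about 0.5617
```

A result of zero means the reported uncertainties already make every result compatible with the combined value, so no enlargement is applied.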
4.1 Arithmetic Average
If ai = 1/n, for i = 1, 2, …, n, then xC reduces to the arithmetic average xA and from (24),
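$$u^2(\delta) = \max\Bigl\{0,\ \max_{1 \le i \le n}\frac{n}{n-1}\Bigl[\frac{(x_i - x_A)^2}{\kappa^2} - u^2(x_i - x_A)\Bigr]\Bigr\} \tag{25}$$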
The combined value yC reduces to yA = (1/n) Σi yi = (1/n) Σi xi = xA. To assure that the measured values y1, …, yn are compatible with yA one can check that
$$\zeta(y_i - y_A) = \frac{|y_i - y_A|}{u(y_i - y_A)} \le \kappa \tag{26}$$
where as shown in the appendix
$$u^2(y_i - y_A) = u^2(x_i - x_A) + \frac{n-1}{n}\,u^2(\delta) \tag{27}$$
Expressions for u2(xi− xA) and u2(δ) are given in (10) and (25), respectively. The uncertainty associated with yA is from (7)
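$$u^2(y_A) = \frac{1}{n^2}\sum_{i=1}^{n}\bigl[u^2(x_i) + u^2(\delta)\bigr] = u^2(x_A) + \frac{u^2(\delta)}{n} \tag{28}$$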
4.2 Weighted Mean
Since the variance associated with yi is u2(yi) = u2(xi) + u2(δ), a weighted mean with weights inversely proportional to the variances of the results y1, …, yn is yW = Σi wi yi/Σi wi, where yi = xi, and wi = 1/u2(yi) = 1/[u2(xi) + u2(δ)] for i = 1, 2, …, n. The measured values y1, …, yn are compatible with yW if
$$\zeta(y_i - y_W) = \frac{|y_i - y_W|}{u(y_i - y_W)} \le \kappa \tag{29}$$
for all i = 1, 2, …, n. Analogous to (11)
$$u^2(y_i - y_W) = u^2(y_i) - u^2(y_W) \tag{30}$$
where
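$$u^2(y_W) = \frac{1}{\sum_{i=1}^{n} w_i} = \frac{1}{\sum_{i=1}^{n} 1/\bigl[u^2(x_i) + u^2(\delta)\bigr]} \tag{31}$$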
The variance u2(δ) is the smallest value which would make the measured values y1, …, yn compatible with yW. Such a value for u2(δ) can be iteratively determined using the value of u2(δ) from (25) as a starting value.
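The iteration is not spelled out in the text, so the sketch below swaps in a simple bracketing and bisection search rather than a scheme started from the arithmetic-average value. It assumes uncorrelated results and assumes that the largest ζ-value decreases as u2(δ) grows, which is needed for the returned value to be the smallest one; the function names are hypothetical. The input data are the reported values and uncertainties from table 1.

```python
import math

def max_zeta_weighted(x, u, var_delta):
    """Largest zeta(y_i - y_W) when every variance is enlarged by var_delta."""
    var_y = [ui**2 + var_delta for ui in u]
    w = [1.0 / v for v in var_y]
    sw = sum(w)
    y_w = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean
    var_w = 1.0 / sw                                  # its variance
    # u^2(y_i - y_W) = u^2(y_i) - u^2(y_W), positive for n >= 2
    return max(abs(xi - y_w) / math.sqrt(v - var_w) for xi, v in zip(x, var_y))

def enlargement_weighted(x, u, kappa=2.0, iterations=100):
    """Approximate smallest u^2(delta) giving compatibility with the weighted mean."""
    if max_zeta_weighted(x, u, 0.0) <= kappa:
        return 0.0
    lo, hi = 0.0, max(ui**2 for ui in u)
    while max_zeta_weighted(x, u, hi) > kappa:        # grow until compatible
        lo, hi = hi, 2.0 * hi
    for _ in range(iterations):                       # bisection; hi stays compatible
        mid = 0.5 * (lo + hi)
        if max_zeta_weighted(x, u, mid) > kappa:
            lo = mid
        else:
            hi = mid
    return hi

x = [61.40, 62.21, 62.30, 62.34, 62.60, 62.70, 62.84, 65.90]
u = [1.10, 0.30, 0.45, 0.62, 0.75, 0.26, 0.15, 1.35]
d = enlargement_weighted(x, u)
print(d, max_zeta_weighted(x, u, d))
```

Because the upper end of the bracket is always a compatible value, the returned u2(δ) satisfies the criterion even if the monotonicity assumption fails; only its minimality would then be in doubt.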
Note 4: Let us use the symbol Ytrue for a true quantity value [2, Sec. 2.11] of the common measurand commensurate with its description. (In the GUM, the same symbol Y is also used for a quantity with a state-of-knowledge pdf for the common measurand.) If the measurand is defined in extensive detail, a true value Ytrue may be essentially unique. If the measurand is defined in less detail, then a range of values may be commensurate with its definition and any one of them qualifies as a true value Ytrue of the measurand. The concept of metrological compatibility relates to the observed differences between the measured values x1, …, xn rather than to the unobservable differences between the measured values and a true value Ytrue of the measurand. Therefore, regardless of whether the measured values x1, …, xn are compatible or incompatible with the combined value xC, the measured values alone provide no information about the difference between xC and Ytrue. In particular, metrological compatibility does not imply that the difference between xC and Ytrue is not significant. However, there is no factual knowledge about potential significant difference between xC and Ytrue. Therefore, a correction applied to xC for its potential significant difference between xC and Ytrue and enlargement of the uncertainty u(xC) determined from (6) as discussed in [7] would be arbitrary.
5. Combined Result From an Interlaboratory Evaluation
Columns 2 and 3 of table 1 reproduce, from [8, Table 3], the measured values, cLab, and the corresponding standard uncertainties, u(cLab), for the amount content of lead (Pb) in natural river water as determined by the eight laboratories2 identified in column 1 of table 1. We use these data to illustrate the calculation of a combined result. Suppose the arithmetic average cAvg = 62.79 nmol/kg is used as the combined measured value. The associated standard uncertainty based on the expression (7) is u(cAvg) = 0.26 nmol/kg. The values of ζ(cLab − cAvg) for the differences between the reported results [cLab, u(cLab)] and the combined result [cAvg, u(cAvg)], determined by using the expression (10) for the standard uncertainty u(cLab − cAvg), are shown in column 4 of table 1. Suppose the threshold for metrological compatibility is set as κ = 2. One of the values of ζ(cLab − cAvg) (from LNE) is larger than 2.00. Therefore not all of the eight reported results [cLab, u(cLab)] are metrologically compatible with the combined result [cAvg, u(cAvg)]. Until potential flaws in the deviant result (from LNE or the others) are determined, all results must be regarded with suspicion. Therefore, as discussed in Sec. 4, we propose that all reported measured values should be sustained and each of the variances u2(cLab) should be enlarged by the amount u2(δ) = 1.130 (nmol/kg)2 determined from the expression (25). The adjusted (enlarged) standard uncertainties u(cLab) based on the expression (16) are shown in column 5 of table 1. Based on the adjusted uncertainties u(cLab), the standard uncertainty associated with the arithmetic mean cAvg determined from the expression (28) is u(cAvg) = 0.46 nmol/kg. The values of ζ(cLab − cAvg) based on the adjusted uncertainties are shown in column 6 of table 1. Since none of the adjusted values of ζ(cLab − cAvg) is larger than 2.00, the adjusted results [cLab, u(cLab)] given in columns 2 and 5 of table 1 are metrologically compatible with the combined result [cAvg, u(cAvg)].
Table 1.
The measured values cLab for the amount content of Pb in natural river water and their associated standard uncertainties u(cLab) in nmol/kg units as reported in [8]. Also shown are the values ζ(cLab − cAvg) based on the reported uncertainties and on the adjusted (enlarged) uncertainties.
| Laboratory Identifier | Amount Content cLab / (nmol/kg) | Reported Uncertainty u(cLab) / (nmol/kg) | Reported ζ-value | Adjusted Uncertainty u(cLab) / (nmol/kg) | Adjusted ζ-value |
|---|---|---|---|---|---|
| NMi | 61.40 | 1.10 | 1.40 | 1.53 | 0.99 |
| NIMC | 62.21 | 0.30 | 1.56 | 1.10 | 0.54 |
| KRISS | 62.30 | 0.45 | 1.04 | 1.15 | 0.44 |
| LGC | 62.34 | 0.62 | 0.75 | 1.23 | 0.38 |
| NRC | 62.60 | 0.75 | 0.27 | 1.30 | 0.15 |
| IRMM | 62.70 | 0.26 | 0.25 | 1.09 | 0.08 |
| NIST | 62.84 | 0.15 | 0.19 | 1.07 | 0.05 |
| LNE | 65.90 | 1.35 | 2.60 | 1.72 | 2.00 |
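
The entries of table 1 can be recomputed with a short script. The sketch below assumes uncorrelated laboratory results and the arithmetic-average forms of the variances described in Secs. 2.4 and 4.1; the variable and function names are illustrative.

```python
import math

labs = ["NMi", "NIMC", "KRISS", "LGC", "NRC", "IRMM", "NIST", "LNE"]
c = [61.40, 62.21, 62.30, 62.34, 62.60, 62.70, 62.84, 65.90]   # nmol/kg
u = [1.10, 0.30, 0.45, 0.62, 0.75, 0.26, 0.15, 1.35]           # nmol/kg
kappa = 2.0
n = len(c)

c_avg = sum(c) / n                              # arithmetic average
var_avg = sum(ui**2 for ui in u) / n**2         # u^2(c_avg), uncorrelated results

def var_diff(ui):
    """u^2(c_lab - c_avg) for the reported uncertainties."""
    return (1.0 - 2.0 / n) * ui**2 + var_avg

def zeta_values(var_delta):
    """zeta(c_lab - c_avg) after enlarging every variance by var_delta."""
    return [abs(ci - c_avg) / math.sqrt(var_diff(ui) + (n - 1) / n * var_delta)
            for ci, ui in zip(c, u)]

# Smallest common enlargement that brings every zeta-value down to kappa.
var_delta = max(0.0, max(n / (n - 1) * ((ci - c_avg) ** 2 / kappa**2 - var_diff(ui))
                         for ci, ui in zip(c, u)))
u_adj = [math.sqrt(ui**2 + var_delta) for ui in u]   # adjusted uncertainties
u_avg_adj = math.sqrt(var_avg + var_delta / n)       # adjusted u(c_avg)

print(f"c_avg = {c_avg:.2f}, u = {math.sqrt(var_avg):.2f}, "
      f"u2(delta) = {var_delta:.3f}, adjusted u = {u_avg_adj:.2f}")
for row in zip(labs, zeta_values(0.0), u_adj, zeta_values(var_delta)):
    print("{:6s} {:5.2f} {:5.2f} {:5.2f}".format(*row))
```

The LNE result is the one that fixes u2(δ): its adjusted ζ-value equals the threshold κ = 2, and every other laboratory falls below it.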
Figures 3 and 4 display the measured values cLab (given in column 2 of table 1) and the arithmetic average cAvg = 62.79 nmol/kg along with the corresponding expanded uncertainty intervals (for coverage factor k =2). In Fig. 3, the expanded uncertainty intervals are based on the standard uncertainties as reported in [8] and reproduced in column 3 of table 1; in particular, the standard uncertainty u(cAvg) associated with cAvg is u(cAvg) = 0.26 nmol/kg. In Fig. 4, the expanded uncertainty intervals are based on the adjusted (enlarged) standard uncertainties displayed in column 5 of table 1; in particular, the standard uncertainty u(cAvg) associated with cAvg is u(cAvg) = 0.46 nmol/kg.
Fig. 3.
The measured values cLab and their arithmetic average cAvg for the amount content of lead (Pb) with the expanded uncertainty intervals (for coverage factor k = 2) determined from the uncertainties stated in the report [8] and reproduced in column 3 of table 1. The arithmetic average is cAvg = 62.79 nmol/kg with standard uncertainty u(cAvg) = 0.26 nmol/kg.
Fig. 4.
The measured values cLab and their arithmetic average cAvg for the amount content of lead (Pb) with the expanded uncertainty intervals (for coverage factor k = 2) determined from the adjusted (enlarged) uncertainties given in column 5 of table 1. The arithmetic average is cAvg = 62.79 nmol/kg with standard uncertainty u(cAvg) = 0.46 nmol/kg.
In both Figs. 3 and 4, the expanded uncertainty intervals (for coverage factor k = 2) for the measured values overlap with the expanded uncertainty interval for the arithmetic average cAvg. However, not all of the eight results in Fig. 3 are metrologically compatible with the combined result [cAvg, u(cAvg)]. This shows that there is no direct correspondence between the overlap of the expanded uncertainty intervals (for coverage factor k = 2) and the VIM3 concept of metrological compatibility.
6. Summary
The VIM3 [2] concept of metrological compatibility applies to only those results which are metrologically comparable; that is, the results must be traceable to the same reference. Metrological compatibility is a pairwise concept. Two metrologically comparable results for the same measurand are said to be metrologically compatible if the ζ-value of the difference between the results is less than or equal to a chosen threshold (usually 2.0). A set of metrologically comparable results is metrologically compatible if all of the distinct pairs of results are metrologically compatible. The concept of metrological compatibility easily extends to compatibility of a set of results with a reference result or a combined result. Metrological compatibility does not require complete knowledge of the pdfs represented by the results of measurement.
Often multiple evaluations of the same measurand must be combined to determine a combined result. For a combined result to be legitimate, it should be metrologically compatible with the contributing results of measurement. When the results are metrologically incompatible with the combined result, we propose that the measured values be retained and that each of the standard uncertainties be enlarged just enough to make the results compatible with the combined result. The results can then be combined in accordance with the GUM. This approach has been found to be useful in many practical applications.
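One simple way to carry out such an enlargement is sketched below in Python. It is an illustration under stated assumptions and not necessarily the procedure used to obtain the adjusted uncertainties in table 1: the results are taken to be uncorrelated, the combined result is the arithmetic average, and the uncertainties are visited one at a time in the order given. Under these assumptions u²(xi − xAvg) = ((n − 1)/n)² u²(xi) + (1/n²) Σj≠i u²(xj), so the smallest u(xi) that brings the ζ-value down to the threshold can be solved for directly. The function name and the data are hypothetical.

```python
import math

def enlarge_uncertainties(values, uncerts, threshold=2.0):
    """Enlarge u_i, one result at a time, until every zeta(x_i - x_avg) <= threshold.

    Assumptions: uncorrelated results, arithmetic average as combined result,
    at least two results. The measured values are left unchanged.
    """
    n = len(values)
    avg = sum(values) / n
    u = list(uncerts)
    for i, x in enumerate(values):
        # Contribution of the other results to u^2(x_i - x_avg).
        others = sum(uj * uj for j, uj in enumerate(u) if j != i) / n**2
        needed = ((x - avg) / threshold) ** 2  # required u^2(x_i - x_avg)
        current = ((n - 1) / n) ** 2 * u[i] ** 2 + others
        if current < needed:
            u[i] = math.sqrt(needed - others) * n / (n - 1)
    return u

# Hypothetical data, not taken from table 1.
values = [10.0, 10.2, 10.1, 11.0]
uncerts = [0.1, 0.1, 0.1, 0.3]
new_uncerts = enlarge_uncertainties(values, uncerts)
print([round(v, 3) for v in new_uncerts])
```

Enlarging one uncertainty also enlarges the uncertainty of every other difference, so a single pass is sufficient for compatibility. The outcome depends on the order of the pass, however, and a different scheme may yield smaller overall enlargements.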
Acknowledgments
We thank Javier Bernal, Tyler Estler and two anonymous referees for their comments on earlier drafts of this paper.
Biography
About the authors: Raghu Kacker is a researcher in the Applied and Computational Mathematics Division (ACMD) of the Information Technology Laboratory (ITL) of the National Institute of Standards and Technology (NIST). His current interests include software testing and evaluation of the uncertainty in outputs of computational models and physical measurements. He has co-authored over 100 refereed papers. He has a Ph.D. in statistics. He is a Fellow of the American Statistical Association and a Fellow of the American Society for Quality. Rüdiger Kessel was a guest researcher in the Applied and Computational Mathematics Division (ACMD) of the Information Technology Laboratory (ITL) of the National Institute of Standards and Technology (NIST). He is an electronic and data systems engineer and currently a researcher at the Physikalisch-Technische Bundesanstalt in Germany. He has a Ph.D. in sciences from the Analytical Chemistry Department of the University of Antwerp, Belgium, and he is the developer of a standard software tool for evaluating uncertainty of measurement. His current interests include evaluation of uncertainty in physical and chemical measurements, modelling of measurements, and software development. The National Institute of Standards and Technology is an agency of the U.S. Department of Commerce.
7. Appendix
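Since $\delta X_C = \sum_i a_i\,\delta X_i$, we have expected value $E(\delta X_C) = 0$ and variance $V(\delta X_C) = u^2(x_C) = \sum_i a_i^2 u^2(x_i) + 2\sum_{i<j} a_i a_j\,u(x_i, x_j)$. Thus $u^2(x_i - x_C) = u^2(x_i) + u^2(x_C) - 2\,u(x_i, x_C)$, where the third term is covariance. Therefore $u^2(x_i - x_C) = u^2(x_i) + u^2(x_C) - 2\sum_j a_j\,u(x_i, x_j)$, where $u(x_i, x_i) = u^2(x_i)$.
If $a_i = 1/n$, for $i = 1, 2, \ldots, n$, then $u^2(x_i - x_C) = u^2(x_i) + u^2(x_C) - (2/n)\sum_j u(x_i, x_j)$.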
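The identity for the variance of the difference between one result and a linear combination of all results can be checked numerically. The Python sketch below compares the direct quadratic-form calculation with the expression "variance of the result, plus variance of the combination, minus twice the weighted sum of covariances". The covariance matrix, the weights, and the function names are hypothetical.

```python
def var_linear(coeffs, cov):
    """Variance of sum_i c_i X_i for a covariance matrix cov."""
    n = len(coeffs)
    return sum(coeffs[i] * coeffs[j] * cov[i][j]
               for i in range(n) for j in range(n))

def var_diff_formula(i, a, cov):
    """u^2(x_i) + u^2(x_C) - 2 * sum_j a_j u(x_i, x_j)."""
    n = len(a)
    return (cov[i][i] + var_linear(a, cov)
            - 2.0 * sum(a[j] * cov[i][j] for j in range(n)))

# Hypothetical covariance matrix and weights that sum to 1.
cov = [[0.04, 0.01, 0.00],
       [0.01, 0.09, 0.02],
       [0.00, 0.02, 0.16]]
a = [0.5, 0.3, 0.2]

checks = []
for i in range(3):
    # Coefficients of X_i - sum_j a_j X_j written as one linear combination.
    direct = var_linear([(1.0 if j == i else 0.0) - a[j] for j in range(3)], cov)
    checks.append((direct, var_diff_formula(i, a, cov)))
    print(i, direct, checks[-1][1])
```

The two columns agree to rounding error for each result, for any symmetric covariance matrix and any choice of weights.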
Footnotes
One version defines the En-value as En = (xi − xR)/√[(2si)² + (2sR)²], where xi and xR are regarded as realizations of random variables with sampling pdfs and si and sR are the estimated standard deviations of those sampling pdfs. Thus En-values are realizations of random variables with sampling pdfs. Some metrologists substitute the expanded uncertainties for 2si and 2sR in the denominator of the En-value. This is an inappropriate use of the expanded uncertainties.
Reference [8] is the final report of the CIPM international key comparison CCQM-K2. In this paper, we have used data from [8] to illustrate calculation of a combined result. We do not address data analysis of a key comparison to determine the key comparison reference value (KCRV) and the degrees of equivalence (DOE).
Contributor Information
Rüdiger Kessel, Email: ruediger.kessel@nist.gov.
Raghu N. Kacker, Email: raghu.kacker@nist.gov.
Klaus-Dieter Sommer, Email: klaus-dieter.sommer@ptb.de.
8. References
- [1]. ISO. Guide to the Expression of Uncertainty in Measurement (GUM). 2nd ed. Geneva: International Organization for Standardization; 1995.
- [2]. BIPM/JCGM. International Vocabulary of Metrology: Basic and general concepts and associated terms. 3rd ed. Sèvres: Bureau International des Poids et Mesures, Joint Committee for Guides in Metrology; 2008. http://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2008.pdf.
- [3]. Kacker RN, Forbes AB, Kessel R, Sommer K. Classical and Bayesian interpretation of the Birge test of consistency and its generalized version for correlated results from interlaboratory evaluations. Metrologia. 2008;45:257–264.
- [4]. Kacker RN, Kessel R, Sommer K. Assessing differences between results determined according to the Guide to the Expression of Uncertainty in Measurement. J Res Natl Inst Stand Technol. 2010;115:453–459. doi: 10.6028/jres.115.031.
- [5]. Kessel R, Berglund M, Wellum R. Application of consistency checking to evaluation of uncertainty in multiple replicate measurements. Accreditation and Quality Assurance. 2008;13:293–298.
- [6]. Wellum R, Verbruggen A, Kessel R. A new evaluation of the half-life of ²⁴¹Pu. J Analytical Atomic Spectrometry. 2009;24:801–807.
- [7]. Kacker RN, Datla RU, Parr AC. Statistical analysis of CIPM key comparisons based on the ISO Guide. Metrologia. 2004;41:340–352.
- [8]. Papadakis I, Taylor PDP, De Bièvre P. CCQM-K2 key comparison: cadmium and lead content in natural water. Metrologia. 2001;38:543–547.




