Journal of Research of the National Institute of Standards and Technology
. 2010 Dec 1;115(6):453–459. doi: 10.6028/jres.115.031

Assessing Differences Between Results Determined According to the Guide to the Expression of Uncertainty in Measurement

Raghu N Kacker 1, Rüdiger Kessel 1, Klaus-Dieter Sommer 2
PMCID: PMC4548867  PMID: 27134797

Abstract

In some metrology applications, multiple results of measurement for a common measurand are obtained and it is necessary to determine whether the results agree with each other. A result of measurement based on the Guide to the Expression of Uncertainty in Measurement (GUM) consists of a measured value together with its associated standard uncertainty. In the GUM, the measured value is regarded as the expected value and the standard uncertainty is regarded as the standard deviation, both known values, of a state-of-knowledge probability distribution. A state-of-knowledge distribution represented by a result need not be completely known. How, then, can one assess the differences between the results based on the GUM? Metrologists have for many years used the Birge chi-square test as 'a rule of thumb' to assess the differences between two or more measured values for the same measurand by pretending that the standard uncertainties were the standard deviations of the presumed sampling probability distributions from random variation of the measured values. We point out that this is a misuse of the standard uncertainties; the Birge test and the concept of statistical consistency motivated by it do not apply to the results of measurement based on the GUM. In 2008, the International Vocabulary of Metrology, third edition (VIM3), introduced the concept of metrological compatibility. We propose that the concept of metrological compatibility be used to assess the differences between results based on the GUM for the same measurand. A test of the metrological compatibility of two results of measurement does not conflict with a pairwise Birge test of the statistical consistency of the corresponding measured values.

Keywords: Birge test, interlaboratory evaluations, predictive p-value, uncertainty

1. Introduction

To test the proficiency of individual laboratories in conducting specific tasks, interlaboratory comparisons (ILC) are often used. In an ILC between measurement laboratories, the task is generally the measurement of a common artifact or of fractions of the same sample of material. To develop a certified reference material, a well-characterized material is measured by two or more methods in one or more laboratories. In both cases the data consist of multiple results of measurement (measured values with associated uncertainties) of a common measurand. To assess the differences between two or more measured values for the same measurand, metrologists have for many years used a test proposed by the physicist Raymond T. Birge in 1932 [1]. Birge introduced the term consistency for lack of significant differences between measured values. The Birge test is based on treating the measured values as realizations of random draws from sampling probability density functions (pdfs). A sampling pdf models possible outcomes for measured values in contemplated replications of the measurement procedure in the same conditions. Therefore, the consistency of measured values assessed by the Birge test is statistical consistency. The Birge test applies to uncorrelated measured values only. In Sec. 2, we review a concept of statistical consistency motivated by the Birge test. The idea of statistical consistency belongs to the period when the error analysis view of measurements was prevalent. The error analysis view of measurements was a hindrance to communicating the results of measurement and to advancing the science and technology of measurement. Therefore, leading authorities in the field of metrology developed the Guide to the Expression of Uncertainty in Measurement (GUM) [2]. According to the GUM, a result of measurement consists of a measured value together with its associated standard uncertainty.
In the GUM, the measured value is regarded as the expected value and the standard uncertainty is regarded as the standard deviation, both known values, of a state-of-knowledge probability distribution. A state-of-knowledge distribution represented by a result of measurement need not be completely known. We note in Sec. 3 that the Birge test and the concept of statistical consistency motivated by it are not applicable to the results of measurement based on the GUM. How, then, can one assess the differences between results based on the GUM for the same measurand? In 2008, the International Vocabulary of Metrology, third edition (VIM3) [3], introduced the concept of metrological compatibility of two or more results of measurement determined according to the GUM. In Sec. 4, we review the VIM3 concept of metrological compatibility and propose that this concept be used to assess the differences between multiple results based on the GUM for the same measurand. In Sec. 5, we show that a test of the metrological compatibility of two results of measurement does not conflict with a pairwise Birge test of the statistical consistency of the corresponding measured values.

2. The Birge Test and Concept of Statistical Consistency

Suppose x1, …, xn are n measured values for a common measurand which is believed to be sufficiently stable. The Birge test is based on regarding the measured values x1, …, xn as realizations of random draws from their presumed sampling pdfs. A sampling pdf models possible outcomes in contemplated replications of a measurement procedure subject to random effects in the same conditions. Therefore, the consistency (lack of significant differences between measured values) assessed by the Birge test is statistical consistency. The Birge test is applicable when the sampling pdfs of the measured values x1, …, xn are uncorrelated. The Birge test requires knowledge of the variances σ1², …, σn² of the sampling pdfs of x1, …, xn, respectively. Statistical consistency of the measured values x1, …, xn means that their expected values are indistinguishable¹ in view of the corresponding variances. Specifically, the Birge test checks whether the measured values x1, …, xn may be modeled as realizations from normal (Gaussian) sampling pdfs with unknown but equal expected values and known variances σ1², …, σn². Birge proposed that to check the consistency of the measured values x1, …, xn, one can calculate the test statistic

R^2 = \sum_{i=1}^{n} w_i (x_i - x_W)^2 / (n - 1),   (1)

where wi = 1/σi², for i = 1, 2, …, n, and xW = Σi wi xi / Σi wi is the weighted mean of x1, …, xn. If the calculated value of R² is substantially larger than one, then the dispersion of x1, …, xn is greater than what can be expected from the normal pdfs with equal expected values and known variances σ1², …, σn². In that case the measured values x1, …, xn can be declared to be statistically inconsistent.
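The Birge ratio of (1) is straightforward to compute. The following is a minimal sketch; the measured values and sampling variances in the example are hypothetical.

```python
def birge_ratio(values, variances):
    """Birge test statistic R^2 of Eq. (1) for uncorrelated measured
    values x_1, ..., x_n with known sampling variances sigma_i^2."""
    weights = [1.0 / v for v in variances]                 # w_i = 1 / sigma_i^2
    x_w = sum(w * x for w, x in zip(weights, values)) / sum(weights)  # weighted mean
    n = len(values)
    return sum(w * (x - x_w) ** 2 for w, x in zip(weights, values)) / (n - 1)

# Hypothetical measured values and sampling variances for one measurand
x = [10.1, 10.4, 9.8, 10.6, 10.2]
var = [0.04, 0.09, 0.04, 0.16, 0.09]
print(birge_ratio(x, var))  # a value substantially larger than 1 suggests inconsistency
```

For identical measured values the statistic is zero; values dispersed well beyond the stated variances drive it above one.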

Statistical interpretation of the Birge test

Birge was a physicist, and he proposed his test independently of, and before, much of the statistical theory as it is known today. However, the Birge test of consistency can now be interpreted as a classical (sampling-theory) statistical test of hypothesis. The measured values x1, …, xn are presumed to have normal sampling pdfs with unknown but equal expected values and variance-covariance matrix τ² × Diag[σ1², …, σn²], where τ² is an unknown parameter and σ1², …, σn² are known. The null hypothesis H0 is that τ² ≤ 1 and the alternative hypothesis H1 is that τ² > 1. The null hypothesis H0 means that the variances of x1, …, xn are not greater than σ1², …, σn², respectively. The alternative hypothesis H1 means that the variances of x1, …, xn are greater than σ1², …, σn² [4]. The classical p-value pC is the maximum probability under the null hypothesis of realizing, in contemplated replications of the n measurements, a value of the test statistic more extreme than its realized (calculated) value. The classical p-value of a realization of (n − 1)R² is

p_C = \Pr\{\chi^2(n-1) \ge (n-1) R^2\},   (2)

where χ²(n−1) denotes a variable with the chi-square probability distribution with (n − 1) degrees of freedom [4]. If the classical p-value pC is too small, say, less than 0.05, then the null hypothesis is rejected with level of significance 0.05 or less. A rejection of the null hypothesis means that the dispersion of the measured values x1, …, xn is greater than what can be expected from normal distributions for x1, …, xn with equal expected values and stated variances σ1², …, σn², respectively. The dispersion of x1, …, xn can be greater than expected under the null hypothesis because either the variances of x1, …, xn are greater than σ1², …, σn² or their expected values are not equal. If the stated variances σ1², …, σn² are not questionable, then the assumption that the expected values of x1, …, xn are equal appears to be unreasonable. In that case, the measured values x1, …, xn can be declared to be statistically inconsistent.
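The p-value of (2) requires only the chi-square survival function. As a sketch that avoids external libraries, the closed-form series valid for even degrees of freedom can be used; the realized Birge ratio in the example is hypothetical.

```python
import math

def chi2_sf_even_df(x, k):
    """Pr{chi-square with k df >= x} for even k, via the closed form
    exp(-x/2) * sum_{j < k/2} (x/2)^j / j!  (use a statistics library
    such as scipy.stats.chi2.sf for general k)."""
    assert k > 0 and k % 2 == 0
    return math.exp(-x / 2) * sum((x / 2) ** j / math.factorial(j) for j in range(k // 2))

# Classical p-value of Eq. (2) for n = 5 measured values (df = n - 1 = 4)
n, r2 = 5, 1.23                          # hypothetical realized Birge ratio
p_c = chi2_sf_even_df((n - 1) * r2, n - 1)
print(p_c)  # well above 0.05 here, so the null hypothesis is not rejected
```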

Limitations of the Birge test

A limitation of the Birge test is that it is applicable to uncorrelated measured values x1, …, xn only. However, it can be easily generalized to correlated measured values x1, …, xn whose covariances σ12, …, σ(n−1)n are known [4]. The Birge test suggests the following notion of the statistical consistency of the measured values x1, …, xn: The measured values x = (x1, …, xn)t are said to be statistically consistent if their dispersion is not greater than what can be expected from the normal consistency model, which postulates that the joint n-variate sampling pdf of x is normal N(1μ, D) with unknown expected value 1μ and variance-covariance matrix D = [σij], where 1 = (1, …, 1)t, σij is the covariance between xi and xj, and σii = σi² for i, j = 1, 2, …, n [4].

Another limitation of the Birge test (and of its generalized version for correlated measured values) is that it is a one-sided test of hypothesis, which checks whether the dispersion of x1, …, xn is more than what can be expected from a normal consistency model. A review of the Birge test in [5] notes that if the realized value of the Birge test statistic R² is substantially less than one, then the stated variances σ1², …, σn² may well be too large. To avoid declarations of statistical consistency from overstated variances, the following definition of statistical consistency was proposed in [6].

Definition of statistical consistency

The measured values x = (x1, …, xn)t are said to be statistically consistent if they reasonably fit the normal consistency model which postulates that the joint n-variate sampling pdf of x is normal N(1μ, D) with unknown expected value 1μ and variance-covariance matrix D = [σij].

This definition requires a different approach for testing statistical consistency than the Birge test and its generalized version for correlated values. A modern method to assess the fit of a statistical model to the data is Bayesian posterior predictive checking [6]. Posterior predictive checking is a Bayesian adaptation of classical (sampling-theory) statistical hypothesis testing. A function of the data (and possibly unknown parameters), called a 'discrepancy measure', is defined to characterize a potential discrepancy between the statistical model and the data. The posterior predictive p-value pP of a discrepancy measure T(x) is the probability of realizing in contemplated replications a value of the discrepancy measure more extreme than its realized value. If the posterior predictive p-value is close to zero (or to one), then the fit of the statistical model to the data is suspect.

When the measured values x1, …, xn are uncorrelated, the statistic Tc(x) = (n − 1)R² = Σi wi (xi − xW)² is a useful discrepancy measure to check the overall fit of the normal consistency model N(1μ, D) to the measured values x1, …, xn. As discussed in [6, Sec. 2.4], the posterior predictive p-value of the realized discrepancy measure Tc(x) = (n − 1)R² is

p_P = \Pr\{\chi^2(n-1) \ge (n-1) R^2\},   (3)

We note that (3) is identical to the classical p-value pC given in (2). Thus Bayesian posterior predictive checking of the discrepancy measure Tc(x) = (n − 1)R² is equivalent to the Birge test of statistical consistency.

Bayesian posterior predictive checking can be used to investigate any number of potential discrepancies between the statistical model and the data. To assess the difference between two particular measured values xi and xj, the statistic Ti−j(x) = |xi − xj| is a useful discrepancy measure, for i, j = 1, 2, …, n and i ≠ j. The Bayesian posterior predictive p-value of the realized discrepancy measure |xi − xj| is

p_P = \Pr\left\{ Z \ge \frac{|x_i - x_j|}{\sqrt{\sigma_i^2 + \sigma_j^2 - 2\rho_{ij}\sigma_i\sigma_j}} \right\},   (4)

where ρij is the correlation coefficient between the presumed normal sampling pdfs of xi and xj; the covariance between xi and xj is σij = ρij σi σj, and Z denotes a variable with standard normal distribution N(0, 1) [6, Sec. 3.2]. A posterior predictive p-value pP close to zero suggests that the difference between xi and xj is larger than what can be expected from the normal statistical consistency model N(1μ, D). That is, the measured values xi and xj do not seem to have the same expected value and hence they are not mutually statistically consistent.
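A minimal sketch of (4), using the identity Pr{Z ≥ d} = erfc(d/√2)/2 for a standard normal variable; the measured values and standard deviations in the example are hypothetical.

```python
import math

def pairwise_pp_pvalue(xi, xj, si, sj, rho=0.0):
    """Posterior predictive p-value of Eq. (4) for the discrepancy |xi - xj|
    under the normal consistency model; Pr{Z >= d} = erfc(d / sqrt(2)) / 2."""
    sd_diff = math.sqrt(si ** 2 + sj ** 2 - 2 * rho * si * sj)  # std. dev. of xi - xj
    d = abs(xi - xj) / sd_diff
    return 0.5 * math.erfc(d / math.sqrt(2))

print(pairwise_pp_pvalue(10.1, 10.6, 0.2, 0.4))  # uncorrelated case (rho = 0)
```

For identical measured values the p-value is 0.5; a value near zero flags a pair whose difference exceeds what the model can explain.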

3. Concept of Statistical Consistency Does Not Apply to Results Based on the GUM

A result of measurement determined according to the GUM consists of a measured value together with its associated standard uncertainty. Suppose [x1, u(x1)], …, [xn, u(xn)] are n results of measurement for a common measurand, where x1, …, xn are the measured values and u(x1), …, u(xn) are the corresponding standard uncertainties. According to the GUM, a measured value xi and its associated standard uncertainty u(xi) represent a state-of-knowledge pdf attributed to the measurand, for i = 1, 2, …, n. Following the GUM, we use the symbol Xi for a quantity as well as for a variable with a state-of-knowledge pdf about the quantity Xi represented by the result [xi, u(xi)], for i = 1, 2, …, n. The measured value xi is regarded as the expected value E(Xi) and the standard uncertainty u(xi) is regarded as the standard deviation S(Xi) of the pdf of Xi, for i = 1, 2, …, n. The mainstream GUM requires knowledge of only the expected value E(Xi) and the standard deviation S(Xi) of a state-of-knowledge pdf of Xi. The GUM does not require that the state-of-knowledge pdf of Xi be completely known. When the state-of-knowledge pdfs of X1, …, Xn are correlated, the correlation coefficients are assumed to be known. Following the GUM we denote the correlation coefficient R(Xi, Xj) between the state-of-knowledge pdfs of Xi and Xj by the symbol r(xi, xj). Note that {x1, …, xn}, {u(x1), …, u(xn)}, and {r(x1, x2), …, r(x(n–1), xn)} are symbols for known values.

For many years, metrologists have used the Birge test as 'a rule of thumb' to assess the consistency of the measured values by treating the squared standard uncertainties u²(x1), …, u²(xn) as the known variances σ1², …, σn² of the presumed normal (Gaussian) sampling pdfs of the measured values x1, …, xn; see, for example, [8]. The guideline for the analysis of key comparisons developed by the BIPM Director’s Advisory Group on Uncertainties recommends the use of the Birge chi-square test to assess the consistency of measured values by treating the squared standard uncertainties as the known variances of the presumed sampling pdfs of the measured values [9]. The consistency of the measured values from CIPM key comparisons and supplementary comparisons is almost always assessed using the Birge test [10].

The squared standard uncertainties u²(x1), …, u²(xn) cannot in any logical sense be identified with the known variances σ1², …, σn² of the presumed normal (Gaussian) sampling pdfs of the measured values x1, …, xn. The standard deviation of a sampling pdf represents possible dispersion from random variation in contemplated replications of the measurement procedures. A standard uncertainty, by contrast, expresses the dispersion of a state-of-knowledge pdf which could be attributed to the measurand based on all available statistical and non-statistical information; it includes all significant components, whether arising from random effects or from corrections applied for systematic effects. In measurements done in high-echelon laboratories, the component of uncertainty arising from random effects is generally a very small part of the combined standard uncertainty. Treating the squared standard uncertainties u²(x1), …, u²(xn) determined according to the GUM as the known variances σ1², …, σn² from random variation (in contemplated replications of the measurements) is a misuse of the standard uncertainties. Also, as noted earlier, the state-of-knowledge pdfs represented by the results [x1, u(x1)], …, [xn, u(xn)] may not be completely known. Therefore the Birge test and the concept of statistical consistency motivated by it do not apply to the results of measurement determined according to the GUM.

4. VIM3 Concept of Metrological Compatibility Applies to Results Based on the GUM

A measured quantity value [3, definitions 1.19 and 2.10] is the product of a numerical value and a measurement unit. The measurement unit implies that the measured value is traceable to a reference for that measurement unit. A result of measurement (measured value together with its associated standard uncertainty) is traceable to a reference only if the result can be related to a practical realization of that reference through a documented unbroken chain of calibrations, each contributing to the measurement uncertainty [3, definition 2.41]. Two or more results of measurement are metrologically comparable only if they are traceable to the same reference [3, definition 2.46]. Metrological comparability does not imply that the measured values have similar magnitudes. Thus, for example, the distance between my apartment and my office expressed in meters is metrologically comparable to the distance between my apartment and the moon, also expressed in meters. The concept of metrological compatibility discussed below applies only to those results of measurement for a common measurand which are metrologically comparable. That is, the results must be traceable to the same reference.

The concept of statistical consistency can be applied to any set of numerical values which have similar magnitudes. They do not have to be measured values. Thus, for example, one can test statistical consistency of deviations (or relative deviations expressed as percentage) from a benchmark value. Although a metrologist is expected to assess consistency of only those measured values which have the same measurement unit, it is not a requirement of statistical consistency.

All n results [x1, u(x1)], …, [xn, u(xn)] for a common measurand must be traceable to the same reference for them to be metrologically comparable [3, definition 2.46]. The VIM3 concept of metrological compatibility is defined for two results of measurement at a time. The following definition is an elaboration of the succinct definition given in VIM3 [3, definition 2.47].

Definition of metrological compatibility

Two metrologically comparable results [x1, u(x1)] and [x2, u(x2)] for the same measurand are said to be metrologically compatible if

\zeta(x_1 - x_2) = \frac{|x_1 - x_2|}{\sqrt{u^2(x_1) + u^2(x_2) - 2 r(x_1, x_2)\, u(x_1)\, u(x_2)}} \le \kappa,   (5)

for a specified threshold κ, where r(x1, x2) is a symbol for the correlation coefficient R(X1, X2) between the variables X1 and X2. The quantity in the denominator of (5) is the standard deviation of the state-of-knowledge pdf for X1 − X2, which may be incompletely determined. When the pdfs represented by [x1, u(x1)] and [x2, u(x2)] are uncorrelated, then R(X1, X2) = 0 and (5) reduces to

\zeta(x_1 - x_2) = \frac{|x_1 - x_2|}{\sqrt{u^2(x_1) + u^2(x_2)}} \le \kappa.   (6)

A set of metrologically comparable results [x1, u(x1)], [x2, u(x2)], …, [xn, u(xn)] for the same measurand is said to be metrologically compatible if for every one of the n(n − 1)/2 pairs of results [xi, u(xi)] and [xj, u(xj)] we have

\zeta(x_i - x_j) = \frac{|x_i - x_j|}{\sqrt{u^2(x_i) + u^2(x_j) - 2 r(x_i, x_j)\, u(x_i)\, u(x_j)}} \le \kappa,   (7)

for a specified threshold κ [3, definition 2.47]. The VIM3 does not discuss how the threshold κ should be determined. A conventional value of κ is two.

The concept of metrological compatibility can be used to assess the differences between the results of measurement based on the GUM for the same measurand. The concepts of metrological comparability and compatibility do not require that the state-of-knowledge pdfs represented by the results [x1, u(x1)], [x2, u(x2)], …, [xn, u(xn)] be completely known. Thus they fit the GUM. When the set of results [x1, u(x1)], …, [xn, u(xn)] is metrologically compatible, we can say that the differences between the measured values x1, …, xn are insignificant in view of the uncertainties u(x1), …, u(xn).
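As a sketch, a set of uncorrelated GUM results can be screened for metrological compatibility by checking (7) for all pairs; the results and the conventional threshold κ = 2 in the example are illustrative.

```python
import math
from itertools import combinations

def zeta(x1, u1, x2, u2, r=0.0):
    """Normalized absolute difference of Eq. (5) for two results (x, u(x))."""
    return abs(x1 - x2) / math.sqrt(u1 ** 2 + u2 ** 2 - 2 * r * u1 * u2)

def metrologically_compatible(results, kappa=2.0):
    """Check Eq. (7) for every one of the n(n-1)/2 pairs of uncorrelated results."""
    return all(zeta(xi, ui, xj, uj) <= kappa
               for (xi, ui), (xj, uj) in combinations(results, 2))

results = [(10.1, 0.2), (10.4, 0.3), (9.8, 0.2)]   # hypothetical (x_i, u(x_i)) pairs
print(metrologically_compatible(results))          # True for these numbers
```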

To assess metrological compatibility of results based on the GUM using the criteria (5), (6), or (7), the threshold κ needs to be specified. A proper choice of κ is to a large extent a matter of agreement, because it requires accepting the economic consequences of that choice. Although a conventional value of κ is two, depending on the application the interested parties could agree on a different value for κ. Once the value of the threshold κ is set, the conclusion of a test of metrological compatibility based on the VIM3 definition is dichotomous: either a set of results is metrologically compatible or it is incompatible. The concept of metrological compatibility is being used by metrologists who are familiar with it; see, for example, [11, 12].

The VIM3 definition of metrological compatibility can be easily extended to metrological compatibility of a set of results and a reference result [xR, u(xR)], where xR is the reference value with standard uncertainty u(xR). Suppose the pdfs represented by the measurement results are uncorrelated with the pdf represented by the reference result. A set of results [x1, u(x1)], …, [xn, u(xn)] metrologically comparable with a reference result [xR, u(xR)] is compatible if

\zeta(x_i - x_R) = \frac{|x_i - x_R|}{\sqrt{u^2(x_i) + u^2(x_R)}} \le \kappa,   (8)

for i = 1, 2, …, n [13]. Similarly, a set of results [x1, u(x1)], …, [xn, u(xn)] metrologically comparable with a combined result [xC, u(xC)], where xC is the combined value (such as an arithmetic mean or a weighted mean) with standard uncertainty u(xC), is compatible if

\zeta(x_i - x_C) = \frac{|x_i - x_C|}{\sqrt{u^2(x_i) + u^2(x_C) - 2 r(x_i, x_C)\, u(x_i)\, u(x_C)}} \le \kappa,   (9)

where r(xi, xC) denotes the correlation coefficient between the pdfs represented by [xi, u(xi)] and [xC, u(xC)], for i = 1, 2, …, n [13].
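The extension (8)–(9) can be sketched in the same way. The simplification below uses a single correlation coefficient for all results (r = 0 gives (8)); the reference value and uncertainties are hypothetical.

```python
import math

def compatible_with_reference(results, x_ref, u_ref, kappa=2.0, r=0.0):
    """Eq. (8) (with r = 0), or Eq. (9) with a common correlation r:
    compatibility of each result (x_i, u(x_i)) with a reference or
    combined result (x_ref, u_ref)."""
    return [abs(x - x_ref) / math.sqrt(u ** 2 + u_ref ** 2 - 2 * r * u * u_ref) <= kappa
            for x, u in results]

print(compatible_with_reference([(10.1, 0.2), (11.2, 0.3)], 10.0, 0.1))  # [True, False]
```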

5. Concluding Remarks

For many years, metrologists have used the Birge chi-square test as 'a rule of thumb' to assess the differences between two or more measured values for the same measurand by pretending that the squared standard uncertainties were the known variances of the presumed normal sampling pdfs of the measured values. This is a misuse of the standard uncertainties based on the GUM. The Birge test and the concept of statistical consistency do not apply to the results of measurement based on the GUM. As discussed in this paper, the VIM3 concept of metrological compatibility can be used to assess the differences between the results of measurement determined according to the GUM. Thus metrologists can start using the VIM3 concept of metrological compatibility in place of the Birge test to assess the differences between multiple results of measurement of the same measurand.

The following is a pertinent question: could the conclusions (about mutual agreement of results) based on the VIM3 concept of metrological compatibility and the Birge test (based on treating squared standard uncertainties as the known variances of sampling pdfs of measured values) differ? It is difficult to directly compare the Birge test and a test of metrological compatibility, because the former is defined for an arbitrary positive integer n > 1 and the latter is defined for only two results at a time. For pairwise comparisons (n = 2), the Birge test statistic R² = Σi wi (xi − xW)²/(n − 1) reduces to

R^2 = \frac{(x_1 - x_2)^2}{\sigma_1^2 + \sigma_2^2},   (10)

which is the square of (x1 − x2)/√(σ1² + σ2²). Under the null hypothesis that the presumed normal sampling pdfs of x1 and x2 have the same expected value, the distribution of (x1 − x2)/√(σ1² + σ2²) is normal N(0, 1). Therefore, when n = 2, the normal distribution can be used to assess the absolute difference |x1 − x2|. The square of a normal N(0, 1) variable has the chi-square distribution χ²(1) with one degree of freedom. Therefore the square of the (1 − α/2) × 100-th percentile z[1−α/2] of the normal N(0, 1) distribution is equal to the (1 − α) × 100-th percentile χ²(1)[1−α] of the χ²(1) distribution. Thus the realized value of (10) being less than χ²(1)[1−α] is equivalent to the ratio |x1 − x2|/√(σ1² + σ2²) being less than z[1−α/2]. It follows that a declaration of Birge statistical consistency, when the classical p-value pC of the Birge test (2) is not less than 0.05 (for example), is equivalent to the realization that

\frac{|x_1 - x_2|}{\sqrt{\sigma_1^2 + \sigma_2^2}} \le z_{[0.975]} = 1.96 \approx 2.   (11)

We note from (6) and (11) that if the threshold κ for metrological compatibility is set as κ = 2, then the conclusion of a check of metrological compatibility between a pair of results [x1, u(x1)] and [x2, u(x2)] would be identical to the assessment of statistical consistency between x1 and x2 based on the Birge test by (wrongly) treating u²(x1) and u²(x2) as σ1² and σ2², respectively (and treating the correlation coefficient R(X1, X2) as ρ12, which is zero in the Birge test). Therefore a pairwise Birge test of statistical consistency and a test of metrological compatibility do not conflict.
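The equivalence can be verified numerically: for any pair, the pairwise Birge ratio of (10) is exactly the square of the compatibility ratio of (6), so the two decisions coincide when κ is set at z[0.975] = 1.96. The pair of results below is hypothetical, with u²(x) standing in for σ².

```python
import math

x1, x2 = 10.1, 10.6        # hypothetical measured values
u1, u2 = 0.2, 0.3          # u(x) here plays the role of sigma in the Birge test
zeta_12 = abs(x1 - x2) / math.sqrt(u1 ** 2 + u2 ** 2)   # ratio of Eq. (6)
r2 = (x1 - x2) ** 2 / (u1 ** 2 + u2 ** 2)               # Birge ratio of Eq. (10)
assert math.isclose(r2, zeta_12 ** 2)                   # (10) is the square of (6)
# With kappa = z[0.975] = 1.96, the two dichotomous conclusions agree:
print(zeta_12 <= 1.96, r2 <= 1.96 ** 2)
```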

Acknowledgments

We thank Javier Bernal, Tyler Estler, Walter Liggett, and Raju Datla for their comments on earlier drafts of this paper.

Biography

About the Authors: Raghu N. Kacker is a mathematical statistician in the Information Technology Laboratory of the National Institute of Standards and Technology, Gaithersburg, MD 20899, USA.

Rüdiger Kessel is a guest researcher in the Information Technology Laboratory of the National Institute of Standards and Technology, Gaithersburg, MD 20899, USA.

Klaus-Dieter Sommer is director of the Chemical Physics and Explosion Protection Division of the National Metrology Institute of Germany, Physikalisch-Technische Bundesanstalt, D-38116 Braunschweig, Germany.

The National Institute of Standards and Technology is an agency of the U.S. Department of Commerce.

Footnotes

1

In statistical literature the term consistency is applied to a statistical estimator. A point statistical estimator is said to be consistent if it approaches the parameter being estimated as the sample size increases.

Contributor Information

Raghu N. Kacker, Email: raghu.kacker@nist.gov.

Rüdiger Kessel, Email: ruediger.kessel@nist.gov.

Klaus-Dieter Sommer, Email: klaus-dieter.sommer@ptb.de.

6. References

  • [1] Birge RT. The calculation of errors by the method of least squares. Physical Review. 1932;40:207–227.
  • [2] GUM. Guide to the Expression of Uncertainty in Measurement. 2nd ed. Geneva: International Organization for Standardization; 1995. The 2008 version is available at http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf.
  • [3] BIPM/JCGM. International Vocabulary of Metrology—Basic and general concepts and associated terms. 3rd ed. Sèvres: Bureau International des Poids et Mesures, Joint Committee for Guides in Metrology; 2008. Available at http://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2008.pdf.
  • [4] Kacker RN, Forbes AB, Kessel R, Sommer K. Classical and Bayesian interpretation of the Birge test of consistency and its generalized version for correlated results from interlaboratory evaluations. Metrologia. 2008;45:257–264.
  • [5] Taylor BN, Parker WH, Langenberg DN. Determination of e/h, Using Macroscopic Quantum Phase Coherence in Superconductors: Implications for Quantum Electrodynamics and the Fundamental Physical Constants. Reviews of Modern Physics. 1969;41:375–496.
  • [6] Kacker RN, Forbes AB, Kessel R, Sommer K. Bayesian posterior predictive p-value of statistical consistency in interlaboratory evaluations. Metrologia. 2008;45:512–523.
  • [7] Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2nd ed. Chapman & Hall; 2004.
  • [8] Mohr PJ, Taylor BN. CODATA recommended values of the fundamental physical constants: 1998. Reviews of Modern Physics. 2000;72:351–495. The current version is available at http://physics.nist.gov/cuu/Constants/index.html.
  • [9] Cox MG. The evaluation of key comparison data. Metrologia. 2002;39:589–595. These guidelines were developed by the BIPM Director’s Advisory Group on Uncertainties.
  • [10] The BIPM key comparison database. 2010. http://kcdb.bipm.org/
  • [11] Wellum R, Verbruggen A, Kessel R. A new evaluation of the half-life of 241Pu. Journal of Analytical Atomic Spectrometry. 2009;24:801–807.
  • [12] Datla RU, Kessel R, Smith AW, Kacker RN, Pollock DB. Uncertainty analysis of remote sensing optical sensor data: guiding principles to achieve metrological consistency. International Journal of Remote Sensing. 2010;31:867–880.
  • [13] Kessel R, Kacker RN, Sommer K. Proposal for combining results from multiple evaluations of the same measurand. 2009. Submitted for publication.
