Journal of Applied Statistics. 2021 Dec 16;50(5):1037–1059. doi: 10.1080/02664763.2021.2012563

Ultrastructural calibration model for proficiency testing

Reiko Aoki a, Dorival Leão b, Juan P Mamani Bustamante a, Filidor Vilca c
PMCID: PMC10101683  PMID: 37065622

Abstract

Proficiency testing (PT) determines the performance of individual laboratories for specific tests or measurements and is used to monitor the reliability of laboratories' measurements. PT plays a highly valuable role as it provides objective evidence of the competence of the participant laboratories. In this paper, we propose a multivariate calibration model to assess equivalence among laboratories' measurements in PT. Our method handles multivariate data, where the item under test is measured at different levels. Although intuitive, the proposed model is nonergodic, which means that the asymptotic Fisher information matrix is random. As a consequence, a detailed asymptotic analysis was carried out to establish the strategy for comparing the results of the participating laboratories. To illustrate, we apply our method to data from the PT program of the Brazilian engine test group, in which the power of an engine was measured by eight laboratories at several levels of rotation.

Keywords: Asymptotic theory, hypothesis testing, confidence region, ultrastructural model, measurement error model

1. Introduction

Comparative calibration models are typically used to compare different ways of measuring the same unknown quantity. The problem of comparing measurements may arise in different areas and contexts as can be seen in Refs. [2,3,7,14,21,23], for example. In this paper, we propose an ultrastructural calibration model motivated by the proficiency testing (PT) studies.

Proficiency studies are conducted to evaluate the equivalence of laboratories measurements. In these studies, a reference value of some measurand (the quantity to be measured) is determined and the results of all the laboratories are compared to this reference value. According to Refs. [6,12], accredited laboratories should assure the quality of test results by participating in PT programs. Various statistical techniques have been adopted to assess equivalence among laboratories measurements. These include classical statistical techniques such as paired t test, z-score, normalized error, repeated measures analysis of variance and Bland–Altman plot, see Refs. [1,10,12,15,19] and references therein.

As extensively discussed by ISO GUM [11], the measurement is only an estimate of the measurand and thus must be accompanied by its uncertainty. The approach to quantification of uncertainty of measurement is presented in Ref. [11]. As discussed by Gleser [8], the basic idea is to approximate a measurement equation $Y=g(Z_1,\ldots,Z_d)$, where $Y$ denotes the measurement, $g$ is a known function and $Z_1,\ldots,Z_d$ denote the $d$ input quantities, by a first-order Taylor series about the expected values of the $Z_i$. The standard combined uncertainty is defined as the standard deviation of the probability distribution of $Y$ based on this linear approximation. The expected value and the variance of each input quantity $Z_i$ may be based on measurements or any other information, such as the resolution of the measuring instrument. There are two types of uncertainty evaluation; in both, the uncertainty is expressed as a standard deviation. Type A is determined by the statistical analysis of a series of observations and type B is determined by other means, such as instrument specifications, correction factors or even data from additional experiments (see supplemental material for more detail).
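The first-order propagation described above can be illustrated with a minimal sketch. It assumes uncorrelated input quantities (an assumption made here for simplicity, not stated in the text), approximates the sensitivity coefficients by central differences, and uses a hypothetical measurement equation (power as torque times angular speed) with made-up input values.

```python
import math

def combined_standard_uncertainty(g, z, u, h=1e-6):
    """First-order (linearized) propagation for uncorrelated inputs.

    g : callable taking a list of input quantities
    z : list of input estimates (expected values of the Z_i)
    u : list of standard uncertainties of the Z_i
    """
    var = 0.0
    for i in range(len(z)):
        step = h * max(1.0, abs(z[i]))
        hi = list(z)
        lo = list(z)
        hi[i] += step
        lo[i] -= step
        c_i = (g(hi) - g(lo)) / (2.0 * step)  # sensitivity coefficient dg/dZ_i
        var += (c_i * u[i]) ** 2
    return math.sqrt(var)

# Hypothetical measurement equation and inputs (not taken from the paper)
def power(zs):
    torque, omega = zs
    return torque * omega

u_c = combined_standard_uncertainty(power, [100.0, 300.0], [0.5, 1.0])
```

Here the sensitivity coefficients are 300 and 100, so the combined variance is (300 × 0.5)² + (100 × 1.0)² = 32500.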

One critical point in most techniques is that the type B source of variability [11] is not considered. In this direction, Pinto et al. [17] extended Jaech's model [13] to encompass the type B source of variation and evaluated this model under elliptical distributions. Toman [22] proposed a Bayesian hierarchical model that incorporates the type B source of variation; see Ref. [16] for further developments of the Bayesian hierarchical model.

Despite the many techniques available to assess equivalence among laboratories' measurements, to the best of our knowledge these approaches treat the reference as a single value. In some proficiency studies, the item under testing is measured at different levels. As an illustration, consider the measurement system used to evaluate engine power. In this case, the power (torque times rotation) is measured at different levels of rotation and, as a result, we have one reference value for each level.

The first goal of this work is to propose an ultrastructural calibration model to assess equivalence among laboratories measurements considering that the item under testing is measured at different levels with the uncertainties described above.

Let $x_j$ represent the measurand with mean $\mu_{xj}$ and variance $\sigma_{xj}^2$ at the $j$th level. In general, the parameter $\sigma_{xj}^2$ is determined by one expert laboratory during the stability study, which is conducted to guarantee the stability of the item under testing. In our example, the GM power train developed one standard engine and, during the stability study, evaluated its natural variability ($\sigma_{xj}^2$).

One of the basic elements of every PT program is the evaluation of the performance of each participant. To do so, the PT provider has to determine a reference value, which can be obtained in two different ways: by employing a reference laboratory or by using a consensus value. We will use the reference laboratory strategy.

In order to compare the measurements of the participant laboratories with the measurements obtained by the reference laboratory, we assume that the reference laboratory measures the item without bias,

$$Y_{1jk}=x_j+\epsilon_{1jk},\quad j=1,\ldots,m,\ k=1,\ldots,n_1, \tag{1}$$

where $\epsilon_{1jk}$ represents the measurement error related to the reference laboratory. As $x_j$ and $\epsilon_{1jk}$ were determined by different laboratories with different methods, we assume that $x_j$ and $\epsilon_{1jk}$ are independent. As a consequence, $E[Y_{1jk}]=E[x_j]=\mu_{xj}$ and $\mathrm{Var}[Y_{1jk}]=\sigma_{xj}^2+\sigma_{1j}^2$.

Although the reported result of the engine power has been corrected for all recognized significant systematic effects, many problems can occur during the measurement process. For example, the participant laboratory may fail to conduct the test in accordance with the requirements of the measurement procedure, or its equipment may be in poor working condition.

In order to compare the measurement results between the participant laboratory and the reference laboratory, we assume that the measurement $Y_{ijk}$ of the $k$th replica of the measurement of the engine power at the $j$th engine rotation level obtained by the $i$th laboratory is given by

$$Y_{ijk}=\alpha_i+\beta_i x_j+\epsilon_{ijk},\quad k=1,\ldots,n_i,\ j=1,\ldots,m,\ i=2,\ldots,p, \tag{2}$$

where $\epsilon_{ijk}$ represents the measurement error, $\alpha_i$ is the additive bias and $\beta_i$ is the multiplicative bias of the measurements of the $i$th participant laboratory with respect to those of the reference laboratory. As $x_j$ and $\epsilon_{ijk}$ were determined by different laboratories with different methods, we assume that $x_j$ and $\epsilon_{ijk}$ are also independent. As a consequence, $E[Y_{ijk}]=\alpha_i+\beta_i\mu_{xj}$ and $\mathrm{Var}[Y_{ijk}]=\beta_i^2\sigma_{xj}^2+\sigma_{ij}^2$.

The variance $\sigma_{ij}^2$ is the combined variance calculated and provided by the $i$th laboratory following the procedure proposed by ISO GUM [11] (see Supplemental Material for more details).
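As a quick sanity check of models (1) and (2), the sketch below simulates a single level with one reference laboratory and one participant and compares empirical second moments with the values implied by the shared latent value $x_j$, in agreement with the matrices $\Sigma_{11j}$ and $\Sigma_{iij}$ given in Section 2. All numerical values, the function names and the data layout are hypothetical choices made for this illustration.

```python
import random

def simulate_level(mu_x, sd_x, sd_e, alpha, beta, n_rep, rng):
    """Draw all measurements at one level j under models (1)-(2).

    Index 0 is the reference laboratory (alpha[0] = 0, beta[0] = 1).
    Returns y[i][k], the k-th replicate of laboratory i.
    """
    x = rng.gauss(mu_x, sd_x)  # latent value shared by every laboratory
    return [[alpha[i] + beta[i] * x + rng.gauss(0.0, sd_e[i]) for _ in range(n_rep)]
            for i in range(len(alpha))]

def cov(u, v):
    """Sample covariance of two equally long lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

rng = random.Random(2021)
alpha, beta, sd_e = [0.0, 0.5], [1.0, 1.2], [0.3, 0.3]  # hypothetical values
draws = [simulate_level(30.0, 0.38, sd_e, alpha, beta, 2, rng) for _ in range(50_000)]
y_ref = [d[0][0] for d in draws]
y_lab_1 = [d[1][0] for d in draws]
y_lab_2 = [d[1][1] for d in draws]

var_lab = cov(y_lab_1, y_lab_1)      # theory: beta^2 sigma_x^2 + sigma^2 = 0.297936
cov_within = cov(y_lab_1, y_lab_2)   # theory: beta^2 sigma_x^2 = 0.207936
cov_ref_lab = cov(y_ref, y_lab_1)    # theory: beta sigma_x^2 = 0.17328
```

The nonzero covariances between replicates and between laboratories come entirely from the common $x_j$, which is the dependence the asymptotic analysis later has to deal with.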

Considering the proposed ultrastructural calibration model, the second goal of this work is to develop a test to evaluate the competence of the group of laboratories and also the competence of individual laboratories with respect to the reference laboratory. In this case, we provide statistical testing hypothesis to evaluate additive and multiplicative bias. Finally, we propose one graphical analysis to assess the equivalence of the measurements of the ith laboratory with respect to the measurements of the reference laboratory.

Although the proposed model is simple and intuitive, it has some challenging properties. As we have only one item under testing for the entire PT program, the true unobserved value $x_j$ of the item under testing (engine power) at the $j$th level (rotation) is the same throughout the PT program. As a consequence, there is a natural dependency among all measurements at the same $j$th level. It follows that the observed information matrix converges in probability to a random matrix whose components related to the mean of the measurand ($\mu_{xj}$) are null. Thus, it is not possible to obtain a consistent estimate of the parameter $\mu_{xj}$, and the usual asymptotic theory is not applicable to the proposed ultrastructural calibration model.

To address this problem, we will apply the smoothness of the likelihood function to derive one suitable asymptotic theory to the ultrastructural calibration model, as developed by Refs. [20,24,25]. As a consequence of the asymptotic theory developed in Section 3, we will propose a Wald type test to evaluate the bias parameters. Moreover, we will apply the Wald statistics to develop a graphical analysis to assess the competence of each participant laboratory with respect to the reference laboratory.

In Section 2, we describe the model and obtain the Score function, as well as, the observed information matrix in closed form expressions. Moreover, we develop the Expectation Maximization (EM) algorithm to obtain the maximum likelihood estimates (MLEs) of the parameters. In Section 3, we develop the asymptotic theory to assess the equivalence among laboratories measurements in PT. Next, tests for the composite hypothesis and confidence regions are obtained. Moreover, we perform a simulation study considering different number of replicas, nominal values and parameter values. In Section 4, we apply the developed methodology to the real data set collected to perform a proficiency study. Finally, we discuss the obtained results in Section 5.

2. The model

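Considering the model defined by (1) and (2), and the engine power illustration, let $Y_{1j}=(Y_{1j1},\ldots,Y_{1jn_1})^T$ and $Y_{ij}=(Y_{ij1},\ldots,Y_{ijn_i})^T$ represent, respectively, the measurements of the engine power of the reference laboratory and the $i$th laboratory at the $j$th engine rotation value, $Y_j^n=(Y_{1j}^T,\ldots,Y_{pj}^T)^T$ the measurements of all the laboratories at the $j$th engine rotation value and, finally, $Y^n=(Y_1^{nT},\ldots,Y_m^{nT})^T$ the observed data, with $n=\sum_{i=1}^p n_i$. Then, assuming that $Y_j^n\sim N_n(\mu_j,\Sigma_j)$, where $\mu_j=(\mu_{1j}^T,\ldots,\mu_{pj}^T)^T=\alpha+\mu_{xj}\beta$ and $\Sigma_j=D(\sigma_j^2)+\sigma_{xj}^2\beta\beta^T$, with $\alpha=(0_{n_1}^T,\alpha_2 1_{n_2}^T,\ldots,\alpha_p 1_{n_p}^T)^T$, $\beta=(1_{n_1}^T,\beta_2 1_{n_2}^T,\ldots,\beta_p 1_{n_p}^T)^T$, $\sigma_j^2=(\sigma_{1j}^2 1_{n_1}^T,\ldots,\sigma_{pj}^2 1_{n_p}^T)^T$, $0_{n_1}$ ($1_{n_i}$) denoting a vector composed of $n_1$ zeros ($n_i$ ones) and $D(a)$ denoting the diagonal matrix with diagonal elements given by $a$, we have $Y_{1j}\sim N_{n_1}(\mu_{1j},\Sigma_{11j})$ and $Y_{ij}\sim N_{n_i}(\mu_{ij},\Sigma_{iij})$, $i=2,\ldots,p$, $j=1,\ldots,m$, where $\mu_{1j}=\mu_{xj}1_{n_1}$, $\Sigma_{11j}=\sigma_{1j}^2 I_{n_1}+\sigma_{xj}^2 1_{n_1}1_{n_1}^T$, $\mu_{ij}=(\alpha_i+\beta_i\mu_{xj})1_{n_i}$, $\Sigma_{iij}=\sigma_{ij}^2 I_{n_i}+\beta_i^2\sigma_{xj}^2 1_{n_i}1_{n_i}^T$, with $I_{n_i}$ denoting the identity matrix of size $n_i$. Furthermore,

$$f_{Y_j^n}(y_j^n,\theta)=(2\pi)^{-n/2}\,|\Sigma_j|^{-1/2}\exp\left\{-\frac{1}{2}(y_j^n-\mu_j)^T\Sigma_j^{-1}(y_j^n-\mu_j)\right\},\ j=1,\ldots,m,\quad\text{and}\quad f_{Y^n}(y^n,\theta)=\prod_{j=1}^m f_{Y_j^n}(y_j^n,\theta),$$

with $\theta=(\mu_{x1},\ldots,\mu_{xm},\alpha_2,\ldots,\alpha_p,\beta_2,\ldots,\beta_p)^T=(\theta_1,\ldots,\theta_{m+2(p-1)})^T$.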

The log-likelihood function is given by

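$$L_n(\theta)=\log f_{Y^n}(y^n,\theta)=-\frac{mn}{2}\log(2\pi)-\frac{1}{2}\sum_{j=1}^m\log a_j^n-\frac{1}{2}\sum_{j=1}^m\sum_{i=1}^p n_i\log(\sigma_{ij}^2)-\frac{1}{2}\sum_{j=1}^m Q_j^n, \tag{3}$$

where $a_j^n=1+\sigma_{xj}^2\,\beta^T D^{-1}(\sigma_j^2)\beta$ and $Q_j^n=(y_j^n-\mu_j)^T\Sigma_j^{-1}(y_j^n-\mu_j)$, $j=1,\ldots,m$.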
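Because $\Sigma_j$ is a diagonal matrix plus a rank-one term, the log-likelihood (3) can be evaluated without forming or inverting $\Sigma_j$. The sketch below is one possible way to do this, using the Sherman–Morrison identity to write $Q_j^n=r^TD^{-1}r-(\sigma_{xj}^2/a_j^n)(\beta^TD^{-1}r)^2$ with $r=y_j^n-\mu_j$. The data layout (nested lists indexed by level, laboratory and replicate) and the function name are assumptions made for illustration.

```python
import math

def log_likelihood(y, mu_x, alpha, beta, sigma2, sigma_x2):
    """Observed-data log-likelihood (3).

    y[j][i]      : replicates of laboratory i at level j (i = 0 is the reference,
                   so alpha[0] = 0 and beta[0] = 1)
    sigma2[j][i] : sigma_ij^2;  sigma_x2[j] : sigma_xj^2;  mu_x[j] : mu_xj
    """
    total = 0.0
    for j in range(len(y)):
        n = sum(len(rep) for rep in y[j])
        bDb = bDr = rDr = log_det_D = 0.0
        for i, rep in enumerate(y[j]):
            s2 = sigma2[j][i]
            resid = [v - alpha[i] - beta[i] * mu_x[j] for v in rep]
            bDb += len(rep) * beta[i] ** 2 / s2      # beta' D^{-1} beta
            bDr += beta[i] * sum(resid) / s2         # beta' D^{-1} r
            rDr += sum(e * e for e in resid) / s2    # r' D^{-1} r
            log_det_D += len(rep) * math.log(s2)
        a = 1.0 + sigma_x2[j] * bDb                  # a_j^n
        Q = rDr - sigma_x2[j] * bDr ** 2 / a         # Q_j^n via Sherman-Morrison
        total += -0.5 * (n * math.log(2.0 * math.pi) + math.log(a) + log_det_D + Q)
    return total

# Smallest possible case: one level, reference laboratory only, one replicate.
# Then Y ~ N(mu_x, sigma_x^2 + sigma_1^2) and the result can be checked by hand.
value = log_likelihood([[[1.3]]], [1.0], [0.0], [1.0], [[0.04]], [0.05])
```

In this one-observation case the variance is 0.09 and the squared residual is 0.09, so the value equals $-\tfrac12(\log 2\pi+\log 0.09+1)$.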

Moreover, the covariance between two observations taken at the same value of the engine rotation by the reference laboratory ($i$th laboratory) is given by $\sigma_{xj}^2$ ($\beta_i^2\sigma_{xj}^2$), the covariance between the observations of the reference laboratory and the $i$th laboratory at the same value of the engine rotation is given by $\beta_i\sigma_{xj}^2$, and the covariance between the observations of the $i$th and $h$th laboratories at the $j$th engine rotation is given by $\beta_i\beta_h\sigma_{xj}^2$, that is:

$$\mathrm{cov}(Y_{1jk},Y_{1jl})=\sigma_{xj}^2;\quad \mathrm{cov}(Y_{ijk},Y_{ijl})=\beta_i^2\sigma_{xj}^2;\quad \mathrm{cov}(Y_{1jk},Y_{ijl})=\beta_i\sigma_{xj}^2;\quad \mathrm{cov}(Y_{ijk},Y_{hjl})=\beta_i\beta_h\sigma_{xj}^2;$$

$i,h=2,\ldots,p$; $j=1,\ldots,m$; $k,l=1,\ldots,n_i$.

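After algebraic manipulations, the elements of the score function, $U^n(\theta)=\partial L_n(\theta)/\partial\theta$, denoted by $U^n_{\theta_q}$, $q=1,\ldots,m+2(p-1)$, were obtained and are given by

$$U^n_{\mu_{xj}}=\frac{M_j^n}{a_j^n}-\frac{\mu_{xj}}{\sigma_{xj}^2},\quad j=1,\ldots,m;$$

$$U^n_{\alpha_i}=-\sum_{j=1}^m\frac{1}{\sigma_{ij}^2}\left[\frac{n_i\beta_i M_j^n\sigma_{xj}^2}{a_j^n}-T_{ij}^n\right],\qquad U^n_{\beta_i}=-\sum_{j=1}^m\frac{\sigma_{xj}^2}{a_j^n\sigma_{ij}^2}\left\{n_i\beta_i+M_j^n\left[\frac{n_i\beta_i\sigma_{xj}^2M_j^n}{a_j^n}-T_{ij}^n\right]\right\},\quad i=2,\ldots,p,$$

with $M_j^n=(\mu_{xj}/\sigma_{xj}^2)+(Y_{1j}^T1_{n_1}/\sigma_{1j}^2)+\sum_{i=2}^p(\beta_iT_{ij}^n/\sigma_{ij}^2)$ and $T_{ij}^n=(Y_{ij}-\alpha_i1_{n_i})^T1_{n_i}$.

Subsequently, the elements of the observed information matrix, $J^n(\theta)/n=-(1/n)(\partial^2L_n(\theta)/\partial\theta\,\partial\theta^T)$, denoted by $J^n_{\theta_r,\theta_h}$, $r,h=1,\ldots,m+2(p-1)$, were obtained in closed form and are given by

$$\begin{aligned}
J^n_{\mu_{xj},\mu_{xj}}&=-\frac{(1-a_j^n)}{\sigma_{xj}^2a_j^n},\qquad J^n_{\mu_{xj},\mu_{xq}}=0,\qquad J^n_{\mu_{xj},\alpha_i}=\frac{n_i\beta_i}{\sigma_{ij}^2a_j^n},\\
J^n_{\mu_{xj},\beta_i}&=\frac{1}{\sigma_{ij}^2a_j^n}\left(\frac{2n_i\beta_i\sigma_{xj}^2M_j^n}{a_j^n}-T_{ij}^n\right),\\
J^n_{\alpha_i,\alpha_i}&=\sum_{j=1}^m\frac{n_i}{\sigma_{ij}^2}\left(1-\frac{n_i\beta_i^2\sigma_{xj}^2}{\sigma_{ij}^2a_j^n}\right),\qquad
J^n_{\alpha_i,\alpha_l}=-\sum_{j=1}^m\frac{n_in_l\beta_i\beta_l\sigma_{xj}^2}{\sigma_{ij}^2\sigma_{lj}^2a_j^n},\\
J^n_{\alpha_i,\beta_i}&=\sum_{j=1}^m\frac{n_i\sigma_{xj}^2}{\sigma_{ij}^2a_j^n}\left[M_j^n-\frac{\beta_i}{\sigma_{ij}^2}\left(\frac{2n_i\beta_i\sigma_{xj}^2M_j^n}{a_j^n}-T_{ij}^n\right)\right],\\
J^n_{\alpha_i,\beta_l}&=\sum_{j=1}^m\frac{n_i\beta_i\sigma_{xj}^2}{\sigma_{ij}^2\sigma_{lj}^2a_j^n}\left(T_{lj}^n-\frac{2n_l\beta_l\sigma_{xj}^2M_j^n}{a_j^n}\right),\\
J^n_{\beta_i,\beta_i}&=\sum_{j=1}^m\frac{\sigma_{xj}^2}{\sigma_{ij}^2a_j^n}\left\{n_i-\frac{(T_{ij}^n)^2}{\sigma_{ij}^2}+\frac{n_i\sigma_{xj}^2}{a_j^n}\left[(M_j^n)^2\left(1-\frac{4n_i\sigma_{xj}^2\beta_i^2}{\sigma_{ij}^2a_j^n}\right)+\frac{4\beta_iM_j^nT_{ij}^n}{\sigma_{ij}^2}-\frac{2n_i\beta_i^2}{\sigma_{ij}^2}\right]\right\},\\
J^n_{\beta_i,\beta_l}&=-\sum_{j=1}^m\frac{\sigma_{xj}^2}{\sigma_{ij}^2\sigma_{lj}^2a_j^n}\left\{T_{ij}^nT_{lj}^n+\frac{2\sigma_{xj}^2}{a_j^n}\left[n_in_l\beta_i\beta_l\left(1+\frac{2\sigma_{xj}^2(M_j^n)^2}{a_j^n}\right)-n_i\beta_iM_j^nT_{lj}^n-n_l\beta_lM_j^nT_{ij}^n\right]\right\},
\end{aligned}$$

for $j\neq q$, $i\neq l$, $j,q=1,\ldots,m$, $i,l=2,\ldots,p$.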

2.1. EM algorithm

In this subsection, we outline the EM algorithm [5] used to obtain the estimates of the parameters. In measurement error models, if the latent data $x_j$, $j=1,\ldots,m$, are introduced to augment the observed data, the MLEs of the parameters based on the augmented data (complete data) become easy to obtain. Considering the model defined by (1) and (2) and the observed data for the $j$th engine rotation value, $Y_j^n=(Y_{1j}^T,\ldots,Y_{pj}^T)^T$, we augment $Y_j^n$ with the unobserved data $x_j$. Then, the complete data for the $j$th engine rotation value are given by $Y_{jc}^n=(x_j,Y_j^{nT})^T$, with $E(Y_{jc}^n)=\mu_{jc}=(\mu_{xj},\mu_j^T)^T$ and covariance matrix $\Sigma_{jc}=\begin{pmatrix}\sigma_{xj}^2&\Sigma_{12j}\\ \Sigma_{21j}&\Sigma_j\end{pmatrix}$, $j=1,\ldots,m$, with $\Sigma_{12j}=\Sigma_{21j}^T=\sigma_{xj}^2\beta^T$, and $\mu_j$ and $\Sigma_j$ as given above in Section 2. Furthermore, $Y_{jc}^n\sim N_{n+1}(\mu_{jc},\Sigma_{jc})$; letting $Y_c^n=(Y_{1c}^{nT},\ldots,Y_{mc}^{nT})^T$, we have

$$f_{Y_c^n}(y_c^n)=\prod_{j=1}^mf_{Y_{jc}^n}(y_{jc}^n)=(2\pi)^{-m(n+1)/2}\left[\prod_{j=1}^m\left(\sigma_{xj}^2(\sigma_{1j}^2)^{n_1}\cdots(\sigma_{pj}^2)^{n_p}\right)^{-1/2}\right]\times\exp\left\{-\sum_{j=1}^m\frac{1}{2}(Y_{jc}^n-\mu_{jc})^T\Sigma_{jc}^{-1}(Y_{jc}^n-\mu_{jc})\right\}.$$

It follows that the log-likelihood function of the complete data is given by

$$L_c(\theta)=-\frac{m(n+1)}{2}\log(2\pi)-\frac{1}{2}\sum_{j=1}^m\left(\log\sigma_{xj}^2+\sum_{i=1}^pn_i\log\sigma_{ij}^2\right)-\frac{1}{2}\left[\sum_{j=1}^m\frac{(x_j-\mu_{xj})^2}{\sigma_{xj}^2}+\sum_{j=1}^m\sum_{k=1}^{n_1}\frac{(Y_{1jk}-x_j)^2}{\sigma_{1j}^2}+\sum_{j=1}^m\sum_{i=2}^p\sum_{k=1}^{n_i}\frac{(Y_{ijk}-\alpha_i-\beta_ix_j)^2}{\sigma_{ij}^2}\right],$$

which is much simpler than (3). Given the estimate of $\theta$ at the $(r-1)$th iteration, $\theta^{(r-1)}$, the E step consists in computing the expectation of the complete-data log-likelihood function, $L_c(\theta)$, with respect to the conditional distribution of $x=(x_1,\ldots,x_m)^T$ given the observed data $Y^n$ and $\theta^{(r-1)}$. The M step consists in maximizing the function obtained in the E step with respect to $\theta$, which gives the estimates of the parameters for the next iteration, $\theta^{(r)}$. No iteration of the EM algorithm decreases the log-likelihood function of the observed data, i.e. $L_n(\theta^{(r-1)})\le L_n(\theta^{(r)})$. When the likelihood function of the complete data belongs to the exponential family, the implementation of the EM algorithm is usually simple. In our case, the E step consists in computing $E_{\theta^{(r-1)}}[x_j\mid Y_j^n]$ and $E_{\theta^{(r-1)}}[x_j^2\mid Y_j^n]$, $j=1,\ldots,m$. In the M step, we maximize the complete-data log-likelihood function in which the sufficient statistics are replaced by the expected values obtained in the E step.

The EM algorithm for the model defined by (1) and (2) may be summarized as follows.

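E step: considering the properties of the multivariate normal distribution, the E step consists in computing

$$\hat{x}_j^{(r)}=E_{\theta^{(r-1)}}[x_j\mid Y_j^n]=\frac{\sigma_{xj}^2M_j^{n(r-1)}}{a_j^{n(r-1)}}\quad\text{and}\quad\widehat{x_j^2}^{(r)}=E_{\theta^{(r-1)}}[x_j^2\mid Y_j^n]=\frac{\sigma_{xj}^2}{a_j^{n(r-1)}}+\left(\hat{x}_j^{(r)}\right)^2,$$

where

$$M_j^{n(r-1)}=\left[\frac{\mu_{xj}^{(r-1)}}{\sigma_{xj}^2}+\frac{\sum_{k=1}^{n_1}Y_{1jk}}{\sigma_{1j}^2}+\sum_{i=2}^p\frac{\beta_i^{(r-1)}}{\sigma_{ij}^2}\left(\sum_{k=1}^{n_i}Y_{ijk}-n_i\alpha_i^{(r-1)}\right)\right]$$

and $a_j^{n(r-1)}$ represent, respectively, the values of $M_j^n$ and $a_j^n$ evaluated at $\theta^{(r-1)}$.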

M step: the M step consists in the obtention of

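$$\hat\beta_i^{(r)}=\frac{\left(\sum_{j=1}^m(\hat{x}_j^{(r)}/\sigma_{ij}^2)\sum_{k=1}^{n_i}Y_{ijk}\right)\left(\sum_{j=1}^m(1/\sigma_{ij}^2)\right)-\left(\sum_{j=1}^m(\hat{x}_j^{(r)}/\sigma_{ij}^2)\right)\left(\sum_{j=1}^m(1/\sigma_{ij}^2)\sum_{k=1}^{n_i}Y_{ijk}\right)}{n_i\left[\left(\sum_{j=1}^m(\widehat{x_j^2}^{(r)}/\sigma_{ij}^2)\right)\left(\sum_{j=1}^m(1/\sigma_{ij}^2)\right)-\left(\sum_{j=1}^m(\hat{x}_j^{(r)}/\sigma_{ij}^2)\right)^2\right]},$$

$$\hat\alpha_i^{(r)}=\frac{\sum_{j=1}^m(1/\sigma_{ij}^2)\sum_{k=1}^{n_i}Y_{ijk}-n_i\hat\beta_i^{(r)}\sum_{j=1}^m(\hat{x}_j^{(r)}/\sigma_{ij}^2)}{n_i\sum_{j=1}^m(1/\sigma_{ij}^2)}\quad\text{and}\quad\hat\mu_{xj}^{(r)}=\hat{x}_j^{(r)},$$

$i=2,\ldots,p$, $j=1,\ldots,m$.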
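The E and M steps above translate almost line by line into code. The following is a minimal sketch, not the authors' implementation: it assumes known variances, the same number of replicates per laboratory at every level, a nested-list data layout, and a simple stopping rule based on the largest parameter change. The starting values and the synthetic data at the end are hypothetical.

```python
import random

def em_fit(y, sigma2, sigma_x2, n_iter=500, tol=1e-10):
    """EM sketch for models (1)-(2) with known variances.

    y[j][i]: replicates of laboratory i at level j (i = 0 is the reference).
    Returns (alpha, beta, mu_x); alpha[0] = 0 and beta[0] = 1 stay fixed.
    """
    m, p = len(y), len(y[0])
    n = [len(y[0][i]) for i in range(p)]                       # n_i
    S = [[sum(y[j][i]) for i in range(p)] for j in range(m)]   # sum_k Y_ijk
    alpha, beta = [0.0] * p, [1.0] * p
    mu_x = [S[j][0] / n[0] for j in range(m)]                  # reference-lab means
    for _ in range(n_iter):
        # E step: conditional first and second moments of x_j
        x_hat, x2_hat = [], []
        for j in range(m):
            a = 1.0 + sigma_x2[j] * sum(n[i] * beta[i] ** 2 / sigma2[j][i] for i in range(p))
            M = mu_x[j] / sigma_x2[j] + sum(
                beta[i] * (S[j][i] - n[i] * alpha[i]) / sigma2[j][i] for i in range(p))
            xh = sigma_x2[j] * M / a
            x_hat.append(xh)
            x2_hat.append(sigma_x2[j] / a + xh * xh)
        # M step: closed-form updates for each participant laboratory
        new_alpha, new_beta = [0.0], [1.0]
        for i in range(1, p):
            A = sum(1.0 / sigma2[j][i] for j in range(m))
            B = sum(x_hat[j] / sigma2[j][i] for j in range(m))
            C = sum(x2_hat[j] / sigma2[j][i] for j in range(m))
            P = sum(S[j][i] / sigma2[j][i] for j in range(m))
            R = sum(x_hat[j] * S[j][i] / sigma2[j][i] for j in range(m))
            b = (R * A - B * P) / (n[i] * (C * A - B * B))
            new_beta.append(b)
            new_alpha.append((P - n[i] * b * B) / (n[i] * A))
        change = max(abs(u - v) for u, v in
                     zip(new_alpha + new_beta + x_hat, alpha + beta + mu_x))
        alpha, beta, mu_x = new_alpha, new_beta, x_hat
        if change < tol:
            break
    return alpha, beta, mu_x

# Synthetic data: one reference and one participant laboratory (hypothetical values)
rng = random.Random(1)
mu_true, sd_x = [10.0, 20.0, 30.0, 40.0, 50.0], [0.24, 0.31, 0.38, 0.45, 0.52]
sd_e, a_true, b_true = [0.1, 0.2, 0.3, 0.4, 0.5], [0.0, 0.5], [1.0, 1.1]
y = []
for j in range(5):
    x = rng.gauss(mu_true[j], sd_x[j])
    y.append([[a_true[i] + b_true[i] * x + rng.gauss(0.0, sd_e[j]) for _ in range(15)]
              for i in range(2)])
alpha_hat, beta_hat, mu_hat = em_fit(y, [[s ** 2] * 2 for s in sd_e], [s ** 2 for s in sd_x])
```

With these settings the bias estimates should land close to the generating values 0.5 and 1.1, while `mu_hat` tracks the realized $x_j$ rather than $\mu_{xj}$.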

Notice that closed form expressions were obtained for all the quantities in the M step, which means that this procedure is computationally inexpensive and very simple to implement. Furthermore, if the variance terms are not known, the algorithm can easily be adapted to estimate the parameters $\sigma_{xj}^2$ and $\sigma_{ij}^2$, $i=1,\ldots,p$, $j=1,\ldots,m$.

3. Asymptotic theory

In this section, we will develop the asymptotic theory necessary to prove the consistency and the asymptotic distribution of the MLE regarding the bias parameters. In the sequel, we will apply the regularity properties of the likelihood function to establish the asymptotic results, as proposed by Refs. [20,24,25]. For a given θ, we define

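$$P^n_{\theta,(x_1,\ldots,x_m)}(B_n)=\int_{B_n}f_{Y_c^n}(y_c^n,\theta)\,dy_c^n,$$

for every $B_n\in\mathcal{A}(\mathbb{R}^{m(n+1)})$, where $\mathcal{A}(\mathbb{R}^{m(n+1)})$ is the Borel $\sigma$-algebra. By the Kolmogorov extension theorem, there exists a unique probability $P_{\theta,(x_1,\ldots,x_m)}$ defined on $(\mathbb{R}^\infty,\mathcal{A}(\mathbb{R}^\infty))$ such that

$$P_{\theta,(x_1,\ldots,x_m)}(B_n\times\mathbb{R}\times\mathbb{R}\times\cdots)=P^n_{\theta,(x_1,\ldots,x_m)}(B_n),$$

for every $B_n\in\mathcal{A}(\mathbb{R}^{m(n+1)})$. The marginal distribution of the observed data will be denoted by $P_\theta$ and the marginal distribution of the unobserved variables $(x_1,\ldots,x_m)$ will be denoted by $P_{(x_1,\ldots,x_m)}$. We say that $n\to\infty$ when $n_i\to\infty$ and $(n_i/n)\to w_i$, where $w_i$ is a positive constant for every $i=1,\ldots,p$.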

Let $M_\ell$ be the space of all $\ell\times\ell$ matrices. The norm $\|A\|$ of the matrix $A$ is $\|A\|=\max\{|A_{hs}|:h,s=1,\ldots,\ell\}$. A sequence of matrices $\{A_u:u=1,2,\ldots\}$ converges to a limit $A$ if, and only if, $\|A_u-A\|\to0$. If the matrix $A$ is positive definite, we write $A>0$. In this case, $A^{1/2}$ denotes the symmetric positive square root of $A$. In the same way, for a given vector $v=(v_1,\ldots,v_\ell)\in\mathbb{R}^\ell$, we consider the norm $|v|=\max\{|v_1|,\ldots,|v_\ell|\}$.

We denote by $B_c(n,\theta)$ the set of all vectors $\psi\in\mathbb{R}^{m+2(p-1)}$ such that $\sqrt{n}\,|\psi-\theta|\le c$, where $c$ is a positive constant. Moreover, we denote by $R_c(n,\theta)$ the set of all random vectors $\phi$ with values in $\mathbb{R}^{m+2(p-1)}$ such that $\sqrt{n}\,|\phi-\theta|\le c$. Here, we take the random vector $\phi$ as a function of the observed data $Y^n$.

Lemma 3.1

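For any sequences $\{\psi_n:\psi_n\in B_c(n,\theta),n\ge1\}$ and $\{\phi_n:\phi_n\in R_c(n,\theta),n\ge1\}$, there exists a positive semidefinite random matrix $W(\theta)$ such that

$$P_{\psi_n,(x_1,\ldots,x_m)}\left[\left\|\frac{1}{n}J^n(\phi_n)-W(\theta)\right\|\ge\epsilon\right]\to0,\quad n\to\infty.$$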

Proof.

See supplementary material.

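The random matrix $W(\theta)$ was obtained in closed form and its elements are given by

$$\begin{aligned}
W(\mu_{xj}\mu_{xj})&=W(\mu_{xj}\mu_{xh})=W(\mu_{xj}\alpha_i)=W(\mu_{xj},\beta_i)=0;\\
W(\alpha_i\alpha_i)&=\sum_{j=1}^m\frac{1}{\sigma_{ij}^2}-\beta_i^2\sum_{j=1}^m\frac{1}{(\sigma_{ij}^2)^2\left(\sum_{q=1}^p(\beta_q^2/\sigma_{qj}^2)\right)};\qquad
W(\alpha_i\alpha_l)=-\beta_i\beta_l\sum_{j=1}^m\frac{1}{\sigma_{ij}^2\sigma_{lj}^2\left(\sum_{q=1}^p(\beta_q^2/\sigma_{qj}^2)\right)};\\
W(\alpha_i\beta_i)&=\sum_{j=1}^m\frac{x_j}{\sigma_{ij}^2}-\beta_i^2\sum_{j=1}^m\frac{x_j}{(\sigma_{ij}^2)^2\left(\sum_{q=1}^p(\beta_q^2/\sigma_{qj}^2)\right)};\qquad
W(\alpha_i\beta_l)=-\beta_i\beta_l\sum_{j=1}^m\frac{x_j}{\sigma_{ij}^2\sigma_{lj}^2\left(\sum_{q=1}^p(\beta_q^2/\sigma_{qj}^2)\right)};\\
W(\beta_i\beta_i)&=\sum_{j=1}^m\frac{x_j^2}{\sigma_{ij}^2}\left\{1-\frac{\beta_i^2}{\sigma_{ij}^2\left(\sum_{q=1}^p(\beta_q^2/\sigma_{qj}^2)\right)}\right\};\qquad
W(\beta_i\beta_l)=-\beta_i\beta_l\sum_{j=1}^m\frac{x_j^2}{\sigma_{ij}^2\sigma_{lj}^2\left(\sum_{q=1}^p(\beta_q^2/\sigma_{qj}^2)\right)};
\end{aligned}$$

$j,h=1,\ldots,m$; $j\neq h$; $i,l=2,\ldots,p$; $i\neq l$.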

$W(\theta)$ has two important features. First, every component associated with $\mu_{xj}$ is null, which means that we do not have enough information to estimate $\mu_{xj}$ consistently. This is a consequence of the fact that, for any level $j=1,\ldots,m$, the same item (engine) is measured by all the laboratories under the same conditions. Second, the matrix $W(\theta)$ is random, so the model is nonergodic. As a consequence, the score random process $U^n(\theta)$ is nonergodic. Furthermore, the components associated with $\mu_{xj}$ satisfy

$$P_{\psi_n,(x_1,\ldots,x_m)}\left[\left|\frac{1}{\sqrt{n}}U^n_{\mu_{xj}}(\theta)\right|\ge\epsilon\right]\to0,\quad\forall\,\epsilon>0,\ j=1,2,\ldots,m. \tag{4}$$

We will denote by $\tilde{U}^n(\theta)$ and $\tilde{J}^n(\theta)/n$ the score vector and the observed information matrix without the components involving $\mu_{xj}$, respectively. We also denote by $\tilde{W}(\theta)$ the random matrix $W(\theta)$ without the components involving $\mu_{xj}$. Moreover, we denote the bias components of the vector of parameters by $\tilde\theta=(\alpha_2,\ldots,\alpha_p,\beta_2,\ldots,\beta_p)^T\in\mathbb{R}^{2(p-1)}$ and by $\hat{\tilde\theta}_n$ the related MLE. Furthermore, the random matrices $W(\theta)$ and $\tilde{W}(\theta)$ depend only on the bias components of the vector of parameters. As a consequence, from now on, we will denote $W(\theta)$ and $\tilde{W}(\theta)$ by $W(\tilde\theta)$ and $\tilde{W}(\tilde\theta)$, respectively.

Let $\{g_n:n\ge1\}$ be a sequence of real continuous functions defined on a metric space. We say that $g_n(\tau)$ converges uniformly in $\tau$ to $g(\tau)$ if $g_n(\tau_n)\to g(\tau)$ for every sequence $\tau_n\to\tau$. Let $\lambda_\tau$ and $\{\lambda_\tau^n:n\ge1\}$ be probabilities defined on the Borel subsets of a metric space depending on the arbitrary parameter $\tau$, and let $C$ be the space of real bounded uniformly continuous functions. We shall say that $\lambda_\tau^n\xrightarrow{u}\lambda_\tau$ uniformly if

$$\int g\,d\lambda_\tau^n\to\int g\,d\lambda_\tau\quad\text{uniformly in }\tau,\ \text{for all }g\in C.$$

If $Q$ is a metric space and $\tau\in Q$, the family $\lambda_\tau$ of probabilities is continuous in $\tau$ if $\lambda_{\tau_n}\Rightarrow\lambda_\tau$ whenever $\tau_n\to\tau$ in $Q$.

As described in the proof of Lemma 3.1, the random matrix $W(\tilde\theta)$ is a function of the unobserved variables $(x_1,\ldots,x_m)$ and the parameter $\tilde\theta$. Then, for each fixed $\tilde\theta$, it is defined on the probability space $(\mathbb{R}^m,\mathcal{A}(\mathbb{R}^m),P_{(x_1,\ldots,x_m)})$. We denote by $G_{\tilde\theta}$ the distribution of the random matrix $W(\tilde\theta)$.

Lemma 3.2

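Given the sequences $\{\psi_n:\psi_n\in B_c(n,\theta),n\ge1\}$ and $\{\phi_n:\phi_n\in R_c(n,\theta),n\ge1\}$ and a bounded continuous function $g:M_{m+2(p-1)}\to\mathbb{R}$, we have

$$E_{\psi_n}\left[g\left(\frac{1}{n}J^n(\phi_n)\right)\right]=\int g\left(\frac{1}{n}J^n(\phi_n)\right)dP_{\psi_n}\to\int g(W(\tilde\theta))\,dP_{(x_1,\ldots,x_m)}=E_{(x_1,\ldots,x_m)}\left[g(W(\tilde\theta))\right],$$

for every $\theta\in\mathbb{R}^{m+2(p-1)}$. Moreover, the distribution $G_{\tilde\theta}$ of $W(\tilde\theta)$ is continuous in $\tilde\theta$.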

Proof.

See supplementary material.

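Let $s\in\mathbb{R}^{m+2(p-1)}$ be a vector and let $\{\theta_n:\theta_n\in B_c(n,\theta),n\ge1\}$ be a sequence of parameters. We define $\psi_n=\theta_n+(1/\sqrt{n})s$, a vector in $\mathbb{R}^{m+2(p-1)}$ such that $\psi_n\to\theta$ as $n\to\infty$. As $L_n$ is a smooth function, we may write

$$L_n(\psi_n)=L_n(\theta_n)+(\psi_n-\theta_n)^TU^n(\theta_n)-\frac{1}{2}(\psi_n-\theta_n)^TJ^n(\phi_n)(\psi_n-\theta_n)=L_n(\theta_n)+\frac{1}{\sqrt{n}}s^TU^n(\theta_n)-\frac{1}{2n}s^TJ^n(\phi_n)s, \tag{5}$$

where $\phi_n=(1-\delta_n)\theta_n+\delta_n\psi_n$, $0<\delta_n<1$ and $\delta_n$ is random. As $\delta_n$ is a function of the observed data $Y^n$ and $0<\delta_n<1$, we conclude that $\phi_n\in R_c(n,\theta)$. As a consequence, $\phi_n\to\theta$ as $n\to\infty$.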

Theorem 3.3

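Let $\{\theta_n:\theta_n\in B_c(n,\theta),n\ge1\}$ be a sequence of parameters and let $\{h_n:h_n\in R_c(n,\theta),n\ge1\}$ be a sequence of random vectors. Then, we have that

$$\left(\frac{1}{\sqrt{n}}U^n(\theta_n),\frac{1}{n}J^n(h_n)\right)\xrightarrow{u}\left(H(\tilde\theta),W(\tilde\theta)\right),$$

where $H(\tilde\theta)=\left(0_m^T,\left((\tilde{W}(\tilde\theta))^{1/2}z\right)^T\right)^T$ with $z$ denoting the standard normal random vector on $\mathbb{R}^{2(p-1)}$, independent of the random matrix $\tilde{W}(\tilde\theta)$. Furthermore, the random matrix $\tilde{W}(\tilde\theta)$ is positive definite with probability one and depends only on the bias components $\tilde\theta$ and the unobserved variables $(x_1,\ldots,x_m)$.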

Proof.

See supplementary material.

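In the sequel, we will calculate the asymptotic distribution of the MLE of the bias parameters $\tilde\theta$. For every vector $s=(s_1,s_2)\in\mathbb{R}^{m+2(p-1)}$ such that $s_1=(s_1,\ldots,s_m)$ and $s_2=(s_{m+1},\ldots,s_{m+2(p-1)})$, it follows from Equation (5) that

$$\begin{aligned}
L_n(\psi_n)-L_n(\theta)&=\frac{1}{\sqrt{n}}s^TU^n(\theta)-\frac{1}{2n}s^TJ^n(\phi_n)s\\
&=\frac{1}{\sqrt{n}}s_2^T\tilde{U}^n(\theta)-\frac{1}{2n}s_2^T\tilde{J}^n(\theta)s_2+\frac{1}{\sqrt{n}}s_1^TU^n_{\mu_{xj}}(\theta)-\frac{1}{2n}\sum_{i=1}^ms_i^2J^n_{ii}(\phi_n)\\
&\quad-\frac{1}{2n}\sum_{i=1}^m\sum_{j=m+1}^{m+2(p-1)}s_is_jJ^n_{ij}(\phi_n)-\frac{1}{2n}\sum_{i=m+1}^{m+2(p-1)}\sum_{j=1}^ms_is_jJ^n_{ij}(\phi_n)+\frac{1}{2n}s_2^T\left[\tilde{J}^n(\theta)-\tilde{J}^n(\phi_n)\right]s_2,
\end{aligned}$$

where $\theta\in\mathbb{R}^{m+2(p-1)}$, $\psi_n=\theta+(1/\sqrt{n})s$ and $\phi_n=(1-\delta_n)\theta+\delta_n\psi_n$, $0<\delta_n<1$, with $\delta_n$ random. As a consequence of Lemma 3.1 and Equation (4), we arrive at the following lemma.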

Lemma 3.4

We have that

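$$\sup_{s\in\mathbb{R}^{m+2(p-1)}}\left[L_n(\psi_n)-L_n(\theta)\right]=\sup_{s_2\in\mathbb{R}^{2(p-1)}}\left[\frac{1}{\sqrt{n}}s_2^T\tilde{U}^n(\theta)-\frac{1}{2n}s_2^T\tilde{J}^n(\theta)s_2\right]+o_p(1), \tag{6}$$

for $\theta\in\mathbb{R}^{m+2(p-1)}$.

This lemma is crucial to understanding the behavior of the likelihood function with respect to the true values of the parameters $(\mu_{x1},\ldots,\mu_{xm})$. For $n$ sufficiently large, the impact of the true values vanishes. Moreover, the maximizer with respect to $s_2$ of the right side of Equation (6) satisfies

$$\frac{1}{n}\tilde{J}^n(\theta)\hat{s}_2=\frac{1}{\sqrt{n}}\tilde{U}^n(\theta)+o_p(1). \tag{7}$$

By applying Equation (6), for $n$ sufficiently large, we conclude that $(s_1^T,\hat{s}_2^T)^T$ corresponds closely to the value $\hat{s}$ that maximizes $L_n(\theta+n^{-1/2}(s_1^T,s_2^T)^T)$, independently of the vector $s_1$.

The maximizer of $L_n(\theta+n^{-1/2}s)$, the MLE $\hat\theta_n$, is given by

$$\hat\theta_n=\theta+\frac{1}{\sqrt{n}}\hat{s},\quad\text{which gives}\quad\hat{\tilde\theta}_n=\tilde\theta+\frac{1}{\sqrt{n}}\hat{s}_2\quad\text{and}\quad\sqrt{n}(\hat{\tilde\theta}_n-\tilde\theta)=\hat{s}_2,$$

where $\hat{\tilde\theta}_n$ corresponds to the MLE of the bias parameters $\tilde\theta$. As a consequence, we conclude that

$$\left(\frac{1}{n}\tilde{J}^n(\theta)\right)\sqrt{n}(\hat{\tilde\theta}_n-\tilde\theta)=\frac{1}{n}\tilde{J}^n(\theta)\hat{s}_2. \tag{8}$$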

Summing up the results obtained from Equations (7) and (8), we arrive at the following theorem.

Theorem 3.5

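The MLE $\hat{\tilde\theta}_n$ of $\tilde\theta$ satisfies

$$\frac{1}{\sqrt{n}}\tilde{U}^n(\theta)-\left(\frac{1}{n}\tilde{J}^n(\theta)\right)\sqrt{n}(\hat{\tilde\theta}_n-\tilde\theta)\xrightarrow{u}0,$$

uniformly in probability, where $\theta=(\mu_{x1},\ldots,\mu_{xm},\tilde\theta^T)^T\in\mathbb{R}^{m+2(p-1)}$.

As a consequence of Theorems 3.3 and 3.5 and the continuous mapping theorem, we conclude that

$$\left(\left[\frac{1}{n}\tilde{J}^n(\theta)\right]^{1/2}\sqrt{n}(\hat{\tilde\theta}_n-\tilde\theta),\frac{1}{n}\tilde{J}^n(\theta)\right)\xrightarrow{u}\left(z,\tilde{W}(\tilde\theta)\right). \tag{9}$$

Equation (9) and the continuous mapping theorem yield

$$\sqrt{n}(\hat{\tilde\theta}_n-\tilde\theta)\xrightarrow{u}\left[\tilde{W}(\tilde\theta)\right]^{-1/2}z. \tag{10}$$

From Equation (10), we see that the asymptotic distribution of the MLE of the bias parameters $\tilde\theta$ is not normal, because the matrix $\tilde{W}(\tilde\theta)$ is random.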

Corollary 3.6

The MLE $\hat{\tilde\theta}_n$ related to the bias parameters satisfies

$$(\hat{\tilde\theta}_n-\tilde\theta)\xrightarrow{u}0.$$

In the sequel, we will derive the usual Wald statistics to perform hypothesis testing about the bias parameters. To do so, we need a version of Theorem 3.3 with $h_n=\hat\theta_n$. By Prohorov's theorem, the sequence $\{\sqrt{n}(\hat{\tilde\theta}_n-\tilde\theta):n\ge1\}$ is uniformly tight. Then, for each $\epsilon>0$, there exists a constant $c>0$ such that

$$P_{\theta,(x_1,\ldots,x_m)}\left[\left|\sqrt{n}(\hat{\tilde\theta}_n-\tilde\theta)\right|>c\right]<\epsilon,\quad\forall\,n\ge1. \tag{11}$$

As a consequence, with probability tending to one, $\hat{\tilde\theta}_n\in R_c(n,\tilde\theta)$.

Lemma 3.7

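Let $\{\theta_n:\theta_n\in B_c(n,\theta),n\ge1\}$ be a sequence of parameters. Then, we have that

$$\left(\frac{1}{\sqrt{n}}U^n(\theta_n),\frac{1}{n}\tilde{J}^n(\hat\theta_n)\right)\xrightarrow{u}\left(H(\tilde\theta),\tilde{W}(\tilde\theta)\right).$$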

Thus, we arrive at the following corollaries.

Corollary 3.8

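Conditional on $\tilde{J}^n(\hat\theta_n)$, the asymptotic distribution of $\sqrt{n}(\hat{\tilde\theta}_n-\tilde\theta)$ is given by

$$N_{2(p-1)}\left[0_{2(p-1)},\left(\frac{1}{n}\tilde{J}^n(\hat\theta_n)\right)^{-1}\right].$$

Applying again Equation (9), Lemma 3.7 and the continuous mapping theorem, we arrive at the Wald statistics.

Corollary 3.9

We have that

$$(\hat{\tilde\theta}_n-\tilde\theta)^T\left[\tilde{J}^n(\hat\theta_n)\right](\hat{\tilde\theta}_n-\tilde\theta)\xrightarrow{u}z^Tz.$$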

In the next subsection, these results are used to develop the Wald test statistics to test the equivalence of all the laboratories with respect to the reference laboratory, as well as the composite hypothesis.

3.1. Equivalence among participant laboratories

In this subsection, we propose multiple hypothesis tests to assess the equivalence of the laboratories' measurements with respect to the reference laboratory.

First, we test for the equivalence of all laboratories with respect to the reference laboratory,

$$H_0:\alpha_2=\cdots=\alpha_p=0\quad\text{and}\quad\beta_2=\cdots=\beta_p=1. \tag{12}$$

To test hypothesis (12), we may apply the Wald statistic established in Corollary 3.9, i.e.

$$Q_w=(\hat{\tilde\theta}_n-\tilde\theta_0)^T\left[\tilde{J}^n(\hat\theta_n)\right](\hat{\tilde\theta}_n-\tilde\theta_0), \tag{13}$$

with $\tilde\theta_0=(0_{(p-1)}^T,1_{(p-1)}^T)^T$ for the hypothesis defined in (12). Under the conditions established in Corollary 3.9, $Q_w$ has an asymptotic $\chi^2_{2(p-1)}$ distribution.
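Once the MLE and the observed information matrix are available, the statistic (13) is a plain quadratic form. The sketch below computes it and its p-value; because the degrees of freedom $2(p-1)$ are always even, the chi-square tail probability has a closed form and no statistical library is needed. The function names and the numerical inputs are hypothetical, and the matrix `info` merely stands in for $\tilde{J}^n(\hat\theta_n)$.

```python
import math

def wald_statistic(theta_hat, theta_0, info):
    """Quadratic form (theta_hat - theta_0)' info (theta_hat - theta_0), as in (13)."""
    d = [a - b for a, b in zip(theta_hat, theta_0)]
    return sum(d[r] * info[r][c] * d[c] for r in range(len(d)) for c in range(len(d)))

def chi2_sf_even(q, df):
    """P(chi-square with df degrees of freedom > q), valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half, term, total = q / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i      # (q/2)^i / i!
        total += term
    return math.exp(-half) * total

# Hypothetical numbers for p = 2 (one participant): theta = (alpha_2, beta_2)
info = [[100.0, 0.0], [0.0, 10000.0]]
q_w = wald_statistic([0.2, 1.0], [0.0, 1.0], info)
p_value = chi2_sf_even(q_w, df=2)
```

In this toy case `q_w` is 4 and the p-value is $e^{-2}\approx0.135$, so equivalence would not be rejected at the 5% level.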

In the sequel, we consider tests of the composite hypothesis

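$$H_{0h}:h(\tilde\theta)=0, \tag{14}$$

where $h:\mathbb{R}^{2(p-1)}\to\mathbb{R}^r$ is a vector-valued function such that the derivative matrix $H(\tilde\theta)=(\partial/\partial\tilde\theta)h(\tilde\theta)^T$ is continuous in $\tilde\theta$ and $\mathrm{Rank}(H(\tilde\theta))=r$. In order to develop these composite tests, consider the Taylor expansion

$$h(\tilde\theta+n^{-1/2}u)=h(\tilde\theta)+n^{-1/2}H^T(\tilde\theta^*)u=h(\tilde\theta)+n^{-1/2}H^T(\tilde\theta)u+n^{-1/2}\left[H^T(\tilde\theta^*)-H^T(\tilde\theta)\right]u,$$

where $\tilde\theta^*=\tilde\theta+n^{-1/2}\gamma u$, $0<\gamma<1$ and $u\in\mathbb{R}^{2(p-1)}$. Multiplying by $\sqrt{n}$, we arrive at the following expression, whose last term vanishes asymptotically by the continuity of $H(\tilde\theta)$:

$$\sqrt{n}\left[h(\tilde\theta+n^{-1/2}u)-h(\tilde\theta)\right]=H^T(\tilde\theta)u+\left[H^T(\tilde\theta^*)-H^T(\tilde\theta)\right]u.$$

Letting $u=\sqrt{n}(\hat{\tilde\theta}_n-\tilde\theta)$, we obtain

$$\sqrt{n}\left[h(\hat{\tilde\theta}_n)-h(\tilde\theta)\right]-H^T(\tilde\theta)\sqrt{n}(\hat{\tilde\theta}_n-\tilde\theta)=o_p(1). \tag{15}$$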

Theorem 3.10

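The compound Wald statistic

$$Q_w=\left[h(\hat{\tilde\theta}_n)-h(\tilde\theta)\right]^T\left[H^T(\hat{\tilde\theta}_n)\left(\tilde{J}^n(\hat\theta_n)\right)^{-1}H(\hat{\tilde\theta}_n)\right]^{-1}\left[h(\hat{\tilde\theta}_n)-h(\tilde\theta)\right]\xrightarrow{u}z_r^Tz_r \tag{16}$$

has an asymptotic $\chi^2_r$ distribution.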

Proof.

See supplementary material.

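If the null hypothesis (12) is rejected, the multiple test is performed,

$$H_{0i}:\alpha_i=0\quad\text{and}\quad\beta_i=1,\quad i=2,\ldots,p. \tag{17}$$

Let $(\tilde{J}^n(\hat\theta_n))^{-1}=(v_{\theta_i\theta_j})$, i.e. $v_{\theta_i\theta_j}$ represents the $ij$th element of the matrix $(\tilde{J}^n(\hat\theta_n))^{-1}$. Then $Q_w$ for the hypothesis (17) can be written as

$$Q_{wi}=\frac{(\hat{\tilde\beta}_i-1)^2v_{\alpha_i\alpha_i}-2\hat{\tilde\alpha}_i(\hat{\tilde\beta}_i-1)v_{\alpha_i\beta_i}+\hat{\tilde\alpha}_i^2v_{\beta_i\beta_i}}{v_{\alpha_i\alpha_i}v_{\beta_i\beta_i}-v_{\alpha_i\beta_i}^2},\quad i=2,\ldots,p, \tag{18}$$

with asymptotic $\chi^2_2$ distribution.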
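Expression (18) is simply the quadratic form of the vector $(\hat\alpha_i,\hat\beta_i-1)$ in the inverse of the corresponding $2\times2$ block of $(\tilde{J}^n(\hat\theta_n))^{-1}$, and for two degrees of freedom the chi-square tail probability is $\exp(-q/2)$. A minimal sketch, with a hypothetical function name and made-up input values:

```python
import math

def single_lab_wald(alpha_hat, beta_hat, v_aa, v_ab, v_bb):
    """Statistic (18) and its asymptotic chi-square(2) p-value.

    v_aa, v_ab, v_bb are the entries of the inverse observed information
    that correspond to (alpha_i, beta_i).
    """
    d_b = beta_hat - 1.0
    det = v_aa * v_bb - v_ab ** 2
    q = (d_b ** 2 * v_aa - 2.0 * alpha_hat * d_b * v_ab + alpha_hat ** 2 * v_bb) / det
    p_value = math.exp(-q / 2.0)   # survival function of chi-square with 2 df
    return q, p_value

# Hypothetical estimates: additive bias 0.2, no multiplicative bias
q_i, p_i = single_lab_wald(0.2, 1.0, v_aa=0.01, v_ab=0.0, v_bb=0.0001)
```

Here `q_i` equals $0.2^2/0.01=4$, as expected when the off-diagonal term is zero and only the additive bias departs from the null.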

As we are performing multiple tests, it is important to control the type I error probability. To control the familywise error rate, one may use, for example, the Simes–Hochberg procedure [9]. To provide a graphical analysis of the performance of the laboratories' measurements with respect to the measurements of the reference laboratory, the result obtained in Theorem 3.10 can be used to obtain confidence regions. Next, we present a simulation study considering the tests given in (13) and (16).
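For concreteness, the sketch below implements Hochberg's step-up procedure, which we take to be the procedure meant by the reference above; this reading is an assumption. The ordered p-values $p_{(1)}\le\cdots\le p_{(K)}$ of the individual tests (17) are scanned from the largest down, and all hypotheses up to the largest $k$ with $p_{(k)}\le\alpha/(K-k+1)$ are rejected. The function name and the p-values are hypothetical.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Hochberg step-up procedure; returns a list of booleans (True = reject)."""
    k_total = len(p_values)
    order = sorted(range(k_total), key=lambda i: p_values[i])  # indices by ascending p
    k_star = 0
    for rank in range(k_total, 0, -1):                          # largest p-value first
        if p_values[order[rank - 1]] <= alpha / (k_total - rank + 1):
            k_star = rank
            break
    reject = [False] * k_total
    for r in range(k_star):
        reject[order[r]] = True
    return reject

# Hypothetical p-values for four participant laboratories
decisions = hochberg_reject([0.001, 0.04, 0.03, 0.20], alpha=0.05)
all_rejected = hochberg_reject([0.01, 0.04, 0.03], alpha=0.05)
```

In the first call only the laboratory with p-value 0.001 is flagged; in the second, the largest p-value already satisfies $0.04\le0.05$, so every hypothesis is rejected.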

3.2. Simulation

In this subsection, we perform a simulation study to compare the behavior of the Wald test statistics developed in the previous section for different numbers of replicas, parameter values and nominal levels of the test. Considering the model defined in (1) and (2), with $p=5$ (number of laboratories, including the reference laboratory) and $m=5$ (number of different engine rotation values), we generated 10,000 samples with 3, 7, 15 and 30 replicas. The parameters of the true unobserved value of the item under testing at the $j$th point ($x_j$, $j=1,\ldots,m$) were set to $\mu_{x1}=10$, $\mu_{x2}=20$, $\mu_{x3}=30$, $\mu_{x4}=40$ and $\mu_{x5}=50$ for the mean values and $\sigma_{x1}=0.24$, $\sigma_{x2}=0.31$, $\sigma_{x3}=0.38$, $\sigma_{x4}=0.45$ and $\sigma_{x5}=0.52$ for the standard deviations. We considered three sets of values for the standard deviation of the measurement error of each laboratory ($i=1,\ldots,p$) at the $j$th engine rotation value.

  1. $\sigma_{ij}^a$: $\sigma_{i1}=0.1$, $\sigma_{i2}=0.2$, $\sigma_{i3}=0.3$, $\sigma_{i4}=0.4$, $\sigma_{i5}=0.5$;

  2. $\sigma_{ij}^b$: $\sigma_{i1}=0.2$, $\sigma_{i2}=0.4$, $\sigma_{i3}=0.6$, $\sigma_{i4}=0.8$, $\sigma_{i5}=1.0$;

  3. $\sigma_{ij}^c$: $\sigma_{i1}=0.3$, $\sigma_{i2}=0.6$, $\sigma_{i3}=0.9$, $\sigma_{i4}=1.2$, $\sigma_{i5}=1.5$.

Moreover, we considered nominal significance levels of 1%, 5% and 10%. The routines were implemented in Ref. [18].

Table 1 shows the mean value, standard deviation (sd) and mean square error (MSE) of the MLEs of the parameters obtained with the EM algorithm presented in Section 2.1 from the samples generated for $\sigma_{ij}^a$ under $H_0$. Clearly, as the number of replicas increases, the sd and MSE decrease and the mean values of the estimates of $\alpha_i$ and $\beta_i$, $i=2,\ldots,5$, approach the true values. On the other hand, the same does not happen for the parameters $\mu_{xj}$, $j=1,\ldots,5$, because $\mu_{xj}$ cannot be estimated consistently. For $\sigma_{ij}^b$ and $\sigma_{ij}^c$, the results were similar and are not shown.

Table 1. MLEs of the parameters with $\sigma_{ij}^a$.

| Parameter | n = 3: Mean | sd | MSE | n = 7: Mean | sd | MSE |
|---|---|---|---|---|---|---|
| α2 | 0.0040 | 0.1264 | 0.0160 | 0.0029 | 0.0823 | 0.0068 |
| α3 | 0.0009 | 0.1328 | 0.0176 | 0.0025 | 0.0892 | 0.0080 |
| α4 | 0.0068 | 0.1242 | 0.0155 | 0.0040 | 0.0829 | 0.0069 |
| α5 | 0.0038 | 0.1299 | 0.0169 | 0.0037 | 0.0865 | 0.0075 |
| β2 | 0.9999 | 0.0067 | 0.0000 | 0.9995 | 0.0044 | 0.0000 |
| β3 | 0.9999 | 0.0072 | 0.0001 | 0.9999 | 0.0047 | 0.0000 |
| β4 | 0.999 | 0.0066 | 0.0000 | 0.9999 | 0.0044 | 0.0000 |
| β5 | 1.0003 | 0.0070 | 0.0000 | 0.9996 | 0.0047 | 0.0000 |
| μx1 | 10.00987 | 0.2489 | 0.0620 | 9.9919 | 0.2485 | 0.0618 |
| μx2 | 20.0023 | 0.3070 | 0.0942 | 20.0008 | 0.3169 | 0.1004 |
| μx3 | 30.0034 | 0.4192 | 0.1757 | 30.0090 | 0.3999 | 0.1600 |
| μx4 | 40.0121 | 0.4669 | 0.2182 | 40.0045 | 0.4525 | 0.2048 |
| μx5 | 50.0373 | 0.5577 | 0.3124 | 50.0074 | 0.5334 | 0.2846 |

| Parameter | n = 15: Mean | sd | MSE | n = 30: Mean | sd | MSE |
|---|---|---|---|---|---|---|
| α2 | 0.0011 | 0.0585 | 0.0034 | 0.0003 | 0.0387 | 0.0014 |
| α3 | −0.0006 | 0.0577 | 0.0033 | 0.0009 | 0.0393 | 0.0015 |
| α4 | 0.0009 | 0.0586 | 0.0034 | 0.0027 | 0.0404 | 0.0016 |
| α5 | 0.0005 | 0.0556 | 0.0031 | 0.0004 | 0.0402 | 0.00165 |
| β2 | 1.0000 | 0.0031 | 0.0000 | 1.0000 | 0.0021 | 0.00000 |
| β3 | 1.0001 | 0.0032 | 0.0000 | 1.0000 | 0.0021 | 0.00000 |
| β4 | 1.0000 | 0.0032 | 0.0000 | 1.0000 | 0.00222 | 0.0000 |
| β5 | 1.0001 | 0.0030 | 0.0000 | 1.0000 | 0.0022 | 0.0000 |
| μx1 | 10.0045 | 0.2506 | 0.0628 | 9.9983 | 0.2352 | 0.0553 |
| μx2 | 19.9984 | 0.3189 | 0.1017 | 20.0020 | 0.2904 | 0.0841 |
| μx3 | 29.9945 | 0.3933 | 0.1547 | 30.0221 | 0.3831 | 0.1472 |
| μx4 | 40.0132 | 0.4678 | 0.2190 | 40.0187 | 0.4444 | 0.1978 |
| μx5 | 49.9891 | 0.5234 | 0.2741 | 50.0036 | 0.5032 | 0.2753 |

Next, we consider the test for the equivalence of all laboratories with respect to the reference laboratory:

$$H_0:\alpha_2=\cdots=\alpha_5=0\quad\text{and}\quad\beta_2=\cdots=\beta_5=1.$$

We obtained the empirical significance levels for the test given in (13). The results are summarized in Table 2.

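Table 2. Empirical sizes for the Wald test statistics for the test $H_0:\alpha_2=\cdots=\alpha_5=0$, $\beta_2=\cdots=\beta_5=1$.

| ni | σija: 1% | 5% | 10% | σijb: 1% | 5% | 10% | σijc: 1% | 5% | 10% |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 0.012 | 0.059 | 0.114 | 0.023 | 0.084 | 0.15 | 0.043 | 0.126 | 0.202 |
| 7 | 0.011 | 0.053 | 0.106 | 0.015 | 0.065 | 0.127 | 0.019 | 0.076 | 0.140 |
| 15 | 0.011 | 0.053 | 0.102 | 0.011 | 0.058 | 0.109 | 0.017 | 0.068 | 0.124 |
| 30 | 0.010 | 0.053 | 0.107 | 0.011 | 0.053 | 0.102 | 0.012 | 0.056 | 0.110 |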

As the number of replicas ($n_i$) increases, the empirical sizes approach the nominal sizes. For the first set of standard deviations of the measurement error of the laboratories ($\sigma_{ij}^a$), the nominal and empirical values are close even for a small number of replicas; however, as these standard deviations increase ($\sigma_{ij}^b$ and $\sigma_{ij}^c$), a larger number of replicas is needed.
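When reading empirical sizes such as those in Table 2, it helps to keep in mind the Monte Carlo error of a proportion estimated from 10,000 samples. The short sketch below computes the binomial standard error at each nominal level; the function name and the two-standard-error band used as a rough yardstick are choices made here, not part of the study.

```python
import math

def mc_standard_error(level, n_sim=10_000):
    """Standard error of an empirical rejection rate when the true size is `level`."""
    return math.sqrt(level * (1.0 - level) / n_sim)

for level in (0.01, 0.05, 0.10):
    se = mc_standard_error(level)
    band = (round(level - 2.0 * se, 4), round(level + 2.0 * se, 4))
    print(level, round(se, 4), band)
```

At the 10% level the standard error is exactly 0.003, so empirical sizes within roughly 0.094 to 0.106 are compatible with the nominal level up to simulation noise.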

Furthermore, to simulate the power of the test for the equivalence of all laboratories with respect to the reference laboratory, we considered gradual departures from the null hypothesis for the second and fourth laboratories and computed the percentage of observed values of the test statistic that exceeded the 95th percentile of the chi-squared distribution with 8 degrees of freedom.

Figure 1 shows the power of the test for different numbers of replicas ($n_i=3,7,15$ and $30$) with the standard deviation of the measurement error of each laboratory at the $j$th engine rotation value given by $\sigma_{ij}^a$ and $\sigma_{ij}^b$, respectively. Notice that in both panels the power of the test increases as the number of replicas increases.

Figure 1.

Simulated power for the Wald test statistics with σija and σijb for the test H0: α2 = ⋯ = α5 = 0, β2 = ⋯ = β5 = 1.

Figure 2 shows the power of the test as the standard deviation of the measurement error of the laboratories increases from σija to σijb for a fixed number of replicas. In all cases, the power of the test under σija is greater than under σijb. Moreover, the distance between the two curves (power under σija and power under σijb) diminishes as the number of replicas increases.

Figure 2.

Simulated power for the Wald test statistics with σija and σijb for the test H0: α2 = ⋯ = α5 = 0, β2 = ⋯ = β5 = 1.

Next, without loss of generality we consider the second laboratory to test for the equivalence of a laboratory with respect to the reference laboratory:

H0: α2 = 0 and β2 = 1.

We obtained the empirical significance levels for the test given in (18). The results are summarized in Table 3 and lead to the same conclusions as those drawn from Table 2 for the equivalence of all laboratories with respect to the reference laboratory.

Table 3.

Empirical sizes for the Wald test statistics for the test H0: α2 = 0, β2 = 1.

  σija σijb σijc
ni 1% 5% 10% 1% 5% 10% 1% 5% 10%
3 0.016 0.065 0.126 0.024 0.081 0.147 0.035 0.114 0.189
7 0.010 0.051 0.101 0.017 0.070 0.129 0.023 0.088 0.151
15 0.010 0.051 0.101 0.013 0.061 0.113 0.016 0.068 0.126
30 0.008 0.048 0.102 0.012 0.053 0.102 0.013 0.063 0.120

Figure 3 shows the power of the test when the standard deviations of the measurement errors of the laboratories are given by σija and σijb, for different numbers of replicas. As the number of replicas increases, the power of the test increases; in addition, when the standard deviations of the measurement errors increase, the power decreases.

Figure 3.

Simulated power for the Wald test statistics with σija and σijb for the test H0: α2 = 0, β2 = 1.

Furthermore, we considered a simulation study in which the data set has the same characteristics as the data set considered in the Application Section, i.e. the same number of laboratories, engine rotation values, number of replicas for each laboratory, and values of σxj2 and σij2, i = 1, …, 8, j = 1, …, 9 and k = 1, …, ni. For αi, βi and μxj, the MLEs of these parameters were used: μ^x1 = 8.8306, μ^x2 = 15.9425, μ^x3 = 26.9652, μ^x4 = 31.5969, μ^x5 = 37.3500, μ^x6 = 44.3796, μ^x7 = 47.5788, μ^x8 = 49.6742 and μ^x9 = 50.6601. See Table 6 for the estimates of αi and βi, i = 2, …, 8. We generated 10,000 samples.

Table 6.

MLEs of the bias parameters.

  Laboratories
i 2 3 4 5 6 7 8
α^i 0.0700 0.1000 0.0658 0.2183 0.1288 −0.0315 0.0063
β^i 0.9661 0.9856 0.9957 0.9871 0.9983 0.9745 0.9913

For the test of equivalence of all laboratories with respect to the reference laboratory:

H0: α2 = ⋯ = α8 = 0 and β2 = ⋯ = β8 = 1,

we obtained the empirical significance levels for α = 1%, α = 5% and α = 10% using the test given in (13); they were 0.012, 0.058 and 0.116, respectively.
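One way to judge how close these empirical sizes are to the nominal levels is to compare the differences with the Monte Carlo standard error of a rejection proportion. The sketch below assumes the 10,000 simulated samples are independent, so that the number of rejections is binomial; the function name is a hypothetical choice.

```python
import math


def monte_carlo_se(nominal, n_samples):
    """Binomial standard error of a rejection proportion at the nominal level."""
    return math.sqrt(nominal * (1.0 - nominal) / n_samples)


n_samples = 10_000  # number of simulated samples reported in the text

# Nominal level -> empirical size reported for the joint test.
reported = {0.01: 0.012, 0.05: 0.058, 0.10: 0.116}

for nominal, empirical in reported.items():
    se = monte_carlo_se(nominal, n_samples)
    z = (empirical - nominal) / se  # standardized distance from nominal
    print(f"nominal {nominal:.2f}: empirical {empirical:.3f}, "
          f"SE {se:.4f}, z = {z:.2f}")
```

A standardized distance well above 2 suggests that the gap reflects the finite-replica behavior of the test rather than simulation noise alone.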

Moreover, to simulate the power of the test, we considered a gradual departure from the null hypothesis for the second and fourth laboratories and obtained the percentage of observed values of the test statistic that exceeded the 95th percentile of the chi-squared distribution with 14 degrees of freedom. The left-hand panel of Figure 4 shows the corresponding power of the test.

Figure 4.

Simulated power for the Wald test statistics: H0: α2 = ⋯ = α8 = 0, β2 = ⋯ = β8 = 1 (left panel) and H0: α2 = 0, β2 = 1 (right panel).

Next, we consider the second laboratory to test for the equivalence of a laboratory with respect to the reference laboratory:

H0: α2 = 0 and β2 = 1.

We obtained the empirical significance levels for α = 1%, α = 5% and α = 10% using the test given in (18); the corresponding values were 0.015, 0.063 and 0.121, respectively. The right-hand panel of Figure 4 shows the power of the test.

In the next section, we apply these results to the real data set used in the stability study to show the usefulness of the proposed methodology.

4. Application

PT determines the performance of individual laboratories for specific measurements. The measurement procedure is developed under the coordination of the reference laboratory (an accredited laboratory). A set of detailed instructions is developed to enable participant laboratories to carry out a measurement without additional information. To specify the item under test, it is necessary to develop and characterize a suitable item. The main property of the item is its stability over time.

In our illustration, GM power train developed an engine in which all basic engine parameters were locked to reduce variability in the engine power measurements. The engine was then tested on an engine dynamometer over a period of time to prove its stability and to characterize the measurand (xj) for each rotation value j, j = 1, …, m. By applying statistical process control techniques, GM power train estimated the stable variance of the measurand (σxj2, j = 1, …, m) under the specified conditions described in the measurement procedure.

Subsequently, the item under test (the engine) was sent to the participating laboratories. Each laboratory measured the engine power according to a given set of instructions and reported its results, together with the uncertainty, to the administrator.

The engine power of the standard engine was measured by eight (p) laboratories at nine (m) engine rotation values. The natural variability ( σxj2) associated with the true unobserved values was evaluated during the stability study and can be found in Table 4.

Table 4.

Standard deviation of the true engine power measurements ( σxj) at the jth engine rotation value.

σx1 σx2 σx3 σx4 σx5 σx6 σx7 σx8 σx9
0.0877 0.1600 0.2720 0.3161 0.3760 0.4480 0.4760 0.5000 0.5080

The variance (σij2) of the measurement error corresponding to the ith laboratory at the jth rotation value, i = 1, …, p; j = 1, …, m, was determined by the combined variance calculated and provided by the ith laboratory following the procedure proposed by ISO GUM [11], as described in the supplemental material. These values can be found in Table 5. The measurements of each laboratory can be found in the Online Resource.

Table 5.

Measurement error standard deviations ( σij) for the ith laboratory at the jth engine rotation value.

  Laboratory
Rotation i = 1 i = 2 i = 3 i = 4 i = 5 i = 6 i = 7 i = 8
j = 1 0.0825 0.0735 0.0224 0.0900 0.2232 0.1005 0.1068 0.1578
j = 2 0.1466 0.1304 0.0424 0.1622 0.3984 0.1808 0.1929 0.2881
j = 3 0.2486 0.2216 0.0707 0.2739 0.6715 0.3058 0.3208 0.4869
j = 4 0.2912 0.2590 0.0831 0.3211 0.7918 0.3578 0.3788 0.5745
j = 5 0.3450 0.3081 0.0985 0.3803 0.9317 0.4250 0.4540 0.6740
j = 6 0.4111 0.3665 0.1166 0.4511 1.1026 0.5052 0.5403 0.7949
j = 7 0.4409 0.3918 0.1253 0.4830 1.1805 0.5374 0.5751 0.8511
j = 8 0.4627 0.4062 0.1300 0.5021 1.2229 0.5560 0.5992 0.8838
j = 9 0.4717 0.4136 0.1327 0.5114 1.2386 0.5644 0.6132 0.8978

First, using the EM algorithm presented in Section 2, we obtained the MLEs of the bias parameters, which are given in Table 6.

Thereafter, using the test given in (13), we tested the equivalence of all laboratories with respect to the reference laboratory. The value of the test statistic was Qw = 2043.90, which far exceeds the 99th percentile of the chi-squared distribution with 14 degrees of freedom (approximately 29.14). Therefore, we conclude that the group of laboratories is not consistent: at least one laboratory has a significant multiplicative or additive bias.

Thus, we performed the multiple tests given in Table 7, which lead to the conclusion that the fourth and fifth laboratories are consistent with the reference laboratory at significance level α = 1%. Because we were performing multiple tests, we applied the corrections developed by Hochberg (1988), Holm (1979) and Hommel (1988) to control the type I error probability for the family of tests. After applying the corrections, we conclude that laboratories 4, 5 and 6 are compliant with the reference laboratory with a familywise error rate smaller than 1%.

Table 7.

Wald test statistics, Qwi, for the hypotheses H0: αi = 0, βi = 1, i = 2, …, 8, with the respective p-values.

  Laboratories
i 2 3 4 5 6 7 8
QWi 517.2679 69.3573 1.9682 6.6394 10.9409 324.5544 17.5634
p-Value 0.0000 0.0000 0.3738 0.0362 0.0042 0.0000 0.0002
p-Value  (Holm) 0.0000 0.0000 0.3738 0.0723 0.0126 0.0000 0.0006
p-Value  (Hochberg) 0.0000 0.0000 0.3738 0.0723 0.0126 0.0000 0.0006
p-Value  (Hommel) 0.0000 0.0000 0.3738 0.0723 0.0126 0.0000 0.0006
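The raw and Holm-adjusted p-values in Table 7 can be checked directly from the Wald statistics. The sketch below assumes a chi-squared reference distribution with 2 degrees of freedom (two restrictions per laboratory), for which the upper tail probability has the closed form exp(−q/2); this assumption reproduces the tabulated values. The helper `holm_adjust` is a hypothetical name for a plain implementation of Holm's step-down adjustment, not code from the authors.

```python
import math

# Wald statistics from Table 7, keyed by laboratory index i.
wald = {2: 517.2679, 3: 69.3573, 4: 1.9682, 5: 6.6394,
        6: 10.9409, 7: 324.5544, 8: 17.5634}

# Assumption: chi-squared reference distribution with 2 degrees of freedom,
# whose survival function is exp(-q / 2).
p_raw = {lab: math.exp(-q / 2.0) for lab, q in wald.items()}


def holm_adjust(p_values):
    """Holm step-down adjusted p-values for a dict {label: p}."""
    m = len(p_values)
    ordered = sorted(p_values, key=p_values.get)  # labels by ascending p
    adjusted = {}
    running_max = 0.0
    for rank, label in enumerate(ordered):
        candidate = min(1.0, (m - rank) * p_values[label])
        running_max = max(running_max, candidate)  # keep adjusted p monotone
        adjusted[label] = running_max
    return adjusted


p_holm = holm_adjust(p_raw)
for lab in sorted(wald):
    print(f"lab {lab}: p = {p_raw[lab]:.4f}, Holm p = {p_holm[lab]:.4f}")
```

With these values, laboratory 6 is rejected at the 1% level before adjustment but not after it, which is why the set of compliant laboratories grows from {4, 5} to {4, 5, 6} once the familywise correction is applied.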

Subsequently, we constructed confidence regions for the seven laboratories with a confidence coefficient of 99% and Bonferroni corrections, so that the familywise error rate is less than 1%. These regions can be found in Figure 5. We can conclude visually that laboratories 4, 5 and 6 are compliant with the reference laboratory. Moreover, none of the seven laboratories shows additive bias.

Figure 5.

Joint confidence regions for the participant laboratories.
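The visual conclusion can be cross-checked numerically under two assumptions that are not stated explicitly in the text: that each region is the Wald-type region obtained by inverting the statistic Qwi of Table 7, and that the reference distribution is chi-squared with 2 degrees of freedom. Under these assumptions the point (αi, βi) = (0, 1) lies inside laboratory i's Bonferroni-corrected region exactly when Qwi does not exceed the corresponding quantile, which has a closed form.

```python
import math

n_labs = 7          # participant laboratories i = 2, ..., 8
familywise = 0.01   # target familywise error rate
per_region_alpha = familywise / n_labs  # Bonferroni split

# Chi-squared quantile with 2 degrees of freedom in closed form:
# P(Q > c) = exp(-c / 2)  =>  c = -2 * log(alpha).
threshold = -2.0 * math.log(per_region_alpha)

# Wald statistics from Table 7, keyed by laboratory index i.
wald = {2: 517.2679, 3: 69.3573, 4: 1.9682, 5: 6.6394,
        6: 10.9409, 7: 324.5544, 8: 17.5634}

# Assumed equivalence: (0, 1) is inside region i iff Qwi <= threshold.
inside = sorted(lab for lab, q in wald.items() if q <= threshold)
print(f"threshold = {threshold:.3f}, labs whose region contains (0, 1): {inside}")
```

Under these assumptions the check agrees with the laboratories identified as compliant in the text.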

5. Discussion

In this work, we propose a strategy to evaluate PT results with multivariate response. This is the most common case of PT, as in general, the item under test is measured at different levels of values.

PT determines the performance of individual laboratories for specific measurements, which means that PT compares the measurement results obtained by different laboratories. The usual comparative calibration model assumes that the measurand (x) is independent among the laboratories (participants and reference), as described in Refs. [4,7]. However, this is not the case in many PT programs. As we have only one item under test (one engine), there is a natural dependency among all measurements at the same level (rotation). To fill this gap in the literature on measurement comparison models, we introduce a suitable ultrastructural model that encompasses this dependency.

As a consequence of this dependency, the observed information matrix does not converge to the expected information matrix and the usual asymptotic theory is not applicable. In fact, the observed information matrix converges in probability to a random matrix. Furthermore, this random matrix has null components related to the mean of the measurand and, in addition, the corresponding component of the score function also converges in probability to zero. In this paper, we extended the asymptotic theory developed in Refs. [20,24,25] to derive a Wald-type test in this scenario, which is the basis for assessing the competence of the participating laboratories.

The asymptotic theory developed in this work is based on two results. First, the observed information matrix converges in probability to a random matrix (see Lemma 3.1). Second, based on the fact that the log-likelihood function is smooth, we extend a result in Ref. [20] to obtain Theorem 3.3. We then apply standard arguments from asymptotic theory to derive a Wald-type test. As a consequence, for any smooth log-likelihood function satisfying Lemma 3.1 and Theorem 3.3, the results of this paper can be applied to derive a Wald-type test.

To assess the behavior of the asymptotic results, a simulation study was performed. In general, we conclude that the performance of the asymptotic results is closely related to the sample size and the magnitude of the variance components. As the variance components of the reference laboratory are known before the start of the PT program, we can use the empirical power function to estimate the sample size.

To illustrate the developed methodology, we analyzed the results of the proficiency test related to the engine power in the Application Section. In the real data set considered here, we have eight laboratories including the reference laboratory. At the beginning of the program, the reference laboratory evaluated the stability of the engine under test and determined the component of variance related to the true value. To ensure comparability of results, the reference laboratory measured the engine at the beginning and end of the PT program. Each participating laboratory reported its measurements and respective uncertainties. The results of the participant laboratories were compared with the results of the reference laboratory using Wald statistics, as presented in Section 4.

Although the proposed methodology was illustrated with PT results, it can be applied in any situation where the interest is in comparing measurements obtained by different methods with a reference value.

Preprint

arXiv:2011.00640 [math.ST]

Supplementary Material

Supplementary Data
supplementary Material

Funding Statement

The research was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior– Brasil(CAPES)– Finance Code 001. Research carried out using the computational resources of the Center for Mathematical Sciences Applied to Industry (CeMEAI) is funded by FAPESP (Grant Number 2013/07375-0).

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Altman D.G. and Bland J.M., Comparison of methods of measuring blood pressure, J. Epidemiol. Community Health 40 (1986), pp. 274–277.
  • 2. Barnett V.D., Simultaneous pairwise linear structural relationships, Biometrics 25 (1969), pp. 129–142.
  • 3. Cheng C.-L. and Van Ness J.W., Statistical Regression with Measurement Error, Kendall's Library of Statistics, Vol. 6, John Wiley & Sons, New York, 1997.
  • 4. Cheng C.-L. and Van Ness J.W., Statistical Regression with Measurement Error, Arnold, London and Oxford University Press, New York, 1999.
  • 5. Dempster A.P., Laird N.M., and Rubin D.B., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B (Methodol.) 39 (1977), pp. 1–38.
  • 6. EA-4/18, 2010. Available at http://www.european-accreditation.org/publication/ea-4-18-inf-rev00-june-2010
  • 7. Giménez P. and Patat M.L., Local influence for functional comparative calibration models with replicated data, Stat. Pap. 55 (2014), pp. 431–454.
  • 8. Gleser L.J., Assessing uncertainty in measurement, Stat. Sci. 13 (1998), pp. 277–290.
  • 9. Hochberg Y., A sharper Bonferroni procedure for multiple tests of significance, Biometrika 75 (1988), pp. 800–802.
  • 10. ISO 13528, Statistical methods for use in proficiency testing by interlaboratory comparisons, Tech. Rep., International Organization for Standardization, Geneva, 2015.
  • 11. ISO GUM, Guide to the Expression of Uncertainty in Measurement (GUM), BIPM, IEC, IFCC, IUPAC, IUPAP, OIML, 1995.
  • 12. ISO/IEC 17043, Conformity assessment general requirements for proficiency testing, Tech. Rep., International Organization for Standardization/International Electrotechnical Commission, Geneva, 2010.
  • 13. Jaech J.L., Statistical Analysis of Measurement Errors, Vol. 2, Wiley, New York, 1985.
  • 14. Kimura D.K., Functional comparative calibration using an EM algorithm, Biometrics 48 (1992), pp. 1263–1271.
  • 15. Linsinger T.P.J., Kandler W., Krska R., and Grasserbauer M., The influence of different evaluation techniques on the results of interlaboratory comparisons, Accreditation Qual. Assur. 3 (1998), pp. 322–327.
  • 16. Page G.L. and Vardeman S.B., Using Bayes methods and mixture models in inter-laboratory studies with outliers, Accreditation Qual. Assur. 15 (2010), pp. 379–389.
  • 17. Pinto D.L., Aoki R., and Silva G.F., Statistical analysis of proficiency testing results under elliptical distributions, Comput. Stat. Data Anal. 53 (2009), pp. 1427–1439. ISSN 0167-9473. doi: 10.1016/j.csda.2008.12.003. Available at http://www.sciencedirect.com/science/article/pii/S0167947308005720
  • 18. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2016. Available at https://www.R-project.org/
  • 19. Rosario P., Martínez J.L., and Miguel Silván J., Comparison of different statistical methods for evaluation of proficiency test data, Accreditation Qual. Assur. 13 (2008), pp. 493–499.
  • 20. Sweeting T.J., Uniform asymptotic normality of the maximum likelihood estimator, Ann. Stat. 8 (1980), pp. 1375–1381. ISSN 00905364. Available at http://www.jstor.org/stable/2240949
  • 21. Theobald C.M. and Mallinson J.R., Comparative calibration, linear structural relationships and congeneric measurements, Biometrics 34 (1978), pp. 39–45.
  • 22. Toman B., Bayesian approaches to calculating a reference value in key comparison experiments, Technometrics 49 (2007), pp. 81–87.
  • 23. Vilca-Labra F., Aoki R., and Zeller C.B., Hypotheses testing for structural calibration model, Stat. Pap. 52 (2011), pp. 553–565.
  • 24. Weiss L., Asymptotic properties of maximum likelihood estimators in some nonstandard cases, J. Am. Stat. Assoc. 66 (1971), pp. 345–350.
  • 25. Weiss L., Asymptotic properties of maximum likelihood estimators in some nonstandard cases, II, J. Am. Stat. Assoc. 68 (1973), pp. 428–430.
