Abstract
A fully Bayesian method for quantitative genetic analysis of data consisting of ranks of, e.g., genotypes, scored at a series of events or experiments is presented. The model postulates a latent structure, with an underlying variable realized for each genotype or individual involved in the event. The rank observed is assumed to reflect the order of the values of the unobserved variables, i.e., the classical Thurstonian model of psychometrics. Parameters driving the Bayesian hierarchical model include effects of covariates, additive genetic effects, permanent environmental deviations, and components of variance. A Markov chain Monte Carlo implementation based on the Gibbs sampler is described, and procedures for inferring the probability of yet to be observed future rankings are outlined. Part of the model is rendered nonparametric by introducing a Dirichlet process prior for the distribution of permanent environmental effects. This can lead to potential identification of clusters of such effects, which, in some competitions such as horse races, may reflect forms of undeclared preferential treatment.
LATENT variable models for describing mechanisms by which some continuous scale maps into ordered or unordered categories of response have a long tradition in quantitative genetics and psychometry. For example, the threshold-liability model for ordinal categories dates back to Wright (1934), Dempster and Lerner (1950), Grüneberg (1952), and Falconer (1965); see Gianola (1982) for a review. Extremal models were pioneered in psychology by Thurstone (1927) and were adapted by Bock and Jones (1968) to explain choices between unordered alternatives.
Suppose that an observation is an assignment into one of M mutually exclusive and exhaustive categories of response. The classical Thurstonian extremal model postulates the existence of some latent or unobserved continuous valued vector $\mathbf{y} = (y_1, y_2, \ldots, y_M)'$, such that category i is observed when $y_i$ is larger than the other M − 1 elements of the vector. The probability of observing an assignment into category M, say, is given by

$$\Pr(\text{category } M \mid \text{parameters}) = \Pr\left(y_M > y_1,\; y_M > y_2,\; \ldots,\; y_M > y_{M-1} \mid \text{parameters}\right),$$
where “parameters” are typically unknown quantities driving the process. The extremal model can also be applied to data consisting of ranks, presumably reflecting the outcome of some underlying continuous process, e.g., competing ability. For instance, Henery (1981) and Tavernier (1991) used this idea in the context of horse races, and Johnson et al. (2002) for analysis of ranked data collected in primate intelligence experiments. Ranked data also arise in human genetics (e.g., Lawrence et al. 1994). In Henery (1981) and Tavernier (1991), the setting is one where several individuals compete in a series of events (e.g., horse races). Prior to the event, each individual generates an unobservable random variable, or “liability of winning the event,” and these liabilities are regarded as mutually independent, given some parameters. Posterior to the event, the observed ranking corresponds to the order of the realized values of these liabilities. Clearly, the latent variables are no longer independent, a posteriori, because the ranking implies that $y_{[1]} > y_{[2]} > \cdots > y_{[n]}$, where $y_{[1]}$ is the liability of the individual ranked first, and so on. In particular, Tavernier (1991) described an empirical Bayes procedure for inferring features of the latent distribution, such as breeding value of racing horses or competing ability of each of the members of a list of competitors. The approach of Tavernier (1991) is based on calculating, iteratively, a conditional posterior mode, given some values of dispersion parameters; these, in turn, are estimated using an approximation to marginal maximum-likelihood estimation. Her procedure uses statistical and numerical approximations that are not needed in a Bayesian implementation based on Markov chain Monte Carlo (MCMC) sampling.
The objective of this study is to present theory for a fully Bayesian method for quantitative genetic analysis of data consisting of ranks scored at each of a series of runs, experiments, or events. The method uses a Thurstonian representation of the mechanism by which an underlying continuum is translated into observed ranks. The parameters driving the Bayesian hierarchical model include additive genetic and permanent environmental effects, and the MCMC implementation allows easy computation of probabilities of yet to be observed runs. The article is organized as follows. PROBABILITY DISTRIBUTION OF OBSERVED RANKS outlines the Thurstonian concept and gives the sampling model for the observed data. In BAYESIAN STRUCTURE, a hierarchical specification is outlined. CONDITIONAL MAXIMUM A POSTERIORI ESTIMATION reviews the empirical Bayes methodology suggested by Tavernier (1991) and discusses potential shortcomings of this type of treatment of data. FULLY BAYESIAN ANALYSIS gives details of the MCMC scheme proposed and presents procedures for inference and prediction. Some assumptions are relaxed in NONPARAMETRIC DISTRIBUTION FOR EFFECTS. Specifically, the assumption of a normal distribution of permanent environmental deviates is replaced by a Dirichlet process, using a normal distribution as baseline probability measure; a Gibbs sampling scheme is presented. The article concludes with a discussion.
PROBABILITY DISTRIBUTION OF OBSERVED RANKS
Suppose nk individuals “compete” in event k and that the information available for analysis is that furnished by the ranks observed in the event. It is assumed that no ties are possible, but this restriction is easy to lift. As in Henery (1981) and Tavernier (1991), it is supposed that the observed ranking is a manifestation of the magnitudes of latent variables, liability of winning the event, which are specific to each of the individuals.
The model for the unobserved liability $l_{jk}$ of individual j winning event k is the standard specification of quantitative genetics,

$$l_{jk} = \mathbf{x}'_{jk}\boldsymbol{\beta} + a_j + p_j + g_k + e_{jk} \quad (1)$$

(Dempster and Lerner 1950; Falconer 1965; Gianola 1982; Gianola and Foulley 1983), where β is a set of F location effects affecting liability and x′jk is a known incidence vector peculiar to individual j in event k; aj and pj are the additive genetic and permanent environmental effects, respectively, of individual j on liability; gk is an effect peculiar to event k (which may or may not be needed in the model); and $e_{jk}$ is a random residual, distributed independently of all terms in the model, within and across subjects, with $\sigma^2_e$ as the residual variance. In (1), $e_{jk} \sim N(0, \sigma^2_e)$. The liabilities are not observable, so it is assumed that $\sigma^2_e = 1$ for parameter identification purposes. Hence, all effects are measured in units of residual standard deviation.
The liabilities of the “competitors” present in event k can be packed into the nk × 1 vector lk, and the corresponding vectorial representation is

$$\mathbf{l}_k = \mathbf{X}_k\boldsymbol{\beta} + \mathbf{Z}_{ak}\mathbf{a} + \mathbf{Z}_{pk}\mathbf{p} + \mathbf{Z}_{gk}\mathbf{g} + \mathbf{e}_k, \qquad \mathbf{e}_k \sim N\left(\mathbf{0}, \mathbf{I}_{n_k}\right), \quad (2)$$

where a is a Q × 1 vector of additive genetic effects of all potential participants or genetically related individuals; p is a C × 1 vector of permanent environmental effects (including perhaps nonadditive genetic deviations) of individuals with at least one record of competition; g is a K × 1 vector of event effects; ek is a residual vector; $\mathbf{I}_{n_k}$ is an identity matrix of order nk; and Xk, Zak, Zpk, and Zgk are incidence matrices of appropriate order. Vertical concatenation of all liabilities into an $n \times 1$ vector l, with $n = \sum_{k=1}^{K} n_k$, leads to the representation

$$\mathbf{l} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_a\mathbf{a} + \mathbf{Z}_p\mathbf{p} + \mathbf{Z}_g\mathbf{g} + \mathbf{e}, \quad (3)$$

where $\mathbf{X} = \left[\mathbf{X}'_1, \mathbf{X}'_2, \ldots, \mathbf{X}'_K\right]'$, with analogous stacking for $\mathbf{Z}_a$, $\mathbf{Z}_p$, $\mathbf{Z}_g$, and $\mathbf{e}$, and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}_n)$.
Prior to the competition, and conditionally on β, a, p, and g, the joint density of the yet to be realized liabilities, in view of the mutual independence of the residuals, is

$$p(\mathbf{l} \mid \boldsymbol{\beta}, \mathbf{a}, \mathbf{p}, \mathbf{g}) = \prod_{k=1}^{K}\prod_{j=1}^{n_k} \phi\left(l_{jk} \mid \mu_{jk}, 1\right)^{\delta_{jk}}, \quad (4)$$

where $\phi(l_{jk} \mid \mu_{jk}, 1)$ denotes the density of the normally distributed liability ljk, with mean $\mu_{jk} = \mathbf{x}'_{jk}\boldsymbol{\beta} + a_j + p_j + g_k$ and variance 1, and $\boldsymbol{\delta} = \{\delta_{jk}\}$ is a set of indicator variables; i.e., δjk = 1 if competitor j is present in event k and 0 otherwise.
Consider competition k, such that the “winner” is the individual with the largest liability among the nk competitors. An ordered variable is denoted as $l_{[i]k}$, the realized liability of the individual attaining rank i = 1, 2, … , nk in the event; for instance, if a competitor whose identification is 20 places second in the event, then the corresponding liability is denoted as $l_{[2]k} = l_{20,k}$. Following Henery (1981) and Tavernier (1991), the probability that a given order or ranking is observed in event k is given by

$$\Pr\left(l_{[1]k} > l_{[2]k} > \cdots > l_{[n_k]k} \mid \boldsymbol{\theta}\right), \quad (5)$$

where $\boldsymbol{\theta} = (\boldsymbol{\beta}', \mathbf{a}', \mathbf{p}', \mathbf{g}')'$ is the vector of location parameters. This probability can also be expressed in terms of $n_k - 1$ contrasts between adjacent liabilities being >0. Since the latent variables are not observable, one can “anchor” the liability of the winner to 0, so that all other liabilities are negative; the mean liability of a “perfect” competitor (i.e., one winning all events) would then be null. The probability of the ranks observed in all K competitions (DATA), assuming that outcomes of different races are conditionally independent, given θ, is

$$p(\text{DATA} \mid \boldsymbol{\theta}) = \prod_{k=1}^{K}\Pr\left(l_{[1]k} > l_{[2]k} > \cdots > l_{[n_k]k} \mid \boldsymbol{\theta}\right). \quad (6)$$
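To make the sampling model concrete, the short Python sketch below (all names and numbers are illustrative, not from the article) simulates one event under (1) with unit residual variance: liabilities are drawn around competitor-specific means, and only their order is kept as the observed datum.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative event with four competitors; mu_jk = x'_jk beta + a_j + p_j + g_k
mu = np.array([0.8, -0.2, 0.5, 0.0])

# Model (1) with sigma_e = 1: latent liabilities around the means
liab = mu + rng.normal(size=mu.size)

# Only the ranking is observed: indices ordered from winner to last place
ranking = np.argsort(-liab)
print("realized liabilities:", np.round(liab, 2))
print("observed ranking (competitor indices):", ranking)
```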
BAYESIAN STRUCTURE
Consider now the problem of inferring, via the Bayesian approach, unobserved features of interest, e.g., the vector of additive genetic effects a, as would be the case in horse breeding, or the linear combination v = a + p, which can be termed “competing ability.” Define $\boldsymbol{\Lambda} = (\sigma^2_a, \sigma^2_p, \sigma^2_g)'$, a vector of variance parameters. Assume that the density of the joint prior distribution of θ and Λ has the form

$$p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid H) = N\left(\boldsymbol{\beta} \mid \mathbf{0}, \mathbf{I}\sigma^2_\beta\right) N\left(\mathbf{a} \mid \mathbf{0}, \mathbf{A}\sigma^2_a\right) N\left(\mathbf{p} \mid \mathbf{0}, \mathbf{I}\sigma^2_p\right) N\left(\mathbf{g} \mid \mathbf{0}, \mathbf{I}\sigma^2_g\right)\prod_{i \in \{a, p, g\}}\chi^{-2}\left(\sigma^2_i \mid \nu_i, S^2_i\right). \quad (7)$$
Above, $N(\boldsymbol{\beta} \mid \mathbf{0}, \mathbf{I}\sigma^2_\beta)$, $N(\mathbf{a} \mid \mathbf{0}, \mathbf{A}\sigma^2_a)$, $N(\mathbf{p} \mid \mathbf{0}, \mathbf{I}\sigma^2_p)$, and $N(\mathbf{g} \mid \mathbf{0}, \mathbf{I}\sigma^2_g)$ denote multivariate normal densities of orders F, Q, C, and K, respectively; $\sigma^2_\beta$ is a variance reflecting uncertainty about location parameters in β (it is assumed known and large); A is the numerator additive relationship matrix (Henderson 1976) and $\sigma^2_a$ is the additive genetic variance; $\sigma^2_p$ is the variance of permanent environmental effects and $\sigma^2_g$ is the variance between events. In addition, $\chi^{-2}(\sigma^2_i \mid \nu_i, S^2_i)$ is the density of a scaled inverted chi-square distribution on νi degrees of freedom, with $S^2_i$ interpretable as a prior guess for $\sigma^2_i$ ($i = a, p, g$), and $H = \{\sigma^2_\beta, \nu_a, S^2_a, \nu_p, S^2_p, \nu_g, S^2_g\}$ is a set of known hyperparameters.
The density of the joint posterior distribution of θ and Λ is proportional to the product of (6) and (7):

$$p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \text{DATA}, H) \propto p(\text{DATA} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid H). \quad (8)$$
CONDITIONAL MAXIMUM A POSTERIORI ESTIMATION
Prior to the advent of MCMC procedures for sampling from posterior distributions, it was standard to use two-stage approaches to carry out an approximate Bayesian analysis. In the first stage, a Gaussian approximation to $[\boldsymbol{\theta} \mid \boldsymbol{\Lambda}, \text{DATA}]$ is used to obtain an approximate maximum marginal-likelihood estimate of Λ (Gianola et al. 1986; Foulley et al. 1987), $\hat{\boldsymbol{\Lambda}}$, say. In the second stage, the joint mode of the conditional posterior distribution $[\boldsymbol{\theta} \mid \hat{\boldsymbol{\Lambda}}, \text{DATA}]$ is used to obtain point estimates of θ (Gianola and Foulley 1983) given $\hat{\boldsymbol{\Lambda}}$, or “conditional maximum a posteriori” (MAP). If the distribution $[\boldsymbol{\theta} \mid \boldsymbol{\Lambda}, \text{DATA}]$ is exactly Gaussian, its posterior mode yields directly the posterior means of each of the elements of θ (Gianola and Fernando 1986). If, in addition, a flat prior is used for β, the posterior mean vector of $[\boldsymbol{\theta} \mid \boldsymbol{\Lambda}, \text{DATA}]$ yields the best linear unbiased estimator of β and the best linear unbiased predictors of a, p, and g. Also, if a flat prior is assigned to β and to each of the elements of Λ, under Gaussian assumptions the elements of the mode of the marginal distribution $[\boldsymbol{\Lambda} \mid \text{DATA}]$ coincide with the restricted maximum-likelihood estimates of the variance components (Harville 1974).
Tavernier (1990, 1991) used this type of approach for finding the θ-maximizer of the log of the conditional posterior density $p(\boldsymbol{\theta} \mid \hat{\boldsymbol{\Lambda}}, \text{DATA})$, assuming a flat prior for β. The objective function for the model in (1), assuming a normal prior for β, is

$$L(\boldsymbol{\theta}) = \sum_{k=1}^{K}\log\Pr\left(l_{[1]k} > l_{[2]k} > \cdots > l_{[n_k]k} \mid \boldsymbol{\theta}\right) - \frac{1}{2}\left(\frac{\boldsymbol{\beta}'\boldsymbol{\beta}}{\sigma^2_\beta} + \frac{\mathbf{a}'\mathbf{A}^{-1}\mathbf{a}}{\sigma^2_a} + \frac{\mathbf{p}'\mathbf{p}}{\sigma^2_p} + \frac{\mathbf{g}'\mathbf{g}}{\sigma^2_g}\right) + \text{constant}. \quad (9)$$
Tavernier (1990, 1991) employed the Newton–Raphson algorithm for maximizing (9) in the context of horse races. This requires calculation of first and second derivatives of (9) with respect to θ. The gradient vector and the Hessian matrix needed for the Newton–Raphson machinery are

$$\mathbf{g}(\boldsymbol{\theta}) = \frac{\partial L(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \qquad \text{and} \qquad \mathbf{H}(\boldsymbol{\theta}) = \frac{\partial^2 L(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}'},$$

respectively, with iterates

$$\boldsymbol{\theta}^{(t+1)} = \boldsymbol{\theta}^{(t)} - \mathbf{H}^{-1}\left(\boldsymbol{\theta}^{(t)}\right)\mathbf{g}\left(\boldsymbol{\theta}^{(t)}\right).$$

These derivatives are not trivial and involve integration of multivariate normal densities, as well as differentiation under the integral sign. Tavernier (1991) gives approximations to first and second derivatives that require calculation of expectations, variances, and covariances of order statistics. Also, numerical calculation of high-order multivariate normal integrals is cumbersome. Since the Newton–Raphson algorithm is iterative and the number of competitors and races in a data set can be very large, the procedure is numerically intensive, as a large number of calculations must be effected.
Aside from these technical issues, the procedure relies on statistical approximations that may not always be adequate. For instance, in a high-dimensional, nonnormal, multivariate distribution, the components of the mode of the joint posterior density may be far from the corresponding posterior expectations. Also, computing a measure of precision of estimates of θ is a task by itself: a Gaussian approximation to the conditional (given Λ) posterior covariance matrix is obtained by inverting the negative of the Hessian matrix and by evaluating this inverse at the converged values of θ. The diagonal elements of this inverse give a large sample approximation to the conditional posterior variances of the elements of θ. Last but not least, the procedure for estimating parameters in Λ is also based on Gaussian approximations and, furthermore, the approximate modal inferences about θ do not take into account the uncertainty of the approximate estimates of Λ. All these difficulties can be circumvented via MCMC procedures, as discussed in the following sections.
FULLY BAYESIAN ANALYSIS
Joint posterior distribution:
The Bayesian structure given above is maintained, but the parameter vector is augmented with the unobserved liabilities as latent variables (Tanner and Wong 1987; Sorensen et al. 1995). A main difference from a standard latent-variable problem is that, in our setting, the sample space of the liabilities is order constrained within events.
After augmentation with the liabilities l, the joint posterior density is expressible as

$$p(\boldsymbol{\theta}, \boldsymbol{\Lambda}, \mathbf{l} \mid \text{DATA}, H) \propto \left\{\prod_{k=1}^{K}\left[\prod_{j=1}^{n_k}\phi\left(l_{jk} \mid \mu_{jk}, 1\right)^{\delta_{jk}}\right] 1\left(\mathbf{l}_k \in I_k\right)\right\} p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid H). \quad (10)$$

Above, the indicator set Ik denotes the order of liabilities, or ranking attained, observed in competition k, and $1(\mathbf{l}_k \in I_k)$ takes the value 1 if the liabilities are consistent with that ranking and 0 otherwise. For example, if competitors 1, 5, 7, and 14 participate in event 20, and the ranking observed is 5 > 14 > 1 > 7, then, following the notation employed in (10), $I_{20} = \{l_{5,20} > l_{14,20} > l_{1,20} > l_{7,20}\}$.
Markov chain Monte Carlo sampling:
A Gibbs sampler is described for effecting draws from the joint posterior distribution (10). This is a well-known procedure for sampling from joint distributions; see, e.g., Gilks et al. (1996) and Robert and Casella (2005). Briefly, the Gibbs sampler loops through all conditional posterior distributions, with uncertain quantities sampled either blockwise or piecewise. All samples from the fully conditional distributions are accepted with probability 1. Samples from early iterations are discarded as “burn-in”; subsequently, at the end of each cycle of sampling, the coordinates of the sampled vector are regarded as draws from the corresponding marginal posterior distributions. The chain can be thinned, such that samples are as weakly serially correlated as desired; if storage capacity is not a limiting factor, however, it is better to use all samples in the post-Gibbs analysis. Convergence diagnostics and processing issues are discussed in Cowles and Carlin (1996), Sorensen and Gianola (2002), and Gelman et al. (2003).
Sampling of liabilities:
The form of (10) indicates that the conditional posterior distribution of the liabilities in event k has the form

$$p(\mathbf{l}_k \mid \text{ELSE}) \propto \left[\prod_{j=1}^{n_k}\phi\left(l_{jk} \mid \mu_{jk}, 1\right)^{\delta_{jk}}\right] 1\left(\mathbf{l}_k \in I_k\right), \quad (11)$$

where ELSE denotes the unknown liabilities at all other events, θ, Λ, δ, and DATA. Note that, conditionally on ELSE, the liabilities at different events are mutually independent. However, within event k, liabilities are not independent, because of the sample space constraint imposed by knowledge of its outcome, represented by Ik. The dependence arises through the sample space, but the kernel of the conditional distribution of the nk liabilities in event k, given ELSE, is proportional to the product of the kernels under conditional independence. Sampling from the space-constrained distribution is straightforward:

- Draw “independently” the nk liabilities of competitors in event k (whose indexes are in Ik) from the distributions $N(\mu_{jk}, 1)$, j ∈ Ik.
- If the liabilities sampled satisfy the order constraint in Ik, then accept the vector lk as a draw from the posterior distribution. Otherwise, reject the proposal and repeat the sampling until a valid draw is effected.
This procedure, although simple, is typically inefficient and may have a large rejection rate. A more efficient sampling technique, described by Devroye (1986), is given in more detail in Sorensen and Gianola (2002). To illustrate, let the outcome of competition 20 be as in the example above; that is, $I_{20} = \{l_{5,20} > l_{14,20} > l_{1,20} > l_{7,20}\}$. Hence, in event 20, the sample space of the liabilities is l7,20 < l1,20 < l14,20 < l5,20. To sample the four liabilities, one proceeds from either first to last or last to first (the liability of the winner can be set equal to zero, as stated above). For instance:

- Sample l7,20 from $N(\mu_{7,20}, 1)$ without any constraint. This is the realized value of the liability of the last competitor.
- Sample l1,20 from the truncated normal distribution $N(\mu_{1,20}, 1)$, left-truncated at l7,20, where the parameters are as in the absence of truncation. The realized value is formed as
$$l_{1,20} = \mu_{1,20} + \Phi^{-1}\left\{\Phi\left(l_{7,20} - \mu_{1,20}\right) + U\left[1 - \Phi\left(l_{7,20} - \mu_{1,20}\right)\right]\right\},$$
where $\Phi(\cdot)$ is the standard normal distribution function and U is a uniform random number in (0,1).
- Sample l14,20 from the truncated normal process $N(\mu_{14,20}, 1)$, left-truncated at l1,20, forming the realized value as
$$l_{14,20} = \mu_{14,20} + \Phi^{-1}\left\{\Phi\left(l_{1,20} - \mu_{14,20}\right) + U\left[1 - \Phi\left(l_{1,20} - \mu_{14,20}\right)\right]\right\}.$$
- Sample the liability of the winner from $N(\mu_{5,20}, 1)$, left-truncated at l14,20, as
$$l_{5,20} = \mu_{5,20} + \Phi^{-1}\left\{\Phi\left(l_{14,20} - \mu_{5,20}\right) + U\left[1 - \Phi\left(l_{14,20} - \mu_{5,20}\right)\right]\right\}.$$

Once the K events are processed independently, this produces a realization of the vector of liabilities at iteration s, $\mathbf{l}^{(s)}$, which becomes the “data” in model (3) at iteration s = 1, 2, … , S of the Gibbs sampler.
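The last-to-first scheme above is easy to code with the inverse-CDF formulas just given. The Python sketch below is a minimal illustration (function name and inputs are ours, not from the article): it draws the liabilities of one event given the means and the observed order, each draw being left-truncated at the previously sampled value.

```python
import numpy as np
from scipy.stats import norm

def sample_event_liabilities(mu, order, rng):
    """mu: means mu_jk of the n_k competitors in the event;
    order: competitor indices from winner to last place.
    Returns liabilities satisfying l[winner] > ... > l[last]."""
    liab = np.empty_like(mu, dtype=float)
    lower = -np.inf
    for j in reversed(order):                  # last place first, winner last
        if np.isinf(lower):
            liab[j] = rng.normal(mu[j], 1.0)   # last competitor: unconstrained
        else:
            # inverse-CDF draw from N(mu_j, 1) left-truncated at `lower`:
            # l = mu + Phi^{-1}{ Phi(lower - mu) + U [1 - Phi(lower - mu)] }
            p_low = norm.cdf(lower - mu[j])
            liab[j] = mu[j] + norm.ppf(p_low + rng.uniform() * (1.0 - p_low))
        lower = liab[j]                        # next draw must exceed this one
    return liab

rng = np.random.default_rng(2)
# Event 20 of the example (ranking 5 > 14 > 1 > 7), competitors coded 0..3
mu = np.array([0.3, -0.1, 0.2, 0.0])
print(sample_event_liabilities(mu, order=[0, 1, 2, 3], rng=rng))
```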
Sampling of parameters:
Given the liabilities, the conditional posterior density of all parameters can be deduced from (10) after fixing the vector l. This yields

$$p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \mathbf{l}, H) \propto N\left(\mathbf{l} \mid \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_a\mathbf{a} + \mathbf{Z}_p\mathbf{p} + \mathbf{Z}_g\mathbf{g}, \mathbf{I}\right) p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid H). \quad (12)$$

Note that, given the liabilities, the information provided by the sets Ik is redundant. The density above is that of a Bayesian mixed-effects model under Gaussian assumptions (Gianola and Fernando 1986; Wang et al. 1993, 1994) with the sampled liabilities playing the role of data. The conditional posterior distributions of θ and Λ are in closed form (Wang et al. 1993, 1994), as follows. The location parameters θ can be sampled from the multivariate normal process

$$\boldsymbol{\theta} \mid \mathbf{l}, \boldsymbol{\Lambda}, H \sim N\left(\hat{\boldsymbol{\theta}}, \mathbf{C}^{-1}\right), \quad (13)$$

where, with $\mathbf{W} = [\mathbf{X}\;\; \mathbf{Z}_a\;\; \mathbf{Z}_p\;\; \mathbf{Z}_g]$,

$$\mathbf{C} = \mathbf{W}'\mathbf{W} + \begin{bmatrix} \mathbf{I}\sigma_\beta^{-2} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{A}^{-1}\sigma_a^{-2} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{I}\sigma_p^{-2} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{I}\sigma_g^{-2} \end{bmatrix}$$

and

$$\hat{\boldsymbol{\theta}} = \mathbf{C}^{-1}\mathbf{W}'\mathbf{l}.$$
The draw from distribution (13) can be done either in a single pass (Garcia-Cortés and Sorensen 1996) or element by element (Wang et al. 1994); the rate of convergence to the equilibrium distribution is lower and the autocorrelation between samples higher when sampling is piecewise.
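As a sketch of the single-pass option, the Python function below (illustrative, and assuming matrices small enough to factor densely) draws θ from (13) via a Cholesky factorization of C; here Sigma_inv stands for the block-diagonal prior precision displayed above, and W = [X Za Zp Zg].

```python
import numpy as np
from scipy.linalg import cho_solve, solve_triangular

def draw_theta(W, l, Sigma_inv, rng):
    """One blockwise draw from (13): theta ~ N(theta_hat, C^{-1}),
    with C = W'W + Sigma_inv and theta_hat = C^{-1} W'l."""
    C = W.T @ W + Sigma_inv
    L = np.linalg.cholesky(C)                  # C = L L'
    theta_hat = cho_solve((L, True), W.T @ l)  # solve C theta_hat = W'l
    z = rng.standard_normal(C.shape[0])
    # theta = theta_hat + L'^{-1} z has covariance L'^{-1} L^{-1} = C^{-1}
    return theta_hat + solve_triangular(L.T, z, lower=False)
```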
All conditional posterior distributions of the dispersion parameters are scaled inverted chi square and are mutually independent. The distributions are

$$\sigma^2_a \mid \text{ELSE} \sim \left(\mathbf{a}'\mathbf{A}^{-1}\mathbf{a} + \nu_a S^2_a\right)\chi^{-2}_{Q + \nu_a}, \quad (14)$$

$$\sigma^2_p \mid \text{ELSE} \sim \left(\mathbf{p}'\mathbf{p} + \nu_p S^2_p\right)\chi^{-2}_{C + \nu_p}, \quad (15)$$

and

$$\sigma^2_g \mid \text{ELSE} \sim \left(\mathbf{g}'\mathbf{g} + \nu_g S^2_g\right)\chi^{-2}_{K + \nu_g}. \quad (16)$$
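Draws from (14)–(16) reduce to dividing a scale by a chi-square deviate; a small Python helper (names illustrative) covers all three cases.

```python
import numpy as np

def draw_scaled_inv_chi2(sum_sq, n, nu, S2, rng):
    """Sample (sum_sq + nu * S2) * inv-chi^2 on (n + nu) d.f., the common
    form of (14)-(16); e.g., for sigma_a^2: sum_sq = a' A^{-1} a, n = Q,
    nu = nu_a, S2 = S_a^2."""
    return (sum_sq + nu * S2) / rng.chisquare(n + nu)

rng = np.random.default_rng(3)
sigma_a2 = draw_scaled_inv_chi2(sum_sq=12.4, n=50, nu=4, S2=0.2, rng=rng)
```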
The Gibbs sampling algorithm consists of drawing liabilities from distributions (11), such that the order constraints are satisfied, followed by draws from (13) and (14)–(16) to obtain parameter updates. The sampling process is repeated as needed, after convergence diagnostics are satisfied, and such that the Monte Carlo error of estimation of features of the posterior distribution is small enough.
Inference:
The procedure of Tavernier (1991) produces conditional posterior modes of location parameters, as already noted, but can also be used to estimate the probability of observing a certain outcome, even in a hypothetical competition, although ignoring uncertainty about estimates. The fully Bayesian Markov chain Monte Carlo procedure is more flexible. First, it is possible to produce a complete posterior distribution for any uncertain quantity, e.g., the competing ability aj + pj of individual j or the additive genetic variance $\sigma^2_a$. In addition, it is possible to estimate the probability of any event related to a future competition. Let If be an order associated with a yet to be realized event referred to as “future,” with nf participants. The predictive probability distribution of If, posterior to historical data (DATA), is given by

$$\Pr(I_f \mid \text{DATA}) = \int \Pr(I_f \mid \boldsymbol{\theta}, \boldsymbol{\Lambda})\, p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \text{DATA})\, d\boldsymbol{\theta}\, d\boldsymbol{\Lambda} = E_{\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \text{DATA}}\left[\Pr(I_f \mid \boldsymbol{\theta}, \boldsymbol{\Lambda})\right], \quad (17)$$

where $E_{\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \text{DATA}}(\cdot)$ denotes expectation with respect to the posterior distribution $[\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \text{DATA}]$. The integral in (17) does not have a closed form, since the posterior distribution $[\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \text{DATA}]$ is unknown. However, it can be estimated from the Markov chain Monte Carlo samples via ergodic averaging.
Let $\boldsymbol{\theta}^{(s)}$, s = 1, 2, … , S, be the θ-coordinate of a sample from the posterior distribution $[\boldsymbol{\theta}, \boldsymbol{\Lambda}, \mathbf{l} \mid \text{DATA}, H]$, with density (10), and let

$$\Pr\left(I_f \mid \boldsymbol{\theta}^{(s)}\right)$$

be the probability of observing the ranking in If in the future liabilities when $\boldsymbol{\theta} = \boldsymbol{\theta}^{(s)}$. Note that, since event effects are assumed to be independent, DATA will not contain information about a future event effect gf. This does not pose a problem, as the posterior distribution can be augmented with this effect; however, DATA will convey information about $\sigma^2_g$. A simulation-consistent estimator of (17) is

$$\widehat{\Pr}(I_f \mid \text{DATA}) = \frac{1}{S}\sum_{s=1}^{S} 1\left(I_f^{(s)} = I_f\right), \quad (18)$$

where $1(I_f^{(s)} = I_f)$ is an indicator variable taking the value 1 if the ranking in If is attained in sample s and 0 otherwise. In practice, one samples liabilities for future event f from

$$l_{jf}^{(s)} \sim N\left(\mathbf{x}'_{jf}\boldsymbol{\beta}^{(s)} + a_j^{(s)} + p_j^{(s)} + g_f^{(s)},\; 1\right), \qquad j = 1, 2, \ldots, n_f,$$

and checks whether or not the realized values satisfy the order in If; if so, $1(I_f^{(s)} = I_f) = 1$. This process is repeated for every Monte Carlo sample, leading directly to (18).
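A minimal Python sketch of estimator (18) follows (all names are illustrative). Per retained MCMC sample, mu holds the vector x′jfβ(s) + aj(s) + pj(s) for the nf entrants; a fresh event effect is drawn from N(0, σg²(s)) as described above (being common to all entrants, it does not alter the order, but is included for fidelity to the augmentation).

```python
import numpy as np

def predictive_rank_prob(mu_samples, sigma_g2_samples, target_order, rng):
    """Estimator (18): fraction of posterior samples whose simulated
    future liabilities realize `target_order` (winner to last)."""
    hits = 0
    for mu, s2g in zip(mu_samples, sigma_g2_samples):
        g_f = rng.normal(0.0, np.sqrt(s2g))         # future event effect
        liab = mu + g_f + rng.normal(size=mu.size)  # l_jf ~ N(mu_jf + g_f, 1)
        if np.array_equal(np.argsort(-liab), target_order):
            hits += 1
    return hits / len(mu_samples)
```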
A similar procedure can be used to estimate the probability of any future event, such as whether “participant” j is first, second, or third in a competition involving nf competitors; let such an event be denoted as A. Then, the probability of observing A in a future event can be estimated as

$$\widehat{\Pr}(A \mid \text{DATA}) = \frac{1}{S}\sum_{s=1}^{S} 1\left(A^{(s)}\right).$$

Operationally, one samples liabilities lf from $N\left(\mathbf{X}_f\boldsymbol{\beta}^{(s)} + \mathbf{Z}_{af}\mathbf{a}^{(s)} + \mathbf{Z}_{pf}\mathbf{p}^{(s)} + \mathbf{g}_f^{(s)},\; \mathbf{I}\right)$ as given above and checks whether or not A is realized, in which case $1(A^{(s)}) = 1$ and 0 otherwise.
NONPARAMETRIC DISTRIBUTION FOR EFFECTS
Above, it was assumed that prior uncertainty about permanent environmental or event effects could be described reasonably by the normal distributions $N(\mathbf{0}, \mathbf{I}\sigma^2_p)$ and $N(\mathbf{0}, \mathbf{I}\sigma^2_g)$, respectively. However, it might be sensible to make less strong assumptions about these distributions. For instance, there may be clusters of events that are more similar to each other, i.e., clusters representing different, unknown, levels of competition. Likewise, there may be some competitors that receive similar “preferential” treatment. A suitable representation of uncertainty may be that furnished by the Dirichlet process prior (Escobar 1994; MacEachern 1994; Kleinman and Ibrahim 1998; Van Der Merwe and Pretorius 2003). The case of the permanent environmental effects is discussed here. A priori, let the permanent effects pi be independently distributed as

$$p_i \mid G \sim G, \qquad i = 1, 2, \ldots, C,$$

where G is some general distribution. In turn, assume that G follows a Dirichlet process (DP) prior

$$G \sim \text{DP}(M, G_0)$$

(Ferguson 1973; Antoniak 1974), meaning that G is a random member of a space of distributions, with M and $G_0$ being parameters of this process. For example, it could be that $G_0 = N(0, \sigma^2_p)$ is taken as “base prior,” a distribution approximating the nonparametric shape of G, with M interpretable as a degree of belief on how close $G_0$ is to G. The parameter M is such that, when $M \to \infty$, then $G \to G_0$; the same is true if all random effects are identical (Kleinman and Ibrahim 1998). When G is mixed over the Dirichlet process, the Polya urn representation of the prior distribution of the random effects (Kleinman and Ibrahim 1998) is, for $N(0, \sigma^2_p)$ as base prior,

$$p_i \mid p_1, p_2, \ldots, p_{i-1} \sim \sum_{j=1}^{i-1} P_j\, \delta_{p_j} + P_0\, N\left(0, \sigma^2_p\right),$$

where $\delta_{p_j}$ denotes a distribution with point mass at $p_j$ and the probabilities P1, P2, … , PC−1 follow from the parameters of the Dirichlet process. Using results in Escobar (1994), Kleinman and Ibrahim (1998), and Van Der Merwe and Pretorius (2003), the prior distribution of the random effects generates the sequence

$$p_1 \sim N\left(0, \sigma^2_p\right); \qquad p_i \mid p_1, \ldots, p_{i-1} \sim \frac{1}{M + i - 1}\sum_{j=1}^{i-1}\delta_{p_j} + \frac{M}{M + i - 1}\, N\left(0, \sigma^2_p\right), \quad i = 2, \ldots, C.$$
Note that as $M \to \infty$, the prior distribution tends toward the parametric process $N(\mathbf{0}, \mathbf{I}\sigma^2_p)$, as noted earlier. The preceding implies that the conditional prior distribution of a permanent environmental effect (i, say), given all other such effects, is

$$p_i \mid \mathbf{p}_{-i} \sim \frac{1}{M + C - 1}\sum_{j \in B}\delta_{p_j} + \frac{M}{M + C - 1}\, N\left(0, \sigma^2_p\right), \quad (19)$$

where $\mathbf{p}_{-i}$ denotes all permanent environmental effects other than pi and B = {j : j ≠ i} is the set over which the sum is taken (Escobar 1994; Van Der Merwe and Pretorius 2003). Note that pi can be equal to the permanent environmental effect of other subjects with nonnull prior probability; this has an implication in the construction of conditional posterior distributions. Representation (19) illustrates that the conditional prior distribution of any of the permanent effects, given all other p's, is a mixture of C − 1 degenerate distributions ($\delta_{p_j}$) with point masses at pj ($j \in B$) and of the parametric base distribution $N(0, \sigma^2_p)$. The mixing probabilities are $1/(M + C - 1)$ for each of the C − 1 “conditioning” states, and $M/(M + C - 1)$ for the probability that pi is a draw from the normal process.
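The urn scheme is also easy to simulate directly, which makes the role of M visible: small M yields few clusters, while large M approaches i.i.d. sampling from the base prior. A short Python sketch (illustrative names) follows.

```python
import numpy as np

def polya_urn_draws(C, M, sigma_p2, rng):
    """Generate p_1, ..., p_C from the Polya urn induced by
    G ~ DP(M, N(0, sigma_p2)): each new effect copies an earlier one
    with probability 1/(M + i - 1) each, or is a fresh draw from the
    base prior with probability M/(M + i - 1)."""
    p = np.empty(C)
    p[0] = rng.normal(0.0, np.sqrt(sigma_p2))
    for i in range(1, C):
        if rng.uniform() < M / (M + i):        # fresh draw from base prior
            p[i] = rng.normal(0.0, np.sqrt(sigma_p2))
        else:                                  # copy a previous effect
            p[i] = p[rng.integers(i)]
    return p

rng = np.random.default_rng(4)
print(np.round(polya_urn_draws(C=10, M=1.0, sigma_p2=0.5), 2))
```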
Given the liabilities, and assuming temporarily that M is assigned a fixed value, the joint posterior density of all uncertain quantities is obtained by modifying (12) into

$$p(\boldsymbol{\theta}, \boldsymbol{\Lambda} \mid \mathbf{l}, H) \propto N\left(\mathbf{l} \mid \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_a\mathbf{a} + \mathbf{Z}_p\mathbf{p} + \mathbf{Z}_g\mathbf{g}, \mathbf{I}\right) N\left(\boldsymbol{\beta} \mid \mathbf{0}, \mathbf{I}\sigma^2_\beta\right) N\left(\mathbf{a} \mid \mathbf{0}, \mathbf{A}\sigma^2_a\right) N\left(\mathbf{g} \mid \mathbf{0}, \mathbf{I}\sigma^2_g\right)\left[\prod_{i=1}^{C} p(p_i \mid G)\right]\prod_{i \in \{a, p, g\}}\chi^{-2}\left(\sigma^2_i \mid \nu_i, S^2_i\right). \quad (20)$$

Above, pi ∼ G denotes that pi follows the unknown distribution G drawn from a Dirichlet process with $N(0, \sigma^2_p)$ as base measure and with known parameter M. Since the Dirichlet process allows pi to take any of the other p's as a current value, data from all individuals with records of performance contribute to the conditional posterior distribution of pi. One has as conditional posterior distribution

$$p(p_i \mid \text{ELSE}) \propto \left[\prod_{k=1}^{K}\phi\left(l_{ik} \mid \mu_{ik}, 1\right)^{\delta_{ik}}\right] p\left(p_i \mid \mathbf{p}_{-i}\right). \quad (21)$$
Under standard parametric assumptions, the fully conditional posterior distribution of pi is given by

$$p(p_i \mid \text{ELSE}) \propto \left[\prod_{k=1}^{K}\phi\left(l_{ik} \mid \mu_{ik}, 1\right)^{\delta_{ik}}\right] N\left(p_i \mid 0, \sigma^2_p\right). \quad (22)$$

Using well-known results (e.g., Wang et al. 1993, 1994; Sorensen and Gianola 2002) it can be shown that

$$p_i \mid \text{ELSE} \sim N\left(\hat{p}_i, v_i\right), \quad (23)$$

where

$$\hat{p}_i = \frac{\mathbf{z}'_{pi}\left(\mathbf{l} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}_a\mathbf{a} - \mathbf{Z}_{p,-i}\,\mathbf{p}_{-i} - \mathbf{Z}_g\mathbf{g}\right)}{r_i + \sigma_p^{-2}},$$

$\mathbf{z}_{pi}$ is the column of Zp in (3) pertaining to individual i, $r_i = \mathbf{z}'_{pi}\mathbf{z}_{pi}$ is the number of records of that individual, and

$$v_i = \left(r_i + \sigma_p^{-2}\right)^{-1}.$$
In addition, let

$$q_i = \int\left[\prod_{k=1}^{K}\phi\left(l_{ik} \mid \mu_{ik}, 1\right)^{\delta_{ik}}\right] N\left(p_i \mid 0, \sigma^2_p\right) dp_i, \quad (24)$$

with this integral expressible in closed form. Note that the integrand is the product of the densities of all liabilities of individual i, with β, a, and g as fixed parameters (the p's of other individuals do not enter into the model for the liabilities of individual i), times the $N(p_i \mid 0, \sigma^2_p)$ density. Hence, with li being the ri × 1 vector of liabilities of the individual, and $\mathbf{X}_i$, $\mathbf{Z}_{ai}$, and $\mathbf{Z}_{gi}$ being incidence matrices relating li to β, a, and g, respectively, a standard integration yields

$$q_i = N\left(\mathbf{l}_i \mid \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_{ai}\mathbf{a} + \mathbf{Z}_{gi}\mathbf{g},\; \mathbf{V}_i\right), \quad (25)$$

where

$$\mathbf{V}_i = \mathbf{I}_{r_i} + \sigma^2_p\,\mathbf{J}_{r_i}$$

and $\mathbf{J}_{r_i}$ is a matrix of 1's of order ri. Employing (23) and (24) in (21) leads to

$$p(p_i \mid \text{ELSE}) \propto \sum_{j \in B}\left[\prod_{k=1}^{K}\phi\left(l_{ik} \mid \mu_{ik}(p_j), 1\right)^{\delta_{ik}}\right]\delta_{p_j} + M\, q_i\, N\left(p_i \mid \hat{p}_i, v_i\right)$$

(where $\mu_{ik}(p_j)$ denotes $\mu_{ik}$ with pi set equal to pj), so that

$$p_i \mid \text{ELSE} \sim \sum_{j \in B} Q_j\, \delta_{p_j} + Q_0\, N\left(\hat{p}_i, v_i\right). \quad (26)$$
The form of (26) indicates that the fully conditional posterior distribution of pi is a mixture of the C − 1 degenerate distributions with point masses at pj ($j \in B$) and of the parametric distribution $N(\hat{p}_i, v_i)$. Let now

$$Q_j \propto \prod_{k=1}^{K}\phi\left(l_{ik} \mid \mu_{ik}(p_j), 1\right)^{\delta_{ik}}, \quad j \in B; \qquad Q_0 \propto M q_i; \qquad Q_0 + \sum_{j \in B} Q_j = 1.$$

Similar to that in Van Der Merwe and Pretorius (2003), the rule for drawing samples from the conditional posterior distribution pi|ELSE is to take as a current value for pi

$$p_i = \begin{cases} p_j, & \text{with probability } Q_j,\; j \in B, \\ \text{a draw from } N\left(\hat{p}_i, v_i\right), & \text{with probability } Q_0, \end{cases} \quad (27)$$

where pj is the current state in the Markov chain of the permanent environmental effect of individual j, and Qj is evaluated at the current values of all parameters entering into this probability. In the calculation of Qj, the permanent effect pi is replaced by pj when computing the density of li. This means that values of pj that generate liabilities li with higher plausibility are assigned a higher probability of selection. Eventually, this leads to clustering of permanent environmental effects into “plausibility” groups.
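A sketch of one update of pi according to rule (27) is given below in Python (all names are ours). For illustration it assumes the liabilities of individual i have been pre-adjusted for every term except pi; the weights Qj then use the unit-variance normal likelihood of the adjusted liabilities evaluated at pj, and the weight of a fresh draw uses the closed-form marginal (25).

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def update_p_i(i, p, resid_i, M, sigma_p2, rng):
    """One draw from (27). resid_i: the r_i liabilities of individual i
    minus all terms except p_i (l_i - X_i beta - Z_ai a - Z_gi g)."""
    C, r_i = p.size, resid_i.size
    # Q_j: likelihood of i's adjusted liabilities with p_i set to p_j
    Q = np.array([0.0 if j == i else np.prod(norm.pdf(resid_i - p[j]))
                  for j in range(C)])
    # weight of a fresh parametric draw: M * q_i, with q_i the marginal
    # density (25), covariance I + sigma_p2 * J (J = matrix of ones)
    V = np.eye(r_i) + sigma_p2 * np.ones((r_i, r_i))
    w_new = M * multivariate_normal.pdf(resid_i, mean=np.zeros(r_i), cov=V)
    if rng.uniform() < w_new / (w_new + Q.sum()):
        # fresh draw from the parametric full conditional (23): N(p_hat, v)
        v = 1.0 / (r_i + 1.0 / sigma_p2)
        return rng.normal(v * resid_i.sum(), np.sqrt(v))
    return p[rng.choice(C, p=Q / Q.sum())]     # copy some p_j, j != i
```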
The fully conditional posterior distributions of all other parameters can be readily deduced from (20) by fixing p as well as the liabilities:

$$p(\boldsymbol{\beta}, \mathbf{a}, \mathbf{g}, \boldsymbol{\Lambda} \mid \mathbf{p}, \mathbf{l}, H) \propto N\left(\mathbf{l} - \mathbf{Z}_p\mathbf{p} \mid \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_a\mathbf{a} + \mathbf{Z}_g\mathbf{g}, \mathbf{I}\right) N\left(\boldsymbol{\beta} \mid \mathbf{0}, \mathbf{I}\sigma^2_\beta\right) N\left(\mathbf{a} \mid \mathbf{0}, \mathbf{A}\sigma^2_a\right) N\left(\mathbf{g} \mid \mathbf{0}, \mathbf{I}\sigma^2_g\right)\prod_{i \in \{a, p, g\}}\chi^{-2}\left(\sigma^2_i \mid \nu_i, S^2_i\right). \quad (28)$$

The conditional posterior distribution of β, a, and g given everything else is the multivariate normal process

$$\boldsymbol{\beta}, \mathbf{a}, \mathbf{g} \mid \text{ELSE} \sim N\left(\hat{\boldsymbol{\theta}}^*, \mathbf{C}^{*-1}\right), \quad (29)$$

where, with $\mathbf{W}^* = [\mathbf{X}\;\; \mathbf{Z}_a\;\; \mathbf{Z}_g]$,

$$\mathbf{C}^* = \mathbf{W}^{*\prime}\mathbf{W}^* + \begin{bmatrix} \mathbf{I}\sigma_\beta^{-2} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{A}^{-1}\sigma_a^{-2} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{I}\sigma_g^{-2} \end{bmatrix} \qquad \text{and} \qquad \hat{\boldsymbol{\theta}}^* = \mathbf{C}^{*-1}\mathbf{W}^{*\prime}\left(\mathbf{l} - \mathbf{Z}_p\mathbf{p}\right).$$

The conditional posterior distributions of the variance components are as in the parametric model, that is,

$$\sigma^2_a \mid \text{ELSE} \sim \left(\mathbf{a}'\mathbf{A}^{-1}\mathbf{a} + \nu_a S^2_a\right)\chi^{-2}_{Q + \nu_a}, \quad (30)$$

$$\sigma^2_p \mid \text{ELSE} \sim \left(\mathbf{p}'\mathbf{p} + \nu_p S^2_p\right)\chi^{-2}_{C + \nu_p}, \quad (31)$$

and

$$\sigma^2_g \mid \text{ELSE} \sim \left(\mathbf{g}'\mathbf{g} + \nu_g S^2_g\right)\chi^{-2}_{K + \nu_g}. \quad (32)$$
The Gibbs sampler draws liabilities as before, permanent environmental effects from (27), β, a, and g from (29), and the three variance components from the preceding expressions.
At each iteration of the Gibbs sampler, the procedure induces a clustering structure of permanent environmental effects, with each cluster consisting of individuals sharing the same value of pi. Following Van Der Merwe and Pretorius (2003), let t be the number of unique permanent environmental effects (0 < t ≤ C). The posterior distribution of t can be obtained readily from the sampling procedure. Bush and MacEachern (1996) suggest embedding an optional additional step in the Gibbs sampling procedure: let the number of clusters identified at iteration m be $t^{(m)}$; this means that there are $t^{(m)}$ distinct values of the p's, which are labeled as $\eta_1, \eta_2, \ldots, \eta_{t^{(m)}}$. The model for the liabilities in (3) is then rewritten as

$$\mathbf{l} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_a\mathbf{a} + \mathbf{Z}^*_p\boldsymbol{\eta} + \mathbf{Z}_g\mathbf{g} + \mathbf{e},$$

where $\mathbf{Z}^*_p$ is a condensation of Zp of order $n \times t^{(m)}$ obtained by assigning liabilities to the appropriate cluster of permanent environmental effects to which they belong. At that point of the scheme, Kleinman and Ibrahim (1998) and Van Der Merwe and Pretorius (2003) suggest drawing a sample of η from the process

$$\boldsymbol{\eta} \mid \text{ELSE} \sim N\left(\hat{\boldsymbol{\eta}}, \mathbf{V}_\eta\right),$$

where

$$\hat{\boldsymbol{\eta}} = \left(\mathbf{Z}^{*\prime}_p\mathbf{Z}^*_p + \mathbf{I}\sigma_p^{-2}\right)^{-1}\mathbf{Z}^{*\prime}_p\left(\mathbf{l} - \mathbf{X}\boldsymbol{\beta} - \mathbf{Z}_a\mathbf{a} - \mathbf{Z}_g\mathbf{g}\right)$$

and

$$\mathbf{V}_\eta = \left(\mathbf{Z}^{*\prime}_p\mathbf{Z}^*_p + \mathbf{I}\sigma_p^{-2}\right)^{-1}.$$

Once a sample of η is obtained, say $\boldsymbol{\eta}^{(m)}$, the permanent environmental effects are reconstituted as $\mathbf{p}^{(m)} = \mathbf{T}^{(m)}\boldsymbol{\eta}^{(m)}$, where $\mathbf{T}^{(m)}$ is a C × $t^{(m)}$ matrix that allocates cluster values to permanent environmental effects. Then, the MCMC process continues for other parameters as usual.
Turn attention now to inferring M, the degree of belief parameter of the Dirichlet process. Using results in Antoniak (1974) and West (1992), but giving more details, as their developments are compact, the prior distribution of the number of clusters t may be written as

$$p(t \mid M) = c_C(t)\, M^t\, \frac{\Gamma(M)}{\Gamma(M + C)}, \qquad t = 1, 2, \ldots, C,$$

where the factor $c_C(t)$ does not involve M and $\Gamma(\cdot)$ is the Gamma function. As pointed out by West (1992), if one has samples of t (see above), the conditional posterior distribution of M is expressible as

$$p(M \mid t, \text{ELSE}) \propto p(M)\, M^t\, \frac{\Gamma(M)}{\Gamma(M + C)},$$

where p(M) is the prior density of M. This result follows, because, if one knows p, this gives immediately the number of clusters t as well as the number of permanent environmental effects per cluster. Then, with B(. , .) being the Beta function,

$$\frac{\Gamma(M)}{\Gamma(M + C)} = \frac{(M + C)\, B(M + 1, C)}{M\, \Gamma(C)},$$

so that

$$p(M \mid t, \text{ELSE}) \propto p(M)\, M^{t-1}\, (M + C)\, B(M + 1, C).$$
West (1992) notes that this density results from marginalizing the joint distribution [M, x|t, ELSE], where x is a continuous random variable taking values between 0 and 1. Hence, since $B(M + 1, C) = \int_0^1 x^M (1 - x)^{C-1}\, dx$, it follows that

$$p(M, x \mid t, \text{ELSE}) \propto p(M)\, M^{t-1}\, (M + C)\, x^M\, (1 - x)^{C-1}.$$

Consequently, the conditional posterior density of the auxiliary variable x is

$$p(x \mid M, t, \text{ELSE}) \propto x^M (1 - x)^{C-1}, \quad (33)$$

so that $x \mid M, \text{ELSE} \sim \text{Beta}(M + 1, C)$. In addition, the conditional posterior distribution of M is

$$p(M \mid x, t, \text{ELSE}) \propto p(M)\, M^{t-1}\, (M + C)\, x^M,$$
and its form depends on the prior adopted for M. Since M must be positive, West (1992) suggests a Gamma(a, b) distribution as prior, so that

$$p(M \mid x, t, \text{ELSE}) \propto M^{a-1} e^{-bM}\, M^{t-1}\, (M + C)\, x^M = M^{a+t-1} e^{-M(b - \log x)} + C\, M^{a+t-2} e^{-M(b - \log x)}.$$

The normalized density is

$$p(M \mid x, t, \text{ELSE}) = \frac{M^{a+t-1} e^{-M(b - \log x)} + C\, M^{a+t-2} e^{-M(b - \log x)}}{\int_0^\infty\left[M^{a+t-1} e^{-M(b - \log x)} + C\, M^{a+t-2} e^{-M(b - \log x)}\right] dM}.$$

The integrals in the denominator yield

$$\frac{\Gamma(a + t)}{(b - \log x)^{a+t}} + C\, \frac{\Gamma(a + t - 1)}{(b - \log x)^{a+t-1}}.$$

Hence, letting a* = (t + a) and $b^* = b - \log x$, the conditional posterior distribution becomes

$$M \mid x, t, \text{ELSE} \sim \pi_x\, \text{Gamma}\left(a^*, b^*\right) + \left(1 - \pi_x\right) \text{Gamma}\left(a^* - 1, b^*\right). \quad (34)$$

This is a mixture of the two Gamma distributions indicated, with mixing probabilities πx and 1 − πx. Note that

$$\frac{\pi_x}{1 - \pi_x} = \frac{\Gamma(a^*)/b^{*a^*}}{C\, \Gamma(a^* - 1)/b^{*(a^* - 1)}} = \frac{a^* - 1}{C\, b^*}.$$

Since $\pi_x + (1 - \pi_x) = 1$,

$$\pi_x = \frac{a^* - 1}{a^* - 1 + C\, b^*}.$$
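West's two-step update is compact in code. The Python sketch below (illustrative names; a and b are the parameters of the Gamma prior) draws the auxiliary variable x from (33) and then M from the two-component Gamma mixture (34).

```python
import numpy as np

def draw_M(M, t, C, a, b, rng):
    """Draw x | M ~ Beta(M + 1, C), then M | x, t from mixture (34)
    with a* = t + a, b* = b - log x and pi_x = (a*-1)/(a*-1 + C b*)."""
    x = rng.beta(M + 1.0, C)
    a_star, b_star = t + a, b - np.log(x)
    pi_x = (a_star - 1.0) / (a_star - 1.0 + C * b_star)
    shape = a_star if rng.uniform() < pi_x else a_star - 1.0
    return rng.gamma(shape, 1.0 / b_star)      # numpy uses scale = 1/rate

rng = np.random.default_rng(5)
M = 1.0
for _ in range(5):                             # a few illustrative updates
    M = draw_M(M, t=8, C=100, a=2.0, b=1.0, rng=rng)
print(round(M, 3))
```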
At the end of the MCMC sampling process there will be S samples of the number of clusters t and of the auxiliary variable x. The density of the marginal posterior distribution of M can be estimated (West 1992) using the Rao–Blackwell estimator

$$\hat{p}(M \mid \text{DATA}) = \frac{1}{S}\sum_{s=1}^{S}\left[\pi_{x^{(s)}}\, \text{Gamma}\left(M \mid a^{*(s)}, b^{*(s)}\right) + \left(1 - \pi_{x^{(s)}}\right)\text{Gamma}\left(M \mid a^{*(s)} - 1, b^{*(s)}\right)\right], \quad (35)$$

where $a^{*(s)} = t^{(s)} + a$ and $b^{*(s)} = b - \log x^{(s)}$. If a = 0 and b = 0, the Gamma prior degenerates to $p(M) \propto M^{-1}$ or, equivalently, to $p(\log M) \propto$ constant. If such an improper prior is adopted, the Rao–Blackwell estimator reduces to

$$\hat{p}(M \mid \text{DATA}) = \frac{1}{S}\sum_{s=1}^{S}\left[\pi_{x^{(s)}}\, \text{Gamma}\left(M \mid t^{(s)}, b^{*(s)}\right) + \left(1 - \pi_{x^{(s)}}\right)\text{Gamma}\left(M \mid t^{(s)} - 1, b^{*(s)}\right)\right],$$

with a* = t and b* = −log x used in the calculation of $\pi_x$. While the conditional posterior distribution of M is well defined for a = b = 0, this does not guarantee that its marginal posterior distribution will always be proper. Hence, the uniform prior on $\log M$ should be used with caution.
CONCLUSION
This study presents parametric and semiparametric procedures for analysis of rank data. Observations presented as ranks arise often in behavioral or cognitive sciences and in the context of competitions such as horse jumping events or greyhound races. For instance, Johnson et al. (2002) carried out a meta-analysis of 30 primate (24 genera represented) intelligence experiments grouped into “paradigms” and “procedures within paradigm.” Paradigms are types of intelligence tests (e.g., “reversal learning,” where individuals learn to pick one object rather than another, and then the value of the objects changes, so that a previously unrewarded object is now rewarded); “procedures” refer to different methodologies used to investigate the paradigm. In one of six reversal studies the observed ranking was Cebus > Hylobates > Aotus > Saimiri. In the analysis of the ranks, they used Thurstonian models similar to those described in this article. Their Bayesian formulation allowed a clear assessment of uncertainty, which was viewed as especially important because, in these experiments, raters usually do not rank >15 items, so asymptotic approximations may be inadequate.
In a study of competitions of jumping horses, Tavernier (1990) argued that a “physical” measure of performance (e.g., time, number of obstacles broken or refused) is not always available, and that it is unclear how to measure the technical difficulty of an event. Also, measures such as yearly earnings are available only for horses that earn money, so that the information comes from a selected sample, which may distort inference considerably. As discussed earlier, Tavernier (1990, 1991) proposed an approximate Bayesian method in which ranks are viewed as being the manifestation of latent variables, as in the present study. In the analysis, involving ∼19,000 horses participating in all jumping competitions that took place in France in 1987, ∼14% of the horses had an accuracy <0.14 (“accuracy” ranges between 0 and 1), and 28% did not reach 0.45. These accuracies are approximations and probably understate uncertainty, because the error of estimates of parameters is not taken into account in the calculations. Important numerical problems were encountered by Tavernier (1990). For each Newton–Raphson iteration, and for a race with n horses, she had to calculate the following multivariate normal integrals: (1) an integral of order n − 1, (2) n − 1 integrals of order n − 2, and (3) integrals of order n − 3. For this reason, she used approximations to integrals based on Taylor series expansions. None of these integrals need to be computed explicitly or approximated in our procedure, as all calculations rely on Monte Carlo sampling. Another difficulty in the conditional maximum a posteriori method of Tavernier (1990, 1991) is the computation of the joint posterior mode, which, in the context of horse races in France, may involve solving (repeatedly) a system exceeding 100,000 unknowns. In our procedure, solving of equations is replaced by effecting draws from conditional posterior distributions and, under the assumptions of this article, all these are recognizable. A potential difficulty may be slow mixing of the MCMC algorithm, but this can often be accelerated by either changing the algorithm or adopting a different parameterization. Introducing a Dirichlet process prior enhances the level of difficulty of the calculations, but it confers robustness with respect to possible departures from the normality assumption for the distributions of events or of permanent environmental effects. Further, the nonparametric treatment of permanent environmental effects is a flexible alternative to a thick-tailed distribution (Strandén and Gianola 1998, 1999; Rosa et al. 2001, 2003, 2004), in the sense that it allows for asymmetry, as well as for allocation of effects into clusters with some biological meaning.
Here, it was assumed that event effects were identically distributed a priori, but there may be heterogeneity with respect to level of difficulty. This can be accommodated via hierarchical modeling. For instance, the prior distribution could be modified into $\mathbf{g} \mid \boldsymbol{\alpha} \sim N\left(\mathbf{M}\boldsymbol{\alpha}, \mathbf{I}\sigma^2_g\right)$, where α is a vector of unknown “levels of difficulty” effects, and M is a known matrix relating g to α; in turn, α could be assigned a vague prior. In their primate studies, Johnson et al. (2002) fitted a latent variable model including a genus effect and a “paradigm-genus” deviation; they found that the latter did not contribute to variability appreciably and concluded that taxa differed in domain-general ability. A nonparametric alternative would be to assign a Dirichlet process prior to the event effects g and let the data drive the formation of clusters representing level of difficulty.
In the context of horse breeding, it may be that there are individuals whose genes predispose them to perform well in certain types of competition, but not in others; i.e., there may be different sets of genes for different types of competition. Our model can be extended by assuming that each individual has a liability peculiar to each type of competition. Suppose there are three types of competition; here, there would be three liabilities per individual with a within-individual correlation matrix R0, say. All three liabilities would be realized only if the individual in question participates in the three types of events. A simpler model could postulate that liabilities for different events are independent, but that each individual possesses an additive genetic effect peculiar to the type of event. An additive genetic variance–covariance matrix G0 would be introduced and inferred as in a multivariate Bayesian mixed-effects model (Sorensen and Gianola 2002).
In many sports events, e.g., races or marathons, there are no ties. However, ties can be handled in our procedure via a simple modification of the limits the liability variates can take. For example, consider a competition with four participants (5, 7, 14, and 20), such that the observed ranking is $5 > (7 = 14) > 20$; here, $(7 = 14)$ denotes a tie in second place between individuals 7 and 14. In the process of sampling the liabilities, it suffices to accept samples such that 5 and 20 take the largest and smallest values, respectively, with the liabilities of 7 and 14 taking any order between those of 5 and 20; then, the ranking of 7 and 14 would vary at random over MCMC samples. Alternatively, one could assign the average of the two liabilities sampled for 7 and 14 in each of the MCMC iterations. Johnson et al. (2002) handled ties via a parametric model for the probability that two genera are tied, with one additional parameter (with a corresponding prior distribution) introduced in the implementation.
A useful extension of this model, at least in the context of evaluation of horse races, would be one for a joint analysis of ranks and of some “continuous” variate, such as time in the event or earnings. As mentioned above, there are situations in which the rank information is available for all competitors, but earnings are available only for the placed competitors. Clearly, earnings for the other competitors are not missing completely at random. However, if the conditional probability of missing earning information depends on the ranks, then missingness would be ignorable in a joint analysis of ranks and of earnings (Rubin 1976; Im et al. 1989; Sorensen et al. 2001). For example, such a model could postulate a joint normal distribution of the liabilities and of the continuous trait measured on some competitors.
Acknowledgments
Yu-Mei Chang and Xiao-Lin Wu are thanked for helpful comments. Research was completed while the senior author was Mercator Professor (Deutsche Forschungsgemeinschaft) at Georg-August-University, Göttingen, Germany. Support by the Wisconsin Agriculture Experiment Station and by grants National Research Initiatives Competitive Grants Program/U.S. Department of Agriculture 2003-35205-12833, National Science Foundation (NSF) DEB-0089742, and NSF DMS-044371 is acknowledged.
References
- Antoniak, C. E., 1974. Mixtures of Dirichlet processes with applications to nonparametric problems. Ann. Stat. 2: 1152–1174.
- Bock, R. D., and L. V. Jones, 1968. The Measurement and Prediction of Judgment and Choice. Holden–Day, San Francisco.
- Bush, C. A., and S. N. MacEachern, 1996. A semiparametric Bayesian model for randomized block designs. Biometrika 83: 275–285.
- Cowles, M. K., and B. P. Carlin, 1996. Markov chain Monte Carlo convergence diagnostics: a comparative review. J. Am. Stat. Assoc. 91: 883–904.
- Dempster, E. R., and I. M. Lerner, 1950. Heritability of threshold characters. Genetics 35: 212–236.
- Devroye, L., 1986. Non-Uniform Random Variate Generation. Springer, New York.
- Escobar, M. D., 1994. Estimating normal means with a Dirichlet process prior. J. Am. Stat. Assoc. 89: 268–275.
- Falconer, D. S., 1965. The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann. Hum. Genet. 29: 51–76.
- Ferguson, T. S., 1973. A Bayesian analysis of some nonparametric problems. Ann. Stat. 1: 209–230.
- Foulley, J. L., S. Im, D. Gianola and I. Höschele, 1987. Empirical Bayes estimation of genetic value for n binary traits. Genet. Sel. Evol. 19: 197–224.
- Garcia-Cortés, L. A., and D. Sorensen, 1996. On a multivariate implementation of the Gibbs sampler. Genet. Sel. Evol. 28: 121–126.
- Gelman, A., J. B. Carlin, H. S. Stern and D. B. Rubin, 2003. Bayesian Data Analysis, Ed. 2. Chapman & Hall, London.
- Gianola, D., 1982. Theory and analysis of threshold characters. J. Anim. Sci. 54: 1079–1096.
- Gianola, D., and R. L. Fernando, 1986. Bayesian methods in animal breeding theory. J. Anim. Sci. 63: 217–244.
- Gianola, D., and J. L. Foulley, 1983. Sire evaluation for ordered categorical data with a threshold model. Genet. Sel. Evol. 15: 201–223.
- Gianola, D., J. L. Foulley and R. L. Fernando, 1986. Prediction of breeding values when variances are not known. Genet. Sel. Evol. 18: 485–498.
- Gilks, W. R., S. Richardson and D. J. Spiegelhalter, 1996. Markov Chain Monte Carlo in Practice. Chapman & Hall, London.
- Grüneberg, H., 1952. Genetical studies on the skeleton of the mouse. IV. Quasi-continuous variations. J. Genet. 51: 95–114.
- Harville, D. A., 1974. Bayesian inference of variance components using only error contrasts. Biometrika 61: 383–385.
- Henderson, C. R., 1976. A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32: 69–83.
- Henery, R. J., 1981. Permutation probabilities as models for horse races. J. R. Stat. Soc. B 43: 86–91.
- Im, S., R. L. Fernando and D. Gianola, 1989. Likelihood inferences in animal breeding under selection: a missing data theory viewpoint. Genet. Sel. Evol. 21: 399–414.
- Johnson, V. E., R. O. Deaner and C. P. Van Schaik, 2002. Bayesian analysis of multi-study rank data with application to primate intelligence ratings. J. Am. Stat. Assoc. 97: 8–17.
- Kleinman, K. P., and J. G. Ibrahim, 1998. A semiparametric Bayesian approach to the random effects model. Biometrics 54: 921–938.
- Lawrence, S., R. Beasley, I. Doull, B. Begishvili, F. Lampe et al., 1994. Genetic analysis of atopy and asthma as quantitative traits and ordered polychotomies. Ann. Hum. Genet. 58: 359–368.
- MacEachern, S. N., 1994. Estimation of normal means with a conjugate style Dirichlet process prior. Comm. Stat. Sim. 23: 727–741.
- Robert, C. P., and G. Casella, 2005. Monte Carlo Statistical Methods, Ed. 2. Springer, New York.
- Rosa, G. J. M., D. Gianola and J. I. Urioste, 2001. Assessing relationships between genetic evaluations using robust regression with an application to Holsteins in Uruguay. Acta Agric. Scand. Sect. A 51: 21–34.
- Rosa, G. J. M., C. R. Padovani and D. Gianola, 2003. Robust linear mixed models with normal/independent distributions and Bayesian MCMC implementation. Biom. J. 45: 573–590.
- Rosa, G. J. M., D. Gianola and C. R. Padovani, 2004. Bayesian longitudinal data analysis with mixed models and thick-tailed distributions using MCMC. J. Appl. Stat. 31: 855–873.
- Rubin, D. B., 1976. Inference and missing data. Biometrika 63: 581–582.
- Sorensen, D., and D. Gianola, 2002. Likelihood, Bayesian, and MCMC Methods in Quantitative Genetics. Springer–Verlag, New York.
- Sorensen, D., S. Andersen, D. Gianola and I. Korsgaard, 1995. Bayesian inference in threshold models using Gibbs sampling. Genet. Sel. Evol. 27: 229–249.
- Sorensen, D. A., R. L. Fernando and D. Gianola, 2001. Inferring the trajectory of genetic variance in the course of artificial selection. Genet. Res. 77: 83–94.
- Strandén, I., and D. Gianola, 1998. Attenuating effects of preferential treatment with Student-t mixed linear models: a simulation study. Genet. Sel. Evol. 30: 565–583.
- Strandén, I., and D. Gianola, 1999. Mixed effects linear models with t-distributions for quantitative genetic analysis: a Bayesian approach. Genet. Sel. Evol. 31: 25–42.
- Tanner, M. A., and W. Wong, 1987. The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 82: 528–550.
- Tavernier, A., 1990. Estimation of breeding value of jumping horses from their ranks. Livest. Prod. Sci. 26: 277–290.
- Tavernier, A., 1991. Genetic evaluation of horses based on ranks in competitions. Genet. Sel. Evol. 23: 159–173.
- Thurstone, L. L., 1927. A law of comparative judgement. Psych. Rev. 34: 278–286.
- Van Der Merwe, A. J., and A. L. Pretorius, 2003. Bayesian estimation in animal breeding using the Dirichlet process prior for correlated random effects. Genet. Sel. Evol. 35: 137–158.
- Wang, C. S., J. J. Rutledge and D. Gianola, 1993. Marginal inferences about variance components in a mixed linear model using Gibbs sampling. Genet. Sel. Evol. 25: 41–62.
- Wang, C. S., J. J. Rutledge and D. Gianola, 1994. Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26: 91–115.
- West, M., 1992. Hyperparameter estimation in Dirichlet process mixture models. Technical Report 92-A03. ISDS, Duke University, Durham, NC.
- Wright, S., 1934. An analysis of variability in number of digits in an inbred strain of guinea pigs. Genetics 19: 506–536.