Using Parametric Multipoint Lods and Mods for Linkage Analysis Requires a Shift in Statistical Thinking

Susan E Hodge; Zeynep Baskurt; Lisa J Strug

doi:10.1159/000331463

2011 Dec 23;72(4):264–275. doi: 10.1159/000331463

PMCID: PMC3267993 PMID: 22189469

Abstract

Multipoint (MP) linkage analysis represents a valuable tool for whole-genome studies but suffers from the disadvantage that its probability distribution is unknown and varies as a function of marker information and density, genetic model, number and structure of pedigrees, and the affection status distribution [Xing and Elston: Genet Epidemiol 2006;30:447–458; Hodge et al.: Genet Epidemiol 2008;32:800–815]. This implies that the MP significance criterion can differ for each marker and each dataset, and this fact makes planning and evaluation of MP linkage studies difficult. One way to circumvent this difficulty is to use simulations or permutation testing. Another approach is to use an alternative statistical paradigm to assess the statistical evidence for linkage, one that does not require computation of a p value. Here we show how to use the evidential statistical paradigm for planning, conducting, and interpreting MP linkage studies when the disease model is known (lod analysis) or unknown (mod analysis). As a key feature, the evidential paradigm decouples uncertainty (i.e. error probabilities) from statistical evidence. In the planning stage, the user calculates error probabilities, as functions of one's design choices (sample size, choice of alternative hypothesis, choice of likelihood ratio (LR) criterion k) in order to ensure a reliable study design. In the data analysis stage one no longer pays attention to those error probabilities. In this stage, one calculates the LR for two simple hypotheses (i.e. trait locus is unlinked vs. trait locus is located at a particular position) as a function of the parameter of interest (position). The LR directly measures the strength of evidence for linkage in a given data set and remains completely divorced from the error probabilities calculated in the planning stage. An important consequence of this procedure is that one can use the same criterion k for all analyses. 
This contrasts with the situation described above, in which the value one uses to conclude significance may differ for each marker and each dataset in order to accommodate a fixed test size, α. In this study we accomplish two goals that lead to a general algorithm for conducting evidential MP linkage studies. (1) We provide two theoretical results that translate into guidelines for investigators conducting evidential MP linkage: (a) Comparing mods to lods, error rates (including probabilities of weak evidence) are generally higher for mods when the null hypothesis is true, but lower for mods in the presence of true linkage. Royall [J Am Stat Assoc 2000;95:760–780] has shown that errors based on lods are bounded and generally small. Therefore when the true disease model is unknown and one chooses to use mods, one needs to control misleading evidence rates only under the null hypothesis; (b) for any given pair of contiguous marker loci, error rates under the null are greatest at the midpoint between the markers spaced furthest apart, which provides an obvious simple alternative hypothesis to specify for planning MP linkage studies. (2) We demonstrate through extensive simulation that this evidential approach can yield low error rates under the null and alternative hypotheses for both lods and mods, despite the fact that mod scores are not true LRs. Using these results we provide a coherent approach to implement a MP linkage study using the evidential paradigm.

Key Words: Evidential paradigm, Likelihood, Parametric linkage, Complex disease

1. Introduction

Linkage analysis can provide an important tool for gene mapping on its own, and can provide data to guide prioritization of potential causal sequence variants identified as a part of next-generation sequence studies. It is commonplace for linkage studies to compute lod scores or mod scores (lod scores maximized over unknown disease model parameters), and to declare significant linkage when the observed lod/mod score exceeds some critical value deemed significant, where that critical value is determined by frequentist paradigm arguments. (Other published alternatives do exist, e.g. the posterior probability of linkage (e.g. Vieland [1]) and the false discovery rate (e.g. Devlin et al. [2])). That is, a given, fixed type I error level (test size α) determines the critical value at which one declares significant linkage evidence, and this critical value is calculated from the probability distribution of the lod score.

In multipoint (MP) linkage analysis, one computes the MP lod/mod score based on two simple hypotheses for the location parameter (i.e. trait locus is unlinked vs. trait locus is located at a particular position), and for this scenario, standard likelihood ratio (LR) testing theory does not apply. The distribution of the test statistic (lod or mod) under the null hypothesis does not have a standard form. Rather, this distribution varies as a function of marker information and density, genetic model, number and structure of pedigrees, and the affection status distribution [3, 4]. Consequently, the critical value at which the MP lod or mod score can be declared significant can vary by dataset and even by marker. Xing and Elston [3] suggest that, as a consequence, one should take care in interpreting MP parametric lod scores. However, Hodge et al. [4] point out that despite the unknown and varying critical value, the probability of obtaining strong evidence favoring linkage from the MP lod score when there is no linkage (referred to as M₀), converges to zero with increasing information. This behavior was already known for lod scores (proven by Royall [5]) and we demonstrated this via simulation for mods previously [4]. This implies that MP lod scores provide reliable linkage evidence on their own, irrespective of their corresponding frequentist critical value. Interpreting the statistical evidence for linkage directly from the LR or lod score is, in fact, an implementation of the evidential statistical paradigm.

The evidential paradigm [5–7] provides an alternative way to assess statistical evidence in a given dataset. In this paradigm, one bases inference solely on pure likelihood methods rather than using alternative measures of statistical evidence such as p values or Bayes factors. One uses the LR, calculated from two simple hypotheses for the parameter of interest, as the measure of evidence. One also calculates error probabilities, in the forms of probabilities of misleading evidence and probabilities of weak (inconclusive) evidence, to provide measures of the operating characteristics of the paradigm under different scenarios. These probabilities ensure that interpreting linkage evidence directly from the LR has a low probability of leading to incorrect conclusions about linkage. However, these error rates play no role in the subsequent interpretation of the statistical evidence for linkage, which is based solely on the value of the observed LR for two simple hypotheses. We review the paradigm more fully in Section 2.

Evidential methodology has been developed for diverse applications including, for example, bioequivalence trials [8], nonparametric approaches [9], health economics [10], robust regression [11], two-point linkage analysis with known disease model [12], sample size estimation [13], and genetic association analysis [14], as well as for dealing with the multiple testing paradox [15]. More complete details can be found in those papers. Here we extend this suite of evidential methodology to MP linkage analysis, under both known and unknown disease models, allowing investigators to conduct MP linkage analysis evidentially.

The evidential paradigm draws the important distinction between uncertainty and statistical evidence [7]: The LR measures the strength of statistical evidence, and this measure is mathematically and conceptually distinct from the frequency with which misleading evidence could occur (error probabilities). As a consequence, the evidential paradigm has the advantage of interpreting the evidence in the observed data set, as opposed to interpreting statistical evidence in terms of repetitions of the same experiment.

Another advantage of the evidential paradigm is that one can determine, after performing the analysis, whether one actually does have inconclusive (‘weak’) evidence or not. This contrasts with a power analysis in the frequentist approach, wherein one has only a probability ahead of time of being unable to reject the null hypothesis. Moreover, a power analysis does not distinguish between evidence that is truly inconclusive and evidence that actually strongly supports the null, as an evidential analysis does.

In addition to these and the other usual advantages of the evidential over the frequentist paradigm [6, 16], in MP linkage analysis the investigator can specify a common criterion for all markers and all studies. In contrast, under the frequentist paradigm, it is not straightforward to determine the required critical value to ensure a size α test corresponding to the MP lod/mod score, since the distribution of the MP lod is, in general, unknown [3–5]. Rather, once the α level is fixed at some value, say 0.05, then the criterion for linkage (e.g. 3.3) needs to be different, depending on the distribution of the MP lod/mod. In contrast, with the evidential paradigm, the user specifies a cutoff, choosing a criterion that corresponds to the desired level of ‘strength of evidence’.

In Section 2 we review salient aspects of the evidential paradigm. In Section 3 we develop two useful guidelines for investigators applying this paradigm. Then in Sections 4 (Methods) and 5 (Results) we present numerical results from our extensive simulations that, when combined with results from [4], illustrate how MP lod and mod scores can provide reliable linkage evidence in the absence of a corresponding frequentist critical value. Finally, in the Discussion (Section 6) we tie all this together to outline a coherent evidential approach to conducting a MP linkage study. Box 1 summarizes the overall algorithm involved in conducting a MP linkage analysis using the evidential paradigm.

Box 1. Algorithm to conduct multipoint linkage analysis using the evidential paradigm

(A) Study Design

Among all adjacent pairs of markers, choose the pair whose two loci are furthest apart. Call the distance between them d.

Simulate N replicates of a dataset consisting of pedigrees with structure and disease distribution as in one's dataset, and two markers separated by distance d.
Simulate both with linkage (disease locus located halfway between the two markers) and without linkage (disease locus located elsewhere) (Sec. 3.2).
If genetic model of trait is known, use that to calculate error probabilities based on lods; if not, calculate error probabilities based on mod scores.
Calculate M₀ and W₀ from the unlinked simulations (i.e. no linkage), and W₁ from the linked simulations, for chosen value of k (e.g., k = 32) (Sec. 3.1).

Ensure M₀ is small (e.g., ≤0.05), so that one has low probability of being misled by one's data.

Minimize W₀ and W₁ as much as possible.

(B) Study Analysis

Use multipoint linkage software to calculate genome-wide mod scores (or lod scores if trait model is known).

Where mod (lod) ≥ c = log₁₀ k, i.e. where the corresponding LR is ≥ k, interpret this as strong evidence favoring linkage at that location.
e.g., if k = 32, the data support linkage at this location over no linkage by a factor of at least 32.

Compute the support interval about the maximum (see Sec. 2.1) to obtain a set of genomic locations consistent with linkage.

This provides a set of locations across the genome whose linkage evidence is not substantially less than the evidence for linkage at the maximum.

Summarize regions of weak evidence – where LR is between k and 1/k – where one can neither conclude nor rule out linkage.
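The analysis stage in Box 1(B) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name `summarize_scan`, the positions, and the score values are all hypothetical, and the scores are assumed to have been produced already by multipoint linkage software on the log₁₀ scale.

```python
import math

def summarize_scan(positions, scores, k=32):
    """Summarize lod/mod scores (log10 LR) across positions against criterion k."""
    c = math.log10(k)  # criterion on the log10 scale
    pairs = list(zip(positions, scores))
    # Strong evidence favoring linkage: score >= c
    strong_for = [p for p, s in pairs if s >= c]
    # Strong evidence favoring no linkage: score <= -c
    strong_against = [p for p, s in pairs if s <= -c]
    # Weak evidence: score strictly between -c and c
    weak = [p for p, s in pairs if -c < s < c]
    # 1/k support interval: positions over which the maximum is not
    # better supported by a factor of k or more
    best = max(scores)
    support = [p for p, s in pairs if s > best - c]
    return {"c": c, "strong_for": strong_for,
            "strong_against": strong_against,
            "weak": weak, "support": support}

# Hypothetical positions (cM) and scores, for illustration only
positions = [0, 5, 10, 15, 20, 25]
scores = [-2.0, -0.4, 1.2, 2.6, 1.9, 0.3]
result = summarize_scan(positions, scores, k=32)
print(result)
```

Note that the same k is applied at every position, which is the point of the evidential approach described above.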

2. Review of the Evidential Paradigm and Previous Work

Royall [6] and Blume [7] introduced the evidential paradigm, and the methodology for two-point linkage analysis for known disease models was developed by Strug and Hodge [12]. Since evaluating the MP lod and mod scores over the whole genome is analogous to evaluating two-point lod and mod scores over the range of possible recombination fractions, the evidential paradigm is straightforward to apply to MP linkage analysis. Here we briefly summarize.

2.1. LR and the Criterion for Strong Evidence

To formulate the LR, one puts the null hypothesis H₀ of ‘no linkage’ in the denominator. For the numerator, one inserts a simple alternative hypothesis, H₁ (also see Sec. 3.2). Once the data have been observed, the investigator calculates and reports the LR as a function of the parameter of interest (in MP linkage analysis, that parameter is the location in the genome). The investigator then interprets each ratio as the strength of the evidence favoring various alternatively hypothesized parameter values over the null hypothesized value of no linkage.

One chooses a value k as the criterion for ‘strong evidence’, such that if a LR is greater than or equal to k, this is interpreted as strong evidence in favor of the hypothesis in the numerator, whereas if the LR is less than or equal to 1/k, that is strong evidence for the hypothesis in the denominator.¹ Benchmarks for k have been suggested and justified in the literature [6, 7, 16]. Some reasonable values of k are 32 and 100, whereas in the linkage literature, classically a value of k = 1,000 (corresponds to a lod score of 3.0) has been used.

Additionally, 1/k support intervals are calculated to give a set of parameter values that are consistent with the data, since the MLE is not better supported over these parameter values by a factor of k or more; these are analogous to confidence intervals yet have a different interpretation [6, 7, 9, 16, 17]. For lods and mods, one works with the log₁₀ LR, so one uses c, defined as c ≡ log₁₀ k, as the corresponding criterion. If the lod or mod falls between –c and c, that outcome is interpreted as ‘weak evidence’, i.e. evidence that is not strong enough to favor either of the two hypotheses.

2.2. The Error Probabilities

Prior to data analysis, error probabilities are calculated from two simple hypotheses, the one representing ‘no linkage’, and the other a single simple alternative (e.g. a position across the genome for MP linkage analysis, or a specific value of the recombination fraction for two-point linkage analysis). The simple alternative is chosen for planning purposes only and has no effect on the investigator's ability to interpret the LR at other simple alternatives once the data have been observed, as we explain further below. If the null hypothesis is true, then a small LR leads to the correct conclusion, whereas a large one gives misleading information. The first outcome is referred to as ‘strong’ evidence and the second one as ‘misleading’ evidence. We denote their respective probabilities as follows:

\begin{array}{ll}
S_{0} \equiv \Pr[\log LR \leq -c \mid H_{0}] & \text{= probability of strong evidence when } H_{0} \text{ is true,} \\
M_{0} \equiv \Pr[\log LR \geq c \mid H_{0}] & \text{= probability of misleading evidence when } H_{0} \text{ is true,} \\
W_{0} \equiv \Pr[-c < \log LR < c \mid H_{0}] & \text{= probability of weak evidence when } H_{0} \text{ is true,}
\end{array}

(1)

where W₀ represents the third outcome, in which log LR falls between the two criteria. If the alternative hypothesis is true, a large LR leads to the correct conclusion, so we define:

\begin{array}{ll}
S_{1} \equiv \Pr[\log LR \geq c \mid H_{1}] & \text{= probability of strong evidence when } H_{1} \text{ is true,} \\
M_{1} \equiv \Pr[\log LR \leq -c \mid H_{1}] & \text{= probability of misleading evidence when } H_{1} \text{ is true,} \\
W_{1} \equiv \Pr[-c < \log LR < c \mid H_{1}] & \text{= probability of weak evidence when } H_{1} \text{ is true.}
\end{array}

(2)

Note that it follows from (1) and (2) that

S_{i} + W_{i} + M_{i} = 1, for i = 0,1.

(3)
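To make definitions (1) to (3) concrete, the following sketch computes the six probabilities exactly for a deliberately simplified setting: two-point analysis of n fully informative gametes, with H₀: θ = 0.5 and a simple alternative H₁: θ = θ₁. This is an assumption made for illustration and is not the multipoint calculation used in this paper; the function names and the choice θ₁ = 0.05 are hypothetical.

```python
import math

def log10_lr(r, n, theta1):
    """log10 LR for r recombinants among n gametes: H1 (theta1) vs H0 (0.5)."""
    return (r * math.log10(theta1)
            + (n - r) * math.log10(1.0 - theta1)
            - n * math.log10(0.5))

def evidence_probs(n, theta1, k, h1_true):
    """Exact S, W, M under H1 (h1_true=True) or H0 (h1_true=False)."""
    c = math.log10(k)
    theta_true = theta1 if h1_true else 0.5
    out = {"S": 0.0, "W": 0.0, "M": 0.0}
    for r in range(n + 1):
        # Binomial probability of r recombinants under the true state
        p = math.comb(n, r) * theta_true**r * (1.0 - theta_true)**(n - r)
        z = log10_lr(r, n, theta1)
        if -c < z < c:
            out["W"] += p          # weak evidence
        elif (z >= c) == h1_true:
            out["S"] += p          # strong evidence for the true hypothesis
        else:
            out["M"] += p          # strong evidence for the wrong hypothesis
    return out

h0 = evidence_probs(n=10, theta1=0.05, k=8, h1_true=False)  # S0, W0, M0
h1 = evidence_probs(n=10, theta1=0.05, k=8, h1_true=True)   # S1, W1, M1
print(h0, h1)
```

In each case the three probabilities sum to 1, as stated in (3).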

From (1) and (2), the reader can see that M₀ is analogous to type I error (α), and S₁ is analogous to power, in the frequentist paradigm. However, M₀ plays a different role than type I error: Instead of fixing α, and then seeing what cutoff value that corresponds to, the investigator fixes the cutoff value k (or c on the log₁₀ scale), then determines what the error levels are. The investigator is concerned with the corresponding M₀ (the evidential analog to type I error) only in the planning phase. Moreover, the investigator is less concerned with the actual value of this error probability and simply wants to ensure that it remains ‘small’ across the genome (i.e. less than some value, such as 0.05) [7]. This guarantees that interpreting strong evidence vis-à-vis the LR for two simple hypotheses has a low probability of leading one to draw incorrect conclusions about linkage evidence. Royall [5] has derived bounds that ensure this for lod scores. Here we extend this reasoning to MP mod scores, and we also investigate what happens under the alternative hypothesis (i.e. when the true state of nature is linkage).

The corresponding M₀ plays no role in the interpretation of the evidence strength, or in what critical values are required to interpret a given lod or mod score as representing strong linkage evidence or lack thereof; the error probabilities merely provide an assurance that the study design is reliable. This is because in the evidential paradigm, error probabilities and evidence strength are independent concepts [7, 12]. That is, their probability values do not affect the strength of the statistical evidence in the observed data, nor do they affect the probability that the observed evidence is misleading.

Royall [5] has shown that the probability of misleading evidence is bounded asymptotically for quite general one-parameter models (e.g. MP lod scores), for multi-parameter fixed-dimensional models, and even for profile likelihoods. In large samples, the type I error analog, M₀, cannot exceed $\Phi(-\sqrt{2 \ln k})$ for any alternatively hypothesized parameter value, where Φ is the standard normal cumulative distribution function (this result also holds for M₁). Therefore, for MP lod scores in large samples we can be confident that both M₀ and M₁ will remain small, and thus one is able to interpret the statistical evidence for linkage directly from the MP lod score, irrespective of a frequentist critical value. MP mod scores, on the other hand, are not LRs or profile LRs since they maximize the ratio over the unknown trait parameters in numerator and denominator simultaneously². Here we will evaluate the evidential error rates for MP mod scores via simulation, in order to ensure that MP mod scores exhibit the operating characteristics fundamental to the evidential paradigm: (1) small M₀ (n, k) and M₁ (n, k) and (2) decreasing W₀ (n, k) and W₁ (n, k) with increasing sample size. These results will justify use of the evidential paradigm in MP linkage studies when the disease model is unknown and requires maximization over multiple trait parameters.
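The large-sample bound Φ(−√(2 ln k)) is easy to evaluate numerically. The short sketch below does so with the Python standard library only; the function names are illustrative and not taken from any linkage package.

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function, via erfc."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def royall_bound(k):
    """Large-sample bound on the probability of misleading evidence."""
    return std_normal_cdf(-math.sqrt(2.0 * math.log(k)))

for k in (8, 32, 100, 1000):
    print(f"k = {k:>4}: bound = {royall_bound(k):.5f}")
```

For the two criteria used in the simulations here, the bound is roughly 0.021 for k = 8 and roughly 0.004 for k = 32, and it decreases as k increases.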

2.3. Our Previous Work

In Strug and Hodge [12] we applied the evidential paradigm to classical two-point linkage analysis, under simple generating models (GMs). We illustrated applications of the evidential paradigm in fully informative gametes, in double backcross sibling pairs, and in nuclear families, analyzing single-gene traits for which the disease model is known but reduced penetrance may be incorrectly specified. We combined analytical studies and simulations to develop the operational characteristics of the evidential paradigm as applied to linkage analysis, illustrating the resulting low rates of incorrect conclusions (i.e. small values of M₀ and M₁). We showed that, for reasonable values of k and of the recombination fraction specified under H₁ (i.e. θ₁), M₀ and M₁ were naturally very low, and W₀ and W₁ could be well controlled with sample size during the planning phase.

As part of the work in Hodge et al. [4] we presented some results for M₀ for MP lods and mods for two simple hypotheses (as computed in the evidential paradigm). In extensive simulations we confirmed that within the limits of simulation, not only were the M₀ very low, but, even more importantly, they decreased with increasing sample size for both MP lods and MP mods. (As mentioned in the Introduction, the lod results for two simple hypotheses follow directly from Royall [5], but the mod results required investigation by simulation.)

For example, we generated nuclear families under a dominant or recessive model with reduced penetrance, and analyzed them under the same model; i.e. the analysis model (AM) was the same as the GM for each analysis (lod analyses). Even when we used the low lod cutoff value of c = 0.9 (corresponds to k = 8), values of M₀ quickly dropped from very small (between 0.001 and 0.022) to essentially zero as dataset size increased from 10 to 30 families. For the corresponding mod analyses, using the same cutoff value, M₀ values were of course higher (though still below 0.04 even for datasets as small as 10 families) and required dataset sizes of 75 families before they approached zero; however, when we used a cutoff of c = 1.5 (k = 32), M₀ values were near zero by the time dataset size reached 50 families. For detailed numerical results see table 1 and figure 1 in Hodge et al. [4].

Table 1.

Relative magnitudes of S, W, and M for lods versus mods

When H₀ is true	When H₁ is true
S₀ mod ≤ S₀ lod	S₁ mod ≥ S₁ lod
W₀ mod ≥ W₀ lod	W₁ mod ≤ W₁ lod
M₀ mod ≥ M₀ lod	M₁ mod ≤ M₁ lod

Conclusions for S and M are proven rigorously (Appendix, Section 2). Conclusions for W are based on approximate arguments (Appendix, Section 3) and are borne out in the simulations for W₀ and W₁ (tables 4 and 6, respectively).

In the current study we continue with the situation of two simple hypotheses, and we extend these results to W₀, as well as studying the behavior of the error probabilities for MP lods and mods under the alternative hypothesis of linkage, in order to ultimately present a coherent algorithm to implement MP linkage analysis using the evidential paradigm (Box 1).

3. Two Guidelines for Conducting Evidential MP Linkage Analysis

In order to conduct an MP linkage study evidentially, i.e. to calculate a lod/mod score and interpret the strength of the linkage evidence from that lod/mod value itself, one needs to know that the probability of errors is small for a given criterion, k, sample size and parameter vector. For lods we know the behavior of the error rates from Royall [5]. For mods, simulation is required in order to explore the behavior of these error probabilities. When we estimate the size of these error rates, two questions arise: (1) When is it sufficient to base error rate estimates on lods (i.e. rely on large sample results) rather than calculate the mod errors (Section 3.1)? (2) What parameter value should be used for the simple alternative hypothesis, to calculate errors (Section 3.2)? Box 1(A) summarizes the roles of these two guidelines in an evidential MP linkage study, while Box 1(B) provides the analysis algorithm.

3.1. Relative Magnitudes of Errors for Lods and Mods

Obviously, if one uses the same cutoff value for both lods and mods, the mod error rates will be higher than lod error rates if the true state of nature is ‘no linkage,’ and lower if the true state of nature is ‘linkage’. Thus, both M₀ and S₁ will be higher for mods than for lods.

It is less clear what happens to the W_i, the probabilities of weak evidence. It turns out that under some reasonable assumptions, W₀ is greater for mods, whereas W₁ is greater for lods; thus the W_i parallel the behavior of the M_i, not of the S_i. Table 1 summarizes these relationships, and our simulations (described below) demonstrate the patterns. Section 2 of the Appendix gives proofs for the behavior of the M and S; Section 3 gives supporting arguments for the behavior of the W.

In summary, since under the alternative, mod error rates are less than lod error rates, and since the M₁ are bounded for lods [5], we can be assured that M₁ calculated from mods are small. Therefore, assuming MP mod scores under the alternative behave reliably according to the operational characteristics of the evidential paradigm (i.e. have small M₁ (n, k) and decreasing W₁ (n, k) with increasing sample size), one needs to calculate only M₀ based on mods in order to ensure low misleading error rates for a given study. For a well-designed study, however, one should be aware of the study's corresponding probabilities of weak evidence, W₀ and W₁, and sample size should be chosen as best as possible to minimize these probabilities.

3.2. Choosing the Alternative Hypothesis for Planning Purposes

In order to calculate the required planning probabilities M₀, W₀, and W₁, the user must specify not only the null hypothesis of no linkage, but also a simple alternative hypothesis. This simple alternative should be the most conservative choice such that the corresponding probability for this simple alternative serves as an upper bound for any pair of markers in the genome scan. It is easy to misunderstand the role of this alternative hypothesis and to think that specifying it limits one's analysis or one's abilities to draw conclusions from one's data. However, that is not the case. The alternative hypothesis plays a role only in planning the study and plays no role in data analysis.

To understand this, consider that between any two marker loci A and B, the greatest probability of misleading evidence when there is no linkage (i.e. the maximum M₀) will occur at the midpoint between the two loci. This is true since at positions closer to either locus, there is more evidence against linkage with that locus. Table 2 illustrates this phenomenon for two loci separated by 10 cM. The table shows values of M₀ (exact, not simulated), calculated at distances of 0.1, 1.0, 2.0, 3.0, 4.0, and 5.0 cM from the first locus, and illustrates how the maximum M₀ always occurs in the middle (calculations as in Hodge et al. [4]). Thus, one can safely choose the midpoint between two markers to represent the alternative simple hypothesis. This provides a conservative estimate of the error rate under H₀, since for any other alternative that error rate would be smaller. Likewise, for any other alternative, W₀ is smaller. This implies that the largest sample size estimate also comes from the choice of the alternative at the midpoint.

Table 2.

Values of M₀ at positions between markers A and B, for cutoff value c = 0.9, for different numbers (n) of fully informative gametes

n	Distance from locus A, cM
	0.1	1.0	2.0	3.0	4.0	5.0
10	0.0019	0.0022	0.0023	0.0023	0.0023	0.0023
12	0.0005	0.0006	0.0007	0.0016	0.0016	0.0016
14	0.0001	0.0010	0.0010	0.0013	0.0016	0.0017
16	0.0000	0.0003	0.0006	0.0007	0.0008	0.0008
18	0.0001	0.0002	0.0002	0.0003	0.0003	0.0003
20	0.0000	0.0001	0.0001	0.0001	0.0001	0.0001
25	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000

Loci A and B are separated by 10.0 cM.

Numbers in the table result from exact calculations, not simulations.

Further, among all pairs of contiguous marker loci, one can pick that pair whose two loci are spaced furthest apart. The results for that pair will then be conservatively applicable to all other pairs. (Also see last paragraph of Section 5, Simulation Results, as well as Box 1.)
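The choice of planning alternative described in this section reduces to a simple search over the marker map. The sketch below assumes the markers lie on a single chromosome with known map positions in cM; the function name and the example map are hypothetical.

```python
def widest_gap(marker_positions_cm):
    """Return (d, left, right, midpoint) for the widest adjacent marker pair."""
    pos = sorted(marker_positions_cm)
    if len(pos) < 2:
        raise ValueError("need at least two markers")
    # Distance between each pair of contiguous markers
    gaps = [(b - a, a, b) for a, b in zip(pos, pos[1:])]
    d, left, right = max(gaps)
    # The midpoint serves as the simple alternative hypothesis for planning
    return d, left, right, (left + right) / 2.0

# Hypothetical marker map (cM), for illustration only
d, left, right, midpoint = widest_gap([0.0, 4.0, 14.0, 20.0, 23.0])
print(d, left, right, midpoint)
```

For a genome scan, one would apply this per chromosome and take the largest d overall as the marker spacing used in the planning simulations.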

4. Simulation Methods

4.1. Generating Model

Here we use the same simulation models as in Hodge et al. [4]. Briefly, we simulated two marker loci, A and B, separated by a distance of 10 cM. The trait was due to either a dominant or recessive model with reduced penetrance. Let T represent the trait allele and t, the wild type. For the dominant GMs, the frequency of T was 0.01, and the penetrance vector for genotypes (TT, Tt, tt) was (f, f, 0.001), where f assumed the values 0.20, 0.50, or 0.80 (models D20, D50, and D80, respectively). For the recessive GMs, frequency of T = 0.14, and penetrance vectors = (f, 0.001, 0.001), where f assumed the same three values (models R20, R50, and R80, respectively).

For the linkage scenario, the trait locus was linked to both markers, either located halfway between the two marker loci (results shown in tables 3, 4, 5) or located at marker A (results not shown). Simulations were performed at the midpoint to start with, so as to correspond to the H₀ simulations [4]. Results when the trait locus was located at one of the markers were very similar (see Results for details). For the nonlinkage scenarios, the trait locus was not linked to either marker.

Table 3.

Values of M₁ for lods and mods in 1,000 simulated datasets, as a function of dataset size, for cutoff value c = 0.9 (k = 8), evaluated at a position halfway between two markers separated by 10 cM

GM	Number of families
	10	20	30	40	50	75
a Lods
D20	0.009	0.004	0.001	0.002	0.000	0.000
D50	0.006	0.000	0.000	0.000	0.000	0.000
D80	0.001	0.000	0.000	0.000	0.000	0.000
R20	0.010	0.000	0.002	0.000	0.000	0.000
R50	0.001	0.002	0.000	0.000	0.000	0.000
R80	0.001	0.000	0.000	0.000	0.000	0.000

b Mods
D20	0.000	0.000	0.001	0.000	0.001	0.000
D50	0.000	0.000	0.000	0.000	0.000	0.000
D80	0.000	0.000	0.000	0.000	0.000	0.000
R20	0.001	0.000	0.000	0.000	0.000	0.000
R50	0.000	0.000	0.000	0.000	0.000	0.000
R80	0.000	0.000	0.000	0.000	0.000	0.000

Table 4.

Values of W₁ for lods and mods in 1,000 simulated datasets, as a function of dataset size, for cutoff value c = 0.9 (k = 8) and c = 1.5 (k = 32), evaluated at a position halfway between two markers separated by 10 cM

GM	Number of families
	10		20		30		40		50		75
	k = 8	k = 32	k = 8	k = 32	k = 8	k = 32	k = 8	k = 32	k = 8	k = 32	k = 8	k = 32
a Lods
D20	0.269	0.576	0.072	0.163	0.017	0.044	0.009	0.024	0.005	0.013	0.000	0.000
D50	0.090	0.195	0.010	0.029	0.000	0.002	0.000	0.000	0.000	0.000	0.000	0.000
D80	0.019	0.035	0.001	0.002	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
R20	0.099	0.208	0.016	0.039	0.001	0.004	0.000	0.001	0.000	0.000	0.000	0.000
R50	0.032	0.088	0.004	0.008	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
R80	0.008	0.015	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000

b Mods
D20	0.234	0.512	0.056	0.130	0.012	0.033	0.006	0.016	0.003	0.005	0.000	0.000
D50	0.061	0.143	0.004	0.013	0.000	0.001	0.000	0.000	0.000	0.000	0.000	0.000
D80	0.007	0.017	0.001	0.002	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
R20	0.060	0.171	0.002	0.012	0.001	0.002	0.000	0.000	0.000	0.000	0.000	0.000
R50	0.006	0.042	0.000	0.001	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
R80	0.002	0.004	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000

Table 5.

Values of S₁ for lods and mods in 1,000 simulated datasets, as a function of dataset size, for cutoff value c = 0.9 (k = 8) and c = 1.5 (k = 32), evaluated at a position halfway between two markers separated by 10 cM

GM	Number of families
	10		20		30		40		50		75
	k = 8	k = 32	k = 8	k = 32	k = 8	k = 32	k = 8	k = 32	k = 8	k = 32	k = 8	k = 32
a Lods
D20	0.722	0.424	0.924	0.836	0.982	0.955	0.989	0.975	0.995	0.987	1.000	1.000
D50	0.904	0.805	0.990	0.971	1.000	0.998	1.000	1.000	1.000	1.000	1.000	1.000
D80	0.980	0.965	0.999	0.998	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
R20	0.891	0.789	0.984	0.961	0.997	0.995	1.000	0.999	1.000	1.000	1.000	1.000
R50	0.967	0.912	0.994	0.992	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
R80	0.991	0.984	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000

b Mods
D20	0.766	0.488	0.944	0.870	0.987	0.967	0.994	0.984	0.997	0.995	1.000	1.000
D50	0.939	0.857	0.996	0.987	1.000	0.999	1.000	1.000	1.000	1.000	1.000	1.000
D80	0.993	0.983	0.999	0.998	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
R20	0.939	0.829	0.998	0.988	0.999	0.998	1.000	1.000	1.000	1.000	1.000	1.000
R50	0.994	0.958	1.000	0.999	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
R80	0.998	0.996	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000


We simulated four-child nuclear families with at least two affected children. Parents were typed and were fully informative except for phase (that is, both parents were heterozygous at both marker loci, with no marker alleles in common, but their phases were unknown), and parental affectedness status was known.

4.2. Analysis Models

We analyzed the simulated data assuming a number of AMs, among which the true model (the GM) was always included. Dominant AMs D10, D20, …, D90 had penetrance vectors (f, f, 0.001) where now f assumed values 0.1, 0.2, …, 0.9, respectively; and similarly for recessive AMs R10 through R90. For each GM and each value of n (n = 10, 20, 30, 40, 50, 75), we simulated N = 1,000 datasets of n families each. Each dataset was first analyzed via MP lods, i.e. where we assumed the correct GM in order to perform the analysis. Then each dataset was analyzed via MP mods; here we determined a max lod for each of the 18 different AMs, then determined the maximum of those 18 maxima. These AMs were chosen to be comparable to those used in the earlier papers [3, 4], and serve as an approximation to a true mod score, which would require an optimization procedure to calculate. Scores were calculated at only one position, located halfway between the two markers. We used cutoff values of k = 8 and 32, or, equivalently on the common log scale, c = 0.9 and 1.5.

For each analysis, we tabulated the proportion of times, out of 1,000, that the appropriate score (max lod or max mod) was less than or equal to −c (this proportion represents M₁), fell between −c and c (W₁), or was greater than or equal to c (S₁).
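The tabulation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the simulation code used in this study: the function names `tabulate_evidence` and `mod_score` and the toy scores are hypothetical, and scores are assumed to be on the log10 scale, so that c = log10(k).

```python
import math

def mod_score(lods_by_model):
    """Approximate mod: the largest lod over the analysis models considered."""
    return max(lods_by_model)

def tabulate_evidence(scores, k):
    """Return the proportions of scores that are <= -c, strictly between
    -c and c, and >= c, where c = log10(k).
    When H1 is true these three proportions estimate M1, W1, and S1."""
    c = math.log10(k)
    n = len(scores)
    low = sum(1 for s in scores if s <= -c)
    high = sum(1 for s in scores if s >= c)
    weak = n - low - high
    return low / n, weak / n, high / n

# Hypothetical log10 scores from five simulated datasets (not real results).
toy_scores = [2.1, 0.3, -1.0, 1.6, 0.95]
for k in (8, 32):
    m1, w1, s1 = tabulate_evidence(toy_scores, k)
    print(k, round(math.log10(k), 2), m1, w1, s1)
```

With these toy values, raising k from 8 to 32 moves scores out of the misleading and strong categories and into the weak category, which is the qualitative pattern reported in the tables.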

5. Simulation Results

M₀ and S₀ can be found in Table 1 of Hodge et al. [4]. Tables 3, 4, 5 here show the simulation results for M₁, W₁, S₁, respectively, when the trait locus is located halfway between markers A and B. Table 6 shows results for W₀, i.e. when the trait locus is not linked to either marker. In each table, part (a) shows the results for MP lods calculated under the correct model (the GM), whereas part (b) gives the results for MP lods maximized over the 18 models (MP mod) described above. We show results for k = 8 and k = 32, except in table 3 (M₁), where we show only results for k = 8, since almost all the k = 32 results equaled zero.

Table 6.

Values of W₀ for lods and mods in 1,000 simulated datasets, as a function of dataset size, for cutoff value c = 0.9 (k = 8) and c = 1.5 (k = 32), evaluated at a position halfway between two markers separated by 10 cM

GM	Number of families
	10		20		30		40		50		75
	k = 8	k = 32	k = 8	k = 32	k = 8	k = 32	k = 8	k = 32	k = 8	k = 32	k = 8	k = 32
a Lods
D20	0.509	0.709	0.287	0.447	0.157	0.280	0.095	0.165	0.043	0.097	0.016	0.028
D50	0.294	0.459	0.106	0.179	0.035	0.053	0.013	0.023	0.008	0.016	0.001	0.002
D80	0.107	0.179	0.012	0.024	0.000	0.001	0.000	0.000	0.000	0.000	0.000	0.000
R20	0.284	0.477	0.103	0.177	0.034	0.069	0.013	0.032	0.007	0.013	0.000	0.001
R50	0.175	0.291	0.025	0.061	0.009	0.022	0.004	0.005	0.000	0.001	0.000	0.000
R80	0.066	0.122	0.005	0.009	0.001	0.003	0.000	0.000	0.000	0.000	0.000	0.000

b Mods
D20	0.857	0.973	0.686	0.906	0.497	0.723	0.409	0.622	0.325	0.521	0.179	0.321
D50	0.660	0.879	0.405	0.615	0.220	0.396	0.129	0.252	0.091	0.144	0.021	0.043
D80	0.509	0.743	0.235	0.402	0.095	0.172	0.045	0.081	0.022	0.040	0.002	0.005
R20	0.904	0.986	0.774	0.960	0.664	0.892	0.577	0.817	0.466	0.739	0.321	0.543
R50	0.892	0.986	0.775	0.956	0.617	0.872	0.521	0.780	0.456	0.705	0.284	0.505
R80	0.894	0.989	0.758	0.950	0.637	0.879	0.491	0.761	0.393	0.675	0.275	0.504


For M₁ (table 3), each of the six GMs considered, at all sample sizes, shows very few instances of misleading evidence; the worst case occurs under the recessive model with 20% penetrance, for k = 8 with 10 families (M₁ = 0.010). These rates quickly approach 0 as n increases, as would be predicted for lods based on Royall [5]. As expected (Appendix, Section 2), comparing table 3a with 3b confirms that M₁ is uniformly smaller after maximizing over the 18 AMs. Therefore, for the models considered here, which cover a broad grid of potential models, M₁ is essentially negligible even at relatively small sample sizes and even for low values of k.

Turning to table 4 and the probability of observing weak MP linkage signals when there is linkage we see that: W₁ increases with k, for a given GM and sample size; W₁ decreases with increasing sample size n, for a given k and GM; and W₁ decreases as penetrance increases, for a given k and n. These results hold in both dominant and recessive models, for both lods and mods. Moreover, comparing individual cells from table 4a and b indicates that W₁ is smaller when one maximizes over all the 18 AMs rather than analyzing at just the true model, thus supporting the argument in Section 3 of the Appendix.

Table 5 shows the probabilities of strong evidence for lods (table 5a) and mods (table 5b). Since M₁ + W₁ + S₁ = 1 (eq. 3), it is not surprising to see that S₁ is smaller for larger k, is larger for larger n, and is larger for higher penetrances. Also, S₁ is at least as great for mods as for lods when H₁ is true (Appendix, Section 2).
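Because the three proportions sum to one, M₁ can be recovered from the W₁ and S₁ entries. The short sketch below does this for the lod results with 10 families and k = 8, using values copied from tables 4a and 5a; the variable names are illustrative only.

```python
# W1 and S1 for lods, n = 10 families, k = 8 (copied from tables 4a and 5a).
w1 = {"D20": 0.269, "D50": 0.090, "D80": 0.019,
      "R20": 0.099, "R50": 0.032, "R80": 0.008}
s1 = {"D20": 0.722, "D50": 0.904, "D80": 0.980,
      "R20": 0.891, "R50": 0.967, "R80": 0.991}

# M1 = 1 - W1 - S1, rounded to the three decimals reported in the tables.
m1 = {gm: round(1.0 - w1[gm] - s1[gm], 3) for gm in w1}

# Generating model with the largest probability of misleading evidence.
worst = max(m1, key=m1.get)
print(m1)
print(worst, m1[worst])
```

The largest recovered value is M₁ = 0.010 for R20, in agreement with the worst case quoted for table 3.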

Finally, table 6 gives the probabilities of weak evidence when there is no linkage (H₀). These results qualitatively mirror those for W₁, in both dominant and recessive models, for both lods and mods. That is, W₀ also increases with k, for a given GM and sample size; decreases with increasing sample size n, for a given k and GM; and decreases as penetrance increases, for a given k and n. Unlike W₁, W₀ is larger for mods than for lods, also supporting the argument in Section 3 of the Appendix.

Interestingly, the values for W₀ are consistently larger than those for W₁, as we see if we compare tables 4 and 6, cell by cell. Yet misleading evidence probabilities (M_i) remain very small under both H₀ and H₁. Table 7 illustrates what happens to W₀ and W₁ as we move from lods to mods, for some typical examples. The table gives M, W, and S for two GMs (one dominant, one recessive, both with 50% penetrance) and datasets of 20 families, using k = 32 as the criterion. Under H₁ (part b of table 7), moving from lods to mods has only a small effect on the probabilities of strong evidence, S₁ (already high), and weak evidence, W₁ (already low). In contrast, under H₀ (table 7a), the probability of strong evidence, S₀, goes from reasonably high (0.82 for the dominant example, 0.94 for the recessive one) to much lower (0.37 dominant, 0.035 recessive). Yet almost all of that shift shows up in W₀, the probability of weak evidence, not in misleading evidence, which rises only slightly. In other words, although M₀ does increase in mods as compared to lods (Appendix, Section 2), even for mods it remains very small, as we had shown in our earlier work [4].

Table 7.

Illustrative values of M_i, W_i, and S_i for one dominant and one recessive model, in datasets of size n = 20, using k = 32

a Under H₀

	GM = D50			GM = R50
	M₀	W₀	S₀	M₀	W₀	S₀
Lod	0.004	0.179	0.817	0.001	0.061	0.938
Mod	0.016	0.615	0.369	0.009	0.956	0.035

b Under H₁

	GM = D50			GM = R50
	M₁	W₁	S₁	M₁	W₁	S₁
Lod	0.000	0.029	0.971	0.000	0.008	0.992
Mod	0.000	0.013	0.987	0.000	0.001	0.999


We also examined what happens under H₁ when the trait locus is located at one of the two marker loci. The results (not shown) for these runs are qualitatively similar to those where the trait locus is halfway between the two markers, but are slightly more favorable, in that the probabilities of weak evidence (W₁) are slightly smaller, and the probabilities of strong evidence (S₁) slightly greater. Therefore, the results shown in the tables represent the worst-case scenarios.

6. Discussion

6.1. Summary

In this study we have investigated via simulation the operational characteristics of the evidential paradigm when applied to MP linkage analysis for both lod and mod analysis. We have demonstrated that the operational characteristics of the evidential paradigm we observed in Strug and Hodge [12] continue to hold for MP mod scores despite being non-standard evidence functions, allowing us to extend the paradigm to MP linkage analysis with unknown disease models.

Here we have provided the theoretical results required to guide investigators on how to calculate the error probabilities of the evidential paradigm, described in Section 3 and summarized in Box 1. We have also performed extensive simulations, which indicate that the evidential approach can yield low error rates under the null and alternative hypotheses for both lods and mods (tables 3, 4, 5, 6).

6.2. Favorable Operating Characteristics of the Evidential Paradigm

The evidential paradigm is straightforward to apply to MP linkage studies, requiring only that MP lods or mods be calculated. The strength of the linkage evidence can then be interpreted directly from the observed value of the lod/mod score.

We have seen that although the probability of observing misleading evidence is, in general, quite low, the corresponding probabilities of weak evidence can be large. We reiterate that, just as in the two-point linkage case [12], during the planning stage investigators should focus more on minimizing weak signals than on minimizing the misleading evidence probabilities. If the probability of weak evidence is too high, one can lower it either by increasing sample size or by choosing a lower value of k. Increasing sample size is the ideal solution but is not always possible. However, it may be feasible to lower k. Investigators tend to focus on reducing type I error probabilities in the frequentist paradigm, which in turn leads to larger critical values, whose role is analogous to that of k. In the evidential paradigm, raising k has only a minor effect on lowering the probabilities of misleading evidence, but a larger effect on raising the probabilities of weak evidence [13] (also see tables 3, 4, and 5 in this paper). Investigators should therefore focus on minimizing weak evidence probabilities, and for that purpose, one should not increase k beyond what is compelling [6, 16]. This concept of explicitly minimizing the probability of weak evidence represents a novel contribution of the evidential paradigm [18].

6.3. Advantage of Single Criterion

We have shown that, in contrast to frequentist analysis of MP linkage, with the evidential paradigm one can use a common linkage criterion for MP studies, presuming an investigator has established that the evidential error probability of misleading evidence for a proposed MP linkage study is small. Using the frequentist paradigm, if one requires type I error control of, for example, 5%, then the linkage criterion for significance would be different at each marker and across every study.

6.4. Meaning of Error Probabilities in the Evidential Paradigm

We return to the subject of what the error M₀ actually means. We have referred to it as being analogous to ‘type I error’ but it is really quite different. As we have mentioned, a major strength of the evidential paradigm lies in keeping the measure of errors separate from the measure of evidence [6, 7]. The fundamental distinction is that M₀ is a measure of uncertainty, not of evidence. Consider a study that yields an M₀ value of 0.01. All that this value is telling us is that if the null hypothesis is true – i.e. if the disease/trait really is not linked to these markers – there is only a 1 in 100 probability that we will observe a high LR that will mislead us into concluding linkage. This M₀ = 0.01 does not say anything about how strong the evidence is in favor of linkage; for that, we look at the actual value of the lod (mod) calculated from the given data set. Future work will determine whether the asymptotic upper bound on the probability of misleading evidence [5] holds for MP mod scores, avoiding the requirement of calculating M₀ to ensure it is small.

Appendix: Relative Magnitudes of Evidential Errors for Lods and Mods (Table 1)

1. Notation and Setup

Consider a set of m LRs, LR_i, i = 1, …, m, and let LR_max denote the maximum of the m LRs. Any individual LR_i corresponds to an individual lod analysis, whereas LR_max corresponds to a mod analysis.

Also consider two positive numbers: 0 < a < b.

· For each LR_i, define P_i ≡ Pr[LR_i ≤ a], R_i ≡ Pr[LR_i ≥ b], and Q_i ≡ Pr[a < LR_i < b] = 1 − P_i − R_i.

· For LR_max, let P_∗ be the probability that LR_max is less than or equal to a, i.e. P_∗ = Pr[LR_max ≤ a], and similarly, R_∗ = Pr[LR_max ≥ b] and Q_∗ ≡ Pr[a < LR_max < b] = 1 − P_∗ − R_∗.

First we examine M and S; then we will turn to W.

2. Implications for M and S

From basic probability principles, it is straightforward to show that the probability of the max LR being less than or equal to a cannot exceed the probability for any of the individual LRs to be less than or equal to a. That is,

\Pr[LR_{\max} \leq a] = \Pr\left[\bigcap_{i} (LR_{i} \leq a)\right] \leq \Pr[LR_{i} \leq a] \quad \forall i,

i.e.

P_{*} \leq P_{i} \quad \forall i.

(A.1)

On the other hand, the probability of the max LR being greater than or equal to b cannot be less than the probability for any of the individual LRs to be greater than or equal to b, i.e.

\Pr[LR_{\max} \geq b] = \Pr\left[\bigcup_{i} (LR_{i} \geq b)\right] \geq \Pr[LR_{i} \geq b] \quad \forall i,

i.e.

R_{*} \geq R_{i} \quad \forall i.

(A.2)

Applying these two results to the evidentialist paradigm, let b represent k, our upper cutoff value for strong evidence supporting H₁, and let a represent 1/ k, our lower cutoff value for strong evidence supporting H₀.

· When H₀ is true, then LR_i being less than or equal to a represents strong evidence, whereas LR_i being greater than or equal to b represents misleading evidence; hence, S_i = P_i and M_i = R_i; also S_∗ = P_∗ and M_∗ = R_∗. It follows from (A.1) and (A.2) that the probability of strong evidence will be less for mods than for lods, and the probability of misleading evidence will be greater for mods than for lods, as summarized in table 1, in the ‘H₀ true’ column.

· When H₁ is true, it is the other way around: LR_i being less than or equal to a represents misleading evidence, whereas LR_i being greater than or equal to b represents strong evidence; hence, M_i = P_i and S_i = R_i; and M_∗ = P_∗ and S_∗ = R_∗. Now the probability of strong evidence is greater, and the probability of misleading evidence less, for mods than for lods, as in the ‘H₁ true’ column in table 1.

These conclusions simply formalize the obvious fact that if one maximizes the lod over multiple models, one will achieve an LR that is greater than or equal to the LR resulting from only a single lod. If the null hypothesis is true, this procedure makes a misleading result more likely, whereas if the alternative is true, the procedure makes a not-misleading result more likely.
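Inequalities (A.1) and (A.2) can be checked numerically. The following is a minimal sketch under loud assumptions: the LRs are drawn as independent log-normal values purely for illustration (real lods computed under different analysis models on the same data are strongly correlated), and all variable names are hypothetical. Because the inequalities hold replicate by replicate, the empirical proportions satisfy them for any joint distribution.

```python
import random

random.seed(0)
k = 8.0
a, b = 1.0 / k, k          # lower and upper cutoffs, 0 < a < b
m, n_rep = 5, 10_000       # LRs per replicate, number of replicates

# Toy LRs: independent log-normal draws (illustrative assumption only).
draws = [[random.lognormvariate(0.0, 2.0) for _ in range(m)]
         for _ in range(n_rep)]

# Empirical P_i and R_i for each individual LR.
P = [sum(row[i] <= a for row in draws) / n_rep for i in range(m)]
R = [sum(row[i] >= b for row in draws) / n_rep for i in range(m)]

# Empirical P_* and R_* for the maximum LR.
P_star = sum(max(row) <= a for row in draws) / n_rep
R_star = sum(max(row) >= b for row in draws) / n_rep

print("A.1 holds:", P_star <= min(P))
print("A.2 holds:", R_star >= max(R))
```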

3. Implications for W

Under H₀, W_∗ is generally greater for mods than for lods, whereas under H₁, W_∗ is less for mods than for lods. We do not have definitive proofs of these relationships, but we show reasonable qualitative arguments here:

Working again with the P, Q, and R, and writing ‘LR between’ as shorthand for the event a < LR < b:

Q_{*} \equiv \Pr[LR_{\max} \text{ between}] = \Pr[LR_{i} \text{ between}] - A_{i} + B_{i}, \quad \text{i.e.}

Q_{*} = Q_{i} - A_{i} + B_{i}, \quad \text{for any } i,

(A.3)

where

A_{i} \equiv \Pr\left[(LR_{i} \text{ between}) \cap \left\{\bigcup_{j \neq i} (LR_{j} \geq b)\right\}\right],

B_{i} \equiv \Pr\left[(LR_{i} \leq a) \cap \left(\max_{j \neq i} LR_{j} \text{ between}\right)\right],

(A.4)

i.e. the probability that the max LR is ‘between’ (Q_∗) can be expressed as –

• the probability that any one LR – say, LR_i – is ‘between’ (Q_i)

• minus the probability of the event in which LR_i is ‘between’ but at least one other LR_j is greater than or equal to b (A_i)

• plus the probability that LR_i is less than or equal to a, and the maximum of all the other LR_j is ‘between’ (B_i).
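The decomposition in (A.3) and (A.4) is an exact identity between events, so it can be verified on counts from any simulated sample. The sketch below is illustrative only: the independent log-normal LRs are an assumption made for convenience, and the counter names are hypothetical.

```python
import random

random.seed(1)
k = 8.0
a, b = 1.0 / k, k          # lower and upper cutoffs, 0 < a < b
m, n_rep = 4, 5_000        # LRs per replicate, number of replicates

def between(x):
    """True when x falls strictly between the two cutoffs."""
    return a < x < b

# Toy LRs: independent log-normal draws (illustrative assumption only).
draws = [[random.lognormvariate(0.0, 2.0) for _ in range(m)]
         for _ in range(n_rep)]

i = 0                      # index of the individual LR being compared
count_q_star = count_q_i = count_A = count_B = 0
for row in draws:
    others = row[:i] + row[i + 1:]
    count_q_star += between(max(row))                     # LR_max between
    count_q_i += between(row[i])                          # LR_i between
    count_A += between(row[i]) and max(others) >= b       # event in A_i
    count_B += row[i] <= a and between(max(others))       # event in B_i

# Q_* = Q_i - A_i + B_i, checked on raw counts.
print(count_q_star == count_q_i - count_A + count_B)
```

Changing `i` to any other index from 0 to m − 1 leaves the identity intact, as (A.3) states it holds for any i.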

Considering what A_i and B_i in (A.4) represent, we argue as follows (again letting b represent k, and a represent 1/k):

• In a well-designed study, when H₀ is true, all the R_i, which represent probabilities of misleading evidence for the individual LRs, will be very small. Then assuming a reasonable sample size, the P_i (probabilities of strong evidence when H₀ is true) will be reasonably large, and the Q_i, which represent probabilities of weak evidence, will be reasonably small, or in any case not larger than the P_i. Each A_i in (A.4) is largely governed by the presumably small probability that at least one of the other LRs is greater than or equal to b, whereas B_i represents the probability of the intersection of two events, neither of which has a very small probability. Thus, most often B_i will be greater than A_i, and consequently Q_∗ in (A.3) will be greater than any of the individual Q_i, or, equivalently, W_∗ > W_i ∀i, as summarized in table 1, in the ‘H₀ true’ column.

• When H₁ is true, it is the other way around: Now in a well-designed study, the P_i represent probabilities of misleading evidence and will be very small. Each A_i is governed by the probability that at least one of the other LRs is greater than or equal to b, whereas B_i cannot be bigger than P_i for any LR, which should be very small. Most often A_i will be greater than B_i, so that Q_∗ in (A.3) will be less than any of the individual Q_i, or, equivalently, W_∗ < W_i ∀i, as in the ‘H₁ true’ column in table 1. These approximate arguments concerning the W are borne out by the results from our simulations under H₀ and H₁ (tables 6 and 4, respectively).

Acknowledgments

This work was supported in part by grants MH-48858 from the National Institute of Mental Health (S.E.H.) and HG004314 from the Human Genome Research Institute (L.J.S.); also by the Natural Sciences and Engineering Research Council of Canada and the Ontario Ministry of Research and Innovation Early Researcher Award program (L.J.S.).

Footnotes

The two criteria need not be symmetric. One can specify one value k > 1 for the level of evidence required to accept H₁, but use 1/l as the criterion for accepting H₀ (where another value l > 1 can be either greater than or less than k). See Strug and Hodge [12] for further discussion.

A profile LR is constructed by maximizing separate profile likelihood functions (max_Φ L(Φ)) in the numerator and denominator of the ratio, where Φ is the unknown trait model. In contrast, a mod score is defined as max_Φ log₁₀ LR(Φ), i.e. the unknown trait model parameters are maximized over the whole ratio.

References

1. Vieland VJ. Bayesian linkage analysis, or: how I learned to stop worrying and love the posterior probability of linkage. Am J Hum Genet. 1998;63:947–954. doi: 10.1086/302076.
2. Devlin B, Roeder K, Wasserman L. False discovery or missed discovery? Heredity. 2003;91:537–538. doi: 10.1038/sj.hdy.6800370.
3. Xing C, Elston RC. Distribution and magnitude of type I error of model-based multipoint lod scores: implications for multipoint mod scores. Genet Epidemiol. 2006;30:447–458. doi: 10.1002/gepi.20157.
4. Hodge SE, Rodriguez-Murillo L, Strug LJ, Greenberg DA. Multipoint lods provide reliable linkage evidence despite unknown limiting distribution: type I error probabilities decrease with sample size for multipoint lods and mods. Genet Epidemiol. 2008;32:800–815. doi: 10.1002/gepi.20350.
5. Royall R. On the probability of observing misleading statistical evidence. J Am Stat Assoc. 2000;95:760–780.
6. Royall R. Statistical Evidence: A Likelihood Paradigm. London: Chapman & Hall; 1997.
7. Blume JD. Likelihood methods for measuring statistical evidence. Stat Med. 2002;21:2563–2599. doi: 10.1002/sim.1216.
8. Choi L, Caffo B, Rohde C. A survey of the likelihood approach to bioequivalence trials. Stat Med. 2008;27:4874–4894. doi: 10.1002/sim.3334.
9. Zhang Z. Interpreting statistical evidence with empirical likelihood functions. Biom J. 2009;51:710–720. doi: 10.1002/bimj.200800209.
10. Hoch JS, Blume JD. Measuring and illustrating statistical evidence in a cost-effectiveness analysis. J Health Econ. 2007;27:476–495. doi: 10.1016/j.jhealeco.2007.07.002.
11. Blume JD, Su L, Olveda RM, McGarvey ST. Statistical evidence for GLM regression parameters: a robust likelihood approach. Stat Med. 2007;26:2919–2936. doi: 10.1002/sim.2759.
12. Strug LJ, Hodge SE. An alternative foundation for the planning and evaluation of linkage analysis. I. Decoupling ‘error probabilities’ from ‘measures of evidence’. Hum Hered. 2006;61:166–188. doi: 10.1159/000094709.
13. Strug LJ, Rohde CA, Corey PN. An introduction to evidential sample size calculations. Am Stat. 2007;61:207–212.
14. Strug LJ, Hodge SE, Chiang T, Pal DK, Corey PN, Rohde C. A pure likelihood approach to the analysis of genetic association data: an alternative to Bayesian and Frequentist analysis. Eur J Hum Genet. 2010;18:933–941. doi: 10.1038/ejhg.2010.47.
15. Strug LJ, Hodge SE. An alternative foundation for the planning and evaluation of linkage analysis. II. Implications for multiple test adjustments. Hum Hered. 2006;61:200–209. doi: 10.1159/000094775.
16. Edwards AWF. Likelihood. Expanded ed. Baltimore: Johns Hopkins University Press; 1992.
17. Pawitan Y. In All Likelihood: Statistical Modelling and Inference Using Likelihood. New York: Oxford University Press; 2001.
18. Kalbfleisch JD. Comment on Royall's ‘On the probability of observing misleading statistical evidence’. J Am Stat Assoc. 2000;95:770–771.
