Overview of techniques to account for confounding due to population stratification and cryptic relatedness in genomic data association analyses

M J Sillanpää

doi:10.1038/hdy.2010.91

2010 Jul 14;106(4):511–519. doi: 10.1038/hdy.2010.91

PMCID: PMC3183892 PMID: 20628415

Abstract

Population-based genomic association analyses are more powerful than within-family analyses. However, population stratification (unknown or ignored origin of individuals from multiple source populations) and cryptic relatedness (unknown or ignored covariance between individuals because of their relatedness) are confounding factors in population-based genomic association analyses, which inflate the false-positive rate. As a consequence, false association signals may arise in genomic data association analyses for reasons other than true association between the tested genomic factor (marker genotype, gene or protein expression) and the study phenotype. It is therefore important to correct or account for these confounders in population-based genomic data association analyses. The common correction techniques for population stratification and cryptic relatedness problems are presented here in the phenotype–marker association analysis context, and comments on their suitability for other types of genomic association analyses (for example, phenotype–expression association) are also provided. Even though many of these techniques have originally been developed in the context of human genetics, most of them are also applicable to model organisms and breeding populations.

Keywords: association analysis, stratification, relatedness, genomic data, Bayesian statistics

Introduction

In genomic data association analysis, putative functional links can be identified by relating the measurements from a single genomic data type to other data sources such as observed phenotypes, gene or protein expressions, molecular markers, functional classifications or cellular responses (for example, Risch and Merikangas, 1996; Jansen and Nap, 2001; Jansen et al., 2002; Bhattacharjee et al., 2008). Quantitative and qualitative traits are commonly studied by using the methods developed for phenotype–marker association analysis. These same methods can also be used to find regulatory pathways or patterns controlling gene expressions (eQTLs) and protein expressions (pQTLs) by treating the expression level of the gene or protein as a classical quantitative phenotype (Jansen and Nap, 2001; Bystrykh et al., 2005; Foss et al., 2007). Other possible links can be identified by using phenotype–expression and phenotype–protein association analysis as well as simultaneous association analysis of multiple data types (Hoti and Sillanpää, 2006; Bhattacharjee et al., 2008; Sillanpää and Noykova, 2008; Bhattacharjee and Sillanpää, 2009). Complementary evidence provided by different data sources can afterwards be combined to study the genomic regions supported by more than one source (Aune et al., 2004; Bhattacharjee et al., 2008).

Genome-wide (population-based) association analysis is generally considered to be a main tool to infer causative links between genomic marker data and phenotype (Risch and Merikangas, 1996; McCarthy and Hirschhorn, 2008). This is so despite problems such as genetic heterogeneity (Terwilliger and Weiss, 1998; Sillanpää and Bhattacharjee, 2006), winner's curse (Lande and Thompson, 1990; Beavis, 1998; Göring et al., 2001; Xiao and Boehnke, 2009) and missing heritability (Maher, 2008; McCarthy and Hirschhorn, 2008; Slatkin, 2009). A particular property of marker data is the systematic spatial dependence along the chromosome. In gene- and protein-expression data, the spatial dependence of expression values at neighbouring genes (or mass-to-charge ratios) is not well established. However, it is possible that the normalization method used will induce some spatial dependence in the expression data. It is also common to assume the presence of some other form of dependence in the data, namely that the expressions of genes belonging to the same pathway are highly dependent on each other. An exception is provided by protein antibody microarrays (used in studies of cancer immune responses), in which the presence of spatial dependence is evident (Wu et al., 2009). It is good to keep in mind the differences between data types, as they affect the applicability of the methods reviewed in this study.

In a population-based phenotype–marker association study, we hope that study individuals are distantly related at the small genome regions (containing the trait loci) so that there is systematic linkage disequilibrium generated by rare occurrence of recombination events within these regions during a large number of meioses in the ancestral pedigree. At the same time, individuals in the study sample are assumed to be mutually independent (unrelated or equally related to each other). However, this assumption does not hold for individuals showing more complex relationship structures, and the potential existence of such discrepancy in the data is generally known as a cryptic relatedness problem (for example, Devlin and Roeder, 1999; Voight and Pritchard, 2005; Zhang and Deng, 2010). In cryptic relatedness, relationships between individuals in the sample are typically either completely known (for example, pedigrees or families are available) or unknown (for example, a sample of potentially related individuals). The existence of individuals originating from multiple source populations in the study sample may create another problem known as population stratification (for example, Lander and Schork, 1994; Cardon and Palmer, 2003). In addition, in this case, the population structure may be either known (for example, a sample of very distinct populations) or unknown (for example, a random sample of individuals from a single or multiple sites).

It is well known that population-based genomic data association analyses generally suffer from confounding because of population stratification (inability to divide the variance into within- and among-population components) and cryptic relatedness (inability to account for varying within-population relationships among study individuals). If not properly accounted for, these confounding factors (stratification or cryptic relatedness) may produce spurious associations in genomic data association analyses that do not reflect any real association between the tested genomic factor and the trait value. Population stratification is a more widely discussed topic than cryptic relatedness. Yet, several papers on both topics can be easily found in phenotype–marker association studies (Lander and Schork, 1994; Cardon and Palmer, 2003; Yu et al., 2006; Kang et al., 2008), and also in phenotype–expression association studies (Gibson, 2003; Kraft and Horvath, 2003; Kraft et al., 2003; Lu et al., 2004) and clinical quantitative trait locus studies in which phenotype is simultaneously explained by multiple data types (Hoti and Sillanpää, 2006; Pikkuhookana and Sillanpää, 2009). It may be noted that in some cases, for example, in model organisms, population stratification and cryptic relatedness may both need to be corrected for simultaneously (Yu et al., 2006; Kang et al., 2008; Stich et al., 2008).

In the first section, we will cover the most common approaches to controlling population stratification. In the second section, we will consider approaches for cryptic relatedness. Finally, we consider the use of estimation-based variable selection and multilocus association models and their robustness to these confounding factors. At each point, the methods are presented for determining phenotype–marker association, but the suitability of each approach for other types of genomic data association analyses is also commented on.

Approaches for population stratification

In principle, it is possible to minimize the risk of population stratification by carefully selecting the study material from a genetically isolated population, or by using stringent ethnic origin criteria. Otherwise, in population-based association studies, the current techniques to overcome the problem of hidden population structure (stratification) can roughly be divided into the following categories: (1) stratified analysis, (2) genomic controls, (3) structured association, (4) smoothing, (5) principal component approach, (6) matching, (7) approaches based on relationship information and, finally, (8) use of secondary samples. Even though most of these approaches have been considered only together with genetic marker data, they are arguably applicable, with small changes, to other data types as well (that is, to the use of gene and protein expressions as explanatory variables). In the following, we briefly present the underlying ideas behind these approaches.

Stratified analysis

If groups of individuals are known or have been observed before the analysis, it is possible to study within-group genomic associations, which are robust to population stratification (Clayton, 2007). Completely separate within-group (within-strata) analyses will decrease the overall statistical power because of small sample sizes, but it is also possible to combine within-group information in the test statistic or joint likelihood. The family-based association test methods rely on this principle by using the family as the unit for a known group (Lange et al., 2002; Horvath et al., 2004).
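As an illustration of combining within-group information in a single test statistic, the following minimal sketch uses the Cochran-Mantel-Haenszel statistic for a series of 2x2 case-control tables, one per known group. This particular statistic is our choice for illustration and is not named in the works cited above; the function name and the table layout are assumptions.

```python
def cmh_statistic(strata):
    """Cochran-Mantel-Haenszel chi-square (1 df, no continuity correction).

    Each stratum is a 2x2 table given as (a, b, c, d):
        a = cases carrying the allele,    b = cases not carrying it
        c = controls carrying the allele, d = controls not carrying it
    """
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        cases, controls = a + b, c + d
        carriers, non_carriers = a + c, b + d
        observed += a
        expected += cases * carriers / n
        variance += cases * controls * carriers * non_carriers / (n * n * (n - 1))
    return (observed - expected) ** 2 / variance


# Two groups with different carrier frequencies and case fractions,
# but no association inside either group (a*d == b*c in both tables).
groups = [(64, 16, 16, 4), (4, 16, 16, 64)]
pooled = [tuple(sum(cell) for cell in zip(*groups))]  # ignores group labels

stratified_stat = cmh_statistic(groups)  # within-group evidence only
pooled_stat = cmh_statistic(pooled)      # inflated by the hidden structure
```

In this toy example the pooled table shows a strong spurious signal, whereas the stratified statistic, which only accumulates within-group deviations, shows none.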

Unfortunately, group membership information or families are not always easy to collect. One can then try to approximate membership information (construct approximate ‘families’) from other available information, for example, self-identified ethnicity in human data (see Tang et al., 2005) or the known location where individuals' grandparents lived. Alternatively, one can estimate the most likely ancestry (population assignment) or pairwise relatedness from an independent set of molecular markers (for example, Lynch and Ritland, 1999; Pritchard et al., 2000a; Weir et al., 2006). For within-population stratified analyses, see the section ‘Structured association’. It is, however, known that population-based association studies, even with related individuals, are statistically more powerful than family-based (within-family) association studies (Teng and Risch, 1999; Hernández-Sánchez et al., 2003; Havill et al., 2005; Aulchenko et al., 2007a). Thus, ways of correcting for population stratification other than stratified analysis may be preferable.

Genomic controls

In the genomic control approach, one modifies (adjusts) the threshold P-value on the basis of a neutral set of independent markers, that is, null markers (a genetic marker panel) that provide information on the adequate adjustment factor (Devlin and Roeder, 1999; Bacanu et al., 2002; Zheng et al., 2006). The adjustment factor (λ) describes variance inflation, that is, a reduction in effective sample size (≈N/λ), where N is the original sample size (see Hinds et al., 2004). The underlying assumption for this method is that all the spurious association signals are smaller than the real signals, so that it is possible to handle the problem by adequately adjusting the threshold value. It is assumed that this external set of markers does not include any trait-associated loci. The benefit of this approach is that one does not need to make any assumptions on the number of subpopulations. Different variants of the genomic control approach have been considered and compared by Dadd et al. (2009). An improved version of this approach was presented by Wang (2009), in which, instead of using a single adjustment factor, one can use several of them. However, it has been argued that the genomic control approach suffers from weak statistical power when the effect of population structure is large, as is common in model organisms (Yu et al., 2006; Kang et al., 2008). In principle, it should be possible to construct an analogous adjustment factor from neutral gene expression data, which can then be applied in testing phenotype–expression association.
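The adjustment can be sketched as follows. This is a minimal illustration, assuming 1-df chi-square association statistics and the commonly used median-based estimate of λ; the function names are ours, and dividing the candidate statistic by λ is used here as the equivalent of adjusting the threshold.

```python
import math
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def inflation_factor(null_stats):
    """Estimate lambda from 1-df chi-square statistics at null markers."""
    lam = median(null_stats) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)  # common convention: never deflate the statistics


def chi2_1df_pvalue(x):
    """Upper-tail probability of a 1-df chi-square variable."""
    return math.erfc(math.sqrt(x / 2.0))


def genomic_control(candidate_stat, null_stats):
    """Return (lambda, adjusted statistic, adjusted P-value)."""
    lam = inflation_factor(null_stats)
    adjusted = candidate_stat / lam
    return lam, adjusted, chi2_1df_pvalue(adjusted)
```

With λ close to 2, for example, a candidate statistic of 8.0 is judged as if it were about 4.0, and a sample of N = 1000 behaves roughly like one of 500 individuals under the N/λ relation given above.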

Structured association

In structured association, unknown population membership probabilities are first estimated using population assignment methods (for example, Pritchard et al., 2000a; Dawson and Belkhir, 2001; Corander et al., 2003; Falush et al., 2003; Alexander et al., 2009). These probabilities are subsequently used in association analysis (Pritchard et al., 2000b; Thornsberry et al., 2001; Yu et al., 2006). Phenotype–marker association at a candidate locus can be tested within subpopulations using a likelihood ratio test (Pritchard et al., 2000b; Thornsberry et al., 2001). Alternatively, the association model can include the subpopulation mean terms weighted with individual membership probabilities (Yu et al., 2006). There are also versions of the method in which both of these tasks (estimation of population memberships and phenotype–marker association) are carried out simultaneously (for example, Satten et al., 2001; Ripatti et al., 2001; Sillanpää et al., 2001; Hoggart et al., 2003). In all of these structured association methods, population memberships are estimated on the basis of a neutral set of independent markers (genetic marker panel). For an exception to this, see Sillanpää and Bhattacharjee (2006). It may be noted that the structured association approach has recently been extended to genome-wide sets of marker loci (Alexander et al., 2009). Generally, it is easy to include the subpopulation mean terms also in other types of genomic (that is, phenotype–expression or phenotype–protein) association models or in models that consider marker and expression data jointly to explain the phenotype.
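The variant in which subpopulation means are weighted with membership probabilities can be sketched as an ordinary least-squares fit of the phenotype on the membership probabilities and the candidate genotype. This is a simplified illustration under our own assumptions (memberships already estimated and treated as fixed, additive genotype coding 0/1/2, no polygenic term); it is not the full model of Yu et al. (2006), and all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def structured_association_fit(memberships, genotypes, phenotypes):
    """Fit y_i = sum_k q_ik * mu_k + beta * g_i by least squares.

    memberships[i] holds the membership probabilities q_ik of individual i
    (summing to one, so no separate intercept is included).
    Returns (subpopulation means mu, marker effect beta).
    """
    rows = [list(q) + [g] for q, g in zip(memberships, genotypes)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotypes)) for i in range(p)]
    coef = solve_linear(xtx, xty)
    return coef[:-1], coef[-1]
```

Because the membership probabilities enter as covariates, mean differences between subpopulations are absorbed by the mean terms rather than being attributed to the marker.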

Smoothing

The idea in the smoothing approach is that the signal is smoothed along the chromosome according to an exponential decay or some other spatial function, depending on the genetic or physical map distances (Conti and Witte, 2003; Sillanpää and Bhattacharjee, 2005; Tsai et al., 2008). This is feasible for a tightly linked set of markers that are in considerable linkage disequilibrium with each other. In such a setting, neighbouring markers are used to strengthen the weak but real association signals and smooth the spurious association signals downwards. A similar control approach for spurious peaks has also been proposed for proteomics data sets, in which protein intensity peaks are smoothed with respect to the neighbouring locations (Du et al., 2006) and according to the m/z distance between the positions (Bhattacharjee et al., 2008). For a related smoothing approach for expression data, see Sillanpää and Noykova (2008).
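The basic idea can be illustrated with a simple kernel smoother in which each marker's signal is replaced by a weighted average of the signals at all markers, with weights decaying exponentially in map distance. This is only a generic sketch under our own assumptions (the function name, the `decay` parameter and the use of raw per-marker signals are illustrative); the cited methods build the smoothing into the association model itself.

```python
import math


def smooth_signals(positions, signals, decay=1.0):
    """Exponential-decay smoothing of per-marker association signals.

    positions: map positions of the markers (for example, in cM)
    signals:   one association signal per marker
    decay:     rate at which a neighbour's weight falls off with distance
    """
    smoothed = []
    for p in positions:
        weights = [math.exp(-decay * abs(p - q)) for q in positions]
        total = sum(weights)
        smoothed.append(sum(w * s for w, s in zip(weights, signals)) / total)
    return smoothed


# A peak supported by its neighbours (markers at 0..4) versus an
# isolated spike of the same height (marker at 12).
positions = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
signals = [2, 3, 4, 3, 2, 0, 0, 4, 0, 0]
smoothed = smooth_signals(positions, signals)
```

In this toy example the supported peak at position 2 retains most of its height after smoothing, while the isolated spike at position 12 is pulled down towards its neighbours.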

Principal component approach

The use of principal component analysis to correct for stratification in structured populations has been suggested (Patterson et al., 2006; Price et al., 2006; Zhu et al., 2008). One proceeds by assuming that an external set of neutral molecular markers is available and that none of them is associated with the trait of interest. Principal components are first estimated from the correlation matrix of external marker genotypes of unrelated individuals (Price et al., 2006). Then, the first few principal components (explaining most of the underlying variation) are used as regression covariates in the association model, or are incorporated into a randomization test (see Kimmel et al., 2007). At minor levels of stratification, one can simply omit samples that appear as outliers in principal component approaches. In the related method of Epstein et al. (2007), instead of principal components, one uses components from a partial least-squares regression. Owing to their ease of implementation, these methods are very popular in human genetics, even if the correction given by them may fail, especially if the external marker set used is not large (for example, Epstein et al., 2007; Lee et al., 2008; Wang, 2009). It should be relatively easy to apply these corrections (estimated either from external marker or from expression data) to phenotype–expression and phenotype–protein-expression association studies.
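The two steps (estimate components from external markers, then adjust for them) can be sketched in plain Python as follows. This is a simplified illustration under our own assumptions: only the leading component is used, it is found by power iteration, and both the phenotype and the candidate genotype are residualized on it before their correlation is computed. The function names and the toy data are hypothetical.

```python
import math


def standardize_columns(genotypes):
    """Center and scale each marker (column); monomorphic markers are dropped."""
    n = len(genotypes)
    cols = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        if sd > 0:
            cols.append([(x - mean) / sd for x in col])
    return [[c[i] for c in cols] for i in range(n)]


def leading_pc(z, iterations=200):
    """Leading eigenvector of the individual-by-individual matrix Z Z^T."""
    n = len(z)
    k = [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(n)] for i in range(n)]
    v = [float(i) for i in range(n)]  # arbitrary start, assumed not orthogonal to the PC
    for _ in range(iterations):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def residualize(x, pc):
    """Remove the mean and the part of x explained by pc (pc has mean zero
    because every marker column was centered)."""
    mean = sum(x) / len(x)
    xc = [v - mean for v in x]
    slope = sum(a * b for a, b in zip(xc, pc)) / sum(b * b for b in pc)
    return [a - slope * b for a, b in zip(xc, pc)]


def correlation(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data: individuals 0-3 and 4-7 come from two source populations.
null_markers = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0],
                [2, 2, 2, 1], [2, 2, 1, 2], [2, 1, 2, 2], [1, 2, 2, 2]]
candidate = [0, 0, 1, 1, 1, 1, 2, 2]                   # differs in frequency only
phenotype = [1.0, 2.0, 2.0, 1.0, 5.0, 6.0, 6.0, 5.0]  # differs in mean only

pc = leading_pc(standardize_columns(null_markers))
naive_r = correlation(candidate, phenotype)
adjusted_r = correlation(residualize(candidate, pc), residualize(phenotype, pc))
```

In this constructed example the unadjusted correlation is sizeable purely because of the population difference, and it vanishes once the leading component is adjusted for. With real data several components would normally be included as covariates.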

Matching

During the design stage of the study, it is possible to collect pairs of individuals (full sibs or cousins; one case and one control individual) that are otherwise similar (that is, individually matched) with respect to covariates such as ethnic background, sex, age and so on (Gauderman et al., 1999; Zondervan et al., 2002). Group matching refers to a similar process in which instead of individuals, groups are matched. Even if matching could eliminate the problem of population stratification, one potential problem is over-matching (that is, reduction of statistical power by unnecessary matching on too many factors, which creates matched units that are too much alike also in their phenotypes). To consider matching in quantitative traits, phenotypically discordant sib-pair collection strategies may be used (cf. Risch and Zhang, 1995).

Special procedures for genetic ancestry matching that are applicable to existing data sets have recently been proposed on the basis of the information on non-genetic variables (Lee, 2004) and marker genotypes (Hinds et al., 2004; Luca et al., 2008; Guan et al., 2009). In Hinds et al. (2004), ancestry was estimated by population assignment methods, and in Luca et al. (2008) by principal components. These methods considered different strategies for genetic ancestry group matching and removing ‘unmatchable outlier individuals’. Guan et al. (2009) proposed the use of identity-by-state-based simple (dis)similarity measures as a tool for individual genetic ancestry matching. Luca et al. (2008) emphasized the use of ‘control databases’ in genome-wide association studies and considered a problem wherein cases are sampled in a quite different region from that of the controls.

Generalization and application of these techniques to other forms of genomic association analysis should be relatively easy.

Use of relationship information

Affected trios

The transmission/disequilibrium test (TDT) makes it possible to control for confounding by studying association in the presence of linkage (Spielman et al., 1993). Originally, TDT was developed as a confirmatory second test to filter real associations out of spurious signals. This original motivation is well justified because TDT has lower power than ordinary association testing (Long and Langley, 1999). However, TDT is nowadays commonly used as a general test of association. The basic version of TDT assumes a binary phenotype and biallelic loci, and uses data on affected trios (unrelated cases and their parents). At a given locus, it tests whether a certain allele is transmitted from heterozygous parents to the affected offspring more often than expected under Mendelian segregation, and the observed segregation distortion is taken as evidence that the locus is related to the affection status. TDT has been generalized to quantitative traits (Allison, 1997; Rabinowitz, 1997; Abecasis et al., 2000), multiallelic markers (Sham and Curtis, 1995), marker haplotypes (Clayton, 1999) and to several data structures (Spielman and Ewens, 1998; Abecasis et al., 2000). As handling of nonrandom missing genotype data in parents is problematic, a robust version of TDT has also been developed (Sebastiani et al., 2004).
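For the basic version (binary phenotype, biallelic locus, complete trios), the test can be sketched as below. Genotypes are assumed to be coded as the number of copies (0, 1 or 2) of the allele of interest, and the function names are ours; the statistic is the usual McNemar-type form (b − c)²/(b + c), compared with a chi-square distribution with 1 df.

```python
import math


def count_transmissions(trios):
    """Count transmissions from heterozygous parents to affected offspring.

    trios: iterable of (father, mother, child) genotypes, each coded as the
           number of copies (0, 1 or 2) of the allele of interest.
    Returns (b, c): heterozygous parents that transmitted the allele,
    and heterozygous parents that transmitted the other allele.
    """
    b = c = 0
    for father, mother, child in trios:
        n_het = sum(1 for g in (father, mother) if g == 1)
        if n_het == 0:
            continue  # homozygous parents are uninformative
        # copies necessarily contributed by a homozygous parent (0 or 1)
        from_hom = sum(g // 2 for g in (father, mother) if g != 1)
        from_het = child - from_hom
        if not 0 <= from_het <= n_het:
            raise ValueError("genotypes are not Mendelian-consistent")
        b += from_het
        c += n_het - from_het
    return b, c


def tdt(b, c):
    """Return the TDT chi-square statistic (1 df) and its P-value."""
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2.0))


trios = ([(1, 0, 1)] * 30 + [(1, 0, 0)] * 10
         + [(1, 1, 2)] * 5 + [(1, 1, 1)] * 5)
b, c = count_transmissions(trios)  # (45, 15)
stat, p_value = tdt(b, c)          # stat == 15.0
```

Because only transmissions within families are compared, allele frequency differences between source populations do not contribute to the statistic.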

Pseudo-control data

Another relationship information-based approach to control for confounding is the so-called pseudo-control approach, in which a sample containing only affected cases (and their parents) is collected and the artificial control sample is derived indirectly on the basis of parental genotypes and haplotypes. At each locus, pseudo-control individuals have genetic material that was not transmitted from the parents to the cases (for example, Falk and Rubinstein, 1987; Terwilliger and Ott, 1992; Lander and Schork, 1994; Gauderman et al., 1999; Greenland, 1999). The benefit of deriving the control sample in this way is that one obtains well-matched controls and avoids spurious associations because of ethnic confounding, that is, closer kinship among the affected samples (Terwilliger and Weiss, 1998). This pseudo-control approach has been generalized also to multilocus association analysis and single-tail sampling with quantitative traits (Sillanpää and Hoti, 2007).
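At a single biallelic locus, the construction of a pseudo-control genotype from the non-transmitted parental alleles reduces to simple arithmetic on allele counts. The sketch below assumes genotypes coded as the number of copies (0, 1 or 2) of a reference allele and complete, Mendelian-consistent trios; the function name is ours, and haplotype-based constructions are not covered.

```python
def pseudo_control(father, mother, child):
    """Genotype formed by the two parental alleles NOT transmitted to the case.

    The parents carry father + mother copies of the reference allele in
    total, and the case received `child` of them, so the remainder makes
    up the pseudo-control.
    """
    fewest = (father == 2) + (mother == 2)  # copies the case must carry
    most = (father >= 1) + (mother >= 1)    # copies the case can carry
    if not fewest <= child <= most:
        raise ValueError("genotypes are not Mendelian-consistent")
    return father + mother - child


trios = [(1, 2, 1), (1, 1, 2), (0, 1, 1), (1, 1, 1)]
cases = [child for _, _, child in trios]
controls = [pseudo_control(f, m, ch) for f, m, ch in trios]  # [2, 0, 0, 1]
```

Each pseudo-control comes from the same parents as its case, which is why the resulting case and control samples are matched for ancestry.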

Pedigree data

More general approaches use pedigree data, in which linkage information and association information are combined (George et al., 1999; Lund et al., 2003; Pérez-Enciso, 2003; Meuwissen and Goddard, 2004; Meuwissen and Goddard, 2007; Gasbarra et al., 2009; Hernandéz-Sánchez et al., 2009), resulting in a strong signal at true positions. Linkage information confirms only the real associations and one obtains weaker signals at the spurious positions, which provides a way to control confounding in association studies.

In pedigree-based linkage analysis, founder individuals are generally assumed to be unrelated. However, association information is also available in pedigrees when founders are related. Thus, combined analysis also tries to model the relationships between pedigree founders. To do so, some assumptions (for example, about effective population size) are often made about the founders and/or the recent history of the population (see, for example, Meuwissen and Goddard, 2001).

Correction methods using relationship information are not easy to generalize to gene- or protein-expression data because these approaches are based on the discrete nature of the marker data and the linkage concept.

Use of secondary samples

The idea behind this approach is that analysis is carried out jointly for two samples of data from the same study population, but with different study designs: one containing a population-based sample of individuals (the association signal from these data suffers from confounding) and the other sample comprising related individuals (the association signal from these data is robust to confounding; see above). As the overall signal is a synthesis of the individual signals of two data sets, it is likely to be relatively robust to confounding (for example, Epstein et al., 2005; Kazeem and Farrall, 2005). In addition, this ‘meta-analysis' approach improves the statistical power by combining information from multiple data sets. For a review and comparison of different secondary sample approaches, see Glaser and Holmans (2009) and Infante-Rivard et al. (2009). Alternatively, it is possible to analyse these two samples (with association and linkage) separately and study the overlap between the results (Manenti et al., 2009), or carry out association testing conditionally on the linkage results (Cantor et al., 2005).

As the use of secondary samples to correct for population stratification relies on two separate samples, issues relating to the presence of heterogeneity cannot be fully ruled out (see Sillanpää and Auranen, 2004). In addition, as this correction method is based on the use of relationship information and marker linkage, this method is not easy to generalize to gene- or protein-expression data. A variant in which two samples are combined and population stratification is corrected for by using the principal component approach (Zhu et al., 2008) should also be applicable for gene- or protein-expression data.

Approaches for cryptic relatedness

It is good to keep in mind that population stratification and cryptic relatedness are two different problems and that correction methods typically consider only a single problem at a time. Exceptions to this are the approaches described in the stratified analysis, genomic controls and use of relationship information subsections above. Especially useful in this respect may be the approaches in which linkage information and association information are combined. Otherwise, the current techniques to overcome the problem of cryptic relatedness in population-based association analysis can be divided into the following categories: (1) infinite polygenic model, (2) regression covariates, (3) test statistic accounting for relatedness and (4) genomic controls. These techniques can be generalized quite easily to the other genomic data analyses. In the following, we briefly describe the ideas behind these methods.

Infinite polygenic model

The classical approach to correcting for relatedness in the sample is to include a polygenic component in the population-based genomic association analysis model (for example, George and Elston, 1987; Jannink et al., 2001; Lu et al., 2004; Yu et al., 2006; Bradbury et al., 2007; Pikkuhookana and Sillanpää, 2009). This practice is also known as the measured genotype approach. Because of the availability of large high-throughput association data sets, it is more popular to use a recent variant of this approach, called GRAMMAR (Amin et al., 2007; Aulchenko et al., 2007a, 2007b), in which residual dependencies are first removed from the data (precorrection) and repeated (phenotype–marker) association analyses are then carried out on the adjusted residuals using rapid methods. Even though this approach was presented for marker data, it is straightforward to use it (or the measured genotype approach) in concert with other types of genomic (for example, phenotype–expression or other) association analysis. Nevertheless, the precorrection approach suffers from model misspecification. It underestimates uncertainty in polygenic effects, and may reduce the statistical power (cf. Martinez et al., 2005). Moreover, estimation of variance components is known to be unstable when sample sizes are small (Misztal, 1996; Burton et al., 1999; Pikkuhookana and Sillanpää, 2009). In any case, the GRAMMAR approach has been shown to outperform many of the competing methods introduced for pedigree-based association analysis (see Aulchenko et al., 2007a). For unknown relationships, a relationship matrix can be first estimated using various methods (for example, Milligan, 2003; Leutenegger et al., 2003; Blouin, 2003; Weir et al., 2006; Frentiu et al., 2008). Interestingly, the use of a simple identity-by-state allele-sharing matrix has recently been found to provide an efficient alternative to more sophisticated methods in correcting for cryptic relatedness-induced confounding (Zhao et al., 2007; Kang et al., 2008). See also van de Casteele et al. (2001) and Bink et al. (2008).
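A simple identity-by-state allele-sharing matrix of the kind mentioned above can be computed directly from genotype counts. The sketch below assumes biallelic markers coded 0, 1 or 2 with no missing values, and uses one common convention (two individuals share 2 − |difference| alleles at a marker); the exact definitions used in the cited papers may differ, and the function name is ours.

```python
def ibs_matrix(genotypes):
    """Pairwise identity-by-state allele sharing, scaled to [0, 1].

    genotypes[i][j] is the number of copies (0, 1 or 2) of a reference
    allele carried by individual i at marker j.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(2 - abs(a - b)
                         for a, b in zip(genotypes[i], genotypes[j]))
            k[i][j] = k[j][i] = shared / (2.0 * m)
    return k


genotypes = [[0, 1, 2, 0],
             [0, 1, 2, 0],   # identical to the first individual
             [2, 1, 0, 2]]   # opposite homozygote at three markers
k = ibs_matrix(genotypes)
```

The resulting matrix can then take the place of a pedigree-based relationship matrix as the covariance structure of the polygenic component.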

Regression covariate approach

An alternative approach to correcting for relatedness is to use the method of Bonney (Bonney, 1986; Thomas, 2004). This method approximates the influence of the infinite polygenic model by having phenotypes of the parents, the spouse and sibs of the participant as regression covariates in the model (Bonney, 1986). Pikkuhookana and Sillanpää (2009) compared the performance of these two approaches (infinite polygenic model and regression covariates) in Bayesian genomic association models and found the regression covariate approach to perform better for smaller genomic data sets. It may be noted that their model considered the effects of marker genotypes and gene expressions jointly in explaining the phenotype. Although this approach provides a framework for including phenotypes from ungenotyped parents in the analysis (cf. Purcell et al., 2005), it cannot be applied in the case of unknown relatedness, as the phenotypes of the relatives are then generally not available.

Test statistic accounting for relatedness

The test statistics often used for population-based association analysis assume the independence of individuals in the sample. Test statistics for population-based phenotype–marker association testing among related individuals (correlated family data) have been introduced, for example, for (between-family) association in family-based design (Teng and Risch, 1999) or for a more general design using all the related and unrelated individuals (Slager and Schaid, 2001). A similar kind of test based on family-based (within-family) association analysis has been developed for expression data (Kraft et al., 2003). In the case of unknown relationships, one can start by estimating the family structure (or pairwise relatedness) using an additional set of molecular markers (for example, Gasbarra et al., 2007; Bink et al., 2008). However, one should be careful to establish whether the test statistic is measuring population-based or within-family association. Unlike within-family association, population-based association analysis with related individuals suffers from population stratification, but has more statistical power than family-based (within-family) association analysis, which was considered in the section ‘Stratified analysis’ (Teng and Risch, 1999; Slager and Schaid, 2001). With modifications, it is possible to derive suitable test statistics accounting for relatedness in population-based association for gene- or protein-expression data.

Genomic controls

Even though genomic control was introduced to control for population stratification, it has also been suggested for handling cryptic relatedness (Devlin and Roeder, 1999; Bacanu et al., 2002; Yan et al., 2009). In this method, variance inflation is corrected by adjusting the test statistic based on the information from unlinked null markers. This also works in the case of unknown relatedness because, unlike with the infinite polygenic model, one does not need to estimate the pairwise relationships first. The genomic control approach has also been suggested to be useful for additional correction after the infinite polygenic model is fitted (Amin et al., 2007).

Use of estimation-based variable selection and multilocus models

It may sound strange that by using estimation-based variable selection and multilocus models (without a correction term), one can automatically reduce the number of false positives in genomic data association analyses. However, there is increasing evidence that Bayesian or frequentist multilocus modelling approaches are flexible enough to automatically account or self-correct for population stratification in binary traits (Setakis et al., 2006), ordinal and censored traits (Iwata et al., 2009), as well as in quantitative traits (Iwata et al., 2007). In the case of cryptic relatedness and quantitative traits, the self-correction property of the Bayesian multilocus association approach was found by Pikkuhookana and Sillanpää (2009). The robustness of the multilocus association approach to these problems presumably results from the fact that, during the estimation (variable selection) process, other genetic components, a few at a time, can capture or explain a small amount of confounding variation (see Pikkuhookana and Sillanpää, 2009). This is essentially so because in these approaches, variable selection is done simultaneously with the effect estimation (Kilpikari and Sillanpää, 2003; O'Hara and Sillanpää, 2009) and large candidate panels jointly have the potential to explain many types of variation. For a close connection between the multilocus association model and a polygenic model with realized relationship matrix, see Hayes et al. (2009). However, additional studies on the self-correction property of these approaches are needed before one can say anything definitive on this, for example, on how much variability in the markers is needed for the self-correction approach to be effective. In two-genotype data analysis (in which there is only a single estimable effect coefficient at each locus), Iwata et al. (2007) found that the use of a correction term in the model still provides some additional advantages over self-correction. However, it is likely that using single-nucleotide polymorphism data and fitting two estimable coefficients (for three genotypes) can provide more variability and thus more ability for self-correction in the model. As a conclusion, I wish to emphasize that there are good reasons why future studies on genomic data association analysis should focus on, or at least pay much more attention to, better characterizing the benefits and pitfalls of these estimation-based multilocus approaches. It is also well known that the use of multilocus models improves statistical power and helps avoid problems due to model misspecification, such as biased position estimates and the occurrence of ‘ghost QTLs’ (see, for example, Sillanpää and Auranen, 2004).

Acknowledgments

I am grateful to Crispin M Mutshinda for constructive comments on this paper. This work was supported by research grants from the Academy of Finland and University of Helsinki's research funds.

The author declares no conflict of interest.

References

Abecasis GR, Cardon LR, Cookson WO. A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000;66:279–292. doi: 10.1086/302698.
Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109.
Allison DB. Transmission-disequilibrium tests for quantitative traits. Am J Hum Genet. 1997;60:676–690.
Amin N, van Duijn CM, Aulchenko YS. A genomic background based method for association analysis in related individuals. PLoS One. 2007;2:e1274. doi: 10.1371/journal.pone.0001274.
Aulchenko YS, de Koning D-J, Haley C. Genomewide rapid association using mixed model and regression: a fast and simple method for genomewide pedigree-based quantitative trait loci association analysis. Genetics. 2007a;177:577–585. doi: 10.1534/genetics.107.075614.
Aulchenko YS, Ripke S, Isaacs A, Van Duijn CM. GenABEL: an R library for genome-wide association analysis. Bioinformatics. 2007b;23:1294–1296. doi: 10.1093/bioinformatics/btm108.
Aune TM, Parker JS, Mass K, Liu Z, Olson NJ, Moore JH. Co-localization of differentially expressed genes and shared susceptibility loci in human autoimmunity. Genet Epidemiol. 2004;27:162–172. doi: 10.1002/gepi.20013.
Bacanu SA, Devlin B, Roeder K. Association studies for quantitative traits in structured populations. Genet Epidemiol. 2002;22:78–93. doi: 10.1002/gepi.1045.
Beavis WD. QTL analysis: power, precision, and accuracy. In: Paterson AH (ed.). Molecular Dissection of Complex Traits. CRC Press: Boca Raton, FL; 1998. pp. 145–162.
Bhattacharjee M, Botting CH, Sillanpää MJ. Bayesian biomarker identification based on marker-expression-proteomics data. Genomics. 2008;92:384–392. doi: 10.1016/j.ygeno.2008.06.006.
Bhattacharjee M, Sillanpää MJ. Bayesian joint disease-marker-expression analysis applied to clinical characteristics of chronic fatigue syndrome. In: McConnell P, Lim S, Cuticchia AJ (eds). Methods of Microarray Data Analysis VI. CreateSpace Publishing: Scotts Valley, CA; 2009. pp. 15–34.
Bink MCAM, Anderson AD, van de Weg WE, Thompson EA. Comparison of marker-based pairwise relatedness estimators on a pedigreed plant population. Theor Appl Genet. 2008;117:843–855. doi: 10.1007/s00122-008-0824-1.
Blouin MS. DNA-based methods for pedigree reconstruction and kinship analysis in natural populations. Trends Ecol Evol. 2003;18:503–511.
Bonney GE. Regressive logistic models for familial disease and other binary traits. Biometrics. 1986;42:611–625.
Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633–2635. doi: 10.1093/bioinformatics/btm308.
Burton P, Tiller K, Gurrin L, Cookson W, Musk A, Palmer LJ. Genetic variance components analysis for binary phenotypes using generalized linear mixed models (GLMMs) and Gibbs sampling. Genet Epidemiol. 1999;17:118–140. doi: 10.1002/(SICI)1098-2272(1999)17:2<118::AID-GEPI3>3.0.CO;2-V.
Bystrykh L, Weersing E, Dontje B, Sutton S, Pletcher MT, Wiltshire T, et al. Uncovering regulatory pathways that affect hematopoietic stem cell function using ‘genetical genomics'. Nat Genet. 2005;37:225–232. doi: 10.1038/ng1497.
Cantor RM, Chen GK, Pajukanta P, Lange K. Association testing in a linked region using large pedigrees. Am J Hum Genet. 2005;76:538–542. doi: 10.1086/428628.
Cardon LR, Palmer LJ. Population stratification and spurious allelic association. Lancet. 2003;361:598–604. doi: 10.1016/S0140-6736(03)12520-2.
Clayton DG. A generalization of the transmission/disequilibrium test for uncertain-haplotype transmission. Am J Hum Genet. 1999;65:1170–1177. doi: 10.1086/302577.
Clayton DG. Population association. In: Balding DJ, Bishop M, Cannings C (eds). Handbook of Statistical Genetics, 3rd edn, vol. 2. Wiley: Chichester, UK; 2007. pp. 1264–1285.
Conti DV, Witte JS. Hierarchical modeling of linkage disequilibrium: genetic structure and spatial relations. Am J Hum Genet. 2003;72:351–363. doi: 10.1086/346117.
Corander J, Waldmann P, Sillanpää MJ. Bayesian analysis of genetic differentiation between populations. Genetics. 2003;163:367–374. doi: 10.1093/genetics/163.1.367.
Dadd T, Weale ME, Lewis CM. A critical evaluation of genomic control methods for genetic association studies. Genet Epidemiol. 2009;33:290–298. doi: 10.1002/gepi.20379.
Dawson KJ, Belkhir K. A Bayesian approach to the identification of panmictic populations and the assignment of individuals. Genet Res. 2001;78:59–77. doi: 10.1017/s001667230100502x.
Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004. doi: 10.1111/j.0006-341x.1999.00997.x.
Du P, Kibbe WA, Lin SM. Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics. 2006;22:2059–2065. doi: 10.1093/bioinformatics/btl355.
Epstein MP, Allen AS, Satten GA. A simple and improved correction for population stratification in case-control studies. Am J Hum Genet. 2007;80:921–930. doi: 10.1086/516842.
Epstein MP, Veal CD, Trembath RC, Barker JNWN, Li C, Satten GA. Genetic association analysis using data from triads and unrelated subjects. Am J Hum Genet. 2005;76:592–608. doi: 10.1086/429225.
Falk CT, Rubinstein P. Haplotype relative risks: an easy reliable way to construct a proper control sample for risk calculations. Ann Hum Genet. 1987;51:227–233. doi: 10.1111/j.1469-1809.1987.tb00875.x.
Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003;164:1567–1587. doi: 10.1093/genetics/164.4.1567.
Foss EJ, Radulovic D, Shaffer SA, Ruderfer DM, Bedalov A, Goodlett DR, et al. Genetic basis of proteome variation in yeast. Nat Genet. 2007;39:1369–1375. doi: 10.1038/ng.2007.22.
Frentiu FD, Clegg SM, Chittock J, Burke T, Blows MW, Owens IPF. Pedigree-free animal models: the relatedness matrix reloaded. Proc R Soc B. 2008;275:639–647. doi: 10.1098/rspb.2007.1032.
Gasbarra D, Pirinen M, Sillanpää MJ, Arjas E. Bayesian quantitative trait locus mapping based on reconstruction of recent genetic histories. Genetics. 2009;183:709–721. doi: 10.1534/genetics.109.104190.
Gasbarra D, Pirinen M, Sillanpää MJ, Salmela E, Arjas E. Estimating genealogies from unlinked marker data: a Bayesian approach. Theor Pop Biol. 2007;72:305–322. doi: 10.1016/j.tpb.2007.06.004.
Gauderman WJ, Witte JS, Thomas DC. Family-based association studies. J Natl Cancer Inst Monographs. 1999;26:31–37. doi: 10.1093/oxfordjournals.jncimonographs.a024223.
George V, Tiwari HK, Zhu X, Elston RC. A test of transmission/disequilibrium for quantitative traits in pedigree data, by multiple regression. Am J Hum Genet. 1999;65:236–245. doi: 10.1086/302444.
George VT, Elston RC. Testing the association between polymorphic markers and quantitative traits in pedigrees. Genet Epidemiol. 1987;4:193–201. doi: 10.1002/gepi.1370040304.
Gibson G. Population genomics: celebrating individual expression. Heredity. 2003;90:1–5. doi: 10.1038/sj.hdy.6800195.
Glaser B, Holmans P. Comparison of methods for combining case-control and family-based association studies. Hum Hered. 2009;68:106–116. doi: 10.1159/000212503.
Greenland S. A unified approach to the analysis of case-distribution (case-only) studies. Stat Med. 1999;18:1–15. doi: 10.1002/(sici)1097-0258(19990115)18:1<1::aid-sim961>3.0.co;2-l.
Guan W, Liang L, Boehnke M, Abecasis GR. Genotype-based matching to correct for population stratification in large-scale case-control genetic association studies. Genet Epidemiol. 2009;33:508–517. doi: 10.1002/gepi.20403.
Göring HHH, Terwilliger JD, Blangero J. Large upward bias in estimation of locus-specific effects from genomewide scans. Am J Hum Genet. 2001;69:1357–1369. doi: 10.1086/324471.
Havill LM, Dyer TD, Richardson DK, Mahaney MC, Blangero J. The quantitative trait linkage disequilibrium test: a more powerful alternative to the quantitative transmission disequilibrium test for use in the absence of population stratification. BMC Genet. 2005;6(Suppl 1):S91. doi: 10.1186/1471-2156-6-S1-S91.
Hayes BJ, Visscher PM, Goddard ME. Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res. 2009;91:47–60. doi: 10.1017/S0016672308009981.
Hernández-Sánchez J, Grunchec J-A, Knott S. A web application to perform linkage disequilibrium and linkage analyses on a computational grid. Bioinformatics. 2009;25:1377–1383. doi: 10.1093/bioinformatics/btp171.
Hernández-Sánchez J, Haley CS, Visscher PM. Power of QTL detection using association tests with family controls. Eur J Hum Genet. 2003;11:819–827. doi: 10.1038/sj.ejhg.5201042.
Hinds DA, Stokowski RP, Patil N, Konvicka K, Kershenobich D, Cox DR, et al. Matching strategies for genetic association studies in structured populations. Am J Hum Genet. 2004;74:317–325. doi: 10.1086/381716.
Hoggart CJ, Parra EJ, Shriver MD, Bonilla C, Kittles RA, Clayton DG, et al. Control of confounding in genetic associations in stratified populations. Am J Hum Genet. 2003;72:1492–1504. doi: 10.1086/375613.
Horvath S, Xu X, Lake SL, Silverman EK, Weiss ST, Laird NM. Family-based tests for associating haplotypes with general phenotype data: application to asthma genetics. Genet Epidemiol. 2004;26:61–69. doi: 10.1002/gepi.10295.
Hoti F, Sillanpää MJ. Bayesian mapping of genotype × expression interactions in quantitative and qualitative traits. Heredity. 2006;97:4–18. doi: 10.1038/sj.hdy.6800817.
Infante-Rivard C, Mirea L, Bull SB. Combining case-control and case-trio data from the same population in genetic association analyses: overview of approaches and illustration with a candidate gene study. Am J Epidemiol. 2009;170:654–664. doi: 10.1093/aje/kwp180.
Iwata H, Ebana K, Fukuoka S, Jannink J-L, Hayashi T. Bayesian multilocus association mapping on ordinal and censored traits and its application to the analysis of genetic variation among Oryza sativa L. germplasms. Theor Appl Genet. 2009;118:865–880. doi: 10.1007/s00122-008-0945-6.
Iwata H, Uga Y, Yoshioka Y, Ebana K, Hayashi T. Bayesian association mapping of multiple quantitative trait loci and its application to the analysis of genetic variation among Oryza sativa L. germplasms. Theor Appl Genet. 2007;114:1437–1449. doi: 10.1007/s00122-007-0529-x.
Jannink J-L, Bink MCAM, Jansen RC. Using complex plant pedigrees to map valuable genes. Trends Plant Sci. 2001;6:337–342. doi: 10.1016/s1360-1385(01)02017-9.
Jansen R, Greenbaum D, Gerstein M. Relating whole-genome expression data with protein-protein interactions. Genome Res. 2002;12:37–46. doi: 10.1101/gr.205602.
Jansen RC, Nap J-P. Genetical genomics: the added value from segregation. Trends Genet. 2001;17:388–391. doi: 10.1016/s0168-9525(01)02310-1.
Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, et al. Efficient control of population structure in model organism association mapping. Genetics. 2008;178:1709–1723. doi: 10.1534/genetics.107.080101.
Kazeem GR, Farrall M. Integrating case-control and TDT studies. Ann Hum Genet. 2005;69:329–335. doi: 10.1046/j.1529-8817.2005.00156.x.
Kilpikari R, Sillanpää MJ. Bayesian analysis of multilocus association in quantitative and qualitative traits. Genet Epidemiol. 2003;25:122–135. doi: 10.1002/gepi.10257.
Kimmel G, Jordan MI, Halperin E, Shamir R, Karp RM. A randomization test for controlling population stratification in whole-genome association studies. Am J Hum Genet. 2007;81:895–905. doi: 10.1086/521372.
Kraft P, Horvath S. The genetics of gene expression and gene mapping. Trends Biotechnol. 2003;21:377–378. doi: 10.1016/S0167-7799(03)00191-4.
Kraft P, Schadt E, Aten J, Horvath S. A family-based test for correlation between gene expression and trait values. Am J Hum Genet. 2003;72:1323–1330. doi: 10.1086/375167.
Lande R, Thompson R. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics. 1990;124:743–756. doi: 10.1093/genetics/124.3.743.
Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 1994;265:2037–2048. doi: 10.1126/science.8091226.
Lange C, DeMeo DL, Laird NM. Power and design considerations for a general class of family-based association tests: quantitative traits. Am J Hum Genet. 2002;71:1330–1341. doi: 10.1086/344696.
Lee S, Sullivan P, Zou F, Wright F. Comment on a simple and improved correction for population stratification. Am J Hum Genet. 2008;82:524–526. doi: 10.1016/j.ajhg.2007.10.014.
Lee W-C. Case-control association studies with matching and genomic controlling. Genet Epidemiol. 2004;27:1–13. doi: 10.1002/gepi.20011.
Leutenegger AL, Prum B, Génin E, Verny C, Lemainque A, Clerget-Darpoux F, et al. Estimation of the inbreeding coefficient through use of genomic data. Am J Hum Genet. 2003;73:516–523. doi: 10.1086/378207.
Long AD, Langley CH. The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res. 1999;9:720–731.
Lu Y, Liu P-Y, Liu Y-J, Xu F-H, Deng H-W. Quantifying the relationship between gene expression and trait values in general pedigrees. Genetics. 2004;168:2395–2405. doi: 10.1534/genetics.104.031666.
Luca D, Ringquist S, Klei L, Lee AB, Gieger C, Wichman H-E, et al. On the use of general control samples for genome-wide association studies: genetic matching highlights causal variants. Am J Hum Genet. 2008;82:453–463. doi: 10.1016/j.ajhg.2007.11.003.
Lund MS, Sorensen P, Guldbrandtsen B, Sorensen DA. Multitrait fine mapping of quantitative trait loci using combined linkage disequilibria and linkage analysis. Genetics. 2003;163:405–410. doi: 10.1093/genetics/163.1.405.
Lynch M, Ritland K. Estimation of pairwise relatedness with molecular markers. Genetics. 1999;152:1753–1766. doi: 10.1093/genetics/152.4.1753.
Maher BS. The case of the missing heritability. Nature. 2008;456:18–21. doi: 10.1038/456018a.
Manenti G, Galvan A, Pettinicchio A, Trincucci G, Spada E, Zolin A, et al. Mouse genome-wide association mapping needs linkage analysis to avoid false-positive loci. PLoS Genet. 2009;5:e1000331. doi: 10.1371/journal.pgen.1000331.
Martinez V, Thorgaard G, Robison B, Sillanpää MJ. An application of Bayesian QTL mapping to early development in double haploid lines of rainbow trout including environmental effects. Genet Res. 2005;85:209–221. doi: 10.1017/S0016672305007871.
McCarthy MI, Hirschhorn JN. Genome-wide association studies: potential next steps on a genetic journey. Hum Mol Genet. 2008;17:R156–R165. doi: 10.1093/hmg/ddn289.
Meuwissen THE, Goddard ME. Prediction of identity by descent probabilities from marker-haplotypes. Genet Sel Evol. 2001;33:605–634. doi: 10.1186/1297-9686-33-6-605.
Meuwissen THE, Goddard ME. Mapping multiple QTL using linkage disequilibrium and linkage analysis information and multitrait data. Genet Sel Evol. 2004;36:261–279. doi: 10.1186/1297-9686-36-3-261.
Meuwissen THE, Goddard ME. Multipoint identity-by-descent prediction using dense markers to map quantitative trait loci and estimate effective population size. Genetics. 2007;176:2551–2560. doi: 10.1534/genetics.107.070953.
Milligan BG. Maximum-likelihood estimation of relatedness. Genetics. 2003;163:1153–1167. doi: 10.1093/genetics/163.3.1153.
Misztal I. Estimation of variance components with large-scale dominance models. J Dairy Sci. 1996;80:965–974.
O'Hara RB, Sillanpää MJ. A review of Bayesian variable selection methods: what, how and which. Bayesian Anal. 2009;4:85–118.
Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2:e190. doi: 10.1371/journal.pgen.0020190.
Pérez-Enciso M. Fine mapping of complex trait genes combining pedigree and linkage disequilibrium information: a Bayesian unified framework. Genetics. 2003;163:1497–1510. doi: 10.1093/genetics/163.4.1497.
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–909. doi: 10.1038/ng1847.
Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000b;155:945–959. doi: 10.1093/genetics/155.2.945.
Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet. 2000a;67:170–181. doi: 10.1086/302959.
Pikkuhookana P, Sillanpää MJ. Correcting for relatedness in Bayesian models for genomic data association analysis. Heredity. 2009;103:223–237. doi: 10.1038/hdy.2009.56.
Purcell S, Sham P, Daly MJ. Parental phenotypes in family-based association analysis. Am J Hum Genet. 2005;76:249–259. doi: 10.1086/427886.
Rabinowitz D. A transmission disequilibrium test for quantitative trait loci. Hum Hered. 1997;47:342–350. doi: 10.1159/000154433.
Ripatti S, Pitkäniemi J, Sillanpää MJ. Joint modeling of genetic association and population stratification using latent class models. Genet Epidemiol. 2001;21(Suppl 1):S409–S414. doi: 10.1002/gepi.2001.21.s1.s409.
Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273:1516–1517. doi: 10.1126/science.273.5281.1516.
Risch N, Zhang H. Extreme discordant sib pairs for mapping quantitative trait loci in humans. Science. 1995;268:1584–1589. doi: 10.1126/science.7777857.
Satten GA, Flanders WD, Yang Q. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. Am J Hum Genet. 2001;68:466–477. doi: 10.1086/318195.
Sebastiani P, Abad MM, Alpargu G, Ramoni MF. Robust transmission/disequilibrium test for incomplete family genotypes. Genetics. 2004;168:2329–2337. doi: 10.1534/genetics.103.025841.
Setakis E, Stirnadel H, Balding DJ. Logistic regression protects against population structure in genetic association studies. Genome Res. 2006;16:290–296. doi: 10.1101/gr.4346306.
Sham PC, Curtis D. An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet. 1995;59:323–336. doi: 10.1111/j.1469-1809.1995.tb00751.x.
Sillanpää MJ, Auranen K. Replication in genetic studies of complex traits. Ann Hum Genet. 2004;68:646–657. doi: 10.1046/j.1529-8817.2004.00122.x.
Sillanpää MJ, Bhattacharjee M. Bayesian association-based fine mapping in small chromosomal segments. Genetics. 2005;169:427–439. doi: 10.1534/genetics.104.032680.
Sillanpää MJ, Bhattacharjee M. Association mapping of complex trait loci with context-dependent effects and unknown context variable. Genetics. 2006;174:1597–1611. doi: 10.1534/genetics.106.061275.
Sillanpää MJ, Hoti F. Mapping quantitative trait loci from a single-tail sample of the phenotype distribution including survival data. Genetics. 2007;177:2361–2377. doi: 10.1534/genetics.107.081299.
Sillanpää MJ, Kilpikari R, Ripatti S, Onkamo P, Uimari P. Bayesian association mapping for quantitative traits in a mixture of two populations. Genet Epidemiol. 2001;21(Suppl 1):S692–S699. doi: 10.1002/gepi.2001.21.s1.s692.
Sillanpää MJ, Noykova N. Hierarchical modelling of clinical and expression quantitative trait loci. Heredity. 2008;101:271–284. doi: 10.1038/hdy.2008.58.
Slager SL, Schaid DJ. Evaluation of candidate genes in case-control studies: a statistical method to account for related subjects. Am J Hum Genet. 2001;68:1457–1462. doi: 10.1086/320608.
Slatkin M. Epigenetic inheritance and the missing heritability problem. Genetics. 2009;182:845–850. doi: 10.1534/genetics.109.102798.
Spielman RS, Ewens WJ. A sibship test for linkage in presence of association: the sib transmission/disequilibrium test. Am J Hum Genet. 1998;62:450–458. doi: 10.1086/301714.
Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993;52:506–516.
Stich B, Möhring J, Piepho H-P, Heckenberger M, Buckler ES, Melchinger AE. Comparison of mixed-model approaches for association mapping. Genetics. 2008;178:1745–1754. doi: 10.1534/genetics.107.079707.
Tang H, Quertermous T, Rodriguez B, Kardia SLR, Zhu X, Brown A, et al. Genetic structure, self-identified race/ethnicity, and confounding in case-control association studies. Am J Hum Genet. 2005;76:268–275. doi: 10.1086/427888.
Teng J, Risch N. The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases. II. Individual genotyping. Genome Res. 1999;9:234–241.
Terwilliger JD, Ott J. A haplotype-based ‘haplotype relative risk' approach to detecting allelic associations. Hum Hered. 1992;42:337–346. doi: 10.1159/000154096.
Terwilliger JD, Weiss KM. Linkage disequilibrium mapping of complex disease: fantasy or reality? Curr Opin Biotechnol. 1998;9:578–594. doi: 10.1016/s0958-1669(98)80135-3.
Thomas DC. Statistical Methods in Genetic Epidemiology. Oxford University Press: New York; 2004.
Thornsberry JM, Goodman MM, Doebley J, Kresovich S, Nielsen D, Buckler ES. Dwarf8 polymorphisms associate with variation in flowering time. Nat Genet. 2001;28:286–289. doi: 10.1038/90135.
Tsai MY, Hsiao CK, Wen SH. A Bayesian spatial multimarker genetic random-effect model for fine-scale mapping. Ann Hum Genet. 2008;72:658–669. doi: 10.1111/j.1469-1809.2008.00459.x.
van de Casteele T, Galbusera P, Matthysen E. A comparison of microsatellite-based pairwise relatedness estimators. Mol Ecol. 2001;10:1539–1549. doi: 10.1046/j.1365-294x.2001.01288.x.
Voight BF, Pritchard JK. Confounding from cryptic relatedness in case-control association studies. PLoS Genet. 2005;1:e32. doi: 10.1371/journal.pgen.0010032.
Wang K. Testing for genetic association in the presence of population stratification in genome-wide association studies. Genet Epidemiol. 2009;33:637–645. doi: 10.1002/gepi.20415.
Weir BS, Anderson AD, Hepler AB. Genetic relatedness analysis: modern data and new challenges. Nat Rev Genet. 2006;7:771–780. doi: 10.1038/nrg1960.
Wu J, Patwa TH, Lubman DM, Ghosh D. Identification of differentially expressed spatial clusters using humoral response microarray data. Comput Stat Data Anal. 2009;53:3094–3102. doi: 10.1016/j.csda.2008.04.026.
Xiao R, Boehnke M. Quantifying and correcting for the winner's curse in genetic association studies. Genet Epidemiol. 2009;33:453–462. doi: 10.1002/gepi.20398.
Yan T, Hou B, Yang Y. Correcting for cryptic relatedness by a regression-based genomic control method. BMC Genet. 2009;10:78. doi: 10.1186/1471-2156-10-78.
Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38:203–208. doi: 10.1038/ng1702.
Zhang F, Deng H-W. Correcting for cryptic relatedness in population-based association studies of continuous traits. Hum Hered. 2010;69:28–33. doi: 10.1159/000243151.
Zhao K, Aranzana MJ, Kim S, Lister C, Shindo C, Tang C, et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 2007;3:e4. doi: 10.1371/journal.pgen.0030004.
Zheng G, Freidlin B, Gastwirth JL. Robust genomic control for association studies. Am J Hum Genet. 2006;78:350–356. doi: 10.1086/500054.
Zhu X, Li S, Cooper RS, Elston RC. A unified association analysis approach for family and unrelated samples correcting for stratification. Am J Hum Genet. 2008;82:352–365. doi: 10.1016/j.ajhg.2007.10.009.
Zondervan KT, Cardon LR, Kennedy SH. What makes a good case-control study. Hum Reprod. 2002;17:1415–1423. doi: 10.1093/humrep/17.6.1415.

[bib1] Abecasis GR, Cardon LR, Cookson WO. A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000;66:279–292. doi: 10.1086/302698. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–1664. doi: 10.1101/gr.094052.109. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] Allison DB. Transmission-disequilibrium tests for quantitative traits. Am J Hum Genet. 1997;60:676–690. [PMC free article] [PubMed] [Google Scholar]

[bib4] Amin N, van Duijn CM, Aulchenko YS. A genomic background based method for association analysis in related individuals. PloS One. 2007;12:e1274. doi: 10.1371/journal.pone.0001274. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] Aulchenko YS, de Koning D-J, Haley C. Genomewide rapid association using mixed model and regression: a fast and simple method for genomewide pedigree-based quantitative trait loci association analysis. Genetics. 2007a;177:577–585. doi: 10.1534/genetics.107.075614. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib6] Aulchenko YS, Ripke S, Isaacs A, Van Duijn CM. GenABEL: an R library for genome-wide association analysis. Bioinformatics. 2007b;23:1294–1296. doi: 10.1093/bioinformatics/btm108. [DOI] [PubMed] [Google Scholar]

[bib7] Aune TM, Parker JS, Mass K, Liu Z, Olson NJ, Moore JH. Co-localization of differentially expressed genes and shared susceptibility loci in human autoimmunity. Genet Epidemiol. 2004;27:162–172. doi: 10.1002/gepi.20013. [DOI] [PubMed] [Google Scholar]

[bib8] Banacu SA, Devlin B, Roeder K. Association studies for quantitative traits in structured populations. Genet Epidemiol. 2002;22:78–93. doi: 10.1002/gepi.1045. [DOI] [PubMed] [Google Scholar]

[bib9] Beavis WD.1998QTL analysis: power, precision, and accuracyIn: Paterson AH (ed.).Molecular Dissection of Complex Traits CRC Press: Boca Raton, FL; 145–162. [Google Scholar]

[bib10] Bhattacharjee M, Botting CH, Sillanpää MJ. Bayesian biomarker identification based on marker-expression-proteomics data. Genomics. 2008;92:384–392. doi: 10.1016/j.ygeno.2008.06.006. [DOI] [PubMed] [Google Scholar]

[bib11] Bhattacharjee M, Sillanpää MJ.2009Bayesian joint disease-marker-expression analysis applied to clinical characteristics of chronic fatigue syndromeIn: McConnell P, Lim S, Cuticchia AJ (eds)Methods of Microarray Data Analysis VI CreateSpace Publishing: Scotts Valley, CA; 15–34. [Google Scholar]

[bib12] Bink MCAM, Anderson AD, van de Weg WE, Thompson EA. Comparison of marker-based pairwise relatedness estimators on a pedigreed plant population. Theor Appl Genet. 2008;117:843–855. doi: 10.1007/s00122-008-0824-1. [DOI] [PubMed] [Google Scholar]

[bib13] Blouin MS. DNA-based methods for pedigree reconstruction and kinship analysis in natural populations. Trends Ecol Evol. 2003;18:503–511. [Google Scholar]

[bib14] Bonney GE. Regressive logistic models for familial disease and other binary traits. Bioinformatics. 1986;42:611–625. [PubMed] [Google Scholar]

[bib15] Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633–2635. doi: 10.1093/bioinformatics/btm308. [DOI] [PubMed] [Google Scholar]

[bib16] Burton P, Tiller K, Gurrin L, Cookson W, Musk A, Palmer LJ. Genetic variance components analysis for binary phenotypes using generalized linear mixed models (GLMMs) and Gibbs sampling. Genet Epidemiol. 1999;17:118–140. doi: 10.1002/(SICI)1098-2272(1999)17:2<118::AID-GEPI3>3.0.CO;2-V. [DOI] [PubMed] [Google Scholar]

[bib17] Bystrykh L, Weersing E, Dontje B, Sutton S, Pletcher MT, Wiltshire T, et al. Uncovering regulatory pathways that affect hematopoietic stem cell function using ‘genetical genomics'. Nat Genet. 2005;37:225–232. doi: 10.1038/ng1497. [DOI] [PubMed] [Google Scholar]

[bib18] Cantor RM, Chen GK, Pajukanta P, Lange K. Association testing in a linked region using large pedigrees. Am J Hum Genet. 2005;76:538–542. doi: 10.1086/428628. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib19] Cardon LR, Palmer LJ. Population stratification and spurious allelic association. Lancet. 2003;361:598–604. doi: 10.1016/S0140-6736(03)12520-2. [DOI] [PubMed] [Google Scholar]

[bib20] Clayton DG. A generalization of the transmission/disequilibrium test for uncertain-haplotype transmission. Am J Hum Genet. 1999;65:1170–1177. doi: 10.1086/302577. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib21] Clayton DG. Population association. In: Balding DJ, Bishop M, Cannings C (eds). Handbook of Statistical Genetics. 3rd edn, vol. 2. Wiley: Chichester, UK; 2007. pp. 1264–1285. [Google Scholar]

[bib22] Conti DV, Witte JS. Hierarchical modeling of linkage disequilibrium: genetic structure and spatial relations. Am J Hum Genet. 2003;72:351–363. doi: 10.1086/346117. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] Corander J, Waldmann P, Sillanpää MJ. Bayesian analysis of genetic differentiation between populations. Genetics. 2003;163:367–374. doi: 10.1093/genetics/163.1.367. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] Dadd T, Weale ME, Lewis CM. A critical evaluation of genomic control methods for genetic association studies. Genet Epidemiol. 2009;33:290–298. doi: 10.1002/gepi.20379. [DOI] [PubMed] [Google Scholar]

[bib25] Dawson KJ, Belkhir K. A Bayesian approach to the identification of panmictic populations and the assignment of individuals. Genet Res. 2001;78:59–77. doi: 10.1017/s001667230100502x. [DOI] [PubMed] [Google Scholar]

[bib26] Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004. doi: 10.1111/j.0006-341x.1999.00997.x. [DOI] [PubMed] [Google Scholar]

[bib27] Du P, Kibbe WA, Lin SM. Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics. 2006;22:2059–2065. doi: 10.1093/bioinformatics/btl355. [DOI] [PubMed] [Google Scholar]

[bib28] Epstein MP, Allen AS, Satten GA. A simple and improved correction for population stratification in case-control studies. Am J Hum Genet. 2007;80:921–930. doi: 10.1086/516842. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib29] Epstein MP, Veal CD, Trembath RC, Barker JNWN, Li C, Satten GA. Genetic association analysis using data from triads and unrelated subjects. Am J Hum Genet. 2005;76:592–608. doi: 10.1086/429225. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib30] Falk CT, Rubinstein P. Haplotype relative risks: an easy reliable way to construct a proper control sample for risk calculations. Ann Hum Genet. 1987;51:227–233. doi: 10.1111/j.1469-1809.1987.tb00875.x. [DOI] [PubMed] [Google Scholar]

[bib31] Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003;164:1567–1587. doi: 10.1093/genetics/164.4.1567. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] Foss EJ, Radulovic D, Shaffer SA, Ruderfer DM, Bedalov A, Goodlett DR, et al. Genetic basis of proteome variation in yeast. Nat Genet. 2007;39:1369–1375. doi: 10.1038/ng.2007.22. [DOI] [PubMed] [Google Scholar]

[bib33] Frentiu FD, Clegg SM, Chittock J, Burke T, Blows MW, Owens IPF. Pedigree-free animal models: the relatedness matrix reloaded. Proc R Soc B. 2008;275:639–647. doi: 10.1098/rspb.2007.1032. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib34] Gasbarra D, Pirinen M, Sillanpää MJ, Arjas E. Bayesian quantitative trait locus mapping based on reconstruction of recent genetic histories. Genetics. 2009;183:709–721. doi: 10.1534/genetics.109.104190. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib35] Gasbarra D, Pirinen M, Sillanpää MJ, Salmela E, Arjas E. Estimating genealogies from unlinked marker data: a Bayesian approach. Theor Pop Biol. 2007;72:305–322. doi: 10.1016/j.tpb.2007.06.004. [DOI] [PubMed] [Google Scholar]

[bib36] Gauderman WJ, Witte JS, Thomas DC. Family-based association studies. J Natl Cancer Inst Monographs. 1999;26:31–37. doi: 10.1093/oxfordjournals.jncimonographs.a024223. [DOI] [PubMed] [Google Scholar]

[bib37] George V, Tiwari HK, Zhu X, Elston RC. A test of transmission/disequilibrium for quantitative traits in pedigree data, by multiple regression. Am J Hum Genet. 1999;65:236–245. doi: 10.1086/302444. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib38] George VT, Elston RC. Testing the association between polymorphic markers and quantitative traits in pedigrees. Genet Epidemiol. 1987;4:193–201. doi: 10.1002/gepi.1370040304. [DOI] [PubMed] [Google Scholar]

[bib39] Gibson G. Population genomics: celebrating individual expression. Heredity. 2003;90:1–5. doi: 10.1038/sj.hdy.6800195. [DOI] [PubMed] [Google Scholar]

[bib40] Glaser B, Holmans P. Comparison of methods for combining case-control and family-based association studies. Hum Hered. 2009;68:106–116. doi: 10.1159/000212503. [DOI] [PubMed] [Google Scholar]

[bib41] Greenland S. A unified approach to the analysis of case-distribution (case-only) studies. Stat Med. 1999;18:1–15. doi: 10.1002/(sici)1097-0258(19990115)18:1<1::aid-sim961>3.0.co;2-l. [DOI] [PubMed] [Google Scholar]

[bib42] Guan W, Liang K, Boehnke M, Abecasis GR. Genotype-based matching to correct for population stratification in large-scale case-control genetic association studies. Genet Epidemiol. 2009;33:508–517. doi: 10.1002/gepi.20403. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib43] Göring HHH, Terwilliger JD, Blangero J. Large upward bias in estimation of locus-specific effects from genomewide scans. Am J Hum Genet. 2001;69:1357–1369. doi: 10.1086/324471. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib44] Havill LM, Dyer TD, Richardson DK, Mahaney MC, Blangero J. The quantitative trait linkage disequilibrium test: a more powerful alternative to the quantitative transmission disequilibrium test for use in the absence of population stratification. BMC Genet. 2005;6(Suppl 1):S91. doi: 10.1186/1471-2156-6-S1-S91. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib45] Hayes BJ, Visscher PM, Goddard ME. Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res. 2009;91:47–60. doi: 10.1017/S0016672308009981. [DOI] [PubMed] [Google Scholar]

[bib46] Hernández-Sánchez J, Grunchec J-A, Knott S. A web application to perform linkage disequilibrium and linkage analyses on a computational grid. Bioinformatics. 2009;25:1377–1383. doi: 10.1093/bioinformatics/btp171. [DOI] [PubMed] [Google Scholar]

[bib47] Hernández-Sánchez J, Haley CS, Visscher PM. Power of QTL detection using association tests with family controls. Eur J Hum Genet. 2003;11:819–827. doi: 10.1038/sj.ejhg.5201042. [DOI] [PubMed] [Google Scholar]

[bib48] Hinds DA, Stokowski RP, Patil N, Konvicka K, Kershenobich D, Cox DR, et al. Matching strategies for genetic association studies in structured populations. Am J Hum Genet. 2004;74:317–325. doi: 10.1086/381716. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib49] Hoggart CJ, Parra EJ, Shriver MD, Bonilla C, Kittles RA, Clayton DG, et al. Control of confounding in genetic associations in stratified populations. Am J Hum Genet. 2003;72:1492–1504. doi: 10.1086/375613. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib50] Horvath S, Xu X, Lake SL, Silverman EK, Weiss ST, Laird NM. Family-based tests for associating haplotypes with general phenotype data: application to asthma genetics. Genet Epidemiol. 2004;26:61–69. doi: 10.1002/gepi.10295. [DOI] [PubMed] [Google Scholar]

[bib51] Hoti F, Sillanpää MJ. Bayesian mapping of genotype × expression interactions in quantitative and qualitative traits. Heredity. 2006;97:4–18. doi: 10.1038/sj.hdy.6800817. [DOI] [PubMed] [Google Scholar]

[bib52] Infante-Rivard C, Mirea L, Bull SB. Combining case-control and case-trio data from the same population in genetic association analyses: overview of approaches and illustration with a candidate gene study. Am J Epidemiol. 2009;170:654–664. doi: 10.1093/aje/kwp180. [DOI] [PubMed] [Google Scholar]

[bib53] Iwata H, Ebana K, Fukuoka S, Jannink J-L, Hayashi T. Bayesian multilocus association mapping on ordinal and censored traits and its application to the analysis of genetic variation among Oryza sativa L. germplasms. Theor Appl Genet. 2009;118:865–880. doi: 10.1007/s00122-008-0945-6. [DOI] [PubMed] [Google Scholar]

[bib54] Iwata H, Uga Y, Yoshioka Y, Ebana K, Hayashi T. Bayesian association mapping of multiple quantitative trait loci and its application to the analysis of genetic variation among Oryza sativa L. germplasms. Theor Appl Genet. 2007;114:1437–1449. doi: 10.1007/s00122-007-0529-x. [DOI] [PubMed] [Google Scholar]

[bib55] Jannink J-L, Bink MCAM, Jansen RC. Using complex plant pedigrees to map valuable genes. Trends Plant Sci. 2001;6:337–342. doi: 10.1016/s1360-1385(01)02017-9. [DOI] [PubMed] [Google Scholar]

[bib56] Jansen R, Greenbaum D, Gerstein M. Relating whole-genome expression data with protein-protein interactions. Genome Res. 2002;12:37–46. doi: 10.1101/gr.205602. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib57] Jansen RC, Nap J-P. Genetical genomics: the added value from segregation. Trends Genet. 2001;17:388–391. doi: 10.1016/s0168-9525(01)02310-1. [DOI] [PubMed] [Google Scholar]

[bib58] Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, et al. Efficient control of population structure in model organism association mapping. Genetics. 2008;178:1709–1723. doi: 10.1534/genetics.107.080101. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib59] Kazeem GR, Farrall M. Integrating case-control and TDT studies. Ann Hum Genet. 2005;69:329–335. doi: 10.1046/j.1529-8817.2005.00156.x. [DOI] [PubMed] [Google Scholar]

[bib60] Kilpikari R, Sillanpää MJ. Bayesian analysis of multilocus association in quantitative and qualitative traits. Genet Epidemiol. 2003;25:122–135. doi: 10.1002/gepi.10257. [DOI] [PubMed] [Google Scholar]

[bib61] Kimmel G, Jordan MI, Halperin E, Shamir R, Karp RM. A randomization test for controlling population stratification in whole-genome association studies. Am J Hum Genet. 2007;81:895–905. doi: 10.1086/521372. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib62] Kraft P, Horvath S. The genetics of gene expression and gene mapping. Trends Biotechnol. 2003;21:377–378. doi: 10.1016/S0167-7799(03)00191-4. [DOI] [PubMed] [Google Scholar]

[bib63] Kraft P, Schadt E, Aten J, Horvath S. A family-based test for correlation between gene expression and trait values. Am J Hum Genet. 2003;72:1323–1330. doi: 10.1086/375167. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib64] Lande R, Thompson R. Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics. 1990;124:743–756. doi: 10.1093/genetics/124.3.743. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib65] Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 1994;265:2037–2048. doi: 10.1126/science.8091226. [DOI] [PubMed] [Google Scholar]

[bib66] Lange C, DeMeo DL, Laird NM. Power and design considerations for a general class of family-based association tests: quantitative traits. Am J Hum Genet. 2002;71:1330–1341. doi: 10.1086/344696. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib67] Lee S, Sullivan P, Zou F, Wright F. Comment on a simple and improved correction for population stratification. Am J Hum Genet. 2008;82:524–526. doi: 10.1016/j.ajhg.2007.10.014. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib68] Lee W-C. Case-control association studies with matching and genomic controlling. Genet Epidemiol. 2004;27:1–13. doi: 10.1002/gepi.20011. [DOI] [PubMed] [Google Scholar]

[bib69] Leutenegger AL, Prum B, Génin E, Verny C, Lemainque A, Clerget-Darpoux F, et al. Estimation of the inbreeding coefficient through use of genomic data. Am J Hum Genet. 2003;73:516–523. doi: 10.1086/378207. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib70] Long AD, Langley CH. The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res. 1999;9:720–731. [PMC free article] [PubMed] [Google Scholar]

[bib71] Lu Y, Liu P-Y, Liu Y-J, Xu F-H, Deng H-W. Quantifying the relationship between gene expression and trait values in general pedigrees. Genetics. 2004;168:2395–2405. doi: 10.1534/genetics.104.031666. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib72] Luca D, Ringquist S, Klei L, Lee AB, Gieger C, Wichman H-E, et al. On the use of general control samples for genome-wide association studies: genetic matching highlights causal variants. Am J Hum Genet. 2008;82:453–463. doi: 10.1016/j.ajhg.2007.11.003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib73] Lund MS, Sorensen P, Guldbrandtsen B, Sorensen DA. Multitrait fine mapping of quantitative trait loci using combined linkage disequilibria and linkage analysis. Genetics. 2003;163:405–410. doi: 10.1093/genetics/163.1.405. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib74] Lynch M, Ritland K. Estimation of pairwise relatedness with molecular markers. Genetics. 1999;152:1753–1766. doi: 10.1093/genetics/152.4.1753. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib75] Maher BS. The case of the missing heritability. Nature. 2008;456:18–21. doi: 10.1038/456018a. [DOI] [PubMed] [Google Scholar]

[bib76] Manenti G, Galvan A, Pettinicchio A, Trincucci G, Spada E, Zolin A, et al. Mouse genome-wide association mapping needs linkage analysis to avoid false-positive loci. PLoS Genet. 2009;5:e1000331. doi: 10.1371/journal.pgen.1000331. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib77] Martinez V, Thorgaard G, Robison B, Sillanpää MJ. An application of Bayesian QTL mapping to early development in double haploid lines of rainbow trout including environmental effects. Genet Res. 2005;85:209–221. doi: 10.1017/S0016672305007871. [DOI] [PubMed] [Google Scholar]

[bib78] McCarthy MI, Hirschhorn JN. Genome-wide association studies: potential next steps on a genetic journey. Hum Mol Genet. 2008;17:R156–R165. doi: 10.1093/hmg/ddn289. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib79] Meuwissen THE, Goddard ME. Prediction of identity by descent probabilities from marker-haplotypes. Genet Sel Evol. 2001;33:605–634. doi: 10.1186/1297-9686-33-6-605. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib80] Meuwissen THE, Goddard ME. Mapping multiple QTL using linkage disequilibrium and linkage analysis information and multitrait data. Genet Sel Evol. 2004;36:261–279. doi: 10.1186/1297-9686-36-3-261. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib81] Meuwissen THE, Goddard ME. Multipoint identity-by-descent prediction using dense markers to map quantitative trait loci and estimate effective population size. Genetics. 2007;176:2551–2560. doi: 10.1534/genetics.107.070953. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib82] Milligan BG. Maximum-likelihood estimation of relatedness. Genetics. 2003;163:1153–1167. doi: 10.1093/genetics/163.3.1153. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib83] Misztal I. Estimation of variance components with large-scale dominance models. J Dairy Sci. 1997;80:965–974. [Google Scholar]

[bib84] O'Hara RB, Sillanpää MJ. A review of Bayesian variable selection methods: what, how and which. Bayesian Anal. 2009;4:85–118. [Google Scholar]

[bib85] Patterson N, Price AL, Reich D. Population structure and eigen analysis. PLoS Genet. 2006;2:e190. doi: 10.1371/journal.pgen.0020190. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib86] Pérez-Enciso M. Fine mapping of complex trait genes combining pedigree and linkage disequilibrium information: a Bayesian unified framework. Genetics. 2003;163:1497–1510. doi: 10.1093/genetics/163.4.1497. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib87] Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–909. doi: 10.1038/ng1847. [DOI] [PubMed] [Google Scholar]

[bib88] Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000b;155:945–959. doi: 10.1093/genetics/155.2.945. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib89] Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet. 2000a;67:170–181. doi: 10.1086/302959. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib90] Pikkuhookana P, Sillanpää MJ. Correcting for relatedness in Bayesian models for genomic data association analysis. Heredity. 2009;103:223–237. doi: 10.1038/hdy.2009.56. [DOI] [PubMed] [Google Scholar]

[bib91] Purcell S, Sham P, Daly MJ. Parental phenotypes in family-based association analysis. Am J Hum Genet. 2005;76:249–259. doi: 10.1086/427886. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib92] Rabinowitz D. A transmission disequilibrium test for quantitative trait loci. Hum Hered. 1997;47:342–350. doi: 10.1159/000154433. [DOI] [PubMed] [Google Scholar]

[bib93] Ripatti S, Pitkäniemi J, Sillanpää MJ. Joint modeling of genetic association and population stratification using latent class models. Genet Epidemiol. 2001;21(Suppl 1):S409–S414. doi: 10.1002/gepi.2001.21.s1.s409. [DOI] [PubMed] [Google Scholar]

[bib94] Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273:1516–1517. doi: 10.1126/science.273.5281.1516. [DOI] [PubMed] [Google Scholar]

[bib95] Risch N, Zhang H. Extreme discordant sib pairs for mapping quantitative trait loci in humans. Science. 1995;268:1584–1589. doi: 10.1126/science.7777857. [DOI] [PubMed] [Google Scholar]

[bib96] Satten GA, Flanders WD, Yang Q. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. Am J Hum Genet. 2001;68:466–477. doi: 10.1086/318195. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib97] Sebastiani P, Abad MM, Alpargu G, Ramoni MF. Robust transmission/disequilibrium test for incomplete family genotypes. Genetics. 2004;168:2329–2337. doi: 10.1534/genetics.103.025841. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib98] Setakis E, Stirnadel H, Balding DJ. Logistic regression protects against population structure in genetic association studies. Genome Res. 2006;16:290–296. doi: 10.1101/gr.4346306. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib99] Sham PC, Curtis D. An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet. 1995;59:323–336. doi: 10.1111/j.1469-1809.1995.tb00751.x. [DOI] [PubMed] [Google Scholar]

[bib100] Sillanpää MJ, Auranen K. Replication in genetic studies of complex traits. Ann Hum Genet. 2004;68:646–657. doi: 10.1046/j.1529-8817.2004.00122.x. [DOI] [PubMed] [Google Scholar]

[bib101] Sillanpää MJ, Bhattacharjee M. Bayesian association-based fine mapping in small chromosomal segments. Genetics. 2005;169:427–439. doi: 10.1534/genetics.104.032680. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib102] Sillanpää MJ, Bhattacharjee M. Association mapping of complex trait loci with context-dependent effects and unknown context variable. Genetics. 2006;174:1597–1611. doi: 10.1534/genetics.106.061275. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib103] Sillanpää MJ, Hoti F. Mapping quantitative trait loci from a single-tail sample of the phenotype distribution including survival data. Genetics. 2007;177:2361–2377. doi: 10.1534/genetics.107.081299. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib104] Sillanpää MJ, Kilpikari R, Ripatti S, Onkamo P, Uimari P. Bayesian association mapping for quantitative traits in a mixture of two populations. Genet Epidemiol. 2001;21(Suppl 1):S692–S699. doi: 10.1002/gepi.2001.21.s1.s692. [DOI] [PubMed] [Google Scholar]

[bib105] Sillanpää MJ, Noykova N. Hierarchical modelling of clinical and expression quantitative trait loci. Heredity. 2008;101:271–284. doi: 10.1038/hdy.2008.58. [DOI] [PubMed] [Google Scholar]

[bib106] Slager SL, Schaid DJ. Evaluation of candidate genes in case-control studies: a statistical method to account for related subjects. Am J Hum Genet. 2001;68:1457–1462. doi: 10.1086/320608. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib107] Slatkin M. Epigenetic inheritance and the missing heritability problem. Genetics. 2009;182:845–850. doi: 10.1534/genetics.109.102798. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib108] Spielman RS, Ewens WJ. A sibship test for linkage in presence of association: the sib transmission/disequilibrium test. Am J Hum Genet. 1998;62:450–458. doi: 10.1086/301714. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib109] Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993;52:506–516. [PMC free article] [PubMed] [Google Scholar]

[bib110] Stich B, Möhring J, Piepho H-P, Heckenberger M, Buckler ES, Melchinger AE. Comparison of mixed-model approaches for association mapping. Genetics. 2008;178:1745–1754. doi: 10.1534/genetics.107.079707. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib111] Tang H, Quertermous T, Rodriguez B, Kardia SLR, Zhu X, Brown A, et al. Genetic structure, self-identified race/ethnicity, and confounding in case-control association studies. Am J Hum Genet. 2005;76:268–275. doi: 10.1086/427888. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib112] Teng J, Risch N. The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases. II. Individual genotyping. Genome Res. 1999;9:234–241. [PubMed] [Google Scholar]

[bib113] Terwilliger JD, Ott J. A haplotype-based ‘haplotype relative risk' approach to detecting allelic associations. Hum Hered. 1992;42:337–346. doi: 10.1159/000154096. [DOI] [PubMed] [Google Scholar]

[bib114] Terwilliger JD, Weiss KM. Linkage disequilibrium mapping of complex disease: fantasy or reality? Curr Opin Biotechnol. 1998;9:578–594. doi: 10.1016/s0958-1669(98)80135-3. [DOI] [PubMed] [Google Scholar]

[bib115] Thomas DC. Statistical Methods in Genetic Epidemiology. Oxford University Press: New York; 2004. [Google Scholar]

[bib116] Thornsberry JM, Goodman MM, Doebley J, Kresovich S, Nielsen D, Buckler ES. Dwarf8 polymorphisms associate with variation in flowering time. Nat Genet. 2001;28:286–289. doi: 10.1038/90135. [DOI] [PubMed] [Google Scholar]

[bib117] Tsai MY, Hsiao CK, Wen SH. A Bayesian spatial multimarker genetic random-effect model for fine-scale mapping. Ann Hum Genet. 2008;72:658–669. doi: 10.1111/j.1469-1809.2008.00459.x. [DOI] [PubMed] [Google Scholar]

[bib118] van de Casteele T, Galbusera P, Matthysen E. A comparison of microsatellite-based pairwise relatedness estimators. Mol Ecol. 2001;10:1539–1549. doi: 10.1046/j.1365-294x.2001.01288.x. [DOI] [PubMed] [Google Scholar]

[bib119] Voight BF, Pritchard JK. Confounding from cryptic relatedness in case-control association studies. PLoS Genet. 2005;1:e32. doi: 10.1371/journal.pgen.0010032. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib120] Wang K. Testing for genetic association in the presence of population stratification in genome-wide association studies. Genet Epidemiol. 2009;33:637–645. doi: 10.1002/gepi.20415. [DOI] [PubMed] [Google Scholar]

[bib121] Weir BS, Anderson AD, Hepler AB. Genetic relatedness analysis: modern data and new challenges. Nat Rev Genet. 2006;7:771–780. doi: 10.1038/nrg1960. [DOI] [PubMed] [Google Scholar]

[bib122] Wu J, Patwa TH, Lubman DM, Ghosh D. Identification of differentially expressed spatial clusters using humoral response microarray data. Comput Stat Data Anal. 2009;53:3094–3102. doi: 10.1016/j.csda.2008.04.026. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib123] Xiao R, Boehnke M. Quantifying and correcting for the winner's curse in genetic association studies. Genet Epidemiol. 2009;33:453–462. doi: 10.1002/gepi.20398. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib124] Yan T, Hou B, Yang Y. Correcting for cryptic relatedness by a regression-based genomic control method. BMC Genet. 2009;10:78. doi: 10.1186/1471-2156-10-78. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib125] Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38:203–208. doi: 10.1038/ng1702. [DOI] [PubMed] [Google Scholar]

[bib126] Zhang F, Deng H-W. Correcting for cryptic relatedness in population-based association studies of continuous traits. Hum Hered. 2010;69:28–33. doi: 10.1159/000243151. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib127] Zhao K, Aranzana MJ, Kim S, Lister C, Shindo C, Tang C, et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 2007;3:e4. doi: 10.1371/journal.pgen.0030004. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib128] Zheng G, Freidlin B, Gastwirth JL. Robust genomic control for association studies. Am J Hum Genet. 2006;78:350–356. doi: 10.1086/500054. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib129] Zhu X, Li S, Cooper RS, Elston RC. A unified association analysis approach for family and unrelated samples correcting for stratification. Am J Hum Genet. 2008;82:352–365. doi: 10.1016/j.ajhg.2007.10.009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib130] Zondervan KT, Cardon LR, Kennedy SH. What makes a good case-control study. Hum Reprod. 2002;17:1415–1423. doi: 10.1093/humrep/17.6.1415. [DOI] [PubMed] [Google Scholar]
