Author manuscript; available in PMC: 2014 Jun 27.
Published in final edited form as: Genomics. 2011 Aug 2;98(4):233–241. doi: 10.1016/j.ygeno.2011.07.006

ANNOTATING INDIVIDUAL HUMAN GENOMES*

Ali Torkamani 1, Ashley A Scott-Van Zeeland 1, Eric J Topol 1, Nicholas J Schork 1
PMCID: PMC4074010  NIHMSID: NIHMS319503  PMID: 21839162

Abstract

Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it is not likely to amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants.

Keywords: Sequencing, functional analysis, computer modeling, genomic variation

INTRODUCTION

The DNA Sequencing Era

The introduction and now widespread availability of efficient and cost-effective high-throughput DNA sequencing technologies have radically changed the way in which geneticists can approach the study of human diseases. For example, traditional strategies for identifying or ‘mapping’ genes and/or genetic variants that influence a disease typically involved tracing co-segregation or linkage disequilibrium (LD) patterns between a particular genetic variant and a phenotype, either among family members, as in classical linkage analysis [1,2], or across unrelated individuals, as in more contemporary LD mapping-based association analysis [1,3]. The goal of these strategies is to identify a unique ancestral chromosomal segment or haplotype that is ‘marked’ by specific variants that track along with a disease from generation to generation or between individuals with the phenotype. The basic intuition is that this segment or haplotype contains a variant that causally influences the phenotype, and the neighboring material is simply inherited along with this causal variant. Thus, the associations between genetic variants and a particular phenotype identified by these strategies are often indirect and merely point researchers to a genomic region that might harbor variants causally related to the phenotype in question. The subsequent identification of the causal variants in a region implicated by a linkage or LD-mapping study is not always trivial and can require a great deal of effort.

Whole genome sequencing can facilitate tracing segregation and/or LD patterns involving a phenotype of interest or a haplotype that harbors a variant that influences that phenotype. This is because one need not first identify a region harboring a variant and only later seek the trait-influencing variant on the haplotype. Rather, one has all the information needed to identify an association and a causal variant simultaneously. Sequencing a genomic region or entire genomes among individuals, e.g., individuals with and without a particular phenotype, reveals all the genetic variants the individuals possess. This shifts the problem from merely identifying a region harboring a variant to determining which among the many variants observed in the region (or genome as a whole) are likely to influence the phenotype [4]. As with traditional gene mapping methods, using DNA sequencing technologies to identify variants associated with a particular phenotype requires sophisticated data analysis methodologies [5]. The search for variants likely to be associated with a phenotype from sequence data can be greatly enhanced by leveraging knowledge about the variants discovered, such as their positions and likely functional effects, since this knowledge can be used to prioritize variants or weight them in relevant statistical analyses [5]. Assigning biological meaning to sequence variants, i.e., ‘annotating’ them, is the subject of this review, as there are a variety of ways variants can be annotated and exploited in various contexts.

There are other reasons why sequence annotations are important in modern human genetics. For example, the clinical diagnosis of many congenital conditions requires an understanding of the potentially pathogenic variants an individual might possess. This understanding requires methods for classifying and annotating existing and novel variants, as is the case with BRCA1-associated breast cancer and similar conditions [6]. In addition, many contemporary sequencing initiatives focus on the mere identification and cataloguing of naturally occurring sequence variants. Examples include the International HapMap Project [7], the 1000 Genomes Project [8], and the Cancer Genome Atlas Project [9], which leave it to others to determine the biological significance of the variants identified. Finally, as whole genome sequencing (WGS) of individual human genomes becomes more efficient, the need to put the variation across individual genomes into a biological context will become more pronounced [10–12].

Annotating Sequence Variations

There are many ways in which one can assign a label or biological meaning to a position in the human genome that varies among individuals in the population at large. The simplest is to identify a variant’s position either within the sequence as a whole or within a putative functional genomic element. Even this is not trivial, as genomic positions must be defined relative to landmark sites in the genome or to a ‘reference’ genome, such as that used for the human genome browser [13,14] or the HuRef genome [15]. The use of reference genomes can be problematic, however, since human genomes vary in size due to repetitive elements, and functional elements are continually being identified and updated. In addition, sequence variants, even those located within a functional element, often do not perturb the actual function of that position or element [16]. The evolutionary or population history of a variant might shed light on its effect. However, it is not trivial to determine and assign the functional impact of a variant given biological realities such as redundancy and feedback in disease pathways, general buffering effects, and tissue-specific gene effects [16–18], nor is it trivial to understand the origins of a variant [19].

One important issue surrounding the annotation of sequence variations involves the nature of the variants themselves [20]. Single nucleotide variants (SNVs) are the most abundant form of variation and have received a great deal of attention. Small insertions and deletions (‘indels’), repetitive sequence, copy number variations (CNVs), and inversions have received less attention, although they are beginning to receive more. Another issue relating to variant annotation involves the potential context-specificity of particular effects. Epistasis, gene × environment interactions, or general multilocus and genetic background effects may mitigate or enhance the effect of any single variant. In addition, phase (i.e., the nucleotide content of the two homologous chromosomes an individual inherits in the vicinity of a variant) is known to influence the functional effects of a variant in many cases [21].

In the remainder of this review we consider the various methods and strategies for annotating DNA sequence variation. These strategies range from merely identifying the genomic elements (e.g., genes, promoters, microRNA binding sites) in which a variant resides, to predicting the likely functional effect of a variant using sophisticated statistical algorithms, to directly examining the impact of the variant on phenotypic expression in contrived experimental settings involving in vitro or model organism-based assays. By ‘functional’ we mean that a variant results in a change of phenotype either at the basic molecular level (e.g., a change in the binding strength of a transcription factor) or at the overt phenotypic level (e.g., the variant is associated with, or known to cause, a disease). Many of the methods, analyses, and references we discuss focus on single nucleotide variants (SNVs), but quite a few of these methods are applicable to other forms of variation as well. Table 1 lists example tools and databases that provide annotations of the type we discuss. We consider the limitations of each approach and provide references to some of the more salient examples of their use. We conclude with a few remarks on the future of genome annotations and the use of such annotations in clinical and non-clinical settings.

Table 1.

Example Tools for Human Variant Annotations

Tool | Website/Reference | Purpose/Theme
UCSC genome browser | http://genome.ucsc.edu/ | Position-specific functional organization of the genome
dbSNP | http://www.ncbi.nlm.nih.gov/projects/SNP/ | Catalog of variants with population-genetic annotations
OMIM | http://www.ncbi.nlm.nih.gov/omim | Catalog of known disease-causing mutations
HapMap | http://hapmap.ncbi.nlm.nih.gov/ | Catalog of variants with population-genetic annotations
COSMIC | http://www.sanger.ac.uk/perl/genetics/CGP/cosmic | Catalog of somatic mutations from tumor sequencing
TAMAL | http://neoref.ils.unc.edu/tamal/ | Functional and population-genetic annotations
Variant Analyzer | http://www.svaproject.org/ | Functional annotations
PharmGKB | http://www.pharmgkb.org/ | Pharmacogenetic variant annotations
HGDP Selection Browser | http://hgdp.uchicago.edu/cgi-bin/gbrowse/HGDP/ | Browser for assessing signs of selection in the human genome
Association Database | www.genome.gov/gwastudies | Results of genome-wide association studies (GWAS)
SeattleSeq | http://gvs.gs.washington.edu/SeattleSeqAnnotation/ | Variant annotation
Gene Ontology | http://www.geneontology.org/ | Biological, molecular and cellular annotations
KEGG Pathways | http://www.genome.jp/kegg/pathway.html | Pathway analysis
DAVID | http://david.abcc.ncifcrf.gov/ | Multiple annotations
UniProt | http://www.uniprot.org/ | Protein elements
Transfac | http://www.biobase-international.com | Transcription factor databases
Genenetwork eQTL website | www.genenetwork.org | eQTL database

TYPES OF ANNOTATION

Genomic Position and Functional Elements

The simplest and most obvious form of sequence variant annotation is determining the variant’s position in the genome. Position-based variant annotations are greatly facilitated by the availability of reference genomes and web-based browsers that can be used to graphically display genomic positions on those reference genomes, such as the UCSC browser [13,14]. The position of a variant can also be annotated with respect to known functional elements in a genome, such as introns, exons, promoters, enhancers, silencers, microRNA binding sites, conserved regions, etc. Annotating a variant with respect to its position in functional genomic elements is crucial for putting the likely functional effect of that specific variant into perspective, as discussed in detail below [22]. The UCSC browser provides information on functional genomic elements and their positions in the genome as well as an easy way of representing the position of an identified variant relative to a reference genome [13,14].
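The following is a minimal sketch of position-based annotation in Python. It reports which annotated intervals contain a variant position. The `Feature` class, the toy coordinates, and the 0-based, half-open (BED-style) coordinate convention are illustrative assumptions. Real annotation relies on feature tracks tied to one specific reference assembly and on interval indexes rather than a linear scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A hypothetical annotated interval on one reference assembly."""
    chrom: str
    start: int  # 0-based, inclusive (BED-style convention, assumed here)
    end: int    # 0-based, exclusive
    kind: str   # e.g. "promoter", "exon", "intron"
    name: str


def annotate_position(chrom, pos, features):
    """Return every feature on `chrom` whose half-open interval contains `pos`."""
    return [f for f in features if f.chrom == chrom and f.start <= pos < f.end]


# Toy features with made-up coordinates; not real genome annotations.
FEATURES = [
    Feature("chr1", 1000, 1200, "promoter", "GENE_A"),
    Feature("chr1", 1200, 1350, "exon", "GENE_A exon 1"),
    Feature("chr1", 1350, 2000, "intron", "GENE_A intron 1"),
    Feature("chr2", 500, 800, "exon", "GENE_B exon 1"),
]

if __name__ == "__main__":
    for chrom, pos in [("chr1", 1250), ("chr1", 1350), ("chr2", 100)]:
        hits = annotate_position(chrom, pos, FEATURES)
        labels = [f"{f.kind}:{f.name}" for f in hits] or ["unannotated"]
        print(chrom, pos, labels)
```

A position that falls in no feature is reported as unannotated. That is not the same as non-functional, since not all functional elements have been identified.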

There are a few caveats to determining the position of a variant in a genome, either relative to other variants or with respect to functional elements. First, not all functional elements in the human genome have been identified, although catalogs of them are constantly being updated through functional studies. Second, humans are diploid, so each individual carries two genomes: one maternally derived and one paternally derived. The position of genetic variants relative to other variants should thus be determined with consideration of which chromosome or haplotype they reside on. Characterizing haplotype or ‘phase’ information from sequence data obtained from an individual is not trivial, but may be important for putting the functional effects of those variants into perspective [21, 23]. Third, variant positions can also be determined independently of a reference genome, via de novo assembly of individual genomes, and thus assessed merely relative to other variants in an individual genome. However, such de novo assemblies are notoriously difficult for human genomes [15].
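To see why phase matters, consider two heterozygous variants in the same gene, each assumed to disable the copy of the gene it sits on. If they are in trans (on different haplotypes), no intact copy remains. If they are in cis, one copy is untouched. The Python sketch below makes that distinction from phased genotypes written in the VCF convention, where `|` separates the alleles on the two haplotypes. The function names, and the assumption that every alternate allele is damaging, are illustrative only.

```python
def haplotype_alleles(genotype):
    """Split a phased VCF-style genotype such as '0|1' into (haplotype 1, haplotype 2) alleles."""
    if "|" not in genotype:
        raise ValueError(f"genotype {genotype!r} is unphased")
    first, second = genotype.split("|")
    return int(first), int(second)


def haplotypes_carrying_alt(genotype):
    """Return the indices (0 or 1) of the haplotypes that carry a non-reference allele."""
    return {i for i, allele in enumerate(haplotype_alleles(genotype)) if allele != 0}


def both_haplotypes_hit(genotypes):
    """True if, across the given variants, each haplotype carries at least one alt allele."""
    hit = set()
    for genotype in genotypes:
        hit |= haplotypes_carrying_alt(genotype)
    return hit == {0, 1}


if __name__ == "__main__":
    print(both_haplotypes_hit(["0|1", "1|0"]))  # trans: True
    print(both_haplotypes_hit(["0|1", "0|1"]))  # cis: False
```

Unphased genotypes such as `0/1` are rejected rather than guessed, because the cis/trans distinction is exactly the information that phasing supplies.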

Population Characteristics

The population characteristics of a variant are an important form of initial annotation information. The frequency of a variant (or the frequencies of alleles at a multiallelic locus) within different populations is a useful indicator of the variant's age. Frequency information can also provide insight into the population-specificity of the variant, which could be a sign of selection effects ([24]; see below). Alternatively, if the variant has never been observed before in population studies, it may be a de novo mutation. However, the best way to determine whether a variant is de novo is to genotype the parents of the carrier and determine whether the variant was in fact inherited. Allele frequency information has also been shown to be associated with the likely functional or phenotypic consequences of a variant [25–27].
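The two frequency-related ideas above can be sketched in a few lines of Python, assuming a biallelic site and diploid genotypes coded as the number of alternate alleles (0, 1 or 2). The genotype lists are hypothetical. The trio check ignores genotyping error, low coverage, and sample mix-ups, so a flagged site is only a candidate de novo mutation that would still need confirmation.

```python
def alt_allele_frequency(genotypes):
    """Alt-allele frequency at a biallelic site from diploid genotypes coded 0, 1 or 2."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1 or 2")
    return sum(genotypes) / (2 * len(genotypes))


def is_candidate_de_novo(child, mother, father):
    """Flag a site where the child carries an alt allele but neither parent does."""
    return child > 0 and mother == 0 and father == 0


if __name__ == "__main__":
    # Hypothetical genotypes for one variant in two populations.
    population_a = [0, 0, 1, 0, 1, 0, 0, 0]
    population_b = [1, 2, 1, 0, 1, 1, 2, 0]
    print(alt_allele_frequency(population_a))  # 0.125
    print(alt_allele_frequency(population_b))  # 0.5
    print(is_candidate_de_novo(child=1, mother=0, father=0))  # True
```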

Other population characteristics that are important for variant annotation include the linkage disequilibrium (LD) patterns exhibited by the variant with other variants. Such patterns can provide information about the haplotypic background upon which the variant arose and about the potential to use other variants as surrogates or proxies for the variant in question. Such information can greatly facilitate gene mapping studies [28]. Population characteristic annotations of variants are often recorded in databases such as the HapMap database [7] and dbSNP [29].
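Pairwise LD is commonly summarized by r². It is computed from the disequilibrium coefficient D = p_AB − p_A·p_B as r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The Python sketch below assumes phased two-site haplotypes coded 0 (reference) or 1 (alternate), and the example haplotypes are hypothetical. An r² close to 1 is what would justify using one variant as a proxy for the other.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic sites from phased two-site haplotypes coded (0/1, 0/1)."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n  # alt-allele frequency at site 1
    p_b = sum(h[1] for h in haplotypes) / n  # alt-allele frequency at site 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n  # alt-alt haplotype
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a site is monomorphic; r^2 is undefined")
    return d * d / denom


if __name__ == "__main__":
    print(ld_r_squared([(0, 0), (0, 0), (1, 1), (1, 1)]))  # 1.0: perfect proxies
    print(ld_r_squared([(0, 0), (0, 1), (1, 0), (1, 1)]))  # 0.0: no LD
```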

Conservation

One of the most widely used methods for estimating the likely functional impact of a variant is to investigate how conserved the nucleotides surrounding the variant are (or the amino acids, if the variant lies in the coding region of a gene). If the wild-type or non-variant nucleotides at the position are conserved across species, this could suggest that selection acted against alternative nucleotides at that position on an evolutionary time scale. That would indicate not only the likely functional or phenotypic significance of those nucleotides but also the deleterious nature of the variant nucleotides. Many programs used to estimate the likely functional impact of a variant leverage conservation information (e.g., [30]). However, conservation has been shown to be only a reasonable, rather than a perfect, surrogate for functionality and phenotypic relevance [31]. In addition, lineage- and species-specific functionality of nucleotides raises questions about which species to include in an assessment of conservation (see, e.g., [32]).
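As a toy illustration, the Python below summarizes a single alignment column by the fraction of species sharing the reference base and by its Shannon entropy. The species list and bases are hypothetical. The measure deliberately ignores phylogeny and branch lengths, which dedicated conservation scores model. It only shows why a column that rarely varies across species is read as constrained.

```python
import math
from collections import Counter

NUCLEOTIDES = {"A", "C", "G", "T"}


def column_conservation(column, ref_base):
    """Summarize one alignment column (one base per species).

    Returns (fraction of informative species matching ref_base, Shannon entropy in bits).
    Gaps and ambiguous bases are ignored.
    """
    bases = [b.upper() for b in column]
    bases = [b for b in bases if b in NUCLEOTIDES]
    if not bases:
        raise ValueError("no informative bases in column")
    counts = Counter(bases)
    n = len(bases)
    identity = counts[ref_base.upper()] / n
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return identity, entropy


if __name__ == "__main__":
    # Hypothetical column: human, chimpanzee, mouse, chicken, zebrafish.
    identity, entropy = column_conservation(["A", "A", "A", "A", "G"], "A")
    print(identity, round(entropy, 3))
```

A fully invariant column has identity 1 and entropy 0. Lower identity and higher entropy point to weaker constraint at the position.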

Biophysical Modeling

One direct way to assess the likely functional impact of a DNA sequence variant is to determine the influence of the nucleotide substitution, indel, or larger structural anomaly on the structure of the DNA sequence encompassing that variant, or of a protein encoded by the sequence harboring the variant. Such biophysical modeling has been pursued at the level of DNA to characterize and quantify the structure and properties of unique sequences [33–35, 31]. It is far from trivial to perform, however. Using it to assess the effect of a nucleotide substitution requires extensive knowledge not only of the basic properties of DNA but also of how those basic features affect higher-order structures and processes, such as proteins, binding events, and interactions involving different DNA sequence-mediated molecular structures.

Knowledge of protein structures associated with a gene whose coding regions harbor variants is on much more solid footing, since the characterization of the effects of particular residues or amino acids on protein function is achievable with appropriate structural assays and computational models [36, 37]. However, not all proteins have had their structures solved, nor do all amino acid substitutions result in an overt protein deformation or admit an easy functional characterization. In addition, relatively few proteins have had both mutant and wild-type structures solved, making it hard to evaluate how well one can predict the influence of specific amino acid substitutions on not only those proteins whose structures are solved, but homologous (or paralogous) protein structures as well [38].
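Before any structural assessment, a coding SNV is usually translated into its amino-acid consequence. The Python sketch below does this with the standard genetic code. It assumes an in-frame coding sequence on the forward strand that starts at a codon boundary. Real annotation tools also handle strand, alternative transcripts, splice sites, and indels. The example sequence is made up.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def coding_consequence(cds, pos, alt):
    """Classify an SNV at 0-based position `pos` of an in-frame, forward-strand coding sequence."""
    cds = cds.upper()
    codon_start = pos - pos % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    elif ref_aa == "*":
        kind = "stop_lost"
    else:
        kind = "missense"
    return kind, ref_aa, alt_aa


if __name__ == "__main__":
    cds = "ATGGCCTGGTAA"  # toy sequence: Met-Ala-Trp-stop
    for pos, alt in [(4, "A"), (5, "T"), (8, "A"), (10, "C")]:
        print(pos, alt, coding_consequence(cds, pos, alt))
```

In the toy sequence `ATGGCCTGGTAA`, changing position 4 to `A` turns GCC (Ala) into GAC (Asp), a missense change. Changing position 8 to `A` turns TGG (Trp) into the stop codon TGA, a nonsense change.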

Prediction/Classifier Analyses

In the absence of precise biophysical or functional models of DNA sequence or protein structures, the likely effect of a sequence variant on those structures can be estimated with predictive models. Such models are built from various features of previously characterized variants (e.g., sequence motifs, sequence conservation levels, similarity to other variants, the results of biochemical assays exploring those variants) that can discriminate known functional from non-functional variants. The features that appear to discriminate functional from non-functional variants can then be applied to novel or previously uncharacterized variants to estimate their likely functional potential. Prediction and classification analyses of this sort are used widely. They form the basis of many computer programs for evaluating the functional significance of coding variants [39, 22, 32, 40–42], variants in regulatory genomic regions [43], and somatic mutations in cancer [44, 45]. Two problems are inherent in these approaches. The first is the need to specify the variant ‘features’ used to create predictive models, since the chosen features might not encompass more important, yet unknown, features [40]. The second is the need for training data sets, in which the functionality of variants has been definitively determined, for developing the predictor or classifier.
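To make the feature-based idea concrete, the Python below fits a logistic-regression classifier by plain gradient descent on two features: a conservation score and a population allele frequency. Both the features and the labeled training variants are hypothetical, chosen only so that the example is easy to follow. Logistic regression is one simple choice among the many learning methods used by published predictors. The sketch inherits exactly the limitations noted above: it can only learn from the features it is given and the labels it is trained on.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, learning_rate=0.5, epochs=5000):
    """Fit logistic-regression weights and bias by full-batch gradient descent."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_proba(weights, bias, x):
    """Estimated probability that a variant with feature vector x is functional."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical training set: (conservation score, population allele frequency).
# Label 1 = previously characterized as functional, 0 = as non-functional.
TRAIN_X = [(0.95, 0.001), (0.90, 0.002), (0.85, 0.005), (0.97, 0.0005),
           (0.20, 0.30), (0.35, 0.25), (0.10, 0.40), (0.40, 0.15)]
TRAIN_Y = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    w, b = train_logistic(TRAIN_X, TRAIN_Y)
    for variant in [(0.925, 0.0015), (0.15, 0.35)]:
        print(variant, round(predict_proba(w, b, variant), 3))
```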

Literature-based Annotations of Specific Variants

The best way of assigning a functional annotation to an observed sequence variant is to specifically study that variant using, e.g., relevant laboratory functional assays, transgenic mice, families with clinical phenotypes possessing that variant, etc. The literature contains thousands of papers describing the results of various assays investigating the significance of particular sequence variants. Accessing this information to annotate an observed variant in an individual genome is not entirely trivial as the reports in the literature describing any variant will undoubtedly have nuances and caveats, and be very context-specific. Databases such as OMIM [46] can clearly facilitate searches for at least the overt phenotypic influence of a variant, as can phenotype or disease-specific databases such as AlzGene [47].

Many reports describing the functional effects of human variants leverage laboratory-based in vitro or model organism functional assays, the generalizability or ultimate human in vivo significance of which is hard to gauge. Recently, Ioannidis and Kavvoura [48] and Cerulli and Goldstein [49] considered the results of in vitro studies designed to assess the functional effects of specific naturally-occurring human DNA sequence variants on the expression of genes in cis and compared them to the results of association studies involving those same variants with specific diseases. The authors found only a weak correlation, suggesting that the results of contrived laboratory-based functional assays probing the molecular effects of variants do not necessarily predict which variants play a large role in mediating disease susceptibility in the population at large. Despite this, knowledge of the molecular effects of variants, in whatever setting, can be important for making claims about the potential for those variants to influence gross phenotypic expression as well as those variants’ potential for participating in pathogenetic processes and networks that underlie diseases.

There is a growing trend to develop tools for mining the literature on genetic variants by leveraging appropriate keywords, assay types, etc. mentioned in published reports on the internet [50, 51]. The basic idea is to bring together and synthesize information about genetic variants and possibly make predictions about a novel variant based on what is learned from the literature about related variants (see, e.g., [50, 51]). However, there are clearly limits in the available published knowledge that might create biases in developing such predictive schemes [52].
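A very small version of literature mining is pattern matching for variant mentions. The Python below extracts dbSNP-style rsIDs and HGVS-style protein substitutions written with three-letter amino-acid codes. The regular expressions and the sample sentence are illustrative assumptions. Real text-mining tools recognize many more notations, including their inconsistent variants, than these two patterns.

```python
import re

# dbSNP-style reference SNP identifiers, e.g. rs0000001.
RSID = re.compile(r"\brs\d+\b")
# HGVS-style protein substitution with three-letter amino-acid codes, e.g. p.Ala50Val.
PROTEIN_SUB = re.compile(r"\bp\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2}\b")


def extract_variant_mentions(text):
    """Return the rsIDs and protein substitutions mentioned in a piece of text."""
    return {"rsids": RSID.findall(text), "protein": PROTEIN_SUB.findall(text)}


if __name__ == "__main__":
    # Invented sentence for illustration; GENE1 and the variants are hypothetical.
    abstract = ("Carriers of rs0000001 and the GENE1 p.Ala50Val substitution showed "
                "reduced activity, whereas p.Gly12Asp had no detectable effect.")
    print(extract_variant_mentions(abstract))
```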

Context-specific perturbation-based annotations

The functional effects of many sequence variants require appropriate environmental conditions to be present, and hence annotations assigned to variants that exhibit an effect only in certain contexts are important. For example, many variants have been found to exhibit interactions with environments of one sort or another in the context of disease susceptibility [53]. However, more specific interactions involving drugs or chemicals have received a great deal of recent attention and have led to databases cataloging evidence for such interactions, such as the ToxPO database [54] and, importantly, the Pharmacogenetics Knowledge Base or ‘PharmGKB’ [55]. Many studies and databases providing context-specific functional annotations of genetic variants are not necessarily population-based, but may leverage specific laboratory or model-organism based functional assays [56].
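One simple way to see a context-dependent effect is to estimate the genotype effect separately within each exposure stratum. The Python sketch below does this with hypothetical records of (carrier status, exposure, outcome). A real analysis would typically fit a regression model with a genotype × exposure interaction term and account for confounding and sampling error, which this comparison of means does not.

```python
from statistics import mean


def stratified_genotype_effects(records):
    """Mean outcome difference (carriers minus non-carriers) within each exposure stratum.

    `records` is an iterable of (carrier, exposed, outcome) tuples with carrier/exposed in {0, 1}.
    """
    records = list(records)  # allow a single pass over generators
    effects = {}
    for stratum in (0, 1):
        carriers = [y for c, e, y in records if e == stratum and c == 1]
        noncarriers = [y for c, e, y in records if e == stratum and c == 0]
        effects[stratum] = mean(carriers) - mean(noncarriers)
    return effects


# Hypothetical data: the variant matters only in exposed individuals.
RECORDS = [
    (1, 0, 1.0), (1, 0, 1.2), (0, 0, 1.0), (0, 0, 1.2),
    (1, 1, 3.0), (1, 1, 3.2), (0, 1, 1.1), (0, 1, 1.3),
]

if __name__ == "__main__":
    effects = stratified_genotype_effects(RECORDS)
    print("unexposed:", round(effects[0], 3), "exposed:", round(effects[1], 3))
```

The difference between the two stratum-specific effects is a crude estimate of the interaction. Here it is roughly 1.9 because the variant has no effect in the unexposed stratum.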

Molecular phenotype-based association study results

Many researchers have pursued tests of associations between genetic variants and the expression levels of genes measured on a genome-wide scale via microarray technologies [57,58]. Such studies typically involve measuring gene expression levels from tissues obtained on many individuals, genotyping those individuals, and then testing the statistical association between the genotyped variants and the expression levels of the genes. These studies have revealed thousands of putative statistical associations between genetic variations, working either in cis or in trans, and the expression levels of many genes [59]. Association studies of this type have been expanded to consider associations between genetic variants and protein levels [60] as well as the methylation levels of different genomic sites [61]. In fact, some investigators have considered integrating information about genetic variants influencing both, e.g., expression and methylation levels of certain genes [62].
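The core test in such a study can be sketched as a least-squares regression of a gene's expression level on genotype dosage (0, 1 or 2 copies of the alternate allele, i.e., an additive model). The Python below returns the slope, the Pearson correlation, and the t statistic for one variant-gene pair. It assumes no covariates. Converting t to a p-value (with n − 2 degrees of freedom) is left out because the standard library has no t distribution. The data are hypothetical.

```python
import math


def eqtl_regression(dosages, expression):
    """Least-squares slope, Pearson r and t statistic for expression ~ genotype dosage."""
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_g = sum(dosages) / n
    mean_y = sum(expression) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, expression))
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variance")
    slope = sxy / sxx  # expression change per additional alt allele
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) < 1:
        t = r * math.sqrt((n - 2) / (1 - r * r))
    else:
        t = math.copysign(math.inf, r)
    return slope, r, t


if __name__ == "__main__":
    dosages = [0, 0, 1, 1, 2, 2]  # hypothetical genotypes
    expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]  # hypothetical expression levels
    print(eqtl_regression(dosages, expression))
```

A genome-wide eQTL scan repeats this test for many variant-gene pairs. That is why the choice of significance threshold matters so much.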

Annotating variants with respect to their statistical association with changes in expression levels (expression Quantitative Trait Loci or ‘eQTLs’), protein levels (‘pQTLs’), and methylation levels (‘mQTLs’) can be very revealing about the likely functional effects of a variant. This is especially so because variants that influence gene expression levels in association studies have been shown to be more likely to be associated with overt phenotypes and/or disease [63]. Two important issues arise in annotating variants by their influence on expression, protein, and methylation levels. First, these phenomena are tissue-specific, so it is important to consider the tissue or cell type in which the assays were performed. Second, the statistical criteria for declaring a significant association between a variant and, e.g., the expression level of a gene must be considered. Genome-wide association studies with small sample sizes, which are the rule for most eQTL, pQTL, and mQTL studies, can produce many false positive and false negative associations.
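Because a genome-wide eQTL, pQTL, or mQTL scan performs a very large number of tests, the significance criterion matters. As an illustration, the Python below contrasts the Bonferroni correction with the Benjamini-Hochberg step-up procedure for controlling the false discovery rate. The p-values are hypothetical. Both procedures address false positives only; neither makes up for the low power of small studies.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # largest rank whose p-value is under its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values from ten variant-gene tests.
    p = [0.041, 0.001, 0.216, 0.008, 0.039, 0.06, 0.205, 0.042, 0.074, 0.212]
    print(bonferroni(p))
    print(benjamini_hochberg(p))
```

With these ten p-values, Bonferroni at 0.05 keeps only the smallest one (threshold 0.005). Benjamini-Hochberg at q = 0.05 keeps the two smallest.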

Clinical phenotype-based association study results

By far the most important and relevant variant annotation concerns knowledge of which overt, clinical phenotypes and diseases a variant may cause or be associated with. This type of annotation differs from associations with molecular physiologic phenotypes such as gene expression or protein levels, protein structural deformations, or environmentally induced phenotypes. The molecular effects of a variant may be mitigated or offset by the effects of other genes or by environmental conditions, such as pharmacotherapies, and hence may not result in an overt phenotype. Databases such as OMIM [64] catalog disease-associated variants and are extremely useful for annotating previously observed variants likely to be causally associated with rare Mendelian diseases. In addition, recent genome wide association studies (GWAS) have provided over 1000 very compelling associations between particular genetic variants, mostly SNVs, and common, complex diseases and phenotypes of all sorts ([3] and http://www.genome.gov/GWAstudies/).

Two important issues arise in using association study results, like those recorded in OMIM and GWAS results databases [65], for genome annotation. First, such results rely on prior knowledge of the association and hence cannot be used for novel variants. Second, one must consider the statistical significance of the association, as with eQTL, pQTL, and mQTL analysis [66].
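Both issues can be made explicit in a lookup sketch. The Python below queries a hypothetical in-memory catalog keyed by variant identifier. It returns `None` for a variant that has never been catalogued and keeps only associations that pass a significance threshold, here the widely used genome-wide level of 5 × 10⁻⁸. The identifiers, traits, and p-values are invented for illustration and do not come from OMIM or any GWAS database.

```python
GENOME_WIDE_SIGNIFICANCE = 5e-8  # conventional GWAS threshold


def annotate_from_catalog(variant_id, catalog, threshold=GENOME_WIDE_SIGNIFICANCE):
    """Catalogued (trait, p) associations passing `threshold`, or None if the variant is absent."""
    if variant_id not in catalog:
        return None  # novel or uncatalogued: no association-based annotation possible
    return [(trait, p) for trait, p in catalog[variant_id] if p <= threshold]


# Hypothetical catalog contents; not real associations.
CATALOG = {
    "rs0000001": [("trait X", 2e-10), ("trait Y", 3e-6)],
    "rs0000002": [("trait Z", 1e-5)],
}

if __name__ == "__main__":
    for vid in ["rs0000001", "rs0000002", "rs9999999"]:
        print(vid, annotate_from_catalog(vid, CATALOG))
```

An empty list (catalogued, but nothing reaches the threshold) is deliberately kept distinct from `None` (never catalogued).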

Multilocus phenotypic predictions/disease risk assessment

Once associations between multiple variants and a phenotype have been described, it is important to consider the collective effects of these variants on phenotypic expression. This is not trivial, since it requires knowledge of, or assumptions about, how the variants interact. It is nonetheless of great interest, since such models could be used to predict or classify individuals harboring specific variants into, e.g., diagnostic or prognostic categories with respect to disease. Relevant modeling and analyses can be pursued in two ways. The first is to directly assess the collective effects of variants in an appropriate data set in which individuals are studied before and after the onset of the phenotype, so that predictions can be made. The second is to model the impact of the collection of variants by making assumptions about several factors: their effect sizes, frequencies, and combined influence (such as additive or multiplicative effects); the incidence and prevalence of the phenotype in question and how these are affected by the collective influence of the variants; and the influence of non-genetic factors. The second approach is used routinely. Many association studies and candidate gene analyses have been pursued in isolation rather than in one all-encompassing longitudinal study, creating a need to synthesize information across studies. Such syntheses can be highly problematic, however, since the assumptions they make may not be correct [67].
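The second, assumption-driven approach can be sketched for the simplest case. It assumes per-allele odds ratios that combine multiplicatively (additively on the log-odds scale), no interactions, and a known baseline odds for a person carrying no risk alleles. All numbers in the Python below are hypothetical. A realistic model would also need allele frequencies to calibrate the baseline to population prevalence, a step this sketch omits.

```python
import math


def combined_odds(baseline_odds, odds_ratios, dosages):
    """Odds under a multiplicative model: baseline odds times OR**dosage for each risk allele."""
    log_odds = math.log(baseline_odds) + sum(
        d * math.log(o) for o, d in zip(odds_ratios, dosages)
    )
    return math.exp(log_odds)


def odds_to_risk(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    odds = combined_odds(baseline_odds=0.1, odds_ratios=[1.2, 1.5], dosages=[2, 1])
    print(round(odds, 4), round(odds_to_risk(odds), 4))
```

Two copies of a risk allele with odds ratio 1.2 and one copy of an allele with odds ratio 1.5 multiply a baseline odds of 0.1 to 0.1 × 1.2² × 1.5 = 0.216, a risk of about 0.18.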

Despite this fact, phenotype prediction from genetic (and non-genetic) data is seen as the holy grail of genomic medicine and disease prevention, even though there are great concerns about how such information will be perceived and utilized [68]. Recent studies in diabetes and cardiovascular disease suggest that the addition of previously associated genetic risk loci in clinical risk models of those diseases increases both discriminative and predictive accuracy, albeit only marginally. Typically, the strongest predictors of disease onset are known clinical risk factors such as body mass index, age, or gender. While genetic risk variants can have unequivocal associations with disease, the addition of multilocus genetic risk to clinical models rarely provides large gains in predictive accuracy, and models built solely on genetic factors tend to perform comparably to non-genetic risk models. Table 2 provides a list of studies exploring multilocus risk or classification models for diabetes and a summary of the outcomes of these studies.
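Discriminative accuracy in such studies is often reported as the area under the ROC curve (AUC). The AUC equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. The Python below computes it directly from that pairwise definition, counting ties as one half. The scores are hypothetical, and the pairwise loop is written for clarity rather than efficiency.

```python
def auc(case_scores, control_scores):
    """AUC: probability a random case outscores a random control (ties count 1/2)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for s_case in case_scores:
        for s_control in control_scores:
            if s_case > s_control:
                wins += 1.0
            elif s_case == s_control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical risk scores from some multilocus model.
    print(auc([0.8, 0.6, 0.4], [0.5, 0.3, 0.2]))  # 8 of 9 case-control pairs ordered correctly
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation. Small gains in this quantity are what "marginally increases discriminative accuracy" refers to in Table 2.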

Table 2.

Studies Investigating Multilocus Genetic Models for Predicting or Classifying Diabetics.

Reference | No. of variants | Study design | Outcome
Lyssenko, V. et al. (2005) | 6 | Predictive | Effect of multilocus genetic risk is roughly equivalent to estimated relative risk
Weedon, M. et al. (2006) | 3 | Classification | Low predictive power for multilocus genetic model only
Lango, H. et al. (2008) | 18 | Classification | Adding genetic risk variants only marginally increases discriminative accuracy above BMI, age and sex
Lu and Elston (2008) | 12 | Classification | Addition of 9 variants and 4 environmental variables to the model of Weedon et al. (2006) marginally increases AUC estimates
Lyssenko, V. et al. (2008) | 16 | Predictive | Addition of genetics to clinical models improves discriminatory power and reclassification of patients in a small but statistically significant way
Meigs, J. et al. (2008) | 18 | Predictive | Addition of genetics only slightly increases risk prediction accuracy above common clinical risk factors alone
Cauchi, S. et al. (2008) | 22 | Classification | 15-loci model sufficient to attain high discriminative accuracy within French population
van Hoek, M. et al. (2008) | 18 | Predictive | Adding genetic risk variants only marginally increases discriminative accuracy above BMI, age and sex
Lu, Q. et al. (2009) | 18 | Classification | Addition of 15 genetic loci to the model of Lu and Elston (2008) provided only limited improvement in discriminative accuracy
Miyake, K. et al. (2009) | 11 | Classification | Adding genetic risk variants only marginally increases discriminative accuracy above BMI, age and sex
Talmud, P. et al. (2010) | 20 | Predictive | Addition of genetics to clinical models only minimally improves predictive accuracy and reclassification

Relevant analyses that lead to genomic variation annotations of this sort are problematic for many reasons that go beyond mathematical assumptions about how independently characterized multiple variants might work together to influence phenotypic expression. For example, most studies that have identified variants to be used in a multilocus model exploit extant or prevalent cases and controls. Hence the resulting models provide classifications of individuals based on phenotype rather than true predictions of who will develop a phenotype over time. In order to make true predictions one would need either to pursue a longitudinal study with individuals without the phenotype at the start or have very reliable incidence data, and getting appropriate incidence data is not trivial [67]. Other issues involve the need to accommodate environmental effects, interrelationships with other diseases, and a need to recognize the limitations of classifications and predictions based solely on what is likely to be incomplete available knowledge of the genetic architecture of the phenotypes in question [69, 70].

Pathway-based annotations

Genes do not work in isolation to influence the expression of a phenotype, but rather work collectively to influence a particular biological process, biochemical pathway or metabolic network that mediates that phenotype’s expression. These processes and networks are replete with feedback and compensatory mechanisms, making it difficult to predict the phenotypic impact of isolated genetic variants, especially in the case of common chronic diseases. Similarly, gene regulatory elements tend to contain redundant and/or tissue-specific elements, making it difficult to interpret the ultimate phenotypic impact of variants that disrupt, e.g., transcription factor binding sites. Thus, one needs to consider the influence of a variant that has not been characterized before in the context of the other genes and genomic elements it interacts with [71–73].

Annotations that involve sets of variants, rather than an individual variant, are limited by the available knowledge of how genes and genomic elements interact and mediate basic molecular processes. Despite this limitation, many attempts have been made to annotate the functioning of entire pathways or processes on the basis of variants that disrupt the function of relevant genes [74–78]. Conversely, the phenotypic impact of variants can be predicted from the biological processes in which the impacted genes participate, allowing gene-phenotype relationships to be inferred without prior phenotypic information. Moreover, co-expression analysis and other network inference techniques can be used to infer the biological processes, and hence the potential phenotypic impact, of variants in genes of unknown function [71–78]. Thus, pathway- and network-based annotation approaches can be powerful tools for inferring phenotypic information where direct links to phenotype do not exist.
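As one illustration of network-based inference, the sketch below runs a random walk with restart over a small gene interaction network. It is written in the spirit of the interactome-walking approach of Köhler et al. [77] but is not their implementation. The gene names, the edges and the restart probability of 0.3 are hypothetical. Genes that end up with high scores are those most closely connected to the seed genes with a known phenotype link. This is one way to nominate a phenotype for a variant in a gene of unknown function.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    Iterates p <- (1 - restart) * W p + restart * p0, where W spreads each node's
    probability evenly over its neighbours (column-normalized adjacency).
    """
    nodes = sorted({n for edge in edges for n in edge})
    neighbors = {n: [] for n in nodes}
    for a, b in edges:                      # undirected interaction network
        neighbors[a].append(b)
        neighbors[b].append(a)
    missing = set(seeds) - set(nodes)
    if missing:
        raise ValueError(f"seed genes not in network: {missing}")
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        spread = {n: 0.0 for n in nodes}
        for n in nodes:
            share = p[n] / len(neighbors[n])
            for m in neighbors[n]:
                spread[m] += share
        new_p = {n: (1 - restart) * spread[n] + restart * p0[n] for n in nodes}
        converged = sum(abs(new_p[n] - p[n]) for n in nodes) < tol
        p = new_p
        if converged:
            break
    return p


# Hypothetical network; GENE_A is the only gene with a known phenotype link.
edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_A", "GENE_C"),
         ("GENE_C", "GENE_D"), ("GENE_D", "GENE_E")]
scores = random_walk_with_restart(edges, seeds={"GENE_A"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

The restart probability controls how local the search is. Higher values keep the walker close to the seeds, while lower values let evidence propagate further through the network.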

De Novo Association Analyses Involving Multiple Genomes

In the absence of prior information that could be leveraged to annotate the likely effect of a variant, one could conduct a de novo study to determine its phenotypic relevance. Such studies could include eQTL mapping, in vitro laboratory or model organism-based functional assays, association analyses, or any of the study types that reflect the annotation types discussed in this review. Of these, the study types receiving the most attention are, as noted, association studies involving medically relevant phenotypes. Designing and conducting such studies is not trivial, as they are costly, laborious and time-consuming, especially if they are built on the notion that an identified variant is one of many different rare variants that perturb a genomic element in a similar way [5]. Association studies can be made more efficient through the use of extant control data or pooling strategies [79], although such strategies are not without their problems.
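When an identified variant is treated as one of many rare variants that perturb the same gene, the rare variants are often analyzed jointly. The sketch below shows a simple collapsing ("burden") test, which is only one of the many rare-variant strategies reviewed in [5]. Each individual is scored as a carrier if he or she carries any rare qualifying variant in the gene. Carrier counts in cases and controls are then compared with a two-sided Fisher's exact test. The genotype data and function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def table_prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # holding all row and column totals fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = table_prob(a)
    p_value = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = table_prob(x)
        # Sum every table no more probable than the observed one
        # (a tiny relative tolerance guards against floating-point ties).
        if p <= p_observed * (1 + 1e-9):
            p_value += p
    return min(1.0, p_value)


def carriers(genotypes: list[list[int]]) -> int:
    # Carrier = at least one alternate allele at any rare qualifying site.
    return sum(1 for person in genotypes if any(g > 0 for g in person))


def burden_test(cases, controls):
    a, c = carriers(cases), carriers(controls)
    table = (a, len(cases) - a, c, len(controls) - c)
    return table, fisher_exact_two_sided(*table)


# Hypothetical alt-allele counts at three rare sites in one gene.
cases = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 2]] + [[0, 0, 0]] * 4
controls = [[0, 1, 0]] + [[0, 0, 0]] * 9
table, p = burden_test(cases, controls)
print(table, round(p, 4))
```

Collapsing gains power when most of the pooled variants act in the same direction. It loses power when protective and risk variants, or many neutral variants, are mixed together.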

The creation and use of genomic databases, such as dbGaP [80] or other medical genomics databases [81], which hold DNA sequence variation information on individual genomes along with some phenotypic information on the subjects whose genomes were assayed, will facilitate the annotation of variants via de novo association studies. In fact, it is not hard to imagine a time when whole genome sequencing (WGS) will be routine and databases of WGS information will be available, such that if a variant or variant type is observed, phenotypic associations involving that variant can be pursued in short order. Obviously, and justifiably, informed consent, anonymity, liability, and other issues concerning the use and distribution of individuals' genome information will loom large in these contexts.

Ancestry

Variation in the frequency of a sequence variant across different populations can point to the origins of that variant, or even to signs of selection acting on it. With enough genetic markers across the genome that exhibit frequency differences across populations ('Ancestry Informative Markers'), estimates of an individual's ancestry can be made [82]. The same principles can be applied to specific genomic regions harboring a variant [83,84].
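As a sketch of how ancestry informative markers can be turned into an estimate, the code below computes the maximum-likelihood admixture proportion q for a simple two-population model. It assumes that at each marker the individual's allele frequency is q·p1 + (1 − q)·p2, that genotypes follow Hardy–Weinberg proportions, and that markers are independent. The marker frequencies and genotypes are invented for illustration. Analyses used in practice handle more than two populations, linkage between markers, and uncertainty in the reference frequencies.

```python
from math import log


def log_likelihood(q: float, genotypes: list[int], p1: list[float], p2: list[float]) -> float:
    """Log-likelihood of admixture proportion q (share of ancestry from population 1).

    genotypes: copies (0, 1 or 2) of the counted allele at each marker.
    p1, p2: frequency of that allele in each (hypothetical) reference population.
    """
    total = 0.0
    for g, f1, f2 in zip(genotypes, p1, p2):
        f = q * f1 + (1 - q) * f2          # individual-specific allele frequency
        f = min(max(f, 1e-9), 1 - 1e-9)     # guard against log(0)
        # Binomial(2, f) under Hardy-Weinberg; the constant C(2, g) is omitted
        # because it does not depend on q.
        total += g * log(f) + (2 - g) * log(1 - f)
    return total


def estimate_admixture(genotypes, p1, p2, steps: int = 1000) -> float:
    """Maximum-likelihood q found by a simple grid search over [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotypes, p1, p2))


# Hypothetical AIM allele frequencies in two reference populations.
pop1 = [0.9, 0.8, 0.1, 0.85, 0.2]
pop2 = [0.1, 0.2, 0.9, 0.15, 0.8]
print(estimate_admixture([2, 2, 0, 2, 0], pop1, pop2))  # genotypes typical of population 1
print(estimate_admixture([1, 1, 1, 1, 1], pop1, pop2))  # heterozygous at every marker
```

Restricting the markers to a window around a variant of interest gives a crude regional ancestry estimate, which corresponds to the region-level application mentioned above.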

Within-Species Evidence for Selection

Many approaches to assessing evidence that a genomic region shows signs of selection – and hence is very likely to be functional or phenotypically relevant – leverage within-species, as opposed to cross-species, information [24]. It is therefore possible to annotate variants, and the genomic regions harboring them, on the basis of evidence that they exhibit signs of selection. There are many different methods for assessing signs of selection, and these methods do not always agree [24]. However, a recent paper describes an aggregate or consensus method for assessing selection that shows promise in differentiating variants that truly reflect selection from those that do not [85]. In addition, Lohmueller et al. [86] have considered how admixed populations can be studied for evidence of selection in particular genomic regions, thus providing an additional level of annotation for variants identified in individuals with mixed ancestry.
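Many within-species methods are built on statistics computed from the frequencies of segregating variants. As one classic example, not necessarily one of the methods used in the work cited above, the sketch below computes Tajima's D from a small set of aligned haplotypes. Negative values indicate an excess of rare variants, which is consistent with a selective sweep or a population expansion. Positive values indicate an excess of intermediate-frequency variants, which is consistent with balancing selection or population structure. Because demographic history can mimic both signals, different methods can disagree. The toy haplotypes are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes: list[str]) -> float:
    """Tajima's D for equal-length aligned haplotypes (one character per site)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 haplotypes (the variance is 0 for n < 4)")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({h[i] for h in haplotypes}) > 1)
    if s == 0:
        raise ValueError("no segregating sites")
    # pi: mean number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(x != y for x, y in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)
    # Standard constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical haplotypes: only singletons (excess of rare variants) ...
print(round(tajimas_d(["1000", "0100", "0010", "0001"]), 2))
# ... versus variants at intermediate frequency.
print(round(tajimas_d(["11", "11", "00", "00"]), 2))
```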

Somatic Cell and Tumor Genome Annotations

There is growing interest in identifying somatic mutations that contribute to cancer [87]. Annotating somatic mutations first requires determining whether a variant is, in fact, somatically acquired, which involves comparing an individual's tumor genome with his or her germline DNA [88]. Databases that catalog somatic mutations in cancer, such as the COSMIC database [89], already exist. These databases build on individual sequencing studies of tumors, such as those recently pursued by Link et al. [90], which identified a number of inherited and somatic mutations involved in a case of leukemia, and Ding et al. [88], which identified somatic variants contributing to a metastatic tumor that were verified in functional analyses via xenograft models. Annotating novel somatic mutations for their likely tumorigenic effect is not trivial, but it can be pursued by assessing patterns among the genes and pathways observed to be mutated in large consortium studies of tumors, such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium [91].
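The tumor-versus-germline comparison described above can be sketched as a simple rule applied to read counts at each candidate site. The thresholds used here (minimum depth, minimum tumor allele fraction, maximum normal allele fraction, germline allele fraction) and the site labels are hypothetical placeholders. Practical somatic callers use statistical models of sequencing error, tumor purity and subclonality rather than fixed cutoffs.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    site: str            # hypothetical site label
    tumor_alt: int       # reads supporting the variant in the tumor
    tumor_depth: int     # total reads at the site in the tumor
    normal_alt: int      # reads supporting the variant in matched germline DNA
    normal_depth: int    # total reads at the site in matched germline DNA


def classify_site(s: SiteCounts, min_depth: int = 20, min_tumor_vaf: float = 0.10,
                  max_normal_vaf: float = 0.02, germline_vaf: float = 0.25) -> str:
    """Label a candidate variant by comparing tumor and germline allele fractions."""
    if s.tumor_depth < min_depth or s.normal_depth < min_depth:
        return "insufficient coverage"
    tumor_vaf = s.tumor_alt / s.tumor_depth
    normal_vaf = s.normal_alt / s.normal_depth
    if normal_vaf >= germline_vaf:
        return "germline"      # clearly present in the germline sample
    if tumor_vaf >= min_tumor_vaf and normal_vaf <= max_normal_vaf:
        return "somatic"       # present in tumor, essentially absent from germline
    return "uncertain"


sites = [
    SiteCounts("site1", 30, 100, 0, 80),
    SiteCounts("site2", 45, 90, 40, 85),
    SiteCounts("site3", 3, 12, 0, 50),
    SiteCounts("site4", 8, 100, 0, 60),
]
for s in sites:
    print(s.site, classify_site(s))
```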

EXAMPLE STUDIES LEVERAGING ANNOTATION

There have been many recent, precedent-setting studies annotating the variants in individual whole genomes (Table 3). Most of these studies have leveraged disease and trait association study-based annotations, along with an assessment of the overall ancestry of the individuals whose genomes were sequenced. For example, J. Craig Venter's genome and its variations of relevance to many phenotypes are discussed in his autobiography [92], as well as in a paper describing the diploid assembly of his genome [15]. The genome of scientist Stephen Quake was the subject of a recent report that estimated his lifetime risk of many common polygenic diseases on the basis of variants in his genome that had previously been shown to be associated with these diseases [93]. Similarly, a recent study of an individual with Charcot-Marie-Tooth neuropathy leveraged WGS to identify variants likely to contribute to that disease [94]. A related study used WGS in a 4-member family, together with sophisticated variant annotation 'filters', to identify variants very likely to contribute to the Miller syndrome phenotype in the family's 2 offspring [95].
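A family-based variant filter of the kind mentioned above can be sketched as follows. This is a simplified recessive ("two-hit") model and is not the pipeline used by Roach et al. It assumes that each person's variants have already been restricted to rare, predicted-damaging ones and are grouped by gene as alternate-allele counts. A gene is kept if every affected child carries at least two qualifying alleles and no unaffected parent does. Sample and gene names are hypothetical. Because the filter ignores phase, it would wrongly exclude a gene if an unaffected parent carried two variants on the same haplotype.

```python
def recessive_candidates(genotypes: dict[str, dict[str, list[int]]],
                         affected: list[str], unaffected: list[str]) -> list[str]:
    """Genes consistent with a two-hit recessive model in a nuclear family.

    genotypes[sample][gene] = alt-allele counts (0/1/2) at that sample's
    rare, predicted-damaging variants in the gene (hypothetical input format).
    """
    genes = set().union(*(genotypes[s].keys() for s in affected))
    hits = []
    for gene in sorted(genes):
        # Every affected child needs >= 2 damaging alleles (homozygous or compound).
        if not all(sum(genotypes[s].get(gene, [])) >= 2 for s in affected):
            continue
        # Unaffected parents should carry at most one damaging allele.
        if any(sum(genotypes[s].get(gene, [])) >= 2 for s in unaffected):
            continue
        hits.append(gene)
    return hits


family = {
    "child1": {"GENE_A": [1, 1], "GENE_B": [2], "GENE_C": [1], "GENE_D": [2]},
    "child2": {"GENE_A": [1, 1], "GENE_B": [1], "GENE_C": [2], "GENE_D": [2]},
    "mother": {"GENE_A": [1], "GENE_B": [1]},
    "father": {"GENE_A": [0, 1], "GENE_C": [2], "GENE_D": [2]},
}
print(recessive_candidates(family, ["child1", "child2"], ["mother", "father"]))
```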

Table 3.

Recent Individual Whole Genome Sequencing Studies with Variant Annotations

Individual | Reference | Platform | Annotations
--- | --- | --- | ---
JC Venter | Venter (2007) [92]; Levy et al. (2007) [15] | Sanger sequencing | Disease, traits, ancestry
S. Quake | Ashley et al. (2010) [93] | Helicos | Disease, traits, ancestry
Family with Miller syndrome | Roach et al. (2010) [95] | Complete Genomics, Inc. | Specific disease mutations
J. Lupski | Lupski et al. (2010) [94] | SOLiD | Specific disease mutations
NA19240 | Moore et al. (2011) [11] | SOLiD | Disease, traits, ancestry
NA18507 | Moore et al. (2011) [11] | SOLiD; Illumina | Disease, traits, ancestry
Anonymous Chinese Asian | Moore et al. (2011) [11] | Illumina | Disease, traits, ancestry
Anonymous Korean Asian | Moore et al. (2011) [11] | Illumina | Disease, traits, ancestry
J. Watson | Moore et al. (2011) [11] | Roche 454 | Disease, traits, ancestry
NA07022 | Moore et al. (2011) [11] | Complete Genomics | Disease, traits, ancestry
NA12878 | Moore et al. (2011) [11] | SOLiD | Disease, traits, ancestry

Two additional studies have considered annotating the genomes of multiple individuals. Moore et al. [11] considered disease association-based annotation and ancestry of 10 recently published individual genomes. Pelak et al. [4] explored the frequency and likely functional impact of variants identified in 20 individual whole genomes and, although they did not consider detailed annotations of each individual, they did characterize the number of likely deleterious and novel mutations per genome and found that, on average, each human genome may carry as many as 144,000 unique variants.
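Counts of "unique" variants per genome depend on how uniqueness is defined. The minimal sketch below assumes that each genome is represented as a set of variant keys. It also assumes that a variant counts as unique to a genome if it is absent from every other genome in the sample and from an external catalog of known variants. The variant keys and the catalog are hypothetical, and the sketch does not attempt to reproduce the figures reported by Pelak et al.

```python
def private_variant_counts(genomes: dict[str, set[str]], catalog: set[str]) -> dict[str, int]:
    """Per-genome count of variants seen in no other genome and not in the catalog."""
    counts = {}
    for name, variants in genomes.items():
        # Union of all variants observed in the other genomes of the sample.
        others = set().union(*(v for other, v in genomes.items() if other != name))
        counts[name] = len(variants - others - catalog)
    return counts


# Hypothetical variant keys written as chrom:pos:ref>alt.
genomes = {
    "g1": {"chr1:100:A>G", "chr1:200:C>T", "chr2:50:G>A"},
    "g2": {"chr1:100:A>G", "chr3:10:T>C"},
    "g3": {"chr1:200:C>T", "chr4:5:A>T"},
}
known = {"chr4:5:A>T"}  # stands in for a catalog of previously reported variants
print(private_variant_counts(genomes, known))
```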

The use of DNA sequencing and variant annotations in clinical decision-making has also recently witnessed some sensational successes. For example, for a 5-year-old boy who had undergone more than 100 surgical operations and was almost constantly hospitalized and intermittently septic, whole exome sequencing proved to be life-saving. His idiopathic disease turned out to be caused by a previously unreported mutation in the gene XIAP, a finding that led his physicians to have him undergo a bone marrow transplant. This patient is now thriving and represents the first case of an idiopathic disease diagnosed and treatment-guided by sequencing [96]. Similarly, in some tumor sequencing studies the causative mutation, identified by sequencing and comparing the tumor and germline genomes, led to appropriately tailored therapy and an excellent therapeutic response [97, 98]. In aggregate, such cases are the first to document the remarkable potential not only of sequencing, but also of appropriate annotation of the resulting data, to improve the prognosis of individuals with serious medical conditions.

FUTURE DIRECTIONS FOR ANNOTATING INDIVIDUAL GENOMES

It is quite clear that advances in DNA sequencing technology, as well as assembly and variant calling algorithms, will lead to a situation in which the generation of DNA sequence is not a major impediment to identifying variants that may contribute to disease. Rather, future genetics research will focus on interpreting and making sense of the variations identified from the use of such sequencing technologies [10, 99], for example via the strategies outlined in this review.

However, annotating variants also poses some inherent challenges. For example, humans are diploid, so annotating variants may require understanding the sequence context (haplotype) in which they reside. Thus, phase information from DNA sequencing studies may be necessary for appropriate functional annotation, but it is not currently trivial to obtain [21, 23]. Importantly, every one of the annotation methods described here has limitations, as we have tried to make clear. In addition, depending on the application, integrating different types of annotation makes sense, and the best way to do so is an open question. For example, in clinical decision-making involving a novel variant, the case that the variant contributes to a patient's disease would be easier to make if several lines of evidence converged than if evidence came from a single annotation. Such evidence could include knowledge that other variants in the same gene influence the disease, a prediction that the variant damages the gene or genomic element in which it resides, and an association between the variant and the expression level of that gene in disease-relevant tissues. Likewise, in tumor variant annotation it makes sense to ask what other variants may exist in the tumor beyond the one of focus, how those variants may affect gene function, and, further, how the genes they affect influence a fundamental process or pathway that includes the gene carrying the primary variant of interest.
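To see why phase matters, consider two heterozygous, predicted-damaging variants in the same gene. If they sit on the same haplotype (cis), one copy of the gene remains intact. If they sit on different haplotypes (trans), both copies are hit. The sketch below assumes biallelic sites with phased genotypes written in the VCF-style "a|b" notation, where 0 is the reference allele and 1 is the damaging alternate allele. The helper name is hypothetical. Unphased "a/b" genotypes are rejected on purpose, because the cis/trans question cannot be answered without phase.

```python
def copies_disrupted(phased_genotypes: list[str]) -> int:
    """How many of the two gene copies (0, 1 or 2) carry at least one damaging allele.

    Each entry is a phased genotype such as "0|1" for one damaging variant in the
    gene; allele "1" is assumed to be the damaging allele at a biallelic site.
    """
    hit = [False, False]
    for gt in phased_genotypes:
        if "|" not in gt:
            raise ValueError(f"genotype {gt!r} is unphased; cis/trans cannot be determined")
        alleles = gt.split("|")
        for haplotype in (0, 1):
            if alleles[haplotype] == "1":
                hit[haplotype] = True
    return sum(hit)


print(copies_disrupted(["0|1", "0|1"]))  # cis: both variants on the second haplotype
print(copies_disrupted(["0|1", "1|0"]))  # trans: each haplotype carries one variant
```

The two calls have identical unphased genotypes ("0/1" at both sites), yet they imply different numbers of functional gene copies. This is the annotation ambiguity that phase information resolves.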

Facilitating the integration of variant annotations will thus be an important future activity. The creation of databases, graphical methods for displaying integrated annotation results, and statistical analysis and query tools will all help. In the context of disease risk prediction, longitudinal and clinical studies exploring the role of multiple variants must be pursued. Furthermore, the annotations that are developed must be adapted for different uses. For example, the interpretive depth and needs of the research community are quite distinct from those of individual physicians or patients. As more individuals access their whole genome sequence to learn about their unique biological features, including disease susceptibility, pharmacogenomic interactions, and ancestry, a different set of annotation tools for representing this information will be required. For all such applications, it will be vital that the information be disseminated in intuitively interpretable, reliable, and ethically aware ways.

Acknowledgments

This work was supported in part by the following National Institutes of Health research grants: U19 AG023122-01, R01 MH078151-01A1, N01 MH22005, U01 DA024417-01, P50 MH081755-01, UL1 RR025774, RC2 DA029475, R01 AG031224, U54 NS056883, as well as the Price Foundation and Scripps Genomic Medicine.

Footnotes

*

Note: This is an invited article for a special issue on whole genome sequencing edited by Jian-Bing Fan.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

1. Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 1994 Sep 30;265(5181):2037–48. doi: 10.1126/science.8091226. Review. Erratum in: Science. 1994 Oct 21;266(5184):353.
2. Ott J. Analysis of Human Genetic Linkage. 3rd ed. Baltimore: Johns Hopkins University Press; 1999.
3. Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights into the genetics of common disease. J Clin Invest. 2008 May;118(5):1590–605. doi: 10.1172/JCI34772. Review.
4. Pelak K, Shianna KV, Ge D, Maia JM, Zhu M, Smith JP, Cirulli ET, Fellay J, Dickson SP, Gumbs CE, Heinzen EL, Need AC, Ruzzo EK, Singh A, Campbell CR, Hong LK, Lornsen KA, McKenzie AM, Sobreira NL, Hoover-Fong JE, Milner JD, Ottman R, Haynes BF, Goedert JJ, Goldstein DB. The characterization of twenty sequenced human genomes. PLoS Genet. 2010 Sep 9;6(9):e1001111. doi: 10.1371/journal.pgen.1001111.
5. Bansal V, Libiger O, Torkamani A, Schork NJ. Statistical analysis strategies for association studies involving rare variants. Nat Rev Genet. 2010 Nov;11(11):773–85. doi: 10.1038/nrg2867. Epub 2010 Oct 13. Review.
6. Hofstra RM, Spurdle AB, Eccles D, Foulkes WD, de Wind N, Hoogerbrugge N, Hogervorst FB; IARC Unclassified Genetic Variants Working Group. Tumor characteristics as an analytic tool for classifying genetic variants of uncertain clinical significance. Hum Mutat. 2008 Nov;29(11):1292–303. doi: 10.1002/humu.20894.
7. International HapMap Consortium. A haplotype map of the human genome. Nature. 2005 Oct 27;437(7063):1299–320. doi: 10.1038/nature04226.
8. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073. doi: 10.1038/nature09534.
9. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455(7216):1061–1068. doi: 10.1038/nature07385.
10. Davies K. The $1,000 Genome: The Revolution in DNA Sequencing and the New Era of Personalized Medicine. New York: The Free Press; 2010.
11. Moore B, Hu H, Singleton M, Reese MG, De La Vega FM, Yandell M. Global analysis of disease-related DNA sequence variation in 10 healthy individuals: implications for whole genome-based clinical diagnostics. Genet Med. 2011 Mar;13(3):210–7. doi: 10.1097/GIM.0b013e31820ed321.
12. Biesecker LG, Mullikin JC, Facio FM, Turner C, Cherukuri PF, Blakesley RW, Bouffard GG, Chines PS, Cruz P, Hansen NF, Teer JK, Maskeri B, Young AC, Manolio TA, Wilson AF, Finkel T, Hwang P, Arai A, Remaley AT, Sachdev V, Shamburek R, Cannon RO, Green ED; NISC Comparative Sequencing Program. The ClinSeq Project: piloting large-scale genome sequencing for research in genomic medicine. Genome Res. 2009 Sep;19(9):1665–74. doi: 10.1101/gr.092841.109. Epub 2009 Jul 14.
13. Mangan ME, Williams JM, Kuhn RM, Lathe WC 3rd. The UCSC genome browser: what every molecular biologist should know. Curr Protoc Mol Biol. 2009 Oct;Chapter 19:Unit 19.9. doi: 10.1002/0471142727.mb1909s88. Review.
14. Mangan ME, Williams JM, Lathe SM, Karolchik D, Lathe WC 3rd. UCSC genome browser: deep support for molecular biomedical research. Biotechnol Annu Rev. 2008;14:63–108. doi: 10.1016/S1387-2656(08)00003-3. Review.
15. Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, Lin Y, MacDonald JR, Pang AW, Shago M, Stockwell TB, Tsiamouri A, Bafna V, Bansal V, Kravitz SA, Busam DA, Beeson KY, McIntosh TC, Remington KA, Abril JF, Gill J, Borman J, Rogers YH, Frazier ME, Scherer SW, Strausberg RL, Venter JC. The diploid genome sequence of an individual human. PLoS Biol. 2007 Sep 4;5(10):e254. doi: 10.1371/journal.pbio.0050254.
16. Kingsley CB. Identification of causal sequence variants of disease in the next generation sequencing era. Methods Mol Biol. 2011;700:37–46. doi: 10.1007/978-1-61737-954-3_3. Review.
17. Alexander RP, Fang G, Rozowsky J, Snyder M, Gerstein MB. Annotating non-coding regions of the genome. Nat Rev Genet. 2010 Aug;11(8):559–71. doi: 10.1038/nrg2814. Epub 2010 Jul 13. Review.
18. The ENCODE Project Consortium. A User's Guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biol. 2011 Apr;9(4):e1001046. doi: 10.1371/journal.pbio.1001046. Epub 2011 Apr 19.
19. Springer MS, Murphy WJ. Mammalian evolution and biomedicine: new views from phylogeny. Biol Rev Camb Philos Soc. 2007 Aug;82(3):375–92. doi: 10.1111/j.1469-185X.2007.00016.x. Review. Erratum in: Biol Rev Camb Philos Soc. 2007 Nov;82(4):699.
20. Schork NJ, Murray SS, Frazer KA, Topol EJ. Common vs. rare allele hypotheses for complex diseases. Curr Opin Genet Dev. 2009 Jun;19(3):212–9. doi: 10.1016/j.gde.2009.04.010. Epub 2009 May 28. Review.
21. Tewhey R, Bansal V, Torkamani A, Topol EJ, Schork NJ. The importance of phase information for human genomics. Nat Rev Genet. 2011 Mar;12(3):215–23. doi: 10.1038/nrg2950. Epub 2011 Feb 8.
22. Plumpton M, Barnes MR. Predictive functional analysis of polymorphisms: an overview. In: Barnes MR, editor. Bioinformatics for Geneticists: A Bioinformatics Primer for the Analysis of Genetic Data. 2nd ed. Chichester, UK: John Wiley & Sons, Ltd; 2007.
23. Bansal V, Tewhey R, Topol EJ, Schork NJ. The next phase in human genetics. Nat Biotechnol. 2011 Jan;29(1):38–9. doi: 10.1038/nbt.1757.
24. Nielsen R, Hellmann I, Hubisz M, Bustamante C, Clark AG. Recent and ongoing selection in the human genome. Nat Rev Genet. 2007 Nov;8(11):857–68. doi: 10.1038/nrg2187. Review.
25. Kryukov GV, Pennacchio LA, Sunyaev SR. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. Am J Hum Genet. 2007 Apr;80(4):727–39. doi: 10.1086/513473. Epub 2007 Mar 8.
26. Gorlov IP, Gorlova OY, Sunyaev SR, Spitz MR, Amos CI. Shifting paradigm of association studies: value of rare single-nucleotide polymorphisms. Am J Hum Genet. 2008 Jan;82(1):100–12. doi: 10.1016/j.ajhg.2007.09.006.
27. Zhu Q, Ge D, Maia JM, Zhu M, Petrovski S, Dickson SP, Heinzen EL, Shianna KV, Goldstein DB. A Genome-wide Comparison of the Functional Properties of Rare and Common Genetic Variants in Humans. Am J Hum Genet. 2011 Apr 8;88(4):458–68. doi: 10.1016/j.ajhg.2011.03.008. Epub 2011 Mar 31.
28. Altshuler D, Daly MJ, Lander ES. Genetic mapping in human disease. Science. 2008 Nov 7;322(5903):881–8. doi: 10.1126/science.1156409. Review.
29. Phillips C. Online resources for SNP analysis: a review and route map. Mol Biotechnol. 2007 Jan;35(1):65–97. doi: 10.1385/mb:35:1:65. Review.
30. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4(7):1073–81. doi: 10.1038/nprot.2009.86. Epub 2009 Jun 25.
31. Parker SC, Hansen L, Abaan HO, Tullius TD, Margulies EH. Local DNA topography correlates with functional noncoding regions of the human genome. Science. 2009 Apr 17;324(5925):389–92. doi: 10.1126/science.1169050.
32. Torkamani A, Kannan N, Taylor SS, Schork NJ. Congenital disease SNPs target lineage specific structural elements in protein kinases. Proc Natl Acad Sci U S A. 2008 Jul 1;105(26):9011–6. doi: 10.1073/pnas.0802403105. Epub 2008 Jun 25.
33. Araúzo-Bravo MJ, Sarai A. Knowledge-based prediction of DNA atomic structure from nucleic sequence. Genome Inform. 2005;16(2):12–21.
34. Farwer J, Packer MJ, Hunter CA. Prediction of atomic structure from sequence for double helical DNA oligomers. Biopolymers. 2006 Jan;81(1):51–61. doi: 10.1002/bip.20377.
35. Halvorsen M, Martin JS, Broadaway S, Laederach A. Disease-associated mutations that alter the RNA structural ensemble. PLoS Genet. 2010 Aug 19;6(8):e1001074. doi: 10.1371/journal.pgen.1001074.
36. Dixit A, Torkamani A, Schork NJ, Verkhivker G. Computational modeling of structurally conserved cancer mutations in the RET and MET kinases: the impact on protein structure, dynamics, and stability. Biophys J. 2009 Feb;96(3):858–74. doi: 10.1016/j.bpj.2008.10.041.
37. Dixit A, Yi L, Gowthaman R, Torkamani A, Schork NJ, Verkhivker GM. Sequence and structure signatures of cancer mutation hotspots in protein kinases. PLoS One. 2009 Oct 16;4(10):e7485. doi: 10.1371/journal.pone.0007485.
38. Friedman AJ, Torkamani A, Verkhivker G, Schork NJ. From coding variant to structure and function insight. In: Schortemeyer Richard, III, Hauppauge, editors. Protein Structure. New York: NOVA Publishers; 2011. In press.
39. Sunyaev S, Ramensky V, Koch I, Lathe W 3rd, Kondrashov AS, Bork P. Prediction of deleterious human alleles. Hum Mol Genet. 2001 Mar 15;10(6):591–7. doi: 10.1093/hmg/10.6.591.
40. Torkamani A, Schork NJ. Accurate prediction of deleterious protein kinase polymorphisms. Bioinformatics. 2007 Nov 1;23(21):2918–25. doi: 10.1093/bioinformatics/btm437. Epub 2007 Sep 12.
41. Mort M, Evani US, Krishnan VG, Kamati KK, Baenziger PH, Bagchi A, Peters BJ, Sathyesh R, Li B, Sun Y, Xue B, Shah NH, Kann MG, Cooper DN, Radivojac P, Mooney SD. In silico functional profiling of human disease-associated and polymorphic amino acid substitutions. Hum Mutat. 2010 Mar;31(3):335–46. doi: 10.1002/humu.21192.
42. Xin F, Myers S, Li YF, Cooper DN, Mooney SD, Radivojac P. Structure-based kernels for the prediction of catalytic residues and their involvement in human inherited disease. Bioinformatics. 2010 Aug 15;26(16):1975–82. doi: 10.1093/bioinformatics/btq319. Epub 2010 Jun 15.
43. Torkamani A, Schork NJ. Predicting functional regulatory polymorphisms. Bioinformatics. 2008 Aug 15;24(16):1787–92. doi: 10.1093/bioinformatics/btn311. Epub 2008 Jun 18.
44. Torkamani A, Schork NJ. Prediction of cancer driver mutations in protein kinases. Cancer Res. 2008 Mar 15;68(6):1675–82. doi: 10.1158/0008-5472.CAN-07-5283.
45. Iversen E, Couch FJ, Goldgar DE, Tavtigian S, Monteiro A. A Computational Method to Classify Variants of Uncertain Significance Using Functional Assay Data With Application to BRCA1. Cancer Epidemiol Biomarkers Prev. 2011 Mar 29. doi: 10.1158/1055-9965.EPI-10-1214. Epub ahead of print.
46. McKusick VA. Mendelian Inheritance in Man and its online version, OMIM. Am J Hum Genet. 2007 Apr;80(4):588–604. doi: 10.1086/514346. Epub 2007 Mar 8.
47. Bertram L. Alzheimer's disease genetics current status and future perspectives. Int Rev Neurobiol. 2009;84:167–84. doi: 10.1016/S0074-7742(09)00409-7. Review.
48. Ioannidis JP, Kavvoura FK. Concordance of functional in vitro data and epidemiological associations in complex disease genetics. Genet Med. 2006 Sep;8(9):583–93. doi: 10.1097/01.gim.0000237775.93658.0c. Review.
49. Cirulli ET, Goldstein DB. In vitro assays fail to predict in vivo effects of regulatory polymorphisms. Hum Mol Genet. 2007 Aug 15;16(16):1931–9. doi: 10.1093/hmg/ddm140. Epub 2007 Jun 12.
50. Xuan W, Wang P, Watson SJ, Meng F. Medline search engine for finding genetic markers with biological significance. Bioinformatics. 2007 Sep 15;23(18):2477–84. doi: 10.1093/bioinformatics/btm375. Epub 2007 Sep 6.
51. Song YC, Kawas E, Good BM, Wilkinson MD, Tebbutt SJ. DataBiNS: a BioMoby-based data-mining workflow for biological pathways and non-synonymous SNPs. Bioinformatics. 2007 Mar 15;23(6):780–2. doi: 10.1093/bioinformatics/btl648. Epub 2007 Jan 18.
52. Massanet-Vila R, Caminal P, Perera A. Graph theory-based measures as predictors of gene morbidity. Conf Proc IEEE Eng Med Biol Soc. 2010;2010:803–6. doi: 10.1109/IEMBS.2010.5626521.
53. Hunter DJ. Gene-environment interactions in human diseases. Nat Rev Genet. 2005 Apr;6(4):287–98. doi: 10.1038/nrg1578. Review.
54. Jo Y, Koh IS, Bae H, Hong MC, Shin MK, Kim YS. TOXPO: TOXicogenomics knowledgebase for inferring toxicity based on Polymorphism. Bio Chip J. 2010;4(2):99–104.
  • 55.Thorn CF, Klein TE, Altman RB. PharmGKB: the pharmacogenetics and pharmacogenomics knowledge base. Methods Mol Biol. 2005;311:179–91. doi: 10.1385/1-59259-957-5:179. Review. [DOI] [PubMed] [Google Scholar]
  • 56.Gamazon ER, Duan S, Zhang W, Huang RS, Kistner EO, Dolan ME, Cox NJ. PACdb: a database for cell-based pharmacogenomics. Pharmacogenet Genomics. 2010 Apr;20(4):269–73. doi: 10.1097/FPC.0b013e328337b8d6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M. Mapping complex disease traits with global gene expression. Nat Rev Genet. 2009 Mar;10(3):184–94. doi: 10.1038/nrg2537. Review. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Gilad Y, Rifkin SA, Pritchard JK. Revealing the architecture of gene regulation: the promise of eQTL studies. Trends Genet. 2008 Aug;24(8):408–15. doi: 10.1016/j.tig.2008.06.001. Epub 2008 Jul 1. Review. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Wang J, Williams RW, Manly KF. WebQTL: Web-based complex trait analysis. Neuroinformatics. 2003;1:299–308. doi: 10.1385/NI:1:4:299. [DOI] [PubMed] [Google Scholar]
  • 60.Melzer D, Perry JR, Hernandez D, Corsi AM, Stevens K, Rafferty I, Lauretani F, Murray A, Gibbs JR, Paolisso G, Rafiq S, Simon-Sanchez J, Lango H, Scholz S, Weedon MN, Arepalli S, Rice N, Washecka N, Hurst A, Britton A, Henley W, van de Leemput J, Li R, Newman AB, Tranah G, Harris T, Panicker V, Dayan C, Bennett A, McCarthy MI, Ruokonen A, Jarvelin MR, Guralnik J, Bandinelli S, Frayling TM, Singleton A, Ferrucci L. A genome-wide association study identifies protein quantitative trait loci (pQTLs) PLoS Genet. 2008 May 9;4(5):e1000072. doi: 10.1371/journal.pgen.1000072. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Tycko B. Mapping allele-specific DNA methylation: a new tool for maximizing information from GWAS. Am J Hum Genet. 2010 Feb 12;86(2):109–12. doi: 10.1016/j.ajhg.2010.01.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Bell JT, Pai AA, Pickrell JK, Gaffney DJ, Pique-Regi R, Degner JF, Gilad Y, Pritchard JK. DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol. 2011 Jan 20;12(1):R10. doi: 10.1186/gb-2011-12-1-r10. Epub ahead of print. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, Cox NJ. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 2010 Apr 1;6(4):e1000888. doi: 10.1371/journal.pgen.1000888. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Amberger J, Bocchini C, Hamosh A. A new face and new challenges for Online Mendelian Inheritance in Man (OMIM) Hum Mutat. 2011 May;32(5):564–7. doi: 10.1002/humu.21466. Epub 2011 Apr 5. [DOI] [PubMed] [Google Scholar]
  • 65.Johnson AD, O’Donnell CJ. An open access database of genome-wide association results. BMC Med Genet. 2009 Jan 22;10:6. doi: 10.1186/1471-2350-10-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TF, McCarroll SA, Visscher PM. Finding the missing heritability of complex diseases. Nature. 2009 Oct 8;461(7265):747–53. doi: 10.1038/nature08494. Review. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Yang Q, Flanders WD, Moonesinghe R, Ioannidis JP, Guessous I, Khoury MJ. Using lifetime risk estimates in personal genomic profiles: estimation of uncertainty. Am J Hum Genet. 2009 Dec;85(6):786–800. doi: 10.1016/j.ajhg.2009.10.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Bloss CS, Schork NJ, Topol EJ. Effect of direct-to-consumer genomewide profiling to assess disease risk. N Engl J Med. 2011 Feb 10;364(6):524–34. doi: 10.1056/NEJMoa1011893. Epub 2011 Jan 12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Wray NR, Yang J, Goddard ME, Visscher PM. The genetic interpretation of area under the ROC curve in genomic profiling. PLoS Genet. 2010 Feb 26;6(2):e1000864. doi: 10.1371/journal.pgen.1000864. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.So HC, Sham PC. A unifying framework for evaluating the predictive power of genetic variants based on the level of heritability explained. PLoS Genet. 2010 Dec 2;6(12):e1001230. doi: 10.1371/journal.pgen.1001230. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Barabási AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011 Jan;12(1):56–68. doi: 10.1038/nrg2918. Review. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Wang K, Li M, Hakonarson H. Analysing biological pathways in genome-wide association studies. Nat Rev Genet. 2010 Dec;11(12):843–54. doi: 10.1038/nrg2884. Review.
  • 73. Torkamani A, Topol EJ, Schork NJ. Pathway analysis of seven common diseases assessed by genome-wide association. Genomics. 2008 Nov;92(5):265–72. doi: 10.1016/j.ygeno.2008.07.011. Epub 2008 Sep 16.
  • 74. Jamshidi N, Vo TD, Palsson BO. In silico analysis of SNPs and other high-throughput data. Methods Mol Biol. 2007;366:267–85. doi: 10.1007/978-1-59745-030-0_15.
  • 75. Jamshidi N, Palsson BØ. Systems biology of SNPs. Mol Syst Biol. 2006;2:38. doi: 10.1038/msb4100077. Epub 2006 Jul 4. Review.
  • 76. Jamshidi N, Palsson BO. Using in silico models to simulate dual perturbation experiments: procedure development and interpretation of outcomes. BMC Syst Biol. 2009 Apr 30;3:44. doi: 10.1186/1752-0509-3-44.
  • 77. Köhler S, Bauer S, Horn D, Robinson PN. Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008 Apr;82(4):949–58. doi: 10.1016/j.ajhg.2008.02.013. Epub 2008 Mar 27.
  • 78. Yeger-Lotem E, Riva L, Su LJ, Gitler AD, Cashikar AG, King OD, Auluck PK, Geddie ML, Valastyan JS, Karger DR, Lindquist S, Fraenkel E. Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity. Nat Genet. 2009 Mar;41(3):316–23. doi: 10.1038/ng.337. Epub 2009 Feb 22.
  • 79. Bansal V, Tewhey R, Leproust EM, Schork NJ. Efficient and cost effective population resequencing by pooling and in-solution hybridization. PLoS One. 2011 Mar 30;6(3):e18353. doi: 10.1371/journal.pone.0018353.
  • 80. Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, Hao L, Kiang A, Paschall J, Phan L, Popova N, Pretel S, Ziyabari L, Lee M, Shao Y, Wang ZY, Sirotkin K, Ward M, Kholodov M, Zbicz K, Beck J, Kimelman M, Shevelev S, Preuss D, Yaschenko E, Graeff A, Ostell J, Sherry ST. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet. 2007 Oct;39(10):1181–6. doi: 10.1038/ng1007-1181.
  • 81. McCarty CA, Chisholm RL, Chute CG, Kullo IJ, Jarvik GP, Larson EB, Li R, Masys DR, Ritchie MD, Roden DM, Struewing JP, Wolf WA; eMERGE Team. The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med Genomics. 2011 Jan 26;4:13. doi: 10.1186/1755-8794-4-13.
  • 82. Halder I, Shriver M, Thomas M, Fernandez JR, Frudakis T. A panel of ancestry informative markers for estimating individual biogeographical ancestry and admixture from four continents: utility and applications. Hum Mutat. 2008 May;29(5):648–58. doi: 10.1002/humu.20695.
  • 83. Pasaniuc B, Sankararaman S, Kimmel G, Halperin E. Inference of locus-specific ancestry in closely related populations. Bioinformatics. 2009 Jun 15;25(12):i213–21. doi: 10.1093/bioinformatics/btp197.
  • 84. Sankararaman S, Sridhar S, Kimmel G, Halperin E. Estimating local ancestry in admixed populations. Am J Hum Genet. 2008 Feb;82(2):290–303. doi: 10.1016/j.ajhg.2007.09.022.
  • 85. Grossman SR, Shylakhter I, Karlsson EK, Byrne EH, Morales S, Frieden G, Hostetter E, Angelino E, Garber M, Zuk O, Lander ES, Schaffner SF, Sabeti PC. A composite of multiple signals distinguishes causal variants in regions of positive selection. Science. 2010 Feb 12;327(5967):883–6. doi: 10.1126/science.1183863. Epub 2010 Jan 7.
  • 86. Lohmueller KE, Bustamante CD, Clark AG. Detecting directional selection in the presence of recent admixture in African-Americans. Genetics. 2011 Mar;187(3):823–35. doi: 10.1534/genetics.110.122739. Epub 2010 Dec 31.
  • 87. Ding L, Wendl MC, Koboldt DC, Mardis ER. Analysis of next-generation genomic data in cancer: accomplishments and challenges. Hum Mol Genet. 2010 Oct 15;19(R2):R188–96. doi: 10.1093/hmg/ddq391. Epub 2010 Sep 15. Review.
  • 88. Ding L, Ellis MJ, Li S, Larson DE, Chen K, Wallis JW, Harris CC, McLellan MD, Fulton RS, Fulton LL, Abbott RM, Hoog J, Dooling DJ, Koboldt DC, Schmidt H, Kalicki J, Zhang Q, Chen L, Lin L, Wendl MC, McMichael JF, Magrini VJ, Cook L, McGrath SD, Vickery TL, Appelbaum E, Deschryver K, Davies S, Guintoli T, Lin L, Crowder R, Tao Y, Snider JE, Smith SM, Dukes AF, Sanderson GE, Pohl CS, Delehaunty KD, Fronick CC, Pape KA, Reed JS, Robinson JS, Hodges JS, Schierding W, Dees ND, Shen D, Locke DP, Wiechert ME, Eldred JM, Peck JB, Oberkfell BJ, Lolofie JT, Du F, Hawkins AE, O’Laughlin MD, Bernard KE, Cunningham M, Elliott G, Mason MD, Thompson DM, Jr, Ivanovich JL, Goodfellow PJ, Perou CM, Weinstock GM, Aft R, Watson M, Ley TJ, Wilson RK, Mardis ER. Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature. 2010 Apr 15;464(7291):999–1005. doi: 10.1038/nature08989.
  • 89. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, Teague JW, Campbell PJ, Stratton MR, Futreal PA. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011 Jan;39(Database issue):D945–50. doi: 10.1093/nar/gkq929. Epub 2010 Oct 15.
  • 90. Link DC, Schuettpelz LG, Shen D, Wang J, Walter MJ, Kulkarni S, Payton JE, Ivanovich J, Goodfellow PJ, Le Beau M, Koboldt DC, Dooling DJ, Fulton RS, Bender RH, Fulton LL, Delehaunty KD, Fronick CC, Appelbaum EL, Schmidt H, Abbott R, O’Laughlin M, Chen K, McLellan MD, Varghese N, Nagarajan R, Heath S, Graubert TA, Ding L, Ley TJ, Zambetti GP, Wilson RK, Mardis ER. Identification of a novel TP53 cancer susceptibility mutation through whole-genome sequencing of a patient with therapy-related AML. JAMA. 2011 Apr 20;305(15):1568–76. doi: 10.1001/jama.2011.473.
  • 91. Barrett IP. Cancer genome analysis informatics. Methods Mol Biol. 2010;628:75–102. doi: 10.1007/978-1-60327-367-1_5. Review.
  • 92. Venter JC. A Life Decoded: My Genome: My Life. New York: Viking; 2007.
  • 93. Ashley EA, Butte AJ, Wheeler MT, Chen R, Klein TE, Dewey FE, Dudley JT, Ormond KE, Pavlovic A, Morgan AA, Pushkarev D, Neff NF, Hudgins L, Gong L, Hodges LM, Berlin DS, Thorn CF, Sangkuhl K, Hebert JM, Woon M, Sagreiya H, Whaley R, Knowles JW, Chou MF, Thakuria JV, Rosenbaum AM, Zaranek AW, Church GM, Greely HT, Quake SR, Altman RB. Clinical assessment incorporating a personal genome. Lancet. 2010 May 1;375(9725):1525–35. doi: 10.1016/S0140-6736(10)60452-7.
  • 94. Lupski JR, Reid JG, Gonzaga-Jauregui C, Rio Deiros D, Chen DC, Nazareth L, Bainbridge M, Dinh H, Jing C, Wheeler DA, McGuire AL, Zhang F, Stankiewicz P, Halperin JJ, Yang C, Gehman C, Guo D, Irikat RK, Tom W, Fantin NJ, Muzny DM, Gibbs RA. Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N Engl J Med. 2010 Apr 1;362(13):1181–91. doi: 10.1056/NEJMoa0908094. Epub 2010 Mar 10.
  • 95. Roach JC, Glusman G, Smit AF, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, Shendure J, Drmanac R, Jorde LB, Hood L, Galas DJ. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science. 2010 Apr 30;328(5978):636–9. doi: 10.1126/science.1186802. Epub 2010 Mar 10.
  • 96. Worthey EA, Mayer AN, Syverson GD, Helbling D, Bonacci BB, Decker B, Serpe JM, Dasu T, Tschannen MR, Veith RL, Basehore MJ, Broeckel U, Tomita-Mitchell A, Arca MJ, Casper JT, Margolis DA, Bick DP, Hessner MJ, Routes JM, Verbsky JW, Jacob HJ, Dimmock DP. Making a definitive diagnosis: successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genet Med. 2011 Mar;13(3):255–62. doi: 10.1097/GIM.0b013e3182088158.
  • 97. Jones SJ, Laskin J, Li YY, Griffith OL, An J, Bilenky M, Butterfield YS, Cezard T, Chuah E, Corbett R, Fejes AP, Griffith M, Yee J, Martin M, Mayo M, Melnyk N, Morin RD, Pugh TJ, Severson T, Shah SP, Sutcliffe M, Tam A, Terry J, Thiessen N, Thomson T, Varhol R, Zeng T, Zhao Y, Moore RA, Huntsman DG, Birol I, Hirst M, Holt RA, Marra MA. Evolution of an adenocarcinoma in response to selection by targeted kinase inhibitors. Genome Biol. 2010;11(8):R82. doi: 10.1186/gb-2010-11-8-r82. Epub 2010 Aug 9.
  • 98. Welch JS, Westervelt P, Ding L, Larson DE, Klco JM, Kulkarni S, Wallis J, Chen K, Payton JE, Fulton RS, Veizer J, Schmidt H, Vickery TL, Heath S, Watson MA, Tomasson MH, Link DC, Graubert TA, DiPersio JF, Mardis ER, Ley TJ, Wilson RK. Use of whole-genome sequencing to diagnose a cryptic fusion oncogene. JAMA. 2011 Apr 20;305(15):1577–84. doi: 10.1001/jama.2011.497.
  • 99. Mardis ER. The $1,000 genome, the $100,000 analysis? Genome Med. 2010 Nov 26;2(11):84. doi: 10.1186/gm205.
  • 100. Lyssenko V, Almgren P, Anevski D, Orho-Melander M, Sjögren M, Saloranta C, Tuomi T, Groop L; Botnia Study Group. Genetic prediction of future type 2 diabetes. PLoS Med. 2005 Nov 1;2(12):e345. doi: 10.1371/journal.pmed.0020345.
  • 101. Weedon MN, McCarthy MI, Hitman G, Walker M, Groves CJ, Zeggini E, Rayner NW, Shields B, Owen KR, Hattersley AT, Frayling TM. Combining information from common type 2 diabetes risk polymorphisms improves disease prediction. PLoS Med. 2006 Oct;3(10):e374. doi: 10.1371/journal.pmed.0030374.
  • 102. Lango H, Palmer CN, Morris AD, Zeggini E, Hattersley AT, McCarthy MI, Frayling TM, Weedon MN; UK Type 2 Diabetes Genetics Consortium. Assessing the combined impact of 18 common genetic variants of modest effect sizes on type 2 diabetes risk. Diabetes. 2008 Nov;57(11):3129–35. doi: 10.2337/db08-0504.
  • 103. Lu Q, Elston RC. Using the optimal receiver operating characteristic curve to design a predictive genetic test, exemplified with type 2 diabetes. Am J Hum Genet. 2008 Mar;82(3):641–51. doi: 10.1016/j.ajhg.2007.12.025.
  • 104. Lyssenko V, Jonsson A, Almgren P, Pulizzi N, Isomaa B, Tuomi T, Berglund G, Altshuler D, Nilsson P, Groop L. Clinical risk factors, DNA variants, and the development of type 2 diabetes. N Engl J Med. 2008 Nov 20;359(21):2220–32. doi: 10.1056/NEJMoa0801869.
  • 105. Meigs JB, Shrader P, Sullivan LM, McAteer JB, Fox CS, Dupuis J, Manning AK, Florez JC, Wilson PW, D’Agostino RB, Sr, Cupples LA. Genotype score in addition to common risk factors for prediction of type 2 diabetes. N Engl J Med. 2008 Nov 20;359(21):2208–19. doi: 10.1056/NEJMoa0804742.
  • 106. Cauchi S, Meyre D, Durand E, Proença C, Marre M, et al. Post Genome-Wide Association Studies of Novel Genes Associated with Type 2 Diabetes Show Gene-Gene Interaction and High Predictive Value. PLoS One. 2008;3(5):e2031. doi: 10.1371/journal.pone.0002031.
  • 107. van Hoek M, Dehghan A, Witteman JC, van Duijn CM, Uitterlinden AG, Oostra BA, Hofman A, Sijbrands EJ, Janssens AC. Predicting type 2 diabetes based on polymorphisms from genome-wide association studies: a population-based study. Diabetes. 2008 Nov;57(11):3122–8. doi: 10.2337/db08-0425.
  • 108. Lu Q, Song Y, Wang X, Won S, Cui Y, Elston RC. The effect of multiple genetic variants in predicting the risk of type 2 diabetes. BMC Proc. 2009 Dec 15;3(Suppl 7):S49. doi: 10.1186/1753-6561-3-s7-s49.
  • 109. Miyake K, Yang W, Hara K, Yasuda K, Horikawa Y, Osawa H, Furuta H, et al. Construction of a prediction model for type 2 diabetes mellitus in the Japanese population based on 11 genes with strong evidence of the association. J Hum Genet. 2009 Apr;54(4):236–41. doi: 10.1038/jhg.2009.17. Epub 2009 Feb 27.
  • 110. Talmud PJ, Hingorani AD, Cooper JA, Marmot MG, Brunner EJ, Kumari M, Kivimäki M, Humphries SE. Utility of genetic and non-genetic risk factors in prediction of type 2 diabetes: Whitehall II prospective cohort study. BMJ. 2010 Jan 14;340:b4838. doi: 10.1136/bmj.b4838.

RESOURCES