Author manuscript; available in PMC: 2015 Dec 1.
Published in final edited form as: Nat Genet. 2015 Apr 27;47(6):640–642. doi: 10.1038/ng.3270

Analysis of loss-of-function variants and 20 risk factor phenotypes in 8,554 individuals identifies loci influencing chronic disease

Alexander H Li, Alanna C Morrison, Christie Kovar, L Adrienne Cupples, Jennifer A Brody, Linda M Polfus, Bing Yu, Ginger Metcalf, Donna Muzny, Narayanan Veeraraghavan, Xiaoming Liu, Thomas Lumley, Thomas H Mosley, Richard A Gibbs, Eric Boerwinkle
PMCID: PMC4470468  NIHMSID: NIHMS697101  PMID: 25915599

Abstract

A typical human exome harbors dozens of loss-of-function (LOF) variants1, which can lower disease risk factor levels and affect drug efficacy2. We hypothesized that LOF variants are enriched in genes influencing risk factor levels and the onset of common chronic diseases, such as cardiovascular disease and diabetes. To test this hypothesis, we sequenced the exomes of 8,554 individuals and analyzed the effects of predicted LOF variants on 20 chronic disease risk factor phenotypes. Analysis of this sample as discovery and replication strata of equal size verified two relationships in well-studied genes (PCSK9 and APOC3) and identified eight new loci. Previously unknown relationships included elevated fasting glucose in carriers of heterozygous LOF variation in TXNDC5, which encodes a biomarker for type 1 diabetes progression, and apparent recessive effects of C1QTNF8 on serum magnesium levels. These data demonstrate the utility of functional-variant annotation within a large sample of deeply phenotyped individuals for gene discovery.


Investigations of genotype-phenotype associations leading to the discovery of new genes or gene functions have traditionally been facilitated by focusing on the most severe cases or those involving the earliest ages of onset3. An alternative approach would be to identify variants with the most severe functional effects in a sample of deeply phenotyped individuals and then investigate the roles of these variants in health and disease. To test this approach, we sequenced the exomes of 8,554 individuals who had been assessed for many phenotypes related to common chronic diseases, such as diabetes and cardiovascular disease. We annotated predicted LOF variants in these individuals and investigated their effects on 20 chronic disease risk factor phenotypes. Gene-based analyses identified and replicated ten genetic loci associated with these measured traits. These results demonstrate the importance of detailed biological annotation in large-scale sequencing studies and the utility of deep phenotyping in cohort studies for further elucidation of the genetic architecture of human health and disease.

Whole-exome sequencing was done for 2,836 African American (AA) and 5,718 European American (EA) individuals from the Atherosclerosis Risk in Communities (ARIC) study (Supplementary Table 1). Ninety percent of target sites were covered at 20× depth or greater (mean depth, 110.1 per sample), revealing 1,911,892 total single-nucleotide variants (SNVs) with an average transition/transversion ratio (Ti/Tv) of 3.3 per sample and 38,219 small insertions and deletions (indels). Indel sizes ranged from −51 base pairs (bp) to +27 bp, with a mode of −1 bp. We defined LOF variations as sequence changes predicted to abolish protein formation by all isoforms in the RefSeq database for a given gene and identified a total of 36,561 candidate LOF sites (13,783 frameshift indel, 8,772 splice, 14,006 premature stop; Table 1) in 11,260 protein-coding genes. Not surprisingly1, LOF variants were enriched in the very rare range of the site-frequency spectrum (minor allele frequency (MAF) < 0.1%) as compared to other functional categories (Supplementary Fig. 1).

Table 1. Number of LOF sites per study sample and per individual

             LOF sites                    Average per individual
             AA       EA       Combined   AA           EA
Stop         5,837    9,312    14,006     27.3 (2.1)   21.1 (2.2)
Splice       3,789    5,731    8,772      16.7 (1.9)   9.6 (1.8)
Frameshift   6,575    8,264    13,783     36.1 (4.4)   22.6 (3.1)
Total LOF    16,201   23,307   36,561     80.1 (8.4)   53.3 (7.1)

Shown are the total number of LOF sites observed and the average number of heterozygous (homozygous) LOF sites per individual.

We next characterized the prevalence of LOF variation by gene. Because mutations may arise more frequently in larger genes and codon usage influences the chance of premature stops, we exhaustively simulated every single-nucleotide substitution in each gene transcript to determine the maximum number of potential LOF substitution sites in each gene, which we then compared to the observed number of LOF sites in our sample (observed number/potential number = OP ratio)4,5. Almost half the genes in our capture regions presented no LOF alleles (n = 7,115, OP ratio = 0). The OP ratios of the remaining genes formed a distribution with a peak near 0.003 with a skewed right tail (Fig. 1a), underscoring the role of purifying selection against these sites. Genes known to influence human phenotypes in a dominant manner6 had smaller average OP ratios (Fig. 1b), whereas known recessive disease genes1 had larger OP ratios (Fig. 1c). The relationship between the OP ratio and the effects of LOF variants on the 20 risk factor phenotypes analyzed here is complex. Clearly, genes lacking LOF variants (i.e., OP ratio = 0) did not contribute to the analysis. Conversely, genes that tolerate a large number of LOF variants and had a high OP ratio (e.g., OP ratio > 0.1) did not contribute significantly to phenotypic variation. Genes contributing to the genetic architecture of health and disease in a population are likely to be important, by virtue of having an above-average OP ratio, but not so critical that LOF variants will lead to debilitating disease or be inconsistent with life. To this point, we observed that homologs of essential mouse genes7 (lethal phenotypes) had smaller average OP ratios than did non-essential phenotype-changing genes (P < 10−6, Wilcoxon), and these non-essential genes had smaller OP ratios compared to those for all other genes (P < 10−6, Wilcoxon; Fig. 1d). 
Genes with smaller OP ratios also tended to be stably expressed in more tissues and to interact with more proteins (Supplementary Figs. 2 and 3).
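
A minimal sketch may help make the denominator of the OP ratio concrete. The Python below enumerates every single-nucleotide substitution in a coding sequence and counts the positions at which at least one substitution creates a premature stop codon. It is an illustration under simplifying assumptions, not the authors' implementation: it covers stop-gain changes only (splice-site disruption is ignored), and the function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"


def potential_stop_gain_sites(cds):
    """Count positions where some single-base substitution creates a stop codon.

    `cds` is an in-frame coding sequence. Codons that are already stop codons
    (such as the natural terminal stop) are skipped.
    """
    cds = cds.upper()
    sites = 0
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        if codon in STOP_CODONS:
            continue
        for offset in range(3):
            for base in BASES:
                if base == codon[offset]:
                    continue
                mutant = codon[:offset] + base + codon[offset + 1:]
                if mutant in STOP_CODONS:
                    sites += 1
                    break  # count each nucleotide position at most once
    return sites


def op_ratio(observed_sites, potential_sites):
    """Observed number of LOF sites divided by the potential number."""
    if potential_sites == 0:
        return 0.0
    return observed_sites / potential_sites


toy_cds = "ATGTGGCAATAA"  # codons: ATG TGG CAA TAA
potential = potential_stop_gain_sites(toy_cds)
print(potential, op_ratio(1, potential))
```

In the toy sequence, three positions can mutate to a stop codon (two in TGG and one in CAA), so a single observed LOF site would give an OP ratio of 1/3.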

Figure 1.

OP ratio trends across gene groups. (a) Histogram of OP ratios for all genes. (b) Lower OP ratios were found for genes causing dominant disorders (n = 248) than for other genes (n = 16,435). (c) Higher OP ratios were noted for genes causing recessive disorders (n = 652) than for other genes (n = 16,031). (d) The lowest OP ratios noted were those of human paralogs of essential mouse genes (Essential, embryonic lethal phenotype (n = 2,356); Non-essential, non-lethal phenotype (n = 3,520); Other, no phenotype reported (n = 10,807)). Panels b–d show box plots of median values, with hinges at the 25th and 75th percentiles and whiskers extending to 1.5× the interquartile range.

To detect associations between LOF variation and common chronic disease phenotypes, we divided our sample into two nonoverlapping discovery and replication groups, each containing 4,277 individuals (Supplementary Tables 1 and 2). As LOF annotation enriches for variations with a similar predicted functional effect (namely, the reduction or abolishment of protein formation), we grouped LOF variants by gene and carried out burden testing for sites with MAF < 5% (T5 test)8. A summary of the most significant replicated results is shown in Table 2. The results for all gene-based tests can be found in Supplementary Table 3. As expected, LOF variants in PCSK9 were associated with lower total cholesterol levels9, and LOF variation in APOC3 was associated with lower triglyceride levels10. We noted eight previously unrecognized relationships with compelling statistical evidence that were replicated in the two sample groups (Table 2). Except in PCSK9 and APOC3, the effects were in the direction thought to be associated with increasing risk of disease.
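
The logic of a gene-based burden test can be sketched in a few lines of Python. The example below collapses all sites in one gene with MAF below 5% into a per-individual burden score, regresses the phenotype on that score by ordinary least squares, and reports the ratio of beta to its standard error. It is a simplified illustration, not the analysis used in this study: covariates, ancestry-specific analysis and meta-analysis are omitted, and the function name and toy data are hypothetical.

```python
import math


def burden_test(phenotypes, genotypes, mafs, maf_cutoff=0.05):
    """Regress a phenotype on the per-individual count of rare alleles.

    phenotypes: one value per individual.
    genotypes:  one row per individual, one allele count (0, 1, 2) per site.
    mafs:       minor allele frequency of each site.
    Returns (beta, standard error, beta / standard error).
    """
    keep = [j for j, maf in enumerate(mafs) if maf < maf_cutoff]
    burden = [sum(row[j] for j in keep) for row in genotypes]
    n = len(phenotypes)
    mean_x = sum(burden) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in burden)
    if sxx == 0:
        raise ValueError("no variation in burden score")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(burden, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(burden, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se


# Toy data: site 0 is rare (kept), site 1 is common (dropped by the MAF cutoff).
toy_mafs = [0.01, 0.30]
toy_genotypes = [[0, 1], [0, 0], [0, 2], [0, 1], [0, 0], [0, 1], [1, 0], [1, 1]]
toy_phenotypes = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 2.0, 2.2]
beta, se, std_beta = burden_test(toy_phenotypes, toy_genotypes, toy_mafs)
print(beta, se, std_beta)
```

With a single rare site, beta is simply the difference in mean phenotype between carriers and noncarriers, which is the unidirectional mean shift that a burden test is designed to detect.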

Table 2. Top replicated gene-based phenotype associations

                                       Number of  Group     T5 P value
Genotype  Measure            Gene      LOF sites  AA   EA   Disc.         Rep.          Total          Std. beta
Het       Creatinine         LHCGR     2 (1,1)    0    3    6.71 × 10−6   0.01          2.71 × 10−6    4.69
Het       Creatinine         PLEKHG1   3 (1,2)    1    2    9.06 × 10−6   3.0 × 10−3    8.70 × 10−8    5.35
Het       Fasting glucose    GLIPR1    3 (2,1)    1    2    6.14 × 10−4   2.48 × 10−6   9.38 × 10−9    5.74
Het       Fasting glucose    TXNDC5    7 (4,3)    6    3    6.82 × 10−4   5.75 × 10−5   5.62 × 10−7    5.00
Het       FEV1/FVC           SEPT10    5 (1,4)    0    5    6.26 × 10−6   1.21 × 10−4   3.07 × 10−6    −4.67
Het       Lactate            WDR62     3 (1,2)    3    0    8.0 × 10−3    5.52 × 10−6   1.91 × 10−6    4.76
Het       Total cholesterol  PCSK9     6 (3,3)    24   2    8.27 × 10−5   4.44 × 10−4   5.25 × 10−8    −5.44
Het       Triglycerides      APOC3     4 (3,1)    13   24   1.25 × 10−9   1.38 × 10−8   7.98 × 10−17   −8.33
Het       Triglycerides      TIGIT     2 (1,1)    2    1    2.74 × 10−4   3.88 × 10−3   4.11 × 10−6    4.61
Hom       Magnesium          C1QTNF8   1 (1,0)    0    4    0.02          1.31 × 10−5   5.20 × 10−5    4.08

Shown are ten significant replicated associations that were driven by ≥3 individuals. “Genotype” denotes the heterozygous (Het) or homozygous (Hom) state of LOF individuals. “Number of LOF sites” (SNV, indel) describes the number of variants included for the T5 analyses. To standardize T5 betas (Std. beta) we calculated the ratio of beta to standard error. AA, African American; EA, European American; Disc., discovery strata; Rep., replication strata; Total, pooled discovery and replication; FEV1, forced expiratory volume in 1 s.

As an example, nine individuals with LOF variation in Thioredoxin domain containing 5 (TXNDC5) had elevated fasting blood-glucose levels (Fig. 2a), and this gene has recently been suggested as a candidate determinant in type 1 diabetes risk11. In follow-up analyses, we observed a weak association between TXNDC5 variation and fasting insulin levels within the ARIC study cohort (P = 0.047; Supplementary Table 3). In addition, five EA study participants had an LOF mutation in SEPT10, and these individuals had significantly reduced lung function (on the basis of the ratio of forced expiratory volume to forced vital capacity; P = 3.07 × 10−6). SEPT10 is associated with a known linkage peak for nicotine dependence12, and three of the five individuals carrying this LOF variation were self-reported former smokers.

Figure 2.

Distribution of phenotypes in carriers of LOF variations. (a) Elevated fasting glucose was observed in individuals with heterozygous TXNDC5 LOF variation (LOF-Het; n = 9) compared to individuals with no LOF variation in this gene (Non-LOF; n = 8,545). (b) Elevated serum magnesium was noted in C1QTNF8-homozygous individuals (LOF-Hom; n = 4) compared to LOF heterozygotes (n = 62) and non-LOF individuals (n = 8,488). Both panels show box plots of median values, with hinges at the 25th and 75th percentiles and whiskers extending to ±1.5× the interquartile range. Each circle represents one ARIC participant.

Considering that LOF alleles may primarily influence phenotype in the homozygous state13, we separately analyzed 1,156 homozygous LOF sites representing 921 genes. Similar gene-based T5 tests were carried out to compare the phenotype levels in LOF homozygotes to those in other individuals within the sample group. One homozygous association was replicated; the full set of results is provided in Supplementary Table 4. Four individuals were homozygous for LOF mutations in C1q and tumor necrosis factor related protein 8 (C1QTNF8), and these individuals had elevated serum magnesium levels (Fig. 2b). The diverse family of C1q-related genes, which includes adiponectin (ADIPOQ), is linked to both metabolism and inflammatory processes14, although C1QTNF8 is not well characterized.

We identified ten LOF mutation-phenotype relationships that were both significant and replicated, but it is important to more broadly consider the concept of replication in the context of rare-variant studies. In this study, 101 genotype-phenotype relationships with compelling statistical evidence (P < 4.4 × 10−6) were exclusive to either the discovery or the replication group (Supplementary Tables 3 and 4); in these cases, LOF mutations were present in one or the other sample, but not in both. These ‘absent’ replications are thus not directly supported or discredited by our results, as they represent the chance of absence of the appropriate rare event (Supplementary Fig. 4).
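
A short calculation illustrates how often such chance absence can occur. The Python sketch below gives the probability that all carriers of a gene's LOF variants land in the same half when a sample is split at random into two equal groups. It assumes a single random split of the pooled sample (the study split within each ancestry group), and the function name is hypothetical.

```python
from math import comb


def prob_all_carriers_in_one_half(n_carriers, half_size):
    """Probability that every carrier falls in the same half of a random equal split.

    Hypergeometric argument: choose which individuals are carriers among
    2 * half_size people; all of them must come from one of the two halves.
    """
    if n_carriers < 1:
        raise ValueError("need at least one carrier")
    total = 2 * half_size
    return 2 * comb(half_size, n_carriers) / comb(total, n_carriers)


for k in (1, 2, 3, 5, 9):
    print(k, prob_all_carriers_in_one_half(k, 4277))
```

With three carriers and halves of 4,277 individuals, the probability is just under 0.25, so a gene with very few carriers has a substantial chance of being unobservable in one stratum.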

The identification of LOF variations influencing chronic disease risk factors represents a new and diverse paradigm in genomic medicine. LOF variation in certain genes, such as TXNDC5, may predispose individuals to disease. Further characterization of these risk loci will allow researchers and clinicians to better understand the pathways and mechanisms underlying disease risk and to develop prevention strategies for at-risk individuals as DNA sequencing moves inevitably toward common clinical practice. LOF variations can also have a protective, risk-lowering effect on their carriers. When coupled with knowledge about the lack of other adverse effects, such LOF mutations may translate into novel drug targets. For example, LOF variants in PCSK9 are associated with reduced levels of LDL cholesterol and incident coronary heart disease, fueling a burgeoning and successful effort to identify PCSK9 inhibitors15.

The discovery of new gene associations via exome sequencing has many challenges and represents a classic problem related to the signal-to-noise ratio. In this study we considered three ways to increase signal in whole-exome sequence analyses. First, by including biochemical measures of risk factor levels, we were able to optimize a gene’s effect relative to the corresponding disease endpoint. Second, the data presented here reinforce the need for diverse sample populations in sequence-based gene-discovery studies, as sentinel signals may be race specific. Third, through careful annotation of the sequence motifs and variation—in this case, by focusing on LOF variation—we increased the likelihood of detecting a functional effect. As we make the transition from whole-exome sequencing to whole-genome sequencing16, careful annotation of variants with functional effects will become even more important and challenging.

METHODS

Methods and any associated references are available in the online version of the paper.

ONLINE METHODS

Sample selection

Whole-exome sequence data were derived from 8,554 individuals (5,718 EA and 2,836 AA) sampled from the Atherosclerosis Risk in Communities (ARIC) cohort study. Each ancestry group was then randomly divided in half to create two nonoverlapping and identically sized groups of 1,418 AA and 2,859 EA individuals for discovery and replication. EA individuals were selected as part of a large-cohort random sample or had extreme values for at least one of the following phenotypes: age at menopause, electrocardiogram QT interval, fasting blood glucose, fibrinogen level, renal function, Stamler-Kannel–like extremes of risk factors selected by principal components and waist-to-hip ratio. ARIC AA samples were randomly selected from the ARIC cohort for whole-exome sequencing. A detailed description of the ARIC study is provided elsewhere17.
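
The ancestry-stratified split described above can be sketched as follows. This is a minimal illustration in Python rather than the procedure actually used; the function name, the seed and the synthetic sample identifiers are assumptions.

```python
import random


def split_by_ancestry(samples, seed=0):
    """Randomly halve each ancestry group into discovery and replication sets.

    samples: list of (sample_id, ancestry) pairs.
    Returns two lists of sample IDs: (discovery, replication).
    """
    rng = random.Random(seed)
    by_group = {}
    for sample_id, ancestry in samples:
        by_group.setdefault(ancestry, []).append(sample_id)
    discovery, replication = [], []
    for ancestry in sorted(by_group):
        ids = list(by_group[ancestry])
        rng.shuffle(ids)
        half = len(ids) // 2
        discovery.extend(ids[:half])
        replication.extend(ids[half:])
    return discovery, replication


# Synthetic IDs with the group sizes reported in the text.
samples = [(f"AA{i}", "AA") for i in range(2836)] + [(f"EA{i}", "EA") for i in range(5718)]
discovery, replication = split_by_ancestry(samples)
print(len(discovery), len(replication))
```

Splitting within each group keeps the ancestry composition identical in the two strata (1,418 AA and 2,859 EA individuals in each).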

Phenotype assessment

For these analyses, we selected heart, lung and/or blood phenotypes related to cardiovascular outcomes that were (1) specifically not included in the sampling design to reduce potential bias and (2) measured across the entire cohort to maximize sample size. The full set of phenotypes included in these analyses is listed in Supplementary Table 2. Serum magnesium was measured using the metallochromic dye Calmagite. Levels of phosphorus, calcium and creatinine (CRE) were measured by methods using ammonium molybdate, o-cresolphthalein complexone and modified kinetic Jaffe-picric acid, respectively. Serum potassium and sodium levels were measured with a direct electrochemical technique. The liver enzymes aspartate transaminase (AST), alanine aminotransferase (ALT) and γ-glutamyl transpeptidase (GGT) were measured using standard methods. Blood pressure was measured using a standardized Hawksley random-zero mercury column sphygmomanometer with participants in a sitting position after a resting period of 5 min. The size of the cuff was chosen according to the subject’s arm circumference. Three sequential recordings were obtained for systolic blood pressure (SBP) and diastolic blood pressure (DBP); the mean of the last two measurements was used in this analysis, and the first reading was discarded. Forced vital capacity (FVC) and the ratio of forced expiratory volume in 1 s (FEV1) to FVC were measured using a spirometer and the Pulmo-Screen II software. Triglycerides and total cholesterol (TCH) were measured using enzymatic methods in subjects who had fasted overnight. Fasting insulin (FI) was measured via radioimmunoassay. Fasting glucose was measured with the hexokinase method on individuals who had fasted for >8 h before samples were obtained. Uric acid was measured by the Uricase method. White blood cell counts were determined by an automated particle counter. Lactate (LAC) was measured using an enzymatic reaction that converts LAC to pyruvate.

Whole-exome sequencing

DNA sequencing was done using Illumina HiSeq instruments (San Diego, CA) after exome capture with VCRome 2.1 (NimbleGen, Inc., Madison, WI) using chemistry recommended by the manufacturer. Sequence alignment and variant calling were carried out via the Mercury pipeline on the DNAnexus platform18.

Variant calling and quality control

Reads were mapped to Genome Reference Consortium Human Build 37 using Burrows-Wheeler alignment19, and allele calling and variant-call file (VCF) construction were done with the Atlas2 suite20 (Atlas-SNP and Atlas-Indel). Using information recorded in the VCF, we flagged and removed low-quality SNVs, defined by the following criteria: a single-nucleotide polymorphism (SNP) posterior probability less than 0.95, total depth of coverage less than 6×, fewer than three variant reads, an allelic fraction of <0.1, 99% of reads in a single direction, and homozygous reference alleles with <6× coverage. Stricter filters were applied to identify low-quality single-nucleotide substitutions with total depths less than 10. Similar, stricter filters were applied to identify low-quality indels with the following differences: (1) minimum total depth < 60, (2) allelic fraction < 0.2 for heterozygous variants (< 0.8 for homozygous variants) and (3) <30 variant reads.
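
The SNV criteria listed above amount to a set of per-call threshold checks, which the following Python sketch illustrates. The record layout and field names are hypothetical (they are not Atlas2 or VCF field names), the strand criterion is interpreted as at least 99% of reads in one direction, and the stricter substitution and indel filters are not modeled.

```python
from dataclasses import dataclass


@dataclass
class SnvCall:
    posterior: float            # SNP posterior probability
    total_depth: int            # total depth of coverage at the site
    variant_reads: int          # number of reads supporting the variant
    allelic_fraction: float     # variant reads / total reads
    max_strand_fraction: float  # largest share of reads from a single direction


def snv_filter_flags(call):
    """Return the names of the low-quality criteria that a call meets."""
    flags = []
    if call.posterior < 0.95:
        flags.append("low_posterior")
    if call.total_depth < 6:
        flags.append("low_depth")
    if call.variant_reads < 3:
        flags.append("few_variant_reads")
    if call.allelic_fraction < 0.1:
        flags.append("low_allelic_fraction")
    if call.max_strand_fraction >= 0.99:  # assumed reading of "99% reads in a single direction"
        flags.append("single_strand")
    return flags


good = SnvCall(posterior=0.99, total_depth=40, variant_reads=18,
               allelic_fraction=0.45, max_strand_fraction=0.60)
poor = SnvCall(posterior=0.90, total_depth=5, variant_reads=2,
               allelic_fraction=0.05, max_strand_fraction=1.00)
print(snv_filter_flags(good), snv_filter_flags(poor))
```

Returning the list of failed criteria, rather than a single pass/fail value, mirrors the flag-then-remove workflow described in the text.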

Validation

We validated a subset of candidate LOF genotypes using independent platforms with an emphasis on indels. We used targeted sequencing methods (Sequenom and Sanger) and observed a validation rate of 97.4% for SNVs and 92.5% for LOF indel sites (Supplementary Table 5).

Emphasizing indels, we took advantage of two opportunities to validate LOF variants detected by the Illumina HiSeq instrument and the Mercury data-processing pipeline. First, 2,649 SNV LOF sites observed within our sample were also targeted on the Illumina exome chip21. In this overlapping set, 98% of genotypes were identical on the two platforms. Second, we selected 263 LOF genotypes (176 indels and 87 SNVs) for validation on independent platforms (Supplementary Table 5). These variants were a mixture of our top phenotype-association results (Table 2) and convenience samples of other sites, with oversampling of indels because of previous experience with their validation rates. These genotypes represented 147 unique LOF sites (126 indels and 21 SNVs). Validation genotypes were re-genotyped via both Sanger sequencing and a targeted Sequenom panel. Twenty-four genotypes failed both assays. Concordant genotypes were observed for 225 LOF genotypes (148 indels and 77 SNVs), and at least one platform was discordant for 14 genotypes. Of note, all of the 14 discordant genotypes were validated on one platform or the other, suggesting inconsistencies between the validation platforms. Thus, according to definitions that are common in the field, the observed validation rate for sites was 100%, and the observed validation rate for genotypes was 94.1% (225/(263 − 24)). More specifically, the observed rate for genotypes was 97.4% for SNVs and 92.5% for indel sites, and this might be a conservative underestimate of the true validation rate of our Illumina HiSeq data.
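
The bookkeeping behind the genotype validation rate can be checked with a few lines of Python. The counts are taken from the paragraph above; the variable names are illustrative.

```python
attempted = 263            # LOF genotypes selected for validation (176 indels + 87 SNVs)
failed_both_assays = 24    # genotypes that failed both Sanger and Sequenom
concordant = 225           # concordant genotypes (148 indels + 77 SNVs)
discordant = 14            # genotypes discordant on at least one platform

evaluable = attempted - failed_both_assays
# The evaluable genotypes should be exactly the concordant plus discordant ones.
consistent = (evaluable == concordant + discordant)
genotype_validation_rate = concordant / evaluable

print(evaluable, consistent, f"{genotype_validation_rate:.1%}")
```

The denominator excludes the 24 genotypes that failed both assays, which is why the rate is computed over 239 rather than 263 genotypes.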

Annotation

We defined LOF variation as sequence changes predicted to trigger nonsense-mediated decay of mRNA transcripts derived from all isoforms of a given gene. Thus, the basic annotation22 categories of variation analyzed were premature stop codons, essential splice-site-disrupting variations, and indels predicted to disrupt the downstream reading frame. We further enriched for variants likely to abolish protein formation by identifying and excluding (1) stop-gain mutations occurring in the terminal gene exon and (2) LOF candidates that did not map to chromosomal coordinates used by all gene isoforms for a given gene (low-confidence-partial). Finally, we excluded candidate LOF sites with an MAF > 0.5 and genes lacking introns or designated as non-protein-coding by RefSeq. The full list of LOF sites, along with their frequency in the ARIC study, is provided in Supplementary Table 6.

We used resampling methods to determine the relationship between sample size and LOF variants. From the N samples, we randomly chose n samples and counted both the number of LOF variants and the number of genes carrying LOF variants. We repeated the process 1,000 times and calculated the average numbers of LOF variants and genes carrying LOF variants for sample size n. Supplementary Figure 4 shows the average numbers of LOF variants and genes carrying LOF variants with increasing sample size.
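
The resampling procedure can be sketched as follows. This Python example is a minimal illustration, not the authors' code: the data structure (a mapping from sample ID to the set of LOF variants that sample carries), the function name and the toy data are assumptions.

```python
import random


def resample_lof_counts(carried, n, n_iter=1000, seed=0):
    """Average numbers of distinct LOF variants and LOF-carrying genes in subsamples.

    carried: dict mapping sample ID -> set of (gene, variant_id) pairs.
    n:       subsample size, drawn without replacement in each iteration.
    Returns (mean number of variants, mean number of genes).
    """
    rng = random.Random(seed)
    ids = sorted(carried)
    total_variants = 0
    total_genes = 0
    for _ in range(n_iter):
        variants = set()
        for sample_id in rng.sample(ids, n):
            variants |= carried[sample_id]
        total_variants += len(variants)
        total_genes += len({gene for gene, _ in variants})
    return total_variants / n_iter, total_genes / n_iter


toy = {
    "s1": {("G1", "v1")},
    "s2": {("G1", "v2")},
    "s3": {("G2", "v3")},
    "s4": set(),
}
for size in (1, 2, 3, 4):
    print(size, resample_lof_counts(toy, size))
```

Repeating the call over a grid of n values gives the curve of variants and genes against sample size.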

Genotype-phenotype association

A gene-based burden test (T5)8 was used to evaluate the association between aggregated rare LOF variants and phenotypes. We chose this test because of its interpretable detection of unidirectional phenotype mean shifts between LOF carriers and noncarriers. For the T5 analyses, we subjected the studied phenotype measures to various transformations: ALT, AST, CRE, FI and LAC values underwent natural log transformation; FEV1/FVC, GGT and Mg values underwent power transformation; and Ca values were corrected for serum albumin as total calcium (mmol/l) + ((40 − serum albumin (g/dl)) × 0.025). TCH values were adjusted (TCH/0.8) only among statin users, and measured SBP and DBP were respectively adjusted by +15 mm Hg and +10 mm Hg for individuals taking antihypertensive medication; no other traits required transformation. T5 tests were implemented using the seqMeta package for R, available from CRAN (http://cran.r-project.org/web/packages/seqMeta/), and only associations that were independently detected in both sample strata, that persisted with the inclusion of all samples and that were driven by ≥3 individuals are presented in Table 2. Allele frequencies were calculated separately for each ancestry group, and only variants with an observed MAF < 5% were included in ancestry-specific analyses. Based on a Bonferroni correction procedure for the number of genes in our sample presenting LOF variation (n = 11,260), a P value of 4.4 × 10−6 was considered statistically significant. Similarly, a P value of 5.42 × 10−5 was considered significant for associations driven by homozygous individuals, adjusting for the number of genes presenting homozygous LOF genotypes (n = 921). The full set of meta-analyses of ancestry-specific results of gene-based associations with P ≥ 0.05, including those with bidirectional variant effects and observed in <3 individuals, is provided in Supplementary Tables 3 and 4. Quantile-quantile plots of all T5 P values are provided in Supplementary Figures 5 and 6.
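
The two significance thresholds follow from dividing a family-wise error rate by the number of genes tested. The Python sketch below reproduces them under the assumption that the family-wise error rate is 0.05, which is not stated explicitly in the text but is consistent with the reported values.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_tests


ALPHA = 0.05  # assumed family-wise error rate

het_threshold = bonferroni_threshold(ALPHA, 11260)  # genes presenting LOF variation
hom_threshold = bonferroni_threshold(ALPHA, 921)    # genes presenting homozygous LOF genotypes

print(f"{het_threshold:.3g}")  # about 4.44e-06, reported as 4.4 × 10−6
print(f"{hom_threshold:.3g}")  # about 5.43e-05, reported as 5.42 × 10−5
```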

OP ratio

We developed the OP ratio as a gene-based metric to quantify LOF variation while accounting for transcript size, and as a useful tool for comparing the rate of LOF variation in different gene groups. This metric is the ratio of the number of observed LOF sites in a gene to the number of possible LOF sites that could arise as a result of single-nucleotide substitutions. To demonstrate this we compared this metric to other measures of gene variation. We used the eGenetics database23 to rank all genes by the number of tissues in which they are stably expressed, calling the top 5% of this list “universally expressed.” On average, we observed smaller OP ratios for stably expressed genes than for all others (Supplementary Fig. 2). Similarly, we sorted the genes according to the number of known protein interactions according to ConsensusPathDB24 and categorized the top 5% of these genes as highly interacting genes. This gene group also had smaller OP ratios on average compared to other genes (Supplementary Fig. 3).

We compared our OP ratio with the Residual Variation Intolerance Score (RVIS)25 for 15,053 genes with both an OP ratio and an RVIS available. The RVIS is based on the ratio of common nonsynonymous and splicing-site SNPs to the total numbers of coding SNPs according to the ESP6500 data set. Both the OP ratio and the RVIS are designed to measure a gene’s tolerance to damaging amino-acid changes, but they differ in the measurements used and the databases they are based on. Both the Pearson’s correlation coefficient (0.204) and the Spearman’s rank correlation coefficient (0.229) between the two scores were highly statistically significant (P ≈ 0), although we did not see a clear linear relationship between them (Supplementary Fig. 7).
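
The distinction between the two correlation coefficients can be illustrated with a small, self-contained Python sketch. It is a generic implementation, not the software used for the comparison above, and the toy data are invented: a monotone but nonlinear relationship yields a Spearman coefficient of 1 while the Pearson coefficient stays below 1.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotone in x but not linear
print(pearson(x, y), spearman(x, y))
```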

Supplementary Material

Supplemental Figures
Supplemental Table 3
Supplemental Table 4
Supplemental Table 6

Acknowledgments

The Atherosclerosis Risk in Communities (ARIC) study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute (NHLBI) contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C and HHSN268201100012C). We thank the staff and participants of the ARIC study for their important contributions. Funding support for “Building on GWAS for NHLBI-diseases: the U.S. CHARGE Consortium” was provided by the National Institutes of Health through the American Recovery and Reinvestment Act of 2009 (ARRA) (5RC2HL102419). Sequencing was carried out at the Baylor College of Medicine Human Genome Sequencing Center (U54 HG003273).

Footnotes

Accession codes. These data have been submitted to dbGaP under study accession phs000668.v1.p1.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

AUTHOR CONTRIBUTIONS

A.H.L. carried out variant quality control, annotation and data analysis. X.L. developed the OP ratio and analyses of this metric. A.H.L., A.C.M., L.M.P. and B.Y. did statistical analyses of quantitative traits. C.K., G.M., D.M. and N.V. ensured that high-quality sequence variants were delivered for analyses. L.A.C., J.A.B. and T.L. were involved with study design. T.H.M. coordinated clinical data collection and recruitment. R.A.G. and E.B. provided materials and project oversight. A.H.L., E.B., A.C.M., X.L., B.Y. and L.M.P. prepared the manuscript.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

References

1. MacArthur DG, et al. A systematic survey of loss-of-function variants in human protein coding genes. Science. 2012;335:823–828. doi: 10.1126/science.1215040.
2. Phillips IR, Shephard EA. Flavin-containing monooxygenases: mutations, disease and drug response. Trends Pharmacol Sci. 2008;29:294–301. doi: 10.1016/j.tips.2008.03.004.
3. Margaritte P, Bonaiti-Pellie C, King MC, Clerget-Darpoux F. Linkage of familial breast cancer to chromosome 17q21 may not be restricted to early-onset disease. Am J Hum Genet. 1992;50:1231–1234.
4. Liu X, Jian X, Boerwinkle E. dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat. 2011;32:894–899. doi: 10.1002/humu.21517.
5. Liu X, Jian X, Boerwinkle E. dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum Mutat. 2013;34:E2393–E2402. doi: 10.1002/humu.22376.
6. Dang VT, Kassahn KS, Marcos AE, Ragan MA. Identification of human haploinsufficient genes and their genomic proximity to segmental duplications. Eur J Hum Genet. 2008;16:1350–1357. doi: 10.1038/ejhg.2008.111.
7. Georgi B, Voight BF, Bućan M. From mouse to human: evolutionary genomics analysis of human orthologs of essential genes. PLoS Genet. 2013;9:e1003484. doi: 10.1371/journal.pgen.1003484.
8. Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet. 2008;83:311–321. doi: 10.1016/j.ajhg.2008.06.024.
9. Cohen J, et al. Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nat Genet. 2005;37:161–165. doi: 10.1038/ng1509.
10. Crosby J, et al. Loss-of-function mutations in APOC3, triglycerides, and coronary disease. N Engl J Med. 2014;371:22–31. doi: 10.1056/NEJMoa1307095.
11. Jin Y, et al. Risk of type 1 diabetes progression in islet autoantibody-positive children can be further stratified using expression patterns of multiple genes implicated in peripheral blood lymphocyte activation and function. Diabetes. 2014;63:2506–2515. doi: 10.2337/db13-1716.
12. Gizer IR, et al. Linkage scan of nicotine dependence in the University of California, San Francisco (UCSF) Family Alcoholism Study. Psychol Med. 2011;41:799–808. doi: 10.1017/S0033291710001273.
13. Barbaric I, Miller G, Dear TN. Appearances can be deceiving: phenotypes of knockout mice. Brief Funct Genomic Proteomic. 2007;6:91–103. doi: 10.1093/bfgp/elm008.
14. Schäffler A, Buechler C. CTRP family: linking immunity to metabolism. Trends Endocrinol Metab. 2012;23:194–204. doi: 10.1016/j.tem.2011.12.003.
15. Sheridan C. Phase 3 data for PCSK9 inhibitor wows. Nat Biotechnol. 2013;31:1057–1058. doi: 10.1038/nbt1213-1057.
16. Morrison AC, et al. Whole-genome sequence-based analysis of high-density lipoprotein cholesterol. Nat Genet. 2013;45:899–901. doi: 10.1038/ng.2671.
17. ARIC Investigators. The Atherosclerosis Risk in Communities (ARIC) study: design and objectives. Am J Epidemiol. 1989;129:687–702.
18. Reid JG, et al. Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline. BMC Bioinformatics. 2014;15:30. doi: 10.1186/1471-2105-15-30.
19. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26:589–595. doi: 10.1093/bioinformatics/btp698.
20. Challis D, et al. An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics. 2012;13:8. doi: 10.1186/1471-2105-13-8.
21. Grove ML, et al. Best practices and joint calling of the HumanExome BeadChip: the CHARGE Consortium. PLoS One. 2013;8:e68095. doi: 10.1371/journal.pone.0068095.
22. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164. doi: 10.1093/nar/gkq603.
23. Kelso J, et al. eVOC: a controlled vocabulary for unifying gene expression data. Genome Res. 2003;13:1222–1230. doi: 10.1101/gr.985203.
24. Kamburov A, Stelzl U, Lehrach H, Herwig R. The ConsensusPathDB interaction database: 2013 update. Nucleic Acids Res. 2013;41:D793–D800. doi: 10.1093/nar/gks1055.
25. Petrovski S, Wang Q, Heinzen EL, Allen AS, Goldstein DB. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 2013;9:e1003709. doi: 10.1371/journal.pgen.1003709.
