Author manuscript; available in PMC: 2019 Apr 1.
Published in final edited form as: Circ Genom Precis Med. 2018 Apr;11(4):e002178. doi: 10.1161/CIRCGEN.118.002178

Functional Assays to Screen and Dissect Genomic Hits: Doubling Down on the National Investment in Genomic Research

Kiran Musunuru 1, Daniel Bernstein 2,3, F Sessions Cole 4, Mustafa K Khokha 5, Frank S Lee 6, Shin Lin 7, Thomas V McDonald 8, Ivan P Moskowitz 9, Thomas Quertermous 3,10, Vijay G Sankaran 11,12, David A Schwartz 13, Edwin K Silverman 14, Xiaobo Zhou 14, Ahmed AK Hasan 15, Xiao-zhong James Luo 15
PMCID: PMC5901889  NIHMSID: NIHMS953338  PMID: 29654098

Executive Summary

The National Institutes of Health (NIH) have made substantial investments in genomic studies and technologies to identify DNA sequence variants associated with human disease phenotypes. The National Heart, Lung, and Blood Institute (NHLBI) has been at the forefront of these commitments to ascertain genetic variation associated with heart, lung, blood, and sleep (HLBS) diseases and related clinical traits. Genome-wide association studies (GWASs), exome and genome sequencing studies, and exome genotyping studies of NHLBI-funded epidemiological and clinical case-control studies are identifying large numbers of genetic variants associated with HLBS phenotypes. However, investigators face challenges in identification of genomic variants that are functionally disruptive among the myriad of computationally implicated variants. Studies to define mechanisms of genetic disruption encoded by computationally identified genomic variants require reproducible, adaptable, and inexpensive methods to screen candidate variant and gene function. High-throughput strategies will permit a tiered variant discovery and genetic mechanism approach that begins with rapid functional screening of a large number of computationally implicated variants and genes for discovery of those that merit mechanistic investigation. As such, improved variant-to-gene and gene-to-function screens—and adequate support for such studies—are critical to accelerating the translation of genomic findings. In this White Paper, we outline the variety of novel technologies, assays, and model systems that are making such screens faster, cheaper, and more accurate, referencing published work and ongoing work supported by the NHLBI’s R21/R33 Functional Assays to Screen Genomic Hits program. We discuss priorities that can accelerate the impressive but incomplete progress represented by “big data” genomic research.

In considering the progress to date and the needs for the future, we recommend the following:

  1. There should be programs with goals similar to those of the R21/R33 Functional Assays to Screen Genomic Hits program, with greater capacity and review by specialized study sections that can properly evaluate post-genomic functional studies.

  2. Any future programs geared towards discovery-oriented genomic research (sequencing, transcriptomics, proteomics, metabolomics, etc.) should be complemented by concurrent programs providing adequate capacity for functional screening and dissection of genomic hits.

A Substantial Investment in Genomic Research to Date

Since the completion of the Human Genome Project, the NHLBI has invested hundreds of millions of U.S. dollars on various genomic endeavors via a number of funding opportunities that have taken advantage of successive advances in genotyping, sequencing, and “omics” technologies (which collectively will be labeled as “genomics” for the purposes of this White Paper, since epigenetic modifications, RNA transcripts, proteins, and metabolites all relate to the expressed genome1). Leveraging existing NHLBI-funded population and clinical trial cohorts, the SNP Health Association Resource (SHARe) and Candidate-gene Association Resource (CARe) programs generated genotyping data on more than 60,000 individuals of varied ethnicities. The SNP Typing for Association with Multiple Phenotypes from Existing Epidemiologic Data (STAMPEED) program obtained genotyping data from a large number of additional individuals with various HLBS disorders and risk factors. Subsequently, the NHLBI GO Exome Sequencing Project (ESP) has enabled exome sequencing of 200,000 individuals from the aforementioned cohorts and other cohorts. Together, these programs have implicated millions of candidate genetic variants in a range of phenotypes spanning HLBS research.

In the pediatrics arena, the NHLBI has funded the Pediatric Cardiac Genomics Consortium, a branch of the Bench to Bassinet Program that seeks to understand the genetic architecture of congenital heart disease. The Consortium has collected DNA samples on over 10,000 probands and performed exome sequencing on well over 2,000 trios, identifying well over 400 genes that are candidates for congenital heart disease. Many of these genes have no known role in cardiac development, offering an exciting avenue for exploration.

The Trans-Omics for Precision Medicine (TOPMed) program is an ongoing multi-phase endeavor to obtain genome sequencing and other genomic data and integrate them with molecular, behavioral, imaging, environmental, and clinical data related to HLBS disorders. The TOPMed program plans to generate about 150,000 genome sequences and 50,000 RNA transcriptome sequences. As of November 30, 2017, the TOPMed program has generated over 100,000 genome sequences (1.3 × 10^16 sequenced bases), and 10,000 RNA transcriptome sequences are in the pipeline. The TOPMed program, in a pilot, has also generated 3,000 RNA transcriptome sequences, 2,200 methylation profiles, and 2,000 metabolomic profiles on a number of blood cell types from NHLBI-funded longitudinal population cohorts. The TOPMed program also provides funding opportunities for the development of computational methods and tools to analyze genomic data.

One resource for the genomics community at large for determining causal variants and genes underlying the findings of genetic association studies has been generated by the Genotype-Tissue Expression (GTEx) Project,2–4 funded by the National Institutes of Health (NIH) Common Fund. Under this effort, genome sequencing of nearly 1,000 subjects has been undertaken, and RNA sequencing of multiple tissues sampled from these same individuals has been performed (total of ≈17,500 transcriptome sequences). Multiple strategies have been devised for incorporating these big data for dissecting GWAS-associated loci. One general approach has sought to colocalize GWAS mapping hits with expression quantitative trait loci (eQTLs), i.e., genotype–gene expression associations, identified from the GTEx data.5 Another has been to impute the expression of genes of a locus based on the GTEx data from the genotypes of a GWAS6 or summary statistics thereof;7 to determine the causal gene, the expression values of the genes are then tested for association with the phenotype of study. Although simulations and various indirect measures that incorporate GTEx data should improve upon fine mapping of variants by linking them to putative regulatory sequences for target genes8 and pave the way for mechanistic studies of the target genes in appropriate in vivo or in vitro model systems, the optimal strategy is still unclear. Moreover, the extent of this improvement has yet to be quantified. Neither objective can be achieved without functional verification of putative causal variants and candidate trait genes in appropriate model systems.
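The imputed-expression strategy described above can be illustrated with a minimal sketch. It assumes that per-variant expression weights have already been learned from a reference panel such as GTEx; the weights, genotype dosages, and phenotype values below are invented for illustration, and the function names are hypothetical rather than taken from any published tool.

```python
from statistics import mean

def impute_expression(genotypes, weights):
    """Predict one gene's expression per individual as a weighted sum of allele dosages."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

def pearson_r(x, y):
    """Pearson correlation, used here as a simple association measure."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Toy data (hypothetical): 6 individuals x 3 variants, dosages 0/1/2.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 2, 2],
    [1, 0, 0],
    [2, 2, 1],
]
# Hypothetical eQTL weights, assumed to come from a reference expression panel.
weights = [0.5, -0.2, 0.1]
# Hypothetical quantitative trait measured in the same 6 individuals.
phenotype = [1.0, 1.4, 3.1, 0.5, 2.2, 2.3]

predicted = impute_expression(genotypes, weights)
association = pearson_r(predicted, phenotype)
print(f"correlation between imputed expression and phenotype: {association:.3f}")
```

In practice this test would be repeated for every gene in a locus, and, as the text stresses, a strong statistical association would still call for functional verification in a model system.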

A related effort supported by the NHLBI is the Next Generation Genetic Association Studies consortium, which has resulted in the generation of human induced pluripotent stem cell (iPSC) lines from more than 1,000 individuals of varied sex and ethnicity, both healthy individuals and patients with HLBS disorders.9 These iPSC lines are being differentiated into HLBS-relevant cell types and used for intensive genomic studies, e.g., expression quantitative trait locus (eQTL) analyses that are complementary to those performed with GTEx data. One advantage of the use of iPSC lines for eQTL studies is that they can serve not only as a platform for the discovery of eQTLs but also as a model system for the functional dissection of the causal variants and genes underlying those eQTLs.10

A different type of effort is represented by the Library of Integrated Network-based Cellular Signatures (LINCS) program, funded by the NIH Common Fund, which seeks to generate various types of genomic data (transcriptomics, proteomics, epigenomics, etc.) from cultured and primary human cells treated with various perturbagens (bioactive molecules, growth factors, cytokines, other ligands, short hairpin RNAs, CRISPR-based tools, etc.). These experiments can be subjected to integrative analyses that yield cellular signatures across different cell lines.11

The scientific output from these genomics efforts funded by the NHLBI and other NIH programs has been astounding, far too extensive to be covered even briefly in this White Paper. However, the overwhelming preponderance of the data generated by these efforts has permitted genotype-phenotype associations, that is, correlations between genetic variants or other types of molecular variation and HLBS phenotypes. Substantially less progress has occurred in establishing which of these associations are causal, knowledge that is critical for any application of genomic data to patient diagnosis or the identification of novel therapeutic targets (whether regulatory genomic sequences, genes, RNA transcripts, or proteins). While proof of causation is critical, the underlying molecular mechanisms, which are equally essential for understanding pathogenesis and can substantially impact therapeutics, are generally unknown. In light of this gap, we argue that the time is right for a substantial effort to convert association into causation and achieve a mechanistic understanding of the molecular underpinnings of HLBS disorders and traits.

The NHLBI has taken the first steps towards capitalizing on genomic association data to achieve a broad-based mechanistic understanding of HLBS disorders with an R21/R33 funding opportunity that has unfolded over the past few years, the Functional Assays to Screen Genomic Hits program. The next sections of this White Paper are devoted to a discussion of the types of projects funded by this program, as well as similar projects funded by other NIH programs. We discuss how these projects illuminate the path forward and the challenges that lie ahead in implementing broad-based functional assays to screen and dissect the thousands of genomic hits that have emerged in the study of HLBS disorders.

Coding Versus Noncoding Variant Discovery: A Fundamental Distinction

The different approaches to the discovery of genetic variants associated with diseases dictate to some degree the types of variants that will be discovered. For example, exome sequencing, in which the coding sequences that make up only 1%–2% of the genome are biochemically isolated prior to analysis, is unsurprisingly focused on the discovery of nonsynonymous, frameshift, and splice-site variants. The presumption of exome sequencing studies is that genetic causation lies in coding variation. This understanding dictates that functional variant discovery include the direct interrogation of altered protein sequences for understanding of how coding variation affects phenotypes. On the other hand, GWASs rely on common genetic variation for the identification of variants associated with phenotypes. Although genomically unbiased, GWASs are tilted towards the implication of noncoding variants, and many studies now indicate that the vast majority of phenotypically meaningful variation identified by GWASs lies in noncoding regions.12 Noncoding variants associated with specific phenotypes presumably alter gene expression and require that investigators pursue transcriptional regulatory mechanisms. Whole genome sequencing permits unbiased assessment of coding, noncoding, and structural/copy number variants in disease and requires model systems to assess function of both coding and noncoding variants.

Platform I: Coding Variant Analysis

Model Systems: Animal Models, Standard Cultured Cells, and Induced Pluripotent Stem Cells

A paramount consideration in developing functional assays to screen and dissect genomic hits is the choice of model system. A perennial favorite is Mus musculus, the house mouse, due to a variety of factors: biological complexity similar to that of humans, which allows for deep functional phenotyping potentially relevant to human disease; a gestation period, breeding cycle, and lifespan that permit completion of a research study within a few years; and the availability of a variety of well-established and routinely used approaches to perform germline modification of the mouse genome. These advantages are being leveraged by recent efforts to causally link DNA variants identified in genomic studies in humans to HLBS traits that can be studied in mice. A recent analysis of >300,000 human exomes in relation to blood lipid traits identified novel associations between coding variants in JAK2 and A1CF with cholesterol and triglyceride levels, respectively.13 Knock-in mice with these coding variants, generated either by traditional gene targeting in mouse embryonic stem cells or by microinjection of the clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR-associated 9 (Cas9) system into mouse embryos, displayed concordant changes in blood cholesterol and triglyceride levels, confirming the genotype-phenotype relationships and offering the opportunity to dissect the mechanistic basis of each relationship.13 Application of genomic approaches and phenotypic characterization in mouse models with deficiencies of chronic obstructive pulmonary disease (COPD) GWAS-nominated genes has expanded our understanding of the development of COPD.14–16

In another example, the coding sequence of exon 1 of the human PHD2 gene harboring amino acid substitutions (D4E/C127S) present in high altitude-adapted Tibetans17,18 was introduced into mice. While Asp-4 is conserved between human and mouse PHD2, Cys-127 is not, necessitating the generation of a control mouse line in which exon 1 of wild-type human PHD2 was introduced in a similar manner. These mice are being examined for resistance to hypoxemia-induced erythropoiesis and pulmonary hypertension, phenotypic features of Tibetan adaptation to chronic hypoxia. Interestingly, the D4E/C127S substitutions lead to loss of interaction of PHD2 with the HSP90 cochaperone p23, an interaction that ordinarily facilitates PHD2-catalyzed prolyl hydroxylation of Hypoxia Inducible Factor-α, leading to the proposition that Tibetan PHD2 is a loss-of-function allele.19,20 However, it has also been proposed that Tibetan PHD2 is a gain-of-function allele.21

The primary disadvantage of the mouse as a model system to interrogate the functional consequences of DNA variation or gene knockout is that even with genome-editing tools like CRISPR-Cas9, it is not feasible to assess the effects of many variants or many genes in a high-throughput fashion with germline-modified mice. This shortcoming can be mitigated through the use of viral vectors or nonviral means to modulate gene expression in target organs in postnatal animals. For example, adeno-associated viral vectors have been used to achieve hepatic overexpression of a series of candidate causal genes implicated in lipoprotein metabolism by GWASs on blood lipid traits.10,22,23 These studies have proven fruitful in determining which of the genes are causal, with experiments taking only a few weeks to complete. Similarly, the use of adeno-associated viral vectors expressing short hairpin RNAs or the use of synthetic short interfering RNAs encased in lipid nanoparticles to knock down candidate gene expression in mouse liver has been informative.22,23

Another disadvantage of the murine model is species-specific human disease pathophysiology. For example, human cardiotoxicity associated with individual members of the tyrosine kinase inhibitor class of chemotherapeutic agents does not correlate with murine cardiotoxicity.24 For the study of cardiomyopathies, the murine dominance of the α-MHC instead of the β-MHC isoform of myosin heavy chain complicates the study of hypertrophic cardiomyopathy, as similar mutations exhibit different biomechanics when placed on each of these two protein backbones.25

Compared to mice, alternative animal models potentially offer a higher-throughput approach to interrogate the functional consequences of DNA variation or gene knockout, due to massive clutch sizes, easy manipulation of the genome or knockdown/overexpression experiments, and substantially lower costs. In particular, easy access to embryos and external development make many non-mammalian model systems particularly well suited for questions about organ development, including that of the heart, lung, and blood. These advantages must be balanced with evolutionary distance from humans, which might prevent faithful modeling of human disease phenotypes. Commonly used models such as Drosophila, Caenorhabditis elegans, and zebrafish permit germline screening of many more genes and variants than is feasible with mouse, but their HLBS anatomy and physiology differ substantially from humans in many important respects. Nonetheless, the literature is replete with examples of excellent, informative studies performed with these model organisms (too many to summarize here), and substantial infrastructure for high-throughput analyses of flies, worms, and fish already exists. Although the R21/R33 Functional Assays to Screen Genomic Hits program did not happen to fund any studies with these organisms, future efforts should certainly consider the best strategies to exploit the advantages of these and other similar organisms in post-genomic functional studies.

One model organism that was supported by the R21/R33 program was Xenopus, the frog model. With its relatively favorable evolutionary distance from humans along with high throughput and low cost, Xenopus might provide an optimal balance.26,27 The genomes of Xenopus laevis and its diploid cousin Xenopus tropicalis show remarkable conservation with the human genome, with substantially more synteny preserved across the genome than in zebrafish. In addition, as air-breathing tetrapods, frogs have lungs, and their cardiac anatomy includes two atria and a trabeculated ventricle, unlike the two-chambered hearts in teleosts. Finally, like other high-throughput systems, frogs can go from fertilization to a functional heart in just three days, making screens rapid and efficient. Recently, CRISPR-Cas9 technology has transformed the ability to deplete genes in X. tropicalis.28,29 By injection of CRISPR-Cas9 ribonucleoproteins, biallelic deletions can be detected just 2 hours after injection, well before the onset of zygotic gene activation. Therefore, F0 animals can be screened for phenotypes. With optical coherence tomography providing near-histology-resolution live imaging, even subtle cardiac phenotypes, including myocardial dysfunction, can be readily detected, and phenocopy of patients can be evaluated.30 Using this approach, a single laboratory can screen well over 100 genes in the Xenopus model per year. Once candidate genes that phenocopy patients have been identified in this high-throughput model, functional analysis of the genes has led to multiple and diverse insights into fundamental cell and developmental biology, including cardiac development.31–33

A salient advantage of whole-animal systems is the ability to identify therapeutics that act on the whole animal to in turn affect a particular tissue. This advantage notwithstanding, standard cultured cells, i.e., transformed or immortalized cell lines that can be maintained in culture indefinitely as well as primary cells that are transiently maintained in culture, provide a useful and complementary approach to animal models. They offer several advantages: they are derived from humans or other higher mammals; they can be grown in large numbers and used in many experimental replicates; they are readily modified with the addition of genetic vectors via transfection, electroporation, or infection; and full experiments can be performed in as little as a few days. Disadvantages include: the inability to model complex multi-tissue phenotypes, largely restricting studies to cell-autonomous phenotypes; aneuploidy in transformed or immortalized cell lines; the inability to maintain some primary cell types in culture for more than a few days, with others limited to only a few passages; and the difficulty of procuring some cell types (e.g., cardiomyocytes).

iPSCs offer a means to overcome some of the disadvantages of standard cultured cells. Properly maintained iPSCs have human genomes with normal karyotypes through a sizable number of passages, and they can be differentiated into any of a variety of cell types, even ones that are difficult to procure as primary cells. While it is well recognized that differentiated iPSCs tend to be immature and heterogeneous—a notable disadvantage if trying to assess phenotypes that manifest in mature cells—they have proven to be useful in the modeling of a variety of HLBS traits. Furthermore, they have proven to be quite amenable to genome editing with tools such as transcription activator-like effector nucleases (TALENs) and CRISPR-Cas9,34,35 making them useful for modeling the functional effects of DNA variation on cellular phenotypes. Another advantage of iPSCs is that each line is genetically matched to the patient from whom it was derived, potentially allowing for the modeling of complex traits or diseases for which the genetic basis is not well understood.

iPSC-derived cardiomyocytes have been useful in recapitulating cancer patients’ predilection to doxorubicin-induced cardiotoxicity.36 Previous studies, including GWASs, have identified gene variants associated with a markedly increased risk of doxorubicin-induced cardiomyopathy, including a nonsynonymous variant in the RARG (retinoic acid receptor gamma) gene.37 iPSC-derived cardiomyocytes have been derived from children carrying this variant, and the increased cardiotoxicity has been recapitulated in vitro. Demonstrating the power of using iPSC-derived cardiomyocytes combined with CRISPR-Cas9 gene editing, proof of the role of this variant has been obtained by editing the RARG variant back to wild-type, showing that it reverses the increased cardiotoxicity. Finally, the iPSC platform has allowed the identification of the mechanism by which this variant alters the cardiomyocyte response to this chemotherapeutic agent and identification of several potential new cardioprotective drugs. Thus, by careful confirmation using a human cardiomyocyte platform, it is possible for GWAS-based data to be translated directly into patient care in the evolving field of cardio-oncology pharmacogenomics.

Methods for the Rapid and Robust Interrogation of Coding Variants and Genes

Exome sequencing studies and exome genotyping studies, by design, identify coding variants that can potentially be linked to clinical traits and diseases. The functional effects of coding variants can be modeled in animal and cellular models with genome-editing tools such as TALENs and CRISPR-Cas9. Unlike with GWAS-identified variants, which are typically common in the population, coding variants with functional effects tend to be rare (although within any given gene, there may be many different coding variants). A substantial limitation of genome editing is the inefficiency of homology-directed repair to knock in a candidate variant, which limits the ability to screen variants individually or in a high-throughput fashion. This inefficiency remains a major challenge to overcome in the screening and dissection of genomic hits. More efficient knock-in methods such as base editing are being developed, and while base editing was originally limited to select nucleotide substitutions, recent developments have encouragingly expanded the range of allowable base-editing substitutions.38

Alternative approaches to interrogating coding variants that do not require precise knock-in into the endogenous gene are proving to be fruitful. One such approach is to use viral or non-viral vectors to express wild-type or mutant genes heterologously in cells with genetically silenced or normal candidate gene expression. The functional consequences of the different variants can thus be directly compared against the consequences of the wild-type gene, which serves as the reference. If a resulting phenotype can be measured in a high-throughput format, then a large number of variants can be assessed in a parallel fashion, often in a matter of days. This approach is being applied to functional surfactant gene mutations, for which lipidomic analysis provides a quantitative readout. Specifically, adenoviral-mediated, concurrent silencing and rescue of expression of mutations in the ATP-binding cassette transporter A3 gene (ABCA3) in a human, pulmonary, adenocarcinoma-derived epithelial cell line, A549, have suggested that lipidomic signatures, lamellar body phenotype, and ABCA3 itinerary can be used for high-throughput screening of drugs that can reconstitute ABCA3 mutant function.39

Ion channel arrhythmia gene variants are being interrogated in a similar fashion: a panel of >1200 variants in the 3 most commonly sequenced cardiac ion channel genes in clinical settings (KCNH2, KCNQ1, and SCN5A for long QT syndrome and sudden cardiac death) is heterologously expressed in cells that are then evaluated by an automated patch-clamping platform. A combined approach of protein expression analyses and functional characterization allows for initial phenotypic classification. Preliminary analyses indicate that ~55% of variants exhibit deleterious function, ~25% exhibit partial or intermediate dysfunction, and ~20% are indistinguishable from wild-type. This work provides a framework and resource for investigators in the field and should have a tangible impact on clinical management of an increasing number of patients and future research on hereditary cardiac arrhythmias. Of note, these cardiac ion channel genes that are so important to human health are poorly expressed in mouse heart and for that reason are more tractable to study in cells in vitro.

For genes whose functions are faithfully modeled in animal models, assessing a series of coding variants in germline-modified animal models would be prohibitive, especially in mice. The use of viral vectors to express a wild-type gene versus mutant versions of the gene with missense variants in postnatal mice can permit the rapid assessment of the consequences of each variant on protein function. In one study, this approach was used to functionally annotate 11 variants in the ANGPTL3 gene with respect to the protein’s effects on blood triglyceride and cholesterol levels, and the same approach could readily be scaled up to assess dozens to hundreds of variants in the gene.40

The logical extreme of experimental efforts to functionally characterize a series of coding variants in a gene is saturation mutagenesis, i.e., generating and testing every possible variant in a single experiment. The so-called multiplex assay of variant effect (MAVE) is an ambitious goal, but one that is starting to gain traction in the laboratory. In a sterling proof-of-principle effort, a MAVE with the PPARG gene involved in lipodystrophy and type 2 diabetes mellitus yielded a functional readout for all 9,595 possible single amino acid substitutions in the protein product.41 We can expect the MAVE methodology to be transformative in the study of coding variants in genes linked to clinical phenotypes, allowing for the retrospective characterization of variants already cataloged in the population as well as the prospective characterization of variants not yet observed in nature.
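To make the scale of saturation mutagenesis concrete, the sketch below enumerates every single amino acid substitution of a protein sequence. Each residue can be changed to any of the 19 other standard amino acids, so a protein of length n has 19 × n such substitutions; the figure of 9,595 quoted above equals 19 × 505, which is consistent with a 505-residue protein (the length is inferred here from the arithmetic, not stated in the text). The four-residue toy sequence and the function names are illustrative assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes

def single_substitutions(protein):
    """Yield every single amino acid substitution as e.g. 'M1A' (ref, position, alt).

    Assumes the input contains only the 20 standard one-letter codes.
    """
    for pos, ref in enumerate(protein, start=1):
        for alt in AMINO_ACIDS:
            if alt != ref:
                yield f"{ref}{pos}{alt}"

def count_single_substitutions(length):
    """Number of possible single substitutions for a protein of the given length."""
    return length * (len(AMINO_ACIDS) - 1)

# Hypothetical four-residue sequence, used only to show the enumeration.
toy_protein = "MGET"
variants = list(single_substitutions(toy_protein))
print(len(variants), variants[:3])

# Size of a full single-substitution library for a 505-residue protein.
print(count_single_substitutions(505))
```

A real MAVE library would also need a design decision about synonymous and nonsense controls, which this count deliberately leaves out.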

Compared to the study of specific gene variants, the interrogation of candidate genes is somewhat more straightforward due to the availability of well-established techniques to modulate the expression of genes, whether heterologous overexpression or knockdown/knockout with RNA interference or genome-editing tools. These techniques can be readily applied to perform genome-wide screens or more targeted screens, e.g., all of the genes in or near loci implicated by a GWAS for a phenotype of interest. A large-scale RNA interference screen has been employed to interrogate candidate genes for erythroid cell traits by examining the effects of gene knockdown on the differentiation of primary human hematopoietic stem cells into mature erythroid cells. In this screen, approximately 400 genes were targeted using a median of 6 short hairpin RNAs per gene. This screen has uncovered 50 candidate genes in 42 of 75 erythroid cell trait GWAS loci, including many potential therapeutic targets, revealing the value of gene-centric approaches for GWAS follow-up.

Platform II: Noncoding Variant Analysis

Interrogation of noncoding variation requires understanding of the current principles of gene regulation. In order to explain functional noncoding genetic variation, we must understand the fundamental characteristics of each specific causal variant and the cis-regulatory element (CRE) in which it lies: (1) in which cell type and at what developmental stage the CRE is active; (2) what are the target gene(s) of the CRE in the relevant cell type; (3) to what degree the variant alters gene regulatory function of the CRE; (4) what are the transcription factors that modulate CRE function. These tasks are complicated by the size of the regulatory genome: although coding sequences comprise only 1%–2% of the genome, gene regulatory function has been ascribed to a much larger percentage of the noncoding genome, effectively requiring consideration of the entire genome.

As noted, GWASs identify noncoding variants associated with the trait or disease in question; only in a minority of cases do coding variants appear to be responsible for the associations.12 The initial variants identified by GWASs are typically not causal variants themselves but are linked to causal variants in a haplotype block. Hundreds or even thousands of noncoding variants could be strongly linked to the top GWAS-identified variant in a locus. In principle, each of these is a candidate causal variant that might affect gene function through the regulation of expression, splicing, or transcript itinerary. To complicate the picture further, multiple causal variants in a single locus may combine to influence gene function and thereby the phenotype of interest. Thus, techniques are needed to screen through large numbers of candidates simultaneously, rather than testing each candidate one at a time.
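The set of candidate causal variants "strongly linked" to a lead GWAS variant is usually defined by a linkage disequilibrium threshold. The sketch below is a simplified illustration that approximates r² by the squared correlation of unphased genotype dosages and keeps variants above a cutoff. The variant names, dosages, and the 0.8 threshold are assumptions chosen for the example; real analyses typically use phased reference panels and dedicated software.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors (a simple LD proxy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variant carries no LD information
    return sxy ** 2 / (sxx * syy)

def ld_proxies(dosages, lead_id, threshold=0.8):
    """Return the IDs of variants whose r^2 with the lead variant meets the threshold."""
    lead = dosages[lead_id]
    return sorted(
        variant_id
        for variant_id, dosage in dosages.items()
        if variant_id != lead_id and r_squared(lead, dosage) >= threshold
    )

# Hypothetical dosages (0/1/2 copies of the alternate allele) for 6 individuals.
dosages = {
    "lead": [0, 1, 2, 0, 1, 2],
    "snpA": [0, 1, 2, 0, 1, 2],  # perfectly correlated with the lead variant
    "snpB": [2, 1, 0, 2, 1, 0],  # perfectly anti-correlated; r^2 is still high
    "snpC": [0, 0, 1, 1, 2, 2],  # weakly correlated
}

candidates = ld_proxies(dosages, "lead")
print(candidates)
```

Every variant returned this way is statistically indistinguishable from the lead variant to a first approximation, which is why the text argues for screening candidates in parallel rather than one at a time.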

Methods for the Rapid and Robust Interrogation of Noncoding Variants and Regulatory Landscapes

Genetic variants that alter gene regulatory function should reside within regions that possess gene regulatory activity. One common approach is therefore to map the locus for epigenomic features that suggest regulatory activity, with the rationale that causal variants are more likely to lie in regions with these features. Epigenetic features suggestive of cis-regulatory activity include chromatin accessibility or various histone marks correlated with enhancer activity. A variety of techniques is available to assess for regions of open chromatin where DNA is more accessible for binding of transcription factors: micrococcal nuclease sequencing (MNase-seq),42 DNase I hypersensitivity sequencing (DNase-seq),43 formaldehyde-assisted isolation of regulatory elements and sequencing (FAIRE-seq),44 and the assay for transposase-accessible chromatin and sequencing (ATAC-seq).45 Chromatin immunoprecipitation sequencing (ChIP-seq) with antibodies specific for chromatin modifications suggestive of enhancer activity46 or for individual transcription factors is also useful for pinpointing regions with regulatory potential. An enormous amount of data has been generated with these various techniques through the NIH-funded Encyclopedia of DNA Elements (ENCODE)47 and the Roadmap Epigenomics Project48 and is available for public use. Integration of the data generated with these techniques with the set of SNPs in linkage disequilibrium with a GWAS hit has allowed the identification of causal genetic variation, validated by direct functional interrogation in animal model systems in vivo.49
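The integration step described above amounts to intersecting candidate SNP positions with intervals of putative regulatory activity. The following minimal sketch does this with a sorted interval lookup. The peak coordinates and SNP names are invented, intervals are treated as half-open (start inclusive, end exclusive), and peaks on the same chromosome are assumed not to overlap one another.

```python
from bisect import bisect_right

def build_index(peaks):
    """Group (chrom, start, end) peaks by chromosome and sort them by start."""
    index = {}
    for chrom, start, end in peaks:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index

def in_peak(index, chrom, pos):
    """Return True if pos falls inside a peak (assumes non-overlapping peaks)."""
    intervals = index.get(chrom, [])
    # Locate the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]

# Hypothetical open-chromatin peaks, e.g. from an ATAC-seq experiment.
peaks = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 1000, 1200)]

# Hypothetical SNPs in linkage disequilibrium with a GWAS lead variant.
snps = {
    "rs_a": ("chr1", 150),
    "rs_b": ("chr1", 200),   # sits exactly on the excluded end coordinate
    "rs_c": ("chr1", 640),
    "rs_d": ("chr2", 150),
}

index = build_index(peaks)
prioritized = sorted(name for name, (chrom, pos) in snps.items() if in_peak(index, chrom, pos))
print(prioritized)
```

Overlap with an accessible region only prioritizes a variant; as the paragraph notes, causality still has to be validated by direct functional interrogation.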

Gene regulatory elements act to affect target gene expression by direct contact with target promoters through three-dimensional space. Therefore, in addition to the methods that attempt to identify regulatory regions by epigenetic proxy as described above, recent efforts have sought to map three-dimensional chromatin interactions between regulatory elements and target gene promoters. These efforts have used methods that operate either at genome-wide scale, such as Hi-C50 and chromatin interaction analysis by paired-end tag sequencing (ChIA-PET),51 or at specific genomic locations, such as circularized chromosome conformation capture (4C)52 and capture Hi-C.53 Application of new methods that incorporate regulatory region profiling and chromatin interaction measurements such as proximity ligation-assisted ChIP-seq (PLAC-seq)54 and HiChIP55 in relevant cell types will greatly increase our capacity to establish functional variant-gene connections for associated regions in HLBS diseases. An increasing amount of chromatin interaction data in multiple human and murine cell lines is becoming available with the completion of the NIH-funded 4D Nucleome Network project.56
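To illustrate how chromatin interaction data can connect a noncoding variant to a candidate target gene, the sketch below looks for loops in which one anchor contains the variant and the other anchor overlaps a promoter. The `overlaps` and `target_genes` helpers, the loop representation, and the gene names are hypothetical; real Hi-C, ChIA-PET, or HiChIP analyses involve loop calling and statistical filtering that are not shown here.

```python
def overlaps(a, b):
    """True if intervals a and b, each (chrom, start, end), overlap (inclusive)."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def target_genes(variant, loops, promoters):
    """Genes whose promoter sits in the loop anchor opposite the variant.

    variant:   (chrom, pos)
    loops:     list of (anchor_a, anchor_b), each anchor (chrom, start, end)
    promoters: dict mapping gene name -> (chrom, start, end)
    """
    point = (variant[0], variant[1], variant[1])
    genes = set()
    for anchor_a, anchor_b in loops:
        # A loop is symmetric, so test the variant against both anchors.
        for near, far in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
            if overlaps(point, near):
                genes.update(g for g, p in promoters.items() if overlaps(p, far))
    return sorted(genes)


# Toy data (invented): the nearest gene is not the one contacted by the loop.
loops = [(("chr1", 10_000, 11_000), ("chr1", 500_000, 501_000))]
promoters = {
    "GENE_NEAR": ("chr1", 12_000, 13_000),
    "GENE_FAR": ("chr1", 500_200, 500_700),
}
print(target_genes(("chr1", 10_500), loops, promoters))  # ['GENE_FAR']
```

The toy example is built so that the contacted promoter lies hundreds of kilobases from the variant while the nearest gene is not contacted at all, which is why proximity alone can mislead.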

More targeted use of epigenomic techniques has been useful in prioritizing candidate causal variants for further study. For example, primary human coronary artery smooth muscle cells (HCASMCs) have been employed systematically to characterize GWAS loci associated with coronary artery disease. Genomic studies, primarily HCASMC ATAC-seq investigation and mapping of linkage disequilibrium between HCASMC eQTLs and lead GWAS variants, have indicated that a significant portion of the genetic risk for coronary artery disease is encoded in variation in this cell type.57 Indeed, a number of causal variants and causal genes have been identified and their functional mechanism of disease association in smooth muscle cells investigated with in vitro and in vivo models.58,59

As noted above, the vast majority of causal variation does not reside within the coding exons of causal genes, but rather is located in unannotated stretches of DNA that are found upstream, downstream, or within intronic regions of the related causal gene. In some cases, the causal variant has been found to reside within a transcription factor binding site, and study of these cases has shown a direct mechanism of alteration of transcription factor binding and differential regulation of gene expression.57,60 However, studies in HCASMCs and other studies suggest that alteration of transcription factor binding is not the mechanism for a majority of the causal variants,57,61 and efforts are underway to annotate the HCASMC genome to map the causal variants more accurately and understand the epigenomic pathways that lie upstream of the affected signaling pathways.62,63 Other strategies to understand epigenomic regulation include mapping quantitative trait loci that regulate transcription factor binding sites that lie outside of canonical binding motifs and that regulate splicing, chromatin accessibility, and chromosomal looping. Through intersection of these quantitative trait loci with associated variation, it should be possible to identify each causal variant more quickly and understand the mechanism of association.

Alternative approaches use unbiased screening to identify loci that do not have typical epigenomic regulatory signatures. For example, massively parallel reporter assays (MPRAs) have the ability to test the regulatory activity of thousands of DNA sequences simultaneously.64,65 A large number of oligonucleotides spanning the sites of candidate causal variants, variously harboring the major alleles or minor alleles of the variants, are synthesized with unique barcodes and then subcloned into an expression plasmid. The library pool of plasmids is introduced into cells in vitro or into an animal in vivo. DNA and RNA sequencing of the barcode tags permits the calculation of the quantities of each expressed tag (RNA) compared to quantities of each tag in the original pool (DNA). Enrichment of RNA tags suggests enhancer sequences, whereas reduced tags suggest silencer sequences. Discordant expression of the major allele and minor allele versions of the same sequence identifies a putative causal variant that merits further investigation, e.g., genome editing to generate an allelic series of isogenic cell lines to demonstrate altered expression of the endogenous target gene.
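The barcode-counting logic of an MPRA can be sketched as follows: RNA and DNA tag counts are summed per oligonucleotide, scaled by library depth, and converted to a log2 RNA/DNA ratio, and the two alleles of each variant are then compared. This is a simplified illustration rather than the analysis used in the cited studies; the function names, the record layout, and the pseudocount are assumptions, and published analyses typically model replicates and barcode-level noise statistically.

```python
import math
from collections import defaultdict


def mpra_activity(records, pseudocount=1.0):
    """Compute log2(RNA/DNA) per (variant, allele).

    records: iterable of (variant, allele, dna_count, rna_count), one row
    per barcode. Counts are summed over the barcodes of each oligo and
    scaled by total library depth before taking the ratio.
    """
    records = list(records)
    dna_total = sum(r[2] for r in records)
    rna_total = sum(r[3] for r in records)
    dna = defaultdict(float)
    rna = defaultdict(float)
    for variant, allele, d, r in records:
        dna[(variant, allele)] += d
        rna[(variant, allele)] += r
    return {
        key: math.log2(((rna[key] + pseudocount) / rna_total)
                       / ((dna[key] + pseudocount) / dna_total))
        for key in dna
    }


def allelic_skew(activity, variant):
    """Minor-allele activity minus major-allele activity (log2 scale)."""
    return activity[(variant, "minor")] - activity[(variant, "major")]


# Toy counts (invented): v1 shows discordant allelic expression, v2 does not.
records = [
    ("v1", "major", 100, 100), ("v1", "major", 100, 100),
    ("v1", "minor", 200, 800),
    ("v2", "major", 100, 50), ("v2", "minor", 100, 50),
]
activity = mpra_activity(records, pseudocount=0.0)
print(round(allelic_skew(activity, "v1"), 3))  # 2.0
print(round(allelic_skew(activity, "v2"), 3))  # 0.0
```

A positive activity value corresponds to enrichment of RNA tags (enhancer-like), a negative value to depletion (silencer-like), and a nonzero skew flags a variant whose alleles are expressed discordantly.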

MPRAs and follow-up genome editing studies have been successfully employed to screen and dissect causal variants in GWAS loci associated with erythroid cell traits66 and blood lipid levels.10 Ongoing work has focused on the FAM13A region on chromosome 4q22, which has been convincingly associated with COPD.67,68 Studies of Fam13a knockout mice have shown protection from emphysema development with cigarette smoke exposure and suggested that FAM13A promotes beta-catenin degradation.69 MPRAs are being used to screen more than 600 SNPs in this COPD GWAS region to find potential functional variants that influence gene expression.

Another unbiased approach uses genome-editing tools to alter systematically a candidate locus while preserving genomic context in cultured cells. TALENs or CRISPR-Cas9 are used to introduce insertion/deletion (indel) mutations at every possible target site throughout a locus, a form of saturation mutagenesis. By using a phenotypic readout of the target gene’s expression to sort high-expressing cells from low-expressing cells, it is possible to ascertain all sites with enhancer or silencer activity in the locus. The full complement of candidate causal variants can then be overlaid on the experimentally determined regulatory landscape to choose the variants to prioritize for further functional assessment. Perhaps the most notable application of this approach to date was to define an erythroid enhancer of human BCL11A first implicated by a GWAS on fetal hemoglobin levels, with possible implications for the treatment of β-hemoglobinopathies.70 Interestingly, the initial GWAS gene-centric follow-up of BCL11A as a regulator of fetal hemoglobin71 has already led to a clinical gene therapy trial at Boston Children’s Hospital, where short hairpin RNAs are being used to turn down its levels in patients with sickle cell disease. The approach of targeting noncoding elements similar to the BCL11A erythroid enhancer is now being applied to a variety of GWAS loci for various traits in hematopoietic stem cells, erythroid cells, and T cells.
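The sorting-based readout of such a saturation mutagenesis screen can be sketched as a per-guide enrichment score between the high-expressing and low-expressing cell populations. The code below is a minimal illustration under stated assumptions: the function names, the pseudocount, and the threshold are hypothetical, and it is not the analysis used in the cited BCL11A study.

```python
import math


def guide_scores(counts_high, counts_low, pseudocount=1.0):
    """log2(high/low) per guide after scaling each bin to its total reads."""
    total_high = sum(counts_high.values())
    total_low = sum(counts_low.values())
    scores = {}
    for guide in counts_high.keys() & counts_low.keys():
        high = (counts_high[guide] + pseudocount) / total_high
        low = (counts_low[guide] + pseudocount) / total_low
        scores[guide] = math.log2(high / low)
    return scores


def candidate_sites(scores, cut_sites, threshold=1.0):
    """Cut positions whose guides shift expression by at least `threshold`.

    A guide enriched in the low-expression bin (negative score) suggests it
    disrupted enhancer sequence; a guide enriched in the high-expression bin
    (positive score) suggests it disrupted silencer sequence.
    """
    calls = []
    for guide, score in scores.items():
        if score <= -threshold:
            calls.append((cut_sites[guide], "enhancer"))
        elif score >= threshold:
            calls.append((cut_sites[guide], "silencer"))
    return sorted(calls)


# Toy read counts (invented) for three guides tiling a locus.
counts_high = {"g1": 10, "g2": 40, "g3": 50}
counts_low = {"g1": 50, "g2": 40, "g3": 10}
cut_sites = {"g1": 100, "g2": 150, "g3": 200}
scores = guide_scores(counts_high, counts_low, pseudocount=0.0)
print(candidate_sites(scores, cut_sites))
# [(100, 'enhancer'), (200, 'silencer')]
```

Mapping the called positions back onto the locus gives the experimentally determined regulatory landscape onto which candidate causal variants can be overlaid.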

For some genomic hits, functional studies can readily be carried out in multiple models—for example, tandem and complementary studies in human cells in vitro and genetically modified mouse models in vivo. The gain-of-function MUC5B promoter variant rs35705950 has been validated as a risk variant for idiopathic pulmonary fibrosis in ten independent studies, is the strongest known risk factor for the development of idiopathic pulmonary fibrosis, can potentially be used to identify individuals earlier in the course of disease, and represents a risk variant observed in >50% of the cases of idiopathic pulmonary fibrosis.72 To pursue the functional genomics of MUC5B, human airway epithelia with and without the MUC5B gain-of-function promoter variant were cultured at air-liquid interface and were found to develop endoplasmic reticulum stress and reduced wound closure following in vitro exposure to polyIC and tunicamycin. Moreover, studies with genetically engineered mice (Scgb1a1-Muc5bTg, SFTPC-Muc5bTg, and Muc5b−/−) indicate that the concentration of Muc5b is directly related to the extent of bleomycin-induced lung fibrosis (assessed by hydroxyproline and second harmonic imaging).

The importance of carefully defining the true genomic targets of noncoding genetic variation has been demonstrated in several cases in which published assumptions based on proximity annotation to GWAS hits have generated incorrect hypotheses about the actual causal genes. GWASs for cardiac arrhythmia identified SNPs located within SCN10A, described as a candidate novel sodium channel for cardiac rhythm biology based on genetic studies. Functional characterization demonstrated that the causal variation within SCN10A actually lies in a strong cardiac conduction system enhancer for a neighboring sodium channel gene, SCN5A, with a well-described dosage-sensitive relationship to cardiac rhythm.49 GWASs for obesity identified SNPs near FTO, a gene with a known relationship to body mass. However, these obesity-associated SNPs are in fact functionally connected with expression of IRX3, located hundreds of kilobases away, and IRX3 has proven to be a novel and important determinant of body mass.73 In each case, genomic data including SNP-promoter chromatin contacts were combined with laborious in vivo analysis of enhancer function to identify a non-local genomic target. Furthermore, proof of causation in each case required in vivo modeling and validation that altered target expression of the putative causal gene to demonstrate that the locus was meaningfully related to the phenotype of interest. This high bar for defining the mechanism of SNP-phenotype causation is essential if we are to deliver genomic targets that are actionable for future diagnostic and therapeutic work.

The Path Forward

Because advances in genomic technologies over the past few decades have generally outpaced advances in functional experimentation, the hundreds of millions of dollars of funding of genomic research programs have so far yielded a disproportionately small number of actionable findings with a direct impact on human health. Happily, the recent emergence of technologies such as TALENs and CRISPR-Cas9, MPRAs, MAVEs, iPSCs, newly implemented animal models, and high-throughput phenotyping platforms is starting to redress the imbalance between genomic discovery and functional follow-up studies. As outlined in this White Paper, investment in the R21/R33 Functional Assays to Screen Genomic Hits program has demonstrated the enormous potential of basic science research to bring genomic research to fruition.

We feel that fully capitalizing on the substantial investment in genomic research to date will require support for adequate capacity in functional screening and dissection of the genomic hits that have emerged from those programs. Accordingly, we propose the following. There should be programs with goals similar to those of the R21/R33 Functional Assays to Screen Genomic Hits program, but with greater capacity. In particular, we must recognize that post-genomic functional studies are generally not viewed favorably at study sections because they are hypothesis-generating rather than hypothesis-derived. Therefore, these proposals should be reviewed by specialized study sections and expert panels in which throughput, likely failure rate, cost, etc., can be evaluated effectively. For example, a low-cost screen can have a higher failure rate than a high-cost screen yet, because of its greater throughput, still dissect more “hits.” These are not the traditional criteria considered at most study sections, so specialized study sections are required to evaluate such studies properly. Targeted special programs that commit resources to these endeavors, allowing groups to work together for longer periods and with greater budgets, should be supported. In addition, incorporation of functional studies within ongoing discovery programs such as TOPMed should be considered. Any future programs geared towards discovery-oriented genomic research (sequencing, transcriptomics, proteomics, metabolomics, etc.) should be complemented by programs providing adequate capacity for functional screening and dissection of genomic hits.
There have been very large funding commitments to genomic discovery efforts to date; continued commitments to this effort disconnected from adequate support for post-genomic functional studies would serve only to exacerbate the present imbalance between the two and to frustrate attempts to translate genomic discoveries for the benefit of human health. Furthermore, enhancing communication between investigators doing discovery and functional work is essential. This dialogue should be facilitated by collaborative programs that bring together people with expertise in genomic discovery, in various functional areas, and in specific HLBS diseases.

Acknowledgments

We are deeply grateful to the members of the TOPMed, ENCODE, and LINCS consortia for their feedback on this paper, particularly Leonard Zon, James Wilson, Aravind Subramanian, Stephen Rich, Kenneth Rice, James Meig, Ross Cagan, Ingrid Borecki, and Eric Boerwinkle.

Footnotes

DISCLOSURES: Dr. Cole serves on the Reproductive and Genetic Health Clinical Expert Panel for Illumina, Inc., and on the Scientific Advisory Board for ClearLine MD. Dr. Schwartz is the founder and chief scientific officer of Eleven P15, a company focused on the early diagnosis and treatment of pulmonary fibrosis; he has a patent awarded (US Patent no: 8,673,565) for the treatment and diagnosis of fibrotic lung disease. Dr. Silverman received honoraria from Novartis for Continuing Medical Education Seminars and grant and travel support from GlaxoSmithKline during the past three years. The other authors report no relevant disclosures. Any opinions, findings, and conclusions expressed in this paper are those of the authors and do not reflect the official views of the NHLBI or the NIH.

References

  • 1.Musunuru K, Ingelsson E, Fornage M, Liu P, Murphy AM, Newby LK, et al. The expressed genome in cardiovascular diseases and stroke: refinement, diagnosis, and prediction: a scientific statement from the American Heart Association. Circ Cardiovasc Genet. 2017;10:e000037. doi: 10.1161/HCG.0000000000000037. [DOI] [PubMed] [Google Scholar]
  • 2.GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45:580–585. doi: 10.1038/ng.2653. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348:648–660. doi: 10.1126/science.1262110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.GTEx Consortium. Laboratory, Data Analysis &Coordinating Center (LDACC)—Analysis Working Group; Statistical Methods groups—Analysis Working Group; Enhancing GTEx (eGTEx) groups; NIH Common Fund; NIH/NCI; et al. Genetic effects on gene expression across human tissues. Nature. 2017;550:204–213. doi: 10.1038/nature24277. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 2014;10:e1004383. doi: 10.1371/journal.pgen.1004383. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet. 2015;47:1091–1098. doi: 10.1038/ng.3367. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BW, et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet. 2016;48:245–252. doi: 10.1038/ng.3506. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Brown AA, Viñuela A, Delaneau O, Spector TD, Small KS, Dermitzakis ET. Predicting causal variants affecting expression by using whole-genome sequencing and RNA-seq from multiple human tissues. Nat Genet. 2017;49:1747–1751. doi: 10.1038/ng.3979. [DOI] [PubMed] [Google Scholar]
  • 9.Sweet DJ. iPSCs meet GWAS: the NextGen Consortium. Cell Stem Cell. 2017;20:417–418. doi: 10.1016/j.stem.2017.03.020. [DOI] [PubMed] [Google Scholar]
  • 10.Pashos EE, Park Y, Wang X, Raghavan A, Yang W, Abbey D, et al. Large, diverse population cohorts of hiPSCs and derived hepatocyte-like cells reveal functional genetic variation at blood lipid-associated loci. Cell Stem Cell. 2017;20:558–570e10. doi: 10.1016/j.stem.2017.03.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Keenan AB, Jenkins SL, Jagodnik KM, Koplev S, He E, Torre D, et al. The Library of Integrated Network-based Cellular Signatures NIH program: system-level cataloging of human cells response to perturbations. Cell Syst. 2018;6:13–24. doi: 10.1016/j.cels.2017.11.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Manolio TA. Genomewide association studies and assessment of the risk of disease. N Engl J Med. 2010;363:166–176. doi: 10.1056/NEJMra0905980. [DOI] [PubMed] [Google Scholar]
  • 13.Liu DJ, Peloso GM, Yu H, Butterworth AS, Wang X, Mahajan A, et al. Exome-wide association study of plasma lipids in >300,000 individuals. Nat Genet. 2017;49:1758–1766. doi: 10.1038/ng.3977. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Lao T, Jiang Z, Yun J, Qiu W, Guo F, Huang C, et al. Hhip haploinsufficiency sensitizes mice to age-related emphysema. Proc Natl Acad Sci U S A. 2016;113:E4681–E4687. doi: 10.1073/pnas.1602342113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Lao T, Glass K, Qiu W, Polverino F, Gupta K, Morrow J, et al. Haploinsufficiency of Hedgehog interacting protein causes increased emphysema induced by cigarette smoke through network rewiring. Genome Med. 2015;7:12. doi: 10.1186/s13073-015-0137-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Wan ES, Li Y, Lao T, Qiu W, Jiang Z, Mancini JD, et al. Metabolomic profiling in a Hedgehog Interacting Protein (Hhip) murine model of chronic obstructive pulmonary disease. Sci Rep. 2017;7:2504. doi: 10.1038/s41598-017-02701-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Xiang K, Ouzhuluobu, Peng Y, Yang Z, Zhang X, Cui C, et al. Identification of a Tibetan-specific mutation in the hypoxic gene EGLN1 and its contribution to high-altitude adaptation. Mol Biol Evol. 2013;30:1889–1898. doi: 10.1093/molbev/mst090. [DOI] [PubMed] [Google Scholar]
  • 18.Lorenzo FR, Huff C, Myllymäki M, Olenchock B, Swierczek S, Tashi T, et al. A genetic mechanism for Tibetan high-altitude adaptation. Nat Genet. 2014;46:951–956. doi: 10.1038/ng.3067. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Song D, Li LS, Arsenault PR, Tan Q, Bigham AW, Heaton-Johnson KJ, et al. Defective Tibetan PHD2 binding to p23 links high altitude adaption to altered oxygen sensing. J Biol Chem. 2014;289:14656–14665. doi: 10.1074/jbc.M113.541227. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Bigham AW, Lee FS. Human high-altitude adaptation: forward genetics meets the HIF pathway. Genes Dev. 2014;28:2189–2204. doi: 10.1101/gad.250167.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Lorenzo FR, Huff C, Myllymäki M, Olenchock B, Swierczek S, Tashi T, et al. A genetic mechanism for Tibetan high-altitude adaptation. Nat Genet. 2014;46:951–956. doi: 10.1038/ng.3067. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, Koseki M, et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature. 2010;466:707–713. doi: 10.1038/nature09270. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Musunuru K, Strong A, Frank-Kamenetsky M, Lee NE, Ahfeldt T, Sachs KV, et al. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature. 2010;466:714–719. doi: 10.1038/nature09266. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Cheng H, Kari G, Dicker AP, Rodeck U, Koch WJ, Force T. A novel preclinical strategy for identifying cardiotoxic kinase inhibitors and mechanisms of cardiotoxicity. Circ Res. 2011;109:1401–1409. doi: 10.1161/CIRCRESAHA.111.255695. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Lowey S, Bretton V, Gulick J, Robbins J, Trybus KM. Transgenic mouse α- and β-cardiac myosins containing the R403Q mutation show isoform-dependent transient kinetic differences. J Biol Chem. 2013;288:14780–14787. doi: 10.1074/jbc.M113.450668. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Duncan AR, Khokha MK. Xenopus as a model organism for birth defects-Congenital heart disease and heterotaxy. Semin Cell Dev Biol. 2016;51:73–79. doi: 10.1016/j.semcdb.2016.02.022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Garfinkel AM, Khokha MK. An interspecies heart-to-heart: Using Xenopus to uncover the genetic basis of congenital heart disease. Curr Pathobiol Rep. 2017;5:187–196. doi: 10.1007/s40139-017-0142-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Bhattacharya D, Marfo CA, Li D, Lane M, Khokha MK. CRISPR/Cas9: An inexpensive, efficient loss of function tool to screen human disease genes in Xenopus. Dev Biol. 2015;408:196–204. doi: 10.1016/j.ydbio.2015.11.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Moreno-Mateos MA, Vejnar CE, Beaudoin JD, Fernandez JP, Mis EK, Khokha MK, Giraldez AJ. CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. Nat Methods. 2015;12:982–988. doi: 10.1038/nmeth.3543. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Deniz E, Jonas S, Hooper M, Griffin NJ, Choma MA, Khokha MK. Analysis of craniocardiac malformations in Xenopus using optical coherence tomography. Sci Rep. 2017;7:42506. doi: 10.1038/srep42506. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Boskovski MT, Yuan S, Pedersen NB, Goth CK, Makova S, Clausen H, et al. The heterotaxy gene GALNT11 glycosylates Notch to orchestrate cilia type and laterality. Nature. 2013;504:456–459. doi: 10.1038/nature12723. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Del Viso F, Huang F, Myers J, Chalfant M, Zhang Y, Reza N, et al. Congenital heart disease genetics uncovers context-dependent organization and function of nucleoporins at cilia. Dev Cell. 2016;38:478–492. doi: 10.1016/j.devcel.2016.08.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Griffin JN, Del Viso F, Duncan AR, Robson A, Hwang W, Kulkarni S, et al. RAPGEF5 regulates nuclear translocation of β-catenin. Dev Cell. 2018;44:248–260e4. doi: 10.1016/j.devcel.2017.12.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Ding Q, Lee YK, Schaefer EA, Peters DT, Veres A, Kim K, et al. A TALEN genome-editing system for generating human stem cell-based disease models. Cell Stem Cell. 2013;12:238–251. doi: 10.1016/j.stem.2012.11.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Ding Q, Regan SN, Xia Y, Oostrom LA, Cowan CA, Musunuru K. Enhanced efficiency of human pluripotent stem cell genome editing through replacing TALENs with CRISPRs. Cell Stem Cell. 2013;12:393–394. doi: 10.1016/j.stem.2013.03.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Burridge PW, Li YF, Matsa E, Wu H, Ong SG, Sharma A, et al. Human induced pluripotent stem cell-derived cardiomyocytes recapitulate the predilection of breast cancer patients to doxorubicin-induced cardiotoxicity. Nat Med. 2016;22:547–556. doi: 10.1038/nm.4087. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Aminkeng F, Bhavsar AP, Visscher H, Rassekh SR, Li Y, Lee JW, et al. A coding variant in RARG confers susceptibility to anthracycline-induced cardiotoxicity in childhood cancer. Nat Genet. 2015;47:1079–1084. doi: 10.1038/ng.3374. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Gaudelli NM, Komor AC, Rees HA, Packer MS, Badran AH, Bryson DI, Liu DR. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature. 2017;551:464–471. doi: 10.1038/nature24644. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Wambach JA, Yang P, Wegner DJ, Heins HB, Kaliberova LN, Kaliberov SA, et al. Functional characterization of ATP-binding cassette transporter A3 mutations from infants with respiratory distress syndrome. Am J Respir Cell Mol Biol. 2016;55:716–721. doi: 10.1165/rcmb.2016-0008OC. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Stitziel NO, Khera AV, Wang X, Bierhals AJ, Vourakis AC, Sperry AE, et al. ANGPTL3 deficiency and protection against coronary artery disease. J Am Coll Cardiol. 2017;69:2054–2063. doi: 10.1016/j.jacc.2017.02.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Majithia AR, Tsuda B, Agostini M, Gnanapradeepan K, Rice R, Peloso G, et al. Prospective functional classification of all possible missense variants in PPARG. Nat Genet. 2016;48:1570–1575. doi: 10.1038/ng.3700. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129:823–837. doi: 10.1016/j.cell.2007.05.009. [DOI] [PubMed] [Google Scholar]
  • 43.Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, et al. High-resolution mapping and characterization of open chromatin across the genome. Cell. 2008;132:311–322. doi: 10.1016/j.cell.2007.12.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Gaulton KJ, Nammo T, Pasquali L, Simon JM, Giresi PG, Fogarty MP, et al. A map of open chromatin in human pancreatic islets. Nat Genet. 2010;42:255–259. doi: 10.1038/ng.530. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013;10:1213–1218. doi: 10.1038/nmeth.2688. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Allis CD, Jenuwein T. The molecular hallmarks of epigenetic control. Nat Rev Genet. 2016;17:487–500. doi: 10.1038/nrg.2016.59. [DOI] [PubMed] [Google Scholar]
  • 47.ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Roadmap Epigenomics Consortium. Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, Ziller MJ, Amin V, Whitaker JW, Schultz MD, Ward LD, Sarkar A, Quon G, Sandstrom RS, Eaton ML, Wu YC, Pfenning AR, Wang X, Claussnitzer M, Liu Y, Coarfa C, Harris RA, Shoresh N, Epstein CB, Gjoneska E, Leung D, Xie W, Hawkins RD, Lister R, Hong C, Gascard P, Mungall AJ, Moore R, Chuah E, Tam A, Canfield TK, Hansen RS, Kaul R, Sabo PJ, Bansal MS, Carles A, Dixon JR, Farh KH, Feizi S, Karlic R, Kim AR, Kulkarni A, Li D, Lowdon R, Elliott G, Mercer TR, Neph SJ, Onuchic V, Polak P, Rajagopal N, Ray P, Sallari RC, Siebenthall KT, Sinnott-Armstrong NA, Stevens M, Thurman RE, Wu J, Zhang B, Zhou X, Beaudet AE, Boyer LA, De Jager PL, Farnham PJ, Fisher SJ, Haussler D, Jones SJ, Li W, Marra MA, McManus MT, Sunyaev S, Thomson JA, Tlsty TD, Tsai LH, Wang W, Waterland RA, Zhang MQ, Chadwick LH, Bernstein BE, Costello JF, Ecker JR, Hirst M, Meissner A, Milosavljevic A, Ren B, Stamatoyannopoulos JA, Wang T, Kellis M. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.van den Boogaard M, Smemo S, Burnicka-Turek O, Arnolds DE, van de Werken HJ, Klous P, et al. A common genetic variant within SCN10A modulates cardiac SCN5A expression. J Clin Invest. 2014;124:1844–1852. doi: 10.1172/JCI73140. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Fullwood MJ, Liu MH, Pan YF, Liu J, Xu H, Mohamed YB, et al. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature. 2009;462:58–64. doi: 10.1038/nature08497. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Simonis M, Klous P, Splinter E, Moshkin Y, Willemsen R, de Wit E, et al. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C) Nat Genet. 2006;38:1348–1354. doi: 10.1038/ng1896. [DOI] [PubMed] [Google Scholar]
  • 53.Jäger R, Migliorini G, Henrion M, Kandaswamy R, Speedy HE, Heindl A, et al. Capture Hi-C identifies the chromatin interactome of colorectal cancer risk loci. Nat Commun. 2015;6:6178. doi: 10.1038/ncomms7178. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Fang R, Yu M, Li G, Chee S, Liu T, Schmitt AD, Ren B. Mapping of long-range chromatin interactions by proximity ligation-assisted ChIP-seq. Cell Res. 2016;26:1345–1348. doi: 10.1038/cr.2016.137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Mumbach MR, Rubin AJ, Flynn RA, Dai C, Khavari PA, Greenleaf WJ, Chang HY. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat Methods. 2016;13:919–922. doi: 10.1038/nmeth.3999. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Dekker J, Belmont AS, Guttman M, Leshyk VO, Lis JT, Lomvardas S, et al. The 4D nucleome project. Nature. 2017;549:219–226. doi: 10.1038/nature23884. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Miller CL, Pjanic M, Wang T, Nguyen T, Cohain A, Lee JD, et al. Integrative functional genomics identifies regulatory mechanisms at coronary artery disease loci. Nat Commun. 2016;7:12092. doi: 10.1038/ncomms12092. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Nurnberg ST, Cheng K, Raiesdana A, Kundu R, Miller CL, Kim JB, et al. Coronary artery disease associated transcription factor TCF21 regulates smooth muscle precursor cells that contribute to the fibrous cap. Genom Data. 2015;5:36–37. doi: 10.1016/j.gdata.2015.05.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Kojima Y, Downing K, Kundu R, Miller C, Dewey F, Lancero H, et al. Cyclin-dependent kinase inhibitor 2B regulates efferocytosis and atherosclerosis. J Clin Invest. 2014;124:1083–1097. doi: 10.1172/JCI70391. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Miller CL, Anderson DR, Kundu RK, Raiesdana A, Nürnberg ST, Diaz R, et al. Disease-related growth factor and embryonic signaling pathways modulate an enhancer of TCF21 expression at the 6q23.2 coronary heart disease locus. PLoS Genet. 2013;9:e1003652. doi: 10.1371/journal.pgen.1003652. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Maurano MT, Haugen E, Sandstrom R, Vierstra J, Shafer A, Kaul R, Stamatoyannopoulos JA. Large-scale identification of sequence variants influencing human transcription factor occupancy in vivo. Nat Genet. 2015;47:1393–1401. doi: 10.1038/ng.3432. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Pjanic M, Miller CL, Wirka R, Kim JB, DiRenzo DM, Quertermous T. Genetics and genomics of coronary artery disease. Curr Cardiol Rep. 2016;18:102. doi: 10.1007/s11886-016-0777-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Miller CL, Pjanic M, Quertermous T. From locus association to mechanism of gene causality: the devil is in the details. Arterioscler Thromb Vasc Biol. 2015;35:2079–2080. doi: 10.1161/ATVBAHA.115.306366. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Melnikov A, Murugan A, Zhang X, Tesileanu T, Wang L, Rogov P, et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat Biotechnol. 2012;30:271–277. doi: 10.1038/nbt.2137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Patwardhan RP, Hiatt JB, Witten DM, Kim MJ, Smith RP, May D, et al. Massively parallel functional dissection of mammalian enhancers in vivo. Nat Biotechnol. 2012;30:265–270. doi: 10.1038/nbt.2136. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66. Ulirsch JC, Nandakumar SK, Wang L, Giani FC, Zhang X, Rogov P, et al. Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell. 2016;165:1530–1545. doi: 10.1016/j.cell.2016.04.048.
  • 67. Cho MH, Boutaoui N, Klanderman BJ, Sylvia JS, Ziniti JP, Hersh CP, et al. Variants in FAM13A are associated with chronic obstructive pulmonary disease. Nat Genet. 2010;42:200–202. doi: 10.1038/ng.535.
  • 68. Hobbs BD, de Jong K, Lamontagne M, Bossé Y, Shrine N, Artigas MS, et al. Genetic loci associated with chronic obstructive pulmonary disease overlap with loci for lung function and pulmonary fibrosis. Nat Genet. 2017;49:426–432. doi: 10.1038/ng.3752.
  • 69. Jiang Z, Lao T, Qiu W, Polverino F, Gupta K, Guo F, et al. A chronic obstructive pulmonary disease susceptibility gene, FAM13A, regulates protein stability of β-catenin. Am J Respir Crit Care Med. 2016;194:185–197. doi: 10.1164/rccm.201505-0999OC.
  • 70. Canver MC, Smith EC, Sher F, Pinello L, Sanjana NE, Shalem O, et al. BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis. Nature. 2015;527:192–197. doi: 10.1038/nature15521.
  • 71. Sankaran VG, Menne TF, Xu J, Akie TE, Lettre G, Van Handel B, et al. Human fetal hemoglobin expression is regulated by the developmental stage-specific repressor BCL11A. Science. 2008;322:1839–1842. doi: 10.1126/science.1165409.
  • 72. Evans CM, Fingerlin TE, Schwarz MI, Lynch D, Kurche J, Warg L, et al. Idiopathic pulmonary fibrosis: a genetic disease that involves mucociliary dysfunction of the peripheral airways. Physiol Rev. 2016;96:1567–1591. doi: 10.1152/physrev.00004.2016.
  • 73. Smemo S, Tena JJ, Kim KH, Gamazon ER, Sakabe NJ, Gómez-Marín C, et al. Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature. 2014;507:371–375. doi: 10.1038/nature13138.

RESOURCES