Skip to main content
Nucleic Acids Research logoLink to Nucleic Acids Research
. 2019 May 20;47(W1):W106–W113. doi: 10.1093/nar/gkz327

RegulationSpotter: annotation and interpretation of extratranscriptic DNA variants

Jana Marie Schwarz 1,2,3,, Daniela Hombach 2,3, Sebastian Köhler 2,4,5, David N Cooper 6, Markus Schuelke 1,3, Dominik Seelow 2,4
PMCID: PMC6602480  PMID: 31106382

Abstract

RegulationSpotter is a web-based tool for the user-friendly annotation and interpretation of DNA variants located outside of protein-coding transcripts (extratranscriptic variants). It is designed for clinicians and researchers who wish to assess the potential impact of the considerable number of non-coding variants found in Whole Genome Sequencing runs. It annotates individual variants with underlying regulatory features in an intuitive way by assessing over 100 genome-wide annotations. Additionally, it calculates a score, which reflects the regulatory potential of the variant region. Its dichotomous classifications, ‘functional’ or ‘non-functional’, and a human-readable presentation of the underlying evidence allow a biologically meaningful interpretation of the score. The output shows key aspects of every variant and allows rapid access to more detailed information about its possible role in gene regulation. RegulationSpotter can either analyse single variants or complete VCF files. Variants located within protein-coding transcripts are automatically assessed by MutationTaster as well as by RegulationSpotter to account for possible intragenic regulatory effects. RegulationSpotter offers the possibility of using phenotypic data to focus on known disease genes or genomic elements interacting with them. RegulationSpotter is freely available at https://www.regulationspotter.org.

INTRODUCTION

In the general search for disease mutations, Whole Genome Sequencing (WGS) is steadily gaining ground. In contrast to Whole Exome Sequencing (WES), it also detects variants within promoters and enhancers, while reducing enrichment problems and artefacts (1). However, the nature and sheer number of variants discovered by WGS pose new challenges for the identification of causal mutations. Whilst the prediction of the effect of variants leading to amino acid substitutions is now relatively straightforward, non-coding variants are much harder to classify. At present, several prediction tools such as GWAVA (2), CADD (3), deepSEA (4), or the REMM score of Genomiser (5) are able to assess ‘extratranscriptic’ variants located outside of transcripts. A recent publication by Rojano and colleagues summarises currently available prediction tools with their advantages and limitations (6). One major drawback of these programs is that they provide results in the form of scores instead of biologically meaningful annotations that are critical for our target audience, clinicians and life scientists. This is inherently problematic since the expertise of the latter groups is indispensable for the determination of the molecular cause of inherited diseases (7,8). A recent study by Shyr et al. (9) concluded that the ‘successful adoption of a clinical WES/WGS system is heavily dependent on its ability to address the diverse requirements of specialists in distinct healthcare domains’. They thus propose software interfaces specifically tailored to the needs of different professional groups. Most clinicians prefer graphical interfaces and the limiting of the displayed data to those particular features which are most relevant to their questions (9). In addition, these features must be represented in a meaningful, comprehensive fashion, not as a battery of raw scores. Tools such as Ensembl's Variant Effect Predictor (10) (VEP) or RegulomeDB (11) offer a higher degree of human readable annotation, but do not allow dedicated filtering strategies focused on phenotypic features or candidate genes. We have therefore developed RegulationSpotter, a web-based and user-friendly software for the rapid and convenient annotation and analysis of extratranscriptic DNA variants. In a novel approach to analysing regulatory variants, we focus on the human-readable presentation of the underlying biological data combined with an annotation-based score, thereby rendering RegulationSpotter usable to those clinicians and researchers who lack bioinformatics skills but who still strive to make sense of large sequencing data on their own.

METHODS

Software implementation and data integration

RegulationSpotter runs on a 48-CPU system with 512 GB RAM under Linux (CentOS 6). All data used by RegulationSpotter are physically integrated and stored in a PostgreSQL 9.5 database. RegulationSpotter program scripts are written in Perl (version 5.10) and run on an Apache 2.2 web server with HTTPS web protocol. All user interfaces are written in HTML with usage of JavaScript functions and were thoroughly tested for the Firefox browser under Linux, MacOS and Microsoft Windows. Additional testing involves Google Chrome and Safari. We employ TORQUE (version 4.2) as our job scheduling system.

Training data

We set up two different data sets with extratranscriptic variants (SNVs and InDels) to deduce the biological and clinical relevance of integrated regulatory features. The positive data set (P1) contains 457 extratranscriptic disease mutations (tag DM) from the Professional version of HGMD (HGMD Pro®, build 2018/1) (12) and the Genomiser publication (5). The negative data set (N1) comprises 8,000 randomly chosen common polymorphisms from the 1000 Genomes Project (13), all present in the homozygous state in more than 10 individuals. Further information about the generation of the data sets is given in the Supplementary Material.

RESULTS

RegulationSpotter is aimed at clinicians and life scientists

In order to address the need for a clinician-friendly software aimed at the analysis of deep sequencing projects, we have developed RegulationSpotter. Our tool is web-based and performs a comprehensive annotation of single base exchanges and short InDels. RegulationSpotter accepts VCF 4.1 files without any size limitation (e.g. complete WGS runs) and provides an in-depth annotation of all variants chosen to be analysed.

Table 1 summarises the functionalities of RegulationSpotter and comparable tools (see Discussion for details).

Table 1.

Overview of RegulationSpotter core features and comparable tools. (1) Helpful in this context means any effort to prepare and present the results in a structured or graphical way that helps the user to understand the numerous annotations. (2) This means that all types of variants (known, unknown, SNVs, short InDels) found by Whole Genome Sequencing can be submitted and analysed in a single run and without the need for prior processing of the format or the file size. (3) This refers to the possibility to restrict the analysis to variants residing in candidate genes or their associated regulatory elements such as promoters and distant enhancers or silencers. (✓) CADD scores are only available for a limited selection of short InDels

graphic file with name gkz327tbl1.jpg

RegulationSpotter integrates more than 100 distinct genomic features

RegulationSpotter integrates data on gene regulation from different publicly available resources (122 different features) to annotate extratranscriptic variants. We include various tracks from the Ensembl Regulatory Build (14) version 37/91, such as promoters, promoter flanking regions, enhancers, CTCF binding sites, transcription factor binding sites (TFBS) and open chromatin regions. Apart from integrating these precomputed regulatory features, we further processed and merged available annotation tracks to generate customised, refined annotations (e.g. promoters with epigenetic marks which suggest activity detected in at least three cell lines). The Ensembl Regulatory Build also comprises enhancers from VISTA (15) and promoter and TSS (transcription start site) annotations from the FANTOM5 project (16). In order to link distant modifiers such as enhancers to promoters, we use genome-wide interaction data from Hi-C (17) and ENCODE (18,19) 5C (20) experiments and FANTOM5 enhancer-TSS associations analysed with CAGE (21). Moreover, the degree of evolutionary conservation via PhastCons (22) and PhyloP (23) is also included (see Table S1 in the Supplementary Material for details on integrated data).

RegulationSpotter facilitates the analysis of Whole Genome Sequencing data

RegulationSpotter handles genotype data in VCF format but can also analyse single variants on-the-fly. Single queries are entered via chromosome, position, reference base and altered base. In order to streamline the analysis of data from WGS projects, the software determines for every variant (SNV or InDel), whether it is located within a protein coding transcript or outside, i.e. extratranscriptic. Irrespective of the chosen interface (single variant or VCF file analysis), variants within protein coding transcripts are automatically analysed by MutationTaster (24), our previously published software to predict the disease potential of intragenic variants, as well as by RegulationSpotter, in order to account for a potential intratranscriptic regulatory impact. Extratranscriptic variants are analysed solely by RegulationSpotter. Figure 1 depicts the numerous functional aspects covered by RegulationSpotter and MutationTaster, which can all be assessed in a single analysis run. In VCF file mode, users can initially restrict the analysis to candidate genes or regions, thereby selecting only those variants, which either reside within these genes, their promoter regions, or within interacting cis-regulatory regions. In addition, they can choose to exclude variants found in population based databases (currently 1000G and ExAC) with user-specified counts of carriers and/or homozygous carriers. This drastically reduces the run time to about 6–12 hours per genome (depending on server load). Submitted variants (along with the genotype and coverage) and the analysis results are stored in a database. Variants, which have already been analysed, are not re-analysed if uploaded in another project, saving more time. Re-analysis of the same genome, e.g. after a change in the alignment or variant calling pipeline, usually takes less than 10 minutes.

Figure 1.

Figure 1.

Overview of features that can be assessed in a single RegulationSpotter VCF analysis run. Depending upon a variant's localisation, different aspects are analysed either by RegulationSpotter or MutationTaster. (*) It should be noted that intragenic variants are always additionally analysed by RegulationSpotter to account for the possible regulatory effects of e.g. non-coding variants. UTR: untranslated region; CDS: coding sequence; TSS: transcription start site; NMD: nonsense-mediated mRNA decay.

After the analysis, an interface is displayed to select variants, export the data or delete a project (see Supplementary Figure S2). It also gives information on the number of variants that were processed and provides access to the variants excluded. Results can then be downloaded as simple text files, or watched directly online.

To reduce the number of variants to be further inspected, users can easily narrow them down to those located in user-defined genomic regions or affecting candidate genes. These candidate genes can either be entered manually or suggested by RegulationSpotter if the user enters the patient's phenotype, as clinical diagnoses (via OMIM (25) or OrphaNet (26)) or by their clinical symptoms (via the Human Phenotype Ontology, HPO (27)).

A summary table provides a quick overview of the variants meeting the display criteria (Figure 2 and Supplementary Figure S3). The most relevant functional aspects are displayed in a colour-coded matrix, either as on/off for dichotomic elements (e.g. location within a promoter) or as a colour grade for continuous and discontinuous values (such as conservation or allele frequencies). Hyperlinks guide the user to detailed per-variant results (see Supplementary Figure S4) showing key aspects of every variant and allowing rapid access to more detailed information about its possible role in gene regulation. By grouping different regulatory annotations together by their probable biological role, RegulationSpotter offers intuitive access to the sometimes complex regulatory element landscape. Users can easily generate hyperlinks to RegulationSpotter results, RegulationSpotter can therefore also be employed as a variant visualization tool in other software. Analysis results are stored in our database for at least three weeks, but can also be deleted earlier or stored longer upon the user's request.

Figure 2.

Figure 2.

Screenshot of the colour-coded results matrix. Variants chosen to be displayed are organised in a summary table (left part) and in a colour-coded matrix (right part) in order to allow the rapid overview of every variant. Users can follow hyperlinks to study every variant in further detail.

Phenotypic information can help to find the disease mutation

Apart from directly entering candidate genes or regions to filter variants before or after analysis, users can in the variant selection interface also specify a clinical diagnosis, disease (via OMIM or Orphanet) or clinical features (via HPO) in order to restrict the displayed variants to those residing in disease- or symptom-related candidate genes. The interface offers a text area with autocompletion functionality where entries from OMIM, Orphanet or the HPO automatically pop up while typing in a disease or symptom. Afterwards, a list of associated candidate genes is shown. Users can adjust a threshold to apply a more or less rigid filtering. Variants within distant regulatory regions such as enhancers are also displayed if the regulatory element is linked to a candidate gene. The connection between distant regulatory elements and associated genes is drawn from HiC and 5C data (see Supplementary Material).

Human-readable evidence for the disease potential of a variant

The results page for a single variant provides details about genomic features present within the region in which the variant is located. Various annotations retrieved from different sources are grouped depending on their assumed role in gene regulation (e.g. promoter and enhancer features), in order to facilitate their interpretation. Hyperlinks to external sources such as the Ensembl Genome Browser (28), the ZENBU genome browser (29) or our web-based tool ePOSSUM (30) (for studying the effects of variants in TFBS) allow quick and easy deeper investigation. Data about genomic interactions are additionally visualised in a plot which shows involved genes or transcripts as well as genomic loci associated with these genes and where on a genomic scale these are located (see Supplementary Figure S5).

Since regulatory information is mostly available for different cell types, we grouped them together into more intuitive colour-coded tissue groups, so that researchers can easily decide if a variant has annotations in their tissue of interest. Although much information is presented, the design allows clinicians and researchers to obtain a rapid yet thorough understanding of a variant's localisation, genomic context, and possible role in regulating disease-relevant genes. By offering annotations from a multitude of different data sources, it saves users from having to manually collect this information on their own.

Indication of functional relevance of extratranscriptic regions

Apart from annotating variants, RegulationSpotter provides a score (region score), which gauges the functional relevance of the region within which a given variant is located. This score is generated using a subset of 77 features, which showed discriminative power in a testset of known extratranscriptic disease mutations and putatively neutral polymorphisms (see Methods and Supplementary Material for test set assembly and feature selection). In contrast to similar scores generated by other programs, the region score is not generated by a classifier. Instead of applying machine-learning algorithms and risk overfitting to the low number of known extratranscriptic disease mutations, we carried out an initial feature selection and weight determination based on current biological knowledge. We then adjusted the region score on the basis of our preconception as to the effect of the different features and their relative risk ratio of appearing in either a set of 457 extratranscriptic functional variants from HGMD (12) and the Genomiser publication (data set P1) or in 8,000 common polymorphisms from the 1000G (data set N1; see Supplementary Figure S1 for relative risks). This strategy allowed us to limit score-relevant features to those likely to be most meaningful in terms of their biological role (see Supplementary Tables S1 and S2 for exact relative risks and feature weights). To facilitate the interpretation of the region score, we provide a colour-coded translation of the score and the functional relevance of the region. Blue indicates that the region is probably not functional in terms of gene regulation, whereas red encodes likely functional regions. Additionally, we distinguish between poor evidence (pale colour) and strong evidence (strong colour). Knowledge about the functional annotation can help to deduce the relevance of a variant in an intuitive way. However, owing to the relatively small number of known disease mutations that were available for training, we do not assess the effect of the variant itself, only the likely relevance of the genomic region. Intragenic variants, which can be analysed by MutationTaster, are classified as disease causing or polymorphism. Known polymorphisms found in the homozygous state and disease mutations from ClinVar are automatically recognised and classified. The Supplementary Material provides further information on region score generation and integrated features.

DISCUSSION

RegulationSpotter is a web-based tool for the convenient and streamlined analysis and interpretation of DNA variants from high-throughput sequencing projects. Since it is intended to aid clinicians and life scientists in the interpretation of complex WGS data, our main focus lies on the usability and comprehensive presentation of the results. Our aim has been to offer a broad range of regulatory annotations while also supporting the user in making sense of the presented data. The region score can help in sorting variants according to the amount of regulatory knowledge about the region they are located in. By allowing dynamic post-analysis filtering for candidate genes and regions, researchers can incorporate their clinical or biological knowledge, which is indispensable in the process of identifying likely disease variants. With the combination of (a) offering extensive annotation of complete VCF files from WGS, (b) easily understandable user interfaces and presentation of results and (c) taking into account previous knowledge about biological or disease-related questions, RegulationSpotter stands apart from classical effect prediction tools (see Table 1). The different aspects featured in the table are in our opinion highly relevant for clinicians and life scientists. They help to assess the possible involvement of a regulatory variant in the clinical phenotypes of their patients. For example, a presentation of results that goes beyond the simple listing of annotated features is desirable to facilitate our understanding of the numerous annotations. This can be achieved by graphical display or the grouping of different features in the context of their role in gene regulation. A transparent score facilitates understanding of the assumed functional impact of a variant. The possibility to connect clinical features and diseases to candidate genes, and the subsequent search for variants affecting these, takes into account the clinical expertise of physicians. Simple interfaces and web-based access are crucially important in a clinical setting where software often must not be installed locally. The analysis of all types of variants, going beyond SNVs, guarantees a seamless analysis with reduced effort. Taken together, we consider that RegulationSpotter offers a unique combination of services relevant to our target audience.

We developed the software in an iterative manner and in close collaboration with users, thereby maximising adaption to their needs. By accessing 122 different annotation features from a multitude of sources, RegulationSpotter spares the user from having to collect data on their own and significantly facilitates the in-depth study of potential regulatory variants. Instead of randomly displaying all available annotations, we group the single features together depending on their assumed role in gene regulation. Promoters and enhancers are key regulatory elements in gene expression (reviewed in (31–34)). Typical epigenetic marks for active promoters are trimethylation of H3K4 (H3K4me3) (35) and hypersensitivity to DNaseI (DNaseI hypersensitivity sites, DHS) (36,37), which are both largely invariant across different cell types (37,38). Active enhancers are characterised by monomethylation of H3K4 (H3K4me1) (35), acetylation of H3K27 (H3K27ac) (39) and hypersensitivity to DNaseI (36,37) and are highly specific for different cell types (38). Moreover, these marks have a significant predictive potential in terms of the functional relevance of extratranscriptic DNA sequence variants (40,41). In addition to integrating these precomputed multicell regulatory features from the Ensembl Regulatory build, which are rather broadly annotated due to their cell type-unspecific nature, we highlight H3K4me3 modifications and DHSs overlapping in at least three cell types as relevant custom annotation for active promoters. Enhancers are denoted as either active or poised in certain cell lines / types.

To facilitate further investigations, RegulationSpotter also provides hyperlinks, e.g. to our web-based tool ePOSSUM for studying the effects of variants in TFBS or to our candidate gene search engine GeneDistiller (42). To make maximum use of their own knowledge, users can select diseases and symptoms from OMIM, Orphanet and the HPO to automatically restrict their results to those variants residing in genes or regulatory elements known to cause the disease or phenotype in question. The degree of matching between gene and phenotype does not however affect the scoring. The implementation of a machine-learning algorithm that takes into account phenotypic information is currently hampered by the paucity of solved cases with a known disease-causing mutation and comparably deep and reliable phenotyping. With advances in the area of patient phenotyping and digital health recording, these shortcomings may be overcome in the future.

RegulationSpotter offers annotation and a sorting functionality based on the amount of annotation known for a certain genetic locus. It does not offer prediction of the functional relevance of a variant itself. The low number of functionally validated extratranscriptic disease mutations, most of which affect highly conserved nucleotides, may not truly reflect the majority of extratranscriptic disease mutations. These training mutations mostly exert a strong effect and do not reliably model lesser effects which might become deleterious in synergy with other variants (43). The lack of representative training data is also due to the fact that time- and cost-intensive validation studies such as mouse models are usually only conducted for variants with a clear indication of disease relevance, e.g. deletion of a TFBS or very high conservation. We therefore recommend critically assessing the functional evidence underlying any scores and for this reason offer comprehensive annotations in the first place and only as an addition the region score. We display all the integrated data and the respective sub-scores used to generate it, thereby allowing the user to understand the results without the need for further research on background information. Similar to the output of comparable programs, the region score should not be treated as an absolute criterion for the disease potential of a given variant, but rather should be used as an indicator to prioritize extratranscriptic variants according to their regulatory potential.

As mentioned, RegulationSpotter evaluates variants only based on positional effects, i.e. the available annotation at the locus, rather than considering the nature of the change itself. This is, however, also a common limitation of other available programs. It is known that single base exchanges or small InDels in promoter regions may lead to altered gene expression and ultimately to disease (44,45). However, to our knowledge, there is so far no systematic study, which investigated where, within the promoter, disease mutations preferentially reside.

It remains unclear how the exact nature of a DNA change within enhancer regions interferes with gene regulation. Recent studies in mice addressed the question whether mammalian enhancers typically act in an additive manner or if the regulation of one gene by several enhancers might serve as a kind of redundancy backup (46). Owing to enhancer redundancy, which appears to be a widespread feature of mammalian genomes, deleterious consequences due to changes in only a single enhancer may be prevented. On the other hand, there are so-called ultraconserved enhancers, where variation to just one single enhancer may result in a clear phenotype (47). A recent study raised the point that the still fragmentary knowledge and limited amount of representative training data might currently limit the performance of reliable algorithms that are capable of predicting the outcome of all kinds of regulatory DNA variant (43). More data and deeper annotation are needed for us to be able to deduce general rules about the functional consequences of variants located in distant regulatory regions.

With its focus on user-friendliness and comprehensible presentation of integrated data, RegulationSpotter can help to reduce the number of variants that require follow-up investigation, and we believe that it will become a valuable resource in human genomics research. As new knowledge and more data emerge, we shall be able to continually update and extend the data used by RegulationSpotter and optimise the computation of the region score to offer even better interpretation and annotation of extratranscriptic variants in the future.

DATA AVAILABILITY

RegulationSpotter is freely available at https://www.regulationspotter.org. No login is required. We provide a thorough documentation along with a tutorial on our website. With simple hyperlinks (position and alleles), RegulationSpotter can easily be used as a downstream application for WGS analysis.

RegulationSpotter accepts single-sample VCF files in VCF 4.1 format as well as single variants in VCF-like notation. Analysis of a WGS project with 3.5 million variants takes ∼4–12 h, depending on the server load. This length of time can be drastically reduced by filtering out common polymorphisms or confining the analysis to candidate genes, their promoters and interacting regions. These options are available in our upload interface. Uploaded data are available only via a unique secret URL, which is displayed to our users during the upload process. We strongly recommend to zip large VCF files prior to upload to reduce the upload time, which might be long, depending on the internet speed (e.g. the upload of 1GB at an upload speed of 5 Mbps takes approximately 30 min). The data are automatically deleted from the webserver after 3 weeks unless users actively delete their project or request an extension by E-mail.

Supplementary Material

gkz327_Supplemental_Files

ACKNOWLEDGEMENTS

The authors thank Dr Olivia Ebner for her previous support with updating the online documentation. We also thank the ENCODE Consortium and the ENCODE production laboratory generating the 5C datasets.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Deutsche Forschungsgemeinschaft [SE-2273/1-1 to D.S., SFB 665 TP C4 to M.S.]; Stiftung Charité [BIH_PRO_313 to D.S.]; E-RARE project Hipbi-RD [01GM1608 to D.S. and S.K.]; Studienstiftung des Deutschen Volkes (to D.H.); H2020 research project Solve-RD [779257 to S.K.]; NeuroCure under Germany's Excellence Strategy [EXC-2049-390688087 to M.S.]; Stiftung Industrieforschung, the Rolf Schildmann und Luzia Balsfulland Stiftung and the Nachwuchskommission of the Charité Berlin (Rahel-Hirsch scholarship) (to J.M.S.). We acknowledge support from the German Research Foundation (DFG) and the Open Access Publication Fund of Charité – Universitätsmedizin Berlin.

Conflict of interest statement. None declared.

REFERENCES

  • 1. Shen T., Lee A., Shen C., Lin C.J.. The long tail and rare disease research: the impact of next-generation sequencing for rare Mendelian disorders. Genet. Res. 2015; 97:doi:10.1017/S0016672315000166. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Ritchie G.R.S., Dunham I., Zeggini E., Flicek P.. Functional annotation of noncoding sequence variants. Nat. Methods. 2014; 11:294–296. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Kircher M., Witten D.M., Jain P., O’Roak B.J., Cooper G.M., Shendure J.. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 2014; 46:310–315. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Zhou J., Troyanskaya O.G.. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods. 2015; 12:931–934. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Smedley D., Schubach M., Jacobsen J.O.B., Köhler S., Zemojtel T., Spielmann M., Jäger M., Hochheiser H., Washington N.L., McMurry J.A. et al.. A whole-genome analysis framework for effective identification of pathogenic regulatory variants in mendelian disease. Am. J. Hum. Genet. 2016; 99:595–606. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Rojano E., Seoane P., Ranea J.A.G., Perkins J.R.. Regulatory variants: from detection to predicting impact. Brief. Bioinform. 2018; doi:10.1093/bib/bby039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Biesecker L.G., Burke W., Kohane I., Plon S.E., Zimmern R.. Next generation sequencing in the clinic: Are we ready. Nat. Rev. Genet. 2012; 13:818–824. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Lecroq T., Soualmia LF.. Managing large-scale genomic datasets and translation into clinical practice. Yearb. Med. Inform. 2014; 9:212–214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Shyr C., Kushniruk A., van Karnebeek C.D.M., Wasserman W.W.. Dynamic software design for clinical exome and genome analyses: insights from bioinformaticians, clinical geneticists, and genetic counselors. J. Am. Med. Inform. Assoc. JAMIA. 2016; 23:257–268. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. McLaren W., Pritchard B., Rios D., Chen Y., Flicek P., Cunningham F.. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinforma. Oxf. Engl. 2010; 26:2069–2070. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Boyle A.P., Hong E.L., Hariharan M., Cheng Y., Schaub M.A., Kasowski M., Karczewski K.J., Park J., Hitz B.C., Weng S. et al.. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012; 22:1790–1797. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Stenson P.D., Mort M., Ball E.V., Evans K., Hayden M., Heywood S., Hussain M., Phillips A.D., Cooper D.N.. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum. Genet. 2017; 136:665–677. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. The 1000 Genomes Project Consortium A global reference for human genetic variation. Nature. 2015; 526:68–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Zerbino D.R., Wilder S.P., Johnson N., Juettemann T., Flicek P.R.. The Ensembl regulatory build. Genome Biol. 2015; 16:56. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Visel A., Minovitsky S., Dubchak I., Pennacchio L.A.. VISTA Enhancer Browser–a database of tissue-specific human enhancers. Nucleic Acids Res. 2007; 35:D88–D92. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. FANTOM Consortium and the RIKEN PMI and CLST (DGT) Forrest A.R.R., Kawaji H., Rehli M., Baillie J.K., de Hoon M.J.L., Haberle V., Lassmann T., Kulakovskiy I.V. et al.. A promoter-level mammalian expression atlas. Nature. 2014; 507:462–470. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Rao S.S.P., Huntley M.H., Durand N.C., Stamenova E.K., Bochkov I.D., Robinson J.T., Sanborn A.L., Machol I., Omer A.D., Lander E.S. et al.. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014; 159:1665–1680. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. ENCODE Project Consortium An integrated encyclopedia of DNA elements in the human genome. Nature. 2012; 489:57–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Sloan C.A., Chan E.T., Davidson J.M., Malladi V.S., Strattan J.S., Hitz B.C., Gabdank I., Narayanan A.K., Ho M., Lee B.T. et al.. ENCODE data at the ENCODE portal. Nucleic Acids Res. 2016; 44:D726–D732. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Sanyal A., Lajoie B.R., Jain G., Dekker J.. The long-range interaction landscape of gene promoters. Nature. 2012; 489:109–113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Andersson R., Gebhard C., Miguel-Escalada I., Hoof I., Bornholdt J., Boyd M., Chen Y., Zhao X., Schmidl C., Suzuki T. et al.. An atlas of active enhancers across human cell types and tissues. Nature. 2014; 507:455–461. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Siepel A., Bejerano G., Pedersen J.S., Hinrichs A.S., Hou M., Rosenbloom K., Clawson H., Spieth J., Hillier L.W., Richards S. et al.. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005; 15:1034–1050. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Pollard K.S., Hubisz M.J., Rosenbloom K.R., Siepel A.. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 2010; 20:110–121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Schwarz J.M., Cooper D.N., Schuelke M., Seelow D.. MutationTaster2: mutation prediction for the deep-sequencing age. Nat. Methods. 2014; 12:361–362. [DOI] [PubMed] [Google Scholar]
  • 25. McKusick V.A. Mendelian inheritance in man and its online version, OMIM. Am. J. Hum. Genet. 2007; 80:588–604. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Rath A., Olry A., Dhombres F., Brandt M.M., Urbero B., Ayme S.. Representation of rare diseases in health information systems: the Orphanet approach to serve a wide range of end users. Hum. Mutat. 2012; 33:803–808. [DOI] [PubMed] [Google Scholar]
  • 27. Köhler S., Carmody L., Vasilevsky N., Jacobsen J.O.B., Danis D., Gourdine J.-P., Gargano M., Harris N.L., Matentzoglu N., McMurry J.A. et al.. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Res. 2019; 47:D1018–D1027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Aken B.L., Ayling S., Barrell D., Clarke L., Curwen V., Fairley S., Fernandez Banet J., Billis K., García Girón C., Hourlier T. et al.. The Ensembl gene annotation system. Database. 2016; 2016:doi:10.1093/database/baw093. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Severin J., Lizio M., Harshbarger J., Kawaji H., Daub C.O., Hayashizaki Y. FANTOM Consortium . FANTOM Consortium Bertin N., Forrest A.R.R.. Interactive visualization and analysis of large-scale sequencing datasets using ZENBU. Nat. Biotechnol. 2014; 32:217–219. [DOI] [PubMed] [Google Scholar]
  • 30. Hombach D., Schwarz J.M., Robinson P.N., Schuelke M., Seelow D.. A systematic, large-scale comparison of transcription factor binding site models. BMC Genomics. 2016; 17:388. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Lenhard B., Sandelin A., Carninci P.. Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nat. Rev. Genet. 2012; 13:233–245. [DOI] [PubMed] [Google Scholar]
  • 32. Bulger M., Groudine M.. Enhancers: the abundance and function of regulatory sequences beyond promoters. Dev. Biol. 2010; 339:250–257. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Coppola C.J., Ramaker C.R., Mendenhall E.M.. Identification and function of enhancers in the human genome. Hum. Mol. Genet. 2016; 25:R190–R197. [DOI] [PubMed] [Google Scholar]
  • 34. Plank J.L., Dean A.. Enhancer function: mechanistic and genome-wide insights come together. Mol. Cell. 2014; 55:5–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Heintzman N.D., Stuart R.K., Hon G., Fu Y., Ching C.W., Hawkins R.D., Barrera L.O., Van Calcar S., Qu C., Ching K.A. et al.. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 2007; 39:311–318. [DOI] [PubMed] [Google Scholar]
  • 36. Thurman R.E., Rynes E., Humbert R., Vierstra J., Maurano M.T., Haugen E., Sheffield N.C., Stergachis A.B., Wang H., Vernot B. et al.. The accessible chromatin landscape of the human genome. Nature. 2012; 489:75–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Xi H., Shulha H.P., Lin J.M., Vales T.R., Fu Y., Bodine D.M., McKay R.D.G., Chenoweth J.G., Tesar P.J., Furey T.S. et al.. Identification and characterization of cell type–specific and ubiquitous chromatin regulatory structures in the human genome. PLoS Genet. 2007; 3:e136. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Heintzman N.D., Hon G.C., Hawkins R.D., Kheradpour P., Stark A., Harp L.F., Ye Z., Lee L.K., Stuart R.K., Ching C.W. et al.. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature. 2009; 459:108–112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Creyghton M.P., Cheng A.W., Welstead G.G., Kooistra T., Carey B.W., Steine E.J., Hanna J., Lodato M.A., Frampton G.M., Sharp P.A. et al.. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl. Acad. Sci. U.S.A. 2010; 107:21931–21936. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Shihab H.A., Rogers M.F., Gough J., Mort M., Cooper D.N., Day I.N.M., Gaunt T.R., Campbell C.. An integrative approach to predicting the functional effects of non-coding and coding sequence variation. Bioinformatics. 2015; 31:1536–1543. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Khurana E., Fu Y., Colonna V., Mu X.J., Kang H.M., Lappalainen T., Sboner A., Lochovsky L., Chen J., Harmanci A. et al.. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science. 2013; 342:1235587. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Seelow D., Schwarz J.M., Schuelke M.. GeneDistiller—distilling candidate genes from linkage intervals. PLoS One. 2008; 3:e3874. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Drubay D., Gautheret D., Michiels S.. A benchmark study of scoring methods for non-coding mutations. Bioinforma. Oxf. Engl. 2018; 34:1635–1641. [DOI] [PubMed] [Google Scholar]
  • 44. Wu G., Shan J., Pang S., Wei X., Zhang H., Yan B.. Genetic analysis of the promoter region of the GATA4 gene in patients with ventricular septal defects. Transl. Res. J. Lab. Clin. Med. 2012; 159:376–382. [DOI] [PubMed] [Google Scholar]
  • 45. Chen H.I., Jagadeesh K.A., Birgmeier J., Wenger A.M., Guturu H., Schelley S., Bernstein J.A., Bejerano G.. An MTF1 binding site disrupted by a homozygous variant in the promoter of ATP7B likely causes Wilson Disease. Eur. J. Hum. Genet. EJHG. 2018; 26:1810–1818. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Osterwalder M., Barozzi I., Tissières V., Fukuda-Yuzawa Y., Mannion B.J., Afzal S.Y., Lee E.A., Zhu Y., Plajzer-Frick I., Pickle C.S. et al.. Enhancer redundancy provides phenotypic robustness in mammalian development. Nature. 2018; 554:239–243. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Dickel D.E., Ypsilanti A.R., Pla R., Zhu Y., Barozzi I., Mannion B.J., Khin Y.S., Fukuda-Yuzawa Y., Plajzer-Frick I., Pickle C.S. et al.. Ultraconserved enhancers are required for normal development. Cell. 2018; 172:491–499. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

gkz327_Supplemental_Files

Data Availability Statement

RegulationSpotter is freely available at https://www.regulationspotter.org. No login is required. We provide a thorough documentation along with a tutorial on our website. With simple hyperlinks (position and alleles), RegulationSpotter can easily be used as a downstream application for WGS analysis.

RegulationSpotter accepts single-sample VCF files in VCF 4.1 format as well as single variants in VCF-like notation. Analysis of a WGS project with 3.5 million variants takes ∼4–12 h, depending on the server load. This length of time can be drastically reduced by filtering out common polymorphisms or confining the analysis to candidate genes, their promoters and interacting regions. These options are available in our upload interface. Uploaded data are available only via a unique secret URL, which is displayed to our users during the upload process. We strongly recommend to zip large VCF files prior to upload to reduce the upload time, which might be long, depending on the internet speed (e.g. the upload of 1GB at an upload speed of 5 Mbps takes approximately 30 min). The data are automatically deleted from the webserver after 3 weeks unless users actively delete their project or request an extension by E-mail.


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

RESOURCES