Abstract
While great advances have been made in predicting the effects of coding variants, the assessment of non-coding variants remains challenging. This is especially problematic for variants within promoter regions, which can lead to over-expression of a gene or reduce or even abolish its expression. The binding of transcription factors to the DNA can be predicted using position weight matrices (PWMs). More recently, transcription factor flexible models (TFFMs) have been introduced and shown to be more accurate than PWMs. TFFMs are based on hidden Markov models and can account for complex positional dependencies. Our new web-based application FABIAN-variant uses 1224 TFFMs and 3790 PWMs to predict whether and to what degree DNA variants affect the binding of 1387 different human transcription factors. For each variant and transcription factor, the software combines the results of different models for a final prediction of the resulting binding-affinity change. The software is written in C++ for speed, but variants can be entered through a web interface. Alternatively, a VCF file can be uploaded to assess variants identified by high-throughput sequencing. The search can be restricted to variants in the vicinity of candidate genes. FABIAN-variant is freely available at https://www.genecascade.org/fabian/.
Graphical Abstract
FABIAN-variant: analysis of a promoter variant that has been reported to disable binding of the erythroid transcription factor GATA1 (chr1:155271258T>C).
INTRODUCTION
Any human individual harbours about 4 million short variants (single nucleotide variants and short indels) in their genome compared to the reference genome (1). Many computer programs are available for assessing the disease-causing potential of variants in coding regions (e.g. MutationTaster (2), PolyPhen (3), and SIFT (4)), but coding regions make up only about 1.5 % of the human genome. Much less is known about the effect of variants in the remaining non-coding DNA, and tools aiming at the prediction of such variants (e.g. CADD (5), Genomiser (6), and RegulationSpotter (7)) are hampered by the lack of known disease mutations outside of protein-coding genes as training cases.
Variants in regions essential for gene expression (especially promoters and enhancers) can alter the binding affinity of transcription factors, leading to up- or down-regulation of transcriptional activity (8). This may result in non-expression or severe under-expression of a gene and the subsequent loss of the encoded protein (null mutation), leading to disease (9–11).
The prediction of transcription factor binding sites (TFBSs) presents an ongoing challenge in computational biology (12,13). The standard method for assessing the binding affinity of a transcription factor to DNA in silico is to compare the DNA bases with a position weight matrix (PWM) specific for the transcription factor. A PWM model is obtained from an assumed common binding motif by counting adenine, cytosine, guanine, and thymine bases at each position in experimentally-confirmed binding sites for a transcription factor. Many thousands of PWM profiles have been published in open-access databases (e.g. JASPAR (14), HOCOMOCO (15), and SwissRegulon (16)).
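To make the counting step concrete, the following minimal sketch builds a count matrix from aligned binding sites, converts it to log-odds weights, and scores a sequence. It is a generic textbook construction, not the conversion used by any particular database or by FABIAN-variant; the example sites, the uniform background, and the pseudocount of 1 are assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def count_matrix(sites):
    """Count A, C, G and T at each position of equally long, aligned sites."""
    length = len(sites[0])
    counts = [{b: 0 for b in BASES} for _ in range(length)]
    for site in sites:
        for i, base in enumerate(site.upper()):
            counts[i][base] += 1
    return counts

def log_odds_pwm(counts, background=None, pseudocount=1.0):
    """Convert counts to log2-odds weights relative to a background distribution."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount
        pwm.append({
            b: math.log2((column[b] + pseudocount * background[b]) / total / background[b])
            for b in BASES
        })
    return pwm

def pwm_score(pwm, sequence):
    """Sum the per-position weights for a sequence as long as the PWM."""
    return sum(column[base] for column, base in zip(pwm, sequence.upper()))

# Made-up example sites (not taken from any database).
sites = ["AGATAA", "TGATAA", "AGATAG", "AGATAA"]
pwm = log_odds_pwm(count_matrix(sites))
print(pwm_score(pwm, "AGATAA"), pwm_score(pwm, "CCCCCC"))
```

Because each position contributes independently to the sum, a model of this kind cannot express dependencies between positions.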
PWMs are relatively simple models that ignore the positional dependencies that have been repeatedly observed in TFBSs (17–20). More advanced models have in many cases been shown to give better results in identifying experimentally verified binding sites (21–23). A number of alternative modelling approaches have been proposed, several of which attempt to integrate dependencies between adjacent and/or distant positions. These include the binding energy model (BEM) (24), dinucleotide weight matrices (DWMs) (25), and transcription factor flexible models (TFFMs) (22). Among these, TFFMs have been gaining visibility since their inclusion in the JASPAR database for transcription factors, which includes >1000 human TFFMs in its current release (14). TFFMs are based on hidden Markov models (HMMs) and can account for complex positional dependencies as well as variable-length nucleotide patterns. TFFM motifs are defined in terms of HMM states, transitions between states, initials, and emissions. Two types of TFFMs are commonly used and supported in FABIAN-variant: in first-order TFFMs, each position within a TFBS is represented by an HMM state emitting a nucleotide with probabilities dependent on the nucleotide found at the previous position. In detailed TFFMs, each HMM state in the first-order HMM is decomposed into four states (one per nucleotide) and transition probabilities reflect the emission probabilities of the first-order HMM (22). Like PWMs, TFFMs are derived from experimentally verified binding sites. Unlike PWMs, TFFMs cannot be evaluated using standard mathematical operations, but require dedicated algorithms for HMMs. Although several tools for evaluating PWMs exist (e.g. FIMO (26), motifbreakR (27), and Pscan (28)), we have only found one other web application that works with TFFMs (TFBSPred (29)). However, TFBSPred does not provide a mechanism for evaluating DNA variants.
This article introduces a new user-friendly web application for predicting the effects of DNA variants on transcription factor binding. FABIAN-variant offers 1224 TFFMs and 3790 PWMs from different databases for 1387 different human transcription factors. It has different modes for analysing single variants, lists of variants, or up to 10 000 variants from a VCF file. For each variant and transcription factor, FABIAN-variant evaluates the available models in the ‘wild-type’ and variant sequences and returns a combined score indicating whether and to what degree transcription factor binding may be affected. The backend is written in C++ for speed and most types of analysis are completed in just a few seconds. For VCF-based analyses, users can choose to be notified by email once the run completes. Results are visualised in the browser and can also be downloaded. Various filters for regions, genes, variants, and transcription factors are implemented (e.g. search in promoter regions of candidate genes, search with TFFMs only). Genome builds GRCh37 (hg19) and GRCh38 (hg38) are supported. FABIAN-variant is free and open to all users without login requirement.
SOFTWARE/BACKEND
Evaluation of TFFMs and PWMs
Different models for human TFBSs (TFFMs and PWMs) were downloaded from various data sources (Table 1) and imported into FABIAN-variant.
Table 1.
Data sources included in FABIAN-variant
Source | Data | URL | Reference |
---|---|---|---|
JASPAR 2022 | 612 detailed TFFMs | https://jaspar.genereg.net/ | (14) |
JASPAR 2022 | 612 first-order TFFMs | https://jaspar.genereg.net/ | (14) |
JASPAR 2022 | 877 PWMs | https://jaspar.genereg.net/ | (14) |
MotifDb 1.36.0 | * | https://doi.org/10.18129/B9.bioc.MotifDb | (41) |
CIS-BP 1.02 | 313 PWMs | http://cisbp.ccbr.utoronto.ca/ | (42) |
HOCOMOCO 11 | 768 PWMs | https://hocomoco11.autosome.org/ | (15) |
hPDI | 436 PWMs | http://bioinfo.wilmer.jhu.edu/PDI/ | (43) |
Jolma 2013 | 710 PWMs | https://doi.org/10.1016/j.cell.2012.12.009 | (44) |
SwissRegulon | 684 PWMs | https://swissregulon.unibas.ch/sr/ | (16) |
UniPROBE | 2 PWMs | http://the_brain.bwh.harvard.edu/uniprobe/ | (45) |
ENCODE 3 | 7,374,455 TFBSs† | https://www.encodeproject.org/ | (30) |
Ensembl Regulation 102 | 7,808,345 TFBSs† | https://www.ensembl.org/ | (31) |
FANTOM5 SSTAR | 4,987 TFBSs† | https://fantom.gsc.riken.jp/5/sstar/ | (32) |
Data included in this table and in FABIAN-variant is for human transcription factors only.
*MotifDb 1.36.0 is an annotated collection of PWM models, and we obtained all PWMs listed in this table except for JASPAR 2022 from MotifDb.
†Data for genome build GRCh37 is shown.
FABIAN-variant evaluates each selected TFFM and PWM model in a sliding window from −15 to +15 nucleotides around the variant location in both the reference sequence (‘wild-type’, WT) and the variant sequence (‘mutated’, MT). Both strands are considered. Then the highest scores for both sequences (0 ≤ WT,MT ≤ 1) are compared for each model. A greater WT score indicates a weakened binding affinity, and a greater MT score indicates an increased binding affinity caused by the variant. For each model, FABIAN-variant generates a joint score S between −1 (likely TFBS loss) and 1 (likely TFBS gain),
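The scanning step can be sketched as follows. This is an illustration under stated assumptions rather than the backend code: `toy_score` is a hypothetical stand-in for a real PWM or TFFM scorer that returns values between 0 and 1, the flanking sequences are made up, and the sketch scans every window inside the ±15-nucleotide region without requiring that a window overlaps the variant.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def best_score(region, model_length, score_fn):
    """Highest score of any window of model_length in the region, on either strand."""
    best = 0.0
    for strand in (region, reverse_complement(region)):
        for start in range(len(strand) - model_length + 1):
            best = max(best, score_fn(strand[start:start + model_length]))
    return best

def wt_mt_scores(left_flank, ref, alt, right_flank, model_length, score_fn):
    """Best score in the reference (WT) and variant (MT) sequence context."""
    wt = best_score(left_flank + ref + right_flank, model_length, score_fn)
    mt = best_score(left_flank + alt + right_flank, model_length, score_fn)
    return wt, mt

def toy_score(window, motif="AGATAA"):
    """Hypothetical scorer: fraction of positions that match a fixed motif."""
    return sum(a == b for a, b in zip(window, motif)) / len(motif)

# Made-up 15-nt flanks around a single-nucleotide variant A>G.
left, right = "CCTGCCTCCCGGGCC", "GATAAGGCTCCTGCC"
wt, mt = wt_mt_scores(left, "A", "G", right, 6, toy_score)
print(wt, mt)
```

In this toy example the reference sequence contains a perfect match to the motif and the variant removes it, so the WT score is higher than the MT score.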
[Equation: joint score S as a function of WT, MT, and α]
with pseudocount α = 0.1 to avoid zero in the denominator. We use the inverse of WT and MT in the ratio (e.g. 1 − WT) to account for the fact that PWM and TFFM scores correlate with the likelihood that binding is possible in the first place and the ratio is comparatively higher with small denominators. WT, MT and the joint score S per model are shown on the results page of FABIAN-variant.
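The description above constrains S, but the exact published expression should be taken from the original article. As a minimal sketch, assume a piecewise ratio of the inverted scores with the pseudocount added to both terms. This form is an assumption made for illustration: it is signed, stays between −1 and 1, and gives larger magnitudes when the higher of the two scores is close to 1, but it is not claimed to be the FABIAN-variant formula.

```python
ALPHA = 0.1  # pseudocount stated in the text

def joint_score(wt, mt, alpha=ALPHA):
    """Assumed illustrative form of a joint score; NOT the published equation.

    Negative values indicate a likely loss (WT > MT), positive values a
    likely gain (MT > WT) of a binding site.
    """
    if not (0.0 <= wt <= 1.0 and 0.0 <= mt <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    inv_wt = 1.0 - wt + alpha  # inverted WT score plus pseudocount
    inv_mt = 1.0 - mt + alpha  # inverted MT score plus pseudocount
    if mt > wt:
        return 1.0 - inv_mt / inv_wt
    if wt > mt:
        return inv_wt / inv_mt - 1.0
    return 0.0

# The same absolute drop of 0.2 weighs more for a high-scoring site.
print(joint_score(0.95, 0.75), joint_score(0.50, 0.30))
```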
For most transcription factors, several models (TFFMs and PWMs) exist. To obtain the combined prediction per variant per transcription factor, FABIAN-variant calculates the average of joint scores S of the individual models. If both TFFMs and PWMs are available, FABIAN-variant by default uses only the results from TFFMs for the combined prediction (this setting can be changed on the results page so that both types of models are included in the combined score).
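The combination rule described in this paragraph can be sketched in a few lines. The function name, the tuple layout, and the example scores are hypothetical; only the averaging and the TFFM-only default are taken from the text.

```python
from statistics import mean

def combined_prediction(model_scores, tffm_only_if_available=True):
    """Average the joint scores S of the models for one variant and one factor.

    model_scores: list of (model_type, joint_score) pairs, where model_type
    is 'TFFM' or 'PWM'. By default only TFFM scores are averaged when at
    least one TFFM exists; otherwise all models are averaged.
    """
    tffm_scores = [score for kind, score in model_scores if kind == "TFFM"]
    if tffm_only_if_available and tffm_scores:
        return mean(tffm_scores)
    return mean(score for _, score in model_scores)

# Made-up joint scores for one transcription factor.
scores = [("TFFM", -0.8), ("TFFM", -0.6), ("PWM", -0.1)]
print(combined_prediction(scores))
print(combined_prediction(scores, tffm_only_if_available=False))
```

With the default setting the PWM result is ignored here, which mirrors the behaviour of the results page before the user changes the setting.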
Known TFBSs
To allow searches to be restricted to known TFBSs, we collected data from ChIP-seq experiments from ENCODE (30) and Ensembl Regulation (31), as well as from cap analysis of gene expression (CAGE) experiments from FANTOM5 (32) (Table 1). Please note that TFBSs obtained from ChIP-seq experiments are regions of several hundred bases; the precise location of the actual TFBS within each region is unknown.
We did not use Ensembl’s motif-derived predicted binding sites.
FEATURES/FRONTEND
Search interface
On the search page (Figure 1), users can select transcription factors and choose models to be included in the search. Variants can either be entered into a text field or uploaded as a VCF file. In the latter case, we provide filter options for candidate gene regions, custom genomic regions, coverage, homozygosity, and restriction to rare variants using data from gnomAD (33) and the 1000 Genomes Project (34).
Figure 1.
FABIAN-variant interface for a single variant. (A) Users can choose between a single variant, multiple variants, or a VCF file. (B) Input field for a chromosomal annotation of a single variant (e.g. 1:160001799G>C). Variants can also be entered as nucleotide sequences by clicking ‘Enter sequences directly’ (e.g. GGCCCTC...>TCACACT...). (C) ‘Known TFBSs’ searches for transcription factors known to bind at the location of the variant based on ENCODE, Ensembl, or FANTOM5 data. ‘Select individually...’ and ‘Paste names...’ open fields to submit a custom set of transcription factors. (D) The type of models can be restricted to TFFMs, PWMs, or data from specific sources. The numbers in parentheses update automatically and indicate the number of models included in the search based on the current input.
Users can choose to include all 5014 models and all 1387 transcription factors or limit the search to specific factors and models. The search can be restricted to transcription factors for which there are known binding sites at the genomic location of the variant. Other options are to use only TFFMs, only PWMs, or only models from a specific database.
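For readers who want to prepare variant lists programmatically, the chromosomal notation used in the examples (e.g. 1:160001799G>C or chr1:155271258T>C) can be parsed as sketched below. The regular expression and the `Variant` type are assumptions for illustration; the formats actually accepted by the web interface are described in its documentation.

```python
import re
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str

# Assumed pattern: optional 'chr' prefix, chromosome, position, REF>ALT.
_PATTERN = re.compile(
    r"^(?:chr)?([0-9]{1,2}|X|Y|MT?):(\d+)([ACGT]+)>([ACGT]+)$",
    re.IGNORECASE,
)

def parse_variant(text):
    """Parse a variant such as '1:160001799G>C' into its components."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised variant notation: {text!r}")
    chrom, pos, ref, alt = match.groups()
    return Variant(chrom.upper(), int(pos), ref.upper(), alt.upper())

print(parse_variant("chr1:155271258T>C"))
print(parse_variant("1:160001799G>C"))
```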
Results for a single-variant search are available immediately after clicking on ‘Analyse’. If all models and no more than 100 variants are included in the search, FABIAN-variant usually returns the results in <90 s.
Results overview
A sample results page for two pathogenic promoter variants is shown in Figure 2 (chr1:155271258T>C has been reported to disable binding of the erythroid transcription factor GATA1 (35) and chr1:160001799G>C to disrupt binding of SP1 (36)). Coloured cells indicate the likelihood of a loss (red) or gain (blue) of a TFBS due to the variant based on the combined prediction of different models per variant. Deeper shades of the colour represent a greater loss or gain. Moving the mouse pointer over a coloured cell reveals the individual model scores.
Figure 2.
Results page for two promoter variants. (A) The results are divided into five sections for better readability. Variants are plotted in columns, transcription factors in rows. Coloured cells indicate the potential loss (red) or gain (blue) of a TFBS due to the variant. (B) Legend. Deeper shades of red or blue represent a greater loss or gain. Known TFBSs at the location of a variant are displayed with a border around the cell. Please note that the TFBSs obtained from ChIP-seq experiments are regions of several hundred bases and we do not know where within these stretches the real binding sites are located. (C) Users can define sorting and filters to limit the displayed data. (D) 145 transcription factors are currently shown in the table. The number is refreshed automatically based on active filters. (E) Results can be downloaded.
The results page provides access to all results for all variants, transcription factors, and models included in the search. Because the number of results can be overwhelming for large searches, there are several sorting and filtering options at the top that can be used to reorder or hide information on the page. Changes to the options are immediately reflected in the results table. The filter options allow the user to show only transcription factors within a specified genomic region, with a predicted loss or gain of a TFBS, with a known TFBS at the location of a variant, or those which are manually selected with the mouse pointer. FABIAN-variant automatically applies pagination beyond 100 variants. We provide a download of the complete results in TSV format and a summary based on the selected filters.
Detailed results
Clicking on a coloured cell brings up detailed results for an individual variant and a single transcription factor (Figure 3). The page includes the model scores, an option to print results, reference and variant sequences, a list of all known TFBSs at the variant location, as well as sequence logos for the different models.
Figure 3.
The detailed results page is shown after clicking on the corresponding cell in the results table. (A) Options to download or print details on this page. (B) Six PWMs and four TFFMs for transcription factor SP2 were evaluated for variant chr1:160001799G>C (GRCh37). Higher scores in the reference sequence (WT) than in the variant sequence (MT) indicate a possible loss of a TFBS. (C) The combined prediction is shown below the individual model scores. (D) Reference and variant sequences (variant at position +1). (E) A list of known TFBSs at the variant location. (F) Sequence logos and information content for the ten models (abridged in the Figure).
IMPLEMENTATION
FABIAN-variant is an acronym for FAst BInding-site ANalysis and has been optimised for computational efficiency. The FABIAN-variant web server uses Perl CGI to run the C++ backend and a PostgreSQL database. The frontend includes JavaScript and Ajax for interactivity. Job scheduling is provided by Slurm. The code does not use other third-party libraries.
Each TFFM score is computed with a custom C++ implementation of the forward-backward algorithm from the GHMM library (37). Position count matrices (PCMs) were converted to PWMs using the method described in (38) based on the background nucleotide distribution in the human genome.
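As background on the HMM evaluation mentioned above, the forward pass of the forward-backward algorithm computes the likelihood of a sequence under a model by summing over all state paths. The sketch below is a generic, unscaled forward algorithm on a toy two-state model with invented parameters. It is not the FABIAN-variant backend or the GHMM code, and unlike a first-order TFFM its emissions do not depend on the preceding nucleotide.

```python
def forward_likelihood(sequence, initial, transition, emission):
    """Return P(sequence | HMM) computed with the forward algorithm."""
    states = list(initial)
    # Initialisation: start in each state and emit the first symbol.
    alpha = {s: initial[s] * emission[s][sequence[0]] for s in states}
    # Recursion: sum over predecessor states, then emit the next symbol.
    for symbol in sequence[1:]:
        alpha = {
            s: emission[s][symbol] * sum(alpha[p] * transition[p][s] for p in states)
            for s in states
        }
    # Termination: sum over all possible final states.
    return sum(alpha.values())

# Toy parameters (invented for illustration only).
initial = {"background": 0.9, "motif": 0.1}
transition = {
    "background": {"background": 0.9, "motif": 0.1},
    "motif": {"background": 0.2, "motif": 0.8},
}
emission = {
    "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "motif": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
}
print(forward_likelihood("AAAA", initial, transition, emission))
```

For longer sequences, practical implementations rescale the forward variables or work in log space to avoid numerical underflow.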
DISCUSSION
Since their inclusion in the JASPAR database for transcription factors, TFFMs have been gaining visibility. FABIAN-variant is the first web application that can analyse variant effects not only with PWMs but also with TFFMs.
Because there are millions of non-coding variants in any human genome, it is not helpful to search for effects on transcription factor binding for all of them – one would drown in results. However, the search for potentially regulatory variants may be very helpful if restricted to candidate genes known to be involved in the patient’s disease. This might also reveal the ‘second mutation’ in case of recessive disorders where likely deleterious variants such as premature termination codons are only found on one allele.
Although FABIAN-variant is in principle capable of analysing all variants found in a typical whole-genome sequencing project, we have limited the number of variants per analysis to 10 000. Generating millions of results for each of the 1387 transcription factors covered by FABIAN-variant would produce more data than anyone could study. Instead, we provide filter options to restrict the analysis to variants found in a specific region or near functional or positional candidate genes.
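Users who prefer to pre-filter a large VCF file before uploading it can apply the same idea locally. The sketch below keeps only variants inside one genomic region and stops at the 10 000-variant limit. It relies only on the standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT); the function name and the example lines are assumptions, and the sketch does not reproduce the filters implemented on the server.

```python
MAX_VARIANTS = 10_000  # per-analysis limit stated in the text

def _strip_chr(name):
    """Normalise chromosome names so that 'chr1' and '1' compare equal."""
    return name[3:] if name.lower().startswith("chr") else name

def variants_in_region(vcf_lines, chrom, start, end, limit=MAX_VARIANTS):
    """Collect (chrom, pos, ref, alt) tuples for variants inside a region."""
    selected = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        v_chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        if _strip_chr(v_chrom) != _strip_chr(chrom) or not start <= pos <= end:
            continue
        for alt in alts.split(","):  # one entry per alternative allele
            selected.append((v_chrom, pos, ref, alt))
            if len(selected) >= limit:
                return selected
    return selected

# Minimal example with the two promoter variants mentioned in the text (GRCh37).
lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t155271258\t.\tT\tC\t.\t.\t.",
    "chr1\t160001799\t.\tG\tC,A\t.\t.\t.",
    "2\t100\t.\tA\tG\t.\t.\t.",
]
print(variants_in_region(lines, "chr1", 155_000_000, 161_000_000))
```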
A limitation of FABIAN-variant is that larger deletions that abolish the complete TFBS cannot be analysed because our application is aimed at the analysis of variants within the TFBS.
FABIAN-variant is aimed at the fast analysis of the variant effect on transcription factor binding on the sequence level and does not consider regulatory features that might affect transcription factor binding (e.g. chromatin accessibility, as implemented in SEMpl (39)).
OUTLOOK
The infrastructure of FABIAN-variant has been implemented in a way that is easily extensible when new versions of the underlying data are released. Additionally, we are considering adding deep learning-based models such as DeepBind (40) to the application.
We also plan to provide a simple API for the analysis of single variants from within other applications.
DATA AVAILABILITY
FABIAN-variant can be accessed at https://www.genecascade.org/fabian/. This website is free and open to all users without login requirement or use of cookies.
The results page for each analysis has a unique URL that can be used to access, share, or download results at a later time. Results are kept on the server for three days, after which time they are automatically deleted. Users can also choose to directly delete their data on the results page.
Links to the documentation and a tutorial are provided on the homepage. The documentation has a link to download the underlying TFFM and PWM model definitions.
Contributor Information
Robin Steinhaus, Exploratory Diagnostic Sciences, Berlin Institute of Health, 10117 Berlin, Germany; Institute of Medical Genetics and Human Genetics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353 Berlin, Germany.
Peter N Robinson, The Jackson Laboratory for Genomic Medicine, Farmington, CT 06030, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT 06030, USA.
Dominik Seelow, Exploratory Diagnostic Sciences, Berlin Institute of Health, 10117 Berlin, Germany; Institute of Medical Genetics and Human Genetics, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, 13353 Berlin, Germany.
FUNDING
PNR was supported by NIH NICHD [5R01HD103805-02]. DS was supported by Deutsche Forschungsgemeinschaft (DFG) [FOR2841 TP05, TP09]. Funding for open access charge: Open Access Publication Fund of the Charité – Universitätsmedizin Berlin.
Conflict of interest statement. None declared.
REFERENCES
- 1. Reuter J.A., Spacek D.V., Snyder M.P. High-throughput sequencing technologies. Mol. Cell. 2015; 58:586–597.
- 2. Steinhaus R., Proft S., Schuelke M., Cooper D.N., Schwarz J.M., Seelow D. MutationTaster2021. Nucleic Acids Res. 2021; 49:W446–W451.
- 3. Adzhubei I.A., Schmidt S., Peshkin L., Ramensky V.E., Gerasimova A., Bork P., Kondrashov A.S., Sunyaev S.R. A method and server for predicting damaging missense mutations. Nat. Methods. 2010; 7:248–249.
- 4. Sim N.-L., Kumar P., Hu J., Henikoff S., Schneider G., Ng P.C. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res. 2012; 40:W452–W457.
- 5. Rentzsch P., Witten D., Cooper G.M., Shendure J., Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 2019; 47:D886–D894.
- 6. Smedley D., Schubach M., Jacobsen J.O., Köhler S., Zemojtel T., Spielmann M., Jäger M., Hochheiser H., Washington N.L., McMurry J.A. et al. A whole-genome analysis framework for effective identification of pathogenic regulatory variants in Mendelian disease. Am. J. Hum. Genet. 2016; 99:595–606.
- 7. Schwarz J.M., Hombach D., Köhler S., Cooper D.N., Schuelke M., Seelow D. RegulationSpotter: annotation and interpretation of extratranscriptic DNA variants. Nucleic Acids Res. 2019; 47:W106–W113.
- 8. Lee T.I., Young R.A. Transcriptional regulation and its misregulation in disease. Cell. 2013; 152:1237–1251.
- 9. Nougier C., Roualdes O., Fretigny M., d’Oiron R., Costa C., Negrier C., Vinciguerra C. Characterization of four novel molecular changes in the promoter region of the factor VIII gene. Haemophilia. 2014; 20:e149–e156.
- 10. Xu Y., Krishnan A., Wan X.S., Majima H., Yeh C.-C., Ludewig G., Kasarskis E.J., St Clair D.K. Mutations in the promoter reveal a cause for the reduced expression of the human manganese superoxide dismutase gene in cancer cells. Oncogene. 1999; 18:93–102.
- 11. Jang Y.J., LaBella A.L., Feeney T.P., Braverman N., Tuchman M., Morizono H., Ah Mew N., Caldovic L. Disease-causing mutations in the promoter and enhancer of the ornithine transcarbamylase gene. Hum. Mutat. 2018; 39:527–536.
- 12. Wasserman W.W., Sandelin A. Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 2004; 5:276–287.
- 13. Hombach D., Schwarz J.M., Robinson P.N., Schuelke M., Seelow D. A systematic, large-scale comparison of transcription factor binding site models. BMC Genomics. 2016; 17:1–10.
- 14. Castro-Mondragon J.A., Riudavets-Puig R., Rauluseviciute I., Berhanu Lemma R., Turchi L., Blanc-Mathieu R., Lucas J., Boddie P., Khan A., Manosalva Pérez N. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2022; 50:D165–D173.
- 15. Kulakovskiy I.V., Vorontsov I.E., Yevshin I.S., Sharipov R.N., Fedorova A.D., Rumynskiy E.I., Medvedeva Y.A., Magana-Mora A., Bajic V.B., Papatsenko D.A. et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res. 2018; 46:D252–D259.
- 16. Pachkov M., Balwierz P.J., Arnold P., Ozonov E., Van Nimwegen E. SwissRegulon, a database of genome-wide annotations of regulatory sites: recent updates. Nucleic Acids Res. 2012; 41:D214–D220.
- 17. Luscombe N.M., Laskowski R.A., Thornton J.M. Amino acid–base interactions: a three-dimensional analysis of protein–DNA interactions at an atomic level. Nucleic Acids Res. 2001; 29:2860–2874.
- 18. Man T.-K., Stormo G.D. Non-independence of Mnt repressor–operator interaction determined by a new quantitative multiple fluorescence relative affinity (QuMFRA) assay. Nucleic Acids Res. 2001; 29:2471–2478.
- 19. Barash Y., Elidan G., Friedman N., Kaplan T. Modeling dependencies in protein-DNA binding sites. Proceedings of the Seventh Annual International Conference on Research in Computational Molecular Biology. 2003; 28–37.
- 20. Mukherjee S., Berger M.F., Jona G., Wang X.S., Muzzey D., Snyder M., Young R.A., Bulyk M.L. Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat. Genet. 2004; 36:1331–1339.
- 21. Siebert M., Söding J. Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences. Nucleic Acids Res. 2016; 44:6055–6069.
- 22. Mathelier A., Wasserman W.W. The next generation of transcription factor binding site prediction. PLoS Comput. Biol. 2013; 9:e1003214.
- 23. Kulakovskiy I.V., Vorontsov I.E., Yevshin I.S., Soboleva A.V., Kasianov A.S., Ashoor H., Ba-Alawi W., Bajic V.B., Medvedeva Y.A., Kolpakov F.A. et al. HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models. Nucleic Acids Res. 2016; 44:D116–D125.
- 24. Zhao Y., Ruan S., Pandey M., Stormo G.D. Improved models for transcription factor binding site identification using nonindependent interactions. Genetics. 2012; 191:781–790.
- 25. Siddharthan R. Dinucleotide weight matrices for predicting transcription factor binding sites: generalizing the position weight matrix. PLoS One. 2010; 5:e9722.
- 26. Grant C.E., Bailey T.L., Noble W.S. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011; 27:1017–1018.
- 27. Coetzee S.G., Coetzee G.A., Hazelett D.J. motifbreakR: an R/Bioconductor package for predicting variant effects at transcription factor binding sites. Bioinformatics. 2015; 31:3847–3849.
- 28. Zambelli F., Pesole G., Pavesi G. Pscan: finding over-represented transcription factor binding site motifs in sequences from co-regulated or co-expressed genes. Nucleic Acids Res. 2009; 37:W247–W252.
- 29. Zogopoulos V.L., Spaho K., Ntouka C., Lappas G.A., Kyranis I., Bagos P.G., Spandidos D.A., Michalopoulos I. TFBSPred: A functional transcription factor binding site prediction webtool for humans and mice. Int. J. Epigenet. 2021; 1:1–11.
- 30. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012; 489:57.
- 31. Howe K.L., Achuthan P., Allen J., Allen J., Alvarez-Jarreta J., Amode M.R., Armean I.M., Azov A.G., Bennett R., Bhai J. et al. Ensembl 2021. Nucleic Acids Res. 2021; 49:D884–D891.
- 32. Abugessaisa I., Shimoji H., Sahin S., Kondo A., Harshbarger J., Lizio M., Hayashizaki Y., Carninci P., Forrest A., Kasukawa T. et al. FANTOM5 transcriptome catalog of cellular states based on Semantic MediaWiki. Database. 2016; 2016:baw105.
- 33. Karczewski K.J., Francioli L.C., Tiao G., Cummings B.B., Alföldi J., Wang Q., Collins R.L., Laricchia K.M., Ganna A., Birnbaum D.P. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020; 581:434–443.
- 34. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015; 526:68.
- 35. Manco L., Ribeiro M.L., Máximo V., Almeida H., Costa A., Freitas O., Barbot J., Abade A., Tamagnini G. A new PKLR gene mutation in the R-type promoter region affects the gene transcription causing pyruvate kinase deficiency. Br. J. Haematol. 2000; 110:993–997.
- 36. Almeida A.M., Murakami Y., Layton D.M., Hillmen P., Sellick G.S., Maeda Y., Richards S., Patterson S., Kotsianidis I., Mollica L. et al. Hypomorphic promoter mutation in PIGM causes inherited glycosylphosphatidylinositol deficiency. Nat. Med. 2006; 12:846–851.
- 37. Schliep A., Costa I.G. General Hidden Markov Model library (GHMM). 2022; http://ghmm.sourceforge.net/.
- 38. Bucher P. Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J. Mol. Biol. 1990; 212:563–578.
- 39. Nishizaki S.S., Ng N., Dong S., Porter R.S., Morterud C., Williams C., Asman C., Switzenberg J.A., Boyle A.P. Predicting the effects of SNPs on transcription factor binding affinity. Bioinformatics. 2020; 36:364–372.
- 40. Alipanahi B., Delong A., Weirauch M.T., Frey B.J. Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning. Nat. Biotechnol. 2015; 33:831–838.
- 41. Shannon P., Richards M. MotifDb: An Annotated Collection of Protein-DNA Binding Sequence Motifs. 2022; R package version 1.36.0. 10.18129/B9.bioc.MotifDb.
- 42. Weirauch M.T., Yang A., Albu M., Cote A.G., Montenegro-Montero A., Drewe P., Najafabadi H.S., Lambert S.A., Mann I., Cook K. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell. 2014; 158:1431–1443.
- 43. Xie Z., Hu S., Blackshaw S., Zhu H., Qian J. hPDI: a database of experimental human protein–DNA interactions. Bioinformatics. 2010; 26:287–289.
- 44. Jolma A., Yan J., Whitington T., Toivonen J., Nitta K.R., Rastas P., Morgunova E., Enge M., Taipale M., Wei G. et al. DNA-binding specificities of human transcription factors. Cell. 2013; 152:327–339.
- 45. Hume M.A., Barrera L.A., Gisselbrecht S.S., Bulyk M.L. UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein–DNA interactions. Nucleic Acids Res. 2015; 43:D117–D122.