Abstract
Summary
Sequence logos were introduced nearly 30 years ago as a human-readable format for representing consensus sequences, and they remain widely used. As new experimental and computational techniques have developed, logos have been extended in three ways: extra symbols represent covalent modifications to nucleotides; logos with multiple letters at each position illustrate models with multi-nucleotide features; and symbols extending below the x-axis may represent a binding energy penalty for a residue or a negative weight output from a neural network. Web-based visualization tools for genomic data increasingly take advantage of modern web technology to offer dynamic, interactive figures to users, but support for sequence logos remains limited. Here, we present LogoJS, a Javascript package for rendering customizable, interactive, vector-graphic sequence logos and embedding them in web applications. LogoJS supports all the aforementioned logo extensions and is bundled with a companion web application for creating and sharing logos.
Availability and implementation
LogoJS is implemented both in plain Javascript and ReactJS, a popular user-interface framework. The web application is hosted at logojs.wenglab.org. All major browsers and operating systems are supported. The package and application are open-source; code is available at GitHub.
Contact
zhiping.weng@umassmed.edu
Supplementary information
Supplementary data are available at Bioinformatics online.
1 Introduction
Sequence logos have been a popular format for representing biological sequence patterns since their introduction nearly 30 years ago (Schneider and Stephens, 1990). Symbols are scaled in height by their frequencies at each position in the sequence, and each position’s total height may further be scaled by the position’s information content to visually emphasize significant positions. As new assays and computational techniques have emerged, various sequence logo extensions have been introduced, including new symbols for covalent modifications, such as cytosine methylation (Ngo et al., 2019; Viner et al., 2016; Zuo et al., 2017), multi-letter symbols to represent multi-nucleotide features from transcription factor binding models (Kulakovskiy et al., 2018; Rube et al., 2018), and negative letter heights to represent nucleotide depletion (Dey et al., 2018; Thomsen and Nielsen, 2012), residue binding energy penalties (Foat et al., 2006), and negative weights from deep-learned models of transcription factor binding (Greenside et al., 2018).
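The height scaling described above can be sketched in a few lines. The following is a minimal illustration of the classic formulation (symbol height equals frequency times the position's information content, with a uniform background and no small-sample correction); the function name `informationContentHeights` is hypothetical and is not part of any package discussed here.

```javascript
// Compute per-symbol heights (in bits) for one logo position.
// `freqs` maps each symbol to its relative frequency; values should sum to 1.
// `alphabetSize` is 4 for DNA or RNA and 20 for protein.
// Assumption: uniform background frequencies, no small-sample correction.
function informationContentHeights(freqs, alphabetSize) {
  // Shannon entropy of the column; symbols with frequency 0 contribute nothing.
  let entropy = 0;
  for (const f of Object.values(freqs)) {
    if (f > 0) entropy -= f * Math.log2(f);
  }
  // Information content: maximum possible entropy minus observed entropy.
  const information = Math.log2(alphabetSize) - entropy;
  // Each symbol's height is its frequency times the column's information.
  const heights = {};
  for (const [symbol, f] of Object.entries(freqs)) {
    heights[symbol] = f * information;
  }
  return heights;
}

// A fully conserved DNA position carries 2 bits; a uniform one carries 0.
const conserved = informationContentHeights({ A: 1, C: 0, G: 0, T: 0 }, 4);
const uniform = informationContentHeights({ A: 0.25, C: 0.25, G: 0.25, T: 0.25 }, 4);
console.log(conserved.A, uniform.A); // prints: 2 0
```

In frequency space the heights are simply the frequencies themselves; the information-content scaling is what makes conserved positions stand out visually.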
Web-based tools for generating sequence logos include WebLogo (Crooks et al., 2004) and Seq2Logo (Thomsen and Nielsen, 2012), and offline tools include seqLogo (Bembom, 2014), RWebLogo and ggseqlogo (Wagih, 2017). These tools support a range of features, but to our knowledge, none can generate the full range of logo types described above. Additionally, they generate static logo images, which are difficult to annotate or make interactive after the fact within a web application.
Publicly available genomic data are expanding at an accelerating pace. Visualization techniques are increasingly utilizing Javascript-based tools, which are highly interactive and allow users to explore results from a variety of assays in real time (Down et al., 2011; Durand et al., 2016; Kerpedjiev et al., 2018; Thorvaldsdóttir et al., 2013; Vanderkam et al., 2016). Support for sequence logos within this paradigm remains limited, however, and prominent motif databases predominantly render static image logos without support for dynamic or interactive features (Khan et al., 2018; Kulakovskiy et al., 2018). Existing Javascript packages for rendering logos are generally far more limited in scope than the aforementioned web-based and offline tools, with little to no support for extended features, such as custom alphabets and dinucleotide features (Larsen; Lichtenberg; Maguire et al.). We designed LogoJS to fill this gap.
2 Implementation
LogoJS is an open-source Javascript package for creating embeddable, publication-ready sequence logos in scalable vector-graphic format (Supplementary Fig. S1, left). Examples of the package’s flexibility are presented in Figure 1.
Fig. 1.
Functionalities of LogoJS. (A–H) Example logos rendered with LogoJS. (A) DNA logo. (B) Protein logo. (C) Methylated DNA logo. (D) Logo with negative letters. (E and F) Dinucleotide and trinucleotide logos. (G) Annotated logo highlighting a SNP interrupting a TF binding motif. (H) DNA–protein interaction logo. (I and J) Screenshots from the built-in web application. (I) FASTA editor: the user can paste sequences to view and edit a motif in real time. (J) Upload editor: the user can upload motifs in MEME, JASPAR, or TRANSFAC format and save or share them.
LogoJS offers built-in support for standard DNA, RNA and protein logos in frequency and information content space (Fig. 1A and B). Input formats include position weight, probability and frequency matrices (PWMs, PPMs and PFMs, respectively), raw-value matrices and FASTA sequences. When computing PWMs from other input formats, LogoJS offers several adjustable parameters. Small sample sizes can be accounted for by subtracting a small-sample error correction factor from positions with few aligned sequences, or by adding a constant pseudocount to each position. Users can also provide custom background frequencies for each symbol, and can choose whether alignment gaps in FASTA sequences count toward the total sequence count for each position. Using these options, LogoJS faithfully reproduces the output of existing tools; implementation details and a comparison against outputs from other tools are available in Supplementary Figures S2–S4. Other methods for computing symbol heights are not explicitly supported but may be implemented by the user through the raw-value matrix input option.
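To make these parameters concrete, the sketch below goes from aligned sequences to information-content heights with a pseudocount, custom background frequencies and a small-sample correction. It is an illustration under stated assumptions, not LogoJS's implementation (which is described in the Supplementary Material): the function names are hypothetical, the correction is the commonly used approximation (s − 1)/(2 ln 2 · n) for s symbols and n sequences, and clamping negative information to zero is a choice made for this example.

```javascript
// Count symbols in each column of an alignment of equal-length sequences.
// Characters outside `alphabet` (including gap characters such as "-") are not counted.
function countMatrix(sequences, alphabet) {
  const length = sequences[0].length;
  const counts = Array.from({ length }, () =>
    Object.fromEntries([...alphabet].map((s) => [s, 0]))
  );
  for (const seq of sequences) {
    if (seq.length !== length) throw new Error("sequences must have equal length");
    [...seq.toUpperCase()].forEach((symbol, i) => {
      if (alphabet.includes(symbol)) counts[i][symbol] += 1;
    });
  }
  return counts;
}

// Convert one column of counts to probabilities, adding `pseudocount` to every symbol.
function columnProbabilities(column, pseudocount = 0) {
  const symbols = Object.keys(column);
  const total = symbols.reduce((sum, s) => sum + column[s] + pseudocount, 0);
  return Object.fromEntries(
    symbols.map((s) => [s, total > 0 ? (column[s] + pseudocount) / total : 0])
  );
}

// Information content (bits) of one column relative to `background`, reduced by
// the approximate small-sample correction (s - 1) / (2 ln(2) n).
// Clamping at zero is an assumption of this sketch.
function columnInformation(probs, background, sampleSize) {
  let info = 0;
  for (const [symbol, p] of Object.entries(probs)) {
    if (p > 0) info += p * Math.log2(p / background[symbol]);
  }
  const alphabetSize = Object.keys(probs).length;
  const correction =
    sampleSize > 0 ? (alphabetSize - 1) / (2 * Math.LN2 * sampleSize) : 0;
  return Math.max(0, info - correction);
}

const alphabet = "ACGT";
const background = { A: 0.25, C: 0.25, G: 0.25, T: 0.25 };
const sequences = ["ACGT", "ACGA", "ACCT", "AC-T"];
const counts = countMatrix(sequences, alphabet);

// Here gaps are excluded from the sample size (n = 3 in the third column);
// using sequences.length instead would include them (n = 4 everywhere).
const heights = counts.map((column) => {
  const n = Object.values(column).reduce((a, b) => a + b, 0);
  const probs = columnProbabilities(column, 0);
  const info = columnInformation(probs, background, n);
  return Object.fromEntries(Object.entries(probs).map(([s, p]) => [s, p * info]));
});
console.log(heights);
```

With a uniform background the relative-entropy term reduces to the familiar log2(s) minus the column entropy; a non-uniform background down-weights symbols that are common by chance.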
LogoJS supports a wide range of sequence logos beyond standard DNA, RNA and protein logos. Users may generate logos using any combination of custom symbols including capital letters, lower-case letters and digits; an example using ‘MW’ to represent methylated CpG dinucleotides on the plus and minus strands is shown in Figure 1C. In contrast to many other packages, LogoJS can accept a raw-value matrix as input for letter heights, allowing for symbols with negative height (Fig. 1D). We also support the display of multiple consecutive letters or digits in a single position, which enables, for example, logos with dinucleotide, trinucleotide, or even higher-order features (Fig. 1E and F).
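One way to lay out a position whose raw values may be negative is to stack positive symbols upward from the axis and negative symbols downward. The sketch below illustrates that idea only; the function name `stackColumn` and the ordering rule (smallest magnitude nearest the axis) are assumptions of this example, not a description of how LogoJS arranges glyphs.

```javascript
// Lay out one logo position from raw values, which may be negative.
// Returns one box per symbol with its vertical extent [y0, y1] in value units:
// positive symbols stack upward from 0 and negative symbols stack downward.
// Keys may be multi-letter symbols such as "CG" for dinucleotide features.
function stackColumn(values) {
  const entries = Object.entries(values).filter(([, v]) => v !== 0);
  // Assumption: smallest magnitude sits nearest the axis, largest at the outer edge.
  entries.sort((a, b) => Math.abs(a[1]) - Math.abs(b[1]));
  let top = 0; // running upper edge of the positive stack
  let bottom = 0; // running lower edge of the negative stack
  const boxes = [];
  for (const [symbol, v] of entries) {
    if (v > 0) {
      boxes.push({ symbol, y0: top, y1: top + v });
      top += v;
    } else {
      boxes.push({ symbol, y0: bottom + v, y1: bottom });
      bottom += v;
    }
  }
  return boxes;
}

// Two enriched and two depleted symbols at a single position.
console.log(stackColumn({ A: 0.5, C: -0.25, G: 1, T: -0.5 }));
```

A renderer would then scale each glyph's outline to fill its box, which is straightforward with vector graphics because a glyph path can be stretched independently in x and y.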
Rendered logos are vector graphics, so custom annotations are easily added: a logo can, e.g. highlight a variant interrupting a motif (Fig. 1G) or illustrate interacting residues between DNA and protein sequences (Fig. 1H). The logos are also interactive; symbols can respond to mouse events through a bundled application programming interface, so the embedding application may update or display extended information when the user mouses over or clicks a part of the logo. This built-in interactivity is, to our knowledge, unique to LogoJS. An extended feature comparison with other tools is available in the Supplementary Material.
To demonstrate features of LogoJS and provide visual documentation, we deployed a companion web application at logojs.wenglab.org (Supplementary Fig. S1, right). Galleries of the use cases from Figure 1A–H are available within the application along with code to generate them. The application can also render logos from uploaded FASTA sequences or output from common tools and databases, such as MEME (Bailey et al., 2009, 2015), JASPAR (Khan et al., 2018) and TRANSFAC (Wingender et al., 1996). Users may download the resulting logos in vector or non-vector image formats, obtain code for embedding them in their own websites, or obtain links to share them with others (Fig. 1J). Users can also generate direct URLs for embedding or sharing logos, either manually or programmatically, using a combination of GET parameters. More details are available in the Supplementary Material.
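As an illustration of what reading an uploaded motif involves, the sketch below parses a single matrix in the bracketed JASPAR text layout (a `>` header line followed by one row of counts per symbol) into per-position columns. It is not the application's parser: the function name `parseJasparPfm` is hypothetical, the example matrix is made up, and only this one layout is handled.

```javascript
// Parse one count matrix in bracketed JASPAR layout, e.g.
//   >ID name
//   A  [ 4 19  0 ]
//   C  [16  0 20 ]
// Returns { id, name, matrix }, where matrix[i] maps each symbol to its count
// at position i.
function parseJasparPfm(text) {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length === 0 || !lines[0].startsWith(">")) {
    throw new Error("expected a header line starting with '>'");
  }
  const [id, ...nameParts] = lines[0].slice(1).trim().split(/\s+/);
  const rows = {};
  for (const line of lines.slice(1)) {
    // One symbol, then whitespace-separated numbers inside square brackets.
    const match = line.match(/^([A-Za-z])\s*\[([^\]]*)\]$/);
    if (!match) throw new Error(`unrecognized row: ${line}`);
    rows[match[1].toUpperCase()] = match[2].trim().split(/\s+/).map(Number);
  }
  const symbols = Object.keys(rows);
  if (symbols.length === 0) throw new Error("no matrix rows found");
  const length = rows[symbols[0]].length;
  if (symbols.some((s) => rows[s].length !== length)) {
    throw new Error("rows have different lengths");
  }
  // Transpose symbol rows into per-position columns.
  const matrix = Array.from({ length }, (_, i) =>
    Object.fromEntries(symbols.map((s) => [s, rows[s][i]]))
  );
  return { id, name: nameParts.join(" "), matrix };
}

// Made-up example matrix, for illustration only.
const example = `>EXAMPLE1 demo motif
A  [ 4 19  0 ]
C  [16  0 20 ]
G  [ 0  1  0 ]
T  [ 0  0  0 ]`;
console.log(parseJasparPfm(example));
```

The resulting columns are plain count vectors, so they can be normalized to probabilities or converted to information content in the same way as counts derived from FASTA sequences.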
In summary, LogoJS is a lightweight, open-source Javascript package offering flexible, extensible and interactive sequence logos for sharing and embedding in websites. To our knowledge, LogoJS offers the richest and most adaptable feature set of any existing Javascript logo package.
Acknowledgements
We thank Mingshi Gao, Jack Huey, Hao Chen, Kaili Fan and Greg Andrews for productive discussions and beta testing.
Funding
This work was supported by the National Institutes of Health [HG009446].
Conflict of Interest: Zhiping Weng co-founded and serves as a scientific advisor for Rgenta Inc.
References
- Bailey T.L. et al. (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res., 37, W202–W208.
- Bailey T.L. et al. (2015) The MEME suite. Nucleic Acids Res., 43, W39–W49.
- Bembom O. (2014) Sequence Logos for DNA Sequence Alignments. https://www.bioconductor.org/packages/release/bioc/vignettes/seqLogo/inst/doc/seqLogo.pdf (6 February 2020, date last accessed).
- Crooks G.E. et al. (2004) WebLogo: a sequence logo generator. Genome Res., 14, 1188–1190.
- Dey K.K. et al. (2018) A new sequence logo plot to highlight enrichment and depletion. BMC Bioinformatics, 19, 473.
- Down T.A. et al. (2011) Dalliance: interactive genome viewing on the web. Bioinformatics, 27, 889–890.
- Durand N.C. et al. (2016) Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst., 3, 99–101.
- Foat B.C. et al. (2006) Statistical mechanical modeling of genome-wide transcription factor occupancy data by MatrixREDUCE. Bioinformatics, 22, e141–e149.
- Greenside P. et al. (2018) Discovering epistatic feature interactions from neural network models of regulatory DNA sequences. Bioinformatics, 34, i629–i637.
- Kerpedjiev P. et al. (2018) HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol., 19, 125.
- Khan A. et al. (2018) JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res., 46, D260–D266.
- Kulakovskiy I.V. et al. (2018) HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res., 46, D252–D259.
- Larsen S. jseqlogo Github. https://github.com/SimonLarsen/jseqlogo (6 February 2020, date last accessed).
- Lichtenberg S. d3-sequence-logo Github. https://github.com/splichte/d3-sequence-logo (6 February 2020, date last accessed).
- Maguire E. et al. Redesigning the Sequence Logo with Glyph-based Approaches to Aid Interpretation. https://isa-tools.org/wp-content/uploads/2014/07/sequencelogoredesign.pdf (6 February 2020, date last accessed).
- Ngo V. et al. (2019) Finding de novo methylated DNA motifs. Bioinformatics, 35, 3287–3293.
- Rube H.T. et al. (2018) A unified approach for quantifying and interpreting DNA shape readout by transcription factors. Mol. Syst. Biol., 14, e7902.
- Schneider T.D., Stephens R.M. (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res., 18, 6097–6100.
- Thomsen M.C.F., Nielsen M. (2012) Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion. Nucleic Acids Res., 40, W281–W287.
- Thorvaldsdóttir H. et al. (2013) Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform., 14, 178–192.
- Vanderkam D. et al. (2016) pileup.js: a JavaScript library for interactive and in-browser visualization of genomic data. Bioinformatics, 32, 2378–2379.
- Viner C. et al. (2016) Modeling methyl-sensitive transcription factor motifs with an expanded epigenetic alphabet. bioRxiv, 043794.
- Wagih O. (2017) ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics, 33, 3645–3647.
- Wingender E. et al. (1996) TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res., 24, 238–241.
- Zuo Z. et al. (2017) Measuring quantitative effects of methylation on transcription factor-DNA binding affinity. Sci. Adv., 3, eaao1799.