The qBED track: a novel genome browser visualization for point processes

Arnav Moudgil; Daofeng Li; Silas Hsu; Deepak Purushotham; Ting Wang; Robi D Mitra

Bioinformatics. 2020 Nov 24;37(8):1168–1170. doi: 10.1093/bioinformatics/btaa771


Editor: Robinson Peter

PMCID: PMC8150125 PMID: 32941613

Abstract

Summary

Transposon calling cards is a genomic assay for identifying transcription factor binding sites in both bulk and single cell experiments. Here, we describe the qBED format, an open, text-based standard for encoding and analyzing calling card data. In parallel, we introduce the qBED track on the WashU Epigenome Browser, a novel visualization that enables researchers to inspect calling card data in their genomic context. Finally, through examples, we demonstrate that qBED files can be used to visualize non-calling card datasets, such as Combined Annotation-Dependent Depletion scores and GWAS/eQTL hits, and thus may have broad utility to the genomics community.

Availability and implementation

The qBED track is available on the WashU Epigenome Browser (http://epigenomegateway.wustl.edu/browser), beginning with version 46. Source code for the WashU Epigenome Browser with qBED support is available on GitHub (http://github.com/arnavm/eg-react and http://github.com/lidaof/eg-react). A complete definition of the qBED format is available as part of the WashU Epigenome Browser documentation (https://eg.readthedocs.io/en/latest/tracks.html#qbed-track). We have also released a tutorial on how to upload qBED data to the browser (http://dx.doi.org/10.17504/protocols.io.bca8ishw).

1 Introduction

Advances in genomic technologies lead to new data formats and visualizations. The Human Genome Project originated the popular Browser Extensible Data (BED) standard for describing genomic intervals (Kent et al., 2002), while the bedGraph and wiggle (.wig/.bigWig) formats have emerged as flexible standards for encoding genome-wide signals, such as those from normalized ChIP-seq or ATAC-seq assays (Kent et al., 2010).

We have developed transposon calling cards (Wang et al., 2007, 2012) to identify transcription factor (TF) binding sites (TFBSs) genome-wide, using a TF of interest fused to a transposase. The fusion construct deposits reporter transposons into the genome near TFBSs. We visualize these data by depicting each insertion as a discrete point along the (genomic) x-axis, with the number of reads supporting that insertion on the y-axis. The result resembles a scatterplot in which insertions are denser near TFBSs. Historically, we relied on an in-house genome browser called GNASHY to visualize calling card data. While useful, this browser could display only one sample at a time and did not support conventional genomic formats. Thus, any comparative analysis against orthogonal data relied on manually aligning images from different browsers (Wang et al., 2012).

Calling card technology is currently undergoing a renaissance. For example, we have recently reported mapping TFBSs in vivo in both bulk (Cammack et al., 2020) and single-cell preparations (Moudgil et al., 2020). As the scope of calling cards grows, we anticipate greater interest in the method and a need for increasingly complex visualizations. Here, to better support users, we describe the qBED format, a new text-based standard for storing calling card data. We also describe the qBED track, a companion interface for visualizing qBED data on the WashU Epigenome Browser. Finally, we demonstrate the format’s flexibility by providing examples of non-calling card genomic data visualized as qBED tracks.

2 Implementation and applications

We named our format qBED because it stores multidimensional, quantitative information about quantized events, such as calling card transpositions. Formally, qBED follows the BED3 + 3 standard, inheriting BED’s half-open, 0-based intervals. qBED files are compatible with programs like bedtools (Quinlan and Hall, 2010) and can be compressed and indexed with bgzip and tabix, respectively (Li et al., 2009). For calling card data, the first three columns denote the chromosome, start and end coordinates of the transposon insertion. The fourth column encodes a numerical value (in this case, the number of reads supporting each insertion) and is the last required column. The fifth and sixth columns are optional but recommended: the fifth denotes the strand that was targeted, and the sixth encodes an annotation string, such as a sample-specific barcode (Fig. 1A).
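The column layout just described can be captured in a few lines of code. The following is a minimal sketch, not the browser's own parser: it assumes tab-separated fields, skips lines starting with `#`, `track` or `browser` as headers, and writes `.` as a placeholder strand when only an annotation is supplied. The names `QBedRecord`, `parse_line`, `read_qbed` and `sort_for_tabix` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Sketch only: names and header handling are assumptions, not the browser's parser.
_SKIP_PREFIXES = ("#", "track", "browser")


@dataclass(frozen=True)
class QBedRecord:
    """One qBED element: BED3 (0-based, half-open) + value + optional strand/annotation."""
    chrom: str
    start: int                        # 0-based, inclusive
    end: int                          # 0-based, exclusive
    value: float                      # column 4, e.g. reads supporting an insertion
    strand: Optional[str] = None      # column 5 (optional)
    annotation: Optional[str] = None  # column 6 (optional), e.g. a barcode

    def to_line(self) -> str:
        """Serialize to a tab-separated qBED line (no trailing newline)."""
        # Write integral values such as read counts without a trailing ".0".
        value = str(int(self.value)) if float(self.value).is_integer() else str(self.value)
        fields = [self.chrom, str(self.start), str(self.end), value]
        if self.strand is not None or self.annotation is not None:
            # Column 6 requires column 5; "." stands in for an unknown strand (assumption).
            fields.append(self.strand if self.strand is not None else ".")
        if self.annotation is not None:
            fields.append(self.annotation)
        return "\t".join(fields)


def parse_line(line: str) -> QBedRecord:
    """Parse one qBED line, validating field count and interval bounds."""
    fields = line.rstrip("\r\n").split("\t")
    if not 4 <= len(fields) <= 6:
        raise ValueError(f"expected 4 to 6 tab-separated fields, got {len(fields)}")
    start, end = int(fields[1]), int(fields[2])
    if not 0 <= start <= end:
        raise ValueError(f"invalid half-open interval [{start}, {end})")
    strand = fields[4] if len(fields) >= 5 else None
    if strand is not None and strand not in ("+", "-", "."):
        raise ValueError(f"unexpected strand {strand!r}")
    annotation = fields[5] if len(fields) == 6 else None
    return QBedRecord(fields[0], start, end, float(fields[3]), strand, annotation)


def read_qbed(lines: Iterable[str]) -> List[QBedRecord]:
    """Parse every data line, skipping blank and header lines."""
    return [parse_line(line) for line in lines
            if line.strip() and not line.startswith(_SKIP_PREFIXES)]


def sort_for_tabix(records: Iterable[QBedRecord]) -> List[QBedRecord]:
    """Order records by chromosome, then start, as tabix expects before indexing."""
    return sorted(records, key=lambda r: (r.chrom, r.start, r.end))
```

Sorting by chromosome and then start mirrors `sort -k1,1 -k2,2n`; a file written from the sorted records could then be compressed with bgzip and indexed with tabix, as noted above.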

Fig. 1. Overview of the qBED format and qBED tracks. (A) Example of a qBED file encoding transposon calling card data. The first three columns are inherited from the BED standard and encode the location of the insertion site. The fourth column stores the number of reads observed for each entry, while the fifth denotes strand. The sixth and final column is an annotation recording the sample-specific barcode for each insertion in the library. (B) Screenshot of qBED tracks depicting calling card data in the WashU Epigenome Browser. qBED features appear on two-dimensional tracks, with genomic position along the x-axis and a numerical value on the y-axis (here, log-transformed read counts). An informational panel appears upon rollover of a calling card insertion, revealing read count, strand, barcode and approximate location. Right-clicking on a qBED track opens a configuration panel. Tracks can be customized with respect to color, size, y-axis limits and transformations, marker size, opacity and sample size. Finally, orthogonal datasets like TF and histone ChIP-seq can be displayed directly alongside calling card data.

To visualize qBED files, we created the qBED track and implemented it in the WashU Epigenome Browser (Li et al., 2019), a leading portal for epigenomic analysis. Each qBED element is drawn as a circular marker in two-dimensional space, with genomic position along the x-axis and numerical value along the y-axis (Fig. 1B). When an element spans more than one base, the marker is drawn at the midpoint of the interval. Multiple markers can also share the same x-coordinate, in which case they are stratified along the y-axis by value. qBED tracks support interactive exploration of data. A rollover pane appears when the cursor approaches an element (Fig. 1B), displaying the numerical value, strand and annotation (columns 4, 5 and 6, respectively). At the top of the rollover pane is the element's approximate genomic location (accurate to the nearest pixel). Right-clicking on the track opens a customization panel, where track color, marker size and opacity can be set. For very large datasets, the track can display a random subsample of the data, which reduces overplotting and memory consumption. Finally, the qBED track enables calling card data to be displayed natively alongside other datasets, such as ChIP-seq from the same cell type.
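The drawing rules in this paragraph (midpoint placement on x, a numeric y-axis, an optional log transform and random subsampling) can be sketched as follows. This is an illustration under stated assumptions, not the WashU Epigenome Browser's implementation: the log10(1 + v) transform, the clamping to the y-axis limits, the seeded uniform subsample and all function names (`marker_x`, `marker_y`, `subsample`, `layout`) are hypothetical choices.

```python
import math
import random
from typing import List, NamedTuple, Sequence, Tuple

# Sketch only: transform, clamping and sampling choices are assumptions.


class Element(NamedTuple):
    chrom: str
    start: int     # 0-based, inclusive
    end: int       # 0-based, exclusive
    value: float   # assumed non-negative, e.g. a read count


def marker_x(e: Element, view_start: int, view_end: int, width_px: float) -> float:
    """Horizontal pixel position: the midpoint of the element's half-open interval."""
    midpoint = (e.start + e.end) / 2
    return (midpoint - view_start) * width_px / (view_end - view_start)


def marker_y(value: float, y_max: float, height_px: float, log_scale: bool = False) -> float:
    """Vertical pixel position, with 0 at the top of the track."""
    if log_scale:
        # Hypothetical transform: log10(1 + v) keeps zero-valued elements finite.
        value, y_max = math.log10(1 + max(value, 0.0)), math.log10(1 + y_max)
    fraction = min(max(value / y_max, 0.0), 1.0)  # clamp to the y-axis limits
    return height_px * (1 - fraction)


def subsample(elements: Sequence[Element], max_n: int, seed: int = 0) -> List[Element]:
    """Draw a reproducible uniform random subset to limit overplotting and memory."""
    if len(elements) <= max_n:
        return list(elements)
    return random.Random(seed).sample(list(elements), max_n)


def layout(elements: Sequence[Element], view_start: int, view_end: int,
           width_px: float, height_px: float, max_n: int = 10_000,
           log_scale: bool = True) -> List[Tuple[float, float]]:
    """Return one (x, y) marker position per displayed element."""
    shown = subsample(elements, max_n)
    y_max = max((e.value for e in shown), default=0.0)
    if y_max <= 0:
        y_max = 1.0  # avoid dividing by zero when every value is zero
    return [(marker_x(e, view_start, view_end, width_px),
             marker_y(e.value, y_max, height_px, log_scale)) for e in shown]
```

Two elements with the same interval but different values receive the same x and different y positions, which is the stratification described above.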

The qBED format may also be broadly useful for existing genomic analyses. We present two such examples. Combined Annotation-Dependent Depletion (CADD) scores integrate multiple data streams to predict the deleteriousness of genomic variants (Kircher et al., 2014). These are typically displayed as vertical lines depicting the maximum score observed for each variant (Fig. 2A). We converted CADD scores for indels from Variant Call Format (VCF) to a qBED file, encoding the CADD score in the numeric column and the mutation in the annotation column. A view of the homeobox gene CRX reveals a cluster of strongly deleterious indels in the terminal exon (Fig. 2A). This display enables inspection of individual polymorphisms and emphasizes the density of variants along both the genomic (x) and CADD (y) axes, offering an unvarnished look at the complete spectrum of CADD scores.
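The CADD-to-qBED conversion described above might look like the sketch below; it is not the script used for Fig. 2A. It assumes VCF-style input in which each record's INFO field carries a Phred-scaled CADD score under a key called `CADD_PHRED` (a hypothetical name; adjust it to the real file), with one comma-separated score per ALT allele. The interval spans the REF allele, converted from VCF's 1-based POS to qBED's 0-based, half-open coordinates, and `.` fills the strand column because variants are unstranded (also an assumption).

```python
from typing import Dict, Iterable, Iterator

# Sketch only: the INFO key name and the "." strand placeholder are assumptions.


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string such as 'DP=5;KEY=x,y;FLAG' into a dict."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else ""
    return parsed


def vcf_to_qbed(lines: Iterable[str], score_key: str = "CADD_PHRED") -> Iterator[str]:
    """Yield one qBED line per scored ALT allele in VCF-formatted input."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _vid, ref, alts, _qual, _filter, info = line.rstrip("\r\n").split("\t")[:8]
        scores = parse_info(info).get(score_key)
        if not scores:
            continue  # record carries no score; leave it out of the track
        alt_list, score_list = alts.split(","), scores.split(",")
        if len(alt_list) != len(score_list):
            raise ValueError(f"{chrom}:{pos}: {len(alt_list)} ALT alleles "
                             f"but {len(score_list)} scores")
        start = int(pos) - 1      # VCF POS is 1-based; qBED starts are 0-based
        end = start + len(ref)    # half-open interval spanning the REF allele
        for alt, score in zip(alt_list, score_list):
            # Column 5: "." (unstranded); column 6: the mutation as REF>ALT.
            yield "\t".join([chrom, str(start), str(end), score, ".", f"{ref}>{alt}"])
```

For example, an invented record `chr19  1001  .  CTG  C  .  .  CADD_PHRED=27.3` becomes the qBED line `chr19  1000  1003  27.3  .  CTG>C`.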

Fig. 2. Application of the qBED specification to other genomic datasets. (A) Top: CADD scores for the gene *CRX*, as visualized on the UCSC Genome Browser. Bottom: CADD scores visualized on the WashU Epigenome Browser after conversion to qBED. Genomic position is along the x-axis and Phred-style CADD scores are along the y-axis. The mouseover pane reveals more information on an individual variant. (B) eQTLs for CD20+ B cells visualized as qBED tracks. The top track shows all significant eQTLs in view plotted as a BED (density) track, followed by a qBED representation of the same data. The y-axis represents the negative base-ten logarithm of the P-value. The next three tracks show significant eQTLs for the genes *GSDMB*, *ORMDL3* and *ZPBP2*, respectively. Finally, we show H3K27ac ChIP-seq (coverage on the y-axis) and a super-enhancer for this cell type. A mouseover pane can reveal further details stored in the qBED file, including Reference SNP ID and mutation.

A second application of qBED files is in genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) mapping, which correlate single nucleotide polymorphisms (SNPs) with phenotypes or gene expression, respectively. Because many significant SNPs fall in non-coding regions, epigenetic context can be used to prioritize variants (Tak and Farnham, 2015). However, most genome browsers do not support quantitative views of SNPs: investigators either manually align separate images of SNPs and epigenetic profiles, or encode SNPs as BED tracks, which emphasizes SNP density but sacrifices the quantitative associations (Farh et al., 2015). We converted a publicly available eQTL dataset from CD20+ B cells (Schmiedel et al., 2018) to qBED format, storing the negative base-ten logarithm of the P-value in the numeric column and the mutation and linked gene in the annotation field. We simultaneously plotted H3K27ac ChIP-seq data and a track of super-enhancers (The ENCODE Project Consortium, 2012) for the same cell type (Fig. 2B). The qBED visualization shows the density of variants, the significance of each variant and the surrounding epigenetic context, all in a single pane. We can also separate eQTLs by target gene and assign them to individual tracks, revealing how genes in close proximity to each other can have different eQTL effect sizes from the same locus. In particular, eQTLs associated with GSDMB and ORMDL3 expression span a large swath of flanking DNA, including an adjacent super-enhancer, while eQTLs associated with ZPBP2 expression are restricted to a much narrower segment. This example demonstrates how the qBED track can bridge association studies and epigenomics, simplifying certain kinds of analyses for researchers.
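A minimal sketch of the eQTL conversion and the per-gene split described above, assuming the input has already been reduced to significant SNP-gene pairs, each with a 1-based position, an rsID, REF/ALT alleles, a target gene and a P-value. The `|`-delimited annotation layout, the `.` strand placeholder, the four-significant-figure rounding, the floor applied to P-values of zero and all names (`EQTL`, `to_qbed_line`, `split_by_gene`) are hypothetical, not the format used for Fig. 2B.

```python
import math
import sys
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

# Sketch only: input fields and annotation layout are assumptions.


class EQTL(NamedTuple):
    chrom: str
    pos: int        # 1-based SNP position (assumed input convention)
    rsid: str
    ref: str
    alt: str
    gene: str       # gene whose expression is associated with the SNP
    pvalue: float


def to_qbed_line(e: EQTL) -> str:
    """Encode one SNP-gene association as a single-base qBED element."""
    # Floor p at the smallest positive double so p == 0 (underflow) stays finite.
    score = -math.log10(max(e.pvalue, sys.float_info.min))
    start = e.pos - 1                                  # 0-based, half-open [start, start + 1)
    annotation = f"{e.rsid}|{e.ref}>{e.alt}|{e.gene}"  # hypothetical "|"-delimited layout
    return "\t".join([e.chrom, str(start), str(start + 1), f"{score:.4g}", ".", annotation])


def split_by_gene(eqtls: Iterable[EQTL]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return one combined track plus one track per target gene, each coordinate-sorted."""
    combined: List[str] = []
    per_gene: Dict[str, List[str]] = defaultdict(list)
    for e in sorted(eqtls, key=lambda rec: (rec.chrom, rec.pos)):
        line = to_qbed_line(e)
        combined.append(line)
        per_gene[e.gene].append(line)
    return combined, dict(per_gene)
```

Writing `combined` and each `per_gene[gene]` list to its own file would yield one qBED track for all eQTLs plus one per target gene, mirroring the track layout in Fig. 2B.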

The qBED track is best suited to exploring dense, quantitative data driven by high-resolution point processes. As such, it can usefully complement the popular lollipop track (Jay et al., 2016; Lee et al., 2019), either as a way to publish figures of raw data without the clutter of lollipop stems, or as a way to inspect data before choosing individual points for further annotation and emphasis. Moreover, by specifying the x and y values first (columns 1–3 and 4, respectively), the qBED format prioritizes the two-dimensional relationship of the data. This may also benefit data compression, since the strand and annotation columns are optional and need only be added when additional specificity is required. In contrast, interval-based formats like BED and ENCODE’s tagAlign would require six fields to store the same data. Finally, whereas tagAlign stores the actual sequence of each read, qBED defers to the reference genome for the sequence at the x coordinate. The annotation field can be used to record departures from the reference, similar to the way VCF encodes SNPs, but remains broadly flexible for the end user.

3 Conclusion

The qBED format and its accompanying track offer researchers the ability to visualize genomic point processes, such as transposon insertions, polymorphism deleteriousness or variant associations, by adding a numerical y-axis for stratifying features on the genomic x-axis. While the six-column format presented here is sufficient for existing analyses, extra columns could be added to encode additional information for each element. Pending browser support, these could be visualized with a numerical color scale for quantitative data, or with different marker shapes for categorical data.

Funding

This work was supported by the National Institutes of Health [F30 HG009986 to A.M.; RF1 MH117070 and R01 GM123203 to R.D.M.; R01 HG007354, R01 HG007175, R01 ES024992, U01 CA200060, U24 ES026699 and U01 HG009391 to T.W.] and the American Cancer Society [RSG-14-049-01-DMC to T.W.].

Conflict of Interest: none declared.

Contributor Information

Arnav Moudgil, Department of Genetics, St. Louis, MO 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, St. Louis, MO 63110, USA; Medical Scientist Training Program, Washington University in St. Louis School of Medicine, St. Louis, MO 63110, USA.

Daofeng Li, Department of Genetics, St. Louis, MO 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, St. Louis, MO 63110, USA.

Silas Hsu, Department of Genetics, St. Louis, MO 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, St. Louis, MO 63110, USA.

Deepak Purushotham, Department of Genetics, St. Louis, MO 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, St. Louis, MO 63110, USA.

Ting Wang, Department of Genetics, St. Louis, MO 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, St. Louis, MO 63110, USA.

Robi D Mitra, Department of Genetics, St. Louis, MO 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, St. Louis, MO 63110, USA.

References

Cammack A.J. et al. (2020) A viral toolkit for recording transcription factor–DNA interactions in live mouse tissues. Proc. Natl. Acad. Sci. USA, 117, 10003–10014.
Farh K.K.-H. et al. (2015) Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 518, 337–343.
Jay J.J. et al. (2016) Lollipops in the clinic: information dense mutation plots for precision medicine. PLoS One, 11, e0160519.
Kent W.J. et al. (2002) The Human Genome Browser at UCSC. Genome Res., 12, 996–1006.
Kent W.J. et al. (2010) BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics, 26, 2204–2207.
Kircher M. et al. (2014) A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet., 46, 310–315.
Lee C.M. et al. (2019) UCSC Genome Browser enters 20th year. Nucleic Acids Res., 48, D756–D761.
Li H. et al.; 1000 Genome Project Data Processing Subgroup. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25, 2078–2079.
Li D. et al. (2019) WashU Epigenome Browser update 2019. Nucleic Acids Res., 47, W158–W165.
Moudgil A. et al. (2020) Self-reporting transposons enable simultaneous readout of gene expression and transcription factor binding in single cells. Cell, 182, 992–1008.e21.
Quinlan A.R., Hall I.M. (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26, 841–842.
Schmiedel B.J. et al. (2018) Impact of genetic polymorphisms on human immune cell gene expression. Cell, 175, 1701–1715.e16.
Tak Y.G., Farnham P.J. (2015) Making sense of GWAS: using epigenomics and genome engineering to understand the functional relevance of SNPs in non-coding regions of the human genome. Epigenet. Chromatin, 8, 57.
The ENCODE Project Consortium. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.
Wang H. et al. (2007) Calling cards for DNA-binding proteins. Genome Res., 17, 1202–1209.
Wang H. et al. (2012) “Calling Cards” for DNA-binding proteins in mammalian cells. Genetics, 190, 941–949.
