The UCSC Genome Browser

Donna Karolchik; Angie S Hinrichs; W James Kent

doi:10.1002/0471250953.bi0104s28

Author manuscript; available in PMC: 2010 Dec 1.

Published in final edited form as: Curr Protoc Bioinformatics. 2009 Dec;CHAPTER:Unit1.4. doi: 10.1002/0471250953.bi0104s28


PMCID: PMC2834533 NIHMSID: NIHMS164001 PMID: 19957273

Abstract

The University of California Santa Cruz (UCSC) Genome Browser (genome.ucsc.edu) is a popular Web-based tool for quickly displaying a requested portion of a genome at any scale, accompanied by a series of aligned annotation “tracks”. The annotations—generated by the UCSC Genome Bioinformatics Group and external collaborators—display gene predictions, mRNA and expressed sequence tag alignments, simple nucleotide polymorphisms, expression and regulatory data, phenotype and variation data, and pairwise and multiple-species comparative genomics data. All information relevant to a region is presented in one window, facilitating biological analysis and interpretation. The database tables underlying the Genome Browser tracks can be viewed, downloaded, and manipulated using another Web-based application, the UCSC Table Browser. Users can upload data as custom annotation tracks in both browsers for research or educational use. This unit describes how to use the Genome Browser and Table Browser for genome analysis, download the underlying database tables, and create and display custom annotation tracks.

Keywords: Genome Browser, Table Browser, UCSC, human genome, genome analysis, comparative genomics, human variation, Bioinformatics, Bioinformatics Fundamentals, Biological Databases

INTRODUCTION

The rapid progress of public sequencing and analysis efforts on vertebrate genomes has increased the demand for tools that offer quick and easy access to the data and annotations at many levels and facilitate comparative data analysis. The University of California Santa Cruz (UCSC) Genome Bioinformatics Web site at http://genome.ucsc.edu provides links to a variety of genome analysis tools, most notably the UCSC Genome Browser (Kent et al., 2002; Kuhn et al., 2009), a graphical tool for viewing a specified region of a genome and a collection of aligned annotation “tracks.” Another tool on the Web site, the UCSC Table Browser, supplies convenient access to the MySQL database tables (Karolchik et al., 2003) underlying the Genome Browser annotations. Both browsers support a custom annotation tracks feature that enables users to upload their own data for display and comparison.

The main protocol of this unit (see Basic Protocol) describes how to display and navigate through a specific section of a genome and its associated annotation tracks in the Genome Browser, configure the view to focus on annotations of interest and optimize comparative analysis, link to external information, and download sequence or annotation data. Support Protocol 1 explains the process for creating and displaying a custom annotation track based on the user’s own data. Support Protocol 2 provides a basic overview of the UCSC Table Browser, describing the most commonly used functions, how to set up a simple query, and an introduction to some of the advanced features. The Genome Browser annotations and software continually evolve as new data and techniques become available; therefore, it is recommended that the user consult the UCSC Web site (http://genome.ucsc.edu) and the current version of the User’s Guide (http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html) for the latest information on new releases and features.

BASIC PROTOCOL: USING THE UCSC GENOME BROWSER

The Genome Browser software and data may be accessed on the Internet from the UCSC Genome Bioinformatics Group Web site at http://genome.ucsc.edu.

Necessary Resources

Hardware

Unix, Windows, or Macintosh workstation with an Internet connection and a minimum display resolution of 800 × 600 pixels.

Software

An up-to-date Internet browser that supports JavaScript, such as Firefox 3.0 and higher (http://www.mozilla.com/firefox); Internet Explorer 6.0 and higher (http://www.microsoft.com/ie); or Safari 3.0 and higher (http://www.apple.com/safari). The browser must have cookies enabled.

Files

None

Navigate the Genome Browser window to a specific genomic position

1.
Open the UCSC Genome Bioinformatics Group home page, at http://genome.ucsc.edu, in a Web browser.

The UCSC Genome Bioinformatics home page provides links to the Genome Browser application and a variety of other useful tools: BLAT (Kent et al., 2002), for quickly mapping sequences to a genome assembly; the Table Browser (Karolchik et al., 2004; Kuhn et al., 2009), for viewing and manipulating the data underlying the Genome Browser; the Gene Sorter (Kent et al., 2005), for exploring relationships (expression, homology, etc.) among groups of genes; VisiGene, for browsing through a large collection of in situ mouse and frog images to examine expression patterns; the Proteome Browser (Hsu et al., 2005), for viewing information about a selected protein; an in silico PCR tool for rapidly searching a sequence database with a pair of PCR primers; and Genome Graphs, a tool for viewing quantities plotted along chromosomes. General information about the Genome Browser tool suite can be found in the User’s Guide—accessed via the Help link—and the FAQ. From the home page, the user can also download the genomic sequence and annotation data, display contributed custom tracks and older archived data, review a log of released data, and access helpful utilities, training materials, credits for contributors and collaborators, mirror information, and related publications.
2.
Click the Genome Browser link in the left-hand sidebar menu to open the Genome Browser Gateway page.

On the Gateway page (Fig. 1.4.1), the user can set the parameters that determine which portion of a genome the Genome Browser will initially display. The bottom portion of the page provides information about the currently selected genome assembly and a list of sample position queries that can be used to open the browser.

Alternatively, the Genome Browser can be accessed by clicking on the BLAT link on the home page and then searching a DNA or protein sequence for regions of homology (step 16).
3.
Select the clade, genome, and assembly of interest, then type one or more search terms or a set of genomic coordinates into the “position or search term” text box to specify the genome region to display. Click the “submit” button.

The position search supports direct positional queries such as chromosome bands or chromosome coordinate ranges, as well as queries related to genomic features such as gene symbols, mRNA or EST accession numbers, author names, or other descriptive terms likely to occur in GenBank (Benson et al., 2009). The Gateway page shows examples of valid position requests applicable to the selected genome assembly.

If the position query is resolved to a single location, the Genome Browser will display a page containing an annotation track image specific to the position query, accompanied by navigation controls and display controls (Fig. 1.4.2). Frequently, the position search returns a list of several matches in response to a query rather than immediately displaying the Genome Browser page. When this occurs, click on the item of interest and the Genome Browser will open to that location. Invalid position queries (e.g., withdrawn gene names, abandoned synonyms, misspelled identifiers, and data added after the last Genome Browser database update) will result in a warning message and the previous or default position will be retained.
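The coordinate-range form of a position query can be illustrated with a short Python sketch. It is not part of the Genome Browser software; the function names `parse_position` and `format_position` are hypothetical, and the pattern only recognizes queries of the form chrN:start-end (with optional commas). Anything else, such as a gene symbol or a chromosome band, is returned as None to indicate that it would need the browser's own search.

```python
import re

# Coordinate queries such as "chr7:45,000,000-70,000,000" (commas optional).
_POSITION_RE = re.compile(
    r"^\s*(chr\w+)\s*:\s*(\d[\d,]*)\s*-\s*(\d[\d,]*)\s*$"
)


def parse_position(query):
    """Return (chrom, start, end) for a coordinate query, or None.

    None means the query is not a plain coordinate range (for example a
    gene symbol or a chromosome band) and must be resolved by a search.
    """
    match = _POSITION_RE.match(query)
    if match is None:
        return None
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinate range: {query!r}")
    return chrom, start, end


def format_position(chrom, start, end):
    """Format coordinates with thousands separators, as in the figures."""
    return f"{chrom}:{start:,}-{end:,}"


print(parse_position("chr7:45,000,000-70,000,000"))
print(parse_position("PHOX2B"))
print(format_position("chr7", 27098861, 27102416))
```

A helper of this kind is mainly useful for checking a list of coordinates before pasting them into the position box one at a time.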

A custom annotation track can be uploaded into the Genome Browser by clicking the “add custom tracks” button on the Gateway page. For more information on creating and uploading custom annotation tracks, see Support Protocol 1.

To access an older genome assembly that is no longer available from the assembly menu, look in the Genome Browser archives at http://genome-archive.cse.ucsc.edu.

Several aspects of the Genome Browser display can be customized by clicking the “configure tracks and display” button (see step 8).

Figure 1.4.1 — The Genome Browser Gateway page, set up to span the central portion of chromosome 7 (chr7:45,000,000-70,000,000) in the March 2006 human assembly (NCBI36). Custom annotation tracks (Support Protocol 1) can be uploaded by clicking the “add custom tracks” button. The initial Genome Browser display may be configured by clicking the “configure tracks and display” button. The lower portion of this page (not shown) displays a description of the selected assembly, relevant links, and examples of queries that may be entered in the “position or search term” box.

Figure 1.4.2 — The Genome Browser annotation track page zoomed in to display the PHOX2B gene on human chromosome 4, March 2006 assembly (NCBI36). The navigation and configuration buttons are visible above and below the image. The red rectangle in the ideogram directly above the annotation tracks image indicates the location of the currently displayed region of the chromosome. The SNPs (130) track visibility has been changed from dense to pack to show individual SNPs, some of which are colored according to gene region (e.g. UTR, coding-synonymous or coding-nonsynonymous). Three additional tracks have been added to the display by changing their visibilities from hide to pack: GAD View and OMIM Genes in the Phenotype and Disease Associations group, and TFBS Conserved in the Regulation group. PHOX2B is a developmental gene that has also been associated with cancer; move the mouse over the PHOX2B item in the GAD View track in order to see a list of diseases associated with the gene. In the Vertebrate Multiz Alignment & Conservation track, note the areas of high conservation peaking in the upstream region (to the right because PHOX2B is on the antisense strand), UTRs, most exons as well as part of the first intron.

Browse and configure the annotation tracks display

4.
Examine the Genome Browser annotation tracks page (Fig. 1.4.2).

This image displays a set of annotation tracks aligned beneath a Base Position track (the “ruler”) indicating genomic coordinate positions. Tracks are organized into groups reflecting the nature of their data. The first time the Genome Browser is opened, the application’s default values are used to configure this display. Any preferences and configurations set during the session will be retained for use in subsequent sessions on the same Web browser. To reset the display to the set of default tracks for the selected assembly, click the “default tracks” button.

The annotation tracks image is accompanied by controls to configure the display and navigate through the sequence. For selected assemblies, a chromosome band ideogram directly above the image graphically indicates the location of the currently displayed region on the overall chromosome. Custom annotation tracks can be uploaded to the current assembly by clicking the “custom tracks” button below the image (see Support Protocol 1 for more information).

Figure 1.4.2 shows the annotation track image opened to the position of the gene PHOX2B on chromosome 4. To reach this position, enter “PHOX2B” in the position/search box, select the first matching item (the UCSC Genes PHOX2B), and then click the zoom out 1.5x button. Note that the Genome Browser automatically changes the text in the Position box to show the chromosomal position of the resulting display. In most annotation tracks, the aligned regions are represented by vertical bars or blocks. In the Spliced ESTs track shown in this example, the degree of darkness of the block shading corresponds to the number of features aligning to the region. In the mRNA and gene prediction tracks, the thicker regions (usually coding exons) are connected by thin horizontal lines representing gaps (usually spliced-out introns). Thinner blocks on the leading and trailing ends of the aligning regions in gene tracks represent the 5′ and 3′ untranslated regions (UTRs). In full or pack display mode, arrowheads on the connecting lines indicate the direction of transcription.

Note the comparative genomics annotations displayed in Figure 1.4.2. The Conservation track shows a measure of evolutionary conservation among multiple species, which tends to indicate functional regions of the genome. The lower section of the track shows pairwise alignments of each species to the reference sequence; the top section displays the evolutionary conservation scores assigned by the phyloP program in the PHAST package (Siepel et al., 2005). At this level of detail, the phyloP scores highlight exons, untranslated regions (UTRs) and other regions that show signs of conservation across species.

To generate a high-quality image of this annotation tracks image in PostScript or PDF format, click the PDF/PS link in the top menu bar.
5.
Change the display mode of an annotation track by locating the track’s name in the Track Controls section below the image, selecting a display mode from the track’s pull-down menu, and then clicking the “refresh” button.

Depending on individual display modes, annotation tracks may be hidden from view (hide mode), displayed with all features collapsed into a single line (dense mode), or fully expanded with each feature on a separate line (full mode). Many tracks feature two additional display modes: pack mode, in which each feature is displayed and labeled, but not necessarily on a separate line, and squish mode, which is similar to pack mode, but displays unlabeled features at half-height. To quickly toggle between dense and full (or pack) modes in the annotation track image, click on the track’s label. To hide all the tracks in the display, click the “hide all” button beneath the annotation tracks image.

By adjusting the display modes of the tracks in the annotation track graphic the user can restrict the display to data of interest, reduce clutter, and improve speed. Dense display mode is useful to get an overview of an annotation or to reduce the space used by a track when the individual feature details of an annotation track are not required. Squished and packed displays show individual feature details of densely populated tracks while conserving space. Use full mode sparingly: in some tracks, the number of features that may potentially align at a selected position can be quite large. When the feature count is excessive in full display mode, the browser displays the track in pack mode if possible; if the track does not support pack mode, it displays the first 250 items individually, then groups the remaining items into a single line in dense mode at the bottom of the track.
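The fallback behavior described for full mode can be summarized as a small decision function. The Python sketch below is an illustration only, not the browser's code: the function name `effective_layout` is hypothetical, and it assumes that "excessive" means more than 250 items, which the text above does not state explicitly.

```python
def effective_layout(item_count, supports_pack, limit=250):
    """Describe how a track requested in full mode might be drawn.

    Assumption: the feature count is treated as excessive once it
    exceeds `limit` (250 is the only number given in the protocol).
    """
    if item_count <= limit:
        # Few enough features: every item gets its own line.
        return {"mode": "full", "individual": item_count, "collapsed": 0}
    if supports_pack:
        # Too many features, but the track can fall back to pack mode.
        return {"mode": "pack", "individual": item_count, "collapsed": 0}
    # No pack mode: first `limit` items drawn individually, the rest
    # collapsed into one dense line at the bottom of the track.
    return {
        "mode": "full+dense",
        "individual": limit,
        "collapsed": item_count - limit,
    }


print(effective_layout(40, supports_pack=True))
print(effective_layout(1200, supports_pack=True))
print(effective_layout(1200, supports_pack=False))
```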
6.
To change the image to a new genomic position, type a different set of search terms into the position/search box, then click the “jump” button.

Figure 1.4.3 shows the larger region obtained by entering the query 22q13.32;22q13.33 on the March 2006 (NCBI36) human genome assembly. Several tracks that display best in large regions due to the sparseness of their annotations have been added to the display, and several tracks whose many items would saturate the display have been hidden. At this large scale, the completeness of the assembly is indicated by the sparse gaps, and it is easy to see regions of relative gene density or scarcity. Coarse measures such as population genetic statistics have more of a perceivable signal, while fine-scale measures such as the per-base Conservation scores have almost no signal due to averaging over large numbers of bases.
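The loss of per-base signal at large scales follows from simple arithmetic, which the following Python sketch illustrates. It assumes that each pixel shows the mean of the bases it covers; the 100-pixel image width, the synthetic scores, and the function name `mean_per_pixel` are all hypothetical values chosen for the example, not browser settings.

```python
def mean_per_pixel(scores, pixels):
    """Average per-base scores into `pixels` bins of near-equal width."""
    n = len(scores)
    if pixels < 1 or n < pixels:
        raise ValueError("need at least one base per pixel")
    binned = []
    for i in range(pixels):
        lo = i * n // pixels
        hi = (i + 1) * n // pixels
        chunk = scores[lo:hi]
        binned.append(sum(chunk) / len(chunk))
    return binned


# Synthetic data: a 100-base conserved element (score 1.0) in a
# 100,000-base region that otherwise scores 0.0.
region = [0.0] * 100_000
for i in range(50_000, 50_100):
    region[i] = 1.0

zoomed_in = mean_per_pixel(region[49_500:50_500], 100)   # 10 bases per pixel
zoomed_out = mean_per_pixel(region, 100)                 # 1,000 bases per pixel

print(max(zoomed_in))    # the element fills whole pixels
print(max(zoomed_out))   # the element is diluted tenfold
```

In this toy case the peak drops from 1.0 to 0.1 when the window grows from 1,000 to 100,000 bases, which is why fine-scale tracks are better hidden or viewed in dense mode at chromosome-band scale.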
7.
Use the mouse drag-and-zoom feature or the “zoom” and “move” buttons to increase or decrease the breadth of the displayed coordinate range, or to shift one or both ends of the coordinate range to the left or right.

To quickly zoom in to an exact coordinate range, click on the desired leftmost coordinate in the Base Position track and drag the mouse to the right to highlight the region of interest. The navigation buttons are useful for generally focusing the display on a position. “Zoom” buttons increase or decrease the displayed coordinate range by 1.5-, 3-, or 10-fold. To zoom in by 3-fold on a particular coordinate, click the Base Position track at that location. To rapidly zoom in to the base composition of the sequence underlying the current annotation track image, click the zoom-in “base” button. “Move” buttons shift the displayed coordinates in the indicated direction by approximately 10%, 50%, or 95% of the displayed size. To scroll the coordinate position of one side of the track display while holding the position of the opposite end static, click the corresponding “move start” or “move end” arrow button. For example, to preserve the left-hand display coordinate but increase the right-hand coordinate, click the “move end” forward arrow. To increase or decrease the scroll interval, edit the number in the “move start” or “move end” text box.
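The effect of the navigation buttons on the coordinate range can be approximated with a few lines of Python. This is a sketch under stated assumptions rather than the browser's implementation: zooming is assumed to be centered on the middle of the window, coordinates are 1-based and inclusive, and the window is clamped at position 1 but not at the chromosome end. The function names are hypothetical.

```python
def zoom(start, end, factor, zoom_in=True):
    """Zoom about the window center by `factor` (1.5, 3, or 10)."""
    size = end - start + 1
    center = (start + end) // 2
    if zoom_in:
        new_size = max(1, round(size / factor))
    else:
        new_size = round(size * factor)
    new_start = max(1, center - new_size // 2)
    return new_start, new_start + new_size - 1


def move(start, end, fraction):
    """Shift the whole window by a fraction of its size.

    Positive fractions move right, negative fractions move left
    (for example 0.10, 0.50, or 0.95).
    """
    size = end - start + 1
    shift = round(size * fraction)
    new_start = max(1, start + shift)
    return new_start, new_start + size - 1


def move_end(start, end, bases):
    """Keep the start fixed and shift only the end coordinate."""
    return start, max(start, end + bases)


print(zoom(1001, 4000, 3))                   # zoom in 3-fold
print(zoom(2000, 2999, 10, zoom_in=False))   # zoom out 10-fold
print(move(1001, 2000, 0.10))                # move right by 10%
print(move_end(1001, 2000, 500))             # extend the right edge only
```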
8.
Click the “configure” button above or below the annotation tracks image to access a Web page for changing display characteristics (such as the image width and text size), hiding, showing, or reordering track groups, and displaying the chromosome ideogram, the track controls section, and image labels. Click the “submit” button on this page to apply the changes and return to the annotation tracks page.

The default display width of the annotation tracks graphic is optimized for smaller monitors with lower resolutions. Most displays are no longer subject to these limitations; in these situations the visible portion of the genome can be increased and the need for screen redraws can be reduced by setting the image width to a larger number.

Exercise caution when using the “show all” option in the track configuration section: if the group or assembly has a large amount of annotation data, the Web browser session may freeze or terminate before the data sets are loaded.
9.
Click the vertical button to the left of a displayed track to view additional information about the annotation and (in many cases) to filter or configure the features displayed in the track.

The description page can also be displayed by clicking the track’s name in the “track controls” section.

Click the button adjacent to the UCSC Genes track to view an example of a typical description page. This page contains a configuration/filter section (when applicable) followed by a description of the annotation track, information about interpreting and configuring the track display, a discussion of the methods used to collect and compute the data, credits for authors and contributors, associated references, and in this case, restrictions on the use of the data. Additional credits can be found by clicking the Credits link on the home page.

Most of the tracks in the Genome Browser have filter or configuration options that modify the graphical characteristics or restrict the display to features that match filtering criteria. Filters are useful for focusing attention on relevant features when a track contains large amounts of data. Some of the more complex graphical annotations, such as the continuous value graph (“wiggle”) display featured in the Conservation track, offer an extensive set of configuration options. In most cases, detailed configuration information can be found in the “Display Conventions and Configuration” section on the description page.

Filter and configuration settings are persistent from session to session on the same Web browser. To revert to the original default settings for a track, manually restore the settings on the description page; to undo all changes that have been made to default settings for any track or tool, click the “Click here to reset” link on the Gateway page.
10.
Click on the label of a feature in a track shown in pack or full display mode to view detailed information about the feature and access links to additional information.

The types of information available vary by track. Enter HOXA1 into the position/search box, click the jump button, and select the first matching item under UCSC Genes. In the track display image, click on the HOXA1 gene label in the UCSC Genes track in Figure 1.4.3 to view an extensive collection of information about the gene, including the associated UniProt (The UniProt Consortium, 2009) and RefSeq (Pruitt et al., 2009) descriptions, microarray expression data, links to associated information about this gene in several UCSC tools (such as the Gene Sorter, Proteome Browser, and Table Browser) as well as links to related records in external databases, including Online Mendelian Inheritance in Man (OMIM; Amberger et al., 2009; UNIT 1.2), Entrez Gene (Sayers et al., 2009), GeneLynx (Lenhard et al., 2001), GeneCards (Safran et al., 2003), AceView, PubMed (Sayers et al., 2009; UNIT 1.3), the HUGO Gene Nomenclature Committee Database (HGNC; Bruford et al., 2008), the Cancer Genome Anatomy Project (CGAP; Strausberg et al., 2001), PDB (Deshpande et al., 2005), ModBase (Pieper et al., 2006), InterPro (Hunter et al., 2009), Pfam (Finn et al., 2008), the Stanford SOURCE, Jackson Lab, and the Allen Brain Atlas. The page also includes hyperlinks that will display the corresponding protein, mRNA, and genomic sequences for HOXA1. These sequences are a useful source of input into the BLAT tool, which will be discussed in step 16.

The Genome Browser also provides direct links to the Ensembl Browser (Hubbard et al., 2009; UNIT 1.15) and NCBI’s Map Viewer (Sayers et al., 2009; UNIT 1.5), when available. To view the complementary annotation in one of these browsers, return to the annotation tracks page and click the Ensembl or NCBI link in the top menu bar.

Figure 1.4.3 — The Genome Browser annotation track page displaying chromosome bands 22q13.32 and 22q13.33 in the March 2006 human assembly (NCBI36). Several tracks useful for display of large regions have been made visible: from the Mapping and Sequencing Tracks group, Chromosome Bands, Recombination Rate and Gap; from the Phenotype and Disease Associations group, GAD View, OMIM Genes and RGD Human QTLs; and from the Variation group, HGDP Smoothed F_ST (fixation index), HGDP XP-EHH (estimated likelihood of positive selection), Tajima’s D (measure of nucleotide diversity) and DGV Structural Variation. “Squish” display mode (Basic Protocol, step 5) has been set for UCSC Genes and DGV Structural Variation, in order to show the density of items in those tracks along the genome. Several tracks have been hidden because they have so many items in this large region that they would display as solid bars in dense mode, or take up large amounts of vertical space if displayed in pack or squish mode.

Examine the underlying data and download the sequence and annotation data tables

11.
Click the DNA link on the annotation tracks page menu bar to view the DNA sequence underlying the features in the image. This option allows the user to change the formatting and coloring of the text that represents the sequence to highlight features of interest.

The initial window that displays provides options for marking or masking repeats, changing the case of the letters that represent the DNA, showing the reverse complement of the sequence, and displaying additional sequence upstream or downstream of the selected sequence. Click the “extended case/color options” button to display additional font and color configuration options.

The Extended DNA Case/Color Options page is useful for highlighting features within a genomic sequence, pointing out overlaps between two types of features, or masking out unwanted features. In Figure 1.4.4, the configuration has been set to display exons from the UCSC Genes track in uppercase letters. The Spliced EST track is configured to reflect the level of coverage by setting its color to RGB value (0,64,0). When the Submit button is clicked, the Extended DNA Output window displays exons in uppercase. Nucleotides covered by a single EST appear as dark green, while regions with more EST alignments appear progressively brighter, saturating at four ESTs.
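The case and color settings in this example can be mimicked outside the browser to see why the shading saturates at four ESTs. The Python sketch below is an illustration only: it assumes that each overlapping EST adds the configured green value (64) and that the result is clipped at the 8-bit maximum of 255. The function names and the 0-based, end-exclusive exon intervals are hypothetical conventions chosen for the example.

```python
def uppercase_exons(sequence, exons):
    """Lowercase the sequence, then uppercase each exon interval.

    `exons` holds (start, end) pairs, 0-based with the end excluded.
    """
    chars = list(sequence.lower())
    for start, end in exons:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = chars[i].upper()
    return "".join(chars)


def est_green(coverage, per_est=64):
    """RGB shade for a base covered by `coverage` spliced ESTs.

    Assumption: contributions add linearly and clip at 255, so
    4 x 64 = 256 is already saturated.
    """
    return (0, min(255, per_est * coverage), 0)


print(uppercase_exons("acgtacgtac", [(2, 5)]))
for depth in range(1, 6):
    print(depth, est_green(depth))
```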

Be careful when requesting complex formatting for a large chromosomal region: when all the HTML tags have been added to the output page, the file size may exceed the limits that the Web browser, clipboard, and other software can display.
12.
Click on the Tables link on the annotation tracks page menu bar to examine the database tables underlying the Genome Browser annotation tracks.

The Table Browser tool provides a graphical interface for viewing and manipulating Genome Browser data. Support Protocol 2 gives a brief introduction to using the Table Browser. Additional information can be found in the Table Browser User’s Guide accessible from the Help link in the Table Browser top menu bar.
13.
Click the Home link on the top menu bar to return to the UCSC Genome Bioinformatics home page, then click the Downloads link on the side bar to display a listing of sequence files and database tables available for downloading.

The Downloads page contains links to all the Genome Browser assemblies, annotations, and source code available on the Genome Browser downloads server. To access older assembly versions, it may be necessary to look in the archives (http://genome-archive.cse.ucsc.edu). Data is also downloadable at the Genome Browser FTP site (ftp://hgdownload.cse.ucsc.edu/goldenPath/). FTP or rsync is recommended for large data downloads. All data in the Genome Browser are freely available, except where noted in the README.txt file specific to a particular downloads directory. The Genome Browser and BLAT source are freely available for academic, noncommercial, and personal use; commercial licensing information can be found via the Licenses link on the home page.

Figure 1.4.4 — An extended DNA Case/Color Options request to display the DNA for the chr7:27,098,861-27,102,416 region of the March 2006 human assembly. This configuration will show UCSC Genes in uppercase, all other regions in lowercase, and Spliced ESTs in varying intensities of green, depending on the level of coverage.

Convert coordinates in the displayed range to a different assembly using the Convert, LiftOver, or BLAT tools

14.
Return to the annotation tracks page. Click the Convert link in the menu bar to convert the coordinates in the displayed range to those of a different assembly.

The coordinate conversion tool is useful for locating the position of a feature of interest in a different genome assembly. Coordinates of features frequently change from one assembly to the next as gaps are closed, strand orientations are corrected, and duplications are reduced. For example, to map the location of a sequence in the May 2004 human assembly to the March 2006 human assembly, open the May 2004 Genome Browser to the desired position, click the Convert link, select the Mar. 2006 option in the New Assembly pull-down menu, then click the “Submit” button. If successful, the Convert tool displays one or more coordinate ranges in the March 2006 assembly to which the May 2004 sequence maps.
15.
To convert multiple sets of sequence coordinates between assemblies or to exert control over the parameters used in the conversion, use the LiftOver batch coordinate conversion tool.

The LiftOver tool can be accessed from the Utilities link on the Genome Bioinformatics home page. Enter the list of coordinate ranges in the large text box, one per line, or upload the list from a file. Detailed information about parameter settings can be found at the bottom of the page, as well as information about a Linux command-line version of the tool.
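When many regions must be converted, the one-per-line list can be generated rather than typed. The following Python sketch writes the same regions in two notations: the chrN:start-end form used in the position box, and BED lines, in which the start coordinate is 0-based. Whether a given notation is accepted should be confirmed against the format notes at the bottom of the LiftOver page; the function names are hypothetical and the regions are the example coordinates used elsewhere in this unit.

```python
def to_position_lines(regions):
    """One 'chrN:start-end' line per region (1-based, inclusive)."""
    return "\n".join(f"{chrom}:{start}-{end}" for chrom, start, end in regions)


def to_bed_lines(regions):
    """The same regions as BED lines: 0-based start, end unchanged."""
    return "\n".join(
        f"{chrom}\t{start - 1}\t{end}" for chrom, start, end in regions
    )


regions = [
    ("chr7", 27098861, 27102416),
    ("chr7", 45000000, 70000000),
]

print(to_position_lines(regions))
print(to_bed_lines(regions))
```

The text produced by either function can be pasted into the large text box or saved to a file and uploaded.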
16.
Alternatively, use BLAT to map a sequence to a different assembly:
1. To determine the sequence underlying the region currently displayed on the annotation tracks image, click the DNA link in the top menu bar on the annotation tracks page, then click the “get DNA” button. For more information on the Get DNA utility, see step 11. To find the sequence of a specific feature within the annotation track image, click on the feature label to display its details page. In most cases, the sequence is available as a link from this page. Note that BLAT limits input to 25,000 bases.
2. Using the Web browser’s copy function, copy the entire sequence onto the clipboard. Return to the previous page and click the BLAT link in the top menu bar.
3. On the BLAT Web page, paste the sequence into the large text box (Fig. 1.4.5). Select the genome and assembly to which to map the sequence, then click the “submit” button. If successful, BLAT will display a list of search results sorted by score (Fig. 1.4.6).
4. To view the details of the matching alignments, click the “details” link; to display the sequence in the Genome Browser, click the “browser” link.
This procedure demonstrates one use of the BLAT search tool. This tool, which can be accessed from the BLAT link on the top menu bar of most Genome Browser pages, is a very fast sequence alignment tool similar to BLAST (UNITS 3.3 & 3.4), but optimized for inputs with high similarity, e.g. sequences from the same species. For more information on BLAT, refer to the Genome Browser User’s Guide.
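Before pasting a sequence into BLAT, it can be convenient to clean it and confirm that it is within the 25,000-base input limit mentioned in step 16. The Python sketch below does this and formats the result as FASTA. It is a convenience script, not part of the UCSC tools; the function names, the 60-column line width, and the record name are arbitrary choices for the example.

```python
BLAT_MAX_BASES = 25_000  # input limit stated in step 16


def clean_sequence(text):
    """Drop FASTA header lines, whitespace, and digits from pasted text."""
    lines = [ln for ln in text.splitlines() if not ln.startswith(">")]
    return "".join(ch for ch in "".join(lines) if ch.isalpha())


def to_fasta(name, sequence, width=60):
    """Return a FASTA record, refusing sequences over the BLAT limit."""
    if len(sequence) > BLAT_MAX_BASES:
        raise ValueError(
            f"{len(sequence)} bases exceeds the {BLAT_MAX_BASES}-base limit"
        )
    body = "\n".join(
        sequence[i:i + width] for i in range(0, len(sequence), width)
    )
    return f">{name}\n{body}\n"


pasted = ">example header\nACGTACGTAC 10\nGGCCTTAAGG 20\n"
record = to_fasta("query1", clean_sequence(pasted))
print(record)
```

For a region longer than the limit, one option is to submit a shorter piece, as was done for Figure 1.4.5, where only the first 600 bases of the Get DNA output were used.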

Figure 1.4.5 — A BLAT search set up to align the FASTA sequence in the text box against the March 2006 human genome assembly. This sequence was obtained by copying and pasting the first 600 bases of output from the Get DNA search illustrated in Figure 1.4.4.

Figure 1.4.6 — The results returned by the BLAT search shown in Figure 1.4.5. Clicking on the “browser” link for a given line will display the data in the Genome Browser; the “details” link will display a page showing a base-by-base view of the alignment to the genome.

SUPPORT PROTOCOL 1: CREATING A CUSTOM ANNOTATION TRACK

Custom annotation tracks enable users to upload personal data for temporary use in the Genome Browser and Table Browser. By default, custom tracks are viewable only on the machine from which they are uploaded, and the data may be accessed only by the users on that machine. Tracks are kept for 48 hr after they were last accessed or until the user switches to a different genome assembly; no permanent archives are created. Optionally, users can make custom annotations viewable by others as well.

Typically, custom annotation tracks are displayed under the corresponding genomic positions on the Base Position track. Each custom track has its own track control and persists even when not displayed in the Genome Browser window (e.g., if the position changes to a range that no longer includes the track).

Since space is limited in the annotation track graphic, many excellent genome-wide tracks must be excluded from the set provided with the browser. A Web page with links to several user-contributed custom tracks can be found by clicking the Custom Tracks link on the home page. The information in this section provides an overview of custom annotation tracks. For a more detailed discussion of formats and syntax, refer to the Genome Browser custom annotation track documentation Web page at http://genome.ucsc.edu/goldenPath/help/customTrack.html.
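As a preview of what a custom track looks like, the Python sketch below assembles a very small track in BED format: a single "track" line followed by one line per feature. The track-line attributes shown (name, description, visibility) and the 0-based BED start are given as the editor understands the custom track documentation cited above, which remains the authority on syntax; the function name, track name, and feature label are hypothetical.

```python
def bed_custom_track(name, description, features, visibility="pack"):
    """Build the text of a minimal BED custom track.

    `features` holds (chrom, start, end, label) tuples with 1-based,
    inclusive coordinates; BED uses a 0-based start, so 1 is subtracted.
    Labels are assumed to contain no whitespace.
    """
    lines = [
        f'track name="{name}" description="{description}" '
        f"visibility={visibility}"
    ]
    for chrom, start, end, label in features:
        lines.append(f"{chrom}\t{start - 1}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


track_text = bed_custom_track(
    "demoRegions",
    "Example regions from this unit",
    [("chr7", 27098861, 27102416, "regionA")],
)
print(track_text)
```

Text of this kind would be pasted or uploaded after clicking the “add custom tracks” button on the Gateway page.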