Abstract
The UCSC Genome Browser (https://genome.ucsc.edu) is a widely utilized web-based tool for visualization and analysis of genomic data, encompassing over 4000 assemblies from diverse organisms. Since its release in 2001, it has become an essential resource for genomics and bioinformatics research. Annotation data available on the Genome Browser includes both internally created and maintained tracks as well as custom tracks and track hubs provided by the research community. This past year's updates include over 25 new annotation tracks such as the gnomAD 4.1 track on the human GRCh38/hg38 assembly, the addition of three new public hubs, and significant expansions to the Genome Archive (GenArk) system for interacting with the enormous variety of assemblies. We have also made improvements to our interface, including updates to the browser graphic page, such as a new popup dialog feature that now displays item details without requiring navigation away from the main Genome Browser page. GenePred tracks have been upgraded with right-click options for zooming and precise navigation, along with enhanced mouseOver functions. Additional improvements include a new grouping feature for track hubs and hub description info links. A new tutorial focusing on Clinical Genetics has also been added to the UCSC Genome Browser.
Introduction
The University of California Santa Cruz (UCSC) Genome Browser is an interactive web tool that allows for the visualization, retrieval, and analysis of genomic data from a wide range of organisms. Launched in 2001, the UCSC Genome Browser initially provided access to a single draft genome assembly (1). Over the years, it has significantly expanded and now hosts over 4000 assemblies, encompassing data from numerous species. Users can access specific genome assemblies through the Gateway page by entering the assembly name or the GC accession identifier from GenBank (2). Users can also search for and request specific assemblies that may not be readily available in the existing database by using the Assembly Request page (https://genome.ucsc.edu/assemblyRequest.html).
The UCSC Genome Browser's most frequently used tool provides a visual display of datasets that have been aligned to an assembly sequence; displayed datasets are commonly referred to as tracks. The most popular assemblies among our users, human GRCh38/hg38 and GRCh37/hg19, feature over 37 000 data tracks. Our next most popular assemblies, for mice, provide over 9000 additional data tracks (3). Tracks are organized into several categories (Gene, Regulation, Variation, and others) to provide a structured way to navigate the vast amount of data available. To further aid users in navigating this extensive data collection, the UCSC Genome Browser offers a Track Search feature. This feature allows users to input search terms that query track descriptions, group classifications, and track names within a selected assembly and receive back a list of relevant datasets. Users can also upload their data to the UCSC Genome Browser through custom tracks or track hubs to visualize those data alongside the natively available tracks. Beyond visualization, the UCSC Genome Browser also offers both web-based and command-line tools, comprising hundreds of utilities that can assist users with data analysis.
The UCSC Genome Browser is accessible via the primary UCSC site (https://genome.ucsc.edu) and can also be accessed through our European (https://genome-euro.ucsc.edu) and Asian (https://genome-asia.ucsc.edu) mirror sites. These mirror sites ensure reliable and fast access to the UCSC Genome Browser resources worldwide for an international user base. The browser serves over 7000 unique users daily, with an estimated annual user base of 1.4 million. For users whose requirements are incompatible with accessing a web server hosted externally, such as when working with protected or embargoed data, we offer alternative solutions. The Genome Browser in the Cloud (GBiC) installation script enables the setup of a full mirror of the UCSC Genome Browser on the user's own server or cloud infrastructure. Additionally, the Genome Browser in a Box (GBiB) provides a virtual machine version of the UCSC Genome Browser, designed to run locally on a laptop or desktop computer. Both options ensure that protected data remain securely within the user's environment without being transmitted to UCSC.
The UCSC Genome Browser is continually updated to incorporate new features and data. Software updates are released every three weeks, accompanied by announcements detailing new features, track releases, and other updates. This regular update schedule ensures that the software remains at the forefront of genomic research tools, continually offering the latest data and functionality to its users.
New and updated annotations
Over the past year, the UCSC Genome Browser has undergone significant updates and expansions. Notably, more than 25 tracks have been added or updated across various assemblies. These include new clinical tracks, which provide vital information relevant to medical and translational research, and updates to existing gene tracks to ensure they reflect the latest genomic annotations and discoveries. We have also introduced three new public hubs.
New clinical tracks
This year, ten new clinical tracks were incorporated into our human assemblies. Among these, the gnomAD 4.1 track stands out; it presents variants from 807 162 individuals, including 730 947 exomes and 76 215 genomes (4). Additionally, we have introduced new prediction score tracks, including AbSplice, BayesDel, and Illumina SpliceAI. AbSplice is a method that predicts aberrant splicing across human tissues. Its track displays precomputed AbSplice scores for all possible single-nucleotide variants genome-wide (5). BayesDel provides a deleteriousness meta-score for coding and non-coding variants, single nucleotide variants and small insertions/deletions (6). SpliceAI is an open-source deep learning splicing prediction algorithm that can predict splicing alterations caused by DNA variations (7).
Another significant addition to our human assemblies is the DECIPHER Dosage Sensitivity track. This track utilizes an ensemble machine learning model to predict dosage sensitivity probabilities (pHaplo & pTriplo) for all autosomal genes. It has identified 2 987 haploinsufficient and 1 559 triplosensitive genes, including 648 genes uniquely identified as triplosensitive (8).
Gene set updates
Six gene tracks have been added or updated in both human and mouse assemblies, including the GENCODE KnownGene tracks, now updated to version 46 for human (V46) and version VM35 for mouse (9). The GENCODE ‘KnownGene’ v45lift37 gene track has also been integrated for the hg19 assembly, along with the GENCODE/UCSC Genes Archive superTrack, which contains the 2013 UCSC Genes track for reproducibility purposes. The GENCODE KnownGene tracks serve as the default gene tracks for human and mouse assemblies, and each gene in these tracks is associated with metadata and corresponding records at other resources.
A RefSeq Historical track has been added for the hg38 assembly, enabling the search of previous RefSeq transcript versions, including NM_ accessions and HGVS searches (10,11). We have also included gene tracks that undergo annual automatic updates, such as the HGMD and MANE tracks for human assemblies. The HGMD track displays transcripts with clinical variants from the Human Gene Mutation Database (HGMD) (12). The Matched Annotation from NCBI and EMBL-EBI (MANE) track showcases high-confidence transcripts that are identically annotated between RefSeq (NCBI) and Ensembl/GENCODE (led by EMBL-EBI) (13).
In addition, the search results have been enhanced to prioritize MANE transcripts, which are now displayed as the top result (Figure 1A). The color of the MANE transcripts has also been updated to distinguish them from other transcripts (Figure 1B).
Other new tracks
In addition, we have added or updated 10 tracks on our vertebrate assemblies. Among these, the CRISPR Targets track for the Telomere-to-Telomere assembly (hs1 human) identifies DNA sequences that can be targeted by CRISPR RNA guides using the Cas9 enzyme from S. pyogenes (PAM: NGG) (14). The VISTA Enhancers track has been incorporated into both human and mouse assemblies, displaying potential enhancers whose activity has been experimentally validated in transgenic mice (15).
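To make the NGG PAM requirement concrete, the following is a minimal Python sketch of how candidate Cas9 target sites can be located in a sequence. It is not the pipeline used to build the CRISPR Targets track; the function names, the 20-nt guide length, and the coordinate convention (0-based start of the whole guide-plus-PAM site on the forward strand) are assumptions made for illustration, and no on-target or off-target scoring is attempted.

```python
import re

# Complement table for unambiguous DNA bases; other characters (e.g. N) pass through.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cas9_targets(seq, guide_len=20):
    """List candidate SpCas9 sites: a guide_len-nt protospacer followed by NGG.

    Returns sorted (start, strand, guide, pam) tuples, where start is the
    0-based forward-strand coordinate of the leftmost base of the site.
    """
    seq = seq.upper()
    site_len = guide_len + 3
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)

    hits = []
    for m in pattern.finditer(seq):
        hits.append((m.start(), "+", m.group(1), m.group(2)))

    # Scan the reverse complement to find sites on the minus strand,
    # then convert the match position back to forward-strand coordinates.
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        start = len(seq) - (m.start() + site_len)
        hits.append((start, "-", m.group(1), m.group(2)))

    return sorted(hits)
```

For example, `find_cas9_targets("A" * 20 + "TGG")` reports a single plus-strand site starting at position 0, while the reverse complement of that sequence yields the same guide on the minus strand.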
We have also added the EVA SNP release 6 for 37 assemblies, which includes mappings of single nucleotide variants and small insertions and deletions (indels) (16). Furthermore, the Variants of Concern track has been updated to include the latest WHO-designated variants of concern (VOC), highlighting amino acid and nucleotide mutations in SARS-CoV-2 variants as defined in December 2021 (17).
New hubs
We accept submissions of datasets or assemblies to be featured as ‘Public hubs’. These track hubs are announced upon their addition to the public hubs page. This year, we have introduced three new public hubs.
The first new public hub is the ImpactHub, which utilizes the Impact machine learning model to provide predictions across 707 pairs of transcription factors (TFs) and cell types for regulatory element activity (18). The second is the Predominant PAS hub, which displays the locations of predominant polyadenylation sites and predominant polyadenylation hexamers for almost 16 000 protein-coding genes (19). The third hub is an assembly hub featuring the Sunflower Sea Star (Pycnopodia helianthoides) (20). These hubs are maintained by their respective authors.
Genome Archive (GenArk)
We continue to expand our GenArk hub library, adding over 1000 new GenArk assemblies, each equipped with Genome Browser annotations and BLAT support. This expansion includes the addition of Telomere-to-Telomere (T2T) primate and mouse assemblies to GenArk. Furthermore, we have integrated IGV outlinks from GenArk index pages to enhance usability and data access.
Recently, we published a new paper titled ‘GenArk: Towards a Million UCSC Genome Browsers,’ which details our Genome Archive (GenArk) system and its ongoing development (21).
New genome browser software
We have updated our software, particularly enhancing the visualization of our pages. The browser graphic page has undergone several modifications aimed at improving the user experience.
Browser graphic page
The browser graphic page displays data annotations, known as tracks, for a reference genome. Users can zoom, drag, and configure these display annotations. The browser graphic page has undergone several updates: the addition of trash icons next to custom tracks for quick removal, an enhanced display that shows more track columns on wider screens, increased font sizes for dialog boxes, and a reduction in the amount of text on the page (Figure 2).
We have introduced the new Item Details feature which simplifies the user experience by displaying track item details in a pop-up dialogue box. This feature allows the information to be viewed without the need to navigate away from the current page (Figure 3).
The genePred tracks, such as the GENCODE and NCBI RefSeq tracks, have undergone several updates. By right-clicking on a genePred track, users now have options for zooming, entering an exon position, or entering a codon for quicker navigation within the browser graphic (Figure 4A). The mouseOver function in genePred tracks has also been enhanced to display the phase of the first and last codon. Additionally, it now indicates whether exons are in-frame or out-of-frame (Figure 4B).
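The codon phase and in-frame/out-of-frame information can be illustrated with a short Python sketch. This is not the Genome Browser's implementation. It assumes 0-based half-open exon coordinates sorted by start (as in genePred), defines phase as the codon position (0, 1 or 2) of a coding base, and calls an exon in-frame when its coding length is a multiple of three; the browser's own definitions may differ, and the function and key names are hypothetical.

```python
def exon_frames(exons, cds_start, cds_end, strand="+"):
    """Describe the reading frame of each exon of one transcript.

    exons: list of (start, end) tuples, 0-based half-open, sorted by start.
    cds_start, cds_end: genomic bounds of the coding region (half-open).
    Results are returned in transcript order (reversed for the minus strand).
    """
    ordered = exons if strand == "+" else list(reversed(exons))
    results = []
    consumed = 0  # coding bases seen so far, in transcript order
    for start, end in ordered:
        # Clip the exon to the coding region.
        coding_len = max(0, min(end, cds_end) - max(start, cds_start))
        if coding_len == 0:
            # Entirely untranslated exon: frame is undefined.
            results.append({"exon": (start, end), "coding_len": 0,
                            "start_phase": None, "end_phase": None,
                            "in_frame": None})
            continue
        start_phase = consumed % 3   # codon position of the first coding base
        consumed += coding_len
        end_phase = consumed % 3     # codon position where the next exon resumes
        results.append({"exon": (start, end), "coding_len": coding_len,
                        "start_phase": start_phase, "end_phase": end_phase,
                        "in_frame": coding_len % 3 == 0})
    return results
```

Under these assumptions, removing an in-frame exon leaves the downstream reading frame unchanged, which is why this property is often checked when interpreting exon-level deletions or splice variants.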
The search box on the browser graphic display allows users to enter position queries or find terms that match track data, track descriptions, help documents, and public hub track descriptions. We have updated the search box which now shows the five most recent search terms. Any genes searched within the browser's graphical display or terms selected from the search results page will be displayed beneath the search box (Figure 5).
New hub features
We have introduced a new grouping feature for track hubs, which allows the structured organization of tracks into distinct groups (Figure 6). This can reduce the need for multiple hubs, each requiring separate files, by consolidating tracks into grouped hubs, where each group is managed within a single hub. This feature can be applied to a UCSC genome, a GenArk assembly, or an assembly hub. These track hub groups are kept separate from other track hubs and the native UCSC Genome Browser track groups, allowing for greater organizational flexibility. For instance, you can add a ‘genes’ group without causing conflicts or confusion.
Additionally, we have added an info link to the track hub's blue bar name, directing users to the hub description page for detailed information about the track hub. We have also implemented the highlighting of genomic DNA covered by track hubs using the Extended DNA Case/Color Options found under View → DNA Sequence (Figure 7). Highlighting options include case changes, underlining, bold, italics or color.
New and updated tools
We offer a REST API that enables querying of both annotation and sequence data from any UCSC genome assembly or hub (22). This year, we have expanded API support to include bigChain, bigMaf and bigDbSnp track types. Additionally, we introduced a new API function, revComp, which retrieves the reverse complement of a given sequence.
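As a rough illustration of how such queries might be composed, the Python sketch below builds request URLs for sequence and track data. The host `https://api.genome.ucsc.edu`, the `/getData/sequence` and `/getData/track` endpoints, and the parameter names are taken from our understanding of the public API documentation rather than from this article, and passing `revComp=1` as a query parameter is an assumption about how the reverse-complement option is exposed. The local `reverse_complement` helper only shows what the option is expected to return; no network request is made.

```python
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu"  # assumed public API host


def sequence_url(genome, chrom, start, end, rev_comp=False):
    """Build a URL requesting DNA sequence for a 0-based half-open range."""
    params = {"genome": genome, "chrom": chrom, "start": start, "end": end}
    if rev_comp:
        params["revComp"] = 1  # assumed form of the reverse-complement option
    return f"{API_BASE}/getData/sequence?{urlencode(params)}"


def track_url(genome, track, chrom=None, start=None, end=None):
    """Build a URL requesting items from one track, optionally for a region."""
    params = {"genome": genome, "track": track}
    if chrom is not None:
        params["chrom"] = chrom
    if start is not None and end is not None:
        params["start"] = start
        params["end"] = end
    return f"{API_BASE}/getData/track?{urlencode(params)}"


# Local equivalent of reverse complementing, for comparison with API output.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]
```

A URL produced this way can be fetched with any HTTP client, for example `urllib.request.urlopen`, and the JSON response parsed with the standard `json` module.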
Tutorial
We previously offered an interactive introduction tutorial for new users and have now added a new tutorial specifically focused on Clinical Genetics in the UCSC Genome Browser. This clinical tutorial guides users on searching for variants and related queries using HGVS terms, genome coordinates, gene symbols, and specific annotation IDs like NM identifiers and rsIDs. It also explains how to find recommended track sets that help configure displays with relevant annotations for variant interpretation. It includes examples of selecting tracks such as the Clinical SNVs and Clinical CNVs track sets. The tutorial highlights other features that may assist in variant interpretation.
Email support
We continue to offer email support through both public and confidential mailing lists, where UCSC Genome Browser staff address questions regarding tools or data. More details about our mailing lists can be found at https://genome.ucsc.edu/contacts.html, including a link to the public list, which archives previously answered questions.
Future plans
A major goal is to implement user accounts with 10 GB of dedicated storage for uploading annotation files and attaching them as track hubs. As storage costs decrease, we anticipate that future upgrades to our storage array will provide significantly larger capacity for the same cost as our current system. Additionally, we aim to offer a hub maker interface tool for files uploaded into a user's storage space, facilitating the creation of basic hubs. We continue to make progress on the Liftover on the Fly feature, which enables the automatic lifting of annotations to unannotated genomes without the need for manual track lifting. Furthermore, we are developing more beginner-friendly tutorials, which will include guides on the Gateway page, UCSC Genome Browser graphic display, and the Table Browser.
Acknowledgements
The authors extend their gratitude to the users and data providers for their ongoing use and support of the Genome Browser. We also thank the UCSC IT group, including Jorge Garcia and Erich Weiler, our grant administrators, and the members of our Scientific Advisory Board.
Notes
Present address: Gerardo Perez, Genomics Institute, University of California Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA.
Contributor Information
Gerardo Perez, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Galt P Barber, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Anna Benet-Pages, Institute of Neurogenomics, Helmholtz Zentrum Munchen GmbH - German Research Center for Environmental Health, 85764 Neuherberg, Germany; Medical Genetics Center (Medizinisch Genetisches Zentrum), Munich 80335, Germany.
Jonathan Casper, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Hiram Clawson, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Mark Diekhans, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Clay Fischer, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Jairo Navarro Gonzalez, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Angie S Hinrichs, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Christopher M Lee, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Luis R Nassar, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Brian J Raney, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Matthew L Speir, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Marijke J van Baren, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Charles J Vaske, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
David Haussler, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
W James Kent, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Maximilian Haeussler, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
Data availability
The UCSC Genome Browser (https://genome.ucsc.edu/) is freely available to all users. The exceptions are the source code and the Blat, liftOver and other utilities, which are free for non-profit academic research and for personal use; commercial use of these utilities or the source code requires a license.
Funding
National Human Genome Research Institute [U24HG002371 to G.P., G.P.B., J.C., H.C., M.D., C.F., J.N.G., A.S.H., C.M.L., L.R.N., B.J.R., M.L.S., M.J.v., C.J.V., D.H., W.J.K., M.H., U41HG010972 to B.J.R., M.H., U24HG007234 to M.D., R01HG010329 to M.D., D.H., R01HG010485 to M.D., D.H., U41HG010972 to B.J.R., D.H., RM1HG011543 to D.H.]; National Institute of Mental Health [R01MH120295 to M.D., RF1MH132662 to M.H., U24MH132628 to D.H., M.H.]; NIH Office of the Director [OT2OD033761 to M.D.]; California Department of Public Health [20-11088 to D.H.]; California Institute for Regenerative Medicine [DISC0-14514 to M.H.]; Centers for Disease Control and Prevention [75D30121C11554 to A.S.H.]; U.S. Department of Health and Human Services [6NU50CK000539-02-11 to A.S.H.].
Conflict of interest statement. G.P., G.P.B., J.C., H.C., M.D., C.F., J.N.G., A.S.H., C.M.L., L.R.N., B.J.R., M.L.S., D.H., W.J.K., M.H. receive royalties from the sale of UCSC Genome Browser source code, LiftOver, GBiB, and GBiC licenses to commercial entities. W.J.K. owns Kent Informatics.
References
- 1. Kent W.J., Sugnet C.W., Furey T.S., Roskin K.M., Pringle T.H., Zahler A.M., Haussler D. The human genome browser at UCSC. Genome Res. 2002; 12:996–1006.
- 2. Benson D.A., Cavanaugh M., Clark K., Karsch-Mizrachi I., Ostell J., Pruitt K.D., Sayers E.W. GenBank. Nucleic Acids Res. 2018; 46:D41–D47.
- 3. Raney B.J., Barber G.P., Benet-Pagès A., Casper J., Clawson H., Cline M.S., Diekhans M., Fischer C., Navarro Gonzalez J., Hickey G. et al. The UCSC Genome Browser database: 2024 update. Nucleic Acids Res. 2023; 52:D1082–D1088.
- 4. Karczewski K.J., Francioli L.C., Tiao G., Cummings B.B., Alföldi J., Wang Q., Collins R.L., Laricchia K.M., Ganna A., Birnbaum D.P. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020; 581:434–443.
- 5. Wagner N., Çelik M.H., Hölzlwimmer F.R., Mertes C., Prokisch H., Yépez V.A., Gagneur J. Aberrant splicing prediction across human tissues. Nat. Genet. 2023; 55:861–870.
- 6. Feng B.-J. PERCH: a unified framework for disease gene prioritization. Hum. Mutat. 2017; 38:243–251.
- 7. Jaganathan K., Panagiotopoulou S.K., McRae J.F., Darbandi S.F., Knowles D., Li Y.I., Kosmicki J.A., Arbelaez J., Cui W., Schwartz G.B., Chow E.D., Kanterakis E., Gao H., Kia A., Batzoglou S., Sanders S.J., Farh K.K.-H. Predicting splicing from primary sequence with deep learning. Cell. 2019; 176:535–548.
- 8. Collins R.L., Glessner J.T., Porcu E., Lepamets M., Brandon R., Lauricella C., Han L., Morley T., Niestroj L.-M., Ulirsch J. et al. A cross-disorder dosage sensitivity map of the human genome. Cell. 2022; 185:3041–3055.
- 9. Frankish A., Carbonell-Sala S., Diekhans M., Jungreis I., Loveland J.E., Mudge J.M., Sisu C., Wright J.C., Arnan C., Barnes I. et al. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res. 2022; 51:D942–D949.
- 10. O’Leary N.A., Wright M.W., Brister J.R., Ciufo S., Haddad D., McVeigh R., Rajput B., Robbertse B., Smith-White B., Ako-Adjei D. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2016; 44:D733–D745.
- 11. HGVS Recommendations for the Description of Sequence Variants: 2016 Update - Dunnen - 2016 - Human Mutation - Wiley Online Library (WWW Document) (2 October 2024, date last accessed) https://onlinelibrary.wiley.com/doi/10.1002/humu.22981.
- 12. Stenson P.D., Mort M., Ball E.V., Shaw K., Phillips A.D., Cooper D.N. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum. Genet. 2014; 133:1–9.
- 13. Morales J., Pujar S., Loveland J.E., Astashyn A., Bennett R., Berry A., Cox E., Davidson C., Ermolaeva O., Farrell C.M. et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature. 2022; 604:310–315.
- 14. Haeussler M., Schönig K., Eckert H., Eschstruth A., Mianné J., Renaud J.-B., Schneider-Maunoury S., Shkumatava A., Teboul L., Kent J. et al. Evaluation of off-target and on-target scoring algorithms and integration into the guide RNA selection tool CRISPOR. Genome Biol. 2016; 17:148.
- 15. Visel A., Minovitsky S., Dubchak I., Pennacchio L.A. VISTA Enhancer Browser—A database of tissue-specific human enhancers. Nucleic Acids Res. 2007; 35:D88–D92.
- 16. Yuan D., Ahamed A., Burgin J., Cummins C., Devraj R., Gueye K., Gupta D., Gupta V., Haseeb M., Ihsan M. et al. The European Nucleotide Archive in 2023. Nucleic Acids Res. 2024; 52:D92–D97.
- 17. Tracking SARS-CoV-2 variants (WWW Document) (25 August 2024, date last accessed) https://www.who.int/activities/tracking-SARS-CoV-2-variants.
- 18. Amariuta T., Luo Y., Gazal S., Davenport E.E., van de Geijn B., Ishigaki K., Westra H.-J., Teslovich N., Okada Y., Yamamoto K. et al. IMPACT: genomic annotation of cell-state-specific regulatory elements inferred from the epigenome of bound transcription factors. Am. J. Hum. Genet. 2019; 104:879–895.
- 19. Shiferaw H.K., Hong C.S., Cooper D.N., Johnston J.J., NISC, Biesecker L.G. Genome-wide identification of dominant polyadenylation hexamers for use in variant classification. Hum. Mol. Genet. 2023; 32:3211–3224.
- 20. Pycnopodia helianthoides (sunflower sea star) (WWW Document) (25 August 2024, date last accessed) https://bioinf.uni-greifswald.de/hubs/sunflower_sea_star/description.html.
- 21. Clawson H., Lee B.T., Raney B.J., Barber G.P., Casper J., Diekhans M., Fischer C., Gonzalez J.N., Hinrichs A.S., Lee C.M. et al. GenArk: towards a million UCSC genome browsers. 2023; Research Square doi:10.21203/rs.3.rs-2697398/v1, 2 April 2023, preprint: not peer reviewed.
- 22. Lee C.M., Barber G.P., Casper J., Clawson H., Diekhans M., Gonzalez J.N., Hinrichs A.S., Lee B.T., Nassar L.R., Powell C.C. et al. UCSC Genome Browser enters 20th year. Nucleic Acids Res. 2020; 48:D756–D761.