Abstract
The UCSC Archaeal Genome Browser (http://archaea.ucsc.edu) offers a graphical web-based resource for exploration and discovery within archaeal and other selected microbial genomes. By bringing together existing gene annotations, gene expression data, multiple-genome alignments, pre-computed sequence comparisons and other specialized analysis tracks, the genome browser is a powerful aggregator of varied genomic information. The genome browser environment maintains the current look-and-feel of the vertebrate UCSC Genome Browser, but also integrates archaeal and bacterial-specific tracks with a few graphic display enhancements. The browser currently contains 115 archaeal genomes, plus 31 genomes of viruses known to infect archaea. Some of the recently developed or enhanced tracks visualize data from published high-throughput RNA-sequencing studies, the NCBI Conserved Domain Database, sequences from pre-genome sequencing studies, predicted gene boundaries from three different protein gene prediction algorithms, tRNAscan-SE gene predictions with RNA secondary structures and CRISPR locus predictions. We have also developed a companion resource, the Archaeal COG Browser, to provide better search and display of arCOG gene function classifications, including their phylogenetic distribution among available archaeal genomes.
INTRODUCTION
The feature-rich UCSC Genome Browser (1,2), created originally to annotate the human genome, has become an established graphical web resource for analyzing higher eukaryotes. Its extensible architecture and the relatively small size of microbial genomes enabled us to develop the Archaeal Genome Browser with modest resources 6 years ago (3). The initial set of 26 archaeal genomes contained basic annotation tracks including G/C (guanine/cytosine) content, protein gene predictions from NCBI RefSeq (4) and the Comprehensive Microbial Resource (5), as well as known and predicted non-coding RNA genes. Additional details for protein gene annotations derived from Pfam domains (6), COG groups (7), KEGG pathway information (8) and ModBase structure predictions (9) were also integrated. Computational predictions of promoters and Shine–Dalgarno motifs (Chan and Lowe, manuscript in preparation), microarray gene expression studies (10) and multiple-genome alignments (11,12) further enriched the content of the genome browser with diverse gene information and enabled comparative genomic analysis.
With the increased availability of complete archaeal genomes, we have expanded our database to include 115 archaeal genomes, the genomes of 31 viruses known to infect archaea and more than 250 bacterial genomes. In addition, we have developed new tracks that provide more diverse or detailed information based on NCBI conserved protein domains, CRISPR predictions and paralogs within each genome. The newly implemented arCOGs Browser presents the Archaeal Clusters of Orthologous Genes (13), and further enables users to classify genes and their functions. To analyze gene expression patterns and discover novel transcripts, we include 19 microarray and two RNA sequencing (RNA-seq) data sets in six different archaeal species; we expect many more RNA-seq data sets in the near future as these functional genomics studies become more common. Together with new functionality in the UCSC Genome Browser, the information combined from diverse sources provides a valuable resource for the archaeal research community.
NEW BROWSER FEATURES
Updated entry portal
We have redesigned the home page of the Archaeal Genome Browser (archaea.ucsc.edu) for better presentation of database resources and easier accessibility to the rapidly growing genome browser collection. From the entry portal, a series of ‘tabs’ provides direct access to supporting information or resources. A ‘News’ section highlights newly added genomes, new functional genomics data sets and improved browser features. To help researchers use the genome browser more effectively, a ‘Tutorials’ section now contains short video guides and slide presentations, covering basic navigation, core tracks, browser configuration and advanced data search/extraction using the powerful ‘table browser’ database interface (14). Other new sections accessible from the home page include links to the new arCOGs browser, summary information about all available functional genomics data sets, a link to ‘Gene-Pub’ (described below) and a ‘Resources’ page with a collection of links to general information about archaea, archaeal research labs and other genome analysis resources.
Genome access and description
The standard method of launching the genome browsers has been the selection of genomes based on major clades at the genome gateway pages. To provide easier access, we have added a quick search box on the home page of the genome browser website. When users enter a partial genome name, the search box suggests matching available genomes, with links to the genome browser and species information/gateway page. Users may also peruse the full list of species from the ‘Genomes’ tab on the home page.
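The partial-name matching described above can be illustrated with a minimal sketch. It assumes simple case-insensitive substring matching with prefix matches ranked first; the function name `suggest_genomes` and the ranking rule are illustrative assumptions, not a description of the search box as implemented on the website.

```python
def suggest_genomes(query, genome_names, limit=10):
    """Return genome names that contain the query (case-insensitive).

    Hypothetical helper: names starting with the query are listed first,
    then the remaining matches, each group in alphabetical order.
    """
    q = query.strip().lower()
    if not q:
        return []
    matches = [name for name in genome_names if q in name.lower()]
    # Sort key: prefix matches (False) sort ahead of mid-string matches (True).
    matches.sort(key=lambda name: (not name.lower().startswith(q), name.lower()))
    return matches[:limit]


if __name__ == "__main__":
    # Small illustrative list; the real browser holds many more genomes.
    genomes = [
        "Pyrococcus furiosus DSM 3638",
        "Pyrobaculum aerophilum IM2",
        "Sulfolobus solfataricus P2",
    ]
    for hit in suggest_genomes("pyro", genomes):
        print(hit)
```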
At each species’ information/gateway page, researchers can view basic information including genome size, a listing of all associated chromosomes and plasmids, the number of predicted genes, G/C content, taxonomy and a direct link to the source sequence RefSeq entry at NCBI. Other species-specific information relating to the organism's habitat and physiology can be viewed via abstracts and PubMed links to the primary literature which detail the species’ isolation and genome sequencing, if available (Figure 1A). Finally, the species’ phylogeny among closely related species can be viewed in a phylogram that is computed from a multiple-genome alignment (12,15); the alignment itself is viewable within the ‘Conservation’ track of the species’ genome browser.
Figure 1.
Genome description page layout and feature browsing. (A) An example of a genome information/gateway page that includes genome size, the number of predicted genes, taxonomy, links to publications detailing the species’ isolation and genome sequencing, and links to feature sets available for indexed browsing. (B) The Pfam (6) domains within the genome are displayed in the left frame after clicking on the ‘Pfam protein domains’ link described in Figure 1A. The right frame displays the genomic region of the feature selected from the left frame; in this example, the selected 1-cysPrx_C Pfam domain is displayed within the genome browser.
Feature set browsing and information tooltip
The relatively small number of genes in each microbial genome enables us to provide a feature-list browsing window for basic annotation tracks including NCBI RefSeq (4) coding and non-coding genes, tRNAscan-SE (16) gene predictions, Pfam protein domains (6), Rfam non-coding RNA predictions (17) and CRISPR predictions (18,19) (Figure 1A). By selecting an available feature set from the genome information/gateway page, a list of the genes or features with the name, description and position in the genome can be displayed on the left frame while the genome browser is displayed in the right frame for the selected feature (Figure 1B). In this way, users can browse, select and inspect the genomic context of many features in a track rapidly, without shifting between windows.
For ease of navigation, we have introduced an information tooltip function within the browser display. When users move the mouse over a track item such as a gene in the NCBI RefSeq track, the feature name and brief description will be shown in a popup box without launching the item description page (Figure 1B).
New features from UCSC Genome Browser
By using the code base developed for the eukaryote-specific UCSC Genome Browser, we have been able to adopt its regularly improving functionality into the Archaeal Genome Browser. Users can now move the genome display region by dragging the browser image left and right, and zoom in to a desired region by holding the shift key while dragging over it. Annotation tracks can also be reordered by dragging them up or down in the browser window. To enable the display of very large data sets and sequence alignments from high-throughput sequencing results, the genome browser group has introduced the BigBed and BigWig (20) file formats and added support for the Binary Alignment/Map (BAM) (21) file format. A new data hub feature also allows researchers to create custom BigBed, BigWig or BAM tracks in groups (composite tracks) and host the data on their own web servers. To facilitate user-directed genome analyses with a variety of tools, researchers can export data directly from the UCSC table browser to Galaxy (22), an external, interactive platform for computational biological research. Users may also access the table browser functions of the Archaeal Genome Browser directly on the Galaxy website, and can visualize the results of new Galaxy data analyses within our genome browsers by exporting custom tracks from Galaxy.
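As an illustration of how remotely hosted BigBed, BigWig or BAM data are referenced, the sketch below assembles a custom track line of the kind used by the UCSC Genome Browser (`track type=... bigDataUrl=...`). It assumes the Archaeal Genome Browser accepts the same track line syntax as the main browser; the helper name `big_track_line` and the example URL are hypothetical.

```python
# Track types named in the text; other types are rejected by this sketch.
ALLOWED_TYPES = {"bigBed", "bigWig", "bam"}


def big_track_line(track_type, name, description, data_url):
    """Build a one-line custom track declaration for a remotely hosted file.

    The data file itself stays on the researcher's web server; only this
    line is submitted to the browser, which fetches the regions it needs.
    """
    if track_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported track type: {track_type}")
    return (
        f'track type={track_type} name="{name}" '
        f'description="{description}" bigDataUrl={data_url}'
    )


if __name__ == "__main__":
    # Hypothetical URL used purely for illustration.
    print(big_track_line(
        "bigWig",
        "RNA-seq plus",
        "Read coverage, plus strand",
        "https://example.org/plus.bw",
    ))
```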
NEW DATA
Genome assemblies
Since our initial publication in 2006, we have continued to add complete archaeal genomes as they become available, as well as draft genomes on request. To date, the Archaeal Genome Browser contains 115 archaeal genomes; broken down into subdomains, these include 35 crenarchaea, 75 euryarchaea, 2 thaumarchaea, 1 nanoarchaeon, 1 korarchaeon and 1 Caldiarchaeum. We have also provided 277 bacterial genomes and 31 archaeal viral genomes for comparative studies. In collaboration with other research labs, we have included seven mycobacteriophages and the human malaria parasite Plasmodium falciparum.
Browser tracks
Besides the gene annotations from NCBI RefSeq (4) and Comprehensive Microbial Resource (5), we have created new tracks in the standard collection for archaeal genomes that further diversify the sources of genome annotation (Figure 2). These include gene predictions from the Integrated Microbial Genomes system (23); individual gene sequences that appear as independent entries in GenBank (generally from pre-genome age gene locus studies) (24); and protein-coding gene predictions using Glimmer (25), GeneMark (26) and Prodigal (27). To better highlight conserved protein motifs, we have added genomic mappings of Pfam database (6) entries as a separate track, as well as matches to the NCBI conserved domain database (28) using RPS-BLAST (29). We also continue to update the protein sequence alignment tracks using BLASTP (29). The BLASTP alignment results enable us to identify potential paralogs within each genome.
As the number of experimental characterization studies increases, valuable insights into gene function often are not integrated into existing genome annotation. We have developed Gene-Pub as a platform for establishing a link between published research and archaeal gene annotations. Researchers can access the Gene-Pub submission system through the website entry portal page, and provide their updated, experimentally supported annotation, with the corresponding publication(s) for any archaeal gene. The submitted information will be reviewed for accuracy and then appended to existing gene annotation.
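The use of BLASTP results to flag potential paralogs within a genome, mentioned above, can be sketched as follows. The example assumes an all-against-all BLASTP search of one genome's proteins saved in the standard 12-column tabular format; the E-value and identity cutoffs and the function name `find_paralogs` are illustrative assumptions, since the criteria used for the browser track are not stated here.

```python
from collections import defaultdict


def find_paralogs(blast_lines, max_evalue=1e-10, min_identity=30.0):
    """Group within-genome BLASTP hits by query protein.

    Expects tab-separated lines with the columns: query, subject,
    percent identity, alignment length, mismatches, gap opens, query
    start/end, subject start/end, E-value, bit score.
    Both cutoffs are assumed values chosen only for illustration.
    """
    paralogs = defaultdict(set)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        if query == subject:
            continue  # a protein always matches itself
        if evalue <= max_evalue and identity >= min_identity:
            paralogs[query].add(subject)
    return dict(paralogs)


if __name__ == "__main__":
    # Made-up hits with placeholder gene names.
    hits = [
        "geneA\tgeneA\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600",
        "geneA\tgeneB\t45.2\t280\t150\t3\t5\t284\t10\t289\t1e-50\t180",
        "geneA\tgeneC\t25.0\t100\t75\t2\t1\t100\t1\t100\t0.5\t30",
    ]
    print(find_paralogs(hits))
```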
Figure 2.
Browser tracks available for a sample genome, P. furiosus. (A) GC Percent graph showing the G+C content percentage within a 20-nt sliding window. (B) CRISPR predictions (18,19). (C) NCBI RefSeq (4) protein-coding gene annotations with track elements color-coded to indicate COG functional category (38). (D) Comprehensive Microbial Resource gene annotations (5). (E) Integrated Microbial Genomes gene annotations (23). (F) Protein-coding gene predictions using GeneMark (26), Glimmer (25) and Prodigal (27). (G) Operon predictions from MicrobesOnline (39) and OperonDB (40). (H) Independent gene entries at GenBank. (I) Genomic mappings of Pfam database (6) entries. (J) NCBI conserved domain database (28) search results using RPS-BLAST (29). (K) Non-coding RNA gene annotations from Rfam (17). (L) tRNA gene predictions by tRNAscan-SE (16). (M) NCBI RefSeq (4) non-coding RNA gene annotations. (N) Insertion sequence elements annotated by ISfinder (41). (O) Palindromic transcription factor binding site predictions. (P) Promoter predictions on + strand using 1-nt sliding window of 16-nt BRE/TATA promoter motif scan. (Q) Poly-T motifs as possible transcription termination signals. (R) Ribosomal binding site predictions on + strand using 1-nt sliding window of 10-nt Shine–Dalgarno motif scan. (S) Small RNA sequencing data coverage by Michael Terns and colleagues (42). (T) Microarray expression data showing metabolism of elemental sulfur by Michael Adams and colleagues (43). (U) High similarity nucleotide alignments with other loci in the genome. (V) Paralogs within genome identified by BLASTP search (29). (W) Multiple sequence alignment similarity plot of closely related species using PhyloHMM (12,15). (X) Orthologous gene annotations in closely related genomes based on genome sequence alignments. (Y) Phylogenetic breakdown of BLASTP (29) protein similarities across all proteins within supported archaeal genomes.
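The GC Percent graph in panel (A) can be approximated with a short sliding-window calculation. The sketch below uses the 20-nt window size given in the caption; the step size, the handling of ambiguous bases and the function name `gc_percent_windows` are assumptions, and the browser's own implementation may differ.

```python
def gc_percent_windows(sequence, window=20, step=1):
    """Return (start, percent G+C) for each window along the sequence.

    Start positions are 0-based. Windows that would run past the end of
    the sequence are omitted. Bases other than G and C (including N)
    simply count as non-GC in this sketch.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, 100.0 * gc / window))
    return results


if __name__ == "__main__":
    # Toy sequence: a GC-rich half followed by an AT-rich half.
    toy = "GC" * 10 + "AT" * 10
    for start, pct in gc_percent_windows(toy, window=20, step=10):
        print(start, pct)
```

Setting `step` equal to `window` gives non-overlapping bins, whereas `step=1` gives a value at every position.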
Due to the existence of non-canonical tRNA introns and atypical tRNAs in archaea (30–34), the automated, usually dated annotations of these and other non-coding RNAs in RefSeq (4) may not be accurate. We therefore include a tRNA gene track predicted by an improved version of tRNAscan-SE (16,34) with links to the Genomic tRNA Database (35) for detailed information. The Rfam (17) track and CRISPR predictions (18,19) allow recognition of other known non-coding RNAs, giving the most complete view of non-coding RNAs in archaeal genomes.
From the Archaeal Genome Browser entry portal page, researchers can find a list of 19 available RNA sequencing and microarray experimental data sets that can be leveraged for gene expression analyses and novel transcript discovery. Links to Gene Expression Omnibus (GEO) (36), journal publications at PubMed, and the corresponding genome browsers are provided. Users may also retrieve the source RNA sequencing reads of Pyrococcus furiosus and Sulfolobus solfataricus by following links to the NCBI Sequence Read Archive (37). Within the genome browsers, we provide color-coded microarray tracks for displaying gene expression data. Furthermore, we have developed two separate sets of bar graphs to show the RNA-seq read coverage, as well as read-end density (Figure 2), enabling users to estimate the abundance level, length and boundaries of transcribed elements.
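To make the distinction between the two bar graphs concrete, the following sketch computes per-base read coverage and read-end density from a list of aligned read intervals. It assumes 0-based, half-open coordinates on a single linear sequence and ignores strand; the function name `coverage_and_read_ends` is hypothetical and this is not the pipeline used to build the browser tracks.

```python
def coverage_and_read_ends(reads, genome_length):
    """Compute per-base coverage and read-end counts.

    reads: iterable of (start, end) pairs, 0-based and half-open.
    Returns two lists of length genome_length:
      coverage[i] = number of reads overlapping position i
      ends[i]     = number of reads whose first or last base is position i
    """
    diff = [0] * (genome_length + 1)  # difference array for coverage
    ends = [0] * genome_length
    for start, end in reads:
        if not 0 <= start < end <= genome_length:
            raise ValueError(f"read ({start}, {end}) is outside the sequence")
        diff[start] += 1
        diff[end] -= 1
        ends[start] += 1      # first aligned base
        ends[end - 1] += 1    # last aligned base
    coverage = []
    depth = 0
    for i in range(genome_length):
        depth += diff[i]      # running sum turns differences into depth
        coverage.append(depth)
    return coverage, ends


if __name__ == "__main__":
    cov, ends = coverage_and_read_ends([(0, 5), (2, 7), (2, 5)], 10)
    print(cov)
    print(ends)
```

Coverage plateaus suggest the abundance and extent of a transcript, while peaks in read-end density mark candidate transcript boundaries.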
The increased number of complete microbial genomes in the past 5 years allows us to include multiple-genome nucleotide alignments among closely related species for most archaea and selected bacteria. For example, we provide a 14-way genome alignment in the browsers for Halobacteria. Enabling the visualization of annotation between species, the new ‘aligned features’ track can be used to identify orthologous genes in closely related genomes based on nucleotide sequence alignments. On the protein level, BLASTX tracks allow detection of potentially missed ORFs in intergenic regions, based on BLASTX comparison of unannotated nucleotide sequences to the protein databases.
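The first step of such a search, extracting the unannotated intervals between genes, can be sketched as below. The example assumes a linear sequence with 0-based, half-open gene coordinates and does not handle wrap-around on circular chromosomes; the minimum-length filter and the name `intergenic_regions` are illustrative assumptions.

```python
def intergenic_regions(genes, genome_length, min_length=1):
    """Return the gaps between annotated genes as (start, end) pairs.

    genes: iterable of (start, end) pairs, 0-based and half-open;
    overlapping or nested genes are merged implicitly.
    Gaps shorter than min_length are skipped.
    """
    regions = []
    cursor = 0  # end of the annotated territory seen so far
    for start, end in sorted(genes):
        if start - cursor >= min_length:
            regions.append((cursor, start))
        cursor = max(cursor, end)
    if genome_length - cursor >= min_length:
        regions.append((cursor, genome_length))
    return regions


if __name__ == "__main__":
    # Made-up coordinates for illustration.
    annotated = [(100, 400), (350, 600), (900, 1200)]
    print(intergenic_regions(annotated, 1500, min_length=50))
```

The sequences of the returned intervals would then be compared against a protein database with BLASTX to look for ORFs absent from the annotation.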
Archaeal COG Browser
Clusters of Orthologous Groups of proteins (COGs) have been widely used for evolutionary gene classifications, focusing primarily on bacterial and eukaryotic systems (38). To extend this work, the Evolutionary Genomics Research Group at NCBI developed the Archaeal Clusters of Orthologous Genes (arCOGs) that classify genes and provide improved functional annotation specific to archaeal genomes (13). With the input and support from the arCOGs team, we created the arCOGs Browser to provide better accessibility to this valuable resource. The current data set in the arCOGs browser represents the 2010 update, and will be updated periodically as releases are made available by the arCOGs team at NCBI. Researchers can search for gene loci and arCOG annotations across all archaeal genomes and view both the distribution and homolog count of any given arCOG or functional category (Figure 3). Each arCOG gene entry also links to the Archaeal Genome Browser for graphical viewing within its genome context. Within the genome browsers, arCOG annotations and links to the arCOG browser are reciprocally listed on the description page for each gene in the NCBI RefSeq track.
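The distribution and homolog count reported for an arCOG can be thought of as a simple tally over gene-to-arCOG assignments. The sketch below assumes the assignments are available as (genome, gene, arCOG) records; this record layout, the function name `arcog_distribution` and the gene identifiers are hypothetical and do not reflect the arCOGs Browser's internal schema.

```python
from collections import Counter


def arcog_distribution(assignments, arcog_id):
    """Count homologs of one arCOG in each genome.

    assignments: iterable of (genome, gene_id, arcog_id) tuples.
    Returns a dict mapping genome name to the number of member genes;
    genomes without any member are absent from the result.
    """
    counts = Counter(
        genome for genome, _gene, arcog in assignments if arcog == arcog_id
    )
    return dict(counts)


if __name__ == "__main__":
    # Placeholder records; gene identifiers are invented for illustration.
    records = [
        ("Pyrococcus furiosus", "gene1", "arCOG00078"),
        ("Pyrococcus furiosus", "gene2", "arCOG00078"),
        ("Sulfolobus solfataricus", "gene3", "arCOG00078"),
        ("Sulfolobus solfataricus", "gene4", "arCOG00001"),
    ]
    print(arcog_distribution(records, "arCOG00078"))
```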
Figure 3.
Using the Archaeal COG Browser. The left panel displays the main search interface with search results shown in the table below. Upon clicking on an arCOG link in the results table (in this example, the second entry arCOG00078), a new page displayed in the right panel gives the phylogenetic distribution of proteins within the arCOG (13).
The arCOGs database and the new arCOGs browser address significant deficiencies in existing archaeal gene annotations. A large proportion of genes in archaeal genomes are annotated as hypothetical proteins due to the limited availability of biochemical and comparative genomic data when these genomes were initially sequenced. Even as new gene characterization studies are published and comparative genomic data multiply, annotation updates at GenBank using these information resources are infrequent. For example, half of the genes in Pyrobaculum aerophilum, a crenarchaeon that was sequenced a decade ago, fall into this category. Using the arCOG data and browser, we found that more than 40% of these hypothetical proteins have an arCOG functional classification. Thus, integration of these new functional assignments within the Archaeal Genome Browsers represents an important advance for the research community.
FUTURE DEVELOPMENT
We will continue to incorporate new archaeal genomes and update existing annotations in the Archaeal Genome Browser. Because the Archaeal Genome Browser shares the same code base as the actively developed UCSC Genome Browser, we expect to offer new feature updates regularly. With the expanded use of next-generation sequencing technologies, track updates to enhance accessibility and visualization of new functional data will be of growing importance. The Archaeal Genome Browser will continue to focus on providing complete genomes and publicly available annotations, although we encourage members of the research community to contribute genome-wide data sets and new analyses, as well as unpublished genomes still in the process of community-based annotation efforts.
FUNDING
Funding for open access charge: The National Science Foundation (DBI-0641061 and EF-0827055).
Conflict of interest statement. None declared.
ACKNOWLEDGEMENTS
We thank Jim Kent and the Genome Bioinformatics Group of UC Santa Cruz for providing us with excellent assistance in developing the Archaeal Genome Browser. We are grateful to Lowe Laboratory members David Bernick, Aaron Cozen, Lauren Lui, Andrew Uzilov and Matthew Weirauch (University of Toronto) who helped develop annotation tracks. We thank Kira Makarova, Yuri Wolf and the arCOGs team in Evolutionary Genomics Research Group of NCBI for support and input in the development of the arCOGs Browser.
REFERENCES
1. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Res. 2002;12:996–1006. doi: 10.1101/gr.229102.
2. Rhead B, Karolchik D, Kuhn RM, Hinrichs AS, Zweig AS, Fujita PA, Diekhans M, Smith KE, Rosenbloom KR, Raney BJ, et al. The UCSC Genome Browser database: update 2010. Nucleic Acids Res. 2010;38:D613–D619. doi: 10.1093/nar/gkp939.
3. Schneider KL, Pollard KS, Baertsch R, Pohl A, Lowe TM. The UCSC Archaeal Genome Browser. Nucleic Acids Res. 2006;34:D407–D410. doi: 10.1093/nar/gkj134.
4. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35:D61–D65. doi: 10.1093/nar/gkl842.
5. Davidsen T, Beck E, Ganapathy A, Montgomery R, Zafar N, Yang Q, Madupu R, Goetz P, Galinsky K, White O, et al. The comprehensive microbial resource. Nucleic Acids Res. 2010;38:D340–D345. doi: 10.1093/nar/gkp912.
6. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, et al. The Pfam protein families database. Nucleic Acids Res. 2010;38:D211–D222. doi: 10.1093/nar/gkp985.
7. Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science. 1997;278:631–637. doi: 10.1126/science.278.5338.631.
8. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006;34:D354–D357. doi: 10.1093/nar/gkj102.
9. Pieper U, Eswar N, Webb BM, Eramian D, Kelly L, Barkan DT, Carter H, Mankoo P, Karchin R, Marti-Renom MA, et al. MODBASE, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 2009;37:D347–D354. doi: 10.1093/nar/gkn791.
10. Cozen AE, Weirauch MT, Pollard KS, Bernick DL, Stuart JM, Lowe TM. Transcriptional map of respiratory versatility in the hyperthermophilic crenarchaeon Pyrobaculum aerophilum. J. Bacteriol. 2009;191:782–794. doi: 10.1128/JB.00965-08.
11. Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004;14:708–715. doi: 10.1101/gr.1933104.
12. Siepel A, Haussler D. Combining phylogenetic and hidden Markov models in biosequence analysis. J. Comput. Biol. 2004;11:413–428. doi: 10.1089/1066527041410472.
13. Makarova KS, Sorokin AV, Novichkov PS, Wolf YI, Koonin EV. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea. Biol. Direct. 2007;2:33. doi: 10.1186/1745-6150-2-33.
14. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004;32:D493–D496. doi: 10.1093/nar/gkh103.
15. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15:1034–1050. doi: 10.1101/gr.3715005.
16. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–964. doi: 10.1093/nar/25.5.955.
17. Gardner PP, Daub J, Tate J, Moore BL, Osuch IH, Griffiths-Jones S, Finn RD, Nawrocki EP, Kolbe DL, Eddy SR, et al. Rfam: Wikipedia, clans and the “decimal” release. Nucleic Acids Res. 2011;39:D141–D145. doi: 10.1093/nar/gkq1129.
18. Grissa I, Vergnaud G, Pourcel C. CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res. 2007;35:W52–W57. doi: 10.1093/nar/gkm360.
19. Bland C, Ramsey TL, Sabree F, Lowe M, Brown K, Kyrpides NC, Hugenholtz P. CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics. 2007;8:209. doi: 10.1186/1471-2105-8-209.
20. Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics. 2010;26:2204–2207. doi: 10.1093/bioinformatics/btq351.
21. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
22. Goecks J, Nekrutenko A, Taylor J. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11:R86. doi: 10.1186/gb-2010-11-8-r86.
23. Markowitz VM, Chen IM, Palaniappan K, Chu K, Szeto E, Grechkin Y, Ratner A, Anderson I, Lykidis A, Mavromatis K, et al. The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Res. 2010;38:D382–D390. doi: 10.1093/nar/gkp887.
24. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2011;39:D32–D37. doi: 10.1093/nar/gkq1079.
25. Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23:673–679. doi: 10.1093/bioinformatics/btm009.
26. Besemer J, Lomsadze A, Borodovsky M. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 2001;29:2607–2618. doi: 10.1093/nar/29.12.2607.
27. Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. doi: 10.1186/1471-2105-11-119.
28. Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, et al. CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 2011;39:D225–D229. doi: 10.1093/nar/gkq1189.
29. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
30. Marck C, Grosjean H. Identification of BHB splicing motifs in intron-containing tRNAs from 18 archaea: evolutionary implications. RNA. 2003;9:1516–1531. doi: 10.1261/rna.5132503.
31. Fujishima K, Sugahara J, Tomita M, Kanai A. Large-scale tRNA intron transposition in the archaeal order Thermoproteales represents a novel mechanism of intron gain. Mol. Biol. Evol. 2010;27:2233–2243. doi: 10.1093/molbev/msq111.
32. Randau L, Munch R, Hohn MJ, Jahn D, Soll D. Nanoarchaeum equitans creates functional tRNAs from separate genes for their 5'- and 3'-halves. Nature. 2005;433:537–541. doi: 10.1038/nature03233.
33. Fujishima K, Sugahara J, Kikuta K, Hirano R, Sato A, Tomita M, Kanai A. Tri-split tRNA is a transfer RNA made from 3 transcripts that provides insight into the evolution of fragmented tRNAs in archaea. Proc. Natl Acad. Sci. USA. 2009;106:2683–2687. doi: 10.1073/pnas.0808246106.
34. Chan PP, Cozen AE, Lowe TM. Discovery of permuted and recently split transfer RNAs in Archaea. Genome Biol. 2011;12:R38. doi: 10.1186/gb-2011-12-4-r38.
35. Chan PP, Lowe TM. GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 2009;37:D93–D97. doi: 10.1093/nar/gkn787.
36. Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, et al. NCBI GEO: archive for functional genomics data sets–10 years on. Nucleic Acids Res. 2011;39:D1005–D1010. doi: 10.1093/nar/gkq1184.
37. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Edgar R, Federhen S, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2008;36:D13–D21. doi: 10.1093/nar/gkm1000.
38. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003;4:41. doi: 10.1186/1471-2105-4-41.
39. Price MN, Huang KH, Alm EJ, Arkin AP. A novel method for accurate operon predictions in all sequenced prokaryotes. Nucleic Acids Res. 2005;33:880–892. doi: 10.1093/nar/gki232.
40. Pertea M, Ayanbule K, Smedinghoff M, Salzberg SL. OperonDB: a comprehensive database of predicted operons in microbial genomes. Nucleic Acids Res. 2009;37:D479–D482. doi: 10.1093/nar/gkn784.
41. Siguier P, Perochon J, Lestrade L, Mahillon J, Chandler M. ISfinder: the reference centre for bacterial insertion sequences. Nucleic Acids Res. 2006;34:D32–D36. doi: 10.1093/nar/gkj014.
42. Hale CR, Zhao P, Olson S, Duff MO, Graveley BR, Wells L, Terns RM, Terns MP. RNA-guided RNA cleavage by a CRISPR RNA-Cas protein complex. Cell. 2009;139:945–956. doi: 10.1016/j.cell.2009.07.040.
43. Schut GJ, Bridger SL, Adams MW. Insights into the metabolism of elemental sulfur by the hyperthermophilic archaeon Pyrococcus furiosus: characterization of a coenzyme A-dependent NAD(P)H sulfur oxidoreductase. J. Bacteriol. 2007;189:4431–4441. doi: 10.1128/JB.00031-07.



