Abstract
Originally released in 2005, BASys (Bacterial Annotation System) was one of the first web servers to support online bacterial genome annotation and interactive genomic display. Over the past 20 years, web technologies and annotation algorithms have advanced considerably. To keep pace with these advances and with the changing needs of microbial genomics, we have developed BASys2 (Bacterial Annotation System 2.0). BASys2 is a significant upgrade to BASys, offering much more rapid (up to 8000× faster) and far more complete (2× as many data fields) genome annotation with significantly improved genome visualization capabilities. More specifically, BASys2 reduces annotation time from 24 h to as little as 10 s through fast genome matching combined with a novel annotation transfer strategy. Accepting either FASTA or FASTQ files, BASys2 can generate up to 62 annotation fields per gene/protein, leveraging over 30 bioinformatics tools and 10 different databases. Among the most distinctive features of BASys2 are its extensive support for whole-metabolome annotation and its generation of complete structural proteomes. BASys2’s new interactive genome viewer allows rapid, dynamic visualization of complete bacterial genome maps, with options to display/hide multiple concentric annotation tracks, show/remove color-coded legends, manipulate the genome map, and select/view individual gene and metabolite annotations. Available as a web server, a desktop viewer application, and a locally installable Docker image, BASys2 allows researchers to achieve unprecedented annotation depth and to easily upload, download, and display these rich genome/metabolome annotations. The BASys2 web server is freely accessible at https://basys2.ca.
Graphical Abstract
Introduction
When the first bacterial genome (Haemophilus influenzae) was sequenced in 1994, it took nearly 13 months to complete [1]. Since then, advances in genome sequencing, ranging from short-read next-generation sequencing (NGS) methods to long-read NGS approaches, have made it possible to sequence bacterial genomes in just a few hours [2]. The result has been an explosion of bacterial genome data. As of 1 March 2025, the NCBI data repository (https://www.ncbi.nlm.nih.gov/datasets/genome/) held 2.58 million bacterial or archaeal genome sequences. Prokaryotic DNA sequencing has become so fast, so routine, and so inexpensive that ∼4000 microbial genomes are deposited every day into the NCBI genome archive. This massive growth in genome sequencing has put considerable pressure on bacterial genome annotation tools. It has also led to a new appreciation of the importance of bacterial genome annotation. Indeed, it is widely recognized that microbial genome annotation plays a critical role in understanding bacterial physiology, evolution, and interactions with the environment. In other words, without high-quality, accurate genome annotations, it is impossible to truly understand the role or importance of any bacterial isolate.
The very first microbial genome annotation tools employed specialized gene identification programs such as GLIMMER [3] and GeneMark [4], which were often coupled with semi-automated BLAST [5] searches or manual literature reviews to functionally annotate genes via sequence similarity [6]. These methods were relatively inaccurate, manually intensive, and slow. This semi-automated approach was greatly improved by the development of fully automated genome annotation and genome visualization tools such as the TIGR Annotation Engine, released in 2004 [7], and the Bacterial Annotation System (BASys), released in 2005 [8]. Subsequently, these general genome annotation servers were succeeded by the Rapid Annotation using Subsystem Technology (RAST) server, first described in 2008 [9] and thereafter integrated into the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) [10]. Later, installable software tools for pipeline-based genome annotation, such as Prokka, were developed [11]. Over the past 20 years, many other bacterial genome annotation tools have appeared, offering more specialized annotations of rRNA genes (https://github.com/tseemann/barrnap), tRNA genes [12], prophage genes [13], mobile genetic elements [14], CRISPR elements [15], and secondary metabolites [16, 17]. Complementing this work was the development of NCBI’s Prokaryotic Genome Annotation Pipeline, or PGAP [18], and the allocation of dedicated NCBI computing resources to permit near-continuous bacterial genome annotation. PGAP has allowed both the NCBI and the microbial community to keep microbial genome annotations current with the ongoing flood of microbial genome data.
Despite these impressive developments, it is still apparent that most general prokaryotic genome annotation systems provide rather sparse annotations. Indeed, the average prokaryotic genome annotation tool generates only six to seven annotations per gene (name, gene abbreviation, gene sequence, protein sequence, genome location or position, length, function, or functional class) and supports just a few annotation classes (protein-coding genes, tRNAs, and rRNAs), with some tools offering annotations for non-coding small RNAs and pseudogenes. To acquire additional annotations, one must use much more specialized, “one-off” annotation tools or web servers [12–17] and manually combine the annotations.
Another obvious limitation is the fact that very few prokaryotic genome annotation systems offer high-quality, interactive, genome visualization interfaces. The exceptions are RAST/SEED, BV-BRC (which both require a login), BASys [8], Proksee [19], and GenSASv6.0 [20]. These servers offer interactive web-based visualizations. Other programs, such as Prokka/ARTEMIS and Prokka/Galaxy [21, 22], offer command-line visualization capabilities. Further “gap” assessments by our team also noted a dearth of genome annotation tools for performing metabolite prediction, metabolic reaction/pathway prediction, operon prediction, protein structure prediction, and comprehensive functional annotation.
Given these limitations, we decided to revisit our old BASys server (which actually offered many of these missing annotation/visualization capabilities) and update it with the most recent tools and technologies for genome annotation and visualization. The result of this upgrade is BASys2 (https://www.basys2.ca). BASys2 offers much more rapid (up to 8000× faster) and far more complete (2× as many data fields) genome annotation than the original BASys server. Comparisons to other genome annotation servers show even greater relative improvements or capabilities (Table 1). The new BASys2 server employs >30 different bioinformatics programs and 10 different databases to assemble and/or annotate a genome with up to 62 annotation fields in as little as 10 s. Unlike other annotation tools, BASys2 has a strong focus on generating in-depth metabolite/metabolome annotations. This allows researchers to connect microbial genes and proteins to both biochemical pathways and metabolites via RHEA [23], HMDB [24], and MiMeDB [24]. BASys2 also provides rich protein structural data (including 3D coordinate data) and interactive structural visualizations for all annotated proteins through its use of the AlphaFold Protein Structure Database (APSD) [25] as well as Proteus2 and HOMODELLER [26]. In addition, BASys2’s new interactive genome viewer provides fast, easily navigable, high-resolution, multi-track genome visualizations. Genes can be clicked (with a mouse or trackpad) to open detailed “GeneCards,” and metabolites can be clicked to open “MetaboCards.” All data generated by BASys2 can be freely downloaded, as can the BASys2 desktop viewer application (BDVA) and a locally installable Docker image of BASys2 for “power” users. Additional details about BASys2, its improvements, design, implementation, and comparison against other tools are presented in the following pages.
Table 1.
Web server genome annotation tool comparison using Escherichia coli str. K-12 substr. MG1655
| Feature | BASys2 | BASys | Proksee | Prokka w. Galaxy | BV-BRC | RAST/SEED | GenSASv6.0 |
|---|---|---|---|---|---|---|---|
| Depth of genome annotation | ++++ | +++ | ++ | + | +++ | +++ | +++ |
| Displays 3D protein structure | Yes | No | No | No | Yes | No | No |
| 3D protein coverage | ++++ | – | – | – | + | – | – |
| Supports metabolite annotation | Yes | No | No | No | Yes | Yes | No |
| Displays metabolome annotation | Yes | No | No | No | Yes | Yes | No |
| Displays metabolic pathways | Yes | No | No | No | Yes | Yes | No |
| Depth of metabolite annotation | +++ | + | – | – | + | + | – |
| Supports whole metabolome downloads | Yes | No | No | No | No | No | No |
| Processing time (min) | 0.5 (average) | 1440 | 44 | 2.5 | 15 | 51 | 222 |
| Visualization capabilities | Genome (CGView.js), 3D structure (Mol*), chemical structure, PathBank pathways | Genome (CGView) | Genome (CGView.js) | Genome (JBrowse) | Genome (JBrowse), 3D structure (Mol*), KEGG pathways | Genome (JBrowse), KEGG pathways | Genome (JBrowse) |
| User interface | Simple, modern | Simple, outdated | Complex, modern | Complex, modern | Complex, modern | Complex, outdated | Complex, outdated |
| Output file viewer | Yes | No | Yes | Yes | Yes | No | No |
| Log in required | No | No | No | No | Yes | Yes | Yes |
The query FASTA file was annotated using BASys2 and six other web server tools, and the results were compared. Speed and annotation depth were measured using all options/tools available through each web server.
Back-end improvements
The original back-end BASys code was >20 years old and relied heavily on older versions of BLAST and outdated databases. As a result, the old version of BASys typically took 24 h to complete a genome annotation, which is impractical when an entire bacterial genome can be sequenced in ∼2 h. In creating BASys2, we obtained or wrote newer, faster, or better annotation programs; acquired faster multi-processor CPUs; implemented massive parallelism throughout the annotation pipeline; and incorporated many new databases. The net result is a much improved, significantly faster, and far more comprehensive annotation pipeline, described in more detail below.
Improving the BASys annotation pipeline
As seen in Fig. 1, BASys2 now accepts whole genomes as FASTA files, GenBank files (GBF format), or NCBI accession numbers, as well as raw reads (in FASTQ format). Users can select GenBank, FASTA, or FASTQ files from their computer using the “Browse” function in the top half of the submission page and submit the file to the pipeline by pressing “Submit.” Example files (Fig. 1) are also available for users to explore the server’s functionality and performance. If a FASTQ file is submitted, the reads are assembled and cleaned up using SPAdes [27]. Assembly quality metrics are then assessed against BASys2’s own database of assembly characteristics for >30 000 bacterial species; for assemblies that do not map to any species in the database, the evaluation defaults to standard NCBI reference sequence exclusion criteria. With the current server configuration, assembly typically takes ∼4 min for a genome of 4.1 million bases at 100-fold coverage (1.5 million paired-end short reads or 15 000 long reads). The assembly yields an unannotated FASTA file, which is used as input to the BASys2 annotation pipeline. A GenBank file may already contain coding sequence (CDS) annotations and other data from PGAP, or it may be unannotated and contain only the sequence. An uploaded FASTA file is not expected to contain any CDS information. Regardless of the input type, BASys2 performs the appropriate format checks (GBF, FASTA, or FASTQ compliance) and size checks (genomes must have >400 000 bases) and then decides whether to annotate the file using the pre-identified CDS data or to extract the CDS with its own tools.
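The input triage described above (format detection, the >400 000-base size check, and the choice between reusing pre-identified CDS or predicting them) can be sketched as follows. This is a minimal illustration, not BASys2's actual code: the function names and the first-line format heuristic are assumptions, and the GenBank size check is omitted for brevity.

```python
# Hypothetical sketch of BASys2-style input triage; function names are illustrative.
MIN_GENOME_BASES = 400_000  # size threshold stated in the text (genomes must exceed this)


def detect_format(text: str) -> str:
    """Guess the input format from the first non-blank line (a heuristic, not full validation)."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            return "fasta"
        if line.startswith("@"):
            return "fastq"
        if line.startswith("LOCUS"):
            return "genbank"
        break
    raise ValueError("unrecognized sequence format")


def fasta_length(text: str) -> int:
    """Total number of bases across all records in a FASTA string."""
    return sum(
        len(line.strip())
        for line in text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def has_cds_features(genbank_text: str) -> bool:
    """True if a GenBank flat file already carries CDS feature entries."""
    return any(line.startswith("     CDS ") for line in genbank_text.splitlines())


def route_submission(text: str) -> str:
    """Decide the next pipeline step for an uploaded file."""
    fmt = detect_format(text)
    if fmt == "fastq":
        return "assemble_then_annotate"  # reads go through assembly first
    if fmt == "fasta":
        if fasta_length(text) <= MIN_GENOME_BASES:
            raise ValueError("genome must have more than 400 000 bases")
        return "predict_cds"  # FASTA carries no CDS information
    # GenBank: reuse pre-identified CDS when present (size check omitted for brevity)
    return "use_existing_cds" if has_cds_features(text) else "predict_cds"
```

A production validator would parse each format fully rather than trusting the first line, since a FASTQ quality string can itself begin with "@".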
Figure 1.
BASys2 annotation file upload page. Users can upload whole genome sequence files in FASTA or GenBank format, as well as raw reads in FASTQ format. Raw read FASTQ files must be uploaded as a ZIP file containing two FASTQ files. Upon file selection, users click the “Submit” button to go on to Assembly, if needed, and then Annotation. Alternatively, and below the file upload section, users can input NCBI accession numbers for FASTA/GenBank file retrieval.
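The read counts quoted for the ∼4 min assembly benchmark can be cross-checked with the standard coverage relation C = N·L/G, where N is the number of reads, L the mean read length, and G the genome size. The sketch below solves for L. It assumes each of the 1.5 million short reads is counted individually; if they are read pairs instead, each mate would be about half as long (∼137 bp).

```python
def implied_mean_read_length(genome_bases: float, coverage: float, n_reads: float) -> float:
    """Mean read length L implied by C = N * L / G (Lander-Waterman coverage)."""
    return coverage * genome_bases / n_reads


GENOME = 4.1e6   # 4.1 million bases (from the text)
COVERAGE = 100   # 100-fold coverage (from the text)

short = implied_mean_read_length(GENOME, COVERAGE, 1.5e6)   # 1.5 million short reads
long_ = implied_mean_read_length(GENOME, COVERAGE, 15_000)  # 15 000 long reads
print(f"short reads: ~{short:.0f} bp each; long reads: ~{long_ / 1000:.1f} kb each")
# prints: short reads: ~273 bp each; long reads: ~27.3 kb each
```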
If users choose to submit an NCBI accession (lower half of the submission page; see Fig. 1), BASys2 retrieves the corresponding GenBank file from the NCBI genome website and uploads the necessary sequence and/or annotation data to the pipeline. BASys2 then makes the appropriate decision to either annotate the file with pre-identified CDS or extract the CDS using its own CDS extraction tools. As explained in more detail below, BASys2 has two annotation pipelines: (i) a database-naïve pipeline (slower) and (ii) a database-informed pipeline (faster). The decision about which pipeline to follow is made on the basis of a sequence content/sequence comparison analysis of the query sequence against a specially annotated BASys2 database. Details regarding the database-naïve pipeline are explained here, while the details regarding the database-informed pipeline are given in the next section.
In BASys2’s database-naïve annotation pipeline, the first step is to identify protein CDS, which must be determined (or extracted from the uploaded file) before the pipeline can continue. If uploaded sequences are not pre-annotated with CDS data, they are run through Prodigal [28] and FragGeneScan [29] to find all CDS regions. In the second step, the remaining non-coding regions are run through tRNAscan-SE [30], ARAGORN [12], BARRNAP (https://github.com/tseemann/barrnap), and sRNAtoolbox [31] to find tRNAs, rRNAs, tmRNAs (transfer–messenger RNAs), sRNAs (small regulatory RNAs), and other miscellaneous RNAs.
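The hand-off between the two steps depends on knowing which parts of the genome are left over once the CDS are placed. A minimal sketch of that complement operation follows; it assumes CDS are given as 1-based inclusive (start, end) pairs with start ≤ end regardless of strand, and it treats the (usually circular) chromosome as linear for simplicity. The function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), start <= end on either strand


def noncoding_regions(cds: List[Interval], genome_length: int) -> List[Interval]:
    """Return the gaps between CDS intervals, i.e. the regions left for RNA finders.

    Overlapping or adjacent CDS are merged implicitly; a circular genome is
    treated as linear here for simplicity.
    """
    gaps: List[Interval] = []
    next_free = 1  # first position not yet covered by a CDS
    for start, end in sorted(cds):
        if start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= genome_length:
        gaps.append((next_free, genome_length))
    return gaps


print(noncoding_regions([(10, 20), (15, 30), (50, 60)], 100))
# prints: [(1, 9), (31, 49), (61, 100)]
```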
Functional annotation of all identified bacterial proteins is performed by similarity searches against multiple databases and by running various in-house and third-party tools. Specifically, BLAST [5] and DIAMOND [32] searches of the query genome’s CDS against the UniProt database [33] provide a large amount of functional annotation data. Functional annotations are transferred using the sequence identity thresholds and annotation rules originally developed and thoroughly tested in BASys, with a BLASTP E-value cutoff of 1 × 10⁻¹⁰. The Clusters of Orthologous Genes (COG) database [34] is also searched for additional functional descriptions of each gene. COG annotation was a key feature of the original BASys server, and we continue to offer it using the same annotation transfer rules. A large number of in-house tools are used to calculate various protein/CDS characteristics (Gene Ontology [GO] annotations, cell location, sequence length, isoelectric point, molecular weight, average hydrophobicity, selected amino acid content, membrane-spanning and signal sequence regions, etc.). Additional structural (secondary and tertiary structure) predictions are performed, as well as detailed metabolite/metabolic reaction predictions for each CDS (described in later sections). Paralogs of each gene/protein within the same genome are also identified and annotated via an all-against-all BLASTP search of every protein in the genome, using an E-value cutoff of 1 × 10⁻⁵.
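The two E-value cutoffs above can be illustrated on BLAST+ tabular output (`-outfmt 6`, default 12 columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). This sketch shows only the E-value filtering and best-hit selection. The additional BASys identity rules are not reproduced, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

TRANSFER_EVALUE = 1e-10  # BLASTP cutoff for functional annotation transfer (from the text)
PARALOG_EVALUE = 1e-5    # cutoff for the all-against-all paralog search (from the text)


def parse_outfmt6(lines: Iterable[str]) -> List[dict]:
    """Parse default 12-column BLAST tabular output (-outfmt 6)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits.append({"query": f[0], "subject": f[1], "pident": float(f[2]),
                     "evalue": float(f[10]), "bitscore": float(f[11])})
    return hits


def best_transfer_hits(hits: List[dict]) -> Dict[str, str]:
    """Map each query to its highest-bitscore UniProt hit that passes the E-value cutoff."""
    best: Dict[str, dict] = {}
    for h in hits:
        if h["evalue"] > TRANSFER_EVALUE:
            continue
        current = best.get(h["query"])
        if current is None or h["bitscore"] > current["bitscore"]:
            best[h["query"]] = h
    return {q: h["subject"] for q, h in best.items()}


def paralogs(self_hits: List[dict]) -> Dict[str, Set[str]]:
    """From an all-against-all BLASTP of one proteome, collect non-self hits under the cutoff."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for h in self_hits:
        if h["query"] != h["subject"] and h["evalue"] <= PARALOG_EVALUE:
            groups[h["query"]].add(h["subject"])
    return dict(groups)
```

Self hits are always present in an all-against-all search and are dropped explicitly, so a protein is never reported as its own paralog.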
In addition to these standard annotations, annotations from many high-performing third-party programs have been added to the BASys2 pipeline. For instance, operons are found using Rockhopper [35]. Likewise, PHASTEST [36] is used to find prophage regions and prophage genes, all of which are reported with a confidence score. Mobile regions are increasingly recognized as important in genome annotation, so we have added three programs to identify them. First, a database search against mobileOG-db [14] identifies regions that are likely to have entered the genome through one of several mobility mechanisms; these are noted in the results. Second, Alien Hunter [37] predicts genome regions that resulted from horizontal gene transfer and assigns them quality scores. Third, CRISPRCasFinder [15] identifies CRISPR–Cas regions within the genome. Antimicrobial resistance genes are annotated using the Comprehensive Antibiotic Resistance Database (CARD) and its Resistance Gene Identifier (RGI) [38]. CARD/RGI identifies the locations of these genes as well as their resistance mechanisms, antibiotic classes, and associated antibiotics. Biosynthetic gene clusters and their corresponding secondary metabolites are detected and annotated using antiSMASH 7.0 [16]. A database-naïve annotation of a typical bacterial genome of four megabases using the current server configuration (a multi-CPU, highly parallelized version of BASys2) takes about 20 min.
Performance optimizations (database-informed annotation)
BASys2 can also perform much faster genome annotations via its database-informed annotation pipeline (DIAP). The DIAP uses a database and annotation transfer strategy that significantly increases the speed of BASys2’s genome annotation. This method exploits the Ensembl Bacteria database (EBD) [39], which contains a non-redundant set of 31 332 bacterial genomes covering nearly all known bacterial species. As a result, almost any newly sequenced genome can be expected to share >95% sequence identity (the conventional cutoff for delineating bacterial species) with at least one genome in the EBD. By first running the database-naïve BASys2 annotation pipeline (described in the previous section) on the entire set of EBD genomes, we created a pre-annotated microbial genome database (called the BASys2-DB). Furthermore, because closely related bacterial species have very similar genome lengths and very similar A, T, G, C content, it is possible to create a characteristic content/length identifier (CCLID), or similarity identifier, for every genome in the EBD. When a new genome is submitted, its CCLID can be calculated in microseconds, and its CCLID similarity (equivalent to a cosine score) to all CCLIDs in the EBD can be calculated in milliseconds. The two highest-scoring CCLID matches are then compared with the query using FastANI [40] to confirm the best match.
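The exact composition of a CCLID is not specified here, so the following is only a minimal sketch under an assumed definition: a vector of A, C, G, and T fractions plus a scaled genome length, compared by cosine similarity, with the top two candidates passed on for FastANI confirmation (not shown). The length weight `LENGTH_SCALE_MB` and all function names are hypothetical.

```python
import math
from typing import Dict, List

LENGTH_SCALE_MB = 10.0  # hypothetical weight putting genome length on a scale similar to base fractions


def cclid(seq: str) -> List[float]:
    """Hypothetical content/length identifier: A, C, G, T fractions plus scaled length."""
    seq = seq.upper()
    n = len(seq)
    fractions = [seq.count(base) / n for base in "ACGT"]
    return fractions + [n / 1e6 / LENGTH_SCALE_MB]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top_candidates(query_seq: str, db: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the k database genomes whose CCLIDs are most similar to the query's."""
    q = cclid(query_seq)
    ranked = sorted(db, key=lambda name: cosine(q, db[name]), reverse=True)
    return ranked[:k]  # these would then be checked with FastANI
```

Because the database CCLIDs are precomputed, each query costs one pass over the sequence plus one short dot product per database entry, which is consistent with the microsecond/millisecond timings described above.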
Surprisingly, the vast majority of queries to BASys2 are for well-known model genomes (most of which are in the EBD). When such a query is submitted, BASys2 simply finds the exact match in the BASys2-DB (via the CCLID/FastANI search) and returns the fully annotated genome through the BASys2 genome viewer within 10 s. The second most common type of BASys2 query is for a strain variant of a previously well-characterized bacterial species. In this case, BASys2 identifies a single genome in the BASys2-DB that almost always has a >95% sequence identity match (i.e. a strain variant) to >99% of the query’s detectable genes. This allows BASys2 to rapidly transfer the vast majority of its annotations from the matched genome directly to the new genome with minimal calculation. These annotation transfers can typically be done in ∼60 s. In the very rare case where the query genome is from a completely novel species and exhibits <95% sequence identity to every EBD genome, the query genome sequence is sent through the database-naïve BASys2 annotation pipeline described earlier. These annotations take longer (∼20 min). Estimates of the annotation time and annotation progress for any query genome are provided in real time on the BASys2 website.
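The three-way routing described above can be summarized in code. This is a sketch, not the server's implementation. The 95% species cutoff comes from the text, but the `EXACT_ANI` threshold for declaring an exact match and the per-gene split are hypothetical illustrations.

```python
from typing import Dict, List, Optional, Tuple

SPECIES_ANI = 95.0  # species-level identity cutoff given in the text
EXACT_ANI = 99.9    # hypothetical threshold for treating a match as the same genome


def choose_route(best_ani: Optional[float]) -> str:
    """Pick an annotation route from the FastANI score of the best BASys2-DB match."""
    if best_ani is None or best_ani < SPECIES_ANI:
        return "database_naive"        # novel species: full pipeline (~20 min)
    if best_ani >= EXACT_ANI:
        return "return_precomputed"    # known genome: reuse stored annotation (~10 s)
    return "transfer_annotations"      # strain variant: transfer, then fill gaps (~60 s)


def split_genes(gene_identity: Dict[str, float],
                cutoff: float = SPECIES_ANI) -> Tuple[List[str], List[str]]:
    """Separate query genes whose annotations can be transferred from those needing computation."""
    transfer = [g for g, pid in gene_identity.items() if pid >= cutoff]
    compute = [g for g, pid in gene_identity.items() if pid < cutoff]
    return transfer, compute
```

In the strain-variant case, only the small `compute` list (typically <1% of genes, per the text) would need to go through the slower per-gene tools.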
Structural annotation upgrades
One of the strengths of the original BASys server was its extensive support for protein structural annotations. In keeping with this focus, an updated structural prediction pipeline has been implemented in BASys2 to increase both the scope and accuracy of its protein structure predictions and annotations. Specifically, BASys2 now uses the AlphaFold Protein Structure Database (APSD) [41] to generate most of its structural annotations. To facilitate this process, all proteins in the BASys2-DB were first searched against the APSD using BLAST. This generated a local bacterial-specific APSD consisting of both PDB coordinates and sequence data covering about 6% of the 93.8 million sequences (but essentially all the expected structural folds) in BASys2-DB. All remaining structurally unannotated sequences in the BASys2-DB were then modeled from APSD homologues using HOMODELLER [26]. All 3D structures generated in this way then had their secondary structures assigned using VADAR [42] and their membrane-spanning regions identified via Proteus2 [26]. This led to the creation of a database (called BASys2-DBn3D) of nearly 93.8 million bacterial sequences with 93.8 million predicted/modeled 3D structures and secondary structure assignments. When a new genome is submitted to BASys2, it is first checked for sequence similarity against the BASys2-DB; if it is sufficiently similar, the structure annotation pipeline uses the DIAP method described earlier to rapidly transfer structural data and structural annotations. For completely novel or unusual genomes, the slower, database-naïve approach is used, searching against the BASys2-DBn3D database and employing HOMODELLER to generate 3D coordinate data.
Regardless of the pipeline used, BASys2 now uniquely offers 3D coordinate data, 3D structure thumbnail images, interactively viewable structures (via Mol* Viewer [43]), secondary structure annotations, and membrane spanning region identification for all genomes annotated through its pipeline.
Metabolomic annotation upgrades
Microbes are essentially microscale chemical factories. As a result, their vital roles in metabolism, catabolism, and chemical transformation are increasingly being recognized and more thoroughly characterized. To support this shift in prokaryotic genome annotation, BASys2 now offers extensive metabolomic annotations for each identified CDS. As with its structural annotations, a metabolite annotation pipeline was set up to annotate all 31 000 genomes in the BASys2-DB with metabolite and reaction data predicted via BLAST searches of the UniProt database. A protein sequence identity cutoff of 60% (over the full sequence length) was applied to identify metabolic enzymes and thereby determine the associated metabolites and reactions. This sequence identity threshold was verified through multiple manual checks and literature-based annotations. Reaction data, including substrates and products, for each gene were then collected from the RHEA database. Additional metabolite data (nomenclature, synonyms, SMILES and InChI codes, structures, chemical formulas, molecular weights, chemical classes, descriptions, health effects, origins, etc.) were calculated via in-house programs or extracted by the BASys2 team from in-house databases such as HMDB and MiMeDB, as well as from external sources such as ChEBI [44] and PubChem [45]. Consistency checks, name normalization, and cross-referencing across databases were performed using InChIKeys to eliminate duplicate entries.
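Two steps in this pipeline lend themselves to a short sketch: applying the 60% identity cutoff over the full query length (rather than over the aligned span only), and collapsing metabolite records from several sources by InChIKey. Estimating identical residues as pident × alignment length is an assumption, as are the record fields and the function names.

```python
from typing import Dict, List

IDENTITY_CUTOFF = 0.60  # 60% identity over the full sequence length (from the text)


def full_length_identity(pident: float, aln_length: int, query_length: int) -> float:
    """Approximate identical residues over the whole query, not just the aligned span."""
    identical = pident / 100.0 * aln_length
    return identical / query_length


def is_metabolic_match(pident: float, aln_length: int, query_length: int) -> bool:
    """True if a UniProt hit is close enough to transfer enzyme/metabolite annotations."""
    return full_length_identity(pident, aln_length, query_length) >= IDENTITY_CUTOFF


def merge_by_inchikey(records: List[dict]) -> Dict[str, dict]:
    """Collapse metabolite records from several sources into one entry per InChIKey."""
    merged: Dict[str, dict] = {}
    for rec in records:
        key = rec["inchikey"].strip().upper()
        entry = merged.setdefault(key, {"name": rec["name"], "synonyms": set(), "sources": set()})
        entry["synonyms"].add(rec["name"].strip().lower())  # simple name normalization
        entry["sources"].add(rec["source"])
    return merged
```

Measuring identity against the full query length means that a 90% identical hit covering only half of the protein (∼45% overall) fails the cutoff, which guards against transferring enzyme functions on the strength of a single shared domain.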
This work led to a richly metabolite-annotated database (BASys2-DB) covering all 31 000 bacterial genomes in the EBD. It also led to a robust, automated bacterial metabolite annotation pipeline. Consequently, when a new genome is submitted to BASys2, it is first checked for sequence similarity against the BASys2-DB, and if found to be sufficiently similar, the metabolite annotation pipeline makes use of the DIAP method, described earlier, to rapidly transfer metabolite data and metabolome annotations to the new genome. For novel or unusual genomes, the slower, database-naïve approach is used to annotate the metabolome. Regardless of the pipeline used, all metabolite annotations are presented as lightly annotated “MetaboCards” in a manner similar to the MetaboCards shown for HMDB and MiMeDB. As a result of this upgrade, BASys2 now uniquely offers rich metabolite and reaction data (on a single gene basis) via BASys2’s MetaboCards, and equally rich metabolome data (on a whole genome basis). Each BASys2 MetaboCard is also hyperlinked to the HMDB and MiMeDB, allowing users to access much more information on each metabolite, such as associated metabolic pathways, potential health effects, spectral data, and known bioactivity. Additionally, BASys2 allows users to visualize metabolic pathways directly within MiMeDB, offering a clearer understanding of how genes and their associated metabolites contribute to biochemical processes.
Hardware and implementation upgrades
The original BASys was coded in Perl making use of the common gateway interface (CGI) for generating dynamic web pages. It also employed a somewhat outdated Apache server architecture and used a much older, much more limited version of CGView [46] to visualize bacterial genomes. The entire BASys server was run on a single 32-bit Intel/Linux PC rated at 1.8 GHz. In contrast, BASys2 now uses a 128 CPU core cluster with four Intel Xeon X5460 processors, each rated at 3.16 GHz, along with six AMD Opteron 2220 and two AMD Opteron 6348 processing cores. This multi-core architecture allows nearly all of BASys2’s annotation operations to be run in parallel. To support more facile data manipulation, faster server operations, and full-stack integration, the BASys2 web server framework was rewritten from CGI to Ruby on Rails. Likewise, BASys2 now uses the nginx server architecture that offers reverse proxy and load balancing to support multiple protocols, thereby improving the handling of web traffic. BASys2’s version of CGView has been substantially upgraded and now makes use of JavaScript to support more rapid, multi-track interactive front-end visualization [47], which is described in more detail below.
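One simple way to exploit a multi-core server when most annotation steps are external programs is to fan the independent invocations out over a worker pool. The sketch below is an assumption about how such parallelism might be organized, not a description of BASys2's scheduler. Threads are used because the heavy computation happens in the child processes, and the job names and stand-in commands are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_tool(cmd: List[str]) -> str:
    """Run one external annotation tool and return its stdout (raises on non-zero exit)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def run_all(jobs: Dict[str, List[str]], max_workers: int = 8) -> Dict[str, str]:
    """Run independent tool invocations concurrently; threads suffice because
    the heavy lifting happens in the child processes, not in Python."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(run_tool, cmd) for name, cmd in jobs.items()}
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    # Stand-in commands; real jobs would invoke tools such as tRNAscan-SE or ARAGORN
    demo = {f"job{i}": [sys.executable, "-c", f"print({i} * {i})"] for i in range(4)}
    print(run_all(demo))
```

Steps that depend on earlier results (for example, RNA searches that need the CDS map) would be scheduled in a later wave, with each wave internally parallel.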
Front-end improvements
Genome map upgrades
BASys2 features an enhanced graphical user interface (GUI) designed to be more user-friendly and intuitive than the original BASys GUI. BASys2’s updated GUI adopts a new theme consistent with all other modern web servers produced by our lab, ensuring a more cohesive user experience. This includes a new, thematically consistent home page with menu tabs (“Annotate,” “Upload,” “Tutorial,” “Download,” and “About”) and interactive sliding links on the landing page as alternate routes to “Annotate” genomes or “Upload” previously annotated genomes. Clicking on the “Annotate” tab or “Annotate” link brings users to the Annotation page (Fig. 1), where users may browse and upload FASTA, FASTQ, or GenBank files from their computer or enter a GenBank accession ID. The Submission Results page (Fig. 2) has been redesigned to provide more comprehensive genome statistics, including the genome sequence length, gene count, phage region count, and other relevant metrics. Users can not only view the BASys2 genome map interactively but also download the full genome map as a ZIP file, along with a JSON representation of the map that can be used with our BASys2 desktop viewer application (BDVA; discussed below).
Figure 2.
BASys2 Submission Results page. All predicted genome annotations are displayed in a circular genome view. The results can be downloaded in ZIP and JSON format from the links provided. Genome summary statistics are displayed that include information on the genome sequence length, the number of phage regions found, and the total number of genes found. Users can navigate to the BASys2 Metabolome Table by clicking the button on the right side.
A key advancement in BASys2 is the integration of CGView.js [47] for interactive circular genome map visualization. This JavaScript version of CGView allows for rapid rendering, easy navigation, interactive zooming to base-pair resolution, and real-time editability. Additionally, users can customize the genome map's gene and feature coloring using an integrated color picker. The genome annotation tracks (forward and reverse strands) can be further colored based on COG functions by selecting the “Fetch COG Annotation” option, which applies functional categorization and updates the color legend accordingly. BASys2 also supports high-quality figure generation, enabling users to export genome maps as PNG or SVG images at publication-quality resolution.
The BASys2 genome viewer presents its genome annotation results as a series of concentric circular annotation tracks that can be zoomed down to the sequence level. A figure legend, displayed on the right-hand side of the viewer, describes each annotation track and provides detailed information on the corresponding genes and features. Users can toggle individual tracks and the legend on or off using the switches labeled with their respective track names, located above the viewer. The BASys2 genome viewer also includes a set of control buttons that facilitate map navigation and image export. By default, zooming via mouse scrolling is disabled; it can be enabled by clicking the “Lock” button. Hovering over any feature on the map displays a truncated version of its annotation data, enhancing the viewer's overall interactivity.
For more precise navigation, users can enter a specific base-pair position in the “Go To BP” text box, which directs and automatically zooms the genome map to the selected genomic region and displays all associated features. Additionally, a search bar located above the map allows users to locate specific genes or metabolites by entering their common name. As the user types a query, matching features remain visible while non-matching ones are temporarily hidden, thereby streamlining the identification of relevant annotations (Fig. 2). We believe these improvements collectively enhance the efficiency and clarity of genome visualization in BASys2.
Clicking on a specific gene feature (a gene name or a colored gene segment) opens a scrollable “GeneCard” modal (Fig. 3), presenting the complete annotation dataset for the selected gene in a structured, easily interpretable format. Each GeneCard includes up to 62 annotations, including the gene or feature name, feature type (CDS, rRNA, tRNA, etc.), abbreviations and synonyms, start and stop positions, DNA length, DNA sequence, and the preceding and following genes. If the feature is a protein, the protein length, molecular weight, protein sequence, COG function, GO function, isoelectric point, secondary structure, PROSITE features, and many other annotations are provided. Cross-referenced database identifiers and hyperlinks to Pfam [48], InterPro [49], UniProt, PROSITE [50], and other databases are also available. A notable enhancement in BASys2 is the inclusion of 3D protein structure images and 3D structure data for all protein CDS annotated for each genome. For detailed structural analysis, users can open a selected 3D structure in the Mol* Viewer (Fig. 3), which enables interactive visualization, zooming, rotation, image customization, and high-resolution image export. To support reproducibility and provenance tracking, each annotation in BASys2 is accompanied by metadata on its acquisition/prediction source, specifying the computational tool used and the associated score of the BLAST match or prediction. Users can also download the DNA or protein sequence of a selected feature as a FASTA file via the “Download DNA FASTA” or “Download Protein FASTA” buttons at the top of the GeneCard modal.
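Two of the per-protein values computed by BASys2's in-house tools, molecular weight and average hydrophobicity, are simple sums over the sequence. The sketch below assumes the Kyte-Doolittle hydropathy scale (reported as GRAVY) and standard average residue masses. The text does not state which scale or mass table BASys2 uses, and isoelectric point, which requires an iterative charge calculation, is omitted.

```python
from typing import Dict

# Kyte-Doolittle hydropathy values (assumed scale for "average hydrophobicity")
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Average residue masses in daltons (free amino acid minus one water)
RESIDUE_MASS: Dict[str, float] = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def molecular_weight(protein: str) -> float:
    """Average molecular weight (Da): residue masses plus one water for the termini."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER
```

Non-standard residue codes (e.g. X or U) raise a `KeyError` here. A production tool would need an explicit policy for them.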
Figure 3.
Detail Card and 3D protein structure viewer. Results for each annotation feature can be viewed in a modal where users can scroll through all the available information. For genes with protein predictions, a 3D protein structure viewer (powered by Mol* Viewer) can be opened by clicking the “Open 3D Viewer” button. Similarly, a static chemical structure image is shown for metabolite features (not shown).
When a user selects a metabolite feature, a scrollable MetaboCard modal appears, displaying up to 30 metabolite annotation fields. These include the chemical name, IUPAC name, chemical synonyms, chemical structure, molecular weight, chemical formula, SMILES and InChI representations, and a direct hyperlink to the full MetaboCard. To explore the entire metabolome associated with the organism, users can click the “Metabolome Table” button above the genome map, which directs the user to an interactive table listing all identified metabolites along with their names, molecular formulas, molecular weights, and structures (Fig. 4A). Clicking on a metabolite entry ID opens its full MetaboCard (Fig. 4B). The complete metabolome dataset can be exported as a CSV file by clicking the “Download CSV” button. Metabolome annotations are also included in the full annotation ZIP file.
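The metabolome CSV export lends itself to simple scripted filtering. Below is a minimal Python sketch using only the standard library, assuming the export has a header row that includes a molecular-weight column; the column name `"Molecular Weight"` and the function name are hypothetical and should be adjusted to match the actual file.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column name: check the header row of the real export.
MASS_COLUMN = "Molecular Weight"


def metabolites_in_mass_range(csv_text: str, low: float, high: float,
                              mass_column: str = MASS_COLUMN) -> List[Dict[str, str]]:
    """Return CSV rows whose mass column lies within [low, high].

    Rows with a missing, blank, or non-numeric mass are skipped.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(mass_column) or "").strip()
        try:
            mass = float(value)
        except ValueError:
            continue  # blank or non-numeric entry
        if low <= mass <= high:
            rows.append(row)
    return rows
```

To apply it to a saved export, pass in the file contents, for example `open("metabolome.csv", newline="", encoding="utf-8").read()` (the filename is hypothetical). Using `csv.DictReader` rather than splitting lines on commas keeps quoted fields, such as chemical names that contain commas, intact.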
Figure 4.
Organism metabolome table view and standard MetaboCard view. (A) All the predicted metabolite annotations are displayed as an interactive table. The data can be downloaded as a CSV file by clicking the “Download CSV” button. Clicking on any entry directs the user to a detailed MetaboCard for the metabolite of interest. Clicking the “Genome View” button takes the user back to the genome map. (B) Detailed information for each metabolite provides users with names, SMILES, chemical structure images, pathway associations, and external database references.
BDVA and BASys2 Docker image
Once a user ends a session with BASys2, all files are removed. To avoid data loss or the need to repeat annotations, all of BASys2’s annotations can be downloaded as a ZIP file by clicking the “Download Results” link at the top of the genome viewer. The ZIP file contains the complete genome annotation data (gene positions, gene tracks, gene data, metabolite data, structures, etc.), including the genome map data as a JSON file. Users can also download just the genome map (as a JSON file) by clicking the “Download CGView Map” link at the top of the genome viewer. These JSON files can later be uploaded to the BASys2 web server and viewed there via its “Upload” option at the user's convenience.
Alternatively, users can view their files offline through the BDVA. This tool, which can be downloaded via the “Download” tab at the top of the BASys2 home page, is available for macOS, Windows, and Linux. Once the BDVA is installed, any downloaded BASys2 JSON file can be loaded into it, allowing users to visualize their genome annotations offline. The BDVA provides the same functionality as the online viewer, enabling users to explore genome annotations, navigate the genome map, and interact with features in an offline environment. The BDVA also allows users to export genome maps as high-resolution PNG or SVG images for publication.
Users interested in running BASys2 locally can also download a Docker image (via the “Download” tab at the top of the BASys2 home page) and install it on their own machines. Installation instructions and file size information are provided on the corresponding “Download” page.
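For scripted inspection of the downloaded results, the ZIP archive can be read with Python's standard library. The sketch below assumes the genome map JSON follows the CGView.js layout, with a top-level `cgview` object holding a `features` list whose entries carry a `type` field; the member names inside the BASys2 ZIP are not specified here, so every `.json` member is scanned. Because the archive may hold more than one JSON file describing the same features, the summed counts could include duplicates; restrict the loop to a single member if that matters. Function names are hypothetical.

```python
import json
import zipfile
from collections import Counter
from typing import IO, Any, Dict, Union


def count_feature_types(map_data: Dict[str, Any]) -> Counter:
    """Tally features by their 'type' field in a CGView.js-style map.

    Assumes the layout {"cgview": {"features": [{"type": ...}, ...]}};
    JSON with any other top-level shape contributes nothing.
    """
    counts: Counter = Counter()
    cgview = map_data.get("cgview", {}) if isinstance(map_data, dict) else {}
    for feature in cgview.get("features", []):
        counts[feature.get("type", "unknown")] += 1
    return counts


def count_feature_types_in_zip(zip_source: Union[str, IO[bytes]]) -> Dict[str, int]:
    """Sum feature-type tallies over every .json member of a results ZIP."""
    total: Counter = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            if member.lower().endswith(".json"):
                data = json.loads(archive.read(member).decode("utf-8"))
                total.update(count_feature_types(data))
    return dict(total)
```

`zip_source` may be a file path or an open binary file object, since `zipfile.ZipFile` accepts either.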
Comparison to other genome annotation tools
To assess the differences and improvements in BASys2 relative to other genome annotation tools more objectively, we performed a detailed comparison. BASys2 was compared with six other web-based prokaryotic genome annotation tools: BASys, Proksee, Prokka via Galaxy, BV-BRC, RAST/SEED, and GenSAS v6.0. PGAP was not included because it is available only as a downloadable software application rather than as an openly accessible web server. Each tool was assessed according to a number of criteria, including annotation speed, total number of genome annotation fields, total number of display tracks, support for fully automated annotation, support for protein structure analysis, support for metabolome annotation, support for metabolite display, total number of metabolite annotation fields, support for reaction and pathway display, overall data visualization capabilities, quality of the graphical user interface (GUI), support for data downloads, support for data uploads, support for offline viewing, and login requirements (Table 1). An unannotated FASTA file of Escherichia coli str. K-12 substr. MG1655 was used as the standard test genome for evaluating each tool.
As seen from this comparison, BASys2 stands out as the most comprehensive genome annotation tool. It offers the fastest annotations, the greatest genome annotation depth, and the largest number of display tracks; it is the only tool that supports protein structure analysis for all annotated proteins and the only one that supports reaction and pathway display; and it provides the most comprehensive metabolome/metabolite analysis, with the greatest metabolome annotation depth. BASys2 also stands out in overall data visualization capabilities, GUI quality, and data management. Prokka via Galaxy, while among the fastest tools, lacks protein structure and metabolome annotation capabilities, has very limited annotation depth, and provides only genome visualization through a complex GUI. GenSAS v6.0 and RAST offer moderate annotation depth but are among the slowest tools; both have outdated GUIs and require a login, and neither has an output file viewer. BV-BRC provides comprehensive annotation, including pathway visualizations and 3D protein structure representations. However, its 3D structural data are available only for annotated proteins that already have an AlphaFold2-predicted structure, which typically corresponds to ∼6% of the proteins in a query genome. These comparisons highlight some of the significant annotation and visualization advantages of BASys2.
Conclusion
BASys2 represents a significant advancement in bacterial genome annotation, addressing critical gaps in functionality, speed, and user accessibility. By supporting comprehensive, cutting-edge genomic, structural, and metabolomic annotations, BASys2 provides microbial researchers with a powerful, automated, and user-friendly platform that enables in-depth microbial analyses. Compared to existing annotation tools, BASys2 stands out for its rapid processing capabilities, extensive annotation depth, and intuitive data visualization tools.
The creation of BASys-DB and the implementation of a novel, rapid, database-informed genome annotation transfer method with CCLID and FastANI have helped reduce BASys’s genome annotation time from 24 h to as little as 10 s. Likewise, the integration of metabolomic data, linking genes to biochemical pathways and metabolites through RHEA, HMDB, and MiMeDB, has given BASys2 exceptional metabolite and metabolome annotation capabilities, which should greatly improve bacterial metabolome analysis. In addition, the inclusion of protein structural annotations via APSD and HOMODELLER gives BASys2 unique capabilities for gaining important structural insights. BASys2’s upgraded genome viewer, along with its newly developed BDVA and downloadable Docker image, should further enhance usability, allowing researchers to explore BASys2’s rich multi-omic annotations both online and offline.
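Annotation transfer of this kind depends on first finding a sufficiently close reference genome. The following Python sketch shows one plausible way to make that decision from FastANI output, assuming FastANI's standard tab-separated columns (query, reference, ANI, bidirectional fragment mappings, total query fragments). The 95% ANI cut-off follows the species boundary reported by Jain et al. [40]; the 50% fragment-coverage cut-off, the tie-breaking rule, and all names are illustrative assumptions, and the CCLID step and BASys-DB lookup are not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class AniHit:
    """One line of FastANI output."""
    query: str
    reference: str
    ani: float    # average nucleotide identity, in percent
    mapped: int   # bidirectional fragment mappings
    total: int    # total query fragments

    @property
    def coverage(self) -> float:
        """Fraction of query fragments that mapped to the reference."""
        return self.mapped / self.total if self.total else 0.0


def parse_fastani(lines: Iterable[str]) -> List[AniHit]:
    """Parse tab-separated FastANI lines: query, reference, ANI, mapped, total."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        hits.append(AniHit(fields[0], fields[1], float(fields[2]),
                           int(fields[3]), int(fields[4])))
    return hits


def pick_reference(hits: Iterable[AniHit], min_ani: float = 95.0,
                   min_coverage: float = 0.5) -> Optional[AniHit]:
    """Return the highest-ANI reference passing both thresholds, or None.

    In this sketch, None means no close relative was found, so annotations
    would not be transferred and a full de novo annotation would run instead.
    """
    eligible = [h for h in hits if h.ani >= min_ani and h.coverage >= min_coverage]
    return max(eligible, key=lambda h: (h.ani, h.coverage), default=None)
```

Requiring a minimum fragment coverage in addition to ANI guards against a reference that matches closely but only over a small part of the query genome, which would leave most genes without a source for transferred annotations.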
Future improvements to BASys2 will focus on adding more comprehensive pathway and operon visualization tools, enhancing its integration with MiMeDB, refining its visualization features to accommodate increased user customization of images, converting more of the Perl backend into Python, and creating an API that would support larger scale user queries. While it has been 20 years since BASys was last updated, we hope the wait has been worthwhile.
Acknowledgements
Author contributions: Jenna Poelzer (Resources [supporting], Software [lead], Validation [supporting], Writing—original draft [lead]), Scott Han (Data curation [lead], Investigation [supporting], Methodology [supporting], Resources [lead], Software [supporting]), Sukanta Saha (Methodology [supporting], Software [supporting], Visualization [supporting]), Eponine Oler (Data curation [supporting], Resources [supporting], Software [supporting], Supervision [supporting]), Ray Kruger (Data curation [supporting], Resources [supporting], Software [supporting], Validation [supporting]), Mark Berjanskii (Project administration [supporting], Supervision [supporting], Validation [supporting]), Scott MacKay (Project administration [supporting], Supervision [supporting], Validation [supporting]), David S. Wishart (Conceptualization [lead], Data curation [supporting], Funding acquisition [lead], Investigation [supporting], Methodology [lead], Project administration [supporting], Resources [lead], Supervision [lead], Validation [supporting], Visualization [supporting], Writing—original draft [supporting], Writing—review & editing [lead]).
Contributor Information
Jenna Poelzer, Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada.
Scott Han, Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada.
Sukanta Saha, Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada.
Eponine Oler, Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada.
Ray Kruger, Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada.
Mark Berjanskii, Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada.
Scott MacKay, Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada.
David S. Wishart, Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E9, Canada; Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada; Department of Laboratory Medicine and Pathology, University of Alberta, Edmonton, AB T6G 2B7, Canada; Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta, Edmonton, AB T6G 2H7, Canada.
Conflict of interest
None declared.
Funding
The authors wish to thank the Canada Foundation for Innovation (CFI) and Genome Alberta, a division of Genome Canada, for financial support. Funding to pay the Open Access publication charges for this article was provided by Genome Canada.
Data availability
The BASys2 web server is freely accessible at https://basys2.ca. The source code for the CGView.js genome browser can be accessed and downloaded at https://js.cgview.ca. The BDVA is available through the download tab on the BASys2 website. Other example data files (FASTA, FASTQ, GenBank, JSON) are also available via the BASys2 download tab. A Docker image of BASys2, along with instructions for installing this image and running BASys2 locally, is also available via the download tab on the BASys2 website.
References
- 1. Fleischmann RD, Adams MD, White O et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 1995; 269:496–512. 10.1126/science.7542800.
- 2. Quainoo S, Coolen JPM, van Hijum SAFT et al. Whole-genome sequencing of bacterial pathogens: the future of nosocomial outbreak analysis. Clin Microbiol Rev. 2017; 30:1015–63. 10.1128/CMR.00016-17.
- 3. Salzberg SL, Delcher AL, Kasif S et al. Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 1998; 26:544–8. 10.1093/nar/26.2.544.
- 4. Borodovsky M, Rudd KE, Koonin EV. Intrinsic and extrinsic approaches for detecting genes in a bacterial genome. Nucleic Acids Res. 1994; 22:4756–67. 10.1093/nar/22.22.4756.
- 5. Altschul SF, Gish W, Miller W et al. Basic local alignment search tool. J Mol Biol. 1990; 215:403–10. 10.1016/S0022-2836(05)80360-2.
- 6. Riley M, Abe T, Arnaud MB et al. Escherichia coli K-12: a cooperatively developed annotation snapshot—2005. Nucleic Acids Res. 2006; 34:1–9. 10.1093/nar/gkj405.
- 7. Poole FL, Gerwe BA, Hopkins RC et al. Defining genes in the genome of the hyperthermophilic archaeon Pyrococcus furiosus: implications for all microbial genomes. J Bacteriol. 2005; 187:7325–32. 10.1128/JB.187.21.7325-7332.2005.
- 8. Van Domselaar GH, Stothard P, Shrivastava S et al. BASys: a web server for automated bacterial genome annotation. Nucleic Acids Res. 2005; 33:W455–9. 10.1093/nar/gki593.
- 9. Aziz RK, Bartels D, Best AA et al. The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 2008; 9:75. 10.1186/1471-2164-9-75.
- 10. Olson RD, Assaf R, Brettin T et al. Introducing the Bacterial and Viral Bioinformatics Resource Center (BV-BRC): a resource combining PATRIC, IRD and ViPR. Nucleic Acids Res. 2023; 51:D678–89. 10.1093/nar/gkac1003.
- 11. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014; 30:2068–9. 10.1093/bioinformatics/btu153.
- 12. Laslett D. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res. 2004; 32:11–6. 10.1093/nar/gkh152.
- 13. Zhou Y, Liang Y, Lynch KH et al. PHAST: a fast phage search tool. Nucleic Acids Res. 2011; 39:W347–52. 10.1093/nar/gkr485.
- 14. Brown CL, Mullet J, Hindi F et al. mobileOG-db: a manually curated database of protein families mediating the life cycle of bacterial mobile genetic elements. Appl Environ Microbiol. 2022; 88:e00991-22. 10.1128/aem.00991-22.
- 15. Couvin D, Bernheim A, Toffano-Nioche C et al. CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic Acids Res. 2018; 46:W246–51. 10.1093/nar/gky425.
- 16. Blin K, Shaw S, Augustijn HE et al. antiSMASH 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation. Nucleic Acids Res. 2023; 51:W46–50. 10.1093/nar/gkad344.
- 17. Skinnider MA, Johnston CW, Gunabalasingam M et al. Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences. Nat Commun. 2020; 11:6058. 10.1038/s41467-020-19986-1.
- 18. Tatusova T, DiCuccio M, Badretdin A et al. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 2016; 44:6614–24. 10.1093/nar/gkw569.
- 19. Grant JR, Enns E, Marinier E et al. Proksee: in-depth characterization and visualization of bacterial genomes. Nucleic Acids Res. 2023; 51:W484–92. 10.1093/nar/gkad326.
- 20. Humann JL, Lee T, Ficklin S et al. Structural and functional annotation of eukaryotic genomes with GenSAS. In: Kollmar M (ed.). Gene Prediction. Methods in Molecular Biology, vol. 1962. New York: Springer New York; 2019; 29–51. 10.1007/978-1-4939-9173-0_3.
- 21. Wee SK, Yap EPH. GALAXY workflow for bacterial next-generation sequencing de novo assembly and annotation. Curr Protoc. 2021; 1:e242. 10.1002/cpz1.242.
- 22. Carver T, Harris SR, Berriman M et al. Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics. 2012; 28:464–9. 10.1093/bioinformatics/btr703.
- 23. Alcántara R, Axelsen KB, Morgat A et al. Rhea—a manually curated resource of biochemical reactions. Nucleic Acids Res. 2012; 40:D754–60. 10.1093/nar/gkr1126.
- 24. Wishart DS, Guo A, Oler E et al. HMDB 5.0: the Human Metabolome Database for 2022. Nucleic Acids Res. 2022; 50:D622–31. 10.1093/nar/gkab1062.
- 25. Wishart DS, Oler E, Peters H et al. MiMeDB: the Human Microbial Metabolome Database. Nucleic Acids Res. 2023; 51:D611–20. 10.1093/nar/gkac868.
- 26. Montgomerie S, Cruz JA, Shrivastava S et al. PROTEUS2: a web server for comprehensive protein structure prediction and structure-based annotation. Nucleic Acids Res. 2008; 36:W202–9. 10.1093/nar/gkn255.
- 27. Bankevich A, Nurk S, Antipov D et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012; 19:455–77. 10.1089/cmb.2012.0021.
- 28. Hyatt D, Chen G-L, LoCascio PF et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010; 11:119. 10.1186/1471-2105-11-119.
- 29. Rho M, Tang H, Ye Y. FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Res. 2010; 38:e191. 10.1093/nar/gkq747.
- 30. Chan PP, Lowe TM. tRNAscan-SE: searching for tRNA genes in genomic sequences. Methods Mol Biol. 2019; 1962:1–14. 10.1007/978-1-4939-9173-0_1.
- 31. Aparicio-Puerta E, Gómez-Martín C, Giannoukakos S et al. sRNAbench and sRNAtoolbox 2022 update: accurate miRNA and sncRNA profiling for model and non-model organisms. Nucleic Acids Res. 2022; 50:W710–7. 10.1093/nar/gkac363.
- 32. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015; 12:59–60. 10.1038/nmeth.3176.
- 33. UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2023; 51:D523–31. 10.1093/nar/gkac1052.
- 34. Galperin MY, Vera Alvarez R, Karamycheva S et al. COG database update 2024. Nucleic Acids Res. 2025; 53:D356–63. 10.1093/nar/gkae983.
- 35. Tjaden B. De novo assembly of bacterial transcriptomes from RNA-seq data. Genome Biol. 2015; 16:1. 10.1186/s13059-014-0572-2.
- 36. Wishart DS, Han S, Saha S et al. PHASTEST: faster than PHASTER, better than PHAST. Nucleic Acids Res. 2023; 51:W443–50. 10.1093/nar/gkad382.
- 37. da Silva Filho AC, Raittz RT, Guizelini D et al. Comparative analysis of genomic island prediction tools. Front Genet. 2018; 9:619. 10.3389/fgene.2018.00619.
- 38. Alcock BP, Huynh W, Chalil R et al. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic Acids Res. 2023; 51:D690–9. 10.1093/nar/gkac920.
- 39. Harrison PW, Amode MR, Austine-Orimoloye O et al. Ensembl 2024. Nucleic Acids Res. 2024; 52:D891–9. 10.1093/nar/gkad1049.
- 40. Jain C, Rodriguez-R LM, Phillippy AM et al. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun. 2018; 9:5114. 10.1038/s41467-018-07641-9.
- 41. Varadi M, Anyango S, Deshpande M et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022; 50:D439–44. 10.1093/nar/gkab1061.
- 42. Willard L. VADAR: a web server for quantitative evaluation of protein structure quality. Nucleic Acids Res. 2003; 31:3316–9. 10.1093/nar/gkg565.
- 43. Sehnal D, Bittrich S, Deshpande M et al. Mol* Viewer: modern web app for 3D visualization and analysis of large biomolecular structures. Nucleic Acids Res. 2021; 49:W431–7. 10.1093/nar/gkab314.
- 44. Hastings J, Owen G, Dekker A et al. ChEBI in 2016: improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016; 44:D1214–9. 10.1093/nar/gkv1031.
- 45. Kim S, Thiessen PA, Bolton EE et al. PubChem substance and compound databases. Nucleic Acids Res. 2016; 44:D1202–13. 10.1093/nar/gkv951.
- 46. Stothard P, Wishart DS. Circular genome visualization and exploration using CGView. Bioinformatics. 2005; 21:537–9. 10.1093/bioinformatics/bti054.
- 47. Stothard P, Grant JR, Van Domselaar G. Visualizing and comparing circular genomes using the CGView family of tools. Brief Bioinform. 2019; 20:1576–82. 10.1093/bib/bbx081.
- 48. Mistry J, Chuguransky S, Williams L et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 2021; 49:D412–9. 10.1093/nar/gkaa913.
- 49. Paysan-Lafosse T, Blum M, Chuguransky S et al. InterPro in 2022. Nucleic Acids Res. 2023; 51:D418–27. 10.1093/nar/gkac993.
- 50. Sigrist CJA, Cerutti L, de Castro E et al. PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res. 2010; 38:D161–6. 10.1093/nar/gkp885.