Abstract
Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [<virus name> (<strain>)/<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences.
Keywords: Bundibugyo virus, cDNA clone, cuevavirus, Ebola, Ebola virus, ebolavirus, filovirid, Filoviridae, filovirus, genome annotation, ICTV, International Committee on Taxonomy of Viruses, Lloviu virus, Marburg virus, marburgvirus, mononegavirad, Mononegavirales, mononegavirus, Ravn virus, RefSeq, Reston virus, reverse genetics, Sudan virus, Taï Forest virus, virus classification, virus isolate, virus nomenclature, virus strain, virus taxonomy, virus variant
1. Introduction
The National Center for Biotechnology Information (NCBI) RefSeq project was initiated to create a nonredundant and curated set of genomic, transcript, and protein sequence records [1]. Genomic RefSeq records provide a reference nucleotide sequence wherein individual protein coding regions and other sequence features are annotated, using the best available experimental data as a guide. Akin to the labeling of reference specimens as type specimens in other taxonomic schemes, RefSeq reference sequences can be considered type sequences for type viruses.
In the case of virological RefSeq records, each viral species was initially represented by only one genome sequence record, and all other genome records for members of the same species, or for different strains, variants, and isolates of the same member of this species, were linked to this record as “genome neighbors” [2]. The rationale behind choosing a particular virus isolate sequence as the reference sequence is unclear in most cases and has almost never been published. Annotation of individual RefSeq entries was performed using PubMed-indexed experimental data through NCBI in-house and individual expert curation, since subspecialty-wide committees or expert groups had not been established.
The process of curating genome sequence data must now be fundamentally reformed, since the number of sequenced viral genomes has increased exponentially over the past decade [3]. Little to no experimental data are available for most new virus genomes, and annotation is often computationally transferred from related genomes or predicted de novo [4]. Moreover, the utility of reference genomes has expanded to include use in sequence assembly and pathogen detection pipelines [5,6,7,8,9]. With these changes, the data model has adapted, and multiple RefSeq records can now be maintained for several members of a particular virus species. This approach offers representation of the extant sequence diversity (or genotypes) within a particular species. Also, the approach provides a mechanism to maintain well annotated records from experimentally important laboratory isolates and from less studied isolates from the wild.
2. Current Filovirus RefSeq Entries
The mononegaviral family Filoviridae includes three genera, Cuevavirus, Ebolavirus, and Marburgvirus. Eight distinct filoviruses are recognized as members of a total of seven species distributed among these three genera (Table 1) [10,11,12,13,14].
Table 1.
Current Taxonomy and Nomenclature (Ninth ICTV Report and Updates) |
---|
Order Mononegavirales |
Family Filoviridae |
Genus Marburgvirus |
Species Marburg marburgvirus |
Virus 1: Marburg virus (MARV) |
Virus 2: Ravn virus (RAVV) |
Genus Ebolavirus |
Species Taï Forest ebolavirus |
Virus: Taï Forest virus (TAFV) |
Species Reston ebolavirus |
Virus: Reston virus (RESTV) |
Species Sudan ebolavirus |
Virus: Sudan virus (SUDV) |
Species Zaire ebolavirus |
Virus: Ebola virus (EBOV) |
Species Bundibugyo ebolavirus |
Virus: Bundibugyo virus (BDBV) |
Genus Cuevavirus |
Species Lloviu cuevavirus |
Virus: Lloviu virus (LLOV) |
These eight viruses are differentiated from each other by biological characteristics [12] and genomic sequence divergence [11,12,15,16]. This divergence is determined based on sequences of well-characterized variants of root viruses (from here on called type variants of type viruses) for each taxon [12]. These sequences, therefore, become de facto type sequences. Using type sequences allows algorithmic representation of filovirus relationships and newly isolated filoviruses can theoretically be automatically pre-assigned to existing or novel taxa (Figure 1). Temporary type filovirus variants were established by the 2010–2011 ICTV Filoviridae Study Group [12]. These temporary type variants were largely consistent with those chosen for RefSeq (Table 2) by the NCBI, which automatically chose the first sequence available for a new virus.
Table 2.
Filovirus Species | Type Virus of Species (Virus Abbreviation) | Type Variant and Isolate of Type Virus of Species | Type Sequence of Type Variant of Type Virus of Species (RefSeq) |
---|---|---|---|
Bundibugyo ebolavirus | Bundibugyo virus (BDBV) | Unnamed variant represented by isolate “811250”1 | NC_014373 |
Lloviu cuevavirus | Lloviu virus (LLOV) | Unnamed variant represented by isolate “MS-Liver-86/2003”2 | NC_016144 |
Marburg marburgvirus | Marburg virus (MARV) | Unnamed variant represented by isolate “Musoke” | NC_001608 |
Reston ebolavirus | Reston virus (RESTV) | Unnamed variant represented by isolate “Pennsylvania” | NC_004161 |
Sudan ebolavirus | Sudan virus (SUDV) | Unnamed variant represented by isolate “Boniface” [sic]3 | None |
Taï Forest ebolavirus | Taï Forest virus (TAFV) | Unnamed variant represented by isolate “Côte d’Ivoire”4 | NC_014372 |
Zaire ebolavirus | Ebola virus (EBOV) | Unnamed variant represented by isolate “Mayinga” | NC_002549 |
1 Isolate “811250” is/was not explicitly mentioned in [12] or RefSeq entry NC_014373 at the time of writing, but could be deduced from [17]; 2 The RefSeq isolate name “MS-Liver-86/2003” is mentioned only as “sample 86” in [18]. Note that LLOV has not been isolated in culture yet. “Isolate” here refers to the theoretical isolate, the coding sequences of which would correspond to this RefSeq sequence; 3 “Boneface” is often misspelled “Boniface” in the literature, including in [12]. A review of original sample records at CDC clearly identified the correct name as “Boneface” (Stuart T. Nichol and Pierre E. Rollin, personal communication). RefSeq does not contain a “Boneface” entry but at the time of writing instead listed SUDV variant “Gulu” without an isolate reference (NC_006432); 4 RefSeq entry NC_014372 did not contain an isolate name at the time of writing.
These variants and sequences therefore needed to be re-evaluated by filovirus experts. To achieve uniformity and consistency, the current RefSeq entries have to be relabeled to conform to current ICTV taxonomy. In addition, type filovirus variant designations have to be chosen and the individual isolate names have to be adjusted to the filovirus strain/variant/isolate schemes that were recently established [19].
3. RefSeq Entry Reevaluation
The “gold standard” filovirus type RefSeq entry should be selected on the basis of experimental importance and accessibility and represent a repository of functional information about a particular filovirus. It is of crucial importance that any functional annotation of a RefSeq entry (e.g., functions of particular genome parts or of genome-encoded proteins) is linked to the actual sequence associated with these experiments. The RefSeq entry should contain the most characterized virus/variant/isolate/sequence, independent of whether this virus, variant, or isolate was the first one discovered or the most widely used experimentally. Importantly, decisions on RefSeq entries do not entail a mandate that future experiments should necessarily be performed with the viruses associated with these entries. However, direct comparisons with RefSeq-associated viruses are highly recommended to further increase the detail associated with the RefSeq entries. These entries should be updated and, if necessary, corrected on a continuous basis by a filovirus RefSeq subcommittee composed of filovirus experts, whose composition is currently under consideration.
The authors of this article confirmed or replaced the current taxonomic type virus variants and isolates and the current filovirus RefSeq entries based on the availability of scientific information characterizing a particular virus. If scientific information is scarce for all members belonging to an entire taxon, other criteria such as availability, passaging history, or medical importance were used in decision making. Decisions were reached by consensus or simple majority voting, with the understanding that all authors will apply the final decisions reached by the entire group and enforce them in their functions as authors, peer-reviewers, and/or editors.
3.1. Cuevavirus RefSeq Entries
Only one cuevavirus, Lloviu virus (LLOV), has been described [18]. At the time of writing, LLOV had not been isolated in culture, and the sequence diversity of LLOV had only been defined in a single study using deep sequencing techniques on samples from deceased Schreibers’s long-fingered bats (Miniopterus schreibersii) [18]. Only one additional study has been published on this virus, describing the molecular-biological characteristics of the LLOV glycoprotein [20]. The coding-complete genome of one LLOV has been determined (GenBank #JF828358), which therefore automatically became the current RefSeq sequence (#NC_016144) (see [21] for sequencing nomenclature used in this article). In the absence of additional deposited LLOV sequences and characterization data, this RefSeq entry should therefore be upheld but be considered temporary until a complete genome, including all non-coding sequences, is determined.
In line with filovirus strain/variant/isolate definitions outlined previously [19], we propose the variant designation “Asturias” (after the Principality of Asturias in Spain, the location of Cueva del Lloviu, in which LLOV was discovered [18]) and the “isolate” name “Bat86” (instead of “MS-Liver-86/2003”) for this virus:
Full name: | Lloviu virus M.schreibersii-wt/ESP/2003/Asturias-Bat86 |
Shortened name: | LLOV/M.sch/ESP/03/Ast-Bat86 |
Abbreviated name: | LLOV/Ast-Bat86 |
Accordingly, in RefSeq #NC_016144 the definition line “Lloviu virus, complete genome” was changed to “Lloviu cuevavirus isolate Lloviu virus M.schreibersii-wt/ESP/2003/Asturias-Bat86, [coding-]complete genome.” The RefSeq <strain> field was cleared; and the RefSeq <isolate> field was filled with “Lloviu virus M.schreibersii-wt/ESP/2003/Asturias-Bat86.” The same changes should be applied to GenBank #JF828358. [Note here and below that the International Nucleotide Sequence Database Collaboration (INSDC) standard currently does not offer options other than “complete” or “partial,” and, in particular, does not provide a possibility for the designation “coding-complete.” Also note here and below that neither RefSeq nor GenBank currently can handle italics or extended Latin characters, which is why the species names are not italicized in the entry’s definition line and <organism> fields and why letters with diacritics revert to their basic Latin letter counterpart].
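The relationship between the three name forms above can be sketched in code. The following is a minimal illustration, not part of the nomenclature proposal itself: the class `FilovirusIsolate` and its field names are hypothetical, and the shortened forms (such as “M.sch” or “Ast”) are supplied by hand because this article does not state a mechanical rule for deriving them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilovirusIsolate:
    """Hypothetical container for the components of a filovirus name."""
    virus_name: str     # e.g. "Lloviu virus"
    virus_abbrev: str   # e.g. "LLOV"
    host: str           # e.g. "M.schreibersii"
    host_short: str     # e.g. "M.sch" (supplied by hand, not derived)
    suffix: str         # host suffix as used in the names, e.g. "wt" or "tc"
    country: str        # three-letter country code, e.g. "ESP"
    year: int           # four-digit year of sampling
    variant: str        # e.g. "Asturias"
    variant_short: str  # e.g. "Ast" (supplied by hand, not derived)
    isolate: str        # e.g. "Bat86"
    isolate_short: str  # e.g. "Bat86"

    def full_name(self) -> str:
        # Full form: all components written out, four-digit year.
        return (f"{self.virus_name} {self.host}-{self.suffix}/{self.country}/"
                f"{self.year}/{self.variant}-{self.isolate}")

    def shortened_name(self) -> str:
        # Shortened form: abbreviations throughout, two-digit year, no suffix.
        return (f"{self.virus_abbrev}/{self.host_short}/{self.country}/"
                f"{self.year % 100:02d}/{self.variant_short}-{self.isolate_short}")

    def abbreviated_name(self) -> str:
        # Abbreviated form: virus abbreviation plus variant-isolate only.
        return f"{self.virus_abbrev}/{self.variant_short}-{self.isolate_short}"


lloviu = FilovirusIsolate(
    virus_name="Lloviu virus", virus_abbrev="LLOV",
    host="M.schreibersii", host_short="M.sch", suffix="wt",
    country="ESP", year=2003,
    variant="Asturias", variant_short="Ast",
    isolate="Bat86", isolate_short="Bat86",
)

print(lloviu.full_name())
print(lloviu.shortened_name())
print(lloviu.abbreviated_name())
```

Keeping the components separate and generating the three forms from them is one way to ensure that the forms cannot drift apart when an entry is edited.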
3.2. Ebolavirus RefSeq Entries
The genus Ebolavirus includes five species, each of which is represented by one virus.
3.2.1. Bundibugyo Virus
Bundibugyo virus (BDBV) is the second least characterized ebolavirus. Although at least eight isolates of this virus are available [17,22], all experiments reported to date have been performed with one particular isolate, “811250” (often wrongly referred to as “200706291”). The complete sequence of this isolate is the one found in the current RefSeq entry (NC_014373). This isolate, obtained after two passages of clinical material in Vero E6 cells, came from a male patient who died in 2007 in Uganda [17]. We propose the variant designation “Butalya” (after Butalya Parish, Kikyo Subcounty in Uganda’s Bundibugyo district where BDBV was discovered) and the isolate name “811250” for this virus:
Full name: | Bundibugyo virus H.sapiens-tc/UGA/2007/Butalya-811250 |
Shortened name: | BDBV/H.sap/UGA/07/But-811250 |
Abbreviated name: | BDBV/But-811250 |
Accordingly, in RefSeq #NC_014373, the definition line “Bundibugyo ebolavirus, complete genome” was changed to “Bundibugyo ebolavirus isolate Bundibugyo virus H.sapiens-tc/UGA/2007/Butalya-811250, complete genome.” The RefSeq <isolate> field was filled with “Bundibugyo virus H.sapiens-tc/UGA/2007/Butalya-811250.” The same changes should be applied to GenBank #FJ217161.
3.2.2. Ebola Virus
Ebola virus (EBOV) is the most thoroughly characterized ebolavirus. Dozens of EBOV isolates are available, but the vast majority of published experiments have been performed with isolates “Mayinga” and “Kikwit” (reviewed in [23]). The “Mayinga” isolate, the first EBOV isolate obtained in 1976, has been used extensively for molecular-biological characterizations. The “Kikwit” variant, obtained during an Ebola virus disease outbreak in 1995, has been used almost exclusively for pathogenesis studies in nonhuman primates in the US (the “Mayinga” isolate is used almost everywhere else) [23]. All available EBOV cDNA clone systems are based on the “Mayinga” isolate (see [24]). The only available mouse- and guinea pig-adapted EBOV strains are derived from the “Mayinga” isolate (see [25]), and all available EBOV protein crystal structures are derived from the “Mayinga” isolate [26,27,28,29,30,31,32,33,34]. The “Mayinga” isolate was therefore chosen as the prototype EBOV for RefSeq (#NC_002549), which lists a complete genome obtained after 3–4 passages in Vero E6 cells. We uphold this decision and propose the variant designation “Yambuku” (after the village in which EBOV first emerged [35,36]) and retain the isolate designation “Mayinga” (the last name of a nurse who succumbed to infection [36]) for this virus:
Full name: | Ebola virus H.sapiens-tc/COD/1976/Yambuku-Mayinga |
Shortened name: | EBOV/H.sap/COD/76/Yam-May |
Abbreviated name: | EBOV/Yam-May |
Accordingly, in RefSeq #NC_002549 the definition line “Zaire ebolavirus, complete genome” was changed to “Zaire ebolavirus isolate Ebola virus H.sapiens-tc/COD/1976/Yambuku-Mayinga, complete genome.” The RefSeq <strain> field was cleared; the RefSeq <isolate> field was filled with “Ebola virus H.sapiens-tc/COD/1976/Yambuku-Mayinga;” and the <organism> field was corrected to “Zaire ebolavirus.” The same changes should be applied to GenBank #AF086833.
3.2.3. Reston Virus
Reston virus (RESTV) has caused multiple epizootics among captive macaques (1989-1990, 1992, 1996) and domestic pigs in 2008 (reviewed in [37]). At least 10 isolates were obtained during all these outbreaks, and eight complete or coding-complete genomic sequences have been deposited. However, the vast majority of RESTV experiments, in particular those regarding molecular characterization, have been performed with “Pennsylvania” (reviewed in [23]). “Pennsylvania” is the only RESTV variant for which there is a reverse genetics system [38]. In addition, “Pennsylvania” sequences served as the basis for the available RESTV protein crystal structures [39,40,41,42,43]. “Pennsylvania” (NC_004161) was chosen for the current RESTV RefSeq entry, which we propose to maintain. We propose the variant designation “Philippines89” (a reference to the time and place from which this virus was exported to the US in 1989) and the isolate name “Pennsylvania” for this virus:
Full name: | Reston virus M.fascicularis-tc/USA/1989/Philippines89-Pennsylvania |
Shortened name: | RESTV/M.fas/USA/89/Phi89-Pen |
Abbreviated name: | RESTV/Phi89-Pen |
Accordingly, in RefSeq #NC_004161, the definition line “Reston ebolavirus, complete genome” was changed to “Reston ebolavirus isolate Reston virus M.fascicularis-tc/USA/1989/Philippines89-Pennsylvania, complete genome.” The RefSeq <strain> field was cleared; and the RefSeq <isolate> field was filled with “Reston virus M.fascicularis-tc/USA/1989/Philippines89-Pennsylvania”. The same changes should be applied to GenBank #AF522874.
3.2.4. Sudan Virus
Sudan virus (SUDV) is the second-best characterized ebolavirus. Approximately 15 SUDV isolates have been described, but very few experiments have been performed with any of these isolates. Early experiments focused on isolate “Boneface” (often misspelled “Boniface”). Recently, variant “Gulu” isolate “808892” has become a more popular choice, and data from experiments with this virus continue to accumulate (reviewed in [23]). Crystal structures for GP1,2 were determined for both viruses [33,42,44,45]. However, the passaging history of the “Boneface” isolate has not been thoroughly documented and includes passaging in guinea pigs and culturing in various cell types. The “Gulu-808892” isolate, on the other hand, is completely sequenced and is the current virus of choice for nonhuman primate experiments in the US. While the “Boneface” isolate was chosen by the 2010–2011 ICTV Filoviridae Study Group as the type SUDV [12], the “Gulu-808892” isolate was chosen as the prototype SUDV for RefSeq (#NC_006432). We propose to support the RefSeq decision and to change the SUDV type virus variant to “Gulu.” As several “Gulu” isolates are available, we propose the variant designation “Gulu” for the virus variant that caused the disease outbreak that started in Gulu District, Uganda, in 2000, and the isolate designation “808892” for the RefSeq entry of this particular virus (“808892” was obtained after three Vero E6 cell passages of clinical material from an infected male who died):
Full name: | Sudan virus H.sapiens-tc/UGA/2000/Gulu-808892 |
Shortened name: | SUDV/H.sap/UGA/00/Gul-808892 |
Abbreviated name: | SUDV/Gul-808892 |
Accordingly, in RefSeq #NC_006432, the definition line “Sudan ebolavirus, complete genome” was changed to “Sudan ebolavirus isolate Sudan virus H.sapiens-tc/UGA/2000/Gulu-808892, complete genome.” The RefSeq <strain> field was cleared; and the RefSeq <isolate> field was filled with “Sudan virus H.sapiens-tc/UGA/2000/Gulu-808892.” The same changes should be applied to GenBank #AY729654.
3.2.5. Taï Forest Virus
Taï Forest virus (TAFV) is the least characterized ebolavirus. Only one isolate (“807212” = “CI”) was obtained from a female survivor [46] after seven passages in Vero E6 cells, and the coding-complete genome of this isolate is the only genomic TAFV sequence available [17]. Therefore, this sequence automatically became the current RefSeq sequence (#NC_014372). In the absence of additional deposited TAFV sequences and characterization data, this RefSeq entry should therefore be upheld but be considered temporary.
We propose the variant designation “Pauléoula” (after the village of Pauléoula, Guiglo Department in Moyen-Cavally Region, Côte d’Ivoire, where TAFV was first found [46]) and the isolate name “CI” (for “Côte d’Ivoire”) for this virus:
Full name: | Taï Forest virus H.sapiens-tc/CIV/1994/Pauléoula-CI |
Shortened name: | TAFV/H.sap/CIV/94/Pau-CI |
Abbreviated name: | TAFV/Pau-CI |
Accordingly, in RefSeq #NC_014372, the definition line “Tai Forest ebolavirus, complete genome” was changed to “Taï Forest ebolavirus isolate Taï Forest virus H.sapiens-tc/CIV/1994/Pauléoula-CI, [coding-]complete genome,” and the RefSeq <isolate> field was filled with “Taï Forest virus H.sapiens-tc/CIV/1994/Pauléoula-CI.” The same changes should be applied to GenBank #FJ217162.
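Because neither RefSeq nor GenBank could handle extended Latin characters at the time of writing, names such as “Taï Forest” and “Pauléoula” appear in database fields with basic Latin letters. The sketch below shows one way such a database-safe definition line might be produced. It assumes that Unicode canonical decomposition followed by removal of combining marks approximates the reversion performed by the databases, which this article does not specify; the function names `to_basic_latin` and `definition_line` are illustrative only.

```python
import unicodedata


def to_basic_latin(text: str) -> str:
    """Replace letters carrying diacritics with their basic Latin base letter."""
    # NFD splits e.g. "ï" into "i" followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def definition_line(species: str, full_name: str,
                    completeness: str = "complete genome") -> str:
    """Assemble '<species> isolate <full name>, <completeness>' without diacritics."""
    return to_basic_latin(f"{species} isolate {full_name}, {completeness}")


tafv_line = definition_line(
    "Taï Forest ebolavirus",
    "Taï Forest virus H.sapiens-tc/CIV/1994/Pauléoula-CI",
)
print(tafv_line)
```

Characters without a decomposition, such as typographic apostrophes, pass through unchanged in this sketch and would need separate handling if a strictly ASCII field is required.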
3.3. Marburgvirus RefSeq Entries
The genus Marburgvirus includes a single species, which is represented by two divergent viruses.
3.3.1. Marburg Virus
Marburg virus (MARV) is the most thoroughly characterized marburgvirus. Some 70 MARV isolates are available, but the majority of published experiments have been performed with isolate “Musoke” (reviewed in [23]). However, experiments not characterizing MARV but rather the disease it causes are increasingly performed with an “Angola” isolate in the US and continue to be performed with “Popp” or “Voege” isolates in Russia. The only available MARV cDNA clone systems are based on the “Musoke” isolate (see [24]) and on a nonhuman (bat) isolate [47]. The “Musoke” isolate has therefore been chosen as the prototype MARV for RefSeq (#NC_001608). We uphold this decision and propose the variant designation “Mt. Elgon” (after Mount Elgon, Kenya, where this variant is thought to have originated [48]) and the isolate designation “Musoke” (after a Nairobi doctor who became infected with this virus [49]):
Full name: | Marburg virus H.sapiens-tc/KEN/1980/Mt. Elgon-Musoke |
Shortened name: | MARV/H.sap/KEN/80/MtE-Mus |
Abbreviated name: | MARV/MtE-Mus |
Accordingly, in RefSeq #NC_001608, the definition line “Marburg marburgvirus, complete genome” was changed to “Marburg marburgvirus isolate Marburg virus H.sapiens-tc/KEN/1980/Mt. Elgon-Musoke, complete genome.” The RefSeq <strain> field was cleared, and the RefSeq <isolate> field was filled with “Marburg virus H.sapiens-tc/KEN/1980/Mt. Elgon-Musoke.” The same changes should be applied to GenBank #DQ217792.
3.3.2. Ravn Virus
Ravn virus (RAVV) is a largely uncharacterized marburgvirus that belongs to the same species as MARV. At least three human (“Ravn” = “810040,” “09DCR,” “02Uga”) and four Egyptian rousette isolates (“44Bat,” “188Bat,” “982Bat,” “1304 Bat”) have been obtained. Virtually all RAVV characterization experiments have been performed with “Ravn” = “810040,” which was obtained after at least two passages in SW-13 cells and four passages in Vero E6 cells. Since RAVV is a phylogenetically distinct marburgvirus, we created a RefSeq entry for the “Ravn” isolate, for which we propose the variant designation “Kitum Cave” (after Kenya’s Kitum Cave on Mount Elgon where RAVV first emerged) and the isolate designation “810040”:
Full name: | Ravn virus H.sapiens-tc/KEN/1987/Kitum Cave-810040 |
Shortened name: | RAVV/H.sap/KEN/87/KiC-810040 |
Abbreviated name: | RAVV/KiC-810040 |
Accordingly, the RefSeq entry was created with the definition line “Marburg marburgvirus isolate Ravn virus H.sapiens-tc/KEN/1987/Kitum Cave-810040, [coding-]complete genome.” The RefSeq <isolate> field contains “Ravn virus H.sapiens-tc/KEN/1987/Kitum Cave-810040.” The deposited sequence (NC_024781) is identical with GenBank #DQ447649, which should be updated accordingly.
A summary of the proposed designations and RefSeq accession numbers can be found in Table 3.
Table 3.
Filovirus Species | Type Virus of Species (Virus Abbreviation) | Type Variant and Isolate of Type Virus of Species | Type Sequence of Type Variant of Type Virus of Species (RefSeq) |
---|---|---|---|
Bundibugyo ebolavirus | Bundibugyo virus (BDBV) | Bundibugyo virus H.sapiens-tc/UGA/2007/Butalya-811250 | NC_014373 |
Lloviu cuevavirus | Lloviu virus (LLOV) | Lloviu virus M.schreibersii-wt/ESP/2003/Asturias-Bat861 | NC_016144 |
Marburg marburgvirus | Marburg virus (MARV) | Marburg virus H.sapiens-tc/KEN/1980/Mt. Elgon-Musoke | NC_001608 |
Reston ebolavirus | Reston virus (RESTV) | Reston virus M.fascicularis-tc/USA/1989/Philippines89-Pennsylvania | NC_004161 |
Sudan ebolavirus | Sudan virus (SUDV) | Sudan virus H.sapiens-tc/UGA/2000/Gulu-808892 | NC_006432 |
Taï Forest ebolavirus | Taï Forest virus (TAFV) | Taï Forest virus H.sapiens-tc/CIV/1994/Pauléoula-CI | NC_014372 |
Zaire ebolavirus | Ebola virus (EBOV) | Ebola virus H.sapiens-tc/COD/1976/Yambuku-Mayinga | NC_002549 |
1 Note that LLOV has not been isolated in culture yet. “Isolate” here refers to the theoretical isolate, the coding sequences of which would correspond to this RefSeq sequence.
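Since the full names in Table 3 follow one fixed layout, they can be split back into their components programmatically. The following sketch is an illustration only: the regular expression `NAME_PATTERN` and the function `parse_full_name` are hypothetical and were written to fit the seven names in Table 3, not the complete naming rules in [19].

```python
import re
from typing import Dict, Optional

# Layout assumed here:
# "<virus name> <host>-<suffix>/<country>/<year>/<variant>-<isolate>"
NAME_PATTERN = re.compile(
    r"(?P<virus>.+ virus) "
    r"(?P<host>[A-Z]\.[a-z]+)-(?P<suffix>[a-z]+)/"
    r"(?P<country>[A-Z]{3})/"
    r"(?P<year>\d{4})/"
    r"(?P<variant>.+)-(?P<isolate>[^-]+)"
)

# RefSeq accession -> full name, copied from Table 3.
TYPE_SEQUENCES: Dict[str, str] = {
    "NC_014373": "Bundibugyo virus H.sapiens-tc/UGA/2007/Butalya-811250",
    "NC_016144": "Lloviu virus M.schreibersii-wt/ESP/2003/Asturias-Bat86",
    "NC_001608": "Marburg virus H.sapiens-tc/KEN/1980/Mt. Elgon-Musoke",
    "NC_004161": "Reston virus M.fascicularis-tc/USA/1989/Philippines89-Pennsylvania",
    "NC_006432": "Sudan virus H.sapiens-tc/UGA/2000/Gulu-808892",
    "NC_014372": "Taï Forest virus H.sapiens-tc/CIV/1994/Pauléoula-CI",
    "NC_002549": "Ebola virus H.sapiens-tc/COD/1976/Yambuku-Mayinga",
}


def parse_full_name(name: str) -> Optional[Dict[str, str]]:
    """Return the name components as a dict, or None if the layout differs."""
    match = NAME_PATTERN.fullmatch(name)
    return match.groupdict() if match else None


for accession, name in TYPE_SEQUENCES.items():
    parts = parse_full_name(name)
    if parts is not None:
        print(accession, parts["virus"], parts["variant"], parts["isolate"])
```

The variant group is matched greedily up to the last hyphen so that variant designations containing spaces or periods, such as “Mt. Elgon,” stay intact. Isolate designations that themselves contain a hyphen would need a different rule.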
Acknowledgments
We thank Laura Bollinger (IRF-Frederick) for carefully editing the manuscript. The content of this publication does not necessarily reflect the views or policies of the US Department of the Army, the US Department of Defense or the US Department of Health and Human Services or of the institutions and companies affiliated with the authors. J.H.K. performed this work as an employee of Tunnell Government Services, Inc.; M.G.L. as an employee of Lovelace Respiratory Research Institute; and G.G.O. as an employee of MRI Global; all three subcontractors to Battelle Memorial Institute; and J.C.J., J.K., and J.P. performed this work as employees of Battelle Memorial Institute; all under Battelle Memorial Institute’s prime contract with NIAID, under Contract No. HHSN272200700016I. This research was further supported in part by the Intramural Research Program of the NIH, National Library of Medicine (Y.B., O.B., and J.R.B.), and the Intramural Research Program of the NIH, NIAID (T.H.). This work was also funded under Agreement No. HSHQDC-07-C-00020 awarded by the Department of Homeland Security Science and Technology Directorate (DHS/S&T) for the management and operation of the National Biodefense Analysis and Countermeasures Center (NBACC), a Federally Funded Research and Development Center. This work was partially supported by the Defense Threat Reduction Agency. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the US Department of Homeland Security. In no event shall the DHS, NBACC, or Battelle National Biodefense Institute (BNBI) have any responsibility or liability for any use, misuse, inability to use, or reliance upon the information contained herein. The Department of Homeland Security does not endorse any products or commercial services mentioned in this publication.
Author contributions
All authors were engaged in the discussion about the best possible RefSeq virus variants and sequences. The final decisions presented in the paper were reached by consensus or simple majority voting, with the understanding that all authors will apply the final decisions reached by the entire group and enforce them in their functions as authors, peer-reviewers, and/or editors.
Conflicts of Interest
The authors declare no conflict of interest.
References
- 1.Pruitt K.D., Tatusova T., Brown G.R., Maglott D.R. NCBI Reference Sequences (RefSeq): Current status, new features and genome annotation policy. Nucl. Acids Res. 2012;40:D130–D135. doi: 10.1093/nar/gkr1079. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Bao Y., Federhen S., Leipe D., Pham V., Resenchuk S., Rozanov M., Tatusov R., Tatusova T. National center for biotechnology information viral genomes project. J. Virol. 2004;78:7291–7298. doi: 10.1128/JVI.78.14.7291-7298.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Brister J.R., Le Mercier P., Hu J.C. Microbial virus genome annotation-mustering the troops to fight the sequence onslaught. Virology. 2012;434:175–180. doi: 10.1016/j.virol.2012.09.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Klimke W., O’Donovan C., White O., Brister J.R., Clark K., Fedorov B., Mizrachi I., Pruitt K.D., Tatusova T. Solving the Problem: Genome Annotation Standards before the Data Deluge. Stand. Genomic Sci. 2011;5:168–193. doi: 10.4056/sigs.2084864. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Wang Q., Jia P., Zhao Z. VirusFinder: Software for efficient and accurate detection of viruses and their integration sites in host genomes through next generation sequencing data. PLoS One. 2013;8:e64465. doi: 10.1371/journal.pone.0064465. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Gaynor A.M., Nissen M.D., Whiley D.M., Mackay I.M., Lambert S.B., Wu G., Brennan D.C., Storch G.A., Sloots T.P., Wang D. Identification of a novel polyomavirus from patients with acute respiratory tract infections. PLoS Pathog. 2007;3:e64. doi: 10.1371/journal.ppat.0030064. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Kostic A.D., Ojesina A.I., Pedamallu C.S., Jung J., Verhaak R.G., Getz G., Meyerson M. PathSeq: Software to identify or discover microbes by deep sequencing of human tissue. Nat. Biotechnol. 2011;29:393–396. doi: 10.1038/nbt.1868. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Holtz L.R., Finkbeiner S.R., Zhao G., Kirkwood C.D., Girones R., Pipas J.M., Wang D. Klassevirus 1, a previously undescribed member of the family Picornaviridae, is globally widespread. Virol. J. 2009;6:86. doi: 10.1186/1743-422X-6-86. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Borozan I., Watt S.N., Ferretti V. Evaluation of alignment algorithms for discovery and identification of pathogens using RNA-Seq. PLoS One. 2013;8:e76935. doi: 10.1371/journal.pone.0076935. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Adams M.J., Carstens E.B. Ratification vote on taxonomic proposals to the International Committee on Taxonomy of Viruses (2012) Arch. Virol. 2012;157:1411–1422. doi: 10.1007/s00705-012-1299-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Kuhn J.H., Becker S., Ebihara H., Geisbert T.W., Jahrling P.B., Kawaoka Y., Netesov S.V., Nichol S.T., Peters C.J., Volchkov V.E., et al. Family Filoviridae. In: King A.M.Q., Adams M.J., Carstens E.B., Lefkowitz E.J., editors. Virus Taxonomy—Ninth Report of the International Committee on Taxonomy of Viruses. Elsevier/Academic Press; London, UK: 2011. pp. 665–671. [Google Scholar]
- 12.Kuhn J.H., Becker S., Ebihara H., Geisbert T.W., Johnson K.M., Kawaoka Y., Lipkin W.I., Negredo A.I., Netesov S.V., Nichol S.T., et al. Proposal for a revised taxonomy of the family Filoviridae: Classification, names of taxa and viruses, and virus abbreviations. Arch. Virol. 2010;155:2083–2103. doi: 10.1007/s00705-010-0814-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Bukreyev A.A., Chandran K., Dolnik O., Dye J.M., Ebihara H., Leroy E.M., Mühlberger E., Netesov S.V., Patterson J.L., Paweska J.T., et al. Discussions and decisions of the 2012–2014 International Committee on Taxonomy of Viruses (ICTV) Filoviridae Study Group, January 2012–June 2013. Arch. Virol. 2013;159:821–830. doi: 10.1007/s00705-013-1846-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Adams M.J., Lefkowitz E.J., King A.M., Carstens E.B. Ratification vote on taxonomic proposals to the International Committee on Taxonomy of Viruses (2014) Arch. Virol. 2014;159:2831–2841. doi: 10.1007/s00705-014-2114-3. [DOI] [PubMed] [Google Scholar]
- 15.Bao Y., Chetvernin V., Tatusova T. PAirwise Sequence Comparison (PASC) and Its Application in the Classification of Filoviruses. Viruses. 2012;4:1318–1327. doi: 10.3390/v4081318.
- 16.Lauber C., Gorbalenya A.E. Genetics-based classification of filoviruses calls for expanded sampling of genomic sequences. Viruses. 2012;4:1425–1437. doi: 10.3390/v4091425.
- 17.Towner J.S., Sealy T.K., Khristova M.L., Albariño C.G., Conlan S., Reeder S.A., Quan P.L., Lipkin W.I., Downing R., Tappero J.W., et al. Newly discovered Ebola virus associated with hemorrhagic fever outbreak in Uganda. PLoS Pathog. 2008;4:e1000212. doi: 10.1371/journal.ppat.1000212.
- 18.Negredo A., Palacios G., Vazquez-Morón S., González F., Dopazo H., Molero F., Juste J., Quetglas J., Savji N., de la Cruz Martínez M., et al. Discovery of an ebolavirus-like filovirus in Europe. PLoS Pathog. 2011;7:e1002304. doi: 10.1371/journal.ppat.1002304.
- 19.Kuhn J.H., Bao Y., Bavari S., Becker S., Bradfute S., Brister J.R., Bukreyev A.A., Chandran K., Davey R.A., Dolnik O., et al. Virus nomenclature below the species level: A standardized nomenclature for natural variants of viruses assigned to the family Filoviridae. Arch. Virol. 2013;158:301–311. doi: 10.1007/s00705-012-1454-0.
- 20.Maruyama J., Miyamoto H., Kajihara M., Ogawa H., Maeda K., Sakoda Y., Yoshida R., Takada A. Characterization of the envelope glycoprotein of a novel filovirus, Lloviu virus. J. Virol. 2014;88:99–109. doi: 10.1128/JVI.02265-13.
- 21.Ladner J.T., Beitzel B., Chain P.S., Davenport M.G., Donaldson E.F., Frieman M., Kugelman J.R., Kuhn J.H., O'Rear J., Sabeti P.C., et al. Standards for sequencing viral genomes in the era of high-throughput sequencing. mBio. 2014;5:e01360–14. doi: 10.1128/mBio.01360-14.
- 22.Albariño C.G., Shoemaker T., Khristova M.L., Wamala J.F., Muyembe J.J., Balinandi S., Tumusiime A., Campbell S., Cannon D., Gibbons A., et al. Genomic analysis of filoviruses associated with four viral hemorrhagic fever outbreaks in Uganda and the Democratic Republic of the Congo in 2012. Virology. 2013;442:97–100. doi: 10.1016/j.virol.2013.04.014.
- 23.Kuhn J.H. Filoviruses. A Compendium of 40 Years of Epidemiological, Clinical, and Laboratory Studies. Archives of Virology Supplementum. SpringerWienNewYork; Vienna, Austria: 2008.
- 24.Kuhn J.H., Bao Y., Bavari S., Becker S., Bradfute S., Brauburger K., Rodney Brister J., Bukreyev A.A., Caì Y., Chandran K., et al. Virus nomenclature below the species level: A standardized nomenclature for filovirus strains and variants rescued from cDNA. Arch. Virol. 2014;159:1229–1237. doi: 10.1007/s00705-013-1877-2.
- 25.Kuhn J.H., Bao Y., Bavari S., Becker S., Bradfute S., Brister J.R., Bukreyev A.A., Caì Y., Chandran K., Davey R.A., et al. Virus nomenclature below the species level: A standardized nomenclature for laboratory animal-adapted strains and variants of viruses assigned to the family Filoviridae. Arch. Virol. 2013;158:1425–1432. doi: 10.1007/s00705-012-1594-2.
- 26.Brown C.S., Lee M.S., Leung D.W., Wang T., Xu W., Luthra P., Anantpadma M., Shabman R.S., Melito L.M., Macmillan K.S., et al. In Silico Derived Small Molecules Bind the Filovirus VP35 Protein and Inhibit Its Polymerase Cofactor Activity. J. Mol. Biol. 2014;426:2045–2058. doi: 10.1016/j.jmb.2014.01.010.
- 27.Binning J.M., Wang T., Luthra P., Shabman R.S., Borek D.M., Liu G., Xu W., Leung D.W., Basler C.F., Amarasinghe G.K. Development of RNA Aptamers Targeting Ebola Virus VP35. Biochemistry. 2013;52:8406–8419. doi: 10.1021/bi400704d.
- 28.Prins K.C., Delpeut S., Leung D.W., Reynard O., Volchkova V.A., Reid S.P., Ramanan P., Cárdenas W.B., Amarasinghe G.K., Volchkov V.E., et al. Mutations abrogating VP35 interaction with double-stranded RNA render Ebola virus avirulent in guinea pigs. J. Virol. 2010;84:3004–3015. doi: 10.1128/JVI.02459-09.
- 29.Leung D.W., Prins K.C., Borek D.M., Farahbakhsh M., Tufariello J.M., Ramanan P., Nix J.C., Helgeson L.A., Otwinowski Z., Honzatko R.B., et al. Structural basis for dsRNA recognition and interferon antagonism by Ebola VP35. Nat. Struct. Mol. Biol. 2010;17:165–172. doi: 10.1038/nsmb.1765.
- 30.Leung D.W., Ginder N.D., Fulton D.B., Nix J., Basler C.F., Honzatko R.B., Amarasinghe G.K. Structure of the Ebola VP35 interferon inhibitory domain. Proc. Natl. Acad. Sci. USA. 2009;106:411–416. doi: 10.1073/pnas.0807854106.
- 31.Malashkevich V.N., Schneider B.J., McNally M.L., Milhollen M.A., Pang J.X., Kim P.S. Core structure of the envelope glycoprotein GP2 from Ebola virus at 1.9-Å resolution. Proc. Natl. Acad. Sci. USA. 1999;96:2662–2667. doi: 10.1073/pnas.96.6.2662.
- 32.Lee J.E., Fusco M.L., Hessell A.J., Oswald W.B., Burton D.R., Saphire E.O. Structure of the Ebola virus glycoprotein bound to an antibody from a human survivor. Nature. 2008;454:177–182. doi: 10.1038/nature07082.
- 33.Bornholdt Z.A., Noda T., Abelson D.M., Halfmann P., Wood M.R., Kawaoka Y., Saphire E.O. Structural rearrangement of Ebola virus VP40 begets multiple functions in the virus life cycle. Cell. 2013;154:763–774. doi: 10.1016/j.cell.2013.07.015.
- 34.Hartlieb B., Muziol T., Weissenhorn W., Becker S. Crystal structure of the C-terminal domain of Ebola virus VP30 reveals a role in transcription and nucleocapsid association. Proc. Natl. Acad. Sci. USA. 2007;104:624–629. doi: 10.1073/pnas.0606730104.
- 35.Ebola haemorrhagic fever in Zaire, 1976. Bull. World Health Organ. 1978;56:271–293.
- 36.Garrett L. Yambuku—Ebola. In: Garrett L., editor. The Coming Plague—Newly Emerging Disease in a World out of Balance. Farrar, Straus & Giroux; New York, NY, USA: 1994. pp. 100–152.
- 37.Miranda M.E., Miranda N.L. Reston ebolavirus in Humans and Animals in the Philippines: A Review. J. Infect. Dis. 2011;204:S757–S760. doi: 10.1093/infdis/jir296.
- 38.Groseth A., Marzi A., Hoenen T., Herwig A., Gardner D., Becker S., Ebihara H., Feldmann H. The Ebola virus glycoprotein contributes to but is not sufficient for virulence in vivo. PLoS Pathog. 2012;8:e1002847. doi: 10.1371/journal.ppat.1002847.
- 39.Kimberlin C.R., Bornholdt Z.A., Li S., Woods V.L., Jr., MacRae I.J., Saphire E.O. Ebolavirus VP35 uses a bimodal strategy to bind dsRNA for innate immune suppression. Proc. Natl. Acad. Sci. USA. 2010;107:314–319. doi: 10.1073/pnas.0910547107.
- 40.Leung D.W., Shabman R.S., Farahbakhsh M., Prins K.C., Borek D.M., Wang T., Mühlberger E., Basler C.F., Amarasinghe G.K. Structural and functional characterization of Reston Ebola virus VP35 interferon inhibitory domain. J. Mol. Biol. 2010;399:347–357. doi: 10.1016/j.jmb.2010.04.022.
- 41.Clifton M.C., Kirchdoerfer R.N., Atkins K., Abendroth J., Raymond A., Grice R., Barnes S., Moen S., Lorimer D., Edwards T.E., et al. Structure of the Reston ebolavirus VP30 C-terminal domain. Acta Crystallogr. F Struct. Biol. Commun. 2014;70:457–460. doi: 10.1107/S2053230X14003811.
- 42.Zhang A.P., Bornholdt Z.A., Liu T., Abelson D.M., Lee D.E., Li S., Woods V.L., Jr., Saphire E.O. The Ebola Virus Interferon Antagonist VP24 Directly Binds STAT1 and Has a Novel, Pyramidal Fold. PLoS Pathog. 2012;8:e1002550. doi: 10.1371/annotation/5af1d62a-8262-4340-ad06-c450a50295e3.
- 43.Bale S., Julien J.P., Bornholdt Z.A., Krois A.S., Wilson I.A., Saphire E.O. Ebolavirus VP35 coats the backbone of double-stranded RNA for interferon antagonism. J. Virol. 2013;87:10385–10388. doi: 10.1128/JVI.01452-13.
- 44.Bale S., Dias J.M., Fusco M.L., Hashiguchi T., Wong A.C., Liu T., Kuehne A.I., Li S., Woods V.L., Jr., Chandran K., et al. Structural basis for differential neutralization of ebolaviruses. Viruses. 2012;4:447–470. doi: 10.3390/v4040447.
- 45.Dias J.M., Kuehne A.I., Abelson D.M., Bale S., Wong A.C., Halfmann P., Muhammad M.A., Fusco M.L., Zak S.E., Kang E., et al. A shared structural solution for neutralizing ebolaviruses. Nat. Struct. Mol. Biol. 2011;18:1424–1427. doi: 10.1038/nsmb.2150.
- 46.Le Guenno B., Formenty P., Wyers M., Gounon P., Walker F., Boesch C. Isolation and partial characterisation of a new strain of Ebola virus. Lancet. 1995;345:1271–1274. doi: 10.1016/s0140-6736(95)90925-7.
- 47.Albariño C.G., Uebelhoer L.S., Vincent J.P., Khristova M.L., Chakrabarti A.K., McElroy A., Nichol S.T., Towner J.S. Development of a reverse genetics system to generate recombinant Marburg virus derived from a bat isolate. Virology. 2013;446:230–237. doi: 10.1016/j.virol.2013.07.038.
- 48.Smith D.H., Johnson B.K., Isaacson M., Swanepoel R., Johnson K.M., Killey M., Bagshawe A., Siongok T., Keruga W.K. Marburg-virus disease in Kenya. Lancet. 1982;1:816–820. doi: 10.1016/S0140-6736(82)91871-2.
- 49.Preston R. The Hot Zone—A Terrifying New Story. Random House; New York, NY, USA: 1994.