Abstract
The rapid and accurate identification of pathogens is critical in the control of infectious disease. To this end, we analyzed the capacity for viral detection and identification of a newly described high-density resequencing microarray (RMA), termed PathogenID, which was designed for multiple pathogen detection using database similarity searching. We focused on one of the largest and most diverse viral families described to date, the family Rhabdoviridae. We demonstrate that this approach has the potential to identify both known and related viruses for which precise sequence information is unavailable. In particular, we demonstrate that a strategy based on consensus sequence determination for analysis of RMA output data enabled successful detection of viruses exhibiting up to 26% nucleotide divergence with the closest sequence tiled on the array. Using clinical specimens obtained from rabid patients and animals, this method also shows a high species level concordance with standard reference assays, indicating that it is amenable to the development of diagnostic assays. Finally, 12 animal rhabdoviruses that are currently unclassified, unassigned, or assigned as tentative species within the family Rhabdoviridae were successfully detected. These new data allowed an unprecedented phylogenetic analysis of 106 rhabdoviruses and further suggest that the principles and methodology developed here may be used for broad-spectrum surveillance and for the broader-scale investigation of biodiversity in the viral world.
The ability to simultaneously screen for a large panel of pathogens in clinical samples, especially viruses, will represent a major development in the diagnosis of infectious diseases and in surveillance programs for emerging pathogens. Currently, most diagnostic methods are based on species-specific viral nucleic acid amplification. Although rapid and extremely sensitive, these methods are suboptimal when testing for a large number of known pathogens, when viral sequence divergence is high, when new but related viruses are anticipated, or when no clear viral etiologic agent is suspected. To overcome these technical difficulties, newer technologies have been employed, especially microarrays dedicated to pathogen detection. Indeed, DNA microarrays have been shown to be a powerful platform for the highly multiplexed differential diagnosis of infectious diseases. For example, pathogen microarrays can be simultaneously used to screen various viral or bacterial families and have been successfully used in the detection of microbial agents from different clinical samples (10-12, 19, 32, 35, 41, 42, 48).
The “classical” DNA microarrays developed so far are based on the use of long-oligonucleotide pathogen-specific probes (≥50 nucleotides [nt]). Although powerful in terms of sensitivity, these diagnostic tools have the disadvantage of decreased specificity, making it necessary to target multiple markers, and rely on hybridization patterns for pathogen identification, leading to unquantifiable errors (4). Moreover, these methods lack comprehensive information about the pathogen at the single-nucleotide level, which could represent a major problem when the sequences in question show a high degree of similarity (21). The microarray-based pathogen resequencing assay represents a promising alternative tool with which to overcome these limitations. This method identifies each specific pathogen and is capable of resequencing, or “fingerprinting,” multiple pathogens in a single test. Indeed, this technology uses tiled sets of 10^5 to 10^6 25-mer probes, which contain one perfectly matched and three mismatched probes per base for both strands of the target genes (16). This technology also offers the potential for a single test that detects and discriminates between a target pathogen and its closest phylogenetic neighbors, which expands the repertoire of identifiable organisms far beyond those that are initially included in the array. Successful results have been obtained using this technology, especially for the detection of broad-spectrum respiratory tract pathogens using respiratory pathogen microarrays (2, 25, 26) or the detection of a broad range of biothreat agents (1, 23, 36, 45). The amplification step, which is most often the limiting step for this technology, has also benefited from recent developments. Phi29 polymerase-based amplification methods provide amplified DNA with minimal changes in sequence and relative abundance for many biomedical applications (3, 31, 40). The amplification factor varied from 10^6 to 10^9, and it was also demonstrated that coamplification occurred when viral RNA was mixed with bacterial DNA (3). This whole-transcriptome amplification (WTA) approach can also be successfully applied to viral genomic RNA of all sizes. Amplifying viral RNA by WTA provides considerably better sensitivity and accuracy of detection than random reverse transcription (RT)-PCR in the context of resequencing microarrays (RMAs) (3).
The rhabdoviruses are single-stranded, negative-sense RNA genome viruses classified into six genera, three of which (Vesiculovirus, Lyssavirus, and Ephemerovirus) include arthropod-borne agents that infect birds, reptiles, and mammals, as well as a variety of non-vector-borne mammalian or fish viruses (International Committee on Taxonomy of Viruses database [ICTVdb]) (reviewed in reference 7). These rhabdoviruses are the etiological agents of human diseases, such as rabies, that cause serious public health problems. Some rhabdoviruses also cause important economic losses in livestock. The three other genera are Nucleorhabdovirus and Cytorhabdovirus, which are arthropod-borne viruses infecting plants, and Novirhabdovirus, which comprises fish viruses. Other than the well-characterized rhabdoviruses that are known to be important for agriculture and public health, there is also a constantly growing list of rhabdoviruses, isolated from a variety of vertebrate and invertebrate hosts, that are partially characterized and are still waiting for definitive genus or species assignment. Considering the large spectrum of potential animal reservoirs of these viruses compared to the few identified virus species, it is highly likely that the number of uncharacterized rhabdoviruses is immense.
Unclassified or unassigned viruses have been tentatively identified as members of the family Rhabdoviridae by electron microscopy, based on their bullet-shaped morphology—a characteristic trait of members of this family—or using their antigenic relationships based on serological tests (9, 38). Gene sequencing and phylogenetic relationships have then been progressively applied to complete this initial virus taxonomy (6, 22, 27). Importantly, a strongly conserved domain in the rhabdovirus genome, within the polymerase gene, is a useful target for the exploration of the distant evolutionary relationships among these diverse viruses (6). This region corresponds to block III of the viral polymerase, a region predicted to be essential for RNA polymerase function, as it is highly conserved among most of the RNA-dependent RNA polymerases (14, 33, 46). A direct application using this sequence region was recently described for lyssavirus RNA detection in human rabies diagnosis (13). Taking advantage of these characteristics, this polymerase region was also used to design probes for high-density RMAs, also called PathogenID arrays (Affymetrix), which are optimized for the detection and sequence determination of several RNA viruses, particularly rhabdoviruses (1).
In the present study, PathogenID microarrays containing probes for the detection of up to 126 viruses were tested using a consensus sequence determination strategy for the analysis of output RMA data. We demonstrate that this approach has the potential to identify, in experimentally infected and clinical specimens, known but also phylogenetically related rhabdoviruses for which precise sequence information was not available.
MATERIALS AND METHODS
Design of the PathogenID microarray for rhabdovirus detection.
Two generations of PathogenID arrays were used in this study: PathogenID v1.0, containing probes for the detection of 42 viruses (including 3 prototype rhabdoviruses), 50 bacteria, and 619 toxin or antibiotic resistance genes (previously described in reference 1), and PathogenID v2.0, which is able to detect 126 viruses (including 30 different rhabdoviruses), 124 bacteria, 673 toxin or antibiotic resistance genes, and two human genes as controls. These arrays include prototype sequences of all of the species (or genotypes) of the genus Lyssavirus, of the other major genera defined in the family Rhabdoviridae, such as Ephemerovirus and Vesiculovirus, and of 13 rhabdoviruses awaiting classification or tentatively classified among minor groups such as the Le Dantec and Hart Park groups (6). For all of the selected probes tiled on the two versions of the PathogenID array, the same conserved region of the viral polymerase gene was used (block III). However, the target region tiled on the array was longer in the second version (up to 937 nt in length for some sequences, compared to roughly 500 nt in the first version) (Tables 1 and 2).
TABLE 1.
Genus and species^a (abbreviation) | Strain | Host species/vector | Origin | Yr of first isolation | Tiled region^b (nt) | Length (nt) | Biological sample tested | GenBank accession no. |
---|---|---|---|---|---|---|---|---|
Origin of tiled sequences | | | | | | | | |
Lyssavirus Rabies virus (RABV) | PV | Vaccine | | | 7452-7953 | 502 | | NC_001542 |
Vesiculovirus Vesicular stomatitis Indiana virus (VSIV) | VSVLMS | | | | 7453-7953 | 497 | | K02378 |
Ephemerovirus Bovine ephemeral fever virus (BEFV) | BB7721 | Bos taurus | Australia | 1968 | 7454-7952 | 498 | | NC_002526 |
Rhabdoviruses tested | | | | | | | | |
Lyssavirus species | | | | | | | | |
Rabies virus (RABV) | 8764THA | Human | Thailand | 1983 | | | Human brain | EU293111 |
Rabies virus (RABV) | 9147FRA | Red fox | France | 1991 | | | Fox brain | EU293115 |
Rabies virus (RABV) | 93128MAR | Fixed strain | Morocco | ? | | | Mouse brain | GU815994 |
Rabies virus (RABV) | 9811CHI | Dog | China | 1998 | | | Mouse brain | GU815995 |
Rabies virus (RABV) | 0435AFG | Dog | Afghanistan | 2004 | | | Mouse brain | GU815996 |
Rabies virus (RABV) | 9001FRA | Dog bitten by bat | French Guiana | 1990 | | | Mouse brain | EU293113 |
Rabies virus (RABV) | 9026CI | Dog | Ivory Coast | 1990 | | | Mouse brain | GU815997 |
Rabies virus (RABV) | 9105USA | Fox | USA | 1991 | | | Fox brain | GU815998 |
Rabies virus (RABV) | 9233GAB | Dog | Gabon | 1992 | | | Dog brain | GU815999 |
Rabies virus (RABV) | 93127FRA | Fixed strain | France | ? | | | Mouse brain | GU816000 |
Rabies virus (RABV) | 9503TCH | Fixed strain (Vnukovo, SAD) | Czechoslovakia | ? | | | Mouse brain | GU816001 |
Rabies virus (RABV) | 9737POL | Raccoon dog | Poland | 1997 | | | Mouse brain | GU816002 |
Rabies virus (RABV) | Challenge virus strain (CVS_IP13) | Fixed strain | | | | | Mouse brain | GU816003 |
Rabies virus (RABV) | ERA | Fixed strain | | | | | Mouse brain | GU816005 |
Rabies virus (RABV) | LEP | Fixed strain | | | | | Chicken embryo fibroblasts | GU816004 |
Vesiculovirus Vesicular stomatitis Indiana virus (VSIV) | Orsay (0503FRA) | Fixed strain | | | | | BSR cells^c | GU816006 |
^a Classifications and names of viruses correspond to approved virus taxonomy according to ICTVdb. Names in italics are those of validated virus species.
^b Position according to the reference Pasteur virus genome (NC_001542) after alignment of all of the tiled sequences with the reference sequence.
^c Clone of the baby hamster kidney cell line BHK-21.
TABLE 2.
Genus or group^a and species (abbreviation)^a | UA/TS/UC^b | Strain | Host species/vector or source | Tiled region (nt)^c | Length (nt) | Biological sample tested | Origin of sample | Yr of first isolation | GenBank accession no. |
---|---|---|---|---|---|---|---|---|---|
Origin of tiled sequences | | | | | | | | | |
Lyssavirus | | | | | | | | | |
Genotype 1, Rabies virus (RABV) | | PV | Vaccine | 7040-7977 | 937 | | | | NC_001542 |
Genotype 2, Lagos bat virus (LBV) | | 8619NGA | Bat, Eidolon helvum | 7040-7977 | 937 | | Nigeria | 1956 | EU293110 |
Genotype 3, Mokola virus (MOKV) | | MOKV | Cat | 7040-7977 | 937 | | Zimbabwe | 1981 | NC_006429 |
Genotype 4, Duvenhage virus (DUVV) | | 94286SA | Bat, Miniopterus species | 7040-7977 | 937 | | South Africa | 1981 | EU293120 |
Genotype 5, European bat lyssavirus 1 (EBLV-1) | | 8918FRA | Bat, Eptesicus serotinus | 7040-7977 | 937 | | France | 1989 | EU293112 |
Genotype 6, European bat lyssavirus 2 (EBLV-2) | | 9018HOL | Bat, Myotis dasycneme | 7040-7977 | 937 | | Netherlands | 1986 | EU293114 |
Genotype 7, Australian bat lyssavirus (ABLV) | | ABLh | Human | 7040-7977 | 937 | | Australia | 1986 | AF418014 |
Vesiculovirus | | | | | | | | | |
Chandipura virus (CHPV) | | I 653514 | Human | 7040-7981 | 935 | | India | 1965 | AJ810083 |
Isfahan virus (ISFV) | | 91026-167 | Phlebotomus papatasi | 7040-7981 | 935 | | Iran | 1975 | AJ810084 |
Vesicular stomatitis New Jersey virus (VSNJV) | | VSV NJ-O | Bos taurus, equine/Culex nigripalpus, Culicoides species, Mansonia indubitans | 7040-7981 | 935 | | United States | 1949 | AY074804 |
Vesicular stomatitis Indiana virus (VSIV) | | VSVLMS | ? | 7040-7981 | 935 | | ? | ? | K02378 |
Perinet virus (PERV) | TS | Ar Mg 802 | Culex antennatus | 7089-7502 | 405 | | Madagascar | 1978 | AY854652 |
Spring viremia of carp virus (SVCV) | TS | VR-1390 | Cyprinus carpio | 7040-7981 | 935 | | Yugoslavia | 1971 | U18101 |
Ephemerovirus | | | | | | | | | |
Adelaide River virus (ARV) | | DPP 61 | Bos taurus | 7089-7502 | 408 | | Australia | 1981 | AY854635 |
Bovine ephemeral fever virus (BEFV) | | BB7721 | Bos taurus | 7089-7502 | 408 | | Australia | 1968 | AY854642 |
Kimberley virus (KIMV) | TS | CS 368 | Bos taurus | 7089-7502 | 408 | | Australia | 1980 | AY854637 |
Kotonkan virus^d (KOTV) | UA | Ib Ar23380 | Culicoides species | 7089-7502 | 408 | | Nigeria | 1967 | AY854638 |
Other dimarhabdoviruses^d | | | | | | | | | |
Almpiwar group | | | | | | | | | |
Almpiwar virus (ALMV) | UA | MRM4059 | Ablepharus boutonii virgatus | 7089-7502 | 411 | | Australia | 1966 | AY854645 |
Humpty doo virus (HDOOV) | UA | CS 79 | Lasiohelea species | 7089-7502 | 411 | | Australia | 1975 | AY854643 |
Oak-Vale virus (OVRV) | UA | CS 1342 | Culex species | 7089-7502 | 408 | | Australia | 1981 | AY854670 |
Hart Park group | | | | | | | | | |
Flanders virus (FLANV) | UA | 61-7484 | Culiseta melanura | 7089-7502 | 410 | | United States | 1961 | AF523199 |
Ngaingan virus (NGAV) | UA | NRM14556 | Culicoides brevitarsis | 7089-7502 | 408 | | Australia | 1970 | AY854649 |
Parry Creek virus (PCRV) | UA | OR 189 | Culex annulirostris | 7089-7502 | 408 | | Australia | 1972 | AY854647 |
Wongabel virus (WONV) | UA | CS 264 | Culicoides austropalpalis | 7089-7502 | 408 | | Australia | 1979 | AY854648 |
Le Dantec and Kern Canyon group | | | | | | | | | |
Fukuoka virus (FUKV) | UA | FUK-11 | Culicoides punctatus | 7089-7502 | 408 | | Japan | 1982 | AY854651 |
Le Dantec virus (LDV) | UA | DakHD 763 | Human | 7089-7502 | 408 | | Senegal | 1965 | AY854650 |
Tibrogargan group, Tibrogargan virus (TIBV) | UA | CS 132 | Culicoides brevitarsis | 7089-7502 | 408 | | Australia | 1976 | AY854646 |
Other animal rhabdoviruses | | | | | | | | | |
Tupaia rhabdovirus (TUPV) | UA | TRV 1591 | Tupaia belangeri | 7089-7502 | 408 | | Thailand | ? | NC_007020 |
Sigma virus (SIGMAV) | UA | 234HRC | Drosophila melanogaster | 6220-6642 | 408 | | ? | ? | X91062 |
Sea trout rhabdovirus (STRV) | UC | 28/97 | Salmo trutta trutta | 7108-7576 | 415 | | Sweden | 1996 | AF434992 |
Rhabdoviruses tested | | | | | | | | | |
Lyssavirus | | | | | | | | | |
Genotype 1 | | | | | | | | | |
Rabies virus (RABV) | | 93127FRA | Fixed strain | | | Mouse brain | France | ? | GU816000 |
Rabies virus (RABV) | | 8764THA | Human | | | Human brain | Thailand | 1983 | EU293111 |
Rabies virus (RABV) | | 08339FRA | Human (probably contaminated by bat) | | | Human saliva | France (French Guiana) | 2008 | GU816007 |
Rabies virus^h (RABV) | | 07029SEN | Human | | | Skin biopsy | Senegal | 2006 | |
Genotype 2, Lagos bat virus (LBV) | | 8619NGA | Bat, Eidolon helvum | | | Mouse brain | Nigeria | 1956 | EU293110 |
Genotype 3, Mokola virus (MOKV) | | 86100CAM | Shrew | | | Mouse brain | Cameroon | 1981 | NC_006429 |
Genotype 4, Duvenhage virus (DUVV) | | 86132SA | Human | | | Mouse brain | South Africa | 1971 | EU293119 |
Genotype 5 | | | | | | | | | |
European bat lyssavirus 1 subtype a (EBLV-1a) | | 07240FRA | Cat (contaminated by bat) | | | Cat brain | France | 2007 | EU626552 |
European bat lyssavirus 1 subtype a (EBLV-1b) | | 08341FRA | Bat, Eptesicus serotinus | | | Bat brain | France | 2008 | GU816009 |
European bat lyssavirus 1 subtype b (EBLV-1b) | | 8918FRA | Bat, Eptesicus serotinus | | | Mouse brain | France | 1989 | EU293112 |
Genotype 6, European bat lyssavirus 2 (EBLV-2) | | 9018HOL | Bat, Myotis dasycneme | | | Mouse brain | Holland | 1986 | EU293114 |
Genotype 7, Australian bat lyssavirus (ABLV) | | 9810AUS | Bat | | | Mouse brain | Australia | ? | GU816008 |
Genotype 8 (tentative species), Dakar bat lyssavirus (DBLV) | UC | 0406SEN (AnD 42443) | Bat, Eidolon helvum | | | Mouse brain | Senegal | 1985 | EU293108 |
Not assigned, West Caucasian bat virus (WCBV) | UC | | Bat, Miniopterus schreibersii | | | Plasmid^e | Russia | 2002 | EF614258 |
Vesiculovirus | | | | | | | | | |
Vesicular stomatitis Indiana virus (VSIV) | | Orsay (0503FRA) | ? | | | BSR cells^f | ? | ? | GU816006 |
Boteke virus (BTKV) | TS | DakArB 1077 (0417RCA) | Coquillettidia maculipennis | | | Mouse brain | Central African Republic | 1968 | GU816014 |
Jurona virus (JURV) | TS | BeAr 40578 (0414BRE) | Hemagogus spegazzinii | | | Mouse brain | Brazil | 1962 | GU816024 |
Porton's virus (PORV) | TS | 1643 (0416MAL) | Mansonia uniformis | | | Mouse brain | Malaysia (Sarawak) | ? | GU816013 |
Ephemerovirus | | | | | | | | | |
Kotonkan virus^d (KOTV) | UA | Ib Ar23380 (9145NIG) | Culicoides species | | | Mouse brain | Nigeria | 1967 | AY854638 |
Kimberley virus (KIMV) | TS | CS 368 | Bos taurus | | | Mouse brain | Australia | 1980 | AY854637 |
Other animal rhabdoviruses | | | | | | | | | |
Hart Park group | | | | | | | | | |
Kamese virus (KAMV) | UA | MP 6186 (08343OUG) | Culex annulioris | | | Mouse brain | Uganda | 1967 | GU816011 |
Mossuril virus (MOSV) | UA | SA Ar 1995 (0418MOZ) | Culex sitiens | | | Mouse brain | Mozambique | 1959 | GU816012 |
Kolongo and Sandjimba group, Sandjimba virus (SJAV) | UA | DakAnB 373d (07244RCA) | Acrocephalus schoenobaenus | | | Mouse brain | Central African Republic | 1970 | GU816019 |
Le Dantec and Kern Canyon group | | | | | | | | | |
Keuraliba virus (KEUV) | UA | DakAnD 5314 (9715SEN, 0420SEN) | Tatera kempi | | | Mouse brain | Senegal | 1968 | GU816021 |
Nkolbisson virus (NKOV) | UA | Ar Y 31/65 (0425CAM) | Eretmapodites leucopous | | | Mouse brain | Ivory Coast, Cameroon | 1965 | GU816022 |
Ungrouped | | | | | | | | | |
Garba virus^g (GARV) | UA | DakAnB 439a (0422RCA) | Corythornis cristata | | | Mouse brain | Central African Republic | 1970 | GU816018 |
Nasoule virus^g (NASV) | UA | DakAnB 4289a (0410RCA) | Andropadus virens | | | Mouse brain | Central African Republic | 1973 | GU816017 |
Ouango virus^g (OUAV) | UA | DakAnB 1582a (9718RCA) | Ploceus melanocephalus | | | Mouse brain | Central African Republic | 1970 | GU816015 |
Bimbo virus^g (BBOV) | UA | DakAnB 1054d (9716RCA) | Euplectes afer | | | Mouse brain | Central African Republic | 1970 | GU816016 |
Bangoran virus (BGNV) | UA | DakArB 2053 (0424RCA) | Culex perfuscus | | | Mouse brain | Central African Republic | 1969 | GU816010 |
Gossas virus^h (GOSV) | UA | DakAnD 401 (08344SEN) | Tadarida species | | | Mouse brain | Senegal | 1964 | NA^i |
^a Unless stated otherwise, the classifications and names of viruses correspond to approved virus taxonomy according to ICTVdb. Names in italics are those of validated virus species.
^b UA, unassigned; TS, tentative species; UC, unclassified (not found in ICTVdb).
^c Position according to the reference Pasteur virus genome (NC_001542) after alignment of all of the tiled sequences with the reference sequence (except for tiled sequences from ungrouped rhabdoviruses TUPV and SIGMAV, which were aligned independently with the reference sequence).
^d Taxonomical classification according to reference 6.
^e A 977-nt fragment of the polymerase gene (from nt 7020 to nt 7997, according to the reference Pasteur virus genome [NC_001542]) was synthesized in vitro and then cloned into plasmid pCR2.1 (Operon).
^f Clone of the baby hamster kidney cell line BHK-21.
^g Not detected using PathogenID v2.0 microarray but amplified by PCR or nested PCR using consensus or specific primers.
^h Not detected using PathogenID v2.0 microarray or amplified by PCR or nested PCR using consensus or specific primers.
^i NA, not applicable.
Virus strains and biological samples analyzed.
Detailed descriptions of all of the prototype and field virus strains used in this study and their sources are listed in Tables 1 and 2. Briefly, 16 and 31 different viruses were tested using PathogenID v1.0 (15 lyssaviruses and 1 vesiculovirus) and PathogenID v2.0 (14 lyssaviruses, 1 vesiculovirus, and 12 unassigned and 4 tentative species of animal rhabdoviruses according to ICTVdb), respectively. Samples tested included in vitro-infected cells, a synthetic nucleotide target (when the corresponding virus strain was not available), brain biopsy specimens obtained from experimentally infected mice, and biological specimens from various animals (bat, cat, dog, and fox brains) and humans (brain, saliva, and skin biopsy specimens).
Extraction and amplification of viral RNA.
RNA extraction from biological samples was processed with TRI Reagent (Molecular Research Center) according to the manufacturer's recommendations. After extraction, viral RNAs were reverse transcribed and then amplified using the whole-transcriptome amplification (WTA) protocol (QuantiTect Whole Transcriptome kit; Qiagen) as described previously (3).
Microarray assay.
All of the amplification products obtained from viral RNA were quantified with the Quant-iT BR assay (Invitrogen) according to the manufacturer's instructions or with a NanoDrop ND-1000 spectrophotometer (Thermo Scientific). The recommended amount of target DNA was fragmented and labeled according to the GeneChip Resequencing Assay manual (Affymetrix). The microarray hybridization process was carried out according to the protocol recommended by the manufacturer (Affymetrix). All of the details and parameter settings for the data analysis (essentially conversion of raw image files obtained from scanning of the microarrays into FASTA files containing the sequences of base calls made for each tiled region of the microarray) have been described previously (1). The base call rate refers to the percentage of base calls generated from the full-length tiled sequence.
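As an illustration of the base call rate defined above, the following minimal Python sketch computes it for one tiled region. The function name and the convention that undetermined calls are written as N are assumptions made for this example; they are not taken from the GSEQ software.

```python
from typing import Optional


def base_call_rate(called_sequence: str, tiled_length: Optional[int] = None) -> float:
    """Return the percentage of determined base calls (A, C, G, or T).

    Hypothetical helper: `tiled_length` defaults to the length of the
    resequenced string, which is assumed to span the full tiled region.
    """
    length = tiled_length if tiled_length is not None else len(called_sequence)
    if length == 0:
        return 0.0
    # Anything other than A, C, G, or T (for example N) counts as undetermined.
    called = sum(1 for base in called_sequence.upper() if base in "ACGT")
    return 100.0 * called / length


if __name__ == "__main__":
    # 6 determined calls out of 10 tiled positions
    print(base_call_rate("ACGTNNNNAC"))
```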
Data analysis.
In the first approach, resequencing data obtained by the PathogenID v1.0 microarray were manually submitted to the NCBI nr/nt database for BLASTN query. The default BLAST options were modified. The word size was set to 7 nt. The expect threshold was increased from its default value of 10 to 100,000 to reduce the filtering out of short sequences and of sequences rich in undetermined calls, both of which can still contribute to correct taxonomic identification. To avoid false-negative results induced by high numbers of undetermined nucleotides in the sequences, the low-complexity filter (−F) was also turned off. BLAST sorts the resulting hits according to their bit scores so that the sequence that is the most similar to the query sequence appears first. Identification of the virus strains tested was considered successful only when the best hit was unique and corresponded to the expected species or isolate (according to the nucleotide sequences of these viruses already available in the NCBI nr/nt database).
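The settings above can be sketched in Python as follows. This is only an illustration: the queries in this first approach were submitted manually, and the sketch assumes the command-line `blastn` program of the NCBI BLAST+ suite, where the corresponding options are `-word_size`, `-evalue`, and `-dust no`. The function names, file names, and the tabular output format are hypothetical choices for this example.

```python
from typing import List, Optional, Sequence, Tuple


def blastn_args(query_fasta: str, database: str) -> List[str]:
    """Build a BLAST+ blastn command line mirroring the modified options."""
    return [
        "blastn", "-task", "blastn",
        "-query", query_fasta,
        "-db", database,
        "-word_size", "7",      # word size set to 7 nt
        "-evalue", "100000",    # expect threshold raised from 10 to 100,000
        "-dust", "no",          # low-complexity filtering turned off
        "-outfmt", "6",         # tabular output (assumption of this sketch)
    ]


def unique_best_hit(hits: Sequence[Tuple[str, float]]) -> Optional[str]:
    """Return the subject of the best hit only if its bit score is unique.

    `hits` is a list of (subject, bit_score) pairs; None is returned when
    there is no hit or when several hits share the highest bit score.
    """
    if not hits:
        return None
    top = max(score for _, score in hits)
    best = [subject for subject, score in hits if score == top]
    return best[0] if len(best) == 1 else None


if __name__ == "__main__":
    print(" ".join(blastn_args("sample.fasta", "nt")))
    print(unique_best_hit([("Rabies virus", 250.0), ("Mokola virus", 180.0)]))
```

The argument list can be passed to `subprocess.run` on a system where BLAST+ is installed; the comparison with the expected species or isolate is left to the user, as in the text.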
In the second approach, an automatic bioinformatics-based analysis of RMA data provided by PathogenID v2.0 was developed, including a consensus sequence determination strategy complemented by a systematic BLAST strategy. The general workflow of this strategy is represented in Fig. 1. A Perl script reads the input data, which consist of one FASTA file per sample containing all of the sequences read by the GSEQ software from the hybridization. A modified version of the filtering process described by Malanoski et al. (29) is applied to the sequences. The retained sequences contain stretches of nucleotides that are ascertained according to the following algorithm. Briefly, sequences that do not contain subsequences fulfilling specific user-defined parameters (minimum nucleotide length [m] and maximum undetermined nucleotide content [N]) are discarded. These parameters differ from those of the original filtering process, in which m was fixed at 20 and N depended on m, so that all short subsequences were filtered out, even those with a high base call rate. For subsequence determination, the program starts from the first base call of the sequence considered and searches for the first window of m bases that satisfies the user-defined elongation threshold; this is another difference from the filtering process described by Malanoski et al., in which the elongation threshold was fixed at 60% (29). The subsequence is then extended by one base (m + 1) as long as the percentage of undetermined calls remains below the elongation threshold. When this threshold is exceeded, the elongation is stopped and the subsequence is retained. This process is repeated until the end of the sequence is reached to generate as many informative sequences as possible. All of our analyses were performed with the following filtering parameters: m = 12, N = 10, and elongation threshold = 10%.
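The filtering step can be sketched as follows. This is a minimal Python illustration and not the Perl script used in this study. It rests on several assumptions: undetermined calls are written as N, both N and the elongation threshold are treated as percentages of undetermined calls, a window is accepted when its percentage does not exceed the threshold, and the search resumes immediately after each retained subsequence. All function and parameter names are hypothetical.

```python
from typing import List


def informative_subsequences(
    sequence: str,
    m: int = 12,
    max_n_percent: float = 10.0,
    elongation_percent: float = 10.0,
) -> List[str]:
    """Extract subsequences of at least m bases with few undetermined calls."""
    seq = sequence.upper()

    def n_percent(fragment: str) -> float:
        return 100.0 * fragment.count("N") / len(fragment)

    retained = []
    start = 0
    while start + m <= len(seq):
        end = start + m
        # Look for the first window of m bases that meets the threshold.
        if n_percent(seq[start:end]) > elongation_percent:
            start += 1
            continue
        # Extend one base at a time while the threshold is not exceeded.
        while end < len(seq) and n_percent(seq[start:end + 1]) <= elongation_percent:
            end += 1
        fragment = seq[start:end]
        if n_percent(fragment) <= max_n_percent:
            retained.append(fragment)
        start = end  # resume the search after the retained subsequence
    return retained


def keep_sequence(sequence: str, **parameters) -> bool:
    """A sequence is discarded when it yields no informative subsequence."""
    return bool(informative_subsequences(sequence, **parameters))
```

With the default values above (m = 12 and both thresholds at 10), a sequence made only of undetermined calls yields no subsequence and is discarded.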
A systematic BLAST strategy to search for sequence homologues was then performed with the filtered sequences containing subsequences. These sequences individually undergo a BLAST analysis based on a local viral and bacterial database (sequences obtained after filtering from the NCBI nr/nt database, updated and used for BLAST queries in December 2009), and the taxonomies of the best BLAST hits are retrieved (Fig. 1A). The default BLAST options were modified as previously described. When several hits obtain the highest bit score, the script automatically retrieves the taxonomies of the first 10 BLAST hits. The final taxonomic identification of each virus strain tested was made by the user as follows: (i) identification at the species or isolate level when a unique best hit corresponds to the expected species or isolate, (ii) identification at the genus level (if available) when multiple best viral hits exist and correspond to different species within the same genus of the family Rhabdoviridae, (iii) identification at the family level when multiple best viral hits exist and correspond to different rhabdovirus genera, or (iv) negative or inaccurate identification when a BLAST query is not possible or when multiple best hits correspond to other viral families, respectively.
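Decision rules (i) to (iv) can be expressed as a short Python sketch. In this study the final identification was made by the user, so the code below is only an illustration. The `Hit` record and the function name are hypothetical, and the handling of a best hit that does not match the expected species (reported here as inaccurate) is an assumption, since this case is not covered by the rules above.

```python
from typing import NamedTuple, Optional, Sequence


class Hit(NamedTuple):
    """Taxonomy of one BLAST hit sharing the highest bit score."""
    family: str
    genus: Optional[str]  # None when no genus has been assigned
    species: str


def identification_level(best_hits: Sequence[Hit], expected_species: str) -> str:
    """Return 'species', 'genus', 'family', 'negative', or 'inaccurate'."""
    if not best_hits:
        return "negative"      # (iv) no BLAST query possible or no hit
    if any(hit.family != "Rhabdoviridae" for hit in best_hits):
        return "inaccurate"    # (iv) best hits in other viral families
    species = {hit.species for hit in best_hits}
    if len(species) == 1:
        # (i) a single best species matching the expected one
        return "species" if expected_species in species else "inaccurate"
    genera = {hit.genus for hit in best_hits}
    if len(genera) == 1 and None not in genera:
        return "genus"         # (ii) several species within one genus
    return "family"            # (iii) several rhabdovirus genera
```

For example, two best hits for Rabies virus and Mokola virus, both in the genus Lyssavirus, give an identification at the genus level.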
For the consensus sequence determination strategy, resequencing data obtained from rhabdoviral tiled sequences are filtered as previously described and then submitted to a multiple alignment with CLUSTAL W (39), from which a consensus sequence is determined (Fig. 1B). For each sequence in the alignment, if a called base has undetermined calls on both sides, it is replaced by an undetermined call. If different calls appear in the sequences for a given position, the majority base call is added to the consensus. The positions that contain an undetermined call or a gap are not considered in the majority base call computation. If multiple base calls tie for the majority, an undetermined call appears at this position in the consensus sequence. This procedure generally increases the length and accuracy of the query sequence for subsequent analysis. Homology searching of the consensus sequences is performed with BLAST using the parameters previously described, and the taxonomy of the best hit is retrieved as for the systematic homology searching approach. We tested whether the resulting consensus sequences had higher identification accuracy than any individual sequence or could be used to design PCR primers for the characterization of a potential novel isolate.
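The consensus rules described above can be sketched in Python as follows. The sketch assumes that the sequences have already been aligned to the same length (for instance with CLUSTAL W), that undetermined calls are written as N and gaps as a hyphen, and that the first and last positions, which have only one neighbor, are never masked. Function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def mask_isolated_calls(sequence: str) -> str:
    """Replace a called base flanked by undetermined calls on both sides by N."""
    bases = list(sequence.upper())
    masked = bases[:]
    for i in range(1, len(bases) - 1):
        if bases[i] in "ACGT" and bases[i - 1] == "N" and bases[i + 1] == "N":
            masked[i] = "N"
    return "".join(masked)


def consensus(aligned: Sequence[str]) -> str:
    """Majority-rule consensus of aligned sequences of equal length."""
    cleaned = [mask_isolated_calls(seq) for seq in aligned]
    result = []
    for column in zip(*cleaned):
        # Undetermined calls and gaps are ignored in the majority computation.
        counts = Counter(base for base in column if base in "ACGT")
        ranked = counts.most_common(2)
        if not ranked:
            result.append("N")             # no determined call at this position
        elif len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            result.append("N")             # tie for the majority
        else:
            result.append(ranked[0][0])
    return "".join(result)


if __name__ == "__main__":
    print(consensus(["ACGTNANA", "ACCT-CGA", "ACGTTCGA"]))
```

In this example, the isolated A in the first sequence is masked before the majority computation, and positions covered by only one or two sequences are still resolved, which illustrates how the consensus can be longer than any individual sequence.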
Sequencing confirmation.
Conventional sequencing was undertaken after the PCR amplification of viral targets directly from biological samples (after RNA extraction and RT) or from 10- to 100-fold water-diluted WTA products. Primer design was first based on consensus sequences obtained using the consensus sequence determination strategy previously described and/or on rhabdovirus nucleotide sequences available in GenBank. Depending on the results obtained and the virus strain tested, the primer design, the set of primers used, and the PCR conditions used for partial polymerase gene amplification were then adjusted (the list of primers and the PCR conditions are available on request from the corresponding author). All PCR products were obtained using the proofreading DNA polymerase Ex Taq (Takara). Sequence assembly and consensus sequences were obtained using Sequencher 4.7 (Gene Codes).
Phylogenetic analysis.
The data set of 15 newly sequenced rhabdoviruses from this study (including the Sandjimba and Kolongo viruses previously only identified on the basis of partial nucleoprotein gene sequences, as well as Piry virus, for which the nucleotide sequences of different genes were available) was compared with the corresponding block III polymerase amino acid sequences of 91 other rhabdoviruses collected from GenBank (see Table 6). DNA translation was performed with BioEdit software (17), and sequence alignment was performed using the CLUSTAL W program (39) and then checked for accuracy by eye. This resulted in a final alignment of 106 sequences, each 160 amino acid residues in length. Phylogenetic analysis of these sequences was then undertaken using the Bayesian method available in the MRBAYES package (18). This analysis utilized the WAG model of amino acid replacement with a gamma distribution of among-site rate variation. Chains were run for 10 million generations (with a 10% burn-in), at which point all of the parameter estimates had converged. The level of support for each node is provided by Bayesian posterior probability (BPP) values.
TABLE 6.
Genus and name^a (species) | UA/TS/UC^b | Abbreviation | Strain | Principal host species/vector^c | Sample origin | Yr of first isolation | GenBank accession no. |
---|---|---|---|---|---|---|---|
Lyssavirus | |||||||
Rabies virus (1) | | RABV | 9001FRA | Dog bitten by bat | French Guiana | 1990 | EU293113 |
Rabies virus (1) | | RABV | 9147FRA | Fox | France | 1991 | EU293115 |
Rabies virus (1) | | RABV | 8743THA | Human | Thailand | 1983 | EU293121 |
Rabies virus (1) | | RABV | 9704ARG | Bat, Tadarida brasiliensis | Argentina | 1997 | EU293116 |
Rabies virus (1) | | RABV | 9706CHI | Vaccine AG | China | | AY854663 |
Rabies virus (1) | | RABV | 9702IND | Human | India | 1997 | AY854665 |
Lagos bat virus (2) | | LBV | 8619NGA | Bat, Eidolon helvum | Nigeria | 1956 | EU293110 |
Mokola virus (3) | | MOKV | MOKV | Cat | Zimbabwe | 1981 | NC_006429 |
Mokola virus (3) | | MOKV | 86100CAM | Shrew | Cameroon | 1974 | EU293117 |
Mokola virus (3) | | MOKV | 86101RCA | Rodent | Republic of Central Africa | 1981 | EU293118 |
Duvenhage virus (4) | | DUVV | 94286SA | Bat, Miniopterus species | South Africa | 1981 | EU293120 |
Duvenhage virus (4) | | DUVV | 86132SA | Human | South Africa | 1971 | EU293119 |
European bat lyssavirus 1 (5) | | EBLV-1 | 8918FRA | Bat, Eptesicus serotinus | France | 1989 | EU293112 |
European bat lyssavirus 1 (5) | | EBLV-1 | 08120FRA | Bat, Eptesicus serotinus | France | 2008 | EU626551 |
European bat lyssavirus 2 (6) | | EBLV-2 | 9018HOL | Bat, Myotis dasycneme | Netherlands | 1986 | EU293114 |
European bat lyssavirus 2 (6) | | EBLV-2 | 9337SWI | Bat, Myotis daubentonii | Switzerland | 1993 | AY854657 |
Australian bat lyssavirus (7) | | ABLV | ABLh | Human | Australia | 1986 | AF418014 |
Australian bat lyssavirus (7) | | ABLV | ABLb (S6-1256) | Bat, Saccolaimus species | Australia | 1996 | NC_003243 |
Dakar bat lyssavirus (8 [proposed]) | UC | DBLV | 0406SEN (AnD 42443) | Bat, Eidolon helvum | Senegal | 1985 | EU293108 |
Dakar bat lyssavirus (8 [proposed]) | UC | DBLV | KE131 | Bat, Eidolon helvum | Kenya | 2007 | EU259198 |
Irkut virus | UC | IRKV | | Bat, Murina leucogaster | Russia | 2002 | EF614260 |
Ozernoe virus | UC | | | Human | Russia | 2007 | FJ905105 |
Aravan virus | UC | ARAV | | Bat, Myotis blythi | Kyrgyzstan | 1991 | EF614259 |
Khujand virus | UC | KHUV | | Bat, Myotis mystacinus | Tajikistan | 2001 | EF614261 |
West Caucasian bat virus | UC | WCBV | | Bat, Miniopterus schreibersii | Russia | 2002 | EF614258 |
Vesiculovirus | |||||||
Chandipura virus | | CHPV | I 653514 | Human; domestic animals^d; hedgehog, Atelerix species; dipteran, Phlebotomus species | India | 1965 | AJ810083 |
Cocal virus | | COCV | TRVL 40233 | Livestock, equine, bovine; mites, Gigantolaelaps species | Trinidad and Tobago, Trinidad | 1961 | EU373657 |
Isfahan virus | | ISFV | 91026-167 | Dipteran, Phlebotomus papatasi | Iran | 1975 | AJ810084 |
Piry virus | | PIRYV | BeAn 24232 (0413BRE) | Human; opossum, Philander opossum | Brazil | 1960 | GU816023 |
Vesicular stomatitis New Jersey virus | | VSNJV | VSV NJ-O | Several livestock species, including Bos taurus and equines; several dipteran species, including Culex nigripalpus, Culicoides species, and Mansonia indubitans | Utah | 1949 | AY074804 |
Vesicular stomatitis New Jersey virus | | VSNJV | VSV NJ-H | Several livestock species, including Sus scrofa; several dipteran species, including Culex nigripalpus, Culicoides species, and Mansonia indubitans | Georgia | 1952 | AY074803 |
Vesicular stomatitis Indiana virus | | VSIV | Mudd-Summers (MS) | Bovine, Bos taurus | Indiana | 1925 | EU849003 |
Vesicular stomatitis Indiana virus | | VSIV | 85CLB | Bovine | Colombia | 1985 | AF473865 |
Vesicular stomatitis Indiana virus | | VSIV | 98COE | Equine | Colorado | 1998 | AF473864 |
Vesicular stomatitis Alagoas virus | | VSAV | Indiana 3 | Equine livestock (mule), Bos taurus; dipterans, Phlebotomus species | Brazil | 1964 | EU373658 |
Jurona virus^e | TS | JURV | BeAr 40578 (0414BRE) | Dipteran, Hemagogus spegazzinii | Brazil | 1962 | GU816024 |
Perinet virus | TS | PERV | Ar Mg 802 | Dipterans, Anopheles coustani, Culex antennatus, Culex gr. pipiens, Mansonia uniformis, Phlebotomus berentensis | Madagascar | 1978 | AY854652 |
Pike fry rhabdovirus | TS | PFRV | F4 | Fish, Esox lucius | Netherlands | 1972 | FJ872827 |
Scophthalmus maximus rhabdovirus | UC | SMRV | QZ-2005 | Fish, Scophthalmus maximus | China | ? | AY895167 |
Spring viremia of carp virus | TS | SVCV | Fijan_cell (VR-1390, isolated from fat head minnow cells) | Fish, Cyprinus carpio | Yugoslavia | 1971 | AJ318079 |
Spring viremia of carp virus | TS | SVCV | Fijan_tissue (VR-1390, isolated from tissues of diseased common carp) | Fish, Cyprinus carpio | Yugoslavia | 1971 | U18101 |
Spring viremia of carp virus | TS | SVCV | BJ0505-2 | Fish, Cyprinus carpio | China | 2005 | EU177782 |
Ephemerovirus | |||||||
Adelaide River virus | | ARV | DPP 61 | Bovine, Bos taurus | Australia | 1981 | AY854635 |
Berrimah virus | | BRMV | DPP 63 | Bovine, Bos taurus | Australia | 1981 | AY854636 |
Bovine ephemeral fever virus | | BEFV | Cs 1933 | Bovine, Bos taurus | Australia | 1973 | AY854641 |
Bovine ephemeral fever virus | | BEFV | Cs 42 | Dipteran, Anopheles bancrofti | Australia | 1975 | AY854639 |
Bovine ephemeral fever virus | | BEFV | BB7721 | Bovine, Bos taurus | Australia | 1968 | NC_002526 |
Kimberley virus | TS | KIMV | CS 368 | Bovine, Bos taurus | Australia | 1980 | AY854637 |
Kotonkan virus | UA | KOTV | IbAr 23380 | Dipteran, Culicoides species | Nigeria | 1967 | AY854638 |
Almpiwar group | |||||||
Almpiwar virus | UA | ALMV | MRM 4059 | Mammals,^d bovine, equine, ovine, kangaroo, bandicoot, human; birds;^d lizard, Ablepharus boutonii virgatus and other skinks^d | Australia | 1966 | AY854645 |
Charleville virus | UA | CHVV | Ch 9824 | Human;^d dipteran, Phlebotomus and Lasiohelea species | Australia | 1969 | AY854644 |
Charleville virus | UA | CHVV | Ch 9847 | Human;^d dipteran, Phlebotomus and Lasiohelea species | Australia | 1969 | AY854672 |
Humpty doo virus | UA | HDOOV | CS 79 | Dipterans, Lasiohelea species, Culicoides marksi | Australia | 1975 | AY854643 |
Hart Park group | |||||||
Bangoran virus^e | UA | BGNV | DakArB 2053 (0424RCA) | Bird, Turdus libonyanus; dipteran, Culex perfuscus | Central African Republic | 1969 | GU816010 |
Flanders virus | UA | FLANV | 61-7484 | Birds, Seiurus aurocapillus, Agelaius phoeniceus; dipterans, Culiseta melanura, Culex species | New York | 1961 | AF523199 |
Kamese virus^e | UA | KAMV | MP 6186 (08343OUG) | Dipterans, Aedes africanus, Culex species, including Culex annulioris | Uganda | 1967 | GU816011 |
Mossuril virus^e | UA | MOSV | SA Ar 1995 (0418MOZ) | Birds, Andropadus virens, Coliuspasser macrourus; dipterans, Aedes abnormalis, Culex species, including Culex sitiens | Mozambique | 1959 | GU816012 |
Ngaingan virus | UA | NGAV | MRM 14556 | Mammals,^d wallabies, kangaroos, bovines; dipteran, Culicoides brevitarsis | Australia | 1970 | AY854649 |
Parry Creek virus | UA | PCRV | OR 189 | Dipteran, Culex annulirostris | Australia | 1972 | AY854647 |
Porton's virus^e | TS (VSV) | PORV | 1643 (0416MAL) | Dipteran, Mansonia uniformis | Malaysia (Sarawak) | ? | GU816013 |
Wongabel virus | UA | WONV | CS 264 | Sea birds;^d dipteran, Culicoides austropalpalis | Australia | 1979 | AY854648 |
Le Dantec group | |||||||
Fukuoka virus | UA (Kern Canyon Group) | FUKV | FUK-11 | Bovine; dipteran, Culicoides punctatus, Culex tritaeniorhynchus | Japan | 1982 | AY854651 |
Keuraliba virus^e | UA | KEUV | DakAnD 5314 (9715SEN, 0420SEN) | Rodents, Tatera species, including Tatera kempi, Taterillus species | Senegal | 1968 | GU816021 |
Le Dantec virus | UA | LDV | DakHD 763 | Human | Senegal | 1965 | AY854650 |
Nkolbisson virus^e | UA (Kern Canyon Group) | NKOV | Ar YM 31/65 (0425CAM) | Dipteran, Aedes species, Eretmapodites species, including Eretmapodites leucopous, Culex telesilla | Cameroon | 1965 | GU816022 |
Moussa group | |||||||
Moussa virus | UC | MOUSV | C23 | Dipteran, Culex decens | Ivory Coast | 2004 | FJ985748 |
Moussa virus | UC | MOUSV | D24 | Dipteran, Culex species | Ivory Coast | 2004 | FJ985749 |
Sandjimba group | |||||||
Bimbo virus^e | UA | BBOV | DakAnB 1054d (9716RCA) | Bird, Euplectes afer | Central African Republic | 1970 | GU816016 |
Boteke virus^e | TS (VSV) | BTKV | DakArB 1077 (0417RCA) | Dipteran, Coquillettidia maculipennis | Central African Republic | 1968 | GU816014 |
Garba virus^e | UA | GARV | DakAnB 439a (0422RCA) | Birds, Corythornis cristata, Nectarina pulchella | Central African Republic | 1970 | GU816018 |
Kolongo virus | UA | KOLV | DakAnB 1094d (9717RCA) | Birds, Euplectes afer, Ploceus cucullatus | Central African Republic | 1970 | GU816020 |
Nasoule virus^e | UA | NASV | DakAnB 4289a (0410RCA) | Bird, Andropadus virens | Central African Republic | 1973 | GU816017 |
Oak-Vale virus | UA | OVRV | CS 1342 | Feral pigs;^d dipteran, Aedes vigilax, Culex species, including (Culex edwardsi) | Australia | 1981 | AY854670 |
Ouango virus^e | UA | OUAV | DakAnB 1582a (9718RCA) | Bird, Ploceus melanocephalus | Central African Republic | 1970 | GU816015 |
Sandjimba virus | UA | SJAV | DakAnB 373d (07244RCA) | Bird, Acrocephalus schoenobaenus | Central African Republic | 1970 | GU816019 |
Sigma group | |||||||
Drosophila affinis sigma virus | UC | DAffSV | 10 | Dipteran, Drosophila affinis | New Connecticut | 2007 | GQ410980 |
Drosophila melanogaster sigma virus | UA | SIGMAV (DMelSV) | AP30 | Dipteran, Drosophila melanogaster | Florida | 2005 | NC_013135 |
Drosophila melanogaster sigma virus | UA | SIGMAV (DMelSV) | HAP23 | Dipteran, Drosophila melanogaster | France | ? | GQ375258 |
Drosophila obscura sigma virus | UC | DObsSV | 10A | Dipteran, Drosophila obscura | United Kingdom | 2007 | GQ410979 |
Sinistar group | |||||||
Siniperca chuatsi rhabdovirus | UC | SCRV | | Fish, Siniperca chuatsi | China | ? | NC_008514 |
Starry flounder rhabdovirus | UC | SFRV | | Fish, Platichthys stellatus | Washington | 2000 | AY450644 |
Tibrogargan group | |||||||
Tibrogargan virus | UA | TIBV | CS 132 | Bovines,^d water buffaloes, cattle; dipteran, Culicoides brevitarsis | Australia | 1976 | AY854646 |
Tupaia virus | TS (VSV) | TUPV | TRV 1591 | Tree shrew, Tupaia belangeri | Thailand | ? | NC_007020 |
Novirhabdovirus | |||||||
Hirame rhabdovirus | | HIRRV | CA 9703 | Fish, including cultured Korean flounders, Paralichthys olivaceus, Plecoglossus altivelis, Milio macrocephalus, and Sebastes inermis | Korea | 1997 | NC_005093 |
Infectious hematopoietic necrosis virus | | IHNV | HV7601 | | | | AB231660 |
Infectious hematopoietic necrosis virus | | IHNV | WRAC | Fish, including salmonid Oncorhynchus tschawytscha | Idaho | | NC_001652 |
Snakehead rhabdovirus | | SHRV | | Fish, including Ophicephalus striatus, Clarias bratachus, and Oxyeleotis marmoratus | Thailand | | NC_000903 |
Viral hemorrhagic septicemia virus | | VHSV | KRRV9822 | Fish, Japanese flounder | Japan | | AB179621 |
Viral hemorrhagic septicemia virus | | VHSV | 07-71 | Fish, Oncorhynchus mykiss | France | | AJ233396 |
Viral hemorrhagic septicemia virus | | VHSV | JF00Ehi1 | Fish, Paralichthys olivaceus | Japan | 2000 | AB490792 |
Viral hemorrhagic septicemia virus | | VHSV | 14-58 | Fish, Oncorhynchus mykiss | France | | AF143863 |
Nucleorhabdovirus | |||||||
Maize mosaic virus | | MMV | | Plants (host), Graminae, including Zea mays; hemipterans (vector), Delphacidae | United States | | NC_005975 |
Rice yellow stunt virus | | RYSV | | Plant (host), Oryza sativa; homopterans (vector), Cicadellidae | | | NC_003746 |
Sonchus yellow net virus | | SYNV | | Plants (host), Asteraceae, including Sonchus oleraceus; hemipterans (vector), Aphididae | | | NC_001615 |
Iranian maize mosaic nucleorhabdovirus | UC | IMMNV | | Plants (host), Graminae, including Zea mays; hemipterans (vector), Delphacidae | Iran | | NC_011542 |
Maize fine streak virus | UC | MFSV | | Plants (host), Graminae, including Zea mays; homopterans (vector), Cicadellidae | Georgia | 1999 | NC_005974 |
Orchid fleck virus^f | UC | OFV | So | Plants (host), Orchidaceae, including Cymbidium species; acarid (vector), Brevipalpus californicus | Japan | | NC_009609 |
Taro vein chlorosis virus | UC | TaVCV | | Plant (host), Colocasia esculenta | Fiji Islands | | NC_006942 |
Cytorhabdovirus | |||||||
Barley yellow striate mosaic | | BYSMV | Zanjan-1 | Plants (host), Graminae, including Triticum species; hemipterans (vector), Delphacidae | Iran | | FJ665628 |
Lettuce necrotic yellows virus | | LNYV | 318 | Several (host) plant families and species, including Allium sativum and Lactuca sativa; hemipterans (vector), Aphididae | Australia | | NC_007642 |
Northern cereal mosaic virus | | NCMV | | Plants (host), Graminae, including Hordeum vulgare; hemipterans (vector), Delphacidae | Japan | | NC_002251 |
Strawberry crinkle virus | | SCV | HB-A1 | Plant (host), Fragaria species; hemipterans (vector), Aphididae | | | AY331389 |
Strawberry crinkle virus | | SCV | 37-2 | Plant (host), Fragaria species; hemipterans (vector), Aphididae | | | AY331388 |
Strawberry crinkle virus | | SCV | 37-1 | Plant (host), Fragaria species; hemipterans (vector), Aphididae | | | AY331387 |
Lettuce yellow mottle virus | UC | LYMoV | | Plant (host), Lactuca sativa | France | 1998 | NC_011532 |
Taastrup group, Taastrup virus | UC | TV | | Hemipteran (potential vector), Psammotettix alienus | France | 1996 | AY423355 |
^a Names of viruses in italics correspond to approved species according to the International Committee on Taxonomy of Viruses database.
^b UA, unassigned; TS, tentative; UC, unclassified (not found in ICTVdb).
^c In bold is the host species from which the virus was first isolated, if that information is available.
^d Serological detection only.
^e First identification based on nucleic acid determination and classification based on phylogenetic analysis (this study).
^f Also tentatively classified into the new genus Dichorhabdovirus according to its unusual bipartite genome (20).
Nucleotide sequence accession numbers.
The GenBank accession numbers for the sequences newly acquired are designated GU815994 to GU816024 and are indicated in Tables 1, 2 and 6.
RESULTS
Identification of lyssaviruses based on two successive PathogenID microarray generations using a systematic BLAST strategy.
To test whether PathogenID microarrays, and specifically the prototype tiled regions, could be used for the identification of a broad number of viral variants without relying on predetermined hybridization patterns, representative animal viruses from the family Rhabdoviridae (including unassigned or tentatively classified rhabdoviruses according to ICTVdb) were studied. The capability of these RMAs to identify and discriminate between near phylogenetic neighbors was first tested using one sequence of the genus Lyssavirus (strain PV, genotype or species 1) tiled on the first generation of the PathogenID microarray (Table 1). It was possible to use BLAST to successfully identify virus strains with approximately 18% nucleotide divergence compared to the prototype (Fig. 2). The hybridization of 15 virus strains representative of the genetic diversity found in this species indicated that a single tiled sequence was able to detect all of the variant strains belonging to the same species.
In addition, we evaluated the spectrum of detection of the second generation of the PathogenID microarray, which included one prototype sequence representative of each of the seven described species in the genus Lyssavirus (Table 2). All of the isolates tested led to the correct species identification using a systematic BLAST strategy when hybridizing a target belonging to the same species that is tiled on the array (Table 3). Moreover, all of the tested isolates of a known genotype were also recognized by heterospecific tiled sequences (Table 3). We also investigated the capacity of this RMA to detect more distantly related viruses not yet classified into a species. Isolates 0406SEN and WCBV, which have been proposed to represent new species of the genus Lyssavirus (5, 15), were surprisingly recognized by almost all of the seven species sequences tiled on the PathogenID v2.0 microarray (Table 3). This recognition indicates that each sequence tiled on the array has the ability to identify strains that are more than 18% divergent, and up to 25.9% in some cases (Table 3). This analysis also reveals that information on a strain hybridized on PathogenID v2.0 can be obtained from distinct species or isolates tiled on the array. Evaluation of the spectrum of detection of this RMA was further extended to two other genera of the family Rhabdoviridae—Ephemerovirus and Vesiculovirus (Table 4). Here again, successful identification was achieved using homospecific sequences tiled on the array, confirming the reliability of the identification.
TABLE 3.
Results from the tiled sequence of each Lyssavirus genotype (columns):
Species (abbreviation) and isolate of lyssaviruses and parameter tested | 1 (RABV), PV | 2 (LBV), 8619NGA | 3 (MOKV), MOKV | 4 (DUVV), 94286SA | 5 (EBLV-1), 8918FRA | 6 (EBLV-2), 9018HOL | 7 (ABLV), ABLV |
---|---|---|---|---|---|---|---|
1 (RABV) | |||||||
93127FRA | |||||||
Base call rate^a | 95.0 | 3.8 | 4.6 | 6.2 | 8.0 | 9.0 | 6.7 |
Identification^b | A | C | B | A | A | A | B |
Divergence^c | 0.2 | 25.6 | 24.8 | 22.8 | 23.2 | 21.0 | 22.0 |
8764THA | |||||||
Base call rate | 32.6 | 5.4 | 5.7 | 7.0 | 6.0 | 3.3 | 9.0 |
Identification | A | A | B | C | A | A | B |
Divergence | 13.7 | 24.3 | 24.9 | 22.7 | 22.9 | 21.2 | 20.7 |
2 (LBV) | |||||||
8619NIG | |||||||
Base call rate | 2.7 | 96.6 | 11.1 | 6.9 | 7.1 | 5.3 | 6.7 |
Identification | Neg | A | A | A | Neg | Neg | B |
Divergence | 25.8 | 0.0 | 22.3 | 22.8 | 25.2 | 23.8 | 23.3 |
3 (MOKV) | |||||||
86100CAM | |||||||
Base call rate | 2.7 | 7.2 | 56.3 | 8.1 | 7.0 | 7.7 | 3.6 |
Identification | A | A | A | A | A | A | B |
Divergence | 24.9 | 22.5 | 10.2 | 22.0 | 23.6 | 22.2 | 24.5 |
4 (DUVV) | |||||||
86132SA | |||||||
Base call rate | 3.9 | 1.1 | 1.4 | 97.3 | 0.6 | 2.5 | 5.6 |
Identification | A | Neg | Neg | A | A | A | Neg |
Divergence | 23.2 | 22.8 | 22.2 | 6.0 | 20.6 | 21.6 | 21.9 |
5 (EBLV-1) | |||||||
8918FRA | |||||||
Base call rate | 8.1 | 6.9 | 15.2 | 13.3 | 93.8 | 7.8 | 4.8 |
Identification | B | A | A | A | A | A | Neg |
Divergence | 23.8 | 25.5 | 23.8 | 20.7 | 0.6^d | 23.4 | 22.1 |
6 (EBLV-2) | |||||||
9018HOL | |||||||
Base call rate | 5.7 | 2.1 | 3.5 | 6.5 | 4.1 | 98.4 | 8.7 |
Identification | Neg | Neg | B | A | B | A | A |
Divergence | 21.2 | 23.8 | 23.5 | 21.7 | 23.3 | 0.0 | 22.3 |
7 (ABLV) | |||||||
9810AUS | |||||||
Base call rate | 8.4 | 8.3 | 1.4 | 12.9 | 3.8 | 11.3 | 94.9 |
Identification | B | A | Neg | A | B | B | A |
Divergence | 22.5 | 23.7 | 24.4 | 21.6 | 22.1 | 22.4 | 1.6 |
8^e (DBLV) | |||||||
0406SEN | |||||||
Base call rate | 19.3 | 63.5 | 29.4 | 16.3 | 22.8 | 18.3 | 19.5 |
Identification | A | A | A | A | A | A | A |
Divergence | 25.0 | 20.1 | 21.5 | 23.4 | 23.5 | 22.8 | 23.8 |
Unclassified | |||||||
WCBV | |||||||
Base call rate | 25.3 | 28.3 | 32.7 | 26.9 | 26.5 | 23.8 | 24.5 |
Identification | C | A | A | A | A | A | A |
Divergence | 25.7 | 23.8 | 24.2 | 24.9 | 24.6 | 24.8 | 25.9 |
^a Percentage of base calls generated from full-length tiled sequences.
^b Taxonomic identification according to the following: A, identification at the species or isolate level when a unique best hit corresponds to the expected species or isolate; B, identification at the genus level when multiple best viral hits exist and correspond to the genus Lyssavirus; C, identification at the family level when multiple best viral hits exist and correspond to genera of the family Rhabdoviridae; Neg, negative or inaccurate identification when a BLAST query is not possible or when there are multiple best hits and some or all of them correspond to other viral families, respectively. Underlined are results obtained using the sequence belonging to the same species tiled on the array (homonymous sequence).
^c Percentage of nucleotide divergence (based on a 937-nt region of the polymerase gene, positions 7040 to 7977, according to the reference Pasteur virus genome [NC_001542]).
^d The tiled sequence of 8918FRA corresponds to a preliminary sequencing result, and the complete genome of this virus strain was obtained later (EU293112), which may explain the 7-nt difference between those two sequences.
^e Tentative species.
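The divergence values in Table 3 are percentages of differing nucleotides between a hybridized strain and a tiled sequence. The text does not state the exact formula used, so the following Python sketch assumes an uncorrected pairwise distance over pre-aligned sequences, with gap and undetermined positions skipped. The function name and the toy sequences are hypothetical.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percentage of differing sites between two aligned nucleotide sequences.

    Sites where either sequence has a gap ('-') or an undetermined base ('N')
    are skipped; this handling is an assumption of the sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


# Toy aligned fragments: one mismatch over nine comparable sites.
print(round(percent_divergence("ACGTACGTAC", "ACGTTCGTAN"), 1))
```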
TABLE 4.
Results from each specific rhabdovirus sequence tiled on the array (columns, grouped by genus):
Rhabdovirus genus and species (isolate) and parameter tested | Vesiculovirus, CHPV | Vesiculovirus, ISFV | Vesiculovirus, PERV | Vesiculovirus, SVCV | Vesiculovirus, VSIV | Vesiculovirus, VSNJV | Ephemerovirus, ARV | Ephemerovirus, BEFV | Ephemerovirus, KIMV | Ephemerovirus, KOTV | Lyssavirus, RABV (PV) |
---|---|---|---|---|---|---|---|---|---|---|---|
Vesiculovirus VSIV (0503FRA) | |||||||||||
Base call rate^a | 1.2 | 4.1 | 1.0 | 1.0 | 98.6 | 2.9 | 0 | 0 | 0 | 0 | 0 |
Score^b | Neg | Neg | Neg | Neg | A | A | Neg | Neg | Neg | Neg | Neg |
Ephemerovirus | |||||||||||
KIMV^c (CS 368) | |||||||||||
Base call rate | 1.9 | 1.1 | 0 | 0 | 0.3 | 0.3 | 9.4 | 7.3 | 70.6 | 9.1 | 1.4 |
Score | Neg | Neg | Neg | Neg | Neg | Neg | Neg | Neg | A | Neg | Neg |
KOTV^d (Ib Ar23380, 9145NIG) | |||||||||||
Base call rate | 6.6 | 3.8 | 5.7 | 3.2 | 3.7 | 7.2 | 8.8 | 5.2 | 3.4 | 100 | 2.1 |
Score | Neg | Neg | Neg | Neg | Neg | C | Neg | Neg | Neg | A | Neg |
Lyssavirus RABV (93127FRA) | |||||||||||
Base call rate | 0.3 | 1.2 | 2.6 | 1.5 | 0 | 0 | 0.1 | 0 | 0.1 | 2.3 | 95.0 |
Score | Neg | Neg | Neg | Neg | Neg | Neg | Neg | Neg | Neg | Neg | A |
^a Percentage of base calls generated from full-length tiled sequences.
^b Taxonomic identification according to the following: A, identification at the species or isolate level when a unique best hit corresponds to the expected species or isolate; C, identification at the family level when multiple best viral hits exist and correspond to genera of the family Rhabdoviridae; Neg, negative or inaccurate identification when a BLAST query is not possible or when multiple best hits exist and some or all of them correspond to other viral families. Underlined are results obtained using the sequence belonging to the same species or isolate tiled on the array (homonymous sequence).
^c TS, tentative species according to ICTVdb.
^d Taxonomic classification according to reference 6.
In both experiments (Tables 3 and 4), low base call rate values were obtained for several combinations of hybridized and tiled sequences. These values were sufficient for viral identification by BLAST, despite the presence of sequence reads as short as 14 nt. This indicates that most of these short sequences corresponded to highly conserved sequence domains. The accuracy of these short sequences was checked by comparison with those obtained by classical sequencing (data not shown).
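The identification categories used in Tables 3 and 4 (A, B, C, and Neg) follow directly from the set of best BLAST hits. The Python sketch below illustrates that decision rule. The `Hit` record, the function name, and the toy hits are hypothetical, and two choices are assumptions of this sketch rather than statements from the study: a unique best hit that does not match the expected species is scored Neg, and ties are decided on equal bit scores.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    species: str
    genus: str
    family: str
    bit_score: float


def classify(hits: List[Hit], expected_species: str,
             genus: str = "Lyssavirus", family: str = "Rhabdoviridae") -> str:
    """Return 'A', 'B', 'C', or 'Neg' from the best-scoring BLAST hits."""
    if not hits:
        return "Neg"  # no BLAST query possible or no hit returned
    top = max(h.bit_score for h in hits)
    best = [h for h in hits if h.bit_score == top]
    if len(best) == 1:
        # Unique best hit: species/isolate level if it is the expected one.
        return "A" if best[0].species == expected_species else "Neg"
    if all(h.genus == genus for h in best):
        return "B"  # multiple best hits, all within the genus
    if all(h.family == family for h in best):
        return "C"  # multiple best hits, all within the family
    return "Neg"    # some best hits fall in other viral families


# Toy hits with made-up scores.
rabv = Hit("Rabies virus", "Lyssavirus", "Rhabdoviridae", 46.0)
lbv = Hit("Lagos bat virus", "Lyssavirus", "Rhabdoviridae", 46.0)
print(classify([rabv, lbv], "Rabies virus"))
```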
Identification of lyssaviruses based on the consensus sequence determination strategy.
A bioinformatic workflow was developed to gather stretches of sequence reads obtained with more or less distantly related sequences tiled on PathogenID v2.0. The aim of this strategy was to increase the length of the sequence determined in order to improve the sensitivity of the BLAST analysis compared to previously described methodologies (29). All of the sequence reads obtained from prototype sequences of the genus Lyssavirus (at least 12 nt long with no more than one undetermined base, whether or not they initially led to a positive BLAST identification) were used to generate a contiguous sequence. When overlapping fragments were identified, a consensus sequence was generated to remove ambiguous or undetermined base calls. The methodology used to obtain consensus sequences confirmed the species identification after BLAST analysis in the case of the seven lyssavirus nucleotide sequences used for hybridization (Table 5). Moreover, these consensus sequences were found to be more powerful in identifying unclassified or new species of lyssaviruses not tiled on the RMA than resequencing data collected individually from each tiled sequence, as shown for strains 0406SEN and WCBV. In both cases, an increase in the base call rate was observed using this consensus sequence strategy, from 63.5% (best base call rate obtained from individual prototype sequences) to 75.9% for strain 0406SEN and from 32.7% to 60.9% for WCBV (Tables 3 and 5). Once again, this increase in nucleotide base determination was associated with a relatively high accuracy (91.8% and 97.3% concordance between the consensus sequences and the reference sequences of isolates 0406SEN and WCBV, respectively) (Table 5). To further demonstrate the ability of this strategy to detect and identify novel virus species, consensus sequences were generated based on only six of the seven prototype tiled sequences (excluding the homospecific sequence of the same species tiled on the array).
All of the strains of the seven species tested were accurately and specifically identified using this restricted approach (Table 5). These results indicate that the consensus sequences obtained could improve the detection of a novel domain(s) not identified using only the closest prototype sequence tiled on the RMA.
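The consensus strategy described above can be sketched in Python as follows. This is a simplified illustration, not the workflow used in the study: it assumes that the resequencing output of every tiled sequence is already expressed in one shared coordinate system with `N` for uncalled positions, it keeps only fully called stretches of at least 12 nt (stricter than the one-undetermined-base tolerance described above), and it resolves overlaps by majority vote. All function names and the toy data are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List


def called_stretches(seq: str, min_len: int = 12) -> Dict[int, str]:
    """Return {start: stretch} for runs of called bases of at least min_len nt."""
    return {m.start(): m.group()
            for m in re.finditer(r"[ACGT]+", seq.upper())
            if len(m.group()) >= min_len}


def consensus(resequenced: List[str], min_len: int = 12) -> str:
    """Majority-vote consensus over reads that share one coordinate system."""
    if not resequenced:
        raise ValueError("at least one resequenced sequence is required")
    length = len(resequenced[0])
    if any(len(s) != length for s in resequenced):
        raise ValueError("all sequences must share the same coordinates")
    votes = [Counter() for _ in range(length)]
    for seq in resequenced:
        for start, stretch in called_stretches(seq, min_len).items():
            for offset, base in enumerate(stretch):
                votes[start + offset][base] += 1
    out = []
    for counter in votes:
        ranked = counter.most_common(2)
        tie = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
        # Uncovered positions and ties between bases stay undetermined.
        out.append("N" if not ranked or tie else ranked[0][0])
    return "".join(out)


def base_call_rate(seq: str) -> float:
    """Percentage of positions carrying a called base (A, C, G, or T)."""
    return 100.0 * sum(b in "ACGT" for b in seq.upper()) / len(seq)


# Toy data: two tiles each recover a different part of a hypothetical target.
target = "ACGTACGTACGTACGTACGTACGTACGTAC"
tile_a = target[:14] + "N" * 16
tile_b = "N" * 10 + target[10:]
merged = consensus([tile_a, tile_b])
print(base_call_rate(tile_a), base_call_rate(tile_b), base_call_rate(merged))
```

In the toy data, neither tile alone covers the whole target, whereas the merged sequence does, which mirrors the increase in base call rate reported for the consensus approach.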
TABLE 5.
Results obtained with each analysis strategy (columns):
Lyssavirus species (abbreviation), isolate, and parameter tested | Prototype sequence only | Consensus sequence, including all tiled sequences | Consensus sequence, excluding prototype sequence |
---|---|---|---|
1 (RABV) | |||
93127FRA | |||
Base call rate^a | 95.0 | 96.3 | 32.7 |
BLAST score^b | 791 | 801 | 38 |
Accuracy^c | 100 | 99.9 | 95.9 |
8764THA | |||
Base call rate | 32.6 | 47.4 | 26.7 |
BLAST score | 46 | 64 | 31 |
Accuracy | 94.8 | 99.1 | 98.4 |
2 (LBV), 8619NIG | |||
Base call rate | 96.6 | 96.4 | 28.1 |
BLAST score | 816 | 814 | 39 |
Accuracy | 99.9 | 99.9 | 97.7 |
3 (MOKV), 86100CAM | |||
Base call rate | 56.3 | 67.4 | 28.4 |
BLAST score | 66 | 112 | 64 |
Accuracy | 98.2 | 99.8 | 98.5 |
4 (DUVV), 86132SA | |||
Base call rate | 97.3 | 97.3 | 18.1 |
BLAST score | 843 | 833 | 20 |
Accuracy | 99.9 | 99.8 | 96.4 |
5 (EBLV-1), 8918FRA | |||
Base call rate | 93.8 | 96.0 | 41.1 |
BLAST score | 757 | 807 | 83 |
Accuracy | 100 | 100 | 97.9 |
6 (EBLV-2), 9018HOL | |||
Base call rate | 98.4 | 98.8 | 26.8 |
BLAST score | 871 | 879 | 44 |
Accuracy | 100 | 99.9 | 99.6 |
7 (ABLV), ABLV | |||
Base call rate | 94.9 | 95.6 | 29.7 |
BLAST score | 749 | 741 | 40 |
Accuracy | 100 | 99.9 | 94.5 |
8^e (DBLV), 0406SEN | |||
Base call rate | NA^d | 75.9 | NA |
BLAST score | NA | 82 | NA |
Accuracy | NA | 91.8 | NA |
?, WCBV | |||
Base call rate | NA | 60.9 | NA |
BLAST score | NA | 56 | NA |
Accuracy | NA | 97.3 | NA |
^a Percentage of base calls generated from full-length tiled sequences.
^b BLAST score (bit score) obtained after BLAST query on a local viral and bacterial database using the consensus sequence determination strategy with m (minimum nucleotide length) = 12 and N (maximum undetermined nucleotide content) = 10. Default BLAST parameters were used, except for the minimum word length (7 nt), the expect threshold (increased from the default of 10 to 100,000), and the low complexity level filter (−F), which was turned off. All of the BLAST scores indicate correct identification at the species or isolate level (i.e., unique best hit corresponds to the expected species or isolate).
^c Percentage of nucleotides correctly identified, compared to the sequence obtained after classical sequencing of the corresponding Lyssavirus species tested.
^d NA, not applicable.
^e Tentative species.
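The non-default BLAST settings listed in the Table 5 footnotes (word length of 7 nt, expect threshold of 100,000, low-complexity filter off) can be expressed as a command line. The `−F` flag mentioned there suggests the legacy `blastall` program. The Python sketch below instead builds the argument list for the current BLAST+ `blastn` program, which is an assumed equivalent and not the command used in the study. The function name, file name, and database name are hypothetical.

```python
from typing import List


def blastn_args(query_fasta: str, database: str) -> List[str]:
    """Build a BLAST+ blastn command line mirroring the reported settings."""
    return [
        "blastn",
        "-task", "blastn",      # classic blastn search rather than megablast
        "-query", query_fasta,
        "-db", database,
        "-word_size", "7",      # minimum word length of 7 nt
        "-evalue", "100000",    # expect threshold raised from 10 to 100,000
        "-dust", "no",          # low-complexity filtering turned off
        "-outfmt", "6",         # tabular output, convenient for parsing
    ]


# Hypothetical file and database names, for illustration only.
print(" ".join(blastn_args("consensus.fasta", "local_viral_bacterial_db")))
```

With BLAST+ installed, the returned list could be passed to `subprocess.run`; the tabular output includes a bit score column comparable to the BLAST scores in Table 5.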
Assessment of clinical specimens.
A total of 17 brain biopsy samples originating from experimentally infected mice and various clinical samples (n = 8) obtained from the National Reference Centre for Rabies at the Institut Pasteur were tested for lyssavirus detection and identification using the two versions of the PathogenID microarray (Tables 1 and 2). The clinical specimens had previously been collected from humans and animals with clinically documented encephalitis and suspected of having rabies. They were used to compare RMA results with conventional methods of diagnosis, including the RT-heminested PCR (RT-hnPCR) technique for the intra vitam diagnosis of rabies in humans (13), the fluorescent-antibody test, the rabies tissue culture inoculation test, and the enzyme-linked immunosorbent assay for the postmortem diagnosis of humans and animals (8, 47). Among the eight clinical samples, six were brain biopsy specimens collected from different rabid mammals (a bat, a cat, a dog, and two foxes) and from a human. The two other samples comprised a saliva specimen and a skin biopsy sample collected from two different rabid human patients (Tables 1 and 2). Except for the skin biopsy sample, in which the virus was not detected, this comparison demonstrated a complete concordance between our method and conventional methods for all of the samples tested. Moreover, the accuracy of the sequences provided by the PathogenID microarray was close to that obtained using classical sequencing (data not shown). The failure to detect lyssavirus in the skin biopsy sample was probably due to insufficient sensitivity of the current RMA method, as viral RNA was only weakly detected after RT-hnPCR.
In sum, these results demonstrated that the newly developed amplification process by WTA coupled to hybridization to the PathogenID microarray allowed the detection of a large range of viral variants from various complex biological samples, including clinical samples (Tables 1 and 2).
Application of the RMA strategy to characterize new rhabdoviruses.
Broad-spectrum detection was demonstrated using the consensus sequence-based analysis strategy among viruses of the family Rhabdoviridae, and the more distantly related viruses examined included many viruses that are not yet classified as species. Accordingly, 17 different rhabdoviruses were tested by using brain samples from experimentally infected mice (n = 16) or infected cell suspension. These viruses included four strains belonging to the genus Vesiculovirus, with Vesicular stomatitis Indiana virus (VSIV) and Boteke (BTKV), Jurona (JURV), and Porton's (PORV) viruses, the latter three of which are currently classified as tentative species; two strains belonging to the genus Ephemerovirus, the Kimberley (KIMV) and kotonkan (KOTV) viruses, corresponding to a tentative and an unassigned species, respectively; and 11 presently unassigned rhabdoviruses, namely, the Kamese (KAMV), Mossuril (MOSV), Sandjimba (SJAV), Keuraliba (KEUV), Nkolbisson (NKOV), Garba (GARV), Nasoule (NASV), Ouango (OUAV), Bimbo (BBOV), Bangoran (BGNV), and Gossas (GOSV) viruses (virus taxonomy according to ICTVdb) (Table 2).
In the first step, successful detection and identification of these viruses using the PathogenID v2.0 microarray was obtained for 12 (70.6%) out of 17 viruses; accurate taxonomic positioning (that is, within the family Rhabdoviridae) was also achieved, and for some, the corresponding genus (when available) was also matched accurately (data not shown). In the second step, specific and consensus primers were designed based on the stretches of sequences identified by the microarray using the consensus sequence determination strategy and then used for PCR and classical sequencing of the amplified target nucleotide sequences. For four (GARV, NASV, OUAV, and BBOV) of the five rhabdoviruses not detected by the microarray, a region of 1,000 nt of the polymerase gene encompassing that tiled on the array was successfully amplified by PCR and sequenced using the primers described above. The only exception was the GOSV isolate, which remained undetected by either the microarray or PCR. Further, two other rhabdoviruses not previously tested with the PathogenID v2.0 microarray, Kolongo virus (KOLV, an unassigned species) and Piry virus (PIRYV, a vesiculovirus), were also amplified and sequenced using these primers.
All of the newly sequenced nucleotide regions of the polymerase gene were further translated into protein sequences and aligned with 88 sequences of animal or plant rhabdoviruses obtained from GenBank, producing a total data set of 106 sequences 160 amino acid residues in length. A Bayesian phylogenetic analysis of these sequences tentatively distinguished 15 groups of viruses based on their strongly supported monophyly (Table 6 and Fig. 3). The members of the six genera—Ephemerovirus, Lyssavirus, Vesiculovirus, Cytorhabdovirus, Nucleorhabdovirus, and Novirhabdovirus—fall into well-supported monophyletic groups (BPP value, ≥0.97) (Fig. 3). Interestingly, this analysis suggested the existence of at least nine more groups of currently unclassified rhabdoviruses, which reflect important biological characteristics of the viruses in question. Five of these groups have been proposed previously and were further supported by our analysis (data available at the CRORA database website [http://www.pasteur.fr/recherche/banques/CRORA/]) (6, 27; reviewed in reference 7). The first group, tentatively named the Hart Park group, contains the previously described Parry Creek (PCRV), Wongabel (WONV), Flanders (FLANV), and Ngaingan (NGAV) viruses added to the newly identified viruses BGNV, KAMV, MOSV, and PORV. This group has a large distribution that encompasses Africa, Australia, Malaysia, and the United States. These viruses have a wide host range, as they have been found to infect dipterans, birds, and mammals. The second group is the Almpiwar group, containing four members—two strains of Charleville (CHVV) virus, i.e., CHVV_Ch9824 and CHVV_Ch9847—and the Almpiwar (ALMV) and Humpty doo (HDOOV) viruses. Viruses of this group were isolated in Australia and are associated with infections of dipterans and lizards but also birds and mammals, including humans. 
Another group, herein referred to as the Le Dantec group, was also seen to form a distinct cluster with Le Dantec virus (LDV), Fukuoka virus (FUKV), and the two newly molecularly identified viruses KEUV and NKOV. Members were isolated in Japan and Africa, where they were shown to infect dipterans and mammals, including humans. The fourth group has been tentatively named the Tibrogargan group and includes the Tupaia (TUPV) and Tibrogargan (TIBV) viruses. These viruses were isolated in Southeast Asia, Australia, and New Guinea from dipterans and mammals. Finally, we observed the Sigma group as previously described (27). It includes Drosophila affinis (DAffSV), Drosophila obscura (DObsSV), and two strains of Drosophila melanogaster (SIGMAV_AP30 and SIGMAV_HAP23) sigma viruses, infecting Drosophila flies which were found in the United States and Europe.
In addition, four other tentative groups of viruses are newly described in this study. The Sandjimba group includes the first molecularly classified viruses BBOV, BTKV, NASV, GARV, and OUAV and the previously described Oak-Vale virus (OVRV), SJAV, and KOLV (identification of the latter two based only on a limited region of the nucleoprotein gene). These viruses were isolated from birds and dipterans from the Central African Republic and Australia (data available at http://www.pasteur.fr/recherche/banques/CRORA/) (6, 9). Interestingly, all of the African members of this group clustered closely, whereas the sole Australian virus was more divergent, suggesting a potential geographical segregation. Second, the Sinistar group includes the Siniperca chuatsi rhabdovirus (SCRV) isolated from mandarin fish in China (37) and the starry flounder rhabdovirus (SFRV) from starry flounder in the United States (30). These two viruses appear to be more closely related to the Le Dantec group than to viruses in the genus Vesiculovirus, in which several other fish rhabdoviruses are classified. The third one is the Moussa group, including two isolates of Moussa virus (MOUV_D24 and MOUV_C23) collected from mosquitoes in Ivory Coast (34). Finally, a phylogenetic analysis suggests the presence of another group within the plant rhabdoviruses: the Taastrup group, which comprises the single isolate Taastrup virus (TV) isolated from leafhoppers (Psammotettix alienus) originally collected in France (28). All of these groups were strongly supported by the Bayesian analysis (BPP value, ≥0.98), with the exception of the Sigma group, which exhibits a BPP value of 0.88.
In addition, classification of some uncharacterized rhabdoviruses from our phylogenetic analysis diverged from that previously suggested by serology (according to ICTVdb) and will probably need further investigation to determine their precise taxonomic positions within the family Rhabdoviridae (Table 6) (9, 38). In particular, PORV and BTKV, previously identified as vesiculoviruses, were included within the Hart Park and Sandjimba groups, respectively, and NKOV was classified into the Le Dantec group instead of the Kern Canyon group. Moreover, in contrast to a previous phylogenetic study (22), TUPV was found to be more closely related to TIBV than to any other isolates in the Sandjimba group. Finally, our study confirmed the previous serology-based classification of JURV and the recently identified Scophthalmus maximus rhabdovirus (SMRV) within the Vesiculovirus genus (38, 49).
DISCUSSION
We have analyzed the capacity for viral detection and identification of two versions of a newly described RMA, termed PathogenID, which was designed specifically for multiple pathogen detection using database similarity searching (1). To evaluate this microarray, we focused on one of the largest and most diverse viral families described to date, the Rhabdoviridae (ICTVdb, reviewed in reference 7). All of the virus strains tested (except WCBV) were extracted from biological samples and amplified using a nonspecific and unbiased WTA step as previously described (3). Rhabdovirus-targeted sequences were selected among blocks of conservation within the polymerase gene (6). This region was chosen so as to encompass a sufficient number of homologous but also polymorphic sites. The key advantage of this RMA strategy is that it does not require a specific match between the samples tested and the tiled sequences; indeed, mismatches add value, as they allow precise typing of the unknown resequenced genetic element. In our case, the conserved nature of the target region of the polymerase gene (block III) and the detection capability of the RMA allow a precise taxonomic identification (i.e., family, genus, species) and also provide key information on phylogenetic relationships for some unclassified, unassigned, or tentative species of rhabdoviruses. For example, results obtained in the PathogenID v1.0 microarray evaluation demonstrated that most of the intraspecies nucleotide diversity found in the genus Lyssavirus can be covered by a single prototype sequence tiled on the microarray.
Using the second version of PathogenID which included one prototype sequence of each of the seven species recognized thus far within the genus Lyssavirus, we extended the spectrum of detection of the RMA to potentially all of the known or unknown lyssaviruses (i.e., positive detection of virus isolates presenting up to 25.9% nucleotide divergence with the tiled sequence considered), which is greater than that previously reported (24-26, 43, 44).
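To make the divergence figure concrete, the following is a minimal sketch of how a percent nucleotide divergence between a tiled prototype sequence and a resequenced isolate could be computed from a pairwise alignment. It assumes an uncorrected proportion of mismatches over positions called in both sequences; the function name `percent_divergence` and the toy sequences are illustrative only and are not taken from the study, whose exact calculation may differ.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Percent of mismatched positions between two aligned sequences.

    Positions where either sequence carries a gap ('-') or an uncalled
    base ('N') are skipped, so only positions called in both sequences
    contribute to the estimate (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skipped = {"-", "N"}
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skipped or b in skipped:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Toy aligned sequences (hypothetical, not data from the study).
tiled = "ATGGCTAACCTGATCGA"
query = "ATGGCAAACNTGATTGA"
print(percent_divergence(tiled, query))  # 2 mismatches over 16 compared positions -> 12.5
```

Skipping uncalled positions rather than counting them as mismatches keeps a low base call rate from inflating the apparent divergence.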
This study also indicates that accurate viral identification may still be possible even when only short sequences are obtained from individual tiled prototype sequences. Indeed, taken individually, these short stretches of nucleotide sequence could not give positive results during the initial BLAST query. However, when used in the consensus sequence determination strategy employed here, they improved the identification of virus strains distantly related to those tiled on the RMA. For example, we were able to test and detect rhabdoviruses based on sequence data obtained with tiled sequences that originated from other viral genera.
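The idea of pooling short stretches from several tiled prototype sequences can be illustrated with a simple majority-vote consensus. This is only a sketch under stated assumptions: the read-outs are assumed to be already placed on a common alignment coordinate, 'N' marks an uncalled base, and ties are left uncalled. The function name `consensus_from_tiles` and the toy read-outs are hypothetical, and the procedure actually used in this study may differ in its details.

```python
from collections import Counter
from typing import Sequence

CALLED_BASES = frozenset("ACGT")


def consensus_from_tiles(tile_reads: Sequence[str]) -> str:
    """Majority-vote consensus over aligned tile read-outs ('N' = no call)."""
    if not tile_reads:
        raise ValueError("need at least one read-out")
    length = len(tile_reads[0])
    if any(len(read) != length for read in tile_reads):
        raise ValueError("read-outs must share one alignment length")
    consensus = []
    for column in zip(*tile_reads):
        # Count only called bases at this alignment position.
        calls = Counter(base for base in column if base in CALLED_BASES)
        if not calls:
            consensus.append("N")
            continue
        ranked = calls.most_common(2)
        # A tie between the two most frequent bases is left uncalled.
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            consensus.append("N")
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


# Three toy read-outs, each yielding only a short called stretch.
reads = [
    "ACGTNNNNNNNN",
    "NNNNNGGATNNN",
    "NNGTACGNNNNN",
]
print(consensus_from_tiles(reads))  # ACGTANGATNNN
```

In this toy case no single read-out has more than five consecutive called bases, whereas the consensus carries nine called positions, which is the kind of gain that can turn an uninformative similarity search into an informative one.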
The strategy developed here also allowed the potential detection of genetically diverse rhabdoviruses, whether previously identified or unknown, using a limited number of sequences tiled on the microarray. Using the PathogenID v2.0 microarray, we were able to identify 30 rhabdoviruses in total. These included 12 viruses currently unclassified, unassigned, or assigned as tentative species within the family Rhabdoviridae (according to ICTVdb). Moreover, the consensus sequence-based analysis of RMA results was shown to be accurate compared to sequences obtained through classical sequencing (Table 5 and data not shown). Sequence data provided by the PathogenID v2.0 microarray were also extremely helpful in the design of specific primers to further sequence the targeted region of the viral polymerase gene of some other rhabdoviruses. Finally, this approach allowed us to undertake the largest phylogenetic analysis of the family Rhabdoviridae to date (Table 6 and Fig. 3), although it is important to note that the list of viruses and potential taxa described here is still incomplete and more viruses will clearly be characterized in the near future. Despite these phylogenetic divisions, all of the viruses included in these proposed groups are closely related to vesiculoviruses and ephemeroviruses and were found to infect a large spectrum of animals, including dipterans and mammals (a grouping previously referred to as the dimarhabdovirus supergroup [6]) but also lizards (Almpiwar group), birds (especially the Sandjimba group but also the Hart Park group), and fish (Sinistar group) (Table 6).
Although this approach is promising, inadequate sequence selection for the design of the RMA, and consequently a lack of coverage of the viral sequence space, represents an important limitation. Blocks of conserved sequence across taxonomic subdivisions in the viral world could be defined in the same way and targeted by the RMA assay, which would improve the detection power of this tool and greatly aid in the identification of members of the family Rhabdoviridae or even of other viral families. The results presented here validate the usefulness of the design methodology. They emphasize the gain in identification obtained with a consensus sequence determination strategy compared to a systematic BLAST strategy (29). Indeed, this strategy allows us to use and accurately analyze the RMA output data, even if only short subsequences with a high base call rate are obtained. It provides an informative alternative to current molecular methods, such as classical or multiplex PCR, for the rapid identification of viral pathogens. It is currently being applied to a new generation of RMA aimed at the detection and identification of genetically diverse and unknown viral pathogens and, more broadly, of any virus present in a clinical specimen. In contrast to conventional microarrays, it is not limited by the requirement of prior knowledge of the identities of viruses present in biological samples, and it is not restricted to the detection of a limited number of candidate viruses. As such, this strategy has great potential for implementation as a high-throughput platform to identify more divergent viral organisms. This technology could be especially useful in clinical diagnosis or in surveillance programs for detecting uncharacterized viral pathogens or highly variable virus strains in the same taxonomic genus or family, as is frequently the case for RNA viruses (2).
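As an illustration of what "short subsequences with a high base call rate" can mean in practice, the sketch below computes the fraction of called bases in an RMA read-out and extracts the contiguous called stretches. The representation of uncalled bases as 'N', the `min_length` threshold, and both function names are assumptions made for this example and do not describe the software used in the study.

```python
import re
from typing import List, Tuple


def base_call_rate(readout: str) -> float:
    """Fraction of positions in the read-out that carry a called base."""
    if not readout:
        return 0.0
    called = sum(1 for base in readout.upper() if base in "ACGT")
    return called / len(readout)


def called_stretches(readout: str, min_length: int = 20) -> List[Tuple[int, str]]:
    """Return (start, subsequence) for each run of called bases.

    Only runs of at least `min_length` consecutive called bases are kept;
    the default of 20 is an arbitrary illustrative threshold.
    """
    return [
        (match.start(), match.group())
        for match in re.finditer(r"[ACGT]+", readout.upper())
        if len(match.group()) >= min_length
    ]


# Toy read-out (hypothetical): 17 called bases out of 25 positions.
readout = "NNACGTACGTNNNACNNGGATCCAN"
print(base_call_rate(readout))                  # 0.68
print(called_stretches(readout, min_length=4))  # [(2, 'ACGTACGT'), (17, 'GGATCCA')]
```

Stretches retained this way are the natural input for a consensus step, since their start positions record where each one sits relative to the tiled sequence.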
The potential applications of such a methodology therefore appear to be numerous: differential diagnostics for illnesses with multiple potential causes (for example, central nervous system diseases such as encephalitis and meningitis), tracking of emergent pathogens, the distinction of biological threats from harmless phylogenetic neighbors, and the broader-scale investigation of biodiversity in the viral world.
Acknowledgments
This work was supported by grant UC1 AI062613 (G. C. Kennedy) from the U.S. National Institute of Allergy and Infectious Diseases, National Institutes of Health; the Programme Transversal de Recherche (PTR DEVA 246) from the Institut Pasteur, Paris, France; the European Commission, through the VIZIER Integrated Project (LSHG-CT-2004-511966); and the Institut Pasteur International Network Actions Concertées InterPasteuriennes (2003/687). We acknowledge the sponsorship of Total-Institut Pasteur for financial support.
We are grateful to D. Blondel, H. Zeller, and the CRORA database for having provided some of the rhabdovirus isolates tested in this study. We are also grateful to the technical staff of the Genotyping of Pathogens and Public Health Technological Platform for their patience and their excellent work in the sequencing of the different rhabdoviruses.
Footnotes
Published ahead of print on 7 July 2010.
REFERENCES
- 1. Berthet, N., P. Dickinson, I. Filliol, A. K. Reinhardt, C. Batejat, T. Vallaeys, K. A. Kong, C. Davies, W. Lee, S. Zhang, Y. Turpaz, B. Heym, G. Coralie, L. Dacheux, A. M. Burguière, H. Bourhy, I. G. Old, J. M. Manuguerra, S. T. Cole, and G. C. Kennedy. 2007. Massively parallel pathogen identification using high-density microarrays. Microb. Biotechnol. 1:79-86.
- 2. Berthet, N., I. Leclercq, A. Dublineau, S. Shigematsu, A. M. Burguiere, C. Filippone, A. Gessain, and J. C. Manuguerra. 2010. High-density resequencing DNA microarrays in public health emergencies. Nat. Biotechnol. 28:25-27.
- 3. Berthet, N., A. K. Reinhardt, I. Leclercq, S. van Ooyen, C. Batejat, P. Dickinson, R. Stamboliyska, I. G. Old, K. A. Kong, L. Dacheux, H. Bourhy, G. C. Kennedy, C. Korfhage, S. T. Cole, and J. C. Manuguerra. 2008. Phi29 polymerase based random amplification of viral RNA as an alternative to random RT-PCR. BMC Mol. Biol. 9:77.
- 4. Bodrossy, L., and A. Sessitsch. 2004. Oligonucleotide microarrays in microbial diagnostics. Curr. Opin. Microbiol. 7:245-254.
- 5. Botvinkin, A. D., E. M. Poleschuk, I. V. Kuzmin, T. I. Borisova, S. V. Gazaryan, P. Yager, and C. E. Rupprecht. 2003. Novel lyssaviruses isolated from bats in Russia. Emerg. Infect. Dis. 9:1623-1625.
- 6. Bourhy, H., J. A. Cowley, F. Larrous, E. C. Holmes, and P. J. Walker. 2005. Phylogenetic relationships among rhabdoviruses inferred using the L polymerase gene. J. Gen. Virol. 86:2849-2858.
- 7. Bourhy, H., A. Gubala, R. P. Weir, and D. Boyle. 2008. Animal rhabdoviruses, p. 111-121. In B. W. J. Mahy and M. H. V. Van Regenmortel (ed.), Encyclopedia of virology, vol. 1. Elsevier, Oxford, United Kingdom.
- 8. Bourhy, H., P. E. Rollin, J. Vincent, and P. Sureau. 1989. Comparative field evaluation of the fluorescent-antibody test, virus isolation from tissue culture, and enzyme immunodiagnosis for rapid laboratory diagnosis of rabies. J. Clin. Microbiol. 27:519-523.
- 9. Calisher, C. H., N. Karabatsos, H. Zeller, J. P. Digoutte, R. B. Tesh, R. E. Shope, A. P. Travassos da Rosa, and T. D. St. George. 1989. Antigenic relationships among rhabdoviruses from vertebrates and hematophagous arthropods. Intervirology 30:241-257.
- 10. Chiu, C. Y., A. A. Alizadeh, S. Rouskin, J. D. Merker, E. Yeh, S. Yagi, D. Schnurr, B. K. Patterson, D. Ganem, and J. L. DeRisi. 2007. Diagnosis of a critical respiratory illness caused by human metapneumovirus by use of a pan-virus microarray. J. Clin. Microbiol. 45:2340-2343.
- 11. Chiu, C. Y., A. L. Greninger, K. Kanada, T. Kwok, K. F. Fischer, C. Runckel, J. K. Louie, C. A. Glaser, S. Yagi, D. P. Schnurr, T. D. Haggerty, J. Parsonnet, D. Ganem, and J. L. DeRisi. 2008. Identification of cardioviruses related to Theiler's murine encephalomyelitis virus in human infections. Proc. Natl. Acad. Sci. U. S. A. 105:14124-14129.
- 12. Chiu, C. Y., A. Urisman, T. L. Greenhow, S. Rouskin, S. Yagi, D. Schnurr, C. Wright, W. L. Drew, D. Wang, P. S. Weintrub, J. L. DeRisi, and D. Ganem. 2008. Utility of DNA microarrays for detection of viruses in acute respiratory tract infections in children. J. Pediatr. 153:76-83.
- 13. Dacheux, L., J. M. Reynes, P. Buchy, O. Sivuth, B. M. Diop, D. Rousset, C. Rathat, N. Jolly, J. B. Dufourcq, C. Nareth, S. Diop, C. Iehle, R. Rajerison, C. Sadorge, and H. Bourhy. 2008. A reliable diagnosis of human rabies based on analysis of skin biopsy specimens. Clin. Infect. Dis. 47:1410-1417.
- 14. Delarue, M., O. Poch, N. Tordo, D. Moras, and P. Argos. 1990. An attempt to unify the structure of polymerases. Protein Eng. 3:461-467.
- 15. Delmas, O., E. C. Holmes, C. Talbi, F. Larrous, L. Dacheux, C. Bouchier, and H. Bourhy. 2008. Genomic diversity and evolution of the lyssaviruses. PLoS One 3:e2057.
- 16. Hacia, J. G. 1999. Resequencing and mutational analysis using oligonucleotide microarrays. Nat. Genet. 21:42-47.
- 17. Hall, T. A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41:95-98.
- 18. Huelsenbeck, J. P., and F. Ronquist. 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754-755.
- 19. Kistler, A., P. C. Avila, S. Rouskin, D. Wang, T. Ward, S. Yagi, D. Schnurr, D. Ganem, J. L. DeRisi, and H. A. Boushey. 2007. Pan-viral screening of respiratory tract infections in adults with and without asthma reveals unexpected human coronavirus and human rhinovirus diversity. J. Infect. Dis. 196:817-825.
- 20. Kondo, H., T. Maeda, Y. Shirako, and T. Tamada. 2006. Orchid fleck virus is a rhabdovirus with an unusual bipartite genome. J. Gen. Virol. 87:2413-2421.
- 21. Kothapalli, R., S. J. Yoder, S. Mane, and T. P. Loughran, Jr. 2002. Microarray results: how accurate are they? BMC Bioinformatics 3:22.
- 22. Kuzmin, I. V., G. J. Hughes, and C. E. Rupprecht. 2006. Phylogenetic relationships of seven previously unclassified viruses within the family Rhabdoviridae using partial nucleoprotein gene sequences. J. Gen. Virol. 87:2323-2331.
- 23. Leski, T. A., B. Lin, A. P. Malanoski, Z. Wang, N. C. Long, C. E. Meador, B. Barrows, S. Ibrahim, J. P. Hardick, M. Aitichou, J. M. Schnur, C. Tibbetts, and D. A. Stenger. 2009. Testing and validation of high density resequencing microarray for broad range biothreat agents detection. PLoS One 4:e6569.
- 24. Lin, B., K. M. Blaney, A. P. Malanoski, A. G. Ligler, J. M. Schnur, D. Metzgar, K. L. Russell, and D. A. Stenger. 2007. Using a resequencing microarray as a multiple respiratory pathogen detection assay. J. Clin. Microbiol. 45:443-452.
- 25. Lin, B., A. P. Malanoski, Z. Wang, K. M. Blaney, A. G. Ligler, R. K. Rowley, E. H. Hanson, E. von Rosenvinge, F. S. Ligler, A. W. Kusterbeck, D. Metzgar, C. P. Barrozo, K. L. Russell, C. Tibbetts, J. M. Schnur, and D. A. Stenger. 2007. Application of broad-spectrum, sequence-based pathogen identification in an urban population. PLoS One 2:e419.
- 26. Lin, B., Z. Wang, G. J. Vora, J. A. Thornton, J. M. Schnur, D. C. Thach, K. M. Blaney, A. G. Ligler, A. P. Malanoski, J. Santiago, E. A. Walter, B. K. Agan, D. Metzgar, D. Seto, L. T. Daum, R. Kruzelock, R. K. Rowley, E. H. Hanson, C. Tibbetts, and D. A. Stenger. 2006. Broad-spectrum respiratory tract pathogen identification using resequencing DNA microarrays. Genome Res. 16:527-535.
- 27. Longdon, B., D. J. Obbard, and F. M. Jiggins. 2010. Sigma viruses from three species of Drosophila form a major new clade in the rhabdovirus phylogeny. Proc. Biol. Sci. 277:35-44.
- 28. Lundsgaard, T. 1997. Filovirus-like particles detected in the leafhopper Psammotettix alienus. Virus Res. 48:35-40.
- 29. Malanoski, A. P., B. Lin, Z. Wang, J. M. Schnur, and D. A. Stenger. 2006. Automated identification of multiple micro-organisms from resequencing DNA microarrays. Nucleic Acids Res. 34:5300-5311.
- 30. Mork, C., P. Hershberger, R. Kocan, W. Batts, and J. Winton. 2004. Isolation and characterization of a rhabdovirus from starry flounder (Platichthys stellatus) collected from the northern portion of Puget Sound, Washington, USA. J. Gen. Virol. 85:495-505.
- 31. Paez, J. G., M. Lin, R. Beroukhim, J. C. Lee, X. Zhao, D. J. Richter, S. Gabriel, P. Herman, H. Sasaki, D. Altshuler, C. Li, M. Meyerson, and W. R. Sellers. 2004. Genome coverage and sequence fidelity of phi29 polymerase-based multiple strand displacement whole genome amplification. Nucleic Acids Res. 32:e71.
- 32. Palacios, G., P. L. Quan, O. J. Jabado, S. Conlan, D. L. Hirschberg, Y. Liu, J. Zhai, N. Renwick, J. Hui, H. Hegyi, A. Grolla, J. E. Strong, J. S. Towner, T. W. Geisbert, P. B. Jahrling, C. Buchen-Osmond, H. Ellerbrok, M. P. Sanchez-Seco, Y. Lussier, P. Formenty, M. S. Nichol, H. Feldmann, T. Briese, and W. I. Lipkin. 2007. Panmicrobial oligonucleotide array for diagnosis of infectious diseases. Emerg. Infect. Dis. 13:73-81.
- 33. Poch, O., I. Sauvaget, M. Delarue, and N. Tordo. 1989. Identification of four conserved motifs among the RNA-dependent polymerase encoding elements. EMBO J. 8:3867-3874.
- 34. Quan, P. L., S. Junglen, A. Tashmukhamedova, S. Conlan, S. K. Hutchison, A. Kurth, H. Ellerbrok, M. Egholm, T. Briese, F. H. Leendertz, and W. I. Lipkin. 2010. Moussa virus: a new member of the Rhabdoviridae family isolated from Culex decens mosquitoes in Cote d'Ivoire. Virus Res. 147:17-24.
- 35. Quan, P. L., G. Palacios, O. J. Jabado, S. Conlan, D. L. Hirschberg, F. Pozo, P. J. Jack, D. Cisterna, N. Renwick, J. Hui, A. Drysdale, R. Amos-Ritchie, E. Baumeister, V. Savy, K. M. Lager, J. A. Richt, D. B. Boyle, A. Garcia-Sastre, I. Casas, P. Perez-Brena, T. Briese, and W. I. Lipkin. 2007. Detection of respiratory viruses and subtype identification of influenza A viruses by GreeneChipResp oligonucleotide microarray. J. Clin. Microbiol. 45:2359-2364.
- 36. Taitt, C. R., A. P. Malanoski, B. Lin, D. A. Stenger, F. S. Ligler, A. W. Kusterbeck, G. P. Anderson, S. E. Harmon, L. C. Shriver-Lake, S. K. Pollack, D. M. Lennon, F. Lobo-Menendez, Z. Wang, and J. M. Schnur. 2008. Discrimination between biothreat agents and ‘near neighbor’ species using a resequencing array. FEMS Immunol. Med. Microbiol. 54:356-364.
- 37. Tao, J. J., G. Z. Zhou, J. F. Gui, and Q. Y. Zhang. 2008. Genomic sequence of mandarin fish rhabdovirus with an unusual small non-transcriptional ORF. Virus Res. 132:86-96.
- 38. Tesh, R. B., A. P. Travassos Da Rosa, and J. S. Travassos Da Rosa. 1983. Antigenic relationship among rhabdoviruses infecting terrestrial vertebrates. J. Gen. Virol. 64(Pt. 1):169-176.
- 39. Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
- 40. Vora, G. J., C. E. Meador, D. A. Stenger, and J. D. Andreadis. 2004. Nucleic acid amplification strategies for DNA microarray-based pathogen detection. Appl. Environ. Microbiol. 70:3047-3054.
- 41. Wang, D., L. Coscoy, M. Zylberberg, P. C. Avila, H. A. Boushey, D. Ganem, and J. L. DeRisi. 2002. Microarray-based detection and genotyping of viral pathogens. Proc. Natl. Acad. Sci. U. S. A. 99:15687-15692.
- 42. Wang, D., A. Urisman, Y. T. Liu, M. Springer, T. G. Ksiazek, D. D. Erdman, E. R. Mardis, M. Hickenbotham, V. Magrini, J. Eldred, J. P. Latreille, R. K. Wilson, D. Ganem, and J. L. DeRisi. 2003. Viral discovery and sequence recovery using DNA microarrays. PLoS Biol. 1:E2.
- 43. Wang, Z., L. T. Daum, G. J. Vora, D. Metzgar, E. A. Walter, L. C. Canas, A. P. Malanoski, B. Lin, and D. A. Stenger. 2006. Identifying influenza viruses with resequencing microarrays. Emerg. Infect. Dis. 12:638-646.
- 44. Wang, Z., A. P. Malanoski, B. Lin, C. Kidd, N. C. Long, K. M. Blaney, D. C. Thach, C. Tibbetts, and D. A. Stenger. 2008. Resequencing microarray probe design for typing genetically diverse viruses: human rhinoviruses and enteroviruses. BMC Genomics 9:577.
- 45. Wilson, W. J., C. L. Strout, T. Z. DeSantis, J. L. Stilwell, A. V. Carrano, and G. L. Andersen. 2002. Sequence-specific identification of 18 pathogenic microorganisms using microarray technology. Mol. Cell. Probes 16:119-127.
- 46. Xiong, Y., and T. H. Eickbush. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353-3362.
- 47. Xu, G., P. Weber, Q. Hu, H. Xue, L. Audry, C. Li, J. Wu, and H. Bourhy. 2007. A simple sandwich ELISA (WELYSSA) for the detection of lyssavirus nucleocapsid in rabies suspected specimens using mouse monoclonal antibodies. Biologicals 35:297-302.
- 48. Yoo, S. M., J. Y. Choi, J. K. Yun, J. K. Choi, S. Y. Shin, K. Lee, J. M. Kim, and S. Y. Lee. 2010. DNA microarray-based identification of bacterial and fungal pathogens in bloodstream infections. Mol. Cell. Probes 24:44-52.
- 49. Zhang, Q. Y., J. J. Tao, L. Gui, G. Z. Zhou, H. M. Ruan, Z. Q. Li, and J. F. Gui. 2007. Isolation and characterization of Scophthalmus maximus rhabdovirus. Dis. Aquat. Organ. 74:95-105.