PLOS Pathogens. 2021 Jul 29;17(7):e1009714. doi: 10.1371/journal.ppat.1009714

First historical genome of a crop bacterial pathogen from herbarium specimen: Insights into citrus canker emergence

Paola E Campos 1,2, Clara Groot Crego 1, Karine Boyer 1, Myriam Gaudeul 2,3, Claudia Baider 4, Damien Richard 1, Olivier Pruvost 1, Philippe Roumagnac 5,6, Boris Szurek 5, Nathalie Becker 2,#, Lionel Gagnevin 5,6,#, Adrien Rieux 1,*,#
Editor: David Mackey 7
PMCID: PMC8320980  PMID: 34324594

Abstract

Over the past decade, ancient genomics has been used to study various pathogens. In this context, herbarium specimens provide a precious source of dated and preserved DNA, enabling a better understanding of plant disease emergence and pathogen evolutionary history. We report here the first historical genome of a crop bacterial pathogen, Xanthomonas citri pv. citri (Xci), obtained from an infected herbarium specimen dating back to 1937. Comparing the 1937 genome with a large set of modern genomes, we reconstructed their phylogenetic relationships and estimated evolutionary parameters using Bayesian tip-calibration inferences. The arrival of Xci in the South West Indian Ocean islands was dated to the 19th century, probably linked to human migrations following the abolition of slavery. We also assessed the metagenomic community of the herbarium specimen, showed its authenticity using DNA damage patterns, and investigated its genomic features, including functional SNPs and gene content, with a focus on virulence factors.

Author summary

Herbarium collections are a precious resource for plant pathologists, documenting crop diseases on specimens collected over the past centuries. In addition to indicating the presence of a disease at a specific time and place, recent molecular technologies now allow microbial DNA to be extracted and sequenced from dead specimens. Despite the challenges posed by the degraded nature of DNA retrieved from historical samples, we were able to reconstruct the genome of a pathogenic bacterium from a 1937 herbarium specimen collected in Mauritius: Xanthomonas citri pv. citri, responsible for Asiatic citrus canker (ACC), an economically important agricultural disease controlled mostly through prophylactic and quarantine measures. Better knowledge of the epidemiology and evolution of this bacterial pathogen can help improve these measures. We compared the genome of this 1937 bacterial strain to a collection of modern strains, placed it in a tree representing their genetic relationships, and estimated both the mutation rate and divergence times. This “forensic investigation” tells us how and when the disease developed in the South West Indian Ocean islands. We hypothesize that there was a single introduction of ACC (or a few related ones) in Mauritius in the mid-19th century, followed by expansion to the neighbouring islands.

Introduction

Since the origins of agriculture, humanity has struggled with the incessant, devastating impact of plant diseases on food production [1]. As illustrated by the 19th century potato late blight epidemic caused by the oomycete Phytophthora infestans [2], crop pathogens have caused tremendous losses, resulting in starvation for millions of people and massive migrations. Today, plant pathogens and pests are associated with yield losses of up to 40% in major cultivated crops, with major economic impact [3]. Meanwhile, more than 800 million people remain chronically undernourished worldwide [4]. It is also widely acknowledged that the extensive use of pesticides against crop pathogens harms the environment, affects public health and threatens biodiversity [5].

To manage current infectious crop diseases effectively and prevent future epidemics, a better understanding of the factors underlying pathogen emergence, adaptation and spread is necessary [1,6]. As sequencing technologies have become more accessible, genetic analyses have played an increasingly important role in infectious disease research. Whole genome sequencing of pathogens can confirm suspected cases of an infectious disease, discriminate between strains, classify novel pathogens and reveal virulence mechanisms in a time- and cost-efficient manner [7,8]. Beyond examining individual pathogen sequences, multiple sequences can be combined in phylogenetic methods to assess population structure, elucidate evolutionary and transmission history, and infer demographic, evolutionary or epidemiological parameters [9,10]. Until recently, most studies were performed on contemporary field samples spanning at most four decades. Although such studies provide a good understanding of population structure and recent pathogen emergences, such short sampling intervals allow neither reliable detection of measurable evolutionary change nor reconstruction of deeper evolutionary timelines [11], leaving many questions on crop pathogen emergence unanswered. With the first studies on ancient DNA (hereafter aDNA) obtained from historical or ancient samples such as archaeological tissue remains or museum specimens [12], it became possible to explore the past from a genetic perspective.

The few studies performed worldwide on crop pathogens from herbarium specimens have emphasized the role of historical collections in understanding the evolution and epidemiology of plant pathogens [13–17]. First, observing disease symptoms together with herbarium specimen information (collection date, geographic location, host species or other phenotypic traits) can directly document past disease occurrence, distribution and host range. For instance, Antonovics et al. [18] used infected historical Silene sp. specimens to survey the incidence of anther smut disease and showed a possible change in the host range of this disease in the Eastern USA. Second, recent molecular developments have allowed more efficient retrieval and sequencing of low-quantity, short and degraded nucleic acids from desiccated historical plant tissues [19]. Historical and modern genomes can then be compared to detect changes in genetic content and arrangement over time, such as the loss or gain of functional genes or changes in ploidy level, for both pathogens and their host plants [20]. Moreover, expanding the temporal range between samples increases the chance of detecting evolutionary change, i.e. temporal signal [21]. Relative to their most recent common ancestor, modern genomes are expected to have accumulated more mutations than their historical counterparts. These differences can be used to directly infer mutation rates, divergence times between lineages and sudden changes in genetic diversity [14,22], which can be correlated with historical and socioeconomic events. Such analyses, performed on historical DNA sequences of P. infestans retrieved from 19th century herbarium specimens, resolved the debated origin and identity of the strain that caused the 1840s late blight pandemic [14,17,23–25].
Although reconstructions of crop pathogen history using full genomic sequences have been achieved for historical oomycetes [14,23] and viruses [26–28], no such achievement has yet been reported for a bacterial crop pathogen, for which only a few genetic markers have been exploited so far [29].

In this work, we focus on Xanthomonas citri pv. citri (Xci), the bacterium responsible for Asiatic citrus canker (ACC) [29,30]. ACC causes significant economic losses in most citrus-producing areas worldwide, both by decreasing fruit yield and quality and by limiting exports due to its quarantine status [31]. The earliest records of ACC, dating back to 1812–1844, are herbarium specimens from Indonesia and India [32], suggesting an Asiatic origin of Xci [33–35]. From there, Xci is thought to have spread through multiple dispersal events over time (although direct evidence is lacking), leading to its current broad distribution across continents and islands. In this context, a comparison of Xci multilocus genotypes retrieved from herbarium specimens suggested Japan as the source of the original 1911 ACC outbreak in Florida [29]. Aiming for a refined chronology of these spread events within the South West Indian Ocean (SWIO) area, where Xci diversity is well documented [35–38], we focused on SWIO herbarium specimens.

We report the first genome of a historical bacterial pathogen retrieved from a citrus herbarium specimen collected in 1937 in Mauritius, 20 years after the first report of ACC on this island [39]. We studied the metagenomic composition of this herbarium sample and showed its authenticity as aDNA material by assessing damage patterns. Using tip-calibrated phylogenetic inferences performed with both the 1937 historical strain and a large set of modern genomes, we elucidated the emergence history of Xci in the SWIO islands and further analyzed its genomic characteristics, with a particular focus on virulence factors.

Results

Laboratory procedures & high-throughput sequencing

Herbarium specimen MAU 0015151 Citrus sp. from 1937, Mauritius (hereafter HERB_1937, Fig 1) was sampled from the Mauritius Herbarium collections (https://herbaria.plants.ox.ac.uk/bol/mau) and chosen for this study as the oldest symptomatic herbarium specimen available from the SWIO area. It predates the oldest culture available from this island by ~50 years. DNA was carefully extracted in a bleach-cleaned facility with no prior exposure to modern Xci DNA, using an optimized protocol (see Material & Methods). A specific and exclusive qPCR diagnostic assay [40] showed that the extracted DNA (yield of 0.75 ng per mg of leaf tissue) contained Xci DNA, roughly equivalent to 3 × 10⁵ CFU/cm² (average CT of 32.0, CT cut-off = 35.4, and no CT value for the negative control). Total DNA was then converted into an Illumina library, and sequencing generated 220.9 M paired-end reads with a base call accuracy of 99.90 to 99.96%. After adaptor trimming and quality checking, insert reads were 59 ± 24 nt long and underwent four main analyses: metagenomic inference, ancient DNA authentication, comparative genomics and phylogenetic analyses, as summarized in Fig 2. Importantly, neither Citrus- nor Xci-specific DNA fragments were found in our negative control, ruling out in-lab contamination.
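The reported base call accuracy can be related to Phred quality scores, which encode the probability P of an incorrect base call as Q = −10·log10(P). The short Python sketch below applies this standard conversion (an illustration only, not a step of our pipeline) and shows that 99.90 to 99.96% accuracy corresponds to mean Phred scores of roughly Q30 to Q34.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Probability that a base call with Phred score q is correct."""
    return 1.0 - 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred score matching a base-call accuracy strictly between 0 and 1."""
    return -10 * math.log10(1.0 - accuracy)


# Accuracy range reported for the HERB_1937 sequencing run
for acc in (0.9990, 0.9996):
    print(f"{acc:.2%} accuracy ~ Q{accuracy_to_phred(acc):.1f}")
```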

Fig 1. Citrus sp. specimen MAU 0015151 (HERB_1937), Mauritius Herbarium.

Fig 1

MAU 0015151 Citrus sp. specimen (HERB_1937) was collected from Mauritius in 1937 and deposited in the Mauritius Herbarium. Leaf areas displaying typical symptoms of Asiatic citrus canker are highlighted with blue dotted frames.

Fig 2. Major steps performed to characterize our herbarium sample and integrate it into genomic analyses.

Fig 2

See Material & Methods for more details on the workflow applied to HERB_1937 in this study. Abbreviations: CTAB, cetyl trimethylammonium bromide; DNA, deoxyribonucleic acid; BAM file, binary alignment map file; BLAST, basic local alignment search tool; NCBI nt database, National Center for Biotechnology Information nucleotide database; SNP, single-nucleotide polymorphism.

Metagenomic composition

DNA extracted from leaf lesions is expected to originate from different sources. Using a combination of mapping- and BLAST-based approaches (detailed in the Material & Methods section), we studied the metagenomic composition of the reads obtained from HERB_1937. Most identified reads were assigned to the plant host genus Citrus (21.0%), followed, at the species level, by Homo sapiens (5.4%) and Xci (1.2%) (Fig 3). Other reads were assigned to higher taxonomic levels, corresponding to one bacterial family and eleven genera (from 0.18% (Burkholderia) to 1.57% (Methylobacterium) of aligned reads). Plant, fungal, vertebrate, bacterial and phage genera were marginally represented (less than 0.25% of aligned reads for each genus). Altogether, reads unassigned at the species level added up to 12.4% of the reads. More than half (60.1%) of the reads were not assigned to any known taxon (Fig 3).
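Reads whose BLAST hits span several species can only be assigned at a higher rank, which is why some reads above are reported at the genus or family level. A widely used rule for this (implemented, for example, in MEGAN) assigns each read to the lowest common ancestor (LCA) of its hits. The Python sketch below illustrates that rule on hypothetical lineages and reads; it is a simplified illustration, not necessarily the exact assignment procedure described in Material & Methods.

```python
from collections import Counter

RANKS = ("domain", "family", "genus", "species")


def lowest_common_ancestor(lineages):
    """Deepest (rank, taxon) shared by all hit lineages, or None if none is shared.

    Each lineage is a tuple of taxon names ordered like RANKS.
    """
    lca = None
    for rank, names in zip(RANKS, zip(*lineages)):
        if len(set(names)) != 1:  # hits disagree at this rank: stop descending
            break
        lca = (rank, names[0])
    return lca


def composition(hits_per_read):
    """Percentage of reads per species, plus higher-rank and unassigned classes."""
    counts = Counter()
    for lineages in hits_per_read.values():
        lca = lowest_common_ancestor(lineages) if lineages else None
        if lca is None:
            counts["unassigned"] += 1
        elif lca[0] == "species":
            counts[lca[1]] += 1
        else:
            counts["unassigned at species level"] += 1
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


# Hypothetical lineages and reads, for illustration only
XCI = ("Bacteria", "Xanthomonadaceae", "Xanthomonas", "Xanthomonas citri")
XCA = ("Bacteria", "Xanthomonadaceae", "Xanthomonas", "Xanthomonas campestris")
reads = {"r1": [XCI], "r2": [XCI, XCA], "r3": [], "r4": [XCI, XCI]}
print(composition(reads))
```

Here read r2 matches two Xanthomonas species equally well, so it is only counted at the genus level, while r3 has no hit and stays unassigned.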

Fig 3. Metagenomic composition of HERB_1937 historical specimen.

Fig 3

Proportions of reads assigned to Homo sapiens (5.4%), Citrus sp. (21.0%) and Xci (1.2%). Others (12.4%): reads unassigned at the species level; unassigned reads (60.1%). Table: reads unassigned at the species level were assigned to the family level (Beijerinckiaceae) or the genus level (all others); these bacterial taxa each account for 0.18% to 1.57% of the aligned reads. “Others (<0.25% all reads)” includes reads assigned to various plant, fungal, vertebrate, bacterial and phage genera (each identified genus totalling less than 0.25% of all reads).

Historical genome reconstruction & characterization

A high-quality Xci genome was reconstructed from HERB_1937 (hereafter HERB_1937_Xci) by mapping the reads (discarding the 5 terminal nucleotides) to the Xci IAPAR 306 reference sequences [41]. In total, 0.74% of the reads (N = 1,628,776) mapped to the Xci reference genome, a value unsurprisingly lower than the 1.2% found with the metagenomic pipeline, which combined mapping and BLASTn approaches. The reference chromosome was covered at a depth (the number of mapped reads at a given position) of at least 1X over 94% of its length, with a mean depth of ~6X. Both the pXAC33 and pXAC64 plasmids displayed a higher mean depth and larger non-covered regions (Fig 4 and Table 1). Because non-covered positions can be caused by genes absent from the historical strain compared to the modern reference, but also by reads mapping ambiguously to multiple positions (repeated regions or duplicated genes), we characterized these loci further (see the Gene content & virulence factors section).

Fig 4. Coverage plots for the reconstructed HERB_1937_Xci chromosome and plasmids (pXAC33, pXAC64) sequences.

Fig 4

From inside to outside, a light-to-dark blue scale (delimited by a white line) represents coverage of 1, 1–5, 5–15 and 15–35-fold (Xci chromosome) and of 1, 1–5, 5–35 and 35–90-fold (plasmids). Red rings indicate positions with no coverage (depth = 0). SNP positions between the respective reconstructed and reference sequences are indicated (orange line). Accession numbers for Xci reference strain IAPAR 306: NC_003919.1 (chromosome), NC_003921.3 (plasmid pXAC33) and NC_003922.1 (plasmid pXAC64).

Table 1. Summary of mapping, depth coverage and damage statistics for the reconstructed HERB_1937_Xci genome.

Mapping, depth, coverage and damage statistics (read length, purine enrichment and deamination rate) are indicated for HERB_1937_Xci chromosome and plasmids (pXAC33, pXAC64). nt: nucleotides, SD: standard deviation.

Genome      Endogenous   Mean     Coverage (%)***             Read length (nt)   Purine enrichment   Deamination rate at terminal
            Xci DNA (%)* depth**  0X / 1X / 5X / 10X          mean ± SD          at -1, mean ± SD    position (%), 5’C/T / 3’G/A
Chromosome  0.71         5.9      5.8 / 94.2 / 53.0 / 7.0     42.75 ± 12.64      1.79 ± 0.00         2.25 / 2.35
pXAC33      0.01         21.9     17.4 / 82.6 / 80.2 / 75.1   45.43 ± 14.21      1.76 ± 0.05         2.91 / 2.96
pXAC64      0.02         17.3     11.5 / 88.5 / 82.3 / 63.6   45.20 ± 13.85      1.77 ± 0.01         2.65 / 2.73
Mean        -            -        -                           44.46 ± 13.57      1.77 ± 0.03         2.60 / 2.68

* Reads mapping to Xci reference genome/total reads before duplicate removal, expressed in %.

** Average number of mapped reads at each base of the reference genome.

*** Percentage of reference genome covered at nX depth.
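The depth and breadth statistics in Table 1 follow directly from per-position read depths along each reference sequence, such as those reported by `samtools depth -a` (which also outputs zero-depth positions). The minimal Python sketch below assumes such a depth list is already loaded and applies the definitions given in the footnotes above; the toy depths are hypothetical.

```python
def coverage_summary(depths, thresholds=(1, 5, 10)):
    """Mean depth and % of reference positions at 0X and at >= nX depth."""
    n = len(depths)
    summary = {
        "mean_depth": sum(depths) / n,
        "0X": 100 * sum(d == 0 for d in depths) / n,  # uncovered positions
    }
    for t in thresholds:
        summary[f"{t}X"] = 100 * sum(d >= t for d in depths) / n
    return summary


# Hypothetical per-position depths for a 10-bp reference
toy_depths = [0, 0, 1, 3, 6, 12, 8, 5, 2, 0]
print(coverage_summary(toy_depths))
```

As in Table 1, the 0X and 1X percentages are complementary and sum to 100%.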

Ancient DNA damage assessment

Ancient DNA is typically degraded, showing short fragments, an excess of purine bases before DNA breakpoints and cytosine deamination at fragment ends [12,42]. We searched for such degradation patterns in HERB_1937_Xci using the dedicated tool mapDamage2 [43]. The mean length of HERB_1937_Xci reads was 44.5 ± 13.5 nt, showing substantial fragmentation (Fig 5A). Because DNA fragmentation is partly caused by depurination, we also examined the nucleotide context surrounding 5’ DNA breakpoints. We found a mean relative purine enrichment of 1.77 ± 0.03 between upstream positions -1 and -5 of HERB_1937_Xci reads (Table 1). Modern strains, fragmented by enzymatic digestion prior to library construction, displayed no such enrichment (0.87 ± 0.01). Cytosine deamination was investigated by monitoring 5’C/T substitutions versus complementary 3’G/A substitutions, a classical analysis for double-stranded, blunt-ended libraries. Mean deamination rates of HERB_1937_Xci reads reached their maximum at the terminal nucleotide (2.64 ± 0.29%, Table 1 and Fig 5B). For statistical analyses, we considered the five outermost positions of the reads, which showed a significant stepwise increase towards the read ends along the first five (5’) or last five (3’) positions (Wilcoxon matched-pairs signed rank test, 2-tailed p-value = 0.0313). Reads from three modern Xci controls showed a significantly lower maximal rate of 0.10 ± 0.07% (p<0.0001, unpaired 2-tailed Mann-Whitney test, Fig 5B). The apparently lower deamination rate of HERB_1937_Xci chromosome reads compared to both plasmids (Table 1) was analyzed similarly, along the first or last five positions of the reads. We found a significantly lower deamination rate for HERB_1937_Xci chromosome reads (1.6%) than for reads from both plasmids (1.9%) (Wilcoxon matched-pairs signed rank test, 2-tailed p-value = 0.002).
In contrast, we observed similar fragment lengths for chromosome and plasmid reads (Table 1 and Fig 5).
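The deamination profile plotted in Fig 5B is, at each read position, the fraction of reference cytosines read as thymines (counted from the 5’ end) or of reference guanines read as adenines (counted from the 3’ end). mapDamage2 derives these frequencies from full alignments; the simplified Python sketch below instead assumes reads already aligned without gaps to equal-length reference segments on the same strand, ignores base qualities, and uses hypothetical toy sequences, so it only illustrates the calculation.

```python
def mismatch_profile(pairs, ref_base, read_base, n_positions=5, from_3prime=False):
    """Per-position frequency of ref_base -> read_base mismatches at read ends.

    pairs: (reference_segment, read) tuples aligned without gaps, same strand.
    Position 1 is the first 5' base, or the last base if from_3prime is True.
    """
    totals = [0] * n_positions
    hits = [0] * n_positions
    for ref, read in pairs:
        if from_3prime:  # reverse both strings so index 0 is the 3' terminal base
            ref, read = ref[::-1], read[::-1]
        for i in range(min(n_positions, len(ref))):
            if ref[i] == ref_base:
                totals[i] += 1
                hits[i] += read[i] == read_base
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]


# Hypothetical (reference, read) pairs
pairs = [
    ("CAGTCAAG", "TAGTCAAA"),  # C->T at the 5' end, G->A at the 3' end
    ("CCATGGTG", "CCATGGTG"),  # undamaged
    ("ACGTACGG", "ACGTACGA"),  # G->A at the 3' end
]
c_to_t = mismatch_profile(pairs, "C", "T")
g_to_a = mismatch_profile(pairs, "G", "A", from_3prime=True)
print(c_to_t[0], g_to_a[0])
```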

Fig 5. HERB_1937_Xci post-mortem DNA damage patterns.

Fig 5

Post-mortem DNA damage patterns were measured on historical HERB_1937_Xci (solid, dotted or dashed blue lines for the chromosome, pXAC33 and pXAC64, respectively) and compared with three modern Xci strains isolated from the SWIO in 2012, 2013 and 2015, respectively (red lines; see Results and S1 Table for a full description). (A) Fragment length distribution (nucleotides; relative frequency in arbitrary units). (B) Deamination percentages of the first 25 nucleotides from the 5’ (C to T substitutions) and 3’ (G to A substitutions) ends, respectively. Dots: the five outermost nucleotides of the reads, showing a significant stepwise increase towards the read ends along the first or last five positions of all HERB_1937_Xci reads. Along these five outermost nucleotides, HERB_1937_Xci reads showed significantly higher values than modern controls, and reads matching the HERB_1937_Xci chromosome showed significantly lower deamination rates than reads matching either plasmid (see Results for statistics).

We performed the same analyses using the Methylobacterium reference genome (M. organophilum, strain DSM 760), since this genus was the most abundant taxon among reads unassigned at the species level (1.57% of aligned reads, Fig 3). Mean read length was estimated at 66 ± 25 nt, relative purine enrichment reached 1.80 ± 0.52 (as above, between positions -1 and -5), and deamination rates at the terminal nucleotides averaged 2.18 ± 0.07%.
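The purine enrichment values reported here and in Table 1 compare purine frequencies in the reference just upstream of read 5’ ends. The Python sketch below assumes, as a simplification of the per-position base compositions that mapDamage2 reports, that enrichment is the purine frequency at position -1 divided by that at position -5, and that the five reference bases preceding each read start have already been extracted; the contexts shown are hypothetical.

```python
PURINES = frozenset("AG")


def purine_frequency(contexts, offset):
    """Fraction of contexts with a purine at offset (-1 = base just before the read)."""
    return sum(ctx[offset] in PURINES for ctx in contexts) / len(contexts)


def purine_enrichment(contexts):
    """Purine frequency at position -1 relative to position -5."""
    baseline = purine_frequency(contexts, -5)
    if baseline == 0:
        raise ValueError("no purines at position -5; ratio undefined")
    return purine_frequency(contexts, -1) / baseline


# Each hypothetical context lists reference positions -5..-1 (last char = position -1)
contexts = ["TCAGA", "CTTCG", "ATCGA", "GCTTA"]
print(purine_enrichment(contexts))
```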

Gene content & virulence factors

Of the 5,125 coding sequences (CDS) of strain IAPAR 306, only 139 were covered over less than 75% of their length by HERB_1937_Xci reads; these are hereafter designated as non-covered (S2 Table). Ninety-five of these CDS are present in multiple, highly similar copies in IAPAR 306, leading to ambiguous mapping and poor coverage, and can be considered false absences. Among these 95 CDS, 77 are predicted to encode full-length transposases or transposase fragments. Four correspond to highly similar copies of Transcription Activator-Like Effector (TALE) genes (see the specific paragraph below). The remaining multicopy genes encode the elongation factor Tu, a xylose isomerase, a filamentous haemagglutinin and seven hypothetical proteins. Forty-four IAPAR 306 CDS were non-covered because no homologous reads were present in our dataset. Most of them encode hypothetical proteins of unknown function, except for a type I restriction-modification system (including DNA methylase, endonuclease and specificity determinant) and six recombinases or integrases. Notably, 28 consecutive non-covered CDS correspond to a 27-kb block present only in IAPAR 306 and a few of its close relatives, which contains six transposases, three recombinases and 19 proteins of unknown function.
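The non-covered criterion above (less than 75% of a CDS covered by reads) can be applied to the same kind of per-position depth list as used for Table 1. The Python sketch below assumes 0-based, end-exclusive CDS coordinates and counts a position as covered at a depth of at least 1; gene names, coordinates and depths are hypothetical.

```python
def covered_fraction(depths, start, end, min_depth=1):
    """Fraction of positions in [start, end) with depth >= min_depth."""
    region = depths[start:end]
    return sum(d >= min_depth for d in region) / len(region)


def non_covered_cds(depths, cds, threshold=0.75):
    """Names of CDS covered over less than `threshold` of their length."""
    return [name for name, (start, end) in cds.items()
            if covered_fraction(depths, start, end) < threshold]


toy_depths = [3, 4, 0, 0, 0, 5, 6, 7, 2, 1, 0, 0]
toy_cds = {"geneA": (0, 6), "geneB": (5, 11)}  # hypothetical coordinates
print(non_covered_cds(toy_depths, toy_cds))
```

Here geneA is covered over only half of its length and is flagged, whereas geneB (5 of 6 positions covered) passes the 75% threshold.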

We verified the presence or absence of specific genes whose products are known or hypothesized to be involved in Xanthomonas pathogenicity. In particular, the type III secretion system (T3SS) is a syringe-like apparatus that injects “effectors” directly into the plant cell to suppress plant defences and contribute to symptom development [44]. We therefore investigated the presence in HERB_1937_Xci of a group of 82 genes found in Xci or other Xanthomonas species [45], encoding either components of the T3SS or type III effectors (T3E, which may participate in pathogenicity) [46]. HERB_1937_Xci reads covered 57 of these CDS over more than 94% of their sequence (S3 Table), including the entire set of genes necessary for the T3SS (24 CDS) and 33 potential T3E genes. Coverage of the remaining 25 CDS, all virulence factors from other Xanthomonas species that are absent from Xci [46], reached at most 45.1% of their length (S3 Table), indicating that the corresponding genes are absent.

On the plasmids, most of the non-covered positions of HERB_1937_Xci were located in four regions coding for TALE proteins (Fig 4 and S2 Table). These particular T3Es are responsible for the development of canker symptoms on citrus [47]; as our specimen displayed such symptoms, we expected to find homologs in HERB_1937_Xci. Xci injects these TALEs into the plant cell, activating the host’s transcriptional machinery to its own benefit [48,49]. The tale genes encode transcription activator-like proteins containing an N-terminal domain responsible for translocation from the bacterium to the plant cell and a C-terminal domain containing nuclear localization signals and a eukaryotic transcription activation domain; these two domains flank tandem repeats of 33–34 conserved amino acids (S1 Fig). The repeats are highly homologous except for two amino-acid residues (called Repeat Variable Di-residues, RVDs) responsible for DNA-binding specificity. We hypothesized that reads corresponding to tale genes were initially not mapped because of their particular structure and multicopy nature. We therefore performed specific alignments, using the sequences coding for the N-terminal domain, the C-terminal domain and the repeat domain (reduced to a three-repeat string) of the tale gene pthA4 as three independent references, to test for the presence of tale sequences in HERB_1937_Xci. Almost 6,000 additional reads corresponding to tale genes were recovered (S1 Fig), with a mean depth of 44X for both 5’ and 3’ ends, about twice the mean depth of plasmid sequences outside tale gene positions (~23X). Moreover, two loci in the 5’ end sequence were biallelic, presenting either a T or a C base with T/C ratios of 43/57 and 35/65, respectively. One of these loci translated into a conservative amino-acid substitution that is found in other TALEs in protein databases.
Taken together, these results suggest the existence of two to four different 5’ end sequences of tale genes, and therefore as many tale genes, in the HERB_1937_Xci historical genome. Finally, among the remaining reads corresponding to the central repeat domain, we identified eight nucleotide patterns coding for RVDs (S4 Table). Interestingly, although the most prevalent RVDs are found in modern Xci tale genes in similar proportions [50], three RVDs have not been reported in modern TALEs.
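The RVDs occupy residues 12 and 13 of each 33–34-amino-acid TALE repeat. Assuming repeats have already been translated and delimited (our analysis worked instead on nucleotide patterns in short reads, S4 Table), extracting and tallying RVDs reduces to the Python sketch below. The repeat backbone is an illustrative, commonly cited 34-residue TALE repeat, and the RVD list is hypothetical.

```python
from collections import Counter


def rvd(repeat: str) -> str:
    """Repeat Variable Di-residue: residues 12 and 13 (1-based) of a TALE repeat."""
    return repeat[11:13]


def rvd_counts(repeats):
    """Tally of RVDs across a list of translated repeats."""
    return Counter(rvd(r) for r in repeats)


# Illustrative 34-residue backbone; {} marks the RVD positions 12-13
BACKBONE = "LTPEQVVAIAS{}GGKQALETVQRLLPVLCQAHG"
repeats = [BACKBONE.format(d) for d in ("NG", "HD", "NI", "HD", "NN")]
print(rvd_counts(repeats).most_common())
```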

SNPs, phylogenetic reconstruction and tip-dating at the SWIO scale

We located the SNPs between HERB_1937_Xci and the IAPAR 306 reference genome (Fig 4). After filtering out dubious SNPs (i.e. those with low depth, heteroplasmy and/or proximity to another SNP), HERB_1937_Xci displayed 83 high-quality SNPs on its chromosome sequence, and one and four in the sequences corresponding to pXAC33 and pXAC64, respectively. Forty-three SNPs were non-synonymous substitutions on the chromosome reference sequence, one on pXAC33 and three on pXAC64. The SNPs found between HERB_1937_Xci and IAPAR 306 were not characterized further; a more meaningful analysis was performed on SNPs identified between HERB_1937_Xci and its related SWIO clade.
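The three filters named above (low depth, heteroplasmy, proximity to another SNP) can be expressed as simple rules on per-site summaries. In the Python sketch below, all thresholds (minimum depth 3, alternative-allele frequency of at least 0.9, minimum spacing of 10 bp) are hypothetical placeholders rather than the values used in this study (see Material & Methods), and the proximity rule is applied only among SNPs that passed the first two filters, which is one possible design choice.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    pos: int        # position on the reference
    depth: int      # reads covering the site
    alt_count: int  # reads supporting the alternative base


def filter_snps(candidates, min_depth=3, min_alt_freq=0.9, min_distance=10):
    """Return positions of SNPs passing depth, allele-frequency and spacing filters.

    All thresholds are hypothetical placeholders; min_depth must be >= 1.
    """
    # 1) Low depth or mixed base calls (heteroplasmy) -> discard
    kept = sorted(
        (c for c in candidates
         if c.depth >= min_depth and c.alt_count / c.depth >= min_alt_freq),
        key=lambda c: c.pos,
    )
    # 2) Discard SNPs lying closer than min_distance to another retained SNP
    result = []
    for i, c in enumerate(kept):
        near_prev = i > 0 and c.pos - kept[i - 1].pos < min_distance
        near_next = i + 1 < len(kept) and kept[i + 1].pos - c.pos < min_distance
        if not (near_prev or near_next):
            result.append(c.pos)
    return result


candidates = [
    Candidate(100, 8, 8), Candidate(105, 7, 7),  # clustered pair
    Candidate(500, 2, 2),                        # too shallow
    Candidate(900, 10, 6),                       # mixed calls
    Candidate(1500, 6, 6),
]
print(filter_snps(candidates))
```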

A total of 2,634 high-confidence SNPs were found in the alignment of the HERB_1937_Xci historical chromosome with 116 modern samples from the SWIO islands. The ClonalFrameML [51] analysis identified a single 5.9-kb recombinant region including two SNPs, which were removed from further inferences. For the 2,632 recombination-free SNPs, we estimated a ratio of non-synonymous (dN) to synonymous (dS) changes of 4.16. We identified 15 SNPs unique to HERB_1937_Xci, all located on the chromosome, 14 of which were in coding sequences. Analysis of these SNPs identified three synonymous and 11 non-synonymous mutations, which were characterized at the protein level. Interestingly, seven of these result in amino acids that are unique at these positions (among similar but non-redundant proteins of the Xanthomonas genus identified by BLASTp), thus revealing previously unknown protein features (S5 Table). We added an outgroup and built a Maximum-Likelihood (ML) phylogeny with RAxML [52], which placed the HERB_1937_Xci historical sequence outside the “modern” SWIO clade (S2 Fig). The ML tree was well supported and structured in three lineages: a Mauritius lineage (lineage A), sister group of the rest of the modern SWIO strains, which comprise two lineages, one with strains from Mauritius and Reunion (lineage B) and the other with strains from all SWIO islands (lineage C) (S2 Fig).
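Classifying a SNP located in a coding sequence as synonymous or non-synonymous amounts to translating the reference and mutated codons and comparing the resulting amino acids. The Python sketch below assumes the CDS is given 5’→3’ on its coding strand (genes on the minus strand must be reverse-complemented first) and uses the standard genetic code, whose sense-codon assignments are identical to those of the bacterial translation table 11; the toy CDS is hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codon -> amino acid ('*' = stop), first base varying slowest
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snp(cds: str, pos: int, alt: str):
    """Classify a substitution at 0-based position `pos` of a CDS (coding strand)."""
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, f"{ref_aa}{pos // 3 + 1}{alt_aa}"


toy_cds = "ATGCTGAAATAA"  # hypothetical: Met-Leu-Lys-stop
print(classify_snp(toy_cds, 5, "A"))  # third position of CTG
print(classify_snp(toy_cds, 6, "G"))  # first position of AAA
```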

As a prerequisite for tip-based calibration, we tested for temporal signal in our tree with both a linear regression of root-to-tip distance on sample age and a date-randomization test [22]. Both tests revealed temporal signal (i.e. progressive accumulation of substitutions over time) within the SWIO tree. The linear regression displayed a significant positive slope (19.236 × 10⁻⁵, adjusted R² = 0.270, p-value = 2.07 × 10⁻¹⁰), with HERB_1937_Xci showing clear evidence of branch shortening (Fig 6B). In the date-randomization test, the 95% Highest Posterior Density (HPD) intervals of the root age inferred from the real and date-randomized datasets did not overlap (S3 Fig). We therefore built a time-calibrated tree with BEAST [53], which was globally congruent (similar topology and node supports) with the ML tree (Fig 6A). Phylogenetic diversity of Mauritius strains (1530.4) was significantly higher (p-value = 2.2 × 10⁻¹⁶) than that of strains from the other islands (Reunion = 1518.7, Rodrigues = 691.1, Comoros = 1024.0 and Mayotte = 319.0). We inferred a root date of 1843 [95% HPD: 1803–1881] and a mean substitution rate of 9.4 × 10⁻⁸ [95% HPD: 7.3 × 10⁻⁸–11.4 × 10⁻⁸] substitutions per site per year, with a standard deviation of 0.271 [95% HPD: 0.182–0.366] for the uncorrelated log-normal relaxed clock, suggesting low rate heterogeneity among branches (Figs 6B and S4). To specifically evaluate the contribution of HERB_1937_Xci, we also analyzed modern strains only: although this dataset still displayed temporal signal (slope = 9.885 × 10⁻⁵, adjusted R² = 0.077, p-value = 0.0009) (Fig 6B), the BEAST analysis performed under the same parameters yielded significantly different values.
An older root date of 1800 [95% HPD: 1745–1852] was inferred, together with a lower mean substitution rate of 8.2 × 10⁻⁸ substitutions per site per year [95% HPD: 6.4 × 10⁻⁸–9.9 × 10⁻⁸] and a relaxed-clock standard deviation of 0.188 [95% HPD: 0.082–0.289] among branches (S4B Fig). In summary, comparing root-age estimates with and without HERB_1937_Xci indicates that integrating the historical sequence significantly improves the precision of the temporal inferences, reducing the width of the 95% HPD from ~107 years to ~78 years (Fig 6C).
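Under a clock-like model, root-to-tip distances increase linearly with sampling date: the regression slope estimates the substitution rate, its x-intercept estimates the date of the root, and a significantly positive slope indicates temporal signal. The Python sketch below implements this ordinary least-squares regression (the exploratory quantity examined by tools such as TempEst); it illustrates only the temporal-signal check, not the Bayesian BEAST inference that produced the dates and rates above, and its toy data are hypothetical and perfectly clock-like.

```python
def root_to_tip_regression(dates, distances):
    """OLS fit of root-to-tip distance on sampling date.

    Returns (slope, root_date, r_squared); root_date is the x-intercept.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx                     # rate, in distance units per year
    intercept = mean_y - slope * mean_x
    root_date = -intercept / slope        # date at which distance reaches zero
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, root_date, r_squared


# Hypothetical tips: one 1937 historical sample plus modern ones, strictly clock-like
dates = [1937, 1978, 1995, 2005, 2015]
distances = [1e-4 * (d - 1850) for d in dates]
print(root_to_tip_regression(dates, distances))
```

With real data, tips fall around the line rather than on it; an old tip such as a 1937 genome extends the range of sampling dates and so constrains both the slope and the x-intercept.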

Fig 6. Tip-dating Bayesian inferences on historical and modern Xci genomes from the SWIO islands.

Fig 6

(A) Dated BEAST tree of 116 modern Xci strains sampled from the SWIO islands between 1978 and 2015, together with historical HERB_1937_Xci (highlighted in green), built from 2,632 non-recombining SNPs. Node support values are displayed as diamonds, white for posterior probabilities below 0.9 and black for values above 0.9; node bars cover the 95% Highest Posterior Density of node height. The tree is structured in three lineages (A, B & C). Branches are collapsed and colored according to the sample’s geographic origin, except lineage A, which is cartooned to help visualization. Tip labels include the geographic origin and, for collapsed or cartooned branches, the number of samples. Map layer is from Natural Earth, available from https://www.naturalearthdata.com. (B) Linear regression of root-to-tip distance on sampling year (tip date), testing for temporal signal. Regression lines are plotted in blue when the historical HERB_1937_Xci genome is included and in red (dotted lines) when it is not. Grey areas correspond to their confidence intervals. Associated values are the regression equation, adjusted R² (Adj R²) and p-value. (C) Boxplot distribution of root age, with (left) and without (right) historical HERB_1937_Xci in the dating inference, and associated statistical comparisons. Boxes represent the 25th to 75th percentiles, minimum-maximum intervals are displayed as vertical bars and outliers as circles.
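The 95% HPD intervals used throughout (node bars in panel A and the root-age comparison in panel C) are the narrowest intervals containing 95% of the posterior samples. The Python sketch below computes such an interval by sorting MCMC samples (for example, root ages read from a BEAST log after burn-in removal) and scanning every window of the required size; this simple approach suits unimodal posteriors, and the sample values are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the posterior samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = math.ceil(mass * n)  # number of samples the interval must contain
    # Width of every window of k consecutive sorted samples
    widths = [ordered[i + k - 1] - ordered[i] for i in range(n - k + 1)]
    best = min(range(len(widths)), key=widths.__getitem__)
    return ordered[best], ordered[best + k - 1]


# Hypothetical root-age samples with outlying values on both sides
root_ages = [1800, 1840, 1841, 1842, 1843, 1844, 1845, 1846, 1847, 1900]
low, high = hpd_interval(root_ages, mass=0.8)
print(low, high, high - low)
```

The width of the returned interval (here `high - low`) is the quantity compared between the two analyses, with and without the historical genome.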

Discussion

We sequenced the genome of HERB_1937_Xci, a historical strain of the crop bacterial pathogen Xanthomonas citri pv. citri (Xci), from an infected herbarium specimen sampled in 1937 in Mauritius. To our knowledge, HERB_1937_Xci is the first historical genome of a pathogenic bacterium obtained from herbarium material. Similar achievements have previously been reported for viruses [28,54], for oomycetes such as Phytophthora infestans [14,23,24] and, more recently, for cyanobacteria [55]. For plant pathogenic bacteria in general, and for Xci in particular, only multilocus genotyping data had previously been obtained from such historical material [29].

Adopting a shotgun-based deep sequencing strategy allowed us to describe the metagenomic diversity contained within our historical herbarium specimen. HERB_1937 reads comprised 1.2% Xci DNA and 21.0% Citrus sp. DNA, a pathogen/plant ratio within the range previously observed for P. infestans, a nonvascular pathogen, in infected herbarium potato leaves [20,23,24]. The microbial community also contained several bacterial genera, all described in NGS studies as part of the citrus leaf [56] or root [56,57] microbiota. The three most prominent genera (Methylobacterium, Curtobacterium and Sphingomonas, >0.5% of aligned reads) belong to the core citrus leaf microbiome [58], and the relative abundance of Methylobacterium reads among bacteria (11.7%) is consistent with studies on modern samples (5 to 58% [56]). These bacterial genera were thus likely associated with the living citrus plant and/or colonized the HERB_1937 sample during collection and storage in the herbarium. Using deamination studies, Bieker et al. [59] identified a fungal species proposed to have colonized herbarium specimens shortly after collection. The typical aDNA patterns we observed for Methylobacterium spp. allow us to exclude recent or laboratory contamination [60]. Interestingly, up to 5.4% of the reads were assigned to human DNA, resulting from contamination during specimen handling (collection, mounting or storage). Finally, a substantial proportion (60.1%) of HERB_1937 reads were unassigned, reflecting either the incompleteness of the reference database relative to the microbial diversity of the sample [61] or the difficulty of taxonomically assigning short reads, a typical result in ancient DNA research [62].

Characterization of DNA degradation patterns specific to aDNA (fragmentation, depurination and deamination), combined with clear evidence of branch shortening, confirmed the historical nature of our reconstructed Xci genome, a key point in any ancient DNA study [12,41]. Patterns of DNA degradation of HERB_1937_Xci were consistent with those measured on P. infestans from 19th-century herbarium samples [23,24,61]. Notably, we observed significantly higher deamination rates of cytosine residues in reads mapping to either of the two plasmids than in reads mapping to chromosomal DNA, whereas depurination rates and fragment sizes showed no such significant differences. A possible explanation lies in differential methylation of cytosines. A recent study of epigenetic modifications in Xanthomonas species identified N4-methylcytosines (N4meC, a bacteria-specific modification) in higher proportions in chromosomes than in plasmids [63], and N4meC has previously been found to be more resistant to deamination than unmethylated cytosine [64]. The lower deamination rate observed on the HERB_1937_Xci chromosome (compared with the plasmids) could thus reflect better protection of chromosomal cytosines from deamination, independently of depurination and fragmentation mechanisms. Further investigations, such as ancient methylome mapping [65], should refine our molecular understanding of the degradation patterns observed in this study.

Phylogenetic reconstruction confidently placed HERB_1937_Xci at the root of the modern SWIO lineages, a position that reveals its genetic relatedness to the SWIO Xci founding population. Modern Mauritian strains were found in all three main SWIO lineages and displayed the highest phylogenetic diversity, a typical pattern for source populations during biological invasions [66], which points to Mauritius as the most likely entry point of ACC into the SWIO islands. Future studies including new historical genomes from other islands will be necessary to confirm this hypothesis. We dated the ancestor of all strains (i.e. the root), a proxy for the Xci emergence date, to 1843 [95% HPD: 1803–1881]. This predates the earliest record of the disease in the area (1917 in Mauritius [39]) and refines the recent estimate of 1818 [95% HPD: 1762–1868] obtained from modern strains only [36]. Xci and its main host genus, Citrus, originated in Asia [34,35] and were most likely disseminated out of their area of origin through human-mediated movements of plants or plant propagative material [31,67]. Richard et al. proposed two possible origins of the pathogen in the SWIO [36]. First, they hypothesized that Pierre Poivre (1719–1786), a French botanist and colonial administrator, could have introduced infected citrus plants from several Asian countries during his numerous travels starting in the mid-18th century [68]. Alternatively, tens of thousands of indentured labourers arrived from several Asian countries (most numerously from India) after the abolition of slavery in Mauritius (1835) and Reunion (1848), mainly to work in agriculture [69]. This flow of people from the Asian continent, along with their possessions, which included seeds, plants and fruits [70], may have led to the introduction of Xci into the SWIO area. The updated time frame of emergence inferred from our data favours the second scenario: its 95% HPD (1803–1881) begins after Poivre's death in 1786 and encompasses the abolition of slavery in both islands.
Future work including strains from the hypothetical Asian cradle of Xci, some possibly obtained from herbarium specimens, will be required to investigate the geographic origin of the strains that first invaded the SWIO.

Although both the root position of HERB_1937_Xci and the monophyly of all SWIO strains suggest one or a few successful historical introductions of genetically (and likely geographically) closely related Xci strains into this area, the structure of the phylogeny indicates multiple inter-island migration events, likely via exchanges of infected plant material. Such events may first have occurred between Mauritius and Reunion at the very beginning of the history of Xci in the area, as illustrated by the depth of the most recent common ancestor (MRCA) shared between strains from these two islands, which had tight historical and political links at the time. More recent migrations likely occurred between i) Mauritius and Rodrigues (an island ca. 600 km east of Mauritius and part of its territory), ii) Reunion and the Comoros archipelago (Mayotte, 1,435 km from Reunion and part of the French overseas territories) and iii) the four islands of the Comoros archipelago (promoted by their historical and economic relationships, see insert Fig 6A). Altogether, our findings emphasise the well-known influence of human-associated migratory events in shaping the global distribution and the emergence of preadapted crop pathogens [6,24,71,72]. Additionally, our results indicate that integrating historical genomes in phylogenetic analyses significantly refines divergence time estimates, as highlighted in previous ancient DNA studies [73,74].

Tip-date calibration of the SWIO phylogenetic tree also enabled us to estimate a mean substitution rate of 9.4 × 10−8 substitutions per site per year for Xci. This value is consistent with the recent estimate of 8.4 × 10−8 substitutions per site per year obtained by Richard et al. in the same area [36] and falls within the range of estimates made over a similar time span (80 years) for several human-associated bacterial pathogens, which spans one order of magnitude (10−8–10−7) [75]. This rate, among the first published for a crop pathogen, is averaged across all sites of the non-recombining portions of the Xci chromosome and appears to be homogeneous across the various SWIO lineages. Interestingly, we observed a relatively high dN/dS ratio compared with other bacterial species [76], which might result from selection for diversification following Xci emergence and evolution within the SWIO islands. Our substitution rate estimate should be valuable for further studies, as it can improve predictions of Xci evolution within various modelling frameworks.
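To make the magnitude of this rate concrete, the short Python sketch below converts it into an expected number of substitutions per genome. It assumes a strict clock and a hypothetical sequence length of 5.2 Mb, a round figure close to the size of the Xci chromosome. Because the estimate applies only to non-recombining sites and the BEAST analysis used a relaxed clock, the output is an order-of-magnitude illustration, not a result of this study.

```python
# Back-of-the-envelope conversion of a per-site substitution rate.
# ASSUMPTIONS: strict clock; SEQ_LENGTH is a hypothetical round figure,
# whereas the estimated rate applies only to the non-recombining sites.
RATE = 9.4e-8           # substitutions per site per year (estimate from the text)
SEQ_LENGTH = 5_200_000  # sites (hypothetical)


def expected_substitutions(rate: float, length: int, years: float) -> float:
    """Expected substitutions accumulated along one lineage under a strict clock."""
    return rate * length * years


per_year = expected_substitutions(RATE, SEQ_LENGTH, 1)
since_1937 = expected_substitutions(RATE, SEQ_LENGTH, 2015 - 1937)
print(f"{per_year:.2f} substitutions per genome per year")
print(f"{since_1937:.1f} substitutions along one lineage between 1937 and 2015")
```

Under these assumptions, roughly one substitution accumulates every two years along a single lineage. The expected number of differences between two contemporary strains would be about twice the per-lineage value accumulated since their common ancestor.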

Finally, we compared the genomic features of HERB_1937_Xci with those of its modern counterparts. Among the 15 SNPs unique to HERB_1937_Xci and restricted to chromosome sequences, five non-synonymous SNPs are predicted to induce conservative amino acid changes and are thus not expected to alter the conformation or the active site of their respective proteins. Among the six non-conservative SNPs, one causes an amino acid substitution in the essential metabolic enzyme isocitrate dehydrogenase, and its location (next to the hinge and binding domain) could modulate the enzyme's adaptability, and thus the fitness of the pathogen [77]. Moreover, seven non-synonymous SNPs produce amino acids that are unique at their positions in Xanthomonas sequence alignments, providing an exclusive signature for seven HERB_1937_Xci-specific protein homologs.

Our investigation of HERB_1937_Xci gene content showed that it was broadly similar to that of reference strain IAPAR 306. The non-covered CDS corresponded either to repeated CDS or to absent CDS. The former are mostly transposases or other multicopy genes. The truly absent CDS mostly encode proteins of unknown function or recombinases, and notably include a type I restriction-modification system together with four adjacent CDS. This 27-kb block probably corresponds to a genomic island recently acquired by strain IAPAR 306 (and a few of its close relatives) but absent in other Xci. It is inserted in the middle of a CDS encoding a putative competence protein [78]. However, as our gene content analysis relied on sequences reconstructed by mapping, we were unable to identify potential genome rearrangements. Furthermore, any genetic material present in HERB_1937_Xci but absent from the reference sequences used to reconstruct the historical genome would have been missed. Gene content investigation based on de novo assembly of historical reads could overcome this limitation, but the short length of aDNA reads, their mixed origin and the relatively low coverage of HERB_1937_Xci prevented us from applying such a strategy [79,80].

We showed that HERB_1937_Xci contains a complete set of genes for its type III secretion system, as well as the same assortment of effector genes as modern Xci strains [46]. In particular, Transcription Activator-Like Effectors (TALE) are crucial virulence factors for Xci [50]. We determined that HERB_1937_Xci possesses between two and four paralogs of the functional tale gene pthA4 present in strain IAPAR 306, a number consistent with modern Xci strains [81]. Although it was not possible to localize them in the genome or to reconstitute their central repeat domain, sequences corresponding to their N- and C-terminal domains, as well as a repertoire of RVD sequences, were reconstructed, suggesting that they are functional. Most of the essential RVD sequences were present in approximately the same proportions as in modern Xci tale genes. Three unique sequences encoding unreported RVDs could be mutational variants of known RVD sequences: AAA from AAT (K* from N*), and CACGAA and CAGGAT from CACGAT (HE and QD, respectively, from HD). A recent design of TALENs with artificial RVDs showed that HE and QD were functional and, like HD, preferentially bound to C on target DNA in vivo [82]. This suggests that, apart from undetectable gene losses (for effectors not identified in databases) and modifications in the structure of the repeat region of tale genes (which might have an important impact on virulence), the effector repertoire of Xci has been stable at least since the time of the last common ancestor of all SWIO strains.
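The codon-to-RVD reading given above can be checked mechanically. The Python sketch below translates the nucleotides encoding an RVD with the standard genetic code (NCBI translation table 1). It assumes that a 3-nt sequence denotes a repeat lacking residue 13, written '*' in RVD notation (not a stop codon). The function name rvd_from_nucleotides is hypothetical and not part of any published pipeline.

```python
# Standard genetic code (NCBI table 1), built from the canonical TCAG ordering:
# index = 16 * (first base) + 4 * (second base) + (third base).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rvd_from_nucleotides(seq: str) -> str:
    """Translate the nucleotides encoding a TALE repeat-variable diresidue (RVD).

    Six nucleotides encode residues 12 and 13 of a repeat. Three nucleotides are
    taken to mean that residue 13 is missing, written '*' in RVD notation
    (e.g. 'N*'); here '*' is NOT a stop codon.
    """
    seq = seq.upper()
    if len(seq) == 3:
        return CODON_TABLE[seq] + "*"
    if len(seq) == 6:
        return CODON_TABLE[seq[:3]] + CODON_TABLE[seq[3:]]
    raise ValueError(f"expected 3 or 6 nucleotides, got {len(seq)}")


for variant, known in [("AAA", "AAT"), ("CACGAA", "CACGAT"), ("CAGGAT", "CACGAT")]:
    print(f"{variant} -> {rvd_from_nucleotides(variant)} "
          f"(vs {known} -> {rvd_from_nucleotides(known)})")
```

Each variant differs from its known counterpart by a single nucleotide substitution, which is why they can plausibly be read as mutational variants.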

In summary, our results show that herbarium specimens can provide a wealth of genomic information on bacterial pathogens, their associated microbial community and their plant host (an aspect that we did not explore in this study). The present work focused on a single herbarium specimen in order to evaluate the feasibility of genetic analyses and the added value such samples bring to phylogenetic and epidemiological approaches. Broader studies reconstructing the worldwide propagation and evolutionary history of Asiatic citrus canker would require additional, well-chosen, geographically and temporally representative samples. More generally, similar investigations are being, or could be, performed on other important bacterial plant pathogens to elucidate their evolutionary history, further investigate plant-pathogen interactions and study the temporal dynamics of plant-associated microbial communities. Such studies emphasise the value of biological collections and will hopefully help to decipher the epidemiological and evolutionary factors leading to the emergence of plant pathogens. This, in turn, may provide clues to improve disease monitoring and achieve sustainable control.

Material & methods

Herbarium sampling

The collections of the Mauritius Herbarium (https://herbaria.plants.ox.ac.uk/bol/mau) were surveyed in June 2017. Several citrus specimens displaying typical citrus canker lesions were sampled on site using gloves and sterile equipment and brought back to the CIRAD laboratory in individual envelopes, where they were stored in vacuum-sealed boxes at 17°C until use. MAU 0015151 (Fig 1), a Citrus sp. specimen collected by Reginald E. Vaughan at Phoenix, Mauritius, in 1937, was chosen as the oldest specimen sampled from the SWIO area. The date and exact place of collection, which do not appear on the specimen itself, were found in the collector's original collection book. MAU 0015151 was deposited in 1937 in the collection of the Mauritius Institute (Port Louis, Mauritius). This collection was moved in 1960 to Réduit to form the core of The Mauritius Herbarium (MAU, acronym according to Thiers 2021), where it has since been preserved under controlled temperature and humidity and regularly poisoned (e.g. fumigation and/or use of Kew mixture, a solution of mercury, phenol and ethanol).

DNA extraction, quality control and real-time quantitative PCR assay

DNA extraction from the HERB_1937 sample was performed in a bleach-cleaned facility room with no prior exposure to modern Xci DNA, following a custom CTAB protocol modified from Ausubel (2003) [83]. Briefly, five canker lesions (approximately 10 mg in total) were cut from a single leaf of HERB_1937 and pooled. A 10 mg piece of a 1965 Coffea arabica herbarium specimen, a plant species that is not a host of Xci, was included as an aDNA negative control sample. Both samples were pulverised at room temperature and soaked in a CTAB extraction solution (1% CTAB, 700 mM NaCl, 0.1 mg/mL Proteinase K, 0.05 mg/mL RNAse A, 0.5% N-lauroylsarcosine, 1X Tris-EDTA) under constant agitation at 56°C until tissue lysis (up to six hours). An equal volume of 24:1 chloroform:isoamyl alcohol was added before centrifugation and recovery of the aqueous phase (performed twice), followed by the addition of 7/3 volume of pure ethanol for overnight precipitation at -20°C. Dried pellets were resuspended in 10 mM Tris buffer and stored at -20°C until further use. Fragment size and concentration were assessed with Qubit (Invitrogen Life Technologies) and TapeStation (Agilent Technologies) high sensitivity assays, according to the manufacturers’ instructions. To confirm the specific presence of Xci in HERB_1937, we performed the Xci-exclusive Xac-qPCR diagnostic assay developed by Robène et al. [40] on three replicates of 5 μL of 10-fold water-diluted DNA extract and on our negative control, following the recommended amplification conditions.

Library preparation & sequencing

Library preparation and sequencing were outsourced (https://www.fasteris.com/dna/). Briefly, DNA was converted into a double-stranded library using a custom TruSeq DNA Nano Illumina protocol omitting the fragmentation step and using a modified bead ratio to keep small fragments. Sequencing of both HERB_1937 and the negative control sample was performed in a paired-end 2×150 cycles configuration on a single lane of the NextSeq flow cell.

Initial read trimming and merging

BBDuk from BBMap 37.92 [84] was first run with an entropy threshold of 0.6 to remove artefactual homopolymer sequences. Illumina adaptors were trimmed using the ILLUMINACLIP option of Trimmomatic 0.36 [85]. These roughly trimmed reads were processed using the post-mortem DNA damage pipeline detailed below. Additional quality trimming was performed with Trimmomatic 0.36 based on base quality (LEADING:15; TRAILING:15; SLIDINGWINDOW:5:15) and read length (MINLEN:30). Paired reads were then merged with AdapterRemoval 2.2.2 [86] using default parameters before running both the metagenomic and the phylogenetic pipelines detailed below and in Fig 2.
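The quality-trimming settings above can be illustrated with a simplified Python re-implementation. This is a hypothetical approximation, assuming Phred scores are already decoded to integers and that the steps run in the listed order. It cuts at the start of the first failing window, whereas Trimmomatic's own SLIDINGWINDOW step may place the cut slightly differently, so the sketch only conveys the logic of LEADING:15, TRAILING:15, SLIDINGWINDOW:5:15 and MINLEN:30.

```python
def trim_read(quals, leading=15, trailing=15, window=5, min_window_mean=15, min_len=30):
    """Return (start, end) slice bounds of the kept region, or None if the read is dropped.

    Simplified approximation of LEADING/TRAILING/SLIDINGWINDOW/MINLEN applied in
    that order; NOT Trimmomatic's exact implementation.
    """
    start, end = 0, len(quals)
    # LEADING: drop low-quality bases from the 5' end.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: drop low-quality bases from the 3' end.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: scan 5'->3' and cut at the start of the first window
    # whose mean quality falls below the threshold (simplification).
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_window_mean:
            end = i
            break
    # MINLEN: discard reads that became too short.
    if end - start < min_len:
        return None
    return start, end


# Two low-quality leading bases and a low-quality 3' tail are removed.
print(trim_read([10, 12] + [30] * 40 + [5] * 10))
```

Applying the steps in this order means that a low-quality dip early in the read truncates everything downstream, which is why short survivors are then removed by the length filter.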

Negative control sequences analysis

Reads generated from the negative control sample were sequentially mapped to the reference genomes of Coffea arabica (GCA_003713225.1), Citrus sinensis (AJPS00000000.1) and Xci (strain IAPAR 306, chromosome NC_003919.1, plasmids pXAC33 NC_003921.3 and pXAC64 NC_003922.1) using BWA-aln 0.7.15 (default options and seed disabled).

Metagenomic pipeline

The metagenomic composition of the historical HERB_1937 sample was assessed following a two-step procedure. First, reads were sequentially mapped to the reference genomes of human (GCF_000001405.39), Citrus sinensis (AJPS00000000.1) and Xci (strain IAPAR 306, chromosome NC_003919.1, plasmids pXAC33 NC_003921.3 and pXAC64 NC_003922.1) using the “very-sensitive” option (seed length -L 20) of Bowtie 2 [87]. Second, 1,000,000 randomly chosen unmapped reads were searched against the nucleotide database using the blastn command of NCBI BLAST 2.2.31 [88]. Only top hits with an e-value below 0.001 were retained. The proportion of each taxon in the sample was scaled over the total number of reads.
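One possible way to scale taxon counts over the total number of reads is sketched below in Python. The text does not specify how counts from the 1,000,000-read BLAST subsample were combined with the mapping counts. This sketch assumes that subsample counts are extrapolated to all unmapped reads by the ratio of unmapped to subsampled reads, and that the hits for each read arrive best-first. All names and inputs are hypothetical.

```python
from collections import Counter


def taxon_proportions(mapped_counts, blast_hits, n_unmapped, n_subsampled,
                      total_reads, max_evalue=1e-3):
    """Return the fraction of all reads assigned to each taxon.

    mapped_counts: taxon -> number of reads assigned by reference mapping.
    blast_hits: (read_id, taxon, evalue) tuples, best hit first for each read.
    """
    counts = Counter(mapped_counts)
    top_hits = Counter()
    seen = set()
    for read_id, taxon, evalue in blast_hits:
        if read_id in seen:
            continue  # only the first (top) hit of each read is considered
        seen.add(read_id)
        if evalue < max_evalue:
            top_hits[taxon] += 1
    # ASSUMPTION: extrapolate subsample counts to all unmapped reads.
    scale = n_unmapped / n_subsampled
    for taxon, n in top_hits.items():
        counts[taxon] += n * scale
    return {taxon: n / total_reads for taxon, n in counts.items()}
```

Reads whose top hit fails the e-value cutoff stay unassigned, which contributes to the unassigned fraction discussed above.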

Ancient DNA damage assessment pipeline

Post-mortem DNA damage, measured by DNA fragment length distribution, purine frequencies before DNA breakpoints and 5’ C to T or 3’ G to A misincorporation patterns, was assessed with mapDamage2 [89] for both the historical specimen and three modern strains (strains LJ225-01, LK144-08 and LM053-06, isolated in 2012, 2013 and 2015, respectively; see S1 Table). Alignments against the IAPAR 306 Xci reference genome (chromosome and plasmids pXAC33 and pXAC64) were generated using BWA-aln 0.7.15 (default options and seed disabled) [90] as short-read aligner for the historical specimen and Bowtie 2 (options --non-deterministic --very-sensitive) [87] for the modern strains. PCR duplicates were removed using Picard tools 2.7.0 MarkDuplicates [91]. An independent damage assessment was performed using a Methylobacterium reference sequence (Methylobacterium organophilum strain DSM 760, QEKZ01000001.1) with BWA-aln (same options as above). Statistical analyses were performed using GraphPad Prism version 6.00 for macOS (GraphPad Software, San Diego, California, USA; www.graphpad.com) [92].
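The 5’ C to T signal reported by mapDamage2 can be illustrated with a much-simplified Python sketch. For each position counted from the read 5’ end, it computes the fraction of reference C positions read as T. It assumes gap-free alignments already oriented on the read strand. mapDamage2 itself also handles strand orientation, clipping, indels and the complementary 3’ G to A signal, so this is not its implementation.

```python
def ct_by_position(alignments, n_positions=25):
    """Fraction of reference C read as T at each position from the read 5' end.

    alignments: (ref_seq, read_seq) pairs, gap-free, equal length and already
    oriented 5'->3' on the read strand (a simplifying assumption).
    """
    c_total = [0] * n_positions
    c_to_t = [0] * n_positions
    for ref, read in alignments:
        for pos in range(min(n_positions, len(ref))):
            if ref[pos] == "C":
                c_total[pos] += 1
                if read[pos] == "T":
                    c_to_t[pos] += 1
    # Positions with no reference C are reported as 0.0 rather than undefined.
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_total)]


print(ct_by_position([("CAGCT", "TAGCT"), ("CCGAT", "CTGAT"), ("ACCGT", "ACCGT")],
                     n_positions=3))
```

In genuinely damaged aDNA, these frequencies are expected to peak at the first read positions and decline inward, the pattern summarised by mapDamage2.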

Historical genome reconstruction & characterization

Sequencing depths were computed using BEDTools genomecov 2.24.0 [93] and plotted with CIRCOS 0.69.9 [94]. Reads in BAM files were trimmed by 5 bp at each end with BamUtil 1.0.14 [95]. SNPs were called with GATK UnifiedGenotyper [96]. SNPs meeting at least one of the following conditions were considered dubious and filtered out: depth < average depth + 1 sd [X = 9], allelic frequency < 0.9, or distance from another SNP < 20 bp. Consensus historical sequences were then reconstructed by introducing the remaining high-quality SNPs into the reference genome and replacing both filtered-out variants and non-covered sites (depth = 0) by an N. Non-covered regions were identified with BEDTools 2.24.0 [93].
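The filtering and consensus rules above can be sketched in Python as follows. The input layout is hypothetical: a dictionary of SNP calls with depth and allele frequency, and a per-base depth list. The depth cutoff of 9 is the value stated for HERB_1937_Xci (15 was used for modern strains). Proximity is checked against all called SNPs, an assumption because the text does not say whether already-filtered SNPs count.

```python
def filter_snps(snps, min_depth=9, min_allele_freq=0.9, min_distance=20):
    """Split SNP calls into retained and filtered positions.

    snps maps a 0-based position -> (alt_base, depth, allele_freq).
    """
    positions = sorted(snps)
    kept, dropped = {}, set()
    for i, pos in enumerate(positions):
        alt, depth, freq = snps[pos]
        # The nearest neighbours in sorted order are enough to test proximity.
        neighbours = positions[max(i - 1, 0):i] + positions[i + 1:i + 2]
        too_close = any(abs(pos - other) < min_distance for other in neighbours)
        if depth < min_depth or freq < min_allele_freq or too_close:
            dropped.add(pos)
        else:
            kept[pos] = alt
    return kept, dropped


def build_consensus(reference, site_depth, kept, dropped):
    """Insert retained SNPs and mask filtered variants and uncovered sites with N."""
    seq = list(reference)
    for pos, alt in kept.items():
        seq[pos] = alt          # introduce high-quality SNPs
    for pos in dropped:
        seq[pos] = "N"          # mask filtered-out variants
    for pos, d in enumerate(site_depth):
        if d == 0:
            seq[pos] = "N"      # mask non-covered sites
    return "".join(seq)
```

Masking dubious variants with N, instead of reverting them to the reference base, avoids biasing the historical sequence toward the modern reference.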

Gene content analysis

A CDS was considered present when at least 75% of its length was covered by reads, and absent (non-covered) otherwise. The repeated nature and hypothetical functions of these CDS (as predicted for strain IAPAR 306, chromosome NC_003919.1, plasmids pXAC33 NC_003921.3 and pXAC64 NC_003922.1) were assessed using the annotated reference sequences within the genome browser and synteny tool of the MicroScope platform [97], based on a small set of public strains from the SWIO and the rest of the world (C40, LH201, LB100-1, JJ10-1, FDC217, LG115, LG97, LB302 [33]) and a few additional representatives of the genus Xanthomonas (X. citri pv. bilvae strain NCPPB 3213, X. euvesicatoria 85–10, X. campestris pv. campestris 8004, X. perforans 91–118).
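The 75% coverage rule can be sketched in Python as below. The inputs are hypothetical: CDS coordinates as 0-based, half-open intervals and a per-base depth list, such as one could derive from BEDTools genomecov -d output (which reports 1-based positions, so an offset would be needed). A CDS covered on exactly 75% of its length is treated as present, matching the S2 Table definition of non-covered CDS as those covered on less than 75% of their length.

```python
def classify_cds(cds_coords, depth, threshold=0.75):
    """Label each CDS 'present' if at least `threshold` of its bases have depth > 0.

    cds_coords: CDS name -> (start, end), 0-based half-open coordinates.
    depth: per-base sequencing depth along the reference.
    """
    calls = {}
    for name, (start, end) in cds_coords.items():
        covered = sum(1 for d in depth[start:end] if d > 0)
        fraction = covered / (end - start)
        label = "present" if fraction >= threshold else "non-covered"
        calls[name] = (label, fraction)
    return calls
```

A CDS flagged as non-covered still needs inspection, since, as described in the Discussion, low coverage may reflect multicopy genes rather than true absence.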

To investigate the presence of virulence factors in HERB_1937_Xci, we used a list of 82 Type III effectors (see list in S3 Table) found in Xanthomonas [45,46]. The reference sequences used to assess homology were the IAPAR 306 CDS when available or other Xanthomonas CDS for genes not present in Xci. We assessed coverage for the 57 effectors found in Xci from the reconstructed historical genome. For the 25 effectors from other Xanthomonas, reads were realigned on reference sequences with BWA-aln as described above. Coverage data was recovered from BAM files with BAMStats 1.25 tool [98].

In a second step, we specifically aimed to retrieve tale-derived reads that had not mapped initially. We performed a BWA-aln alignment (options as above) against the sequences coding for the conserved N- and C-terminal domains of the pthA4 CDS from strain IAPAR 306 (S1 Fig). For reads corresponding to the central repeat domain, we constructed a chimeric sequence of three repeats (containing Ns at the variable nucleotide positions of the RVDs) as a mapping reference.

Phylogeny pipeline & tree-calibration

An alignment of HERB_1937_Xci and 116 modern genomes (date range: 1978–2015) from the SWIO islands was constructed for phylogenetic analyses, with the modern Xci strain LG117 from Bangladesh (CDAX01000000) used as outgroup (S1 Table). Variants from modern strains were independently called and filtered using the same parameters as for HERB_1937_Xci, except for the depth threshold, which was set to 15. Regions acquired via horizontal gene transfer were identified with ClonalFrameML [51] and removed to account for the effect of recombination on phylogenetic reconstruction and avoid incongruent trees. A Maximum Likelihood tree was constructed with RAxML 8.2.4 [52] using rapid bootstrap analysis, a General Time-Reversible model of evolution [99] with a Γ distribution of four rate categories (GTRGAMMA) and 1,000 alternative runs.

The existence of a temporal signal was investigated with two different tests. First, a linear regression between sample age and root-to-tip distances (computed from the ML tree) was performed using the distRoot function from the “adephylo” R package [100]. Temporal signal was considered present if a significant positive correlation was observed. Second, we performed a date-randomization test [101] with 20 independent date-randomized datasets using the R package “TipDatingBeast” [102]. Temporal signal was considered present when the 95% Highest Posterior Density (95% HPD) interval of the root height inferred from the initial dataset did not overlap with those of the 20 date-randomized datasets. Tip-dating calibration Bayesian inferences were performed with BEAST 1.8.4 [53], with leaf heights constrained to be proportional to sample ages. Flat priors (i.e., uniform distributions) were applied to the substitution rate (10−12 to 10−2 substitutions/site/year) and to the age of any internal node in the tree. Following the best-fit settings described in Richard et al. [36], we used a GTR substitution model with a Γ distribution and invariant sites (GTR+G+I), an uncorrelated relaxed log-normal clock to account for variation between lineages, and an exponential growth demographic tree prior. The Bayesian topology was jointly estimated with all other parameters during the Markov Chain Monte Carlo, and no prior information from the ML tree was incorporated in BEAST. Three independent chains were run for 25 million steps and sampled every 2,500 steps with a burn-in of 2,500 steps. Convergence to the stationary distribution and sufficient sampling and mixing were checked by inspecting posterior samples (effective sample size >200) in Tracer 1.7.1 [103]. Parameter estimation was based on the samples combined from the different chains.
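The two temporal-signal criteria above can be sketched in plain Python. The first function fits an ordinary least-squares regression of root-to-tip distance on sampling year (using year rather than sample age only flips the sign of the slope). It also computes a one-sided permutation p-value by shuffling years; how significance was assessed in the original regression test is not stated, so the permutation approach is an assumption. The second function encodes the date-randomization criterion: no overlap between the original root-height 95% HPD and any randomized one. Both function names are hypothetical. Because tips of a phylogeny are not statistically independent, root-to-tip regression is best treated as exploratory.

```python
import math
import random


def root_to_tip_regression(years, distances, n_perm=999, seed=42):
    """OLS fit of root-to-tip distance on sampling year.

    Returns (slope, root_date, r, p_value): slope approximates a clock rate in
    substitutions/site/year, root_date is the x-intercept, r is Pearson's r and
    p_value is a one-sided permutation p-value for a positive slope.
    """
    def fit(xs, ys):
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        return sxy / sxx, mx, my, sxx, sxy

    slope, mx, my, sxx, sxy = fit(years, distances)
    syy = sum((y - my) ** 2 for y in distances)
    r = sxy / math.sqrt(sxx * syy)
    root_date = mx - my / slope  # year at which the fitted distance reaches zero
    rng = random.Random(seed)
    shuffled = list(years)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fit(shuffled, distances)[0] >= slope:
            exceed += 1
    return slope, root_date, r, (exceed + 1) / (n_perm + 1)


def drt_supports_temporal_signal(original_hpd, randomized_hpds):
    """True if the original 95% HPD interval overlaps none of the randomized ones."""
    lo, hi = original_hpd
    return all(hi < r_lo or r_hi < lo for r_lo, r_hi in randomized_hpds)
```

The x-intercept offers a quick, rough root date that can be compared with the BEAST estimate, while the date-randomization criterion is the stricter of the two tests.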
The best-supported tree was estimated from the combined samples using the maximum clade credibility method implemented in TreeAnnotator [53]. To assess the effect of including our historical sample in the tree calibration, we performed the same inferences on a dataset excluding HERB_1937_Xci. A Wilcoxon rank sum test with continuity correction and a Bartlett test of homogeneity of variances were performed on the posterior estimates of the tree root age to compare, respectively, the location and the variance of this parameter between both datasets. Finally, phylogenetic diversity (PD) [104], defined as the sum of branch lengths of the minimum spanning path connecting the strains of a region (island, or group of islands in the case of the Comoros), was calculated from patristic distances on the reconstructed phylogeny using the distRoot function implemented in the “adephylo” R package [100]. To account for unequal sampling among regions, each region was down-sampled to the size of the smallest sample (Mayotte, n = 10), and PD per region was averaged over 1,000 iterations. PD values were compared using a Wilcoxon rank sum test with continuity correction.
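The down-sampling procedure for phylogenetic diversity can be sketched in Python on a toy tree. This sketch assumes the rooted form of Faith's PD (the summed length of the union of root-to-tip paths); whether the rooted or unrooted form was used is not stated above. The tree encoding (parent and branch-length dictionaries) and the function names are hypothetical.

```python
import random


def rooted_pd(tips, parent, branch_length):
    """Rooted Faith's PD: summed length of the union of root-to-tip paths.

    parent maps each non-root node to its parent; branch_length maps each
    non-root node to the length of the branch leading to it.
    """
    edges = set()
    for tip in tips:
        node = tip
        while node in parent:      # stop at the root (no parent entry)
            edges.add(node)        # an edge is identified by its child node
            node = parent[node]
    return sum(branch_length[node] for node in edges)


def downsampled_pd(tips, n, parent, branch_length, iterations=1000, seed=1):
    """Mean rooted PD over repeated random subsamples of n tips (tips must be a list)."""
    rng = random.Random(seed)
    total = sum(rooted_pd(rng.sample(tips, n), parent, branch_length)
                for _ in range(iterations))
    return total / iterations


# Toy tree: root -> A (1.0), root -> X (2.0), X -> B (1.0), X -> C (3.0).
parent = {"A": "root", "X": "root", "B": "X", "C": "X"}
branch_length = {"A": 1.0, "X": 2.0, "B": 1.0, "C": 3.0}
print(rooted_pd(["B", "C"], parent, branch_length))
```

Down-sampling every region to the same number of strains removes the mechanical increase of PD with sample size, so that differences between regions reflect diversity rather than sampling effort.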

Supporting information

S1 Fig. Reads depth of a Transcription Activator-Like Effector (TALE) gene of HERB_1937.

(PDF)

S2 Fig. Maximum Likelihood (ML) phylogenetic tree of Xci genomes.

(PDF)

S3 Fig. Date-randomization test results.

(PDF)

S4 Fig. Effect of integrating HERB_1937_Xci on substitution rate estimates in BEAST.

(PDF)

S1 Table. Published modern genomes included in the phylogenetic analyses.

(PDF)

S2 Table. List of Xci reference strain IAPAR 306 coding sequences (CDS) covered on less than 75% of their length by HERB_1937_Xci reads and hence designated as non-covered.

(PDF)

S3 Table. List and coverage of 82 Xanthomonas virulence factors CDS (pthA4 not included) used in this study.

(PDF)

S4 Table. List and frequency of nucleotide patterns coding for RVD found in HERB_1937_Xci reads.

(PDF)

S5 Table. Description of the 14 SNPs found in coding regions between HERB_1937_Xci and modern strains of the SWIO clade.

(PDF)

Acknowledgments

We are grateful to F. Chiroleu, A. Doizy, A. Duvermy, P. Lefeuvre, F. Balloux, V. Llaurens, R. Debruyne, A. Pérez-Quintero & I. Robène for valuable comments and discussions. Computational work was performed on the CIRAD HPC data center of the South Green bioinformatics platform (http://www.southgreen.fr/).

Data Availability

The authors confirm that all data underlying the findings are fully available without restriction. HERB_1937 raw reads were deposited in the Sequence Read Archive (SRR12792042). Consensus historical sequences reconstructed for the chromosome and plasmids pXAC33 and pXAC64 have also been deposited in the GenBank database (CP072205-CP072207). The modern genomes used in this study have previously been published in the NCBI GenBank repository under the accession numbers listed in S1 Table. Accession numbers of any previously published data used in this study are listed in the Supporting information.

Funding Statement

This work was financially supported by l’Agence Nationale pour la Recherche (AR: JCJC MUSEOBACT contrat ANR-17-CE35-0009-01; https://anr.fr/), the European Regional Development Fund (AR, KB, OP, NB: ERDF contract GURDT I2016‐1731‐0006632; https://www.europe-en-france.gouv.fr/fr/fonds-europeens/fonds-europeen-de-developpement-regional-FEDER), Région Réunion (AR, KB, OP, NB; https://www.regionreunion.com/), the French Agropolis Foundation Labex Agro –Montpellier (AR, OP, PR, BS, NB, LG: E-SPACE project number 1504-004) & (AR, PR, BS, LG: MUSEOVIR project number 1600-004; https://www.agropolis-fondation.fr/?lang=en), the SYNTHESYS Project (LG: grant GB-TAF-6437 & AR: grant GB-TAF-7130; http://www.synthesys.info/), the COST Action (LG, BS: grant CA16107 EuroXanth supported by COST; https://www.cost.eu/) & CIRAD/AI-CRESI (AR, PR, BS, LG: grant 3/2016; https://www.cirad.fr/en/home-page). The PhD of PC was co-funded by ED 227, Muséum national d'Histoire naturelle et Sorbonne Université, French Ministry of Higher Education, Research and Innovation, France. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Stukenbrock EH, McDonald BA. The origins of plant pathogens in agro-ecosystems. Annu Rev Phytopathol. 2008;46:75–100. doi: 10.1146/annurev.phyto.010708.154114
  • 2. Turner RS. After the famine: plant pathology, Phytophthora infestans, and the late blight of potatoes, 1845–1960. Hist Stud Phys Biol. 2005;35:341–70. doi: 10.1525/hsps.2005.35.2.341
  • 3. Savary S, Willocquet L, Pethybridge SJ, Esker P, McRoberts N, Nelson A. The global burden of pathogens and pests on major food crops. Nat Ecol Evol. 2019;3:430–9. doi: 10.1038/s41559-018-0793-y
  • 4. FAO, IFAD, UNICEF, WFP & WHO. The state of food security and nutrition in the world. Building resilience for peace and food security. Rome: FAO; 2017.
  • 5. Bernades MFF, Pazin M, Pereira LC, Dorta DJ. Impact of pesticides on environmental and human health. Toxicology studies: cells, drugs and environment. Rijeka, Croatia; 2015. pp. 195–233.
  • 6. Anderson PK, Cunningham AA, Patel NG, Morales FJ, Epstein PR, Daszak P. Emerging infectious diseases of plants: pathogen pollution, climate change and agrotechnology drivers. Trends Ecol Evol. 2004;19:535–44. doi: 10.1016/j.tree.2004.07.021
  • 7. Pallen MJ, Loman NJ, Penn CW. High-throughput sequencing and clinical microbiology: progress, opportunities and challenges. Curr Opin Microbiol. 2010;13:625–31. doi: 10.1016/j.mib.2010.08.003
  • 8. Relman DA. Microbial genomics and infectious diseases. N Engl J Med. 2011;365:347–57. doi: 10.1056/NEJMra1003071
  • 9. Croucher NJ, Didelot X. The application of genomics to tracing bacterial pathogen transmission. Curr Opin Microbiol. 2015;23:62–7. doi: 10.1016/j.mib.2014.11.004
  • 10. Li LM, Grassly NC, Fraser C. Genomic analysis of emerging pathogens: methods, application and future trends. Genome Biol. 2014;15:1–9. doi: 10.1186/s13059-014-0541-9
  • 11. Drummond AJ, Pybus OG, Rambaut A, Forsberg R, Rodrigo AG. Measurably evolving populations. Trends Ecol Evol. 2003;18:481–8. doi: 10.1016/S0169-5347(03)00216-7
  • 12. Pääbo S, Poinar H, Serre D, Jaenicke-Després V, Hebler J, Rohland N, et al. Genetic analyses from ancient DNA. Annu Rev Genet. 2004;38:645–79. doi: 10.1146/annurev.genet.37.110801.143214
  • 13. Bieker VC, Martin MD. Implications and future prospects for evolutionary analyses of DNA in historical herbarium collections. Bot Lett. 2018;165:409–18. doi: 10.1080/23818107.2018.1458651
  • 14. Yoshida K, Burbano HA, Krause J, Thines M, Weigel D, Kamoun S. Mining herbaria for plant pathogen genomes: back to the future. PLoS Pathog. 2014;10:1–6. doi: 10.1371/journal.ppat.1004028
  • 15. Ristaino JB, Groves CT, Parra GR. PCR amplification of the Irish potato famine pathogen from historic specimens. Nature. 2001;411:695–7. doi: 10.1038/35079606
  • 16. May KJ, Ristaino JB. Identity of the mtDNA haplotype(s) of Phytophthora infestans in historical specimens from the Irish potato famine. Mycol Res. 2004;108:171–9. doi: 10.1017/s0953756204009876
  • 17. Saville AC, Martin MD, Ristaino JB. Historic late blight outbreaks caused by a widespread dominant lineage of Phytophthora infestans (Mont.) de Bary. PLoS ONE. 2016;11:1–22. doi: 10.1371/journal.pone.0168381
  • 18. Antonovics J, Hood ME, Thrall PH, Abrams JY, Duthie GM. Herbarium studies on the distribution of anther-smut fungus (Microbotryum violaceum) and Silene species (Caryophyllaceae) in the eastern United States. Am J Bot. 2003;90:1522–31. doi: 10.3732/ajb.90.10.1522
  • 19. Gutaker RM, Reiter E, Furtwängler A, Schuenemann VJ, Burbano HA. Extraction of ultrashort DNA molecules from herbarium specimens. BioTechniques. 2017;62:1–4. doi: 10.2144/000114517
  • 20. Yoshida K, Sasaki E, Kamoun S. Computational analyses of ancient pathogen DNA from herbarium samples: challenges and prospects. Front Plant Sci. 2015;6:1–6. doi: 10.3389/fpls.2015.00001
  • 21. Biek R, Pybus OG, Lloyd-Smith JO, Didelot X. Measurably evolving pathogens in the genomic era. Trends Ecol Evol. 2015;30:306–13. doi: 10.1016/j.tree.2015.03.009
  • 22. Rieux A, Balloux F. Inferences from tip-calibrated phylogenies: a review and a practical guide. Mol Ecol. 2016;25:1911–24. doi: 10.1111/mec.13586
  • 23. Martin MD, Cappellini E, Samaniego JA, Zepeda ML, Campos PF, Seguin-Orlando A, et al. Reconstructing genome evolution in historic samples of the Irish potato famine pathogen. Nat Commun. 2013;4:1–7. doi: 10.1038/ncomms3172
  • 24. Yoshida K, Schuenemann VJ, Cano LM, Pais M, Mishra B, Sharma R, et al. The rise and fall of the Phytophthora infestans lineage that triggered the Irish potato famine. eLife. 2013;2:1–25. doi: 10.7554/eLife.00731
  • 25. Ristaino JB. The importance of mycological and plant herbaria in tracking plant killers. Front Ecol Evol. 2020;7:1–11. doi: 10.3389/fevo.2019.00521
  • 26. Al Rwahnih M, Rowhani A, Golino D. First report of Grapevine red blotch-associated virus in archival grapevine material from Sonoma County, California. Plant Dis. 2015;99:895. doi: 10.1094/PDIS-12-14-1252-PDN
  • 27. Malmstrom CM, Shu R, Linton EW, Newton LA, Cook MA. Barley yellow dwarf viruses (BYDVs) preserved in herbarium specimens illuminate historical disease ecology of invasive and native grasses. J Ecology. 2007;95:1153–66. doi: 10.1111/j.1365-2745.2007.01307.x
  • 28.Smith O, Clapham A, Rose P, Liu Y, Wang J, Allaby RG. A complete ancient RNA genome: identification, reconstruction and evolutionary history of archaeological Barley stripe mosaic virus. Sci Rep. 2015;4:1–6. doi: 10.1038/srep04003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Li W, Song Q, Brlansky RH, Hartung JS. Genetic diversity of citrus bacterial canker pathogens preserved in herbarium specimens. PNAS. 2007;104:18427–32. doi: 10.1073/pnas.0705590104 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Hasse CH. Pseudomonas citri, the cause of citrus canker. J Agric Res. 1915;97–100. [Google Scholar]
  • 31.Graham JH, Gottwald TR, Cubero J, Achor DS. Xanthomonas axonopodis pv. citri: factors affecting successful eradication of citrus canker. Mol Plant Pathol. 2004;5:1–15. doi: 10.1046/j.1364-3703.2004.00197.x [DOI] [PubMed] [Google Scholar]
  • 32.Fawcett HS, Jenkins AE. Records of citrus canker from herbarium specimens of the genus Citrus in England and the United States. Phytopathology. 1933;820–4. [Google Scholar]
  • 33.Gordon JL, Lefeuvre P, Escalon A, Barbe V, Cruveiller S, Gagnevin L, et al. Comparative genomics of 43 strains of Xanthomonas citri pv. citri reveals the evolutionary events giving rise to pathotypes with different host ranges. BMC Genomics. 2015;16:1–20. doi: 10.1186/1471-2164-16-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Patané JSL, Martins J, Rangel LT, Belasque J, Digiampietri LA, Facincani AP, et al. Origin and diversification of Xanthomonas citri subsp. citri pathotypes revealed by inclusive phylogenomic, dating, and biogeographic analyses. BMC Genomics. 2019;20:1–23. doi: 10.1186/s12864-018-5379-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Pruvost O, Magne M, Boyer K, Leduc A, Tourterel C, Drevet C, et al. A MLVA genotyping scheme for global surveillance of the citrus pathogen Xanthomonas citri pv. citri suggests a worldwide geographical expansion of a single genetic lineage. PLoS ONE. 2014;9:1–11. doi: 10.1371/journal.pone.0098129 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Richard D, Pruvost O, Balloux F, Boyer C, Rieux A, Lefeuvre P. Time-calibrated genomic evolution of a monomorphic bacterium during its establishment as an endemic crop pathogen. Mol Ecol. 2020;1–13. doi: 10.1111/mec.15328 [DOI] [PubMed] [Google Scholar]
  • 37.Richard D, Ravigné V, Rieux A, Facon B, Boyer C, Boyer K, et al. Adaptation of genetically monomorphic bacteria: evolution of copper resistance through multiple horizontal gene transfers of complex and versatile mobile genetic elements. Mol Ecol. 2017;26:2131–49. doi: 10.1111/mec.14007 [DOI] [PubMed] [Google Scholar]
  • 38.Pruvost O, Boyer K, Ravigné V, Richard D, Vernière C. Deciphering how plant pathogenic bacteria disperse and meet: Molecular epidemiology of Xanthomonas citri pv. citri at microgeographic scales in a tropical area of Asiatic citrus canker endemicity. Evol Appl. 2019;12:1523–38. doi: 10.1111/eva.12788 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Aubert B. Vergers de la Réunion et de l’Océan Indien. CIRAD. Hommes et fruits en pays du Sud. CIRAD. 2014. pp.111–65. French
  • 40.Robène I, Maillot-Lebon V, Chabirand A, Moreau A, Becker N, Moumène A, et al. Development and comparative validation of genomic-driven PCR-based assays to detect Xanthomonas citri pv. citri in citrus plants. BMC Microbiol. 2020;20:1–13. doi: 10.1186/s12866-019-1672-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.da Silva ACR, Ferro JA, Reinach FC, Farah CS, Furlan LR, Quaggio RB, et al. Comparison of the genomes of two Xanthomonas pathogens with differing host specificities. Nature. 2002;417:459–63. doi: 10.1038/417459a [DOI] [PubMed] [Google Scholar]
  • 42.Dabney J, Meyer M, Pääbo S. Ancient DNA damage. Cold Spring Harb Perspect Biol. 2013;7:1–8. doi: 10.1101/cshperspect.a012567 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Jónsson H, Ginolhac A, Schubert M, Johnson PLF, Orlando L. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics. 2013;29:1682–4. doi: 10.1093/bioinformatics/btt193 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Büttner D. Behind the lines–actions of bacterial type III effector proteins in plant cells. FEMS Microbiol Rev. 2016;40:894–937. doi: 10.1093/femsre/fuw026 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. The Xanthomonas Resource. 2018. Available from http://www.xanthomonas.org/t3e.html (accessed in May 2020)
  • 46. Escalon A, Javegny S, Vernière C, Noël LD, Vital K, Poussier S, et al. Variations in type III effector repertoires, pathological phenotypes and host range of Xanthomonas citri pv. citri pathotypes: type III effectors in Xanthomonas citri pv. citri. Mol Plant Pathol. 2013;14:483–96. doi: 10.1111/mpp.12019
  • 47. Swarup S, De Feyter R, Brlansky RH, Gabriel DW. A pathogenicity locus from Xanthomonas citri enables strains from several pathovars of X. campestris to elicit canker-like lesions on citrus. Phytopathology. 1991;81:802–9.
  • 48. Duan YP, Castañeda A, Zhao G, Erdos G, Gabriel DW. Expression of a single, host-specific, bacterial pathogenicity gene in plant cells elicits division, enlargement, and cell death. Mol Plant Microbe Interact. 1999;12:556–60. doi: 10.1094/MPMI.1999.12.6.556
  • 49. Hu Y, Zhang J, Jia H, Sosso D, Li T, Frommer WB, et al. Lateral organ boundaries 1 is a disease susceptibility gene for citrus bacterial canker disease. PNAS. 2014;111:521–9. doi: 10.1073/pnas.1318582111
  • 50. Perez-Quintero AL, Szurek B. A decade decoded: spies and hackers in the history of TAL effectors research. Annu Rev Phytopathol. 2019;57:459–81. doi: 10.1146/annurev-phyto-082718-100026
  • 51. Didelot X, Wilson DJ. ClonalFrameML: efficient inference of recombination in whole bacterial genomes. PLoS Comput Biol. 2015;11:1–18. doi: 10.1371/journal.pcbi.1004041
  • 52. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–3. doi: 10.1093/bioinformatics/btu033
  • 53. Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007;7:214. doi: 10.1186/1471-2148-7-214
  • 54. Peyambari M, Warner S, Stoler N, Rainer D, Roossinck MJ. A 1,000-year-old RNA virus. J Virol. 2018;93:1–30. doi: 10.1128/JVI.01188-18
  • 55. Dvořák P, Hašler P, Poulíčková A. New insights into the genomic evolution of cyanobacteria using herbarium exsiccatae. Eur J Phycol. 2020;55:30–8. doi: 10.1080/09670262.2019.1638523
  • 56. Blaustein RA, Lorca GL, Meyer JL, Gonzalez CF, Teplitski M. Defining the core citrus leaf- and root-associated microbiota: factors associated with community structure and implications for managing huanglongbing (citrus greening) disease. Appl Environ Microbiol. 2017;83:1–14. doi: 10.1128/AEM.00210-17
  • 57. Xu J, Zhang Y, Zhang P, Trivedi P, Riera N, Wang Y, et al. The structure and function of the global citrus rhizosphere microbiome. Nat Commun. 2018;9:1–10. doi: 10.1038/s41467-017-02088-w
  • 58. Zhang Y, Trivedi P, Xu J, Roper MC, Wang N. The citrus microbiome: from structure and function to microbiome engineering and beyond. Phytobiomes J. 2021;1–40. doi: 10.1094/PBIOMES-11-20-0084-RVW
  • 59. Bieker VC, Sánchez Barreiro F, Rasmussen JA, Brunier M, Wales N, Martin MD. Metagenomic analysis of historical herbarium specimens reveals a postmortem microbial community. Mol Ecol Resour. 2020;1–14. doi: 10.1111/1755-0998.13125
  • 60. Glassing A, Dowd SE, Galandiuk S, Davis B, Chiodini RJ. Inherent bacterial DNA contamination of extraction and sequencing reagents may affect interpretation of microbiota in low bacterial biomass samples. Gut Pathog. 2016;8:1–12. doi: 10.1186/s13099-015-0083-z
  • 61. Weiß CL, Gansauge M-T, Aximu-Petri A, Meyer M, Burbano HA. Mining ancient microbiomes using selective enrichment of damaged DNA molecules. BMC Genomics. 2020;21:1–9. doi: 10.1186/s12864-020-06820-7
  • 62. Key FM, Posth C, Krause J, Herbig A, Bos KI. Mining metagenomic datasets for ancient DNA: recommended protocols for authentication. Trends in Genetics. 2017;33:508–20. doi: 10.1016/j.tig.2017.05.005
  • 63. Seong HJ, Park H-J, Hong E, Lee SC, Sul WJ, Han S-W. Methylome analysis of two Xanthomonas spp. using single-molecule real-time sequencing. Plant Pathol J. 2016;32:500–7. doi: 10.5423/PPJ.FT.10.2016.0216
  • 64. Ehrlich M, Norris KF, Wang RY, Kuo KC, Gehrke CW. DNA cytosine methylation and heat-induced deamination. Biosci Rep. 1986;6:387–93. doi: 10.1007/BF01116426
  • 65. Hanghøj K, Renaud G, Albrechtsen A, Orlando L. DamMet: ancient methylome mapping accounting for errors, true variants, and post-mortem DNA damage. GigaScience. 2019;8:1–6. doi: 10.1093/gigascience/giz025
  • 66. Estoup A, Guillemaud T. Reconstructing routes of invasion using genetic data: why, how and so what? Mol Ecol. 2010;19:4113–30. doi: 10.1111/j.1365-294X.2010.04773.x
  • 67. Gottwald TR, Graham JH, Schubert TS. Citrus canker: the pathogen and its impact. Plant Health Prog. 2002;1–34. doi: 10.1094/PHP-2002-0812-01-RV
  • 68. Du Pont de Nemours PS. Oeuvres complètes de P. Poivre, intendant des isles de France et de Bourbon, correspondant de l’académie des sciences, etc. 1797. French
  • 69. Carter M, Torabully K. Coolitude, an anthology of the Indian labour diaspora. Anthem Press; 2002.
  • 70. Beaujard P. The first migrants to Madagascar and their introduction of plants: linguistic and ethnological evidence. Azania: Archaeological Research in Africa. 2011;46:169–89. doi: 10.1080/0067270X.2011.580142
  • 71. McDonald BA, Linde C. Pathogen population genetics, evolutionary potential, and durable resistance. Annu Rev Phytopathol. 2002;40:349–79. doi: 10.1146/annurev.phyto.40.120501.101443
  • 72. Goss EM. Genome-enabled analysis of plant-pathogen migration. Annu Rev Phytopathol. 2015;53:121–35. doi: 10.1146/annurev-phyto-080614-115936
  • 73. Bos KI, Harkins KM, Herbig A, Coscolla M, Weber N, Comas I, et al. Pre-Columbian mycobacterial genomes reveal seals as a source of New World human tuberculosis. Nature. 2014;514:494–7. doi: 10.1038/nature13591
  • 74. Duggan AT, Perdomo MF, Piombino-Mascali D, Marciniak S, Poinar D, Emery MV, et al. 17th Century variola virus reveals the recent history of smallpox. Curr Biol. 2016;26:3407–12. doi: 10.1016/j.cub.2016.10.061
  • 75. Duchêne S, Holt KE, Weill F-X, Le Hello S, Hawkey J, Edwards DJ, et al. Genome-scale rates of evolutionary change in bacteria. Microb Genom. 2016;2:1–12. doi: 10.1099/mgen.0.000094
  • 76. Rocha EPC, Smith JM, Hurst LD, Holden MTG, Cooper JE, Smith NH, et al. Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol. 2006;239:226–35. doi: 10.1016/j.jtbi.2005.08.037
  • 77. Spaans SK, Weusthuis RA, van der Oost J, Kengen SWM. NADPH-generating systems in bacteria and archaea. Front Microbiol. 2015;6:1–27. doi: 10.3389/fmicb.2015.00001
  • 78. Gwinn ML, Ramanathan R, Smith HO, Tomb J-F. A new transformation-deficient mutant of Haemophilus influenzae Rd with normal DNA uptake. J Bacteriol. 1998;180:746–8. doi: 10.1128/JB.180.3.746-748.1998
  • 79. Alkan C, Sajjadian S, Eichler EE. Limitations of next-generation genome sequence assembly. Nat Methods. 2011;8:61–5. doi: 10.1038/nmeth.1527
  • 80. Nagarajan N, Pop M. Sequence assembly demystified. Nat Rev Genet. 2013;14:157–67. doi: 10.1038/nrg3367
  • 81. Lee S, Lee J, Lee DH, Lee Y-H. Diversity of pthA gene of Xanthomonas strains causing citrus bacterial canker and its relationship with virulence. Plant Pathol J. 2008;24:357–60. doi: 10.5423/PPJ.2008.24.3.357
  • 82. Juillerat A, Pessereau C, Dubois G, Guyot V, Maréchal A, Valton J, et al. Optimized tuning of TALEN specificity using non-conventional RVDs. Sci Rep. 2015;5:1–7. doi: 10.1038/srep08150
  • 83. Ausubel FM, Brent R, Kingston RE, Moore DD, Seidman JG, Smith JA, et al. Current protocols in molecular biology. John Wiley & Sons, New York; 2003.
  • 84. Joint Genome Institute. BBTools. 1997. Available from https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/ (accessed in May 2020)
  • 85. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20. doi: 10.1093/bioinformatics/btu170
  • 86. Schubert M, Lindgreen S, Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res Notes. 2016;9:1–7. doi: 10.1186/s13104-015-1837-x
  • 87. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9. doi: 10.1038/nmeth.1923
  • 88. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10. doi: 10.1016/S0022-2836(05)80360-2
  • 89. Ginolhac A, Rasmussen M, Gilbert MTP, Willerslev E, Orlando L. mapDamage: testing for damage patterns in ancient DNA sequences. Bioinformatics. 2011;27:2153–5. doi: 10.1093/bioinformatics/btr347
  • 90. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–60. doi: 10.1093/bioinformatics/btp324
  • 91. Broad Institute. Picard Tools (http://broadinstitute.github.io/picard/).
  • 92. GraphPad Prism. San Diego, California, USA: GraphPad Software; Available from www.graphpad.com (accessed in May 2020)
  • 93. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2. doi: 10.1093/bioinformatics/btq033
  • 94. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–45. doi: 10.1101/gr.092759.109
  • 95. Jun G, Wing MK, Abecasis GR, Kang HM. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Res. 2015;25:918–25. doi: 10.1101/gr.176552.114
  • 96. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491–8. doi: 10.1038/ng.806
  • 97. Vallenet D, Engelen S, Mornico D, Cruveiller S, Fleury L, Lajus A, et al. MicroScope: a platform for microbial genome annotation and comparative genomics. Database. 2009;1–12. doi: 10.1093/database/bap021
  • 98. SourceForge. BAMStats. 2011. Available from http://bamstats.sourceforge.net (accessed in May 2020)
  • 99. Tavaré S, Miura RM. Some probabilistic and statistical problems in the analysis of DNA sequences. Some mathematical questions in biology: DNA sequence analysis. Providence; 1986. pp.57–86.
  • 100. Jombart T, Balloux F, Dray S. adephylo: new tools for investigating the phylogenetic signal in biological traits. Bioinformatics. 2010;26:1907–9. doi: 10.1093/bioinformatics/btq292
  • 101. Duchêne S, Duchêne D, Holmes EC, Ho SYW. The performance of the date-randomization test in phylogenetic analyses of time-structured virus data. Mol Biol Evol. 2015;32:1895–906. doi: 10.1093/molbev/msv056
  • 102. Rieux A, Khatchikian CE. TIPDATINGBEAST: an R package to assist the implementation of phylogenetic tip-dating tests using BEAST. Mol Ecol Resour. 2017;17:608–13. doi: 10.1111/1755-0998.12603
  • 103. Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst Biol. 2018;67:901–4. doi: 10.1093/sysbio/syy032
  • 104. Faith DP. Conservation evaluation and phylogenetic diversity. Biological Conservation. 1992;61:1–10. doi: 10.1016/0006-3207(92)91201-3

Decision Letter 0

David Mackey, Wenbo Ma

9 Feb 2021

Dear Dr Rieux,

Thank you very much for submitting your manuscript "First historical genome of a crop bacterial pathogen from herbarium specimen: insights into citrus canker emergence" for consideration at PLOS Pathogens.

First off, I'd like to apologize for the length of time that has passed since this paper was submitted. Difficulty in recruiting reviewers and receiving reviews, compounded by the holiday break, resulted in the delay.

As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly revised version that takes into account the reviewers' comments.

We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.

Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

David Mackey

Associate Editor

PLOS Pathogens

Wenbo Ma

Section Editor

PLOS Pathogens

Kasturi Haldar

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0001-5065-158X

Michael Malim

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0002-7699-2064

***********************

Reviewer's Responses to Questions

Part I - Summary

Please use this section to discuss strengths/weaknesses of study, novelty/significance, general execution and scholarship.

Reviewer #1: The manuscript entitled “First historical genome of a crop bacterial pathogen from herbarium specimen: insights into citrus canker emergence” by P.E. Campos and co-authors reports, for the first time, the genome of a plant bacterial pathogen reconstructed from a herbarium specimen. The authors targeted a 1937 specimen of Citrus sp. from Mauritius (located in the South West Indian Ocean, SWIO) showing typical symptoms of Asiatic citrus canker (ACC), an economically important disease caused by Xanthomonas citri pv. citri (Xci). They first confirmed the historical nature of the reconstructed genome HERB_1937_Xci by characterizing the DNA degradation patterns specific to ancient DNA. The authors further described the associated metagenome and again excluded recent contamination by examining the aDNA patterns of the predominant taxon in the associated bacterial community. They investigated the HERB_1937_Xci gene content on both the chromosome and the two plasmids, focusing on virulence factors; they found a complete set of virulence factors and detected three new allele sequences in TALE genes that are important for virulence. The authors then compared the HERB_1937_Xci genome with a large set of modern genomes, including many originating from the SWIO region. Their phylogenetic reconstruction confidently placed HERB_1937_Xci at the root of the modern SWIO lineages, making it a very likely member of the founder population. Having confirmed the presence of a temporal signal, the authors finally estimated evolutionary parameters using Bayesian tip-calibration inference, refined the estimated age of the ancestor of modern SWIO strains, and proposed a scenario for the emergence of ACC in the SWIO region.

The authors have taken the opportunity of an immense resource of biodiversity that lies in the herbaria to increase our present knowledge on genetic diversity of a bacterial plant quarantine pathogen that still threatens large areas of citrus production. They reconstructed the introduction routes of this invasive bacterial species in SWIO. The evolutionary history that can be deduced provides information about the origin and history of the ACC emergence and the construction of the genetic diversity of the SWIO Xci populations.

They have produced original and new results that underline the interest of the methodological approach. From what I can judge, this methodological approach and the analyses implemented are appropriate and relevant to reach the conclusions presented. The manuscript is well written and the results are clearly presented.

Reviewer #2: Strength

First study to sequence the whole bacterial genome of an important plant pathogen in a historic herbarium sample

Data analysis was thorough and the paper is well-written

Weakness

Could have sequenced more specimens to address migration, root the tree, and examine genome evolution over time.

Did not justify why they chose this particular sample.

Reviewer #3: This manuscript makes another contribution to a relatively underexplored area in ancient DNA – herbarium specimens from historical collections that show indications of infection. I was asked to comment specifically on the measures of authenticity regarding ancient DNA acquisition and analysis, so my comments largely relate to that. These comments aside, the paper is very clearly written and the analyses are extensive. I include a few comments below that might improve the quality of the analysis.

**********

Part II – Major Issues: Key Experiments Required for Acceptance

Please use this section to detail the key new experiments or modifications of existing experiments that should be absolutely required to validate study conclusions.

Generally, there should be no more than 3 such required experiments or major modifications for a "Major Revision" recommendation. If more than 3 experiments are necessary to validate the study conclusions, then you are encouraged to recommend "Reject".

Reviewer #1: No major issues to be discussed here.

Reviewer #2: The authors need to sequence more historic genomes to justify some of their conclusions; that would make the paper stronger. They included a lot of previously sequenced genome data from modern lineages. For this work, however, I would ask them to qualify their conclusions rather than do additional experiments.

Reviewer #3: First, it is typical in ancient DNA studies to eliminate all reads 30bp and shorter, after trimming (if this is done, which it is here). It seems the authors have not done so, despite their caution against it (line 172).
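Purely as an illustration of the cutoff the reviewer describes (not the authors' pipeline), discarding reads of 30 bp and shorter after trimming reduces to a simple length filter. The sketch assumes reads are already adapter-trimmed and held as `(read_id, sequence)` pairs; `filter_short_reads` and `DISCARD_AT_OR_BELOW` are hypothetical names. Dedicated trimmers such as AdapterRemoval (ref. 86) provide a minimum-length setting that does the same job directly on FASTQ files.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: reads of 30 bp and shorter are discarded, as the reviewer describes.
DISCARD_AT_OR_BELOW = 30


def filter_short_reads(
    reads: Iterable[Tuple[str, str]], cutoff: int = DISCARD_AT_OR_BELOW
) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs whose trimmed length is strictly greater than cutoff."""
    for read_id, seq in reads:
        if len(seq) > cutoff:
            yield read_id, seq


if __name__ == "__main__":
    reads = [("r1", "A" * 25), ("r2", "C" * 30), ("r3", "G" * 31)]
    kept = [read_id for read_id, _ in filter_short_reads(reads)]
    print(kept)  # ['r3']
```

Only `r3` survives: a 30 bp read is removed because the rule is "30 bp and shorter".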

The damage plot they present in Figure 4b is also confusing. The use of two horizontal axes, one on the top and one on the bottom, is misleading, as they are clearly independent, though at first glance they look as though they are describing the damage profiles of all reads 25bp in length. A more intuitive way to show their damage profiles would be to show the 5’ and 3’ plots side by side. They could show only the first 25bp of the read in either direction, which removes the ambiguity of the relationship between the positions in the 5’ and 3’ ends as shown in the current plot.
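To make the suggested side-by-side layout concrete, the sketch below tallies C→T substitutions by distance from the 5′ end and G→A substitutions by distance from the 3′ end, the two signals usually shown in aDNA damage plots (compare mapDamage, refs. 43 and 89). It is a simplified stand-in, not mapDamage's algorithm: it assumes ungapped read/reference pairs already oriented in read direction, and `damage_profile` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def damage_profile(
    pairs: Sequence[Tuple[str, str]], window: int = 25
) -> Tuple[List[float], List[float]]:
    """Return (5' C->T, 3' G->A) substitution frequencies for the first `window` positions.

    Index 0 of each list is the terminal base (5'-most or 3'-most, respectively).
    Each pair is (read, reference) as an ungapped alignment in read orientation.
    """
    ct_hits, c_seen = [0] * window, [0] * window
    ga_hits, g_seen = [0] * window, [0] * window
    for read, ref in pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference segments must have equal length")
        n = len(read)
        for i in range(min(window, n)):
            # Distance i from the 5' end: a reference C read as T.
            if ref[i] == "C":
                c_seen[i] += 1
                ct_hits[i] += read[i] == "T"
            # Distance i from the 3' end: a reference G read as A.
            j = n - 1 - i
            if ref[j] == "G":
                g_seen[i] += 1
                ga_hits[i] += read[j] == "A"
    ct = [h / s if s else 0.0 for h, s in zip(ct_hits, c_seen)]
    ga = [h / s if s else 0.0 for h, s in zip(ga_hits, g_seen)]
    return ct, ga
```

Each returned list can then be drawn in its own panel (for example two matplotlib subplots sharing the y-axis), with the 3′ panel's x-axis reversed so that both terminal bases sit at the outer edges of the figure.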

Line specific comments:

Line 61: Change “fossil remains” to “preserved tissues” or something similar, as fossils are rock with no preserved biological tissues

Line 105 – a sentence or two describing the preservation conditions of the sample would be helpful

Line 109 – it’s not clear if the reads were merged, or simply treated as regular mate pair data. Reads should be merged, as treatment as modern paired end data would lead to incorrect base calling due to high background DNA. Treatment of each read independently will result in SNP calls in duplicate reads (via a call in both the forward and reverse read) and will inflate SNP coverage, which could be an issue with high non-target background. I suggest they merge their reads and reprocess the data.
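The read merging the reviewer asks for can be pictured as finding where read 1 overlaps the reverse complement of read 2 and collapsing the pair into a single sequence, so each molecule is counted once. The sketch below is a minimal, quality-blind version that assumes adapters have already been removed; production tools such as AdapterRemoval v2 (ref. 86) also use base qualities and handle staggered overlaps. `merge_pair`, `min_overlap` and `max_mismatch_frac` are hypothetical names with illustrative defaults.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(
    r1: str, r2: str, min_overlap: int = 11, max_mismatch_frac: float = 0.1
) -> Optional[str]:
    """Collapse a read pair into one sequence if R1 and reverse-complemented R2 overlap.

    The longest acceptable overlap wins. Within the overlap, R1's bases are kept
    (real tools pick bases by quality). Returns None if no overlap qualifies.
    """
    r2rc = revcomp(r2)
    for olap in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        suffix = r1[len(r1) - olap:]  # end of read 1
        prefix = r2rc[:olap]          # start of reverse-complemented read 2
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_frac * olap:
            return r1 + r2rc[olap:]
    return None
```

For a 15 bp insert read as 12 bases from one end and 10 from the other, `merge_pair` recovers the full insert from the 7-base overlap; pairs with no acceptable overlap are left unmerged.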

Line 116 – Greater clarity is needed here to explore how these percentages were generated. I assume via BLAST? What seems to be missing is a description of the percent Xci DNA in bulk DNA content based on mapping to a reference. This is a key value in aDNA studies.

Line 127 – Remove or edit the passage “as compared to the standard read length of 150 nt obtained for modern Xci samples”, since read length is dependent on the sequencing platform and kit, and that is not clear based on this description.

Line 144 – 148: I’m not sure why the authors opted to make this comparison. A far more informative metric would be to compare the Xci DNA damage profile to that of the host (perhaps to a plastid or a chromosome). That is typically done as a means of authenticity in ancient pathogen investigations.

Table 1: Column 2 should be % of reads mapping to the reference based on total DNA content (% endogenous DNA), before duplicate removal. Currently this represents the proportion of mapped reads that assigns to the genetic component, which isn’t as informative. Column 3 is called “Mean depth”, though I suppose they mean “Mean coverage”, which is a clearer term. Last column, which shows the % reads with damaged terminal bases show values between 1.62 and 1.88, though 2.64% is reported in the main text (line 135). Why the discrepancy?
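The distinction the reviewer draws comes down to the denominator. In the minimal sketch below (hypothetical function names, made-up counts), % endogenous DNA divides reads mapping to the reference by all reads sequenced, counted before duplicate removal, whereas mean coverage divides the number of aligned bases by the reference length.

```python
def percent_endogenous(mapped_reads: int, total_reads: int) -> float:
    """% of all sequenced reads (counted before duplicate removal) that map to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


def mean_coverage(aligned_bases: int, reference_length: int) -> float:
    """Mean fold-coverage: aligned bases divided by reference length."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return aligned_bases / reference_length


if __name__ == "__main__":
    # Made-up numbers for illustration only; not values from this study.
    print(percent_endogenous(1_200_000, 40_000_000))  # 3.0
    print(mean_coverage(25_000_000, 5_000_000))  # 5.0
```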

Lines 168 – 169: Reporting % of the reference covered at 1X is not terribly informative, especially if base calling required a minimum of 15-fold coverage (as described in the methods). Also, it is not clear whether the minimum mapping quality was 0 or higher. If it was 0, the authors are including reads that may map to multiple locations in the genome, which could inflate their coverage, especially if these positions are later removed for SNP calling.
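A breadth-of-coverage figure that matches the base-calling threshold can be computed by keeping only reads above a mapping-quality cutoff and then asking what fraction of positions reaches the required depth. The sketch assumes alignments are given as 0-based, half-open `(start, end, mapq)` tuples and that MAPQ 0 marks ambiguously placed reads, as with BWA (ref. 90); `breadth_at_depth` is a hypothetical name, and the 15-fold default mirrors the threshold quoted by the reviewer.

```python
from typing import Iterable, Tuple


def breadth_at_depth(
    alignments: Iterable[Tuple[int, int, int]],
    ref_len: int,
    min_depth: int = 15,
    min_mapq: int = 1,
) -> float:
    """Fraction of reference positions covered by >= min_depth reads with MAPQ >= min_mapq."""
    diff = [0] * (ref_len + 1)  # difference array: +1 at start, -1 at end
    for start, end, mapq in alignments:
        if mapq < min_mapq:
            continue  # drop multi-mapping reads (MAPQ 0) that could inflate depth
        start = min(max(start, 0), ref_len)
        end = min(max(end, 0), ref_len)
        if start >= end:
            continue
        diff[start] += 1
        diff[end] -= 1
    covered = 0
    depth = 0
    for pos in range(ref_len):
        depth += diff[pos]
        if depth >= min_depth:
            covered += 1
    return covered / ref_len
```

Lowering `min_mapq` to 0 lets multi-mapping reads back in, which is exactly how the reported breadth can rise without any gain in callable positions.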

Line 273 – “obtained from such material”… greater specificity is needed here to describe what “such material” is.

Line 277 – “contamination-free environment”: no environment is contamination-free.

Lines 291 – 293: There is no need to explore the reasons for 60% of the reads not mapping to a reference. This is a nearly ubiquitous phenomenon in ancient DNA work, and probably also in modern metagenomics studies.

Line 295 – change “studies” to “study”

Line 297 – were the preservation conditions for P. infestans similar?

Line 298 – “in the reads mapping to either of two plasmids” is a better way to say this

Lines 331 – 334: If a conclusion is being drawn from tree topology, the ML tree with bootstraps should ideally be included in the main text. Currently the authors include only the Bayesian tree, which was constructed with the ML tree as a prior (if I have understood the methods properly).

Line 361 – typo “deshydrogenase”

Lines 392 – 403: There is some odd grammar in this concluding paragraph, and the English should be adjusted for better clarity

**********

Part III – Minor Issues: Editorial and Data Presentation Modifications

Please use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity.

Reviewer #1: Minor comments are indicated below

The first three points are free comments and do not necessarily need to be followed up on.

- Line 45. And to possibly close the loop, loss of biodiversity can impact disease risk (Keesing, 2006, Ecol. Lett. 9:485; Rohr et al., 2020, Nature Ecol. & Evol. 4:24).

- Maybe you could refer to the pioneering work of Ristaino et al (2001, Nature 411:695).

- Line 91. You could add a sentence highlighting further the contribution of herbarium specimens in deciphering the original outbreak of ACC in Florida at the beginning of the twentieth century (Li et al., 2007).

- Line 105 (and line 416). DNA was extracted from a pool of five lesions which could represent independent infection events. Is there any possibility of strains mixture? Or did you observe any sequence variation within HERB_1937_Xci that could be attributed to the presence of different strains?
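One way to look for a mixture of strains in a pool of lesions is to scan well-covered sites for a second allele at appreciable frequency, which a single clonal haploid genome should not show. The sketch below assumes per-site allele counts are already available; `mixed_sites` and its thresholds are hypothetical and illustrative. In historical samples, C/T and G/A minor alleles can also reflect post-mortem damage, so flagged sites would need to be checked against the damage profile before being read as evidence of mixture.

```python
from typing import Dict, List


def mixed_sites(
    allele_counts: Dict[int, Dict[str, int]],
    min_depth: int = 15,
    min_minor_frac: float = 0.2,
) -> List[int]:
    """Positions where the second most common allele reaches min_minor_frac of reads."""
    flagged = []
    for pos, counts in sorted(allele_counts.items()):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to judge this site
        ordered = sorted(counts.values(), reverse=True)
        minor = ordered[1] if len(ordered) > 1 else 0
        if minor / depth >= min_minor_frac:
            flagged.append(pos)
    return flagged
```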

- Line 115. You talk about identified sequences at the species level but the taxonomic level of the plant host is at the genus level. Even if this was not your objective, did you get any clues from the reads about the species identification of your citrus specimen?

- Line 199. Change to S3 Table for homogenization.

- Line 261. Check with Fig 6b where R² = 0.043 and p-value = 0.0105.

- Line 312. Did you estimate any genetic diversity indices to support this assertion?
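As one example of the kind of index the reviewer may have in mind, nucleotide diversity (π) is the average proportion of differing sites between pairs of aligned sequences. The sketch below is a minimal version that skips gaps and Ns pair by pair; `nucleotide_diversity` is a hypothetical name and the input is assumed to be an uppercase alignment. On a SNP-only alignment the value would need rescaling by the number of sites actually compared across the genome.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean pairwise proportion of differing sites among aligned sequences.

    Sites where either sequence of a pair has '-' or 'N' are skipped for that pair.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    total = 0.0
    pairs = 0
    for a, b in combinations(seqs, 2):
        if len(a) != len(b):
            raise ValueError("sequences must be aligned to the same length")
        compared = diffs = 0
        for x, y in zip(a, b):
            if x in "-N" or y in "-N":
                continue
            compared += 1
            diffs += x != y
        if compared:
            total += diffs / compared
            pairs += 1
    return total / pairs if pairs else 0.0
```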

- Line 336. Can a privileged ancient history between these two islands explain older migration events and shed light on the presence of older Xci common ancestors?

Bibliography

- Line 645. Change to canker-like.

- Line 743. Reference Bolger et al. to be completed.

- Line 776. Prefer: Jombart T., Balloux F., Dray S., adephylo: new tools for investigating the phylogenetic signal in biological traits, Bioinformatics, Volume 26, Issue 15, 1 August 2010, Pages 1907–1909, https://doi.org/10.1093/bioinformatics/btq292

I could not consult the data associated with HERB_1937 deposited in the SRA and GenBank databases, as they were not yet available.

Reviewer #2: Page Line Comments

Abstract 1 Change to “ancient genomics has been used”

3 43 Add a . after (3) and start new sentence with “More”

3 46 Change to “infectious crop” diseases

3 54 Rework this sentence, as there have been studies with herbarium collections that sampled over more than 4 decades. See the paper by Ristaino (Ristaino, J. B. 2020. The importance of mycological and plant herbaria in tracking a plant killer. Front. Ecol. Evol. 7:521. doi: 10.3389/fevo.2019.00521) for a review of studies using plant and mycological collections to track plant disease epidemics.

3 58 Delete “to” and change to “systemic detection of evolutionary changes or reconstruction of …”

4 63 Add more citations here. Yoshida was not the first to use herbarium specimens to identify the famine-era lineage of P. infestans. Cite “May, K. J. and Ristaino, J. B. 2004. Identity of the Mitochondrial DNA Haplotype(s) of Phytophthora infestans in Historical Specimens from the Irish Potato Famine. Mycol. Res. 108:171-179.”

4 81 Add more citations here: "Saville, A., Martin, M., Gilbert, T. and Ristaino, J. 2016. Historic late blight outbreaks caused by a widespread dominant lineage of Phytophthora infestans (Mont.) de Bary. PLoS One 11: e0168381 https://doi.org/10.1371/journal.pone.0168381" and others mentioned previously. I'd like to think our work has helped resolve the debate about the identity and source of the 19th century outbreak.

5 84 Cite Li et al. (26) here, as Hartung's work did use herbarium specimens to study citrus canker and, I believe, did the first genetic work with a historic strain from 1911. Although they did not use full genome sequences, they did do multilocus genotyping, so say a bit more here and compare your results to previous work in the discussion.

5 104 Indicate the name of the herbarium that the sample was retrieved from and define SWIO.

6 116 There was a greater percentage of reads from some other bacterial species than from X. citri (1.2% vs. Methylobacterium 1.57%). Methylobacterium has been identified as a contaminant during DNA extraction and can lead to its erroneous appearance in microbiota or metagenomic datasets. Did you run a preliminary PCR on the samples, using species-specific primers, Rep or BOX PCR, etc., to confirm they were actually infected with X. citri before the Illumina sequencing? It would be good to confirm, or look for sequences specific to the pathogen to confirm, that the samples were infected by X. citri.

6 122 Could reduce some of this section about DNA damage, as it's clear you are using herbarium material and the reads will likely be short.

9 175 Did you sequence more than one herbarium sample? Why was this particular sample of interest? Is it the oldest existing specimen? It would have been better to sequence more samples from the various islands and at different times in the past to present day.

10 199 The data on the T3SS is interesting. Are the absent genes considered necessary for virulence in modern lineages?

10 215 This mapping method that uncovered new reads in the TALE region is interesting

11 227 Say more about the number and location of the SNPs. What do you mean by "dubious SNPs"?

12 268 I can’t help but think that sequencing more samples would also have improved the root date of the tree. What do you know about movement of citrus during this time period? Are there historical records on the introduction of the pathogen that might indicate times of introduction?

13 274 Add more citations here as stated above on page 3 line 54.

13 284 Sequencing more samples would have given you an indication of whether this was contamination or infection by other bacteria.

14 320 Are the French botanists' samples available in the Paris herbarium? These would be interesting to see if you can push back the introduction date.

15 331 You really need more samples to determine if one or more introductions occurred, or even if the same pathovar was introduced. This is speculation, so reword this.

15 333 You could do migration analysis on the data to see if directional gene flow has occurred among the island nations.

16 363 How would this influence pathogenicity: more or less pathogenic? Is the historic lineage potentially more virulent than modern-day lineages or not?

17 374 I encourage you to continue sequencing historic genomes of this important pathogen so you can try de novo genome assembly and do further migration and phylogenetic analyses. Search for specimens in unlikely places. They may be in museum collections from explorers to the region, including the British.

Reviewer #3: (No Response)

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example see here on PLOS Biology: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/plospathogens/s/submission-guidelines#loc-materials-and-methods

Decision Letter 1

David Mackey, Wenbo Ma

13 May 2021

Dear Dr Rieux,

Thank you very much for submitting your manuscript "First historical genome of a crop bacterial pathogen from herbarium specimen: insights into citrus canker emergence" for consideration at PLOS Pathogens. This revised version of your manuscript was reviewed by one of the previous independent reviewers. They make several suggestions to further improve the manuscript. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations. Please take special care to address the concern regarding in-lab contamination and the need to report on mapping reads in the negative control.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

David Mackey

Associate Editor

PLOS Pathogens

Wenbo Ma

Section Editor

PLOS Pathogens

Kasturi Haldar

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0001-5065-158X

Michael Malim

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0002-7699-2064

***********************

Reviewer Comments (if any, and for reference):

Reviewer's Responses to Questions

Part I - Summary

Please use this section to discuss strengths/weaknesses of study, novelty/significance, general execution and scholarship.

Reviewer #3: (No Response)

**********

Part II – Major Issues: Key Experiments Required for Acceptance

Please use this section to detail the key new experiments or modifications of existing experiments that should be absolutely required to validate study conclusions.

Generally, there should be no more than 3 such required experiments or major modifications for a "Major Revision" recommendation. If more than 3 experiments are necessary to validate the study conclusions, then you are encouraged to recommend "Reject".

Reviewer #3: (No Response)

**********

Part III – Minor Issues: Editorial and Data Presentation Modifications

Please use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity.

Reviewer #3: I appreciate that the authors took the time to reprocess their data as merged reads, as this did have an effect on their coverage estimates. I'm slightly curious as to why the merged data resulted in a greater number of genomic regions with no coverage, as I would have thought that merging would result in a reduction of reads spanning only those positions that were covered via paired-end analysis. Was the mapping quality metric altered between these two analyses? Certainly increasing the mapping quality threshold could lead to this phenomenon. Typically mapping quality values of 30 or even 37 are used, not just removal of reads with mapping quality 0. Regarding coverage thresholds for SNP calling, the authors have now used average depth as a criterion for inclusion in the analysis, citing a study (ref 1 of their response) as a demonstration of this approach. While this approach is acceptable in their case, it should not be considered a global criterion for ancient DNA work, where coverage is often highly variable and can be very low. It is customary to set a threshold independent of coverage, such as 3X or, better yet, 5X, for SNP calling with ancient data.
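To make the reviewer's suggested filters concrete, here is a minimal Python sketch of per-site consensus calling that first discards reads with a mapping quality below 30 and then requires a fixed minimum depth of 5X, regardless of genome-wide average coverage. The `call_base` helper, its input format of (base, mapping quality) pairs, and the 90% agreement cut-off are hypothetical choices made for illustration; they are not the authors' pipeline or the interface of any particular variant caller.

```python
from collections import Counter


def call_base(observations, min_mapq=30, min_depth=5, min_agreement=0.9):
    """Return the consensus base at one site, or None if the site fails the filters.

    observations: list of (base, mapq) tuples, one per read covering the site
    (a hypothetical input format used only for this illustration).
    """
    # Drop reads below the mapping-quality threshold before counting depth.
    kept = [base for base, mapq in observations if mapq >= min_mapq]
    # Fixed depth threshold, independent of the genome-wide average coverage.
    if len(kept) < min_depth:
        return None
    # Majority base and how many retained reads support it.
    base, count = Counter(kept).most_common(1)[0]
    # The agreement cut-off is an assumption, not a value from the review.
    return base if count / len(kept) >= min_agreement else None
```

With these defaults, a site covered by six reads, one of which has a mapping quality of 25, still passes because five reads remain; a site covered by only three high-quality reads is left uncalled, however low the average coverage of the genome happens to be.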

I thank the authors for including the percentage of Xci DNA in the bulk content (0.74%). I have not seen them disclose their sequencing depth anywhere in the manuscript, and that should ideally be mentioned in the main text. Their genomic reconstruction was performed from 1.6 million mapping reads, so by extension I assume their sequencing depth was ca. 220 million reads. I'm also curious about why the BLAST analysis assigned 1.2% of bulk DNA to Xci. I would have thought BLASTn would have returned fewer reads assigned to Xci compared to mapping, since the mapping process is based on read similarity only. Was a filtering step applied before the mapping? A comment about this difference could be useful.
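The reviewer's back-calculation of total sequencing depth can be written out as a worked check, assuming that the 0.74% figure is the fraction of all sequenced reads that map to the Xci reference:

$$
N_{\text{total}} \approx \frac{N_{\text{Xci}}}{f_{\text{Xci}}} = \frac{1.6 \times 10^{6}}{0.0074} \approx 2.2 \times 10^{8}\ \text{reads}
$$

This agrees with the reviewer's estimate of ca. 220 million reads.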

A description of the mapping process and statistics of the Xci mapping should be presented in the main text before the damage assessment, which is calculated based on the mapped reads. They should also be clear on line 146 that their patterns of degradation apply to reads mapping to the Xci reference genome, as opposed to simply "degradation in HERB_1937" as it is currently stated. Why are the damage profiles now shown in both the main manuscript (for the chromosome) and the SI (for the chromosome and two plasmids)? Why not just show the current SI image in the main text? I also caution the authors against making assumptions that damage alone will "itself prove the authenticity" (wording from the response letter) of their data. If they were to evaluate the human reads and observe a similar damage pattern, the data would have to be explained by contamination. Authenticity is conferred through damage profile along with context of the finding. Branch shortening in the phylogeny can be a useful metric for authentication of ancient data, and this is shown in their supplementary ML tree. The authors may wish to mention that as well at some point in the MS to bolster their claim of authenticity, along with highlighting its ancestral phylogenetic position (currently on line 329 outside the context of authenticity). They entertain the possibility on lines 303 – 304 of their data coming from in-lab contamination, but I don't think this statement is needed in light of the accompanying authenticity data they present. Reporting on the mapping reads in their negative controls is currently missing in the main manuscript and absolutely must be included: this should also help to rule out in-lab contamination.

The resolution of the main text images I received is of poor quality, and it looks like the samples have 20% damage as opposed to 2% damage, which really shocked and confused me at first. Please make sure that high resolution figures accompany the manuscript in print.

In their table 1 (main MS), I assume “most extreme position” refers to the terminal position? I’ve not come across that terminology and I find it unintuitive. Also, it is not common for % damaged bases to be presented as an average over the terminal five nucleotides, as the authors have done here. % damage at the terminal 5’ and 3’ ends is sufficient and is more commonly reported.

Is it worth mentioning that the reads mapping to Xci are shorter than average for bulk DNA content and those mapping to Methylobacterium are longer than average? Do the authors know of a biological reason for this difference? GC content perhaps? Cellular structure?

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Decision Letter 2

David Mackey, Wenbo Ma

14 Jun 2021

Dear Dr Rieux,

We are pleased to inform you that your manuscript 'First historical genome of a crop bacterial pathogen from herbarium specimen: insights into citrus canker emergence' has been provisionally accepted for publication in PLOS Pathogens. Your careful and thorough responses to the review comments were much appreciated.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Pathogens.

Best regards,

David Mackey

Associate Editor

PLOS Pathogens

Wenbo Ma

Section Editor

PLOS Pathogens

Kasturi Haldar

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0001-5065-158X

Michael Malim

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0002-7699-2064

***********************************************************

Reviewer Comments (if any, and for reference):

Acceptance letter

David Mackey, Wenbo Ma

6 Jul 2021

Dear Dr Rieux,

We are delighted to inform you that your manuscript, "First historical genome of a crop bacterial pathogen from herbarium specimen: insights into citrus canker emergence," has been formally accepted for publication in PLOS Pathogens.

We have now passed your article onto the PLOS Production Department who will complete the rest of the pre-publication process. All authors will receive a confirmation email upon publication.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any scientific or type-setting errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Note: Proofs for Front Matter articles (Pearls, Reviews, Opinions, etc...) are generated on a different schedule and may not be made available as quickly.

Soon after your final files are uploaded, the early version of your manuscript, if you opted to have an early version of your article, will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Pathogens.

Best regards,

Kasturi Haldar

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0001-5065-158X

Michael Malim

Editor-in-Chief

PLOS Pathogens

orcid.org/0000-0002-7699-2064

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Read depth of a Transcription Activator-Like Effector (TALE) gene of HERB_1937.

    (PDF)

    S2 Fig. Maximum Likelihood (ML) phylogenetic tree of Xci genomes.

    (PDF)

    S3 Fig. Date-randomization test results.

    (PDF)

    S4 Fig. Effect of integrating HERB_1937_Xci on substitution rate estimates in BEAST.

    (PDF)

    S1 Table. Published modern genomes included in the phylogenetic analyses.

    (PDF)

    S2 Table. List of Xci reference strain IAPAR 306 coding sequences (CDS) covered on less than 75% of their length by HERB_1937_Xci reads and hence designated as non-covered.

    (PDF)

    S3 Table. List and coverage of 82 Xanthomonas virulence factors CDS (pthA4 not included) used in this study.

    (PDF)

    S4 Table. List and frequency of nucleotide patterns coding for RVD found in HERB_1937_Xci reads.

    (PDF)

    S5 Table. Description of the 14 SNPs found in coding regions between HERB_1937_Xci and modern strains of the SWIO clade.

    (PDF)

    Attachment

    Submitted filename: Answer_to_reviewers.docx

    Attachment

    Submitted filename: Answer to reviewer.pdf

    Data Availability Statement

    The authors confirm that all data underlying the findings are fully available without restriction. HERB_1937 raw reads were deposited to the Sequence Read Archive (SRR12792042). Consensus historical genome reconstructed for chromosome, plasmids pXAC33 and pXAC64 have also been deposited on GenBank database (CP072205-CP072207). The modern genomes used in this study have previously been published in the NCBI GenBank repository under accession numbers listed in S1 Table. Accession numbers of any previously published data used in this study are listed in Supplementary information.


    Articles from PLoS Pathogens are provided here courtesy of PLOS
