PLOS Genetics. 2019 Nov 11;15(11):e1008452. doi: 10.1371/journal.pgen.1008452

Transcriptional and genomic parallels between the monoxenous parasite Herpetomonas muscarum and Leishmania

Megan A Sloan 1, Karen Brooks 2, Thomas D Otto 2,¤, Mandy J Sanders 2, James A Cotton 2,*, Petros Ligoxygakis 1,*
Editor: Won-Jae Lee3
PMCID: PMC6872171  PMID: 31710597

Abstract

Trypanosomatid parasites are causative agents of important human and animal diseases such as sleeping sickness and leishmaniasis. Most trypanosomatids are transmitted to their mammalian hosts by insects, often belonging to Diptera (or true flies). These are called dixenous trypanosomatids since they infect two different hosts, in contrast to those that infect just insects (monoxenous). However, it is still unclear whether dixenous and monoxenous trypanosomatids interact similarly with their insect host, as fly-monoxenous trypanosomatid interaction systems are rarely reported and under-studied–despite being common in nature. Here we present the genome of monoxenous trypanosomatid Herpetomonas muscarum and discuss its transcriptome during in vitro culture and during infection of its natural insect host Drosophila melanogaster. The H. muscarum genome is broadly syntenic with that of human parasite Leishmania major. We also found strong similarities between the H. muscarum transcriptome during fruit fly infection, and those of Leishmania during sand fly infections. Overall this suggests Drosophila-Herpetomonas is a suitable model for less accessible insect-trypanosomatid host-parasite systems such as sand fly-Leishmania.

Author summary

Trypanosomes and Leishmania are parasites that cause serious Neglected Tropical Diseases (NTDs) in the world’s poorest people. Both of these are dixenous trypanosomatids, transmitted to humans and other mammals by biting flies. They are called dixenous as they can establish infections in two different types of hosts– insect vectors and mammals. In contrast, monoxenous trypanosomatids usually only infect insects. Despite establishment in the insect’s midgut being key to transmission of NTDs, events during early establishment inside the insect are still unclear in both dixenous and monoxenous parasites. Here, we study the interaction between a model insect–the fruit fly Drosophila melanogaster–and its natural monoxenous trypanosomatid parasite Herpetomonas muscarum. We show that both the genome of this parasite, and gene regulation at early stages of infection have strong parallels with Leishmania. This work has begun to identify evolutionarily conserved aspects of the process by which trypanosomatids establish in insects, thus potentially highlighting key checkpoints necessary for transmission of dixenous parasites. In turn, this might inform new strategies to control trypanosomatid NTDs.

Introduction

The family Trypanosomatidae belongs to the order Kinetoplastida, a group characterized by the presence of a mitochondrial organelle rich in DNA (kDNA) called the kinetoplast. This family includes parasitic flagellates that undergo cyclical development in both vertebrate and invertebrate hosts (and are therefore dixenous). These parasites are best known as agents of important diseases in humans, domestic animals and plants. However, several genera of this order such as Crithidia, Herpetomonas, Blastocrithidia and Leptomonas are restricted to a single host (monoxenous), usually an insect from the orders Diptera, Hemiptera or Siphonaptera [1]. Although such monoxenous or “lower” trypanosomatids seem to have their lifecycle essentially confined to insect hosts [2], they have also been reported in plants [3] and immunocompromised humans [1].

There is an increasing interest in monoxenous trypanosomatids as a model for understanding the evolution and ecology of trypanosomatids [4], as well as how they may modify their insect host [4]. It is now clear that monoxenous trypanosomatids are ubiquitous parasites of a wide range of insect groups and have numerous effects on the physiology of the insect host. These effects include alterations in fertility and reproduction, modified food intake, delayed development and reduction in lifespan [5]. In projections of total animal biodiversity, insects represent more than 60% of all animals [6]. Therefore, knowledge of insect physiology and what can influence it, is essential for maintaining a species-rich environment especially when longitudinal population data show a sharp decline in flying insect biomass [7]. In this context, studies of trypanosomatid-insect interactions will provide vital insights into the ecology of crucial insect species (e.g. pollinators).

To this end, a number of monoxenous trypanosomatid genomes and transcriptomes are being investigated [8,9], including those of bee parasites such as Lotmaria passim (the honey bee parasite), and of Leptomonas pyrrhocoris, a globally disseminated parasite isolated from fire bugs [10,11]. These studies, and earlier work on the molecular biology of trypanosomatids, have revealed that monoxenous parasites share many distinctive genome features with their better-studied dixenous relatives [12].

The genomic DNA is arranged into ‘polycistronic’ (multi-gene) transcriptional units of functionally unrelated genes, the majority of which lack introns. Given this gene arrangement, the cells do not control an individual gene’s expression by varying its transcription level; instead, expression is controlled by RNA-binding proteins [13] and other post-transcriptional processes such as RNA editing [14]. RNA editing processes include trans-splicing, where 39 nucleotides, called a splice leader sequence, are added to the 5’ end of mRNAs [15]. The splice leaders (also called mini-exons) are encoded in tandem repeats at a different genomic locus to the gene.

Trypanosomatid kDNA is arranged in interlocking ‘maxi-circles’ [16–18]. The kDNA maxi-circle is homologous to mitochondrial genomes in other systems, but the sequence encoding many of the typical mitochondrial proteins is scrambled, relying on post-transcriptional mRNA editing to reconstitute the correct coding sequence [19]. The kinetoplast also contains thousands of associated ‘mini-circles’ which encode guide RNAs involved in this editing process [17].

In addition to ecological insights, studies of monoxenous trypanosomatids may help us gain new perspective on interactions of more medically important parasites and their insect vectors, which mediate neglected tropical diseases such as leishmaniasis (vectored by phlebotomine sand flies) and sleeping sickness (tsetse flies). To inform, and accelerate, research in these experimentally challenging dipteran-parasite relationships, we have developed the study of the model dipteran Drosophila melanogaster and its natural trypanosomatid Herpetomonas muscarum [20]. We have established that a network of signalling in the intestine of the host was important for clearance as well as for maintaining fecundity. This network involved NF-κB and STAT-mediated transcription, which regulate intestinal stem cell proliferation that the parasite attempts to suppress. Here, we turn our attention to the parasite. We report the genome of H. muscarum isolated from a wild population of Drosophila melanogaster in Oxfordshire, UK. We also report the transcriptomes of this H. muscarum isolate from in vitro culture and during the course of infection in D. melanogaster. The similarities with Leishmania major, both at the genome level and in transcriptome regulation, were striking. This was especially the case in the early phases of host infection, when the parasite needs to overcome the barrier of the insect midgut and establish infection. The resistance mechanisms to parasite establishment (and therefore onward transmission) reside in the dipteran midgut [21]. The Drosophila-Herpetomonas model may therefore allow researchers to take advantage of the extensive toolkit of genetic approaches available for Drosophila to uncover mechanistic details of evolutionarily conserved aspects of the relationship between trypanosomatids and dipteran vectors, where the tool-box for functional studies is not yet fully developed.

Results/Discussion

The Herpetomonas muscarum genome

Assembly

PacBio and Illumina sequence reads were generated from an axenic culture of H. muscarum promastigotes as described in Materials and Methods. The reads were assembled into a genome of 41.7 Mbp in 264 scaffolds, the largest of which was 1,793,442 bp in length (N50 = 707,495 bp). We observed a median read coverage of 114x, with populations of scaffolds at approximately 50x and 160x coverage, which may represent monosomic and trisomic scaffolds (Fig 1, predicting 37–39 chromosomes). Kmer analysis of the sequencing reads estimated the haploid genome length to be approximately 35.2 Mbp with a read error rate of less than 1% (S1 Fig, Vurture et al., 2017). While the GenomeScope [22] model does not fit the aneuploid nature of trypanosomatid genomes (see below), we believe this suggests our assembly is approximately the correct size.
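The N50 quoted above is the scaffold length at which scaffolds of that length or longer contain at least half of the assembled bases. A minimal sketch of that calculation follows; the function name and the toy scaffold lengths are illustrative assumptions and are not taken from the H. muscarum assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths, for illustration only.
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))
```

Applied to the full list of 264 scaffold lengths, the same function would be expected to reproduce the reported value, provided N50 was defined in this standard way.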

Fig 1. Average coverage depth of H. muscarum scaffolds > 100 kb.


The solid line shows the global median read coverage. The dashed line shows 1.5x and the dotted line 2x the global median read coverage. In blue are scaffolds which were mapped, by PROmer (Kurtz et al., 2004), to L. major chromosome 31 (sequences of 300 bp which map with > 70% identity). The shade of blue represents the proportion of the scaffold which was mapped.

Annotation

Gene model annotation was generated with Companion [23] using evidence from RNA-seq data (described below) and the proteomes of L. major, L. braziliensis and T. brucei as described in Materials and Methods. The final H. muscarum v1 annotation contains 12,687 genes, of which 12,162 are inferred to be protein-coding (Table 1).

Table 1. Herpetomonas muscarum genome annotation summary.
Feature H. muscarum v1.0
Genes 12687
mRNAs 12162
CDSs 12175
Polypeptides 12934
Pseudogenes 772
rRNAs 168
snRNAs 3
snoRNAs 181
tRNAs 173

All unique open reading frames produced by the gene models were kept, even in cases where the gene prediction was not strongly supported by RNA-sequencing evidence, in an attempt to not ‘miss’ genes. It is therefore likely that this annotation contains a higher number of genes than the ‘true annotation’. However, the number of reported genes is close to that reported for other trypanosomatid species e.g. T. brucei TREU927 strain contains 11,567 genes [24]. We also note that the few T. brucei genes reported to contain intronic sequences, e.g. poly(A)-polymerase (Tb927.3.3160) and the mini-exon gene (see below), also appear to contain intronic regions in H. muscarum.

Conserved features of trypanosomatid genomes

Genome structure and large scale synteny

As seen in other trypanosomatid genomes, open reading frames were found on both strands on many scaffolds. Genes are (mostly) arranged in large groups present on the same strand and in the same direction, which is indicative of the polycistronic transcripts typical in trypanosomatid genomes. The regions between polycistrons, commonly referred to as strand switch regions (SSRs), are thought to contain the transcriptional start sites for transcription of each group of genes. We used the SSRs to define and estimate the number of polycistrons. Here we defined SSRs to begin and end at genes where the downstream open reading frame is on the opposing strand of the same scaffold. This highlighted 386 genes from 112 different scaffolds. These putative strand switches were manually inspected and could be grouped into three different situations. There were 128 bona fide strand switches which were either divergent (72 cases) or convergent (56 cases) (S1 Table). There were 166 cases where a single gene (or small group of < 5 genes) had become inverted within a polycistron. Small genes (< 350 bp) encoding hypothetical proteins and tRNAs were commonly found in these cases, though other larger genes were also found in these groups, e.g. HMUS00935500.1, a putative trans-sialidase. Finally, there were 92 cases where a strand switch does occur, but the precise locus was unclear. These cases tended to occur where a single gene at the end of a scaffold was on the opposing strand to all other genes on the scaffold; as such, it was unclear if this represented a bona fide strand switch or a single gene inversion. Overall, this indicated there are at least 128 polycistrons in the H. muscarum genome, though this is likely to be an underestimate given the ambiguity of some strand switch regions. Comparisons with other trypanosomatid genomes also suggest this figure is an underestimate, e.g. L. major is predicted to have 184 polycistrons [25] and T. brucei is predicted to have 150 [26], both of which have smaller genomes and fewer predicted chromosomes than H. muscarum.
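The first, automated step of the strand-switch survey described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the input format (scaffold, start coordinate, strand) and the function name are assumptions, and the sketch does not attempt the manual inspection that separated bona fide switches from single-gene inversions.

```python
from itertools import groupby


def strand_switches(genes):
    """Find adjacent gene pairs on opposite strands of the same scaffold.

    genes: iterable of (scaffold, start, strand) tuples, strand '+' or '-'.
    Returns a list of (scaffold, left_start, right_start, kind), where kind is
    'divergent' (left gene on '-', right gene on '+', so transcription runs
    away from the switch) or 'convergent' (left on '+', right on '-').
    """
    switches = []
    ordered = sorted(genes, key=lambda g: (g[0], g[1]))
    for scaffold, group in groupby(ordered, key=lambda g: g[0]):
        group = list(group)
        for left, right in zip(group, group[1:]):
            if left[2] == right[2]:
                continue  # same strand: still inside one putative polycistron
            kind = "divergent" if left[2] == "-" else "convergent"
            switches.append((scaffold, left[1], right[1], kind))
    return switches


# Hypothetical gene coordinates, for illustration only.
toy_genes = [
    ("s1", 100, "-"), ("s1", 500, "-"), ("s1", 900, "+"),
    ("s1", 1500, "+"), ("s1", 2100, "-"), ("s2", 50, "+"),
]
for switch in strand_switches(toy_genes):
    print(switch)
```

A short run of inverted genes inside a polycistron would appear here as a pair of closely spaced switches, which is one reason the candidates need follow-up inspection.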

Despite diverging before the existence of mammals [27], trypanosomatids show high gene order conservation across the genome. As expected, the H. muscarum scaffolds showed synteny with other trypanosomatid genomes (Fig 2A–2E). Herpetomonas was most highly syntenic with L. major despite being considered phylogenetically closer to Phytomonas and Leptomonas. To quantify this, we took non-overlapping windows of adjacent H. muscarum genes with single-copy orthologs in three comparator genomes: L. major, T. brucei and Leptomonas seymouri. For each window size, we counted how many windows have all orthologs on the same scaffold in the comparator (syntenic windows), and in how many of those all the genes are in the same relative order as their H. muscarum orthologs (colinear windows). Almost 96% of 3-gene windows of single-copy orthologs between H. muscarum and L. major (1845/1926) are syntenic, and 53% of these are colinear (985/1845). This conserved genome structure is shared, to a slightly lesser extent, across the trypanosomatids (91.7% or 1386/1511 syntenic with T. brucei brucei, 55% or 766/1386 colinear; 80.9% or 1643/2030 syntenic with L. seymouri, 46% or 761/1643 colinear). This relationship holds across window sizes (Fig 2F). The values for synteny with Leptomonas seymouri are likely to be biased downwards by the fragmentary assembly available for that species, and this analysis does not capture rearrangements, expansions or contractions of multi-gene families, for which one-to-one orthology is unlikely to be clear.
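The window-based measure described above can be illustrated with a short sketch. It assumes each query gene has already been paired with the scaffold and rank position of its single-copy ortholog in the comparator genome, and it treats a window as colinear when ortholog positions are monotonic in either direction. Both the input format and the handling of reversed order are assumptions made here for illustration, not details stated in the text.

```python
def window_stats(orthologs, size=3):
    """Count syntenic and colinear non-overlapping windows.

    orthologs: list of (comparator_scaffold, comparator_position) tuples for
    consecutive genes on ONE query scaffold, in query gene order.
    Returns (n_windows, n_syntenic, n_colinear).
    """
    n_windows = n_syntenic = n_colinear = 0
    for i in range(0, len(orthologs) - size + 1, size):
        window = orthologs[i:i + size]
        n_windows += 1
        # Syntenic: every ortholog sits on the same comparator scaffold.
        if len({scaffold for scaffold, _ in window}) != 1:
            continue
        n_syntenic += 1
        # Colinear: ortholog order matches the query order (either orientation).
        positions = [pos for _, pos in window]
        if positions == sorted(positions) or positions == sorted(positions, reverse=True):
            n_colinear += 1
    return n_windows, n_syntenic, n_colinear


# Hypothetical ortholog placements, for illustration only.
toy = [("c1", 1), ("c1", 2), ("c1", 3),
       ("c1", 9), ("c1", 7), ("c1", 8),
       ("c1", 4), ("c2", 1), ("c1", 5),
       ("c1", 6)]
windows, syntenic, colinear = window_stats(toy, size=3)
print(windows, syntenic, colinear)
```

The reported percentages are then simple ratios of these counts, for example 1845/1926 syntenic windows and 985/1845 colinear windows for L. major.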

Fig 2. Synteny and colinearity between H. muscarum and other trypanosomatids.


As an example, this plot shows co-linearity between H. muscarum genes (genes highlighted in blue) on scaffold 40 and: A. L. major chromosome 1 (genes highlighted in red). B. Phytomonas EM1 scaffolds HF955082, HF955140 and HF955140 (genes highlighted in green). C. Leptomonas pyrrhocoris scaffolds LpyrH10_33 and LpyrH10_41 (genes highlighted in pink). D. Crithidia bombi scaffolds OESO01000125 and OESO01000148 (genes highlighted in yellow). E. Trypanosoma brucei chromosomes 9 and 11 (genes highlighted in purple). Scaffold/chromosome labels show length in bp. These data were produced using Promer alignments (Delcher et al., 2002). Ribbons between scaffolds show windows of >100 (translated) amino acids which align with at least 50% identity. The data were visualised using Circos (Krzywinski et al. 2009). To quantify these relationships, we investigated all windows of consecutive genes with single-copy orthology in H. muscarum in comparison to L. major, T. brucei brucei and Leptomonas seymouri. F. Shows the proportions of these windows for which all genes occurred on a single scaffold in the comparison genome (syntenic windows), and the proportion of those for which all genes occurred in the same order as in H. muscarum (colinear windows), for a range of window sizes from 3 to 60 genes. The number of windows included in the comparisons varies from 1926 windows of 3 single-copy orthologs with L. major to 49 windows of 60 adjacent genes with single-copy orthologs in T. brucei. Note that synteny values for Leptomonas seymouri are also affected by the degree of continuity of the comparison species genome.

Splice leader sequence

In trypanosomatids, each mRNA is capped, via trans-splicing [reviewed in 15], with a conserved 39 bp sequence called the splice leader (SL). The SL is encoded by the mini-exon genes, which are found throughout the genome in tandem arrays. Each mini-exon gene has two components: the highly conserved 39 bp sequence trans-spliced on to mRNAs (the exon) and a less well conserved intronic sequence. Between each mini-exon gene there is a variable spacer region which is not transcribed. To find the splice leader sequence for our H. muscarum isolate, we searched for the conserved 39 bp SL sequence from Phytomonas serpens (L42381.1) in the H. muscarum scaffolds. This gave 259 hits over 24 scaffolds, which we used to identify 19 clusters of mini-exon gene repeats (over 15 scaffolds) containing 3–43 copies of the mini-exon gene (see S2 Table). The first 111 bp of the gene are common to all copies of the mini-exon gene and contain a 40 bp splice leader sequence and what we predict to be the intron.
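Grouping individual splice leader hits into tandem clusters, as described above, can be sketched as a simple distance-based merge. The text does not state how clusters were defined, so the `max_gap` threshold, the hit format and the function name below are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_hits(hits, max_gap=2000):
    """Group hit positions into clusters per scaffold.

    hits: iterable of (scaffold, position) tuples.
    max_gap: hypothetical maximum distance (bp) between neighbouring hits
    for them to be placed in the same cluster.
    Returns a list of (scaffold, first_position, last_position, n_hits).
    """
    by_scaffold = defaultdict(list)
    for scaffold, pos in hits:
        by_scaffold[scaffold].append(pos)

    clusters = []
    for scaffold in sorted(by_scaffold):
        positions = sorted(by_scaffold[scaffold])
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev <= max_gap:
                count += 1
            else:
                # Gap too large: close the current cluster and start a new one.
                clusters.append((scaffold, start, prev, count))
                start, count = pos, 1
            prev = pos
        clusters.append((scaffold, start, prev, count))
    return clusters


# Hypothetical hit coordinates, for illustration only.
toy_hits = [("s1", 100), ("s1", 900), ("s1", 1700), ("s1", 50000), ("s2", 10)]
arrays = [c for c in cluster_hits(toy_hits) if c[3] >= 3]
print(arrays)
```

Filtering on a minimum copy number, as in the last lines, mirrors the reported clusters of 3–43 copies; the cut-off of 3 is taken from that range rather than from a stated method.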

The splice leader sequence (1–40 bp) and the putative intronic region (41–111 bp) were then aligned with mini-exon sequences of several other trypanosomatids in the Leishmaniinae clade, including 9 other Herpetomonas isolated from heteropterans in the neotropics [28]. Whilst the splice leader sequence is well-conserved across the clade (Table 2), we observe variability in the A/T-rich region between bases 11 and 19 which appears genus specific, with the exception of the Herpetomonas sequences. H. roitmani and Herpetomonas sp. TCC263 have identical sequence across the 11–19 bp region. However, H. muscarum and H. nabiculae differ from each other, and from the other Herpetomonas sequences, over this variable region. Additionally, compared to other trypanosomatids, the Herpetomonas sequences have an ‘additional’ adenosine between bases 10 and 11. The intronic region from H. muscarum shows high similarity to that of previously reported Herpetomonas sequences. The first 15 bp of the intronic sequence appear to be conserved in other species from the Leishmaniinae clade; however, the sequence becomes more variable thereafter, both in terms of base content and length.

Table 2. Alignment of highly conserved splice leader sequences (bases 1–40 of mini-exon gene) of H. muscarum and other species from the Leishmaniinae clade.

The variable AT-rich region (positions 11–19, bold) is shown by genus. Herpetomonas sp. appear to have an additional A or T residue, dependent on species, at position 11.

Species Accession # Splice leader sequence (bases 1–40)
Herpetomonas muscarum AACTAACGCTAAAAATTGTTACAGTTTCTGTACATTATTG
Herpetomonas muscarum EU095982.1*, EU095980.1*, EU095979.1*, EU095983.1, EU095984.1, EU095981.1* AACTAACGCTAAAAATTGTTACAGTTTCTGTACTATATTG
Herpetomonas sp. TCC263 EU095976.1 AACTAAAGCATTATATAGATACAGTTTCTGTACTATATTG
Herpetomonas sp. TCC263 EU095977.1 AACTAAAGCATTATATAGATACAGTTTCTGTACTATATTG
Herpetomonas roitmani EU095978.1 AACTAAAGCATTATATAGATACAGTTTCTGTACTTTATTG
Herpetomonas nabiculae KF054153.1 AACTAACGCTAT-TATTGTTACAGTTTCTGTACTTTATTG
Phytomonas EM1 X87138.1 AACTAACGCT-ATTCTAGATACAGTTTCTGTACTTTATTG
Phytomonas serpens L42381.1, L42378.1, L42377.1, L42382.1, L42376.1 AACTAACGCT-ATTCTAGATACAGTTTCTGTACTTTATTG
Phytomonas sp. Mar8 AF250993.1 AACTAACGCT-ATTCTAGATACAGTTTCTGTACTTTATTG
Phytomonas sp. Alp1 AF250967.1 AACTAACGCT-ATTCTAGATACAGTTTCTGTACTTTATTG
Leishmania braziliensis MG010484.1 AACTAACGCT-ATATAAGTATCAGTTTCTGTACTTTATTG
Leishmania tarentolae AY100201.1 AACTAACGCT-ATATAAGTATCAGTTTCTGTACTTTATTG
Leishmania hoogstraali AY100197.1, AY100200.1 AACTAACGCT-ATATAAGTATCAGTTTCTGTACTTTATTG
Leishmania gymnodactyli AY100195.1, AY100196.1 AACTAACGCT-ATATAAGTATCAGTTTCTGTACTTTATTG
Leishmania adleri AY100199.1, AY100194.1 AACTAACGCT-ATATAAGTATCAGTTTCTGTACTTTATTG
Leishmania major XR_002460055.1 AACTAACGCT-ATATAAGTATCAGTTTCTGTACTTTATTG
Leishmania mexicana Agami and Shapira 1992 AACTAACGCT-ATATAAGTATCAGTTTCTGTACTTTATTG
Leishmania donovani CP022617.1 AACTAACGCT-ATATAAGTATCAGTTTCTGTACTTTATTG
Leishmania infantum AF097653.1 AACTAACGCT-ATATAAGTATCAGTTTCTGTACTTTATTG
Blastocrithidia culicis DQ860204.1 AACTAACGCT-ATATTTGTTACAGTTTCTGTACTATATTG
Blastocrithidia culicis DQ860203.1 AACTAACGCT-ATATTTGTTACAGTTTCTGTACTTTATTG

Tubulin loci

The architecture of the tubulin arrays has been described in a number of trypanosomatids [29], with two mutually exclusive formats being defined–monotypic and alternating. Monotypic tubulin arrays consist of either alpha-tubulin or beta-tubulin. Alternating arrays contain both alpha-tubulin and beta-tubulin genes which alternate along the array. The H. muscarum orthologues of Trypanosoma brucei alpha and beta tubulin genes were found using Orthofinder and used to locate the tubulin arrays.

We identified three genomic loci containing H. muscarum tubulin genes (Fig 3). Two of these loci consist of beta-alpha alternating arrays and the third locus consists of four copies of a beta tubulin gene. The alternating beta-alpha arrays are consistent with previous findings (reported as Herpetomonas megaseliae) [29] and suggest that, like T. brucei, the H. muscarum genome has the alternating tubulin array configuration. However, the presence of a monotypic beta tubulin array in addition to the alternating arrays contrasts with the established model in which each species has either alternating or monotypic arrays, but not both.

Fig 3.


A. Alternating tubulin arrays in H. muscarum. Scaffolds 22 and 67 were found to have two loci containing alternating putative alpha (red) and beta (blue) tubulin genes. Several of these genes we predict to be tubulin pseudogenes (alpha—pink, beta—light blue) as they contain tubulin domains but also contain sequence consistent with non-LTR transposons. B. A monotypic beta tubulin locus in H. muscarum. Four copies of a putative beta tubulin (blue) were found in tandem on H. muscarum scaffold 20. This locus appears similar to the single copy beta tubulin locus on L. major chromosome 8 as the order of adjacent genes (grey) is conserved. We also see synteny with a locus in T. brucei on chromosome 5, however the beta tubulin gene is absent. Dotted lines indicate orthologous genes. Blue lines indicate orthologous beta tubulin genes.

The genes surrounding the monotypic beta tubulin locus shared some synteny with regions of chromosome 5 of T. brucei and chromosome 8 of L. major (gene numbers Tb927.5.970–Tb927.5.3090 and Lmj.08.1090-Lmj.08.11140). Interestingly, this region of L. major chromosome 8 is one of two singleton beta-tubulin loci in the species. As such, the tubulin configuration of H. muscarum appears to be intermediate between the tubulin array configurations of T. brucei and L. major.

The predicted Herpetomonas muscarum proteome

Orthofinder [30] was used to identify orthologous proteins from other trypanosomatids in the predicted proteome of H. muscarum. For the analysis, protein-coding genes from the following species were used: 9 Trypanosoma species/subspecies (Trypanosoma brucei brucei, Trypanosoma brucei gambiense, Trypanosoma congolense, Trypanosoma cruzi, Trypanosoma evansi, Trypanosoma grayi, Trypanosoma rangeli, Trypanosoma theileri and Trypanosoma vivax); 4 Leishmania species (Leishmania braziliensis, Leishmania donovani, Leishmania infantum and Leishmania major); and 6 additional monoxenous trypanosomatids along with our Herpetomonas muscarum predictions (Angomonas deanei, Leptomonas pyrrhocoris, Leptomonas seymouri, Crithidia bombi, Crithidia expoeki, Crithidia fasciculata). Finally, we included a free-living, non-trypanosomatid kinetoplastid, Bodo saltans, as an outgroup. From these 21 species, 87.5% of genes were assigned to 12,701 orthogroups (for a summary see Table 3; for the full orthogroups table see S3 Table). We found that 7,265 of these orthogroups contained H. muscarum genes. There were 45 orthogroups containing only H. muscarum genes; these groups contain 215 genes. Overall, 90.7% of H. muscarum predicted proteins were assigned to an orthogroup.
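Summary counts of the kind given above (orthogroups containing a species, species-specific orthogroups, and the genes within them) can be derived from a per-orthogroup table of gene counts. The sketch below uses a plain dictionary as a stand-in for such a table; the data structure, function name and toy values are assumptions and do not reproduce the OrthoFinder output format.

```python
def summarise_orthogroups(orthogroups, species):
    """Summarise orthogroup membership for one species.

    orthogroups: dict mapping orthogroup ID -> dict of species -> gene count.
    Returns (n_groups_with_species, n_species_specific_groups,
             n_genes_in_species_specific_groups).
    """
    with_species = [og for og, counts in orthogroups.items()
                    if counts.get(species, 0) > 0]
    # Species-specific: no other species contributes any gene to the group.
    specific = [og for og in with_species
                if all(n == 0 for sp, n in orthogroups[og].items() if sp != species)]
    genes_in_specific = sum(orthogroups[og][species] for og in specific)
    return len(with_species), len(specific), genes_in_specific


# Hypothetical gene counts, for illustration only.
toy_orthogroups = {
    "OG1": {"H. muscarum": 2, "L. major": 1},
    "OG2": {"H. muscarum": 3, "L. major": 0},
    "OG3": {"H. muscarum": 0, "L. major": 4},
    "OG4": {"H. muscarum": 1},
}
print(summarise_orthogroups(toy_orthogroups, "H. muscarum"))
```

Shared-orthogroup counts between species, such as those compared in the Venn-Euler diagrams, follow the same pattern by intersecting the sets of orthogroup IDs in which each species has at least one gene.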

Table 3. Summary of Orthofinder analysis of 13 trypanosomatid genomes.

(Trypanosoma rangeli, Trypanosoma grayi, Trypanosoma brucei brucei, Trypanosoma brucei gambiense, Trypanosoma vivax, Trypanosoma congolense, Leishmania donovani, Leishmania major, Leishmania mexicana, Leptomonas pyrrhocoris, Leptomonas seymouri, Crithidia fasciculata and Bodo saltans).

Total number of genes 212,664
Number of genes in orthogroups 186,070
Number of unassigned genes 26,594
Percentage of genes in orthogroups 87.50%
Percentage of unassigned genes 12.50%
Number of orthogroups 12,701
Number of species-specific orthogroups 313
Number of genes in species-specific orthogroups 4,212
Percentage of genes in species-specific orthogroups 2.0%
Mean orthogroup size 14.7
Median orthogroup size 14
Number of orthogroups with all species present 9
Number of single copy orthogroups 0

Orthofinder also produced a phylogenetic tree based on protein sequences from orthogroups which contained a single gene from every species used in the analysis (Fig 4A). This tree is consistent with others published for the trypanosomatids (Maslov et al., 2013). Unsurprisingly, H. muscarum shares more orthogroups with L. major (6,607) than with T. brucei (5,893), which is more distantly related (Fig 4B). However, H. muscarum had slightly more orthogroups in common (6,754) with the two Leptomonas sp. used in the analysis (Fig 4C). Finally, within the Leishmaniinae clade H. muscarum and two species of ‘old world’ Leishmania, L. major and L. donovani, shared 81.2% of their orthogroups (Fig 4D). A global examination of the patterns of gene family sharing between H. muscarum and other trypanosomatid groups confirmed these patterns (Fig 5A). Most gene families, including most genes, are present in all of the groups, and another significant set of families is shared by all the trypanosomatid groups but missing from the outgroup, the free-living kinetoplastid Bodo saltans. These Trypanosomatidae-specific gene families tend to be quite large, while many smaller gene families are specific to the genera Crithidia and Trypanosoma, perhaps because of the more extensive taxon sampling of these lineages. There are exceptions, including some notably large gene families unique to trypanosomes, Leishmania and a number of other taxonomic groups (Fig 5B). Monoxenous trypanosomatids share many more gene families with Leishmania than with Trypanosoma, and there are very few families specific to the Leishmania lineage or any of the monoxenous parasites except Crithidia, which explains the strikingly similar predicted proteomes of Leishmania and H. muscarum.

Fig 4. Relationship between H. muscarum and other trypanosomatids.


A. Phylogeny based on all orthogroups containing a single gene from each species. Other panels show Venn-Euler diagrams in which the areas of each elliptical section are approximately proportional to the number of orthogroups shared by each of (B) H. muscarum, L. major and T. brucei brucei; (C) H. muscarum, Leptomonas pyrrhocoris and L. seymouri; and (D) H. muscarum, Leishmania donovani and L. major. Diagram layouts were generated by EulerApe v2.0.3.

Fig 5. A global view of gene family sharing between trypanosomatids.


A. The numbers of gene families (orthogroups; pink bars; values on left-hand y-axis) and the numbers of genes in those groups (blue bars; values on right-hand y-axis) with particular patterns of sharing between high-level groups in our Orthofinder data. Shading in the lower panel from pink to blue represents how widespread each set of families is, with pink representing families specific to one group and dark blue those families present in all groups. B. Scatterplot of gene family size against the number of species a family is present in, with each point representing a single gene family (families with fewer than 3 genes in total are excluded), and points coloured according to the number of higher-level taxonomic groups they are shared between, as in the lower part of panel A. [code to draw this diagram is a modified version of UpSetR].

We could not look in detail at all of the homology relationships between genes in this extensive comparison. Instead, we used a more focused OrthoFinder analysis to investigate specific groups of orthologues between H. muscarum and T. brucei genes of interest, e.g. metabolic pathway genes, as T. brucei is the best-studied kinetoplastid at the molecular and cellular level. We summarise our findings in Table 4 (for full data see S4–S16 Tables) and discuss some of the orthologues of interest, including surprisingly ‘missing’ orthologues, below.

Table 4. Summary of H. muscarum proteins orthologous to important T. brucei proteins.

H. muscarum orthologues/T. brucei proteins T. brucei/L. major proteins without orthologues in H. muscarum
METABOLISM
Glycolysis 44/45 Tb927.10.4520
Gluconeogenesis 2/2 n/a
Pentose phosphate pathways 12/13 Tb927.2.5800
NADPH metabolism 4/4 n/a
Acetate metabolism 14/17 Tb927.11.2230, Tb927.8.2790, Tb927.6.2790
TCA cycle 17/17 n/a
Mitochondrial carriers 24/25 Tb927.9.12140
Respiratory chain 79/82 Tb927.7.6350, Tb927.10.7090, Tb927.10.3120
Amino acid transporters 31/31 n/a
Lipid metabolism 9/11 Tb927.10.11930, Tb927.4.2700
Leu-Isoleu-Val degradation 22/23 Tb927.4.2700
Fatty Acid Biosynthesis 14/14 n/a
Sphingolipid biosynthesis 7/11 Tb927.9.9410, Tb927.9.9400, Tb927.9.9390, Tb927.9.9380
Glycerophospholipid biosynthesis 16/16 n/a
GPI-N-glycosylation biosynthesis 47/49 Tb927.4.4200, Tb927.1.4830
DIFFERENTIATION AND DNA
Quorum sensing 32/35 Tb927.4.3650, Tb927.11.2250, Tb927.11.11480
Bloodstream to procyclic form differentiation 10/12 Tb927.10.10260, Tb927.10.11220
Epimastigote meiosis 4/5 Tb927.9.15510
RNA regulators of the life cycle 18/18 n/a
Proteins with RNA-binding annotation 54/57 Tb927.10.14950, Tb927.6.2550, Tb927.9.6870
RNAi machinery 5/5 n/a
PROTEIN KINASES 147/169 Tb11.v5.0564, Tb11.v5.0644, Tb927.1.3130, Tb927.10.12480, Tb927.10.15880, Tb927.10.4940, Tb927.10.9980, Tb927.11.5150, Tb927.11.5860, Tb927.3.1850, Tb927.3.3920, Tb927.3.5650, Tb927.3.840, Tb927.4.4330, Tb927.5.4430, Tb927.7.4090, Tb927.9.12400, Tb927.9.12880, Tb927.9.1500, Tb927.9.1570, Tb927.9.16260, Tb927.9.2350
PHOSPHATASES 86/93 Tb927.07.v5.1, Tb07.30D13.60, Tb927.10.4930, Tb927.11.11740, Tb927.11.4990, Tb927.11.5740, Tb927.8.8040
NUCLEAR PROTEOME
Nuclear Pores 27/27 n/a
Exosome 12/12 n/a
Spliceosome 56/59 Tb927.10.7390, Tb927.9.6870, Tb927.3.1090
Kinetochore 30/34 Tb927.10.6330, Tb927.11.1030, Tb927.5.4520, Tb927.9.13970
OTHER PROTEINS OF INTEREST
GP63 14/15 Tb927.11.7610
Mucins 8/11 Tb927.8.7190, TcMUCII, Tb927.11.18610, Tb927.11.3400
LPG biosynthesis 20/29 LmjF.14.1400, LmjF.02.0160, LmjF.02.0170, LmjF.02.0190, LmjF.02.0200, LmjF.02.0210, LmjF.02.0230, LmjF.35.0010, LmjF.25.2460, LmjF.31.3190, LmjF.36.0010, LmjF.02.0010, LmjF.21.0010, LmjF.07.1170, LmjF.34.0510, LmjF.02.0180, LmjF.02.0220, LmjF.05.1230, LmjF.19.650, LmjF.32.3900
Trypanothione synthesis 2/2 LmjF.05.0350, LmjF.27.1870

Metabolism

H. muscarum is missing sphingolipid (SL) biosynthesis genes SLS1-4, including the inositol phosphorylceramide synthase and two choline phosphorylceramide synthases. These genes are part of the same orthogroup in our analysis. Most of the Trypanosoma species have 4 genes assigned to this orthogroup (with the exception of T. cruzi (2) and T. vivax (0)), whereas other species used in this analysis had only 1 gene assigned to it. Given that SLs are thought to be essential to eukaryotic membranes [31], this seemed surprising. However, L. major promastigotes do not require de novo SL synthesis, and a mutant devoid of SLs was viable and replicated as log-phase promastigotes [32], although it was unable to differentiate into a metacyclic stage in vitro and showed severe defects in vesicular trafficking. As such, like L. major, H. muscarum and the other species without a complete SLS pathway may rely on scavenging sphingolipids from the environment.

H. muscarum did not have orthologues for the carnitine O-acetyltransferase (CAT) (Tb927.11.2230) and L-threonine 3-dehydrogenase (Tb927.6.2790) genes of the acetate metabolism pathway. We were also unable to find an orthologue to these genes in other species from the Leishmaniinae clade used in the analysis. As such these genes may have been lost sometime after the group diverged from Trypanosoma.

Additionally, three T. brucei respiratory chain genes did not appear to have orthologues in H. muscarum, including mitochondrial NADH-ubiquinone oxidoreductase flavoprotein 2 (Tb927.7.6350), which had orthologues in all species used in the analysis apart from H. muscarum. Similarly, the only genomes in the analysis without an orthologue for the cytochrome c oxidase assembly protein (Tb927.10.3120) were H. muscarum and Phytomonas EM1. Given the importance of these genes, this likely indicates an important gap in the H. muscarum annotation. Finally, no orthologue was identified for the T. brucei alternative oxidase (AOX) (Tb927.10.7090), which is found in Trypanosoma and is upregulated in bloodstream forms. This oxidase is thought to enhance an organism's ability to cope with stress associated with temperature change, infections and oxidative stress [33].

We also note that several T. brucei genes had multiple H. muscarum orthologues. Two of the most extreme examples are the high-affinity arginine transporter AAT13 [34, 35] and the endo-/lysosome-associated membrane-bound phosphatase 2 (MBAP2), which have 38 and 18 orthologous genes in H. muscarum respectively. The increased copy number of these genes hints at their importance, though the reason for their high copy number in H. muscarum is as yet unclear. AAT13 and MBAP2 have been shown to be highly upregulated in Leishmania after ingestion by sand flies and in conditions of nutrient starvation [36, 37]. Speculatively, the increased copy number of these genes may reflect the nutrient availability in Herpetomonas’ environment/host(s).
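The copy-number comparisons in this section amount to counting, for each species, how many genes fall into a given orthogroup. The following is a minimal sketch of that bookkeeping, assuming an OrthoFinder-style tab-separated table with one row per orthogroup and one column per species holding comma-separated gene IDs. The orthogroup labels, species column names and gene IDs are hypothetical placeholders, not data from this study.

```python
import csv
import io

# Hypothetical toy table in an OrthoFinder-like layout (not real data):
# one row per orthogroup, one column per species, gene IDs comma-separated.
TOY_TABLE = (
    "Orthogroup\tH_muscarum\tT_brucei\tL_major\n"
    "OG_example1\thm_g1, hm_g2, hm_g3\ttb_g1\tlm_g1\n"
    "OG_example2\t\ttb_g2, tb_g3\tlm_g2\n"
)


def copy_numbers(table_text):
    """Return {orthogroup: {species: gene_count}} for a tab-separated table."""
    counts = {}
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        group = row.pop("Orthogroup")
        counts[group] = {
            species: len([g for g in cell.split(",") if g.strip()])
            for species, cell in row.items()
        }
    return counts


def missing_in(counts, species):
    """Orthogroups with no member in `species` but at least one elsewhere."""
    return [
        group
        for group, per_species in counts.items()
        if per_species[species] == 0 and sum(per_species.values()) > 0
    ]


counts = copy_numbers(TOY_TABLE)
print(counts["OG_example1"])             # per-species copy numbers
print(missing_in(counts, "H_muscarum"))  # candidate losses or annotation gaps
```

An empty cell only shows that no gene was assigned to the orthogroup; as noted in the text, it cannot distinguish a true gene loss from a gap in the annotation.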

Differentiation

RNA-binding proteins (RBPs) have emerged as key modulators of gene expression in trypanosomatids—particularly in the context of trypanosome development and differentiation [38]. Orthologues were found for 72/75 T. brucei RNA-binding proteins. RNA-binding proteins with no orthologues found in H. muscarum were: chromatin-remodelling-associated RRM2 (Tb927.6.2550) [39], the pre‐RNA processing protein RBSR1 (Tb927.9.6870) [40] and a hypothetical RBP (Tb927.10.14950).

We have not observed differentiation in H. muscarum using ‘classical’ temperature/pH manipulations in vitro or during D. melanogaster infections. As such the ‘completeness’ of the H. muscarum RBP repertoire, relative to T. brucei which has multiple discrete forms, is of interest. Several of these proteins had multiple orthologues in H. muscarum, including RBP10 (4 orthologues, Tb927.8.2780). RBP10 is known to be highly expressed in bloodstream forms of T. brucei, and its overexpression in procyclics led to an increase of many bloodstream-form specific mRNAs, as well as transcripts associated with sugar transport, the flagellum and cytoskeleton [41]. The role of this protein in H. muscarum is unclear, as the parasite does not appear to have a bona fide vertebrate host; however, given this protein's links to sugar transport, it may play a more general role in metabolism in H. muscarum. Comparisons of H. muscarum RBP expression levels/timings with other trypanosomatids may shed more light on their role in the cell and potentially on why we do not observe differentiated forms for this species.

In addition to the RBPs, we were unable to find any orthologues for the hydrophilic acylated surface proteins (HASPs) or small hydrophilic endoplasmic reticulum-associated proteins (SHERPs) which are associated with metacyclogenesis in Leishmania. We also note that the repressor of differentiation kinase 1 (RDK1, Tb927.11.14070) has 6 orthologues in H. muscarum. In T. brucei, RDK1 acts with the PTP1/PIP39 phosphatase cascade to prevent uncontrolled differentiation from bloodstream to procyclic form [42]. Given that H. muscarum is thought to be confined to insects, the presence of multiple copies of this gene which assists in maintaining a ‘vertebrate’ cell form in T. brucei is intriguing. It may be that this protein has an alternative role in H. muscarum.

Surface proteins

No orthologues were found for the EP procyclins, which are known to be highly expressed in T. brucei procyclic forms whilst in the tsetse vector and are thought to provide protection from the digestive enzymes in the insect midgut [43, 44]. As such H. muscarum likely relies on other surface proteins for protection in the insect midgut (see the transcriptomic data below).

Lipophosphoglycan (LPG) is an abundant component of the Leishmania cell surface and its importance during multiple stages of the Leishmania life cycle, including interactions with the insect gut epithelium, is well known [45, 46]. As such the presence of LPG synthesis enzymes in H. muscarum is of great interest (see Table 4 and S9 Table). Single copy orthologues were found for the LPG biosynthesis-associated proteins GPI12/14 and LPG2-5. The β-galactofuranosyl transferases LPG-1, -1R and -1L were grouped together in a single orthogroup (orthogroup 32) which contained 12 H. muscarum orthologues. However, no orthologues could be found for the β-galactofuranosyl transferases LPG1G1-3 in H. muscarum; these genes were only found in Leishmania species and Leptomonas pyrrhocoris in our analyses (orthogroup 7861). Orthogroup 32 contained genes from all species used in this analysis with the exception of the two T. brucei subspecies. Speculatively, orthogroup 32 may represent a more ancient group of these enzymes, whilst orthogroup 7861 may be a more recent development within the Leishmania/Leptomonas species.

The three L. major side chain arabinosyltransferases SCA1, 2 and L were grouped into a single orthogroup (orthogroup 886). This orthogroup consisted of only Leishmania, Leptomonas and T. grayi proteins. Similarly, the L. major side chain galactosyltransferases (SCG1-7) and related proteins (SCGR1-6) were grouped into a single orthogroup (orthogroup 60) which contained protein sequences from only Leishmania and Leptomonas, suggesting these proteins may be Leishmaniinae specific.

OrthoFinder was unable to find orthologues of the major surface proteins of salivary gland forms of T. brucei, the BARPs (bloodstream alanine-rich proteins). These GPI-anchored proteins are required for tsetse salivary gland colonisation [47, 48]. Additionally, we did not find orthologues for the T. brucei metacyclic invariant surface proteins (MISPs), which extend above the VSG coat in salivary gland metacyclic forms [49]. Given that these proteins are crucial for salivary gland colonisation, their absence from the H. muscarum genome may partially explain the inability of H. muscarum to colonise the salivary glands of D. melanogaster; instead, infections are confined to the insect crop and gut [20].

Finally, the 13 T. brucei GP63 genes were grouped with 28 H. muscarum genes. GP63 is a major surface protease in L. major promastigotes. The comparatively high copy number of GP63 in H. muscarum may highlight its importance. Furthermore, GP63 has been implicated in Leishmania virulence [50], and as such these will be of interest in future studies.

Nuclear proteome

Kinetochore interacting protein 3 (KKIP3, Tb927.10.6700) and SR protein (Tb927.9.6870) had no orthologues in H. muscarum or in the other species from the Leishmaniinae clade used in the analysis, and as such they appear to be Trypanosoma specific. RNAi of KKIP3 in T. brucei resulted in defects in DNA segregation and reduced population growth [51].

Additionally, T. brucei’s kinetochore interacting protein 1 (KKIP1), PHF5-like protein (Tb927.10.7390) and U1 small nuclear ribonucleoprotein 24 kDa (Tb927.3.1090) had orthologues in all species used in the analysis apart from H. muscarum. Similar to KKIP3, RNAi knockdown of KKIP1 caused defects in DNA replication, though in the case of KKIP1 these defects were more severe, resulting in the loss of entire chromosomes [51]. It is unclear whether these genes have been lost in H. muscarum or whether this indicates a gap in the current annotation. Based on the importance of KKIP1 and the fact that these genes have orthologues in all other species analysed, it is likely to be the latter.

Finally, H. muscarum appears to have a ‘full set’ of the T. brucei RNA interference pathway genes, including an orthologue for TbARGO1 (Tb927.10.10850). Genes from this pathway, which is well conserved in metazoans, have been lost in several trypanosomatids including L. major, L. donovani and T. cruzi [52, 53, 54]. The loss of this pathway in these organisms has been linked to Leishmania RNA virus perturbation [54, 55], though this has not been explicitly demonstrated. Further investigations to look for evidence of viruses akin to the LRVs in H. muscarum could test the link between RNAi and virus infection in trypanosomatids. The presence of a functional RNAi pathway has also been linked to transposon activity in Leishmania, with RNAi-negative species lacking active transposable elements (TEs), and RNAi-competent L. braziliensis harbouring several classes of active TEs [55, 56]. Given this, it is possible that the loss/lack of active TEs in L. major and L. donovani has lifted the requirement for the RNAi pathway to protect against TE-associated genomic perturbations. We did observe transcripts corresponding to the telomere-associated transposable elements (TATEs) in all H. muscarum transcriptomes (see below). As such, there may also be an important link between RNAi and transposon activity in trypanosomatids.

The H. muscarum transcriptome during in vitro culture

We first analysed the transcriptome of H. muscarum during in vitro axenic culture, specifically to compare log-phase and stationary phase cultures. Knowledge of the log-phase transcriptome was especially important as this was the ‘pre-infection’ transcriptome in our Drosophila infection model. By comparing the log-phase H. muscarum transcriptome with that of H. muscarum in flies we sought to identify genes important in the establishment of infection (see section below). The principal component analysis (PCA) plot (S2 Fig) shows that the first principal component, which explains 68% of the variance in these data, mostly captures variation between distinct clusters of samples from log and stationary phase. As expected, we found extensive differential expression between log-phase and stationary phase, with 4044 genes significantly differentially regulated (adjusted p-value < 0.05) (S16 Table). This is approximately a third of the genome, but most changes in expression were modest, with only 264 genes upregulated ≥ 2-fold in stationary phase cells and 811 downregulated ≥ 2-fold. GO enrichment analysis, using Ontologizer [57], did not identify any significantly enriched GO terms associated with differentially regulated genes. However, only 62% of H. muscarum genes have associated GO terms. As such, we looked for enrichment in Pfam domains. There were 26 Pfam domains significantly enriched in the genes upregulated in stationary phase and 73 Pfam domains significantly enriched among downregulated transcripts (S17 Table), which we discuss further below.
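The distinction drawn above between "significant" and "≥ 2-fold" can be made concrete with a short sketch. It assumes the cut-offs stated in the text (adjusted p-value < 0.05 and a 2-fold change, i.e. |log2 fold change| ≥ 1) and uses four rows copied from Table 5 as input. The function name and labels are illustrative only and are not taken from the authors' pipeline.

```python
import math

# Four rows copied from Table 5 (stationary vs log phase):
# (gene name, log2 fold change, adjusted p-value), rounded as printed there.
TABLE5_ROWS = [
    ("CRK4", 1.3, 8.89e-10),
    ("cyclin 10", 0.5, 0.001),
    ("CRK12", 0.3, 0.015),
    ("CRK3", -1.0, 1.06e-40),
]


def classify(log2fc, padj, alpha=0.05, min_fold=2.0):
    """Label a gene by significance and by whether it clears the fold cut-off."""
    if padj >= alpha:
        return "not significant"
    threshold = math.log2(min_fold)  # a 2-fold change corresponds to |log2FC| >= 1
    if log2fc >= threshold:
        return "up"
    if log2fc <= -threshold:
        return "down"
    return "modest"  # significant, but below the fold cut-off


labels = {gene: classify(fc, padj) for gene, fc, padj in TABLE5_ROWS}
for gene, label in labels.items():
    print(f"{gene}: {label}")
```

Because Table 5 reports rounded values, a gene printed as exactly -1.0 (such as CRK3) may sit marginally on either side of the cut-off in the unrounded data.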

Cell cycle associated proteins

The Pfam domain associated with cyclins was significantly enriched in genes upregulated in stationary phase cells. From this, we investigated the expression profiles of the cyclins, and their associated kinases. Eleven were found to be differentially regulated between the two cell populations (Table 5).

Table 5. Significantly differentially regulated cyclins and cyclin-related kinases between stationary and log phase H. muscarum.
Gene Name H. muscarum orthologue ID log2FoldChange adjusted p-value
CRK4 HMUS00195900.1 1.3 8.89E-10
cyclin 11 HMUS01322900.1 1.2 4.32E-05
cyclin 2 HMUS00751100.1 1.2 2.72E-19
cyclin 4 HMUS00787500.1 1.1 1.53E-17
cyclin 7 HMUS00475100.1 0.8 2.41E-14
CRK10 HMUS01143000.1 0.7 6.49E-09
cyclin 5 HMUS00580100.1 0.7 2.02E-12
cyclin 10 HMUS01323000.1 0.5 0.001
CRK12 HMUS00986000.1 0.3 0.015
DNA-directed RNA polymerase III subunit, putative HMUS00638800.1 -0.3 0.032
mitochondrial DNA polymerase I protein C HMUS00828800.1 -0.5 0.006
mitochondrial DNA polymerase I protein D HMUS00617400.1 -0.5 0.018
mitochondrial DNA polymerase I protein B HMUS01100200.1 -0.6 0.007
DNA polymerase alpha/epsilon subunit B HMUS00740000.1 -0.7 0.004
DNA polymerase delta catalytic subunit HMUS00566500.1 -0.7 0.006
CRK3 HMUS00914500.1 -1.0 1.06E-40
cyclin 8 HMUS00524500.1 -1.0 1.40E-39

There was significant downregulation of the mitosis-associated cyclin 8, CRK3 and several mitochondrial DNA polymerase subunits in stationary phase cells. Knockdown of CRK3 in T. brucei is associated with a reduction in cell growth [58]. Furthermore, there was upregulation of the G1-associated cyclins 7, 4 and 11. These observations reflect the observed reductions in cell replication at higher cell densities. Consistent with this, and with a reduction in cell growth, there were also significant reductions in transcripts for α- and β-tubulins, DNA polymerases and several protein synthesis-related genes including 40S ribosomal subunits, 28S rRNAs and five putative elongation factor 2 genes. However, there was also upregulation of mitosis-associated cyclin 2 in the stationary phase cells. Cyclin 2 has two roles in T. brucei procyclics: cell cycle progression through G1 and the maintenance of correct cell morphology at the posterior end of the cell [59]. The CRKs 10 and 12, which were also upregulated in stationary phase cells, have been shown to interact with cyclin 2, and their knockdown results in growth defects [60]. CRK12 is also essential to survival of T. brucei in mice, and its depletion by RNAi led to defects in endocytosis, an enlarged flagellar pocket and abnormal kinetoplast localisation [61]. Given the relative abundance of many transcripts associated with reduced replication in stationary phase cells, the upregulation of cyclin 2 and its associated CRKs (10 and 12) may be more relevant to the maintenance of correct cell morphology than mitosis.

Stress and metabolism

Stationary phase (of growth) is associated with a build-up of toxic waste products and fewer nutrients available per cell. It was therefore unsurprising that we observed transcriptional changes indicating metabolic change and nutrient starvation. Genes containing the Pfam domain associated with the major autophagy marker ATG8 were significantly enriched in stationary phase transcripts (33 in total). Autophagy is a vital process for survival in nutrient-poor environments and involves the segregation of the cell components to be recycled into double membrane-bound vesicles called autophagosomes. The requirement for increased amounts of membrane in autophagy may partially explain the upregulation of fatty acid synthesis-related genes in stationary phase, as fatty acids are crucial components of cell membranes. Three lipases, two putative lipase precursor-like proteins, fatty-acyl-CoA synthase 1 and a putative fatty acid elongase (ELO) protein were upregulated upon entry into stationary phase. This is consistent with observations of Trypanosoma cruzi cultures [62].

Whilst the upregulation of autophagy-related genes is an indicator of cell stress, we also observed the downregulation of several genes with domains associated with responding to oxidative stress, including thioredoxin, glutathione S-transferase, alkyl hydroperoxide reductase (AhpC) and thiol specific antioxidant (TSA). As such, cells do not appear to be under significant oxidative stress. Other forms of stress, such as reduced nutrient availability or pH changes, may be driving the predicted increases in autophagy. Additionally, transcripts bearing the heat shock protein 60 (HSP60) domain (PF00118) were also significantly enriched in the downregulated transcripts, which is another indicator of cell stress.

Cell surface proteins

Proteins sharing a domain (cl28643) with the variant-specific surface proteins (VSPs) of Giardia lamblia, a flagellated intestinal pathogen, were highly represented among genes upregulated in stationary phase H. muscarum. In G. lamblia, these VSPs are integral membrane proteins rich in cysteine residues, often in CxxC repeats. They have a highly conserved C-terminal membrane-spanning region which has a hydrophilic cytoplasmic tail with a conserved five amino acid CRGKA signature sequence, and an extended polyadenylation signal [63, 64]. One VSP, of hundreds in the Giardia genome, is expressed per Giardia cell, and VSPs are thought to protect the cells from proteolysis [65]. A similar strategy of surface protein expression is utilised by blood stage T. brucei cells [66]. This method of antigen switching plays a major role in immune system avoidance and survival in vertebrate hosts. In H. muscarum the VSP domain-containing genes are predicted, by Phobius [67], to encode proteins with 8–9% cysteine residues and a single transmembrane domain at the C-terminus. Notably, there were also ten VSP domain-containing proteins downregulated upon entry into stationary phase.

In addition to the VSP domain-containing genes, several other putative surface proteins were differentially regulated upon entry to stationary phase: two putative amastin genes were highly upregulated, and eight transcripts encoding proteins with the cytomegalovirus UL20A protein domain (PF05984) were downregulated in stationary phase H. muscarum cells. The functions of proteins with UL20A domains, including the domain's namesake, are largely unknown. Deletion of UL20A from the human cytomegalovirus genome resulted in reduced viral production in infected fibroblasts [68]. Further study will be required to elucidate the role of these proteins in trypanosomatids.

Transcription

The bias towards downregulated transcripts in the stationary phase cells as compared to log phase suggests a reduction of transcription and translation during stationary phase. Furthermore, five tRNA synthetase Pfam domains (PF00133.22, PF00749.21, PF00152.20, PF00587.25, PF01411.19) were significantly enriched in downregulated transcripts (chi-squared, p < 0.05) and RNA polymerase III subunits were also downregulated. Overall, transcriptomic changes associated with cell surface remodelling, autophagy and reductions in transcription were observed in cells entering stationary phase. Cyclin expression patterns appear to suggest a bias towards cells in G1 phase, as reported for in vitro culture of T. brucei procyclics [69].
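A chi-squared enrichment test of the kind cited above can be sketched on a 2×2 table: genes with or without a given Pfam domain, inside or outside the downregulated set. The text does not state the exact contingency layout or whether a continuity correction was applied, so the version below (Pearson's statistic without correction, one degree of freedom) is an assumption, and the counts in the example are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) on [[a, b], [c, d]].

    a: genes in the set with the domain      b: genes in the set without it
    c: genes outside the set with the domain d: genes outside the set without it
    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one gene")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts for a single domain (not taken from S17 Table).
stat, p_value = chi2_2x2(10, 20, 20, 10)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")
```

The statistic is two-sided, so a small p-value alone does not show whether the domain is over- or under-represented in the set; comparing a/(a+b) with c/(c+d) gives the direction. With small expected counts an exact test would usually be preferred.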

Transcriptome of H. muscarum inside D. melanogaster compared to in vitro culture

To identify potentially important H. muscarum genes during the infection of D. melanogaster we analysed the transcriptome of the trypanosomatid over the course of infection by RNA-sequencing. RNA was purified from infected flies at 6, 12, 18, and 54 hours post-ingestion of H. muscarum. The resulting RNAs were sequenced and mapped to the concatenated genomes of D. melanogaster and H. muscarum. Reads were later resolved to the corresponding species. Here we discuss the resulting transcriptome of H. muscarum; the transcriptome of D. melanogaster after ingestion of H. muscarum in the same experiment was discussed elsewhere [20].

The number of reads which mapped to the H. muscarum genome ranged from 6949 to approx. 16.2 million reads per sample. At 6 hours post ingestion 40% of the total mapped reads were shown to map to H. muscarum (average of 3 biological replicates). This decreased to 20% in samples from 12 hours and 9% at 18 hours post ingestion. This correlates with the observed decrease in H. muscarum numbers as the parasite was cleared by D. melanogaster 18–54 hours post ingestion [20]. For differential expression analysis, only data up to 18 hours post infection was used as at 54 hours the number of sequencing reads mapping to the H. muscarum genome dropped below 1% of the total number of mapped reads (Fig 6).
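Resolving reads mapped to a concatenated reference back to species, and expressing each species as a percentage of all mapped reads, can be sketched as follows. This is a minimal illustration that assumes each mapped read is represented by the name of the contig it aligned to; the contig names and the toy read list are hypothetical, with the toy proportions chosen only to mirror the 40% reported at 6 hours.

```python
from collections import Counter

# Hypothetical contig-to-species lookup for a concatenated reference genome.
CONTIG_SPECIES = {
    "hmus_contig_1": "H. muscarum",
    "hmus_contig_2": "H. muscarum",
    "dmel_chr_2L": "D. melanogaster",
    "dmel_chr_X": "D. melanogaster",
}


def percent_by_species(read_contigs, contig_species=CONTIG_SPECIES):
    """Percentage of mapped reads assigned to each species."""
    counts = Counter(contig_species[contig] for contig in read_contigs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: 100.0 * n / total for species, n in counts.items()}


# Toy sample: one list entry per mapped read (10 reads in total).
toy_reads = (
    ["hmus_contig_1"] * 3
    + ["hmus_contig_2"]
    + ["dmel_chr_2L"] * 4
    + ["dmel_chr_X"] * 2
)
percentages = percent_by_species(toy_reads)
print(percentages)
```

In practice the contig name would be read from the alignment records rather than from a list, and multi-mapping reads would need a stated policy; neither detail is specified in the text.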

Fig 6.

A. RNA-seq reads extracted from infected flies (whole) which mapped to the H. muscarum genome. Error bars show the standard error of the mean. B. Principal component analysis of differentially expressed H. muscarum genes in log phase culture vs. samples isolated from infected flies at 6, 12 and 18 hours post-ingestion. There are two clear sample groupings (circled) which correspond to RNA from in vitro culture log phase cells and RNA isolated from infected flies. Different shades of blue indicate the sample origin (n = 3 per condition).

Principal component analysis (PCA) shows that the first two principal components of variation in mRNAs between H. muscarum from in vitro culture and H. muscarum after ingestion by D. melanogaster explained 58% and 10% of the variance in these data (Fig 6B). The PCA plot shows a high degree of difference between the in vitro samples and samples isolated from infected flies. The level of change in expression was much higher than between the two in vitro conditions discussed above.

For the infections, log phase H. muscarum cultures were used to feed the flies. In order to identify transcriptomic changes in H. muscarum associated with being ingested by the fly, we compared the transcriptome of H. muscarum cells from log phase in vitro culture to the in-fly transcriptomes. Over a third of the genome, 4,633 genes, was significantly differentially regulated (Wald test, adjusted p-value < 0.05) between log phase axenic culture samples and samples from infected flies (S18 Table). Comparisons of gene expression between sequential time points over the course of infection revealed a large initial transcriptomic change upon ingestion, with 4662 genes differentially regulated between log phase culture and six hours post ingestion. This large initial shift was followed by more subtle transcriptomic changes between 6–12 hours (204 genes) and 12–18 hours (25 genes) (adjusted p-values < 0.05). Here we describe some of the changes in gene expression observed after ingestion and how these compare with other published transcriptome studies of trypanosomatids in their insect vectors, including notable work by Inbar et al., 2017 [37] on gene expression in four morphologically distinct L. major stages in a sand fly vector and Savage et al., 2016 [70] on T. brucei in three tsetse fly tissues.

Herpetomonas muscarum genes differentially regulated at six hours post-ingestion by Drosophila melanogaster

Approximately a third of the H. muscarum genome was found to be significantly differentially expressed between log phase axenic culture and six hours post ingestion by D. melanogaster (p < 0.05) (S19 Table). Of this subset, 640 genes had a fold change of ≥ 4 between the time points, highlighting the magnitude of the trypanosomatid's response to ingestion. GO enrichment analysis, using Ontologizer [57], identified two significantly enriched GO terms in the 346 transcripts comparatively enriched at six hours post ingestion: GO:0000045 (autophagosome assembly, p = 0.0014) and GO:0003333 (amino acid transporters, p = 0.0002). Given the aforementioned lack of annotated GO terms in H. muscarum, we also looked at Pfam enrichment in the H. muscarum genes significantly upregulated upon ingestion by the fly. The top 15 represented Pfam domains in genes upregulated ≥ 4-fold at six hours post-ingestion are all significantly enriched compared to the full gene set (S20 Table). Additionally, there were several Pfam domains enriched in the downregulated transcripts, which we discuss further below.

Leucine-rich repeat proteins

The most represented Pfam domains in genes upregulated at 6 hours post ingestion were the leucine-rich repeat (LRR) domains. LRRs are primarily known to be involved in protein-protein and protein-glycolipid interactions and are the major domain of the Leishmania promastigote surface antigens (PSAs), which are known virulence factors. Ten of the upregulated LRR-containing genes encode orthologues of the Leishmania PSAs (Fig 7A). The predicted protein structures for 8/10 of these transcripts consist of a single transmembrane domain at the N-terminus, with the majority of the protein predicted to be on the external face of the cell (S21 Table). One transcript encodes a protein with no predicted transmembrane domains and could therefore be a secreted protein. The remaining transcript encodes a protein with two predicted transmembrane domains, with the region between these domains on the external face of the cell. Other upregulated LRR-containing transcripts are putative adenylate cyclases. These proteins also feature prominently among the T. brucei genes which are differentially regulated upon ingestion by tsetse [70]. These signalling proteins likely help the trypanosomatid coordinate its responses to the environment within its vector.

Fig 7. Heat map of normalised, log transformed counts for differentially expressed Herpetomonas muscarum surface proteins.

A. H. muscarum orthologues to the Leishmania promastigote surface antigens. B. Transcripts encoding proteins with a Giardia variant surface protein (PF03302.13) domain. The black bar indicates the genes from orthogroup 11 which are mostly downregulated upon ingestion of H. muscarum by the fly. C. Differentially regulated H. muscarum amastin genes. Log = log phase axenic culture samples, Stat = stationary phase axenic culture samples. 6h = six hours post ingestion by D. melanogaster, 12h = twelve hours post ingestion by D. melanogaster, 18h = eighteen hours post ingestion by D. melanogaster.

Cell surface genes

Seven of the top fifteen genes, 21/346 overall, upregulated in H. muscarum at six hours post ingestion by D. melanogaster contained the Giardia variant-specific surface protein (VSP) domain (PF03302.13). These genes are members of three distinct orthogroups. A heatmap showing the normalised read counts for these genes across all samples is shown in Fig 7B. Transmembrane domain prediction tools [67, 71] predict a single transmembrane domain at the N-terminus in the majority of the predicted protein sequences for these genes. However, there were also eight transcripts without predicted transmembrane domains, which are predicted to encode secreted proteins. The majority of these putative surface antigens are 769–781 amino acids in length and have a single predicted transmembrane helix at residues 7–29 (S21 Table). As previously mentioned, many of these proteins are also upregulated by the cells upon entry into stationary phase, though not to the same levels. Additionally, several transcripts for VSP-containing proteins are downregulated in H. muscarum upon entry into the fly. These thirteen proteins are generally smaller than those upregulated at the same time point (95–501 amino acids) and tended to be part of orthogroup 11.

Thirty amastins, from 11 different orthogroups, were differentially regulated in H. muscarum at 6 hours post ingestion (Fig 7C). The majority (21) were upregulated upon entry into the fly, though 14 transcripts were also upregulated during stationary phase in vitro culture. Each orthogroup represented contained both up- and down-regulated genes. The functions of this family of glycoproteins are not well understood. In Leishmania, amastins are more commonly associated with macrophage-dwelling amastigote forms, where they are known to be important to both survival and virulence [72]. However, it has also been shown that β-amastins are upregulated during the insect stages of the life cycle in T. cruzi [73]. The H. muscarum amastins from orthogroup 18 share only 25–30% identity (across the whole sequence) with the two pairs of T. cruzi β-amastin alleles highlighted in that study. This may initially seem quite low; however, the β-amastins have been shown to be highly divergent (18–25% identity) between T. cruzi strains [73]. Therefore, based on sequence alone, it is unclear which proteins may have parallel roles in the two trypanosomatid species.

Several other classes of surface protein genes were differentially expressed between log-phase axenic culture and six hours post-ingestion. Transcripts for proteins containing the cytomegalovirus UL20A protein domain (PF05984) were significantly downregulated upon ingestion. Five of these genes were from orthogroup 11, the same group as many of the downregulated VSP domain-containing genes. Finally, sixteen (of the twenty-eight in the genome) H. muscarum orthologues of the known Leishmania virulence factor GP63 were significantly differentially regulated in the first six hours post ingestion by the fly. All but one of the differentially regulated GP63 orthologues were predicted to be GPI-anchored at the cell surface (GPI-SOM online tool) [74]. The exception, HMUS00892600.1, is predicted (TMHMM v2.0) [71] to have a single transmembrane domain, with the majority of the protein being cytosolic. Most GP63 transcripts were upregulated in H. muscarum after ingestion (log2 fold changes 0.29–2.73); however, two putative GP63 genes, HMUS01311000 and HMUS01311200, were downregulated with log2 fold changes of -1.94 and -1.58 respectively.
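Because the text moves between linear fold changes (for example "≥ 4-fold") and log2 fold changes (for example 0.29–2.73), a small conversion helper may make the magnitudes easier to compare. This is an illustrative sketch only; the function names are hypothetical, and the four input values are the GP63 log2 fold changes quoted above.

```python
def fold_change(log2fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2fc


def describe(log2fc):
    """Express a log2 fold change as an 'x-fold up' or 'x-fold down' string."""
    fc = fold_change(log2fc)
    if fc >= 1.0:
        return f"{fc:.2f}-fold up"
    return f"{1.0 / fc:.2f}-fold down"


# GP63 log2 fold changes quoted in the text (range limits and the two
# downregulated genes).
for value in (0.29, 2.73, -1.94, -1.58):
    print(f"log2FC {value:+.2f}: {describe(value)}")
```

On this scale, the "≥ 4-fold" cut-off used for the six-hour comparison corresponds to a log2 fold change of at least 2 in either direction.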

Stress-related genes

The insect gut is a hostile environment. The presence of digestive enzymes, changes in pH and the insect’s gut microbiota make survival a difficult challenge for any invading organism. Consistent with this, a number of stress-associated genes and pathways are upregulated in H. muscarum upon entry into the fly. As previously mentioned, autophagy is an important process for survival in stressful conditions where fewer nutrients are available, such as in the midgut of an insect. Similar to what was observed in stationary phase axenic culture, twenty-six putative ATG8 genes were upregulated in H. muscarum at six hours post ingestion compared to log-phase axenic culture, suggesting extensive protein recycling is occurring in the cells. Additionally, 40 heat shock protein 83 genes were shown to be upregulated at six hours after ingestion. Heat shock proteins act as molecular chaperones which stabilise other proteins, help them to fold correctly and be regulated after damage in stressful conditions. The upregulation of these genes provides further evidence that these cells are in a stressed state.

Metabolism

There was significant enrichment of putative amino acid, pteridine and sugar transporters in the upregulated transcripts. These included the amino acid transporters (AATs) orthologous to the Leishmania amino acid permease 3 (AAP3), AAT11, AAT12 and AAT20. AAP3 has been shown to be arginine specific and is linked to virulence in L. donovani infections in humans [75]. AAT11 is upregulated during stress responses associated with purine starvation [76]. In L. major, AAP3 and AAT20 were strongly upregulated in the motile, gut-dwelling nectomonad forms [37]. These transporters have been shown to transport neutral amino acids across the cell membrane, notably proline and alanine, which can be used as alternative carbon sources by trypanosomatids and are abundant in insect vectors' haemolymph.

Six putative pteridine transporters were also upregulated in H. muscarum at 6 hours post ingestion. Pteridines are needed by trypanosomatids to produce enzyme cofactors such as biopterin. Leishmania parasites are unable to synthesize their own pteridines [77] and as such must scavenge them from their environment. It is not currently known if H. muscarum is also a pteridine auxotroph, however like the Leishmania species, the cells appear to scavenge from the environment upon entry into the fly.

Several transcripts putatively involved in lipid metabolism were downregulated in H. muscarum following ingestion by D. melanogaster, including triglyceride lipases and members of the biotin/lipoate protein ligase (BLPL) family. This contrasts with what has been observed in L. major in the midgut of sand flies, where genes from these families were upregulated [37]. Therefore, whilst upregulation of pteridine and amino acid transporters appears to be a conserved trypanosomatid response to being ingested by insects, lipid metabolism during insect infection may differ between trypanosomatid genera.

Gene expression-related transcripts

Consistent with the differential expression of many genes upon entry into the fly, and therefore a predicted increase in chromatin remodelling and translation activity, there was upregulation of histones (2A, 3 and 4), RNA polymerase subunits 1 and 2, putative 40S/60S ribosomal proteins and putative 28S beta rRNAs in H. muscarum after ingestion by the fly. This result is consistent with what has been reported in T. brucei where the 40S and 60S ribosomal subunits were amongst the most highly upregulated genes in cells isolated from the midgut and proventriculus of G. morsitans [70].

Cell cycle

Upon ingestion by the fly there was strong upregulation of putative G1-associated cyclins 4, 7 and 11 as well as the G1-associated cyclin-related kinase 1 (CRK1) [58]. Cyclin 6, cyclin 8 and CRK9, which are associated with the G2/mitosis transition [59, 78], were slightly downregulated, suggesting a reduction in cell replication at six hours post ingestion (Table 6). Consistent with this, there was also downregulation of putative DNA polymerase kappa, the theta DNA polymerase subunit and mitochondrial DNA polymerase subunits. Furthermore, the meiosis-associated genes NBS1, Rad50 and SPO11 were also downregulated.

Table 6. Cell cycle-associated proteins differentially expressed in H. muscarum upon ingestion by D. melanogaster.

Fold changes shown are at 6 hours post ingestion compared to log phase axenic culture.

| Gene name | H. muscarum orthologue ID | log2 fold change | Adjusted p-value |
|---|---|---|---|
| cyclin 11 | HMUS01322900.1 | -3.31 | 6.54E-27 |
| cyclin 4 | HMUS00787500.1 | -1.15 | 3.62E-13 |
| CRK4 | HMUS00195900.1 | -0.95 | 3.46E-03 |
| CRK1 | HMUS01116400.1 | -0.84 | 9.95E-08 |
| CRK8 | HMUS00385600.1 | -0.49 | 2.32E-02 |
| cyclin 7 | HMUS00475100.1 | -0.44 | 2.21E-02 |
| cyclin 8 | HMUS00524500.1 | 0.36 | 1.94E-02 |
| mitochondrial DNA polymerase I protein D | HMUS00617400.1 | 0.57 | 9.16E-03 |
| cyclin 6 | HMUS00719100.1 | 0.74 | 2.14E-02 |
| cyclin 5 | HMUS00580100.1 | 0.85 | 9.34E-05 |
| CRK9 | HMUS01274200.1 | 0.87 | 1.45E-03 |
| DNA polymerase theta catalytic subunit | HMUS00097200.1 | 1.15 | 1.51E-07 |
| mitochondrial DNA polymerase I protein C | HMUS00828800.1 | 1.25 | 7.27E-09 |
| DNA polymerase kappa | HMUS01207400.1 | 1.36 | 4.93E-03 |
| CRK11 | HMUS00452900.1 | 1.46 | 8.20E-04 |
| CRK12 | HMUS00986000.1 | 2.07 | 6.68E-18 |
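
The log2 fold changes in Table 6 can be easier to interpret as linear ratios. The following is a minimal sketch of that conversion, not part of the original analysis; the function name `fold_change` is hypothetical, and which condition forms the numerator of the ratio is as defined in the table legend.

```python
def fold_change(log2_fc: float) -> float:
    """Convert a log2 fold change into a linear expression ratio."""
    return 2.0 ** log2_fc


# A few log2 fold changes copied from Table 6.
table6_examples = {
    "cyclin 11": -3.31,
    "cyclin 4": -1.15,
    "DNA polymerase kappa": 1.36,
    "CRK12": 2.07,
}

for gene, log2_fc in table6_examples.items():
    # A ratio below 1 means lower abundance in the numerator condition.
    print(f"{gene}: log2FC = {log2_fc:+.2f}, ratio = {fold_change(log2_fc):.2f}")
```

On this scale, an absolute log2 fold change of 1 corresponds to a two-fold difference in transcript abundance.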

Given the apparent reduction in replication rate in H. muscarum cells at six hours after ingestion, the upregulation of nine tubulin genes (3 alpha- and 6 beta-tubulins) is likely to accommodate the changes in cell morphology, rather than to produce new daughter cells. Tubulin upregulation is also observed in T. brucei isolated from the midgut and proventriculus of Glossina morsitans [70], though these cells are replicative; as such, the ‘motivation’ for increased tubulin gene expression may be different.

Differentiation and RNA-binding proteins

It is well documented that (human) disease-causing trypanosomatids have several life-cycle stages within their respective vectors. Coordinated differentiation between these discrete stages requires a suite of RNA-binding proteins (RBPs) which regulate parasite gene expression [38]. Despite the lack of observed differentiated forms in infections of D. melanogaster, several differentiation-associated RBPs are differentially regulated in the trypanosomatid after infection, including RBP10 and hnRNP F/H. These proteins have been shown to regulate gene expression in T. brucei bloodstream forms [41, 79]. RNAi knockdown of RBP10 in bloodstream trypanosomes resulted in the downregulation of a large number of bloodstream-form mRNAs [41]. The same study showed that overexpression of the protein in procyclics led to an increase of many bloodstream-form specific mRNAs, including genes involved in sugar transport. This is likely because blood is a glucose-rich environment and the cell will attempt to utilize this ready carbon source [80]. Three out of the four orthologues of TbRBP10 were strongly (> 4-fold) upregulated in H. muscarum cells after ingestion by D. melanogaster. During feeding experiments, sucrose is added to the H. muscarum culture media to encourage the flies to feed. As such, these genes may be upregulated in response to the increased sugars available in the environment.

However, several other cell-cycle regulating RBPs associated with blood-stream form trypanosomes were also upregulated in H. muscarum after ingestion by the fly, including zinc-finger domain-containing RBPs ZC3H11 and ZC3H18. The former is essential in bloodstream-form trypanosomes and is involved in protection from heat shock, whilst depletion of ZC3H18 delayed blood stream form-to-procyclic differentiation in T. brucei [81, 82]. As such the situation may be more complex than solely metabolism-driven expression changes.

In addition to parallels with blood-stream form trypanosomes, transcripts for ALBA3/4 proteins (named for their ‘acetylation lowers binding affinity’ domain) were significantly downregulated in H. muscarum upon entry into the fly. In T. brucei, these proteins are expressed in all stages, except those found in the tsetse proventriculus. RNAi knockdown of these proteins in T. brucei axenic procyclics resulted in elongation of the cell body and repositioning of the nucleus and the kinetoplast to resemble the epimastigote cell-stage [83]. As such the reduction in ALBA3/4 transcripts suggests there may be parallels between trypanosomes during the latter stages of tsetse infection and H. muscarum during D. melanogaster infection.

Other differentially regulated RNA-binding proteins with as yet unclear roles in differentiation included: the essential gene expression regulation protein RBP42 and ZC3H12, a protein associated with differentiation [38].

Herpetomonas muscarum genes differentially regulated between six- and twelve-hours post-ingestion by Drosophila melanogaster

There were 204 genes which were differentially regulated between six- and twelve-hours post ingestion (p-adjusted < 0.05); 161 of these had a fold change of ≥ 2, with just 31 genes upregulated at the later timepoint (S22 Table). Hypothetical proteins lacking functional information dominated the highly upregulated genes. The most enriched transcript at 12 hours post ingestion encodes a putative surface protein, the top blastp hit for which was the Giardia variant-specific surface protein VSP136-4. This suggests VSP domain-containing proteins continue to be important throughout infection of the fly. Two DNA replication and repair associated transcripts were also upregulated at 12 hours post ingestion: an orthologue of T. brucei cell division cycle protein 45 (CDC45), and a tyrosyl-DNA phosphodiesterase-like protein. CDC45 is part of the CMG (Cdc45·Mcm2–7·GINS) complex which functions as a helicase during DNA replication [84] and may also play a role in DNA repair [85]. Furthermore, tyrosyl-DNA phosphodiesterases are involved in the repair of topoisomerase-related DNA damage [86]. These observations indicate that H. muscarum cells are under genotoxic stress after ingestion by D. melanogaster.
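
The two-stage filtering described above (an adjusted p-value cut-off, then a fold-change cut-off, then a split by direction) can be sketched as follows. This is an illustration only: the gene records and the helper names `is_significant` and `passes_fold_change` are hypothetical, and the thresholds are the ones quoted in the text (p-adjusted < 0.05, fold change ≥ 2).

```python
import math


def is_significant(padj: float, alpha: float = 0.05) -> bool:
    """True if the adjusted p-value is below the significance cut-off."""
    return padj < alpha


def passes_fold_change(log2_fc: float, min_fold: float = 2.0) -> bool:
    """True if the change is at least `min_fold` in either direction."""
    return abs(log2_fc) >= math.log2(min_fold)


# Hypothetical records, for illustration only (not data from the study).
genes = [
    {"id": "geneA", "log2_fc": 1.8, "padj": 0.001},
    {"id": "geneB", "log2_fc": -0.4, "padj": 0.01},
    {"id": "geneC", "log2_fc": -2.3, "padj": 0.20},
    {"id": "geneD", "log2_fc": -1.0, "padj": 0.04},
]

significant = [g for g in genes if is_significant(g["padj"])]
strong = [g for g in significant if passes_fold_change(g["log2_fc"])]
upregulated = [g["id"] for g in strong if g["log2_fc"] > 0]
downregulated = [g["id"] for g in strong if g["log2_fc"] < 0]

print(len(significant), len(strong), upregulated, downregulated)
```

Applying the fold-change filter only after the significance filter mirrors the order in which the counts are reported in the text.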

Herpetomonas muscarum genes differentially regulated between twelve- and eighteen-hours post-ingestion by Drosophila melanogaster

Among the 23 genes found to be upregulated at 18 hours post ingestion (compared to 12 hours, see S23 Table), genes involved in binding to damaged DNA (OG00033330) were significantly enriched. Only two of these transcripts could be assigned putative functions: eukaryotic replication factor A and a structure-specific endonuclease. This observation provides further evidence of genotoxic stress in H. muscarum after ingestion, as indicated by the other DNA repair genes upregulated at 12 hours post-ingestion.

The most highly upregulated transcript at 18 hours post ingestion was an orthologue of the L. major UDP-galactose transporter LPG5B. This protein allows import of UDP-galactose into the Golgi body, where it is used to synthesize phosphoglycans. Capul et al. (2007) showed that, in L. major, loss of LPG5B resulted in cells with defects in proteophosphoglycans (PPGs) [87]. PPGs are known virulence factors and are found in membrane-bound, filamentous and secreted forms. The viscous secreted PPG is thought to protect L. major in the gut and may also force the fly to regurgitate the infective Leishmania cells into the bite wounds of vertebrates.

Herpetomonas muscarum genes differentially regulated between stationary phase in vitro culture and in-fly samples

Comparisons between stationary phase in vitro culture and in-fly samples revealed 5102 differentially expressed genes (adjusted p-value < 0.05). Approximately 55% of the genes differentially regulated between in vitro and in-fly samples were the same for log phase vs. in-fly and stationary phase vs. in-fly comparisons (Fig 8). However, 1639 genes were only significantly differentially regulated in stationary phase vs. in-fly comparisons (S24 Table). Genes differentially regulated between log phase in vitro culture and in-fly samples have already been discussed. We now outline the genes that are differentially regulated only when the transcriptomes of stationary phase in vitro samples of H. muscarum are compared with those after ingestion by D. melanogaster. Of the 1639 genes, 750 had a fold change of ≥ 2, approximately a third of which were upregulated in H. muscarum after ingestion by D. melanogaster.

Fig 8. Venn diagram showing the numbers of genes differentially expressed in Herpetomonas muscarum between two in vitro culture conditions and after ingestion by Drosophila melanogaster.
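
The partition summarised in the Venn diagram amounts to simple set operations on the lists of differentially expressed genes from each comparison. A minimal sketch, using hypothetical gene identifiers rather than data from the study:

```python
# Hypothetical sets of differentially expressed gene IDs for each comparison.
log_vs_fly = {"gene1", "gene2", "gene3", "gene4"}
stationary_vs_fly = {"gene3", "gene4", "gene5"}

# Genes differentially expressed in both comparisons (the Venn overlap).
shared = log_vs_fly & stationary_vs_fly

# Genes differentially expressed only against stationary phase culture.
stationary_only = stationary_vs_fly - log_vs_fly

# Genes differentially expressed only against log phase culture.
log_only = log_vs_fly - stationary_vs_fly

print(sorted(shared), sorted(stationary_only), sorted(log_only))
```

In the study, the stationary-only set corresponds to the 1639 genes listed in S24 Table.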

Half of the top ten in-fly enriched transcripts were TATE (telomere-associated mobile element) DNA transposons, and among the most represented Pfam hits in the fly-enriched transcripts were reverse transcriptase (PF00078.27) and phage integrase (PF00589.22) domains (S25 Table). Though TATE DNA transposons comprise 1.32% of the L. major genome, very little is known about these transposable elements, other than that they contain a tyrosine recombinase [88]. It is possible that these transposable elements are more mobile in H. muscarum cells ingested by the fly. However, we predict that the overall level of transcription in cells from stationary phase cultures is reduced (vs. log phase, see above). As such, the comparative increase in TATE transposon transcription between stationary phase cells and H. muscarum from Drosophila may not be specifically a result of ingestion, but a reflection of general transcription levels in the two groups of cells.

As previously discussed, transcripts for several proteins containing a Giardia VSP domain are enriched in stationary phase compared to log phase in axenic culture. However, five were shown to be even more abundant in the H. muscarum cells ingested by D. melanogaster. Two other putative surface antigens were also enriched in ingested H. muscarum which contained a domain similar to Cytomegalovirus UL20A glycoprotein and the domain of unknown function DUF4148.

Transcripts encoding putative antioxidant proteins were significantly enriched in H. muscarum after ingestion by the fly. Enriched Pfam domains in the upregulated gene set included thioredoxin, glutathione S-transferase and alkyl hydroperoxide reductase (AhpC)/thiol specific antioxidant (TSA) domains. Our previous work showed that the D. melanogaster response to H. muscarum ingestion included the production of reactive oxygen species [20]. As such, the upregulation of these antioxidant proteins is likely an attempt to cope with this insect immune response.
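
Pfam enrichment in this study was assessed with chi-squared tests (see the S20 Table legend). The sketch below shows one way such a test could be computed for a single domain from a 2x2 table of counts. It is an assumption-laden illustration, not the authors' script: the function name `chi2_2x2` and the counts are hypothetical, no continuity correction is applied, and the exact contingency-table layout used in the study is not stated in the text.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared statistic and p-value (1 d.f.) for [[a, b], [c, d]].

    a: upregulated genes with the domain     b: upregulated genes without it
    c: other genes with the domain           d: other genes without it
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one gene")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 30 of 1000 upregulated genes carry the domain,
# compared with 60 of the remaining 12000 genes.
stat, p_value = chi2_2x2(30, 970, 60, 11940)
print(f"chi2 = {stat:.2f}, p = {p_value:.2e}")
```

When many domains are tested, the resulting p-values would normally also need correcting for multiple testing.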

Conclusion

Here we have described the genome and predicted proteome of the monoxenous trypanosomatid H. muscarum and characterised the transcriptome of the parasite both in culture and inside the gut of its natural host D. melanogaster. H. muscarum shows similarity in both genome structure and content to Leishmania, with significant synteny to L. major and sharing 80% of orthogroups with other members of the subfamily Leishmaniinae. While most Herpetomonas genes have orthologs in other trypanosomatids, a number of genes found elsewhere appear to have been lost in Herpetomonas, in particular genes associated with the specialised life stages of dixenous trypanosomatids. We might expect loss of some mammal-stage specific genes, such as HASPs, HERPs and sphingolipid synthesis genes important in metacyclic Leishmania cells, but more surprising might be the loss of genes expressed in insect stages such as BARPs and procyclins.

The transcriptome of Herpetomonas inside its insect host also showed strong parallels with the responses of Leishmania promastigotes inside the sand fly gut, in particular both parasites showing significant upregulation of PSAs and GP63 (this study; see ref. [37]). These proteins have been shown to be associated with virulence in Leishmania and are important for establishment of parasite infection in the midgut, and so for transmission. The extensive changes in transcript abundance of genes likely to be expressed on the cell surface during insect infections include a number of gene families not known to be important in dixenous trypanosomatids (e.g. those related to Giardia variant surface protein). This implies that a dynamic cell surface may be a shared feature of trypanosomatid life cycles beyond dixenous groups [89], and that even more diversity of surface proteins may be present in the monoxenous trypanosomatids, supporting findings from free-living kinetoplastids. We also note that the majority of the genes showing changes in expression later in insect infections are hypothetical, including many hypothetical genes conserved with other trypanosomatids. This reflects similar findings in better-studied dixenous parasites [37, 70] and highlights how much we still have to learn about the interactions between trypanosomatids and their insect host.

There are few data on the percentage of wild sand flies with established Leishmania infection in endemic regions. In this context, the parallel to the more accessible Drosophila-Herpetomonas system is important, as the genetic component of the parasite that influences midgut establishment is easier to determine. However, more work is needed to ascertain whether genes upregulated in Leishmania and those in Herpetomonas are truly functionally related. One limitation is the difference between the lifestyles of these insects. Most strikingly, female sand flies become infected with Leishmania during blood feeding, while Drosophila is never haematophagous. Nevertheless, sand flies are also plant feeders, so there is some overlap in the ecological niche as well as in their basic biology. The presence of trypanosomatids is another shared feature of the midgut landscape of these flies, and our data suggest that at least some aspects of the molecular interaction between flies and trypanosomatids may also be conserved.

Materials and methods

Herpetomonas muscarum culture

H. muscarum were cultured in supplemented BHI (3% brain heart infusion broth, 2.5mg/ml haemin, 1% FCS) and incubated at 28°C. For most experiments, cells were maintained in a log phase of growth by splitting every 3 days.

Infection of D. melanogaster (see reference 20)

For each independent infection of a group of 20–30 flies, 10^7 H. muscarum cells were harvested from a 3-day-old culture (which, in our experience, showed the highest infectivity rate) and resuspended in 500 μl of 1% sucrose. The parasite solution was then transferred to a 21 mm Whatman Grade GF/C glass microfibre filter circle (Fisher Scientific). Circles containing the parasite cells were placed into a standard small Drosophila culture vial without any food. The flies used in the infections were 4–5 days old before they were starved overnight. After starvation, the flies were transferred to the vials that contained the Whatman circles with the parasite cells. After 6 h of feeding, flies were moved to and reared on standard yeast/molasses medium. At different time points post oral infection, infected flies were collected for downstream experiments and frozen at -80°C for molecular analyses.

DNA extraction for genome sequencing

Genomic DNA was extracted from 100 million log-phase H. muscarum cells grown in vitro using the Norgen Biotek Genomic DNA extraction kit according to the manufacturer's instructions.

RNA extraction for RNA-seq

8 ml of H. muscarum promastigote culture at a density of 9.25 x 10^6 cells per ml (measured by haemocytometer) was diluted 1:40 in supplemented BHI and divided between 4 tissue culture flasks. The immediate post-dilution density was 6.5 x 10^5 cells per ml. The following day the cell density was measured to be 1.18 x 10^6 cells per ml. 45 ml was taken from each flask and the cells were pelleted by centrifugation for 10 min at 1000 x g. The supernatant was discarded and RNA was purified from the cell pellet using the Norgen Biotek RNA Purification kit according to the manufacturer's instructions. This process was repeated for 5.3 ml of the remaining culture three days later, when the cell density was 1.21 x 10^7 cells per ml. The resulting RNA was eluted at concentrations of 97–170 ng per μl with a 260/230 absorbance ratio of 1.86–2.19.
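
As a quick check on the quantities above, the number of cells going into each RNA extraction is the harvested volume multiplied by the measured density. The sketch below works through this arithmetic. It assumes the densities read as 1.18 x 10^6 and 1.21 x 10^7 cells per ml, and the function name `cells_harvested` is hypothetical; the text does not state whether the two volumes were chosen to match cell numbers.

```python
def cells_harvested(volume_ml: float, density_per_ml: float) -> float:
    """Total cells in a sample: harvested volume times cell density."""
    return volume_ml * density_per_ml


# Log phase sample: 45 ml per flask at 1.18 x 10^6 cells per ml.
log_phase_cells = cells_harvested(45, 1.18e6)

# Stationary phase sample: 5.3 ml at 1.21 x 10^7 cells per ml.
stationary_phase_cells = cells_harvested(5.3, 1.21e7)

print(f"log phase:        {log_phase_cells:.2e} cells")
print(f"stationary phase: {stationary_phase_cells:.2e} cells")
```

Under these assumptions both extractions start from a similar order of magnitude of cells (roughly 5 to 6 x 10^7).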

Reference genome

To produce the reference genome, Illumina and Pacific Biosciences sequencing platforms were used. For Illumina sequencing, 1 μg of genomic DNA was sheared into 300–500 base pair (bp) fragments by focused ultrasonication (Covaris Adaptive Focused Acoustics technology, AFA Inc., Woburn, USA). An amplification-free Illumina library was prepared [90] and 150 bp paired-end reads were generated on an Illumina MiSeq following the manufacturer’s standard sequencing protocols [91]. For the Pacific Biosciences SMRT technology, 8 μg of genomic DNA was sheared to 20–25 kb by passing through a 25mm blunt ended needle. A SMRT bell template library was generated using the Pacific Biosciences issued protocol (20 kb Template Preparation Using BluePippin(tm) Size-Selection System). After a greater than 7 kb size-selection using the BluePippin(tm) Size-Selection System (Sage Science, Beverly, MA), the library was sequenced using P6 polymerase and chemistry version 4 (P6C4) on 6 single-molecule real-time (SMRT) cells [92].

The Pacific Biosciences reads were assembled with HGAP3 [93], with the genome size parameter set to 25 Mb, to produce 285 contigs. The resulting assembly was then corrected with ICORN2 [94] for five iterations. Using the Argus Optical Mapping System from OpGen, an optical map was generated from high molecular weight genomic DNA captured in agarose plugs and the restriction enzymes KpnI and BamHI. The data were analysed with the associated MapManager and MapSolver software tools (http://www.opgen.com/products-services/argus-system). The optical map consisted of 37–38 chromosomes, approximately half of which were contiguous. With the information obtained from the optical map and REAPR [95], manual genome improvement was performed on the PacBio assembly to produce a final genome assembly of 181 contigs. Analysis of the frequency distribution of k-mers was performed using GenomeScope version 1.0 [22], with the k-mer frequencies estimated using Jellyfish [96] with the default parameters suggested in the GenomeScope manual.
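
To illustrate what a k-mer frequency distribution is, the toy sketch below counts k-mers in a handful of reads and then tabulates how many distinct k-mers occur at each multiplicity, which is the kind of histogram a profiling tool takes as input. It is not a substitute for Jellyfish or GenomeScope: the function names are hypothetical, the reads are invented, and reverse-complement k-mers are not merged.

```python
from collections import Counter


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count every k-mer in the reads, skipping k-mers with ambiguous bases."""
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_histogram(counts: Counter) -> Counter:
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    return Counter(counts.values())


# Two invented overlapping reads, for illustration only.
reads = ["ACGTAC", "CGTACG"]
counts = count_kmers(reads, k=3)
histogram = kmer_histogram(counts)

for multiplicity, n_kmers in sorted(histogram.items()):
    print(multiplicity, n_kmers)
```

With real sequencing data, the position of the main peak in such a histogram reflects sequencing depth, which is what allows genome size to be estimated without a reference.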

Transcriptomic libraries

Poly-A mRNA was purified from total RNA using oligodT magnetic beads and strand-specific indexed libraries were prepared using the KAPA Stranded RNA-Seq kit followed by ten cycles of amplification using KAPA HiFi DNA polymerase (KAPA Biosystems). Libraries were quantified and pooled based on a post-PCR Agilent Bioanalyzer and 75 bp paired-end reads were generated on the Illumina HiSeq v4 following the manufacturer's standard sequencing protocols (as above).

Data release

All sequencing data was submitted to the European Nucleotide Archive (ENA) under accession number ERP008869.

Genome annotation

CRAM output files containing RNA sequencing reads from both H. muscarum in vitro culture and infected D. melanogaster were converted to fastq format and then mapped to the genome sequence using the next-generation sequencing read alignment package HISAT2 version 2.1.0 [97]. The mapped reads from each sample were assembled into transcripts with the Cufflinks package version 2.2.1 [98] and merged to form a single transcript set for all reads. The Companion annotation tool [23] was then used to generate several genome annotation files based on the RNA sequencing transcriptomic evidence and pre-existing gene models from three other trypanosomatids: L. braziliensis, L. major and T. brucei (individual annotation statistics are given in S26 Table).

Orthofinder proteome analysis

The following proteomes were used as input to Orthofinder: Trypanosoma brucei brucei 927 v5.1 [24], Trypanosoma brucei gambiense DAL972 v3 [99], Trypanosoma congolense IL3000 [100], Trypanosoma cruzi (CL Brener) [27], Trypanosoma evansi STIB805 [101], Trypanosoma grayi ANR4 v1 [102], Trypanosoma rangeli SC_58 v1 [103], Trypanosoma theileri Edinburgh [104], Trypanosoma vivax Y486 [105], Leishmania braziliensis M2903 [56], Leishmania donovani BPK282 v1 [105], Leishmania infantum JPCM5 [56], Leishmania major Friedlin v6 [106], Leptomonas pyrrhocoris ASM129339v1 [11], Leptomonas seymouri ASM129953v1 [107], Crithidia bombi [9], Crithidia expoeki [9], Crithidia fasciculata v14.0 [108], Angomonas deanei [8], Phytomonas EM1 [109] and Bodo saltans v3 [110]. Where possible, the above sequences were obtained from TriTrypDB v41 [111].

RNAseq analysis in vitro culture

CRAM output files were converted to fastq format and then mapped to the concatenated D. melanogaster and H. muscarum genome sequences using the hisat2 [98] mapper. Mapped reads were then counted using HTseq-count (v. 0.10.0) [112] and differential expression analysed using the DESeq2 package in R [113].
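
The adjusted p-values reported throughout come from DESeq2, which by default applies the Benjamini-Hochberg procedure. The sketch below shows that procedure in isolation, as an aid to interpreting the "adjusted p-value" columns. It is an illustration with invented p-values, not a reimplementation of DESeq2, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)

for p, q in zip(raw, adjusted):
    print(f"p = {p:.2f} -> adjusted p = {q:.4f}")
```

A gene is then called differentially expressed when its adjusted p-value falls below the chosen threshold (0.05 in this study).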

RNAseq analysis samples from whole flies

Total RNA from 8–10 flies at 6 h, 12 h and 18 h post H. muscarum oral infection was extracted with the total RNA purification kit from Norgen Biotek following the manufacturer’s instructions. Each time point was repeated in three independent experiments. cDNA libraries were prepared with the Illumina TruSeq RNA Sample Prep Kit v2. All sequencing was performed on the Illumina HiSeq 2000 platform using TruSeq v3 chemistry (Oxford Gene Technology, OGT). All sequencing was paired-end and performed over 100 cycles. Read files (fastq) were generated and then mapped to the concatenated D. melanogaster and H. muscarum genome sequences using the hisat2 mapper [98]. Mapped reads were then counted using HTseq-count (v. 0.10.0) [112] and differential expression analysed using the DESeq2 package in R [113].
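
Because reads were mapped to a concatenated host and parasite reference, the resulting count table contains genes from both organisms. One simple way to separate them before analysing the parasite alone is by gene identifier prefix, sketched below. This is an assumption about how the split could be done, not a description of the authors' code: the function name is hypothetical, the `HMUS` prefix follows the H. muscarum identifiers shown in Table 6, and the host identifier and counts are invented.

```python
def split_counts_by_species(
    counts: dict[str, int], parasite_prefix: str = "HMUS"
) -> tuple[dict[str, int], dict[str, int]]:
    """Split a gene-count table into parasite and host genes by ID prefix."""
    parasite: dict[str, int] = {}
    host: dict[str, int] = {}
    for gene_id, n_reads in counts.items():
        # htseq-count appends summary rows such as __no_feature; skip them.
        if gene_id.startswith("__"):
            continue
        if gene_id.startswith(parasite_prefix):
            parasite[gene_id] = n_reads
        else:
            host[gene_id] = n_reads
    return parasite, host


# Invented counts; "FBgn0000001" stands in for a D. melanogaster gene ID.
example_counts = {
    "HMUS01322900.1": 12,
    "FBgn0000001": 340,
    "__no_feature": 99,
}
parasite_counts, host_counts = split_counts_by_species(example_counts)
print(parasite_counts, host_counts)
```

Mapping against both genomes at once helps prevent fly-derived reads from being misassigned to parasite genes in whole-fly samples.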

Supporting information

S1 Fig. GenomeScope kmer profile and model for H. muscarum genome.

(PDF)

S2 Fig. Principal component analysis of differentially expressed H. muscarum genes in log phase culture vs. stationary phase culture.

(A) There are two clear sample groupings (circled), which correspond to RNA from each condition (n = 3 per condition). Dark blue = log phase samples and light blue = stationary phase samples.

(PDF)

S1 Table. Coordinates of putative strand switch regions in the H. muscarum genome.

(XLSX)

S2 Table. BLAST hits for the Phytomonas serpens spliced leader sequence in the H. muscarum genome.

(XLSX)

S3 Table. Alignment of intronic region of the mini-exon gene from several trypanosomatids of the Leishmaniinae clade and Protein orthogroups from Orthofinder analysis (in full).

The first 15bp of the intron sequences appear to be conserved across the clade with the sequence becoming more variable thereafter.

(XLSX)

S4 Table. H. muscarum proteins orthologous to important T. brucei proteins: metabolism-associated proteins.

(XLSX)

S5 Table. H. muscarum proteins orthologous to important T. brucei proteins: Differentiation- and RNA-associated proteins.

(XLSX)

S6 Table. H. muscarum proteins orthologous to important T. brucei proteins: RNAi-associated proteins.

(XLSX)

S7 Table. H. muscarum proteins orthologous to important T. brucei proteins: Phosphatases.

(XLSX)

S8 Table. H. muscarum proteins orthologous to important T. brucei proteins: Protein kinases.

(XLSX)

S9 Table. H. muscarum proteins orthologous to important T. brucei proteins: GP63.

(XLSX)

S10 Table. H. muscarum proteins orthologous to important T. brucei proteins: Mucins.

(XLSX)

S11 Table. H. muscarum proteins without orthologues in other trypanosomatids.

(XLSX)

S12 Table. H. muscarum proteins orthologous to important T. brucei proteins: Kinetochore-associated proteins.

(XLSX)

S13 Table. H. muscarum proteins orthologous to important T. brucei proteins: Spliceosome-associated proteins.

(XLSX)

S14 Table. H. muscarum proteins orthologous to important T. brucei proteins: Exosome-associated proteins.

(XLSX)

S15 Table. H. muscarum proteins orthologous to important T. brucei proteins: Nuclear proteins.

(XLSX)

S16 Table. H. muscarum genes differentially expressed between log and stationary phase H. muscarum in vitro culture.

(XLSX)

S17 Table. Pfam domains significantly enriched in differentially regulated H. muscarum genes upon entry into stationary phase during axenic culture (vs. log phase).

(XLSX)

S18 Table. H. muscarum genes differentially expressed between log phase in vitro culture and after ingestion by D. melanogaster (all samples).

(XLSX)

S19 Table. H. muscarum genes differentially expressed between log phase in vitro culture and 6 hours after ingestion by D. melanogaster.

(XLSX)

S20 Table. Significantly enriched Pfam domains in differentially regulated Herpetomonas muscarum genes at six hours post ingestion by Drosophila melanogaster (vs log-phase axenic culture).

The table shows the top 10 represented Pfam domains in the significantly up- and downregulated genes. Chi-squared tests were performed to test for statistically significant enrichment of the Pfams frequency in upregulated genes vs. the Pfams in the whole genome.

(XLSX)

S21 Table. Structural predictions for differentially expressed H. muscarum surface proteins.

Structural predictions were acquired using the TMHMM1.0 online tool (Krogh et al., 2001).

(XLSX)

S22 Table. H. muscarum genes differentially expressed between 6 and 12 hours after ingestion by D. melanogaster.

(XLSX)

S23 Table. H. muscarum genes differentially expressed between 12 and 18 hours after ingestion by D. melanogaster.

(XLSX)

S24 Table. Table of genes differentially regulated between Herpetomonas muscarum after ingestion by Drosophila melanogaster vs. stationary phase axenic culture and not log phase axenic culture, p-adjusted < 0.05.

(XLSX)

S25 Table. The top 10 represented Pfam domains in Herpetomonas muscarum ingested by Drosophila melanogaster vs. stationary phase axenic culture.

(XLSX)

S26 Table. Summary statistics of three H. muscarum genome annotations using gene models from L. major, L. braziliensis and T. brucei and a maximal annotation which combines all three annotations.

(XLSX)

Acknowledgments

We thank the staff of the DNA pipelines at Wellcome Sanger Institute for sequencing and generating sequencing libraries.

Data Availability

All sequencing data are available at the European Nucleotide Archive (ENA) under accession number ERP008869.

Funding Statement

KB, TDO, MJS and JAC were supported by Wellcome via their core support for the Wellcome Sanger Institute (WSI) through grant 206194. Work in Oxford was supported by a Consolidator grant from the European Research Council (310912 Droso-Parasite, to PL), project grant BB/K003569 from the Biological and Biotechnological Sciences Research Council (to PL) and a Wellcome Trust doctoral scholarship (to MAS). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Pacheco, Raquel S, Marzochi, Mauro CA, Pires, Marize Q, Brito, Célia MM, Madeira, de Fátima Maria, & Barbosa-Santos, Elizabeth GO. (1998). Parasite Genotypically Related to a Monoxenous Trypanosomatid of Dog's Flea Causing Opportunistic Infection in an HIV Positive Patient. Memórias do Instituto Oswaldo Cruz, 93(4), 531–537. 10.1590/s0074-02761998000400021
  • 2. Zídková L., Cepicka I., Votypka J., Svobodová M. 2010. Herpetomonas trimorpha sp. nov. (Trypanosomatidae, Kinetoplastida), a parasite of the biting midge Culicoides truncorum (Ceratopogonidae, Diptera). International Journal of Systematic and Evolutionary Microbiology. 60(9): pp. 2236–2246.
  • 3. Rowton E. D. and Barclay McGhee R. (1978) ‘Population Dynamics of Herpetomonas ampelophilae, with a Note on the Systematics of Herpetomonas from Drosophila spp.’, The Journal of Protozoology. John Wiley & Sons, Ltd, 25(2), pp. 232–235. 10.1111/j.1550-7408.1978.tb04402.x
  • 4. Lange C. E., and Lord J. (2012). “Protistan entomopathogens,” in Insect Pathology, 2nd Edn, eds Vega B. and Kaya H. (Amsterdam: Elsevier), 367–394. 10.1016/B978-0-12-384984-7.00010-5
  • 5. Vega F. E. and Kaya H. K. 2012. Insect Pathology Second Edition Academic Press; Amsterdam (The Netherlands) and Boston (Massachusetts). Elsevier. ISBN: 978-0-12-384984-7.
  • 6. Erwin T. L. 1983. Tropical forest canopies: the last biotic frontier. Bulletin of the Entomological Society of America, Volume 29: 14–19.
  • 7. Hallmann CA, Sorg M, Jongejans E, Siepel H, Hofland N, et al. (2017) More than 75 percent decline over 27 years in total flying insect biomass in protected areas. PLOS ONE 12(10): e0185809 10.1371/journal.pone.0185809
  • 8. Motta M. C. M., Martins A. C., de Souza S. S., Catta-Preta C. M., Silva R., Klein C. C., de Almeida L. G., de Lima Cunha O., Ciapina L. P., Brocchi M. 2013. Predicting the Proteins of Angomonas deanei, Strigomonas culicis and Their Respective Endosymbionts Reveals New Aspects of the Trypanosomatidae Family. PLoS ONE. 8(4): e60209 10.1371/journal.pone.0060209
  • 9. Schmid-Hempel P. et al. (2018) ‘The genomes of Crithidia bombi and C. expoeki, common parasites of bumblebees’, PLoS ONE, 13(1). 10.1371/journal.pone.0189738
  • 10. Runckel C., DeRisi J., Flenniken M. L. 2014. A draft genome of the honey bee trypanosomatid parasite Crithidia mellificae. PLoS ONE, 9(4).
  • 11. Flegontov P., Butenko A., Firsov S., Kraeva N., Eliáš M., Field M. C., Filatov D., Flegontova O., Gerasimov E. S., Hlaváčová J. et al., 2016. Genome of Leptomonas pyrrhocoris: A high-quality reference for monoxenous trypanosomatids and new insights into evolution of Leishmania. Scientific Reports, 6.
  • 12. Teixeira S. M., de Paiva R. M., Kangussu-Marcolino M. M., Darocha W. D. 2012. Trypanosomatid comparative genomics: Contributions to the study of parasite biology and different parasitic diseases. Genetics and Molecular Biology, 35(1): pp. 1–17.
  • 13. Zoltner M, Krienitz N, Field MC, Kramer S (2018) Comparative proteomics of the two T. brucei PABPs suggests that PABP2 controls bulk mRNA. PLOS Neglected Tropical Diseases 12(7): e0006679 10.1371/journal.pntd.0006679
  • 14. Stuart K., Allen T. E., Heidmann S., Seiwert S. D. 1997. RNA editing in kinetoplastid protozoa. Microbiology and Molecular Biology Reviews, 61(1): pp. 105–120.
  • 15. Liang X. Haritan, A., Uliel, S., Michaeli, S. 2003. trans and cis Splicing in Trypanosomatids: Mechanism, Factors, and Regulation. Eukaryotic Cell, 2(5): pp. 830–840. 10.1128/EC.2.5.830-840.2003
  • 16. Chen J., Rauch C. A., White J. H., Englund P. T., Cozzarelli N. R. 1995. The topology of the kinetoplast DNA network. Cell, 80(1): pp. 61–69. 10.1016/0092-8674(95)90451-4
  • 17. Lukeš J., Guilbride D. L., Votýpka J., Zíková A., Benne R., Englund P. T. 2002. Kinetoplast DNA Network: Evolution of an Improbable Structure. Eukaryotic Cell, 1(4): pp. 495–502. 10.1128/EC.1.4.495-502.2002
  • 18. Borghesan T. C., Ferreira R. C., Takata C. S., Campaner M., Borda C. C., Paiva F., Milder R. V., Teixeira M. M., Camargo E. P. 2013. Molecular Phylogenetic Redefinition of Herpetomonas (Kinetoplastea, Trypanosomatidae), a Genus of Insect Parasites Associated with Flies. Protist. Urban & Fischer, 164(1): pp. 129–152.
  • 19. Simpson L. and Thiemann O. H. 1995. Sense from nonsense: RNA editing in mitochondria of kinetoplastid protozoa and slime molds. Cell, 81: pp. 837–840. 10.1016/0092-8674(95)90003-9
  • 20. Wang L., Sloan M., Ligoxygakis P. 2018. Intestinal NF-κB and STAT signalling is important for uptake and clearance in a Drosophila-Herpetomonas interaction model. PLoS Genet 15(3): e1007931 10.1371/journal.pgen.1007931
  • 21. Van Den Abbeele J., & Rotureau B. (2013). New insights in the interactions between African trypanosomes and tsetse flies. Frontiers in cellular and infection microbiology, 3, 63 10.3389/fcimb.2013.00063
  • 22. Vurture G. W., Sedlazeck F. J., Nattestad M., Underwood C. J., Fang H., Gurtowski J., Schatz M. C. 2017. GenomeScope: fast reference-free genome profiling from short reads, Bioinformatics, 33(14): pp. 2202–2204. 10.1093/bioinformatics/btx153
  • 23. Steinbiss S., Silva-Franco F., Brunk B., Foth B., Hertz-Fowler C., Berriman M., Otto T. D. 2016. Companion: a web server for annotation and analysis of parasite genomes. Nucleic Acids Research, 44: pp. W29–W34. 10.1093/nar/gkw292
  • 24. Berriman M., Ghedin E., Hertz-Fowler C., Blandin G., Renauld H., Bartholomeu D. C., Lennard N. J., Caler E., Hamlin N. E., Haas B., et al. 2005. The Genome of the African Trypanosome Trypanosoma brucei. Science 309(5733): 416 LP–422.
  • 25.Thomas S., Green A., Sturm N. R., Campbell D. A., & Myler P. J. (2009). Histone acetylations mark origins of polycistronic transcription in Leishmania major. BMC genomics, 10, 152 10.1186/1471-2164-10-152 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Daniels J. P., Gull K., & Wickstead B. 2010. Cell biology of the trypanosome genome. Microbiology and molecular biology reviews: MMBR, 74(4): pp. 552–569. 10.1128/MMBR.00024-10 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.El-Sayed N. M., Myler P. J., Blandin G., et al. (2005) ‘Comparative Genomics of Trypanosomatid Parasitic Protozoa’, Science, 309(5733), p. 404 LP-409. 10.1126/science.1112181 [DOI] [PubMed] [Google Scholar]
  • 28.Yurchenko V., Kostygov A., Havlová J., Grybchuk-Ieremenko A., Ševčíková T., Lukeš J., Ševčík J., Votýpka J. 2016. Diversity of Trypanosomatids in Cockroaches and the Description of Herpetomonas tarakana sp. n. Journal of Eukaryotic Microbiology. 63(2): pp. 198–209. 10.1111/jeu.12268 [DOI] [PubMed] [Google Scholar]
  • 29.Jackson A. P., Vaughan S. and Gull K. (2006) ‘Evolution of Tubulin Gene Arrays in Trypanosomatid parasites: genomic restructuring in Leishmania’, BMC Genomics. London: BioMed Central, 7, p. 261 10.1186/1471-2164-7-261 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Emms D. and Kelly S. 2015. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biology. 16(157). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Sutterwala S. S., Hsu F., Sevova E. S., Schwartz K. J., Zhang K., Key P., Turk J., Beverley S. M., Bangs J. D. 2008. Developmentally regulated sphingolipid synthesis in African trypanosomes. Molecular Microbiology, 70: pp. 281–296. 10.1111/j.1365-2958.2008.06393.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Zhang K., Showalter M., Revollo J., Hsu F. F., Turk J., Beverley S. M. 2003. Sphingolipids are essential for differentiation but not growth in Leishmania. EMBO J., 22: pp. 6016–6026. 10.1093/emboj/cdg584 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Vanlerberghe G. C. and McIntosh L. (1997) ‘Alternative Oxidase: From Gene to Function’, Annual Review of Plant Physiology and Plant Molecular Biology. Annual Reviews, 48(1), pp. 703–734. 10.1146/annurev.arplant.48.1.703 [DOI] [PubMed] [Google Scholar]
  • 34.Jackson A. P. 2007. Origins of amino acid transporter loci in trypanosomatid parasites. BMC evolutionary biology, 7, 26 10.1186/1471-2148-7-26 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Shaked‐Mishan P., Suter‐Grotemeyer M., Yoel‐Almagor T., Holland N., Zilberstein D. and Rentsch D. 2006. A novel high‐affinity arginine transporter from the human parasitic protozoan Leishmania donovani. Molecular Microbiology, 60: pp. 30–38. 10.1111/j.1365-2958.2006.05060.x [DOI] [PubMed] [Google Scholar]
  • 36.Martin J. L. et al. (2014) ‘Metabolic reprogramming during purine stress in the protozoan pathogen Leishmania donovani’, PLoS pathogens. Public Library of Science, 10(2), p. e1003938 10.1371/journal.ppat.1003938 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Inbar E., Hughitt V. K., Dillon L. A. L., Ghosh K., El-Sayed N. M. and Sacks D. L. 2017. The Transcriptome of Leishmania major Developmental Stages in Their Natural Sand Fly Vector, mBio, 8(2). 10.1128/mBio.00029-17 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Kolev N. G., Ullu E. and Tschudi C. (2014) The emerging role of RNA-binding proteins in the life cycle of Trypanosoma brucei. Cellular microbiology 16(4): 482–489. 10.1111/cmi.12268 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Naguleswaran A., Gunasekera K., Schimanski B., Heller M., Hemphill A., Ochsenreiter T. and Roditi I. (2015) Trypanosoma brucei RRM1 Is a Nuclear RNA-Binding Protein and Modulator of Chromatin Structure. mBio 6(2): e00114–15. 10.1128/mBio.00114-15 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Wippel H. H., Malgarin J. S., Martins S. de T., Vidal N. M., Marcon B. H., Miot H. T., Marchini F. K., Goldenberg S. and Alves. (2019) The Nuclear RNA-binding Protein RBSR1 Interactome in Trypanosoma cruzi. Journal of Eukaryotic Microbiology. John Wiley & Sons, Ltd [DOI] [PubMed] [Google Scholar]
  • 41.Wurst M., Seliger B., Jha B. A., Klein C., Queiroz R., Clayton C. Expression of the RNA recognition motif protein RBP10 promotes a bloodstream-form transcript pattern in Trypanosoma brucei. Mol Microbiol. 2012; 83:1048–1063. 10.1111/j.1365-2958.2012.07988.x [DOI] [PubMed] [Google Scholar]
  • 42.Jones N. G. et al. (2014) ‘Regulators of Trypanosoma brucei cell cycle progression and differentiation identified using a kinome-wide RNAi screen’, PLoS pathogens. Public Library of Science, 10(1), pp. e1003886–e1003886. 10.1371/journal.ppat.1003886 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Acosta-Serrano A. et al. (2001) ‘The surface coat of procyclic Trypanosoma brucei: Programmed expression and proteolytic cleavage of procyclin in the tsetse fly’, Proceedings of the National Academy of Sciences. National Academy of Sciences, 98(4), pp. 1513–1518. 10.1073/pnas.041611698 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Haines L. R. et al. (2010) ‘Tsetse EP protein protects the fly midgut from trypanosome establishment’, PLoS pathogens. Public Library of Science, 6(3), pp. e1000793–e1000793. 10.1371/journal.ppat.1000793 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Pimenta P. F. et al. (1992) ‘Stage-specific adhesion of Leishmania promastigotes to the sandfly midgut’, Science, 256(5065), pp. 1812 LP–1815. 10.1126/science.1615326 [DOI] [PubMed] [Google Scholar]
  • 46.Kamhawi S. et al. (2004) ‘A role for insect galectins in parasite survival.’, Cell. United States, 119(3), pp. 329–341. [DOI] [PubMed] [Google Scholar]
  • 47.Urwyler S., Studler E., Renggli C. K., Roditi I. 2007. A family of stage-specific alanine-rich proteins on the surface of epimastigote forms of Trypanosoma brucei. Mol Microbiol., 63: pp. 218–228 10.1111/j.1365-2958.2006.05492.x [DOI] [PubMed] [Google Scholar]
  • 48.Fragoso C. M., Schumann Burkard G., Oberle M., Renggli C. K., Hilzinger K., Roditi I. 2009. PSSA-2, a Membrane-Spanning Phosphoprotein of Trypanosoma brucei, Is Required for Efficient Maturation of Infection. PLoS ONE, 4(9):e7074 10.1371/journal.pone.0007074 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Casas-Sánchez A. and Acosta-Serrano Á. (2016) Skin deep. eLife. eLife Sciences Publications, Ltd 5: e21506 10.7554/eLife.21506 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Pereira F., Santos-Mallet J. R., Branquinha M. H., d'Avila-Levy C. M., Santos A. L. 2010. Influence of leishmanolysin-like molecules of Herpetomonas samuelpessoai on the interaction with macrophages. Microbes and infection, 10.1016/j.micinf.2010.07.010 [DOI] [PubMed] [Google Scholar]
  • 51.D’Archivio S. and Wickstead B. (2017) ‘Trypanosome outer kinetochore proteins suggest conservation of chromosome segregation machinery across eukaryotes’, The Journal of Cell Biology, 216(2), p. 379 LP–391. 10.1083/jcb.201608043 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Robinson K.A. and Beverley S.M., 2003. Improvements in transfection efficiency and tests of RNA interference (RNAi) approaches in the protozoan parasite Leishmania. Molecular and biochemical parasitology, 128(2), pp.217–228. 10.1016/s0166-6851(03)00079-3 [DOI] [PubMed] [Google Scholar]
  • 53.DaRocha W.D., Otsu K. and Teixeira S.M., 2004. Tests of cytoplasmic RNA interference (RNAi) and construction of a tetracycline-inducible T7 promoter system in Trypanosoma cruzi. Mol Biochem Parasitol, 133(2): pp.175–186 10.1016/j.molbiopara.2003.10.005 [DOI] [PubMed] [Google Scholar]
  • 54.Beverley S.M., 2003. Protozomics: trypanosomatid parasite genetics comes of age. Nature Reviews Genetics, 4(1): pp.11 10.1038/nrg980 [DOI] [PubMed] [Google Scholar]
  • 55.Lye L. F., Owens K., Shi H., Murta S. M. F., Vieira A. C., Turco S. J., Tschudi C., Ullu E., Beverley. 2010. Retention and Loss of RNA Interference Pathways in Trypanosomatid Protozoans. PLOS Pathogens, 6(10): e1001161 10.1371/journal.ppat.1001161 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Peacock C. S., Seeger K., Harris D., Murphy L., Ruiz J. C., Quail M. A., Peters N., Adlem E., Tivey A., Aslett M., Kerhornou A., et al. 2007. Comparative genomic analysis of three Leishmania species that cause diverse human disease. Nat. Genet. 39(7): pp. 839–47. 10.1038/ng2053 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Vingron M. et al. (2007) ‘Improved detection of overrepresentation of Gene-Ontology annotations with parent–child analysis’, Bioinformatics, 23(22), pp. 3024–3031. 10.1093/bioinformatics/btm440 [DOI] [PubMed] [Google Scholar]
  • 58.Tu X. and Wang C. C. (2004) ‘The Involvement of Two cdc2-related Kinases (CRKs) in Trypanosoma brucei Cell Cycle Regulation and the Distinctive Stage-specific Phenotypes Caused by CRK3 Depletion’, Journal of Biological Chemistry, 279(19), pp. 20519–20528. 10.1074/jbc.M312862200 [DOI] [PubMed] [Google Scholar]
  • 59.Hammarton T. C., Engstler M. and Mottram J. C. (2004) ‘The Trypanosoma brucei Cyclin, CYC2, Is Required for Cell Cycle Progression through G1 Phase and for Maintenance of Procyclic Form Cell Morphology’, Journal of Biological Chemistry, 279(23), pp. 24757–24764. 10.1074/jbc.M401276200 [DOI] [PubMed] [Google Scholar]
  • 60.Liu Y., Hu H. and Li Z. (2013) ‘The cooperative roles of PHO80-like cyclins in regulating the G1/S transition and posterior cytoskeletal morphogenesis in Trypanosoma brucei’, Molecular Microbiology. John Wiley & Sons, Ltd, 90(1), pp. 130–146. 10.1111/mmi.12352 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Monnerat S. et al. (2013) ‘Identification and Functional Characterisation of CRK12:CYC9, a Novel Cyclin-Dependent Kinase (CDK)-Cyclin Complex in Trypanosoma brucei’, PloS one. Public Library of Science, 8(6), pp. e67327–e67327. 10.1371/journal.pone.0067327 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Lee S. H., Stephens J. L., Paul K. S., Englund P. T. 2006. Fatty Acid Synthesis by Elongases in Trypanosomes. Cell, 126(4): pp. 691–699. 10.1016/j.cell.2006.06.045 [DOI] [PubMed] [Google Scholar]
  • 63.Svärd S. G. et al. (1998) ‘Differentiation-associated surface antigen variation in the ancient eukaryote Giardia lamblia’, Molecular Microbiology. John Wiley & Sons, Ltd, 30(5), pp. 979–989. 10.1046/j.1365-2958.1998.01125.x [DOI] [PubMed] [Google Scholar]
  • 64.Adam R. D. (2001) ‘Biology of Giardia lamblia’, Clinical Microbiology Reviews, 14(3), p. 447 LP–475. 10.1128/CMR.14.3.447-475.2001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Nash T. E. et al. (1988) ‘Antigenic variation in Giardia lamblia.’, The Journal of Immunology, 141(2), p. 636 LP–641. [PubMed] [Google Scholar]
  • 66.Aitcheson N. et al. (2005) ‘VSG switching in Trypanosoma brucei: antigenic variation analysed using RNAi in the absence of immune selection’, Molecular microbiology, 57(6), pp. 1608–1622. 10.1111/j.1365-2958.2005.04795.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Käll L., Krogh A., Sonnhammer E. L. L. 2004. A Combined Transmembrane Topology and Signal Peptide Prediction Method. Journal of Molecular Biology, 338(5): pp. 1027–1036. 10.1016/j.jmb.2004.03.016 [DOI] [PubMed] [Google Scholar]
  • 68.Van Damme E. and Van Loock M. (2014) ‘Functional annotation of human cytomegalovirus gene products: an update’, Frontiers in microbiology. Frontiers Media S.A., 5, p. 218 10.3389/fmicb.2014.00218 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Matthews K. R. (2005) ‘The developmental cell biology of Trypanosoma brucei’, Journal of cell science, 118(Pt 2), pp. 283–290. 10.1242/jcs.01649 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Savage A. F. et al. (2016) ‘Transcriptome Profiling of Trypanosoma brucei Development in the Tsetse Fly Vector Glossina morsitans’, PloS one. Public Library of Science, 11(12), pp. e0168877–e0168877. 10.1371/journal.pone.0168877 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Krogh A., Larsson B., von Heijne G., and Sonnhammer E. L. L. 2001. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. Journal of Molecular Biology, 305(3): pp. 567–580. 10.1006/jmbi.2000.4315 [DOI] [PubMed] [Google Scholar]
  • 72.de Paiva RMC, Grazielle-Silva V, Cardoso MS, Nakagaki BN, Mendonça-Neto RP, et al. (2015) Amastin Knockdown in Leishmania braziliensis Affects Parasite-Macrophage Interaction and Results in Impaired Viability of Intracellular Amastigotes. PLOS Pathogens 11(12): e1005296 10.1371/journal.ppat.1005296 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Kangussu-Marcolino M. M., de Paiva R. M. C., Araújo P. R., de Mendonça-Neto R. P., Lemos L., Bartholomeu D. C., Mortara R. A., da Rocha W. D., Teixeira S. M. R. 2013. Distinct genomic organization, mRNA expression and cellular localization of members of two amastin sub-families present in Trypanosoma cruzi. BMC Microbiology, 13(1): pp. 10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Fankhauser N. and Mäser P. (2005) Identification of GPI anchor attachment signals by a Kohonen self-organizing map. Bioinformatics. 21(9), pp. 1846–1852. 10.1093/bioinformatics/bti299 [DOI] [PubMed] [Google Scholar]
  • 75.Darlyuk I. et al. (2009) ‘Arginine Homeostasis and Transport in the Human Pathogen Leishmania donovani’, Journal of Biological Chemistry, 284(30), pp. 19800–19807. 10.1074/jbc.M901066200 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Marchese L. et al. (2018) ‘The Uptake and Metabolism of Amino Acids, and Their Unique Role in the Biology of Pathogenic Trypanosomatids’, Pathogens. 10.3390/pathogens7020036 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Cunningham M. L. and Beverley S. M. 2001. Pteridine salvage throughout the Leishmania infectious cycle: implications for antifolate chemotherapy. Molecular and Biochemical Parasitology. Elsevier, 113(2): pp. 199–213. 10.1016/s0166-6851(01)00213-4 [DOI] [PubMed] [Google Scholar]
  • 78.Gourguechon S. and Wang C. C. (2009) ‘CRK9 contributes to regulation of mitosis and cytokinesis in the procyclic form of Trypanosoma brucei’, BMC cell biology. BioMed Central, 10, p. 68 10.1186/1471-2121-10-68 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Gupta S. K., Kosti I., Plaut G., Pivko A., Tkacz I. D., Cohen-Chalamish S., et al. (2013) The hnRNP F/H homologue of Trypanosoma brucei is differentially expressed in the two life cycle stages of the parasite and regulates splicing and mRNA stability. Nucleic Acids Res. 41:6577–6594 10.1093/nar/gkt369 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Smith T. K. et al. (2017) ‘Metabolic reprogramming during the Trypanosoma brucei lifecycle’, F1000Research. F1000Research, 6, p. F1000 Faculty Rev-683. 10.12688/f1000research.10342.2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Benz C., Mulindwa J., Ouna B., Clayton C. (2011) The Trypanosoma brucei zinc finger protein ZC3H18 is involved in differentiation. Mol Biochem Parasitol. 177:148–151. 10.1016/j.molbiopara.2011.02.007 [DOI] [PubMed] [Google Scholar]
  • 82.Droll D., Minia I., Fadda A., Singh A., Stewart M., Queiroz R., Clayton C. (2013) Post-transcriptional regulation of the trypanosome heat shock response by a zinc finger protein. PLoS Pathog. 9:e1003286 10.1371/journal.ppat.1003286 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Subota I., Rotureau B., Blisnick T., Ngwabyt S., Durand-Dubief M., Engstler M., Bastin P. (2011) ALBA proteins are stage regulated during trypanosome development in the tsetse fly and participate in differentiation. Mol Biol Cell. 22:4205–4219. 10.1091/mbc.E11-06-0511 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Dang H. Q. and Li Z. (2011) ‘The Cdc45·Mcm2-7·GINS protein complex in trypanosomes regulates DNA replication and interacts with two Orc1-like proteins in the origin recognition complex’, The Journal of biological chemistry. 2011/07/28. American Society for Biochemistry and Molecular Biology, 286(37), pp. 32424–32435. 10.1074/jbc.M111.240143 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.DeBrot A., Lancaster C. and Bjornsti M.A. (2016) ‘Function of Cdc45 in DNA Replication and in Response to Genotoxic Stress’, The FASEB Journal. Federation of American Societies for Experimental Biology, 30(1_supplement), p. 7982 798.2. 10.1096/fj.15-275990 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Kawale A. S. and Povirk L. F. (2018) ‘Tyrosyl-DNA phosphodiesterases: rescuing the genome from the risks of relaxation’, Nucleic acids research. 2017/12/04. Oxford University Press, 46(2), pp. 520–537. 10.1093/nar/gkx1219 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Capul A. A. et al. (2007) ‘Two Functionally Divergent UDP-Gal Nucleotide Sugar Transporters Participate in Phosphoglycan Synthesis in Leishmania major’, Journal of Biological Chemistry, 282(19), pp. 14006–14017. 10.1074/jbc.M610869200 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Pita S. et al. (2019) ‘The Tritryps Comparative Repeatome: Insights on Repetitive Element Evolution in Trypanosomatid Pathogens’, Genome biology and evolution. Oxford University Press, 11(2), pp. 546–551. 10.1093/gbe/evz017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Jackson A. P. 2016. Gene family phylogeny and the evolution of parasite cell surfaces. Molecular and Biochemical Parasitology, 209(1): pp. 64–75. [DOI] [PubMed] [Google Scholar]
  • 90.Kozarewa I., Ning Z., Quail M. A., Sanders M. J., Berriman M., Turner D. J. 2009. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat. Methods, 6: pp. 291–295. 10.1038/nmeth.1311 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Iraad F., Bronner M. A., Quail D. J., Turner H. S. 2014. Improved Protocols for Illumina Sequencing. Curr. Protoc. Hum. Genet., 80:18(2): pp. 1–42. [DOI] [PubMed] [Google Scholar]
  • 92.Eid J. L., Fehr A., Gray J., Luong K., Lyle J., Otto G., Peluso P., Rank D., Baybayan P., Bettman B., Bibillo A., et al. 2009. Real-time DNA sequencing from single polymerase molecules. Science, 323: pp. 133–138 10.1126/science.1162986 [DOI] [PubMed] [Google Scholar]
  • 93.Chin C. S., Alexander D. H., Marks P., Klammer A. A., Drake J., Heiner C., Clum A., Copeland A., Huddleston J., Eichler E. E., Turner S. W., Korlach J. 2013. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods 10:563–569 10.1038/nmeth.2474 [DOI] [PubMed] [Google Scholar]
  • 94.Otto T. D., Sanders M., Berriman M., Newbold C. 2010. Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology. Bioinforma. Oxf. Engl. 26: pp. 1704–1707. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Hunt M., Kikuchi T., Sanders M., Newbold C., Berriman M., Otto T. D. 2013. REAPR: a universal tool for genome assembly evaluation. Genome Biol, 14: R47 10.1186/gb-2013-14-5-r47 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Marçais G. and Kingsford C. 2011. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 27(6): pp. 764–770. 10.1093/bioinformatics/btr011 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Kim D., Langmead B. and Salzberg S. L. 2015. HISAT: a fast spliced aligner with low memory requirements. Nature Methods, 12, pp. 357 10.1038/nmeth.3317 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Trapnell C., Roberts A., Goff L., Pertea G., Kim D., Kelley D. R., Pimentel H., Salzberg S. L., Rinn J. L., Pachter L. 2012. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols, 7: pp. 562 10.1038/nprot.2012.016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Jackson A. P., Sanders M., Berry A., McQuillan J., Aslett M. A., Quail M. A., Chukualim B., Capewell P., MacLeod A., Melville S. E., Gibson W., Barry J. D., Berriman M., Hertz-Fowler C. 2010. The genome sequence of Trypanosoma brucei gambiense, causative agent of chronic human African trypanosomiasis. PLoS Negl. Trop Dis, 13(4):e658. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Jackson A. P., Berry A., Aslett M., Allison H. C., Burton P., Vavrova-Anderson J., Brown R., Browne H., Corton N., Hauser H. et al. 2012. Antigenic diversity is generated by distinct evolutionary mechanisms in African trypanosome species. PNAS, 109: pp. 3416–3421. 10.1073/pnas.1117313109 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Carnes J., Anupama A., Balmer O., Jackson A., Lewis M., Brown R., Cestari I., Desquesnes M., Gendrin C., Hertz-Fowler C. 2015. Genome and Phylogenetic Analyses of Trypanosoma evansi Reveal Extensive Similarity to T. brucei and Multiple Independent Origins for Dyskinetoplasty. PLOS Neglected Tropical Diseases. Public Library of Science 9(1): e3404 10.1371/journal.pntd.0003404 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Kelly S., Ivens A., Manna P. T., Gibson W., Field M. C. 2014. A draft genome for the African crocodilian trypanosome Trypanosoma grayi. Sci Data. 5(1):140024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Stoco P. H., Wagner G., Talavera-Lopez C., Gerber A., Zaha A., et al. 2014. Genome of the Avirulent Human-Infective Trypanosome—Trypanosoma rangeli. PLOS Neglected Tropical Diseases, 8(9): e3176 10.1371/journal.pntd.0003176 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Kelly S., Ivens A., Mott G. A., O'Neill E., Emms D., Macleod O., Voorheis P., Tyler K., Clark M., Matthews J., Matthews K., Carrington M. 2017. An Alternative Strategy for Trypanosome Survival in the Mammalian Bloodstream Revealed through Genome and Transcriptome Analysis of the Ubiquitous Bovine Parasite Trypanosoma (Megatrypanum) theileri. Genome Biol Evol., 9(8): pp. 2093–2109. 10.1093/gbe/evx152 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Downing T., Imamura H., Decuypere S., Clark T. G., Coombs G. H., Cotton J. A., Hilley J. D., de Doncker S., Maes I., Mottram J. C., et al. 2011. Whole genome sequencing of multiple Leishmania donovani clinical isolates provides insights into population structure and mechanisms of drug resistance. Genome Res, 21(12): pp. 2143–56. 10.1101/gr.123430.111 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Ivens A. C., Peacock C. S., Worthey E. A., Murphy L., Aggarwal G., Berriman M., Sisk E., Rajandream M. A., Adlem E., Aert R., et al. 2005. The genome of the kinetoplastid parasite, Leishmania major. Science, 309(5733): pp. 436–42. 10.1126/science.1112680 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Kraeva N., Butenko A., Hlaváčová J., Kostygov A., Myškova J., Grybchuk D., Leštinová T., Votýpka J., Volf P., Opperdoes F., Flegontov P., Lukeš J., Yurchenko V. 2015. Leptomonas seymouri: Adaptations to the Dixenous Life Cycle Analyzed by Genome Sequencing, Transcriptome Profiling and Co-infection with Leishmania donovani. PLoS Pathog., 11(8):e1005127 10.1371/journal.ppat.1005127 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Runckel C., DeRisi J., & Flenniken M. L. (2014). A draft genome of the honey bee trypanosomatid parasite Crithidia mellificae. PloS one, 9(4), e95057 10.1371/journal.pone.0095057 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Porcel B. M., Denoeud F., Opperdoes F., Noel B., Madoui M-A., Hammarton T. C., Field M. C., Da Silva, Couloux A., Poulain J., et al. 2014. The streamlined genome of Phytomonas spp. relative to human pathogenic kinetoplastids reveals a parasite tailored for plants. PLoS genetics. e1004007 10.1371/journal.pgen.1004007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Jackson A. P., Quail M. A., Berriman M. 2008. Insights into the genome sequence of a free-living Kinetoplastid: Bodo saltans (Kinetoplastida: Euglenozoa). BMC Genomics. 9(9):594. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Aslett M., Aurrecoechea C., Berriman M., Brestelli J., Brunk B. P., Carrington M., Depledge D. P., Fischer S., Garjria B., Gao X., et al. , 2010. TriTrypDB: a functional genomic resource for the Trypanosomatidae. Nucleic Acids Research. 38(37): D457–D462 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Anders S., Pyl P. T. and Huber W. 2015. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics, 31(2): pp. 166–169. 10.1093/bioinformatics/btu638 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Love M. I., Huber W., Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15: pp. 550 10.1186/s13059-014-0550-8 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Gregory S Barsh, Won-Jae Lee

5 Sep 2019

* Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. *

Dear Dr Ligoxygakis,

Thank you very much for submitting your Research Article entitled 'Transcriptional and genomic parallels between the monoxenous parasite Herpetomonas muscarum and Leishmania' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by two peer reviewers. We had hoped to secure a third reviewer but have now decided to proceed based on the reviews in hand. As you will see, both reviewers are enthusiastic about the work. There are some comments and suggestions that we ask you address in a minor revision. We therefore ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer.

In addition we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

Please let us know if you have any questions while making these revisions.

Yours sincerely,

Won-Jae Lee

Guest Editor

PLOS Genetics

Gregory Barsh

Editor-in-Chief

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This manuscript entitled “Transcriptional and genomic parallels between the monoxenous parasite Herpetomonas muscarum and Leishmania” by Sloan et al. describes the results of thoroughly conducted investigations of the genomic and transcriptomic features of Herpetomonas muscarum. The authors present the genome of H. muscarum and perform transcriptomic studies on axenic promastigotes in logarithmic and stationary phase, as well as on parasites in the insect host Drosophila melanogaster. Based on the sequencing results, an extensive analysis was carried out comparing these features with those of dixenous trypanosomatids such as trypanosomes. The reported findings are of interest not only to the Herpetomonas and Drosophila communities, but may also provide scientific insights to researchers working on trypanosomes.

There are a few comments and suggestions.

1. In line 85, the authors stated that the parasites of interest “lack introns”. However, in the case of Trypanosoma brucei, Siegel et al (Nucleic Acids Res, 15:4946, 2010) reported the existence of two introns (Tb927.3.3160, a poly(A) polymerase and Tb927.8.1510, a DNA/RNA helicase). Thus, the authors are encouraged to state “virtually (or mostly) lack introns”. Also, for H. muscarum, do those introns exist in the orthologs?

2. Trypanosomes are known to possess Base J (β-D-Glucopyranosyloxymethyluracil), which acts as an RNA polymerase II transcription terminator. From this sequencing study, can the authors check for the possible existence or signatures of Base J?

3. From line 281, the authors discuss some of the orthologues of interest based on OrthoFinder analysis (GP63, RNAi machinery, Leishmania RNA virus, etc. were well discussed). In addition to the discussed orthologues, Leishmania and Trypanosoma parasites are known for other key features/virulence factors: trypanothione (with synthetase/reductase, Fairlamb et al, Science, 1985), lipophosphoglycan (Spath et al, Science, 2003), glycosome (Gualdrón-López et al, Int J Parasitol, 2012), acidocalcisome (Vercesi et al, Biochem J, 1994), etc. The reviewer suggests that the authors discuss these important features based on the genome sequencing data of H. muscarum.

4. Figure sub-numbering should be made uniform (e.g. “A”, “A.”, “(A)” are all used in the figures).

5. For Table S26, two identical files have been uploaded as Supplementary 15 and 16. There is no file uploaded for Table S25. Please check.

6. In the download system, Table S16 = supp7 and Table S17 = supp8. However, these are switched in the text. Please check.

7. As in Supplementary file 17, adding the supporting table number to the Excel tab would help readers to easily follow the text and tables.

8. Abbreviations such as Leishmania major (L. major), Trypanosoma brucei (T. brucei), etc. should be used consistently throughout the manuscript.

Reviewer #2: The present manuscript is a descriptive investigation of the genomic and transcriptional parallels between the Drosophila parasite Herpetomonas muscarum and Leishmania. At present, research on host-parasite interactions between Phlebotomine flies and Leishmania spp. remains sorely understudied due to a variety of constraints, from routine colonisation of sand flies and adapted in vivo infection systems to, most importantly, genetic tools to tease out the molecular mechanisms involved in these unique host-parasite interactions in the midgut. Furthermore, the notion of "midgut ecology" is now becoming more and more recognised, and laboratory models for more in-depth studies will be necessary to better understand how microbes and parasites are interacting within this environment and with the mesenteron. The comparative results of your study are remarkable and convincingly support the promotion of the Drosophila-Herpetomonas model as a proxy for Phlebotomine fly and Leishmania spp. interactions and as a means to better understand trypanosomatid establishment in the dipteran insect midgut. I do appreciate the fact that you limit the focus of this comparison to this midgut compartment, as it would be difficult to extrapolate further given that H. muscarum remains in the crop and midgut and has no interaction with the salivary glands, unlike dixenous trypanosomatid parasites.

Not only do the genome structure and organisation provide interesting parallels, but so do the sometimes intriguing insights from the experimental transcriptional studies in vitro and in vivo. This, however, will require further investigation to understand the meaning and role of transcriptional flux and predicted proteome during infestation of the midgut (e.g. the significance of tubulin array configurations or of the surprising lack of the SLS1-4 gene for sphingolipid biosynthesis).

The paper is very well written and easy to follow. The data support the text of the manuscript. I did find one typographical error on line 313: "...unregulated in Leishmania during sand flies and..."; there is something missing here. Perhaps it should read, "unregulated in Leishmania during sand fly infestation and...".

One aspect that you may want to consider in future studies is the idea of "pre-conditioning" of midgut ecology during larval midgut to adult midgut morphogenesis. It would be interesting to know if the larval microbiota could subsequently "facilitate" Herpetomonas infections in the adult Drosophila. Even though your studies were conducted under axenic conditions, it would be interesting to know how the carry-over of the larval microbiota or the newly established adult microbiota affects these early stages of parasite interaction with the mesenteron. Working on cave-dwelling sand flies, we are intrigued by the presumed microbiota shift between the sand fly larval diet, which is essentially bat guano and organic materials from dead bats, and the adult diet of blood and plant sap/nectar.

In conclusion, this is a very important study to publish and I do hope it will encourage other Drosophila researchers to embrace this model and shed more light on Phlebotomine fly and Leishmania interactions in the future.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Decision Letter 1

Gregory S Barsh, Won-Jae Lee

1 Oct 2019

Dear Dr Ligoxygakis,

We are pleased to inform you that your manuscript entitled "Transcriptional and genomic parallels between the monoxenous parasite Herpetomonas muscarum and Leishmania" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional accept, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to "Update My Information" (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about one way to make your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Won-Jae Lee

Guest Editor

PLOS Genetics

Gregory Barsh

Editor-in-Chief

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-19-01181R1

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

Gregory S Barsh, Won-Jae Lee

1 Nov 2019

PGENETICS-D-19-01181R1

Transcriptional and genomic parallels between the monoxenous parasite Herpetomonas muscarum and Leishmania

Dear Dr Ligoxygakis,

We are pleased to inform you that your manuscript entitled "Transcriptional and genomic parallels between the monoxenous parasite Herpetomonas muscarum and Leishmania" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Nicholas White

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. GenomeScope kmer profile and model for H. muscarum genome.

    (PDF)

    S2 Fig. Principal component analysis of differentially expressed H. muscarum genes in log phase culture vs. stationary phase culture.

    (A). There are two clear sample groupings (circled) which correspond to RNA from each condition (n = 3 per condition). Dark blue = log phase samples and light blue = stationary phase samples.

    (PDF)

    S1 Table. Coordinates of putative strand switch regions in the H. muscarum genome.

    (XLSX)

    S2 Table. BLAST hits for the Phytomonas serpens spliced leader sequence in the H. muscarum genome.

    (XLSX)

    S3 Table. Alignment of the intronic region of the mini-exon gene from several trypanosomatids of the Leishmaniinae clade and protein orthogroups from OrthoFinder analysis (in full).

    The first 15bp of the intron sequences appear to be conserved across the clade with the sequence becoming more variable thereafter.

    (XLSX)

    S4 Table. H. muscarum proteins orthologous to important T. brucei proteins: metabolism-associated proteins.

    (XLSX)

    S5 Table. H. muscarum proteins orthologous to important T. brucei proteins: Differentiation- and RNA-associated proteins.

    (XLSX)

    S6 Table. H. muscarum proteins orthologous to important T. brucei proteins: RNAi-associated proteins.

    (XLSX)

    S7 Table. H. muscarum proteins orthologous to important T. brucei proteins: Phosphatases.

    (XLSX)

    S8 Table. H. muscarum proteins orthologous to important T. brucei proteins: Protein kinases.

    (XLSX)

    S9 Table. H. muscarum proteins orthologous to important T. brucei proteins: GP63.

    (XLSX)

    S10 Table. H. muscarum proteins orthologous to important T. brucei proteins: Mucins.

    (XLSX)

    S11 Table. H. muscarum proteins without orthologues in other trypanosomatids.

    (XLSX)

    S12 Table. H. muscarum proteins orthologous to important T. brucei proteins: Kinetochore-associated proteins.

    (XLSX)

    S13 Table. H. muscarum proteins orthologous to important T. brucei proteins: Spliceosome-associated proteins.

    (XLSX)

    S14 Table. H. muscarum proteins orthologous to important T. brucei proteins: Exosome-associated proteins.

    (XLSX)

    S15 Table. H. muscarum proteins orthologous to important T. brucei proteins: Nuclear proteins.

    (XLSX)

    S16 Table. H. muscarum genes differentially expressed between log and stationary phase H. muscarum in vitro culture.

    (XLSX)

    S17 Table. Pfam domains significantly enriched in differentially regulated H. muscarum genes upon entry into stationary phase during axenic culture (vs. log phase).

    (XLSX)

    S18 Table. H. muscarum genes differentially expressed between log phase in vitro culture and after ingestion by D. melanogaster (all samples).

    (XLSX)

    S19 Table. H. muscarum genes differentially expressed between log phase in vitro culture and 6 hours after ingestion by D. melanogaster.

    (XLSX)

    S20 Table. Significantly enriched Pfam domains in differentially regulated Herpetomonas muscarum genes at six hours post ingestion by Drosophila melanogaster (vs log-phase axenic culture).

    The table shows the top 10 represented Pfam domains in the significantly up- and downregulated genes. Chi-squared tests were performed to test for statistically significant enrichment of Pfam domain frequency in upregulated genes vs. Pfam domain frequency in the whole genome.

    (XLSX)

    S21 Table. Structural predictions for differentially expressed H. muscarum surface proteins.

    Structural predictions were acquired using the TMHMM1.0 online tool (Krogh et al., 2001).

    (XLSX)

    S22 Table. H. muscarum genes differentially expressed between 6 and 12 hours after ingestion by D. melanogaster.

    (XLSX)

    S23 Table. H. muscarum genes differentially expressed between 12 and 18 hours after ingestion by D. melanogaster.

    (XLSX)

    S24 Table. Table of genes differentially regulated between Herpetomonas muscarum after ingestion by Drosophila melanogaster vs. stationary phase axenic culture and not log phase axenic culture, p-adjusted < 0.05.

    (XLSX)

    S25 Table. The top 10 represented Pfam domains in Herpetomonas muscarum ingested by Drosophila melanogaster vs. stationary phase axenic culture.

    (XLSX)

    S26 Table. Summary statistics of three H. muscarum genome annotations using gene models from L. major, L. braziliensis and T. brucei and a maximal annotation which combines all three annotations.

    (XLSX)

    Data Availability Statement

    All sequencing data are available at the European Nucleotide Archive (ENA) under accession number ERP008869.


    Articles from PLoS Genetics are provided here courtesy of PLOS
