Summary
Soil biota has a crucial impact on soil ecology, global climate change, and effective crop management, and studying the diverse ecological roles of dipteran larvae deepens our understanding of soil food webs. A multi-omics study of Pseudolycoriella hygida comb. nov. (Diptera: Sciaroidea: Sciaridae) aimed to characterize the carbohydrate-active enzymes (CAZymes) this species uses to degrade litter. Manual curation of the 17,881 proteins predicted in the Psl. hygida genome identified 137 secreted CAZymes, 33 of which are present in the saliva proteome; these predictions were broadly confirmed by catalytic profiling of saliva CAZymes against plant cell wall polysaccharides and pNP-glycosyl substrates. Comparisons with two other sciarid species and the outgroup Lucilia cuprina (Diptera: Calliphoridae) identified 42 CAZyme families that define a sciarid CAZyme profile. The litter-degrading potential of sciarids corroborates their significant role as decomposers, yields insights into the evolution of insect feeding habits, and highlights the importance of insects as a source of biotechnologically relevant enzymes.
Subject areas: Mycology, Biocatalysis, Plant biology

Highlights
- The Pseudolycoriella genome (609 Mb) encodes 17,881 genes and 1,022 CAZymes
- The saliva proteome includes 33 of the 137 predicted secreted CAZymes
- Diverse saliva CAZyme activities detected on polysaccharides and synthetic substrates
- Comparison with other sciarids shows a rich conserved repertoire of secreted CAZymes
Introduction
Soil is an integral part of terrestrial ecosystems: it participates in biogeochemical cycles and hosts about 40% of the global animal biomass.1 Although 30 trophic groups constitute the most dominant taxa in soil food webs,2 soil fauna are generally underrepresented in global surveys.3,4,5 Understanding how the soil biota sustains soil food webs and contributes to the decomposition of organic matter, nutrient release and cycling, soil aggregation, and plant growth can inform crop management in agriculture and improve our understanding of global climate change.2,6
Soil biota includes prokaryotes and fungi that release enzymes into the environment, together with heterotrophic protists and animals that ingest and digest plant litter.2,7 To use plant and fungal biomass as a food source, decomposers must produce and secrete enzymes that digest the components of plant and fungal cell walls, including the polysaccharides cellulose, hemicellulose, and chitin. Carbohydrate-active enzymes (CAZymes) are catalogued in a curated database8 organized into families of structurally related enzymes and carbohydrate-binding modules that act in the formation, modification, and cleavage of the chemical bonds in plant and fungal cell wall polysaccharides. This information has been used to identify diverse strategies for polysaccharide degradation in a wide variety of animals,9 either through mutualistic interactions involving enzymes secreted by the gut microbiota or, in some arthropods, through the action of endogenous enzymes.9 The study of lignocellulose utilization by endogenous enzyme systems is an emerging field with potential impact on agroindustry. Indeed, recent transcriptome analyses have identified both isopods and coleopterans as rich sources of CAZymes,10,11,12,13 suggesting that arthropods are an abundant yet poorly studied source of these enzymes.
The insect order Diptera (true flies), with over 180 families and 160,000 species, represents 10% of global animal diversity14,15,16 and is present in virtually all terrestrial ecosystems. In soil, dipteran larvae fulfill diverse ecological roles, including detritivory, fungivory, herbivory, and predation, and also serve as prey for predatory macroinvertebrates. The Sciaridae family, also known as black fungus gnats, belongs to the dipteran infraorder Bibionomorpha, a megadiverse group that includes two other species-rich families, the Mycetophilidae and the Cecidomyiidae.17 Sciarids are often uniformly dark-colored and are found in diverse, mostly shady and moist habitats such as forests, moorlands, and pastures.18 Several studies indicate that sciarid larvae participate in soil food webs, feeding on decaying plant material such as leaf litter and dead wood previously attacked by fungi.18,19,20,21,22 A small number of species also feed on living plant material or fungi, and some of these are agricultural pests that damage crops, greenhouse plants, and cultures of edible mushrooms.23,24,25,26 However, the details of biomass degradation by these flies, and by insects as a whole, are largely unknown. Besides being the recent focus of taxonomic27,28,29 and genomic studies,30,31,32,33 sciarids have been studied as model organisms to understand the molecular mechanisms of developmentally regulated gene amplification and sex determination that are characteristic of this family,34,35 as well as more general metazoan mechanisms such as ribosomal RNA and telomere organization.35
To further understand the role of sciarids in biomass degradation, we combined genomic, proteomic, and biochemical approaches to characterize the repertoire of CAZymes present in the saliva of Pseudolycoriella hygida (Sauaia and Alves, 1968)36 (Figure 1A). This sciarid has been maintained in the laboratory for nearly six decades and, following the original description by Sauaia and Alves,36 has been referred to as “Bradysia hygida” in all papers published so far. The position of this species within the Sciaridae was reviewed on the basis of genetic and morphological markers (Trinca et al.32 and this study), and the necessary correction of the binomial species name is justified in the section “Re-classification of B. hygida” (see experimental model and subject details in STAR Methods). The Psl. hygida genome is the third annotated sciarid genome available in GenBank, the first from the genus Pseudolycoriella, and an addition to a family that is underrepresented among sequenced Diptera genomes.
Figure 1.
The Psl. hygida genome
(A) Psl. hygida life cycle. See STAR Methods for further details.
(B) Manually corrected Hi-C heatmap. The four main red squares in the diagonal correspond to the four Psl. hygida chromosomes. Chromosomes A, B, C, and X are indicated at the top and right end of the image. Scaffolds that were not integrated into the main chromosome-length scaffolds are presented in the extreme lower right square.
(C) Distribution of repetitive sequences in the Psl. hygida genome. Repeat classes that represent less than 1% of the genome were grouped in the “Other” category.
(D) BUSCO assessment results for the final genome assembly and the predicted gene set. The coloring scheme for the bar segments is shown in the inset of the panel.
(E) Number of genes functionally annotated in each database, as indicated on the left-hand side of the panel. A total of 17,881 genes were annotated in the Psl. hygida genome. See also Figures S1, S3, and S4 and Tables S1–S5 and S8.
The set of secreted CAZymes encoded by the Psl. hygida genome, which includes enzymes that hydrolyze cellulose, xylans, glucans, pectin, and chitin, indicates an ability to degrade fungal and plant cell walls, and this prediction is well supported by biochemical assays and proteomic analysis of the larval salivary secretion. Secreted CAZyme sets have also been identified in the genomes of the sciarids Bradysia tilicola (Loew, 1850)37 (= Sciara coprophila Lintner, 1895)38 and the Chinese chive maggot, Bradysia cellarum (Frey, 1948)39 (= Bradysia odoriphaga Yang and Zhang, 1985),40 as well as in the genome of the Australian sheep blowfly, Lucilia cuprina (Wiedemann, 1830)41 (Diptera: Calliphoridae), whose larvae are primary facultative parasites that feed on tissue fluids, blood, and dermal tissue of their vertebrate hosts.30,33,42 Comparison of these four sets of secreted CAZymes identified a cohort of 42 CAZyme families active against components of fungal and plant cell walls that we suggest defines a sciarid CAZyme repertoire, together with 32 CAZyme families common to all four species that probably represent shared metabolic traits involving CAZyme activities. Together, this study expands knowledge of the role of sciarids as decomposers, yields insights into the evolution of feeding habits in insects, and provides a resource for prospecting enzymes of interest for biotechnological processes.
Results and discussion
The Psl. hygida genome
The Psl. hygida genome comprises 609.27 Mb (N50 = 155.8 Mb; L50 = 2; final coverage 440-fold) (Figures 1B and S1 and Tables S1–S3); it is the first genome of this genus to be sequenced and the third annotated sciarid genome available in public databases, and thus a contribution to Diptera genomics. Genome coverage across Diptera is uneven: of the 371 genome assemblies currently available (GenBank, January 2023), 77% are from the Brachycera (285 assemblies), followed by the Culicomorpha (61 assemblies, 16%). In contrast, the megadiverse infraorder Bibionomorpha, which includes the very large families Sciaridae, Mycetophilidae, and Cecidomyiidae,15,17 is represented by only 14 genome assemblies (3.7%).
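The N50 and L50 values quoted above summarize assembly contiguity: after sorting scaffolds from longest to shortest, N50 is the length of the scaffold at which the cumulative length first reaches half of the total assembly, and L50 is the number of scaffolds needed to get there. The following is a minimal Python sketch of this standard definition; the scaffold lengths in the example are hypothetical and are not the Psl. hygida scaffolds.

```python
from typing import Sequence, Tuple


def n50_l50(lengths: Sequence[float]) -> Tuple[float, int]:
    """Return (N50, L50) for a collection of scaffold lengths.

    Scaffolds are sorted from longest to shortest and accumulated until the
    running total reaches at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0.0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    # Unreachable for non-negative lengths; kept so every path returns or raises.
    raise AssertionError("cumulative length never reached half the total")


# Hypothetical assembly (lengths in Mb): four long scaffolds plus small fragments.
example = [170.0, 160.0, 130.0, 90.0, 20.0, 10.0, 10.0]
print(n50_l50(example))  # (160.0, 2)
```

In this toy assembly, the two longest scaffolds already cover half of the 590 Mb total, so L50 = 2 even though many small fragments remain.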
The Psl. hygida genome is larger than those of the sciarids B. tilicola (301.9 Mb)33 and B. cellarum (362.8 Mb).30 Approximately 62% of the Psl. hygida genome is composed of repetitive sequences (Figure 1C and Table S4), roughly 50% more, in relative terms, than in the B. tilicola (41%) and B. cellarum (40.6%) genomes. This high repeat content partially accounts for the larger size of the Psl. hygida genome compared with the two other available sciarid genomes. Nevertheless, BUSCO analyses43 using the Insecta_odb10 dataset revealed that the Psl. hygida genome is 94.6% complete (Figure 1D), similar to the B. tilicola (94.3%) and B. cellarum (96.5%) genome assemblies.30,33
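Because repeat content is itself a percentage, a statement such as "50% higher" is a relative difference, not a difference in percentage points. The short Python sketch below uses only the repeat fractions quoted in the text to show both quantities side by side.

```python
from typing import Tuple


def compare_fractions(focal_pct: float, reference_pct: float) -> Tuple[float, float]:
    """Return (percentage-point difference, relative increase in %), rounded to 0.1."""
    point_diff = focal_pct - reference_pct
    relative_increase = 100 * point_diff / reference_pct
    return round(point_diff, 1), round(relative_increase, 1)


# Repeat content (% of genome) as reported for the three sciarid assemblies.
repeat_content = {"Psl. hygida": 62.0, "B. tilicola": 41.0, "B. cellarum": 40.6}
for species in ("B. tilicola", "B. cellarum"):
    points, relative = compare_fractions(repeat_content["Psl. hygida"], repeat_content[species])
    print(f"vs {species}: +{points} percentage points, +{relative}% relative")
```

Both comparisons come out at roughly 51 to 53% in relative terms, while the absolute gap is about 21 percentage points.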
The assembled Psl. hygida genome is the most contiguous of the three sciarid genomes, which was achieved through the use of Hi-C libraries (Figure S1). The chromosome numbers of this species, 2n = 8 (XX) for females and 2n = 7 (XO) for males, were determined by analyzing the mitotic chromosomes of larval neuroblast cells.44 Accordingly, four main chromosome-length scaffolds were assembled (Figure 1B), matching the four Psl. hygida chromosomes (A, B, C, and X)44 and comprising 90.3% (550 Mb) of the whole genome assembly (Table S5). BLASTN searches using previously mapped Psl. hygida genes45,46,47 as queries identified chromosome-length scaffolds 1, 2, and 4 as chromosomes A, B, and C, respectively (Table S5). In addition, the contact matrix of the final assembly (Figure 1B) agrees with the Psl. hygida karyotype, which presents two metacentric autosome pairs (A and B) and a third, smallest, subtelocentric autosome pair (C).44 Chromosome-length scaffold 3, for which no previously characterized gene was available, was identified as the X chromosome by elimination (Table S5), and two observations support this identification. First, Psl. hygida female larvae present an unequal metacentric sex chromosome pair (X), whereas males present one metacentric X chromosome,44 and chromosome-length scaffold 3 is metacentric in the contact matrix of the final assembly (Figure 1B). Second, a high-quality cytogenetic map based on female larval polytene chromosomes is available for Psl. hygida,48 which describes several regions of ectopic pairing in one of the arms of the X chromosome, and the contact matrix of the final assembly (Figure 1B) shows that the short arm of chromosome-length scaffold 3 is not as well resolved as the long arm. This might result from the pooling of three unequal short X arms, two from females and one from males, all of which were present in the sequenced DNA and contain interspersed non-identical sequences.
The remaining 9.7% (60 Mb) of the genome is distributed across 6,448 scaffolds that were not integrated into the chromosome-length scaffolds, most likely because of their high repetitive sequence content (Figure 1C and Table S4). Previous studies have shown that the 18S and 28S ribosomal genes in Psl. hygida reside in the short arm of chromosome C,49,50 which is subtelocentric.44 However, the short arm of chromosome C is not observed in the contact matrix of the final assembly (Figure 1B). BLASTN searches confirmed the presence of the 18S and 28S ribosomal genes in chromosome-length scaffold 4/chromosome C (data not shown) and also in about 1% of the 6,448 unintegrated scaffolds. Together, these results support the suggestion that some of the 6,448 unintegrated scaffolds contain repetitive sequences that could not be placed in the chromosome-length scaffolds.
The largest fraction (48.8%) of repetitive sequences in Psl. hygida is unclassified, corresponding to about 30.6% of the whole genome sequence. In B. tilicola and B. cellarum, about 84.1% and 16.0% of the repetitive sequences, respectively, are unclassified, accounting for 29.2% (B. tilicola) and 6.5% (B. cellarum) of these genomes.30,33 The main classes of repeat elements in the Psl. hygida genome are long interspersed nuclear elements (LINEs) (17.6% of the genome), followed by DNA elements (6.3%) and long terminal repeats (LTRs) (5.7%) (Table S4). In B. tilicola, DNA elements account for 3.4% of the genome, followed by LINEs (3.1%) and LTRs (1.5%), whereas in B. cellarum LTRs account for 33.2%, LINEs for 0.9%, and DNA elements for 0.8%.30,33 Thus, although the proportions of these repeat elements vary among genomes, LTRs, LINEs, and DNA elements are the most prevalent classified repeat elements in all three sciarid genomes.
Protein-coding genes were annotated with the Maker3 pipeline using public datasets and Psl. hygida transcriptomes derived from 20 biological samples representing different developmental stages, tissues, and rearing conditions (Figure S2 and Tables S6 and S7). The predicted gene set comprises 17,881 protein-coding genes (CGs), 337 tRNAs, and 477 rRNAs, and BUSCO showed that it is 89.7% complete (Figure 1D and Table S1). Of the 17,881 CGs, 15,327 reside in the chromosome-length scaffolds and 2,554 are distributed among the 6,448 unintegrated scaffolds. Searches against four databases (NCBI metazoan database, KEGG, UniProt, and InterProScan) resulted in the functional annotation of 16,628 (92.9%) Psl. hygida CGs (Figure 1E and Table S1). A total of 15,603 CGs (87.3%) were annotated based on the NCBI metazoan database, with most alignments against predicted genes from Sciaridae: about 60% of the alignments were against the B. tilicola gene set (GenBank accession number GCA_014529535.1) and 28% against the B. cellarum gene set (GenBank accession number GCA_016920775.1). Furthermore, 73.6%, 69.4%, and 65.4% of the CGs were annotated using InterProScan, KEGG, and UniProt, respectively (Figure 1E and Table S1). Overall, 8,722 of the 17,881 (48.8%) Psl. hygida CGs were annotated using all four databases. Finally, 1,253 (7.0%) Psl. hygida CGs show no significant similarity to sequences currently deposited in any public database (Figure S3). Gene Ontology (GO) annotation assigned 7,358 of the 13,164 InterProScan hits to 16,975 GO terms. In the GO Biological Process category, the five top terms were proteolysis (9.64%), transmembrane transport (8.65%), protein phosphorylation (5.32%), carbohydrate metabolic process (4.12%), and regulation of DNA-templated transcription (4.07%) (Figure S4).
The Psl. hygida genome encodes a diverse repertoire of secreted carbohydrate-active enzymes
Sciarid larvae generally feed on decaying plant litter or on fungi growing on decaying wood; we therefore reasoned that enzymes participating in carbohydrate metabolism should be encoded in the Psl. hygida genome. We searched the genome to identify the biocatalytic potential of Psl. hygida against polysaccharides commonly found in plant and fungal cell walls. An initial BLAST search of the Psl. hygida protein dataset against the carbohydrate-active enzymes database (CAZy, www.cazy.org) identified a total of 2,955 Psl. hygida proteins (e-value ≤ 1 × 10⁻⁵) (Figure 1E and Table S8). Applying a higher stringency (e-value ≤ 1 × 10⁻⁵⁰) reduced this set to 1,022 proteins, of which 190 were predicted to be secreted by the SignalP 5.0 program51 (Table S9). Manual curation of these sequences yielded a final list of 137 secreted proteins (Table S9), comprising 82 glycoside hydrolases (GH, 59.9%), 27 glycosyl transferases (GT, 19.7%), 9 non-catalytic carbohydrate-binding modules (CBM, 6.6%), 10 enzymes with auxiliary activities (AA, 7.3%), 4 polysaccharide lyases (PL, 2.9%), and 5 carbohydrate esterases (CE, 3.6%) (Figure 2A).
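The successive filters described above (an e-value cutoff on BLAST hits against CAZy, a stricter cutoff, then a secretion signal) can be sketched as a small Python pipeline. This is an illustration under stated assumptions, not the authors' code: it assumes BLAST tabular output (`-outfmt 6`, e-value in the 11th column), keeps only the best hit per query, and takes the SignalP result as a precomputed set of protein IDs. The final helper converts the curated class counts reported in the text into percentages.

```python
import csv
from typing import Dict, Iterable, List, Set


def best_evalues(blast_rows: Iterable[str]) -> Dict[str, float]:
    """Map each query ID to its lowest e-value in BLAST -outfmt 6 rows."""
    best: Dict[str, float] = {}
    for row in csv.reader(blast_rows, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, evalue = row[0], float(row[10])  # column 11 holds the e-value
        best[query] = min(evalue, best.get(query, float("inf")))
    return best


def secreted_candidates(best: Dict[str, float], max_evalue: float,
                        signal_peptide_ids: Set[str]) -> List[str]:
    """Queries whose best hit passes the cutoff and that carry a signal peptide."""
    return sorted(q for q, e in best.items()
                  if e <= max_evalue and q in signal_peptide_ids)


def class_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each CAZyme class in a curated list, rounded to 0.1%."""
    total = sum(counts.values())
    return {cls: round(100 * n / total, 1) for cls, n in counts.items()}


# Class counts of the 137 curated secreted CAZymes, as given in the text.
curated = {"GH": 82, "GT": 27, "CBM": 9, "AA": 10, "PL": 4, "CE": 5}
print(class_percentages(curated))
```

Manual curation, the step that reduced 190 candidates to 137, has no automated counterpart here; the sketch covers only the mechanical filters.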
Figure 2.
CAZymes distribution in the predicted Psl. hygida gene set and Gene Ontology (GO) annotation of the saliva proteome
(A) Sunburst chart showing the distribution of manually curated CAZymes in the predicted Psl. hygida gene set. The inner circle shows the CAZyme classes, and the outer circle shows the numbers of annotated CAZymes found in the named families within each class. The asterisks denote CAZymes that were also identified in the Psl. hygida saliva proteome (Table S9). Those predicted CAZyme families with a single representative are shown alongside the open brackets.
(B) The bars show the GO terms annotated in the saliva proteome in the Biological Process category. The labels on each bar give the percentage of each term. GO terms with less than 1% are not shown and correspond to 10.48% of the annotated proteins. (GH) Glycoside Hydrolase; (GT) Glycosyl Transferase; (AA) Auxiliary Activities; (CBM) Carbohydrate-Binding Module; (CE) Carbohydrate Esterase; (PL) Polysaccharide Lyase. See also Figure S4 and Tables S9 and S10.
Notably, the CAZy database includes enzymes that act on the host organism's own components, such as extracellular matrix polysaccharides and endogenous insect chitin. An overview of the CAZyme sequences (Figure 2A) reveals that the predicted CAZyme secretome is rich in chitinases (GH18 and GH19) and other chitin-active enzymes (GH20 and CE4). Enzymes involved in the degradation of plant and fungal cell walls are also present, with representatives from several β-glucanase families (GH5, 16, 17, and 55), along with endoglucanases (GH9 and 45) and β-glucosidases (GH1) that act in cellulose digestion. In addition, enzymes active against xylans were identified (GH10, 11, 43, 67, and 115), together with pectinolytic enzymes (GH28, PL8, and CE8) and amylose-digesting α-amylases/α-glucosidases (GH13 and 31). Beyond the GH families, oxidoreductases are represented by glucose oxidases (AA3), glucose dehydrogenase (AA12), and lytic chitin monooxygenases (AA15) (Figure 2A). The predicted secreted proteins from Psl. hygida include relatively few known CBMs (Figure 2A), in contrast to the genomes of prokaryotes and many fungi found in biomass-degrading environments.52,53 The chitin-binding family CBM14 is associated with CE4 catalytic domains, and the β-glucan-binding family CBM39 is associated with a GH16 β-glucanase domain.
This CBM39-GH16 fusion is the ortholog of the Drosophila melanogaster β-glucan-binding protein GNBP3, which is involved in sensing fungal infection and triggering Toll pathway activation.54 During manual curation, nine sequences were found to contain multiple tandem repeats of chitin-binding domains (annotated as CBM14 domains), a typical feature of insect peritrophins.55 Nine GT families are represented, and GT1, with 18 enzymes, is the largest single family; however, none of the GT proteins identified in the genome was detected in the saliva proteome (CAZyme families with asterisks in Figure 2A; highlighted in green in Table S9). Because GTs directed to the secretory pathway are resident proteins of the endoplasmic reticulum or Golgi apparatus,56 they will not be discussed further here.
Proteome analysis reveals the presence of CAZymes in the saliva of Psl. hygida larvae
Laboratory-cultured Psl. hygida larvae feed on a mixture of partially decomposed Ilex paraguariensis leaves and soil and, throughout the larval stage, continuously secrete saliva that can be observed under a stereomicroscope as fine threads deposited on top of the diet. To characterize the saliva protein content and investigate the presence of the predicted CAZymes, we performed a proteome analysis of saliva from 12-day-old larvae, a developmental stage during which the larvae are actively feeding. SDS-PAGE revealed a complex protein composition (Figure S5), which was confirmed by mass spectrometry analysis of the gel slice. Searching against the predicted Psl. hygida protein database identified a total of 1,149 proteins, of which 375 included a signal peptide as predicted by SignalP 5.051 (Table S10). Of the nine chitin-binding peritrophin-A domain proteins containing a signal peptide in the Psl. hygida genome, only a single triple CBM14 repeat protein (Bhyg_10270-RA), typical of cuticular proteins analogous to peritrophins (CPAPs),55 was identified in the proteome. This result suggests only minor contamination of the saliva samples with proteins from larval tissues and indicates that the collection method, in which the last abdominal segment was tied to avoid contamination with fecal material, did not injure the larvae.
Of the 375 proteins with a signal peptide, Gene Ontology analysis in the Biological Process category classified 124, identifying 26 (21%) as involved in carbohydrate metabolic processes, a category surpassed only by proteolysis with 78 (62.9%) (Figure 2B). Compared with the GO Biological Process categories of all proteins predicted in the genome (Figure S4), the saliva is enriched in enzymes involved in proteolysis (6.5-fold) and carbohydrate metabolism (5.1-fold), which may be an adaptation to the decomposer lifestyle for obtaining saccharides, peptides, and amino acids from the environment and also for detoxifying or avoiding plant defenses.57
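The fold-enrichment values above follow from dividing each category's share among GO-classified saliva proteins by its share in the whole predicted proteome. The minimal sketch below uses only the counts and percentages quoted in the text (78 and 26 of 124 classified saliva proteins; 9.64% and 4.12% genome-wide). It assumes that the genome-wide percentages in Figure S4 are directly comparable with the saliva shares, which depends on how both were tallied.

```python
def fold_enrichment(hits: int, classified: int, background_pct: float) -> float:
    """Ratio of a GO term's share in a subset to its genome-wide share (%)."""
    subset_pct = 100 * hits / classified
    return subset_pct / background_pct


saliva_classified = 124  # secreted saliva proteins with a Biological Process term
terms = {
    # term: (saliva proteins with the term, genome-wide share in %)
    "proteolysis": (78, 9.64),
    "carbohydrate metabolic process": (26, 4.12),
}
for term, (hits, genome_pct) in terms.items():
    ratio = fold_enrichment(hits, saliva_classified, genome_pct)
    print(f"{term}: {ratio:.1f}-fold")
```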
Thirty-three of the 137 predicted secreted CAZymes in the Psl. hygida genome were also present among the 375 signal peptide-containing proteins identified in the saliva proteome, corresponding to 8.8% (33/375) of the secreted proteins in the proteome and 24.1% (33/137) of the proteins in the final curated Psl. hygida CAZyme list (Table S9). These results indicate that the genome annotation successfully predicted the catalytic capacity of the Psl. hygida secretome, and the list of 33 CAZymes identified in the saliva proteome guided the planning of the enzyme activity assays.
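The two percentages above express the same overlap against different denominators: the secreted saliva proteome and the curated CAZyme list. A small Python sketch makes this explicit with set operations. The ID sets built here are synthetic placeholders sized to match the reported counts (137 curated CAZymes, 375 secreted saliva proteins, 33 shared) and are not real protein identifiers.

```python
from typing import Set, Tuple


def overlap_shares(curated: Set[str], proteome: Set[str]) -> Tuple[int, float, float]:
    """Return (shared count, % of proteome, % of curated list), rounded to 0.1."""
    shared = curated & proteome
    return (len(shared),
            round(100 * len(shared) / len(proteome), 1),
            round(100 * len(shared) / len(curated), 1))


# Synthetic placeholder IDs: 33 shared, 104 genome-only, 342 saliva-only.
shared_ids = {f"shared_{i}" for i in range(33)}
curated_cazymes = shared_ids | {f"genome_only_{i}" for i in range(104)}
secreted_saliva = shared_ids | {f"saliva_only_{i}" for i in range(342)}
print(overlap_shares(curated_cazymes, secreted_saliva))  # (33, 8.8, 24.1)
```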
Enzyme assays confirm CAZyme activity in Psl. hygida saliva against polysaccharides and synthetic substrates
We next correlated the CAZyme content predicted from genome annotation and found in the saliva proteome with enzyme activity measurements using a range of polysaccharide and synthetic pNP-glycosyl substrates. We first investigated enzymatic activities against polysaccharides present in the plant cell wall. Reducing sugar-releasing activity was detected with the polysaccharide substrates xyloglucan, β-glucan, xylan, and pectin, as well as with the cellulose derivative carboxymethyl cellulose (CMC) and Avicel (microcrystalline cellulose) (Figure 3A). These results confirm the presence of enzymes that use these polysaccharides as substrates; in particular, the high levels of β-glucanase activity correlate with the presence of endo-1,3-β-glucanases (Bhyg_05720-RA and Bhyg_05920-RA) and glucan-1,3-β-glucosidases (Bhyg_03001-RA and Bhyg_11825-RA) (Table 1). Catalytic profiling with liquid chromatography–mass spectrometry (LC-MS) further confirmed the saliva β-glucanase activity through the detection of DP2, DP3, and DP4 hydrolysis products from the β-glucan substrate (Figure S6). Xylanase activity correlates with the presence of endo-1,4-β-xylanases (Bhyg_03654-RA and Bhyg_10010-RA) and an α-L-arabinofuranosidase (Bhyg_13495-RA). Enzymes active against pectin included a polygalacturonase (Bhyg_08062-RA) and pectate lyases (Bhyg_02396-RA and Bhyg_08863-RA) (Table 1). The reducing sugar-release assays with CMC and Avicel substrates showed low levels of activity, and LC-MS results (Figure S7) demonstrated the presence of cellobiose (or carboxymethyl cellobiose in the case of the CMC substrate). These are typical catalytic properties of GH9 β-glucanases,58,59 and two GH9 enzymes (Bhyg_05554-RA and Bhyg_13095-RA) were observed in the saliva proteome (Table 1).
Figure 3.
Catalytic activity of the Psl. hygida saliva
(A) Catalytic activity of saliva measured against polysaccharides at pH 5 (blue bars) and pH 8 (orange bars). The polysaccharides are as indicated in the graph. Data are shown as mean ± SD of three measurements from each of two biological replicates.
(B) Catalytic activity of saliva measured against synthetic p-nitrophenol monosaccharide derivatives at pH 5 (blue bars) and pH 8 (orange bars). Individual substrates are as shown in the graph. Data are shown as mean ± SD of three measurements from each of two biological replicates.
(C) LC-MS chromatograms showing the retention times of chitin oligosaccharides in chitin samples (black lines) and after treatment with saliva (red lines). The degree of polymerization and m/z ratio of each chitooligosaccharide are indicated in the relevant panel. See also Figures S6–S10 and Tables 1, S9, and S10.
Table 1.
CAZymes identified in the proteome with inferred activities in the biochemical assays and discussed in the text
| Protein | Annotation | EC Number | CAZy family | Substrate | Activity |
|---|---|---|---|---|---|
| Bhyg_05720-RA | Endo-1,3-β-glucanase | 3.2.1.39 | GH17 | β-glucan | PCW |
| Bhyg_05920-RA | Endo-1,3-β-glucanase | 3.2.1.39 | GH16_4 | β-glucan | PCW |
| Bhyg_03001-RA | Glucan-1,3-β-glucosidase | 3.2.1.58 | GH55 | β-glucan | PCW |
| Bhyg_11825-RA | Glucan-1,3-β-glucosidase | 3.2.1.58 | GH5 | β-glucan | PCW |
| Bhyg_03654-RA | Endo-1,4-β-xylanase | 3.2.1.8 | GH11 | Xylan; pNP-β-Xyl | PCW |
| Bhyg_10010-RA | Endo-1,4-β-xylanase | 3.2.1.8 | GH10 | Xylan; pNP-β-Xyl | PCW |
| Bhyg_13495-RA | Exo-α-(1–5)-L-arabinofuranosidase | 3.2.1.55 | GH43_26 | Xylan; pNP-α-Ara | PCW |
| Bhyg_08062-RA | Polygalacturonase | 3.2.1.67 | GH28 | Pectin | PCW |
| Bhyg_02396-RA | Exo-β-1,4-glucuronan lyase | 4.2.2.- | PL8_4 | Pectin | PCW |
| Bhyg_08863-RA | Pectate lyase | 4.2.2.2 | PL1 | Pectin | PCW |
| Bhyg_05554-RA | Endoglucanase | 3.2.1.4 | GH9 | CMC; Avicel | PCW |
| Bhyg_13095-RA | Endoglucanase | 3.2.1.4 | GH9 | CMC; Avicel | PCW |
| Bhyg_02719-RA | Lactase | 3.2.1.21/23 | GH1 | pNP-β-Glu; pNP-β-Fuc | PCW |
| Bhyg_13059-RA | β-galactosidase | 3.2.1.23 | GH35 | pNP-β-Gal | PCW |
| Bhyg_05816-RA | Endochitinase | 3.2.1.14 | GH19 | Chitin | IE/FCW |
| Bhyg_10652-RA | Chitooligosaccharidolytic β-N-acetylglucosaminidase | 3.2.1.52 | GH20 | Chitin | IE/FCW |
| Bhyg_13716-RA | Lytic chitin monooxygenase | 1.14.99.53/54 | AA15 | Chitin | IE/FCW |
| Bhyg_09101-RA | Lytic chitin monooxygenase | 1.14.99.53/54 | AA15 | Chitin | IE/FCW |
Protein, Annotation, EC number, and CAZy family are as described in Table S9. Substrate indicates the substrate on which enzymatic activity was detected in the biochemical assays. Activity indicates whether the identified proteins and the corresponding enzymatic activities may be related to plant cell wall degradation (PCW), insect exoskeleton remodeling during molting (IE), or fungal cell wall degradation (FCW). See also Figures 3, S6–S8 and Tables S9 and S10.
To further characterize the enzymatic activities, we next assayed activity against a set of pNP-glycosyl substrates (Figure 3B). Activity against pNP-β-Fuc, pNP-β-Xyl, pNP-β-Glu, pNP-β-Gal, and pNP-β-Man was observed, whereas significantly lower activity was detected against pNP-α-Ara, pNP-α-Xyl, and pNP-α-Gal (Figure 3B). The observed activities broadly reflect the CAZyme capacity identified in the saliva proteome, in which β-1,4-xylanases (Bhyg_03654-RA and Bhyg_10010-RA), a β-1,4-glucosidase (Bhyg_02719-RA), and a β-1,4-galactosidase (Bhyg_13059-RA) were identified (Tables 1 and S9). An α-L-arabinofuranosidase (Bhyg_13495-RA) was identified in the saliva proteome, and the low-level activity against pNP-α-Ara is consistent with its function as a debranching enzyme that removes arabinosyl groups from O-2- or O-3-mono-substituted xylose residues in arabinoxylan.60 Interestingly, β-fucosidase and β-mannosidase activities were detected, although the proteome identified α-fucosidases (Bhyg_00429-RA and Bhyg_15341-RA) and an α-mannosidase (Bhyg_09007-RA) (Table S9). All residues in the −1 glycone-binding site of the GH1 enzyme identified in the proteome (Bhyg_02719-RA) are conserved with those of other GH1 β-glucosidases that also present β-fucosidase activity,61 which may account for the β-fucosidase activity; however, no clear candidate for the β-mannosidase activity was identified in the proteome. Salivary α-xylosidase activity was low, and no α-xylosidase candidate was observed in the proteome. No α-galactosidase activity was detected in the saliva, and indeed no secreted α-galactosidases were predicted in the genome. The broad agreement between the enzymatic activity assays and the saliva proteome demonstrates that Psl. hygida larvae secrete a collection of CAZymes that can actively depolymerize diverse plant cell wall polysaccharides and thereby provide a source of carbohydrates for growth and development.
The proteomic analyses also indicate the presence of an endochitinase and an exochitinase in the saliva samples (Bhyg_05816-RA and Bhyg_10652-RA, respectively), enzymes potentially active against chitin, a polysaccharide component of both the insect exoskeleton and the fungal cell wall. LC-MS experiments confirmed endochitinase activity after incubation of chitin with saliva: DP4 and DP5 chitooligosaccharides were absent and DP3 levels were reduced, with a concomitant increase in N-acetylglucosamine (DP1) and N,N′-diacetylchitobiose (DP2) (Figure 3C and Table 1). No evidence of chitin deacetylation was detected in the LC-MS experiments. Further analysis of the LC-MS spectra from the chitin incubations revealed low levels of aldonic acid and lactone products (Figure S8), indicative of oxidative cleavage of β-1,4-glycosidic bonds in the chitin polysaccharide. The proteome includes two chitin-specific family AA15 lytic chitin monooxygenases (Bhyg_13716-RA and Bhyg_09101-RA), members of the lytic polysaccharide monooxygenase (LPMO) superfamily, and it is striking that trace LPMO activity could be detected against the chitin substrate even though the assays were not optimized for this activity. Insect chitinases are generally assigned a role in exoskeleton remodeling during molting;62,63 however, we cannot exclude the possibility that the chitinases in larval saliva degrade chitin in fungal cell walls, which are probably part of the larval diet in nature.
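To make the LC-MS reasoning above concrete, the sketch below computes monoisotopic masses of chitooligosaccharides (GlcNAc)n and the shifts expected for C1-oxidized products: the lactone is two hydrogens lighter than the native oligomer (−2.0157 Da), and its hydrated aldonic acid form is one oxygen heavier (+15.9949 Da). It assumes singly protonated [M+H]+ ions. The adducts and exact m/z values used in Figure 3C and Figure S8 are not stated in this section, so these numbers are illustrative and not a reproduction of the published panels.

```python
# Monoisotopic atomic masses (Da) and the proton mass.
H = 1.00782503207
C = 12.0
N = 14.0030740048
O = 15.99491461956
PROTON = 1.00727646688

GLCNAC_RESIDUE = 8 * C + 13 * H + N + 5 * O  # C8H13NO5, anhydro-GlcNAc residue
WATER = 2 * H + O


def chitooligo_mass(dp: int) -> float:
    """Neutral monoisotopic mass of the chitooligosaccharide (GlcNAc)dp."""
    if dp < 1:
        raise ValueError("degree of polymerization must be >= 1")
    return dp * GLCNAC_RESIDUE + WATER


def mz_protonated(neutral_mass: float) -> float:
    """m/z of the singly protonated [M+H]+ ion (an assumed adduct)."""
    return neutral_mass + PROTON


for dp in range(1, 6):
    native = chitooligo_mass(dp)
    lactone = native - 2 * H  # C1 oxidation removes two hydrogens
    aldonic = native + O      # lactone + H2O = native + one oxygen
    print(f"DP{dp}: native {mz_protonated(native):.4f}, "
          f"lactone {mz_protonated(lactone):.4f}, "
          f"aldonic acid {mz_protonated(aldonic):.4f}")
```

For DP2 (N,N′-diacetylchitobiose), this gives roughly 425.177 for the native ion, 423.161 for the lactone, and 441.172 for the aldonic acid, under the [M+H]+ assumption.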
CAZyme activity of Psl. hygida saliva on leaf matter
Shortly after hatching, Psl. hygida larvae group together, and throughout the larval stage these clusters migrate toward the areas of the cultivation boxes that contain fresh diet. During migration and feeding, the larvae continuously secrete saliva onto the diet and defecate, so we next investigated whether the enzymatic activities detected in larval saliva are also present in diet that had been exposed to free-living Psl. hygida larvae (Figure S9). In the experiments evaluating the effect of saliva on leaf litter, enzyme assays against each substrate were corrected for the activity observed in litter samples in the absence of saliva, and the levels of β-glucanase, xylanase, pectinase, and CMCase activities were all significantly higher on leaf matter exposed to larvae than on unexposed leaf matter (Figure S9A). Higher activity was also observed against pNP-β-Fuc, pNP-β-Xyl, pNP-β-Glu, and pNP-β-Man, with lower activity against pNP-α-Gal, pNP-β-Gal, pNP-α-Ara, and pNP-α-Xyl (Figure S9B). The broad correlation between the activity data from saliva samples (Figure 3) and from plant litter (Figure S9) therefore supports the hypothesis that salivary enzymes remain active once secreted into the environment.
We next performed a preliminary analysis of the activity in gut extracts from 12-day-old larvae. The activity profiles of gut extracts against pNP-glycosyl and polysaccharide substrates differed from those of saliva samples (Figure S10). For example, at pH 5, pNP-β-Gal hydrolysis activity was relatively lower in the gut extract than in saliva. In contrast, xylanase activity was higher in gut extracts than in saliva; moreover, gut extract xylanase activity was similar at pH 5 and pH 8, whereas in saliva it was 6-fold higher at pH 5 than at pH 8. Although the additional activity most likely includes a contribution from the gut microbiota,64 a search of the gut transcriptome identified transcripts corresponding to all 18 proteins that were identified in the saliva proteome and validated through enzymatic assays (Table 1). This suggests that a subset of the proteins detected in the saliva proteome is also expressed in the gut, and therefore that host gut enzymes are likely to contribute to biomass degradation in sciarids as well. Further work is required to resolve the respective contributions of salivary CAZymes, host gut CAZymes, and the gut microbiota to total enzymatic activity. Nevertheless, the enzymes detected in leaf litter after larval migration most likely comprise enzymes from larval saliva, host gut enzymes in feces, and the gut microbiome, which might represent a strategy that benefits groups of larvae feeding in the same area. This opens the interesting possibility that the contribution of sciarid larvae to decomposition in soils depends on a digestive process that begins with salivary secretion and continues after defecation.
Identification of a sciarid-specific CAZyme repertoire
We next investigated whether the catalytic capabilities identified in Psl. hygida are also found in the two other sciarid species for which genome sequence information is available.30,33 CAZyme profiling followed by manual curation identified a total of 204 and 271 secreted CAZymes in the genomes of B. cellarum and B. tilicola, respectively (Figure 4 and Table S9). The sets of secreted CAZyme families present in B. cellarum (55 families) and B. tilicola (66 families) were compared with the 52 secreted CAZyme families present in the Psl. hygida genome. A total of 74 secreted CAZy families are present in at least one of the three sciarid genomes investigated. Of these, 54.1% (40/74) are common to all three species, 25.7% (19/74) are shared between two species, and 20.3% (15/74) are unique to one species (Figure 5A).
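The partition summarized in Figure 5A can be reproduced with simple set operations. The following Python sketch assumes that each species' secreted CAZyme repertoire is available as a set of family labels; the toy sets are hypothetical placeholders, not the Table S9 data. It tallies families by the number of species that carry them and expresses each class as a percentage. Applied to the reported counts, it also provides a consistency check: summing each family's occupancy must equal the sum of the per-species family counts, and 3 × 40 + 2 × 19 + 1 × 15 = 173 = 52 + 55 + 66. The same union-minus-outgroup operation underlies the comparison with L. cuprina (Figure 5B).

```python
from collections import Counter


def occupancy_counts(family_sets: dict[str, set[str]]) -> Counter:
    """Tally families by how many species carry them (3 = all, 2 = two, 1 = unique)."""
    union = set().union(*family_sets.values())
    return Counter(sum(fam in fams for fams in family_sets.values()) for fam in union)


def as_percentages(counts: Counter, total: int) -> dict[int, float]:
    """Express each occupancy class as a percentage of all families, to one decimal place."""
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


def absent_from_outgroup(family_sets: dict[str, set[str]], outgroup: set[str]) -> set[str]:
    """Families found in at least one ingroup species but not in the outgroup."""
    return set().union(*family_sets.values()) - outgroup


if __name__ == "__main__":
    # Hypothetical toy sets for illustration only; not the real family membership.
    toy = {
        "species_A": {"GH1", "GH9", "GH10"},
        "species_B": {"GH1", "GH9"},
        "species_C": {"GH1", "GH30"},
    }
    print(occupancy_counts(toy))
    print(absent_from_outgroup(toy, outgroup={"GH1"}))

    # Consistency check on the counts reported in the text (Figure 5A).
    reported = Counter({3: 40, 2: 19, 1: 15})
    assert sum(reported.values()) == 74
    assert sum(k * v for k, v in reported.items()) == 52 + 55 + 66
    print(as_percentages(reported, 74))
```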
Figure 4.
CAZyme families and occurrences identified in the genomes of three sciarids and L. cuprina
Number of enzymes in each CAZyme class:
(A) auxiliary activities (AA).
(B) carbohydrate-binding modules (CBM).
(C) carbohydrate esterases (CE).
(D) glycosyl transferases (GT).
(E) polysaccharide lyases (PL).
(F) glycosyl hydrolases (GH).
Each bar shows the number of enzymes found in the given CAZy family in the four genomes analyzed: Psl. hygida (blue), B. cellarum (orange), B. tilicola (gray), and L. cuprina (yellow). For clarity, the GT (panel D) and GH (panel F) families are split into two bar charts showing the less (i) and more (ii) numerous enzymes, with suitably adjusted x axis scales. See also Table S9.
Figure 5.
Comparative analysis of the number of CAZy families found in the sciarid and L. cuprina genomes
(A) Venn diagram showing the number of CAZy families in the genomes of Psl. hygida, B. tilicola, and B. cellarum.
(B) Venn diagram showing the number of CAZy families present in the L. cuprina genome and in a combined dataset of all CAZy families present in the genomes of Psl. hygida, B. tilicola, and B. cellarum (All sciarids). See also Table S9.
The subset of secreted CAZyme families shared by the three sciarid species might be indicative of core biochemical pathways associated with dipteran CAZyme activities. Furthermore, this comparison may also identify CAZyme families associated with the general sciarid larval lifestyle, namely acting as plant litter decomposers in soil food webs. These hypotheses were evaluated by comparing the 74 sciarid CAZy families with the CAZy families present in the calliphorid blowfly L. cuprina, a facultative parasite whose larvae feed on the tissue fluids, blood, and dermal tissue of a vertebrate host, and whose adults feed on plant and animal material.42,65 The curated predicted secreted CAZyme profile of L. cuprina includes 148 sequences from 32 CAZy families, fewer than in any of the three sciarids (Figure 4 and Table S9). Comparison of these families with a combined dataset of all CAZyme families present in sciarids (Figure 5B) shows that 100% (32/32) of L. cuprina CAZyme families are common to all four species, probably reflecting common metabolic traits derived from CAZyme activities.
More specifically, all four sets of secreted CAZy families include GH18 chitinases (3.2.1.14), chitin-active GH20 β-hexosaminidases (3.2.1.52), GH27 α-N-acetylgalactosaminidases (3.2.1.49), CE4 chitin deacetylases (3.5.1.41), and AA15 LPMOs, consistent with the suggested role of LPMOs in chitin turnover during molting66 (Figure 4 and Table S9). LPMOs preferentially use hydrogen peroxide as a cosubstrate, which may be generated by the AA3_2 GMC oxidoreductases also present in all four species.67 All four predicted secretomes include enzymes for trehalose metabolism, namely both GH37 and GH65 trehalases (3.2.1.28). The L. cuprina predicted secretome includes more genes encoding secreted GH22 C-type lysozymes (3.2.1.17) than the sciarid genomes (Figure 4 and Table S9); this may reflect an alternative chitin digestion strategy, although C-type lysozyme expression could be related to immunity and/or digestion, as proposed for the carrion beetle Nicrophorus vespilloides.68 All four species have multiple copies of GH13_15 α-amylases (3.2.1.1) and GH13_17 α-glucosidases (3.2.1.20), indicating the potential to hydrolyze α-(1→4) glycosidic bonds in starch and dextrins. In addition, one or more of the GH31 enzymes may have α-glucosidase activity. Finally, all four genomes include multiple copies of chitin-binding peritrophin/chorion/cuticle CBM14 proteins (Figure 4 and Table S9). In addition to the catalytically active enzymes, all four species have at least one CBM39-GH16_4 β-glucan-binding protein fusion (highlighted in yellow in Table S9). Three-dimensional modeling and amino acid sequence alignment confirm substitution of the key catalytic nucleophile and acid/base residues in the active sites, leading to a loss of enzymatic activity, consistent with the role of these proteins as antifungal sensors in the insect innate immune system.54
In addition to the 32 CAZyme families shared by all four species, the comparison identified 42 CAZyme families found only in sciarids, which were investigated in more detail (Figure 5B). Notably, the sciarid CAZy families are enriched in GH enzymes compared with those of L. cuprina. Although GH18 chitinases are present in all four genomes, GH19 chitinases are found only in the three sciarid predicted secretomes (Figure 4 and Table S9). Since the GH19 enzyme Bhyg_05816-RA is observed in the Psl. hygida proteome, these enzymes are candidates for secreted salivary chitinases in sciarids. In metazoans, GH19 chitinases have previously been found only in parasitoid wasps and mosquitoes, where their presence has been attributed to two independent horizontal gene transfer (HGT) events.69 Further phylogenetic studies are needed to establish whether the GH19 chitinases in sciarids also result from HGT. In addition, whereas the AA3_2 and LPMO AA15 enzymes are present in all four predicted secretomes, the sciarids also include the AA12 glucose dehydrogenase (1.1.99.35) (Figure 4 and Table S9). The glucose dehydrogenase may oxidize glucose to generate the reducing equivalents required for LPMO activity, and we note that the AA12, AA3_2, and LPMO AA15 enzymes are all observed in the Psl. hygida saliva proteome together with LPMO activity (Tables S9 and S10, and Figure S8). This may indicate an additional role for the AA15 enzymes in sciarid saliva in the oxidative breakdown of cellulose and/or chitin in plant and fungal cell walls. Although not a CAZyme, a secreted catalase was identified in the Psl. hygida saliva proteome, and catalases are present in all three sciarid predicted secretomes (Table S9) but absent from L. cuprina. This may indicate a cohort of oxidoreductases in sciarid saliva that maintains a redox balance favorable for polysaccharide degradation.
All sciarid predicted secretomes contain enzymes active against the principal polysaccharide components of lignocellulose (Figure 4 and Table S9). Pectin degradation is suggested by the presence of pectate lyase (PL1), exo-polygalacturonase (GH28), and pectin esterase (CE8). In addition, pectin acetylesterase (CE13) is present in B. cellarum and B. tilicola. Potential xyloglucan degradation is represented by GH45 endoglucanases (3.2.1.4), active against soluble β-1,4-glucan, acting together with the debranching α-1,6-xylosidase (3.2.1.177), β-1,4-galactosidase (3.2.1.23), and α-1,2-fucosidase (3.2.1.51) activities provided by GH31, GH35, and GH29 enzymes, respectively. Notably, the B. cellarum genome encodes a chimeric β-1,4-galactosidase/α-1,2-fucosidase polypeptide (KAG4075199.1) composed of fused GH2 and GH29 domains. Cellulose degradation is suggested by the presence of GH9 endoglucanases (3.2.1.4) and putative GH1 β-glucosidases. CAZymes for glucan hydrolysis are well represented in all sciarid genomes, with possible endo-(1,3)-β-D-glucanases (3.2.1.39) from GH16, 17, 64, and 71 together with exo-(1,3)-β-D-glucanases (3.2.1.58) from GH5 and GH55 (Figure 4). This catalytic capacity would allow the hydrolysis of plant and fungal β-D-glucans, including lichenin and laminarin. In addition, the B. cellarum and B. tilicola genomes include GH30_1 (3.2.1.-) and GH30_3 (3.2.1.75) enzymes that hydrolyze β-1,6-glycosidic linkages in fungal 1,3/1,6-β-D-glucans. The GH30 family is absent from the Psl. hygida genome (Figure 4 and Table S9), suggesting that the two Bradysia species have a more specialized capacity for fungal glucan hydrolysis.
Xylan-active enzymes show subtle differences between the three sciarids (Figure 4 and Table S9). The Psl. hygida and B. tilicola predicted secretomes include GH10 and GH11 endoxylanases (3.2.1.8), which are absent in B. cellarum. All sciarids include β-xylosidases (GH43, 3.2.1.37) and the debranching enzymes xylan α-1,2-glucuronidase (GH67, 3.2.1.131; Psl. hygida also has a GH115 enzyme), β-galactosidase (GH1, GH35, 3.2.1.23), and α-L-arabinofuranosidase (GH43_26, 3.2.1.55; B. cellarum and B. tilicola also have a GH51 enzyme). This cohort of GHs could hydrolyze a range of hemicelluloses such as arabinoxylan, arabinogalactan, and L-arabinan.
The sciarid predicted secretomes include multiple GH31 and GH1 enzymes that may include α-galactosidase (3.2.1.22) and β-1,4-mannosidase (3.2.1.25, detected in the Psl. hygida saliva biochemical assays), respectively (Figures 3, 4, and Table S9). In addition, B. cellarum and B. tilicola include GH5_10 endo-1,4-β-mannosidase (3.2.1.78) potentially allowing hydrolysis of plant β-mannans, galactomannans, and galactoglucomannans. Furthermore, Psl. hygida contains a GH76 endo-1,6-α-mannosidase (3.2.1.101) suggesting hydrolytic capacity against α-mannans present in some yeast and fungal cell walls (Figure 4 and Table S9).
Overall, our analyses reveal that the CAZyme potential identified in the Psl. hygida genome, and partially validated through proteomic analysis and enzymatic assays, is also found in the genomes of B. tilicola and B. cellarum and suggests that all three sciarids are able to degrade and use plant and fungal cell walls as a food source. Our results are in agreement with previous studies, undertaken both under laboratory and field conditions, that have shown that the larvae of different Bradysia species are able to develop using different combinations of a variety of food sources, including, but not restricted to, decomposing plant material, fungus, healthy plant tissue, and manure (reviewed in the study by Harris et al.26).
Available information on feeding habits indicates that, in the laboratory, Psl. hygida feeds on decaying plant matter and can also use white mushrooms (Agaricus bisporus) as a food source; B. cellarum is considered an agricultural pest in China that damages crops from at least seven plant families;70 and B. tilicola is a synanthropic species that damages greenhouse plants and edible mushrooms, as shown by Broadley et al.,71 and is maintained in the laboratory on a food mixture of yeast, mushroom powder, and ground straw.72 However, the possibility that each of these species exploits other food sources cannot be excluded, since their feeding habits have not been systematically studied in their natural habitats. A reconstruction of the Sciaridae phylogeny led to the proposal that the ancestral larval habitat of the family is dead plant material and that shifts to feeding on living plants occurred in the more derived genera, including Bradysia.29
We propose that the set of 42 CAZyme families identified in the comparative analysis (Figure 5B) may enable sciarid larvae to feed on a wide variety of food sources, including plant material and fungi, which could explain why some sciarid species are able to migrate to anthropogenic ecosystems and become synanthropic agricultural pests.23 The partial overlap between the feeding habits of the investigated species might explain why no species-specific CAZyme sets could be unambiguously identified in this study. For instance, the available information about B. cellarum indicates that it feeds on living plants;70 however, comparison of its CAZyme repertoire with those of Psl. hygida and B. tilicola (Figures 4, 5B, and Table S9) did not detect unique CAZyme families that might indicate a phytophagous habit. In addition, the repertoire of CAZymes shared between the sciarid species might also reflect other, species-specific niche factors. In either case, sequencing of additional sciarid genomes should reveal whether the CAZyme potential identified in Pseudolycoriella and Bradysia is also present in other sciarid species.
The comparative genome analyses identified a putative set of enzymes that defines a sciarid CAZyme profile. This suggestion is supported by the observation that the differences in feeding habits between the sciarids and L. cuprina are reflected in their CAZyme repertoires. Plant matter is an essential food source for insects that feed on wood, foliage, or plant litter, and CAZyme repertoires similar to the sciarid CAZyme profile might be expected in the genomes of other Diptera or insects that use these nutrient sources. Indeed, a recent comparative study of 23 species of longhorned beetles (Cerambycidae) identified a set of 10 CAZyme families present in the majority of the species that would enable these xylophagous beetles to deconstruct plant cell walls.10 Notably, none of these 10 CAZyme families is present in the L. cuprina genome, whereas four (GH9, GH28, GH45, and GH43_26) are present in all three sciarid genomes, and a fifth, GH10, is found in both the Psl. hygida and B. tilicola genomes. Additional studies might reveal whether the sciarid CAZyme profile includes a core of CAZymes that constitutes a signature of a more general, ancestral habit of feeding on plant and fungal cell walls. Further phylogenetic comparisons with other Diptera and insects will expand knowledge of the relationships between CAZyme profiles and feeding habits.
Limitations of the study
Current estimates suggest that there are about 8,000–10,000 sciarid species, and one limitation of this study relates to taxonomic sampling. The family Sciaridae currently comprises four accepted subfamilies (Chaetosciarinae, Cratyninae, Megalosphyinae, and Sciarinae) and a fifth, still undescribed subfamily with the provisional name “Pseudolycoriella group”, which includes ten genera.29,73,74,75 However, genome sequences are available only for two Megalosphyinae representatives of Bradysia (B. tilicola and B. cellarum) and for a single member of the Pseudolycoriella genus group (Psl. hygida), which is presented in this study. In addition, the composition of the saliva of larvae in natural environments is unknown, and the saliva used in the proteomic and biochemical assays, which was collected from laboratory-reared Psl. hygida larvae, may differ from that produced in the natural habitat. Furthermore, the biochemical characterization of the CAZyme potential identified in the Psl. hygida saliva proteome was limited by the available substrates, and assays using additional substrates would be expected to reveal and/or confirm further CAZyme activities. Finally, the enzyme activity measured in the saliva was compared with the repertoire of CAZymes curated from the genome and those actually identified in the saliva proteome. Although this provides strong correlative evidence, characterization of heterologously expressed enzymes would provide direct confirmation.
STAR★Methods
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Chemicals, peptides, and recombinant proteins | ||
| Ammonium acetate | Supelco | Cat#73594 |
| Ammonium hydroxide | Sigma-Aldrich | Cat#30501 |
| 3,5-Dinitrosalicylic acid | Sigma-Aldrich | Cat#D0550 |
| Potassium chloride | Sigma-Aldrich | Cat#P3911 |
| Potassium phosphate monobasic | Sigma-Aldrich | Cat#P0662 |
| Sodium chloride | Sigma-Aldrich | Cat#S9888 |
| Sodium phosphate dibasic dodecahydrate | Sigma-Aldrich | Cat#71650 |
| Sodium tetraborate | Sigma-Aldrich | Cat#B9876 |
| p-nitrophenyl-β-D-fucopyranoside (pNP-β-Fuc) | Sigma-Aldrich | Cat#S781304 |
| p-nitrophenyl-β-D-xylopyranoside (pNP-β-Xyl) | Sigma-Aldrich | Cat#N2132 |
| p-nitrophenyl-β-D-glucopyranoside (pNP-β-Glu) | Sigma-Aldrich | Cat#N7006 |
| p-nitrophenyl-β-D-galactopyranoside (pNP-β-Gal) | Sigma-Aldrich | Cat#N1252 |
| p-nitrophenyl-β-D-mannopyranoside (pNP-β-Man) | Sigma-Aldrich | Cat#N1268 |
| p-nitrophenyl-α-L-arabinofuranoside (pNP-α-Ara) | Sigma-Aldrich | Cat#N3641 |
| p-nitrophenyl-α-D-galactopyranoside (pNP-α-Gal) | Sigma-Aldrich | Cat#N0877 |
| p-nitrophenyl-α-D-xylopyranoside (pNP-α-Xyl) | Sigma-Aldrich | Cat#N1895 |
| Avicel | Sigma-Aldrich | Cat#11365 |
| Carboxymethylcellulose | Sigma-Aldrich | Cat#C4888 |
| Citrus pectin | Sigma-Aldrich | Cat#P9135 |
| Chitin from shrimp shells | Sigma-Aldrich | Cat#C9752 |
| Xylan from beechwood | Sigma-Aldrich | CAS: 9014-63-5 |
| Sequencing Grade Modified Trypsin | Promega | Cat#V5111 |
| RNeasy Plus Micro Kit | QIAGEN | Cat#74034 |
| Software and algorithms | ||
| Velvet 1.2.10 | Zerbino and Birney76 | https://github.com/dzerbino/velvet/ |
| HGAP4 | Chin et al.77 | https://github.com/PacificBiosciences/pb-assembly |
| Quiver 1.1.0 | Chin et al.77 | https://github.com/PacificBiosciences/pb-assembly |
| SSPACE 3.0 | Boetzer et al.78 | https://github.com/nsoranzo/sspace_basic |
| GapFiller 1.10 | Boetzer and Pirovano79 | https://sourceforge.net/projects/gapfiller/ |
| BUSCO 5.1.2 | Manni et al.43 | https://busco.ezlab.org/ |
| Juicer 1.5 | Durand et al.80 | https://github.com/aidenlab/juicer |
| 3D-DNA 180114 | Dudchenko et al.81 | https://github.com/aidenlab/3d-dna |
| JuiceBox Assembly Tools 1.11.08 | Robinson et al.82 | https://github.com/aidenlab/Juicebox |
| BLAST 2.10.0+ | Altschul et al.83 | https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download |
| Trimmomatic 0.36 | Bolger et al.84 | http://www.usadellab.org/cms/?page=trimmomatic |
| Trinity 2.10.0 | Grabherr et al.85 | https://github.com/trinityrnaseq/trinityrnaseq |
| RepeatModeler 2.0.1 | Flynn et al.86 | https://www.repeatmasker.org/RepeatModeler/ |
| RECON 1.08 | Bao and Eddy87 | http://eddylab.org/software/recon/ |
| RepeatScout 1.0.5 | Price et al.88 | http://www.repeatmasker.org/RepeatScout-1.0.6.tar.gz |
| Tandem Repeats Finder 4.0.9 | Benson89 | https://github.com/Benson-Genomics-Lab/TRF |
| RepeatMasker 4.0.7 | Smit et al.90 | https://www.repeatmasker.org/ |
| tRNAscan-SE 2.0.7 | Chan et al.91 | http://lowelab.ucsc.edu/tRNAscan-SE/ |
| Barrnap 0.9 | Seemann et al.92 | https://github.com/tseemann/barrnap |
| MAKER 3 | Campbell et al.93 | https://github.com/Yandell-Lab/maker |
| SNAP | Korf94 | https://github.com/KorfLab/SNAP |
| Augustus 3.4.0 | Stanke et al.95 | https://github.com/Gaius-Augustus/Augustus |
| GeneMark-ES 3.61 | Ter-Hovhannisyan et al.96 | http://exon.gatech.edu/GeneMark/ |
| InterProScan | Blum et al.97 | https://www.ebi.ac.uk/interpro/download/InterProScan/ |
| Draw Venn Diagram | N/A | https://bioinformatics.psb.ugent.be/webtools/Venn/ |
| SignalP 5.0 | Armenteros et al.51 | https://services.healthtech.dtu.dk/service.php?SignalP-5.0 |
| MassLynx 4.1 | Waters Corporation | https://www.waters.com/waters/en_US/MassLynx-MS-Software/nav.htm?cid=513662 |
| Origin 8.5 | OriginLab Corporation | https://www.originlab.com/index.aspx?go=PRODUCTS/Origin |
| Deposited data | ||
| Psl. hygida DNAseq raw data | This paper | GenBank: SAMN26680550; GenBank: SAMN26680551; GenBank: SAMN26680552 |
| Psl. hygida genome assembly and annotation | This paper | GenBank: WJQU00000000. (BioSample SAMN12911131; BioProject PRJNA575761). Version described in this paper is version WJQU01000000. |
| Psl. hygida RNAseq raw data | This paper | See Table S5 for GenBank accession numbers |
| Psl. hygida transcriptome assembly | This paper | GenBank: SAMN27413055 |
| Psl. hygida saliva proteome | This paper | ProteomeXchange Consortium (PRIDE): PXD033046 |
| Other | ||
| Ascentis Express HILIC – HPLC column (10 cm × 4.6 mm x 2.7 μm) | Supelco | Cat#53979-U |
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Nadia Monesi (namonesi@fcfrp.usp.br).
Materials availability
No newly generated materials are associated with this paper.
Experimental model and subject details
Re-classification of B. hygida
The Neotropical species described by Sauaia and Alves (1968)36 as ‘B. hygida’ has been used as a model organism in genetics since the late 1960s. The original description is based on type material of 50 adults (29 males, 21 females) reared in the laboratory. From this laboratory culture, which still exists today on the campus of the Universidade de São Paulo in Ribeirão Preto, Brazil (21°10′10.1″S 47°50′55.9″W), numerous males and females have been examined morphologically and found to be conspecific with the type specimens of ‘B. hygida’ (Sauaia and Alves36: 85–88, Figures 1–13). This dark brown species has a roundish head capsule with a closed eye bridge; long 3-segmented palps; a slender first palpal segment with several setae and an unmodified sensory area; a bare postpronotum; a triangular katepisternum; a scutellum with 6 strong setae; large wings with a short CuA-stem, a short R/R1 vein complex, and an R5 vein with dorsal macrotrichia only; toothed tarsal claws; and fore tibiae with an anteroapical patch of 6–8 setae that tend to form a narrow, irregular row (setae without a common basal carina and patch without a curved, depressed margin). The male genitalia have a sclerotised tegmen with a high, arched median process ventrally; narrow oval gonostyli; a densely setose, rounded gonostylar apex without a tooth; and 3–4 mesial spines and a long, downcurved whiplash seta on the upper third of the gonostylus. Based on these characters, the species belongs to the genus Pseudolycoriella Menzel & Mohrig, 1998 (references 98 and 99) and must bear the name Psl. hygida (Sauaia and Alves, 1968) comb. nov. The new combination proposed here is supported by the genetic study of Trinca et al.32 (Figures 3 and 4 therein), whose analysis of 32 mitogenome sequences showed that ‘B. hygida’ (GenBank accession no. MW442371) clusters with another Pseudolycoriella species but not with the three Bradysia species included.
Psl. hygida life cycle
This description refers to the life cycle of Psl. hygida (Sauaia and Alves, 1968) comb. nov., a species originally described as belonging to the genus Bradysia (see “Re-classification of B. hygida”), which has been maintained in the laboratory since 1965 on the campus of the University of São Paulo in Ribeirão Preto, State of São Paulo, Brazil. At present, the culture is maintained in the laboratory of Dr. Monesi (21°10′10.1″S 47°50′55.9″W).
Psl. hygida was maintained in plastic cultivation boxes (11 cm × 11 cm) layered with humid soil as previously described.100 Whenever necessary, the soil was replaced by acid-treated, sterilised sand, which does not interfere with Psl. hygida development. The life cycle lasts about 41 days at 22°C101 and comprises the embryonic stage (9 days), the larval stage (22 days), the pupal stage (5 days), and the adult stage (5 days) (Figure 1A). Fertilised eggs are placed on top of the soil and surrounded by a diet of partially decomposed I. paraguariensis leaves. Soon after hatching, the larvae start feeding on a mixture of diet and soil. The larvae tend to form a single group and migrate toward the fresher diet, which is added to the cultivation boxes daily. During the larval stage, which comprises four instars and three moults, the larvae ingest the diet and continuously secrete saliva, forming a web of salivary gland secretion in the diet that can be readily observed under the stereomicroscope.
The fourth larval instar of Psl. hygida has been previously characterised.101 Briefly, on the 12th day after hatching the larvae undergo the third moult. On the 18th day after hatching (sixth day of the fourth instar), the larval eyespots appear (E1); they change in shape and position and constitute landmarks for staging larvae during the second half of the fourth instar. On the eighth day of the fourth instar (E3), the larvae abandon the food and start spinning the cocoon. The eyespot stage E7 (about 20 h after E3) is a reliable indicator of larval development, and older larvae are staged by selecting E7 larvae and counting the hours after E7. The patterns of polypeptide synthesis during the fourth larval instar in the salivary gland and in the saliva have been previously characterised by SDS-PAGE followed by fluorography.101,102
Before pupation, the larvae build individual cocoons in the soil with a mixture of salivary gland secretion and soil. The pupal moult occurs at E7+26 h and the pupal stage lasts 5 days. Soon after emergence, the adults mate and two days later the females start laying eggs which are then collected to start a new cultivation box. According to Sauaia,48 the ratio between females and males in Psl. hygida is about 5:1. The identification of female larvae and pupae is based on their size and adult females are readily identified based on the terminalia. Because females are larger and their frequency is higher in the culture, larval, pupal, and adult samples consisted only of female specimens.
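The staging landmarks above can be placed on a single time axis. The Python sketch below converts them into approximate hours after hatching, assuming that "day N" corresponds to N × 24 h and that all intervals apply at 22°C; the constant and function names are illustrative, not the authors'. Under these assumptions the pupal moult falls at about 526 h (roughly 22 days), in line with the stated 22-day larval stage.

```python
# Approximate Psl. hygida fourth-instar staging landmarks, in hours after hatching.
# Assumption: "day N" is taken as N * 24 h; intervals are those reported at 22 C.
HOURS_PER_DAY = 24
THIRD_MOULT_DAY = 12         # third moult on day 12 after hatching
E1_DAY_OF_INSTAR = 6         # eyespots appear (E1) on day 6 of the fourth instar
E3_DAY_OF_INSTAR = 8         # larvae leave the food and spin cocoons (E3)
E7_AFTER_E3_H = 20           # E7 is about 20 h after E3
PUPAL_MOULT_AFTER_E7_H = 26  # pupal moult at E7 + 26 h


def landmark_hours() -> dict[str, int]:
    """Return approximate hours after hatching for each staging landmark."""
    third_moult = THIRD_MOULT_DAY * HOURS_PER_DAY
    e1 = third_moult + E1_DAY_OF_INSTAR * HOURS_PER_DAY
    e3 = third_moult + E3_DAY_OF_INSTAR * HOURS_PER_DAY
    e7 = e3 + E7_AFTER_E3_H
    return {
        "third_moult": third_moult,
        "E1": e1,
        "E3": e3,
        "E7": e7,
        "E7+8h": e7 + 8,    # older stages are counted in hours after E7
        "E7+16h": e7 + 16,
        "pupal_moult": e7 + PUPAL_MOULT_AFTER_E7_H,
    }


if __name__ == "__main__":
    for stage, hours in landmark_hours().items():
        print(f"{stage:<12} {hours:4d} h  (~day {hours / HOURS_PER_DAY:.1f})")
```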
Method details
Psl. hygida sample collection
All biological samples were collected from a Psl. hygida culture maintained in Dr. Monesi's laboratory on the campus of the University of São Paulo, in Ribeirão Preto, Brazil. For embryo collection, cultivation boxes containing recently emerged adults were monitored daily until eggs were first sighted. Adults were then removed, and the cultivation boxes were maintained at 22°C to obtain embryo collections over defined time intervals. For RNA extraction, embryo collections were performed at 6 different time intervals spanning the first six days of embryonic development (0–12 h; 0–48 h; 24–72 h; 48–96 h; 72–120 h; 96–144 h). Psl. hygida eggs are approximately spherical, with a diameter of 300 μm. Because of their small size, the number of eggs used in each preparation cannot be counted, so the embryo collections used for nucleic acid extractions were weighed. Using the protocols described below, 100 mg of embryos yields about 130 μg of total DNA and about 20 μg of total RNA.
Staging of 12-day-old larvae was based both on the age of the larvae and on the selection of larvae undergoing the third moult, which is identified by the absence of the head capsule. Staging of fourth instar larvae at E1, E3, and E7 was based on the shape and position of the eyespots.101 To collect older larvae (E7+8 h and E7+16 h), larvae at E7 were selected and maintained at 22°C for the desired time period. Guts (n = 31), fat body (n = 40), and brains (n = 150) were dissected from late fourth instar larvae, yielding about 4 μg (guts), 7 μg (fat body), and 4.5 μg (brains) of total RNA. Salivary glands (n = 14) were dissected from 12-day-old larvae and from fourth instar larvae at ages E1, E3, E7, E7+8 h, and E7+16 h, yielding about 15 μg of total RNA. All dissections were performed in Ringer’s solution (80 mM NaCl; 64 mM KCl; 4 mM MgSO4; 1 mM Ca acetate; 2.9 mM Na2HPO4; 2.1 mM NaH2PO4), and dissected tissues were placed in homogenization buffer (RNeasy Plus Micro kit, Qiagen N.V., Venlo, the Netherlands) and kept on ice until RNA extraction. Twelve-day-old larvae were also collected from larval groups cultivated in boxes lined with sand and used as a source for RNA extraction from whole larvae (n = 5), dissected guts (n = 30), and salivary glands (n = 20), yielding about 9.3 μg (whole larvae), 22.6 μg (guts), and 17 μg (salivary glands) of total RNA, respectively.
Pupae and adults were staged based on the day of puparium formation and adult emergence, respectively. Whole pupae (1 to 5-day-old) (n = 10) and whole 3-day-old females (n = 25) were collected and immediately processed, yielding about 8.4 μg (whole pupae) and 2.6 μg (adults) of total RNA, respectively.
For saliva collection, larvae approximately 12 days old were selected, and the last segment of each larva was tied with elastane fishing line under the stereomicroscope to avoid contamination of the saliva with feces. Groups of 20 tied larvae were transferred to a microtube with a few holes made in the cap and incubated at 22°C for 3 h, during which the larvae continued to secrete saliva. After the larvae were removed, the microtube was briefly centrifuged at 15,000 × g for 30 s. For enzymatic assays, the secreted saliva, usually 2–3 μL, was stored at −20°C in the presence of 10 μL of 1× PBS (136 mM NaCl; 2.7 mM KCl; 10 mM Na2HPO4·12H2O; 2 mM KH2PO4). For proteomic analysis, the secreted saliva from approximately 80 larvae was pooled and stored in the absence of PBS. Prior to enzymatic assays or proteomic analysis, the protein in the pooled samples was quantified using the Bradford assay. According to our estimates, one larva secretes about 140 ng of total protein per hour.
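The per-larva secretion estimate makes it possible to anticipate how much protein one collection tube should yield. The Python sketch below assumes a constant secretion rate of 140 ng per larva per hour throughout the 3 h incubation, which may not hold in practice; with 20 larvae per tube this predicts about 8.4 μg of protein, or roughly 2.8–4.2 μg/μL in the 2–3 μL typically recovered. The function names are illustrative.

```python
# Expected saliva protein yield from the reported secretion-rate estimate.
# Assumption: secretion is constant at ~140 ng per larva per hour.
NG_PER_LARVA_PER_HOUR = 140.0


def expected_protein_ug(n_larvae: int, hours: float,
                        rate_ng_per_h: float = NG_PER_LARVA_PER_HOUR) -> float:
    """Total secreted protein (micrograms) for a group of larvae."""
    return n_larvae * hours * rate_ng_per_h / 1000.0  # ng -> ug


def concentration_ug_per_ul(protein_ug: float, volume_ul: float) -> float:
    """Protein concentration of the recovered saliva before dilution in PBS."""
    return protein_ug / volume_ul


if __name__ == "__main__":
    per_tube = expected_protein_ug(n_larvae=20, hours=3)
    print(f"Expected protein per tube: {per_tube:.1f} ug")
    for vol in (2.0, 3.0):
        print(f"  in {vol:.0f} uL: {concentration_ug_per_ul(per_tube, vol):.1f} ug/uL")
```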
DNA extraction and genomic sequencing
DNA was extracted from about 360 mg of 3- to 8-day-old Psl. hygida embryos collected from laboratory cultivation boxes as previously described (see Psl. hygida sample collection), yielding about 460 μg of total DNA. Embryos were first rinsed three times with wash buffer (300 mM NaCl; 0.25% Triton X-100), treated for 10 min with a 2.6% hypochlorite solution to remove the chorion, and washed three times in sterile water. Genomic DNA was then extracted as previously described.103 Briefly, after the final wash, the embryos were homogenized in 1 mL of homogenization buffer (60 mM NaCl, 10 mM EDTA, 5% sucrose, 0.15 mM spermine, 0.15 mM spermidine, 10 mM Tris-HCl, pH 7.5) using a pellet pestle (Fisherbrand™) driven by a Pellet Pestle™ Cordless Motor (Fisherbrand™). After the addition of 1 mL of TE buffer (30 mM EDTA, 2% SDS, 5% sucrose, 0.2 M Tris-HCl, pH 9.0) and proteinase K (final concentration 20 μg/mL), the homogenates were incubated at 37°C for 3 h under gentle agitation. After three phenol:chloroform (1:1) extractions and one chloroform extraction, DNA was ethanol precipitated, washed with 70% ethanol, and resuspended in 300 μL of sterile deionised water. DNA size and concentration were determined using High Sensitivity D1000 ScreenTape (Agilent Technologies, Santa Clara, CA, USA). Genomic DNA was sequenced with PacBio technology, which yielded 6.12 Gb (1,130,452 reads; 10x coverage), and with Illumina technology, which yielded 113.78 Gb (758,568,804 reads; 190x coverage) (Tables S1 and S2). For Illumina sequencing, DNA libraries were prepared using the Illumina TruSeq DNA PCR-Free library preparation kit and sequenced on a HiSeq 4000 system (2 × 150 bp).
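Fold coverage is the number of sequenced bases divided by the genome size, and mean read length is bases divided by reads. A minimal sketch using the 609 Mb assembly size from the Highlights as the genome size (an assumption: the text does not state which genome-size estimate was used); the dictionary and function names are illustrative:

```python
# Sketch: sequencing depth and mean read length from the run totals in the text.
GENOME_SIZE_BP = 609_000_000  # assumed genome size: the 609 Mb assembly

RUNS = {
    # library: (total sequenced bases, number of reads)
    "PacBio": (6.12e9, 1_130_452),
    "Illumina": (113.78e9, 758_568_804),
}


def depth(total_bases: float, genome_size: int = GENOME_SIZE_BP) -> float:
    """Fold coverage = sequenced bases / genome size."""
    return total_bases / genome_size


def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average number of bases per read."""
    return total_bases / n_reads


if __name__ == "__main__":
    for name, (bases, reads) in RUNS.items():
        print(f"{name}: {depth(bases):.1f}x coverage, "
              f"mean read length {mean_read_length(bases, reads):,.0f} bp")
```

This gives about 10x for PacBio (mean read length of roughly 5.4 kb) and about 187x for Illumina (about 150 bp per read, consistent with 2 × 150 bp sequencing); the small difference from the reported 190x presumably reflects the genome-size estimate used.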
About 60 mg of frozen 4- to 6-day-old Psl. hygida embryos were used as starting material for the Hi-C libraries. After collection, the embryos were washed in MilliQ ultrapure water to remove soil and diet residues, briefly dried, weighed, and frozen in liquid nitrogen. Library preparation was performed by Admera Health Inc (NJ, USA) using the Arima standard protocol, which employs two restriction enzymes that recognize the ^GATC and G^ANTC sequences. Libraries were sequenced on a NovaSeq 6000 system (2 × 150 bp) and yielded 143.34 Gb (955,646,550 reads in each direction), corresponding to 240x genomic coverage (Tables S1 and S2). Sequencing (PacBio, Illumina, and Hi-C) was performed by Admera Health Inc (NJ, USA). The raw read files have been deposited in GenBank under BioProject accession number PRJNA575761 (see Table S2 for accession numbers).
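An in silico digest with the two recognition motifs indicates where Hi-C cut sites are expected along a scaffold. A minimal sketch that scans only the top strand (both motifs are palindromic, so the bottom strand yields the same sites) and ignores methylation and ligation-junction details; the function names are hypothetical:

```python
import re

# Recognition motifs from the text; "^" marks the cut position on the top strand.
# Value: (regex for the motif, offset of the cut from the motif start)
MOTIFS = {
    "^GATC": ("GATC", 0),
    "G^ANTC": ("GA[ACGT]TC", 1),  # N = any base
}


def cut_positions(seq: str) -> list[int]:
    """Return sorted 0-based top-strand cut positions for all motifs in `seq`."""
    seq = seq.upper()
    cuts = set()
    for pattern, offset in MOTIFS.values():
        # Zero-width lookahead so overlapping sites are all reported
        for match in re.finditer(f"(?=(?:{pattern}))", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def fragment_lengths(seq: str) -> list[int]:
    """Lengths of the fragments produced by cutting `seq` at every site."""
    bounds = [0] + cut_positions(seq) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    toy = "AAGATCTTGAATCAA"
    print(cut_positions(toy))     # [2, 9]
    print(fragment_lengths(toy))  # [2, 7, 6]
```

Using two enzymes with short (4 bp and 5 bp) motifs increases cut-site density compared with either motif alone, which is the usual rationale for combining them.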
Draft genome assembly
Draft genome assembly was performed by Admera Health Inc. (NJ, USA). Initially, Velvet software (version 1.2.10)76 was used to perform k-mer analysis and to construct a de Bruijn graph from the overlaps between k-mers. Because of sequencing errors, regions of low sequencing depth, and repeated sequences, a complete de Bruijn graph could not be obtained; however, the unbranched paths between bifurcation points could be joined to generate contig segments. The HGAP4 software77 was used to identify the longest PacBio reads (seed reads) and to map the shorter PacBio reads onto them, producing polished consensus sequences. These error-corrected sequences were then used to generate a preliminary draft assembly from the PacBio data. Next, the calibration software Quiver (version 1.1.0)77 was used to correct the draft contigs and obtain a calibrated draft assembly. SSPACE software (version 3.0)78 was then used to align reads from all sequencing libraries (PacBio and Illumina) back to the calibrated draft assembly; in this step, the distance, order, and orientation of the paired-end reads were all considered when assembling the contigs into scaffolds. Finally, GapFiller (version 1-10)79 was used to align the reads from all sequenced libraries back to the scaffolds and to fill gaps within them using the aligned reads. At the same time, the scaffolds were extended to minimize the proportion of unknown bases ("N") and maximize sequence length. To measure the completeness of the Psl. hygida draft genome assembly, the OrthoDB dataset of insect single-copy orthologs (Insecta_odb10) was used to classify genes as single-copy, duplicated, fragmented, or missing in the scaffolded genome with Benchmarking Universal Single-Copy Orthologs (BUSCO) software (version 5.1.2)43 (Table S3).
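The contig-building logic described above can be illustrated with a toy de Bruijn graph: (k-1)-mers are nodes, each distinct k-mer is an edge, and contigs are spelled along unbranched paths until a bifurcation point is reached. This is a teaching sketch only; it omits the error correction and graph simplification (tip and bubble removal) that assemblers such as Velvet perform, and all names are hypothetical:

```python
from collections import defaultdict


def build_debruijn(reads: list[str], k: int):
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)  # node -> successor nodes
    indegree = defaultdict(int)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
        indegree[kmer[1:]] += 1
    return graph, indegree


def unitigs(graph, indegree) -> list[str]:
    """Spell maximal non-branching paths; every bifurcation point ends a contig.

    Isolated cycles in which every node has one in- and one out-edge are not reported.
    """
    nodes = sorted(set(graph) | set(indegree))

    def one_in_one_out(node: str) -> bool:
        return indegree.get(node, 0) == 1 and len(graph.get(node, [])) == 1

    contigs = []
    for start in nodes:
        if one_in_one_out(start):
            continue  # interior node: reached from a branch point or path start
        for nxt in graph.get(start, []):
            path = start + nxt[-1]
            while one_in_one_out(nxt):
                nxt = graph[nxt][0]
                path += nxt[-1]
            contigs.append(path)
    return contigs


if __name__ == "__main__":
    print(unitigs(*build_debruijn(["ACGTTGCA"], 4)))      # ['ACGTTGCA']
    print(unitigs(*build_debruijn(["ACTG", "TCTA"], 3)))  # ['ACT', 'CTA', 'CTG', 'TCT']
```

In the second example the shared (k-1)-mer "CT" has two predecessors and two successors, so the reads cannot be joined unambiguously and the graph is broken into four short contigs, the situation the text describes for repeats.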
Alignment of Hi-C reads and draft genome correction
The pipeline developed by the Aiden Lab (https://aidenlab.org) was used to refine the Psl. hygida draft genome assembly, resulting in 4 chromosome-length scaffolds (Figures 1B and S1). Juicer (version 1.5)80 was first used with default parameters to align the Hi-C reads to the draft scaffolds. 3D-DNA (version 180114)81 was then used to correct the draft assembly using the aligned Hi-C reads as evidence. Because 51% of the draft assembly scaffolds were shorter than the 3D-DNA default cut-off (15,000 bp), the input scaffold length threshold was lowered to 2,500 bp, corresponding to 97% of the draft assembly scaffolds. In the final step, Juicebox Assembly Tools (version 1.11.08)82 was used to visualise the 3D-DNA results, to manually correct scaffold order and orientation, and to define the boundaries of the chromosome-length scaffolds. BUSCO (version 5.1.2)43 was used to evaluate the completeness of the final genome assembly against the Insecta_odb10 dataset.
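The effect of the input-length threshold can be checked by computing, for a given cut-off, the share of scaffolds (and of assembled bases) long enough to be passed to 3D-DNA. A minimal sketch with hypothetical scaffold lengths, assuming scaffolds at or above the cut-off are retained:

```python
def retained_fraction(lengths: list[int], cutoff: int) -> tuple[float, float]:
    """Return (fraction of scaffolds, fraction of bases) with length >= cutoff."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    kept = [n for n in lengths if n >= cutoff]
    return len(kept) / len(lengths), sum(kept) / sum(lengths)


if __name__ == "__main__":
    # Hypothetical scaffold lengths (bp); in practice read them from the assembly FASTA
    lengths = [1_000, 3_000, 20_000, 50_000]
    for cutoff in (15_000, 2_500):
        by_count, by_bases = retained_fraction(lengths, cutoff)
        print(f">= {cutoff:,} bp: {by_count:.0%} of scaffolds, {by_bases:.1%} of bases")
```

Reporting the base-level fraction alongside the count-level fraction is a deliberate choice here: many short scaffolds can make up a large share by count but only a small share of the sequence.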
The identities of three of the four chromosome-length scaffolds were determined by BLASTN (version 2.10.0+)83 searches (e-value ≤ 1 × 10−5) that located three previously cytogenetically mapped Psl. hygida genes in the genome assembly: EcR (AF121910.2) on chromosome A, the BhB10-1 DNA puff gene (L43904.1) on chromosome B, and the BhC4-1 DNA puff gene (U13883.1) on chromosome C (Table S5). Because no previously mapped gene was available for the X chromosome, the fourth chromosome-length scaffold was identified as the X chromosome by elimination. The Psl. hygida genome assembly described in this paper has been deposited at DDBJ/ENA/GenBank under the accession WJQU00000000 (BioSample SAMN12911131; BioProject PRJNA575761). The version described in this paper is version WJQU01000000.
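The marker-based assignment and the elimination step can be expressed as a short script over BLASTN tabular output. A minimal sketch, assuming default `-outfmt 6` columns (e-value in column 11, bit score in column 12) and hypothetical scaffold names; taking the best hit per marker by bit score is an assumption, not the authors' stated criterion:

```python
# Marker accessions and chromosomes from the text; scaffold names are hypothetical.
MARKERS = {"AF121910.2": "A", "L43904.1": "B", "U13883.1": "C"}


def assign_chromosomes(blast_tab: str, scaffolds: list[str],
                       max_evalue: float = 1e-5) -> dict[str, str]:
    """Map scaffolds to chromosomes from BLASTN tabular output (-outfmt 6).

    Each marker is assigned to the scaffold of its best (highest bit score) hit
    passing the e-value cut-off; if exactly one scaffold is left, it is called X.
    """
    best: dict[str, tuple[str, float]] = {}
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query not in MARKERS or evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)

    assignment = {subject: MARKERS[query] for query, (subject, _) in best.items()}
    remaining = [s for s in scaffolds if s not in assignment]
    # Elimination only makes sense if all three markers landed on distinct scaffolds
    if len(assignment) == len(MARKERS) and len(remaining) == 1:
        assignment[remaining[0]] = "X"
    return assignment


if __name__ == "__main__":
    hits = "\n".join([
        "AF121910.2\tscaf_2\t99.8\t1000\t2\t0\t1\t1000\t5000\t6000\t0.0\t1840",
        "L43904.1\tscaf_1\t99.5\t800\t4\t0\t1\t800\t100\t900\t0.0\t1450",
        "U13883.1\tscaf_4\t99.9\t600\t1\t0\t1\t600\t70\t670\t1e-150\t1100",
    ])
    print(assign_chromosomes(hits, ["scaf_1", "scaf_2", "scaf_3", "scaf_4"]))
```

The guard before the elimination step matters: if two markers hit the same scaffold, or a marker has no significant hit, more than one scaffold remains unassigned and no X call is made.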
Total RNA extraction and sequencing
RNA libraries were made from 20 different Psl. hygida samples (see “Psl. hygida sample collection” and Table S6). Dissected tissues were placed in homogenization buffer (RNeasy Plus Micro kit, Qiagen N.V., Venlo, the Netherlands) and kept on ice until total RNA extraction. Whole animals (embryos, larvae, pupae, and adults) were collected and immediately processed for total RNA extraction. All RNA samples were extracted using the RNeasy Plus Micro kit (Qiagen N.V., Venlo, the Netherlands) according to the manufacturer’s protocol. Total RNA was eluted in DEPC-treated water; RNA integrity was evaluated using the RNA 6000 Nano Kit on a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA), and RNA was quantified using a Qubit 4 Fluorometer. RNA libraries were prepared using the Illumina TruSeq Stranded mRNA Sample Prep LT Protocol and sequenced on an Illumina HiSeq 2500 system using the HiSeq SBS v4 kit and 100 bp paired-end reads. RNA sequencing was performed by the “Centro de Genômica Funcional”, ESALQ-USP (Piracicaba, SP, Brazil). All SRA files have been deposited in GenBank under BioProject PRJNA575761 (Table S6).
Transcriptome assembly
The Psl. hygida transcriptome assembly was used as Expressed Sequence Tag (EST) evidence in the genome annotation pipeline. First, low-quality bases were removed from the reads in the FASTQ files using Trimmomatic version 0.36,84 with a four-base sliding window, a mean quality cut-off of 15, and Phred quality scores (phred+33 encoding). The LEADING and TRAILING options were both set to 3. A single transcriptome FASTA file was assembled from all RNA-seq FASTQ files using Trinity version 2.10.0,85 with default parameters. The longest isoform of each Trinity gene was selected with the Trinity script “get_longest_isoform_seq_per_trinity_gene.pl”. Completeness of the assembled transcriptome was assessed using BUSCO version 5.1.2 and the Insecta_odb10 dataset.43 This transcriptome assembly has been deposited in GenBank under the accession SAMN27413055 (BioProject PRJNA575761) (Table S7).
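The trimming settings can be illustrated on a single read. A simplified sketch, assuming the steps run in the order LEADING, TRAILING, SLIDINGWINDOW and that the read is cut at the start of the first failing window; Trimmomatic's own implementation may place the cut differently, and the function names are hypothetical:

```python
def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual]


def trim_read(seq: str, qual: str, leading: int = 3, trailing: int = 3,
              window: int = 4, min_mean: float = 15.0) -> tuple[str, str]:
    """Simplified LEADING / TRAILING / SLIDINGWINDOW trimming of one read."""
    scores = phred33(qual)
    start, end = 0, len(scores)
    # LEADING: drop 5' bases with quality below `leading`
    while start < end and scores[start] < leading:
        start += 1
    # TRAILING: drop 3' bases with quality below `trailing`
    while end > start and scores[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: scan 5' -> 3' and cut at the start of the first window
    # whose mean quality falls below `min_mean`
    for i in range(start, end - window + 1):
        if sum(scores[i:i + window]) / window < min_mean:
            end = i
            break
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    print(trim_read("ACGTACGTAC", "!IIIIIII##"))  # ('CGTACGT', 'IIIIIII')
    print(trim_read("ACGTACGTAC", "IIII++++++"))  # ('ACGT', 'IIII')
```

In Phred+33 encoding the character "I" is Q40, "+" is Q10, and "#" is Q2, so the first read loses its Q0 first base and two Q2 last bases, while the second is cut where the four-base mean drops below 15.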
Genome annotation
A Psl. hygida repeat element library was built using RepeatModeler (version 2.0.1)86 together with RECON (version 1.08),87 RepeatScout (version 1.0.5),88 and Tandem Repeats Finder (version 4.0.9).89 The Psl. hygida de novo repeat element library and the Dfam 3.0 database104 were used by RepeatMasker (version 4.0.7)90 to mask repetitive sequences in the Psl. hygida genome assembly (Figure 1 and Table S4). tRNAs were predicted in the masked genome assembly with tRNAscan-SE (version 2.0.7)91 using default parameters, and rRNAs were annotated in the unmasked genome assembly with Barrnap (version 0.9)92 using default parameters.
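Once the assembly is masked, the repeat content can be summarized as the share of masked bases. A minimal sketch, assuming a soft-masked FASTA (repeats in lowercase, as produced by RepeatMasker's `-xsmall` option) and treating N as assembly gaps; the text does not state which masking mode was used, and the function names are hypothetical:

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record id: sequence}."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def masked_fraction(seq: str) -> float:
    """Fraction of non-gap bases that are soft-masked (lowercase).

    N/n characters are treated as assembly gaps and excluded from the denominator.
    """
    bases = [c for c in seq if c not in "Nn"]
    if not bases:
        return 0.0
    return sum(c.islower() for c in bases) / len(bases)


if __name__ == "__main__":
    fasta = ">scaf_1\nACGTacgt\nNNNN\n>scaf_2\naaaaCCCC\n"
    genome = read_fasta(fasta)
    print({name: masked_fraction(s) for name, s in genome.items()})
    print(masked_fraction("".join(genome.values())))
```

Excluding gaps from the denominator keeps the repeat fraction comparable between assemblies that differ mainly in how many gaps their scaffolds contain.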
Gene prediction used the Maker3 pipeline (version 3.01.03)93 together with SNAP94 (trained on the BUSCO output), Augustus (version 3.4.0),95 and GeneMark-ES (version 3.61)96 (Figure S2). The first Maker3 prediction round used SNAP and GeneMark-ES, with the longest-isoform Psl. hygida transcriptome and a dataset of curated UniProt proteins105 as evidence. The Maker3 output was then used to train Augustus. The Augustus output, together with hints generated directly from RNA-seq reads and the assembled Psl. hygida transcriptome, was used as input for the final Maker3 prediction round.
For functional annotation, the predicted protein-coding genes were used as queries in BLAST searches (e-value ≤ 1 × 10−5 and identity ≥50%) against the KEGG and UniProt databases and against a database of all metazoan protein sequences available in the NCBI non-redundant (NR) database (2022/02/14). InterProScan97 was used to assign Gene Ontology terms to the predicted protein-coding genes. A Venn diagram showing the number of functionally annotated genes in the different databases was constructed using the web tool “Draw Venn Diagram” (https://bioinformatics.psb.ugent.be/webtools/Venn/). The final annotated Psl. hygida genome assembly was deposited at DDBJ/ENA/GenBank under the accession WJQU00000000 (BioSample SAMN12911131; BioProject PRJNA575761). The version described in this paper is version WJQU01000000.
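The counts shown in such a Venn diagram can also be computed directly from the sets of annotated gene IDs, which is convenient for cross-checking the web-tool output; the same approach applies to the CAZyme-family comparison across species. A minimal sketch with hypothetical gene IDs, where each region counts genes found in exactly that combination of databases:

```python
def venn_regions(sets: dict[str, set[str]]) -> dict[frozenset[str], int]:
    """Count items in each exclusive Venn region, i.e. items found in exactly
    the databases named by the region key."""
    counts: dict[frozenset[str], int] = {}
    for item in set().union(*sets.values()):
        key = frozenset(name for name, members in sets.items() if item in members)
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical gene IDs with at least one hit in each database
    annotated = {
        "KEGG": {"g1", "g2"},
        "UniProt": {"g2", "g3"},
        "NR": {"g2", "g3", "g4"},
    }
    for region, n in sorted(venn_regions(annotated).items(), key=lambda kv: sorted(kv[0])):
        print(" & ".join(sorted(region)), n)
```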
Identification of CAZymes
The subset of potentially secreted CAZymes in the Psl. hygida predicted gene set was identified through BLAST searches (e-value ≤ 1 × 10−5 and identity ≥10%) against the dbCAN2 database106 (Table S8). Alignments with an e-value ≤ 1 × 10−50 were further analyzed for the presence of a secretion signal peptide using SignalP 5.0,51 with default parameters. To extend the annotation of this initial dataset of predicted secreted CAZymes, BLAST searches (e-value ≤ 1 × 10−5) were performed against the metazoan NCBI non-redundant (NR) database [excluding Sciaridae (NCBI: txid7184)], the UniProt database,105 and the Pfam database,107 followed by manual curation to remove false-positive hits (Table S9). The same pipeline was used to identify the CAZymes in the B. tilicola33 (GCA_014529535.1) and B. cellarum30 (GCA_016920775.1) predicted gene sets (Table S9). For L. cuprina42 (ASM2204524v1), the predicted annotation was extracted directly from the GenBank database (Table S9). Two of the initial datasets (B. tilicola and L. cuprina) contained protein isoforms, which were manually removed before the final curation step. Venn diagrams comparing the CAZyme families annotated in all four species were drawn with the web tool “Draw Venn Diagram” (https://bioinformatics.psb.ugent.be/webtools/Venn/).
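The automated part of this filtering can be sketched as a chain of thresholds followed by a set intersection. A minimal sketch, assuming dbCAN2 hits in BLAST `-outfmt 6` tabular format and a precomputed set of protein IDs that SignalP 5.0 flagged as carrying a signal peptide (parsing SignalP output is left out); protein and family names are hypothetical, and the manual curation step is not modelled:

```python
def parse_blast6(text: str):
    """Yield (query, subject, percent identity, e-value) from BLAST -outfmt 6 text."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        yield f[0], f[1], float(f[2]), float(f[10])


def secreted_cazyme_candidates(dbcan_hits: str, signalp_positive: set[str],
                               e_first: float = 1e-5, min_identity: float = 10.0,
                               e_strict: float = 1e-50) -> list[str]:
    """Proteins with a strong dbCAN2 hit AND a predicted signal peptide."""
    strong = {
        query
        for query, _subject, pident, evalue in parse_blast6(dbcan_hits)
        # step 1: initial search thresholds; step 2: stricter e-value subset
        if evalue <= e_first and pident >= min_identity and evalue <= e_strict
    }
    return sorted(strong & signalp_positive)


if __name__ == "__main__":
    hits = "\n".join([
        "prot1\tGH9_x\t45.0\t400\t0\t0\t1\t400\t1\t400\t1e-120\t500",
        "prot2\tGH5_y\t12.0\t300\t0\t0\t1\t300\t1\t300\t1e-30\t120",
        "prot3\tCBM14_z\t60.0\t200\t0\t0\t1\t200\t1\t200\t1e-80\t300",
    ])
    # IDs that SignalP predicted to carry a signal peptide (hypothetical)
    print(secreted_cazyme_candidates(hits, {"prot1", "prot2"}))
```

Here prot2 fails the stricter e-value step and prot3 lacks a predicted signal peptide, so only prot1 is kept as a candidate for curation.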
Shotgun protein identification in saliva samples
Saliva was collected from larvae as described above in “Psl. hygida sample collection”. The total volume obtained was pooled, and proteins were quantified using the Bradford assay. Then, 30 μg of protein was run on a one-dimensional reducing 12.5% SDS-PAGE gel for 2 h at 200 V,108 and the gel was stained for 16 h with 0.08% colloidal Coomassie Brilliant Blue G-250 (USB Corporation, Cleveland, OH, USA). For shotgun protein identification, the gel lane containing all proteins was cut into two pieces, and each piece was analyzed separately (Figure S5). Proteins were digested in-gel with trypsin (Sequencing Grade Modified Trypsin, Promega, Madison, WI, USA), and the resulting peptides were analyzed on an UltiMate 3000 RSLCnano System (Thermo Fisher Scientific, Waltham, MA, USA) coupled to a tribrid linear ion trap-quadrupole-Orbitrap mass spectrometer (Orbitrap Fusion Lumos Tribrid, Thermo Fisher Scientific, Waltham, MA, USA) with a total analysis time of 120 min. Proteins were identified with MaxQuant software109 against a database of the Psl. hygida proteome generated by in silico translation of the genome. Secreted proteins were identified with SignalP 5.0,51 and Gene Ontology classification was performed with InterProScan97 (Table S10). The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE110 partner repository with the dataset identifier PXD033046.
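Cross-referencing the saliva proteome with the predicted secreted CAZymes reduces to a set intersection once the identified protein groups are filtered. A minimal sketch, assuming MaxQuant's tab-separated proteinGroups.txt output with its usual flag columns, and assuming that decoy, contaminant, and site-only groups are discarded (a common practice that the text does not state explicitly); protein IDs are hypothetical:

```python
import csv
import io

# Flag columns as they appear in recent MaxQuant proteinGroups.txt files;
# names may differ between versions (e.g. "Contaminant" in older releases).
FLAG_COLUMNS = ("Reverse", "Potential contaminant", "Only identified by site")


def identified_proteins(protein_groups_tsv: str) -> set[str]:
    """Protein IDs from groups not flagged as decoy, contaminant, or site-only."""
    ids: set[str] = set()
    reader = csv.DictReader(io.StringIO(protein_groups_tsv), delimiter="\t")
    for row in reader:
        if any((row.get(col) or "").strip() == "+" for col in FLAG_COLUMNS):
            continue
        ids.update(p for p in row["Protein IDs"].split(";") if p)
    return ids


def saliva_secreted_cazymes(protein_groups_tsv: str, predicted_secreted: set[str]) -> set[str]:
    """Predicted secreted CAZymes that were also detected in the saliva proteome."""
    return identified_proteins(protein_groups_tsv) & predicted_secreted


if __name__ == "__main__":
    table = (
        "Protein IDs\tReverse\tPotential contaminant\tOnly identified by site\n"
        "prot1;prot1b\t\t\t\n"
        "REV__prot9\t+\t\t\n"
        "prot3\t\t\t\n"
    )
    print(sorted(saliva_secreted_cazymes(table, {"prot1", "prot2", "prot9"})))
```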
Enzyme assays
All samples used in the enzyme assays were obtained from cultivation boxes lined with sterilised, acid-treated sand. Enzyme assays were performed using extracts obtained by homogenizing the following biological samples in 1× PBS (136 mM NaCl; 2.7 mM KCl; 10 mM Na2HPO4.12H2O; 2 mM KH2PO4): saliva collected from 12-day-old larvae; larval diet containing saliva of 12-day-old Psl. hygida larvae; larval diet kept in rearing boxes without Psl. hygida larvae (control; Figure S9); and guts dissected from 12-day-old larvae.
Enzymatic activity against the synthetic substrates (Sigma-Aldrich Chem. Co., St. Louis, MO, USA) was measured at 37°C in McIlvaine buffer (pH 5.0 and 8.0)111 using p-nitrophenyl-β-D-fucopyranoside (pNP-β-Fuc), p-nitrophenyl-β-D-xylopyranoside (pNP-β-Xyl), p-nitrophenyl-β-D-glucopyranoside (pNP-β-Glu), p-nitrophenyl-β-D-galactopyranoside (pNP-β-Gal), p-nitrophenyl-β-D-mannopyranoside (pNP-β-Man), p-nitrophenyl-α-L-arabinofuranoside (pNP-α-Ara), p-nitrophenyl-α-D-galactopyranoside (pNP-α-Gal), and p-nitrophenyl-α-D-xylopyranoside (pNP-α-Xyl) as substrates at a final concentration of 2 mmol L−1. Reactions were started by adding the sample extracts and were stopped after appropriate time intervals by adding 100 μL of saturated sodium tetraborate solution. Hydrolysis rates were estimated by quantifying the release of the p-nitrophenolate ion against a standard curve prepared under the same conditions as the reaction. One unit (U) was defined as the amount of enzyme that releases 1 nmol of product per hour, and catalytic activity was expressed as units per mL of enzyme sample (U/mL).
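The unit definition translates into a short calculation: convert the control-corrected absorbance to nmol of p-nitrophenolate with the standard curve, then divide by reaction time (h) and enzyme volume (mL). A minimal sketch, assuming a linear standard curve and using the heat-inactivated extract as the blank; all readings, times, and volumes are hypothetical:

```python
def fit_standard_curve(nmol: list[float], absorbance: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of absorbance = slope * nmol + intercept."""
    n = len(nmol)
    if n < 2 or n != len(absorbance):
        raise ValueError("need at least two paired standards")
    mean_x = sum(nmol) / n
    mean_y = sum(absorbance) / n
    sxx = sum((x - mean_x) ** 2 for x in nmol)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(nmol, absorbance))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def activity_u_per_ml(a_sample: float, a_control: float, slope: float,
                      time_h: float, enzyme_ml: float) -> float:
    """Activity in U/mL, where 1 U releases 1 nmol of p-nitrophenolate per hour.

    The heat-inactivated control absorbance is subtracted, so the
    standard-curve intercept cancels in the difference.
    """
    net_nmol = (a_sample - a_control) / slope
    return net_nmol / time_h / enzyme_ml


if __name__ == "__main__":
    # Hypothetical standard curve (nmol p-nitrophenol vs absorbance)
    slope, intercept = fit_standard_curve([0, 10, 20, 40], [0.02, 0.12, 0.22, 0.42])
    # Hypothetical assay: 10 uL of extract, reaction stopped after 30 min
    print(f"{activity_u_per_ml(0.32, 0.02, slope, time_h=0.5, enzyme_ml=0.010):.0f} U/mL")
```

With these illustrative numbers, 30 nmol released in 0.5 h by 0.01 mL of extract corresponds to 6000 U/mL.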
Hydrolysis of the following polymeric substrates (from Sigma-Aldrich Chem. Co., St. Louis, MO, USA, unless otherwise indicated) was estimated using the dinitrosalicylic acid method114 under the same assay conditions as for the synthetic substrates: carboxymethylcellulose (CMC, 1% w/v), Avicel (1% w/v), xyloglucan (0.5% w/v) extracted from tamarind112 (Tamarindus indica L.), β-glucan (0.5% w/v) extracted from barley113 (Invicta Brewery, Ribeirão Preto, SP, Brazil), citrus pectin (0.5% w/v), beechwood xylan (1% w/v), and chitin from shrimp shells (1% w/v). All reactions were performed in experimental triplicate with a minimum of two biological replicates (Figures 3, S9, and S10). Data are presented as the mean ± SD of all measurements. Controls with heat-inactivated extracts were included in all enzymatic assays.
Catalytic profiling of saliva by LC-MS
Hydrolysis products of the polymeric substrates CMC, Avicel, barley β-glucan, and chitin were analyzed by HILIC-MS (hydrophilic interaction liquid chromatography coupled with mass spectrometry) (Figures 3 and S6–S8). Separation was performed on a HILIC column (Supelco, Bellefonte, PA, USA) at 40°C with a mobile phase composed of water with 0.1% ammonium acetate (A) and acetonitrile with 0.1% NH4OH (B). Gradient elution started at 10% A and 90% B, increased to 50% A and 50% B at 6 min, returned to the initial condition at 6.10 min, and was maintained until the end of the 15 min run, at a constant flow rate of 0.50 mL/min. Mass spectrometry was performed on a Xevo TQ-S mass spectrometer (Waters Corporation, Milford, MA, USA) with electrospray ionization and a quadrupole analyser operated at a capillary voltage of 3.2 kV, a source temperature of 150°C, and N2 as desolvation gas at 300°C. Data were acquired with MassLynx 4.1 software (Waters Corporation, Milford, MA, USA), and selected ion chromatograms were plotted with Origin software (OriginLab Corporation, Northampton, MA, USA).
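The gradient program can be written as a piecewise-linear function of time, which helps when relating retention times to mobile-phase composition. In HILIC the aqueous phase is the strong eluent, so the 0 to 6 min rise in %A is what elutes the more polar analytes. A minimal sketch, assuming linear ramps between the stated time points and reading the return to initial conditions as occurring at 6.10 min:

```python
# Breakpoints (time in min, %A) read from the text; linear ramps between
# breakpoints are an assumption, as is the step back to 10% A at 6.10 min.
GRADIENT = [(0.0, 10.0), (6.0, 50.0), (6.10, 10.0), (15.0, 10.0)]
FLOW_ML_PER_MIN = 0.50


def percent_a(t_min: float) -> float:
    """Programmed %A (aqueous phase) at time t; %B is 100 - %A."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 15 min method")
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole run")


if __name__ == "__main__":
    for t in (0.0, 3.0, 6.0, 6.05, 10.0):
        a = percent_a(t)
        print(f"t = {t:5.2f} min: {a:4.1f}% A / {100 - a:4.1f}% B")
    print(f"solvent per run: {FLOW_ML_PER_MIN * GRADIENT[-1][0]:.1f} mL")
```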
Quantification and statistical analysis
Experimental results are expressed as the mean ± standard deviation of three independent replicates, calculated using OriginPro 8.0 software (OriginLab Corporation).
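The mean ± SD reporting can also be reproduced outside OriginPro. A minimal sketch, assuming the sample standard deviation (n - 1 denominator) and pooling all technical and biological replicate values, as stated for the enzyme assays; the values are hypothetical:

```python
import statistics


def mean_sd(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator)."""
    if len(values) < 2:
        raise ValueError("need at least two replicates for an SD")
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Hypothetical activities (U/mL): 2 biological replicates x 3 technical replicates
    pooled = [5800.0, 6100.0, 5950.0, 6200.0, 6050.0, 5900.0]
    m, sd = mean_sd(pooled)
    print(f"{m:.0f} ± {sd:.0f} U/mL (n = {len(pooled)})")
```

Whether the sample or population SD is used changes the result noticeably at n = 3, so stating the convention alongside the figures avoids ambiguity.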
Acknowledgments
This work was supported by: Fundação de Apoio às Ciências: Humanas, Exatas e Naturais (FAC), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP; grants 2017/13734-3 to S.C., 2018/25664-2 to C.V.G., 2016/17582-0 to L.P.M., 2014/14318-5 to L.P.M.A., 2020/05636-4 to T.T.T., 2016/24139-6 to R.J.W., 2016/25325-8 to N.M.), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq; grants 130268/2019 to J.V.C.U., 157704/2017-3 and 800605/2018-7 to M.M.S., 165191/2020-1 to G.T.P.B., 305788/2017-5 to R.J.W.), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES; Grant 88882.378754/2019-01 to V.T.). The authors would like to thank the Proteomics Core Facility (European Molecular Biology Laboratory, Heidelberg, Germany) for proteomic analyses and Sophie Tandonnet for her useful suggestions regarding the genome annotation pipeline. RNA sequencing was performed by the “Centro de Genômica Funcional”, ESALQ-USP (Piracicaba, SP, Brazil). We also thank Prof. Dr. Walter Ribeiro Terra and Profa. Dra. Clelia Ferreira Terra for helpful discussion and thoughtful insights on this work.
Author contributions
Conceptualization, R.J.W. and N.M.; Formal Analysis, V.T., V.K., and G.T.P.B.; Investigation, S.C., J.V.C.U., C.V.G., M.M.S., L.P.M., G.T.P.B., F.M., L.P.M.A., and N.M.; Data Curation, V.T., G.T.P.B., R.J.W., and N.M.; Writing – Original Draft Preparation, V.T., S.C., C.V.G., L.P.M., G.T.P.B., R.J.W., and N.M.; Writing – Review & Editing Preparation, G.T.P.B., F.M., L.P.M.A., T.T.T., R.J.W., and N.M.; Visualization, V.T., S.C., C.V.G., L.P.M., and G.T.P.B.; Supervision, R.J.W. and N.M.; Project Administration, R.J.W. and N.M.; Funding Acquisition, L.P.M.A., R.J.W., and N.M.
Declaration of interests
The authors declare no competing interests.
Inclusion and diversity
We support inclusive, diverse, and equitable conduct of research.
Published: March 20, 2023
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.isci.2023.106449.
Contributor Information
Richard John Ward, Email: rjward@ffclrp.usp.br.
Nadia Monesi, Email: namonesi@fcfrp.usp.br.
Supplemental information
Data and code availability
- RNA-seq and DNA-seq raw data, transcriptome and genome assemblies, and genome annotation have all been deposited at DDBJ/ENA/GenBank and are publicly available as of the date of publication. Accession numbers are listed in the key resources table. Proteome data have been deposited at the ProteomeXchange Consortium (PRIDE) and are publicly available as of the date of publication. The accession number is listed in the key resources table.
- This paper does not report original code.
- Any additional information required to reanalyse the data reported in this paper is available from the lead contact upon request (namonesi@fcfrp.usp.br).
References
- 1. Lavelle P., Mathieu J., Spain A., Brown G., Fragoso C., Lapied E., De Aquino A., Barois I., Barrios E., Barros M.E., et al. Soil macroinvertebrate communities: a world-wide assessment. Global Ecol. Biogeogr. 2022;31:1261–1276. doi: 10.1111/geb.13492.
- 2. Potapov A.M., Beaulieu F., Birkhofer K., Bluhm S.L., Degtyarev M.I., Devetter M., Goncharov A.A., Gongalsky K.B., Klarner B., Korobushkin D.I., et al. Feeding habits and multifunctional classification of soil-associated consumers from protists to vertebrates. Biol. Rev. Camb. Philos. Soc. 2022;97:1057–1117. doi: 10.1111/brv.12832.
- 3. Briones M.J.I. The serendipitous value of soil fauna in ecosystem functioning: the unexplained explained. Front. Environ. Sci. 2018;6. doi: 10.3389/fenvs.2018.00149.
- 4. Guerra C.A., Heintz-Buschart A., Sikorski J., Chatzinotas A., Guerrero-Ramírez N., Cesarz S., Beaumelle L., Rillig M.C., Maestre F.T., Delgado-Baquerizo M., et al. Blind spots in global soil biodiversity and ecosystem function research. Nat. Commun. 2020;11:3870. doi: 10.1038/s41467-020-17688-2.
- 5. Gongalsky K.B. Soil macrofauna: study problems and perspectives. Soil Biol. Biochem. 2021;159. doi: 10.1016/j.soilbio.2021.108281.
- 6. van der Putten W.H., Bardgett R.D., de Ruiter P.C., Hol W.H.G., Meyer K.M., Bezemer T.M., Bradford M.A., Christensen S., Eppinga M.B., Fukami T., et al. Empirical and theoretical challenges in aboveground–belowground ecology. Oecologia. 2009;161:1–14. doi: 10.1007/s00442-009-1351-8.
- 7. Briones M.J.I. Soil fauna and soil functions: a jigsaw puzzle. Front. Environ. Sci. 2014;2. doi: 10.3389/fenvs.2014.00007.
- 8. Drula E., Garron M.-L., Dogan S., Lombard V., Henrissat B., Terrapon N. The carbohydrate-active enzyme database: functions and literature. Nucleic Acids Res. 2022;50:D571–D577. doi: 10.1093/nar/gkab1045.
- 9. Cragg S.M., Beckham G.T., Bruce N.C., Bugg T.D.H., Distel D.L., Dupree P., Etxabe A.G., Goodell B.S., Jellison J., McGeehan J.E., et al. Lignocellulose degradation mechanisms across the tree of life. Curr. Opin. Chem. Biol. 2015;29:108–119. doi: 10.1016/j.cbpa.2015.10.018.
- 10. Shin N.R., Shin S., Okamura Y., Kirsch R., Lombard V., Svacha P., Denux O., Augustin S., Henrissat B., McKenna D.D., Pauchet Y. Larvae of longhorned beetles (Coleoptera; Cerambycidae) have evolved a diverse and phylogenetically conserved array of plant cell wall degrading enzymes. Syst. Entomol. 2021;46:784–797. doi: 10.1111/syen.12488.
- 11. Bredon M., Herran B., Lheraud B., Bertaux J., Grève P., Moumen B., Bouchon D. Lignocellulose degradation in isopods: new insights into the adaptation to terrestrial life. BMC Genom. 2019;20:462. doi: 10.1186/s12864-019-5825-8.
- 12. Bredon M., Dittmer J., Noël C., Moumen B., Bouchon D. Lignocellulose degradation at the holobiont level: teamwork in a keystone soil invertebrate. Microbiome. 2018;6:162. doi: 10.1186/s40168-018-0536-y.
- 13. Nakayama D.G., Santos Júnior C.D., Kishi L.T., Pedezzi R., Santiago A.C., Soares-Costa A., Henrique-Silva F. A transcriptomic survey of Migdolus fryanus (sugarcane rhizome borer) larvae. PLoS One. 2017;12. doi: 10.1371/journal.pone.0173059.
- 14. Wiegmann B.M., Richards S. Genomes of Diptera. Curr. Opin. Insect Sci. 2018;25:116–124. doi: 10.1016/j.cois.2018.01.007.
- 15. Wiegmann B.M., Trautwein M.D., Winkler I.S., Barr N.B., Kim J.W., Lambkin C., Bertone M.A., Cassel B.K., Bayless K.M., Heimberg A.M., et al. Episodic radiations in the fly tree of life. Proc. Natl. Acad. Sci. USA. 2011;108:5690–5695. doi: 10.1073/pnas.1012675108.
- 16. Wiegmann B.M., Yeates D.K. In: Manual of Afrotropical Diptera. Kirk-Spriggs A.H., Sinclair B.J., editors. 2017. Phylogeny of Diptera; pp. 253–265. SANBI Graphics & Editing.
- 17. Ševčík J., Kaspřák D., Mantič M., Fitzgerald S., Ševčíková T., Tóthová A., Jaschhof M. Molecular phylogeny of the megadiverse insect infraorder Bibionomorpha sensu lato (Diptera). PeerJ. 2016;4. doi: 10.7717/peerj.2563.
- 18. Menzel F., Smith J.E. In: Manual of Afrotropical Diptera. Kirk-Spriggs A.H., Sinclair B.J., editors. 2017. Sciaridae (black fungus gnats); pp. 557–580. SANBI Graphics & Editing.
- 19. Frouz J. Use of soil dwelling Diptera (Insecta, Diptera) as bioindicators: a review of ecological requirements and response to disturbance. Agric. Ecosyst. Environ. 1999;74:167–186. doi: 10.1016/S0167-8809(99)00036-5.
- 20. Hövemeyer K. Diversity patterns in terrestrial dipteran communities. J. Anim. Ecol. 1999;68:400–416. doi: 10.1046/j.1365-2656.1999.00292.x.
- 21. Nielsen B.O., Nielsen L.B. Seasonal aspects of sciarid emergence in arable land (Diptera: Sciaridae). Pedobiologia. 2004;48:231–244. doi: 10.1016/j.pedobi.2004.01.001.
- 22. Seeber J., Seeber G.U.H., Kössler W., Langel R., Scheu S., Meyer E. Abundance and trophic structure of macro-decomposers on alpine pastureland (Central Alps, Tyrol): effects of abandonment of pasturing. Pedobiologia. 2005;49:221–228. doi: 10.1016/j.pedobi.2004.10.003.
- 23. Babytskiy A.I., Moroz M.S., Kalashnyk S.O., Bezsmertna O.O., Dudiak I.D., Voitsekhivska O.V. New findings of pest sciarid species (Diptera, Sciaridae) in Ukraine, with the first record of Bradysia difformis. Biosyst. Divers. 2019;27:131–141. doi: 10.15421/011918.
- 24. Ullah F., Gul H., Desneux N., Said F., Gao X., Song D. Fitness costs in chlorfenapyr-resistant populations of the chive maggot, Bradysia odoriphaga. Ecotoxicology. 2020;29:407–416. doi: 10.1007/s10646-020-02183-7.
- 25. Zhang S., Du Y., Abudisilimu B., Bai X., Zhang P., Liu Z., Cao Z., Jing X. Recent research on pesticides to manage the chive maggot, Bradysia odoriphaga Yang et Zhang (Diptera:Sciaridae) in China. Pak. J. Agric. Sci. 2020;57:623–630.
- 26. Harris M.A., Gardner W.A., Oetting R.D. A review of the scientific literature on fungus gnats (Diptera: Sciaridae) in the genus Bradysia. J. Entomol. Sci. 1996;31:252–276. doi: 10.18474/0749-8004-31.3.252.
- 27. Chimeno C., Hausmann A., Schmidt S., Raupach M.J., Doczkal D., Baranov V., Hübner J., Höcherl A., Albrecht R., Jaschhof M., et al. Peering into the darkness: DNA barcoding reveals surprisingly high diversity of unknown species of Diptera (Insecta) in Germany. Insects. 2022;13. doi: 10.3390/insects13010082.
- 28. Menzel F., Gammelmo Ø., Olsen K.M., Köhler A. The black fungus gnats (Diptera, Sciaridae) of Norway - Part I: species records published until December 2019, with an updated checklist. ZooKeys. 2020;957:17–104. doi: 10.3897/zookeys.957.46528.
- 29. Shin S., Jung S., Menzel F., Heller K., Lee H., Lee S. Molecular phylogeny of black fungus gnats (Diptera: Sciaroidea: Sciaridae) and the evolution of larval habitats. Mol. Phylogenet. Evol. 2013;66:833–846. doi: 10.1016/j.ympev.2012.11.008.
- 30. Li M., Yang X., Fan F., Ge Y., Hong D., Wang Z., Lu C., Chen S., Wei G. De novo genome assembly of Bradysia cellarum (Diptera: Sciaridae), a notorious pest in traditional special vegetables in China. Insect Mol. Biol. 2022;31:508–518. doi: 10.1111/imb.12776.
- 31. Miao X., Huang J., Menzel F., Wang Q., Wei Q., Lin X.-L., Wu H. Five mitochondrial genomes of black fungus gnats (Sciaridae) and their phylogenetic implications. Int. J. Biol. Macromol. 2020;150:200–205. doi: 10.1016/j.ijbiomac.2020.01.271.
- 32. Trinca V., Uliana J.V.C., Ribeiro G.K.S., Torres T.T., Monesi N. Characterization of the mitochondrial genomes of Bradysia hygida, Phytosciara flavipes and Trichosia splendens (Diptera: Sciaridae) and novel insights on the control region of sciarid mitogenomes. Insect Mol. Biol. 2022;31:482–496. doi: 10.1111/imb.12774.
- 33. Urban J.M., Foulk M.S., Bliss J.E., Coleman C.M., Lu N., Mazloom R., Brown S.J., Spradling A.C., Gerbi S.A. High contiguity de novo genome assembly and DNA modification analyses for the fungus fly, Sciara coprophila, using single-molecule sequencing. BMC Genom. 2021;22:643. doi: 10.1186/s12864-021-07926-2.
- 34. Hodson C.N., Ross L. Evolutionary perspectives on germline-restricted chromosomes in flies (Diptera). Genome Biol. Evol. 2021;13. doi: 10.1093/gbe/evab072.
- 35. Simon C.R., Siviero F., Monesi N. Beyond DNA puffs: what can we learn from studying sciarids? Genesis. 2016;54:361–378. doi: 10.1002/dvg.22946.
- 36. Sauaia H., Alves M.A.R. A description of a new species of Bradysia (Diptera, Sciaridae). Pap. Avuls. Zool. 1968;22:85–88.
- 37. Loew H. In: Vierter Theil. Heine J.J., editor. 1850. Dipterologische Beiträge.
- 38.Lintner J.A. Sciara coprophila n. sp. the manure-fly (ord. Diptera: fam. Mycetophilidae) Annual Report of the New York State Museum. 1895;48:391–397. New York State Museum. [Google Scholar]
- 39. Frey R. Entwurf einer neuen Klassifikation der Mückenfamilie Sciaridae (Lycoriidae). II. Die nordeuropäischen Arten. Not. Entomol. 1948;27:33–92.
- 40. Yang J.K., Zhang X.M. Notes on the fragrant onion gnats with descriptions of two new species of Bradysia (Diptera: Sciaridae). Acta Agric. Univ. Pekin. 1985;11:153–156.
- 41. Wiedemann C.R.W. Aussereuropäische zweiflügelige Insekten: als Fortsetzung des Meigenschen Werkes. Schulzische Buchhandlung; 1830.
- 42. Anstead C.A., Korhonen P.K., Young N.D., Hall R.S., Jex A.R., Murali S.C., Hughes D.S.T., Lee S.F., Perry T., Stroehlein A.J., et al. Lucilia cuprina genome unlocks parasitic fly biology to underpin future interventions. Nat. Commun. 2015;6:7344. doi: 10.1038/ncomms8344.
- 43. Manni M., Berkeley M.R., Seppey M., Zdobnov E.M. BUSCO: assessing genomic data quality and beyond. Curr. Protoc. 2021;1:e323. doi: 10.1002/cpz1.323.
- 44. Borges A.R., Gaspar V.P., Fernandez M.A. Unequal X chromosomes in Bradysia hygida (Diptera:Sciaridae) females: karyotype assembly and morphometric analysis. Genetica. 2000;108:101–105. doi: 10.1023/a:1004016809267.
- 45. Candido-Silva J.A., de Carvalho D.P., Coelho G.R., de Almeida J.C. Indirect immune detection of ecdysone receptor (EcR) during the formation of DNA puffs in Bradysia hygida (Diptera, Sciaridae). Chromosome Res. 2008;16:609–622. doi: 10.1007/s10577-008-1215-9.
- 46. Fontes A.M., Conacci M.E., Monesi N., de Almeida J.C., Paçó-Larson M.L. The DNA puff BhB10-1 gene encodes a glycine-rich protein secreted by the late stage larval salivary glands of Bradysia hygida. Gene. 1999;231:67–75. doi: 10.1016/s0378-1119(99)00089-x.
- 47. Paçó-Larson M., de Almeida J.C., Edström J.E., Sauaia H. Cloning of a developmentally amplified gene sequence in the DNA puff C4 of Bradysia hygida (Diptera: Sciaridae) salivary glands. Insect Biochem. Mol. Biol. 1992;22:439–446. doi: 10.1016/0965-1748(92)90139-6.
- 48. Sauaia H. Cromossomas Politênicos de Bradysia hygida. Inibição do Desenvolvimento dos Puffs de DNA pela Hydroxiuréia. Universidade de São Paulo; 1971.
- 49. Gaspar V.P., Borges A.R., Fernandez M.A. NOR sites detected by Ag-dAPI staining of an unusual autosome chromosome of Bradysia hygida (Diptera:Sciaridae) colocalize with C-banded heterochromatic region. Genetica. 2002;114:57–61. doi: 10.1023/a:1014698401988.
- 50. Gaspar V.P., Shimauti E.L.T., Fernandez M.A. Chromosomal localization and partial sequencing of the 18S and 28S ribosomal genes from Bradysia hygida (Diptera: Sciaridae). Genet. Mol. Res. 2014;13:2177–2185. doi: 10.4238/2014.March.26.6.
- 51. Almagro Armenteros J.J., Tsirigos K.D., Sønderby C.K., Petersen T.N., Winther O., Brunak S., von Heijne G., Nielsen H. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 2019;37:420–423. doi: 10.1038/s41587-019-0036-z.
- 52. Mba Medie F., Davies G.J., Drancourt M., Henrissat B. Genome analyses highlight the different biological roles of cellulases. Nat. Rev. Microbiol. 2012;10:227–234. doi: 10.1038/nrmicro2729.
- 53. Várnai A., Mäkelä M.R., Djajadi D.T., Rahikainen J., Hatakka A., Viikari L. Carbohydrate-binding modules of fungal cellulases: occurrence in nature, function, and relevance in industrial biomass conversion. Adv. Appl. Microbiol. 2014;88:103–165. doi: 10.1016/B978-0-12-800260-5.00004-8.
- 54. Gottar M., Gobert V., Matskevich A.A., Reichhart J.M., Wang C., Butt T.M., Belvin M., Hoffmann J.A., Ferrandon D. Dual detection of fungal infections in Drosophila via recognition of glucans and sensing of virulence factors. Cell. 2006;127:1425–1437. doi: 10.1016/j.cell.2006.10.046.
- 55. Dias R.O., Cardoso C., Leal C.S., Ribeiro A.F., Ferreira C., Terra W.R. Domain structure and expression along the midgut and carcass of peritrophins and cuticle proteins analogous to peritrophins in insects with and without peritrophic membrane. J. Insect Physiol. 2019;114:1–9. doi: 10.1016/j.jinsphys.2019.02.002.
- 56. Hansen S.F., Bettler E., Rinnan A., Engelsen S.B., Breton C. Exploring genomes for glycosyltransferases. Mol. Biosyst. 2010;6:1773–1781. doi: 10.1039/c000238k.
- 57. Gloss A.D., Abbot P., Whiteman N.K. How interactions with plant chemicals shape insect genomes. Curr. Opin. Insect Sci. 2019;36:149–156. doi: 10.1016/j.cois.2019.09.005.
- 58. Linton S.M. Review: the structure and function of cellulase (endo-β-1,4-glucanase) and hemicellulase (β-1,3-glucanase and endo-β-1,4-mannase) enzymes in invertebrates that consume materials ranging from microbes, algae to leaf litter. Comp. Biochem. Physiol. B Biochem. Mol. Biol. 2020;240. doi: 10.1016/j.cbpb.2019.110354.
- 59. Watanabe H., Tokuda G. Cellulolytic systems in insects. Annu. Rev. Entomol. 2010;55:609–632. doi: 10.1146/annurev-ento-112408-085319.
- 60. Poria V., Saini J.K., Singh S., Nain L., Kuhad R.C. Arabinofuranosidases: characteristics, microbial production, and potential in waste valorization and industrial applications. Bioresour. Technol. 2020;304. doi: 10.1016/j.biortech.2020.123019.
- 61. Marana S.R. Molecular basis of substrate specificity in family 1 glycoside hydrolases. IUBMB Life. 2006;58:63–73. doi: 10.1080/15216540600617156.
- 62. Chen W., Jiang X., Yang Q. Glycoside hydrolase family 18 chitinases: the known and the unknown. Biotechnol. Adv. 2020;43. doi: 10.1016/j.biotechadv.2020.107553.
- 63. Rathore A.S., Gupta R.D. Chitinases from bacteria to human: properties, applications, and future perspectives. Enzyme Res. 2015;2015. doi: 10.1155/2015/791907.
- 64. Muñoz-Benavent M., Pérez-Cobas A.E., García-Ferris C., Moya A., Latorre A. Insects' potential: understanding the functional role of their gut microbiome. J. Pharm. Biomed. Anal. 2021;194. doi: 10.1016/j.jpba.2020.113787.
- 65. Tellam R.L., Bowles V.M. Control of blowfly strike in sheep: current strategies and future prospects. Int. J. Parasitol. 1997;27:261–273. doi: 10.1016/S0020-7519(96)00174-9.
- 66. Qu M., Guo X., Tian S., Yang Q., Kim M., Mun S., Noh M.Y., Kramer K.J., Muthukrishnan S., Arakane Y. AA15 lytic polysaccharide monooxygenase is required for efficient chitinous cuticle turnover during insect molting. Commun. Biol. 2022;5:518–612. doi: 10.1038/s42003-022-03469-8.
- 67. Forsberg Z., Sørlie M., Petrović D., Courtade G., Aachmann F.L., Vaaje-Kolstad G., Bissaro B., Røhr Å.K., Eijsink V.G. Polysaccharide degradation by lytic polysaccharide monooxygenases. Curr. Opin. Struct. Biol. 2019;59:54–64. doi: 10.1016/j.sbi.2019.02.015.
- 68. Vogel H., Shukla S.P., Engl T., Weiss B., Fischer R., Steiger S., Heckel D.G., Kaltenpoth M., Vilcinskas A. The digestive and defensive basis of carcass utilization by the burying beetle and its microbiota. Nat. Commun. 2017;8. doi: 10.1038/ncomms15186.
- 69. Martinson E.O., Martinson V.G., Edwards R., Mrinalini, Werren J.H. Laterally transferred gene recruited as a venom in parasitoid wasps. Mol. Biol. Evol. 2016;33:1042–1052. doi: 10.1093/molbev/msv348.
- 70. Zhang P., Liu F., Mu W., Wang Q., Li H. Comparison of Bradysia odoriphaga Yang and Zhang reared on artificial diet and different host plants based on an age-stage, two-sex life table. Phytoparasitica. 2015;43:107–120. doi: 10.1007/s12600-014-0420-7.
- 71. Broadley A., Kauschke E., Mohrig W. Black fungus gnats (Diptera: Sciaridae) found in association with cultivated plants and mushrooms in Australia, with notes on cosmopolitan pest species and biosecurity interceptions. Zootaxa. 2018;4415:201–242. doi: 10.11646/zootaxa.4415.2.1.
- 72. Sciara Stock Center. Husbandry and Life Cycle. https://sites.brown.edu/sciara/sciara-maintenance-and-methods/husbandry-and-life-cycle/
- 73. Shin S., Lee H., Lee S. Proposal of a new subfamily of Sciaridae (Diptera: Sciaridae), with description of one new species from South Korea. Zootaxa. 2019;4543:127–136. doi: 10.11646/zootaxa.4543.1.8.
- 74. Vilkamaa P., Rudzinski H.-G., Burdíková N., Ševčík J. Phylogenetic position of Aerumnosa Mohrig (Diptera, Sciaridae) as revealed by multigene analysis, with the description of four new Oriental species. Zootaxa. 2018;4399:248–260. doi: 10.11646/zootaxa.4399.2.8.
- 75. Vilkamaa P., Menzel F. Re-classification of Lycoriella Frey sensu lato (Diptera, Sciaridae), with description of Trichocoelina gen. n. and twenty new species. Zootaxa. 2019;4665:67. doi: 10.11646/zootaxa.4665.1.1.
- 76. Zerbino D.R., Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18:821–829. doi: 10.1101/gr.074492.107.
- 77. Chin C.-S., Alexander D.H., Marks P., Klammer A.A., Drake J., Heiner C., Clum A., Copeland A., Huddleston J., Eichler E.E., et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods. 2013;10:563–569. doi: 10.1038/nmeth.2474.
- 78. Boetzer M., Henkel C.V., Jansen H.J., Butler D., Pirovano W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics. 2011;27:578–579. doi: 10.1093/bioinformatics/btq683.
- 79. Boetzer M., Pirovano W. Toward almost closed genomes with GapFiller. Genome Biol. 2012;13:R56. doi: 10.1186/gb-2012-13-6-r56.
- 80. Durand N.C., Shamim M.S., Machol I., Rao S.S.P., Huntley M.H., Lander E.S., Aiden E.L. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002.
- 81. Dudchenko O., Batra S.S., Omer A.D., Nyquist S.K., Hoeger M., Durand N.C., Shamim M.S., Machol I., Lander E.S., Aiden A.P., Aiden E.L. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.
- 82. Robinson J.T., Turner D., Durand N.C., Thorvaldsdóttir H., Mesirov J.P., Aiden E.L. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Syst. 2018;6:256–258.e1. doi: 10.1016/j.cels.2018.01.001.
- 83. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 84. Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 85. Grabherr M.G., Haas B.J., Yassour M., Levin J.Z., Thompson D.A., Amit I., Adiconis X., Fan L., Raychowdhury R., Zeng Q., et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat. Biotechnol. 2011;29:644–652. doi: 10.1038/nbt.1883.
- 86. Flynn J.M., Hubley R., Goubert C., Rosen J., Clark A.G., Feschotte C., Smit A.F. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 2020;117:9451–9457. doi: 10.1073/pnas.1921046117.
- 87. Bao Z., Eddy S.R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002;12:1269–1276. doi: 10.1101/gr.88502.
- 88. Price A.L., Jones N.C., Pevzner P.A. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21:i351–i358. doi: 10.1093/bioinformatics/bti1018.
- 89. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–580. doi: 10.1093/nar/27.2.573.
- 90. Smit A.F.A., Hubley R., Green P. RepeatMasker Open-4.0. 2013. http://www.repeatmasker.org
- 91. Chan P.P., Lowe T.M. tRNAscan-SE: searching for tRNA genes in genomic sequences. Methods Mol. Biol. 2019;1962:1–14. doi: 10.1007/978-1-4939-9173-0_1.
- 92. Seemann T. Barrnap 0.9: Rapid Ribosomal RNA Prediction. 2013. https://github.com/tseemann/barrnap
- 93. Campbell M.S., Holt C., Moore B., Yandell M. Genome annotation and curation using MAKER and MAKER-P. Curr. Protoc. Bioinformatics. 2014;48:4.11.1–4.11.39. doi: 10.1002/0471250953.bi0411s48.
- 94. Korf I. Gene finding in novel genomes. BMC Bioinf. 2004;5:59. doi: 10.1186/1471-2105-5-59.
- 95. Stanke M., Schöffmann O., Morgenstern B., Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinf. 2006;7:62. doi: 10.1186/1471-2105-7-62.
- 96. Ter-Hovhannisyan V., Lomsadze A., Chernoff Y.O., Borodovsky M. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 2008;18:1979–1990. doi: 10.1101/gr.081612.108.
- 97. Blum M., Chang H.Y., Chuguransky S., Grego T., Kandasaamy S., Mitchell A., Nuka G., Paysan-Lafosse T., Qureshi M., Raj S., et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 2021;49:D344–D354. doi: 10.1093/nar/gkaa977.
- 98. Menzel F., Mohrig W. Beiträge zur Taxonomie und Faunistik der paläarktischen Trauermücken (Diptera, Sciaridae). Teil VI - neue Ergebnisse aus Typenuntersuchungen und die daraus resultierenden taxonomisch-nomenklatorischen Konsequenzen. Stud. Dipterol. 1998;5:351–378.
- 99. Menzel F., Mohrig W. Revision der paläarktischen Trauermücken (Diptera: Sciaridae). Ampyx-Verlag; 2000.
- 100. Uliana J.V.C., Brancini G.T.P., Hombría J.C.G., Digiampietri L.A., Andrioli L.P., Monesi N. Characterizing the embryonic development of B. hygida (Diptera: Sciaridae) following enzymatic treatment to permeabilize the serosal cuticle. Mech. Dev. 2018;154:270–276. doi: 10.1016/j.mod.2018.08.002.
- 101. Laicine E.M., Alves M.A., de Almeida J.C., Rizzo E., Albernaz W.C., Sauaia H. Development of DNA puffs and patterns of polypeptide synthesis in the salivary glands of Bradysia hygida. Chromosoma. 1984;89:280–284. doi: 10.1007/BF00292475.
- 102. de-Almeida J.C. A 28-fold increase in secretory protein synthesis is associated with DNA puff activity in the salivary gland of Bradysia hygida (Diptera, Sciaridae). Braz. J. Med. Biol. Res. 1997;30:605–614. doi: 10.1590/S0100-879X1997000500006.
- 103. Monesi N., Jacobs-Lorena M., Paçó-Larson M.L. The DNA puff gene BhC4-1 of Bradysia hygida is specifically transcribed in early prepupal salivary glands of Drosophila melanogaster. Chromosoma. 1998;107:559–569. doi: 10.1007/s004120050342.
- 104. Storer J., Hubley R., Rosen J., Wheeler T.J., Smit A.F. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob. DNA. 2021;12:2. doi: 10.1186/s13100-020-00230-y.
- 105. UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49:D480–D489. doi: 10.1093/nar/gkaa1100.
- 106. Zhang H., Yohe T., Huang L., Entwistle S., Wu P., Yang Z., Busk P.K., Xu Y., Yin Y. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 2018;46:W95–W101. doi: 10.1093/nar/gky418.
- 107. Mistry J., Chuguransky S., Williams L., Qureshi M., Salazar G.A., Sonnhammer E.L.L., Tosatto S.C.E., Paladin L., Raj S., Richardson L.J., et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 2021;49:D412–D419. doi: 10.1093/nar/gkaa913.
- 108. Laemmli U.K. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature. 1970;227:680–685. doi: 10.1038/227680a0.
- 109. Tyanova S., Temu T., Cox J. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat. Protoc. 2016;11:2301–2319. doi: 10.1038/nprot.2016.136.
- 110. Perez-Riverol Y., Bai J., Bandla C., García-Seisdedos D., Hewapathirana S., Kamatchinathan S., Kundu D.J., Prakash A., Frericks-Zipper A., Eisenacher M., et al. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 2022;50:D543–D552. doi: 10.1093/nar/gkab1038.
- 111. McIlvaine T.C. A buffer solution for colorimetric comparison. J. Biol. Chem. 1921;49:183–186. doi: 10.1016/S0021-9258(18)86000-8.
- 112. Buckeridge M.S., Crombie H.J., Mendes C.J., Reid J.S., Gidley M.J., Vieira C.C. A new family of oligosaccharides from the xyloglucan of Hymenaea courbaril L. (Leguminosae) cotyledons. Carbohydr. Res. 1997;303:233–237. doi: 10.1016/s0008-6215(97)00161-4.
- 113. Temelli F. Extraction and functional properties of barley β-glucan as affected by temperature and pH. J. Food Sci. 1997;62:1194–1201. doi: 10.1111/j.1365-2621.1997.tb12242.x.
- 114. Miller G.L. Use of dinitrosalicylic acid reagent for determination of reducing sugar. Anal. Chem. 1959;31:426–428. doi: 10.1021/ac60147a030.
Data Availability Statement
- RNA-seq and DNA-seq raw data, transcriptome and genome assemblies, and genome annotation have all been deposited at DDBJ/ENA/GenBank and are publicly available as of the date of publication. Accession numbers are listed in the key resources table. Proteome data have been deposited at the ProteomeXchange Consortium (PRIDE) and are publicly available as of the date of publication. The accession number is listed in the key resources table.
- This paper does not report original code.
- Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request (namonesi@fcfrp.usp.br).