PLOS Genetics. 2023 Jul 3;19(7):e1010832. doi: 10.1371/journal.pgen.1010832

Thousands of Pristionchus pacificus orphan genes were integrated into developmental networks that respond to diverse environmental microbiota

Marina Athanasouli 1, Nermin Akduman 1,¤, Waltraud Röseler 1, Penghieng Theam 1, Christian Rödelsperger 1,*
Editor: Kaveh Ashrafi 2
PMCID: PMC10348561  PMID: 37399201

Abstract

Adaptation of organisms to environmental change may be facilitated by the creation of new genes. New genes without homologs in other lineages are known as taxonomically-restricted orphan genes and may result from divergence or de novo formation. Previously, we have extensively characterized the evolution and origin of such orphan genes in the nematode model organism Pristionchus pacificus. Here, we employ large-scale transcriptomics to establish potential functional associations and to measure the degree of transcriptional plasticity among orphan genes. Specifically, we analyzed 24 RNA-seq samples from adult P. pacificus worms raised on 24 different monoxenic bacterial cultures. Based on coexpression analysis, we identified 28 large modules that harbor 3,727 diplogastrid-specific orphan genes and that respond dynamically to different bacteria. These coexpression modules have distinct regulatory architecture and also exhibit differential expression patterns across development, suggesting a link between bacterial response networks and development. Phylostratigraphy revealed a notably high number of family- and even species-specific orphan genes in certain coexpression modules. This suggests that new genes are not attached randomly to existing cellular networks and that integration can happen very fast. Integrative analysis of protein domains, gene expression and ortholog data facilitated the assignment of biological labels to 22 coexpression modules, with one of the largest, a fast-evolving module, being associated with spermatogenesis. In summary, this work presents the first functional annotation for thousands of P. pacificus orphan genes and reveals insights into their integration into environmentally responsive gene networks.

Author summary

The inference of biological function for genes in newly sequenced genomes heavily relies on sequence conservation with classical model organisms. Consequently, newly evolved orphan genes that do not have homologs will not be associated with any function. Here, we use coexpression with known genes in order to assign potential functions to orphan genes in the nematode Pristionchus pacificus. To this end, we generated transcriptome profiles of P. pacificus worms that were grown on 24 different bacteria and clustered the genes into 28 large coexpression modules which contain thousands of orphan genes. Integrative analysis could associate most coexpression modules with biological processes or tissues, which results in the first functional annotation for thousands of orphan genes. Complementary analysis of gene ages shows that modules associated with female reproduction are highly constrained whereas a male-reproductive module is much more likely to integrate new genes, which could possibly be explained by sperm competition. This links sexual conflict with gene network evolution and environmental regulation.

Introduction

The evolution of phenotypic diversity across all domains of life has been accompanied by the formation of new genes. As a result, up to one third of the gene content in extant genomes consists of orphan genes that lack homologs in other taxonomic lineages [1]. Although the definition of these taxonomically-restricted orphan genes is context-dependent and the numbers of identified orphan genes will vary with the phylogenetic resolution and methods for homology detection, orphan genes exist in any given genome [2]. Orphan genes may reflect ancient genes that evolve so fast that homology cannot be detected [3]. Alternatively, orphan genes may arise de novo from previously non-coding sequences [4,5]. This has been demonstrated in multiple taxonomic groups including vertebrates [6,7], insects [8,9], nematodes [10,11], yeast [12,13], and plants [14,15]. Horizontal gene transfer can be considered as a third mechanism that would lead to the generation of orphan genes [16,17]. The origin of orphan genes can most conclusively be studied for relatively recent gene births because potentially neutrally evolving ancestral sequences may degenerate very fast in sister taxa rendering homology detection impossible [18]. Thus, to what extent more ancient orphan genes arose from divergence or de novo formation is currently not known. Another important question concerns the biological functions of orphan genes. Given that large-scale experimental screens for possible functions are only possible in a limited number of model species [19,20] and that inference of function based on sequence conservation is not possible, most orphan genes have no known function. It has been hypothesized that new genes play a role in adapting to changing environments [21], but this is only supported by limited data [22,23]. In this study, we want to test this hypothesis in the nematode model organism Pristionchus pacificus by exploring the transcriptomic changes after exposure to different bacteria.
P. pacificus is a free-living nematode that was initially established as a model system for comparative studies with the classical model organism Caenorhabditis elegans [24]. C. elegans and P. pacificus were estimated to have shared the last common ancestor around 130–310 million years ago [25]. Numerous studies have revealed conserved and divergent patterns across development [26], neurobiology [27], and behavior [28]. When the genome of P. pacificus was sequenced, around one third of its genes were classified as orphan genes that do not have any homolog outside the diplogastrid family [29]. By sequencing nine additional diplogastrid genomes, we established a phylogenomic framework to assign these orphan genes into phylostrata and to study their evolutionary dynamics and origin at the inter- and intra-species level [30,31]. These studies confirmed trends such as little expression evidence of orphan genes, rapid evolution, and high turnover which were also found in genomic studies of other animals and plants [32,33]. In contrast to the broad understanding of new gene evolution in Pristionchus nematodes, only two orphan genes have been experimentally characterized. The first orphan gene, dauerless, was identified in a screen to dissect the genetic basis of natural variation in dauer formation [34]. The dauer stage represents an alternative developmental stage in most nematodes that allows the worms to survive unfavorable environmental conditions for extended time periods. Overexpression of dauerless results in the suppression of dauer formation and it has been speculated that the ability to regulate dauerless dosage might provide individual strains an advantage during intraspecific competition [34]. The second orphan gene, self-1, encodes a micropeptide that was identified in a screen for killing behavior. Knockouts or modifications of self-1 resulted in a loss of self-recognition and led to killing by relatives.
These two cases demonstrate that Pristionchus orphan genes are involved in important developmental decisions and in the evolution of novel behaviors. To complement this detailed functional knowledge of a very limited number of orphan genes with broader functional data on a genome-wide scale, we employ expression data as a proxy for function. This is under the assumption that in order to carry out a certain function, a gene must be expressed at a given condition. Furthermore, large-scale expression data can be used to group genes into functional modules and to transfer functional annotations based on coexpression [35–37]. Specifically, we aim to investigate the P. pacificus transcriptome in response to different microbiota, which denote the assemblage of microorganisms present in a defined environment [38]. Environmental bacteria can interact with nematodes in a variety of different ways such as serving as a food source [39], constituents of the gut microbiome [40], or pathogens [41]. Here, we focus on the transcriptomic response to 24 different bacteria, most of which were isolated previously from Pristionchus-associated environments [42]. These bacteria are non-pathogenic in the sense that worm populations can survive for at least several days [42]. This is in contrast to highly pathogenic strains that can kill complete worm populations within a few hours [41]. Our main goals are to group P. pacificus genes into functional modules that respond differently to environmental microbiota, to characterize these modules, and finally to test whether some of these modules are enriched in orphan genes. This will allow us to better understand the plastic response to diverse environmental microbiota and to further elucidate the regulation and evolution of the associated gene networks.

Results

The transcriptomic response of nematodes to various microbiota does not strictly reflect bacterial phylogeny

To investigate the transcriptomic response of P. pacificus to different environmental microbiota, we grew worms on monoxenic cultures of 24 bacterial strains that included commonly used food bacteria such as Escherichia coli OP50 and HB101 as well as 22 bacterial strains that were previously isolated from Pristionchus-associated environments [42]. The selected bacteria represent Alpha-, Beta-, Gammaproteobacteria as well as Flavobacteriia. A single RNA-seq data set was generated per bacterial strain from 50 young adult worms that were manually picked from mixed-stage cultures (see Methods). Most transcriptome profiles of worms grown on different bacteria appeared to be highly similar (Pearson r > 0.9, Fig 1A). One sample, Wautersiella LRB104, showed a quite distinct transcriptome with correlation coefficients around 0.8 (Fig 1A and 1B). However, even this outlier shows much higher correlations than transcriptomes from different developmental stages with correlation coefficients of 0.6 [43,44]. The observed correlation coefficients translate into hundreds to thousands of genes with an absolute fold change > 2 between the most similar and most dissimilar pair of samples (S1 Fig). Next, we wanted to test whether the transcriptome profiles follow a phylogenetic pattern, i.e. whether transcriptome profiles of worms grown on bacteria of the same family are more similar to each other than profiles of worms grown on more distantly related bacteria (Fig 1C). This did not show significantly higher correlation for transcriptomes from more closely related bacteria (P = 0.35, t-test), suggesting that the transcriptomic response of nematodes to various environmental microbiota does not strictly reflect phylogeny. This observation could indicate that the observed differences are driven by factors that are difficult to control (e.g. bacterial concentration), or alternatively, that strain-specific changes are obscuring family-specific signals.
The latter would be the case if critical metabolic pathways are plasmid encoded and could easily be horizontally transferred. In summary, P. pacificus nematodes exhibit substantial transcriptomic variation in response to environmental microbiota and these responses do not strictly reflect the bacterial phylogeny.
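
The comparison of correlations within and across bacterial families can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the sample names, expression values, and family labels are invented, and Welch's t statistic is used as one possible form of the t-test mentioned above, since the text does not specify the variant.

```python
import math
from itertools import combinations
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def split_correlations(profiles, family):
    """Return pairwise correlations for same-family and different-family pairs."""
    within, across = [], []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        (within if family[a] == family[b] else across).append(r)
    return within, across


def welch_t(x, y):
    """Welch's t statistic (assumed variant; needs >= 2 values per group)."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))


# Hypothetical toy data: four samples, five genes each.
profiles = {
    "famA_s1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "famA_s2": [1.1, 2.2, 2.9, 4.1, 5.3],
    "famB_s1": [5.0, 3.0, 4.0, 1.0, 2.0],
    "famB_s2": [4.8, 3.1, 4.2, 1.2, 1.9],
}
family = {"famA_s1": "A", "famA_s2": "A", "famB_s1": "B", "famB_s2": "B"}

within, across = split_correlations(profiles, family)
t_stat = welch_t(within, across)
print(f"within: {mean(within):.2f}, across: {mean(across):.2f}, t = {t_stat:.2f}")
```

In this toy example the within-family correlations are clearly higher than the across-family ones; the real data described above showed no significant difference.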

Fig 1. Transcriptional response to 24 bacterial environments.

(A) The heatmap shows the correlation between transcriptomes. Apart from Wautersiella LRB104 all transcriptomes are highly similar. (B) Complementary analysis using principal component analysis identifies Wautersiella LRB104 as the most distinct environment. (C) The histograms show the distribution of correlation values for comparisons within and across bacterial families. The expression profiles of P. pacificus worms on bacteria of different families can be more similar than the profile on bacteria of the same family. This suggests that the transcriptional response does not strictly reflect bacterial phylogeny.

Almost half of all genes respond to diverse environmental microbiota

Nematodes interact with bacteria in many different ways involving processes like chemical attraction and repulsion [45], digestion [39], and detoxification [46]. Moreover, those processes can trigger secondary effects on worm development and physiology [47]. Thus, the response of worms might involve multiple regulatory and metabolic pathways and might impact many other biological processes. To characterize gene networks that respond to diverse environmental microbiota, we computed coexpression modules from the RNA-seq data using a widely used graph clustering approach [48]. Based on a correlation coefficient threshold of 0.7 (S2 Fig), we identified 28 large coexpression modules with more than 50 genes (Fig 2 and S1 Table). The largest modules exhibit relatively drastic changes on only one or few bacteria (mostly Wautersiella LRB104 and Hafnia LRB17), but also more subtle differences on other bacteria. For example, genes of module 1 show strongly reduced expression on Wautersiella LRB104, but also mildly lower expression on some other bacteria. These bacteria yield largely opposite trends for genes in module 2. Module 3 shows highest expression on the Hafnia strains and the strongest pattern of module 4 results again from high expression on Wautersiella LRB104. Also, module 5 shows weak expression on Wautersiella LRB104, Pseudomonas LRB26, and Hafnia LRB17, whereas module 6 stands out by having high expression on Achromobacter L35. Given that the most extreme expression differences are frequently observed in the transcriptomic response to Wautersiella LRB104, we compared developmental timing between E. coli OP50 and both Wautersiella strains (LRB89 and LRB104). This demonstrated that worms exhibit a developmental delay on both Wautersiella bacteria relative to E. coli OP50 (S3 Fig). Given that both Wautersiella strains change the developmental timing of P. pacificus worms, we cannot fully explain why the transcriptomic response to Wautersiella LRB104 differs so starkly from all the other transcriptomes. To test how strongly this outlier influences the structure of the coexpression network and the subsequent analysis, we recomputed coexpression modules after removing the Wautersiella LRB104 data and compared coexpression modules across both data sets (S4 Fig). This showed a clear one-to-one correspondence between most coexpression modules, indicating that the structure of the coexpression network is largely robust. In addition, systematic analysis of coexpression networks that were computed from subsampled data revealed that with fewer RNA-seq samples, network modules tend to get larger (S5 Fig). This may be due to two reasons. First, with fewer samples, it is easier to exceed a given correlation threshold just by chance. Second, with higher numbers of RNA-seq samples, expression profiles can only become more complex. This will cause splits of larger modules. Thus, the full data set with all 24 RNA-seq samples yields the most conservative lower estimate of 14,275 for the number of environmentally responsive genes in large coexpression modules. That means that almost half of the 28,896 annotated genes in P. pacificus respond to environmental microbiota.
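
The idea of deriving modules from a thresholded correlation network can be illustrated with a small sketch. Note the assumptions: the graph clustering method of reference [48] is not reproduced here. Instead, genes are linked when their Pearson correlation reaches the threshold of 0.7 and connected components are reported as modules, which is a deliberately simpler stand-in. Gene names and expression values are invented.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 for constant profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def coexpression_modules(expr, threshold=0.7, min_size=2):
    """Connected components of the graph with edges where r >= threshold.

    This is a stand-in for a real graph clustering algorithm.
    """
    genes = sorted(expr)
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        if pearson(expr[a], expr[b]) >= threshold:
            parent[find(a)] = find(b)

    components = {}
    for g in genes:
        components.setdefault(find(g), []).append(g)
    kept = [m for m in components.values() if len(m) >= min_size]
    return sorted(kept, key=len, reverse=True)


# Hypothetical expression values across four bacterial environments.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [1.5, 2.5, 3.2, 4.8],
    "geneD": [4.0, 3.0, 2.0, 1.0],
    "geneE": [8.0, 6.0, 4.0, 2.1],
    "geneF": [1.0, 5.0, 1.0, 5.0],
}
modules = coexpression_modules(expr)
print(modules)
```

With only four samples, as in this toy input, high correlations arise easily by chance, which mirrors the observation above that modules tend to grow when fewer RNA-seq samples are used.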

Fig 2. Expression level of environmentally responsive coexpression modules.

We visualized the z-score normalized expression levels for the 28 largest modules across the bacterial environments. Modules with more than 100 genes were randomly downsampled to 100 genes. While genes of module 4 are most strongly expressed on Wautersiella LRB104, module 3 shows the highest expression in response to the two Hafnia strains. This demonstrates that environmental microbiota can modulate specific coexpression modules.
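
A minimal sketch of the two preprocessing steps named in this caption, z-score normalization per gene and random downsampling of large modules to 100 genes, is given below. The function names, the random seed, and the use of the population standard deviation are assumptions made for illustration; the text does not state these details.

```python
import random
from statistics import mean, pstdev


def zscore(values):
    """Center one gene's expression values and scale to unit variance.

    Uses the population standard deviation (an assumption); constant
    profiles are mapped to zeros to avoid division by zero.
    """
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def downsample(genes, max_genes=100, seed=0):
    """Randomly keep at most max_genes genes of a module."""
    genes = list(genes)
    if len(genes) <= max_genes:
        return genes
    return random.Random(seed).sample(genes, max_genes)


# Hypothetical module with 250 genes and three environments per gene.
module = {f"gene{i}": [float(i), float(i) + 1.0, float(i) + 3.0] for i in range(250)}
shown = downsample(module)
normalized = {g: zscore(module[g]) for g in shown}
print(len(normalized))
```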

Coexpression modules exhibit developmental signatures

Bacterial diets can alter the developmental rate in C. elegans and P. pacificus [39,47]. Therefore, we wanted to test whether the coexpression modules also show a developmental signature. To this end, we visualized the normalized expression across P. pacificus development [43] for all genes in the largest coexpression modules (Fig 3). This showed that the coexpression modules that were obtained from adult worms on different bacteria also exhibit distinct expression profiles during development. Furthermore, this developmental signature is very consistent between most genes of the same module. For example, genes in module 1 are consistently activated only late in development (>48h, Fig 3). Their expression is preceded by genes in module 2 which consistently increase expression starting from the 40h timepoint (Fig 3). This could mean that the same network modules control development as well as the response to environmental microbiota. Alternatively, this developmental signature could be an indirect effect of altered developmental timing. Even if only adult worms were manually picked for RNA extraction, there might still be differences in the age of these worms. This is because worms may delay or accelerate their development in response to different bacteria [39,47] and consequently, adult worms that were grown on different bacteria may not be of the same chronological age. If the expression of certain modules rather follows the chronological age than the morphological stage, such modules could potentially cause a similar developmental signature. However, no matter if the expression profiles represent immediate response to different bacteria or variation in the chronological age of worms, both scenarios would reflect either direct or indirect consequences of the exposure to different microbial environments.

Fig 3. Developmental signature of coexpression modules.

We visualized the z-score normalized expression levels of genes in the 28 largest coexpression modules throughout postembryonic development after hatching (0h) on E. coli OP50 [43]. Modules with more than 100 genes were randomly downsampled to 100 genes. The coexpression modules show distinct expression profiles suggesting a link between environmental response and developmental regulation.

Coexpression modules have distinct regulatory architecture

The observation of distinct expression profiles among environmentally responsive genes suggests that the coexpression modules might be coregulated by diverse sets of transcription factors. To test this, we searched for overrepresented motifs in the promoter sequences of the largest coexpression modules using a de novo motif discovery approach as implemented in the HOMER software [49]. This identified a diverse set of DNA motifs that are highly enriched in specific modules (S6 Fig). While modules 1, 5, 7, and 23 show enrichments of the same motifs (ZBTB32, CUX1, and LIN54), many other coexpression modules have a unique regulatory architecture with very specific motifs that are only enriched in the given module (e.g. POU5F1 in module 2 and Foxd3 in module 24, S6 Fig). Unfortunately, it is not straightforward to infer the regulator for a given motif as families of transcription factors might be large and may bind very similar motifs [50]. For example, the human zinc finger and BTB domain containing protein ZBTB32, which binds a motif that is highly enriched in module 1 (S6 Fig), has dozens of orthologs in the genomes of C. elegans and P. pacificus. However, the most significantly enriched motif of module 2 has the highest similarity with the motif of the POU domain homeobox transcription factor POU5F1. This family has only three members (unc-86, ceh-6 and ceh-18) in C. elegans [51] and four in P. pacificus. Similarly, C. elegans ces-1 has a one-to-one ortholog in P. pacificus. Thus, for individual modules future experimental analysis could be used to dissect the regulatory relationships at a mechanistic level. Nevertheless, the diversity of motifs that are found across different coexpression modules already supports that they have distinct regulatory architecture. In addition, this analysis supports that the genes in a given module are not only coexpressed, but also coregulated.

3,727 Pristionchus pacificus orphan genes respond to diverse environmental microbiota

To test to what extent new genes could contribute to the response to diverse environmental microbiota, we performed a phylostratigraphic analysis to determine the distribution of gene ages across coexpression modules. Phylostratigraphy is a commonly used method to map gene birth events to a branch within a species tree by searching for the most distant homolog [52]. Here, we selected nine additional diplogastrid genomes [30] and the genomes of C. elegans, Bursaphelenchus xylophilus [53], Brugia malayi [54], and Trichinella spiralis [55] to assign genes to phylostrata (S2 Table). Altogether, 10,318 (35.7%) of all genes were defined as diplogastrid family-specific orphan genes (BLASTP e-value < 10^-3). Visualization of the distribution of phylostrata across the coexpression modules shows that almost half of all the genes in module 2 are diplogastrid-specific orphan genes (Fig 4). Furthermore, among the smaller modules 20 and 22, we observe relatively high ratios of species-specific orphan genes. This finding suggests that the integration of new genes into regulatory networks can happen very rapidly. Alternatively, such genes might represent ancient but rapidly evolving sequences such as antimicrobial peptides where we underestimate the gene age due to the failure to detect homologs [3]. In total, 3,727 diplogastrid-specific orphan genes are found among the 28 largest coexpression modules. As indicated above, this represents a lower estimate for the total number of environmentally-responsive orphan genes as additional orphan genes are found in smaller modules (S4 Fig). Relative to the genome-wide fraction of orphan genes (35.7%), this represents an underrepresentation as only 26.1% of genes in large coexpression modules are orphan genes (P < 2.2 × 10^-16, Fisher’s exact test). This may be largely explained by the fact that orphan genes are generally lowly expressed [31].
In our data set, only 66.4% of orphan genes are expressed whereas this number increases to 88.5% for non-orphan genes. Nevertheless, despite their lower level of expression, our study could demonstrate that thousands of orphan genes are indeed embedded into developmental networks that plastically respond to diverse microbiota. Further, it identified which of the orphan genes respond to different environments and it established associations between these orphan genes and specific network modules where they have been embedded.
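
The phylostratigraphic assignment described above, in which a gene is placed in the stratum of its most distant detectable homolog, can be sketched as follows. This is a simplified illustration under stated assumptions: only three strata are distinguished, the species keys and the hit lists are hypothetical, and the actual strata used in the study are those of S2 Table. Only the e-value cutoff of 10^-3 is taken from the text.

```python
E_VALUE_CUTOFF = 1e-3  # BLASTP e-value threshold stated in the text

# Simplified strata ordered from youngest (0) to oldest (2); hypothetical labels.
STRATUM_RANK = {"species-specific": 0, "diplogastrid": 1, "older": 2}

# Hypothetical mapping of comparison species to the stratum they define.
SPECIES_STRATUM = {
    "diplogastrid_A": "diplogastrid",
    "diplogastrid_B": "diplogastrid",
    "C. elegans": "older",
    "B. xylophilus": "older",
    "B. malayi": "older",
    "T. spiralis": "older",
}


def phylostratum(hits):
    """Assign a gene to the stratum of its most distant significant homolog.

    hits: list of (species, e-value) pairs with the best hit per species.
    A gene without any significant hit is called species-specific.
    """
    best = "species-specific"
    for species, evalue in hits:
        if evalue < E_VALUE_CUTOFF:
            stratum = SPECIES_STRATUM[species]
            if STRATUM_RANK[stratum] > STRATUM_RANK[best]:
                best = stratum
    return best


def is_diplogastrid_orphan(hits):
    """True if no significant homolog exists outside the diplogastrid family."""
    return phylostratum(hits) != "older"


# Hypothetical genes with their best hits.
gene_hits = {
    "gene1": [],
    "gene2": [("diplogastrid_A", 1e-20), ("C. elegans", 0.5)],
    "gene3": [("diplogastrid_B", 1e-30), ("C. elegans", 1e-5)],
}
for gene, hits in gene_hits.items():
    print(gene, phylostratum(hits), is_diplogastrid_orphan(hits))
```

The second toy gene shows why the choice of cutoff matters: a weak hit in a distant species that fails the threshold leaves the gene classified as an orphan, which is one way that rapidly evolving ancient genes can be assigned an underestimated age.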

Fig 4. Phylostratigraphic analysis across coexpression modules.

Phylostrata were defined based on the presence of homologs for a given gene in the most distantly related species. Each phylostratum defines a branch in the phylogeny where a gene likely originated. The barplot shows the distribution of phylostrata across coexpression modules with the dashed line marking the fraction of diplogastrid-specific orphan genes across all genes. The stars indicate significant enrichment and depletion of orphan genes based on simulating the integration of new genes to existing network modules (P < 0.01, S7 Fig).

The major fast-evolving module is associated with spermatogenesis

The distribution of phylostrata across coexpression modules suggests that new genes are not attached randomly into regulatory networks, i.e. not every module is equally likely to acquire a new gene. On the contrary, individual coexpression modules are more likely to integrate new genes than others. To assess the non-randomness in the distribution of diplogastrid-specific orphan genes, we simulated the integration of new genes into existing networks. Specifically, we tested a model where all network modules are equally likely to acquire a new gene and a model where the probability of attachment is proportional to the size of the module (S7 Fig). The second model predicts the number of orphan genes per module much more accurately and allows us to define fast-evolving modules with significant enrichments of orphan genes (Modules 2–4, 8, 15, 18, 20, 22, 25, and 27, P < 0.01, S7 Fig and Fig 4). Constrained modules (Modules 1, 5, 7, 9, 13, 23, and 26) were defined analogously as modules that are significantly depleted in orphan genes. To better characterize the biological differences between such fast-evolving and more constrained modules, we performed overrepresentation analysis of protein domains (S3 Table), metabolic pathways (S4 Table), and other expression data sets [43,56–58]. The expression data includes 11 sets of regionally expressed genes that were identified from RNA tomography along the anterior-posterior axis of individual worms [58]. These analyses were complemented by tissue enrichment analysis (TEA, S5 Table) using one-to-one orthologs in C. elegans [59]. The overrepresentation of germline associated regions P6-P8 and TEA support that the constrained module 1 is associated with oogenesis (Fig 5A and S5 Table). In contrast, the enrichment of Motile sperm proteins, TEA and expression in sperm-related regions (P5, P9) associate the fast-evolving module 2 with spermatogenesis (S8 Fig, Fig 5A, and S5 Table).
Thus, the two largest environmentally responsive coexpression modules represent a constrained module associated with oogenesis and a fast-evolving module that is associated with spermatogenesis. In C. elegans, the reproductive system is known to plastically respond to changes of environmental microbiota [60,61]. Thus, it is unsurprising that the corresponding modules respond dynamically to diverse bacteria in P. pacificus. These results also recapitulate previous findings of signatures of rapid evolution in spermatogenesis-associated genes in C. elegans and P. pacificus [58,62]. This out-of-testis trend of new gene formations was also observed in insects and mammals and likely reflects a combination of rapid evolution of spermatogenesis-associated genes and a more permissive chromatin state [63,64].
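
The two attachment models compared above can be illustrated with a small simulation. This is a sketch under assumptions and not the authors' implementation: module sizes and observed orphan counts are invented, the number of simulations and the seed are arbitrary, and the empirical P-values are simple one-sided fractions of simulated counts.

```python
import random


def simulate_attachment(module_sizes, n_new_genes, proportional, rng):
    """Distribute new genes over modules under one of two models.

    proportional=False: every module is equally likely to gain a gene.
    proportional=True: probability is proportional to module size.
    """
    modules = list(module_sizes)
    weights = [module_sizes[m] if proportional else 1 for m in modules]
    counts = dict.fromkeys(modules, 0)
    for m in rng.choices(modules, weights=weights, k=n_new_genes):
        counts[m] += 1
    return counts


def empirical_p(module_sizes, observed, proportional=True, n_sim=1000, seed=0):
    """One-sided empirical P-values for enrichment and depletion per module."""
    rng = random.Random(seed)
    n_new = sum(observed.values())
    at_least = dict.fromkeys(module_sizes, 0)
    at_most = dict.fromkeys(module_sizes, 0)
    for _ in range(n_sim):
        sim = simulate_attachment(module_sizes, n_new, proportional, rng)
        for m in module_sizes:
            at_least[m] += sim[m] >= observed[m]
            at_most[m] += sim[m] <= observed[m]
    return {
        m: {"enriched": at_least[m] / n_sim, "depleted": at_most[m] / n_sim}
        for m in module_sizes
    }


# Hypothetical module sizes and observed orphan gene counts.
sizes = {"m1": 3000, "m2": 2000, "m3": 1000}
observed = {"m1": 300, "m2": 900, "m3": 300}

p_values = empirical_p(sizes, observed, proportional=True)
for m in sizes:
    print(m, p_values[m])
```

In this toy setting, m2 would be called fast-evolving and m1 constrained at P < 0.01, in analogy to the definitions above. Running the same function with `proportional=False` gives the expectation under the equal-probability model.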

Fig 5. Transcriptomic and metabolic enrichment of coexpression modules.

(A) Coexpression modules show distinct overlaps with regional genes (P1-P11) that were identified from spatial transcriptomics. (B) Multiple metabolic pathways are enriched in individual coexpression modules. (C) Coexpression modules were compared with multiple expression gene sets. 16 out of 28 coexpression modules show significant overlap with differentially expressed genes in previous comparisons of environmental microbiota.

Thousands of orphan genes can be associated with biological processes or tissues

The strong support for the association of the two largest coexpression modules with oogenesis and spermatogenesis motivated us to test if more coexpression modules could be annotated with biologically meaningful labels. Therefore, we complemented protein domain information (S8 Fig), spatial transcriptomics (Fig 5A), and TEA (S5 Table) with the overrepresentation of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Fig 5B) and other P. pacificus expression gene sets (Fig 5C) [43,47,56,57,65–67]. This allowed us to assign labels for 22 of the 28 largest coexpression modules (Table 1). Please note that these labels are not exclusive but rather describe the most strongly enriched biological terms. Notably, nine of the coexpression modules are associated with the nervous system which is also supported by the patterns of G protein coupled receptors (GPCRs), neuropeptides, and nuclear hormone receptors (NHRs) (S9 Fig). All these gene classes are thought to be enriched in neurons [68,69]. In addition, we identified a gland cell related module which captures 17 out of 24 target genes of the regulators of the mouth-form polyphenism in P. pacificus [70] (S10 Fig). Altogether, this analysis predicts associations for 13,923 (48%) P. pacificus genes with biological processes and anatomical structures. This includes predicted associations for 3,556 (35%) of diplogastrid-specific orphan genes.

Table 1. Biological annotation for coexpression modules.

Most coexpression modules could be labeled with biological processes or tissues based on manual inspection of the results of different overrepresentation analyses.

| Module | Number of genes | Name | Evidence |
|---|---|---|---|
| 1 | 3,202 | Oogenesis 1 | P6-P8, sex-bias, TEA |
| 2 | 1,995 | Spermatogenesis | P5+P9, motile sperm proteins, sex-bias, TEA |
| 3 | 1,735 | Intestine 1 | Intestine RNA-seq |
| 4 | 1,618 | Nervous system 1 | GPCRs, neuropeptides, NHRs, P1-P3, TEA |
| 5 | 1,099 | Oogenesis 2 | P6-P8, sex-bias, TEA |
| 6 | 921 | Nervous system 2 | GPCRs, Axon regeneration, P2-P3, TEA |
| 7 | 472 | Oogenesis 3 | P6-P8, sex-bias |
| 8 | 388 | Nervous system 3 | GPCRs |
| 9 | 346 | Muscle / intestine 2 | Intestine RNA-seq, TEA |
| 10 | 304 | Cuticle 1 | Collagens, oscillation |
| 11 | 300 | Cuticle 2 | Collagens, oscillation, TEA |
| 12 | 253 | Nervous system 4 | GPCRs, NHRs |
| 13 | 241 | Intestine 3 | Fatty acid degradation, drug metabolism, P4 |
| 14 | 235 | Nervous system 5 | GPCRs, P2, TEA |
| 15 | 133 | Unknown | |
| 16 | 116 | Intestine 4 | Intestine RNA-seq, P4 |
| 17 | 116 | Unknown | |
| 18 | 104 | Nervous system 6 | TEA |
| 19 | 94 | Nervous system 7 | TEA |
| 20 | 81 | Orphan 1 | |
| 21 | 80 | Intestine 4 | Intestine RNA-seq |
| 22 | 74 | Orphan 2 | |
| 23 | 72 | Oogenesis 4 | P6-P8, TEA |
| 24 | 64 | Gland cell / mouthform | P2, Astacins |
| 25 | 60 | Orphan 3 | |
| 26 | 59 | Unknown | |
| 27 | 57 | Nervous system 8 | TEA |
| 28 | 56 | Nervous system 9 | GPCRs |

Comparisons with previous RNA-seq studies reveal interactions between environmental microbiota and the nervous system

In the current study, we wanted to get a very broad overview of expression changes across a wide range of bacterial environments. Therefore, we decided to sequence only a single transcriptome from many different bacterial environments. This is in contrast to previous studies that focused on specific interactions between hosts and microbes and defined more robust transcriptomic changes with regard to the standard diet E. coli OP50 by sequencing multiple biological replicates [47,65–67]. In order to gain additional support for the environmental regulation of the identified coexpression modules, we quantified how many of the coexpression modules exhibit associations with sets of significantly differentially expressed genes in six transcriptomic comparisons of distinct microbial environments including Novosphingobium, Lysinibacillus, E. coli K12, and two Cryptococcus yeast strains [47,65–67]. This showed significant enrichments for 16 modules with differentially expressed genes in at least one of the previous studies (Fig 5C). Moreover, we find the intestinal module 9 and the nervous system related module 6 to be most frequently overrepresented. This further supports the impact of different bacteria on the nematode nervous system with potential consequences on behavior. Such an effect has been experimentally characterized in P. pacificus for the case of Novosphingobium bacteria. Feeding on these bacteria accelerates development and makes the worms more efficient predators of C. elegans [47]. Another example of behavioral modulation is the effect of neurotransmitter release by bacteria on C. elegans perception [45]. Notably, six of the nine nervous system related modules are significantly affected in at least one environmental comparison. This highlights the potential of microbial environments to modulate the nervous system and behavior.
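
An overrepresentation test of the kind used to compare coexpression modules with sets of differentially expressed genes can be sketched with a one-sided hypergeometric tail probability. The choice of this particular test is an assumption made for illustration, since this section does not name the test, and all numbers in the example are invented. No multiple-testing correction is shown.

```python
import math


def log_comb(n, k):
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_sf(overlap, module_size, set_size, universe):
    """P(X >= overlap) for X ~ Hypergeometric(universe, set_size, module_size).

    X is the number of genes from the gene set that fall into a module
    when module_size genes are drawn at random from the universe.
    """
    upper = min(module_size, set_size)
    log_denom = log_comb(universe, module_size)
    total = 0.0
    for k in range(overlap, upper + 1):
        if module_size - k > universe - set_size:
            continue  # not enough genes outside the set to fill the module
        total += math.exp(
            log_comb(set_size, k)
            + log_comb(universe - set_size, module_size - k)
            - log_denom
        )
    return min(total, 1.0)


# Hypothetical example: 100 genes in total, a module of 10 genes,
# a set of 20 differentially expressed genes, and 8 shared genes.
p_enriched = hypergeom_sf(overlap=8, module_size=10, set_size=20, universe=100)
p_expected = hypergeom_sf(overlap=2, module_size=10, set_size=20, universe=100)
print(f"overlap 8: P = {p_enriched:.2e}; overlap 2: P = {p_expected:.2f}")
```

Working with log binomial coefficients keeps the computation stable for universes of tens of thousands of genes, such as the 28,896 annotated P. pacificus genes.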

Discussion

Which genes respond to changing environments? Do orphan genes play a role in adapting to diverse microbiota? Can we infer potential functions for orphan genes based on coexpression with known genes? How are new genes integrated into existing biological networks? To study these questions, we characterized the transcriptomic response of P. pacificus nematodes to 24 environmental microbiota. This analysis led to the identification of 28 large coexpression modules that contain almost half of all genes in P. pacificus. Further integrative analysis associated 22 of these modules with biological processes or tissues. These modules capture previously characterized gene sets such as collagens with oscillating expression [43] or target genes of mouth-form regulators that are expressed in the gland cell [70]. Moreover, these functionally annotated modules contain 3,556 (35%) of diplogastrid-specific orphan genes, of which a large fraction is found in the spermatogenesis associated module 2. While sperm associated genes are known to evolve rapidly [62,71], our study adds evidence that the expression of spermatogenesis-associated genes can be modulated by different microbiota. One major limitation of the current study is that we only sequenced a single biological replicate per environment. This strongly limits the usefulness of our data set to dissect the effect of individual bacterial strains on nematode gene expression. However, we would argue that this does not undermine our main findings as coexpression signals can still be inferred from such a data set even if single expression estimates are not robust. In addition, we would like to point out that since the worms have been grown on selected bacteria, the effect of the bacteria likely accumulates in the nematodes throughout development and might be different from an immediate transcriptomic response after short time exposure.
Thus, the observed differences could partially represent an indirect consequence of altered developmental rates. Even if only young adults were picked for RNA-seq experiments, we cannot exclude the possibility that the expression of certain gene modules reflects a chronological age rather than a morphological age. Future work could elucidate to what extent the transcriptomic responses represent immediate response to different bacteria or developmental variation. In addition, we currently do not know to what extent different bacteria constitute a diet or could also interact with nematodes by colonizing their gut [72]. Nevertheless, despite all these uncertainties, we would still argue that most of the observed transcriptomic variation represents either direct or indirect responses to different microbial environments.

One of the main achievements of our work is the association of thousands of P. pacificus genes with biological processes and tissues. This includes thousands of diplogastrid-specific orphan genes. Of course, this type of functional annotation represents a completely different level than the knowledge for the two experimentally characterized orphan genes, where knockout and transgenic lines are available that show organism-level phenotypes [34,73]. Our functional associations rely on the assumption that coexpression implies cofunctionality. Although this is a frequently employed method for functional assignment [35,36], the example of the out-of-testis pattern shows that this assumption might not always be true. Specifically, it has been shown that testis-biased expression of many new genes might be an effect of an overall permissive chromatin state leading to higher transcriptional complexity [64]. Thus, our data should rather be considered as a source to generate new hypotheses that have to be tested experimentally. As such, our functional annotations may be helpful to interpret future transcriptomic studies in P. pacificus. Some of these modules might also be relevant for studying the polyphenism of the mouth morphology in P. pacificus. This phenomenon has developed into one of the best studied animal systems for developmental plasticity [74,75]. Our observation of environmentally dependent expression variation in a gland cell related module that includes many target genes of mouth-form regulators [70] suggests an additional control layer that is independent of the mouth-form. This is because no bacteria are currently known that alter the mouth-form ratio in the highly predatory P. pacificus reference strain PS312 [47]. However, it could well be that the same bacteria can induce the predatory morph in a different wild isolate with a low or intermediate frequency of the predatory morph.

Another major finding concerns the integration of new genes into existing gene regulatory networks. Our analysis clearly shows that new genes are not randomly attached to existing modules, but rather specific modules have much higher propensity to acquire new genes. The likelihood to integrate new genes may be associated with the biological function of a given module. For example, while the oogenesis associated modules 1, 5, 7, and 23 are composed of the oldest gene sets, the spermatogenesis-associated module 2 has one of the youngest gene contents. The strong difference in evolvability between these two reproductive processes could be due to strong sperm competition which results from the difference in the number of gametes between both sexes [76]. Another factor could consist in the differential control of transposons between spermatocytes and oocytes resulting in higher potential for molecular innovation during spermatogenesis [77,78]. Another aspect of network evolution concerns the timing of events. The presence of many species-specific orphan genes in some coexpression modules suggests that integration can happen relatively fast. Together with a recent study on the evolution of the polyphenism network [79], our work reveals first insights into the evolution of environmentally responsive networks in Pristionchus nematodes and similar studies in other taxonomic groups could be done to test whether these observations reflect general patterns of gene-regulatory network evolution.

Methods

Bacterial Culture Conditions

All bacterial strains were recovered from glycerol stocks [42] and then plated on nematode growth medium (NGM) and incubated overnight at 37°C. From these plates single colonies were seeded in lysogeny broth (LB) medium and grown overnight at 37°C in a shaking incubator.

Nematode Culture Conditions

The wild type strain of P. pacificus (PS312) was maintained at 20°C on nematode growth medium (NGM) seeded with E. coli OP50 before use in experiments [80]. From every generation, five young adults were transferred to fresh plates with a wormpick.

RNA sequencing

Bleaching was used to synchronize P. pacificus nematodes and to remove E. coli OP50 before transferring bleached eggs to NGM plates with bacterial strains [80]. These bacterial plates were obtained by spotting 50 μl of overnight bacterial cultures onto each 6 cm NGM plate with an L-spreader followed by incubation for 2 days. The concentration of the overnight bacterial cultures was quantified by measuring the optical density at 600 nm (OD600). To make the initial cultures comparable, the overnight cultures were diluted to OD600 = 1 with fresh LB. Worms were grown on these plates at 20°C and every generation young adult worms were passaged to fresh bacterial plates. This was done for at least two generations. From mixed-stage cultures, 50 young adults with only one egg were manually selected with a wormpick under a binocular microscope into an Eppendorf tube containing 100 μl M9 buffer and immediately frozen at -80°C. Total RNA was extracted using the Direct-Zol RNA Mini prep kit from Zymo Research according to the manufacturer’s guidelines. RNA libraries were prepared using the Illumina Truseq RNA library prep kit according to the manufacturer’s guidelines. The libraries were quantified using a combination of Qubit and Bioanalyzer (Agilent Technologies) and normalized to 2.5 nM. Samples were sequenced as 150 bp single-end reads on multiplexed lanes of an Illumina HiSeq3000 in our in-house sequencing facility. Raw reads were deposited at the European Nucleotide Archive under the study accession PRJEB60166.
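The OD600 normalization step above comes down to a simple dilution calculation. The following is a minimal sketch, assuming that OD600 scales linearly with cell density (C1 × V1 = C2 × V2), which is only approximately true at high optical densities. The function name and the example values are hypothetical and not taken from the original protocol.

```python
def lb_volume_to_add(culture_volume_ml, measured_od600, target_od600=1.0):
    """Return the volume of fresh LB (ml) needed to dilute a culture to target_od600.

    Assumes OD600 is proportional to cell density (C1 * V1 = C2 * V2),
    which is only an approximation, especially for dense cultures.
    """
    if measured_od600 < target_od600:
        raise ValueError("culture is already below the target OD600")
    final_volume_ml = culture_volume_ml * measured_od600 / target_od600
    return final_volume_ml - culture_volume_ml


# Hypothetical example: 1 ml of an overnight culture measured at OD600 = 2.5
print(lb_volume_to_add(1.0, 2.5))
```

In practice, dense cultures are often pre-diluted before measuring so that the reading stays in the linear range of the photometer.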

Developmental timing

Bacterial strains were grown overnight from a single colony in LB medium. The medium was shaken at 180 rpm and incubated at 30°C for LRB89 and LRB104 and 37°C for OP50. These cultures were spotted on 6-cm NGM plates and were grown for 2 days at RT. Synchronized J2 worms were obtained by bleaching and were put on these NGM plates [80]. Nematode cultures were grown at 20°C. 56h later, the distribution of developmental stages was scored based on the vulval development under a ZEISS SteREO Discovery microscope [47].

Identification of coexpression modules

The raw reads were aligned to the reference P. pacificus genome (version El Paco) with STAR (version 2.7.3a) and quantified with featureCounts from the Subread R package (version 2.0.1) using the current gene annotations (El Paco gene annotations version 3) [81–83]. After prefiltering the count matrix by removing genes (rows) that have fewer than 10 reads in total, we retained 23,294 (80.6%) of all P. pacificus genes with some evidence of expression. The read counts across the different conditions were normalized with the DESeq2 counts function (with normalized = TRUE option) [84]. The normalized read counts were used to create a coexpression network with MCL [48]. Different correlation and inflation thresholds were assessed for the coexpression network, ranging from 0.6 to 0.8 and from 1.5 to 4, respectively. The network was constructed using a correlation threshold of 0.7 and an inflation parameter of 2. The network modules containing more than 50 genes were selected for further analysis. To assess the robustness of module assignments, we recomputed coexpression networks after either removing the outlier LRB104 or systematically downsampling the data set. Taking the full coexpression network calculated from 24 RNA-seq samples as reference, we evaluated the classification of 10,000 randomly chosen gene pairs from a subsampled RNA-seq data set. For the full and the subsampled data set, a gene pair was either classified to be part of the same module or not. If a gene pair was classified as being part of the same module in both data sets, such a pair was classified as true positive. If a pair was only part of a module in the subsampled data set, but not in the full data set, this was scored as false positive. True negative and false negative cases were defined analogously. Subsequently, the positive predictive value (PPV) and negative predictive value (NPV) were calculated for 10 randomly subsampled data sets of a fixed size (S5 Fig).
For comparison, the data set without LRB104 yielded a PPV of 84% and an NPV of 98%.
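The pair-based robustness measure described above can be illustrated with a short sketch. It assumes that module assignments are available as dictionaries mapping gene identifiers to module labels and that unassigned genes are simply absent. All names and the toy data are hypothetical; this is not the code used in the study.

```python
import random


def same_module(modules, gene_a, gene_b):
    """True if both genes are assigned to the same module."""
    mod_a, mod_b = modules.get(gene_a), modules.get(gene_b)
    return mod_a is not None and mod_a == mod_b


def sample_pairs(genes, n_pairs, seed=0):
    """Draw n_pairs random gene pairs (hypothetical helper)."""
    rng = random.Random(seed)
    return [tuple(rng.sample(genes, 2)) for _ in range(n_pairs)]


def pair_agreement(reference, subsampled, pairs):
    """Return (PPV, NPV) of the subsampled network relative to the reference."""
    tp = fp = tn = fn = 0
    for gene_a, gene_b in pairs:
        in_ref = same_module(reference, gene_a, gene_b)
        in_sub = same_module(subsampled, gene_a, gene_b)
        if in_sub and in_ref:
            tp += 1   # same module in both networks
        elif in_sub:
            fp += 1   # same module only in the subsampled network
        elif in_ref:
            fn += 1   # same module only in the full network
        else:
            tn += 1   # different modules in both networks
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Toy module assignments (hypothetical gene names)
full = {"g1": 1, "g2": 1, "g3": 2, "g4": 2}
sub = {"g1": 1, "g2": 1, "g3": 1, "g4": 2}
all_pairs = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"),
             ("g2", "g3"), ("g2", "g4"), ("g3", "g4")]
print(pair_agreement(full, sub, all_pairs))
```

In this toy case the subsampled network merges gene g3 into module 1, which lowers the PPV more than the NPV.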

Motif analysis and phylostratigraphy

To test for overrepresented motifs in the promoter regions, we focused on the 500-bp nucleotide sequence upstream of each annotated gene in a given module. These sequences were taken as input for the findMotif.pl script of the Hypergeometric Optimization of Motif EnRichment (HOMER) suite (version v4.10.1) [49]. As a background set, we used the 500-bp upstream sequences of all the other large coexpression modules. For visualization, we selected among the significant motifs only one representative motif per known motif match and applied a cutoff that a motif needs to be present in at least 20% of the promoter regions of a given module. For analyzing the distribution of phylostrata across coexpression modules, we assigned a gene to a phylostratum based on the presence of the most distant homolog. For this purpose, we performed BLASTP searches (e-value < 0.001) of all 28,896 P. pacificus proteins against protein data of nine diplogastrid genomes (version PPCAC) [30], C. elegans, B. xylophilus, B. malayi, and T. spiralis (WormBase ParaSite version WBPS14) [85]. This identified 10,318 diplogastrid-specific orphan genes. To simulate the integration of new genes into existing network modules, we first defined the ancestral module sizes based on the number of ancient genes (non-orphan genes) for the 28 large modules. We then simulated the integration of new genes by either assigning equal probabilities for attachment to all network modules or by scaling the attachment probability as a linear function of the module size. The second model predicts the number of orphan genes per module much more accurately (Pearson’s r = 0.62, S7 Fig). Therefore, we used 100 iterations of the second model to calculate empirical P-values for the overrepresentation of orphan genes (P < 0.01, S7 Fig and Fig 4).
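The two attachment models and the empirical P-values can be sketched as follows. This is a minimal illustration under stated assumptions: the empirical P-value is taken as the fraction of simulations in which a module receives at least as many new genes as observed, and the module sizes and counts are toy values. The function names are hypothetical.

```python
import random


def simulate_attachment(ancestral_sizes, n_new, proportional, rng):
    """Distribute n_new genes over modules and return the count per module.

    proportional=False: every module is equally likely (first model).
    proportional=True: probability scales with ancestral module size (second model).
    """
    weights = ancestral_sizes if proportional else [1] * len(ancestral_sizes)
    counts = [0] * len(ancestral_sizes)
    for idx in rng.choices(range(len(ancestral_sizes)), weights=weights, k=n_new):
        counts[idx] += 1
    return counts


def empirical_p_values(ancestral_sizes, observed, n_iter=100, seed=1):
    """Fraction of size-proportional simulations with count >= observed, per module."""
    rng = random.Random(seed)
    n_new = sum(observed)
    at_least = [0] * len(observed)
    for _ in range(n_iter):
        simulated = simulate_attachment(ancestral_sizes, n_new, True, rng)
        for i, (sim, obs) in enumerate(zip(simulated, observed)):
            if sim >= obs:
                at_least[i] += 1
    return [count / n_iter for count in at_least]


# Toy example: three modules of equal ancestral size, the third holds most new genes
sizes = [100, 100, 100]
observed_orphans = [10, 10, 280]
print(empirical_p_values(sizes, observed_orphans))
```

With 100 iterations the smallest non-zero P-value is 0.01, so under this convention P < 0.01 means that no simulation reached the observed count.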

Compilation of P. pacificus expression gene sets

Sets of 3,502 P. pacificus genes with regional expression were extracted from supplementary table S3 in [58]. These genes exhibit enriched expression in at least one out of eleven regions (P1-P11) that were identified by RNA tomography of adult P. pacificus animals. A conversion table between assembled transcripts (version trinity 2016) and current gene annotations (El Paco gene annotations V3) was obtained from the supplementary table S10 in [43]. We extracted the set of 2,964 developmentally oscillating genes from the supplementary table S3 in [43]. In addition, we reanalyzed P. pacificus RNA-seq data sets to compile candidate genes with intestinal expression [56], sex-biased expression [57], and differentially expressed genes in response to altered microbial environments [47,65–67]. For the intestinal data and exposure to Lysinibacillus, we identified candidate genes by aligning RNA-seq reads to the P. pacificus reference assembly (version El Paco) with the help of the tophat alignment program (version v2.0.14, default settings) and testing for differential expression using the cuffdiff program (version v2.2.1, P<0.1) [86]. This yielded 723 intestine-enriched genes, and 1,959 candidate genes with differential expression after one hour exposure to Lysinibacillus. For the data sets with multiple biological replicates [47,57,65,67], we realigned RNA-seq data with STAR (version 2.7.1a) [81], generated count matrices with the featureCounts function of the subread package in R (version 3.6.3) [82] and tested for significant differential expression with the DESeq2 package (FDR-corrected P<0.05) [84].

Overrepresentation of gene families, metabolic pathways, and expression sets

To test for overrepresentation of specific gene sets among the coexpression modules, we first generated protein domain annotations by the hmmsearch program (version 3.3, -E 0.001 option) against the Pfam-A.hmm database (version 3.1b2). Similarly, KEGG annotations were obtained by identification of orthologs in the KEGG database using the blastkoala web application (with the ‘eukaryotes’ and ‘family_eukaryotes’ values for taxonomic group and database level, respectively) [87]. Genes with orthologs in the KEGG database were then annotated with the corresponding C. elegans KEGG accessions. From the protein domain predictions, we identified 1,434 genes with protein domains that were termed as GPCRs and 275 putative NHR genes based on the presence of the Hormone_recep domain (PF00104). Potential neuropeptides were identified by BLASTP search (-evalue 0.001) of 107 C. elegans neuropeptides (flp and nlp genes) against the current P. pacificus proteins. For 59 C. elegans neuropeptides, we could identify 85 homologs in P. pacificus of which we only used 38 single copy candidates as neuropeptide candidates. These annotations were combined with various expression data sets to perform overrepresentation analysis using Fisher’s exact test with multiple testing correction (FDR-corrected P < 10⁻⁶) in R (version 3.6.3).
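As an illustration of the overrepresentation analysis, the sketch below computes a one-sided Fisher’s exact test as a hypergeometric upper tail and applies a Benjamini-Hochberg correction. The original analysis was done in R; this Python version, the one-sided alternative, the choice of Benjamini-Hochberg as the FDR procedure, and all numbers are assumptions made for illustration only.

```python
from math import comb


def hypergeom_upper_tail(k, set_size, module_size, n_genes):
    """P(X >= k): chance that a module of module_size genes drawn from n_genes
    contains at least k members of a gene set with set_size members."""
    total = comb(n_genes, module_size)
    upper = min(set_size, module_size)
    favourable = sum(
        comb(set_size, i) * comb(n_genes - set_size, module_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def benjamini_hochberg(pvals):
    """Return FDR-adjusted P-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 40 of 200 module genes carry a domain found in 300 of 20,000 genes
p_raw = hypergeom_upper_tail(40, 300, 200, 20000)
print(p_raw, benjamini_hochberg([p_raw, 0.2, 0.5]))
```

A module and gene-set pair would then be reported only if its adjusted P-value falls below the chosen cutoff.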

Annotation of coexpression modules

In order to associate the coexpression modules with biological processes or tissues, we complemented the results of the overrepresentation analysis in P. pacificus with tissue enrichment analysis (TEA) of C. elegans orthologs [59]. Specifically, C. elegans orthologs of P. pacificus genes were extracted from the best reciprocal BLASTP hit data set from Athanasouli et al. (2020) [83] and submitted to the WormBase web interface (https://wormbase.org/tools/enrichment/tea/tea.cgi, version WS283) [83]. We manually inspected the results of all different analyses and subjectively assigned biological processes or tissues for a given module. Functional assignments with multiple types of evidence (e.g. protein domains, spatial transcriptomics, sex-biased expression) should be considered as high-confidence associations, whereas associations that are supported by only a single analysis are more uncertain.
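The best reciprocal hit criterion used to define orthologs can be sketched in a few lines. The example assumes that BLASTP results have already been parsed into (query, subject, bit score) tuples. The gene identifiers and scores are hypothetical, and this is not the pipeline from Athanasouli et al. (2020).

```python
def best_hits(hits):
    """Return the highest-scoring subject for every query.

    hits: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Keep pairs where each gene is the best hit of the other."""
    fwd = best_hits(forward)   # e.g. P. pacificus -> C. elegans
    rev = best_hits(reverse)   # e.g. C. elegans -> P. pacificus
    return {query: subject for query, subject in fwd.items()
            if rev.get(subject) == query}


# Toy BLASTP results with hypothetical identifiers
ppa_vs_cel = [("ppaA", "celX", 200.0), ("ppaA", "celY", 150.0),
              ("ppaB", "celY", 90.0), ("ppaC", "celX", 80.0)]
cel_vs_ppa = [("celX", "ppaA", 210.0), ("celY", "ppaA", 140.0),
              ("celY", "ppaB", 160.0)]
print(reciprocal_best_hits(ppa_vs_cel, cel_vs_ppa))
```

Here ppaC is dropped because its best hit celX points back to ppaA, which is the behaviour that keeps the ortholog set restricted to 1–1 pairs.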

Supporting information

S1 Fig. Analysis of the most similar and dissimilar RNA-seq data sets.

The scatter plots show the normalized expression (TPM) for different pairs of samples. As lowly expressed genes tend to be more variable, we visualized the number of genes with at least two-fold expression difference across multiple mean expression levels.

(PDF)

S2 Fig. Parameter combinations for MCL clustering.

We evaluated the total number of modules and singleton modules as a function of the inflation factor (I) and correlation coefficient (r). High r and I values generally increase the number of modules, whereas low r and I parameters generate fewer but larger modules. We decided to use r = 0.7 and I = 2 for the final analysis, because it gave a moderate number of modules with a relatively low number of singletons.

(PDF)

S3 Fig. Developmental timing on different bacteria.

The distribution of developmental stages 56h after J2 synchronization is shown for P. pacificus worms on E. coli OP50 and two Wautersiella bacteria (5 biological replicates). Nematodes grow slower on both Wautersiella strains. The significance level was computed by a χ² test (mean P-value from all pairwise comparisons).

(PDF)

S4 Fig. Comparison of MCL networks constructed with or without LRB104.

The heatmap shows the fraction of genes from modules of the complete coexpression network (including LRB104) that overlap with a given module from the coexpression network without LRB104. For most modules there exists a 1–1 correspondence between both networks, indicating that the network structure is robust with regard to the Wautersiella LRB104 data set.

(PDF)

S5 Fig. Comparison of coexpression networks on subsampled RNA-seq data.

Taking the full coexpression network calculated from 24 RNA-seq samples as reference, we evaluated the classification of 10,000 randomly chosen gene pairs using subsampled RNA-seq data. For the full and the subsampled data set, a gene pair was either classified to be part of the same module or not. This allowed us to calculate the positive predictive value (PPV, panel A) and negative predictive value (NPV, panel B) for 10 randomly subsampled data sets of a fixed size. While the NPV is always close to 1, the PPV shows drastic differences between the full and subsampled data. This suggests that with fewer RNA-seq samples, additional gene pairs are assigned to the same coexpression module. However, with additional samples, such modules may be split into smaller components. Consistently, the number of genes in large modules (N>50) is much higher for smaller sample sizes (panel C). Panel D shows the number of diplogastrid-specific orphan genes in large modules.

(PDF)

S6 Fig. Regulatory architecture of coexpression modules.

Significantly overrepresented motifs were identified for each module by the HOMER software. We arbitrarily selected motifs that occurred in at least 20% of promoters of a given module and visualized their distribution across all modules. Note that the complementary regulation by less frequent motifs and other regulatory mechanisms such as microRNAs are not considered here. Sequence logos for each motif are shown at the right and the labels to the left indicate the best motif match among known motifs.

(PDF)

S7 Fig. Enrichment analysis for simulated network evolution.

We simulated the integration of novel genes into existing networks by estimating ancestral module sizes based on the number of ancient genes (non-orphan genes) and then assigning an equivalent number of orphan genes to existing modules with equal probabilities (panel A and C) and with probabilities that were proportional to the module size (panel B and D). The scatterplots show the observed and simulated number of orphan genes per module. The barplots show the median enrichment of the observed relative to the simulated number of orphan genes (error bars indicate the minimal and maximal values from 100 simulations).

(PDF)

S8 Fig. Overrepresented protein domains in coexpression modules.

Specific protein domains are strongly overrepresented in twelve coexpression modules. The left barplot shows the number of genes with a given protein domain for each module and the right barplot shows the negative logarithm of the FDR-corrected P-value (Fisher’s exact test). The most significant association is between Motile sperm proteins (PF00635) and coexpression module 2.

(PDF)

S9 Fig. Distribution of GPCRs, NHRs, and neuropeptides across the coexpression modules.

(PDF)

S10 Fig. Expression of selected module 24 genes.

The heatmap shows the expression of genes that are shared between module 24 and the target genes of mouth-form regulators and additional module 24 genes with 1–1 orthologs in C. elegans. The C. elegans ortholog of PPA25527 (D1044.3) is reported to be expressed in the gland cell and reporter lines of multiple candidate genes show expression in the gland of P. pacificus (Sieriebriennikov et al. 2020) [70].

(PDF)

S1 Table. Coexpression modules.

(XLSX)

S2 Table. Phylostrata.

(XLSX)

S3 Table. Protein domain information.

(XLSX)

S4 Table. Pristionchus pacificus KEGG annotation.

(XLSX)

S5 Table. Tissue enrichment analysis (TEA).

(XLSX)

Acknowledgments

We would like to thank all members of the Sommer lab for helpful discussions.

Data Availability

Raw reads were deposited at the European Nucleotide Archive under the study accession PRJEB60166.

Funding Statement

This work was funded by the Max Planck Society. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Tautz D, Domazet-Lošo T. The evolutionary origin of orphan genes. Nat Rev Genet. 2011;12: 692–702. doi: 10.1038/nrg3053
  • 2. Arendsee ZW, Li L, Wurtele ES. Coming of age: orphan genes in plants. Trends Plant Sci. 2014;19: 698–708. doi: 10.1016/j.tplants.2014.07.003
  • 3. Weisman CM, Murray AW, Eddy SR. Many but not all lineage-specific genes can be explained by homology detection failure. PLoS Biol. 2020;18: e3000862. doi: 10.1371/journal.pbio.3000862
  • 4. Schlötterer C. Genes from scratch—the evolutionary fate of de novo genes. Trends Genet. 2015;31: 215–219. doi: 10.1016/j.tig.2015.02.007
  • 5. Van Oss SB, Carvunis A-R. De novo gene birth. PLoS Genet. 2019;15: e1008160. doi: 10.1371/journal.pgen.1008160
  • 6. Knowles DG, McLysaght A. Recent de novo origin of human protein-coding genes. Genome Res. 2009;19: 1752–1759. doi: 10.1101/gr.095026.109
  • 7. Ruiz-Orera J, Hernandez-Rodriguez J, Chiva C, Sabidó E, Kondova I, Bontrop R, et al. Origins of De Novo Genes in Human and Chimpanzee. PLoS Genet. 2015;11: e1005721. doi: 10.1371/journal.pgen.1005721
  • 8. Zhao L, Saelao P, Jones CD, Begun DJ. Origin and spread of de novo genes in Drosophila melanogaster populations. Science. 2014;343: 769–772. doi: 10.1126/science.1248286
  • 9. Gubala AM, Schmitz JF, Kearns MJ, Vinh TT, Bornberg-Bauer E, Wolfner MF, et al. The Goddard and Saturn Genes Are Essential for Drosophila Male Fertility and May Have Arisen De Novo. Mol Biol Evol. 2017;34: 1066–1082. doi: 10.1093/molbev/msx057
  • 10. Zhang W, Gao Y, Long M, Shen B. Origination and evolution of orphan genes and de novo genes in the genome of Caenorhabditis elegans. Sci China Life Sci. 2019;62: 579–593. doi: 10.1007/s11427-019-9482-0
  • 11. Prabh N, Rödelsperger C. De Novo, divergence, and Mixed Origin Contribute to the Emergence of Orphan Genes in Pristionchus Nematodes. G3. 2019;9: 2277–2286. doi: 10.1534/g3.119.400326
  • 12. Carvunis A-R, Rolland T, Wapinski I, Calderwood MA, Yildirim MA, Simonis N, et al. Proto-genes and de novo gene birth. Nature. 2012;487: 370–374. doi: 10.1038/nature11184
  • 13. Vakirlis N, Hebert AS, Opulente DA, Achaz G, Hittinger CT, Fischer G, et al. A Molecular Portrait of De Novo Genes in Yeasts. Mol Biol Evol. 2018;35: 631–645. doi: 10.1093/molbev/msx315
  • 14. Zhang L, Ren Y, Yang T, Li G, Chen J, Gschwend AR, et al. Rapid evolution of protein diversity by de novo origination in Oryza. Nature Ecology & Evolution. 2019;3: 679–690.
  • 15. Yates TB, Feng K, Zhang J, Singan V, Jawdy SS, Ranjan P, et al. The Ancient Salicoid Genome Duplication Event: A Platform for Reconstruction of De Novo Gene Evolution in Populus trichocarpa. Genome Biology and Evolution. 2021;13: evab198.
  • 16. Wissler L, Gadau J, Simola DF, Helmkampf M, Bornberg-Bauer E. Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol Evol. 2013;5: 439–455. doi: 10.1093/gbe/evt009
  • 17. Athanasouli M, Rödelsperger C. Analysis of repeat elements in the Pristionchus pacificus genome reveals an ancient invasion by horizontally transferred transposons. BMC Genomics. 2022;23: 523. doi: 10.1186/s12864-022-08731-1
  • 18. Rödelsperger C, Prabh N, Sommer RJ. New Gene Origin and Deep Taxon Phylogenomics: Opportunities and Challenges. Trends Genet. 2019;35: 914–922. doi: 10.1016/j.tig.2019.08.007
  • 19. Vakirlis N, Acar O, Hsu B, Castilho Coelho N, Van Oss SB, Wacholder A, et al. De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences. Nat Commun. 2020;11: 781. doi: 10.1038/s41467-020-14500-z
  • 20. Chen S, Zhang YE, Long M. New Genes in Drosophila Quickly Become Essential. Science. 2010;330: 1682–1685.
  • 21. Khalturin K, Hemmrich G, Fraune S, Augustin R, Bosch TCG. More than just orphans: are taxonomically-restricted genes important in evolution? Trends Genet. 2009;25: 404–413. doi: 10.1016/j.tig.2009.07.006
  • 22. Cardoso-Silva CB, Aono AH, Mancini MC, Sforça DA, da Silva CC, Pinto LR, et al. Taxonomically Restricted Genes Are Associated With Responses to Biotic and Abiotic Stresses in Sugarcane (Saccharum spp.). Front Plant Sci. 2022;13: 923069.
  • 23. Cheng CH. Evolution of the diverse antifreeze proteins. Curr Opin Genet Dev. 1998;8: 715–20. doi: 10.1016/s0959-437x(98)80042-7
  • 24. Sommer RJ. Pristionchus pacificus. WormBook. 2006; 1–8.
  • 25. Howard RS, Giacomelli M, Lozano-Fernandez J, Edgecombe GD, Fleming JF, Kristensen RM, et al. The Ediacaran origin of Ecdysozoa: integrating fossil and phylogenomic data. J Geol Soc London. 2022;179: gs2021–107.
  • 26. Wang X, Sommer RJ. Antagonism of LIN-17/Frizzled and LIN-18/Ryk in nematode vulva induction reveals evolutionary alterations in core developmental pathways. PLoS Biol. 2011;9: e1001110. doi: 10.1371/journal.pbio.1001110
  • 27. Ishita Y, Chihara T, Okumura M. Serotonergic modulation of feeding behavior in Caenorhabditis elegans and other related nematodes. Neurosci Res. 2020;154: 9–19. doi: 10.1016/j.neures.2019.04.006
  • 28. Lo W-S, Roca M, Dardiry M, Mackie M, Eberhardt G, Witte H, et al. Evolution and Diversity of TGF-β Pathways are Linked with Novel Developmental and Behavioral Traits. Molecular Biology and Evolution. 2022;39: msac252.
  • 29. Borchert N, Dieterich C, Krug K, Schütz W, Jung S, Nordheim A, et al. Proteogenomics of Pristionchus pacificus reveals distinct proteome structure of nematode models. Genome Research. 2010;20: 837–846.
  • 30. Prabh N, Roeseler W, Witte H, Eberhardt G, Sommer RJ, Rödelsperger C. Deep taxon sampling reveals the evolutionary dynamics of novel gene families in Pristionchus nematodes. Genome Research. 2018;28: 1664–1674.
  • 31. Prabh N, Rödelsperger C. Multiple Pristionchus pacificus genomes reveal distinct evolutionary dynamics between de novo candidates and duplicated genes. Genome Res. 2022;32: 1315–1327. doi: 10.1101/gr.276431.121
  • 32. Stein JC, Yu Y, Copetti D, Zwickl DJ, Zhang L, Zhang C, et al. Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza. Nat Genet. 2018;50: 285–296. doi: 10.1038/s41588-018-0040-0
  • 33. Dowling D, Schmitz JF, Bornberg-Bauer E. Stochastic Gain and Loss of Novel Transcribed Open Reading Frames in the Human Lineage. Genome Biol Evol. 2020;12: 2183–2195. doi: 10.1093/gbe/evaa194
  • 34. Mayer MG, Rödelsperger C, Witte H, Riebesell M, Sommer RJ. The Orphan Gene dauerless Regulates Dauer Development and Intraspecific Competition in Nematodes by Copy Number Variation. PLoS Genet. 2015;11: e1005146. doi: 10.1371/journal.pgen.1005146
  • 35. Li J, Singh U, Arendsee Z, Wurtele ES. Landscape of the Dark Transcriptome Revealed Through Re-mining Massive RNA-Seq Data. Front Genet. 2021;12: 722981. doi: 10.3389/fgene.2021.722981
  • 36. Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, et al. The STRING database in 2017: quality-controlled protein–protein association networks, made broadly accessible. Nucleic Acids Research. 2017;45: D362–D368. doi: 10.1093/nar/gkw937
  • 37. Kim SK, Lund J, Kiraly M, Duke K, Jiang M, Stuart JM, et al. A gene expression map for Caenorhabditis elegans. Science. 2001;293: 2087–2092. doi: 10.1126/science.1061603
  • 38. Marchesi JR, Ravel J. The vocabulary of microbiome research: a proposal. Microbiome. 2015;3: 31. doi: 10.1186/s40168-015-0094-5
  • 39. Watson E, MacNeil LT, Ritter AD, Safak Yilmaz L, Rosebrock AP, Caudy AA, et al. Interspecies Systems Biology Uncovers Metabolites Affecting C. elegans Gene Expression and Life History Traits. Cell. 2014;156: 759–770. doi: 10.1016/j.cell.2014.01.047
  • 40. Zhang F, Berg M, Dierking K, Félix M-A, Shapira M, Samuel BS, et al. Caenorhabditis elegans as a Model for Microbiome Research. Frontiers in Microbiology. 2017;8: 485. doi: 10.3389/fmicb.2017.00485
  • 41. Sinha A, Rae R, Iatsenko I, Sommer RJ. System wide analysis of the evolution of innate immunity in the nematode model species Caenorhabditis elegans and Pristionchus pacificus. PLoS One. 2012;7: e44255. doi: 10.1371/journal.pone.0044255
  • 42. Akduman N, Rödelsperger C, Sommer RJ. Culture-based analysis of Pristionchus-associated microbiota from beetles and figs for studying nematode-bacterial interactions. PLoS One. 2018;13: e0198018. doi: 10.1371/journal.pone.0198018
  • 43. Sun S, Rödelsperger C, Sommer RJ. Single worm transcriptomics identifies a developmental core network of oscillating genes with deep conservation across nematodes. Genome Res. 2021;31: 1590–1601. doi: 10.1101/gr.275303.121
  • 44. Baskaran P, Rödelsperger C, Prabh N, Serobyan V, Markov GV, Hirsekorn A, et al. Ancient gene duplications have shaped developmental stage-specific expression in Pristionchus pacificus. BMC Evol Biol. 2015;15: 185. doi: 10.1186/s12862-015-0466-2
  • 45. O’Donnell MP, Fox BW, Chao P-H, Schroeder FC, Sengupta P. A neurotransmitter produced by gut bacteria modulates host sensory behaviour. Nature. 2020;583: 415–420. doi: 10.1038/s41586-020-2395-5
  • 46. Lansdon P, Carlson M, Ackley BD. Wild-type Caenorhabditis elegans isolates exhibit distinct gene expression profiles in response to microbial infection. BMC Genomics. 2022;23: 229. doi: 10.1186/s12864-022-08455-2
  • 47. Akduman N, Lightfoot JW, Röseler W, Witte H, Lo W-S, et al. Bacterial vitamin B12 production enhances predatory behaviors in nematodes. ISME J. 2020;14: 1494–1507.
  • 48. Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30: 1575–1584. doi: 10.1093/nar/30.7.1575
  • 49. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38: 576–589. doi: 10.1016/j.molcel.2010.05.004
  • 50. Narasimhan K, Lambert SA, Yang AWH, Riddell J, Mnaimneh S, Zheng H, et al. Mapping and analysis of Caenorhabditis elegans transcription factor sequence specificities. Elife. 2015;4: e06967. doi: 10.7554/eLife.06967
  • 51. Burglin TR, Ruvkun G. Regulation of ectodermal and excretory function by the C. elegans POU homeobox gene ceh-6. Development. 2001;128: 779–790. doi: 10.1242/dev.128.5.779
  • 52. Domazet-Lošo T, Brajković J, Tautz D. A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages. Trends in Genetics. 2007;23: 533–539. doi: 10.1016/j.tig.2007.08.014
  • 53. Kikuchi T, Cotton JA, Dalzell JJ, Hasegawa K, Kanzaki N, McVeigh P, et al. Genomic insights into the origin of parasitism in the emerging plant pathogen Bursaphelenchus xylophilus. PLoS Pathog. 2011;7: e1002219. doi: 10.1371/journal.ppat.1002219
  • 54. Foster JM, Grote A, Mattick J, Tracey A, Tsai Y-C, Chung M, et al. Sex chromosome evolution in parasitic nematodes of humans. Nat Commun. 2020;11: 1964. doi: 10.1038/s41467-020-15654-6
  • 55. Mitreva M, Jasmer DP, Zarlenga DS, Wang Z, Abubucker S, Martin J, et al. The draft genome of the parasitic nematode Trichinella spiralis. Nat Genet. 2011;43: 228–235. doi: 10.1038/ng.769
  • 56.Lightfoot JW, Chauhan VM, Aylott JW, Rödelsperger C. Comparative transcriptomics of the nematode gut identifies global shifts in feeding mode and pathogen susceptibility. BMC Res Notes. 2016;9: 142. doi: 10.1186/s13104-016-1886-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Rödelsperger C, Röseler W, Prabh N, Yoshida K, Weiler C, Herrmann M, et al. Phylotranscriptomics of Pristionchus Nematodes Reveals Parallel Gene Loss in Six Hermaphroditic Lineages. Curr Biol. 2018;28: 3123–3127.e5. doi: 10.1016/j.cub.2018.07.041 [DOI] [PubMed] [Google Scholar]
  • 58.Rödelsperger C, Ebbing A, Sharma DR, Okumura M, Sommer RJ, Korswagen HC. Spatial Transcriptomics of Nematodes Identifies Sperm Cells as a Source of Genomic Novelty and Rapid Evolution. Mol Biol Evol. 2021;38: 229–243. doi: 10.1093/molbev/msaa207 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Angeles-Albores D N Lee RY, Chan J, Sternberg PW. Tissue enrichment analysis for C. elegans genomics. BMC Bioinformatics. 2016;17: 366. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Le TS, Nguyen THG, Ha BH, Huong BTM, Nguyen TTH, Vu KD, et al. Reproductive Span of is Extended by Sp. J Nematol. 2022;54: 20220010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Kissoyan KAB, Peters L, Giez C, Michels J, Pees B, Hamerich IK, et al. Exploring Effects of Protective Natural Microbiota on Host Physiology. Front Cell Infect Microbiol. 2022;12: 775728. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Cutter AD, Ward S. Sexual and Temporal Dynamics of Molecular Evolution in C. elegans Development. Molecular Biology and Evolution. 2005;22: 178–188. doi: 10.1093/molbev/msh267 [DOI] [PubMed] [Google Scholar]
  • 63.Levine MT, Jones CD, Kern AD, Lindfors HA, Begun DJ. Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression. Proc Natl Acad Sci U S A. 2006;103: 9935–9939. doi: 10.1073/pnas.0509809103 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Soumillon M, Necsulea A, Weier M, Brawand D, Zhang X, Gu H, et al. Cellular source and mechanisms of high transcriptome complexity in the mammalian testis. Cell Rep. 2013;3: 2179–2190. doi: 10.1016/j.celrep.2013.05.031 [DOI] [PubMed] [Google Scholar]
  • 65.Han Z, Sieriebriennikov B, Susoy V, Lo W-S, Igreja C, Dong C, et al. Horizontally Acquired Cellulases Assist the Expansion of Dietary Range in Pristionchus Nematodes. Mol Biol Evol. 2022;39: msab370. doi: 10.1093/molbev/msab370 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Lo W-S, Han Z, Witte H, Röseler W, Sommer RJ. Synergistic interaction of gut microbiota enhances the growth of nematode through neuroendocrine signaling. Curr Biol. 2022;32: 2037–2050.e4. doi: 10.1016/j.cub.2022.03.056 [DOI] [PubMed] [Google Scholar]
  • 67.Sanghvi GV, Baskaran P, Röseler W, Sieriebriennikov B, Rödelsperger C, Sommer RJ. Life History Responses and Gene Expression Profiles of the Nematode Pristionchus pacificus Cultured on Cryptococcus Yeasts. PLoS One. 2016;11: e0164881. doi: 10.1371/journal.pone.0164881 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Sural S, Hobert O. Nematode nuclear receptors as integrators of sensory information. Curr Biol. 2021;31: 4361–4366.e2. doi: 10.1016/j.cub.2021.07.019 [DOI] [PubMed] [Google Scholar]
  • 69.Hobert O. The neuronal genome of Caenorhabditis elegans. WormBook. 2013. 1–106. doi: 10.1895/wormbook.1.161.1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Sieriebriennikov B, Sun S, Lightfoot JW, Witte H, Moreno E, et al. Conserved nuclear hormone receptors controlling a novel plastic trait target fast-evolving genes expressed in a single cell. PLoS Genet. 2020;16: e1008687. doi: 10.1371/journal.pgen.1008687 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Verster AJ, Styles EB, Mateo A, Derry WB, Andrews BJ, Fraser AG. Taxonomically Restricted Genes with Essential Functions Frequently Play Roles in Chromosome Segregation in and. G3. 2017;7: 3337–3347. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Dirksen P, Marsh SA, Braker I, Heitland N, Wagner S, Nakad R, et al. The native microbiome of the nematode Caenorhabditis elegans: gateway to a new host-microbiome model. BMC Biol. 2016;14: 38. doi: 10.1186/s12915-016-0258-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Lightfoot JW, Wilecki M, Rödelsperger C, Moreno E, Susoy V, Witte H, et al. Small peptide-mediated self-recognition prevents cannibalism in predatory nematodes. Science. 2019;364: 86–89. doi: 10.1126/science.aav9856 [DOI] [PubMed] [Google Scholar]
  • 74.Sommer RJ, Dardiry M, Lenuzzi M, Namdeo S, Renahan T, Sieriebriennikov B, et al. The genetics of phenotypic plasticity in nematode feeding structures. Open Biol. 2017;7. doi: 10.1098/rsob.160332 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Namdeo S, Moreno E, Rödelsperger C, Baskaran P, Witte H, Sommer RJ. Two independent sulfation processes regulate mouth-form plasticity in the nematode. Development. 2018;145: dev166272. [DOI] [PubMed] [Google Scholar]
  • 76.Parker GA. Conceptual developments in sperm competition: a very brief synopsis. Philosophical Transactions of the Royal Society B: Biological Sciences. 2020; 375:20200061. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Kurhanewicz NA, Dinwiddie D, Bush ZD, Libuda DE. Elevated Temperatures Cause Transposon-Associated DNA Damage in C. elegans Spermatocytes. Curr Biol. 2020;30: 5007–5017.e4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Bhalla N. Meiosis: Is Spermatogenesis Stress an Opportunity for Evolutionary Innovation? Current biology: CB. 2020. pp. R1471–R1473. doi: 10.1016/j.cub.2020.10.042 [DOI] [PubMed] [Google Scholar]
  • 79.Casasa S, Biddle JF, Koutsovoulos GD, Ragsdale EJ. Polyphenism of a Novel Trait Integrated Rapidly Evolving Genes into Ancestrally Plastic Networks. Mol Biol Evol. 2021;38: 331–343. doi: 10.1093/molbev/msaa235 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Pires-daSilva A. Pristionchus pacificus protocols. WormBook. 2013; 1–20. doi: 10.1895/wormbook.1.114.2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29: 15–21. doi: 10.1093/bioinformatics/bts635 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30: 923–30. doi: 10.1093/bioinformatics/btt656 [DOI] [PubMed] [Google Scholar]
  • 83.Athanasouli M, Witte H, Weiler C, Loschko T, Eberhardt G, Sommer RJ, et al. Comparative genomics and community curation further improve gene annotations in the nematode Pristionchus pacificus. BMC Genomics. 2020;21: 708. doi: 10.1186/s12864-020-07100-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15: 550. doi: 10.1186/s13059-014-0550-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Bolt BJ, Rodgers FH, Shafie M, Kersey PJ, Berriman M, Howe KL. Using WormBase ParaSite: An Integrated Platform for Exploring Helminth Genomic Data. Methods Mol Biol. 2018;1757: 471–491. [DOI] [PubMed] [Google Scholar]
  • 86.Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7: 562–578. doi: 10.1038/nprot.2012.016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Kanehisa M, Sato Y, Morishima K. BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. J Mol Biol. 2016;428: 726–731. doi: 10.1016/j.jmb.2015.11.006 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Gregory S Barsh, Kaveh Ashrafi

14 Apr 2023

Dear Dr Rödelsperger,

Thank you very much for submitting your Research Article entitled 'Thousands of Pristionchus pacificus orphan genes were integrated into developmental networks that respond to diverse environmental microbiota' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool.  PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Kaveh Ashrafi

Academic Editor

PLOS Genetics

Gregory Barsh

Editor-in-Chief

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This article may represent a significant step toward understanding orphan gene function in P. pacificus and possibly beyond (including the nematode model C. elegans), which should accelerate their experimental study and elucidation. The authors provide a creative roadmap for exploring orphan gene function in other organisms, by exploiting now widely available transcriptomics technologies and datasets. They also provide a unique open resource that can be mined by other groups for their own investigations. Lastly, the authors provide essential insight into the sexually dimorphic evolution of gene regulatory networks, which resonates well beyond nematode phyla. The article is very well written and well-presented, making it an easy and interesting read.

The broader idea behind the authors' approach is reminiscent of the C. elegans gene expression mountain database that was used successfully in earlier years to narrow down investigation of gene functions and genetic interactions by C. elegans biologists. Thanks to the authors, other researchers will be able to focus the experimental investigation of diplogastrid orphan genes around specified predicted functions. This may alleviate a key bottleneck in attributing function to orphan genes, and thus improve the depth of annotation for less conserved genetic sequences, especially in newly sequenced divergent genomes.

Major limitation.

While the authors' analysis is innovative and convincing, the dataset itself is limiting, as the authors do not appear to have performed biological replicates for the conditions studied, instead relying on sequencing 24 transcriptomes of young P. pacificus adults grown for two generations on 24 different monoxenic bacterial cultures. While the methodologies indicate that 50 adults per sample were individually picked, checking that they were visually (or morphologically, as stated by the authors) at the same developmental stage (holding a single egg), the apparent absence of biological replicates in this analysis (by contrast to collecting triplicates or tetraplicates in other studies) undermines the strength of the study’s conclusions. Given the observed variability in both worm and bacterial growth across biological replicates in other studies and their possible impact on transcriptome expression, the lack of biological replicates means that we could be looking here at effects that are specific to this dataset and cannot be easily reproduced.

While this undermines the usefulness of this dataset in understanding how given bacterial diets may specifically influence P. pacificus gene expression, it would not majorly affect the prediction of biological functions associated with orphan gene clusters, nor would it diminish the interest of the analytical approach chosen.

Although the authors initially (in their results) propose interpretations that, one may argue, go beyond what can be reasonably concluded from their results, the discussion is very sensible and highlights key limitations of the study, thus providing a fair and insightful account of their research.

They further highlight the fact that the main value of the study lies in its use as a resource and a predictive tool to guide future investigations of orphan genes, which I agree with. Beyond that, the various ways they analyzed their data and compared them to other published datasets have clear merit and will be a source of inspiration for others.

Specific issues

1) Some methods are minimal, particularly for sample preparation, and would not allow for replication of the findings. Additional references to the standard protocols used, and details about the sequencing facility used (one cannot assume that an Illumina platform is standard equipment, yet) are necessary.

2) The authors should also be careful in their use of the word nematode (generalization) in lieu of P. pacificus. This is particularly problematic when looking at interactions with bacteria, as adult P. pacificus are predators rather than strict bacterivores, by contrast with C. elegans for instance. C. elegans adult responses to bacteria likely reflect influences from both a dietary and a gut microbiota component. Generalizing statements to “nematodes” that include species with distinct ecologies is inaccurate.

3) Lines 298-299: « suggesting that the transcriptomic response of nematodes to various environmental microbiota do not strictly reflect phylogeny ». This is seemingly true for P. pacificus but perhaps misleading as it could be that strain-specific changes are obscuring phylum specific changes (a greater coverage of bacterial diversity might reveal that). This could also be due to a methodological limitation: the limited ability to detect changes led to overlooking phylum-specific changes.

Most of the variation seems to come from a few isolates, LRB104, 80 and 17, which may dwarf other changes and lower the discovery rate of other interesting effects. Would the authors consider rerunning the analysis excluding at least LRB104?

Fig 2: could the authors also provide a ranking of the bacterial list reordered by phylogeny, to help visualize whether one or more modules change expression based on bacterial phylogeny? Not all transcriptional changes may be equal in terms of biological relevance, and key modules that could be related to known immune or metabolic pathways might actually map to phylogeny when others don’t. The whole-transcriptome picture may be masking that.

4) Could the authors specify in the fig 3 legend and/or the text what control condition they used to monitor gene expression in the 28 modules over time? I assume it is E. coli OP50 but it needs to be made clear there for non-nematologists. The authors should also specify the nematode species in the legend. It would also be very interesting to identify the timing of larval stages either on the left or the right of the Fig3 plot.

5) Fig 4: how about reordering modules based on similarity in regulatory architecture? Modules 1, 5, 7 and 23, for instance, will clearly appear correlated. While I appreciate the unbiased approach, have the authors considered looking for known and functionally characterized regulatory motifs associated with major transcription factors to see how they cluster across their modules? Note that some motifs might be engaged successively within a module because the module includes transcription factors and microRNAs acting within the same pathway, and these may be missed with the 20% threshold applied. Perhaps a comment on that could be helpful.

6) Fig 5. Interesting representation. About the expression “gene age class”, pardon my candor here but is it so commonly used that it cannot be said differently? My issue with the phrase is that it resembles “age genes”, which characterizes genes modulating aging. Could it be referred to as “gene ancestry class” instead?

7) The authors state on line 349 that « the integration of new genes into regulatory network can happen very rapidly ». First, this would not be particularly surprising; second, the evolutionary distance considered here is pretty long (likely representing over 200 million years), as nematodes are an ancient and extensive phylum. If this were to be considered from a mammalian evolution perspective, it would be a slow process. Rather than stating “very rapidly”, “relatively rapidly” may be preferred.

8) Figure 7C: “maile-biased” should read “male-biased”.

9) Sperm protein encoding genes tend to regularly pop up as very significantly modulated in C. elegans transcriptomic studies. The significance of these changes is mostly understudied, and apart from the authors’ previous work, has also been attributed to differences in biological age between the conditions tested that careful sampling did not manage to avoid. This is particularly believable when comparing worms grown on distinct diets and/or bacteria as these often affect reproductive timing more than other physiological functions, leading to a different developmental timing for reproductive organs vs other major organs. The fact that these genes appear as one of the major orphan gene clusters revealed in this study, is thus unsurprising. The authors do acknowledge this in conclusion but perhaps it could also come across in their result section.
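
Point 3 above asks whether a few isolates dominate the variation in the dataset. One minimal way to probe that kind of concern is a leave-one-sample-out comparison of total expression variance, sketched below. The expression values are invented toy numbers (not data from the study), the helper names are hypothetical, and this sketch is not the procedure used by the authors.

```python
from statistics import pvariance

# Hypothetical toy expression values (log scale) for three genes across four
# bacterial conditions. The labels mimic isolate names; the numbers are invented.
samples = ["OP50", "LRB17", "LRB80", "LRB104"]
expression = {
    "geneA": [5.0, 5.2, 5.1, 9.0],
    "geneB": [3.0, 3.1, 2.9, 3.0],
    "geneC": [7.0, 6.0, 7.5, 1.0],
}

def total_variance(expr, keep):
    """Sum of per-gene population variances over the kept sample indices."""
    return sum(pvariance([vals[i] for i in keep]) for vals in expr.values())

def variance_drop_by_sample(expr, names):
    """Relative drop in total variance when each sample is left out.

    Values can be negative when removing a sample increases the variance.
    """
    all_idx = list(range(len(names)))
    full = total_variance(expr, all_idx)
    drops = {}
    for i, name in enumerate(names):
        keep = [j for j in all_idx if j != i]
        drops[name] = 1.0 - total_variance(expr, keep) / full
    return drops

shares = variance_drop_by_sample(expression, samples)
dominant = max(shares, key=shares.get)
print(dominant, round(shares[dominant], 3))
```

In this toy example a single condition accounts for most of the variance, which is the situation the reviewer suggests checking by rerunning the analysis without that sample.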

Reviewer #2: The study described in the manuscript takes an interesting approach to assigning functional annotations, even if putative, to orphan genes unique to the predatory model nematode Pristionchus pacificus. The premise of using expression under exposure to different bacteria as a proxy for function makes sense. The described analysis enables exploration of ideas about the creation of new genes and function and their incorporation into regulatory networks and offers a resource that can be utilized as a starting point for studying the function of otherwise uncharacterized genes. With that said, some data is over-interpreted, assigning significance (without statistical support) to patterns that might arise randomly, and, in some cases, there is too much information, including figures that are more suitable for supplementary information (i.e. Figs. 6 and 4). Detailed comments are described below:

1) It’s not clear to me why the authors would start with a paragraph that global transcription profiles of P. pacificus worms raised on different bacteria are essentially similar. Considering that subsequent analyses focus on differences between gene responses to different bacteria, this runs the risk of confusing the reader. Differences of interest, which are described later on, are not expected to be discerned at the global scale. In fact, in the second section the authors write “Thus, almost half of the 28,896 annotated genes in P. pacificus respond to environmental microbiota.” If so, how come the global correlations were so high (0.9)?

2) The identification of modules of co-expressed genes is a central element in this study. The authors demonstrate that the distinction between these modules is robust so that removal of the data for the most strongly affecting bacteria LRB104 does not change module membership.

Repeating this with other expression profiles – removing data for different strains and recomputing the graph - could lend further confidence in the described model.

3) In fig. 3 the authors suggest that developmental signatures in co-expression modules on different bacteria may represent effects of different bacteria on developmental timing, but the authors indicated earlier that all worms harvested were adults of the same, relatively precise stage. If so, developmental delays are not an option here. This runs the risk of confusing the reader.

4) Fig. 5 is particularly important for the analysis, as it describes the relationships between orphan genes and co-expression modules. In attempting to establish a relationship I see no statistical evaluation. The authors report that 3727 orphan genes are included among the co-expressed genes but considering that a third of P. pacificus genes are considered orphans, and that co-expressed modules included 14,275 genes, 3727 seems like an under-representation of orphan genes among the co-expressed genes, so there is nothing particularly special about their inclusion in the modules. This could happen by chance. So, it’s not clear what point the authors are trying to make here. Is the point that orphan genes are part of regulatory networks in the first place? If so, isn’t that the expectation for de novo created genes, which are thought to arise (at least in some cases) through the evolution of regulatory sequences (Van Oss 2019)? This needs to be clarified.

5) “The distribution of age classes across coexpression modules suggests that new genes are not attached randomly into regulatory networks, i.e. not every module is equally likely to acquire a new gene.” – The authors should provide statistical support for this claim of non-random distribution, or avoid it. The data does not rule out that new genes are in fact attached randomly into regulatory networks.

6) Why is module 1 defined as “highly constrained”? What does that mean? And what is the “fast evolving module”? No number is mentioned.

7) Figures 4 and 6 don’t seem to add useful information, or such that is necessary to better understand the data. They are better off moved to supplementary material.
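
The arithmetic behind point 4 above can be made explicit with a short sketch. It uses only the figures quoted in the comment (3727 orphan genes among 14,275 coexpressed genes) and takes "a third" literally as the genome-wide orphan fraction, which is an assumption. The normal approximation to the binomial treats genes as independent draws, a simplification that real coexpression data will not satisfy, so the result is illustrative rather than a substitute for the statistical evaluation the reviewer requests.

```python
import math

# Figures quoted in the reviewer's comment.
orphans_in_modules = 3727
genes_in_modules = 14275
# Assumption: "a third" taken literally as the genome-wide orphan fraction.
expected_fraction = 1 / 3

observed_fraction = orphans_in_modules / genes_in_modules
expected_count = genes_in_modules * expected_fraction

# Normal approximation to the binomial; assumes genes are independent draws.
sd = math.sqrt(genes_in_modules * expected_fraction * (1 - expected_fraction))
z_score = (orphans_in_modules - expected_count) / sd

print(f"observed fraction: {observed_fraction:.3f}")
print(f"expected count:    {expected_count:.0f}")
print(f"z-score:           {z_score:.1f}")
```

Under these assumptions the observed fraction is about 26%, below one third, which is consistent with the reviewer's reading that orphan genes are under-represented among coexpressed genes.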

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Decision Letter 1

Gregory S Barsh, Kaveh Ashrafi

7 Jun 2023

Dear Dr Rödelsperger,

Thank you very much for submitting your Research Article entitled 'Thousands of Pristionchus pacificus orphan genes were integrated into developmental networks that respond to diverse environmental microbiota' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some minor concerns that we ask you address in a revised manuscript.

We therefore ask you to modify the manuscript according to the review recommendations. Your revisions should address the specific points made by each reviewer.

In addition we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

Please let us know if you have any questions while making these revisions.

Yours sincerely,

Kaveh Ashrafi

Academic Editor

PLOS Genetics

Gregory Barsh

Editor-in-Chief

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I appreciate the extra explanations and supporting information provided, which I believe has clarified key points and improved the manuscript. The revised version has now addressed all comments raised, although the limitation of dealing with single biological replicates remains.

While the authors have reported and discussed this limitation, it may be worth reiterating it in the abstract to not mislead (careless) readers:

"Specifically, we performed RNA-seq experiments of P. pacificus worms grown on monoxenic cultures of 24 different bacteria" could be rephrased as follows: "Specifically, we analysed 24 RNA-seq samples from adult P. pacificus worms raised on 24 different monoxenic bacterial cultures".

I also have a couple of additional points that I would like to be addressed before publication:

It is customary to provide a general method section that describes the way worms and bacteria are routinely maintained.

There are several instances of abbreviations used in Tables and Figures that are not spelled out in the legends. While some specialists may know exactly what they mean, the broader readership of PLOS Genetics would not.

Finally, there are issues with the way the pdf file comes out, so it would be worth ascertaining that the resolution of the figures is as intended before publishing it.

I trust that all transcriptomics datasets (raw reads) will be made available for others to study.

Reviewer #2: The revisions made in the new version address my earlier concerns. Two small comments:

1. It would be less confusing in Fig. 3 to distinguish the colors for modules from those for developmental stages.

2. In the abstract, line 50: “higher” than what? Could the author mean just “high”? or, “higher than expected by chance”?

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Decision Letter 2

Gregory S Barsh, Kaveh Ashrafi

15 Jun 2023

Dear Dr Rödelsperger,

We are pleased to inform you that your manuscript entitled "Thousands of Pristionchus pacificus orphan genes were integrated into developmental networks that respond to diverse environmental microbiota" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Kaveh Ashrafi

Academic Editor

PLOS Genetics

Gregory Barsh

Editor-in-Chief

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-23-00215R2

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

Gregory S Barsh, Kaveh Ashrafi

26 Jun 2023

PGENETICS-D-23-00215R2

Thousands of Pristionchus pacificus orphan genes were integrated into developmental networks that respond to diverse environmental microbiota

Dear Dr Rödelsperger,

We are pleased to inform you that your manuscript entitled "Thousands of Pristionchus pacificus orphan genes were integrated into developmental networks that respond to diverse environmental microbiota" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Analysis of the most similar and dissimilar RNA-seq data sets.

    The scatter plots show the normalized expression (TPM) for different pairs of samples. As lowly expressed genes tend to be more variable, we visualized the number of genes with at least two-fold expression difference across multiple mean expression levels.

    (PDF)

    S2 Fig. Parameter combinations for MCL clustering.

    We evaluated the total number of modules and singleton modules as a function of the inflation factor (I) and correlation coefficient (r). High r and I values generally increase the number of modules, whereas low r and I parameters generate fewer but larger modules. We decided to use r = 0.7 and I = 2 for the final analysis, because it gave a moderate number of modules with a relatively low number of singletons.

    (PDF)

    S3 Fig. Developmental timing on different bacteria.

    The distribution of developmental stages 56 h after J2 synchronization is shown for P. pacificus worms on E. coli OP50 and two Wautersiella bacteria (5 biological replicates). Nematodes grow slower on both Wautersiella strains. The significance level was computed by a χ² test (mean P-value from all pairwise comparisons).

    (PDF)

    S4 Fig. Comparison of MCL networks constructed with or without LRB104.

    The heatmap shows the fraction of genes from each module of the complete coexpression network (including LRB104) that fall within a given module of the coexpression network without LRB104. For most modules there is a 1–1 correspondence between the two networks, indicating that the network structure is robust with regard to the Wautersiella LRB104 data set.

    (PDF)

    S5 Fig. Comparison of coexpression networks on subsampled RNA-seq data.

    Taking the full coexpression network calculated from 24 RNA-seq samples as reference, we evaluated the classification of 10,000 randomly chosen gene pairs using subsampled RNA-seq data. For the full and the subsampled data set, a gene pair was either classified to be part of the same module or not. This allowed us to calculate the positive predictive value (PPV, panel A) and negative predictive value (NPV, panel B) for 10 randomly subsampled data sets of a fixed size. While the NPV is always close to 1, the PPV shows drastic differences between the full and subsampled data. This suggests that with fewer RNA-seq samples, additional gene pairs are assigned to the same coexpression module. However, with additional samples, such modules may be split into smaller components. Consistently, the number of genes in large modules (N>50) is much higher for smaller sample sizes (panel C). Panel D shows the number of diplogastrid-specific orphan genes in large modules.

    (PDF)
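The pair-based evaluation described for S5 Fig can be illustrated with a minimal sketch. It assumes that the full network serves as the truth and the subsampled network as the prediction, and that a pair counts as positive when both genes sit in the same module. The function names and the toy gene-to-module assignments are hypothetical and are not taken from the study.

```python
from itertools import combinations


def same_module(modules, gene_a, gene_b):
    """Return True if both genes are assigned to the same module.

    Genes missing from the mapping are treated as unassigned and never match.
    """
    mod_a, mod_b = modules.get(gene_a), modules.get(gene_b)
    return mod_a is not None and mod_a == mod_b


def pair_ppv_npv(reference, subsampled, pairs):
    """Compute PPV and NPV of same-module calls in `subsampled` against `reference`."""
    tp = fp = tn = fn = 0
    for gene_a, gene_b in pairs:
        truth = same_module(reference, gene_a, gene_b)
        call = same_module(subsampled, gene_a, gene_b)
        if call and truth:
            tp += 1
        elif call and not truth:
            fp += 1
        elif not call and not truth:
            tn += 1
        else:
            fn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical toy assignments (gene -> module id), for illustration only.
reference = {"g1": 1, "g2": 1, "g3": 2, "g4": 2}
subsampled = {"g1": 1, "g2": 1, "g3": 1, "g4": 2}
pairs = list(combinations(sorted(reference), 2))
ppv, npv = pair_ppv_npv(reference, subsampled, pairs)
```

In this toy case the subsampled network merges g3 into the first module, which adds false-positive pairs and lowers the PPV, the same direction of effect that the caption reports for smaller sample sizes.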

    S6 Fig. Regulatory architecture of coexpression modules.

    Significantly overrepresented motifs were identified for each module by the HOMER software. We arbitrarily selected motifs that occurred in at least 20% of promoters of a given module and visualized their distribution across all modules. Note that the complementary regulation by less frequent motifs and other regulatory mechanisms such as microRNAs are not considered here. Sequence logos for each motif are shown at the right and the labels to the left indicate the best motif match among known motifs.

    (PDF)

    S7 Fig. Enrichment analysis for simulated network evolution.

    We simulated the integration of novel genes into existing networks by estimating ancestral module sizes based on the number of ancient genes (non-orphan genes) and then assigning an equivalent number of orphan genes to existing modules with equal probabilities (panels A and C) and with probabilities that were proportional to the module size (panels B and D). The scatterplots show the observed and simulated number of orphan genes per module. The barplots show the median enrichment of the observed relative to the simulated number of orphan genes (error bars indicate the minimal and maximal values from 100 simulations).

    (PDF)
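The simulation outlined for S7 Fig could be sketched roughly as follows. This is an illustrative reading of the caption, not the authors' code: it assumes that the total number of observed orphan genes is redistributed across modules in each run, and that enrichment is the ratio of observed to simulated counts. All names and the toy counts are hypothetical.

```python
import random
import statistics


def simulate_orphan_assignment(ancient_counts, n_orphans, proportional, rng):
    """Assign n_orphans to modules, uniformly or weighted by ancestral module size."""
    modules = list(ancient_counts)
    weights = [ancient_counts[m] for m in modules] if proportional else None
    counts = {m: 0 for m in modules}
    for module in rng.choices(modules, weights=weights, k=n_orphans):
        counts[module] += 1
    return counts


def enrichment_summary(ancient_counts, observed_orphans, proportional,
                       n_sim=100, seed=0):
    """Return (median, min, max) of observed/simulated orphan counts per module."""
    rng = random.Random(seed)
    n_orphans = sum(observed_orphans.values())
    ratios = {m: [] for m in ancient_counts}
    for _ in range(n_sim):
        sim = simulate_orphan_assignment(ancient_counts, n_orphans, proportional, rng)
        for module, simulated in sim.items():
            if simulated > 0:  # skip runs where the ratio is undefined
                ratios[module].append(observed_orphans[module] / simulated)
    return {
        m: (statistics.median(r), min(r), max(r)) if r else None
        for m, r in ratios.items()
    }


# Hypothetical toy counts: non-orphan (ancient) and orphan genes per module.
ancient = {"m1": 300, "m2": 100, "m3": 50}
observed = {"m1": 40, "m2": 120, "m3": 20}
equal_model = enrichment_summary(ancient, observed, proportional=False)
size_model = enrichment_summary(ancient, observed, proportional=True)
```

Comparing `equal_model` with `size_model` shows how the choice of null model changes which modules appear enriched for orphan genes, which is why the caption reports both variants.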

    S8 Fig. Overrepresented protein domains in coexpression modules.

    Specific protein domains are strongly overrepresented in twelve coexpression modules. The left barplot shows the number of genes with a given protein domain for each module and the right barplot shows the negative logarithm of the FDR-corrected P-value (Fisher’s exact test). The most significant association is between Motile sperm proteins (PF00635) and coexpression module 2.

    (PDF)
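For readers unfamiliar with the test named in the S8 Fig caption, a minimal standard-library sketch of a domain overrepresentation test is given here. The caption does not state whether the test was one- or two-sided or which FDR procedure was applied, so the one-sided Fisher's exact test and the Benjamini-Hochberg correction used below are assumptions, as are the function names and the toy counts.

```python
from math import comb, log10


def fisher_overrepresentation(k, module_size, domain_total, genome_size):
    """One-sided Fisher's exact test P-value, P(X >= k).

    X is the number of domain-carrying genes among `module_size` genes drawn
    from a genome of `genome_size` genes, `domain_total` of which carry the domain.
    """
    upper = min(domain_total, module_size)
    tail = sum(
        comb(domain_total, i) * comb(genome_size - domain_total, module_size - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(genome_size, module_size)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value downwards
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical toy counts: (genes with domain in module, module size, genes with domain overall).
genome_size = 1000
tests = [(12, 50, 40), (3, 80, 40), (1, 30, 40)]
raw = [fisher_overrepresentation(k, n, total, genome_size) for k, n, total in tests]
fdr = benjamini_hochberg(raw)
neg_log_fdr = [-log10(p) for p in fdr]
```

The values in `neg_log_fdr` correspond to the kind of quantity plotted in the right barplot, with larger values indicating stronger overrepresentation.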

    S9 Fig. Distribution of GPCRs, NHRs, and neuropeptides across the coexpression modules.

    (PDF)

    S10 Fig. Expression of selected module 24 genes.

    The heatmap shows the expression of genes that are shared between module 24 and the target genes of mouth form regulators and additional module 24 genes with 1–1 orthologs in C. elegans. The C. elegans ortholog of PPA25527 (D1044.3) is reported to be expressed in the gland cell and reporter lines of multiple candidate genes show expression in the gland of P. pacificus (Sieriebriennikov et al. 2020) [70].

    (PDF)

    S1 Table. Coexpression modules.

    (XLSX)

    S2 Table. Phylostrata.

    (XLSX)

    S3 Table. Protein domain information.

    (XLSX)

    S4 Table. Pristionchus pacificus KEGG annotation.

    (XLSX)

    S5 Table. Tissue enrichment analysis (TEA).

    (XLSX)

    Attachment

    Submitted filename: nutrigenomics responses.pdf

    Attachment

    Submitted filename: Response environmentally responsive orphans-1.pdf

    Data Availability Statement

    Raw reads were deposited at the European Nucleotide Archive under the study accession PRJEB60166.


    Articles from PLOS Genetics are provided here courtesy of PLOS
