Abstract
Micro-eukaryotes are ubiquitous and play vital roles in diverse ecological systems, yet their diversity and functions remain poorly known, which may be due to the limitations of the conventional culture-based methods used in the past. Metagenomics and metatranscriptomics now make it possible to unravel the genomic, metabolic, and phylogenetic diversity of micro-eukaryotes inhabiting different ecosystems in a more comprehensive manner. In-depth study of the structural and functional characteristics of the micro-eukaryotic community residing in soil is crucial for a complete understanding of this major ecosystem. This review describes the methodologies employed under these approaches to study soil micro-eukaryotes, along with the available computational tools, pipelines, and databases and their use for the analysis of sequence data of micro-eukaryotic origin. The challenges and limitations of these approaches are also discussed in detail. In addition, this review summarizes the key findings of metagenomic and metatranscriptomic studies on soil micro-eukaryotes. It also highlights the use of these methods to study the structural and functional profiles of the soil micro-eukaryotic community and to screen functional eukaryotic protein-coding genes for biotechnological applications, and it outlines future perspectives in the field.
Keywords: Soil, Micro-eukaryotes, Microbial diversity, cDNA library, 18S rRNA
Introduction
An ecosystem comprises the interactions of biotic components with their abiotic surroundings (Schowalter 2022), and its proper functioning depends on these interactions (Madigan et al. 1997; Wardle 2006; Debroas et al. 2017). The biotic components of ecological systems include diverse forms of life, including plants, animals, and microorganisms. Microbes were the first organisms to originate, about 3.8 billion years ago, and they still represent the major life forms (Cooper 2000). Although tiny, these organisms have significant impacts on environmental processes in both spatial and temporal dimensions due to their vast diversity at both chemical and molecular levels. They engage in dynamic interactions and sustain relationships with both other microbes and higher organisms (Escalas et al. 2019). Microbial communities perform crucial functions in almost every biogeochemical process that keeps this planet habitable (Falkowski et al. 2008). They contribute significantly to various ecosystem functions such as climate regulation, nutrient cycling, primary production, decomposition, and carbon storage. They are also involved in the transformation of chemicals and metal compounds that are considered environmental pollutants (Giller et al. 2004; Ducklow 2008).
The environmental microbiome consists of both prokaryotic and eukaryotic microbes. Prokaryotes, which include bacteria and archaea, still make up the major share of microscopic life. Eukaryotic microbes, or micro-eukaryotes, are microscopic, nucleated organisms. Although micro-eukaryotes evolved later than prokaryotes, they represent a crucial biotic component of the environment. They are involved in many ecological processes along with their prokaryotic counterparts (Hedges et al. 2004). Micro-eukaryotes belong to many phylotypes, such as unicellular and small multicellular protists, fungi, and algae. They have been grouped into seven supergroups: Opisthokonta, which includes multicellular animals and fungi; Archaeplastida, which includes green plants and algae; and Amoebozoa, Stramenopiles, Alveolata, Rhizaria, and Excavata, which include unicellular micro-eukaryotes (Adl et al. 2012).
The heterogeneous taxonomic and functional composition of microorganisms has long been of interest to researchers because it is considered an important connection between patterns of biological diversity and ecological functions (Loreau et al. 2001; Mouillot et al. 2011; Bardgett and Van Der Putten 2014; Lamarque et al. 2014). Most past studies reported the isolation, diversity, and functional characterization of genes of microbes belonging to bacteria and archaea. The same aspects of micro-eukaryotes have long been overlooked by the scientific community because of the inherent characteristics of micro-eukaryotes as well as the technical limitations of the study methods (Escobar-Zepeda et al. 2015; Marmeisse et al. 2017).
In the past, studies on micro-eukaryotes relied on conventional culture-based methods, in which microbes are cultured in growth media under standard laboratory conditions. However, a major proportion of the microbes in a given environment, including eukaryotic microbes, is thought to be uncultivable by standard culture techniques. These methods therefore fail to identify many metabolically active but unculturable microbes. Moreover, they are generally tedious and unpredictable, and are thus poorly suited to discovering the diversity of microbes and their functions efficiently and accurately. It is therefore very unlikely that traditional culture-based methods can capture the diversity of micro-eukaryotes and the wealth of micro-eukaryotic genes in a microbial community. Studies estimate that about 95% of the predicted micro-eukaryotic species are currently uncharacterized (Hawksworth and Lücking 2017; Larsen et al. 2017; Pawlowski et al. 2012). In terrestrial ecosystems, approximately 99% of microorganisms reportedly cannot be cultivated, so culture-based approaches recover only about 1% of the overall microbial biodiversity of the soil (Gilbert et al. 2012; Nazir 2016). Consequently, a massive knowledge gap exists regarding the diversity, phylogeny, and function of micro-eukaryotes, particularly in terrestrial ecosystems (Olm et al. 2019).
In recent years, culture-independent techniques have been established to overcome the constraints of conventional culture-based approaches. They have become a potent tool for studying microbial communities, allowing researchers to investigate them irrespective of whether member organisms can be cultured under laboratory conditions. These methods are based on the analysis of nucleic acids isolated directly from environmental samples. Since their advent, culture-independent methods have progressed alongside genomics and transcriptomics, which have evolved profoundly over the last 20 years with the development of molecular techniques. The value of applying metagenomics and metatranscriptomics to environmental samples for exploring microbiota is reflected in the steadily growing number of publications in this area. This review compiles the metagenomic and metatranscriptomic studies on soil micro-eukaryotes, recently used methodological approaches in the field, and the main challenges along with future directions in this area.
Soil: a complex ecosystem with diverse forms of eukaryotic life playing diverse roles
Soil represents an open, dynamic, and complex ecosystem. It is exceptionally variable, both temporally and spatially, in its physical structure and chemical composition (Carmon and Ben-Dor 2017). It is a vital natural resource for the majority of living beings, including humans, animals, and plants. Soil is one of the major ecosystems for understanding basic biological activities. It offers ecosystem services such as water and nutrient cycling. It represents the largest terrestrial pool of organic carbon (Cambou et al. 2016). It also serves as a growth medium for plants. All of these have a huge impact on agricultural production (Dotto et al. 2017; Lal and Moldenhauer 1987). However, the composition, integrity, and quality of soil are impacted by numerous anthropogenic activities, such as the discharge of untreated industrial and municipal wastes, urbanization, and improper agricultural practices. Soil properties are also affected by climate change.
Soil hosts diverse forms of life, and its integrity is crucial for the existence of these life forms. It harbors a vast number of microorganisms, approximately 10 billion per gram (Torsvik and Øvreås 2002). Microorganisms serve important roles in balancing the soil ecosystem. Micro-eukaryotes are essential components of the microbial community of soil (Coleman and Whitman 2005; Adl and Gupta 2006). These organisms participate in numerous processes within terrestrial ecosystems. For example, fungi in soil, which form a major share of the microbial community (Bailey et al. 2002), are responsible for the degradation and accretion of organic matter, which in turn helps to provide proper nutrition to plants (Liu et al. 2015). Many of these fungi cause diseases in plants and other higher organisms, including humans (Gauthier and Keller 2013). Soil also represents an extensive reservoir of parasitic fungi and protists (Gutierrez et al. 2016; Siñski and Behnke 2004). Soil protists are crucial predators in soil that graze on bacterial communities (Bonkowski 2004) and thus help maintain the nitrogen and carbon cycles (Bonkowski and Schaefer 1997; Esteban et al. 2006). A few microalgal groups are also members of the micro-eukaryotic community of soil (Salih and Hassan 2021). These micro-eukaryotic algae, part of Archaeplastida, Excavata, Rhizaria, and Alveolata, play essential roles in soil ecosystems by driving primary production through photosynthesis. They are also associated with other significant roles, such as serving as pioneers in soil formation, contributing to soil aggregate stability, producing biologically active compounds, and serving as a food source for various heterotrophic soil organisms (Zancan et al. 2006). Certain metazoans can also be a part of the micro-eukaryotic community, with nematodes being the primary representatives at various trophic levels. These metazoan species feed on a variety of soil organisms, including bacteria, fungi, protists, and other nematodes (Ferris 2010).
The species composition and abundance of micro-eukaryotes in soils from varied environments could help to explain how micro-eukaryotes relate to specific soil conditions. This may also help to resolve eukaryotic taxonomy in finer detail. Micro-eukaryotes in soil serve as a valuable reservoir of genes encoding numerous proteins and enzymes. These genes are involved not only in general metabolism, but also in withstanding or coping with extreme or stressful conditions. Pollution and climate change alter the properties of soil and directly affect the microbes residing in their particular habitat. Because of these changes in soil, the diversity pattern of microbes and the expression profile of their genomes are also affected. Therefore, these microbes and their activities need to be analyzed for a basic understanding of biological processes and to discover novel genes and gene products that may have value in biotechnological applications.
Structural metagenomics and structural metatranscriptomics: revealing the entire and active micro-eukaryotic diversity in soil
Soil structural metagenomics
The emergence of the subfield of genomics known as “metagenomics,” alternatively termed ecogenomics, community genomics, or environmental genomics, enabled the characterization of microbial profiles in various environments and the exploration of genomes from numerous unknown and unculturable microbes. Metagenomics exploits various genomic approaches to unveil the diversity of genes that are taxonomically and phylogenetically significant, as well as functional protein-encoding genes with biotechnological value, without being constrained by the usual restrictions of species cultivation (Riesenfeld et al. 2004; Schmeisser et al. 2007; Uhlik et al. 2013; Batista-García et al. 2016). It involves genomic analysis of DNA extracted directly from microbial communities in most of Earth's ecosystems, which can then be studied in different ways (Fig. 1). To study microbial community architecture in soil, two distinct approaches are typically employed under structural metagenomics: (1) marker-gene-based targeted metagenomics and (2) whole-genome or “shotgun” metagenomics.
Fig. 1.
Cumulative description of processes and pipelines of metagenomics and metatranscriptomics to study soil micro-eukaryotes. The flow diagram shows DNA-based metagenomic and RNA-based metatranscriptomic approaches to study community profiles (structural) and functions (functional) of micro-eukaryotes in the soil. Structural metagenomics employs sequencing of metagenomic or PCR-amplified marker DNA for the study of the entire micro-eukaryotic community structure in the soil. In contrast, structural metatranscriptomics is based on sequencing of total RNA or of PCR-amplified cDNA synthesized from marker RNA sequences for the study of the active micro-eukaryotic community in the soil. Functional metagenomics includes metagenomic library construction followed by function- or sequence-based study of individual clones. This library is used to decipher potential functions of micro-eukaryotes in the soil and to screen genes encoding enzymes/proteins of interest. Functional metatranscriptomics relies on a cDNA library constructed from soil-extracted micro-eukaryotic mRNA. This library is further used for the study of the actual functional diversity of soil micro-eukaryotes through high-throughput sequencing and in silico functional prediction of sequences. This metatranscriptomic library is also exploited for the screening of micro-eukaryotic genes with a desired function by expressing them in a suitable expression system. Studies based on the described approaches are being applied to explore the community structure of soil micro-eukaryotes and to screen novel functional eukaryotic genes that may have applications in biotechnology.
Marker-gene-based targeted metagenomics involves the use of conserved eukaryotic marker genes such as the large subunit 28S ribosomal DNA (rDNA), small subunit 18S rDNA, mitochondrial cytochrome b gene, β-tubulin gene, and internal transcribed spacers (ITS) (Schöler et al. 2017) (Fig. 1). Degenerate primers are designed from the conserved regions of the marker gene to amplify the gene sequences by polymerase chain reaction (PCR) from soil-extracted genomic DNA. In the classical workflow, the amplified DNA products are used to construct a DNA library in a suitable vector, the library is maintained in a suitable host, and clones are sequenced. The sequence data are then analyzed for taxonomic identification. This method, in which clones are sequenced individually by Sanger sequencing, is time-consuming, and rare sequences can easily be overlooked. More recently, amplified DNA is sequenced by high-throughput sequencing, which generates a large volume of data from a few nanograms of DNA.
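The primer-based amplification step can be illustrated with a small in silico sketch. The Python example below expands a degenerate primer written in IUPAC ambiguity codes into a regular expression and extracts the region a primer pair would amplify from a template. The function names, the toy template, and both primer sequences are hypothetical and chosen only for illustration; exact matching is assumed, whereas real primer evaluation would also allow mismatches and consider annealing conditions.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Translate a degenerate primer into a regular expression."""
    return "".join(
        IUPAC[base] if len(IUPAC[base]) == 1 else "[" + IUPAC[base] + "]"
        for base in primer.upper()
    )


def reverse_complement(seq):
    """Reverse-complement a sequence, keeping IUPAC ambiguity codes valid."""
    table = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")
    return seq.upper().translate(table)[::-1]


def in_silico_amplicon(template, forward, reverse):
    """Return the amplicon (primer sites included), or None if a primer does not match."""
    fwd = re.search(primer_to_regex(forward), template)
    if fwd is None:
        return None
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement is searched for downstream of the forward primer site.
    downstream = template[fwd.end():]
    rev = re.search(primer_to_regex(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return template[fwd.start():fwd.end() + rev.end()]


# Hypothetical toy template and primers (not real 18S rDNA primers).
toy_template = "AAGATCAGGGTTTCCCGCTAATT"
print(in_silico_amplicon(toy_template, "GAYCA", "TTAGY"))
```

The degeneracy is what lets a single primer pair match the slightly different versions of a conserved region found across taxa; here `Y` matches either C or T.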
Raw sequence reads produced from the amplified targeted marker region of a metagenome are analyzed to estimate diversity and taxonomic composition. Many pipelines have been developed to process and annotate targeted metagenome sequence data. The commonly used comprehensive data processing, taxonomic annotation, and analysis pipelines are QIIME (Caporaso et al. 2010), MOTHUR (Schloss et al. 2009), and Parallel-Meta Suite (Chen et al. 2022). In addition, specialized computational pipelines have been designed for identifying micro-eukaryotes in metagenomic data. EukDetect is a precise and sensitive bioinformatic tool for identifying micro-eukaryotes in whole-metagenome shotgun sequencing data. It covers a wide range of microbial eukaryotes, performs well in detecting low-abundance and closely related species, and is resilient to bacterial contamination in eukaryotic genomes (Lind and Pollard 2021). CCMetagen is a pipeline for the comprehensive and precise identification of micro-eukaryotes and prokaryotes in metagenomic data. It uses the complete NCBI nucleotide collection as a reference, which allows identification of species with incomplete genome data across various biological kingdoms (Marcelino et al. 2020).
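As an illustration of what estimating diversity and taxonomic composition means in practice, the following Python sketch computes relative abundances and the Shannon diversity index from a list of per-read taxonomic assignments. The choice of the Shannon index is an assumption made for this example (the pipelines named above offer many other metrics), and the taxon labels and counts are invented toy data.

```python
import math
from collections import Counter


def relative_abundance(assignments):
    """Fraction of reads assigned to each taxon."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(assignments):
    """Shannon diversity H' = -sum(p_i * ln p_i) over the observed taxa."""
    return -sum(p * math.log(p) for p in relative_abundance(assignments).values())


# Invented toy data: one taxonomic label per sequence read.
toy_reads = ["Ascomycota"] * 6 + ["Basidiomycota"] * 2 + ["Cercozoa"] * 2
print(relative_abundance(toy_reads))
print(round(shannon_index(toy_reads), 4))
```

A community dominated by one taxon gives a value close to zero, while an even community with many taxa gives a higher value.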
Accurate taxonomic categorization of sequence reads is a critical element in the analysis of data derived from marker genes. Commonly, reference-based methods are employed, in which taxonomic assignments rely on simple sequence similarity searches against databases containing 18S rRNA reference sequences. Such databases include SILVA (Quast et al. 2013) and the Ribosomal Database Project (RDP) (Cole et al. 2014). The scientific community also favors other important databases such as the International Nucleotide Sequence Database Collaboration (INSDC: GenBank, EMBL, and DDBJ), which harbors thousands of Sanger-sequenced clones from environmental surveys of eukaryotic diversity. Recently, eukaryote-specific resources have also been developed, including EukRef (del Campo et al. 2018), EukRibo (Berney et al. 2022), and MetaEuk (Levy Karin et al. 2020). Moreover, group-specific databases have been created that exclusively contain reference sequences from a particular group of eukaryotic organisms (Table 1). The PR2 (Protist Ribosomal Reference) database is a curated 18S rRNA database for protists (Guillou et al. 2013). 18S-NemaBase is a curated database containing 18S rRNA sequences of nematodes (Gattoni et al. 2023). UNITE is a database based on the eukaryotic nuclear ribosomal ITS region (Nilsson et al. 2019) and is generally used for the taxonomic identification of fungal species.
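The principle of reference-based assignment can be sketched in a few lines of Python. The example below scores a query against each reference by the Jaccard similarity of their k-mer sets and reports the best-scoring taxon, or "unclassified" when no reference is similar enough. This is a deliberate simplification and an assumption of this sketch: production pipelines typically rely on alignment-based searches or trained classifiers against curated databases such as those listed above. The reference sequences, taxon labels, k-mer length, and threshold are all hypothetical.

```python
def kmers(seq, k=4):
    """Set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_taxon(query, references, k=4, min_similarity=0.5):
    """Return (taxon, score) for the most similar reference sequence.

    Similarity is the Jaccard index of the two k-mer sets; queries whose
    best score falls below min_similarity are reported as unclassified.
    """
    query_kmers = kmers(query, k)
    best_taxon, best_score = "unclassified", 0.0
    for taxon, ref_seq in references.items():
        ref_kmers = kmers(ref_seq, k)
        union = query_kmers | ref_kmers
        score = len(query_kmers & ref_kmers) / len(union) if union else 0.0
        if score > best_score:
            best_taxon, best_score = taxon, score
    if best_score < min_similarity:
        return "unclassified", best_score
    return best_taxon, best_score


# Hypothetical miniature reference set standing in for an 18S rRNA database.
toy_references = {
    "taxon_A": "ACGTACGTTAGCCGATAGCT",
    "taxon_B": "TTTTGGGGCCCCAAAATTTT",
}
print(assign_taxon("ACGTACGTTAGCCGATAGCT", toy_references))
print(assign_taxon("GGGGGGGGGGGG", toy_references))
```

The threshold reflects a general property of reference-based methods: a read can only be named as precisely as the closest sequence in the database allows, which is why curated, group-specific references matter for micro-eukaryotes.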
Table 1.
Commonly used computational software resources and databases for metagenomic and metatranscriptomic data processing and analysis including processing and analysis of micro-eukaryotic sequence data
Software/database | Web address | Applications |
---|---|---|
General metagenomic/metatranscriptomic pipelines | ||
Parallel-Meta Suite | http://bioinfo.single-cell.cn/parallel-meta.html | Comprehensive and full-automatic computational toolkit for rapid data mining for both metagenomic shotgun sequences and 16S/18S/ITS rRNA amplicon sequences |
VEBA | https://github.com/jolespin/veba | A modular end-to-end suite for in silico recovery, clustering, and analysis of prokaryotic, micro-eukaryotic, and viral genomes from metagenomes |
BIOCOM-PIPE | https://zenodo.org/record/3678129 | Metabarcoding pipeline for the characterization of microbial diversity from 16S, 18S, and 23S rRNA gene amplicons |
MetaTrans | https://manichanh.vhir.org/metatrans.org/index.html | An open-source pipeline for metatranscriptomics |
QIIME | http://www.qiime.org | Data processing, OTU picking, taxonomy assignment of fungal, viral, bacterial, and archaeal communities |
Mothur | http://www.mothur.org/ | Data processing, OTU picking, taxonomy assignment, chimera checking and ecological analyses |
MG-RAST | https://www.mg-rast.org/ | The MG-RAST pipeline performs quality control, protein prediction, clustering, and similarity-based annotation on nucleic acid sequence datasets using a number of bioinformatics tools and supports amplicon (16S, 18S, and ITS) sequence datasets and metatranscriptome (RNA-Seq) sequence datasets |
Micro-eukaryote-specific metagenomic/metatranscriptomic pipelines | ||
EukDetect | https://github.com/allind/EukDetect | EukDetect pipeline provides an automated and reliable way to characterize eukaryotes in shotgun sequencing datasets from diverse microbiomes |
CCMetagen | https://github.com/vrmarcelino/CCMetagen | Comprehensive and accurate identification of eukaryotes and prokaryotes in metagenomic data |
EukRef | https://github.com/eukref | EukRef aims to assemble a curated reference database of 18S rRNA gene sequences covering all eukaryotes |
General metagenomic/metatranscriptomic databases | ||
eggNOG | https://github.com/eggnogdb/eggnog-mapper | A database of orthology relationships, functional annotation, and gene evolutionary histories |
RDP | http://rdp.cme.msu.edu/ | Database and tools for high-throughput rRNA analysis of prokaryotic and eukaryotic sequences |
FOAM | https://github.com/mmdavid/FOAM | FOAM, a functional gene database provides a new functional ontology dedicated to classify gene functions relevant to environmental microorganisms based on hidden Markov models (HMMs) |
SILVA | https://www.arb-silva.de/ | SILVA provides comprehensive, quality checked, and regularly updated datasets of aligned small subunit (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea, and Eukarya) |
NCBI-SRA | https://www.ncbi.nlm.nih.gov/sra | The Sequence Read Archive (SRA) is the National Center for Biotechnology Information (NCBI) database that stores sequence data obtained from next generation sequence technology |
KEGG | https://www.genome.jp/kegg/ | KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies |
Micro-eukaryote specific metagenomic/metatranscriptomic databases | ||
NemaBase | http://www.WormsEtAl.com/databases | Curated 18S rRNA Database of Nematode Sequences |
EukRibo | https://zenodo.org/record/6327891 | A manually curated eukaryotic 18S rDNA reference database to facilitate identification of new diversity |
MetaEuk | https://github.com/soedinglab/metaeuk | Sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics |
UNITE | https://unite.ut.ee/ | It is a web-based database and sequence management environment for the molecular identification of fungi. It targets the formal fungal barcode and the nuclear ribosomal internal transcribed spacer (ITS) region |
Taxmapper | https://bitbucket.org/dbeisser/taxmapper/src/master/ | It is an analysis tool, reference database, and workflow for metatranscriptome analysis of eukaryotic microorganisms |
ALCOdb | http://alcodb.jp | Algae Gene Coexpression database (ALCOdb) provides gene coexpression information to survey gene modules for a function of interest |
PR2 | https://pr2-database.org/ | The PR2 (Protist Ribosomal Reference) database ecosystem is a set of three interconnected 18S rRNA databases that are useful in particular for metabarcoding applications |
Quality control tools | ||
Trim Galore | https://github.com/FelixKrueger/TrimGalore | A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data |
MultiQC | http://multiqc.info | Summarize results from different analysis (such as FastQC) into one report |
Trimmomatic | http://www.usadellab.org/cms/?page=trimmomatic | Flexible read trimming tool for Illumina data |
Cutadapt | https://cutadapt.readthedocs.io | Find and remove adapter sequences, primers, poly-A tails and other types of unwanted sequence |
FastQC | http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ | Quality control tool showing statistics such as quality values, sequence length distribution, and GC content distribution |
Assembly, Assessment of assembly and Clustering tools | ||
MetaSPAdes | https://github.com/ablab/spades | MetaSPAdes has an efficient approach for analyzing micro-diversity, a new repeat resolution pipeline that utilizes rare strain variants to improve the consensus assembly of strain mixtures |
MOCAT2 | http://mocat.embl.de/ | Pipeline for read filtering, taxonomic profiling, assembly, gene prediction, and functional analysis |
Megahit | https://github.com/voutcn/megahit | Co-assembly of metagenomic reads with variable k-mer lengths and low memory usage |
SOAPdenovo2 | https://github.com/aquaskyline/SOAPdenovo2 | An empirically improved memory-efficient short-read de novo assembler |
BUSCO | http://busco.ezlab.org/ | Assess genome assembly and gene set completeness based on single-copy orthologs, also for eukaryotes |
CheckM | http://ecogenomics.github.io/CheckM/ | Tools for assessing quality of (meta)genomic assemblies providing genome completion and contamination estimates, especially for bacteria and archaea |
CD-HIT | http://cd-hit.org | A fast program for clustering and comparing large sets of protein or nucleotide sequences |
UCLUST | http://www.drive5.com/usearch | A new clustering method that exploits USEARCH to assign sequences to clusters |
Aligners, classifier, profilers | ||
MetaBAT2 | https://bitbucket.org/berkeleylab/metabat | An adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies |
Mash | http://mash.readthedocs.io | MinHash-based taxonomic profiler enabling superfast overlap estimations |
Kraken-HLL | https://github.com/fbreitwieser/kraken-hll | Extension of Kraken counting unique k-mers for taxa and allowing multiple databases |
COCACOLA | https://github.com/younglululu/COCACOLA | Binning contigs using read coverage, correlation, sequence composition and paired-end read linkage |
MaxBin 2.0 | http://sourceforge.net/projects/maxbin/ | An automated binning algorithm to recover genomes from multiple metagenomic datasets |
Kaiju | https://github.com/bioinformatics-centre/kaiju | Fast taxonomic classifier against protein sequences using FM-index with reduced amino acid alphabet |
Kallisto | https://github.com/pachterlab/kallisto | Taxonomic profiler using pseudo-alignment with k-mers using techniques based on transcript (RNA-seq) quantification |
PanPhlAn | http://segatalab.cibio.unitn.it/tools/panphlan/ | Pan-genome-based phylogenomic analysis |
PhyloPythiaS(+) | https://github.com/algbioi/ppsp/wiki | A self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes |
MetaPhlAn2 | http://segatalab.cibio.unitn.it/tools/metaphlan/ | Marker gene-based taxonomic profiler |
GOTTCHA | http://lanl-bioinformatics.github.io/GOTTCHA/ | Taxonomic profiler that maps reads against short unique sub-sequences (‘signatures’) at multiple taxonomic ranks |
DIAMOND | https://github.com/bbuchfink/diamond | Protein homology search using spaced seeds with a reduced amino acid alphabet, 2000–20000 times faster than BLASTX |
Kraken | https://ccb.jhu.edu/software/kraken/ | Fast taxonomic classifier using in-memory k-mer search of metagenomics reads against a database built from multiple genomes |
WebMGA | https://github.com/weizhongli/webMGA | A customizable web server for fast metagenomic sequence analysis |
BLAST+ | https://blast.ncbi.nlm.nih.gov | Highly sensitive nucleotide and translated-nucleotide protein alignment |
Shotgun metagenomics involves sequencing all microbial genomes within a sample without specific targets (Pérez-Cobas et al. 2020). High-quality total community DNA is extracted and fragmented to obtain DNA of the desired length, followed by sequencing on a compatible platform (Sabale et al. 2020). Thus, whole-genome sequencing (WGS) captures DNA fragments from the entire array of species within a microbiome, including micro-eukaryotes (Pérez-Cobas et al. 2020).
DNA sequence reads generated by WGS of a metagenome can reveal not only the genomic diversity of the microbiome but also the taxonomic identity of individual microbes. After raw reads are produced, quality filtering is essential to generate high-quality data. Eliminating sequencing errors is required, as it helps to reduce false positives. The filtered sequence data can be further analyzed with or without assembly (Pérez-Cobas et al. 2020). The prevalent assembly tools for metagenomic samples include MEGAHIT (Li et al. 2015), metaSPAdes (Nurk et al. 2017), and SOAPdenovo2 (Luo et al. 2012). Contigs obtained after assembly can be used for binning, which attempts to assign every metagenomic sequence to a taxonomic group (Sharpton 2014). Several binning tools are available, for example, MaxBin 2.0 (Wu et al. 2016) and MetaBAT2 (Kang et al. 2019). The abundance of tools for assembly and binning can be overwhelming, making the selection process quite challenging. A few commonly used computational resources and databases for marker gene and shotgun sequence data processing and analysis are listed in Table 1.
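The quality-filtering step can be sketched as follows. This minimal Python example keeps only reads whose mean Phred score and length exceed given thresholds. It assumes four-line FASTQ records with Phred+33 quality encoding; the thresholds, function names, and toy reads are illustrative assumptions, and dedicated tools such as those listed under quality control in Table 1 additionally handle adapter removal and per-base trimming.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality_line) / len(quality_line)


def quality_filter(fastq_text, min_mean_quality=20.0, min_length=50):
    """Return (read_id, sequence) pairs for reads passing both thresholds."""
    lines = fastq_text.strip().splitlines()
    kept = []
    # FASTQ records span four lines: header, sequence, separator, qualities.
    for i in range(0, len(lines) - 3, 4):
        header, seq, _separator, qual = lines[i:i + 4]
        if len(seq) < min_length or not qual:
            continue
        if mean_phred(qual) >= min_mean_quality:
            kept.append((header.lstrip("@"), seq))
    return kept


# Toy input: read r1 has high quality scores ('I' = Q40), r2 low ('#' = Q2).
toy_fastq = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTACGT\n+\n########\n"
print(quality_filter(toy_fastq, min_mean_quality=20, min_length=8))
```

Filtering on mean quality is only one possible criterion; stricter schemes trim low-quality ends or discard reads containing any base below a threshold.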
Marker-based targeted metagenomics offers a cost-effective way to collect eukaryotic microbial diversity information with minimal sequencing. However, it comes with various constraints, such as limited taxonomic precision, copy number variation in the 18S rDNA, and taxonomic biases linked to PCR amplification, including the number of cycles used and the choice of primers or hypervariable regions (Frioux et al. 2020). Shotgun metagenomics offers an advantage for taxonomic classification because it is typically less influenced by the biases linked to PCR, which is essential for amplifying marker genes. Furthermore, WGS enables taxonomy to be assigned at the strain and species levels (Pérez-Cobas et al. 2020). Although WGS allows identification of organisms from across the entire biological realm, multiple challenges remain in identifying eukaryotes in WGS datasets. First, eukaryotes constitute a smaller fraction of the total reads in shotgun sequencing than bacteria. The second major barrier is the lack of specific, robust methodological tools for eukaryotic data analysis. Available bioinformatic tools often cannot handle eukaryotic genomes accurately because of widespread contamination with bacterial sequences, and these methods therefore sometimes misattribute bacterial reads to eukaryotic species (Lind and Pollard 2021).
Although soil harbors a remarkable variety of micro-eukaryotes, research utilizing 18S rDNA sequences amplified from soil DNA (Lesaulnier et al. 2008) consistently reveals the prevalence of fungi within soil micro-eukaryotic communities. A study using the Illumina MiSeq platform to analyze eukaryotic microbial diversity in the rhizospheric soil of a high-altitude alpine forest revealed that Ascomycota (63% of the relative abundance) and Basidiomycota (13% of the relative abundance) were the two most abundant phyla (Praeg et al. 2019). High-throughput sequencing of the 18S rDNA to investigate the impact of crop rotation (pea and wheat) on micro-eukaryotic diversity revealed that fungal diversity was significantly impacted. Pea–wheat rotation increased the abundance of Fusarium graminearum, followed by Rhizopus and Boeremia (Woo et al. 2022). Thakur et al. (2022) studied the diversity of micro-eukaryotes in soil contaminated with heavy metals, sequencing amplicons of the hypervariable region of 18S rDNA from this polluted soil with paired-end chemistry on the Illumina MiSeq platform. As soil protists are an indispensable part of soil micro-eukaryotic communities, Bates et al. (2013) used high-throughput pyrosequencing of 18S rDNA to conduct a detailed and comprehensive survey of protistan diversity in soils of different geographical regions representing a variety of biome types ranging from tropical forests to deserts. The studied sequences belonged to five supergroups, with taxa dominated by the SAR group (Stramenopiles, Alveolata, and Rhizaria). Heger et al. (2016) performed the first study to show a consistent decrease in protist diversity and a change in community pattern along an elevational gradient in alpine habitats.
Capra et al. (2016) designed novel PCR primer sets that specifically targeted the V4–V5 and V5–V7 variable regions of the 18S rDNA. These primers were able to capture the extensive taxonomic diversity of metazoans residing in the soil ecosystem. Although soil nematodes are highly abundant metazoans in the pedosphere, constituting 80–90% of all metazoans on Earth (van den Hoogen et al. 2020), only 4% of the approximately 1 million species have been formally identified and described (Creer et al. 2010). Kenmotsu et al. (2020) employed primer sets targeting four regions of the 18S rDNA to identify the most suitable regions for Illumina MiSeq-assisted amplicon sequencing. This approach enabled them to conduct next-generation sequencing (NGS)-assisted DNA barcoding on individual soil nematodes. The results showed that DNA barcoding with a specific primer set targeting region 4, coupled with analysis using DADA2, is the most appropriate method for taxonomic analysis of soil nematodes. Sapkota and Nicolaisen (2015) developed a DNA amplification method that effectively amplifies nematode DNA (along with other metazoan DNA) from various agricultural soils, eliminating the need for enrichment. Their findings revealed that 64.4% of the sequences were nematode DNA, with minimal plant or fungal sequences. Another 30% comprised diverse other metazoan species. The nematode sequences encompassed a wide taxonomic range, predominantly from nematode taxa previously identified as common in soil, including Tylenchida, Rhabditida, Dorylaimida, Triplonchida, and Araeolaimida. A high-throughput sequencing study to assess the composition and diversity of Sphagnum-associated micro-eukaryotes residing in peatland revealed that the majority of operational taxonomic units (OTUs) were assigned to eukaryotic supergroups including Opisthokonta, Alveolata, Archaeplastida, and Rhizaria, in order of taxonomic richness. The study also found that under drought conditions, micro-eukaryotic diversity decreased, and this change in the community could affect the overall functioning of the peatland ecosystem (Reczuga et al. 2020).
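Comparisons of diversity such as the drought effect described above are usually based on summary statistics computed from OTU read counts. The following is a minimal sketch, not the procedure of any cited study: it computes relative abundances and the Shannon index. The read counts and the names `control` and `drought` are hypothetical values chosen only for illustration.

```python
import math


def relative_abundance(counts):
    """Convert a {taxon: read_count} mapping into {taxon: proportion}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads in sample")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical OTU read counts per supergroup (illustrative values only).
control = {"Opisthokonta": 400, "Alveolata": 300, "Archaeplastida": 200, "Rhizaria": 100}
drought = {"Opisthokonta": 850, "Alveolata": 100, "Archaeplastida": 40, "Rhizaria": 10}

h_control = shannon_index(control.values())
h_drought = shannon_index(drought.values())
print(relative_abundance(control))
print(f"H' control = {h_control:.3f}, H' drought = {h_drought:.3f}")
```

In this toy example the more uneven `drought` sample yields the lower index. Real analyses would additionally rarefy or otherwise normalize read depth before comparing samples.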
Soil structural metatranscriptomics
The DNA-based metagenomic diversity analysis gives an idea of the ‘entire’ micro-eukaryotic community. However, it provides no information about the ‘active’ members because the sample may contain many microbes that are not physiologically active. DNA from dead and inactive cells and extracellular DNA pose another drawback of this method (Torti et al. 2015; Carini et al. 2016). RNA-based metatranscriptomic diversity analysis has an advantage over structural metagenomics because it utilizes transcripts from expressed genes. Consequently, it offers more pertinent data for ecological evaluations by focusing on the segment of the community engaged in active interactions with the environment, rather than the community as a whole (Hempel et al. 2022). The initial step of structural metatranscriptomics is to isolate the total RNA from soil, which includes RNA from both prokaryotes and eukaryotes. This is followed by the conversion of RNA into complementary DNA (cDNA) by reverse transcription (Fig. 1). The resulting cDNA can be sequenced directly (Urich et al. 2008) with or without the cloning steps. Alternatively, specific expressed markers from cDNA can be PCR amplified using degenerate primers, followed by sequencing (Damon et al. 2012; Lehembre et al. 2013). Sequencing total RNA directly after reverse transcription offers the advantage of being unbiased by PCR or cloning methods (Lehembre et al. 2013). Furthermore, it enhances sequencing information by incorporating commonly employed genetic markers. This is significant as approximately 80% of RNA comprises rRNA, encompassing both the small subunit (SSU) and the large subunit (LSU) rRNA markers of micro-eukaryotes (Westermann et al. 2012; Peano et al. 2013).
Numerous bioinformatics pipelines have been created to analyze metatranscriptomic data effectively. For instance, MG-RAST performs quality control, protein prediction, clustering, and similarity-based annotation on nucleic acid sequence datasets using various bioinformatic tools. It can also analyze amplicon sequence datasets for 16S, 18S, and ITS, as well as metatranscriptome (RNA-Seq) datasets (Meyer et al. 2008). MetaTrans is an efficient open-source pipeline for analyzing both the structural composition and the functions of active microbial communities in a microbiome (Martinez et al. 2016) (Table 1).
Most studies utilizing metatranscriptomics to evaluate the active diversity of soil micro-eukaryotic communities have indicated the presence of all major eukaryotic clades in soil samples (Urich et al. 2008; Damon et al. 2011; Lehembre et al. 2013). Bailly et al. (2007) conducted groundbreaking RNA-based research on the diversity of the soil micro-eukaryotic community using reverse-transcribed soil 18S rRNA. The analysis revealed that over 70% of the sequences were attributed to fungi and protists, with metazoans being the other significant group identified. Urich et al. (2008) conducted direct pyrosequencing of total RNA to thoroughly analyze a soil micro-eukaryotic community. Their findings revealed that fungi constituted the majority at 50%, with plants at 20% and metazoans at 10%. Among fungi, Ascomycota was the most prevalent phylum, comprising two-thirds of the sequences, followed by Glomeromycota and Basidiomycota. Tveit et al. (2013) applied this method to high-Arctic peat soils to reveal active eukaryotic microorganisms. Their findings showed that 15–30% of the sequences belonged to eukaryotes. Within the eukaryotic sequences, Alveolates made up 17–20%, while other prevalent protists included Amoebozoa and Rhizaria, both bacterial predators, as well as Stramenopiles. Fungi and metazoans constituted a smaller portion of the eukaryotic sequences. This was in contrast to the temperate soil, where fungi and metazoans represented a major fraction of all micro-eukaryotes (Urich et al. 2008). Similarly, Geisen et al. (2015) used a direct sequencing approach to reveal the diversity of the active soil protist communities within mineral and organic soil. All five supergroups of these organisms were reported from each site, with sequences from Rhizaria being the most numerous. Bang-Andreasen et al. (2020) utilized total RNA sequencing to reveal significant shifts in the active eukaryotic microbiome of agricultural and forest soil following wood ash amendment. They noted a substantial rise in the bacterivorous protozoan group due to increased bacterial growth. This finding underscores the crucial role of protozoans in controlling bacterial abundance in soil. Damon et al. (2011) studied the micro-eukaryotic community of forest soil by PCR amplification of 18S cDNA derived from total RNA extracted from the soil. Their findings revealed a prevalence of fungal sequences (up to 60%) and metazoans, with protists accounting for less than 12% of the sequences. Similarly, for heavy metal-contaminated and non-contaminated soil, taxonomic affiliation of the 18S ribosomal sequences revealed a dominance of fungi and metazoans, representing up to 73% of the total 18S rRNA sequences (Lehembre et al. 2013). Thakur et al. (2022) studied the active micro-eukaryotic diversity in heavy metal-contaminated soil by analyzing the 18S hypervariable region amplified from total soil RNA. The studies on micro-eukaryotic diversity in different soil ecosystems are summarized in Table 2.
Table 2.
Structural-metagenomics- and structural-metatranscriptomics-based studies on the micro-eukaryote diversity of different soil ecosystems
Year | Sample site | Technology | Targeted region | Representative microbial diversity identified | Citations |
---|---|---|---|---|---|
2022 | Heavy-metal-contaminated soil | Illumina-MiSeq platform | 18S rDNA and cDNA synthesized from 18S rRNA | Opisthokonta, Amoebozoa, and fungi | Thakur et al. (2022) |
2022 | Crop rotation soil | 454 Life Sciences, Roche | 18S rRNA gene | Fusarium, Schizosaccharomycetaceae, Capnodiales, Rhizopus, and Boeremia | Woo et al. (2022) |
2022 | Mangrove sediments | Illumina MiSeq | 18S rDNA genes | Ascomycota, Dothideomycetes, Eurotiomycetes and Sordariomycetes | Zhang et al. (2022) |
2022 | Sand, clay, and rock substrates | Illumina MiSeq | 18S rRNA genes | Basidiomycota, Ascomycota, Phragmoplastophyta, Chytridiomycota, Cercozoa, Vertebrata, Dinoflagellata, Arthropoda, Nematozoa, Mucoromycota and Metazoa | Meyer et al. (2008) |
2020 | Peatland soil | Illumina HiSeq | 18S rDNA V9 region | Opisthokonta, Alveolata, Archaeplastida, Rhizaria | Reczuga et al. (2020) |
2019 | Antarctic dry valley soils | FLX (Roche) amplicon pyrosequencing | 18S rRNA genes | Tilletiopsis, Sporobolomyces, Alveolata, and Halteria | Niederberger et al. (2019) |
2017 | Mangrove soil sediment | Illumina HiSeq | 18S rDNA | Streptophyta, Ascomycota | Imchen et al. (2017) |
2016 | Arctic soil from the Glacier | Ion Torrent Personal Genome Machine (PGM) platform | Operational taxonomic units (OTUs) | Ascomycota, Streptophyta, and Chordata | Seok et al. (2016) |
2016 | Soil samples | Illumina | 18S rRNA genes | Soil metazoan community | Capra et al. (2016) |
2015 | Tubers of G. flavilabella and the surrounding soil | Illumina HiSeq | 28S rDNA segments | Basidiomycota, Ascomycota, and Agaricomycetes | Liu et al. (2015) |
2015 | Mineral and organic soil | RNA-Seq technology | 18S rRNA | Rhizaria | Geisen et al. (2015) |
2013 | Soil from metal smelting sites | Sanger sequencing of 18S rRNA | 18S rRNA | Fungi and metazoans | Lehembre et al. (2013) |
2013 | Soil of diverse climate | Pyrosequencing using FLX+ technology | 18S rRNA | SAR subgroup | Bates et al. (2013) |
2013 | High-Arctic peat soils | RNA-Seq technology | 18S rRNA | Dominant taxa: Alveolates, Amoebozoa, Rhizaria, and Stramenopiles; fungi and metazoans a smaller fraction | Tveit et al. (2013) |
2012 | Forest soil | Sanger sequencing of cDNA sequences | 18S rRNA | Fungi, metazoans, and protists | Damon et al. (2012) |
2008 | Soil | RNA-Seq technology | 18S rRNA | Fungi (50%) followed by plants (20%) and metazoans (10%) | Urich et al. (2008) |
2007 | Forest soil | Sanger sequencing of 18S rDNA | 18S rDNA | Fungi, protists, and metazoa | Bailly et al. (2007) |
Challenges of structural metagenomics and structural metatranscriptomics to study micro-eukaryotic diversity in soil
In marker-gene-based methods, the findings clearly show that no currently available PCR primers can amplify all known eukaryotic groups efficiently and uniformly. New sets of primer pairs are continually designed and used for the analysis of micro-eukaryotes in soil to obtain better taxonomic resolution (Hugerth et al. 2014; Capra et al. 2016), yet the quest for “true universal primers” remains unresolved. Precise identification of eukaryotic microorganisms therefore requires meticulous PCR primer selection, prioritizing the sequencing of highly informative regions while avoiding biases caused by uneven amplification of taxa. Ongoing efforts should focus on creating 18S rDNA/rRNA primer sets that are suitable for modern high-throughput sequencing methods and capable of generating phylogenetically distinctive sequences with minimal bias. In addition, contemporary high-throughput sequencing platforms like Illumina, extensively employed in amplicon-based research (Kenmotsu et al. 2020), yield short reads. The size of the amplicon and the region sequenced may be crucial for accurate phylogenetic assignments. Thus, continued development of sequencing technology to obtain longer and more accurate reads is required.
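One way to examine uneven primer coverage before going to the bench is to check, in silico, which reference sequences a degenerate primer would match. The sketch below is an illustration under stated assumptions: the primer `GCYGCGGT` and the reference sequences are invented toy data, and matching is exact on the forward strand only, with no mismatch tolerance or reverse-complement search.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer):
    """Translate a degenerate primer into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def primer_coverage(primer, references):
    """Return, per taxonomic group, the fraction of sequences containing a primer site."""
    pattern = primer_to_regex(primer)
    coverage = {}
    for group, seqs in references.items():
        matched = sum(1 for seq in seqs if pattern.search(seq.upper()))
        coverage[group] = matched / len(seqs) if seqs else 0.0
    return coverage


# Hypothetical primer and toy reference fragments (not real 18S sequences).
toy_primer = "GCYGCGGT"
toy_references = {
    "fungi": ["AAGCCGCGGTAA", "TTGCTGCGGTCC"],
    "protists": ["AAGCAGCGGTAA", "GGGCTGCGGTTT"],
}
print(primer_coverage(toy_primer, toy_references))
```

A group whose coverage is well below that of the others would be under-amplified by such a primer, which is the kind of bias the text warns about.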
The study of micro-eukaryotes poses numerous bioinformatic challenges. In most eukaryotic organisms, ribosomal DNA (rDNA) is composed of 18S, 5.8S, and 28S units, arranged in tandem repeats within the genome. The number of rDNA copies is strongly linked to genome size, reaching tens of thousands in some species (Prokopowich et al. 2003), which complicates tasks such as clustering sequences into operational taxonomic units (OTUs) and relating read numbers from marker surveys to the number of individuals in a sample (Bik et al. 2012). Owing to falling sequencing costs, deep sequencing is becoming the most cost-effective method for exploring the ecological and functional roles of complex ecosystems. However, this approach has specific technological limitations, including the limited user-friendliness and accuracy of analytical tools and pipelines. For a variety of reasons, including the development of binning techniques exclusively for prokaryotes, micro-eukaryotes are mostly overlooked in assembly-centric studies (Espinoza and Dupont 2022). Methodological heterogeneity, limited data accessibility, and underdeveloped computational infrastructure currently hinder comparative eukaryotic meta-analyses. The creation of reliable computational pipelines for the assessment of marker gene data depends on accurate eukaryotic reference datasets with consistent taxonomic levels. It is difficult to provide precise statistics on the number of available genome sequences of eukaryotic microbes, and there are significant disparities between lineages, some of which are still represented by a single genome while others are represented by hundreds (Marmeisse et al. 2017). For microbiome data to be findable, accessible, interoperable, and reusable (FAIR), protocols and metadata collection must be standardized, and data storage, access, and sharing must be simple (Wilkinson et al. 2016). Despite being crucial to the field, computational methods are currently not autonomous and require expert curation during the analysis (Espinoza and Dupont 2022). These limitations can be addressed through further developments and studies in the field that exploit different meta-omics approaches.
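The effect of variable rDNA copy number on marker-gene read counts can be illustrated with a simple normalization. The following is a minimal sketch, assuming that a copy-number estimate is available for every taxon, which is rarely the case in practice. The taxon names and all numbers are hypothetical.

```python
def copy_number_corrected(read_counts, copies_per_genome):
    """Divide each taxon's reads by its assumed rDNA copy number,
    then rescale the corrected values to proportions that sum to 1."""
    corrected = {}
    for taxon, reads in read_counts.items():
        copies = copies_per_genome[taxon]  # KeyError if no estimate exists
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        corrected[taxon] = reads / copies
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical example: taxon_A yields nine times more reads than taxon_B,
# but is assumed to carry nine times more rDNA copies per genome.
toy_reads = {"taxon_A": 9000, "taxon_B": 1000}
toy_copies = {"taxon_A": 900, "taxon_B": 100}
print(copy_number_corrected(toy_reads, toy_copies))
```

In this toy case the two taxa end up with equal corrected proportions even though their raw read shares are 90% and 10%, which shows why raw read numbers are a poor proxy for individual counts.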
Functional metagenomics and functional metatranscriptomics: studying potential and actual functions of micro-eukaryotes in soil
Soil functional metagenomics
Soil functional metagenomics requires shearing of the total DNA isolated from a soil sample and cloning of the resulting DNA fragments of different sizes into a suitable vector to create a metagenomic DNA library (Fig. 1). The library is generally propagated in a bacterial host such as E. coli. This library represents the genetic content of all the microbes present in the soil sample. Depending on the objective, the library can be used to study various ecological and molecular aspects through different methods, including screening for desired genes. There are two broad approaches for screening a metagenomic library: (1) the sequence-driven approach and (2) the function-driven (metabolic activity-based) approach (Simon and Daniel 2011). The sequence-driven approach may require primers (for PCR) or probes (for hybridization) tailored to specific targets. These primers or probes are designed from conserved regions of genes or protein families, enabling the screening of metagenomic libraries for the desired sequence (Daniel 2005). High-throughput sequencing technologies gave another dimension to the sequence-driven approach for metagenomic libraries. Sequence-based analysis through high-throughput sequencing provides information on all the sequences present in the samples, and their functions can be predicted with the help of available bioinformatic tools. Sequence-based screening has the advantage that it does not depend on the expression of eukaryotic genes in a foreign host. The function-driven approach is a metabolic activity-based method that involves screening for clones that express a desired trait, followed by their identification and characterization via sequencing, protein expression, and molecular or biochemical analysis (Tripathi and Nailwal 2020). The major bottleneck of function-based screening of micro-eukaryotic genes is the selection of suitable substrates for the detection of positive clones. Function-based screening is nevertheless still widely used because it does not require prior sequence information. Hence, it is the only method to screen ‘novel’ eukaryotic genes encoding enzymes or proteins directly from the soil, whereas sequence-based screening is limited to genes for which the target sequence is partially or completely known.
The concept of functional metagenomics was initially proven through the sequencing of environmental genomes extracted from the water of the Sargasso Sea. This pioneering study revealed the metabolic diversity of microbial communities (Venter et al. 2004). Subsequent studies also provided information about expressed genes and proteins, their interacting protein partners, and novel genes and biocatalysts. Functional metagenomics has been successfully used to isolate and identify completely new protein families that form deeply branched evolutionary lineages (Ufarté et al. 2015). In one study, functional analysis using the Clusters of Orthologous Genes (COG) database identified sequences associated with a predicted transporter for divalent heavy-metal cations, a copper chaperone, and a metalloprotein responsible for delivering copper to superoxide dismutase. This metalloprotein plays a crucial role in antioxidant defense against oxidative stress (Passarini et al. 2022). The fungal community demonstrated metal tolerance through genes associated with the biosorption of heavy metals (Passarini et al. 2022). In addition, Forsberg et al. (2015) discovered two variants of thymidylate synthase, ThyA and ThyX, which conferred furfural tolerance on E. coli when expressed in a heterologous system. A phenolic acid decarboxylase (PadC) from the genus Pantoea conferred tolerance to ferulic acid. This enzyme is described as having high industrial value (Forsberg et al. 2015).
Although functional metagenomics is a powerful method to isolate novel genes or proteins/enzymes with biotechnological value, it cannot be used as a proxy to infer the ‘actual’ microbial activities in the soil (Yadav et al. 2016), such as the immediate effect of environmental factors on the activities of the soil micro-eukaryotic community. Furthermore, functional metagenomics largely favors the detection of archaeal and bacterial genes because micro-eukaryotic DNA constitutes only a minor portion of the total DNA obtained from environmental samples (Marmeisse et al. 2017). Eukaryotic coding sequences are further diluted by the large size of eukaryotic genomes, which is mainly due to the prevalence of intronic and intergenic regions (Bailly et al. 2007; Yadav et al. 2016). Moreover, because bacterial hosts cannot splice introns, only intron-less genes can be expressed efficiently from a metagenomic library. Using cDNA synthesized directly from eukaryotic transcripts can overcome these limitations of soil functional metagenomics.
Soil functional metatranscriptomics
Functional metatranscriptomics is a sub-area of metatranscriptomics that specifically targets the expressed protein-coding eukaryotic genes (Bailly et al. 2007). The process involves extracting total RNA from the samples collected from the environment (Fig. 1). Eukaryotic mRNA can be isolated from total RNA by targeting the poly-A tail, effectively removing ribosomal RNA, other non-coding RNAs, and bacterial mRNAs that are predominant in environmental metatranscriptomes (Grant et al. 2006; Bailly et al. 2007; Yadav et al. 2016; Marmeisse et al. 2017). These mRNA molecules are converted into complementary DNAs (cDNAs). These cDNAs can undergo direct sequencing using high-throughput methods or can be inserted into a suitable expression vector, like a yeast expression vector, for further analysis (Bailly et al. 2007). The environmental cDNA library that was created reflects the variety of genes expressed by various micro-eukaryotic species (Grant et al. 2006). One can investigate these libraries to discover new genes through heterologous expression in an appropriate model organism (Damon et al. 2011). The metatranscriptomic sequence data can be used to explore the functional aspects of the environment under study or can be screened to mine several potential genes or enzymes for biotechnological applications. The overview of a method to screen expressed genes from a specific soil environment by functional metatranscriptomics is illustrated in Fig. 2.
Fig. 2.
An overview of a method based on the functional metatranscriptomic approach to screen expressed novel genes from natural or anthropogenically stressed soil. Double-stranded cDNAs synthesized from micro-eukaryotic mRNAs extracted from soil are subjected either to direct metatranscriptome sequencing for sequence-based study or to library construction for function-based screening of genes. Metatranscriptome sequencing is followed by de novo assembly of transcripts and in silico annotation to predict their putative functions. The proteins/enzymes encoded by selected transcripts may be subjected to molecular and biochemical characterization before their application in industry. In function-based screening of genes, cDNAs may first be size fractionated by agarose gel electrophoresis. These size-fractionated cDNAs are then used to construct sized cDNA libraries in a suitable vector and host to avoid the limitations of a standard cDNA library. These libraries are screened for novel eukaryotic genes with desired functions by functional complementation in a suitable eukaryotic system. Screened genes are sequenced and subjected to protein expression before their molecular and biochemical characterization. These screened genes can then be used in suitable biotechnological applications.
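The in silico annotation step mentioned in the caption typically begins by locating candidate protein-coding regions in assembled transcripts. The sketch below illustrates that idea only: it scans the three forward reading frames for the longest open reading frame (ORF) that starts with ATG and ends at a stop codon. It ignores the reverse strand, alternative start codons, and partial ORFs, and the function name `longest_orf` and the `min_codons` threshold are assumptions made for this example rather than part of any cited pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript, min_codons=30):
    """Return the longest ATG...stop ORF found in the three forward frames.

    min_codons is the minimum ORF length in codons, counting the start
    and stop codons. An empty string is returned if no ORF qualifies.
    """
    seq = transcript.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) // 3 >= min_codons and len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


# Toy transcript: 2 nt of leader, a 7-codon ORF, 2 nt of trailer.
toy_transcript = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(longest_orf(toy_transcript, min_codons=3))
```

Predicted ORFs would then be translated and compared against protein databases to assign putative functions.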
Functional metatranscriptomics has been successfully applied to study expressed micro-eukaryotic genes from forest soil. Bailly et al. (2007) pioneered the application of functional metatranscriptomics to study soil from a pine forest. They identified histidine biosynthetic genes by complementing a histidine auxotrophic yeast mutant. Using a similar approach, a new group of eukaryotic oligopeptide transporters was identified through functional complementation of a yeast mutant unable to take up di/tripeptides (Damon et al. 2011). Extensive analysis of yeast mutants expressing environmental transporters using high-throughput phenotyping revealed their ability to interact with a wide range of substrates. These transporters were subsequently expressed in Xenopus oocytes for further study (Damon et al. 2011). A fungal gene from a metatranscriptomic library of sugar maple forest soil was identified through functional complementation of yeast mutants. This gene encodes both acid phosphatase and imidazole glycerol-phosphate dehydratase (Kellner et al. 2011). Forest soils harbor numerous carbohydrate-active enzymes (CAZymes) that participate in the degradation of lignocellulosic biomass. Kellner et al. (2014) used a functional metatranscriptomic approach to show the widespread occurrence of genes from litter- and coarse woody debris-dwelling fungi that are involved in the decomposition of organic matter via peroxidase secretion.
Functional metatranscriptomics has also been utilized for the study of expressed genes and enzymes in polluted soils. Lehembre et al. (2013) implemented this approach and were successful in isolating functional micro-eukaryotic genes participating in heavy metal resistance. Ziller et al. (2017) conducted the first comprehensive biochemical analysis of ‘environmental proteins’ encoded by micro-eukaryotic genes of unidentified taxonomic origin, which were obtained by screening with a functional metatranscriptomic approach. These environmental proteins were cysteine-rich proteins (CRP) that were shown to be involved in metal chelation and homeostasis. Using functional metatranscriptomics, Thakur et al. (2018) identified many cadmium (Cd)-tolerant genes, including a eukaryotic ubiquitin fusion protein (UFP), from heavy-metal-contaminated soil. Eukaryotic UFP proteins play important roles under abiotic stress conditions like heavy metal stress; hence, they can be further explored as biomarkers to detect heavy metal contamination. Mukherjee et al. (2019a) identified a gene encoding a metal-tolerant aldehyde dehydrogenase from soil contaminated with metals through functional complementation of a yeast mutant. A functional complementation assay was also employed to screen for metal tolerance, leading to the discovery of a new family of serine protease inhibitors (Serpin) (Mukherjee et al. 2019b). Thakur et al. (2019) discovered a new gene encoding the von Willebrand factor type D domain (VWD) of vitellogenin, which possesses antioxidant properties. They found this gene in metal-contaminated soil by screening a eukaryotic cDNA library using a yeast complementation assay. A further functional metatranscriptomic study revealed several active eukaryotic genes, including 38 unique genes showing high tolerance toward heavy metals (Thakur et al. 2022). Bragalini et al. (2014) introduced a novel version of solution hybrid selection (SHS) tailored for the effective retrieval of functional cDNAs derived from poly-A mRNAs extracted from soil. These cDNAs can be employed for comprehensive studies of eukaryotic gene families within microbial communities thriving in diverse environments. Thus, functional metatranscriptomics is a promising approach to study the functional aspects of micro-eukaryotes in different soil types and is leading to the discovery of new pathways, proteins, and enzymes with promising biotechnological potential. Furthermore, this method could be extensively utilized in microbial ecology to understand the involvement of active microorganisms under distinct environmental conditions. The functional metatranscriptomic studies on soil micro-eukaryotes are summarized in Table 3.
Table 3.
Functional metatranscriptomic-based studies on micro-eukaryotes of different soils
Sample site | cDNA preparation | Key findings | Citation |
---|---|---|---|
Agroforestry contaminated soil | Mint-2 cDNA synthesis kit (Evrogen) using poly-dT primer | Cadmium-tolerant cDNA clones | Thakur et al. (2022) |
Metal-contaminated forest soil | Mint-2 cDNA synthesis kit (Evrogen) using poly-dT primer | Serine protease inhibitor conferring metal tolerance | Mukherjee et al. (2019b) |
Metal-contaminated forest soil | Mint-2 cDNA synthesis kit (Evrogen) using poly-dT primer | Aldehyde dehydrogenase conferring metal tolerance | Mukherjee et al. (2019a) |
Agroforestry contaminated soil | SMART cDNA Library Construction Kit (Clontech) using poly-dT primer | VWD-like protein/ multi-metal tolerant | Thakur et al. (2019) |
Agroforestry contaminated soil | SMART cDNA Library Construction Kit (Clontech) using poly-dT primer | Ubiquitin fusion protein/heavy metal tolerance gene | Thakur et al. (2018) |
Soil near metal smelter | SMART cDNA Library Construction Kit (Clontech) using poly-dT primer | Metal-tolerant genes | Lehembre et al. (2013) |
Beech and spruce forest | SMART cDNA Library Construction Kit (Clontech) using poly-dT primer | Eukaryotic diversity, novel enzymes for organic matter degradation | Damon et al. (2012) |
Spruce forest | SMART cDNA Library Construction Kit (Clontech) using poly-dT primer | Novel fungal oligopeptide transporter | Damon et al. (2011) |
Sugar maple forest soil | SMART cDNA Library Construction Kit (Clontech) using poly-dT primer | Acid phosphatase, Imidazole glycerol phosphate dehydratase | Kellner et al. (2011) |
Sandy, nutrient-poor soil | Random primer | CO2-fixation and ammonia-oxidation enzymes | Urich et al. (2008) |
Forest soil | SMART cDNA Library Construction Kit (Clontech) | Eukaryotic diversity, novel enzymes | Bailly et al. (2007) |
Challenges of functional metatranscriptomics to study the actual functions of micro-eukaryotes in soil
Metatranscriptomic studies of soil face various methodological challenges that need to be resolved, even though they offer insights into active microbial communities and their functions. Obtaining high-quality RNA from soil is a significant challenge because humic acids and other organic compounds are co-extracted and can interfere with downstream steps such as PCR (Fraissinet-Tachet et al. 2013). Furthermore, poly-A-based capture of eukaryotic mRNA during library preparation has two drawbacks: it also captures polyadenylated prokaryotic mRNA (Sarkar 1997), and it misses eukaryotic mRNA lacking poly-A tails (Katinakis et al. 1980). To maintain RNA integrity, soil samples should be frozen in liquid nitrogen immediately and processed for RNA isolation as soon as possible to avoid degradation of the very small fraction of micro-eukaryotic mRNA (Yadav et al. 2016).
Obtaining a library enriched with full-length cDNA is another challenging aspect of functional metatranscriptomics, as it requires intact, high-quality RNA that can be converted and amplified into full-length cDNA. In addition, when constructing cDNA libraries, it is difficult to obtain cDNA clones with longer inserts, typically exceeding 1 kb. This difficulty arises from the preferential amplification of shorter cDNA molecules in the complex mixture of long and short cDNAs. Yadav et al. (2014) demonstrated an economical and highly efficient approach to create size-fractionated eukaryotic cDNA libraries from soil samples. After size fractionation of the cDNAs, long and short inserts are ligated to the vector and transformed separately, thereby increasing the chances of recovering cDNA clones with longer inserts.
Due to the intricate nature of the microbiome, metatranscriptome investigations, particularly those involving multiple samples and extensive coverage such as studies of differential gene expression, have traditionally relied on high-throughput short-read sequencing with Illumina technology. Determining suitable experimental parameters, such as sequencing depth, for metatranscriptomes is challenging because little is known in advance about the samples, including their microbial composition, the relative abundance of community members, genome sizes, and expression levels within and among genomes (Shakya et al. 2019). Owing to the shortage of suitable reference genomes, metatranscriptomic data are difficult to assemble, so only a suboptimal proportion of the reads in a dataset may be accurately annotated in terms of function or taxonomy. Therefore, to fully utilize the data, bioinformatic pipelines and database resources must be improved continuously. This calls for constant, reciprocal collaboration between computational and biological scientists.
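The dependence of sequencing depth on transcript abundance can be made explicit with a back-of-the-envelope estimate. The sketch below assumes that reads are sampled uniformly along a transcript and that the fraction of library reads originating from the transcript (`read_fraction`) is known, which is exactly the information that is usually missing for soil samples. The function name and all numbers are illustrative assumptions.

```python
import math


def reads_required(target_coverage, transcript_length, read_length, read_fraction):
    """Estimate the total library size needed to reach a mean coverage.

    Mean coverage of a transcript = (reads on transcript * read_length) / transcript_length,
    and reads on transcript = total reads * read_fraction.
    """
    if not 0 < read_fraction <= 1:
        raise ValueError("read_fraction must be in the interval (0, 1]")
    if read_length <= 0 or transcript_length <= 0:
        raise ValueError("lengths must be positive")
    reads_on_transcript = target_coverage * transcript_length / read_length
    return math.ceil(reads_on_transcript / read_fraction)


# Hypothetical case: 10x mean coverage of a 1,500-nt transcript with 150-nt
# reads, where the transcript contributes one read in a million.
print(reads_required(10, 1500, 150, 1e-6))
```

Under these assumptions a rare transcript requires on the order of a hundred million reads, which illustrates why low-abundance micro-eukaryotic mRNAs are easily missed at typical sequencing depths.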
Challenges associated with the expression of micro-eukaryotic genes in heterologous hosts
A limitation common to both functional metagenomics and functional metatranscriptomics is the expression of environmental genes in a heterologous host. Heterologous expression of environmental nucleic acids is heavily host dependent. E. coli is a widely accepted and frequently used host for the economical and efficient high-level production of numerous proteins. However, the chances of recovering many proteins or enzymes may be markedly reduced when bacterial systems are used to express eukaryotic genes in metagenomic or metatranscriptomic studies. Differences in codon usage, promoter regulation/activation, and RNA processing/translation naturally restrict the efficient functional expression of numerous eukaryotic genes in prokaryotes (Hannig and Makrides 1998). Yeasts are commonly used for the expression and screening of eukaryotic genes. However, a single eukaryotic system is unlikely to express efficiently the plethora of genes derived from an assemblage of eukaryotic genomes of diverse phylogenetic origin. Screening and characterization of micro-eukaryotic genes in different systems are required to explore the complete set of genes possessed by these organisms. In addition, highly efficient eukaryotic host systems and strains need to be developed that can express eukaryotic proteins more accurately and easily (Da Silva and Srikrishnan 2012). Such systems could express functionally diverse heterologous eukaryotic proteins with accurate catalytic or structural properties, which would help to determine their precise function.
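Codon usage differences between an environmental gene and a candidate host can be screened computationally before cloning. The following is a minimal sketch of one such check: it reports the fraction of codons in a coding sequence that the host uses rarely. The usage table `TOY_HOST_USAGE` contains invented values for illustration and is not a real codon usage table for E. coli or yeast. The threshold of 0.1 is likewise an arbitrary assumption.

```python
def split_codons(cds):
    """Split a coding sequence into complete codons, dropping any trailing bases."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def rare_codon_fraction(cds, host_usage, threshold=0.1):
    """Fraction of codons whose usage frequency in the host is below the threshold.

    Codons absent from host_usage are treated as having frequency 0.0,
    that is, as rare.
    """
    codons = split_codons(cds)
    if not codons:
        return 0.0
    rare = sum(1 for codon in codons if host_usage.get(codon, 0.0) < threshold)
    return rare / len(codons)


# Invented host usage frequencies (illustration only, not measured data).
TOY_HOST_USAGE = {"ATG": 0.5, "AGA": 0.01, "GCT": 0.3}

toy_cds = "ATGAGAAGAGCT"  # codons: ATG AGA AGA GCT
print(rare_codon_fraction(toy_cds, TOY_HOST_USAGE))
```

A high rare-codon fraction would suggest that the gene may express poorly in that host and that a different host or codon optimization should be considered.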
Conclusion and future perspective
Soil metagenomics and metatranscriptomics have a considerable effect on how the soil micro-eukaryotic world is being viewed and studied now. The vast diversity and ecological functions of micro-eukaryotic world of soil from different natural zones are being determined using different methods under these approaches in a more efficient manner. These studies revealed the links among the environmental factors, community structure of micro-eukaryotes, and their specific activities in soil. With respect to the ecological role of micro-eukaryotes, such as lignocellulosic biomass degradation, metal resistance etc., functional metatranscriptomic studies should be performed in parallel with the functional metagenomics to reveal not only active genes but also potential genes and their putative functions in the soil of different climatic zones and conditions. Metagenomics and metatranscriptomics through high-throughput sequencing have opened the door to understand these organisms in a more refined way, yet many technical and conceptual issues need to be overcome. We can anticipate new insights into the variety and roles of micro-eukaryotes within the soil ecosystem. This demands more effort, curiosity, and developments in these fields.
Acknowledgements
The authors are thankful to the Science and Engineering Research Board (SERB), New Delhi, for financial support under project No. YSS/2015/000218.
Declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
References
- Adl SM, Gupta VVSR. Protists in soil ecology and forest nutrient cycling. Can J For Res. 2006;36(7):1805–1817. doi: 10.1139/X06-056.
- Adl SM, Simpson AGB, Lane CE, et al. The revised classification of eukaryotes. J Eukaryot Microbiol. 2012;59:429–514. doi: 10.1111/j.1550-7408.2012.00644.x.
- Bailey VL, Smith JL, Bolton H. Fungal-to-bacterial ratios in soils investigated for enhanced C sequestration. Soil Biol Biochem. 2002;34(7):997–1007. doi: 10.1016/S0038-0717(02)00033-0.
- Bailly J, Fraissinet-Tachet L, Verner MC, Debaud JC, Lemaire M, Wésolowski-Louvel M, Marmeisse R. Soil eukaryotic functional diversity, a metatranscriptomic approach. ISME J. 2007;1(7):632–642. doi: 10.1038/ismej.2007.68.
- Bang-Andreasen T, Anwar MZ, Lanzen A, Kjoller R, Ronn R, Ekelund F, Jacobsen CS. Total RNA sequencing reveals multilevel microbial community changes and functional responses to wood ash application in agricultural and forest soil. FEMS Microbiol Ecol. 2020;96(3):fiaa016. doi: 10.1093/femsec/fiaa016.
- Bardgett RD, Van Der Putten WH. Belowground biodiversity and ecosystem functioning. Nature. 2014;515(7528):505–511. doi: 10.1038/nature13855.
- Bates ST, Clemente JC, Flores GE, Walters WA, Parfrey LW, Knight R, Fierer N. Global biogeography of highly diverse protistan communities in soil. ISME J. 2013;7(3):652–659. doi: 10.1038/ismej.2012.147.
- Batista-García RA, del Rayo S-C, Talia P, Jackson SA, O’Leary ND, Dobson ADW, Folch-Mallol JL. From lignocellulosic metagenomes to lignocellulolytic genes: trends, challenges and future prospects. Biofuels Bioprod Biorefin. 2016;10(6):864–882. doi: 10.1002/bbb.1709.
- Berney C, Henry N, Mahé F. EukRibo: a manually curated eukaryotic 18S rDNA reference database to facilitate identification of new diversity. bioRxiv. 2022.
- Bik HM, Porazinska DL, Creer S, Caporaso JG, Knight R, Thomas WK. Sequencing our way towards understanding global eukaryotic biodiversity. Trends Ecol Evol. 2012;27(4):233–243. doi: 10.1016/j.tree.2011.11.010.
- Bonkowski M. Protozoa and plant growth: The microbial loop in soil revisited. New Phytol. 2004;162(3):617–631. doi: 10.1111/j.1469-8137.2004.01066.x.
- Bonkowski M, Schaefer M. Interactions between earthworms and soil protozoa: a trophic component in the soil food web. Soil Biol Biochem. 1997;29(3–4):499–502. doi: 10.1016/S0038-0717(96)00107-1.
- Bragalini C, Ribière C, Parisot N, et al. Solution hybrid selection capture for the recovery of functional full-length eukaryotic cDNAs from complex environmental samples. DNA Res. 2014;21(6):685–694. doi: 10.1093/dnares/dsu030.
- Cambou A, Cardinael R, Kouakoua E, Villeneuve M, Durand C, Barthès BG. Prediction of soil organic carbon stock using visible and near infrared reflectance spectroscopy (VNIRS) in the field. Geoderma. 2016;261:151–159. doi: 10.1016/j.geoderma.2015.07.007.
- Caporaso JG, Kuczynski J, Stombaugh J, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7(5):335–336. doi: 10.1038/nmeth.f.303.
- Capra E, Giannico R, Montagna M, Turri F, Cremonesi P, Strozzi F, Leone P, Gandini G, Pizzi F. A new primer set for DNA metabarcoding of soil Metazoa. Eur J Soil Biol. 2016;77:53–59. doi: 10.1016/j.ejsobi.2016.10.005.
- Carini P, Marsden PJ, Leff JW, Morgan EE, Strickland MS, Fierer N. Relic DNA is abundant in soil and obscures estimates of soil microbial diversity. Nat Microbiol. 2016;2:16242. doi: 10.1038/nmicrobiol.2016.242.
- Carmon N, Ben-Dor E. An advanced analytical approach for spectral-based modelling of soil properties. Int J Emerg Technol Adv Eng. 2017;7(3):90–97.
- Chen YL, Jian Z, Yufeng Z, et al. Parallel-meta suite: interactive and rapid microbiome data analysis on multiple platforms. iMeta. 2022;1:e1. doi: 10.1002/imt2.1.
- Cole JR, Wang Q, Fish JA, et al. Ribosomal Database Project: data and tools for high throughput rRNA analysis. Nucleic Acids Res. 2014;42(Database issue):D633–D642. doi: 10.1093/nar/gkt1244.
- Coleman DC, Whitman WB. Linking species richness, biodiversity and ecosystem function in soil systems. Pedobiologia. 2005;49(6):479–497. doi: 10.1016/j.pedobi.2005.05.006.
- Cooper GM. The cell: a molecular approach. 2nd ed. Sunderland (MA): Sinauer Associates; 2000. The development and causes of cancer. https://www.ncbi.nlm.nih.gov/books/NBK9963/
- Creer S, Fonseca VG, Porazinska DL, et al. Ultrasequencing of the meiofaunal biosphere: practice, pitfalls and promises. Mol Ecol. 2010;19:4–20. doi: 10.1111/j.1365-294X.2009.04473.x.
- Da Silva NA, Srikrishnan S. Introduction and expression of genes for metabolic engineering applications in Saccharomyces cerevisiae. FEMS Yeast Res. 2012;12(2):197–214. doi: 10.1111/j.1567-1364.2011.00769.x.
- Damon C, Vallon L, Zimmermann S, Haider MZ, Galeote V, Dequin S, Luis P, Fraissinet-Tachet L, Marmeisse R, Lyon D. A novel fungal family of oligopeptide transporters identified by functional metatranscriptomics of soil eukaryotes. ISME J. 2011;5:1871–1880. doi: 10.1038/ismej.2011.67.
- Damon C, Lehembre F, Oger-Desfeux C, Luis P, Ranger J, Fraissinet-Tachet L, Marmeisse R. Metatranscriptomics reveals the diversity of genes expressed by eukaryotes in forest soils. PLoS ONE. 2012;7(1):e28967. doi: 10.1371/journal.pone.0028967.
- Daniel R. The metagenomics of soil. Nat Rev Microbiol. 2005;3(6):470–478. doi: 10.1038/nrmicro1160.
- Debroas D, Domaizon I, Humbert JF, Jardillier L, Lepére C, Oudart A, Taib N. Overview of freshwater microbial eukaryotes diversity: a first analysis of publicly available metabarcoding data. FEMS Microbiol Ecol. 2017;93(4):1–14. doi: 10.1093/femsec/fix023.
- del Campo J, Kolisko M, Boscaro V, et al. EukRef: phylogenetic curation of ribosomal RNA to enhance understanding of eukaryotic diversity and distribution. PLoS Biol. 2018;16(9):e2005849. doi: 10.1371/journal.pbio.2005849.
- Dotto AC, Dalmolin RSD, Grunwald S, ten Caten A, Filho PW. Two preprocessing techniques to reduce model covariables in soil property predictions by Vis-NIR spectroscopy. Soil Tillage Res. 2017;172(May):59–68. doi: 10.1016/j.still.2017.05.008.
- Ducklow H. Microbial services: Challenges for microbial ecologists in a changing world. Aquat Microb Ecol. 2008;53(1):13–19. doi: 10.3354/ame01220.
- Escalas A, Hale L, Voordeckers JW, Yang Y, Firestone MK, Alvarez-Cohen L, Zhou J. Microbial functional diversity: From concepts to applications. Ecol Evol. 2019;9(20):12000–12016. doi: 10.1002/ECE3.5670.
- Escobar-Zepeda A, Vera-Ponce de León A, Sanchez-Flores A. The road to metagenomics: from microbiology to DNA sequencing technologies and bioinformatics. Front Genet. 2015;6(Dec):1–15. doi: 10.3389/fgene.2015.00348.
- Espinoza JL, Dupont CL. VEBA: a modular end-to-end suite for in silico recovery, clustering, and analysis of prokaryotic, microeukaryotic, and viral genomes from metagenomes. BMC Bioinformatics. 2022;23:419. doi: 10.1186/s12859-022-04973-8.
- Esteban GF, Clarke KJ, Olmo JL, Finlay BJ. Soil protozoa-An intensive study of population dynamics and community structure in an upland grassland. Appl Soil Ecol. 2006;33(2):137–151. doi: 10.1016/j.apsoil.2005.07.011.
- Falkowski PG, Fenchel T, Delong EF. The microbial engines that drive earth’s biogeochemical cycles. Science. 2008;320(5879):1034–1039. doi: 10.1126/science.1153213.
- Ferris H. Form and function: Metabolic footprints of nematodes in the soil food web. Eur J Soil Biol. 2010;46(2):97–104. doi: 10.1016/j.ejsobi.2010.01.003.
- Forsberg KJ, Patel S, Witt E, Wang B, Ellison T, Dantas G. Identification of genes conferring tolerance to lignocellulose-derived inhibitors by functional selections in soil metagenomes. Appl Environ Microbiol. 2015;82(2):528–537. doi: 10.1128/AEM.02838-15.
- Fraissinet-Tachet L, Marmeisse R, Zinger L, Luis P. Metatranscriptomics of soil eukaryotic communities. In: Martin F, editor. The ecological genomics of fungi. Hoboken: Wiley; 2013.
- Frioux C, Singh D, Korcsmaros T, Hildebrand F. From bag-of-genes to bag-of-genomes: metabolic modelling of communities in the era of metagenome-assembled genomes. Comput Struct Biotechnol J. 2020. doi: 10.1016/j.csbj.2020.06.028.
- Gattoni K, Gendron EMS, Sandoval-Ruiz R, et al. 18S-NemaBase: Curated 18S rRNA Database of Nematode Sequences. J Nematol. 2023;55(1):3923. doi: 10.2478/jofnem-2023-0006.
- Gauthier GM, Keller NP. Crossover fungal pathogens: the biology and pathogenesis of fungi capable of crossing kingdoms to infect plants and humans. Fungal Genet Biol. 2013;61:146–157. doi: 10.1016/j.fgb.2013.08.016.
- Geisen S, Tveit AT, Clark IM, Richter A, Svenning MM, Bonkowski M, et al. Metatranscriptomic census of active protists in soils. ISME J. 2015;9:2178–2190. doi: 10.1038/ismej.2015.30.
- Gilbert J, Li LL, Taghavi S, McCorkle SM, Tringe S, Van Der Lelie D. Bioprospecting metagenomics for new glycoside hydrolases. In: Himmel M, editor. Biomass conversion. Methods in molecular biology, vol 908. 2012. pp 141–151. doi: 10.1007/978-1-61779-956-3_14.
- Giller PS, Hillebrand H, Berninger UG, et al. Biodiversity effects on ecosystem functioning: Emerging issues and their experimental test in aquatic environments. Oikos. 2004;104(3):423–436. doi: 10.1111/j.0030-1299.2004.13253.x.
- Grant S, Grant WD, Cowan DA, Jones BE, Ma Y, Ventosa A, Heaphy S. Identification of eukaryotic open reading frames in metagenomic cDNA libraries made from environmental samples. Appl Environ Microbiol. 2006;72(1):135–143. doi: 10.1128/AEM.72.1.135-143.2006.
- Guillou L, Bachar D, Audic S, et al. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote Small Sub-Unit rRNA sequences with curated taxonomy. Nucleic Acids Res. 2013;41(D1):D597–D604. doi: 10.1093/nar/gks1160.
- Gutierrez P, Bulman S, Alzate J, Ortiz MC, Marin M. Mitochondrial genome sequence of the potato powdery scab pathogen Spongospora subterranea. Mitochondrial DNA. 2016;27(1):58–59. doi: 10.3109/19401736.2013.873898.
- Hannig G, Makrides SC. Strategies for optimizing heterologous protein expression in Escherichia coli. Trends Biotechnol. 1998;16(2):54–60. doi: 10.1016/s0167-7799(97)01155-4.
- Hawksworth DL, Lücking R. Fungal diversity revisited: 2.2 to 3.8 million species. Microbiol Spectrum. 2017;5(4):10. doi: 10.1128/microbiolspec.FUNK-0052-2016.
- Hedges SB, Blair JE, Venturi ML, Shoe JL. A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol Biol. 2004;4:2. doi: 10.1186/1471-2148-4-2.
- Heger TJ, Derungs N, Theurillat JP, Mitchell EAD. Testate amoebae like it hot: species richness decreases along a subalpine-alpine altitudinal gradient in both natural Calluna vulgaris litter and transplanted Minuartia sedoides cushions. Microb Ecol. 2016;71(3):725–734. doi: 10.1007/s00248-015-0687-3.
- Hempel CA, Wright N, Harvie J, Hleap JS, Adamowicz SJ, Steinke D. Metagenomics versus total RNA sequencing: most accurate data-processing tools, microbial identification accuracy and perspectives for ecological assessments. Nucleic Acids Res. 2022;50(16):9279–9293. doi: 10.1093/nar/gkac689.
- Hugerth LW, Muller EEL, Hu Y, Lebrun LAM, Roume H, Lundin D, Wilmes P, Andersson AF. Systematic design of 18S rRNA gene primers for determining eukaryotic diversity in microbial consortia. PLoS ONE. 2014;9(4):e95567. doi: 10.1371/journal.pone.0095567.
- Imchen M, Kumavath R, Barh D, Azevedo V, Ghosh P, Viana M, Wattam AR. Searching for signatures across microbial communities: metagenomic analysis of soil samples from mangrove and other ecosystems. Sci Rep. 2017;7(1):1–13. doi: 10.1038/s41598-017-09254-6.
- Kang DD, Li F, Kirton E, et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 2019;7:e7359. doi: 10.7717/peerj.7359.
- Katinakis PK, Slater A, Burdon RH. Non-polyadenylated mRNAs from eukaryotes. FEBS Lett. 1980. doi: 10.1016/0014-5793(80)80515-1.
- Kellner H, Luis P, Portetelle D, Vandenbol M. Screening of a soil metatranscriptomic library by functional complementation of Saccharomyces cerevisiae mutants. Microbiol Res. 2011;166(5):360–368. doi: 10.1016/j.micres.2010.07.006.
- Kellner H, Luis P, Pecyna MJ, Barbi F, Kapturska D, Krüger D, Zak DR, Marmeisse R, Vandenbol M, Hofrichter M. Widespread occurrence of expressed fungal secretory peroxidases in forest soils. PLoS ONE. 2014;9(4):e95557. doi: 10.1371/journal.pone.0095557.
- Kenmotsu H, Uchida K, Hirose Y, Eki T. Taxonomic profiling of individual nematodes isolated from copse soils using deep amplicon sequencing of four distinct regions of the 18S ribosomal RNA gene. PLoS ONE. 2020;15(10):e0240336. doi: 10.1371/journal.pone.0240336.
- Lal R, Moldenhauer WC. Effects of soil erosion on crop productivity. Crit Rev Plant Sci. 1987;5(4):303–367. doi: 10.1080/07352688709382244.
- Lamarque P, Lavorel S, Mouchet M, Quétier F. Plant trait-based models identify direct and indirect effects of climate change on bundles of grassland ecosystem services. Proc Natl Acad Sci USA. 2014;111(38):13751–13756. doi: 10.1073/pnas.1216051111.
- Larsen BB, Miller EC, Rhodes MK, Wiens JJ. Inordinate fondness multiplied and redistributed: the number of species on earth and the new pie of life. Q Rev Biol. 2017;92(3):229–265. doi: 10.1086/693564.
- Lehembre F, Doillon D, David E, et al. Soil metatranscriptomics for mining eukaryotic heavy metal resistance genes. Environ Microbiol. 2013;15:2829–2840. doi: 10.1111/1462-2920.12143.
- Lesaulnier C, Papamichail D, McCorkle S, Ollivier B, Skiena S, Taghavi S, Zak D, Van Der Lelie D. Elevated atmospheric CO2 affects soil microbial diversity associated with trembling aspen. Environ Microbiol. 2008;10(4):926–941. doi: 10.1111/j.1462-2920.2007.01512.x.
- Levy Karin E, Mirdita M, Söding J. MetaEuk: sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics. Microbiome. 2020;8(1):48. doi: 10.1186/s40168-020-00808-x.
- Li D, Liu C-M, Luo R, et al. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674–1676. doi: 10.1093/bioinformatics/btv033.
- Lind AL, Pollard KS. Accurate and sensitive detection of microbial eukaryotes from whole metagenome shotgun sequencing. Microbiome. 2021;9:58. doi: 10.1186/s40168-021-01015-y.
- Liu T, Li CM, Han YL, Chiang TY, Chiang YC, Sung HM. Highly diversified fungi are associated with the achlorophyllous orchid Gastrodia flavilabella. BMC Genom. 2015;16(1):185. doi: 10.1186/s12864-015-1422-7.
- Loreau M, Naeem S, Inchausti P, et al. Biodiversity and ecosystem functioning: current knowledge and future challenges. Science. 2001;294(5543):804–808. doi: 10.1126/science.1064088.
- Luo R, Liu B, Xie Y, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience. 2012;1(1):18. doi: 10.1186/2047-217X-1-18.
- Madigan MT, Martinko JM, Parker J. Brock biology of microorganisms. 8th ed. New York: Prentice Hall International Inc; 1997.
- Marcelino VR, Clausen PTLC, Buchmann JP, et al. CCMetagen: comprehensive and accurate identification of eukaryotes and prokaryotes in metagenomic data. Genome Biol. 2020;21(1):103. doi: 10.1186/s13059-020-02014-2.
- Marmeisse R, Kellner H, Fraissinet-Tachet L, Luis P. Discovering protein-coding genes from the environment: time for the eukaryotes? Trends Biotechnol. 2017;35(9):824–835. doi: 10.1016/j.tibtech.2017.02.003.
- Martinez X, Pozuelo M, Pascal V, et al. MetaTrans: an open-source pipeline for metatranscriptomics. Sci Rep. 2016;6:26447. doi: 10.1038/srep26447.
- Meyer F, Paarmann D, D'Souza M, et al. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinform. 2008;9:386. doi: 10.1186/1471-2105-9-386.
- Mouillot D, Villéger S, Scherer-Lorenzen M, Mason NWH. Functional structure of biological communities predicts ecosystem multifunctionality. PLoS ONE. 2011. doi: 10.1371/journal.pone.0017476.
- Mukherjee A, Yadav R, Marmeisse R, Fraissinet-Tachet L, Reddy MS. Heavy metal hypertolerant eukaryotic aldehyde dehydrogenase isolated from metal contaminated soil by metatranscriptomics approach. Biochimie. 2019;160:183–192. doi: 10.1016/j.biochi.2019.03.010.
- Mukherjee A, Yadav R, Marmeisse R, Fraissinet-Tachet L, Reddy MS. Detoxification of toxic heavy metals by serine protease inhibitor isolated from polluted soil. Int Biodeterior Biodegrad. 2019;143:104718. doi: 10.1016/j.ibiod.2019.104718.
- Nazir A. Review on Metagenomics and its Applications. Imp J Interdiscip Res. 2016;2(3):277–286.
- Niederberger TD, Bottos EM, Sohm JA, Gunderson T, Parker A, Coyne KJ, Capone DG, Carpenter EJ, Cary SC. Rapid microbial dynamics in response to an induced wetting event in Antarctic dry valley soils. Front Microbiol. 2019;10:621. doi: 10.3389/fmicb.2019.00621.
- Nilsson RH, Larsson K-H, Taylor AFS, et al. The UNITE database for molecular identification of fungi: handling dark taxa and parallel taxonomic classifications. Nucleic Acids Res. 2019;47(D1):D259–D264. doi: 10.1093/nar/gky1022.
- Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017;27(5):824–834. doi: 10.1101/gr.213959.116.
- Olm MR, West PT, Brooks B, Firek BA, Baker R, Morowitz MJ, Banfield JF. Genome-resolved metagenomics of eukaryotic populations during early colonization of premature infants and in hospital rooms. Microbiome. 2019;7(1):1–16. doi: 10.1186/s40168-019-0638-1.
- Passarini MRZ, Ottoni JR, Costa PEd, et al. Fungal community diversity of heavy metal contaminated soils revealed by metagenomics. Arch Microbiol. 2022;204:255. doi: 10.1007/s00203-022-02860-7.
- Pawlowski J, Audic S, Adl S, et al. CBOL Protist working group: barcoding eukaryotic richness beyond the animal, plant, and fungal kingdoms. PLoS Biol. 2012;10(11):e1001419. doi: 10.1371/journal.pbio.1001419.
- Peano C, Pietrelli A, Consolandi C, Rossi E, Petiti L, Tagliabue L, De Bellis G, Landini P. An efficient rRNA removal method for RNA sequencing in GC-rich bacteria. Microb Inform Exp. 2013;3(1):1. doi: 10.1186/2042-5783-3-1.
- Pérez-Cobas AE, Gomez-Valero L, Buchrieser C. Metagenomic approaches in microbial ecology: an update on whole-genome and marker gene sequencing analyses. Microbial Genom. 2020. doi: 10.1099/mgen.0.000409.
- Praeg N, Pauli H, Illmer P. Microbial diversity in bulk and rhizosphere soil of Ranunculus glacialis along a high-alpine altitudinal gradient. Front Microbiol. 2019. doi: 10.3389/fmicb.2019.01429.
- Prokopowich CD, Gregory TR, Crease TJ. The correlation between rDNA copy number and genome size in eukaryotes. Genome. 2003;46(1):48–50. doi: 10.1139/g02-103.
- Quast C, Pruesse E, Yilmaz P, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucl Acids Res. 2013;41:D590–D596. doi: 10.1093/nar/gks1219.
- Reczuga MK, Seppey CVW, Mulot M, et al. Assessing the responses of Sphagnum micro-eukaryotes to climate changes using high throughput sequencing. PeerJ. 2020;8:1–26. doi: 10.7717/peerj.9821.
- Riesenfeld CS, Schloss PD, Handelsman J. Metagenomics: genomic analysis of microbial communities. Annu Rev Genet. 2004;38:525–552. doi: 10.1146/annurev.genet.38.072902.091216.
- Sabale SN, Suryawanshi PP, Krishnaraj PU. Soil metagenomics: concepts and applications. In: Metagenomics: basics, methods and applications (doi: 10.5772/intechopen.78746), Chapter 2. 2020. doi: 10.5772/intechopen.88958.
- Salih WY, Hassan FM. Environmental diagnosing of the new algal pollution of Tigris River in Iraq. IOP Conf Ser Earth Environ Sci. 2021;877:012024. doi: 10.1088/1755-1315/877/1/012024.
- Sapkota R, Nicolaisen M. High-throughput sequencing of nematode communities from total soil DNA extractions. BMC Ecol. 2015;15(1):3. doi: 10.1186/s12898-014-0034-4.
- Sarkar N. Polyadenylation of mRNA. Annu Rev Biochem. 1997;66(1):173–197. doi: 10.1146/annurev.biochem.66.1.173.
- Schloss PD, Westcott SL, Ryabin T, et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009;75(23):7537–7541. doi: 10.1128/AEM.01541-09.
- Schmeisser C, Steele H, Streit WR. Metagenomics, biotechnology with non-culturable microbes. Appl Microbiol Biotechnol. 2007;75(5):955–962. doi: 10.1007/s00253-007-0945-5.
- Schöler A, Jacquiod S, Vestergaard G, Schulz S, Schloter M. Analysis of soil microbial communities based on amplicon sequencing of marker genes. Biol Fertil Soils. 2017;53(5):485–489. doi: 10.1007/s00374-017-1205-1.
- Schowalter TD. Ecosystem structure and function. In: Insect ecology. Elsevier; 2022. pp 519–566. doi: 10.1016/B978-0-323-85673-7.00004-6.
- Seok YJ, Song E, Cha I, Lee H, Roh SW, Jung JY, Lee YK, Nam YD, Seo MJ. Microbial community of the Arctic soil from the glacier foreland of Midtre Lovenbreen in Svalbard by metagenome analysis. Microbiol Biotechnol Lett. 2016;44(2):171–179. doi: 10.4014/mbl.1601.01003.
- Shakya M, Lo C, Chain PS. Advances and Challenges in metatranscriptomic analysis. Front Genet. 2019;10:904. doi: 10.3389/fgene.2019.00904.
- Sharpton TJ. An introduction to the analysis of shotgun metagenomic data. Front Plant Sci. 2014. doi: 10.3389/fpls.2014.00209.
- Simon C, Daniel R. Metagenomic analyses: Past and future trends. Appl Environ Microbiol. 2011;77(4):1153–1161. doi: 10.1128/AEM.02345-10.
- Siński E, Behnke JM. Apicomplexan parasites: environmental contamination and transmission. Pol J Microbiol. 2004;53:67–73.
- Thakur B, Yadav RK, Fraissinet-Tachet L, Marmeisse R, Reddy MS. Isolation of multi-metal tolerant ubiquitin fusion protein from metal polluted soil by metatranscriptomic approach. J Microbiol Methods. 2018. doi: 10.1016/j.mimet.2018.08.001.
- Thakur B, Yadav RK, Vallon L, Marmeisse R, Fraissinet-Tachet L, Reddy MS. Multi-metal tolerance of von Willebrand factor type D domain isolated from metal contaminated site by metatranscriptomics approach. Sci Total Environ. 2019;661:432–440. doi: 10.1016/j.scitotenv.2019.01.201.
- Thakur B, Yadav RK, Marmeisse R, Prashanth S, Krishnamohan M, Fraissinet-Tachet L, Reddy MS. Metagenomic analysis of heavy metal-contaminated soils reveals distinct clades with adaptive features. Int J Environ Sci Technol. 2022. doi: 10.1007/s13762-022-04635-5.
- Torsvik V, Øvreås L. Microbial diversity and function in soil: from genes to ecosystems. Curr Opin Microbiol. 2002;5(3):240–245. doi: 10.1016/S1369-5274(02)00324-7.
- Torti A, Lever MA, Jørgensen BB. Origin, dynamics, and implications of extracellular DNA pools in marine sediments. Mar Genom. 2015;24:185–196. doi: 10.1016/j.margen.2015.08.007.
- Tripathi LK, Nailwal TK. Metagenomics: applications of functional and structural approaches and meta-omics. In: Recent advancements in microbial diversity. Elsevier Inc; 2020. doi: 10.1016/b978-0-12-821265-3.00020-7.
- Tveit A, Schwacke R, Svenning MM, Urich T. Organic carbon transformations in high-Arctic peat soils: key functions and microorganisms. ISME J. 2013;7:299–311. doi: 10.1038/ismej.2012.99.
- Ufarté L, Potocki-Veronese G, Laville É. Discovery of new protein families and functions: new challenges in functional metagenomics for biotechnologies and microbial ecology. Front Microbiol. 2015;6:563. doi: 10.3389/fmicb.2015.00563.
- Uhlik O, Leewis MC, Strejcek M, Musilova L, Mackova M, Leigh MB, Macek T. Stable isotope probing in the metagenomics era: a bridge towards improved bioremediation. Biotechnol Adv. 2013;31(2):154–165. doi: 10.1016/j.biotechadv.2012.09.003.
- Urich T, Lanzén A, Qi J, Huson DH, Schleper C, Schuster SC, Ward N. Simultaneous assessment of soil microbial community structure and function through analysis of the meta-transcriptome. PLoS ONE. 2008;3(6):e2527. doi: 10.1371/journal.pone.0002527.
- van den Hoogen J, Geisen S, Wall DH, et al. A global database of soil nematode abundance and functional group composition. Sci Data. 2020;7(1):103. doi: 10.1038/s41597-020-0437-3.
- Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304(5667):66–74. doi: 10.1126/science.1093857.
- Wardle DA. The influence of biotic interactions on soil biodiversity. Ecol Lett. 2006;9(7):870–886. doi: 10.1111/j.1461-0248.2006.00931.x.
- Westermann AJ, Gorski SA, Vogel J. Dual RNA-seq of pathogen and host. Nat Rev Microbiol. 2012;10(9):618–630. doi: 10.1038/nrmicro2852.
- Wilkinson M, Dumontier M, Aalbersberg I, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3:160018. doi: 10.1038/sdata.2016.18.
- Woo SL, De Filippis F, Zotti M, Vandenberg A, Hucl P, Bonanomi G. Pea-wheat rotation affects soil microbiota diversity, community structure, and soilborne pathogens. Microorganisms. 2022;10(2):1–12. doi: 10.3390/microorganisms10020370.
- Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32(4):605–607. doi: 10.1093/bioinformatics/btv638.
- Yadav RK, Barbi F, Ziller A, Luis P, Marmeisse R, Reddy MS, Fraissinet-Tachet L. Construction of sized eukaryotic cDNA libraries using low input of total environmental metatranscriptomic RNA. BMC Biotechnol. 2014;14(1):1–6. doi: 10.1186/1472-6750-14-80.
- Yadav RK, Bragalini C, Fraissinet-Tachet L, Marmeisse R, Luis P. Metatranscriptomics of soil eukaryotic communities. In: Methods in molecular biology, vol 1399. 2016. pp 273–287. doi: 10.1007/978-1-4939-3369-3_16.
- Zancan S, Trevisan R, Paoletti MG. Soil algae composition under different agro-ecosystems in North-Eastern Italy. Agr Ecosyst Environ. 2006;112(1):1–12. doi: 10.1016/j.agee.2005.06.018.
- Zhang Y, Gui H, Zhang S, Li C. Diversity and potential function of prokaryotic and eukaryotic communities from different mangrove sediments. Sustainability. 2022;14(6):3333. doi: 10.3390/su14063333.
- Ziller A, Yadav RK, Capdevila M, Reddy MS, Vallon L, Marmeisse R, Atrian S, Palacios Ò, Fraissinet-Tachet L. Metagenomics analysis reveals a new metallothionein family: sequence and metal-binding features of new environmental cysteine-rich proteins. J Inorg Biochem. 2017;167:1–11. doi: 10.1016/j.jinorgbio.2016.11.017.