Synthetic and Systems Biotechnology. 2025 Jul 21;10(4):1377–1387. doi: 10.1016/j.synbio.2025.07.006

Gene Surfing: An efficient and versatile tool for targeted enzyme mining in metagenomics

Tong Xu a, Danyang Huang a, Tingting Huang a, Yuxin Wang a, Wanqiu Chen a, Shijunyin Chen a,b, Yurong Qian c,d, Haitao Yue a,b
PMCID: PMC12396413  PMID: 40893472

Abstract

Microbial community studies have established enzymes' pivotal catalytic roles in ecosystem metabolism, yet cultivation-dependent methods fail to exploit uncultured microbial enzyme resources. Metagenomics overcomes this by directly accessing microbial genetic information, but the massive data volumes it generates make precise enzyme identification difficult, with two main limitations: (1) restricted applicability across varied sample types, and (2) a narrow functional scope in target enzyme discovery.

To address this, we developed Gene Surfing, a bioinformatics workflow platform based on Snakemake. It integrates modules for data quality control (Fastp), genome assembly (MEGAHIT), assembly evaluation (QUAST and MetaQUAST), functional annotation (Prokka), and homologous sequence retrieval (MMseqs2). Gene Surfing offers scalability, reproducibility, and efficiency, addressing key challenges in enzyme identification. Validation results include: Cellulose-degrading enzymes (GH5 family): 1,311,316 potential lignocellulolytic enzyme sequences were identified, with 127 sequences functionally validated (84.25 % activity rate); Polyethylene-degrading enzymes: 705 candidate sequences were found, 38 of which were heterologously expressed, showing an 81.5 % activity rate (31/38); Endonucleases (HNH superfamily): 585 potential sequences were retrieved, with 4 out of 7 tested showing activity (57.1 % success rate).

Keywords: Bioinformatics workflow, Snakemake workflow, Metagenomic sequencing, Target gene identification, Heterologous expression

Highlights

  • A Snakemake bioinformatics workflow was developed for targeted identification of genomic target genes.

  • Target enzymes can be effectively screened out from multiple metagenomic samples.

  • The tool's validity is confirmed by activity validation of three different functional enzymes.

1. Introduction

DNA sequencing technology has made significant progress in terms of throughput and cost reduction, offering a new opportunity for genomic research. With its fast and high-throughput characteristics, next-generation sequencing (NGS) technology enables the acquisition of high-quality sequence data, making it a powerful tool for genomics research [1,2]. The advancements in these technologies have made it possible to directly obtain the genomic sequences of all microorganisms in the environment without the need for microbial isolation and cultivation, greatly expanding the depth and breadth of microbiological studies.

Enzymes are critical for industrial applications like biocatalysis and bioremediation due to their efficiency and selectivity [3,4]. While microbial enzymes are cost-effective and scalable for industrial use, traditional cultivation methods fail to access over 99 % of environmental microbes, limiting enzyme discovery. NGS overcomes this by analyzing metagenomic data to assemble microbial genomic fragments (contigs), identify enzyme-coding sequences from uncultured microorganisms, and predict their functional roles in specific environments [5,6].

Recent studies have recovered 43,191 bacterial and archaeal genomes from publicly available marine metagenomic data, covering the diversity of 138 phyla. Through bioinformatic analysis, a new CRISPR-Cas9 system, 10 antimicrobial peptides, and 3 enzymes that degrade PET were discovered, with their effectiveness validated through in vitro experiments. These findings support global genomic sequencing initiatives, deepen the understanding of marine microbial diversity and its potential applications, and demonstrate the enormous potential of NGS technology in uncovering microbial enzymes and other biological resources [7].

Sequence-based screening dominates large-scale metagenomic analysis for discovering functional genes and enzymes, offering greater efficiency and broader coverage of uncultivated microbial resources compared to functional screening [8,9]. This approach uses homology-based searches, aligning metagenomic sequences against known gene templates (e.g., via BLAST) to identify functionally similar candidates, enabling rapid identification of enzymes or metabolic pathway-related genes. Despite enabling target gene discovery, current metagenomic analysis tools struggle with escalating data volumes and computational demands, compounded by complex workflows, heavy reliance on high-performance computing, and poor tool integration and user-friendliness [10]. Developing streamlined, efficient analysis frameworks is critical to overcoming these challenges [11].

Existing metagenomic enzyme prediction tools face limitations in accuracy, methodological robustness, and practical implementation [12]. To address these issues, we have developed Gene Surfing, an efficient bioinformatics workflow specifically designed for targeted mining of enzymes from metagenomes. The workflow is built on Snakemake, integrates established metagenomic processing tools, and is modular, allowing users to conduct different enzyme mining tasks based on submitted query sequences. Users supply FASTQ-format metagenomic data, which then pass sequentially through quality control with fastp, assembly with MEGAHIT, gene prediction with Prokka, and target enzyme sequence mining (using MMseqs2 for sequence alignment). The final output is a candidate gene list for the target enzyme, with SeqKit used to extract the target sequences. Gene Surfing provides users with a convenient solution for metagenomic functional enzyme prediction. Users need only install Snakemake and Mamba in advance: Snakemake executes the workflow, and Mamba manages and installs the necessary software packages, with installation guides available in the relevant documentation. Fig. 1 illustrates the overall technical roadmap of this study.

Fig. 1. Graphical summary of the method.

2. Methods

2.1. Metagenomic data processing

Data Quality Control: Raw sequence quality control was performed using fastp (v0.23.4) [13], including adapter removal and quality trimming at both read ends (-5 -3), sliding window quality filtering (4 bp window, Phred ≥20), a minimum read length of 50 bp, and removal of reads containing >3 ambiguous bases or >40 % low-quality bases. Multi-threading (-w {threads}) was enabled, and HTML/JSON reports covering base quality distribution, GC content, and adapter contamination statistics were generated.
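As a rough illustration of how the quality-control settings above could map onto a fastp call, the sketch below assembles a command line in Python. Only `-5`, `-3`, and `-w` are quoted in the text; the remaining flags are our assumed mapping of the described thresholds onto standard fastp options, and paired-end input, the function name, and all file names are hypothetical.

```python
from typing import List


def build_fastp_command(r1: str, r2: str, out_prefix: str, threads: int = 8) -> List[str]:
    """Assemble a fastp argument list (illustrative; not the authors' exact rule)."""
    return [
        "fastp",
        "-i", r1, "-I", r2,                         # paired-end input (assumption)
        "-o", f"{out_prefix}_R1.clean.fastq.gz",    # hypothetical output names
        "-O", f"{out_prefix}_R2.clean.fastq.gz",
        "-5", "-3",                                 # trim from the 5' and 3' ends
        "-W", "4", "-M", "20",                      # 4 bp window, mean Phred >= 20 (assumed mapping)
        "-l", "50",                                 # minimum read length (assumed mapping)
        "-n", "3",                                  # discard reads with > 3 N bases (assumed mapping)
        "-u", "40",                                 # discard reads with > 40 % low-quality bases (assumed mapping)
        "-w", str(threads),                         # worker threads
        "-h", f"{out_prefix}.fastp.html",           # HTML report
        "-j", f"{out_prefix}.fastp.json",           # JSON report
    ]


if __name__ == "__main__":
    print(" ".join(build_fastp_command("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample")))
```

Building the command as a list rather than a single string keeps it safe to pass to `subprocess.run` without shell quoting issues.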

Sequence Assembly and Evaluation: Metagenomic assembly was performed using MEGAHIT (v1.2.9) [14] employing the iterative De Bruijn graph algorithm with k-mer optimization for computational efficiency and contig length maximization. Key parameters included multi-threading (-t {threads}) and output directory specification (-o 2.assembly/{sample name}). To ensure assembly integrity and mitigate chimeric artifacts—a known limitation of short-read assemblies that may generate spurious contigs affecting downstream analyses—a multi-faceted quality assessment was implemented:

  • 1. Assembly Metric Evaluation: Contiguity and completeness metrics (N50, L50, total length, contig count, largest contig) were computed using QUAST (v5.2.0) [15] with a minimum contig length threshold (--min-contig 500) and multi-threading (--threads {threads}), outputting results to designated directories (-o {output}).

  • 2. Chimera Detection: Potential chimeric contigs were identified via MetaQUAST (v5.2.0) [16], which aligns assemblies against a reference microbial genome database to detect discordant alignments indicative of misassemblies.
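To make the contiguity metrics listed above concrete, here is a minimal sketch of how N50 and L50 are conventionally derived from a list of contig lengths after applying a 500 bp minimum. It is a plain-Python illustration of the standard definitions, not QUAST's own code, and the function name and example lengths are hypothetical.

```python
from typing import Dict, Iterable


def assembly_stats(contig_lengths: Iterable[int], min_contig: int = 500) -> Dict[str, int]:
    """Compute contig count, total length, largest contig, N50 and L50."""
    # Drop contigs shorter than the threshold, then sort longest first.
    kept = sorted((n for n in contig_lengths if n >= min_contig), reverse=True)
    total = sum(kept)
    if total == 0:
        return {"contigs": 0, "total_length": 0, "largest": 0, "N50": 0, "L50": 0}

    running = 0
    n50 = l50 = 0
    for rank, length in enumerate(kept, start=1):
        running += length
        # N50 is the length of the contig at which the cumulative sum
        # first covers at least half of the total assembly length;
        # L50 is the number of contigs needed to get there.
        if running * 2 >= total:
            n50, l50 = length, rank
            break

    return {
        "contigs": len(kept),
        "total_length": total,
        "largest": kept[0],
        "N50": n50,
        "L50": l50,
    }


if __name__ == "__main__":
    print(assembly_stats([300, 600, 900, 1500, 3000, 4000]))
```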

Gene Prediction and Functional Annotation: Gene prediction and annotation were performed using Prokka (v1.14.6) [17], integrating Prodigal (for coding genes), Infernal (for non-coding RNA), and the Pfam/TIGRFAM databases. Metagenome mode (--metagenome) was enabled to improve annotation efficiency for low-abundance sequences, with a strict e-value threshold (--evalue 1e-10) to balance sensitivity and false positives. Multi-threading (--cpus {threads}) was enabled, standardized annotation files were saved to 4.Gene_prokka_annote/{sample name}, and traceability was ensured through uniform naming parameters (--prefix, --locustag).

2.2. Metagenomic sequence alignment and extraction

Candidate enzyme sequences homologous to target functional families were identified using MMseqs2 [18] with rigorously validated thresholds. Reference databases were constructed in MMseqs2-compatible format for both user-provided query sequences and predicted metagenomic gene sequences. Sequence alignments were performed using the mmseqs easy-search command with empirically optimized parameters: a minimum global sequence identity threshold of 60 % (--min-seq-id 0.6) to ensure evolutionary conservation [19–21], a query-based coverage requirement of 70 % (-c 0.7 --cov-mode 2) to retain complete functional domains [22], a default sensitivity parameter of -s 5.7, and nucleotide alignment mode (--search-type 3). Post-alignment filtering applied a stringent E-value cutoff of <1 × 10⁻³ to control false discoveries. All candidate alignments underwent manual verification of domain boundary consistency and coverage distribution. Validated sequences meeting these criteria were extracted using SeqKit (v2.8.0) [23] for downstream functional characterization.
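The post-alignment filtering described above can be sketched as a small Python filter over a tab-separated hit table. The column layout assumed here (query, target, fractional identity, alignment length, mismatches, gap opens, query start, query end, target start, target end, E-value, bit score) is an assumption for illustration and should be checked against the actual search output; the class, function names, and example rows are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Hit:
    query: str
    target: str
    identity: float      # fraction in [0, 1] (assumed column semantics)
    query_start: int
    query_end: int
    evalue: float


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated alignment rows using the assumed 12-column layout."""
    hits = []
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]), float(f[10])))
    return hits


def filter_hits(hits: Iterable[Hit], query_lengths: Dict[str, int],
                min_identity: float = 0.6, min_query_cov: float = 0.7,
                max_evalue: float = 1e-3) -> List[Hit]:
    """Keep hits meeting the identity, query-coverage and E-value thresholds."""
    kept = []
    for h in hits:
        aligned = abs(h.query_end - h.query_start) + 1
        query_cov = aligned / query_lengths[h.query]
        if h.identity >= min_identity and query_cov >= min_query_cov and h.evalue < max_evalue:
            kept.append(h)
    return kept


if __name__ == "__main__":
    rows = [
        "q1\tgeneA\t0.85\t90\t13\t0\t1\t90\t5\t94\t1e-30\t150",
        "q1\tgeneB\t0.55\t90\t40\t0\t1\t90\t5\t94\t1e-20\t90",
    ]
    print([h.target for h in filter_hits(parse_hits(rows), {"q1": 100})])
```

Re-applying the thresholds after the search is redundant when the same cutoffs were already passed to the aligner, but it gives a single auditable place where the acceptance criteria are stated.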

2.3. Construction and induction of gene expression in engineered bacteria

In this study, the pET-28a(+) vector was selected as the expression plasmid. This vector features a strong, inducible T7/lac hybrid promoter and a pBR322 origin of replication for high-copy-number propagation. It also carries a kanamycin resistance gene for antibiotic selection. The pET-28a(+) vector is widely used in E. coli systems for high-level expression of recombinant proteins, particularly soluble proteins, and is well suited for subsequent affinity purification and downstream functional or structural analyses [24,25]. Additionally, pGEX-2T and pET32a(+) plasmids were also used as expression vectors; both vectors possess strong inducible T7/lac hybrid promoters and carry ampicillin resistance. The gene sequence was optimized using the IDT Codon Optimization Tool based on the codon usage bias of E. coli [26]. During the optimization process, specific restriction enzyme recognition sites were eliminated, and the GC content of the gene was controlled within the range of 40 %–60 % to enhance translational efficiency. These adjustments significantly improved the expression level of the gene in E. coli. The optimized sequence was chemically synthesized by GENEWIZ, and then cloned into the target vector. Information on restriction enzyme sites, forward and reverse primers used for gene cloning, and the target vector is provided in Supplementary Table S4. The recombinant plasmid was introduced into E. coli BL21(DE3) competent cells via chemical transformation. Transformed cells were spread onto LB agar plates containing 50 μg/mL kanamycin and incubated overnight at 37 °C. Single colonies were then picked and screened by colony PCR, followed by sequencing to confirm the presence of positive clones.
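The 40 %–60 % GC window mentioned above is straightforward to check programmatically. The following is a minimal sketch, assuming an unambiguous DNA sequence given as a plain string; the function names and the example sequences are hypothetical and unrelated to the genes in this study.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of a DNA sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_in_range(seq: str, low: float = 40.0, high: float = 60.0) -> bool:
    """Check whether GC content falls inside the closed interval [low, high]."""
    return low <= gc_content(seq) <= high


if __name__ == "__main__":
    example = "ATGCGCAT"  # hypothetical sequence
    print(gc_content(example), gc_in_range(example))
```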

For cellulase expression, isopropyl β-D-1-thiogalactopyranoside (IPTG) was added to the culture medium at a final concentration of 0.2 mmol/L, and cultures were induced with shaking at 200 rpm and 37 °C for 6 h. For nuclease protein expression, IPTG was added to a final concentration of 0.2–0.6 mmol/L, and cultures were incubated at 30 °C and 200 rpm for 6–9 h. For polyethylene-degrading enzyme expression, IPTG was added to a final concentration of 0.8 mmol/L, and cultures were incubated at 16 °C and 200 rpm for 20 h. Cells from 1 mL of each culture were harvested by centrifugation at 12,000 rpm for 2 min. The cell pellets were resuspended in 100 μL of 20 mmol/L PBS (pH 7.4) and sonicated on ice for 2 min (180 W, 2-s pulses with 4-s intervals). The sonicated samples were centrifuged at 12,000 rpm for 15 min at 4 °C. Then, 20 μL of the supernatant was mixed with 5 μL of 5× SDS loading buffer and heated at 95 °C for 15 min. The control sample was prepared following the same procedure using E. coli BL21(DE3) harboring the empty pET-28a(+) vector. Cellulase and polyethylene-degrading enzyme samples were separated by 10 % SDS-PAGE, while nuclease samples were separated by 8 % SDS-PAGE. Gels were stained with Coomassie Brilliant Blue R-250 for 25 min and subsequently destained overnight on a horizontal shaker using Coomassie destaining solution. The destaining solution was replaced 2–3 times until the gel background became clear and the protein bands were distinctly visible.

2.4. Verification of polyethylene degradation functionality

The expression products after induction were mixed with PE films and incubated with shaking at 150 rpm for 30 days. Based on the methodology established by Moon Gyung Yoon and colleagues, E. coli BL21 expressing heterologous alkane monooxygenases and E. coli BL21 carrying the empty pET-28a(+) vector were used as the control group. Both strains were inoculated at a 5 % inoculum size into MSM medium, where PE film and microspheres served as the sole carbon sources, and cultured for 30 days (with three replicates per group) at 37 °C [27]. After 30 days of co-culture, the PE films were placed into an ultrasonic cleaning device and sequentially washed for 30 min each with 2 % sodium dodecyl sulfate solution and 75 % (v/v) ethanol. The films were then dried overnight at 50 °C. The mass loss of the films was measured using an analytical balance with a precision of 0.0001 g. Microspheres were collected by repeated filtration in anhydrous ethanol and sterile water, followed by drying at 40 °C. The microspheres were subsequently analyzed via scanning electron microscopy (SEM), Fourier-transform infrared spectroscopy (FTIR), and mass loss determination.

The surface morphology of PE films and microspheres was examined using a scanning electron microscope (Hitachi S-4800, resolution: 1.0 nm) [28]. PE films were cut into 2 mm × 2 mm pieces and mounted onto conductive carbon adhesive tape, while PE microspheres were directly affixed to conductive tape. The samples were then coated with gold using an ion sputter coater. Imaging was performed in high-vacuum mode using secondary electron detection at an accelerating voltage of 25 kV to observe surface morphological changes. Three replicate images were taken for each sample. The proportion of damaged surface area on the polyethylene was quantified using ImageJ software (version 1.54j).

Fourier-transform infrared spectroscopy (FTIR, Bruker INVENIO R, resolution: 0.4 cm⁻¹, measurement range: 340–8000 cm⁻¹) was used to analyze the chemical structural changes of PE films and microspheres before and after enzymatic treatment [29]. The specific parameters were set as follows: scanning range 4000–500 cm⁻¹, spectral resolution 4 cm⁻¹, and 32 scans per sample. Data were processed using Origin 2024 software, and the resulting FTIR spectra were used to calculate the carbonyl index (CI). The carbonyl index is an important quantitative indicator of polyethylene oxidation; it increases with the degree of oxidation. There are two common methods for calculating the carbonyl index: peak area analysis and peak height analysis. As peak height analysis is more susceptible to variations from sample conditions and instrument response, this study adopted the peak area method. The carbonyl index (CI) was calculated using the following formula:

$\mathrm{CI} = A_{\mathrm{carbonyl}} / A_{\mathrm{methylene}}$

where $A_{\mathrm{carbonyl}}$ is the integrated area of the carbonyl absorption band in the range of 1900–1650 cm⁻¹, and $A_{\mathrm{methylene}}$ is the integrated area of the methylene absorption band in the range of 2960–2850 cm⁻¹.
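The peak-area calculation can be sketched with simple trapezoidal integration over the two band windows given above. This is a simplified illustration that assumes a spectrum supplied as paired wavenumber and absorbance lists and applies no baseline correction; the actual processing in Origin may differ, and the function names and the synthetic spectrum are hypothetical.

```python
from typing import Sequence


def band_area(wavenumbers: Sequence[float], absorbance: Sequence[float],
              low: float, high: float) -> float:
    """Trapezoidal area under the spectrum between low and high (cm^-1)."""
    pts = sorted((w, a) for w, a in zip(wavenumbers, absorbance) if low <= w <= high)
    area = 0.0
    for (w0, a0), (w1, a1) in zip(pts, pts[1:]):
        area += 0.5 * (a0 + a1) * (w1 - w0)
    return area


def carbonyl_index(wavenumbers: Sequence[float], absorbance: Sequence[float]) -> float:
    """CI = carbonyl band area (1650-1900) / methylene band area (2850-2960)."""
    a_carbonyl = band_area(wavenumbers, absorbance, 1650.0, 1900.0)
    a_methylene = band_area(wavenumbers, absorbance, 2850.0, 2960.0)
    if a_methylene == 0:
        raise ValueError("methylene band area is zero; cannot compute CI")
    return a_carbonyl / a_methylene


if __name__ == "__main__":
    # Synthetic, hypothetical spectrum: flat 0.1 in the carbonyl window,
    # flat 1.0 in the methylene window.
    wn = [1650, 1700, 1750, 1800, 1850, 1900, 2850, 2900, 2950]
    ab = [0.1] * 6 + [1.0] * 3
    print(carbonyl_index(wn, ab))
```

Multiplying the ratio by 100 expresses it as a percentage, which is how CI values are reported later in the text.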

2.5. Determination of cellulase activity

To quantify cellulase activity, a known amount of each enzyme was incubated with a given substrate, and the reducing sugars released (glucose and xylose) were measured using the 3,5-dinitrosalicylic acid (DNS) method.

For the filter paper enzyme activity assay, 50 mg of Whatman No. 1 filter paper (1 × 6 cm) was incubated with 1.0 mL citrate buffer (pH 4.8) and 100 μL crude enzyme at 50 °C for 60 min. The reaction was terminated with 3 mL DNS reagent and boiled for 5 min, and the absorbance was measured at 540 nm [30]. 1 U of filter paper enzyme activity is defined as the amount of enzyme that releases 1 μg of glucose per minute under conditions of 50 °C and pH 4.8.

Endo-β-1,4-glucanase activity was determined using the 3,5-dinitrosalicylic acid (DNS) method. A reaction mixture containing 100 μL of crude enzyme and 900 μL of 1 % carboxymethyl cellulose sodium salt (CMC-Na) prepared in acetate buffer (pH 5.5) was incubated at 50 °C for 30 min. Subsequently, 3 mL of DNS reagent was added, and the mixture was boiled for 5 min to allow the released reducing sugars to react with the DNS reagent. The absorbance was then measured at 540 nm. 1 U of endo-β-1,4-glucanase activity is defined as the amount of enzyme that releases 1 μg of glucose per minute under conditions of 50 °C and pH 5.5.

For β-glucosidase activity determination [31], 100 μL of enzyme was incubated with 900 μL of 5 mmol/L pNPG (pH 5.5) at 45 °C for 20 min. The reaction was terminated with 2 mL of 1 mol/L Na2CO3, and the absorbance was read at 410 nm. 1 U of β-glucosidase activity is defined as the amount of enzyme that releases 1 μmol of p-nitrophenol (pNP) per minute under conditions of 45 °C and pH 5.5.

Xylanase activity was determined using the 3,5-dinitrosalicylic acid (DNS) method. A reaction mixture containing 900 μL of 1 % corn xylan dissolved in acetate buffer (pH 5.5) and 100 μL of enzyme solution was incubated at 45 °C for 30 min. Then, 3 mL of DNS reagent was added, and the mixture was boiled for 5 min to allow the released reducing sugars to react with the DNS reagent. The absorbance was subsequently measured at 540 nm. 1 U of xylanase activity is defined as the amount of enzyme that releases 1 μg of xylose per minute under conditions of 45 °C and pH 5.5.

Reducing sugar standard curves were prepared using glucose and xylose as standards at concentrations of 0, 0.2, 0.4, 0.6, 0.8, and 1.0 mg/mL. After reaction with 3 mL of DNS reagent, the absorbance was measured at 540 nm, and the standard curves were obtained by linear regression fitting [32]. For the p-nitrophenol (pNP) standard curve, pNP was used as the standard at concentrations of 0, 0.2, 0.4, 0.6, 0.8, and 1.0 mmol/mL. The reaction was terminated by adding 2 mL of 1 mol/L Na2CO3, and the absorbance was measured at 410 nm. The standard curve was then generated by linear regression fitting (Fig. 1) [33].
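The standard-curve step and the unit definitions above can be illustrated with an ordinary least-squares fit followed by a conversion to activity. This is a minimal sketch under stated assumptions: the absorbance readings are invented for illustration, the sugar concentration is taken to refer to the assay volume passed in, and activity is expressed per mL of enzyme solution; none of these conventions are specified in the text, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def sugar_from_absorbance(a540: float, slope: float, intercept: float) -> float:
    """Invert the standard curve: reducing sugar concentration in mg/mL."""
    return (a540 - intercept) / slope


def activity_u_per_ml(sugar_mg_per_ml: float, assay_volume_ml: float,
                      minutes: float, enzyme_volume_ml: float) -> float:
    """Activity in U/mL, with 1 U = 1 microgram of sugar released per minute."""
    micrograms = sugar_mg_per_ml * assay_volume_ml * 1000.0
    return micrograms / (minutes * enzyme_volume_ml)


if __name__ == "__main__":
    standards = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]          # mg/mL, as in the text
    readings = [0.05 + 0.9 * c for c in standards]       # hypothetical A540 values
    slope, intercept = fit_line(standards, readings)
    sugar = sugar_from_absorbance(0.5, slope, intercept)
    print(activity_u_per_ml(sugar, assay_volume_ml=1.0, minutes=30, enzyme_volume_ml=0.1))
```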

2.6. Determination of nuclease activity

Stock solutions of metal ions (Ca2+, Na+, Ni2+, Mn2+, and Mg2+) were prepared by dissolving the corresponding salts (calcium chloride, sodium chloride, nickel sulfate, manganese sulfate, and magnesium chloride) in distilled water. These served as the metal ions in the buffer system. The specific composition of the reaction system is shown in Table 1. After the reaction was complete, 5 μL of 200 mM EDTA was added, followed by thorough mixing to terminate the reaction. Nuclease activity was then assessed using 1 % agarose gel electrophoresis.

Table 1. Reaction system.

| Component | Stock concentration | Added volume (μL) | Final concentration/amount |
|---|---|---|---|
| Tris-HCl | 200 mM | 2 | 20 mM |
| Bovine serum albumin | 1 mg/mL | 2 | 0.1 mg/mL |
| Dithiothreitol | 10 mM | 2 | 1 mM |
| Glycerol | 50 % | 2 | 10 % |
| Substrate DNA | 100 ng/μL | 3 | 300 ng |
| Metal ions | / | 2 | 1× |
| Protein | / | 3 | 100 nM |
| Distilled deionized water | / | / | up to 20 μL |

3. Results

3.1. Construction of the enzyme mining workflow

To improve the automation of genomic analysis and reduce human error, this study developed a comprehensive bioinformatics workflow (software and versions are listed in Table 2). Raw sequencing data first undergo quality control with fastp, which generates a visualization report; sequence deduplication and assembly are then performed by MEGAHIT, and QUAST assesses assembly integrity. Prokka annotates the genes in the contig files, and high-similarity gene sequences are identified through efficient alignment with MMseqs2. Target sequences are then extracted using SeqKit and annotated for carbohydrate-active enzymes using the dbCAN/CAZy database. The complete workflow is automated through Snakemake and can be run directly once users have configured the Mamba environment.
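The stage ordering described above forms a dependency graph, which is the kind of structure a workflow manager resolves before running anything. The sketch below models it with Python's standard-library `graphlib`; it is not the project's Snakefile, and the stage names and the exact dependency edges are our reading of the text rather than the actual rule definitions.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each stage maps to the set of stages that must finish before it.
# Names are hypothetical labels for the steps described in the text.
STAGES: Dict[str, Set[str]] = {
    "fastp_qc": set(),
    "megahit_assembly": {"fastp_qc"},
    "quast_evaluation": {"megahit_assembly"},
    "prokka_annotation": {"megahit_assembly"},
    "mmseqs2_search": {"prokka_annotation"},
    "seqkit_extract": {"mmseqs2_search"},
    "dbcan_annotation": {"seqkit_extract"},
}


def execution_order(stages: Dict[str, Set[str]]) -> List[str]:
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(stages).static_order())


if __name__ == "__main__":
    for step, name in enumerate(execution_order(STAGES), start=1):
        print(step, name)
```

Because assembly evaluation and annotation both depend only on the assembly, a scheduler is free to run them in parallel, which is one practical benefit of declaring dependencies instead of writing a fixed script.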

Table 2. Software used in the pipeline.

| Software | Version | Ref. |
|---|---|---|
| fastp | V0.20.1 | (34) |
| megahit | V1.2.9 | (14) |
| quast | V5.2.0 | |
| metaquast | V5.2.0 | (16) |
| prokka | V1.14.6 | |
| mmseqs2 | V13.45111 | (18) |
| seqkit | V2.8.0 | (23) |
| run_dbcan | V4 | (35) |
| Snakemake | V7.32.4 | (36) |

3.2. Functional validation of cellulase-encoding sequences

This study focused on metagenomic data from herbivorous ungulate feces and tree trunks in Xinjiang, aiming to explore lignocellulose-degrading enzyme genes. Using the Gene Surfing tool, we identified 1,311,316 potential gene sequences from 1,047,744,760 raw sequences. After comparing these to the CAZy database, 2114 sequences were selected for potential lignocellulose degradation. To validate cellulase activity, we performed structural and functional checks using AlphaFold, Phyre2, and AutoDock. After thorough validation, 127 enzyme sequences with significant E-values and Bit-Scores were identified as potential cellulases, providing reliable candidates for experimental validation. The metagenomic sample information used for cellulase recognition in this study is presented in Supplementary Materials Table S1.

In this study, a heterologous expression system was used to express these cellulase genes in Escherichia coli, and the function of the recombinant enzymes was assessed by measuring four enzyme activities: filter paper enzyme activity, endoglucanase activity, β-glucosidase activity, and xylanase activity (Fig. 2a). Filter paper enzyme activity was assessed by measuring the enzyme's ability to degrade cellulose filter paper. Cellulases hydrolyze the β-1,4 glycosidic bonds in cellulose [37], disrupting its polymeric structure [38]. This leads to the degradation of cellulose into smaller sugars or oligosaccharides, ultimately yielding a soluble product. CMC-Na, a derivative of cellulose, consists of glucose units linked by β-1,4 glycosidic bonds, similar to natural cellulose. The endoglucanase activity test helps verify whether the enzyme has cellulose degradation ability [39]. β-glucosidase is an important component of the cellulase system [40,41]. The enzyme degrades cellobiose to glucose and reduces the inhibition of cellulase by cellobiose. Testing the activity of β-glucosidase helps verify the final stage of cellulose hydrolysis and provides a reference for cellulase activity. Hemicellulases, especially xylanases, hydrolyze β-1,4 glycosidic bonds in hemicellulosic polysaccharides like xylan [42]. Although hemicellulose and cellulose are different polysaccharides, both contain β-1,4 glycosidic bonds, so the substrates of hemicellulases and cellulases share some structural similarities. Xylanase activity testing can indirectly indicate whether the enzyme has broad glycosidase activity and may play a supporting role in cellulose degradation. If the enzyme significantly hydrolyzes hemicellulose substrates, it suggests a broader glycosidic bond degradation capability and hints at its potential for cellulose degradation. In addition, cellulose and lignin are often mediated by proteins that bind and interfere with the recognition and degradation activity of a single enzyme. Therefore, xylanase activity was tested simultaneously.

Fig. 2. Verification of potential cellulase activity. (a) Schematic of cellulase hydrolysis substrates. (b) Proportion of active cellulase sequences. (c) Enzyme activity heatmap of 127 cellulase genes. (d) Enzyme activity of cellulases from different sources.

Fig. 2c shows the enzyme activity heatmap of 127 cellulase genes. Only 20 gene sequences did not show any enzyme activity, while the other gene sequences demonstrated varying degrees of enzyme activity. Among the 107 cellulase gene sequences that exhibited enzymatic activity (Fig. 2b), most of the genes of unknown function exhibited significant cellulase activity, and some gene sequences not only possessed single-function enzymatic activity but also showed bifunctional or trifunctional potential (Fig. 2d). For example, the ML19_1 sequence derived from the fecal samples of Tarim red deer exhibited filter paper enzyme activity of 28.04 U/mL, β-glucosidase activity of 214.02 U/mL, and xylanase activity of 2572.32 U/mL simultaneously in the same expression system, indicating strong multifunctional enzymatic activity. In contrast, the Pi_2_3 sequence from the trunk metagenomic data showed only endoglucanase activity, at 10.5 U/mL, indicating lower enzymatic activity. SDS-PAGE results are shown in Supplementary Fig. 2a. These findings suggest that certain genes are not only effective in degrading cellulose, but also have strong multifunctionality and high potential for degrading different substrates.

Numerous studies have been devoted to screening cellulose- and hemicellulose-degrading enzyme genes from different ecological environments and enhancing the production efficiency of these enzymes through heterologous expression techniques [43]. For example, Ariaeenejad et al. successfully screened a heat-resistant xylanase gene from the camel rumen metagenome and found that, after expression in E. coli BL21, the enzyme efficiently degraded xylan and demonstrated good degradation performance by forming cracks and pores on the surface of pulp [44]. Similarly, in this study, we screened and identified a variety of cellulase gene sequences from the metagenomic data of different animal fecal samples and functionally verified them through heterologous expression technology. The results showed that 84.25 % of the gene sequences exhibited varying degrees of enzyme activity, with some sequences demonstrating single-function enzyme activities, while others showed multi-function enzyme activities. This indicates that the gene mining method has high screening efficiency and universality, and it lays a foundation for the development of enzyme diversity and function enhancement.

3.3. In vitro cleavage activity validation of nuclease-encoding sequences

To validate the broad applicability of the Gene Surfing platform in functional enzyme discovery, we further explored its application in the mining of nucleases. The preceding section validated Gene Surfing's efficacy in cellulase discovery; nucleases, a distinct functional enzyme class with unique substrate specificities and mechanisms, were targeted here to demonstrate the platform's versatility and accuracy in screening diverse functional enzymes through gene mining and validation. The metagenomic sample information used for nuclease recognition in this study is presented in Supplementary Materials Table S2.

Specifically, a literature review was conducted to select typical nuclease sequences with cleavage activity as probes, which were then mined from metagenomic data obtained from animal feces, soil, human saliva, and other samples. These datasets contained a total of 175,448,010 sequences. After performing sequence alignment using Gene Surfing, 585 potential nuclease sequences were identified (Fig. 3a). To validate the effectiveness of these sequences and further narrow the scope of validation, SnapGene was used to align the primary structures of the probe sequences with the potential sequences. Additionally, tertiary structure predictions and comparisons were performed using tools such as Phyre2 and SWISS-MODEL. Ultimately, 7 potential nuclease-encoding sequences were selected [45] (Fig. 3b).

Fig. 3. Discovery and Ex Vivo Cleavage Activity Validation of Potential Nucleases. (a) Distribution of 585 nuclease sequences in metagenomic samples. (b) Schematic diagram of vector construction for the 7 nuclease sequences. (c) Cleavage verification of the substrate pESC by nucleases M3FII, S23I, and MY2I. (d, f, h) Cleavage activity of nuclease PM6I on target pESC and pWJQ: (d) under Na+ conditions; (f) under Ca2+ conditions; (h) under Mn2+ conditions. (e, g, i) Cleavage activity of MY2I and PM6I on adenoviral plasmid pDC316-mCMV: (e) under Ni2+ conditions; (g) under Mn2+ conditions; (i) under Na+ conditions. (j) Proportional relationship between the total number of nuclease sequences discovered, the number of validated sequences, and the number of active sequences.

Induced expression and protein purification were performed for the seven sequences, and in vitro nuclease activity verification showed that M3FII, S23I, and MY2I cleaved supercoiled plasmids. SDS-PAGE results are shown in Supplementary Fig. 2b. The plasmid bands of the experimental group were weaker than those of the control and became fainter as the reaction time increased (Fig. 3c). The in vitro cleavage activity of PM6I is shown in Fig. 3d, f, and h. Under Na+ conditions, PM6I cleaved supercoiled plasmid DNA into relaxed and linear forms, confirming its endonuclease activity (Fig. 3d). Divalent cations such as Ca2+ (Fig. 3f) and Mn2+ (Fig. 3h) promoted cleavage by PM6I, with complete degradation into oligonucleotides after 60 min of Mn2+ treatment. MY2I and PM6I also exhibited cleavage activity on adenovirus plasmids under Mn2+ [46] (Fig. 3g).

Endonucleases catalyze the hydrolysis of internal phosphodiester bonds in polynucleotides and are important tools in molecular biology and genetic engineering. GVE2 HNH cleaved both linear and circular plasmids, while RecJ degraded supercoiled plasmids within 2 h, with optimal activity in Mg2+. PM6I exhibited similar activity to GVE2 HNH but degraded supercoiled plasmids faster and preferred Mn2+ over Mg2+. This study confirmed that S23I, M3FII, MY2I, and PM6I are nucleases. PM6I cleaves supercoiled plasmids into relaxed and linear forms, and then into oligonucleotides. It remains unclear whether PM6I and MY2I have targeted cleavage functions like Cas9; if they do, they may be potential CRISPR-Cas system candidates [47].

Seven nuclease-encoding sequences were expressed, with four sequences showing nuclease activity, yielding a screening success rate of 57.1 %. However, only 1.20 % of the total sequences mined were validated, suggesting further potential for discovering nucleases from environmental metagenomes and supporting the Gene Surfing method's effectiveness [48].
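The two percentages quoted above follow directly from the counts reported in this section (585 mined, 7 tested, 4 active). A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def percent(part: int, whole: int, digits: int = 2) -> float:
    """Return part / whole as a percentage rounded to the given digits."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, digits)


if __name__ == "__main__":
    mined, tested, active = 585, 7, 4   # counts reported in the text
    print("screening success rate (%):", percent(active, tested, 1))
    print("fraction of mined sequences tested (%):", percent(tested, mined))
```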

3.4. Verification of the polyethylene degrading enzyme coding sequence

Building on the effectiveness of Gene Surfing for nuclease gene prediction validated in the preceding section, we next extended the approach to the discovery of polyethylene-degrading enzymes. Polyethylene is a synthetic polymer that is difficult to degrade, and in response to the environmental problems it causes, the search for efficient degrading microorganisms and enzyme systems has become a key focus in environmental biotechnology research [49].

To mine potential polyethylene-degrading enzymes, we aligned gene or amino acid sequences of 11 known polyethylene-degrading enzymes with 4,573,494,521 annotated sequences from multiple metagenomes, yielding 701 potential sequences. Among these, alkane monooxygenase AOR82996.1, laccase ADZ57284.1, hydrolase QPI65985.1, laccase RQM31760.1 [50], AFC76164.1, and CAA77015.1 were used to mine sequences from soil, animal feces, and insect samples, resulting in 54, 114, 3, 2, 26, and 2 potential polyethylene-degrading sequences, respectively [51] (Fig. 4a). The metagenomic sample information used for polyethylene-degrading enzymes recognition in this study is presented in Supplementary Materials Table S3.

Fig. 4.

Verification of the activity of potential polyethylene hydrolase. (a) Metagenomic mining results using different index sequences. (b) SEM analysis of PE films after 30 days of reaction with polyethylene-degrading enzymes. (c) SEM analysis of PE microspheres after 30 days of reaction with polyethylene-degrading enzymes. (d,e) FTIR analysis of PE films treated with polyethylene-degrading enzymes for 30 days. (f,g) Functional group changes after polyethylene-degrading enzymes reacted with PE microplastics. (h,i) Degradation efficiency of PE films (∗ indicates 0.01 < P < 0.05, ∗∗ indicates 0.001 < P < 0.01, ∗∗∗ indicates P < 0.001, ns indicates P > 0.05 with no statistical significance).

Four enzymes were selected for demonstration: Est-GKFE3 (from gut microbiome), Lac-S12-2 (from soil), Hyd-S11-1 (from soil), and AlkB-NJ1-2 (from farmland soil), corresponding to an esterase, a laccase, a hydrolase, and an alkane monooxygenase, respectively; the SDS-PAGE results are shown in Supplementary Fig. 2c. After 30 days of treatment with AlkB-NJ1-2, new absorption peaks were observed in the FTIR spectrum of PE films at 3409-3320 cm−1, 1751 cm−1, 1639 cm−1, 1164 cm−1, and 1076 cm−1. These peaks correspond to the stretching vibrations of functional groups such as intermolecularly hydrogen-bonded hydroxyl groups (–OH), carbonyl (C=O), carbon–carbon double bonds (C=C), and C–O in primary and secondary alcohols. In PE microspheres treated with AlkB-NJ1-2, the characteristic peaks at 2923 cm−1 and 2850 cm−1 showed both a decrease in intensity and a slight rightward shift. These changes suggest that enzymatic degradation led to the cleavage of long PE chains, resulting in alterations in hydrocarbon chain conformation and molecular packing. For PE films treated with Hyd-S11-1, new peaks appeared at 1018 cm−1 (C–O), 1747 cm−1 (C=O), and 1643 cm−1 (C=C), with a calculated carbonyl index of 2.22 %. In the corresponding microspheres, new absorption emerged at 1115-1060 cm−1 (C–O–C), and the carbonyl index increased to 7.13 %, indicating more intensive hydrolytic oxidation of the microspheres within 30 days. Compared to the control, treatment with Lac-S12-2 resulted in a new peak at 1654 cm−1 (C=C) in the PE films, with a carbonyl index of 2.73 %.
The treated microspheres exhibited new peaks at 3405-3263 cm−1 (–OH hydrogen bonds), 1653 cm−1 (C=C), and 1095-1018 cm−1 (C–O), with a carbonyl index of 2.90 %. These results suggest that both PE forms primarily underwent oxidation through the introduction of C=C bonds, with comparable degrees of oxidation over 30 days. After 30 days of Est-GKFE3 treatment, PE films showed new peaks at 1639 cm−1 (C=C) and 1187/1006 cm−1 (C–H bending), with a moderate carbonyl index. Treated microspheres exhibited absorption peaks similar to those observed for other esterases, particularly at 1660 cm−1 (C=C) and 1000-1300 cm−1 (C–O). In summary, after 30 days of enzymatic treatment, all four enzymes induced the formation of functional groups such as C=O, C=C, and C–O in both PE films and microspheres. Due to their higher surface area, PE microspheres exhibited faster oxidation, generally reflected by higher carbonyl index values than those of films. Moreover, the decrease and shift in the 2923/2850 cm−1 peaks further support that enzymatic catalysis led to cleavage of PE chains and structural reorganization (Fig. 4d and g). SEM results revealed the PE film surface was smooth in the control group but showed cracks, holes, wrinkles, and indentations after enzyme interaction [52] (Fig. 4b). PE microspheres transformed from spherical to irregular shapes with debris adsorption, small pores, and reduced particle size (Fig. 4c).
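The carbonyl index values quoted above summarize how strongly the carbonyl band has grown relative to an unchanged reference band. This section does not state how the index was computed, so the sketch below assumes one common definition: the ratio of the baseline-corrected absorbance of the C=O band to that of a methylene reference band, expressed as a percentage. The function name and the absorbance values are hypothetical and are not measurements from this study.

```python
def carbonyl_index(carbonyl_absorbance: float, reference_absorbance: float) -> float:
    """Carbonyl index in percent: carbonyl band relative to a reference band.

    Assumed definition for illustration only; the band positions and any
    baseline correction must follow the protocol actually used.
    """
    if reference_absorbance <= 0:
        raise ValueError("reference absorbance must be positive")
    return 100.0 * carbonyl_absorbance / reference_absorbance


# Hypothetical baseline-corrected absorbances (not measured data).
film_ci = carbonyl_index(carbonyl_absorbance=0.011, reference_absorbance=0.50)
microsphere_ci = carbonyl_index(carbonyl_absorbance=0.036, reference_absorbance=0.50)

print(f"film: {film_ci:.2f} %, microspheres: {microsphere_ci:.2f} %")
```

Under this definition, a higher index for microspheres than for films means more carbonyl groups per unit of polymer backbone signal, which fits the surface-area argument made in the text.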

Weight loss analysis showed that the four polyethylene-degrading enzymes (Lac-S12-2, Hyd-S11-1, Est-GKFE3, AlkB-NJ1-2) effectively degraded PE films, with degradation rates of 0.929 %, 0.423 %, 1.077 %, and 0.746 %, respectively, significantly higher than the control group (Fig. 4h and i). The study also demonstrated that the mining method used is effective for screening polyethylene-degrading enzymes from various environmental sources. Of the 38 selected enzyme coding sequences, 31 showed polyethylene degradation activity, achieving an 81.5 % success rate, confirming the method's efficiency for high-throughput screening of polyethylene-degrading enzymes.
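Degradation rates of this kind are conventionally obtained gravimetrically, as the percentage of the initial film mass lost after treatment. The following sketch assumes that convention; the masses are hypothetical and are chosen only to show the calculation.

```python
def weight_loss_percent(initial_mg: float, final_mg: float) -> float:
    """Percentage of the initial mass lost after treatment."""
    if initial_mg <= 0:
        raise ValueError("initial mass must be positive")
    return 100.0 * (initial_mg - final_mg) / initial_mg


# Hypothetical film masses before and after 30 days of treatment.
treated_loss = weight_loss_percent(initial_mg=50.00, final_mg=49.50)
control_loss = weight_loss_percent(initial_mg=50.00, final_mg=50.00)

print(f"treated: {treated_loss:.3f} %, control: {control_loss:.3f} %")
```

Because the reported losses are around 1 % or less, the precision of the balance and the washing and drying of the films before weighing matter a great deal when comparing against the control.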

Polyethylene, characterized by its robustness, low production cost, and high flexibility, is the most widely used polyolefin material [53,54]. However, it poses severe environmental challenges due to its long half-life and resistance to degradation. Compared to conventional recycling methods, enzymatic biodegradation of PE offers significant advantages. Zheng Guo et al. identified a novel thermophilic laccase (LfLac3) from the PE-degrading bacterium Lysinibacillus fusiformis. Upon expression of LfLac3 in E. coli XL1-Blue, the crude enzyme solution incubated with PE films for eight weeks induced new FTIR absorption peaks at 1600-1800 cm−1 and 2361 cm−1, corresponding to carbonyl (C=O) and –C–N– stretching vibrations, respectively [55]. Similarly, a laccase (rPsLac) recently discovered from Psychrobacter sp. NJ228 was shown to catalyze the formation of polar functional groups such as carbonyl and carboxyl groups on the PE surface [56]. Recent efforts to discover PE-degrading enzymes have predominantly focused on culturable microorganisms capable of degrading plastics, while exploration of enzymes from unculturable microbes remains limited. Mao and colleagues enriched plastic-degrading microorganisms from activated sludge by using PE plastic as the sole carbon and energy source. After 28 days of incubation, PE films exhibited a 3 % weight loss, indicating degradation. Metagenomic sequencing and assembly of the activated sludge revealed 14 plastic degradation-related genes, including oxidases, laccases and lipases [57], demonstrating the reliability of mining plastic-degrading enzymes from metagenomic data [58].
In contrast, the polyethylene-degrading functional genes heterologously expressed in this study were sourced from diverse environmental metagenomes, including those from pseudosteppe soils, herbivorous animal guts, cotton field soils with different planting years, and medicated residue soils, rather than from commonly culturable microorganisms. Furthermore, in our experiments with AlkB-NJ1-2 for PE film degradation, new absorption peaks appeared at 1600-1800 cm−1 in the FTIR spectra, coinciding with the carbonyl peaks observed in PE films treated with LfLac3, suggesting functional similarity between AlkB-NJ1-2 and LfLac3. Additionally, PE films treated with Est-GKFE3 showed a weight loss of 1.077 %, consistent with Mao et al.'s findings. These results confirm that polyethylene-degrading enzymes mined using the Gene Surfing tool possess effective PE degradation activity. They also validate the efficiency and accuracy of this mining approach, providing a rapid and reliable technical route for the high-throughput screening of PE-degrading enzymes from environmental metagenomes.

4. Discussion

With the continuous and rapid development of high-throughput sequencing technologies, bioinformatics tools, and biological databases, metagenomics has evolved into a powerful platform for mining novel enzymes and biomass conversion pathways from complex environmental microbiomes [59]. Among the available strategies, screening for homologous sequences based on similarity to known functional gene sequences remains the primary means of discovering characterized enzymes or metabolic pathways because it is direct and effective [60]. For instance, Davidi and colleagues used the large subunit of Rhodospirillum rubrum Rubisco as a reference, performed BLASTp searches against marine and sediment metagenomic datasets, and combined USEARCH clustering, MAFFT alignment, and RAxML phylogenetic analysis to successfully identify 33,565 Rubisco candidate sequences with carboxylase activity. They validated the reliability of this approach through heterologous expression of 143 variants [61]. In another study focusing on cellulases, Shahraki and colleagues developed a specialized tool to identify cellulase genes in metagenomic contigs via sequence alignment and classify them using machine learning to predict optimal temperature and pH [62]. However, such studies typically focus on single enzyme types. This limits the number and diversity of enzymes discovered from metagenomic datasets, and it fails to fully exploit the complex, cooperative enzyme systems in microbial communities or to reveal their overall functional potential in material cycles (e.g., biomass degradation). Additionally, although sequence similarity-based strategies form the foundation of functional gene mining, traditional decentralized analytical workflows (involving multiple independent tools and manual steps) often suffer from inefficiencies and inconsistent results [63], significantly increasing analysis costs and hindering comparisons and integrations across studies.

To address these challenges, we developed GeneSurfing—a streamlined pipeline integrating standardized analysis modules to enable fully automated processing from raw metagenomic sequencing data to target enzyme identification. The core advantages of this platform lie in its integrated, standardized, and automated design, as well as its flexibility in identifying target genes based on user-defined probe sequences. Users can provide different functional gene sequences (e.g., those of specific enzyme families) as probes according to research needs, and GeneSurfing efficiently and specifically mines corresponding homologous candidate genes from massive metagenomic data. This flexibility greatly expands the platform's applicability, enabling it to serve diverse enzyme or functional gene discovery goals. By integrating key steps such as data preprocessing, gene prediction, sequence similarity-based target enzyme identification, and result integration, GeneSurfing significantly enhances analysis efficiency, ensures result reproducibility and comparability, and reduces the technical barrier of metagenomic enzyme mining through a user-friendly interface, providing an efficient solution for large-scale, systematic enzyme resource discovery.
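The chain of steps described here (quality control, assembly, assembly evaluation, annotation, and similarity search with user-supplied probe sequences) can be pictured as an ordered list of tool invocations. The sketch below only builds the command lines and does not run them. It is not taken from the GeneSurfing repository: the function name, directory layout, and file names are hypothetical, and the actual Snakemake rules may use different options.

```python
from pathlib import Path


def build_commands(sample, r1, r2, probes, outdir="results"):
    """Return one command (as an argument list) per pipeline step."""
    out = Path(outdir) / sample
    clean1 = (out / "clean_R1.fastq.gz").as_posix()
    clean2 = (out / "clean_R2.fastq.gz").as_posix()
    assembly = (out / "megahit").as_posix()
    contigs = f"{assembly}/final.contigs.fa"      # MEGAHIT's contig file
    annotation = (out / "prokka").as_posix()
    proteins = f"{annotation}/{sample}.faa"       # Prokka's protein FASTA
    hits = (out / "hits.m8").as_posix()
    tmp = (out / "tmp").as_posix()
    return [
        # 1. Read quality control
        ["fastp", "-i", r1, "-I", r2, "-o", clean1, "-O", clean2],
        # 2. Metagenome assembly
        ["megahit", "-1", clean1, "-2", clean2, "-o", assembly],
        # 3. Assembly evaluation
        ["metaquast.py", contigs, "-o", (out / "quast").as_posix()],
        # 4. Gene prediction and functional annotation
        ["prokka", "--metagenome", "--outdir", annotation,
         "--prefix", sample, contigs],
        # 5. Homology search: probe sequences against predicted proteins
        ["mmseqs", "easy-search", probes, proteins, hits, tmp],
    ]


commands = build_commands("S1", "S1_R1.fastq.gz", "S1_R2.fastq.gz", "probes.faa")
for cmd in commands:
    print(" ".join(cmd))
```

In a workflow manager such as Snakemake, each of these commands would become a rule whose inputs and outputs are the files passed between steps, which is what allows the run to be resumed and reproduced.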

To validate the versatility and effectiveness of GeneSurfing, we applied it to metagenomic samples from diverse environments, including animal gut, soil, and plant endophytes, and mined three types of functional enzymes (cellulases, polyethylene-degrading enzymes, and HNH-family endonucleases). Experimental validation showed that in cellulase mining, 127 target genes were selected for heterologous expression from 2114 candidate sequences, of which 107 (84.25 %) exhibited activity; this success rate significantly outperforms traditional CMC agar plate screening and demonstrates the platform's screening efficiency and accuracy. In polyethylene-degrading enzyme mining, four genes encoding an esterase, a laccase, a hydrolase, and an alkane monooxygenase were characterized in detail, and their expression products, after incubation with PE films for 30 days, caused characteristic surface damage (cracks and cavities) confirmed by scanning electron microscopy (SEM), highlighting the value of metagenomic mining in discovering enzymes with environmental remediation (e.g., plastic degradation) potential. These successful cases from different habitats and targeting different enzyme types support the broad applicability and reliability of GeneSurfing for multi-functional enzyme mining in various metagenomic contexts. Compared with traditional culturing methods, metagenomic technology again shows its core advantage in this study: bypassing the “culturability” bottleneck of microorganisms to directly mine rich, diverse, and high-performance enzyme resources from environmental genetic material [64].

Nevertheless, this study and current metagenomic enzyme mining methods still have certain limitations. First, the core functional annotation of GeneSurfing still relies on sequence similarity searches in known databases, which may miss novel enzymes or distant homologs with low homology to known families but similar functions [65]. Second, existing heterologous expression systems (primarily based on model microorganisms like E. coli) have limited compatibility with complex enzyme systems, potentially leading to incorrect expression or folding of some enzymes (e.g., those requiring specific folding chaperones or post-translational modifications), causing false negatives in functional validation [66]. Furthermore, the platform's current annotation mainly provides broad enzyme functions (e.g., “cellulase”), and its ability to accurately predict finer substrate specificity, kinetic parameters, or optimal conditions remains to be improved. Finally, in-depth analysis of the correlation between specific habitat microbial community structures and their specific enzymatic functional potential requires integrating more profound ecological or associative analyses.

Looking to the future, the GeneSurfing framework has broad optimization and application prospects. Its modular workflow design provides a foundation for integrating richer predictive and analytical functions. For example, future considerations include introducing modules for species diversity analysis (e.g., based on 16S rRNA genes or marker genes), metagenomic binning, and metagenome-assembled genome (MAG) construction, which will help achieve in-depth correlation analysis between microbial community structures and functional potentials and locate the sources and ecological backgrounds of target genes more accurately. More importantly, we plan to explore integrating the concept of the synthetic biology “Design-Build-Test-Learn (DBTL) cycle” into the platform's future development. This may provide new ideas for overcoming the limitations of current sequence similarity-based methods (e.g., insufficient identification of low-homology novel genes). Specifically, integrating machine learning or deep learning models to learn the complex features of enzyme sequences is expected to enhance the ability to predict distant homologs or entirely new functional genes. Meanwhile, functional gene data obtained from experimental validation can serve as valuable feedback to iteratively optimize prediction models or guide subsequent probe design, gradually improving the platform's efficiency and accuracy in discovering novel biological components. Additionally, integrating emerging protein structure prediction tools (such as AlphaFold2 [67]) to assist functional annotation and rational design, as well as expanding the platform to more functional gene types (e.g., antibiotic resistance genes, secondary metabolite synthesis gene clusters, etc.), will make GeneSurfing a more powerful tool for mining and utilizing environmental microbial functional resources, continuously optimizing the depth and breadth of metagenomic functional gene mining.

CRediT authorship contribution statement

Tong Xu: Writing – original draft, Visualization, Software, Methodology, Investigation, Formal analysis, Data curation. Danyang Huang: Writing – original draft, Visualization, Methodology, Formal analysis, Data curation. Tingting Huang: Writing – original draft, Validation, Methodology, Data curation. Yuxin Wang: Writing – original draft, Visualization, Methodology, Formal analysis. Wanqiu Chen: Writing – original draft, Investigation. Shijunyin Chen: Writing – review & editing, Writing – original draft, Supervision, Project administration, Methodology. Yurong Qian: Writing – review & editing, Supervision, Software, Project administration. Haitao Yue: Writing – review & editing, Supervision, Resources, Project administration, Investigation.

Funding

This study was supported by the Third Xinjiang Scientific Expedition Program, the National Key Research and Development Program of China (grant 2022xjkk020603); the Key Research and Development Project of Xinjiang Uygur Autonomous Region of China (grants 2023B02034, 2023B02034-2); the National Natural Science Foundation of China (grants U2003305, 31860018); and the Tianshan Young Top Talents-Basic Research Talents (2024TSYCJU0002).

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

All the data generated or analyzed during this study are included in this article. The Gene Surfing pipeline is available under the MIT license and can be accessed at https://github.com/XT33KAKA/GeneSurfing.

Acknowledgements

Not applicable.

Footnotes

Peer review under the responsibility of Editorial Board of Synthetic and Systems Biotechnology.

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.synbio.2025.07.006.

Appendix A. Supplementary data

The following are the supplementary data to this article:

Multimedia component 1
mmc1.docx (1,002KB, docx)
Multimedia component 2
mmc2.xlsx (23.5KB, xlsx)

References

  • 1. Koutsandreas T., Ladoukakis E., Pilalis E., et al. ANASTASIA: an automated metagenomic analysis pipeline for novel enzyme discovery exploiting next generation sequencing data. Front Genet. 2019;10:469. doi: 10.3389/fgene.2019.00469.
  • 2. Mehdi F.S., Shohreh A., Fereshteh F.A., Behrouz Z., Takeshi K., Kaveh K., et al. MCIC: automated identification of cellulases from metagenomic data and characterization based on temperature and pH dependence. Front Microbiol. 2020;11:567863. doi: 10.3389/fmicb.2020.567863.
  • 3. Erdem E., Woodley J.M. Using enzymes for catalysis under industrial conditions. ACS Catal. 2024;14(24):18436–18441. doi: 10.1021/acscatal.4c05265.
  • 4. Vivek K., Sandhia G.S., Subramaniyan S. Extremophilic lipases for industrial applications: a general review. Biotechnol Adv. 2022;60. doi: 10.1016/j.biotechadv.2022.108002.
  • 5. Huber L.B., Kaur N., Henkel M., Marchand V., Motorin Y., Ehrenhofer-Murray A.E., et al. A dual-purpose polymerase engineered for direct sequencing of pseudouridine and queuosine. Nucleic Acids Res. 2023;51(8):3971–3987. doi: 10.1093/nar/gkad177.
  • 6. Aliaga Goltsman D.S., Alexander L.M., Lin J.-L., Fregoso Ocampo R., Freeman B., Lamothe R.C., et al. Compact Cas9d and HEARO enzymes for genome editing discovered from uncultivated microbes. Nat Commun. 2022;13(1):7602. doi: 10.1038/s41467-022-35257-7.
  • 7. Chen J., Jia Y., Sun Y., Liu K., Zhou C., Liu C., et al. Global marine microbial diversity and its potential in bioprospecting. Nature. 2024;633(8029):371–379. doi: 10.1038/s41586-024-07891-2.
  • 8. Jia B., Han X., Kim K.H., Jeon C.O. Discovery and mining of enzymes from the human gut microbiome. Trends Biotechnol. 2022;40(2):240–254. doi: 10.1016/j.tibtech.2021.06.008.
  • 9. Dandare S.U., Young J.M., Kelleher B.P., Allen C.C.R. The distribution of novel bacterial laccases in alpine paleosols is directly related to soil stratigraphy. Sci Total Environ. 2019;671:19–27. doi: 10.1016/j.scitotenv.2019.03.250.
  • 10. Breitwieser F.P., Lu J., Salzberg S.L. A review of methods and databases for metagenomic classification and assembly. Briefings Bioinf. 2019;20(4):1125–1136. doi: 10.1093/bib/bbx120.
  • 11. Alser M., Lindegger J., Firtina C., Almadhoun N., Mao H., Singh G., et al. From molecules to genomic variations: Accelerating genome analysis via intelligent algorithms and architectures. Comput Struct Biotechnol J. 2022;20:4579–4599.
  • 12. Yu T., Cui H., Li J.C., Luo Y., Jiang G., Zhao H. Enzyme function prediction using contrastive learning. Science. 2023;379(6639):1358–1363. doi: 10.1126/science.adf2465.
  • 13. Chen S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. Imeta. 2023;2(2). doi: 10.1002/imt2.107.
  • 14. Li D., Luo R., Liu C.-M., Leung C.-M., Ting H.-F., Sadakane K., et al. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods. 2016;102:3–11. doi: 10.1016/j.ymeth.2016.02.020.
  • 15. Gurevich A., Saveliev V., Vyahhi N., Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29(8):1072–1075. doi: 10.1093/bioinformatics/btt086.
  • 16. Mikheenko A., Saveliev V., Gurevich A. MetaQUAST: evaluation of metagenome assemblies. Bioinformatics. 2016;32(7):1088–1090. doi: 10.1093/bioinformatics/btv697.
  • 17. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30(14):2068–2069. doi: 10.1093/bioinformatics/btu153.
  • 18. Mirdita M., Steinegger M., Söding J. MMseqs2 desktop and local web server app for fast, interactive sequence searches. Bioinformatics. 2019;35(16):2856–2858. doi: 10.1093/bioinformatics/bty1057.
  • 19. Devos D., Valencia A. Practical limits of function prediction. Proteins: Struct, Funct, Bioinf. 2000;41(1):98–107. doi: 10.1002/1097-0134(20001001)41:1<98::AID-PROT120>3.0.CO;2-S.
  • 20. Wilson C.A., Kreychman J., Gerstein M. Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J Mol Biol. 2000;297(1):233–249. doi: 10.1006/jmbi.2000.3550.
  • 21. Tian W.D., Skolnick J. How well is enzyme function conserved as a function of pairwise sequence identity? J Mol Biol. 2003;333(4):863–882. doi: 10.1016/j.jmb.2003.08.057.
  • 22. Park H., Joachimiak M.P., Jungbluth S.P., Yang Z., Riehl W.J., Canon R.S., et al. A bacterial sensor taxonomy across earth ecosystems for machine learning applications. mSystems. 2024;9(1). doi: 10.1128/msystems.00026-23.
  • 23. Shen W., Sipos B., Zhao L. SeqKit2: a Swiss army knife for sequence and alignment processing. Imeta. 2024;3(3):e191. doi: 10.1002/imt2.191.
  • 24. Ganesh K.R., Ningaraju T.M., Peter A., Lakshminarayana Reddy C.N., Kumar V.K. Molecular cloning and heterologous expression of lipase gene from Pseudomonas aeruginosa in Escherichia coli. Int J Biol Macromol. 2025;297. doi: 10.1016/j.ijbiomac.2025.139866.
  • 25. Luo W.Q., Chen J.H., Zhou H.J., Huang X.W., Zhou Y., Yuan J.G., et al. Cloning and expression of a novel human HCUTA cDNA. Chin Sci Bull. 2000;45(14):1301–1304. doi: 10.1007/BF03182907.
  • 26. Demissie E.A., Park S.-Y., Moon J.H., Lee D.-Y. Comparative analysis of codon optimization tools: advancing toward a multi-criteria framework for synthetic gene design. J Microbiol Biotechnol. 2025;35. doi: 10.4014/jmb.2411.11066.
  • 27. Kemmish H.A.-O., Fasnacht M., Yan L. Fully automated antibody structure prediction using BIOVIA tools: validation study. PLoS One. 2017;12(5). doi: 10.1371/journal.pone.0177923.
  • 28. Adiguzel A.O., Sen F., Konen-Adiguzel S., Kideys A.E., Karahan A., Doruk T., et al. Identification of cutinolytic esterase from microplastic-associated microbiota using functional metagenomics and its plastic degrading potential. Mol Biotechnol. 2024;66(10):2995–3012. doi: 10.1007/s12033-023-00916-7.
  • 29. Akcaozoglu S., Adiguzel A.O., Akcaozoglu K., Deveci E.U., Gonen C. Investigation of the bacterial modified waste PET aggregate VIA Bacillus safensis to enhance the strength properties of mortars. Constr Build Mater. 2021;270. doi: 10.1016/j.conbuildmat.2020.121828.
  • 30. Liu M., Yan K., Yu S., Tan F., Hu W., Dai Z., et al. Ganoderma lucidum driven fermentation of Rosa roxburghii pomace: effects on noodle physicochemical properties, digestion, and gut microbiota. Food Chem X. 2024;24. doi: 10.1016/j.fochx.2024.102014.
  • 31. Li N., Qiu Z., Cai W., Shen Y., Wei D., Chen Y., et al. The Ras small GTPase RSR1 regulates cellulase production in Trichoderma reesei. Biotechnol Biofuel Bioproduct. 2023;16(1):87. doi: 10.1186/s13068-023-02341-z.
  • 32. Li J., Bai J., Yuan J., Fan S., Zhang T., Pan T., et al. Heterologous expression and characterization of an endoglucanase from Lactobacillus plantarum dy-1. Food Funct. 2023;14(8):3760–3768. doi: 10.1039/d2fo02460h.
  • 33. Li H., Li X., Bu X., Cheng J., Wu D. Discovery and characterization of a new β-glucosidase from Wolfiporia cocos through RNA sequencing and heterologous expression in Escherichia coli. Int J Biol Macromol. 2025;299. doi: 10.1016/j.ijbiomac.2025.139974.
  • 34. Chen S., Zhou Y., Chen Y., Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–i890. doi: 10.1093/bioinformatics/bty560.
  • 35. Zheng J., Hu B., Zhang X., Ge Q., Yan Y., Akresi J., et al. dbCAN-seq update: CAZyme gene clusters and substrates in microbiomes. Nucleic Acids Res. 2023;51(D1):D557–D563. doi: 10.1093/nar/gkac1068.
  • 36. Köster J., Rahmann S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics. 2012;28(19):2520–2522. doi: 10.1093/bioinformatics/bts480.
  • 37. Edema H., Ashraf M.F., Samkumar A., Jaakola L., Karppinen K. Characterization of cellulases from softening fruit for enzymatic depolymerization of cellulose. Carbohydr Polym. 2024;343. doi: 10.1016/j.carbpol.2024.122493.
  • 38. Ma X., Li S., Tong X., Liu K. An overview on the current status and future prospects in Aspergillus cellulase production. Environ Res. 2024;244. doi: 10.1016/j.envres.2023.117866.
  • 39. Zajki-Zechmeister K., Eibinger M., Nidetzky B. Enzyme synergy in transient clusters of endo- and exocellulase enables a multilayer mode of processive depolymerization of cellulose. ACS Catal. 2022;12(17):10984–10994. doi: 10.1021/acscatal.2c02377.
  • 40. Yang W., Su Y., Wang R., Zhang H., Jing H., Meng J., et al. Microbial production and applications of β-glucosidase-A review. Int J Biol Macromol. 2024;256. doi: 10.1016/j.ijbiomac.2023.127915.
  • 41. Salgado J.C.S., Meleiro L.P., Carli S., Ward R.J. Glucose tolerant and glucose stimulated β-glucosidases – a review. Bioresour Technol. 2018;267:704–713. doi: 10.1016/j.biortech.2018.07.137.
  • 42.Méndez-Líter J.A., de Eugenio L.I., Nieto-Domínguez M., Prieto A., Martínez M.J. Hemicellulases from Penicillium and Talaromyces for lignocellulosic biomass valorization: a review. Bioresour Technol. 2021;324 doi: 10.1016/j.biortech.2020.124623. [DOI] [PubMed] [Google Scholar]
  • 43.Meng Z., Ma J., Sun Z., Yang C., Leng J., Zhu W., et al. Characterization of a novel bifunctional enzyme from buffalo rumen metagenome and its effect on in vitro ruminal fermentation and microbial community composition. Anim Nutr. 2023;13:137–149. doi: 10.1016/j.aninu.2023.01.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Ariaeenejad S., Maleki M., Hosseini E., Kavousi K., Moosavi-Movahedi A.A., Salekdeh G.H. Mining of camel rumen metagenome to identify novel alkali-thermostable xylanase capable of enhancing the recalcitrant lignocellulosic biomass conversion. Bioresour Technol. 2019;281:343–350. doi: 10.1016/j.biortech.2019.02.059. [DOI] [PubMed] [Google Scholar]
  • 45.Wei L., Wang Z., Dong Y., Yu D., Chen Y. Enhanced CRISPR/Cas12a fluorimetry via a DNAzyme-embedded framework nucleic acid substrate. Anal Chem. 2024;96(41):16453–16461. doi: 10.1021/acs.analchem.4c04710. [DOI] [PubMed] [Google Scholar]
  • 46.Chen J., Chen Y., Huang L., Lin X., Chen H., Xiang W., et al. Trans-nuclease activity of Cas9 activated by DNA or RNA target binding. Nat Biotechnol. 2024;43(4):558–568. doi: 10.1038/s41587-024-02255-7. [DOI] [PubMed] [Google Scholar]
  • 47.Yang J., Li X., He Q., Wang X., Tang J., Wang T., et al. Structural basis for the activity of the type VII CRISPR–Cas system. Nature. 2024;633(8029):465–472. doi: 10.1038/s41586-024-07815-0. [DOI] [PubMed] [Google Scholar]
  • 48.Fang J., Sheng L., Ye Y., Ji J., Sun J., Zhang Y., et al. Recent advances in biosynthesis of mycotoxin-degrading enzymes and their applications in food and feed. Crit Rev Food Sci Nutr. 2025;65(8):1465–1481. doi: 10.1080/10408398.2023.2294166. [DOI] [PubMed] [Google Scholar]
  • 49.Hu X., Gu H., Sun X., Wang Y., Liu J., Yu Z., et al. Metagenomic exploration of microbial and enzymatic traits involved in microplastic biodegradation. Chemosphere. 2024;348 doi: 10.1016/j.chemosphere.2023.140762. [DOI] [PubMed] [Google Scholar]
  • 50.Zeng B., Fu Y., Ye J., Yang P., Cui S., Qiu W., et al. Ancestral sequence reconstruction of the prokaryotic three-domain laccases for efficiently degrading polyethylene. J Hazard Mater. 2024;476 doi: 10.1016/j.jhazmat.2024.135012. [DOI] [PubMed] [Google Scholar]
  • 51.De Filippis F., Bonelli M., Bruno D., Sequino G., Montali A., Reguzzoni M., et al. Plastics shape the black soldier fly larvae gut microbiome and select for biodegrading functions. Microbiome. 2023;11(1):205. doi: 10.1186/s40168-023-01649-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Zhang X., Feng X., Lin Y., Gou H., Zhang Y., Yang L. Degradation of polyethylene by Klebsiella pneumoniae Mk-1 isolated from soil. Ecotoxicol Environ Saf. 2023;258 doi: 10.1016/j.ecoenv.2023.114965. [DOI] [PubMed] [Google Scholar]
  • 53.Bai F., Fan J., Zhang X., Wang X., Liu S. Biodegradation of polyethylene with polyethylene-group-degrading enzyme delivered by the engineered Bacillus velezensis. J Hazard Mater. 2025;488 doi: 10.1016/j.jhazmat.2025.137330. [DOI] [PubMed] [Google Scholar]
  • 54.Tournier V., Duquesne S., Guillamot F., Cramail H., Andre I., Taton D., et al. Enzymes? Power for plastics degradation. Chem Rev. 2023;123(9):5612–5701. doi: 10.1021/acs.chemrev.2c00644. [DOI] [PubMed] [Google Scholar]
  • 55. Zhang Y., Plesner T.J., Ouyang Y., Zheng Y.-C., Bouhier E., Berentzen E.I., et al. Computer-aided discovery of a novel thermophilic laccase for low-density polyethylene degradation. J Hazard Mater. 2023;458 doi: 10.1016/j.jhazmat.2023.131986.
  • 56. Zhang A., Hou Y., Wang Q., Wang Y. Characteristics and polyethylene biodegradation function of a novel cold-adapted bacterial laccase from Antarctic sea ice psychrophile Psychrobacter sp. NJ228. J Hazard Mater. 2022;439 doi: 10.1016/j.jhazmat.2022.129656.
  • 57. Li Q., Li H., Tian L., Wang Y., Ouyang Z., Li L., et al. Genomic insights and metabolic pathways of an enriched bacterial community capable of degrading polyethylene. Environ Int. 2025;197 doi: 10.1016/j.envint.2025.109334.
  • 58. Hu X., Gu H., Sun X., Wang Y., Liu J., Yu Z., et al. Metagenomic exploration of microbial and enzymatic traits involved in microplastic biodegradation. Chemosphere. 2024;348:140762. doi: 10.1016/j.chemosphere.2023.140762.
  • 59. Gongora-Castillo E., Lopez-Ochoa L.A., Apolinar-Hernandez M.M., Caamal-Pech A.M., Contreras-de la Rosa P.A., Quiroz-Moreno A., et al. Data mining of metagenomes to find novel enzymes: a non-computationally intensive method. 3 Biotech. 2020;10(2) doi: 10.1007/s13205-019-2044-6.
  • 60. Zhang C., Freddolino L. A large-scale assessment of sequence database search tools for homology-based protein function prediction. Brief Bioinform. 2024;25(4) doi: 10.1093/bib/bbae349.
  • 61. Davidi D., Shamshoum M., Guo Z., Bar-On Y.M., Prywes N., Oz A., et al. Highly active rubiscos discovered by systematic interrogation of natural sequence diversity. EMBO J. 2020;39(18) doi: 10.15252/embj.2019104081.
  • 62. Shahraki M.F., Ariaeenejad S., Atanaki F.F., Zolfaghari B., Koshiba T., Kavousi K., et al. MCIC: automated identification of cellulases from metagenomic data and characterization based on temperature and pH dependence. Front Microbiol. 2020;11 doi: 10.3389/fmicb.2020.567863.
  • 63. Bharti R., Grimm D.G. Current challenges and best-practice protocols for microbiome analysis. Brief Bioinform. 2021;22(1):178–193. doi: 10.1093/bib/bbz155.
  • 64. Liu S., Moon C.D., Zheng N., Huws S., Zhao S., Wang J. Opportunities and challenges of using metagenomic data to bring uncultured microbes into cultivation. Microbiome. 2022;10(1) doi: 10.1186/s40168-022-01272-5.
  • 65. Bateman A., Martin M.-J., Orchard S., Magrane M., Ahmad S., Alpi E., et al. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res. 2023;51(D1):D523–D531. doi: 10.1093/nar/gkac1052.
  • 66. Snoeck S., Guidi C., De Mey M. “Metabolic burden” explained: stress symptoms and its related responses induced by (over)expression of (heterologous) proteins in Escherichia coli. Microb Cell Fact. 2024;23(1) doi: 10.1186/s12934-024-02370-9.
  • 67. Varadi M., Anyango S., Deshpande M., Nair S., Natassia C., Yordanova G., et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022;50(D1):D439–D444. doi: 10.1093/nar/gkab1061.

Associated Data

Supplementary Materials

Multimedia component 1
mmc1.docx (1,002KB, docx)
Multimedia component 2
mmc2.xlsx (23.5KB, xlsx)

Articles from Synthetic and Systems Biotechnology are provided here courtesy of KeAi Publishing