Abstract
Summary
The number of metagenome-assembled genomes (MAGs) is rapidly increasing with the growing scale of metagenomic studies, driving fast progress in microbiome research. Sample-wise assembly has become the standard due to its computational efficiency and strain-level resolution. It requires dereplication, the removal of near-identical genomes assembled in different metagenomic samples. We present MAGmax, an efficient dereplication tool that enhances both the quantity and quality of MAGs through a strategy of bin merging and reassembly. Unlike dRep, which selects a single representative bin per genome cluster, MAGmax merges multiple bins within a cluster and reassembles them to increase coverage. MAGmax produces more dereplicated, higher-quality MAGs than dRep while running faster and using one-third of the peak memory.
Availability and implementation
The MAGmax open source software, implemented in Rust, is available under the GPLv3 license at https://github.com/soedinglab/MAGmax.
1 Introduction
Shotgun sequencing of environmental samples has enabled the recovery of metagenome-assembled genomes (MAGs), providing insights into the ecology and function of uncultured microbes. The standard computational workflow for reconstructing MAGs consists of three steps: assembly of metagenomic reads, binning of assembled contigs according to their genome of origin, and dereplication of bins (Zhou et al. 2022). In large-scale studies, each sample is typically processed independently for assembly and binning (Ma et al. 2023). This sample-wise strategy is computationally more efficient than co-assembly across all samples and is particularly advantageous for reconstructing strain-level genomes, as it reduces the complexity of the de Bruijn graph during assembly (Delgado and Andersson 2022).
Since the same genomes may be present in multiple metagenomic samples, bins obtained from sample-wise assemblies are often redundant. Redundancy is typically addressed by clustering bins that belong to the same genome based on average nucleotide identity (ANI) and selecting a representative bin per cluster based on quality metrics such as genome completeness and purity. A widely used tool for this dereplication step is dRep (Olm et al. 2017), which uses MASH to precluster bins and then applies FastANI (Jain et al. 2018) or skani (Shaw and Yu 2023) to define more refined secondary clusters. For each secondary cluster, dRep selects the highest-quality bin as the representative, using CheckM1 to estimate completeness and contamination (Parks et al. 2015).
The current dereplication approach has several limitations: (i) It discards bins with completeness below the predefined threshold, even if they are highly pure, throwing away valuable genomic information. (ii) MASH underestimates ANI for bins with <90% completeness (Shaw and Yu 2023), which can result in incomplete bins from the same genome being placed in separate primary clusters. (iii) The average linkage clustering algorithm used by dRep may fail to cluster together bins with ANI above the threshold. (iv) Even when ANI estimates are accurate and bins are correctly grouped, only a single representative bin is retained, and the discarded bins may contain genomic regions not covered by the selected bin (Evans and Denef 2020). (v) dRep does not support CheckM2, which provides more accurate estimates of completeness and contamination than CheckM1 (Chklovski et al. 2023).
We introduce MAGmax, a tool that improves the yield and quality of metagenome-assembled genomes (MAGs) through bin merging and reassembly, addressing limitations of dRep. MAGmax integrates sequences from all bins in a cluster, including those with low completeness but high purity, and reassembles them to enhance bin quality. It uses skani, a state-of-the-art method for accurate ANI estimation, and CheckM2 for assessing bin quality.
2 Methods
2.1 Dataset preprocessing
For benchmarking, we selected 76 real metagenomic samples from three different environments: 15 human gut samples from villagers in Honduras (BioProject accession: PRJNA999635) (Beghini et al. 2025), 18 from black soil (BioProject accession: PRJNA1226397) (Galanova et al. 2025), and 43 from neonatal gut (Study Accession: ERP115334) (Shao et al. 2024). Raw reads from black soil were error-corrected using Musket (v1.1) (Liu et al. 2013), while reads from gut samples were processed using kneadData (v0.12.0) to perform error correction and remove human DNA contamination using the GRCh37 reference assembly. Corrected reads were sample-wise assembled using SPAdes (v4.0.0) (Bankevich et al. 2012). For neonatal samples, MEGAHIT (v1.2.9) was used for assembly. Read mapping was performed using Strobealign (v0.13.0-25-g3a97f6b) (Sahlin 2022) to generate abundance matrices and alignment files, followed by sorting using samtools (v1.19) (Li et al. 2009).
2.2 Binning
Contigs from all samples within the same environment were concatenated, and binning was performed using GenomeFace (Lettich et al. 2024), VAMB (Nissen et al. 2021), and MetaBAT2 (Kang et al. 2019) with multi-sample coverage. For GenomeFace and VAMB, contigs longer than 1 kb were used, while MetaBAT2 required contigs longer than 1.5 kb. Bins generated by MetaBAT2 were manually split by sample IDs, while GenomeFace and VAMB produced sample-wise bins directly. Only bins with a total sequence length of at least 200 kb were considered for dereplication.
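The 200 kb size filter described above can be illustrated with a minimal sketch. It is not the pipeline used in this study; the function names, bin names, and in-memory FASTA strings are hypothetical and serve only to show the criterion.

```python
from typing import Dict, List

MIN_BIN_LENGTH = 200_000  # 200 kb cutoff stated in the text


def total_length(fasta_text: str) -> int:
    """Sum the lengths of all sequence lines in a FASTA-formatted string."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if line and not line.startswith(">")
    )


def keep_large_bins(bins: Dict[str, str], min_length: int = MIN_BIN_LENGTH) -> List[str]:
    """Return the names of bins whose total sequence length is at least min_length."""
    return [name for name, fasta in bins.items() if total_length(fasta) >= min_length]


# Toy input: hypothetical bin names with synthetic sequences.
toy_bins = {
    "S1_bin1": ">c1\n" + "A" * 150_000 + "\n>c2\n" + "C" * 60_000 + "\n",
    "S2_bin7": ">c1\n" + "G" * 120_000 + "\n",
}
selected = keep_large_bins(toy_bins)
```

In this toy input only the first bin (210 kb in total) passes the filter; the second (120 kb) is excluded from dereplication.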
2.3 Benchmarking runs
Bins were dereplicated at the strain-level (99% ANI) using dRep (Olm et al. 2017) with the options --S_algorithm skani, -sa 0.95, -pa 0.99, -comp 50, -p 24 and -cont 5 or -cont 10. MAGmax was run on the same input bins with the options -c 50, --threads 24 and -p 95 or -p 90. Runtime and peak memory usage were evaluated on a Linux cluster system using 1 CPU with 24 cores and 128 GB of RAM.
2.4 Bin quality assessment
CheckM2 (Chklovski et al. 2023) was used to estimate bin completeness and contamination. The quality score was calculated as completeness − 5 × contamination (Parks et al. 2017) and compared between bins obtained from dRep and MAGmax. dRep’s dereplicated bins were further filtered based on completeness and purity estimated by CheckM2. MAGmax output was assessed and filtered using completeness and contamination values estimated using CheckM1 with the default settings (Parks et al. 2015).
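As a worked illustration of the quality score, the following sketch computes it from completeness and contamination given in percent. The function names are hypothetical, and treating the 50% completeness and 5% contamination cutoffs as inclusive is an assumption of this example.

```python
def quality_score(completeness: float, contamination: float) -> float:
    """Quality score as completeness minus five times contamination (both in percent)."""
    return completeness - 5.0 * contamination


def passes_filter(
    completeness: float,
    contamination: float,
    min_completeness: float = 50.0,  # cutoff used in the benchmark runs
    max_contamination: float = 5.0,  # 5% setting; 10% was also evaluated
) -> bool:
    """Assumption: both cutoffs are treated as inclusive here."""
    return completeness >= min_completeness and contamination <= max_contamination


score = quality_score(92.0, 1.5)       # 92 - 5 * 1.5
kept = passes_filter(92.0, 1.5)
rejected = passes_filter(45.0, 0.5)    # fails the completeness cutoff
```

A bin with 92% completeness and 1.5% contamination therefore has a score of 84.5, so each percentage point of contamination costs as much as five points of completeness.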
3 Results
Algorithm overview: MAGmax takes sample-specific genomic bins as input and filters them based on a user-defined purity threshold (default: ). It identifies single-linkage connected components among these bins based on the average nucleotide identity (ANI, default cutoff: 99%) between bin pairs, computed with skani (Shaw and Yu 2023), using a depth-first search. Within each component, clusters are formed by maximal clique detection, so a bin can belong to multiple clusters. This ensures that all bin pairs with sequence identity above the ANI cutoff are grouped together. Bins that are not part of any cluster are added to an existing cluster if they share ANI above the cutoff only with at least one cluster member. For each cluster, MAGmax selects as representative the bin with the highest quality score, defined as completeness − 5 × contamination, where completeness must be and contamination . If no such bin exists, the bins within the cluster are merged and reassembled using SPAdes (Bankevich et al. 2012). The quality score of the reassembled bin is then compared with those of the original input bins, and the bin with the best quality score is selected. Finally, MAGmax performs a round of redundancy removal, retaining only the best-quality bin from any pair sharing an ANI above the cutoff (Fig. 1a).
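The two graph steps of the overview, connected components by depth-first search followed by maximal clique detection, can be sketched as follows. This is an illustrative reimplementation in Python under stated assumptions, not the MAGmax source code (which is written in Rust): the pairwise ANI values are supplied as a hypothetical dictionary rather than computed with skani, values equal to the cutoff are counted as edges, and cliques are enumerated with the basic Bron-Kerbosch algorithm without pivoting.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(bins: List[str], ani: Dict[Tuple[str, str], float], cutoff: float = 99.0) -> Graph:
    """Connect two bins when their ANI reaches the cutoff (boundary handling is an assumption)."""
    graph: Graph = {b: set() for b in bins}
    for (a, b), value in ani.items():
        if value >= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph: Graph) -> List[Set[str]]:
    """Single-linkage components found with an iterative depth-first search."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


def maximal_cliques(graph: Graph, nodes: Set[str]) -> List[Set[str]]:
    """Enumerate maximal cliques among `nodes` (Bron-Kerbosch, no pivoting)."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return cliques


# Toy input: hypothetical bins and ANI values (percent).
bins = ["A", "B", "C", "D"]
ani = {("A", "B"): 99.5, ("B", "C"): 99.2, ("A", "C"): 98.0}
graph = build_graph(bins, ani)
components = connected_components(graph)
clusters = [clique for comp in components for clique in maximal_cliques(graph, comp)]
```

In this toy input, bins A, B, and C form one component, but A and C fall below the cutoff, so the component yields the two cliques {A, B} and {B, C}, with B belonging to both. Bin D forms a singleton cluster.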
Figure 1.
(a) Overview of MAGmax. The blue star marks a high-quality bin. (b) Complementary cumulative distribution of the number of dereplicated genomic bins over the bin quality score (completeness − 5 × contamination) in combination with three popular binners. (c) Quality scores of dRep bins (y-axis) and the corresponding MAGmax bins (x-axis) based on genomic cluster membership. Grey/blue circles: MAGmax and dRep bins are identical/different. (d) Scatter plot showing quality scores of merged and reassembled bins (x-axis) versus input bins (y-axis). Blue circles represent the highest-quality bins within each genomic cluster, while cyan circles indicate the other input bins. Vertical lines connect the input bins that were used for merging and reassembly. (e) Runtime (in minutes) and peak memory usage (in GB).
Output: A non-redundant set of dereplicated genomic bins, including bins improved through merging and reassembly, and a text file listing bin completeness and contamination estimated by CheckM2 (Chklovski et al. 2023).
We evaluated the performance of MAGmax on metagenomic samples from three different datasets: 15 gut samples from villagers in Honduras (Beghini et al. 2025), 18 samples from black soil (Galanova et al. 2025), and 43 gut samples from newborns at day 21 postpartum (Shao et al. 2024). To obtain genomic bins, we binned sample-wise assembled contigs with multi-sample coverage using GenomeFace (Lettich et al. 2024), VAMB (Nissen et al. 2021), and MetaBAT2 (Kang et al. 2019) (see Methods). MAGmax and dRep were independently applied to dereplicate bins at the strain level (ANI 99%).
First, we compared the number of bins produced by dRep and MAGmax as a function of quality score, using bins that passed a completeness cutoff of 50% and a contamination cutoff of 5% or 10%. In the dRep output, some bin pairs remained redundant with ANI above 99% (Table 1, available as supplementary data at Bioinformatics online). To avoid redundancy in all subsequent analyses, we removed the lower-quality bin within each redundant pair. Results presented in the main text correspond to the 5% contamination cutoff.
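The removal of the lower-quality bin from each redundant pair can be sketched as below. This is a literal reading of the sentence above, not the script used in the study: the function name and inputs are hypothetical, pairs at exactly the cutoff are counted as redundant, and ties in quality are resolved by dropping the second bin of the pair. All three choices are assumptions of this example.

```python
from typing import Dict, List, Tuple


def remove_redundant(
    quality: Dict[str, float],
    ani: Dict[Tuple[str, str], float],
    cutoff: float = 99.0,
) -> List[str]:
    """Drop the lower-quality bin of every pair whose ANI reaches the cutoff."""
    dropped = set()
    for (a, b), value in ani.items():
        if value >= cutoff:
            # Assumption: on a tie in quality, the second bin of the pair is dropped.
            loser = a if quality[a] < quality[b] else b
            dropped.add(loser)
    return [name for name in quality if name not in dropped]


# Toy input: hypothetical bins, quality scores, and ANI values (percent).
quality = {"A": 80.0, "B": 70.0, "C": 60.0}
ani = {("A", "B"): 99.4, ("B", "C"): 98.0}
kept = remove_redundant(quality, ani)
```

Here A and B are redundant, so B is removed, while C is retained because its ANI to B lies below the cutoff.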
The total number of dereplicated bins obtained using MAGmax is consistently higher than that produced by dRep (Fig. 1b). In the GenomeFace results, which generated the highest number of bins among the three binning tools, MAGmax recovered 37, 1, and 6 more bins than dRep for the human gut (Honduras), black soil, and neonatal gut samples, respectively. The largest gain was seen in the MetaBAT2 results, where MAGmax yielded 52, 2, and 18 additional bins across the same datasets. Similar results were observed for the 10% contamination level (Fig. 1, available as supplementary data at Bioinformatics online).
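The complementary cumulative distribution used for this comparison counts, for each quality threshold, the bins scoring at least that threshold. A minimal sketch, with a hypothetical function name and made-up scores that are not results from this study:

```python
from typing import List, Sequence


def complementary_cumulative_counts(scores: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """For each threshold, count the bins whose quality score is at least that threshold."""
    return [sum(1 for s in scores if s >= t) for t in thresholds]


# Made-up quality scores for illustration only.
toy_scores = [95.0, 80.0, 62.5, 50.0]
toy_thresholds = [50, 60, 70, 80, 90]
counts = complementary_cumulative_counts(toy_scores, toy_thresholds)
```

For these toy scores the counts are 4, 3, 2, 2, and 1. A method whose curve lies above another at every threshold yields more bins at every quality level.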
Next, we compared the quality of each dRep bin with the corresponding output bin from MAGmax, which could be the same bin, a different representative bin from its genomic cluster, or a merged and reassembled bin. In Fig. 1c, GenomeFace bins from the Honduras gut dataset showed that 96.1% of bins were the same between the two methods (grey circles), while 3.9% differed, with MAGmax bins having consistently higher quality than dRep bins (blue circles). Across the binning tools, an average of 95% of bins were selected by both methods, and the remaining 5% consistently showed higher quality scores in MAGmax (Fig. 2a, available as supplementary data at Bioinformatics online). This trend was consistent across the Honduras gut and neonatal gut datasets.
In the black soil dataset, VAMB produced only 10 bins, from which MAGmax produced one non-redundant bin. The dRep run failed as no clusters were formed during the primary clustering step with MASH (Ondov et al. 2016). For MetaBAT2, dRep and MAGmax selected identical bins. For GenomeFace, MAGmax produced one bin that differed from dRep and had a higher quality score (Fig. 2a, available as supplementary data at Bioinformatics online). Results for 10% contamination were consistent with those for 5% (Fig. 2b, available as supplementary data at Bioinformatics online).
Since dRep uses CheckM1 to select representative bins, we evaluated MAGmax’s performance based on CheckM1 quality scores (Fig. 2, available as supplementary data at Bioinformatics online). Bins from MAGmax that differ from dRep within the same genomic cluster consistently show higher quality according to CheckM1 predictions, indicating that MAGmax produces superior bins compared to those selected by dRep.
To assess the impact of merging and reassembly, we compared the quality scores of merged and reassembled output bins with those of the corresponding input bins. Figure 1d shows example results from GenomeFace bins for the Honduras gut dataset, where output bins exhibited substantial improvement in quality, with scores increasing by 0.2%–36.9% compared to the highest-quality input bin (dark blue). For instance, two input bins with quality scores were merged and reassembled into a single bin with a score of 0.6, demonstrating the effectiveness of this strategy. Similar trends were observed across other binning tools, datasets, and at the 10% contamination level (Fig. 3, available as supplementary data at Bioinformatics online).
Across all datasets and binning tools, MAGmax was on average faster than dRep, while using one-third of the peak memory (Fig. 1e). The gain in speed is largely attributed to MAGmax’s integration of CheckM2, which is considerably faster than CheckM1 used by dRep (Chklovski et al. 2023).
In conclusion, MAGmax enhances both the quality and quantity of metagenome-assembled genomes (MAGs) through dereplication and bin enrichment. It allows users to set desired cutoffs for ANI, completeness, and contamination. It also provides the option to perform dereplication without reassembly, which is useful when dereplication must be conducted against reference genomes available in public databases. With the exponential increase in metagenomic data, MAGmax’s ability to integrate genomic information across multiple samples, coupled with its speed and memory efficiency, will improve the recovery of MAGs in large-scale metagenomic studies.
Acknowledgements
This work used the Scientific Compute Cluster at GWDG, the joint data center of the Max Planck Society for the Advancement of Science (MPG) and the University of Göttingen.
Contributor Information
Arangasamy Yazhini, Quantitative and Computational Biology, Max-Planck Institute for Multidisciplinary Sciences, 37077 Göttingen, Germany.
Johannes Söding, Quantitative and Computational Biology, Max-Planck Institute for Multidisciplinary Sciences, 37077 Göttingen, Germany; Campus Institute Data Science (CIDAS), University of Göttingen, 37077 Göttingen, Germany.
Author contributions
Arangasamy Yazhini (Conceptualization [lead], Data curation [lead], Formal analysis [lead], Funding acquisition [equal], Investigation [lead], Methodology [lead], Project administration [equal], Resources [supporting], Software [lead], Supervision [supporting], Validation [equal], Visualization [lead], Writing—original draft [lead], Writing—review & editing [supporting]) and Johannes Söding (Conceptualization [lead], Data curation [supporting], Formal analysis [supporting], Funding acquisition [equal], Investigation [supporting], Methodology [supporting], Project administration [equal], Resources [lead], Software [supporting], Supervision [lead], Validation [equal], Visualization [supporting], Writing—original draft [supporting], Writing—review & editing [lead])
Supplementary data
Supplementary data is available at Bioinformatics online.
Conflict of interest: None declared.
Funding
Y.A. acknowledges support from Marie Skłodowska-Curie Actions Postdoctoral Fellowships (Project No. 101111457) under the Horizon Europe programme of the European Union and from the Max Planck Society.
Data availability
All metagenomic datasets used in this study are publicly available in the European Nucleotide Archive (https://www.ebi.ac.uk/ena/browser/home).
References
- Bankevich A, Nurk S, Antipov D et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 2012;19:455–77.
- Beghini F, Pullman J, Alexander M et al. Gut microbiome strain-sharing within isolated village social networks. Nature 2025;637:167–75.
- Chklovski A, Parks DH, Woodcroft BJ et al. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat Methods 2023;20:1203–12.
- Delgado LF, Andersson AF. Evaluating metagenomic assembly approaches for biome-specific gene catalogues. Microbiome 2022;10:72.
- Evans JT, Denef VJ. To dereplicate or not to dereplicate? mSphere 2020;5:10–1128.
- Galanova OO, Mitkin NM, Danilova AA et al. Assessment of soil health through metagenomic analysis of microbiomes in Russian’s black soil. Microorganisms 2025;13:854.
- Jain C, Rodriguez-R LM, Phillippy AM et al. High throughput ANI analysis of 90k prokaryotic genomes reveals clear species boundaries. Nat Commun 2018;9:5114.
- Kang DD, Li F, Kirton E et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 2019;7:e7359.
- Lettich R, Egan R, Riley R et al. GenomeFace: a deep learning-based metagenome binner trained on 43,000 microbial genomes. bioRxiv, https://doi.org/10.1101/2024.02.07.579326, 2024, preprint: not peer reviewed.
- Li H, Handsaker B, Wysoker A et al.; 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics 2009;25:2078–9.
- Liu Y, Schröder J, Schmidt B. Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data. Bioinformatics 2013;29:308–15.
- Ma B, Lu C, Wang Y et al. A genomic catalogue of soil microbiomes boosts mining of biodiversity and genetic resources. Nat Commun 2023;14:7318.
- Nissen JN, Johansen J, Allesøe RL et al. Improved metagenome binning and assembly using deep variational autoencoders. Nat Biotechnol 2021;39:555–60.
- Olm MR, Brown CT, Brooks B et al. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J 2017;11:2864–8.
- Ondov BD, Treangen TJ, Melsted P et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 2016;17:132.
- Parks DH, Imelfort M, Skennerton CT et al. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 2015;25:1043–55.
- Parks DH, Rinke C, Chuvochina M et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat Microbiol 2017;2:1533–42.
- Sahlin K. Strobealign: flexible seed size enables ultra-fast and accurate read alignment. Genome Biol 2022;23:260.
- Shao Y, Garcia-Mauriño C, Clare S et al. Primary succession of bifidobacteria drives pathogen resistance in neonatal microbiota assembly. Nat Microbiol 2024;9:2570–82.
- Shaw J, Yu YW. Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nat Methods 2023;20:1661–5.
- Zhou Y, Liu M, Yang J. Recovering metagenome-assembled genomes from shotgun metagenomic sequencing data: methods, applications, challenges, and opportunities. Microbiol Res 2022;260:127023.