Experimental & Molecular Medicine. 2024 Jul 1;56(7):1501–1512. doi: 10.1038/s12276-024-01262-7

Genome-resolved metagenomics: a game changer for microbiome medicine

Nayeon Kim 1, Junyeong Ma 1, Wonjong Kim 1, Jungyeon Kim 1, Peter Belenky 2, Insuk Lee 1,3
PMCID: PMC11297344  PMID: 38945961

Abstract

Recent substantial evidence implicating commensal bacteria in human diseases has given rise to a new domain in biomedical research: microbiome medicine. This emerging field aims to understand and leverage the human microbiota and derivative molecules for disease prevention and treatment. Despite the complex and hierarchical organization of this ecosystem, most research over the years has relied on 16S amplicon sequencing, a legacy of bacterial phylogeny and taxonomy. Although advanced sequencing technologies have enabled cost-effective analysis of entire microbiota, translating relatively short sequencing reads into the functional and taxonomic organization of the microbiome posed challenges until recently. In the last decade, genome-resolved metagenomics, which aims to reconstruct microbial genomes directly from whole-metagenome sequencing data, has made significant strides and continues to unveil the mysteries of various human-associated microbial communities. The volume of whole-metagenome sequencing data, along with the number of novel metagenome-assembled genomes and protein sequences deposited in public repositories, has increased rapidly. This review provides an overview of the capabilities and methods of genome-resolved metagenomics for studying the human microbiome, with a focus on investigating the prokaryotic microbiota of the human gut. Just as decoding the human genome and its variations marked the beginning of the genomic medicine era, unraveling the genomes of commensal microbes and their sequence variations is ushering us into the era of microbiome medicine. Genome-resolved metagenomics stands as a pivotal tool in this transition and can accelerate our progress toward these scientific and medical milestones.

Subject terms: Genome informatics, Genomics

Metagenomics: illuminating human gut microbiota for medical breakthrough

The human body houses numerous microbes, tiny organisms that are vital for our health. This research aims to overcome the limitations of traditional approaches using genome-resolved metagenomics, a method that reconstructs genomes from complex microbial communities without the need to grow the organisms in a lab. The study focuses on the gut microbiome, using advanced computational methods to build metagenome-assembled genomes from DNA sequencing data. The research successfully increased the known genetic diversity of the human gut microbiome by adding many new genomes to existing databases. The main findings include identifying new microbial species and expanding the genetic repertoire of known species, providing a deeper understanding of the microbial diversity within the human gut. The researchers conclude that genome-resolved metagenomics is a significant advancement in microbiome research, offering new insight into microbial communities and their functions. This summary was initially drafted using artificial intelligence, then revised and fact-checked by the author.

Introduction

The human body is home to a multitude of symbiotic microbial cells that outnumber the host’s own cells and exert a significant influence on human physiology. As evidence regarding the role of commensal microbes in human diseases has accumulated, microbiome medicine has emerged as a new field in biomedical research. This field seeks to harness human microbiota and derived molecules for the prevention and treatment of diseases. Achieving this goal requires a comprehensive understanding of the taxonomic and functional organization of the human microbiome.

Historically, microbial community research has been a domain within microbial ecology that initially focused on environmental microbes. However, the discovery of vast microbial communities within the human body has expanded the scope of this field. For many years, human microbiome research has adopted methodologies based on bacterial phylogeny and taxonomy, particularly 16S rRNA gene sequence analysis1, which is adequate for revealing differences in taxonomic composition between diseased microbiomes and their healthy counterparts. However, the limited taxonomic resolution of 16S rRNA sequences2 and their inherent unsuitability for functional analysis pose obstacles to further advancements, including identifying the functional elements of the microbiome that directly influence host physiology. This situation is reminiscent of human genetics before the human reference genome became available. Without a comprehensive human genome map, the search for disease genes relied on sparse genomic landmarks and could identify only broad chromosomal regions associated with diseases, often requiring years of follow-up studies to pinpoint the responsible genes. The decoding of the human genome and the cataloging of single nucleotide variations accelerated the discovery of disease-associated genes and genetic variations, thus ushering in the era of genomic medicine3.

In this review, we advocate for a similar transition in microbiome medicine. Decoding the complete genomes of all commensal microbial species and cataloging their genetic components will expedite the development of new biomarkers and therapeutics derived from the human microbiome. Genome assembly, especially for not-yet-cultured species, was technically challenging for many years. However, recent advancements in genome-resolved metagenomics have brought significant changes to research. Numerous computational methods have been developed for de novo genome assembly from metagenome shotgun sequencing data, leading to the rapid accumulation of draft genomes in the form of metagenome-assembled genomes (MAGs). This review discusses computational methods for MAG construction and their impact on human microbiome research, with a particular focus on the gut microbiome. Although the same research framework can be applied to other microbial communities in the body, and MAG reconstruction is also possible for symbiotic fungi and viruses, this review primarily addresses prokaryotic commensal microbes.

Inherent limitations of 16S rRNA gene sequencing

16S rRNA gene sequencing has been a popular method for taxonomic analysis of microbial communities due to its cost-effectiveness and straightforward bioinformatic interpretation, making it widely accessible. However, this approach has several inherent limitations related to its target of analysis, the 16S rRNA sequence.

First, the variation in 16S rRNA sequences generally does not permit taxonomic classification at the species level. Recent studies have shown that even analysis of the entire 16S gene using long-read sequencing might not be sufficient for species-level differentiation4. Moreover, subspecies-level differences between microbes of the same species can significantly affect host physiology, yet these nuances are often overlooked in taxonomic analyses of microbiomes. Second, 16S rRNA sequences do not provide information about the functional capabilities of microbes. Although tools such as PICRUSt5 allow metabolic pathways to be predicted from 16S rRNA sequences, the results are merely inferences drawn from a limited number of representative genomes associated with a given 16S rRNA sequence. Third, 16S rRNA sequences are unique to prokaryotes, so non-prokaryotic commensals such as fungi, viruses, and protists cannot be detected using this sequence information. Fourth and most critically, the study of novel species that are considered ‘microbial dark matter’ is challenging, as the interpretation of 16S rRNA sequences relies heavily on databases populated with known bacterial species. This dependency can impede the discovery and understanding of previously uncharacterized microbial entities.

Microbiome analysis with whole-metagenome sequencing (WMS): a new paradigm

The Human Microbiome Project (HMP)6 differed from the Human Genome Project in that it did not produce reference genomes from sequencing data. This was due to the complexity of assembling individual bacterial genomes from mixed sequence reads originating from various bacterial sources. At that time, computational algorithms were not advanced enough to effectively separate and assemble these genomes accurately. Nonetheless, HMP was crucial in shifting microbiome research toward WMS, which involves sequencing all genetic material in a sample to provide a more comprehensive understanding of the microbiome.

HMP significantly contributed to human microbiome research by releasing WMS datasets from the healthy human microbiome to the public. These datasets included 541 samples from the gut, 215 from the vaginal microbiome, 1090 from the oral microbiome, and 56 from the skin microbiome, highlighting the project’s broad scope and its impact on understanding human health. This release led to the development of numerous bioinformatics tools for analysis. The second phase of the HMP, known as HMP2 or iHMP, aimed to provide a more comprehensive understanding of host-microbiome interactions over time7. Extensive multiomics data encompassing host and microbiome interactions were generated for HMP2. These included WMS data related to the human gut and vaginal microbiome during pregnancy and preterm birth, inflammatory bowel diseases, and prediabetes. As a result, the public database was enriched with WMS data from an additional 2000 gut samples and 930 vaginal samples, thus further advancing the resources available for human microbiome research. Thanks to this large-scale consortium project and numerous other studies, the volume of public WMS datasets for the human gut microbiome has grown rapidly, exceeding 110,000 samples by 2023 (Fig. 1). However, a notable issue is the significant geographical bias in the data. Most of the public WMS data originate from a few countries, such as the US, China, and some European nations, leaving gut microbiome data from most Asian and African countries underrepresented. This gap is critical, as the gut microbiota composition is heavily influenced by diet and lifestyle8–10. The current human gut microbiome data landscape, therefore, lacks comprehensiveness. Addressing this imbalance by including underrepresented populations in future sample collections and analyses is essential for a more accurate global understanding of the human gut microbiome.

Fig. 1. Distribution of human gut whole metagenome sequencing (WMS) samples submitted to the NCBI Sequence Read Archive (SRA) by country and year.


The bar graph shows the annual cumulative number of human gut WMS samples submitted to the NCBI SRA. The pie chart inset breaks down the contribution of different countries to the total sample submissions; the USA contributed the majority, followed by China, Sweden, and other countries as of the last recorded year. Countries contributing less than 2% are grouped under “Others.” This figure highlights the increasing growth rate and geographical bias of human gut WMS data in public databases.

Genome-resolved metagenomics: enabling versatile study of the human microbiome

Genome-resolved metagenomics is a transformative approach in microbiome studies, delving into the DNA of mixed microbial communities to directly assemble and analyze individual genomes from metagenomic data. This technique marks a significant advancement over traditional 16S rRNA sequencing, offering an enriched depth of understanding and unprecedented insights into the human microbiome (Fig. 2).

Fig. 2. Comparison of 16S rRNA sequencing and whole-metagenome sequencing (WMS) in microbiome analysis.


a 16S rRNA sequencing analysis can be used to perform taxonomy profiling and functional inference based on the taxonomic profile. b Various routes of microbiome analysis through the WMS, which include both assembly-free and assembly-based approaches. The figure emphasizes the comprehensive insights provided by WMS in understanding microbiomes compared to 16S rRNA sequencing.

At the core of this method, genome-resolved metagenomics allows for the assembly of novel genomes spanning a variety of microorganisms, encompassing bacteria, viruses, and fungi. Including these novel species genomes extends the phylogenetic tree, thus bringing previously undetectable species into focus11. Furthermore, the increasing availability of genomic data at the species level facilitates in-depth investigations of variations within species12. This advancement lays the groundwork for the development of comprehensive pangenomes13, which would offer a more detailed understanding of the genetic diversity within species. Researchers are now poised to uncover numerous novel coding sequences, which could lead to the identification of new metagenome protein families14,15. Genomic comparisons within bacterial species facilitate the tracking of the intra- and interindividual transmission of commensal bacteria16,17, while genome-centric analysis offers a window into microbiome evolution through genetic mutations and horizontal gene transfer18. The within-species genetic diversity reflects the microbiome’s adaptive journey within specific host environments19, thus revealing potential statistical associations between single nucleotide variants (SNVs) or structural variants (SVs) of microbial genomes and host phenotypes20,21. Finally, MAGs enable us to conduct genome-scale metabolic modeling for uncultured bacterial species22, representing a substantial portion of the human gut microbiome, ultimately allowing for the metabolic modeling of individual microbiomes23.

Assembly of individual microbial genomes from metagenomic sequencing reads

Generating MAGs from mixed short-read sequences originating from various microorganisms is the first step in genome-resolved metagenomics. The construction of MAGs comprises a two-step process that includes assembly and binning (Fig. 3).

Fig. 3. Workflow of metagenome-assembled genome (MAG) reconstruction from metagenomic samples.


This flowchart outlines the process of generating MAGs from a stool sample. The procedure begins with the collection of a stool sample, followed by shotgun metagenomic sequencing to obtain fragmented DNA. The DNA fragments are then assembled into contigs. These contigs are clustered based on nucleotide composition and coverage depth to form MAGs through the binning process. The final step involves a quality assessment of the assembled genomes, evaluating completeness and checking for contamination.

During the initial assembly step, short reads are pieced together into longer contigs, much like assembling a puzzle, with the overlapping regions of the reads serving as the connecting elements. Generally, there are two assembly models: the overlap-layout-consensus (OLC) model and the De Bruijn graph model. In the OLC model, each read is represented as a node in a graph, with the overlaps between reads depicted as edges. However, as sequencing depth increases, the number of reads and pairwise overlaps grows, yielding large and complex graphs. In contrast, the De Bruijn graph model improves scalability by decomposing reads into k-mers (substrings of length k)24, so that the graph grows with the number of distinct k-mers rather than with the number of reads. Short-read assemblers such as metaSPAdes25 and MEGAHIT26 employ this strategy, splitting short reads into k-mers and then traversing De Bruijn graphs to assemble them into extended contigs27. Assembly can be performed in two ways: single-assembly, in which each sample is assembled independently, and coassembly, in which multiple samples are pooled and assembled together28,29. Each method has distinct advantages and drawbacks (Supplementary Table 1a). Unlike environmental samples from interconnected habitats such as the ocean and soil, the human gut microbiome is a distinct environment that varies among individuals. Consequently, preserving strain-specific variants such as SNVs is crucial. When samples from different individuals are coassembled, these strain-specific variants are retained as divergent paths in the De Bruijn graph; however, such branching results in numerous fragmented contigs30,31. Therefore, we recommend a single-assembly approach. If the goal is to capture low-abundance taxa, increasing the sequencing depth rather than coassembling is advisable32.
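To make the De Bruijn graph idea concrete, the following is a minimal Python sketch, not a description of how metaSPAdes or MEGAHIT are implemented. Each read is split into k-mers, each k-mer becomes an edge from its (k−1)-mer prefix to its (k−1)-mer suffix, and maximal non-branching paths are spelled out as contigs. The reads, the choice of k = 4 and the function names are hypothetical; real assemblers also handle sequencing errors, reverse complements, coverage and repeats.

```python
from collections import defaultdict

def build_debruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return graph

def degrees(graph):
    """Count incoming and outgoing edges for every node."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for node, succs in graph.items():
        outdeg[node] += len(succs)
        for s in succs:
            indeg[s] += 1
    return indeg, outdeg

def contigs(graph):
    """Spell maximal non-branching paths (unitigs) as contig sequences.

    Simplification: isolated cycles made only of 1-in/1-out nodes are not reported.
    """
    indeg, outdeg = degrees(graph)
    nodes = set(indeg) | set(outdeg)

    def one_in_one_out(n):
        return indeg[n] == 1 and outdeg[n] == 1

    result = []
    for node in nodes:
        if one_in_one_out(node):
            continue  # interior node; reached when walking from a branching node
        for nxt in graph.get(node, ()):
            seq = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                seq += nxt[-1]
            result.append(seq)
    return sorted(result)

# Two overlapping reads from one genome collapse into a single contig.
print(contigs(build_debruijn(["ACGTTG", "GTTGCA"], 4)))
# A second "strain" differing by one base creates a branch and fragments the assembly.
print(contigs(build_debruijn(["ACGTTGCA", "ACGATGCA"], 4)))
```

In the second call, a single-base difference splits what would otherwise be one contig into three pieces, a toy version of the fragmentation effect that strain variants cause in a coassembly.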

Moving on to the binning step, contigs originating from the same genome are grouped into bins, each corresponding to a specific genome. Binning involves clustering similar contigs based on their sequence composition and coverage depth33–36. Sequence composition refers to nucleotide features, such as k-mer frequencies. Because k-mer frequencies and GC content are relatively constant throughout a genome yet differ between species37–39, these features can be used to cluster contigs into a genome bin. The tetranucleotide frequency (TNF) is the most frequently used composition metric and has outperformed other k-mer sizes for this purpose36. Additionally, contigs from the same genome are co-abundant in a sample40, so contigs with comparable coverage depths are more likely to belong to the same genome. Coverage depth can be calculated from a single sample (single-coverage binning) or from a group of samples (multi-coverage binning)41. Both approaches have advantages and disadvantages (Supplementary Table 1b). Single-coverage binning, which relies on co-abundance within a single sample, may inadvertently introduce contaminating contigs into a genome bin, affecting downstream analyses. To mitigate this issue, we suggest multi-coverage binning, which uses co-abundance across multiple samples. This approach requires careful selection of the samples to be analyzed together to ensure accuracy and reduce the risk of contamination.
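The two binning features described above can be sketched in Python as follows. This is a minimal illustration, not the method of any specific binner. It computes a canonical TNF vector, in which each 4-mer is merged with its reverse complement to give 136 features, and a log-scaled coverage profile across samples. Combining the two into one distance with an equal `weight` is a hypothetical choice made for illustration; the tools in Supplementary Table 1c use their own, more careful statistical models.

```python
import itertools
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

# A 4-mer and its reverse complement are counted together because a contig may be
# assembled in either orientation; this leaves 136 canonical tetranucleotides.
CANONICAL = sorted({min(k, revcomp(k)) for k in
                    ("".join(p) for p in itertools.product("ACGT", repeat=4))})
INDEX = {k: i for i, k in enumerate(CANONICAL)}

def tnf(contig):
    """Normalized canonical tetranucleotide frequency vector of a contig."""
    counts = [0] * len(CANONICAL)
    contig = contig.upper()
    for i in range(len(contig) - 3):
        kmer = contig[i:i + 4]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            counts[INDEX[min(kmer, revcomp(kmer))]] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def coverage_profile(depths):
    """Log-scaled mean depth of a contig in each sample (one value per sample)."""
    return [math.log1p(d) for d in depths]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contig_distance(seq_a, depths_a, seq_b, depths_b, weight=0.5):
    """Hypothetical combined distance: weighted sum of TNF and coverage distances."""
    d_tnf = euclidean(tnf(seq_a), tnf(seq_b))
    d_cov = euclidean(coverage_profile(depths_a), coverage_profile(depths_b))
    return weight * d_tnf + (1 - weight) * d_cov
```

In multi-coverage binning, `depths` holds one value per sample. Two contigs that co-vary across many samples receive a small coverage distance, and such agreement is much less likely to occur by chance than similarity in a single sample.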

Furthermore, binning tools differ in the features and algorithms they use to cluster contigs from the same species (Supplementary Table 1c)33–35,42–47. Because no single tool outperforms the others in all scenarios48,49, it is common to run several binning tools and combine their results through ensemble methods. This merging step, referred to as bin refinement, integrates the outputs of multiple binning tools to produce, for each genome, a single bin with the highest-quality combination of contigs50. The tools used for this process are summarized in Supplementary Table 1d51–53.
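Bin refinement can be illustrated with a simplified greedy selection in Python. Bins from all tools are scored, the best bin is kept, and any lower-ranked bin that shares contigs with an already selected bin is discarded. The score (completeness minus 5 × contamination), the `Bin` structure and the example values are assumptions made for illustration. Dedicated refinement tools use their own scoring and can also recombine contigs across bins rather than only choosing among them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bin:
    tool: str
    name: str
    contigs: frozenset
    completeness: float   # percent, e.g. estimated from marker genes
    contamination: float  # percent

def score(b):
    # Assumed quality score: contamination is penalized more heavily than incompleteness.
    return b.completeness - 5 * b.contamination

def refine(bins, min_score=0.0):
    """Greedily keep the best non-overlapping bins pooled from several binning tools."""
    selected, used = [], set()
    for b in sorted(bins, key=score, reverse=True):
        if score(b) < min_score:
            break  # all remaining bins score lower
        if b.contigs & used:
            continue  # shares contigs with a better bin already selected
        selected.append(b)
        used |= b.contigs
    return selected

bins = [
    Bin("toolA", "a1", frozenset({"c1", "c2", "c3"}), 95.0, 2.0),
    Bin("toolB", "b1", frozenset({"c1", "c2"}), 80.0, 0.5),
    Bin("toolB", "b2", frozenset({"c4", "c5"}), 70.0, 1.0),
]
print([b.name for b in refine(bins)])
```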

Because the resulting genome sequences feed into various downstream analyses, the quality of each final bin, i.e., each genome sequence, must be measured. While assembly statistics such as the N50 and the number of contigs describe contiguity, two metrics universally define MAG quality: completeness and contamination54,55. The reliability of a genome sequence increases with its completeness and decreases with its contamination level. According to the widely recognized Minimum Information about a Metagenome-Assembled Genome (MIMAG) standard56, a genome with over 50% completeness and less than 10% contamination is classified as a medium-quality draft genome, whereas a genome with more than 90% completeness and less than 5% contamination is considered a near-complete draft genome. Completeness refers to the proportion of the actual genome that is covered by the assembled genome sequence. Low completeness can lead to an underestimation of a species' functional capacity when inferring its functional capabilities or conducting metabolic modeling57. Contamination refers to the presence of extraneous fragments that do not belong to the genome being reconstructed58. It arises from several sources. First, closely related genomes can be mixed during binning because of their similar sequence compositions. Second, fragments from taxonomically distant genomes can also be incorporated for various reasons. Various computational tools are available to detect contamination in a genome (Supplementary Table 1e)54,59, and using multiple tools with differing strengths is advisable for comprehensive quality control60,61. Third, contamination commonly arises from the inclusion of host sequences, such as human DNA in microbiome studies, or of fungal and viral sequences. Extra caution is necessary for this third type of contamination, which involves eukaryotic or viral sequences.
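The quality thresholds quoted above can be expressed directly as a classification rule. The following minimal Python sketch assumes that completeness and contamination estimates, in percent, have already been produced by a quality-assessment tool. The full MIMAG high-quality tier additionally requires the 23S, 16S and 5S rRNA genes and at least 18 tRNAs; that requirement is not checked here, which is why the top tier is labeled "near-complete". An N50 helper is included for the contiguity statistic mentioned alongside.

```python
def mimag_tier(completeness, contamination):
    """Classify a MAG from completeness/contamination (percent) using MIMAG-style cutoffs.

    The rRNA/tRNA requirements of the MIMAG high-quality tier are not checked.
    """
    if completeness > 90 and contamination < 5:
        return "near-complete"
    if completeness >= 50 and contamination < 10:
        return "medium-quality"
    if contamination < 10:
        return "low-quality"
    return "below MIMAG thresholds"

def n50(contig_lengths):
    """Largest length L such that contigs of length >= L hold at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly

print(mimag_tier(95, 2), mimag_tier(60, 3), n50([100, 200, 300, 400]))
```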

Expansion of phylogeny through MAGs and their taxonomic classification

Recent advancements in bioinformatics and the reduced cost of metagenomic sequencing have greatly facilitated the large-scale construction of bacterial MAGs, which require accurate taxonomic classification. Traditionally, the classification of bacterial genomes has relied on the National Center for Biotechnology Information (NCBI) taxonomy, a system grounded in the International Code of Nomenclature of Prokaryotes62. However, this consensus-based nomenclature system often struggles to keep pace with the rapid discovery of new species. An automatic and objective alternative is to classify new bacterial and archaeal genomes by placing them in a reference phylogenetic tree. The Genome Taxonomy Database (GTDB)63, a reference bacterial taxonomy database, provides a contemporary solution for this. Unlike the NCBI taxonomy, which often relies on the 16S rRNA gene for classification, GTDB bases its bacterial reference tree on 120 single-copy marker proteins. GTDB has also sought to rectify common issues in traditional taxonomy, such as removing polyphyletic groups to align taxonomy with phylogeny64 and normalizing unequal taxonomic ranks65. The GTDB Toolkit (GTDB-Tk)66 was developed to classify novel genomes accurately by placing them within the GTDB framework. Although most species in the GTDB currently carry nonstandard placeholder names, this phylogenetic reference enables genome sequence-based taxonomic annotation, even for novel species, by inferring their positions in the phylogeny.
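GTDB-Tk reports each classification as a semicolon-separated lineage with rank prefixes (d__, p__, c__, o__, f__, g__, s__). An empty species field (s__) means that the genome was placed in the tree but not assigned to an existing species, which is typical for a novel MAG. The following Python sketch parses such strings. The two example lineages are illustrative and are not output from a real run.

```python
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]
PREFIXES = ["d__", "p__", "c__", "o__", "f__", "g__", "s__"]

def parse_lineage(lineage):
    """Split a GTDB-style lineage into a rank -> name dict ('' when unassigned)."""
    fields = lineage.strip().split(";")
    parsed = {}
    for rank, prefix, field in zip(RANKS, PREFIXES, fields):
        if not field.startswith(prefix):
            raise ValueError(f"expected {prefix!r} at {rank} rank, got {field!r}")
        parsed[rank] = field[len(prefix):]
    for rank in RANKS[len(fields):]:
        parsed[rank] = ""  # missing trailing ranks are treated as unassigned
    return parsed

def lowest_assigned_rank(lineage):
    """Deepest rank with a name; 'genus' for a genome that represents a novel species."""
    parsed = parse_lineage(lineage)
    assigned = [r for r in RANKS if parsed[r]]
    return assigned[-1] if assigned else None

# Illustrative lineages (hypothetical examples, not real GTDB-Tk output).
known = ("d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;"
         "f__Bacteroidaceae;g__Phocaeicola;s__Phocaeicola vulgatus")
novel = ("d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;"
         "f__Bacteroidaceae;g__Phocaeicola;s__")
print(lowest_assigned_rank(known), lowest_assigned_rank(novel))
```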

Many MAGs have revealed novel microbial species, thereby significantly expanding the current phylogenetic tree. This advancement is particularly evident in the study of the human gut microbiome, where a limited number of species have been isolated, leaving the vast majority uncultured67. For instance, to date, fewer than 20% of the prokaryotic species cataloged in the Human Reference Gut Microbiome (HRGM)32 have at least one genome assembled from an isolated strain (isolate genome), with the majority of species being defined solely by MAGs (Fig. 4). Notably, several large clades of bacterial taxa have not yet had any isolate genome. With the increasing ease of assembling genomes for uncultured species via MAGs, phylogenetic trees representing prokaryotic life are poised for rapid expansion63,68.

Fig. 4. Comparison of species or genera having isolate genomes and metagenome-assembled genomes (MAGs) and those with MAG alone.


a The phylogenetic tree represents 5414 microbial species cataloged in the Human Reference Gut Microbiome (HRGM), 893 of which (16.5%) possessed at least one isolate genome marked on the outer ring. b Bar chart comparing the number of genera composed of isolate genomes alone, MAGs alone, and those with both isolate genomes and MAGs. The ‘Non-singleton Genera’ column shows numbers excluding genera represented by a single species. This visualization underscores the extent to which MAGs complement isolate genomes in representing microbial diversity, particularly within non-singleton genera.

Functional annotations of MAGs: making sense of novel genomes

Compiling a parts list of a genome is a fundamental step toward understanding a bacterial species; it involves predicting the open reading frames (ORFs) in MAGs and annotating their functions. A recent benchmark study of various gene prediction tools found no universally superior tool69. However, Prodigal70, the most widely used tool in this domain, consistently showed robust performance across scenarios. Prodigal learns key sequence features, such as the ribosome binding site (RBS) motif, start codon usage, and coding statistics, through unsupervised learning, which allows efficient gene prediction in non-model bacterial organisms. Once ORFs are predicted, they undergo automated functional annotation, in which each gene is assigned functional terms based on its homology to known proteins. A widely used database for such homology-based functional annotation is eggNOG71, which contains millions of orthologous groups (OGs). Compared with other annotation tools, eggNOG-mapper72 is notable for its high accuracy, which it achieves by effectively distinguishing between paralogs, i.e., similar sequences with potentially different functions73. Other tools offering homology-based functional annotation include InterProScan74, Prokka75, Bakta76, DRAM77, and MicrobeAnnotator78. Functional annotation tools draw on a range of databases, prominently including KEGG (pathway/orthology)79, Gene Ontology80, Pfam81, and the Carbohydrate-Active Enzyme Database (CAZy)82. These databases are widely used for their comprehensive resources in various domains of biological research.
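Prodigal learns RBS motifs, start codon usage and coding statistics from each genome. The following sketch is far simpler and only illustrates what an ORF is. It performs a naive six-frame scan in Python that reports stretches running from ATG to the next in-frame stop codon above a minimum length. The 30-codon default cutoff and the function names are assumptions. This approach ignores alternative start codons and nested ORFs, and it is not a substitute for Prodigal.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=30):
    """Return (strand, start, end, orf_seq) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based and end-exclusive on the strand that was scanned;
    the codon count includes the stop codon. Only the first ATG before a stop is used.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs

example = "CC" + "ATG" + "AAA" * 5 + "TAA" + "GG"
print(find_orfs(example, min_codons=3))
```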

Genome mining is a crucial method for identifying genes with specialized functions in prokaryotic genomes, particularly antibiotic resistance genes, which are a growing concern for human health. The widespread and excessive use of antibiotics has led to the emergence of antibiotic-resistant pathogens83. Antibiotic resistance genes (ARGs), which may originate in commensal bacteria, pose a significant risk when they are transferred to pathogens via horizontal gene transfer (HGT)84–86. Recent studies on the human gut microbiome have shown a link between antibiotic consumption and the prevalence of ARGs in the microbiome87–89. ARGs are typically detected by aligning sequences against a reference database of known ARGs for homology-based identification, using tools such as RGI90 or ABRicate91. For identifying novel ARGs, hidden Markov model (HMM) approaches, exemplified by ResFam92 and fARGene93, and deep learning techniques, such as deepARG94 and PLM-ARG95, are also employed. These methods nevertheless depend on databases of known ARG sequences, for example to build profiles or train models. It is common to integrate multiple databases, including CARD90, ResFinderFG96, and MEGARes97, to enhance the comprehensiveness and accuracy of ARG detection.
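Homology-based ARG detection ultimately comes down to filtering alignments against a reference database by percent identity and coverage. The following is a minimal Python sketch that assumes hits have already been produced by an aligner and parsed into records. The `Hit` fields, the reference names and the 80% identity and 80% coverage cutoffs are illustrative assumptions, not the settings of RGI, ABRicate or any other particular tool.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    query: str        # predicted gene ID from a MAG
    reference: str    # ARG entry in the reference database (hypothetical names below)
    identity: float   # percent identity of the alignment
    aln_length: int   # aligned length on the reference
    ref_length: int   # full length of the reference ARG

    @property
    def coverage(self):
        return 100.0 * self.aln_length / self.ref_length

def call_args(hits, min_identity=80.0, min_coverage=80.0):
    """Keep the best reference ARG per query gene among hits passing both cutoffs."""
    best = {}
    for h in hits:
        if h.identity < min_identity or h.coverage < min_coverage:
            continue
        current = best.get(h.query)
        if current is None or (h.identity, h.coverage) > (current.identity, current.coverage):
            best[h.query] = h
    return {q: h.reference for q, h in best.items()}

hits = [
    Hit("g1", "tetW_like", 98.0, 600, 639),
    Hit("g1", "tetO_like", 85.0, 620, 639),
    Hit("g2", "ermB_like", 95.0, 200, 738),   # fails coverage (partial gene)
    Hit("g3", "blaX_like", 70.0, 800, 861),   # fails identity
]
print(call_args(hits))
```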

Antimicrobial peptides (AMPs) are short peptides, typically composed of fewer than 100 amino acids98, that inhibit the growth of various microorganisms, including bacteria, fungi, parasites, and viruses99. These peptides are being evaluated as potential alternatives to antibiotics, largely because of their anti-inflammatory and immunomodulatory properties100. Given its diversity, the human gut microbiome is expected to be a rich source of novel AMPs101. Research has increasingly focused on AMPs derived from the human gut microbiome, as they are likely to be nontoxic to human cells102. Machine learning-based methods have proven more effective than homology-based methods for identifying AMPs, because the short length of these peptides limits the sensitivity of homology searches103,104. Furthermore, recent efforts have applied deep learning techniques to discover novel AMP candidates within the human gut microbiome102,104,105.
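Machine learning AMP predictors work on numeric representations of peptide sequences rather than on alignments. As a hedged illustration, the following Python sketch computes a few simple descriptors: length, an approximate net charge at neutral pH, the fraction of hydrophobic residues, and amino acid composition. The residue groupings and the crude charge approximation are assumptions. The deep learning tools cited above learn far richer representations.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFWVY")  # assumed grouping; definitions vary between studies

def peptide_features(peptide):
    """Flat feature vector: [length, net charge, hydrophobic fraction] + 20 compositions."""
    peptide = peptide.upper()
    n = len(peptide)
    if n == 0:
        raise ValueError("empty peptide")
    composition = [peptide.count(aa) / n for aa in AMINO_ACIDS]
    # Crude net charge at pH ~7: Lys/Arg count +1, Asp/Glu count -1; His and termini ignored.
    net_charge = sum(peptide.count(a) for a in "KR") - sum(peptide.count(a) for a in "DE")
    hydrophobic_fraction = sum(aa in HYDROPHOBIC for aa in peptide) / n
    return [float(n), float(net_charge), hydrophobic_fraction] + composition

print(peptide_features("KKLLKKLL")[:3])
```

A classifier would then be trained on such vectors from labeled AMP and non-AMP sequences. Because the features do not depend on alignment, the model can score short peptides that have no detectable homologs.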

Expansion of the pangenome through MAGs: uncovering the full functional potential of individual microbial species

The collection of MAGs for individual species provides insight into their functional potential, often conceptualized as a pangenome, which encompasses all the genes within a species, including both a core genome, comprising genes common to most strains, and an accessory genome, made up of genes found in only a subset of strains106,107. The inclusion of MAGs has led to a significant expansion in pangenome size for many species, surpassing what is observed with pangenomes constructed solely from isolate genomes. This phenomenon is exemplified by the pangenome analysis of Akkermansia muciniphila in the HRGM (Fig. 5). The accessory genome, which contributes to functional diversity within subspecies108, plays a crucial role in adaptation to different hosts and can be associated with pathogenic traits109,110. Thus, developing a comprehensive pangenome that distinguishes between core and accessory genomes is vital for a deeper understanding of the diversity within microbial species across different host populations.
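Given gene families and the genomes that carry each of them, partitioning a pangenome into core and accessory genomes is a counting exercise. The following is a minimal Python sketch. The 95% core threshold is an assumption, because tools use different cutoffs. The gene family names are hypothetical.

```python
def partition_pangenome(presence, genomes, core_fraction=0.95):
    """Split gene families into core and accessory sets.

    presence maps each gene family to the set of genome IDs carrying it;
    genomes lists all genome IDs in the species. A family is core if it is
    present in at least core_fraction of the genomes.
    """
    genomes = set(genomes)
    n = len(genomes)
    core = {fam for fam, carriers in presence.items()
            if n and len(carriers & genomes) >= core_fraction * n}
    accessory = set(presence) - core
    return core, accessory

genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "famA": {"g1", "g2", "g3", "g4"},
    "famB": {"g1", "g2", "g3", "g4"},
    "famC": {"g1"},
    "famD": {"g2", "g3"},
}
print(partition_pangenome(presence, genomes))
```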

Fig. 5. Expansion of the Akkermansia muciniphila pangenome with isolates and metagenome-assembled genomes (MAGs).


This graph depicts the growth of the pangenome size (gene count) of Akkermansia muciniphila as more genomes are sequenced. The solid line indicates the rarefaction curve for isolate genomes only, showing initial rapid growth in pangenome size that begins to plateau as the number of genomes increases. The dashed line represents the extrapolation curve when both isolates and MAGs are considered, suggesting a larger pangenome. This illustrates the impact of incorporating MAGs on understanding the genomic diversity of this species, highlighting that MAGs substantially increase the known gene repertoire beyond what is observed with isolates alone.
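A rarefaction curve like the solid line in Fig. 5 can be computed by adding genomes in random order, recording how many distinct gene families have been seen so far, and averaging over several random orders. The following is a minimal Python sketch under that assumption. The input structure and toy data are hypothetical, and the extrapolated (dashed) curve requires a fitted model that is not reproduced here.

```python
import random

def pangenome_rarefaction(genome_genes, permutations=100, seed=0):
    """Mean pangenome size after adding 1..N genomes in random order.

    genome_genes maps each genome ID to the set of gene families it carries.
    """
    rng = random.Random(seed)
    genomes = list(genome_genes)
    totals = [0.0] * len(genomes)
    for _ in range(permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, g in enumerate(genomes):
            seen |= genome_genes[g]
            totals[i] += len(seen)
    return [t / permutations for t in totals]

toy = {"a": {1, 2, 3}, "b": {1, 2, 4}, "c": {1, 5}}
print(pangenome_rarefaction(toy))
```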

The basic method for conducting pangenome analysis involves collecting all protein sequences from a species and clustering them to identify homologous genes111. Although this approach is efficient, it does not differentiate between paralogs, which are genes derived from gene duplications within the genome that often evolve distinct functions112. A more advanced technique groups homologous genes while preserving synteny using graph-based methods that incorporate information about gene neighborhoods113. This strategy is widely used in pangenome analysis tools such as Roary114, PPanGGOLiN115, and panaroo116. However, these tools can encounter difficulties in accurately grouping genes with weak synteny conservation, particularly those affected by HGT between different species111.
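The basic clustering step can be sketched in Python with union-find: any two proteins whose similarity exceeds a threshold are merged, and the resulting connected components are the gene families. As a hedged simplification, similarity here is the Jaccard index of amino acid k-mer sets rather than the alignment identity that real pipelines use. The 0.5 threshold and k = 3 are assumptions. Like the basic approach described above, this sketch ignores synteny and therefore cannot separate paralogs.

```python
from itertools import combinations

def kmer_set(protein, k=3):
    return {protein[i:i + k] for i in range(len(protein) - k + 1)}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_proteins(proteins, threshold=0.5, k=3):
    """Group proteins (id -> sequence) into families by single-linkage union-find."""
    parent = {pid: pid for pid in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kmers = {pid: kmer_set(seq, k) for pid, seq in proteins.items()}
    for a, b in combinations(proteins, 2):  # all-vs-all; only practical for small inputs
        if jaccard(kmers[a], kmers[b]) >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for pid in proteins:
        families.setdefault(find(pid), set()).add(pid)
    return sorted(families.values(), key=sorted)

proteins = {
    "p1": "MKTAYIAKQRQISFVKSHFSRQ",
    "p2": "MKTAYIAKQRQISFVKSHFSRE",  # near-identical homolog of p1
    "p3": "MSLNNLLERWGAPVVEHLQK",    # unrelated sequence
}
print(cluster_proteins(proteins))
```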

Genome quality plays a critical role in pangenome analysis. MAGs, while convenient to obtain, often have quality issues such as fragmented assemblies and contamination introduced during binning. Fragmented assemblies can lead to gene loss, particularly at contig ends, which may shrink the apparent core genome. Contamination, conversely, can introduce false-positive genes into the accessory genome, artificially inflating it117. Panaroo is widely used for pangenome analysis with MAGs because it handles MAG-associated problems such as genes truncated at contig ends and potential contamination, despite not being designed specifically for MAGs. More recently, ggcaller118 was introduced to address similar challenges in pangenome analysis.
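A simple model shows why fragmentation erodes the core genome. Assume, hypothetically, that each MAG independently contains any given gene with probability equal to its completeness c. The probability that a truly core gene is observed in at least m of n genomes is then a binomial tail, which the following Python sketch evaluates. Real gene loss is not independent, since it concentrates at contig ends as noted above, so this is only an illustration.

```python
from math import comb

def prob_called_core(n_genomes, completeness, min_present=None):
    """P(a truly core gene is observed in at least min_present of n genomes).

    Simplifying, hypothetical model: each genome independently contains the gene
    with probability `completeness`. By default every genome must contain it.
    """
    if min_present is None:
        min_present = n_genomes
    c = completeness
    return sum(comb(n_genomes, k) * c**k * (1 - c) ** (n_genomes - k)
               for k in range(min_present, n_genomes + 1))

strict = prob_called_core(100, 0.9)        # gene must appear in all 100 MAGs
relaxed = prob_called_core(100, 0.9, 95)   # gene must appear in at least 95 MAGs
print(strict, relaxed)
```

With 100 MAGs at 90% completeness, a strict presence-in-all rule recovers a truly core gene with probability 0.9^100, roughly 2.7 × 10^-5. This illustrates why strict core definitions break down for MAG-based pangenomes.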

Cataloging microbial genomes for specific environments: toward a comprehensive reference microbiome

Taxonomic and functional profiling of microbiome samples can be achieved through either assembly-based or assembly-free approaches. The assembly-based method involves de novo assembly of sequence reads into contigs and species bins, which, while not reliant on reference microbial genomes, is time-consuming and computationally intensive. Consequently, for commonly studied environments such as the human gut, the preferred method is the assembly-free approach that utilizes reference microbial genomes. The extensive collection of MAGs has prompted in-depth research into the creation of biome-specific reference databases designed for specific environments119. These references provide a rich source of genome and protein sequences for a given environment to aid in the identification of previously unknown genomes and proteins. Using biome-specific references as databases is especially advantageous for studying complex environments such as the human gut microbiome, facilitating quick and precise taxonomic and functional profiling of metagenomic samples without the need for de novo assembly.
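The assembly-free route can be illustrated with exact k-mer matching against a reference catalog. Each reference genome contributes its k-mers to an index, and each read is assigned to the species whose k-mers it hits most often. The following is a deliberately simplified Python sketch: it performs no lowest-common-ancestor resolution and no reverse-complement handling, and the toy k, sequences and species IDs are hypothetical. Production profilers built on catalogs such as those described below use more elaborate indexes and statistics.

```python
from collections import Counter, defaultdict

def build_index(references, k):
    """Map each k-mer to the set of species (reference IDs) whose genome contains it."""
    index = defaultdict(set)
    for species, genome in references.items():
        for i in range(len(genome) - k + 1):
            index[genome[i:i + k]].add(species)
    return index

def classify_read(read, index, k):
    """Assign a read to the species with the most k-mer hits; None if no hits or a tie."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        for species in index.get(read[i:i + k], ()):
            votes[species] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous between species
    return ranked[0][0]

def profile(reads, index, k):
    """Relative abundance of species among the classified reads."""
    assigned = (classify_read(r, index, k) for r in reads)
    counts = Counter(s for s in assigned if s)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}

refs = {"spA": "ACGTACGGTCCATG", "spB": "TTGACCGATTGCAA"}
idx = build_index(refs, 5)
reads = ["ACGTACGG", "ACGTACGG", "GACCGATT", "GGGGGGGG"]
print(profile(reads, idx, 5))
```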

There are several distinct catalogs of reference microbial genomes specific to the human gut. The Unified Human Gastrointestinal Genome (UHGG) collection120 is a comprehensive catalog that merges three earlier large-scale collections of human gut bacterial genomes11,61,67. UHGG provides 204,938 nonredundant genomes spanning 4644 prokaryotic species. However, it exhibits a geographical bias: it primarily represents samples from the US, China, Denmark, and Spain and lacks gut microbes from many other regions. To mitigate this limitation, the Human Reference Gut Microbiome (HRGM) catalog32 was introduced, adding MAGs assembled from fecal metagenomes from three Asian countries (Korea, India, and Japan). HRGM expands the collection to 232,098 nonredundant genomes spanning 5414 prokaryotic species, roughly 13% more genomes and 17% more species than UHGG. These catalogs have markedly improved the taxonomic classification of reads compared with traditional catalogs such as the Reference Sequence (RefSeq) database, emphasizing the importance of comprehensive catalogs for studying the human gut microbiome32,120. Further efforts have produced additional gut microbiome catalogs for underrepresented geographic areas such as Israel121, Singapore122, and Inner Mongolia123. In addition, cataloging MAGs from the fecal metagenomes of children under three years of age has revealed many new microbial species, offering valuable insights into the early-life human gut microbiome61.

Sequence-resolved microbiome analysis: a population genetics perspective on the human microbiome

The key benefit of genome-resolved microbiome analysis is that it brings genomics to the study of the human microbiome. With access to numerous genomes spanning multiple subspecies, it has become possible to explore the genetic diversity within species of the human microbiome12. This genetic variation underpins several applications, including tracking identical strains, linking specific strains to host phenotypes, and discovering bacterial genetic variants that correlate with host phenotypes124. Strain-level profiling based on single nucleotide variants (SNVs) among strains has been instrumental in these efforts125. For example, bacterial SNVs have recently been associated with host phenotypes such as body mass index, underscoring the importance of nucleotide-level diversity in microbiome research21. Efforts have also been made to profile bacterial structural variants (SVs) in the human microbiome and relate them to human health20. For instance, SVs in human gut bacteria have been linked to bile acid metabolism126 and to the response to immune checkpoint inhibitors127, highlighting the insights that genome-resolved analysis offers into the human microbiome.
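
As a minimal illustration of testing a bacterial SNV for association with a host phenotype, the sketch below applies a two-sided Fisher's exact test to a 2×2 table of hosts cross-classified by whether their strain carries the alternative allele and by a binary phenotype. This is a deliberately simplified stand-in, not the method of the cited studies: the function name and counts are hypothetical, and analyses of this kind generally need to adjust for confounders and correct for testing many SNVs.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Example layout (an assumption for illustration):
        rows    = strain carries the alternative allele (yes / no)
        columns = host phenotype (case / control)
    The p-value sums the hypergeometric probabilities of every table with the
    same margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # probability of the table whose top-left cell equals x
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # relative tolerance so tables tied with the observed one are counted
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


# Hypothetical counts: 12 of 15 cases vs 4 of 15 controls carry the allele.
print(fisher_exact_two_sided(12, 4, 3, 11))
```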

The traditional method for detecting bacterial genetic variation involves culturing and isolating microorganisms, sequencing their genomes, and then identifying mismatches through whole-genome alignment128. However, this approach is of limited use for the human gut microbiome because a large portion of it remains uncultured. An alternative strategy, metagenotyping, aligns WMS reads against reference microbial genomes to identify genetic variants directly from metagenomes, making it a feasible approach for the human gut microbiome. Key tools for read alignment-based metagenotyping include StrainPhlAn129, metaSNV130, and MIDAS131. These methods are comprehensive but time-intensive, and they require high read coverage to distinguish true genetic variants from sequencing errors. To overcome these limitations, newer tools such as GT-Pro132 perform metagenotyping with exact k-mer matching algorithms. K-mer-based methods are faster than read alignment-based methods, although they can be less accurate.
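
The k-mer idea can be sketched as follows: for each SNV site, enumerate every k-mer that overlaps the site under each allele, then count reads whose k-mers match exactly one allele's set. This is a simplified illustration, not GT-Pro's implementation; it considers only the forward strand, and real tools additionally select k-mers that are unique across their reference database. The `min_reads` cutoff stands in for the coverage requirement mentioned above and, like the function names and sequences, is a hypothetical choice.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def allele_kmers(context: str, pos: int, alleles: Tuple[str, ...], k: int) -> Dict[str, str]:
    """Map every k-mer overlapping position `pos` to the allele it carries."""
    table: Dict[str, str] = {}
    for allele in alleles:
        seq = context[:pos] + allele + context[pos + 1:]
        # start positions whose window [start, start + k) covers pos
        for start in range(max(0, pos - k + 1), min(pos, len(seq) - k) + 1):
            table[seq[start:start + k]] = allele
    return table


def genotype(reads: List[str], table: Dict[str, str], k: int,
             min_reads: int = 2) -> Tuple[Optional[str], Counter]:
    """Call the majority allele from reads that exactly match one allele's k-mers."""
    counts: Counter = Counter()
    for read in reads:
        windows = (read[i:i + k] for i in range(len(read) - k + 1))
        hit = {table[w] for w in windows if w in table}
        if len(hit) == 1:  # skip reads matching both alleles (ambiguous)
            counts[hit.pop()] += 1
    if sum(counts.values()) < min_reads:
        return None, counts  # too few informative reads to make a call
    return counts.most_common(1)[0][0], counts


context = "TTGACCATGAGTCAGGA"  # toy reference context; SNV at index 8 (G)
table = allele_kmers(context, 8, ("G", "T"), k=5)
reads = ["CCATTAGTC", "CCATTAGTC", "CATGAGT", "TTGACC"]
call, support = genotype(reads, table, k=5)
print(call, dict(support))  # T {'T': 2, 'G': 1}
```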

Metagenotyping has become a key tool for exploring the strain-level transmission of microbes. Its applications range from studying microbial transfer within the body, such as from the oral cavity to the gut, to investigating how specific diseases may be linked to oral-to-gut microbial transmission17,133. Beyond transfer within a single individual, metagenotyping is instrumental for examining microbial transfer between individuals, including vertical transmission from mothers to infants and microbial sharing within households or larger populations16,134. Another significant application is analyzing strain-level changes in gut microbiome composition after fecal microbiota transplantation (FMT), which provides valuable insights into this therapeutic intervention135–137. These examples underline the versatility of metagenotyping in transmission-related research.
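
Strain-sharing analyses of this kind typically compare the alleles that two samples carry at SNV sites of the same species. A minimal sketch, assuming each sample has already been reduced to one consensus allele per site: compute the fraction of jointly covered sites at which the alleles differ, and call a shared strain when that distance falls below a threshold. The threshold and minimum-overlap defaults are hypothetical; in practice such cutoffs are calibrated per species, for example against the distances observed between unrelated individuals.

```python
from typing import Dict, Optional


def snv_distance(a: Dict[str, str], b: Dict[str, str],
                 min_shared_sites: int = 20) -> Optional[float]:
    """Fraction of jointly covered SNV sites with different consensus alleles."""
    shared = a.keys() & b.keys()
    if len(shared) < min_shared_sites:
        return None  # too few sites covered in both samples to compare
    mismatches = sum(a[s] != b[s] for s in shared)
    return mismatches / len(shared)


def same_strain(a: Dict[str, str], b: Dict[str, str],
                max_distance: float = 0.02,
                min_shared_sites: int = 20) -> Optional[bool]:
    """True/False strain-sharing call, or None when the data are insufficient."""
    d = snv_distance(a, b, min_shared_sites)
    return None if d is None else d <= max_distance


mother = {f"s{i}": "A" for i in range(100)}
infant = dict(mother, s0="G")  # differs at 1 of 100 sites
unrelated = {s: ("G" if i % 3 == 0 else "A") for i, s in enumerate(mother)}
print(same_strain(mother, infant), same_strain(mother, unrelated))  # True False
```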

Metagenotyping has also proven effective in tracking the evolutionary dynamics of gut microbiomes, both within and across individuals. When applied to longitudinal samples, it allows strain similarity to be compared within and between individuals, and it makes it possible to observe how strains of a given species evolve within an individual over time129,138–141. Metagenotyping has also been used to detect genetic changes in gut microbes that occur in response to external influences such as antibiotic treatment142. These applications suggest that metagenotyping holds considerable potential for broader studies, particularly of how various factors induce genetic variation in gut microbes.
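
Within-host allele dynamics can be illustrated with longitudinal allele frequencies. The sketch below flags SNV sites whose allele frequency swings from near-absent to near-fixed (or the reverse) between the first and last time points, then labels the overall change by how many sites swept: a handful is often interpreted as evolution of a resident strain, whereas a very large number suggests replacement by a different strain. The frequency cutoffs and the `replacement_min_sites` value are hypothetical parameters chosen for illustration, not values taken from the cited studies.

```python
from typing import Dict, List


def detect_sweeps(freqs: Dict[str, List[float]],
                  low: float = 0.2, high: float = 0.8) -> List[str]:
    """Sites whose alternative-allele frequency moves from <= low to >= high
    (or from >= high to <= low) between the first and last time points."""
    swept = []
    for site, series in freqs.items():
        first, last = series[0], series[-1]
        if (first <= low and last >= high) or (first >= high and last <= low):
            swept.append(site)
    return swept


def classify_change(n_swept: int, replacement_min_sites: int = 100) -> str:
    """Coarse label for the within-host change in one species."""
    if n_swept == 0:
        return "no change detected"
    return "replacement" if n_swept >= replacement_min_sites else "modification"


# Hypothetical alternative-allele frequencies at three time points
freqs = {
    "s1": [0.05, 0.10, 0.90],  # rises to near fixation
    "s2": [0.50, 0.50, 0.60],  # stays intermediate
    "s3": [0.95, 0.40, 0.10],  # falls from near fixation
}
swept = detect_sweeps(freqs)
print(swept, classify_change(len(swept)))  # ['s1', 's3'] modification
```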

Metabolic modeling of MAGs: enabling metabolic simulation of personal microbiomes

As personalized medicine advances, simulating host-microbiome metabolic interactions is becoming essential for forecasting health outcomes and tailoring treatments143–145. Historically, metabolic engineering relied mainly on genome-scale metabolic models (GEMs) reconstructed from the annotated gene content of culturable species with complete genome sequences146,147. The recent surge in available MAGs now makes it possible to reconstruct GEMs for gut commensal microbes that have not yet been cultured.

The main objective of GEM reconstruction is to capture the metabolic behavior of specific organisms and to predict their interactions using these models. Reconstructing a GEM from a genome sequence is a meticulous, laborious process that requires thorough curation. Given the immense diversity of the human microbiome, which encompasses thousands of species, automating this process is vital. To this end, several tools for automated GEM reconstruction have been developed, such as RAVEN148, Pathway Tools149, and merlin150, which greatly aid MAG-based metabolic modeling. Other notable tools are ModelSEED151,152, CarveMe153, and gapseq154. The resulting GEMs can be evaluated with MEMOTE155, which provides a standardized quality assessment to ensure that GEMs meet specific criteria for accuracy and completeness, facilitating their use in research and applications.
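
Standardized test suites such as MEMOTE include checks of whether each reaction is mass balanced. A minimal sketch of that single check, assuming metabolite formulas are available as plain strings: parse each formula into element counts, weight them by the reaction's stoichiometric coefficients (negative for substrates), and report any element whose net count is not zero. The function names and the toy hexokinase-like reaction are illustrative assumptions; this is not MEMOTE's code, and charge balance is ignored.

```python
import re
from collections import defaultdict
from typing import Dict

ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Dict[str, int]:
    """'C6H12O6' -> {'C': 6, 'H': 12, 'O': 6}; no parentheses or charges."""
    counts: Dict[str, int] = defaultdict(int)
    for element, n in ELEMENT.findall(formula):
        counts[element] += int(n) if n else 1
    return dict(counts)


def mass_imbalance(reaction: Dict[str, float], formulas: Dict[str, str]) -> Dict[str, float]:
    """Net element counts (products minus substrates); empty dict means balanced."""
    net: Dict[str, float] = defaultdict(float)
    for met, coeff in reaction.items():
        for element, n in parse_formula(formulas[met]).items():
            net[element] += coeff * n
    return {el: v for el, v in net.items() if abs(v) > 1e-9}


formulas = {"glc": "C6H12O6", "atp": "C10H12N5O13P3", "g6p": "C6H11O9P",
            "adp": "C10H12N5O10P2", "h": "H"}
hexokinase = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1}
missing_proton = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1}
print(mass_imbalance(hexokinase, formulas))      # {}
print(mass_imbalance(missing_proton, formulas))  # {'H': -1.0}
```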

ModelSEED, a web-based platform, streamlines the generation of draft metabolic models. It uses the SEED framework pipeline, which begins with assembled genome sequences that are submitted to the RAST annotation server for gene annotation156. The pipeline then constructs gene-protein-reaction associations, generates a biomass reaction, assembles the reaction network, and assesses reaction reversibility thermodynamically. The result is an optimized draft model. The AGORA project used the ModelSEED pipeline to produce over 7000 GEMs of human gut bacteria, combining automated draft model generation from MAGs with manual curation157,158.
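
Gene-protein-reaction associations are commonly written as Boolean rules in which "and" joins genes encoding subunits of one enzyme complex (all are required) and "or" joins isozymes (any one suffices). The sketch below evaluates such a rule against the set of genes annotated in a genome, which is how a reconstruction can decide whether a reaction has genetic support, or how a gene knockout affects it. The parser and gene names are illustrative assumptions, not ModelSEED code, and treating a reaction with an empty rule as not gene-limited is also an assumption.

```python
import re
from typing import List, Optional, Set

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def gpr_active(rule: str, present: Set[str]) -> bool:
    """Evaluate a GPR rule such as '(geneA and geneB) or geneC'."""
    tokens: List[str] = TOKEN.findall(rule)
    if not tokens:
        return True  # no gene association: treated as not gene-limited (assumption)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos].lower() if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of GPR rule")
        pos += 1
        return tokens[pos - 1]

    def parse_or() -> bool:
        value = parse_and()
        while peek() == "or":
            take()
            rhs = parse_and()  # always parse the right-hand side
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_atom()
        while peek() == "and":
            take()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom() -> bool:
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("unbalanced parentheses in GPR rule")
            return value
        return tok in present

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


rule = "(geneA and geneB) or geneC"
print(gpr_active(rule, {"geneA"}))           # False: complex incomplete, no isozyme
print(gpr_active(rule, {"geneA", "geneB"}))  # True: complete complex
print(gpr_active(rule, {"geneC"}))           # True: isozyme present
```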

CarveMe is a command-line tool for the fast, automated reconstruction of GEMs. It first builds a universal draft model from the reactions and metabolites in the BiGG Models database159 and completes this universal model by manually curating key aspects of bacterial metabolism. CarveMe then tailors the universal model to each species through a process called 'carving', which removes reactions and metabolites irrelevant to that species and performs gap-filling153. This pipeline rapidly reconstructs metabolic models from genome sequences while preserving critical metabolic functions.
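
The carving idea can be illustrated with a greedy simplification. CarveMe itself formulates carving as a mixed-integer linear program over reaction scores; the sketch below instead visits reactions from weakest to strongest genetic evidence and removes each negatively scored one only if every biomass precursor can still be produced from the medium, judged by network expansion (repeatedly firing any reaction whose substrates are all available). Reactions that must be kept despite poor evidence play the role of gap-filled reactions. Reaction names, scores, and the irreversibility of every reaction are assumptions of this toy example.

```python
from typing import Dict, FrozenSet, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (substrates, products), irreversible


def reachable(reactions: Dict[str, Reaction], seeds: Set[str]) -> Set[str]:
    """Network expansion: fire any reaction whose substrates are all available."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def carve(universal: Dict[str, Reaction], scores: Dict[str, float],
          medium: Set[str], precursors: Set[str]) -> Dict[str, Reaction]:
    """Greedily drop low-evidence reactions while all precursors stay producible."""
    model = dict(universal)
    for rxn in sorted(universal, key=lambda r: scores.get(r, 0.0)):
        if scores.get(rxn, 0.0) >= 0:
            break  # remaining reactions have non-negative evidence: keep them
        trial = {r: v for r, v in model.items() if r != rxn}
        if precursors <= reachable(trial, medium):
            model = trial  # removal is safe, so carve the reaction away
    return model


def make_reaction(subs: Set[str], prods: Set[str]) -> Reaction:
    return frozenset(subs), frozenset(prods)


universal = {
    "R1": make_reaction({"glc"}, {"g6p"}),
    "R2": make_reaction({"g6p"}, {"pyr"}),
    "R3": make_reaction({"glc"}, {"pyr"}),
    "R4": make_reaction({"pyr"}, {"ac"}),
}
scores = {"R1": 2.0, "R2": -0.5, "R3": -1.0, "R4": -2.0}
# R2 survives despite its negative score because pyr is otherwise unreachable
print(sorted(carve(universal, scores, {"glc"}, {"pyr"})))  # ['R1', 'R2']
```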

Gapseq is another automated model reconstruction tool; it draws on multiple biochemical databases to predict metabolic pathways from genetic content. Unlike other tools, it builds its reference sequence database from the UniProt protein sequence database160 and the Transporter Classification Database (TCDB)161, comprising 131,207 unique sequences154. These sequences are linked to 15,150 reactions and 8446 metabolites, which make up the universal model used for reconstruction and gap-filling, providing a comprehensive approach to model building.
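
Pathway prediction of this kind can be sketched as a completeness calculation: for each pathway, take the fraction of its reactions that have sequence support, predict the pathway as present above a cutoff, and pass the unsupported reactions of predicted pathways on as gap-filling candidates. The 0.75 cutoff, the pathway definitions, and the function name are hypothetical; gapseq's actual prediction rules may consider additional criteria, such as key enzymes, and should be taken from its documentation.

```python
from typing import Dict, List, Set, Tuple


def predict_pathways(pathways: Dict[str, List[str]], supported: Set[str],
                     min_completeness: float = 0.75) -> Dict[str, Tuple[float, List[str]]]:
    """Return {pathway: (completeness, unsupported reactions)} for predicted pathways.

    supported: reactions with sequence evidence (e.g. a homology hit passing
    some score cutoff); how that evidence is obtained is outside this sketch.
    """
    predicted = {}
    for name, reactions in pathways.items():
        if not reactions:
            continue  # skip empty pathway definitions
        found = [r for r in reactions if r in supported]
        completeness = len(found) / len(reactions)
        if completeness >= min_completeness:
            missing = [r for r in reactions if r not in supported]
            predicted[name] = (completeness, missing)  # missing -> gap-filling candidates
    return predicted


pathways = {"toy_glycolysis": ["r1", "r2", "r3", "r4", "r5"], "toy_other": ["r6", "r7"]}
print(predict_pathways(pathways, {"r1", "r2", "r3", "r4", "r6"}))
# {'toy_glycolysis': (0.8, ['r5'])}
```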

The primary technique for inferring the phenotypic behavior of organisms from GEMs is constraint-based reconstruction and analysis (COBRA)162,163. COBRA is a systems biology approach that models an organism's phenotypic behavior mathematically and computationally under various constraints. These constraints can represent genetic variation, environmental conditions, or interactions between different behaviors. Within the COBRA framework, flux balance analysis (FBA) is a widely used method. FBA uses linear programming to determine the optimal metabolic fluxes (reaction rates) through a reconstructed metabolic network under specific constraints164. FBA is particularly useful for simulating biological phenomena such as maximal growth rates, metabolite production rates, and the effects of gene knockouts. COBRA methods are available in a range of open-source software packages165–168, of which the COBRA Toolbox is the most popular.
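
In its standard form, FBA can be written as the linear program below, where S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c selects the objective (for example, 1 for the biomass reaction and 0 elsewhere), and the bounds encode constraints such as nutrient uptake limits or irreversibility. A gene knockout is simulated by setting to zero the bounds of every reaction whose gene-protein-reaction rule is no longer satisfied.

```latex
\begin{aligned}
\max_{\mathbf{v}} \quad & \mathbf{c}^{\top}\mathbf{v} \\
\text{subject to} \quad & \mathbf{S}\,\mathbf{v} = \mathbf{0}
  && \text{(steady state: no net accumulation of internal metabolites)} \\
& v_j^{\min} \le v_j \le v_j^{\max} \quad \forall j
  && \text{(capacity, uptake, and reversibility constraints)}
\end{aligned}
```

For instance, in a toy network where a glucose uptake flux v_1 feeds a single biomass-producing reaction v_2, the steady-state row for internal glucose gives v_1 - v_2 = 0, so with an uptake bound v_1 ≤ 10 the maximal biomass flux is v_2 = 10 in the same flux units.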

GEMs of human gut bacteria are frequently used to predict metabolic interactions between microbes and to perform community metabolic modeling. CASINO169, BacArena170, and the Microbiome Modeling Toolbox163 are among the most popular tools for these purposes. Many gut bacterial species depend metabolically on other species, and these dependencies often dictate their co-occurrence within microbial communities171. Modeling such metabolic interactions is therefore key to understanding the structure and resilience of microbial ecosystems, including the human gut microbiome. Moreover, metabolic modeling of a personal gut microbiome can uncover the roles of specific metabolites in human diseases, which is particularly important because it can reveal connections between the microbiome, diseases, and potential treatments172–174. Additionally, modeling interactions among the host, microbiome, and diet can inform personalized dietary recommendations or drug dosing175. Hence, community-wide metabolic modeling that leverages both MAGs and GEMs is poised to make significant contributions to precision microbiome medicine176.
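
A crude way to expose metabolic dependencies between two species is network expansion on their pooled reactions: metabolites that the pair can produce from a shared medium but that neither can produce alone point to cross-feeding opportunities. The sketch assumes free exchange of every metabolite between the species and irreversible toy reactions; the species, reactions, and function names are hypothetical, and the community modeling tools cited above use far richer, flux-based formulations.

```python
from typing import FrozenSet, List, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (substrates, products)


def expand(reactions: List[Reaction], seeds: Set[str]) -> Set[str]:
    """Network expansion: everything producible from `seeds` with `reactions`."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def cooperative_products(species_a: List[Reaction], species_b: List[Reaction],
                         medium: Set[str]) -> Set[str]:
    """Metabolites producible only when both species' reactions are pooled."""
    alone = expand(species_a, medium) | expand(species_b, medium)
    together = expand(species_a + species_b, medium)
    return together - alone


def rxn(subs: Set[str], prods: Set[str]) -> Reaction:
    return frozenset(subs), frozenset(prods)


# Hypothetical cross-feeding: A ferments glucose to lactate; B converts lactate to butyrate.
species_a = [rxn({"glc"}, {"lac"})]
species_b = [rxn({"lac"}, {"but"})]
print(cooperative_products(species_a, species_b, {"glc"}))  # {'but'}
```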

Limitations and challenges

Several challenges and limitations remain in applying genome-resolved metagenomics to human microbiome research. First, a substantial proportion of MAGs in existing databases are incomplete genomes that contain gaps. Because the quality of reference genomes is crucial for the success of downstream genome-centric microbiome analyses, methods capable of reconstructing complete MAGs (cMAGs) without gaps are under active development48. Short-read sequencing struggles to assemble sequence regions that are highly conserved across species, such as 16S rRNA genes, and cannot reliably capture genomic regions transferred between species through HGT. Researchers are therefore exploring long-read sequencing as a solution. Initially, hybrid sequencing, which combines the nucleotide-level accuracy of short reads with the long-range structure provided by long reads, was used to construct cMAGs177. More recently, high-fidelity long-read sequencing alone has also been used successfully to construct cMAGs178. High-fidelity long-read metagenomic sequencing is expected to rapidly increase the number of complete genomes available for commensal bacteria of the human body.
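
One simple heuristic used when judging whether an assembled contig represents a closed circular chromosome is to look for an exact overlap between its end and its start, which some assemblers leave when they traverse a circular genome. The sketch below returns the length of the longest such overlap. The `min_overlap` value is a hypothetical choice, and an overlap alone is not proof of completeness: careful cMAG curation typically also checks read support across the junction and other quality signals.

```python
def circular_overlap(contig: str, min_overlap: int = 20) -> int:
    """Length of the longest exact suffix/prefix overlap of at least
    `min_overlap` bases (0 if none), a hint that the contig is circular."""
    min_overlap = max(1, min_overlap)
    for k in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[-k:] == contig[:k]:
            return k
    return 0


genome = "ATGCGTACCTTAGGCATCGA"
contig = genome + genome[:8]  # toy contig with an assembler-style duplicated junction
print(circular_overlap(contig, min_overlap=5))  # 8
print(circular_overlap(genome, min_overlap=5))  # 0
```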

Second, most MAGs have been assembled from metagenomic samples originating from a limited number of countries179. This unequal representation in microbiome data, and consequently in the assembled genomes, can lead to an incomplete understanding of microbiome diversity across populations and to incomplete catalogs of reference microbial genomes. Potential consequences include inconsistencies in identifying disease-associated microbes and misinterpretation of comparative microbiome studies. Future MAG-based human microbiome research should therefore prioritize underrepresented populations.

Third, applying genome-resolved metagenomics to low-biomass samples, such as tissue microbiomes, is challenging because only a small fraction of shotgun sequencing reads derive from microbial genomes. To address this challenge, various methods for host DNA removal have been developed180–183, and several host DNA removal kits are now commercially available184,185. Effectively enriching bacterial DNA substantially increases the likelihood of reconstructing MAGs from low-biomass samples, extending the genome-resolved metagenomics approach to a wider array of microbial communities within the human body.
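
The benefit of host DNA depletion can be estimated with simple coverage arithmetic. Assuming uniformly distributed reads, each read (including each mate of a pair) counted once, the expected depth for one genome is total reads × microbial fraction × the genome's relative abundance among microbial reads × read length ÷ genome size. With hypothetical numbers (20 million 150-bp reads, a 5-Mb genome at 10% relative abundance), raising the microbial fraction from 1% to 20% lifts expected coverage from 0.6× to 12×. The function names are illustrative.

```python
def expected_coverage(total_reads: float, microbial_fraction: float,
                      rel_abundance: float, read_length: float,
                      genome_size: float) -> float:
    """Expected mean depth of one genome, assuming uniformly distributed reads."""
    return total_reads * microbial_fraction * rel_abundance * read_length / genome_size


def reads_needed(target_coverage: float, microbial_fraction: float,
                 rel_abundance: float, read_length: float,
                 genome_size: float) -> float:
    """Total reads required to reach `target_coverage` for that genome."""
    return target_coverage * genome_size / (microbial_fraction * rel_abundance * read_length)


for frac in (0.01, 0.2):  # before vs after (hypothetical) host DNA depletion
    cov = expected_coverage(20_000_000, frac, 0.1, 150, 5_000_000)
    need = reads_needed(10, frac, 0.1, 150, 5_000_000)
    print(f"microbial fraction {frac:.0%}: {cov:.1f}x coverage; "
          f"{need / 1e6:.0f}M reads for 10x")
```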

Supplementary information

Supplementary Table 1

Acknowledgements

This research was supported by the Korea Health Technology R&D Project, Korea Health Industry Development Institute (KHIDI), Ministry of Health & Welfare, Republic of Korea grant HI19C1344 (NY). P.B. was supported by the National Institute of Diabetes and Digestive and Kidney Diseases (R01 DK125382). The funding agencies had no role in the design and preparation of this manuscript.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Peter Belenky, Email: peter_belenky@brown.edu.

Insuk Lee, Email: insuklee@yonsei.ac.kr.

Supplementary information

The online version contains supplementary material available at 10.1038/s12276-024-01262-7.

References

  • 1. Clarridge, J. E. 3rd. Impact of 16S rRNA gene sequence analysis for identification of bacteria on clinical microbiology and infectious diseases. Clin. Microbiol. Rev. 17, 840–862, 10.1128/CMR.17.4.840-862.2004 (2004).
  • 2. Palys, T., Nakamura, L. K. & Cohan, F. M. Discovery and classification of ecological diversity in the bacterial world: the role of DNA sequence data. Int. J. Syst. Bacteriol. 47, 1145–1156, 10.1099/00207713-47-4-1145 (1997).
  • 3. Lander, E. S. Initial impact of the sequencing of the human genome. Nature 470, 187–197, 10.1038/nature09792 (2011).
  • 4. Hassler, H. B. et al. Phylogenies of the 16S rRNA gene and its hypervariable regions lack concordance with core genome phylogenies. Microbiome 10, 104, 10.1186/s40168-022-01295-y (2022).
  • 5. Langille, M. G. et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat. Biotechnol. 31, 814–821, 10.1038/nbt.2676 (2013).
  • 6. Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214, 10.1038/nature11234 (2012).
  • 7. Integrative HMP (iHMP) Research Network Consortium. The integrative human microbiome project. Nature 569, 641–648, 10.1038/s41586-019-1238-8 (2019).
  • 8. Conlon, M. A. & Bird, A. R. The impact of diet and lifestyle on gut microbiota and human health. Nutrients 7, 17–44, 10.3390/nu7010017 (2014).
  • 9. Parizadeh, M. & Arrieta, M. C. The global human gut microbiome: genes, lifestyles, and diet. Trends Mol. Med. 29, 789–801, 10.1016/j.molmed.2023.07.002 (2023).
  • 10. Rothschild, D. et al. Environment dominates over host genetics in shaping human gut microbiota. Nature 555, 210–215, 10.1038/nature25973 (2018).
  • 11. Almeida, A. et al. A new genomic blueprint of the human gut microbiota. Nature 568, 499–504, 10.1038/s41586-019-0965-1 (2019).
  • 12. Van Rossum, T., Ferretti, P., Maistrenko, O. M. & Bork, P. Diversity within species: interpreting strains in microbiomes. Nat. Rev. Microbiol. 18, 491–506, 10.1038/s41579-020-0368-1 (2020).
  • 13. Zhai, Y. & Wei, C. Open pangenome of Lactococcus lactis generated by a combination of metagenome-assembled genomes and isolate genomes. Front. Microbiol. 13, 948138, 10.3389/fmicb.2022.948138 (2022).
  • 14. Pavlopoulos, G. A. et al. Unraveling the functional dark matter through global metagenomics. Nature 622, 594–602, 10.1038/s41586-023-06583-7 (2023).
  • 15. Baltoumas, F. A. et al. NMPFamsDB: a database of novel protein families from microbial metagenomes and metatranscriptomes. Nucleic Acids Res. 10.1093/nar/gkad800 (2023).
  • 16. Valles-Colomer, M. et al. The person-to-person transmission landscape of the gut and oral microbiomes. Nature 614, 125–135, 10.1038/s41586-022-05620-1 (2023).
  • 17. Schmidt, T. S. et al. Extensive transmission of microbes along the gastrointestinal tract. Elife 8, 10.7554/eLife.42693 (2019).
  • 18. Brito, I. L. Examining horizontal gene transfer in microbial communities. Nat. Rev. Microbiol. 19, 442–453, 10.1038/s41579-021-00534-7 (2021).
  • 19. Costea, P. I. et al. Subspecies in the global human gut microbiome. Mol. Syst. Biol. 13, 960, 10.15252/msb.20177589 (2017).
  • 20. Zeevi, D. et al. Structural variation in the gut microbiome associates with host health. Nature 568, 43–48, 10.1038/s41586-019-1065-y (2019).
  • 21. Zahavi, L. et al. Bacterial SNPs in the human gut microbiome associate with host BMI. Nat. Med. 29, 2785–2792, 10.1038/s41591-023-02599-8 (2023).
  • 22. Zorrilla, F., Buric, F., Patil, K. R. & Zelezniak, A. metaGEM: reconstruction of genome scale metabolic models directly from metagenomes. Nucleic Acids Res. 49, e126, 10.1093/nar/gkab815 (2021).
  • 23. Heinken, A., Basile, A., Hertel, J., Thinnes, C. & Thiele, I. Genome-scale metabolic modeling of the human microbiome in the era of personalized medicine. Annu. Rev. Microbiol. 75, 199–222, 10.1146/annurev-micro-060221-012134 (2021).
  • 24. Simpson, J. T. & Pop, M. The theory and practice of genome sequence assembly. Annu. Rev. Genomics Hum. Genet. 16, 153–172, 10.1146/annurev-genom-090314-050032 (2015).
  • 25. Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834, 10.1101/gr.213959.116 (2017).
  • 26. Li, D., Liu, C. M., Luo, R., Sadakane, K. & Lam, T. W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676, 10.1093/bioinformatics/btv033 (2015).
  • 27. Compeau, P. E., Pevzner, P. A. & Tesler, G. How to apply de Bruijn graphs to genome assembly. Nat. Biotechnol. 29, 987–991, 10.1038/nbt.2023 (2011).
  • 28. Churcheward, B., Millet, M., Bihouee, A., Fertin, G. & Chaffron, S. MAGNETO: an automated workflow for genome-resolved metagenomics. mSystems 7, e0043222, 10.1128/msystems.00432-22 (2022).
  • 29. Delgado, L. F. & Andersson, A. F. Evaluating metagenomic assembly approaches for biome-specific gene catalogues. Microbiome 10, 72, 10.1186/s40168-022-01259-2 (2022).
  • 30. Sczyrba, A. et al. Critical assessment of metagenome interpretation: a benchmark of metagenomics software. Nat. Methods 14, 1063–1071, 10.1038/nmeth.4458 (2017).
  • 31. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868, 10.1038/ismej.2017.126 (2017).
  • 32. Kim, C. Y. et al. Human reference gut microbiome catalog including newly assembled genomes from under-represented Asian metagenomes. Genome Med. 13, 134, 10.1186/s13073-021-00950-7 (2021).
  • 33. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146, 10.1038/nmeth.3103 (2014).
  • 34. Wu, Y. W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607, 10.1093/bioinformatics/btv638 (2016).
  • 35. Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359, 10.7717/peerj.7359 (2019).
  • 36. Nissen, J. N. et al. Improved metagenome binning and assembly using deep variational autoencoders. Nat. Biotechnol. 39, 555–560, 10.1038/s41587-020-00777-4 (2021).
  • 37. Teeling, H., Meyerdierks, A., Bauer, M., Amann, R. & Glockner, F. O. Application of tetranucleotide frequencies for the assignment of genomic fragments. Environ. Microbiol. 6, 938–947, 10.1111/j.1462-2920.2004.00624.x (2004).
  • 38. Musto, H. et al. Genomic GC level, optimal growth temperature, and genome size in prokaryotes. Biochem. Biophys. Res. Commun. 347, 1–3, 10.1016/j.bbrc.2006.06.054 (2006).
  • 39. Saeed, I., Tang, S. L. & Halgamuge, S. K. Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition. Nucleic Acids Res. 40, e34, 10.1093/nar/gkr1204 (2012).
  • 40. Nielsen, H. B. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat. Biotechnol. 32, 822–828, 10.1038/nbt.2939 (2014).
  • 41. Mattock, J. & Watson, M. A comparison of single-coverage and multi-coverage metagenomic binning reveals extensive hidden contamination. Nat. Methods 20, 1170–1173, 10.1038/s41592-023-01934-8 (2023).
  • 42. Lindez, P. P. et al. Adversarial and variational autoencoders improve metagenomic binning. Commun. Biol. 6, 1073, 10.1038/s42003-023-05452-3 (2023).
  • 43. Liu, C. C. et al. MetaDecoder: a novel method for clustering metagenomic contigs. Microbiome 10, 46, 10.1186/s40168-022-01237-8 (2022).
  • 44. Pan, S., Zhu, C., Zhao, X. M. & Coelho, L. P. A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments. Nat. Commun. 13, 2326, 10.1038/s41467-022-29843-y (2022).
  • 45. Hickl, O., Queiros, P., Wilmes, P., May, P. & Heintz-Buschart, A. binny: an automated binning algorithm to recover high-quality genomes from complex metagenomic datasets. Brief. Bioinform. 23, 10.1093/bib/bbac431 (2022).
  • 46. Wang, Z., Huang, P., You, R., Sun, F. & Zhu, S. MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities. Genome Biol. 24, 1, 10.1186/s13059-022-02832-6 (2023).
  • 47. Pan, S., Zhao, X. M. & Coelho, L. P. SemiBin2: self-supervised contrastive learning leads to better MAGs for short- and long-read sequencing. Bioinformatics 39, i21–i29, 10.1093/bioinformatics/btad209 (2023).
  • 48. Chen, L. X., Anantharaman, K., Shaiber, A., Eren, A. M. & Banfield, J. F. Accurate and complete genomes from metagenomes. Genome Res. 30, 315–333, 10.1101/gr.258640.119 (2020).
  • 49. Meyer, F. et al. Critical assessment of metagenome interpretation: the second round of challenges. Nat. Methods 19, 429–440, 10.1038/s41592-022-01431-4 (2022).
  • 50. Song, W. Z. & Thomas, T. Binning_refiner: improving genome bins through the combination of different binning programs. Bioinformatics 33, 1873–1875, 10.1093/bioinformatics/btx086 (2017).
  • 51. Sieber, C. M. K. et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat. Microbiol. 3, 836–843, 10.1038/s41564-018-0171-1 (2018).
  • 52. Uritskiy, G. V., DiRuggiero, J. & Taylor, J. MetaWRAP: a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6, 158, 10.1186/s40168-018-0541-1 (2018).
  • 53. Ruhlemann, M. C., Wacker, E. M., Ellinghaus, D. & Franke, A. MAGScoT: a fast, lightweight and accurate bin-refinement tool. Bioinformatics 38, 5430–5433, 10.1093/bioinformatics/btac694 (2022).
  • 54. Chklovski, A., Parks, D. H., Woodcroft, B. J. & Tyson, G. W. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat. Methods 20, 1203–1212, 10.1038/s41592-023-01940-w (2023).
  • 55. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055, 10.1101/gr.186072.114 (2015).
  • 56. Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 35, 725–731, 10.1038/nbt.3893 (2017).
  • 57. Eisenhofer, R., Odriozola, I. & Alberdi, A. Impact of microbial genome completeness on metagenomic functional inference. ISME Commun. 3, 12, 10.1038/s43705-023-00221-z (2023).
  • 58. Cornet, L. & Baurain, D. Contamination detection in genomic data: more is not enough. Genome Biol. 23, 60, 10.1186/s13059-022-02619-9 (2022).
  • 59. Orakov, A. et al. GUNC: detection of chimerism and contamination in prokaryotic genomes. Genome Biol. 22, 178, 10.1186/s13059-021-02393-0 (2021).
  • 60. Fullam, A. et al. proGenomes3: approaching one million accurately and consistently annotated high-quality prokaryotic genomes. Nucleic Acids Res. 51, D760–D766, 10.1093/nar/gkac1078 (2023).
  • 61. Zeng, S. et al. A compendium of 32,277 metagenome-assembled genomes and over 80 million genes from the early-life human gut microbiome. Nat. Commun. 13, 5139, 10.1038/s41467-022-32805-z (2022).
  • 62. Schoch, C. L. et al. NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database (Oxford) 2020, 10.1093/database/baaa062 (2020).
  • 63. Parks, D. H. et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 50, D785–D794, 10.1093/nar/gkab776 (2022).
  • 64. Beiko, R. G. Microbial malaise: how can we classify the microbiome? Trends Microbiol. 23, 671–679, 10.1016/j.tim.2015.08.009 (2015).
  • 65. Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36, 996–1004, 10.1038/nbt.4229 (2018).
  • 66. Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the genome taxonomy database. Bioinformatics 36, 1925–1927, 10.1093/bioinformatics/btz848 (2019).
  • 67. Nayfach, S., Shi, Z. J., Seshadri, R., Pollard, K. S. & Kyrpides, N. C. New insights from uncultivated genomes of the global human gut microbiome. Nature 568, 505–510, 10.1038/s41586-019-1058-x (2019).
  • 68. Castelle, C. J. & Banfield, J. F. Major new microbial groups expand diversity and alter our understanding of the tree of life. Cell 172, 1181–1197, 10.1016/j.cell.2018.02.016 (2018).
  • 69. Dimonaco, N. J., Aubrey, W., Kenobi, K., Clare, A. & Creevey, C. J. No one tool to rule them all: prokaryotic gene prediction tool annotations are highly dependent on the organism of study. Bioinformatics 38, 1198–1207, 10.1093/bioinformatics/btab827 (2022).
  • 70. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119, 10.1186/1471-2105-11-119 (2010).
  • 71. Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314, 10.1093/nar/gky1085 (2019).
  • 72. Cantalapiedra, C. P., Hernandez-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829, 10.1093/molbev/msab293 (2021).
  • 73. Gabaldon, T. & Koonin, E. V. Functional and evolutionary implications of gene orthology. Nat. Rev. Genet. 14, 360–366, 10.1038/nrg3456 (2013).
  • 74. Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116–W120, 10.1093/nar/gki442 (2005).
  • 75. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069, 10.1093/bioinformatics/btu153 (2014).
  • 76. Schwengers, O. et al. Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microb. Genom. 7, 10.1099/mgen.0.000685 (2021).
  • 77.Shaffer, M. et al. DRAM for distilling microbial metabolism to automate the curation of microbiome function. Nucleic Acids Res.48, 8883–8900, 10.1093/nar/gkaa621 (2020). 10.1093/nar/gkaa621 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Ruiz-Perez, C. A., Conrad, R. E. & Konstantinidis, K. T. MicrobeAnnotator: a user-friendly, comprehensive functional annotation pipeline for microbial genomes. BMC Bioinforma.22, 11, 10.1186/s12859-020-03940-5 (2021). 10.1186/s12859-020-03940-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Kanehisa, M., Furumichi, M., Sato, Y., Kawashima, M. & Ishiguro-Watanabe, M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res.51, D587–D592, 10.1093/nar/gkac963 (2023). 10.1093/nar/gkac963 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Gene Ontology, C. The gene ontology resource: enriching a GOld mine. Nucleic Acids Res.49, D325–D334, 10.1093/nar/gkaa1113 (2021). 10.1093/nar/gkaa1113 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81. Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419, 10.1093/nar/gkaa913 (2021).
  • 82. Drula, E. et al. The carbohydrate-active enzyme database: functions and literature. Nucleic Acids Res. 50, D571–D577, 10.1093/nar/gkab1045 (2022).
  • 83. Antimicrobial Resistance Collaborators. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet 399, 629–655, 10.1016/S0140-6736(21)02724-0 (2022).
  • 84. Crits-Christoph, A., Hallowell, H. A., Koutouvalis, K. & Suez, J. Good microbes, bad genes? The dissemination of antimicrobial resistance in the human microbiome. Gut Microbes 14, 2055944, 10.1080/19490976.2022.2055944 (2022).
  • 85. Sommer, M. O. A., Dantas, G. & Church, G. M. Functional characterization of the antibiotic resistance reservoir in the human microflora. Science 325, 1128–1131, 10.1126/science.1176950 (2009).
  • 86. Forster, S. C. et al. Strain-level characterization of broad host range mobile genetic elements transferring antibiotic resistance from the human microbiome. Nat. Commun. 13, 1445, 10.1038/s41467-022-29096-9 (2022).
  • 87. Lee, K. et al. Population-level impacts of antibiotic usage on the human gut microbiome. Nat. Commun. 14, 1191, 10.1038/s41467-023-36633-7 (2023).
  • 88. Fredriksen, S., de Warle, S., van Baarlen, P., Boekhorst, J. & Wells, J. M. Resistome expansion in disease-associated human gut microbiomes. Microbiome 11, 166, 10.1186/s40168-023-01610-1 (2023).
  • 89. Rowan-Nash, A. D., Araos, R., D’Agata, E. M. C. & Belenky, P. Antimicrobial resistance gene prevalence in a population of patients with advanced dementia is related to specific pathobionts. iScience 23, 100905, 10.1016/j.isci.2020.100905 (2020).
  • 90. Alcock, B. P. et al. CARD 2023: expanded curation, support for machine learning, and resistome prediction at the comprehensive antibiotic resistance database. Nucleic Acids Res. 51, D690–D699, 10.1093/nar/gkac920 (2023).
  • 91. Seemann, T. ABRicate. GitHub https://github.com/tseemann/abricate.
  • 92. Gibson, M. K., Forsberg, K. J. & Dantas, G. Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J. 9, 207–216, 10.1038/ismej.2014.106 (2015).
  • 93. Berglund, F. et al. Identification and reconstruction of novel antibiotic resistance genes from metagenomes. Microbiome 7, 52, 10.1186/s40168-019-0670-1 (2019).
  • 94. Arango-Argoty, G. et al. DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data. Microbiome 6, 23, 10.1186/s40168-018-0401-z (2018).
  • 95. Wu, J. et al. PLM-ARG: antibiotic resistance gene identification using a pretrained protein language model. Bioinformatics 39, 10.1093/bioinformatics/btad690 (2023).
  • 96. Gschwind, R. et al. ResFinderFG v2.0: a database of antibiotic resistance genes obtained by functional metagenomics. Nucleic Acids Res. 51, W493–W500, 10.1093/nar/gkad384 (2023).
  • 97. Bonin, N. et al. MEGARes and AMR++, v3.0: an updated comprehensive database of antimicrobial resistance determinants and an improved software pipeline for classification using high-throughput sequencing. Nucleic Acids Res. 51, D744–D752, 10.1093/nar/gkac1047 (2023).
  • 98. Zhang, G., Ross, C. R. & Blecha, F. Porcine antimicrobial peptides: new prospects for ancient molecules of host defense. Vet. Res. 31, 277–296, 10.1051/vetres:2000121 (2000).
  • 99. Huan, Y., Kong, Q., Mou, H. & Yi, H. Antimicrobial peptides: classification, design, application and research progress in multiple fields. Front. Microbiol. 11, 582779, 10.3389/fmicb.2020.582779 (2020).
  • 100. Divyashree, M. et al. Clinical Applications of Antimicrobial Peptides (AMPs): where do we stand now? Protein Pept. Lett. 27, 120–134, 10.2174/0929866526666190925152957 (2020).
  • 101. Garcia-Gutierrez, E., Mayer, M. J., Cotter, P. D. & Narbad, A. Gut microbiota as a source of novel antimicrobials. Gut Microbes 10, 1–21, 10.1080/19490976.2018.1455790 (2019).
  • 102. Ma, Y. et al. Identification of antimicrobial peptides from the human gut microbiome using deep learning. Nat. Biotechnol. 40, 921–931, 10.1038/s41587-022-01226-0 (2022).
  • 103. Spanig, S. & Heider, D. Encodings and models for antimicrobial peptide classification for multi-resistant pathogens. BioData Min. 12, 7, 10.1186/s13040-019-0196-x (2019).
  • 104. Veltri, D., Kamath, U. & Shehu, A. Deep learning improves antimicrobial peptide recognition. Bioinformatics 34, 2740–2747, 10.1093/bioinformatics/bty179 (2018).
  • 105. Li, C. et al. AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens. BMC Genomics 23, 77, 10.1186/s12864-022-08310-4 (2022).
  • 106. Tettelin, H. et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. Proc. Natl Acad. Sci. USA 102, 13950–13955, 10.1073/pnas.0506758102 (2005).
  • 107. Mira, A., Martin-Cuadrado, A. B., D’Auria, G. & Rodriguez-Valera, F. The bacterial pan-genome: a new paradigm in microbiology. Int. Microbiol. 13, 45–57, 10.2436/20.1501.01.110 (2010).
  • 108. Zhu, A., Sunagawa, S., Mende, D. R. & Bork, P. Inter-individual differences in the gene content of human gut bacterial species. Genome Biol. 16, 82, 10.1186/s13059-015-0646-9 (2015).
  • 109. Tantoso, E. et al. To kill or to be killed: pangenome analysis of Escherichia coli strains reveals a tailocin specific for pandemic ST131. BMC Biol. 20, 146, 10.1186/s12915-022-01347-7 (2022).
  • 110. Holt, K. E. et al. Genomic analysis of diversity, population structure, virulence, and antimicrobial resistance in Klebsiella pneumoniae, an urgent threat to public health. Proc. Natl Acad. Sci. USA 112, E3574–E3581, 10.1073/pnas.1501049112 (2015).
  • 111. Manzano-Morales, S., Liu, Y., Gonzalez-Bodi, S., Huerta-Cepas, J. & Iranzo, J. Comparison of gene clustering criteria reveals intrinsic uncertainty in pangenome analyses. Genome Biol. 24, 250, 10.1186/s13059-023-03089-3 (2023).
  • 112. Soria, P. S., McGary, K. L. & Rokas, A. Functional divergence for every paralog. Mol. Biol. Evol. 31, 984–992, 10.1093/molbev/msu050 (2014).
  • 113. Fang, G., Rocha, E. P. & Danchin, A. Persistence drives gene clustering in bacterial genomes. BMC Genomics 9, 4, 10.1186/1471-2164-9-4 (2008).
  • 114. Page, A. J. et al. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 31, 3691–3693, 10.1093/bioinformatics/btv421 (2015).
  • 115. Gautreau, G. et al. PPanGGOLiN: depicting microbial diversity via a partitioned pangenome graph. PLoS Comput. Biol. 16, e1007732, 10.1371/journal.pcbi.1007732 (2020).
  • 116. Tonkin-Hill, G. et al. Producing polished prokaryotic pangenomes with the Panaroo pipeline. Genome Biol. 21, 180, 10.1186/s13059-020-02090-4 (2020).
  • 117. Li, T. & Yin, Y. Critical assessment of pan-genomic analysis of metagenome-assembled genomes. Brief. Bioinform. 23, 10.1093/bib/bbac413 (2022).
  • 118. Horsfield, S. T., Tonkin-Hill, G., Croucher, N. J. & Lees, J. A. Accurate and fast graph-based pangenome annotation and clustering with ggCaller. Genome Res. 33, 1622–1637, 10.1101/gr.277733.123 (2023).
  • 119. Gurbich, T. A. et al. MGnify genomes: a resource for biome-specific microbial genome catalogues. J. Mol. Biol. 435, 168016, 10.1016/j.jmb.2023.168016 (2023).
  • 120. Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114, 10.1038/s41587-020-0603-3 (2021).
  • 121. Leviatan, S., Shoer, S., Rothschild, D., Gorodetski, M. & Segal, E. An expanded reference map of the human gut microbiome reveals hundreds of previously unknown species. Nat. Commun. 13, 3863, 10.1038/s41467-022-31502-1 (2022).
  • 122. Gounot, J. S. et al. Genome-centric analysis of short and long read metagenomes reveals uncharacterized microbiome diversity in Southeast Asians. Nat. Commun. 13, 6044, 10.1038/s41467-022-33782-z (2022).
  • 123. Jin, H. et al. A high-quality genome compendium of the human gut microbiome of Inner Mongolians. Nat. Microbiol. 8, 150–161, 10.1038/s41564-022-01270-1 (2023).
  • 124. Ghazi, A. R., Munch, P. C., Chen, D., Jensen, J. & Huttenhower, C. Strain identification and quantitative analysis in microbial communities. J. Mol. Biol. 434, 167582, 10.1016/j.jmb.2022.167582 (2022).
  • 125. Zhao, C., Dimitrov, B., Goldman, M., Nayfach, S. & Pollard, K. S. MIDAS2: metagenomic intra-species diversity analysis system. Bioinformatics 39, 10.1093/bioinformatics/btac713 (2023).
  • 126. Wang, D. et al. Characterization of gut microbial structural variations as determinants of human bile acid metabolism. Cell Host Microbe 29, 1802–1814.e5, 10.1016/j.chom.2021.11.003 (2021).
  • 127. Liu, R. et al. Gut microbial structural variation associates with immune checkpoint inhibitor response. Nat. Commun. 14, 7421, 10.1038/s41467-023-42997-7 (2023).
  • 128. Treangen, T. J., Ondov, B. D., Koren, S. & Phillippy, A. M. The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Genome Biol. 15, 524, 10.1186/s13059-014-0524-x (2014).
  • 129. Truong, D. T., Tett, A., Pasolli, E., Huttenhower, C. & Segata, N. Microbial strain-level population structure and genetic diversity from metagenomes. Genome Res. 27, 626–638, 10.1101/gr.216242.116 (2017).
  • 130. Costea, P. I. et al. metaSNV: a tool for metagenomic strain level analysis. PLoS One 12, e0182392, 10.1371/journal.pone.0182392 (2017).
  • 131. Nayfach, S., Rodriguez-Mueller, B., Garud, N. & Pollard, K. S. An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography. Genome Res. 26, 1612–1625, 10.1101/gr.201863.115 (2016).
  • 132. Shi, Z. J., Dimitrov, B., Zhao, C., Nayfach, S. & Pollard, K. S. Fast and accurate metagenotyping of the human gut microbiome with GT-Pro. Nat. Biotechnol. 40, 507–516, 10.1038/s41587-021-01102-3 (2022).
  • 133. Chen, B. Y. et al. Roles of oral microbiota and oral-gut microbial transmission in hypertension. J. Adv. Res. 43, 147–161, 10.1016/j.jare.2022.03.007 (2023).
  • 134. Asnicar, F. et al. Studying vertical microbiome transmission from mothers to infants by strain-level metagenomic profiling. mSystems 2, 10.1128/mSystems.00164-16 (2017).
  • 135. Li, S. S. et al. Durable coexistence of donor and recipient strains after fecal microbiota transplantation. Science 352, 586–589, 10.1126/science.aad8852 (2016).
  • 136. Schmidt, T. S. B. et al. Drivers and determinants of strain dynamics following fecal microbiota transplantation. Nat. Med. 28, 1902–1912, 10.1038/s41591-022-01913-0 (2022).
  • 137. Ianiro, G. et al. Variability of strain engraftment and predictability of microbiome composition after fecal microbiota transplantation across different diseases. Nat. Med. 28, 1913–1923, 10.1038/s41591-022-01964-3 (2022).
  • 138. Schloissnig, S. et al. Genomic variation landscape of the human gut microbiome. Nature 493, 45–50, 10.1038/nature11711 (2013).
  • 139. Lloyd-Price, J. et al. Strains, functions and dynamics in the expanded Human Microbiome Project. Nature 550, 61–66, 10.1038/nature23889 (2017).
  • 140. Vatanen, T. et al. Genomic variation and strain-specific functional adaptation in the human gut microbiome during early life. Nat. Microbiol. 4, 470–479, 10.1038/s41564-018-0321-5 (2019).
  • 141. Garud, N. R., Good, B. H., Hallatschek, O. & Pollard, K. S. Evolutionary dynamics of bacteria in the gut microbiome within and across hosts. PLoS Biol. 17, e3000102, 10.1371/journal.pbio.3000102 (2019).
  • 142. Roodgar, M. et al. Longitudinal linked-read sequencing reveals ecological and evolutionary responses of a human gut microbiome during antibiotic treatment. Genome Res. 31, 1433–1446, 10.1101/gr.265058.120 (2021).
  • 143. Visconti, A. et al. Interplay between the human gut microbiome and host metabolism. Nat. Commun. 10, 4505, 10.1038/s41467-019-12476-z (2019).
  • 144. Nicholson, J. K. et al. Host-gut microbiota metabolic interactions. Science 336, 1262–1267, 10.1126/science.1223813 (2012).
  • 145. Zhang, Y., Chen, R., Zhang, D., Qi, S. & Liu, Y. Metabolite interactions between host and microbiota during health and disease: Which feeds the other? Biomed. Pharmacother. 160, 114295, 10.1016/j.biopha.2023.114295 (2023).
  • 146. Yim, H. et al. Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nat. Chem. Biol. 7, 445–452, 10.1038/nchembio.580 (2011).
  • 147. Edwards, J. S. & Palsson, B. O. Systems properties of the Haemophilus influenzae Rd metabolic genotype. J. Biol. Chem. 274, 17410–17416, 10.1074/jbc.274.25.17410 (1999).
  • 148. Wang, H. et al. RAVEN 2.0: a versatile toolbox for metabolic network reconstruction and a case study on Streptomyces coelicolor. PLoS Comput. Biol. 14, e1006541, 10.1371/journal.pcbi.1006541 (2018).
  • 149. Karp, P. D. et al. Pathway Tools version 19.0 update: software for pathway/genome informatics and systems biology. Brief. Bioinform. 17, 877–890, 10.1093/bib/bbv079 (2016).
  • 150. Dias, O., Rocha, M., Ferreira, E. C. & Rocha, I. Reconstructing genome-scale metabolic models with merlin. Nucleic Acids Res. 43, 3899–3910, 10.1093/nar/gkv294 (2015).
  • 151. Seaver, S. M. D. et al. The ModelSEED Biochemistry Database for the integration of metabolic annotations and the reconstruction, comparison and analysis of metabolic models for plants, fungi and microbes. Nucleic Acids Res. 49, D575–D588, 10.1093/nar/gkaa746 (2021).
  • 152. Henry, C. S. et al. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat. Biotechnol. 28, 977–982, 10.1038/nbt.1672 (2010).
  • 153. Machado, D., Andrejev, S., Tramontano, M. & Patil, K. R. Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Res. 46, 7542–7553, 10.1093/nar/gky537 (2018).
  • 154. Zimmermann, J., Kaleta, C. & Waschina, S. gapseq: informed prediction of bacterial metabolic pathways and reconstruction of accurate metabolic models. Genome Biol. 22, 81, 10.1186/s13059-021-02295-1 (2021).
  • 155. Lieven, C. et al. MEMOTE for standardized genome-scale metabolic model testing. Nat. Biotechnol. 38, 272–276, 10.1038/s41587-020-0446-y (2020).
  • 156. Overbeek, R. et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 42, D206–D214, 10.1093/nar/gkt1226 (2014).
  • 157. Magnusdottir, S. et al. Generation of genome-scale metabolic reconstructions for 773 members of the human gut microbiota. Nat. Biotechnol. 35, 81–89, 10.1038/nbt.3703 (2017).
  • 158. Heinken, A. et al. Genome-scale metabolic reconstruction of 7,302 human microorganisms for personalized medicine. Nat. Biotechnol. 41, 1320–1331, 10.1038/s41587-022-01628-0 (2023).
  • 159. King, Z. A. et al. BiGG Models: a platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res. 44, D515–D522, 10.1093/nar/gkv1049 (2016).
  • 160. The UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489, 10.1093/nar/gkaa1100 (2021).
  • 161. Saier, M. H. Jr, Reddy, V. S., Tamang, D. G. & Vastermark, A. The transporter classification database. Nucleic Acids Res. 42, D251–D258, 10.1093/nar/gkt1097 (2014).
  • 162. Becker, S. A. et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat. Protoc. 2, 727–738, 10.1038/nprot.2007.99 (2007).
  • 163. Heinken, A. & Thiele, I. Microbiome Modelling Toolbox 2.0: efficient, tractable modelling of microbiome communities. Bioinformatics 38, 2367–2368, 10.1093/bioinformatics/btac082 (2022).
  • 164. Orth, J. D., Thiele, I. & Palsson, B. O. What is flux balance analysis? Nat. Biotechnol. 28, 245–248, 10.1038/nbt.1614 (2010).
  • 165. Heirendt, L. et al. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat. Protoc. 14, 639–702, 10.1038/s41596-018-0098-2 (2019).
  • 166. Jung, T. S., Yeo, H. C., Reddy, S. G., Cho, W. S. & Lee, D. Y. WEbcoli: an interactive and asynchronous web application for in silico design and analysis of genome-scale E.coli model. Bioinformatics 25, 2850–2852, 10.1093/bioinformatics/btp496 (2009).
  • 167. Lee, S. Y. et al. MetaFluxNet, a program package for metabolic pathway construction and analysis, and its use in large-scale metabolic flux analysis of Escherichia coli. Genome Inform. 14, 23–33 (2003).
  • 168. Klamt, S., Saez-Rodriguez, J. & Gilles, E. D. Structural and functional analysis of cellular networks with CellNetAnalyzer. BMC Syst. Biol. 1, 2, 10.1186/1752-0509-1-2 (2007).
  • 169. Shoaie, S. et al. Quantifying diet-induced metabolic changes of the human gut microbiome. Cell Metab. 22, 320–331, 10.1016/j.cmet.2015.07.001 (2015).
  • 170. Bauer, E., Zimmermann, J., Baldini, F., Thiele, I. & Kaleta, C. BacArena: individual-based metabolic modeling of heterogeneous microbes in complex communities. PLoS Comput. Biol. 13, e1005544, 10.1371/journal.pcbi.1005544 (2017).
  • 171. Zelezniak, A. et al. Metabolic dependencies drive species co-occurrence in diverse microbial communities. Proc. Natl Acad. Sci. USA 112, 6449–6454, 10.1073/pnas.1421834112 (2015).
  • 172. Hertel, J., Heinken, A., Martinelli, F. & Thiele, I. Integration of constraint-based modeling with fecal metabolomics reveals large deleterious effects of Fusobacterium spp. on community butyrate production. Gut Microbes 13, 1–23, 10.1080/19490976.2021.1915673 (2021).
  • 173. Hertel, J. et al. Integrated analyses of microbiome and longitudinal metabolome data reveal microbial-host interactions on Sulfur Metabolism in Parkinson’s disease. Cell Rep. 29, 1767–1777.e8, 10.1016/j.celrep.2019.10.035 (2019).
  • 174. Hale, V. L. et al. Synthesis of multi-omic data and community metabolic models reveals insights into the role of hydrogen sulfide in colon cancer. Methods 149, 59–68, 10.1016/j.ymeth.2018.04.024 (2018).
  • 175. Kolodziejczyk, A. A., Zheng, D. & Elinav, E. Diet-microbiota interactions and personalized nutrition. Nat. Rev. Microbiol. 17, 742–753, 10.1038/s41579-019-0256-8 (2019).
  • 176. Nielsen, J. Systems biology of metabolism: a driver for developing personalized and precision medicine. Cell Metab. 25, 572–579, 10.1016/j.cmet.2017.02.002 (2017).
  • 177. Moss, E. L., Maghini, D. G. & Bhatt, A. S. Complete, closed bacterial genomes from microbiomes using nanopore sequencing. Nat. Biotechnol. 38, 701–707, 10.1038/s41587-020-0422-6 (2020).
  • 178. Kim, C. Y., Ma, J. & Lee, I. HiFi metagenomic sequencing enables assembly of accurate and complete genomes from human gut microbiota. Nat. Commun. 13, 6367, 10.1038/s41467-022-34149-0 (2022).
  • 179. Abdill, R. J., Adamowicz, E. M. & Blekhman, R. Public human microbiome data are dominated by highly developed countries. PLoS Biol. 20, e3001536, 10.1371/journal.pbio.3001536 (2022).
  • 180. Feehery, G. R. et al. A method for selectively enriching microbial DNA from contaminating vertebrate host DNA. PLoS One 8, e76096, 10.1371/journal.pone.0076096 (2013).
  • 181. Nelson, M. T. et al. Human and extracellular DNA depletion for metagenomic analysis of complex clinical infection samples yields optimized viable microbiome profiles. Cell Rep. 26, 2227–2240.e5, 10.1016/j.celrep.2019.01.091 (2019).
  • 182. Marotz, C. A. et al. Improving saliva shotgun metagenomics by chemical host DNA depletion. Microbiome 6, 42, 10.1186/s40168-018-0426-3 (2018).
  • 183. Wu-Woods, N. J. et al. Microbial-enrichment method enables high-throughput metagenomic characterization from host-rich samples. Nat. Methods 20, 1672–1682, 10.1038/s41592-023-02025-4 (2023).
  • 184. Heravi, F. S., Zakrzewski, M., Vickery, K. & Hu, H. Host DNA depletion efficiency of microbiome DNA enrichment methods in infected tissue samples. J. Microbiol. Methods 170, 105856, 10.1016/j.mimet.2020.105856 (2020).
  • 185. Ahannach, S. et al. Microbial enrichment and storage for metagenomics of vaginal, skin, and saliva samples. iScience 24, 103306, 10.1016/j.isci.2021.103306 (2021).

Supplementary Materials

Supplementary Table 1 (319.8KB, pdf)

Articles from Experimental & Molecular Medicine are provided here courtesy of Korean Society for Biochemistry and Molecular Biology