AI in microbiome‐related healthcare

Niklas Probul; Zihua Huang; Christina Caroline Saak; Jan Baumbach; Markus List

doi:10.1111/1751-7915.70027

Microbial Biotechnology. 2024 Nov 2;17(11):e70027. doi: 10.1111/1751-7915.70027

PMCID: PMC11530995 PMID: 39487766

Abstract

Artificial intelligence (AI) has the potential to transform clinical practice and healthcare. Following impressive advancements in fields such as computer vision and medical imaging, AI is poised to drive changes in microbiome‐based healthcare while facing challenges specific to the field. This review describes the state‐of‐the‐art use of AI in microbiome‐related healthcare. It points out limitations across topics such as data handling, AI modelling and safeguarding patient privacy. Furthermore, we indicate how these current shortcomings could be overcome in the future and discuss the influence and opportunities of increasingly complex data on microbiome‐based healthcare.

ARTIFICIAL INTELLIGENCE IN MICROBIOME RESEARCH

In recent years, it has become increasingly evident that human health is tightly linked to the human microbiome (Pruss & Sonnenburg, 2021; Rao et al., 2021; Talmor‐Barkan et al., 2022). Conversely, non‐communicable human diseases such as inflammatory bowel disease are related to an imbalance in the gut microbiome, often called dysbiosis (Santana et al., 2022; Yu, 2018). The term dysbiosis, which is widely used, is ill‐defined and disputed (Brüssow, 2020). It relates to the general concept that changes in microbial composition are associated with a disease. These changes are most evident when a significant loss of microbiota richness occurs, but many studies now aim at identifying more subtle disease‐associated microbial signatures. Generally, microbiome‐related healthcare aims to identify changes in microbial profiles that can be used to diagnose a disease and that can be used to suggest therapies for restoring a healthy microbiota. Countless companies have recognized this potential and offer microbiome tests along with probiotics and food supplements to treat a microbiome imbalance. However, a recent study has shown that the claims of these companies are mostly not substantiated by research and lack clinical validation (Hoffmann et al., 2024). This is unsurprising, as the complexity of microbial communities in general and host–microbiome interactions in particular is far from understood (Moreno‐Indias et al., 2021). With the increasing availability of heterogeneous and complex microbiome‐related data, however, integrating artificial intelligence (AI) into microbiome research has become increasingly popular and necessary (Hernández Medina et al., 2022) as it offers a promising avenue for advancing microbiome‐related healthcare as presented in this review. Various and partially conflicting definitions of AI exist, but we will attempt to clarify what we mean in this context. 
When researchers speak of AI, they typically focus on creating systems capable of performing tasks that require a certain amount of intelligence, like pattern recognition, decision‐making or natural language processing. AI tools go beyond statistical learning methods, which can also recognize patterns but need more explicit programming to fulfil a task. Researchers also often speak of classical machine learning when they refer to simpler models that perform adequately on smaller datasets. In contrast, contemporary AI methods such as deep learning are much more complex but require vastly more data and computational power. While AI has the potential to be applied to nearly all parts of microbiome analysis, we only highlight an excerpt of available AI methods and applications in microbiome‐related healthcare as an introduction to the field.

AI has already been instrumental in advancing microbiome data analysis, for instance, by increasing the quality of microbial genomes constructed from patient samples and improving the detection of novel microbes, genes and metabolic pathways (Sun et al., 2023). It is also frequently used in preprocessing pipelines for microbiome‐related analysis tasks such as taxonomic annotation (Bokulich et al., 2018) and feature selection (Peng et al., 2005; Queen & Emrich, 2021).

Furthermore, AI has advanced biomarker detection for drug discovery (Bohr & Memarzadeh, 2020), pathogen classification and detection (Jiang et al., 2022) as well as the prediction of disease susceptibility, progression and treatment response (Routy et al., 2018) based on microbial compositions found in, for example, the gut microbiome (Giuffrè et al., 2023). For example, Gacesa et al. created random forest and logistic regression prediction models based on patients' gut microbiome metagenomics data, faecal calprotectin (FCal), human beta defensin 2 (HBD2) and chromogranin A (CgA) to distinguish inflammatory bowel disease (IBD) from irritable bowel syndrome (IBS) in a non‐invasive way, with the aim of reducing the number of endoscopies needed (Gacesa et al., 2021). Such applications are relevant not only for communicable diseases but also for non‐communicable diseases whose origin and onset are harder to classify and detect. For instance, the role of the microbiome in autoimmune diseases such as graft‐versus‐host disease, inflammatory bowel disease and Type 2 diabetes is recognized but not clearly understood, limiting the clinical application of microbiome‐associated diagnostics and therapy.

The highly individual composition of the human microbiome is a strong argument for developing personalized medicine approaches and personalized nutrition (Ratiner et al., 2023). For example, Zeevi et al. built a model using relative abundances of 16S rRNA‐based phyla and other features (e.g. meal features, clinical features and dietary habits) to predict the personalized postprandial glycemic response to real‐life meals (Zeevi et al., 2015). Towards this goal, AI can improve the cultivation of microbial communities (Li, Liu, et al., 2023; Selma‐Royo et al., 2023) and, in turn, help to model and subsequently develop synthetic communities with desired properties and behaviours (Mabwi et al., 2021). This could also help transition from donor‐based faecal transplants to more flexible synthetic community‐based therapies (van Leeuwen et al., 2023).

Although AI clearly shows the capabilities to revolutionize the field of microbiome‐related healthcare, we will highlight the various challenges that need to be addressed before it can be applied effectively. The decision‐making process of AI models needs to be well understood through concepts such as ‘explainable AI’ to generate actionable insights. Further, we need more heterogeneous and diverse datasets to create models that generalize well (i.e. perform well on unseen data). While data quality rises, we need to establish models that perform under challenging conditions, such as with noisy or lower quality data.

Compared to AI‐driven advances in other healthcare fields, such as medical imaging and radiomics, microbiome‐related healthcare arguably lags behind (McCoubrey et al., 2021). Contributing factors include, but are not limited to, insufficient high‐quality data to cope with the sizable inter‐individual and geographic heterogeneity already observed in the healthy human microbiome (Olsson et al., 2022), often unclear functional annotations and a lack of standardized cooperation between institutions. More specifically, we are lacking robust datasets with a large number of samples. Because microbiomes are highly diverse across individuals and populations, it is difficult to distinguish between the true signal and batch effects caused by, for example, different primers in 16S sequencing or different DNA extraction protocols. In general, institutes will have to collaborate to gather more data. However, due to legal barriers and privacy concerns, this is challenging.

This review covers the most important state‐of‐the‐art microbiome analysis approaches in applied healthcare and research and raises awareness for emerging challenges. We consider the types of microbiome‐related data available for AI development and application, assess existing and emerging use cases for AI in microbiome‐related healthcare and discuss current roadblocks and future perspectives.

COMPREHENSIVE MOLECULAR PROFILING IN MICROBIOME RESEARCH

Data are the foundation of all AI research. Thus, it is important to understand the properties and applications of different omics data in the field of microbiome‐related healthcare.

Several technologies are used for molecular microbiome profiling across different omics levels. Differences in accessibility (e.g. costs, scalability and turnaround time) and complexity (e.g. type and volume of data) affect whether these technologies are used rather for research‐oriented or clinical applications (Figure 1). The most prevalent and cost‐effective method for microbiome taxonomy profiling is amplicon sequencing of the 16S rRNA gene, which allows studying microbiome composition and richness but offers limited insights into functional characteristics related to human health (Matchado et al., 2023). 16S rRNA gene sequencing is also subject to considerable biases (Abellan‐Schneyder et al., 2021) and has a low taxonomic resolution (up to the genus level). While full‐length 16S intragenomic copy variants can potentially provide taxonomic resolution at species and strain levels (Johnson et al., 2019), microbial research is shifting towards metagenomic sequencing (MGS), which offers not only strain‐level resolution but also vastly improved functional annotation (Yen & Johnson, 2021). The advantages of short‐read sequencing technologies such as Illumina (higher quality reads) and those of third‐generation long‐read sequencing platforms, such as Oxford Nanopore and Pacific Biosciences (longer reads), can be jointly leveraged to recover higher quality metagenome‐assembled genomes, for example, producing assemblies with fewer contigs, higher total number of assembled sequences and significantly higher N50 values (Chen et al., 2022).
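Because N50 is used above as a measure of assembly contiguity, a short worked example may help readers unfamiliar with the metric. The sketch below assumes the standard definition (the contig length at which the length‐sorted contigs first cover at least half of all assembled bases). The contig lengths are hypothetical and do not come from Chen et al. (2022).

```python
def n50(contig_lengths):
    """Return the N50 of an assembly: the length L such that contigs of
    length >= L together cover at least half of the total assembled bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp) for two assemblies with the same total size.
fragmented = [2_000, 2_000, 3_000, 3_000, 5_000, 5_000]
contiguous = [1_000, 4_000, 15_000]

print(n50(fragmented))  # 5000
print(n50(contiguous))  # 15000
```

Both toy assemblies contain 20,000 bp, but the one with fewer, longer contigs has the higher N50, which is the pattern described for hybrid short‐ and long‐read assemblies.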

Figure 1. Current and emerging landscape of applying AI in microbiome‐related healthcare. Created with BioRender.com.

In principle, metatranscriptomics could offer a more representative view of functional activity than metagenomics. However, metatranscriptomics is currently more challenging than metagenomics because of the short half‐life of prokaryotic mRNA and difficulties in bioinformatics analysis (e.g. assigning reads to specific species or strains) (Ojala et al., 2023). Metabolomics complements this further by quantifying metabolites driving the interactions within and between the microbiota. Targeted metabolomics excels in metabolite annotation and interpretation, but requires a priori knowledge of the molecules of interest before measurement (Alarcon‐Barrera et al., 2022). Untargeted metabolomics aims to measure as many metabolites as possible in a sample, but its effective use suffers from a lack of consensus about methods and difficulties in robustly annotating unknown metabolites in the absence of well‐established standards (Roach et al., 2024). The emerging field of metaproteomics could bridge the gap between metatranscriptomics and metabolomics and offer insights into signalling peptides, which play a crucial role in host‐microbiome interaction (Cheng et al., 2019). However, metaproteomics applications remain limited due to challenges such as false‐positive peptide matches during database construction (Miura & Okuda, 2023).

Untapped potential lies in integrating information across omics layers (Jiang et al., 2019; Sankaran & Holmes, 2019). As microbiome‐related diseases can affect different functional layers, combining multiple omics types is essential for understanding the full impact on the host (Muller et al., 2024). This has already been explored successfully in pilot studies, such as a study that linked specific bacterial proteases with disease severity in ulcerative colitis (Mills et al., 2022). Notably, this is also still an open research challenge in other fields, such as oncology, where initiatives such as the Cancer Genome Atlas made huge multi‐omics datasets available more than a decade ago (Agamah et al., 2022; Cancer Genome Atlas Research Network et al., 2013). Further untapped potential lies in the incorporation of medical record data that capture the patient's full medical history and contain information about a donor's lifestyle and diet. Such data help to account for the dynamic nature of the microbiome, which can be affected by various environmental factors (e.g. disease state, diet and age) (Dong & Gupta, 2019; Spor et al., 2011).

In contrast to the bulk profiling approaches introduced thus far, single‐cell sequencing has the potential to disentangle the complex interactions and unique contributions of taxa within microbiota. Single‐cell protocols such as DoTA‐seq, SMH‐seq and Microbe‐seq (Lan et al., 2024; Lötstedt et al., 2023; Zheng et al., 2022) have been developed to address unique challenges due to considerable variance in the size of microbial species, low amounts of genetic material and high risk of premature lysis or permeabilization due to the properties of microbial cell membranes. Additional methodological approaches help to chart horizontal gene transfer of mobile genetic elements (MGEs) as a driving force in the genetic composition and diversity of environments. For instance, combining MGS and MetaHi‐C technology allows assigning MGEs to their host bacteria (Marbouty et al., 2021) which can aid in the tracking of MGEs to predict the spread of antibiotic resistance genes within microbiomes (Ellabaan et al., 2021).

While the most suitable profiling method for any given application has to be determined on a case‐by‐case basis, there is a clear trend towards creating more complex data through technologies that allow deeper or more high‐throughput profiling or the combination of multiple datasets into one.

APPLYING AI TO MICROBIOME‐RELATED DATA

The application of AI to microbiome data analysis has to contend with challenges specific to these data, such as their compositional nature, high dimensionality, technical variability, missing data and integration needs (Papoutsoglou et al., 2023). Like most molecular data, microbiome data suffer from the curse of dimensionality, with far more features than samples, and many of these features are highly correlated (D'Elia et al., 2023). Microbial community profiles are also highly sparse and heterogeneous, precluding the use of methods that assume a specific distribution of the input data.
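To make the compositional and sparse nature of such data more concrete, the following sketch shows the centred log‐ratio (CLR) transform, one widely used way of handling compositional counts before modelling. It is an illustration only and not a method prescribed by the studies cited above. The pseudocount of 0.5 used to handle zero counts is an assumption, and the taxon counts are hypothetical.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's taxon counts.

    A pseudocount (an assumed value, here 0.5) is added so that the many
    zero counts typical of sparse microbiome profiles have a defined log.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical counts for three taxa in one sample, including a zero.
sparse_sample = [10, 0, 90]
print([round(v, 3) for v in clr(sparse_sample)])

# Without zeros (pseudocount=0), CLR values do not depend on sequencing depth:
shallow = clr([10, 30, 60], pseudocount=0)
deep = clr([100, 300, 600], pseudocount=0)
print(all(abs(a - b) < 1e-9 for a, b in zip(shallow, deep)))  # True
```

CLR values of a sample always sum to zero, which reflects that only ratios between taxa carry information. With a non‐zero pseudocount, the invariance to sequencing depth holds only approximately.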

Nonetheless, AI has already been frequently applied to microbiome‐related data for taxonomic and functional annotation, phenotype prediction, biomarker discovery, patient stratification and classification of disease subtypes (Beghini et al., 2021; Ghaemi et al., 2019; Li, Wang, et al., 2023; Liu et al., 2022; Su et al., 2022; Thomas et al., 2019). For example, an oral and faecal microbiota‐based random forest (RF) classifier has been developed to distinguish individuals with colorectal cancer (CRC) from healthy individuals using log‐transformed operational taxonomic unit (OTU) data (Flemer et al., 2018). In another example, 33 gene biomarkers were used to identify individuals with CRC based on MGS sequencing (Gupta et al., 2019). However, most studies reporting successful classifiers did not include independent validation data, such that the potential for generalization and clinical applicability often remains questionable. For instance, a random forest classifier trained for predicting Type 2 diabetes (Reitmeier et al., 2020) initially showed promising results but failed when applied to an independent cohort from a different geographic location. This emphasizes the need for cross‐regional studies, data integration and more complex methods that can counteract such biases or be retrained for local use.
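The gap between internal and independent validation can be illustrated with a deliberately simple sketch. It assumes a single marker OTU, a log10(count + 1) transform and a one‐feature threshold classifier in place of the random forests used in the cited studies. All counts are hypothetical, and the 100‐fold baseline shift in the independent cohort is an exaggerated stand‐in for geographic or technical cohort effects.

```python
import math


def log_transform(counts):
    """log10(count + 1) transform of raw OTU counts."""
    return [math.log10(c + 1) for c in counts]


def fit_threshold(values, labels):
    """Toy classifier: midpoint between the class means of a single feature."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    return (sum(cases) / len(cases) + sum(controls) / len(controls)) / 2


def accuracy(values, labels, threshold):
    """Fraction of samples whose thresholded prediction matches the label."""
    predictions = [1 if v > threshold else 0 for v in values]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical counts of one marker OTU; label 1 = case, 0 = control.
train_counts = [9, 19, 9, 29, 999, 1999, 999, 2999]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
internal_counts, internal_labels = [9, 19, 999, 1999], [0, 0, 1, 1]
# Independent cohort with a 100-fold higher baseline (assumed cohort effect).
external_counts, external_labels = [999, 1999, 99999, 199999], [0, 0, 1, 1]

threshold = fit_threshold(log_transform(train_counts), train_labels)
internal_acc = accuracy(log_transform(internal_counts), internal_labels, threshold)
external_acc = accuracy(log_transform(external_counts), external_labels, threshold)
print(f"internal accuracy: {internal_acc:.2f}, external accuracy: {external_acc:.2f}")
```

In this toy setting, the classifier is perfect on held‐out samples from the training cohort but drops to chance level (0.50) on the shifted cohort, because every control in the new cohort lies above the learned threshold.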

While classical machine learning methods are mostly applied to classification tasks in the microbiome field, contemporary deep learning (DL) methods involving various artificial neural network architectures often demonstrate superior performance in identifying complex and non‐linear patterns (e.g. low‐dimensional representation of features), but require more training data. For example, EnsDeepDP learns a low‐dimensional representation of the microbiome data as input for a classical machine learning‐based classification model and performs better than existing algorithms (Shen et al., 2023). Alternatively, methods such as Met2Img, DeepMicro, metaNN and megaD combine convolutional neural networks (CNN) and autoencoders for data compression and classification (Lo & Marculescu, 2019; Mreyoud et al., 2022; Nguyen et al., 2018; Oh & Zhang, 2020). Graph and network analysis have a long tradition in microbial research, particularly for modelling interactions between different taxa (Matchado et al., 2021). More recently, graph neural networks (GNN) have allowed combining the advantages of graph structures with those of neural networks. For example, GNNs have been used to construct disease prediction models from gut metagenomics data (Syama et al., 2023). Longitudinal data are still relatively rare despite their ability to capture dynamic changes in the microbiome. However, methods such as phyLoSTM, a combination of CNN and long short‐term memory networks (LSTM), can leverage time‐course data to associate changes in the microbiota with the host's environmental factors, leading to improved disease prediction (Sharma & Xu, 2021). Moreover, methods such as DeepARG aim to identify antibiotic resistance genes (ARGs) and antimicrobial peptides (Arango‐Argoty et al., 2018). While the application of AI in metagenomics data and gene discovery is impressive, most microbial genes only have rudimentary or no functional annotation.
Although AI has previously been used for creating reference‐based functional annotations through the application of probabilistic ML methods like hidden Markov models (Finn et al., 2011), DL‐based functional annotation pipelines surpass reference‐based annotation, allowing for the discovery of new functional groups and patterns (Maranga et al., 2023; Pavlopoulos et al., 2023). Also, for emerging single‐cell profiling data, suitable DL‐based tools have been proposed, including ScAnCluster, scDFC and scDeepCluster, which are better suited to clustering bacterial cells (Chen et al., 2020; Hu et al., 2023; Tian et al., 2019).

Through the application of state‐of‐the‐art AI methods, we are able to effectively extract information from complex data. As AI research continues, we can only expect easier‐to‐use, better interpretable and more efficient AI methods (Shao et al., 2022).

CURRENT LIMITATIONS AND FUTURE PERSPECTIVES OF APPLYING AI IN MICROBIOME‐RELATED HEALTHCARE

Despite the progress of using AI in microbiome‐related disease research, few models have found clinical application, likely due to a lack of robustness and generalizability. This can often be attributed to the lack of independent validation or errors during data processing and training. Challenges in training AI models are further aggravated in microbiome research due to enormous variability attributed to a variety of different factors ranging from high impact factors such as geography, diet and medication, via modest effects from host genomics to smaller but variable influences like general lifestyle (Falony et al., 2016; Gupta et al., 2017; Yatsunenko et al., 2012).

Several bottlenecks further hamper the application of AI in microbiome‐related healthcare. As in other biomedical research fields, professionals who combine expertise in AI technologies with in‐depth domain knowledge are scarce, and this is not projected to change in the foreseeable future (Glennon et al., 2023). The typical lack of interpretability in AI models is another major bottleneck to translational research. Even if an AI model can successfully solve a given task, it is often very hard to determine on which grounds the model made its decision. While methods like random forests can intrinsically assess how important a given feature is for decision‐making, other ML types require a separate analysis that estimates the impact of individual features (Carrieri et al., 2021; Štrumbelj & Kononenko, 2014). While there are many approaches to tackling the issues of explainable AI, they cannot guarantee a robust explanation of the full model (van der Velden et al., 2022).
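One simple example of such a separate, model‐agnostic analysis is permutation importance, sketched below. This technique is swapped in for illustration and is not the approach of the cited works (Štrumbelj & Kononenko, for instance, describe a Shapley‐value‐based method). The model, the two features (imagined here as relative abundances of two taxa) and all values are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when the values of one feature are shuffled
    across samples, computed separately for every feature."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical black-box model that only uses the first feature."""
    return 1 if row[0] > 0.5 else 0


# Hypothetical relative abundances of two taxa (columns) in eight samples.
rows = [[0.9, 0.2], [0.8, 0.7], [0.7, 0.1], [0.6, 0.9],
        [0.4, 0.3], [0.3, 0.8], [0.2, 0.5], [0.1, 0.6]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

Shuffling the second feature never changes a prediction, so its importance is exactly zero, whereas shuffling the first feature lowers accuracy. As noted above, such scores describe the impact of individual features and do not amount to a full explanation of the model.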

An additional frequent issue is data leakage, where data are not properly split into training and test sets, allowing methods to learn shortcuts and leading to overoptimistic reporting and performance failure in real‐world applications (Whalen et al., 2022). This reinforces the need for independent test data, which is often difficult to find (Kim et al., 2017; Papoutsoglou et al., 2023). The acquisition of such independent test data is made significantly easier if a model can make use of unlabeled data, that is, data that has not been reviewed and tagged for a certain purpose. The recently emerging Large Language Models (LLMs) show this capability. LLMs are computational models designed to process and generate natural language, and thus allow untrained users such as healthcare professionals or patients to interact with accumulated knowledge and computational tools in an intuitive way. Subject to active research, LLMs have already shown promising results in diagnostic reasoning and clinical conversation in general (Singhal et al., 2023; Tu et al., 2024), while strategies for mitigating the risk of LLMs providing inaccurate information are still sought after (Huang et al., 2023). While further improvements are necessary for LLMs to provide robust medical advice, there have already been pilot projects like GutGPT, which applied LLMs as a clinical decision support system for gastrointestinal bleeding risk (Chan et al., 2023). LLMs can also be applied as chatbots in patient education, where they can help patients understand and live with their diagnosis (Busch et al., 2024). Although natural language processing is their prime use case, LLMs can also be adapted to decode and understand omics data for different purposes (Liu et al., 2024), such as the prediction of DNA–protein interaction (Zhou et al., 2023), DNA methylation (Jin et al., 2022), protein expression and mRNA degradation (Ren et al., 2024).
As training an LLM requires vast computational resources (Chowdhery et al., 2022; Heim, 2022), approaches like transfer‐learning and domain adaptation will be crucial in modifying pre‐trained models towards requirements specific to a country, demographic or phenotype (Karabacak & Margetis, 2023).
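Returning to the data leakage issue raised above, one common source of leakage is that several samples from the same individual end up on both sides of the train/test split. The sketch below shows a subject‐wise split that avoids this. It assumes that each sample record carries a subject identifier; the field names, the test fraction and the sample records are hypothetical.

```python
import random


def split_by_subject(samples, test_fraction=0.25, seed=0):
    """Split samples so that all samples of one subject land on the same side."""
    subjects = sorted({s["subject"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [s for s in samples if s["subject"] not in test_subjects]
    test = [s for s in samples if s["subject"] in test_subjects]
    return train, test


# Hypothetical longitudinal study: four subjects, two time points each.
samples = [
    {"subject": subject, "timepoint": timepoint}
    for subject in ["P1", "P2", "P3", "P4"]
    for timepoint in (0, 1)
]

train, test = split_by_subject(samples)
train_ids = {s["subject"] for s in train}
test_ids = {s["subject"] for s in test}
print(sorted(train_ids), sorted(test_ids))
print(train_ids.isdisjoint(test_ids))  # True
```

A purely random split over samples would, by contrast, frequently place one time point of a subject in the training set and the other in the test set, which lets a model benefit from recognizing individuals rather than disease states. A subject‐wise split does not replace validation on a truly independent cohort.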

A key requirement for the training of effective AI models is access to large‐scale, comprehensive and high‐quality microbiome data. Multiple international efforts for microbiome data collection, such as the Human Microbiome Project (Human Microbiome Project Consortium, 2012) and the MetaHIT project (Ehrlich and MetaHIT Consortium, 2011), have paved the way for new projects such as the Million Microbiomes from Humans Project (China National GeneBank (CNGB), n.d.). While such collaborative efforts are instrumental for research, the processing of genetic information and clinical metadata also raises privacy concerns. Microbiome data are considered sensitive information as they are potentially identifying and thus subject to privacy regulations such as the European GDPR (General Data Protection Regulation (GDPR)—Official Legal Text, n.d.). While the GDPR makes concessions for research projects, sharing data between cooperating parties, such as hospitals, is not trivial, particularly in international collaborations. The need for gathering data in one place can be mitigated by federated learning (FL) techniques, where multiple partners can contribute to model training without revealing sensitive data. While FL generally takes more time to execute due to the additional communication overhead, it has been shown that the resulting FL models perform as well as traditional central approaches, even if the data are split across partners in a heterogeneous and uneven way (Nasirigerdeh et al., 2022; Zolotareva et al., 2021). Collaborative FL is a promising option for overcoming legal barriers associated with the GDPR and increases the amount of training data, improving performance and generalizability of the resulting models (Asad et al., 2020; Brauneck et al., 2023; Hauschild et al., 2022; Huang et al., 2022).
While no microbiome‐specific application of FL has been reported, general‐purpose FL platforms, such as FeatureCloud.ai (Matschinske et al., 2023) or PADME (Jaberansary et al., 2023) could be adapted to microbiome data.
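The principle behind FL can be sketched with a minimal federated averaging loop. The example assumes a one‐parameter linear model, a single local gradient step per round and three hypothetical sites of unequal size. It does not use or represent the interfaces of FeatureCloud.ai or PADME, and all function names are illustrative.

```python
def local_step(w, data, lr):
    """One gradient-descent step on the mean squared error of y ~ w * x,
    computed only from a site's own (x, y) pairs."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def federated_fit(sites, rounds=200, lr=0.01):
    """Federated averaging: raw data stay at the sites, only weights move."""
    w = 0.0
    total = sum(len(data) for data in sites)
    for _ in range(rounds):
        # Each site updates the current global weight on its local data.
        local = [(local_step(w, data, lr), len(data)) for data in sites]
        # The coordinator averages the weights, weighted by site sample count.
        w = sum(weight * n for weight, n in local) / total
    return w


def central_fit(pooled, rounds=200, lr=0.01):
    """Reference: the same training procedure on pooled data."""
    w = 0.0
    for _ in range(rounds):
        w = local_step(w, pooled, lr)
    return w


# Hypothetical, unevenly sized sites; every pair follows y = 2 * x.
site_a = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
site_b = [(5.0, 10.0)]
site_c = [(0.5, 1.0), (1.5, 3.0)]
sites = [site_a, site_b, site_c]

w_federated = federated_fit(sites)
w_central = central_fit(site_a + site_b + site_c)
print(round(w_federated, 6), round(w_central, 6))
```

With one local step per round, the weighted average of the site updates equals a gradient step on the pooled data, so both procedures agree here despite the uneven split. With several local steps per round or more complex models this equivalence no longer holds exactly, and practical FL systems typically add further safeguards on top of this basic scheme.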

Although the effective implementation of AI in a clinical environment remains challenging, the emergence of new and improved AI methods broadens the scope of possible applications in this environment and has produced many promising avenues to resolve these issues.

CONCLUSION

In the past, microbiome research mostly focused on studying composition, richness and associations between taxa and disease states as well as microbiome–host interactions. With the increasing availability of diverse and large‐scale datasets across omics layers and the emergence of more sophisticated AI methods, the field is now shifting to investigate the function of microbiomes and interactions within the diverse microbiome communities as well as host–microbiome–drug interactions. Despite these advances, AI has not yet been applied in microbiome‐related healthcare practice. However, AI will likely be key to developing microbiome‐based diagnostics and therapies. In the short‐to‐medium term, diseases with abundant microbiome profiling data or a comparably strong microbiome shift compared to inter‐individual microbiome changes, such as inflammatory bowel disease or colorectal cancer, offer the most realistic prospects of actionable AI‐based healthcare. Still, contemporary methods are not yet suited to fully address the complexity, heterogeneity and diversity of the human microbiome, especially considering its dynamic changes within and across individuals, geographic locations and lifestyles.

With the ever‐increasing need for large amounts of data and computing power, spreading and using open science practices has become increasingly important (‘Open Science’, n.d.; UNESCO, 2021). Initiatives that focus on data collection and availability, such as databases like EMBL‐EBI's MGnify (Richardson et al., 2023) and JGI's IMG/M (Chen et al., 2023) but also larger open science initiatives like European Open Science Cloud (‘EOSC Association’, n.d.) and the Research Data Alliance (‘RDA’, n.d.), build the backbone for enabling widespread access to all resources (data and computational) necessary to effectively engage with AI research. Nonetheless, for AI to pave the way towards personalized medicine, challenges with respect to a lack of robustness and generalization must be overcome. This can be facilitated by improving the amount and quality of training data and its integration across molecular layers and, more importantly, by expanding the use of independent validation data and cross‐regional studies. To facilitate the latter, researchers and clinicians need to account for data privacy regulations and new computing paradigms such as federated learning need to be embraced.

GLOSSARY

Artificial intelligence (AI): The broader concept of technologies to simulate human intelligence and thus to enable machines to carry out tasks that require a certain amount of intelligence to be solved.

Curse of dimensionality: This refers to the situation where many more features are available compared to the number of samples. Each feature adds another degree of freedom to a model. If too few samples are available to robustly train and learn all feature parameters (e.g. coefficients in a regression model), a model is said to suffer from the curse of dimensionality. In this case, problems such as overfitting and a lack of generalization are frequently observed.

Deep learning (DL): A commonly used subset of machine learning techniques that specifically uses neural networks with a large number of layers. This is most frequently applied to tasks with a very large amount of available training data, such as computer vision and natural language processing.

Explainable AI (XAI): Refers to techniques and methods that try to explain the decisions and ‘thought process’ of AI models, helping domain experts to assess the quality of the prediction and to understand the underlying patterns better.

Feature: A variable in a dataset such as microbial taxa, protein or metabolite abundance. The more features a dataset has, the higher its dimensionality.

Generalization: An ML model is said to generalize if it shows similar performance on independent test data compared to the data it was originally trained and tested on. Independent data ideally comes from a different site.

Large language model (LLM): A deep learning model typically designed to understand and replicate natural human language. The concept has also been applied successfully in biology to represent and study biological sequences.

Machine learning (ML): Machine learning refers to computer algorithms that improve over time through the observation of data. Whereas traditional statistical models are generally aimed at making judgements about a dataset by verifying concrete hypotheses, ML is primarily designed to make predictions or classifications on previously unseen data.

Microbiome: The term extends beyond the microorganisms to include their genomes and the surrounding environmental conditions.

Microbiota: The collection of microorganisms themselves that reside in a certain defined environment.

Neural network (NN): An ML model that mimics the human brain and consists of layers of interconnected nodes (neurons). The different kinds of NN are often defined by the type of connection between layers, the number of layers or the size of the layers.

Open science: Refers to the practice of making scientific research, data, methods and publications freely accessible and transparent to enable reuse and collaboration by anyone.

Overfitting: If a model performs excellently on the data it was trained and validated on but poorly on independent test data, this indicates that the model memorized the data rather than learning the higher‐level patterns needed to generalize to unseen data (see also Generalization).

Probabilistic ML (PML): A subset of ML that uses probabilistic modelling to make predictions and decisions. Key PML methods used in microbiome analysis include hidden Markov models, Bayesian networks and Dirichlet‐multinomial mixture models.

Random forest (RF): A popular ensemble ML method that uses the majority vote of a large number of decision trees. Each decision tree is trained on a random subset of training samples and features.

AUTHOR CONTRIBUTIONS

Niklas Probul: Writing – review and editing; writing – original draft; visualization; conceptualization. Zihua Huang: Conceptualization; writing – original draft; writing – review and editing; visualization; funding acquisition. Christina Caroline Saak: Conceptualization; writing – original draft; writing – review and editing; supervision; project administration; funding acquisition. Jan Baumbach: Conceptualization; funding acquisition; writing – original draft; writing – review and editing; project administration; supervision. Markus List: Conceptualization; funding acquisition; writing – original draft; writing – review and editing; project administration; supervision.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflict of interests.

ACKNOWLEDGEMENTS

The Microb‐AI‐ome project has received funding from the European Union's Horizon research and innovation programme under Grant Agreement No. 101079777. Views and opinions expressed in this paper are, however, those of the author(s) only and do not necessarily reflect those of the European Union. This work was also developed as part of the FeMAI project and is funded by the German Federal Ministry of Education and Research (BMBF) under grant number 01IS21079. This work was also funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) [395357507 (SFB 1371)]. ZH is supported by a scholarship from the China Scholarship Council (CSC). We acknowledge financial support from the Open Access Publication Fund of Universität Hamburg. Open Access funding enabled and organized by Projekt DEAL.

Probul, N., Huang, Z., Saak, C.C., Baumbach, J. & List, M. (2024) AI in microbiome‐related healthcare. Microbial Biotechnology, 17(11), e70027. Available from: https://doi.org/10.1111/1751-7915.70027

Niklas Probul and Zihua Huang contributed equally.

Christina Caroline Saak, Jan Baumbach and Markus List contributed equally.

REFERENCES

Abellan‐Schneyder, I. , Matchado, M.S. , Reitmeier, S. , Sommer, A. , Sewald, Z. , Baumbach, J. et al. (2021) Primer, pipelines, parameters: issues in 16S rRNA gene sequencing. mSphere, 6(1). Available from: 10.1128/mSphere.01202-20 [DOI] [PMC free article] [PubMed] [Google Scholar]
Agamah, F.E. , Bayjanov, J.R. , Niehues, A. , Njoku, K.F. , Skelton, M. , Mazandu, G.K. et al. (2022) Computational approaches for network‐based integrative multi‐omics analysis. Frontiers in Molecular Biosciences, 9, 967205. [DOI] [PMC free article] [PubMed] [Google Scholar]
Alarcon‐Barrera, J.C. , Kostidis, S. , Ondo‐Mendez, A. & Giera, M. (2022) Recent advances in metabolomics analysis for early drug development. Drug Discovery Today, 27(6), 1763–1773. [DOI] [PubMed] [Google Scholar]
Arango‐Argoty, G. , Garner, E. , Pruden, A. , Heath, L.S. , Vikesland, P. & Zhang, L. (2018) DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data. Microbiome, 6(1), 23. [DOI] [PMC free article] [PubMed] [Google Scholar]
Asad, M. , Moustafa, A.M.I. & Ito, T. (2020) Federated learning versus classical machine learning: a convergence comparison. In The 15th International Conference on Knowledge, Information and Creativity Support Systems (KICSS 2020), Tasmania, Australia (Online), 25‐27 November 2020. unknown.
Beghini, F. , McIver, L.J. , Blanco‐Míguez, A. , Dubois, L. , Asnicar, F. , Maharjan, S. et al. (2021) Integrating taxonomic, functional, and strain‐level profiling of diverse microbial communities with bioBakery 3. eLife, 10. Available from: 10.7554/eLife.65088 [DOI] [PMC free article] [PubMed] [Google Scholar]
Bohr, A. & Memarzadeh, K. (2020) The rise of artificial intelligence in healthcare applications. In: Bohr, A. & Memarzadeh, K. (Eds.) Artificial intelligence in healthcare. Academic Press, pp. 25–60. [Google Scholar]
Bokulich, N.A. , Dillon, M.R. , Bolyen, E. , Kaehler, B.D. , Huttley, G.A. & Gregory Caporaso, J. (2018) Q2‐sample‐classifier: machine‐learning tools for microbiome classification and regression. Journal of Open Research Software, 3(30), 934. [DOI] [PMC free article] [PubMed] [Google Scholar]
Brüssow, H. (2020) Problems with the concept of gut microbiota dysbiosis. Microbial Biotechnology, 13(2), 423–434. [DOI] [PMC free article] [PubMed] [Google Scholar]
Brauneck, A. , Schmalhorst, L. , Majdabadi, M.M.K. , Bakhtiari, M. , Völker, U. , Saak, C.C. et al. (2023) Federated machine learning in data‐protection‐compliant research. Nature Machine Intelligence, 5(1), 2–4. [Google Scholar]
Busch, F. , Hoffmann, L. , Rueger, C. , van Dijk, E.H.C. , Kader, R. , Ortiz‐Prado, E. et al. (2024) Systematic review of large language models for patient care: current applications and challenges. bioRxiv. 10.1101/2024.03.04.24303733 [DOI]
Cancer Genome Atlas Research Network , Weinstein, J.N. , Collisson, E.A. , Mills, G.B. , Mills, K.R. , Shaw, B.A. et al. (2013) The cancer genome atlas Pan‐Cancer analysis project. Nature Genetics, 45(10), 1113–1120. [DOI] [PMC free article] [PubMed] [Google Scholar]
Carrieri, A.P. , Haiminen, N. , Maudsley‐Barton, S. , Gardiner, L.‐J. , Murphy, B. , Mayes, A.E. et al. (2021) Explainable AI reveals changes in skin microbiome composition linked to phenotypic differences. Scientific Reports, 11(1), 4565. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chan, C. , You, K. , Chung, S. , Giuffrè, M. , Saarinen, T. , Rajashekar, N. et al. (2023) Assessing the usability of GutGPT: a simulation study of an AI clinical decision support system for gastrointestinal bleeding risk. arXiv [cs.HC]. arXiv. http://arxiv.org/abs/2312.10072
Chen, I.‐M.A. , Chu, K. , Palaniappan, K. , Ratner, A. , Huang, J. , Huntemann, M. et al. (2023) The IMG/M data management and analysis system v.7: content updates and new features. Nucleic Acids Research, 51(D1), D723–D732. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chen, L. , Zhai, Y. , He, Q. , Wang, W. & Deng, M. (2020) Integrating deep supervised, self‐supervised and unsupervised learning for single‐cell RNA‐seq clustering and annotation. Genes, 11(7), 792. Available from: 10.3390/genes11070792 [DOI] [PMC free article] [PubMed] [Google Scholar]
Chen, L. , Zhao, N. , Cao, J. , Liu, X. , Xu, J. , Ma, Y. et al. (2022) Short‐ and long‐read metagenomics expand individualized structural variations in gut microbiomes. Nature Communications, 13(1), 3175. Available from: 10.1038/s41467-022-30857-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
Cheng, H.‐Y. , Ning, M.‐X. , Chen, D.‐K. & Ma, W.‐T. (2019) Interactions between the gut microbiota and the host innate immune response against pathogens. Frontiers in Immunology, 10, 607. [DOI] [PMC free article] [PubMed] [Google Scholar]
China National GeneBank (CNGB) . (n.d.) MMHP: million microbiomes from humans project. Million Microbiomes from Humans Project. Available from: https://db.cngb.org/mmhp/ [Accessed 25th January 2024].
Chowdhery, A. , Narang, S. , Devlin, J. , Bosma, M. , Mishra, G. , Roberts, A. et al. (2022) PaLM: scaling language modeling with pathways. arXiv [cs.CL]. arXiv. http://arxiv.org/abs/2204.02311
D'Elia, D. , Truu, J. , Lahti, L. , Berland, M. , Papoutsoglou, G. , Ceci, M. et al. (2023) Advancing microbiome research with machine learning: key findings from the ML4Microbiome COST action. Frontiers in Microbiology, 14, 1257002. [DOI] [PMC free article] [PubMed] [Google Scholar]
Dong, T.S. & Gupta, A. (2019) Influence of early life, diet, and the environment on the microbiome. Clinical Gastroenterology and Hepatology, 17(2), 231–242. Available from: 10.1016/j.cgh.2018.08.067 [DOI] [PMC free article] [PubMed] [Google Scholar]
Ehrlich, S.D. & MetaHIT Consortium . (2011) MetaHIT: the European Union project on metagenomics of the human intestinal tract. In: Nelson, K.E. (Ed.) Metagenomics of the human body. New York, NY: Springer New York, pp. 307–316. [Google Scholar]
Ellabaan, M.M.H. , Munck, C. , Porse, A. , Imamovic, L. & Sommer, M.O.A. (2021) Forecasting the dissemination of antibiotic resistance genes across bacterial genomes. Nature Communications, 12(1), 2435. [DOI] [PMC free article] [PubMed] [Google Scholar]
EOSC Association . (n.d.) EOSC Association. https://eosc.eu/ [Accessed 29th August 2024].
Falony, G. , Joossens, M. , Vieira‐Silva, S. , Wang, J. , Darzi, Y. , Faust, K. et al. (2016) Population‐level analysis of gut microbiome variation. Science (New York, N.Y.), 352(6285), 560–564. [DOI] [PubMed] [Google Scholar]
Finn, R.D. , Clements, J. & Eddy, S.R. (2011) HMMER web server: interactive sequence similarity searching. Nucleic Acids Research, 39(Web Server issue), W29–W37. [DOI] [PMC free article] [PubMed] [Google Scholar]
Flemer, B. , Warren, R.D. , Barrett, M.P. , Cisek, K. , Das, A. , Jeffery, I.B. et al. (2018) The oral microbiota in colorectal cancer is distinctive and predictive. Gut, 67(8), 1454–1463. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gacesa, R. , Vich Vila, A. , Collij, V. , Mujagic, Z. , Kurilshikov, A. , Voskuil, M.D. et al. (2021) A combination of fecal calprotectin and human beta‐defensin 2 facilitates diagnosis and monitoring of inflammatory bowel disease. Gut Microbes, 13(1), 1943288. [DOI] [PMC free article] [PubMed] [Google Scholar]
General Data Protection Regulation (GDPR) – Official Legal Text . (n.d.) General Data Protection Regulation (GDPR). https://gdpr‐info.eu/ [Accessed 29th August 2022].
Ghaemi, M.S. , DiGiulio, D.B. , Contrepois, K. , Callahan, B. , Ngo, T.T.M. , Lee‐McMullen, B. et al. (2019) Multiomics modeling of the immunome, transcriptome, microbiome, proteome and metabolome adaptations during human pregnancy. Bioinformatics, 35(1), 95–103. [DOI] [PMC free article] [PubMed] [Google Scholar]
Giuffrè, M. , Moretti, R. & Tiribelli, C. (2023) Gut microbes meet machine learning: the next step towards advancing our understanding of the gut microbiome in health and disease. International Journal of Molecular Sciences, 24(6). Available from: 10.3390/ijms24065229 [DOI] [PMC free article] [PubMed] [Google Scholar]
Glennon, M. , La Croce, C. , Micheletti, G. , Raczko, N. , Freitas, L. , Moise, C. et al. (2023) Results of the new European Data Market Study 2021–2023, D2.7. 2.0. https://ec.europa.eu/newsroom/dae/redirection/document/101694
Gupta, A. , Dhakan, D.B. , Maji, A. , Saxena, R. , Vishnu Prasoodanan, P.K. , Mahajan, S. et al. (2019) Association of Flavonifractor plautii, a flavonoid‐degrading bacterium, with the gut microbiome of colorectal cancer patients in India. mSystems, 4(6). Available from: 10.1128/mSystems.00438-19 [DOI] [PMC free article] [PubMed] [Google Scholar]
Gupta, V.K. , Paul, S. & Dutta, C. (2017) Geography, ethnicity or subsistence‐specific variations in human microbiome composition and diversity. Frontiers in Microbiology, 8, 1162. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hauschild, A.‐C. , Lemanczyk, M. , Matschinske, J. , Frisch, T. , Zolotareva, O. , Holzinger, A. et al. (2022) Federated random forests can improve local performance of predictive models for various healthcare applications. Bioinformatics, 38(8), 2278–2286. [DOI] [PubMed] [Google Scholar]
Heim, L. (2022) Estimating 🌴PaLM's training cost. Blog.heim.xyz (Blog). April 5, 2022. https://blog.heim.xyz/palm‐training‐cost/.
Hoffmann, D.E. , von Rosenvinge, E.C. , Roghmann, M.‐C. , Palumbo, F.B. , McDonald, D. & Ravel, J. (2024) The DTC microbiome testing industry needs more regulation. Science, 383(6688), 1176–1179. [DOI] [PubMed] [Google Scholar]
Hu, D. , Liang, K. , Zhou, S. , Wenxuan, T. , Liu, M. & Liu, X. (2023) scDFC: a deep fusion clustering method for single‐cell RNA‐seq data. Briefings in Bioinformatics, 24(4). Available from: 10.1093/bib/bbad216 [DOI] [PubMed] [Google Scholar]
Huang, C. , Huang, J. & Liu, X. (2022) Cross‐silo federated learning: challenges and opportunities. arXiv [cs.LG], June. https://arxiv.org/abs/2206.12949.
Huang, L. , Yu, W. , Ma, W. , Zhong, W. , Feng, Z. , Wang, H. et al. (2023) A survey on hallucination in large language models: principles, taxonomy, challenges, and open questions. arXiv [cs.CL]. arXiv. http://arxiv.org/abs/2311.05232.
Human Microbiome Project Consortium . (2012) A framework for human microbiome research. Nature, 486(7402), 215–221. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jaberansary, M. , Maia, M. , Yediel, Y.U. , Beyan, O. & Kirsten, T. (2023) Analyzing distributed medical data in FAIR data spaces. In Companion Proceedings of the ACM Web Conference 2023, 1480–84. WWW '23 Companion. New York, NY, USA: Association for Computing Machinery.
Jiang, D. , Armour, C.R. , Chenxiao, H. , Mei, M. , Tian, C. , Sharpton, T.J. et al. (2019) Microbiome multi‐omics network analysis: statistical considerations, limitations, and opportunities. Frontiers in Genetics, 10, 995. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jiang, Y. , Luo, J. , Huang, D. , Liu, Y. & Li, D.‐D. (2022) Machine learning advances in microbiology: a review of methods and applications. Frontiers in Microbiology, 13, 925454. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jin, J. , Yingying, Y. , Wang, R. , Zeng, X. , Pang, C. , Jiang, Y. et al. (2022) iDNA‐ABF: multi‐scale deep biological language learning model for the interpretable prediction of DNA methylations. Genome Biology, 23(1), 219. [DOI] [PMC free article] [PubMed] [Google Scholar]
Johnson, J.S. , Spakowicz, D.J. , Hong, B.‐Y. , Petersen, L.M. , Demkowicz, P. , Chen, L. et al. (2019) Evaluation of 16S rRNA gene sequencing for species and strain‐level microbiome analysis. Nature Communications, 10(1), 5029. [DOI] [PMC free article] [PubMed] [Google Scholar]
Karabacak, M. & Margetis, K. (2023) Embracing large language models for medical applications: opportunities and challenges. Cureus, 15(5), e39305. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kim, D. , Hofstaedter, C.E. , Zhao, C. , Mattei, L. , Tanes, C. , Clarke, E. et al. (2017) Optimizing methods and dodging pitfalls in microbiome research. Microbiome, 5(1), 52. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lan, F. , Saba, J. , Ross, T.D. , Zhou, Z. , Krauska, K. , Anantharaman, K. et al. (2024) Massively parallel single‐cell sequencing of diverse microbial populations. Nature Methods, 21(2), 228–235. Available from: 10.1038/s41592-023-02157-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
Li, B. , Wang, T. , Qian, M. & Wang, S. (2023) MKMR: a multi‐kernel machine regression model to predict health outcomes using human microbiome data. Briefings in Bioinformatics, 24(3). Available from: 10.1093/bib/bbad158 [DOI] [PMC free article] [PubMed] [Google Scholar]
Li, M. , Liu, J. , Zhu, J. , Wang, H. , Sun, C. , Gao, N.L. et al. (2023) Performance of gut microbiome as an independent diagnostic tool for 20 diseases: cross‐cohort validation of machine‐learning classifiers. Gut Microbes, 15(1), 2205386. [DOI] [PMC free article] [PubMed] [Google Scholar]
Liu, J. , Yang, M. , Yu, Y. , Xu, H. , Li, K. & Zhou, X. (2024) Large language models in bioinformatics: applications and perspectives. ArXiv, January. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10802675/.
Liu, Y. , Zhu, J. , Wang, H. , Wenwei, L. , Lee, Y.K. , Zhao, J. et al. (2022) Machine learning framework for gut microbiome biomarkers discovery and modulation analysis in large‐scale obese population. BMC Genomics, 23(1), 850. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lo, C. & Marculescu, R. (2019) MetaNN: accurate classification of host phenotypes from metagenomic data using neural networks. BMC Bioinformatics, 20(Suppl 12), 314. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lötstedt, B. , Stražar, M. , Xavier, R. , Regev, A. & Vickovic, S. (2023) Spatial host–microbiome sequencing reveals niches in the mouse gut. Nature Biotechnology, 42(9), 1394–1403. Available from: 10.1038/s41587-023-01988-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
Mabwi, H.A. , Kim, E. , Song, D.‐G. , Yoon, H.S. , Pan, C.‐H. , Komba, E.V.G. et al. (2021) Synthetic gut microbiome: advances and challenges. Computational and Structural Biotechnology Journal, 19, 363–371. [DOI] [PMC free article] [PubMed] [Google Scholar]
Maranga, M. , Szczerbiak, P. , Bezshapkin, V. , Gligorijevic, V. , Chandler, C. , Bonneau, R. et al. (2023) Comprehensive functional annotation of metagenomes and microbial genomes using a deep learning‐based method. mSystems, 8(2), e0117822. [DOI] [PMC free article] [PubMed] [Google Scholar]
Marbouty, M. , Thierry, A. , Millot, G.A. & Koszul, R. (2021) MetaHiC phage‐bacteria infection network reveals active cycling phages of the healthy human gut. eLife, 10. Available from: 10.7554/eLife.60608 [DOI] [PMC free article] [PubMed] [Google Scholar]
Matchado, M.S. , Lauber, M. , Reitmeier, S. , Kacprowski, T. , Baumbach, J. , Haller, D. et al. (2021) Network analysis methods for studying microbial communities: a mini review. Computational and Structural Biotechnology Journal, 19, 2687–2698. [DOI] [PMC free article] [PubMed] [Google Scholar]
Matchado, M.S. , Rühlemann, M. , Reimeiter, S. , Kacprowski, T. , Frost, F. , Haller, D. et al. (2023) On the limits of 16S rRNA gene‐based metagenome prediction and functional profiling. bioRxiv. 10.1101/2023.11.07.564315 [DOI] [PMC free article] [PubMed]
Matschinske, J. , Späth, J. , Bakhtiari, M. , Probul, N. , Majdabadi, M.M.K. , Nasirigerdeh, R. et al. (2023) The FeatureCloud platform for federated learning in biomedicine: unified approach. Journal of Medical Internet Research, 25, e42621. [DOI] [PMC free article] [PubMed] [Google Scholar]
McCoubrey, L.E. , Elbadawi, M. , Orlu, M. , Gaisford, S. & Basit, A.W. (2021) Harnessing machine learning for development of microbiome therapeutics. Gut Microbes, 13(1), 1–20. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mills, R.H. , Dulai, P.S. , Vázquez‐Baeza, Y. , Sauceda, C. , Daniel, N. , Gerner, R.R. et al. (2022) Multi‐omics analyses of the ulcerative colitis gut microbiome link Bacteroides vulgatus proteases with disease severity. Nature Microbiology, 7(2), 262–276. [DOI] [PMC free article] [PubMed] [Google Scholar]
Miura, N. & Okuda, S. (2023) Current progress and critical challenges to overcome in the bioinformatics of mass spectrometry‐based metaproteomics. Computational and Structural Biotechnology Journal, 21, 1140–1150. [DOI] [PMC free article] [PubMed] [Google Scholar]
Moreno‐Indias, I. , Lahti, L. , Nedyalkova, M. , Elbere, I. , Roshchupkin, G. , Adilovic, M. et al. (2021) Statistical and machine learning techniques in human microbiome studies: contemporary challenges and solutions. Frontiers in Microbiology, 12, 635781. [DOI] [PMC free article] [PubMed] [Google Scholar]
Mreyoud, Y. , Song, M. , Lim, J. & Ahn, T.‐H. (2022) MegaD: deep learning for rapid and accurate disease status prediction of metagenomic samples. Life, 12(5). Available from: 10.3390/life12050669 [DOI] [PMC free article] [PubMed] [Google Scholar]
Muller, E. , Shiryan, I. & Borenstein, E. (2024) Multi‐Omic integration of microbiome data for identifying disease‐associated modules. Nature Communications, 15(1), 2621. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nasirigerdeh, R. , Torkzadehmahani, R. , Matschinske, J. , Frisch, T. , List, M. , Späth, J. et al. (2022) sPLINK: a hybrid federated tool as a robust alternative to meta‐analysis in genome‐wide association studies. Genome Biology, 23(1), 32. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nguyen, T.H. , Prifti, E. , Chevaleyre, Y. , Sokolovska, N. & Zucker, J.‐D. (2018) Disease classification in metagenomics with 2D embeddings and deep learning. arXiv [cs.CV]. arXiv. http://arxiv.org/abs/1806.09046
Oh, M. & Zhang, L. (2020) DeepMicro: deep representation learning for disease prediction based on microbiome data. Scientific Reports, 10(1), 6026. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ojala, T. , Häkkinen, A.‐E. , Kankuri, E. & Kankainen, M. (2023) Current concepts, advances, and challenges in deciphering the human microbiota with metatranscriptomics. Trends in Genetics: TIG, 39(9), 686–702. [DOI] [PubMed] [Google Scholar]
Olsson, L.M. , Boulund, F. , Nilsson, S. , Khan, M.T. , Gummesson, A. , Fagerberg, L. et al. (2022) Dynamics of the normal gut microbiota: a longitudinal one‐year population study in Sweden. Cell Host & Microbe, 30(5), 726–739.e3. [DOI] [PubMed] [Google Scholar]
Open Science . (n.d.) Research and innovation. https://research‐and‐innovation.ec.europa.eu/strategy/strategy‐2020‐2024/our‐digital‐future/open‐science_en [Accessed 29th August 2024].
Papoutsoglou, G. , Tarazona, S. , Lopes, M.B. , Klammsteiner, T. , Ibrahimi, E. , Eckenberger, J. et al. (2023) Machine learning approaches in microbiome research: challenges and best practices. Frontiers in Microbiology, 14, 1261889. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pavlopoulos, G.A. , Baltoumas, F.A. , Liu, S. , Selvitopi, O. , Camargo, A.P. , Nayfach, S. et al. (2023) Unraveling the functional dark matter through global metagenomics. Nature, 622(7983), 594–602. [DOI] [PMC free article] [PubMed] [Google Scholar]
Peng, H. , Long, F. & Ding, C. (2005) Feature selection based on mutual information: criteria of max‐dependency, max‐relevance, and min‐redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226–1238. [DOI] [PubMed] [Google Scholar]
Pruss, K.M. & Sonnenburg, J.L. (2021) C. difficile exploits a host metabolite produced during toxin‐mediated disease. Nature, 593(7858), 261–265. [DOI] [PMC free article] [PubMed] [Google Scholar]
Queen, O. & Emrich, S.J. (2021) LASSO‐based feature selection for improved microbial and microbiome classification. In 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2301–8. IEEE.
Rao, C. , Coyte, K.Z. , Bainter, W. , Geha, R.S. , Martin, C.R. & Rakoff‐Nahoum, S. (2021) Multi‐kingdom ecological drivers of microbiota assembly in preterm infants. Nature, 591(7851), 633–638. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ratiner, K. , Ciocan, D. , Abdeen, S.K. & Elinav, E. (2023) Utilization of the microbiome in personalized medicine. Nature Reviews. Microbiology, 22, 291–309. Available from: 10.1038/s41579-023-00998-9 [DOI] [PubMed] [Google Scholar]
RDA . (n.d.) Research data alliance. https://www.rd‐alliance.org/ [Accessed 29th August 2024].
Reitmeier, S. , Kiessling, S. , Clavel, T. , List, M. , Almeida, E.L. , Ghosh, T.S. et al. (2020) Arrhythmic gut microbiome signatures predict risk of type 2 diabetes. Cell Host & Microbe, 28(2), 258–272.e6. [DOI] [PubMed] [Google Scholar]
Ren, Z. , Jiang, L. , Di, Y. , Zhang, D. , Gong, J. , Gong, J. et al. (2024) CodonBERT: a BERT‐based architecture tailored for codon optimization using the cross‐attention mechanism. Bioinformatics (Oxford, England), 40(7), btae330. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ricardo, H.M. , Kutuzova, S. , Nielsen, K.N. , Johansen, J. , Hansen, L.H. , Nielsen, M. et al. (2022) Machine learning and deep learning applications in microbiome research. ISME Communications, 2(1), 98. [DOI] [PMC free article] [PubMed] [Google Scholar]
Richardson, L. , Allen, B. , Baldi, G. , Beracochea, M. , Bileschi, M.L. , Burdett, T. et al. (2023) MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Research, 51(D1), D753–D759. [DOI] [PMC free article] [PubMed] [Google Scholar]
Roach, J. , Mital, R. , Haffner, J.J. , Colwell, N. , Coats, R. , Palacios, H.M. et al. (2024) Microbiome metabolite quantification methods enabling insights into human health and disease. Methods, 222, 81–99. [DOI] [PMC free article] [PubMed] [Google Scholar]
Routy, B. , Le Chatelier, E. , Derosa, L. , Duong, C.P.M. , Alou, M.T. , Daillère, R. et al. (2018) Gut microbiome influences efficacy of PD‐1‐based immunotherapy against epithelial tumors. Science, 359(6371), 91–97. [DOI] [PubMed] [Google Scholar]
Sankaran, K. & Holmes, S.P. (2019) Multitable methods for microbiome data integration. Frontiers in Genetics, 10, 627. [DOI] [PMC free article] [PubMed] [Google Scholar]
Santana, P.T. , Rosas, S.L.B. , Ribeiro, B.E. , Marinho, Y. & de Souza, H.S.P. (2022) Dysbiosis in inflammatory bowel disease: pathogenic role and potential therapeutic targets. International Journal of Molecular Sciences, 23(7). Available from: 10.3390/ijms23073464 [DOI] [PMC free article] [PubMed] [Google Scholar]
Selma‐Royo, M. , Segata, N. & Ricci, L. (2023) Human microbiome cultivation expands with AI. Nature Biotechnology, 41(10), 1389–1391. [DOI] [PubMed] [Google Scholar]
Shao, Z. , Zhao, R. , Yuan, S. , Ding, M. & Wang, Y. (2022) Tracing the evolution of AI in the past decade and forecasting the emerging trends. Expert Systems with Applications, 209, 118221. [Google Scholar]
Sharma, D. & Xu, W. (2021) phyLoSTM: a novel deep learning model on disease prediction from longitudinal microbiome data. Bioinformatics, 37(21), 3707–3714. [DOI] [PubMed] [Google Scholar]
Shen, Y. , Zhu, J. , Deng, Z. , Wenwei, L. & Wang, H. (2023) EnsDeepDP: an ensemble deep learning approach for disease prediction through metagenomics. IEEE/ACM Transactions on Computational Biology and Bioinformatics/IEEE, ACM, 20(2), 986–998. [DOI] [PubMed] [Google Scholar]
Singhal, K. , Tu, T. , Gottweis, J. , Sayres, R. , Wulczyn, E. , Le Hou, K.C. et al. (2023) Towards expert‐level medical question answering with large language models. arXiv [cs.CL]. arXiv. http://arxiv.org/abs/2305.09617.
Spor, A. , Koren, O. & Ley, R. (2011) Unravelling the effects of the environment and host genotype on the gut microbiome. Nature Reviews Microbiology, 9(4), 279–290. Available from: 10.1038/nrmicro2540 [DOI] [PubMed] [Google Scholar]
Štrumbelj, E. & Kononenko, I. (2014) Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3), 647–665. [Google Scholar]
Su, Q. , Liu, Q. , Lau, R.I. , Zhang, J. , Zhilu, X. , Yeoh, Y.K. et al. (2022) Faecal microbiome‐based machine learning for multi‐class disease diagnosis. Nature Communications, 13(1), 6818. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sun, T. , Niu, X. , He, Q. , Chen, F. & Qi, R.‐Q. (2023) Artificial intelligence in microbiomes analysis: a review of applications in dermatology. Frontiers in Microbiology, 14, 1112010. [DOI] [PMC free article] [PubMed] [Google Scholar]
Syama, K. , Jothi, A.A.A. & Khanna, N. (2023) Automatic disease prediction from human gut metagenomic data using boosting GraphSAGE. BMC Bioinformatics, 24(1), 126. [DOI] [PMC free article] [PubMed] [Google Scholar]
Talmor‐Barkan, Y. , Bar, N. , Shaul, A.A. , Shahaf, N. , Godneva, A. , Bussi, Y. et al. (2022) Metabolomic and microbiome profiling reveals personalized risk factors for coronary artery disease. Nature Medicine, 28(2), 295–302. [DOI] [PMC free article] [PubMed] [Google Scholar]
Thomas, A.M. , Manghi, P. , Asnicar, F. , Pasolli, E. , Armanini, F. , Zolfo, M. et al. (2019) Metagenomic analysis of colorectal cancer datasets identifies cross‐cohort microbial diagnostic signatures and a link with choline degradation. Nature Medicine, 25(4), 667–678. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tian, T. , Wan, J. , Song, Q. & Wei, Z. (2019) Clustering single‐cell RNA‐seq data with a model‐based deep learning approach. Nature Machine Intelligence, 1(4), 191–198. [Google Scholar]
Tu, T. , Palepu, A. , Schaekermann, M. , Saab, K. , Freyberg, J. , Tanno, R. et al. (2024) Towards conversational diagnostic AI. arXiv [cs.AI]. arXiv. http://arxiv.org/abs/2401.05654.
UNESCO . (2021) UNESCO recommendation on Open Science. UNESCO. https://unesdoc.unesco.org/ark:/48223/pf0000379949.
van der Velden, B.H.M. , Kuijf, H.J. , Gilhuijs, K.G.A. & Viergever, M.A. (2022) Explainable artificial intelligence (XAI) in deep learning‐based medical image analysis. Medical Image Analysis, 79, 102470. [DOI] [PubMed] [Google Scholar]
van Leeuwen, P.T. , Brul, S. , Zhang, J. & Wortel, M.T. (2023) Synthetic microbial communities (SynComs) of the human gut: design, assembly, and applications. FEMS Microbiology Reviews, 47(2). Available from: 10.1093/femsre/fuad012 [DOI] [PMC free article] [PubMed] [Google Scholar]
Whalen, S. , Schreiber, J. , Noble, W.S. & Pollard, K.S. (2022) Navigating the pitfalls of applying machine learning in genomics. Nature Reviews. Genetics, 23(3), 169–181. [DOI] [PubMed] [Google Scholar]
Yatsunenko, T. , Rey, F.E. , Manary, M.J. , Trehan, I. , Dominguez‐Bello, M.G. , Contreras, M. et al. (2012) Human gut microbiome viewed across age and geography. Nature, 486(7402), 222–227. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yen, S. & Johnson, J.S. (2021) Metagenomics: a path to understanding the gut microbiome. Mammalian Genome: Official Journal of the International Mammalian Genome Society, 32(4), 282–296. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yu, L.C.‐H. (2018) Microbiota dysbiosis and barrier dysfunction in inflammatory bowel disease and colorectal cancers: exploring a common ground hypothesis. Journal of Biomedical Science, 25(1), 79. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zeevi, D. , Korem, T. , Zmora, N. , Israeli, D. , Rothschild, D. , Weinberger, A. et al. (2015) Personalized nutrition by prediction of glycemic responses. Cell, 163(5), 1079–1094. [DOI] [PubMed] [Google Scholar]
Zheng, W. , Zhao, S. , Yin, Y. , Zhang, H. , Needham, D.M. , Evans, E.D. et al. (2022) High‐throughput, single‐microbe genomics with strain resolution, applied to a human gut microbiome. Science, 376(6597), eabm1483. Available from: 10.1126/science.abm1483 [DOI] [PubMed] [Google Scholar]
Zhou, Z. , Ji, Y. , Li, W. , Dutta, P. , Davuluri, R. & Liu, H. (2023) DNABERT‐2: efficient foundation model and benchmark for multi‐species genome. arXiv [q‐Bio.GN]. arXiv. http://arxiv.org/abs/2306.15006
Zolotareva, O. , Nasirigerdeh, R. , Matschinske, J. , Torkzadehmahani, R. , Bakhtiari, M. , Frisch, T. et al. (2021) Flimma: a federated and privacy‐aware tool for differential gene expression analysis. Genome Biology, 22(1), 338. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mbt270027-bib-0001] Abellan‐Schneyder, I. , Matchado, M.S. , Reitmeier, S. , Sommer, A. , Sewald, Z. , Baumbach, J. et al. (2021) Primer, pipelines, parameters: issues in 16S rRNA gene sequencing. mSphere, 6(1). Available from: 10.1128/mSphere.01202-20 [DOI] [PMC free article] [PubMed] [Google Scholar]

[mbt270027-bib-0002] Agamah, F.E. , Bayjanov, J.R. , Niehues, A. , Njoku, K.F. , Skelton, M. , Mazandu, G.K. et al. (2022) Computational approaches for network‐based integrative multi‐omics analysis. Frontiers in Molecular Biosciences, 9, 967205. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mbt270027-bib-0003] Alarcon‐Barrera, J.C. , Kostidis, S. , Ondo‐Mendez, A. & Giera, M. (2022) Recent advances in metabolomics analysis for early drug development. Drug Discovery Today, 27(6), 1763–1773. [DOI] [PubMed] [Google Scholar]

[mbt270027-bib-0004] Arango‐Argoty, G. , Garner, E. , Pruden, A. , Heath, L.S. , Vikesland, P. & Zhang, L. (2018) DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data. Microbiome, 6(1), 23. [DOI] [PMC free article] [PubMed] [Google Scholar]

[mbt270027-bib-0005] Asad, M. , Moustafa, A.M.I. & Ito, T. (2020) Federated learning versus classical machine learning: a convergence comparison. In The 15th International Conference on Knowledge, Information and Creativity Support Systems (KICSS 2020), Tasmania, Australia (Online), 25‐27 November 2020. unknown.

[mbt270027-bib-0006] Beghini, F. , McIver, L.J. , Blanco‐Míguez, A. , Dubois, L. , Asnicar, F. , Maharjan, S. et al. (2021) Integrating taxonomic, functional, and strain‐level profiling of diverse microbial communities with bioBakery 3. eLife, 10. Available from: 10.7554/eLife.65088 [DOI] [PMC free article] [PubMed] [Google Scholar]

[mbt270027-bib-0007] Bohr, A. & Memarzadeh, K. (2020) The rise of artificial intelligence in healthcare applications. In: Bohr, A. & Memarzadeh, K. (Eds.) Artificial intelligence in healthcare. Academic Press, pp. 25–60. [Google Scholar]

[mbt270027-bib-0008] Bokulich, N.A. , Dillon, M.R. , Bolyen, E. , Kaehler, B.D. , Huttley, G.A. & Gregory Caporaso, J. (2018) Q2‐sample‐classifier: machine‐learning tools for microbiome classification and regression. Journal of Open Research Software, 3(30), 934. [DOI] [PMC free article] [PubMed] [Google Scholar]

Brauneck, A., Schmalhorst, L., Majdabadi, M.M.K., Bakhtiari, M., Völker, U., Saak, C.C. et al. (2023) Federated machine learning in data‐protection‐compliant research. Nature Machine Intelligence, 5(1), 2–4.

Brüssow, H. (2020) Problems with the concept of gut microbiota dysbiosis. Microbial Biotechnology, 13(2), 423–434.

Busch, F., Hoffmann, L., Rueger, C., van Dijk, E.H.C., Kader, R., Ortiz‐Prado, E. et al. (2024) Systematic review of large language models for patient care: current applications and challenges. bioRxiv. 10.1101/2024.03.04.24303733

Cancer Genome Atlas Research Network, Weinstein, J.N., Collisson, E.A., Mills, G.B., Shaw, K.R.M., Ozenberger, B.A. et al. (2013) The cancer genome atlas Pan‐Cancer analysis project. Nature Genetics, 45(10), 1113–1120.

Carrieri, A.P., Haiminen, N., Maudsley‐Barton, S., Gardiner, L.‐J., Murphy, B., Mayes, A.E. et al. (2021) Explainable AI reveals changes in skin microbiome composition linked to phenotypic differences. Scientific Reports, 11(1), 4565.

Chan, C., You, K., Chung, S., Giuffrè, M., Saarinen, T., Rajashekar, N. et al. (2023) Assessing the usability of GutGPT: a simulation study of an AI clinical decision support system for gastrointestinal bleeding risk. arXiv [cs.HC]. arXiv. http://arxiv.org/abs/2312.10072

Chen, I.‐M.A., Chu, K., Palaniappan, K., Ratner, A., Huang, J., Huntemann, M. et al. (2023) The IMG/M data management and analysis system v.7: content updates and new features. Nucleic Acids Research, 51(D1), D723–D732.

Chen, L., Zhai, Y., He, Q., Wang, W. & Deng, M. (2020) Integrating deep supervised, self‐supervised and unsupervised learning for single‐cell RNA‐seq clustering and annotation. Genes, 11(7), 792. Available from: 10.3390/genes11070792

Chen, L., Zhao, N., Cao, J., Liu, X., Xu, J., Ma, Y. et al. (2022) Short‐ and long‐read metagenomics expand individualized structural variations in gut microbiomes. Nature Communications, 13(1), 3175. Available from: 10.1038/s41467-022-30857-9

Cheng, H.‐Y., Ning, M.‐X., Chen, D.‐K. & Ma, W.‐T. (2019) Interactions between the gut microbiota and the host innate immune response against pathogens. Frontiers in Immunology, 10, 607.

China National GeneBank (CNGB). (n.d.) MMHP: million microbiomes from humans project. Million Microbiomes from Humans Project. Available from: https://db.cngb.org/mmhp/ [Accessed 25th January 2024].

Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A. et al. (2022) PaLM: scaling language modeling with pathways. arXiv [cs.CL]. arXiv. http://arxiv.org/abs/2204.02311

D'Elia, D., Truu, J., Lahti, L., Berland, M., Papoutsoglou, G., Ceci, M. et al. (2023) Advancing microbiome research with machine learning: key findings from the ML4Microbiome COST action. Frontiers in Microbiology, 14, 1257002.

Dong, T.S. & Gupta, A. (2019) Influence of early life, diet, and the environment on the microbiome. Clinical Gastroenterology and Hepatology, 17(2), 231–242. Available from: 10.1016/j.cgh.2018.08.067

Ehrlich, S.D. & MetaHIT Consortium. (2011) MetaHIT: the European Union project on metagenomics of the human intestinal tract. In: Nelson, K.E. (Ed.) Metagenomics of the human body. New York, NY: Springer New York, pp. 307–316.

Ellabaan, M.M.H., Munck, C., Porse, A., Imamovic, L. & Sommer, M.O.A. (2021) Forecasting the dissemination of antibiotic resistance genes across bacterial genomes. Nature Communications, 12(1), 2435.

EOSC Association. (n.d.) EOSC Association. https://eosc.eu/ [Accessed 29th August 2024].

Falony, G., Joossens, M., Vieira‐Silva, S., Wang, J., Darzi, Y., Faust, K. et al. (2016) Population‐level analysis of gut microbiome variation. Science, 352(6285), 560–564.

Finn, R.D., Clements, J. & Eddy, S.R. (2011) HMMER web server: interactive sequence similarity searching. Nucleic Acids Research, 39(Web Server issue), W29–W37.

Flemer, B., Warren, R.D., Barrett, M.P., Cisek, K., Das, A., Jeffery, I.B. et al. (2018) The oral microbiota in colorectal cancer is distinctive and predictive. Gut, 67(8), 1454–1463.

Gacesa, R., Vich Vila, A., Collij, V., Mujagic, Z., Kurilshikov, A., Voskuil, M.D. et al. (2021) A combination of fecal calprotectin and human beta‐defensin 2 facilitates diagnosis and monitoring of inflammatory bowel disease. Gut Microbes, 13(1), 1943288.

General Data Protection Regulation (GDPR) – Official Legal Text. (n.d.) General Data Protection Regulation (GDPR). https://gdpr-info.eu/ [Accessed 29th August 2022].

Ghaemi, M.S., DiGiulio, D.B., Contrepois, K., Callahan, B., Ngo, T.T.M., Lee‐McMullen, B. et al. (2019) Multiomics modeling of the immunome, transcriptome, microbiome, proteome and metabolome adaptations during human pregnancy. Bioinformatics, 35(1), 95–103.

Giuffrè, M., Moretti, R. & Tiribelli, C. (2023) Gut microbes meet machine learning: the next step towards advancing our understanding of the gut microbiome in health and disease. International Journal of Molecular Sciences, 24(6). Available from: 10.3390/ijms24065229

Glennon, M., La Croce, C., Micheletti, G., Raczko, N., Freitas, L., Moise, C. et al. (2023) Results of the new European Data Market Study 2021–2023, D2.7. 2.0. https://ec.europa.eu/newsroom/dae/redirection/document/101694

Gupta, A., Dhakan, D.B., Maji, A., Saxena, R., Vishnu Prasoodanan, P.K., Mahajan, S. et al. (2019) Association of Flavonifractor plautii, a flavonoid‐degrading bacterium, with the gut microbiome of colorectal cancer patients in India. mSystems, 4(6). Available from: 10.1128/mSystems.00438-19

Gupta, V.K., Paul, S. & Dutta, C. (2017) Geography, ethnicity or subsistence‐specific variations in human microbiome composition and diversity. Frontiers in Microbiology, 8, 1162.

Hauschild, A.‐C., Lemanczyk, M., Matschinske, J., Frisch, T., Zolotareva, O., Holzinger, A. et al. (2022) Federated random forests can improve local performance of predictive models for various healthcare applications. Bioinformatics, 38(8), 2278–2286.

Heim, L. (2022) Estimating 🌴PaLM's training cost. Blog.heim.xyz (Blog). April 5, 2022. https://blog.heim.xyz/palm-training-cost/.

Hoffmann, D.E., von Rosenvinge, E.C., Roghmann, M.‐C., Palumbo, F.B., McDonald, D. & Ravel, J. (2024) The DTC microbiome testing industry needs more regulation. Science, 383(6688), 1176–1179.

Hu, D., Liang, K., Zhou, S., Tu, W., Liu, M. & Liu, X. (2023) scDFC: a deep fusion clustering method for single‐cell RNA‐seq data. Briefings in Bioinformatics, 24(4). Available from: 10.1093/bib/bbad216

Huang, C., Huang, J. & Liu, X. (2022) Cross‐silo federated learning: challenges and opportunities. arXiv [cs.LG], June. https://arxiv.org/abs/2206.12949.

Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H. et al. (2023) A survey on hallucination in large language models: principles, taxonomy, challenges, and open questions. arXiv [cs.CL]. arXiv. http://arxiv.org/abs/2311.05232.

Human Microbiome Project Consortium. (2012) A framework for human microbiome research. Nature, 486(7402), 215–221.

Jaberansary, M., Maia, M., Yediel, Y.U., Beyan, O. & Kirsten, T. (2023) Analyzing distributed medical data in FAIR data spaces. In Companion Proceedings of the ACM Web Conference 2023, 1480–1484. WWW '23 Companion. New York, NY, USA: Association for Computing Machinery.

Jiang, D., Armour, C.R., Hu, C., Mei, M., Tian, C., Sharpton, T.J. et al. (2019) Microbiome multi‐omics network analysis: statistical considerations, limitations, and opportunities. Frontiers in Genetics, 10, 995.

Jiang, Y., Luo, J., Huang, D., Liu, Y. & Li, D.‐D. (2022) Machine learning advances in microbiology: a review of methods and applications. Frontiers in Microbiology, 13, 925454.

Jin, J., Yu, Y., Wang, R., Zeng, X., Pang, C., Jiang, Y. et al. (2022) iDNA‐ABF: multi‐scale deep biological language learning model for the interpretable prediction of DNA methylations. Genome Biology, 23(1), 219.

Johnson, J.S., Spakowicz, D.J., Hong, B.‐Y., Petersen, L.M., Demkowicz, P., Chen, L. et al. (2019) Evaluation of 16S rRNA gene sequencing for species and strain‐level microbiome analysis. Nature Communications, 10(1), 5029.

Karabacak, M. & Margetis, K. (2023) Embracing large language models for medical applications: opportunities and challenges. Cureus, 15(5), e39305.

Kim, D., Hofstaedter, C.E., Zhao, C., Mattei, L., Tanes, C., Clarke, E. et al. (2017) Optimizing methods and dodging pitfalls in microbiome research. Microbiome, 5(1), 52.

Lan, F., Saba, J., Ross, T.D., Zhou, Z., Krauska, K., Anantharaman, K. et al. (2024) Massively parallel single‐cell sequencing of diverse microbial populations. Nature Methods, 21(2), 228–235. Available from: 10.1038/s41592-023-02157-7

Li, B., Wang, T., Qian, M. & Wang, S. (2023) MKMR: a multi‐kernel machine regression model to predict health outcomes using human microbiome data. Briefings in Bioinformatics, 24(3). Available from: 10.1093/bib/bbad158

Li, M., Liu, J., Zhu, J., Wang, H., Sun, C., Gao, N.L. et al. (2023) Performance of gut microbiome as an independent diagnostic tool for 20 diseases: cross‐cohort validation of machine‐learning classifiers. Gut Microbes, 15(1), 2205386.

Liu, J., Yang, M., Yu, Y., Xu, H., Li, K. & Zhou, X. (2024) Large language models in bioinformatics: applications and perspectives. ArXiv, January. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10802675/.

Liu, Y., Zhu, J., Wang, H., Lu, W., Lee, Y.K., Zhao, J. et al. (2022) Machine learning framework for gut microbiome biomarkers discovery and modulation analysis in large‐scale obese population. BMC Genomics, 23(1), 850.

Lo, C. & Marculescu, R. (2019) MetaNN: accurate classification of host phenotypes from metagenomic data using neural networks. BMC Bioinformatics, 20(Suppl 12), 314.

Lötstedt, B., Stražar, M., Xavier, R., Regev, A. & Vickovic, S. (2023) Spatial host–microbiome sequencing reveals niches in the mouse gut. Nature Biotechnology, 42(9), 1394–1403. Available from: 10.1038/s41587-023-01988-1

Mabwi, H.A., Kim, E., Song, D.‐G., Yoon, H.S., Pan, C.‐H., Komba, E.V.G. et al. (2021) Synthetic gut microbiome: advances and challenges. Computational and Structural Biotechnology Journal, 19, 363–371.

Maranga, M., Szczerbiak, P., Bezshapkin, V., Gligorijevic, V., Chandler, C., Bonneau, R. et al. (2023) Comprehensive functional annotation of metagenomes and microbial genomes using a deep learning‐based method. mSystems, 8(2), e0117822.

Marbouty, M., Thierry, A., Millot, G.A. & Koszul, R. (2021) MetaHiC phage‐bacteria infection network reveals active cycling phages of the healthy human gut. eLife, 10. Available from: 10.7554/eLife.60608

Matchado, M.S., Lauber, M., Reitmeier, S., Kacprowski, T., Baumbach, J., Haller, D. et al. (2021) Network analysis methods for studying microbial communities: a mini review. Computational and Structural Biotechnology Journal, 19, 2687–2698.

Matchado, M.S., Rühlemann, M., Reitmeier, S., Kacprowski, T., Frost, F., Haller, D. et al. (2023) On the limits of 16S rRNA gene‐based metagenome prediction and functional profiling. bioRxiv. 10.1101/2023.11.07.564315

Matschinske, J., Späth, J., Bakhtiari, M., Probul, N., Majdabadi, M.M.K., Nasirigerdeh, R. et al. (2023) The FeatureCloud platform for federated learning in biomedicine: unified approach. Journal of Medical Internet Research, 25, e42621.

McCoubrey, L.E., Elbadawi, M., Orlu, M., Gaisford, S. & Basit, A.W. (2021) Harnessing machine learning for development of microbiome therapeutics. Gut Microbes, 13(1), 1–20.

Mills, R.H., Dulai, P.S., Vázquez‐Baeza, Y., Sauceda, C., Daniel, N., Gerner, R.R. et al. (2022) Multi‐omics analyses of the ulcerative colitis gut microbiome link Bacteroides vulgatus proteases with disease severity. Nature Microbiology, 7(2), 262–276.

Miura, N. & Okuda, S. (2023) Current progress and critical challenges to overcome in the bioinformatics of mass spectrometry‐based metaproteomics. Computational and Structural Biotechnology Journal, 21, 1140–1150.

Moreno‐Indias, I., Lahti, L., Nedyalkova, M., Elbere, I., Roshchupkin, G., Adilovic, M. et al. (2021) Statistical and machine learning techniques in human microbiome studies: contemporary challenges and solutions. Frontiers in Microbiology, 12, 635781.

Mreyoud, Y., Song, M., Lim, J. & Ahn, T.‐H. (2022) MegaD: deep learning for rapid and accurate disease status prediction of metagenomic samples. Life, 12(5). Available from: 10.3390/life12050669

Muller, E., Shiryan, I. & Borenstein, E. (2024) Multi‐Omic integration of microbiome data for identifying disease‐associated modules. Nature Communications, 15(1), 2621.

Nasirigerdeh, R., Torkzadehmahani, R., Matschinske, J., Frisch, T., List, M., Späth, J. et al. (2022) sPLINK: a hybrid federated tool as a robust alternative to meta‐analysis in genome‐wide association studies. Genome Biology, 23(1), 32.

Nguyen, T.H., Prifti, E., Chevaleyre, Y., Sokolovska, N. & Zucker, J.‐D. (2018) Disease classification in metagenomics with 2D embeddings and deep learning. arXiv [cs.CV]. arXiv. http://arxiv.org/abs/1806.09046

Oh, M. & Zhang, L. (2020) DeepMicro: deep representation learning for disease prediction based on microbiome data. Scientific Reports, 10(1), 6026.

Ojala, T., Häkkinen, A.‐E., Kankuri, E. & Kankainen, M. (2023) Current concepts, advances, and challenges in deciphering the human microbiota with metatranscriptomics. Trends in Genetics, 39(9), 686–702.

Olsson, L.M., Boulund, F., Nilsson, S., Khan, M.T., Gummesson, A., Fagerberg, L. et al. (2022) Dynamics of the normal gut microbiota: a longitudinal one‐year population study in Sweden. Cell Host & Microbe, 30(5), 726–739.e3.

Open Science. (n.d.) Research and innovation. https://research-and-innovation.ec.europa.eu/strategy/strategy-2020-2024/our-digital-future/open-science_en [Accessed 29th August 2024].

Papoutsoglou, G., Tarazona, S., Lopes, M.B., Klammsteiner, T., Ibrahimi, E., Eckenberger, J. et al. (2023) Machine learning approaches in microbiome research: challenges and best practices. Frontiers in Microbiology, 14, 1261889.

Pavlopoulos, G.A., Baltoumas, F.A., Liu, S., Selvitopi, O., Camargo, A.P., Nayfach, S. et al. (2023) Unraveling the functional dark matter through global metagenomics. Nature, 622(7983), 594–602.

Peng, H., Long, F. & Ding, C. (2005) Feature selection based on mutual information: criteria of max‐dependency, max‐relevance, and min‐redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226–1238.

Pruss, K.M. & Sonnenburg, J.L. (2021) C. difficile exploits a host metabolite produced during toxin‐mediated disease. Nature, 593(7858), 261–265.

Queen, O. & Emrich, S.J. (2021) LASSO‐based feature selection for improved microbial and microbiome classification. In 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2301–2308. IEEE.

Rao, C., Coyte, K.Z., Bainter, W., Geha, R.S., Martin, C.R. & Rakoff‐Nahoum, S. (2021) Multi‐kingdom ecological drivers of microbiota assembly in preterm infants. Nature, 591(7851), 633–638.

Ratiner, K., Ciocan, D., Abdeen, S.K. & Elinav, E. (2023) Utilization of the microbiome in personalized medicine. Nature Reviews Microbiology, 22, 291–309. Available from: 10.1038/s41579-023-00998-9

RDA. (n.d.) Research data alliance. https://www.rd-alliance.org/ [Accessed 29th August 2024].

Reitmeier, S., Kiessling, S., Clavel, T., List, M., Almeida, E.L., Ghosh, T.S. et al. (2020) Arrhythmic gut microbiome signatures predict risk of type 2 diabetes. Cell Host & Microbe, 28(2), 258–272.e6.

Ren, Z., Jiang, L., Di, Y., Zhang, D., Gong, J., Gong, J. et al. (2024) CodonBERT: a BERT‐based architecture tailored for codon optimization using the cross‐attention mechanism. Bioinformatics, 40(7), btae330.

Hernández Medina, R., Kutuzova, S., Nielsen, K.N., Johansen, J., Hansen, L.H., Nielsen, M. et al. (2022) Machine learning and deep learning applications in microbiome research. ISME Communications, 2(1), 98.

Richardson, L., Allen, B., Baldi, G., Beracochea, M., Bileschi, M.L., Burdett, T. et al. (2023) MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Research, 51(D1), D753–D759.

Roach, J., Mital, R., Haffner, J.J., Colwell, N., Coats, R., Palacios, H.M. et al. (2024) Microbiome metabolite quantification methods enabling insights into human health and disease. Methods, 222, 81–99.

Routy, B., Le Chatelier, E., Derosa, L., Duong, C.P.M., Alou, M.T., Daillère, R. et al. (2018) Gut microbiome influences efficacy of PD‐1‐based immunotherapy against epithelial tumors. Science, 359(6371), 91–97.

Sankaran, K. & Holmes, S.P. (2019) Multitable methods for microbiome data integration. Frontiers in Genetics, 10, 627.

Santana, P.T., Rosas, S.L.B., Ribeiro, B.E., Marinho, Y. & de Souza, H.S.P. (2022) Dysbiosis in inflammatory bowel disease: pathogenic role and potential therapeutic targets. International Journal of Molecular Sciences, 23(7). Available from: 10.3390/ijms23073464

Selma‐Royo, M., Segata, N. & Ricci, L. (2023) Human microbiome cultivation expands with AI. Nature Biotechnology, 41(10), 1389–1391.

Shao, Z., Zhao, R., Yuan, S., Ding, M. & Wang, Y. (2022) Tracing the evolution of AI in the past decade and forecasting the emerging trends. Expert Systems with Applications, 209, 118221.

Sharma, D. & Xu, W. (2021) phyLoSTM: a novel deep learning model on disease prediction from longitudinal microbiome data. Bioinformatics, 37(21), 3707–3714.

Shen, Y., Zhu, J., Deng, Z., Lu, W. & Wang, H. (2023) EnsDeepDP: an ensemble deep learning approach for disease prediction through metagenomics. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 20(2), 986–998.

Singhal, K., Tu, T., Gottweis, J., Sayres, R., Wulczyn, E., Hou, L. et al. (2023) Towards expert‐level medical question answering with large language models. arXiv [cs.CL]. arXiv. http://arxiv.org/abs/2305.09617.

Spor, A., Koren, O. & Ley, R. (2011) Unravelling the effects of the environment and host genotype on the gut microbiome. Nature Reviews Microbiology, 9(4), 279–290. Available from: 10.1038/nrmicro2540

Štrumbelj, E. & Kononenko, I. (2014) Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3), 647–665.

Su, Q., Liu, Q., Lau, R.I., Zhang, J., Xu, Z., Yeoh, Y.K. et al. (2022) Faecal microbiome‐based machine learning for multi‐class disease diagnosis. Nature Communications, 13(1), 6818.

Sun, T., Niu, X., He, Q., Chen, F. & Qi, R.‐Q. (2023) Artificial intelligence in microbiomes analysis: a review of applications in dermatology. Frontiers in Microbiology, 14, 1112010.

Syama, K., Jothi, A.A.A. & Khanna, N. (2023) Automatic disease prediction from human gut metagenomic data using boosting GraphSAGE. BMC Bioinformatics, 24(1), 126.

Talmor‐Barkan, Y., Bar, N., Shaul, A.A., Shahaf, N., Godneva, A., Bussi, Y. et al. (2022) Metabolomic and microbiome profiling reveals personalized risk factors for coronary artery disease. Nature Medicine, 28(2), 295–302.

Thomas, A.M., Manghi, P., Asnicar, F., Pasolli, E., Armanini, F., Zolfo, M. et al. (2019) Metagenomic analysis of colorectal cancer datasets identifies cross‐cohort microbial diagnostic signatures and a link with choline degradation. Nature Medicine, 25(4), 667–678.

Tian, T., Wan, J., Song, Q. & Wei, Z. (2019) Clustering single‐cell RNA‐seq data with a model‐based deep learning approach. Nature Machine Intelligence, 1(4), 191–198.

Tu, T., Palepu, A., Schaekermann, M., Saab, K., Freyberg, J., Tanno, R. et al. (2024) Towards conversational diagnostic AI. arXiv [cs.AI]. arXiv. http://arxiv.org/abs/2401.05654.

UNESCO. (2021) UNESCO recommendation on Open Science. UNESCO. https://unesdoc.unesco.org/ark:/48223/pf0000379949.

van der Velden, B.H.M., Kuijf, H.J., Gilhuijs, K.G.A. & Viergever, M.A. (2022) Explainable artificial intelligence (XAI) in deep learning‐based medical image analysis. Medical Image Analysis, 79, 102470.

van Leeuwen, P.T., Brul, S., Zhang, J. & Wortel, M.T. (2023) Synthetic microbial communities (SynComs) of the human gut: design, assembly, and applications. FEMS Microbiology Reviews, 47(2). Available from: 10.1093/femsre/fuad012

Whalen, S., Schreiber, J., Noble, W.S. & Pollard, K.S. (2022) Navigating the pitfalls of applying machine learning in genomics. Nature Reviews Genetics, 23(3), 169–181.

Yatsunenko, T., Rey, F.E., Manary, M.J., Trehan, I., Dominguez‐Bello, M.G., Contreras, M. et al. (2012) Human gut microbiome viewed across age and geography. Nature, 486(7402), 222–227.

Yen, S. & Johnson, J.S. (2021) Metagenomics: a path to understanding the gut microbiome. Mammalian Genome, 32(4), 282–296.

Yu, L.C.‐H. (2018) Microbiota dysbiosis and barrier dysfunction in inflammatory bowel disease and colorectal cancers: exploring a common ground hypothesis. Journal of Biomedical Science, 25(1), 79.

Zeevi, D., Korem, T., Zmora, N., Israeli, D., Rothschild, D., Weinberger, A. et al. (2015) Personalized nutrition by prediction of glycemic responses. Cell, 163(5), 1079–1094.

Zheng, W., Zhao, S., Yin, Y., Zhang, H., Needham, D.M., Evans, E.D. et al. (2022) High‐throughput, single‐microbe genomics with strain resolution, applied to a human gut microbiome. Science, 376(6597), eabm1483. Available from: 10.1126/science.abm1483

Zhou, Z., Ji, Y., Li, W., Dutta, P., Davuluri, R. & Liu, H. (2023) DNABERT‐2: efficient foundation model and benchmark for multi‐species genome. arXiv [q-bio.GN]. arXiv. http://arxiv.org/abs/2306.15006

Zolotareva, O., Nasirigerdeh, R., Matschinske, J., Torkzadehmahani, R., Bakhtiari, M., Frisch, T. et al. (2021) Flimma: a federated and privacy‐aware tool for differential gene expression analysis. Genome Biology, 22(1), 338.
