This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2024 Jun 28:2024.06.24.600378. [Version 1] doi: 10.1101/2024.06.24.600378

BioMapAI: Artificial Intelligence Multi-Omics Modeling of Myalgic Encephalomyelitis / Chronic Fatigue Syndrome

Ruoyun Xiong 1,2, Elizabeth Fleming 1, Ryan Caldwell 1, Suzanne D Vernon 3, Lina Kozhaya 1, Courtney Gunter 1,2, Lucinda Bateman 3, Derya Unutmaz 1, Julia Oh 1,*
PMCID: PMC11230215  PMID: 38979186

Abstract

Chronic diseases like ME/CFS and long COVID exhibit high heterogeneity with multifactorial etiology and progression, complicating diagnosis and treatment. To address this, we developed BioMapAI, an explainable deep learning framework, using the richest longitudinal multi-‘omics dataset for ME/CFS to date. This dataset includes gut metagenomics, plasma metabolome, immune profiling, blood labs, and clinical symptoms. By connecting multi-‘omics to a symptom matrix, BioMapAI identified both disease- and symptom-specific biomarkers, reconstructed symptoms, and achieved state-of-the-art precision in disease classification. We also created the first connectivity map of these ‘omics in both healthy and disease states and revealed how microbiome-immune-metabolome crosstalk shifted from healthy to ME/CFS. From these results, we propose several innovative mechanistic hypotheses for ME/CFS: disrupted microbial functions (SCFA [butyrate], BCAA [amino acid], tryptophan, benzoate) lost connection with plasma lipids and bile acids, and activated inflammatory and mucosal immune cells (MAIT, γδT cells) with IFNγ and GzA secretion. These abnormal dynamics are linked to key disease symptoms, including gastrointestinal issues, fatigue, and sleep problems.

Introduction

Chronic diseases, such as cancer1, diabetes2, rheumatoid arthritis (RA)3, myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS)4, and possibly long COVID5,6, the sequela of SARS-CoV-2 infection, can evolve over decades and exhibit diverse phenotypic and physiological manifestations across individuals. This heterogeneity is reflected in disease progression and treatment responses, complicating the establishment of standardized clinical protocols, and demanding personalized therapeutic strategies7.

However, this heterogeneity has not been well studied, leaving substantial knowledge and technical gaps8. Current cohort studies often focus on identifying one or two key disease indicators, such as HbA1C levels for diabetes9,10 or survival rates for cancer11, even with the advent of multi-‘omics. This approach has difficulty accommodating the highly multifactorial etiology and progression of most chronic diseases, with different patients exhibiting varying symptoms and disease markers12. To address this challenge, methods must link a more complex matrix of disease-associated outcomes with a range of ‘omics data types to enable precise targeting of biomarkers tailored to each patient’s specific symptoms.

Here, we introduce BioMapAI, an explainable AI framework that we developed to integrate multi-‘omics data to decode complex host symptomatology, specifically applied to ME/CFS. Affecting at least 10 million people globally, ME/CFS is a chronic, complex, multi-system illness characterized by impaired function and persistent fatigue, post-exertional malaise, multi-site pain, sleep disturbances, orthostatic intolerance, cognitive impairment, gastrointestinal issues, and other symptoms 13,14,15. The pathogenesis of ME/CFS is not well understood, with triggers believed to include viral infections such as Epstein-Barr Virus (EBV)16, enteroviruses17 and SARS coronavirus18. As a chronic disease, ME/CFS can persist for years or even a lifetime, with each patient developing distinct illness patterns13. Therefore, a universal approach to clinical care and symptom management is insufficient, and a personalized approach is crucial for effectively addressing the complex nature of ME/CFS. Additionally, given similarities in causality and symptomatology to long COVID19,20, studying ME/CFS specifically can provide broader insights into post-viral syndromes, and more generally, our AI-driven approach can be applied to a range of diseases with complex symptomatology not readily explained by a single data type.

We generated a rich longitudinal, multi-‘omics dataset of 153 ME/CFS patients and 96 age-gender-matched healthy controls, comprised of gut metagenomics, plasma metabolome, immune cell profiling, activation, and cytokines, together with blood labs, detailed clinical symptoms, and lifestyle survey data. We aimed to: 1) identify new disease biomarkers - not only for ME/CFS but also to specify biomarkers that could explain the complex symptomatology, and 2) define interactions between microbiome, immune system, and metabolome – rather than studying single data types in isolation, we created the first connectivity map of these ‘omics. This map critically accounts for covariates such as age and gender, providing an important baseline in healthy individuals contrasted with aberrant connections identified in disease.

BioMapAI is a strategically designed Deep Neural Network (DNN) that connects the multi-‘omics profiles to a matrix of clinical symptoms. Applied here to ME/CFS, it identifies both disease- and symptom-specific biomarkers, accurately reconstructs key clinical symptoms, achieves state-of-the-art precision in disease classification, and generates several innovative mechanistic hypotheses for disease. By revealing microbiome-immune-metabolome crosstalk shifts from healthy to diseased states, we found depletion of microbial butyrate (SCFA) and amino acids (BCAA) in ME/CFS, linked with abnormal activation of inflammatory and mucosal immune cells (MAIT and γδT cells) with IFNγ and GzA. This altered dynamic correlated with clinical symptom scores, indicating deteriorated health perception and impaired social activity. Microbial metabolites, like tryptophan and benzoate, lost connections with plasma lipids in patients, in turn associated with fatigue, emotional and sleeping problems. This dataset is the richest multi-‘omics dataset to date for ME/CFS, as well as for numerous other chronic diseases. Alongside it, we introduce a novel, generalizable, and explainable AI approach that captures the complexity of chronic disease and provides new hypotheses for host-microbiome interactions in both health and ME/CFS.

Results

Cohort Overview

We tracked 249 participants over 3–4 years, including 153 ME/CFS patients (75 ‘short-term’ with disease symptoms < 4 years and 78 ‘long-term’ with disease symptoms > 10 years) and 96 healthy controls (Fig 1A; Supplemental Table 1). The cohort is 68% female and 32% male, aligning with the epidemiological data showing that women are 3–4 times more likely to develop ME/CFS21,22. Participants ranged in age from 19 to 68 years with body mass indexes (BMI) from 16 to 43 kg/m². Throughout the study, we collected detailed clinical metadata, blood samples, and fecal samples. In total, 1471 biological samples were collected across all participants at 515 timepoints (Methods, Supplemental Figure 1A, Supplemental Table 1).

Figure 1: Cohort Summary and Heterogeneity of ME/CFS. A) Cohort Design and ‘Omics Profiling.

96 healthy donors and 153 ME/CFS patients were followed over 3–4 years with yearly sampling. Clinical metadata including lifestyle and dietary surveys, blood clinical laboratory measures (N=503), gut microbiome (N=479), plasma metabolome (N=414), and immune profiles (N=489) were collected (Supplemental Table 1 and Supplemental Figure 1A). B) Heterogeneity and Non-Linear Progression of ME/CFS in Symptom Severity and ‘Omics Profiles. Variability in symptom severity (top) and ‘omics profiles (bottom) for 20 representative ME/CFS patients over 3–4 time points. For symptom severity, the 12 major clinical symptoms (x-axis) vs. severity (scaled from 0% to 100%, y-axis) are shown for each patient (each color), with lines showing average severity and shaded areas showing the severity range over their timepoints. The wide spread highlights the lack of consistent temporal patterns and the unique symptomatology of ME/CFS (controls shown in Supplemental Figure 1C). Bottom, PCoA of integrated ‘omics data with colored dots matching patient timepoints in the symptom plot and grey dots representing the entire cohort. Again, the spread and overlap of the colored space reflect the diversity in ‘omics signatures vs. the more consistent pattern typical of controls (Supplemental Figure 1B). Abbreviations: ME/CFS, Myalgic Encephalomyelitis/Chronic Fatigue Syndrome; PCoA, Principal Coordinates Analysis. Supporting Materials: Supplemental Table 1, Supplemental Figure 1.

Blood samples were 1) sent for clinical testing at Quest Laboratory (48 features measured, N=503 samples), 2) fractionated into peripheral blood mononuclear cells (PBMCs), which were examined via flow cytometry, yielding data on 443 immune cells and cytokines (N=489), and 3) separated into plasma and serum for untargeted liquid chromatography with tandem mass spectrometry (LC-MS/MS), identifying 958 metabolites (N=414). Detailed demographic documentation and questionnaires covering medication use, medical history, and key ME/CFS symptoms were collected (Methods). Finally, whole-genome shotgun metagenomic sequencing of stool samples (N=479) produced an average of 12,302,079 high-quality, classifiable reads per sample, detailing gut microbiome composition (1293 species detected) and KEGG gene function (9993 genes reconstructed).

Heterogeneity and Non-linear Progression of ME/CFS

First, we demonstrated the phenotypic complexity and heterogeneity of ME/CFS. Collaborating with clinical experts, we consolidated detailed questionnaires and clinical metadata, foundational to diagnosing ME/CFS, into twelve essential clinical scores (Methods). These scores covered core symptoms including physical and mental health, fatigue, pain levels, cognitive efficiency, sleep disturbances, orthostatic intolerance, and gastrointestinal issues (Supplemental Table 1).

While healthy individuals consistently presented low symptom scores (Supplemental Figure 1D), ME/CFS patients exhibited significant variability in symptom severity, with each individual showing different predominant symptoms (Figure 1B). Principal coordinates analysis (PCoA) of the ‘omics matrices highlighted the difficulty in distinguishing patients from controls, emphasizing the complex symptomatology of ME/CFS and the challenges in developing predictive models (Supplemental Figure 1E). Additionally, over time, in contrast to the stable patterns typical of healthy individuals (Supplemental Figure 1B), ME/CFS patients demonstrated distinctly varied patterns each year, as evidenced by the diversity in symptom severity and noticeable separation on the ‘omics PCoA (Figure 1B, Supplemental Figure 1C). Despite employing multiple longitudinal models (Methods), we found no consistent temporal signals, confirming the non-linear progression of ME/CFS.

This individualized, multifaceted, and dynamic nature of ME/CFS that intensifies with disease progression necessitates new approaches that extend beyond simple disease versus control comparisons. Here, we created and implemented an AI-driven model that integrates the multi-‘omics profiles to learn host phenotypes. This allowed us not only to develop a state-of-the-art classifier for disease, but for the first time, to identify biomarker sets for each clinical symptom as well as unique interaction networks that differed between patients and controls.

BioMapAI, an Explainable Neural Network Connecting ‘Omics to Multi-Type Outcomes

To connect multi-‘omics data to clinical symptoms, a model must accommodate the learning of multiple different outcomes within a single framework. However, traditional machine learning models are generally designed to predict a single categorical outcome or continuous variable23,24,25. This simplified disease classification and conventional biomarker identification typically fail to encapsulate the heterogeneity of complex diseases26,27.

We developed an AI-powered multi-‘omics framework, BioMapAI, a fully connected deep neural network that takes ‘omics matrices ($X$) as input and outputs a mixed-type outcome matrix ($Y$), thereby mapping multiple ‘omics features to multiple clinical indicators (Figure 2A). By assigning specific loss functions for each output, BioMapAI aims to comprehensively learn every $y$ (i.e., each of the 12 continuous or categorical clinical scores in this study), using the ‘omics data inputs. Between the input layer $X$ and the output layer $Y = [y_1, y_2, \ldots, y_n]$, the model consists of two shared hidden layers ($Z^1$ with 64 nodes, and $Z^2$ with 32 nodes) for general pattern learning, followed by a parallel hidden layer $Z^3 = [z_1^3, z_2^3, \ldots, z_n^3]$, with sub-layers ($z_n^3$, each with 8 nodes) tailored for each outcome $y_n$, to capture outcome-specific patterns (Figure 2A). This unique architecture (two shared and one specific hidden layer) allows the model to capture both general and output-specific patterns. This model is made 1) explainable by incorporating a SHAP (SHapley Additive exPlanations) explainer, which quantifies the feature importance of each prediction, providing both local (symptom-level) and global (disease-level) interpretability, and 2) flexible by automatically finding appropriate learning goals and loss functions for each type of outcome (without need of format refinement), facilitating BioMapAI’s adaptability to broader research applications.
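The layer layout described above can be sketched as a plain forward pass. This is a minimal illustration rather than the authors' implementation: the weights are random and untrained, the ReLU activations, the input width of 20 features, and the helper names (`init_layer`, `dense`, `build_model`, `forward`) are all assumptions, and normalization, dropout, and the per-outcome loss functions are omitted.

```python
import random

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (illustrative only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer, relu=True):
    """Apply one dense layer; ReLU is an assumed activation."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out

def build_model(n_features, n_outcomes=12, seed=0):
    rng = random.Random(seed)
    return {
        "z1": init_layer(n_features, 64, rng),                        # shared layer Z^1
        "z2": init_layer(64, 32, rng),                                # shared layer Z^2
        "z3": [init_layer(32, 8, rng) for _ in range(n_outcomes)],    # parallel sub-layers of Z^3
        "out": [init_layer(8, 1, rng) for _ in range(n_outcomes)],    # one output unit per y_k
    }

def forward(model, x):
    """Shared layers first, then one outcome-specific branch per clinical score."""
    shared = dense(dense(x, model["z1"]), model["z2"])
    predictions = []
    for z3_k, out_k in zip(model["z3"], model["out"]):
        predictions.append(dense(dense(shared, z3_k), out_k, relu=False)[0])
    return predictions

model = build_model(n_features=20)       # 20 input features is a hypothetical width
scores = forward(model, [0.5] * 20)      # 12 untrained outputs, one per outcome
```

Because every branch reads the same 32-node representation, patterns common to all outcomes are learned once in the shared layers, while each 8-node branch is free to specialize for its own score.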

Figure 2: BioMapAI’s Model Structure and Performance.

A) Structure of BioMapAI. BioMapAI is a fully connected deep neural network comprised of an input layer ($X$), a normalization layer (not shown), three sequential hidden layers ($Z^1$, $Z^2$, $Z^3$), and one output layer ($Y$). Hidden layer 1 ($Z^1$, 64 nodes) and hidden layer 2 ($Z^2$, 32 nodes) both feature a dropout ratio of 50% to prevent overfitting (visually represented by dark and light gray nodes). Hidden layer 3 has 12 parallel sub-layers, each with 8 nodes, $Z^3 = [z_1^3, z_2^3, \ldots, z_{12}^3]$, to learn 12 objects in the output layer, $Y = [y_1, y_2, \ldots, y_{12}]$, representing key clinical symptoms of ME/CFS. B) True vs. Predicted Clinical Scores highlight BioMapAI’s accuracy. Three example density maps (full set, Supplemental Figure 2A) compare the true score, $y$ (Column 1), against BioMapAI’s predictions generated from different ‘omics profiles: $\hat{y}_{\text{immune}}$, $\hat{y}_{\text{species}}$, $\hat{y}_{\text{KEGG}}$, $\hat{y}_{\text{metabolome}}$, $\hat{y}_{\text{omics}}$ (Columns 2–6). The color gradient from blue (lower density) to red (higher density) illustrates the occurrence frequency (e.g., true scores for ~100% of healthy controls’ physical health ~ 0 = red), with dashed lines indicating key statistical percentiles (100%, 75%, 50%, 25%, and 0%). Note that the model’s predicted scores preserve differences between healthy controls and patients for these three examples, irrespective of ‘omics type. C) ‘Omics’ Strengths in Symptom Prediction. Radar plot shows BioMapAI’s performance in predicting the 12 clinical outcomes for each ‘omics datatype. Each of the 12 axes represents a clinical score output ($Y = [y_1, y_2, \ldots, y_{12}]$), with five colors denoting the ‘omics datasets used for model training. The spread of each color along an axis reflects the normalized mean square error (MSE, Supplemental Table 2) between the actual, $y$, and the predicted, $\hat{y}$, outputs, illustrating the predictive strength or weakness of each ‘omics for specific clinical scores. 
For instance, species abundance predicted gastrointestinal, emotional, and sleep issues effectively, while the immune profile was broadly accurate across most scores. D) BioMapAI’s Performance in Healthy vs. Disease Classification. ROC curves show BioMapAI’s performance in disease classification using each ‘omics dataset separately or combined (‘Omics’), with the AUC in parentheses showing prediction accuracy (full report in Supplemental Table 3). E) Validation of BioMapAI with External Cohorts. External cohorts with microbiome data (Guo et al.28, Ruud et al.29) and metabolome data (Germain et al.30, Che et al.31) were used to test BioMapAI’s model, underscoring its generalizability (detailed classification matrix, Supplemental Table 4). Abbreviations: KEGG, Kyoto Encyclopedia of Genes and Genomes; ‘Omics’ refers to the combined multi-‘omics matrix; MSE, Mean Square Error; ROC curve, Receiver Operating Characteristic curve; AUC, Area Under the Curve; $y$, True Score; $\hat{y}$, Predicted Score. Supporting Materials: Supplemental Tables 2–4, Supplemental Figures 1–2.

BioMapAI Reconstructed Clinical Symptoms and Achieved State-of-the-Art Performance in Discriminating ME/CFS from Healthy Controls

BioMapAI is a versatile AI framework connecting a biological ‘omics matrix to multiple phenotypic outputs. It does not have a specific disease focus and is designed to be applicable to a range of applications. Here, we trained and validated its usage with our ME/CFS datasets, employing five-fold cross-validation. This trained model, nicknamed DeepMECFS for the ME/CFS community, accurately represented the structure of diverse clinical symptom score types and discriminated between healthy individuals and patients (Figure 2, Supplemental Figure 2, Supplemental Tables 2–3). For example, it effectively differentiated the physical health scores, where patients exhibited more severe conditions compared to healthy controls (categorical datatype, 4 vs. 0, respectively, Figure 2B, Supplemental Table 2), and pain scores (continuous datatype ranging from 1 (highest) to 0 (lowest), mean 0.52±0.24 vs. 0.11±0.12 for patients vs. controls). Though compressing some inherent variance, BioMapAI accurately reconstructed key statistical measures such as the mean and interquartile range (25%–75%), and highlighted the distinctions between healthy and disease (Figure 2B, Supplemental Figure 2AB, Supplemental Table 2).
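As a rough sketch of how a five-fold split could be set up for longitudinal data, the snippet below assigns whole participants to folds so that repeated samples from one person never appear in both training and test sets. Grouping by participant is an assumption made here to avoid leakage; this section does not state how the authors built their folds, and `grouped_kfold` and the participant IDs are hypothetical.

```python
import random

def grouped_kfold(sample_groups, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping each group inside a single fold."""
    groups = sorted(set(sample_groups))
    rng = random.Random(seed)
    rng.shuffle(groups)
    fold_of = {g: i % k for i, g in enumerate(groups)}   # round-robin group assignment
    for fold in range(k):
        test = [i for i, g in enumerate(sample_groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(sample_groups) if fold_of[g] != fold]
        yield train, test

# Toy layout: 10 hypothetical participants with 2 timepoints each.
participants = ["p%d" % (i // 2) for i in range(20)]
folds = list(grouped_kfold(participants, k=5))
```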

To determine the accuracy of the clinical scores reconstructed by BioMapAI’s integration of ‘omics data, we compared their ability to discriminate ME/CFS patients from controls with that of the original clinical scores. We used one additional fully connected layer to regress the 12 predicted clinical scores, $\hat{Y}$ of shape (12,), into a binary outcome of patient vs. control, $\hat{Y}$ of shape (1,). Because the diagnosis of ME/CFS relies on clinical interpretation of key symptoms (i.e., the original clinical scores), the original clinical scores have near-perfect accuracy in classification as expected (AUC, Area Under the Curve >99%, Supplemental Figure 2C). Notably, BioMapAI’s predicted scores based on the ‘omics data achieved a 91% AUC, highlighting its leading-edge accuracy in disease vs. healthy classification (Figure 2D, Supplemental Figure 2D), which was also superior to the performance of three ML models, linear regression (LR), support vector machine (SVM), and gradient boosting (GBDT), and one deep learning model (DNN) without the third hidden (‘spread out’) layer (Supplemental Table 3). BioMapAI particularly excelled utilizing immune features (AUC = 80%), KEGG genes (78%), blood measure models (71%) and combined ‘omics (91%). GBDT, however, led in the microbial species (75%) and metabolome (74%) models, likely due to its emphasis on specific features.
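The classification step and its evaluation can be illustrated with a small sketch: a single dense unit maps the 12 predicted scores to a patient probability, and AUC is computed as the fraction of patient/control pairs ranked correctly. The sigmoid activation, the weights, and the toy scores are assumptions for illustration, not values from the study.

```python
import math

def disease_probability(scores, weights, bias):
    """One dense unit with an assumed sigmoid: 12 predicted scores -> P(patient)."""
    z = sum(w * s for w, s in zip(weights, scores)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def roc_auc(labels, probabilities):
    """AUC as the probability that a random patient outranks a random control."""
    pos = [p for y, p in zip(labels, probabilities) if y == 1]
    neg = [p for y, p in zip(labels, probabilities) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5          # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical weights and scores, chosen only to exercise the functions.
weights, bias = [1.0] * 12, -6.0
p_patient = disease_probability([0.8] * 12, weights, bias)
p_control = disease_probability([0.2] * 12, weights, bias)

# One patient (0.4) is ranked below one control (0.5), so 8 of 9 pairs are correct.
auc = roc_auc([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.2, 0.1])
```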

Finally, to assess the robustness of our BioMapAI model, we validated it with independent, published ME/CFS cohorts (Figure 2E, Supplemental Table 4). Using data from two microbiome cohorts, Guo, Cheng et al., 2023 (US)28 and Raijmakers, Ruud et al., 2020 (Netherlands)29, BioMapAI achieved 72% and 63% accuracy in species relative abundance and 58% and 60% accuracy in microbial KEGG gene abundance. When applied to two metabolome cohorts, Germain, Arnaud et al., 2022 (US)30 and Che, Xiaoyu et al., 2022 (US)31, BioMapAI attained 68% and 59% accuracy. These results were strong given that the metabolomic features only overlap by 79% and 19%, respectively, due to methodological variations.
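Applying a trained model to an external cohort requires mapping that cohort's features onto the model's input order. The sketch below shows one way to do this and to report the feature overlap; filling missing features with zero and the names `align_features` and `metabolite_*` are assumptions, since this section does not describe how non-overlapping features were handled.

```python
def align_features(model_features, sample, fill=0.0):
    """Reorder an external sample to the model's feature order.

    Returns the aligned vector and the fraction of model features that the
    external sample actually contains. Missing features receive `fill`.
    """
    shared = [f for f in model_features if f in sample]
    vector = [sample.get(f, fill) for f in model_features]
    return vector, len(shared) / len(model_features)

# Hypothetical feature names and values.
model_features = ["metabolite_a", "metabolite_b", "metabolite_c", "metabolite_d", "metabolite_e"]
external_sample = {"metabolite_a": 1.0, "metabolite_b": 0.5, "metabolite_c": 2.0,
                   "metabolite_e": 3.0, "metabolite_z": 9.0}   # metabolite_z is unknown to the model
vector, overlap = align_features(model_features, external_sample)
```

With a low overlap, most inputs carry no cohort-specific information, which is one reason cross-cohort accuracy is expected to drop.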

Importantly, BioMapAI significantly surpassed GBDT and DNN in external cohort validation, supporting our theory that while commonly used models, such as tree-based GBDT, may be effective within a single study, their overemphasis on specific key features can limit their generalizability across different studies, which may not share the same biomarkers. BioMapAI’s effectiveness also highlighted the value of incorporating clinical symptoms into a predictive model, indicating that connecting ‘omics features to clinical symptoms improves disease classification. Given the limitations of using external cohorts, which often differ significantly in methodology and cohort characteristics, to validate traditional microbiome and metabolite ML models32,33,34, BioMapAI represents a breakthrough as a far more adaptable and broadly applicable model.

‘Omics’ Strengths Varied in Symptom Prediction; Immune is the Most Predictive

A major innovation of BioMapAI is its ability to leverage different ‘omics data to predict individual clinical scores in addition to disease vs. healthy classification. We evaluated the predictive accuracy by calculating the mean squared error between actual ($y$) and predicted ($\hat{y}$) scores and observed that the different ‘omics showed varying strengths in predicting clinical scores (Figure 2C). Immune profiling consistently excelled in forecasting a wide range of symptoms, including pain, fatigue, orthostatic intolerance, and general health perception, underscoring the immune system’s crucial role in health regulation. In contrast, blood measurements demonstrated limited predictive ability, except for cognitive efficiency, likely owing to their limited focus on 48 specific blood bioactives. Plasma metabolomics, which encompasses nearly a thousand measurements, performed significantly better, with notable correlations with facets of physical health and social activity. These findings corroborate published associations of metabolites with mortality35,36, longevity37,38, cognitive function39, and social interactions40,41,42. Microbiome profiles surpassed other ‘omics in predicting gastrointestinal abnormalities (as expected43,44), emotional well-being, and sleep problems, supporting recently established links in gut-brain health45,46,47.
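The per-score comparison described above amounts to computing one error per (‘omics, clinical score) pair and reading off which ‘omics predicts each score best. A minimal sketch follows; it uses the plain mean squared error (any normalization is left out), and all numbers are made up solely to exercise the functions.

```python
def mse(y_true, y_pred):
    """Mean squared error between true and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def best_omics_per_score(y_true_by_score, preds_by_omics):
    """Return {score: (omics with lowest MSE, that MSE)}."""
    best = {}
    for score, y_true in y_true_by_score.items():
        errors = {omics: mse(y_true, preds[score]) for omics, preds in preds_by_omics.items()}
        winner = min(errors, key=errors.get)
        best[score] = (winner, errors[winner])
    return best

# Toy, made-up values; not results from the study.
y_true = {"pain": [0.1, 0.5, 0.9], "sleep": [0.2, 0.4, 0.6]}
preds = {
    "immune":  {"pain": [0.1, 0.5, 0.8], "sleep": [0.5, 0.5, 0.5]},
    "species": {"pain": [0.4, 0.4, 0.4], "sleep": [0.2, 0.4, 0.5]},
}
best = best_omics_per_score(y_true, preds)
```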

BioMapAI is Explainable, Identifying Disease- and Symptom-Specific Biomarkers

Deep learning (DL) models are often referred to as ‘black boxes’, with limited ability to identify and evaluate specific features that influence the model’s predictions. BioMapAI is made explainable by incorporating SHAP values, which quantify how each feature influenced the model’s predictions. BioMapAI’s architecture (two shared layers, $Z^1$ and $Z^2$, for general disease pattern learning and one parallel layer for each clinical score, $Z^3 = [z_1^3, z_2^3, \ldots, z_{12}^3]$) allowed us to identify both disease-specific biomarkers, which are shared across symptoms and models (Supplemental Figure 3, Supplemental Table 5), and symptom-specific biomarkers, which are tailored to each clinical symptom (Figure 3, Supplemental Figures 4–5, Supplemental Table 6).
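To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a tiny model by enumerating every feature coalition, with absent features replaced by a baseline value. This is an illustration of the underlying definition, not the explainer used in the study: practical SHAP explainers rely on faster approximations, and the toy `predict` function, the all-zero baseline, and the function name `shapley_values` are assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def predict(z):
    """Toy model with one interaction term between features 0 and 2."""
    return 2.0 * z[0] - 1.0 * z[1] + z[0] * z[2]

x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_values(predict, x, baseline)
# The contributions sum to predict(x) - predict(baseline); the interaction
# term is split equally between features 0 and 2.
```

A positive value pushes the prediction above the baseline and a negative value pulls it below, which is the sign convention used when reading the contribution peaks in Figure 3.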

Figure 3: BioMapAI Identifies both Disease- and Symptom-Specific Biomarkers.

For Symptom-Specific Biomarkers, A) Circularized Diagram of Species Model with B) Zoomed Segment for Pain. Each circular panel illustrates how the model predicts each of the 12 symptom-specific biomarkers derived from one type of ‘omics data (all datatypes shown in Supplemental Figure 4). The x-axis for each panel represents an individual’s values for each of the following contributors to the model’s performance (from top to bottom): 1. Variance Explained by Biomarker Categories: Gradients of dark green (100%) to white (0%) show variance explained by the model. For many biomarkers, disease-specific biomarkers account for the greatest proportion of variance, and symptom-specific biomarkers provide additional tailored explanations, with residual accounting for the remaining variance; 2. Aggregated SHAP Values quantify the contribution of each feature to the model’s predictions, with disease-specific biomarkers in grey and symptom-specific in purple. 3. Demography and Cohort Classification: cohort (controls, white vs. patients, black); age <50 (white) vs. >50 years old (black); sex (male, white vs. female, black); 4. True vs. Predicted Scores show BioMapAI’s predictive performance at the individual sample level, with true in blue and model-predicted scores in orange; 5. Examples of Symptom-Specific Biomarkers: Line graphs show the contribution of select symptom-specific biomarkers to the model across individuals, e.g., 5 gut species in A). In B), the three features most specific to the pain model include gut microbe F. prausnitzii, CD4 memory T, and DC CD1c+ cells. Peaks above 0 (middle line) indicate a positive contribution and below 0 for a negative contribution. For example, the mixed positive and negative contribution peaks of F. prausnitzii indicated a biphasic contribution to pain intensity. Disease-Specific Biomarkers are shown in Supplemental Figure 3. 
C) Different Correlation Patterns of Biomarkers to Symptoms: For pain (other symptoms in Supplemental Figure 5), correlation analysis of raw abundance (x-axis) of each biomarker with pain score (y-axis) shows monotonic (e.g., CD4 memory and DC CD1c+ markers), biphasic (microbial and metabolomic markers), or sparse (KEGG genes) contribution patterns for those features. Each dot represents an individual, color-coded by SHAP value, where the color spectrum indicates negative (blue) to neutral (grey) to positive (red) contributions to pain prediction. Superimposed trend lines with shaded error bands represent the predicted correlation trends between biomarkers and pain intensity. Adjacent bar plots represent the data distribution. D-E) Examples of Pain-Specific Biomarkers’ Contributions. SHAP waterfall plots (colors corresponding to gradient in C) illustrate the contribution of individual features to a model’s predictive output. The top 10 features for two pairs of controls and patients are shown here, illustrating the species and the immune model (additional examples in Supplemental Figure 4A). The contribution of each feature is shown as a step (SHAP values provided adjacent), and the cumulative effect of all the steps, starting from the baseline expectation E[f(X)], provides the final prediction value. Our example of F. prausnitzii exhibits a protective role (negative SHAP) in controls but exacerbates pain (positive SHAP) in patients, consistent with the biphasic relationship observed in C). As a second example, all CD4 memory cells in this model have positive SHAP values, reinforcing the positive monotonic relationship with pain severity observed in C). Conversely, DC CD1c+ cells contribute negatively and thus may have a protective role. Abbreviations: SHAP, SHapley Additive exPlanations; DNN, Deep Neural Network; GBDT, Gradient Boosting Decision Tree; KEGG, Kyoto Encyclopedia of Genes and Genomes. Supporting Materials: Supplemental Tables 5–6, Supplemental Figures 3–5.

Disease-specific biomarkers are important features across symptoms and models (Methods, Supplemental Figure 3). Increased B cells (CD19+CD3−), CCR6+ CD8 memory T cells (mCD8+CCR6+CXCR3−), and CD4 naïve T cells (nCD4+FOXP3+) in patients were pivotal for most symptoms, indicating a systemic dysregulation of the adaptive immune response. The species model highlighted the importance of Dysosmobacteria welbionis, a gut microbe previously reported in obesity and diabetes, with a critical role in bile acid and butyrate metabolism48,49. The metabolome model highlighted increased levels of glycodeoxycholate 3-sulfate, a bile acid, and decreased vanillylmandelate (VMA), a catecholamine breakdown product50. These critical features for all symptoms were consistently validated across ML and DL models, demonstrating the efficacy of BioMapAI (Supplemental Table 5).

More uniquely, BioMapAI linked ‘omics profiles to clinical symptoms and thus enabled the identification of symptom-specific biomarkers (Figure 3A). Certain ‘omics data, like species-gastrointestinal and immune-pain associations, were especially effective in predicting specific clinical phenotypes (Figure 2C). Utilizing SHAP, BioMapAI identified distinct sets of biomarkers for each symptom (Supplemental Table 6, Supplemental Figure 5). We found that while disease-specific biomarkers accounted for a substantial portion of the variance, symptom-specific biomarkers crucially refined the predictions, aligning predicted scores more closely with actual values, consistently across age and gender (Figure 3AB, Supplemental Figure 4BD). For example, in the case of pain, CD4 memory and CD1c+ dendritic cells (DC) were particularly important features, and Faecalibacterium prausnitzii was uniquely linked as well, with varying impact across individuals (Figure 3B). Similar to pain, each clinical score in ME/CFS was characterized by its unique ‘omics features, distinct from those common across other symptoms (Supplemental Table 6).

In addition, we observed a spectrum of interaction types (linear, biphasic, and dispersed) extending beyond conventional linear interactions, underscoring the heterogeneity inherent in ME/CFS (Figure 3C). High-abundance species and immune cells often had a biphasic relationship with symptoms, showing dual effects, while low-abundance species and metabolites displayed a linear relationship with positive or negative associations with clinical scores (Supplemental Figure 5).
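The distinction between monotonic and biphasic relationships matters because a single correlation coefficient can hide the latter entirely. The sketch below, using synthetic values only, computes Spearman's rank correlation for a monotonic curve and for a U-shaped curve; the function names and the toy data are assumptions, and this is not the analysis pipeline used in the study.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

abundance = [1, 2, 3, 4, 5, 6, 7]                   # synthetic feature abundance
monotonic = [a ** 0.5 for a in abundance]           # symptom rises steadily with abundance
biphasic = [(a - 4) ** 2 for a in abundance]        # symptom is high at both extremes

rho_monotonic = spearman(abundance, monotonic)      # 1.0
rho_biphasic = spearman(abundance, biphasic)        # 0.0, despite a strong dependence
```

The U-shaped feature is strongly related to the symptom yet yields a rank correlation of zero, which is why per-sample contribution values are more informative than a single coefficient for features of this kind.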

An example of a relatively straightforward monotonic (linear) relationship was observed between CD4 memory (CD4 M) cells, CD1c+ DCs and pain, with positive contributions of CD4 M cells to pain severity. Conversely, CD1c+ DCs contributed negatively to pain severity in both patients and controls (Figure 3C, E). These variations suggest alterations in inflammatory responses and specific pathogenic processes in ME/CFS, which may be virally triggered and is marked by prolonged infection symptoms. Many microbial biomarkers demonstrated linear contributions to symptoms, evidenced by numerous negative peaks indicating their beneficial role in symptom reduction (Figure 3A). For example, Dysosmobacteria welbionis, a disease-specific biomarker, exacerbated sleeping and gastrointestinal issues (Supplemental Figure 3), whereas Clostridium sp. and Alistipes communis alleviated these issues (Figure 3A, Supplemental Figure 5B).

A more complex, biphasic relationship was observed in the interaction of Faecalibacterium prausnitzii with pain, whose saddle curve (Figure 3C) and mixture of positive and negative contribution peaks (Figure 3B) revealed how abnormally low and high abundances could both be associated with amplified pain. In disease, F. prausnitzii was associated with exacerbated pain, while in healthy individuals, it appeared to mitigate pain (Figure 3D). F. prausnitzii was identified as a biomarker in several ME/CFS cohorts28,29,51, but also has been implicated in numerous anti-inflammatory effects52,53,54,55. Notably, BioMapAI here elaborated its role in ME/CFS by recognizing its potential dual contribution to symptom severity. Similar biphasic relationships were observed for plasma metabolomics biomarkers, glucuronide and glutamine, in relation to pain (Figure 3C).

Distinct from other ‘omics features, KEGG genes exhibited sparse and dispersed contributions (Figure 3C, Supplemental Figure 4C). The vast feature matrix of KEGG models complicated the identification of a universal biomarker for any single symptom, as individuals possessed distinct symptom-specific KEGG biomarkers. For example, the gene FNR, an anaerobic regulatory protein transcription factor, negatively impacted pain but was active in only a small portion of patients, with the majority showing no significant impact (Figure 3C). This pattern was consistent for other KEGG biomarkers, which contributed sparsely to symptom severity (Supplemental Figure 4C).

Taken together, BioMapAI achieved a comprehensive mapping of the intricate nature of symptom-specific biomarkers to clinical phenotypes that has been inaccessible to single models to date. Our models unveil a nuanced and precise correlation between ‘omics features and disease symptomology, emphasizing ME/CFS’ complex etiology and consequent disease management approaches.

Healthy Microbiome-Immune-Metabolome Networks are Dysbiotic in ME/CFS

BioMapAI elucidated that each ‘omics layer provided distinct insights into the disease symptoms and influenced host phenotypes in a dynamic and complex manner. To examine crosstalk between ‘omics layers, we modeled co-expression modules for each ‘omics using weighted gene co-expression network analysis (WGCNA), identifying seven microbial species, six microbial gene set, nine metabolome, and nine immune clusters (Methods, Supplemental Table 7). Observing significant associations of these modules with disease classification (microbial modules), age and gender (immune and metabolome modules) (Supplemental Figure 6A), we first established baseline networks of inter-‘omics interactions in healthy individuals as a function of these and other clinical covariates such as age, weight, and gender (Figure 4A), and then examined how these interactions were altered in patient populations (Figure 4B, Supplemental Figure 6BC).

Figure 4: Microbiome-Immune-Metabolome Crosstalk is Dysbiotic in ME/CFS.

A-B) Microbiome-Immune-Metabolome Network in A) Healthy and B) Patient Subgroups. A baseline network was established with 200+ healthy control samples (A), bifurcating into two segments: the gut microbiome (species in yellow, genetic modules in orange) and blood elements (immune modules in green, metabolome modules in purple). Nodes: modules; size: # of members; colors: ‘omics type; edges: interactions between modules, with Spearman coefficient (adjusted) represented by thickness, transparency, and color - positive (red) and negative (blue). Here, key microbial pathways (pyruvate, amino acid, and benzoate) interact with immune and metabolome modules in healthy individuals. Specifically, these correlations were disrupted in patient subgroups (B), as a function of gender, age (young <26 years old vs. older >50), BMI (normal <26 vs. overweight >26), and health status (individuals with IBS or infections). Correlations significantly shifted from healthy counterparts (Supplemental Figure 6C) are highlighted with colored nodes and edges indicating increased (red) or decreased (blue) interactions. C) Targeted Microbial Pathways and Host Interactions. Four important microbial metabolic mechanisms (tryptophan, butyrate, BCAA, benzoate) were further analyzed to compare control, short and long-term ME/CFS patients, and external cohorts for validation (Guo28 and Raijmakers29). 1. Microbial Pathway Fold Change: Key genes were grouped and annotated in subpathways. Circle size: fold change over control; color: increase (red) or decrease (blue), p-values (adjusted Wilcoxon) marked. 2. Microbiome-Host Interactions: Sankey diagrams visualize interactions between microbial pathways and host immune cells/metabolites. Line thickness and transparency: Spearman coefficient (adjusted); color: red (positive), blue (negative). 3. Immune & Metabolites Fold Change: Pathway-correlated immune cells and metabolites are grouped by category. 4. Contribution to Disease Symptoms: Stacked bar plots show accumulated SHAP values (contributions to symptom severity) for each disease symptom (1–12, as in Supplemental Table 1). Colors: microbial subpathways and immune/metabolome categories match module color in fold change maps. X-axis: accumulated SHAP values (contributions) from negative to positive, with the most contributed symptoms highlighted. P-values: *p < 0.05, **p < 0.01, ***p < 0.001. Abbreviations: IBS, Irritable Bowel Syndrome; BMI, Body Mass Index; BCAA, Branched-Chain Amino Acids; MAIT, Mucosal-Associated Invariant T cell; SHAP, SHapley Additive exPlanations; GPE, Glycerophosphoethanolamine; INFγ, Interferon Gamma; CD, Cluster of Differentiation; Th, T helper cell; TMAO, Trimethylamine N-oxide; KEGG, Kyoto Encyclopedia of Genes and Genomes. Supporting Materials: Supplemental Table 7–8, Supplemental Figure 6.

Healthy control-derived host-microbiome interactions, such as the microbial pyruvate module interacting with multiple immune modules, and connections between commensal gut microbes (Prevotella, Clostridia sp., Ruminococcaceae) and Th17 memory cells, plasma steroids, phospholipids, and tocopherol (vitamin E) (Figure 4A), were disrupted in ME/CFS patients. Increased interactions between the gut microbiome and mucosal/inflammatory immune modules, including CD8+ MAIT and IFNγ+ CD4 memory cells, suggested a microbiome-mediated intensified inflammatory response in ME/CFS (Supplemental Figure 6D). Young, female, and normal-weight patients shared those changes, while male patients showed more distinct alterations in the interplay between microbial and plasma metabolites. Elderly and overweight patients had more interaction abnormalities than other subgroups, with specific increases in interactions linking Blautia, Flavonifractor, and Firmicutes sp. with TNFα cytotoxic T cells and plasma plasmalogen, and decreased interactions between Lachnospiraceae sp. and Th17 cells (Figure 4B).

Further examining the pyruvate hub as well as several other key microbial modules whose networks were dysbiotic in patients, we mapped the interactions of their metabolic subpathways to plasma metabolites and immune cells and detailed the collective contributions to host phenotypes (Figure 4C, Supplemental Table 8). We further validated these findings with two independent cohorts (Guo 202328 and Raijmakers 202029). For example, increased tryptophan metabolism, linked to gastrointestinal issues, lost its inhibitory effect on Th22 cells, and gained interactions with γδ T cells and the secretion of IFNγ and GzA from CD8 and CD8+ MAIT cells. Several networks linked with emotional dysregulation and fatigue, again underscoring the gut-brain axis47, differed significantly in patients vs. controls. These included decreased butyrate production, especially from the pyruvate56 and glutarate57 sub-pathways, and decreased branched-chain amino acid (BCAA) biosynthesis, which lost or reversed their interactions with Th17, Treg cells, and plasma lipids while gaining interactions with inflammatory immune cells including γδ T and CD8+ MAIT cells in patients. Increased microbial benzoate, synthesized by Clostridia sp.58,59 and then converted to hippurate in the liver60,61, showed a strong positive correlation with plasma hippurate in long-term ME/CFS patients, supporting enhanced pathway activity in later stages of the disease. This change altered its interactions with numerous plasma metabolites, including steroids, phenols, BCAAs, fatty acids, and vitamins B5 and B6. Finally, we noted that connections of short-term patients often resembled a transitional phase, with dysbiotic health-associated networks and emergent pathological connections that solidified in long-term ME/CFS patients.

Based on BioMapAI’s outputs and network analyses, we propose that the shift in disease pathology in ME/CFS is linked to the topological interaction of the gut microbiome, immune function, and metabolome (Figure 5). A decrease in key microbes, including Faecalibacterium prausnitzii, and resultant dysfunction of microbial metabolic pathways such as butyrate, tryptophan, and BCAA, contributed to critical ME/CFS phenotypes, particularly pain and gastrointestinal abnormalities. In healthy individuals, these microbial metabolites regulate mucosal immune cells, including Th17, Th22, and Treg cells, an interaction that is dysfunctional in ME/CFS, resulting in elevated pro-inflammatory interactions via increased activation of γδ T cells and CD8 MAIT cells with the secretion of IFNγ and GzA, particularly impacting health perception and social activities. Additional health-associated networks between gut microbial metabolites, particularly benzoate, and plasma metabolites such as lipids, GPE, fatty acids, and bile acids, were weakened or reversed in ME/CFS. This breakdown in the host metabolic-microbiome balance was collectively associated with fatigue, emotional and sleeping problems, supporting recent findings underscoring microbial mechanisms in the gut-brain axis that occur via modulation of plasma metabolites62,63,64.

Figure 5: Overview of Dysbiotic Host-Microbiome Interactions in ME/CFS.

This conceptual diagram visualizes the host-microbiome interactions in healthy conditions (left) and their disruption and transition into the disease state in ME/CFS (right). The base icons of the figure remain consistent, while gradients and changes in color and size visually represent the progression of the disease. Processes of production and processing are represented by lines with arrows, where the color indicates an increase (red) or decrease (blue) in the pathway in disease; lines without arrows indicate correlations, with red representing positive and blue representing negative correlations. In healthy conditions, microbial metabolites support immune regulation, maintaining mucosal integrity and healthy inflammatory responses by positively regulating Treg and Th22 cell activity, and controlling Th17 activities, including the secretion of IL17 (purple cells), IL22 (blue), and IFNγ. These microbial metabolites also maintain many positive interactions with plasma metabolites like lipids, bile acids, vitamins, and phenols. In ME/CFS, there is a significant decrease in beneficial microbes and a disruption in metabolic pathways, marked by a decrease in the butyrate (brown-red dots) and BCAA (yellow) pathways and an increase in tryptophan (green) and benzoate (red) pathways. These changes are linked to gastrointestinal issues. In ME/CFS, the regulatory capacity of the immune system diminishes, leading to the loss of health-associated interactions with Th17, Th22, and Treg cells, and an increase in inflammatory immune activity. Pathogenic immune cells, including CD8 MAIT and γδT cells, show increased activity, along with the secretion of inflammatory cytokines such as IFNγ and GzmA, contributing to worsened general health and social functioning. Healthy interactions between gut microbial metabolites and plasma metabolites weaken or even reverse in the disease state. A notable connection that strengthened in ME/CFS is benzoate transformation to hippurate, associated with emotional disturbances, sleep issues, and fatigue. Abbreviations: IFNγ, Interferon gamma; Th17, T helper 17 cells; Th22, T helper 22 cells; Treg, Regulatory T cells; GzmA, Granzyme A; MAIT, Mucosa-Associated Invariant T cells; γδT, Gamma delta T cells; BCAA, Branched-Chain Amino Acids; GPE, Glycerophosphoethanolamine.

Discussion

Democratization of AI technologies and large-scale multi-‘omics has the promise of revolutionizing precision medicine65,66,67,68. This study generated one of the richest, most extensive paired multi-‘omics datasets to date4,28,29,30,31,69,70,71, with new insights not only into ME/CFS, but also potential applications to other heterogeneous and complicated diseases like fibromyalgia72 and long COVID73. BioMapAI marks the first AI trained to systematically decode these complex, multi-system symptoms. Traditionally, diagnosing ME/CFS has been challenging, often relying heavily on self-reported questionnaires74,75. However, the crux for long-term post-viral infection syndromes like ME/CFS is not necessarily pinpointing an exact diagnosis or tracing disease origins76,14 (typically infections77), but rather addressing the chronic, multifaceted symptoms that significantly impact patients’ quality of life78,79. Our study introduces a highly nuanced approach to link physiological changes in gut microbiome, plasma metabolome, and immune status with host symptoms, moving beyond the initial causes of the disease80,81. Importantly, we validated key biomarkers in external cohorts28,29,30,31, despite significant demographic and methodological differences between the studies.

In addition, by integrating these datatypes, we constructed complex new host-microbiome networks contrasted in health vs. ME/CFS. Networks constructed in healthy individuals revealed unique microbe-immune-metabolome connections and set a baseline for comparing numerous disease conditions while, critically, accounting for cohort covariates, including age, gender, and weight, as these factors reshape these networks to differing degrees, just as comorbid conditions like aging or obesity can complicate and individualize disease profiles. This approach enhanced the reliability of our findings in ME/CFS by rigorously accounting for potential confounders and helped attribute our proposed mechanisms to the disease itself82,83. For example, gut microbiome abnormalities were most relevant to ME/CFS, while changes in immune profiles and plasma metabolome were significant but influenced by factors like age and gender. Symptomatologically, the gut microbiome was expectedly linked to gastrointestinal issues and, unexpectedly, to pain, fatigue, and mental health problems, possibly due to disruptions in the gut-brain axis from abnormal microbial metabolic functions, such as lost network connections with key plasma metabolites, particularly lipids. We previously noted immune abnormalities in ME/CFS84; in this study, we further analyzed activation of mucosal and inflammatory immunity, namely MAIT and γδ T cells, which was linked to dysbiosis in gut microbial functions. These nuanced insights, while still premature for actual treatment applications, lay the groundwork for more precise controlled experiments and interventional studies. For instance, personalized treatment options could include supplementation of butyrate and amino acids for patients suffering from severe gastrointestinal and emotional symptoms, or targeted treatments for chronic inflammation for those experiencing significant pain and fatigue.

Taken together, our results underscore BioMapAI’s particular suitability to complex datatypes that collectively explain the phenotypic heterogeneity of diseases such as ME/CFS better than any one alone. BioMapAI’s specialized deep neural network structure, with two shared general layers and one outcome-focused parallel layer, is moreover generalizable and scalable to other cohort studies that aim to utilize ‘omics data for a range of outputs (e.g., not just limited to clinical symptoms). For instance, researchers could employ our model to link whole genome sequencing data with blood or protein measurements. Constructed to automatically adapt to any input matrix $X$ and any output matrix $Y = [y_1, y_2, \ldots, y_m]$, BioMapAI defaults to aligning specific layers in parallel for each output, $y$. Currently, the model treats all 12 studied symptoms, $y_1, y_2, \ldots, y_{12}$, with equal importance due to the unclear symptom prioritization in ME/CFS85. We computed modules to assign different weights to symptoms to enhance diagnostic accuracy. While this approach was not particularly effective for ME/CFS, it may be more promising for diseases with more clearly defined symptom hierarchies86,87. In such cases, adjusting the weights of symptoms in the model’s final layer could improve performance and help pinpoint which symptoms are truly critical.

Limitations of our study include that our study population comprised more females and older individuals and was predominantly Caucasian, though this is consistent with the epidemiology of ME/CFS21,88,89, and that it was drawn from a single geographic location (Bateman Horne Center). This may limit our findings to certain populations. In addition, previous RNA sequencing studies have suggested mitochondrial dysfunction and altered energy metabolism in ME/CFS90,91,92,93,94; thus, incorporating host PBMC RNA or ATAC sequencing in future research could provide deeper insights into regulatory changes. The typical decades-long disease progression of ME/CFS makes it challenging for our four-year longitudinal design to capture stable temporal signals. Although separating our short-term (<4 years) and long-term (>10 years) patients provided valuable insights, tracking the same patients over a longer period would likely yield more accurate trends95,96. Long disease history also increases the likelihood of exposure to various diets and medications97, which could influence biomarker identification, particularly in metabolomics. Finally, model-wise, BioMapAI was trained on < 500 samples with fivefold cross-validation, which is relatively small given the complexity of the outcome matrix; expanding the training dataset and incorporating more independent validation sets could potentially enhance its performance and generalizability98,99.

Methods

Study Design.

This was a 4-year prospective study. All participants had a physical examination at the baseline visit that included evaluation of vital signs, BMI, orthostatic vital signs, skin, lymphatic system, HEENT, pulmonary, cardiac, abdomen, musculoskeletal, nervous system and fibromyalgia (FM) tender points. We enrolled a total of 153 ME/CFS patients (of which 75 had been diagnosed with ME/CFS <4 years before recruitment and 78 had been diagnosed with ME/CFS >10 years before recruitment) and 96 healthy controls. Among them, 110 patients and 58 healthy controls were followed one year after the recruitment as timepoint 2; 81 patients and 13 healthy controls were followed two years after the recruitment as timepoint 3; and 4 patients were followed four years after the recruitment as timepoint 4. Subject characteristics are shown in Supplemental Table 1 and Supplemental Figure 1A.

Medical history and concomitant medications were documented. Blood samples were obtained prior to orthostatic and cognitive testing. The 10-minute NASA Lean Test and cognitive testing were conducted after the physical examination and blood draw100. Cognitive efficiency was tested with the DANA Brain Vital, measuring three reaction time and information processing measurements101. The orthostatic challenge was assessed with the 10-minute NASA Lean Test (NLT). Participants rested supine for 10 minutes, and baseline blood pressure (BP) and heart rate (HR) were measured twice during the last 2 minutes of rest102.

Participants were provided with an at-home stool collection kit at the end of each in-person visit. The following questionnaires were completed at baseline: DePaul Symptom Questionnaire (DSQ), Post-Exertional Fatigue Questionnaire, RAND-36, Fibromyalgia Impact Questionnaire-R, ACR 2010 Fibromyalgia Criteria Symptom Questionnaire, Pittsburgh Sleep Quality Index (PSQI), Stanford Brief Activity Survey, Orthostatic Intolerance Daily Activity Scale, Orthostatic Intolerance Symptom Assessment, Brief Wellness Survey, Hours of Upright Activity (HUA), medical history and family history. All but medical history and family history were administered again when participants came for their annual visit.

Approval was received before enrolling any subjects in the study (The Jackson Laboratory Institutional Review Board, 17-JGM-13). All participants were educated about the study prior to enrollment and signed all appropriate informed consent documents. Research staff followed Good Clinical Practices (GCP) guidelines to ensure subject safety and privacy.

ME/CFS Cohort.

Beginning in January 2018, we enrolled ME/CFS patients who had been sick for <4 years or sick for >10 years. No ME/CFS patients with duration ≥4 years and ≤10 years were enrolled in order to have clear distinctions between short and long duration of illness with ME/CFS. All participants were 18 to 65 years old at the time of enrollment. ME/CFS diagnosis according to the Institute of Medicine clinical diagnostic criteria and disease duration of <4 years were confirmed during clinical differential diagnosis and thorough medical work up103. Additional inclusion criteria required, 1) a substantial reduction or impairment in the ability to engage in pre-illness levels of occupational, educational, social, or personal activities that persists for more than 6 months and less than 4 years and is accompanied by fatigue, which is often profound, is of new or definite onset (not lifelong), is not the result of ongoing excessive exertion, and is not substantially alleviated by rest, and 2) post-exertional malaise. Exclusionary criteria for the <4 year ME/CFS cohort were, 1) morbid obesity BMI>40, 2) other active and untreated disease processes that explain most of the major symptoms of fatigue, sleep disturbance, pain, and cognitive dysfunction, 3) untreated primary sleep disorders, 4) rheumatological disorders, 5) immune disorders, 6) neurological disorders, 7) infectious diseases, 8) psychiatric disorders that alter perception of reality or ability to communicate clearly or impair physical health and function, 9) laboratory testing or imaging are available that support an alternate exclusionary diagnosis, and 10) treatment with short-term (less than 2 weeks) antiviral or antibiotic medication within the past 30 days.

For the >10 year ME/CFS cohort, disease duration of >10 years and fulfillment of the Institute of Medicine clinical criteria for ME/CFS were confirmed during clinical evaluation and medical history review103. Other than disease duration, inclusion and exclusion criteria were the same as for the <4 year ME/CFS cohort.

Healthy Control Cohort.

Healthy control participants were also between 18 and 65 years of age and in general good health. Enrollment began in 2018 and subjects were selected to match the <4 year ME/CFS cohort by age (within 5 years), race, and sex (~2:1 female to male ratio). Exclusion criteria for healthy controls included, 1) a diagnosis or history of ME/CFS, 2) morbid obesity BMI>40, 3) treatment with short-term (less than 2 weeks) antiviral or antibiotic medication within the past 30 days or 4) treatment with long-term (longer than 2 weeks) antiviral medication or immunomodulatory medications within the past 6 months.

Clinical Metadata and Scores.

Clinical symptoms and baseline health status were assessed on the day of physical examination and biological sample collection for both case and control subjects. For each participant, we collected demographic information (including age, gender, diet, race, BMI, family, work, and education), medical histories, clinical tests and questionnaires. From the questionnaires and tests described above, we summarized 12 clinical scores to cover major symptoms of ME/CFS. Scores 1–8 were derived from the RAND36, following standardized rules104, and summarized into eight categories: Physical Functioning (also referred to as Daily Activity in the main contents), Role Limitations due to Physical Health (Physical Limitations), Role Limitations due to Emotional Problems (Emotional Problems), Energy/Fatigue, Emotional Wellbeing (Mental Health), Social Functioning (Social Activity), Pain, and General Health (Health Perception). Cognitive Efficiency was summarized from the DANA Brain Vital test, Orthostatic Intolerance from the NLT test, Sleeping Problem Score from the Pittsburgh Sleep Quality Index (PSQI) questionnaire, and Gastrointestinal Problems Score from the Gastrointestinal Symptom Rating Scale (GSRS) questionnaire. Each score was transformed into a 0–1 scale to facilitate combination and comparison, where a score of 1 indicates maximum disability or severity and a score of 0 indicates no disability or disturbance.
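As an illustration of the 0–1 transformation described above, the following is a minimal sketch rather than the study's actual scoring code. It assumes simple min-max scaling over each instrument's range, with inversion for instruments where a higher raw score means better health (as in RAND-36 subscales, which run 0–100). The function name `to_severity` and its parameters are hypothetical.

```python
def to_severity(raw, lo, hi, higher_is_better=False):
    """Map a raw score onto [0, 1], where 1 means maximum severity.

    Assumption: plain min-max scaling over the instrument range [lo, hi];
    the source does not state the exact transformation used.
    """
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    if not lo <= raw <= hi:
        raise ValueError("raw score outside the instrument range")
    scaled = (raw - lo) / (hi - lo)
    # Invert when a high raw score means good health, so 1 is always "worst".
    return 1.0 - scaled if higher_is_better else scaled


# RAND-36 subscales: 0-100, higher = better health, so the score is inverted.
rand36_pain = to_severity(25, 0, 100, higher_is_better=True)
# PSQI global score: 0-21, higher = worse sleep, so no inversion is needed.
psqi_sleep = to_severity(14, 0, 21)
```

Putting every score on a common direction and range is what allows the 12 scores to be stacked into one outcome matrix and compared across symptoms.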

Plasma Sample collection and Preparation.

Healthy and patient blood samples were obtained from Bateman Horne Center, Salt Lake City, UT and approved by JAX IRB. One 4 mL lavender top tube (K2EDTA) was collected, and tube slowly inverted 8–10 times immediately after collection. Blood was centrifuged within 30 minutes of collection at 1000 × g with low brake for 10 minutes. 250 uL of plasma was transferred into three 1 mL cryovial tubes, and tubes were frozen upright at −80°C. Frozen plasma samples were batch shipped overnight on dry ice to The Jackson Laboratory, Farmington, CT, and stored at −80°C. One green top tube (Heparin) was collected, and tube slowly inverted 8–10 times immediately after collection. Heparinized blood samples were shipped overnight at room temperature. Peripheral blood mononuclear cells (PBMC) were isolated using Ficoll-paque plus (GE Healthcare) and cryopreserved in liquid nitrogen.

Plasma untargeted metabolome by UPLC-MS/MS.

Plasma samples were sent to the Metabolon platform and processed by Ultrahigh Performance Liquid Chromatography-Tandem Mass Spectroscopy (UPLC-MS/MS) following the CFS cohort pipeline. In brief, samples were prepared using the automated MicroLab STAR® system from Hamilton Company. The extract was divided into five fractions: two for analysis by two separate reverse phase (RP)/UPLC-MS/MS methods with positive ion mode electrospray ionization (ESI), one for analysis by RP/UPLC-MS/MS with negative ion mode ESI, one for analysis by HILIC/UPLC-MS/MS with negative ion mode ESI, and one reserved for backup. For QA/QC, several types of controls were analyzed, including a pooled matrix sample generated by taking a small volume of each experimental sample (or alternatively, a pool of well-characterized human plasma), extracted water samples, and a cocktail of QC standards, carefully chosen not to interfere with the measurement of endogenous compounds, that was spiked into every analyzed sample, allowed instrument performance monitoring, and aided chromatographic alignment. Compounds were identified by comparison to Metabolon library entries of purified standards or recurrent unknown entities. The output raw data included the annotations and the value of peaks quantified using area-under-the-curve for metabolites.

Immune Profiling: Flow Cytometry Analysis.

Frozen PBMC aliquots were thawed, counted and divided into two parts, one part for day 0 surface staining, and the other part cultured in complete RPMI 1640 medium (RPMI plus 10% Fetal Bovine Serum (FBS, Atlanta Biologicals) and 1% penicillin/streptomycin (Corning Cellgro)) supplemented with IL-2+IL-15 (20ng/ml) for Treg subsets day 1 surface and transcription factors staining after culture with IL-7 (20ng/ml) for day 1 and day 6 intracellular cytokine staining, and a combination of cytokines (20ng/ml IL-12, 20ng/ml IL-15, and 40ng/ml IL-18) for day 1 intracellular cytokine staining (IL-12 from R&D, IL-7 and IL-15 from Biolegend). Surface staining was performed in staining buffer containing PBS + 2% FBS for 30 minutes at 4°C. When staining for chemokine receptors, the incubation was done at room temperature. Antibodies used in the surface staining were 2B4, CD1c, CD14, CD16, CD19, CD25, CD27, CD31, CD3, CD303, CD38, CD4, CD45RO, CD56, CD8, CD95, CD161, CCR4, CCR6, CCR7, CX3CR1, CXCR3, CXCR5, γδ TCR bio, HLA-DR, IgG, IgM, LAG3, PD-1, TIM3, Va7.2, and Va24Ja18; all were obtained from Biolegend.

For intracellular cytokine staining, cells were stimulated with PMA (40ng/ml for overnight cultured cells and 20ng/ml for 6 days cultured cells) and Ionomycin (500ng/ml) (both from Sigma-Aldrich) in the presence of GolgiStop (BD Biosciences) for 4 hours at 37°C. For cytokine secretion after stimulation with IL-12+IL-15+IL-18, GolgiStop was added to the culture on day 1 for 4 hours. For intracellular cytokine and transcription factor staining, PMA+Ionomycin stimulated cells or unstimulated cells were collected, stained with surface markers including CD3, CD4, CD8, CD161, PD1, 2B4, Vα7.2, CD45RO, CCR6, and CD27, followed by one wash with PBS (phosphate-buffered saline) and staining with fixable viability dye (eBioscience). After surface staining, cells were fixed and permeabilized using fixation/permeabilization buffers (eBioscience) according to the manufacturer’s instructions. Permeabilized cells were then stained for intracellular FOXP3, Helios, IL-4, IFNγ, TNFα, IL-17A, IL-22, Granzyme A, GM-CSF, and Perforin (antibodies from Biolegend). Flow cytometry was performed on a Cytek Aurora (Cytek Biosciences) and data were analyzed using FlowJo (Tree Star).

Fecal Sample Collection and DNA Extraction.

Stool was self-collected at home by volunteers using a BioCollector fecal collection kit (The BioCollective, Denver, CO) according to manufacturer instructions for preservation for sequencing prior to sending the sample in a provided Styrofoam container with a cold pack. Upon receipt, stool and OMNIgene samples were immediately aliquoted and frozen at −80°C for storage. Prior to aliquoting, OMNIgene stool samples were homogenized by vortexing (using the metal bead inside the OMNIgene tube), then divided into 2 microfuge tubes, one with 100μL aliquot and one with 1mL. DNA was extracted using the Qiagen (Germantown, MD, USA) QIAamp 96 DNA QIAcube HT Kit with the following modifications: enzymatic digestion with 50μg of lysozyme (Sigma, St. Louis, MO, USA) and 5U each of lysostaphin and mutanolysin (Sigma) for 30 min at 37 °C followed by bead-beating with 50 μg 0.1 mm of zirconium beads for 6 min on the Tissuelyzer II (Qiagen) prior to loading onto the Qiacube HT. DNA concentration was measured using the Qubit high sensitivity dsDNA kit (Invitrogen, Carlsbad, CA, USA).

Metagenomic Shotgun Sequencing.

Approximately 50μL of thawed OMNIgene preserved stool sample was added to a microfuge tube containing 350 μL Tissue and Cell lysis buffer and 100 μg 0.1 mm zirconia beads. Metagenomic DNA was extracted using the QiaAmp 96 DNA QiaCube HT kit (Qiagen, 5331) with the following modifications: each sample was digested with 5μL of Lysozyme (10 mg/mL, Sigma-Aldrich, L6876), 1μL of Lysostaphin (5000U/mL, Sigma-Aldrich, L9043) and 1μL of Mutanolysin (5000U/mL, Sigma-Aldrich, M9901) at 37°C for 30 minutes prior to bead-beating in the TissueLyser II (Qiagen) for 2 × 3 minutes at 30 Hz. Each sample was centrifuged for 1 minute at 15000 × g prior to loading 200μl into an S-block (Qiagen, 19585). Negative (environmental) controls and positive (in-house mock community of 26 unique species) controls were extracted and sequenced with each extraction and library preparation batch to ensure sample integrity. Pooled libraries were sequenced over 13 sequencing runs using both HiSeq (N=87) and NovaSeq (N=392) platforms. To address potential biases arising from varying read depths, all samples were down-sampled, using seqtk108 (v1.3-r106), to 5 million reads. This threshold corresponds to the 95th percentile of the read count distribution across the dataset.
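To make the down-sampling step concrete, the sketch below draws a uniform random subset of fixed size from a stream of reads using reservoir sampling (Algorithm R). It is an illustration only and not the seqtk implementation used in the study; the function name `downsample` and the toy inputs are hypothetical. For paired-end data, applying the same seed to both mate files would keep pairs together, which is an assumption about usage rather than something stated in the source.

```python
import random


def downsample(records, k, seed=0):
    """Uniformly sample up to k records from an iterable in one pass.

    Reservoir sampling (Algorithm R): memory use is O(k) regardless of
    how many records the stream contains.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(rec)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = rec
    return reservoir


# Toy example: 1,000 read identifiers reduced to a fixed depth of 50.
reads = ["read_%d" % i for i in range(1000)]
subset = downsample(reads, 50, seed=11)
```

Samples with fewer reads than the target depth are returned unchanged, so a fixed depth only equalizes samples that exceed it.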

Sequencing adapters and low-quality bases were removed from the metagenomic reads using scythe (v0.994) and sickle (v1.33), respectively, with default parameters. Host reads were removed by mapping all sequencing reads to the hg19 human reference genome using Bowtie2 (v2.3.1), under ‘very-sensitive’ mode. Unmapped reads (i.e., microbial reads) were used to estimate the relative abundance profiles of the microbial species in the samples using MetaPhlAn4.

Taxonomic Profiling (Species Abundance) and KEGG Gene Profiling.

Taxonomic compositions were profiled using MetaPhlAn4.0105, and species with an average relative abundance > 1e-4 were kept for further analysis, giving 384 species. Gene profiling was computed with USEARCH106 (v8.0.15) (with parameters: evalue 1e-9, accel 0.5, top_hits_only) against the KEGG Orthology (KO) database v54, giving a total of 9452 annotated KEGG genes. The read count profile was normalized with DESeq2107 in R. Genes with a prevalence of over 20% were selected for downstream analysis.
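The two filtering thresholds above can be sketched as follows. This is a minimal illustration with toy data, not the study's pipeline: the function names are hypothetical, and "prevalence" is assumed here to mean the fraction of samples in which a gene has a non-zero count, which the source does not define explicitly.

```python
def filter_species(abundance, min_mean=1e-4):
    """Keep species whose mean relative abundance across samples exceeds min_mean.

    abundance: dict mapping species name -> list of relative abundances per sample.
    """
    return {sp: vals for sp, vals in abundance.items()
            if sum(vals) / len(vals) > min_mean}


def filter_genes(counts, min_prevalence=0.20):
    """Keep genes detected in more than min_prevalence of samples.

    counts: dict mapping gene ID -> list of counts per sample.
    Assumption: a gene is 'present' in a sample when its count is > 0.
    """
    return {gene: vals for gene, vals in counts.items()
            if sum(c > 0 for c in vals) / len(vals) > min_prevalence}


# Toy data: three samples for species, six samples for genes.
species = {"species_a": [0.01, 0.0, 0.02], "species_b": [1e-5, 0.0, 0.0]}
genes = {"K00001": [5, 0, 0, 0, 0, 0], "K00002": [1, 2, 0, 0, 0, 0]}

kept_species = filter_species(species)
kept_genes = filter_genes(genes)
```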

Confounder Analysis.

Confounder analysis was done by R package MaAsLin2109. We considered demographic features (including age, gender, BMI, ethnicity, and race), diet records, medications (antivirals, antifungals, antibiotics, and probiotics), and self-reported IBS scores as potential confounders. The analysis followed the model formula:

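$$\mathrm{expr} \sim \mathrm{age} + \mathrm{gender} + \mathrm{bmi} + \mathrm{ethnic} + \mathrm{race} + \mathrm{IBS} + \mathrm{diet\_meat} + \mathrm{diet\_sugar} + \mathrm{diet\_veg} + \mathrm{diet\_grains} + \mathrm{diet\_fruit} + \mathrm{antifungals} + \mathrm{antibiotics} + \mathrm{probiotics} + \mathrm{antivirals} + (1 \mid \mathrm{sample\_id\_tp1})$$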

where expr refers to the ‘omics matrix. For each feature in the ‘omics data, we ran this generalized linear model to identify multivariable associations between each ‘omics feature and each metadata feature. Identified confounders were handled differently based on the type of data. For species and KEGG genes, any feature with a significant statistical association with any metadata feature was removed from all subsequent analyses, resulting in the removal of 21 species and 946 microbial genes. For immune profiling and plasma metabolomics, to remove the effects of identified confounders, each feature was adjusted by retaining the residuals105, i.e., the part of the outcome not explained by the confounding factors, from a general linear model:

$$y = \left(y_{\mathrm{predicted}} \sim \mathrm{confounders}\right)\$\mathrm{residual}$$

Additionally, for network and patient subset analysis (Methods), age, gender, BMI, and IBS were not included as confounders since we analyzed different age groups, gender groups, weight groups, and IBS groups separately. However, other identified confounders were still considered in the residual models.
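
The residual adjustment described above can be illustrated with an ordinary least-squares fit. The sketch below is an assumption-laden simplification: it uses NumPy rather than R, a plain linear model without random effects, and a hypothetical helper name `residualize` with simulated data.

```python
import numpy as np


def residualize(y, confounders):
    """Return the residuals of y after an ordinary least-squares fit on confounders.

    y: array of shape (n_samples,) holding one 'omics feature.
    confounders: array of shape (n_samples, n_confounders).
    """
    y = np.asarray(y, dtype=float)
    c = np.asarray(confounders, dtype=float)
    design = np.column_stack([np.ones(len(y)), c])  # intercept + confounders
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef  # part of y not explained by the confounders


# Simulated example (hypothetical): a feature that drifts with age.
rng = np.random.default_rng(0)
age = rng.normal(45, 10, size=200)
feature = 0.5 * age + rng.normal(0, 1, size=200)
adjusted = residualize(feature, age.reshape(-1, 1))
```

After adjustment, the residuals are uncorrelated with the confounder by construction, which is the property the adjusted immune and metabolomic features rely on.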

BioMapAI.

The primary goal of BioMapAI is to connect high-dimensional biological data, X, to a mixed-type output matrix, Y. Unlike traditional ML or DL classifiers that typically predict a single outcome, y, BioMapAI is designed to learn multiple objects, $Y = [y_1, y_2, \dots, y_n]$, simultaneously within a single model. This approach allows for the simultaneous prediction of diverse clinical outcomes (binary, categorical, and continuous variables) from ‘omics profiles, thus addressing disease heterogeneity by tailoring predictions to each patient’s specific symptomatology.

1. BioMapAI Structure.

BioMapAI is a fully connected deep neural network framework comprising an input layer X, a normalization layer, three sequential hidden layers, $Z^1$, $Z^2$, $Z^3$, and one output layer Y.

  1. Input layer (X) takes high-dimensional ‘omics data, such as gene expression, species abundance, metabolome matrix, or any customized matrix like immune profiling and blood labs.

  2. Normalization Layer standardizes the input features to have zero mean and unit variance, defined as
    $$X' = \frac{X - \mu}{\sigma}$$
    where μ is the mean and σ is the standard deviation of the input features.
  3. Feature Learning Module is the core of BioMapAI, responsible for extracting and learning important patterns from input data. Each fully connected layer (hidden layer 1–3) is designed to capture complex interactions between features. Hidden Layer 1 ($Z^1$) and Hidden Layer 2 ($Z^2$) contain 64 and 32 nodes, respectively, both with ReLU activation and a 50% dropout rate, defined as:
    $$Z^k = \mathrm{ReLU}\left(W^k Z^{k-1} + b^k\right), \quad k \in \{1, 2\}$$
    where $Z^0$ denotes the normalized input. Hidden Layer 3 ($Z^3$) has n parallel sub-layers, one for each object $y_i$ in Y. Every sub-layer, $Z_i^3$, contains 8 nodes and takes the output of Hidden Layer 2 as its input, represented as:
    $$Z_i^3 = \mathrm{ReLU}\left(W_i^3 Z^2 + b_i^3\right), \quad i \in \{1, 2, \dots, n\}$$
    All hidden layers used ReLU activation functions, defined as:
    $$\mathrm{ReLU}(x) = \max(0, x)$$
  4. Outcome Prediction Module is responsible for the final prediction of the objects. The output layer (Y) has n nodes, each representing a different object:
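    $$y_i = \begin{cases} \sigma\left(W_i^4 Z_i^3 + b_i^4\right) & \text{for binary object} \\ \mathrm{softmax}\left(W_i^4 Z_i^3 + b_i^4\right) & \text{for categorical object} \\ W_i^4 Z_i^3 + b_i^4 & \text{for continuous object} \end{cases}$$
    The loss functions are dynamically assigned based on the type of each object:
    $$\mathcal{L} = \begin{cases} -\dfrac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log\left(1 - \hat{y}_i\right)\right] & \text{for binary object} \\ -\dfrac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} y_{ij}\log \hat{y}_{ij} & \text{for categorical object} \\ \dfrac{1}{N}\sum_{i=1}^{N}\begin{cases} 0.5\left(y_i - \hat{y}_i\right)^2, & \text{if } \left|y_i - \hat{y}_i\right| \le \delta \\ \delta\left|y_i - \hat{y}_i\right| - 0.5\,\delta^2, & \text{otherwise} \end{cases} & \text{for continuous object} \end{cases}$$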

    During training, the weights were adjusted using the Adam optimizer. The learning rate was set to 0.01, and weights were initialized using the He normal initializer. L2 regularization was applied to prevent overfitting.

  5. Optional Binary Classification Layer (not used for parameter training). An additional binary classification layer is attached to the output layer Y to evaluate the model’s performance in binary classification tasks. This layer is not used for training BioMapAI but serves as an auxiliary component to assess the accuracy of predicting binary outcomes, for example, disease vs. control. This ScoreLayer takes the predicted scores from the output layer and performs binary classification:
    $$y_{\mathrm{binary}} = \sigma\left(W_{\mathrm{binary}} Y + b_{\mathrm{binary}}\right)$$
    The initial weights of the 12 scores are derived from the original clinical data, and the weights are adjusted based on the accuracy of BioMapAI’s predictions:
    $$w_{\mathrm{new}} = w_{\mathrm{old}} - \eta \nabla \mathrm{MSE}$$
    where MSE refers to the mean squared error between the predicted y and the true y; this update adjusts the weights to optimize the accuracy of the binary classification.
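
The layer structure above (two shared hidden layers followed by one 8-node sub-layer per object) can be sketched as an inference-mode forward pass. This is a NumPy illustration under stated assumptions, not the authors' TensorFlow/Keras code: weights are random and untrained, dropout and L2 terms are omitted, normalization statistics are taken from the batch itself, the He initializer is not truncated, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def he_normal(n_in, n_out):
    """He normal initialization: zero-mean Gaussian with std sqrt(2 / fan_in)."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))


def init_params(n_features, output_dims):
    """output_dims: one entry per object (1 for binary/continuous, C for categorical)."""
    heads = [
        {"W3": he_normal(32, 8), "b3": np.zeros(8),
         "W4": he_normal(8, d), "b4": np.zeros(d)}
        for d in output_dims
    ]
    return {"W1": he_normal(n_features, 64), "b1": np.zeros(64),
            "W2": he_normal(64, 32), "b2": np.zeros(32), "heads": heads}


def forward(X, params, kinds):
    """Forward pass without dropout (dropout is only active during training)."""
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)      # normalization layer
    Z1 = relu(Xn @ params["W1"] + params["b1"])    # shared hidden layer 1 (64 nodes)
    Z2 = relu(Z1 @ params["W2"] + params["b2"])    # shared hidden layer 2 (32 nodes)
    outputs = []
    for head, kind in zip(params["heads"], kinds):
        Z3 = relu(Z2 @ head["W3"] + head["b3"])    # object-specific sub-layer (8 nodes)
        logits = Z3 @ head["W4"] + head["b4"]
        if kind == "binary":
            outputs.append(sigmoid(logits))
        elif kind == "categorical":
            outputs.append(softmax(logits))
        else:                                      # continuous object
            outputs.append(logits)
    return outputs


def huber(y, y_hat, delta=1.0):
    """Huber loss used here for continuous objects (delta is an assumed default)."""
    err = np.abs(y - y_hat)
    return np.mean(np.where(err <= delta, 0.5 * err**2, delta * err - 0.5 * delta**2))


kinds = ["binary", "categorical", "continuous"]
X = rng.normal(size=(16, 20))                      # 16 samples, 20 toy features
params = init_params(n_features=20, output_dims=[1, 3, 1])
y_bin, y_cat, y_cont = forward(X, params, kinds)
```

Each head sees the same shared representation $Z^2$ but owns its sub-layer and output weights, which is what later allows one sub-model per object to be extracted for interpretation.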

2. Training and Evaluation of BioMapAI for ME/CFS – BioMapAI::DeepMECFS.

BioMapAI is a framework designed to connect high-dimensional, sparse biological ‘omics matrix X to multioutput Y. While BioMapAI is not tailored to a specific disease, it is versatile and applicable to a broad range of biomedical topics. In this study, we trained and validated BioMapAI using our ME/CFS datasets. The trained models are available on GitHub, nicknamed DeepMECFS, for the benefit of the ME/CFS research community.

  1. Dataset Pre-Processing Module: Handling Sample Imbalance. To ensure uniform learning for each output y, it is crucial to address sample imbalance before fitting the framework. We recommend using customized sample imbalance handling methods, such as Synthetic Minority Over-sampling Technique (SMOTE)110, Adaptive Synthetic (ADASYN)111, or Random Under-Sampling (RUS)112. In our ME/CFS dataset, there is a significant imbalance, with the patient data being twice the size of the control data. To effectively manage this class imbalance, we employed RUS as a random sampling method for the majority class. Specifically, we randomly sampled the majority class 100 times. For each iteration i, a different random subset $S_i^{\mathrm{majority}}$ was used. This subset of the majority class was combined with the entire minority class $S^{\mathrm{minority}}$. For each iteration i:
    $$S_i^{\mathrm{majority}} \subset S^{\mathrm{majority}}, \quad S_i^{\mathrm{minority}} = S^{\mathrm{minority}}, \quad S_i = S_i^{\mathrm{majority}} \cup S^{\mathrm{minority}}$$
    where the combined dataset Si was used for training at each iteration. This approach allows the model to generalize better and avoid biases towards the majority class, improving overall performance and robustness.
  2. Cross-Validation and Model Training. DeepMECFS is the name of the trained BioMapAI model with ME/CFS datasets. We trained on five preprocessed ‘omics datasets, including species abundances (Feature N=118, Sample N=474) and KEGG gene abundances (Feature N=3959, Sample N=474) from the microbiome, plasma metabolome (Feature N=730, Sample N=407), immune profiling (Feature N=311, Sample N=481), and blood measurements (Feature N=48, Sample N=495). Additionally, an integrated ‘omics profile was created by merging the most predictive features from each ‘omics model related to each clinical score (SHAP Methods), forming a comprehensive matrix of 154 features, comprising 50 immune features, 32 species, 30 KEGG genes, and 42 plasma metabolites.

    To evaluate the performance of BioMapAI, we employed a robust 5-fold cross-validation. Training was conducted over 500 epochs with a batch size of 64 and a learning rate of 0.0005, optimized through grid search. The Adam optimizer was used to adjust the weights during training, chosen for its ability to handle sparse gradients on noisy data. The initial learning rate was set to 0.01, with beta1 set to 0.9, beta2 set to 0.999, and epsilon set to 1e-7 to ensure numerical stability. Dropout layers with a 50% dropout rate were used after each hidden layer to prevent overfitting, and L2 regularization (λ=0.008) was applied to the kernel weights, defined as:
    $$L_{\mathrm{reg}} = \frac{\lambda}{2}\sum_{i=1}^{N} w_i^2$$
  3. Model Evaluation. To evaluate the performance of the models, we employed several metrics tailored to both regression and classification tasks. The Mean Squared Error (MSE) was used to evaluate the performance of the reconstruction of each object. For each yi, MSE was calculated as:
    $$\mathrm{MSE}_i = \frac{1}{N}\sum_{j=1}^{N}\left(y_{ij} - \hat{y}_{ij}\right)^2, \quad i = 1, 2, \dots, n$$
    where $y_{ij}$ is the actual value, $\hat{y}_{ij}$ is the predicted value, N is the number of samples, and n is the number of objects. For binary classification tasks (ME/CFS vs control), we utilized multiple metrics including accuracy, precision, recall, and F1 score to enable a comprehensive evaluation of the model’s performance.

    To evaluate the performance of BioMapAI, we compared its binary classification performance with three traditional machine learning models and one deep neural network (DNN) model. The traditional machine learning models included: 1) Logistic Regression (LR) (C=0.5, saga solver with Elastic Net regularization); 2) Support Vector Machine (SVM) with an RBF kernel (C=2); and 3) Gradient Boosting Decision Trees (GBDT) (learning rate = 0.05, maximum depth = 5, estimators = 1000). The DNN model employed the same hyperparameters as BioMapAI, except that it did not include the parallel sub-layer, Z3; thus, it only performed binary classification instead of multi-output prediction. The comparison between BioMapAI and the DNN aims to assess the specific contribution of the spread-out layer, designed for discerning object-specific patterns, to binary prediction. Evaluation metrics are detailed in Supplemental Table 3.

  4. External Validation with Independent Dataset. To validate BioMapAI’s robustness in binary classification, we utilized 4 external cohorts28,29,30,31 comprising more than 100 samples. For these external cohorts, only binary classification is available. A detailed summary of data collection for these cohorts is provided in Supplemental Table 4. For each external cohort, we processed the raw data (if available) using our in-house pipeline. The features in the external datasets were aligned to match those used in BioMapAI by reindexing the datasets. The overlap between the features in the external dataset and BioMapAI’s feature set was calculated to determine feature coverage. Any missing features were imputed with zeros to maintain consistency across datasets. The input data were then standardized in the same way as for BioMapAI. We loaded the pre-trained BioMapAI, GBDT, and DNN models for comparison. LR and SVM were excluded because they did not perform well during the in-cohort training process. The performance of the models was evaluated using the same binary classification evaluation metrics. Evaluation metrics are detailed in Supplemental Table 4.
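
The repeated random under-sampling and the binary evaluation metrics described in this section can be sketched in plain Python. The sketch makes several assumptions not stated in the text: exactly two classes, a majority subset equal in size to the minority class, and hypothetical function names.

```python
import random


def random_under_sample(labels, n_iterations=100, seed=0):
    """Yield index lists S_i: a random majority-class subset plus all minority samples.

    Assumes exactly two classes; the subset size (equal to the minority class
    size) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    idx_by_class = {}
    for i, label in enumerate(labels):
        idx_by_class.setdefault(label, []).append(i)
    minority, majority = sorted(idx_by_class.values(), key=len)
    for _ in range(n_iterations):
        subset = rng.sample(majority, k=len(minority))
        yield sorted(subset + minority)


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (hypothetical): patients (1) are twice as numerous as controls (0).
labels = [1] * 8 + [0] * 4
splits = list(random_under_sample(labels, n_iterations=100))
metrics = binary_metrics([1, 1, 0, 0], [1, 0, 0, 1])
```

Each element of `splits` is a balanced training index set; a model would be fit on each in turn and the resulting metrics aggregated.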

3. BioMapAI Decode Module: SHAP.

BioMapAI is designed to be explainable, ensuring that it not only reconstructs and predicts accurately but also is interpretable, which is particularly crucial in the biological domain. To achieve this, we incorporated SHapley Additive exPlanations (SHAP) into our framework. SHAP offers a consistent measure of feature importance by quantifying the contribution of each input feature to the model’s output.113

We applied SHAP to BioMapAI to interpret the results, following these three steps:

  1. Model Reconstruction. BioMapAI’s architecture includes two shared hidden layers, $Z^1$ and $Z^2$, and one parallel sub-layer, $Z_i^3$, for each object $y_i$. To decode the feature contributions for each object $y_i$, we reconstructed sub-models from the single comprehensive model:
    $$\mathrm{Model}_i = Z^1 + Z^2 + Z_i^3, \quad i = 1, 2, \dots, n$$
    where n is the number of learned objects.
  2. SHAP Kernel Explainer. For each reconstructed model, $\mathrm{Model}_i$, we used the SHAP Kernel Explainer to compute the feature contributions. The explainer was initialized with the model’s prediction function and the input data X:
    $$\mathrm{explainer}_i = \mathrm{shap.KernelExplainer}\left(\mathrm{Model}_i.\mathrm{predict},\, X\right), \quad i = 1, 2, \dots, n$$
    Then SHAP values were computed to determine the contribution of each feature to $y_i$:
    $$\phi_i = \mathrm{explainer}_i(X), \quad i = 1, 2, \dots, n$$
    The kernel explainer is a model-agnostic approach that approximates SHAP by evaluating the model with and without the feature of interest and then assigning weights to these evaluations to ensure fairness. For each $\mathrm{Model}_i$, with each feature j:
    $$\phi_{ij}(f, x) = \sum_{S_i \subseteq N_i \setminus \{j\}} \frac{|S_i|!\,\left(m - |S_i| - 1\right)!}{m!}\left[\mathrm{Model}_i\left(S_i \cup \{j\}\right) - \mathrm{Model}_i\left(S_i\right)\right] = \frac{1}{m}\sum_{S_i \subseteq N_i \setminus \{j\}} \binom{m-1}{m - |S_i| - 1}^{-1}\left[\mathrm{Model}_i\left(S_i \cup \{j\}\right) - \mathrm{Model}_i\left(S_i\right)\right], \quad i = 1, 2, \dots, n$$
    where n is the number of learned objects, m is the total number of features, $\phi_{ij}$ is the Shapley value for feature j in $\mathrm{Model}_i$, $N_i$ is the full set of features in $\mathrm{Model}_i$, $S_i$ is a subset of features not including feature j, and $\mathrm{Model}_i(S_i)$ is the model prediction for the subset $S_i$. The SHAP value matrix, $\phi_i$, was further reshaped to align with the input data dimensions.
  3. Feature Categorization. Analyzing the SHAP value matrices, $\phi_1, \phi_2, \dots, \phi_n$, features can be roughly assigned to two categories: shared features - important to all outputs; or specific features - specifically important to individual outputs. We set the cutoff at 75%, where features consistently identified as top contributors in 75% of the models were classified as shared important features, termed disease-specific biomarkers. Features that were top contributors in only a few models were classified as specific important features, termed symptom-specific biomarkers.

    By reconstructing individual models, Modeli, for each object, yi, and applying SHAP explainer individually, we effectively decoded the contributions of input features to BioMapAI’s predictions. This method allowed us to categorize features into shared and specific categories—termed as disease-specific and symptom-specific biomarkers—providing novel interpretations of the ‘omics feature contribution to clinical symptoms.
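
The Shapley formula and the 75% categorization rule can be made concrete with a small exact computation. This sketch enumerates all feature subsets directly, which is only feasible for a handful of features (the Kernel Explainer approximates this for real models). The toy value function, the function names, and the reading of the cutoff as "top contributor in at least 75% of models" are assumptions of the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every subset S of the other features."""
    m = len(features)
    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(m):
            # Weight |S|! (m - |S| - 1)! / m! from the Shapley formula.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {j}) - value_fn(s))
        phi[j] = total
    return phi


def categorize(top_features_per_model, cutoff=0.75):
    """Split features into shared (top in >= cutoff of models) and specific (the rest)."""
    n_models = len(top_features_per_model)
    counts = {}
    for top in top_features_per_model:
        for f in set(top):
            counts[f] = counts.get(f, 0) + 1
    shared = {f for f, c in counts.items() if c / n_models >= cutoff}
    return shared, set(counts) - shared


def toy_model(subset):
    """Hypothetical model output when only the features in `subset` are present."""
    out = 2.0 * ("a" in subset) + 1.0 * ("b" in subset)
    if "a" in subset and "b" in subset:
        out += 3.0  # interaction term, split evenly between a and b
    return out


phi = shapley_values(toy_model, ["a", "b", "c"])
shared, specific = categorize([{"x", "y"}, {"x"}, {"x", "z"}, {"y"}])
```

In the toy model the interaction of 3.0 is shared equally, so the values come out near 3.5 for a, 2.5 for b, and 0 for c, and they sum to the difference between the full and empty model outputs.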

4. Packages and Tools.

BioMapAI was constructed with TensorFlow (v2.12.0)114 and Keras (v2.12.0). ML models were from scikit-learn (v1.1.2)115.

WGCNA and Network Analysis.

To identify co-expressed patterns of each ‘omics, we employed the Weighted Gene Co-expression Network Analysis (WGCNA) using the WGCNA116 package in R. The analysis was performed on preprocessed omics data (Methods): species abundances (Feature N=373, Sample N=479) and KEGG gene abundances (Feature N=4462, Sample N=479) from the microbiome, plasma metabolome (Feature N=395, Sample N=414), immune profiling (Feature N=311, Sample N=489). Network construction and module detection involved choosing soft-thresholding powers tailored to each dataset: 6 for species, 7 for KEGG, 5 for immune, and 6 for metabolomic. The adjacency matrices were transformed into topological overlap matrices (TOM) to reduce noise and spurious associations. Hierarchical clustering was performed using the TOM, and modules were identified using the dynamic tree cut method with a minimum module size of 30 genes. Module eigengenes were calculated, and modules with highly similar eigengenes (correlation > 0.75) were merged. Module-trait relationships were assessed by correlating module eigengenes with clinical traits, and gene significance (GS) and module membership (MM) were used to identify hub genes within significant modules. Network analysis was conducted using igraph117 in R. Module eigengenes from the WGCNA analysis were extracted for each dataset. A combined network was constructed by calculating Spearman correlation coefficients (corrected, Methods) between the module eigengenes of different datasets, and an adjacency matrix was created based on a threshold of 0.3 (absolute value) to include only significant associations. Network nodes represented module eigengenes and edges represented significant correlations. Degree centrality and betweenness centrality were calculated to identify highly connected and influential nodes. Networks in patient subgroups were displayed as the correlation differences from their healthy counterparts to exclude the influence of covariates. 
For example, correlations in female patients were compared with those in healthy females, and correlations in older patients were compared with those in older healthy controls.
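
The network construction step (thresholding correlations between module eigengenes at an absolute value of 0.3 and computing degree centrality) can be sketched without igraph. The module names and correlation values below are invented for illustration, the inclusive threshold is an assumption, and degree centrality is normalized by n - 1, which is one common convention.

```python
def build_network(corr, threshold=0.3):
    """Return an adjacency dict linking nodes whose |correlation| >= threshold.

    `corr` maps an unordered pair (node_a, node_b) to a correlation coefficient.
    """
    adjacency = {}
    for (a, b), r in corr.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if a != b and abs(r) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


# Hypothetical Spearman correlations between module eigengenes of three datasets.
corr = {
    ("species_M1", "immune_M2"): 0.45,
    ("species_M1", "metab_M3"): -0.35,
    ("immune_M2", "metab_M3"): 0.10,
}
network = build_network(corr)
centrality = degree_centrality(network)
```

Here the weak immune-metabolome correlation (0.10) produces no edge, leaving species_M1 as the single hub of the toy network.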

Statistical Analysis.

The dimensionality reduction analysis was conducted by Principal Coordinates Analysis (PCoA) using the sklearn.manifold.MDS function for each ‘omics dataset. For combined ‘omics data, PCoA was applied to combined module eigengenes from WGCNA. Fold changes of species, genes, immune cells, and metabolites were compared between patient and control groups, short-term and control groups, and long-term and control groups. P values were computed by Wilcoxon signed-rank test with False Discovery Rate (FDR) correction, adjusted for multiple group comparisons. Spearman’s rank correlation was used to assess correlations. P values were adjusted using Holm’s method, accounting for multiple group comparisons. P value annotations: ns: p > 0.05, *: 0.01 < p <= 0.05, **: 0.001 < p <= 0.01, ***: p <= 0.001.
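
Holm’s adjustment and the significance annotations listed above can be written out in a few lines. This is a plain-Python sketch with hypothetical function names and invented p values, intended only to make the procedure explicit.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p value is multiplied by (m - k + 1), capped at 1,
        # and forced to be non-decreasing along the sorted order.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


def annotate(p):
    """Map a p value to the annotation scheme used in the figures."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "ns"


raw = [0.01, 0.04, 0.03, 0.0002]  # invented p values for four comparisons
adjusted = holm_adjust(raw)
labels = [annotate(p) for p in adjusted]
```

For these four values, the adjusted p values are approximately 0.03, 0.06, 0.06, and 0.0008, so only the first and last comparisons remain significant.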

Longitudinal Analysis.

To capture statistically meaningful temporal signals, we employed various statistical and modeling methods, accounting for both linear and non-linear trends and intra-individual correlations:

1. Interquartile Range (IQR) and Intraclass Correlation Coefficient (ICC).

We initially assessed statistics at different time points by computing the IQR and ICC. Data were standardized to a mean of zero and a standard deviation of one to ensure comparability across features with different scales. The IQR quantified variability, while the ICC assessed the dependence of repeated measurements118, indicating the similarity of measurements over time. Data showed no statistical dependence and no trend of stable variance across time points.

2. Generalized Linear Models (GLMs).

GLMs119 were then used to analyze the effects of time points, considering age, gender, and their interactions. Time points were included as predictors to reveal changes in dependent variables over time, with interaction terms exploring variations based on age and gender. Random effects accounted for intra-individual correlations. Although 12 features out of 5000 showed weak trends over time (slopes < 0.2), they were not deemed sufficient to be potential longitudinal biomarkers, possibly due to individualized patterns.

3. Repeated Measures Correlation (rmcorr).

To better consider individual effects, we employed rmcorr120 to assess consistent patterns of association within individuals over time. This method captured stable within-individual associations across different time points. However, only 30 features out of 5000 showed weak slopes (< 0.3), and these were not considered sufficient to conclude the presence of longitudinal signals.
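
The idea behind repeated measures correlation can be illustrated by centering each individual’s measurements on that individual’s own mean and correlating the centered values, which yields the common within-individual association. The sketch below is a simplified stand-in for the rmcorr package (it returns only the coefficient, with no p value or confidence interval), and the data are invented.

```python
from math import sqrt


def rmcorr_coefficient(subjects, x, y):
    """Within-subject correlation obtained by centering x and y per subject.

    Centering removes between-subject differences, so the Pearson correlation
    of the centered values reflects the shared within-individual association.
    """
    groups = {}
    for s, xi, yi in zip(subjects, x, y):
        groups.setdefault(s, []).append((xi, yi))
    xc, yc = [], []
    for pairs in groups.values():
        mx = sum(p[0] for p in pairs) / len(pairs)
        my = sum(p[1] for p in pairs) / len(pairs)
        xc.extend(p[0] - mx for p in pairs)
        yc.extend(p[1] - my for p in pairs)
    sxy = sum(a * b for a, b in zip(xc, yc))
    sxx = sum(a * a for a in xc)
    syy = sum(b * b for b in yc)
    return sxy / sqrt(sxx * syy)


# Invented example: three individuals whose feature rises over time within each
# person, although individuals with larger x values have lower y baselines.
subjects = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [10, 11, 12, 5, 6, 7, 1, 2, 3]
r_within = rmcorr_coefficient(subjects, x, y)
```

In this example the pooled correlation across all nine points is negative, while the within-individual coefficient is 1, which is the kind of individual-level pattern the method is designed to expose.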

4. Smoothing Spline ANOVA (SS-ANOVA).

We then considered that the longitudinal trends could be non-linear and more complex. To model complex, non-linear relationships between response variables and predictors over time, SS-ANOVA121 was used. SS-ANOVA uncovered non-linear trends and interactions in the omics data; however, no strong temporal signals were identified. In conclusion, robust analysis of the longitudinal data, accounting for both linear and non-linear trends and intra-individual correlations, revealed the difficulty in extracting strong and statistically meaningful temporal signals. As Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) is a disease that usually lasts for decades with non-linear progression, the four-year tracking period with annual measurements is likely insufficient for capturing consistent temporal signals, necessitating longer follow-up periods.

Acknowledgements and Funding

We are thankful to the Oh, Unutmaz, and Li laboratories for inspiring discussions and acknowledge the contribution of the Genome Technologies Service at The Jackson Laboratory for expert assistance with sample sequencing for the work described in this publication. We also thank the clinical support team at the Bateman Horne Center and all the individuals who participated in this study. This work was funded by 1U54NS105539. JO is additionally supported by the NIH (1 R01 AR078634-01, DP2 GM126893-01, 1 U19 AI142733, 1 R21 AR075174).

Footnotes

Competing Interests

Dr. Suzanne D. Vernon is affiliated and has a financial interest with The BioCollective, a company that provided the BioCollector, the collection kit used for at home stool collection discussed in this manuscript. No other authors have competing interests.

Lead Contact

Further information and requests for resources and reagents should be directed to the lead contact, Julia Oh (Julia.Oh@jax.org).

Supplemental Table 1 Sample Metadata and Clinical Scores

Supplemental Table 2 Model Performance at Reconstructing Twelve Clinical Scores: Averaged Average Mean Squared Error by Model

Supplemental Table 3 Model Performance in Diagnostic Comparison—Within-Cohort, Cross-Validated by Various ML and DL Models

Supplemental Table 4 Model Performance in Diagnostic Comparison—Across Independent Cohorts

Supplemental Table 5 Disease-Specific Biomarker: Averaged Feature Contribution of BioMapAI, DNN and GBDT

Supplemental Table 6 Symptom-Specific Biomarker: Distinct Sets of Biomarkers for Each Symptom

Supplemental Table 7 WGCNA Module Eigengene

Supplemental Table 8 Targeted Pathways: Normalized Gene Read Counts and Their Correlation with Blood Responders

Data and Code

Metagenomics data is being deposited under the BioProject submission number SUB14546737 and will be publicly available as of the date of publication. Accession numbers are listed in the key resources table. BioMapAI framework is available at https://github.com/ohlab/BioMapAI/codes/AI. All original code, analyzed data, and trained models have been deposited at https://github.com/ohlab/BioMapAI. Other ‘omics data and any additional information required to reanalyze the data reported in this paper are available from the lead contact upon request.

References

  • 1. de Mel S., Lim S. H., Tung M. L. & Chng W.-J. Implications of Heterogeneity in Multiple Myeloma. BioMed Res. Int. 2014, 232546 (2014).
  • 2. Wallstrom G., Anderson K. S. & LaBaer J. Biomarker Discovery for Heterogeneous Diseases.
  • 3. Weyand C. M., McCarthy T. G. & Goronzy J. J. Correlation between disease phenotype and genetic heterogeneity in rheumatoid arthritis. J. Clin. Invest. 95, 2120–2126 (1995).
  • 4. Xiong R. et al. Multi-‘omics of gut microbiome-host interactions in short- and long-term myalgic encephalomyelitis/chronic fatigue syndrome patients. Cell Host Microbe 31, 273–287.e5 (2023).
  • 5. Giladi N., Mirelman A., Thaler A. & Orr-Urtreger A. A Personalized Approach to Parkinson’s Disease Patients Based on Founder Mutation Analysis. Front. Neurol. 7, (2016).
  • 6. Brown S. M. et al. Consistent Effects of Early Remdesivir on Symptoms and Disease Progression Across At-Risk Outpatient Subgroups: Treatment Effect Heterogeneity in PINETREE Study. Infect. Dis. Ther. 12, 1189–1203 (2023).
  • 7. Iwasaki T. & Sano H. Predicting Treatment Responses and Disease Progression in Myeloma using Serum Vascular Endothelial Growth Factor and Hepatocyte Growth Factor Levels. Leuk. Lymphoma 44, 1275–1279 (2003).
  • 8. Hare P. J., LaGree T. J., Byrd B. A., DeMarco A. M. & Mok W. W. K. Single-Cell Technologies to Study Phenotypic Heterogeneity and Bacterial Persisters. Microorganisms 9, 2277 (2021).
  • 9. Cohen R. M., Haggerty S. & Herman W. H. HbA1c for the Diagnosis of Diabetes and Prediabetes: Is It Time for a Mid-Course Correction? J. Clin. Endocrinol. Metab. 95, 5203–5206 (2010).
  • 10. Zhou W. et al. Longitudinal multi-omics of host–microbe dynamics in prediabetes. Nature 569, 663–671 (2019).
  • 11. Hong S. et al. Cancer Statistics in Korea: Incidence, Mortality, Survival, and Prevalence in 2017. Cancer Res. Treat. 52, 335–350 (2020).
  • 12. Zeeshan S., Xiong R., Liang B. T. & Ahmed Z. 100 years of evolving gene–disease complexities and scientific debutants. Brief. Bioinform. 21, 885–905 (2020).
  • 13. Cortes Rivera M., Mastronardi C., Silva-Aldana C. T., Arcos-Burgos M. & Lidbury B. A. Myalgic Encephalomyelitis/Chronic Fatigue Syndrome: A Comprehensive Review. Diagnostics 9, 91 (2019).
  • 14. Sweetman E. et al. Current Research Provides Insight into the Biological Basis and Diagnostic Potential for Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS). Diagnostics 9, 73 (2019).
  • 15. Noor N. et al. A Comprehensive Update of the Current Understanding of Chronic Fatigue Syndrome. Anesthesiol. Pain Med. 11, e113629 (2021).
  • 16. Ruiz-Pablos M., Paiva B., Montero-Mateo R., Garcia N. & Zabaleta A. Epstein-Barr Virus and the Origin of Myalgic Encephalomyelitis or Chronic Fatigue Syndrome. Front. Immunol. 12, 656797 (2021).
  • 17. Su R. et al. The TLR3/IRF1/Type III IFN Axis Facilitates Antiviral Responses against Enterovirus Infections in the Intestine. mBio 11, 10.1128/mbio.02540-20 (2020).
  • 18. Anderson D. E. et al. Lack of cross-neutralization by SARS patient sera towards SARS-CoV-2. Emerg. Microbes Infect. 9, 900–902 (2020).
  • 19. Poenaru S., Abdallah S. J., Corrales-Medina V. & Cowan J. COVID-19 and post-infectious myalgic encephalomyelitis/chronic fatigue syndrome: a narrative review. Ther. Adv. Infect. Dis. 8, 20499361211009385 (2021).
  • 20. Reuken P. A. et al. Longterm course of neuropsychological symptoms and ME/CFS after SARS-CoV-2-infection: a prospective registry study. Eur. Arch. Psychiatry Clin. Neurosci. (2023) doi: 10.1007/s00406-023-01661-3.
  • 21. Bretherick A. D. et al. Typing myalgic encephalomyelitis by infection at onset: A DecodeME study. NIHR Open Res. 3, 20 (2023).
  • 22. Bae J. & Lin J.-M. S. Healthcare Utilization in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS): Analysis of US Ambulatory Healthcare Data, 2000–2009. Front. Pediatr. 7, (2019).
  • 23. Zheng Y. & Zhu Z. Editorial: Retrieving meaningful patterns from big biomedical data with machine learning approaches. Front. Genet. 14, (2023).
  • 24. Leelatian N. et al. Unsupervised machine learning reveals risk stratifying glioblastoma tumor cells. eLife 9, e56879 (2020).
  • 25. Su Q. et al. The gut microbiome associates with phenotypic manifestations of post-acute COVID-19 syndrome. Cell Host Microbe 32, 651–660.e4 (2024).
  • 26. Bourgonje A. R., van Goor H., Faber K. N. & Dijkstra G. Clinical Value of Multiomics-Based Biomarker Signatures in Inflammatory Bowel Diseases: Challenges and Opportunities. Clin. Transl. Gastroenterol. 14, e00579 (2023).
  • 27. Marcos-Zambrano L. J. et al. Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment. Front. Microbiol. 12, (2021).
  • 28. Guo C. et al. Deficient butyrate-producing capacity in the gut microbiome is associated with bacterial network disturbances and fatigue symptoms in ME/CFS. Cell Host Microbe 31, 288–304.e8 (2023).
  • 29. Raijmakers R. P. H. et al. Multi-omics examination of Q fever fatigue syndrome identifies similarities with chronic fatigue syndrome. J. Transl. Med. 18, 448 (2020).
  • 30. Germain A. et al. Plasma metabolomics reveals disrupted response and recovery following maximal exercise in myalgic encephalomyelitis/chronic fatigue syndrome. JCI Insight 7, (2023).
  • 31. Che X. et al. Metabolomic Evidence for Peroxisomal Dysfunction in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. Int. J. Mol. Sci. 23, 7906 (2022).
  • 32. Liñares-Blanco J., Fernandez-Lozano C., Seoane J. A. & López-Campos G. Machine Learning Based Microbiome Signature to Predict Inflammatory Bowel Disease Subtypes. Front. Microbiol. 13, (2022).
  • 33. He F. et al. Development and External Validation of Machine Learning Models for Diabetic Microvascular Complications: Cross-Sectional Study With Metabolites. J. Med. Internet Res. 26, e41065 (2024).
  • 34. Hawken S. et al. External validation of machine learning models including newborn metabolomic markers for postnatal gestational age estimation in East and South-East Asian infants. Preprint at 10.12688/gatesopenres.13131.2 (2021).
  • 35. Mora-Ortiz M., Trichard M., Oregioni A. & Claus S. P. Thanatometabolomics: introducing NMR-based metabolomics to identify metabolic biomarkers of the time of death. Metabolomics 15, 37 (2019).
  • 36. Balasubramanian R. et al. Metabolomic profiles associated with all-cause mortality in the Women’s Health Initiative. Int. J. Epidemiol. 49, 289–300 (2020).
  • 37. Li H., Ren M. & Li Q. 1H NMR-Based Metabolomics Reveals the Intrinsic Interaction of Age, Plasma Signature Metabolites, and Nutrient Intake in the Longevity Population in Guangxi, China. Nutrients 14, 2539 (2022).
  • 38. Kondoh H. & Kameda M. Metabolites in aging and aging-relevant diseases: Frailty, sarcopenia and cognitive decline. Geriatr. Gerontol. Int. 24, 44–48 (2024).
  • 39. Peng S., Shen Y., Wang M. & Zhang J. Serum and CSF Metabolites in Stroke-Free Patients Are Associated With Vascular Risk Factors and Cognitive Performance. Front. Aging Neurosci. 12, (2020).
  • 40. Duerler P., Vollenweider F. X. & Preller K. H. A neurobiological perspective on social influence: Serotonin and social adaptation. J. Neurochem. 162, 60–79 (2022).
  • 41. Pomrenze M. B., Paliarin F. & Maiya R. Friend of the Devil: Negative Social Influences Driving Substance Use Disorders. Front. Behav. Neurosci. 16, (2022).
  • 42. Laslett A.-M. Commentary on Bischof et al. : Empirical and conceptual paradigms for studying secondary impacts of a person’s substance use. Addiction 117, 3148–3149 (2022).
  • 43. Carco C. et al. Increasing Evidence That Irritable Bowel Syndrome and Functional Gastrointestinal Disorders Have a Microbial Pathogenesis. Front. Cell. Infect. Microbiol. 10, (2020).
  • 44. Saffouri G. B. et al. Small intestinal microbial dysbiosis underlies symptoms associated with functional gastrointestinal disorders. Nat. Commun. 10, 2012 (2019).
  • 45. Liang S., Wu X., Hu X., Wang T. & Jin F. Recognizing Depression from the Microbiota–Gut–Brain Axis. Int. J. Mol. Sci. 19, 1592 (2018).
  • 46. Zhu F., Tu H. & Chen T. The Microbiota–Gut–Brain Axis in Depression: The Potential Pathophysiological Mechanisms and Microbiota Combined Antidepression Effect. Nutrients 14, 2081 (2022).
  • 47. Topan R. & Scott S. M. Sleep: An Overlooked Lifestyle Factor in Disorders of Gut-Brain Interaction. Curr. Treat. Options Gastroenterol. 21, 435–446 (2023).
  • 48. Moens de Hase E. et al. Impact of metformin and Dysosmobacter welbionis on diet-induced obesity and diabetes: from clinical observation to preclinical intervention. Diabetologia 67, 333–345 (2024).
  • 49. Amabebe E., Robert F. O., Agbalalah T. & Orubu E. S. F. Microbial dysbiosis-induced obesity: role of gut microbiota in homoeostasis of energy metabolism. Br. J. Nutr. 123, 1127–1137 (2020).
  • 50. Kavanagh P. et al. Tentative identification of the phase I and II metabolites of two synthetic cathinones, MDPHP and α-PBP, in human urine. Drug Test. Anal. 12, 1442–1451 (2020).
  • 51. Wang J.-H. et al. Clinical evidence of the link between gut microbiome and myalgic encephalomyelitis/chronic fatigue syndrome: a retrospective review. Eur. J. Med. Res. 29, 148 (2024).
  • 52. Lenoir M. et al. Butyrate mediates anti-inflammatory effects of Faecalibacterium prausnitzii in intestinal epithelial cells through Dact3. Gut Microbes (2020).
  • 53. Sokol H. et al. Faecalibacterium prausnitzii is an anti-inflammatory commensal bacterium identified by gut microbiota analysis of Crohn disease patients. Proc. Natl. Acad. Sci. 105, 16731–16736 (2008).
  • 54. Quévrain E. et al. Identification of an anti-inflammatory protein from Faecalibacterium prausnitzii, a commensal bacterium deficient in Crohn’s disease. Gut 65, 415–425 (2016).
  • 55.Miquel S. et al. Identification of Metabolic Signatures Linked to Anti-Inflammatory Effects of Faecalibacterium prausnitzii. mBio 6, 10.1128/mbio.00300-15 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Vital M., Howe A. C. & Tiedje J. M. Revealing the Bacterial Butyrate Synthesis Pathways by Analyzing (Meta)genomic Data. mBio 5, e00889–14 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Recharla N., Geesala R. & Shi X.-Z. Gut Microbial Metabolite Butyrate and Its Therapeutic Role in Inflammatory Bowel Disease: A Literature Review. Nutrients 15, 2275 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Monteiro C. R. A. V. et al. In Vitro Antimicrobial Activity and Probiotic Potential of Bifidobacterium and Lactobacillus against Species of Clostridium. Nutrients 11, 448 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Zhao M., Li G. & Deng Y. Engineering Escherichia coli for Glutarate Production as the C5 Platform Backbone. Appl. Environ. Microbiol. 84, e00814–18 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Nguyen-Lefebvre A. T., Selzner N., Wrana J. L. & Bhat M. The hippo pathway: A master regulator of liver metabolism, regeneration, and disease. FASEB J. 35, e21570 (2021). [DOI] [PubMed] [Google Scholar]
  • 61.Khan M. A., Gupta A., Sastry J. L. N. & Ahmad S. Hepatoprotective potential of kumaryasava and its concentrate against CCl4-induced hepatic toxicity in Wistar rats. J. Pharm. Bioallied Sci. 7, 297–299 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Kim C.-S. Roles of Diet-Associated Gut Microbial Metabolites on Brain Health: Cell-to-Cell Interactions between Gut Bacteria and the Central Nervous System. Adv. Nutr. 15, 100136 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Rebeaud J., Peter B. & Pot C. How Microbiota-Derived Metabolites Link the Gut to the Brain during Neuroinflammation. Int. J. Mol. Sci. 23, 10128 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Ahmad S. et al. Gut microbiome-related metabolites in plasma are associated with general cognition. Alzheimers Dement. 17, e056142 (2021). [Google Scholar]
  • 65.Ahmed Z., Zeeshan S., Xiong R. & Liang B. T. Debutant iOS app and gene-disease complexities in clinical genomics and precision medicine. Clin. Transl. Med. 8, e26 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Ahmed Z., Zeeshan S., Xiong R. & Liang B. T. PAS-Gen: Guide to iOS app with gene-disease. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Ahmed Z., Wan S., Zhang F. & Zhong W. Artificial intelligence for omics data analysis. BMC Methods 1, 4 (2024). [Google Scholar]
  • 68.Ahmed Z., Mohamed K., Zeeshan S. & Dong X. Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine. Database J. Biol. Databases Curation 2020, baaa010 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Germain A., Ruppert D., Levine S. M. & Hanson M. R. Prospective Biomarkers from Plasma Metabolomics of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome Implicate Redox Imbalance in Disease Symptomatology. Metabolites 8, 90 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Lim E.-J. & Son C.-G. Review of case definitions for myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). J. Transl. Med. 18, 289 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Germain A., Barupal D. K., Levine S. M. & Hanson M. R. Comprehensive Circulatory Metabolomics in ME/CFS Reveals Disrupted Metabolism of Acyl Lipids and Steroids. Metabolites 10, 34 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Martínez-Lavín M. Holistic Treatment of Fibromyalgia Based on Physiopathology: An Expert Opinion. JCR J. Clin. Rheumatol. 26, 204 (2020). [DOI] [PubMed] [Google Scholar]
  • 73.López-Hernández Y. et al. The plasma metabolome of long COVID patients two years after infection. Sci. Rep. 13, 12420 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Iqbal M., Elzembely H. I. & Said O. M. Letter to the Editor: “Self-Reported Student Awareness and Prevalence of Computer Vision Syndrome During COVID-19 Pandemic at Al-Baha University” [Letter]. Clin. Optom. 14, 193–194 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Sedeh F. B. et al. The correlation between self-reported hand eczema and clinically based diagnosis in professional cleaners. Contact Dermatitis cod.14611 (2024) doi: 10.1111/cod.14611. [DOI] [PubMed] [Google Scholar]
  • 76.Jason L. A., Yoo S. & Bhatia S. Patient perceptions of infectious illnesses preceding Myalgic Encephalomyelitis/Chronic Fatigue Syndrome. Chronic Illn. 18, 901–910 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Hanson M. R. The viral origin of myalgic encephalomyelitis/chronic fatigue syndrome. PLOS Pathog. 19, e1011523 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Hamine S., Gerth-Guyette E., Faulx D., Green B. B. & Ginsburg A. S. Impact of mHealth Chronic Disease Management on Treatment Adherence and Patient Outcomes: A Systematic Review. J. Med. Internet Res. 17, e3951 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Clark N. M. Management of Chronic Disease by Patients. Annu. Rev. Public Health 24, 289–313 (2003). [DOI] [PubMed] [Google Scholar]
  • 80.Derman I. D. et al. High-throughput bioprinting of the nasal epithelium using patient-derived nasal epithelial cells. Biofabrication 15, 044103 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Fleming E. et al. Cultivation of common bacterial species and strains from human skin, oral, and gut microbiota. BMC Microbiol. 21, 278 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Ren J., Cislo P., Cappelleri J. C., Hlavacek P. & DiBonaventura M. Comparing g-computation, propensity score-based weighting, and targeted maximum likelihood estimation for analyzing externally controlled trials with both measured and unmeasured confounders: a simulation study. BMC Med. Res. Methodol. 23, 18 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Lynn J. V., Buchman L. K., Breuler C. J. & Buchman S. R. Surgical Timing and Neurocognitive Development among Patients with Craniosynostosis: Analysis of Confounders. Plast. Reconstr. Surg. 151, 821 (2023). [DOI] [PubMed] [Google Scholar]
  • 84.Karhan E. et al. Perturbation of Effector and Regulatory T Cell Subsets in Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS). 2019.12.23.887505 Preprint at 10.1101/2019.12.23.887505v1 (2019). [DOI] [Google Scholar]
  • 85.Krumina A. et al. Clinical Profile and Aspects of Differential Diagnosis in Patients with ME/CFS from Latvia. Medicina (Mex.) 57, 958 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Zubcevik N. et al. Symptom Clusters and Functional Impairment in Individuals Treated for Lyme Borreliosis. Front. Med. 7, (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Costa G. G., Pereira A. R. & Carvalho A. S. Pericardite lúpica: dor torácica e febre em tempos de COVID-19. Rev. Port. Med. Geral E Fam. 38, 300–4 (2022). [Google Scholar]
  • 88.Vyas J., Muirhead N., Singh R., Ephgrave R. & Finlay A. Y. Impact of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) on the quality of life of people with ME/CFS and their partners and family members: an online cross-sectional survey. BMJ Open 12, e058128 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Martinez A., Okoh A., Ko Y.-A. & Wells B. Racial Differences in FMD. 2023.02.10.23285630 Preprint at 10.1101/2023.02.10.23285630 (2023). [DOI] [Google Scholar]
  • 90.Trivedi M. S. et al. Identification of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome-associated DNA methylation patterns. PLOS ONE 13, e0201066 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Bouquet J. et al. Whole blood human transcriptome and virome analysis of ME/CFS patients experiencing post-exertional malaise following cardiopulmonary exercise testing. PLOS ONE 14, e0212193 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Lande A. et al. Human Leukocyte Antigen alleles associated with Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS). Sci. Rep. 10, 5267 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Almenar-Pérez E. et al. Epigenetic Components of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome Uncover Potential Transposable Element Activation. Clin. Ther. 41, 675–698 (2019). [DOI] [PubMed] [Google Scholar]
  • 94.Das S., Taylor K., Kozubek J., Sardell J. & Gardner S. Genetic risk factors for ME/CFS identified using combinatorial analysis. J. Transl. Med. 20, 598 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Caruana E. J., Roman M., Hernández-Sánchez J. & Solli P. Longitudinal studies. J. Thorac. Dis. 7, E537–E540 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.White R. T. & Arzi H. J. Longitudinal Studies: Designs, Validity, Practicality, and Value. Res. Sci. Educ. 35, 137–149 (2005). [Google Scholar]
  • 97.Aurora C., Cecilia A. & Adina H. The Role of Diet in the Treatment of Chronic Diseases Case Study. ARS Medica Tomitana 27, 153–156 (2021). [Google Scholar]
  • 98.Therrien R. & Doyle S. Role of training data variability on classifier performance and generalizability. in Medical Imaging 2018: Digital Pathology vol. 10581 58–70 (SPIE, 2018). [Google Scholar]
  • 99.Zhang B., Qin A. K., Pan H. & Sellis T. A Novel DNN Training Framework via Data Sampling and Multi-Task Optimization. in 2020 International Joint Conference on Neural Networks (IJCNN) 1–8 (2020). doi:10.1109/IJCNN48605.2020.9207329. [Google Scholar]
  • 100.Lathan C., Spira J. L., Bleiberg J., Vice J. & Tsao J. W. Defense Automated Neurobehavioral Assessment (DANA)-psychometric properties of a new field-deployable neurocognitive assessment tool. Mil. Med. 178, 365–371 (2013). [DOI] [PubMed] [Google Scholar]
  • 101.Resnick H. E. & Lathan C. E. From battlefield to home: a mobile platform for assessing brain health. mHealth 2, 30 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Lee J. et al. Hemodynamics during the 10-minute NASA Lean Test: evidence of circulatory decompensation in a subset of ME/CFS patients. J. Transl. Med. 18, 314 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Committee on the Diagnostic Criteria for Myalgic Encephalomyelitis/Chronic Fatigue Syndrome, Board on the Health of Select Populations, & Institute of Medicine. Beyond Myalgic Encephalomyelitis/Chronic Fatigue Syndrome: Redefining an Illness. (National Academies Press (US), Washington (DC), 2015). [Google Scholar]
  • 104.RAND Corporation. 36-Item Short Form Survey (SF-36) Scoring Instructions. https://www.rand.org/health-care/surveys_tools/mos/36-item-short-form/scoring.html.
  • 105.Blanco-Míguez A. et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat. Biotechnol. 41, 1633–1644 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106.Edgar R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010). [DOI] [PubMed] [Google Scholar]
  • 107.Love M. I., Huber W. & Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Shen W., Le S., Li Y. & Hu F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS ONE 11, e0163962 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Mallick H. et al. Multivariable association discovery in population-scale meta-omics studies. PLoS Comput. Biol. 17, e1009442 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Chawla N. V., Bowyer K. W., Hall L. O. & Kegelmeyer W. P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 16, 321–357 (2002). [Google Scholar]
  • 111.He H., Bai Y., Garcia E. A. & Li S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. in 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence) 1322–1328 (IEEE, Hong Kong, China, 2008). doi: 10.1109/IJCNN.2008.4633969. [DOI] [Google Scholar]
  • 112.Saripuddin M., Suliman A., Syarmila Sameon S. & Jorgensen B. N. Random Undersampling on Imbalance Time Series Data for Anomaly Detection. in Proceedings of the 2021 4th International Conference on Machine Learning and Machine Intelligence 151–156 (Association for Computing Machinery, New York, NY, USA, 2022). doi: 10.1145/3490725.3490748. [DOI] [Google Scholar]
  • 113.Lundberg S. & Lee S.-I. A Unified Approach to Interpreting Model Predictions. Preprint at 10.48550/arXiv.1705.07874 (2017). [DOI]
  • 114.Abadi M. et al. TensorFlow: A system for large-scale machine learning. Preprint at 10.48550/arXiv.1605.08695 (2016). [DOI]
  • 115.Pedregosa F. et al. Scikit-learn: Machine Learning in Python. Preprint at 10.48550/arXiv.1201.0490 (2018). [DOI]
  • 116.Langfelder P. & Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.Antonov M. et al. igraph enables fast and robust network analysis across programming languages. Preprint at 10.48550/arXiv.2311.10260 (2023). [DOI]
  • 118.Koo T. K. & Li M. Y. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J. Chiropr. Med. 15, 155–163 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119.Nelder J. A. & Wedderburn R. W. M. Generalized Linear Models. J. R. Stat. Soc. Ser. Gen. 135, 370–384 (1972). [Google Scholar]
  • 120.Bakdash J. Z. & Marusich L. R. Repeated Measures Correlation. Front. Psychol. 8, (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Gu C. Smoothing Spline ANOVA Models: R Package gss. Smoothing Spline ANOVA Models. [Google Scholar]

Associated Data

Supplementary Materials

Supplement 1: media-1.pdf (9.6MB, pdf)
Supplement 2: media-2.pdf (30.2MB, pdf)
Supplement 3: media-3.pdf (395.5KB, pdf)
Supplement 4: media-4.pdf (8.3MB, pdf)
Supplement 5: media-5.pdf (2.1MB, pdf)
Supplement 6: media-6.pdf (19.2MB, pdf)
Supplement 7

Data Availability Statement

Metagenomics data are being deposited under the BioProject submission number SUB14546737 and will be publicly available as of the date of publication. Accession numbers are listed in the key resources table. The BioMapAI framework is available at https://github.com/ohlab/BioMapAI/codes/AI. All original code, analyzed data, and trained models have been deposited at https://github.com/ohlab/BioMapAI. Other ‘omics data and any additional information required to reanalyze the data reported in this paper are available from the lead contact upon request.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
