Abstract
Integrating genetic information with metabolomics has provided new insights into genes affecting human metabolism. However, gene-metabolite integration has been primarily studied in individuals of European Ancestry, limiting the opportunity to leverage genomic diversity for discovery. In addition, these analyses have principally involved known metabolites, with the majority of the profiled peaks left unannotated. Here, we perform a whole genome association study of 2,291 metabolite peaks (known and unknown features) in 2,466 Black individuals from the Jackson Heart Study. We identify 519 locus-metabolite associations for 427 metabolite peaks and validate our findings in two multi-ethnic cohorts. A significant proportion of these associations are in ancestry specific alleles including findings in APOE, TTR and CD36. We leverage tandem mass spectrometry to annotate unknown metabolites, providing new insight into hereditary diseases including transthyretin amyloidosis and sickle cell disease. Our integrative omics approach leverages genomic diversity to provide novel insights into diverse cardiometabolic diseases.
Subject terms: Genetic association study, Cardiovascular biology, Metabolomics, Genome-wide association studies
Most studies of the genetics of the metabolome have been done in individuals of European descent. Here, the authors integrate genomics and metabolomics in Black individuals, highlighting the value of whole genome sequencing in diverse populations and linking circulating metabolites to human disease.
Introduction
Disturbed metabolism plays a central role across a spectrum of pathological processes from cancer to cardiometabolic disease1,2. Metabolomics aims to systematically measure small molecules and provides a snapshot of metabolic activity, capturing both genetic and environmental influences on disease pathogenesis3. The integration of genomics and metabolomics has been increasingly leveraged in efforts to identify bioactive metabolites linked to human disease, as large-scale genome-wide association studies (GWAS) have played a critical role in our understanding of loci that affect disease risk. Previous GWAS of the metabolome has identified associations between hundreds of genomic loci across a broad range of metabolite classes, including amino acids, nucleosides, and lipids among others4–24. While prior studies have ranged in sample sizes from several hundred to a few thousand individuals, a recent study performed a GWAS meta-analysis across several large cohorts for 171 metabolites (in up to 86,507 individuals) measured using various metabolomic profiling platforms, including liquid chromatography-mass spectrometry (LC-MS) and nuclear magnetic resonance (NMR) spectroscopy23. This cross-platform analysis highlighted the robustness of findings in metabolomics GWAS and the ability to detect clinically relevant associations, helping to illuminate biology in both common diseases such as diabetes and rare conditions such as macular telangiectasia type II.
Despite these efforts, the vast majority of metabolomics GWAS to date have been undertaken in cohorts of European ancestry, limiting the opportunity to leverage genomic diversity for biological discovery25–27. Individuals of African ancestry are more genetically diverse than those of European ancestry28, and carry ancestry-specific mutations which may illuminate biology and therapeutic strategies in cardiometabolic disease29. Further, most GWAS of the metabolome to date have used genotyping arrays with measurement of a limited set of tag single nucleotide polymorphisms (SNP) with the imputation of remaining variants, limiting the ability to accurately assess low-frequency protein-coding and non-coding variation. Finally, while tools for unbiased metabolomic profiling can now measure hundreds of known metabolites as well as thousands of unknown metabolite peaks30,31, the latter have eluded definitive compound identification, thus limiting biological insight into locus-metabolite associations. There are significant challenges in unknown metabolomic profiling, and attempts at further annotating these peaks in prior GWAS have been limited. First, unknown metabolite peaks must be separated from background noise and adduct ions of known and other unknown metabolites32. In addition, chemical identification of unique peaks requires downstream resource-intensive processes for structural elucidation, identification including the acquisition of product ion mass spectra (MS/MS) to generate metabolite fragmentation data as “chemical fingerprints” that can help improve compound identification33. We have previously demonstrated proof-of-principle suggesting the utility of integrating LC-MS of unknown peaks with genetic findings. For example, when peak levels map to solute carriers or enzymes with known functions, including substrates and products, this may help narrow the potential compound matches for chemical standard validation34. However, this has remained an arduous process, limiting its application in metabolomics GWAS of unknown peaks in large population-based studies.
To extend prior work, we performed a genome-wide association study integrating whole genome sequencing (WGS) of 2,291 metabolite peaks in 2466 participants from the Jackson Heart Study (JHS), a Black epidemiological cohort in Jackson, Mississippi, and validated findings in the Multi-Ethnic Study of Atherosclerosis (MESA; n = 995) and Health, Risk Factors, Exercise Training and Genetics Family Study (HERITAGE; n = 658). Beyond confirming prior known locus-metabolite associations in a Black cohort—an important next step to test the generalizability of prior work—we highlight many novel findings, including associations in ancestry-specific alleles for heritable conditions more commonly observed in Black individuals, including transthyretin amyloidosis and sickle cell disease. We acquired MS/MS on metabolite features and have integrated WGS findings and recently developed bioinformatic tools that leverage MS fragmentation data for more efficient annotation and identification of unknown metabolite peaks. We have developed and made available an extensive sample library of metabolite peaks, linking MS/MS spectra, genomic associations, and clinical traits that can be leveraged for annotation and identification of unknown metabolites implicated in diverse disease processes. Our integrative omics approach highlights the value of whole genome sequencing analysis of the metabolome in diverse populations for biological discovery and contributes to a roadmap for the identification of metabolites implicated in human disease.
Results
We performed a whole genome association study (WGAS) in 2466 Black participants from JHS on 30,672,656 variants limited to an allele count of at least five against 2291 metabolites (337 known metabolites and 1954 unknown metabolites peaks; Fig. 1). Clinical characteristics of the study population are detailed in Supplementary Table 1. At a Bonferroni threshold of significance of 8E-11 (based on 5E-8 /602 principal components, which explain 95% of the variance in metabolite peak levels), there were 519 locus-metabolite associations, representing 427 metabolite peaks and 226 sentinel SNPs (Fig. 2). Of these, 118 locus-metabolite associations were determined from known metabolite analysis, representing 91 distinct metabolites. Comparison to prior GWAS of plasma metabolites, using publicly available summary statistics through PhenoScanner V235 and the GWAS Catalog36 as well as manual review of previously published metabolomics GWAS (Supplementary Methods), suggests 33 of these locus-known metabolite associations are novel. In addition, we identified 401 locus-metabolite associations from the unknown metabolite peak analysis, representing 336 metabolite features, highlighting a large amount of information in the yet-to-be-identified peaks.
Of the 226 metabolite quantitative trait loci (mQTLs), there were 159 unique genes annotated as the lead candidate gene (closest protein-coding gene to the mQTL). Of the sentinel SNPs, 65% were expression quantitative loci (eQTL) for their corresponding gene as determined by PhenoScanner v2.0 (p value <2E-4; Supplementary Data 2). Of those that were not eQTLs for the candidate gene, the majority were rare in individuals of non-Finnish European Ancestry (MAF <1% in gnomAD), highlighting a key limitation of presently available genomic information when performing investigations in Black individuals. Among the sentinel SNPs, 22% were located in exons and an additional 19% were in enhancer or promoter regions (Fig. 3 and Supplementary Data 2).
Of the 519 locus-metabolite associations meeting the Bonferroni level of significance, 368 locus-metabolite peak relationships were available for validation in both MESA and HERITAGE, of which 91% were validated with a p value <0.05 and consistent direction of effect. An additional 100 locus-metabolite peak associations were available in either MESA or HERITAGE, of which 86% were validated. Overall, there was 90% validation of available locus-metabolite associations with p value <0.05; 68% validation (318 of 468 locus-metabolite associations available) using Bonferroni level of significance (0.05/468; p value <1E-4; Supplementary Data 2).
Ancestry-specific variants and metabolite associations
WGS of diverse populations provides an opportunity to assess allelic architectures of populations of different ancestries and inform underlying biology. In particular, for our study, 17% of the 226 sentinel SNPs that were rare (MAF <1%) in Non-Finnish Europeans were common in individuals of African ancestry (MAF >5%; Supplementary Data 2). Overall, 29% of the sentinel SNPs were nearly monoallelic in individuals of NFE ancestry with a MAF <0.01% in gnomAD37. Here, we highlight several novel locus-metabolite relationships with SNPs that are rare in NFE (Table 1).
Table 1.
rsID | Position | Metabolite | Gene | Position | Allele | Beta | P Value | AFR AF |
---|---|---|---|---|---|---|---|---|
rs141239670 | Chr1:171219209 | Succinic acid | SDHA | Exon | T | 1.01 | 4.4E-11 | 0.01 |
rs56072071 | Chr2:215328065 | AICA-riboside** | ATIC | Intron | A | 0.38 | 2.9E-17 | 0.11 |
rs754490766 | Chr3:51959148 | N-acetylglutamic Acid | PCBP4 | Intron | A | 1.40 | 3.0E-36 | 0.015 |
rs754490766 | Chr3:51959148 | N-acetylserine | PCBP4 | Intron | A | 1.06 | 1.2E-21 | 0.015 |
rs73733867 | Chr6:44207081 | N4-acetylcytidine | MYMX | Intergenic | T | 1.29 | 1.9E-37 | 0.02 |
rs3211938 | Chr7:80671133 | C38:6 PE plasmalogen | CD36 | Exon | G | 0.38 | 1.6E-15 | 0.09 |
rs3211938 | Chr7:80671133 | C38:7 PC plasmalogen | CD36 | Exon | G | 0.38 | 1.6E-15 | 0.09 |
rs115027210 | Chr7:76062732 | 2-Hydroxyglutaric Acid | MDH2 | Exon | C | 0.53 | 4.6E-16 | 0.05 |
rs28832309 | Chr7:80690622 | C40:7 PE plasmalogen | SEMAC3C | Intergenic | C | 0.35 | 2.3E-12 | 0.09 |
rs7079286 | Chr10:106656814 | N(6),N(6)-dimethyl-lysine | F8MD8 | Exon | T | −0.48 | 1.2E-23 | 0.11 |
rs334 | Chr11:5227002 | LPC(OH-16:0)* | HBB | Exon | A | 0.48 | 4.19E-11 | 0.04 |
rs624307 | Chr11:65376604 | 3-Hydroxycarnitine | SLC25A45 | Exon | T | −0.42 | 7.6E-14 | 0.08 |
rs12322356 | Chr12:56378580 | UDP-GlcNAc | APOF | Intergenic | C | 0.29 | 6.0E-17 | 0.31 |
rs13333418 | Chr16:30975943 | Cholestanone** | SETD1A | Exon | C | −0.23 | 1.4E-13 | 0.30 |
rs28934585 | Chr17:7220519 | CAR 14:1 | ACADVL | Exon | T | −0.37 | 3.0E-15 | 0.11 |
rs28934585 | Chr17:7220519 | CAR 14:2 | ACADVL | Exon | T | −0.38 | 1.8E-14 | 0.11 |
rs28934585 | Chr17:7220519 | CAR 12:0 | ACADVL | Exon | T | −0.35 | 2.5E-13 | 0.11 |
rs76992529 | Chr18:31598655 | All-trans retinol** | TTR | Exon | A | −0.76 | 4.6E-14 | 0.02 |
rs12721054 | Chr19:44919330 | DG (36:4) | APOC1 | UTR | G | −0.31 | 1.3E-11 | 0.12 |
Novel, ancestry-specific mQTLs for known and unknown metabolites in the Jackson Heart Study (minor allele frequency for sentinel SNP less than 1% in non-Finish Europeans).
AFR African, AF allele frequency, UTR untranslated region.
*Unknown metabolite annotated using MS/MS data.
**Unknown metabolite annotated with MS/MS and confirmed with the chemical standard.
As an example, The TTR variant (V122I) is present in 3–4% of Black individuals and has been implicated in the pathophysiology of heart failure in the elderly, often unrecognized. We found the V122I in TTR to be associated with an unknown metabolite (m/z 269.226) in JHS (ß = − 0.76, p value = 4.4E-14). The TTR tetramer complexes with retinol-binding protein (RBP4). This metabolite peak is correlated with RBP4 (r2 = 0.64) measured by aptamer-based proteomic profiling38 and V122I is significantly associated with RBP4 levels in JHS. Leveraging MS/MS data and its genetic association with TTR and correlation with RBP4, we predicted this compound to be all-trans-retinol (vitamin A), which we subsequently confirmed with an authentic standard (Supplementary Fig. 1). In addition, another variant that is nearly monoallelic in individuals of European ancestry, in APOE (rs769455), is associated with an unknown metabolite peak (m/z 269.226; ß = 0.71, p value = 1.5E-12) that has an identical molecular mass to all-trans-retinol (Fig. 4) and MS fragmentation analysis and MS comparison against chemical standards and all-trans-retinol and other retinol species predicts that it is a cis-isomer of retinol.
Annotation for unknown metabolite peaks associated with genomic loci
Of the 2291 metabolites measured, 1954 were unknown metabolite peaks, of which 336 were associated with genomic loci (p value <8E-11). Of the mQTLs associated with unknown features, 15 had no prior metabolite associations within 500 kb of the sentinel SNP; 12 of these SNPs were rare in individuals of NFE ancestry. (Supplementary Data 2). In the first step of assigning chemical identities to unknown metabolites, we clustered metabolite peaks measured in the positive mode (known and unknown metabolites) to identify primary metabolite features and their adducts (Supplementary Data 3). After applying our clustering algorithm, 49 of the 336 unknown metabolite peaks were part of clusters with known metabolites and were assigned the chemical identity of this primary metabolite. Of the 287 unknown metabolites not clustered with known compounds, 63 were adducts or fragments of other primary unknown metabolites, leaving 224 as major ions or primary unknown metabolites (Fig. 5).
Next, to aid in the identification of the unknown metabolites, we applied MS/MS profiling to metabolite peaks measured in positive mode, resulting in MS/MS data on 91% of unknown metabolite peaks designated as major ions or primary metabolites in our study. Using CANOPUS39, a bioinformatics tool which uses MS/MS data to annotate chemical compound class for metabolites, 72% of primary unknown metabolite features were assigned a metabolite compound class; lipids, amino acids, and fatty acids were the most common metabolite groups. SIRIUS40, a software tool that uses MS/MS data for metabolite structural elucidation, assigned chemical predictions in rank order for 182 primary unknown metabolites received (top three metabolite predictions for each feature in Supplementary Data 3). Leveraging these chemical predictions, we assigned high-confidence metabolite IDs to 33 metabolites that carried additional evidence for supporting the annotation using complementary tools, including Global Natural Product Social Molecular Networking (GNPS; n = 8)41, a database for MS/MS spectra in metabolomics studies, and the Human Metabolome Database (HMDB; n = 5)32, MS data for related compounds from our in-house metabolite library (n = 12) or validation with chemical standards (n = 8).
In addition to providing a database of MS/MS data for metabolomics studies, GNPS allows visualization of networks of structurally similar metabolites based on MS fragmentation data within studies to derive metabolite identities. As an example, chemical annotation using SIRIUS failed to generate a high-confidence metabolite prediction for an unknown metabolite with m/z 536.4354. However, using GNPS and comparing MS fragmentation data of this metabolite peak to community-based MS/MS spectra repositories, we were able to annotate this peak as carotene. Carotene is associated with rs2293440 (ß = 0.20, p value = 4.73E-11, an exonic variant in Scavenger Receptor Class B Receptor 1 (SCARB1). SCARB1 plays a key role in lipoprotein metabolism through its action on reverse cholesterol transport and is associated with cholesterol levels in large population genomic studies42. Additionally, SCARB1 has been associated with the cellular uptake of carotenoids43, increasing our confidence in our metabolite annotation. Using GNPS, we mapped structurally similar unknown metabolites based on MS/MS spectra (Fig. 6A). A closely related metabolite peak (m/z 568.427) with a cosine similarity score for fragmentation spectra of 0.88 (Fig. 6B; high similarity score designated as >0.7) was noted to be associated with the Beta-carotene 15,15-dioxegynase (BCO1) and Intestine Specific Homeobox (ISX) loci. BCO1 converts carotenoids to retinal and ISX participates in carotenoid metabolism by regulating the expression of BCO144. Anchoring our potential metabolite identification on these genomic associations, we searched through related carotenoid species based on mass differences between this peak and carotene and annotated this compound to be a carotenoid, zeaxanthin (Fig. 6D), which we subsequently validated with a chemical standard. In addition, the GNPS network identified another closely related compound in this network of unknown peaks, cryptoxanthin, a naturally occurring carotenoid compound.
Of the 37 high-confidence metabolite IDs from unique primary unknown metabolites (33 from SIRIUS predictions with supporting sources of information and four using GNPS; Supplementary Data 6), 92% (34/37) were evaluated as having genomic evidence in support of the chemical compound predictions based on the known metabolic pathways of the assigned lead candidate gene (Supplementary Data 2). In addition, CANOPUS annotated 29 of the 37 with metabolite class, of which 27 were in support of the metabolite annotation. We confirmed 11 unknown metabolite compound annotations with commercially available chemical standards; five of these are part of novel associations in a metabolomics GWAS: all-trans-retinol, zeaxanthin, 5,6 dihydrouridine, AICA-Riboside, and cholestanone (Supplementary Data 3 and Supplementary Fig. 1). MS/MS data on all metabolites associated with genomic loci in our study are made available (Supplementary Data 7).
Discussion
In the present study, we used WGAS to identify genetic determinants of plasma metabolites in a Black population from JHS and applied novel chemical profiling and bioinformatic methods to annotate unknown metabolite peaks. In our cohort of Black individuals, the presence of ancestry-specific alleles that are nearly monoallelic in individuals of European ancestry increases the power to detect novel metabolomic associations with established cardiovascular risk loci—and represents an important first step in the broader discovery of ancestry-specific, pathogenically significant metabolic differences.
In this study, we show a novel association in a clinically relevant polymorphism in TTR (Transthyretein), which is associated with increased risk of heart failure in Black individuals, and an unknown metabolite feature we identify as all-trans-retinol. The TTR V122I polymorphism described in 3–4% of Black individuals destabilizes the TTR-RBP4 tetramer, thereby displacing RBP4 and promoting amyloid fibril formation that precipitates heart failure and death45. The reduced circulating all-trans retinol observed in our study may similarly be related to increased clearance and a potential marker of TTR-RBP4 tetramer stability, as has been postulated in cases with ATTRv V122I amyloidosis45. The effect of reduced all-trans retinol on downstream active retinol metabolites, including retinoic acid, and its potential contribution toward pathologic cardiac hypertrophy needs further study.
The Apolipoprotein E locus is a complex genomic region encoding APOE and the isoforms produced from its polymorphic alleles. APOE is involved in critical metabolic pathways, including lipid transport and metabolism46, and is associated with chronic diseases, including the development of atherosclerosis and Alzheimer’s disease, presumably mediated by its role in the transport and clearance of cholesterol and amyloid peptides to the brain47. We show an association between a missense variant in APOE (rs769455), a nearly monoallelic polymorphism in individuals of European ancestry, with an unknown metabolite peak (m/z 269.226). This unknown metabolite has an identical molecular mass to all-trans-retinol and MS fragmentation analysis suggests that it is a potential isomer of retinol. While retinyl esters form with chylomicrons transported by APOE, the role of APOE transport on retinol species and potential downstream implications for retinoid transport and bioavailability has yet-to-be elucidated. Variants near retinoic acid transporters and receptors (downstream and active metabolites of retinol) have shown associations with risk for Alzheimer’s disease and defective transport of retinol and related species has been implicated in this disease48,49. Our findings showing associations between APOE and cis-retinol may have potential implications for the biological roles of APOE in disease pathways.
We show a novel association between an ancestry-specific variant in CD36 (Platelet Glycoprotein IV; rs3211938) associated with specific plasmalogens, a subclass of phospholipids integral to cell membrane signaling and stability. CD36 is an integral transmembrane protein involved in the sequestration of malarial species, plasmodium falciparum, preventing splenic destruction of the organisms. As such, variants in CD36 offer protection against malaria infection50. In addition, CD36 functions as a receptor for several important inflammatory mediators, fatty acids, and lipids among others, and is implicated in processes including regulation of blood pressure51, lipids52, and the development of atherosclerosis in model systems53–55. Novel associations of CD36 with plasmalogen species, which are glycerophospholipids with important antioxidant properties and have been associated with the protection of endothelial cells in hypoxic conditions56, as well as with cell signaling and membrane stability50, highlight potential lipid mediators, and mechanisms for the role of CD36 in cardiometabolic disease development.
Sickle cell anemia is characterized by severe vascular abnormalities and leads to chronic cardiovascular diseases, including pulmonary hypertension, heart failure, and stroke57. In addition, individuals with the sickle cell trait are also at increased risk of developing chronic kidney disease58. Our findings show an association between the sickle mutation, rs334, and an unknown metabolite which we predict to be a lysophosphatidylcholine, a major component of red blood cell membranes. Red blood cell membrane structure is significantly altered in individuals with sickle cell disease, which affects cell shape, hemodynamics, and protein-membrane signaling interactions59. This metabolite association may be a marker of red blood cell membrane remodeling that occurs in sickle cell disease and may help elucidate mechanisms of red blood cell pathology and resulting downstream complications of ischemic and inflammatory tissue damage.
Unknown metabolomic profiling presents an opportunity for unbiased discovery in metabolomics GWAS. However, given the breadth and diversity of the metabolome, annotating metabolite peaks with subsequent validation of proposed metabolite identities is a lengthy and arduous process, traditionally requiring extensive manual curation of study features against reference databases. To facilitate a more efficient annotation pipeline for large-scale metabolomics GWAS, we performed additional tandem MS profiling to obtain MS fragmentation data for all our peaks and implemented recently developed bioinformatic methods that leverage MS/MS spectra in metabolomics studies to help annotate unknown compounds with chemical and/or class identities. Individual methods can help elucidate chemical identity by detailing compound sub-structure (SIRIUS), structural similarity to other metabolomic features (GNPS), and compound class (CANOPUS). In our study, no one method provided a complete annotation of study features, highlighting the challenges and complexity of working with unknown metabolomics. However, the use of complementary tools to elucidate metabolite identities enabled structural and/or class annotation for a majority of profiled peaks and represents the first systematic application of these bioinformatic tools to identify unknown peaks in a genomic association study.
We have previously demonstrated how genomic integration with MS fragmentation data on unknown metabolite peaks can help narrow the focus for these features, by mapping loci to predicted compounds based on shared metabolic pathways34. Here we have systematically integrated genetic associations with MS/MS spectra-based metabolite predictions from bioinformatic techniques. We find that a majority of our metabolite annotations, based on chemical identity, can be mapped to the metabolic/functional pathway of the associated lead candidate gene, providing an additional layer of support as we decipher metabolite identities associated with cardiometabolic diseases. As an example, an unknown metabolite feature with m/z 259.1036 is associated with triglycerides, DM, and CHD among other traits in JHS (Supplementary Data 4). This metabolite peak maps to the ATIC gene, which encodes 5-Aminoimidazole-4-Carboxamide Ribonucleotide Formyl transferase/IMP Cyclohydrolase, a protein-coding gene involved in purine biosynthesis. One of the top computational predictions based on tandem MS for this metabolite was AICA-Riboside (Acedesine). While this compound has been described as an AMP-activated protein kinase agonist with investigational applications in treatments for diabetes and lymphoma60–62, there is evidence that endogenous levels of the compound are physiologically important. ATIC deficiency, a recessive genetic disease, results in impaired purine synthesis and increased urinary AICA-Riboside and is marked by severe neurodevelopmental delays, growth impairment, and dysmorphic features63,64. The association of AICA-Riboside, with its canonical enzymatic pathway in WGAS, narrowed our metabolite search to this single bioinformatic prediction, which we then confirmed with a commercial standard (Supplementary Fig. 1). Thus, the integration of genetics and MS/MS spectra can sometimes enable an efficient pipeline for identification of metabolites associated with cardiometabolic disease.
Our study represents one of the few analyses of genome-metabolome integration in a Black population. As such, validation of locus-metabolite associations presents a significant challenge, especially for associations in ancestry-specific alleles, given the scarcity of both known and unknown metabolomics profiling in Black populations. In addition, metabolomics GWAS have traditionally implemented genotype imputation of SNP array using reference panels. However, in an admixed population such as JHS, limited representative reference panels necessitate the use of more accurate imputation panels or whole genome sequencing. Further efforts to apply metabolomic profiling in Black populations and integrate with WGS will be essential to replicate key locus-metabolomic findings, though many have strong biologic plausibility. To assess the novelty of our findings, we used the most up-to-date genomic databases assessing genotype-phenotype associations, as well as a manual review of prior published metabolomics GWAS. However, there is the potential that we may have missed some previously published locus-metabolite associations. In addition, though we have made significant progress in compound identification for unknown metabolite features, a significant number still lack validation with chemical standards, and there may be some inaccuracies in chemical and/or compound class annotations, though we believe the integration of genetic findings and novel bioinformatic tools have helped minimize misclassification. While we will continue to systematically validate these compound IDs, we make available our sample library of MS/MS spectra with clinical and genomic associations, which can be leveraged across the omics community, providing a crowdsourcing opportunity for metabolite identification and serving as an ongoing resource for validation of metabolite peaks.
In summary, our integrative approach toward the identification of known and unknown metabolites involved in diverse disease processes using WGAS in a Black population highlights novel and clinically relevant locus-metabolite associations. In addition, genomic integration with advanced chemical phenotyping using tandem MS improves the ability to annotate unknown metabolite peaks, the “dark matter” of the metabolome. This sample library of MS/MS spectra of metabolite features linked to genomic loci and clinical traits will improve the identification of biologically relevant metabolites.
Methods
Cohorts
The study designs and methods for JHS, MESA, and HERITAGE have been described in refs. 65–67. JHS is a prospective population-based observational study designed to investigate risk factors for cardiovascular disease (CVD) in Black individuals. In 2000–2004, 5306 Black individuals from the Jackson, Mississippi tri-county area (Hinds, Rankin, and Madison counties) were recruited for a baseline examination. Of the original cohort, 2466 individuals had whole genome sequencing and metabolomic profiling performed from baseline fasting samples and were included in the analyses. MESA included 6814 participants between the ages of 45–84 years recruited at six clinical centers across the US, who were identified as members of four racial/ethnic groups: White, Hispanic, Asian, or Black (28%). Included in the present study are 995 individuals across all four racial/ethnic groups with metabolomic profiling and WGS at baseline exam. HERITAGE enrolled a combination of self-identified white and Black family units, totaling 763 sedentary participants (38% Black) between the ages of 17–65 years, in a 20-week, graded endurance exercise training study across four clinical centers in the US and Canada in 1995. Included in the present study is a random subset of 658 individuals with baseline metabolomic profiling and genotyping.
Study approval
The Institutional Review Boards of Beth Israel Deaconess Medical Center, University of Mississippi Medical Center, University of Washington (MESA), and HERITAGE clinical centers approved the human study protocols, and all participants provided written informed consent.
LC-MS metabolite profiling
Metabolite profiling was performed using two LC-MS methods. Organic acids and other intermediary metabolites were separated using amide chromatography (Waters XBridge Amide column) and measured using targeted negative ion mode multiple reaction monitoring (MRM) MS with an LC-MS system comprised of an Agilent 1290 infinity LC coupled to an Agilent 6490 triple quadrupole mass spectrometer. MRM data were processed using Agilent Masshunter QQQ Quantitative analysis software68.
Separately, amino acids, acylcarnitines, and other polar metabolites (including both known and unknown metabolite features) were separated using hydrophilic interaction liquid chromatography (HILIC) using an Atlantis HILIC column (Waters; Mildford, MA) and measured using nontargeted, full scan, high-resolution MS in the positive ion mode over m/z 70–800 with an LC-MS system comprised of a Nexera X2 U-HPLC (Shimadzu Corp.; Marlborough, MA) coupled to a Q Exactive mass spectrometer (Thermo Fisher Scientific; Waltham, MA). Raw data were processed using TraceFinder 3.3 (Thermo Fisher Scientific; Waltham, MA) for supervised integration of a subset of identified metabolites and quality control. Progenesis QI (Nonlinear Dynamics; Newcastle upon Tyne, UK) was used for the detection and integration of both identified and unknown features. Each feature in the dataset was tracked by its measured mass to charge ratio and chromatographic retention time, which serves as a unique “tag” for each LC-MS peak. Known compounds were annotated using mixtures of authentic reference standards analyzed with each batch and reference data. These metabolites had previously been annotated in human plasma and confirmed via spiking experiments with standards and by matching retention times and MS data. Metabolites with a coefficient of variation69 greater than 30% and those missing in more than 30% of measured samples were removed from the analysis70.
Isotope-labeled internal standards were monitored in each sample to ensure proper MS sensitivity for quality control. Pooled plasma samples were interspersed at intervals of 20 participant samples in the HILIC method and intervals of 10 participant samples in the amide chromatography method to enable correction of drift in instrument sensitivity over time and to scale data between batches. We used a linear scaling approach to the nearest pooled plasma sample in the queue. An additional pooled plasma sample was interspersed at every 20 injections to determine the coefficient of variation for each metabolite and unknown over the run. Peaks were manually reviewed in a blinded fashion to assess quality.
MS/MS data acquisition
We acquired product ion mass spectra (MS/MS) on unknown features to aid their identification. All MS/MS data were acquired using an LC-MS system comprised of a Nexera X2 U-HPLC (Shimadzu Corp.; Marlborough, MA) coupled to an ID-X orbitrap mass spectrometer (Thermo Fisher Scientific; Waltham, MA). LC conditions were identical to those used in the nontargeted HILIC method, and electrospray ionization MS settings were spray voltage 3.5 kV, sheath gas 40, sweep gas 2, capillary temperature 350 °C, heater temperature 300 °C, S-lens RF 40. MS/MS data were generated using a combination of data-dependent acquisition (DDA) and inclusion list-directed MS/MS acquisition. For DDA, we used the AcquireX pipeline provided with the Thermo ID-X instrument and five consecutive injections of the JHS pooled plasma sample used for QC. The AcquireX scan cycle included an MS survey scan (70–800 m/z) followed by five MS/MS scans with a stepped collision energy of 10, 25, and 50 eV. To obtain MS/MS data on features not captured by the unsupervised AcquireX approach, we also used a directed MS/MS data acquisition approach in which lists of specific ions and retention time windows (inclusion lists) were created as required to measure spectra for ions of interest. First, we split all features in the study into 24 individual mass inclusion lists, separated based on ranges of metabolite peak retention times obtained from the initial LC-MS experiment, to improve the sensitivity of MS/MS data acquisition. We then generated MS/MS spectra using higher-energy C-trap dissociation (HCD) with stepped collision energies (10, 25, 50 V). Second, we targeted unknown features with GWAS hits and generated MS/MS with an expanded set of collision energies ranging from 10 to 50 V in 10 V increments. In order to increase the likelihood of capturing low abundance features in the JHS pool pooled plasma, samples used for MS/MS acquisition were concentrated ten-fold. Metabolites were extracted from 100 µL of pooled plasma using 900 µL of 74.9:24.9:0.2 (v/v/v) acetonitrile/methanol/formic acid. The samples were centrifuged (10 min, 9000×g, 4 °C) and the supernatants were dried under a gentle stream of nitrogen gas TurboVap LV, Biotage). Dried extracts were resuspended in 100 µL of 10:67.4:22.4:0.18 (v/v/v/v) water/acetonitrile/methanol/formic acid containing stable isotope-labeled internal standards (valine-d8, Sigma-Aldrich; St. Louis, MO; and phenylalanine-d8, Cambridge Isotope Laboratories; Andover, MA) and 10 µL were injected per LC-MS/MS analysis.
MS/MS data processing
Raw files were converted to *.mzML format files using MSConvert71 and both extracted ion chromatograms37 and matching MS/MS scans for each individual feature were generated using the R package MSnbase v. 3.1272, and in-house scripts for producing EIC and MS/MS spectra visualizations. Feature retention times and peak quality in the concentrated pools were confirmed by visually inspecting the chromatography peak shapes of each individual feature. After confirming the study retention times in the MS/MS acquisition, the extraction of MS/MS data was conducted by finding scans with precursors within ±0.2 a.m.u. of the known features and ±0.1 min from the apex of the peak detected in the MS/MS run. Matching MS/MS peaks within 5 ppm across MS/MS scans spanning the range were aggregated whenever more than one MS/MS scan was mapped to each individual feature. The resulting peak height for aggregated peaks was determined as the average of the aggregated peak intensities. Peaks inconsistently detected across MS/MS scans were removed from the final MS/MS inventory. Additionally, an electronic noise fragment detected in the MS/MS of low abundance peaks within 30 ppm of m/z of 173.46 was removed from parsed data. Parsed MS/MS was formatted as input for molecular structure predictions (*.ms) or MS/MS-based similarity networks (*.MGF). For MS/MS-based similarity predictions, spectra generated for individual features using more than one collision energy were kept as independent molecular features.
Genotyping
Whole genome sequencing (WGS) in JHS and MESA has been described in ref. 73. Participant samples underwent >30× WGS through the Trans-Omics for Precision Medicine project at the Northwest Genome Center at the University of Washington and the Broad Institute and joint genotype calling with participants in Freeze 6; genotype calling was performed by the Informatics Resource Center at the University of Michigan. Genotyping in HERITAGE was performed on the Illumina Infinium Global Screening Array. Genotypes were called using Illumina’s GenCall based on the TOP/BOT strand method. Genotype imputation to the TOPMed Freeze5 reference panel was performed using the University of Michigan Imputation Server Minimac4. In addition, phasing was performed with Eagle v2.4. Sites with call rates <90%, mismatched alleles, or invalid alleles were excluded.
Whole genome association study
Metabolite LC-MS peak areas were log-transformed and scaled to a mean of zero and standard deviation of 1 and subsequently residualized on age, sex, batch, and principal components (PCs) of ancestry 1–10 as determined by the GENetic EStimation and Inference in Structured samples (GENESIS)74, and subsequently inverse normalized. The association between these values and genetic variants was tested using linear mixed-effects models adjusted for age, sex, the genetic relationship matrix, and PCs 1–10 using the fastGWA model implemented in the GCTA software package75. Variants with a minor allele count less than 5 in a given cohort were excluded from analysis in that cohort. A Bonferroni adjusted significance threshold of 8E-11 (5 × 10−8/602 PC’s explaining 95% of the variance of metabolite levels) was used for discovery in JHS. To identify sentinel SNPs and metabolite quantitative loci (mQTL), we first defined a 1 Mb region around each SNP significantly associated with a given metabolite. Starting at the SNP with the lowest p value, overlapping mQTLs for a particular metabolite were merged. This process was repeated until no more overlapping regions existed for the given metabolite, and the lead variant was identified as the one with the lowest p value. Lead variants that were not in overlapping regions but in linkage disequilibrium (LD) with r2 ≥ 0.8 were again combined using SNPClip76, and this final merged region was designated as the mQTL, with the most significant SNP retained as the sentinel variant. Where association statistics were available in both MESA and HERITAGE, the two cohorts were meta-analyzed by the inverse-variance weighted method using the “metagen” package in R. Validation threshold was set at p < 0.05 with a consistent direction of effect.
Variant annotations
Reference allele frequencies from gnomAD and variant functional annotations using GENCODE and ClinVar disease annotations were obtained from the Functional Annotation of Variants—Online Resource (available favor.genohub.org, download date August 1, 2020).
Comparing to previous mQTLs
We used existing genomic databases and prior blood metabolomics GWAS to assess the novelty of our locus-metabolite associations. To determine whether mQTLs of known metabolites were novel, we first utilized the PhenoScanner package for R. A 1 MB region around each sentinel SNP associated with a metabolite was passed to the PhenoScanner function in R: build was set to “38”, p value to 5 × 10−8, catalog to “mQTL” (query date 12/1/2021). Novel locus-metabolite associations were cross-referenced against the GWAS Catalogue using the sentinel SNP and a 1 MB surrounding region. In addition, gene-metabolite associations were manually reviewed for novelty across 21 published GWAS of metabolomics (details of individual studies reviewed in Supplementary Methods)3–23.
mQTL and phenotype associations
To determine overlap between clinical GWAS analyses and mQTLs in this analysis, we utilized the PhenoScanner package for R. All sentinel SNPs associated with metabolite peaks as identified above were passed to the PhenoScanner function in R with the following arguments: build was set to “38”, p value to “1 × 10−5”, catalog to “GWAS”, r2 was set to “0.8”, proxies set to “None” (query date 12/1/2021).
Unknown metabolite peak annotation
Metabolite peak clustering
The electrospray ionization process used in LC-MS can generate more than one type of ion adduct of a molecule (e.g., [M + H]+, [M + Na]+, etc.), partially fragment molecules, and generate multimer ions. A single metabolite may therefore give rise to multiple unknown peaks. However, such redundant features share the same chromatographic retention time and have highly correlated signal intensities. To filter redundant features, all profiled metabolite peaks (HILIC platform) were grouped into clusters based on a retention time similarity of ±0.25 min and a signal intensity spearman correlation coefficient >0.80. The [M + H] + ion, if identified by mass differences among features in the cluster or the feature with the highest signal intensity, was identified as the primary feature.
Metabolite annotations leveraging MS/MS spectra and bioinformatic tools
Parsed MS/MS data (*.ms) were loaded into SIRIUS CSI-Finger ID version 4.7.240. Molecular formula predictions generated with Orbitrap-specific settings (MS/MS isotope scorer: ignore, mass deviation: 5 ppm, Candidates: 10, Candidates per ion: 1, possible ionizations: [M + H] + , [M + K] + , [M + Na]+). Structure elucidations were done using PubChem and the adducts [M + H] + , [M + K] + , [M + Na]+). Predictions were exported and the top three structure elucidations were parsed for each feature. Parsed MS/MS data for each metabolite peak were annotated for predicted metabolite class using ClassyFire ontology through CANOPUS39. MS/MS-based networks were built using the Global Natural Products Social Molecular Networking (GNPS)41, and the resulting networks were visualized with Cytoscape v. 3.8.277. To provide supporting information for our metabolite annotations, we searched the Human Metabolome database with m/z ± 5 ppm32.
Metabolite annotation scheme
Results from metabolite feature clustering, class, and chemical structure elucidations were integrated to annotate metabolite peaks. We subsequently assigned each annotation to a category based on levels of supporting evidence. In addition, we classified the categories of metabolite identification in accordance with the Metabolomics Standards Initiative (MSI) recommendations (Supplementary Table 2):78 Category 1: metabolite match to an authentic reference standard; Category 2: metabolite clusters with a known compound which has previously validated with standard (Category 1 and 2 corresponds to MSI Classification 1 as Identified Metabolite); Category 3: metabolite with MS/MS-based GNPS database match (MSI Classification 2: putatively annotated metabolite); Category 4: Metabolite with similar MS/MS spectra and retention time with the representative backbone of chemical standard for a compound in the metabolite family (manual curation) using in-house metabolite library (MSI classification 3: putatively annotated metabolite class). In addition, we add category 5: SIRIUS MS/MS-based chemical formula/compound predictions and Category 6: compound match using m/z search in HMDB, which are not included in the 2007 MSI classification scheme. Primary unknown metabolites with multiple sources of supporting evidence represented high-confidence metabolite annotations and were assigned specific metabolite IDs.
Genomic and metabolite pathway integration
Genetic associations with unknown metabolite peaks can be integrated with MS/MS and bioinformatic metabolite annotations in an effort to further illuminate metabolite identifications and/or offer supporting evidence for predictions based on the known pathways of the locus. Each lead candidate gene (nearest gene to sentinel SNP) was annotated for its associated metabolic pathways using the KEGG database79. Metabolite annotations were evaluated in the context of the metabolic pathways of the candidate gene or its prior GWAS associations to assess whether there was genomic evidence in support of the chemical compound identification.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Supplementary information
Acknowledgements
Dr. Tahir is supported by the John S. LaDue Memorial Fellowship in Cardiology. Dr. Katz is supported by an NHLBI T32 post-doctoral training grant (T32HL007374-0). Dr. Cruz is supported by the KL2/Catalyst Medical Research Investigator Training award from Harvard Catalyst (NIH/NCATS Award TR002542). Dr. Robbins is supported by the NHLBI K23HL150327 award. Dr. Benson is supported by the NHLBI K08HL145095 award. Dr. Ruberg is supported by NIH R01HL139671. Dr. Natarajan is supported by grants from the NHLBI (R01HL142711, R01HL148050, R01HL151283, R01HL127564, R01HL148565, R01HL135242, and R01HL151152), Fondation Leducq (TNE-18CVD04), and Massachusetts General Hospital (Paul & Phyllis Fireman Endowed Chair in Vascular Medicine). Drs. Gerszten, Wang, and Wilson are supported by NIH R01 DK081572.
The Jackson Heart Study (JHS) is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I/HHSN26800001) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I, and HHSN268201800012I) contracts from the National Heart, Lung, and Blood Institute73 and the National Institute for Minority Health and Health Disparities (NIMHD. Molecular data for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute73. Genome sequencing for “NHLBI TOPMed: The Jackson Heart Study” (phs000964.v1.p1) was performed at the Northwest Genomics Center (HHSN268201100037C). Core support, including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support, including phenotype harmonization, data management, sample-identity QC, and general program coordination, were provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I) and the TOPMed Centralized Omics REsource (CORE; contract HHSN268201600034l). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. The authors wish to thank the staff and participants of the JHS.
We thank Drs. Arthur S. Leon, D.C. Rao, James S. Skinner, Tuomo Rankinen, Jacques Gagnon, and the late Jack H. Wilmore for contributions to the planning, data collection, and conduct of the HERITAGE project. This research was partially funded by National Heart, Lung, and Blood Institute Grants HL-45670, HL-47317, HL-47321, HL-47323, and HL-47327, to Dr. Brouchard and his colleagues all in support of the HERITAGE Family Study. C.B. is partially funded by the John W. Barton Sr. Chair in Genetics and Nutrition, and NIH COBRE grant (NIH P30GM118430-01). Dr. Sarzynski is supported by R01HL146462.
TOPMed MESA Multi-Omics/MESA Study Acknowledgement
Whole genome sequencing (WGS) for the Trans-Omics in Precision Medicine program was supported by the National Heart, Lung and Blood Institute. WGS for “NHLBI TOPMed: Multi-Ethnic Study of Atherosclerosis (MESA)” (phs001416.v1.p1) was performed at the Broad Institute of MIT and Harvard (3U54HG003067-13S1). Centralized read mapping and genotype calling, along with variant quality metrics and filtering, were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1). Phenotype harmonization, data management, sample-identity QC, and general study coordination were provided by the TOPMed Data Coordinating Center (3R01HL-120393-02S1). The MESA projects are conducted and supported by the National Heart, Lung, and Blood Institute in collaboration with MESA investigators. Support for the Multi-Ethnic Study of Atherosclerosis (MESA) projects are conducted and supported by the National Heart, Lung, and Blood Institute in collaboration with MESA investigators. Support for MESA is provided by contracts 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D00006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, and UL1-TR-001420, UL1TR001881, DK063491, and R01HL105756. The authors thank the other investigators, the staff, and the participants of the MESA study for their valuable contributions. A full list of participating MESA investigators and institutes can be found at http://www.mesa-nhlbi.org.
The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the U.S. Department of Health and Human Services.
Author contributions
U.A.T., D.H.K., J.A.-P., A.G.B., T.J.W., J.G.W., C.B.P., P.N., and R.E.G contributed to the original concept of the project, planned these analyses, and formulated the methods. D.H.K., U.A.T., D.N., M.D.B., Z-Z.C., J.M.R., D.E.C., SD, L.F., S.S.R., M.E.H., A.C., J.G.W., and R.E.G., collected, organized, and contributed to the quality control and management of JHS metabolomics data. D.H.K., U.A.T., X.S., S.Z., D.N., M.D.B., J.M.R., D.E.C., S.D., L.F., S.S.R., D.S, R.P.T., P.D., K.D.T., Y.L., W.C.J., X.G., J.Y., Y.-D.I.C, A.W.M., S.S.R., J.I.R., and R.E.G., collected, generated, organized, and contributed to the quality control and management of MESA metabolomics data. D.H.K., U.A.T., D.N., M.D.B., J.M.R., D.E.C., S.D., L.F., S.S., D.S, C.B., M.A.S., and R.E.G. collected, generated, organized, and contributed to the quality control and management of HERITAGE Family Study metabolomics and genomic data. F.L.R and W.S.B contributed to the validation of unknown metabolite identities. D.J. and TOPMed performed W.G.S. from JHS and MESA. D.H.K., U.A.T, A.G.B, A.P., Z.Y., A.E., S.D., P.N., and R.E.G. developed the WGS analysis pipeline and statistical methods across all metabolomics data. D.H.K., U.A.T., J.A.-P., C.B.C., P.N., and R.E.G. analyzed the data and wrote/revised the article.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work.
Data availability
The WGAS summary statistics data generated in this study have been deposited in the GWAS catalog under accession code GCST90104476. Individual WGS data for TOPMed and metabolomic data for JHS and MESA, can be obtained by application to dbGaP with accession numbers for JHS and MESA are phs000964/phs002256.v5.p1 and phs001416.v2.p1. In addition, MS/MS spectra and analyses via Global Natural Product Structural Molecular Networking (GNPS) Job ID: aa6d11c8be15436abcb7d3d44fee5836 can be accessed at.
Competing interests
A.G.B. is a co-founder and shareholder of TenSixteen Bio. P.N. reports personal consulting fees from Amgen, Apple, AstraZeneca, Genentech/Roche, Novartis, TenSixteen Bio, Foresite Labs, and Blackstone Life Sciences, grant support from Amgen, Apple, AstraZeneca, Boston Scientific, and Novartis, is a scientific advisory board member with equity of TenSixteen Bio and geneXwell, and spousal support and equity in Vertex, all unrelated to the present work; P.N.’s interests were reviewed and are managed by Massachusetts General Hospital and Mass General Brigham in accordance with their conflict of interest policies. The remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Usman A. Tahir, Daniel H. Katz, Julian Avila-Pachecho, Clary B. Clish, Pradeep Natarajan, Robert E. Gerszten.
A list of authors and their affiliations appears at the end of the paper.
Contributor Information
Robert E. Gerszten, Email: rgerszte@bidmc.harvard.edu
NHLBI Trans-Omics for Precision Medicine 1 Consortium:
Namiko Abe, Gonçalo Abecasis, Francois Aguet, Christine Albert, Laura Almasy, Alvaro Alonso, Seth Ament, Peter Anderson, Pramod Anugu, Deborah Applebaum-Bowden, Kristin Ardlie, Dan Arking, Donna K. Arnett, Allison Ashley-Koch, Stella Aslibekyan, Tim Assimes, Paul Auer, Dimitrios Avramopoulos, Najib Ayas, Adithya Balasubramanian, John Barnard, Kathleen Barnes, R. Graham Barr, Emily Barron-Casella, Lucas Barwick, Terri Beaty, Gerald Beck, Diane Becker, Lewis Becker, Rebecca Beer, Amber Beitelshees, Emelia Benjamin, Takis Benos, Marcos Bezerra, Larry Bielak, Joshua Bis, Thomas Blackwell, John Blangero, Nathan Blue, Eric Boerwinkle, Donald W. Bowden, Russell Bowler, Jennifer Brody, Ulrich Broeckel, Jai Broome, Deborah Brown, Karen Bunting, Esteban Burchard, Carlos Bustamante, Erin Buth, Brian Cade, Jonathan Cardwell, Vincent Carey, Julie Carrier, April Carson, Cara Carty, Richard Casaburi, Juan P. Casas Romero, James Casella, Peter Castaldi, Mark Chaffin, Christy Chang, Yi-Cheng Chang, Daniel Chasman, Sameer Chavan, Bo-Juen Chen, Wei-Min Chen, Michael Cho, Seung Hoan Choi, Lee-Ming Chuang, Mina Chung, Ren-Hua Chung, Suzy Comhair, Matthew Conomos, Elaine Cornell, Carolyn Crandall, James Crapo, L. Adrienne Cupples, Joanne Curran, Jeffrey Curtis, Brian Custer, Coleen Damcott, Dawood Darbar, Sean David, Colleen Davis, Michelle Daya, Mariza de Andrade, Lisa de las Fuentes, Paul de Vries, Michael DeBaun, Ranjan Deka, Dawn DeMeo, Scott Devine, Huyen Dinh, Harsha Doddapaneni, Qing Duan, Shannon Dugan-Perez, Ravi Duggirala, Susan K. Dutcher, Charles Eaton, Lynette Ekunwe, Adel El Boueiz, Patrick Ellinor, Leslie Emery, Serpil Erzurum, Charles Farber, Jesse Farek, Tasha Fingerlin, Matthew Flickinger, Myriam Fornage, Nora Franceschini, Chris Frazar, Mao Fu, Stephanie M. Fullerton, Lucinda Fulton, Stacey Gabriel, Weiniu Gan, Shanshan Gao, Yan Gao, Margery Gass, Heather Geiger, Bruce Gelb, Mark Geraci, Soren Germer, Auyon Ghosh, Richard Gibbs, Chris Gignoux, Mark Gladwin, David Glahn, Stephanie Gogarten, Da-Wei Gong, Harald Goring, Sharon Graw, Kathryn J. Gray, Daniel Grine, Colin Gross, C. Charles Gu, Yue Guan, Namrata Gupta, Jeff Haessler, Yi Han, Patrick Hanly, Daniel Harris, Nicola L. Hawley, Jiang He, Ben Heavner, Susan Heckbert, Ryan Hernandez, David Herrington, Craig Hersh, Bertha Hidalgo, James Hixson, Brian Hobbs, John Hokanson, Elliott Hong, Karin Hoth, Chao Hsiung, Jianhong Hu, Yi-Jen Hung, Haley Huston, Chii Min Hwu, Marguerite Ryan Irvin, Rebecca Jackson, Cashell Jaquish, Jill Johnsen, Andrew Johnson, Rich Johnston, Kimberly Jones, Hyun Min Kang, Robert Kaplan, Sharon Kardia, Shannon Kelly, Eimear Kenny, Michael Kessler, Alyna Khan, Ziad Khan, Wonji Kim, John Kimoff, Greg Kinney, Barbara Konkle, Charles Kooperberg, Holly Kramer, Christoph Lange, Ethan Lange, Leslie Lange, Cathy Laurie, Cecelia Laurie, Meryl LeBoff, Jiwon Lee, Sandra Lee, Wen-Jane Lee, Jonathon LeFaive, David Levine, Dan Levy, Joshua Lewis, Xiaohui Li, Yun Li, Henry Lin, Honghuang Lin, Xihong Lin, Simin Liu, Yu Liu, Ruth J. F. Loos, Steven Lubitz, Kathryn Lunetta, James Luo, Ulysses Magalang, Michael Mahaney, Barry Make, Alisa Manning, JoAnn Manson, Lisa Martin, Melissa Marton, Susan Mathai, Rasika Mathias, Susanne May, Patrick McArdle, Merry-Lynn McDonald, Sean McFarland, Stephen McGarvey, Daniel McGoldrick, Caitlin McHugh, Becky McNeil, Hao Mei, James Meigs, Vipin Menon, Luisa Mestroni, Ginger Metcalf, Deborah A. Meyers, Emmanuel Mignot, Julie Mikulla, Nancy Min, Mollie Minear, Ryan L. Minster, Braxton D. Mitchell, Matt Moll, Zeineen Momin, May E. Montasser, Courtney Montgomery, Donna Muzny, Josyf C. Mychaleckyj, Girish Nadkarni, Rakhi Naik, Take Naseri, Sergei Nekhai, Sarah C. Nelson, Bonnie Neltner, Caitlin Nessner, Deborah Nickerson, Osuji Nkechinyere, Kari North, Jeff O’Connell, Tim O’Connor, Heather Ochs-Balcom, Geoffrey Okwuonu, Allan Pack, David T. Paik, Nicholette Palmer, James Pankow, George Papanicolaou, Cora Parker, Gina Peloso, Juan Manuel Peralta, Marco Perez, James Perry, Ulrike Peters, Patricia Peyser, Lawrence S. Phillips, Jacob Pleiness, Toni Pollin, Wendy Post, Julia Powers Becker, Meher Preethi Boorgula, Michael Preuss, Bruce Psaty, Pankaj Qasba, Dandi Qiao, Zhaohui Qin, Nicholas Rafaels, Laura Raffield, Mahitha Rajendran, Vasan S. Ramachandran, D. C. Rao, Laura Rasmussen-Torvik, Aakrosh Ratan, Susan Redline, Robert Reed, Catherine Reeves, Elizabeth Regan, Alex Reiner, Muagututiâ€ã Sefuiva Reupena, Ken Rice, Rebecca Robillard, Nicolas Robine, Dan Roden, Carolina Roselli, Ingo Ruczinski, Alexi Runnels, Pamela Russell, Sarah Ruuska, Ester Cerdeira Sabino, Danish Saleheen, Shabnam Salimi, Sejal Salvi, Steven Salzberg, Kevin Sandow, Vijay G. Sankaran, Jireh Santibanez, Karen Schwander, David Schwartz, Frank Sciurba, Christine Seidman, Jonathan Seidman, Frédéric Sériès, Vivien Sheehan, Stephanie L. Sherman, Amol Shetty, Aniket Shetty, Wayne Hui-Heng Sheu, M. Benjamin Shoemaker, Brian Silver, Edwin Silverman, Robert Skomro, Albert Vernon Smith, Jennifer Smith, Josh Smith, Nicholas Smith, Tanja Smith, Sylvia Smoller, Beverly Snively, Michael Snyder, Tamar Sofer, Nona Sotoodehnia, Adrienne M. Stilp, Garrett Storm, Elizabeth Streeten, Jessica Lasky Su, Yun Ju Sung, Jody Sylvia, Adam Szpiro, Daniel Taliun, Hua Tang, Margaret Taub, Matthew Taylor, Simeon Taylor, Marilyn Telen, Timothy A. Thornton, Machiko Threlkeld, Lesley Tinker, David Tirschwell, Sarah Tishkoff, Hemant Tiwari, Catherine Tong, Michael Tsai, Dhananjay Vaidya, David Van Den Berg, Peter VandeHaar, Scott Vrieze, Tarik Walker, Robert Wallace, Avram Walts, Fei Fei Wang, Heming Wang, Jiongming Wang, Karol Watson, Jennifer Watt, Daniel E. Weeks, Joshua Weinstock, Bruce Weir, Scott T. Weiss, Lu-Chen Weng, Jennifer Wessel, Cristen Willer, Kayleen Williams, L. Keoki Williams, Carla Wilson, Lara Winterkorn, Quenna Wong, Joseph Wu, Huichun Xu, Lisa Yanek, Ivana Yang, Ketian Yu, Seyedeh Maryam Zekavat, Yingze Zhang, Snow Xueyan Zhao, Wei Zhao, Xiaofeng Zhu, Elad Ziv, Michael Zody, and Sebastian Zoellner
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-022-32275-3.
References
- 1.McGarrah RW, Crown SB, Zhang G-F, Shah SH, Newgard CB. Cardiovascular metabolomics. Circ. Res. 2018;122:1238–1258. doi: 10.1161/CIRCRESAHA.117.311002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Spratlin JL, Serkova NJ, Eckhardt SG. Clinical applications of metabolomics in oncology: a review. Clin. Cancer Res. 2009;15:431–440. doi: 10.1158/1078-0432.CCR-08-1059. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Gerszten RE, Wang TJ. The search for new cardiovascular biomarkers. Nature. 2008;451:949–952. doi: 10.1038/nature06802. [DOI] [PubMed] [Google Scholar]
- 4.Gieger C, et al. Genetics meets metabolomics: a genome-wide association study of metabolite profiles in human serum. PLoS Genet. 2008;4:e1000282. doi: 10.1371/journal.pgen.1000282. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Hicks AA, et al. Genetic determinants of circulating sphingolipid concentrations in European populations. PLoS Genet. 2009;5:e1000672. doi: 10.1371/journal.pgen.1000672. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Illig T, et al. A genome-wide perspective of genetic variation in human metabolism. Nat. Genet. 2010;42:137–141. doi: 10.1038/ng.507. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Lemaitre RN, et al. Genetic loci associated with plasma phospholipid n-3 fatty acids: a meta-analysis of genome-wide association Studies from the CHARGE Consortium. PLoS Genet. 2011;7:e1002193. doi: 10.1371/journal.pgen.1002193. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Suhre K, et al. Human metabolic individuality in biomedical and pharmaceutical research. Nature. 2011;477:54–60. doi: 10.1038/nature10354. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Suhre K, et al. A genome-wide association study of metabolic traits in human urine. Nat. Genet. 2011;43:565–569. doi: 10.1038/ng.837. [DOI] [PubMed] [Google Scholar]
- 10.Tukiainen T, et al. Detailed metabolic and genetic characterization reveals new associations for 30 known lipid loci. Hum. Mol. Genet. 2011;21:1444–1455. doi: 10.1093/hmg/ddr581. [DOI] [PubMed] [Google Scholar]
- 11.Demirkan A, et al. Genome-wide association study identifies novel loci associated with circulating phospho- and sphingolipid concentrations. PLoS Genet. 2012;8:e1002490. doi: 10.1371/journal.pgen.1002490. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Inouye M, et al. Novel loci for metabolic networks and multi-tissue expression studies reveal genes for atherosclerosis. PLoS Genet. 2012;8:e1002907. doi: 10.1371/journal.pgen.1002907. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Kettunen J, et al. Genome-wide association study identifies multiple loci influencing human serum metabolite levels. Nat. Genet. 2012;44:269–276. doi: 10.1038/ng.1073. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Krumsiek J, et al. Mining the unknown: a systems approach to metabolite identification combining genetic and metabolic information. PLoS Genet. 2012;8:e1003005. doi: 10.1371/journal.pgen.1003005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Shin S-Y, et al. An atlas of genetic influences on human blood metabolites. Nat. Genet. 2014;46:543–550. doi: 10.1038/ng.2982. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Yu B, et al. Genetic determinants influencing human serum metabolome among African Americans. PLoS Genet. 2014;10:e1004212. doi: 10.1371/journal.pgen.1004212. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Burkhardt R, et al. Integration of genome-wide SNP data and gene-expression profiles reveals six novel loci and regulatory mechanisms for amino acids and acylcarnitines in whole blood. PLoS Genet. 2015;11:e1005510. doi: 10.1371/journal.pgen.1005510. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Demirkan A, et al. Insight in genome-wide association of metabolite quantitative traits by exome sequence analyses. PLoS Genet. 2015;11:e1004835. doi: 10.1371/journal.pgen.1004835. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Draisma HHM, et al. Genome-wide association study identifies novel genetic variants contributing to variation in blood metabolite levels. Nat. Commun. 2015;6:7208. doi: 10.1038/ncomms8208. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Rhee EP, et al. An exome array study of the plasma metabolome. Nat. Commun. 2016;7:12360. doi: 10.1038/ncomms12360. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Long T, et al. Whole-genome sequencing identifies common-to-rare variants associated with human blood metabolites. Nat. Genet. 2017;49:568–578. doi: 10.1038/ng.3809. [DOI] [PubMed] [Google Scholar]
- 22.Yazdani A, et al. Genome analysis and pleiotropy assessment using causal networks with loss of function mutation and metabolomics. BMC Genomics. 2019;20:395. doi: 10.1186/s12864-019-5772-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Lotta LA, et al. A cross-platform approach identifies genetic regulators of human metabolism and health. Nat. Genet. 2021;53:54–64. doi: 10.1038/s41588-020-00751-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Raffler J, et al. Identification and MS-assisted interpretation of genetically influenced NMR signals in human plasma. Genome Med. 2013;5:13. doi: 10.1186/gm417. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Baldassari AR, et al. Multi-ethnic genome-wide association study of decomposed cardioelectric phenotypes illustrates strategies to identify and characterize evidence of shared genetic effects for complex traits. Circ. Genom. Precis. Med. 2020;13:e002680. doi: 10.1161/CIRCGEN.119.002680. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Roselli C, et al. Multi-ethnic genome-wide association study for atrial fibrillation. Nat. Genet. 2018;50:1225–1233. doi: 10.1038/s41588-018-0133-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Wyss AB, et al. Multiethnic meta-analysis identifies ancestry-specific and cross-ancestry loci for pulmonary function. Nat. Commun. 2018;9:2976. doi: 10.1038/s41467-018-05369-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.McClellan JM, Lehner T, King M-C. Gene discovery for complex traits: lessons from Africa. Cell. 2017;171:261–264. doi: 10.1016/j.cell.2017.09.037. [DOI] [PubMed] [Google Scholar]
- 29.Stein EA, et al. Effect of a monoclonal antibody to PCSK9 on LDL cholesterol. N. Engl. J. Med. 2012;366:1108–1118. doi: 10.1056/NEJMoa1105803. [DOI] [PubMed] [Google Scholar]
- 30.Menni C, et al. Biomarkers for type 2 diabetes and impaired fasting glucose using a nontargeted metabolomics approach. Diabetes. 2013;62:4270–4276. doi: 10.2337/db13-0570. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Sévin DC, Kuehne A, Zamboni N, Sauer U. Biological insights through nontargeted metabolomics. Curr. Opin. Biotechnol. 2015;34:1–8. doi: 10.1016/j.copbio.2014.10.001. [DOI] [PubMed] [Google Scholar]
- 32.Wishart DS, et al. HMDB: the human metabolome database. Nucleic Acids Res. 2007;35:D521–D526. doi: 10.1093/nar/gkl923. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Zang X, Monge ME, Fernández FM. Mass spectrometry-based non-targeted metabolic profiling for disease detection: recent developments. Trends Anal. Chem. 2019;118:158–169. doi: 10.1016/j.trac.2019.05.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.O’Sullivan JF, et al. Dimethylguanidino valeric acid is a marker of liver fat and predicts diabetes. J. Clin. Invest. 2017;127:4394–4402. doi: 10.1172/JCI95995. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Staley JR, et al. PhenoScanner: a database of human genotype-phenotype associations. Bioinformatics. 2016;32:3207–3209. doi: 10.1093/bioinformatics/btw373. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Buniello A, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47:D1005–d1012. doi: 10.1093/nar/gky1120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Karczewski KJ, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Gold L, et al. Aptamer-based multiplexed proteomic technology for biomarker discovery. PLoS ONE. 2010;5:e15004. doi: 10.1371/journal.pone.0015004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Dührkop K, et al. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nat. Biotechnol. 2021;39:462–471. doi: 10.1038/s41587-020-0740-8. [DOI] [PubMed] [Google Scholar]
- 40.Dührkop K, et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat. Methods. 2019;16:299–302. doi: 10.1038/s41592-019-0344-8. [DOI] [PubMed] [Google Scholar]
- 41.Wang M, et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat. Biotechnol. 2016;34:828–837. doi: 10.1038/nbt.3597. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Yang X, et al. SCARB1 gene variants are associated with the phenotype of combined high high-density lipoprotein cholesterol and high lipoprotein (a) Circ Cardiovasc. Genet. 2016;9:408–418. doi: 10.1161/CIRCGENETICS.116.001402. [DOI] [PubMed] [Google Scholar]
- 43.van Bennekum A, et al. Class B scavenger receptor-mediated intestinal absorption of dietary β-carotene and cholesterol. Biochemistry. 2005;44:4517–4525. doi: 10.1021/bi0484320. [DOI] [PubMed] [Google Scholar]
- 44.Widjaja-Adhi MAK, et al. Transcription factor ISX mediates the cross talk between diet and immunity. Proc. Natl Acad. Sci. USA. 2017;114:11530–11535. doi: 10.1073/pnas.1714963114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Arvanitis M, et al. Identification of transthyretin cardiac amyloidosis using serum retinol-binding protein 4 and a clinical prediction model. JAMA Cardiol. 2017;2:305–313. doi: 10.1001/jamacardio.2016.5864. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Hauser PS, Narayanaswami V, Ryan RO. Apolipoprotein E: from lipid transport to neurobiology. Prog. Lipid Res. 2011;50:62–74. doi: 10.1016/j.plipres.2010.09.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.van Duijn CM, et al. Apolipoprotein E4 allele in a population–based study of early–onset Alzheimer’s disease. Nat. Genet. 1994;7:74–78. doi: 10.1038/ng0594-74. [DOI] [PubMed] [Google Scholar]
- 48.Grimm MOW, Mett J, Hartmann T. The impact of vitamin E and other fat-soluble vitamins on Alzheimer´s disease. Int J. Mol. Sci. 2016;17:1785. doi: 10.3390/ijms17111785. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Jiménez-Jiménez FJ, et al. Cerebrospinal fluid levels of alpha-tocopherol (vitamin E) in Alzheimer’s disease. J. Neural Transm. 1997;104:703–710. doi: 10.1007/BF01291887. [DOI] [PubMed] [Google Scholar]
- 50.Omi K, et al. CD36 polymorphism is associated with protection from cerebral malaria. Am. J. Hum. Genet. 2003;72:364–374. doi: 10.1086/346091. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Pravenec M, et al. Identification of renal Cd36 as a determinant of blood pressure and risk for hypertension. Nat. Genet. 2008;40:952–954. doi: 10.1038/ng.164. [DOI] [PubMed] [Google Scholar]
- 52.Kuwasako T, et al. Lipoprotein abnormalities in human genetic CD36 deficiency associated with insulin resistance and abnormal fatty acid metabolism. Diabetes Care. 2003;26:1647–1648. doi: 10.2337/diacare.26.5.1647-a. [DOI] [PubMed] [Google Scholar]
- 53.Park YM. CD36, a scavenger receptor implicated in atherosclerosis. Exp. Mol. Med. 2014;46:e99–e99. doi: 10.1038/emm.2014.38. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Febbraio M, Hajjar DP, Silverstein RL. CD36: a class B scavenger receptor involved in angiogenesis, atherosclerosis, inflammation, and lipid metabolism. J. Clin. Invest. 2001;108:785–791. doi: 10.1172/JCI14006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Moore KJ, Freeman MW. Scavenger receptors in atherosclerosis. Arterioscler. Thromb. Vasc. Biol. 2006;26:1702–1711. doi: 10.1161/01.ATV.0000229218.97976.43. [DOI] [PubMed] [Google Scholar]
- 56.Zoeller RA, et al. Increasing plasmalogen levels protects human endothelial cells during hypoxia. Am. J. Physiol. Heart Circ. Physiol. 2002;283:H671–H679. doi: 10.1152/ajpheart.00524.2001. [DOI] [PubMed] [Google Scholar]
- 57.Ranque B, et al. Arterial stiffness impairment in sickle cell disease associated with chronic vascular complications. Circulation. 2016;134:923–933. doi: 10.1161/CIRCULATIONAHA.115.021015. [DOI] [PubMed] [Google Scholar]
- 58.Gladwin MT, Sachdev V. Cardiovascular abnormalities in sickle cell disease. J. Am. Coll. Cardiol. 2012;59:1123–1133. doi: 10.1016/j.jacc.2011.10.900. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Liu SC, et al. Red cell membrane remodeling in sickle cell anemia. Sequestration of membrane lipids and proteins in Heinz bodies. J. Clin. Invest. 1996;97:29–36. doi: 10.1172/JCI118402. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Vincent MF, Marangos PJ, Gruber HE, Van den Berghe G. Inhibition by AICA riboside of gluconeogenesis in isolated rat hepatocytes. Diabetes. 1991;40:1259–1266. doi: 10.2337/diab.40.10.1259. [DOI] [PubMed] [Google Scholar]
- 61.Campàs C, Santidrián AF, Domingo A, Gil J. Acadesine induces apoptosis in B cells from mantle cell lymphoma and splenic marginal zone lymphoma. Leukemia. 2005;19:292–294. doi: 10.1038/sj.leu.2403593. [DOI] [PubMed] [Google Scholar]
- 62.Santidrián AF, et al. AICAR induces apoptosis independently of AMPK and p53 through up-regulation of the BH3-only proteins BIM and NOXA in chronic lymphocytic leukemia cells. Blood. 2010;116:3023–3032. doi: 10.1182/blood-2010-05-283960. [DOI] [PubMed] [Google Scholar]
- 63.Marie S, et al. AICA-ribosiduria: a novel, neurologically devastating inborn error of purine biosynthesis caused by mutation of ATIC. Am. J. Hum. Genet. 2004;74:1276–1281. doi: 10.1086/421475. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Ramond F, et al. AICA-ribosiduria due to ATIC deficiency: delineation of the phenotype with three novel cases, and long-term update on the first case. J. Inherit. Metab. Dis. 2020;43:1254–1264. doi: 10.1002/jimd.12274. [DOI] [PubMed] [Google Scholar]
- 65.Taylor HA, Jr., et al. Toward resolution of cardiovascular health disparities in African Americans: design and methods of the Jackson Heart Study. Ethn. Dis. 2005;15:S6-4-17. [PubMed] [Google Scholar]
- 66.Bild DE, et al. Multi-ethnic study of atherosclerosis: objectives and design. Am. J. Epidemiol. 2002;156:871–881. doi: 10.1093/aje/kwf113. [DOI] [PubMed] [Google Scholar]
- 67.Bouchard C, et al. The HERITAGE family study. Aims, design, and measurement protocol. Med Sci. Sports Exerc. 1995;27:721–729. doi: 10.1249/00005768-199505000-00015. [DOI] [PubMed] [Google Scholar]
- 68.Tahir UA, et al. Metabolomic profiles and heart failure risk in black adults: insights from the Jackson Heart Study. Circ. Heart Fail. 2021;14:e007275. doi: 10.1161/CIRCHEARTFAILURE.120.007275. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Bar N, et al. A reference map of potential determinants for the human serum metabolome. Nature. 2020;588:135–140. doi: 10.1038/s41586-020-2896-2. [DOI] [PubMed] [Google Scholar]
- 70.Do KT, et al. Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies. Metabolomics. 2018;14:128–128. doi: 10.1007/s11306-018-1420-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Chambers MC, et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 2012;30:918–920. doi: 10.1038/nbt.2377. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Gatto L, Lilley KS. MSnbase-an R/Bioconductor package for isobaric tagged mass spectrometry data visualization, processing and quantitation. Bioinformatics. 2012;28:288–289. doi: 10.1093/bioinformatics/btr645. [DOI] [PubMed] [Google Scholar]
- 73.Raffield LM, et al. D-dimer in African Americans: whole genome sequence analysis and relationship to cardiovascular disease risk in the Jackson Heart Study. Arterioscler. Thromb. Vasc. Biol. 2017;37:2220–2227. doi: 10.1161/ATVBAHA.117.310073. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Conomos MP, Miller MB, Thornton TA. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet. Epidemiol. 2015;39:276–293. doi: 10.1002/gepi.21896. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Jiang L, et al. A resource-efficient tool for mixed model association analysis of large-scale data. Nat. Genet. 2019;51:1749–1755. doi: 10.1038/s41588-019-0530-8. [DOI] [PubMed] [Google Scholar]
- 76.Myers, T. A., Chanock, S. J. & Machiela, M. J. LDlinkR: an R package for rapidly calculating linkage disequilibrium statistics in diverse populations. Front. Genet.11, 157 (2020). [DOI] [PMC free article] [PubMed]
- 77.Otasek D, Morris JH, Bouças J, Pico AR, Demchak B. Cytoscape automation: empowering workflow-based network analysis. Genome Biol. 2019;20:185. doi: 10.1186/s13059-019-1758-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78.Sumner LW, et al. Proposed minimum reporting standards for chemical analysis Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI) Metabolomics. 2007;3:211–221. doi: 10.1007/s11306-007-0082-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The WGAS summary statistics data generated in this study have been deposited in the GWAS catalog under accession code GCST90104476. Individual WGS data for TOPMed and metabolomic data for JHS and MESA, can be obtained by application to dbGaP with accession numbers for JHS and MESA are phs000964/phs002256.v5.p1 and phs001416.v2.p1. In addition, MS/MS spectra and analyses via Global Natural Product Structural Molecular Networking (GNPS) Job ID: aa6d11c8be15436abcb7d3d44fee5836 can be accessed at.