Abstract
Although genome-wide association studies (GWAS) have successfully linked genetic risk loci to various disorders, identifying underlying cellular biological mechanisms remains challenging due to the complex nature of common diseases. We established a framework using human peripheral blood cells, physical, chemical and pharmacological perturbations, and flow cytometry-based functional readouts to reveal latent cellular processes and performed GWAS based on these evoked traits in up to 2,600 individuals. We identified 119 genomic loci implicating 96 genes associated with these cellular responses and discovered associations between evoked blood phenotypes and subsets of common diseases. We found a population of pro-inflammatory anti-apoptotic neutrophils prevalent in individuals with specific subsets of cardiometabolic disease. Multigenic models based on this trait predicted the risk of developing chronic kidney disease in type 2 diabetes patients. By expanding the phenotypic space for human genetic studies, we could identify variants associated with large effect response differences, stratify patients and efficiently characterize the underlying biology.
Subject terms: Genome-wide association studies, High-throughput screening, Personalized medicine, Translational research
Genome-wide analyses of blood cell phenotypes derived from perturbations coupled with flow cytometry-based functional readouts identify loci associated with latent cellular traits, yielding insights into biological mechanisms underlying common diseases.
Main
Precision medicine strives to reclassify complex heterogeneous diseases into distinct biologically defined groups, thereby enabling targeted therapies and improved outcomes. Examples include the subdivision of common cancers by somatic driver mutations1, the discovery of eosinophilic variants of asthma2 and the recognition that some presentations of heart failure may arise from the accumulation of amyloidogenic proteins, which can be subdivided further based on the aggregating protein3. The realization of precision medicine has been hindered by the lack of readily available measures of the activities of discrete biological pathways in most common diseases. Historical approaches have focused on mining large patient biobanks combining archived DNA, RNA and serum or plasma samples with clinical records4. Although such strategies have identified common genetic variants associated with clinical outcomes, they have typically not been successful at capturing the underlying cell biology, limiting their utility for generating mechanistic insights with therapeutic implications5,6.
We aimed to establish a framework that bridges genetic variants and complex diseases through standardized phenotyping of primary human cells. We used live human blood cells, as these reflect physiological processes, disease states and environmental factors, including active therapies. For example, dysregulation of hematopoietic processes can result in disease progression via mechanisms such as the contribution of inflammation to atherosclerosis and insulin resistance7–9 or hyperactive coagulation in pathological thrombosis10–12. In addition to circulating cells with their repertoire of responses, blood plasma contains hormones, secreted proteins, metabolites, cell-free DNA, microparticles and extracellular vesicles that can carry signals to blood cells or other cell types. Peripheral blood may offer a diagnostic window into multiple organ systems and integrative physiology13–15.
Previous genome-wide association studies (GWAS) of whole blood primarily focused on complete blood counts (CBCs), that is, clinical parameters describing the numbers, volumes and distributions of leukocytes, erythrocytes and platelets, and have mapped the genetic architecture of hematopoiesis and blood diseases in detail16–18. A recent study expanded measured phenotypes to include flow cytometry-derived parameters with the aim of better describing cellular function19. The Human Functional Genomics Project profiled cytokine production and baseline immune parameters in response to pathogen challenges20. Other studies have revealed the genetic basis of platelet aggregation in response to known agonists21,22. However, these studies did not consider the dynamic responses of blood cells to environmental conditions, which likely contribute to the roles of blood cells in disease development, progression and prevention.
We hypothesized that treating whole blood ex vivo with diverse stressors or stimuli would enable the identification of latent differential cellular responses and new disease-associated endophenotypes. We anticipated that this expansion of phenotypic space would evoke traits determined by large effect size common alleles, enabling efficient target identification and improving the prediction of incident events. Moreover, given that biological pathways are reused across diverse tissues and organ systems, insights into whole blood may be relevant to a range of conditions originating in different tissues. By identifying intermediate cellular phenotypes, we sought to define subcategories of disease and specific pathophysiologic mechanisms that can be targeted more directly.
Results
Chemical perturbations expand the phenotypic space of blood profiles
In clinical settings, whole-blood cytometry is used to quantify circulating cells as part of standardized diagnostic tests. We adapted a widely used whole-blood cytometry analyzer (Sysmex XN-1000) to systematically profile peripheral blood from over 4,700 study participants (donors) under 37 conditions (36 perturbations and baseline), genotyped more than 2,600 donors and performed GWAS for all blood perturbation profiles (Fig. 1a). We recorded side scatter (SSC), forward scatter (FSC) and side fluorescence (SFL) of blood cells in four fluorescence channels, each with its own dye (white cell differential channel by fluorescence (WDF), white count and nucleated red blood cells (WNR), reticulocyte (RET) and platelet F (PLT-F)), which quantify morphological and intracellular properties. Chemical stressors evoked distinct cellular states that were not typically observed under baseline conditions, enabling the detection of new cell populations in three-dimensional cytometry measurements (Extended Data Fig. 1). We determined cellular gates based on the empirical distributions of blood cells under perturbation conditions and defined parameter sets for all observed cell populations (Fig. 1b and Extended Data Fig. 2). The perturbation conditions represented discrete classes of exposure likely to contribute to blood cell responses: (1) simulated physiological stressors; (2) chemical stressors; (3) gut microbiome metabolites; and (4) drugs with known mechanisms of action (Supplementary Table 1). We recorded up to 37 condition-specific blood responses for each donor and characterized each cell population quantitatively by its cell count and by the median and s.d. of its SSC, FSC and SFL signals (Fig. 1c and Supplementary Table 2). Compared to the baseline, each perturbation evoked particular changes in the characteristics of different blood lineages, resulting in a series of distinct cellular profiles (Extended Data Figs. 1 and 2 and Supplementary Fig. 1). With these chemical perturbations, we expanded quantification for each donor from 278 parameters to more than 4,000 parameters on average, greatly expanding the phenotypic space that could be interrogated.
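The per-population summary described above can be sketched in Python as follows. This is a minimal illustration, not the study's pipeline. It assumes that events have already been gated into populations, that each event is an (SSC, FSC, SFL) tuple, and that CV is computed as s.d. divided by mean. The function name `population_parameters` and the underscore-joined trait names (modeled on names such as WDF_Pam3CSK4_19h_NE1_Med_FSC in the text) are illustrative choices.

```python
import statistics

READOUTS = ("SSC", "FSC", "SFL")  # order of values in each event tuple


def population_parameters(events, channel, condition, population):
    """Summarize already-gated events for one cell population.

    events: list of (ssc, fsc, sfl) tuples, one per cell (at least two cells).
    Returns a dict mapping illustrative trait names, e.g.
    'WDF_Baseline_NE1_Med_SSC', to values.
    """
    prefix = f"{channel}_{condition}_{population}"
    params = {f"{prefix}_Count": len(events)}
    for idx, readout in enumerate(READOUTS):
        values = [event[idx] for event in events]
        sd = statistics.stdev(values)  # sample standard deviation
        mean = statistics.fmean(values)
        params[f"{prefix}_Med_{readout}"] = statistics.median(values)
        params[f"{prefix}_SD_{readout}"] = sd
        params[f"{prefix}_CV_{readout}"] = sd / mean  # assumed CV = s.d. / mean
    return params


# Usage: three hypothetical neutrophil events from a baseline WDF measurement.
example = population_parameters(
    [(100, 50, 20), (110, 55, 22), (90, 45, 18)], "WDF", "Baseline", "NE1"
)
print(example["WDF_Baseline_NE1_Med_SSC"])
```

Each population yields one count plus three statistics per readout. Running this over every population, channel and condition is what multiplies a few hundred baseline parameters into thousands per donor.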
Across the 36 perturbations, we collected measurements from 650 to 3,300 donors per condition. We then associated blood-response profiles with clinical traits, including quantitative lab values and diagnostic codes, to identify clinical endpoints and disease syndromes reflected in the evoked blood-response readouts (Fig. 1d). We also identified genetic loci associated with blood perturbation responses, which were often specific to perturbation conditions, cell populations and physical readouts (Fig. 1e). When comparing blood-response profiles, the perturbation conditions, readouts and associated genetic loci formed clusters of related conditions and cell types (Extended Data Fig. 3), suggesting the evoked blood profiles are informative for specific biological processes.
Perturbational conditions yield new genetic associations
To determine genetic variants associated with perturbation blood cell responses, we tested linear, univariate associations of 278 cellular phenotypes in 37 different conditions against >3.5 million imputed variants in 260–2,200 donors. We clumped variants in high linkage disequilibrium (LD) and identified more than 100 genomic loci that were significantly associated with at least two blood perturbation readouts (Supplementary Data 1). We identified 48 unique, nonoverlapping regions with nearby candidate genes (Fig. 2a and Table 1). Approximately half of these regions (25 of 48) had previously been described as blood biomarker associations under baseline conditions with parameters that are part of CBC studies encompassing 170,000 to over 700,000 individuals16–18. For many of these previously reported loci (12 of 25), we observed new associations in previously unreported cell types; for example, white blood cell (WBC) responses were associated with SLC38A3, whereas only RET-based associations had previously been described17. Additionally, we identified 23 regions associated with blood cell responses to perturbations that had not been described previously, for example, variants in TMCO4 associated with the response to empagliflozin. This gene had previously been associated with chronic inflammatory diseases23. Most associations we observed were specific to a particular blood lineage, such as RET readouts associated with TRIM58 or neutrophil-specific associations with PFKP and ACSL1.
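The univariate test described above regresses each readout on genotype dosage, one variant at a time. The sketch below is a simplified, hypothetical version: `linear_association` is not from the study. It fits ordinary least squares without covariates and approximates the two-sided P value from the t statistic with a normal distribution. That approximation is close for hundreds to thousands of donors but is not exact for small samples. A full analysis would use the t distribution and adjust for covariates.

```python
import math


def linear_association(dosages, trait):
    """Simple linear regression of a blood readout on genotype dosage.

    dosages: effect-allele dosages (0-2) per donor; trait: readout per donor.
    Returns (beta, se, t, p). No covariates are included, and the two-sided
    P value uses a normal approximation to the t distribution.
    """
    n = len(dosages)
    if n != len(trait) or n < 3:
        raise ValueError("need matching inputs for at least 3 donors")
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance uses n - 2 d.f.
    if se == 0:
        raise ValueError("perfect fit; standard error is zero")
    t = beta / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal-approximation P
    return beta, se, t, p


# Usage with six hypothetical donors.
print(linear_association([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]))
```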
Table 1.
Lead SNP | rsID | P value | Candidate genes | CADD consequence | Top trait | Obs. | Previous association |
---|---|---|---|---|---|---|---|
1:20032226:G:A | rs10917522 | 3.09 × 10−9 | TMCO4 | Intron | WDF Empa 1.5 h NE3 CV SFL | 380 | – |
1:25703156:C:T | rs644592 | 5.58 × 10−18 | RHCE | Intron | RET rotenone 6 h ov. RET1 CV SFL | 943 | RBC (a) |
1:89840389:T:C | rs7550358 | 1.55 × 10−8 | GBP6 | Intron | RET captopril 5.5 h RET2 Count | 353 | – |
1:103361529:A:C | rs72683260 | 3.22 × 10−9 | COL11A1 | Intron | RET TMAO 3.5 h RBC2 Med SFL | 361 | – |
1:225579918:A:T | rs41268717 | 6.90 × 10−9 | DNAH14, LBR | Intron | WNR water 15 h WBC2 Med FSC | 1,423 | RBC (a) |
1:248039451:C:T | rs3811444 | 1.37 × 10−11 | TRIM58 | Missense | RET KCl 17 h RET1 SD SSC | 1,410 | RBC (a), PLT (b) |
2:203226371:G:A | rs72925015 | 1.24 × 10−8 | BMPR2 | Upstream | WDF water 15 h MO2 Med SSC | 1,392 | RBC (a) |
3:16551213:C:G | rs2881513 | 3.78 × 10−8 | RFTN1 | Regulatory, intron | WNR nigericin 0.5 h UK1 CV FSC | 327 | – |
3:49774658:G:A | rs73077175 | 1.01 × 10−13 | CDHR4-UBA7, IP6K1 | Intron | WDF baseline NE2 Med SFL | 1,629 | RBC (a) |
3:50255663:C:T | rs35926495 | 8.32 × 10−25 | SLC38A3 | Intron | WDF baseline NE2 Med SFL | 1,664 | RBC (a) |
3:50374293:A:G | rs2073499 | 9.63 × 10−9 | HYAL3, RASSF1 | Regulatory, intron | WDF baseline NE2/NE4 ratio | 1,565 | BASO (a) |
3:51406862:A:G | rs111614418 | 2.29 × 10−8 | DOCK3 | Intron | WNR LPS 18 h WBC Med SSC | 1,416 | EO (b) |
3:56849749:T:C | rs1354034 | 7.23 × 10−10 | ARHGEF3 | Intron | RET KCl 17 h PLT Med SFL | 1,397 | PLT, RBC, LY (b) |
3:94702472:C:T | rs1432474 | 1.92 × 10−8 | LINC00879 | Intron | WDF water 23 h MO2 CV FSC | 1,415 | – |
4:38677227:A:C | rs34089598 | 7.94 × 10−12 | KLF3, KLF3-AS1 | Regulatory, intron | WNR Pam3CSK4 19 h WBC CV FSC | 1,310 | WBC (b) |
4:38798648:C:A | rs5743618 | 8.20 × 10−103 | TLR1, TLR6, TLR10 | Missense | WDF Pam3CSK4 19 h NE1 Med FSC | 1,300 | – |
4:178716833:T:C | rs10030190 | 4.08 × 10−8 | LINC01098 | Intron | WNR baseline UK1 CV FSC | 1,486 | – |
4:185602707:G:A | rs72703519 | 2.92 × 10−20 | CASP3-ACSL1 | Intron | WDF KCl 17 h NE2/NE4 ratio | 1,336 | – |
4:185665118:G:A | rs12513029 | 1.55 × 10−13 | CASP3-ACSL1 | Intergenic | WDF colchicine 20 h NE4 SD SFL | 1,296 | PLT (a) |
6:25719210:T:C | rs9358870 | 3.71 × 10−9 | SCGN | Intergenic | RET DMSO 4.5 h RBC1 SD FSC | 355 | PLT (b) |
6:25878848:A:G | rs55925606 | 2.97 × 10−9 | HFE-TRIM38 | Upstream and downstream | RET DMSO 4.5 h RBC1 CV FSC | 381 | RBC, PLT (b) |
7:18398911:C:T | rs62450075 | 9.82 × 10−9 | HDAC9 | Intron | RET KCl 17 h RBC1 SD FSC | 1,381 | – |
7:24832308:A:G | rs4719781 | 2.50 × 10−18 | DFNA5, OSBPL3 | Downstream | WNR ciprofloxacin 22 h BASO Med SSC | 1,260 | – |
7:28773957:A:C | rs73075771 | 1.19 × 10−8 | CREB5 | Intron | WNR TMAO 3.5 h UK1 CV SSC | 325 | WBC (b) |
7:92408370:C:T | rs445 | 2.30 × 10−14 | CDK6 | Regulatory, intron | WDF baseline EO1 Med SSC | 1,698 | WBC, RBC (b) |
7:128371246:C:T | rs41274144 | 6.64 × 10−9 | GARIN1B | 3′ UTR | WNR TMAO 3.5 h PLT CV SSC | 327 | – |
8:4096691:T:C | rs28522529 | 2.87 × 10−10 | CSMD1 | Intron | WDF captopril 5.5 h MO2 CV SFL | 343 | – |
8:6828115:G:T | rs2615764 | 1.89 × 10−17 | DEFA10P | Upstream | PLT-F baseline WBC1 Med SSC | 1,662 | WBC (b) |
9:7015133:A:G | rs10975974 | 3.39 × 10−10 | KDM4C | Intron | WDF baseline MO2 Med SSC | 1,688 | RBC (a) |
9:9744225:A:C | rs80353904 | 3.10 × 10−8 | PTPRD | Intron | WDF nigericin 7.5 h EO2 CV SSC | 351 | – |
10:3139540:A:G | rs34538474 | 6.55 × 10−15 | PFKP | Intron | WDF KCl 17 h NE2/NE4 ratio | 1,339 | PLT (a) |
10:71109406:T:C | rs6480404 | 4.03 × 10−13 | HK1 | Regulatory, intron | WDF Alhydrogel 21 h NE4 SD SFL | 1,378 | RBC (b) |
11:972270:C:T | rs7933889 | 1.03 × 10−8 | AP2A2 | Intron | WNR ciprofloxacin 22 h WBC2 SD SFL | 1,358 | – |
11:11548147:A:G | rs10831631 | 3.19 × 10−9 | GALNT18 | Intron | WDF LiCl 4 h NE1 CV FSC | 369 | – |
11:56806558:C:T | rs12421419 | 4.11 × 10−9 | OR5AK2, OR5AK4P | Downstream | WDF colchicine 20 h LY SD SSC | 1,338 | – |
11:57159189:T:C | rs548854 | 1.81 × 10−12 | PRG2, SLC43A3 | Upstream, intron | WDF colchicine 20 h EO1 Med FSC | 1,383 | – |
11:87048905:G:A | rs4536247 | 9.81 × 10−9 | TMEM135 | Intergenic | WDF water 15 h NE2% | 1,358 | – |
11:93862020:C:T | rs4753126 | 3.58 × 10−12 | HEPHL1, PANX1 | Regulatory, upstream | WDF colchicine 20 h EO2 Med SFL | 1,319 | RBC (a) |
11:112971545:C:T | rs11214488 | 2.16 × 10−8 | NCAM1 | Intron | WDF cholic acid 6.5 h NE3 CV SSC | 360 | – |
12:75695577:A:G | rs10785185 | 2.62 × 10−8 | CAPS2 | Intron | PLT-F isobutyric 3 h IPF SD FSC | 370 | – |
12:122399173:C:A | rs11615667 | 1.24 × 10−9 | WDR66 | Intron | PLT-F ciprofloxacin 22 h IPF SD SFL | 1,284 | PLT (a) |
14:21347966:G:T | rs74034667 | 1.88 × 10−10 | RNASE3 | Upstream | WDF baseline MO2 CV SFL | 1,700 | – |
14:21423790:G:C | rs2013109 | 8.60 × 10−12 | RNASE2 | Intron | WDF baseline MO CV SFL | 1,651 | – |
14:55654183:T:C | rs2094103 | 1.01 × 10−8 | DLGAP5 | Intron | PLT-F ciprofloxacin 22 h PLT-F SD FSC | 1,399 | – |
15:80260872:G:A | rs67760360 | 6.95 × 10−21 | BCL2A1 | Regulatory, intron | WDF LPS 18 h NE4 CV SFL | 1,430 | WBC (b) |
20:4157072:C:G | rs6084653 | 3.94 × 10−10 | SMOX | Intron | RET baseline RET2 CV SFL | 1,605 | RBC (b) |
20:57569860:C:G | rs1043219 | 4.09 × 10−10 | NELFCD | Downstream, 3′ UTR | RET colchicine 20 h PLT CV SFL | 1,334 | PLT (b) |
20:57597970:A:C | rs463312 | 1.19 × 10−19 | TUBB1 | Missense, downstream | PLT-F baseline IPF SD SFL | 1,681 | PLT (b) |
Associations for blood traits and perturbation conditions were clumped to produce unique genomic regions across multiple conditions. Two-sided P values are based on t tests in linear regression models and are not adjusted for multiple testing. Variants with the lowest P value for each clumped region were selected as lead SNPs. Trait names contain the channel, condition (with exposure duration), cell population and readout. For example, WDF Empa 1.5 h NE3 CV SFL indicates a readout in the WDF channel after 1.5 h of empagliflozin treatment, quantifying the CV of SFL in a neutrophil subpopulation (NE3). Obs. gives the number of donors analyzed for the top trait. This table contains a subset of regions with nearby candidate genes (see Supplementary Data 1 for a complete listing of associations). CV, coefficient of variation; Med, median; SD, standard deviation; Empa, empagliflozin.
(a) Previous association identified in ref. 17, which analyzed over 560,000 individuals.
(b) Previous association identified in ref. 16, which analyzed over 173,000 individuals.
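The table legend notes that associations were clumped into regions and that the variant with the lowest P value in each region was taken as the lead SNP. A greedy clumping procedure, similar in spirit to PLINK's `--clump`, can be sketched as below. The r^2 threshold, 250 kb window and 5 × 10−8 lead threshold are illustrative defaults rather than the study's parameters. The LD lookup `r2` is assumed to be supplied by the caller, for example from a reference panel. The function name `clump` is hypothetical.

```python
def clump(variants, r2, r2_threshold=0.1, window_bp=250_000, p_lead=5e-8):
    """Greedy LD clumping of association results.

    variants: dicts with 'id', 'chrom', 'pos' and 'p' keys.
    r2: callable (id_a, id_b) -> LD r^2, assumed to be provided by the caller.
    Returns (lead_id, member_ids) tuples, most significant region first.
    """
    ordered = sorted(variants, key=lambda v: v["p"])
    assigned = set()
    clumps = []
    for lead in ordered:
        if lead["p"] > p_lead:
            break  # list is sorted, so no later variant can be a lead
        if lead["id"] in assigned:
            continue  # already absorbed into a stronger region
        assigned.add(lead["id"])
        members = []
        for other in ordered:
            if other["id"] in assigned:
                continue
            nearby = (other["chrom"] == lead["chrom"]
                      and abs(other["pos"] - lead["pos"]) <= window_bp)
            if nearby and r2(lead["id"], other["id"]) >= r2_threshold:
                assigned.add(other["id"])
                members.append(other["id"])
        clumps.append((lead["id"], members))
    return clumps
```

Processing variants in order of increasing P guarantees that each region is represented by its most significant variant, which matches how lead SNPs are chosen in the table.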
Chemical stressors increased response differences among donors (Extended Data Figs. 1 and 2 and Supplementary Fig. 1), making it possible to identify robust genetic associations with small sample sizes. For example, neutrophil and other WBC responses induced by inflammatory stimuli such as Pam3CSK4 or lipopolysaccharide (LPS) showed strong associations with a missense variant in TLR1 (for example, rs5743618, WDF_Pam3CSK4_19h_NE1_Med_FSC; P = 8.2 × 10−103, n = 1,300). This association between TLR1 and WBC traits was not described previously in cohorts studying CBC parameters with over half a million individuals. The same SNP has previously been associated with asthma and allergic diseases through unclear mechanisms24. Our results suggest a potential role for neutrophils as mediators in these disease phenotypes. Comparing β coefficients for six genes that were previously identified in blood-trait GWAS showed that perturbation conditions greatly increased observed effect sizes compared to baseline conditions (Fig. 2b).
Blood perturbation responses reflect organ-specific disease traits
To assess whether perturbation-based blood cell traits reflect individuals’ disease status, we tested for associations between 327 blood readouts (top three traits with the lowest GWAS P value were selected for each unique locus) and a collection of structured phenotypes based on electronic health record (EHR) data. Diagnostic status for multiple common disorders was significantly associated with variation in blood perturbation readouts (Fig. 3a,b, Supplementary Fig. 2 and Supplementary Data 2). Notably, perturbations elicited unique disease associations absent at baseline. For example, neutrophil variability in SFL at baseline (WDF_Baseline_NE4_SD_SFL) showed no significant association with disease. However, the same parameter under 21 h Alhydrogel perturbation (WDF_Alhydrogel_21h_NE4_SD_SFL) showed negative associations with multiple cardiometabolic diseases, including heart failure (cases = 532, t = −2.98, Padj = 0.014), type 2 diabetes (T2D; cases = 685, t = −6.43, Padj = 5.4 × 10−9) and chronic kidney disease (CKD; cases = 546, t = −3.48, Padj = 3.45 × 10−3). Certain blood readouts showed associations with very specific disease phenotypes; for example, platelet variability in SFL under KCl 17 h perturbation was positively associated with purpura and hemorrhagic conditions (RET_KCl_17h_PLT_CV_SFL: cases = 225, t = 8.16, Padj = 4.89 × 10−14) and negatively associated with venous thrombosis (RET_KCl_17h_PLT_CV_SFL: cases = 220, t = −3.78, Padj = 1.3 × 10−3).
In addition to diagnostic codes, quantitative lab values commonly used to assess various physiological parameters also showed robust associations with blood perturbation responses. For example, red blood cell (RBC) median SSC under the 18 h LPS condition (RET_LPS_18h_RBC_Med_SSC) showed strong positive associations with serum albumin (n = 2,494, t = 11.75, Padj = 3.89 × 10−28) and estimated glomerular filtration rate (eGFR; n = 2,569, t = 3.26, Padj = 6.66 × 10−3), which is consistent with its negative association with CKD status. Significant associations included clinical traits that are not directly measurable in blood, such as a positive correlation between the corrected QT interval on an electrocardiogram and RBC size variability under 18 h LPS perturbation (RET_LPS_18h_RBC1_SD_FSC; n = 1,946, t = 10.22, Padj = 3.01 × 10−21), indicating that latent blood phenotypes may reflect physiological changes occurring in other tissues.
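The associations above are reported as adjusted P values (Padj), but this section does not name the correction. Purely as an illustration, the sketch below implements the Benjamini–Hochberg false discovery rate adjustment, one common choice. The study may have used a different method, such as Bonferroni. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for rank in range(m, 0, -1):  # walk from the largest P to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min  # enforces monotonicity across ranks
    return adjusted


# Usage: four hypothetical trait-disease tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```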
To explore the structure of the associations between blood traits and clinical phenotypes, we employed independent component analysis (ICA) to identify maximally independent components in the association matrix (Fig. 3c). ICA grouped clinical endpoints and lab values into meaningful clusters: for example, one encompassing obesity, T2D and glucose measurements, and another comprising asthma, chronic obstructive pulmonary disease and venous thrombosis. We plotted the loadings of seven example blood traits onto the same IC space (Fig. 3c, arrows), showing that each blood trait carries distinct information related to clinical phenotypes. Many perturbation conditions elicited new clinical associations not observed at baseline. This suggests that perturbations evoked previously latent blood cell responses that are disease-relevant.
A neutrophil population is negatively associated with cardiometabolic phenotypes
Under long exposure durations, multiple chemical stimuli elicited a distinct population of neutrophils (NE2) in the Sysmex WDF channel with high SSC and fluorescence, a population absent under baseline conditions (Fig. 1b). As an exemplar, we investigated this phenotype and functionally characterized this neutrophil population.
The ratio of NE2 neutrophils to the total neutrophil count (NE2/NE4) under multiple chemical perturbations showed associations with a complex aggregate of cardiometabolic diseases, specifically chronic ischemic heart disease, heart failure and T2D. For example, the NE2/NE4 ratio with an inflammatory stimulus (WDF_Pam3CSK4_19h_NE2/NE4) had negative associations with T2D (cases = 685, t = −5.4, Padj = 1.51 × 10−6), obesity (cases = 1,202, t = −4.37, Padj = 1.47 × 10−4) and related lab values (serum triglycerides: n = 2,248, t = −5.6, Padj = 7.64 × 10−7; serum glucose: n = 2,657, t = −3.7, Padj = 1.72 × 10−3). This blood readout also exhibited a positive correlation with total high-density lipoprotein cholesterol levels (n = 2,259, t = 6.32, Padj = 1.24 × 10−8). These results suggest that a low NE2/NE4 ratio is associated with cardiometabolic disease phenotypes.
The NE2 population represents apoptotic neutrophils
Because the NE2 population was only observed with perturbations at later time points, we hypothesized that it was related to neutrophil death. To evaluate the biological processes occurring in this population, we developed protocols to label purified neutrophils with the Sysmex WDF dye to visualize NE2 using regular flow cytometry (Fig. 4a). We found that the cells that represent the NE2 population, showing elevated WDF dye fluorescence and SSC, exhibited increased signals in Annexin V and Sytox, compared to the NE1-like population that mirrors the normal neutrophil profile at baseline (Fig. 4b,c). Annexin V is a marker for early apoptosis, while Sytox is indicative of cell death. Furthermore, we observed that blood samples with higher NE2/NE4 ratios exhibited higher percentages of Sytox and Annexin V-positive neutrophils (Fig. 4d). These results suggest that the NE2 population elicited by various chemical perturbations represents a subset of neutrophils actively undergoing apoptosis.
Pro-inflammatory responses delay neutrophil apoptosis
Delayed apoptosis and impaired clearance of neutrophils can lead to non-resolving inflammation and subsequent tissue damage25,26. Neutrophils have short lifespans27, which can be prolonged by pro-inflammatory and pro-survival signals26. Patients with aggregated cardiometabolic diseases exhibited a decreased NE2/NE4 ratio, suggesting reduced neutrophil apoptosis. We hypothesized that this reduction in neutrophil apoptosis results from increased pro-inflammatory responses. We examined neutrophil activation and generation of reactive oxygen species (ROS) at an earlier time point (4.5 h post-treatment), which falls within the normal range of neutrophil half-life in vivo. We then compared these measures with the Sysmex NE2/NE4 readout at a later time point (17 h post-treatment; Fig. 4e). Neutrophil activation has previously been associated with upregulation of CD11b on the cell membrane and shedding of CD62L (refs. 28,29). These two surface markers define three distinct neutrophil populations: CD11b^high CD62L^low, CD11b^medium CD62L^high and CD11b^low CD62L^low (Fig. 4f). High expression of CD11b and shedding of CD62L indicate activated neutrophils, high surface expression of CD62L indicates quiescent neutrophils, and loss of both surface markers indicates cell death. We observed a robust anticorrelation between neutrophil activation and NE2/NE4 across donors (Fig. 4g). In addition, we assessed ROS generation using CellROX and quantified the percentage of ROS-positive neutrophils for each donor (Fig. 4h). As with neutrophil activation, ROS generation was anticorrelated with NE2/NE4 across donors (Fig. 4i). These results suggest that individuals with more pro-inflammatory neutrophils show a reduced NE2 population in the Sysmex readout. We then tracked individual neutrophil trajectories with time-lapse imaging using CellROX and Sytox.
Neutrophils that survived until 15 h showed higher ROS levels and a longer period of elevated ROS than those that died earlier (Fig. 4j–l). Together, these results demonstrate that neutrophil pro-inflammatory responses, including activation and ROS generation, delay neutrophil apoptosis, which is in turn reflected as a reduced NE2 population in Sysmex measurements.
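The three CD11b/CD62L populations and the donor-level comparison with NE2/NE4 can be sketched as follows. The gate thresholds are hypothetical placeholders, since in practice gates are drawn on the cytometry data. The text reports an anticorrelation without naming the correlation measure, so the use of Spearman's rank correlation here is an assumption. All function names are illustrative.

```python
import math


def classify_neutrophil(cd11b, cd62l, cd11b_high, cd11b_low, cd62l_high):
    """Assign one neutrophil to a CD11b/CD62L population (hypothetical gates)."""
    if cd11b >= cd11b_high and cd62l < cd62l_high:
        return "activated"  # CD11b^high CD62L^low
    if cd11b_low <= cd11b < cd11b_high and cd62l >= cd62l_high:
        return "quiescent"  # CD11b^medium CD62L^high
    if cd11b < cd11b_low and cd62l < cd62l_high:
        return "dead"  # CD11b^low CD62L^low
    return "ungated"


def percent_activated(cells, gates):
    """Percentage of (cd11b, cd62l) events that fall in the activated gate."""
    labels = [classify_neutrophil(c11b, c62l, *gates) for c11b, c62l in cells]
    return 100.0 * labels.count("activated") / len(labels)


def _ranks(values):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation, e.g. % activated vs NE2/NE4 across donors."""
    return _pearson(_ranks(x), _ranks(y))
```

A negative `spearman(activated_per_donor, ne2_ne4_per_donor)` would correspond to the anticorrelation reported above.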
Consistent with our finding that pro-inflammatory neutrophil responses determine the NE2/NE4 readout, our GWAS also revealed an SNP rs5743618 in the TLR1/6/10 region associated with NE2/NE4 ratio (Fig. 5). This variant has been previously demonstrated to enhance TLR1 trafficking and expression on the plasma membrane and account for interindividual variability in Pam3CSK4-induced cytokine responses30,31. To simulate this gain of function in TLR1, we used the TLR1/2 ligand Pam3CSK4. We found a dose-dependent decrease in the NE2/NE4 profile in whole blood treated with Pam3CSK4 compared to untreated control (Extended Data Fig. 4a,b). As expected, stimulating neutrophils with Pam3CSK4 also increased neutrophil activation and ROS generation (Extended Data Fig. 4c–f). Furthermore, tracking individual neutrophil trajectories revealed that Pam3CSK4-treated cells exhibit prolonged durations of ROS elevation compared to untreated controls (Extended Data Fig. 4g). Pam3CSK4 also greatly increased neutrophils’ glycolytic adenosine triphosphate (ATP) production (P < 0.001; Extended Data Fig. 4h), suggesting that the neutrophils undergo metabolic reprogramming after TLR stimulation, as previously observed in macrophages32. These results further support the role of elevated neutrophil pro-inflammatory responses underlying the decreased NE2/NE4 ratio measured with Sysmex.
Common variants in metabolic genes regulate neutrophil activation and apoptosis
Besides TLR1 and several genes previously reported to regulate cell death (for example, CASP3 and BCL2A1)33,34, we also identified three metabolic genes, HK1, PFKP and ACSL1, associated with NE2/NE4 ratio at genome-wide significance (Figs. 5 and 6a). HK1 and PFKP encode hexokinase 1 and the platelet isoform of phosphofructokinase, respectively. These are two key enzymes regulating rate-limiting steps of glycolysis, the pathway that converts glucose into pyruvate with a modest net yield of ATP35,36 (Fig. 6a). The lead SNPs we identified for HK1 and PFKP have previously been associated with increased expression of these genes: rs6480404 is an expression quantitative trait locus (eQTL) for HK1 in neutrophils (β = 0.178, P = 4 × 10−16), and rs34538474 is an eQTL for PFKP in blood (β = 0.457, P = 3.3 × 10−310)37,38. Both SNPs were associated with a decreased NE2/NE4 ratio, suggesting reduced neutrophil apoptosis. Neutrophils are typically thought to rely on anaerobic glycolysis as their primary energy source. However, recent studies suggest that neutrophils use diverse metabolic pathways, including fatty acid oxidation (FAO), to provide energy for specific functions35,36,39. Acyl-CoA synthetase long-chain family member 1, encoded by ACSL1, converts fatty acids into acyl-CoA, which is then transported into mitochondria for oxidation (Fig. 6a). To investigate the effects of modulating HK1 and PFKP in neutrophils, we used a subsaturation dose of 2-deoxy-D-glucose (2-DG; 10 mM) to inhibit HK1 and glycolysis. To study ACSL1 function, we used the ACSL1 inhibitor triacsin C (ref. 40). We first assessed ATP production by neutrophils using the Seahorse metabolic analyzer. Consistent with the literature, unstimulated neutrophils were highly glycolytic (Fig. 6b). As expected, 2-DG decreased glycolytic ATP production (P = 0.03). In contrast, triacsin C ablated mitochondrial ATP production (P = 0.002) and increased glycolytic ATP production (P = 4.7 × 10−6; Fig. 6b).
As ACSL1 modulates FAO, we further analyzed the effect of triacsin C on FAO using exogenous palmitate as a long-chain fatty acid substrate. Compared to the DMSO control, triacsin C decreased both basal mitochondrial respiration and maximal respiration induced by the uncoupler FCCP, suggesting reduced FAO in neutrophils (Fig. 6c).
We next examined how 2-DG and triacsin C modulate neutrophil function. Both treatments increased the NE2/NE4 ratio in whole blood compared to controls, suggesting increased neutrophil death (Fig. 6d), and both reduced ROS production in neutrophils (Fig. 6e,f). 2-DG also decreased neutrophil activation (Fig. 6e,g). Unexpectedly, triacsin C upregulated neutrophil activation (Fig. 6e,g), potentially owing to the metabolic shift from FAO to glycolysis in neutrophils. The opposing effects of triacsin C on neutrophil activation and ROS generation underlie its smaller effect on neutrophil death compared with 2-DG (Fig. 6d).
Lastly, to investigate whether inhibiting HK1, PFKP or ACSL1 promotes neutrophil apoptosis and clearance in vivo, we used a transgenic zebrafish model expressing GFP under the myeloperoxidase (mpo) promoter, Tg(mpo:GFP) (ref. 41), to track neutrophil behavior. We stimulated inflammatory responses by tail transection. Within 4 h post tail transection, neutrophils were recruited to the injury site, followed by resolution at around 30 h under control conditions (Fig. 6h,i). Adding a subsaturation dose of 2-DG did not alter this response (Fig. 6h,i). In contrast, under hyperglycemic conditions, neutrophils continued to accumulate at the injury site at 30 h post tail transection, suggesting delayed resolution of inflammation (Fig. 6h,i). Inhibiting glycolysis with a subsaturation dose of 2-DG effectively resolved this prolonged neutrophil inflammation at the injury site under hyperglycemic conditions (Fig. 6h,i). In addition to pharmacological modulation, we used CRISPR–Cas9 to knock down the zebrafish orthologs hk1, pfkpa/pfkpb and acsl1a/acsl1b. Under control conditions, these knockdowns did not affect neutrophil recruitment or clearance (Extended Data Fig. 5). However, under hyperglycemic conditions, each of the three knockdowns promoted resolution of neutrophil accumulation at the injury site. The acsl1a/acsl1b knockdown had the most pronounced effect and the hk1 knockdown the weakest (Extended Data Fig. 5).
These results suggest that HK1, PFKP and ACSL1 interact to regulate neutrophil inflammatory responses by modulating their metabolic profiles. Pharmacological inhibition of HK1 and PFKP effectively prevents sustained inflammation related to hyperglycemia and promotes neutrophil clearance. SNPs associated with increased HK1 and PFKP expression were associated with a reduced NE2/NE4 ratio, a phenotype prevalent in cardiometabolic disease. Patients carrying these common alleles may therefore exhibit delayed resolution of inflammation, potentially contributing to disease pathophysiology. Thus, modulation of HK1 and PFKP could serve as a mechanism-driven therapeutic strategy for such patients.
Polygenic scores for diverse blood cell readouts predict disease outcomes
As we observed correlations between blood-response readouts and clinical traits, we sought to test whether polygenic scores (PGSs) based on blood-response summary statistics can be used to stratify patient populations and improve the predictions of clinical events. We calculated PGSs for perturbation blood responses spanning different cell types and conditions, using clumping and thresholding with fixed parameters, for participants in the Mass General Brigham (MGB) Biobank and the UK Biobank (UKBB). We first fitted Cox proportional hazards models for 30 clinical outcomes, using blood-based PGSs derived from the selected 327 blood readouts and adjusting for sex and the first two genetic principal components. Then, we performed meta-analyses to identify blood traits and clinical outcomes with robust associations in both MGB Biobank and UKBB datasets.
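A clumping-and-thresholding PGS is, for each individual, a sum of effect-allele dosages weighted by GWAS effect sizes. The sum runs over clumped variants that pass a P value threshold. The sketch below assumes clumping has already been done. The threshold value and the function name `clump_threshold_pgs` are hypothetical, and the Cox models that use the resulting scores are not shown.

```python
def clump_threshold_pgs(genotypes, weights, p_threshold=5e-8):
    """Clumping-and-thresholding polygenic score for one individual.

    genotypes: dict variant_id -> effect-allele dosage (0-2).
    weights: (variant_id, beta, p) for variants that already survived LD
             clumping, taken from blood-response GWAS summary statistics.
    Variants missing from `genotypes` are skipped here; real pipelines
    usually impute or mean-fill them instead.
    """
    score = 0.0
    for variant_id, beta, p in weights:
        if p <= p_threshold and variant_id in genotypes:
            score += beta * genotypes[variant_id]  # dosage weighted by effect
    return score


# Usage with hypothetical summary statistics and one individual's dosages.
weights = [("rs1", 0.5, 1e-9), ("rs2", -0.2, 1e-10), ("rs3", 1.0, 0.01)]
print(clump_threshold_pgs({"rs1": 2, "rs2": 1, "rs3": 2}, weights))
```

Holding `p_threshold` fixed across all readouts, rather than tuning it per outcome, corresponds to the "fixed parameters" described above and avoids overfitting the score to any one clinical endpoint.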
The PGSs calculated from different blood readouts exhibited distinct associations with specific diseases. We stratified participants into quartiles according to their PGS and plotted the time to first diagnosis for a subset of diseases and blood traits (Fig. 7a), which showed clear separation among quartiles. For example, participants in the first quartile of the PGS for variability in RBC FSC under 17 h KCl perturbation (RET_KCl_17h_RBC1_SD_FSC) showed delayed onset of heart failure compared with those in the other three quartiles (Fig. 7a), suggesting that the genetic basis of this blood cell trait might be used to predict heart failure risk and to explore the mechanisms leading to its development. Because cohort characteristics and outcome prevalence differ between the MGB Biobank and UKBB, we focused on associations that were significant in the meta-analysis of both cohorts (Fig. 7b).
We identified significant associations in both cohorts for multiple cardiometabolic conditions (Fig. 7b, Supplementary Fig. 3 and Supplementary Data 3), for example, obesity (RET_LPS_18h_RBC2_Med_SSC, Padj = 3.74 × 10−6, MGB cases = 9,499, UKBB cases = 41,893), T2D (RET_KCl_17h_RET1_%, Padj = 1.5 × 10−4, MGB cases = 6,226, UKBB cases = 34,941), CKD (WNR_Water_15h_WBC2_Med_FSC, Padj = 1.7 × 10−5, MGB cases = 5,627, UKBB cases = 23,771) and heart failure (RET_KCl_17h_RBC1_SD_FSC, Padj = 6.4 × 10−3, MGB cases = 4,421, UKBB cases = 15,811). We also observed strong associations with immune-related conditions such as type 1 diabetes (PLTF_LPS_18h_PLT_Med_SFL, Padj = 8.6 × 10−5, MGB cases = 530, UKBB cases = 4,207), asthma (WNR_LPS_18h_WBC_Med_SSC, Padj = 8.7 × 10−5, MGB cases = 6,176, UKBB cases = 62,009) and systemic lupus erythematosus (WDF_Alhydrogel_21h_NE2-NE4_ratio, Padj = 4.5 × 10−3, MGB cases = 532, UKBB cases = 804). Conducting ICA based on the meta-analysis results (Fig. 7c) revealed meaningful clusters of clinical phenotypes, such as a group involving lipidemia, chronic ischemic heart disease and heart failure. These findings suggest that genetic factors influencing various blood traits can effectively stratify different disease outcomes.
Multigenic models of ACSL1, PFKP and HK1 predict CKD risk in patients with T2D
We further investigated, in detail, blood readouts associated with variants in ACSL1, PFKP and HK1. As demonstrated above, these metabolic genes regulate neutrophil activation and clearance, particularly in hyperglycemia. We therefore tested whether PGSs calculated from these blood cell traits predict the time to CKD onset and progression in individuals with prediabetes or diabetes (HbA1c > 5.7%). We defined CKD stages 3a, 3b, 4 and 5 by estimated glomerular filtration rate (eGFR) thresholds of 45–59, 30–44, 15–29 and <15 ml min−1 per 1.73 m2, respectively. The PGSs for RBC variability in SFL under 21 h Alhydrogel, 20 h colchicine, 17 h KCl and 18 h LPS perturbations were positively associated with CKD progression, whereas the PGSs for the NE2/NE4 ratio under 17 h KCl, 20 h colchicine and 19 h Pam3CSK4 conditions were negatively associated with CKD development (Fig. 7d). These results suggest that PGSs based on cellular readouts can be used to identify disease subpopulations at increased risk of discrete complications, such as accelerated progression of CKD in T2D.
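The eGFR staging rule above amounts to a small lookup. The following minimal Python sketch illustrates it. The function name `ckd_stage` is hypothetical, and returning `None` for eGFR ≥60 is our assumption, since the text only defines stages 3a to 5.

```python
from typing import Optional

def ckd_stage(egfr: float) -> Optional[str]:
    """Map eGFR (ml/min per 1.73 m^2) to CKD stage 3a, 3b, 4 or 5.

    Hypothetical helper. Bands follow 45-59 (3a), 30-44 (3b), 15-29 (4)
    and <15 (5); values >= 60 are assumed to be unstaged (None).
    """
    if egfr < 15:
        return "5"
    if egfr < 30:
        return "4"
    if egfr < 45:
        return "3b"
    if egfr < 60:
        return "3a"
    return None
```

Applied to serial eGFR measurements, the first age at which each stage is reached could then serve as an event time in the survival models described in the Methods.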
Discussion
Over 3,300 traits have been investigated using GWAS in more than 1 million participants, and current studies continue to increase sample sizes to improve statistical power. Although these techniques are robust, it remains difficult to identify the underlying biological effects6. One major bottleneck is the lack of a generalizable strategy to move from a locus to a genetic target and mechanistic insights, which limits translation toward therapeutic development. We outline an approach that combines cellular phenotyping with GWAS to uncover previously latent, large effect-size genetic loci with direct implications for cell biology. Using multigenic models based on selected cellular phenotypes, we then identified clinical phenotypes with substantially altered disease risks related to these intermediate phenotypes.
We focused on cellular responses in peripheral blood, as such samples are highly accessible and have long been used as a diagnostic tool in clinical settings, and technologies are broadly available for subsequent scaling of any useful findings. In addition to clinically available assays of cross-sectional cellular counts, we assessed blood cell properties under 36 perturbation conditions, aiming to elicit phenotypes that are latent at baseline, and thus likely to be previously unmeasured. We chose this approach to favor the identification of new disease-related endophenotypes, from which we could select those associated with large effect size common alleles that might represent rigorous drug targets. We expanded the phenotypic space from 29 blood parameters used in previous studies to over 4,000 cell readouts. We were able to identify alleles associated with key cellular processes, such as neutrophil activation and apoptosis, which have roles in common complex diseases beyond hematopoietic disorders. Evoked cellular response traits in peripheral blood offer a complementary approach to existing phenotyping with the potential to identify genes and pathways with translational and clinical relevance.
To validate that risk genes identified using our framework are linked to disease-relevant biology, we conducted functional studies of genes associated with the evoked NE2 population. Although the Sysmex measurements are not tailored to characterize neutrophil function, we found that WDF (a nucleic acid dye) used to distinguish blood cell lineages is reflective of neutrophil apoptosis. We further elucidated that the delay in neutrophil apoptosis was due to a neutrophil pro-inflammatory response. The perturbation-based assays we developed enabled the efficient identification and experimental validation of genes (HK1, PFKP and ACSL1) involved in metabolic pathways affecting neutrophil ROS generation and lifespan, revealing cell metabolism as a potential therapeutic target for inflammation in various cardiometabolic diseases.
Our approach reveals common genetic variants with large effect sizes. Notably, several genes we identified have previously been shown to underlie specific Mendelian diseases. For example, we identified common coding variants in TUBB1 that affect platelet traits, whereas rare variants in TUBB1 were previously linked to inherited thrombocytopenia42,43. BMPR2, which is linked to hereditary pulmonary arterial hypertension (PAH)44, was associated with monocyte responses in this study. As monocyte and macrophage abnormalities have been implicated in the pathophysiology of PAH45, this finding suggests a monocytic contribution to the vascular inflammation observed in BMPR2-linked PAH and also offers a window into potential somatic contributions to other forms of PAH. These examples support the utility of latent phenotypes for defining cellular mechanisms that bridge common genetic variation and complex diseases.
PGSs calculated from a subset of blood cell traits associated with metabolic genes showed utility in risk prediction for renal complications of diabetes. Emerging evidence supports the involvement of innate immunity in CKD initiation and progression in diabetes, but studies have typically focused on macrophages46. Our results reveal a role for genetically determined variation in the genesis of pro-inflammatory neutrophils in CKD development in diabetic patients. The PGS models based on blood readouts were able to stratify patients with distinct risks for developing various cardiometabolic, vascular and inflammatory diseases, revealing subgroups that might benefit from therapeutics targeting related biological pathways.
Our study has several limitations. First, we used a conventional significance threshold of P < 5 × 10−8 for genetic association without adjusting for the number of phenotypes tested, which may result in false positives. We estimated that approximately 350 traits were independent among the phenotypes tested. To reduce the false discovery rate (FDR), we reported significant associations only when at least two independent traits were linked to the clumped region. In practice, the evoked cellular traits and their genetics can be efficiently validated in scalable in vivo models. Second, sample sizes varied across perturbations, which could reduce statistical power for conditions with fewer samples and potentially result in false negatives. Furthermore, although our phenotypic associations are derived from multiple ancestry groups, the genetic associations are based on individuals of European ancestry owing to the limited representation of other ancestry groups in our cohort. GWAS of a subset of blood cellular traits across multiple ancestry groups revealed consistent trends in effect directions, albeit with notable disparities for several lead SNPs (Extended Data Fig. 6). Future investigations are needed to unravel the trans-ancestry genetic basis of evoked blood responses. Lastly, for PGS associations with clinical traits derived from EHRs, we employed Cox proportional hazards models (time-to-event analyses). However, EHR data have inherent limitations: they do not capture the entire medical history, and the recorded age at diagnosis can differ from the age at disease onset. To address these issues, we used Cox models with delayed entry to handle incomplete observations. Nevertheless, the time of disease onset could still be misrepresented owing to the inherent constraints of EHR data.
In summary, we performed perturbational blood cell phenotyping using a widely available cytometry device that is primarily designed for robust whole-blood cell counts. This framework incorporating human genetic data, primary cellular phenotyping and deep clinical traits enables the iteration of genetic risk locus discovery, systematic target validation and subsequent drug discovery. Implementing such a method in routine clinical settings will facilitate the development of refined clinical trajectories and identification of large effect size common variants contributing to human disease and clinical outcomes.
Methods
Human study participants
Study participants were recruited in accordance with IRB 2019P003155 from multiple phlebotomy clinics in the MGB hospital system. Sample sizes of measured blood profiles and genotyped participants per perturbation condition are listed in Supplementary Table 1. Demographic information such as age and sex is provided in Supplementary Table 3. Written informed consent was obtained from all individuals. The MGB Institutional Review Board approved the analyses of the UKBB (application 55482).
Zebrafish
All zebrafish studies were carried out under the protocols approved by the Brigham and Women’s Hospital Standing Committee on Animals.
Reagents
Details of reagents used in this study are included in Supplementary Table 5.
Whole-blood perturbation screening
Physiologically relevant doses and time points were determined for each perturbation to elicit reproducible effects on blood analyzed on a Sysmex XN-1000 hematology analyzer (see Supplementary Table 1 for perturbation condition descriptions, including doses and exposure times, and Supplementary Table 5 for details of the chemical agents). Compounds dissolved in DMSO or chloroform were prepared so that the final solvent concentration was <0.5% (v/v). Each condition was assigned a three-digit identifier (for example, -007) that was paired with a patient ID for each treated sample (for example, AA-00100-007). This standardized labeling scheme allowed the preparation of barcoded sample tubes and batch-wise automated measurements on the hematology analyzers. The Sysmex XN-1000 was calibrated each day using Sysmex XN Check levels 1–3, and new QC lots were acquired every 28 d, as recommended by the manufacturer.
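The sample-labeling scheme above can be sketched in a few lines of Python. The helper names are hypothetical, and the patient-ID format is inferred from the single example "AA-00100-007". Only the zero-padded three-digit condition suffix is taken directly from the text.

```python
import re

# Assumed label layout: "<patient ID>-<three-digit condition code>",
# inferred from the single example "AA-00100-007".
LABEL_RE = re.compile(r"^(?P<patient>.+)-(?P<condition>\d{3})$")

def make_label(patient_id: str, condition: int) -> str:
    """Append a zero-padded three-digit condition code to a patient ID."""
    if not 0 <= condition <= 999:
        raise ValueError("condition code must fit in three digits")
    return f"{patient_id}-{condition:03d}"

def parse_label(label: str) -> tuple[str, int]:
    """Split a tube label back into (patient ID, condition code)."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"not a valid sample label: {label!r}")
    return match["patient"], int(match["condition"])
```

Because the condition code is always the final three-digit field, the parser still works when the patient ID itself contains hyphens.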
Up to 40 individuals per day were recruited from multiple phlebotomy clinics and donated up to 50 ml of blood in addition to their clinical blood draw. Whole blood was collected in 8.5 ml ACD tubes (BD 364606). Barcoded sample tubes with patient and perturbation identifiers were aligned and prepared batch-wise by aliquoting 700 μl of whole blood into a grid of 5 ml round-bottom tubes. All perturbation compounds were added to blood at specific time points, and tubes were transferred to incubator shakers (39 °C, 200 rpm). After incubation, tubes were placed in automated sampling racks and profiled using the Sysmex XN-1000. Sysmex-derived blood parameters (for example, CBC) and raw cytometry data were exported as .csv and .fcs files, respectively.
Genotyping, quality control and imputation in screening cohort
Before aliquoting patient blood samples, a portion of freshly drawn blood was set aside for whole-blood DNA extraction. DNA was extracted from 3 ml of whole blood using the Qiagen Puregene Blood Core Kit C (158389). DNA was quantified and checked for quality using NanoDrop One and Qubit, diluted to 75 ng μl−1 and stored at –80 °C. Samples were aliquoted into 96-well barcoded plates and quantified using a Cytation Take3 Trio before genotyping. Internal genotyping for quality control was performed using the Advanta Sample ID Genotyping Panel (Fluidigm, 101-7773). Aliquots were shipped to the Northwell Health Genomics Alliance and the University of Miami Genotyping Core in 96-well barcoded plates, with one well left empty for controls. Samples were quantified using NanoDrop and Qubit to identify plates with high numbers of low-concentration samples, which could then be replaced before genotyping. Genotypes were called from genomic DNA in batches of approximately 500 samples using the Illumina GSAv3 BeadChip and Illumina GenomeStudio.
Computational analyses used Python 3.9 and R 4.2. Genotype data were processed using PLINK1.9 and PLINK2 (ref. 47). We excluded samples from participants with high variant missingness (>10%), a mismatch between recorded and genotype-inferred sex, or withdrawal from the study. In addition, samples failing Advanta fingerprinting (pass criterion: concordance of at least 0.75 across at least 20 SNPs) were either regenotyped or removed. Variants with high missingness across individuals (>10%) or deviations from Hardy–Weinberg equilibrium at P < 1 × 10−50 were filtered, and structural or multi-allelic variants were removed. A local instance of Michigan Imputation Server v1.5.7 (ref. 48) with Eagle2 and Minimac4 was used to impute genotypes with the 1000G Phase3 v5 reference panel. After imputation, variants with a minor allele frequency of <0.0001 were removed. The first ten principal components were estimated using PLINK2. Relatedness was estimated using PLINK2 with the KING-robust kinship estimator49, and five individuals with a kinship coefficient greater than 0.177 (first-degree relatives or closer) were removed. After these exclusions, genotype data were available for 2,685 individuals on >3.5 million imputed variants. Based on self-reported ancestry at study entry, our cohort consisted primarily of individuals of European ancestry, and the low numbers of individuals in other ancestry groups prevented robust multi-ancestry analyses. We therefore calculated and reported genetic associations only for the subset of participants with self-reported European ancestry (discovery cohort). For cross-ancestry validation of the lead variants, we performed separate GWAS in the following self-reported ancestry groups: AFR, ASIAN and OTHER (including Other, Pacific Islander and Native American). Genotyped individuals in the self-reported HISPANIC group were not included in the cross-ancestry analyses because of insufficient numbers.
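The relatedness filter above removes individuals with a KING kinship coefficient >0.177. PLINK2 applies its own procedure to choose which member of each related pair to drop. The Python sketch below is a simplified, hypothetical stand-in. It greedily removes the individual involved in the most related pairs and assumes kinship estimates are already available as (ID, ID, kinship) tuples.

```python
from collections import Counter

def related_to_remove(pairs, cutoff=0.177):
    """Greedily choose individuals to drop so no pair exceeds `cutoff`.

    `pairs` is an iterable of (id_a, id_b, kinship) tuples, for example
    parsed from a KING-robust kinship table (hypothetical input format).
    At each step the individual in the most remaining related pairs is
    dropped; ties are broken by ID so the result is deterministic.
    """
    related = {(a, b) for a, b, k in pairs if k > cutoff}
    removed = set()
    while related:
        counts = Counter(i for pair in related for i in pair)
        worst = min(counts, key=lambda i: (-counts[i], i))
        removed.add(worst)
        # Drop every related pair that the removed individual took part in.
        related = {p for p in related if worst not in p}
    return removed
```

Removing the most-connected individual first tends to keep more participants than dropping one member of each pair at random.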
Genotyping, quality control and imputation in MGB Biobank cohort
MGB Biobank samples were genotyped in batches using three related Illumina arrays (MEGA, MEGA Ex and MEG), as well as the Illumina GSAv3 array. Imputation was performed using the Michigan Imputation Server with the 1000G Phase3 v5 reference panel for each batch. We merged batches using the intersection of variants present in all batches and applied the same QC filtering as above. In short, individuals with high missingness (>10%) or sex mismatches were removed. Variants with high missingness across individuals (>10%) or deviations from Hardy–Weinberg equilibrium at P < 1 × 10−50 were filtered. Structural or multi-allelic variants were removed. Principal component analysis (PCA) was calculated using PLINK2, and individuals with a kinship greater than 0.177 as well as individuals with non-European ancestry (distance greater than 3× radius of 1000G EUR reference samples in joint PCA) were removed using plinkQC50. Individuals who were part of the screening cohort were removed from the MGB Biobank cohort. In total, after these exclusions, genotype data were available for 44,705 participants on >6.7 million imputed variants. For PGS applications, we further filtered variants to have a minimum minor allele count of 100 and missingness <2%, leaving 1.8 million variants.
Genotyping, quality control and imputation in UKBB cohort
The UKBB samples were genotyped on two Affymetrix arrays, UK BiLEVE and UKBB Axiom. The genotyping data underwent stringent quality control procedures described elsewhere51, including exclusion of individuals based on missingness, heterozygosity, sex mismatch, relatedness and non-British ancestry. Imputation was carried out in a two-step prephasing/imputation process with SHAPEIT and IMPUTE2, using the Haplotype Reference Consortium and UK10K haplotype resources. Post-imputation quality control included removal of variants with a minor allele frequency <1%, a minor allele count <100, an imputation quality score (Minimac r2) <0.4 or deviation from Hardy–Weinberg equilibrium (P < 1 × 10−15). We used the White ethnic background cohort based on the self-reported UKBB data field f21000. After these quality control steps, data for approximately 424,000 participants with clinical outcomes were available. PCA was performed on the non-imputed genotype data of the same individuals using PLINK2.
Phenotype measurements and quality control
We measured a total of 278 blood-based cellular phenotypes using a blood flow cytometer (Sysmex XN-1000) under 37 different conditions. The blood cell parameters can be categorized into indices related to membrane/intracellular structure measured using SSC, nucleic acid and membrane lipid content measured using SFL, and cell shape/volume measured using FSC, as well as parameters such as cell counts and percentages within defined regions (gates). For each parameter, we calculated robust estimators such as median, robust s.d. and robust coefficient of variation using FlowJo v10.8. Gates were empirically defined based on densities of measured cells under baseline and perturbation conditions and included additional regions for subpopulations that were typically not observed under baseline conditions. We defined a total of 15 WBC-related gates, 7 RBC gates, 4 platelet-related gates and 4 gates for debris or unknown cell types. All samples were measured within 36 h of blood draw, with baseline measurements occurring within 3 h for 80% and 7 h for 95% of samples.
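The robust estimators named above can be sketched with NumPy percentiles. This is one common percentile-based definition, and we have not verified that FlowJo v10.8 uses exactly this formula. Here the robust s.d. is half the distance between the 15.87th and 84.13th percentiles, which equals one s.d. for normally distributed data, and the robust coefficient of variation is 100 × robust s.d. / median.

```python
import numpy as np

def robust_summary(values):
    """Median, robust SD and robust CV (%) of one gated channel.

    Assumed definitions (not checked against FlowJo): rSD is half the
    distance between the 15.87th and 84.13th percentiles; rCV is
    100 * rSD / median.
    """
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    lo, hi = np.percentile(x, [15.87, 84.13])
    rsd = 0.5 * (hi - lo)
    return median, rsd, 100.0 * rsd / median
```

Unlike the ordinary s.d., these percentile-based values are barely affected by a small fraction of extreme events, such as debris or doublets, within a gate.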
We performed thorough quality control to identify sources of technical variation as well as biological covariates. For this, we assessed the effects of the time between blood draw and flow cytometry measurement, drift over the course of the study (study month) and biological covariates such as age, sex and race (Supplementary Fig. 4). We removed outlier samples in which any single phenotype lay more than four median absolute deviations (MADs) from the median of all samples measured under the same condition. We also computed a two-dimensional ICA projection of all blood measurements from a single fluorophore under a single perturbation condition and removed samples that were more than 2.5 MADs from the median sample. Finally, we quantile-transformed the phenotypic measurements. The final numbers of blood measurements and genotyped individuals passing QC across conditions are shown in Supplementary Table 3.
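The per-phenotype outlier rule and the final transformation can be sketched as follows. Two assumptions apply. First, the raw, unscaled MAD is used. Second, "quantile-transformed" is read as a rank-based inverse normal transformation with an (r − 0.5)/n offset. The text does not state the exact target distribution or offset.

```python
import numpy as np
from statistics import NormalDist

def mad_outlier_mask(X, n_mad=4.0):
    """True for samples (rows) whose every phenotype lies within
    `n_mad` median absolute deviations of that phenotype's median.

    X: samples x phenotypes for one perturbation condition. The raw
    (unscaled) MAD is used; some pipelines multiply it by 1.4826.
    """
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    return np.all(np.abs(X - med) <= n_mad * mad, axis=1)

def rank_inverse_normal(x):
    """Map values to standard-normal quantiles via their ranks.

    Assumed form of the quantile transformation: rank r (1..n) maps to
    the normal quantile of (r - 0.5) / n; ties keep their input order.
    """
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable") + 1
    probs = (ranks - 0.5) / len(x)
    inv = NormalDist().inv_cdf
    return np.array([inv(p) for p in probs])
```

Typical usage would be `X_kept = X[mad_outlier_mask(X)]`, followed by applying `rank_inverse_normal` to each remaining column.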
Estimation of the number of independent traits
During the study, multiple batches of perturbations were administered across different time periods, each involving mostly nonoverlapping groups of individuals. Because of the distinct cohorts and perturbation conditions across batches, the data consisted of several mostly complete blocks of measurements (apart from missing values in individual measurements). To estimate the effective number of independent traits, we analyzed each block separately, applying quantile transformation followed by PCA to its blood readouts. We used the R package ‘PCAtools’ v2.12.0 to determine the number of principal components that cumulatively explained 90% of the variance in each block. This number ranged from 243 to 349 across blocks. However, the blocks also shared a subset of perturbation conditions, and we observed recurrent genetic associations under different perturbations, suggesting overlapping underlying structure. Based on these analyses, we estimate that there are over 350 independent traits (Supplementary Fig. 5).
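The 90%-variance criterion above is computed in the study with ‘PCAtools’ in R. The NumPy sketch below illustrates the same criterion and is not the package's implementation. The function name is hypothetical, and the input is assumed to be one complete, quantile-transformed block without missing values.

```python
import numpy as np

def n_components_for_variance(X, threshold=0.90):
    """Smallest number of principal components whose cumulative
    explained variance reaches `threshold`.

    X: samples x traits for one complete block. Columns are centered,
    and squared singular values give each component's variance.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    # First index where the cumulative fraction reaches the threshold.
    return int(np.searchsorted(cumulative, threshold) + 1)
```

For scale, with roughly 350 effective traits a Bonferroni-style study-wide threshold would be 5 × 10−8 / 350 ≈ 1.4 × 10−10. As noted in the limitations, the study did not apply such a correction.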
Flow cytometry
Flow cytometry analyses were performed on neutrophils isolated from patients’ whole-blood samples, using the EasySep Direct Human Neutrophil Isolation Kit (STEMCELL, 19666). After isolation, neutrophils were resuspended in Tyrode’s solution as described previously. To characterize the NE2-like cell population using flow cytometry, neutrophils were isolated from whole-blood samples that were incubated at 37 °C for 17 h and then labeled with apoptosis indicators, Sytox green (Thermo Fisher Scientific, S7020) and R-PE conjugated Annexin V (Thermo Fisher Scientific). The labeled neutrophils were then subjected to permeabilization using Sysmex WDF Lysercell (Sysmex) and staining with Fluorocell WDF dye (Sysmex). The samples were analyzed for 5 min after the addition of Fluorocell WDF dye.
To characterize neutrophil activation and ROS, isolated neutrophils were labeled with Pacific Blue anti-human CD11b antibody (BioLegend, clone ICRF44, 1:100 dilution) and Alexa Fluor 488 anti-human CD62L antibody (BioLegend, clone DREG-56, 1:100 dilution). Cells were subsequently labeled with CellROX Deep Red Reagent (Thermo Fisher Scientific, C10422) at 37 °C for 30 min, washed and resuspended in staining buffer before flow cytometry analyses.
Seahorse metabolic analysis
For the real-time ATP rate assay, a DMEM assay medium containing 10 mM glucose, 1 mM pyruvate and 2 mM glutamine was used. Extracellular acidification rate and oxygen consumption rate were measured with a Seahorse XFe96 analyzer in neutrophils isolated from patients’ whole blood pretreated with or without 2-DG (10 mM) or triacsin C (5 µg ml−1). Neutrophils were resuspended in DMEM medium and seeded (1 × 10⁶ per well) in a Seahorse 96-well plate coated with Cell-Tak (Corning, 354240) for 20 min. Cell attachment was visually confirmed before the assay, which was performed according to the manufacturer’s instructions using 1.5 µM oligomycin, 1 µM FCCP and 0.5 µM rotenone/antimycin A.
For the long-chain fatty acid stress test, neutrophils isolated from untreated whole blood were first resuspended and incubated for 2 h at 37 °C in a substrate-limited medium containing 0.5 mM glucose, 1 mM glutamine, 0.5 mM l-carnitine and 1% FBS. Cells were then pelleted, resuspended in an assay medium containing 2 mM glucose and 0.5 mM l-carnitine, and seeded (1 × 10⁶ per well) in a Seahorse 96-well plate coated with Cell-Tak (Corning, 354240) for 20 min. After cell attachment was visually confirmed, cells were treated with triacsin C (5 µg ml−1) or DMSO control for 30 min. Palmitate-BSA FAO substrate was added before the assay, which was performed according to the manufacturer’s instructions using 4 µM etomoxir, 1.5 µM oligomycin, 1 µM FCCP and 0.5 µM rotenone/antimycin A. Normalization for both assays was performed by direct cell counting.
Zebrafish tail transection and hyperglycemia induction
Zebrafish larvae at 54 h postfertilization (hpf) were anesthetized by immersion in E3 water with 4.2% tricaine. Tail transections were performed with a sterile scalpel at the distal end of the notochord. Brightfield and fluorescence images were acquired with a Cytation 5 at 4 h and at 24 h or 30 h post-transection at 28 °C. Neutrophils within the tail region were counted using ImageJ. We induced hyperglycemia in zebrafish larvae by ablating β-cells as previously described12. Briefly, 48 hpf embryos were treated with 500 µM alloxan for 30 min, followed by incubation in E3 water containing 30 mM glucose.
Zebrafish genetic knockdowns
The hk1, pfkpa/pfkpb and acsl1a/acsl1b knockdown zebrafish lines were generated using CRISPR–Cas9. Two-part guide RNAs were used to knock down each gene. The guide RNAs were designed using CHOPCHOP52, targeting the sequences shown in Supplementary Table 6. CRISPR RNAs (crRNAs) were synthesized (Integrated DNA Technologies), annealed with trans-activating crRNA (tracrRNA) and incubated with Alt-R Cas9 Nuclease to form the ribonucleoprotein complex. Then, 1.5 nl of the complex was injected into Tg (mpo:GFP) embryos at the one-cell stage.
Genome-wide association tests and model selection
After genetic and phenotypic QC, blood phenotypes were retained for 4,723 individuals and genotypes for 2,685 individuals. We excluded debris, ghost and NRBC cell-type gates from genetic association tests because they yielded non-normally distributed phenotypes after quantile transformation. We performed a univariable GWAS for each of the remaining 278 traits under 37 different conditions. Specifically, we used PLINK2 to compute association statistics for a linear regression of phenotype on allele dosage for >3.5 million imputed variants with minor allele frequency >0.05 and minor allele count >10, using covariate variance standardization and the covariates age, sex, time from blood draw to analysis, month of study, genotyping chip and batch, and the first ten genotype principal components.
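For a single variant, the model above is an ordinary least-squares regression of the phenotype on allele dosage plus covariates. The NumPy sketch below illustrates this, but it is not PLINK2's implementation. It standardizes covariate columns in the spirit of covariate variance standardization and uses a normal approximation for the P value, which is reasonable for thousands of samples. Categorical covariates such as genotyping chip or batch are assumed to be dummy-coded beforehand.

```python
import math
import numpy as np

def variant_association(y, dosage, covariates):
    """OLS of one phenotype on allele dosage plus covariates.

    Returns (beta, se, t, p) for the dosage term. Covariate columns are
    standardized to mean 0 and variance 1; p uses a normal approximation
    to the t distribution (large-sample assumption).
    """
    y = np.asarray(y, dtype=float)
    C = np.asarray(covariates, dtype=float)
    C = (C - C.mean(axis=0)) / C.std(axis=0)
    X = np.column_stack([np.ones_like(y), np.asarray(dosage, float), C])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    beta, se = coef[1], math.sqrt(cov[1, 1])
    t = beta / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal tail
    return beta, se, t, p
```

A genome-wide scan repeats this for every variant passing the allele-frequency filters, reusing the same phenotype and covariate matrix.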
We used P < 5 × 10−8 as a significance threshold for each phenotype and did not correct for multiple testing at the level of association P values. Many of our measured phenotypes were correlated across similar gate/cell types (for example, subpopulations of neutrophils), phenotypic dimensions (for example, SSC and FSC) or conditions (for example, TLR ligands Pam3CSK4 and LPS). Given the large number of tests and limited number of study participants, we sought to identify a concise set of variants that are associated with the strongest observed cellular responses. For this, we clumped all significant variants using PLINK1.9 with LD r2 > 0.50, physical distance <250 kb between clumped variants and at least two independent hits from different traits for each clumped region. We used the variant with the smallest association P value across all measured traits for a given region as the lead variant. The following command was used for clumping and gene range annotations: plink --clump-range glist-hg19 --clump-p1 0.00000005 --clump-p2 0.00000005 --clump-r2 0.50 --clump-kb 250 --clump-replicate --clump {trait_files}. This command also annotated associated regions using gene range lists provided by PLINK2 (https://www.cog-genomics.org/static/bin/plink/glist-hg19). If multiple genes were present for a given location, we used the locus-to-gene model from OpenTargets Genetics to identify likely candidates53. We prioritized candidate genes in the following order: coding variants, variants in introns and distance to transcription start sites. If there was no clear evidence for a subset of candidates, we reported the full list from the PLINK gene annotation step. We also annotated each region with associations previously reported for blood cell traits based on the supplementary material of ref. 17.
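The region-level rule above has two parts: group significant hits around a lead variant, then keep only regions supported by at least two distinct traits. The Python sketch below illustrates this logic as a deliberate simplification. Unlike PLINK's clumping, it ignores LD (the r2 > 0.50 criterion) and groups hits purely by physical distance to each region's lead variant. The input tuple layout is hypothetical.

```python
def group_hits(hits, window_bp=250_000, min_traits=2):
    """Greedy distance-only clumping of genome-wide significant hits.

    hits: list of (trait, chrom, pos, p, variant_id) tuples with
    p < 5e-8. The most significant remaining hit becomes a lead, and
    hits on the same chromosome within `window_bp` join its region.
    Regions supported by fewer than `min_traits` traits are dropped.
    """
    remaining = sorted(hits, key=lambda h: h[3])
    regions = []
    while remaining:
        lead = remaining[0]
        members = [h for h in remaining
                   if h[1] == lead[1] and abs(h[2] - lead[2]) < window_bp]
        remaining = [h for h in remaining if h not in members]
        traits = {h[0] for h in members}
        if len(traits) >= min_traits:
            regions.append({"lead": lead[4], "chrom": lead[1],
                            "pos": lead[2], "traits": sorted(traits)})
    return regions
```

Requiring support from at least two traits discards isolated single-trait signals, which is the safeguard against false positives described in the limitations.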
Association with clinical phenotypes in the screening cohort
We defined 30 binary clinical phenotypes using ICD10 diagnostic codes (Supplementary Table 4). We also collected 20 quantitative measurements available across our entire cohort, such as the comprehensive metabolic panel, lipid panel and structured electrocardiographic data. We fitted logistic or linear models associating binary and continuous traits, respectively, with 327 blood phenotypes (for each unique locus, the three traits with the lowest GWAS P values were selected). Blood readouts were quantile-transformed, and models included the covariates age, sex, race and time from blood draw to measurement. For categorical outcomes, we used the ‘glm’ function in ‘statsmodels’ 0.13.2 with the formula ‘diagnosis~blood_readout+age+race+sex+draw_time’ and a binomial family (logit link). For continuous outcomes, we used the ‘ols’ function in ‘statsmodels’ with the same formula. Models for categorical and continuous outcomes were tested using z tests and t tests, respectively. Subsequently, to control the FDR in the presence of multiple comparisons, we computed q values using the ‘qvalue’ package v2.4.2 in R. The q values provide an estimate of the minimum FDR at which each test may be considered significant. A listing of clinical associations including covariates, case counts, β coefficients and adjusted P values is provided in Supplementary Data 2.
PGSs and disease associations in the MGB and UKBB cohorts
For 327 traits with significant genetic associations, we used summary statistics from the screening cohort to calculate PGSs by clumping and thresholding with fixed parameters. Specifically, we used the command plink --clump-p1 0.5 --clump-r2 0.5 --clump-kb 100 for clumping and a P value threshold of 0.1 for the scoring step.
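After clumping, the scoring step reduces to a weighted sum over the retained index variants. The NumPy sketch below illustrates this and assumes dosages are already aligned to each variant's effect allele. The function name is hypothetical. PLINK's scoring can report an average rather than a sum, but that differs only by a constant factor and does not change quartile assignments.

```python
import numpy as np

def clump_threshold_pgs(dosages, betas, pvalues, p_threshold=0.1):
    """Polygenic score from clumped summary statistics.

    dosages: individuals x variants matrix of effect-allele dosages (0-2)
    for the index variants that survived clumping. betas and pvalues are
    the per-variant screening-cohort GWAS estimates. Variants with
    p >= p_threshold are dropped, and the rest are summed, weighted by beta.
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    keep = np.asarray(pvalues, dtype=float) < p_threshold
    return dosages[:, keep] @ betas[keep]
```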
Our survival analyses model the time to first observed diagnosis after birth, considering the age at the first available diagnosis for any diagnostic code as the start of the observation or ‘delayed entry’ into the model. We use the framework of counting processes to account for this delayed entry, and the corresponding survival models are fit using Cox’s proportional hazards regression. Counting process models allow us to consider each individual’s date of birth as the starting point while acknowledging that our observation period for each individual only starts at their first hospital or outpatient visit that is documented in the EHR.
There are two settings in which we define events as having occurred between birth and the beginning of the observation period. Cases where previous medical history (only available in MGB cohort) contains the diagnoses of interest, but without a specific diagnosis date, were treated as the disease onset occurring at some unknown time in the interval between birth and start of observation period (for example, before the first hospital encounter). In addition, if the time between the start of the observation period and the event date in the EHR system is less than 1 year, we assume that the true event date most likely occurred between birth and the first visit in the healthcare network and was only reported in the EHR with delay. In these cases, we consider it an ‘instant event’ and encode it as having occurred in the interval between birth and start of the observation period.
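The two encoding rules above can be expressed as a small classification function. The following is a hypothetical Python sketch that takes ages in years. The status labels are our own naming, and "prevalent" denotes an event placed in the interval between birth and the start of observation.

```python
def encode_event(obs_start_age, obs_end_age, event_age=None,
                 in_history=False, instant_window=1.0):
    """Classify one individual's outcome for a delayed-entry model.

    Returns ("prevalent", None) when the event is taken to have occurred
    between birth and the start of observation: it is listed in the
    medical history without a date, or its first EHR date falls less than
    `instant_window` years after observation starts ("instant event").
    Otherwise returns ("incident", event_age) or ("censored", obs_end_age).
    """
    if in_history:
        return "prevalent", None
    if event_age is not None:
        if event_age - obs_start_age < instant_window:
            return "prevalent", None
        return "incident", event_age
    return "censored", obs_end_age
```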
We used the same disease definitions as above (Supplementary Table 4) to define case status, and the age at the first diagnostic code or first mention in the medical problem list as the event date. We fitted Cox proportional hazards models for the time to onset of 30 clinical outcomes with sex, the first two genetic principal components and the PGS for each of 327 blood traits as variables, using the R package ‘survival’ (v3.5-3), which supports survival analyses based on counting processes, including delayed entry. For visual comparison of study participants, we also stratified individuals into PGS quartiles and plotted Kaplan–Meier curves.
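The sketch below shows how delayed entry changes the Cox partial likelihood. With age as the time scale, an individual counts as at risk at an event time t only if observation started before t and the individual has not yet had an event or been censored. This is a one-covariate Newton–Raphson illustration, not the ‘survival’ package used in the study. It assumes no tied event times and handles only left truncation, not the "prevalent" interval cases described above.

```python
import numpy as np

def cox_delayed_entry(entry, exit_time, event, x, n_iter=50, tol=1e-9):
    """One-covariate Cox model with delayed entry (left truncation).

    Individual j is in the risk set at event time t only if
    entry[j] < t <= exit_time[j]. Assumes no tied event times.
    Returns (log hazard ratio, standard error).
    """
    entry, exit_time, x = (np.asarray(a, dtype=float)
                           for a in (entry, exit_time, x))
    is_event = np.asarray(event) == 1
    t_ev, x_ev = exit_time[is_event], x[is_event]
    # Rows: event times; columns: 1.0 if that individual is at risk.
    at_risk = ((entry[None, :] < t_ev[:, None])
               & (exit_time[None, :] >= t_ev[:, None])).astype(float)
    beta = 0.0
    for _ in range(n_iter):
        w = np.exp(beta * x)
        s0 = at_risk @ w
        s1 = at_risk @ (w * x)
        s2 = at_risk @ (w * x * x)
        score = np.sum(x_ev - s1 / s0)                # partial-likelihood gradient
        info = np.sum(s2 / s0 - (s1 / s0) ** 2)       # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / np.sqrt(info)
```

Omitting the `entry < t` condition would place individuals in risk sets before they were observable, which biases estimates when entry ages vary.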
Meta-analyses of MGB and UKBB disease associations
To integrate the results from the MGB Biobank and the UKBB, we meta-analyzed each blood PGS and clinical endpoint model using the 'rma' function from the 'metafor' package in R. We fit random-effects models by restricted maximum likelihood (REML), which allows for heterogeneity of effects across datasets. The estimated log hazard ratios and their standard errors from each dataset served as inputs, and we visualized the results with forest plots. To control the false discovery rate (FDR) across multiple comparisons, we computed q values using the 'qvalue' package (v2.4.2) in R. PGS associations from the meta-analysis and from the MGB and UKBB analyses are listed in Supplementary Data 3–5.
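The random-effects pooling and q-value steps can be illustrated with a short, self-contained Python sketch. It is a stand-in for metafor::rma(method = "REML") and the qvalue package, not a reimplementation of either. Here τ² is estimated by fixed-point iteration of the REML estimating equation; metafor uses its own iterative algorithm, so results may differ in the last digits. π0 is estimated with a single fixed λ = 0.5 rather than qvalue's default smoother over a grid of λ values. Inputs are the per-cohort log hazard ratios and their standard errors. The function names are hypothetical.

```python
import math
from typing import Sequence


def reml_random_effects(y: Sequence[float], se: Sequence[float],
                        tol: float = 1e-12, max_iter: int = 1000):
    """Random-effects pooling of log hazard ratios with an REML estimate of tau^2.

    Returns (pooled log HR, its SE, tau^2, two-sided p value).
    """
    v = [s ** 2 for s in se]  # within-cohort sampling variances
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        sw2 = sum(wi ** 2 for wi in w)
        # REML estimating equation, truncated at zero:
        # tau2 = sum w^2 ((y - mu)^2 - v) / sum w^2 + 1 / sum w
        new = sum(wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v)) / sw2
        new = max(0.0, new + 1.0 / sum(w))
        converged = abs(new - tau2) < tol
        tau2 = new
        if converged:
            break
    w = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se_mu = math.sqrt(1.0 / sum(w))
    p = math.erfc(abs(mu / se_mu) / math.sqrt(2.0))  # two-sided normal p value
    return mu, se_mu, tau2, p


def storey_qvalues(pvals: Sequence[float], lam: float = 0.5) -> list[float]:
    """Storey q values with a single tuning parameter lambda (simplified)."""
    m = len(pvals)
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running = min(running, pi0 * m * pvals[i] / rank)
        q[i] = running
    return q
```

With equal within-cohort variances v, the REML estimate reduces to the sample variance of the cohort estimates minus v, which is a convenient sanity check.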
ICA of blood traits and clinical endpoints
To visualize the multivariate structure linking blood traits and clinical endpoints, we applied independent component analysis (ICA) to a matrix of association t scores. The matrix combined the t scores calculated for blood readouts in the screening cohort with the blood-trait PGS association t scores from the meta-analysis step. It therefore captured the pattern of association between every pair of blood trait and clinical endpoint in our data. We performed ICA with the 'fastICA' R package (v1.2-3). ICA separates a multivariate signal into additive subcomponents that are as statistically independent as possible. It yields two outputs: a set of independent components and a mixing matrix. The independent components represent dimensions of variation within the data. The mixing matrix describes how each original variable (blood readout or blood-based PGS) contributes to these dimensions. We plotted the first two independent components to project the clinical endpoints into a two-dimensional space. We then used the weights from the mixing matrix to indicate the direction of association for a subset of blood traits within this space.
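To illustrate what the two ICA outputs represent, here is a minimal symmetric FastICA in Python with NumPy. It uses the logcosh contrast (tanh nonlinearity) and is a sketch rather than a port of the R fastICA package. It assumes rows of the input are clinical endpoints (observations) and columns are blood readouts or blood-trait PGSs (variables). Under that orientation, S holds each endpoint's coordinates on the independent components, and the mixing matrix A holds the per-variable weights. Independent components are only defined up to sign, scale and order, so plots from different runs or implementations can be mirrored or swapped.

```python
import numpy as np


def _sym_decorrelate(W: np.ndarray) -> np.ndarray:
    """Symmetric decorrelation W <- (W W^T)^(-1/2) W, making the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W


def fast_ica(X: np.ndarray, n_components: int = 2, max_iter: int = 200,
             tol: float = 1e-6, seed: int = 0):
    """Minimal symmetric FastICA with the logcosh contrast (g = tanh).

    X has shape (n_observations, n_variables). Returns S with shape
    (n_observations, n_components) and a mixing matrix A with shape
    (n_components, n_variables) such that S @ A approximates centered X.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    # Whitening: project onto the leading principal directions and rescale
    # so that the whitened data Z has identity covariance.
    _, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    K = Vt[:n_components].T / d[:n_components] * np.sqrt(n)  # (n_variables, k)
    Z = Xc @ K
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)        # g(w_j^T z) for every sample and component
        G_prime = 1.0 - G ** 2      # g'(u) = 1 - tanh(u)^2
        W_new = _sym_decorrelate(G.T @ Z / n - np.diag(G_prime.mean(axis=0)) @ W)
        # Converged when each new row is (anti)parallel to the previous one.
        done = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if done:
            break
    S = Z @ W.T                     # component scores (e.g. endpoint coordinates)
    A = np.linalg.pinv(K @ W.T)     # mixing weights per original variable
    return S, A
```

Plotting the first two columns of S would place each endpoint in two dimensions. The rows of A give each variable's weight on those axes.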
Additional statistical analysis
We first assessed the normality of the data with the Kolmogorov–Smirnov test. For normally distributed data, we used an unpaired two-tailed Student's t test for comparisons between two groups and a paired two-tailed Student's t test for comparisons between treatments within the same donors. For data that were not normally distributed, we used the nonparametric Mann–Whitney test for comparisons between two groups and the Wilcoxon matched-pairs signed-rank test for comparisons between treatments within the same donors. To test for differences among more than two groups, we used an ordinary one-way analysis of variance (ANOVA) followed by Dunnett's multiple comparison test.
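These decision rules can be expressed as a small dispatch function. The Python sketch below uses SciPy and makes two assumptions that the text does not specify. First, the Kolmogorov–Smirnov test is run against a normal distribution with the sample mean and standard deviation. Second, for paired designs normality is checked on the within-donor differences. Dunnett's test compares each treatment with a designated control group, and scipy.stats.dunnett is available only from SciPy 1.11 onward. The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def looks_normal(x, alpha=0.05) -> bool:
    """Kolmogorov-Smirnov test against a normal with the sample mean and SD."""
    x = np.asarray(x, dtype=float)
    return stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))).pvalue >= alpha


def compare_two(a, b, paired: bool, alpha=0.05):
    """Choose and run a two-sided two-sample test; returns (test name, p value)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if paired:
        # Assumption: normality of paired data is judged on the differences.
        if looks_normal(a - b, alpha):
            return "paired t", stats.ttest_rel(a, b).pvalue
        return "Wilcoxon signed-rank", stats.wilcoxon(a, b).pvalue
    if looks_normal(a, alpha) and looks_normal(b, alpha):
        # equal_var=True (the default) gives Student's rather than Welch's t test.
        return "unpaired t", stats.ttest_ind(a, b).pvalue
    return "Mann-Whitney", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue


def compare_to_control(control, *treatments):
    """One-way ANOVA, then Dunnett's test of each treatment against the control.

    Requires SciPy >= 1.11 for scipy.stats.dunnett.
    """
    anova_p = stats.f_oneway(control, *treatments).pvalue
    dunnett_p = stats.dunnett(*treatments, control=control).pvalue
    return anova_p, dunnett_p
```

Because the normal parameters are estimated from the same sample, this form of the Kolmogorov–Smirnov test is conservative. The Lilliefors variant corrects for that.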
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41588-023-01600-x.
Acknowledgements
This work was supported by One Brave Idea, cofounded by the American Heart Association and Verily with significant support from AstraZeneca and pillar support from Quest Diagnostics (to C.A.M. and R.C.D.). M.H., W.Z. and S.G. are supported by the Tobia and Morton Mower Science Innovation Fund Fellowship.
Author contributions
M.H. and W.Z. designed and performed experiments and data analyses, and drafted the manuscript. S.S.E., P.C.T. and H.Z. designed and performed experiments and data analyses. C.N.W., D.D.K., L.L.X., C.N., Z.S., J.C., C.G.E., M.N.H., A.S.T., T.M., S.G., J.G.T., B.W. and S.V. performed experiments and provided technical assistance. E.W., C.S., J.B.N., D.N.N., G.M.L., H.C.F., C.J.P., M.C., S.S. and C.R. performed and coordinated the recruitment of study participants. R.C.D., M.H., C.A.M., S.V. and W.Z. contributed to the study conceptualization and design, and edited the manuscript. All authors read and approved the final version of the manuscript.
Peer review
Peer review information
Nature Genetics thanks Guillaume Lettre and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
Individual-level data are subject to restrictions imposed by patient consent and local ethics review boards. GWAS summary statistics have been deposited in the GWAS catalog database (GCST90257015-GCST90257105). PGSs as used for the UKBB analyses have been deposited in Figshare (10.6084/m9.figshare.24354235). Clumped significant variants are listed in Supplementary Data 1. Clinical outcomes and quantitative lab measurements associated with blood readouts with Padj < 0.1 are listed in Supplementary Data 2. Clinical outcomes associated with polygenic models derived from blood readouts with Padj < 0.1 are listed in Supplementary Data 3 for the meta-analyses, and Supplementary Data 4 and 5 for the MGB and UKBB cohorts, respectively. Other datasets generated or analyzed during the current study can be made available upon reasonable request to the corresponding authors.
Code availability
The custom code used in this study is available at 10.5281/zenodo.10041992 (ref. 54). For proprietary or commercial software/tools used in this study, please refer to the materials and methods section for details on how to access them or contact the corresponding author for more information.
Competing interests
R.C.D. was supported by grants from the National Institutes of Health and the American Heart Association (One Brave Idea, Apple Heart and Movement Study) and is a cofounder of Atman Health. C.A.M. is supported by grants from the National Institutes of Health and the American Heart Association (One Brave Idea, Apple Heart and Movement Study); is a consultant for Bayer, Biosymetrics, Clarify Health, Dewpoint Therapeutics, Dinaqor, Dr. Evidence, Foresite Labs, Insmed, Pfizer and Purpose Life Sciences; and is a cofounder of Atman Health. R.C.D., M.H., C.A.M., S.V. and W.Z. are co-inventors on patents related to this work, and C.A.M., R.C.D., M.H. and W.Z. hold equity in Tanaist. All other authors report no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Max Homilius, Wandi Zhu.
These authors jointly supervised this work: Max Homilius, Wandi Zhu, Calum A. MacRae, Rahul C. Deo.
Contributor Information
Max Homilius, Email: mhomilius@bwh.harvard.edu.
Wandi Zhu, Email: wzhu5@bwh.harvard.edu.
Calum A. MacRae, Email: cmacrae@bwh.harvard.edu.
Rahul C. Deo, Email: rdeo@bwh.harvard.edu.
Extended data is available for this paper at 10.1038/s41588-023-01600-x.
Supplementary information
The online version contains supplementary material available at 10.1038/s41588-023-01600-x.
References
1. Martínez-Jiménez F, et al. A compendium of mutational cancer driver genes. Nat. Rev. Cancer. 2020;20:555–572. doi: 10.1038/s41568-020-0290-x.
2. Green RH, et al. Asthma exacerbations and sputum eosinophil counts: a randomised controlled trial. Lancet. 2002;360:1715–1721. doi: 10.1016/S0140-6736(02)11679-5.
3. Gertz MA, Dispenzieri A. Systemic amyloidosis recognition, prognosis, and therapy. JAMA. 2020;324:79–89. doi: 10.1001/jama.2020.5493.
4. Burton PR, et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911.
5. Visscher PM, et al. 10 Years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005.
6. Uffelmann E, et al. Genome-wide association studies. Nat. Rev. Methods Prim. 2021;1:59.
7. Jaiswal S, et al. Clonal hematopoiesis and risk of atherosclerotic cardiovascular disease. N. Engl. J. Med. 2017;377:111–121. doi: 10.1056/NEJMoa1701719.
8. Zhu W, Deo RC, MacRae CA. Single cell biology: exploring somatic cell behaviors, competition and selection in chronic disease. Front. Pharmacol. 2022;13:867431. doi: 10.3389/fphar.2022.867431.
9. Fuster JJ, et al. TET2-loss-of-function-driven clonal hematopoiesis exacerbates experimental insulin resistance in aging and obesity. Cell Rep. 2020;33:108326. doi: 10.1016/j.celrep.2020.108326.
10. Carobbio A, et al. Leukocytosis is a risk factor for thrombosis in essential thrombocythemia: interaction with treatment, standard risk factors, and Jak2 mutation status. Blood. 2007;109:2310–2313. doi: 10.1182/blood-2006-09-046342.
11. Svensson EC, et al. TET2-driven clonal hematopoiesis and response to Canakinumab. JAMA Cardiol. 2022;7:521–528. doi: 10.1001/jamacardio.2022.0386.
12. Zhu W, et al. PIEZO1 mediates a mechanothrombotic pathway in diabetes. Sci. Transl. Med. 2022;14:eabk1707. doi: 10.1126/scitranslmed.abk1707.
13. Ridker PM, et al. Antiinflammatory therapy with canakinumab for atherosclerotic disease. N. Engl. J. Med. 2017;377:1119–1131. doi: 10.1056/NEJMoa1707914.
14. Ernandez T, Mayadas TN. The changing landscape of renal inflammation. Trends Mol. Med. 2016;22:151–163. doi: 10.1016/j.molmed.2015.12.002.
15. Bowman SJ. Hematological manifestations of rheumatoid arthritis. Scand. J. Rheumatol. 2002;31:251–259. doi: 10.1080/030097402760375124.
16. Astle WJ, et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell. 2016;167:1415–1429. doi: 10.1016/j.cell.2016.10.042.
17. Vuckovic D, et al. The polygenic and monogenic basis of blood traits and diseases. Cell. 2020;182:1214–1231. doi: 10.1016/j.cell.2020.08.008.
18. Chen M-H, et al. Trans-ethnic and ancestry-specific blood-cell genetics in 746,667 individuals from 5 global populations. Cell. 2020;182:1198–1213. doi: 10.1016/j.cell.2020.06.045.
19. Akbari P, et al. A genome-wide association study of blood cell morphology identifies cellular proteins implicated in disease aetiology. Nat. Commun. 2023;14:5023. doi: 10.1038/s41467-023-40679-y.
20. Li Y, et al. A functional genomics approach to understand variation in cytokine production in humans. Cell. 2016;167:1099–1110. doi: 10.1016/j.cell.2016.10.017.
21. Rodriguez BAT, et al. A platelet function modulator of thrombin activation is causally linked to cardiovascular disease and affects PAR4 receptor signaling. Am. J. Hum. Genet. 2020;107:211–221. doi: 10.1016/j.ajhg.2020.06.008.
22. Keramati AR, et al. Genome sequencing unveils a regulatory landscape of platelet reactivity. Nat. Commun. 2021;12:3626. doi: 10.1038/s41467-021-23470-9.
23. Ellinghaus D, et al. Analysis of five chronic inflammatory diseases identifies 27 new associations and highlights disease-specific patterns at shared loci. Nat. Genet. 2016;48:510–518. doi: 10.1038/ng.3528.
24. Zhu Z, et al. A genome-wide cross-trait analysis from UK Biobank highlights the shared genetic architecture of asthma and allergic diseases. Nat. Genet. 2018;50:857–864. doi: 10.1038/s41588-018-0121-0.
25. Kebir DE, Filep JG. Modulation of neutrophil apoptosis and the resolution of inflammation through β2 integrins. Front. Immunol. 2013;4:60. doi: 10.3389/fimmu.2013.00060.
26. Fox S, Leitch AE, Duffin R, Haslett C, Rossi AG. Neutrophil apoptosis: relevance to the innate immune response and inflammatory disease. J. Innate Immun. 2010;2:216–227. doi: 10.1159/000284367.
27. Pillay J, et al. In vivo labeling with 2H2O reveals a human neutrophil lifespan of 5.4 days. Blood. 2010;116:625–627. doi: 10.1182/blood-2010-01-259028.
28. Fortunati E, Kazemier KM, Grutters JC, Koenderman L, Van den Bosch JMM. Human neutrophils switch to an activated phenotype after homing to the lung irrespective of inflammatory disease. Clin. Exp. Immunol. 2009;155:559–566. doi: 10.1111/j.1365-2249.2008.03791.x.
29. Casanova-Acebes M, et al. Rhythmic modulation of the hematopoietic niche through neutrophil clearance. Cell. 2013;153:1025–1035. doi: 10.1016/j.cell.2013.04.040.
30. Mikacenic C, Reiner AP, Holden TD, Nickerson DA, Wurfel MM. Variation in the TLR10/TLR1/TLR6 locus is the major genetic determinant of interindividual difference in TLR1/2-mediated responses. Genes Immun. 2013;14:52–57. doi: 10.1038/gene.2012.53.
31. Heffelfinger C, et al. Haplotype structure and positive selection at TLR1. Eur. J. Hum. Genet. 2014;22:551–557. doi: 10.1038/ejhg.2013.194.
32. Lauterbach MA, et al. Toll-like receptor signaling rewires macrophage metabolism and promotes histone acetylation via ATP-citrate lyase. Immunity. 2019;51:997–1011. doi: 10.1016/j.immuni.2019.11.009.
33. Porter AG, Jänicke RU. Emerging roles of caspase-3 in apoptosis. Cell Death Differ. 1999;6:99–104. doi: 10.1038/sj.cdd.4400476.
34. Czabotar PE, Lessene G, Strasser A, Adams JM. Control of apoptosis by the BCL-2 protein family: implications for physiology and therapy. Nat. Rev. Mol. Cell Biol. 2014;15:49–63. doi: 10.1038/nrm3722.
35. Kumar S, Dikshit M. Metabolic insight of neutrophils in health and disease. Front. Immunol. 2019;10:2099. doi: 10.3389/fimmu.2019.02099.
36. Sadiku P, et al. Neutrophils fuel effective immune responses through gluconeogenesis and glycogenesis. Cell Metab. 2021;33:411–423. doi: 10.1016/j.cmet.2020.11.016.
37. Chen L, et al. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell. 2016;167:1398–1414. doi: 10.1016/j.cell.2016.10.026.
38. Võsa U, et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 2021;53:1300–1310. doi: 10.1038/s41588-021-00913-z.
39. Injarabian L, Devin A, Ransac S, Marteyn BS. Neutrophil metabolic shift during their lifecycle: impact on their survival and activation. Int. J. Mol. Sci. 2019;21:287. doi: 10.3390/ijms21010287.
40. Al-Rashed F, et al. TNF-α induces a pro-inflammatory phenotypic shift in monocytes through ACSL1: relevance to metabolic inflammation. Cell. Physiol. Biochem. 2019;52:397–407. doi: 10.33594/000000028.
41. Renshaw SA, et al. A transgenic zebrafish model of neutrophilic inflammation. Blood. 2006;108:3976–3978. doi: 10.1182/blood-2006-05-024075.
42. Palma-Barqueros V, et al. Expanding the genetic spectrum of TUBB1-related thrombocytopenia. Blood Adv. 2021;5:5453–5467. doi: 10.1182/bloodadvances.2020004057.
43. Freson K, et al. The TUBB1 Q43P functional polymorphism reduces the risk of cardiovascular disease in men by modulating platelet function and structure. Blood. 2005;106:2356–2362. doi: 10.1182/blood-2005-02-0723.
44. Evans JDW, et al. BMPR2 mutations and survival in pulmonary arterial hypertension: an individual participant data meta-analysis. Lancet Respir. Med. 2016;4:129–137. doi: 10.1016/S2213-2600(15)00544-5.
45. West JD, et al. Adverse effects of BMPR2 suppression in macrophages in animal models of pulmonary hypertension. Pulm. Circ. 2019;10:2045894019856483. doi: 10.1177/2045894019856483.
46. Hofherr A, et al. Targeting inflammation for the treatment of diabetic kidney disease: a five-compartment mechanistic model. BMC Nephrol. 2022;23:208. doi: 10.1186/s12882-022-02794-8.
47. Chang CC, et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
48. Das S, et al. Next-generation genotype imputation service and methods. Nat. Genet. 2016;48:1284–1287. doi: 10.1038/ng.3656.
49. Manichaikul A, et al. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26:2867–2873. doi: 10.1093/bioinformatics/btq559.
50. Meyer HV. meyer-lab-cshl/plinkQC: plinkQC 0.3.2 (v0.3.2). Zenodo. 2020. doi: 10.5281/zenodo.3934294.
51. Bycroft C, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.
52. Labun K, et al. CHOPCHOP v3: expanding the CRISPR web toolbox beyond genome editing. Nucleic Acids Res. 2019;47:W171–W174. doi: 10.1093/nar/gkz365.
53. Mountjoy E, et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat. Genet. 2021;53:1527–1533. doi: 10.1038/s41588-021-00945-5.
54. Homilius M. mxhm/blood_perturbation_gwas: initial release (v0.0.1). Zenodo. 2023. doi: 10.5281/zenodo.10041992.