Proceedings of the National Academy of Sciences of the United States of America. 2025 Jan 7;122(2):e2414018122. doi: 10.1073/pnas.2414018122

Exome sequencing identifies genes for socioeconomic status in 350,770 individuals

Xin-Rui Wu a,b,1, Liu Yang a,b,1, Bang-Sheng Wu a,b,1, Wei-Shi Liu a,b, Yue-Ting Deng a,b, Ju-Jiao Kang c, Qiang Dong a,b, Barbara J Sahakian c,d, Jian-Feng Feng c,e, Wei Cheng a,b,c,2, Jin-Tai Yu a,b,2
PMCID: PMC11745334  PMID: 39772748

Significance

This study unveiled the contribution of protein-coding variants to socioeconomic status (SES) in the general population by using exome sequencing data of 350,770 unrelated individuals. By collapsing rare variants within each gene according to function and frequency, we identified 11 genes associated with SES, 7 of which were not observed in previous genetic studies. Exome-wide single variant tests confirmed some previously reported common signals for SES. Further analysis using a wide range of health outcomes revealed the pleiotropy of these SES-associated genetic variants. The results improve our understanding of the genetic underpinning of SES and provide clues for future research to investigate the biological mechanisms of the SES-health gradient.

Keywords: whole-exome sequencing, rare coding variant, pleiotropy, socioeconomic status, health

Abstract

Socioeconomic status (SES) is a critical factor in determining health outcomes and is influenced by genetic and environmental factors. However, our understanding of the genetic structure of SES remains incomplete. Here, we conducted a large-scale exome study of SES markers (household income, occupational status, educational attainment, and social deprivation) in 350,770 individuals. For rare coding variants, we identified 56 significant associations by gene-based collapsing tests, unveiling 7 additional SES-associated genes (NRN1, CCDC36, RHOB, EP400, NCAM1, TPTEP2-CSNK1E, and LINC02881). Exome-wide single common variant analysis revealed nine lead single-nucleotide polymorphisms (SNPs) associated with household income and 34 lead SNPs associated with EduYears, replicating previous GWAS findings. The gene–environment correlations had a substantial impact on the genetic associations with SES, as indicated by the significantly increased P values in several associations after controlling for geographic regions. Furthermore, we observed the pleiotropic effects of SES-associated genetic factors on a wide range of health outcomes, such as cognitive function, psychosocial status, and diabetes. This study highlights the contribution of coding variants to SES and their associations with health phenotypes.


Socioeconomic status (SES) is usually evaluated by income, educational attainment, occupational status, and social deprivation, representing an individual’s social and financial resources (1–3). A growing body of evidence suggests that genomic differences contribute to individual disparities in SES (4–11). Twin studies have estimated the heritability of educational attainment to be 43% (4). Despite the numerous SES-related loci discovered by genome-wide association studies (GWASs) over the past decades, the identified common variants explain only 11 to 22% of the variance in SES traits (5–8), leaving substantial “missing heritability” to be accounted for (12). Moreover, the majority of the identified loci reside in noncoding regions, which makes it challenging to directly pinpoint the genes responsible for SES outcomes (13).

The advent of large-scale whole-exome sequencing (WES) offers a valuable resource for identifying SES-associated protein-coding variants (14). This is especially important for elucidating the role of rare variants [minor allele frequency (MAF) < 1%] in SES. Such variants went undetected in previous GWASs but offer distinct advantages: they tend to have larger effect sizes on complex traits, implicate SES-associated genes more reliably, and allow informative functional interpretation (14, 15). Therefore, a systematic WES analysis of SES is needed to refine our understanding of its genetic architecture.

Another critical accompanying question is the shared genetic mechanisms of SES and health. Accumulating evidence supports the role of SES as an important determinant of life outcomes, including longevity, through both environmental and genetic pathways (16–18). Genetic findings regarding SES are expected to facilitate a better understanding of the potential mechanisms underlying clinical conditions and to ameliorate health issues associated with socioeconomic disadvantages (19). For example, needs for additional screening programs for specific diseases may be identified, or particularly vulnerable individuals could be assessed for a disease at a younger age or more frequently to ensure better outcomes. Evaluating the impact of genetic factors related to SES on a wide range of health outcomes would advance our understanding of the genetic pathways underpinning the SES-health linkage.

With the aim of comprehensively understanding the genetic architecture of SES and its effects on human health, we conducted a large-sample exome study of four SES traits (household income, educational attainment, occupational status, and social deprivation) in 350,770 individuals from the UK Biobank (UKB). We performed gene-based collapsing analysis and single-variant analysis for rare and common variants, respectively. SES-associated genes were biologically annotated at multiple levels of tissue types, cell types, and temporal expression patterns. Furthermore, we evaluated the associations between SES genetic factors and a wide range of health conditions and biological indicators (Fig. 1).

Fig. 1.

Summary of this study. The figure summarizes the analytical flow and key findings of this study. First, we performed gene-based collapsing analysis and single-variant analysis for rare and common variants from exome data, respectively. Then, leave-one-variant-out (LOVO) analysis, conditional analysis, and a series of biological annotations were performed to characterize the genetic architecture of SES in more detail. Finally, we evaluated the associations between SES genetic factors and a range of human health conditions and explored the potential mechanisms underpinning the SES-health gradient by using neuroimaging, blood biochemistry, and other bioindicators. Partial illustrations sourced from BioRender with publishing license (https://www.biorender.com/).

Results

Exome-Wide Collapsing Analysis of Rare Variants Identified 56 Significant Associations.

We first aggregated rare putative loss-of-function (pLOF) variants, alone and in combination with likely deleterious missense variants, and performed gene-based collapsing analysis with 4 maximum MAF (max-MAF) cutoffs, utilizing a generalized mixed model implemented in SAIGE-GENE+ (20). In total, 588,764 association tests were performed covering 20,125 genes and 4 SES traits (SI Appendix, Fig. S1 and Tables S1–S3). The genomic inflation factors (λ) were 1.15, 0.93, 1.12, and 1.02 for years of education (EduYears), occupational status, household income, and Townsend deprivation index (TDI), respectively. To discern the source of the inflation, we performed association tests of exome-wide rare synonymous variants with SES indicators. The quantile–quantile (Q-Q) plots of the synonymous mask showed good control of the test statistics, implying that a large portion of the observed inflation was attributable to polygenicity rather than confounding biases (SI Appendix, Figs. S2–S5). After Bonferroni correction, we identified 56 statistically significant associations at P < 8.49 × 10−8, involving 11 genes and 15 gene-trait pairs (9 for TDI, 2 for household income, 1 for occupational status, and 3 for EduYears) (Fig. 2 A–D and Table 1). Among the 15 identified gene-trait pairs, 9 (GIGYF1-EduYears/income/occupation/TDI, ANKRD12-income/TDI, EP400-TDI, NRN1-TDI, and TPTEP2-CSNK1E-TDI) were significant only with pLOF variants, 3 (KDM5B-EduYears, CCDC36-TDI, and NCAM1-TDI) were significant only when both pLOF and missense variants were included, and the remaining 3 pairs (ADGRB2-EduYears, LINC02881-TDI, and RHOB-TDI) were significant regardless of whether only pLOF or the combination of pLOF and missense variants was considered (SI Appendix, Table S4). The patterns of rare genetic variants in each SES-associated gene with the most pronounced P value are demonstrated in Fig. 2E and SI Appendix, Table S5.
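The Bonferroni thresholds quoted in the Results follow directly from the reported test counts. The short sketch below reproduces them, assuming a family-wise α of 0.05 (the value implied by the "0.05/…" expressions given elsewhere in the text); the function name is illustrative and not taken from the authors' pipeline.

```python
def bonferroni_threshold(alpha: float, *n_tests: int) -> float:
    """Return alpha divided by every supplied test count in turn."""
    threshold = alpha
    for n in n_tests:
        if n <= 0:
            raise ValueError("test counts must be positive")
        threshold /= n
    return threshold


# Test counts as quoted in the text; alpha = 0.05 is an assumption.
thresholds = {
    "gene-based collapsing (588,764 tests)": bonferroni_threshold(0.05, 588_764),
    "single common variant (142,680 tests)": bonferroni_threshold(0.05, 142_680),
    "gene x health outcome (11 x 54)": bonferroni_threshold(0.05, 11, 54),
}

for label, value in thresholds.items():
    print(f"{label}: P < {value:.2e}")
```

The first value rounds to 8.49 × 10−8, matching the exome-wide threshold used for the collapsing tests.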

Fig. 2.

Rare variants for socioeconomic status. Manhattan plots show the results from gene-based collapsing tests for (A) Townsend deprivation index, (B) household income, (C) occupational status, and (D) educational attainment. Each gene-trait pair was tested 8 times according to two function categories (distinguished by shape) and 4 max-MAF categories (distinguished by size) and performed by the SKAT-O test in SAIGE-GENE+ (20), adjusting for age, biological sex, and top 10 PCs. The group with the smallest P value in each gene-trait pair was retained in the Manhattan plots for a concise visualization. The x-axis represents the chromosomes, and the y-axis represents the −log10(P). The red dotted line indicates the Bonferroni-corrected significance threshold of P < 8.49 × 10−8. (E) The counts of pLOF and missense variants contained in genes significant in the gene-based collapsing tests. Function and max-MAF classes were chosen for cases with the smallest P value. The x-axis represents the numbers of pLOF variants, and the y-axis represents the numbers of likely deleterious missense variants. (F) and (G) show the rare mutations in NRN1 and CCDC36, respectively. The corresponding protein domains were determined by SMART (21). The consequence of each variant was annotated according to the canonical transcript of genes. (H) The distribution of TDI categories for NRN1 and CCDC36 carriers and noncarriers.

Table 1.

Significant results from gene-based collapsing tests for rare variants

| Phenotype | Gene | Function category | Max-MAF category | MAC | Burden Beta | Burden SE | SKAT-O P | Known phenotype in OMIM |
|---|---|---|---|---|---|---|---|---|
| Townsend deprivation index (n = 299,086) | NRN1 | pLOF | 1.00E-05 | 10 | 1.028 | 0.107 | 5.23E-22 | |
| | ANKRD12 | pLOF | 1.00E-04 | 215 | 0.071 | 0.008 | 4.63E-18 | |
| | TPTEP2-CSNK1E | pLOF | 1.00E-05 | 10 | 0.779 | 0.107 | 2.72E-13 | |
| | CCDC36 | pLOF+Missense | 1.00E-05 | 10 | −0.778 | 0.107 | 2.91E-13 | |
| | LINC02881 | pLOF | 1.00E-05 | 10 | 0.758 | 0.107 | 1.16E-12 | |
| | GIGYF1 | pLOF | 1.00E-04 | 102 | 0.083 | 0.012 | 3.18E-12 | |
| | RHOB | pLOF | 1.00E-05 | 8 | 0.226 | 0.041 | 3.76E-08 | |
| | EP400 | pLOF | 1.00E-05 | 68 | 0.077 | 0.014 | 4.64E-08 | |
| | NCAM1 | pLOF+Missense | 1.00E-05 | 23 | 0.132 | 0.024 | 5.55E-08 | |
| Household income (n = 258,589) | ANKRD12 | pLOF | 1.00E-04 | 186 | −0.027 | 0.003 | 2.33E-16 | |
| | GIGYF1 | pLOF | 1.00E-04 | 84 | −0.030 | 0.005 | 2.97E-10 | |
| Occupational status (n = 296,182) | GIGYF1 | pLOF | 1.00E-04 | 118 | 0.072 | 0.013 | 7.13E-09 | |
| EduYears (n = 297,089) | GIGYF1 | pLOF | 1.00E-04 | 98 | −0.119 | 0.020 | 2.16E-10 | |
| | KDM5B | pLOF+Missense | 1.00E-04 | 1,155 | −0.023 | 0.006 | 3.26E-09 | MRT65 (MIM: #618109) |
| | ADGRB2 | pLOF | 1.00E-04 | 97 | −0.100 | 0.020 | 6.78E-09 | |

The group with the smallest P value in each significant gene-trait pair is shown in Table 1. Bolded symbols indicate additional gene–trait associations. Abbreviations: MAC, minor allele count (defined as the aggregated number of variants within a specified function annotation and frequency group on each gene); MRT65, intellectual developmental disorder, autosomal recessive 65.

Seven (NRN1, CCDC36, RHOB, EP400, NCAM1, TPTEP2-CSNK1E, and LINC02881) of the 9 TDI-associated genes had not been reported for such socioeconomic outcomes in the GWAS Catalog or in exome studies (22, 23) (Fig. 2A). Of note, rare pLOF variants in NRN1 exhibited the most significant association with higher TDI (β = 1.03, 95% CI = [0.82, 1.24], P = 5.23 × 10−22), while the CCDC36 missense variants demonstrated a protective effect on TDI (β = −0.78, 95% CI = [−0.99, −0.57], P = 2.91 × 10−13) (Fig. 2 F and G). In addition to replicating the effects of ANKRD12 and GIGYF1 on TDI (23), we found that these genes were associated with income (ANKRD12 and GIGYF1) (Fig. 2B) and occupation (GIGYF1) (Fig. 2C). We also replicated 3 genes (KDM5B, ADGRB2, and GIGYF1) previously reported to be associated with EduYears (8, 9, 22) (Fig. 2D). Our strategy of categorizing variants by function additionally indicated that rare missense variants in KDM5B also affected EduYears: the association became more significant (β = −0.02, 95% CI = [−0.03, −0.01], P = 3.26 × 10−9) after missense variants were added to the mask that already contained pLOF variants (SI Appendix, Table S4), and the number of missense variants was nearly three times that of pLOF variants (Fig. 2E and SI Appendix, Table S5). The distribution of phenotypic categories for carriers and noncarriers of significant variants is detailed in Fig. 2H and SI Appendix, Table S6.
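The confidence intervals quoted above are consistent with a normal-approximation (Wald) interval, β ± 1.96 × SE, applied to the burden estimates in Table 1. The text does not state how the intervals were computed, so the sketch below is an assumption that happens to reproduce the reported bounds; `wald_ci` is an illustrative name.

```python
from statistics import NormalDist


def wald_ci(beta: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval: beta +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return beta - z * se, beta + z * se


# Burden beta and SE taken from Table 1 (TDI rows).
for gene, beta, se in [("NRN1", 1.028, 0.107), ("CCDC36", -0.778, 0.107)]:
    lower, upper = wald_ci(beta, se)
    print(f"{gene}: beta = {beta:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```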

We noted that the number of significant associations for TDI was markedly higher than for the other three SES indices. To determine whether the results were influenced by gene–environment correlations, we additionally controlled for the geographic region of birth and current address of individuals. The associations of GIGYF1 and LINC02881 with TDI were no longer significant after this adjustment, suggesting that gene–environment correlations across geographical regions had a substantial impact on them. The remaining seven genes remained nominally significant, with the P values of ANKRD12, EP400, and RHOB showing minimal changes compared to the original results (SI Appendix, Table S7).

Subsequently, leave-one-variant-out (LOVO) analysis was performed for the significant associations identified in the gene-based collapsing analysis to determine whether the association signals were driven by a burden of multiple rare variants or by a single variant. The largest LOVO P value remained significant for 6 of the 15 gene-trait pairs, including ANKRD12 with household income and TDI, GIGYF1 with EduYears, household income, and TDI, as well as KDM5B with EduYears, which suggested that these associations were driven by multiple rare variants collectively (SI Appendix, Fig. S6 and Dataset S1). Each of the remaining 9 pairs had at least one notable variant whose exclusion substantially weakened the association signal of the corresponding LOVO mask (the LOVO P value no longer met the significance threshold), implying that a single variant played an important role in that association.
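The LOVO logic can be sketched as a loop that drops one variant at a time from the mask, reruns the gene-level test, and records the resulting P value. The example below is a minimal illustration only: it substitutes a simple carrier-versus-noncarrier linear regression with a normal approximation for the SKAT-O test in SAIGE-GENE+, ignores covariates and relatedness, and runs on hypothetical toy data. The function names `burden_p` and `lovo` are invented for this sketch.

```python
from math import sqrt
from statistics import NormalDist, fmean


def burden_p(y, carriers):
    """Two-sided P for a linear regression of y on a 0/1 carrier indicator.

    Normal approximation; a simple burden test, not SKAT-O.
    """
    n = len(y)
    x = [1.0 if i in carriers else 0.0 for i in range(n)]
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:  # nobody (or everybody) carries a variant in the mask
        return 1.0
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    sse = sum((yi - mean_y - beta * (xi - mean_x)) ** 2 for xi, yi in zip(x, y))
    se = sqrt(sse / (n - 2) / sxx)
    return 2.0 * NormalDist().cdf(-abs(beta / se))


def lovo(y, variants):
    """Map each variant ID to the burden P obtained with that variant left out."""
    result = {}
    for left_out in variants:
        mask = set().union(*(c for v, c in variants.items() if v != left_out))
        result[left_out] = burden_p(y, mask)
    return result


# Hypothetical toy data: 200 samples, three rare variants in one gene.
y = [float(i % 5 - 2) for i in range(200)]
variants = {"v1": {0, 1, 2, 3, 4, 5}, "v2": {10}, "v3": {20}}
for i in variants["v1"]:
    y[i] += 5.0  # only v1 carriers have a shifted phenotype

full_p = burden_p(y, set().union(*variants.values()))
lovo_p = lovo(y, variants)
driver = max(lovo_p, key=lovo_p.get)
print(f"full-mask P = {full_p:.1e}; largest LOVO P = {lovo_p[driver]:.2g} (without {driver})")
```

In this toy setup the signal disappears only when `v1` is removed, which is the pattern the text describes for associations driven by a single variant; a signal carried jointly by many variants would keep a small P value under every exclusion.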

Combining GWAS Signals to Prioritize the Convergent Genes.

Using genome-wide imputed genotype data from the UKB, we examined whether index common-variant signals were present within a ±500 kb window of the genes identified in the gene-based collapsing tests and whether the effects of rare variants on the SES traits were independent of common variants. ADGRB2 and KDM5B, both associated with EduYears, were found to harbor index single-nucleotide polymorphisms (SNPs) rs2050256 [reported by Lee et al. (9)] (Fig. 3 A and B) and rs10920444 (Fig. 3 C and D), respectively. The intersection of rare and common variants accentuated the relevance of these two genes to education.

Fig. 3.

Intersection of rare variations with GWAS signals in ADGRB2 and KDM5B. (A) The Upper panel is the local illustration of GWAS signals around ADGRB2, and the Lower panel describes the rare mutations in ADGRB2. The index common SNP was rs2050256, previously reported by Lee et al. (9). (B) The results from conditional analysis of ADGRB2. (C) The regional plot shows GWAS signals around KDM5B, with the index common signal rs10920444. (D) The results from conditional analysis of KDM5B. The P values reported in (B) and (D) were calculated from SKAT-O tests, and the effect sizes were estimated through Burden tests.

After conditioning on the nearby index common signals, the associations of EduYears with ADGRB2 and KDM5B remained statistically significant (P = 6.45 × 10−8 and 1.07 × 10−8, respectively), indicating the independent role of rare variants in these two prioritized genes.

Exome-Wide Common Variant Association Tests.

The impact of common variants (MAF ≥ 1%) on SES traits obtained from exome data was examined at the single-variant level. Common variants annotated as “exonic” or “exonic;splicing” by ANNOVAR (24) were included in the association analysis. Substantial inflation of the test statistics was observed for all 4 SES traits (λ = 1.66 for EduYears, λ = 1.38 for household income, λ = 1.12 for occupational status, and λ = 1.21 for TDI; Q-Q plots for each SES trait are demonstrated in SI Appendix, Fig. S7). The linkage disequilibrium (LD) score regression (25) intercepts were 1.12 for EduYears, 1.11 for household income, 1.07 for occupational status, and 1.13 for TDI, and these intercepts were used to correct the test statistics of the single common variant analysis.
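For readers unfamiliar with these quantities, the genomic inflation factor λ is conventionally the median observed 1-df χ² statistic divided by its expected median under the null (about 0.455). The sketch below computes λ from a list of P values and applies an intercept correction by dividing each χ² statistic by the LD score regression intercept. That division is the commonly used convention and is an assumption here, since the text does not spell out the exact procedure; all function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, median

_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = _NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p: float) -> float:
    """Convert a two-sided P value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = _NORMAL.inv_cdf(p / 2.0)
    return z * z


def genomic_inflation(p_values) -> float:
    """Lambda: median observed chi-square over the null median."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN


def intercept_corrected_p(p: float, intercept: float) -> float:
    """Divide the chi-square statistic by the intercept and return the new P."""
    chi2 = p_to_chi2(p) / intercept
    return 2.0 * _NORMAL.cdf(-sqrt(chi2))


# Evenly spaced P values mimic a well-calibrated null distribution.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(f"lambda under the null: {genomic_inflation(null_p):.2f}")

# Effect of the EduYears intercept (1.12, from the text) on one P value.
print(f"{intercept_corrected_p(1e-8, 1.12):.1e}")
```

Dividing by an intercept above 1 always makes a P value less extreme, so a variant needs a somewhat stronger raw signal to pass the exome-wide threshold after correction.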

At the exome-wide significance level of P < 3.50 × 10−7 (Bonferroni correction for 142,680 tests), we identified 9 and 34 lead SNPs for household income and EduYears, respectively (SI Appendix, Table S8). No significant associations were observed for occupational status and TDI. Considering that a threshold of P < 5 × 10−8 is generally used in GWASs, we only selected lead SNPs exceeding the genome-wide significance threshold (P < 5 × 10−8) for comparison with previously reported loci in GWASs (5, 6, 8, 26). However, no new signals (requiring LD r2 < 0.05 with known SNPs and not residing within known loci) were detected for household income and EduYears (SI Appendix, Table S8). After controlling for the geographic regions, 9 of 9 income-associated SNPs and 31 of 34 EduYears-associated SNPs remained nominally significant (SI Appendix, Table S9). The distribution of phenotypic categories among carriers and noncarriers of each lead SNP is detailed in SI Appendix, Table S10.

Associations in Other Ancestries from the UKB.

To explore which significant associations identified in White-British individuals were shared among other ancestries, we reran gene-based collapsing tests for rare variants and single-variant association analysis for common variants in four other ancestry groups from the UKB (non-British White, n = 25,777; Asian or Asian British, n = 8,591; Black or Black British, n = 6,654; Mixed or other, n = 5,980).

For gene-based collapsing tests (Dataset S2), 10 of the 14 available associations (71.4%) in the non-British White group had consistent directions of the effect sizes with those in the White-British group, with 4 pairs showing stronger magnitudes of effect sizes and having P-values < 0.05 (ANKRD12-income, β [SE] from −0.027 [0.003] to −0.034 [0.012]; GIGYF1-occupation, β [SE] from 0.072 [0.013] to 0.088 [0.042]; GIGYF1-EduYears, β [SE] from −0.119 [0.020] to −0.133 [0.057]; and KDM5B-EduYears, β [SE] from −0.023 [0.006] to −0.053 [0.019]). In the Mixed or other group, 66.7% (8/12) of the associations had consistent directions, and 3 pairs of them had stronger magnitudes, including one pair with P < 0.05 (ANKRD12-TDI, β [SE] from 0.071 [0.008] to 0.178 [0.082]). In the Black or Black British group, the directions of 60% (6/10) associations were consistent, and 3 pairs had stronger magnitudes of effect sizes, including one pair with P < 0.05 (GIGYF1-occupation, β [SE] from 0.072 [0.013] to 0.117 [0.055]). In the Asian or Asian British group, only one pair (8.3%, 1/12) showed consistent effect direction, with reduced effect sizes.

For single-variant analysis, only approximately one-quarter of the common variants could be tested because of the limited sample sizes (Dataset S3). The concordance of effect size directions with the White-British group was high in most groups: 100% (11/11) for the non-British White group, 50% (5/10) for the Asian or Asian British group, 87.5% (7/8) for the Black or Black British group, and 72.7% (8/11) for the Mixed or other group.

Sensitivity Analysis.

Educational attainment, as a phenotype at an earlier stage of life, is closely correlated with an individual’s economic status. We repeated the analysis of the identified variant and gene associations with household income, occupational status, and TDI, additionally adjusting for EduYears. The associations of CCDC36, TPTEP2-CSNK1E, and LINC02881 with TDI were notably attenuated, indicating that these effects likely depended on education level (SI Appendix, Table S11). For the remaining 21 associations, all P values were less than 1 × 10−3 (SI Appendix, Tables S11 and S12).

Furthermore, the economic characteristics of the geographical region where an individual lives can also influence SES. Because the TDI reflects these characteristics, we included TDI as a covariate in the association tests for EduYears, household income, and occupational status. All 49 associations identified in the primary analysis remained at P < 1 × 10−3 after controlling for TDI (SI Appendix, Tables S13 and S14). Moreover, 43 of 49 (88%) associations reached the exome-wide significance threshold (P < 8.5 × 10−8 for gene-based collapsing tests and P < 3.5 × 10−7 for single-variant association tests).

To further clarify the potential confounding effects, we stratified the population by EduYears level (3 classes: 1) high, 2) middle, and 3) low) or TDI (5 classes: 1) 80 to 100%, 2) 60 to 80%, 3) 40 to 60%, 4) 20 to 40%, and 5) 0 to 20%; lower percentiles represented higher levels of social deprivation), and repeated the gene-based collapsing and single variant association tests. In the stratified analyses, all identified associations remained at least nominally significant (SI Appendix, Tables S15–S18), with several associations showing substantially weaker signals, possibly due to the reduced statistical power resulting from the smaller sample size.

Biological Annotation of SES-Associated Genes.

To examine the expression characteristics of SES-associated genes (including the 11 significant genes identified in the gene-based collapsing tests and the 39 genes where common lead SNPs were located) in the human body, we performed enrichment analysis with the Genotype-Tissue Expression (GTEx) bulk RNA-seq data from 54 tissues. Twenty-three tissue types showed significant enrichment at the Bonferroni-corrected P < 9.25 × 10−4 (0.05/54 tissue types) (Fig. 4A and SI Appendix, Table S19), including all 13 brain regions. Using the BrainSpan developmental transcriptome data (27), we further found that the average expression levels of SES-associated genes in the brain were elevated across developmental stages, consistent with the trajectory highlighted by a previous GWAS on EduYears (9) (Fig. 4B). The SES-associated genes were highly expressed in excitatory neurons (Fig. 4 C and D) based on single-nucleus RNA-seq (snRNA-seq) data from the human brain (28). We utilized SynGO (29) to visualize the involvement of SES genes in synapse-related processes. Four genes were located in presynaptic regions (BSN, KCNC4, NCAM1, and SGIP1) and three in postsynaptic regions (KCNC4, NCAM1, and PTPRF) (Fig. 4E and SI Appendix, Table S20). The genes involved in vital synaptic processes such as signaling (NRN1 and NCAM1), organization (RHOB, BSN, PTPRF, and NCAM1), and transport (NRN1) are shown in Fig. 4F. No significant enrichment of Gene Ontology (GO) pathways was observed.

Fig. 4.

Biological annotation of 50 socioeconomic status–associated genes. (A) Results from tissue types enrichment analysis. The x-axis represents 54 GTEx tissue types (red dot for 13 brain regions, blue for 39 nonbrain regions, and purple for 2 cell lines), and the y-axis represents the −log10(P). The black dotted line represents the significance threshold of P < 9.25 × 10−4 (0.05/54). (B) The average expression of SES-associated genes in the human brain with BrainSpan developmental transcriptome data (27). The x-axis represents the developmental period, and the y-axis represents the average brain expression. (C) and (D) show the cell types in which SES genes are expressed. The darker color of the dot in (D) indicates the higher relative expression level of SES-associated genes in that cell type. (E) Synaptic location and (F) process annotated by SynGO (29). The counts of genes in each synaptic ontology term are indicated by the color darkness. Abbreviations: OPC, oligodendrocyte progenitor cells.

Associations between SES-Associated Variants and Health Conditions/Traits.

We examined the associations of SES-associated variants [including rare variants in the genes (n = 11) that were significant in the gene-based collapsing analyses, and common lead SNPs (n = 40) from the single-variant association tests] with 54 selected human health outcomes, including psychosocial factors (n = 3), cognitive function (n = 7), and diseases of 10 systems (n = 44) (SI Appendix, Table S21). After multiple testing correction, a total of 17 gene-phenotype pairs [P < 8.42 × 10−5 (0.05/11/54)] and 36 common variant-phenotype associations [P < 2.31 × 10−5 (0.05/40/54)] reached the significance threshold (Fig. 5 A and B). Consistent with previous findings of genetic correlations calculated using GWAS summary statistics (6), genetic variants associated with lower SES were linked to lower fluid intelligence scores, a greater tendency toward neuroticism, as well as a higher risk of type 2 diabetes mellitus (T2DM), obesity, and depression. Other quantitative traits, including two cognitive function measures (longer reaction time and more errors in the pairs matching test) and two psychosocial factors (higher levels of loneliness and social isolation), were also related to these disadvantageous SES-associated variants. Furthermore, several additional associations with diseases were observed, including rs2275155 on SDCCAG8 with chronic renal failure (OR = 1.05, 95% CI = [1.04, 1.08], P = 5.67 × 10−7), and rs1060105 on SBNO1 with irritable bowel syndrome (IBS) (OR = 0.95, 95% CI = [0.93, 0.97], P = 1.84 × 10−5) (Fig. 5C and Datasets S4–S7). These associations may arise because a genetic variant is directly involved in both SES and health traits, or because SES is correlated with health traits through nongenetic factors. To reduce the interference of SES as an important environmental factor influencing health outcomes, we repeated the association analysis with additional adjustment for EduYears, household income, occupational status, and TDI. Eighteen associations (such as KDM5B-atrial fibrillation/flutter, GIGYF1-T2DM, and ANKRD12-fluid intelligence score) remained significant, suggesting that the effects were unlikely to be driven by differences in SES itself (SI Appendix, Tables S22 and S23). Notably, SES is also influenced by other social confounders that were not considered in this paper; therefore, these associations cannot fully distinguish between shared genetic etiologies and other explanations such as mediated pleiotropy (6).

Fig. 5.

Associations with health-related traits. (A) Results from gene-based collapsing tests with health conditions for identified rare variants and (B) results from single-variant association tests with health conditions for identified common variants. The x-axis represents the categories of selected conditions, and the y-axis represents the −log10(P). (C) The number of significant associations of each identified variant with health conditions. (D) The heatmap shows the variants with at least one significant association with brain structure indexes, including 68 cortical regions (first 34 rows) and 16 subcortical regions (last 8 rows). A darker shade represents a smaller P value of the association.

To uncover the potential link between the SES-associated genetic variants and human biological indicators, we examined the gene-based and single-variant (for significant rare and common variants, respectively) associations with MRI phenotypes of the brain and heart, blood biochemistry markers, and breath spirometry measures (SI Appendix, Table S21). We found 32 gene-phenotype pairs [P < 1.65 × 10−5 (0.05/11/276)] and 546 common variant-phenotype associations [P < 4.53 × 10−6 (0.05/40/276)], over two-thirds (393/578) of which were related to brain structure (Fig. 5D, SI Appendix, Figs. S8 and S9, and Datasets S8 and S9). The brain regions with the most associations were the left rostral anterior cingulate, as well as the right inferior and superior temporal cortex. The common variant rs1881194 in KANSL1 showed the most significant negative associations with the areas of the bilateral fusiform gyrus (P = 6.86 × 10−52 for the right hemisphere and 9.27 × 10−50 for the left), which is an important region for visual categorization with a particularly critical role in face perception (30, 31). Notably, 3 variants [rs1881194 in KANSL1 (mentioned above), rs7220206 in ARHGAP27, and rs12452273 in PLEKHM1] showed widespread associations with multiple brain imaging phenotypes, including smaller inferior temporal area and lower hippocampus volume. All three variants were located at chromosome 17q21.31, a complex locus involved in various neurodevelopmental disorders and cognitive impairment (32, 33), and both PLEKHM1 and KANSL1 were established Mendelian disease genes, mutations of which can cause osteopetrosis (autosomal dominant 3) [MIM: #618107] and KANSL1-Koolen-De Vries syndrome [MIM: #610443], respectively. SES-associated variants were also associated with insulin-like growth factor-1 (IGF-1) and glycated hemoglobin (HbA1c) among blood biochemical markers and with forced expiratory volume (FEV) in spirometry (Datasets S8 and S9), although we found no notable effect of these SES genetic factors on respiratory diseases such as chronic obstructive pulmonary disease (COPD). No significant association with heart MRI phenotypes was observed. Moreover, we collected the recorded biological functions and medical associations for SES-associated genes (Dataset S10).

Carrier Frequencies of SES-Associated Variants in Different Sexes.

Finally, we tested whether the carrier frequencies of the identified variants differed between males and females using Fisher’s exact tests. At P < 9.80 × 10−4 (0.05/51), no significant differences in the carrier frequencies by biological sex were observed (SI Appendix, Table S24).
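As an illustration of the comparison described above, a two-sided Fisher's exact test on a 2 × 2 table (carrier status by sex) can be written with the standard library alone. The carrier counts below are hypothetical and chosen only to show the mechanics; they are not values from SI Appendix, Table S24, and the function name is invented for this sketch.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of observing k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(
        p for p in map(prob, range(k_min, k_max + 1)) if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: rows are males/females, columns are carriers/noncarriers.
p_value = fisher_exact_two_sided(12, 988, 9, 991)
threshold = 0.05 / 51  # 11 genes + 40 lead SNPs, as in the text
print(f"P = {p_value:.3f}; significant: {p_value < threshold}")
```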

Discussion

By conducting what is, to our knowledge, the largest exome sequencing study of socioeconomic phenotypes, we characterized the impact of coding variants on SES across the entire allele frequency spectrum. Fifty genes (11 from rare variants and 39 from common variants) were identified in individuals of White-British ancestry, 7 of which presented additional associations with SES. The gene–environment correlations across geographical regions had a substantial impact on some of these genetic associations. The identified genetic factors of SES were also associated with various human health outcomes and biological traits, particularly cognitive function, psychosocial status, and diabetes.

This study substantially broadened the genetic characterization of SES by identifying rare variants in NRN1, CCDC36, RHOB, EP400, NCAM1, TPTEP2-CSNK1E, and LINC02881 that were associated with SES traits. The use of large-sample exome data enabled us to comprehensively assess the role of rare variants in SES, addressing one of the gaps that existed in previous GWASs. The neurotrophic factor NRN1, the most significant gene in the collapsing tests, has been investigated in a number of studies since its discovery by Nedivi et al. (34), which support its involvement in synaptic plasticity and neuritogenesis (35–37). Recent proteome studies have linked NRN1 protein to cognitive resilience (38, 39) and validated its role in resisting the loss of dendritic spine density in rat hippocampal neurons (39). Our study provides genetic evidence of an association between the burden of ultrarare pLOF variants in NRN1 (consisting of 4 start loss/gained variants and 3 frameshift variants) and more severe social deprivation. Subsequent biological annotation was consistent with previous findings that it is involved in the synaptic processes of transport and signaling. Although NRN1 did not show significant associations with cognitive function or neuropsychiatric traits, the multidimensional evidence suggested that it may play a role in social consequences through neural-associated biological processes. Another instance was CCDC36, which exhibited a protective effect against social deprivation, and this effect was dependent on education. Previous GWASs have also found that an intronic SNP rs113011189 and a downstream SNP rs150252215 of CCDC36 were associated with higher educational attainment (9, 40). As an interacting factor of HORMAD1, CCDC36 is involved in the formation of DNA double-strand breaks (DSBs) during meiosis (41, 42). Little is known about the precise biological mechanisms by which CCDC36 is involved in SES. However, our findings suggest that the aggregate of rare missense variants in CCDC36 may impact SES, warranting validation through animal experiments and exploration of the underlying pathways.

In addition to identifying additional SES-associated genes, we replicated genes overlapping with previous studies, strengthening the evidence for their contribution to SES traits. We prioritized ADGRB2 and KDM5B among the numerous loci reported in previous GWAS datasets because rare and common variant signals converged on them. KDM5B has been associated with developmental disorders (43), autism (44), and cognition (45) in exome studies and has been validated to impact cognition, behavior, and muscle in rat models (45). Although the role of ADGRB2 is less clear, it belongs to the brain-specific angiogenesis inhibitor (BAI) subfamily of the adhesion G-protein-coupled receptor (GPCR) superfamily (46). The other two subfamily members, BAI1 and BAI3, have received more attention in the past and are known to be implicated in synaptic growth, angiogenesis inhibition, and tumor progression (46). Common SNPs in ADGRB2 (BAI2) were associated with traits such as EA, smoking initiation, and height in GWASs. We captured the nearby SNP rs2050256 reported by Lee et al. (9) and, after conditioning on it, found that the burden of rare variants in ADGRB2 had an independent effect on EduYears.

Another important finding was the associations between SES-associated genetic factors and human health traits. In this population-based cohort, SES-related variants were found to be correlated with worse cognitive function, poorer psychosocial status, and higher risk of T2DM, atrial fibrillation and flutter, liver diseases, depression, IBS, and chronic renal failure, partly in line with phenotypical associations reported in epidemiologic studies (1, 17, 47) and genetic correlations calculated using GWAS summary statistics (6). In addition, there were widespread effects of SES-associated genetic variants on brain structure, blood markers, and spirometry. These findings support the potential shared genetic factors of SES and health conditions and well-being and may prove helpful in reducing health inequalities and improving health outcomes for those individuals disadvantaged by SES. Here are several potential applications. First, education and occupation are commonly used as proxies of cognitive reserve, which has been employed in many epidemiological and genetic studies (7, 48, 49). For instance, Ko et al. conducted a GWAS on occupational attainment to uncover new cognitive-associated loci (7). Cognitive reserve, as a theoretical concept, is difficult to measure directly, while education and occupation can be easily assessed in large samples and present relatively objective information (48). The genetic findings of SES offer a practical avenue for understanding the genetic factors of differences in cognitive reserve across individuals. Second, future research focusing on diseases could benefit from distinguishing SES-associated genetic variants from those associated with both diseases and SES, which would help advance disease-specific mechanisms (50, 51). For example, Marees et al. reported that the genetic overlap with SES impacted the heritability of mental health traits and genetic correlation patterns among these traits (50). 
Third, our findings lay the groundwork for integrating rare variants into polygenic scores (PGI) for SES, which could be added into clinical composite scores to enhance disease prediction, used as control variables in epidemiological studies, or used for risk stratification (9).

This study complies with the ethical framework of a recent report on sociogenomics research (51). We detailed the study population and phenotypes, as well as the analytical pipeline, to ensure the reproducibility and transparency of our findings. Large-sample exome data were included, and reliable computational models were employed to ensure sufficient statistical power. More importantly, these genetic findings on SES need to be interpreted appropriately. First, the genes identified in this study were obtained from association analyses and are not equivalent to “causal genes for SES.” The observed correlations between genetic variants and SES can be decomposed into direct genetic effects, indirect genetic effects, and confounding effects (52). The direct genetic effects operate through complex pathways involving various heritable traits, including cognitive abilities, personality traits (such as extroversion, openness to experience, and conscientiousness), health, and other characteristics (5, 53). The indirect effects (such as genetic nurture) and confounders (including environmental and other genetic factors) also play an indispensable role in shaping one’s SES (54, 55). Environmental confounders vary across geographic regions and population subgroups and influence SES in a manner intertwined with genetic factors (52). Although we controlled for the top 10 principal components (PCs), which helps to minimize spurious genetic associations with traits that vary across subgroups for nongenetic reasons, the confounding effects were not fully removed. To obtain unbiased estimates of direct genetic effects on SES, large-scale within-family genetic data are needed, in which the offspring’s genotypes are determined by random assortment of parental genotypes during meiosis (19, 52). Second, gene–environment correlations across geographical regions have an undeniable impact on the discovery of genetic associations (56), as reflected in the notable changes in the significance of some associations when we additionally controlled for geographical regions. Third, SES is a complex trait determined by genetics, environment, and their interactions, and the influence of genetic variants on SES is modulated and/or mediated by environmental factors. A biologically deterministic interpretation, whereby a person is genetically assigned to be SES advantaged or disadvantaged, is inappropriate, especially because SES traits are relatively distal from the fundamental biological processes (19).

There are several limitations in our study. The major one is the lack of replication evidence from large external cohorts with both SES data and WES (or whole-genome sequencing), which are currently unavailable to us. Second, the limited number of sibling samples in the UKB constrains the statistical power of within-family analysis. We caution that our population-based results should be interpreted carefully because of confounding from family and social environment. Future studies with expanded within-sibling cohorts are essential to confirm these findings. Third, the population in the main analysis was restricted to White-British participants from the UKB. Despite our attempts to extend the analysis to other ancestries, statistical power was limited by sample size. Large-scale exome studies in other ancestries are warranted and are necessary to advance a fair understanding of the genetic structure of SES and to improve life outcomes. Fourth, participants in the UKB were likely to report fewer health conditions than the sampling population, i.e., “healthy volunteer” selection bias (57). The effects of variants on phenotypes may therefore be underestimated. Fifth, the examination of the effects of nongenetic factors on SES was limited in the present study. Environmental conditions beyond the individual, such as overall economic trends and job opportunities, can influence SES. Although we tested the robustness of the results by additionally correcting for TDI, the effects of similar environmental factors on an individual’s SES were not completely controlled for. Finally, although the association of protein-coding variants with SES was systematically assessed using exome data, the contribution of variants outside protein-coding regions and of other variant types (such as microsatellites and structural variants) to SES is also important, and large-sample whole-genome sequencing is warranted to further refine the genetic architecture of SES.

In conclusion, this study utilizes large-scale exome sequencing data to provide a genetic framework underlying SES and reveal its correlations with human health outcomes. Our findings indicate the contribution of coding variants to the individual disparities in socioeconomic characteristics, which could help gain a more comprehensive understanding of how genetic and environmental factors jointly influence socioeconomic outcomes. Additionally, the pleiotropic effects of genetic variants on SES and health outcomes provide a basis for further exploration of the biological mechanisms of health outcomes and the effects of SES.

Methods

Study Cohort and Phenotypes.

The UKB is a large-scale prospective cohort study that recruited over 500,000 individuals across England, Scotland, and Wales. Participants were aged 37 to 73 y at enrollment between 2006 and 2010. Ethical approval for the UK Biobank was received from the North West Multi-centre Research Ethics Committee (11/NW/0382), and written informed consent, including agreement to follow-up, was obtained from all participants. This project corresponds to UKB application ID 19542.

Occupational status, household income, TDI, and EduYears were included in this study to measure SES. This four-marker framework of SES has also worked well in previous studies (1, 3, 58). Occupational status was collected from Data field 6142. Those who answered “In paid employment or self-employed” or “Doing unpaid or voluntary work” or “Full or part-time student” or “Retired” were considered to be in employment status, while those with “Unemployed” or “Looking after home and/or family” or “Unable to work because of sickness or disability” were considered unemployed (1). The participants with response “prefer not to answer” or “None of the above” were excluded. Household income was obtained from Data field 738. We mapped the responses to a five-point scale (1 = less than £18,000, 2 = £18,000 to £30,999, 3 = £31,000 to £51,999, 4 = £52,000 to £100,000, and 5 = greater than £100,000), and then included household income as a continuous variable in our analyses, which was in line with the GWAS of income (5). TDI at baseline was collected from Data field 189 where the index was assigned to individuals based on their location, with positive values indicating areas of social and material deprivation and negative values indicating relative affluence. EduYears was derived from a two-step conversion process: 1) the qualification (Data field 6138) was converted to the International Standard Classification of Education (ISCED) 1997 level (59) and then 2) to specific years following the equivalency scale (SI Appendix, Table S2) (10). 
Through touchscreen question “Which of the following qualifications do you have?” (Data field 6138), the responses of participants were classified into 7 categories: 1) college/university degree = ISCED 5 = 20 EduYears; 2) National Vocational Qualification/Higher National Diploma/Higher National Certificate/equivalent = ISCED 5 = 19 EduYears; 3) Other professional qualifications such as nursing and teaching = ISCED 4 = 15 EduYears; 4) advanced (A) levels/advanced subsidiary (AS) levels/equivalent = ISCED 3 = 13 EduYears; 5) ordinary (O) levels/General Certificates of Secondary Education/equivalent = ISCED 2 = 13 EduYears; 6) Certificates of Secondary Education (CSEs)/equivalent = ISCED 2 = 10 EduYears; and 7) none of the above = ISCED 1 = 7 EduYears. The participants with response “prefer not to answer” were excluded. Except for occupational status, which was a dichotomous variable, the remaining 3 phenotypes were treated as continuous traits. More details can be found in SI Appendix, Tables S1 and S2.
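The phenotype coding described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names and the abbreviated response labels are hypothetical, the numeric values are copied from the text as written, and taking the highest EduYears value when a participant reports several qualifications is an assumption that the text does not state.

```python
from typing import Iterable, Optional

# Five-point household income scale (Data field 738), as given in the text.
INCOME_SCALE = {
    "Less than 18,000": 1,
    "18,000 to 30,999": 2,
    "31,000 to 51,999": 3,
    "52,000 to 100,000": 4,
    "Greater than 100,000": 5,
}

# Occupational status (Data field 6142): 1 = employed, 0 = unemployed.
EMPLOYED = {
    "In paid employment or self-employed",
    "Doing unpaid or voluntary work",
    "Full or part-time student",
    "Retired",
}
UNEMPLOYED = {
    "Unemployed",
    "Looking after home and/or family",
    "Unable to work because of sickness or disability",
}

# Qualification (Data field 6138) -> (ISCED level, EduYears), as listed in the text.
EDU_YEARS = {
    "College or University degree": (5, 20),
    "NVQ or HND or HNC or equivalent": (5, 19),
    "Other professional qualifications": (4, 15),
    "A levels/AS levels or equivalent": (3, 13),
    "O levels/GCSEs or equivalent": (2, 13),
    "CSEs or equivalent": (2, 10),
    "None of the above": (1, 7),
}


def code_occupation(response: str) -> Optional[int]:
    """Return 1/0 for employed/unemployed, or None for excluded responses."""
    if response in EMPLOYED:
        return 1
    if response in UNEMPLOYED:
        return 0
    return None  # "Prefer not to answer", "None of the above"


def code_edu_years(responses: Iterable[str]) -> Optional[int]:
    """Return EduYears; the highest reported qualification is used (assumption)."""
    years = [EDU_YEARS[r][1] for r in responses if r in EDU_YEARS]
    return max(years) if years else None
```

Responses that are not in any lookup table (for example "Prefer not to answer") map to `None`, which corresponds to exclusion from the analysis.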

Whole-Exome Sequencing Data and Quality Control.

The exome sequencing data in the UK Biobank were available for 454,787 participants (14). Detailed sequencing protocols have been described for the initial 50,000 release (60). Exomes were captured using the IDT xGen Exome Research Panel v1.0, which targets 39 Mb of the human genome (19,396 genes). Sequencing was conducted on the Illumina NovaSeq 6000 platform with dual-indexed 75 × 75 bp paired-end reads. In each sample and among targeted bases, coverage exceeded 20× at 95.2% of sites on average. Additional quality control (QC) analogous to a previous study (61) was performed on the basis of the OQFE pVCF files provided by the UK Biobank. In brief, multiallelic sites were split into biallelic sites, and calls with low genotype quality or extremely low or high genotype depth were set to no-call. Variants with a call rate less than 90%, a Hardy–Weinberg P-value below 1 × 10⁻¹⁵, or a location in an Ensembl low-complexity region were excluded. We further removed samples that met any of the following criteria: 1) duplicates, 2) withdrawn from the study, 3) inconsistent between self-reported sex and genetically inferred sex, 4) Ti/Tv, Het/Hom, SNV/indel, or singleton counts beyond 8 SD from the mean, 5) second-degree or closer genetic relationships, and 6) non-British. The top 10 within-ancestry PCs were calculated by PLINK version 2.0 using an initially defined set of high-quality independent autosomal variants (MAF > 0.1%, missingness < 1%, HWE P-value > 1.0 × 10⁻⁶, and two rounds of pruning using --indep-pairwise 200 100 0.1 and --indep-pairwise 200 100 0.05). After QC filtering, 350,770 individuals remained for the main analysis.
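As a rough illustration of two of the QC rules above, the sketch below applies the variant-level thresholds (call rate, Hardy–Weinberg P-value, low-complexity region) and flags samples whose per-sample metric lies more than 8 SD from the mean. The function names are hypothetical, and the use of the population SD is an assumption; the actual QC was run on pVCF files with dedicated tools.

```python
from statistics import mean, pstdev


def keep_variant(call_rate: float, hwe_p: float, in_low_complexity: bool) -> bool:
    """Variant-level filter using the thresholds quoted in the text."""
    return call_rate >= 0.90 and hwe_p >= 1e-15 and not in_low_complexity


def flag_outliers(values, n_sd: float = 8.0):
    """Flag samples whose metric (e.g. Ti/Tv ratio) is beyond n_sd SDs from the mean."""
    mu = mean(values)
    sd = pstdev(values)  # population SD; an assumption of this sketch
    if sd == 0:
        return [False] * len(values)
    return [abs(v - mu) > n_sd * sd for v in values]
```

In practice the same outlier rule would be applied separately to each metric (Ti/Tv, Het/Hom, SNV/indel, and singleton counts), and a sample flagged on any of them would be removed.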

Variant Annotation.

Rare variants were annotated with SnpEff (62), and the most severe protein consequence was selected for each variant. As previously described (14), variants were classified into two major categories: 1) pLOF, including stop gained, stop loss, start loss, splice donor, splice acceptor, and frameshift; and 2) likely deleterious missense, which was defined as those predicted to be deleterious by all 5 annotation resources [sorting intolerant from tolerant (63), PolyPhen-2 HDIV and PolyPhen-2 HVAR (64), LRT (65), and MutationTaster (66)]. Common variants were annotated by ANNOVAR (24), and those annotated as “exonic” or “exonic;splicing” (n = 35,670) were included in the single-variant association analysis.
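The two-category classification can be sketched as a small rule. The consequence strings below follow Sequence Ontology naming and the predictor keys are illustrative labels; both are assumptions of this sketch, since the exact annotation fields used by the authors are not given in the text.

```python
from typing import Mapping, Optional

# Consequences treated as predicted loss of function (pLOF) in the text.
PLOF_CONSEQUENCES = {
    "stop_gained",
    "stop_lost",
    "start_lost",
    "splice_donor_variant",
    "splice_acceptor_variant",
    "frameshift_variant",
}

# The five annotation resources; a missense variant must be called
# deleterious by all of them to qualify.
PREDICTORS = ("SIFT", "PolyPhen2_HDIV", "PolyPhen2_HVAR", "LRT", "MutationTaster")


def classify_variant(consequence: str, deleterious: Mapping[str, bool]) -> Optional[str]:
    """Return 'pLOF', 'deleterious_missense', or None if the variant is not used."""
    if consequence in PLOF_CONSEQUENCES:
        return "pLOF"
    if consequence == "missense_variant" and all(
        deleterious.get(p, False) for p in PREDICTORS
    ):
        return "deleterious_missense"
    return None
```

A missing prediction is treated as "not deleterious" here, which is the conservative reading of "predicted to be deleterious by all 5 annotation resources".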

Exome-Wide Gene-Based Collapsing Analysis for Rare Variants.

Rare variants (MAF < 1%) classified as pLOF and/or likely deleterious missense were collapsed into their respective genes. Gene-based collapsing analysis was performed using a generalized mixed model implemented in SAIGE-GENE+, which can considerably reduce type I error inflation and improve the computational efficiency of rare variant tests (20). The covariates included age, biological sex, and the top 10 genetic PCs. Each gene-trait pair was tested 8 times according to two function categories (pLOF variants alone or in combination with missense variants) and 4 max-MAF categories (1%, 0.1%, 0.01%, and 0.001%). The P-values reported in the main text were from the sequence kernel association optimal (SKAT-O) tests, and the effect sizes were estimated through Burden tests. The significance threshold for gene-based collapsing tests was set at P < 8.49 × 10⁻⁸, applying Bonferroni correction for a total of 588,764 association tests.
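The mask design and the significance threshold can be checked with a few lines of arithmetic. The sketch below enumerates the 2 × 4 masks, reproduces the Bonferroni threshold, and shows a simplified carrier indicator for one mask; it is not the SAIGE-GENE+ implementation, and the function names are hypothetical.

```python
from itertools import product

FUNCTION_SETS = ("pLOF", "pLOF+missense")
MAX_MAFS = (0.01, 0.001, 0.0001, 0.00001)  # 1%, 0.1%, 0.01%, 0.001%

# Every gene-trait pair is tested once per (function set, max-MAF) combination.
MASKS = list(product(FUNCTION_SETS, MAX_MAFS))


def bonferroni(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def carrier_indicator(alt_counts, variant_mafs, max_maf: float) -> int:
    """1 if the sample carries an alternate allele at any variant with MAF < max_maf."""
    return int(any(c > 0 and maf < max_maf for c, maf in zip(alt_counts, variant_mafs)))


THRESHOLD = bonferroni(0.05, 588_764)  # about 8.49e-08, as reported in the text
```

Here `alt_counts` holds one sample's alternate-allele counts for the variants in a gene and `variant_mafs` holds the matching allele frequencies, so tightening `max_maf` restricts the mask to progressively rarer variants.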

Adjustment for Geographic Regions.

We processed the geographical regions according to the work of Abdellaoui et al. (56). Briefly, place of birth in the United Kingdom was collected from Data field 129 (north coordinate) and 130 (east coordinate), and current address was collected from Data field 22704 (north coordinate, 1 km resolution) and 22702 (east coordinate, 1 km resolution). Each individual’s place of birth and current address were mapped to the nearest Middle Layer Super Output Area (MSOA) region using the sp R package (67, 68). MSOA regions with ≥100 individuals were included in the analysis.
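The final inclusion rule (MSOA regions with at least 100 individuals) amounts to a simple count filter. The sketch below assumes the coordinate-to-MSOA mapping has already been done and is supplied as a dictionary from participant ID to MSOA code; both the function name and the input layout are hypothetical.

```python
from collections import Counter


def msoas_to_keep(msoa_by_person, min_n: int = 100):
    """Return the set of MSOA codes that contain at least min_n individuals."""
    counts = Counter(msoa_by_person.values())
    return {msoa for msoa, n in counts.items() if n >= min_n}
```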

Leave-One-Variant-Out Analysis.

To determine the contribution of single rare variants, LOVO analysis was performed for all significant gene-phenotype associations identified in the gene-based collapsing analysis. For each variant included in the mask, the LOVO scheme built a new mask eliminating that variant. Thus, if there were n variants of a mask in the gene-based collapsing test, there would be n LOVO masks generated and tested for association with the phenotype. If the association signal of a LOVO mask was significantly weakened, the single variant excluded from that LOVO mask was considered to be important for the specific gene-phenotype association.
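The LOVO scheme can be illustrated by generating the n reduced masks for a mask with n variants. This is a minimal sketch with a hypothetical function name and placeholder variant IDs; each reduced mask would then be retested for association with the phenotype.

```python
def lovo_masks(variant_ids):
    """Map each left-out variant to the mask containing all remaining variants."""
    return {
        left_out: [v for v in variant_ids if v != left_out]
        for left_out in variant_ids
    }
```

For a mask with variants `["v1", "v2", "v3"]`, this yields three masks, one per excluded variant; a marked weakening of the signal for one of them points to the excluded variant as an important contributor.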

Conditional Analysis.

As described in a previous study (61), we first conducted common (MAF > 1%) variant-trait association analysis in the ±500 kb region around each significant gene with UK Biobank version 3 imputed genotype data (Category 263). The details of genotype calling, QC, and imputation have been described (69). Additional QC was performed by PLINK version 2.0 (70), removing SNPs with call rate <95%, MAF < 1%, or Hardy–Weinberg P < 1.0 × 10⁻⁵⁰, as well as individuals with missing genotype rate >5%, sex mismatches, abnormal sex chromosome aneuploidy, heterozygosity rate outliers, non-White British ancestry, or more than 10 putative third-degree relatives. Common variant-trait association tests were performed using PLINK version 2.0 (70). Linear regression was conducted for continuous traits (EduYears, TDI, and household income) and logistic regression for the binary trait (occupational status), adjusting for age, biological sex, and the top 10 PCs. Index SNPs were then identified by applying the options --clump-p1 1e-05 and --clump-r2 0.1 in PLINK (70). Finally, we reran the gene-based collapsing analysis with the index SNP as an additional covariate.

Single-Variant Association Analysis for Common Variants.

A total of 35,670 common (MAF > 1%) variants obtained from exome data in the UKB were included in the single-variant association analysis, which was performed using PLINK version 2.0 (70) and adjusted for age, biological sex, and the top 10 PCs. The significance threshold was set at P < 3.50 × 10⁻⁷ (Bonferroni correction for 142,680 tests). The SNPs that passed the significance threshold were then clumped at LD r² < 0.1 to determine the lead SNPs. The R package LDlinkR was used to calculate r² between the SNPs identified in this study and index SNPs in previous GWASs (71). Publicly available reference haplotypes from the 1000 Genomes Project were used to calculate population-specific measures of LD, and we set the population code to “EUR” (European). Only r² calculated between SNPs on the same chromosome was reported. A signal was defined as additional if it passed the genome-wide significance threshold of P < 5 × 10⁻⁸, had not been reported in the GWAS Catalog (72) (https://www.ebi.ac.uk/gwas/), and was independent (LD r² < 0.05) of previously reported index variants in GWASs. Lead SNPs less than 1 Mb apart were considered to be located at the same locus. We defined a locus as the region extending 500 kb upstream and downstream of the lead SNPs contained therein. A locus was declared additional if it did not contain variants reported as genome-wide significant in previous GWASs.
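The threshold arithmetic and the locus definition can be sketched as follows. The threshold corresponds to 35,670 variants tested against four SES traits. The grouping function is an illustration only: it chains lead SNPs on the same chromosome when consecutive positions are less than 1 Mb apart, which is one plausible reading of the rule above, and its name and input layout are hypothetical.

```python
N_VARIANTS = 35_670
N_TRAITS = 4
SINGLE_VARIANT_THRESHOLD = 0.05 / (N_VARIANTS * N_TRAITS)  # about 3.50e-07


def group_loci(lead_snps, merge_dist: int = 1_000_000, flank: int = 500_000):
    """Group (chrom, pos) lead SNPs into loci and add 500 kb flanks to each locus."""
    loci = []
    for chrom, pos in sorted(lead_snps):
        last = loci[-1] if loci else None
        if last and last["chrom"] == chrom and pos - last["positions"][-1] < merge_dist:
            last["positions"].append(pos)  # same locus as the previous lead SNP
        else:
            loci.append({"chrom": chrom, "positions": [pos]})
    for locus in loci:
        locus["start"] = max(0, locus["positions"][0] - flank)
        locus["end"] = locus["positions"][-1] + flank
    return loci
```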

Distribution in Phenotypic Categories Among Carriers and Noncarriers.

The phenotypic distributions of carriers and noncarriers of all significant variants were tabulated. For easier visual comparison, we divided the quantitative traits into ordered categories. Household income (1 = less than £18,000, 2 = £18,000 to £30,999, 3 = £31,000 to £51,999, 4 = £52,000 to £100,000, and 5 = greater than £100,000) and EduYears (7, 10, 13, 15, 19, and 20 y) were divided into 5 and 6 classes, respectively, according to the original responses. TDI was divided into quintiles: the values were sorted in descending order, and the top 20% were defined as 5, the 20 to 40% as 4, and so on.
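The TDI quintile rule can be written out explicitly. In this sketch the values are ranked in descending order and the highest 20% receive class 5; the function name is hypothetical, and ties are broken by input order, which is an assumption.

```python
def tdi_quintiles(values):
    """Assign classes 5 (top 20% of TDI, most deprived) down to 1 (bottom 20%)."""
    n = len(values)
    # Indices ordered from highest to lowest TDI value.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    classes = [0] * n
    for rank, i in enumerate(order):
        classes[i] = 5 - (rank * 5) // n
    return classes
```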

Associations in Other Ancestries from the UK Biobank.

Four other ancestries were determined according to genetic definition and self-report: non-British White (n = 25,777), Asian or Asian British (n = 8,591), Black or Black British (n = 6,654), and Mixed or other (n = 5,980). Gene-based collapsing tests for rare variants and single-variant association analysis for common variants were conducted in these four ancestries from the UKB. The covariates included age, biological sex, and the top 10 PCs.

Biological Annotation.

Identified genes were biologically annotated at multiple levels, including tissue types, expression patterns in the brain, and cell types. GO pathway enrichment analysis was performed with the R package clusterProfiler (73). Enrichment analysis of tissue types was performed with FUMA (74). We calculated the average expression of SES-associated genes in the brain using the BrainSpan developmental transcriptome data (27). To better compare with previous GWAS results, we also calculated the average brain expression levels of the prioritized genes from Lee et al. (9). The expression of identified genes in brain cell types was evaluated using a single-nucleus RNA-seq study (75). The relative expression levels of the gene set were calculated by the AddModuleScore function implemented in the Seurat R package (https://satijalab.org/seurat/). Synaptic annotation was conducted on the web-based platform SynGO (29).

Phenome-Wide Association Studies.

We next explored the effects of SES-associated variants on a range of human health traits. In the first part, we selected 44 disease outcomes based on previous epidemiological research on the associations of SES with mental and physical health conditions and longevity (17, 76). We also included seven cognitive and three psychosocial traits, which have reliable genetic overlap with SES (50). In the second part, we selected objective biological indicators, including 39 blood biochemistry markers, 220 brain MRI markers (volume, thickness, and surface area of 68 cortical regions, and volume of 16 subcortical regions), eight heart MRI markers, and nine spirometry measures. These biological indicators can reflect the structure and/or function of the major systems such as metabolic, neurological, cardiovascular, and respiratory, and may offer mechanistic insights into the relationships between SES-associated variants and health. Specific information on the phenotypes is demonstrated in SI Appendix, Table S21.

Clinical diseases were treated as binary traits and the remaining traits as quantitative. Consistent with the main analyses, gene-based collapsing tests were performed using SKAT-O tests in SAIGE-GENE+ (20), and single-variant association tests were performed using PLINK version 2.0 (70). Age, sex, and the top 10 PCs were included as covariates in model 1, and the four SES indicators (EduYears, household income, occupational status, and TDI) were additionally adjusted for in model 2. Significance thresholds were determined by multiple-testing correction for the number of tests, separately for rare and common variants.

Carrier Frequencies by Sex.

We calculated the carrier frequencies of SES-associated genetic variants in males and females separately and examined the differences using Fisher’s exact tests. P < 9.80 × 10⁻⁴ (0.05/51) was considered significant.
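A two-sided Fisher’s exact test on a 2 × 2 table (carriers and noncarriers by sex) can be sketched in pure Python. This is an illustration for small tables, not the software used in the study: the function name is hypothetical, the two-sided P-value sums all tables that are no more probable than the observed one, and exact binomial coefficients become slow at biobank-scale counts.

```python
from math import comb

ALPHA = 0.05 / 51  # about 9.80e-04, the threshold reported in the text


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum probabilities of all tables at least as extreme as the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

With rows as male and female and columns as carrier and noncarrier, a variant or gene would be reported as showing a sex difference only if the returned P-value falls below `ALPHA`.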

Supplementary Material

Appendix 01 (PDF)

Dataset S01 (XLSX)

Dataset S02 (XLSX)

Dataset S03 (XLSX)

Dataset S04 (XLSX)

Dataset S05 (XLSX)

Dataset S06 (XLSX)

Dataset S07 (XLSX)

Dataset S08 (XLSX)

Dataset S09 (XLSX)

Dataset S10 (XLSX)

Acknowledgments

We thank the participants and professionals of the UK Biobank. This study was supported by grants from the STI2030-Major Projects (2022ZD0211600), National Natural Science Foundation of China (92249305, 82071201, and 81971032), Shanghai Municipal Science and Technology Major Project (No.2018SHZDZX01), Shanghai Talent Development Funding for The Project (2019074), Research Start-up Fund of Huashan Hospital (2022QD002), Excellence 2025 Talent Cultivation Program at Fudan University (3030277001), and ZHANGJIANG LAB, Tianqiao and Chrissy Chen Institute, and the State Key Laboratory of Neurobiology and Frontiers Center for Brain Science of Ministry of Education, Fudan University. W.C. was supported by grants from the National Natural Sciences Foundation of China (No. 82071997) and the Shanghai Rising-Star Program (No. 21QA1408700).

Author contributions

Q.D., J.-F.F., W.C., and J.-T.Y. designed research; X.-R.W., L.Y., B.-S.W., W.-S.L., Y.-T.D., and J.-J.K. analyzed data; and X.-R.W., B.-S.W., and B.J.S. wrote the paper.

Competing interests

The authors declare no competing interest.

Footnotes

This article is a PNAS Direct Submission.

Contributor Information

Wei Cheng, Email: wcheng@fudan.edu.cn.

Jin-Tai Yu, Email: jintai_yu@fudan.edu.cn.

Data, Materials, and Software Availability

The open-source R package SAIGE-GENE+ was used to run gene-based collapsing tests, and the code is available from GitHub (https://github.com/saigegit/SAIGE) (20). The consequences of significant variants were annotated by VEP (https://asia.ensembl.org/Homo_sapiens/Tools/VEP) (77). Partial illustrations in Fig. 1 were obtained from BioRender with a publishing license (https://www.biorender.com/) (78). The protein domains were determined by SMART (https://smart.embl.de/) (21). LD with previously reported loci was calculated with the R package LDlinkR (https://github.com/CBIIT/LDlinkR) (71). GO pathway enrichment analysis was performed with the R package clusterProfiler version 4.4.4 (https://github.com/YuLab-SMU/clusterProfiler) (73). Enrichment analysis of tissue types and GWAS gene sets was performed with FUMA (https://fuma.ctglab.nl/) (74). Synaptic annotation was conducted on the web-based platform SynGO (https://www.syngoportal.org/) (29). The code for the main analysis and visualization of single-nucleus RNA-seq data was adapted from the R package Seurat version 4.3.0, which is available from https://satijalab.org/seurat/index.html (79). The LDSC software is available from https://github.com/bulik/ldsc. This project corresponds to UK Biobank application ID 19542. Individual-level genetic and phenotypic data from the UK Biobank dataset are available at https://biobank.ndph.ox.ac.uk/ (69) by application. The single-nucleus RNA-seq data used in this study are available from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE173731 (28). All other data are included in the manuscript and/or supporting information.

Supporting Information

References

  1. Zhang Y. B., et al., Associations of healthy lifestyle and socioeconomic status with mortality and incident cardiovascular disease: Two prospective cohort studies. BMJ 373, n604 (2021).
  2. Antonoplis S., Studying socioeconomic status: Conceptual problems and an alternative path forward. Perspect. Psychol. Sci. 18, 275–292 (2023).
  3. Farah M. J., Socioeconomic status and the brain: Prospects for neuroscience-informed policy. Nat. Rev. Neurosci. 19, 428–438 (2018).
  4. Silventoinen K., et al., Genetic and environmental variation in educational attainment: An individual-based analysis of 28 twin cohorts. Sci. Rep. 10, 12681 (2020).
  5. Hill W. D., et al., Genome-wide analysis identifies molecular systems and 149 genetic loci associated with income. Nat. Commun. 10, 5741 (2019).
  6. Hill W. D., et al., Molecular genetic contributions to social deprivation and household income in UK Biobank. Curr. Biol. 26, 3083–3089 (2016).
  7. Ko H., et al., Genome-wide association study of occupational attainment as a proxy for cognitive reserve. Brain 145, 1436–1448 (2022).
  8. Okbay A., et al., Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nat. Genet. 54, 437–449 (2022).
  9. Lee J. J., et al., Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018).
  10. Okbay A., et al., Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).
  11. Rietveld C. A., et al., GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340, 1467–1471 (2013).
  12. Manolio T. A., et al., Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
  13. Edwards S. L., Beesley J., French J. D., Dunning A. M., Beyond GWASs: Illuminating the dark road from association to function. Am. J. Hum. Genet. 93, 779–797 (2013).
  14. Backman J. D., et al., Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 599, 628–634 (2021).
  15. Li T., et al., The functional impact of rare variation across the regulatory cascade. Cell Genom. 3, 100401 (2023).
  16. Ye C. J., et al., Mendelian randomization evidence for the causal effects of socio-economic inequality on human longevity among Europeans. Nat. Hum. Behav. 7, 1357–1370 (2023), 10.1038/s41562-023-01646-1.
  17. Kivimäki M., et al., Association between socioeconomic status and the development of mental and physical health conditions in adulthood: A multi-cohort study. Lancet Public Health 5, e140–e149 (2020).
  18. Judd N., et al., Cognitive and brain development is independently influenced by socioeconomic status and polygenic scores for educational attainment. Proc. Natl. Acad. Sci. U.S.A. 117, 12411–12418 (2020).
  19. Harden K. P., Koellinger P. D., Using genetics for social science. Nat. Hum. Behav. 4, 567–576 (2020).
  20. Zhou W., et al., SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests. Nat. Genet. 54, 1466–1469 (2022).
  21. Letunic I., Khedkar S., Bork P., SMART: Recent updates, new developments and status in 2020. Nucleic Acids Res. 49, D458–D460 (2021).
  22. Chen C. Y., et al., The impact of rare protein coding genetic variation on adult cognitive function. Nat. Genet. 55, 927–938 (2023), 10.1038/s41588-023-01398-8.
  23. Karczewski K. J., et al., Systematic single-variant and gene-based association testing of thousands of phenotypes in 394,841 UK Biobank exomes. Cell Genom. 2, 100168 (2022).
  24. Yang H., Wang K., Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nat. Protoc. 10, 1556–1566 (2015).
  25. Bulik-Sullivan B. K., et al., LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
  26. Kweon H., et al., Associations between common genetic variants and income provide insights about the socioeconomic health gradient. bioRxiv [Preprint] (2024). 10.1101/2024.01.09.574865 (Accessed 28 August 2024).
  27. Kang H. J., et al., Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).
  28. Garcia F. J., et al., Single-cell dissection of the human brain vasculature. Nature 603, 893–899 (2022).
  29. Koopmans F., et al., SynGO: An evidence-based, expert-curated knowledge base for the synapse. Neuron 103, 217–234.e214 (2019).
  30. Grill-Spector K., Weiner K. S., The functional architecture of the ventral temporal cortex and its role in categorization. Nat. Rev. Neurosci. 15, 536–548 (2014).
  31. Rangarajan V., et al., Electrical stimulation of the left and right human fusiform gyrus causes different effects in conscious face perception. J. Neurosci. 34, 12828–12836 (2014).
  32. Bowles K. R., et al., 17q21.31 sub-haplotypes underlying H1-associated risk for Parkinson’s disease are associated with LRRC37A/2 expression in astrocytes. Mol. Neurodegener. 17, 48 (2022).
  33. Cooper Y. A., et al., Functional regulatory variants implicate distinct transcriptional networks in dementia. Science 377, eabi8654 (2022).
  34. Nedivi E., Hevroni D., Naot D., Israeli D., Citri Y., Numerous candidate plasticity-related genes revealed by differential cDNA cloning. Nature 363, 718–722 (1993).
  35. Fujino T., et al., CPG15 regulates synapse stability in the developing and adult brain. Genes Dev. 25, 2674–2685 (2011).
  36. Naeve G. S., et al., Neuritin: A gene induced by neural activity and neurotrophins that promotes neuritogenesis. Proc. Natl. Acad. Sci. U.S.A. 94, 2648–2653 (1997).
  • 37.An K., et al. , Neuritin can normalize neural deficits of Alzheimer’s disease. Cell Death Dis. 5, e1523 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Yu L., et al., Cortical proteins associated with cognitive resilience in community-dwelling older persons. JAMA Psychiatry 77, 1172–1180 (2020).
  • 39. Hurst C., et al., Integrated proteomics to understand the role of Neuritin (NRN1) as a mediator of cognitive resilience to Alzheimer’s disease. Mol. Cell Proteomics 22, 100542 (2023).
  • 40. Demange P. A., et al., Investigating the genetic architecture of noncognitive skills using GWAS-by-subtraction. Nat. Genet. 53, 35–44 (2021).
  • 41. Laroussi H., et al., Characterization of the REC114-MEI4-IHO1 complex regulating meiotic DNA double-strand break formation. EMBO J. 42, e113866 (2023), 10.15252/embj.2023113866.
  • 42. Stanzione M., et al., Meiotic DNA break formation requires the unsynapsed chromosome axis-binding protein IHO1 (CCDC36) in mice. Nat. Cell Biol. 18, 1208–1220 (2016).
  • 43. Martin H. C., et al., Quantifying the contribution of recessive coding variation to developmental disorders. Science 362, 1161–1164 (2018).
  • 44. Satterstrom F. K., et al., Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell 180, 568–584.e523 (2020).
  • 45. Chen C. Y., et al., The impact of rare protein coding genetic variation on adult cognitive function. Nat. Genet. 55, 927–938 (2023).
  • 46. Hamann J., et al., International union of basic and clinical pharmacology. XCIV. Adhesion G protein-coupled receptors. Pharmacol. Rev. 67, 338–367 (2015).
  • 47. Lix L. M., et al., Socioeconomic variations in the prevalence and incidence of Parkinson’s disease: A population-based analysis. J. Epidemiol. Community Health 64, 335–340 (2010).
  • 48. Pappalettera C., Carrarini C., Miraglia F., Vecchio F., Rossini P. M., Cognitive resilience/reserve: Myth or reality? A review of definitions and measurement methods. Alzheimers Dement. 20, 3567–3586 (2024).
  • 49. Almeida-Meza P., Richards M., Cadar D., Moderating role of cognitive reserve markers between childhood cognition and cognitive aging: Evidence from the 1946 British birth cohort. Neurology 99, e1239–e1250 (2022).
  • 50. Wendt F. R., et al., Multivariate genome-wide analysis of education, socioeconomic status and brain phenome. Nat. Hum. Behav. 5, 482–496 (2021).
  • 51. Meyer M. N., et al., Wrestling with social and behavioral genomics: Risks, potential benefits, and ethical responsibility. Hastings Cent. Rep. 53 (suppl. 1), S2–S49 (2023).
  • 52. Young A. I., Benonisdottir S., Przeworski M., Kong A., Deconstructing the sources of genotype-phenotype associations in humans. Science 365, 1396–1400 (2019).
  • 53. Belsky D. W., et al., The genetics of success: How single-nucleotide polymorphisms associated with educational attainment relate to life-course development. Psychol. Sci. 27, 957–972 (2016).
  • 54. Kong A., et al., The nature of nurture: Effects of parental genotypes. Science 359, 424–428 (2018).
  • 55. Wang B., et al., Robust genetic nurture effects on education: A systematic review and meta-analysis based on 38,654 families across 8 cohorts. Am. J. Hum. Genet. 108, 1780–1791 (2021).
  • 56. Abdellaoui A., Dolan C. V., Verweij K. J. H., Nivard M. G., Gene-environment correlations across geographic regions affect genome-wide association studies. Nat. Genet. 54, 1345–1354 (2022).
  • 57. Fry A., et al., Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am. J. Epidemiol. 186, 1026–1034 (2017).
  • 58. Marees A. T., et al., Genetic correlates of socio-economic status influence the pattern of shared heritability across mental health traits. Nat. Hum. Behav. 5, 1065–1073 (2021).
  • 59. UNESCO, International Standard Classification of Education (ISCED) 1997 (2006). https://uis.unesco.org/sites/default/files/documents/international-standard-classification-of-education-1997-en_0.pdf. Accessed 31 October 2022.
  • 60. Van Hout C. V., et al., Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature 586, 749–756 (2020).
  • 61. Jurgens S. J., et al., Analysis of rare genetic variation underlying cardiometabolic diseases and traits among 200,000 individuals in the UK Biobank. Nat. Genet. 54, 240–250 (2022).
  • 62. Cingolani P., et al., A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).
  • 63. Vaser R., Adusumalli S., Leng S. N., Sikic M., Ng P. C., SIFT missense predictions for genomes. Nat. Protoc. 11, 1–9 (2016).
  • 64. Adzhubei I., Jordan D. M., Sunyaev S. R., Predicting functional effect of human missense mutations using PolyPhen-2. Curr. Protoc. Hum. Genet. Chapter 7, Unit7.20 (2013).
  • 65. Chun S., Fay J. C., Identification of deleterious mutations within three human genomes. Genome Res. 19, 1553–1561 (2009).
  • 66. Schwarz J. M., Rödelsperger C., Schuelke M., Seelow D., MutationTaster evaluates disease-causing potential of sequence alterations. Nat. Methods 7, 575–576 (2010).
  • 67. Pebesma E. J., Bivand R. S., Classes and methods for spatial data in R. R News 5, 9–13 (2005).
  • 68. Bivand R. S., Gómez-Rubio E. P. V., Applied Spatial Data Analysis with R (Springer New York, NY, 2013), 10.1007/978-1-4614-7618-4.
  • 69. Bycroft C., et al., The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
  • 70. Purcell S., et al., PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
  • 71. Myers T. A., Chanock S. J., Machiela M. J., LDlinkR: An R package for rapidly calculating linkage disequilibrium statistics in diverse populations. Front. Genet. 11, 157 (2020).
  • 72. Buniello A., et al., The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
  • 73. Yu G., Wang L. G., Han Y., He Q. Y., clusterProfiler: An R package for comparing biological themes among gene clusters. Omics 16, 284–287 (2012).
  • 74. Watanabe K., Taskesen E., van Bochoven A., Posthuma D., Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
  • 75. Morabito S., et al., Single-nucleus chromatin accessibility and transcriptomic characterization of Alzheimer’s disease. Nat. Genet. 53, 1143–1155 (2021).
  • 76. Ye C. J., et al., Mendelian randomization evidence for the causal effects of socio-economic inequality on human longevity among Europeans. Nat. Hum. Behav. 7, 1357–1370 (2023).
  • 77. Martin F. J., et al., Ensembl 2023. Nucleic Acids Res. 51, D933–D941 (2023).
  • 78. Wu X. R., WES_SES_Figure1. Created in BioRender. https://BioRender.com/a77f549. Deposited 3 December 2023.
  • 79. Hao Y., et al., Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat. Biotechnol. 42, 293–304 (2024).

Associated Data

Supplementary Materials

Appendix 01 (PDF)
Dataset S01 (XLSX)
Dataset S02 (XLSX): pnas.2414018122.sd02.xlsx (15.6KB)
Dataset S03 (XLSX): pnas.2414018122.sd03.xlsx (17.7KB)
Dataset S04 (XLSX): pnas.2414018122.sd04.xlsx (305.5KB)
Dataset S05 (XLSX): pnas.2414018122.sd05.xlsx (68.7KB)
Dataset S06 (XLSX): pnas.2414018122.sd06.xlsx (213.6KB)
Dataset S07 (XLSX): pnas.2414018122.sd07.xlsx (55.3KB)
Dataset S08 (XLSX)
Dataset S09 (XLSX)
Dataset S10 (XLSX): pnas.2414018122.sd10.xlsx (23.1KB)

Data Availability Statement

The open-source R package SAIGE-GENE+ was used to run gene-based collapsing tests; its code is available on GitHub (https://github.com/saigegit/SAIGE) (20). The consequences of significant variants were annotated with VEP (https://asia.ensembl.org/Homo_sapiens/Tools/VEP) (77). Partial illustrations in Fig. 1 were obtained from BioRender under a publishing license (https://www.biorender.com/) (78). Protein domains were determined with SMART (https://smart.embl.de/) (21). LD with previously reported loci was calculated with the R package LDlinkR (https://github.com/CBIIT/LDlinkR) (71). GO pathway enrichment analysis was performed with the R package clusterProfiler version 4.4.4 (https://github.com/YuLab-SMU/clusterProfiler) (73). Enrichment analysis of tissue types and GWAS gene sets was performed with FUMA (https://fuma.ctglab.nl/) (74). Synaptic annotation was conducted on the web-based platform SynGO (https://www.syngoportal.org/) (29). The code for the main analysis and visualization of single-nucleus RNA-seq data was adapted from the R package Seurat version 4.3.0, available from https://satijalab.org/seurat/index.html (79). The LD Score regression software LDSC is available at https://github.com/bulik/ldsc. This project corresponds to UK Biobank application ID 19542. Individual-level genetic and phenotypic data from the UK Biobank dataset are available at https://biobank.ndph.ox.ac.uk/ (69) by application. The single-nucleus RNA-seq data used in this study are available from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE173731 (28). All other data are included in the manuscript and/or supporting information.


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
