2023 Sep 19;13:15500. doi: 10.1038/s41598-023-42578-0

Figure 1.

Workflow of the analytic pipeline. We used sequencing data from outbred, unrelated subjects in 1000 Genomes Project phase 3 (N = 1699 European (EUR), East Asian (EAS), Southern Asian (SAS) and African (AFR) individuals; American individuals were excluded because of their admixed ancestry) and GWAS summary statistics for schizophrenia (SCZ) (ref. 49). Ancestry-specific summary statistics for European and East Asian subjects were used for additional sensitivity analyses. Variants present in both sources (5,554,437 SNPs) were included in the subsequent analyses. Throughout, RNS markers were defined as the top 5% of LD-independent variants ranked by their probability of being markers of natural selection (RNS markers; N_SNPs = 8679; LD-clumping based on SCZ). As a sensitivity analysis, we instead took the top 5% of LD-independent variants after LD-clumping based on RNS (LD-RNS markers; N_SNPs = 6262). Enrichment of RNS was also tested in Alzheimer’s disease as a methodological control. Enrichment, effect-stratified analysis (bias towards risk or protection), and biological profiling of RNS in schizophrenia were performed as described in Methods. RNS-stratified polygenic score analyses (PGS_SCZ) were performed in an independent Spanish case–control sample (CIBERSAM; N_SCZ = 1927; N_HC = 1561), with a previous SCZ GWAS (ref. 39) as the discovery sample. LDSC-SEG = LDSC applied to specifically expressed genes.
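
The marker-selection step in the caption (keeping the top 5% of LD-independent variants by their selection score) can be illustrated with a minimal sketch. This is not the authors' code: the function name `top_percent`, the variant IDs, and the scores are all hypothetical, and the sketch assumes LD-clumping has already been done upstream so that every input variant is LD-independent.

```python
def top_percent(scores, percent=5):
    """Return the IDs of the variants ranked in the top `percent` by score.

    `scores` maps a variant ID to a selection score (higher = stronger
    evidence). Hypothetical helper for illustration only; it assumes the
    input has already been LD-clumped.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Integer arithmetic avoids floating-point surprises in the cutoff.
    n_keep = len(scores) * percent // 100
    # Sort variant IDs from highest to lowest score, then keep the head.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


# Toy input: 40 made-up variants with made-up, strictly increasing scores.
toy_scores = {f"rs{i}": i / 40 for i in range(1, 41)}
rns_markers = top_percent(toy_scores)
print(rns_markers)  # ['rs40', 'rs39']
```

With 40 toy variants, the 5% cutoff keeps the two highest-scoring IDs. In the sensitivity analysis described in the caption, the same ranking step would be applied to a variant set clumped on RNS instead of SCZ.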