Diabetes. 2025 Nov 14;75(1):205–214. doi: 10.2337/db25-0772

Development and Validation of a Type 1 Diabetes Multi-Ancestry Polygenic Score

Aaron J Deutsch 1,2,3,4, Andrew S Bell 4, Dominika A Michalek 5, Adam B Burkholder 6, Stella Nam 1,2,3, Raymond J Kreienkamp 1,2,3,4,7, Seth A Sharp 1,2,3, Alicia Huerta-Chagoya 1,3, Ravi Mandla 1,3,8,9, Ruth Nanjala 10, Yang Luo 10, Richard A Oram 11,12, Jose C Florez 1,2,3,4, Suna Onengut-Gumuscu 5, Stephen S Rich 5, Maggie CY Ng 13, Alison A Motsinger-Reif 14, Alisa K Manning 3,4,15, Josep M Mercader 1,3,4, Miriam S Udler 1,2,3,4
PMCID: PMC12716612  PMID: 41236419

Abstract

Polygenic scores strongly predict type 1 diabetes risk, but most scores were developed in European-ancestry populations. In this study, we leveraged recent multiancestry genome-wide association studies to create a Type 1 Diabetes Multi-Ancestry Polygenic Score (T1D MAPS). We trained the score in the Mass General Brigham (MGB) Biobank (372 individuals with type 1 diabetes) and tested the score in the All of Us program (86 individuals with type 1 diabetes). We evaluated the area under the receiver operating characteristic curve (AUC), and we compared the AUC to two published single-ancestry scores for European (EUR) and African (AFR) populations: T1D Genetic Risk Score 2 (GRS2EUR) and T1D GRSAFR. We also developed an updated score (T1D MAPS2) that combines T1D GRS2EUR and T1D MAPS. Among individuals with non-European ancestry, the AUC of T1D MAPS was 0.90, significantly higher than T1D GRS2EUR (0.82) and T1D GRSAFR (0.82). Among individuals with European ancestry, the AUC of T1D MAPS was slightly lower than T1D GRS2EUR (0.89 vs. 0.91). However, T1D MAPS2 performed equivalently to T1D GRS2EUR in European ancestry (0.91 vs. 0.91) and performed better in non-European ancestry (0.90 vs. 0.82). Overall, these findings advance the accuracy of type 1 diabetes genetic risk prediction across diverse populations.

Article Highlights

  • Type 1 diabetes polygenic scores are highly predictive of disease risk, but their performance varies based on genetic ancestry.

  • Can we develop a polygenic score that accurately predicts type 1 diabetes risk across diverse populations?

  • Our novel polygenic score performs similarly to existing scores in European populations, and it demonstrates superior performance in non-European populations.

  • This polygenic score will improve prediction of type 1 diabetes risk in genetically diverse populations.

Graphical Abstract

A schematic summarizes development and testing of a multiancestry polygenic score for type 1 diabetes. Discovery cohorts include AFR, AMR, and EUR ancestry, combining HLA haplotypes and non-HLA variants to generate T1D MAPS and T1D MAPS2. Validation in an independent testing cohort shows a higher area under the curve for T1D MAPS2 in non-EUR ancestry and similar performance in EUR ancestry compared with existing genetic risk scores, indicating improved cross-ancestry prediction.

Introduction

Type 1 diabetes is a complex disease with multiple genetic and environmental risk factors. Identifying individuals at high risk of type 1 diabetes can promote earlier detection, reducing diabetes-related morbidity and selecting potential candidates for disease-modifying monoclonal antibody therapy (1). Furthermore, the diagnosis of type 1 diabetes may be overlooked among individuals with known diabetes; although type 1 diabetes classically occurs in childhood, the disease can occur throughout adulthood (2) and may be misdiagnosed, particularly in individuals with atypical presentations (3–5).

Polygenic scores—which integrate the effects of multiple variants across the genome—offer a powerful approach to determine disease risk. Current polygenic scores display an outstanding ability to distinguish between individuals with and without type 1 diabetes, with an area under the receiver operating characteristic curve (AUC) of ≥0.9 in European genetic ancestry (6). However, because existing polygenic scores were primarily developed in European populations, their accuracy may decrease when applied to other ancestry groups (7,8) or to genetically admixed populations (9–11). Furthermore, the distribution of polygenic scores may differ across ancestry groups, creating a need for ancestry-specific score thresholds (12). Novel polygenic scores can incorporate ancestry-specific risk variants to improve disease prediction in specific populations (13,14). However, this approach requires investigators to select the optimal ancestry-matched polygenic score, which can be challenging, particularly for individuals from admixed genetic backgrounds.

Concerns about the transferability of polygenic scores have motivated calls to increase population diversity in genetic studies (15) and develop methods to capture disease risk across diverse populations (16–19). Multiancestry risk prediction is particularly challenging in type 1 diabetes because disease risk is strongly influenced by the HLA genes, which exhibit substantial variation across global populations (20). Current polygenic scores, also called genetic risk scores (GRS), have been restricted to a small number of genetic variants that reach genome-wide significance, but novel methods have led to global extended polygenic scores that capture the effects of millions of variants across the genome (21).

In this study, we leveraged recent diverse genetic studies to create a novel global extended polygenic score, Type 1 Diabetes Multi-Ancestry Polygenic Score (T1D MAPS). Compared with existing polygenic scores, T1D MAPS demonstrates equivalent predictive power in European ancestry and improved predictive power in non-European ancestry, including African and admixed American ancestry. Notably, we observed significant results despite a modest type 1 diabetes sample size in our training and testing cohorts. Furthermore, the distribution of T1D MAPS is similar across populations, allowing for a single universal score threshold to identify high-risk individuals. These results advance our understanding of type 1 diabetes risk and will help ensure optimal care for all individuals with type 1 diabetes.

Research Design and Methods

Construction of the Type 1 Diabetes Polygenic Score

We focused separately on the MHC—which includes the highly influential HLA genes—and variants outside the MHC (Fig. 1A). For the HLA score, we used summary statistics from a recent multiancestry genome-wide association study (GWAS) of type 1 diabetes (discovery cohort 1), which included >4,000 individuals with type 1 diabetes comprising three genetic ancestry groups: African (AFR), Admixed American (AMR), and European (EUR) (22). For the non-HLA score, we used a recent EUR-ancestry GWAS that included ∼19,000 individuals with type 1 diabetes (discovery cohort 2) (23).

Figure 1.

A flowchart outlines the T1D MAPS framework integrating HLA and non-HLA polygenic scores for type 1 diabetes prediction. It includes discovery and validation using multiancestry cohorts, imputation of HLA alleles, alignment with reference panels, and score calculation. Regression models combine coefficients from HLA and non-HLA scores to produce final T1D MAPS values, with performance evaluated through receiver operating characteristic analysis.

Overview of experimental design. A: Description of discovery cohorts used to select weights for T1D MAPS-HLA and T1D MAPS–non-HLA. B: Approach used to calculate T1D MAPS-HLA and T1D MAPS–non-HLA in training and testing cohorts. C: Approach used to combine T1D MAPS-HLA and T1D MAPS–non-HLA into overall T1D MAPS or T1D MAPS2 score. ROC, receiver operating characteristic; TAPAS, Typing At Protein for Association Studies; T1DGC, Type 1 Diabetes Genetics Consortium.

Biobank Cohorts

To train the polygenic score, we used the Mass General Brigham (MGB) Biobank, a repository linked to electronic medical records at the MGB Hospital system in Boston, Massachusetts (24) (Fig. 1B). Analysis of MGB Biobank was approved by the MGB Institutional Review Board (study protocol 2016P001018). Individuals were genotyped using the Illumina Multi-Ethnic Genotyping Array or the Illumina Infinium Global Screening Array (25). Imputation was performed using the Trans-Omics for Precision Medicine (TOPMed) r2 reference panel (26). Data from MGB Biobank were current as of October 2022.

To validate the score, we used the All of Us research program, a longitudinal cohort study across the U.S. with detailed survey data and health information (27) (Fig. 1B). Analysis of the All of Us cohort was approved by an institutional Data Use and Registration Agreement between MGB and the All of Us Research Program (study protocol 2020P002213). Individuals underwent short-read whole-genome sequencing (28). We performed analyses with the All of Us Controlled Tier Dataset v7 release. MGB Biobank and All of Us were both independent of the discovery cohorts used to identify type 1 diabetes genetic associations.

Phenotype Definitions

MGB Biobank

Type 1 diabetes was defined based on manual review of medical records by a trained medical reviewer, as described previously (29). All criteria were required: type 1 diabetes diagnosis confirmed by endocrinologist or primary care physician, current use of basal/bolus insulin regimen or insulin pump, and no secondary cause of diabetes listed in the medical record. Type 2 diabetes was defined using a phenotype algorithm developed by MGB Biobank, with a set positive predictive value of 0.95.

All of Us

Because individual-level clinical notes were not available, type 1 diabetes was defined based on structured data in the medical record, as described previously (30). All criteria were required: presence of type 1 diabetes diagnosis code before age 30, presence of insulin prescription, and absence of prescription for noninsulin glucose-lowering agents. As a secondary analysis, we tested a type 1 diabetes phenotype algorithm developed by the Electronic Medical Records and Genomics (eMERGE) consortium (“T1D-EHR”) (31). We also implemented a separate algorithm adapted from the eMERGE consortium to define type 2 diabetes (“T2D-EHR+”) (31).
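The strict All of Us definition above is a conjunction of three structured-data criteria, which can be sketched as a simple rule. This is only an illustration: the field names are hypothetical and do not mirror the All of Us schema, and reading "diagnosis code before age 30" as "age at first code below 30" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical fields for illustration; not the All of Us data model.
    age_at_first_t1d_code: Optional[float]  # None if no type 1 diabetes code
    has_insulin_prescription: bool
    has_noninsulin_agent_prescription: bool


def is_strict_t1d(p: Participant, max_age: float = 30.0) -> bool:
    """Return True only if all three criteria from the text are met."""
    return (
        p.age_at_first_t1d_code is not None
        and p.age_at_first_t1d_code < max_age   # assumed reading of "before age 30"
        and p.has_insulin_prescription
        and not p.has_noninsulin_agent_prescription
    )


case = Participant(12.0, True, False)
adult_onset = Participant(45.0, True, False)
on_other_agent = Participant(12.0, True, True)
```

Under this sketch, `adult_onset` fails the strict definition solely because of the age criterion, which is the criterion the lenient secondary definition relaxes.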

Classification of Genetic Ancestry

We performed principal component analysis in MGB Biobank and projected the principal components onto a diverse reference panel from the Human Genome Diversity Panel (HGDP) and the 1000 Genomes Project (32). We used a random forest classifier to assign participants to one of six continental ancestry groups: African (AFR), Latino/admixed American (AMR), East Asian (EAS), Middle Eastern (MID), European (EUR), or South Asian (SAS). This approach generated a continuous probability for each ancestry group. For categorical assignments, individuals were assigned to the ancestry group with the highest probability. The All of Us Research Program used a similar approach to determine genetic ancestry, and results were made available to all investigators using the Researcher Workbench (28).

The demographic distribution of each data set is provided in Supplementary Table 1. To allow for statistical testing, we only analyzed ancestry groups that included at least five individuals with type 1 diabetes. This resulted in the following ancestry groups: AFR, AMR, and EUR. In accordance with the All of Us Data and Statistics Dissemination Policy, no data or aggregate statistics corresponding to <20 participants are displayed; therefore, all genetic ancestries aside from European were combined into a single category labeled as “non-European.”
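The categorical assignment and the group filtering described above can be sketched as follows. The classifier probabilities are taken as given (the random forest itself is not reproduced), and the function names and input layout are illustrative assumptions.

```python
from collections import Counter

MIN_CASES = 5  # minimum number of type 1 diabetes cases per analyzed group


def assign_ancestry(probabilities):
    """Pick the ancestry group with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


def groups_with_enough_cases(participants, min_cases=MIN_CASES):
    """participants: iterable of (ancestry_probabilities, has_t1d) pairs."""
    counts = Counter(
        assign_ancestry(probs) for probs, has_t1d in participants if has_t1d
    )
    return {group for group, n in counts.items() if n >= min_cases}


def reporting_label(group):
    """Collapse every group other than EUR into one reporting category."""
    return "EUR" if group == "EUR" else "non-EUR"


# Toy data: 6 EUR cases, 5 AFR cases, 4 EAS cases, and 1 EUR control.
eur = {"AFR": 0.05, "AMR": 0.05, "EAS": 0.0, "EUR": 0.9}
afr = {"AFR": 0.8, "AMR": 0.1, "EAS": 0.0, "EUR": 0.1}
eas = {"AFR": 0.0, "AMR": 0.05, "EAS": 0.9, "EUR": 0.05}
toy = [(eur, True)] * 6 + [(afr, True)] * 5 + [(eas, True)] * 4 + [(eur, False)]
analyzed = groups_with_enough_cases(toy)
```

In the toy data, EAS is dropped because it has only four cases, while AFR is retained and would be reported under the combined "non-EUR" label.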

Generation of HLA Haplotypes

In MGB Biobank, we used genotype array data to impute classical HLA alleles with HLA-Typing At Protein for Association Studies (HLA-TAPAS) (33,34), using an HLA reference panel that includes 21,546 whole-genome sequences spanning five global populations. We then constructed phased multilocus HLA class II haplotypes, including HLA-DRB1, HLA-DQA1, and HLA-DQB1.

In All of Us, we assembled HLA classical alleles from whole-genome sequencing data using Kourami (35). We then performed phasing by constructing each of four possible diploid genotypes across HLA-DRB1, HLA-DQA1, and HLA-DQB1, resulting in eight possible haplotypes. We compared each possible haplotype to the list of 38 HLA haplotypes included in the discovery cohort (22). If an individual had more than two possible haplotypes listed in the discovery cohort, or if more than one possible genotype could be constructed from the available haplotypes, that individual was assigned an HLA score of zero (i.e., disease risk equivalent to the population mean). The total number of haplotypes included for each individual is displayed in Supplementary Table 2.
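The enumeration step can be made concrete with a short sketch: with two unphased alleles at each of three loci, fixing the order at HLA-DRB1 and flipping the other two loci gives four diploid genotypes and eight haplotypes. The ambiguity rule in `resolve_haplotypes` is one possible reading of the text, not the authors' code, and the allele names are used only as example input.

```python
from itertools import product


def candidate_genotypes(drb1, dqa1, dqb1):
    """Enumerate possible phasings of unphased allele pairs at three loci.

    Each argument is a pair of alleles. Returns a sorted list of genotypes,
    each genotype being a pair of (DRB1, DQA1, DQB1) haplotypes.
    """
    genotypes = set()
    for flip_dqa1, flip_dqb1 in product((False, True), repeat=2):
        a = dqa1[::-1] if flip_dqa1 else dqa1
        b = dqb1[::-1] if flip_dqb1 else dqb1
        hap1 = (drb1[0], a[0], b[0])
        hap2 = (drb1[1], a[1], b[1])
        genotypes.add(tuple(sorted((hap1, hap2))))  # dedupe homozygous loci
    return sorted(genotypes)


def resolve_haplotypes(genotypes, known_haplotypes):
    """Return the listed haplotypes, or None if phasing is ambiguous.

    Assumed rule: ambiguous if more than two candidate haplotypes are
    listed, or if more than one genotype contains a listed haplotype.
    A None result corresponds to an HLA score of zero in the text.
    """
    listed = {h for g in genotypes for h in g if h in known_haplotypes}
    supported = [g for g in genotypes if any(h in known_haplotypes for h in g)]
    if len(listed) > 2 or len(supported) > 1:
        return None
    if not supported:
        return []
    return [h for h in supported[0] if h in known_haplotypes]


genotypes = candidate_genotypes(
    ("03:01", "04:01"), ("05:01", "03:01"), ("02:01", "03:02")
)
known = {("03:01", "05:01", "02:01"), ("04:01", "03:01", "03:02")}
resolved = resolve_haplotypes(genotypes, known)
```

Here only one of the four candidate genotypes is built from listed haplotypes, so the phasing resolves; adding a third listed haplotype from a different phasing would make the individual ambiguous.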

Construction of T1D MAPS-HLA

We assigned a score for each of 38 HLA-DRB1-DQA1-DQB1 haplotypes based on the log odds of association with type 1 diabetes in discovery cohort 1 (22) (Supplementary Table 3). If more than one ancestry-specific odds ratio was available, we assigned a single score for each haplotype by taking the log odds from the population with the largest available sample size in the discovery cohort (EUR > AFR > AMR). Results were similar if instead we selected the log odds from the population most closely matching the ancestry of each individual in the testing cohort. Any haplotype not found in the discovery cohort was assigned a score of zero.
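A minimal sketch of the weight-selection rule follows. The odds ratios are made-up placeholders rather than values from Supplementary Table 3, and summing the two haplotype weights to obtain an individual's HLA score is an assumption, since the text does not spell out the per-individual aggregation.

```python
import math

# Ancestry priority by discovery sample size, as stated in the text.
PRIORITY = ("EUR", "AFR", "AMR")


def haplotype_weight(odds_ratios):
    """Log odds from the highest-priority ancestry with an available estimate.

    odds_ratios maps ancestry -> odds ratio for one haplotype; an empty
    mapping (haplotype not in the discovery cohort) yields zero.
    """
    for ancestry in PRIORITY:
        if ancestry in odds_ratios:
            return math.log(odds_ratios[ancestry])
    return 0.0


def hla_score(haplotypes, or_table):
    """Sum the weights of an individual's haplotypes (assumed aggregation)."""
    return sum(haplotype_weight(or_table.get(h, {})) for h in haplotypes)


# Placeholder odds ratios for illustration only.
toy_table = {
    "hapA": {"EUR": 3.0, "AFR": 2.0},
    "hapB": {"AFR": 0.5, "AMR": 0.8},
}
score = hla_score(["hapA", "hapB"], toy_table)
unknown = hla_score(["hapC", "hapC"], toy_table)
```

In the toy table, `hapA` takes its weight from EUR even though an AFR estimate exists, and `hapB` falls back to AFR because no EUR estimate is available.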

In addition, we developed a second version of the HLA score, T1D MAPS2-HLA, by calculating an average of T1D MAPS-HLA and T1D GRS2EUR, weighted by the percentage predicted EUR ancestry:

$$\text{T1D MAPS2-HLA} = \text{probability}_{\text{EUR}} \times \text{T1D GRS2}_{\text{EUR}}\text{-HLA} + (1 - \text{probability}_{\text{EUR}}) \times \text{T1D MAPS-HLA}$$
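The ancestry-weighted average is a one-line computation: the EUR probability weights the EUR-derived HLA score, and its complement weights T1D MAPS-HLA. The sketch below assumes both HLA scores have already been computed on comparable scales; the function and argument names are illustrative.

```python
def maps2_hla(prob_eur, grs2_eur_hla, maps_hla):
    """Blend two HLA scores by the predicted probability of EUR ancestry."""
    if not 0.0 <= prob_eur <= 1.0:
        raise ValueError("prob_eur must be a probability between 0 and 1")
    return prob_eur * grs2_eur_hla + (1.0 - prob_eur) * maps_hla


fully_eur = maps2_hla(1.0, 2.0, 6.0)    # reduces to the EUR-derived score
fully_non_eur = maps2_hla(0.0, 2.0, 6.0)  # reduces to T1D MAPS-HLA
admixed = maps2_hla(0.25, 2.0, 6.0)
```

Because the weight is continuous, an admixed individual receives a score between the two component scores rather than being forced into one category.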

Construction of T1D MAPS–non-HLA

For the non-HLA score, we used GWAS summary statistics from discovery cohort 2 (23). We excluded variants located in the MHC region, defined as chr6:28,000,000–34,000,000 in GRCh37 coordinates. We implemented polygenic risk scores with continuous shrinkage (PRS-CS) (36), which applies a CS parameter (φ) to generate a global extended polygenic score, using a EUR-ancestry reference panel from 1000 Genomes that includes ∼1 million genetic variants from the HapMap3 consortium. Using MGB Biobank as a training cohort, we tested a range of φ values (10⁻² to 10⁻⁶) (Supplementary Table 4), and we selected φ = 10⁻⁵ to generate the final non-HLA score (Supplementary File).
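The MHC exclusion amounts to a coordinate filter on the summary statistics before they are passed to PRS-CS. The sketch below assumes GRCh37 positions and treats both interval endpoints as inclusive, which the text does not specify; the variant tuples are toy data.

```python
MHC_CHROM = "6"
MHC_START = 28_000_000  # GRCh37
MHC_END = 34_000_000    # GRCh37


def outside_mhc(chrom, pos):
    """True if a variant lies outside chr6:28,000,000-34,000,000."""
    chrom = str(chrom)
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return not (chrom == MHC_CHROM and MHC_START <= pos <= MHC_END)


# Toy variants as (chromosome, position) pairs.
variants = [("chr6", 27_999_999), ("chr6", 30_000_000), ("1", 30_000_000)]
kept = [v for v in variants if outside_mhc(*v)]
```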

As a secondary analysis, we implemented PRS-CSx (37), which extends PRS-CS by integrating GWAS data across multiple ancestries. We used AFR and AMR summary statistics from discovery cohort 1 (22) and EUR summary statistics from discovery cohort 2 (23). Likewise, we used ancestry-specific reference panels from 1000 Genomes. Using MGB Biobank as a training cohort, we constructed a logistic regression model, and we applied the resulting β-coefficients to calculate the non-HLA score as a weighted average of the three ancestry-specific scores.

Construction of Overall T1D MAPS Score

For the overall T1D MAPS score, we used MGB Biobank as a training cohort (Fig. 1C). We ran a logistic regression model and calculated the overall score as a weighted average of the HLA score and the non-HLA score:

$$\text{T1D MAPS} = \beta_{\text{HLA}} \times \text{T1D MAPS-HLA} + \beta_{\text{non-HLA}} \times \text{T1D MAPS--non-HLA}$$

We applied the same β-coefficients to generate the overall T1D MAPS score in All of Us. For T1D MAPS2, we used a similar approach, but we substituted an updated version of the HLA score (T1D MAPS2-HLA); the non-HLA score remained the same (Fig. 1C).
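The train-then-transfer procedure can be sketched in plain Python: fit a logistic regression of disease status on the two component scores in a training set, then reuse the fitted coefficients unchanged elsewhere. The gradient-ascent fit, the toy data, and the function names below are illustrative assumptions; the authors' actual software and settings are not stated here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood; beta = [intercept, b_hla, b_non_hla]."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Plain gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            pred = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            err = yi - pred
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def combined_score(beta, hla, non_hla):
    """Weighted sum of component scores; the intercept is not part of the score."""
    return beta[1] * hla + beta[2] * non_hla


# Toy training data: (HLA score, non-HLA score) and case status.
X_train = [(2.0, 0.5), (1.5, 1.0), (1.8, -0.2), (0.2, 0.8), (-0.5, -0.3),
           (0.1, 0.2), (-1.0, 0.4), (0.3, -0.9), (-0.2, -0.5), (1.6, 0.6)]
y_train = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
beta = fit_logistic(X_train, y_train)

# The same coefficients are then applied, unchanged, to a testing individual.
test_score = combined_score(beta, 1.2, 0.3)
```

For T1D MAPS2, the same function would be called with the T1D MAPS2-HLA value in place of the HLA argument.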

Evaluation of Polygenic Score Performance

We evaluated each polygenic score with the AUC. As a comparison, we implemented two previously published polygenic scores: T1D GRS2EUR, which was developed in EUR ancestry (6), and T1D GRSAFR, which was developed in AFR ancestry (13). For our primary analysis, we compared the AUC of T1D MAPS to T1D GRS2EUR using the DeLong test. As secondary analyses, we also compared the AUCs for different components of each polygenic score (HLA vs. non-HLA), and we assessed the AUC of each polygenic score in a logistic regression model that included genetic principal components. Because HLA allele frequency is highly correlated with genetic ancestry (20), we chose to focus our primary analysis on the AUC of each polygenic score on its own, following prior studies (6,13).
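For readers less familiar with the metric, the AUC equals the probability that a randomly chosen individual with type 1 diabetes has a higher score than a randomly chosen individual without it, with ties counted as one half. The sketch below computes it by direct pairwise comparison on toy scores; it does not implement the DeLong test used to compare AUCs.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties contribute half a win
    return wins / (len(case_scores) * len(control_scores))


perfect = auc([3.0, 4.0], [1.0, 2.0])
chance = auc([1.0, 2.0], [1.0, 2.0])
partial = auc([2.0, 4.0], [1.0, 3.0])
```

The pairwise loop is quadratic in cohort size; a rank-based formulation gives the same value more efficiently for biobank-scale data.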

Data and Resource Availability

Data from the All of Us Research Program are available to authorized users at https://www.researchallofus.org. Data from MGB Biobank are available to MGB-affiliated researchers with approval from the MGB Institutional Review Board. The weights files and code used to generate polygenic scores are available online: T1D MAPS and T1D MAPS2, https://github.com/snam-mgh/T1D_MAPS; T1D GRS2EUR, https://github.com/sethsh7/PRSedm/; and T1D GRSAFR, https://www.pgscatalog.org/score/PGS000023/.

Results

Development of HLA Score

We trained T1D MAPS in MGB Biobank (Fig. 1B), which contained ∼64,000 individuals, including 372 individuals with type 1 diabetes (335 with EUR ancestry and 37 with non-EUR ancestry) (Supplementary Table 1). First, we imputed HLA haplotypes from genotype array data. We found that the genetic variants used to tag HLA haplotypes in T1D GRS2EUR were highly correlated with imputed haplotypes within EUR ancestry, but the correlation was weaker in non-EUR ancestry (mean R² = 0.90 vs. 0.75, Wilcoxon P = 9.1 × 10⁻³) (Supplementary Fig. 1A). This supported our hypothesis that a multiancestry score may better capture HLA allelic variation among non-EUR populations.

Next, we implemented T1D MAPS-HLA in the training cohort, and we evaluated the AUC when comparing type 1 diabetes to all other individuals. We compared T1D MAPS-HLA with the HLA components of T1D GRS2EUR and T1D GRSAFR (Supplementary Fig. 2A). T1D GRSAFR-HLA had the lowest AUC (0.80–0.82). In non-EUR ancestry, the AUC of T1D MAPS-HLA was not significantly different from T1D GRS2EUR-HLA (0.89 vs. 0.84; P = 0.08, DeLong test). However, in EUR ancestry, T1D MAPS-HLA had a marginally lower AUC than T1D GRS2EUR-HLA (0.84 vs. 0.86, P = 0.04). To address this discrepancy, we developed an updated score, T1D MAPS2-HLA, which comprises a weighted average of T1D MAPS-HLA and T1D GRS2EUR-HLA based on an individual’s predicted proportion of EUR genetic ancestry. In EUR ancestry, T1D MAPS2-HLA performed marginally better than T1D GRS2EUR-HLA (AUC = 0.862 vs. 0.861, P = 0.03), although this small difference is unlikely to be clinically significant. In non-EUR ancestry, T1D MAPS2-HLA had a higher AUC than T1D GRS2EUR-HLA (0.89 vs. 0.84), but the result was not statistically significant (P = 0.06), likely related to the small number of non-EUR participants.

Development of Non-HLA Score

We then evaluated T1D MAPS–non-HLA in MGB Biobank. We tested two approaches: PRS-CS (36), using EUR summary statistics alone (23), or PRS-CSx (37), integrating data across multiple ancestries (AFR [22], AMR [22], and EUR [23]). We found that the AUC using PRS-CSx was lower compared with PRS-CS (EUR, 0.68 vs. 0.73, P = 9.9 × 10⁻⁴; non-EUR, 0.62 vs. 0.66, P = 0.29). Therefore, we chose to focus on the non-HLA score produced by PRS-CS.

When we assessed the performance in MGB Biobank, we found that T1D MAPS–non-HLA had a significantly higher AUC than T1D GRS2EUR–non-HLA within EUR ancestry (0.73 vs. 0.65, P = 1.7 × 10⁻⁶) (Supplementary Fig. 2B). In non-EUR ancestry, the AUC of T1D MAPS–non-HLA was not significantly different from that of T1D GRS2EUR–non-HLA (0.66 vs. 0.56, P = 0.18). T1D GRSAFR–non-HLA had the lowest AUC (0.54–0.58), likely because this score only includes two genetic variants.

Development of Overall T1D MAPS

To construct the overall polygenic score, we applied a logistic regression model in MGB Biobank to model type 1 diabetes status as a function of T1D MAPS-HLA and T1D MAPS–non-HLA. First, we assessed the AUC of each overall score in MGB Biobank, acknowledging the possibility of overfitting (Fig. 2A). Within EUR ancestry, T1D MAPS and T1D GRS2EUR had an equivalent AUC (0.88 vs. 0.88, P = 0.90), whereas T1D MAPS2 showed superior performance (0.90 vs. 0.88, P = 3.0 × 10⁻³). Among non-EUR ancestry, T1D MAPS and T1D MAPS2 were both superior to T1D GRS2EUR (0.89 vs. 0.80, P = 0.01; 0.90 vs. 0.80, P = 7.3 × 10⁻³). Both scores also outperformed T1D GRSAFR in this population (0.89 vs. 0.81, P = 1.8 × 10⁻³; 0.90 vs. 0.81, P = 8.4 × 10⁻⁴).

Figure 2.

A bar chart compares the area under the curve values of four genetic risk models for type 1 diabetes across European and non-European populations. In both the MGB Biobank training cohort and the All of Us testing cohort, T1D MAPS and T1D MAPS2 outperform ancestry-specific genetic risk scores, demonstrating higher prediction accuracy across ancestries.

AUC of type 1 diabetes polygenic scores. AUCs are displayed in MGB Biobank (training cohort) (A) and All of Us (testing cohort) (B). All AUCs were compared with the AUC of T1D GRS2EUR using the DeLong test. Error bars denote the 95% CI. *P < 0.05.

Validation of T1D MAPS

To exclude the possibility of overfitting, we applied T1D MAPS in an external cohort. We used the All of Us Research Program, which contained >100,000 individuals with genomic data, including 86 individuals with strictly defined type 1 diabetes (61 with EUR ancestry and 25 with non-EUR ancestry) (Supplementary Table 1). We used whole-genome sequencing data to assemble multilocus HLA haplotypes. Once again, we found that the genetic variants used to tag HLA haplotypes in T1D GRS2EUR were highly correlated with sequencing-based haplotypes among individuals with EUR ancestry, but the correlation was weaker in non-EUR ancestry (mean R² = 0.92 vs. 0.78, Wilcoxon P = 1.6 × 10⁻³) (Supplementary Fig. 1B).

In the testing cohort, within EUR ancestry, T1D MAPS had slightly lower AUC compared with T1D GRS2EUR (0.89 vs. 0.91, P = 0.02) (Fig. 2B), but T1D MAPS2 and T1D GRS2EUR performed similarly (0.91 vs. 0.91, P = 0.45). Among non-EUR ancestry, T1D MAPS and T1D MAPS2 were each superior to T1D GRS2EUR (0.90 vs. 0.82, P = 0.04; 0.90 vs. 0.82, P = 0.04) and to T1D GRSAFR (0.90 vs. 0.82, P = 7.3 × 10⁻³; 0.90 vs. 0.82, P = 7.1 × 10⁻³). When we analyzed the HLA and non-HLA components separately, we observed similar patterns in both the training and testing cohorts (Supplementary Fig. 2). In addition, after controlling for 4 or 10 principal components, our primary findings were unchanged (Supplementary Fig. 3). When we divided the cohort into three subgroups based on the probability of European genetic ancestry, we observed similar trends, although most findings were not statistically significant, likely due to reduced sample size (Supplementary Fig. 4). Finally, we confirmed that T1D MAPS could accurately distinguish between individuals with type 1 and type 2 diabetes (Supplementary Fig. 5A).

Because the testing cohort had a low number of individuals with type 1 diabetes, we also tested a more lenient phenotype definition, which did not account for age at diabetes onset (31). This lenient definition included 569 individuals with type 1 diabetes (351 with EUR ancestry and 218 with non-EUR ancestry). Using this definition, T1D MAPS and T1D GRS2EUR performed similarly within EUR ancestry (AUC 0.82 vs. 0.83, P = 0.66) (Supplementary Fig. 5B). Interestingly, both scores had low performance within non-EUR ancestry, with no significant difference between T1D MAPS and T1D GRS2EUR (AUC 0.67 vs. 0.67, P = 0.59), or between T1D MAPS and T1D GRSAFR (AUC 0.67 vs. 0.67, P = 0.93).

Determination of Optimal Score Cutoffs

After developing T1D MAPS, we assessed the optimal score threshold for discriminating between individuals with and without type 1 diabetes. First, we confirmed that the distribution of each polygenic score differs based on genetic ancestry (Fig. 3). For example, among individuals with type 1 diabetes, the mean T1D GRS2EUR was lower in non-EUR ancestry compared with EUR ancestry. In contrast, T1D MAPS and T1D MAPS2 both had similar score distributions among individuals with type 1 diabetes, although there were slight ancestry-related differences among individuals without type 1 diabetes.

Figure 3.

Four box plots compare polygenic scores for type 1 diabetes across European and non-European populations using different genetic models. Individuals with diabetes have higher scores in all models, with T1D MAPS and T1D MAPS2 showing clearer separation between affected and unaffected groups, indicating improved cross-ancestry performance.

Distribution of type 1 diabetes polygenic scores. Boxplots display the distribution of each polygenic score in All of Us, stratified by type 1 diabetes status and genetic ancestry. The dashed red line in each graph represents the score with the maximum Youden index.

To quantify the performance of each score, we calculated the sensitivity, specificity, and Youden index (sensitivity + specificity – 1) at various score thresholds in All of Us (Table 1). Once again, we demonstrated that score performance may differ based on genetic ancestry. For example, for T1D GRS2EUR, a score of 12.35 (85th percentile in the overall population) yielded a Youden index of 0.72 in EUR ancestry but only 0.45 in non-EUR ancestry. In comparison, for T1D MAPS2, a score of 14.47 (85th percentile in the overall population) yielded a Youden index of 0.69 in EUR ancestry and 0.67 in non-EUR ancestry.
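The threshold metrics can be computed directly from the scores of individuals with and without type 1 diabetes. In the sketch below, treating a score at or above the threshold as a positive call is an assumption (the text does not say whether the comparison is inclusive), and the score lists are toy data.

```python
def threshold_metrics(case_scores, control_scores, threshold):
    """Return (sensitivity, specificity, Youden index) at a score threshold."""
    true_pos = sum(s >= threshold for s in case_scores)   # assumed inclusive
    true_neg = sum(s < threshold for s in control_scores)
    sensitivity = true_pos / len(case_scores)
    specificity = true_neg / len(control_scores)
    return sensitivity, specificity, sensitivity + specificity - 1.0


# Toy scores for illustration only.
cases = [15.0, 14.8, 14.5, 13.0]
controls = [12.0, 12.5, 13.5, 14.6, 11.0]
sens, spec, youden = threshold_metrics(cases, controls, 14.47)

# Consistency check against reported values: sensitivity 0.90 and
# specificity 0.82 give the reported Youden index of 0.72.
reported_youden = 0.90 + 0.82 - 1.0
```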

Table 1.

Sensitivity and specificity of type 1 diabetes polygenic scores

                          EUR                        Non-EUR
Pop. centile    Score     Spec.   Sens.   Youden     Spec.   Sens.   Youden
T1D GRS2EUR
  0.50          9.90      0.44    0.97    0.40       0.59    0.84    0.43
  0.55          10.22     0.49    0.95    0.44       0.64    0.80    0.44
  0.60          10.53     0.54    0.95    0.49       0.69    0.72    0.41
  0.65          10.85     0.59    0.93    0.53       0.73    0.72    0.45
  0.70          11.19     0.65    0.93    0.58       0.77    0.68    0.45
  0.75          11.54     0.70    0.93    0.64       0.81    0.64    0.45
  0.80          11.91     0.76    0.92    0.68       0.85    0.64    0.49
  0.85          12.35     0.82    0.90    0.72       0.89    0.56    0.45
  0.90          12.88     0.88    0.82    0.70       0.93    0.48    0.41
  0.95          13.66     0.94    0.72    0.66       0.97    0.44    0.41
T1D GRSAFR
  0.50          3.59      0.44    0.92    0.36       0.58    0.96    0.54
  0.55          4.02      0.49    0.92    0.41       0.63    0.92    0.55
  0.60          4.42      0.54    0.90    0.44       0.67    0.88    0.55
  0.65          4.82      0.60    0.85    0.46       0.71    0.84    0.55
  0.70          5.25      0.67    0.80    0.47       0.74    0.68    0.42
  0.75          5.79      0.73    0.74    0.46       0.78    0.56    0.34
  0.80          6.26      0.78    0.69    0.47       0.82    0.56    0.38
  0.85          6.90      0.84    0.67    0.51       0.87    0.52    0.39
  0.90          7.36      0.88    0.38    0.26       0.91    0.44    0.35
  0.95          8.83      0.95    0.25    0.19       0.96    0.28    0.24
T1D MAPS
  0.50          12.60     0.58    0.95    0.53       0.38    1.00    0.38
  0.55          12.85     0.63    0.93    0.57       0.44    1.00    0.44
  0.60          13.09     0.68    0.92    0.59       0.49    0.96    0.45
  0.65          13.34     0.72    0.92    0.64       0.55    0.96    0.51
  0.70          13.59     0.76    0.85    0.62       0.61    0.96    0.57
  0.75          13.85     0.80    0.85    0.66       0.67    0.88    0.55
  0.80          14.14     0.84    0.80    0.65       0.74    0.88    0.62
  0.85          14.47     0.88    0.75    0.64       0.80    0.88    0.68
  0.90          14.87     0.92    0.67    0.59       0.87    0.80    0.67
  0.95          15.45     0.96    0.41    0.37       0.94    0.56    0.50
T1D MAPS2
  0.50          12.56     0.59    0.93    0.52       0.38    1.00    0.38
  0.55          12.81     0.63    0.93    0.57       0.43    1.00    0.43
  0.60          13.05     0.68    0.92    0.60       0.48    0.96    0.44
  0.65          13.30     0.73    0.90    0.63       0.54    0.96    0.50
  0.70          13.56     0.77    0.89    0.66       0.60    0.96    0.56
  0.75          13.83     0.81    0.89    0.70       0.66    0.92    0.58
  0.80          14.13     0.85    0.87    0.72       0.73    0.88    0.61
  0.85          14.47     0.89    0.80    0.69       0.79    0.88    0.67
  0.90          14.88     0.93    0.74    0.66       0.86    0.80    0.66
  0.95          15.48     0.96    0.61    0.57       0.93    0.56    0.49

This table displays the sensitivity (Sens.), specificity (Spec.), and Youden index (sensitivity + specificity – 1) for various values of each polygenic score, as applied in the testing cohort (All of Us). The centile for each score was calculated for the entire population (Pop.) and was then applied across all ancestry groups. The maximum Youden index for each column is denoted in bold font.

Discussion

Polygenic scores are highly predictive of type 1 diabetes risk. Here, we demonstrated that a novel multiancestry score can improve type 1 diabetes prediction in individuals with non-EUR genetic ancestry, while maintaining high predictive power in EUR ancestry populations.

Our results highlight the necessity of accurately capturing HLA haplotypes. We demonstrated that the genetic variants used in T1D GRS2EUR are more closely correlated with HLA haplotypes in EUR compared with non-EUR ancestry. We used two alternative methods to determine HLA haplotypes, based on genotype array data or whole-genome sequencing. Each resulting HLA score was highly correlated with type 1 diabetes risk, confirming previous findings that approaches using genotype arrays or whole-genome sequencing have similar predictive power for type 1 diabetes (12). Additionally, we demonstrated how a global extended polygenic score can improve disease prediction by incorporating numerous genetic variants that do not achieve genome-wide significance in a GWAS.

Although T1D MAPS demonstrated strong predictive power across ancestry groups, the AUC was marginally lower than T1D GRS2EUR in individuals with EUR ancestry in the testing cohort. This may be because T1D GRS2EUR accounts for known significant interactions between HLA haplotypes (38), but we had limited power to detect interaction effects in our discovery cohort. Therefore, we developed an alternative score, T1D MAPS2, which also accounts for an individual’s predicted proportion of EUR genetic ancestry. This approach avoids the drawbacks of existing polygenic scores because it can be applied in all settings and does not require individuals to declare their self-reported race or ethnicity. Furthermore, T1D MAPS2 models genetic ancestry as a continuous variable, avoiding the need to sort individuals into discrete categories such as EUR or AFR. Future work could account for local ancestry at specific genomic regions (e.g., MHC), which may refine the score’s ability to capture ancestry-specific effects.

In practice, clinicians may use polygenic scores to identify individuals at high risk of developing type 1 diabetes or to identify autoimmune etiologies for individuals with atypical presentations (39). In both circumstances, it is useful to designate a score cutoff to separate high-risk and low-risk individuals. However, identifying the optimal cutoff is challenging, as the underlying distribution of each polygenic score may vary based on genetic ancestry (12). One option is to introduce ancestry-specific score thresholds; nevertheless, even with tailored thresholds, predictive power may differ among ancestry groups, and this approach requires classification of individuals into discrete categories. Here, we demonstrate that a multiancestry score allows for a single threshold with high discriminatory power across all ancestry groups.

Although our results are promising, our findings were limited by small sample size. We examined a relatively low number of individuals with type 1 diabetes and non-EUR ancestry in both the training and testing cohorts. This limited our statistical power to detect differences between T1D MAPS and other polygenic scores; nevertheless, we observed significantly higher performance with T1D MAPS in non-EUR ancestry, demonstrating the value of this score. Future studies should replicate these findings in larger cohorts. In addition, future association studies for genetic discovery should include greater representation from diverse populations (including AFR and AMR ancestry, as well as other backgrounds such as East Asian or South Asian). The relatively low representation of non-EUR ancestry in the discovery cohort may explain why integrating ancestry-specific GWAS statistics with PRS-CSx did not significantly improve the performance of T1D MAPS.

Our assessment of disease risk relies on accurate classification of type 1 diabetes. One option is to focus on individuals with positive autoantibodies; however, most individuals in our cohorts did not have autoantibody testing, and up to 15% of individuals with type 1 diabetes do not have detectable autoantibodies (40). Instead, we applied classification algorithms using electronic medical records, but such algorithms are less accurate in non-EUR ancestry compared with EUR ancestry (29). In MGB Biobank, we mitigated this issue by manually reviewing clinical notes, but clinical notes were not available in All of Us. Interestingly, when we analyzed a more lenient classification algorithm in All of Us (which included individuals diagnosed at any age), we found that all polygenic scores had poor discriminatory power within non-EUR ancestry. This finding may reflect the fact that disease misclassification is common in adult-onset type 1 diabetes (41). Accurate classification in diverse settings is also challenging because diabetes may present differently across global populations (42). Nevertheless, integrating polygenic scores with clinical covariates, such as autoantibodies and family history, is likely to increase predictive power even further (43).

From a practical perspective, implementing T1D MAPS may require more resources compared with T1D GRS2EUR or T1D GRSAFR (44), due to the need for imputation or assembly of HLA haplotypes as well as calculation of a global extended polygenic score with >1 million variants. Nevertheless, the variants required to compute T1D MAPS can be obtained by performing imputation with the TOPMed reference panel (26) on data from commercially available genotype arrays, which are becoming increasingly affordable.
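For readers unfamiliar with the computation, a genome-wide polygenic score is conventionally a weighted sum of effect-allele dosages. The sketch below shows only that generic calculation; the variant identifiers, weights, and dosages are hypothetical, and the HLA haplotype component of T1D MAPS is not modeled here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages.

    dosages: variant ID -> dosage in [0, 2] (may be fractional after imputation)
    weights: variant ID -> per-allele weight
    Variants without a weight are ignored.
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


# Hypothetical weights and one individual's imputed dosages.
weights = {"var_a": 0.30, "var_b": -0.12, "var_c": 0.05}
dosages = {"var_a": 2.0, "var_b": 1.0, "var_c": 0.4, "var_d": 1.0}

score = polygenic_score(dosages, weights)  # 0.60 - 0.12 + 0.02
```

With more than one million variants, the same sum is usually computed with dedicated genetics software rather than a Python dictionary, but the arithmetic is the same.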

Similarly, T1D MAPS2 requires additional computational resources to determine predicted probabilities for each continental ancestry group. This approach aligns an individual’s genetic data to external reference panels (32) and does not vary based on the underlying genetic admixture of a given population. However, if continuous probabilities of genetic ancestry are not readily available, then T1D MAPS can be used instead, with relatively similar performance.
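This section does not restate the formula for T1D MAPS2. Purely as an illustration of how continuous ancestry probabilities could enter a combined score, the sketch below blends two standardized scores by the predicted probability of EUR ancestry. This weighting is an assumption for illustration and should not be read as the published method; the function and parameter names are hypothetical.

```python
def blend_scores(p_eur, eur_score_z, multi_score_z):
    """Hypothetical blend: weight an EUR-derived score by the predicted
    probability of EUR ancestry and a multi-ancestry score by the remainder.
    Both inputs are assumed to be standardized to the same scale."""
    if not 0.0 <= p_eur <= 1.0:
        raise ValueError("p_eur must be a probability between 0 and 1")
    return p_eur * eur_score_z + (1.0 - p_eur) * multi_score_z


# An individual with 25% predicted EUR ancestry (invented values).
combined = blend_scores(0.25, eur_score_z=2.0, multi_score_z=1.0)
```

Under this assumed scheme, an individual with a predicted EUR probability of 1 receives the EUR-derived score unchanged, and an individual with a probability of 0 receives the multi-ancestry score unchanged.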

Overall, we demonstrated that T1D MAPS can accurately predict type 1 diabetes risk across ancestry groups, with a single score threshold that identifies high-risk individuals. This score may be useful in clinical practice and may advance knowledge of diabetes risk, particularly in underrepresented populations. Future work should replicate these findings in larger cohorts and expand to include individuals from other ancestry groups.

This article contains supplementary material online at https://doi.org/10.2337/figshare.30286648.

Article Information

Acknowledgments. The authors thank All of Us participants for their contributions, without whom this research would not have been possible. They also thank the National Institutes of Health All of Us Research Program for making available the participant data examined in this study.

Duality of Interest. M.S.U. is involved in consulting activity and a research collaboration with Novo Nordisk that is unrelated to the content of this manuscript. No other potential conflicts of interest relevant to this article were reported.

Author Contributions. A.J.D. performed data analysis and wrote the initial draft of the manuscript. A.J.D., J.C.F., M.C.Y.N., J.M.M., and M.S.U. designed the study. A.S.B., D.A.M., and S.N. contributed to data analysis. D.A.M., S.O.-G., and S.S.R. contributed to study design and generated association statistics in the Type 1 Diabetes Genetics Consortium. A.B.B., R.N., Y.L., and A.A.M.-R. assisted with generating HLA haplotypes. R.J.K., S.A.S., R.A.O., and A.K.M. contributed to discussion and reviewed and edited the manuscript. A.H.-C. and R.M. assisted with determining genetic ancestry. All authors approved the final version of the manuscript. M.S.U. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Prior Presentation. Parts of this study were presented in abstract form at the 85th Scientific Sessions of the American Diabetes Association, Chicago, IL, 20–23 June 2025. A non–peer-reviewed version of this article was submitted to the medRxiv preprint server (https://www.medrxiv.org/content/10.1101/2025.06.20.25329522v1) on 20 June 2025.

Funding Statement

A.J.D. is supported by National Institutes of Health (NIH)/National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) K23 DK140643. R.J.K. is supported by NIH/NIDDK T32 DK007699. R.N. is supported by the AfOx Kennedy Trust Prize Studentship (KENN 202118). Y.L. is supported by a Kennedy Trust KTRR Senior Research Fellowship (KENN202109). M.C.Y.N., A.K.M., J.M.M., and M.S.U. are supported by NIH/National Human Genome Research Institute (NHGRI) U01HG011723. J.M.M. and M.S.U. are supported by NIDDK U01DK140757, DK137993, and a Medical University of Bialystok (MUB) grant from the Ministry of Science and Higher Education (Poland). J.M.M. is supported by American Diabetes Association grant 11-22-ICTSPM-16, by the Accelerating Medicines Partnership Program in Common Metabolic Diseases award from RFP 6 from the Foundation for the National Institutes of Health, and by the Novo Nordisk Foundation (NNF21SA0072102). M.S.U. is supported by Doris Duke Foundation Award 2022063, NIDDK U54DK118612, and MGH Claflin Distinguished Scholar Award.

Supporting information

Supplementary Material
db250772_supp.zip (389.4KB, zip)

References

  • 1. Herold KC, Gitelman SE, Gottlieb PA, Knecht LA, Raymond R, Ramos EL. Teplizumab: a disease-modifying therapy for type 1 diabetes that preserves β-cell function. Diabetes Care 2023;46:1848–1856
  • 2. Harding JL, Wander PL, Zhang X, et al. The incidence of adult-onset type 1 diabetes: a systematic review from 32 countries and regions. Diabetes Care 2022;45:994–1006
  • 3. Thomas NJ, Lynam AL, Hill AV, et al. Type 1 diabetes defined by severe insulin deficiency occurs after 30 years of age and is commonly treated as type 2 diabetes. Diabetologia 2019;62:1167–1172
  • 4. Billings LK, Shi Z, Wei J, et al. Utility of polygenic scores for differentiating diabetes diagnosis among patients with atypical phenotypes of diabetes. J Clin Endocrinol Metab 2023;109:107–113
  • 5. Evans-Molina C, Oram RA. Type 1 diabetes presenting in adults: trends, diagnostic challenges and unique features. Diabetes Obes Metab 2025;27(Suppl. 6):57–68
  • 6. Sharp SA, Rich SS, Wood AR, et al. Development and standardization of an improved type 1 diabetes genetic risk score for use in newborn screening and incident diagnosis. Diabetes Care 2019;42:200–207
  • 7. Harrison JW, Tallapragada DSP, Baptist A, et al. Type 1 diabetes genetic risk score is discriminative of diabetes in non-Europeans: evidence from a study in India. Sci Rep 2020;10:9450
  • 8. Oram RA, Sharp SA, Pihoker C, et al. Utility of diabetes type-specific genetic risk scores for the classification of diabetes type among multiethnic youth. Diabetes Care 2022;45:1124–1131
  • 9. Perry DJ, Wasserfall CH, Oram RA, et al. Application of a genetic risk score to racially diverse type 1 diabetes populations demonstrates the need for diversity in risk-modeling. Sci Rep 2018;8:4529
  • 10. Kaddis JS, Perry DJ, Vu AN, et al. Improving the prediction of type 1 diabetes across ancestries. Diabetes Care 2022;45:e48–e50
  • 11. Luckett AM, Weedon MN, Hawkes G, Leslie RD, Oram RA, Grant SFA. Utility of genetic risk scores in type 1 diabetes. Diabetologia 2023;66:1589–1600
  • 12. Arni AM, Fraser DP, Sharp SA, et al. Type 1 diabetes genetic risk score variation across ancestries using whole genome sequencing and array-based approaches. Sci Rep 2024;14:31044
  • 13. Onengut-Gumuscu S, Chen W-M, Robertson CC, et al.; Type 1 Diabetes Genetics Consortium. Type 1 diabetes risk in African-ancestry participants and utility of an ancestry-specific genetic risk score. Diabetes Care 2019;42:406–415
  • 14. Qu H-Q, Qu J, Glessner J, et al. Improved genetic risk scoring algorithm for type 1 diabetes prediction. Pediatr Diabetes 2022;23:320–323
  • 15. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet 2019;51:584–591
  • 16. Mercader JM, Ng MCY, Manning AK, Rich SS. Predicting diabetes risk in diverse populations: what next? Lancet Diabetes Endocrinol 2021;9:808–810
  • 17. Wang Y, Kanai M, Tan T, et al.; BioBank Japan Project. Polygenic prediction across populations is influenced by ancestry, genetic architecture, and methodology. Cell Genom 2023;3:100408
  • 18. Kachuri L, Chatterjee N, Hirbo J, et al.; Polygenic Risk Methods in Diverse Populations (PRIMED) Consortium Methods Working Group. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet 2024;25:8–25
  • 19. Kullo IJ, Conomos MP, Nelson SC, et al.; Polygenic Risk Methods in Diverse Populations (PRIMED) Consortium. The PRIMED Consortium: reducing disparities in polygenic risk assessment. Am J Hum Genet 2024;111:2594–2606
  • 20. Redondo MJ, Gignoux CR, Dabelea D, et al. Type 1 diabetes in diverse ancestries and the use of genetic risk scores. Lancet Diabetes Endocrinol 2022;10:597–608
  • 21. Udler MS, McCarthy MI, Florez JC, Mahajan A. Genetic risk scores for diabetes diagnosis and precision medicine. Endocr Rev 2019;40:1500–1520
  • 22. Michalek DA, Tern C, Zhou W, et al. A multi-ancestry genome-wide association study in type 1 diabetes. Hum Mol Genet 2024;33:958–968
  • 23. Chiou J, Geusz RJ, Okino M-L, et al. Interpreting type 1 diabetes risk with genetics and single-cell epigenomics. Nature 2021;594:398–402
  • 24. Boutin NT, Schecter SB, Perez EF, et al. The evolution of a large biobank at Mass General Brigham. J Pers Med 2022;12:1323
  • 25. Wojcik GL, Fuchsberger C, Taliun D, et al. Imputation-aware tag SNP selection to improve power for large-scale, multi-ethnic association studies. G3 (Bethesda) 2018;8:3255–3267
  • 26. Taliun D, Harris DN, Kessler MD, et al.; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed program. Nature 2021;590:290–299
  • 27. Denny JC, Rutter JL, Goldstein DB, et al.; All of Us Research Program Investigators. The “All of Us” Research Program. N Engl J Med 2019;381:668–676
  • 28. All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature 2024;627:340–346
  • 29. Deutsch AJ, Stalbow L, Majarian TD, et al. Polygenic scores help reduce racial disparities in predictive accuracy of automated type 1 diabetes classification algorithms. Diabetes Care 2023;46:794–800
  • 30. Luckett AM, Oram RA, Deutsch AJ, et al. Standardized measurement of type 1 diabetes polygenic risk across multiancestry population cohorts. Diabetes Care 2025;48:e81–e83
  • 31. Szczerbinski L, Mandla R, Schroeder P, et al. Algorithms for the identification of prevalent diabetes in the All of Us Research Program validated using polygenic scores. Sci Rep 2024;14:26895
  • 32. Karczewski KJ, Gupta R, Kanai M, et al. Pan-UK Biobank GWAS improves discovery, analysis of genetic architecture, and resolution into ancestry-enriched effects. 15 March 2024. [preprint]. DOI:10.1101/2024.03.13.24303864
  • 33. Luo Y, Kanai M, Choi W, et al.; NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium. A high-resolution HLA reference panel capturing global population diversity enables multi-ancestry fine-mapping in HIV host response. Nat Genet 2021;53:1504–1516
  • 34. Sakaue S, Gurajala S, Curtis M, et al. Tutorial: a statistical genetics guide to identifying HLA alleles driving complex disease. Nat Protoc 2023;18:2625–2641
  • 35. Lee H, Kingsford C. Kourami: graph-guided assembly for novel human leukocyte antigen allele discovery. Genome Biol 2018;19:16
  • 36. Ge T, Chen C-Y, Ni Y, Feng Y-CA, Smoller JW. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun 2019;10:1776
  • 37. Ruan Y, Lin Y-F, Feng Y-CA, et al.; Stanley Global Asia Initiatives. Improving polygenic prediction in ancestrally diverse populations. Nat Genet 2022;54:573–580
  • 38. Hu X, Deutsch AJ, Lenz TL, et al. Additive and interaction effects at three amino acid positions in HLA-DQ and HLA-DR molecules drive type 1 diabetes risk. Nat Genet 2015;47:898–905
  • 39. Kreienkamp RJ, Deutsch AJ, Huerta-Chagoya A, et al. Type 1 diabetes polygenic scores improve diagnostic accuracy in pediatric diabetes care. Horm Res Paediatr 20 May 2025. [Epub ahead of print]. DOI:10.1159/000546445
  • 40. Patel SK, Ma CS, Fourlanos S, Greenfield JR. Autoantibody-negative type 1 diabetes: a neglected subtype. Trends Endocrinol Metab 2021;32:295–305
  • 41. Thomas NJ, Walkey HC, Kaur A, et al. The relationship between islet autoantibody status and the genetic risk of type 1 diabetes in adult-onset type 1 diabetes. Diabetologia 2023;66:310–320
  • 42. Deutsch AJ, Udler MS. Phenotypic and genetic diversity in diabetes across populations. J Clin Endocrinol Metab 2025;110:2123–2133
  • 43. Ferrat LA, Vehik K, Sharp SA, et al.; TEDDY Study Group. A combined risk score enhances prediction of type 1 diabetes among susceptible children. Nat Med 2020;26:1247–1255
  • 44. Ferrat LA, Templeman EL, Steck AK, et al.; Type 1 Diabetes TrialNet Study Group. Type 1 diabetes prediction in autoantibody-positive individuals: performance, time and money matter. Diabetologia 2025;68:1709–1720


Articles from Diabetes are provided here courtesy of American Diabetes Association
