A New Explained-Variance Based Genetic Risk Score for Predictive Modeling of Disease Risk

Ronglin Che; Alison A Motsinger-Reif

doi:10.1515/1544-6115.1796

Author manuscript; available in PMC: 2017 May 28.

Published in final edited form as: Stat Appl Genet Mol Biol. 2012 Sep 25;11(4):Article–15. doi: 10.1515/1544-6115.1796


PMCID: PMC5446920 NIHMSID: NIHMS857347 PMID: 23023697

Abstract

The goal of association mapping is to identify genetic variants that predict disease, and as the field of human genetics matures, the number of successful association studies is increasing. Many such studies have shown that, for many diseases, risk is explained by a reasonably large number of variants that each explain a very small amount of disease risk. This is prompting the use of genetic risk scores in building predictive models, where information across several variants is combined for predictive modeling. In the current study, we compare the performance of four previously proposed genetic risk score methods and present a new method for constructing a genetic risk score that incorporates explained variance information. The methods compared include a simple count Genetic Risk Score, an odds ratio weighted Genetic Risk Score, a direct logistic regression Genetic Risk Score, a polygenic Genetic Risk Score, and the new explained variance weighted Genetic Risk Score. We compare the methods using a wide range of simulations in two steps, varying the number of deleterious single nucleotide polymorphisms (SNPs) explaining disease risk, genetic modes, baseline penetrances, sample sizes, relative risks (RR) and minor allele frequencies (MAF). Several measures of model performance were compared, including overall power, C-statistic and Akaike’s Information Criterion. Our results show that the relative performance of the methods differs significantly, with the new explained variance weighted GRS (EV-GRS) generally performing favorably compared with the other methods.

Keywords: explained variance, polygenic, predictive modeling, simple count genetic risk score, weighted genetic risk score

1 INTRODUCTION

An important priority in genetic epidemiology is the identification of susceptibility variants for common diseases. These genetic variants could then be incorporated into a feasible model to predict disease risk, so that environmental or therapeutic interventions could be introduced earlier to prevent disease or improve personalized treatment. In recent years, Genome-Wide Association Studies (GWAS) and candidate polymorphism investigations have identified a large number of variants that are consistently associated with the risk of complex diseases (Manolio 2010). However, most of the currently identified genetic variants convey a relatively modest effect, and their predictive value is limited. Anticipating the discovery of a large number of novel genetic variants in the near future, we need to prepare an appropriate framework to translate the emerging genomic knowledge into clinical utility, including the construction of genetic risk scores, the measurement of predictive value, and the validation of prediction models (Janssens and van Duijn 2009).

To address these issues, many analytical methods and models have been developed to better predict disease risk using these low-effect risk variants. Recent studies have suggested possible risk models incorporating consistently associated genetic risk factors together with conventional (clinical, demographic, etc.) risk factors (Meigs, et al. 2008; Talmud, et al. 2010). These genetic variants are included on the basis of consistent GWAS signals or meta-analysis results of association studies (De Jager, et al. 2009; Talmud, et al. 2010; Taylor, et al. 2011). Such models have had mixed results in predicting the risk of several common diseases, such as Type II diabetes (Meigs, et al. 2008; Talmud, et al. 2010), multiple sclerosis (De Jager, et al. 2009), systemic lupus erythematosus (Taylor, et al. 2011), breast cancer (Zheng, et al. 2010), lung cancer (Young, et al. 2009) and cardiovascular diseases (Paynter, et al. 2010). Among these models, unweighted and weighted genetic risk score functions were used to construct genetic risk score profiles (De Jager, et al. 2009; Karlson, et al. 2010; Lin, et al. 2009; Meigs, et al. 2008; Paynter, et al. 2010; Seddon, et al. 2009; Talmud, et al. 2010; Taylor, et al. 2011; Young, et al. 2009; Zheng, et al. 2010). While these approaches have shown anecdotal success in real data analyses, the risk score functions have not been rigorously evaluated and compared. Assessing and comparing the statistical properties of these functions across a range of scenarios is crucial for the proper application and interpretation of these methods.

In the current study, we compare the performance of four previously proposed methods: a simple count Genetic Risk Score (SC-GRS) (Talmud, et al. 2010), an odds ratio weighted Genetic Risk Score (OR-GRS) (De Jager, et al. 2009; Karlson, et al. 2010; Talmud, et al. 2010), a direct logistic regression Genetic Risk Score (DL-GRS) (Carayol, et al. 2010) and a polygenic Genetic Risk Score (PG-GRS) (Carayol, et al. 2010), and present a new method using an explained variance weighted Genetic Risk Score (EV-GRS). In a two-step simulation study, we used a wide range of simulated genetic models, varying the number of deleterious single nucleotide polymorphisms (SNPs) in the etiology of disease risk, genetic modes, baseline penetrances, sample sizes, relative risks (RR) and minor allele frequencies (MAF) of the SNPs. We applied the risk score methods to the simulated data and compared their performance based on power, C-statistic and Akaike’s Information Criterion (AIC) metrics.

2 METHODS

2.1 EXISTING GENETIC RISK SCORE MODELS

To simplify the analysis, we assume that one SNP per susceptibility gene has been selected, that these SNPs are uncorrelated, and that they contribute to the disease in an additive way. Let D denote the disease status, where D = 1 if the subject has the disease (case) and D = 0 if healthy (control). Let G denote a vector of all genotype combinations and G_i denote the number of risk alleles carried by the subject at the i-th SNP (i = 1, ..., I). We assume all genotypes are available for all SNPs and individuals, so no data are missing. All parameters are estimated by fitting the logistic regression model (Carayol, et al. 2010; Cordell and Clayton 2002).

2.1.1 Simple count GRS (SC-GRS)

$$\operatorname{logit} P(D = 1 \mid G) = \alpha + \beta\,(\mathrm{SC\_GRS}) = \alpha + \beta \sum_{i=1}^{I} G_{i} \tag{1}$$

$$\mathrm{SC\_GRS} = \sum_{i=1}^{I} G_{i} \tag{2}$$

This simple count model involves only two parameters. The risk score profile is the sum of all risk alleles over all SNPs. No prior information about the effect size of each associated SNP is required. The model is relatively simple and thus widely applicable in current research, especially when the literature is insufficient to provide stable estimates of each SNP’s effect (Paynter, et al. 2010). However, the assumption that all SNPs contribute equally may not be plausible.
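As a minimal sketch of the simple count score, the snippet below sums risk-allele counts for one subject. The function name `sc_grs` and the example genotype vector are illustrative assumptions, not taken from the study.

```python
def sc_grs(genotypes):
    """Simple count GRS: sum of risk-allele counts G_i over the I SNPs.

    `genotypes` holds one value per SNP, each 0, 1 or 2 (number of risk alleles).
    """
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("each genotype must be 0, 1 or 2 risk alleles")
    return sum(genotypes)


# Hypothetical subject genotyped at I = 6 SNPs.
subject = [0, 1, 2, 1, 0, 2]
print(sc_grs(subject))  # 6
```

Every SNP enters with weight one, which is exactly the equal-contribution assumption questioned above.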

2.1.2 Odds ratio weighted GRS (OR-GRS)

$$\operatorname{logit} P(D = 1 \mid G) = \alpha + \beta\,(\mathrm{OR\_GRS}) = \alpha + \beta \sum_{i=1}^{I} w_{OR_i} G_{i} \tag{3}$$

$$w_{OR_i} = \log(OR_{i}) \tag{4}$$

$$\mathrm{OR\_GRS} = \sum_{i=1}^{I} w_{OR_i} G_{i} \tag{5}$$

$$\text{rescaled: } \mathrm{OR\_GRS} = \frac{I \sum_{i=1}^{I} w_{OR_i} G_{i}}{\sum_{i=1}^{I} w_{OR_i}} \tag{6}$$

This model also needs two parameters. Here, the unequal effect sizes of the SNPs are taken into account. The risk score is constructed as the weighted sum over all SNPs. The weight w_{OR_i} is a pre-determined fixed weight. In practice, it is usually the log per-allele OR for the SNP from a meta-analysis (Talmud, et al. 2010). It follows directly that SNPs with larger ORs contribute more to the risk score. This method requires external estimates, which may be unavailable if no previous studies exist or unreliable if the prior estimates are inaccurate. This requirement rules out this type of risk score for studies where previous estimates are not available. To make the weighted genetic risk score more directly comparable to the simple count genetic risk score, we used the rescaled version of the OR-GRS, obtained by multiplying by the rescaling factor $I / \sum_{i=1}^{I} w_{OR_i}$.
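The weighting and rescaling in equations (4) to (6) can be sketched as follows. The function name `or_grs` and the example odds ratios are hypothetical; in practice the weights would come from an external meta-analysis.

```python
import math


def or_grs(genotypes, odds_ratios, rescale=True):
    """Odds ratio weighted GRS with weights w_i = log(OR_i).

    With rescale=True the score is multiplied by I / sum(w_i), which puts it
    on the same scale as the simple count score.
    """
    weights = [math.log(o) for o in odds_ratios]
    score = sum(w * g for w, g in zip(weights, genotypes))
    if rescale:
        score *= len(weights) / sum(weights)
    return score


# Hypothetical per-allele odds ratios for three SNPs.
print(or_grs([1, 0, 2], [1.1, 1.5, 2.0]))

# When all odds ratios are equal, the rescaled score reduces to the count.
print(or_grs([0, 1, 2], [1.5, 1.5, 1.5]))  # approximately 3
```

The second call illustrates why the rescaled version is directly comparable to the SC-GRS: with identical weights the two scores coincide.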

2.1.3 Direct logistic regression GRS (DL-GRS)

$$\operatorname{logit} P(D = 1 \mid G) = \alpha + \mathrm{DL\_GRS} = \alpha + \sum_{i=1}^{I} \beta_{i} G_{i} \tag{7}$$

$$\mathrm{DL\_GRS} = \sum_{i=1}^{I} \beta_{i} G_{i} \tag{8}$$

This alternative weighted method directly fits a logistic regression model. The risk coefficient β_i is the log(OR) for SNP i estimated from the original dataset. The number of risk alleles is counted and multiplied by the risk coefficient to derive the risk score. No external information (i.e. an effect estimate from previous studies) is needed, but I+1 parameters are estimated (Carayol, et al. 2010). Because this score is developed from the data at hand, the question of external validation inevitably arises. It can be assumed that if this score is applied to independent data, its fit will be substantially worse than in the data in which it was built. The underlying goal of this risk score is essentially the same as that of the OR-GRS, except that it is applied when external estimates of effect size are not available.
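To illustrate how the DL-GRS weights arise from the data at hand, the sketch below fits the I+1 parameters of equation (7) by plain gradient ascent on the log-likelihood and then scores a subject. The fitting routine, the toy genotypes and the names `fit_logistic` and `dl_grs` are assumptions made for illustration; the source does not state which software was used, and any standard logistic regression routine would serve the same purpose.

```python
import math


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit logit P(D=1|G) = alpha + sum_i beta_i * G_i by gradient ascent."""
    n, p = len(X), len(X[0])
    alpha, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_a, grad_b = 0.0, [0.0] * p
        for g, d in zip(X, y):
            eta = alpha + sum(b * gi for b, gi in zip(beta, g))
            resid = d - 1.0 / (1.0 + math.exp(-eta))  # observed minus fitted
            grad_a += resid
            for i in range(p):
                grad_b[i] += resid * g[i]
        alpha += lr * grad_a / n
        beta = [b + lr * gb / n for b, gb in zip(beta, grad_b)]
    return alpha, beta


def dl_grs(genotypes, beta):
    """Direct logistic regression GRS: sum of beta_i * G_i."""
    return sum(b * g for b, g in zip(beta, genotypes))


# Hypothetical toy data: risk-allele counts at 2 SNPs and case status (1 = case).
X = [(0, 0), (0, 1), (1, 0), (0, 0), (1, 1), (2, 0),
     (1, 0), (2, 1), (0, 2), (1, 1), (2, 2), (0, 0)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1]

alpha, beta = fit_logistic(X, y)
print(beta)                  # per-SNP log(OR) estimates from this dataset
print(dl_grs((2, 1), beta))  # score for a subject with genotypes (2, 1)
```

Because the weights are estimated from the same subjects that are then scored, the fit is optimistic, which is the external validation concern raised above.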

2.1.4 Polygenic GRS (PG-GRS)

$$\operatorname{logit} P(D = 1 \mid G) = \alpha + \mathrm{PG\_GRS} = \alpha + \sum_{i=1}^{I} \beta_{i1} x_{i1} + \sum_{i=1}^{I} \beta_{i2} x_{i2} \tag{9}$$

$$\mathrm{PG\_GRS} = \sum_{i=1}^{I} \beta_{i1} x_{i1} + \sum_{i=1}^{I} \beta_{i2} x_{i2} \tag{10}$$

For the PG model, two dummy variables are considered per SNP. Let x_{i1} be an indicator of heterozygous status and x_{i2} be an indicator of homozygous status for the risk allele at SNP i. Suppose a is the risk allele. Then genotype AA is coded as 00, Aa as 10 and aa as 01. If we set AA as the baseline genotype, β_{i1} is the risk coefficient for Aa and β_{i2} is the risk coefficient for aa. In this respect, the PG model is more flexible when the underlying genetic mode is unknown. The other methods discussed above assume an additive genetic model when adding up the number of risk alleles. This assumption of additivity will decrease performance if there is dominance deviation in the actual underlying risk etiology. While freedom from genetic mode assumptions is appealing, the clear drawback is that the number of parameters, 2I+1, grows quickly because many SNPs are usually involved in practice (Carayol, et al. 2010). Additionally, as with the DL-GRS, this GRS relies exclusively on information derived from the original dataset, so the same concerns about external validation hold here.
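The genotype coding described for the PG model (AA as 00, Aa as 10, aa as 01, with a the risk allele) can be sketched as below. The function names and the coefficient values are hypothetical; in the PG-GRS the coefficients would be estimated by logistic regression on the original dataset.

```python
def pg_dummy(g):
    """Map a risk-allele count (0, 1, 2) to the dummy pair (x_i1, x_i2).

    x_i1 = 1 for the heterozygote Aa, x_i2 = 1 for the risk homozygote aa,
    and AA is the baseline (0, 0).
    """
    if g not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 risk alleles")
    return (1 if g == 1 else 0, 1 if g == 2 else 0)


def pg_grs(genotypes, beta_het, beta_hom):
    """Polygenic GRS: sum over SNPs of beta_i1 * x_i1 + beta_i2 * x_i2."""
    score = 0.0
    for g, b1, b2 in zip(genotypes, beta_het, beta_hom):
        x1, x2 = pg_dummy(g)
        score += b1 * x1 + b2 * x2
    return score


print([pg_dummy(g) for g in (0, 1, 2)])  # [(0, 0), (1, 0), (0, 1)]

# Hypothetical coefficients for 3 SNPs (heterozygote and homozygote effects).
print(pg_grs([0, 1, 2], [0.1, 0.2, 0.3], [0.4, 0.5, 0.6]))
```

Because each SNP carries two free coefficients, the homozygote effect is not forced to be twice the heterozygote effect, which is what allows dominance deviation at the cost of 2I+1 parameters.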

2.2 NEW GENETIC RISK SCORE MODEL

2.2.1 Explained variance weighted GRS (EV-GRS)

$$\operatorname{logit} P(D = 1 \mid G) = \alpha + \beta\,(\mathrm{EV\_GRS}) = \alpha + \beta \sum_{i=1}^{I} w_{EV_i} G_{i} \tag{11}$$

$$w_{EV_i} = \log(OR_{i}) \sqrt{2\, MAF_{i} (1 - MAF_{i})} \tag{12}$$

$$\mathrm{EV\_GRS} = \sum_{i=1}^{I} w_{EV_i} G_{i} \tag{13}$$

$$\text{rescaled: } \mathrm{EV\_GRS} = \frac{I \sum_{i=1}^{I} w_{EV_i} G_{i}}{\sum_{i=1}^{I} w_{EV_i}} \tag{14}$$

Motivated by the effect size definition of Park and colleagues (Park, et al. 2010), we propose a new weighted method incorporating both the OR and the minor allele frequency (MAF) for SNP i, where the MAF estimate can be obtained from http://www.ncbi.nlm.nih.gov/projects/SNP/ or from published data, and the OR term is the log per-allele odds ratio from external meta-analysis results. For an individual SNP, we believe both OR and MAF are reasonable factors for defining the explained variance and, in turn, for constructing the prior contribution to disease risk. For a fixed OR, the disease risk is expected to increase as the MAF increases. This motivation is linked to the idea behind Bayesian methods: we have already obtained prior knowledge of these variants and can make use of it to improve prediction. As with the OR-GRS, the rescaled version of the EV-GRS was used to make the genetic risk score more comparable to the SC- and OR-GRS.
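A minimal sketch of the explained variance weights in equation (12) and the rescaled score in equation (14) follows. The function names and the example OR and MAF values are illustrative assumptions; real inputs would come from external meta-analyses and allele frequency databases.

```python
import math


def ev_weights(odds_ratios, mafs):
    """Explained variance weights: w_i = log(OR_i) * sqrt(2 * MAF_i * (1 - MAF_i))."""
    return [math.log(o) * math.sqrt(2.0 * m * (1.0 - m))
            for o, m in zip(odds_ratios, mafs)]


def ev_grs(genotypes, odds_ratios, mafs, rescale=True):
    """EV-GRS: weighted sum of risk-allele counts, optionally rescaled by I / sum(w_i)."""
    weights = ev_weights(odds_ratios, mafs)
    score = sum(w * g for w, g in zip(weights, genotypes))
    if rescale:
        score *= len(weights) / sum(weights)
    return score


# Hypothetical SNPs with the same OR but different MAFs: the common SNP
# receives the larger weight.
print(ev_weights([1.5, 1.5, 1.5], [0.01, 0.05, 0.25]))

# Hypothetical subject scored with varying ORs and MAFs.
print(ev_grs([1, 0, 2], [1.1, 1.5, 2.0], [0.01, 0.05, 0.25]))
```

For a fixed OR the weight grows with the MAF over the range 0 to 0.5, which is how the score encodes the expectation that common variants with the same OR contribute more.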

2.3 TWO-STEP SIMULATION DESIGN

We evaluated and tested the GRS methods in a two-step simulation study. In Step one, methods were compared for general performance in a range of simulations with similar minor allele frequencies and relative risks (in which case our EV-GRS method is equivalent to previous methods), and the relative performance of each approach was demonstrated. In Step two, the methods that performed well in Step one and that have comparable numbers of parameters were compared with our new EV-GRS method in a range of simulations where minor allele frequencies vary.
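The equivalence noted for Step one can be checked directly from the weight definitions. The short derivation below is a sketch under the stated assumption that all I SNPs share a single minor allele frequency m; the symbols c, m and w are introduced here for illustration only.

```latex
% Assumption: MAF_i = m for every SNP i, so the EV weight is a constant multiple of the OR weight.
w_{EV_i} = \log(OR_i)\sqrt{2m(1-m)} = c\, w_{OR_i}, \qquad c = \sqrt{2m(1-m)}

% The constant c cancels in the rescaled score:
\text{rescaled } \mathrm{EV\_GRS}
  = \frac{I \sum_{i=1}^{I} c\, w_{OR_i} G_i}{\sum_{i=1}^{I} c\, w_{OR_i}}
  = \frac{I \sum_{i=1}^{I} w_{OR_i} G_i}{\sum_{i=1}^{I} w_{OR_i}}
  = \text{rescaled } \mathrm{OR\_GRS}

% If, in addition, all odds ratios are equal (w_{OR_i} = w \neq 0):
\text{rescaled } \mathrm{OR\_GRS}
  = \frac{I\, w \sum_{i=1}^{I} G_i}{I\, w}
  = \sum_{i=1}^{I} G_i
  = \mathrm{SC\_GRS}
```

Differences between the EV-GRS and the OR-GRS can therefore only appear when minor allele frequencies vary across SNPs, which is the setting examined in Step two.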

2.3.1 Step one simulation

Our primary goal in the Step one simulation was to detect general differences in performance among the four current genetic risk score models in a range of genetic models.

Factors of interest in the simulations included the number of deleterious single nucleotide polymorphisms (SNPs) that convey disease risk, the minor allele frequencies (MAF) of those SNPs, the relative risks (RR) of the associations, the underlying genetic modes, and the sample sizes of the datasets. We consider true disease risk models involving 2 and 6 deleterious SNPs, assuming Hardy-Weinberg Equilibrium (HWE). To simplify, we assume these SNPs contribute to the disease in an additive way with no interaction, and assume no linkage disequilibrium between them. We understand that these simplifications limit dissection of how these models perform in some cases, but they keep the simulations manageable within the scope of the current study. Minor allele frequencies (MAF) for the SNPs were set to either 0.25 or 0.5 to represent common variants. Relative risks (RR) considered for our model were 1.5, 2 and 3 for the 2-SNP combinations and 1.25, 1.5 and 1.75 for the 6-SNP combinations (Figure 1). The range differs because high relative risks combined with a large number of SNPs could push disease prevalence out of bounds. These scenarios represent the realistic situation in which a small number of causal variants with larger effects may lead to susceptibility to common diseases, while a large number of variants influencing disease usually convey minor effects. The baseline penetrance was fixed at 0.1 to ensure a realistic population prevalence rate for common, complex diseases.

Figure 1. Relative risk and minor allele frequency specifications in the simulation design.

Simulated models are represented as penetrance functions. Penetrance functions define the probability of disease given a particular genotype combination at the disease risk loci. Penetrance functions under three genetic modes (recessive, additive and dominant) were explicitly determined, and the summary measures of effect size were calculated as described previously (Culverhouse, et al. 2002). Table 1 illustrates the two-locus penetrance patterns used in the current study as an example, where k is the baseline penetrance and θ is the specified relative risk of disease between different genotypes for each SNP. Using a similar strategy, 6-SNP combination models were also generated (details not shown). Balanced (equal allocation) case-control data were simulated with total sample sizes of 250 and 500.
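As a rough illustration of how a penetrance function translates into population prevalence, the sketch below assumes an additive penetrance of the form k[1 + Σ (θ_i − 1) G_i / 2], genotype frequencies under HWE, and the minor allele as the risk allele. This functional form is an assumption chosen to be consistent with the additive entries of Table 1 (for example θ_a k for two risk alleles at one locus and (θ_a + θ_b − 1)k for the double homozygote); the function names are hypothetical.

```python
from itertools import product


def hwe_freqs(maf):
    """Genotype frequencies for 0, 1 and 2 risk alleles under HWE."""
    q = maf
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]


def additive_penetrance(genotype, rrs, k):
    """Assumed additive penetrance: k * (1 + sum_i (theta_i - 1) * G_i / 2)."""
    return k * (1 + sum((rr - 1) * g / 2 for g, rr in zip(genotype, rrs)))


def prevalence(rrs, mafs, k=0.1):
    """Population prevalence: genotype probability times penetrance, summed."""
    total = 0.0
    for genotype in product((0, 1, 2), repeat=len(rrs)):
        prob = 1.0
        for g, maf in zip(genotype, mafs):
            prob *= hwe_freqs(maf)[g]  # independent loci (no LD)
        total += prob * additive_penetrance(genotype, rrs, k)
    return total


# Two SNPs, RR = 1.5 and MAF = 0.25 at both loci, baseline penetrance 0.1.
print(prevalence([1.5, 1.5], [0.25, 0.25]))  # approximately 0.125
```

Under these assumptions the computed value agrees with the prevalence of 0.125 listed for the corresponding 2-SNP model in the Step one specification table.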

Table 1. Penetrance patterns under three genetic modes for 2-locus main effect model.

Mode	Genotype
Recessive	θ_b k; θ_a k; (θ_a + θ_b − 1)k
Additive	(θ_b + 1)k/2; θ_b k; (θ_a + 1)k/2; (θ_a + θ_b)k/2; (θ_a + θ_b − 1)k/2; θ_a k; (2θ_a + θ_b − 1)k/2; (θ_a + θ_b − 1)k
Dominant	θ_b k; θ_a k; (θ_a + θ_b − 1)k; (θ_a + θ_b − 1)k; θ_a k; (θ_a + θ_b − 1)k; (θ_a + θ_b − 1)k

Scenario	Weight^d	Power			C			AIC
Scenario	Weight^d	SC-OR	SC-EV	OR-EV	SC-OR	SC-EV	OR-EV	SC-OR	SC-EV	OR-EV
1^a	Correct	OR	EV		OR	EV		OR	EV
100–600	Random	OR	EV		OR	EV		OR	EV
	Overestimate	OR	EV		OR	EV		OR	EV
	Underestimate
2^b	Correct	SC		EV	SC		EV	SC		EV
100–300	Random	SC		EV	SC		EV	SC		EV
	Overestimate	SC		EV	SC		EV	SC		EV
	Underestimate	SC	SC	EV	SC	SC	EV	SC	SC
2^b	Correct								SC	OR
400–600	Random								SC	OR
	Overestimate	SC		EV	SC		EV	SC		EV
	Underestimate	SC		EV	SC	SC	EV	SC	SC	EV
3^c	Correct	OR	EV	EV		EV	EV	OR	EV	EV
100–300	Random	OR	EV	EV		EV	EV	OR	EV	EV
	Overestimate	SC	EV	EV	SC		EV	SC		EV
	Underestimate	SC	SC	EV	SC	SC	EV	SC	SC	EV
3^c	Correct	OR	EV		OR	EV		OR	EV
400–600	Random	OR	EV		OR	EV		OR	EV
	Overestimate	OR	EV		OR	EV			EV	EV
	Underestimate	SC		EV	SC	SC	EV	SC		EV

Scenario		Model	RR			MAF			Prevalence
Scenario		Model	1/1–2^a	2/3–4^b	5–6	1/1–2^a	2/3–4^b	5–6	Prevalence
1 (2SNPs)	1-1	1	1.5	1.5		0.25	0.25		0.125
		2	2	2		0.25	0.25		0.15
		3	3	3		0.25	0.25		0.2
	1-2	4	1.5	1.5		0.5	0.5		0.15
		5	2	2		0.5	0.5		0.2
		6	3	3		0.5	0.5		0.3
	1-3	7	1.5	1.5		0.25	0.5		0.1375
		8	2	2		0.25	0.5		0.175
		9	3	3		0.25	0.5		0.25
2 (6 SNPs)	2-1	10	1.25	1.25	1.25	0.25	0.25	0.25	0.1375
		11	1.5	1.5	1.5	0.25	0.25	0.25	0.175
		12	1.75	1.75	1.75	0.25	0.25	0.25	0.2125
	2-2	13	1.25	1.25	1.25	0.5	0.5	0.5	0.175
		14	1.5	1.5	1.5	0.5	0.5	0.5	0.25
		15	1.75	1.75	1.75	0.5	0.5	0.5	0.325
	2-3	16	1.25	1.5	1.75	0.25	0.25	0.25	0.175
		17	1.25	1.5	1.75	0.5	0.5	0.5	0.25

Scenario	Model	RR			MAF			Prevalence^d
Scenario	Model	1–2	3–4	5–6	1–2	3–4	5–6	Prevalence^d
1^a	1	1.1	1.5	2	0.01	0.01	0.01	0.1032
	2	1.1	1.5	2	0.05	0.05	0.05	0.116
	3	1.1	1.5	2	0.25	0.25	0.25	0.18
2^b	4	1.1	1.1	1.1	0.01	0.05	0.25	0.1062
	5	1.5	1.5	1.5	0.01	0.05	0.25	0.131
	6	2	2	2	0.01	0.05	0.25	0.162
3^c	7	1.1	1.5	2	0.01	0.05	0.25	0.1552
	8	1.1	1.5	2	0.01	0.25	0.05	0.1352
	9	1.1	1.5	2	0.05	0.01	0.25	0.152
	10	1.1	1.5	2	0.05	0.25	0.01	0.128
	11	1.1	1.5	2	0.25	0.01	0.05	0.116
	12	1.1	1.5	2	0.25	0.05	0.01	0.112

Effect	Scenario 1 (2 SNPs)			Scenario 2 (6 SNPs)
Effect	Power	C	AIC	Power	C	AIC
RR	<.0001	<.0001	<.0001	<.0001	<.0001	<.0001
MAF	0.4030	0.6824	0.9948	0.7168	0.8848	0.6516
Genetic mode	0.1639	0.2344	0.3627	0.6138	0.5871	0.5286
Sample size	<.0001	0.4931	<.0001	<.0001	0.0008	<.0001
Method	<.0001	<.0001	0.0025	<.0001	<.0001	<.0001

Method Comparisons	Scenario 1 (2 SNPs)			Scenario 2 (6 SNPs)
Method Comparisons	Power	C	AIC	Power	C	AIC
SC—OR	0.9841	0.0775	0.9781	0.6746	0.4668	0.6354
SC—DL	<.0001	<.0001	0.4213	<.0001	<.0001	<.0001
SC—PG	<.0001	<.0001	0.1007	<.0001	<.0001	<.0001
OR—DL	<.0001	<.0001	0.2187	<.0001	<.0001	<.0001
OR—PG	<.0001	<.0001	0.2296	<.0001	<.0001	<.0001
DL—PG	0.9471	<.0001	0.001	0.9244	<.0001	<.0001

Scenario	Weight ^a	Power			C			AIC
Scenario	Weight ^a	SC	OR	EV	SC	OR	EV	SC	OR	EV
1	Correct	37.111	46.444	46.528	0.546	0.553	0.553	485.28	483.95	483.95
100–600	Random		46.083	46.056		0.552	0.552		484.01	484.01
	Overestimate		44.333	44.778		0.552	0.553		484.15	484.13
	Underestimate		34.667	35.111		0.544	0.544		485.06	485.03
2	Correct	32.778	28.278	31.889	0.562	0.558	0.562	277.71	278.08	277.78
100–300	Random		28.500	31.444		0.559	0.562		278.05	277.82
	Overestimate		19.056	30.667		0.553	0.560		279	277.87
	Underestimate		9.611	21.056		0.533	0.547		279.83	278.9
2	Correct	54.500	54.111	53.889	0.557	0.556	0.556	689.70	689.78	689.99
400–600	Random		54.333	53.722		0.556	0.556		689.84	690.07
	Overestimate		49.389	54.222		0.555	0.557		690.69	689.83
	Underestimate		41.056	49.333		0.548	0.552		691.64	690.76
3	Correct	31.778	34.639	37.611	0.562	0.562	0.566	277.92	277.64	277.38
100–300	Random		34.278	37.306		0.562	0.565		277.67	277.43
	Overestimate		22.472	34.917		0.555	0.564		278.8	277.65
	Underestimate		10.75	21.444		0.533	0.548		279.79	278.87
3	Correct	58.278	68.417	66.667	0.557	0.563	0.562	690.33	689	689.21
400–600	Random		68.222	66.222		0.562	0.561		689.1	689.31
	Overestimate		63.111	65.694		0.561	0.562		690.02	689.31
	Underestimate		44.528	55.417		0.550	0.554		691.95	690.39

		Power			C			AIC
Scenario	Weight^a	SC-OR	SC-EV	OR-EV	SC-OR	SC-EV	OR-EV	SC-OR	SC-EV	OR-EV
1	Correct	<.0001	<.0001	0.9943	<.0001	<.0001	0.9992	<.0001	<.0001	0.9999
100–600	Random	<.0001	<.0001	0.9993	<.0001	<.0001	0.9998	<.0001	<.0001	0.9997
	Over-est	<.0001	<.0001	0.8645	<.0001	<.0001	0.9891	<.0001	<.0001	0.9876
	Under-est	0.1412	0.2657	0.9352	0.1554	0.2372	0.9708	0.4293	0.3570	0.9905
2	Correct	<.0001	0.6140	0.0014	0.0025	0.8339	0.0114	0.0004	0.6873	0.0038
100–300	Random	<.0001	0.2776	0.0043	0.0022	0.7987	0.0119	0.0003	0.3532	0.0124
	Over-est	<.0001	0.6269	<.0001	<.0001	0.1932	<.0001	<.0001	0.7729	0.0002
	Under-est	<.0001	0.0155	0.0185	<.0001	<.0001	<.0001	<.0001	0.0100	0.0510
2	Correct	0.6310	0.3287	0.8590	0.9351	0.9691	0.9933	0.4783	0.0006	0.0137
400–600	Random	0.9462	0.3135	0.4835	0.533	0.3316	0.9306	0.1807	<.0001	0.0090
	Over-est	0.0008	0.9738	0.0015	0.0014	0.8099	0.0002	0.0001	0.8115	0.0008
	Under-est	<.0001	0.1673	0.0145	<.0001	0.0002	<.0001	<.0001	0.0046	0.0217
3	Correct	0.0103	<.0001	0.0074	0.8601	<.0001	<.0001	0.0028	<.0001	0.0060
100–300	Random	0.0224	<.0001	0.0044	0.7202	<.0001	0.0003	0.0064	<.0001	0.0110
	Over-est	<.0001	0.1595	<.0001	<.0001	0.1385	<.0001	<.0001	0.2138	<.0001
	Under-est	<.0001	0.0001	<.0001	<.0001	<.0001	<.0001	<.0001	0.0001	0.0002
3	Correct	<.0001	<.0001	0.3865	<.0001	<.0001	0.2557	<.0001	<.0001	0.1947
400–600	Random	<.0001	<.0001	0.2933	<.0001	<.0001	0.1219	<.0001	<.0001	0.1833
	Over-est	0.0015	<.0001	0.1328	<.0001	<.0001	0.3372	0.1408	<.0001	0.0001
	Under-est	<.0001	0.3666	<.0001	<.0001	0.0171	<.0001	<.0001	0.9713	<.0001

Model Specification
Model	Type	OR1-2	OR3-4	OR5-6	MAF1-6	Penetrance	n
1	1	1.1	1.5	2	0.01	0.1	100
2	2	1.1	1.5	2	0.05	0.1	100
3	3	1.1	1.5	2	0.25	0.1	100
4	1	1.1	1.5	2	0.01	0.1	200
5	2	1.1	1.5	2	0.05	0.1	200
6	3	1.1	1.5	2	0.25	0.1	200
7	1	1.1	1.5	2	0.01	0.1	300
8	2	1.1	1.5	2	0.05	0.1	300
9	3	1.1	1.5	2	0.25	0.1	300
10	1	1.1	1.5	2	0.01	0.1	400
11	2	1.1	1.5	2	0.05	0.1	400
12	3	1.1	1.5	2	0.25	0.1	400
13	1	1.1	1.5	2	0.01	0.1	500
14	2	1.1	1.5	2	0.05	0.1	500
15	3	1.1	1.5	2	0.25	0.1	500
16	1	1.1	1.5	2	0.01	0.1	600
17	2	1.1	1.5	2	0.05	0.1	600
18	3	1.1	1.5	2	0.25	0.1	600
19	1	1.1	1.5	2	0.01	0.01	100
20	2	1.1	1.5	2	0.05	0.01	100
21	3	1.1	1.5	2	0.25	0.01	100
22	1	1.1	1.5	2	0.01	0.01	200
23	2	1.1	1.5	2	0.05	0.01	200
24	3	1.1	1.5	2	0.25	0.01	200
25	1	1.1	1.5	2	0.01	0.01	300
26	2	1.1	1.5	2	0.05	0.01	300
27	3	1.1	1.5	2	0.25	0.01	300
28	1	1.1	1.5	2	0.01	0.01	400
29	2	1.1	1.5	2	0.05	0.01	400
30	3	1.1	1.5	2	0.25	0.01	400
31	1	1.1	1.5	2	0.01	0.01	500
32	2	1.1	1.5	2	0.05	0.01	500
33	3	1.1	1.5	2	0.25	0.01	500
34	1	1.1	1.5	2	0.01	0.01	600
35	2	1.1	1.5	2	0.05	0.01	600
36	3	1.1	1.5	2	0.25	0.01	600

Model Specification
Model	Type	OR1-6	MAF1-2	MAF3-4	MAF5-6	Penetrance	n
1	1	1.1	0.01	0.05	0.25	0.1	100
2	2	1.5	0.01	0.05	0.25	0.1	100
3	3	2	0.01	0.05	0.25	0.1	100
4	1	1.1	0.01	0.05	0.25	0.1	200
5	2	1.5	0.01	0.05	0.25	0.1	200
6	3	2	0.01	0.05	0.25	0.1	200
7	1	1.1	0.01	0.05	0.25	0.1	300
8	2	1.5	0.01	0.05	0.25	0.1	300
9	3	2	0.01	0.05	0.25	0.1	300
10	1	1.1	0.01	0.05	0.25	0.1	400
11	2	1.5	0.01	0.05	0.25	0.1	400
12	3	2	0.01	0.05	0.25	0.1	400
13	1	1.1	0.01	0.05	0.25	0.1	500
14	2	1.5	0.01	0.05	0.25	0.1	500
15	3	2	0.01	0.05	0.25	0.1	500
16	1	1.1	0.01	0.05	0.25	0.1	600
17	2	1.5	0.01	0.05	0.25	0.1	600
18	3	2	0.01	0.05	0.25	0.1	600
19	1	1.1	0.01	0.05	0.25	0.01	100
20	2	1.5	0.01	0.05	0.25	0.01	100
21	3	2	0.01	0.05	0.25	0.01	100
22	1	1.1	0.01	0.05	0.25	0.01	200
23	2	1.5	0.01	0.05	0.25	0.01	200
24	3	2	0.01	0.05	0.25	0.01	200
25	1	1.1	0.01	0.05	0.25	0.01	300
26	2	1.5	0.01	0.05	0.25	0.01	300
27	3	2	0.01	0.05	0.25	0.01	300
28	1	1.1	0.01	0.05	0.25	0.01	400
29	2	1.5	0.01	0.05	0.25	0.01	400
30	3	2	0.01	0.05	0.25	0.01	400
31	1	1.1	0.01	0.05	0.25	0.01	500
32	2	1.5	0.01	0.05	0.25	0.01	500
33	3	2	0.01	0.05	0.25	0.01	500
34	1	1.1	0.01	0.05	0.25	0.01	600
35	2	1.5	0.01	0.05	0.25	0.01	600
36	3	2	0.01	0.05	0.25	0.01	600

C
Model	SC	Correct		Random		Overestimate		Underestimate
Model	SC	OR	EV	OR	EV	OR	EV	OR	EV
1	0.5289	0.5261	0.5256	0.5252	0.5251	0.5271	0.5272	0.5249	0.5249
2	0.5571	0.5618	0.5605	0.5618	0.5622	0.5600	0.5624	0.5462	0.5466
3	0.5813	0.5916	0.5915	0.5925	0.5923	0.5908	0.5925	0.5636	0.5631
4	0.5235	0.5167	0.5165	0.5174	0.5173	0.5225	0.5224	0.5161	0.5159
5	0.5516	0.5582	0.5579	0.5579	0.5576	0.5579	0.5566	0.5425	0.5445
6	0.5749	0.5947	0.5946	0.5942	0.5941	0.5933	0.5933	0.5818	0.5822
7	0.5175	0.5164	0.5164	0.5164	0.5163	0.5160	0.5160	0.5143	0.5143
8	0.5477	0.5566	0.5566	0.5559	0.5560	0.5545	0.5545	0.5400	0.5410
9	0.5750	0.5922	0.5921	0.5917	0.5917	0.5914	0.5915	0.5865	0.5866
10	0.5194	0.5203	0.5202	0.5202	0.5203	0.5194	0.5195	0.5151	0.5153
11	0.5516	0.5593	0.5594	0.5588	0.5589	0.5593	0.5595	0.5528	0.5532
12	0.5760	0.5940	0.5941	0.5932	0.5932	0.5929	0.5929	0.5905	0.5906
13	0.5169	0.5160	0.5161	0.5163	0.5162	0.5166	0.5164	0.5100	0.5100
14	0.5499	0.5582	0.5583	0.5583	0.5583	0.5577	0.5578	0.5491	0.5497
15	0.5764	0.5945	0.5946	0.5942	0.5942	0.5938	0.5939	0.5919	0.5919
16	0.5177	0.5161	0.5161	0.5162	0.5162	0.5179	0.5177	0.5046	0.5046
17	0.5498	0.5574	0.5575	0.5574	0.5574	0.5570	0.5571	0.5531	0.5530
18	0.5762	0.5937	0.5937	0.5931	0.5931	0.5930	0.5931	0.5912	0.5913
19	0.5308	0.5219	0.5224	0.5216	0.5214	0.5274	0.5274	0.5252	0.5262
20	0.5552	0.5563	0.5570	0.5558	0.5567	0.5515	0.5520	0.5416	0.5437
21	0.5687	0.5775	0.5775	0.5776	0.5777	0.5753	0.5759	0.5468	0.5472
22	0.5222	0.5200	0.5197	0.5213	0.5209	0.5209	0.5210	0.5161	0.5173
23	0.5484	0.5494	0.5504	0.5500	0.5494	0.5517	0.5515	0.5297	0.5300
24	0.5646	0.5801	0.5803	0.5785	0.5785	0.5781	0.5783	0.5658	0.5662
25	0.5186	0.5182	0.5182	0.5178	0.5178	0.5173	0.5173	0.5162	0.5161
26	0.5414	0.5466	0.5460	0.5455	0.5454	0.5459	0.5460	0.5274	0.5274
27	0.5629	0.5761	0.5761	0.5756	0.5756	0.5748	0.5750	0.5675	0.5676
28	0.5187	0.5160	0.5158	0.5158	0.5156	0.5180	0.5177	0.5151	0.5159
29	0.5440	0.5488	0.5488	0.5488	0.5488	0.5485	0.5489	0.5377	0.5377
30	0.5628	0.5796	0.5796	0.5792	0.5792	0.5784	0.5784	0.5747	0.5749
31	0.5175	0.5167	0.5167	0.5167	0.5170	0.5172	0.5168	0.5098	0.5100
32	0.5414	0.5473	0.5473	0.5478	0.5476	0.5468	0.5468	0.5405	0.5407
33	0.5594	0.5720	0.5720	0.5715	0.5715	0.5708	0.5709	0.5682	0.5682
34	0.5157	0.5148	0.5146	0.5145	0.5146	0.5159	0.5159	0.5074	0.5074
35	0.5452	0.5513	0.5512	0.5509	0.5508	0.5511	0.5511	0.5433	0.5433
36	0.5616	0.5766	0.5767	0.5762	0.5762	0.5757	0.5757	0.5735	0.5736

AIC
Model	SC	Correct		Random		Overestimate		Underestimate
Model	SC	OR	EV	OR	EV	OR	EV	OR	EV
1	141.497	141.268	141.257	141.244	141.230	141.435	141.421	141.551	141.537
2	140.826	140.482	140.473	140.469	140.457	140.684	140.658	141.362	141.361
3	139.701	138.791	138.791	138.834	138.832	138.974	138.964	140.702	140.683
4	279.719	279.596	279.589	279.614	279.606	279.590	279.558	280.114	280.112
5	278.581	277.748	277.743	277.777	277.776	277.937	277.908	279.324	279.258
6	276.917	274.808	274.809	274.855	274.852	275.079	275.066	276.001	275.977
7	418.776	418.157	418.156	418.175	418.176	418.565	418.547	419.050	419.042
8	416.336	415.243	415.247	415.297	415.305	415.383	415.364	417.225	417.158
9	413.593	410.832	410.832	410.984	410.979	411.145	411.127	411.821	411.803
10	556.599	556.294	556.275	556.273	556.256	556.352	556.331	557.248	557.226
11	553.149	551.335	551.339	551.457	551.465	551.560	551.518	552.539	552.467
12	550.258	546.800	546.800	546.991	546.992	547.061	547.040	547.515	547.511
13	695.081	694.451	694.452	694.398	694.397	694.880	694.841	696.060	696.048
14	691.615	689.568	689.577	689.667	689.680	689.834	689.779	690.906	690.850
15	686.813	682.601	682.602	682.787	682.784	682.920	682.892	683.546	683.545
16	833.293	833.081	833.073	833.044	833.030	833.080	833.062	834.681	834.693
17	829.084	826.743	826.740	826.819	826.818	827.036	826.979	827.702	827.662
18	823.178	818.273	818.275	818.567	818.570	818.502	818.483	819.156	819.158
19	141.383	141.054	141.053	141.088	141.086	141.292	141.284	141.458	141.464
20	140.935	140.698	140.687	140.736	140.726	140.935	140.910	141.388	141.372
21	140.600	140.189	140.190	140.213	140.213	140.302	140.294	141.547	141.539
22	280.015	279.603	279.594	279.566	279.558	279.803	279.787	280.388	280.390
23	278.947	278.485	278.493	278.523	278.532	278.599	278.577	280.200	280.168
24	277.737	276.265	276.264	276.369	276.366	276.535	276.518	277.563	277.534
25	418.531	418.363	418.357	418.367	418.362	418.374	418.368	418.679	418.669
26	417.199	416.433	416.434	416.473	416.471	416.598	416.563	418.085	418.014
27	415.228	413.490	413.489	413.581	413.579	413.735	413.719	414.651	414.627
28	556.820	556.702	556.687	556.704	556.687	556.689	556.675	557.096	557.080
29	554.618	553.549	553.549	553.603	553.600	553.764	553.716	555.042	554.957
30	552.460	549.668	549.667	549.807	549.807	549.954	549.933	550.555	550.535
31	695.101	694.750	694.724	694.753	694.724	695.055	695.018	696.118	696.125
32	692.957	691.696	691.700	691.801	691.806	691.894	691.863	693.101	693.033
33	690.566	688.388	688.389	688.559	688.557	688.616	688.595	689.283	689.280
34	833.853	833.282	833.286	833.300	833.301	833.365	833.362	834.887	834.874
35	830.628	829.145	829.150	829.221	829.217	829.356	829.309	830.246	830.190
36	827.601	824.248	824.248	824.449	824.443	824.548	824.521	825.220	825.204

C
Model	SC	Correct		Random		Overestimate		Underestimate
Model	SC	OR	EV	OR	EV	OR	EV	OR	EV
1	0.5416	0.5286	0.5354	0.5293	0.5368	0.5367	0.5403	0.5358	0.5432
2	0.5626	0.5616	0.5641	0.5632	0.5647	0.5475	0.5587	0.5230	0.5281
3	0.6005	0.5975	0.6024	0.5979	0.6015	0.5873	0.6012	0.5461	0.5744
4	0.5325	0.5201	0.5307	0.5185	0.5299	0.5186	0.5294	0.5179	0.5237
5	0.5609	0.5574	0.5612	0.5584	0.5615	0.5496	0.5601	0.5224	0.5445
6	0.5989	0.6008	0.5997	0.5980	0.5969	0.5986	0.5988	0.5758	0.5887
7	0.5297	0.5281	0.5300	0.5285	0.5305	0.5207	0.5282	0.5169	0.5216
8	0.5579	0.5557	0.5584	0.5560	0.5581	0.5523	0.5566	0.5255	0.5465
9	0.5972	0.5982	0.5981	0.5971	0.5973	0.5937	0.5969	0.5762	0.5890
10	0.5228	0.5210	0.5236	0.5219	0.5238	0.5158	0.5216	0.5092	0.5155
11	0.5569	0.5577	0.5577	0.5574	0.5576	0.5554	0.5582	0.5366	0.5489
12	0.5941	0.5949	0.5937	0.5936	0.5923	0.5933	0.5944	0.5808	0.5878
13	0.5222	0.5166	0.5210	0.5161	0.5213	0.5164	0.5212	0.5139	0.5176
14	0.5604	0.5603	0.5607	0.5598	0.5599	0.5594	0.5611	0.5538	0.5576
15	0.6022	0.6030	0.6022	0.6021	0.6017	0.6010	0.6028	0.5973	0.5998
16	0.5216	0.5222	0.5226	0.5221	0.5212	0.5224	0.5227	0.5135	0.5175
17	0.5580	0.5596	0.5576	0.5596	0.5572	0.5588	0.5573	0.5543	0.5551
18	0.5980	0.5992	0.5979	0.5986	0.5974	0.5982	0.5986	0.5973	0.5968
19	0.5402	0.5190	0.5354	0.5220	0.5389	0.5283	0.5367	0.5219	0.5321
20	0.5652	0.5611	0.5622	0.5627	0.5622	0.5470	0.5576	0.5239	0.5346
21	0.5866	0.5798	0.5848	0.5805	0.5846	0.5677	0.5805	0.5271	0.5512
22	0.5348	0.5329	0.5355	0.5328	0.5340	0.5290	0.5352	0.5211	0.5297
23	0.5543	0.5535	0.5539	0.5535	0.5544	0.5453	0.5546	0.5103	0.5262
24	0.5897	0.5905	0.5893	0.5889	0.5882	0.5873	0.5888	0.5447	0.5750
25	0.5275	0.5265	0.5284	0.5271	0.5291	0.5175	0.5252	0.5143	0.5211
26	0.5526	0.5520	0.5528	0.5520	0.5526	0.5482	0.5509	0.5268	0.5398
27	0.5862	0.5864	0.5857	0.5859	0.5862	0.5845	0.5872	0.5648	0.5762
28	0.5233	0.5217	0.5220	0.5212	0.5211	0.5169	0.5240	0.5160	0.5207
29	0.5538	0.5541	0.5537	0.5535	0.5533	0.5514	0.5540	0.5362	0.5482
30	0.5882	0.5874	0.5881	0.5877	0.5879	0.5867	0.5889	0.5733	0.5818
31	0.5209	0.5207	0.5213	0.5215	0.5204	0.5181	0.5204	0.5115	0.5186
32	0.5527	0.5523	0.5531	0.5528	0.5532	0.5512	0.5534	0.5389	0.5452
33	0.5847	0.5862	0.5842	0.5857	0.5841	0.5850	0.5850	0.5814	0.5808
34	0.5195	0.5196	0.5193	0.5187	0.5188	0.5195	0.5199	0.5153	0.5176
35	0.5538	0.5541	0.5531	0.5538	0.5531	0.5532	0.5538	0.5478	0.5504
36	0.5839	0.5845	0.5843	0.5845	0.5841	0.5837	0.5849	0.5776	0.5812

AIC
Model	SC	Correct		Random		Overestimate		Underestimate
Model	SC	OR	EV	OR	EV	OR	EV	OR	EV
1	141.623	141.49	141.547	141.504	141.565	141.476	141.488	141.457	141.452
2	140.612	140.799	140.638	140.8	140.653	141.271	140.802	141.532	141.631
3	138.467	139.457	138.515	139.366	138.523	140.26	138.696	141.374	140.196
4	280.174	280.121	280.15	280.149	280.178	279.961	279.988	279.769	279.79
5	278.084	278.727	278.052	278.657	278.075	279.345	278.249	280.272	279.625
6	273.64	275.273	273.971	275.161	274.034	277.09	274.159	278.803	275.802
7	418.641	418.529	418.621	418.542	418.63	418.668	418.663	418.671	418.64
8	415.646	416.619	415.689	416.516	415.783	417.565	415.875	418.813	417.4
9	409.755	410.204	410.044	410.175	410.137	414.065	410.229	416.47	412.137
10	557.393	557.5	557.39	557.488	557.398	557.412	557.413	557.371	557.458
11	553.046	553.268	553.166	553.217	553.222	554.586	553.036	556.871	554.651
12	545.993	546.249	546.735	546.353	546.879	548.435	546.368	550.559	548.125
13	695.917	695.905	695.792	695.88	695.776	695.945	695.802	695.998	695.987
14	690.159	690.147	690.484	690.248	690.548	690.974	690.3	692.648	691.304
15	678.976	679.231	679.746	679.385	679.982	681.761	679.382	681.885	680.94
16	834.24	834.096	834.103	834.144	834.14	834.259	834.134	834.831	834.783
17	827.625	827.463	828.082	827.581	828.132	827.691	827.858	828.596	828.703
18	815.746	815.71	816.788	815.993	817.02	816.134	816.373	816.434	817.569
19	141.732	141.58	141.705	141.612	141.701	141.496	141.603	141.403	141.525
20	140.658	140.612	140.732	140.643	140.764	141.349	140.924	141.351	141.432
21	139.424	139.751	139.437	139.731	139.501	140.293	139.553	141.261	141.279
22	280	280.221	280.037	280.219	280.053	280.153	280.08	280.155	280.173
23	278.552	278.844	278.586	278.804	278.615	279.616	278.725	280.283	280.279
24	275.007	275.431	275.206	275.396	275.238	277.627	275.373	280.069	277.742
25	418.767	418.823	418.765	418.821	418.762	419.103	418.888	419.181	419.089
26	416.068	416.47	416.174	416.425	416.183	417.891	416.419	418.807	417.826
27	411.873	412.468	412.141	412.443	412.296	414.784	412.029	417.213	414.121
28	557.532	557.485	557.569	557.527	557.589	557.337	557.429	557.337	557.361
29	553.787	553.961	553.869	553.912	553.875	555.116	553.825	557.073	555.175
30	547.334	547.792	547.856	547.784	548.003	550.362	547.597	552.732	549.546
31	696.018	695.967	696.061	695.988	696.074	696.142	696.051	696.172	696.014
32	691.153	691.415	691.39	691.361	691.415	692.684	691.248	694.829	692.696
33	684.588	684.506	685.18	684.641	685.235	684.811	684.902	685.495	685.809
34	834.621	834.563	834.572	834.578	834.585	834.578	834.583	834.738	834.735
35	829.421	829.44	829.597	829.509	829.665	830.278	829.467	831.589	830.247
36	821.082	821.396	821.456	821.443	821.679	823.941	821.223	824.326	822.654



Figure 2. Model comparisons in Step 2 Scenario 1.

Figure 3. Model comparisons in Step 2 Scenario 2.

Figure 4. Model comparisons in Step 2 Scenario 2.

Figure 5. Model comparisons in Step 2 Scenario 3.

Figure 6. Model comparisons in Step 2 Scenario 3.

