Significant out-of-sample classification from methylation profile scoring for amyotrophic lateral sclerosis

Marta F Nabais; Tian Lin; Beben Benyamin; Kelly L Williams; Fleur C Garton; Anna A E Vinkhuyzen; Futao Zhang; Costanza L Vallerga; Restuadi Restuadi; Anna Freydenzon; Ramona A J Zwamborn; Paul J Hop; Matthew R Robinson; Jacob Gratten; Peter M Visscher; Eilis Hannon; Jonathan Mill; Matthew A Brown; Nigel G Laing; Karen A Mather; Perminder S Sachdev; Shyuan T Ngo; Frederik J Steyn; Leanne Wallace; Anjali K Henders; Merrilee Needham; Jan H Veldink; Susan Mathers; Garth Nicholson; Dominic B Rowe; Robert D Henderson; Pamela A McCombe; Roger Pamphlett; Jian Yang; Ian P Blair; Allan F McRae; Naomi R Wray

doi:10.1038/s41525-020-0118-3

. 2020 Feb 27;5:10. doi: 10.1038/s41525-020-0118-3

Significant out-of-sample classification from methylation profile scoring for amyotrophic lateral sclerosis

Marta F Nabais ^1,², Tian Lin ¹, Beben Benyamin ^1,³, Kelly L Williams ⁴, Fleur C Garton ¹, Anna A E Vinkhuyzen ¹, Futao Zhang ¹, Costanza L Vallerga ¹, Restuadi Restuadi ¹, Anna Freydenzon ¹, Ramona A J Zwamborn ⁵, Paul J Hop ⁵, Matthew R Robinson ¹, Jacob Gratten ^1,⁶, Peter M Visscher ^1,⁷, Eilis Hannon ², Jonathan Mill ^2,⁸, Matthew A Brown ⁹, Nigel G Laing ^10,¹¹, Karen A Mather ^12,¹³, Perminder S Sachdev ^12,¹⁴, Shyuan T Ngo ^7,^15,¹⁶, Frederik J Steyn ^15,¹⁶, Leanne Wallace ¹, Anjali K Henders ¹, Merrilee Needham ^17,^18,¹⁹, Jan H Veldink ⁵, Susan Mathers ²⁰, Garth Nicholson ²¹, Dominic B Rowe ⁴, Robert D Henderson ^7,^16,²², Pamela A McCombe ^16,²², Roger Pamphlett ²³, Jian Yang ^1,^7,^#, Ian P Blair ^4,^#, Allan F McRae ^1,^7,^#, Naomi R Wray ^1,^7,^✉,^#

¹Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072 Australia

²University of Exeter Medical School, Royal Devon & Exeter Hospital, Exeter, Devon EX2 5DW UK

³Australian Centre for Precision Health, University of South Australia Cancer Research Institute, School of Health Sciences, University of South Australia, Adelaide, SA 5001 Australia

⁴Centre for Motor Neuron Disease Research, Macquarie University, Sydney, NSW 2109 Australia

⁵Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, 3584 CG Netherlands

⁶Mater Research Institute, The University of Queensland, Brisbane, QLD 4101 Australia

⁷Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072 Australia

⁸Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, SE5 8AF UK

⁹Australian Translational Genomics Centre, Queensland University of Technology, Brisbane, QLD 4102 Australia

¹⁰The Centre for Medical Research, Faculty of Health and Medical Sciences, The University of Western Australia, Nedlands, WA 6009 Australia

¹¹Harry Perkins Institute of Medical Research, QEII Medical Centre, Nedlands, WA 6009 Australia

¹²Centre for Healthy Brain Ageing, School of Psychiatry, University of New South Wales, Sydney, NSW 2031 Australia

¹³Neuroscience Research Australia Institute, Randwick, NSW 2031 Australia

¹⁴Neuropsychiatric Institute, The Prince of Wales Hospital, University of New South Wales, Randwick, NSW 2031 Australia

¹⁵The Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD 4072 Australia

¹⁶Centre for Clinical Research, The University of Queensland, Brisbane, QLD 4019 Australia

¹⁷Fiona Stanley Hospital, Perth, WA 6150 Australia

¹⁸The University of Notre Dame Australia, Fremantle, WA 6160 Australia

¹⁹Institute for Immunology and Infectious Diseases, Murdoch University, Perth, WA 6150 Australia

²⁰Calvary Health Care Bethlehem, Parkdale, VIC 3195 Australia

²¹ANZAC Research Institute, Concord Repatriation General Hospital, Sydney, NSW 2139 Australia

²²Department of Neurology, Royal Brisbane and Women’s Hospital, Brisbane, QLD 4029 Australia

²³Discipline of Pathology and Department of Neuropathology, Brain and Mind Centre, The University of Sydney, Sydney, NSW 2050 Australia

^✉

Corresponding author.

Contributed equally.

PMCID: PMC7046630 PMID: 32140259

Abstract

We conducted DNA methylation association analyses using Illumina 450K data from whole blood for an Australian amyotrophic lateral sclerosis (ALS) case–control cohort (782 cases and 613 controls). Analyses used mixed linear models as implemented in the OSCA software. We found a significantly higher proportion of neutrophils in cases compared to controls which replicated in an independent cohort from the Netherlands (1159 cases and 637 controls). The OSCA MOMENT linear mixed model has been shown in simulations to best account for confounders. When combined in a methylation profile score, the 25 most-associated probes identified by MOMENT significantly classified case–control status in the Netherlands sample (area under the curve, AUC = 0.65, CI_95% = [0.62–0.68], p = 8.3 × 10⁻²²). The maximum AUC achieved was 0.69 (CI_95% = [0.66–0.71], p = 4.3 × 10⁻³⁴) when cell-type proportion was included in the predictor.

Subject terms: Predictive markers, Amyotrophic lateral sclerosis

Introduction

Amyotrophic lateral sclerosis (ALS) is a severe neurodegenerative disease characterized by progressive muscle weakness and degeneration of upper and/or lower motor neurons in the central nervous system. Clinical heterogeneity in presentation of ALS¹ may reflect a complex disease etiology, with considerable genetic contribution even among the >90% of cases that present without a strong family history of ALS.^1,2 A lower bound for the genetic contribution is given by the proportion of variance associated with common single-nucleotide polymorphisms (SNPs) genome-wide, the SNP-based heritability estimated as 8.5% (CI_95% = [7.5–9.5]) in Europeans³ and 15% (CI_95% = [8–22]) in East Asians.⁴ Epidemiological studies have also identified environmental and occupational risk factors, such as metal and pesticide exposure⁵ and defense force occupation,^6,7 but generally studies are underpowered.⁸ Hence, lifetime environmental exposures paired with genetic susceptibility likely contribute to an increased risk for ALS.

DNA methylation (DNAm), which in mammals is almost exclusively found in cytosine–guanine dinucleotides (CpG), is a widely studied epigenetic modification. Aberrant DNAm patterns can be consequence of environmental exposures, and/or cause or consequence of disease, and have been hypothesized to play a role in neurodegenerative diseases (including ALS).^9–12 Indeed, a methylome-wide association study (MWAS) has suggested that neurodegenerative processes in ALS may be associated with DNAm alterations.¹³ In this study, the authors interrogated the methylation status of genome-wide CpG loci in postmortem spinal cord tissue. Despite the very small sample size (12 ALS subjects and 11 age and gender-matched neurologically normal controls), the authors reported 4261 significant differentially methylated positions (DMPs), annotated to 3574 genes. Functional enrichment analyses showed these genes to be highly enriched in biological functions related to immune and inflammation response. However, the presence of confounding factors, and failing to account for these confounders, has been widely recognized as a concern in MWAS,¹⁴ because they could lead to spurious association results. For example, Figueroa-Romero et al.¹³ did not account for the potential confounding of cell-type composition, as there is evidence that these explain much of the observed variability in DNAm.^15,16 Nonetheless, identification of DMPs across the genome that may drive or be driven by ALS pathogenic processes remains of importance, especially for biomarker development.

Confounders in the context of DNAm studies need careful consideration. Some confounders such as sex and technical batch effects are usually recorded and can be modeled as covariates. However, other recognized lifestyle confounders may exist, but may not be recorded. Furthermore, there could be confounders that are present, but unknown. Some case–control DNAm differences, driven by cell-type composition, or changes due to environmental exposures, medications, or complications of the disease, may be considered confounders, or alternatively, may be considered of primary interest depending on the context of the scientific question. Some unmeasured confounders can be inferred from DNAm reference-based classification methods, for example the Horvath age predictor or the Houseman algorithm to predict cell-type proportions (CTP).^15,17 However, predicted confounders have inherent classification error, and may only explain a proportion of the variation attributable to the directly measured covariates, which may result in inflated test statistics due to the uncaptured variation of confounders.¹⁸ More importantly, recent studies show that correction approaches (e.g., principal component analysis) widely used in standard MWAS and association studies of other molecular phenotypes may induce bias, which persists for large sample sizes and replicates out-of-sample.¹⁹

The OmicS-data-based Complex trait Analysis (OSCA) software¹⁸ implements mixed linear model (MLM) approaches: MLM-based omics association (MOA) and multi-component MLM-based omics association excluding the target (MOMENT). MOA and MOMENT analyses account for trait-associated probes that are highly correlated with other trait-associated probes across the genome, and assume effect sizes are drawn from a normal distribution. MOA assumes a single distribution of effects sizes, while MOMENT allows a different effect size distribution for the most-associated probes. Fewer trait-associated probes are identified in MOA and MOMENT analyses compared to linear regression, but simulations¹⁸ show that probes that are significantly associated from these analyses are more likely to be true positives. Briefly, the MOA method fits a random genome-wide DNAm factor per person with variance–covariance matrix (the omics relationship matrix or ORM) between individuals built from genome-wide DNAm sites (equivalent to a model of fitting all DNAm sites as random effects); this model is analogous to the MLM association method implemented in EMMAX²⁰ and GCTA²¹ for SNP data. Extensive simulations showed high false-positive rate (FPR) in standard linear regression MWAS, while both MOA and MOMENT controlled for false discovery with little loss of power.¹⁸ However, in some scenarios, the more stringent MOMENT method is needed to control the FPR if a proportion of DNAm sites are much more correlated than the others, at the cost of slightly reduced power. The MOMENT method fits an MLM with two random-effect components for each probe tested (and hence two ORMs between subjects computed from two sets of DNAm sites), with the DNAm sites grouped by their associations with the trait (leaving out the DNAm sites in a window around the target probe being tested for association to avoid proximal contamination).¹⁸ Hence, the MOMENT method is more likely to identify DNAm differences that have a specific role in disease rather than reflect factors that impact multiple DNAm sites across the genome.

In this study, we applied both the MOA and MOMENT methods to a disease trait (ALS) and compared results with standard linear regression methods. We identified significant differences in predicted immune CTP and one significantly DMP between cases and controls. We used the estimated effect sizes of all DNAm sites, using best-linear unbiased prediction (BLUP) to calculate individual DNA methylation profile scores (MPS) in an independent ALS sample. The MPS classified ALS case–control status with area under the curve (AUC) of 0.61, CI_95% = [0.58–0.63], p = 5.6 × 10⁻¹¹. The classification accuracy increased when the effects associated with predicted immune CTP were included (AUC = 0.69, CI_95% = [0.66–0.71], p = 4.3 × 10⁻³⁴).

Results

Differences in predicted cell type proportions between ALS cases and controls

There is accumulating evidence for an active role of immune cells, and inflammation in general, on neurodegenerative disorders (both as cause and consequence of disease).^22,23 Hence, our first analysis was to investigate differences in predicted CTP between cases and controls. We used the Houseman algorithm based on purified cell types from whole blood¹⁵ to estimate blood CTP for each individual based on the DNAm data. Using stepwise backward logistic regression models, increased neutrophil proportions were found to be significantly associated with ALS, after Bonferroni correction (OR = 1.02, CI_95% = [1.01–1.04], p = 6 × 10⁻⁴) (Fig. 1). In the Netherlands (NL) sample, the pattern of differences in CTP was similar to that of the Australian (AU) sample, and with its larger sample size neutrophils (OR = 1.11, CI_95% = [1.09–1.13], p = 2.2 × 10⁻²⁷), monocytes (OR = 1.29, CI_95% = [1.22–1.36], p = 3.3 × 10⁻²⁰), B lymphocytes (OR = 1.12, CI_95% = [1.08–1.16], p = 4.1 × 10⁻¹⁰), and natural killer cells (OR = 1.06, CI_95% = [1.03–1.1], p = 3.1 × 10⁻⁵) were all significantly associated with ALS (Fig. 1).

Fig. 1 — Methylation-derived predicted cell proportions in % (y-axis) for different cell types (x-axis) in the Australian ALS case–control cohort (N_cases = 613 and N_controls = 782, red colored boxplots) and in the Netherlands ALS case–control cohort (N_cases = 1159 and N_controls = 637, blue colored boxplots). Gray—controls, orange—cases. P values are from stepwise logistic regression models (Methods) and indicate cell types significantly associated with case–control status, after Bonferroni correction. P values in red correspond to the AU ALS cohort and p values in blue correspond to the NL ALS cohort. The boxplot horizontal black line marks the median CTP value in that group. The lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles). The upper whisker extends from the hinge to the largest value no further than 1.5 IQR from the hinge (where IQR is the inter-quartile range, or distance between the first and third quartiles). The lower whisker extends from the hinge to the smallest value at most 1.5 IQR of the hinge. Data beyond the end of the whiskers are called “outlying” points and are plotted individually.

DNA MWAS analysis

MWAS results show that under an MLM framework the number of significant DMPs is much reduced, compared to the standard linear regression models (Fig. 2a). At a Bonferroni corrected genome-wide significance threshold of p = 3.1 × 10⁻⁷, the number of DMPs passing significance are 476 in a linear model without any covariates; 30 in a linear model with the first 10 PCs, calculated from the ORM as fixed effects added to the model as covariates; 12 in MOA, and 1 in MOMENT (top to bottom row of Fig. 2a, respectively). The genomic inflation factor, λ (the median of χ² test statistics of all DNAm sites divided by its expected value under the null), is better controlled using MOA and MOMENT compared to the linear regression models (Fig. 2b). The significance of almost all MOA-identified DNAm sites is reduced in MOMENT (Table 1, Fig. 3a), with the exception of cg04104695 annotated to gene CXXC5. However, MOA and MOMENT regression coefficients of DNAm sites with p from MOA < 5 × 10⁻⁴ (m = 241) are still highly correlated (Fig. 3b, ${\hat{r}}_{b} = 0.81$ , s.e. = 0.03). Interestingly, the correlation of effect sizes is much higher between standard linear regression models and MOA (Supplementary Fig. 1, ${\hat{r}}_{b} = 1$ , s.e. = 3 × 10⁻³) than with MOMENT ( ${\hat{r}}_{b} = - 0.2$ , s.e. = 0.02), when considering probes that are significant from the linear regression model. From simulations, MOMENT analyses have been shown to be more powerful in controlling for potential confounders, with a caveat of a slight loss of power.¹⁸ In a previous MWAS analysis of lung function it could be shown that the probes associated with smoking, a confounder not included in the analysis, could explain the probes significantly associated in MOA, but not MOMENT analysis.¹⁸ Hence, the difference between the number of significant DMPs found by MOA or MOMENT could reflect a small difference in power for detection of true positives, or a higher number of false positives in MOA.

Fig. 2 — a From top to bottom row: Manhattan plots using linear regression, linear regression with 10 principal components calculated from the ORM as fixed effects, MOA, and MOMENT. Red circles represent probes with p < 1 × 10⁻⁵; red crosses represent probes with p < 3.1 × 10⁻⁷ genome-wide significant threshold. Solid dark blue line mark p < 3.1 × 10⁻⁷ and dashed sky-blue line marks p < 1 × 10⁻⁵. b QQ-plots showing the expected and observed −log(p) in each model. We calculated the genomic inflation factor (λ) as the median of χ² test statistics of all probes divided by its expected value under the null. λ_Linear = 1.19, λ_{Linear_with_PCs} = 1.1, λ_MOA = 1.01, λ_MOMENT = 1.02.

Table 1.

DNA methylation sites significantly associated with ALS at p < 3.1 × 10⁻⁷ from MOA and MOMENT in the Australian cohort.

Chr	Probe	bp	Gene	Orientation	b_MOA	p_MOA	b_MOMENT	p_MOMENT
4	cg08422863	88379498	HERC6	R	−0.16	5.9 × 10⁻⁹	−0.07	0.01
13	cg05380910	38989909	STOML3	F	0.15	9.8 × 10⁻⁹	0.06	0.04
3	cg24866706	96619630	RPL18AP8;MTRNR2L12	R	0.15	2.1 × 10⁻⁸	0.04	0.09
3	cg27143246	169771795	MYNN;RP11−816J6.3	R	−0.15	2.4 × 10⁻⁸	−0.05	0.07
12	cg00178984	59697390	SLC16A7	R	0.15	5.8 × 10⁻⁸	0.06	0.02
8	cg26078251	124300968	RP11-383J24.2	F	0.15	6.5 × 10⁻⁸	0.05	0.06
8	cg20134271	124483581	RNF139	R	0.15	8.2 × 10⁻⁸	0.04	0.13
7	cg05303559	158719689	AC019084.7	F	0.14	1.5 × 10⁻⁷	0.03	0.35
5	cg04104695	139679164	CXXC5	R	−0.14	2.7 × 10⁻⁷	−0.14	2.1 × 10⁻⁷
6	cg12785183	166031972	—	F	0.14	2.7 × 10⁻⁷	0.01	0.9
11	cg21836562	128868867	KCNJ1	R	0.14	2.9 × 10⁻⁷	0.05	0.08
11	cg07613278	43311777	API5	F	−0.14	3 × 10⁻⁷	−0.03	0.2

Open in a new tab

Chr chromosome number, Probe probe identification number as provided by Illumina, bp base pair position in the genome, Gene closest genes the probe is annotated to, based on distance to transcription starting site, following the method described elsewhere,⁴⁴ Orientation DNA strand orientation, F forward, R Reverse, b_MOA effects sizes (increase (positive sign) or decrease (negative sign) of methylation between cases and controls per standard deviation unit) of AU MOA, p_MOA ps of MOA models for AU, b_MOMENT effects sizes (interpreted as b_MOA) of AU MOMENT, p_MOMENT ps of AU MOMENT.

Fig. 3 — a −log10(p) of all probes in MOA (x-axis) and MOMENT (y-axis), for the AU ALS dataset. Dashed blue lines mark the genome-wide significance threshold (p = 3.1 × 10⁻⁷) of MOA and MOMENT. Red dots mark all probes with p < 5 × 10⁻⁴ from MOA (m = 241) as in b. Effect sizes of MOA (x-axis) and MOMENT (y-axis), for AU ALS dataset, of probes with p < 5 × 10⁻⁴ from MOA. Correlation of effect sizes: ${\hat{r}}_{b} = 0.81$ = 0.81, s.e. = 0.03.

The NL cohort was available to us to seek replication of our DMP. Of the 101 top DNAm sites (suggestive p < 5 × 10⁻⁴) in the AU MOMENT analysis, 94 were available in the NL sample; however, none replicated at p < 0.05/94, i.e. 5.3 × 10⁻⁴ (Supplementary Fig. 2a). Post hoc power calculations (Supplementary Fig. 2c, d) show that the replication sample size necessary to detect a true association based on the estimated effect size of the most-associated probe (i.e., cg04104695) would be 3478 (assuming a balanced design, 80% power and replication significance threshold p = 5.3 × 10⁻⁴). Hence, the lack of replication of individual probes may reflect lack of power. Consistent with this conclusion, we found the effect sizes of MOMENT results for the AU and NL samples for these 94 probes were correlated (Supplementary Fig. 2b, ${\hat{r}}_{b} = 0.33$ , s.e. = 0.2).

Proportion of variance associated with genome-wide DNAm sites

To provide quantitative description of the relationship between case–control status and DNAm covariates, we estimated the proportion of variation in case–control status captured by genome-wide DNAm ( ${\hat{ρ}}^{2}$ ) by OREML (see Methods) using probe values corrected for known potential confounders. This approach parallels estimation of SNP-based heritability,²⁴ except that SNP-based heritability represents the variance attributable to genome-wide SNPs and so has an inference of association through causality. In contrast, with DNAm (or other molecular phenotypes) the ${\hat{ρ}}^{2}$ estimate could reflect both causes and consequences of disease (including consequences of medication). Therefore, it is not appropriate to transform this estimate to the liability scale (as is done for SNP-based heritability estimates). The estimate must be interpreted as dependent on the proportion of cases in the sample (which influences the phenotypic variance on this scale), and hence we report both ${\hat{ρ}}^{2}$ and phenotypic variance $({\hat{σ}}_{P}^{2})$ (since phenotypic variance is reduced by inclusion of covariates) to aid interpretation of results. In the baseline model with no covariates, ${\hat{σ}}_{P}^{2} = 0.247$ and ${\hat{ρ}}^{2} = 15 %$ (Table 2), and ${\hat{σ}}_{P}^{2}$ simply binomial variance given the proportion of the sample that are cases. A model with confounder covariates (predicted age + sex + smoking score + batch effects) gave ${\hat{ρ}}^{2} = 17 %$ ( ${\hat{σ}}_{P}^{2} = 0.227$ , 92% of baseline ${\hat{σ}}_{P}^{2}$ ). A model with predicted CTP as fixed effects gives a higher ${\hat{ρ}}^{2} = 24 %$ ( ${\hat{σ}}_{P}^{2} = 0.237$ , 96% of baseline ${\hat{σ}}_{P}^{2}$ ). A model with both the confounder covariates and predicted CTP as covariates gave ${\hat{ρ}}^{2} = 31 %$ ( ${\hat{σ}}_{P}^{2} = 0.221$ , 89% of baseline ${\hat{σ}}_{P}^{2}$ ).

Table 2.

Proportion of phenotypic variance captured by all DNAm sites $({\hat{ρ}}^{2})$ and phenotypic variance $({\hat{σ}}_{P}^{2})$ estimated from different OREML models.

OREML model (y = Cβ + Wu + e)	${\hat{ρ}}^{2}$ (s.e.)	${\hat{σ}}_{P}^{2}$ (s.e.)
No covariates	15% (0.05)	0.247 (0.01)
With predicted age + predicted smoking score + sex + batch effects	17% (0.06)	0.227 (0.01)
With predicted cell proportions	24% (0.06)	0.237 (0.01)
With predicted age + predicted smoking score + sex + batch effects + predicted cell proportions	31% (0.08)	0.221 (0.01)

Open in a new tab

s.e. standard error. In the absence of covariates ${\hat{σ}}_{P}^{2}$ = P(1 − P), where P is the proportion of the sample that are cases.

Out-of-sample classification using DNA methylation profiles scores (MPS)

Out-of-sample classification provides independent evidence that differences in DNAm between cases and controls reflect differences associated with disease status rather than technical confounding effects, as the latter are less likely to be shared between independently collected and processed samples. It can also leverage DNAm differences between cases and controls that do not achieve statistical significance. MPS were calculated for each individual in the NL sample as the sum of DNAm probe values weighted by their effect sizes estimated in the AU sample (Supplementary Fig. 3). Figure 4 summarizes the maximum AUC given by each of the different methods used to calculate MPS.

Fig. 4 — Bars indicate 95% confidence intervals of AUC values for each method. MOA: CI_95% = [0.57–0.63]; BLUP: CI_95% = [0.58–0.63]; MOMENT: CI_95% = [0.62–0.68]; predicted cell proportions (CTP): CI_95% = [0.63–0.69]; MOMENT + CTP: CI_95% = [0.65–0.7]; and BLUP + CTP: CI_95% = [0.66–0.71]. Dashed line indicates AUC = 0.5, i.e., random classification. P values are from logistic regression models. m = number of DNAm probes used to calculate the MPS.

The BLUP model provides jointly estimated effect sizes for each probe, but assumes probe effect sizes are drawn from a normal distribution. The AUC for AU as discovery cohort and NL as target was 0.61 (Supplementary Fig. 4, CI_95% = [0.58–0.63], p = 5.6 × 10⁻¹¹, p from logistic regression model). We also assessed the effect on classification accuracy of MPS derived from estimated effect sizes (from OREML) attributed to each CTP. Indeed, CTP-derived MPS alone were a better classifier than BLUP scores (Supplementary Fig. 4, AUC = 0.66, CI_95% = [0.63–0.69], p = 6.5 × 10⁻²⁶). Classification efficacy was further increased when both BLUP-derived DNAm effect sizes and CTP effect sizes were used together to calculate the MPS (Supplementary Fig. 4, AUC = 0.69, CI_95% = [0.66–0.71], p = 4.3 × 10⁻³⁴).

Finally, we used out-of-sample classification to gain insight about the results from the linear regression vs MOA vs MOMENT analyses (Supplementary Table 1). MPS based on MOA estimated effect sizes from AU as discovery sample to NL as target sample gave maximum AUC = 0.60 (CI_95% = [0.57–0.63], p = 2.7 × 10⁻⁴), using all probes with p < 0.5 (m = 74,556). However, MPS based on MOMENT results gave higher AUC for all p value thresholds tested, despite fewer probes passing each significance threshold. The maximum MOMENT AUC was 0.65 (CI_95% = [0.62–0.68], p = 8.3 × 10⁻²²) with probes with p < 1 × 10⁻⁴ (m = 25). A model that included MPS calculated with MOMENT-derived effect sizes and CTP effect sizes gave AUC = 0.67 (CI_95% = [0.65–0.70], p = 2.2 × 10⁻²⁹). We note a negative correlation between MOA and MOMENT MPS, despite high correlation of effect sizes (Supplementary Fig. 5). Based on simulations, Zhang et al.¹⁸ found this property to be induced when non-causal probes were included in the calculation of MPS. For causal probes in simulations, the MOA and MOMENT-based classifiers were strongly positively correlated.

Based on the DNAm profile scoring analyses we concluded that the MOMENT analyses are more likely to have identified true positive associations, AUC = 0.65 from 25 probes (Supplementary Table 2). We investigated if the top DMPs from MOMENT (p < 1 × 10⁻⁴, m = 25) overlapped with brain mQTL regions²⁵ (p < 1 × 10⁻⁵) and GWAS SNPs²⁶ (p < 5 × 10⁻⁸). We found no evidence for overlap with GWAS signals (Supplementary Table 3), which likely reflects lack of power in both MWAS and GWAS, as has previously been observed for body-mass index (BMI).²⁷

Discussion

In this study we have conducted the largest MWAS to date for ALS and we use the new software OSCA that implements different MLM approaches specifically designed for omics data. We have presented results to help evaluation of the comparison of linear regression, MOA and MOMENT methods for a disease trait, recognizing that the potential for confounding of technical artefacts is much greater for a binary trait as compared to a quantitative trait.

Using the most stringent MOMENT model, we identified 1 DMP between ALS cases and controls, which is annotated to CXXC5 on chromosome 5 (cg04104695, p_MOA = 2.7 × 10⁻⁷, p_MOMENT = 2.1 × 10⁻⁷, decreased blood DNA methylation associated with ALS). The association was not replicated in the independent NL study, but power analyses suggest this may reflect effect size and sample size, reminiscent of early underpowered GWAS for which increasing sample sizes have led to many new (highly replicated) discoveries in the past decade.²⁸ Despite lack of replication of the most-associated probe, out-of-sample classification using effect sizes estimated from the 25 most-associated DNAm sites in the AU sample gave a classification AUC of 0.65 (p = 2.2 × 10⁻²⁹) in the NL sample. We summarize literature evidence of a functional role for CXXC5 in neurodegeneration and functional annotation of 25 most-associated DNAm sites in a Supplementary Note. We found better out-of-sample classification when using DNAm effect sizes estimated from MOMENT compared to MOA or linear regression results, despite fewer probes detected at each association p value threshold. Our results support the recommendation of Zhang et al.¹⁸ to use the MOMENT over MOA model, since MOA results may include more false-positive associations.

The maximum out-of-sample classification was achieved using a score calculated from BLUP-estimated DNAm effect sizes together with effect size estimates for predicted CTP (AUC = 0.69, CI_95% = [0.66–0.71], p = 4.3 × 10⁻³⁴). We are careful to use the term out-of-sample classification rather than prediction, because the blood samples available to us in both the AU and NL cohorts were not taken prior to diagnosis, hence the strength of classification may reflect consequences as well as causes of disease. The trans-national out-of-sample classification is a strength of our study, but we recognize that best-practice medication and clinical management are likely shared across countries. In these historically collected samples we have incomplete records of riluzole use, which was the only approved medication at the time of sample collection. Despite these caveats it is noteworthy that the AUC is much higher for DNAm classification (here AUC = 0.69 from a discovery sample of 782 cases and 613 controls) compared to SNP polygenic risk score prediction (AUC = 0.57 in our AU sample, unpublished data, based on a discovery sample of 20,806 cases and 59,804 controls²⁶). Comparison of genetic and DNAm predictors for some complex traits also found much smaller discovery samples are needed for DNAm predictors.^27,29 However, this will also depend on the underlying (epi)genetic architecture of the trait (for example, DNAm predictors explained little variation in height, but high variation in BMI compared to genetic predictors²⁷).

Given the interest in developing early diagnostic biomarkers from blood it seems a worthwhile goal to collect blood samples from patients when they first present in neurology clinics, prior to diagnosis, treatment, and clinical management. Generating DNAm profiles from such samples would allow evaluation of DNAm as a biomarker diagnostic tool. Such data could also evaluate the sensitivity of the predictor in differentiating ALS from ALS mimics, i.e., people who present to neurology clinics with ALS-like symptoms, but who do not achieve ALS diagnosis (often after lengthy clinical evaluation). Differences in CTP between cases and controls is an important contributor to the out-of-sample classification, consistent with reports of aberrant activation of the peripheral immune system in ALS,^30,31 and with reports of associations between elevated white blood cell count and increased ratio of circulating neutrophils to monocytes with more-rapid progression of ALS.^32,33 Here, we predict CTP based on the cell-type-specific DNAm signatures,¹⁵ but direct measurement of blood cell types might generate more accurate classifiers. We estimate association effect sizes of >200,000 DNAm sites, and necessarily effect sizes are estimated with error. Larger sample sizes are needed to increase accuracy of estimation of DNAm effect sizes and to increase classification accuracy. Larger, carefully phenotyped sample collections could allow evaluations of classification scores based on a combination of genetic, DNAm, other “omics” and environmental risk factors.

Our study has several limitations. First, a perceived limitation of our study may be that blood is not a relevant tissue for understanding the biological mechanisms underlying ALS, due to the tissue-specificity of most DNAm patterns.^34–36 However, blood is an easily accessible tissue which is relevant for generation of biomarkers associated with disease. Biological or environmental interpretation of DNAm associations can be made as downstream analyses after their discovery. Variation between people in DNAm controlled by SNP variation has been shown to have high correlation between brain and blood,^25,37 which provides further support that blood may be an appropriate tissue for the goal of developing early diagnostic biomarkers. A second limitation is that the blood sample available to us from both the AU and NL cohorts had incomplete clinical records, both in relation to time since diagnosis and with respect to clinical management and treatment. Ongoing sample collections in both countries are now collecting such data more consistently.

In summary, we applied a new OSCA analysis pipeline to DNAm data measured in whole blood in an Australian ALS case–control cohort. The MOMENT method is expected to identify differences in DNAm associated with disease independent of confounding factors such as differences in cell-type proportion or smoking that generate widespread DNAm changes. An MPS calculated from the 25 most-associated probes identified by MOMENT, and an MPS based on cell-type differences both generate significant classification of ALS in an independent sample (Fig. 3), with the profile scores themselves being correlated only at 0.12 (Supplementary Fig. 5). Given our relatively small discovery sample, our significant classification results from an Australian discovery cohort into a Netherlands target cohort indicate that DNAm may be a useful predictive biomarker. We advocate for larger samples with blood collected prior to diagnosis and with deep clinical phenotyping to allow investigation of this proposal.

Methods

Datasets description

The Australian ALS cohort (AU) consisted of two cohorts, AU1 (440 cases, 418 controls) and AU2 (342 cases ALS, 195 controls) (Supplementary Table 4). For AU1, patients and controls were ascertained from the University of Sydney as part of the Australian MND DNA bank, which recruited participants from April 2000 to June 2011. Cases were white Australians older than 25 years recruited from around Australia via state-based MND associations with diagnoses verified by neurologists. All participants gave written consent and the study protocol was approved by the Sydney South West Area Health Service Human Research Ethics Committee (HREC). AU2 cases were recruited from clinics across Australia between 2015 and 2017 and were diagnosed with definite or probable ALS according to the revised El Escorial criteria.³⁸ Control subjects were healthy individuals free of neuromuscular diseases, recruited as either partners or friends of patients with ALS or community volunteers. Those with a recorded family history for ALS (including those recorded as testing positive for known mutations) were excluded as both cases or controls in both AU1 and AU2. Additional controls for the AU2 cohort were monozygotic (MZ) twin pairs aged >65 years contributed from the Older Australian Twin Study (OATS)³⁹ recruited at QIMR Berghofer Medical Research Institute, University of New South Wales and the University of Melbourne. Twin pair data helped in quality control (QC) checks but only one twin from each pair was used in analyses. Written consent was obtained from all individuals enrolled in this study, and the study was approved by the corresponding HREC at the different sites: University of Sydney, Western Sydney Local Health District, Royal Brisbane and Women Hospital Metro North, South Metropolitan Health Service, Macquarie University, QIMR Berghofer Medical Research Institute, University of New South Wales and the University of Melbourne. The mean predicted age,⁴⁰ predicted smoking scores,²⁹ and sex distribution between cases and controls for the cohorts used in this study can be found in Supplementary Table 5.

A cohort from the Netherlands (NL) was available to us for replication analyses, collected under Project MinE.⁴¹ The participants of this study consisted of 1866 Netherlands individuals (N = 1222 cases, N = 644 controls).⁴² All ALS cases were diagnosed with definite or probable ALS according to the revised El Escorial criteria,³⁸ and those with a recorded family history for ALS were excluded. All participants gave written informed consent and the institutional review board of the University Medical Center Utrecht approved this study.

DNA methylation data

For the AU ALS datasets, bisulfite conversions were performed in 96-well plates using the EZ-96 DNA Methylation Kit (Zymo Research, Irvine, CA, USA). Prior to conversion, DNA concentrations were determined by the Take3™ Micro-Volume Plate on the Epoch™ Microplate Spectrophotometer (BioTek Instruments, Inc.) and standardized to include 500 ng. Three technical replicates were included in each conversion to assess repeatability. The following DNA samples were obtained from the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research: CEPH NA06997 and CEPH NA07029. Each DNA sample was included as a control on alternating plates. One sample from each run was duplicated on the plate and one sample duplicated from a different plate. DNA recovery after conversion was quantified using the Take3™ Micro-Volume Plate on the Epoch™ Microplate Spectrophotometer (BioTek Instruments, Inc.). Samples that showed incomplete bisulfite conversion (calculated concentration <25 ng/μl) were not taken forward to the assessment of DNA methylation. Bisulfite converted DNA samples were hybridized to the 12 sample, Illumina Infinium HumanMethylation450 Beadchip (Illumina Inc., San Diego, CA) using the Infinium HD Methylation protocol and Tecan robotics. DNAm data for the NL sample were generated under similar protocols.

QC and normalization of DNA methylation data

Data QC and normalization were conducted using the meffil R package.⁴³ The same pipeline for DNA methylation (DNAm) data processing and QC was applied to all samples. AU1 and AU2 samples were processed together (along with additional samples from other studies). QC threshold parameters (Supplementary Note) determined samples and DNAm sites to exclude prior to normalization (Supplementary Table 6). Functional normalization (FN) was performed to remove technical variation, as described elsewhere.⁴⁴ The most variable normalized probes (m = 20,000) were extracted, decomposed into principal components, and each component regressed against slide, chip column, chip row, and sex to test for batch effects. The association detection p value threshold was set to 0.01. For both datasets technical variation was not completely removed after FN and some known confounders affect DNA methylation. Thus, we perform an additional adjustment step using the normalized DNAm probe values (from meffil, as described above) in linear regression models as response variable and predicted age, sex, chip position, predicted CTP (excluding eosinophils, because of redundancy in proportion data), predicted smoking scores and slide (as random effect) as covariates. The residuals from this adjustment step were used for downstream MWAS analyses. We hypothesized the effect of this pre-adjustment should have a more pronounced effect in standard linear regression MWAS results compared to mixed linear models. Supplementary Fig. 6 supports this hypothesis, with a more pronounced reduction in significance of results in linear regression (Supplementary Fig. 6a) compared to both MOA and MOMENT (Supplementary Fig. 6b, c, respectively). We also observed a lower correlation of effect sizes (Supplementary Fig. 6d), between linear regression with pre-adjusted and non-adjusted DNA methylation values ( ${\hat{r}}_{b_Linear_adj} = 0.85$ , s.e. = 6.7 × 10⁻³), compared to both MOA ( ${\hat{r}}_{b_MOA_adj} = 0.99$ , s.e. = 1.6 × 10⁻³) and MOMENT ( ${\hat{r}}_{b_MOMENT_adj} = 0.97$ , s.e. = 5.7 × 10⁻³, Supplementary Fig. 6e, f, respectively). Related individuals, sex-chromosome linked probes, probes influenced by SNPs, and probes with non-unique hybridization and extension were also removed prior to analysis, following recommendations described elsewhere.⁴⁵ Afterwards, we removed remaining probes with s.d. <0.02. This decision is justified, because power to detect an association depends in part on the variance between individuals and (standardized) effect sizes. Excluding these DNAm sites also reduces the multiple testing burden in MWAS. In all, 160,304 probes remained for analysis in the AU cohort (Supplementary Table 6).

Omics residual maximum likelihood analyses (OREML): variance captured by all DNAm sites

Based on extensive simulation and application to real DNAm data, Zhang et al.¹⁸ developed a method with one random-effect component to estimate the proportion of trait variance captured by all DNAm sites. The random-effect model can be written as

y = Cβ + Wu + e with var (y) = V = {WW}^{'} σ_{u}^{2} + I σ_{e}^{2} = A σ_{o}^{2} + I σ_{e}^{2},

where y is an n × 1 vector of phenotype values of n individuals, C is an n × p matrix of p covariates (e.g., age, sex, smoking status), β is a p × 1 vector of the fixed covariate effects on the phenotype, W is an n × m matrix of m standardized DNAm values, where m is the number of DNAm sites, u_i is an m × 1 vector of the joint random probe effects on the phenotype, and e is an n × 1 vector of residuals. The variance of y is var(y) = V=WW′ $σ_{u}^{2} + I σ_{e}^{2}$ . We can re-write this equation as $V = A σ_{o}^{2} + I σ_{e}^{2}$ with A = WW′/m and $σ_{o}^{2} = m σ_{u}^{2}$ , where A is then the omics-data-based relationship matrix (ORM) and $σ_{u}^{2}$ is the variance between individuals attributed to genome-wide DNAm differences. The variance components can be estimated by REML. The proportion of variance attributable to genome-wide DNAm, $ρ^{2} = \frac{σ_{o}^{2}}{σ_{o}^{2} + σ_{e}^{2}}$ , is similar to the SNP-based heritability concept in GREML,²⁴ but where variances represent factors that may include both causes and consequences of the phenotype. The baseline variance of the phenotype case–control status in the AU sample is approximated as the binomial variance of ${\hat{σ}}_{P}^{2}$ = P(1−P) = 0.246, where P is the proportion of the sample that are cases (P = 0.56), with the reported ${\hat{σ}}_{P}^{2}$ being the sum of the estimates of the variance components ${\hat{σ}}_{o}^{2} + {\hat{σ}}_{e}^{2}$ .

Linear regression DNA MWAS analysis

For linear regression MWAS we used models without:

y = w_{i} b_{i} + e

and with covariates

y = w_{i} b_{i} + C β + e

where w_i (a n × 1 vector of DNAm measures of a probe i, i.e., the target probe) and b_i (the effect of probe i on the phenotype; fixed effect). We used the 10 principal components estimated from the ORM as covariates.

MLM-based omics association (MOA) and multi-component MLM-based omics association excluding the target (MOMENT) MWAS analyses

We conducted MLM MWAS using both MOA and MOMENT. The MOA MWAS model is

y = w_{i} b_{i} + W u + e

In this model, the probe being tested is fitted twice, once as a fixed and also as a random effect, which results in slightly reduced power compared to a (hypothetical) model in which the focal probe is excluded from W, but this would be computationally very demanding. In this model it is assumed that all probe effects follow a single distribution, which may not reflect the true distribution. In the MOMENT model¹⁸ DNAm probe effect sizes are drawn from two effect size distributions for different probes sets, selected according to their association statistics in an initial linear regression model, with each group then fitted as a random-effect:

y = w_{i} b_{i} + \sum_{j} W_{j} u_{j} + e

where W_j is an n x m_j matrix of standardized DNAm probe values in the jth group, and m_j is the number of DNAm sites in the group (excluding the DNAm sites in the 100 Kb region centred at the probe being tested). After conducting MWAS, DNAm sites were mapped to the latest GRCh38/hg38 genome build⁴⁵ and annotated to genes, based on GENCODE v22. We used the r_b method²⁵ to quantify the similarity between probe effects, which accounts for sample overlap and errors in the estimated probe effects.

Out-of-sample classification using DNA MPSs

MPS were calculated for each individual in the NL sample as the sum of DNAm probe values weighted by their effect sizes estimated in the AU sample (Supplementary Fig. 3). In different analyses DNAm probe effect sizes were estimated from the linear regression, MOA or MOMENT MWAS or the BLUP-estimated probe values of u from Eq. (1), as implemented in OSCA.¹⁸ In the BLUP models, predicted age, sex, predicted smoking scores, and batch effects (chip position and slide) were fitted as fixed effects. Classification efficacy of the MPS was evaluated by the area under the receiver-operator characteristic curve (AUC) that relates the false-positive rate (specificity) to the true-positive rate (sensitivity), in logistic regression. We used the R package pROC⁴⁶ to plot the receiver-operator characteristic curves and calculate AUC for each MPS. To evaluate the possible gain in classification accuracy, we calculated MPS from the fixed effects of predicted CTP, estimated in an OREML analysis (Eq. (1)). Analyses were conducted in R version 3.5.0 and OSCA v0.45.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Supplementary information

Supplementary Information^{(4.4MB, pdf)}

Reporting summary^{(1.3MB, pdf)}

Acknowledgements

We thank the participants who contributed to this research through providing a blood sample and clinical data. We thank the research nurses of all clinical sites for their recruitment of participants. We thank the laboratory researchers for their care in generating the DNAm data. We acknowledge funding from the National Health and Medical Research Council (1078901, 1083187, 1113400, 1121962, 1084417, 1079583 and 1151854 (JPND BRAIN-MEND)) and the Motor Neurone Disease Australia Ice Bucket Challenge Grant. F.C.G. acknowledges the MNDRIA Bill Gole Postdoctoral Fellowship. M.F.N. acknowledges the University of Queensland/University of Exeter (QUEX) Ph.D. studentship. J.H.V. acknowledges funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement no. 772376 - EScORIAL”).

Author contributions

N.R.W., I.P.B., and A.F.M. designed the study. M.F.N. and T.L. performed the main analyses. B.B., R.R., F.C.G., C.L.V., A.F., R.A.J.Z., and P.H. performed preliminary analyses and helped with the quality control of the AU and NL ALS cohorts. K.L.W., F.C.G., and A.A.E.V. helped with phenotype curation. N.R.W., A.F.M., F.Z., J.Y., J.G., P.M.V., E.H., J.M., and M.R.R. gave analyses advice. K.L.W., N.G.L., S.T.N., F.J.S., K.A.M., and P.S.S. facilitated access to blood samples. J.H.V., S.M., G.N., D.B.R., R.D.H., P.A.M., R.P. are the ALS cohort leaders and overviewed blood sample collection and participant recruitment. F.Z. and J.Y. developed the OSCA software. A.K.H. coordinated and managed the study. L.W. and M.A.B. coordinated and overviewed the lab quality control of blood samples to generate DNA methylation data, for the AU cohort. M.F.N. and N.R.W. wrote the manuscript. All authors read, commented, and approved the final manuscript.

Data availability

The Australian data are available from the author upon reasonable request; a dbGAP submission for upload of the Australian data is in progress. The Netherlands dataset was available to us for replication purposes.

Code availability

The code used to analyze the data and to generate the figures in this manuscript can be found at https://github.com/m-nabais/201909_als_dnam_manuscript.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Jian Yang, Ian P. Blair, Allan F. McRae, Naomi R. Wray

Supplementary information

Supplementary information is available for this paper at 10.1038/s41525-020-0118-3.

References

1.Al-Chalabi A, van den Berg LH, Veldink J. Gene discovery in amyotrophic lateral sclerosis: implications for clinical management. Nat. Rev. Neurol. 2017;13:96–104. doi: 10.1038/nrneurol.2016.182. [DOI] [PubMed] [Google Scholar]
2.Veldink JH. ALS genetic epidemiology ‘How simplex is the genetic epidemiology of ALS?’. J. Neurol. Neurosurg. Psychiatry. 2017;88:537. doi: 10.1136/jnnp-2016-315469. [DOI] [PubMed] [Google Scholar]
3.van Rheenen W, et al. Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis. Nat. Genet. 2016;48:1043–1048. doi: 10.1038/ng.3622. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Benyamin B, et al. Cross-ethnic meta-analysis identifies association of the GPX3-TNIP1 locus with amyotrophic lateral sclerosis. Nat. Commun. 2017;8:611. doi: 10.1038/s41467-017-00471-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.White, A. R., Aschner, M., Costa, L. G., Bush, A. I. & Roos, P. M. in Biometals in Neurodegenerative Diseases: Mechanisms and Therapeutics. (eds White, A., Aschner, M., Costa, L. & Bush, A.) Ch. 10, 175–193 (Elsevier Inc., 2017).
6.Beard, J. D. et al. Military service, deployments, and exposures in relation to amyotrophic lateral sclerosis survival. PLoS ONE12, e0185751 (2017). [DOI] [PMC free article] [PubMed]
7.Seals, R. M., Kioumourtzoglou, M. A., Hansen, J., Gredal, O., Weisskopf, M. G. Amyotrophic Lateral Sclerosis and the Military: A Population-based Study in theDanish Registries. Epidemiology27, 188–93 (2016). [DOI] [PMC free article] [PubMed]
8.Belbasis L, Bellou V, Evangelou E. Environmental risk factors and amyotrophic lateral sclerosis: an umbrella review and critical assessment of current evidence from systematic reviews and meta-analyses of observational studies. Neuroepidemiology. 2016;46:96–105. doi: 10.1159/000443146. [DOI] [PubMed] [Google Scholar]
9.Al-Chalabi A, et al. Genetic and epigenetic studies of amyotrophic lateral sclerosis. Amyotroph. Lateral Scler. Frontotemporal Degener. 2013;14:44–52. doi: 10.3109/21678421.2013.778571. [DOI] [PubMed] [Google Scholar]
10.Al-Mahdawi, S., Anjomani Virmouni, S. & Pook, M. A. in Epigenetic Biomarkers and Diagnostics (ed. García-Giménez, J. L.) Ch. 20, 401–415 (Academic Press, 2016).
11.Hwang J-Y, Aromolaran KA, Zukin RS. The emerging field of epigenetics in neurodegeneration and neuroprotection. Nat. Rev. Neurosci. 2017;18:347. doi: 10.1038/nrn.2017.46. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Martin LJ, Wong M. Aberrant regulation of DNA methylation in amyotrophic lateral sclerosis: a new target of disease mechanisms. Neurotherapeutics. 2013;10:722–733. doi: 10.1007/s13311-013-0205-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Figueroa-Romero C, et al. Identification of epigenetically altered genes in sporadic amyotrophic lateral sclerosis. PLoS ONE. 2012;7:e52672. doi: 10.1371/journal.pone.0052672. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Michels KB, et al. Recommendations for the design and analysis of epigenome-wide association studies. Nat. Methods. 2013;10:949. doi: 10.1038/nmeth.2632. [DOI] [PubMed] [Google Scholar]
15.Houseman EA, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012;13:86. doi: 10.1186/1471-2105-13-86. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Jaffe AE, Irizarry RA. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 2014;15:R31. doi: 10.1186/gb-2014-15-2-r31. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14:3156. doi: 10.1186/gb-2013-14-10-r115. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Zhang F, et al. OSCA: a tool for omic-data-based complex trait analysis. Genome Biol. 2019;20:107. doi: 10.1186/s13059-019-1718-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Dahl A, Guillemot V, Mefford J, Aschard H, Zaitlen N. Adjusting for principal components of molecular phenotypes induces replicating false positives. Genetics. 2019;211:1179. doi: 10.1534/genetics.118.301768. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Kang HM, et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 2010;42:348. doi: 10.1038/ng.548. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 2014;46:100–106. doi: 10.1038/ng.2876. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.McCombe PA, Henderson RD. The Role of immune and inflammatory mechanisms in ALS. Curr. Mol. Med. 2011;11:246–254. doi: 10.2174/156652411795243450. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Amor, S., Puentes, F., Baker, D., van der Valk, P. Inflammation in neurodegenerative diseases. Immunology 129, 154–169 (2010). [DOI] [PMC free article] [PubMed]
24.Yang J, et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 2010;42:565–569. doi: 10.1038/ng.608. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Qi T, et al. Identifying gene targets for brain-related traits using transcriptomic and methylomic data from blood. Nat. Commun. 2018;9:2282. doi: 10.1038/s41467-018-04558-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Nicolas A, et al. Genome-wide analyses identify KIF5A as a novel ALS gene. Neuron. 2018;97:1268–1283.e1266. doi: 10.1016/j.neuron.2018.02.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Shah S, et al. Improving phenotypic prediction by combining genetic and epigenetic associations. Am. J. Hum. Genet. 2015;97:75–85. doi: 10.1016/j.ajhg.2015.05.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.Visscher PM, et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
29.McCartney DL, et al. Epigenetic prediction of complex traits and death. Genome Biol. 2018;19:136. doi: 10.1186/s13059-018-1514-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Bowerman M, et al. Neuroimmunity dynamics and the development of therapeutic strategies for amyotrophic lateral sclerosis. Front. Cell. Neurosci. 2013;7:214. doi: 10.3389/fncel.2013.00214. [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Gustafson MP, et al. Comprehensive immune profiling reveals substantial immune system alterations in a subset of patients with amyotrophic lateral sclerosis. PLoS ONE. 2017;12:e0182002. doi: 10.1371/journal.pone.0182002. [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Murdock BJ, et al. Increased ratio of circulating neutrophils to monocytes in amyotrophic lateral sclerosis. Neurol. Neuroimmunol. Neuroinflamm. 2016;3:e242–e242. doi: 10.1212/NXI.0000000000000242. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Murdock BJ, et al. Correlation of peripheral immunity with rapid amyotrophic lateral sclerosis progression. JAMA Neurol. 2017;74:1446–1454. doi: 10.1001/jamaneurol.2017.2255. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Byun H-M, et al. Epigenetic profiling of somatic tissues from human autopsy specimens identifies tissue- and individual-specific DNA methylation patterns. Hum. Mol. Genet. 2009;18:4808–4817. doi: 10.1093/hmg/ddp445. [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Davies MN, et al. Functional annotation of the human brain methylome identifies tissue-specific epigenetic variation across brain and blood. Genome Biol. 2012;13:R43. doi: 10.1186/gb-2012-13-6-r43. [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Slieker RC, et al. Identification and systematic annotation of tissue-specific differentially methylated regions using the Illumina 450k array. Epigenet. Chromatin. 2013;6:26. doi: 10.1186/1756-8935-6-26. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Hannon E, Lunnon K, Schalkwyk L, Mill J. Interindividual methylomic variation across blood, cortex, and cerebellum: implications for epigenetic studies of neurological and neuropsychiatric phenotypes. Epigenetics. 2015;10:1024–1032. doi: 10.1080/15592294.2015.1100786. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Brooks BR, Miller RG, Swash M, Munsat TL. El Escorial revisited: revised criteria for the diagnosis of amyotrophic lateral sclerosis. Amyotroph. Lateral Scler. Other Mot. Neuron Disord. 2000;1:293–299. doi: 10.1080/146608200300079536. [DOI] [PubMed] [Google Scholar]
39.Sachdev PS, et al. A comprehensive neuropsychiatric study of elderly twins: The Older Australian Twins Study. Twin Res. Hum. Genet. 2009;12:573–582. doi: 10.1375/twin.12.6.573. [DOI] [PubMed] [Google Scholar]
40.Zhang Q, et al. Improved precision of epigenetic clock estimates across tissues and its implication for biological ageing. Genome Med. 2019;11:54. doi: 10.1186/s13073-019-0667-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Van Rheenen W, et al. Project MinE: study design and pilot analyses of a large-scale whole-genome sequencing study in amyotrophic lateral sclerosis. Eur. J. Hum. Genet. 2018;26:1537–1546. doi: 10.1038/s41431-018-0177-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Huisman MHB, et al. Population based epidemiology of amyotrophic lateral sclerosis using capture–recapture methodology. J. Neurol. Neurosurg. Psychiatry. 2011;82:1165. doi: 10.1136/jnnp.2011.244939. [DOI] [PubMed] [Google Scholar]
43.Min JL, Hemani G, Davey Smith G, Relton C, Suderman M. Meffil: efficient normalization and analysis of very large DNA methylation datasets. Bioinformatics. 2018 doi: 10.1093/bioinformatics/bty476. [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Fortin J-P, et al. Functional normalization of 450k methylation array data improves replication in large cancer studies. Genome Biol. 2014;15:503. doi: 10.1186/s13059-014-0503-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
45.Zhou W, Laird PW, Shen H. Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Res. 2017;45:e22–e22. doi: 10.1093/nar/gkw967. [DOI] [PMC free article] [PubMed] [Google Scholar]
46.Robin X, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. doi: 10.1186/1471-2105-12-77. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information^{(4.4MB, pdf)}

Reporting summary^{(1.3MB, pdf)}

Data Availability Statement

The code used to analyze the data and to generate the figures in this manuscript can be found at https://github.com/m-nabais/201909_als_dnam_manuscript.

[CR1] 1.Al-Chalabi A, van den Berg LH, Veldink J. Gene discovery in amyotrophic lateral sclerosis: implications for clinical management. Nat. Rev. Neurol. 2017;13:96–104. doi: 10.1038/nrneurol.2016.182. [DOI] [PubMed] [Google Scholar]

[CR2] 2.Veldink JH. ALS genetic epidemiology ‘How simplex is the genetic epidemiology of ALS?’. J. Neurol. Neurosurg. Psychiatry. 2017;88:537. doi: 10.1136/jnnp-2016-315469. [DOI] [PubMed] [Google Scholar]

[CR3] 3.van Rheenen W, et al. Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis. Nat. Genet. 2016;48:1043–1048. doi: 10.1038/ng.3622. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] 4.Benyamin B, et al. Cross-ethnic meta-analysis identifies association of the GPX3-TNIP1 locus with amyotrophic lateral sclerosis. Nat. Commun. 2017;8:611. doi: 10.1038/s41467-017-00471-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR5] 5.White, A. R., Aschner, M., Costa, L. G., Bush, A. I. & Roos, P. M. in Biometals in Neurodegenerative Diseases: Mechanisms and Therapeutics. (eds White, A., Aschner, M., Costa, L. & Bush, A.) Ch. 10, 175–193 (Elsevier Inc., 2017).

[CR6] 6.Beard, J. D. et al. Military service, deployments, and exposures in relation to amyotrophic lateral sclerosis survival. PLoS ONE12, e0185751 (2017). [DOI] [PMC free article] [PubMed]

[CR7] 7.Seals, R. M., Kioumourtzoglou, M. A., Hansen, J., Gredal, O., Weisskopf, M. G. Amyotrophic Lateral Sclerosis and the Military: A Population-based Study in theDanish Registries. Epidemiology27, 188–93 (2016). [DOI] [PMC free article] [PubMed]

[CR8] 8.Belbasis L, Bellou V, Evangelou E. Environmental risk factors and amyotrophic lateral sclerosis: an umbrella review and critical assessment of current evidence from systematic reviews and meta-analyses of observational studies. Neuroepidemiology. 2016;46:96–105. doi: 10.1159/000443146. [DOI] [PubMed] [Google Scholar]

[CR9] 9.Al-Chalabi A, et al. Genetic and epigenetic studies of amyotrophic lateral sclerosis. Amyotroph. Lateral Scler. Frontotemporal Degener. 2013;14:44–52. doi: 10.3109/21678421.2013.778571. [DOI] [PubMed] [Google Scholar]

[CR10] 10.Al-Mahdawi, S., Anjomani Virmouni, S. & Pook, M. A. in Epigenetic Biomarkers and Diagnostics (ed. García-Giménez, J. L.) Ch. 20, 401–415 (Academic Press, 2016).

[CR11] 11.Hwang J-Y, Aromolaran KA, Zukin RS. The emerging field of epigenetics in neurodegeneration and neuroprotection. Nat. Rev. Neurosci. 2017;18:347. doi: 10.1038/nrn.2017.46. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Martin LJ, Wong M. Aberrant regulation of DNA methylation in amyotrophic lateral sclerosis: a new target of disease mechanisms. Neurotherapeutics. 2013;10:722–733. doi: 10.1007/s13311-013-0205-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Figueroa-Romero C, et al. Identification of epigenetically altered genes in sporadic amyotrophic lateral sclerosis. PLoS ONE. 2012;7:e52672. doi: 10.1371/journal.pone.0052672. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] 14.Michels KB, et al. Recommendations for the design and analysis of epigenome-wide association studies. Nat. Methods. 2013;10:949. doi: 10.1038/nmeth.2632. [DOI] [PubMed] [Google Scholar]

[CR15] 15.Houseman EA, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012;13:86. doi: 10.1186/1471-2105-13-86. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR16] 16.Jaffe AE, Irizarry RA. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 2014;15:R31. doi: 10.1186/gb-2014-15-2-r31. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14:3156. doi: 10.1186/gb-2013-14-10-r115. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR18] 18.Zhang F, et al. OSCA: a tool for omic-data-based complex trait analysis. Genome Biol. 2019;20:107. doi: 10.1186/s13059-019-1718-z. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] 19.Dahl A, Guillemot V, Mefford J, Aschard H, Zaitlen N. Adjusting for principal components of molecular phenotypes induces replicating false positives. Genetics. 2019;211:1179. doi: 10.1534/genetics.118.301768. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR20] 20.Kang HM, et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 2010;42:348. doi: 10.1038/ng.548. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] 21.Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 2014;46:100–106. doi: 10.1038/ng.2876. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.McCombe PA, Henderson RD. The Role of immune and inflammatory mechanisms in ALS. Curr. Mol. Med. 2011;11:246–254. doi: 10.2174/156652411795243450. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR23] 23.Amor, S., Puentes, F., Baker, D., van der Valk, P. Inflammation in neurodegenerative diseases. Immunology 129, 154–169 (2010). [DOI] [PMC free article] [PubMed]

[CR24] 24.Yang J, et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 2010;42:565–569. doi: 10.1038/ng.608. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR25] 25.Qi T, et al. Identifying gene targets for brain-related traits using transcriptomic and methylomic data from blood. Nat. Commun. 2018;9:2282. doi: 10.1038/s41467-018-04558-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR26] 26.Nicolas A, et al. Genome-wide analyses identify KIF5A as a novel ALS gene. Neuron. 2018;97:1268–1283.e1266. doi: 10.1016/j.neuron.2018.02.027. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Shah S, et al. Improving phenotypic prediction by combining genetic and epigenetic associations. Am. J. Hum. Genet. 2015;97:75–85. doi: 10.1016/j.ajhg.2015.05.014. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR28] 28.Visscher PM, et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR29] 29.McCartney DL, et al. Epigenetic prediction of complex traits and death. Genome Biol. 2018;19:136. doi: 10.1186/s13059-018-1514-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR30] 30.Bowerman M, et al. Neuroimmunity dynamics and the development of therapeutic strategies for amyotrophic lateral sclerosis. Front. Cell. Neurosci. 2013;7:214. doi: 10.3389/fncel.2013.00214. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR31] 31.Gustafson MP, et al. Comprehensive immune profiling reveals substantial immune system alterations in a subset of patients with amyotrophic lateral sclerosis. PLoS ONE. 2017;12:e0182002. doi: 10.1371/journal.pone.0182002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR32] 32.Murdock BJ, et al. Increased ratio of circulating neutrophils to monocytes in amyotrophic lateral sclerosis. Neurol. Neuroimmunol. Neuroinflamm. 2016;3:e242–e242. doi: 10.1212/NXI.0000000000000242. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR33] 33.Murdock BJ, et al. Correlation of peripheral immunity with rapid amyotrophic lateral sclerosis progression. JAMA Neurol. 2017;74:1446–1454. doi: 10.1001/jamaneurol.2017.2255. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR34] 34.Byun H-M, et al. Epigenetic profiling of somatic tissues from human autopsy specimens identifies tissue- and individual-specific DNA methylation patterns. Hum. Mol. Genet. 2009;18:4808–4817. doi: 10.1093/hmg/ddp445. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR35] 35.Davies MN, et al. Functional annotation of the human brain methylome identifies tissue-specific epigenetic variation across brain and blood. Genome Biol. 2012;13:R43. doi: 10.1186/gb-2012-13-6-r43. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR36] 36.Slieker RC, et al. Identification and systematic annotation of tissue-specific differentially methylated regions using the Illumina 450k array. Epigenet. Chromatin. 2013;6:26. doi: 10.1186/1756-8935-6-26. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR37] 37.Hannon E, Lunnon K, Schalkwyk L, Mill J. Interindividual methylomic variation across blood, cortex, and cerebellum: implications for epigenetic studies of neurological and neuropsychiatric phenotypes. Epigenetics. 2015;10:1024–1032. doi: 10.1080/15592294.2015.1100786. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR38] 38.Brooks BR, Miller RG, Swash M, Munsat TL. El Escorial revisited: revised criteria for the diagnosis of amyotrophic lateral sclerosis. Amyotroph. Lateral Scler. Other Mot. Neuron Disord. 2000;1:293–299. doi: 10.1080/146608200300079536. [DOI] [PubMed] [Google Scholar]

[CR39] 39.Sachdev PS, et al. A comprehensive neuropsychiatric study of elderly twins: The Older Australian Twins Study. Twin Res. Hum. Genet. 2009;12:573–582. doi: 10.1375/twin.12.6.573. [DOI] [PubMed] [Google Scholar]

[CR40] 40.Zhang Q, et al. Improved precision of epigenetic clock estimates across tissues and its implication for biological ageing. Genome Med. 2019;11:54. doi: 10.1186/s13073-019-0667-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR41] 41.Van Rheenen W, et al. Project MinE: study design and pilot analyses of a large-scale whole-genome sequencing study in amyotrophic lateral sclerosis. Eur. J. Hum. Genet. 2018;26:1537–1546. doi: 10.1038/s41431-018-0177-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR42] 42.Huisman MHB, et al. Population based epidemiology of amyotrophic lateral sclerosis using capture–recapture methodology. J. Neurol. Neurosurg. Psychiatry. 2011;82:1165. doi: 10.1136/jnnp.2011.244939. [DOI] [PubMed] [Google Scholar]

[CR43] 43.Min JL, Hemani G, Davey Smith G, Relton C, Suderman M. Meffil: efficient normalization and analysis of very large DNA methylation datasets. Bioinformatics. 2018 doi: 10.1093/bioinformatics/bty476. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR44] 44.Fortin J-P, et al. Functional normalization of 450k methylation array data improves replication in large cancer studies. Genome Biol. 2014;15:503. doi: 10.1186/s13059-014-0503-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR45] 45.Zhou W, Laird PW, Shen H. Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Res. 2017;45:e22–e22. doi: 10.1093/nar/gkw967. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR46] 46.Robin X, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. doi: 10.1186/1471-2105-12-77. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Significant out-of-sample classification from methylation profile scoring for amyotrophic lateral sclerosis

Marta F Nabais

Tian Lin

Beben Benyamin

Kelly L Williams

Fleur C Garton

Anna A E Vinkhuyzen

Futao Zhang

Costanza L Vallerga

Restuadi Restuadi

Anna Freydenzon

Ramona A J Zwamborn

Paul J Hop

Matthew R Robinson

Jacob Gratten

Peter M Visscher

Eilis Hannon

Jonathan Mill

Matthew A Brown

Nigel G Laing

Karen A Mather

Perminder S Sachdev

Shyuan T Ngo

Frederik J Steyn

Leanne Wallace

Anjali K Henders

Merrilee Needham

Jan H Veldink

Susan Mathers

Garth Nicholson

Dominic B Rowe

Robert D Henderson

Pamela A McCombe

Roger Pamphlett

Jian Yang

Ian P Blair

Allan F McRae

Naomi R Wray

Abstract

Introduction

Results

Differences in predicted cell type proportions between ALS cases and controls

Fig. 1. Predicted proportions of cell types estimated with the Houseman algorithm, which are based on methylation values of purified cell types from a whole-blood sample.

DNA MWAS analysis

Fig. 2. Manhattan and QQ plots of MWAS using linear or mixed linear regression models, for the Australian ALS samples (Ncases = 613 and Ncontrols = 782).

Table 1.

Fig. 3. MWAS results from MOA and MOMENT show high correlation of effect sizes in the Australian ALS cohort.

Proportion of variance associated with genome-wide DNAm sites

Table 2.

Out-of-sample classification using DNA methylation profiles scores (MPS)

Fig. 4. Maximum AUC given by the different methods used to calculate MPS, when classifying from the Australian (Ncases = 613, Ncontrols = 782) to the Netherlands (Ncases = 1159 and Ncontrols = 637) ALS cohort.

Discussion

Methods

Datasets description

DNA methylation data

QC and normalization of DNA methylation data

Omics residual maximum likelihood analyses (OREML): variance captured by all DNAm sites

Linear regression DNA MWAS analysis

MLM-based omics association (MOA) and multi-component MLM-based omics association excluding the target (MOMENT) MWAS analyses

Out-of-sample classification using DNA MPSs

Reporting summary

Supplementary information

Acknowledgements

Author contributions

Data availability

Code availability

Competing interests

Footnotes

Supplementary information

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

Fig. 2. Manhattan and QQ plots of MWAS using linear or mixed linear regression models, for the Australian ALS samples (N_cases = 613 and N_controls = 782).

Fig. 4. Maximum AUC given by the different methods used to calculate MPS, when classifying from the Australian (N_cases = 613, N_controls = 782) to the Netherlands (N_cases = 1159 and N_controls = 637) ALS cohort.