Abstract
Genetic generalized epilepsies (GGE) are, as their name implies, genetically determined; they are clinically characterized by generalized seizures involving both sides of the brain in the absence of detectable brain lesions or other known causes. GGEs are nevertheless complex and are influenced by many different genetic and environmental factors. Methylation-specific epigenetic marks are one of the players in the complex epileptogenesis scenario leading to GGE. In this study, we set out to perform genome-wide methylation profiling of GGE trios, each consisting of an affected parent-offspring pair along with an unaffected parent. We developed a novel scoring scheme within trios to categorize each analyzed locus as hypo- or hypermethylated. This stringent approach classified differentially methylated genes in each trio and helped us produce trio-specific and pooled gene lists with inherited and aberrant methylation levels. To analyze the methylation differences from a broader perspective, we performed enrichment analysis with these lists using the PANOGA software. This collective effort led us to detect pathways associated with the GGE phenotype, including the neurotrophin signaling pathway. We have demonstrated a trio-based approach to genome-wide DNA methylation analysis that identified individual and possibly minor changes in methylation marks that could be involved in epileptogenesis leading to GGE.
Introduction
Genetic generalized epilepsies (GGE; also known as idiopathic generalized epilepsy) comprise a distinct subgroup of epilepsy syndromes: GGEs are genetically determined and are clinically characterized by generalized seizures involving both sides of the brain in the absence of detectable brain lesions or other known etiologies [1]. GGEs account for almost 20% of all epilepsies, which in turn affect up to 0.2% of the general population [2]. Childhood and juvenile absence epilepsy (CAE and JAE), juvenile myoclonic epilepsy (JME) and epilepsy with generalized tonic–clonic seizures (GTCS) represent the most common GGE syndromes [3].
The genetic etiology of GGE remains elusive in most cases despite the presumed presence of strong genetic factors. Even when the underlying genetic aberration is known, variable expressivity and reduced penetrance are common among patients [4]. This can partly be attributed to the complex determinants of epileptogenesis leading to GGE. Aberrant epigenetic modifications have recently emerged as key pathogenic or predisposing factors in various neurological diseases, including epilepsy [5]. Epigenetics refers to a broad range of mechanisms that ultimately lead to the establishment of heritable and potentially reversible chromatin marks without actual base alterations in the DNA sequence [6]. This, in turn, affects every aspect of cellular and organismal plasticity and dynamics via spatial and temporal control of gene expression. Among all epigenetic mechanisms, DNA methylation is a well-studied modification implicated in various models of epilepsy [7]. The process involves covalent addition of a methyl group to the cytosine base of a palindromic Cytosine-phosphate-Guanine (CpG) dinucleotide to form 5-methylcytosine.
The epigenome is regulated by both inherited and environmental factors. Inherited aberrations in epigenetic marks, due either to on-site sequence alterations or to changes in the level or activity of trans-acting elements, can alter the epigenome, resulting in changes at the transcriptional level [8]. Such inherited changes can be investigated in readily accessible tissues including blood, which can serve as a feasible source for detecting epigenetic biomarkers related to epilepsy. Likewise, blood-based changes at the pathway level might point to biological processes distorted in epilepsy. We therefore set out to investigate differential DNA methylation patterns in 15 trios with inherited GGE in order to dissect the inherited alterations in blood-based methylation signatures associated with epilepsy. We then conducted gene set enrichment analysis for these trio-based changes to map pathways related to epilepsy.
Materials and methods
Patients and genome-wide DNA methylation measures
Genome-wide DNA methylation was profiled in whole-blood samples obtained from 15 discrete trios, each comprising an affected child, an affected parent and an unaffected parent. The trios had been followed up between 2003 and 2012 at the Istanbul University Epilepsy Center (EPIMER). All patients underwent full clinical, neuroradiological and electroencephalography (EEG) examinations. Informed consent was obtained from all individuals in accordance with the ethics approval obtained for the study from the Ethics Committee of Istanbul Medical Faculty (2009/2851). Clinical information regarding the epilepsy phenotype, along with age-of-onset data and degree of consanguinity in each trio, is presented in S1 Table.
Whole-blood genomic DNA (150 ng) was treated with sodium bisulphite using the EZ DNA Methylation Gold Kit (Zymo Research, Irvine, CA), following DNA isolation with the QIAamp DNA maxi kit. Genome-wide DNA methylation levels were assayed in 45 samples distributed across 6 different Illumina Infinium HumanMethylation450 BeadChips (Illumina, San Diego, CA, USA), following the Illumina Infinium HD Methylation protocol. This chip covers 485,577 CpG sites per sample at single-nucleotide resolution. The CpG sites are distributed mainly within or neighboring genes, including promoter regions, each represented as two distinct blocks of 200 bp and 1500 bp upstream of the transcription start site (designated TSS200 and TSS1500, respectively) [9]. The DNA methylation data produced in this study have been deposited in the NCBI GEO database (Accession Number: GSE119684).
Data processing
Raw intensity (.idat) files were read into the RnBeads package in order to perform (i) initial quality control assessments for filtering out poor-performing samples and probes, (ii) normalization and (iii) detection of batch effects across chips. Normalization was performed using the Methylumi package for background correction and Subset-quantile Within Array Normalization (SWAN) for array normalization [10–12]. The normalized fraction of DNA methylation at a specific CpG site was calculated as beta value = M/(M+U+100), where M and U are the methylated and unmethylated signal intensities, respectively. Probes represented in the Illumina HumanMethylation450 v1.2 manifest file were then filtered out if they had an “rs” identifier, were located on sex chromosomes, or had unreliable measurements (Greedycut algorithm; S1 Fig).
Three individuals from two trios (Trio 4 and 7) did not pass the required quality assessment due to failure in the bisulphite conversion step, so these two trios were completely removed from downstream analyses, leaving 13 GGE trios (S2 Fig).
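As a minimal illustration of the beta value defined above, the following Python sketch reads the formula as M divided by (M + U + 100). The function name and the example intensities are hypothetical and are not taken from the study data or from RnBeads.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Return M / (M + U + offset) for a single CpG probe.

    The offset of 100 follows the formula in the text; it keeps the ratio
    stable when both signal intensities are close to zero.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("signal intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


# Hypothetical intensities, for illustration only.
print(beta_value(900.0, 0.0))    # strongly methylated probe
print(beta_value(450.0, 450.0))  # intermediate methylation
```

Because of the offset, a probe never reaches a beta value of exactly 1, even when the unmethylated signal is zero.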
Differential methylation analysis (DMA)
Two distinct DMA approaches, dual and trio-based, were performed in order to elucidate methylation patterns that may be associated with the GGE phenotype. In the dual approach, the data from all affected samples and from all unaffected samples were pooled separately and compared with each other as two distinct phenotype groups. In the trio-based approach, by contrast, the data from the two affected samples of one trio were combined and compared against the unaffected sample of that trio. The data for the 13 trios that passed initial quality control were then analyzed with the RnBeads package.
Trio-based DMA for enrichment analysis: Implementation of PANOGA using a trio-based scoring scheme
We conducted a trio-oriented pathway approach for clustering differentially methylated genes (DMG). The genome-wide beta value of each CpG site was marked as hypo- or hyper-methylated: a beta value between 0.0 and 0.2 indicates hypo-methylation and a beta value between 0.8 and 1.0 indicates hyper-methylation [13]. A particular CpG site is considered differentially methylated whenever the methylation status differs between the unaffected parent and the affected child-parent pair in a given trio (Table 1). Under this rule, if both members of the affected pair within a trio are hypo-methylated, the unaffected parent must be hyper-methylated, and vice versa. We first formed a list of CpG loci that fit this rule and then scored each CpG locus in this list using the scoring scheme (trio-based scoring scheme: TbSSch) presented in Table 1. We then annotated each CpG locus with its relevant gene in order to generate 13 final lists of trio-based DMGs with our custom TbSSch. If a gene was associated with more than one marked CpG locus, the maximum of the calculated TbSSch values was used in the list.
Table 1. Trio-based scoring scheme (TbSSch) calculations for the selected CpG loci.
Methylation status of child | Methylation status of affected parent | Methylation status of unaffected parent | Calculate TbSSch as
---|---|---|---
hyper | hyper | hypo | [(C-0.5)+(PA-0.5)+(0.5-PU)]/1.5
hypo | hypo | hyper | [(0.5-C)+(0.5-PA)+(PU-0.5)]/1.5
Beta values are denoted C for the affected child, PA for the affected parent and PU for the unaffected parent.
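The scoring rule in Table 1 can be sketched in Python as follows. This is an illustrative reading of the scheme, not the authors' code: the function names are hypothetical, and treating the 0.2 and 0.8 thresholds as inclusive is an assumption.

```python
from typing import Dict, Iterable, Optional, Tuple

HYPO_MAX = 0.2   # beta in [0.0, 0.2] counts as hypo-methylated
HYPER_MIN = 0.8  # beta in [0.8, 1.0] counts as hyper-methylated


def status(beta: float) -> Optional[str]:
    """Classify one beta value; inclusive bounds are an assumption."""
    if 0.0 <= beta <= HYPO_MAX:
        return "hypo"
    if HYPER_MIN <= beta <= 1.0:
        return "hyper"
    return None  # intermediate methylation is not scored


def tbssch(c: float, pa: float, pu: float) -> Optional[float]:
    """Score one CpG locus from the beta values of the affected child (c),
    affected parent (pa) and unaffected parent (pu). Returns None when the
    locus matches neither row of Table 1."""
    pattern = (status(c), status(pa), status(pu))
    if pattern == ("hyper", "hyper", "hypo"):
        return ((c - 0.5) + (pa - 0.5) + (0.5 - pu)) / 1.5
    if pattern == ("hypo", "hypo", "hyper"):
        return ((0.5 - c) + (0.5 - pa) + (pu - 0.5)) / 1.5
    return None


def gene_scores(loci: Iterable[Tuple[str, float, float, float]]) -> Dict[str, float]:
    """Collapse (gene, c, pa, pu) records to the maximum score per gene."""
    best: Dict[str, float] = {}
    for gene, c, pa, pu in loci:
        score = tbssch(c, pa, pu)
        if score is not None:
            best[gene] = max(score, best.get(gene, score))
    return best
```

With these thresholds, each of the three terms lies between 0.3 and 0.5, so a qualifying locus scores between 0.6 and 1.0; the maximum of 1.0 is reached only when the affected pair sits at one extreme of the beta scale and the unaffected parent at the other.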
Additionally, a single pooled list of DMGs, namely the ‘family pool list’, was produced, which included genes that appear in more than one trio-based DMG list. The maximum TbSSch was again assigned to each gene in the family pool list for sorting purposes. The trio-based DMA presented herein was also repeated for probes residing only at TSS1500 and TSS200 locations in order to assess differential promoter methylation. To sum up, we produced gene lists for 13 individual trios and a single family pool list for both genome-wide and promoter-specific differential methylation patterns. These lists of DMGs, along with their p values, were given as input to PANOGA (Pathway and Network Oriented GWAS Analysis) to identify pathways with proteins that are significantly altered for each epilepsy group.
PANOGA is a protocol originally designed for the identification of SNP-targeted pathways from genome-wide association study (GWAS) data [14]. Herein, we applied PANOGA to our genome-wide DNA methylation dataset and accordingly performed enrichment analysis using genes associated with differentially methylated CpG loci instead of SNP-targeted genes.
Data analyses were conducted using the default PANOGA pipeline (for details, see [14, 15]). Briefly, PANOGA first searches for active sub-networks containing most of the disease-affected proteins in the human protein-protein interaction (PPI) network [16]. The Active Modules algorithm [17] is employed to identify the sub-networks, taking into account the P-value of each gene together with the network topology, in order to extract potentially meaningful active sub-networks that overlap at most 50% with each other. The next step is to evaluate whether these sub-networks are biologically meaningful. For each sub-network, PANOGA computes the number of genes in the sub-network that are also found in a specific human biochemical pathway, compared to the overall number of genes described for that pathway. In this functional enrichment step, PANOGA uses a two-sided (enrichment/depletion) test based on the hypergeometric distribution to examine the association between the genes submitted to PANOGA and the genes found in each Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway [18]. To correct for multiple testing, the Bonferroni procedure was applied to the p-value of each identified pathway. If a KEGG pathway is statistically significant for at least one of the active sub-networks, PANOGA adds this pathway to the final list of significant KEGG pathways associated with the disease. If a pathway appears in more than one sub-network analysis, only the most significant result is reported.
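The construction of the family pool list can be sketched as below. This is a hedged illustration, assuming each trio list is a mapping from gene symbol to its TbSSch; the function name, trio labels, gene labels and scores are all hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, List


def family_pool(trio_lists: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Keep genes present in more than one trio list and assign each the
    maximum score observed across the trios in which it appears."""
    scores_by_gene: Dict[str, List[float]] = defaultdict(list)
    for gene_scores in trio_lists.values():
        for gene, score in gene_scores.items():
            scores_by_gene[gene].append(score)
    return {
        gene: max(scores)
        for gene, scores in scores_by_gene.items()
        if len(scores) > 1  # gene must occur in at least two trios
    }


# Placeholder trio and gene labels with made-up scores.
example = {
    "trio_a": {"GENE_1": 0.70, "GENE_2": 0.90},
    "trio_b": {"GENE_1": 0.85},
    "trio_c": {"GENE_3": 0.65},
}
print(family_pool(example))  # only GENE_1 occurs in two trios
```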
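To make the functional enrichment step more concrete, the following Python sketch computes a hypergeometric p-value for the overlap between a sub-network and a pathway, followed by a Bonferroni correction. It is a simplification and not the PANOGA implementation: only the upper (enrichment) tail is computed rather than the two-sided test described above, and all function and parameter names are hypothetical.

```python
from math import comb
from typing import List


def enrichment_p(overlap: int, subnet_size: int, pathway_size: int, background: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of pathway genes found when subnet_size genes are drawn
    without replacement from a background of `background` genes, of which
    pathway_size belong to the pathway.
    """
    if not (0 <= pathway_size <= background and 0 <= subnet_size <= background):
        raise ValueError("pathway and sub-network sizes must fit in the background")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    total = comb(background, subnet_size)
    upper = min(subnet_size, pathway_size)
    favourable = sum(
        comb(pathway_size, i) * comb(background - pathway_size, subnet_size - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


def bonferroni(p_values: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, `enrichment_p(5, 5, 5, 10)` is the probability that all five drawn genes belong to a five-gene pathway in a ten-gene background, which equals 1/252.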
Dual DMA: Comparison of the epileptic versus healthy groups for KEGG annotation
The dual DMA approach focuses on identifying any methylation mark specific to the pooled epileptic group, using the pooled healthy group as a control. For this analysis, a DNA methylation heatmap was first constructed using unsupervised hierarchical clustering of the CpG sites with the highest variance in methylation across all samples. Then the correlation between the average methylation values of the epileptic and healthy groups was plotted.
We then used the average beta values to annotate the KEGG pathways identified via the PANOGA analysis. For this purpose, the average beta values for the epileptic and healthy groups were listed for island probes. The probes were mapped to their relevant genes and only pathway-related genes were retained. The average beta value of each gene for the epileptic and healthy groups was rescaled into a range between -1 and 1 using the function y = C+(x-A)(D-C)/(B-A), where A and B stand for the minimum and maximum points of the data, C and D are the extreme points of the new scale, x is the value in our data set and y is the new scaled value. Since beta values range from 0 to 1, we used a larger range, scaling beta values from 0 to 0.5 into -1 to 0 and from 0.5 to 1 into 0 to 1, in order to get a clearer picture of the genes involved in each KEGG pathway. Finally, these values were integrated into the KEGG pathways via Pathview, an R/Bioconductor package for pathway-based data integration and visualization [19]. If a gene has a single color, the methylation range is the same in the epileptic and healthy groups. If a gene has a dual color, the left side represents the methylation status of the epileptic group and the right side that of the healthy group.
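The two preparatory steps of the dual approach, selecting the most variable CpG sites and averaging methylation per group, can be sketched as follows. This is an assumption-laden illustration rather than the RnBeads procedure: the function names, the use of population variance and the sample layout (one list of beta values per CpG site) are all hypothetical choices.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple


def most_variable_sites(betas: Dict[str, List[float]], top_n: int) -> List[str]:
    """Return the top_n CpG identifiers ranked by variance across samples."""
    ranked = sorted(betas, key=lambda site: pvariance(betas[site]), reverse=True)
    return ranked[:top_n]


def group_means(
    betas: Dict[str, List[float]],
    affected: Sequence[int],
    unaffected: Sequence[int],
) -> Dict[str, Tuple[float, float]]:
    """Average beta per site for the affected and unaffected sample indices."""
    return {
        site: (
            mean(values[i] for i in affected),
            mean(values[i] for i in unaffected),
        )
        for site, values in betas.items()
    }


# Hypothetical CpG identifiers and beta values for four samples.
toy = {
    "cg_a": [0.1, 0.9, 0.1, 0.9],
    "cg_b": [0.5, 0.5, 0.5, 0.5],
    "cg_c": [0.4, 0.6, 0.4, 0.6],
}
print(most_variable_sites(toy, 2))
```

The selected sites would then be passed to a hierarchical clustering routine, and the paired group means plotted against each other to inspect their correlation.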
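The rescaling function described above can be written out as a short Python sketch. The function names are hypothetical; the two-piece mapping follows the intervals given in the text.

```python
def rescale(x: float, a: float, b: float, c: float, d: float) -> float:
    """Map x from the interval [a, b] onto [c, d]: y = c + (x - a)(d - c)/(b - a)."""
    if a == b:
        raise ValueError("source interval must have non-zero width")
    return c + (x - a) * (d - c) / (b - a)


def rescale_beta(beta: float) -> float:
    """Map [0, 0.5] onto [-1, 0] and [0.5, 1] onto [0, 1]."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta values must lie between 0 and 1")
    if beta <= 0.5:
        return rescale(beta, 0.0, 0.5, -1.0, 0.0)
    return rescale(beta, 0.5, 1.0, 0.0, 1.0)
```

Note that, with these particular intervals, both pieces reduce to the same linear map y = 2x - 1, so a beta value of 0.5 is sent to 0 from either side and the scaled values can be shown on a diverging color scale centered on zero.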
Results
Herein we present pathway-oriented genome-wide DNA methylation analyses of the 13 GGE trios that met the relevant quality control metrics (S1 Table). We focused on a novel enrichment analysis strategy using a custom-designed TbSSch for each CpG locus in individual trios. Stringent cut-off values for hypomethylation (0–0.2) and hypermethylation (0.8–1) were set initially, and a CpG locus was selected as differentially methylated in a trio if it was hypomethylated in the affected parent-child pair and hypermethylated in the unaffected parent, or vice versa. In this context, analyses were performed for the 13 trios and the family pool using genome-wide and promoter-site CpG loci. The relevant loci were mapped to their associated genes and submitted to the PANOGA protocol to analyze pathway-specific variation. The statistically significant pathways found in the analysis are given in Fig 1. Details of the analysis are given in S2–S5 Tables.
Fig 1 shows the pathways detected by PANOGA in the 13 trios and the family pool in both the genome-wide and promoter-level analyses. Among these pathways, ‘Neurotrophin Signaling Pathway’, ‘Pathways in Cancer’, ‘Focal Adhesion’ and ‘Metabolic Pathway’ appear in both the genome-level and promoter-specific trio lists and were also significant, with a p value below 0.05, in the family pool analyses. The neurotrophin and cancer signaling pathways are the most common pathways; they were detected in all trios analyzed at the genome level. The neurotrophin signaling pathway appears to be the most epilepsy-related pathway. The genes residing in these two KEGG pathways are dual colored according to the rescaled average beta values from island-specific probes in the epileptic and healthy groups (Figs 2 and 3). The dual-colored, and thus differentially methylated, genes between the epileptic and healthy groups in the neurotrophin and cancer signaling pathways are tabulated in S6 Table along with their descriptions and associated phenotypes (ENSEMBL-Biomart). The usual suspects for epilepsy, namely brain-derived neurotrophic factor (BDNF) from the neurotrophin pathway and solute carrier family 2 member 1 (SLC2A1/GLUT1) from the cancer pathway, are highlighted by this approach. These genes, along with other differentially methylated genes, may be biomarkers for epilepsy. KEGG pathway sketches with dual gene coloring for the other pathways are given in S4–S8 Figs.
Discussion
In this study, we analyzed genome-wide DNA methylation profiles of epileptic and healthy individuals derived from vertically inherited parent-offspring trios.
In biologically unsupervised analyses, the phenotypic effects created by genes working together are overlooked. For this reason, some genes in the genome-wide DNA methylation data set may escape identification by DMA algorithms. To address this problem, the gene lists for all thirteen families and the family pool were run through the PANOGA implementation separately. The aim of this analysis was to investigate the pathways shared by the scored and filtered genes. ‘Neurotrophin Signaling Pathway’, ‘Pathways in Cancer’, ‘Focal Adhesion’ and ‘Metabolic Pathway’ were all identified in the trio-based enrichment analysis and were also significant in the family pool analysis (Fig 1). The neurotrophin signaling pathway was one of the most common pathways identified; it was found in all 13 trios with high significance.
Neurotrophins are a set of molecules responsible for the differentiation and survival of neural cells, and their relationship with epilepsy has been shown in many studies [20–22]. KEGG annotation of the methylation profiles produced a set of genes guided by functional information. Shetty et al. emphasized the positive correlation between increased nerve growth factor (NGF) expression and dentate mossy fiber sprouting, which has been associated with mesial temporal lobe epilepsy. Moreover, Adams et al. reported that NGF blockade inhibited seizure activity along with mossy fiber sprouting in the kindling model of epilepsy [21–23]. Our findings showed a decreased methylation level for NGF along with a group of neurotrophic factor molecules (Figs 2 and 3). Using differential methylation analysis alone, without a pathway-oriented approach, it could be challenging to extract these findings from genome-wide DNA methylation data. Therefore, it is important to investigate, in future studies, the genes that PANOGA found to be differentially methylated within these pathways (S4–S8 Figs).
‘Pathways in cancer’ is a constellation of many smaller pathways and gene cascades implicated in DNA damage, proliferation, angiogenesis and apoptosis [24]. Therefore, it may be important to break this pathway down and functionally analyze the resulting sub-pathways and associated genes. Fig 1 illustrates the sub-pathways associated with the large cancer pathway, including focal adhesion and several signaling pathways such as mTOR, PPAR and p53.
In this study, we used a novel approach for clustering genome-wide differentially methylated regions using trio datasets. Differential DNA methylation is only one of the players in the complex scenario leading to the GGE phenotype. It is therefore hard to associate gene-specific minor DNA methylation variation with epilepsy. We used an adapted PANOGA approach in which genes with minor differences in DNA methylation were enriched for the detection of epilepsy-related pathways. The trios were pivotal for this analysis, as very stringent cut-off values could be set for each CpG locus analyzed in a given trio. The inherited DNA methylation marks within each trio were analyzed at the pathway level, and the resulting pathways are potential targets for epilepsy research and therapy. It is also important to evaluate genes and pathways in terms of expression and common genetic variants as well as DNA methylation marks.
Supporting information
Acknowledgments
The authors are grateful to the members of the trios for their participation in this study. We also thank Prof Dr. Thomas Sander and Ann Kathrin for assisting with wet laboratory applications. We would like to acknowledge Prof Dr. Hande Caglayan for her support throughout the project. This work was supported by grants from The Scientific and Technology Research Council of Turkey (TUBITAK), Project Number: IntenC-109S218, and the Scientific Research Projects Coordination Unit of Istanbul University, Project Number: ONAP-11021. Biobanking support was given by the Istanbul Development Agency (Project Number: TR10/15/YNK/0093). OO, EE and SAUI had been fellows of the TUBITAK-IntenC project. The data are available on the NCBI/GEO platform (GSE119684).
Data Availability
All raw data files (idat) and normalized data are available in the GEO-NCBI database (accession number: GSE119684). https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE119684
Funding Statement
This work was supported by grants from The Scientific and Technology Research Council of Turkey (TUBITAK), Project Number: IntenC-109S218, and the Scientific Research Projects Coordination Unit of Istanbul University, Project Number: ONAP-11021. Biobanking support was given by the Istanbul Development Agency (Project Number: TR10/15/YNK/0093). OO, EE and SAUI had been fellows of the TUBITAK-IntenC project.
References
- 1. Scheffer IE, Berkovic S, Capovilla G, Connolly MB, French J, Guilhoto L, et al. ILAE classification of the epilepsies: Position paper of the ILAE Commission for Classification and Terminology. Epilepsia. 2017;58(4):512–21. Epub 2017/03/08. doi:10.1111/epi.13709
- 2. Jallon P, Latour P. Epidemiology of idiopathic generalized epilepsies. Epilepsia. 2005;46 Suppl 9:10–4. doi:10.1111/j.1528-1167.2005.00309.x
- 3. Nordli DR. Idiopathic generalized epilepsies recognized by the International League Against Epilepsy. Epilepsia. 2005;46 Suppl 9:48–56. doi:10.1111/j.1528-1167.2005.00313.x
- 4. Pandolfo M. Pediatric epilepsy genetics. Curr Opin Neurol. 2013;26(2):137–45. doi:10.1097/WCO.0b013e32835f19da
- 5. Kobow K, Blümcke I. The emerging role of DNA methylation in epileptogenesis. Epilepsia. 2012;53 Suppl 9:11–20. doi:10.1111/epi.12031
- 6. Stricker SH, Köferle A, Beck S. From profiles to function in epigenomics. Nat Rev Genet. 2017;18(1):51–66. Epub 2016/11/21. doi:10.1038/nrg.2016.138
- 7. Henshall DC, Kobow K. Epigenetics and Epilepsy. Cold Spring Harb Perspect Med. 2015;5(12). Epub 2015/10/05. doi:10.1101/cshperspect.a022731
- 8. Lemire M, Zaidi SH, Ban M, Ge B, Aïssi D, Germain M, et al. Long-range epigenetic regulation is conferred by genetic variation located at thousands of independent loci. Nat Commun. 2015;6:6326. Epub 2015/02/26. doi:10.1038/ncomms7326
- 9. Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, et al. High density DNA methylation array with single CpG site resolution. Genomics. 2011;98(4):288–95. Epub 2011/08/02. doi:10.1016/j.ygeno.2011.07.007
- 10. Davis SDP, Bilke S, Triche T Jr, Bootwalla M. methylumi: Handle Illumina methylation data. R package version 2.24.1. 2017.
- 11. Maksimovic J, Gordon L, Oshlack A. SWAN: Subset-quantile within array normalization for illumina infinium HumanMethylation450 BeadChips. Genome Biol. 2012;13(6):R44. doi:10.1186/gb-2012-13-6-r44
- 12. Assenov Y, Müller F, Lutsik P, Walter J, Lengauer T, Bock C. Comprehensive analysis of DNA methylation data with RnBeads. Nat Methods. 2014;11(11):1138–40. doi:10.1038/nmeth.3115
- 13. Wang F, Xu H, Zhao H, Gelernter J, Zhang H. DNA co-methylation modules in postmortem prefrontal cortex tissues of European Australians with alcohol use disorders. Sci Rep. 2016;6:19430. Epub 2016/01/14. doi:10.1038/srep19430
- 14. Bakir-Gungor B, Egemen E, Sezerman OU. PANOGA: a web server for identification of SNP-targeted pathways from genome-wide association study data. Bioinformatics. 2014;30(9):1287–9. doi:10.1093/bioinformatics/btt743
- 15. Bakir-Gungor B, Sezerman OU. A new methodology to associate SNPs with human diseases according to their pathway related context. PLoS One. 2011;6(10):e26277. doi:10.1371/journal.pone.0026277
- 16. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabási AL. The human disease network. Proc Natl Acad Sci U S A. 2007;104(21):8685–90. doi:10.1073/pnas.0701361104
- 17. Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics. 2002;18 Suppl 1:S233–40.
- 18. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30.
- 19. Luo W, Brouwer C. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics. 2013;29(14):1830–1. Epub 2013/06/04. doi:10.1093/bioinformatics/btt285
- 20. Kandratavicius L, Monteiro MR, Assirati JA, Carlotti CG, Hallak JE, Leite JP. Neurotrophins in mesial temporal lobe epilepsy with and without psychiatric comorbidities. J Neuropathol Exp Neurol. 2013;72(11):1029–42. doi:10.1097/NEN.0000000000000002
- 21. Adams B, Sazgar M, Osehobo P, Van der Zee CE, Diamond J, Fahnestock M, et al. Nerve growth factor accelerates seizure development, enhances mossy fiber sprouting, and attenuates seizure-induced decreases in neuronal density in the kindling model of epilepsy. J Neurosci. 1997;17(14):5288–96.
- 22. Jankowsky JL, Patterson PH. The role of cytokines and growth factors in seizures and their sequelae. Prog Neurobiol. 2001;63(2):125–49.
- 23. Shetty AK, Zaman V, Shetty GA. Hippocampal neurotrophin levels in a kainate model of temporal lobe epilepsy: a lack of correlation between brain-derived neurotrophic factor content and progression of aberrant dentate mossy fiber sprouting. J Neurochem. 2003;87(1):147–59.
- 24. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44(D1):D457–62. doi:10.1093/nar/gkv1070