GigaScience. 2015 Nov 26;4:55. doi: 10.1186/s13742-015-0098-x

Exemplary multiplex bisulfite amplicon data used to demonstrate the utility of Methpat

Nicholas C Wong, Bernard J Pope, Ida Candiloro, Darren Korbie, Matt Trau, Stephen Q Wong, Thomas Mikeska, Bryce J W van Denderen, Erik W Thompson, Stefanie Eggers, Stephen R Doyle, Alexander Dobrovic
PMCID: PMC4660811  PMID: 26613017

Abstract

Background

DNA methylation is a complex epigenetic marker that can be analyzed using a wide variety of methods. Interpretation and visualization of DNA methylation data can mask complexity in terms of methylation status at each CpG site, cellular heterogeneity of samples and allelic DNA methylation patterns within a given DNA strand. Bisulfite sequencing is considered the gold standard, but visualization of massively parallel sequencing results remains a significant challenge.

Findings

We created a program called Methpat that facilitates visualization and interpretation of bisulfite sequencing data generated by massively parallel sequencing. To demonstrate this, we performed multiplex PCR that targeted 48 regions of interest across 86 human samples. The regions selected included known gene promoters associated with cancer, repetitive elements, known imprinted regions and mitochondrial genomic sequences. We interrogated a range of samples including human cell lines, primary tumours and primary tissue samples. Methpat generates two forms of output: a tab-delimited text file for each sample that summarizes DNA methylation patterns and their read counts for each amplicon, and an HTML file that summarizes these data visually. Methpat can be used with publicly available whole genome bisulfite sequencing and reduced representation bisulfite sequencing datasets with sufficient read depths.

Conclusions

Using Methpat, complex DNA methylation data derived from massively parallel sequencing can be summarized and visualized for biological interpretation. By accounting for allelic DNA methylation states and their abundance in a sample, Methpat can unmask the complexity of DNA methylation and yield further biological insight in existing datasets.

Keywords: DNA methylation, Bisulfite sequencing, PCR, Visualization, Epigenetics, Cancer, Epialleles

Data description

DNA methylation can be analyzed using a wide range of methods [1], with bisulfite sequencing considered the current gold standard. Current technologies such as whole genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) provide unprecedented detail of methylation patterns throughout the genome, but the complexity of these patterns is masked when simple summary metrics are used. For example, most studies of DNA methylation reduce methylation levels to a single percentage value, which typically masks allelic patterns when the data are interpreted. We have developed Methpat, a tool that summarizes and visualizes complex DNA methylation data collected by massively parallel sequencing of bisulfite-converted DNA [2]. Using this tool, the DNA methylation state of individual CpG sites and the abundance of allelic patterns can be visualized [3]. Furthermore, by measuring the abundance of allelic DNA methylation patterns, cellular heterogeneity in methylation patterns can now be explored [4].
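As a minimal illustration of how a single percentage can hide allelic patterns, the sketch below compares two hypothetical samples that share the same overall methylation level but contain different epialleles. The read patterns, the M/U encoding and the function names are assumptions made for this example and are not taken from Methpat or from the data described here.

```python
from collections import Counter


def percent_methylation(patterns):
    """Overall methylation (%) across all CpG calls in all reads."""
    calls = "".join(patterns)
    return 100.0 * calls.count("M") / len(calls)


def epiallele_counts(patterns):
    """Number of reads supporting each distinct methylation pattern."""
    return Counter(patterns)


# Hypothetical reads over an amplicon with four CpG sites
# (M = methylated CpG, U = unmethylated CpG).
sample_a = ["MMMM"] * 5 + ["UUUU"] * 5  # fully methylated and fully unmethylated alleles
sample_b = ["MUMU"] * 5 + ["UMUM"] * 5  # mosaic alleles

for name, reads in (("A", sample_a), ("B", sample_b)):
    print(name, percent_methylation(reads), dict(epiallele_counts(reads)))
```

Both samples report the same percentage, yet their pattern counts differ, which is the kind of distinction that a per-pattern summary is intended to preserve.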

The utility of Methpat was demonstrated by measuring DNA methylation in 86 samples (Table 1) across 48 regions of interest (Table 2). This was achieved by using multiplex PCR on bisulfite-converted DNA followed by massively parallel sequencing on an Illumina MiSeq platform with v3 chemistry. Each sample was indexed and pooled at equimolar concentrations into a single library pool for sequencing. The data have been deposited in GEO under accession numbers GSE67856 [5] and GSE71804 [6]. A panel of breast cancer cell lines treated with epidermal growth factor and transforming growth factor beta was also analyzed in parallel [7].

Table 1. Human samples used in this study

Sample Name Description GEO Accession
293 HEK-293 embryonic kidney cell line. ATCC CRL1573 GSE67856
40424 Normal fibroblast cell line GSE67856
910046 Normal fibroblast cell line GSE67856
12A-CD19 Normal fluorescence-activated cell sorted (FACS) CD19-positive bone marrow cells from individual 12A GSE67856
12A-CD33 Normal fluorescence-activated cell sorted (FACS) CD33-positive bone marrow cells from individual 12A GSE67856
12A-CD34 Normal fluorescence-activated cell sorted (FACS) CD34-positive bone marrow cells from individual 12A GSE67856
12A-CD45 Normal fluorescence-activated cell sorted (FACS) CD45-positive bone marrow cells from individual 12A GSE67856
6-MDA453 MDA-MB-453 metastatic breast cancer cell line. ATCC HTB-131 GSE67856
6C-CD19 Normal fluorescence-activated cell sorted (FACS) CD19-positive bone marrow cells from individual 6C GSE67856
6C-CD33 Normal fluorescence-activated cell sorted (FACS) CD33-positive bone marrow cells from individual 6C GSE67856
6C-CD34 Normal fluorescence-activated cell sorted (FACS) CD34-positive bone marrow cells from individual 6C GSE67856
6C-CD45 Normal fluorescence-activated cell sorted (FACS) CD45-positive bone marrow cells from individual 6C GSE67856
9A-CD19 Normal fluorescence-activated cell sorted (FACS) CD19-positive bone marrow cells from individual 9A GSE67856
9A-CD33 Normal fluorescence-activated cell sorted (FACS) CD33-positive bone marrow cells from individual 9A GSE67856
9A-CD34 Normal fluorescence-activated cell sorted (FACS) CD34-positive bone marrow cells from individual 9A GSE67856
9A-CD45 Normal fluorescence-activated cell sorted (FACS) CD45-positive bone marrow cells from individual 9A GSE67856
9A-Whole-Blood Whole blood sample from individual 9A GSE67856
BRL Normal lymphoblast cell line. GSE67856
CaCo Caco2 Colon cancer cell line. ATCC HTB37 GSE67856
DG75 Lymphoblast cancer cell line. ATCC CRL-2625 GSE67856
EKVX Cancer Cell Line GSE67856
HELA Cancer cell line. ATCC CCL-2 GSE67856
HEPG2 Liver cancer cell line. ATCC HB-8065 GSE67856
HT1080 Cancer cell line. ATCC CCL121 GSE67856
HTB22-Col MCF7 breast cancer cell line. ATCC HTB22 GSE67856
JWL Normal lymphoblast cell line. GSE67856
K562 CML cancer cell line. ATCC CCL-243 GSE67856
Sample29 Cell Line GSE71804
MB231BAG Breast cancer cell line. ATCC HTB-26 GSE67856
MCF7 Breast cancer cell line. ATCC HTB22 GSE67856
NALM6 Leukaemia cell line. ACC 128 GSE67856
NCCIT Embryonic carcinoma cell line. ATCC CRL-2073 GSE67856
OVCAR8 Cancer cell line GSE67856
SKNAS Neuroblastoma cancer cell line. ATCC CRL2137 GSE67856
U231 Cancer cell line GSE67856
Sample1 Human normal colon tissue GSE71804
Sample2 Human colon tumor GSE71804
Sample3 Human normal colon tissue GSE71804
Sample4 Human colon tumor GSE71804
Sample5 Human normal colon tissue GSE71804
Sample6 Human colon tumor GSE71804
Sample7 Human normal colon tissue GSE71804
Sample8 Human colon tumor GSE71804
Sample9 Human normal colon tissue GSE71804
Sample10 Human colon tumor GSE71804
Sample11 Human normal colon tissue GSE71804
Sample12 Human colon tumor GSE71804
Sample13 Pooled human cancer and blood cell DNA GSE71804
Sample14 Pooled human cancer and blood cell DNA GSE71804
Sample15 Pooled human cancer and blood cell DNA GSE71804
Sample16 Pooled human cancer and blood cell DNA GSE71804
Sample17 Pooled human cancer and blood cell DNA GSE71804
Sample18 Pooled human cancer and blood cell DNA GSE71804
Sample19 Artificially methylated human DNA GSE71804
Sample20 Artificially methylated human DNA GSE71804
Sample21 Artificially methylated human DNA GSE71804
Sample22 Artificially methylated human DNA GSE71804
Sample23 Artificially methylated human DNA GSE71804
Sample24 Artificially methylated human DNA GSE71804
Sample25 Human leukemia cell line GSE71804
Sample26 Human leukemia cell line GSE71804
Sample27 Human leukemia cell line GSE71804
Sample28 Human leukemia cell line GSE71804
468-C1-3-9_S40 MDA-468 cell line, control 1 GSE71804
468-C2-3-9_S48 MDA-468 cell line, control 2 GSE71804
468-S1-3-9_S56 MDA-468 cell line + EGF 1 GSE71804
468-S2-3-9_S64 MDA-468 cell line + EGF 2 GSE71804
ET-C1-3-9_S71 PMC42-ET cell line, control 1 GSE71804
ET-C2-3-9_S79 PMC42-ET cell line, control 2 GSE71804
ET-S1-3-9_S87 PMC42-ET cell line, +EGF 1 GSE71804
ET-S2-3-9_S95 PMC42-ET cell line, +EGF 2 GSE71804
LA-C1-3-9_S8 PMC42-LA cell line, control 1 GSE71804
LA-C3-3-9_S16 PMC42-LA cell line, control 2 GSE71804
LA-S1-3-9_S24 PMC42-LA cell line, +EGF 1 GSE71804
LA-S2-3-9_S32 PMC42-LA cell line, +EGF 2 GSE71804
PMC42ET-72-C_S31 PMC42-ET cell line, control 72 h GSE71804
PMC42ET-72 h-EGF_S39 PMC42-ET cell line, +EGF 72 h GSE71804
PMC42ET-9d-C_S47 PMC42-ET cell line, control 9 days GSE71804
PMC42ET-9d-EGF_S55 PMC42-ET cell line, +EGF 9 days GSE71804
PMC42ET-9d-TGFb_S63 PMC42-ET cell line, +TGFb 9 days GSE71804
PMC42LA-72 h-C_S86 PMC42-LA cell line, control 72 h GSE71804
PMC42LA-72 h-EGF_S94 PMC42-LA cell line, +EGF 72 h GSE71804
PMC42LA-9d-C_S7 PMC42-LA cell line, control 9 days GSE71804
PMC42LA-9d-EGF_S15 PMC42-LA cell line, +EGF 9 days GSE71804
PMC42LA-9d-TGFb_S23 PMC42-LA cell line, +TGFb 9 days GSE71804

Table 2. Bisulfite PCR primers used in this study

Primer name Primer sequence Primer Tm (°C) Genomic location (hg38)
mandatory01_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGAAGTTTGGTYGTTGYGTTTTTAT 60.1–62.9
mandatory01_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGRAAACCRCTCRCRAAATACCCTA 57.6–64.6 chr4:154710460-154710544
mandatory02_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTAGYGGAGTTTAAGGGTTAGTGT 59.2–60.9
mandatory02_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAACRAAACRCACRTACRTATATTTATA 56.3–62.1 chr1:110052409-110052486
mandatory03_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGTTTGTTAGTTAGTTTTAGGTTTTTTAAT 59.8
mandatory03_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTACCAAATTTCTATTACAAACCAAA 60.8 chr4:7526639-7526703
mandatory04_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGATTTGGTTTYGAGAGTTTGGATTTT 60.1–61.7
mandatory04_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAAAACCRCACACCTAAACACTTAAA 60.1–61.7 chr2:164593225-164593299
mandatory05_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGGAATTTTGAGATTTTTAAAAGTTTTTTT 59.8
mandatory05_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATAAAAACAACAAATACCACTTCCTAAA 59.9 chr2:9518296-9518358
mandatory06_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTGYGTYGATTTTGGTTTTGGTTAT 57.6–60.9
mandatory06_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCRACCCCTCCCAAATCCTAAAA 60.1–62.1 chr17:80709100-80709203
mandatory07_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGTTAGAGGAGAYGTTTTAGTTTTT 59.2–60.9
mandatory07_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAATTCCAAAAAACRTCAATCACAATAA 59.9–61.5 chr3:142837969-142838050
mandatory08_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGTTAAGAGGAGTTTGTTTTGTTTTAT 60.8
mandatory08_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTTCACTAAAAAACCTCACTCCCTA 60.9 chr7:140218100-140218192
mandatory09_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTTTAGAGTGTTTTTGGTTTTATTATTTTT 60.2
mandatory09_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTATTTACCCCTAAAAATACCCTTTATA 59.2 chr7:26206542-26206614
mandatory10_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGAAGTTGAAGTGAGAATGTGATT 60.3
mandatory10_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAATACCCATACAAACTATCTACACAA 60.1 chr7:3025554-3025664
mandatory11_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTATATAAAAATTATTAAGAATTTTATTGTTTTGT 58.5
mandatory11_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAATATAACCAAAATCCAAATAACACTAA 58.2 chr7:138229946-138230021
mandatory12_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGYGGYGTTTGATGGATTTGGTTT 59.2–62.9
mandatory12_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTTAATATAACCTAAACCCATATACTA 59.2 chr2:42275714-42275789
mandatory13_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTAGATTATGTTAAGGATTTTGGAAAT 59.2
mandatory13_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCTATACTATCAACACCCATTACTTAA 60.8 chr15:100249155-100249220
mandatory14_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTAAATTAGATGAGGTATAGTAGATTATAT 59.2
mandatory14_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAACTCTATCTCAAACTTCAAAAAATA 59.2 chr4:147557821-147557938
mandatory15_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTGGGGGATAGTTTTGGGTAT 60.1
mandatory15_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTACAACCTCCTACAAAAAAACCCTA 60.9 chr17:75369174-75369252
mandatory16_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATATTTTTAATTTAATTTGAAGGTTTATTGT 57.8
mandatory16_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCCAAACTTTCTCCTATAATCCAA 60.3 chr7:93520244-93520332
h19_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTTGTATTATTTTTTTTTTTGAGAGTTTATTT 60.2
h19_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATACRAAAAAAACCCACAATAAACTTAATA 59.8–61 chr11:2017873-2018050
mest_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGTTTTGTTTTTTTAATTGTGTTTATTGTTT 60.2
mest_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTAACCACTATAACCAAAATTACACAAAA 59.9 chr7:130131098-130131299
xist_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTAGTAATTTAGTATTGTTTATTTTATTTTTTT 59
xist_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATAACRAACCTCTTTATCTTTACTATATA 59.2–60.5 chrX:73070975-73071183
runx3_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTAGAYGTTYGGAGTTTTAGGGT 58.3–62
runx3_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCRACAACCCCAACTTCCTCTA 59.5–61.2 chr1:25256022-25256153
rarb_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAATTTTTTTATGYGAGTTGTTTGAGGAT 59.9–61.5
rarb_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTCCTTCCAAATAAATACTTACAAAAAA 59.9 chr3:25469822-25469959
mlh1_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGYGGGAGGTTATAAGAGTAGGGTT 60.9–62.9
mlh1_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATACRAAATATCCAACCAATAAAAACAAAA 59.8–61 chr3:37034573-37034734
rassf1a_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTTTYGTAGTTTAATGAGTTTAGGTTTT 60.5–62.1
rassf1a_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAATCCCTACACCCAAATTTCCATTA 60.9 chr3:50378200-50378398
apc_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGAGAGAAGTAGTTGTGTAAT 60.3
apc_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCATTCTATCTCCAATAACACCCTAA 60.9 chr5:112073447-112073596
cdkn2a_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATTTTGTTTTTTAAATTTTTTGGAGGGAT 59.2
cdkn2a_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCCAACCTAAAACRACTTCAAAAATA 60.1–61.7 chr9:21974960-21975097
dapk1_p1_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTYGGAGTGTGAGGAGGATAGT 60.9–62.9
dapk1_p1_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGRACRACRAAAACACAACTAAAAAATAAATA 58.5–62.6 chr9:90112783-90112938
dapk1_p2_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGYGGAGGGATYGGGGAGTTTTT 62.1–65.5
dapk1_p2_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCRCCTTAACCTTCCCAATTA 63.6–65.2 chr9:90112991-90113144
dapk1_i1_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGAGGYGGGGAGGTTAGTTAT 61.2–63.2
dapk1_i1_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAATAAAAAAAAACACCCTTTATTAAAACTAA 59.8 chr9:90113588-90113759
gstp1_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTGGGAAAGAGGGAAAGGTTTTT 60.3
gstp1_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGRCRACCTCCRAACCTTATAAAAATAA 58.4–62.9 chr11:67351064-67351273
cdh1_snp_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATTTTAGTAATTTTAGGTTAGAGGGTT 59.2
cdh1_snp_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAAAATAAATACRTAACTACAACCAAATAAA 59–60.2 chr16:68771006-68771197
cdh1_3'_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTYGGAATTGTAAAGTATTTGTGAGT 60.1–61.7
cdh1_3'_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATCAAAAAATCCRAAATACCTACAACAA 59.5–61.5 chr16:68771201-68771385
brca1_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTAGTTATTTGAGAAATTTTATAGTTTGTT 59
brca1_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAATTTCRTATTCTAAAAAACTACTACTTAA 58.5–59.8 chr17:41277330-41277493
AluSx_1_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGATTAGTTTGGTTAATATGGTGAAATT 59.9
AluSx_1_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTCTATCRCCCAAACTAAAATACAATA 60.8–62.1
AluSx_2_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTTGTAATTTTAGTATTTTGGGAGGT 60.8
AluSx_2_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAACCTCCCRAATAACTAAAACTACAA 60.1–61.7
L1ME_ORF2_1_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATGATAAAAGGGTTAATTTATTAGAAAGAT 59.8
L1ME_ORF2_1_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTATCTAATTATTCTRTCAATTACTAAAAA 58.5–59.8
L1ME_ORF2_2_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGATTGATAAAGAAGAAAATAGATAAGATAT 59.8
L1ME_ORF2_2_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTATTCAAATTTTCTATTTCTTTTTAAATCAA 59.8
foxe3_2_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTTGGGGAGGTTTATTTGAGGT 59.2
foxe3_2_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAACRCAAAATATACTCCAAACCAAAATA 59.9–61.5 chr1
foxp3_1_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTGGGTTTAGGGTTTTATTTGTAGT 59.2
foxp3_1_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACCCAAAACCTCAAACCTACTAAA 60.3 chrX
foxp3_2_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTTTGGGGATGGGTTAAGGGTT 60.9
foxp3_2_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAACCAATACCTACTTTAACCAAAAA 60.1 chrX
tlx3_1_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTYGGTTTAAGAAAGATGATATAGAGTT 59.9–61.5
tlx3_1_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCCATCCTAAACRAACRAAAAAACTAA 59.2–62.1 chr5
tlx3_2_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGYGTTAGTTATTTGGGAGGGTTT 59.2–60.9
tlx3_2_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAACRCTAAACTCAAATTCACACTATAAA 59.5–61.5 chr5
uniq_noCG_1_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGTTATGTAGTTTTAGTTAGAAGTTT 59.2
uniq_noCG_1_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAATCTAAATTTTAACACCTAAAACTATTTTAA 59.8 chr5
uniq_noCG_2_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATATGAAAGGTTGGTTTTATTGTTGAAT 59.9
uniq_noCG_2_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAAATAAACTTAATAACTCTACTCTTATATA 59 chr5
mgmt_1_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTGAGTTAGGTTTTGGTAGTGTT 60.3
mgmt_1_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTAATACCRCTCCCCTAATCAAAA 60.3–62 chr10
mgmt_2_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGGTAGTTTYGAGTGGTTTTGT 59.2–60.9
mgmt_2_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAACTAAACAACACCTAAAAAACACTTAA 59.9 chr10
mito_1_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTATTTATTTTTAATAGTATATAGTATATAAAGTT 58.5
mito_1_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACTTTAACTACCCCCAAATATTATAA 58.4 chrM
mito_2_plus_F TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATGATTTTTAATAGGGGTTTTTTTAGTTT 59.2
mito_2_plus_R GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCRTATCRAAAACCTTTTTAAACAAATAATA 58.5–61 chrM
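
The Y and R characters in the primer sequences of Table 2 are IUPAC codes for mixed bases (Y = C or T, R = A or G), which allow a primer to anneal whether or not a CpG cytosine within the priming site was converted. Primers containing such positions are listed with a Tm range, presumably reflecting the different sequence variants. The sketch below enumerates the variants of one primer from the table after removing the 5′ prefix shared by all forward primers. The function name and the decision to strip the shared prefix are assumptions made for illustration.

```python
from itertools import product

# IUPAC codes needed for the primers in Table 2.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "R": "AG"}


def expand_degenerate(seq):
    """Return every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


# 5' prefix shared by all forward primers in Table 2.
SHARED_PREFIX = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"

# mandatory01_plus_F, copied from Table 2.
primer = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGAAGTTTGGTYGTTGYGTTTTTAT"

target_part = primer[len(SHARED_PREFIX):]
variants = expand_degenerate(target_part)

for variant in variants:
    print(variant)
```

A primer with n degenerate positions expands to 2^n variants, so this primer, with two Y positions, yields four sequences.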

Our initial QC assessment indicated high bisulfite conversion efficiency, with very few unconverted non-CpG cytosines in the reads. An additional amplicon corresponding to a sequence containing no CpG sites was also included as a control, in which all cytosines were observed to have been converted to thymine [1].
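
A conversion check of this kind can be sketched as follows, assuming that non-CpG cytosines are unmethylated so that any cytosine remaining in a non-CpG context reflects incomplete conversion. The sketch uses the single-letter methylation calls reported by Bismark (x and h for unmethylated CHG and CHH cytosines, X and H for methylated ones, z and Z for CpG). The function name and the example call string are hypothetical, and this is not the QC procedure used by the authors.

```python
def conversion_efficiency(calls):
    """Estimate bisulfite conversion efficiency from non-CpG calls.

    `calls` is a string of Bismark-style methylation call letters.
    CpG calls (z/Z) are ignored because they may be genuinely methylated.
    """
    converted = sum(calls.count(c) for c in "xh")    # read as T after conversion
    unconverted = sum(calls.count(c) for c in "XH")  # still read as C
    total = converted + unconverted
    if total == 0:
        raise ValueError("no non-CpG cytosine calls found")
    return converted / total


# Hypothetical calls: 99 converted and 1 unconverted non-CpG cytosine,
# plus two CpG calls that do not enter the estimate.
calls = "h" * 97 + "x" * 2 + "H" + "Zz"
efficiency = conversion_efficiency(calls)
print(f"{efficiency:.2%}")
```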

The data included here are the Sequence Read Archive files generated from our experiment. The reads were aligned to the hg38 reference genome using Bismark v0.9.0, producing a BAM file for each sample. Using the Bismark_methylation_extractor command, the methylation status of cytosine residues within each read was written to a tab-delimited file. Methpat then operates on this output file to generate both a summarized tab-delimited file of read pattern counts and an HTML file for visualization. We have included the BAM files, Bismark_methylation_extractor output files and Methpat output files as supporting data. To extract and summarize DNA methylation pattern counts, Methpat requires a Browser Extensible Data (BED)-format-like file that contains the coordinates of each amplicon of interest, its size and its primer lengths. The flow of data is summarized in Fig. 1.
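
The pattern-counting step can be sketched roughly as below. This is a simplified illustration under stated assumptions, not Methpat's implementation: it assumes the per-cytosine extractor output has five tab-separated columns (read ID, methylation state, chromosome, position, call letter) preceded by a one-line header, considers CpG calls only, and takes the amplicon coordinates as plain arguments rather than from a BED-like file. The function name, the example rows and the 1/0/- pattern encoding are hypothetical.

```python
from collections import Counter, defaultdict


def count_patterns(lines, chrom, start, end):
    """Count per-read CpG methylation patterns within one amplicon.

    Each data row is assumed to hold: read ID, methylation state (+/-),
    chromosome, position, and call letter (Z = methylated CpG,
    z = unmethylated CpG).
    """
    calls_by_read = defaultdict(dict)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5 or not fields[3].isdigit():
            continue  # skip the header line and malformed rows
        read_id, _state, read_chrom, pos, call = fields
        pos = int(pos)
        if read_chrom == chrom and start <= pos <= end and call in ("Z", "z"):
            calls_by_read[read_id][pos] = call

    # Every CpG position seen in the amplicon, in genomic order.
    sites = sorted({pos for calls in calls_by_read.values() for pos in calls})
    # 1 = methylated, 0 = unmethylated, - = site not covered by the read.
    symbol = {"Z": "1", "z": "0", None: "-"}
    counts = Counter(
        "".join(symbol[calls.get(pos)] for pos in sites)
        for calls in calls_by_read.values()
    )
    return sites, counts


# Hypothetical extractor output for a two-CpG amplicon on chr1.
example = "\n".join([
    "Bismark methylation extractor version v0.9.0",
    "read1\t+\tchr1\t100\tZ",
    "read1\t+\tchr1\t105\tZ",
    "read2\t-\tchr1\t100\tz",
    "read2\t-\tchr1\t105\tz",
    "read3\t+\tchr1\t100\tZ",
    "read3\t-\tchr1\t105\tz",
    "read4\t+\tchr1\t100\tZ",
    "read4\t+\tchr1\t105\tZ",
    "read5\t+\tchr2\t50\tZ",
])

sites, counts = count_patterns(example.splitlines(), "chr1", 100, 105)
for pattern, n in counts.most_common():
    print(pattern, n)
```

Grouping calls by read before summarizing is what keeps the allelic pattern intact; averaging per site first would discard it.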

Fig. 1. Flow of data towards visualization via Methpat. Raw fastq files are aligned to the hg38 reference genome in bisulfite space. (a) The hg38 reference is prepared for Bismark using Bismark_genome_preparation with default parameters. (b) Bismark is used to align raw reads from fastq files to generate BAM alignment files. (c) Bismark_methylation_extractor is then used to extract the methylation status of all cytosines in every aligned read and outputs a tab-delimited file that Methpat operates on. Methpat requires this file along with a BED-formatted file containing information for each amplicon of interest, including the start and end coordinates of the amplicon and its primer lengths. The output of Methpat is a summary tab-delimited file containing read counts of DNA methylation patterns for the amplicons of interest and an HTML file for visualization and publication-quality figures.

Our data have the potential to be used to investigate co-methylation [8], given the unprecedented depth of coverage of the amplicons investigated, even in a single MiSeq run. We interrogated a variety of regions of the genome, including repetitive elements and the mitochondrial genome, which remain a challenge for most short read aligners. The interpretation of DNA methylation at repetitive sequence elements has always been a challenge, and these elements are generally assumed to be methylated [9]. However, the dynamics of repetitive element DNA methylation in cancer [10] and development [11] remain areas of interest that can now be properly interpreted with massively parallel sequencing and visualization tools such as Methpat.

Availability of software and requirements

Project name: Methpat

Project home page: http://bjpop.github.io/methpat/

Operating system(s): any POSIX-like operating system (e.g., Linux, OS X)

Programming language: Python 2.7, HTML and JavaScript

Other requirements: A web browser to view the visualization output (HTML file); suggested browsers include Firefox, Chrome and Safari. Methpat requires output files produced by Bismark (http://www.bioinformatics.babraham.ac.uk/projects/bismark/) and the Bismark_methylation_extractor command. Methpat can be accessed directly from http://bjpop.github.io/methpat/, where further instructions can be found.

License: 3-clause BSD License

Any restrictions to use by non-academics: None

A flow diagram of analytical requirements and files can be found in Fig. 1.

Availability of supporting data and materials

Sequence files associated with the main research publication are deposited in GEO under accession GSE67856 [5]. The remaining files are deposited in GEO under accession GSE71804 [6].

BAM files, bismark_methylation_extractor output files and Methpat output files for each sample analyzed in this paper are available in the GigaScience GigaDB repository [12].

Acknowledgements

We acknowledge Illumina Australia Pty Ltd for a MiSeq Pilot Sequencing Grant. This work was supported, in part, by National Breast Cancer Foundation of Australia (NBCF) grants to AD, EWT, DK and MT (CG-08-07, CG-10-04 and CG-12-07) and NHMRC APP1027527 (EWT, AD, NW, BJWVD). SW was supported by the Melbourne Melanoma Project funded by the Victorian Cancer Agency Translational Research program and established through support of the Victor Smorgon Charitable Fund. Computation time was granted by the Life Sciences Computation Centre (LSCC) at the Victorian Life Sciences Computational Initiative (VLSCI) under grant VR0002. The Murdoch Childrens Research Institute and St. Vincent’s Institute are supported by the Victorian Government Operational and Infrastructure Support Grant.

Abbreviations

BED: Browser extensible data

RRBS: Reduced representation bisulfite sequencing

WGBS: Whole genome bisulfite sequencing

Footnotes

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

NCW drafted the manuscript, designed and performed the experiment, and analyzed the data. BJP co-wrote the manuscript, wrote Methpat for visualization and analyzed the data. IC co-wrote the manuscript, designed and performed the experiment. DK co-wrote the manuscript and analyzed the data. MT co-wrote the manuscript. SQW co-wrote the manuscript and performed the experiment. TM co-wrote the manuscript, designed and performed the experiment. BJWVD co-wrote the manuscript and performed the experiment. EWT co-wrote the manuscript. CB co-wrote the manuscript and performed the experiment. SE co-wrote the manuscript and performed the experiment. SRD co-wrote the manuscript and performed the experiment. AD co-wrote the manuscript, designed the experiment and analyzed and interpreted the results. All authors read and approved the final manuscript.

Contributor Information

Nicholas C. Wong, Email: nwon@unimelb.edu.au

Bernard J. Pope, Email: bjpope@unimelb.edu.au

Ida Candiloro, Email: ida.candiloro@unimelb.edu.au

Darren Korbie, Email: d.korbie@uq.edu.au

Matt Trau, Email: m.trau@uq.edu.au

Stephen Q. Wong, Email: Stephen.Wong@petermac.org

Thomas Mikeska, Email: Thomas.Mikeska@onjcri.org.au

Bryce J. W. van Denderen, Email: bvandenderen@gmail.com

Erik W. Thompson, Email: e2.thompson@qut.edu.au

Stefanie Eggers, Email: steffi.eggers@gmail.com

Stephen R. Doyle, Email: s.doyle@latrobe.edu.au

Alexander Dobrovic, Email: alex.dobrovic@onjcri.org.au

References

1. Fraga MF, Esteller M. DNA methylation: a profile of methods and applications. Biotechniques. 2002;33:632–49. doi: 10.2144/02333rv01.
2. Wong NC, Pope BJ, Candiloro ILM, Korbie D, Trau M, et al. MethPat: a tool for the analysis and visualisation of complex methylation patterns obtained by massively parallel sequencing. Submitted.
3. Mikeska T, Candiloro IL, Dobrovic A. The implications of heterogeneous DNA methylation for the accurate quantification of methylation. Epigenomics. 2010;2:561–73. doi: 10.2217/epi.10.32.
4. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, Wiencke JK, Kelsey KT. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012;13:1–16. doi: 10.1186/1471-2105-13-86.
5. GSE67856. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=qruhwasexjgtbmh&acc=GSE67856
6. GSE71804. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=ahojaugovrcjbkp&acc=GSE71804
7. Hugo HJ, Kokkinos MI, Blick T, Ackland ML, Thompson EW, Newgreen DF. Defining the e-cadherin repressor interactome in epithelial-mesenchymal transition: the PMC42 model as a case study. Cells Tissues Organs. 2011;193:23–40. doi: 10.1159/000320174.
8. Akulenko R, Helms V. DNA co-methylation analysis suggests novel functional associations between gene pairs in breast cancer samples. Hum Mol Genet. 2013;15:3016–22. doi: 10.1093/hmg/ddt158.
9. Yoder JA, Walsh CP, Bestor TH. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 1997;13:335–40. doi: 10.1016/S0168-9525(97)01181-5.
10. Wilson AS, Power BE, Molloy PL. DNA hypomethylation and human diseases. Biochim Biophys Acta. 2007;1775:138–62. doi: 10.1016/j.bbcan.2006.08.007.
11. Su J, Shao X, Liu H, Liu S, Wu Q, Zhang Y. Genome-wide dynamic changes of DNA methylation of repetitive elements in human embryonic stem cells and fetal fibroblasts. Genomics. 2012;99(1):10–7.
12. Wong NC, Pope BJ, Candiloro I, Korbie D, Trau M, Wong SQ, et al. Supporting data and materials for “Exemplary multiplex bisulfite amplicon data used to demonstrate the utility of Methpat”. GigaScience Database. 2015. http://dx.doi.org/10.5524/100167

Articles from GigaScience are provided here courtesy of Oxford University Press
