Abstract
Mouse E14 embryonic stem cells (ESCs) are the most widely used ESC line and are often employed in genome-wide studies based on next generation sequencing [1], [2], [3], [4], [5]. More than 2 × 10⁹ sequence reads, generated on the Illumina platform from the genome of E14 ESCs cultured in our laboratory, were used to build a database of about 2.7 × 10⁶ single nucleotide variants (SNVs) [6]. The database was validated against two independent sequencing datasets generated outside our laboratory, and a high overlap was observed. The identified variants are enriched in intergenic regions, but several thousand lie in exons and in regulatory regions such as promoters, enhancers, splice sites and untranslated regions (UTRs), suggesting that many of them are likely to have an important functional impact on the molecular biology of these cells. We created a new E14 genome assembly that incorporates the newly identified variants and used it to map next generation sequencing reads generated from the E14 cell line in our laboratory and in others. This increased the number of mapped reads by about 5%. CpG dinucleotides showed the highest variation frequency, possibly because CpG cytosines are targets of DNA methylation and methylated cytosines are prone to deamination to thymine. Data were deposited in GEO under accession number GSM1283021 and are also available at: http://epigenetics.hugef-research.org/data.php.
Keywords: ESC, NGS, Whole-genome E14
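The abstract reports that CpG dinucleotides showed the highest variation frequency, but the exact definition of that frequency is not given here. The Python sketch below assumes one plausible definition: for each dinucleotide, the fraction of its occurrences in the reference that overlap at least one SNV. The function name, the input layout (a reference string and a set of 0-based SNV positions) and the definition itself are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def dinucleotide_variation_frequency(sequence, snv_positions):
    """Return, for each dinucleotide, the fraction of its occurrences that
    overlap at least one SNV (assumed definition, hypothetical helper).

    sequence: reference sequence as a string.
    snv_positions: set of 0-based positions carrying an SNV.
    """
    seq = sequence.upper()
    totals = Counter()  # occurrences of each dinucleotide in the reference
    hits = Counter()    # occurrences overlapping at least one SNV
    for i in range(len(seq) - 1):
        dinuc = seq[i:i + 2]
        if not set(dinuc) <= set("ACGT"):
            continue  # skip gaps and ambiguous bases such as N
        totals[dinuc] += 1
        if i in snv_positions or i + 1 in snv_positions:
            hits[dinuc] += 1
    return {d: hits[d] / totals[d] for d in totals}


# Toy example only: one SNV at 0-based position 2 (the G of the first CpG).
freqs = dinucleotide_variation_frequency("ACGTCGA", {2})
print(sorted(freqs.items(), key=lambda kv: kv[1], reverse=True))
```

Because a single-base variant overlaps two adjacent dinucleotides, this definition counts each SNV twice; an alternative would be to attribute each SNV only to a chosen context (for example, the CpG it belongs to).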
| Specifications | |
|---|---|
| Organism/cell line/tissue | Mouse E14 embryonic stem cells |
| Sex | Male |
| Sequencer or array type | Illumina HiScanSQ |
| Data format | Raw and analyzed |
| Experimental factors | N/A |
| Experimental features | Whole genome sequencing of E14 embryonic stem cells |
| Consent | N/A |
| Sample source location | Torino, Italy |
Direct link to deposited data
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1283021
Experimental design, materials and methods
E14 mouse ESCs were cultured in ESC medium (high-glucose DMEM with 15% fetal bovine serum [FBS], 1× non-essential amino acids [NEAA], 1× sodium pyruvate, 0.1 mM 2-mercaptoethanol and 1500 U/ml LIF). Genomic DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen).
For sequencing of the E14 genome, DNA was sonicated for 17 min (30 s ON/30 s OFF pulses, high power setting) with a Bioruptor Twin (Diagenode). Libraries were generated with the DNA Sample Prep Kit (Illumina) and sequenced on the Illumina HiScanSQ platform. Base calling was performed using CASAVA version 1.8 with default parameters. Read quality was assessed using the FastQC tool v0.10.1 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Nucleotide positions with a quality score below 30 (Phred+33 encoding) were trimmed using the fastx_trimmer tool from the FASTX Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/).
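The text does not specify how the low-quality positions were selected. Since fastx_trimmer cuts reads at fixed positions rather than by per-read quality, one plausible reading is that sequencing cycles whose quality fell below 30 were removed from all reads. The Python sketch below illustrates that interpretation under stated assumptions: it decodes Phred+33 qualities, finds the first cycle whose mean score across reads drops below 30, and trims every read before that cycle. The use of the per-cycle mean and all function names are hypothetical.

```python
def phred33_scores(quality):
    """Decode a Phred+33 quality string into integer Phred scores."""
    return [ord(ch) - 33 for ch in quality]


def last_base_to_keep(qualities, threshold=30):
    """Length of the longest read prefix in which every cycle (position)
    has a mean Phred score >= threshold across all reads (assumed rule)."""
    read_len = min(len(q) for q in qualities)
    decoded = [phred33_scores(q) for q in qualities]
    for cycle in range(read_len):
        mean_q = sum(scores[cycle] for scores in decoded) / len(decoded)
        if mean_q < threshold:
            return cycle  # bases 1..cycle (1-based) are kept
    return read_len


def trim_reads(records, threshold=30):
    """records: list of (sequence, quality) pairs; every read is cut to the same length."""
    keep = last_base_to_keep([qual for _, qual in records], threshold)
    return [(seq[:keep], qual[:keep]) for seq, qual in records]


# Toy reads: 'I' encodes Q40, 'H' Q39, '5' Q20, '#' Q2, '!' Q0.
reads = [("ACGTAC", "IIII5#"), ("TTGCAA", "IIIH#!")]
print(last_base_to_keep([q for _, q in reads]))
print(trim_reads(reads))
```

The returned length corresponds to the value that would be passed to the -l option of fastx_trimmer (last base to keep).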
After trimming of low-quality positions, reads in which sequencing continued into the 3′ adapter sequence were clipped using the fastx_clipper tool from the FASTX Toolkit. Reads were then aligned to the mouse mm9 genome assembly using Bowtie v0.12.7 [7] with the following parameters: -q --max /dev/null -v 1 -S --sam-nohead -m 1, so that at most one mismatch was allowed and only uniquely aligned reads were reported. Reads with identical mapping positions were collapsed into one using the rmdup tool from SAMtools.
Variant calling was performed using the mpileup tool from SAMtools [8]. Next, VCFtools [9] v0.1.11 (http://vcftools.sourceforge.net/) was used to retain only SNVs with a coverage of ≥10 reads and a variant frequency of ≥0.5. In addition, sites with more than one variant call at the same position were discarded using custom Perl scripts. Finally, the new E14 reference assembly was created from the mm9 genome assembly using the FastaAlternateReferenceMaker function of GATK v2.7-4 (http://www.broadinstitute.org/gatk/).
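The VCFtools filters, custom Perl scripts and GATK call are not reproduced in this article. Purely as an illustration, the Python sketch below mimics the last three steps on an in-memory list of calls: keeping single-base substitutions with coverage ≥10 and variant frequency ≥0.5, discarding positions with more than one call, and substituting the surviving SNVs into the reference, similar in spirit to what FastaAlternateReferenceMaker does for SNVs. The tuple layout, the function names and the definition of variant frequency as alternative-allele reads divided by total depth are assumptions, not the authors' code.

```python
from collections import Counter


def filter_snvs(calls, min_depth=10, min_freq=0.5):
    """Hypothetical filter. calls: iterable of
    (chrom, pos, ref, alt, depth, alt_count) with 1-based pos (assumed layout).
    Keeps single-base substitutions with depth >= min_depth and
    alt_count / depth >= min_freq, and drops any position with more than one call."""
    calls = list(calls)
    per_site = Counter((c, p) for c, p, *_ in calls)
    kept = []
    for chrom, pos, ref, alt, depth, alt_count in calls:
        if per_site[(chrom, pos)] > 1:
            continue  # more than one variant call at the same position
        if len(ref) != 1 or len(alt) != 1:
            continue  # not an SNV (indel or multi-allelic ALT such as "A,T")
        if depth < min_depth or alt_count / depth < min_freq:
            continue  # insufficient coverage or variant frequency
        kept.append((chrom, pos, ref, alt))
    return kept


def apply_snvs(reference, snvs):
    """reference: dict chrom -> sequence. Returns a new dict with each SNV substituted."""
    seqs = {chrom: list(seq) for chrom, seq in reference.items()}
    for chrom, pos, ref, alt in snvs:
        i = pos - 1  # convert 1-based position to 0-based index
        if seqs[chrom][i].upper() != ref.upper():
            raise ValueError(f"REF mismatch at {chrom}:{pos}")
        seqs[chrom][i] = alt
    return {chrom: "".join(bases) for chrom, bases in seqs.items()}


calls = [
    ("chr1", 2, "C", "T", 12, 10),   # passes all filters
    ("chr1", 4, "G", "A", 8, 8),     # coverage below 10
    ("chr1", 5, "A", "G", 20, 5),    # frequency 0.25 below 0.5
    ("chr1", 6, "T", "C", 15, 9),    # two calls at the same position
    ("chr1", 6, "T", "G", 15, 6),
    ("chr1", 7, "A", "AT", 30, 30),  # insertion, not an SNV
]
snvs = filter_snvs(calls)
print(snvs)
print(apply_snvs({"chr1": "ACGGATAC"}, snvs))
```

Checking that each REF base matches the reference before substitution guards against coordinate or assembly mismatches between the call set and the FASTA.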
These data can be found at: http://epigenetics.hugef-research.org/data.php.
Contributor Information
Danny Incarnato, Email: danny.incarnato@hugef-torino.org.
Francesco Neri, Email: francesco.neri@hugef-torino.org, fneri@nextgenintelligence.com.
References
1. Krepelova A., Neri F., Maldotti M., Rapelli S., Oliviero S. Myc and max genome-wide binding sites analysis links the Myc regulatory network with the polycomb and the core pluripotency networks in mouse embryonic stem cells. PLoS ONE. 2014;9:e88933. doi: 10.1371/journal.pone.0088933.
2. Neri F., Incarnato D., Krepelova A., Rapelli S., Pagnani A., Zecchina R. Genome-wide analysis identifies a functional association of Tet1 and Polycomb PRC2 in mouse embryonic stem cells. Genome Biol. 2013;14:R91. doi: 10.1186/gb-2013-14-8-r91.
3. Chen X., Xu H., Yuan P., Fang F., Huss M., Vega V.B. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell. 2008;133:1106–1117. doi: 10.1016/j.cell.2008.04.043.
4. Neri F., Krepelova A., Incarnato D., Maldotti M., Parlato C., Galvagni F. Dnmt3L antagonizes DNA methylation at bivalent promoters and favors DNA methylation at gene bodies in ESCs. Cell. 2013;155:121–134. doi: 10.1016/j.cell.2013.08.056.
5. Williams K., Christensen J., Pedersen M.T., Johansen J.V., Cloos P.A.C., Rappsilber J. TET1 and hydroxymethylcytosine in transcription and DNA methylation fidelity. Nature. 2011;473:343–348. doi: 10.1038/nature10066.
6. Incarnato D., Krepelova A., Neri F. High-throughput single nucleotide variant discovery in E14 mouse embryonic stem cells provides a new reference genome assembly. Genomics. 2014;104:121–127. doi: 10.1016/j.ygeno.2014.06.007.
7. Langmead B., Trapnell C., Pop M., Salzberg S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
8. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
9. Danecek P., Auton A., Abecasis G., Albers C.A., Banks E., DePristo M.A. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
