Abstract
We report sequence data obtained by applying our recently devised target capture method, TargetEC, to 20 inbred rat strains. The method targets not only all annotated exons but also highly conserved non-coding sequences shared among vertebrates, and the target regions total 146.8 Mb. On average, we obtained 31.7× depth of target coverage and identified 154,330 SNVs and 24,368 INDELs per strain. Across the 20 strains, these correspond to 470,037 unique SNVs and 68,652 unique INDELs. The sequence data can be accessed at DDBJ/EMBL/GenBank under accession number PRJDB4648, and the identified variants have been deposited at http://bioinfo.sls.kyushu-u.ac.jp/rat_target_capture/20_strains.vcf.gz.
| Specifications | |
|---|---|
| Organism/cell line/tissue | Rattus norvegicus (BDIX/NemOda, BDIX.Cg-Tal/NemOda, BN/SsNSlc, BUF/MNa, DOB/Oda, F344/DuCrlCrlj, F344/Jcl, F344/NSlc, F344/Stm, HTX/Kyo, HWY/Slc, IS/Kyo, IS-Tlk/Kyo, KFRS3B/Kyo, LE/Stm, LEC/Tj, NIG-III/Hok, RCS/Kyo, ZF, ZFDM) |
| Sex | Female, male, or unknown (see Table 1) |
| Sequencer or array type | Illumina NextSeq 500 |
| Data format | FASTQ and VCF |
| Experimental factors | Genomic DNA extracted from spleen |
| Experimental features | Target capture sequencing of exons and conserved non-coding sequences |
| Consent | Not applicable |
| Sample source location | Rat strains were provided by the National BioResource Project (NBRP)–Rat (http://www.anim.med.kyoto-u.ac.jp/nbr/). |
1. Direct link to deposited data
http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJDB4648
http://bioinfo.sls.kyushu-u.ac.jp/rat_target_capture/20_strains.vcf.zip
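The deposited variants can be summarized per strain with a few lines of code. The following is a minimal Python sketch, not the procedure used to produce Table 1: it assumes a standard multi-sample VCF whose FORMAT column carries GT and per-sample DP fields, and it treats the "depth ≥ 5×" threshold of Table 1 as a per-sample DP filter. The function names `variant_type` and `tally` are hypothetical. The sketch also illustrates why the number of unique variants (470,037 SNVs) is much smaller than the sum of the per-strain counts in Table 1: a site shared by several strains is counted once per strain but only once in the union.

```python
from collections import Counter

MIN_DEPTH = 5  # assumption: the Table 1 "depth >= 5x" filter applied to per-sample DP


def variant_type(ref, alt):
    """Classify one REF/ALT pair as SNV, INDEL, or OTHER (e.g. multi-base substitution)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"


def tally(lines, min_depth=MIN_DEPTH):
    """Count non-reference calls per sample and unique variant sites across all samples."""
    samples, per_sample = [], {}
    unique = {"SNV": set(), "INDEL": set()}
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample (strain) names follow the FORMAT column
            per_sample = {name: Counter() for name in samples}
            continue
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4].split(",")
        keys = fields[8].split(":")
        for name, raw in zip(samples, fields[9:]):
            call = dict(zip(keys, raw.split(":")))
            depth = call.get("DP", ".")
            if not depth.isdigit() or int(depth) < min_depth:
                continue  # missing or low-depth genotype
            alleles = call.get("GT", ".").replace("|", "/").split("/")
            for index in {int(a) for a in alleles if a not in (".", "0")}:
                alt = alts[index - 1]
                kind = variant_type(ref, alt)
                if kind in unique:
                    per_sample[name][kind] += 1
                    unique[kind].add((chrom, pos, ref, alt))
    return per_sample, {kind: len(sites) for kind, sites in unique.items()}


# Toy input with two strains; real input would be the decompressed deposited VCF.
toy_vcf = [
    "##fileformat=VCFv4.1",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "S1", "S2"]),
    "\t".join(["chr1", "100", ".", "A", "G", "50", "PASS", ".", "GT:DP", "1/1:10", "1/1:8"]),
    "\t".join(["chr1", "200", ".", "AT", "A", "50", "PASS", ".", "GT:DP", "0/1:12", "0/0:9"]),
    "\t".join(["chr2", "300", ".", "C", "T", "50", "PASS", ".", "GT:DP", "1/1:3", "1/1:20"]),
]
per_sample, unique_counts = tally(toy_vcf)
print(dict(per_sample["S1"]), dict(per_sample["S2"]), unique_counts)
# {'SNV': 1, 'INDEL': 1} {'SNV': 2} {'SNV': 2, 'INDEL': 1}
```

Because `tally` accepts any iterable of lines, a gzip-compressed file can be streamed with `gzip.open(path, "rt")` rather than read into memory at once.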
2. Experimental design, materials and methods
Rats are used as animal models of many human diseases, such as cancer and hypertension. Because of the rat's importance in biomedical research, the genome sequence of the Brown Norway rat strain was determined as the third complete mammalian genome [1]. The National BioResource Project–Rat (NBRP-Rat) at Kyoto University is one of the largest repositories of rat strains; more than 700 strains have been collected and preserved as live animals, embryos, or sperm [2]. Determining the genome sequences of these strains is important not only for understanding the genetic causes of various phenotypes but also for increasing their value as biological resources.
Whole exome sequencing efficiently characterizes only the exonic portions of a genome, which typically make up 1%–2% of a mammalian genome, and it has been used successfully to identify causal genes and mutations in many human diseases. Although exome capture kits exist for some non-human species, no such capture probe set had previously been available for the rat. We therefore established a target capture kit designed specifically for this species, using the SeqCap EZ Developer Library (Roche NimbleGen, Madison, WI, USA; design name 140929_RN5_MS_EZ_HX1). In addition to all annotated exons, the probe set targets highly conserved non-coding sequences (CNSs), covering a total of 146.8 Mb of the genome [3]. By applying this method, TargetEC (target capture for exons and conserved non-coding sequences), to four rat strains (WTC/Kyo, WTC-swh/Kyo, PVG/Seac, and KFRS4/Kyo), we confirmed that TargetEC efficiently identifies causative mutations, including those located in non-coding regions [3].

In this study, we applied TargetEC to 20 additional inbred strains preserved in NBRP-Rat to identify variants across multiple rat strains. The 20 strains were selected from three categories: disease models derived from selective breeding (BDIX/NemOda, BDIX.Cg-Tal/NemOda, BUF/MNa, HTX/Kyo, HWY/Slc, KFRS3B/Kyo, RCS/Kyo, ZF, and ZFDM), strains originating from wild populations (BN/SsNSlc, DOB/Oda, IS/Kyo, IS-Tlk/Kyo, LE/Stm, LEC/Tj, and NIG-III/Hok), and representative inbred strains (F344/DuCrlCrlj, F344/Jcl, F344/NSlc, and F344/Stm). All animal experimentation protocols were approved by the Institutional Animal Care and Use Committees of Kyoto University and were conducted according to the Regulation on Animal Experimentation at Kyoto University.
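The 146.8 Mb total should count each targeted base once, even where an exon and a CNS overlap. How the total was computed is not stated; a minimal Python sketch, assuming the targets are available as half-open BED-style (chromosome, start, end) intervals, merges overlapping intervals before summing their lengths. The coordinates and function names below are hypothetical and serve only as illustration.

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting half-open (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)  # extend the current block
        else:
            merged.append([chrom, start, end])  # start a new block
    return [tuple(block) for block in merged]


def total_target_length(intervals):
    """Number of distinct bases covered by the target intervals."""
    return sum(end - start for _, start, end in merge_intervals(intervals))


exons = [("chr1", 100, 200), ("chr1", 500, 600)]
cnss = [("chr1", 150, 300), ("chr2", 0, 50)]
print(total_target_length(exons + cnss))  # 350, not the naive 400
```

Summing the raw interval lengths would give 400 here, double-counting the 50 bases shared by the first exon and the first CNS.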
Genomic DNA was extracted from spleen samples using standard protocols. Target capture was performed with the standard SeqCap EZ System protocol (Roche NimbleGen). DNA sequencing libraries were prepared with the KAPA HyperPlus Library Preparation Kit (KAPA Biosystems, London, UK) according to the manufacturer's protocol. Sequencing was performed on an Illumina NextSeq 500 platform (Illumina, San Diego, CA, USA) using the High Output Kit (2 × 150 cycles), yielding 61–82 million reads per strain (Table 1). Reads were mapped to the rat genome assembly rn5 (RGSC 5.0, March 2012) using BWA (v0.7.4) [4] with default parameters. SAMtools (v0.1.12a) [5], Picard tools (v1.87) (http://broadinstitute.github.io/picard/), and the Genome Analysis Toolkit (GATK; v2.5.2) [6] were used to post-process the mapped reads, and variants were called with the GATK UnifiedGenotyper. On average, we identified 154,330 SNVs and 24,368 INDELs per strain in the target regions at a depth of at least 5× (Table 1). Across the 20 strains, there were 470,037 unique SNVs and 68,652 unique INDELs. The sequence data and variants identified for these strains are valuable resources for further genetic studies in the rat.
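The mapping and calling steps above can be outlined as a command list. This Python sketch only builds and prints commands and runs nothing. The exact invocations used in the study are not given, so the choice of `bwa mem`, the read-group string, the file names, Picard MarkDuplicates as the post-processing step, and restricting calling to a target BED file with `-L` are all assumptions; `pipeline_commands` is a hypothetical helper. Reference indexes (`bwa index`, `.fai`, `.dict`) are assumed to exist already.

```python
import shlex


def pipeline_commands(ref, fq1, fq2, sample, targets_bed):
    """Return (argv, stdout_file) pairs for one strain; stdout_file is None if unused."""
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"  # bwa expands the literal \t itself
    return [
        # Map paired-end reads (assumed bwa mem; read group added because GATK requires one).
        (["bwa", "mem", "-R", read_group, ref, fq1, fq2], f"{sample}.sam"),
        # Convert SAM to BAM, then coordinate-sort (SAMtools 0.1.x syntax: sort <in> <out_prefix>).
        (["samtools", "view", "-bS", f"{sample}.sam"], f"{sample}.bam"),
        (["samtools", "sort", f"{sample}.bam", f"{sample}.sorted"], None),
        # Mark PCR duplicates with Picard (1.x-style KEY=VALUE arguments).
        (["java", "-jar", "MarkDuplicates.jar", f"INPUT={sample}.sorted.bam",
          f"OUTPUT={sample}.dedup.bam", f"METRICS_FILE={sample}.dup_metrics.txt"], None),
        (["samtools", "index", f"{sample}.dedup.bam"], None),
        # Call SNVs and INDELs in one pass, restricted to the capture targets.
        (["java", "-jar", "GenomeAnalysisTK.jar", "-T", "UnifiedGenotyper", "-R", ref,
          "-I", f"{sample}.dedup.bam", "-L", targets_bed, "-glm", "BOTH",
          "-o", f"{sample}.vcf"], None),
    ]


for argv, stdout_file in pipeline_commands("rn5.fa", "strain_R1.fastq.gz",
                                           "strain_R2.fastq.gz", "F344_Jcl", "targets.bed"):
    print(shlex.join(argv) + (f" > {stdout_file}" if stdout_file else ""))
```

Keeping each step as an argument list rather than a shell string makes quoting explicit (`shlex.join` quotes the read-group string) and lets the same list be passed to `subprocess.run` if the steps are actually executed.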
Table 1.
Summary statistics for sequencing and variant calling.
| Strain | Sex | Total reads | Read length (bp) | Mapped reads after post-processing (%) | Average target depth (×) | SNVs (depth ≥ 5×) | INDELs (depth ≥ 5×) |
|---|---|---|---|---|---|---|---|
| BDIX.Cg-Tal/NemOda | Unknown | 77,031,192 | 151 | 62,133,380 (80.7) | 33.0 | 161,043 | 25,729 |
| BDIX/NemOda | Female | 62,884,340 | 151 | 50,668,261 (80.6) | 26.2 | 155,727 | 24,561 |
| BN/SsNSlc | Male | 67,363,478 | 151 | 54,385,129 (80.7) | 29.4 | 23,060 | 5,533 |
| BUF/MNa | Male | 60,898,020 | 151 | 49,603,905 (81.5) | 27.2 | 154,382 | 24,122 |
| DOB/Oda | Male | 68,359,820 | 151 | 61,010,641 (89.2) | 31.7 | 196,751 | 30,148 |
| F344/DuCrlCrlj | Male | 73,516,660 | 151 | 59,541,186 (81.0) | 27.7 | 152,184 | 23,890 |
| F344/Jcl | Male | 62,994,072 | 151 | 50,991,611 (80.9) | 26.6 | 152,141 | 23,855 |
| F344/NSlc | Male | 62,838,170 | 151 | 50,726,936 (80.7) | 27.5 | 152,546 | 23,930 |
| F344/Stm | Male | 64,788,908 | 151 | 52,984,127 (81.8) | 29.1 | 151,919 | 23,735 |
| HTX/Kyo | Male | 72,484,640 | 151 | 64,572,821 (89.1) | 33.7 | 154,418 | 24,156 |
| HWY/Slc | Male | 74,687,034 | 151 | 66,579,903 (89.1) | 34.6 | 157,070 | 24,873 |
| IS/Kyo | Male | 79,430,344 | 151 | 70,744,396 (89.1) | 37.4 | 187,300 | 29,120 |
| IS-Tlk/Kyo | Male | 75,990,092 | 151 | 67,761,875 (89.2) | 35.8 | 186,648 | 28,902 |
| KFRS3B/Kyo | Female | 81,643,134 | 151 | 72,603,786 (88.9) | 35.1 | 154,292 | 24,419 |
| LE/Stm | Male | 72,300,094 | 151 | 58,438,239 (80.8) | 31.7 | 157,488 | 25,052 |
| LEC/Tj | Unknown | 78,990,272 | 151 | 70,539,682 (89.3) | 37.3 | 167,547 | 26,315 |
| NIG-III/Hok | Unknown | 78,128,624 | 151 | 69,625,354 (89.1) | 36.9 | 164,732 | 26,195 |
| RCS/Kyo | Male | 71,627,648 | 151 | 57,894,324 (80.8) | 31.6 | 155,472 | 24,975 |
| ZF | Male | 69,986,466 | 151 | 56,655,891 (81.0) | 30.2 | 150,778 | 23,815 |
| ZFDM | Male | 73,535,060 | 151 | 59,407,086 (80.8) | 31.9 | 151,101 | 24,025 |
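The per-strain averages quoted in the text can be checked against Table 1. This short Python snippet uses rows transcribed from the table (the name `TABLE1` is ours) and recomputes the mean target depth and the mean SNV and INDEL counts. It also picks out BN/SsNSlc, which has far fewer variants than any other strain, presumably because the rn5 reference assembly was built from a Brown Norway rat.

```python
from statistics import mean

# (strain, average target depth, SNVs, INDELs), transcribed from Table 1
TABLE1 = [
    ("BDIX.Cg-Tal/NemOda", 33.0, 161043, 25729), ("BDIX/NemOda", 26.2, 155727, 24561),
    ("BN/SsNSlc", 29.4, 23060, 5533), ("BUF/MNa", 27.2, 154382, 24122),
    ("DOB/Oda", 31.7, 196751, 30148), ("F344/DuCrlCrlj", 27.7, 152184, 23890),
    ("F344/Jcl", 26.6, 152141, 23855), ("F344/NSlc", 27.5, 152546, 23930),
    ("F344/Stm", 29.1, 151919, 23735), ("HTX/Kyo", 33.7, 154418, 24156),
    ("HWY/Slc", 34.6, 157070, 24873), ("IS/Kyo", 37.4, 187300, 29120),
    ("IS-Tlk/Kyo", 35.8, 186648, 28902), ("KFRS3B/Kyo", 35.1, 154292, 24419),
    ("LE/Stm", 31.7, 157488, 25052), ("LEC/Tj", 37.3, 167547, 26315),
    ("NIG-III/Hok", 36.9, 164732, 26195), ("RCS/Kyo", 31.6, 155472, 24975),
    ("ZF", 30.2, 150778, 23815), ("ZFDM", 31.9, 151101, 24025),
]

mean_depth = mean(row[1] for row in TABLE1)
mean_snv = mean(row[2] for row in TABLE1)
mean_indel = mean(row[3] for row in TABLE1)
fewest = min(TABLE1, key=lambda row: row[2])  # strain with the fewest SNVs

print(f"mean depth {mean_depth:.1f}x, SNVs {mean_snv:,.0f}, INDELs {mean_indel:,.0f}")
print(f"fewest SNVs: {fewest[0]} ({fewest[2]:,})")
# mean depth 31.7x, SNVs 154,330, INDELs 24,368
# fewest SNVs: BN/SsNSlc (23,060)
```

The exact means are 31.73×, 154,329.95 SNVs, and 24,367.5 INDELs, which round to the figures reported in the text.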
Conflict of interest
The authors declare no conflicts of interest.
Acknowledgements
We thank the National BioResource Project–Rat (http://www.anim.med.kyoto-u.ac.jp/nbr/) for providing rat strains. This work was supported in part by the Cooperative Research Project Program of the Medical Institute of Bioregulation, Kyushu University, to OO, and the Genome Information Upgrading Program of the National BioResource Project, Japan Agency for Medical Research and Development, to OO, TK, and MS.
Contributor Information
Takashi Kuramoto, Email: tkuramot@anim.med.kyoto-u.ac.jp.
Mikita Suyama, Email: mikita@bioreg.kyushu-u.ac.jp.
References
- 1. Gibbs R.A., Weinstock G.M., Metzker M.L., et al.; Rat Genome Sequencing Project Consortium. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004;428:493–521. doi: 10.1038/nature02426.
- 2. Serikawa T., Mashimo T., Takizawa A., Okajima R., Maedomari N., Kumafuji K., Tagami F., Neoda Y., Otsuki M., Nakanishi S., Yamasaki K., Voigt B., Kuramoto T. National BioResource Project-Rat and related activities. Exp. Anim. Jpn. Assoc. Lab. Anim. Sci. 2009;58:333–341. doi: 10.1538/expanim.58.333.
- 3. Yoshihara M., Saito D., Sato T., Ohara O., Kuramoto T., Suyama M. Design and application of a target capture sequencing of exons and conserved non-coding sequences for the rat. BMC Genomics. 2016;17:593. doi: 10.1186/s12864-016-2975-9.
- 4. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 5. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R.; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 6. McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., DePristo M.A. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
