Genomics Data. 2016 Nov 14;10:155–157. doi: 10.1016/j.gdata.2016.11.010

Application of target capture sequencing of exons and conserved non-coding sequences to 20 inbred rat strains

Minako Yoshihara a,b, Tetsuya Sato a,b, Daisuke Saito a,b, Osamu Ohara c, Takashi Kuramoto d, Mikita Suyama a,b
PMCID: PMC5114524  PMID: 27882299

Abstract

We report sequence data obtained by applying our recently devised target capture method, TargetEC, to 20 inbred rat strains. The method targets not only all annotated exons but also highly conserved non-coding sequences shared among vertebrates. The target regions total 146.8 Mb. On average, we obtained a target depth of 31.7× and identified 154,330 SNVs and 24,368 INDELs per strain. This corresponds to 470,037 unique SNVs and 68,652 unique INDELs among the 20 strains. The sequence data can be accessed at DDBJ/EMBL/GenBank under accession number PRJDB4648, and the identified variants have been deposited at http://bioinfo.sls.kyushu-u.ac.jp/rat_target_capture/20_strains.vcf.gz.


Specifications

Organism/cell line/tissue: Rattus norvegicus (BDIX/NemOda, BDIX.Cg-Tal/NemOda, BN/SsNSlc, BUF/MNa, DOB/Oda, F344/DuCrlCrlj, F344/Jcl, F344/NSlc, F344/Stm, HTX/Kyo, HWY/Slc, IS/Kyo, IS-Tlk/Kyo, KFRS3B/Kyo, LE/Stm, LEC/Tj, NIG-III/Hok, RCS/Kyo, ZF, ZFDM)
Sex: Female and male, see Table 1
Sequencer or array type: Illumina NextSeq 500
Data format: FASTQ and VCF
Experimental factors: Genomic DNA extracted from spleen
Experimental features: Target capture sequencing of exons and conserved non-coding sequences
Consent: Not applicable
Sample source location: Rat strains were provided by the National BioResource Project (NBRP)–Rat (http://www.anim.med.kyoto-u.ac.jp/nbr/).

1. Direct link to deposited data

http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJDB4648

http://bioinfo.sls.kyushu-u.ac.jp/rat_target_capture/20_strains.vcf.zip

2. Experimental design, materials and methods

Rats are used as animal models of many human diseases, such as cancer and hypertension. Because of the rat's significance in biomedical research, the genome sequence of the Brown Norway strain was determined as the third complete mammalian genome [1]. The National BioResource Project–Rat (NBRP-Rat) at Kyoto University is one of the largest repositories of rat strains; more than 700 strains have been collected and are preserved as live animals, embryos, or sperm [2]. Determining the genome sequences of these strains is important not only for understanding the genetic causes of various phenotypes but also for augmenting the value of the strains as biological resources.

Whole exome sequencing is an efficient approach that characterizes only the exonic portion of a genome, which typically comprises 1%–2% of a complete mammalian genome, and it has been used successfully to identify disease-relevant genes and their causative mutations in humans. Although exome capture kits exist for some non-human species, no such capture probe set had previously been available for rats. We therefore established a target capture kit designed specifically for the rat, employing the SeqCap EZ Developer Library (Roche NimbleGen, Madison, WI, USA; design name 140929_RN5_MS_EZ_HX1). In designing the probe set, we included highly conserved non-coding sequences (CNSs) as target regions in addition to all annotated exons, covering a total of 146.8 Mb of the genome [3]. By applying this method, TargetEC (target capture for exons and conserved non-coding sequences), to four rat strains (WTC/Kyo, WTC-swh/Kyo, PVG/Seac, and KFRS4/Kyo), we confirmed that it efficiently identifies causative mutations, including those in non-coding regions [3]. In this study, we applied TargetEC to 20 additional inbred strains preserved at NBRP-Rat to identify further variants across multiple rat strains. These 20 strains were selected according to three categories: disease models derived from selective breeding (BDIX/NemOda, BDIX.Cg-Tal/NemOda, BUF/MNa, HTX/Kyo, HWY/Slc, KFRS3B/Kyo, RCS/Kyo, ZF, and ZFDM), strains originating from wild populations (BN/SsNSlc, DOB/Oda, IS/Kyo, IS-Tlk/Kyo, LE/Stm, LEC/Tj, and NIG-III/Hok), and representative inbred strains (F344/DuCrlCrlj, F344/Jcl, F344/NSlc, and F344/Stm). All animal experimentation protocols were approved by the Institutional Animal Care and Use Committees of Kyoto University and were conducted according to the Regulation on Animal Experimentation at Kyoto University.

Genomic DNA was extracted from spleen samples using standard protocols. Target capture was performed using the standard SeqCap EZ System protocol (Roche NimbleGen). DNA sequencing libraries were prepared using the KAPA HyperPlus Library Preparation Kit (KAPA Biosystems, London, UK) according to the manufacturer's protocol. Sequencing was performed on an Illumina NextSeq 500 platform (Illumina, San Diego, CA, USA) using the High Output Kit (2 × 150 cycles). We obtained 61–82 million reads for each strain (Table 1). Sequence reads were mapped to the rat genome assembly rn5 (RGSC 5.0, March 2012) using BWA (v0.7.4) [4] with the default parameters. SAMtools (v0.1.12a) [5], Picard tools (v1.87) (http://broadinstitute.github.io/picard/), and the Genome Analysis Toolkit (GATK; v2.5.2) [6] were used for post-processing of the mapped reads. Variants were called with the UnifiedGenotyper utility in GATK. On average, we identified 154,330 SNVs and 24,368 INDELs per strain in the target regions (Table 1). The numbers of unique SNVs and INDELs among the 20 strains were 470,037 and 68,652, respectively. The sequence data and variants identified for these strains represent valuable resources for further genetic studies in the rat.
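The distinction between per-strain counts and unique counts across strains can be illustrated with a small sketch. It is not the authors' pipeline: it assumes a multi-sample VCF whose FORMAT column carries `GT` and `DP`, treats a per-sample `DP` of at least 5 as the depth filter, and uses hypothetical function names (`classify`, `count_variants`) and a made-up two-strain input.

```python
def classify(ref, alt):
    """Label an allele pair: single-base swap = SNV, length change = INDEL."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # same-length multi-base substitution; not counted here


def count_variants(vcf_text, min_depth=5):
    """Return (per-sample counts, unique counts) for SNVs and INDELs."""
    samples, per_sample = [], {}
    unique = {"SNV": set(), "INDEL": set()}
    for line in vcf_text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            per_sample = {s: {"SNV": 0, "INDEL": 0} for s in samples}
            continue
        chrom, pos, ref = fields[0], fields[1], fields[3]
        alts = fields[4].split(",")
        keys = fields[8].split(":")
        for sample, call in zip(samples, fields[9:]):
            values = dict(zip(keys, call.split(":")))
            depth = values.get("DP", ".")
            # ASSUMPTION: per-sample DP stands in for the "depth >= 5x" filter.
            if not depth.isdigit() or int(depth) < min_depth:
                continue
            genotype = values.get("GT", ".").replace("|", "/").split("/")
            for allele in {a for a in genotype if a.isdigit() and a != "0"}:
                alt = alts[int(allele) - 1]
                kind = classify(ref, alt)
                if kind != "OTHER":
                    per_sample[sample][kind] += 1
                    unique[kind].add((chrom, pos, ref, alt))
    return per_sample, {kind: len(sites) for kind, sites in unique.items()}


# Toy two-strain VCF (hypothetical records, not taken from the deposited file).
rows = [
    ["##fileformat=VCFv4.1"],
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
     "strainA", "strainB"],
    ["chr1", "100", ".", "A", "G", "50", ".", ".", "GT:DP", "1/1:12", "1/1:9"],
    ["chr1", "200", ".", "C", "T", "50", ".", ".", "GT:DP", "1/1:3", "0/0:20"],
    ["chr2", "300", ".", "AT", "A", "50", ".", ".", "GT:DP", "0/0:15", "1/1:8"],
    ["chr2", "400", ".", "G", "C", "50", ".", ".", "GT:DP", "1/1:30", "0/0:11"],
]
toy_vcf = "\n".join("\t".join(row) for row in rows)

per_sample, unique = count_variants(toy_vcf)
print(per_sample)
print(unique)
```

In this toy input, strainA carries two SNVs and strainB carries one SNV and one INDEL, yet there are only two unique SNVs because the variant at chr1:100 is shared. The same effect explains why the unique totals across 20 strains are far smaller than 20 times the per-strain average.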

Table 1.

Summary statistics for sequencing and variant calling.

| Strain | Sex | Total reads | Read length | Mapped reads after post-processing (%) | Average target depth | SNV (depth ≥ 5×) | INDEL (depth ≥ 5×) |
|---|---|---|---|---|---|---|---|
| BDIX.Cg-Tal/NemOda | Unknown | 77,031,192 | 151 | 62,133,380 (80.7) | 33.0 | 161,043 | 25,729 |
| BDIX/NemOda | Female | 62,884,340 | 151 | 50,668,261 (80.6) | 26.2 | 155,727 | 24,561 |
| BN/SsNSlc | Male | 67,363,478 | 151 | 54,385,129 (80.7) | 29.4 | 23,060 | 5,533 |
| BUF/MNa | Male | 60,898,020 | 151 | 49,603,905 (81.5) | 27.2 | 154,382 | 24,122 |
| DOB/Oda | Male | 68,359,820 | 151 | 61,010,641 (89.2) | 31.7 | 196,751 | 30,148 |
| F344/DuCrlCrlj | Male | 73,516,660 | 151 | 59,541,186 (81.0) | 27.7 | 152,184 | 23,890 |
| F344/Jcl | Male | 62,994,072 | 151 | 50,991,611 (80.9) | 26.6 | 152,141 | 23,855 |
| F344/NSlc | Male | 62,838,170 | 151 | 50,726,936 (80.7) | 27.5 | 152,546 | 23,930 |
| F344/Stm | Male | 64,788,908 | 151 | 52,984,127 (81.8) | 29.1 | 151,919 | 23,735 |
| HTX/Kyo | Male | 72,484,640 | 151 | 64,572,821 (89.1) | 33.7 | 154,418 | 24,156 |
| HWY/Slc | Male | 74,687,034 | 151 | 66,579,903 (89.1) | 34.6 | 157,070 | 24,873 |
| IS/Kyo | Male | 79,430,344 | 151 | 70,744,396 (89.1) | 37.4 | 187,300 | 29,120 |
| IS-Tlk/Kyo | Male | 75,990,092 | 151 | 67,761,875 (89.2) | 35.8 | 186,648 | 28,902 |
| KFRS3B/Kyo | Female | 81,643,134 | 151 | 72,603,786 (88.9) | 35.1 | 154,292 | 24,419 |
| LE/Stm | Male | 72,300,094 | 151 | 58,438,239 (80.8) | 31.7 | 157,488 | 25,052 |
| LEC/Tj | Unknown | 78,990,272 | 151 | 70,539,682 (89.3) | 37.3 | 167,547 | 26,315 |
| NIG-III/Hok | Unknown | 78,128,624 | 151 | 69,625,354 (89.1) | 36.9 | 164,732 | 26,195 |
| RCS/Kyo | Male | 71,627,648 | 151 | 57,894,324 (80.8) | 31.6 | 155,472 | 24,975 |
| ZF | Male | 69,986,466 | 151 | 56,655,891 (81.0) | 30.2 | 150,778 | 23,815 |
| ZFDM | Male | 73,535,060 | 151 | 59,407,086 (80.8) | 31.9 | 151,101 | 24,025 |
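The averages and read range quoted in the text can be cross-checked against Table 1. The sketch below copies total reads, average target depth, SNV counts, and INDEL counts from the table and recomputes the summary figures; the variable and function names are illustrative only.

```python
# Rows copied from Table 1:
# (strain, total reads, average target depth, SNVs, INDELs)
TABLE1 = [
    ("BDIX.Cg-Tal/NemOda", 77_031_192, 33.0, 161_043, 25_729),
    ("BDIX/NemOda", 62_884_340, 26.2, 155_727, 24_561),
    ("BN/SsNSlc", 67_363_478, 29.4, 23_060, 5_533),
    ("BUF/MNa", 60_898_020, 27.2, 154_382, 24_122),
    ("DOB/Oda", 68_359_820, 31.7, 196_751, 30_148),
    ("F344/DuCrlCrlj", 73_516_660, 27.7, 152_184, 23_890),
    ("F344/Jcl", 62_994_072, 26.6, 152_141, 23_855),
    ("F344/NSlc", 62_838_170, 27.5, 152_546, 23_930),
    ("F344/Stm", 64_788_908, 29.1, 151_919, 23_735),
    ("HTX/Kyo", 72_484_640, 33.7, 154_418, 24_156),
    ("HWY/Slc", 74_687_034, 34.6, 157_070, 24_873),
    ("IS/Kyo", 79_430_344, 37.4, 187_300, 29_120),
    ("IS-Tlk/Kyo", 75_990_092, 35.8, 186_648, 28_902),
    ("KFRS3B/Kyo", 81_643_134, 35.1, 154_292, 24_419),
    ("LE/Stm", 72_300_094, 31.7, 157_488, 25_052),
    ("LEC/Tj", 78_990_272, 37.3, 167_547, 26_315),
    ("NIG-III/Hok", 78_128_624, 36.9, 164_732, 26_195),
    ("RCS/Kyo", 71_627_648, 31.6, 155_472, 24_975),
    ("ZF", 69_986_466, 30.2, 150_778, 23_815),
    ("ZFDM", 73_535_060, 31.9, 151_101, 24_025),
]


def column_mean(rows, index):
    """Arithmetic mean of one numeric column across all strains."""
    return sum(row[index] for row in rows) / len(rows)


mean_depth = column_mean(TABLE1, 2)
mean_snv = column_mean(TABLE1, 3)
mean_indel = column_mean(TABLE1, 4)
min_reads = min(row[1] for row in TABLE1)
max_reads = max(row[1] for row in TABLE1)

print(f"strains: {len(TABLE1)}")
print(f"mean target depth: {mean_depth:.2f}x")
print(f"mean SNVs per strain: {mean_snv:,.2f}")
print(f"mean INDELs per strain: {mean_indel:,.2f}")
print(f"total reads: {min_reads / 1e6:.1f}-{max_reads / 1e6:.1f} million")
```

The recomputed means (about 31.73× depth, 154,329.95 SNVs, and 24,367.5 INDELs) round to the 31.7×, 154,330, and 24,368 reported in the text, and the read totals span roughly 60.9 to 81.6 million, consistent with the stated 61–82 million. The BN/SsNSlc row has far fewer variants than the others, as expected for the strain on which the reference assembly is based [1].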

Conflict of interest

The authors declare no conflicts of interest.

Acknowledgements

We thank the National BioResource Project–Rat (http://www.anim.med.kyoto-u.ac.jp/nbr/) for providing rat strains. This work was supported in part by the Cooperative Research Project Program of the Medical Institute of Bioregulation, Kyushu University, to OO, and the Genome Information Upgrading Program of the National BioResource Project, Japan Agency for Medical Research and Development, to OO, TK, and MS.

Contributor Information

Takashi Kuramoto, Email: tkuramot@anim.med.kyoto-u.ac.jp.

Mikita Suyama, Email: mikita@bioreg.kyushu-u.ac.jp.

References

1. Gibbs R.A., Weinstock G.M., Metzker M.L., Muzny D.M., Sodergren E.J., Scherer S., Scott G., Steffen D., Worley K.C., Burch P.E., Okwuonu G., Hines S., Lewis L., DeRamo C., Delgado O., Dugan-Rocha S., Miner G., Morgan M., Hawes A., Gill R., Celera, Holt R.A., Adams M.D., Amanatides P.G., Baden-Tillson H., Barnstead M., Chin S., Evans C.A., Ferriera S., Fosler C., Glodek A., Gu Z., Jennings D., Kraft C.L., Nguyen T., Pfannkoch C.M., Sitter C., Sutton G.G., Venter J.C., Woodage T., Smith D., Lee H.-M., Gustafson E., Cahill P., Kana A., Doucette-Stamm L., Weinstock K., Fechtel K., Weiss R.B., Dunn D.M., Green E.D., Blakesley R.W., Bouffard G.G., De Jong P.J., Osoegawa K., Zhu B., Marra M., Schein J., Bosdet I., Fjell C., Jones S., Krzywinski M., Mathewson C., Siddiqui A., Wye N., McPherson J., Zhao S., Fraser C.M., Shetty J., Shatsman S., Geer K., Chen Y., Abramzon S., Nierman W.C., Havlak P.H., Chen R., Durbin K.J., Egan A., Ren Y., Song X.-Z., Li B., Liu Y., Qin X., Cawley S., Worley K.C., Cooney A.J., D'Souza L.M., Martin K., Wu J.Q., Gonzalez-Garay M.L., Jackson A.R., Kalafus K.J., McLeod M.P., Milosavljevic A., Virk D., Volkov A., Wheeler D.A., Zhang Z., Bailey J.A., Eichler E.E., Tuzun E., Birney E., Mongin E., Ureta-Vidal A., Woodwark C., Zdobnov E., Bork P., Suyama M., Torrents D., Alexandersson M., Trask B.J., Young J.M., Huang H., Wang H., Xing H., Daniels S., Gietzen D., Schmidt J., Stevens K., Vitt U., Wingrove J., Camara F., Albà M.M., Abril J.F., Guigo R., Smit A., Dubchak I., Rubin E.M., Couronne O., Poliakov A., Hübner N., Ganten D., Goesele C., Hummel O., Kreitler T., Lee Y.-A., Monti J., Schulz H., Zimdahl H., Himmelbauer H., Lehrach H., Jacob H.J., Bromberg S., Gullings-Handley J., Jensen-Seaman M.I., Kwitek A.E., Lazar J., Pasko D., Tonellato P.J., Twigger S., Ponting C.P., Duarte J.M., Rice S., Goodstadt L., Beatson S.A., Emes R.D., Winter E.E., Webber C., Brandt P., Nyakatura G., Adetobi M., Chiaromonte F., Elnitski L., Eswara P., Hardison R.C., Hou M., Kolbe D., Makova K., Miller W., Nekrutenko A., Riemer C., Schwartz S., Taylor J., Yang S., Zhang Y., Lindpaintner K., Andrews T.D., Caccamo M., Clamp M., Clarke L., Curwen V., Durbin R., Eyras E., Searle S.M., Cooper G.M., Batzoglou S., Brudno M., Sidow A., Stone E.A., Venter J.C., Payseur B.A., Bourque G., López-Otín C., Puente X.S., Chakrabarti K., Chatterji S., Dewey C., Pachter L., Bray N., Yap V.B., Caspi A., Tesler G., Pevzner P.A., Haussler D., Roskin K.M., Baertsch R., Clawson H., Furey T.S., Hinrichs A.S., Karolchik D., Kent W.J., Rosenbloom K.R., Trumbower H., Weirauch M., Cooper D.N., Stenson P.D., Ma B., Brent M., Arumugam M., Shteynberg D., Copley R.R., Taylor M.S., Riethman H., Mudunuri U., Peterson J., Guyer M., Felsenfeld A., Old S., Mockrin S., Collins F.; Rat Genome Sequencing Project Consortium. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004;428:493–521. doi: 10.1038/nature02426.
2. Serikawa T., Mashimo T., Takizawa A., Okajima R., Maedomari N., Kumafuji K., Tagami F., Neoda Y., Otsuki M., Nakanishi S., Yamasaki K., Voigt B., Kuramoto T. National BioResource Project-Rat and related activities. Exp. Anim. Jpn. Assoc. Lab. Anim. Sci. 2009;58:333–341. doi: 10.1538/expanim.58.333.
3. Yoshihara M., Saito D., Sato T., Ohara O., Kuramoto T., Suyama M. Design and application of a target capture sequencing of exons and conserved non-coding sequences for the rat. BMC Genomics. 2016;17:593. doi: 10.1186/s12864-016-2975-9.
4. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
5. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R.; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
6. McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M., DePristo M.A. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.

Articles from Genomics Data are provided here courtesy of Elsevier
