Data in Brief. 2020 Dec 9;34:106577. doi: 10.1016/j.dib.2020.106577

Characterization of the genomic sequence data around common cutworm resistance genes in soybean (Glycine max) using short- and long-read sequencing methods

Eri Ogiso-Tanaka, Nobuhiko Oki, Tsuyoshi Tanaka, Takehiko Shimizu, Masao Ishimoto, Makita Hajika, Akito Kaga
PMCID: PMC7758510  PMID: 33376760

Abstract

The common cutworm (CCW, Spodoptera litura Fabricius) is one of the most serious pests of soybean (Glycine max L. Merr.). In a previous report, quantitative trait locus (QTL) analysis of CCW resistance using recombinant inbred lines derived from a cross between the susceptible cultivar ‘Fukuyutaka’ and the resistant cultivar ‘Himeshirazu’ identified two antibiosis resistance QTLs, CCW-1 and CCW-2. To reveal sequence variation between these two cultivars, whole-genome resequencing was performed on the Illumina HiSeq2000 (75,632,747 and 91,540,849 reads, respectively). The generated datasets can be used for fine mapping and gene isolation of CCW-1 and CCW-2, as well as for revealing more detailed genetic differences between ‘Fukuyutaka’ and ‘Himeshirazu’.

Keywords: Glycine max; Soybean; Resistance to the common cutworm; QTL region; Whole genome resequencing; Targeted amplicon sequencing; HiSeq; Oxford Nanopore MinION

Specification Table

Subject: Plant science
Specific subject area: Agricultural and Biological Sciences; genomics of soybean (Glycine max)
Type of data: Figures, tables, and FASTQ/FASTA files
How data were acquired: The whole genomes of soybean cultivars ‘Fukuyutaka’ and ‘Himeshirazu’ were sequenced on an Illumina HiSeq2000 short-read sequencer.
The sequence of the unique genomic region in CCW-2 was amplified by genomic polymerase chain reaction (PCR) and sequenced on a MinION nanopore long-read sequencer (R9.4 flow cell; Oxford Nanopore Technologies Ltd., UK [ONT]).
Data format: Raw sequencing reads (FASTQ), Binary Alignment Map (BAM), and analyzed files (FASTA)
Parameters for data collection: The CCW-susceptible soybean cultivar ‘Fukuyutaka’ and the CCW-resistant cultivar ‘Himeshirazu’ were used in this work. Their seeds are available from the NARO Genebank (https://www.gene.affrc.go.jp/databases_en.php). Genomic DNA for sequencing was prepared from young leaves of a single individual.
Description of data: HiSeq: Sequencing libraries were prepared from 1 μg of input DNA using the TruSeq DNA PCR-Free Library Preparation Kit (Illumina). Library pools were quantified by qPCR, loaded onto HiSeq2000 flow cells, and clustered on an Illumina cBot according to the manufacturer's protocol. Flow cells were sequenced on the Illumina HiSeq2000 with 2 × 100 bp reads. Sequencing data were demultiplexed with bcl2fastq2.
MinION: Amplicons were obtained by PCR from the genomic DNA of ‘Himeshirazu’. A total of 1 µg of amplicon DNA was end-repaired and used for library construction. The MinION run was controlled with MinKNOW (version 1.7.3). The resulting FAST5 files were converted to FASTQ files with the Albacore basecaller (version 1.1.0, ONT), and the raw reads were assembled with Canu (version 1.6) [1].
Data source location: Institute of Crop Science, National Agriculture and Food Research Organization (NARO), Tsukuba, Japan
Data accessibility: The sequence data have been deposited in the DNA Data Bank of Japan Sequence Read Archive (DRA) under submission IDs DRA010742, DRA010747, and DRA010652 (http://trace.ddbj.nig.ac.jp/DRASearch/)
(BioProject accessions: PRJDB10367, PRJDB10313)
The consensus sequence is available in FASTA format on figshare: https://figshare.com/search?q=10.6084%2Fm9.figshare.13220792

Value of the Data

  • Genomic data from the CCW-susceptible and CCW-resistant soybean cultivars can be used to develop molecular markers for detecting quantitative trait loci and for isolating the underlying genes.

  • The sequence of the genomic insertion of ‘Himeshirazu’ in the CCW-2 region can be used for fine mapping of a candidate gene.

  • These data can be used for development of DNA markers and can contribute to marker-assisted selection in soybean breeding.

1. Data Description

The common cutworm (CCW, Spodoptera litura Fabricius) is one of the most serious pests of soybean (Glycine max (L.) Merr.). Komatsu et al. (2004) reported the antibiotic effects of the soybean cultivars ‘Fukuyutaka’ and ‘Himeshirazu’ on CCW [2]. ‘Fukuyutaka’ is a leading cultivar in southwestern Japan but is susceptible to CCW, whereas ‘Himeshirazu’ is a forage cultivar with strong CCW resistance [2]. QTL analysis of CCW resistance using recombinant inbred lines derived from a cross between ‘Fukuyutaka’ and ‘Himeshirazu’ identified two antibiosis resistance QTLs, CCW-1 and CCW-2 [3], [4], and two antixenosis resistance QTLs, qRslx1 and qRslx2 [5]. The CCW-1 and CCW-2 regions of ‘Himeshirazu’ were verified using near-isogenic lines [6]. To detect polymorphic sites in the CCW-1 and CCW-2 genomic regions, we performed whole-genome resequencing and variant detection.

HiSeq: We present whole-genome sequence data for ‘Fukuyutaka’ and ‘Himeshirazu’. Paired-end libraries were sequenced on the Illumina HiSeq2000, generating 75,632,747 and 91,540,849 reads, respectively. These reads were compared with reference genome version 2.0 (Gmax275: http://genome.jgi.doe.gov/pages/dynamicOrganismDownload.jsf?organism=Phytozome#, downloaded on May 15, 2015) [7], and 1,599,492 and 1,846,338 polymorphic sites were detected in ‘Fukuyutaka’ and ‘Himeshirazu’, respectively (Table 1). Of these, 2,489 (‘Fukuyutaka’) and 4,873 (‘Himeshirazu’) were in the CCW1 region (Chr7:10,655,942-15,394,281, corresponding to the genomic interval between the SSR markers Sat_258 and Satt175), and 9,553 and 10,627, respectively, were in the CCW2 region (Chr7:4,559,713-8,283,465, corresponding to Satt567-Satt463) (Table 1). Between ‘Fukuyutaka’ and ‘Himeshirazu’, there were 2,899 polymorphic sites (2,483 SNPs and 416 InDels) in the CCW1 region and 10,547 (8,632 SNPs and 1,915 InDels) in the CCW2 region (Table 2, S1-S2). These data will be useful for developing SNP/InDel markers for genetic mapping, for identifying the responsible genes, and for comparative functional genomics. In addition, the read alignment revealed partially unaligned reads in ‘Himeshirazu’ around Chr7:4,588,573-4,588,578 (Fig. 1). Using primers flanking this unaligned region, designed on the Gmax275 reference genome (Chr07:4,585,885-4,597,201; 11,316 bp in the reference genome), we amplified a fragment of about 18 kb from ‘Himeshirazu’ and characterized its sequence in depth (Fig. 2).
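
The region-level counts in Tables 1 and 2 amount to tallying variant records whose positions fall inside each QTL interval and classifying each record as a SNP or an InDel. The following minimal Python sketch illustrates that tally under stated assumptions: it works on simplified biallelic records (chromosome, position, REF, ALT) rather than a full VCF parser, uses the interval coordinates given above with Gmax275-style chromosome names ("Chr07"), and the helper names `variant_type` and `count_variants`, as well as the rule that single-base REF and ALT alleles define a SNP, are hypothetical choices, not the authors' pipeline.

```python
from collections import Counter

# QTL intervals on Gmax275 (1-based, inclusive), taken from the text.
# Chromosome naming ("Chr07") is assumed to follow the Gmax275 FASTA headers.
REGIONS = {
    "CCW1": ("Chr07", 10_655_942, 15_394_281),  # Sat_258-Satt175
    "CCW2": ("Chr07", 4_559_713, 8_283_465),    # Satt567-Satt463
}


def variant_type(ref, alt):
    """Single-base REF and ALT -> 'SNP'; anything else -> 'InDel' (assumption).

    Multi-nucleotide substitutions would also be labelled 'InDel' by this rule.
    """
    return "SNP" if len(ref) == 1 and len(alt) == 1 else "InDel"


def count_variants(records, regions=REGIONS):
    """Count SNPs and InDels per region from (chrom, pos, ref, alt) tuples."""
    counts = {name: Counter() for name in regions}
    for chrom, pos, ref, alt in records:
        for name, (r_chrom, start, end) in regions.items():
            if chrom == r_chrom and start <= pos <= end:
                counts[name][variant_type(ref, alt)] += 1
    return counts


# Hypothetical records for illustration only (not study data).
records = [
    ("Chr07", 4_588_000, "A", "G"),     # SNP inside CCW2
    ("Chr07", 4_600_000, "AT", "A"),    # deletion inside CCW2
    ("Chr07", 12_000_000, "C", "T"),    # SNP inside CCW1
    ("Chr08", 12_000_000, "C", "T"),    # different chromosome: not counted
]
counts = count_variants(records)
print(dict(counts["CCW1"]), dict(counts["CCW2"]))
```

Between-cultivar counts (Table 2) would additionally require comparing the two cultivars' genotypes at each site, which this sketch does not attempt.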

Table 1.

Number of polymorphic sites relative to the reference genome (Gmax275; cultivar ‘Williams 82’).

Region | Fukuyutaka | Himeshirazu
Whole genome | 1,599,492 | 1,846,338
CCW1 region (Sat_258-Satt175) | 2,489 | 4,873
CCW2 region (Satt567-Satt463) | 9,553 | 10,627

Table 2.

Number of polymorphic sites between the ‘Himeshirazu’ and ‘Fukuyutaka’ cultivars.

Variant type | CCW1 region (Sat_258-Satt175) | CCW2 region (Satt567-Satt463)
SNP | 2,483 | 8,632
InDel | 416 | 1,915
Total | 2,899 | 10,547

Fig. 1.

Read alignment suggesting an insertion event around Chr7:4588573-4588578 in ‘Himeshirazu’ (bottom panel). The image, taken from the Integrative Genomics Viewer (IGV), shows the read alignments of ‘Fukuyutaka’ (top panel) and ‘Himeshirazu’ (bottom panel). The red arrow marks the position where the inserted sequence is presumed to be located.

Fig. 2.

Position and size of the amplicon for targeted sequencing on the MinION platform (Chr07:4585885-4597201; expected amplicon size in the reference genome, about 11 kbp). Black and red arrows indicate the target region of amplicon sequencing and the candidate insertion breakpoint, respectively. Black lines represent the amplicons from ‘Fukuyutaka’ and ‘Himeshirazu’.

MinION: We determined the sequence inserted in the CCW2 region of ‘Himeshirazu’. The amplified fragment, estimated by PCR analysis to be about 18 kbp long, was sequenced on the Oxford Nanopore MinION platform (Oxford Nanopore Technologies Ltd., Oxford, UK). Of the 28,725 raw reads obtained, only 18 remained after trimming and quality control by Canu. The lengths of these 18 reads ranged from 18,023 bp to 41,188 bp and fell into two distinct groups (Table 3). Given the estimated size of the amplified region, we suspected that the longer reads were artifacts. To test this, we performed a BLASTN homology search among the 18 reads. Whereas the 14 shorter reads each shared a single homologous region with one another, the four longer reads (No. 15–18; 34,355 bp, 33,401 bp, 36,324 bp, and 41,188 bp, respectively) each contained two regions homologous to the shorter reads. Using MUMmer (mummer-4.0.0beta2) [8], we confirmed that the sequence of a shorter read was tandemly duplicated within a long read. A homology search of the 18 reads against the Gmax275 genome sequence found matches on Chr07 interrupted by gaps of 7.2–7.5 kbp (Table 3). We therefore concluded that the four longer reads were chimeric and excluded them from the assembly. Finally, we constructed a consensus sequence from the remaining 14 reads. Alignment of the consensus sequence to the corresponding region on Chr07 of the Gmax275 reference genome, which showed a long gap, confirmed that the consensus contained the insertion observed in ‘Himeshirazu’ (Fig. 3). These data will be useful for fine mapping of CCW-2 and identification of the responsible gene.
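
The length-based suspicion about the four longer reads can be illustrated with the read lengths listed in Table 3. The minimal Python sketch below flags reads longer than a hypothetical cut-off of 1.5 times the ~18-kb amplicon size estimated by electrophoresis; a read close to twice the amplicon length is consistent with two amplicon copies joined end to end. The function name `flag_long_reads` and the 1.5× threshold are illustrative assumptions. Such a screen is only a first pass: in this study, chimerism was confirmed by BLASTN and MUMmer, not by length alone.

```python
# Lengths (bp) of the 18 reads retained after Canu trimming, copied from Table 3.
READ_LENGTHS = {
    1: 18023, 2: 18100, 3: 18118, 4: 18149, 5: 18150, 6: 18163,
    7: 18172, 8: 18185, 9: 18212, 10: 18216, 11: 18217, 12: 18222,
    13: 18269, 14: 18272, 15: 34355, 16: 33401, 17: 36324, 18: 41188,
}
EXPECTED_AMPLICON_BP = 18_000  # approximate amplicon size from the agarose gel


def flag_long_reads(lengths, expected=EXPECTED_AMPLICON_BP, max_ratio=1.5):
    """Return IDs of reads longer than max_ratio x the expected amplicon size.

    max_ratio = 1.5 is a hypothetical cut-off chosen for illustration only.
    """
    return sorted(rid for rid, n in lengths.items() if n > max_ratio * expected)


suspected = flag_long_reads(READ_LENGTHS)
kept = sorted(set(READ_LENGTHS) - set(suspected))
for rid in suspected:
    ratio = READ_LENGTHS[rid] / EXPECTED_AMPLICON_BP
    print(f"read {rid}: {READ_LENGTHS[rid]} bp ({ratio:.1f}x amplicon)")
print(f"{len(kept)} reads kept for the consensus")
```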

Table 3.

Summary of BLASTN results for the 18 “pass” reads aligned to the target sequence of the reference genome (Gmax275).

No. | Query length (bp) | Subject | Identity (%) | Query start | Query end | Subject start | Subject end | Estimated gap length (bp)
1 | 18023 | Chr07 | 96.384 | 17 | 8350 | 4597201 | 4588579 |
 |  |  | 95.64 | 15619 | 18002 | 4588582 | 4586108 | 7270
2 | 18100 | Chr07 | 96.26 | 21 | 8367 | 4597201 | 4588573 |
 |  |  | 94.353 | 15739 | 18097 | 4588582 | 4586106 | 7373
3 | 18118 | Chr07 | 95.799 | 1 | 8310 | 4597173 | 4588573 |
 |  |  | 96.739 | 15797 | 18118 | 4588582 | 4586192 | 7488
4 | 18149 | Chr07 | 96.499 | 14 | 6546 | 4597201 | 4590470 |
 |  |  | 96.884 | 6586 | 8298 | 4590365 | 4588602 |
 |  |  | 96.734 | 15736 | 18146 | 4588582 | 4586106 | 7439
5 | 18150 | Chr07 | 96.239 | 15 | 8360 | 4597201 | 4588573 |
 |  |  | 97.264 | 15691 | 18117 | 4588582 | 4586100 | 7332
6 | 18163 | Chr07 | 96.677 | 11 | 8387 | 4597201 | 4588573 |
 |  |  | 96.802 | 15769 | 18163 | 4588582 | 4586113 | 7383
7 | 18172 | Chr07 | 96.582 | 3 | 8330 | 4597166 | 4588573 |
 |  |  | 96.529 | 15771 | 18170 | 4588582 | 4586108 | 7442
8 | 18185 | Chr07 | 96.835 | 6 | 8384 | 4597188 | 4588573 |
 |  |  | 97.029 | 15754 | 18184 | 4588582 | 4586093 | 7371
9 | 18212 | Chr07 | 97.068 | 10 | 8370 | 4597201 | 4588613 |
 |  |  | 95.858 | 15811 | 18212 | 4588582 | 4586100 | 7442
10 | 18216 | Chr07 | 96.466 | 11 | 8336 | 4597168 | 4588573 |
 |  |  | 96.879 | 15766 | 18211 | 4588582 | 4586090 | 7431
11 | 18217 | Chr07 | 96.665 | 2 | 8382 | 4597196 | 4588573 |
 |  |  | 97.457 | 15790 | 18216 | 4588582 | 4586111 | 7409
12 | 18222 | Chr07 | 96.764 | 8 | 8375 | 4597197 | 4588586 |
 |  |  | 97.57 | 15802 | 18222 | 4588582 | 4586117 | 7428
13 | 18269 | Chr07 | 96.813 | 20 | 8396 | 4597196 | 4588573 |
 |  |  | 96.121 | 15876 | 18268 | 4588582 | 4586111 | 7481
14 | 18272 | Chr07 | 96.538 | 21 | 8401 | 4597201 | 4588573 |
 |  |  | 97.065 | 15848 | 18110 | 4588582 | 4586269 | 7448
15 | 34355 | Chr07 | 96.295 | 1 | 2609 | 4585889 | 4588582 |
 |  |  | 96.922 | 10076 | 18488 | 4588573 | 4597202 | 7468
 |  |  | 90.363 | 18489 | 21065 | 4585885 | 4588582 |
 |  |  | 94.983 | 28504 | 34355 | 4588573 | 4594619 | 7440
16 | 33401 | Chr07 | 95.634 | 1 | 2579 | 4585907 | 4588582 |
 |  |  | 95.221 | 10000 | 17555 | 4588573 | 4596411 | 7422
 |  |  | 91.039 | 17544 | 20127 | 4585885 | 4588582 |
 |  |  | 94.494 | 27522 | 33401 | 4588573 | 4594679 | 7396
17 | 36324 | Chr07 | 94.271 | 1 | 8259 | 4597196 | 4588573 |
 |  |  | 90.693 | 15535 | 18055 | 4588582 | 4585914 | 7277
 |  |  | 94.253 | 18074 | 26331 | 4597205 | 4588573 |
 |  |  | 97.425 | 33710 | 36324 | 4588582 | 4585904 | 7380
18 | 41188 | Chr07 | 97.039 | 1 | 8374 | 4597179 | 4588573 |
 |  |  | 96.272 | 15836 | 18461 | 4588582 | 4585885 | 7463
 |  |  | 91.788 | 18462 | 20978 | 4597202 | 4594636 |
 |  |  | 88.625 | 20974 | 23209 | 4594839 | 4597202 |
 |  |  | 90.557 | 23214 | 25741 | 4585889 | 4588582 |
 |  |  | 95.204 | 32940 | 41187 | 4588573 | 4597196 | 7200

No.: Query number (“pass” reads determined using MinION).

Identity: Percentage of identical sites in the alignment.

Query start - Query end: Query range covered by the alignment.

Subject start - Subject end: Subject range covered by the alignment.

Estimated Gap length (bp): The subject length and physical position on
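
For every read in Table 3, each Estimated gap length value equals the query start of a hit minus the query end of the preceding hit, plus one; this relationship is inferred from the tabulated coordinates rather than stated by the authors. The Python sketch below applies it to reads 1, 15, and 18. The helper name `query_gaps` and the 1,000-bp minimum, used to skip the few-base spacings and overlaps between adjacent hits, are hypothetical choices for illustration.

```python
# Query coordinates (start, end) of the BLASTN hits for three reads, from Table 3.
HSPS = {
    1: [(17, 8350), (15619, 18002)],
    15: [(1, 2609), (10076, 18488), (18489, 21065), (28504, 34355)],
    18: [(1, 8374), (15836, 18461), (18462, 20978), (20974, 23209),
         (23214, 25741), (32940, 41187)],
}


def query_gaps(hsps, min_gap=1000):
    """Return qstart(next) - qend(previous) + 1 for consecutive hits on one read.

    Hits are ordered by query start; spacings below min_gap (a hypothetical
    threshold) are skipped, which also drops overlapping hits (negative values).
    """
    ordered = sorted(hsps)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end + 1
        if gap >= min_gap:
            gaps.append(gap)
    return gaps


for read_id, hits in HSPS.items():
    print(read_id, query_gaps(hits))
```

For these three reads the printed gaps match the last column of Table 3 (7270; 7468 and 7440; 7463 and 7200).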

Fig. 3.

Genome alignment between the consensus sequence of the amplicon reads (y-axis) and the genomic sequence from 4.58 to 4.60 Mbp on Chr07 of the Gmax275 reference genome (x-axis). The insertion breakpoint junction is at Chr07:4588576-4588579 (TGGA).

2. Experimental Design, Materials and Methods

2.1. Sample collection and DNA extraction

Samples for HiSeq: Soybean cultivars ‘Fukuyutaka’ and ‘Himeshirazu’ were grown in a greenhouse at the National Agriculture and Food Research Organization (NARO) in Tsukuba, Ibaraki, Japan, and kept in the dark for one week before DNA extraction to reduce the organellar DNA content. Leaves were collected from about five seedlings of each cultivar (grown from seeds of a single individual), and DNA was extracted from the bulked leaves using the protocol of Peterson et al. [9] with some modifications.

Samples for MinION: ‘Himeshirazu’ was grown in an artificial climate chamber at NARO. Genomic DNA was extracted from the youngest fresh leaves of ‘Himeshirazu’ using the CTAB method with the following modifications. Leaves were homogenized in liquid nitrogen, and the ground tissue was transferred to preheated 2× CTAB DNA extraction buffer (2% CTAB, 0.1 M Tris-HCl pH 8.0, 1.4 M NaCl, 1% PVP, 20 mM EDTA) containing 80 μg/ml proteinase K. The samples were incubated in a water bath at 55 °C for 15 min with occasional gentle inversion of the tubes. The tubes were then removed from the water bath, an equal volume of chloroform-isoamyl alcohol (24:1) was added, and the contents were mixed by inversion. After centrifugation at 3000 rpm, the supernatant was transferred to a new tube and mixed by inversion with an equal volume of isopropanol, and the DNA was pelleted by centrifugation at 14,000 rpm for 5 min (MX-201, TOMY Seiko Co., Ltd, Tokyo, Japan). The pellets were washed twice with 70% ethanol, air-dried at room temperature, and dissolved in 50 μl of low TE buffer (10 mM Tris-HCl, 0.1 mM EDTA, pH 8.0). The DNA concentration was measured with a NanoDrop spectrophotometer (Thermo Fisher Scientific Inc., USA) and a Qubit fluorometer (Thermo Fisher Scientific Inc.).
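
The composition of the 2× CTAB extraction buffer translates into stock volumes through C1V1 = C2V2 for the molar components and into grams per 100 ml for the % (w/v) components. The minimal Python sketch below computes a 100-ml batch, assuming commonly used stock solutions (1 M Tris-HCl pH 8.0, 5 M NaCl, 0.5 M EDTA, and 20 mg/ml proteinase K) that are not specified in the protocol above; the names `ctab_recipe`, `MOLAR_COMPONENTS`, and `PERCENT_WV_COMPONENTS` are hypothetical.

```python
# Final concentrations of the 2x CTAB extraction buffer given in the protocol.
# Stock concentrations are hypothetical common stocks, not taken from the source.
MOLAR_COMPONENTS = {
    # name: (final concentration in M, stock concentration in M)
    "Tris-HCl pH 8.0": (0.1, 1.0),
    "NaCl": (1.4, 5.0),
    "EDTA": (0.020, 0.5),
}
PERCENT_WV_COMPONENTS = {"CTAB": 2.0, "PVP": 1.0}  # % (w/v) = g per 100 ml
PROTEINASE_K_FINAL_UG_PER_ML = 80.0
PROTEINASE_K_STOCK_UG_PER_UL = 20.0  # hypothetical 20 mg/ml stock


def ctab_recipe(final_volume_ml):
    """Return stock volumes (ml or ul) and solid masses (g) using C1*V1 = C2*V2."""
    recipe = {}
    for name, (final_m, stock_m) in MOLAR_COMPONENTS.items():
        recipe[f"{name} stock (ml)"] = final_m * final_volume_ml / stock_m
    for name, percent in PERCENT_WV_COMPONENTS.items():
        recipe[f"{name} (g)"] = percent * final_volume_ml / 100.0
    recipe["proteinase K stock (ul)"] = (
        PROTEINASE_K_FINAL_UG_PER_ML * final_volume_ml / PROTEINASE_K_STOCK_UG_PER_UL
    )
    return recipe


for item, amount in ctab_recipe(100.0).items():
    print(f"{item}: {amount:.2f}")
```

With these assumed stocks, 100 ml of buffer needs 10 ml Tris-HCl, 28 ml NaCl, 4 ml EDTA, 2 g CTAB, 1 g PVP, and 400 µl proteinase K stock, with water added to the final volume.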

2.2. Library preparation, illumina and nanopore sequencing

HiSeq: A total of 75,632,747 and 91,540,849 paired reads were generated on the Illumina HiSeq2000 from 350-bp insert-size libraries of ‘Fukuyutaka’ and ‘Himeshirazu’, respectively, prepared with the TruSeq DNA PCR-Free kit (Illumina Inc., San Diego, CA, USA). Adapter sequences and low-quality bases were removed with Trimmomatic v0.30 [10] using the options “ILLUMINACLIP:adapter.fa:2:30:10 LEADING:15 TRAILING:15 SLIDINGWINDOW:4:15 MINLEN:32”. The quality-controlled FASTQ files were deposited in the DDBJ Sequence Read Archive (DRA) under submission accessions DRA010742 (‘Fukuyutaka’) and DRA010747 (‘Himeshirazu’) (BioSample accessions SAMD00238602 and SAMD00238603).
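
To make the Trimmomatic step string concrete, the sketch below approximates its quality-trimming steps in Python: LEADING:15 and TRAILING:15 remove bases with quality below 15 from the 5′ and 3′ ends, SLIDINGWINDOW:4:15 cuts the read where the mean quality of a 4-base window first falls below 15, and MINLEN:32 discards reads shorter than 32 bases. It is a simplified illustration under stated assumptions (Phred+33 qualities, a single read, no ILLUMINACLIP adapter clipping), not a re-implementation of Trimmomatic, whose sliding-window cut may differ in detail; `phred33` and `trim_read` are hypothetical names.

```python
def phred33(qual):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def trim_read(seq, qual, leading=15, trailing=15, window=4, window_q=15, min_len=32):
    """Approximate LEADING, TRAILING, SLIDINGWINDOW and MINLEN on one read.

    Returns the trimmed (seq, qual) pair, or None if the read is dropped.
    Adapter clipping (ILLUMINACLIP) is not modelled.
    """
    q = phred33(qual)
    start, end = 0, len(q)
    while start < end and q[start] < leading:      # LEADING:15
        start += 1
    while end > start and q[end - 1] < trailing:   # TRAILING:15
        end -= 1
    # SLIDINGWINDOW:4:15 (simplified): cut at the first window whose mean is < 15.
    for i in range(start, end - window + 1):
        if sum(q[i:i + window]) / window < window_q:
            end = i
            break
    if end - start < min_len:                      # MINLEN:32
        return None
    return seq[start:end], qual[start:end]


# Hypothetical 42-base read: Q40 ('I') except four Q10 ('+') bases at 1-based positions 36-39.
seq = "ACGT" * 10 + "AC"
qual = "I" * 35 + "+" * 4 + "I" * 3
result = trim_read(seq, qual)
print(len(result[0]) if result else "dropped")
```

Here the first failing window starts at the first Q10 base, so the read is cut to 35 bases and still passes MINLEN:32.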

MinION: Ten nanograms of ‘Himeshirazu’ genomic DNA was used as the PCR template with primers CCW2-2_F (5’-TGACTGATCCTGCTGTGAGAATGTT-3’) [Chr07:4559602-4559619] and CCW2-8_R (5’-TGTAACGTAGGAAAATGACAACACTACATC-3’) [Chr07:4602994-4602971] to amplify a region of approximately 11 kb in the Gmax275 reference genome. PCR was performed on a GeneAmp PCR System 9700 (Thermo Fisher Scientific Inc.) with PrimeSTAR GXL DNA Polymerase (Takara Bio Inc., Shiga, Japan) under the following conditions: initial denaturation at 94 °C for 1 min, followed by 30 cycles of denaturation at 98 °C for 10 s and combined annealing and extension at 68 °C for 10 min. The PCR products were electrophoresed on a 0.8% agarose gel alongside a HindIII DNA marker (Takara Bio Inc., Shiga, Japan) and stained with ethidium bromide. The amplicon from ‘Himeshirazu’ was approximately 18 kb (between the 9,416-bp and 23,130-bp bands of the HindIII marker). The amplicon (1 µg) was end-repaired and dA-tailed using the NEBNext End-Repair and NEBNext dA-Tailing modules (New England Biolabs, MA, USA) according to the manufacturer's instructions. The sequencing adapter was then ligated to the dA-tailed DNA with the Blunt/TA Ligase Master Mix (New England Biolabs, MA, USA), following the 1D Amplicon Sequencing protocol for SQK-LSK108, R9 version (Oxford Nanopore Technologies Ltd.). Sequencing was performed on a MinION flow cell (R9.4, ONT) in the MinION portable sequencer (Oxford Nanopore Technologies Ltd.), controlled by MinKNOW (version 1.7.3, Oxford Nanopore Technologies Ltd.) with live basecalling disabled. The run time was 31 h. The resulting FAST5 files in the “pass” folders, which correspond to reads with high quality scores, were converted to FASTQ files with the Albacore basecaller (version 1.1.0, ONT). The FASTQ file was deposited in the DRA under submission accession DRA010652 (BioSample accession SAMD00238644).
The analyzed sequence file (FASTA) was deposited in figshare (doi:10.6084/m9.figshare.13220792).
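
The amplicon size was read off the gel by bracketing it between two bands of the HindIII marker. A common way to turn such a bracket into a number is semi-log interpolation, treating log10(fragment size) as linear in migration distance between adjacent bands. The minimal Python sketch below assumes hypothetical migration distances; the band sizes are those of the standard λ DNA/HindIII digest, which include the 23,130-bp and 9,416-bp bands cited above, and `estimate_size` is a hypothetical helper name.

```python
import math

# Band sizes (bp) of the standard lambda DNA/HindIII digest.
LAMBDA_HINDIII_BP = [23130, 9416, 6557, 4361, 2322, 2027, 564, 125]


def estimate_size(distance_mm, ladder):
    """Estimate fragment size by semi-log interpolation between flanking bands.

    ladder: (migration distance in mm, size in bp) pairs. log10(size) is assumed
    to vary linearly with distance between adjacent bands.
    """
    bands = sorted(ladder)  # shortest migration distance (largest band) first
    for (d1, s1), (d2, s2) in zip(bands, bands[1:]):
        if d1 <= distance_mm <= d2:
            frac = (distance_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("distance lies outside the ladder")


# Hypothetical distances (mm) for the 23,130-bp and 9,416-bp bands.
ladder = list(zip([10.0, 20.0], LAMBDA_HINDIII_BP[:2]))
print(round(estimate_size(14.0, ladder)))
```

Because large fragments resolve poorly on a 0.8% gel, such an estimate is approximate, which is consistent with the "approximately 18 kb" reported above.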

2.3. Variant call and coverage analysis using HiSeq short-read sequence data

Trimmed paired reads were mapped to the soybean reference genome (Gmax275: http://genome.jgi.doe.gov/pages/dynamicOrganismDownload.jsf?organism=Phytozome#, downloaded on May 15, 2015) [7] using BWA-MEM [11]. The mapping rates were 99.4% and 99.2% for ‘Fukuyutaka’ and ‘Himeshirazu’, respectively; the mapped reads covered 94.2% and 95.8% of the reference, with mean depths of 13.5× and 13.3× over the covered regions. The SAM output was converted to BAM and sorted by coordinate using samtools v1.3.1 [12]. Duplicate reads were removed using Picard MarkDuplicates (v2.7.1) with the options “ASSUME_SORTED=true REMOVE_DUPLICATES=true” (http://broadinstitute.github.io/picard/). Local realignment and base quality score recalibration of the mapped reads were performed with RealignerTargetCreator, IndelRealigner, and BaseRecalibrator from the Genome Analysis Toolkit (GATK) v3.7.0 [13]. All tools were run with the recommended standard settings, following the best practices of the Broad Institute [14,15]. Variants were called with HaplotypeCaller using the options “--emitRefConfidence GVCF -variant_index_type LINEAR -variant_index_parameter 128000” and filtered with VariantFiltration from GATK [13] using the expression “DP>100 || DP<5 || QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0”. Joint genotyping with GenotypeGVCFs from GATK [13] then produced a combined VCF file annotated with dbSNP. The dbSNP file was downloaded from NCBI on May 31, 2016 (the NCBI dbSNP site for soybean has since closed; soybean dbSNP information is currently available from the European Variation Archive at EMBL-EBI). Because the reference genome (Glycine max v2.0) listed in NCBI and Gmax275 in Phytozome differ in the physical positions of some sequences, we created and used a dbSNP file modified to Gmax275 positions.
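
The VariantFiltration expression above flags a site as soon as any single criterion is met, because the terms are joined by logical OR. The sketch below transcribes that hard-filter logic into Python purely for clarity: annotation values come from a plain dictionary (a hypothetical input format, not GATK's), `fails_hard_filter` is a hypothetical name, and missing or NaN annotations are treated as passing, which is a simplifying assumption rather than a description of how GATK handles absent values.

```python
import math

# Each criterion of the VariantFiltration expression used in the text:
# DP>100 || DP<5 || QD < 2.0 || FS > 60.0 || MQ < 40.0 ||
# MQRankSum < -12.5 || ReadPosRankSum < -8.0
FAIL_TESTS = [
    ("DP", lambda v: v > 100 or v < 5),
    ("QD", lambda v: v < 2.0),
    ("FS", lambda v: v > 60.0),
    ("MQ", lambda v: v < 40.0),
    ("MQRankSum", lambda v: v < -12.5),
    ("ReadPosRankSum", lambda v: v < -8.0),
]


def fails_hard_filter(info):
    """Return True if any criterion is met (the || in the expression).

    info: dict of annotation name -> number (hypothetical input format).
    Missing or NaN annotations are treated as passing (simplifying assumption).
    """
    for key, fails in FAIL_TESTS:
        value = info.get(key)
        if value is None or math.isnan(value):
            continue
        if fails(value):
            return True
    return False


print(fails_hard_filter({"DP": 13, "QD": 20.5, "FS": 1.2, "MQ": 60.0}))  # -> False
print(fails_hard_filter({"DP": 3, "QD": 20.5}))                           # -> True
```

Note that the depth bounds (5 to 100) bracket the mean depths of about 13× reported above, so typical sites are retained.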

2.4. Identification of the unique genomic sequence in the CCW-2 region of ‘Himeshirazu’ using MinION long-read sequence data

The 28,725 MinION reads were processed with Canu v1.6 [1] using the options “-p asm -d gmax_amplicon genomeSize=15000 correctedErrorRate=0.5 -nanopore-raw all.fastq gnuplotTested=true useGrid=false”. After quality control and trimming, only 18 long reads remained. Alignment of the 18 reads to the CCW-2 region with BLASTN from BLAST+ [16] detected an inserted segment of 7.2–7.5 kb with no hit in the reference sequence (Table 3). Four of the 18 reads contained tandemly repeated sequence and were about twice the size of the PCR product, suggesting that these four reads were chimeric. A consensus sequence was then generated from the remaining 14 MinION reads. Comparison of the consensus sequence with the Gmax275 reference genome revealed a 7.7-kb insertion (breakpoint junction at Chr07:4588576-4588579 [TGGA]) (Fig. 3).
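
The tool used to build the consensus from the 14 reads is not named here. Purely to illustrate the idea, the sketch below computes a column-wise majority-vote consensus from reads that have already been placed in a multiple alignment (hypothetical input, with '-' marking gaps). This is a generic technique, not the authors' method; dedicated consensus and polishing software is normally used for nanopore reads, and `majority_consensus` is a hypothetical name.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Column-wise majority vote over reads already in a multiple alignment.

    aligned_reads: equal-length strings with '-' for gaps (hypothetical input).
    Columns where the gap character wins are dropped from the consensus.
    Ties go to the character seen first in the column (Counter ordering).
    """
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("reads must be aligned to equal length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _ = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)


reads = [
    "ACGT-ACGTTGGA",
    "ACGTTACGTTGGA",
    "ACCT-ACGTTGGA",
]
print(majority_consensus(reads))
```

In this toy alignment the single-read substitution (C in the third read) and the single-read insertion (T in the second read) are both outvoted, leaving "ACGTACGTTGGA".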

Declaration of Competing Interest

The authors declare that they have no competing financial interests or personal relationships that can influence the work reported in this paper.

Acknowledgments

This work was supported by a grant from the Ministry of Agriculture, Forestry and Fisheries of Japan [Genomics-based Technology for Agricultural Improvement (SFC1001)] and the Special Scheme Project on Advanced Research and Development for Next-Generation Technology from the Ministry of Agriculture, Forestry and Fisheries of Japan (Grant ID in e-Rad: 16781507).

Footnotes

Supplementary material associated with this article can be found in the online version at doi:10.1016/j.dib.2020.106577.

Contributor Information

Eri Ogiso-Tanaka, Email: demeter298@gmail.com.

Akito Kaga, Email: kaga@affrc.go.jp.

Appendix. Supplementary materials

mmc1.xlsx (433.8KB, xlsx)

References

  • 1. Koren S., Walenz B.P., Berlin K., Miller J.R., Bergman N.H., Phillippy A.M. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27:722–736. doi: 10.1101/gr.215087.116.
  • 2. Komatsu K., Okuda S., Takahashi M., Matsunaga R. Antibiotic effect of insect-resistant soybean on common cutworm (Spodoptera litura) and its inheritance. Breed. Sci. 2004;54:27–32.
  • 3. Komatsu K., Okuda S., Takahashi M., Matsunaga R., Nakazawa Y. QTL mapping of antibiosis resistance to common cutworm (Spodoptera litura Fabricius) in soybean. Crop Sci. 2005;45:2044–2048.
  • 4. Komatsu K., Okuda S., Takahashi M., Matsunaga R., Nakazawa Y. Quantitative trait loci mapping of pubescence density and flowering time of insect-resistant soybean (Glycine max L. Merr.). Genet. Mol. Biol. 2007;30:635–639.
  • 5. Oki N., Komatsu K., Sayama T., Ishimoto M., Takahashi M., Takahashi M. Genetic analysis of antixenosis resistance to the common cutworm (Spodoptera litura Fabricius) and its relationship with pubescence characteristics in soybean (Glycine max (L.) Merr.). Breed. Sci. 2012;61:608–617. doi: 10.1270/jsbbs.61.608.
  • 6. Komatsu K., Takahashi M., Nakazawa Y. Antibiosis resistance of QTL introgressive soybean lines to common cutworm (Spodoptera litura Fabricius). Crop Sci. 2008;48:527–532.
  • 7. Schmutz J., Cannon S.B., Schlueter J., Ma J., Mitros T., Nelson W., Hyten D.L. Genome sequence of the palaeopolyploid soybean. Nature. 2010;463:178–183. doi: 10.1038/nature08670.
  • 8. Marçais G., Delcher A.L., Phillippy A.M., Coston R., Salzberg S.L., Zimin A. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 2018;14(1):e1005944. doi: 10.1371/journal.pcbi.1005944.
  • 9. Peterson D.G., Tomkins J.P., Frisch D.A., Wing R.A., Paterson A.H. Construction of plant bacterial artificial chromosome (BAC) libraries: an illustrated guide. J. Agric. Genomics. 2000;5:1–100.
  • 10. Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
  • 11. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
  • 12. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
  • 13. McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
  • 14. DePristo M.A., Banks E., Poplin R., Garimella K.V., Maguire J.R., Hartl C. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 2011;43:491–498. doi: 10.1038/ng.806.
  • 15. Van der Auwera G.A., Carneiro M.O., Hartl C., Poplin R., Del Angel G., Levy-Moonshine A. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics. 2013;43:11.10.1–11.10.33. doi: 10.1002/0471250953.bi1110s43.
  • 16. Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., Madden T.L. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.
