Genome sequencing data for wild and cultivated bananas, plantains and abacá

Christine Sambles; Lakshmipriya Venkatesan; Olanrewaju M Shittu; James Harrison; Karen Moore; Leena Tripathi; Murray Grant; Rachel Warmington; David J Studholme

Data in Brief. 2020 Sep 23;33:106341. doi: 10.1016/j.dib.2020.106341

PMCID: PMC7549061 PMID: 33072825

Abstract

We performed shotgun genome sequencing on a total of 19 different Musa genotypes, including representatives of the wild banana species Musa acuminata and M. balbisiana, allopolyploid bananas and plantains, Fe'i banana, pink banana (also known as hairy banana) and abacá (also known as hemp banana). We aligned sequence reads against a previously sequenced reference genome and assessed ploidy and, in the case of allopolyploids, the contributions of the A and B genomes; this provides important quality-assurance data about the taxonomic identities of the sequenced plant material. These data will be useful for phylogenetics, crop improvement and studies of the complex story of intergenomic recombination in AAB and ABB allotriploid bananas and plantains, and can be integrated into resources such as the Banana Genome Hub.

Keywords: Musa, Genome, Plant, Diversity, Sequence analysis, DNA

Specifications Table

Subject	Biology
Specific subject area	Genomics of crop plants
Type of data	Deoxyribonucleic acid (DNA) sequence
How data were acquired	Shotgun genomic DNA sequencing was performed using Illumina HiSeq 2500, Illumina NovaSeq and BGIseq-500 platforms
Data format	Raw sequencing reads
Parameters for data collection	DNA was extracted from leaf material
Description of data collection	Shotgun genomic DNA sequencing was performed using Illumina HiSeq 2500, Illumina NovaSeq and BGIseq-500 platforms
Data source location	Institution: University of Exeter; City: Exeter; Country: United Kingdom; Latitude and longitude (and GPS coordinates) for collected samples/data: plant samples were collected from the Eden Project at 50.3601° N, 4.7447° W (50.357165238 -4.740163)
Data accessibility	Repository name: NCBI BioProject; Data identification numbers: PRJNA540118, PRJNA413600; Direct URLs to data: https://www.ncbi.nlm.nih.gov/bioproject/540118 and https://www.ncbi.nlm.nih.gov/bioproject/413600

Value of the Data

• These genomic resequencing data will inform studies of Musa evolution, biodiversity, speciation and allopolyploidy.
• Genome-wide sequence data are presented for abacá (Musa textilis), the Fe'i banana (M. troglodytarum) and the pink banana (M. velutina) as well as edible and wild bananas and plantains belonging to the species M. acuminata and M. balbisiana and their interspecific hybrids.
• This is a useful resource for breeders and researchers, as well as for science communicators engaging with the general public about the germplasm collection at the Eden Project.
• The data can be mined for polymorphisms with value as markers for breeding strategies.
• These data can be integrated into banana genomics resources such as the Banana Genome Hub [1].
• Since some samples were sequenced using more than one method, the data can be used to compare performances of alternative sequencing platforms [2].

1. Data Description

Genomic shotgun sequencing data were generated using BGIseq-500 (Table 1), Illumina HiSeq 2500 with libraries of two different insert sizes (Tables 2 and 3) and Illumina NovaSeq 6000 (Table 4). This generated a total of 505.69 GB and 120.95 GB of raw read data for the Eden Project and IITA accessions, respectively. Raw data are available at NCBI's Sequence Read Archive [3] via BioProjects PRJNA540118 and PRJNA413600.

Table 1.

Genomic sequencing data generated using BGIseq-500 (2 × 150 bp reads, 300-bp insert size).

BioSample	SRA accession	Eden project identifier	Received as	Depth of coverage
SAMN11522014	SRR8989628, SRR9734077	2012-1161	Musa acuminata ‘Green-Red’	59 ×
SAMN11522015	SRR8989629	2012-1156	Musa acuminata ‘Paka’	28 ×
SAMN11522016	SRR8989630, SRR9734074	2012-1173	Musa acuminata subsp. zebrina	54 ×
SAMN11522017	SRR8989631, SRR9734078	2011-0950	Musa acuminata × balbisiana ‘Congo 2’ (plantain subgroup)	59 ×
SAMN11522018	SRR8989632	2012-1154	Musa acuminata subsp. malaccensis	28 ×
SAMN11522019	SRR8989633, SRR9734079, SRR9850640	2001-1027	Musa balbisiana	52 ×
SAMN11522020	SRR8989634, SRR9734076, SRR9850639	2012-1164	Musa acuminata ‘Calypso’	54 ×
SAMN11522021	SRR8989635	2012-1152	Musa acuminata × balbisiana ‘Safet Velchi’ (Ney Poovan subgroup)	30 ×
SAMN11522022	SRR8989636	2011-0952	Musa acuminata × balbisiana ‘One Hand Planty’	28 ×
SAMN11522023	SRR8989637	1999-2846	Musa × paradisiaca^a	31 ×
SAMN11522024	SRR8989638	1998-2307	Musa acuminata ‘Pisang Mas’ (Sucrier subgroup)	32 ×
SAMN11522025	SRR8989639, SRR9850642	1999-0524	Musa textilis
SAMN11522026	SRR8989640, SRR9734080, SRR9850641	1999-0158	Musa troglodytarum ‘Wain’ (Fe'i group)	36 ×
SAMN11522027	SRR8989641, SRR9734075	2012-1166	Musa velutina	47 ×

^a Accession 1999-2846 was received as Musa × paradisiaca but genome sequence data suggest that it is Musa acuminata.

Table 2.

Genomic sequencing data generated using Illumina HiSeq (2 × 150 bp reads, 800-bp insert size).

BioSample	SRA accession	Eden project identifier	Received as	Depth of coverage
SAMN11522025	SRR9696635	1999-0524	Musa textilis	23 ×
SAMN11522021	SRR9696636	2012-1152	Musa acuminata × balbisiana ‘Safet Velchi’ (Ney Poovan subgroup)	36 ×

Table 3.

Genomic sequencing data generated using Illumina HiSeq (2 × 125 bp reads, 300-bp insert).

BioSample	SRA accession	Received as	Depth of coverage
SAMN07758499	SRR6147591	Musa acuminata × balbisiana ‘Sukali Ndiizi’ (AAB group)	53 ×
SAMN07758501	SRR6147590	Musa acuminata × balbisiana ‘Gonja Manjaya’ (AAB group)	18 ×
SAMN07758502	SRR6147593	Musa acuminata ‘Cavendish’ (AAA group)	23 ×
SAMN07758503	SRR6147592	Musa balbisiana	24 ×
SAMN07758500	SRR6147589	Musa acuminata × balbisiana ‘Pisang Awak’ (ABB group)	28 ×

Table 4.

Genomic sequencing data generated using Illumina NovaSeq 6000 (2 × 150 bp reads, 300-bp insert size).

BioSample	SRA accession	Eden project identifier	Received as	Depth of coverage
SAMN11522021	SRR9015638	2012-1152	Musa acuminata × balbisiana ‘Safet Velchi’ (Ney Poovan subgroup)	30 ×
SAMN11522022	SRR9015637	2011-0952	Musa acuminata × balbisiana ‘One Hand Planty’	28 ×

An important quality control step is to check whether the sequence data are consistent with the botanical identifications of the source material. Therefore, we assessed observed against expected levels of ploidy. For allopolyploids purported to originate from interspecific hybrids between Musa acuminata and Musa balbisiana, we assessed the relative contributions of these respective “A” and “B” genomes compared against the expected characteristics of each sample as described under Experimental Design, Materials, and Methods. The resulting quality-control metrics are summarised in Table 5 and in Fig. 1. Accessions 2012-1152 (SAMN11522021), 1999-2846 (SAMN11522023) and 2011-0950 (SAMN11522017) were expected to be allopolyploids containing contributions from both the A and B genomes but sequence data appeared to be exclusively from the A genome, suggesting that these three plants had been mis-identified. Further, there were discrepancies between the expected ploidy levels versus the empirically inferred levels in several accessions.

Table 5.

Ploidy prediction and estimated composition of 16 accessions of Musa spp.^a

BioSample	Name	Expected ploidy	Observed ploidy according to nQuire (if different to expected)	Expected composition	SNP data consistent with expected composition?
SAMN11522018	Musa acuminata subsp. malaccensis	2		AA	Yes
SAMN11522015	Musa acuminata ‘Paka’	2		AA	Yes
SAMN11522014	Musa acuminata ‘Green-Red’	3		AAA	Yes
SAMN11522016	Musa acuminata subsp. zebrina	2	4	AA	Yes
SAMN07758502	Musa acuminata ‘Cavendish’	3		AAA	Yes
SAMN11522020	Musa acuminata ‘Calypso’	4		AAAA	Yes
SAMN11522021	Musa acuminata × balbisiana ‘Safet Velchi’ (Ney Poovan subgroup)	2	3	AB	No: appears to be exclusively A
SAMN07758499	Musa acuminata × balbisiana ‘Sukali Ndiizi’	3		AAB	Yes
SAMN07758501	Musa acuminata × balbisiana ‘Gonja Manjaya’	3		AAB	Yes
SAMN11522022	Musa acuminata × balbisiana ‘One Hand Planty’	3		AAB	Yes
SAMN07758500	Musa acuminata × balbisiana ‘Pisang Awak’	3	4	ABB	Yes
SAMN11522019	Musa balbisiana	2	4	BB	Yes
SAMN07758503	Musa balbisiana	2	4	BB	Yes
SAMN11522024	Musa acuminata ‘Pisang Mas’ (Sucrier subgroup)	2		AA	Yes
SAMN11522017	Musa acuminata × balbisiana ‘Congo 2’ (plantain subgroup)	3		AAB	No: appears to be exclusively A
SAMN11522023	Musa × paradisiaca	2	3	AAB or ABB	No: appears to be exclusively A

^a Ploidy analysis was performed only on M. acuminata and M. balbisiana accessions and their hybrids. Consequently, Musa textilis (SAMN11522025), Musa troglodytarum ‘Wain’ (Fe'i group) (SAMN11522026) and Musa velutina (SAMN11522027) were excluded.

Fig. 1. Circos representation of informative SNP variants identified in the 11 chromosomes of *M. acuminata*. The lines represent the LOESS-smoothed percentage of B allele in 16 sequenced *Musa* accessions (*M. acuminata*, *M. balbisiana* and their hybrids). *Musa* accessions with the highest percentage of A genome are at the centre, graduating to those with the highest percentage of B genome on the outside, according to the 1542 identified SNPs. Background colours represent percentage of B allele: green (0–33%), grey (33–66%) and red (66–100%). Tracks from outer (B allele dominant) to inner (A allele dominant) are: a. *M. balbisiana* (SAMN11522019), b. *M. balbisiana* (SAMN07758503), c. *M. acuminata* × *balbisiana* ‘Pisang Awak’ (SAMN07758500), d. *M. acuminata* × *balbisiana* ‘One Hand Planty’ (SAMN11522022), e. *M. acuminata* × *balbisiana* ‘Gonja Manjaya’ – AAB group (SAMN07758501), f. *M. acuminata* × *balbisiana* ‘Sukali Ndiizi’ (SAMN07758499), g. *Musa* × *paradisiaca* (SAMN11522023), h. *M. acuminata* × *balbisiana* ‘Safet Velchi’ – Ney Poovan subgroup (SAMN11522021), i. *M. acuminata* ‘Calypso’ (SAMN11522020), j. *M. acuminata* × *balbisiana* ‘Congo 2’ – plantain subgroup (SAMN11522017), k. *M. acuminata* ‘Pisang Mas’ – Sucrier subgroup (SAMN11522024), l. *M. acuminata* subsp. *malaccensis* (SAMN11522018), m. *M. acuminata* ‘Paka’ (SAMN11522015), n. *M. acuminata* ‘Green-Red’ (SAMN11522014), o. *M. acuminata* subsp. *zebrina* (SAMN11522016), p. *M. acuminata* ‘Cavendish’ – AAA group (SAMN07758502).

* A_n describes A genome autopolyploidy i.e. AA or AAA or AAAA.

2. Experimental Design, Materials and Methods

Fresh leaf material was obtained from five accessions from the IITA (International Institute of Tropical Agriculture) [4] and 14 accessions from the Eden Project. DNA was extracted from fresh leaf material and sequenced using a combination of Illumina HiSeq 2500, Illumina NovaSeq 6000 and BGIseq-500 platforms. This yielded at least 20 × coverage of each genome, which was sufficient for calling single-nucleotide polymorphisms, detecting presence/absence polymorphisms and cataloguing patterns of heterozygosity.
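The depth-of-coverage figures quoted here and in Tables 1 to 4 can be approximated from read counts. The following is a minimal sketch, assuming that depth is taken as total sequenced bases divided by haploid genome size; the article does not state how its values were computed, and the genome size and read-pair count below are illustrative assumptions rather than values from the data set.

```python
def depth_of_coverage(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Approximate mean depth as total sequenced bases / haploid genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    # Each read pair contributes two reads of `read_length` bases.
    total_bases = read_pairs * 2 * read_length
    return total_bases / genome_size


# Assumed haploid genome size in bp (hypothetical value for illustration).
ASSUMED_GENOME_SIZE = 523_000_000

# Hypothetical library: 50 million pairs of 150-bp reads.
depth = depth_of_coverage(read_pairs=50_000_000, read_length=150,
                          genome_size=ASSUMED_GENOME_SIZE)
print(f"Approximate depth: {depth:.1f} x")
```

This estimate counts all sequenced bases, so it will be somewhat higher than a depth computed from trimmed, successfully aligned reads.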

For the 14 plant accessions from the Eden Project, cigar leaves were cut from the plant and lyophilised in a freeze dryer before being sent to BGI Tech Solutions (Hong Kong) Co., Limited, where DNA extraction and sequencing were performed.

For the five accessions from the IITA, genomic DNA was isolated using a modified CTAB (hexadecyltrimethylammonium bromide) extraction method [5]. The University of Exeter's Sequencing Service prepared Illumina sequencing libraries after fragmenting 500 ng of DNA to an average size of 500 bp, using the NEXTflex 8-barcode Rapid DNAseq kit (Perkin Elmer) with adapters containing indexes and 5–8 cycles of polymerase chain reaction (PCR) [6]. Library quality was determined using D1000 screen-tapes (Agilent) and libraries were either sequenced individually or combined in equimolar pools. Sequencing was performed on a single lane of a high-output v4 flow-cell on the Illumina HiSeq 2500 at the University of Exeter, yielding pairs of 125-bp reads.

Reads were also generated with longer inserts using the Illumina HiSeq (2 × 150 bp reads, 800-bp insert size) for two of the samples (Table 2), which potentially aids resolution of sequence repeats if the data are used in de novo assembly of genomes.

The quality of the sequencing reads was evaluated using FastQC [7]. Before further analyses, reads were trimmed and adapters removed using TrimGalore [8] with command-line options “-q 30 --paired”. Trimmed and filtered reads were aligned to the M. acuminata genome [9] using BWA [10] to generate binary alignment map (BAM) files [11].

As a prerequisite for plotting the relative contributions of the A and B genomes in allopolyploids, we first identified a set of informative SNPs that distinguish A (M. acuminata) from B (M. balbisiana), using SAMtools’ mpileup function, BCFtools [11,12] and custom scripts available at https://github.com/davidjstudholme/SNPsFromPileups. First, the relevant BAM alignment files were converted into uncompressed VCF format using SAMtools v1.6 (mpileup function), selecting for variant sites only (-v) and using the alternative model for multiallelic and rare-variant calling (-m). Potential SNPs were then filtered using the filter function of BCFtools (v1.6): candidates within 100 base pairs of an indel were excluded (--SnpGap 100), and only those with a quality score of at least 35 (QUAL>=35) and a depth of 5 or more reads (MIN(DP)>=5) were retained. The minimum number of reads supporting an indel was set to two (MIN(IDV)>=2). Variants that were flagged as indels were excluded (INDEL=0). The resulting filtered VCF files contained the positions of candidate SNPs that distinguished the B genome [13] from the A reference genome [14]. At each of these informative SNPs, we quantified the relative abundance of the A and B alleles, only considering sites where the depth was between 10 and 50. For plotting, the resulting percentage of the B allele was smoothed in R using LOESS [15]. The percentages of the B alleles at each SNP were visualised using Circos [16] (Fig. 1).
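The per-site quantification described above can be sketched as follows. This is an illustrative outline rather than the authors' custom scripts: it assumes that depth at a site is the sum of reads supporting the A and B alleles, that the bounds of 10 and 50 are inclusive, and that per-site read counts have already been extracted. The function names and the chromosome label in the example are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

MIN_DEPTH = 10  # lower depth bound stated in the text (assumed inclusive)
MAX_DEPTH = 50  # upper depth bound stated in the text (assumed inclusive)


def b_allele_percentage(a_reads: int, b_reads: int) -> Optional[float]:
    """Return the percentage of reads carrying the B allele at one SNP.

    Returns None when the site falls outside the accepted depth window.
    Depth is assumed to be the sum of A- and B-supporting reads.
    """
    depth = a_reads + b_reads
    if depth < MIN_DEPTH or depth > MAX_DEPTH:
        return None
    return 100.0 * b_reads / depth


def b_allele_profile(
    sites: Iterable[Tuple[str, int, int, int]]
) -> List[Tuple[str, int, float]]:
    """Convert (chrom, pos, a_reads, b_reads) records to (chrom, pos, %B).

    Sites outside the depth window are dropped.
    """
    profile = []
    for chrom, pos, a_reads, b_reads in sites:
        pct = b_allele_percentage(a_reads, b_reads)
        if pct is not None:
            profile.append((chrom, pos, pct))
    return profile


# Hypothetical read counts at three informative SNPs.
example_sites = [
    ("chr01", 1000, 20, 10),  # depth 30, kept
    ("chr01", 2000, 3, 2),    # depth 5, dropped (too shallow)
    ("chr01", 3000, 40, 40),  # depth 80, dropped (too deep)
]
print(b_allele_profile(example_sites))
```

Under these assumptions, values near one third would be expected across an AAB accession and values near two thirds across an ABB accession, which is what the colour bands in Fig. 1 are intended to make visible.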

We used nQuire [17] to estimate ploidy from the BAM files of genomic reads aligned with BWA against the M. acuminata reference genome. After de-noising to remove noise from mis-mapping due to highly repetitive regions, we assessed ploidy level using the lrdmodel command of nQuire to produce delta log-likelihoods (ΔlogL) for diploidy, triploidy and tetraploidy. The ploidy model yielding the lowest ΔlogL was taken to indicate the most likely ploidy level (Table 5). The command lines used were as follows:

nQuire create -b example.bam -o example

for i in *.bin; do echo "$i"; nQuire denoise "$i" -o "${i}_denoised"; done

for i in *_denoised.bin; do echo $i; nQuire lrdmodel -t 8 $i; done
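The selection rule applied to the lrdmodel results (choose the ploidy model with the lowest delta log-likelihood) can be sketched as below. This is a minimal illustration that assumes the three delta log-likelihood values have already been read from the nQuire output; it does not parse that output, and the numbers in the example are hypothetical.

```python
from typing import Dict


def infer_ploidy(delta_log_likelihoods: Dict[int, float]) -> int:
    """Return the ploidy level whose model has the lowest delta log-likelihood.

    Keys are candidate ploidy levels (2, 3, 4); values are the corresponding
    delta log-likelihoods.
    """
    if not delta_log_likelihoods:
        raise ValueError("no delta log-likelihoods supplied")
    return min(delta_log_likelihoods, key=delta_log_likelihoods.get)


# Hypothetical values for one accession (not taken from the data set).
example = {2: 1500.0, 3: 120.5, 4: 980.2}
print(infer_ploidy(example))
```

Because the rule always returns one of the three models, a small margin between the best and second-best values is worth inspecting before an inferred ploidy is treated as evidence of mis-identification.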

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.

Acknowledgements

David Studholme and Christine Sambles were supported by “MUSA: Microbial Uptakes for Sustainable management of major bananA pests and diseases” (Grant ID 727624, European Union (Horizon 2020)). DNA sequencing costs were supported by a grant from the Gatsby Charitable Foundation entitled “Banana Genetic Resources at Eden project” (GAT3587). We are grateful to Peggy Dousseaud for assistance with lyophilising leaf material and to Hetty Ninnis for expert assistance in collecting plant material at the Eden Project. This project made use of the University of Exeter's high-performance computing facility, Isca. This project utilised DNA sequencing equipment (Illumina HiSeq) funded by the Wellcome Trust Institutional Strategic Support Fund (WT097835MF), Wellcome Trust Multi-User Equipment Award (WT101650MA) and BBSRC LOLA award (BB/K003240/1).

References

1. Droc G., Larivière D., Guignon V., Yahiaoui N., This D., Garsmeur O., Dereeper A., Hamelin C., Argout X., Dufayard J.-F., Lengelle J., Baurens F.-C., Cenci A., Pitollat B., D'Hont A., Ruiz M., Rouard M., Bocs S. The banana genome hub. Database. 2013;2013:bat035. doi: 10.1093/database/bat035.
2. Zhu F.-Y., Chen M.-X., Ye N.-H., Qiao W.-M., Gao B., Law W.-K., Tian Y., Zhang D., Zhang D., Liu T.-Y., Hu Q.-J., Cao Y.-Y., Su Z.-Z., Zhang J., Liu Y.-G. Comparative performance of the BGISEQ-500 and Illumina HiSeq4000 sequencing platforms for transcriptome analysis in plants. Plant Methods. 2018;14:69. doi: 10.1186/s13007-018-0337-0.
3. Leinonen R., Sugawara H., Shumway M. The sequence read archive. Nucleic Acids Res. 2011;39:D19–D21. doi: 10.1093/nar/gkq1019.
4. Pillay M., Ogundiwin E., Tenkouano A., Dolezel J. Ploidy and genome composition of Musa germplasm at the International Institute of Tropical Agriculture (IITA). Afr. J. Biotechnol. 2006;5:1224–1232.
5. Gawel N.J., Jarret R.L. A modified CTAB DNA extraction procedure for Musa and Ipomoea. Plant Mol. Biol. Rep. 1991;9:262–266.
6. Head S.R., Komori H.K., LaMere S.A., Whisenant T., Van Nieuwerburgh F., Salomon D.R., Ordoukhanian P. Library construction for next-generation sequencing: Overviews and challenges. Biotechniques. 2014;56 doi: 10.2144/000114133.
7. S. Andrews, FastQC: a quality control tool for high throughput sequence data, (2019) Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
8. F. Krueger, Babraham Bioinformatics – Trim Galore!, (2019) Available online at: http://www.bioinformatics.babraham.ac.uk/projects/trim_galore.
9. D'Hont A., Denoeud F., Aury J.-M.J., Baurens F.-C.F., D'Hont A., Carreel F., Garsmeur O., Noel B., Bocs S., Droc G., Rouard M., Da Silva C., Jabbari K., Cardi C., Poulain J., Souquet M., Labadie K., Jourda C., Lengellé J., Rodier-Goud M., Alberti A., Bernard M., Correa M., Ayyampalayam S., Mckain M.R., Leebens-Mack J., Burgess D., Freeling M., Mbéguié-A-Mbéguié D., Chabannes M., Wicker T., Panaud O., Barbosa J., Hribova E., Heslop-Harrison P., Habas R., Rivallan R., Francois P., Poiron C., Kilian A., Burthia D., Jenny C., Bakry F., Brown S., Guignon V., Kema G., Dita M., Waalwijk C., Joseph S., Dievart A., Jaillon O., Leclercq J., Argout X., Lyons E., Almeida A., Jeridi M., Dolezel J., Roux N., Risterucci A.-M., Weissenbach J., Ruiz M., Glaszmann J.-C., Quétier F., Yahiaoui N., Wincker P. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature. 2012;488:213–217. doi: 10.1038/nature11241.
10. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
11. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
12. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509.
13. Wang Z., Miao H., Liu J., Xu B., Yao X., Xu C., Zhao S., Fang X., Jia C., Wang J., Zhang J., Li J., Xu Y., Wang J., Ma W., Wu Z., Yu L., Yang Y., Liu C., Guo Y., Sun S., Baurens F., Martin G., Salmon F., Garsmeur O., Yahiaoui N., Hervouet C., Rouard M., Laboureau N., Habas R., Ricci S., Peng M., Guo A., Xie J., Li Y., Ding Z., Yan Y., Tie W., D'Hont A., Hu W., Jin Z. Musa balbisiana genome reveals subgenome evolution and functional divergence. Nat. Plants. 2019 doi: 10.1038/s41477-019-0452-6.
14. Martin G., Baurens F.-C., Droc G., Rouard M., Cenci A., Kilian A., Hastie A., Doležel J., Aury J.-M., Alberti A., Carreel F., D'Hont A. Improvement of the banana “Musa acuminata” reference sequence using NGS data and semi-automated bioinformatics methods. BMC Genom. 2016;17:243. doi: 10.1186/s12864-016-2579-4.
15. Cleveland W.S., Grosse E., Shyu W.M. Statistical Models in S. Routledge; 2018. Local regression models; pp. 309–376.
16. Krzywinski M., Schein J., Birol I., Connors J., Gascoyne R., Horsman D., Jones S.J., Marra M.A. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–1645. doi: 10.1101/gr.092759.109.
17. Weiß C.L., Pais M., Cano L.M., Kamoun S., Burbano H.A. nQuire: a statistical framework for ploidy estimation using next generation sequencing. BMC Bioinform. 2018;19:122. doi: 10.1186/s12859-018-2128-z.
