The Genome of the Beluga Whale (Delphinapterus leucas)

Steven J M Jones; Gregory A Taylor; Simon Chan; René L Warren; S Austin Hammond; Steven Bilobram; Gideon Mordecai; Curtis A Suttle; Kristina M Miller; Angela Schulze; Amy M Chan; Samantha J Jones; Kane Tse; Irene Li; Dorothy Cheung; Karen L Mungall; Caleb Choo; Adrian Ally; Noreen Dhalla; Angela K Y Tam; Armelle Troussard; Heather Kirk; Pawan Pandoh; Daniel Paulino; Robin J N Coope; Andrew J Mungall; Richard Moore; Yongjun Zhao; Inanc Birol; Yussanne Ma; Marco Marra; Martin Haulena

doi:10.3390/genes8120378

. 2017 Dec 11;8(12):378. doi: 10.3390/genes8120378

The Genome of the Beluga Whale (Delphinapterus leucas)

Steven J M Jones ^1,^2,^3,^*, Gregory A Taylor ¹, Simon Chan ¹, René L Warren ¹, S Austin Hammond ¹, Steven Bilobram ¹, Gideon Mordecai ^4,⁵, Curtis A Suttle ^4,^5,^6,⁷, Kristina M Miller ⁸, Angela Schulze ⁸, Amy M Chan ^4,⁵, Samantha J Jones ^1,³, Kane Tse ¹, Irene Li ¹, Dorothy Cheung ¹, Karen L Mungall ¹, Caleb Choo ¹, Adrian Ally ¹, Noreen Dhalla ¹, Angela K Y Tam ¹, Armelle Troussard ¹, Heather Kirk ¹, Pawan Pandoh ¹, Daniel Paulino ¹, Robin J N Coope ¹, Andrew J Mungall ¹, Richard Moore ¹, Yongjun Zhao ¹, Inanc Birol ^1,³, Yussanne Ma ¹, Marco Marra ^1,³, Martin Haulena ⁹

PMCID: PMC5748696 PMID: 29232881

Abstract

The beluga whale is a cetacean that inhabits arctic and subarctic regions, and is the only living member of the genus Delphinapterus. The genome of the beluga whale was determined using DNA sequencing approaches that employed both microfluidic partitioning library and non-partitioned library construction. The former allowed for the construction of a highly contiguous assembly with a scaffold N50 length of over 19 Mbp and total reconstruction of 2.32 Gbp. To aid our understanding of the functional elements, transcriptome data was also derived from brain, duodenum, heart, lung, spleen, and liver tissue. Assembled sequence and all of the underlying sequence data are available at the National Center for Biotechnology Information (NCBI) under the Bioproject accession number PRJNA360851A.

Keywords: genome, genome assembly, beluga whale, Delphinapterus leucas, Cetacea

1. Introduction

The beluga or white whale has a circumpolar distribution in arctic and subarctic regions [1]. It belongs to the two member cetacean family Monodontidae, along with the narwhal (Monodon monoceros). The beluga whale is characteristically white and lacks a dorsal fin. The latter is presumed to be an adaptation to coping with under-ice conditions and to preserve heat [2]. Here, we present the genomic sequence and gene annotation resources for the beluga whale. This genome assembly will further aid in the comparative genomic analysis of marine mammals, complementing the cetacean genome resources of the orca (Orinicus orca), minke whale (Balaenoptera acutorostrata), and bottlenose dolphin (Tursiops truncatus).

2. Methods, Results and Discussion

The genome was assembled using paired-end reads that were sequenced from both standard Illumina and microfluidic partitioned genomic DNA libraries (Figure 1). All genomic sequence was generated using the Illumina HiSeq X platform (Illumina, San Diego, CA, USA) at Canada’s Michael Smith Genome Sciences Centre (Vancouver, BC, Canada). DNA samples were obtained from blood derived from two related individuals (dam and adult daughter) descended originally from near Churchill, MB, Canada.

Genome assembly workflow. gDNA: Genomic DNA.

For the generation of the microfluidic partitioned library (using the Chromium System, 10x Genomics Inc., Pleasanton, CA, USA) we generated DNA of high molecular weight (predominately > 50 kbp) from individual GAN/ISIS:26980492/103006 (dam). Briefly, the genomic DNA (gDNA) was extracted using the QIAGEN MagAttract high molecular weight DNA Kit (QIAGEN, Germantown, MD, USA) using the DNA Extraction protocol from the manufacturer (Chromium Genome Reagent Kits Version 2 User Guide). The integrity of the gDNA was confirmed using pulsed-field gel electrophoresis (PFGE). Using the Chromium Controller instrument (10x Genomics) fitted with a micro-fluidic Genome Chip (PN-120216) (10x Genomics), a library of Genome Gel Beads was combined with 1 ng of gDNA, Master Mix and partitioning oil to create Gel Bead-In-EMulsions (GEMs). The GEMs were subjected to an isothermic amplification step and bar-coded DNA fragments were recovered for Illumina library construction, following the Chromium Genome Reagent Kits Version 2 User Guide (PN-120229) (10x Genomics). Quantitative polymerase chain reaction (qPCR) was performed to assess library yield and an Agilent 2100 Bioanalyzer DNA 1000 chip (Agilent Technologies, Inc., Waldbronn, Germany) was used to determine the library size range and distribution. The library was sequenced on the Illumina HiSeq X sequencer, using paired-end sequencing to produce 150 bp reads at an estimated 60-fold redundant sequence coverage.

Non-partitioned paired-end DNA sequencing libraries were produced from both individuals, dam and daughter. DNA was extracted from 200 µL peripheral blood using a QIAamp DNA Blood Mini kit (QIAGEN), automated on a QIAcube (QIAGEN) and quantified using a Quant-iT double-stranded DNA, High Sensitivity assay (Thermo Fisher Scientific, Waltham, MA, USA). Genomic DNA (500 ng) was used to construct whole genome PCR-free libraries for each whale. Briefly, DNA was sheared to 300–600 bp by acoustic sonication for 90 s (Covaris Inc., Woburn, MA, USA). Sheared DNA was end-repaired and size selected using PCRClean DX magnetic beads (Aline Biosciences, Woburn, MA, USA) targeting ≈450 bp fragments. After 3’-A-tailing, full-length Illumina TruSeq adapters (Illumina) were ligated. Libraries were purified using ALINE beads and fragment sizes were assessed using an aliquot of PCR amplified library DNA on an Agilent 2100 Bioanalyzer DNA1000 chip. PCR-free library concentrations were quantified using a qPCR Library Quantification kit (KAPA, KK4824) (Kapa Biosystems, Wilmington, MA, USA). Each genome library was sequenced on a single lane of the HiSeq X instrument with paired-end 150 bp reads, with each also generating an estimated 60-fold redundant sequence coverage.

To aid in the subsequent annotation of the genome and provide a transcriptome resource, RNA sequencing (RNA-Seq) was performed using both the Illumina NextSeq and the Illumina MiSeq optimizing for read depth and read length for each platform, respectively (Figure 2).

For the Illumina NextSeq total nucleic acids were extracted from the liver and brain tissue from each individual using the ALINE EvoPure RNA Isolation Kit (Aline Biosciences). Following DNase I treatment, 250 ng total RNA was used for ribosomal (rRNA) removal using the NEBNext rRNA Depletion Kit (New England Biolabs, Inc., Ipswich, MA, USA). Complementary DNA (cDNA) synthesis and paired-end strand specific RNA-Seq library construction were performed according to our modified protocols, the main modification includes the random primed cDNA synthesis using Maxima H Minus 1st strand cDNA synthesis kit (Thermo Fisher) with Actinomycin D and NEBNext directional second strand cDNA module (New England Biolabs, Inc.). Libraries were pooled and sequenced, generating 75 bp paired-end reads.

RNA for the Illumina MiSeq was sequenced from blood serum, lung, and a mixture of tissues comprising of equal amounts of brain, duodenum, heart, liver, lung, and spleen. These tissues were all extracted from the dam. Homogenization using Tri-reagent was performed in a Mixer Mill (QIAGEN) on replicates of the various tissues. Total RNA extractions on 100 µL of the aqueous layers for each tissue replicate, as well as the pooled sample containing equal volumes of aqueous layers from tissues, were performed using the Magma-96 for Microarrays RNA kit (Ambion, Inc., Austin, TX, USA) spin protocol with the addition of a TURBO DNase step, on a Biomek NXP (Beckman-Coulter, Mississauga, ON, Canada) automated liquid-handling instrument. Ribosomal RNA was removed with the Epicentre ScriptSeq Complete Gold Kit (Illumina) RNA-Seq libraries were prepared using the ScriptSeq Complete Epidemiology Next-Generation Sequencing (NGS) library kit (Illumina), barcoded, and combined into an RNA-Seq run on the Illumina MiSeq platform generating 201 bp paired-end reads (performed at the Pacific Biological Station, Nanaimo, BC, Canada).

A genomic assembly using the paired-end sequence reads from the partitioned library was assembled using the Supernova assembly algorithm (version 1.1.5, 10x Genomics, San Francisco, CA, USA). This produced an initial assembly with a scaffold N50 length of 16.79 Mbp (Table 1). To further improve this draft genome, we first assembled paired-end reads from the non-partitioned libraries from both of the individuals. Briefly, paired-end sequences from the same individual as the partitioned library were processed using Konnector (version 2.0) [3], which performs local de Bruijn graph assembly to generate pseudo-long reads representing complete DNA fragments. The resulting pseudo-long reads and remaining sequence reads that could not be incorporated by Konnector were assembled with ABySS (version 2.0.2, Canada’s Michael Smith Genome Sciences Centre) [4], using kmer values (k) between k = 60 and k = 120. Paired-end sequences from the second individual were incorporated in the later scaffolding stage of the ABySS assembly and did not contribute to the sequence content. The k100 ABySS assembly was determined to be the most contiguous based on the scaffold N50 length metric (N50 = 58,545, Table 1). We followed a scaffolding and gap-filling methodology similar to that of the recently published bullfrog genome [5]. Gaps in our initial, supernova draft assembly were filled with Cobbler (version 0.3, Canada’s Michael Smith Genome Sciences Centre) using parameters -d 100 -i 0.95 [6] utilizing contig sequences from the ABySS assemblies generated at three kmer values (k95, k100, k105). The subsequent gap-filled assembly was initially scaffolded with RAILS (version 1.2, Canada’s Michael Smith Genome Sciences Centre) using parameters -d 100 -i 0.95 [6] using scaffold sequences from the same three ABySS assemblies. Briefly, long sequences are aligned against a draft assembly using BWA-MEM (version 0.7.13), using parameters -a -t 16 [7], and the resulting alignments are parsed and inspected. When alignments satisfied our minimum alignment requirement (100 or more anchoring bases with over 95% sequence identity flanking each gap), we tracked the position and alignment orientation of each long sequence in the assembly draft. Sequence scaffolding was performed using the scaffolding algorithm from LINKS (version 1.8.5, Canada’s Michael Smith Genome Sciences Centre) [8], modified to automatically fill gaps with the sequence that informed the merge. The resulting assembly was re-scaffolded iteratively eight times with LINKS using parameters -k 26 -l 5 -a 0.3 -d 1,2.5,5,7.5,10,12.5,15,20 kbp, -t 10,5,5,4,4,3,3,2 -o 1 increment at each iteration, using scaffolds from all three ABySS assemblies. A final round of automated gap-filling with Cobbler using ABySS contigs was applied (same parameters as above). The genome scaffolding steps yielded an improved scaffold N50 length of 19.59 Mbp. Further improvement was achieved using Sealer (version 2.0.2, parameter -k at 60–200, step 10) [9], which used sequence contigs from the ABySS k100 assembly to close an additional 1059 sequence gaps.

Table 1.

Assembly statistics and gene content for the genome sequences reported in this study.

Assembly	Total Size (Gbp)	No. of Gaps	No. of Scaffolds	Scaffold N50 (bp)	Longest Scaffold (bp)	BUSCO Complete Genes	BUSCO Complete + Fragmented Genes
ABySS-pe	2.325 + 0.216% in gaps	210,782	102,940	58,545	997,316	4153 (66.42%)	4689 (74.99%)
Supernova	2.314 + 1.40% in gaps	30,858	8930	16.79 × 10⁶	78 × 10⁶	5667 (90.63%)	5911 (94.53%)
Rails/Cobbler	2.327 + 1.37% in gaps	26,898	6971	19.59 × 10⁶	95 × 10⁶	5669 (90.66%)	5915 (94.59%)
Sealer	2.327 + 1.36% in gaps	25,839	6971	19.59 × 10⁶	95 × 10⁶	5669 (90.66%)	5915 (94.59%)

Open in a new tab

BUSCO: Benchmarking Universal Single-Copy Orthologs.

The final assembly (Table 1) comprises 2.327 Gbp of highly contiguous assembled genomic sequence with a scaffold N50 length of 19.59 Mbp. Analysis of the representation of highly conserved genes using Benchmarking Universal Single-Copy Othologs (BUSCO) [10] indicated that for 6253 genes, the complete and contiguous protein coding sequence was found in our assembly for 5669 genes (90.66%), whilst complete or fragmented sequences were found for 5915 genes (94.59%).

A transcriptome was obtained from de novo assembly of 75 bp paired-end chastity passed RNA-Seq reads sequenced on the Illumina NextSeq. Each library was assembled with ABySS (version 1.3.4, k38 to k74). The resulting assemblies within each library were merged using Trans-ABySS (version 1.4.10, Canada’s Michael Smith Genome Sciences Centre, Vancouver, BC, Canada) to produce a working transcript contig set (Table 2). MAKER (2.31.9, Yandell Lab, Salt Lake City, UT, USA) was used for the annotation and determination of protein coding potential of the genome assembly [11]. As part of its process, MAKER runs three ab initio gene prediction programs and uses experimental gene evidence to inform each. The gene predictions from each tool are then combined to produce a final annotated gene set. Within the MAKER framework, RepeatMasker [12] was used to mask low-complexity genomic sequences. The gene prediction programs AUGUSTUS [13], Snap [14], and GeneMark [15] were run within MAKER with the RNA-Seq assemblies and the annotated proteins of O. orca (Genbank accession ANOL00000000.2) were provided as experimental evidence. AUGUSTUS predictions were based on the included Homo sapiens training set of genes. Snap was trained using the CEGMA (version 2.5) predictive genes [16]. GeneMark was self-trained. These three sets of predictions were combined by MAKER to produce a final gene set of 29,581 genes, for which there was supporting experimental evidence. The RNA-Seq data provided evidence for multiple isoforms of some genes, and 38,561 transcripts were predicted from the 29,581 genes. The average predicted protein length was 402 amino acids. A full annotation of the genome is available from the Refseq website (https://www.ncbi.nlm.nih.gov/refseq/).

Table 2.

Transcriptome assembly statistics for all tissues studied and the read counts for each library.

Tissue	n	n:N50	Min	N80	N50	N20	Max	Sum	Read Count
Liver, dam	960,722	117,511	74	144	420	1542	47,312	246.3 × 10⁶	239.4 × 10⁶
Brain, dam	2,019,281	247,296	74	193	587	2013	18,656	691 × 10⁶	247.0 × 10⁶
Liver, daughter	854,394	99,263	74	145	555	1538	47,494	235.4 × 10⁶	241.2 × 10⁶
Brain daughter	2,258,624	260,327	74	198	653	2219	19,796	806.2 × 10⁶	270.4 × 10⁶
Lung, dam	1,170,674	374,225	162	201	282	504	5091	339.2 × 10⁶	25.6 × 10⁶
Mixed, dam	860,603	305,220	149	186	220	352	4751	208.2 × 10⁶	26.6 × 10⁶
Serum, dam	1,244,441	516,834	135	195	203	287	8034	281.5 × 10⁶	23.8 × 10⁶

Open in a new tab

Overall, the contiguity of our assembly compares favourably with that of the O. orca (killer whale) genome, which reported a reconstructed genome size of 2.372 Gbp and scaffold N50 length of 12.735 Mbp [17]. With respect to assembly completeness, a similar number of BUSCO genes were present in a complete copy: 90.66% in the beluga and 91.46% in the orca. In addition, we estimated the degree of genome-wide sequence similarity between these two closely related organisms to be 97.87% ± 2.4 × 10⁻⁷% (mean ± standard deviation) using a kmer-based Bloom filter approach, as described by [18]. Our approach demonstrates the utility of microfluidic partitioned libraries to rapidly produce highly contiguous mammalian sized reference quality assemblies and we demonstrate how such assemblies can be further improved through the incorporation of assembled non-partitioned genomic libraries. We note that incorporating data from independent non-partitioned libraries provided only a modest overall improvement in the assembly, indicating that the partitioned libraries show no obvious bias. In the future, incorporating an independent assembly on the partitioned data might be an equally useful approach to improve assembly contiguity and eliminate the need for a second library. Our study also provides a deep transcriptomic resource that is profiled across multiple tissues.

Acknowledgments

Funding to conduct this work was provided by Genome Canada (Grant #212SEQ) and Genome British Columbia.

Author Contributions

S.J.M.J. and S.J.J. wrote the manuscript. G.A.T., S.C., R.L.W., S.A.H., S.B., G.M., A.S., A.M.C., K.T., I.L., D.C., K.L.M., C.C., A.A., N.D., A.K.Y.T., A.T., H.K., P.P. and D.P. generated the data. S.J.M.J., C.A.S., K.M.M., R.J.N.C., A.J.M., R.M., Y.Z., I.B., Y.M., M.M. and M.H. conceived and designed the experiments. All authors read and approved the manuscript.

Conflicts of Interest

The authors declare no competing financial interests.

References

1.Stewart B.E., Stewart R.E.A. Mammalian Species Delphinapterus leucas. J. Mammal. 1989:1–8. doi: 10.2307/3504210. [DOI] [Google Scholar]
2.O’Corry-Crowe G.M. Beluga whale Delphinapterus leucas. In: Perrin W.F., Würsig B.G., Thewissen J.G.M., editors. Encyclopedia of Marine Mammals. 1st ed. Academic Press; San Diego, CA, USA: 2002. pp. 94–99. [Google Scholar]
3.Vandervalk B.P., Yang C., Xue Z., Raghavan K., Chu J., Mohamadi H., Jackman S.D., Chiu R., Warren R.L., Birol I. Konnector v2.0: Pseudo-long reads from paired-end sequencing data. BMC Med. Genom. 2015;8(Suppl. 3):S1. doi: 10.1186/1755-8794-8-S3-S1. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Jackman S.D., Vandervalk B.P., Mohamadi H., Chu J., Yeo S., Hammond S.A., Jahesh G., Khan H., Coombe L., Warren R.L., et al. ABySS 2.0: Resource-efficient assembly of large genomes using a bloom filter. Genom. Res. 2017;27:768–777. doi: 10.1101/gr.214346.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Hammond S.A., Warren R.L., Vandervalk B.P., Kucuk E., Khan H., Gibb E.A., Pandoh P., Kirk H., Zhao Y., Jones M., et al. The North American bullfrog draft genome provides insight into hormonal regulation of long noncoding RNA. Nat. Commun. 2017;8:1433. doi: 10.1038/s41467-017-01316-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Warren R.L. RAILS and Cobbler: Scaffolding and automated finishing of draft genomes using long DNA sequences. J. Open Source Softw. 2016;1:116. doi: 10.21105/joss.00116. [DOI] [Google Scholar]
7.Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013. 1303.3997v1301
8.Warren R.L., Chen Y., Vandervalk B.P., Behsaz B., Lagman A., Jones S.J.M., Birol I. LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads. GigaScience. 2015;4:35. doi: 10.1186/s13742-015-0076-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Paulino D., Warren R.L., Vandervalk B.P., Raymond A., Jackman S.D., Birol I. Sealer: A scalable gap-closing application for finishing draft genomes. BMC Bioinform. 2015;16:230. doi: 10.1186/s12859-015-0663-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Simao F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. Busco: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351. [DOI] [PubMed] [Google Scholar]
11.Campbell M.S., Holt C., Moore B., Yandell M. Genome annotation and curation using MAKER and MAKER-P. Curr. Protoc. Bioinform. 2014;48:4.11.1–4.11.39. doi: 10.1002/0471250953.bi0411s48. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Smit A.F.A., Hubley R., Green P., editors. RepeatMasker. Institute for Systems Biolog; Seattle, WA, USA: [(accessed on 12 September 2017)]. Open-4.0. 2013–2015. Available online: http://www.repeatmasker.org. [Google Scholar]
13.Stanke M., Tzvetkova A., Morgenstern B. AUGUSTUS at EGASP: Using EST, protein and genomic alignments for improved gene prediction in the human genome. Genome Biol. 2006;7:S11. doi: 10.1186/gb-2006-7-s1-s11. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Korf I. Gene finding in novel genomes. BMC Bioinform. 2004;5:59. doi: 10.1186/1471-2105-5-59. [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Lukashin A.V., Borodovsky M. GeneMark.hmm: New solutions for gene finding. Nucleic Acids Res. 1998;26:1107–1115. doi: 10.1093/nar/26.4.1107. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Parra G., Bradnam K., Korf I. CEGMA: A pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23:1061–1067. doi: 10.1093/bioinformatics/btm071. [DOI] [PubMed] [Google Scholar]
17.Foote A.D., Liu Y., Thomas G.W., Vinar T., Alfoldi J., Deng J., Dugan S., van Elk C.E., Hunter M.E., Joshi V., et al. Convergent evolution of the genomes of marine mammals. Nat. Genet. 2015;47:272–275. doi: 10.1038/ng.3198. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Warren R.L., Keeling C.I., Yuen M.M., Raymond A., Taylor G.A., Vandervalk B.P., Mohamadi H., Paulino D., Chiu R., Jackman S.D., et al. Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism. Plant J. Cell Mol. Biol. 2015;83:189–212. doi: 10.1111/tpj.12886. [DOI] [PubMed] [Google Scholar]

[B1-genes-08-00378] 1.Stewart B.E., Stewart R.E.A. Mammalian Species Delphinapterus leucas. J. Mammal. 1989:1–8. doi: 10.2307/3504210. [DOI] [Google Scholar]

[B2-genes-08-00378] 2.O’Corry-Crowe G.M. Beluga whale Delphinapterus leucas. In: Perrin W.F., Würsig B.G., Thewissen J.G.M., editors. Encyclopedia of Marine Mammals. 1st ed. Academic Press; San Diego, CA, USA: 2002. pp. 94–99. [Google Scholar]

[B3-genes-08-00378] 3.Vandervalk B.P., Yang C., Xue Z., Raghavan K., Chu J., Mohamadi H., Jackman S.D., Chiu R., Warren R.L., Birol I. Konnector v2.0: Pseudo-long reads from paired-end sequencing data. BMC Med. Genom. 2015;8(Suppl. 3):S1. doi: 10.1186/1755-8794-8-S3-S1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B4-genes-08-00378] 4.Jackman S.D., Vandervalk B.P., Mohamadi H., Chu J., Yeo S., Hammond S.A., Jahesh G., Khan H., Coombe L., Warren R.L., et al. ABySS 2.0: Resource-efficient assembly of large genomes using a bloom filter. Genom. Res. 2017;27:768–777. doi: 10.1101/gr.214346.116. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B5-genes-08-00378] 5.Hammond S.A., Warren R.L., Vandervalk B.P., Kucuk E., Khan H., Gibb E.A., Pandoh P., Kirk H., Zhao Y., Jones M., et al. The North American bullfrog draft genome provides insight into hormonal regulation of long noncoding RNA. Nat. Commun. 2017;8:1433. doi: 10.1038/s41467-017-01316-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B6-genes-08-00378] 6.Warren R.L. RAILS and Cobbler: Scaffolding and automated finishing of draft genomes using long DNA sequences. J. Open Source Softw. 2016;1:116. doi: 10.21105/joss.00116. [DOI] [Google Scholar]

[B7-genes-08-00378] 7.Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013. 1303.3997v1301

[B8-genes-08-00378] 8.Warren R.L., Chen Y., Vandervalk B.P., Behsaz B., Lagman A., Jones S.J.M., Birol I. LINKS: Scalable, alignment-free scaffolding of draft genomes with long reads. GigaScience. 2015;4:35. doi: 10.1186/s13742-015-0076-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B9-genes-08-00378] 9.Paulino D., Warren R.L., Vandervalk B.P., Raymond A., Jackman S.D., Birol I. Sealer: A scalable gap-closing application for finishing draft genomes. BMC Bioinform. 2015;16:230. doi: 10.1186/s12859-015-0663-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B10-genes-08-00378] 10.Simao F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. Busco: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351. [DOI] [PubMed] [Google Scholar]

[B11-genes-08-00378] 11.Campbell M.S., Holt C., Moore B., Yandell M. Genome annotation and curation using MAKER and MAKER-P. Curr. Protoc. Bioinform. 2014;48:4.11.1–4.11.39. doi: 10.1002/0471250953.bi0411s48. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B12-genes-08-00378] 12.Smit A.F.A., Hubley R., Green P., editors. RepeatMasker. Institute for Systems Biolog; Seattle, WA, USA: [(accessed on 12 September 2017)]. Open-4.0. 2013–2015. Available online: http://www.repeatmasker.org. [Google Scholar]

[B13-genes-08-00378] 13.Stanke M., Tzvetkova A., Morgenstern B. AUGUSTUS at EGASP: Using EST, protein and genomic alignments for improved gene prediction in the human genome. Genome Biol. 2006;7:S11. doi: 10.1186/gb-2006-7-s1-s11. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B14-genes-08-00378] 14.Korf I. Gene finding in novel genomes. BMC Bioinform. 2004;5:59. doi: 10.1186/1471-2105-5-59. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B15-genes-08-00378] 15.Lukashin A.V., Borodovsky M. GeneMark.hmm: New solutions for gene finding. Nucleic Acids Res. 1998;26:1107–1115. doi: 10.1093/nar/26.4.1107. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B16-genes-08-00378] 16.Parra G., Bradnam K., Korf I. CEGMA: A pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007;23:1061–1067. doi: 10.1093/bioinformatics/btm071. [DOI] [PubMed] [Google Scholar]

[B17-genes-08-00378] 17.Foote A.D., Liu Y., Thomas G.W., Vinar T., Alfoldi J., Deng J., Dugan S., van Elk C.E., Hunter M.E., Joshi V., et al. Convergent evolution of the genomes of marine mammals. Nat. Genet. 2015;47:272–275. doi: 10.1038/ng.3198. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B18-genes-08-00378] 18.Warren R.L., Keeling C.I., Yuen M.M., Raymond A., Taylor G.A., Vandervalk B.P., Mohamadi H., Paulino D., Chiu R., Jackman S.D., et al. Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism. Plant J. Cell Mol. Biol. 2015;83:189–212. doi: 10.1111/tpj.12886. [DOI] [PubMed] [Google Scholar]

PERMALINK

The Genome of the Beluga Whale (Delphinapterus leucas)

Steven J M Jones

Gregory A Taylor

Simon Chan

René L Warren

S Austin Hammond

Steven Bilobram

Gideon Mordecai

Curtis A Suttle

Kristina M Miller

Angela Schulze

Amy M Chan

Samantha J Jones

Kane Tse

Irene Li

Dorothy Cheung

Karen L Mungall

Caleb Choo

Adrian Ally

Noreen Dhalla

Angela K Y Tam

Armelle Troussard

Heather Kirk

Pawan Pandoh

Daniel Paulino

Robin J N Coope

Andrew J Mungall

Richard Moore

Yongjun Zhao

Inanc Birol

Yussanne Ma

Marco Marra

Martin Haulena

Abstract

1. Introduction

2. Methods, Results and Discussion

Figure 1.

Figure 2.

Table 1.

Table 2.

Acknowledgments

Author Contributions

Conflicts of Interest

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases