Table 2.
BioProjects and vouchers | VGP NCBI BioProject | PRJNA489243
Species NCBI BioProject | PRJNA970804
NCBI BioSample | SAMN33212336
NCBI Genome accessions | Haplotype 1 | Haplotype 2
Assembly accession | GCA_030035585.1 | GCA_030020955.1
Genome sequences | JASCZL000000000 | JASCZM000000000
Genome sequence | PacBio HiFi reads | Run | 3 PACBIO_SMRT (Sequel II) runs: 6.5 million reads, 102 Gbases
Omni-C Illumina reads | Run | 2 ILLUMINA (Illumina NovaSeq 6000) runs: 457.5 million reads, 138.2 Gbases
Assembly identifier (quality code)ᵃ | mDugDug1 1(8.8.P8.Q70.C99)
HiFi read coverageᵇ | 32.0X
Genome Assembly Quality Metrics | Haplotype 1 | Haplotype 2
Number of contigs | 294 | 256
Contig N50 (bp) | 57,632,671 | 57,883,746
Contig NG50 (bp) | 57,632,671 | 57,883,746
Longest contig (bp) | 162,184,114 | 209,448,431
Number of scaffolds | 198 | 167
Scaffold N50 (bp) | 177,379,183 | 138,031,769
Scaffold NG50 (bp) | 177,379,183 | 138,031,769
Largest scaffold (bp) | 267,865,978 | 230,272,189
Size of final assembly (bp) | 3,159,179,246 | 3,154,861,630
Phased block NG50 (bp) | 57,632,671 | 57,883,746
Gaps per Gbp (# Gaps) | 25 (79) | 28 (88)
Indel QV (frameshift) | 41.52 | 42.16
Base pair QV | 70.4553 | 70.3254
Base pair QV, full assembly | 70.3899
K-mer completeness (%) | 97.9001 | 97.8847
K-mer completeness (%), full assembly | 99.7025
BUSCO completeness | Haplotype | Cᶜ | Sᶜ | Dᶜ | Fᶜ | Mᶜ
Vertebrata, n = 3354 | H1ᵈ | 97.9% | 95.9% | 2.0% | 1.0% | 1.1%
Vertebrata, n = 3354 | H2ᵈ | 97.8% | 95.7% | 2.1% | 1.1% | 1.1%
Mammalia, n = 9226 | H1ᵈ | 96.2% | 95.3% | 0.9% | 0.8% | 3.0%
Mammalia, n = 9226 | H2ᵈ | 96.1% | 95.2% | 0.9% | 0.8% | 3.1%
Organelles | 1 complete mitochondrial sequence (pending NCBI accession code)
ᵃ Assembly quality code x·y·P·Q·C, derived from the notation of Rhie et al. (2021): x = log10[contig NG50]; y = log10[scaffold NG50]; P = log10[phased block NG50]; Q = Phred base accuracy QV (quality value); C = % of the genome represented by the first “n” scaffolds, following a karyotype of 2n = 48 inferred from the ancestral taxon Trichechus manatus (Noronha et al. 2022).
ᵇ Read coverage and NGx statistics were calculated using the estimated genome size of 3.16 Gbp.
ᶜ BUSCO categories: complete (C), complete and single-copy (S), complete and duplicated (D), fragmented (F), and missing (M).
ᵈ Values for the haplotype 1 (H1) and haplotype 2 (H2) assemblies.
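
The footnotes describe how the NGx, coverage, and quality-code values relate to the estimated genome size. The Python sketch below illustrates those relationships and is not the pipeline used to produce the table. The function names `ngx` and `read_coverage` and the toy contig lengths are hypothetical. The footnotes do not state whether the log10 values in the quality code are floored or rounded, so the sketch reports the raw logarithm only. Because the read total in the table is rounded to 102 Gbases, the computed coverage is roughly 32X and need not match the reported 32.0X exactly.

```python
import math


def ngx(lengths, genome_size, x=50):
    """Length of the sequence at which the cumulative length, taken
    longest first, reaches x% of genome_size. Passing the assembly's
    own total length as genome_size gives Nx instead of NGx.
    Returns None if the sequences never reach that fraction."""
    target = genome_size * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


def read_coverage(total_bases, genome_size):
    """Mean read depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


# Estimated genome size stated in the footnotes (3.16 Gbp).
GENOME_SIZE = 3.16e9

# Hypothetical toy contig lengths, only to show N50 versus NG50.
toy_contigs = [40, 30, 20, 10]
n50 = ngx(toy_contigs, sum(toy_contigs))  # denominator: assembly size
ng50 = ngx(toy_contigs, 160)              # denominator: assumed genome size

# HiFi coverage from the rounded read total in the table (102 Gbases).
hifi_coverage = read_coverage(102e9, GENOME_SIZE)

# Raw log10 of the haplotype 1 contig NG50, the quantity behind the
# "x" field of the quality code (rounding convention left open).
log_contig_ng50 = math.log10(57_632_671)

print(n50, ng50)
print(f"{hifi_coverage:.2f}X")
print(f"{log_contig_ng50:.2f}")
```

NG50 uses the estimated genome size as its denominator, whereas N50 uses the assembly's own total length. The two agree in the table because each haplotype assembly is close to 3.16 Gbp.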