Table 2.
Sequencing and assembly statistics, and accession numbers.
BioProjects and Vouchers
CCGP NCBI BioProject | PRJNA720569
Genera NCBI BioProject | PRJNA765802
Species NCBI BioProject | PRJNA777156
NCBI BioSample | SAMN26368113
Specimen identification | CAN_PGR_092001
NCBI Genome accessions | Primary | Alternate
Assembly accession | JALGQL000000000 | JALGQM000000000
Genome sequences | GCA_023055335.1 | GCA_023055415.1
Genome Sequence
PacBio HiFi reads | Run | 1 PACBIO_SMRT (Sequel II) run: 2.3 M spots, 27.9 G bases, 15 Gb
PacBio HiFi reads | Accession | SRX15223504
Omni-C Illumina reads | Run | 2 ILLUMINA (Illumina NovaSeq 6000) runs: 56.6 M spots, 17.1 G bases, 5.9 Gb
Omni-C Illumina reads | Accession | SRX15223505, SRX15223506
Genome Assembly Quality Metrics
Assembly identifier (quality codeᵃ) | fCliAna1(6.7.P7.Q58.C87)
HiFi read coverageᵇ | 57×
Metric | Primary | Alternate
Number of contigs | 662 | 393
Contig N50 (bp) | 9,145,431 | 9,433,512
Contig NG50ᵇ | 10,429,551 | 9,838,831
Longest contigs | 20,125,327 | 19,750,731
Number of scaffolds | 443 | 171
Scaffold N50 | 21,001,540 | 20,474,872
Scaffold NG50ᵇ | 21,652,950 | 21,109,456
Largest scaffold | 25,521,869 | 28,184,607
Size of final assembly | 538,118,947 | 534,899,999
Phased blocks NG50ᵇ | 10,429,551 | 9,838,831
Gaps per Gbp (# gaps) | 406 (219) | 415 (222)
Indel QV (frameshift) | 47.09853152 | 47.67304317
Base pair QV | 58.6256 | 58.6696 | Full assembly = 58.6475
K-mer completeness | 91.4415 | 91.4643 | Full assembly = 99.8107
BUSCO completeness (actinopterygii), n = 3640
Assembly | C | S | D | F | M
Pᶜ | 97.90% | 96.70% | 1.20% | 0.40% | 1.70%
Aᶜ | 98.20% | 97.10% | 1.10% | 0.40% | 1.40%
Organelles | 1 Partial mitochondrial sequence JALGQL010000443.1
ᵃ Assembly quality code x·y·P·Q·C, where x = log₁₀[contig NG50]; y = log₁₀[scaffold NG50]; P = log₁₀[phased block NG50]; Q = Phred base accuracy QV (quality value); C = % genome represented by the first “n” scaffolds, following a known karyotype of 2n = 48 (Hinegardner and Rosen 1972). BUSCO scores: (C)omplete, divided into complete and (S)ingle-copy and complete and (D)uplicated; (F)ragmented; and (M)issing BUSCO genes. n, number of BUSCO genes in the set/database.
ᵇ Read coverage and NGx statistics were calculated based on the estimated genome size of 484 Mb.
ᶜ (P)rimary and (A)lternate assembly values.
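The footnotes above define several derived quantities (NGx relative to an estimated genome size, Phred-scaled QV, gaps per Gbp, read coverage). The following is a minimal sketch of how such values are conventionally computed; it is not the authors' pipeline. The function names are hypothetical, the contig lengths in the usage note are invented toy values, and the read total of 27.9 G bases is the rounded figure from the table, so the coverage result is approximate.

```python
def ngx(lengths, genome_size, x=50):
    """Return the NGx length: the length of the sequence at which the
    cumulative sum of lengths (sorted longest first) reaches x% of
    genome_size. Returns None if the sequences never reach that target."""
    target = genome_size * x / 100
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if total >= target:
            return length
    return None


def n50(lengths):
    """N50 is NG50 evaluated against the assembly's own total length."""
    return ngx(lengths, sum(lengths), 50)


def phred_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def gaps_per_gbp(n_gaps, assembly_size_bp):
    """Scale a raw gap count to gaps per billion base pairs."""
    return n_gaps / assembly_size_bp * 1e9


def read_coverage(total_read_bases, genome_size_bp):
    """Average read depth given total sequenced bases and a genome size."""
    return total_read_bases / genome_size_bp


if __name__ == "__main__":
    # Toy contig lengths (assumed for illustration, not from the assembly).
    toy = [50, 40, 30, 20, 10]
    print("toy N50:", n50(toy))
    print("toy NG50 (genome size 200):", ngx(toy, 200))

    # Values taken from the table.
    print("primary gaps per Gbp:", gaps_per_gbp(219, 538_118_947))
    print("HiFi coverage:", read_coverage(27.9e9, 484e6))
    print("error rate at QV 58.6256:", phred_to_error_rate(58.6256))
```

With the table's values, the primary gap density evaluates to roughly 406.97 per Gbp and the HiFi coverage to roughly 57.6×, which agree with the reported 406 and 57× if the published figures were truncated rather than rounded (an assumption). A QV of 58.6256 corresponds to about 1.4 errors per million bases.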