Table 2.
Sequencing and assembly statistics, and accession numbers.
| Bio projects & vouchers | CCGP NCBI BioProject | PRJNA720569 |
| | Genera NCBI BioProject | PRJNA765806 |
| | Species NCBI BioProject | PRJNA777157 |
| | NCBI BioSample | SAMN31536067 |
| | Specimen identification | COTO_CA2020_CCGP |
| | NCBI Genome accessions | Primary | Alternate |
| | Assembly accession | JAPDVT000000000 | JAPDVU000000000 |
| | Genome sequences | GCA_026230055.1 | GCA_026230045.1 |
| Genome sequence | PacBio HiFi reads | Run | 1 PACBIO_SMRT (Sequel II) run: 4M spots, 64.5G bases, 44.4Gb |
| | | Accession | SRR23445762 |
| | Omni-C Illumina reads | Run | 2 ILLUMINA (Illumina NovaSeq 6000) runs: 133.4M spots, 40.3G bases, 13.4Gb |
| | | Accession | SRR23445761, SRR23445763 |
| Genome assembly quality metrics | Assembly identifier (Quality code^a) | mCorTow1 (7.8.P.Q64.C98) |
| | HiFi read coverage^b | 32.31X |
| | | Haplotype 1 | Haplotype 2 |
| | Number of contigs | 610 | 399 |
| | Contig N50 (bp) | 23,382,908 | 22,150,609 |
| | Contig NG50^b (bp) | 24,508,096 | 22,150,609 |
| | Longest contig (bp) | 70,937,382 | 77,651,888 |
| | Number of scaffolds | 391 | 182 |
| | Scaffold N50 (bp) | 174,690,156 | 177,756,282 |
| | Scaffold NG50^b (bp) | 178,686,506 | 177,756,282 |
| | Largest scaffold (bp) | 233,461,832 | 237,418,211 |
| | Size of final assembly (bp) | 2,104,912,948 | 1,961,562,149 |
| | Phased block NG50^b (bp) | 24,508,096 | 22,150,609 |
| | Gaps per Gbp (# gaps) | 104 (219) | 111 (217) |
| | Indel QV (frameshift) | 40.23 | 38.6 |
| | Base pair QV | 64.7466 | 64.6825 |
| | | Full assembly = 64.7155 |
| | k-mer completeness | 94.6054 | 89.9054 |
| | | Full assembly = 99.5751 |
| | BUSCO completeness (n = 9,226) | C | S | D | F | M |
| | H1^c | 96.60% | 93.80% | 2.80% | 0.60% | 2.80% |
| | H2^c | 94.70% | 92.00% | 2.70% | 0.60% | 4.70% |
| Organelles | 1 complete mitochondrial sequence | CM047939 |
^a Assembly quality code x·y·P·Q·C, derived from the notation of Rhie et al. (2021). x = log10 [contig NG50]; y = log10 [scaffold NG50]; P = log10 [phased block NG50]; Q = Phred base accuracy QV (quality value); C = % of the genome represented by the first ‘n’ scaffolds, following a known karyotype for C. townsendii of 2n = 32 (Baker and Patton 1967). The quality code for the whole assembly is the one reported for the primary assembly (mCorTown1.0.hap1).
^b Read coverage and NGx statistics were calculated from an estimated genome size of 1.997 Gb.
^c Values for Haplotype 1 (H1) and Haplotype 2 (H2).
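Several derived entries in the table can be recomputed from the raw values it reports. The following is a minimal sketch, not the authors' pipeline. It assumes the log10 terms of the quality code are truncated to integers (consistent with the reported 7.8 and Q64), and it uses the rounded read total of 64.5G bases, so the coverage it prints differs slightly from the reported 32.31X. The function names are illustrative only.

```python
import math

# Values copied from Table 2 (Haplotype 1 unless noted).
CONTIG_NG50 = 24_508_096
SCAFFOLD_NG50 = 178_686_506
PHASED_BLOCK_NG50 = 24_508_096
BASE_QV_H1 = 64.7466
EST_GENOME_SIZE_BP = 1.997e9   # footnote b
HIFI_BASES = 64.5e9            # rounded value from the PacBio run row


def log10_floor(n: int) -> int:
    """Integer part of log10(n); assumed to be how x, y and P are derived."""
    return math.floor(math.log10(n))


def coverage(total_bases: float, genome_size_bp: float) -> float:
    """Mean read depth: sequenced bases divided by estimated genome size."""
    return total_bases / genome_size_bp


def gaps_per_gbp(n_gaps: int, assembly_size_bp: int) -> float:
    """Gap count normalized to one gigabase of assembled sequence."""
    return n_gaps / (assembly_size_bp / 1e9)


x = log10_floor(CONTIG_NG50)        # 7
y = log10_floor(SCAFFOLD_NG50)      # 8
p = log10_floor(PHASED_BLOCK_NG50)  # 7
q = int(BASE_QV_H1)                 # 64 (truncated, not rounded)
print(f"x={x} y={y} P={p} Q{q}")

print(f"HiFi coverage ~ {coverage(HIFI_BASES, EST_GENOME_SIZE_BP):.2f}X")
print(f"H1 gaps per Gbp ~ {round(gaps_per_gbp(219, 2_104_912_948))}")
print(f"H2 gaps per Gbp ~ {round(gaps_per_gbp(217, 1_961_562_149))}")
```

The C term of the quality code is not recomputed here because it needs the individual scaffold lengths, which the table does not list.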