Abstract
Advancements in genome sequencing technology have brought unprecedented accessibility of high-throughput sequencing to species of conservation interest. The potential knowledge gained from application of these techniques is maximized by availability of high-quality, annotated reference genomes for endangered species. However, these vital resources are often lacking for endangered minnows of North America (Cypriniformes: Leuciscidae). One such endangered species, Colorado pikeminnow (Ptychocheilus lucius), is the largest North American minnow and the top-level native aquatic predator in the Colorado River Basin of the southwestern United States and northwestern Mexico. Over the past century, Colorado pikeminnow has suffered habitat loss and population declines due to anthropogenic habitat modifications and invasive species introductions. The lack of genetic resources for Colorado pikeminnow has hindered conservation genomic study of this unique organism. This study seeks to remedy this issue by presenting a high-quality reference genome for Colorado pikeminnow developed from Pacific Biosciences HiFi sequencing and Hi-C scaffolding. The final assembly was a 1.1 Gb genome composed of 305 contigs including 25 chromosome-sized scaffolds. Measures of quality, contiguity, and completeness met or exceeded those observed for Danio rerio (Danionidae) and 2 other Colorado River Basin leuciscids (Meda fulgida and Tiaroga cobitis). Comparative genomic analyses identified enrichment of gene families for growth, development, immune activity, and gene transcription, all of which are important for a large-bodied piscivorous fish living in a dynamic environment. This reference genome will provide a basis for important conservation genomic study of Colorado pikeminnow and help efforts to better understand the evolution of desert fishes.
Keywords: Ptychocheilus lucius, Leuciscidae, Cypriniformes, genome assembly, conservation genomics, endangered species
Introduction
Leuciscidae (Actinopterygii: Cypriniformes) is a highly diverse family of Holarctic fishes which includes North American minnows (Schönhuth et al. 2018). Leuciscids have diversified to occupy nearly all freshwater habitats in North America (Gotelli and Pyron 1991; Martin and Bonett 2015; Burress et al. 2017), including the desert southwest where several species grow to atypical size (>40 cm length) and occupy unique niches in the Colorado River and its tributaries (Minckley et al. 1986). However, despite their abundance and speciosity, leuciscids remain underrepresented in efforts to sequence vertebrate genomes for evolutionary study and conservation genomic applications (Fan et al. 2020; Rhie et al. 2021).
The Colorado pikeminnow (Ptychocheilus lucius), a Colorado River endemic, has several traits which are exceptional among leuciscid species. It is the largest native North American minnow and has a near-exclusively piscivorous diet as an adult, which is rare among minnows (Vanicek and Kramer 1969). It was historically the top-level aquatic predator in a highly depauperate native fish community (Minckley and Marsh 2009). Several of its life history traits (long lifespan, slow maturation process, and migratory behavior associated with spawning) are common among imperiled southwestern fishes (Tyus and McAda 1984; Tyus 1990). Population declines due to changes in flow regimes caused by dams, threats from nonnative species, and habitat loss have led to its listing as endangered under the US Endangered Species Act of 1973 (Miller 1961; Minckley and Deacon 1968).
Contemporary population trends, environmental uncertainty, and anthropogenic habitat modifications portend an uncertain future for Colorado pikeminnow (Comte et al. 2022; Pennock et al. 2022). Its contemporary distribution is restricted to the Green, Upper Colorado, and San Juan River subbasins (Fig. 1; Dibble et al. 2023). The San Juan River population continues to experience low recruitment following its extirpation and subsequent reestablishment (Ryden and Ahlm 1996; Franssen et al. 2016; Clark et al. 2018). The Green and Upper Colorado River populations are self-sustaining, but recent population declines have made persistence uncertain (Osmundson and White 2017; Dibble et al. 2023).
Fig. 1.
Approximate current and historical distributions of Colorado pikeminnow (Ptychocheilus lucius) along major rivers of the Colorado River Basin in the United States and Mexico. The Green, San Juan, and Upper Colorado River subbasins which contain the present-day Colorado pikeminnow populations are highlighted. The top-left inset shows the location of the Colorado River Basin in North America. The Colorado pikeminnow illustration is shown with copyright permission from © Joseph R. Tomelleri.
The precarious nature of remnant Colorado pikeminnow populations identifies a need to develop genomic resources which facilitate the estimation of population genomic parameters with improved precision and accuracy. For example, an annotated reference genome can aid in identification of selectively neutral loci which meet key assumptions for many conservation genomic analyses (Allendorf et al. 2010). Reduced representation sequencing methods routinely used in conservation genomic studies also benefit from reference genomes to ensure data sets fit analytical model assumptions, such as linkage disequilibrium among genetic loci (Campbell et al. 2018; O’Leary et al. 2018). Additionally, genomic resources facilitate studies from evolutionary perspectives which can help identify genomic structural variants associated with environmental adaptation (Wellenreuther et al. 2019; Mérot et al. 2020), facilitate differential gene expression studies to quantify molecular responses to physiological or ecological stimuli (Connon et al. 2018), and identify species’ physiological limitations (Komoroske et al. 2021).
Unfortunately, reference genomes which can maximize the information gained from genome-scale studies remain rare for western North American leuciscids. Colorado pikeminnow is no exception, with prior studies relying upon legacy genetic markers such as allozymes, microsatellites, and mtDNA assays (Morizot et al. 2002; Borley and White 2006; Martin et al. 2015). This deficit is remedied herein by combining long- and short-read sequence data to assemble and annotate a chromosome-scale Colorado pikeminnow reference genome. Comparisons to other Cypriniformes species were conducted to demonstrate genome contiguity and completeness, as well as to investigate potential gene family expansions of ecological importance for Colorado pikeminnow.
Materials and methods
Tissue collection
A male Colorado pikeminnow was collected from the 2006 cohort at the Southwestern Native Aquatic Resources and Recovery Center in Dexter, New Mexico, USA. This cohort is an F2 descendant from wild fish collected from the Colorado and Green rivers in 1981 and 1991 and was spawned to serve as a backup broodstock for the species’ recovery program (Diver et al. 2019). The 2006 cohort has an observed heterozygosity of 0.862 as determined from 24 microsatellite loci (Diver et al. 2019). The selected individual was first anesthetized using MS-222 for 2 blood draws into 4-mL vacutainer tubes coated with K2EDTA as anticoagulant. Blood samples were placed on dry ice immediately after collection and shipped overnight to sequencing facilities. The fish was then euthanized with an overdose of MS-222 and dissected to obtain 1 cm³ subsamples of 11 tissues for transcriptome characterization: brain, eye, gall bladder, gill, heart, kidney, liver, muscle, skin, spleen, and testes. These tissues were preserved in RNAlater and stored at 4°C for 24 h prior to RNA extraction.
DNA extraction and sequencing
One blood sample was shipped to the University of Oregon Genomics and Cell Characterization Core Facility for high molecular weight (HMW) DNA extraction using a Circulomics Nanobind CBB kit (Pacific Biosciences, Menlo Park, CA, USA). A genomic library was then prepared from 10 μg DNA using the PacBio HiFi Prep Kit following the manufacturer’s protocols. HiFi sequencing was conducted on 2 SMRT cells in a Pacific Biosciences Sequel II sequencer. The other blood sample was shipped to the North Carolina State University Genomic Sciences Laboratory (NCSU GSL) for Hi-C library preparation and sequencing. This library was prepared using the Proximo™ Hi-C kit for animals (Phase Genomics, Seattle, WA, USA) following the manufacturer’s protocols for nucleated red blood cells and sequenced on a single lane of an Illumina NovaSeq 6000 SP flowcell (2 × 150 bp reads).
Genome assembly
HiFi data were filtered to remove reads containing PacBio adapter sequence with HiFiAdapterFilt v2.0.0 (Sim et al. 2022). Illumina adapter sequence was trimmed from Hi-C reads using TrimGalore v0.6.7 (Martin 2011). Genome size was predicted from the k-mer profile of the HiFi data by counting 32-mers in Meryl v1.3 (Rhie et al. 2020) and analyzing the resulting histogram in GenomeScope v2.0 (Ranallo-Benavidez et al. 2020). Contigs were assembled in Hifiasm v0.16.1-r375 using default settings and Hi-C integration (Cheng et al. 2021, 2022). Scaffolding with Hi-C data was accomplished using the Juicer v1.6 and 3D-DNA v201008 pipelines (Durand, Shamim, et al. 2016; Dudchenko et al. 2017). A Hi-C contact map was produced using the trimmed Hi-C reads, DpnII restriction cut site locations, and the assembled contigs as Juicer inputs. The contact map was provided to the 3D-DNA pipeline to identify draft scaffolds that were subsequently visualized in Juicebox v1.11.08 (Durand, Robinson, et al. 2016). Two misjoins were manually corrected in Juicebox, and the updated contact map was input into 3D-DNA to construct the scaffolded assembly.
The assembly was screened for duplicated scaffolds using the “clean” function of funannotate v1.8.15 (Palmer and Stajich 2023). Scaffolds were then classified using Kraken2 v2.1.3 (Wood et al. 2019) and the Kraken2 “standard plus protozoa, fungi, & plant” (PlusPFP) RefSeq index dated 2023 October 9. Scaffolds mapping to algal, bacterial, plant, and yeast sequences were removed using custom scripts (removeContamination.sh and matchHeader.pl). HiFi reads were mapped back to the final assembly with Minimap2 v2.24-r1122 (Li 2018, 2021) for the purpose of calculating and visualizing genome assembly statistics in Quast v5.2.0 (Gurevich et al. 2013) and BlobToolKit v3.5.4 (Challis et al. 2020). Genome completeness was assessed using BUSCO v5.5.0 (Manni et al. 2021) (actinopterygii_odb10 database: 26 genomes, 3,640 genes).
Repeat modeling and masking
RepeatModeler v2.0.3 was used to develop a custom repeat library for Colorado pikeminnow, with the -LTRStruct option enabled (Flynn et al. 2020). The custom library was screened using BLAST (Altschul et al. 1990) to remove coding sequences that aligned to the UniProt-SwissProt database (Poux et al. 2017), downloaded 2023 November 30. The custom library was combined with the Repbase v27.08 vertebrate database (Bao et al. 2015) and input into RepeatMasker v4.1.2 to generate a soft-masked genome and GFF file (Smit et al. 2015).
RNA extraction and sequencing
RNA was extracted using the Macherey-Nagel NucleoSpin RNA Mini Kit (Macherey-Nagel, Düren, Germany). The manufacturer’s protocol was modified to include mechanical tissue disruption. Up to 20 mg of tissue was placed in a vial containing 2.8 mm ceramic beads and submerged in the Buffer RA1/β-mercaptoethanol mixture from the NucleoSpin RNA Mini Kit. Tissues were subjected to 2 rounds of 2-min disruption at 20–30 Hz on a Qiagen TissueLyser II (Qiagen, Germantown, MD, USA). Sample blocks were rotated 180° between rounds to promote homogenization. Samples then entered the NucleoSpin RNA Mini Kit manufacturer’s protocol for RNA extraction at the lysate filtering step. RNA extracts were quantified with a Qubit RNA Broad Range kit on the Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) and diluted to a standard concentration of 20 ng/µL using nuclease-free water. RNA libraries were prepared from 500 ng input RNA following the poly-A capture protocol of the Illumina Stranded mRNA Prep kit (Illumina, Inc., San Diego, CA, USA) and dual indexed with 10 bp barcodes. Sequencing was performed at the NCSU GSL on a single lane of an Illumina NovaSeq 6000 SP flowcell (2 × 150 bp reads).
Transcriptome assembly
Raw RNA sequence reads obtained from all tissues were first subjected to error correction using the k-mer-based program Rcorrector v1.0.5 (Song and Florea 2015). Read pairs containing at least 1 read deemed unfixable by Rcorrector were discarded. TrimGalore v0.6.7 (Martin 2011) was then applied with default settings to remove low-quality bases and Illumina adapters. Processed reads were then aligned to the SILVA v138.1 LSUParc and SSUParc databases (Quast et al. 2013) using bowtie2 v2.4.5 (Langmead and Salzberg 2012) to remove ribosomal RNA reads. The --very-sensitive-local and --nofw alignment options were applied in bowtie2. Finally, poly-A/T tails of at least 5 bp were trimmed from the 3′ and 5′ ends of all reads in Prinseq v0.20.4 (Schmieder and Edwards 2011). Read pairs were discarded if 1 read was <35 bp after trimming. Processed RNA reads were reference aligned to the genome assembly using the --very-sensitive, --dta, and --rna-strandness RF options in HISAT2 v2.2.1 (Kim et al. 2019) and sorted using samtools v1.15.1 (Danecek et al. 2021).
Annotation
Extrinsic protein evidence, RNA sequence alignments, and the soft-masked genome were used to predict genes in the Colorado pikeminnow genome. Protein evidence was utilized from fishes of superorder Ostariophysi that had annotated chromosome-scale genome assemblies available in the NCBI RefSeq database. These included Mexican tetra (Astyanax mexicanus; GCF_023375975.1), goldfish (Carassius auratus; GCF_003368295.1), common carp (Cyprinus carpio; GCF_018340385.1), zebrafish (Danio rerio; GCF_000002035.6), channel catfish (Ictalurus punctatus; GCF_001660625.3), Chinese sucker (Myxocyprinus asiaticus; GCF_019703515.2), fathead minnow (Pimephales promelas; GCF_016745375.1), and razorback sucker (Xyrauchen texanus; GCF_025860055.1). Proteins curated for vertebrate species in the UniProt-SwissProt database were also included.
Gene prediction was conducted in the BRAKER3 v3.0.6 pipeline (Gabriel et al. 2023), which combines the BRAKER1 and BRAKER2 pipelines to utilize both RNA sequence data and extrinsic protein evidence for genome annotation (Hoff et al. 2016, 2019; Brůna et al. 2021). Briefly, BRAKER3 first assembled transcripts from the RNA sequence alignments using StringTie (Kovaka et al. 2019). The assembled transcripts of putative protein-coding genes were compared against the protein database to identify probable genes and generate protein hints with GeneMark-ETP (Buchfink et al. 2015; Brůna et al. 2021, 2023). The protein hints were used to train AUGUSTUS (Stanke et al. 2006, 2008). Finally, genes predicted by GeneMark-ETP and AUGUSTUS were combined using TSEBRA (Gabriel et al. 2023) and output in GFF format (Pertea and Pertea 2020).
Putative gene functions were identified first using InterProScan v5.65-97 (Jones et al. 2014). All default analyses were run, with the Phobius v1.01 (Käll et al. 2004), SignalP v4.1 (Petersen et al. 2011), and TMHMM v2.0 (Sonnhammer et al. 1998; Krogh et al. 2001) analyses also enabled. The eggNOG-mapper v2.1.12 online database was queried with default settings to conduct functional annotation of protein sequences using orthology assignments (Cantalapiedra et al. 2021). The “annotate” function in funannotate v1.8.15 was then used to combine the vertebrate BUSCO data set with the outputs of InterProScan (XML format), eggNOG, and BRAKER3. The final annotation was evaluated in BUSCO v5.5.0 (Manni et al. 2021), and summary statistics were calculated using the agat_sp_statistics.pl script from AGAT v1.2.0 (Dainat 2023).
Comparative genomic analysis
The Colorado pikeminnow genome was aligned to 2 other southwestern North American leuciscids, spikedace (Meda fulgida; GCA_030578275.1) and loach minnow (Tiaroga cobitis; GCA_030578255.1), to identify syntenic regions and genomic structural rearrangements (Alexandre et al. 2023). Alignments were conducted using the default (-ax asm5) options in Minimap2 v2.24. Colorado pikeminnow scaffolds were reverse complemented as necessary using a custom script (revcomList.pl) to facilitate alignment to other genomes. Synteny was evaluated using SyRI v1.6.3 (Goel et al. 2019) and visualized in plotsr v1.1.1 (Goel and Schneeberger 2022).
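The scaffold reorientation step can be illustrated with a minimal Python sketch. This is not the authors' revcomList.pl script: the function names and toy scaffold names below are hypothetical, and the sketch assumes the sequences are already held in memory as strings.

```python
from typing import Dict, Set

# Complement table covering upper- and lower-case bases so that
# soft-masked (lower-case) repeats remain soft-masked after flipping.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_scaffolds(scaffolds: Dict[str, str], to_flip: Set[str]) -> Dict[str, str]:
    """Reverse complement only the scaffolds whose names appear in to_flip."""
    return {
        name: reverse_complement(seq) if name in to_flip else seq
        for name, seq in scaffolds.items()
    }


# Hypothetical toy input: two short "scaffolds", one needing reorientation.
toy = {"scaffold_1": "ACGTTn", "scaffold_2": "GGCA"}
oriented = orient_scaffolds(toy, {"scaffold_1"})
print(oriented)
```

Reorienting scaffolds before alignment matters because SyRI expects homologous chromosomes to be represented on the same strand in both assemblies.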
OrthoFinder v2.5.5 (Emms and Kelly 2019) was executed using default settings to identify orthogroups from the proteomes of 6 species from suborder Cyprinoidei: Colorado pikeminnow and 5 species with annotated genomes in the NCBI RefSeq database (common carp, goldfish, zebrafish, fathead minnow, and speckled dace [Rhinichthys osculus; GCF_029890125.1]). Large orthogroups and gene family expansions were identified following the methods of Alexandre et al. (2023). First, a custom script (findLargestOrthogroups.pl) was employed to identify the 10 largest orthogroups in Colorado pikeminnow with at least 1 ortholog present in the outgroup genome (zebrafish). Second, gene family expansions were discovered by using another custom script (findExpandedOrthogroups.pl) to identify orthogroups with gene counts in Colorado pikeminnow that were at least 5× greater than in zebrafish. The outputs of these 2 analyses are respectively called the “largest orthogroups” and “expanded orthogroups” data sets. Orthologs were extracted from the zebrafish proteome and evaluated in STRING-DB v12.0 (Szklarczyk et al. 2021) using the “search proteins by sequences” function to identify Gene Ontology (GO) terms and evaluate functional enrichment in Colorado pikeminnow. Statistical significance of enrichment was assessed using P-values corrected by a 5% false discovery rate.
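The two orthogroup filters described above can be sketched in Python as follows. This is an illustration rather than the authors' findLargestOrthogroups.pl or findExpandedOrthogroups.pl scripts. The function names, species keys, and toy counts are hypothetical, and requiring at least 1 zebrafish gene in the "expanded" filter is an assumption made here so that zebrafish orthologs exist for downstream enrichment analysis.

```python
from typing import Dict, List

# orthogroup ID -> species -> number of genes assigned to that orthogroup
GeneCounts = Dict[str, Dict[str, int]]


def largest_orthogroups(counts: GeneCounts, focal: str, outgroup: str, n: int = 10) -> List[str]:
    """Return the n orthogroups with the most focal-species genes,
    keeping only orthogroups with at least one outgroup gene."""
    eligible = [og for og, c in counts.items() if c.get(outgroup, 0) >= 1]
    # Sort by descending focal count; break ties by orthogroup ID.
    eligible.sort(key=lambda og: (-counts[og].get(focal, 0), og))
    return eligible[:n]


def expanded_orthogroups(counts: GeneCounts, focal: str, outgroup: str, fold: float = 5.0) -> List[str]:
    """Return orthogroups where the focal count is at least `fold` times
    the outgroup count (assumption: outgroup count must be >= 1)."""
    return sorted(
        og for og, c in counts.items()
        if c.get(outgroup, 0) >= 1 and c.get(focal, 0) >= fold * c[outgroup]
    )


# Hypothetical toy counts, not values from this study.
toy_counts = {
    "OG0000001": {"pikeminnow": 40, "zebrafish": 4},
    "OG0000002": {"pikeminnow": 12, "zebrafish": 6},
    "OG0000003": {"pikeminnow": 25, "zebrafish": 0},
    "OG0000004": {"pikeminnow": 5, "zebrafish": 1},
}
top2 = largest_orthogroups(toy_counts, "pikeminnow", "zebrafish", n=2)
expanded = expanded_orthogroups(toy_counts, "pikeminnow", "zebrafish")
print(top2, expanded)
```

In practice the counts could be read from a per-species gene count table such as the one OrthoFinder writes alongside its orthogroup assignments.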
Results and discussion
The HMW DNA extraction yielded 36.2 μg DNA with 71.9% of fragments being >30,000 bp in length. PacBio HiFi sequencing yielded 2,893,341 reads totaling 36.3 Gb of sequence data. HiFi sequence data had a read N50 of 13,744 bp and a maximum read length of 76,826 bp. The Hi-C library yielded 218,603,599 reads totaling 66 Gb sequence data, of which 218,308,942 reads (65.4 Gb) were retained after quality filtering. RNA sequence reads for 11 tissue types totaled 340,027,426 reads (102.7 Gb), of which 268,869,255 reads (74.1 Gb) were retained postquality filtering. RNA extraction yield and number of sequence reads per tissue type are provided in Supplementary Table 1.
K-mer analysis predicted a genome size between 1,040,088,571 and 1,041,803,284 bp (Supplementary Fig. 1). This was smaller than the 1,232,280,000 bp length predicted from flow cytometry analysis of red blood cells (Gold and Li 1994). The initial assembly of 337 contigs fell between these expected values at 1,103,106,296 bp in length (Supplementary Table 2 and Supplementary Fig. 2). The final genome length, after Hi-C scaffolding and removal of contaminants, was 1,100,610,714 bp (Fig. 2). The Hi-C contact map is provided in Supplementary Fig. 3. The final product was a highly contiguous genome with 97.5% of its total length being contained in 25 scaffolds, coincident with the haploid number of chromosomes expected for Colorado pikeminnow (Jenkin et al. 1992). Mean sequencing depth of HiFi reads for the final assembly was measured at 35× (Supplementary Fig. 4).
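The intuition behind k-mer-based genome size prediction can be shown with a simplified Python sketch: the total number of k-mers divided by the depth at the main histogram peak approximates haploid genome length. This is not the GenomeScope model, which additionally fits heterozygosity, repeats, and sequencing error. The function name, the error cutoff, and the toy histogram are hypothetical.

```python
from typing import Dict


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 5) -> float:
    """Estimate genome size from a k-mer histogram (depth -> number of
    distinct k-mers seen at that depth).

    K-mers below min_depth are treated as sequencing errors and ignored
    (a simplifying assumption)."""
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers is taken as the coverage peak.
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Hypothetical toy histogram with an error tail at depths 1-3
# and a coverage peak at depth 10.
toy_hist = {1: 1000, 2: 200, 3: 50, 8: 100, 9: 300, 10: 500, 11: 300, 12: 100}
size = estimate_genome_size(toy_hist)
print(size)
```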
Fig. 2.
A snail plot was prepared in BlobToolKit v3.5.4 to highlight postscaffolding genome assembly statistics, genome-wide GC content, and BUSCO scores.
Genome assembly statistics were compared to the loach minnow, spikedace, and zebrafish reference genomes in Table 1. The Colorado pikeminnow genome compared favorably with these species in measures of N50, N90, L50, and L90. Colorado pikeminnow had the third-greatest genome length and the third-largest N50 (41.8 Mb) of these 4 species, tied with spikedace for the smallest L50 (11 scaffolds), and had the second-smallest L90 (23 scaffolds). Its N90 value (31.5 Mb) was the largest of the 4 species evaluated. BUSCO values also compared favorably with those for spikedace, loach minnow, and zebrafish (Table 2). Colorado pikeminnow had the greatest numbers of complete BUSCOs (N = 3564; 97.9%) and single-copy BUSCOs (N = 3502; 96.2%). It had the lowest number of fragmented BUSCOs (N = 22; 0.6%) and second-fewest duplications (N = 62; 1.7%) and was tied with loach minnow for the fewest missing BUSCOs (N = 54; 1.5%).
Table 1.
Genome length summary statistics calculated for Colorado pikeminnow in Quast v5.2.0.
| Statistics | Colorado pikeminnow | Spikedace | Loach minnow | Zebrafish |
|---|---|---|---|---|
| # contigs (≥0 bp) | 305 | 84 | 551 | 1,923 |
| # contigs (≥1,000 bp) | 305 | 84 | 551 | 1,922 |
| # contigs (≥5,000 bp) | 276 | 73 | 551 | 1,824 |
| # contigs (≥10,000 bp) | 259 | 71 | 551 | 1,727 |
| # contigs (≥25,000 bp) | 200 | 68 | 499 | 1,281 |
| # contigs (≥50,000 bp) | 139 | 64 | 283 | 1,091 |
| Largest contig | 63,658,627 | 52,739,976 | 67,399,795 | 78,093,715 |
| Total length (≥0 bp) | 1,100,610,714 | 882,128,298 | 1,320,538,350 | 1,679,203,469 |
| Total length (≥1,000 bp) | 1,100,610,714 | 882,128,298 | 1,320,538,350 | 1,679,202,819 |
| Total length (≥5,000 bp) | 1,100,538,438 | 882,108,831 | 1,320,538,350 | 1,678,981,660 |
| Total length (≥10,000 bp) | 1,100,417,455 | 882,098,831 | 1,320,538,350 | 1,678,178,846 |
| Total length (≥25,000 bp) | 1,099,431,699 | 882,046,241 | 1,319,423,137 | 1,671,157,076 |
| Total length (≥50,000 bp) | 1,097,192,067 | 881,919,014 | 1,311,861,986 | 1,664,545,481 |
| N50 | 41,794,964 | 34,805,954 | 48,679,909 | 52,186,027 |
| N90 | 31,481,000 | 26,365,680 | 30,906,194 | 339,135 |
| auN | 43,728,421 | 35,515,798 | 45,453,224 | 44,676,698 |
| L50 | 11 | 11 | 13 | 14 |
| L90 | 23 | 22 | 25 | 405 |
| GC (%) | 39.50% | 39.33% | 40.28% | 36.60% |
| # N's per 100 kbp | 16.40 | 7.12 | 3.02 | 279.52 |
| # N's | 180,500 | 62,784 | 39,837 | 4,693,316 |
Statistics are also provided for genome assemblies of 3 related species for comparison: spikedace (M. fulgida; NCBI accession = GCA_030578275.1), loach minnow (T. cobitis; NCBI accession = GCA_030578255.1), and zebrafish (D. rerio; NCBI accession = GCF_000002035.6). All statistics were calculated considering only contigs ≥ 3,000 bp in length unless otherwise noted.
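For readers unfamiliar with the contiguity metrics in Table 1, the following Python sketch shows how Nx, Lx, and auN are conventionally defined. It is an illustration with hypothetical function names and toy lengths, not a reimplementation of Quast, which additionally applies minimum contig length thresholds.

```python
from typing import List, Tuple


def nx_lx(lengths: List[int], x: float = 50) -> Tuple[int, int]:
    """Return (Nx, Lx): the length of the shortest sequence among the
    largest sequences that together cover x% of the assembly, and the
    number of sequences needed to reach that coverage."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be a non-empty list")


def au_n(lengths: List[int]) -> float:
    """Area under the Nx curve: length-weighted mean sequence length."""
    return sum(length * length for length in lengths) / sum(lengths)


# Hypothetical toy assembly of 7 sequences totaling 300 bp.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(toy_lengths, 50)
n90, l90 = nx_lx(toy_lengths, 90)
print(n50, l50, n90, l90, au_n(toy_lengths))
```

Larger N50 and N90 values and smaller L50 and L90 values both indicate a more contiguous assembly, which is why the two groups of metrics are ranked in opposite directions.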
Table 2.
Genome completeness for Colorado pikeminnow was assessed in BUSCO v5.5.0 using the Actinopterygii database version odb10.2019-11-20 containing 3,640 single-copy orthologs.
| BUSCO | Colorado pikeminnow: total | Colorado pikeminnow: percent (%) | Spikedace: total | Spikedace: percent (%) | Loach minnow: total | Loach minnow: percent (%) | Zebrafish: total | Zebrafish: percent (%) |
|---|---|---|---|---|---|---|---|---|
| Complete | 3,564 | 97.9 | 3,533 | 97.0 | 3,557 | 97.7 | 3,496 | 96.0 |
| Single copy | 3,502 | 96.2 | 3,484 | 95.7 | 3,439 | 94.5 | 2,861 | 78.6 |
| Duplicated | 62 | 1.7 | 49 | 1.3 | 118 | 3.2 | 635 | 17.4 |
| Fragmented | 22 | 0.6 | 32 | 0.9 | 29 | 0.8 | 60 | 1.6 |
| Missing | 54 | 1.5 | 75 | 2.1 | 54 | 1.5 | 84 | 2.4 |
BUSCO scores are provided for genome assemblies of 3 related species for comparison: spikedace (M. fulgida; NCBI accession = GCA_030578275.1), loach minnow (T. cobitis; NCBI accession = GCA_030578255.1), and zebrafish (D. rerio; NCBI accession = GCF_000002035.6).
Repeat element identification
Repetitive elements comprised 609,816,837 bp (55.41%) of the Colorado pikeminnow genome (Supplementary Table 3). Retroelements made up 91.7 Mb (8.33%) of the genome, whereas DNA transposons were 185.5 Mb (16.86%), rolling circles were 27.6 Mb (2.51%), and 287.8 Mb (26.15%) were unclassified elements. The 2 largest categories of retroelements were LINEs (16.1 Mb; 1.47%) and LTR elements (72.5 Mb; 6.59%). Within LTR elements, most were categorized as Gypsy or DIRS1 (58.4 Mb; 5.3%). The top categories of DNA transposons were hobo-Activator elements (47.0 Mb; 4.27%) and Tourist/Harbinger elements (17.1 Mb; 1.55%). Satellites (12.1 Mb; 1.1%) and simple repeats (3.1 Mb; 0.28%) accounted for relatively small proportions of the genome.
Annotation
BRAKER3 identified 32,933 mRNAs representing 25,192 genes that covered 435.16 Mb (39.54%) of the genome (Supplementary Table 4). A total of 379,233 exons (mean = 11.5 per mRNA) and 346,300 introns (mean = 10.5 per mRNA) were found. On average, genes were 17,273 bp in length. Mean exon length was 292 bp and mean intron length was 1,921 bp. After filtering the proteome to retain the longest isoform of each gene, BUSCO analysis of the transcriptome indicated high retention of single-copy orthologs and low levels of duplication [complete = 3,539 (97.3%), complete and single-copy = 3,489 (95.9%), complete and duplicated = 50 (1.4%), fragmented = 12 (0.3%), missing = 89 (2.4%); database = actinopterygii_odb10]. BUSCO results for individual tissue types are provided in Supplementary Fig. 5.
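The annotation summary values above are internally consistent, which can be checked with a few lines of Python. The input numbers are taken from this section and from the final assembly length. The variable names are hypothetical, and the check assumes that a transcript with n exons contributes n − 1 introns.

```python
# Values reported in the text.
mrnas = 32_933
exons = 379_233
gene_span_bp = 435.16e6          # total length covered by genes
assembly_bp = 1_100_610_714      # final assembly length

# Each mRNA with n exons has n - 1 introns, so summed over all mRNAs:
introns = exons - mrnas

exons_per_mrna = exons / mrnas
introns_per_mrna = introns / mrnas
genic_percent = 100 * gene_span_bp / assembly_bp

print(introns, round(exons_per_mrna, 1), round(introns_per_mrna, 1), round(genic_percent, 2))
```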
Comparative genomic analysis
Syntenic comparisons to loach minnow and spikedace genomes revealed that approximately 50% of the Colorado pikeminnow assembly was comprised of syntenic regions corresponding to each of these reference genomes (loach minnow = 518.46 Mb, 47.11% assembly, Fig. 3, Supplementary Table 5; spikedace = 562.45 Mb, 51.10% assembly, Fig. 4, Supplementary Table 6). Genomic structural rearrangements were less frequently detected. Inversions were the most detected of all rearrangement categories (loach minnow = 99.03 Mb, 9.0%; spikedace = 116.97 Mb, 10.63%), followed by translocations (loach minnow = 26.84 Mb, 2.44%; spikedace = 15.58 Mb, 1.42%) and duplications (loach minnow = 16.21 Mb, 1.47%; spikedace = 18.82 Mb, 1.71%). Approximately one-third of the Colorado pikeminnow genome failed to align with these 2 genomes (loach minnow = 412.54 Mb, 37.48%; spikedace = 359.15 Mb, 32.63%).
Fig. 3.
Synteny of the Colorado pikeminnow genome with loach minnow (T. cobitis; NCBI accession = GCA_030578255.1) was assessed by first aligning the genomes with the default options in Minimap2 v2.24. Synteny was evaluated in SyRI v1.6.3, and plots were produced in plotsr v1.1.1. Chromosomes are labeled on the y-axis according to NCBI GenBank accession numbers for loach minnow chromosomes.
Fig. 4.
Synteny of the Colorado pikeminnow genome with spikedace (M. fulgida; NCBI accession = GCA_030578275.1) was assessed by first aligning the genomes with the default options in Minimap2 v2.24. Synteny was evaluated in SyRI v1.6.3, and plots were produced in plotsr v1.1.1. Chromosomes are labeled on the y-axis according to NCBI GenBank accession numbers for spikedace chromosomes.
The similar proportions of syntenic regions shared by Colorado pikeminnow with loach minnow and spikedace are concordant with observed phylogenetic relationships indicating that Colorado pikeminnow is approximately equally phylogenetically distant from these 2 species (Schönhuth et al. 2018). Colorado pikeminnow likely diverged from spikedace and loach minnow approximately 36–38 million years ago (adjusted divergence times obtained from the TimeTree v5 database; Supplementary Table 7) (Kumar et al. 2017, 2022). Alexandre et al. (2023) compared spikedace and loach minnow to sunbleak (Leucaspius delineatus) and found that 66.2% (spikedace) and 54.4% (loach minnow) of each species’ genome were syntenic with sunbleak. Each of these 4 species belongs to a different subfamily within Leuciscidae (Colorado pikeminnow = Laviniinae, loach minnow = Pogonichthyinae, spikedace = Plagopterinae, and sunbleak = Leuciscinae). Three of these subfamilies (Leuciscinae, Plagopterinae, and Pogonichthyinae) share a more recent common ancestor with one another than with Laviniinae (Schönhuth et al. 2018). Although the 47.1–51.1% synteny that Colorado pikeminnow shares with loach minnow and spikedace seems low, it is congruent with values observed among other Leuciscidae and reflects the divergence time among these species (Alexandre et al. 2023). Other studies have found synteny to be variable among teleost fish families, as evidenced by a mean 75.3% synteny among 3 African cichlid species (Cichlidae), and 84.4% synteny among 3 livebearer species (Poeciliidae) (Pandey et al. 2020).
The OrthoFinder analysis assigned 202,658 genes representing 95.2% of the total input genes to 25,465 orthogroups. Half of all genes were assigned to 9,004 orthogroups each containing ≥8 genes. All 6 species were represented in 16,578 orthogroups, and 2,083 of these (12.56%) were single-copy orthogroups. OrthoFinder recovered a species tree (Fig. 5) that was concordant with expected phylogenetic relationships (Schönhuth et al. 2018). Statistical support, measured as the proportion of gene trees matching the species tree topology, was high for all nodes (support ≥ 0.89) except the split between speckled dace and fathead minnow (support = 0.59).
Fig. 5.
A species tree was calculated in OrthoFinder v2.5.5 from 16,578 orthogroups that contained all 6 species. Nodal support values represent the proportion of gene trees that support each species tree bipartition. The species tree was rooted by D. rerio, which was algorithmically detected by OrthoFinder as the outgroup taxon.
Significantly enriched GO terms were detected when evaluating the “largest orthogroups” data set. These included a GO process related to gastrulation (GO:0001702; P = 0.0023) and 2 GO functions relating to RNA polymerase II transcription regulatory region sequence (GO:0000977; P = 0.0025) and DNA-binding transcription factor activity (GO:0000981; P = 0.0025). Uncharacterized sequences related to insulin/insulin-like growth factor/relaxin family (CL:44178, P = 0.00058) and immunoglobulin (SM00409, P < 0.00001) were detected among the “expanded orthogroups,” but these gene families were not associated with GO terms.
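False discovery rate correction of enrichment P-values can be illustrated with a short Python sketch. It assumes the Benjamini-Hochberg procedure, which is an assumption about the correction applied rather than a detail stated in the text. The function name and toy P-values are hypothetical and are not the values reported above.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy P-values for four tested terms.
toy_p = [0.01, 0.04, 0.03, 0.20]
toy_adj = benjamini_hochberg(toy_p)
significant = [p_adj <= 0.05 for p_adj in toy_adj]
print(toy_adj, significant)
```

In this toy example only the first term remains significant at a 5% false discovery rate, even though three raw P-values fall below 0.05.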
Analysis of the “largest” and “expanded orthogroups” data sets identified ortholog groups with importance for gene expression, stress response, growth, and development (Supplementary Tables 8 and 9). These functions are important for all organisms, and additional studies should be conducted to examine whether the detected gene families represent lineage-specific expansions once additional genomic resources become available for other desert fish taxa (Lespinet et al. 2002; Fortna et al. 2004). However, there are multiple ecological factors that could explain the importance of these families in Colorado pikeminnow, which is regularly subject to environmental stressors such as elevated water temperatures and drought (Woodhouse et al. 2010; Seager et al. 2013). Significant enrichment of immunoglobulin is relevant in this context, since adaptive immune response is hypothesized to provide a greater relative contribution to immunocompetence in fishes at elevated environmental temperatures (Scharsack and Franke 2022). Regulation of transcription (e.g. GO:0000977 and GO:0000981) is important for all organisms, but could be especially vital for fishes living in fluctuating environments since these genes allow for timely response to environmental stressors via gene expression (de Nadal et al. 2011). Finally, genes regulating growth and development (e.g. GO:0001702 and CL:44178) (Chandhini et al. 2021) could be important because Colorado pikeminnow undergoes an ontogenetic diet shift from consuming aquatic invertebrates to exclusive piscivory (Vanicek and Kramer 1969). Growth both facilitates ontogenetic diet shifts for piscivorous fishes and allows consumption of larger prey (Juanes et al. 1994; Persson and Brönmark 2002). The importance of growth to the Colorado pikeminnow life history strategy has also been supported by observations of immature pikeminnow migrating to waters that allow for an extended growth period during winter and spring months (Durst and Franssen 2014).
The characterization of reference genomes for nonmodel organisms benefits multiple research fields, including both evolutionary and conservation genomics. The Colorado pikeminnow reference genome adds to a growing number of reference genomes for Leuciscidae (Martinson et al. 2022; Meuser et al. 2023), multiple of which are species of conservation concern from the Colorado River (Alexandre et al. 2023; Suchocki et al. 2023). Over time, the accrual of these resources will facilitate better understanding of how species evolve and adapt to their environments (Zhang et al. 2017; Bo et al. 2022). They will help provide novel insights into conservation issues that impact endangered species (Supple and Shapiro 2018; Formenti et al. 2022; Theissinger et al. 2023) and offer potential to examine endangered populations with unprecedented resolution as we determine how to make actionable population management decisions from genome-scale data (Kardos et al. 2021; Schiebelhut et al. 2024).
Acknowledgments
Special thanks are due to fish culture staff at SNARRC (W. Knight, J. Trujillo, R. Wirick, and M. Garnett) for providing access to the Colorado pikeminnow broodstock and fish culture facilities. N. Franssen, K. Han, P. Koenig, M. Saltzgiver, M. Ulibarri, and W. Wilson also provided support. Animal care and handling of endangered fishes were carried out under U.S. Fish and Wildlife permit TE676811-0. Computing resources were provided by the Arkansas High Performance Computing Center. The use of trade, product, industry, or firm names is for informative purposes only and does not constitute an endorsement by the U.S. government or the U.S. Fish and Wildlife Service. Links to nonservice web sites do not imply any official U.S. Fish and Wildlife Service endorsement of the opinions or ideas expressed therein or guarantee the validity of the information provided. The findings, conclusions, and opinions expressed in this article represent those of the author and do not necessarily represent the views of the U.S. Fish and Wildlife Service.
Data availability
Raw RNA and DNA sequence data are available in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA1003539. The assembled genome is available under GenBank accession JAVCAX000000000. The genome annotation is provided on the Open Science Framework at https://osf.io/w9zuy/ (DOI: 10.17605/OSF.IO/W9ZUY). All scripts utilized in genome assembly are provided at https://github.com/stevemussmann/cpm_genome_assembly (DOI: 10.5281/zenodo.10626887).
Supplemental material available at G3 online.
Funding
This work was funded through the National Fish and Wildlife Foundation (project # 8006.21.072189).
Literature cited
- Alexandre NM, Cameron AC, Tian D, Chatla K, Kolora SR, Whiteman NK, Turner TF, Reinthal PN. 2023. Chromosome-level reference genomes of two imperiled desert fishes: spikedace (Meda fulgida) and loach minnow (Tiaroga cobitis). G3 (Bethesda). 13:jkad157. doi: 10.1093/g3journal/jkad157.
- Allendorf FW, Hohenlohe PA, Luikart G. 2010. Genomics and the future of conservation genetics. Nat Rev Genet. 11:697–709. doi: 10.1038/nrg2844.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol. 215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- Bao W, Kojima KK, Kohany O. 2015. Repbase update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 6:11. doi: 10.1186/s13100-015-0041-9.
- Bo J, Xu H, Lv W, Wang C, He S, Yang L. 2022. Molecular mechanisms of the convergent adaptation of bathypelagic and abyssopelagic fishes. Genome Biol Evol. 14:evac109. doi: 10.1093/gbe/evac109.
- Borley K, White MM. 2006. Mitochondrial DNA variation in the endangered Colorado pikeminnow: a comparison among hatchery stocks and historic specimens. N Am J Fish Manag. 26:916–920. doi: 10.1577/M05-176.1.
- Brůna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. 2021. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform. 3:lqaa108. doi: 10.1093/nargab/lqaa108.
- Brůna T, Lomsadze A, Borodovsky M. GeneMark-ETP: automatic gene finding in eukaryotic genomes in consistency with extrinsic data. [preprint]. doi: 10.1101/2023.01.13.524024.
- Buchfink B, Xie C, Huson DH. 2015. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 12:59–60. doi: 10.1038/nmeth.3176.
- Burress ED, Holcomb JM, Tan M, Armbruster JW. 2017. Ecological diversification associated with the benthic-to-pelagic transition by North American minnows. J Evol Biol. 30:549–560. doi: 10.1111/jeb.13024.
- Campbell EO, Brunet BMT, Dupuis JR, Sperling FAH. 2018. Would an RRS by any other name sound as RAD? Methods Ecol Evol. 9:1920–1927. doi: 10.1111/2041-210X.13038.
- Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. 2021. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol. 38:5825–5829. doi: 10.1093/molbev/msab293.
- Challis R, Richards E, Rajan J, Cochrane G, Blaxter M. 2020. BlobToolKit – interactive quality assessment of genome assemblies. G3 (Bethesda). 10:1361–1374. doi: 10.1534/g3.119.400908.
- Chandhini S, Trumboo B, Jose S, Varghese T, Rajesh M, Kumar VJR. 2021. Insulin-like growth factor signalling and its significance as a biomarker in fish and shellfish research. Fish Physiol Biochem. 47:1011–1031. doi: 10.1007/s10695-021-00961-6.
- Cheng H, Concepcion GT, Feng X, Zhang H, Li H. 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 18:170–175. doi: 10.1038/s41592-020-01056-5.
- Cheng H, Jarvis ED, Fedrigo O, Koepfli KP, Urban L, Gemmell NJ, Li H. 2022. Haplotype-resolved assembly of diploid genomes without parental data. Nat Biotechnol. 40:1332–1335. doi: 10.1038/s41587-022-01261-x.
- Clark SR, Conner MM, Durst SL, Franssen NR. 2018. Age-specific estimates indicate potential deleterious capture effects and low survival of stocked juvenile Colorado pikeminnow. N Am J Fish Manag. 38:1059–1074. doi: 10.1002/nafm.10214.
- Comte L, Olden JD, Lischka S, Dickson BG. 2022. Multi-scale threat assessment of riverine ecosystems in the Colorado River Basin. Ecol Indic. 138:108840. doi: 10.1016/j.ecolind.2022.108840.
- Connon RE, Jeffries KM, Komoroske LM, Todgham AE, Fangue NA. 2018. The utility of transcriptomics in fish conservation. J Exp Biol. 221:jeb148833. doi: 10.1242/jeb.148833.
- de Nadal E, Ammerer G, Posas F. 2011. Controlling gene expression in response to stress. Nat Rev Genet. 12:833–845. doi: 10.1038/nrg3055.
- Dainat J. 2023. AGAT: another GFF analysis toolkit to handle annotations in any GTF/GFF format. (Version v1.2.0). Zenodo. doi: 10.5281/zenodo.3552717.
- Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, et al. 2021. Twelve years of SAMtools and BCFtools. GigaScience. 10:giab008. doi: 10.1093/gigascience/giab008.
- Dibble KL, Yackulic CB, Bestgen KR, Gido K, Jones MT, McKinstry MC, Osmundson DB, Ryden D, Schelly RC. 2023. Assessment of potential recovery viability for Colorado pikeminnow Ptychocheilus lucius in the Colorado River in Grand Canyon. J Fish Wildlife Manage. 14:239–268. doi: 10.3996/JFWM-22-031.
- Diver TA, Harrison AS, Wilson WD. 2019. Genetic evaluation and history of captive broodstock populations of endangered Colorado pikeminnow (Ptychocheilus lucius). Report to the San Juan River Basin Recovery Implementation Program: U.S. Fish and Wildlife Service. 31. https://coloradoriverrecovery.org/sj/wp-content/uploads/sites/3/2022/04/gen_Genetic_evaluation_CPM_broodstock_2019_OCR.pdf
- Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, Shamim MS, Machol I, Lander ES, Aiden AP, et al. 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 356:92–95. doi: 10.1126/science.aal3327.
- Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, Aiden EL. 2016. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3:99–101. doi: 10.1016/j.cels.2015.07.012.
- Durand NC, Shamim MS, Machol I, Rao SS, Huntley MH, Lander ES, Aiden EL. 2016. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3:95–98. doi: 10.1016/j.cels.2016.07.002.
- Durst SL, Franssen NR. 2014. Movement and growth of juvenile Colorado pikeminnows in the San Juan River, Colorado, New Mexico, and Utah. Trans Am Fish Soc. 143:519–527. doi: 10.1080/00028487.2013.869258.
- Emms DM, Kelly S. 2019. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20:238. doi: 10.1186/s13059-019-1832-y.
- Fan G, Song Y, Yang L, Huang X, Zhang S, Zhang M, Yang X, Chang Y, Zhang H, Li Y, et al. 2020. Initial data release and announcement of the 10,000 fish genomes project (Fish10K). GigaScience. 9:giaa080. doi: 10.1093/gigascience/giaa080.
- Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. 2020. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A. 117:9451–9457. doi: 10.1073/pnas.1921046117.
- Formenti G, Theissinger K, Fernandes C, Bista I, Bombarely A, Bleidorn C, Ciofi C, Crottini A, Godoy JA, Höglund J, et al. 2022. The era of reference genomes in conservation genomics. Trends Ecol Evol. 37:197–202. doi: 10.1016/j.tree.2021.11.008.
- Fortna A, Kim Y, MacLaren E, Marshall K, Hahn G, Meltesen L, Brenton M, Hink R, Burgers S, Hernandez-Boussard T, et al. 2004. Lineage-specific gene duplication and loss in human and great ape evolution. PLoS Biol. 2:e207. doi: 10.1371/journal.pbio.0020207.
- Franssen NR, Durst SL, Gido KB, Ryden DW, Lamarra V, Propst DL. 2016. Long-term dynamics of large-bodied fishes assessed from spatially intensive monitoring of a managed desert river. River Res Appl. 32:348–361. doi: 10.1002/rra.2855.
- Gabriel L, Brůna T, Hoff KJ, Ebel M, Lomsadze A, Borodovsky M, Stanke M. BRAKER3: Fully automated genome annotation using RNA-Seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA. [preprint]. doi: 10.1101/2023.06.10.544449.
- Goel M, Schneeberger K. 2022. Plotsr: visualizing structural similarities and rearrangements between multiple genomes. Bioinformatics. 38:2922–2926. doi: 10.1093/bioinformatics/btac196.
- Goel M, Sun H, Jiao W-B, Schneeberger K. 2019. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20:277. doi: 10.1186/s13059-019-1911-0.
- Gold JR, Li Y. 1994. Chromosomal NOR karyotypes and genome size variation among squawfishes of the genus Ptychocheilus (Teleostei: Cyprinidae). Copeia. 1994:60–65. doi: 10.2307/1446671.
- Gotelli NJ, Pyron M. 1991. Life history variation in North American freshwater minnows: effects of latitude and phylogeny. Oikos. 62:30–40. doi: 10.2307/3545443.
- Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 29:1072–1075. doi: 10.1093/bioinformatics/btt086.
- Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. 2016. BRAKER1: unsupervised RNA-seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics. 32:767–769. doi: 10.1093/bioinformatics/btv661.
- Hoff KJ, Lomsadze A, Borodovsky M, Stanke M. 2019. Whole-genome annotation with BRAKER. In: Kollmar M, editors. Gene Prediction: Methods and Protocols. New York (NY): Springer New York. p. 65–95.
- Jenkin JD, Li YC, Gold JR. 1992. Cytogenetic studies in North American minnows (Cyprinidae) XXVI. Chromosomal NOR phenotypes of 21 species from the western United States. Cytologia (Tokyo). 57:443–453. doi: 10.1508/cytologia.57.443.
- Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, et al. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics. 30:1236–1240. doi: 10.1093/bioinformatics/btu031.
- Juanes F, Buckel JA, Conover DO. 1994. Accelerating the onset of piscivory: intersection of predator and prey phenologies. J Fish Biol. 45:41–54. doi: 10.1111/j.1095-8649.1994.tb01083.x.
- Käll L, Krogh A, Sonnhammer ELL. 2004. A combined transmembrane topology and signal peptide prediction method. J Mol Biol. 338:1027–1036. doi: 10.1016/j.jmb.2004.03.016.
- Kardos M, Armstrong EE, Fitzpatrick SW, Hauser S, Hedrick PW, Miller JM, Tallmon DA, Funk WC. 2021. The crucial role of genome-wide genetic variation in conservation. Proc Natl Acad Sci U S A. 118:e2104642118. doi: 10.1073/pnas.2104642118.
- Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. 2019. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 37:907–915. doi: 10.1038/s41587-019-0201-4.
- Komoroske LM, Jeffries KM, Whitehead A, Roach JL, Britton M, Connon RE, Verhille C, Brander SM, Fangue NA. 2021. Transcriptional flexibility during thermal challenge corresponds with expanded thermal tolerance in an invasive compared to native fish. Evol Appl. 14:931–949. doi: 10.1111/eva.13172.
- Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. 2019. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20:278. doi: 10.1186/s13059-019-1910-1.
- Krogh A, Larsson B, von Heijne G, Sonnhammer ELL. 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 305:567–580. doi: 10.1006/jmbi.2000.4315.
- Kumar S, Stecher G, Suleski M, Hedges SB. 2017. TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol. 34:1812–1819. doi: 10.1093/molbev/msx116.
- Kumar S, Suleski M, Craig JM, Kasprowicz AE, Sanderford M, Li M, Stecher G, Hedges SB. 2022. TimeTree 5: an expanded resource for species divergence times. Mol Biol Evol. 39:msac174. doi: 10.1093/molbev/msac174.
- Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods. 9:357–359. doi: 10.1038/nmeth.1923.
- Lespinet O, Wolf YI, Koonin EV, Aravind L. 2002. The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res. 12:1048–1059. doi: 10.1101/gr.174302.
- Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 34:3094–3100. doi: 10.1093/bioinformatics/bty191.
- Li H. 2021. New strategies to improve minimap2 alignment accuracy. Bioinformatics. 37:4572–4574. doi: 10.1093/bioinformatics/btab705.
- Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. 2021. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 38:4647–4654. doi: 10.1093/molbev/msab199.
- Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17:10–12. doi: 10.14806/ej.17.1.200.
- Martin RM, Robinson ML, Wilson WD. 2015. Isolation and characterization of twenty-five novel microsatellite loci in Colorado pikeminnow, Ptychocheilus lucius, with cross-species amplification for eight other cyprinids. Conserv Genet Resour. 7:113–117. doi: 10.1007/s12686-014-0306-5.
- Martin SD, Bonett RM. 2015. Biogeography and divergent patterns of body size disparification in North American minnows. Mol Phylogenet Evol. 93:17–28. doi: 10.1016/j.ympev.2015.07.006.
- Martinson JW, Bencic DC, Toth GP, Kostich MS, Flick RW, See MJ, Lattier D, Biales AD, Huang W. 2022. De novo assembly of the nearly complete fathead minnow reference genome reveals a repetitive but compact genome. Environ Toxicol Chem. 41:448–461. doi: 10.1002/etc.5266.
- Mérot C, Oomen RA, Tigano A, Wellenreuther M. 2020. A roadmap for understanding the evolutionary significance of structural genomic variation. Trends Ecol Evol. 35:561–572. doi: 10.1016/j.tree.2020.03.002.
- Meuser AV, Pitura AR, Mandeville EG. 2023. A high-quality reference genome for the common creek chub, Semotilus atromaculatus. G3 (Bethesda). 14(2):jkad283. doi: 10.1093/g3journal/jkad283.
- Miller RR. 1961. Man and the changing fish fauna of the American southwest. Papers Michigan Acad Sci Arts Lett. 46:365–404. https://www.nativefishlab.net/library/textpdf/15337.pdf
- Minckley W, Deacon JE. 1968. Southwestern fishes and the enigma of “endangered species.” Science. 159:1424–1432. doi: 10.1126/science.159.3822.1424.
- Minckley W, Hendrickson D, Bond C. 1986. Geography of western North American freshwater fishes: description and relationships to intracontinental tectonism. In: Hocutt CH, Wiley EO, editors. The Zoogeography of North American Freshwater Fishes. New York: John Wiley & Sons. p. 519–613.
- Minckley W, Marsh PC. 2009. Inland Fishes of the Greater Southwest: Chronicle of a Vanishing Biota. Tucson (AZ): University of Arizona Press.
- Morizot DC, Williamson JH, Carmichael GJ. 2002. Biochemical genetics of Colorado pikeminnow. N Am J Fish Manag. 22:66–76.
- O’Leary SJ, Puritz JB, Willis SC, Hollenbeck CM, Portnoy DS. 2018. These aren’t the loci you’re looking for: principles of effective SNP filtering for molecular ecologists. Mol Ecol. 27:3193–3206. doi: 10.1111/mec.14792.
- Osmundson DB, White GC. 2017. Long-term mark-recapture monitoring of a Colorado pikeminnow Ptychocheilus lucius population: assessing recovery progress using demographic trends. Endang Species Res. 34:131–147. doi: 10.3354/esr00842.
- Palmer JM, Stajich JE. 2023. nextgenusfs/funannotate: Funannotate (1.8.15). [Computer software]. Zenodo. doi: 10.5281/zenodo.1134477.
- Pandey M, Kushwaha B, Kumar R, Srivastava P, Saroj S, Singh M. 2020. Evol2Circos: a web-based tool for genome synteny and collinearity analysis and its visualization in fishes. J Hered. 111:486–490. doi: 10.1093/jhered/esaa025.
- Pennock CA, Bruckerhoff LA, Gido KB, Barkalow AL, Breen MJ, Budy P, Macfarlane WW, Propst DL. 2022. Failure to achieve recommended environmental flows coincides with declining fish populations: long-term trends in regulated and unregulated rivers. Freshw Biol. 67:1631–1643. doi: 10.1111/fwb.13966.
- Persson A, Brönmark C. 2002. Foraging capacity and resource synchronization in an ontogenetic diet switcher, pikeperch (Stizostedion lucioperca). Ecology. 83:3014–3022. doi: 10.2307/3071838.
- Pertea G, Pertea M. 2020. GFF utilities: GffRead and GffCompare. F1000Res. 9:ISCB Comm J-304. doi: 10.12688/f1000research.23297.2.
- Petersen TN, Brunak S, von Heijne G, Nielsen H. 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. 8:785–786. doi: 10.1038/nmeth.1701.
- Poux S, Arighi CN, Magrane M, Bateman A, Wei CH, Lu Z, Boutet E, Bye-A-Jee H, Famiglietti ML, Roechert B, et al. 2017. On expert curation and scalability: UniProtKB/Swiss-Prot as a case study. Bioinformatics. 33:3454–3460. doi: 10.1093/bioinformatics/btx439.
- Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. 2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41:D590–D596. doi: 10.1093/nar/gks1219.
- Ranallo-Benavidez TR, Jaron KS, Schatz MC. 2020. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 11:1432. doi: 10.1038/s41467-020-14998-3.
- Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, et al. 2021. Towards complete and error-free genome assemblies of all vertebrate species. Nature. 592:737–746. doi: 10.1038/s41586-021-03451-0.
- Rhie A, Walenz BP, Koren S, Phillippy AM. 2020. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21:245. doi: 10.1186/s13059-020-02134-9.
- Ryden DW, Ahlm LA. 1996. Observations on the distribution and movements of Colorado squawfish, Ptychocheilus lucius, in the San Juan River, New Mexico, Colorado, and Utah. Southwest Nat. 41:161–168. https://www.jstor.org/stable/30055101
- Scharsack JP, Franke F. 2022. Temperature effects on teleost immunity in the light of climate change. J Fish Biol. 101:780–796. doi: 10.1111/jfb.15163.
- Schiebelhut LM, Guillaume AS, Kuhn A, Schweizer RM, Armstrong EE, Beaumont MA, Byrne M, Cosart T, Hand BK, Howard L, et al. 2024. Genomics and conservation: guidance from training to analyses and applications. Mol Ecol Resour. 24:e13893. doi: 10.1111/1755-0998.13893.
- Schmieder R, Edwards R. 2011. Quality control and preprocessing of metagenomic datasets. Bioinformatics. 27:863–864. doi: 10.1093/bioinformatics/btr026.
- Schönhuth S, Vukić J, Šanda R, Yang L, Mayden RL. 2018. Phylogenetic relationships and classification of the Holarctic family Leuciscidae (Cypriniformes: Cyprinoidei). Mol Phylogenet Evol. 127:781–799. doi: 10.1016/j.ympev.2018.06.026.
- Seager R, Ting M, Li C, Naik N, Cook B, Nakamura J, Liu H. 2013. Projections of declining surface-water availability for the southwestern United States. Nat Clim Chang. 3:482–486. doi: 10.1038/nclimate1787.
- Sim SB, Corpuz RL, Simmonds TJ, Geib SM. 2022. HiFiAdapterFilt, a memory efficient read processing pipeline, prevents occurrence of adapter sequence in PacBio HiFi reads and their negative impacts on genome assembly. BMC Genomics. 23:157. doi: 10.1186/s12864-022-08375-1.
- Smit AFA, Hubley R, Green P. 2015. RepeatMasker Open-4.0 [Computer software] http://www.repeatmasker.org.
- Song L, Florea L. 2015. Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads. GigaScience. 4:48. doi: 10.1186/s13742-015-0089-y.
- Sonnhammer EL, von Heijne G, Krogh A. 1998. A hidden Markov model for predicting transmembrane helices in protein sequences. Proc Int Conf Intell Syst Mol Biol. 6:175–182. https://pubmed.ncbi.nlm.nih.gov/9783223/
- Stanke M, Diekhans M, Baertsch R, Haussler D. 2008. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 24:637–644. doi: 10.1093/bioinformatics/btn013.
- Stanke M, Schöffmann O, Morgenstern B, Waack S. 2006. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 7:62. doi: 10.1186/1471-2105-7-62.
- Suchocki CR, Ka'apu-Lyons C, Copus JM, Walsh CAJ, Lee AM, Carter JM, Johnson EA, Etter PD, Forsman ZH, Bowen BW, et al. 2023. Geographic destiny trumps taxonomy in the roundtail chub, Gila robusta species complex (Teleostei, Leuciscidae). Sci Rep. 13:15810. doi: 10.1038/s41598-023-41719-9.
- Supple MA, Shapiro B. 2018. Conservation of biodiversity in the genomics era. Genome Biol. 19:131. doi: 10.1186/s13059-018-1520-3.
- Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, Doncheva NT, Legeay M, Fang T, Bork P, et al. 2021. The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49:D605–D612. doi: 10.1093/nar/gkaa1074.
- Theissinger K, Fernandes C, Formenti G, Bista I, Berg PR, Bleidorn C, Bombarely A, Crottini A, Gallo GR, Godoy JA, et al. 2023. How genomics can help biodiversity conservation. Trends Genet. 39:545–559. doi: 10.1016/j.tig.2023.01.005.
- Tyus HM. 1990. Potamodromy and reproduction of Colorado squawfish in the Green River Basin, Colorado and Utah. Trans Am Fish Soc. 119:1035–1047.
- Tyus HM, McAda CW. 1984. Migration, movements and habitat preferences of Colorado Squawfish, Ptychocheilus lucius, in the Green, White and Yampa Rivers, Colorado and Utah. Southwest Nat. 29:289–299. doi: 10.2307/3671360.
- Vanicek CD, Kramer RH. 1969. Life history of the Colorado squawfish, Ptychocheilus lucius, and the Colorado chub, Gila robusta, in the Green River in Dinosaur National Monument, 1964–1966. Trans Am Fish Soc. 98:193–208. doi: 10.1577/1548-8659(1969)98[193:LHOTCS]2.0.CO;2.
- Wellenreuther M, Mérot C, Berdan E, Bernatchez L. 2019. Going beyond SNPs: the role of structural genomic variants in adaptive evolution and species diversification. Mol Ecol. 28:1203–1209. doi: 10.1111/mec.15066.
- Wood DE, Lu J, Langmead B. 2019. Improved metagenomic analysis with Kraken 2. Genome Biol. 20:257. doi: 10.1186/s13059-019-1891-0.
- Woodhouse CA, Meko DM, MacDonald GM, Stahle DW, Cook ER. 2010. A 1,200-year perspective of 21st century drought in southwestern North America. Proc Natl Acad Sci U S A. 107:21283–21288. doi: 10.1073/pnas.0911197107.
- Zhang D, Yu M, Hu P, Peng S, Liu Y, Li W, Wang C, He S, Zhai W, Xu Q, et al. 2017. Genetic adaptation of Schizothoracine fish to the phased uplifting of the Qinghai–Tibetan Plateau. G3 (Bethesda). 7:1267–1276. doi: 10.1534/g3.116.038406.