Scientific Reports. 2023 Jul 25;13:12051. doi: 10.1038/s41598-023-34198-5

Discovery of putative long non-coding RNAs expressed in the eyes of Astyanax mexicanus (Actinopterygii: Characidae)

Iuri Batista da Silva 1,2, David Aciole Barbosa 3, Karine Frehner Kavalco 2, Luiz R Nunes 4, Rubens Pasa 2, Fabiano B Menegidio 3
PMCID: PMC10368750  PMID: 37491348

Abstract

Astyanax mexicanus is a well-known model species with two morphotypes: cavefish, from subterranean rivers, and surface fish, from surface rivers. They are morphologically distinct due to many troglomorphic traits in the cavefish, such as the absence of eyes. Most studies on A. mexicanus focus on eye development and the protein-coding genes involved in the process. However, lncRNAs have not received the same attention, and very little is known about them. This study aimed to fill this knowledge gap by identifying, describing, classifying, and annotating lncRNAs expressed in the embryonic eye tissue of cavefish and surface fish. To do so, we constructed a concise workflow to assemble and evaluate transcriptomes, annotate protein-coding genes and ncRNA families, predict coding potential, identify putative lncRNAs, map them, and predict interactions. This approach resulted in the identification of 33,069 and 19,493 putative lncRNAs mapped in cavefish and surface fish, respectively. Thousands of these lncRNAs were annotated and identified as conserved in human and several species of fish. Hundreds of them were validated in silico through ESTs. We identified lncRNAs associated with genes related to eye development. This is the case of a few lncRNAs associated with sox2, which we suggest are isomorphs of SOX2-OT, a lncRNA that can regulate the expression of sox2. This work is one of the first studies to focus on the description of lncRNAs in A. mexicanus, highlighting several lncRNA targets and setting an important precedent for future studies focusing on lncRNAs expressed in A. mexicanus.

Subject terms: Long non-coding RNAs, Transcriptomics

Introduction

Astyanax mexicanus is a well-known model species in the study of the evolution of multiple traits, rapid phenotypic evolution and development of troglomorphic traits14. A. mexicanus has multiple populations distributed across surface rivers and subterranean rivers in Mexico. The surface populations, referred to as surface fish (SF) from now on, are found in rivers from the northwest of Mexico to the south of Texas, in the USA57. Populations from subterranean rivers, referred to as cavefish (CF), are found in the Mexican caves of Sierra de Guatemala, Sierra de Colmena and Sierra de El Abra, where more than 30 different populations are known to exist59. Aside from distribution, surface fishes and cavefishes are distinguishable by morphology, due to several troglomorphic traits found in cavefish populations. Cavefishes have craniofacial modifications, more and bigger neuromasts, more tastebuds, and reduction or absence of eyes and pigmentation68,1015. Alongside morphological changes, cavefishes have also undergone changes in behavior11,12,1618, circadian rhythm19,20, sleep21,22 and metabolism2325. The degree of those troglomorphic traits varies among cavefish populations, and it is possible to find populations with an intermediate morphotype and others with an extreme morphotype, such as the Pachón cave population6,26.

Among those many traits, the absence of eyes has received particular attention, with eye development being a recurring topic in studies involving A. mexicanus. As such, we now know that the absence of eyes occurs due to a degeneration process during the initial stages of development. Until 20 h post-fertilization (hpf), eye development in cavefish and surface fish is quite similar; however, after 40 hpf, the cavefish lens enters apoptosis, leading to a progressive degeneration process that results in the absence of eyes in the adult phase2734. This process has been addressed under different approaches, including studies in retinal morphology and development31,3437, lens defects and transplants31,33,38,39, quantitative trait loci analysis (QTLs)28,40,41, genomics34,4245 and gene expression and transcriptomics31,34,35,4652. As such, many genes are suggested to have a relevant role in eye development and degeneration, including, but not restricted to, the crystallin genes αA-crys, cryaa, crybb1, crybb1c and crybgx31,52,53, transcription factor sox253, retinal homeobox rx342,45, cone-rod homeobox crx50,52, cbsa34 and dusp2645. However, despite being well studied, eye development in A. mexicanus is not entirely understood, and many questions remain to be answered. For instance, non-coding RNAs have not been properly addressed in A. mexicanus apart from annotations in the available genomes. It is unknown what role they may play in the development of troglomorphic traits, such as the absence of eyes.

Non-coding RNAs represent more than 98% of eukaryotic genomes and correspond to transcripts that do not encode proteins54,55. Although they can be classified under different criteria, ncRNA length is the most common. Under this criterion, if a ncRNA has fewer than 200 nucleotides (nt), it is classified as a small non-coding RNA (sncRNA); if it has 200 nt or more, it is a long non-coding RNA (lncRNA)56,57. Alongside this initial classification, each category has different classes. SncRNAs are organized into at least 5 classes: microRNA (miRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), small interfering RNA (siRNA) and PIWI-interacting small RNA (piRNA)5759. MiRNAs, for example, are essential in almost every developmental process and the disruption of miRNA genes can result in developmental defects, including retinal degeneration60.

On the other hand, lncRNAs are usually classified according to genomic position, as seen in GENCODE, in which a lncRNA can be sense, antisense, intronic or intergenic (lincRNA)6163. For lncRNA transcripts, this classification is extended and considers the position, localization, and direction of transcription relative to the nearest protein-coding gene61,64,65. LncRNAs have many functions but are well known to act as regulators of gene expression, acting during transcription, post-transcription and even at an epigenetic level6671. As such, they are present in a variety of processes, including cellular differentiation72, embryonic development73,74 and adaptation75. Some lncRNAs are known to be involved in ocular diseases, including corneal neovascularization, glaucoma, cataract, and diabetic retinopathy7678. The ANRIL lncRNA (antisense noncoding RNA in the INK4 locus), for example, is suggested to have a role in modulating optic nerve degeneration79. Hence, lncRNAs may play a role in eye degeneration in A. mexicanus and must be investigated. To do so, a broad identification and description of lncRNAs is needed.

Therefore, due to the absence of studies describing lncRNAs in A. mexicanus, we aimed to fill this knowledge gap by investigating lncRNAs expressed in the eye tissue of A. mexicanus embryos from cave and surface fish. We successfully identified thousands of putative lncRNAs expressed in both morphotypes and exclusive to each of them. Additionally, we were able to associate lncRNAs to protein-coding genes previously described as candidates in the eye development and degeneration in A. mexicanus.

Materials and methods

Library acquisition and pre-processing

The eye tissue RNA-Seq libraries of A. mexicanus used in this work were generated by Gore et al.52 and are available at the Sequence Read Archive (SRA), under the bioproject PRJNA429434, and at the Gene Expression Omnibus (GEO), accession number GSE109006. Gore et al.52 extracted the eyes of A. mexicanus embryos at 54 h post-fertilization from each morphotype and isolated the total RNA with the ZR-Duet DNA/RNA MiniPrep Kit (Zymo Research). Next, they prepared the libraries with the TruSeq RNA (Illumina) kit and enriched them for poly(A) RNA. Two biological replicates from each morphotype were sequenced on the Illumina HiSeq 2500 platform (paired-end libraries, 2 × 100 bp and 100 million reads). The cavefish and surface fish libraries were deposited by the authors in SRA, under the accession numbers SRR6456919/SRR6456920 and SRR6456921/SRR6456922, respectively.

We downloaded the libraries from SRA, assessed their quality with FastQC v.0.11.980 and summarized the reports with MultiQC v.1.1181. Adapters, contaminants, and low-quality reads were removed with fastp v.0.20.182, considering a high-quality score (> Q30). The methodology workflow, including the next steps, is summarized in Fig. 1, and detailed information on each tool, such as versions and options used, is available in Supplementary Table ST1-S1.

Figure 1.


Simplified workflow of the conducted analysis, divided into three major steps. (1) Pre-processing of RNA-seq libraries, transcriptome assembly, quality assessment and general metrics; (2) Functional annotation of the transcriptomes, coding potential calculation, identification of ncRNA families and filtering process; (3) Mapping of putative lncRNAs, identification of shared, novel, known and conserved lncRNAs, as well as their classification, interactions and in silico validation through ESTs.

De novo assembly and metrics assessment

Using the trimmed reads, we assembled eye-specific transcriptomes for both cave and surface fish through a de novo approach, using Trinity v.2.9.183, integrated into the Galaxy Europe webserver (https://usegalaxy.eu/). For the next steps, we removed the ‘TRINITY’ prefix in the sequence name to simplify the IDs. Assembly metrics were assessed through the TrinityStats script and FPKM values for each transcript were obtained with the align reads and estimate abundance script, both available with the Trinity package. Transcriptome completeness was evaluated with the aid of the Benchmarking Universal Single-Copy Orthologs (BUSCO) tool v.5.0.084, using the Actinopterygii OrthologDB v.1085 that consists of 3,640 BUSCO groups. The overall assembly quality was assessed with rnaQUAST v.2.2.186, mapping the transcriptomes against the reference genomes of A. mexicanus. The cavefish genome from Pachón cave87 was used as a reference for the cavefish transcriptome, and the surface genome45 for the surface fish transcriptome.
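As a reminder of what the FPKM values used in later filtering steps represent, the textbook definition can be sketched as below. This is only an illustration: the function name and example numbers are hypothetical, and abundance estimators may use an effective transcript length rather than the raw length assumed here.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    Textbook formula; assumes raw transcript length, not effective length.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 50 million mapped fragments.
example_fpkm = fpkm(500, 2_000, 50_000_000)
print(example_fpkm)
```

Under this definition, a transcript falls below the FPKM threshold of 1 when it attracts less than one fragment per kilobase for every million mapped fragments.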

Functional and ncRNA annotation

The assembled transcriptomes were annotated through the Eukaryotic Non-Model Transcriptome Annotation Pipeline v.5.0.0 (EnTAP)88 in two steps: (a) similarity search using blastx with e-value ≤ e-5 and ≥ 50% minimum coverage against the custom database FishProteinDB (this study) and EggNOG database89. The FishProteinDB consists of 171,502 protein sequences of Hyperoartia, Myxini, Chondrichthyes, Actinopterygii and Sarcopterygii species (excluding the Tetrapod clade) from RefSeq. Additionally, we included proteins of A. mexicanus available in the Ensembl database; (b) functional annotation against the EggNOG databases to identify and assign Gene Ontology90, KEGG terms9193 and protein domains from SMART94 and PFAM95.

The transcriptomes were also annotated with the cmscan program, part of the Infernal suite v.1.1.496, using the Rfam v.14.6 database97 to classify transcripts into different non-coding RNA families. This step considered only annotations filtered by the bit score gathering threshold determined in Rfam.

Long non-coding RNA prediction

To identify long non-coding RNAs, we removed transcripts shorter than 200 bp from the transcriptomes. The remaining sequences were submitted to five coding potential calculator tools (CP tools): Coding Potential Calculator 2 (CPC2) py3 v.1.0.198, RNASamba v.0.2.599, Coding-Potential Assessment Tool (CPAT) v.3.0.4100, RNAmining v.1.0.4101 and PLEK v.1.2102. A similar approach was conducted by Mishra and Wang103 with zebrafish, using six different CP tools, and by Aciole Barbosa et al.104 with cobia (Rachycentron canadum), using three CP tools alongside EnTAP and Infernal annotations. Both works successfully identified thousands of lncRNAs; however, they did not account for misclassifications and discordant results among the CP tools, keeping only lncRNAs predicted as non-coding by all tools. Because each CP tool classifies a transcript as coding or non-coding, we avoided tool bias and misclassifications by considering a transcript as coding or non-coding only if at least four out of five tools agreed with this classification. Transcripts with only 3 tools concurring on the classification were considered ambiguous and deemed undetermined.
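The agreement rule described above can be sketched as a simple vote. This is a minimal illustration, assuming each tool's output has already been reduced to the label "coding" or "noncoding"; the function name and labels are hypothetical and not part of any of the tools.

```python
from collections import Counter


def consensus_call(calls, min_agree=4):
    """Return 'coding' or 'noncoding' if at least `min_agree` tools agree,
    otherwise 'undetermined'. `calls` holds one label per CP tool."""
    counts = Counter(calls)
    for label in ("coding", "noncoding"):
        if counts[label] >= min_agree:
            return label
    return "undetermined"


# Hypothetical calls from five tools for three transcripts.
print(consensus_call(["noncoding"] * 5))                    # all five agree
print(consensus_call(["coding"] * 4 + ["noncoding"]))       # four of five agree
print(consensus_call(["coding"] * 3 + ["noncoding"] * 2))   # three of five: ambiguous
```

With five tools and two labels, a 3/2 split is the only outcome that fails the threshold, which is why it maps to the undetermined class.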

Next, we filtered out coding and undetermined elements, as well as any sequences classified as non-coding that were annotated in the EnTAP and Infernal steps. Redundant sequences were removed with CD-HIT-EST v.4.6105,106, using a similarity threshold of 1.0 and a word size of 8. Transcripts with FPKM value < 1 were then removed, and the resulting transcripts were deemed long non-coding RNA candidates. In summary, we defined lncRNA candidates in this study as transcripts with 200 or more nucleotides that were classified as non-coding by at least 4 coding potential tools, were not annotated by EnTAP or Infernal, and had FPKM value ≥ 1.
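The candidate definition summarized above can be expressed as a single predicate. The sketch below is illustrative only: the `Transcript` record and its field names are hypothetical, and the redundancy removal with CD-HIT-EST is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    id: str            # hypothetical transcript identifier
    length: int        # length in nucleotides
    consensus: str     # "coding", "noncoding" or "undetermined"
    annotated: bool    # True if annotated by EnTAP or Infernal
    fpkm: float        # expression estimate


def is_lncrna_candidate(t: Transcript) -> bool:
    """Apply the four criteria: length, consensus call, no annotation, expression."""
    return (
        t.length >= 200
        and t.consensus == "noncoding"
        and not t.annotated
        and t.fpkm >= 1.0
    )


# Hypothetical records, one passing and three failing a single criterion each.
records = [
    Transcript("t1", 850, "noncoding", False, 3.2),
    Transcript("t2", 180, "noncoding", False, 5.0),   # too short
    Transcript("t3", 900, "noncoding", True, 2.0),    # already annotated
    Transcript("t4", 900, "noncoding", False, 0.4),   # expression too low
]
candidates = [t.id for t in records if is_lncrna_candidate(t)]
print(candidates)
```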

Mapping and classification of lncRNAs

LncRNA candidates were mapped against the chromosomes of their respective morphotypes with GMAP v.2021-12-17107, using the same genomes used in the rnaQUAST step. An index was created for each genome, excluding unplaced scaffolds and sequences not assigned to a chromosome. Chimeric alignments were then excluded from the GFF3 output. Next, using the agat_convert_sp_gff2gtf script, available with the AGAT package v.0.9.1108, the GFF3 file was converted into a GTF file and used as input in FEELnc v.0.264 along with the lncRNA candidates. The classifier module from FEELnc was then used to predict interactions between lncRNAs and mRNAs and to classify lncRNAs based on genomic position considering four categories: type, subtype, transcription direction and localization, totaling 16 possible transcript classifications. More information about this classification is available on the FEELnc GitHub page (github.com/tderrien/FEELnc#3–feelnc_classifierpl).

Next, we summarized this lncRNA transcript classification based on the GENCODE61 classification, separating lncRNAs in genic and intergenic. Genic lncRNAs were organized in exonic, intronic and overlapping, and intergenic in same-strand, convergent and divergent. For the interactions between lncRNAs and partners, we filtered the results using isBest = 1, as such, only the best interactions were retained.
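The isBest filter mentioned above amounts to keeping one row per lncRNA from the classifier table. The following is a hedged sketch that assumes a tab-separated table with a header containing an `isBest` column; the other column names and the example rows are invented for illustration and should be checked against the actual classifier output.

```python
import csv
import io


def best_interactions(tsv_text: str):
    """Keep only the rows flagged as the best lncRNA-partner interaction."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["isBest"] == "1"]


# Hypothetical table: column names other than isBest are assumptions.
example = (
    "isBest\tlncRNA_transcript\tpartnerRNA_gene\n"
    "1\tlnc_A\tgene_1\n"
    "0\tlnc_A\tgene_2\n"
    "1\tlnc_B\tgene_3\n"
)
kept = best_interactions(example)
print([(r["lncRNA_transcript"], r["partnerRNA_gene"]) for r in kept])
```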

Annotation of lncRNAs

To identify conserved lncRNAs between cave and surface morphotypes, we conducted a Reciprocal Best Hits (RBH) blast analysis109,110, also in the Galaxy Europe, with lncRNA candidates using megablast task, with identity ≥ 70% and coverage ≥ 70%. In silico validation was executed through a primary sequence alignment of the mapped lncRNA candidates against a database of Expressed Sequence Tags (ESTs) of A. mexicanus. Initially, 189,864 ESTs of A. mexicanus were downloaded from NCBI’s GenBank (https://www.ncbi.nlm.nih.gov/genbank/). We removed redundant sequences with CD-HIT-EST with the same parameters as described before, resulting in a database of ESTs with 148,690 unique ESTs. Next, we conducted the alignment with megablast task with NCBI BLAST + 109,110 integrated into the Galaxy Europe webserver. We applied a cutoff of e-value ≤ e−5, and identity and coverage ≥ 95%.
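The Reciprocal Best Hits step at the start of this section can be illustrated in a few lines. This is a sketch, not the Galaxy tool itself: it assumes hits are given as tuples of (query, subject, identity %, coverage %, bit score) and that the best hit is the one with the highest bit score among those passing the 70% identity and coverage thresholds; all identifiers are hypothetical.

```python
def best_hits(hits, min_identity=70.0, min_coverage=70.0):
    """Map each query to its highest-scoring subject among hits passing the filters."""
    best = {}
    for query, subject, identity, coverage, bitscore in hits:
        if identity < min_identity or coverage < min_coverage:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits between cavefish (cf_*) and surface fish (sf_*) lncRNAs.
cf_vs_sf = [
    ("cf_1", "sf_1", 98.0, 95.0, 900.0),
    ("cf_1", "sf_2", 85.0, 80.0, 400.0),
    ("cf_2", "sf_2", 90.0, 60.0, 500.0),  # fails the coverage threshold
]
sf_vs_cf = [
    ("sf_1", "cf_1", 98.0, 96.0, 900.0),
    ("sf_2", "cf_1", 85.0, 82.0, 400.0),
]
pairs = reciprocal_best_hits(cf_vs_sf, sf_vs_cf)
print(pairs)
```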

For the annotation step, we constructed a database of lncRNAs (AmexLNC DB), which comprises lncRNAs previously predicted in the A. mexicanus Pachón cave and surface genomes cited above, totaling 7773 sequences. We conducted a similarity search with NCBI BLAST + under the megablast task, using the lncRNA candidates of each morphotype as query sequences, against the AmexLNC DB. An e-value ≤ e−5 was used as a cutoff, alongside identity ≥ 70% and coverage ≥ 70%. The blast results were filtered with BLAST top hit descriptions v.0.1.1111 and only the 3 top hits were considered.

Next, we proceeded with the annotation step using two other lncRNA databases: the Zebrafish LncRNA Database (ZFLNC)112, which comprises 21,128 sequences of Danio rerio, and LncBook 2.0113, a curated database with 323,950 transcript sequences of human lncRNAs. While applying the same methodology used in the AmexLNC DB step, we kept the e-value ≤ e−5 as the cutoff; however, we used specific values of identity and coverage based on phylogenetic relationship. As such, for the annotation against zebrafish sequences, we used identity and coverage of 50%, while against human sequences, we maintained the identity at 50% but lowered the coverage to 25%. The identity and coverage values were chosen considering the low sequence conservation of lncRNAs114,115. Venn diagrams were drawn with the InteractiVenn116 online tool, to show lncRNAs that were annotated against a single database and those that had annotations in more than one.
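The database-specific cutoffs used in the annotation steps can be collected in one lookup table. The sketch below assumes that "e−5" means an e-value of 1e-5 and that identity and coverage are percentages; the function and dictionary names are hypothetical.

```python
# (minimum identity %, minimum coverage %) per database, as stated in the text.
THRESHOLDS = {
    "AmexLNC": (70.0, 70.0),
    "ZFLNC": (50.0, 50.0),
    "LncBook": (50.0, 25.0),
}
MAX_EVALUE = 1e-5  # assumption: "e-5" is read as 1e-5


def passes_cutoffs(database: str, identity: float, coverage: float, evalue: float) -> bool:
    """Check one BLAST hit against the cutoffs of the given database."""
    min_identity, min_coverage = THRESHOLDS[database]
    return evalue <= MAX_EVALUE and identity >= min_identity and coverage >= min_coverage


# The same hypothetical hit is accepted or rejected depending on the database.
hit = {"identity": 62.0, "coverage": 30.0, "evalue": 1e-8}
print({db: passes_cutoffs(db, **hit) for db in THRESHOLDS})
```

Relaxing coverage for the human comparison means a hit like the one above is retained only against LncBook, reflecting the more distant phylogenetic relationship.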

LncRNAs conservation among fishes

Initially, ncRNA sequence data were downloaded from the Ensembl FTP site (https://ftp.ensembl.org/pub/current_fasta/) for all 90 assemblies available for the group ‘Fish’ (Supplementary Table ST1-S2). In a first step, the multi-FASTA file from each fish was used to produce a BLAST database, and all mapped CF and SF lncRNAs were blasted against them, producing XML and tabular outputs. This step allowed us to filter out all fish databases in which no hits were found (e-value cutoff of e−3 and query coverage ≥ 60%). The remaining databases were merged into a single BLAST database, and CF/SF lncRNAs were blasted against it (e-value cutoff of e−3) in a second round. This second step allowed us to easily filter the tabular outputs (query coverage ≥ 60%) to obtain lncRNAs matching fish species and to visualize these alignments with the aid of BlastViewer v. 5.5.2 (https://github.com/pgdurand/BlastViewer) using the XML BLAST outputs. Additionally, the number of hits in each species was summarized in a dendrogram with a custom R script based on the taxize117, myTAI118 and ggtree119 packages.

Interactions with candidate genes for eye loss

Based on the previous data compiled by Casane and Retáux120 and Warren et al.45, we put together a list of protein-coding genes related to eye loss in the cavefish based on multiple works, considering differentially expressed genes between cavefish and surface fish, genes in QTLs related to eye development and candidates with a possible role in eye loss. We only kept the genes annotated in the A. mexicanus genomes available at Ensembl (Supplementary Table ST1-S3). We then searched for mapped lncRNAs that had at least one of these genes as a partner and created interaction plots between lncRNAs and the candidate genes using an in-house R script and the qgraph121 package. To enable better visualization, we removed the ‘DN’ prefix of the lncRNAs IDs in the interaction networks.

Conservation of SOX2-OT transcripts in cavefish and surface fish

To verify if the SOX2-OT transcripts were shared between CF and SF, we filtered the RBH results previously obtained. The secondary structure of the RBH pairs and comparisons between the structure of these pairs were obtained by submitting the fasta sequences to ExpaRNA122,123, using default parameters.

Results

Transcriptome assembly and completeness assessment

The CF libraries comprised 178,682,000 (SRR6456919) and 152,354,892 (SRR6456920) raw reads. After trimming, the samples had a similar percentage of high-quality reads: 174,731,916 (97.8%) and 148,191,644 (97.3%), respectively. With the trimmed reads, we successfully assembled 270,293 transcripts, comprising 277,979,592 bases, with 43.89% GC content. As for the SF morphotype, the libraries comprised 185,809,258 (SRR6456921) and 197,626,438 (SRR6456922) raw reads, which resulted in 181,220,752 and 192,198,568 high-quality reads after trimming. These were assembled into 244,721 transcripts, comprising 242,667,658 assembled bases, with 44.43% GC content. These results and more detailed information are available in Supplementary Table ST1-S4.

The BUSCO completeness assessment of CF and SF transcriptomes displayed 78.0% (2840) and 80.0% (2911) of the complete BUSCO dataset (3640 elements), respectively. While the BUSCO completeness analyses provided very similar results for both CF and SF transcriptomes, they displayed significantly different proportions of single copy (27.1% and 50.3%) and duplicated BUSCOs (50.9% and 29.7%) (Supplementary Table ST2-S0 and ST3-S0). Within the align reads and estimate abundance results, we identified 127,764 transcripts in CF and 123,415 in SF with FPKM < 1.0 (Supplementary Table ST2–S1 and ST3–S1).

The quality assessment of assemblies with rnaQUAST revealed that almost all transcripts in CF (99.88%) and SF (98.54%) had at least one significant alignment (Table 1). The proportion of transcripts with multiple alignments was 1.28% in CF and 0.96% in SF. Unaligned elements, that is, transcripts without a significant alignment, amounted to only 319 (0.12%) in CF, while in SF they were more numerous, with 3574 (1.46%) transcripts identified. The proportion of misassemblies was similar between morphotypes, 4.86% in CF and 6.34% in SF.

Table 1.

Results of rnaQUAST analysis for both CF and SF.

| Metric | Cavefish (count) | Cavefish (%) | Surface fish (count) | Surface fish (%) |
|---|---|---|---|---|
| Transcripts | 270,293 | 100 | 244,721 | 100 |
| Transcripts > 500 bp | 130,902 | 48.43 | 111,470 | 45.55 |
| Transcripts > 1000 bp | 79,863 | 29.55 | 66,889 | 27.33 |
| Aligned | 269,974 | 99.88 | 241,147 | 98.54 |
| Uniquely aligned | 253,388 | 93.75 | 223,275 | 91.24 |
| Multiply aligned | 3,453 | 1.28 | 2,357 | 0.96 |
| Unaligned | 319 | 0.12 | 3,574 | 1.46 |
| Misassemblies | 13,133 | 4.86 | 15,515 | 6.34 |
| Avg. aligned fraction | 0.974 | | 0.964 | |
| Avg. alignment length | 980.701 | | 934.444 | |
| Avg. mismatches per transcript | 4.905 | | 4.754 | |

Nearly half of the assembled transcripts were longer than 500 bp. In CF, 99.88% of transcripts were successfully aligned against the reference genome. Similarly, in SF, 98.54% were aligned. Unique alignments represent more than 90% of the transcripts in both morphotypes.

Functional annotation

The transcriptome annotation resulted in 105,850 (39.2%) CF and 96,100 (39.3%) SF transcripts annotated against the FishProteinDB (Supplementary Table ST2-S2 and ST3-S2). As expected, most hits in both transcriptomes represent proteins from A. mexicanus. Concerning protein domains, 93,052 (34.4%) and 83,772 (34.2%) of the transcripts of CF and SF were annotated, respectively. Looking at Gene Ontology (GO) terms, 65,711 (24.3%) CF transcripts were associated with biological process terms, 64,798 (24.0%) with molecular function terms and 49,158 (18.2%) with cellular component terms. In the surface transcriptome, 59,550 transcripts (24.3%) were distributed into biological process terms, while 58,979 (24.1%) and 44,097 (18.0%) hit molecular function and cellular component terms, respectively. Additionally, 29,684 (11.0%) cave and 26,763 (10.9%) surface transcripts were related to KEGG pathway terms.

Annotated ncRNA families

Through Infernal annotation, 739 CF transcripts were found displaying similarity to 174 previously known RNAs from the Rfam database. From this total, 137 were identified as small non-coding RNAs (sncRNA), of which 88 represent small nucleolar RNAs (snoRNAs) and 49 microRNAs (miRNAs) (Fig. 2a and Supplementary Table ST2-S3). In the SF transcriptome, 699 transcripts were annotated against 173 known RNAs, with 136 being sncRNAs (90 snoRNAs and 46 miRNAs) (Fig. 2a and Supplementary Table ST3-S3).

Figure 2.


Number of sncRNAs, non-coding transcripts and the distribution of lncRNAs per chromosome. (a) Distribution of miRNAs and snoRNAs annotated in INFERNAL’s cmscan module; (b) Classification of transcripts according to their coding potential, considering the agreement of at least 4/5 coding potential tools; (c) Comparison of the chromosomal distribution of mapped lncRNAs between cavefish and surface fish.

Long non-coding RNA candidates

The classification assigned by each tool to each transcript is available in Supplementary Table ST2-S4 and ST3-S4. A Venn diagram showing the agreements and disagreements between the CP tool classifications can be found in Supplementary File 1. In the analysis considering the concordance of 4/5 tools, we identified 217,970 (80.65%) non-coding and 26,766 (9.90%) coding transcripts from the CF transcriptome, while 25,528 (9.45%) elements were considered undetermined (Fig. 2b). Similarly, 197,313 (80.64%) non-coding and 24,025 (9.82%) coding transcripts from the SF transcriptome were identified, leaving 23,357 (9.54%) elements classified as undetermined (Fig. 2b).

All non-coding RNAs were filtered against EnTAP annotations, providing identification for 55,933 (CF) and 50,794 (SF) transcripts. A similar analysis was conducted against Infernal, further identifying 357 elements in CF and 325 in SF. We also removed 824 CF and 326 SF redundant sequences with CD-HIT and excluded 127,764 transcripts in CF and 123,415 in SF with FPKM < 1.0. After this filtration process, we identified 33,092 and 22,453 transcripts as lncRNA candidates, from CF and SF transcriptomes, respectively.

Mapped lncRNAs

Considering the putative lncRNAs from CF, 33,069 (99.93%) of them were successfully mapped against the Pachón cave genome. Of these, 2873 (8.69%) were mapped exclusively against chromosome 1, while chromosome 18 had the lowest number, with only 664 (2.00%) mapped lncRNAs (Supplementary Table ST2-S5). In SF, the proportion of mapped lncRNAs was lower, with only 19,493 lncRNAs in total (86.82%) being mapped against the surface genome. In contrast to the scenario observed in the cavefish, only 656 (3.29%) lncRNAs were mapped against chromosome 1, and chromosome 21 had only 390 (1.96%) mapped lncRNAs, the lowest number among the 25 surface fish chromosomes (Supplementary Table ST3-S5). These differences in the number of mapped lncRNAs are present in almost all chromosomes (Fig. 2c).

Shared and validated lncRNAs

From the totality of lncRNA candidates, 5,741 of them are shared between the transcriptomes of CF and SF. Thus, most of the lncRNAs are exclusive to each morphotype, i.e., 27,328 lncRNAs were only found in the CF transcriptome and 13,752 were exclusively found in the SF (Fig. 3a and Supplementary Table ST1-S5). The annotation against ESTs resulted in the validation of 941 lncRNAs from CF (Supplementary Table ST2-S6) and 526 from SF (Supplementary Table ST3-S6).
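The exclusive counts reported above follow directly from the mapped totals and the shared set. As a quick consistency check using only numbers stated in the text (variable names are illustrative):

```python
# Mapped lncRNAs per morphotype and the number shared between them, from the text.
mapped = {"CF": 33_069, "SF": 19_493}
shared = 5_741

# lncRNAs exclusive to each morphotype = mapped total minus shared.
exclusive = {morphotype: total - shared for morphotype, total in mapped.items()}
print(exclusive)  # {'CF': 27328, 'SF': 13752}
```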

Figure 3.


Shared lncRNAs, number of annotations against the databases and the number of novel and conserved lncRNAs. (a) Venn diagram showing lncRNAs that are shared between the two morphotypes; (b) Number of lncRNAs annotated to AmexLNC, ZFLNC and LncBook databases; Venn diagram showing lncRNAs of (c) cavefish and (d) surface fish and how the annotations were distributed across the databases. The number of lncRNAs annotated against more than one database is represented in the intersections; (e) Dendrogram illustrating the lncRNAs of CF and SF that were conserved among other fish species with assemblies available in Ensembl; (f) Number of lncRNAs that were not annotated to any lncRNA database (Novel LncRNAs) and lncRNAs that were annotated to at least one database (Known LncRNAs) and are, therefore, conserved lncRNAs.

Conserved lncRNAs

The number of lncRNAs annotated against the lncRNAs databases of A. mexicanus (AmexLNC), zebrafish (ZFLNC) and human (LncBook) is summarized in Fig. 3b. In the AmexLNC annotation step, 1980 lncRNAs of CF and 1423 of SF were annotated to at least one lncRNA (Supplementary Table ST2-S7 and ST3-S7). As for the ZFLNC step, 495 lncRNAs of CF and 361 of SF were successfully annotated (Supplementary Table ST2-S8 and ST3-S8), and, similarly, 269 and 214 lncRNAs of CF and SF, respectively, were annotated against a human lncRNA transcript (Supplementary Table ST2-S9 and ST3-S9). Following these results, most lncRNAs were uniquely annotated to a specific database, however, a few lncRNAs were annotated in more than one DB. In CF, 135 lncRNAs had a hit in both AmexLNC and ZFLNC, 46 in ZFLNC and LncBook, and 16 in AmexLNC and LncBook. Interestingly, 29 lncRNAs were annotated in all three databases (Fig. 3c). As for the SF, a similar case was found, in which 78 lncRNAs were annotated in AmexLNC and ZFLNC, 37 in both ZFLNC and LncBook, 10 in AmexLNC and LncBook, and, finally, 23 lncRNAs that were annotated in all of them (Fig. 3d).

As for lncRNA conservation in fishes, the annotations against the A. mexicanus assembly were, as expected, the most numerous, with 1707 hits from CF and 1207 from SF. Excluding these, we were able to identify multiple lncRNAs that are conserved between A. mexicanus and at least one other species (Supplementary Table ST2-S10 and ST3-S10). In total, conserved lncRNAs were found in 21 other species, as represented in the dendrogram in Fig. 3e. A total of 55 lncRNAs from CF and 36 from SF matched a lncRNA from Oncorhynchus kisutch, and, similarly, 55 lncRNAs (CF) and 31 (SF) matched a lncRNA from Cyprinus carpio (Fig. 3e). Even though a considerable number of hits were found in Salmoniformes (CF = 120; SF = 80), Cypriniformes (CF = 87; SF = 49) and Clupeiformes (CF = 32; SF = 22), no apparent relation can be identified between the number of hits and the phylogenetic proximity between A. mexicanus and the target species.

Furthermore, there is also no apparent relation between the number of hits and the number of lncRNAs and ncRNAs described for each assembly; Salmo salar, for instance, has 28,609 lncRNAs but only 46 hits from CF and 30 from SF, while O. kisutch has only 3425 sequences but was the species with the most conserved lncRNAs identified (Supplementary Table ST1-S6). Additionally, some lncRNAs were conserved between multiple species, which is the case of the DN2011_c0_g1_i7 lncRNA from CF, which is conserved in A. mexicanus, D. rerio, S. salar, O. mykiss, O. kisutch and O. tshawytscha. In SF, the DN501_c29_g1_i1 lncRNA is conserved in A. mexicanus, S. salar, Sphaeramia orbicularis, O. kisutch and O. tshawytscha. The alignment details and visual representation of the top 5 lncRNAs with the most species and subspecies hits, including the two previously cited, are available in Supplementary Table ST2-S10 and ST3-S10 for CF and SF, respectively.

Finally, considering the four annotation steps, we were able to identify 2860 lncRNAs of CF and 2080 of SF as known lncRNAs. In contrast, 30,209 lncRNAs of CF and 17,413 of SF were not annotated against any lncRNA sequence and were, therefore, identified as novel lncRNAs (Fig. 3f).

LncRNAs classification

Intergenic lncRNAs were the most abundant in the CF transcriptome, corresponding to 22,888 (69.88%) transcripts (Supplementary Table ST2-S11). In SF, only 9599 (49.74%) lncRNAs were intergenic (Supplementary Table ST3-S11). In both cavefish and surface fish, same-strand lincRNAs were the most numerous, followed by convergent and divergent (Fig. 4a). Despite the marked difference in total numbers, the morphotypes had similar proportions of lincRNA types. A total of 11,377 (49.71%) and 4783 (49.83%) same-strand lincRNAs were found in CF and SF, respectively. A similar scenario was found for convergent lincRNAs, of which 9893 (43.22%) were from CF and 3735 (38.91%) were from SF. Divergent lincRNAs represented a small part of the total, with 1618 (7.07%) and 1081 (11.26%) in CF and SF, respectively.

Figure 4.


LncRNAs classification and interaction networks of lncRNAs and partners. (a) Distribution of intergenic lncRNAs, organized into three types: same-strand, convergent and divergent. Same-strand and convergent lincRNAs were the most abundant in both cavefish and surface fish; however, in cavefish, the number was considerably higher; (b) Distribution of genic lncRNAs, where cavefish had a higher number of intronic and surface fish of exonic; Interaction networks between lncRNAs of (c) cavefish and (d) surface fish and candidate genes (partners). The ‘DN’ prefix from the IDs of the lncRNAs was removed to enable better visualization. Moreover, to guarantee the legibility of these high-information figures, high-resolution versions of the interaction networks are available in Supplementary File 2 and 3, respectively; (e) Cavefish and (f) Surface fish lncRNAs, represented by outer circles, interacting with the sox2 gene in the center. The three circles with a background color other than white represent homologous lncRNAs between morphotypes.

Genic lncRNAs differed more between the two transcriptomes, with a substantial divergence in proportions (Fig. 4b). In CF, 9867 (30.12%) genic lncRNAs were identified, of which 5832 (59.11%) were intronic, 2281 (23.12%) exonic and 1754 (17.77%) overlapping. In SF, a total of 9699 (50.26%) genic lncRNAs were observed, of which 4124 (42.52%) were intronic, 4113 (42.41%) exonic and 1462 (15.07%) overlapping.
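The proportions reported in this section can be recomputed from the counts given in the text. The short sketch below does so, assuming (as the arithmetic suggests) that the intergenic and genic percentages are relative to the sum of classified lncRNAs in each morphotype, and that subtype percentages are relative to their own group. The function and variable names are illustrative only.

```python
# Counts transcribed from the text (CF = cavefish, SF = surface fish).
counts = {
    "CF": {
        "intergenic": {"same-strand": 11377, "convergent": 9893, "divergent": 1618},
        "genic": {"intronic": 5832, "exonic": 2281, "overlapping": 1754},
    },
    "SF": {
        "intergenic": {"same-strand": 4783, "convergent": 3735, "divergent": 1081},
        "genic": {"intronic": 4124, "exonic": 4113, "overlapping": 1462},
    },
}


def percent(part: int, whole: int) -> float:
    """Percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


def summarize(morphotype: str) -> dict:
    """Group totals, share of all classified lncRNAs, and subtype shares."""
    groups = counts[morphotype]
    group_totals = {name: sum(sub.values()) for name, sub in groups.items()}
    classified = sum(group_totals.values())
    return {
        name: {
            "n": group_totals[name],
            "pct_of_classified": percent(group_totals[name], classified),
            "subtypes": {k: percent(v, group_totals[name]) for k, v in sub.items()},
        }
        for name, sub in groups.items()
    }


for morphotype in ("CF", "SF"):
    print(morphotype, summarize(morphotype))
```

Because each value is rounded independently here, a recomputed percentage may differ from the published one in the last digit.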

LncRNAs interactions with partners

In the CF transcriptome, we identified 32,755 (99.05%) lncRNAs interacting with 12,633 partners, while in SF, 19,298 lncRNAs interacted with 8389 partners. Multiple lncRNAs interacting with the same partner were observed in both transcriptomes, such as ENSAMXG00000015728, with 35 interactions in CF, and ENSAMXG00000029878, with 46 interactions in SF. Among these interactions, 205 lncRNAs interacted with 57 candidate-gene partners in CF (Fig. 4c and Supplementary Table ST2-S12). In SF, the number of lncRNAs was smaller but the number of partners was higher than in CF, with 143 lncRNAs and 72 partners (Fig. 4d and Supplementary Table ST3-S12).
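Summaries of this kind (how many lncRNAs interact with each partner, and how many distinct lncRNAs and partners are involved) can be derived from a list of lncRNA and partner pairs. The sketch below uses placeholder identifiers and a hypothetical helper, `lncrnas_per_partner`; it is not the study's data or pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical (lncRNA, partner) pairs; the IDs are placeholders.
pairs = [
    ("lnc_1", "gene_A"),
    ("lnc_2", "gene_A"),
    ("lnc_3", "gene_B"),
    ("lnc_2", "gene_A"),  # duplicate row, counted once
    ("lnc_4", "gene_A"),
]


def lncrnas_per_partner(pairs):
    """Count the distinct lncRNAs that interact with each partner."""
    by_partner = defaultdict(set)
    for lnc, partner in pairs:
        by_partner[partner].add(lnc)
    return Counter({partner: len(lncs) for partner, lncs in by_partner.items()})


per_partner = lncrnas_per_partner(pairs)
n_lncrnas = len({lnc for lnc, _ in pairs})

print(per_partner.most_common(1))  # [('gene_A', 3)]
print(n_lncrnas, "lncRNAs interacting with", len(per_partner), "partners")
```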

Among the partners, we found seven transcripts interacting with the sox2 (SRY-Box Transcription Factor 2) gene in CF, of which five were classified as genic sense intronic containing (intronic) and two as intergenic antisense convergent lncRNAs (intergenic) (Fig. 4e and Table 2). Similarly, in SF, eight transcripts were partners of sox2; however, all of them were intronic containing lncRNAs (Fig. 4f and Table 2). The intronic containing classification indicates that these putative lncRNAs are transcribed in the same direction as sox2 and that sox2 is contained within them. The length varies considerably among these transcripts, ranging from 345 nt (DN1425_c0_g1_i45) to 1189 nt (DN1425_c0_g1_i26) (Table 2).

Table 2.

LncRNAs associated with the sox2 gene, their respective classification and morphotype, and the lncRNA length (nt).

Gene | LncRNA | Classification | Morphotype | Length (nt)
sox2 | DN21854_c0_g1_i13* | Intronic containing | Cavefish | 530
sox2 | DN110_c18_g1_i1 | Intergenic convergent | Cavefish | 1018
sox2 | DN21854_c0_g1_i5 | Intronic containing | Cavefish | 709
sox2 | DN21854_c0_g1_i8 | Intronic containing | Cavefish | 542
sox2 | DN13000_c2_g1_i1 | Intergenic convergent | Cavefish | 1048
sox2 | DN21854_c0_g1_i2* | Intronic containing | Cavefish | 463
sox2 | DN21854_c0_g1_i10* | Intronic containing | Cavefish | 642
sox2 | DN1425_c0_g1_i11 | Intronic containing | Surface fish | 389
sox2 | DN1425_c0_g1_i19* | Intronic containing | Surface fish | 662
sox2 | DN1425_c0_g1_i20 | Intronic containing | Surface fish | 506
sox2 | DN1425_c0_g1_i26 | Intronic containing | Surface fish | 1189
sox2 | DN1425_c0_g1_i27* | Intronic containing | Surface fish | 463
sox2 | DN1425_c0_g1_i42 | Intronic containing | Surface fish | 558
sox2 | DN1425_c0_g1_i45 | Intronic containing | Surface fish | 345
sox2 | DN1425_c0_g1_i7* | Intronic containing | Surface fish | 530

LncRNAs marked with an asterisk (highlighted in bold in the original table) correspond to lncRNAs shared between the morphotypes.

According to the RBH results, CF and SF share three lncRNAs associated with sox2 (Fig. 4e,f). DN21854_c0_g1_i13 (530 bp) and DN1425_c0_g1_i7 (530 bp) have a percentage of identical matches (pident) of 99.621%; DN21854_c0_g1_i2 (463 bp) and DN1425_c0_g1_i27 (463 bp) a pident of 99.566%; and DN21854_c0_g1_i10 (642 bp) and DN1425_c0_g1_i19 (662 bp) a pident of 99.533%. The secondary structures representing the LCS-EPM (Longest Common Subsequence of Exact Pattern Matchings) solution for these RBH pairs of SOX2-OT transcripts show that the similarity between the transcripts is observed not only at the sequence level, but also in their structure motifs (Fig. 5a,b,c).
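For readers less familiar with the idea behind RBH pairs, a minimal sketch follows: two transcripts form a pair when each is the other's best hit in the two opposite searches. The tuple layout, the use of bitscore to rank hits, and all identifiers and values are assumptions for illustration, not the study's data or exact procedure.

```python
def best_hits(rows):
    """Map each query to its highest-bitscore subject.

    rows: iterable of (query_id, subject_id, pident, bitscore) tuples.
    """
    best = {}
    for query, subject, pident, bitscore in rows:
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a_id, b_id, pident) for pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for a_id, (b_id, pident, _) in best_ab.items():
        if best_ba.get(b_id, (None,))[0] == a_id:
            pairs.append((a_id, b_id, pident))
    return sorted(pairs)


# Toy hits with placeholder IDs and made-up scores.
cf_vs_sf = [
    ("cf_1", "sf_1", 99.6, 950.0),
    ("cf_1", "sf_2", 91.0, 400.0),
    ("cf_2", "sf_2", 99.5, 800.0),
    ("cf_3", "sf_2", 95.0, 700.0),
]
sf_vs_cf = [
    ("sf_1", "cf_1", 99.6, 950.0),
    ("sf_2", "cf_2", 99.5, 800.0),
    ("sf_2", "cf_3", 95.0, 700.0),
]

print(reciprocal_best_hits(cf_vs_sf, sf_vs_cf))
# [('cf_1', 'sf_1', 99.6), ('cf_2', 'sf_2', 99.5)]
```

In this toy example `cf_3` is excluded because its best hit, `sf_2`, has `cf_2` as its own best hit.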

Figure 5.

Comparison of secondary structures of RBH pairs of SOX2-OT transcripts. Motif colors represent conserved regions between the pair’s structures. (a) DN21854_c0_g1_i13 (CF) and DN1425_c0_g1_i7 (SF); (b) DN21854_c0_g1_i2 (CF) and DN1425_c0_g1_i27 (SF) and (c) DN21854_c0_g1_i10 (CF) and DN1425_c0_g1_i19 (SF).

Discussion

This study expanded the knowledge of lncRNAs in A. mexicanus, filling a relevant gap for an important model species. Whereas only a few thousand lncRNAs had been described in A. mexicanus, we describe tens of thousands of putative lncRNAs expressed in cavefish and surface fish. While some of them are shared between the morphotypes, a considerable number are expressed only in cavefish or only in surface fish. We also identified lncRNAs conserved in several species, including human and zebrafish. Moreover, we explored their classification and interactions with other genes, creating a concise profile of lncRNAs in A. mexicanus. Some of the lncRNAs described here interact with genes that are key to eye development, such as sox2, and are important targets for future research.

We were successful in assembling the transcriptomes, with almost 100% of the transcripts aligning against the reference genome. Around 4–6% of the aligned transcripts were misassemblies. Although this is a considerable percentage, it is within the range reported for the assembly method employed, since Trinity has one of the highest misassembly rates among de novo assemblers, as found by Yang and Smith124 (3.9%) and Kerkvliet et al.125 (17.9%). Probably not all of those transcripts represent assembly errors, since misassemblies can also be chimeric transcripts that arise from the fusion of exons from different genes through chromosomal rearrangement or trans-splicing124,126,127. Therefore, it may be relevant to explore these misassembled transcripts further.

As for the coding potential, relying on CP tools alone may not be sufficient to precisely identify non-coding transcripts. By requiring a consensus of 4 of 5 tools, we avoided, fully or partially, the bias that differences between the CP tools could introduce, which could otherwise lead to a very strict classification. CP tools are a reliable means of identifying non-coding transcripts; however, despite using different methods and algorithms, none of them alone was sufficient to identify only true non-coding transcripts. CP tools usually perform binary classification, in which a transcript is either coding or non-coding, but this method allowed us to create a third category. Undetermined transcripts are those whose coding potential is uncertain: some tools classify them as non-coding and others as coding. This third category, alongside the EnTAP step, allowed us to annotate thousands of transcripts that seem not to be strictly non-coding. We therefore advise the use of multiple CP tools together with annotation against a protein database to further reduce bias.
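The three-way outcome of a 4-of-5 consensus can be sketched as a simple vote. The code below assumes that the same threshold is applied in both directions (at least four tools agreeing on coding, or on non-coding) and that everything else is undetermined; this symmetric rule and the names used are illustrative assumptions rather than a description of the exact implementation.

```python
from collections import Counter

N_TOOLS = 5
MIN_AGREEMENT = 4  # the "4 of 5" consensus described in the text


def consensus_label(calls):
    """Combine per-tool calls ('coding' or 'noncoding') into a single label."""
    if len(calls) != N_TOOLS:
        raise ValueError(f"expected {N_TOOLS} calls, got {len(calls)}")
    votes = Counter(calls)
    if votes["noncoding"] >= MIN_AGREEMENT:
        return "noncoding"
    if votes["coding"] >= MIN_AGREEMENT:
        return "coding"
    # Tools disagree too much to decide either way (assumed third category).
    return "undetermined"


print(consensus_label(["noncoding"] * 4 + ["coding"]))      # noncoding
print(consensus_label(["noncoding"] * 3 + ["coding"] * 2))  # undetermined
print(consensus_label(["noncoding"] + ["coding"] * 4))      # coding
```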

These undetermined transcripts, however, are not necessarily an issue, but rather an initial step toward the identification of potential bi-functional transcripts. Some lncRNAs have small Open Reading Frames (sORFs) that can code for small peptides with regulatory function128–131, and some protein-coding genes can also have non-coding isoforms128,130,132–134. Therefore, these undetermined transcripts may represent bi-functional transcripts and our approach may be able to discover them. However, further investigations are necessary to confirm this.

On the other hand, the divergence in the distribution of lncRNAs across chromosomes seems unrelated to transcript classification and instead reflects differences between the reference genomes used in the mapping step. In the Pachón cave genome, for example, chromosome 1 has 133,971,750 bp87, while in the surface fish genome, the same chromosome has only 26,953,843 bp45. This difference in size is more likely to be an assembly bias, due to different sequencing and assembly methodologies, than a biological trait, since the number of unplaced scaffolds differs considerably between the two assemblies. In the cavefish genome, 170 unplaced scaffolds correspond to a total length of 29,150,210 bp, while in the surface fish genome there are 2390 unplaced scaffolds with a total length of 404,626,875 bp45. These differences may contribute to the different proportions of genic and intergenic lncRNAs between CF and SF and should be addressed when equivalent genome assemblies become available.

LncRNAs are known to have low primary sequence conservation in comparison to protein-coding genes114,115,135–137, although conservation in short regions of lncRNA sequences has been observed in different species138. Nevertheless, conserved lncRNAs have been identified in vertebrates, including relatively distantly related species such as zebrafish and humans112,138,139. Our search for conserved lncRNAs was based on the primary sequence. Although the identity and coverage parameters could be considered strict for lncRNAs, they allowed us to detect sequences that were indeed conserved or at least had well-conserved short regions. In this way, we identified multiple lncRNAs of A. mexicanus conserved in other fishes and humans.
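A filter on identity and coverage of the kind discussed above can be sketched as follows. The threshold values, field names and example hits are hypothetical placeholders chosen only to show the mechanics; they are not the parameters used in this study.

```python
# Hypothetical thresholds, for illustration only.
MIN_IDENTITY = 80.0  # percent identical matches in the alignment
MIN_COVERAGE = 50.0  # percent of the query covered by the alignment


def query_coverage(qstart: int, qend: int, qlen: int) -> float:
    """Percentage of the query sequence spanned by the alignment."""
    return 100.0 * (abs(qend - qstart) + 1) / qlen


def is_conserved(hit: dict,
                 min_identity: float = MIN_IDENTITY,
                 min_coverage: float = MIN_COVERAGE) -> bool:
    """Keep a hit only if it passes both the identity and coverage cut-offs."""
    coverage = query_coverage(hit["qstart"], hit["qend"], hit["qlen"])
    return hit["pident"] >= min_identity and coverage >= min_coverage


# Toy alignments: one long match and one short, highly similar region.
long_match = {"pident": 92.0, "qstart": 1, "qend": 450, "qlen": 500}
short_region = {"pident": 95.0, "qstart": 101, "qend": 160, "qlen": 600}

print(is_conserved(long_match))    # True  (90% coverage)
print(is_conserved(short_region))  # False (10% coverage)
```

Lowering the coverage cut-off would retain hits such as `short_region`, which is the trade-off between strictness and the detection of short conserved regions.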

The interactions between lncRNAs and partners could be affected by the reference genomes, resulting in a lower number of inferred interactions than actually occur. Despite that, we identified interactions for almost all mapped lncRNAs and, more interestingly, interactions between lncRNAs and candidate genes for eye development. Given the regulatory role of lncRNAs, these interactions have great potential to point to target lncRNAs and partners that can be used in future studies. This seems to be the case for the lncRNAs that we identified as interacting with sox2. The sox2 gene, which encodes a transcription factor, is located in an intronic region of the SOX2-OT lncRNA (SRY-box transcription factor 2 (SOX2) overlapping transcript), a lncRNA that overlaps sox2 and is transcribed in the same direction.

SOX2-OT, unlike most lncRNAs, has considerable primary sequence conservation among vertebrates140–142. Its transcription is quite complex due to the existence of multiple transcription start sites, leading to transcripts of different lengths141,142. These traits agree well with the multiple intronic transcripts that we found interacting with sox2, which also differ considerably in length. However, further investigations are required to confirm whether those transcripts are indeed SOX2-OT isoforms. If confirmed, these lncRNAs should be explored, since SOX2-OT plays a relevant regulatory role, acting as a miRNA sponge and upregulating or downregulating the expression of sox2 depending on the tissue in which it is expressed140,141,143–146. Moreover, it has been suggested that SOX2-OT is involved in many processes in which sox2 has an important role during development, such as cell regulation, proliferation and differentiation141,142,146.

The regulation of sox2 may be relevant in A. mexicanus, since sox2 is associated with multiple processes, including the maintenance of stem cell pluripotency and the development of the nervous system and sensory organs147–151. Interestingly, sox2 has an important role in eye development, acting in different stages of retinal development and controlling the activity of the Wnt/β-catenin pathway in the retina150,152–155. In zebrafish, it was observed to be highly expressed in eye and brain tissues at 28 hpf and especially expressed in the retina around 48 hpf142.

The role of sox2 in A. mexicanus was explored by Ma et al.53, who observed downregulation of sox2 in the lens of cavefish. They suppressed sox2 in surface fish and observed downregulation of cryaa expression and lens apoptosis. Thus, sox2 seems to be involved in eye development in A. mexicanus, although the mechanisms responsible for its downregulation in cavefish remain to be understood. In any case, the regulation of sox2 by SOX2-OT should be considered in those processes and further investigated.

While the role of protein-coding genes has been extensively explored in A. mexicanus, lncRNAs have so far been neglected and little is known about them. Using a concise approach to predict, identify, and describe lncRNAs, we found thousands of lncRNAs expressed in the eye tissue of the cavefish and surface fish morphotypes of A. mexicanus. Differences in their lncRNA profiles were also observed, including lncRNAs expressed exclusively in one morphotype. Additionally, putative lncRNAs associated with genes that have a role in eye development were highlighted. Therefore, this work can be used as a starting point to explore lncRNAs in future studies, including those focused on differential expression, with specific targets in mind. Furthermore, it sets an important precedent for future studies focusing on lncRNAs expressed in A. mexicanus.

Acknowledgements

We sincerely acknowledge the generous support of all individuals who contributed financially to cover the article processing charge for our publication: Ana Krotenberg Garcia, Caroline Garcia, Douglas Rodrigues Aguiar de Oliveira, José Wilson Faustino, Karina Brandão, Marcílio José da Costa Gomes, Maria Antônia Maciel, Rafael Augusto, Yara Brandão; and all the people who shared our crowdfunding. Your invaluable assistance has made it possible for our work to reach a wider audience and advance the boundaries of knowledge. We also thank the Federal University of Viçosa campus Rio Paranaíba, for the support, the Federal University of Viçosa for the access to computational resources and the anonymous reviewers for the helpful insights during the review process.

Author contributions

Conceptualization: I.B.S., F.B.M. and R.P.; Methodology: I.B.S., D.A.B. and F.B.M.; Formal Analysis and Investigation: I.B.S. and D.A.B.; Data curation: I.B.S., D.A.B. and F.B.M.; Writing—original draft: I.B.S.; Writing—review & editing: I.B.S., D.A.B., K.F.K., L.R.N., R.P. and F.B.M.; Project administration: F.B.M. and R.P. All authors read and approved the final version of the manuscript.

Funding

This study was financed in part by scholarship grants from Fundação de Amparo à Pesquisa do Estado de Minas Gerais—FAPEMIG (www.fapemig.br) (awarded to IBS) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil—CAPES (www.capes.gov.br) (awarded to DAB).

Data availability

This Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession DANNHJ000000000. The version described in this paper is the first version, DANNHJ010000000. The databases, sequences, scripts and intermediate datasets generated during the current study are available in the Open Science Framework (OSF) repository, under the identifier https://doi.org/10.17605/OSF.IO/3Z7QN.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Rubens Pasa, Email: rpasa@ufv.br.

Fabiano B. Menegidio, Email: fabianomenegidio@umc.br.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-023-34198-5.

References

  • 1. Jeffery WR. Cavefish as a model system in evolutionary developmental biology. Dev. Biol. 2001;231:1–12. doi: 10.1006/dbio.2000.0121.
  • 2. Borowsky R. Astyanax mexicanus, the blind Mexican cave fish: A model for studies in development and morphology. Cold Spring Harb. Protoc. 2008;2008:pdb.emo107. doi: 10.1101/pdb.emo107.
  • 3. Gross JB, Meyer B, Perkins M. The rise of Astyanax cavefish. Dev. Dyn. 2015;244:1031–1038. doi: 10.1002/dvdy.24253.
  • 4. McGaugh SE, et al. Dark world rises: The emergence of cavefish as a model for the study of evolution, development, behavior, and disease. J. Exp. Zool. B Mol. Dev. Evol. 2020;334:397–404. doi: 10.1002/jez.b.22978.
  • 5. Gross JB. The complex origin of Astyanax cavefish. BMC Evol. Biol. 2012;12:105. doi: 10.1186/1471-2148-12-105.
  • 6. Jeffery WR. Astyanax surface and cave fish morphs. EvoDevo. 2020;11:1–10. doi: 10.1186/s13227-020-00159-6.
  • 7. Maldonado E, Rangel-Huerta E, Rodriguez-Salazar E, Pereida-Jaramillo E, Martínez-Torres A. Subterranean life: Behavior, metabolic, and some other adaptations of Astyanax cavefish. J. Exp. Zool. B Mol. Dev. Evol. 2020;334:463–473. doi: 10.1002/jez.b.22948.
  • 8. Keene A, Yoshizawa M, McGaugh S. Biology and Evolution of the Mexican Cavefish. Academic Press; 2015.
  • 9. Espinasa L, et al. A new cave locality for Astyanax cavefish in Sierra de El Abra, Mexico. Subterr. Biol. 2018;26:39–53. doi: 10.3897/subtbiol.26.26643.
  • 10. Yamamoto Y, Byerly MS, Jackman WR, Jeffery WR. Pleiotropic functions of embryonic sonic hedgehog expression link jaw and taste bud amplification with eye loss during cavefish evolution. Dev. Biol. 2009;330:200–211. doi: 10.1016/j.ydbio.2009.03.003.
  • 11. Yoshizawa M, Gorički Š, Soares D, Jeffery WR. Evolution of a behavioral shift mediated by superficial neuromasts helps cavefish find food in darkness. Curr. Biol. 2010;20:1631–1636. doi: 10.1016/j.cub.2010.07.017.
  • 12. Yoshizawa M, Yamamoto Y, O’Quin KE, Jeffery WR. Evolution of an adaptive behavior and its sensory receptors promotes eye regression in blind cavefish. BMC Biol. 2012;10:1–16. doi: 10.1186/1741-7007-10-108.
  • 13. Powers AK, Davis EM, Kaplan SA, Gross JB. Cranial asymmetry arises later in the life history of the blind Mexican cavefish, Astyanax mexicanus. PLoS ONE. 2017;12:e0177419. doi: 10.1371/journal.pone.0177419.
  • 14. Atukorala ADS, Bhatia V, Ratnayake R. Craniofacial skeleton of Mexican tetra (Astyanax mexicanus): As a bone disease model. Dev. Dyn. 2019;248:153–161. doi: 10.1002/dvdy.4.
  • 15. Sifuentes-Romero I, et al. Repeated evolution of eye loss in Mexican cavefish: Evidence of similar developmental mechanisms in independently evolved populations. J. Exp. Zool. B Mol. Dev. Evol. 2020;334:423–437. doi: 10.1002/jez.b.22977.
  • 16. Elipot Y, Hinaux H, Callebert J, Rétaux S. Evolutionary shift from fighting to foraging in blind cavefish through changes in the serotonin network. Curr. Biol. 2013;23:1–10. doi: 10.1016/j.cub.2012.10.044.
  • 17. Kowalko JE, et al. Convergence in feeding posture occurs through different genetic loci in independently evolved cave populations of Astyanax mexicanus. Proc. Natl. Acad. Sci. 2013;110:16933–16938. doi: 10.1073/pnas.1317192110.
  • 18. Kowalko J. Utilizing the blind cavefish Astyanax mexicanus to understand the genetic basis of behavioral evolution. J. Exp. Biol. 2020;223:jeb208835. doi: 10.1242/jeb.208835.
  • 19. Moran D, Softley R, Warrant EJ. Eyeless Mexican cavefish save energy by eliminating the circadian rhythm in metabolism. PLoS ONE. 2014;9:e107877. doi: 10.1371/journal.pone.0107877.
  • 20. Mack KL, et al. Repeated evolution of circadian clock dysregulation in cavefish populations. PLOS Genet. 2021;17:e1009642. doi: 10.1371/journal.pgen.1009642.
  • 21. Duboué ER, Keene AC, Borowsky RL. Evolutionary convergence on sleep loss in cavefish populations. Curr. Biol. 2011;21:671–676. doi: 10.1016/j.cub.2011.03.020.
  • 22. O’Gorman M, et al. Pleiotropic function of the oca2 gene underlies the evolution of sleep loss and albinism in cavefish. Curr. Biol. 2021;31:3694–3701.e4. doi: 10.1016/j.cub.2021.06.077.
  • 23. Aspiras AC, Rohner N, Martineau B, Borowsky RL, Tabin CJ. Melanocortin 4 receptor mutations contribute to the adaptation of cavefish to nutrient-poor conditions. Proc. Natl. Acad. Sci. 2015;112:9668–9673. doi: 10.1073/pnas.1510802112.
  • 24. Riddle MR, et al. Insulin resistance in cavefish as an adaptation to a nutrient-limited environment. Nature. 2018;555:647–651. doi: 10.1038/nature26136.
  • 25. Xiong S, Krishnan J, Peuß R, Rohner N. Early adipogenesis contributes to excess fat accumulation in cave populations of Astyanax mexicanus. Dev. Biol. 2018;441:297–304. doi: 10.1016/j.ydbio.2018.06.003.
  • 26. Bradic M, Beerli P, García-de León FJ, Esquivel-Bobadilla S, Borowsky RL. Gene flow and population structure in the Mexican blind cavefish complex (Astyanax mexicanus). BMC Evol. Biol. 2012;12:1–17. doi: 10.1186/1471-2148-12-9.
  • 27. Jeffery WR, Strickler AG, Yamamoto Y. To see or not to see: Evolution of eye degeneration in Mexican blind cavefish. Integr. Comp. Biol. 2003;43:531–541. doi: 10.1093/icb/43.4.531.
  • 28. Protas M, Conrad M, Gross JB, Tabin C, Borowsky R. Regressive evolution in the Mexican Cave Tetra, Astyanax mexicanus. Curr. Biol. 2007;17:452–454. doi: 10.1016/j.cub.2007.01.051.
  • 29. Jeffery WR. Regressive evolution in Astyanax cavefish. Annu. Rev. Genet. 2009;43:25–47. doi: 10.1146/annurev-genet-102108-134216.
  • 30. Jeffery WR. Evolution and development in the cavefish Astyanax. Curr. Top. Dev. Biol. 2009;86:191–221. doi: 10.1016/S0070-2153(09)01008-4.
  • 31. Hinaux H, et al. Lens defects in Astyanax mexicanus Cavefish: Evolution of crystallins and a role for alphaA-crystallin. Dev. Neurobiol. 2015;75:505–521. doi: 10.1002/dneu.22239.
  • 32. Krishnan J, Rohner N. Cavefish and the basis for eye loss. Philos. Trans. R. Soc. B Biol. Sci. 2017;372:20150487. doi: 10.1098/rstb.2015.0487.
  • 33. Ma L, Ng M, van der Weele CM, Yoshizawa M, Jeffery WR. Dual roles of the retinal pigment epithelium and lens in Cavefish eye degeneration. J. Exp. Zool. B Mol. Dev. Evol. 2020;334:438–449. doi: 10.1002/jez.b.22923.
  • 34. Ma L, et al. A hypomorphic cystathionine ß-synthase gene contributes to cavefish eye loss by disrupting optic vasculature. Nat. Commun. 2020;11:2772. doi: 10.1038/s41467-020-16497-x.
  • 35. Alunni A, et al. Developmental mechanisms for retinal degeneration in the blind cavefish Astyanax mexicanus. J. Comp. Neurol. 2007;505:221–233. doi: 10.1002/cne.21488.
  • 36. Stemmer M, Schuhmacher L-N, Foulkes NS, Bertolucci C, Wittbrodt J. Cavefish eye loss in response to an early block in retinal differentiation progression. Development. 2015;142:743–752. doi: 10.1242/dev.114629.
  • 37. Emam A, Yoffe M, Cardona H, Soares D. Retinal morphology in Astyanax mexicanus during eye degeneration. J. Comp. Neurol. 2020;528:1523–1534. doi: 10.1002/cne.24835.
  • 38. Yamamoto Y, Jeffery WR. Central role for the lens in cave fish eye degeneration. Science. 2000;289:631–633. doi: 10.1126/science.289.5479.631.
  • 39. Strickler AG, Yamamoto Y, Jeffery WR. The lens controls cell survival in the retina: Evidence from the blind cavefish Astyanax. Dev. Biol. 2007;311:512–523. doi: 10.1016/j.ydbio.2007.08.050.
  • 40. Protas M, et al. Multi-trait evolution in a cave fish, Astyanax mexicanus. Evol. Dev. 2008;10:196–209. doi: 10.1111/j.1525-142X.2008.00227.x.
  • 41. O’Quin KE, Yoshizawa M, Doshi P, Jeffery WR. Quantitative genetic analysis of retinal degeneration in the blind cavefish Astyanax mexicanus. PLoS ONE. 2013;8:e57281. doi: 10.1371/journal.pone.0057281.
  • 42. McGaugh SE, et al. The cavefish genome reveals candidate genes for eye loss. Nat. Commun. 2014;5:5307. doi: 10.1038/ncomms6307.
  • 43. Kowalko JE, Ma L, Jeffery WR. Genome editing in Astyanax mexicanus using transcription activator-like effector nucleases (TALENs). J. Vis. Exp. 2016. doi: 10.3791/54113.
  • 44. Carlson BM, Klingler IB, Meyer BJ, Gross JB. Genetic analysis reveals candidate genes for activity QTL in the blind Mexican tetra, Astyanax mexicanus. PeerJ. 2018;6:e5189. doi: 10.7717/peerj.5189.
  • 45. Warren WC, et al. A chromosome-level genome of Astyanax mexicanus surface fish for comparing population-specific genetic differences contributing to trait evolution. Nat. Commun. 2021;12:1447. doi: 10.1038/s41467-021-21733-z.
  • 46. Pottin K, Hinaux H, Rétaux S. Restoring eye size in Astyanax mexicanus blind cavefish embryos through modulation of the Shh and Fgf8 forebrain organising centres. Development. 2011;138:2467–2476. doi: 10.1242/dev.054106.
  • 47. Gross JB, Furterer A, Carlson BM, Stahl BA. An integrated transcriptome-wide analysis of cave and surface dwelling Astyanax mexicanus. PLoS ONE. 2013;8:e55659. doi: 10.1371/journal.pone.0055659.
  • 48. Hinaux H, et al. De Novo sequencing of Astyanax mexicanus surface fish and Pachón cavefish transcriptomes reveals enrichment of mutations in cavefish putative eye genes. PLoS ONE. 2013;8:e53553. doi: 10.1371/journal.pone.0053553.
  • 49. Ma L, Jeffery WR, Essner JJ, Kowalko JE. Genome editing using TALENs in blind Mexican cavefish, Astyanax mexicanus. PLoS ONE. 2015;10:e0119370. doi: 10.1371/journal.pone.0119370.
  • 50. Stahl BA, Gross JB. A comparative transcriptomic analysis of development in two Astyanax cavefish populations. J. Exp. Zool. B Mol. Dev. Evol. 2017;328:515–532. doi: 10.1002/jez.b.22749.
  • 51. Atukorala ADS, Franz-Odendaal TA. Genetic linkage between altered tooth and eye development in lens-ablated Astyanax mexicanus. Dev. Biol. 2018;441:235–241. doi: 10.1016/j.ydbio.2018.07.008.
  • 52. Gore AV, et al. An epigenetic mechanism for cavefish eye degeneration. Nat. Ecol. Evol. 2018;2:1155–1160. doi: 10.1038/s41559-018-0569-4.
  • 53. Ma L, Parkhurst A, Jeffery WR. The role of a lens survival pathway including sox2 and αA-crystallin in the evolution of cavefish eye degeneration. EvoDevo. 2014;5:28. doi: 10.1186/2041-9139-5-28.
  • 54. Mattick JS. Non-coding RNAs: The architects of eukaryotic complexity. EMBO Rep. 2001;2:986–991. doi: 10.1093/embo-reports/kve230.
  • 55. Wang KC, Chang HY. Molecular mechanisms of long noncoding RNAs. Mol. Cell. 2011;43:904–914. doi: 10.1016/j.molcel.2011.08.018.
  • 56. Kapranov P, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science. 2007;316:1484–1488. doi: 10.1126/science.1138341.
  • 57. Brosnan CA, Voinnet O. The long and the short of noncoding RNAs. Curr. Opin. Cell Biol. 2009;21:416–425. doi: 10.1016/j.ceb.2009.04.001.
  • 58. Gomes AQ, Nolasco S, Soares H. Non-coding RNAs: Multi-tasking molecules in the cell. Int. J. Mol. Sci. 2013;14:16010–16039. doi: 10.3390/ijms140816010.
  • 59. Shi J, Zhou T, Chen Q. Exploring the expanding universe of small RNAs. Nat. Cell Biol. 2022;24:415–423. doi: 10.1038/s41556-022-00880-5.
  • 60. Bartel DP. Metazoan MicroRNAs. Cell. 2018;173:20–51. doi: 10.1016/j.cell.2018.03.006.
  • 61. Derrien T, et al. The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression. Genome Res. 2012;22:1775–1789. doi: 10.1101/gr.132159.111.
  • 62. Ma L, Bajic VB, Zhang Z. On the classification of long non-coding RNAs. RNA Biol. 2013;10:924–933. doi: 10.4161/rna.24604.
  • 63. Frankish A, et al. GENCODE 2021. Nucleic Acids Res. 2021;49:D916–D923. doi: 10.1093/nar/gkaa1087.
  • 64. Wucher V, et al. FEELnc: A tool for long non-coding RNA annotation and its application to the dog transcriptome. Nucleic Acids Res. 2017;45:e57. doi: 10.1093/nar/gkw1306.
  • 65. Ransohoff JD, Wei Y, Khavari PA. The functions and unique features of long intergenic non-coding RNA. Nat. Rev. Mol. Cell Biol. 2018;19:143–157. doi: 10.1038/nrm.2017.104.
  • 66. Lee JT. Epigenetic regulation by long noncoding RNAs. Science. 2012;338:1435–1439. doi: 10.1126/science.1231776.
  • 67. Mercer TR, Mattick JS. Structure and function of long noncoding RNAs in epigenetic regulation. Nat. Struct. Mol. Biol. 2013;20:300–307. doi: 10.1038/nsmb.2480.
  • 68. Long Y, Wang X, Youmans DT, Cech TR. How do lncRNAs regulate transcription? Sci. Adv. 2017;3:eaao2110. doi: 10.1126/sciadv.aao2110.
  • 69. Marchese FP, Raimondi I, Huarte M. The multidimensional mechanisms of long noncoding RNA function. Genome Biol. 2017;18:206. doi: 10.1186/s13059-017-1348-2.
  • 70. Fernandes JCR, Acuña SM, Aoki JI, Floeter-Winter LM, Muxel SM. Long non-coding RNAs in the regulation of gene expression: physiology and disease. Non-Coding RNA. 2019;5:17. doi: 10.3390/ncrna5010017.
  • 71. Statello L, Guo C-J, Chen L-L, Huarte M. Gene regulation by long non-coding RNAs and its biological functions. Nat. Rev. Mol. Cell Biol. 2021;22:96–118. doi: 10.1038/s41580-020-00315-9.
  • 72. Fatica A, Bozzoni I. Long non-coding RNAs: New players in cell differentiation and development. Nat. Rev. Genet. 2014;15:7–21. doi: 10.1038/nrg3606.
  • 73. Pauli A, et al. Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis. Genome Res. 2012;22:577–591. doi: 10.1101/gr.133009.111.
  • 74. Wu M, Zhang S, Chen X, Xu H, Li X. Expression and function of lncRNA MALAT-1 in the embryonic development of zebrafish. Gene. 2019;680:65–71. doi: 10.1016/j.gene.2018.09.037.
  • 75. Zhao XF, et al. Identification and analysis of long non-coding RNAs in Leuciscus waleckii adapted to highly alkaline conditions. Front. Physiol. 2021;12:665268. doi: 10.3389/fphys.2021.665268.
  • 76. Li F, Wen X, Zhang H, Fan X. Novel insights into the role of long noncoding RNA in ocular diseases. Int. J. Mol. Sci. 2016;17:478. doi: 10.3390/ijms17040478.
  • 77. Zhang L, et al. Long non-coding RNAs in ocular diseases: New and potential therapeutic targets. FEBS J. 2019;286:2261–2272. doi: 10.1111/febs.14827.
  • 78. Zhang X, Hong R, Chen W, Xu M, Wang L. The role of long noncoding RNA in major human disease. Bioorg. Chem. 2019;92:103214. doi: 10.1016/j.bioorg.2019.103214.
  • 79. Pasquale LR, et al. CDKN2B-AS1 genotype-glaucoma feature correlations in primary open-angle glaucoma patients from the United States. Am. J. Ophthalmol. 2013;155:342–353.e5. doi: 10.1016/j.ajo.2012.07.023.
  • 80. Andrews, S. Babraham Bioinformatics - FastQC A Quality Control tool for High Throughput Sequence Data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2010).
  • 81. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32:3047–3048. doi: 10.1093/bioinformatics/btw354.
  • 82. Chen S, Zhou Y, Chen Y, Gu J. fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560.
  • 83. Haas BJ, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013;8:1494–1512. doi: 10.1038/nprot.2013.084.
  • 84. Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. ArXiv210611799 Q-Bio (2021).
  • 85.Kriventseva EV, et al. OrthoDB v10: Sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucl. Acids Res. 2019;47:D807–D811. doi: 10.1093/nar/gky1053. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Bushmanova E, Antipov D, Lapidus A, Suvorov V, Prjibelski AD. rnaQUAST: A quality assessment tool for de novo transcriptome assemblies. Bioinformatics. 2016;32:2210–2212. doi: 10.1093/bioinformatics/btw218. [DOI] [PubMed] [Google Scholar]
  • 87.Imarazene B, et al. A supernumerary “B-sex” chromosome drives male sex determination in the Pachón cavefish, Astyanax mexicanus. Curr. Biol. 2021;31:4800–4809.e9. doi: 10.1016/j.cub.2021.08.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Hart AJ, et al. EnTAP: Bringing faster and smarter functional annotation to non-model eukaryotic transcriptomes. Mol. Ecol. Resour. 2020;20:591–604. doi: 10.1111/1755-0998.13106. [DOI] [PubMed] [Google Scholar]
  • 89.Huerta-Cepas J, et al. eggNOG 4.5: A hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucl. Acids Res. 2016;44:D286–D293. doi: 10.1093/nar/gkv1248. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.The Gene Ontology Consortium The gene ontology resource: 20 years and still GOing strong. Nucl. Acids Res. 2019;47:D330–D338. doi: 10.1093/nar/gky1055. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 91.Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucl. Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Kanehisa M. Toward understanding the origin and evolution of cellular organisms. Prot. Sci. Publ. Prot. Soc. 2019;28:1947–1951. doi: 10.1002/pro.3715. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: Integrating viruses and cellular organisms. Nucl. Acids Res. 2021;49:D545–D551. doi: 10.1093/nar/gkaa970. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Letunic I, Doerks T, Bork P. SMART: Recent updates, new developments and status in 2015. Nucl. Acids Res. 2015;43:D257–D260. doi: 10.1093/nar/gku949. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Finn RD, et al. Pfam: The protein families database. Nucl. Acids Res. 2014;42:D222–D230. doi: 10.1093/nar/gkt1223. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinforma. Oxf. Engl. 2013;29:2933–2935. doi: 10.1093/bioinformatics/btt509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Kalvari I, et al. Rfam 14: Expanded coverage of metagenomic, viral and microRNA families. Nucl. Acids Res. 2021;49:D192–D200. doi: 10.1093/nar/gkaa1047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98.Kang Y-J, et al. CPC2: A fast and accurate coding potential calculator based on sequence intrinsic features. Nucl. Acids Res. 2017;45:W12–W16. doi: 10.1093/nar/gkx428. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Camargo AP, Sourkov V, Pereira GAG, Carazzolle MF. RNAsamba: Neural network-based assessment of the protein-coding potential of RNA sequences. NAR Genom. Bioinforma. 2020;2:lqz024. doi: 10.1093/nargab/lqz024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Wang L, et al. CPAT: Coding-potential assessment tool using an alignment-free logistic regression model. Nucl. Acids Res. 2013;41:e74. doi: 10.1093/nar/gkt006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101.Ramos TAR, et al. RNAmining: A machine learning stand-alone and web server tool for RNA coding potential prediction. F1000Research. 2021;10:323. doi: 10.12688/f1000research.52350.2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Li A, Zhang J, Zhou Z. PLEK: A tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme. BMC Bioinform. 2014;15:311. doi: 10.1186/1471-2105-15-311. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Mishra SK, Wang H. Computational analysis predicts hundreds of coding lncRNAs in Zebrafish. Biology. 2021;10:371. doi: 10.3390/biology10050371. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Aciole Barbosa D, et al. Transcriptomic profiling and microsatellite identification in Cobia (Rachycentron canadum), using high-throughput RNA sequencing. Mar. Biotechnol. 2021 doi: 10.1007/s10126-021-10081-0. [DOI] [PubMed] [Google Scholar]
  • 105.Li W, Godzik A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinforma. Oxf. Engl. 2006;22:1658–1659. doi: 10.1093/bioinformatics/btl158. [DOI] [PubMed] [Google Scholar]
  • 106.Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinforma. Oxf. Engl. 2012;28:3150–3152. doi: 10.1093/bioinformatics/bts565. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 107.Wu TD, Watanabe CK. GMAP: A genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics. 2005;21:1859–1875. doi: 10.1093/bioinformatics/bti310. [DOI] [PubMed] [Google Scholar]
  • 108.Dainat, J. et al. NBISweden/AGAT: AGAT-v0.9.1. 10.5281/zenodo.8178877 (2022).
  • 109.Camacho C, et al. BLAST+: Architecture and applications. BMC Bioinform. 2009;10:421. doi: 10.1186/1471-2105-10-421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Cock PJA, Chilton JM, Grüning B, Johnson JE, Soranzo N. NCBI BLAST+ integrated into Galaxy. GigaScience. 2015;4:s13742–s14015. doi: 10.1186/s13742-015-0080-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Cock PJA, Grüning BA, Paszkiewicz K, Pritchard L. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology. PeerJ. 2013;1:e167. doi: 10.7717/peerj.167. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Hu X, et al. ZFLNC: A comprehensive and well-annotated database for zebrafish lncRNA. Database. 2018;2018:bay114. doi: 10.1093/database/bay114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Li Z, et al. LncBook 2.0: Integrating human long non-coding RNAs with multi-omics annotations. Nucl. Acids Res. 2023;51:D186–D191. doi: 10.1093/nar/gkac999. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Diederichs S. The four dimensions of noncoding RNA conservation. Trends Genet. 2014;30:121–123. doi: 10.1016/j.tig.2014.01.004. [DOI] [PubMed] [Google Scholar]
  • 115.Tsagakis I, Douka K, Birds I, Aspden JL. Long non-coding RNAs in development and disease: Conservation to mechanisms. J. Pathol. 2020;250:480–495. doi: 10.1002/path.5405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Heberle H, Meirelles GV, da Silva FR, Telles GP, Minghim R. InteractiVenn: A web-based tool for the analysis of sets through Venn diagrams. BMC Bioinform. 2015;16:1–7. doi: 10.1186/s12859-015-0611-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.Chamberlain SA, Szöcs E. Taxize: Taxonomic search and retrieval in R. F1000Research. 2013;2:191. doi: 10.12688/f1000research.2-191.v1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118.Drost H-G, Gabel A, Liu J, Quint M, Grosse I. myTAI: Evolutionary transcriptomics with R. Bioinformatics. 2018;34:1589–1590. doi: 10.1093/bioinformatics/btx835. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119.Yu G. Using ggtree to visualize data on tree-like structures. Curr. Protoc. Bioinforma. 2020;69:e96. doi: 10.1002/cpbi.96. [DOI] [PubMed] [Google Scholar]
  • 120.Casane D, Rétaux S. Chapter five—evolutionary genetics of the cavefish Astyanax mexicanus. In: Foulkes NS, editor. Advances in Genetics. Academic Press; 2016. pp. 117–159. [DOI] [PubMed] [Google Scholar]
  • 121.Epskamp S, Cramer AOJ, Waldorp LJ, Schmittmann VD, Borsboom D. qgraph: Network visualizations of relationships in psychometric data. J. Stat. Softw. 2012;48:1–18. doi: 10.18637/jss.v048.i04. [DOI] [Google Scholar]
  • 122.Heyne S, Will S, Beckstette M, Backofen R. Lightweight comparison of RNAs based on exact sequence–structure matches. Bioinformatics. 2009;25:2095–2102. doi: 10.1093/bioinformatics/btp065. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Raden M, et al. Freiburg RNA tools: A central online resource for RNA-focused research and teaching. Nucl. Acids Res. 2018;46:W25–W29. doi: 10.1093/nar/gky329. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.Yang Y, Smith SA. Optimizing de novo assembly of short-read RNA-seq data for phylogenomics. BMC Genom. 2013;14:328. doi: 10.1186/1471-2164-14-328. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125.Kerkvliet J, de Fouchier A, van Wijk M, Groot AT. The Bellerophon pipeline, improving de novo transcriptomes and removing chimeras. Ecol. Evol. 2019;9:10513–10521. doi: 10.1002/ece3.5571. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126.Gingeras TR. Implications of chimaeric non-co-linear transcripts. Nature. 2009;461:206–211. doi: 10.1038/nature08452. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127.Frenkel-Morgenstern M, Gorohovski A, Vucenovic D, Maestre L, Valencia A. ChiTaRS 2.1—an improved database of the chimeric transcripts and RNA-seq data with novel sense–antisense chimeric RNA transcripts. Nucl. Acids Res. 2015;43:D68–D75. doi: 10.1093/nar/gku1199. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 128.Kumari P, Sampath K. cncRNAs: Bi-functional RNAs with protein coding and non-coding functions. Semin. Cell Dev. Biol. 2015;47–48:40–51. doi: 10.1016/j.semcdb.2015.10.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129.Ji Z, Song R, Regev A, Struhl K. Many lncRNAs, 5’UTRs, and pseudogenes are translated and some are likely to express functional proteins. Elife. 2015;4:e08890. doi: 10.7554/eLife.08890. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Li J, Liu C. Coding or noncoding, the converging concepts of RNAs. Front. Genet. 2019;10:496. doi: 10.3389/fgene.2019.00496. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131.Lu S, et al. A hidden human proteome encoded by ‘non-coding’ genes. Nucl. Acids Res. 2019;47:8111–8125. doi: 10.1093/nar/gkz646. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132.Nam J-W, Choi S-W, You B-H. Incredible RNA: Dual functions of coding and noncoding. Mol. Cells. 2016;39:367–374. doi: 10.14348/molcells.2016.0039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133.Sampath K, Ephrussi A. CncRNAs: RNAs with both coding and non-coding roles in development. Development. 2016;143:1234–1241. doi: 10.1242/dev.133298. [DOI] [PubMed] [Google Scholar]
  • 134.Dhamija S, Menon MB. Non-coding transcript variants of protein-coding genes—what are they good for? RNA Biol. 2018;15:1025–1031. doi: 10.1080/15476286.2018.1511675. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 135.Ponting CP, Oliver PL, Reik W. Evolution and functions of long noncoding RNAs. Cell. 2009;136:629–641. doi: 10.1016/j.cell.2009.02.006. [DOI] [PubMed] [Google Scholar]
  • 136.Johnsson P, Lipovich L, Grandér D, Morris KV. Evolutionary conservation of long non-coding RNAs; Sequence, structure, function. Biochim. Biophys. Acta BBA Gen. Subj. 2014;1840:1063–1071. doi: 10.1016/j.bbagen.2013.10.035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 137.Ulitsky I. Evolution to the rescue: Using comparative genomics to understand long non-coding RNAs. Nat. Rev. Genet. 2016;17:601–614. doi: 10.1038/nrg.2016.85. [DOI] [PubMed] [Google Scholar]
  • 138.Hezroni H, et al. Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species. Cell Rep. 2015;11:1110–1122. doi: 10.1016/j.celrep.2015.04.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 139.Ulitsky I, Shkumatava A, Jan CH, Sive H, Bartel DP. Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell. 2011;147:1537–1550. doi: 10.1016/j.cell.2011.11.055. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 140.Fantes J, et al. Mutations in SOX2 cause anophthalmia. Nat. Genet. 2003;33:462–463. doi: 10.1038/ng1120. [DOI] [PubMed] [Google Scholar]
  • 141.Shahryari A, Jazi MS, Samaei NM, Mowla SJ. Long non-coding RNA SOX2OT: Expression signature, splicing patterns, and emerging roles in pluripotency and tumorigenesis. Front. Genet. 2015;6:196. doi: 10.3389/fgene.2015.00196. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 142.Amaral PP, et al. Complex architecture and regulated expression of the Sox2ot locus during vertebrate development. RNA. 2009;15:2013–2027. doi: 10.1261/rna.1705309. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 143.Shahryari A, et al. Two novel splice variants of SOX2OT, SOX2OT-S1, and SOX2OT-S2 are coupregulated with SOX2 and OCT4 in esophageal squamous cell carcinoma. Stem Cells. 2014;32:126–134. doi: 10.1002/stem.1542. [DOI] [PubMed] [Google Scholar]
  • 144.Wang Z, Tan M, Chen G, Li Z, Lu X. LncRNA SOX2-OT is a novel prognostic biomarker for osteosarcoma patients and regulates osteosarcoma cells proliferation and motility through modulating SOX2. IUBMB Life. 2017;69:867–876. doi: 10.1002/iub.1681. [DOI] [PubMed] [Google Scholar]
  • 145.Messemaker TC, et al. Allele-specific repression of Sox2 through the long non-coding RNA Sox2ot. Sci. Rep. 2018;8:386. doi: 10.1038/s41598-017-18649-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 146.Li P-Y, Wang P, Gao S-G, Dong D-Y. Long noncoding RNA SOX2-OT: Regulations, functions, and roles on mental illnesses, cancers, and diabetic complications. BioMed. Res. Int. 2020;2020:e2901589. doi: 10.1155/2020/2901589. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147.Fong H, Hohenstein KA, Donovan PJ. Regulation of self-renewal and pluripotency by Sox2 in human embryonic stem cells. Stem Cells. 2008;26:1931–1938. doi: 10.1634/stemcells.2007-1002. [DOI] [PubMed] [Google Scholar]
  • 148.Pevny LH, Nicolis SK. Sox2 roles in neural stem cells. Int. J. Biochem. Cell Biol. 2010;42:421–424. doi: 10.1016/j.biocel.2009.08.018. [DOI] [PubMed] [Google Scholar]
  • 149.Robles V, Martí M, Belmonte JCI. Study of pluripotency markers in zebrafish embryos and transient embryonic stem cell cultures. Zebrafish. 2011;8:57–63. doi: 10.1089/zeb.2010.0684. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 150.Kondoh H, Uchikawa M, Ishii Y. Chapter 12—multiple roles for SOX2 in eye development. In: Kondoh H, Lovell-Badge R, editors. SOX2. Academic Press; 2016. pp. 217–233. [Google Scholar]
  • 151.Dvorakova M, et al. Early ear neuronal development, but not olfactory or lens development, can proceed without SOX2. Dev. Biol. 2020;457:43–56. doi: 10.1016/j.ydbio.2019.09.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 152.Agathocleous M, et al. A directional Wnt/β-catenin-Sox2-proneural pathway regulates the transition from proliferation to differentiation in the Xenopusretina. Development. 2009;136:3289–3299. doi: 10.1242/dev.040451. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 153.Matsushima D, Heavner W, Pevny LH. Combinatorial regulation of optic cup progenitor cell fate by SOX2 and PAX6. Development. 2011;138:443–454. doi: 10.1242/dev.055178. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 154.Fujimura N. WNT/β-catenin signaling in vertebrate eye development. Front. Cell Dev. Biol. 2016;4:138. doi: 10.3389/fcell.2016.00138. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 155.Kinney BA, et al. Sox2 and canonical Wnt signaling interact to activate a developmental checkpoint coordinating morphogenesis with mesoderm fate acquisition. Cell Rep. 2020;33:108311. doi: 10.1016/j.celrep.2020.108311. [DOI] [PMC free article] [PubMed] [Google Scholar]

Data Availability Statement

This Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession DANNHJ000000000. The version described in this paper is the first version, DANNHJ010000000. The databases, sequences, scripts and intermediate datasets generated during the current study are available in the Open Science Framework (OSF) repository, under the identifier https://doi.org/10.17605/OSF.IO/3Z7QN.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
