Abstract
Scarce genomic resources have limited the development of breeding programs for the serrasalmid fish Colossoma macropomum (tambaqui) and Piaractus mesopotamicus (pacu), the key native freshwater fish species produced in South America. The main objectives of this study were to design a dense SNP array for this fish group and to validate its performance on farmed populations from several locations in South America. Using multiple approaches based on different populations of tambaqui and pacu, final lists of 29,575 and 29,612 putative SNPs, respectively, were selected to print an Axiom AFFYMETRIX (THERMOFISHER) SerraSNP array. After validation, 74.17% (n = 21,963) and 71.25% (n = 21,072) of SNPs were classified as polymorphic variants in pacu and tambaqui, respectively. Most of the SNPs segregated within each population, with counts ranging from 14,199 to 19,856 in pacu and from 15,075 to 20,380 in tambaqui. Our results indicate high levels of genetic diversity, and samples clustered according to their hatchery origin. The SerraSNP array is a valuable genomic tool for in-depth genetic studies of these species.
Subject terms: Computational biology and bioinformatics, Genetics, Molecular biology
Introduction
Tambaquis, pacus and piranhas are included in the family Serrasalmidae (Ostariophysi: Characiformes)1, and they are broadly distributed throughout the major river systems of South America2. Serrasalmids (round-shaped fish) are a diverse group comprising 101 valid species, represented by a variety of feeding strategies and associated morphological adaptations3. While piranhas are popularly known for voracious and carnivorous behavior, tambaquis and pacus are omnivorous, possess specialized teeth for eating fruit and seeds, and are highly capable of exploiting natural food sources (zooplankton). Beyond their feeding habits, these species have excellent traits of economic interest for aquaculture, including high growth performance, ease of reproduction and market appreciation4–6.
Tambaqui (Colossoma macropomum), pacu (Piaractus mesopotamicus) and pirapitinga (Piaractus brachypomus) are the major serrasalmid fish produced by the aquaculture industry in South America4. Tambaqui and pirapitinga are mainly farmed in the Amazon region (northern Brazil, Colombia and Peru), while pacu is produced at higher latitudes in South America (southern Brazil and northern Argentina). These species have also been introduced for farming purposes in several countries in Asia, including China, Indonesia, Malaysia, Myanmar and Viet Nam5,7. About 142 thousand tonnes of tambaqui were produced in 2016, of which 96.4% (approximately 137 thousand tonnes) was produced by Brazil8. Pacu is also primarily produced in Brazil, where it is the second most produced native fish species, with 12 thousand tonnes produced in 2019 (ref. 9).
Despite the potential for serrasalmid species in aquaculture, their production remains entirely based on genetically unimproved stocks, and there is enormous untapped potential for selective breeding in these species. Traditional pedigree-based breeding programs have only recently been initiated for these species10–12. Currently, there are no reports of the incorporation of genomic information into breeding programs for serrasalmid species, which can be explained by the lack of genomic resources for this fish group. For example, only a few studies have described the discovery of novel molecular markers for these species13–15. However, the recent availability of genetic maps constructed using GBS (Genotyping by Sequencing) for tambaqui16 and RADseq (Restriction site-associated DNA sequencing) for pacu17 creates new opportunities for the application of molecular markers to uncover the genetic basis of economically and ecologically important traits.
Genome wide association studies (GWAS) are considered useful tools for dissecting complex traits in both natural and farmed populations18,19. In addition, the use of genomic selection has the potential to increase genetic gain20,21 and, consequently, boost emerging breeding programs of new species. A few GWAS have been performed for serrasalmid species. For example, suggestive Quantitative Trait Loci (QTLs) and genes have been associated with resistance against an important bacterial pathogen in pacu17, and to a lack of intermuscular bones in tambaqui22. One of the major limitations for application of GWAS and genomic selection in serrasalmid fish is the lack of a standard high-throughput and robust genotyping assay. Dense single nucleotide polymorphism (SNP) arrays have been developed for popular farmed fish species, such as Atlantic salmon23,24, rainbow trout25, coho salmon26, catfish27, tilapia28 and carp29. These arrays have been broadly used to study traits of economic importance and to apply genomic selection in aquaculture breeding programs30,31.
In the present study, the main objective was to design and evaluate a multi-species SNP array for two key serrasalmid fish of economic importance, C. macropomum and P. mesopotamicus, using SNPs from multiple sources including different populations from commercial broodstocks and breeding programs. The SerraSNP array was developed using AFFYMETRIX Axiom (THERMOFISHER) technology and validated in samples from various populations of pacu and tambaqui. This dense SNP array was also tested in a closely related serrasalmid (P. brachypomus) to assess its potential use in other commercial species.
Results
The sequence data used to generate the SNP dataset from ddRADseq and RADseq are part of a larger ongoing study aimed at exploring the genomic diversity of farmed tambaqui and pacu. Therefore, the descriptive sequence data for ddRADseq and RADseq will be prepared and presented in separate studies. After the de novo construction of the loci catalog and subsequent read alignment, thousands of SNPs were identified in both species. Quality-control filtering of putative SNPs yielded an initial selection of 130,403 and 81,848 filtered SNPs for pacu and tambaqui, respectively. The in silico prediction to select the best probes according to the Axiom array criteria resulted in 99,682 and 42,851 recommended markers for pacu and tambaqui, respectively (Table 1). A further filtering step was applied to choose 30K SNPs for each species, giving a final list of 29,612 putative SNPs for pacu and 29,575 putative SNPs for tambaqui to be included on the SerraSNP Axiom array. Most of the SNPs incorporated in the SerraSNP array were obtained by RADseq (17,932) in pacu and by ddRADseq (25,929) in tambaqui (Table 2). A total of 2200 validated SNPs positioned in the tambaqui linkage map described by Nunes et al.16 passed the SNP filtering steps and were included in the final array design (Table 2).
Table 1.
SNP classification according to the Axiom quality control criteria (p-convert).
Category | Number of SNPs (pacu) | Number of SNPs (tambaqui) |
---|---|---|
Recommended | 99,682 | 42,851 |
Neutral | 18,308 | 11,857 |
Not_recommended | 12,413 | 27,140 |
Total | 130,403 | 81,848 |
Table 2.
SNP performance in the Axiom array, according to the method of SNP discovery in each species.
Species | Method | Putative SNPs | Selected SNPs | Polymorphic SNPs | Conversion rate | MAF |
---|---|---|---|---|---|---|
Pacu | RNA-Seq | 80,004 | 8052 | 5842 | 0.72 | 0.254 |
Pacu | RADseq | 43,355 | 17,932 | 13,345 | 0.74 | 0.175 |
Pacu | ddRADseq | 7044 | 3628 | 2776 | 0.76 | 0.160 |
Pacu | Total | 130,403 | 29,612 | 21,963 | 0.74 | 0.203 |
Tambaqui | RNA-Seq | 18,282 | 1446 | 480 | 0.33 | 0.234 |
Tambaqui | GBS | 6803 | 2200 | 1888 | 0.85 | 0.237 |
Tambaqui | ddRADseq | 56,763 | 25,929 | 18,704 | 0.72 | 0.260 |
Tambaqui | Total | 81,848 | 29,575 | 21,072 | 0.71 | 0.247 |
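The conversion rate in Table 2 can be read as the share of SNPs placed on the array that were validated as polymorphic. The following minimal Python sketch recomputes the species totals from the counts in Table 2; the function and variable names are illustrative assumptions, not part of the original analysis.

```python
# Counts copied from Table 2: (selected SNPs, polymorphic SNPs) per discovery method.
TABLE2 = {
    "pacu": {"RNA-Seq": (8052, 5842), "RADseq": (17932, 13345), "ddRADseq": (3628, 2776)},
    "tambaqui": {"RNA-Seq": (1446, 480), "GBS": (2200, 1888), "ddRADseq": (25929, 18704)},
}


def conversion_rate(selected, polymorphic):
    """Fraction of SNPs placed on the array that were validated as polymorphic."""
    return polymorphic / selected


def species_totals(species):
    """Sum selected and polymorphic SNP counts over all discovery methods."""
    rows = list(TABLE2[species].values())
    return sum(s for s, _ in rows), sum(p for _, p in rows)


for species in TABLE2:
    selected, polymorphic = species_totals(species)
    rate = conversion_rate(selected, polymorphic)
    print(f"{species}: {polymorphic}/{selected} = {rate:.4f}")
```

The totals reproduce the validation rates of 74.17% (pacu) and 71.25% (tambaqui) quoted in the text.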
The SerraSNP array was used to genotype 94 and 58 samples of pacu and tambaqui, respectively. Lists of the putative polymorphic SNPs and their flanking sequences for tambaqui and pacu are provided in Supplementary Table S1 and Supplementary Table S2, respectively, including the source of sequencing and the classification based on p-convert values and wobble criteria. Almost all samples passed the QC and the genotype call rate threshold (> 97%), except two samples of pacu and one of tambaqui. The classification of SNPs according to their quality showed that 74.17% (n = 21,963) and 71.25% (n = 21,072) were classified as polymorphic (either “Poly High Resolution” or “No Minor Hom”) in pacu and tambaqui, respectively (Table 3). The average MAF of these polymorphic SNPs in the combined samples of pacu and tambaqui was 0.203 and 0.247, respectively (Table 4). The lower MAF value in pacu is due to the larger number of SNPs with MAF between 0.01 and 0.049, most of which belong to the RADseq and ddRADseq datasets (Fig. 1). The SNP discovery technique had little influence on SNP segregation within populations (Table 4; Supplementary Table S3). Most SNPs were segregating within each population with MAF > 0.01, ranging from 14,199 (Hatchery 3) to 19,856 (Hatchery 4) in pacu, and from 15,075 (Hatchery 9) to 20,380 (Brazil, combined) in tambaqui (Table 4).
Table 3.
Summary of the SNP classification according to their performance in the Axiom SerraSNP array of each species.
Category | Pacu: number of markers | Pacu: % | Tambaqui: number of markers | Tambaqui: % |
---|---|---|---|---|
PolyHighResolution | 16,780 | 56.67 | 19,661 | 66.48 |
NoMinorHom | 5183 | 17.50 | 1411 | 4.77 |
MonoHighResolution | 1771 | 5.98 | 1468 | 4.96 |
CallRateBelowThreshold | 1166 | 3.94 | 1443 | 4.88 |
OffTargetVariant | 570 | 1.92 | 550 | 1.86 |
Other | 4142 | 13.99 | 5042 | 17.05 |
Total | 29,612 | 100.00 | 29,575 | 100.00 |
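The polymorphic fraction reported in the text is the sum of the PolyHighResolution and NoMinorHom categories in Table 3. A minimal Python sketch of that aggregation follows, using only the counts in Table 3; the names `TABLE3`, `POLYMORPHIC` and `polymorphic_share` are illustrative.

```python
# Marker counts per Axiom classification category, copied from Table 3.
TABLE3 = {
    "pacu": {
        "PolyHighResolution": 16780, "NoMinorHom": 5183, "MonoHighResolution": 1771,
        "CallRateBelowThreshold": 1166, "OffTargetVariant": 570, "Other": 4142,
    },
    "tambaqui": {
        "PolyHighResolution": 19661, "NoMinorHom": 1411, "MonoHighResolution": 1468,
        "CallRateBelowThreshold": 1443, "OffTargetVariant": 550, "Other": 5042,
    },
}

# Categories treated as polymorphic in the text.
POLYMORPHIC = ("PolyHighResolution", "NoMinorHom")


def polymorphic_share(counts):
    """Return (polymorphic markers, total markers, polymorphic percentage)."""
    total = sum(counts.values())
    polymorphic = sum(counts[name] for name in POLYMORPHIC)
    return polymorphic, total, 100.0 * polymorphic / total


for species, counts in TABLE3.items():
    polymorphic, total, pct = polymorphic_share(counts)
    print(f"{species}: {polymorphic} of {total} markers polymorphic ({pct:.2f}%)")
```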
Table 4.
Descriptive population genetic estimates for the sampled pacu and tambaqui populations included in the validation of the array.
Species | Population | n | SNPs MAF > 0.01 | Average MAF | Ho | He |
---|---|---|---|---|---|---|
Pacu | Hatchery 1 | 42 | 19,608 | 0.221 | 0.306 | 0.307 |
Pacu | Hatchery 2 | 21 | 18,461 | 0.214 | 0.307 | 0.294 |
Pacu | Hatchery 3 | 13 | 14,199 | 0.235 | 0.402 | 0.313 |
Pacu | Hatchery 4 | 18 | 19,856 | 0.210 | 0.285 | 0.292 |
Pacu | All populations | 94 | 21,963 | 0.203 | 0.276 | 0.283 |
Tambaqui | Brazil, combined | 28 | 20,380 | 0.266 | 0.362 | 0.356 |
Tambaqui | Hatchery 5 | 9 | 19,572 | 0.267 | 0.390 | 0.357 |
Tambaqui | Hatchery 6 | 9 | 17,172 | 0.268 | 0.426 | 0.352 |
Tambaqui | Hatchery 7 | 10 | 19,078 | 0.274 | 0.379 | 0.365 |
Tambaqui | Colombia, Hatchery 8 | 15 | 15,649 | 0.247 | 0.351 | 0.331 |
Tambaqui | Peru, Hatchery 9 | 15 | 15,075 | 0.248 | 0.353 | 0.329 |
Tambaqui | All populations | 58 | 21,072 | 0.247 | 0.306 | 0.338 |
Pirapitinga | Wild | 10 | 3042 | 0.278 | 0.454 | 0.348 |
Figure 1.
SNP distribution according to the MAF values of each species. 21,963 validated SNPs (74.17%) in pacu (blue) and 21,072 validated SNPs (71.25%) in tambaqui (green) were considered. The validation of SNPs was carried out in 94 pacu and in 58 tambaqui individuals, respectively.
Regarding SNP annotation, 14,033 (66.9%) and 13,046 SNPs (62.7%) were located in transcribed regions in pacu and tambaqui, respectively (Supplementary Table S4). Of the RNA-Seq-derived SNPs in pacu, 2047 (48.3%) were in intergenic regions, 1486 (35.1%) in untranslated regions (5′ and 3′ UTR) and 705 (16.6%) in coding sequences (cds), including 490 synonymous and 215 missense variants (Table 5). Of the RNA-Seq-derived SNPs in tambaqui, 208 (69.8%) were identified in intergenic regions, 31 (10.4%) in 5′ and 3′ UTRs, and 59 (19.8%) were synonymous SNPs (Table 5).
Table 5.
Classification of SNPs from RNA-Seq included in the array, considering the annotation of 4093 SNPs from pacu and 294 SNPs from tambaqui.
Category | Number of SNPs, pacu (%) | Number of SNPs, tambaqui (%) |
---|---|---|
Intergenic | 2047 (48.3) | 208 (69.8) |
3′ UTR | 1202 (28.4) | 17 (5.7) |
5′ UTR | 284 (6.7) | 14 (4.7) |
Exon, synonymous SNP | 490 (11.6) | 59 (19.8) |
Exon, missense SNP | 215 (5.0) | – |
Total of effects | 4238 (100) | 298 (100) |
Percentages are given in parentheses.
To assess the utility of the SerraSNP array in a closely related species that is also relevant for aquaculture, a set of DNA samples of pirapitinga (Piaractus brachypomus) was also tested. The samples passed the QC call rate when using a 94% threshold in the 30K SNP dataset of pacu, which demonstrates a high SNP conversion rate in this species. However, the polymorphic rate was much lower than in pacu, with only 3042 polymorphic SNPs (about 10% of all 30K pacu SNPs on the SerraSNP array) (Table 4). Attempts to transfer tambaqui markers to pacu or pirapitinga DNA resulted in failed SNP calling, even with a lower QC call rate threshold (< 70%). The same occurred when the pacu markers were tested on tambaqui DNA samples.
Population genetic parameters were calculated for the different hatcheries of both species. The expected heterozygosity ranged from 0.29 to 0.31 in pacu and from 0.33 to 0.37 in tambaqui. The expected heterozygosity for the pirapitinga samples was 0.35. Our results indicate that populations of both species have high levels of genetic diversity, as measured by MAF and heterozygosity (Table 4). Overall, no strong evidence of heterozygote deficiency was detected, and most populations had higher levels of observed heterozygosity than expected under Hardy–Weinberg equilibrium (HWE). In both species, the populations with the highest MAF also showed the highest expected heterozygosity (Hatchery 3 for pacu and Hatchery 7 for tambaqui) and therefore have the highest genetic variability in the present study (Table 4).
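The estimates in Table 4 (MAF, Ho and He) can be illustrated with a short Python sketch. It assumes genotypes coded as counts of one allele (0, 1 or 2, with `None` for a missing call), He computed as 2pq per SNP, and averages taken over SNPs with MAF > 0.01. These conventions are assumptions made for illustration; the software used in the study may apply different estimators or corrections.

```python
from statistics import mean


def snp_stats(genotypes):
    """Return (MAF, Ho, He) for one SNP, or None if no genotype was called.

    genotypes: allele counts per individual (0, 1 or 2), None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    n = len(called)
    p = sum(called) / (2 * n)                    # frequency of the counted allele
    maf = min(p, 1 - p)                          # minor allele frequency
    ho = sum(1 for g in called if g == 1) / n    # observed heterozygosity
    he = 2 * p * (1 - p)                         # expected heterozygosity under HWE
    return maf, ho, he


def population_summary(matrix, min_maf=0.01):
    """Average MAF, Ho and He over SNPs with MAF > min_maf.

    matrix: list of SNPs, each a list of genotypes for the same individuals.
    """
    stats = [s for s in map(snp_stats, matrix) if s is not None and s[0] > min_maf]
    if not stats:
        return {"n_snps": 0, "mean_maf": None, "mean_ho": None, "mean_he": None}
    return {
        "n_snps": len(stats),
        "mean_maf": mean(s[0] for s in stats),
        "mean_ho": mean(s[1] for s in stats),
        "mean_he": mean(s[2] for s in stats),
    }


# Toy data: three SNPs genotyped in four individuals (not real data).
toy = [[0, 1, 2, 1], [0, 0, 0, 0], [1, 1, None, 2]]
print(population_summary(toy))
```

In the toy data the second SNP is monomorphic and is therefore excluded from the averages.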
According to the IBS analysis for pacu (Supplementary Fig. S1) and tambaqui (Supplementary Fig. S2) and the DAPC results (Fig. 2), pacu and tambaqui samples show genetic structure, clustering according to their hatchery origin. In pacu, hatchery2 and hatchery3 tended to cluster separately from the group formed by hatchery1 and hatchery4, which show high genetic similarity (Supplementary Fig. S1). In tambaqui, hatchery8 (Colombia) and hatchery9 (Peru) show clear evidence of genetic differentiation from the Brazilian populations, which showed little genetic structure among themselves, particularly hatchery6 and hatchery7 (Supplementary Fig. S2).
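As an illustration of the identity-by-state (IBS) measure underlying this kind of clustering, the Python sketch below computes the proportion of alleles shared IBS between pairs of individuals. It assumes genotypes coded 0, 1 or 2 with `None` for missing calls, and the common convention that a locus contributes 2 − |g1 − g2| shared alleles; the study's software may differ in detail.

```python
def ibs_similarity(a, b):
    """Proportion of alleles shared identical-by-state between two individuals.

    a, b: genotype lists (0/1/2, None for missing) over the same SNPs.
    Returns a value between 0 and 1, or None if no SNP is called in both.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    # Each locus contributes 2, 1 or 0 shared alleles: 2 - |x - y|.
    return sum(2 - abs(x - y) for x, y in shared) / (2 * len(shared))


def ibs_matrix(samples):
    """Symmetric matrix of pairwise IBS similarities for a list of individuals."""
    n = len(samples)
    return [[ibs_similarity(samples[i], samples[j]) for j in range(n)] for i in range(n)]


# Toy genotypes for three individuals at three SNPs (not real data).
toy_samples = [[0, 1, None], [1, 1, 2], [0, 1, 2]]
for row in ibs_matrix(toy_samples):
    print(row)
```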
Figure 2.
DAPC analysis from hatcheries of pacu (Hatchery 1–4) (a) and tambaqui (Hatchery 5–9) (b) to demonstrate the genetic structure using the SNPs. DAPC was performed adopting an optimum number of principal components (PC = 10) calculated using the α-score function of the Adegenet software.
Due to the absence of a published chromosome-level reference genome for pacu and tambaqui, the genome coverage of the polymorphic SNPs cannot be evaluated in the present study. However, correlation analysis between previously mapped SNPs included on the SerraSNP array and the corresponding linkage maps for tambaqui (SNPs from the GBS technique16) and pacu (RADseq17) revealed high and positive values for the relative amount (0.90 and 0.99, respectively) and density (0.98 and 0.99, respectively) (Supplementary Table S5), suggesting a wide coverage of these markers in the genome. Moreover, the extent of linkage disequilibrium (LD) between markers was assessed by SNP pruning for the different populations of pacu and tambaqui (Fig. 3). Pairwise r2 was calculated among the polymorphic SNPs (MAF > 0.01), and a range of r2 pruning thresholds (from 0.1 to 0.9) was applied to determine the number of markers remaining after each filtering step in each population of both species. This gives a picture of the number of SNPs carrying different levels of redundant genotypic information. Generally, different profiles of increasing numbers of markers at increasing LD pruning thresholds were detected among the populations of pacu. For instance, the lowest and the highest number of pruned SNPs from pacu hatchery1 and hatchery3 were at different levels of LD. Pacu hatchery2 and hatchery4 showed similar numbers of SNPs pruned at different LD levels. Assuming an r2 > 0.8, which is considered a strong LD threshold, the number of informative SNPs without highly redundant genotypic information ranges from about 9K (hatchery3) to 19K (hatchery1), depending on the population of pacu analysed. In contrast, the number of pruned SNPs at different levels of LD showed a similar trend across the tambaqui populations analysed. Assuming an r2 > 0.8, the number of informative SNPs without highly redundant genotypic information ranges from about 11K (hatchery6) to 15K (hatchery5), depending on the population of tambaqui analysed.
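The LD pruning procedure described above can be sketched in Python as follows. The sketch uses the squared Pearson correlation of genotype counts as r2 and a simple greedy, all-against-all pruning rule. Both choices are assumptions for illustration, since the text does not state the exact algorithm, window size or software options used.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2, None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    if sxx == 0 or syy == 0:       # monomorphic SNP: correlation undefined
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def ld_prune(snps, threshold):
    """Greedy pruning: keep a SNP only if r2 with every SNP already kept is <= threshold."""
    kept = []
    for idx, genotypes in enumerate(snps):
        if all(r_squared(genotypes, snps[k]) <= threshold for k in kept):
            kept.append(idx)
    return kept


def remaining_after_pruning(snps, thresholds):
    """Number of SNPs remaining at each r2 pruning threshold."""
    return {t: len(ld_prune(snps, t)) for t in thresholds}


# Toy data: SNPs 0, 1 and 2 are in complete LD; SNP 3 is uncorrelated with SNP 0.
toy_snps = [[0, 1, 2, 1, 0], [0, 1, 2, 1, 0], [2, 1, 0, 1, 2], [0, 0, 1, 2, 2]]
thresholds = [round(0.1 * k, 1) for k in range(1, 10)]   # 0.1 to 0.9
print(remaining_after_pruning(toy_snps, thresholds))
```

Comparing all pairs is quadratic in the number of SNPs; for array-scale data, window-based pruning as implemented in dedicated tools is the more practical route.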
Figure 3.
Linkage disequilibrium (LD) by SNP pruning between markers among all the sampled pacu and tambaqui populations (Hatchery 1–9). 21,963 validated SNPs (74.17%) in pacu and 21,072 validated SNPs (71.25%) in tambaqui were considered.
Discussion
Dense SNP arrays have been shown to facilitate genome-scale studies by allowing the simultaneous evaluation of thousands of SNPs in commercially important fish species, such as Atlantic salmon23,24, rainbow trout25 and tilapia28. These markers have facilitated the analysis of GWAS for important commercial traits (growth rate, sex determination and disease resistance) and also the implementation of genomic selection in fish species32–35. Although pacu and tambaqui represent the main native fish of continental aquaculture in South America, studies aimed at the incorporation of genome-wide SNP information to boost the breeding programs of these species are scarce. Therefore, the broad utility (multi-species) and open access format of the array presented here will allow the advancement of genomic studies in this fish group and support ongoing and emerging breeding programs in South America.
After assessment of the cluster properties of each SNP on the SerraSNP array, most of the markers on the Axiom platform were designated as high quality and polymorphic, with validation rates of 74.17% and 71.25% in pacu and tambaqui, respectively. This conversion rate is similar to those of previous SNP arrays developed for other aquaculture species36,37, demonstrating the efficacy of our multi-strategy design and stringent filtering steps for robust SNP discovery. A balanced conversion rate was observed across the four main SNP discovery techniques, except for the RNA-Seq-derived SNPs in tambaqui, which had a lower conversion rate (33%). This could be linked to the low-coverage sequencing (454 Roche technology) and the small sample used to generate the RNA-Seq database (i.e., the previously published dataset was not sequenced primarily for SNP discovery), or to the absence of a reference genome, resulting in false positive markers23. In contrast, the previously published GBS-derived SNPs used in tambaqui showed the highest conversion rate (85%), as a result of a particular strategy of SNP discovery and validation using linkage mapping experiments16. In pacu, the largest dataset of candidate SNPs provided to AFFYMETRIX was derived from RNA-Seq, but only 10% of these SNPs were incorporated on the Axiom array. This large discrepancy compared to RADseq and ddRADseq (40–50%) is due to the larger size of the RNA contigs, which resulted in multiple SNPs per contig; as only one SNP per contig was selected, a high dropout of SNPs derived from the RNA-Seq dataset was observed.
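The effect of retaining one SNP per contig can be sketched as below. The text does not state which SNP was kept from each contig; this hypothetical Python example keeps the candidate with the highest score (for instance a p-convert value) purely for illustration, and all identifiers and values are made up.

```python
def one_snp_per_contig(candidates):
    """Keep a single SNP per contig.

    candidates: iterable of (contig_id, snp_id, score) tuples.
    The highest-scoring SNP of each contig is retained; this selection
    criterion is an assumption, not the rule reported in the study.
    Returns a dict mapping contig_id to the retained snp_id.
    """
    best = {}
    for contig, snp, score in candidates:
        if contig not in best or score > best[contig][1]:
            best[contig] = (snp, score)
    return {contig: snp for contig, (snp, _) in best.items()}


# Toy candidates: a long contig with three SNPs and a short one with a single SNP.
toy_candidates = [
    ("contig_A", "snp_1", 0.62),
    ("contig_A", "snp_2", 0.91),
    ("contig_A", "snp_3", 0.40),
    ("contig_B", "snp_4", 0.75),
]
selected = one_snp_per_contig(toy_candidates)
print(selected)
print(f"retained {len(selected)} of {len(toy_candidates)} candidate SNPs")
```

Long contigs carrying many candidate SNPs lose most of them under such a rule, which is consistent with the high dropout of RNA-Seq-derived SNPs noted above.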
Overall, although a high conversion rate of QC-filtered SNPs was reported, it is likely that a small proportion of false positive SNPs discovered in these sequencing experiments remains, particularly in the ddRAD and RADseq datasets. This is due to potential sources of error and bias unique to the library preparation protocol, especially PCR duplicates (clonal DNA fragments originating during PCR steps), which can lead to downstream genotyping errors (for review, see Andrews et al.38). Moreover, false positive or failed SNPs may also occur among RNA-seq-derived SNPs because Mendelian errors in pedigreed samples could not be detected by this method and/or because of limitations in the genotyping technology (SNPs located close to exon–intron boundaries).
It is also worth noting that the rate of monomorphic SNPs was low, with values of 5.98% and 4.96% in pacu and tambaqui, respectively, especially when compared to the array developed for Atlantic salmon using similar techniques (12–53%)23. The high proportion of monomorphic SNPs (a large number of false positive SNPs) in the latter study was mainly attributed to the RNA-seq technique (53%), which is particularly susceptible to false positive marker discovery23. Moreover, the conversion rate in the Atlantic salmon array (46%) was relatively low compared to the SerraSNP array (71–74%), mostly due to the duplicated nature of the salmonid genome, for which it was necessary to apply special strategies to avoid false positive SNPs, such as effective removal of putative paralogous variants by RR-seq (reduced-representation sequencing) of a haploid fish (conversion rate of 74%).
In the present study, a high fraction of polymorphic SNPs was identified in putative transcribed regions of the genomes of pacu and tambaqui (66.9 and 62.7%, respectively) (Supplementary Table S4). Moreover, particularly for the RNA-Seq-derived SNPs, several markers were also identified in coding sequences or regulatory regions of transcripts; therefore, they are likely to be functional and could be linked to phenotypes of economic interest. Thus, the enrichment of SNPs associated with transcribed genes in the SerraSNP array will be useful for determining the genetic architecture of target traits in aquaculture and, consequently, for the inclusion of genomic selection in breeding programs of these species.
Population genetic analysis showed that the SNP discovery strategy used here allowed us to identify and develop a high-quality 60K genotyping array that can be reliably used to genotype different populations of farmed pacu and tambaqui. Low to moderate genetic diversity values and genetic differentiation between farmed populations have already been found for both species, probably due to the lack of controlled management of broodstocks, although genetic similarity was found between some farms, most likely because physical proximity favors the exchange of fish between them39,40.
Results from the assessment of SNP segregation in different populations indicate that the SNP panel developed in the present study would be useful for genetic applications across different populations of pacu and tambaqui, including high-resolution population genomics, genome-wide association studies and genomic selection. Nevertheless, the performance of this SNP array is expected to decrease slightly in some populations, mainly because of the genetic composition of the populations used for SNP discovery (rather than ascertainment bias of the techniques). For instance, most of the samples used for SNP discovery in tambaqui were from Brazil, creating a bias for the Colombian and Peruvian hatcheries. On the other hand, a similar rate of SNP segregation at population level was detected across the different SNP discovery techniques, even when a limited sampling source was used (only 3 families by ddRAD in pacu).
The conversion rate when using SNP markers between species (pacu into tambaqui, and vice versa) or in an additional species (pirapitinga) is low. This situation is comparable to the 250K SNP array for catfish, in which transfer to different species also resulted in a low number of polymorphic SNPs27. Only 10% of the 30K SNPs for pacu on the SerraSNP array were polymorphic in pirapitinga; thus, the probes designed from pacu sequences could hybridize to the genomic DNA of pirapitinga, but the level of polymorphic SNPs in the latter species was limited. However, as there are no SNP marker data available for pirapitinga to date, the 3K polymorphic SNPs identified are a valuable resource for the application of genomics to population genetic and evolutionary studies, and for selective breeding in this species, especially given the high values of genetic diversity identified (MAF = 0.28; Ho = 0.45 and He = 0.35).
Although a larger sample size per population of each species is still necessary for LD analysis and haplotype construction, the preliminary results of LD pruning demonstrate that the SNP panels of pacu and tambaqui presented here will be useful for the design of low-density SNP panels with little redundant genotypic information. As observed previously in other fish species41, lower-density SNP panels can be designed and applied for genomic selection and breeding, with fewer tag markers selected for traits of interest. In addition, correlation values between the mapped SNPs included in the SerraSNP array and the linkage maps previously published for tambaqui16 and pacu17 showed high coverage and representation of the SerraSNP array throughout the genome, which reflects its reliability and usefulness for genetic studies aimed at the development of breeding programs for both species.
This study describes the development and analysis of a dense SNP array for two Serrasalmidae species. A large database of SNP markers was developed using multiple approaches for pacu and tambaqui, both native species widely farmed in South America. Following stringent filtering criteria, SNP assays for these two fish species were combined on the 60K SNP array to generate a 30K high-quality SNP panel for each species. Testing of the array in diverse populations revealed a high number of informative SNPs that are shared among populations within each species. The array can also be used to assess genetic diversity and population structure between populations. The SerraSNP array is open access, which will facilitate the study of important economic and ecological/evolutionary traits in these two Serrasalmidae species, including applications such as genomic selection, QTL mapping, phylogenetic analyses and conservation genetic programs.
Methods
Ethics statement
This study was conducted in strict accordance with the recommendations of the National Council for Control of Animal Experimentation (CONCEA) (Brazilian Ministry for Science, Technology and Innovation) and was approved by the Ethics Committee on Animal Use (CEUA numbers 19.005/17 and 19.006/17) of Faculdade de Ciências Agrárias e Veterinárias, UNESP, Campus Jaboticabal, SP, Brazil.
Sample information
The tambaqui samples used for SNP discovery were obtained from broodstocks of different commercial hatcheries in South America and from the breeding nucleus of the Aquaculture Center (CAUNESP) of São Paulo State University (UNESP), Brazil (Table 6).
Table 6.
Details about the sampling and methods used for SNP discovery for each species.
Species | Source | Method | Samples | Origin |
---|---|---|---|---|
Tambaqui | Hatchery 1, broodstock | ddRADseq (a) | 11 | South Brazil |
Tambaqui | Hatchery 2, broodstock | ddRADseq (a) | 17 | South Brazil |
Tambaqui | Hatchery 3, broodstock | ddRADseq (a) | 17 | South Brazil |
Tambaqui | Hatchery 4, broodstock | ddRADseq (a) | 15 | North Brazil |
Tambaqui | Hatchery 5, broodstock | ddRADseq (a) | 18 | North Brazil |
Tambaqui | Hatchery 6, broodstock | ddRADseq (a) | 21 | North Brazil |
Tambaqui | Hatchery 7, broodstock | ddRADseq (a) | 15 | Colombia |
Tambaqui | Hatchery 8, broodstock | ddRADseq (a) | 16 | Colombia |
Tambaqui | Hatchery 9, broodstock | ddRADseq (a) | 31 | Peru |
Tambaqui | Hatchery 10, broodstock | ddRADseq (a) | 8 | Peru |
Tambaqui | Breeding nucleus (Caunesp), broodstock | ddRADseq (a) | 29 | South Brazil |
Tambaqui | Breeding nucleus (Caunesp), 20 full-sib families | ddRADseq (b) | 600 | South Brazil |
Tambaqui | Nunes et al.16, 1 full-sib family | GBS | 124 | North Brazil |
Tambaqui | Ariede et al.14, pool of 10 random individuals | RNA-Seq | 10 | South Brazil |
Tambaqui | Gomes et al.47, pool of 8 random individuals | RNA-Seq | 8 | North Brazil |
Pacu | Mastrochirico-Filho et al.17, 14 full-sib families | RADseq | 400 | South Brazil |
Pacu | Breeding nucleus (Caunesp), 3 full-sib families | ddRADseq (a) | 100 | South Brazil |
Pacu | Mastrochirico-Filho et al.46, 3 full-sib families | RNA-Seq | 36 | South Brazil |
(a) Enzyme combination I (SphI and MluCI).
(b) 3 families used enzyme combination I (SphI and MluCI) and 17 families used combination II (NlaIII and MluCI).
The commercial hatcheries represented ten different populations, of which six were from Brazil, two from Peru and two from Colombia. The samples from the Caunesp breeding nucleus were composed of 29 selected breeders, plus the parents and offspring from 20 full-sib families generated with a hierarchical mating scheme using 6 dams and 17 sires. In total, the 20 full-sib family dataset consisted of 23 parents and 577 F1 individuals (≅ 30 fish per family). The SNP array was validated with 5 different populations of tambaqui (three from Brazil, one from Colombia and one from Peru). The three from Brazil were the same populations used for SNP discovery (Table 4).
The individuals of pacu used for SNP discovery were obtained from the breeding nucleus of CAUNESP and represented 17 full-sib families (Table 6), which were generated with a hierarchical mating scheme using 8 dams and 15 sires. In total, 23 parents and 477 F1 individuals were selected from the 17 families (≅ 25 fish per family). The SNP array was tested on four different populations of pacu collected from broodstocks of different commercial hatcheries in Brazil, which were not the same populations used for SNP discovery (Table 4).
SNP discovery and filtering
DNA extraction was performed using the DNeasy Blood & Tissue kit (QIAGEN). Purified DNA was quantified using the Qubit dsDNA BR Assay kit (INVITROGEN). As no reference genome is available for tambaqui or pacu, ddRADseq approaches were used for SNP discovery because they are appropriate for genome reduction and de novo assembly42,43. Previously published databases generated using different techniques, such as GBS, RADseq and RNA-Seq (Table 6), were also used to increase the power of SNP discovery throughout the genome.
ddRADseq library construction was completed as previously described by Peterson et al.43. Briefly, 25 ng of genomic DNA from each individual was double-digested (8 U/reaction) using one of two enzyme combinations, I (SphI and MluCI) or II (NlaIII and MluCI) (Table 6), and ligated to specific adapters for each enzyme (P1 and P2, 0.25 μM) using T4 DNA ligase, at 23 °C for 1 h 30 min and 65 °C for 10 min. P1 adapters had an additional 5 nucleotides that served as individual markers (barcodes). Size selection of the digested fragments was performed with E-Gel SizeSelect II (THERMOFISHER) equipment. Subsequently, PCR assays were performed to incorporate the identification index of each library (about 48 samples/library). Phusion enzyme conditions were used to perform the PCR assay. Reactions were purified with the AMPure XP Beads Kit and analysed by Agilent Bioanalyzer and Qubit. ddRADseq libraries were sequenced by NOVOGENE (Sacramento, USA) on an ILLUMINA HiSeq4000 (PE 150 bp).
After sequencing, data were analyzed using the software package Stacks for de novo SNP identification (STACKS v. 2.41; ref. 44). The subprograms of Stacks were run sequentially (process_radtags, ustacks, cstacks, sstacks). The Stacks populations module was used to generate genotype output data for the population samples. Initially, reads were demultiplexed and filtered using process_radtags, with parameters that removed reads with uncalled bases, discarded reads with low quality scores and excluded sequences in which barcodes and RAD cutsites were not found. Subsequently, the loci were constructed de novo, due to the absence of reference genomes for both species, using a minimum read depth of three (m = 3). A loci catalog was constructed in cstacks using a subset of individuals for samples derived from broodstocks, or using the parental individuals for samples with family structure. The maximum number of mismatches between sample loci when building the catalog was set to three (n = 3). The sstacks program was used to match individual putative loci (constructed by ustacks) against the catalog, followed by the populations module for SNP discovery, with filter parameters of minor allele frequency (MAF) > 0.05 and genotyping call rate > 0.7. To differentiate putative SNPs from sequencing errors, PLINK 1.9 software45 was used to exclude SNPs based on Hardy–Weinberg equilibrium (assessed within each broodstock) and Mendelian error rate (me 0.1 for family structure). Moreover, individuals with high missing genotype rates (mind 0.3) were also discarded.
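The thresholds quoted above (MAF > 0.05, genotyping call rate > 0.7, and removal of individuals with more than 30% missing genotypes) can be illustrated with the Python sketch below. It only mimics the logic of these filters on a small genotype matrix and is not the Stacks or PLINK implementation; the function names, the 0/1/2 genotype coding, and the order of the two filtering steps are assumptions.

```python
def call_rate(calls):
    """Fraction of non-missing genotype calls (missing calls are None)."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def filter_genotypes(matrix, min_maf=0.05, min_call_rate=0.7, max_ind_missing=0.3):
    """Apply SNP-level and then individual-level filters.

    matrix[i][j] is the genotype of individual i at SNP j.
    Returns (kept_individuals, kept_snps) as lists of indices.
    """
    n_snps = len(matrix[0])
    kept_snps = []
    for j in range(n_snps):
        column = [row[j] for row in matrix]
        if call_rate(column) > min_call_rate and minor_allele_frequency(column) > min_maf:
            kept_snps.append(j)
    kept_individuals = []
    for i, row in enumerate(matrix):
        retained = [row[j] for j in kept_snps]
        # Missingness is evaluated on the SNPs that passed the first step.
        if retained and 1 - call_rate(retained) <= max_ind_missing:
            kept_individuals.append(i)
    return kept_individuals, kept_snps


# Toy matrix: four individuals (rows) by five SNPs (columns); not real data.
toy_matrix = [
    [0, 1, 2, 0, 1],
    [1, 1, 2, 0, 0],
    [2, 0, None, 0, 1],
    [None, None, 1, 0, None],
]
print(filter_genotypes(toy_matrix))
```

In the toy matrix, the fourth SNP is monomorphic and is dropped, and the last individual is removed for excessive missing data.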
Raw sequences from previous RNA-Seq studies were downloaded from the NCBI database (see Data Availability section) and processed as follows. The pacu RNA-Seq dataset originated from 36 individuals challenged with the bacterium Aeromonas hydrophila (for details, see Mastrochirico-Filho et al.46). The tambaqui RNA-Seq raw data were obtained from 10 liver samples and 8 muscle and skin samples, all collected from random individuals, as described in detail by Ariede et al.14 and Gomes et al.47, respectively. High quality reads were used to assemble a de novo transcriptome using TRINITY v.2.9.1 with standard parameters48, namely a k-mer value of 25 and contigs longer than 200 bp. The read mapping was performed by BOWTIE2 v.2.3.4.349. Redundancy was reduced using the CD-HIT-EST v.4.6.8 software, clustering sequences with 95% identity50. After mapping, the SAMTOOLS program51 was used to manipulate the BAM files for SNP discovery. Variants were retained if they had a quality (Phred score) equal to or higher than 20 and a minimum sequencing depth of 10 reads. Abundant and repetitive SNPs clustered in small regions were excluded. The VCFtools program52 was used to eliminate SNPs with MAF values less than 0.05.
The SNPs from GBS were the same as those used for linkage mapping in tambaqui, described in detail by Nunes et al.16.
SNP selection for Axiom array design
A list consisting of 130,403 and 81,848 putative SNP markers for pacu and tambaqui, respectively, was provided to THERMOFISHER (AFFYMETRIX) as 71-mer nucleotide sequences, with both alleles at the target SNP highlighted at position 36. A p-convert value (probability of a putative SNP locus converting to a reliable assay on the Axiom array) was produced by the THERMOFISHER team for the flanking sequence of each submitted SNP. Probes were analysed for each SNP side (forward and reverse), and then classified as recommended, neutral, or not recommended, based on p-convert values and wobble criteria (i.e., nearby interfering polymorphisms or another SNP in the flanking sequence of the target marker).
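As an illustration of the 71-mer layout described above (35 flanking bases, the target SNP at position 36, 35 flanking bases), a minimal Python sketch is given below. The bracketed allele notation, the function name and the use of a 0-based SNP index are assumptions made for this example; the exact submission format required by the manufacturer is not specified in the text.

```python
from typing import Optional

FLANK = 35  # 35 + 1 + 35 = 71 bases, target SNP at 1-based position 36


def build_71mer(contig: str, snp_index: int, ref: str, alt: str) -> Optional[str]:
    """Return the 71-mer around a SNP, or None if a flank would be truncated.

    snp_index is the 0-based position of the SNP within the contig.
    The '[ref/alt]' marking of the two alleles is a hypothetical notation.
    """
    if snp_index < FLANK or snp_index + FLANK >= len(contig):
        return None  # SNP too close to a contig end for a full 35-base flank
    if contig[snp_index] != ref:
        raise ValueError("reference allele does not match the contig base")
    left = contig[snp_index - FLANK:snp_index]
    right = contig[snp_index + 1:snp_index + 1 + FLANK]
    return f"{left}[{ref}/{alt}]{right}"


if __name__ == "__main__":
    toy_contig = "A" * 35 + "C" + "G" * 35
    print(build_71mer(toy_contig, 35, "C", "T"))
```

SNPs lying within 35 bases of a contig end return None here, since a complete 71-mer cannot be extracted for them.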
The initial analysis performed by THERMOFISHER yielded more recommended markers than the total capacity of the Axiom MyDesign custom array. Therefore, a second filtering step was carried out to obtain the best 30K SNPs for each species. The data were filtered to retain only one SNP per RAD/ddRAD locus or RNA contig. When a RAD/ddRAD locus or RNA contig contained multiple recommended SNPs, the SNP marker with the highest p-convert value was selected.
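The one-SNP-per-locus rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the record fields, the recommendation labels and the optional limit argument (standing in for the array capacity) are hypothetical names, not part of the manufacturer's output.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Candidate:
    snp_id: str
    locus_id: str        # RAD/ddRAD locus or RNA contig carrying the SNP
    p_convert: float     # probability of conversion to a reliable assay
    recommendation: str  # assumed labels: recommended / neutral / not_recommended


def best_snp_per_locus(candidates: Iterable[Candidate],
                       limit: Optional[int] = None) -> List[Candidate]:
    """Keep the recommended SNP with the highest p-convert in each locus."""
    best: Dict[str, Candidate] = {}
    for cand in candidates:
        if cand.recommendation != "recommended":
            continue
        current = best.get(cand.locus_id)
        if current is None or cand.p_convert > current.p_convert:
            best[cand.locus_id] = cand
    # Rank the survivors so that an optional capacity limit keeps the best ones.
    ranked = sorted(best.values(), key=lambda c: c.p_convert, reverse=True)
    return ranked if limit is None else ranked[:limit]


if __name__ == "__main__":
    toy = [
        Candidate("s1", "locusA", 0.60, "recommended"),
        Candidate("s2", "locusA", 0.85, "recommended"),
        Candidate("s3", "locusB", 0.90, "neutral"),
        Candidate("s4", "locusB", 0.70, "recommended"),
    ]
    print([c.snp_id for c in best_snp_per_locus(toy)])
```

For the toy input, locusA contributes s2 (the higher p-convert of its two recommended SNPs) and locusB contributes s4, because s3 is not a recommended marker.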
SNP array validation
In total, 152 genomic DNA samples (94 pacu and 58 tambaqui individuals) were sent to THERMOFISHER (USA, California) for genotyping using the SerraSNP array. The results were used to validate the performance of the array and to quantify the number of segregating SNPs in the sampled populations.
Raw data consisting of intensity calculations (CEL files) was imported into the Axiom Analysis Suite (v2.0.035, AFFYMETRIX) for genotype calling and quality control. Samples presenting a QC call rate > 0.97 and quality control (DQC) > 0.82 passed the quality control assessment (following the “Best Practices Workflow” recommended by AFFYMETRIX). Quality control analysis then classified the SNPs into categories according to their clustering performance in relation to Axiom quality control criteria: (i) “polymorphic high resolution” where the SNP passes all QC, (ii) “monomorphic high resolution” where the SNP passes all QC, but only one genotype is detected, (iii) “call rate below threshold” where genotype call rate is < 97%, (iv) “no minor homozygote” where the SNP passes all QC but only two clusters are observed, (v) “off-target variant” where atypical cluster properties arise from variants in the SNP flanking region, and (vi) “other” where the SNP does not fall into any of the previous categories. Only SNPs from categories (i) and (iv) were included and identified as polymorphic for further analyses, as they are most probably reliable and informative SNPs.
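The sample thresholds and the SNP categories retained as polymorphic can be summarised in a short Python sketch. It does not reproduce the Axiom Analysis Suite; the category labels follow the wording of this section and may differ from the labels in an actual software export, and the function names are assumptions.

```python
from typing import Dict, List

# Thresholds quoted in the text for sample-level quality control.
MIN_DQC = 0.82
MIN_CALL_RATE = 0.97

# Categories (i) and (iv), the only ones treated as polymorphic in the text.
POLYMORPHIC_CATEGORIES = {
    "polymorphic high resolution",
    "no minor homozygote",
}


def sample_passes_qc(dqc: float, call_rate: float) -> bool:
    """A sample passes only if both DQC and QC call rate exceed the thresholds."""
    return dqc > MIN_DQC and call_rate > MIN_CALL_RATE


def select_polymorphic(snp_categories: Dict[str, str]) -> List[str]:
    """Return the IDs of SNPs whose category is (i) or (iv), sorted by ID."""
    return sorted(
        snp_id
        for snp_id, category in snp_categories.items()
        if category.lower() in POLYMORPHIC_CATEGORIES
    )


if __name__ == "__main__":
    toy = {
        "snp_a": "polymorphic high resolution",
        "snp_b": "monomorphic high resolution",
        "snp_c": "no minor homozygote",
        "snp_d": "off-target variant",
    }
    print(select_polymorphic(toy))
```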
Descriptive statistics of minor allele frequencies (MAF), expected (He) and observed (Ho) heterozygosity; and IBS (Identity by State) analysis followed by multi-dimensional scaling (MDS), discriminant analysis of principal components (DAPC) and linkage disequilibrium (LD) were performed using PLINK 1.9 software45. A predefined tambaqui genome (not published) was adopted as reference to identify SNPs located in transcribed regions using BEDOPS v2.4.4053. To evaluate SNPs from RNA-Seq with potential functional effects, the list of polymorphic SNPs selected for the array was annotated with SNPEff v 5.054 using the Uniprot protein database as reference.
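For readers unfamiliar with these statistics, the sketch below shows how observed and expected heterozygosity can be obtained from genotype counts at a single biallelic SNP. It is a minimal illustration that assumes the usual Hardy–Weinberg expectation He = 2pq without any small-sample correction; it is not a reproduction of the PLINK computation, and the function name is hypothetical.

```python
from typing import Tuple


def heterozygosity(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Return (Ho, He) for one biallelic SNP from its genotype counts.

    Ho is the observed fraction of heterozygotes; He is 2pq, the fraction
    expected under Hardy-Weinberg proportions (assumption of this sketch).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    ho = n_ab / n
    he = 2.0 * p * q
    return ho, he


if __name__ == "__main__":
    ho, he = heterozygosity(81, 18, 1)
    print(round(ho, 3), round(he, 3))
```

With 81, 18 and 1 individuals in the three genotype classes, the allele frequencies are 0.9 and 0.1, so Ho and He coincide at 0.18, as expected for a sample in Hardy–Weinberg proportions.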
At present, no chromosome-level reference genome is available for either species. However, linkage maps were previously developed for tambaqui16 and pacu17 using 7192 and 17,453 SNPs, respectively. Of these totals, 25.6% and 42.6% of the mapped SNPs, respectively, were included in the SerraSNP array, and were correlated to the linkage maps to evaluate the level of coverage and genetic representativeness of the SerraSNP array for both species.
Acknowledgements
This work was supported by São Paulo Research Foundation (FAPESP grant 2016/21011–9, 2016/18294–9, 2017/26900–9, 2017/19717-3, 2019/10662-7, 2019/08972-8 and 2020/11049-4) providing a PhD fellowship for VAMF and funds for the sequencing service; Comisión Nacional de Investigación Científica y Tecnológica (International Call FAPESP-CONICYT - 2018/08416-5) providing funds for the experiment costs; National Council for Scientific and Technological Development (CNPq grant 311559/2018–2, 422670/2018-9 and 140740/2016–3) providing a PhD scholarship and financing for the study design; and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES—Finance Code 001 and CAPES/PRINT) providing funds for project costs.
Author contributions
D.T.H. and J.M.Y. conceptualized the project, acquired funding and supervised it; V.A.M.F. performed the bioinformatics analysis and analysed the data with R.B.A. and C.H.S.B.; P.C. collaborated on the construction of the SNP array; D.T.H., M.H.M.B. and G.A.L.S. coordinated the sample collection; R.B.A., M.V.F., V.A.M.F., L.V.G.L., N.J.M. and J.F.G.A. performed DNA extraction and library construction; D.T.H. wrote the manuscript in collaboration with V.A.M.F. and R.B.A.; J.M.Y. and F.P. reviewed and approved the manuscript and contributed to improving its language. The authors declare that there is no conflict of interest.
Data availability
The raw fastq files obtained by RADseq methods for SNP discovery in tambaqui55 and pacu56 are available in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). RNA sequencing data for pacu57 and tambaqui58 are available under NCBI BioProject IDs PRJNA632934 and PRJNA358254, respectively.
Code availability
The parameters of the software used in the analysis are listed below. FastQC: v0.11.8, standard parameters. RADseq methods: (1) Stacks: v. 2.0, m = 3, n = 3, r = 0.70. (2) Plink: v. 1.9, --hwe 1e-6 --me 0.1 0.9 --mind 0.3. RNA-Seq method: (1) Trimmomatic: v.0.36, LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:36. (2) Trinity: v.2.9.1, standard parameters, k-mer length of 25 and contiguous sequences (contigs) longer than 200 bp. (3) BWA (Burrows-Wheeler Aligner): v. 0.7.17, mem algorithm. (4) SAMtools programs: version 1.9, default parameters, vcfutils.pl varFilter -d 10 -a 3.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Vito A. Mastrochirico-Filho and Raquel B. Ariede.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-021-98885-x.
References
- 1.Fricke, R., Eschmeyer, W. N. & Van der Laan, R. (eds). Eschmeyer’s catalog of fishes: genera, species, references http://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp (2020).
- 2.Jégu M. Subfamily Serrasalminae (Pacus and piranhas) In: Reis RE, Kullander SO, Ferraris CJ, editors. Check List of the Freshwater Fishes of South and Central America. Edipucrs; 2003. pp. 182–196.
- 3.Mateussi NTB, Melo BF, Ota RP, Roxo FF, Ochoa LE, Foresti F, Oliveira C. Phylogenomics of the Neotropical fish Family Serrasalmidae with a novel intrafamilial classification (Teleostei: Characiformes) Mol. Phylogenet. Evol. 2020;153:106945. doi: 10.1016/j.ympev.2020.106945.
- 4.Valladão GMR, Gallani SU, Pilarski F. South American fish for continental aquaculture. Rev. Aquacult. 2018;10:351–369. doi: 10.1111/raq.12164.
- 5.Woynárovich, A. & Van Anrooy, R. Field guide to the culture of tambaqui (Colossoma macropomum, Cuvier, 1816). FAO Fisheries and Aquaculture Technical Paper 624 (FAO Food and Agriculture Organization of the United Nations, 2019).
- 6.Freitas MV, Lira LVG, Ariede RB, Agudelo JFG, Neto RRO, Borges CHS, Mastrochirico-Filho VA, Neto BFG, Carvalheiro R, Hashimoto DT. Genotype by environment interaction and genetic parameters for growth traits in the Neotropical fish pacu (Piaractus mesopotamicus) Aquaculture. 2020;530:735933. doi: 10.1016/j.aquaculture.2020.735933.
- 7.Honglang, H. Freshwater fish seed resources in China in Assessment of freshwater fish seed resources for sustainable aquaculture. FAO Fisheries Technical Paper. No. 501 (eds Bondad-Reantaso, M. G.) 185–199 (FAO Food and Agriculture Organization of the United Nations, 2007).
- 8.IBGE (Instituto Brasileiro de Geografia e Estatística). Produção da Pecuária Municipal 2016 https://biblioteca.ibge.gov.br/index.php/biblioteca-atalogo?view=detalhes&id=784 (2016).
- 9.IBGE (Instituto Brasileiro de Geografia e Estatística). Produção da Pecuária Municipal 2019 https://biblioteca.ibge.gov.br/index.php/biblioteca-catalogo?view=detalhes&id=784 (2019).
- 10.Mastrochirico-Filho VA, Ariede RB, Freitas MV, Lira LV, Agudelo JFG, Pilarski F, Neto RVR, Yáñez JM, Hashimoto DT. Genetic parameters for resistance to Aeromonas hydrophila in the Neotropical fish pacu (Piaractus mesopotamicus) Aquaculture. 2019;513:734442. doi: 10.1016/j.aquaculture.2019.734442.
- 11.Ariede RB, Freitas MV, Agudelo JFG, Borges CHS, Lira LVG, Yoshida GM, Pilarski F, Yáñez JM, Hashimoto DT. Genetic (co) variation between resistance to Aeromonas hydrophila and growth in tambaqui (Colossoma macropomum) Aquaculture. 2020;523:735225. doi: 10.1016/j.aquaculture.2020.735225.
- 12.Lira LV, Ariede RB, Freitas MV, Mastrochirico-Filho VA, Agudelo JFG, Barría A, Yáñez JM, Hashimoto DT. Quantitative genetic variation for resistance to the parasite Ichthyophthirius multifiliis in the Neotropical fish tambaqui (Colossoma macropomum) Aquac. Rep. 2020;17:100338. doi: 10.1016/j.aqrep.2020.100338.
- 13.Mastrochirico-Filho VA, Hata ME, Sato LS, Jorge PH, Foresti F, Rodriguez MV, Martínez P, Porto-Foresti F, Hashimoto DT. SNP discovery from liver transcriptome in the fish Piaractus mesopotamicus. Conserv. Genet. Resour. 2016;8:109–114. doi: 10.1007/s12686-016-0521-3.
- 14.Ariede RB, Freitas MV, Hata ME, Mastrochirico-Filho VA, Utsunomia R, Mendonça FF, Foresti F, Porto-Foresti F, Hashimoto DT. Development of microsatellite markers using next-generation sequencing for the fish Colossoma macropomum. Mol. Biol. Rep. 2018;45:9–18. doi: 10.1007/s11033-017-4134-z.
- 15.Jorge PH, Mastrochirico-Filho VA, Hata ME, Mendes NJ, Ariede RB, Freitas MV, Vera M, Porto-Foresti F, Hashimoto DT. Genetic characterization of the fish Piaractus brachypomus by microsatellites derived from transcriptome sequencing. Front. Genet. 2018;9:46. doi: 10.3389/fgene.2018.00046.
- 16.Nunes JRS, Liu S, Pértille F, Perazza CA, Villela PMS, Almeida-Val VMF, Hilsdorf AWS, Liu Z, Coutinho LL. Large-scale SNP discovery and construction of a high-density genetic map of Colossoma macropomum through genotyping-by-sequencing. Sci. Rep. 2017;7:46112. doi: 10.1038/srep46112.
- 17.Mastrochirico-Filho VA, Borges CHS, Freitas MV, Ariede RB, Pilarski F, Utsunomia R, Carvalheiro R, Gutierrez AP, Peñaloza C, Yáñez JM, Houston RD, Hashimoto DT. Development of a SNP linkage map and genome-wide association study for resistance to Aeromonas hydrophila in pacu (Piaractus mesopotamicus) BMC Genomics. 2020;21:672. doi: 10.1186/s12864-020-07090-z.
- 18.Yue GH. Recent advances of genome mapping and marker-assisted selection in aquaculture. Fish Fish. 2014;15:376–396. doi: 10.1111/faf.12020.
- 19.Yue GH, Wang L. Current status of genome sequencing and its applications in aquaculture. Aquaculture. 2017;468:337–347. doi: 10.1016/j.aquaculture.2016.10.036.
- 20.Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–1829. doi: 10.1093/genetics/157.4.1819.
- 21.Sonesson AK, Meuwissen TH. Testing strategies for genomic selection in aquaculture breeding programs. Genet. Sel. Evol. 2009;41:37. doi: 10.1186/1297-9686-41-37.
- 22.Nunes JRS, Pértille F, Andrade SCS, Perazza CA, Villela PMS, Almeida-Val VMF, Gao Z-X, Coutinho LL, Hilsdorf AWS. Genome-wide association study reveals genes associated with the absence of intermuscular bones in tambaqui (Colossoma macropomum) Anim. Genet. 2020 doi: 10.1111/age.13001.
- 23.Houston RD, Taggart JB, Cézard T, Bekaert M, Lowe NR, Downing A, Talbot R, Bishop SC, Archibald AL, Bron JE, Penman DJ, Davassi A, Brew F, Tinch AE, Gharbi K, Hamilton A. Development and validation of a high density SNP genotyping array for Atlantic salmon (Salmo salar) BMC Genomics. 2014;15:1–13. doi: 10.1186/1471-2164-15-90.
- 24.Yáñez JM, Naswa S, López ME, Bassini L, Correa K, Gilbey J, Bernatchez L, Norris A, Neira R, Lhorente JP, Schnable PS, Newman S, Mileham A, Deeb N, Di Genova A, Maass A. Genomewide single nucleotide polymorphism discovery in Atlantic salmon (Salmo salar): Validation in wild and farmed American and European populations. Mol. Ecol. Resour. 2016;16:1002–1011. doi: 10.1111/1755-0998.12503.
- 25.Palti Y, Gao G, Liu S, Kent MP, Lien S, Miller MR, Rexroad CE, 3rd, Moen T. The development and characterization of a 57 K single nucleotide polymorphism array for rainbow trout. Mol. Ecol. Resour. 2015;15:662–672. doi: 10.1111/1755-0998.12337.
- 26.Barría A, Christensen KA, Yoshida G, Jedlicki A, Leong JS, Rondeau EB, Lhorente JP, Koop B, Davidson WS, Yáñez JM. Whole genome linkage disequilibrium and effective population size in a coho salmon (Oncorhynchus kisutch) breeding population using a high-density SNP array. Front. Genet. 2019;10:498. doi: 10.3389/fgene.2019.00498.
- 27.Liu S, Sun L, Li Y, Sun F, Jiang Y, Zhang Y, Zhang J, Feng J, Kaltenboeck L, Kucuktas H, Liu Z. Development of the catfish 250K SNP array for genome-wide association studies. BMC Res. Notes. 2014;7:135. doi: 10.1186/1756-0500-7-135.
- 28.Yáñez JM, Yoshida G, Barria A, Palma-Véjares R, Dante T, Díaz D, Cáceres G, Cádiz MI, López ME, Lhorente JP, Jedlicki A, Soto J, Salas D, Maass A. High-throughput single nucleotide polymorphism (SNP) discovery and validation through whole-genome resequencing in Nile Tilapia (Oreochromis niloticus) Mar. Biotechnol. 2020;22:109–117. doi: 10.1007/s10126-019-09935-5.
- 29.Xu J, Zhao Z, Zhang X, Zheng X, Li J, Jiang Y, Kuang Y, Zhang Y, Feng J, Li C, Yu J, Li Q, Zhu Y, Liu Y, Xu P, Sun X. Development and evaluation of the first high-throughput SNP array for common carp (Cyprinus carpio) BMC Genomics. 2014;15:307. doi: 10.1186/1471-2164-15-307.
- 30.Yáñez JM, Newman S, Houston RD. Genomics in aquaculture to better understand species biology and accelerate genetic progress. Front. Genet. 2015;6:128. doi: 10.3389/fgene.2015.00128.
- 31.You X, Shan X, Shi Q. Research advances in the genomics and applications for molecular breeding of aquaculture animals. Aquaculture. 2020;526:735357. doi: 10.1016/j.aquaculture.2020.735357.
- 32.Correa K, Lhorente JP, López ME, Bassini L, Naswa S, Deeb N, Di Genova A, Maass A, Davidson WS, Yáñez JM. Genome-wide association analysis reveals loci associated with resistance against Piscirickettsia salmonis in two Atlantic salmon (Salmo salar L.) chromosomes. BMC Genomics. 2015;16:854. doi: 10.1186/s12864-015-2038-7.
- 33.Palaiokostas C, Ferraresso S, Franch R, Houston RD, Bargelloni L. Genomic prediction of resistance to pasteurellosis in gilthead sea bream (Sparus aurata) using 2b-RAD sequencing. G3 Genes Genom. Genet. 2016;6:3693–3700. doi: 10.1534/g3.116.035220.
- 34.Bangera R, Correa K, Lhorente JP, Figueroa R, Yáñez JM. Genomic predictions can accelerate selection for resistance against Piscirickettsia salmonis in Atlantic salmon (Salmo salar) BMC Genomics. 2017;18:121. doi: 10.1186/s12864-017-3487-y.
- 35.Vallejo RL, Silva RMO, Evenhuis JP, Gao G, Liu S, Parsons JE, Martin KE, Wiens GD, Lourenco DAL, Leeds TD, Palti Y. Accurate genomic predictions for BCWD resistance in rainbow trout are achieved using low-density SNP panels: Evidence that long-range LD is a major contributing factor. J. Anim. Breed. Genet. 2018;135:263–274. doi: 10.1111/jbg.12335.
- 36.Zeng Q, Fu Q, Li Y, Waldbieser G, Bosworth B, Liu S, Yang Y, Bao L, Yuan Z, Li N, Liu Z. Development of a 690 K SNP array in catfish and its application for genetic mapping and validation of the reference genome sequence. Sci. Rep. 2017;7:1–14. doi: 10.1038/srep40347.
- 37.Nugent CM, Leong JS, Christensen KA, Rondeau EB, Brachmann MK, Easton AA, Ouellet-Fagg CL, Crown MTT, Davidson WS, Koop BF, Danzmann RG, Ferguson MM. Design and characterization of an 87k SNP genotyping array for Arctic charr (Salvelinus alpinus) PLoS ONE. 2019;14:e0215008. doi: 10.1371/journal.pone.0215008.
- 38.Andrews KR, Good JM, Miller MR, Luikart G, Hohenlohe PA. Harnessing the power of RADseq for ecological and evolutionary genomics. Nat. Rev. Genet. 2016;17:81–92. doi: 10.1038/nrg.2015.28.
- 39.Aguiar JP, Gomes PFF, Hamoy IG, dos Santos EB, Schneider H, Sampaio I. Loss of genetic variability in the captive stocks of tambaqui Colossoma macropomum (Cuvier, 1818), at breeding centres in Brazil, and their divergence from wild populations. Aquac. Res. 2018;49:1914–1925. doi: 10.1111/are.13647.
- 40.Mastrochirico-Filho VA, Del Pazo F, Hata ME, Villanova GV, Foresti F, Vera M, Martínez P, Porto-Foresti F, Hashimoto DT. Assessing genetic diversity for a pre-breeding program in Piaractus mesopotamicus by SNPs and SSRs. Genes. 2019;10:668. doi: 10.3390/genes10090668.
- 41.Tsairidou S, Hamilton A, Robledo D, Bron JE, Houston RD. Optimizing low-cost genotyping and imputation strategies for genomic selection in Atlantic salmon. G3 Genes Genom. Genet. 2020;10:581–590. doi: 10.1534/g3.119.400800.
- 42.Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, Selker EU, Cresko WA, Johnson EA. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE. 2008;3:e3376. doi: 10.1371/journal.pone.0003376.
- 43.Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE. Double digest RADseq: An inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE. 2012;7:e37135. doi: 10.1371/journal.pone.0037135.
- 44.Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH. Stacks: Building and genotyping loci de novo from short-read sequences. G3 Genes Genom. Genet. 2011;1:171–182. doi: 10.1534/g3.111.000240.
- 45.Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
- 46.Mastrochirico-Filho VA, Hata ME, Kuradomi RY, Freitas MV, Ariede RB, Pinheiro DG, Robledo D, Houston R, Hashimoto DT. Transcriptome profiling of Pacu (Piaractus mesopotamicus) challenged with pathogenic Aeromonas hydrophila: Inference on immune gene response. Front. Genet. 2020;11:604. doi: 10.3389/fgene.2020.00604.
- 47.Gomes F, Watanabe L, Vianez J, Nunes M, Cardoso J, Lima C, Schneider H, Sampaio I. Comparative analysis of the transcriptome of the Amazonian fish species Colossoma macropomum (tambaqui) and hybrid tambacu by next generation sequencing. PLoS ONE. 2019;14:e0212755. doi: 10.1371/journal.pone.0212755.
- 48.Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A. Trinity: Reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nat. Biotechnol. 2011;29:644. doi: 10.1038/nbt.1883.
- 49.Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357. doi: 10.1038/nmeth.1923.
- 50.Li W, Godzik A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22:1658–1659. doi: 10.1093/bioinformatics/btl158.
- 51.Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 52.Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
- 53.Neph S, Kuehn MS, Reynolds AP, Haugen E, Thurman RE, Johnson AK, Rynes E, Maurano MT, Vierstra J, Thomas S, Sandstrom R, Humbert R, Stamatoyannopoulos JA. BEDOPS: High-performance genomic feature operations. Bioinformatics. 2012;28:1919–1920. doi: 10.1093/bioinformatics/bts277.
- 54.Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6:80–92. doi: 10.4161/fly.19695.
- 55.NCBI Sequence Read Archive http://identifiers.org/ncbi/insdc.sra:SRP293925 (2020).
- 56.NCBI Sequence Read Archive http://identifiers.org/ncbi/insdc.sra:SRP293156 (2020).
- 57.NCBI Sequence Read Archive http://identifiers.org/ncbi/insdc.sra:SRP261699 (2020).
- 58.NCBI Sequence Read Archive http://identifiers.org/ncbi/insdc.sra:SRP095431 (2020).