Abstract
Whole-genome single-nucleotide polymorphism (SNP) markers are valuable genetic resources for the association and conservation studies. Genome-wide SNP development in many teleost species are still challenging because of the genome complexity and the cost of re-sequencing. Genotyping-By-Sequencing (GBS) provided an efficient reduced representative method to squeeze cost for SNP detection; however, most of recent GBS applications were reported on plant organisms. In this work, we used an EcoRI-NlaIII based GBS protocol to teleost large yellow croaker, an important commercial fish in China and East-Asia, and reported the first whole-genome SNP development for the species. 69,845 high quality SNP markers that evenly distributed along genome were detected in at least 80% of 500 individuals. Nearly 95% randomly selected genotypes were successfully validated by Sequenom MassARRAY assay. The association studies with the muscle eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content discovered 39 significant SNP markers, contributing as high up to ∼63% genetic variance that explained by all markers. Functional genes that involved in fat digestion and absorption pathway were identified, such as APOB, CRAT and OSBPL10. Notably, PPT2 Gene, previously identified in the association study of the plasma n-3 and n-6 polyunsaturated fatty acid level in human, was re-discovered in large yellow croaker. Our study verified that EcoRI-NlaIII based GBS could produce quality SNP markers in a cost-efficient manner in teleost genome. The developed SNP markers and the EPA and DHA associated SNP loci provided invaluable resources for the population structure, conservation genetics and genomic selection of large yellow croaker and other fish organisms.
Keywords: Genotyping-By-Sequencing (GBS), Single nucleotide polymorphism (SNP), Teleost, Marker development, Large yellow croaker
Introduction
Whole-genome single nucleotide polymorphism (SNP) is one of the most important genomic resources for population diversity, conservation genetics and functional gene identification for biological traits (Seeb et al., 2011). To obtain the molecular markers of the shared genomic loci among individuals, many technologies were invented and developed to probe whole-genome polymorphisms. The techniques allowing synthesizing DNA probes in chips have led to the advent and application of SNP microarrays (Lipshutz et al., 1999), making it possible to explore genome-wide SNP in a high-throughput manner. However the cost of array design and application obstructs the wider usage in non-model species, especially for endangered and economic organisms (De Donato et al., 2013). More importantly, microarray approaches cannot discover novel SNP loci for species without reference sequences (Popova et al., 2013). With the development of next-generation sequencing (NGS), the state-of-art sequencing platform enable scientists to scan small variants in genomes at an unprecedentedly scale with rapidly decreasing price. The multiplex library strategies were widely used to further reduce the cost per sample. However, the budget is still one of the biggest challenges for whole-genome resequencing in non-model samples (Muir et al., 2016). Furthermore, the whole-genome sequencing data for hundreds of individuals also inevitably burdens the limited computational and bioinformatics capacity in labs.
In the past few years, several robust sequencing-based genotyping techniques have been invented in the research community to overcome the bottle-neck of cost in whole-genome resequencing. Most of those innovations employ a strategy of partial genome representation sequencing (Narum et al., 2013), such as restriction site associated DNA (RAD) (Rowe, Renaut & Guggisberg, 2011), IIB restriction endonucleases based RAD (2bRAD) (Wang et al., 2012) and Genotyping-By-Sequencing (GBS) (De Donato et al., 2013). RAD applies a restriction enzyme to digest genome DNA and then random fragment them to generate RAD tags. Although RAD experiments was initially designed for microarray-based genotyping (Miller et al., 2007), the updated RAD tag isolation and library construction procedure has been prevalently used to couple with high-throughput sequencing on the Illumina platforms, resulting many successful applications for genome-wide genotyping, genetic mapping, quantitative trait locus (QTL) and association studies (Baird et al., 2008). However, RAD still depends on random fragmentations, reducing the consistence on SNP loci among samples. Elshire et al. (2011) subsequently developed a more straightforward genotyping method as GBS with restriction enzymes of ApeKI in maize and barley. The protocols for GBS are simple, extremely specific and highly reproducible. In recent years, the easy transferability of GBS to other species leads to many application in plants (Poland & Rife, 2012). One of the most attracting features of GBS is the using of methylation-sensitive restriction enzymes during libraries constructions to avoid repetitive fragments and to simplify the reads alignments in extremely complex genomes (Elshire et al., 2011); therefore, GBS is an excellent whole-genome genotyping technique for complex non-model organism genomes with massive repetitive regions and abundant genetic diversities.
Teleost, representing a large portion of fish species, has been showed to undergo the third round of whole-genome duplication (WGD) 370 million years ago (Braasch et al., 2016; Xiao et al., 2015b). The extra WGD left a large portion of duplicated and repetitive sequences in teleost genomes (Berthelot et al., 2014; Jaillon et al., 2004), making the accurate whole-genome SNP marker development was still challenging in many teleost species (Wang et al., 2008). We speculated that GBS technique provided an efficient way and was suitable for genotyping in teleost complex genome. However, the whole-genome SNP development and association studies based on GBS is rarely reported on teleost fish species. Large yellow croaker (Larimichthys crocea), belonging to the Sciaenidae family of teleost, is an important marine fish in China and East Asia (Xiao et al., 2015a). Due to over-fishing and habitat degradation in last decades, the wild stock of the species has rapidly collapsed (Liu, Mitcheson & Sadovy, 2008). The environmental changes and over-dense aquaculture pose more challenges on population conservation and sustainable development of the aquaculture for large yellow croaker. Whole-genome molecular markers and genome-wide association studies (GWAS) for important traits are prerequisites for the population conservation and genomic selection of the species (Steiner et al., 2013). However, the association studies are rarely reported for large yellow croaker, largely because of the lacking of abundant stable genomic SNP markers.
GBS technique provides the potential cost-efficient way for whole-genome SNP marker development in complex teleost genome. In the present investigation, we used large yellow croaker to verify the applicability of GBS on teleost. Two restriction enzymes of EcoRI and NlaIII based GBS protocol was developed and optimized. Massive whole-genome SNP markers were developed from the sequencing reads by bioinformatic pipelines, which were subsequently validated by Sequenom MassARRAY assay. The detected SNP markers in this work were then applied to the whole-genome association study of the muscle Eicosapentaenoic Acid (EPA) and Docosahexaenoic Acid (DHA) content in large yellow croaker. Our study confirmed the suitability of GBS on whole-genome SNP marker development in teleost genome. The developed whole-genome SNP markers and functional genes involved in muscle EPA and DHA contents offered valuable genetic resources for conservation genetics and genomic selection of large yellow croaker.
Materials and Methods
Ethics statement
The sample collection and experiments in the study was approved by the Animal Care and Use Committee of Fisheries College of Jimei University (Animal Ethics no. 1067).
Sample preparation and DNA extraction
The mixed reference population of 500 individuals was bred by a random fertilization of 30 males and 30 females at the large yellow croaker breeding base of Jimei University in Ningde, Fujian, China. All fish individuals were 1.5 year old with the total length and weight of 24.5–25.9 cm and 217.8–234.1 g (95% confidence interval), respectively. To extract genomic DNA respectively from 500 individuals, the dorsal fins (20–30 mg) of the fish individuals were collected, frozen in liquid nitrogen for the following DNA extraction. Total genomic DNA was prepared in 1.5 ml microcentrifuge tubes containing 550 µl TE buffer (100 mM NaCl, 10 mM Tris, pH 8, 25 mM EDTA, 0.5% SDS and proteinase K, 0.1 mg/ml). The samples were incubated at 55 °C overnight and subsequently extracted twice using phenol and then phenol/chloroform (1:1) method. DNA was precipitated by adding two and a half volumes of ethanol, collected by brief centrifugation, washed twice with 70% ethanol, air dried, re-dissolved in TE buffer (10 mM Tris–HCl, 1 mM EDTA, pH 7.5). DNA concentration and quality were estimated with an ND-1000 spectrophotometer (NanoDrop, Wilmington, DE, USA) and by electrophoresis in 0.8% agarose gels with a lambda DNA standard.
In silico enzyme assessment for GBS library construction
To assess how endonucleases influence the DNA fragment length distribution, four enzyme combinations were designed for in silico digestion of large yellow croaker genome: ApeKI-PstI, EcoRI-BstNI, EcoRI-NlaIII and PstI-NlaIII (NEB, Ipswich, MA, USA). The activation and heat-inactivation temperature, location in chromosome and length distribution of all fragments were analyzed to evaluation the best performance of each enzyme combination.
GBS library construction and sequencing
The GBS libraries were constructed based on two DNA endonucleases: EcoRI (NEB, Ipswich, MA, USA) and NlaIII (NEB, Ipswich, MA, USA). A pilot GBS experiment was performed before the library construction to optimize the temperature and time parameters for yield, size distribution. Based on the pilot experiment, the GBS libraries of large yellow croaker based on EcoRI and NlaIII were constructed following the similar method in previous report (Beissinger et al., 2013). Briefly, genomic DNA (20 ng/µl) was incubated at 37°C with EcoRI and NlaIII, 10XCutSmart™ Buffer. The restriction reactions were heat-inactivated at 65 °C by 20 min and were kept in 8 °C for the following experiments. Sequencing adaptor and barcode mix, T4 DNA Ligase, 10 mM ATP and 10XCutSmart™ Buffer were incubated at 16 °C for 2 h for ligation reactions. The reactions were then heat-inactivated at 65 °C by 20 min and the reaction systems were kept in 8 °C. Then, polymerase chain reactions (PCR) experiments were performed in the reaction solutions (20 µL) containing the diluted restriction/ligation samples (4 pM, 2 µL), dNTP (each at 10 mM, 5 µL), Taq DNA polymerase (NEB, Ipswich, MA, USA) (5 units/µL, 0.25 µL), Illumina Primers (each at 10 µM, 1 µL) and Indexing Primers (10 µM, 1 µL). The PCR procedure was: 95 °C 2 min; 15 cycle of 95 °C 30 s 60 °C 30 s, 72 °C 30 s; 72 °C 5 min and kept in 4 °C. The PCR products were run on a 8% polyacrylamide gel electrophoresis. Fragments of 200–300 bp were isolated using QIAGEN QIAquick® Gel Extraction Kit and diluted for pair-end sequencing on an Illumina HiSeq 2500 sequencing platform (Illumina, Inc, San Diego, CA, USA).
Sequencing read quality control and genotyping
The raw sequencing reads generated by Illumina HiSeq 2500 from the GBS libraries were treated and cleaned for SNP detection. First, the adaptors were removed and the resulted reads were split by sample-specific barcode sequences. Only reads begins with the digest site sequences of EcoRI and NlaIII were retained for the following quality control. Second, the overall base and read quality were accessed by FastQC. To avoid the negative influence of ambiguous bases for SNP detection, reads with more than 5% of N were removed. Then, the resulted reads were cleaned by the following steps: (1) discarding the reads that the quality lower than 20; (2) deleting 5 bp windows in reads end that the average quality smaller than 20; (3) removing read pairs if one end was shorter than 50 bp.
The cleaned reads were mapped to large yellow croaker genome by BWA 0.7.6a (Li & Durbin, 2009). The mapping was preceded by a short reads alignment with BWA-MEM algorithm. The alignment were then sorted by coordinates and duplicate marked by SortSam and MarkDuplicates programs in Picard tools 1.107 (picard.sourceforge.net), respectively. To reduce the false positives of SNP detection in this study, three processes were carried out: (1) short read mapping were re-aligned by local bases matches; (2) base Quality Score Recalibration (BQSR) was employed to adjust the accuracy of the base and mapping quality scores; (3) only reads pairs that both aligned on genome with a mapping score higher than 30 were used for SNP calling. Then, the SNP markers were detected by GATK UnifiedGenotyper utility.
SNP validation by Sequenom MassARRAY assay
Genomic DNA was extracted from dorsal fin ray tissue as the method described before. PCR amplification was performed in the reaction system (5 µl total volume) containing 20 ng of genomic DNA, 0.5U HotstarTaq (Qiagen), 0.5 µl 10× PCR buffer, 0.1 µl dNTPs for each nucleotide and 0.5 pmol of each primer. All PCR experiments were carried out in a PTC-100 PCR instrument (Eppendorf) with the following program: 4 min denaturation at 94 °C, 35 cycles of 20 s at 94 °C, 30 s at 56 °C and 1 min at 72 °C and a final extension at 72 °C for 3 min. After the PCR products were cleaned using 2 µl SAP (SEQUENOM), the single base extension used 2 µl EXTEND Mix (SEQUENOM) contained 0.94 µl Extend primer Mix, 0.041 µl iPLEX enzyme and 0.2 µl iPLEX termination mix and performed with the following steps: initial denaturation at 94 °C for 30 s, followed by 40 cycles of 3-step amplification profile of 5 s at 94 °C, additional 5 cycles of 5 s at 52 °C and 5 s at 80 °C and a final extension at 72 °C for 3 min.The PCR product was cleaned by resin purification and then analyzed using MassARRAY Analyzer Compac (SEQUENOM) and software TYPER (SEQUENOM).
To evaluate the accuracy of SNP detection in this study, the genotypes from GATK SNP calling were compared with those from MassARRAY assay. If the genotypes of one SNP locus from GATK calling were identical with that in MassARRAY, then the locus was called a correct genotype. As a result, 1,421 of 1,500 SNP loci were correctly genotyped by GATK and the success rate of SNP calling was ∼94.7%. The specificity and sensitivity of SNP calling in the study were also evaluated. The reference homozygous genotypes (AA) both from MassARRAY and GATK were called true negatives, and the heterozygous genotypes or allelic homozygous (AB and BB) both from MassARRAY and GATK were called true positives. Specificity was then calculated as the number of true positives divided by the number of true positives plus the number of false positives, and the sensitivity was estimated as the number of true positives divided by the number of true positives plus the number of false negatives as the following formula:
Association analysis with the muscle EPA and DHA content
From the 500 large yellow croaker population, 200 individuals were randomly selected for the muscle EPA and DHA content measurement for the following statistics and association analysis. The fat acid composition analysis followed the similar methods in previous reports (Murillo, Rao & Durant, 2014). Briefly, the total lipid was extracted from the fresh muscle tissue using the chloroform-methanol method (Folch, Lees & Sloane-Stanley, 1957). After saponification with 1 ml of 50% KOH in 15 ml ethanol, the lipid was then esterified in 80 °C for 20 min using 6.7% boron trifluoride (BF3) in methanol (Morita chemical industries Co., Ltd., Osaka, Japan). After making up in hexane (20 mg/ml), fatty acid methyl esters (FAME) preparations were analyzed by gas chromatography (GC). The temperature increase of 170–260 °C at 2 °C/min was set and helium was used as the carrier gas. Since the muscle contents of EPA and DHA were highly correlated, we combined those two components together in fish muscle.
Prior to the association study, pairwise clustering based on the alleles shared identical by state (IBS) between any two individuals was performed to assess the genetic relatedness. The population stratification was analyzed by multi dimensional scaling (MDS) clustering methods available in the PLINK software (Purcell et al., 2007).
With the developed SNP markers, the association analysis was performed between genotypes and measured muscle EPA and DHA content using Plink 1.07 A simple linear regression of phenotype on genotype was performed in the analysis. Markers with p-values ≤ 1e–4 were considered significantly associated with muscle EPA and DHA contents. To identify the biological functions of nearby genes and whether the orthologs of these significantly associated loci were also associated with the EPA and DHA content in other species, we identified the protein-coding genes around 50 kb of the significant SNP markers. We aligned the genes against the NCBI nr database by Blastx (Altschul et al., 1997). GO term and KEGG pathway enrichment analysis of the associated genes were performed with Gene set enrichment analysis (GSEA) (Shi & Walker, 2007) by two-tailed Fish’s exact test with Benjamini & Hochberg false discovery rate (FDR) (Benjamini & Hochberg, 1995) against the background of the all protein-coding gene in large yellow croaker genome. The additive genetic variances were estimated by using R-package EMMREML, Version 3.1. (http://mirror.bjtu.edu.cn/cran/web/packages/EMMREML/index.html).
Results
Enzyme assessment and GBS construction for large yellow croaker
According to the principles of the enzyme combination design for GBS library construction, four enzyme combinations were designed for the GBS analysis of large yellow croaker genome: ApeKI-PstI, EcoRI-BstNI, EcoRI-NlaIII and PstI-NlaIII (NEB, Ipswich, MA, USA). To assess the fragment size distribution and the number of potential SNP marker developed, the public large yellow croaker draft genome sequences (Ao et al., 2015) were in silico digested by the four two-enzyme combinations to mimic the genomic fragmentation. As shown in the Fig. 1, the predicted fragment numbers decreased with the fragment size for all enzyme combinations, but the ApeKI-PstI and EcoRI-BstNI lead to a large portion of fragments longer than 1 kb. According to the size distribution in Fig. 1 and to make the fragment size more compatible to NGS sequencing, the genomic fragment with a size of 100–300 bp were preferred; therefore, EcoRI-NlaIII and PstI-NlaIII were the rational combinations for the following library construction. According to our in silico experiments, genomic fragment in the range of 200–300 bp were used to construct GBS libraries (see ‘Method’ for details). By the assessment of the combination of EcoRI-NlaIII, roughly 1.5 million fragments would be collected in libraries.
Library sequencing and reads mapping
GBS libraries were constructed by the two enzyme based digestion (see ‘Method’ for the details). The NGS sequencing of GBS libraries for 500 individuals generate roughly 314 Gb raw sequencing reads. To evaluate the raw data distribution among samples, we found that the majority of individuals (∼95%) had the raw sequencing reads ranged from 600 to 650 Mb, indicating the excellent sequencing uniformity among samples from library construction and sequencing. The raw reads were cleaned by HTSeq to trim low quality ends (average quality < 20) and eliminate short reads (length < 50 bp). The cleaned reads were mapped to large yellow croaker reference genome sequences (Ao et al., 2015) by BWA (Li & Durbin, 2009). To assess the quality of GBS library, the mapped reads distribution of Sample 88 along the linkage groups were illuminated as an example (Fig. S1). We found that reads were evenly covered all linkage groups of large yellow croaker, indicating an ideal representativeness of the libraries at the whole genome level. The covered loci depth distribution (Fig. S2) showed that the majority of depth ranged from 5 to 20 and extreme reads enrichment on genome local regions were successfully avoided in sequencing libraries.
SNP discovery among samples in large yellow croaker genome
To develop molecular markers based on the GBS library sequencing, SNP variants markers were detected from the reads alignments by GATK (McKenna et al., 2010) pipelines (see ‘Method and Material’ for detailed information). To improve the quality of the detected SNP, we employed the extra reads local re-alignment and Base Quality Score Recalibration (BQSR) steps in SNP calling pipelines. Previous literatures on model organisms showed that those extra processes on reads alignment and SNP quality could significant reduce the false positives SNPs (De Pristo et al., 2011; McKenna et al., 2010; Van der Auwera et al., 2013), therefore our refined bioinformatics pipeline coupled with the library construction provided a solid foundation for SNP detection in this study. As a result, 489,246 SNP markers were discovered in at least 200 large yellow croakers with a loci depth threshold of 3. It is not surprised that the majority of SNP markers were not shared by all samples because of the inherent DNA polymorphisms on enzyme digestion site in genome. The number of shared SNP markers among samples was crucial for the evaluation of the GBS sequencing of large yellow croaker genome, especially for the studies of QTL and GWAS analysis in populations. We further used depth- and population-based method to investigate the influence of loci depth and population size on the number of shared SNP marker. As we expected, both the population size and loci depth dramatically influenced the number of shared SNP markers (Fig. 2). However, hundreds of thousands of the shared SNP markers were identified with a depth threshold of 5 in the study. According to previous literatures on SNP development in non-model organisms, the depth filtering of 5 provided high quality SNP markers for the genetic studies (Hiremath et al., 2012; Nguyen, Hayes & Ingram, 2014); therefore, our SNP calling based on GBS library developed sufficient SNP markers for the biological trait mapping and conservation genetics. We indeed found the sharp decreases on the number of shared SNPs for the population size from 450 to 500, which could be attributed to the samples with extremely low sequencing amount.
To control SNP marker quality while maximizing the number of shared samples and to facilitate the following GWAS analysis, markers with the loci depth higher than 5 and the shared in at least 400 individuals (90% of all sample) were used for the following analysis, resulting 69,845 SNP markers in large yellow croaker genome. To answer the question if our sequencing data was sufficient for the whole-genome SNP development, the numbers of the detected genomic SNP markers were plotted against the sequencing data for each sample. As shown in Fig. 3, the number of the discovered SNP marker increased with sequencing reads and remained to be ∼70,000 when the sequencing amount reached 600 Mb, implying that 600 Mb might be an optimal data amount for the trade-off the cost and SNP number in our large yellow croaker GBS libraries. The distribution of those SNP markers in 24 linkage groups (Fig. 4) showed that those SNP markers were ideally evenly distributed in the genome, suggesting an excellent representation of whole genome markers in large yellow croaker. The location and functions of SNPs were investigated by comparing the locus coordinates with those of gene annotations. We found that ∼53 % of these SNPs were from genic regions, including exons (3,000), introns (27,114), and untranslated regions (UTRs, 9,166) (Fig. 3). The detailed SNP categories in UTR revealed that 4,311 and 4,855 SNPs were from 5UTR and 3UTR respectively. The biological functions of those SNP markers were analyzed according to their relative positions of the protein-coding genes. As in Fig. 5, 866 SNPs markers in coding regions caused synonymous mutations. Of the remaining markers, 3,022 SNPs could lead to a change of amino acid and introduction of frame shift and new or lost start/stop codons. Those SNP markers might significantly alter the biological functions of the hosting genes and thus influence the biological traits that controlled by those genes.
Experimental validation of detected SNP loci
To assess the reliability of the SNP makers developed from the reduced representation libraries, 50 loci from 30 individuals were randomly selected to validate the marker polymorphism by the Sequenom MassARRAY assay. As shown in Table 1, MassARRAY assay verified the most of detected SNP markers in those samples. Among 1,500 markers, 1,421 were validated by MassARRAY, confirming our library construction, sequencing and SNP marker calling pipelines. The primers for SNP validation and the detailed genotypes were listed in Tables S1 and S2, respectively. As shown in the Table 1, the specificity and sensitivity for the SNP genotype detection in the present study were estimated as 94.2% and 98.3%, respectively. Notably, we found that the majority of discordant genotypes were heterozygous, which was consistent with the reports for other organisms (Sonah et al., 2013). We attributed the error-prone genotypes in heterozygous markers to the fact that those markers need more supporting reads than their homozygous counterparts. However, the Sequenom MassARRAY assay still successfully validated ∼95% of the detected SNP marker developed by the GBS library sequencing, providing us solid SNP genotypes of the following trait association and other genetics studies for large yellow croaker.
Table 1. SNP validation by Sequenom MassARRAY assay.
Genotypes | Sequenom MassARRAY assay | ||||
---|---|---|---|---|---|
AA | AB | BB | NN | ||
SNP calling | AA | 901 | 0 | 2 | 0 |
AB | 54 | 404 | 7 | 0 | |
BB | 0 | 2 | 116 | 0 | |
NN | 11 | 1 | 2 | 0 |
The association study with the muscle EPA and DHA content
To apply the genome-wide markers to probe potential marker and genes contributing to muscle EPA and DHA contents, 200 large yellow croakers reared with the identical feed in the same netcage were used to quantify EPA and DHA level. Muscle EPA and DHA contents in 176 individuals were successfully extracted and measured. The contents exhibited a typical normal-like distribution (p-value of 0.94 with Kolmogorov–Smirnov test) with an average of 21.5 mg/g and a standard deviation of 4.1 mg/g (Fig. S3). The difference of the highest and the lowest EPA and DHA contents was ∼13.8 mg/g.
Before the association study, population stratification with MDS clustering showed that all samples were grouped into several clusters (Fig. S4). 200 individuals used in the association study were distributed among population clusters, providing diversified genetic background for the following studies. From the MDS plot (Fig. S4), the first and second dimension value (arbitrary value) were all smaller than 0.05, indicating homogeneous genotypes among samples. The association study of SNP marker with the muscle EPA and DHA content was performed with the linear model with a covariance to sex in Plink (Purcell et al., 2007). 69,845 SNP loci developed above with depth threshold of 5 were used to perform the association study (Fig. 5). As shown in Fig. 6, 39 markers from 11 linkage groups were exhibited significant association with the EPA and DHA content (p-value < 1e–4). Notably, many associated markers were significant by clusters in linkage group 4, 5 and 11, suggesting the credibility of the association studies. The results might also imply that many genes contributed to the muscle EPA and DHA levels in large yellow croaker. With the variance estimation by Restricted Maximum Likelihood (REML) method (Smith & Graser, 1986), we found that those 39 significant markers could interpret as high up to ∼63.0% of genetic variance explained by all 69,845 markers.
To identify gene contributing to the muscle EPA and DHA content in large yellow croaker, we investigated the biological functions of protein-coding genes within 50 kb of all significant SNP markers (p-value < 1e–4). As a result, 122 genes were identified from the above association regions. The biological KEGG pathway and GO term annotations of the associated genes were enriched under the background of all protein-coding genes. The metabolic pathway of fat digestion and absorption was significant (FDR < 0.023) in the KEGG enrichment (Fig. 6B and Table S3). Meanwhile, GO terms of unsaturated fatty acid biosynthetic process, fatty acid derivative biosynthetic process and lipid transporter activity were also highlighted (FDR < 0.05) for the associated functional genes (Fig. 6C). The detailed gene function GO annotations were summarized in Table S4. We found that the many identified genes played important roles in lipid transport, metabolism and transcription regulation, such as apolipoprotein B (APOB), Carnitine O-acetyltrasferase (CRAT) and oxysterol binding protein 10 (OSBPL10). APOB is a crucial lipid transport protein in organism. Previous nutriology studies confirmed the correlation of EPA and DHA contents with APOB genotypes and gene expression (Anil, 2007). Given their close relationship, we speculated that the polymorphisms on APOB gene might contribute to the EPA and DAH accumulation in large yellow croaker muscle. CRAT and OSBPL10 may also involved in the muscle EPA and DHA content since carnitine and oxysterol were important components and regulators in EPA and DHA synthesis pathways according to previous reports (Qiu, 2003; Rise, Marangoni & Galli, 2002). Notably, we observed palmitoyl-protein thioesteraes 2 (PPT2) (around a maker with a p-value of 6.7e−06) as a potential functional gene contributing to muscle EPA and DHA contents. PPT2 gene was also identified by genome-wide association study on n-3 and n-6 polyunsaturated fatty acid levels in Chinese and European-ancestry populations (Dorajoo et al., 2015; Hu et al., 2016).
Discussions
The advent and development of NGS have unprecedentedly prompted the application of the whole-genome marker development (Seeb et al., 2011). Recently, SNP developments on genomic level were performed in many species, including livestock and fish in agriculture (Sun et al., 2014). However, the cost for whole-genome re-sequencing is still one of the largest challenges in genomic marker developments. Based on NGS, GBS generally used multiple endonucleases to obtain the desired genomic length and the number of fragments to squeeze the sequencing cost (De Donato et al., 2013; Elshire et al., 2011; Sonah et al., 2013), thus improving the specificity of marker detection along individuals; however, most of the GBS application were reported for plant genomes. Teleost, representing a large portion of fish species, has been showed to undergo the additional third round of whole-genome duplication (WGD) 370 million years ago. The extra genome duplication led to a large portion of duplicated and repetitive sequences in teleost genomes (Sémon & Wolfe, 2007). GBS techniques provided an efficient way to probe polymorphism markers from complex genomes; however, the whole-genome SNP development and association studies based on GBS is rarely reported on teleost fish species. In this work, we used teleost large yellow croaker to verify the applicability of GBS on genomic marker development on teleost species. So far as we know, this is the first GBS implementation in large yellow croaker genome. The developed SNP markers provided useful resources for the following genetic studies, including population structure, conservation and functional gene mapping of important traits of the species. The enzyme combination and GBS protocols used in this study could also be valuable reference for other teleost species.
Our in silico experiments mimicked the two enzyme digestion on large yellow croaker genome. After the detail investigation of the enzymes, we chose the combination of EcoRI and NlaIII for GBS protocol for two reasons. Firstly, EcoRI and NlaIII possessed the identical heat-inactivation temperature, which facilitated the pilot studies to optimize the experimental conditions for library construction; secondly and more importantly, EcoRI was sensitive when restriction site overlaps methylation sequence of CpG islands, therefore the using of the enzyme would partially avoid the digestion in repetitive regions. Many previous GBS libraries were constructed with the fragment length 100–300 bp or even wider (Elshire et al., 2011; Sonah et al., 2013); however, we predicted ∼3 million fragment would be generated in that range. The large number of fragment implied a large amount of sequencing reads to cover those genomic regions, which would increase the unit-cost for the sequencing. To reduce the genomic fragments needed to be sequenced for libraries in this study, we attempted to narrow the length range to 200–300 bp, which was predicted to generate roughly 1.5 million genomic fragment for sequencing.
Taking the SNP frequency of 1 per 1,000 bp (Pushkarev, Neff & Quake, 2009), the library sequencing might result into roughly 300 thousand SNP markers along large yellow croaker genome. Our estimation was based on the assumption that all individuals have no mutation on endonuclease digesting site and the read depth were high enough to cover SNP loci. However, because of the divergent genomic background among populations, it is very hard to detect all SNP markers that shared by all individuals. In this work, 489,246 raw SNP markers supported by more than three reads were detected with an average sequencing amount of 600 Mb in at least 200 individual from the 500 large yellow croaker population. To facilitate the following marker association study and breeding practise, previous studies proposed several methods to filter the high quality SNP that shared in more individuals, such as depth-based (Li et al., 2009), quality score-based (Brockman et al., 2008) and population-based (Bansal et al., 2010) manner. In this study, we employed a composite strategy for SNP filtering by simultaneously considering loci depth, marker quality and shared population size. As a result, 69,845 SNP markers were left with a depth higher than 5, quality score higher than 100 and shared with at least 80% individuals (400 large yellow croakers). More than half (∼53%) of those detected quality SNP markers resided in genic regions, enabling us to probe the possible association of trait with the nearby functional genes. We noticed that the percentage of markers in genic regions was higher than that of previous reports in soybean (39.5%) (Sonah et al., 2013) but lower than that of sweat cherry (65.6%) (Guajardo et al., 2015). Those SNP markers generated from the reduced representation library, especially those in genic regions, provided us an easy and efficient manner to detect genomic small variants and to identify genomic regions related to important traits of large yellow croaker at genomic scale. The detected SNP markers were then validated by Sequenom MassARRAY assay for the randomly selected 50 loci in 30 individuals. Although the success rate (94.6%) was slight lower than that reported in the similar study in soybean (98%) (Sonah et al., 2013), the library preparation protocol and bioinformatics pipeline provided us high quality genotypes on those SNP loci in the population for the following association studies.
The successful applications of GWAS have greatly prompted the understanding to the genetic bases of important economic traits and would eventually benefit the artificial breeding and population conservation of non-model species (Correa et al., 2015; Narum et al., 2013) EPA and DHA are both omega-3 poly-unsaturated fatty acids that important in human physiology (Swanson, Block & Mousa, 2012). Previous medical experiments demonstrated their positive effects on depressive symptoms in clinical trials (Hoffmire et al., 2012) and the essential functions in brain development (Brenna & Carlson, 2014). Marine fish is a main source for human EPA and DHA supplement and nutritional properties of fish meat are highly dependent on polyunsaturated fatty acid levels; therefore the EPA and DHA content in muscle is one of the important indicants for the meat quality of fish. The genetic bases controlling EPA and DHA accumulation in fish species are highly interconnected and not fully revealed. Identifying key SNP loci and functional genes will increase our knowledge of molecular mechanism of polyunsaturated fatty acid synthesis and metabolism in marine fish. To the best of our knowledge, most of the previous researches were focus on the genetic variants on poly unsaturated fatty acid metabolism after fish oil supplements in human or gene expression and EPA and DHA level changes with different feed in fish (Gregory et al., 2016; Li et al., 2014; Li et al., 2013; Trushenski et al., 2012). The association studies aiming to identity potential functional genes contributing to EPA and DHA accumulation in fish meat is rarely reported.
Among 176 individuals that were used to measure the muscle EPA and DHA level, the average muscle EPA and DHA content in the top 20 large yellow croakers (28.4 mg/g) was almost two-fold of that in the lowest 20 ones (14.6 mg/g). Given that those fish were reared in the same cage and fed with the identical feed, there was a great potential to raise the muscle EPA and DHA content in large yellow croaker via genetic improvements. Using the developed quality SNP markers by GBS protocol, 39 SNP markers from 11 linkage groups were observed to be significantly associated with muscle EPA and DHA levels. From the coordinates of gene and SNP loci, 122 protein-coding genes were identified around those significant markers. The functional analysis by homological searching found that many genes were involved in fat metabolism and transport, such as APOB, CRAT and OSBPL10. Unsaturated fatty acid biosynthetic process, fatty acid derivative biosynthetic process and lipid transporter activity and fat digestion and absorption pathway were significantly enriched in GO terms and KEGG pathways for the identified functional genes. Meanwhile, we observed large numbers of genes functions in cellular metabolism, gene expression and translation regulation, which may also play a role in modulating muscle EPA and DHA contents (Tables S2 and S3). Interestingly, we identified the potential functional gene of PPT2 gene in large yellow croaker that was previously discovered during the whole-genome association of plasma n-3 and n-6 polyunsaturated fatty acid level in Asian and European populations (Hu et al., 2016). The PPT2 gene in the linkage group 5 of large yellow croaker might play a similar function in human and also contribute to the muscle EPA and DHA level. This result suggested that teleost fish and human may shared similar metabolic pathway for the polyunsaturated fatty acid synthesis and accumulation; however biological functions of PPT2 gene for the muscle EPA and DHA content in large yellow croaker and other vertebrates need further gene functional analysis.
Conclusions
Teleost were widely believed to undergo the third round of WGD during the natural evolution; therefore, genomes of many teleost species were characterized by the complexity of high heterozygosity and repeat contents. In this work, EcoRI-NlaIII based GBS protocol was used to develop the whole-genome SNP markers in teleost large yellow croaker. The study verified the applicability of GBS on teleost species and provided useful references for GBS applications in other fish species. For large yellow croaker, about 70,000 high quality SNP markers, supported by at least 400 individuals in population, were detected from the GBS libraries. Those SNP markers were further experimentally validated by Sequenom MassARRAY assay. The even distribution and diversified biological impacts of those molecular makers confirmed the effect and efficiency of the GBS-based SNP development in large yellow croaker. With muscle EPA and DHA contents from 176 individuals, a genome-wide association study between genotypes and EPA and DHA level were performed. 39 and 122 significantly associated SNP loci and related protein-coding genes were identified. The functional analysis of the related genes confirmed the results of the association study.
For the aspect of molecular resources, our developed SNP markers could be valuable genetic resources for large yellow croaker, and would be used in the following population structure, conservation genetics and the association studies for other important economic traits. The associated results for the muscle EPA and DHA content, namely the significant SNP loci and functional genes, provided us important guidance for the further investigation of genetic bases of the muscle EPA and DHA accumulation in large yellow croaker and would eventually aid the technological development towards the genetic improvement of meat quality via the molecular-aided selection of the species.
Supplemental Information
Funding Statement
This work was supported by grants from National Natural Science Foundation of China (U1205122 and 31602207), Natural Science Foundation of Fujian Province (2016J05081), National ‘863’ Project of China (2012AA10A403), and the Foundation for Innovation Research Team of Jimei University (2010A02). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Additional Information and Declarations
Competing Interests
The authors declare there are no competing interests.
Author Contributions
Shijun Xiao conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Panpan Wang and Zhaofang Han analyzed the data, reviewed drafts of the paper.
Linsong Dong analyzed the data, contributed reagents/materials/analysis tools, reviewed drafts of the paper.
Yaguang Zhang and Qiurong Wang performed the experiments, contributed reagents/materials/analysis tools, reviewed drafts of the paper.
Zhiyong Wang conceived and designed the experiments, reviewed drafts of the paper.
Animal Ethics
The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers):
This sample collection and experiments was approved by the Animal Care and Use committee of Fisheries College of Jimei University (Animal Ethics no. 1067).
Data Availability
The following information was supplied regarding data availability:
The sequencing short reads were deposited in the NCBI Sequence Read Archive (SRA): https://www.ncbi.nlm.nih.gov/bioproject/?term=309464.
References
- Altschul et al. (1997).Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Anil (2007).Anil E. The impact of EPA and DHA on blood lipids and lipoprotein metabolism: influence of apoE genotype. Proceedings of the Nutrition Society. 2007;66:60–68. doi: 10.1017/S0029665107005307. [DOI] [PubMed] [Google Scholar]
- Ao et al. (2015).Ao J, Mu Y, Xiang LX, Fan D, Feng M, Zhang S, Shi Q, Zhu LY, Li T, Ding Y. Genome sequencing of the perciform fish Larimichthys crocea provides insights into molecular and genetic mechanisms of stress adaptation. PLoS Genetics. 2015;11:e1005118. doi: 10.1371/journal.pgen.1005118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baird et al. (2008).Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, Selker EU, Cresko WA, Johnson EA. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE. 2008;3:e3376. doi: 10.1371/journal.pone.0003376. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bansal et al. (2010).Bansal V, Harismendy O, Tewhey R, Murray SS, Schork NJ, Topol EJ, Frazer KA. Accurate detection and genotyping of SNPs utilizing population sequencing data. Genome Research. 2010;20:537–545. doi: 10.1101/gr.100040.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Beissinger et al. (2013).Beissinger TM, Hirsch CN, Sekhon RS, Foerster JM, Johnson JM, Muttoni G, Vaillancourt B, Buell CR, Kaeppler SM, De Leon N. Marker density and read depth for genotyping populations using genotyping-by-sequencing. Genetics. 2013;193:1073–1081. doi: 10.1534/genetics.112.147710. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Benjamini & Hochberg (1995).Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological) 1995;57:289–300. [Google Scholar]
- Berthelot et al. (2014).Berthelot C, Brunet F, Chalopin D, Juanchich A, Bernard M, Noël B, Bento P, Da Silva C, Labadie K, Alberti A. The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nature Communications. 2014;5:3657–3657. doi: 10.1038/ncomms4657. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Braasch et al. (2016).Braasch I, Gehrke AR, Smith JJ, Kawasaki K, Manousaki T, Pasquier J, Amores A, Desvignes T, Batzel P, Catchen J. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nature Genetics. 2016;48:427–437. doi: 10.1038/ng.3526. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brenna & Carlson (2014).Brenna JT, Carlson SE. Docosahexaenoic acid and human brain development: evidence that a dietary supply is needed for optimal development. Journal of Human Evolution. 2014;77:99–106. doi: 10.1016/j.jhevol.2014.02.017. [DOI] [PubMed] [Google Scholar]
- Brockman et al. (2008).Brockman W, Alvarez P, Young S, Garber M, Giannoukos G, Lee WL, Russ C, Lander ES, Nusbaum C, Jaffe DB. Quality scores and SNP detection in sequencing-by-synthesis systems. Genome Research. 2008;18:763–770. doi: 10.1101/gr.070227.107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Correa et al. (2015).Correa K, Lhorente JP, López ME, Bassini L, Naswa S, Deeb N, Di Genova A, Maass A, Davidson WS, Yáñez JM. Genome-wide association analysis reveals loci associated with resistance against Piscirickettsia salmonis in two Atlantic salmon (Salmo salar L) chromosomes. BMC Genomics. 2015;16:1–9. doi: 10.1186/1471-2164-16-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- De Donato et al. (2013).De Donato M, Peters SO, Mitchell SE, Hussain T, Imumorin IG. Genotyping-by-sequencing (GBS): a novel, efficient and cost-effective genotyping method for cattle using next-generation sequencing. PLoS ONE. 2013;8:e62137. doi: 10.1371/journal.pone.0062137. [DOI] [PMC free article] [PubMed] [Google Scholar]
- De Pristo et al. (2011).De Pristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, Del Angel G, Rivas MA, Hanna M. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics. 2011;43:491–498. doi: 10.1038/ng.806. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dorajoo et al. (2015).Dorajoo R, Sun Y, Han Y, Ke T, Burger A, Chang X, Low HQ, Guan W, Lemaitre RN, Khor C-C. A genome-wide association study of n-3 and n-6 plasma fatty acids in a Singaporean Chinese population. Genes & Nutrition. 2015;10:1–11. doi: 10.1007/s12263-014-0449-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Elshire et al. (2011).Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE. 2011;6:e19379. doi: 10.1371/journal.pone.0019379. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Folch, Lees & Sloane-Stanley (1957).Folch J, Lees M, Sloane-Stanley G. A simple method for the isolation and purification of total lipids from animal tissues. Journal of Biological Chemistry. 1957;226:497–509. [PubMed] [Google Scholar]
- Gregory et al. (2016).Gregory M, Collins R, Tocher D, James M, Turchini G. Nutritional regulation of long-chain PUFA biosynthetic genes in rainbow trout (Oncorhynchus mykiss) The British Journal of Nutrition. 2016;115:1–9. doi: 10.1017/S0007114515003736. [DOI] [PubMed] [Google Scholar]
- Guajardo et al. (2015).Guajardo V, Solís S, Sagredo B, Gainza F, Muñoz C, Gasic K, Hinrichsen P. Construction of high density sweet cherry (Prunus avium L) linkage maps using microsatellite markers and SNPs detected by genotyping-by-sequencing (GBS) PLoS ONE. 2015;10:e0127750. doi: 10.1371/journal.pone.0127750. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hiremath et al. (2012).Hiremath PJ, Kumar A, Penmetsa RV, Farmer A, Schlueter JA, Chamarthi SK, Whaley AM, Carrasquilla-Garcia N, Gaur PM, Upadhyaya HD. Large-scale development of cost-effective SNP marker assays for diversity assessment and genetic mapping in chickpea and comparative mapping in legumes. Plant Biotechnology Journal. 2012;10:716–732. doi: 10.1111/j.1467-7652.2012.00710.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hoffmire et al. (2012).Hoffmire CA, Block RC, Thevenet-Morrison K, Van Wijngaarden E. Associations between omega-3 poly-unsaturated fatty acids from fish consumption and severity of depressive symptoms: an analysis of the 2005–2008 National Health and Nutrition Examination Survey. Prostaglandins, Leukotrienes and Essential Fatty Acids. 2012;86:155–160. doi: 10.1016/j.plefa.2012.03.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hu et al. (2016).Hu Y, Li H, Lu L, Manichaikul A, Zhu J, Chen YD, Sun L, Liang S, Siscovick DS, Steffen LM, Tsai MY, Rich SS, Lemaitre RN, Lin X. Genome-wide meta-analyses identify novel loci associated with n-3 and n-6 polyunsaturated fatty acid levels in Chinese and European-ancestry populations. Human Molecular Genetics. 2016;25(6):1215–1224. doi: 10.1093/hmg/ddw002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jaillon et al. (2004).Jaillon O, Aury J-M, Brunet F, Petit J-L, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004;431:946–957. doi: 10.1038/nature03025. [DOI] [PubMed] [Google Scholar]
- Li et al. (2013).Li Q, Ai Q, Mai K, Xu W, Zheng Y. A comparative study: in vitro effects of EPA and DHA on immune functions of head-kidney macrophages isolated from large yellow croaker (Larmichthys crocea) Fish & Shellfish Immunology. 2013;35:933–940. doi: 10.1016/j.fsi.2013.07.004. [DOI] [PubMed] [Google Scholar]
- Li & Durbin (2009).Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li et al. (2009).Li R, Li Y, Fang X, Yang H, Wang J, Kristiansen K, Wang J. SNP detection for massively parallel whole-genome resequencing. Genome Research. 2009;19:1124–1132. doi: 10.1101/gr.088013.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li et al. (2014).Li H, Liu J-P, Zhang M-L, Yu N, Li E-C, Chen L-Q, Du Z-Y. Comparative analysis of fatty acid profiles in brains and eyes of five economic fish species in winter and summer. Journal of Food and Nutrition Research. 2014;2:722–730. doi: 10.12691/jfnr-2-10-11. [DOI] [Google Scholar]
- Lipshutz et al. (1999).Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ. High density synthetic oligonucleotide arrays. Nature Genetics. 1999;21:20–24. doi: 10.1038/4447. [DOI] [PubMed] [Google Scholar]
- Liu, Mitcheson & Sadovy (2008).Liu M, Mitcheson D, Sadovy Y. Profile of a fishery collapse: why mariculture failed to save the large yellow croaker. Fish and Fisheries. 2008;9:219–242. doi: 10.1111/j.1467-2979.2008.00278.x. [DOI] [Google Scholar]
- McKenna et al. (2010).McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Miller et al. (2007).Miller MR, Dunham JP, Amores A, Cresko WA, Johnson EA. Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Research. 2007;17:240–248. doi: 10.1101/gr.5681207. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Muir et al. (2016).Muir P, Li S, Lou S, Wang D, Spakowicz DJ, Salichos L, Zhang J, Weinstock GM, Isaacs F, Rozowsky J. The real cost of sequencing: scaling computation to keep pace with data generation. Genome Biology. 2016;17:1–9. doi: 10.1186/s13059-015-0866-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Murillo, Rao & Durant (2014).Murillo E, Rao K, Durant AA. The lipid content and fatty acid composition of four eastern central Pacific native fish species. Journal of Food Composition and Analysis. 2014;33:1–5. doi: 10.1016/j.jfca.2013.08.007. [DOI] [Google Scholar]
- Narum et al. (2013).Narum SR, Buerkle CA, Davey JW, Miller MR, Hohenlohe PA. Genotyping-by-sequencing in ecological and conservation genomics. Molecular Ecology. 2013;22:2841–2847. doi: 10.1111/mec.12350. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nguyen, Hayes & Ingram (2014).Nguyen TTT, Hayes BJ, Ingram BA. Genetic parameters and response to selection in blue mussel (Mytilus galloprovincialis) using a SNP-based pedigree. Aquaculture. 2014;420–421:295–301. doi: 10.1016/j.aquaculture.2013.11.021. [DOI] [Google Scholar]
- Poland & Rife (2012).Poland JA, Rife TW. Genotyping-by-sequencing for plant breeding and genetics. The Plant Genome. 2012;5:92–102. doi: 10.3835/plantgenome2012.05.0005. [DOI] [Google Scholar]
- Popova et al. (2013).Popova T, Boeva V, Manie E, Rozenholc Y, Barillot E, Stern M-H. Analysis of somatic alterations in cancer genome: from SNP arrays to next generation sequencing. Sequence and Genome Analysis I–Humans, Animals and Plants. 2013;204:63–97. [Google Scholar]
- Purcell et al. (2007).Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, De Bakker PI, Daly MJ. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics. 2007;81:559–575. doi: 10.1086/519795. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pushkarev, Neff & Quake (2009).Pushkarev D, Neff NF, Quake SR. Single-molecule sequencing of an individual human genome. Nature Biotechnology. 2009;27:847–850. doi: 10.1038/nbt.1561. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Qiu (2003).Qiu X. Biosynthesis of docosahexaenoic acid (DHA, 22: 6–4, 7, 10, 13, 16, 19): two distinct pathways. Prostaglandins, Leukotrienes and Essential Fatty Acids. 2003;68:181–186. doi: 10.1016/S0952-3278(02)00268-5. [DOI] [PubMed] [Google Scholar]
- Rise, Marangoni & Galli (2002).Rise P, Marangoni F, Galli C. Regulation of PUFA metabolism: pharmacological and toxicological aspects. Prostaglandins, Leukotrienes and Essential Fatty Acids. 2002;67:85–89. doi: 10.1054/plef.2002.0403. [DOI] [PubMed] [Google Scholar]
- Rowe, Renaut & Guggisberg (2011).Rowe H, Renaut S, Guggisberg A. RAD in the realm of next-generation sequencing technologies. Molecular Ecology. 2011;20:3499–3502. doi: 10.1111/j.1365-294x.2011.05197.x. [DOI] [PubMed] [Google Scholar]
- Seeb et al. (2011).Seeb J, Carvalho G, Hauser L, Naish K, Roberts S, Seeb L. Single-nucleotide polymorphism (SNP) discovery and applications of SNP genotyping in nonmodel organisms. Molecular Ecology Resources. 2011;11:1–8. doi: 10.1111/j.1755-0998.2010.02979.x. [DOI] [PubMed] [Google Scholar]
- Sémon & Wolfe (2007).Sémon M, Wolfe KH. Rearrangement rate following the whole-genome duplication in teleosts. Molecular Biology and Evolution. 2007;24:860–867. doi: 10.1093/molbev/msm003. [DOI] [PubMed] [Google Scholar]
- Shi & Walker (2007).Shi J, Walker MG. Gene set enrichment analysis (GSEA) for interpreting gene expression profiles. Current Bioinformatics. 2007;2:133–137. doi: 10.2174/157489307780618231. [DOI] [Google Scholar]
- Smith & Graser (1986).Smith S, Graser H-U. Estimating variance components in a class of mixed models by restricted maximum likelihood. Journal of Dairy Science. 1986;69:1156–1165. doi: 10.3168/jds.S0022-0302(86)80516-1. [DOI] [Google Scholar]
- Sonah et al. (2013).Sonah H, Bastien M, Iquira E, Tardivel A, Légaré G, Boyle B, Normandeau É, Laroche J, Larose S, Jean M. An improved genotyping by sequencing (GBS) approach offering increased versatility and efficiency of SNP discovery and genotyping. PLoS ONE. 2013;8:e54603. doi: 10.1371/journal.pone.0054603. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Steiner et al. (2013).Steiner CC, Putnam AS, Hoeck PE, Ryder OA. Conservation genomics of threatened animal species. Annual Review of Animal Biosciences. 2013;1:261–281. doi: 10.1146/annurev-animal-031412-103636. [DOI] [PubMed] [Google Scholar]
- Sun et al. (2014).Sun L, Liu S, Wang R, Jiang Y, Zhang Y, Zhang J, Bao L, Kaltenboeck L, Dunham R, Waldbieser G. Identification and analysis of genome-wide SNPs provide insight into signatures of selection and domestication in channel catfish (Ictalurus punctatus) PLoS ONE. 2014;9:e109666. doi: 10.1371/journal.pone.0109666. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Swanson, Block & Mousa (2012).Swanson D, Block R, Mousa SA. Omega-3 fatty acids EPA and DHA: health benefits throughout life. Advances in Nutrition. 2012;3:1–7. doi: 10.3945/an.111.000893. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Trushenski et al. (2012).Trushenski J, Schwarz M, Bergman A, Rombenso A, Delbos B. DHA is essential, EPA appears largely expendable, in meeting the n-3 long-chain polyunsaturated fatty acid requirements of juvenile cobia Rachycentron canadum. Aquaculture. 2012;326:81–89. [Google Scholar]
- Van der Auwera et al. (2013).Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, DePristo MA. From FastQ data to high-confidence variant calls: the genome analysis Toolkit best practices pipeline. Current Protocols in Bioinformatics. 2013;43:11.10.1–11.10.33. doi: 10.1002/0471250953.bi1110s43. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang et al. (2012).Wang S, Meyer E, McKay JK, Matz MV. 2b-RAD: a simple and flexible method for genome-wide genotyping. Nature Methods. 2012;9:808–810. doi: 10.1038/nmeth.2023. [DOI] [PubMed] [Google Scholar]
- Wang et al. (2008).Wang S, Sha Z, Sonstegard TS, Liu H, Xu P, Somridhivej B, Peatman E, Kucuktas H, Liu Z. Quality assessment parameters for EST-derived SNPs from catfish. BMC Genomics. 2008;9:1. doi: 10.1186/1471-2164-9-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xiao et al. (2015a).Xiao S, Han Z, Wang P, Han F, Liu Y, Li J, Wang ZY. Functional marker detection and analysis on a comprehensive transcriptome of large yellow croaker by next generation sequencing. PLoS ONE. 2015a;10:e0124432. doi: 10.1371/journal.pone.0124432. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xiao et al. (2015b).Xiao S, Wang P, Zhang Y, Fang L, Liu Y, Li JT, Wang ZY. Gene map of large yellow croaker (Larimichthys crocea) provides insights into teleost genome evolution and conserved regions associated with growth. Scientific Reports. 2015b;5 doi: 10.1038/srep18661. Article 18661. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The following information was supplied regarding data availability:
The sequencing short reads were deposited in the NCBI Sequence Read Archive (SRA): https://www.ncbi.nlm.nih.gov/bioproject/?term=309464.