Abstract
Salinity is a major abiotic constraint for rice farming. Abundant natural variability exists in rice germplasm for salt tolerance traits. Since few studies focused on the genome level variation in rice genotypes with contrasting response to salt stress, genomic resequencing in diverse genetic materials is needed to elucidate the molecular basis of salt tolerance mechanisms. The whole genome sequences of two salt tolerant (Pokkali and Nona Bokra) and three salt sensitive (Bengal, Cocodrie, and IR64) rice genotypes were analyzed. A total of 413 million reads were generated with a mean genome coverage of 93% and mean sequencing depth of 18X. Analysis of the DNA polymorphisms revealed that 2347 nonsynonymous SNPs and 51 frameshift mutations could differentiate the salt tolerant from the salt sensitive genotypes. The integration of genome-wide polymorphism information with the QTL mapping and expression profiling data led to identification of 396 differentially expressed genes with large effect variants in the coding regions. These genes were involved in multiple salt tolerance mechanisms, such as ion transport, oxidative stress tolerance, signal transduction, and transcriptional regulation. The genome-wide DNA polymorphisms and the promising candidate genes identified in this study represent a valuable resource for molecular breeding of salt tolerant rice varieties.
Subject terms: Genetics, Plant genetics
Introduction
Salinity is a major environmental constraint that threatens world food security. It affects crop growth, development, and productivity due to reduced water uptake and increased concentration of salts1. Soil salinization is increasing at an alarming rate with 1.5 million hectares of land becoming unsuitable for agriculture each year and 50% of the cultivable land is predicted to be unsuitable for farming by 20502. Salinity will continue to be a major constraint for crop production due to climate change and poor irrigation practices. Therefore, enhancing adaptation of major crop plants under saline condition and development of improved irrigation management practices are logical and pragmatic approaches for increasing global food production.
Rice (Oryza sativa L.) is a food staple for more than 50% of the world population. Since it is highly sensitive to salt stress, development of rice varieties that maintain high yield under saline environment is needed to enhance rice productivity. Abundant natural variability for salt tolerance existing in rice germplasm have been exploited in rice breeding programs with limited success3,4. Genetic engineering approaches using many genes associated with signaling pathways, ion transport, oxidative stress tolerance, and osmolyte accumulation have not led to commercialization of any salt tolerant rice variety. The lack of progress in this direction is mainly due to involvement of multiple adaptive mechanisms controlled by myriad of genes. Identification of genes and their superior variants would be helpful for a comprehensive understanding of salt tolerance mechanisms as well as for rice breeding activities through marker-assisted selection5.
Due to increased speed and reduced cost of sequencing, next generation sequencing technologies have accelerated development and adoption of novel strategies for crop improvement6. The available high-quality reference genome allows quick and precise alignment and mapping of sequences generated by next generation sequencing technologies resulting in identification of variants on a genome scale7. In general, discovery of single nucleotide polymorphisms (SNPs) and insertions/deletions (InDels) spread over the whole genome has proven to be useful for high throughput genotyping, genome mapping, population genetics studies, gene cloning, and marker-assisted breeding7. Since the DNA sequence variants particularly in the coding and regulatory regions impact gene expression, discovery of genome-wide variants provides an opportunity to elucidate the molecular basis of phenotypic differences.
The whole genome resequencing of rice germplasm has generated valuable genomic resources to accelerate both genetic analysis and molecular breeding of agronomically important traits8–10. A most significant undertaking in this regard involved sequencing of 3,010 diverse rice genotypes which revealed numerous novel protein coding genes and provided a comprehensive analysis of genetic diversity, population structure, and domestication process11. These genomic resources have been exploited for developing markers for genome-wide association studies of agronomically important traits12 and for resolving the origin of cultivated rice13.
The molecular basis of response to abiotic stresses such as drought and salinity has been investigated by whole genome resequencing14. The analysis of whole genome and transcriptome of a salt tolerant indica genotype SR86 revealed several differentially expressed genes and their high impact variants under salt stress15. Since very few studies have analyzed the genome level variation in salt sensitive and salt tolerant genotypes14–16, genomic resequencing in more diverse genetic materials is needed not only to provide insights into the molecular basis of salt tolerance, but also to generate genomic resources for rice improvement. In this study, we used three salt sensitive and two salt tolerant rice genotypes for resequencing. Compared to earlier studies14,15, we contrasted the salt tolerant genotypes with the salt sensitive genotypes at the whole genome level and identified genome-wide DNA polymorphisms. Through integration of the DNA polymorphism data with the results from our transcriptomics16 and quantitative trait loci (QTL) studies17–19, the differentially expressed genes (DEGs) and their variants present in the salt tolerance QTL regions were identified for validation as well as for use in molecular breeding to improve salt tolerance in the future.
Results
Genome-wide discovery of DNA polymorphisms in pair-wise analysis
Whole genome resequencing of five genotypes resulted in a total of 413 million reads (Table 1). Ninety-eight percent of the filtered reads were mapped on the reference genome. The mean genome coverage was 93% with highest in Bengal (97%) and lowest in IR64 (87%). The sequencing depth ranged from 12-fold in Cocodrie to 26-fold in IR64. The genome sequence of each of the two salt tolerant genotypes was compared with each of the three salt sensitive genotypes for discovery of SNPs and InDels. Together for all six combinations (IR64/Pokkali, IR64/Nona Bokra, Bengal/Pokkali, Bengal/Nona Bokra, Cocodrie/Pokkali, and Cocodrie/Nona Bokra) (Supplementary Fig. S1), largest number of SNPs and InDels were identified in chromosome 1, whereas chromosome 9 harbored the least. The frequency of SNPs and InDels was highest in chromosomes 6 and 7 and lowest in chromosome 10.
Table 1.
Total raw reads | Total filtered reads | Sequencing depth (fold) | Total mapped reads | Genome coverage (%) | Uniquely mapped reads | |
---|---|---|---|---|---|---|
Pokkali | 68,001,168 | 63,188,954 (92.9%) | 14.7 | 62,376,939 (98.7%) | 92.7 | 45,061,913 (71.3%) |
Nona Bokra | 65,901,336 | 60,705,534 (92.1%) | 14.1 | 59,995,162 (98.8%) | 92.2 | 43,889,764 (72.3%) |
Cocodrie | 53,807,980 | 49,931,256 (97.5%) | 11.6 | 49,478,104 (99.1%) | 95.1 | 39,103,676 (78.3%) |
Bengal | 104,386,506 | 98,771,468 (94.6%) | 23.0 | 98,132,017 (99.4%) | 96.6 | 80,529,369 (81.5%) |
IR64 | 121,235,526 | 113,048,322 (93.3%) | 26.3 | 109,233,476 (96.6%) | 86.7 | 79,987,053 (70.8%) |
With regards to the individual pair-wise comparisons, the number of variants was highest in both Bengal/Pokkali and Bengal/Nona Bokra, but lowest in IR64/Pokkali and IR64/Nona Bokra (Supplementary Fig. S2). There were 1.3 million SNPs in each combination involving Bengal, whereas 105,027 and 109,626 InDels were identified in Bengal/Pokkali and Bengal/Nona Bokra combinations, respectively (Supplementary Tables S1 and S2). On the other hand, the number of SNPs was 536,863 and 549,297 and the number of InDels was 43,604 and 46,482 in IR64/Pokkali and IR64/Nona Bokra, respectively. The number of SNPs and InDels in combinations involving Cocodrie was in between the combinations involving Bengal and IR64. The frequency of SNPs and InDels per 100 kb was highest in combinations involving Bengal followed by Cocodrie and IR64.
The distribution and frequency of these DNA polymorphisms were uneven across the rice chromosomes in all six combinations (Supplementary Tables S1 and S2). The number of InDels was proportionate to the chromosome length but not for SNPs. For example, the largest number of SNPs was observed on the longest chromosome 1 in IR64/Pokkali, Bengal/Nona Bokra, Cocodrie/Pokkali and Cocodrie/Nona Bokra. The frequency of SNPs and InDels was not the lowest for the smallest chromosome. The frequency of SNPs was highest in chromosomes 12, 8, 9, 9, 6, and 7 in IR64/Pokkali, IR64/Nona Bokra, Bengal/Pokkali, Bengal/Nona Bokra, Cocodrie/Pokkali, and Cocodrie/Nona Bokra, respectively.
Analysis and annotation of SNPs and InDels
Overall, the number of SNPs with transitions (A/G and C/T) was five-fold higher than SNPs with transversions (A/C, A/T, G/C, and G/T) (Supplementary Fig. S3A). Both A/G and C/T transitions were observed in equal number. The frequency of A/T, A/C, and G/T was similar but higher than G/C among the transversions. The transition/transversion ratio was 2.4. The length distribution revealed a range of 1 bp to 31 bp for deletions and 1 bp to 46 bp for insertions (Supplementary Fig. S3B).
Nearly 75% of SNPs and InDels were in the intergenic regions and the rest in the genic regions (Supplementary Fig. S4A). A large percentage of these polymorphisms were in promoter and 1 kb downstream regions. Most SNPs and InDels were present in the intronic regions (Supplementary Fig. S4B). The coding sequence (CDS) regions harbored a significantly higher percentage of SNPs compared with InDels. The percentage of SNPs and InDels in the 5′ untranslated regions (UTR) was the least. All SNPs identified in six combinations located within the CDS regions were classified into two categories, synonymous and nonsynonymous SNPs. There were more nonsynonymous SNPs than the synonymous SNPs. Of the 170,104 SNPs identified in the CDS regions, 56% were nonsynonymous. Among InDels, 2800 frameshift mutations were detected in the CDS regions.
The SNPs and InDels were categorized into mainly four types based on their functional annotation: high impact (affecting splice site, stop, and start codons), moderate impact (nonsynonymous), low impact (synonymous coding, alternate start/stop, start gained), and modifier (upstream, downstream, intergenic, and UTRs). The share of modifier SNPs was the highest among all SNPs (93.5%) followed by moderate (3.5%), low (2.7%), and high impact SNPs (0.2%) (Supplementary Table S3). On the other hand, 98.3% of InDels were modifiers, but the number of high impact InDels were more (1.1%) compared to moderate (0.2%) and low impact InDels (0.4%). The high and moderate impact SNPs affecting the amino acids were assigned into missense, nonsense, and silent categories, which constituted 58%, 2.7% and 39.7%, respectively (Supplementary Table S4). In addition, only a small number of high impact SNPs and InDels affecting the transcript splice site (splice acceptor and splice donor), stop codon (stop site gain and stop site lost) and start codon lost were identified.
Genes carrying SNPs and InDels and their functional relevance
The genes carrying nonsynonymous/large effect SNPs/InDels were classified based on their functions using eukaryotic orthologous group (KOG) analysis (Supplementary Fig. S5). There was no difference in percentage of genes under each category in both Pokkali and Nona Bokra comparisons. The largest group included genes for general function followed by signal transduction mechanism, post translational modification, protein turnover, and chaperones. Few other highly enriched groups included genes involved in RNA processing function, energy production and conversion, cell cycle control, cell division, amino acid transport, carbohydrate transport, lipid transport, transcription, and translational activities.
Gene ontology (GO) analysis was performed using the three sets of genes identified in all six genotype pairs: (a) genes harboring the large effect/nonsynonymous SNPs and InDels, (b) genes harboring SNPs and InDels in the promoter regions, and (c) differentially expressed genes16 harboring large effect SNPs and/or nonsynonymous SNPs and InDels in the promoter regions. In case of the large effect variants containing genes, GO analysis revealed significant enrichment of genes involved in biological processes, such as cellular response to stress, ion transport, post-translational protein modification, phosphorylation and cellular carbohydrate, and lipid metabolic process, whereas the genes with protein binding activity, kinase activity, transferase activity, and hydrolase activity were abundant under the molecular function category (Supplementary Fig. S6). The analysis of polymorphisms in the promoter regions of genes revealed enrichment of similar categories of pathways but the genes involved in ATPase activity were significantly enriched (Supplementary Fig. S7). In case of DEGs harboring large effect SNPs and InDels, genes involved in the biological process such as response to chemical stimulus were enriched, whereas genes involved in DNA polymerase activity, kinase activity, phosphoprotein phosphatase activity were significantly abundant under the molecular function category (Supplementary Fig. S8).
Analysis of DNA polymorphisms between salt tolerant and salt sensitive genotypes
The distribution and frequency of genome-wide DNA polymorphisms and their functional relevance between the salt tolerant and salt sensitive genotypes were examined to gain insights into the molecular basis of salt tolerance (Fig. 1). All polymorphic SNPs and InDels between these two groups were considered for analysis. The frequency and density of these polymorphisms were not proportional to the length of chromosomes which was quite different from the pattern seen in pairwise analysis (Supplementary Fig. S1). The number and density of SNPs and InDels per 100 kb was highest in chromosomes 5 and 6 and lowest in chromosomes 10 and 12. The type of nucleotide substitutions (transitions and transversions) followed the same trend as described before for the SNPs and InDels detected in all six pairs and the ratio of transition/transversion was slightly lower than the estimate made on all combinations (Fig. 2A). The length of insertions and deletions detected was up to 12 bp and 17 bp, respectively and most polymorphisms involved insertion or deletion of 1–2 bp (Fig. 2B).
Analysis of annotation information from rice genome revealed that majority of DNA polymorphisms were in intergenic, intronic, upstream, and downstream regions of the genes (Fig. 3A). A total of 18,168 SNPs and 1307 InDels were located within the genes and 24% of these SNPs and 4% of InDels were present in the CDS regions (Fig. 3B). Majority of SNPs (64%) and InDels (76%) were located in the intronic locations. There were only 2347 nonsynonymous SNPs and 51 frameshift changes detected between salt tolerant and salt sensitive groups.
A total of 85 SNPs and 56 InDels with large effects were identified (Table 2). Most of these high impact SNPs disrupted the splice sites, start, and stop codons. Among these groups, stop gained type constituted 55% of total polymorphisms. The high impact InDels were largely due to frameshift mutations. Compared to this result, large number of large effect SNPs and InDels was observed in each of the six combination (Supplementary Table S5).
Table 2.
Type of DNA polymorphism | SNPs | InDels |
---|---|---|
Splice site acceptor | 14 | 0 |
Splice site donor | 10 | 0 |
Start lost | 4 | 0 |
Stop lost | 8 | 5 |
Stop gained | 49 | 0 |
Frameshift | 0 | 51 |
Total | 85 | 56 |
The gene ontology analysis using the genes carrying nonsynonymous SNPs and/or large effect DNA polymorphisms between the two groups revealed that genes involved in post-translational protein phosphorylation and protein binding genes were enriched (Fig. 4). But when the SNPs and InDels in the promoter regions were considered, the genes involved in post-translational protein ubiquitination and DNA-directed RNA polymerase activity were abundant (Fig. 5). But in case of the differentially expressed genes, genes involved in biological processes such as response to chemical stimulus and abiotic stimulus were enriched like an earlier study16 (Fig. 6).
Associating salt tolerant QTLs to salt responsive genes
We selected 118 QTLs identified earlier in three mapping populations involving both Pokkali and Nona Bokra (Supplementary Table S6). These mapping populations were a RIL population from the cross Bengal x Pokkali17 and two introgression line populations from the crosses Jupiter x Nona Bokra18 and Cheniere x Nona Bokra19. Since a RNA-seq experiment conducted by our group identified DEGs between salt tolerant genotype Pokkali and salt sensitive IR64 in response to salt stress16, the expression data from this study was integrated with the DNA polymorphism data for the genes present in the salt tolerant QTL regions. This analysis resulted in identification of 1092 and 1511 DEGs with large effect SNPs/InDels in their coding and promoter regions, respectively (Supplementary Tables S7, S8).
The eukaryotic orthologous groups analysis of DEGs in the QTL intervals (Fig. 7A) corroborated the earlier GO analysis (Figs. 5, 6). Fifteen percent of genes were involved in post-translational modification/protein turnover, and chaperones, whereas the signal transduction mechanisms accounted for 8% of the genes. Two other groups included genes involved in energy production and conversion (6%), and genes involved in amino acid transport and metabolism, secondary metabolite biosynthesis, transport and catabolism, and translation, ribosomal structure and biogenesis (5%). Genes associated with biological processes such as responses to chemicals, hormones, and organic substances were most abundant (Fig. 7B). On the other hand, the most enriched genes under the ‘molecular function’ were involved in DNA polymerase, catalytic, transferase, and hydrolase activity.
Discussion
Salinity is a major climate-related risk for rice production worldwide. The discovery of DNA polymorphisms on a genome scale as well as salt stress related candidate genes are useful resources to facilitate genetic analysis and marker-assisted breeding activities for rice improvement. We focused on six pairs of comparisons involving two salt tolerant and three salt sensitive genotypes. There were differences in intraspecies comparisons as expected. For example, indica/indica comparison normally resulted in lower density of polymorphisms compared to indica/japonica comparison10,20. Higher number of polymorphisms observed in pairs involving Bengal compared to those involving Cocodrie could be due to genetic closeness of Cocodrie to indica genotypes. Even though both Bengal and Cocodrie belong to the japonica group, Bengal was genetically distinct from Cocodrie21. In all six pairs, the chromosomal regions with high and low density were identified for SNPs and InDels as previously reported16,22,23. Such localization of variants in several genomic regions could be due to hitch-hiking effect of many selected genes during the domestication process24.
Comparison of whole genome sequences between the salt tolerant and salt sensitive groups was done to identify the genes and their variants associated with salt tolerance (Fig. 1). Many of these variants differentiating both groups are expected to be related to salt tolerance. Since both salt tolerant genotypes were land races and all three salt sensitive genotypes were improved varieties, variants for genes involved in contrasting agronomic performance could be included. Compared to the pairwise analysis, there was a drastic reduction in number of SNPs and InDels between both groups. Subsequent annotation of these variants resulted in only 2347 nonsynonymous SNPs and 51 frameshift mutations (Fig. 3).
Integrating genome-wide polymorphism information to QTL mapping and transcriptomics data is increasingly used to identify candidate genes for the target traits25,26. In one study, QTL mapping coupled with RNA-seq analysis resulted in identification of 4 DEGs in the QTL regions for salt tolerance in rice25. But in this study, we first identified genome-wide DNA polymorphisms between two sets of genotypes with contrasting response to salinity stress. In the second step, we used the genomic coordinates of QTLs for several seedling salt tolerance traits from our previous studies involving the same two salt tolerant donors used in this study17–19 and identified the genes or their promoter regions carrying variants in the QTL confidence intervals. Finally, we determined the differentially expressed genes with large effect variants in the CDS and promoter regions using the gene expression data from an earlier study16 (Supplementary Tables S7 and S8).
The functional relevance of the DEGs present in the QTL intervals provides a better understanding of physiological and molecular mechanisms associated with adaptation in saline environment. There were 396 and 573 genes that carried variants in the CDS and promoter regions, respectively. Most abundant classes include genes involved in ion transport, oxidative stress tolerance, signal transduction, stress response, and transcriptional regulation (Supplementary Tables S7 and S8). Several transcriptomic studies reported involvement of genes associated with these salt tolerance mechanisms15,27,28. The congruence of salt tolerance QTLs with many protein kinase and leucine-rich repeat, zinc-finger, NB-ARC, and P450 domain containing genes was earlier reported14.
The DEGs involved in ion transport and homeostasis included potassium channel SKOR (LOC_Os06g14030) and chloride channel protein (LOC_Os04g55210), OsHKT2;3—Na+ transporter (LOC_Os01g34850), ABC transporter, ATP-binding protein (LOC_Os08g30740), AAA family ATPase, (LOC_Os02g19150), calcium-transporting ATPase, plasma membrane-type (LOC_Os12g04220), cation transport regulator-like protein 1 (LOC_Os02g26700), and sulfate transporters (LOC_Os01g41050, LOC_Os03g09970, LOC_Os03g09980). Reduced expression of OsHKT2;4, a high-affinity Na+/K+ transporter and SKOR in response to salt stress was previously reported28. There were three multi-antimicrobial extrusion (MATE) protein family genes (LOC_Os02g45380, LOC_Os01g31980, LOC_Os10g20350) with role in tolerance to abiotic stresses including salinity29.
The oxidative stress alleviating genes included several members of glutathione S-transferases, peroxidases, oxidoreductases, and HAD superfamily phosphatase. These genes were differentially expressed in rice under salt stress15. Several members of the cytochrome P450 family with role in growth, development, and stress tolerance30 were localized in the QTL intervals. The most abundant differentially expressed transcription factors were from MYB, AP2, NB-ARC, Zinc Finger, and Zinc knuckle families which were upregulated in salt tolerant genotypes under salt stress16,27.
There were several plant defense-related DEGs encoding NBS-LRR disease resistance proteins, NB-ARC containing proteins, terpene synthase, and 12-oxophytodienoate reductase. Due to proximity to a salt tolerance QTL, De Leon et al. (2016)17 suggested a NBS-LRR gene as a potential candidate enhancing salt tolerance in Pokkali. The pentatricopeptide repeat (PPR) proteins, which were crucial for adaptation under salinity15,31, overlapped with QTLs and carried large effect variants. Several dirigent genes, which have a role in passage of water and solutes into the vascular system due to its involvement in lignin synthesis32, were identified. These genes were upregulated in roots under salt stress in salt tolerant rice genotype SR8613. Similarly, the inclusion of glycosyl hydrolase family proteins in this list was obvious due to their response to biotic and abiotic stresses33.
The genes involved in signal transduction included members of OsGDSL (GDSL-like lipase/acylhydrolase) and jacalin-like lectin domain containing protein gene family34–37 and were differentially expressed in rice seedlings under salt stress27,35,37. Among kinases and phosphatases, most notable was receptor-like protein kinase 2 precursor which was represented by thirteen members of this gene family. Other genes included tyrosine protein kinase domain containing proteins, BRASSINOSTEROID INSENSITIVE 1-associated receptor kinase 1 precursor, and protein kinase domain containing proteins. A protein phosphatase 2C family protein OsPP2C8 (LOC_Os01g46760) was identified as a potential candidate gene for salt tolerance by associating sequence polymorphism with differential expression of genes present in the salt tolerance QTL region38. Although this gene was not detected in our study, we identified two other members (LOC_Os01g19130, LOC_Os02g13100) that carried SNPs and overlapped with QTLs.
Overall, the genetic complexity of salt tolerance is clearly evident from this study due to involvement of genes associated with multiple salt tolerance mechanisms; therefore, pyramiding of multiple beneficial alleles has been suggested to improve salinity tolerance in rice39. The candidate genes associated with salt tolerance were discovered by integrating the whole genome sequence analysis with QTL and gene expression data. However, this approach may not identify the genotype-specific salt tolerance genes and their variants because the salt tolerant donors vary widely in their response to salt stress due to the differences in genetic control of salt tolerance mechanisms. For example, different salt adaptation mechanisms operating in Pokkali and Nona Bokra40 could be the reason why no DEG was detected in the Saltol region. There are two most important outputs from this study: (1) the candidate genes with large impact variants in the coding as well as promoter regions represent some promising targets for validation in future; (2) the genomic resources of SNPs and InDels would be useful for genetic analysis and development of marker-assisted selection tools to improve salt tolerance in rice.
Materials and methods
Genome resequencing of rice genotypes with contrasting salinity response
Five rice genotypes with contrasting response to salt stress were used for this investigation. The salt tolerant genotypes, Pokkali and Nona Bokra, are indica landraces which are widely used in breeding program to enhance salt tolerance3,4 as well as in many QTL mapping studies17–19,39,41. Both genotypes respond differently to salt stress suggesting differences in salt tolerance mechanisms40,42,43. Among the salt sensitive genotypes, Bengal and Cocodrie are japonica cultivars developed at the Louisiana State University Agricultural Center44,45. Both varieties are highly sensitive to salt stress at the seedling stage21. IR64, a widely cultivated indica variety developed at the International Rice Research Institute, is sensitive to salt stress at the seedling stage46.
Genomic DNA was isolated from young leaves using Qiagen DNeasy kit (Qiagen Inc., Valencia, CA, USA). The quality and concentration of DNA in each sample were determined by Bioanalyzer 2100 (Agilent Technologies, Singapore) and Qubit 2.0 Fluorometer (Invitrogen Life Technologies, Eugene, Oregon), respectively. Libraries were made using Illumina TruSeq DNA sample preparation kit (Illumina, USA) and paired-end sequencing was done in an Illumina HiSeq 2000 at the Virginia Bioinformatics Institute, Blacksburg, VA. The filtering of raw sequence data was accomplished via an in-built standard Illumina pipeline.
The FASTQ files for all five genotypes were submitted to the sequence read archive (SRA) at the National Center for Biotechnology Information (NCBI) and the SRA accession numbers are PRJNA413821, PRJNA413822, PRJNA632686, and SRX272395.
Mapping of whole genome sequences to the reference genome
The primer/adopter sequences and low-quality reads with Phred quality score < 30 were removed using the NGS QC Toolkit (v2.3.3; http://www.nipgr.res.in/ngsqctoolkit.html)47. Only high-quality filtered reads were used for mapping on the rice reference genome (MSU7 version; http://rice.plantbiology.msu.edu/index.shtml) using Burrows Wheeler Alignment (BWA) software (v0.7.12)48. The SAMtools (v1.1) was used to determine the reference genome coverage49. All downstream analyses were done using the uniquely mapped reads.
Identification and annotation of variants
FreeBayes software (v0.9.21; https://github.com/ekg/freebayes) was employed to identify the SNPs and InDels using the following criteria: (a) the minimum variant frequency of ≥ 90%, (b) average quality of the SNP base ≥ 30, and (c) minimum read depth of 10. Additional filtering was done if there were three or more SNPs/InDels in any 10-bp window50. The distribution of DNA polymorphisms in different genomic regions was assessed by combining genome annotation information and positions of DNA polymorphisms. Above analyses including identification of synonymous/nonsynonymous SNPs, and large-effect SNPs/InDel were accomplished using single-nucleotide polymorphism effect predictor (SnpEff, v4.1 k)51 with default parameters. To identify the variants in the promoter regions, 2 kb upstream sequence of genes was used. All identified SNPs and InDels in the 5 rice genotypes compared to the reference genome were uploaded in a public depository Figshare (https://figshare.com/projects/Whole_genome_sequence_analysis_of_rice_genotypes_with_contrasting_response_to_salinity_stress_Insights_into_salt_tolerance/90053).
Gene ontology analyses
BiNGO plug-in (version 2.44, https://www.psb.ugent.be/cbd/papers/BiNGO/Home.html) available in Cytoscape (version 3.2.2, http://www.cytoscape.org/) was used for gene ontology (GO) analysis at P-value of ≤ 0.05. For functional characterization, genes were assigned to eukaryotic orthologous group (KOG) using KOGnitor database of National Center for Biotechnology Information (NCBI). The GO enrichment analysis was done separately using genes with large-effect polymorphisms in all genes or promoter regions or the differentially expressed genes (DEGs), identified in all pair-wise comparisons as well as for those differentiating salt tolerant from salt sensitive groups (100% allelic variation between both groups).
Mapping of SNPs/InDels on QTL regions
Since several QTL mapping studies were published by our laboratory using both Pokkali and Nona Bokra as salt tolerant donors17–19,41, the nonsynonymous SNPs and/or large effect variants (100% allelic variation) present in the QTL regions differentiating salt tolerant and salt sensitive groups were used to identify candidate genes associated with salt stress response. The start and end positions of QTLs in the MSU7 rice genome sequence were ascertained via BLASTN search. The nonsynonymous SNPs and/or large effect variants present within the QTL confidence intervals were identified based on overlapping of the genomic coordinates of QTLs in the rice genome annotation GFF file. The nonsynonymous SNPs and/or large effect variants present in the QTL confidence intervals were further mapped on the differentially expressed genes obtained between Pokkali and IR64 in response to salt stress in an earlier transcriptomic study16. This transcriptome study reported the differentially expressed genes (log2 fold change > 1 or < -1 with P-value ≤ 0.05) at the seedling stage under control and salinity stress using Tophat and Cufflinks pipeline52. After identification of variants in the QTL regions, the DEGs between tolerant and sensitive groups carrying these variants in CDS and promoter regions were analyzed for their functional relevance by GO analysis.
Supplementary information
Acknowledgements
This research was supported by United States Department of Agriculture-National Institute of Food and Agriculture (Grant No. 2018-67013-27618). This manuscript is approved for publication by the Director of Louisiana Agricultural Experiment Station, USA as manuscript number 2020-306-34886. M.J. acknowledges the Department of Biotechnology, Government of India for providing support under different schemes.
Author contributions
P.K.S. designed the study. R.S. conducted the experiment, data analysis, and generated all the figures and tables. M.J. supervised the data analysis. P.K.S. wrote the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Prasanta K. Subudhi and Rama Shankar.
Contributor Information
Prasanta K. Subudhi, Email: psubudhi@agcenter.lsu.edu
Mukesh Jain, Email: mjain@jnu.ac.in.
Supplementary information
is available for this paper at 10.1038/s41598-020-78256-8.
References
- 1.Munns R, Tester M. Mechanisms of salinity tolerance. Ann. Rev. Plant Biol. 2008;59:651–681. doi: 10.1146/annurev.arplant.59.032607.092911. [DOI] [PubMed] [Google Scholar]
- 2.Hasanuzzaman, M., Nahar, K., Alam, M.M., Bhowmik, P.C., Hossain, M.A., Rahman, M.M. et al. Potential use of halophytes to remediate saline soils. J. Biomedicine Biotech.2014, Article ID 589341 (2014). [DOI] [PMC free article] [PubMed]
- 3.Akbar, M., Gunawardena, I.E. & Ponnamperuma, F.N. Breeding for soil stresses. In: Progress in Rainfed Lowland Rice. International Rice Research Institute, Manila (1986).
- 4.Gregorio GB, et al. Progress in breeding for salinity tolerance and associated abiotic stresses in rice. Field Crops Res. 2002;76:91–101. doi: 10.1016/S0378-4290(02)00031-X. [DOI] [Google Scholar]
- 5.Cushman JC, Bohnert HJ. Genomic approaches to plant stress tolerance. Curr. Opin. Plant Biol. 2000;3:117–124. doi: 10.1016/S1369-5266(99)00052-7. [DOI] [PubMed] [Google Scholar]
- 6.Nguyen KL, Grondin A, Courtois B, Gantet P. Next-generation sequencing accelerates crop gene discovery. Trends Plant Sci. 2018;24:263–274. doi: 10.1016/j.tplants.2018.11.008. [DOI] [PubMed] [Google Scholar]
- 7.Huang X, Lu T, Han B. Resequencing rice genomes: an emerging new era of rice genomics. Trends Genet. 2013;29:225–232. doi: 10.1016/j.tig.2012.12.001. [DOI] [PubMed] [Google Scholar]
- 8.McNally KL, et al. Genome wide SNP variation reveals relationships among landraces and modern varieties of rice. Proc. Natl Acad. Sci. USA. 2009;106:12273–12278. doi: 10.1073/pnas.0900992106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Yamamoto T, et al. Fine definition of the pedigree haplotypes of closely related rice cultivars by means of genome-wide discovery of single-nucleotide polymorphisms. BMC Genom. 2010;11:267. doi: 10.1186/1471-2164-11-267. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Arai-Kichise Y, et al. Discovery of genome-wide DNA polymorphisms in a landrace cultivar of japonica rice by whole-genome sequencing. Plant Cell Physiol. 2011;52:274–282. doi: 10.1093/pcp/pcr003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Wang W, et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature. 2018;557:43–49. doi: 10.1038/s41586-018-0063-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Han B, Huang X. Sequencing-based genome-wide association study in rice. Curr. Opin. Plant Biol. 2013;16:133–138. doi: 10.1016/j.pbi.2013.03.006. [DOI] [PubMed] [Google Scholar]
- 13.Huang X, et al. A map of rice genome variation reveals the origin of cultivated rice. Nature. 2012;490:497–501. doi: 10.1038/nature11532. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Jain M, Moharana KC, Shankar R, Kumari R, Garg R. Genome-wide discovery of DNA polymorphisms in rice cultivars with contrasting drought and salinity stress response and their functional relevance. Plant Biotechnol. J. 2014;12:253–264. doi: 10.1111/pbi.12133. [DOI] [PubMed] [Google Scholar]
- 15.Chen R, et al. Whole genome sequencing and comparative transcriptome analysis of a novel seawater adapted, salt-resistant rice cultivar-sea rice 86. BMC Genom. 2017;18:655. doi: 10.1186/s12864-017-4037-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Shankar R, Bhattacharjee A, Jain M. Transcriptome analysis in different rice cultivars provides novel insights into desiccation and salinity stress responses. Sci. Rep. 2016;6:2371. doi: 10.1038/srep23719. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.De Leon TB, Linscombe S, Subudhi PK. Molecular dissection of seedling salinity tolerance in rice (Oryza sativa L.) using a high-density GBS-based SNP linkage map. Rice. 2016;9:52. doi: 10.1186/s12284-016-0125-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Puram VRR, Ontoy J, Linscombe S, Subudhi PK. Genetic dissection of seedling stage salinity tolerance in rice using introgression lines of a salt tolerant landrace Nona Bokra. J. Heredity. 2017;108:658–670. doi: 10.1093/jhered/esx067. [DOI] [PubMed] [Google Scholar]
- 19.Puram VRR, Ontoy J, Subudhi PK. Identification of QTLs for salt tolerance traits and prebreeding lines with enhanced salt tolerance using a salt tolerant donor ‘Nona Bokra’. Plant Mol. Biol. Rep. 2018 doi: 10.1007/s11105-018-1110-2. [DOI] [Google Scholar]
- 20.Chai C, Shankar R, Jain M, Subudhi PK. Genome-wide discovery of DNA polymorphisms by whole genome sequencing differentiates weedy and cultivated rice. Sci. Rep. 2018;8:14218. doi: 10.1038/s41598-018-32513-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.De Leon TB, Linscombe S, Gregorio G, Subudhi PK. Genetic variation in Southern USA rice genotypes for seedling salinity tolerance. Front. Plant Sci. 2015;6:374. doi: 10.3389/fpls.2015.00374. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Wang L, et al. SNP deserts of Asian cultivated rice: genomic regions under domestication. J. Evol. Biol. 2009;22:751–761. doi: 10.1111/j.1420-9101.2009.01698.x. [DOI] [PubMed] [Google Scholar]
- 23.Nagasaki H, Ebana K, Shibaya T, Yonemaru J, Yano M. Core single-nucleotide polymorphisms-a tool for genetic analysis of the Japanese rice population. Breed. Sci. 2010;60:648–655. doi: 10.1270/jsbbs.60.648. [DOI] [Google Scholar]
- 24.Smith JM, Haigh J. The hitch-hiking effect of a favorable gene. Genet. Res. 1974;23:23–35. doi: 10.1017/S0016672300014634. [DOI] [PubMed] [Google Scholar]
- 25.Wang S, et al. Integrated RNA sequencing and QTL mapping to identify candidate genes from Oryza rufipogon associated with salt tolerance at the seedling stage. Front. Plant Sci. 2017;8:1427. doi: 10.3389/fpls.2017.01427. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Pandit A, et al. Combining QTL mapping and transcriptome profiling of bulked RILs for identification of functional polymorphism for salt tolerance genes in rice (Oryza sativa L.) Mol. Genet. Genom. 2010;284:121–136. doi: 10.1007/s00438-010-0551-6. [DOI] [PubMed] [Google Scholar]
- 27.Mansuri RM, et al. Dissecting molecular mechanisms underlying salt tolerance in rice: a comparative transcriptional profiling of the contrasting genotypes. Rice. 2019;12:13. doi: 10.1186/s12284-019-0273-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Domingo C, et al. Physiological basis and transcriptional profiling of three salt-tolerant mutant lines of rice. Front. Plant Sci. 2016;7:1462. doi: 10.3389/fpls.2016.01462. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Tiwari M, Sharma D, Singh M, Tripathi RD, Trivedi PK. Expression of OsMATE1 and OsMATE2 alters development, stress responses and pathogen susceptibility in Arabidopsis. Sci. Rep. 2014;4:3964. doi: 10.1038/srep03964. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Xu J, Wang XY, Guo WZ. The cytochrome P450 superfamily: Key players in plant development and defense. J. Integr. Agric. 2015;14:1673–1686. doi: 10.1016/S2095-3119(14)60980-1. [DOI] [Google Scholar]
- 31.Jiang SC, et al. Crucial roles of the pentatricopeptide repeat protein SOAR1 in Arabidopsis response to drought, salt, and cold stresses. Plant Mol. Biol. 2015;88:369–385. doi: 10.1007/s11103-015-0327-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Jin-long G, et al. A novel dirigent protein gene with highly stem-specific expression from sugarcane, response to drought, salt and oxidative stresses. Plant Cell Rep. 2012;31:1801–1812. doi: 10.1007/s00299-012-1293-1. [DOI] [PubMed] [Google Scholar]
- 33.Sharma R, Cao P, Jung KH, Sharma MK, Ronald PC. Construction of a rice glycoside hydrolase phylogenomic database and identification of targets for biofuel research. Front. Plant Sci. 2013;4:330. doi: 10.3389/fpls.2013.00330. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Naranjo MA, Forment J, Roldan-Medina M, Serrano R, Vicente O. Overexpression of Arabidopsis thaliana LTL1, a salt-induced gene encoding a GDSL-motif lipase, increases salt tolerance in yeast and transgenic plants. Plant Cell Environ. 2006;29:1890–1900. doi: 10.1111/j.1365-3040.2006.01565.x. [DOI] [PubMed] [Google Scholar]
- 35.Jiang Y, Chen R, Dong J, Xu Z, Gao X. Analysis of GDSL lipase (GLIP) family genes in rice (Oryza sativa) Plant Omics. 2012;5:351. [Google Scholar]
- 36.Esch L, Schaffrath U. An update on jacalin-like lectins and their role in plant defense. Int J. Mol. Sci. 2017;18:1592. doi: 10.3390/ijms18071592. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.He X, et al. OsJRL, a rice jacalin-related mannose-binding lectin gene enhances Escherichia coli viability under high-salinity stress and improves salinity tolerance of rice. Plant Biol. 2017;19:257–267. doi: 10.1111/plb.12514. [DOI] [PubMed] [Google Scholar]
- 38.Sun BR, et al. Genomic and transcriptomic analysis reveal molecular basis of salinity tolerance in a novel strong salt-tolerant rice landrace Changmaogu. Rice. 2019;12:99. doi: 10.1186/s12284-019-0360-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Thomson MJ, et al. Characterizing the Saltol quantitative trait locus for salinity tolerance in rice. Rice. 2010;3:148–160. doi: 10.1007/s12284-010-9053-8. [DOI] [Google Scholar]
- 40.Moons A, Bauw G, Prinsen E, Van Montagu M, Van der Straeten D. Molecular and physiological responses to abscisic acid and salts in roots of salt-sensitive and salt-tolerant indica rice varieties. Plant Physiol. 1995;107:177–186. doi: 10.1104/pp.107.1.177. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.De Leon TB, Linscombe S, Subudhi PK. Identification and validation of QTLs for seedling salinity tolerance in introgression lines of a salt tolerant rice landrace ‘Pokkali’. PLoS ONE. 2017;12:e0175361. doi: 10.1371/journal.pone.0175361. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Yeo AR, Yeo ME, Flowers SA, Flowers TJ. Screening of rice (Oryza sativa L.) genotypes for physiological characters contributing to salinity resistance and their relationship to overall performance. Theor. Appl. Genet. 1990;79:377–384. doi: 10.1007/BF01186082. [DOI] [PubMed] [Google Scholar]
- 43.Xie JH, Zapata-Arias FJ, Shen M, Afza R. Salinity tolerant performance and genetic diversity of four rice varieties. Euphytica. 2000;116:105–110. doi: 10.1023/A:1004041900101. [DOI] [Google Scholar]
- 44.Linscombe SD, et al. Registration of Bengal rice. Crop Sci. 1993;33:645–646. doi: 10.2135/cropsci1993.0011183X003300030046x. [DOI] [Google Scholar]
- 45.Linscombe SD, et al. Registration of ‘Cocodrie’ rice. Crop Sci. 2000;40:294. doi: 10.2135/cropsci2000.0007rcv. [DOI] [Google Scholar]
- 46.Kumari S, et al. Transcriptome map for seedling stage specific salinity stress response indicates a specific set of genes as candidate for saline tolerance in Oryza sativa L. Funct. Integr. Genom. 2009;9:109–123. doi: 10.1007/s10142-008-0088-5. [DOI] [PubMed] [Google Scholar]
- 47.Patel RK, Jain M. NGS QC toolkit: a toolkit for quality control of next generation sequencing data. PLoS ONE. 2012;7:e30619. doi: 10.1371/journal.pone.0030619. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Li H, Durbin R. Fast and accurate short-read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Li H, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Jhanwar S, et al. Transcriptome sequencing of wild chickpea as a rich resource for marker development. Plant Biotechnol. J. 2012;10:690–702. doi: 10.1111/j.1467-7652.2012.00712.x. [DOI] [PubMed] [Google Scholar]
- 51.Cingolani P, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 2012;6:80–92. doi: 10.4161/fly.19695. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Trapnell C, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protocol. 2012;3:562–578. doi: 10.1038/nprot.2012.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.