Chromosomal characteristics of salt stress heritable gene expression in the rice genome

Matthew T McGowan; Zhiwu Zhang; Stephen P Ficklin

doi:10.1186/s12863-021-00970-7

2021 May 27;22:17. doi: 10.1186/s12863-021-00970-7

PMCID: PMC8162008 PMID: 34044788

Abstract

Background

Gene expression is potentially an important heritable quantitative trait that mediates between genetic variation and higher-level complex phenotypes through time- and condition-dependent regulatory interactions. Therefore, we sought to explore both the genomic and condition-specific characteristics of gene expression heritability within the context of chromosomal structure.

Results

Heritability of gene expression was estimated in a diverse, 84-line Oryza sativa (rice) population under optimal and salt-stressed conditions. Overall, 5936 genes were found to have heritable expression regardless of condition and 1377 genes were found to have heritable expression only during salt stress. These genes with salt-specific heritable expression are enriched for functional terms associated with response to stimulus and transcription factor activity. Additionally, we found that highly expressed genes, lowly expressed genes, and genes with heritable expression are distributed differently along the chromosomes, in patterns that follow previously identified high-throughput chromosome conformation capture (Hi-C) A/B chromatin compartments. Furthermore, multiple genomic hotspots enriched for genes with salt-specific heritability were identified on chromosomes 1, 4, 6, and 8. These hotspots were found to contain genes functionally enriched for transcriptional regulation and to overlap with a previously identified major QTL for salt tolerance in rice.

Conclusions

Investigating the heritability of traits, and in particular gene expression traits, is important for developing a basic understanding of how regulatory networks behave across a population. This work provides insights into spatial patterns of heritable gene expression at the chromosomal level.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12863-021-00970-7.

Keywords: RNAseq, Genetics, Transcriptomics, Heritability, Agronomy

Background

Understanding the molecular mechanisms by which genetic variation influences complex quantitative traits remains a major goal of genetic research. Current polygenic and omnigenic models posit that, for complex traits, only a small proportion of heritable phenotypic variation can be explained by a few easily identified mutations with large effects; the remaining majority is due to a much larger number of mutations with low to moderate effects. After more than a decade of Genome-Wide Association Studies (GWAS), it is clear that many of these low- to moderate-effect variants lie in regulatory regions of the genome rather than in protein-coding regions. Furthermore, affected regions have been found to be enriched for genes that interact in highly interconnected regulatory networks [1]. Expression quantitative trait locus (eQTL) studies therefore seek to identify relationships between genetic variants and the genes they may regulate by treating gene expression as the phenotypic trait in a GWAS analysis.

The increasing number of eQTL studies in multiple plant species has revealed similar patterns of eQTL architecture. eQTLs are referred to as cis or trans depending on whether they map near the affected gene or elsewhere in the genome. While cis eQTLs tend to have larger effects on average than trans eQTLs, only a small proportion of genes appear to have cis eQTLs that explain a majority of their expression variance. Instead, many genes appear to have both cis- and trans-acting eQTLs, with most eQTLs being trans [2, 3]. Cross-gene eQTL analysis has revealed that many of these trans eQTLs are significantly enriched in genomic hotspots with wide-reaching effects on gene expression [4, 5].
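To make the cis/trans distinction concrete, the sketch below labels an eQTL by comparing a SNP position with a gene's coordinates. It is illustrative only: the `Gene` class, the `classify_eqtl` function, the gene coordinates, and the 1 Mb cis window are hypothetical choices, not taken from the cited studies, which use a variety of distance cutoffs.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 1_000_000  # hypothetical cis cutoff; published studies differ


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int  # bp
    end: int    # bp


def classify_eqtl(snp_chrom: str, snp_pos: int, gene: Gene,
                  window: int = CIS_WINDOW_BP) -> str:
    """Return 'cis' if the SNP is on the gene's chromosome and within `window` bp
    of the gene body, otherwise 'trans'."""
    if snp_chrom == gene.chrom and gene.start - window <= snp_pos <= gene.end + window:
        return "cis"
    return "trans"


gene = Gene("Chr4", 30_000_000, 30_005_000)  # hypothetical coordinates
print(classify_eqtl("Chr4", 30_400_000, gene))  # cis
print(classify_eqtl("Chr4", 25_000_000, gene))  # trans
print(classify_eqtl("Chr1", 30_000_000, gene))  # trans
```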

In any association study (GWAS or eQTL), characterizing the heritability of the selected trait (e.g., a phenotype or expression level) is necessary to estimate the degree to which the trait is genetically determined. Heritability is a fundamental genetics concept that describes how much of the variation in a given trait can be attributed to genetic variation [6]. It has demonstrated lasting usefulness in quantifying response to selection in plant breeding [7] and estimating disease risk in medicine [8]. Traditionally, heritability is estimated from known genetic relationships between individuals. In human research, these relationships usually come from monozygotic (identical) and dizygotic (fraternal) twins; in plant and animal research, pedigrees from controlled breeding populations are used. Another approach uses high-density genotyping technologies, such as single nucleotide polymorphism (SNP) arrays, to infer genetic relationships. Genotype differences between individuals are used to calculate a genetic relationship matrix (GRM), also called a kinship matrix, which is then used in a linear mixed model to estimate the proportion of phenotypic variance explained by genetic relatedness. This approach is referred to as Genomic Relatedness Restricted Maximum Likelihood (GREML) and has multiple software implementations, such as GCTA [9], EMMA [10], and rrBLUP [11]. Despite the large number of eQTL studies, relatively few have explored genomic patterns of gene expression heritability using GREML-based estimates. Two studies in humans explored the heritability of gene expression in whole blood samples [12, 13], but similar research in plants is currently lacking.
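A minimal sketch of the GREML idea follows, assuming a genotype matrix with lines as rows and SNPs as columns, a single intercept as the fixed effect, and a GRM built from column-standardized genotypes. The functions `grm` and `reml_h2` are hypothetical helpers, not the GCTA, EMMA, or rrBLUP interfaces, although the eigendecomposition step resembles the one EMMA uses. Heritability is reported as h² = σ²g/(σ²g + σ²e) for the model y = Xb + g + e, with g ~ N(0, σ²g K) and e ~ N(0, σ²e I).

```python
from typing import Optional

import numpy as np
from scipy.optimize import minimize_scalar


def grm(genotypes: np.ndarray) -> np.ndarray:
    """GRM from an (n_lines x n_snps) genotype matrix: columns are centered and
    scaled to unit variance (monomorphic SNPs dropped), then K = Z Z' / n_snps."""
    g = np.asarray(genotypes, dtype=float)
    sd = g.std(axis=0)
    keep = sd > 0
    z = (g[:, keep] - g[:, keep].mean(axis=0)) / sd[keep]
    return z @ z.T / z.shape[1]


def reml_h2(y: np.ndarray, k: np.ndarray, x: Optional[np.ndarray] = None) -> float:
    """REML estimate of h2 = s2g / (s2g + s2e) for y = X b + g + e with
    g ~ N(0, s2g K) and e ~ N(0, s2e I); b and s2g are profiled out."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    x = np.ones((n, 1)) if x is None else np.asarray(x, dtype=float)
    p = x.shape[1]
    s, u = np.linalg.eigh(k)
    s = np.clip(s, 0.0, None)    # drop tiny negative eigenvalues from round-off
    yt, xt = u.T @ y, u.T @ x    # rotated data: H = K + delta*I becomes diagonal

    def neg_reml(log_delta: float) -> float:
        d = 1.0 / (s + np.exp(log_delta))                      # diagonal of H^-1
        xhx = xt.T @ (d[:, None] * xt)                          # X' H^-1 X
        xhy = xt.T @ (d * yt)                                   # X' H^-1 y
        ypy = yt @ (d * yt) - xhy @ np.linalg.solve(xhx, xhy)   # y' P y
        logdet_xhx = np.linalg.slogdet(xhx)[1]
        ll = -0.5 * ((n - p) * np.log(2.0 * np.pi * ypy / (n - p))
                     - np.sum(np.log(d)) + logdet_xhx + (n - p))
        return -ll

    res = minimize_scalar(neg_reml, bounds=(-10.0, 10.0), method="bounded")
    delta = np.exp(res.x)        # delta = s2e / s2g
    return float(1.0 / (1.0 + delta))
```

Profiling out σ²g and the fixed effects leaves a one-dimensional search over δ = σ²e/σ²g, which is why a bounded scalar optimizer is used here; dedicated packages add grid searches against local optima, standard errors, and additional random effects that this sketch omits.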

Another relatively unexplored area of gene expression research is the influence of environmental factors. Although differential gene expression analysis is a highly active area of research, studies of expression variation in response to environmental changes have primarily focused on condition-, time-, and tissue-specific variation. These studies typically include only a few genotypes, far below the sample sizes required for eQTL analysis [14]. However, given that complex agronomic phenotypes are known to have significant genotype-by-environment interaction effects, exploring how these interactions affect gene expression variation may provide novel insights into the underlying architecture of these phenotypes.

An important consideration before exploring heritability is potential bias arising from the bimodal distribution of gene expression. Gene expression quantified with RNA-seq has been shown to have a bimodal structure in which lowly expressed (LE) and highly expressed (HE) genes form two overlapping distributions, with LE genes centered in the negative log2 range and HE genes in the positive log2 range [15]. The source of this bimodality is currently debated. One theory attributes the lower distribution to an unknown combination of transcriptional noise, ambiguous read mapping, contamination, cell-type heterogeneity, and sequencing errors; consequently, many studies use only HE genes for downstream analysis [16]. However, there is evidence that transcripts from the low-abundance distribution are transcribed mRNA rather than artifacts or small RNA molecules [17].

Another consideration for gene expression heritability, related to non-normal expression distributions, is that transcriptional repression has been shown to correlate with the 3D conformation of chromosomes in the nucleus, including chromatin and centromeric structures [18]. Chromatin alteration in plants has been shown to play important roles in tissue-specific specialization [19, 20], stress response [21–23], and suppression of transposable elements [24, 25]. Plant genomes have been found to possess active and repressive genome territories, referred to as the A and B compartments, which correspond to euchromatic and heterochromatic regions, respectively [26, 27]. While these compartments have been found to be largely stable across tissues, it remains unclear how stable they are under changing environmental conditions, such as abiotic stress, that are known to alter chromatin states.

In this study, we sought to address these limitations and considerations by exploring the 2D and 3D chromosomal characteristics of heritable gene expression using a previously reported RNA-seq dataset of 84 individuals from the Oryza sativa Rice Diversity Panel 1 (RDP1) [28]. We explored patterns of missing values in the RNA-seq data (i.e., missingness) and the distribution of highly expressed (HE) and lowly expressed (LE) genes across the 2D chromosomal structure. Heritability was calculated independently for salt-stress and control conditions, and its distribution was also explored across the 2D genomic structure. We then explored the relationship of HE and LE genes to rice chromatin compartments identified by Hi-C analysis.

Results

Gene expression

For the 55,986 annotated gene transcripts in the Michigan State University (MSU) v7.0 Oryza sativa Nipponbare (rice) assembly [29], the distribution of missing values (no measured expression) was U-shaped: most genes had either a high or a low missing rate, and relatively few had moderate missingness. We classified genes as having constitutive, mixed, or repressed expression patterns if non-zero expression was observed in > 95%, 5–95%, or < 5% of samples, respectively (Fig. 1a). Overall, non-zero gene expression followed a clear bimodal distribution, with one mode of HE genes at positive log₂ TPM and a second mode of LE genes at negative log₂ TPM (Fig. 1b). Genes with constitutive expression occupied the HE mode, while genes with mixed or repressed expression matched the LE mode. Thus, HE genes are both highly expressed and consistently present (few missing values), whereas LE genes are lowly expressed and frequently absent. Furthermore, cross-tabulation across conditions indicates that genes largely retained the same expression category in both conditions (Table 1). While a small number of genes switched categories between conditions, none changed from constitutive to repressed or vice versa.
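The expression-pattern classification and the cross-tabulation in Table 1 can be sketched as follows, assuming one genes × samples TPM matrix per condition in which missing expression is recorded as 0. The helper names are hypothetical; the 5% and 95% cutoffs are the ones stated above, and `mean_log2_nonzero` computes the per-gene quantity whose distribution is bimodal in Fig. 1.

```python
import numpy as np

CATEGORIES = ("constitutive", "mixed", "repressed")


def classify_expression(tpm):
    """Label each gene (row) of a genes x samples TPM matrix by the share of samples
    with non-zero expression: > 95% constitutive, 5-95% mixed, < 5% repressed."""
    frac = (np.asarray(tpm, dtype=float) > 0).mean(axis=1)
    return np.where(frac > 0.95, "constitutive",
                    np.where(frac < 0.05, "repressed", "mixed"))


def mean_log2_nonzero(tpm):
    """Mean log2 TPM over non-zero samples (NaN for genes never expressed)."""
    tpm = np.asarray(tpm, dtype=float)
    mask = tpm > 0
    logs = np.log2(np.where(mask, tpm, 1.0))  # zeros replaced by 1, so log2 adds 0
    counts = mask.sum(axis=1)
    return np.divide(logs.sum(axis=1), counts,
                     out=np.full(counts.shape, np.nan), where=counts > 0)


def crosstab(control_labels, salt_labels):
    """3 x 3 counts: rows = control category, columns = salt-stress category."""
    control_labels = np.asarray(control_labels)
    salt_labels = np.asarray(salt_labels)
    table = np.zeros((3, 3), dtype=int)
    for i, a in enumerate(CATEGORIES):
        for j, b in enumerate(CATEGORIES):
            table[i, j] = np.sum((control_labels == a) & (salt_labels == b))
    return table
```

Applied to both conditions, `crosstab(classify_expression(tpm_control), classify_expression(tpm_salt))` would yield a table with the layout of Table 1.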

Fig. 1 — Bimodal Gene Expression Patterns: Plot A shows the proportion of samples with missing values calculated for each gene. The overall distribution of the missing rate is bimodal, with most genes having either few (< 5%) or many (> 95%) missing values. Genes were classified as constitutive (< 5% missing), mixed (5–95% missing), or repressed (> 95% missing). Constitutive genes are those to the left of the red dashed line. The mean value of non-zero TPMs for expressed genes also had a bimodal distribution based on the missing rate. Plot B shows the density plots of constitutive and non-constitutive genes.

Table 1.

Contingency Table of Expression-Level Categories

Control (rows) \ Salt stress (columns)	Constitutive	Mixed	Repressed	Totals
Constitutive	16,372	363	0	16,735
Mixed	91	25,116	932	26,139
Repressed	0	1,007	12,105	13,112
Totals	16,463	26,486	13,037	55,986

Heritability

Comparison of heritability results

Per-gene correlation between biological replicates of gene expression was calculated as a potential estimate of heritability, analogous to twin-based measures of heritability in humans. These replicate heritability values were then compared with two GREML estimates: a two-step estimate using genotypic means and a single-step estimate that included replication as a random effect in the model.
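A sketch of the replicate-based estimate follows, assuming exactly two biological replicates per line stored as genes × lines matrices with the lines in the same column order (a hypothetical layout). `replicate_correlation` returns the per-gene Pearson correlation across lines, and `genotypic_mean` produces the line means that a two-step GREML analysis would take as its phenotype; both names are illustrative.

```python
import numpy as np


def replicate_correlation(rep1, rep2):
    """Per-gene Pearson correlation between two biological replicates across lines.

    rep1, rep2: genes x lines expression matrices with identical column order.
    Genes that are constant in either replicate get NaN.
    """
    a = np.asarray(rep1, dtype=float)
    b = np.asarray(rep2, dtype=float)
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    den = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return np.divide((a * b).sum(axis=1), den,
                     out=np.full(den.shape, np.nan), where=den > 0)


def genotypic_mean(rep1, rep2):
    """Line means (genes x lines) used as the phenotype in a two-step analysis."""
    return (np.asarray(rep1, dtype=float) + np.asarray(rep2, dtype=float)) / 2.0
```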

Due to the relatively small sample size, GREML heritability (single-step or two-step) could not be reliably estimated with a mixed linear model for many genes, resulting in an inflated number of genes with low heritability estimates (0–0.2) and wide 95% confidence intervals (Additional File 1, Fig. S1). Replicate heritability was strongly correlated with single-step GREML (ρ = 0.89), indicating that gene expression heritability can be estimated from biological replicate expression data. However, the two-step method was only moderately correlated with both the single-step approach (ρ = 0.41) and the replicate heritability approach (ρ = 0.45) (Fig. 2). Results in Fig. 2 are for the control condition; patterns were similar for the salt condition (Additional File 1, Fig. S2).

Fig. 2 — Comparison of Heritability Calculation Methods for the Control condition: Pairwise correlations between repeatability (Pearson’s), single-step GREML (with replicates), and two-step GREML (using the genotypic mean) for the control condition. The lower triangle shows scatterplots of the pairwise comparisons, the diagonal shows the density distribution for each method, and the upper triangle shows the corresponding pairwise correlation values.

Condition-specific heritability classification

To identify a significance threshold for expression heritability, permutation tests on randomly shuffled gene expression values were used to generate a null heritability distribution. From this null distribution, a significance threshold was calculated at a fixed type-I error rate (α ≤ 0.01) (Fig. 3a). Genes were then classified according to whether they were significantly heritable under control and salt-stress conditions (Fig. 3b). While most genes with heritable expression had conserved heritability across both conditions (n = 6851), a considerable number were significantly heritable only under control (n = 3599) or only under salt stress (n = 1377). Genes with condition-specific heritability were less heritable than genes heritable in both conditions (Additional File 1, Fig. S3). For genes heritable in both conditions, salt and control estimates were distributed symmetrically about the diagonal (Fig. 3b), indicating no condition-specific bias.
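The permutation threshold and the three-way classification can be sketched as below. To keep the block self-contained, it uses the replicate correlation as the heritability estimator; this is a stand-in, since the permutation tests described above may have used a GREML estimator. Shuffling the line order of one replicate breaks the genotype pairing, and the (1 − α) quantile of the pooled null estimates becomes the threshold. Function names and defaults (for example, `n_perm=10`) are hypothetical.

```python
import numpy as np


def row_correlation(a, b):
    """Pearson correlation of matching rows of two genes x lines matrices."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    den = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return np.divide((a * b).sum(axis=1), den,
                     out=np.full(den.shape, np.nan), where=den > 0)


def permutation_threshold(rep1, rep2, alpha=0.01, n_perm=10, seed=0):
    """(1 - alpha) quantile of heritability estimates after shuffling the line
    order of replicate 2, pooled across genes and permutations."""
    rng = np.random.default_rng(seed)
    rep1 = np.asarray(rep1, dtype=float)
    rep2 = np.asarray(rep2, dtype=float)
    null = np.concatenate([
        row_correlation(rep1, rep2[:, rng.permutation(rep2.shape[1])])
        for _ in range(n_perm)
    ])
    return float(np.quantile(null[~np.isnan(null)], 1.0 - alpha))


def classify_heritability(h2_control, h2_salt, threshold):
    """Label each gene 'general' (both), 'control', 'salt', or 'none'."""
    sig_c = np.asarray(h2_control) > threshold
    sig_s = np.asarray(h2_salt) > threshold
    labels = np.full(sig_c.shape, "none", dtype="U7")
    labels[sig_c & sig_s] = "general"
    labels[sig_c & ~sig_s] = "control"
    labels[~sig_c & sig_s] = "salt"
    return labels
```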

Fig. 3 — Classification of gene expression heritability. Plot A shows the heritability distribution of randomly shuffled gene expression values. This distribution serves as the null distribution used for identifying non-significant heritability estimates. The dashed red line indicates the quantile for a fixed type-I error rate (α = 0.01). Plot B compares salt and control heritability estimates. A quantile threshold was used to classify each gene as having significant heritability under salt treatment, control, or both (general).

Chromosomal structure and conformation

HE and LE genes follow distinct 2D spatial patterns

The spatial distribution of constitutive, mixed, and repressed genes was visualized along the chromosomes using a 3 Mb sliding window at 100 kb intervals. Empirically, constitutive genes appear enriched toward the ends of chromosomes and depleted near pericentromeric regions (Fig. 4). For metacentric chromosomes, this pattern forms a U shape centered on the centromere. Densities of genes with repressed and mixed expression were often the inverse of those of constitutive genes, appearing enriched near the centromere and depleted at the chromosome ends. Reductions in the density of constitutive genes were not always centered on the centromere: for example, subtelocentric chromosomes 4, 9, and 10 (and, to a lesser extent, chromosome 11) show this asymmetry, with the short arms appearing relatively devoid of constitutively expressed genes (Fig. 4).
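Sliding-window gene densities of the kind plotted in Fig. 4 can be computed with two binary searches per window. This sketch assumes gene positions in base pairs on a single chromosome (for example, gene midpoints) and divides each category's window counts by that category's total on the chromosome, so that categories of different sizes share one scale; that normalization, the choice of anchor position, and the helper names are assumptions rather than the study's documented procedure. The defaults match the 3 Mb window and 100 kb step stated above.

```python
import numpy as np


def sliding_window_counts(positions, chrom_length, window=3_000_000, step=100_000):
    """Count genes whose position (bp) falls in [start, start + window),
    with window starts every `step` bp along one chromosome."""
    pos = np.sort(np.asarray(positions))
    starts = np.arange(0, max(chrom_length - window, 0) + 1, step)
    lo = np.searchsorted(pos, starts, side="left")
    hi = np.searchsorted(pos, starts + window, side="left")
    return starts, hi - lo


def category_densities(positions, categories, chrom_length, **kwargs):
    """Per-category window counts divided by the category's total on this
    chromosome (an assumed normalization)."""
    positions = np.asarray(positions)
    categories = np.asarray(categories)
    starts, _ = sliding_window_counts(positions, chrom_length, **kwargs)
    densities = {}
    for cat in np.unique(categories):
        in_cat = categories == cat
        _, counts = sliding_window_counts(positions[in_cat], chrom_length, **kwargs)
        densities[str(cat)] = counts / in_cat.sum()
    return starts, densities
```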

Fig. 4 — Gene density distributions across chromosomes. Plots A-D represent chromosomes 1, 4, 6, and 8, respectively. The black lines at the bottom of each plot represent the relative chromosome length, with the position and relative size of pericentromeric regions indicated by overlapping red boxes. Overall gene frequency, represented by the red line, appears roughly uniform across each chromosome. Genes with constitutive expression (expressed in > 95% of samples), represented by the lime-colored line, are enriched on the distal ends of chromosome arms and depleted near pericentromeric regions. Genes with repressed expression (< 5% of samples), represented by the cyan-colored line, are enriched near pericentromeric regions. Genes with mixed expression (5–95% of samples), represented by the pink line, largely follow the same distribution as repressed genes.

Comparison of gene expression and Hi-C A/B chromatin compartments

Regarding 3D characteristics of expressed genes, gene densities (calculated using a fixed 100 kb window size) were highly correlated (ρ = 0.7–0.9) with A/B chromatin compartments identified from the first principal component (PC1) of a Hi-C contact map [27] (Additional File 1, Figs. S4–S6). Euchromatic A compartments corresponded to genes that were constitutively expressed across all genotypes. Conversely, heterochromatic B compartments corresponded to genes with either mixed or repressed expression across genotypes.
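One common way to call A/B compartments, introduced with early Hi-C studies, is to divide each diagonal of the contact map by its mean (observed/expected), take the Pearson correlation matrix of the result, and use its leading eigenvector as PC1, oriented so that it correlates positively with gene density. The sketch below follows those steps under simplifying assumptions: a single-chromosome, already balanced, symmetric contact matrix and gene densities computed over the same bins. It is not necessarily how compartments were computed in [27], and the function names are hypothetical.

```python
import numpy as np


def observed_over_expected(contacts):
    """Divide every diagonal of a symmetric bins x bins contact matrix by its mean,
    removing the overall decay of contact frequency with genomic distance."""
    c = np.asarray(contacts, dtype=float)
    n = c.shape[0]
    oe = np.zeros_like(c)
    for d in range(n):
        idx = np.arange(n - d)
        diag = c[idx, idx + d]
        mean = diag.mean()
        if mean > 0:
            oe[idx, idx + d] = diag / mean
            oe[idx + d, idx] = diag / mean
    return oe


def compartment_pc1(contacts, gene_density):
    """Leading eigenvector of the correlation matrix of the O/E map, signed so that
    it correlates positively with gene density (A bins > 0, B bins < 0)."""
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(observed_over_expected(contacts))
    corr = np.nan_to_num(corr)  # empty or constant bins produce NaN rows
    vals, vecs = np.linalg.eigh(corr)
    pc1 = vecs[:, np.argmax(vals)]
    if np.corrcoef(pc1, gene_density)[0, 1] < 0:
        pc1 = -pc1
    return pc1
```

The correlation reported above would then correspond to `np.corrcoef(pc1, density)[0, 1]` for a given gene-density track. In practice, contact matrices are usually balanced (for example, by iterative correction) before this step; that step is omitted here.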

Salt-specific spatial enrichment analysis

When the spatial distribution of genes with salt-specific heritability was compared with that of genes with non-specific heritability, 22 windows on chromosomes 1, 4, 6, and 8 passed a permutation-based p-value threshold (α = 0.001) (Fig. 5, Table 2). These windows mark regions of the genome enriched for salt-specific heritable expression. Other chromosomes had no significantly enriched windows (Additional File 1, Figs. S7–S9). Adjacent and overlapping windows were merged into five contiguous regions (Additional File 2, Table S1). Gene ontology enrichment analysis of heritable genes in these regions identified the terms transcription factor activity (GO:0003700), response to endogenous stimulus (GO:0009719), nucleic acid binding (GO:0003676), and DNA binding (GO:0003677) (Additional File 2, Tables 2–3). These regions also overlapped QTLs identified for salt-tolerance-related traits in previous GWAS studies. In particular, a 3 Mb window on chromosome 4 directly overlaps a highly significant 575 kb QTL for sodium and potassium accumulation in root tissue, identified in a previous GWAS of the same RDP1 panel [28]. Fine mapping of this QTL identified the sodium-transporter gene HKT1;1 (LOC_Os04g51820) as the likely causal gene, and altering the expression of this gene in RNA-interference lines significantly affected both shoot and root growth under saline conditions [28].
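The window test can be sketched as a one-sided Fisher's exact test per window, comparing salt-specific genes with all other heritable genes inside versus outside the window. The text states only that p-values were adjusted with a permutation-based approach; the min-p adjustment used here (shuffling the salt-specific labels among heritable genes and comparing each observed p-value with the smallest window p-value of each permutation) is one such approach, chosen as an assumption. `merge_windows` then combines significant windows that overlap or touch into contiguous regions. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact


def window_pvalues(positions, is_salt, starts, window):
    """One-sided Fisher's exact test per window: are salt-specific genes
    over-represented among heritable genes in [start, start + window)?"""
    positions = np.asarray(positions)
    is_salt = np.asarray(is_salt, dtype=bool)
    n_salt, n_other = int(is_salt.sum()), int((~is_salt).sum())
    pvals = np.ones(len(starts))
    for w, start in enumerate(starts):
        inside = (positions >= start) & (positions < start + window)
        a = int(np.sum(inside & is_salt))    # salt-specific genes in window
        b = int(np.sum(inside & ~is_salt))   # other heritable genes in window
        _, p = fisher_exact([[a, b], [n_salt - a, n_other - b]], alternative="greater")
        pvals[w] = p
    return pvals


def minp_adjust(positions, is_salt, starts, window, n_perm=200, seed=0):
    """Min-p permutation adjustment: shuffle salt-specific labels among heritable
    genes and record the smallest window p-value in each permutation."""
    rng = np.random.default_rng(seed)
    is_salt = np.asarray(is_salt, dtype=bool)
    observed = window_pvalues(positions, is_salt, starts, window)
    min_null = np.array([
        window_pvalues(positions, rng.permutation(is_salt), starts, window).min()
        for _ in range(n_perm)
    ])
    exceed = (min_null[None, :] <= observed[:, None]).sum(axis=1)
    return observed, (exceed + 1) / (n_perm + 1)


def merge_windows(starts, window, significant):
    """Merge significant windows (starts sorted ascending) that overlap or touch."""
    regions = []
    for s in np.asarray(starts)[np.asarray(significant, dtype=bool)]:
        e = s + window
        if regions and s <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], e)
        else:
            regions.append([s, e])
    return [(int(a), int(b)) for a, b in regions]
```

With `n_perm` permutations, the smallest attainable adjusted p-value is 1/(n_perm + 1), so resolving a threshold of 0.001 requires at least 999 permutations; the small default keeps the example fast.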

Fig. 5 — Salt-specific Heritable Gene Enrichment. Plots A-D represent chromosomes 1, 4, 6, and 8, respectively. The black lines at the bottom of each plot represent the relative chromosome length, with the position and relative size of pericentromeric regions indicated by overlapping red boxes. Using a 1.5 Mb sliding window at 100 kb intervals, chromosomes were tested for enrichment of genes with salt-specific heritability, using all genes with heritable expression (salt-specific, optimal-specific, and general) as the null distribution. P-values were adjusted for multiple testing using a permutation-based approach. Using a critical value of 0.001 (dashed red line), significant windows enriched for salt-specific heritability were identified on chromosomes 1, 4, 6, and 8.

Table 2.

Genome windows enriched for salt-specific heritable expression

Chromosome	Start position (bp)	End position (bp)	Heritable genes	Fisher’s test adjusted p-value
1	36,450,000	38,550,000	19	2.5E-04
4	24,550,000	26,050,000	17	7.5E-04
4	28,250,000	30,950,000	19	2.0E-05
6	10,650,000	12,150,000	13	5.0E-04
8	23,350,000	25,650,000	17	4.0E-05

In summary, the results show that the bimodality of the gene expression data corresponds to missingness: constitutive genes form the HE mode, while mixed and repressed genes form the LE mode. Regarding 2D characteristics, HE and LE genes have distinct distribution patterns relative to the centromere. Additionally, salt-specific heritable genes follow similar 2D distribution patterns and are also highly correlated with 3D conformation, following Hi-C-identified A/B compartments. We also identified several significant genomic hotspots enriched for genes with salt-specific heritability on chromosome 4, which is concordant with previous GWAS studies of salt-tolerance phenotypes in a similar population, as well as three additional regions on chromosomes 1, 6, and 8.