Abstract
Heterogeneous Stock (HS) rats are a genetically diverse outbred rat population that is widely used for studying genetics of behavioral and physiological traits. Mapping Quantitative Trait Loci (QTL) associated with transcriptional changes would help to identify mechanisms underlying these traits. We generated genotype and transcriptome data for five brain regions from 88 HS rats. We identified 21 392 cis-QTLs associated with expression and splicing changes across all five brain regions and validated their effects using allele specific expression data. We identified 80 cases where eQTLs were colocalized with genome-wide association study (GWAS) results from nine physiological traits. Comparing our dataset to human data from the Genotype-Tissue Expression (GTEx) project, we found that the HS rat data yields twice as many significant eQTLs as a similarly sized human dataset. We also identified a modest but highly significant correlation between genetic regulatory variation among orthologous genes. Surprisingly, we found less genetic variation in gene regulation in HS rats relative to humans, though we still found eQTLs for the orthologs of many human genes for which eQTLs had not been found. These data are available from the RatGTEx data portal (RatGTEx.org) and will enable new discoveries of the genetic influences of complex traits.
INTRODUCTION
Rats are used in a variety of fields including physiological and behavioral research because of their similarities to humans and are preferred over mice for studying certain traits (1–4). In particular, research into the genetic basis of complex behavioral tasks such as measures of impulsivity and complex models of substance abuse and other motivated behavior has made extensive use of various inbred and outbred rat populations (5). Many associations with these traits have been detected (6). However, similar to the situation in human complex trait genetics, resolving implicated chromosomal regions to specific genes and underlying mechanisms is a critically important step that remains challenging.
Identification of heritable differences in gene expression via expression quantitative trait loci (eQTL) mapping offers one way to identify the molecular mediators of loci implicated by genome-wide association studies (GWAS) (7–10). Mapping of eQTLs has been conducted at scale for dozens of human tissues, most notably by the Genotype-Tissue Expression Consortium (GTEx) (7). In contrast, eQTL mapping in rats has been limited in terms of populations, tissues, sample size, and number of genetic markers used (11–27). Some eQTL mapping has been conducted in Heterogeneous Stock (HS) rats (28,29), but to our knowledge this has not yet been done transcriptome-wide.
HS rats were developed in the 1980s by interbreeding eight inbred rat strains (ACI/N, BN/SsN, BUF/N, F344/N, M520/N, MR/N, WKY/N and WN/N) (30) and have been maintained as an outbred population ever since. As a result, each HS rat chromosome is a mosaic of the eight possible founder haplotypes, meaning that all alleles are common. The relatively high minor allele frequency is in stark contrast to human populations, which have a preponderance of rare variants, and provides greater power for mapping eQTLs.
Because HS rats are being used for a variety of behavioral and physiological studies, there is an urgent need for a well-powered and complete library of QTLs. Here, we used tissue from five brain regions that have been implicated in addiction and other psychiatrically important traits to map eQTLs and splicing QTLs (sQTLs) in HS rats. We explored several important considerations for eQTL mapping in this population. We also compared the results of eQTL mapping in HS rats to publicly available human data, identifying both similarities and important differences. Finally, we have made all the data generated here available through an online portal (RatGTEx.org) that serves as a clearinghouse for this and other eQTL datasets.
MATERIALS AND METHODS
Brain samples
Brains were extracted from 88 HS rats (43 male and 45 female). Mean age was 85.7 ± 2.2 for males and 87.0 ± 3.8 for females. Rats were selected to try to avoid related individuals. All rats were group housed under standard laboratory conditions and were naïve to behavioral or drug treatment.
Rat brains were taken out of a –80°C freezer and cryosectioned into 60 μm sections, which were mounted onto RNase-free glass slides. Slides were stored at –80°C until dissection. During dissection, slides were placed on a –20°C cold plate. One drop (approximately 50 μl) of RNAlater was placed on the brain region of interest. Each brain region was then dissected out under a dissecting video camera using a pair of fine-tipped forceps with the assistance of an 18 gauge needle with a bent tip. Bilateral tissue of the same brain region from each rat was immediately transferred into 350 μl Buffer RLT (containing beta-mercaptoethanol) and placed on dry ice. Tissue was stored at –80°C before RNA extraction.
Tissue was thawed on ice and homogenized with a clean stainless steel bead in a Qiagen TissueLyser (40 Hz, 3 min). RNA was extracted with the AllPrep DNA/RNA mini kit (Qiagen). Samples were processed on the QIAcube robot following standard protocols. The optional DNase digestion step was included for RNA samples. The average RIN values for PL, IL, OFC, NAcc and LHb were 9.47 ± 0.58, 9.33 ± 0.63, 9.7 ± 0.53, 8.88 ± 0.79 and 8.94 ± 0.88, respectively.
RNA sequencing
We performed RNA-Seq on mRNA from each brain region sample using Illumina HiSeq 4000 to obtain 100 bp single-end reads for 435 samples, with 26.7 million raw reads per sample on average (Supplementary Table S1).
To quantify gene expression, reads were first trimmed to remove adapters and poor-quality base calls using cutadapt (31). Reads were then aligned to the Ensembl Rat Transcriptome using RSEM (32). Upper quartile adjustment was applied to estimated gene read counts using DESeq2 (33). Samples were filtered based on low read counts, mismatched genotypes (as described in the paragraph below), and expression principal component analysis (PCA) outliers. For two rats, all samples were removed by these filters, yielding processed data for 397 samples in 86 rats. Genes were eliminated if <25% of libraries had more than one read or if the total number of reads among all libraries for the gene was <100. Read counts were log2 transformed after adding a pseudocount of one to each read count. We used those values for calculating allelic fold change, and for eQTL mapping we applied a rank-based inverse normal transformation to the values per gene.
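The gene filter and the two transformations described above can be sketched in a few lines. This is a minimal illustration rather than the pipeline used in the study: the function names are hypothetical, and the rank-to-quantile convention (average ranks for ties, rank divided by n + 1) is an assumption, since the text does not specify one.

```python
import math
from statistics import NormalDist


def keep_gene(counts, min_frac=0.25, min_total=100):
    """Return False if <25% of libraries have more than one read
    or if the gene's total read count across libraries is <100."""
    frac_expressed = sum(c > 1 for c in counts) / len(counts)
    return frac_expressed >= min_frac and sum(counts) >= min_total


def log2_pseudocount(counts):
    """log2 transform after adding a pseudocount of one."""
    return [math.log2(c + 1) for c in counts]


def inverse_normal_transform(values):
    """Rank-based inverse normal transform for one gene.

    Assumed convention: ties receive their average rank and
    rank r maps to the standard normal quantile at r / (n + 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based rank
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    normal = NormalDist()
    return [normal.inv_cdf(r / (n + 1)) for r in ranks]
```

In this sketch the log2 values would feed the allelic fold change calculation, while the inverse-normal values would feed eQTL mapping, mirroring the split described in the text.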
Separately, to quantify allele specific expression and splicing, RNA-Seq reads were aligned to the Rnor_6.0 (rn6) genome from Ensembl (http://ftp.ensembl.org/pub/release-99/fasta/rattus_norvegicus/dna/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz) using STAR v2.7.3a (34). STAR was run in two passes per sample, where novel splice junctions identified in the first pass were used to align additional reads in the second pass. The second pass used WASP to reduce mapping bias due to polymorphisms (35). Duplicate reads were then marked with the Picard MarkDuplicates function.
To check for mismatched RNA-Seq/genotype samples, we counted reads containing each allele for each exonic SNP using GATK ASEReadCounter (36) and compared counts to the genotypes at those SNPs. We identified 13 samples in which the RNA-Seq sample did not correspond to the label-associated genotype. Two of these samples matched each other's genotypes, so their rat IDs were swapped and the samples were kept. Of the remaining 11, three matched genotypes for which samples already existed for the same brain region, and the other eight matched none of the 88 genotypes, so these 11 samples were removed from the study.
Genotyping
We used genotyping-by-sequencing as described previously (37) to genotype the 88 rats, yielding 125 686 high-quality observed autosomal SNPs in Rnor_6.0 coordinates. We used SHAPEIT (38) followed by IMPUTE2 (39) to impute additional SNPs based on the genotypes of the eight HS founder strains (ACI/N, BN/SsN, BUF/N, F344/N, M520/N, MR/N, WKY/N and WN/N), resulting in phased genotypes for 3 511 003 SNPs.
Founder haplotypes
Regions of the 88 rat genomes were mapped to the eight HS founders using the calc_genoprob function of R/qtl2 with the cohort and founder strain genotypes (40). Diploid haplotype pair probabilities were collapsed to probabilities per strain per locus per animal using the genoprob_to_alleleprob function. These inferred haplotype mappings were used solely to examine genetic diversity in the cohort, and were not utilized in QTL mapping.
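The collapse from diploid haplotype-pair probabilities to per-strain probabilities can be illustrated as follows. This is a sketch of the general idea (each pair contributes half of its probability to each of its two founders), not the R/qtl2 implementation; the function name and the dictionary-based input format are assumptions made for illustration.

```python
FOUNDERS = ["ACI/N", "BN/SsN", "BUF/N", "F344/N",
            "M520/N", "MR/N", "WKY/N", "WN/N"]


def pair_to_allele_prob(pair_probs):
    """Collapse haplotype-pair probabilities at one locus in one animal.

    pair_probs maps an unordered founder pair (a, b) to its probability.
    Each pair contributes half of its probability to each member, so a
    homozygous pair (a, a) contributes its full probability to founder a.
    """
    allele = {f: 0.0 for f in FOUNDERS}
    for (a, b), p in pair_probs.items():
        allele[a] += 0.5 * p
        allele[b] += 0.5 * p
    return allele
```

For example, a locus with probability 0.5 of being ACI/N homozygous and 0.5 of being ACI/N–BN/SsN heterozygous yields per-strain probabilities of 0.75 for ACI/N and 0.25 for BN/SsN.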
Haplotype probabilities were compared to the results of breeding simulations. For one simulated locus, one of eight haplotype labels was randomly chosen for each of the two copies per individual, for 100 individuals grouped into 50 female-male pairs. To progress one generation, individuals were rearranged into new pairs either by rotating the males by one in the sequence of pairs (circular mating), or by shuffling the pair assignment of the males (random mating). For each new pair, the locus was inherited by an offspring by randomly selecting one of the two alleles from the female and another from the male. This was done twice per pair to produce a new set of 50 female-male pairs. The procedure was repeated for 80 generations, and the full locus simulation was repeated 200 times using circular mating and 200 times using random mating.
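A single-locus version of this simulation can be sketched as below. It follows the description above but makes two labeled assumptions: each pair is taken to produce exactly one female and one male offspring, and the rotation direction for circular mating is arbitrary. The function name and parameters are hypothetical.

```python
import random


def simulate_locus(n_pairs=50, n_haplotypes=8, generations=80,
                   mating="circular", rng=None):
    """Simulate drift at one locus and return final haplotype counts."""
    if mating not in ("circular", "random"):
        raise ValueError("mating must be 'circular' or 'random'")
    if rng is None:
        rng = random.Random()

    def founder():
        # Two haplotype copies per individual, each drawn uniformly.
        return (rng.randrange(n_haplotypes), rng.randrange(n_haplotypes))

    females = [founder() for _ in range(n_pairs)]
    males = [founder() for _ in range(n_pairs)]

    for _ in range(generations):
        if mating == "circular":
            males = males[-1:] + males[:-1]  # rotate males by one pair
        else:
            rng.shuffle(males)  # random re-pairing
        next_females, next_males = [], []
        for f, m in zip(females, males):
            # Assumption: one daughter and one son per pair.
            next_females.append((rng.choice(f), rng.choice(m)))
            next_males.append((rng.choice(f), rng.choice(m)))
        females, males = next_females, next_males

    counts = [0] * n_haplotypes
    for individual in females + males:
        for haplotype in individual:
            counts[haplotype] += 1
    return counts
```

Calling `simulate_locus` 200 times per mating scheme, each time with a fresh random state, would reproduce the structure of the simulation described in the text; the resulting counts can then be summarized with a diversity measure.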
eQTL mapping
We performed cis-eQTL mapping using single-SNP linear regression implemented in tensorQTL (41), testing variants within 1 Mb upstream and downstream of each gene's transcription start site (cis-window). We included 28 covariates: the first 20 principal components of the brain region's expression matrix, and the genotype similarity to each of the eight HS founder strains to control for unequal relatedness. Empirical beta-approximated P-values were computed using data permutations (42) and were then used to calculate gene-level q-values and nominal P-value significance thresholds. A q-value cutoff of 0.05 was used to determine the genes for which at least one significant cis-eQTL was found. We then ran tensorQTL in cis_independent mode to find additional, conditionally independent cis-eQTLs per cis-eQTL gene (eGene) using a stepwise regression procedure (43). Finally, we ran tensorQTL in trans mode without excluding cis-window SNPs to identify all associations genome-wide with nominal P-value < 1e–5.
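To make the cis-window definition and the per-SNP test concrete, the following sketch selects variants within 1 Mb of a TSS and fits a simple regression of expression on allele dosage. It is deliberately simplified and is not tensorQTL's implementation: the 28 covariates, the permutation scheme, and the q-value step are all omitted, and the function names are hypothetical.

```python
import math

CIS_WINDOW = 1_000_000  # 1 Mb on either side of the TSS


def cis_variants(tss, variants, window=CIS_WINDOW):
    """Return IDs of variants within the cis-window.

    variants is a list of (variant_id, position) pairs on the same
    chromosome as the gene.
    """
    return [vid for vid, pos in variants if abs(pos - tss) <= window]


def snp_regression(dosages, expression):
    """Simple linear regression of expression on allele dosage.

    Returns (slope, standard_error). Covariates are omitted here.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2
              for x, y in zip(dosages, expression))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return slope, standard_error
```

The ratio of slope to standard error gives the per-SNP test statistic; a covariate-adjusted analysis would first residualize both dosage and expression against the covariates.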
sQTL mapping
We quantified splice phenotypes by first identifying splice junctions using regtools (44). Using the cluster_prepare_fastqtl.py script provided by the GTEx pipeline, we clustered introns using LeafCutter (45), mapped clusters to genes, and applied filtering and normalization. We then mapped cis-sQTLs using tensorQTL, using genes as phenotype groups when doing permutations to compute empirical P-values. As with cis-eQTLs, a q-value cutoff of 0.05 computed across genes was used to determine significant cis-sQTLs, and stepwise regression was used to find additional conditionally independent cis-sQTLs for each gene. We used the same eight genotype covariates as for eQTL mapping, plus the first ten principal components of the splice phenotypes.
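The splice phenotype underlying this approach is the intron excision ratio: the share of a cluster's junction reads that support each intron. The sketch below shows only that ratio for one sample and one cluster; the function name and input format are hypothetical, and the filtering and normalization performed by the pipeline are not reproduced.

```python
def intron_excision_ratios(cluster_counts):
    """Compute excision ratios for one intron cluster in one sample.

    cluster_counts maps intron ID to its junction read count. Each
    ratio is the intron's count divided by the cluster total. Clusters
    with no reads yield None for every intron.
    """
    total = sum(cluster_counts.values())
    if total == 0:
        return {intron: None for intron in cluster_counts}
    return {intron: count / total
            for intron, count in cluster_counts.items()}
```

For a cluster with 30 reads supporting one intron and 10 supporting another, the ratios are 0.75 and 0.25.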
eQTL effect size
We define cis-eQTL effect size as allelic fold change (aFC) and computed it in two independent ways. Primarily, we computed aFC from total gene expression based on the additive cis-regulatory model (46), with the same covariates as were used for eQTL mapping. For validation, we computed haplotype-level allele specific expression (ASE) using phASER (47), which we then used to compute aFC for genes with sufficient ASE information (48). This method relies on phased genotypes and the SNPs detected in RNA-Seq reads.
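The two views of aFC can be sketched as follows. Under the additive cis-regulatory model, each haplotype contributes independently to total expression, and the alternative allele's contribution is the reference allele's contribution multiplied by 2 raised to log2 aFC. The function names, the reference expression scale, and the pseudocount in the ASE-based estimate are assumptions for illustration; the published estimators (46,48) include details not shown here.

```python
import math


def expected_expression(alt_dosage, log2_afc, ref_allele_expr=1.0):
    """Expected total expression under the additive cis-regulatory model.

    alt_dosage is the number of alternative alleles (0, 1 or 2).
    Each reference haplotype contributes ref_allele_expr and each
    alternative haplotype contributes ref_allele_expr * 2**log2_afc.
    """
    fold = 2.0 ** log2_afc
    return ref_allele_expr * ((2 - alt_dosage) + alt_dosage * fold)


def log2_afc_from_ase(ref_hap_count, alt_hap_count, pseudocount=1.0):
    """log2 ratio of haplotype-level read counts in a heterozygote.

    The pseudocount is an assumption added to avoid division by zero.
    """
    return math.log2((alt_hap_count + pseudocount)
                     / (ref_hap_count + pseudocount))
```

With log2 aFC = 1, expected expression is 2, 3 and 4 units for alternative allele dosages of 0, 1 and 2, which shows why the model is additive on the linear scale rather than on the log scale.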
Effect sizes for GTEx eQTLs were obtained from tables downloaded from https://gtexportal.org. Human-rat ortholog pairs were obtained from Ensembl BioMart.
GEMMA
To assess the impact of using a linear mixed model (LMM) for mapping cis-eQTLs, we used the leave-one-chromosome-out (LOCO) method. GEMMA (49) was run in gk mode to create 20 kinship matrices, each based on all genotypes except those on the same chromosome as the genes for which it would be used. GEMMA was run on the nucleus accumbens core samples in lmm mode using the Wald test. It was run separately for each gene, testing only the gene's cis-window variants with minimum minor allele frequency (MAF) = 5%. As with the tensorQTL mapping, the first 20 expression PCs were used as covariates, but the eight genotype-based covariates were omitted so as not to interfere with the random effect term. Percent variance explained (PVE) by the kinship matrix for each gene was computed by running GEMMA in vc mode, supplying a kinship matrix but not genotypes. The lmm mode mapping was repeated in lm mode for comparison, with identical settings aside from not supplying a kinship matrix. These results were used only for the analysis of LMM impact, while results from tensorQTL described earlier were used for the remainder of the study.
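The LOCO construction can be illustrated with a small sketch that builds one kinship matrix per held-out chromosome. The centered form used here, the average over SNPs of the outer product of mean-centered dosages, is one standard relatedness estimator; whether it matches the exact option used in the study is an assumption, and the function names and input format are hypothetical.

```python
def centered_kinship(genotypes):
    """Centered relatedness matrix from SNP dosage rows.

    genotypes is a list of SNP rows, each a list of dosages with one
    entry per individual. Returns an n x n nested list.
    """
    n = len(genotypes[0])
    kinship = [[0.0] * n for _ in range(n)]
    for row in genotypes:
        mean = sum(row) / n
        centered = [g - mean for g in row]
        for i in range(n):
            for j in range(n):
                kinship[i][j] += centered[i] * centered[j]
    n_snps = len(genotypes)
    return [[value / n_snps for value in r] for r in kinship]


def loco_kinships(genotypes_by_chrom):
    """One kinship matrix per chromosome, built from all other chromosomes."""
    result = {}
    for held_out in genotypes_by_chrom:
        rows = [row
                for chrom, snps in genotypes_by_chrom.items()
                if chrom != held_out
                for row in snps]
        result[held_out] = centered_kinship(rows)
    return result
```

A gene on a given chromosome would then be tested with the matrix keyed by that chromosome, so that the random effect does not absorb the gene's own cis signal.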
Heritability estimates
The cis-heritability (h2) of each rat gene was calculated with GEMMA by first computing a kinship matrix for the gene using only its cis-window variants. GEMMA was then run in vc mode, supplying the gene-specific kinship matrix and the same covariates used for eQTL mapping, and the PVE was recorded from the output log. Human cis-heritability estimates were previously computed (50).
VG estimates
VG (the expected variance in the gene dosage due to interindividual genetic differences observed in allele specific expression) was estimated for each gene in each rat brain region by running ANEVA (51) using the phased, gene-level allele specific expression counts. VG estimates for human genes were similarly obtained using GTEx v8 data, and VG estimates calculated with at least 5000 ASE counts were included.
The human gene sets used to subset human-rat ortholog pairs for VG comparison were based on those previously collected by Mohammadi et al. (51–53). We removed sets that overlapped with fewer than 20 ortholog pairs, and replaced the GWAS-derived sets with sets of all author-reported genes for traits in the GWAS Catalog v1.0.2 (54), choosing the 10 traits with the most author-reported genes while avoiding redundant traits.
Variant annotation
SNPs were annotated with functional categories using the Ensembl Variant Effect Predictor (55). The background SNP set for enrichment was all cis-window SNPs for all tested genes. The test sets for enrichment were the cis-eQTL SNPs (eSNPs) with lowest P-value per eGene in each brain region, including multiple SNPs in the case of tied P-values.
Colocalization
We collected linear association statistics from eQTL mapping in the five brain regions and from the nine traits from a published GWAS in HS rats (56). GWAS scores were available for a set of pruned SNPs (r2 < 0.95), so for each brain region we selected the top cis-eSNP per gene that was present in the pruned GWAS dataset to test for colocalization. We computed z-scores for eQTL and GWAS associations by dividing the slope by its standard error for each selected SNP. Using the summary data-based Mendelian randomization (SMR) method (57) we computed the approximate χ2 test statistic and computed a P-value using the upper tail of the chi-squared distribution with one degree of freedom. We computed SMR P-values for each selected SNP and used a Bonferroni threshold to determine SNPs with significant colocalization. We repeated this for each of the 45 tissue-trait combinations.
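The SMR test described above reduces to a short calculation once the two z-scores are in hand. The sketch below uses the approximate SMR statistic, the product of the squared z-scores divided by their sum, and the upper tail of a chi-squared distribution with one degree of freedom, computed via the complementary error function. The function names are hypothetical, and the family-wise alpha of 0.05 in the Bonferroni helper is an assumption not stated in the text.

```python
import math


def z_score(slope, standard_error):
    """z-score as slope divided by its standard error."""
    return slope / standard_error


def smr_test(z_eqtl, z_gwas):
    """Approximate SMR chi-squared statistic and its P-value (1 df)."""
    denom = z_eqtl ** 2 + z_gwas ** 2
    if denom == 0:
        return 0.0, 1.0
    statistic = (z_eqtl ** 2) * (z_gwas ** 2) / denom
    # Upper tail of chi-squared with 1 df: P(Z^2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold; alpha = 0.05 is an assumed family-wise level."""
    return alpha / n_tests
```

Because the statistic is bounded above by the smaller of the two squared z-scores, a colocalization can only be called when both the eQTL and the GWAS association are individually strong.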
RESULTS
We obtained gene expression profiles from five brain regions from 88 HS rats using RNA-Seq with an average library size of 26.7 million raw reads (Supplementary Table S1). The regions examined were: nucleus accumbens core (NAcc), infralimbic cortex (IL), prelimbic cortex (PL), orbitofrontal cortex (OFC) and lateral habenula (LHb) (Figure 1A). These brain regions were selected because of their relevance to a variety of behavior traits, including but not limited to substance abuse-related traits.
We determined genotypes at 3 511 003 SNPs across all autosomes using genotyping by sequencing (37). Consistent with our expectations based on their population history (Figure 1B), linkage disequilibrium (LD) decayed over much longer distances in this population as compared to humans (Figure 1C). Minor allele frequencies were fairly uniform, with a mean of 24% and the first and third quartiles of 11% and 36%; importantly the spike of rare alleles typically observed in human populations was not present (Figure 1D).
Clusters of expression profiles for the three cortical regions were relatively close along their first two principal components, while nucleus accumbens core and lateral habenula profiles formed separate clusters (Figure 1E). Further separation between the cortical regions was apparent in the fourth principal component (Figure 1F).
HS founder haplotype diversity
Since the HS rats have been maintained as an outbred population for many generations (73–80 for this cohort), the chromosomes are expected to be random mosaics of the eight founder haplotypes. In addition to the accumulation of recombinations, which improves mapping resolution, genetic drift inevitably erodes haplotype diversity (Figure 2). We inferred ancestral haplotypes across each animal's genome, which showed that at many loci the founder haplotypes had deviated substantially from their initially uniform proportions (Figure 2A, Supplementary Figure S1). To determine if the observed loss of haplotype diversity was consistent with genetic drift versus other possibilities such as genotyping errors, breeding errors, or inadvertent selection for fitness and fecundity, we simulated the breeding history. In particular, since the HS population has undergone periods of both circular and random pair mating, we simulated both strategies separately. We found that the distribution of observed haplotype diversity, as measured by Shannon entropy, lies between that of the two simulated strategies at generation 80 (Figure 2B), suggesting that the changes in haplotype frequency are broadly consistent with random genetic drift.
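Shannon entropy of the founder haplotype proportions at a locus can be computed as in the sketch below. The logarithm base is an assumption (base 2, giving bits), since the text does not specify it; with eight equally frequent founders the entropy is at its maximum of 3 bits, and it falls to 0 when a single founder haplotype is fixed.

```python
import math


def shannon_entropy(haplotype_weights):
    """Shannon entropy in bits of founder haplotype proportions.

    haplotype_weights may be counts or probabilities; they are
    normalized to sum to one, and zero entries contribute nothing.
    """
    total = sum(haplotype_weights)
    entropy = 0.0
    for weight in haplotype_weights:
        if weight > 0:
            p = weight / total
            entropy -= p * math.log2(p)
    return entropy
```

Applied per locus to both observed haplotype probabilities and simulated haplotype counts, this gives distributions of diversity that can be compared directly.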
Mapping eQTLs
We tested for associations between gene expression and each SNP across the genome (Figure 3A). As observed in other organisms (7,9), associations with SNPs near the gene's location in the genome, which we presumed to be cis-eQTLs, were prevalent. While we observed some associations with distant SNPs, which may represent trans-eQTLs, we primarily focused on putatively cis-acting eQTLs within ±1 Mb of each gene's transcription start site (TSS) to retain statistical power and limit false positives (see Materials and Methods). Unless otherwise noted, ‘eQTL’ hereafter refers to a cis-eQTL.
Plotting all P-values in the cis-windows revealed blocks of SNPs in full LD with identical P-values, or with multiple overlapping sets of such SNPs, depending on the founder haplotypes involved (Figure 3B). The cis-window of 96.6% of the genes contained at least 10 SNPs that were not in full LD, and 28.1% contained at least 100 such SNPs. The TSS of 73% of genes were in high LD (r2 > 0.99) with at least one other gene's TSS. In instances where multiple top SNPs in perfect LD were associated with the same gene, a single SNP was selected randomly for downstream analyses and visualization. We estimated the effect sizes for the cis-eQTLs using allelic fold change (aFC) and found that 87% of the cis-eQTLs altered gene expression by up to two fold (|log2 aFC| ≤ 1), and 96% did so by up to four fold (|log2 aFC| ≤ 2) (Figure 3C). Consistent with their higher statistical power, minor allele frequencies of eQTLs were higher on average than those of the set of all measured SNPs (Figure 3D). eQTLs were enriched close to the associated gene's TSS, occurring upstream and downstream of the TSS at similar frequencies (Figure 3E).
We identified cis-eQTLs for between 3339 and 4003 genes for each of the brain regions at a 5% false discovery rate (Figure 4A), a consistent amount that represented 20% to 24% of the 16 456 to 16 814 expressed genes in each brain region. A total of 7788 genes were affected by a cis-eQTL in at least one brain region, of which 3234 (42%) were identified in only one brain region and 1170 (15%) were identified in all five brain regions (Figure 4B). To validate the mapped cis-eQTLs, we measured their regulatory effect size from total gene expression and allele specific expression (ASE) data independently using aFC (46). We could obtain ASE counts for 70.8% of expressed genes on average per brain region, and from those counts estimated aFC from ASE for 52.4% of cis-eQTLs on average. These two independent aFC measurements were consistent for each brain region (mean Pearson's r = 0.58 ± SD 0.02, Deming regression β = 1.26 ± 0.10, Supplementary Figure S2). We used a stepwise regression procedure to identify conditionally independent cis-eQTLs beyond the strongest cis-eQTL per gene, and found an average of 174 genes with two eQTLs and 4 genes with three eQTLs in each brain region, resulting in an average of 4.9% additional eQTLs per brain region. In total, we found 19 588 cis-eQTLs across the five brain regions (Supplementary Table S2).
Next, we examined the tissue specificity of the eQTLs, which may reflect biological differences across brain regions. The three cortical regions (IL, PL, and OFC) shared a greater overlap of cis-eQTL genes (eGenes) than any other tissue trio, with 349 eGenes shared exclusively among them, compared to only 104 eGenes for the next-highest trio (Figure 4C). This observation is broadly consistent with our expectation that cortical tissues should have similar expression patterns. However, there were a number of eQTLs that were only detected in a single tissue (Supplementary Table S3). In some cases, this may reflect real biological differences that give rise to tissue specific eQTLs. In other cases, this apparent tissue specificity could also be caused by noise in associations that are close to the significance threshold, such that only one tissue reached the significance threshold, or by low expression of the gene in question in other tissues (Supplementary Table S4).
We quantified splicing in terms of intron excision ratios and used these phenotypes to map cis-sQTLs (7). Because these measurements are based on a smaller number of reads, power to detect sQTLs is likely lower than for eQTLs. We found cis-sQTLs in 4.1–5.4% of the 6918 to 7676 genes in which we detected alternative splicing per brain region. Over all splice junctions per gene and using stepwise regression, we found 305 to 403 independent cis-sQTLs per brain region, impacting a total of 764 genes (Figure 4D, Supplementary Table S5). This included 404 genes for which cis-sQTLs were identified in only one brain region and 117 for which cis-sQTLs were found in all five brain regions (Figure 4E, F). Importantly, 47% of the genes with an sQTL in a brain region did not have an eQTL in that brain region, demonstrating the added benefit of mapping sQTLs.
We annotated top associated SNPs (eSNPs) per eQTL and found enrichment in all protein-coding gene-associated categories, both exonic and intronic (Figure 4G). However, these enrichments may also reflect the tendency for both eSNPs and gene-associated features to occur near the gene's transcription start site relative to the full cis-window. While there were too few sQTLs to reliably measure sQTL SNP (sSNP) annotation enrichment, especially for smaller categories such as ‘Splice region’, the proportions of annotations among the sSNPs were similar to the proportions among eSNPs. Given that there were often blocks of multiple eSNPs with identical P-values, the level of eSNP resolution in HS rats is limited by LD structure.
Because of the complex familial relationships among members of an outbred population like the HS, several prior eQTL mapping studies in similar mouse populations employed a linear mixed model (LMM) that includes a kinship matrix that accounts for relatedness (9,58,59). However, LMMs are more computationally intensive, and given the large number of genes being examined, we questioned the need for an LMM. We examined genetic relatedness between the rats and found several outliers that appear to be closely related pairs (Supplementary Figure S3a). We repeated cis-eQTL mapping for one brain region, NAcc, using GEMMA (49) in linear model (LM) mode and in LMM mode, run with identical parameters aside from the inclusion of a leave-one-chromosome-out kinship matrix for LMM. The absolute values of Z-scores for the top association per gene were highly correlated (Spearman's rho = 0.991, Supplementary Figure S3b). The sets of eGenes below a wide range of P-value thresholds had strong overlap between the modes (97%, 97% and 95% for thresholds of 1e–9, 1e–6 and 1e–3, respectively, Supplementary Figure S3c). This is in stark contrast to the scenario of GWAS in a panel of inbred mouse or rat strains, where use of LMM is critical to avoid false positive results (60). In our dataset, P-values for LMM tended to be slightly more significant (Supplementary Figure S3d).
Comparison to human
The GTEx Consortium recently released a comprehensive map of eQTLs in 49 human tissues including 13 brain regions (7). The number of eGenes in our data was lower (mean of 3736 over five tissues) than for the human brain tissues in GTEx data (mean of 6870 over 13 tissues) where the authors used 114 to 209 donor samples to map eQTLs, using a similar testing procedure and the same false discovery rate (5%) as the present study. We sought to compare these counts in light of the correlation between sample size and eGene count among the GTEx tissues (Pearson's r = 0.86, Figure 5A). We subsampled each GTEx brain tissue dataset to 81 samples, the largest sample size in the present study. Mapping eQTLs with these subsampled datasets resulted in fewer eGenes (mean: 1900, SD: 473) than the rat brain tissues (mean: 3736, SD: 304). While this comparison pertains to human datasets with artificially reduced sample sizes, and the human and HS rat datasets differed in multiple biological and technical ways that could influence statistical power, it suggests that on a per subject basis we had greater power to map eQTLs in HS rats.
The distances between each eGene's top eSNP and TSS were much greater for rat brain (median 271 kb) than for human brain (median 35 kb, Figure 5B). In many cases a rat brain eQTL had a cluster of eSNPs in perfect LD (r2 = 1), which therefore had identical P-values. In these cases a single SNP was randomly chosen, which is one reason for the greater distances between the top eSNP and the TSS observed in HS rats.
Colocalization analysis and other transcriptome-informed functional population genomic analyses rely on presence of common regulatory variation in a population such as eQTLs to interpret GWAS signal. Next, we focused on genes that do not have an eQTL mapped in any brain tissues in the GTEx data. Out of 11 686 orthologous genes that are well expressed (median TPM > 1 in at least one tissue) in both GTEx human and HS rat data, we found that 85% have an eQTL in at least one GTEx brain tissue, leaving 1717 genes with no mapped eQTLs. As previously reported, the genes with no eQTLs are enriched for critical genes that are intolerant to loss of function coding genetic variation (46). We found that for 44% (n = 749) of these genes we identified an eQTL in at least one rat brain region. Indeed, the orthologous genes associated with these eGenes that are exclusive to the rat eQTL data are significantly more likely to be intolerant of loss-of-function mutations than those genes with eQTLs in GTEx data (Figure 5C). These results suggest that the HS rat population may be a valuable resource for characterizing phenotypic consequences of genetic variation in genes that are highly depleted for functional variation in human populations.
Conservation of genetic regulatory constraint between human and rat
Genetic regulatory variation present in a population is negatively correlated with the coding constraint of the genes (46,51,62). We compared the amount of genetic variation present in the HS rat population to human data using different approaches, each affected by a different set of confounding factors. Effect sizes (|log2 aFC|) of the cis-eQTLs in rat brain regions were smaller overall than those measured in 13 human brain tissues in GTEx (7). We looked at eQTL effect sizes for ortholog pairs to detect correspondence in tolerance to regulatory variation between similar human and rat genes. For each gene, we averaged the absolute effect size per top eQTL across tissues, and then paired up these values for every ortholog pair with any eQTL in rat brain tissues and any eQTL in human brain tissues (n = 6079 pairs). Effect sizes correlated significantly (Pearson's r = 0.24, P = 1.3e–79, Figure 5D). This suggests that some degree of variance in tolerance to regulatory variation is conserved between rat and human.
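The ortholog comparison described above amounts to averaging absolute effect sizes across tissues within each species and then correlating the paired values. A minimal sketch follows; the function names and the dictionary-based inputs are hypothetical, and the sketch does not compute a P-value.

```python
import math


def mean_abs_effect(log2_afc_per_tissue):
    """Average |log2 aFC| of a gene's top eQTL across tissues."""
    return sum(abs(v) for v in log2_afc_per_tissue) / len(log2_afc_per_tissue)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ortholog_effect_correlation(rat_effects, human_effects, ortholog_pairs):
    """Correlate mean |log2 aFC| between rat and human orthologs.

    rat_effects and human_effects map gene ID to a list of log2 aFC
    values, one per tissue in which the gene has an eQTL. Only pairs
    with an eQTL in both species are used. Returns (r, n_pairs).
    """
    paired = [(mean_abs_effect(rat_effects[r]), mean_abs_effect(human_effects[h]))
              for r, h in ortholog_pairs
              if r in rat_effects and h in human_effects]
    xs, ys = zip(*paired)
    return pearson_r(xs, ys), len(paired)
```

Taking absolute values before averaging matters because the sign of aFC depends on which allele is labeled as the reference, which is not comparable across species.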
For each gene in each tissue, we estimated cis-heritability (h2) of expression. Since h2 is another measure pertaining to genetic regulatory constraint, we expected some correlation in h2 between orthologous genes due to their similarity in function and therefore correlation in their degree of evolutionary constraint. We averaged h2 across tissues per gene and compared to h2 estimates for human genes averaged over 13 GTEx brain tissues. The correlation in mean h2 between orthologs was modest but highly significant due to the large number of observations (Pearson's r = 0.096, P = 4.5e–31, Figure 5E).
We then estimated VG, the expected variance in the gene dosage due to interindividual genetic differences observed in allele specific expression, using ANEVA (51). We compared the SDG (standard deviation, i.e. the square root of VG) values per gene to those estimated for GTEx brain tissues. As expected, VG correlated much more highly between tissues from the same species than did VG between orthologs in cross-species tissue pairs (Supplementary Figure S4). When averaged across tissues per rat gene and human gene, SDG values for ortholog pairs were weakly but significantly correlated (Pearson's r = 0.14, P = 9.9e–15, Figure 5F). SDG tended to be lower (i.e. lower genetic dosage variance) for the rat gene in ortholog pairs, including the orthologs for a wide range of human gene sets representing both essential and non-essential genes (Figure 5G).
Colocalization
Due to the longer-range LD in HS rats compared to humans, particularly the blocks of SNPs in complete LD within the cohort, colocalization methods that model colocalization as the overlap of single causal SNPs are less informative because colocalization probabilities are divided among the group of SNPs that are in LD with one another. To address this limitation, we tested colocalization of cis-eQTLs with GWAS results for a set of nine traits related to body morphology and adiposity obtained from an independent cohort of HS rats (56) using the summary data-based Mendelian randomization (SMR) method, which only evaluates consistency of effect for the top eQTL (57). We found 80 significant colocalizations among the 45 tissue-trait pairs using tissue-trait-specific Bonferroni P-value thresholds ranging from 1.3e–5 to 1.6e–5 (Supplementary Table S6), with the most colocalizations found for prelimbic cortex eQTLs and the RetroFat trait (Figure 6A). Eight eGenes were involved in at least four colocalizations: Apip, Cacul1, Drc1, Gpn1, Mrpl45, Nudt4, Pnpo and Rbks. Colocalizations for multiple tissues and traits generally clustered together in or near the QTL regions of the original GWAS (Figure 6B, Supplementary Figure S5).
Data portal
All gene expression, eQTL, and sQTL data are available at RatGTEx.org, for which we have adapted code and API design from the GTEx Portal to host rat eQTL data. This portal also includes interactive visualizations, derived from those in the GTEx portal, that can display results for any queried genes and variants. These five datasets initiate the RatGTEx portal, with datasets for additional tissues to be added as they become available.
DISCUSSION
We used RNA-Seq to map eQTLs and sQTLs in five brain regions in a cohort of 88 outbred HS rats. We also explored the unique genetic characteristics of the HS rat population. We focused on cis-eQTLs and sQTLs and characterized the degree of tissue specificity. We compared our results to human eQTL data from the GTEx project. We also used colocalization to demonstrate the utility of these eQTLs for interpreting GWAS results from HS rats. We have made all of the data generated here including the eQTL and sQTL mapping results available through a new portal (RatGTEx.org) to facilitate the application of these data to rat genetic and genomic research.
We mapped both cis-eQTLs and trans-eQTLs. The trans-eQTLs were much less prevalent. The biological significance of trans-eQTL signals is generally harder to ascertain as the analysis suffers from limited statistical power and can be confounded by batch effects. Furthermore, the rat genome assembly is not as thoroughly characterized as the human genome. Thus, some apparent trans-eQTLs may in fact be cis-eQTLs, either because RNA-Seq reads were mismapped or because an error in the genome assembly assigned a SNP to the wrong location, as has been observed in human eQTL studies (63). For all of these reasons, we focused most of our efforts on cis-eQTLs. Future work with larger sample sizes and a focus on trans-eQTLs could yield interesting results, for example, by using colocalization analysis to understand pleiotropic GWAS QTLs.
We compared the HS rat data presented here with human data from the GTEx project. We found fewer cis-eQTLs in rats than in humans, which is consistent with our smaller sample size. However, when we downsampled GTEx brain datasets so that the number of individuals matched our study, we identified only half as many eQTLs as in our rat data. One reason that HS rats had more power than humans on a per-sample basis might be the longer-range LD in HS rats, which reduces the effective number of tests being performed (64). Another advantage of HS rats is their higher MAF compared to humans. The greater power in HS rats could also reflect the much more controlled environment of laboratory rats.
The greater LD in HS rats compared to humans increases power but does so at the expense of precision, since there are often large LD blocks that increase uncertainty about which SNP causes a given eQTL. Another consequence of this causal SNP uncertainty is that the eSNP annotation enrichments reported here are less indicative of the specific regulatory mechanisms driving the effect than those obtained in humans. For example, the distribution of distances between the eSNP and the TSS is much wider in HS rats than in humans (Figure 5B).
We found ten times fewer cis-sQTLs than cis-eQTLs, whereas the GTEx project reported about fourfold fewer cis-sQTLs than cis-eQTLs. The larger difference between sQTLs and eQTLs in our dataset may be due to both our use of single-end sequencing and our lower sequencing depth, both of which reduce the number of junction-spanning reads, which are essential for sQTL detection. Therefore, we do not believe this difference reflects a true biological difference between the two species.
Our study is similar to several previous studies that have mapped eQTLs or similar features in mice and rats. Prior mouse and rat studies have used inbred, recombinant inbred, and outbred populations, with microarrays or RNA-Seq (9,11–29,58,65–70). While some of these previous studies used more computationally intensive linear mixed models (LMMs) to account for population structure effects, we did not find an appreciable difference between the results from linear regression and those from an LMM, which is consistent with Parker et al. (9). This may be explained by the fact that we avoided sampling multiple individuals from the same family. Had the breeding scheme been less carefully designed, there may have been isolated clusters of more closely related individuals within the population, and an LMM might have been necessary.
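For readers unfamiliar with the simpler of the two models, the following is a minimal sketch of a single SNP-gene association test by ordinary least squares. It is deliberately simplified: it includes no covariates, no kinship term and no permutation-based significance, so it should not be read as the pipeline used in this study. The function name and input layout are hypothetical.

```python
import math


def eqtl_linear_regression(dosages, expression):
    """Regress expression on genotype dosage (0, 1 or 2) for one SNP-gene pair.

    Returns (slope, standard_error, t_statistic) from simple least squares.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    # Residual variance uses n - 2 degrees of freedom (intercept and slope).
    rss = sum((e - (intercept + slope * g)) ** 2 for g, e in zip(dosages, expression))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    if std_err > 0:
        t_stat = slope / std_err
    else:
        t_stat = math.copysign(math.inf, slope) if slope != 0 else 0.0
    return slope, std_err, t_stat
```

An LMM extends this model with a random effect whose covariance is given by a kinship matrix, which is what makes it more costly to fit and more robust when closely related individuals are sampled.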
Genes that are intolerant to loss of function mutations tend to have lower levels of regulatory variation as well (46). We found cis-eQTLs in the rat orthologs of many of the human genes with no cis-eQTLs in the GTEx brain tissue data. Indeed, these genes with eQTLs in only rats had relatively high intolerance scores in humans. Given the lower statistical power in our study versus the GTEx brain dataset, these rat-exclusive eQTLs are likely a result of relaxed selection pressure against eQTLs in these genes in rats. The presence of common regulatory variants in these genes presents an opportunity to study the downstream dosage effects in some of these variation-intolerant human gene orthologs.
Comparing the amount of genetic variation in gene expression in rats and humans, we found that this quantity is only modestly correlated between the populations. This low correlation is likely a combined effect of low statistical power and the artificial nature of the rat population, which relaxes selection constraints on genes. However, surprisingly, we found that the rat population shows lower levels of genetic regulatory variation across a diverse set of genes as measured by eQTL effect sizes, cis-heritability of gene expression, and the ASE-derived estimates of genetic variance in gene expression. Notably, these results cannot be explained by differences in statistical power or sample size. Future investigation could uncover the cause, in particular whether it relates to biological differences between humans and rats, consequences of the HS rat population design, environmental conditions, or other factors.
We were able to use eQTLs from brain tissue to show colocalization with nine body morphology and adiposity traits. The success of this approach may reflect the idea that eQTLs are shared across many tissues, not just among brain regions. Furthermore, adiposity is heavily influenced by consummatory behavior and energy expenditure, both of which are controlled by the brain.
The results of this study offer practical guidance for future HS rat eQTL studies. For example, the degree of eQTL overlap across brain regions was very high, especially for the three cortical regions. Had we sampled fewer brain regions from a larger number of individuals, we would have obtained greater statistical power.
DATA AVAILABILITY
Raw RNA-Seq data are available at NCBI GEO accession GSE173141. Processed genotype, expression, eQTL, and sQTL data are available at https://RatGTEx.org/download/#studies.
ACKNOWLEDGEMENTS
Author contributions: A.A.P. and H.C. designed the study. L.S.W. bred and shipped the animals. T.W. bred and processed the animals and produced RNA-Seq data. H.C. supervised tissue collection and conducted a preliminary analysis. J.G. helped to produce the genotypes used in our analysis. L.M.S. processed RNA-Seq data and quantified gene expression. D.M., A.S.C., O.P. and N.E. processed RNA samples and analyzed data. P.M., A.A.P. and A.G. designed and advised on data analyses. P.M., A.A.P. and D.M. wrote the paper.
Contributor Information
Daniel Munro, Department of Psychiatry, University of California San Diego, La Jolla, CA, USA; Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, CA, USA.
Tengfei Wang, Department of Pharmacology, Addiction Science and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA.
Apurva S Chitre, Department of Psychiatry, University of California San Diego, La Jolla, CA, USA.
Oksana Polesskaya, Department of Psychiatry, University of California San Diego, La Jolla, CA, USA.
Nava Ehsan, Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, CA, USA.
Jianjun Gao, Department of Psychiatry, University of California San Diego, La Jolla, CA, USA.
Alexander Gusev, Division of Population Sciences, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA, USA.
Leah C Solberg Woods, Section of Molecular Medicine, Department of Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA.
Laura M Saba, Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
Hao Chen, Department of Pharmacology, Addiction Science and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA.
Abraham A Palmer, Department of Psychiatry, University of California San Diego, La Jolla, CA, USA; Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA.
Pejman Mohammadi, Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, CA, USA; Scripps Research Translational Institute, Scripps Research, La Jolla, CA, USA.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
National Institute on Drug Abuse [P50DA037844, P30DA044223]; National Institute of General Medical Sciences [R01GM140287]; National Institute on Alcohol Abuse and Alcoholism [R24AA013162]; D.M., P.M. and L.M.S. were partly supported by Skaggs Scholars Program. Funding for open access charge: National Institute on Drug Abuse [P50DA037844].
Conflict of interest statement. None declared.
REFERENCES
- 1. Ren Y., Palmer A.A. Behavioral genetic studies in rats. Methods Mol. Biol. 2019; 2018:319–326.
- 2. Padmanabhan S., Joe B. Towards precision medicine for hypertension: a review of genomic, epigenomic, and microbiomic effects on blood pressure in experimental rat models and humans. Physiol. Rev. 2017; 97:1469–1528.
- 3. Rojas A., Ganesh T., Wang W., Wang J., Dingledine R. A rat model of organophosphate-induced status epilepticus and the beneficial effects of EP2 receptor inhibition. Neurobiol. Dis. 2020; 133:104399.
- 4. Cohen R.M., Rezai-Zadeh K., Weitz T.M., Rentsendorj A., Gate D., Spivak I., Bholat Y., Vasilevko V., Glabe C.G., Breunig J.J. et al. A transgenic alzheimer rat with plaques, tau pathology, behavioral impairment, oligomeric aβ, and frank neuronal loss. J. Neurosci. 2013; 33:6245–6256.
- 5. Parker C.C., Chen H., Flagel S.B., Geurts A.M., Richards J.B., Robinson T.E., Solberg Woods L.C., Palmer A.A. Rats are the smart choice: rationale for a renewed focus on rats in behavioral genetics. Neuropharmacology. 2014; 76:250–258.
- 6. Aitman T.J., Critser J.K., Cuppen E., Dominiczak A., Fernandez-Suarez X.M., Flint J., Gauguier D., Geurts A.M., Gould M., Harris P.C. et al. Progress and prospects in rat genetics: a community view. Nat. Genet. 2008; 40:516–522.
- 7. GTEx Consortium. The GTEx consortium atlas of genetic regulatory effects across human tissues. Science. 2020; 369:1318–1330.
- 8. Frochaux M.V., Bou Sleiman M., Gardeux V., Dainese R., Hollis B., Litovchenko M., Braman V.S., Andreani T., Osman D., Deplancke B. cis-regulatory variation modulates susceptibility to enteric infection in the Drosophila genetic reference panel. Genome Biol. 2020; 21:6.
- 9. Parker C.C., Gopalakrishnan S., Carbonetto P., Gonzales N.M., Leung E., Park Y.J., Aryee E., Davis J., Blizard D.A., Ackert-Bicknell C.L. et al. Genome-wide association study of behavioral, physiological and gene expression traits in outbred CFW mice. Nat. Genet. 2016; 48:919–926.
- 10. Chick J.M., Munger S.C., Simecek P., Huttlin E.L., Choi K., Gatti D.M., Raghupathy N., Svenson K.L., Churchill G.A., Gygi S.P. Defining the consequences of genetic variation on a proteome-wide scale. Nature. 2016; 534:500–505.
- 11. Hubner N., Wallace C.A., Zimdahl H., Petretto E., Schulz H., Maciver F., Mueller M., Hummel O., Monti J., Zidek V. et al. Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nat. Genet. 2005; 37:243–253.
- 12. Scheetz T.E., Kim K.-Y.A., Swiderski R.E., Philp A.R., Braun T.A., Knudtson K.L., Dorrance A.M., DiBona G.F., Huang J., Casavant T.L. et al. Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc. Natl. Acad. Sci. 2006; 103:14429–14434.
- 13. Petretto E., Mangion J., Dickens N.J., Cook S.A., Kumaran M.K., Lu H., Fischer J., Maatz H., Kren V., Pravenec M. et al. Heritability and tissue specificity of expression quantitative trait loci. PLoS Genet. 2006; 2:e172.
- 14. Guryev V., Saar K., Adamovic T., Verheul M., van Heesch S.A.A.C., Cook S., Pravenec M., Aitman T., Jacob H., Shull J.D. et al. Distribution and functional impact of DNA copy number variation in the rat. Nat. Genet. 2008; 40:538–545.
- 15. Tabakoff B., Saba L., Printz M., Flodman P., Hodgkinson C., Goldman D., Koob G., Richardson H.N., Kechris K., Bell R.L. et al. Genetical genomic determinants of alcohol consumption in rats and humans. BMC Biol. 2009; 7:70.
- 16. Grieve I.C., Dickens N.J., Pravenec M., Kren V., Hubner N., Cook S.A., Aitman T.J., Petretto E., Mangion J. Genome-wide co-expression analysis in multiple tissues. PLoS One. 2008; 3:e4033.
- 17. Adriaens M.E., Lodder E.M., Moreno-Moral A., Šilhavý J., Heinig M., Glinge C., Belterman C., Wolswinkel R., Petretto E., Pravenec M. et al. Systems genetics approaches in rat identify novel genes and gene networks associated with cardiac conduction. J. Am. Heart Assoc. 2018; 7:e009243.
- 18. Otto G.W., Kaisaki P.J., Brial F., Le Lay A., Cazier J.-B., Mott R., Gauguier D. Conserved properties of genetic architecture of renal and fat transcriptomes in rat models of insulin resistance. Dis. Model. Mech. 2019; 12:dmm038539.
- 19. Chen T.-D., Rotival M., Chiu L.-Y., Bagnati M., Ko J.-H., Srivastava P.K., Petretto E., Pusey C.D., Lai P.-C., Aitman T.J. et al. Identification of ceruloplasmin as a gene that affects susceptibility to glomerulonephritis through macrophage function. Genetics. 2017; 206:1139–1151.
- 20. Dumas M.-E., Domange C., Calderari S., Martínez A.R., Ayala R., Wilder S.P., Suárez-Zamorano N., Collins S.C., Wallis R.H., Gu Q. et al. Topological analysis of metabolic networks integrating co-segregating transcriptomes and metabolomes in type 2 diabetic rat congenic series. Genome Med. 2016; 8:101.
- 21. Kaisaki P.J., Otto G.W., Argoud K., Collins S.C., Wallis R.H., Wilder S.P., Yau A.C.Y., Hue C., Calderari S., Bihoreau M.-T. et al. Transcriptome profiling in rat inbred strains and experimental cross reveals discrepant genetic architecture of genome-wide gene expression. G3: Genes, Genomes, Genetics. 2016; 6:3671–3683.
- 22. Wang J., Ma M.C.J., Mennie A.K., Pettus J.M., Xu Y., Lin L., Traxler M.G., Jakoubek J., Atanur S.S., Aitman T.J. et al. Systems biology with high-throughput sequencing reveals genetic mechanisms underlying the metabolic syndrome in the lyon hypertensive rat. Circ. Cardiovasc. Genet. 2015; 8:316–326.
- 23. Thessen Hedreul M., Möller S., Stridh P., Gupta Y., Gillett A., Daniel Beyeen A., Öckinger J., Flytzani S., Diez M., Olsson T. et al. Combining genetic mapping with genome-wide expression in experimental autoimmune encephalomyelitis highlights a gene network enriched for t cell functions and candidate genes regulating autoimmunity. Hum. Mol. Genet. 2013; 22:4952–4966.
- 24. Lindblom R.P.F., Aeinehband S., Parsa R., Ström M., Al Nimer F., Zhang X.-M., Dominguez C.A., Flytzani S., Diez M., Piehl F. Genetic variability in the rat aplec C-type lectin gene cluster regulates lymphocyte trafficking and motor neuron survival after traumatic nerve root injury. J. Neuroinflamm. 2013; 10:60.
- 25. Langley S.R., Bottolo L., Kunes J., Zicha J., Zidek V., Hubner N., Cook S.A., Pravenec M., Aitman T.J., Petretto E. Systems-level approaches reveal conservation of trans-regulated genes in the rat and genetic determinants of blood pressure in humans. Cardiovasc. Res. 2013; 97:653–665.
- 26. Jirout M.L., Friese R.S., Mahapatra N.R., Mahata M., Taupenot L., Mahata S.K., Kren V., Zídek V., Fischer J., Maatz H. et al. Genetic regulation of catecholamine synthesis, storage and secretion in the spontaneously hypertensive rat. Hum. Mol. Genet. 2010; 19:2567–2580.
- 27. Yamashita S., Wakazono K., Nomoto T., Tsujino Y., Kuramoto T., Ushijima T. Expression quantitative trait loci analysis of 13 genes in the rat prostate. Genetics. 2005; 171:1231–1238.
- 28. Keele G.R., Prokop J.W., He H., Holl K., Littrell J., Deal A., Francic S., Cui L., Gatti D.M., Broman K.W. et al. Genetic fine-mapping and identification of candidate genes and variants for adiposity traits in outbred rats. Obesity. 2018; 26:213–222.
- 29. Keele G.R., Prokop J.W., He H., Holl K., Littrell J., Deal A.W., Kim Y., Kyle P.B., Attipoe E., Johnson A.C. et al. Sept8/SEPTIN8 involvement in cellular structure and kidney damage is identified by genetic mapping and a novel human tubule hypoxic model. Sci. Rep. 2021; 11:2071.
- 30. Hansen C., Spuhler K. Development of the national institutes of health genetically heterogeneous rat stock. Alcohol. Clin. Exp. Res. 1984; 8:477–479.
- 31. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet j. 2011; 17:10.
- 32. Li B., Dewey C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011; 12:323.
- 33. Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15:550.
- 34. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013; 29:15–21.
- 35. van de Geijn B., McVicker G., Gilad Y., Pritchard J.K. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat. Methods. 2015; 12:1061–1063.
- 36. McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., Garimella K., Altshuler D., Gabriel S., Daly M. et al. The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010; 20:1297–1303.
- 37. Gileta A.F., Gao J., Chitre A.S., Bimschleger H.V., St Pierre C.L., Gopalakrishnan S., Palmer A.A. Adapting Genotyping-by-Sequencing and variant calling for heterogeneous stock rats. G3: Genes, Genomes, Genetics. 2020; 10:2195–2205.
- 38. Delaneau O., Howie B., Cox A.J., Zagury J.-F., Marchini J. Haplotype estimation using sequencing reads. Am. J. Hum. Genet. 2013; 93:687–696.
- 39. Howie B.N., Donnelly P., Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009; 5:e1000529.
- 40. Broman K.W., Gatti D.M., Simecek P., Furlotte N.A., Prins P., Sen Ś., Yandell B.S., Churchill G.A. R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multiparent populations. Genetics. 2019; 211:495–502.
- 41. Taylor-Weiner A., Aguet F., Haradhvala N.J., Gosai S., Anand S., Kim J., Ardlie K., Van Allen E.M., Getz G. Scaling computational genomics to millions of individuals with GPUs. Genome Biol. 2019; 20:228.
- 42. Ongen H., Buil A., Brown A.A., Dermitzakis E.T., Delaneau O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics. 2016; 32:1479–1485.
- 43. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature. 2017; 550:204–213.
- 44. Feng Y.-Y., Ramu A., Cotto K.C., Skidmore Z.L., Kunisaki J., Conrad D.F., Lin Y., Chapman W., Uppaulri R., Govindan R. et al. RegTools: Integrated analysis of genomic and transcriptomic data for discovery of splicing variants in cancer. 2018; bioRxiv doi: 10.1101/436634, 08 April 2021, preprint: not peer reviewed.
- 45. Li Y.I., Knowles D.A., Humphrey J., Barbeira A.N., Dickinson S.P., Im H.K., Pritchard J.K. Annotation-free quantification of RNA splicing using leafcutter. Nat. Genet. 2018; 50:151–158.
- 46. Mohammadi P., Castel S.E., Brown A.A., Lappalainen T. Quantifying the regulatory effect size of cis-acting genetic variation using allelic fold change. Genome Res. 2017; 27:1872–1884.
- 47. Castel S.E., Mohammadi P., Chung W.K., Shen Y., Lappalainen T. Rare variant phasing and haplotypic expression from RNA sequencing with phASER. Nat. Commun. 2016; 7:12817.
- 48. Castel S.E., Aguet F., Mohammadi P., GTEx Consortium, Ardlie K.G., Lappalainen T. A vast resource of allelic expression data spanning human tissues. Genome Biol. 2020; 21:234.
- 49. Zhou X., Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 2012; 44:821–824.
- 50. Wheeler H.E., Shah K.P., Brenner J., Garcia T., Aquino-Michaels K., GTEx Consortium, Cox N.J., Nicolae D.L., Im H.K. Survey of the heritability and sparse architecture of gene expression traits across human tissues. PLoS Genet. 2016; 12:e1006423.
- 51. Mohammadi P., Castel S.E., Cummings B.B., Einson J., Sousa C., Hoffman P., Donkervoort S., Jiang Z., Mohassel P., Foley A.R. et al. Genetic regulatory variation in populations informs transcriptome analysis in rare disease. Science. 2019; 366:351–356.
- 52. Zarrei M., MacDonald J.R., Merico D., Scherer S.W. A copy number variation map of the human genome. Nat. Rev. Genet. 2015; 16:172–183.
- 53. The Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature. 2015; 519:223–228.
- 54. Buniello A., MacArthur J.A.L., Cerezo M., Harris L.W., Hayhurst J., Malangone C., McMahon A., Morales J., Mountjoy E., Sollis E. et al. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019; 47:D1005–D1012.
- 55. McLaren W., Gil L., Hunt S.E., Riat H.S., Ritchie G.R.S., Thormann A., Flicek P., Cunningham F. The ensembl variant effect predictor. Genome Biol. 2016; 17:122.
- 56. Chitre A.S., Polesskaya O., Holl K., Gao J., Cheng R., Bimschleger H., Garcia Martinez A., George T., Gileta A.F., Han W. et al. Genome-Wide association study in 3,173 outbred rats identifies multiple loci for body weight, adiposity, and fasting glucose. Obesity. 2020; 28:1964–1973.
- 57. Zhu Z., Zhang F., Hu H., Bakshi A., Robinson M.R., Powell J.E., Montgomery G.W., Goddard M.E., Wray N.R., Visscher P.M. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 2016; 48:481–487.
- 58. Gonzales N.M., Seo J., Hernandez Cordero A.I., St Pierre C.L., Gregory J.S., Distler M.G., Abney M., Canzar S., Lionikas A., Palmer A.A. Genome wide association analysis in a mouse advanced intercross line. Nat. Commun. 2018; 9:5162.
- 59. Ghazalpour A., Doss S., Kang H., Farber C., Wen P.-Z., Brozell A., Castellanos R., Eskin E., Smith D.J., Drake T.A. et al. High-resolution mapping of gene expression using association in an outbred mouse stock. PLoS Genet. 2008; 4:e1000149.
- 60. Sul J.H., Martin L.S., Eskin E. Population structure in genetic studies: confounding factors and mixed models. PLoS Genet. 2018; 14:e1007309.
- 61. Lek M., Karczewski K.J., Minikel E.V., Samocha K.E., Banks E., Fennell T., O’Donnell-Luria A.H., Ware J.S., Hill A.J., Cummings B.B. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016; 536:285–291.
- 62. Petrovski S., Gussow A.B., Wang Q., Halvorsen M., Han Y., Weir W.H., Allen A.S., Goldstein D.B. The intolerance of regulatory sequence to genetic variation predicts gene dosage sensitivity. PLoS Genet. 2015; 11:e1005492.
- 63. Saha A., Battle A. False positives in trans-eQTL and co-expression analyses arising from RNA-sequencing alignment errors. [version 2; peer review: 3 approved]. F1000Res. 2018; 7:1860.
- 64. Parker C.C., Palmer A.A. Dark matter: are mice the solution to missing heritability? Front. Genet. 2011; 2:32.
- 65. Chesler E.J., Lu L., Shou S., Qu Y., Gu J., Wang J., Hsu H.C., Mountz J.D., Baldwin N.E., Langston M.A. et al. Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat. Genet. 2005; 37:233–242.
- 66. Hasin-Brumshtein Y., Khan A.H., Hormozdiari F., Pan C., Parks B.W., Petyuk V.A., Piehowski P.D., Brümmer A., Pellegrini M., Xiao X. et al. Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes. eLife. 2016; 5:e15614.
- 67. Orozco L.D., Bennett B.J., Farber C.R., Ghazalpour A., Pan C., Che N., Wen P., Qi H.X., Mutukulu A., Siemers N. et al. Unraveling inflammatory responses using systems genetics and gene-environment interactions in macrophages. Cell. 2012; 151:658–670.
- 68. Aylor D.L., Valdar W., Foulds-Mathes W., Buus R.J., Verdugo R.A., Baric R.S., Ferris M.T., Frelinger J.A., Heise M., Frieman M.B. et al. Genetic analysis of complex traits in the emerging collaborative cross. Genome Res. 2011; 21:1213–1222.
- 69. Skelly D.A., Raghupathy N., Robledo R.F., Graber J.H., Chesler E.J. Reference trait analysis reveals correlations between gene expression and quantitative traits in disjoint samples. Genetics. 2019; 212:919–929.
- 70. Rat Genome Sequencing and Mapping Consortium. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats. Nat. Genet. 2013; 45:767–775.