Abstract
Allele-specific gene expression can influence disease traits. Non-coding germline genetic variants that alter regulatory elements can cause allele-specific gene expression and contribute to cancer susceptibility. In tumors, both somatic copy number alterations and somatic single nucleotide variants have been shown to lead to allele-specific expression of genes, many of which are considered drivers of tumor growth. Here, we review recent studies revealing the pervasive presence of this phenomenon in cancer susceptibility and progression. Furthermore, we underscore the importance of careful experimental design and computational analysis for accurate allelic expression quantification and avoidance of false positives. Finally, we discuss additional methodological challenges encountered in cancer studies and in the burgeoning field of single-cell transcriptomics.
Current Opinion in Genetics and Development 2021, 66:10–19
This review comes from a themed issue on Cancer genomics
Edited by David J. Adams, Marcin Imielinski and C. Daniela Robles-Espinoza
Available online 28th December 2020
https://doi.org/10.1016/j.gde.2020.10.007
0959-437X/© 2020 Published by Elsevier Ltd.
Introduction
The human genome is diploid, with each individual generally carrying two copies of each chromosome. Each chromosome harbors one copy of each gene, referred to as an allele, and each allele is inherited from one of the two parents. The two alleles of a gene are generally expressed at similar levels in a tissue, but cis-regulatory differences between them, for example, differential binding of transcription factors (TFs), can lead to systematic differences between the expression of the two alleles in an individual (Figure 1a). This is commonly referred to as allelic imbalance or allele-specific expression (ASE). In extreme instances, only one copy of the gene is expressed, a phenomenon called mono-allelic expression, which can be due to gene deletion or to epigenetic mechanisms such as imprinting or X chromosome inactivation. The expression of each gene allele measured individually, hereafter called allelic expression quantification, can be obtained from RNA sequencing data and makes it possible to quantify the degree of allelic imbalance in each gene (Figure 1b) [1,2•].
In non-cancerous tissues, ASE is driven mostly by genetic regulatory variation in cis, specifically by genetic variants that transcriptionally or post-transcriptionally influence the amount of mRNA present from each allele [3, 4, 5]. For example, a genetic variant could affect mRNA decay [6] or alternative splicing [7,8], either of which can lead to ASE for a gene. In cancer tissues, on the other hand, ASE is often driven by somatic copy number alterations (SCNAs) of one allele, including focal amplifications of cancer-promoting genes or loss of a wild-type tumor suppressor gene copy, which can often confer a selective advantage to tumor growth [9]. Furthermore, non-coding germline genetic variants can exert their effect on cancer predisposition and progression by causing ASE. Allelic expression data can therefore be used to study the cis-regulatory footprint of both germline variants and somatic mutations in cancer (Figure 2). Additionally, these allelic expression phenomena can be cell-type or cell-state dependent [10, 11, 12, 13, 14]. For example, TFs driving ASE may be differentially active across cell states, so allelic imbalance may not be observed in all contexts or environments (Figure 1c). This context dependence may also influence how and when traits are altered and may therefore lead to disease.
In this Review, we describe how ASE has been shown to influence cancer development and progression, with an emphasis on the most recent discoveries. We then outline how allelic expression can be measured accurately and discuss the associated technical and analytical challenges, with particular emphasis on cancer-associated complexities and on transformative single-cell omics technologies. We conclude with our view on future perspectives.
Allele-specific expression in cancer
Genetic predisposition to cancer
Recently, allelic expression has been used to investigate the target genes of non-coding variants within regions associated with cancer predisposition through genome-wide association studies (GWAS). One of the first ASE-associated loci to be described in colorectal and prostate cancer was rs6983267, which lies in an enhancer region and affects the expression of the well-known c-MYC oncogene [15,16]. Since then, other studies have fine-mapped variants in other regulatory regions that may influence gene expression in other cancer types [17, 18, 19]. For example, Choi et al. focused on a 100 kb region on chromosome 1 that has been associated with melanoma risk. They observed that the tag SNP rs3219090 was associated with ASE in PARP1, a gene coding for an enzyme that participates in DNA repair, and fine-mapped this effect to preferential binding of RECQL, another DNA repair protein, to a nearby indel [20]. According to the authors, the risk allele then translates into higher PARP1 levels, which may promote melanoma formation by rescuing cells from BRAF V600E oncogene-induced senescence. Other examples include intronic cis-acting variants in individuals predisposed to breast and ovarian cancer that lead to haploinsufficiency of the DNA repair proteins PALB2 and BRCA1 [21,22], and regions of the genome with differential promoter/enhancer activity between matched tumor and normal samples that overlap GWAS-associated variants in renal cell carcinoma [23]. In an instance that exemplifies the complexities of cancer risk and gene regulation, Hua et al. identified that the prostate cancer risk SNP rs11672691 falls within an intron of the lncRNA PCAT19, in a region with both promoter and enhancer function. The risk allele G was associated with both lower expression of the short form of PCAT19 (affecting its promoter function) and higher expression of its long form (affecting its enhancer function) via decreased binding of the transcription factors YY1 and NKX3.1 [24•]. This long form then promotes prostate cancer development by cooperating to activate a number of cell-cycle genes. To facilitate performing these analyses systematically, methods such as PLASMA [25] and the statistical framework developed by Zou et al. [26] have been introduced.
Furthermore, allele-specific mechanisms have been speculated to play a role in modifying the penetrance of deleterious variants, for example in individuals with Li-Fraumeni syndrome, which is caused by damaging germline variation in the tumor suppressor gene TP53 [27]. Buzby et al. reported a father and daughter who were both heterozygous carriers of a deleterious TP53 Ser241Tyr variant, but only the daughter developed tumors. Cells from the father showed a significantly higher wild-type/mutant TP53 ASE ratio than those from the daughter, which allowed him to have total TP53 expression levels comparable to those of homozygous wild-type TP53 cells. Although the authors did not investigate the causes of ASE, they speculated that these TP53 alleles may be subject to imprinting or to an undescribed epigenetic regulatory mechanism. Whereas this was a targeted study of a single family, Castel et al. [28•] generalized these observations using phased genomes from healthy individuals and cohorts from The Cancer Genome Atlas (TCGA) [29]. They discovered that, compared to controls, cancer patients had an enrichment of risk-increasing haplotype configurations, consisting of a rare coding variant on a higher-expressed haplotype. This suggests that germline cis-regulatory variants can modify the penetrance of coding variants.
Somatic mutations leading to allelic imbalance
ASE tends to be more common in tumors than in normal tissues, and is largely driven by SCNAs [30, 31, 32]. For example, a recent study characterized the genomic contribution to RNA alterations in the tumor transcriptomes of 1188 patients. The authors reported that SCNAs accounted for 84.3% of the variation in ASE, whereas germline variants associated with expression levels (expression quantitative trait loci, eQTLs) explained 9.1% [33••]. Interestingly, somatic single nucleotide variants with a stop-gain effect leading to nonsense-mediated decay (NMD) constituted the most relevant mechanism explaining ASE at the individual level, in line with findings from rare genetic variation in healthy populations in GTEx data [5]. Furthermore, the authors found that ASE driven by somatic variation is enriched in cancer driver genes, suggesting that the allelic effects can act as a driver and are not only a consequence of cancer-associated genomic aberrations. A similar result was observed by Przytycki and Singh, who developed a method to detect differential ASE between normal and tumor samples and, when applying it to TCGA breast cancer samples, found that known cancer genes exhibit this phenomenon, with SCNAs and NMD being important contributors [34•]. Another, smaller study focusing on 11 recurrently mutated genes in acute myeloid leukemia found that 9 showed ASE, supporting the idea that ASE may be a common event in cancer [35].
Allelic imbalance can also be studied at the DNA level by quantifying SCNAs and assessing whether the gene allele carrying a coding somatic mutation was specifically amplified or lost. A study of more than a thousand likely driver mutations across 69 oncogenes in 13,448 tumors concluded that nearly half (45%) showed allelic imbalance in DNA copy numbers, and that 41% of all samples studied across 53 cancer types showed mutant allele imbalance of at least one oncogenic mutation [9]. Focal amplifications, loss of the wild-type allele, and, to a lesser extent, “hitchhiking” were all found to play a role in these observations, and distinct mechanisms behind them were described (e.g. tumor suppressor genes were mostly affected by the loss of the wild-type allele, whereas oncogenes showed mainly single-copy genomic gains of the mutant allele). It is therefore reasonable to expect that these genomic aberrations will translate into a transcriptional bias leading to ASE. Taken together, these large studies indicate that a significant fraction of driver alterations in cancer are associated with ASE events, and that considering the allelic imbalance state of cancer-associated genes may provide additional prognostic information. These observations further support the involvement of this mechanism in tumor development.
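To make the DNA-level reasoning above concrete, the following minimal sketch estimates how many copies of a locus carry a somatic mutation from its variant allele fraction (VAF). It is an illustration under stated assumptions, not the procedure used in the cited study: the mutation is assumed to be clonal, normal cells are assumed to be diploid and mutation-free at the locus, and the function and parameter names are hypothetical.

```python
def mutant_copies(vaf: float, purity: float, tumor_total_copies: float) -> float:
    """Estimate the number of mutant copies per tumor cell at a locus.

    Assumptions (illustrative only): the mutation is present in every tumor
    cell, normal cells carry two wild-type copies, and `vaf` is the fraction
    of DNA reads supporting the mutation.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be in [0, 1]")
    # Average number of copies of the locus per cell in the mixed sample.
    copies_per_cell = purity * tumor_total_copies + 2.0 * (1.0 - purity)
    # Mutant reads come only from tumor cells, hence the division by purity.
    return vaf * copies_per_cell / purity


# A VAF of 0.5 in a pure, diploid tumor corresponds to one mutant copy,
# whereas the same VAF at 50% purity and four total copies implies three.
print(mutant_copies(0.5, 1.0, 2))
print(mutant_copies(0.5, 0.5, 4))
```

An estimate well above half of the total copies would be compatible with gain of the mutant allele or loss of the wild-type allele, although distinguishing these requires allele-specific copy number calls.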
Overall, these studies exemplify the utility of studying ASE and its role in cancer susceptibility and tumor progression. Here, we would like to underscore the importance of applying rigorous methodologies to its study. While ASE has the advantage of comparing expression levels of two alleles subject to the same technical and biological environment, it can easily lead to false positives if technical biases and noise are not considered. Below we highlight the main computational and experimental aspects that need to be taken into account when performing ASE analyses.
Allelic expression quantification methods and technical considerations
Allelic expression is typically quantified by taking advantage of heterozygous sites inside exons and counting the number of next generation sequencing (NGS) reads overlapping the heterozygous site that carry one allele versus the other (Figure 1b). If the ratio between the two alleles deviates significantly from the expected 50:50, then the locus is deemed to show ASE or allelic imbalance. However, in order to quantify allelic expression accurately and reliably, multiple experimental aspects and computational biases must be taken into account when designing an experiment and analyzing the resulting data. Figure 3 depicts the main steps and recommended guidelines for performing allelic expression analyses. As detailed best practices for ASE analyses are beyond the scope of this review, we direct readers interested in exploring the topic in greater depth to Castel et al. [1], as well as to the other references cited in Figure 3 and throughout this section.
First, for a given individual, heterozygous variants in genes need to be identified. This can be done by a number of methods, including whole genome sequencing, exome sequencing, genotyping microarrays, and RNA-seq (although extra care should be taken if the latter is chosen, as variant calling from RNA-seq has inherent limitations that may significantly impact its reliability in ASE analyses [1]). Then, once RNA-seq has been performed on the tissue of interest, and after appropriate quality checks, reads are aligned to the reference genome. Here, mapping bias is an important aspect to control for when quantifying allelic expression. When reads are aligned to heterozygous sites on the reference genome, reads that contain the reference allele will align better than reads with the alternative allele. This can lead to a higher number of genes with false-positive ASE signal, since the reference allele will be overrepresented [1]. Several strategies have been proposed to alleviate mapping bias in allelic expression data, such as using variant-aware aligners [36,37] or discarding reads that would not map uniquely to the same position if their allele were flipped [38]. For highly polymorphic genes, such as human leukocyte antigen (HLA) genes, mapping bias is even more problematic. For example, individuals who have HLA alleles that differ greatly from the reference genome can appear to have lower gene expression levels (or lower DNA copy number dosage) than individuals with reference alleles. At present, the best approach to quantify allelic expression in HLA genes is to use a personalized genome for each individual, containing their specific HLA alleles [11,39,40]. Some limitations may soon be overcome by technological advances in long-read sequencing approaches applied to HLA allelic expression quantification as well as to isoform-specific allelic expression determination [41,42].
Another aspect to take into account when testing for ASE is the overdispersed nature of allelic expression data, which can cause false positives if a standard binomial distribution is assumed [38,43]. To avoid this, extra-binomial variation has been accounted for using beta-binomial models [13,38], a binomial-logit-normal distribution [44], or by estimating overdispersion as a random effect in a binomial generalized linear mixed model [11,12].
Careful experimental design can also ensure better quality of ASE analysis. For example, longer reads are preferable to shorter reads, as the latter are significantly more prone to mapping bias. Additionally, library complexity can influence the quality of ASE data, with low complexity libraries (such as those with a low amount of RNA starting material) or libraries with a large number of low complexity sites (due, for example, to a low number of reads with unique starting positions) posing extra challenges [45,46]. Higher sequencing depth and a higher number of heterozygous sites in expressed regions of the genome (usually exonic, and occasionally intronic and intergenic regions) yield higher power to detect allelic imbalance signals. Hence, when comparing the degree of ASE between different samples (such as healthy tissue against tumor), variation in read depth and in the number of heterozygous variants ascertained should be taken into account. It is also important to keep in mind that while ASE largely reflects the effects of regulatory variants, it does not reveal information about the regulatory variant itself, and further investigation is needed for that purpose [5,47,48].
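The basic test described above can be sketched as an exact two-sided binomial test on the reference and alternative read counts at one heterozygous site. This is a minimal illustration using only the Python standard library; the function name is hypothetical, and a plain binomial model ignores the mapping bias and overdispersion discussed later in this section, so it should be read as a starting point rather than a recommended procedure.

```python
from math import comb


def binomial_two_sided_p(ref_count: int, alt_count: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value for allelic imbalance at one site.

    The p-value is the total null probability of all outcomes that are no
    more likely than the observed reference count.
    """
    n = ref_count + alt_count
    if n == 0:
        raise ValueError("no reads cover this site")
    pmf = [comb(n, k) * p_null**k * (1.0 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_count]
    # Small relative tolerance so that symmetric outcomes are treated as ties.
    p_value = sum(x for x in pmf if x <= observed * (1.0 + 1e-7))
    return min(1.0, p_value)


# A balanced site versus a 70:30 site at the same depth of 100 reads.
print(binomial_two_sided_p(50, 50))
print(binomial_two_sided_p(70, 30))
```

Setting `p_null` to a value other than 0.5 is one simple way to encode a known expected ratio, for example a residual reference bias estimated from the data.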
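The read-filtering strategy mentioned above (discarding reads that do not map back to the same position once their allele is flipped) can be sketched as follows. This is a simplified illustration, not the implementation of any published tool: reads are assumed to be ungapped, coordinates are 0-based, and `remap` is a hypothetical callable standing in for an aligner that returns the unique mapping start of a sequence, or `None` if it maps nowhere or to multiple places.

```python
from typing import Callable, Dict, Optional, Tuple

# 0-based genomic position -> (reference allele, alternative allele)
Snps = Dict[int, Tuple[str, str]]


def flip_read(seq: str, start: int, snps: Snps) -> Optional[str]:
    """Swap ref<->alt at every SNP covered by an ungapped read starting at `start`.

    Returns None if the read shows a base that matches neither allele.
    """
    bases = list(seq)
    for pos, (ref, alt) in snps.items():
        offset = pos - start
        if 0 <= offset < len(bases):
            if bases[offset] == ref:
                bases[offset] = alt
            elif bases[offset] == alt:
                bases[offset] = ref
            else:
                return None
    return "".join(bases)


def keep_read(seq: str, start: int, snps: Snps,
              remap: Callable[[str], Optional[int]]) -> bool:
    """Keep a read only if its allele-flipped version maps to the same start."""
    overlaps = any(start <= pos < start + len(seq) for pos in snps)
    if not overlaps:
        return True  # reads without SNPs cannot show allelic mapping bias
    flipped = flip_read(seq, start, snps)
    if flipped is None:
        return False  # unexpected base at a SNP, treated here as unreliable
    return remap(flipped) == start


snps = {5: ("A", "G")}
print(flip_read("CCAGT", 3, snps))
```

Real data additionally require handling of gapped alignments, paired-end reads, and reads overlapping several variants, where every allele combination may need to be remapped.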
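As an illustration of how a beta-binomial model widens the null distribution, the sketch below computes a two-sided p-value with an overdispersion parameter `rho`. The parameterization (mean `mu`, with `rho = 1 / (a + b + 1)`) is one common convention and is an assumption here, as are the function names; in practice `rho` would be estimated from the data rather than supplied by hand.

```python
from math import exp, lgamma


def _log_beta(a: float, b: float) -> float:
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k: int, n: int, a: float, b: float) -> float:
    """Beta-binomial probability of k reference reads out of n."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + _log_beta(k + a, n - k + b) - _log_beta(a, b))


def betabinom_two_sided_p(ref_count: int, alt_count: int,
                          rho: float, mu: float = 0.5) -> float:
    """Two-sided p-value under a beta-binomial null with overdispersion rho.

    rho in (0, 1); values close to 0 approach the plain binomial model.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must be in (0, 1)")
    n = ref_count + alt_count
    a = mu * (1.0 - rho) / rho
    b = (1.0 - mu) * (1.0 - rho) / rho
    pmf = [betabinom_pmf(k, n, a, b) for k in range(n + 1)]
    observed = pmf[ref_count]
    # Sum the probability of all outcomes no more likely than the observed one.
    return min(1.0, sum(x for x in pmf if x <= observed * (1.0 + 1e-7)))


# The same 70:30 site looks less extreme as overdispersion increases.
print(betabinom_two_sided_p(70, 30, rho=0.01))
print(betabinom_two_sided_p(70, 30, rho=0.1))
```

Under this parameterization the variance of the reference count is n·mu·(1 − mu)·(1 + (n − 1)·rho), which makes explicit why deep sites are the most affected when overdispersion is ignored.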
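The dependence of power on read depth noted above can be illustrated with a small calculation. The sketch below assumes a plain exact binomial test at a single heterozygous site and a fixed true reference fraction; the function names are hypothetical, and overdispersion or mapping bias would lower the power further.

```python
from math import comb


def _binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def ase_power(depth: int, true_ref_fraction: float, alpha: float = 0.05) -> float:
    """Probability that an exact two-sided binomial test (null 0.5) rejects
    at level alpha when the true reference fraction is `true_ref_fraction`."""
    null = [_binom_pmf(k, depth, 0.5) for k in range(depth + 1)]

    def p_value(k: int) -> float:
        return sum(x for x in null if x <= null[k] * (1.0 + 1e-7))

    # Counts at which the test would call allelic imbalance.
    rejected = [k for k in range(depth + 1) if p_value(k) < alpha]
    return sum(_binom_pmf(k, depth, true_ref_fraction) for k in rejected)


# Power to detect a 65:35 imbalance at increasing depth.
for depth in (20, 50, 100, 200):
    print(depth, round(ase_power(depth, 0.65), 3))
```

A calculation of this kind suggests why samples with different coverage should not be compared on the raw number of significant ASE sites; down-sampling to a common depth is one possible way to make such comparisons fairer.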
Allelic expression quantification in cancer
While allelic expression quantification in cancer generally follows the same steps as the analysis in non-cancer tissues, there are extra challenges that need consideration. For example, SCNAs in the tumor may affect the interpretation of an ASE event identified at an exonic heterozygous germline variant; therefore, copy-number profiles of the tumor and of the matched normal should be integrated into the analysis of a cancer sample if available. Similarly, other somatic variants may affect the detection of ASE. For example, small indels produce stronger mapping biases. If allelic expression is to be measured over exonic somatic single nucleotide variants (SNVs), care should be taken as to how these somatic variants are called, since both the fraction of tumor cells in a sample and sequencing depth can impact the number of reads observed for a given gene and each of its alleles [49,50]. There may also be tumor-specific biases to take into account, given that tumor mutation burden (TMB) varies among tumor types and among patients with the same tumor type [51]. Theoretically, a higher TMB could mean that more somatic variants within genes would be available to detect ASE and/or that more regulatory somatic variants could be causing ASE. In a recent study comparing tumor types, there did not seem to be a clear correlation between a high number of non-synonymous SNVs and high ASE (e.g. melanoma and breast adenocarcinoma [33••]). However, it remains to be systematically tested whether different metrics of TMB are associated with more ASE events, while controlling for read depth. Finally, the somatic or germline nature of the ASE event needs to be properly determined by using both large databases of human genetic variation and a matched normal tissue if available.
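To illustrate why copy-number profiles and tumor purity matter when interpreting ASE at a germline heterozygous site, the sketch below computes the allelic ratio expected from DNA dosage alone. It rests on strong simplifying assumptions that are ours rather than the source's: every DNA copy contributes equally to expression, tumor and normal cells express the gene at the same per-copy level, and normal cells carry one copy of each allele. The function and parameter names are hypothetical.

```python
def expected_ref_fraction(purity: float,
                          tumor_ref_copies: float,
                          tumor_alt_copies: float) -> float:
    """Expected reference-allele read fraction at a germline heterozygous site
    if expression simply tracked DNA copy number in a tumor/normal mixture."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be in [0, 1]")
    # Normal cells contribute one reference and one alternative copy each.
    ref = purity * tumor_ref_copies + (1.0 - purity) * 1.0
    total = purity * (tumor_ref_copies + tumor_alt_copies) + (1.0 - purity) * 2.0
    if total == 0.0:
        raise ValueError("locus is absent from every cell in the sample")
    return ref / total


# A single-copy gain of the reference allele at two purity levels.
print(expected_ref_fraction(1.0, 2, 1))
print(expected_ref_fraction(0.5, 2, 1))
```

The resulting value could serve as the null proportion of an allelic test in place of 0.5, so that remaining deviations are more likely to reflect regulatory effects than copy-number dosage.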
Allele-specific expression in single cells
The latest omic technologies allow us to ascertain the transcriptome of single cells and, theoretically, to detect cell-specific ASE. Single-cell RNA-seq has enabled researchers to assess tumor and immune cell heterogeneity, and to discover new cellular states driven by transcriptional programs with important roles in cancer and response to therapy [52, 53, 54]. However, this technology still has important limitations. Single-cell RNA-seq protocols are able to capture only a small proportion of the mRNA pool in a cell, which makes the data sparse. Capture estimates range from 6 to 8% in early high-throughput protocols [55], although more recent versions may reach up to 32% (10X Genomics URL: https://www.10xgenomics.com/), with low-throughput methods in general having higher sensitivity [56]. For allelic expression quantification, this is a particularly important problem. If only one of the two alleles is detected in a given cell, this does not necessarily mean that the other allele is not expressed; it may simply not have been captured by the technique. For example, Borel et al. discovered more ASE in single fibroblast cells than in bulk data [57]. While part of this may be biological, it is currently hard to disentangle true allelic imbalance signals from the high levels of technical noise at the single-cell level [58,59•]. Kim et al. proposed using external RNA spike-ins during library preparation to distinguish technical from biological sources of variation in ASE [58]. A recent study developed a computational method to quantify allelic expression in single cells more precisely. This method leverages information from multi-mapping reads and from other cells that are in the same allelic state, which is particularly important in scenarios of low-depth sequencing [59•]. An additional consideration when performing single-cell RNA-seq with the objective of quantifying allelic expression is the region of the gene captured by the specific protocol. For example, protocols that cover the whole transcript, such as Smart-seq3 [60], are able to quantify allelic expression over a higher number of heterozygous sites than protocols that only cover the 3’ or 5’ end of the gene [55].
Despite these technical challenges, allelic expression quantification in single-cell data has already been useful for studying a number of biological processes. Several groups have quantified allelic expression on the X chromosome to study escape from X chromosome inactivation [61,62]. Others have measured allelic expression across phased SNPs within genes to infer transcriptional kinetics in mice, suggesting that these are influenced differently by promoters and enhancers [63]. In cancer, allelic information from single-cell RNA-seq data has been used to characterize intra-tumor heterogeneity and to identify key transcriptional programs in particular genetic subclones [64,65].
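The effect of incomplete mRNA capture on apparent mono-allelic expression can be illustrated with a simple probability calculation. The sketch assumes that each mRNA molecule is captured independently with the same probability, which is a deliberate simplification of real protocols, and the function names are hypothetical.

```python
def p_detect(molecules: int, capture_rate: float) -> float:
    """Probability that at least one of `molecules` transcripts is captured."""
    return 1.0 - (1.0 - capture_rate) ** molecules


def p_apparent_monoallelic(mol_a: int, mol_b: int, capture_rate: float) -> float:
    """Probability that exactly one allele is observed in a cell that truly
    expresses both, given independent capture of each molecule."""
    pa = p_detect(mol_a, capture_rate)
    pb = p_detect(mol_b, capture_rate)
    return pa * (1.0 - pb) + pb * (1.0 - pa)


# A cell with five transcripts from each allele, at capture rates spanning
# the range quoted in the text above.
for rate in (0.07, 0.32):
    print(rate, round(p_apparent_monoallelic(5, 5, rate), 3))
```

Under these assumptions, a biallelically expressed gene with few transcripts per cell frequently appears mono-allelic, which is consistent with the caution that single-cell ASE calls partly reflect technical dropout.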
Conclusions and perspectives
Recent studies have shown that allele-specific mechanisms play a significant role in cancer development. At the germline variant level, several susceptibility variants identified by cancer GWAS have been shown to have cis-regulatory effects on nearby genes, and there is evidence that this mechanism may also play a role in high-penetrance familial cancer syndromes [27]. At the somatic mutation level, a significant fraction of tumors have at least one somatically acquired oncogenic mutation displaying allelic imbalance. Furthermore, recent evidence shows that germline regulatory variants can influence tissue immune infiltration [66], response to immunotherapy [67], and gene regulation in tumor-infiltrating lymphocytes only [68]. This suggests that future studies of ASE in both cancer tissues and immune cells may reveal additional insights into mechanisms of cancer development and response to therapy, and may potentially even help to predict autoimmune side effects of treatment [69].
ASE studies allow us to gain insights into important biological processes in gene regulation, and into the potential ways in which these events contribute to triggering carcinogenesis, tumor evolution, or response to cancer therapies. However, it is critical to take into account potential technical biases when designing experiments and performing bioinformatic and statistical analyses of allelic expression, in order to avoid false positives and derive robust conclusions. This is especially challenging with single-cell experiments, where data are sparse and cells are often sequenced at low depth. Nevertheless, as single-cell experimental protocols mature and new computational methods are developed to improve the quantification and statistical analysis of allelic expression in single cells, these advances will deepen our understanding of how ASE changes dynamically between cell states, how it interacts with protein-coding variants, and, overall, how it influences transcriptional programs that lead to disease or response to therapies. We expect that these developments will increase our appreciation of the diversity of mechanisms fueling cancer development.
Conflict of interest statement
Nothing declared.
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as
• of special interest
•• of outstanding interest
Acknowledgements
The authors wish to thank Jair S. García-Sotelo, Alejandro de León, Carlos S. Flores, and Luis A. Aguilar of the Laboratorio Nacional de Visualización Científica Avanzada from the National Autonomous University of Mexico, and Alejandra Castillo, Carina Díaz, Abigayl Hernández and Eglee Lomelin of the International Laboratory for Human Genome Research, UNAM. C.D.R.-E. was supported by the Medical Research Council [MR/S01473X/1] and Programa de Apoyo a Proyectos de Investigación e Innovación Tecnológica (PAPIIT UNAM) [IA202020], the Academy of Medical Sciences through a Newton Advanced Fellowship and by the Wellcome Sanger Institute through an International Fellowship, and by CONACyT (Projects no. A1-S-30165 and A3-S-31603). P.M. was supported by the NIH Center for Translational Science Award (CTSA) grants (UL1TR002550).
References
- 1.Castel S.E., Levy-Moonshine A., Mohammadi P., Banks E., Lappalainen T. Tools and best practices for data processing in allelic expression analysis. Genome Biol. 2015;16:195. doi: 10.1186/s13059-015-0762-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2•.Castel S.E., GTEx Consortium, Aguet F., Mohammadi P., Ardlie K.G., Lappalainen T. A vast resource of allelic expression data spanning human tissues. Genome Biol. 2020;21 doi: 10.1186/s13059-020-02122-z. [DOI] [PMC free article] [PubMed] [Google Scholar]; Using a large RNA-seq resource of 54 tissues from 838 individuals, the authors identify allele-specific expression across 15,253 samples using exemplary methods. They identify ASE at both the SNP level and haplotype level, and present a new tool to estimate effect sizes of cis-regulatory variants.
- 3.Buil A., Brown A.A., Lappalainen T., Viñuela A., Davies M.N., Zheng H.-F., Richards J.B., Glass D., Small K.S., Durbin R. Gene-gene and gene-environment interactions detected by transcriptome sequence analysis in twins. Nat Genet. 2015;47:88–91. doi: 10.1038/ng.3162. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Lappalainen T., Sammeth M., Friedländer M.R., ’t Hoen P.A.C., Monlong J., Rivas M.A., Gonzàlez-Porta M., Kurbatova N., Griebel T., Ferreira P.G. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501:506–511. doi: 10.1038/nature12531. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Ferraro N.M., Strober B.J., Einson J., Abell N.S., Aguet F., Barbeira A.N., Brandt M., Bucan M., Castel S.E., Davis J.R. Transcriptomic signatures across human tissues identify functional rare genetic variation. Science. 2020;369 doi: 10.1126/science.aaz5900. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Pai A.A., Cain C.E., Mizrahi-Man O., De Leon S., Lewellen N., Veyrieras J.-B., Degner J.F., Gaffney D.J., Pickrell J.K., Stephens M. The contribution of RNA decay quantitative trait loci to inter-individual variation in steady-state gene expression levels. PLoS Genet. 2012;8 doi: 10.1371/journal.pgen.1003000. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Montgomery S.B., Sammeth M., Gutierrez-Arcelus M., Lach R.P., Ingle C., Nisbett J., Guigo R., Dermitzakis E.T. Transcriptome genetics using second generation sequencing in a Caucasian population. Nature. 2010;464:773–777. doi: 10.1038/nature08903. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Pickrell J.K., Marioni J.C., Pai A.A., Degner J.F., Engelhardt B.E., Nkadori E., Veyrieras J.-B., Stephens M., Gilad Y., Pritchard J.K. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010;464:768–772. doi: 10.1038/nature08872. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Bielski C.M., Donoghue M.T.A., Gadiya M., Hanrahan A.J., Won H.H., Chang M.T., Jonsson P., Penson A.V., Gorelick A., Harris C. Widespread selection for oncogenic mutant allele imbalance in cancer. Cancer Cell. 2018;34:852–862. doi: 10.1016/j.ccell.2018.10.003. e4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Gutierrez-Arcelus M., Ongen H., Lappalainen T., Montgomery S.B., Buil A., Yurovsky A., Bryois J., Padioleau I., Romano L., Planchon A. Tissue-specific effects of genetic and epigenetic variation on gene regulation and splicing. PLoS Genet. 2015;11 doi: 10.1371/journal.pgen.1004958. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Gutierrez-Arcelus M., Baglaenko Y., Arora J., Hannes S., Luo Y., Amariuta T., Teslovich N., Rao D.A., Ermann J., Jonsson A.H. Allele-specific expression changes dynamically during T cell activation in HLA and other autoimmune loci. Nat Genet. 2020;52:247–253. doi: 10.1038/s41588-020-0579-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Knowles D.A., Davis J.R., Edgington H., Raj A., Favé M.-J., Zhu X., Potash J.B., Weissman M.M., Shi J., Levinson D.F. Allele-specific expression reveals interactions between genetic variation and environment. Nat Methods. 2017;14:699–702. doi: 10.1038/nmeth.4298. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Moyerbrailean G.A., Richards A.L., Kurtz D., Kalita C.A., Davis G.O., Harvey C.T., Alazizi A., Watza D., Sorokin Y., Hauff N. High-throughput allele-specific expression across 250 environmental conditions. Genome Res. 2016;26:1627–1638. doi: 10.1101/gr.209759.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Kim-Hellmuth S., Aguet F., Oliva M., Muñoz-Aguirre M., Kasela S., Wucher V., Castel S.E., Hamel A.R., Viñuela A., Roberts A.L. Cell type-specific genetic regulation of gene expression across human tissues. Science. 2020;369 doi: 10.1126/science.aaz8528. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Pomerantz M.M., Ahmadiyeh N., Jia L., Herman P., Verzi M.P., Doddapaneni H., Beckwith C.A., Chan J.A., Hills A., Davis M. The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat Genet. 2009;41:882–884. doi: 10.1038/ng.403. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Wright J.B., Brown S.J., Cole M.D. Upregulation of c-MYC in cis through a large chromatin loop linked to a cancer risk-associated single-nucleotide polymorphism in colorectal cancer cells. Mol Cell Biol. 2010;30:1411–1420. doi: 10.1128/MCB.01384-09. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Wang M., Li Z., Chu H., Lv Q., Ye D., Ding Q., Xu C., Guo J., Du M., Chen J. Genome-wide association study of bladder cancer in a Chinese cohort reveals a new susceptibility locus at 5q12.3. Cancer Res. 2016;76:3277–3284. doi: 10.1158/0008-5472.CAN-15-2564. [DOI] [PubMed] [Google Scholar]
- 18.Dudek A.M., Vermeulen S.H., Kolev D., Grotenhuis A.J., Kiemeney L.A.L.M., Verhaegh G.W. Identification of an enhancer region within the TP63/LEPREL1 locus containing genetic variants associated with bladder cancer risk. Cell Oncol. 2018;41:555–568. doi: 10.1007/s13402-018-0393-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Conde L., Bracci P.M., Richardson R., Montgomery S.B., Skibola C.F. Integrating GWAS and expression data for functional characterization of disease-associated SNPs: an application to follicular lymphoma. Am J Hum Genet. 2013;92:126–130. doi: 10.1016/j.ajhg.2012.11.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Choi J., Xu M., Makowski M.M., Zhang T., Law M.H., Kovacs M.A., Granzhan A., Kim W.J., Parikh H., Gartside M. A common intronic variant of PARP1 confers melanoma risk and mediates melanocyte growth via regulation of MITF. Nat Genet. 2017;49:1326–1335. doi: 10.1038/ng.3927. [DOI] [PubMed] [Google Scholar]
- 21.Montalban G., Bonache S., Moles-Fernández A., Gisbert-Beamud A., Tenés A., Bach V., Carrasco E., López-Fernández A., Stjepanovic N., Balmaña J. Screening of BRCA1/2 deep intronic regions by targeted gene sequencing identifies the first germline BRCA1 variant causing pseudoexon activation in a patient with breast/ovarian cancer. J Med Genet. 2019;56:63–74. doi: 10.1136/jmedgenet-2018-105606. [DOI] [PubMed] [Google Scholar]
- 22.Duran-Lozano L., Montalban G., Bonache S., Moles-Fernández A., Tenés A., Castroviejo-Bermejo M., Carrasco E., López-Fernández A., Torres-Esquius S., Gadea N. Alternative transcript imbalance underlying breast cancer susceptibility in a family carrying PALB2 c.3201+5G>T. Breast Cancer Res Treat. 2019;174:543–550. doi: 10.1007/s10549-018-05094-8. [DOI] [PubMed] [Google Scholar]
- 23.Gusev A., Spisak S., Fay A.P., Carol H., Vavra K.C. Allelic imbalance reveals widespread germline-somatic regulatory differences and prioritizes risk loci in Renal Cell Carcinoma. bioRxiv. 2019 [Google Scholar]
- 24•.Hua J.T., Ahmed M., Guo H., Zhang Y., Chen S., Soares F., Lu J., Zhou S., Wang M., Li H. Risk SNP-mediated promoter-enhancer switching drives prostate cancer through lncRNA PCAT19. Cell. 2018;174:564–575. doi: 10.1016/j.cell.2018.06.014. e18. [DOI] [PubMed] [Google Scholar]; This is a comprehensive study characterizing the regulatory effects of a prostate cancer susceptibility region. By complementing findings from publicly available data with a series of functional experiments, the authors show that risk-associated variants within the region affect the binding of two transcription factors, which results in PCAT19 isoform switching via a functional transition of this region from promoter to enhancer.
- 25.Wang A.T., Shetty A., O’Connor E., Bell C., Pomerantz M.M., Freedman M.L., Gusev A. Allele-specific QTL fine mapping with PLASMA. Am J Hum Genet. 2020;106:170–187. doi: 10.1016/j.ajhg.2019.12.011.
- 26.Zou J., Hormozdiari F., Jew B., Castel S.E., Lappalainen T., Ernst J., Sul J.H., Eskin E. Leveraging allelic imbalance to refine fine-mapping for eQTL studies. PLoS Genet. 2019;15 doi: 10.1371/journal.pgen.1008481.
- 27.Buzby J.S., Williams S.A., Schaffer L., Head S.R., Nugent D.J. Allele-specific wild-type TP53 expression in the unaffected carrier parent of children with Li-Fraumeni syndrome. Cancer Genet. 2017;211:9–17. doi: 10.1016/j.cancergen.2017.01.001.
- 28•.Castel S.E., Cervera A., Mohammadi P., Aguet F., Reverter F., Wolman A., Guigo R., Iossifov I., Vasileva A., Lappalainen T. Modified penetrance of coding variants by cis-regulatory variation contributes to disease risk. Nat Genet. 2018;50:1327–1334. doi: 10.1038/s41588-018-0192-y. In this paper the authors used allelic expression analyses to investigate the interaction between regulatory and coding variation, showing that individuals who developed cancer carried overall more ‘deleterious combinations’, such as rare pathogenic coding alleles on the more highly expressed haplotype, than healthy controls.
- 29.Cancer Genome Atlas Research Network, Weinstein J.N., Collisson E.A., Mills G.B., Shaw K.R.M., Ozenberger B.A., Ellrott K., Shmulevich I., Sander C., Stuart J.M. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45:1113–1120. doi: 10.1038/ng.2764.
- 30.Walker E.J., Zhang C., Castelo-Branco P., Hawkins C., Wilson W., Zhukova N., Alon N., Novokmet A., Baskin B., Ray P. Monoallelic expression determines oncogenic progression and outcome in benign and malignant brain tumors. Cancer Res. 2012;72:636–644. doi: 10.1158/0008-5472.CAN-11-2266.
- 31.Liu Z., Dong X., Li Y. A genome-wide study of allele-specific expression in colorectal cancer. Front Genet. 2018;9:570. doi: 10.3389/fgene.2018.00570.
- 32.Mayba O., Gilbert H.N., Liu J., Haverty P.M., Jhunjhunwala S., Jiang Z., Watanabe C., Zhang Z. MBASED: allele-specific expression detection in cancer tissues and cell lines. Genome Biol. 2014;15:405. doi: 10.1186/s13059-014-0405-3.
- 33••.PCAWG Transcriptome Core Group, Calabrese C., Davidson N.R., Demircioğlu D., Fonseca N.A., He Y., Kahles A., Lehmann K.-V., Liu F., Shiraishi Y. Genomic basis for RNA alterations in cancer. Nature. 2020;578:129–136. doi: 10.1038/s41586-020-1970-0. This paper presents the most complete set of gene alterations in cancer to date, obtained by deep analysis of the tumor transcriptomes from 1188 donors. The authors identify somatic copy number alterations as the major driver of total and allele-specific gene expression, alongside cis-acting somatic single nucleotide variants, splicing events and structural variants resulting in gene fusions.
- 34•.Przytycki P.F., Singh M. Differential allele-specific expression uncovers breast cancer genes dysregulated by cis noncoding mutations. Cell Syst. 2020;10:193–203.e4. doi: 10.1016/j.cels.2020.01.002. This paper introduces a new method to detect cancer-relevant mutations in non-coding portions of the genome by identifying genes whose ASE differs significantly between cancerous and normal tissue. When applied to breast cancer samples, the method recapitulates the known large effects of copy number alterations and nonsense-mediated decay on ASE, and identifies novel, potentially functional cis-acting mutations.
- 35.Batcha A.M.N., Bamopoulos S.A., Kerbs P., Kumar A., Jurinovic V., Rothenberg-Thurley M., Ksienzyk B., Philippou-Massier J., Krebs S., Blum H. Allelic imbalance of recurrently mutated genes in acute myeloid leukaemia. Sci Rep. 2019;9:11796. doi: 10.1038/s41598-019-48167-4.
- 36.Wu T.D., Nacu S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics. 2010;26:873–881. doi: 10.1093/bioinformatics/btq057.
- 37.Garrison E., Sirén J., Novak A.M., Hickey G., Eizenga J.M., Dawson E.T., Jones W., Garg S., Markello C., Lin M.F. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol. 2018;36:875–879. doi: 10.1038/nbt.4227.
- 38.van de Geijn B., McVicker G., Gilad Y., Pritchard J.K. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat Methods. 2015;12:1061–1063. doi: 10.1038/nmeth.3582.
- 39.Aguiar V.R.C., César J., Delaneau O., Dermitzakis E.T., Meyer D. Expression estimation and eQTL mapping for HLA genes with a personalized pipeline. PLoS Genet. 2019;15 doi: 10.1371/journal.pgen.1008091.
- 40.Darby C.A., Stubbington M.J.T., Marks P.J., Martínez Barrio Á., Fiddes I.T. scHLAcount: allele-specific HLA expression from single-cell gene expression data. Bioinformatics. 2020;36:3905–3906. doi: 10.1093/bioinformatics/btaa264.
- 41.Tilgner H., Grubert F., Sharon D., Snyder M.P. Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proc Natl Acad Sci U S A. 2014;111:9869–9874. doi: 10.1073/pnas.1400447111.
- 42.Cole C., Byrne A., Adams M., Volden R., Vollmers C. Complete characterization of the human immune cell transcriptome using accurate full-length cDNA sequencing. Genome Res. 2020;30:589–601. doi: 10.1101/gr.257188.119.
- 43.Degner J.F., Marioni J.C., Pai A.A., Pickrell J.K., Nkadori E., Gilad Y., Pritchard J.K. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics. 2009;25:3207–3212. doi: 10.1093/bioinformatics/btp579.
- 44.Mohammadi P., Castel S.E., Cummings B.B., Einson J., Sousa C., Hoffman P., Donkervoort S., Jiang Z., Mohassel P., Foley A.R. Genetic regulatory variation in populations informs transcriptome analysis in rare disease. Science. 2019;366:351–356. doi: 10.1126/science.aay0256.
- 45.Kilpinen H., Waszak S.M., Gschwind A.R., Raghav S.K., Witwicki R.M., Orioli A., Migliavacca E., Wiederkehr M., Gutierrez-Arcelus M., Panousis N.I. Coordinated effects of sequence variation on DNA binding, chromatin structure, and transcription. Science. 2013;342:744–747. doi: 10.1126/science.1242463.
- 46.Waszak S.M., Kilpinen H., Gschwind A.R., Orioli A., Raghav S.K., Witwicki R.M., Migliavacca E., Yurovsky A., Lappalainen T., Hernandez N. Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data. Bioinformatics. 2014;30:165–171. doi: 10.1093/bioinformatics/btt667.
- 47.Rojano E., Seoane P., Ranea J.A.G., Perkins J.R. Regulatory variants: from detection to predicting impact. Brief Bioinform. 2019;20:1639–1654. doi: 10.1093/bib/bby039.
- 48.Pastinen T. Genome-wide allele-specific analysis: insights into regulatory variation. Nat Rev Genet. 2010;11:533–538. doi: 10.1038/nrg2815.
- 49.Petrackova A., Vasinek M., Sedlarikova L., Dyskova T., Schneiderova P., Novosad T., Papajik T., Kriegova E. Standardization of sequencing coverage depth in NGS: recommendation for detection of clonal and subclonal mutations in cancer diagnostics. Front Oncol. 2019;9:851. doi: 10.3389/fonc.2019.00851.
- 50.Shi W., Ng C.K.Y., Lim R.S., Jiang T., Kumar S., Li X., Wali V.B., Piscuoglio S., Gerstein M.B., Chagpar A.B. Reliability of whole-exome sequencing for assessing intratumor genetic heterogeneity. Cell Rep. 2018;25:1446–1457. doi: 10.1016/j.celrep.2018.10.046.
- 51.Chalmers Z.R., Connelly C.F., Fabrizio D., Gay L., Ali S.M., Ennis R., Schrock A., Campbell B., Shlien A., Chmielecki J. Analysis of 100,000 human cancer genomes reveals the landscape of tumor mutational burden. Genome Med. 2017;9:34. doi: 10.1186/s13073-017-0424-2.
- 52.Jerby-Arnon L., Shah P., Cuoco M.S., Rodman C., Su M.-J., Melms J.C., Leeson R., Kanodia A., Mei S., Lin J.-R. A cancer cell program promotes T cell exclusion and resistance to checkpoint blockade. Cell. 2018;175:984–997.e24. doi: 10.1016/j.cell.2018.09.006.
- 53.Sade-Feldman M., Yizhak K., Bjorgaard S.L., Ray J.P., de Boer C.G., Jenkins R.W., Lieb D.J., Chen J.H., Frederick D.T., Barzily-Rokni M. Defining T cell states associated with response to checkpoint immunotherapy in melanoma. Cell. 2018;175:998–1013.e20. doi: 10.1016/j.cell.2018.10.038.
- 54.Azizi E., Carr A.J., Plitas G., Cornish A.E., Konopacki C., Prabhakaran S., Nainys J., Wu K., Kiseliovas V., Setty M. Single-cell map of diverse immune phenotypes in the breast tumor microenvironment. Cell. 2018;174:1293–1308.e36. doi: 10.1016/j.cell.2018.05.060.
- 55.Zheng G.X.Y., Terry J.M., Belgrader P., Ryvkin P., Bent Z.W., Wilson R., Ziraldo S.B., Wheeler T.D., McDermott G.P., Zhu J. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017;8 doi: 10.1038/ncomms14049.
- 56.Ding J., Adiconis X., Simmons S.K., Kowalczyk M.S., Hession C.C., Marjanovic N.D., Hughes T.K., Wadsworth M.H., Burks T., Nguyen L.T. Systematic comparison of single-cell and single-nucleus RNA-sequencing methods. Nat Biotechnol. 2020;38:737–746. doi: 10.1038/s41587-020-0465-8.
- 57.Borel C., Ferreira P.G., Santoni F., Delaneau O., Fort A., Popadin K.Y., Garieri M., Falconnet E., Ribaux P., Guipponi M. Biased allelic expression in human primary fibroblast single cells. Am J Hum Genet. 2015;96:70–80. doi: 10.1016/j.ajhg.2014.12.001.
- 58.Kim J.K., Kolodziejczyk A.A., Ilicic T., Teichmann S.A., Marioni J.C. Characterizing noise structure in single-cell RNA-seq distinguishes genuine from technical stochastic allelic expression. Nat Commun. 2015;6 doi: 10.1038/ncomms9687.
- 59•.Choi K., Raghupathy N., Churchill G.A. A Bayesian mixture model for the analysis of allelic expression in single cells. Nat Commun. 2019;10 doi: 10.1038/s41467-019-13099-0. In this paper the authors developed a novel computational method to estimate ASE in single cells by leveraging information from multi-mapping reads and from other cells.
- 60.Hagemann-Jensen M., Ziegenhain C., Chen P., Ramsköld D., Hendriks G.-J., Larsson A.J.M., Faridani O.R., Sandberg R. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat Biotechnol. 2020;38:708–714. doi: 10.1038/s41587-020-0497-0.
- 61.Tukiainen T., Villani A.-C., Yen A., Rivas M.A., Marshall J.L., Satija R., Aguirre M., Gauthier L., Fleharty M., Kirby A. Landscape of X chromosome inactivation across human tissues. Nature. 2017;550:244–248. doi: 10.1038/nature24265.
- 62.Garieri M., Stamoulis G., Blanc X., Falconnet E., Ribaux P., Borel C., Santoni F., Antonarakis S.E. Extensive cellular heterogeneity of X inactivation revealed by single-cell allele-specific expression in human fibroblasts. Proc Natl Acad Sci U S A. 2018;115:13015–13020. doi: 10.1073/pnas.1806811115.
- 63.Larsson A.J.M., Johnsson P., Hagemann-Jensen M., Hartmanis L., Faridani O.R., Reinius B., Segerstolpe Å., Rivera C.M., Ren B., Sandberg R. Genomic encoding of transcriptional burst kinetics. Nature. 2019;565:251–254. doi: 10.1038/s41586-018-0836-1.
- 64.Wang L., Fan J., Francis J.M., Georghiou G., Hergert S., Li S., Gambe R., Zhou C.W., Yang C., Xiao S. Integrated single-cell genetic and transcriptional analysis suggests novel drivers of chronic lymphocytic leukemia. Genome Res. 2017;27:1300–1311. doi: 10.1101/gr.217331.116.
- 65.Fan J., Lee H.-O., Lee S., Ryu D.-E., Lee S., Xue C., Kim S.J., Kim K., Barkas N., Park P.J. Linking transcriptional and genetic tumor heterogeneity through allele analysis of single-cell RNA-seq data. Genome Res. 2018;28:1217–1227. doi: 10.1101/gr.228080.117.
- 66.Marderstein A.R., Uppal M., Verma A., Bhinder B., Tayyebi Z., Mezey J., Clark A.G., Elemento O. Demographic and genetic factors influence the abundance of infiltrating immune cells in human tissues. Nat Commun. 2020;11 doi: 10.1038/s41467-020-16097-9.
- 67.Lim Y.W., Chen-Harris H., Mayba O., Lianoglou S., Wuster A., Bhangale T., Khan Z., Mariathasan S., Daemen A., Reeder J. Germline genetic polymorphisms influence tumor gene expression and immune cell infiltration. Proc Natl Acad Sci U S A. 2018;115:E11701–E11710. doi: 10.1073/pnas.1804506115.
- 68.Zhang Y., Manjunath M., Yan J., Baur B.A., Zhang S., Roy S., Song J.S. The cancer-associated genetic variant rs3903072 modulates immune cells in the tumor microenvironment. Front Genet. 2019;10:754. doi: 10.3389/fgene.2019.00754.
- 69.Schnell A., Bod L., Madi A., Kuchroo V.K. The yin and yang of co-inhibitory receptors: toward anti-tumor immunity without autoimmunity. Cell Res. 2020;30:285–299. doi: 10.1038/s41422-020-0277-x.
- 70.Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 71.Castel S.E., Mohammadi P., Chung W.K., Shen Y., Lappalainen T. Rare variant phasing and haplotypic expression from RNA sequencing with phASER. Nat Commun. 2016;7 doi: 10.1038/ncomms12817.