Abstract
A sequence-to-expression machine learning model achieves higher accuracy by incorporating information about potential long-range interactions.
Deep learning models have shown great promise in their ability to predict, solely on the basis of DNA sequence, the complex patterns of gene expression, chromatin accessibility, and histone modifications that arise in diverse cell types. However, current state-of-the-art predictive methods cannot model interactions with distal regulatory elements that lie more than a few tens of kilobases away from the locus of interest. In this issue of Nature Methods, Avsec et al.1 address this problem by using techniques developed in the field of natural language processing. They have trained a deep learning model, called Enformer, to predict thousands of epigenetic and transcriptional profiles from hundreds of human and mouse cell types using only DNA sequence as input, incorporating information up to 100 kb on either side of the target locus.
Each cell in a single organism contains essentially the same genomic sequence, yet those cells can exhibit dramatic differences in gene expression across cell types. These differences in transcriptional patterns are regulated by epigenetic factors such as DNA accessibility and histone modifications. Evidence suggests that many disease-associated noncoding genetic variants influence phenotype via changes in gene expression.2 Thus, a predictive model that is capable of annotating the influence of every nucleotide, as well as its variants, on these regulatory attributes would be a major step toward achieving the goals of personalized genomic medicine.
Accordingly, predicting gene expression in mammalian genomes has been a longstanding goal. Early models aimed to predict gene expression by relying on relevant yet indirect information, such as ChIP-seq measurements of transcription factors3 or histone modification levels.4 More recently, deep convolutional neural network models such as Basset5 have been brought to bear on the fundamental problem of predicting chromatin accessibility directly from local DNA sequence. Basset was subsequently improved upon by the Basenji2 and ExPecto6 models to model gene expression and to capture regulatory interactions up to 32 kb and 20 kb, respectively, away from the locus of interest. To achieve this receptive field, Basenji used a technique known as “dilated convolution layers,” and ExPecto used a two-stage process of fitting sequence to expression while making use of epigenomic data.
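The receptive-field advantage of dilated convolutions can be illustrated with a small calculation (the kernel width and layer counts below are illustrative, not Basenji's actual hyperparameters): when the dilation rate doubles at each layer, the receptive field grows exponentially with depth, rather than linearly as with standard convolutions.

```python
def receptive_field(n_layers, kernel=3):
    """Receptive field (in input positions) after stacking dilated
    convolutions whose dilation rate doubles at each layer: 1, 2, 4, ..."""
    rf = 1
    for layer in range(n_layers):
        dilation = 2 ** layer
        rf += (kernel - 1) * dilation  # each layer widens the field by (k-1)*dilation
    return rf

# Exponential growth: 11 dilated layers with width-3 kernels already span
# thousands of positions, versus 2 * n_layers + 1 for undilated convolutions.
print([receptive_field(n) for n in (1, 4, 8, 11)])  # [3, 31, 511, 4095]
```

With width-3 kernels this works out to 2^(n+1) − 1 positions after n layers, which is why a modest stack of dilated layers can reach tens of kilobases.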
The Enformer model1 pushes even further in this direction, achieving more accurate predictions through the use of a deep learning architecture that is able to integrate information from up to 100 kb away in the genome (Figure 1). This increase in the model’s receptive field is achieved by using a model architecture, called a “transformer,” that has achieved significant breakthroughs in natural language processing.7 The building blocks of a transformer model are “self-attention layers,” which transform each position in the input sequence into a latent representation by computing a weighted sum across the representations of all other positions in the sequence. The weight between any two positions, referred to as the “attention weight,” is computed based on the degree of similarity between those positions’ corresponding latent representations. Because each position is directly associated with all other positions in the sequence, the transformer architecture naturally allows information to flow between distal elements. As a consequence, self-attention layers can identify the effects of sparsely distributed distal elements much more easily than the convolutions used in a typical deep neural network, which have local receptive fields and hence require many successive layers to reach distal elements.
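The self-attention computation described above can be sketched in a few lines of NumPy. This is a minimal single-head example assuming scaled dot-product attention as in the original transformer paper;7 the projection matrices and dimensions are arbitrary placeholders, not Enformer's actual parameters.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.
    X: (L, d) array of latent representations, one row per sequence position."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Similarity between every pair of positions, scaled by sqrt(d)
    scores = Q @ K.T / np.sqrt(K.shape[1])
    # Softmax over each row yields the attention weights
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    # Each output position is a weighted sum over ALL input positions
    return weights @ V

rng = np.random.default_rng(0)
L, d = 8, 4                                  # toy sequence length and embedding size
X = rng.normal(size=(L, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (8, 4)
```

Note that the weighted sum runs over the entire sequence, so information can flow between the first and last positions in a single layer; this is the property that lets a transformer capture distal regulatory interactions directly.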
Figure 1:

Enformer uses a transformer deep learning architecture that is able to integrate information from up to 100 kb away in the genome. Transformers use an attention mechanism that associates each locus with all other positions in the sequence, enabling information to flow between distal elements and thus producing more accurate predictions. The model outputs thousands of predicted tracks, including transcription factor and histone ChIP-seq, DNase-seq, ATAC-seq, and CAGE data.
Studies of 3D genome architecture and its relationship with gene expression have revealed many instances in which regulatory elements, including enhancers, silencers, and insulators, can influence gene expression from far greater than 32 kb away.8 Accordingly, Avsec et al. show convincingly that Enformer substantially outperforms the previous state-of-the-art model in predicting gene expression from cap-analysis gene expression (CAGE) experiments, where tissue-specific gene expression strongly depends on distal elements. Furthermore, in making its predictions, Enformer’s attention mechanism focuses on cell-type-specific regulatory elements, including promoters, enhancers, and insulators. This mechanism can be exploited to accurately identify enhancer-promoter interactions. Finally, Avsec et al. show that Enformer predicts more accurately than previous approaches whether a natural variant or CRISPR-perturbed enhancer will cause a significant expression change.
An important caveat for all of these models is that they cannot make predictions outside of the cell types they are trained on. Thus, their primary utility lies in identifying regulatory elements and other potentially functionally important sequence patterns and in predicting the impacts of genetic variation. A new modeling approach would be required to make predictions in novel cell types.
This work points to several other promising directions for future research. First, it has been demonstrated that highly structured 3D DNA contacts, which greatly influence long-range gene regulation, are predictable from the DNA sequence.9 Accordingly, Enformer could potentially improve its modeling of insulators and distal regulation by integrating such information. Second, it would be interesting to apply additional interpretation methods to Enformer, with the goal of better understanding the quantitative effects—additive, cooperative, or competitive—of sequence syntax within predicted enhancer-promoter interactions across different cell types and different species. Third, in addition to in silico validation and elucidation of never-before-seen variants of interest, one could imagine using models like Enformer in reverse, thereby enabling the design of synthetic sequences that exhibit desired characteristics with cell-type specificity.10
Acknowledgments:
This work was supported by NIH award U01 HG009395.
Footnotes
Competing interests: The authors declare no competing interests.
References
- [1].Avsec Z, Agarwal V, Visentin D, Ledsam JR, Grabska-Barwinska A, Taylor KR, Assael Y, Jumper J, Kohli P, and Kelley DR. Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods, in press, 2021. [DOI] [PMC free article] [PubMed]
- [2].Kelley DR, Reshef YA, Bileschi M, Belanger D, McLean CY, and Snoek J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Research, 28(5):739–750, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Ouyang Z, Zhou Q, and Wong HW. ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells. Proceedings of the National Academy of Sciences of the United States of America, 106:21521–21526, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [4].Karlic R, Chung HR, Lasserre J, Vlahovicek K, and Vingron M. Histone modification levels are predictive for gene expression. Proceedings of the National Academy of Sciences of the United States of America, 107(7):2926–2931, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Kelley DR, Snoek J, and Rinn JL. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Research, 26(7):990–999, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Zhou J, Theesfeld CL, Yao K, Chen KM, Wong AK, and Troyanskaya OG. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nature Genetics, 50(8):1171–1179, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [7].Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, and Polosukhin I. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000–6010, 2017.
- [8].Gasperini M, Tome JM, and Shendure J. Towards a comprehensive catalogue of validated and target-linked human enhancers. Nature Reviews Genetics, 21(5):292–310, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].Fudenberg G, Kelley DR, and Pollard KS. Predicting 3D folding from DNA sequence with Akita. Nature Methods, 17(11):1111–1117, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Schreiber JM, Lu YY, and Noble WS. Ledidi: Designing genome edits that induce functional activity. ICML Workshop on Computational Biology, 2020.
