[Preprint]. 2024 Feb 3:2023.07.15.549134. Originally published 2023 Jul 18. [Version 2] doi: 10.1101/2023.07.15.549134

Fig. 1. Overview of the EpiGePT model for predicting multiple epigenomic signals.


The EpiGePT model consists of four modules: the Sequence module, the TF module, the Transformer module, and the Multi-task prediction module. The Sequence module applies multiple convolutional layers to the one-hot encoded DNA sequence input. The input sequence spans 1000 genomic bins of 128 bp (128 kbp in total) for the prediction of multiple signals, and 50 bins of 200 bp (10 kbp in total) for the prediction of the DNase signal alone. The TF module encodes the binding status and expression of 711 transcription factors. The Transformer module consists of a stack of consecutive transformer encoders, while the Multi-task prediction module is a fully connected layer. Additionally, the EpiGePT framework includes an optional knowledge guidance module that incorporates three-dimensional chromatin interaction data into the attention layer, which is intended to improve the model's interpretability with respect to regulatory mechanisms.
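
The input dimensions described in the caption can be traced with a short shape walkthrough. The following is a minimal sketch, not the authors' implementation: the names `InputConfig`, `one_hot`, and `module_shapes` are hypothetical, as are the parameters `seq_channels` (width of the convolutional output) and `n_signals` (number of predicted signals). It also assumes that the TF module yields one 711-wide feature vector per bin and that these are concatenated with the sequence features before the Transformer module; the caption does not specify how the two are combined.

```python
from dataclasses import dataclass

N_TFS = 711  # number of transcription factors stated in the caption


@dataclass(frozen=True)
class InputConfig:
    """Binning of the input region (hypothetical helper type)."""
    n_bins: int
    bin_size_bp: int

    @property
    def sequence_length_bp(self) -> int:
        # Total input length is the number of bins times the bin width.
        return self.n_bins * self.bin_size_bp


# The two configurations given in the caption.
MULTI_SIGNAL = InputConfig(n_bins=1000, bin_size_bp=128)
DNASE_ONLY = InputConfig(n_bins=50, bin_size_bp=200)


def one_hot(seq):
    """One-hot encode a DNA string in A, C, G, T order; other bases map to zeros."""
    table = {
        "A": (1, 0, 0, 0),
        "C": (0, 1, 0, 0),
        "G": (0, 0, 1, 0),
        "T": (0, 0, 0, 1),
    }
    return [table.get(base, (0, 0, 0, 0)) for base in seq.upper()]


def module_shapes(cfg, seq_channels, n_signals):
    """Trace (rows, columns) shapes through the four modules.

    Assumptions: seq_channels and n_signals are free parameters, and the
    per-bin TF features are concatenated with the sequence features.
    """
    return {
        "one_hot_input": (cfg.sequence_length_bp, 4),
        "sequence_module": (cfg.n_bins, seq_channels),
        "tf_module": (cfg.n_bins, N_TFS),
        "transformer_input": (cfg.n_bins, seq_channels + N_TFS),
        "multi_task_output": (cfg.n_bins, n_signals),
    }
```

Calling `module_shapes(MULTI_SIGNAL, seq_channels=256, n_signals=8)` with these illustrative values shows the sequence axis shrinking from 128,000 bases to 1000 bins in the Sequence module, after which every later module operates per bin.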