Briefings in Bioinformatics
. 2024 Feb 21;25(2):bbad539. doi: 10.1093/bib/bbad539

Lactylation prediction models based on protein sequence and structural feature fusion

Ye-Hong Yang 1, Jun-Tao Yang 2,3,, Jiang-Feng Liu 4,5,
PMCID: PMC10939394  PMID: 38385873

Abstract

Lysine lactylation (Kla) is a newly discovered posttranslational modification that is involved in important life activities, such as glycolysis-related cell function, macrophage polarization and nervous system regulation, and has received widespread attention due to the Warburg effect in tumor cells. In this work, we first design a natural language processing method to automatically extract the 3D structural features of Kla sites, avoiding potential biases caused by manually designed structural features. Then, we establish two Kla prediction frameworks, the attention-based feature fusion Kla model (ABFF-Kla) and the embedding-based feature fusion Kla model (EBFF-Kla), which integrate sequence features and structure features through an attention layer and an embedding layer, respectively. The results indicate that ABFF-Kla and EBFF-Kla, which fuse features from protein sequences and spatial structures, have better predictive performance than models that use only sequence features. Our work provides an approach for the automatic extraction of protein structural features, as well as a flexible framework for Kla prediction. The source code and the training data of ABFF-Kla and EBFF-Kla are publicly deposited at: https://github.com/ispotato/Lactylation_model.

Keywords: lysine lactylation, automatic feature extraction, feature fusion, deep learning, residue contact map

INTRODUCTION

Lactate is the primary carbon-containing metabolite in the cell glycolysis pathway, and its biological function has been widely studied due to the existence of the Warburg effect in tumor cells [1]. Lysine lactylation (Kla) is a newly discovered posttranslational modification (PTM) in both histone and nonhistone proteins [2, 3] that has been reported to be involved in important life activities, such as glycolysis-related cell function, macrophage polarization and nervous system regulation [4, 5], and plays a key role in neuroinflammation and tumorigenesis [6, 7]. To understand the molecular mechanism and immune regulation pathway of lactylation, high-resolution liquid chromatography–tandem mass spectrometry (LC–MS/MS) has been used to identify Kla sites in human tissue and cell lines [8–11] and other species [12–16]. However, as the experiment is time-consuming and costly, and modified proteins are not abundant, it is difficult to widely achieve the global profiling of lactylation in different species by current experimental strategies.

Some computational methods have been used to predict Kla sites. FSL-Kla, a few-shot learning-based Kla prediction model, was built on 343 Kla sites of 191 proteins collected from Homo sapiens, Mus musculus and Botryotinia fuckeliana. In its feature coding scheme, FSL-Kla adopts 11 manually designed features, including the amino acid composition and physicochemical properties of the sequence, as well as structural features such as the accessible surface area, protein secondary structure (SS) and backbone torsion angle [17]. However, manually designed features are limited in their ability to express the deeper and more complex biological information that may be related to PTMs [18]. Moreover, features designed from prior knowledge also prevent deep learning models from exploring more suitable features in new fields. To solve this problem, subsequent models have used embedding layers to automatically extract features from amino acid sequences: DeepKla combines a convolutional neural network, bidirectional gated recurrent units and an attention mechanism to predict Kla sites in rice [19]; Auto-Kla predicts Kla sites in gastric cancer cells based on automated machine learning [20]. These studies have explored Kla site prediction, but there are still some areas for improvement. First, these models focus on features extracted from protein primary sequences [17, 19, 20], but in actual 3D folded structures, residues adjacent to the Kla site, and those that affect its modification, may be far apart from it in the sequence. Therefore, using only sequence features may miss part of the information [21]. In addition to the FSL-Kla model, other models that integrate structural information for PTM analysis have been introduced. Kamacioglu et al. [22] selected 23 structure-related parameters obtained from the Protein Data Bank (PDB) and UniProt [23] to annotate and reveal functional phosphorylation sites [24]. Zhu et al. [25] presented two deep learning models incorporating sequence and structural features, including absolute solvent accessibility (ACC) and SS, to elucidate the molecular basis and underlying functional landscape of PTMs.

However, the structural features used in the above studies were still designed manually. Because we currently lack a precise understanding of which structural features play an important role in the process of lactylation, it is difficult to manually design appropriate features for Kla prediction. Moreover, because protein structure information is more complex to represent than primary sequences, structural features cannot be extracted directly with natural language methods in the way sequence features can. Therefore, there is currently no automatic feature extraction method based on protein spatial structure for Kla site prediction. Second, current Kla prediction models use relatively small datasets, and insufficient samples can give rise to overfitting and affect the accuracy of deep learning models.

Fortunately, profiting from breakthroughs in deep learning algorithms, we can obtain protein structural information of near-experimental accuracy for almost the complete human proteome from the AlphaFold protein structure database (https://AlphaFold.ebi.ac.uk) [26, 27], enabling us to conveniently extract protein 3D structural features directly from PDB files and further reveal the structural background of Kla in the human proteome [21]. Meanwhile, recently identified Kla datasets of hepatocellular carcinoma and human lung tissue obtained through LC–MS/MS also provide more sufficient training samples for human Kla prediction [9, 10].

In this work, we established two different neural network frameworks based on natural language processing (NLP) to fuse features from protein sequences and 3D structures for lactylation prediction while automatically extracting information from protein sequences and 3D structures, avoiding potential biases in manual feature design. Specifically, to process 3D structural features using the NLP method, we selected the nearest contact residue set of the target lysine site from the residue contact map generated by the protein structure and arranged these residues according to their order in the protein sequence as the input segments. Through this approach, we can not only easily utilize transformer models with attention mechanisms to extract important information hidden in protein structures but also fuse protein structure features with sequence features in different ways, increasing the flexibility of the lactylation prediction model. After constructing the benchmark dataset of human Kla sites from the global lactylated proteins [8–11], we compared our two feature fusion models with the other Kla prediction models and achieved good results, indicating the effectiveness of our methods. To our knowledge, this is the first work that utilizes an NLP model to extract protein structure features for Kla site prediction, providing a framework for integrating information from protein structures into the prediction of other kinds of PTMs. The architecture of the feature fusion models is shown in Figure 1.

Figure 1.

Figure 1

The framework of the feature fusion Kla prediction models. (A) The framework of the ABFF-Kla. (B) The framework of the EBFF-Kla.

MATERIALS AND METHODS

Benchmark dataset

Sequence features

In this study, we collected human lactylation sites identified by LC–MS/MS experiments as the sample set [8–11]. From the larger datasets, which include 10 749 Kla sites of 2856 proteins, we selected 80% for model training and testing and 20% for validation [8–10]. A smaller independent experimental dataset, including 755 Kla sites on 548 proteins, served as another independent validation set [11]. Then, we downloaded these protein sequences from the UniProt database [23] and used CD-HIT (a program for clustering large protein databases at different sequence identity levels) to remove redundant sequences [28]. Referring to previous research and our own experience [29], we selected 0.7 as the similarity threshold for CD-HIT: a high threshold cannot clear similar redundant sequences, while a low threshold aggregates different sequences into one cluster, resulting in a loss of sequence diversity; an unsuitable threshold will affect the distribution of samples and the model performance. These sequences were truncated into 35-residue-long peptides with a lysine located at the center, and a segment was defined as a positive sample if its central lysine was a Kla site; otherwise, it was defined as a negative sample [19, 20]. Finally, we obtained 9560 positive nonredundant Kla sites. Meanwhile, to balance the positive and negative samples, 9560 nonredundant normal lysine sites were randomly selected as negative K sites. To determine the optimal range of peptide length, in addition to the 35-residue samples, we also reduced and increased the upstream and downstream lengths around the central lysine site by two residue positions, yielding sample lengths of 31 and 39, respectively (see Supplementary Materials available online at http://bib.oxfordjournals.org).
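The peptide-construction step above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the window length (35) matches the text, while the "X" padding character used so that lysines near the protein termini still yield full-length windows is an assumption for illustration.

```python
def extract_peptides(sequence, window=35):
    """Return (1-based position, peptide) pairs for every lysine in `sequence`.

    Each peptide is `window` residues long with the lysine at the center;
    terminal sites are padded with "X" (an illustrative convention).
    """
    half = window // 2
    padded = "X" * half + sequence + "X" * half  # pad so terminal K sites fit
    peptides = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            peptides.append((i + 1, padded[i:i + window]))
    return peptides
```

A segment produced this way is labeled positive if its central lysine is a known Kla site and negative otherwise, exactly as described in the text.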

Structure feature

In this work, we automatically extracted features around Kla sites using an NLP method, including not only upstream and downstream residues in the primary sequence but also contact residues in the spatial structure [21, 30]. In protein structure analysis, two residues are considered to be in contact if the distance between their Cα atoms is below a certain threshold, usually between 8 Å and 10 Å [31–33]. The distance threshold affects the number of contacts in the protein: the lower the threshold, the fewer contact residue pairs are obtained. To construct the protein contact maps in this work, we wished to preserve as much contact residue information as possible around the target lysine site for subsequent calculations, so we adopted the upper limit of 10 Å as the threshold [31]. After obtaining the above nonredundant sequences, we downloaded the PDB files of these proteins from the AlphaFold database [26] and constructed their contact maps from the three-dimensional atomic coordinates in the corresponding PDB files [31]. For each target lysine residue in the nonredundant positive and negative datasets, we selected the nearest contact residue set of the target K site from the protein contact map and then arranged these contact residues into a new 35-residue-long segment based on their order in the protein sequence, with the target K site in the center of the segment; we named this the contact segment, as shown in Figure 1A. Finally, we obtained 8884 positive and 8884 negative contact segment samples. As with the sequence features, to determine the optimal segment length, we also tested segments of 31, 35 and 39 residues (Supplementary Materials available online at http://bib.oxfordjournals.org).
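The contact-map and contact-segment construction can be sketched as follows, assuming the Cα coordinates have already been parsed from a PDB file into an (n, 3) array. The 10 Å threshold and 35-residue segment length come from the text; splitting the nearest contacts into upstream/downstream halves and padding short sides with "X" are illustrative assumptions about how the K site is kept centered.

```python
import numpy as np

def contact_map(ca_coords, threshold=10.0):
    """Boolean n x n map: True where the Ca-Ca distance is below `threshold` (A)."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    return dist < threshold

def contact_segment(sequence, ca_coords, k_index, length=35, threshold=10.0):
    """Nearest contact residues of the target K, kept in sequence order,
    arranged into a `length`-residue segment with the K at the center."""
    half = length // 2
    dist = np.linalg.norm(ca_coords - ca_coords[k_index], axis=1)
    contacts = [i for i in range(len(sequence))
                if i != k_index and dist[i] < threshold]
    # nearest contacts on each side of the K site, re-sorted by sequence order
    up = sorted(sorted((i for i in contacts if i < k_index),
                       key=lambda i: dist[i])[:half])
    down = sorted(sorted((i for i in contacts if i > k_index),
                         key=lambda i: dist[i])[:half])
    left = "".join(sequence[i] for i in up).rjust(half, "X")
    right = "".join(sequence[i] for i in down).ljust(half, "X")
    return left + sequence[k_index] + right
```

With consecutive Cα atoms roughly 3.8 Å apart, a 10 Å threshold captures near sequence neighbors as well as residues brought close by the fold, which is exactly the extra information the contact segment is meant to carry.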

Feature fusion network for Kla prediction

Attention-based feature fusion model

Attention mechanisms have been widely used in NLP tasks and are based on the hidden states of the source and target sides to obtain the dependency relationship between each word on the source side and the target side, thus focusing on the important information that is screened out [34]. Self-attention further reduces the dependence on external information and is better at capturing the internal correlations of features, especially by calculating the mutual influence between words, thus solving the problem of long-distance dependence [35]. Using deep learning models integrating attention methods to mine semantic associations and syntactic features between residues in protein sequences, various types of PTMs can be predicted, and good performance has been achieved [18–20, 36].

In this work, we designed a dual-channel neural network integrating two self-attention submodules, which focus on the context of protein sequences and 3D structures that affects Kla sites. For each submodule, the 35-residue-long peptides and contact segments were regarded as sentences in the text corpus and fed through the token embedding layer into their self-attention layer. Next, the combination of a convolutional layer and a long short-term memory (LSTM) layer can more accurately obtain the most important semantic features and positional information [37, 38], as shown in Figure 1A. The self-attention layer and the output sequence features of the sequence submodule are given in Formulas 1 and 2, and those of the structure submodule in Formulas 3 and 4, respectively. Finally, the outputs of the two submodules were merged through a new self-attention layer, thereby incorporating into the Kla prediction the features of residues that may be far from the target K site in the protein sequence but can affect the target K site in the spatial structure. The calculation of this process can be found in Formulas 5 and 6. We refer to this model as ABFF-Kla, and its architecture is shown in Figure 1A.

[Formulas (1)–(6), rendered as images in the original: (1)–(2) define the self-attention layer and output features of the sequence submodule, (3)–(4) those of the structure submodule, and (5)–(6) the fusion self-attention layer.]
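Since the formulas above render as images, the following numpy sketch shows the standard scaled dot-product self-attention, softmax(QKᵀ/√d_k)V, which we assume is the formulation underlying the self-attention layers here; the projection matrices and dimensions are illustrative.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token matrix X of shape (L, d):
    Q = X Wq, K = X Wk, V = X Wv, output = softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

In the ABFF-Kla setting, X would be the embedded 35-token peptide (or contact segment), so each row of the output mixes information from every other residue in the window, weighted by learned relevance.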

Embedding-based feature fusion model

In the ABFF-Kla model, the sequence features and structure features are fused through their respective self-attention submodules. In this section, we instead fuse the features through the embedding module; this model is named EBFF-Kla. In the EBFF-Kla model, the input layer is the sum of two embeddings, a sequence embedding and a contact segment embedding, which represent the upstream and downstream residues of the target K site in the sequence and the nearest contact residues of the target K site in the spatial structure, respectively [30]. The embeddings are described as follows:

[Formulas (7)–(8), rendered as images in the original: they define the sequence embedding and the contact segment embedding whose sum forms the input layer.]

Then, a self-attention layer combined with a bidirectional gated recurrent unit (Bi-GRU) layer deeply extracts the most important semantic and positional features of the residues, in either the primary sequence or the 3D structure, that affect the lactylation of target lysine sites, as shown in Figure 1B.
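The EBFF-Kla input layer described above can be sketched as an element-wise sum of two embedding lookups. The 21-letter vocabulary (20 amino acids plus an "X" padding token) and the embedding size are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Illustrative vocabulary: 20 amino acids + "X" for padding.
AA = "ACDEFGHIKLMNPQRSTVWYX"
IDX = {a: i for i, a in enumerate(AA)}

def fused_input(seq_peptide, contact_seg, seq_table, contact_table):
    """EBFF-Kla-style input: sum of the sequence-peptide embedding and the
    contact-segment embedding, position by position (Formulas 7 and 8).
    `seq_table` and `contact_table` are (vocab, d) embedding matrices."""
    seq_ids = np.array([IDX[a] for a in seq_peptide])
    con_ids = np.array([IDX[a] for a in contact_seg])
    return seq_table[seq_ids] + contact_table[con_ids]  # shape (L, d)
```

The summed (L, d) matrix is then what the self-attention and Bi-GRU layers consume, so both sequence context and structural context reach the model through a single input channel.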

Model training and evaluation

When training the ABFF-Kla model, we adopted the idea of transfer learning. First, we pretrained the sequence-based and structure-based submodules separately and transferred their parameters to ABFF-Kla for initialization. Then, the entire model was trained by simultaneously inputting the sequence peptides and the contact segments. To evaluate the performance of ABFF-Kla and EBFF-Kla, we calculated four metrics: specificity (SPE), sensitivity (SEN), accuracy (ACC) and the Matthews correlation coefficient (MCC). The four metrics are defined as follows:

SPE = TN / (TN + FP)
SEN = TP / (TP + FN)
ACC = (TP + TN) / (TP + TN + FP + FN)
MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

where TP, FP, TN and FN are the numbers of true-positive, false-positive, true-negative and false-negative samples, respectively. We also calculated the area under the receiver operating characteristic curve (AUC) as an additional evaluation measure.
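The four metrics follow directly from the confusion-matrix counts; a minimal implementation:

```python
import math

def kla_metrics(tp, fp, tn, fn):
    """SPE, SEN, ACC and MCC from confusion-matrix counts
    (assumes no denominator is zero)."""
    spe = tn / (tn + fp)
    sen = tp / (tp + fn)
    acc = (tp + tn) / (tp + fp + tn + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return spe, sen, acc, mcc
```

MCC is the most informative single number here because, unlike ACC, it stays near 0 for a model that ignores one class, which matters even on a class-balanced benchmark like this one.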

RESULTS

Feature fusion model performance

To compare the performance of the above feature fusion models for Kla prediction, we randomly selected 80% of the human Kla benchmark dataset as the training and testing set and used the remaining 20% for validation. After five random partitions were conducted, the mean accuracy and MCC of 10-fold cross-validation showed that the ABFF-Kla model outperforms EBFF-Kla, indicating that attention-based feature fusion provides richer information than embedding-based feature fusion in Kla prediction (Figure 2). In addition, to optimize the two models and find the recurrent neural network (RNN) layer type more suitable for each of them, we swapped and compared the LSTM layer of ABFF-Kla and the Bi-GRU layer of EBFF-Kla, as shown in Figure 2A and B. The results indicated that replacing the LSTM layer with the Bi-GRU layer improved the Kla prediction performance of EBFF-Kla (Figure 2). The reason is that EBFF-Kla directly overlays complex features through the embedding module, while the GRU, with a lower complexity than the LSTM, can effectively capture the semantic associations of long sequences while suppressing gradient vanishing or exploding [35, 36].

Figure 2.

Figure 2

The accuracy and MCC of the feature fusion Kla prediction models. (A) The accuracy of the sequence sub-module, the structure sub-module, ABFF-Kla and EBFF-Kla. (B) The MCC of the sequence sub-module, the structure sub-module, ABFF-Kla and EBFF-Kla.

To compare the contributions of protein sequences and contact segments to Kla prediction, we trained the self-attention submodules of the ABFF-Kla model on the 35-residue-long peptides and contact segments, respectively. From Figure 2A and B, we found that the prediction performance of the submodule using protein sequences was higher than that of the submodule using contact segments but lower than that of the feature fusion models, illustrating that protein structure features can provide additional information beyond sequence features for Kla prediction.

Comparison of the feature fusion models with different sample lengths

To determine the optimal sample length for our feature fusion models, we conducted five rounds of random partitioning on the human Kla benchmark dataset using samples of 31, 35 and 39 residues, and performed independent validation in each round. The mean accuracy and MCC of 10-fold cross-validation on the test set, validation set and independent validation set showed that both the sequence sub-module and the structure sub-module performed better on 35-residue samples than on the other two lengths (Figure 3A and B). Especially on the independent validation sets, this trend is more pronounced for the structure sub-module than for the sequence sub-module (Figure 3A and B). Accordingly, the trend is also stronger for ABFF-Kla, which fuses the two sub-modules, than for EBFF-Kla, which performs feature fusion directly in the embedding layer (Figure 3C and D). This result indicates that although longer samples provide the model with more sequence and structural information, they also introduce more interfering factors, thereby affecting the generalization ability of the models.

Figure 3.

Figure 3

Comparison of the feature fusion models with different sample lengths. (A) The ACC and MCC of the sequence sub-module. (B) The ACC and MCC of the structure sub-module. (C) The ACC and MCC of ABFF-Kla. (D) The ACC and MCC of EBFF-Kla.

Comparison of the feature fusion models with other Kla prediction models

Then, we compared our feature fusion models with other Kla prediction models, DeepKla [19] and Auto-Kla [20]. We again conducted five rounds of random partitioning on the human Kla benchmark dataset and performed independent validation in each round using the independent dataset. The mean ACC, MCC and AUC of 10-fold cross-validation indicated that ABFF-Kla (LSTM) has the highest performance, followed by EBFF-Kla (Bi-GRU), both higher than Auto-Kla and DeepKla, as shown in Figure 4. From Figure 4A–C, it can be observed that the two feature fusion models have higher specificity and sensitivity than the other models, which illustrates that the framework of attention-based feature fusion leads to better predictive performance. Among these models, DeepKla and Auto-Kla use only features from protein sequences, while ABFF-Kla and EBFF-Kla fuse features from protein sequences and spatial structures, indicating that contact segments can provide additional information that affects the lactylation of target lysine sites.

Figure 4.

Figure 4

Comparison of the feature fusion Kla prediction models with other Kla models. (A) The accuracy of the different Kla models. (B) The MCC of the different Kla models. (C) The AUC of the different Kla models. (D) The mean ACC, MCC and AUC values of the different Kla models.

Prediction of lactylation across the human proteome

In this section, we used our feature fusion models to predict lactylation for all proteins in the human proteome. First, we downloaded the Swiss-Prot human protein sequences, a total of 50 302 proteins, from the UniProt database (https://www.uniprot.org/uniprotkb) [23]. Setting aside the 2991 lactylated proteins already identified by MS [8–11] (Supplementary Table S1, available online at http://bib.oxfordjournals.org), we used the appropriate model to predict Kla sites in the remaining proteins: (1) for lysine sites with both sequence and structure features, we used the ABFF-Kla model; these results are shown in Supplementary Table S2, available online at http://bib.oxfordjournals.org. (2) For lysine sites with only sequence features and no structure information, we used the sequence sub-module; these results are listed in Supplementary Table S3, available online at http://bib.oxfordjournals.org. (3) Specifically, for lysine sites that cannot obtain sufficient sequence features because of their proximity to the ends of the protein sequence but that do have structure features, we can use the structure sub-module for prediction, which is an advantage of our feature fusion model over sequence-only prediction models. The prediction results for this part are shown in Supplementary Table S4, available online at http://bib.oxfordjournals.org, and we conducted GO enrichment analysis on these proteins, as shown in Figure 5.
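The three-way dispatch rule above reduces to a small decision function. This is a sketch of the selection logic only; the returned strings are placeholders for the corresponding trained models.

```python
def choose_model(has_sequence_features, has_structure_features):
    """Pick which predictor scores a lysine site, following the paper's
    proteome-wide prediction scheme: fusion model when both feature types
    exist, otherwise fall back to the single-feature sub-module."""
    if has_sequence_features and has_structure_features:
        return "ABFF-Kla"                  # full feature-fusion model
    if has_sequence_features:
        return "sequence sub-module"       # no AlphaFold structure available
    if has_structure_features:
        return "structure sub-module"      # e.g. K near the sequence termini
    return None                            # site cannot be scored
```

The third branch is what sequence-only predictors cannot offer: a lysine too close to a terminus to fill a 35-residue window can still be scored from its structural contacts.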

Figure 5.

Figure 5

The GO enrichment analysis of the proteins with Kla sites predicted by the structure sub-module.

CONCLUSION AND DISCUSSION

In this work, we first built an NLP method to automatically extract the 3D structural features of proteins, avoiding the potential biases caused by manually designed structural features in previous studies. Then, we designed two neural network frameworks, the attention-based feature fusion model ABFF-Kla and the embedding-based feature fusion model EBFF-Kla, to integrate the context of protein sequences and 3D structures that affects Kla sites, which improved the performance of Kla prediction. Our work provides an approach for the automatic extraction of protein structural features, as well as a flexible framework for the integration of different types of protein features, suitable not only for lactylation prediction but also for other PTM prediction studies.

However, there are still some shortcomings in this research. We automatically extracted the contact residues adjacent to the target lysine site through contact maps and generated contact segments as input features, reducing the information in a 3D space to a 2D plane, which inevitably leads to the loss of some structural information. How to better utilize the advantages of automatic feature extraction in deep learning models to extract richer and more important structural features for the prediction of lactylation is our next research direction.

Key Points

  • Generated contact residue segments of lactylated lysine sites from the protein structures predicted by AlphaFold2.

  • Used a natural language processing method to automatically extract the 3D structural features of proteins, avoiding manual feature design.

  • Established two lactylation prediction frameworks, ABFF-Kla and EBFF-Kla, to fuse the sequence features and the structure features through an attention layer and an embedding layer, respectively.

Supplementary Material

TableS1_KlaProteins_identified_MS_bbad539
TableS2_proteins_ABFF-Kla_prediction_bbad539
TableS3_proteins_Sequence-submodule-prediction_bbad539
TableS4_proteins_Structure-submodule-prediction_bbad539
positive_seuqence_peptide_bbad539
positive_contact_segment_bbad539
negative_sequence_peptide_bbad539
negative_contact_segment_bbad539

ACKNOWLEDGEMENTS

This work was supported by Biomedical High Performance Computing Platform, Chinese Academy of Medical Sciences.

Author Biographies

Ye-Hong Yang is a master's student at the State Key Laboratory of Common Mechanism Research for Major Diseases. Her research interests include machine learning and bioinformatics.

Jun-Tao Yang is a professor at the State Key Laboratory of Common Mechanism Research for Major Diseases. His research interests include proteomics and bioinformatics.

Jiang-Feng Liu holds a PhD and works at the State Key Laboratory of Common Mechanism Research for Major Diseases. Her research interests include proteomics.

Contributor Information

Ye-Hong Yang, State Key Laboratory of Common Mechanism Research for Major Diseases, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, No.5, Dongdan 3, Dongcheng District Municipality of Beijing, Beijing 100005, China.

Jun-Tao Yang, State Key Laboratory of Common Mechanism Research for Major Diseases, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, No.5, Dongdan 3, Dongcheng District Municipality of Beijing, Beijing 100005, China; Plastic Surgery Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100144, PR China.

Jiang-Feng Liu, State Key Laboratory of Common Mechanism Research for Major Diseases, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, No.5, Dongdan 3, Dongcheng District Municipality of Beijing, Beijing 100005, China; Plastic Surgery Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100144, PR China.

FUNDING

This work was supported by grants from the Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences, China (grant numbers CIFMS2022-I2M-2-001, CIFMS2022-I2M-1-011, CIFMS2021-I2M-1-057, CIFMS2021-I2M-1-049, CIFMS2021-I2M-1-044, CIFMS2021-I2M-1-016 and CIFMS2021-12 M-1-001).

DATA AVAILABILITY

The source code and the datasets of this research are available for download at: https://github.com/ispotato/Lactylation_model.

References

  • 1. Heiden MGV, Cantley LC, Thompson CB. Understanding the Warburg effect: the metabolic requirements of cell proliferation. Science  2009;324:1029–33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Zhang D, Tang Z, Huang H, et al.  Metabolic regulation of gene expression by histone lactylation. Nature  2019;574:575–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Chen AN, Luo Y, Yang YH, et al.  Lactylation, a novel metabolic reprogramming code: current status and prospects. Front Immunol  2021;12:688910. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Li L, Chen K, Wang T, et al.  Glis1 facilitates induction of pluripotency via an epigenome-metabolome-epigenome signalling cascade. Nat Metab  2020;2:882–92. [DOI] [PubMed] [Google Scholar]
  • 5. Irizarry-Caro RA, McDaniel MM, Overcast GR, et al.  TLR signaling adapter BCAP regulates inflammatory to reparatory macrophage transition by promoting histone lactylation. Proc Natl Acad Sci U S A  2020;117:30628–38. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Hagihara H, Shoji H, Otabi H, et al.  Protein lactylation induced by neural excitation. Cell Rep  2021;37:109820. [DOI] [PubMed] [Google Scholar]
  • 7. Pan RY, He L, Zhang J, et al.  Positive feedback regulation of microglial glucose metabolism by histone H4 lysine 12 lactylation in Alzheimer's disease. Cell Metab  2022;34:634–648.e6. [DOI] [PubMed] [Google Scholar]
  • 8. Yang D, Yin J, Shan L, et al.  Identification of lysine-lactylated substrates in gastric cancer cells. iScience  2022;25:104630. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Yang Z, Yan C, Ma J, et al.  Lactylome analysis suggests lactylation-dependent mechanisms of metabolic adaptation in hepatocellular carcinoma. Nat Metab  2023;5:61–79. [DOI] [PubMed] [Google Scholar]
  • 10. Yang YH, Wang QC, Kong J, et al.  Global profiling of lysine lactylation in human lungs. Proteomics  2023;23:e2200437. [DOI] [PubMed] [Google Scholar]
  • 11. Hong H, Chen X, Wang H, et al.  Global profiling of protein lysine lactylation and potential target modified protein analysis in hepatocellular carcinoma. Proteomics  2023;23(9):e2200432. [DOI] [PubMed] [Google Scholar]
  • 12. Yao Y, Bade R, Li G, et al.  Global-scale profiling of differential expressed lysine-lactylated proteins in the cerebral endothelium of cerebral ischemia-reperfusion injury rats. Cell Mol Neurobiol  2023;43:1989–2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Zhao W, Yu H, Liu X, et al.  Systematic identification of the lysine lactylation in the protozoan parasite toxoplasma gondii. Parasit Vectors  2022;15:180. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Song Y, Liu X, Stielow JB, et al.  Post-translational changes in Phialophora verrucosa via lysine lactylation during prolonged presence in a patient with a CARD9-related immune disorder. Front Immunol  2022;13:966457. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Gao M, Zhang N, Liang W. Systematic analysis of lysine lactylation in the plant fungal pathogen Botrytis cinerea. Front Microbiol  2020;11:594743. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Meng X, Baine JM, Yan T, Wang S. Comprehensive analysis of lysine lactylation in rice (Oryza sativa) grains. J Agric Food Chem  2021;69:8287–97. [DOI] [PubMed] [Google Scholar]
  • 17. Jiang P, Ning W, Shi Y, et al.  FSL-Kla: a few-shot learning-based multi-feature hybrid system for lactylation site prediction. Comput Struct Biotechnol J  2021;19:4497–509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Qiao Y, Zhu X, Gong H. BERT-Kcr: prediction of lysine crotonylation sites by a transfer learning method with pre-trained BERT models. Bioinformatics  2022;38:648–54. [DOI] [PubMed] [Google Scholar]
  • 19. Lv H, Dao FY, Lin H. DeepKla: an attention mechanism-based deep neural network for protein lysine lactylation site prediction. iMeta  2022;1:e11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Lai FL, Gao F. Auto-Kla: a novel web server to discriminate lysine lactylation sites using automated machine learning. Brief Bioinform  2023;24:bbad070. [DOI] [PubMed] [Google Scholar]
  • 21. Bludau I, Willems S, Zeng WF, et al.  The structural context of posttranslational modifications at a proteome-wide scale. PLoS Biol  2022;20:e3001636. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Burley SK, Bhikadiya C, Bi C, et al.  RCSB Protein Data Bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res  2021;49:D437–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res  2019;47:D506–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Kamacioglu A, Tuncbag N, Ozlu N. Structural analysis of mammalian protein phosphorylation at a proteome level. Structure  2021;29:1219–1229.e3. [DOI] [PubMed] [Google Scholar]
  • 25. Zhu F, Yang S, Meng F, et al.  Leveraging protein dynamics to identify functional phosphorylation sites using deep learning models. J Chem Inf Model  2022;62:3331–45. [DOI] [PubMed] [Google Scholar]
  • 26. Jumper J, Evans R, Pritzel A, et al.  Highly accurate protein structure prediction with AlphaFold. Nature  2021;596:583–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Baek M, DiMaio F, Anishchenko I, et al.  Accurate prediction of protein structures and interactions using a three-track neural network. Science  2021;373:871–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Huang Y, Niu B, Gao Y, et al.  CD-HIT suite: a web server for clustering and comparing biological sequences. Bioinformatics  2010;26:680–2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Lv H, Dao FY, Guan ZX, et al.  Deep-Kcr: accurate detection of lysine crotonylation sites using deep learning method. Brief Bioinform  2021;22:1–10. [DOI] [PubMed] [Google Scholar]
  • 30. Yang KK, Wu Z, Bedbrook CN, Arnold FH. Learned protein embeddings for machine learning. Bioinformatics  2018;34:2642–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Gligorijević V, Renfrew PD, Kosciolek T, et al.  Structure-based protein function prediction using graph convolutional networks. Nat Commun  2021;12:3168. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Göbel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins  1994;18(4):309–17. [DOI] [PubMed] [Google Scholar]
  • 33. Adhikari B, Bhattacharya D, Cao R, Cheng J. CONFOLD: residue-residue contact-guided ab initio protein folding. Proteins  2015;83(8):1436–49. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Kim Y, Denton C, Hoang L, et al.  Structured attention networks. arXiv preprint, arXiv:1702.00887  2017;1–21. [Google Scholar]
  • 35. Vaswani A, Shazeer N, Parmar N, et al.  Attention is all you need. arXiv preprint, arXiv:1706.03762  2017;1–15. [Google Scholar]
  • 36. Lyu X, Li S, Jiang C, et al.  DeepCSO: a deep-learning network approach to predicting cysteine S-sulphenylation sites. Front Cell Dev Biol  2020;8:594587. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Karim F, Majumdar S, Darabi H, Chen S. LSTM fully convolutional networks for time series classification. IEEE Access  2017;6:1662–9. [Google Scholar]
  • 38. Peng Z, Wei S, Tian J, et al.  Attention-based bidirectional long short-term memory networks for relation classification. Comput Sci  2016;8:207–12. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

TableS1_KlaProteins_identified_MS_bbad539
TableS2_proteins_ABFF-Kla_prediction_bbad539
TableS3_proteins_Sequence-submodule-prediction_bbad539
TableS4_proteins_Structure-submodule-prediction_bbad539
positive_seuqence_peptide_bbad539
positive_contact_segment_bbad539
negative_sequence_peptide_bbad539
negative_contact_segment_bbad539

Data Availability Statement

The source code and the datasets of this research are all available and can be downloaded from: https://github.com/ispotato/Lactylation_model.


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
