



. 2022 Jan 1;25(1):103730. doi: 10.1016/j.isci.2021.103730


© 2022 The Authors

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).


Construction and validation of the CAROM-ML model

(A) Table of inputs for CAROM-ML. The input features comprise 13 gene, reaction, and enzyme properties. The target column gives the posttranslational modification class: each gene-reaction pair is labeled phosphorylated, acetylated, or unknown.
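A minimal sketch of how one row of the panel A table might be held in Python before training. Only the 13-feature count and the three class labels come from the caption; the `Observation` class, its `gene` and `reaction` fields, and the `to_matrix` helper are hypothetical. Integer-encoding the labels is a common convention for multi-class classifiers such as XGBoost, not necessarily the authors' choice.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# The three target classes named in panel A; list position = integer label.
CLASSES = ("phosphorylated", "acetylated", "unknown")
N_FEATURES = 13  # gene, reaction, and enzyme properties per gene-reaction pair


@dataclass(frozen=True)
class Observation:
    gene: str                    # hypothetical identifier field
    reaction: str                # hypothetical identifier field
    features: Tuple[float, ...]  # the 13 input features
    label: str                   # one of CLASSES

    def __post_init__(self) -> None:
        # Reject rows that do not match the table layout described in panel A.
        if len(self.features) != N_FEATURES:
            raise ValueError(f"expected {N_FEATURES} features, got {len(self.features)}")
        if self.label not in CLASSES:
            raise ValueError(f"unknown label {self.label!r}")


def to_matrix(rows: Sequence[Observation]) -> Tuple[List[List[float]], List[int]]:
    """Split observations into a feature matrix X and integer-encoded targets y."""
    X = [list(r.features) for r in rows]
    y = [CLASSES.index(r.label) for r in rows]
    return X, y
```

For example, `to_matrix([Observation("geneA", "rxn1", (0.0,) * 13, "acetylated")])` yields one 13-column row and the target list `[1]`.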

(B) A single decision tree was trained on the observations from all organisms, using only the top 50% most important features identified in the SHAP analysis. Tree depth was limited to keep the tree easy to interpret and visualize. The XGBoost model is an ensemble of such decision trees.
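Panel B keeps the top 50% of features by SHAP importance. The sketch below ranks features by mean absolute SHAP value, a common global-importance summary. Whether CAROM-ML used exactly this summary, and how it rounded 50% of 13 features, are assumptions (here the cutoff is rounded up, so 7 of 13 are kept). It also assumes a single SHAP matrix; for a three-class model the per-class values would have to be combined first. The function name is illustrative.

```python
import math
from typing import List, Sequence


def top_features_by_shap(
    shap_values: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    fraction: float = 0.5,
) -> List[str]:
    """Return the most important features, ranked by mean |SHAP| value.

    shap_values has one row per observation and one column per feature.
    The cutoff is rounded up (an assumption), so 13 features at 0.5 keep 7.
    """
    n_obs = len(shap_values)
    n_feat = len(feature_names)
    if n_obs == 0:
        raise ValueError("need at least one observation")
    # Global importance: mean absolute SHAP value of each feature column.
    importance = [sum(abs(row[j]) for row in shap_values) / n_obs for j in range(n_feat)]
    # Rank feature indices from most to least important.
    ranked = sorted(range(n_feat), key=lambda j: importance[j], reverse=True)
    keep = max(1, math.ceil(fraction * n_feat))
    return [feature_names[j] for j in ranked[:keep]]
```

The selected columns could then be passed to a depth-limited tree, for example scikit-learn's `DecisionTreeClassifier` with its `max_depth` parameter set; the depth used for the figure is not stated in the caption.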

(C) Results of 5-fold cross-validation of the CAROM-ML model, shown in the bar graph (left), where error bars represent 95% confidence intervals, and summarized in the confusion matrix.
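The caption does not say how the 95% intervals were computed. One plausible reading, sketched below under that assumption, is a Student-t interval over the five per-fold scores, with predictions pooled across folds into a single confusion matrix. The helper names are illustrative, and the fold assignment is a plain shuffle rather than the stratified split a real pipeline might prefer.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

# Two-sided 95% Student-t critical values, keyed by degrees of freedom.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k near-equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mean_ci95(scores: Sequence[float]) -> Tuple[float, float, float]:
    """Mean and t-based 95% CI across folds (assumes 2 <= len(scores) <= 10)."""
    m = statistics.mean(scores)
    half = T_975[len(scores) - 1] * statistics.stdev(scores) / len(scores) ** 0.5
    return m, m - half, m + half


def confusion_matrix(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> Dict[str, Dict[str, int]]:
    """Count (true, predicted) label pairs pooled over all folds."""
    cm = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm
```

With five folds there are four degrees of freedom, so the interval half-width is 2.776 times the standard error of the fold scores.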

(D) Comparison of model predictions for the G1, S, and G2 phases of the cell cycle with experimental phosphoproteomics data for those phases. The confusion matrix shows predictions from the main CAROM-ML model, whereas the bar graph shows the standard deviation across five models trained with different random seeds.
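A sketch of the seed-variability summary in panel D, assuming each of the five seeds yields one score per phase. The `agreement` metric (fraction of shared genes whose predicted phosphorylation status matches the phosphoproteomics data) and the use of the sample standard deviation are assumptions; the caption names neither, and both function names are hypothetical.

```python
import statistics
from typing import Dict, List, Mapping


def agreement(predicted: Mapping[str, bool], observed: Mapping[str, bool]) -> float:
    """Fraction of shared genes where predicted phosphorylation matches the data."""
    shared = predicted.keys() & observed.keys()
    if not shared:
        raise ValueError("no genes in common")
    return sum(predicted[g] == observed[g] for g in shared) / len(shared)


def seed_spread(scores_by_phase: Mapping[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Mean and sample standard deviation of a score across random seeds.

    scores_by_phase maps a phase name (e.g. "G1") to one score per seed.
    """
    out: Dict[str, Dict[str, float]] = {}
    for phase, scores in scores_by_phase.items():
        out[phase] = {
            "mean": statistics.mean(scores),
            "sd": statistics.stdev(scores),  # sample SD (n - 1); an assumption
        }
    return out
```

In use, one would call `agreement` once per seed and phase, collect the five results per phase in a list, and pass that mapping to `seed_spread`.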

(E) Comparison of cell cycle acetylation predictions with experimental acetylomics data from HeLa cells treated with pan-deacetylase inhibitors. The number of unique acetylated genes in each group is shown in parentheses. The table lists the number of overlapping genes between each phase and drug, along with the p value of the hypergeometric test.
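The overlap p values in panel E come from a hypergeometric test. A one-sided (enrichment) version can be written with the standard library alone, as sketched below. The choice of background gene set `N` and the one-sided direction are assumptions not stated in the caption, and `hypergeom_sf` is an illustrative name.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background population size (e.g. all genes considered; an assumption)
    K: genes predicted acetylated in a cell-cycle phase
    n: genes found acetylated after a given drug treatment
    k: observed overlap between the two gene sets
    """
    total = comb(N, n)
    # Sum the exact probability of every overlap from k up to its maximum;
    # comb() returns 0 for impossible draws, so no extra bounds are needed.
    start = max(k, 0)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(start, upper + 1)) / total
```

With SciPy available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same upper-tail probability, since SciPy's `hypergeom` takes the population size, the number of successes, and the number of draws in that order.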