Letter
iMeta. 2024 Jun 20;3(4):e215. doi: 10.1002/imt2.215

BioLadder: A bioinformatic platform primarily focused on proteomic data analysis

Yupeng Zhang 1, Chunyuan Yang 2, Jinhao Wang 1, Lixin Wang 1, Yan Zhao 1, Longqing Sun 1, Wei Sun 1, Yunping Zhu 2, Jingli Li 1, Songfeng Wu 1
PMCID: PMC11316921  PMID: 39135688

Abstract

BioLadder (https://www.bioladder.cn/) is an online data analysis platform designed for proteomics research, which includes three classes of experimental data analysis modules and four classes of common data analysis modules. It allows a variety of proteomics analyses to be conducted easily and efficiently. Additionally, most modules can also be used to analyze other omics data. To improve the user experience, we have carefully designed four kinds of functions that help users apply the relevant analysis modules quickly and accurately.


To the Editor,

In recent years, the vigorous development of multiomics research has generated massive amounts of data, and in‐depth data analysis and mining have become an important feature of life science research [1, 2]. Bioinformatics has become one of the most commonly used research tools, playing a pivotal role in life science research.

However, bioinformatics research requires programming training, which may not be a strong suit for researchers who focus on scientific questions. Moreover, even researchers who have coding skills still need to invest considerable time and effort in coding to complete an analysis, which leads to delays in related work.

Online analytical platforms are often the first choice for researchers, as they do not require additional installation and preparation work. Simply opening a web page and uploading data for analysis can greatly accelerate the pace of life science research. Currently, there are many such online data analysis platforms, including some specialized omics data analysis platforms, such as ImageGP [3], Sangerbox [4], Majorbio Cloud [5], OmicStudio [6], OmicsSuite [7], and OmicsAnalyst [8]. However, most of these analytical platforms were developed based on the needs of genomics and transcriptomics, and almost none are specifically designed for proteomics.

The proteome is translated from the transcriptome and not only possesses the expressive properties of the transcriptome, but also includes additional properties, such as modifications and interactions [9, 10]. In terms of qualitative and quantitative experimental techniques, proteomics is far more complex than genomics and transcriptomics, which imposes additional requirements on data analysis. In recent years, with the advancement of technology, proteomics has gradually played an increasingly important role in medical research [11, 12], leading to a growing and diverse demand for protein data analysis.

Here, we provide the BioLadder bioinformatics platform (https://www.bioladder.cn/), which not only offers some conventional analytical tools but also provides commonly used proteomics analysis tools, including experimental result visualization, sequence‐level analysis, expression data analysis, and functional analysis. Some of the tools are newly developed and currently have no equivalent tools available online.

MODULE DESIGN FOR PROTEOMIC DATA ANALYSIS

Proteomic data analysis can be divided into two main categories (Figure 1): (1) Experimental data analysis: Analysis related to proteomics experimental data, including the analysis of experimental data, expression matrix data analysis, and so forth (Classes 1–3); (2) Common data analysis: Analysis not dependent on proteomics experimental data, including protein sequence analysis, as well as some general classification and functional analysis, and so forth (Classes 4–7).

Figure 1. BioLadder module classes in the proteome data analysis framework. BioLadder consists of over 50 analysis modules, which belong to two main categories (ExperimentalDataAnalysis and CommonDataAnalysis) and seven classes (C1–C7 in the figure).

The seven classes are outlined as follows:

Class 1. ExpDataVisualization

Experimental data visualization currently includes two modules (CoverageBar and Pep2ProMap), which display the coverage of proteomic identification peptides to proteins, as well as the information on protein digestion sites.
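As a rough illustration of what a coverage view such as CoverageBar summarizes, the sketch below marks which residues of a protein are covered by identified peptides and lists candidate digestion sites. The function names, the example sequence, and the simplified trypsin rule (cleavage after K or R unless the next residue is P) are assumptions made for illustration; they are not taken from BioLadder's implementation.

```python
def sequence_coverage(protein, peptides):
    """Return (fraction of residues covered, per-residue coverage mask)."""
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    fraction = sum(covered) / len(protein) if protein else 0.0
    return fraction, covered


def tryptic_sites(protein):
    """0-based indices of K/R residues that are not followed by P.

    Simplified trypsin rule (an assumption); cleavage occurs after each index.
    """
    return [i for i, aa in enumerate(protein[:-1])
            if aa in "KR" and protein[i + 1] != "P"]


# Hypothetical 28-residue sequence and two identified peptides
protein = "MKWVTFISLLLLFSSAYSRGVFRRDTHK"
peptides = ["WVTFISLLLLFSSAYSR", "DTHK"]

fraction, mask = sequence_coverage(protein, peptides)
print(f"coverage = {fraction:.1%}")
print("cleavage after 0-based positions:", tryptic_sites(protein))
```

The per-residue mask is the kind of information a coverage bar could be drawn from, while the site list corresponds to the digestion-site annotation described above.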

Class 2. DataPreProcessing

Data preprocessing includes data format conversion, normalization, imputation, and so on. It is an important prerequisite for the subsequent analyses.
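Normalization is one of the preprocessing steps listed above, and the letter later names "median normalization of common proteins" as a method offered in the normalization module. The source does not give the exact procedure, so the following is only one plausible reading: each sample is rescaled so that the median intensity of the proteins quantified in every sample is the same. The function name, the data layout, and the choice of target median are all assumptions.

```python
from statistics import median


def normalize_by_common_median(matrix):
    """matrix: sample -> {protein: intensity, or None if not quantified}.

    Rescales each sample so that the median over proteins quantified in
    ALL samples is identical across samples (assumed interpretation).
    """
    samples = list(matrix)
    common = set.intersection(*(
        {p for p, v in matrix[s].items() if v is not None} for s in samples
    ))
    if not common:
        raise ValueError("no protein is quantified in every sample")
    medians = {s: median(matrix[s][p] for p in common) for s in samples}
    target = median(medians.values())  # assumed common target level
    return {
        s: {p: (None if v is None else v * target / medians[s])
            for p, v in matrix[s].items()}
        for s in samples
    }


# Hypothetical data: P4 is missing in S2, so it is excluded from the medians
raw = {
    "S1": {"P1": 100.0, "P2": 200.0, "P3": 300.0, "P4": 50.0},
    "S2": {"P1": 200.0, "P2": 400.0, "P3": 600.0, "P4": None},
}
normalized = normalize_by_common_median(raw)
print(normalized)
```

Restricting the median to shared proteins means that a sample with many additional low-abundance identifications does not shift its own scaling factor, which is the bias the letter attributes to conventional normalization.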

Class 3. QuantitativeAnalysis

Quantitative analysis examines the quantitative results for each protein and is the most common type of analysis module. It is subdivided into five groups: (1) DifferenceAnalysis: Differential analysis encompasses differential calculation, FDR (False Discovery Rate) correction, and the visualization of differential results, such as volcano plots and ROC (Receiver Operating Characteristic) curves. These modules perform both the differential calculation and the presentation of results; (2) QuantitativeDes: Quantitative data description includes scatter plots, density plots, distribution bar or line graphs, and the coefficient of variation (CV). These modules describe the distribution, density, and other features of quantitative data; (3) QuantitativeComp: Quantitative data comparison includes bar graphs, heat maps, box plots, and so on. These modules are primarily used to compare quantitative differences or variations among samples or genes; (4) QuantitativeCorr: Quantitative data correlation includes correlation heat maps, correlation matrix graphs, and more. These modules calculate the quantitative correlation between samples or genes to reveal the relationships among them; (5) QuantitativeCluster: Quantitative clustering includes dimensionality reduction methods such as PCA (Principal Component Analysis), t‐SNE (t‐Distributed Stochastic Neighbor Embedding), and UMAP (Uniform Manifold Approximation and Projection), as well as trend analysis of multiple data sets, TreeDiagram, and so on. These modules generally use dimensionality reduction algorithms or distance calculations to cluster and analyze samples or genes.
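The FDR correction step in differential analysis can be sketched as follows. The letter does not say which correction procedure BioLadder applies; the Benjamini-Hochberg procedure is used here purely as a common example, and the function name and p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that adjusted
    # values stay monotone (each is the minimum over equal or higher ranks).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for five proteins
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
for raw, adj in zip(p, q):
    print(f"p = {raw:<6} adjusted = {adj:.5f}")
```

In this example the third and fourth proteins have raw p values below 0.05 but adjusted values slightly above it, which is the kind of change a correction step is meant to expose before results are plotted.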

Class 4. SeqAnalysis

Sequence analysis refers to analyses that can be completed based on protein sequences, including multiple sequence alignment, sequence motif analysis, calculation of protein physicochemical properties, and so forth.
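As one example of a physicochemical property that can be computed from sequence alone, the sketch below calculates amino acid composition and the GRAVY score (the mean Kyte-Doolittle hydropathy of the residues). Which properties BioLadder actually reports is not stated in the letter, so the choice of GRAVY and the function names are assumptions.

```python
from collections import Counter

# Kyte-Doolittle hydropathy index for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(sequence):
    """Fraction of each residue type in the sequence."""
    if not sequence:
        raise ValueError("empty sequence")
    counts = Counter(sequence.upper())
    return {aa: n / len(sequence) for aa, n in counts.items()}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


peptide = "MKWVTFISLLLLFSSAYSR"  # hypothetical example sequence
print("composition:", composition(peptide))
print("GRAVY:", round(gravy(peptide), 3))
```

A positive GRAVY value indicates an overall hydrophobic sequence and a negative value a hydrophilic one.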

Class 5. AbundanceMap

The abundance chart offers a convenient way to query and display reference quantitative data for body fluids (currently including blood and urine).

Class 6. ClassificationAnalysis

Classification analysis consists of two groups, classification display and classification comparison: (1) Classification display presents the differences among result types after classification, using scatter plots, pie charts, area charts, and so forth; (2) Classification comparison compares results of different types using VennChart, Sankey diagrams, Radar charts, and other visualizations.

Class 7. FunctionAnalysis

Function analysis focuses on visualizing enrichment results based on Gene Ontology, as well as drawing interaction network diagrams.

Therefore, the analysis modules included in BioLadder cover experimental data analysis in proteomics research, as well as multiple modules for public sequence data analysis (Table S1). These analysis modules can meet most of the data analysis needs of researchers in the field of proteomics.

SPECIFIC PROTEOME DATA ANALYSIS MODULES

In response to the needs of proteomics research, we have developed several proteome data visualization modules (Table S2), such as (1) coverage analysis of peptide segments in protein sequences, including the CoverageBar and Pep2ProMap modules. These modules are primarily designed for presenting Lip‐MS (Limited Proteolysis‐Mass Spectrometry) experimental results, but can also be used to display identification data from any proteomics experiment; (2) analysis and visualization of quantitative data distribution, including the CV curve and SumCurve modules. Users can utilize these modules to examine the variability and abundance curves of quantitative data; (3) quantification data and marked proteins, including the AbundancePoint and BodyFluidMap modules. The former allows users to input their own quantitative data and specify proteins, while the latter enables users to query the quantitative information of specific proteins in the body fluid database (currently including blood and urine).
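The variability summary behind a CV curve can be sketched as below: the coefficient of variation is computed per protein across replicates, and the share of proteins at or below a CV threshold is then read off. How BioLadder draws its CV curve is not described in the letter, so the function names, the use of the sample standard deviation, and the data are assumptions.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (needs at least two values)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


def fraction_within(cvs, threshold):
    """Share of proteins whose CV is at or below the threshold."""
    return sum(cv <= threshold for cv in cvs) / len(cvs)


# Hypothetical replicate intensities per protein
replicates = {
    "P1": [100.0, 110.0, 90.0],
    "P2": [50.0, 50.0, 50.0],
    "P3": [10.0, 30.0],
}
cvs = {protein: coefficient_of_variation(v) for protein, v in replicates.items()}
for protein, cv in cvs.items():
    print(f"{protein}: CV = {cv:.3f}")
print("fraction with CV <= 0.2:", fraction_within(list(cvs.values()), 0.2))
```

Evaluating `fraction_within` over a range of thresholds gives the points of a cumulative CV curve.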

We believe that these proteome data visualization modules will meet the demands of proteomics research and provide valuable insights for researchers.

CONVENIENT AND USER‐FRIENDLY DESIGN

To enable users in omics research to use our online analysis platform conveniently and efficiently, we have carefully designed various aspects, including input file formats (Figure 2A), parameter settings (Figure 2B), color schemes (Figure 2C), and so on. We provide help documentation, WeChat customer service, and real‐time tooltips so that users can easily access relevant help information (Figure 2D). Only some of these designs are available in other current online cloud platforms (Table S3).

Figure 2. Four convenient and user‐friendly designs in BioLadder. (A) An example of a popular proteomic file format as the default input format (Venn plot). (B) Specialized default parameters for proteomics, including algorithm selection, data preprocessing, and presentation. Diverse and extensive adjustment methods (volcano plot as an example). (C) Three different groups of color schemes. (D) Comprehensive help information (three kinds of documents and WeChat communication), and convenient real‐time assistance (heatmap plot as an example).

SIMPLIFIED INPUT FORMAT

Many data analysis methods are shared across fields, but each has its own input data format, which may not be one commonly used in proteomics. Proteomics data may therefore require some transformation before the corresponding analysis can be run. In our design, we provide conversion modules for different types of data (e.g., converting between long and wide formats) and design some modules to directly support common proteomics formats. For example, in the Venn diagram module, users can input not only the commonly used Venn format but also quantitative matrix data tables (the format typically used in proteomics). Additionally, the module can filter out values below a specified minimum quantitation value, which helps eliminate results that may be caused by noise.
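The idea of feeding a quantitative matrix directly into a Venn analysis can be sketched as follows: each sample's protein set is derived from the matrix after applying a minimum quantitation value, and the Venn regions are then counted. The data layout, function names, and threshold are hypothetical and are not taken from BioLadder.

```python
from collections import Counter


def detected_sets(rows, samples, min_value):
    """rows: protein -> list of intensities (None = missing), one per sample.

    Returns sample -> set of proteins quantified at or above min_value.
    """
    sets = {s: set() for s in samples}
    for protein, values in rows.items():
        for sample, value in zip(samples, values):
            if value is not None and value >= min_value:
                sets[sample].add(protein)
    return sets


def venn_regions(sets):
    """Count proteins per exclusive Venn region (keyed by the sample combination)."""
    membership = {}
    for sample, proteins in sets.items():
        for protein in proteins:
            membership.setdefault(protein, set()).add(sample)
    return Counter(frozenset(m) for m in membership.values())


samples = ["A", "B", "C"]
rows = {  # hypothetical quantitative matrix
    "P1": [1200.0, 900.0, 1500.0],
    "P2": [300.0, None, 20.0],
    "P3": [None, 800.0, 5.0],
    "P4": [15.0, 10.0, None],
}
sets = detected_sets(rows, samples, min_value=50.0)
for region, count in venn_regions(sets).items():
    print(sorted(region), count)
```

With the threshold applied, P4 drops out of every set and the low values of P2 and P3 in sample C no longer count as detections, which illustrates how the filter removes overlaps that may only reflect noise.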

SPECIALIZED DEFAULT PARAMETERS FOR PROTEOMICS, DIVERSE AND EXTENSIVE ADJUSTMENT METHODS

To meet the specific requirements of proteomics data analysis, we have established suitable default parameters for some modules to minimize the need for parameter adjustments as much as possible.

First, in terms of algorithms, we have adjusted default parameters based on the characteristics of proteomics data. For instance, in correlation calculations, a few highly abundant proteins may dominate the Pearson correlation that is commonly used by default, owing to the nature of expression data. Therefore, in modules that involve correlation calculations, we use Spearman rank correlation by default, which has also been adopted in many proteomics‐related studies [13, 14, 15]. Furthermore, because the number of identified proteins often varies considerably among samples, conventional normalization methods may introduce bias. To address this issue, we have incorporated a method called median normalization of common proteins in the normalization module.
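The effect of a single highly abundant protein on the two correlation measures can be seen in a small, self-contained sketch. The data are invented for illustration and the helper functions are written out by hand; this is not BioLadder's code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Four low-abundance proteins that disagree between samples,
# plus one very abundant protein that is high in both.
sample_1 = [1.0, 2.0, 3.0, 4.0, 1000.0]
sample_2 = [4.0, 3.0, 2.0, 1.0, 2000.0]
print("Pearson :", round(pearson(sample_1, sample_2), 5))
print("Spearman:", round(spearman(sample_1, sample_2), 5))
```

Here the Pearson coefficient is close to 1 because the one abundant protein dominates the sums, whereas the Spearman coefficient is 0 because the rank order of the remaining proteins is reversed.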

Second, in data preprocessing, we made some adjustments based on the characteristics of proteomics data. For example, as most genes tend to be expressed at relatively low abundance, plotting quantitative distributions directly often results in most proteins being concentrated at the low‐abundance end, which makes the differences between samples hard to discern [13, 14, 15]. Hence, in modules such as box plots, violin plots, and kernel density plots, logarithmic transformation is applied by default, allowing clear visualization of quantitative variation across samples without any parameter changes.
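A minimal sketch of the default log transformation follows. The base (2) and the pseudocount of 1 used to handle zeros are assumptions; the letter only states that a logarithmic transformation is the default.

```python
import math
from statistics import median


def log2_transform(values, pseudocount=1.0):
    """Return log2(value + pseudocount) for each non-negative intensity."""
    if any(v < 0 for v in values):
        raise ValueError("intensities must be non-negative")
    return [math.log2(v + pseudocount) for v in values]


# Hypothetical intensities: mostly low, with one very abundant protein
raw = [0.0, 1.0, 3.0, 7.0, 1023.0]
logged = log2_transform(raw)

print("raw  median, max:", median(raw), max(raw))
print("log2 median, max:", median(logged), max(logged))
```

On the raw scale the maximum is hundreds of times the median, so a box or density plot is compressed against the axis; after the transformation the values are spread over a range that can be compared across samples.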

Furthermore, we also set some special default parameters for data presentation. For instance, in heatmap analysis, genes on the y‐axis are typically so numerous that their names become illegible. Therefore, by default we display only sample names and omit gene names for better clarity.

In addition, to cater to user preferences, we have incorporated easily adjustable parameters in several modules, empowering users to customize their display results. For example, in volcano plot analysis, we have included two types of point annotation methods: (1) Customizing protein markers based on a designated marker column in the uploaded file; (2) Batch marking based on p value and fold change thresholds. Similarly, in box plot analysis, users can choose whether to add hypothesis test labels between different groups. We have also devised custom options allowing users to selectively add hypothesis test labels to specific group comparisons (e.g., only annotating significant results or comparisons of particular interest).
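Batch marking by p value and fold change thresholds, the second annotation method described above, can be sketched as a simple classification rule. The threshold values, labels, and function name are hypothetical and do not reflect BioLadder's defaults.

```python
import math


def classify(log2_fc, p_value, fc_threshold=2.0, p_threshold=0.05):
    """Label a protein as 'up', 'down', or 'ns' (not significant).

    fc_threshold is on the linear scale; it is converted to log2 so that
    up- and down-regulation are treated symmetrically.
    """
    cutoff = math.log2(fc_threshold)
    if p_value < p_threshold and log2_fc >= cutoff:
        return "up"
    if p_value < p_threshold and log2_fc <= -cutoff:
        return "down"
    return "ns"


# Hypothetical differential results: protein -> (log2 fold change, p value)
results = {
    "P1": (1.5, 0.01),
    "P2": (-2.0, 0.001),
    "P3": (3.0, 0.20),
    "P4": (0.5, 0.001),
}
labels = {protein: classify(fc, p) for protein, (fc, p) in results.items()}
print(labels)
```

The three labels map naturally onto the three colors of a volcano plot, and only the proteins labeled "up" or "down" would receive name annotations under batch marking.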

POWERFUL COLOR SCHEME

Color scheme is a crucial aspect of data visualization, as improper color combinations can significantly reduce the effectiveness of visualizations.

To address this problem, we have configured default color schemes in all modules, including some default color schemes from R packages or ggplot2 (https://github.com/tidyverse/ggplot2), ensuring users can immediately create refined graphics without additional steps.

Furthermore, more than half the modules have incorporated additional color schemes sourced from commonly used excellent color schemes in literature or journals, such as Nature, Science, and Lancet (ggsci: https://github.com/nanxstats/ggsci).

For users with specific requirements, we offer the option to customize colors. Users can select colors directly using color palettes or precisely modify color configurations by adjusting color codes, enabling them to customize colors for each sample or group based on their preferences and esthetics.

These three functionalities provide our modules with powerful color customization capabilities, catering to various user needs and allowing users to quickly complete color customization according to their preferences.

Additionally, certain modules with unique characteristics utilize special color schemes. For example, the volcano plot module typically only requires three colors for upregulation, downregulation, and nonsignificance, so a color picker is used to set up the tricolor scheme.

COMPREHENSIVE HELP INFORMATION, CONVENIENT REAL‐TIME ASSISTANCE

To ensure users can smoothly utilize our modules for data analysis, we provide helpful information from multiple perspectives in the “User Guide.” First, we offer an introduction to provide an overview of the website structure and functionalities. Second, we have a “Frequently Asked Questions” page that compiles the most common inquiries. Third, detailed documentation is provided for each module. Additionally, we offer a WeChat communication group where users can directly consult our staff about encountered issues.

Furthermore, besides commonly used parameter settings, we have added tooltips for instant assistance, giving users access to help on parameter settings at any time so that they can configure the corresponding parameters accurately. For instance, in the heatmap module, four types of tooltips are provided: (1) Tooltip for input file details, including file content explanation, maximum file limits, and file formats; (2) Tooltip for dropdown selection boxes, explaining the meaning of each option; (3) Tooltip for download formats, providing download instructions and graphical explanations of download settings; (4) In the top left corner of result plots in most modules, a “Text Tutorial” link is provided, along with a tooltip explaining the plot, allowing users to quickly understand the plot's significance. These tooltips enable users to easily access helpful information and seamlessly continue with configuration and data analysis.

METHODS

BioLadder's user interface is engineered upon the Vue.js framework, offering a robust and interactive client‐side experience. The server‐side architecture is meticulously crafted utilizing the Laravel framework, known for its expressive syntax and robust features. The platform's data persistence is managed by MySQL, ensuring reliable and efficient data management. The analytical functionalities are delivered through a synergistic integration of JavaScript for dynamic web interactions, Shiny for creating interactive web applications, and a selection of R packages optimized for statistical analysis and graphics, enabling sophisticated data processing and visualization within the proteomics domain.

AUTHOR CONTRIBUTIONS

Songfeng Wu, Jingli Li, and Yunping Zhu conceived the idea of developing the BioLadder platform. Yupeng Zhang completed the construction of the platform and the implementation of various modules. Chunyuan Yang built and maintained computing services and wrote the manuscript. Jinhao Wang assisted in completing some of the development work, while Lixin Wang assisted in conducting research and promotion. Yan Zhao, Longqing Sun, and Wei Sun assisted in testing and proposed modifications. All authors have read the final manuscript and approved it for publication.

CONFLICTS OF INTEREST STATEMENT

Yupeng Zhang, Jinhao Wang, Lixin Wang, Yan Zhao, Longqing Sun, Wei Sun, Jingli Li, and Songfeng Wu are employees and researchers of Qinglian Biotech Co., Ltd. The remaining authors declare no conflict of interest.

Supporting information

Table S1: BioLadder modules in the proteome data analysis framework.

Table S2: The possible applications of each newly developed module.

Table S3: Comparison of BioLadder convenient and user‐friendly designs in different cloud platforms.

IMT2-3-e215-s001.xlsx (13.9KB, xlsx)

ACKNOWLEDGMENTS

This work was supported by the National Key Research Program of China (2021YFA1301603). We thank Shouke Zhang for his invaluable assistance in crafting and enhancing the graphical representations.

These authors contributed equally: Yupeng Zhang and Chunyuan Yang.

Contributor Information

Yunping Zhu, Email: zhuyunping@gmail.com.

Jingli Li, Email: lijingli@qinglianbio.com.

Songfeng Wu, Email: wusf897@gmail.com.

DATA AVAILABILITY STATEMENT

The data and scripts used are saved in GitHub https://github.com/sz-zyp/BioLadder2024. Supporting Information (tables, scripts, graphical abstracts, slides, videos, Chinese translated version, and update materials) is available online DOI or http://www.imeta.science/. Data sharing is not applicable to this article as no new data were created or analyzed in this study.

REFERENCES

1. Karczewski, Konrad J., and Snyder Michael P. 2018. “Integrative Omics for Health and Disease.” Nature Reviews Genetics 19: 299–310. 10.1038/nrg.2018.4
2. Yu, Xiangtian, and Zeng Tao. 2018. “Integrative Analysis of Omics Big Data.” Methods in Molecular Biology (Clifton, N.J.) 1754: 109–135. 10.1007/978-1-4939-7717-8_7
3. Chen, Tong, Liu Yong‐Xin, and Huang Luqi. 2022. “ImageGP: An Easy‐to‐Use Data Visualization Web Server for Scientific Researchers.” iMeta 1: e5. 10.1002/imt2.5
4. Shen, Weitao, Song Ziguang, Zhong Xiao, Huang Mei, Shen Danting, Gao Pingping, Qian Xiaoqian, et al. 2022. “Sangerbox: A Comprehensive, Interaction‐Friendly Clinical Bioinformatics Analysis Platform.” iMeta 1: e36. 10.1002/imt2.36
5. Ren, Yi, Yu Guo, Shi Caiping, Liu Linmeng, Guo Quan, Han Chang, Zhang Dan, et al. 2022. “Majorbio Cloud: A One‐Stop, Comprehensive Bioinformatic Platform for Multiomics Analyses.” iMeta 1: e12. 10.1002/imt2.12
6. Lyu, Fengye, Han Feiran, Ge Changli, Mao Weikang, Chen Li, Hu Huipeng, Chen Guoguo, Lang Qiulei, and Fang Chao. 2023. “OmicStudio: A Composable Bioinformatics Cloud Platform with Real‐Time Feedback That Can Generate High‐Quality Graphs for Publication.” iMeta 2: e85. 10.1002/imt2.85
7. Miao, Ben‐Ben, Dong Wei, Gu Yi‐Xin, Han Zhao‐Fang, Luo Xuan, Ke Cai‐Huan, and You Wei‐Wei. 2023. “OmicsSuite: A Customized and Pipelined Suite for Analysis and Visualization of Multi‐Omics Big Data.” Horticulture Research 10: uhad195. 10.1093/hr/uhad195
8. Zhou, Guangyan, Ewald Jessica, and Xia Jianguo. 2021. “OmicsAnalyst: A Comprehensive Web‐Based Platform for Visual Analytics of Multi‐Omics Data.” Nucleic Acids Research 49: W476–W482. 10.1093/nar/gkab394
9. Fan, Shuhao, Kong Chengcheng, Zhou Ren, Zheng Xianrui, Ren Dalong, and Yin Zongjun. 2024. “Protein Post‐Translational Modifications Based on Proteomics: A Potential Regulatory Role in Animal Science.” Journal of Agricultural and Food Chemistry 72: 6077–6088. 10.1021/acs.jafc.3c08332
10. Vogel, Christine, and Marcotte Edward M. 2012. “Insights into the Regulation of Protein Abundance from Proteomic and Transcriptomic Analyses.” Nature Reviews Genetics 13: 227–232. 10.1038/nrg3185
11. Liao, Yuxing, Savage Sara R., Dou Yongchao, Shi Zhiao, Yi Xinpei, Jiang Wen, Lei Jonathan T., and Zhang Bing. 2023. “A Proteogenomics Data‐Driven Knowledge Base of Human Cancer.” Cell Systems 14: 777–787.e775. 10.1016/j.cels.2023.07.007
12. Li, Yize, Dou Yongchao, Da Veiga Leprevost Felipe, Geffen Yifat, Calinawan Anna P., Aguet François, Akiyama Yo, et al. 2023. “Proteogenomic Data and Resources for Pan‐Cancer Analysis.” Cancer Cell 41: 1397–1406. 10.1016/j.ccell.2023.06.009
13. Jiang, Ying, Sun Aihua, Zhao Yang, Ying Wantao, Sun Huichuan, Yang Xinrong, Xing Baocai, et al. 2019. “Proteomics Identifies New Therapeutic Targets of Early‐Stage Hepatocellular Carcinoma.” Nature 567: 257–261. 10.1038/s41586-019-0987-8
14. Xu, JunYu, Zhang Chunchao, Wang Xiang, Zhai Linhui, Ma Yiming, Mao Yousheng, Qian Kun, et al. 2020. “Integrative Proteomic Characterization of Human Lung Adenocarcinoma.” Cell 182: 245–261.e217. 10.1016/j.cell.2020.05.043
15. Gao, Qiang, Zhu Hongwen, Dong Liangqing, Shi Weiwei, Chen Ran, Song Zhijian, Huang Chen, et al. 2019. “Integrated Proteogenomic Characterization of HBV‐Related Hepatocellular Carcinoma.” Cell 179: 561–577.e522. 10.1016/j.cell.2019.08.052

Articles from iMeta are provided here courtesy of Wiley
