Abstract
Mass spectrometry (MS)-based proteomics focuses on identifying and quantifying peptides and proteins in biological samples. Processing of MS-derived raw data, including deconvolution, alignment, and peptide-protein prediction, has been achieved through various software platforms. However, downstream analysis, including quality control, visualization, and interpretation of proteomics results, remains cumbersome due to the lack of integrated tools to facilitate these analyses. To address this challenge, we developed QuickProt, a series of Python-based Google Colab notebooks for analyzing data-independent acquisition (DIA) and parallel reaction monitoring (PRM) proteomics datasets. These pipelines are designed so that users with no coding expertise can utilize the tool. Furthermore, as open-source code, QuickProt notebooks can be customized and incorporated into existing workflows. As proof of concept, we applied QuickProt to analyze in-house DIA and stable isotope dilution (SID)-PRM MS proteomics datasets from a time-course study of human erythropoiesis. The analysis resulted in annotated tables and publication-ready figures revealing a dynamic rearrangement of the proteome during erythroid differentiation, with the abundance of proteins linked to gene regulation, metabolic, and chromatin remodeling pathways increasing early in erythropoiesis. Altogether, these tools aim to automate and streamline DIA and PRM-MS proteomics data analysis, making it more efficient and less time-consuming.
Keywords: Data mining and visualization, data-independent acquisition, erythropoiesis, liquid chromatography-tandem mass spectrometry, mass spectrometry, parallel reaction monitoring, proteomics, QuickProt, stable isotope dilution
Introduction
Over the last few decades, mass spectrometry (MS)-based proteomics has emerged as one of the most dominant fields for protein profiling [1,2]. Advances in high-resolution MS instruments and computational tools have enabled researchers to characterize large-scale proteomes across various biological systems, with applications ranging from cell state transitions to structural biology and drug design [3,4]. Data-independent acquisition (DIA) is an untargeted MS technique used in proteomics analysis for the identification and relative quantification of peptides and proteins [5,6]. Unlike conventional approaches such as data-dependent acquisition (DDA), where precursor peptide ions are typically selected for fragmentation based on their intensities, DIA uses a series of predefined mass-to-charge (m/z) windows to select and fragment co-eluting precursor ions simultaneously [5,7,8]. This allows for the unbiased and often more reproducible identification of peptides and proteins, making it ideal for discovery proteomics experiments [9,10].
Parallel reaction monitoring (PRM) is a targeted MS strategy in which the mass spectrometer is programmed to analyze a predefined set of m/z values corresponding to peptides of interest [11,12]. By focusing on a predetermined set of target peptides, the technique can achieve low detection limits and high reproducibility. Compared to wide-window DIA, the resulting fragment ion chromatograms from PRM are less complex, simplifying their deconvolution and downstream processing [13,14]. Combined with stable isotope dilution (SID), PRM allows absolute quantification of peptides and proteins [15].
Although many software tools have been developed to identify and quantify peptides/proteins from raw LC-MS/MS data, such as DIA-NN [16], Spectronaut, OpenMS [17] and EncyclopeDIA [18], only a few offer the tools needed for quality control, data visualization, and statistical analysis (e.g., MSstats) [19] among the proteomes of experimental groups in an integrated package. While some tools like Skyline [20], Perseus [21], and MaxQuant [22] do provide some of these services, users are often limited to settings pre-established by the developer, restricting the number and types of possible customizations. Here, we present QuickProt, a series of seven Python-based Google Colab notebooks dedicated to the data mining and visualization of DIA and PRM MS-proteomics datasets (Figure S1 and Table S1).
QuickProt development and implementation
The QuickProt tool is designed to analyze DIA and PRM data through QuickProt-DIA and QuickProt-PRM modules, respectively (Figure 1, Figure S1). It makes proteomics data analysis more efficient and less time-consuming by automating several tasks. Users with no coding expertise can easily utilize these tools. The user only needs to provide two input tables generated from DIA-NN [16] or a single input table from Skyline [20], two widely used, freely available software tools for the initial processing of DIA and PRM data, respectively. By simply clicking on the run button, users can execute all of the code in the notebooks. All outputs from these notebooks, including tables and figures, will be saved in dedicated folders. The resulting spreadsheet tables will be saved in a folder named ‘TABLES’ while the generated figures will be saved as TIFF images at a resolution of 300 dpi in the ‘PLOTS’ folder (Table S1), all within the directory specified by the user (Figure S2). Furthermore, as open-source code, these tools can be modified to suit other bioinformatics workflows, depending on the investigator’s needs. All notebooks can be easily accessed at https://github.com/OmarArias-Gaguancela/QuickProt, where all of the modules have descriptions and links to their respective notebooks or pipelines.
Figure 1: Overview of QuickProt Google Colab notebook tool for DIA and PRM analysis.

In the left panel, the QuickProt-DIA workflow is represented. LC-MS/MS data is exported in RAW file format and subsequently converted to mzML format. Next, mzML files are imported into DIA-NN software for peptide and protein identification and relative quantification. Alternatively, mzML files can be imported into Skyline for quantification while using the DIA-NN spectral library for identification. Skyline (DIA_RESULTS.csv) or DIA-NN (report.tsv and report.stats.tsv) tables from either workflow can be analyzed in the Python-based QuickProt-DIA module composed of QuickProt-DIA (DIA-NN) or QuickProt-DIA (Skyline) notebooks. Both generate annotated tables and figures that fall into the following categories: preprocessing (e.g., sample filtering), quality control (e.g., coefficient of variation), peptide and protein yields, exploratory analysis (e.g., hierarchical clustering dendrogram), protein abundance (e.g., heatmap visualization), and enrichment analysis (e.g., KEGG). In the right panel, the QuickProt-PRM module workflow is represented. LC-MS/MS data in RAW file format is imported into Skyline alongside an isolation list (IsolationList.csv) and a selected spectral library. Following Skyline analysis, the output table ‘PRM_RESULTS_Free_label.csv’ or ‘PRM_RESULTS_Heavy_label.csv’ is imported into QuickProt-PRM for analysis. QuickProt-PRM (Label-free) notebook outputs are generated automatically, whereas, for heavy-label experiments, the QuickProt-PRM (Heavy label) notebook requires additional information (e.g., amount of spiked HLIS) for the generation of absolute quantities. The lower left and right panels show tables with the contents produced from each notebook. Abbreviations: LC-MS/MS: liquid chromatography-tandem mass spectrometry; mzML: Mass Spectrometry Markup Language; MW: molecular weight; HLIS: heavy labeled internal standard; KEGG: Kyoto Encyclopedia of Genes and Genomes; GO: gene ontology.
QuickProt-DIA module overview
The QuickProt-DIA module includes two pipelines, depending on whether the input file is obtained from DIA-NN or Skyline (Figure 1, Figures S1–S7). The QuickProt-DIA (DIA-NN) pipeline uses the output table from DIA-NN, namely ‘report.tsv’, as an input file that contains peptides and proteins identified through its neural network-based spectral library and quantified by a MaxLFQ-like algorithm [16]. On the other hand, the QuickProt-DIA (Skyline) pipeline uses a table from Skyline named ‘DIA_RESULTS.csv’ for processing. In QuickProt-DIA (Skyline), we used the DIA-NN spectral library for identification and Skyline to quantify peptides and proteins. Depending on the input table (DIA-NN or Skyline), the QuickProt-DIA module produces outputs that fall into the following categories: preprocessing, quality control, analysis of peptide and protein yields, exploratory analysis, protein abundance analysis, and gene enrichment analysis (Figure 1).
Preprocessing:
During the preprocessing stage, the user can rename the samples and add experimental group names to the replicates of a given condition (Figure S3A), a function not currently available in DIA-NN but provided in Skyline. In the case of QuickProt-DIA (DIA-NN), we filtered out non-proteotypic peptides and peptides that were assigned multiple Protein IDs in the DIA-NN prediction. Data can be preprocessed using a data manipulation module (Figure S3B) that includes options for imputation (minimum, median, k-Nearest Neighbors, random forest, extra trees, and iterative methods), normalization (z-score, quantile, and median), and/or batch correction using the linear model-based ComBat method [23]. Among the imputation methods available, we recommend using a random forest-based approach, as prior research has consistently shown it to outperform other traditional methods [24,25]. It is important to note that the data manipulation module is optional and should be used on a case-by-case basis. Users are encouraged to assess their data prior to applying any preprocessing steps such as imputation, normalization, or batch correction, and to choose only the methods appropriate for their specific research goals. We also incorporated an optional control point to filter out proteins that do not meet a specified unique peptide threshold (Figure S4). This feature is particularly beneficial for removing proteins that are identified by only a single peptide, which could result in false positives and/or unreliable quantification [26]. Implementing this type of quality control is recommended in discovery MS proteomics experiments [26]. Then, the user should run the code in the ‘Metrics to be calculated’ section, which computes and generates tables that will be used for plotting the data in later stages of the notebook. Finally, both QuickProt-DIA pipelines allow customization of the layout of the samples and experimental groups (Figure S4).
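The unique peptide threshold described above can be illustrated with a minimal sketch. It is not the notebook's own code: the function name `filter_by_unique_peptides`, the `(protein_id, peptide)` row layout, and the example identifiers are all hypothetical stand-ins for the report table.

```python
from collections import defaultdict

def filter_by_unique_peptides(peptide_rows, min_peptides=2):
    """Keep proteins supported by at least `min_peptides` distinct peptides.

    peptide_rows: iterable of (protein_id, peptide_sequence) pairs,
    a simplified, hypothetical stand-in for a DIA-NN or Skyline report.
    """
    peptides_per_protein = defaultdict(set)
    for protein_id, peptide in peptide_rows:
        # A set ignores repeated observations of the same peptide.
        peptides_per_protein[protein_id].add(peptide)
    return {
        protein: sorted(peptides)
        for protein, peptides in peptides_per_protein.items()
        if len(peptides) >= min_peptides
    }

# Hypothetical example rows.
rows = [
    ("PROT_A", "AAAK"), ("PROT_A", "CCCR"), ("PROT_A", "AAAK"),  # 2 distinct peptides
    ("PROT_B", "DDDK"),                                          # single-peptide hit
]
kept = filter_by_unique_peptides(rows, min_peptides=2)
print(kept)
```

With a threshold of two, the single-peptide protein is dropped, which is the behavior the optional control point is meant to provide.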
Quality Control:
For quality control purposes, we included a section to generate a coefficient of variation (CV) plot and correlation plot that displays Spearman’s correlation coefficient and data distribution among the replicates/samples of a given experimental group. For QuickProt-DIA (DIA-NN or Skyline), we added a section for plotting the distribution of MS points across the chromatographic peaks for each experimental group (Figure S4). Since DIA quantification is generally based on MS2 chromatograms, we used MS2 points as the default choice. However, this can be adjusted to MS1, depending on the objective of the analysis. Notably, when using QuickProt-DIA (DIA-NN), users must have the DIA-NN output files ‘report.stats.tsv’ (for cycle time calculation) and ‘report.tsv’ (for full width at half maximum calculation) in their directory. Both parameters are required to calculate the number of points across the peak. By contrast, QuickProt-DIA (Skyline) only requires ‘DIA_RESULTS.csv’ for calculation of MS points across the peak. Altogether, we highly recommend first running these quality control sections of the notebook, as these outputs help to assess the reproducibility and technical variability in the experiment.
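As a rough illustration of the two quality control metrics, the sketch below computes a percent CV for one protein across replicates and estimates the number of MS points across a peak from the peak width and the cycle time. The exact formula used in the notebooks is not stated here, so dividing the full width at half maximum by the cycle time is an assumption, as are the units (FWHM in minutes, cycle time in seconds) and the function names.

```python
import statistics

def percent_cv(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0

def points_across_peak(fwhm_min, cycle_time_s):
    """Estimate MS points across a peak (assumed: FWHM / cycle time).

    fwhm_min: full width at half maximum, assumed to be in minutes.
    cycle_time_s: instrument cycle time, assumed to be in seconds.
    """
    return (fwhm_min * 60.0) / cycle_time_s

# Hypothetical values for one protein in two replicates.
replicate_abundances = [90.0, 110.0]
cv = percent_cv(replicate_abundances)

# Hypothetical peak: 0.2 min wide at half maximum, 1.5 s cycle time.
pts = points_across_peak(fwhm_min=0.2, cycle_time_s=1.5)
print(round(cv, 1), pts)
```

In practice these values would be computed per protein or per precursor and then summarized per experimental group, as in the violin and distribution plots.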
Peptide and Protein Yield:
Next, QuickProt-DIA generates annotated tables and plots for the total number of peptides and proteins found in a given experiment (Figure S5). Additionally, it provides a comprehensive breakdown of the number of shared proteins between experimental groups and the number of unique proteins in each experimental group. These results are concurrently extracted into annotated CSV tables, which are deposited in a subfolder called ‘SHARED_UNIQUE_PROTEINS’ within the ‘TABLES’ folder, allowing easy access for the user (Table S1). These pipelines also display a density plot for the number of peptides and calculate the median number of peptides per protein in the dataset. Additionally, users can enter the name of a specific protein or gene of interest to automatically visualize the number of peptides detected for that protein in the dataset (Figure S5).
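The shared and unique protein breakdown amounts to set operations on the protein lists of each experimental group. The following sketch assumes a plain dictionary of protein identifiers per group; the function name and the example identifiers are hypothetical.

```python
def shared_and_unique(proteins_by_group):
    """Return (core proteome, unique proteins per group).

    proteins_by_group: {group_name: iterable of protein IDs}.
    The core proteome is the intersection of all groups; a protein is
    unique to a group if it is found in no other group.
    """
    groups = {name: set(ids) for name, ids in proteins_by_group.items()}
    core = set.intersection(*groups.values())
    unique = {}
    for name, ids in groups.items():
        others = set().union(*(v for k, v in groups.items() if k != name))
        unique[name] = ids - others
    return core, unique

# Hypothetical identifications for three time points.
example = {
    "D0": ["A", "B", "C"],
    "D2": ["B", "C", "D"],
    "D14": ["C", "E"],
}
core, unique = shared_and_unique(example)
print(core, unique)
```

Pairwise overlaps between any two groups can be obtained the same way with `groups[a] & groups[b]`.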
Exploratory Analysis:
In the exploratory analysis, the user can display a hierarchical clustering dendrogram, correlation matrix, principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and variable importance in projection (VIP) scores plots to identify groupings and connectivity relationships between the experimental groups tested in the experiment (Figure S5).
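To make the correlation matrix concrete, the sketch below computes a rank-based correlation between group-level abundance profiles. It assumes the matrix uses Spearman's ρ, as in the quality control correlation plot; the text does not state which coefficient the notebook uses for this matrix, and all function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_matrix(profiles):
    """profiles: {group: abundances in the same protein order}."""
    names = list(profiles)
    return {a: {b: spearman(profiles[a], profiles[b]) for b in names} for a in names}

# Hypothetical abundance profiles over the same four proteins.
profiles = {"D0": [1.0, 2.0, 3.0, 4.0], "D2": [2.0, 4.0, 6.0, 9.0], "D14": [4.0, 3.0, 2.0, 1.0]}
matrix = correlation_matrix(profiles)
print(matrix["D0"]["D2"], matrix["D0"]["D14"])
```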
Protein Abundance:
For protein abundance visualization (Figure S6), we introduce a protein ranking tool where the user can arrange the proteins identified in each experimental group and rank them based on their estimated abundance levels. We provide a dedicated section for visualizing relative protein abundances through a volcano plot. For this analysis, a t-test is performed. Users can compare the fold change (FC) and P-values derived from the comparison of protein abundances between two experimental groups. To make it more interactive and user-friendly, we added drop-down menus allowing users to select P-values (0.05 to 0.001) and FC thresholds (0.5 to 10) that fit the experiment’s needs. These values are then converted to -log10 and log2 scales for plotting. Accordingly, a list of the total number of proteins analyzed is generated as an output table, along with lists of only upregulated or downregulated proteins in a subfolder called ‘VOLCANO_PLOT_VALUES’ within the ‘TABLES’ folder (Table S1). We went a step further by creating a section (Figure S6) that allows the user to visualize individual protein abundances across experimental samples by simply selecting the name of the protein or gene of interest from a drop-down menu and then clicking a ‘Generate Plot’ button to yield a bar plot with its respective statistical analysis. For t-test analysis, the user selects the reference experimental group against which the statistical analysis will be performed. The bars of the plot will be automatically assigned an asterisk for statistically significant differences, or the letters ‘n.s.’ will be added on top of the bar when no statistical differences in protein abundances are found between the experimental groups of interest. We also incorporated an ANOVA with Tukey’s post-hoc test option and used the Compact Letter Python library to automatically indicate significant differences among multiple groups (Figure S6).
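The conversion to log2 and -log10 scales and the split into upregulated and downregulated lists can be sketched as follows. The t-test itself is omitted; the sketch starts from fold changes and P-values that are assumed to be already computed. The function name, the input layout, and the exact inequality conventions (P ≤ cutoff, fold change strictly beyond the cutoff) are assumptions.

```python
import math

def classify_volcano(results, p_cutoff=0.05, fc_cutoff=2.0):
    """Label each protein as 'up', 'down', or 'n.s.'.

    results: {protein: (fold_change, p_value)}, with fold_change defined
    as mean(group B) / mean(group A). Assumes fc_cutoff > 1.
    Returns {protein: (log2 fold change, -log10 P-value, label)}.
    """
    log2_cut = math.log2(fc_cutoff)
    table = {}
    for protein, (fold_change, p_value) in results.items():
        x = math.log2(fold_change)   # x-axis of the volcano plot
        y = -math.log10(p_value)     # y-axis of the volcano plot
        if p_value <= p_cutoff and x > log2_cut:
            label = "up"
        elif p_value <= p_cutoff and x < -log2_cut:
            label = "down"
        else:
            label = "n.s."
        table[protein] = (x, y, label)
    return table

# Hypothetical fold changes and P-values.
results = {
    "PROT_A": (4.0, 0.001),   # large increase, significant
    "PROT_B": (0.25, 0.01),   # large decrease, significant
    "PROT_C": (3.0, 0.2),     # large increase, not significant
    "PROT_D": (1.2, 0.001),   # significant but small change
}
volcano = classify_volcano(results)
print({protein: row[2] for protein, row in volcano.items()})
```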
The user has the option to plot the abundances of all quantified proteins in each experimental group or sample via clustering heatmap visualization (Figure S6). We added a section where the user can input a list of proteins or genes of interest to be displayed in a heatmap. In this case, neither imputation nor clustering is conducted. Missing values are depicted with the letters ‘n.d.’ whereas the normalized values for the abundances of the selected proteins are shown on the log2 scale. Alternatively, in another section, we used the ‘IterativeImputer’ tool from the Scikit-learn machine learning library [27], which uses an iterative approach to handle missing values, and then this data is used to display a clustering heatmap comparing the proteomes among experimental groups.
Enrichment Analysis:
We incorporated the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) tools in the enrichment analysis section of the QuickProt-DIA pipelines to identify biological pathways, processes, components, and functions that are enriched in a dataset. Hence, QuickProt-DIA facilitates KEGG and GO analyses on DIA datasets to provide a systems view of the biological processes that are affected in an experimental group, which can help prioritize proteins for potential follow-up experiments. In these pipelines, the user can choose whether to perform the analysis on the list of total proteins of a given experimental group, the total number of proteins compared in the volcano plot, or exclusively on upregulated or downregulated proteins derived from the same volcano plot analysis (Figure S7).
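For readers unfamiliar with how enrichment is typically scored, the sketch below shows a generic over-representation test based on the hypergeometric distribution. This is a common approach for KEGG and GO analyses in general, not necessarily the calculation performed by the tools used in QuickProt; the function name and the counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """One-sided over-representation P-value, P(X >= n_overlap).

    n_background: proteins in the background set
    n_in_term:    background proteins annotated to the pathway or term
    n_selected:   proteins in the list being tested (e.g., upregulated)
    n_overlap:    selected proteins annotated to the pathway or term
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_term, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20 background proteins, 5 in the pathway,
# 4 proteins selected, 3 of which belong to the pathway.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
print(p_value)
```

In a real analysis, the P-values for all tested pathways would additionally be corrected for multiple testing.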
QuickProt-PRM module overview
QuickProt-PRM module comprises two pipelines, namely, QuickProt-PRM (Label-free) and QuickProt-PRM (Heavy label), for processing of PRM data from Skyline through input tables titled ‘PRM_RESULTS_Free_label.csv’ or ‘PRM_RESULTS_Heavy_label.csv’, respectively (Figure 1, Figures S8–S11). Like the DIA module, QuickProt-PRM also produces outputs including categories such as preprocessing, quality control, peptide and protein yields, exploratory analysis, and protein abundance analysis. During the preprocessing step, customization of the sample and experimental group layout is provided for both notebooks (Figures S8–S9). QuickProt-PRM (Label-free) is dedicated to relative abundance quantification, whereas QuickProt-PRM (Heavy label) aims to calculate the number of molecules of a given protein in a sample using spiked-in isotopically heavy labeled internal standard (HLIS) proteins or peptides. These values can then be converted to copies per cell or organelle (e.g., nucleus) with the appropriate conversion factors. To accomplish such absolute calculations, the user must provide the molecular weight of the spiked HLIS, the amount of spiked HLIS, the known amount of protein per nucleus or cell used in the experiment, and the amount of sample injected into the LC-MS/MS (Figure S9). In the QuickProt-PRM (Label-free) and QuickProt-PRM (Heavy label) notebooks, users have the option to normalize the data based on a reference protein and/or apply the data manipulation module (Figure S10). Just like in the QuickProt-DIA notebooks (Figure S5), the QuickProt-PRM notebooks also allow the generation of CV, correlation, and MS2 point plots during the quality control stage. Users can plot the number of quantified peptides for a given protein. Exploratory analysis features include hierarchical clustering dendrograms, a correlation matrix, PCA, PLS-DA, and a VIP scores plot.
As in QuickProt-DIA, QuickProt-PRM (Label-free) allows the user to visualize the estimated relative abundance of proteins via protein ranking, volcano, bar plots, and heatmaps. For QuickProt-PRM (Heavy label), absolute quantities of a protein (e.g., #copies/nucleus or #copies/cell) can be visualized through bar and/or line trend plots. We provide a section where users can select a given protein of interest and assess the statistical differences in abundance among experimental groups. In addition, one of the advantages of using PRM proteomics with HLIS is the ability to compare stoichiometric changes between different proteins. QuickProt-PRM leverages this by enabling an option to display and assess the statistical differences (Figure S11) between different proteins in an experimental group. In both types of comparisons, ANOVA with Tukey’s post-hoc statistical tests is performed, and the results are displayed automatically in the bar plots. We used the compact letter display to automatically depict significant differences among multiple groups. Finally, QuickProt-PRM allows users to display the abundance of all the targeted proteins for all experimental groups or by sample via clustering heatmaps. There is also an option to specifically display certain proteins of interest in the heatmap depending on the user’s requirements.
Lastly, we introduce two more modules, named QuickProt-PepSeq and QuickProt-ID Search. The first module is designed to evaluate the number of peptides identified in a specific amino acid sequence (e.g., domain) from a DIA dataset (Figure S12). This module is compatible with inputs from DIA-NN or Skyline reports processed using QuickProt-DIA, specifically ‘report_updated.csv’ or ‘DIA_RESULTS_UPDATED.csv’, respectively. In both notebooks, users simply need to input the sequence of interest and click the run button to produce an Excel spreadsheet (Table S1) with annotated tabs, displaying the number of peptide matches and their respective sequences by sample and experimental group. A bar plot of the number of peptide matches is also displayed (Figure S12). QuickProt-ID Search maps protein IDs in the UniProt database to a list of gene names provided by the user (Figure S13). This is especially helpful in cases where the investigator deals with thousands of genes.
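The idea behind QuickProt-PepSeq can be sketched as a substring search of the identified peptides against the sequence of interest. The matching rule used here (a peptide counts only if it lies entirely within the region) is an assumption, and the function name, sample names, and sequences are hypothetical.

```python
def peptides_in_region(peptides_by_sample, region_sequence):
    """Return the distinct peptides per sample that fall inside a region.

    peptides_by_sample: {sample: list of identified peptide sequences}
    region_sequence: amino acid sequence of interest (e.g., a domain)
    Assumption: a peptide matches only if it is fully contained in the region.
    """
    return {
        sample: sorted({pep for pep in peptides if pep in region_sequence})
        for sample, peptides in peptides_by_sample.items()
    }

# Hypothetical region and peptide identifications.
region = "MKTAYIAKQRQISFVK"
identified = {
    "D0_rep1": ["TAYIAK", "GGGGK"],
    "D2_rep1": ["TAYIAK", "QISFVK", "TAYIAK"],
}
matches = peptides_in_region(identified, region)
counts = {sample: len(hits) for sample, hits in matches.items()}
print(matches, counts)
```

The per-sample counts correspond to the values that would be shown in the bar plot of peptide matches.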
DIA dataset and analysis using QuickProt-DIA
DIA dataset:
To demonstrate the performance of QuickProt-DIA, we analyzed an in-house generated DIA dataset. We employed a well-characterized ex vivo culture system method in which cord blood-derived human multipotent hematopoietic stem and progenitor cells (HSPCs) were induced to differentiate along the erythroid lineage [28,29] (Method S1). Samples were collected on days 0, 2, 4, 6, 8, 10, 11, 12, and 14, representing sequential stages of erythropoiesis, thus yielding cell lineages from HSPCs (day 0) to polychromatophilic and orthochromatic erythroblasts (day 14). Nuclear extracts were prepared from the cells (Method S1) and used to generate peptide samples for LC-MS/MS analysis on an Orbitrap Eclipse mass spectrometer operated in DIA mode (Method S2–S4; Table S2 and S4). We used DIA-NN to generate a spectral library for peptide and protein identification due to its high sensitivity and processing speed. Peptide-to-protein quantification was determined with Skyline, where chromatograms were extracted, and qualitative analysis of each product ion peak was performed, providing an additional safeguard to enhance data reliability. The quantitative information was exported from Skyline, which was then processed using the QuickProt-DIA (Skyline) pipeline (Figure 2).
Figure 2: Comprehensive analysis of erythropoiesis time-course DIA proteomics data via QuickProt-DIA (Skyline) notebook.

Samples on days 0, 2, 4, 6, 8, 10, 11, 12, and 14 (D0–14) during erythroid differentiation were collected for DIA discovery proteomics analysis. A) Coefficient of variation, with median values depicted inside each violin plot. B) Spearman’s correlation coefficient (ρ) and data distribution plots among replicates for day 0. C) Distribution of MS2 data points across the chromatographic peak, with median values depicted for the proteomes of each experimental group. D) Number of peptides identified in each experimental group, with median values depicted on top of each bar graph. E) Number of proteins identified in each experimental group, with median values depicted on top of each bar graph. Data represent the median ± SD of two biological replicates. F) Hierarchical clustering dendrogram. G) Correlation matrix among experimental groups. H) Clustering heatmap. I) Volcano plot for differential expression between D0 and D2. J) Volcano plot for differential expression between D0 and D14. For I) and J), a P-value ≤ 0.05 and an absolute fold-change (FC) threshold of > 2 were chosen for the analyses.
DIA analysis results:
For this analysis, the minimum peptide threshold per protein was set at two for all samples in the dataset. Then, QuickProt-DIA was used to evaluate the quality and reproducibility of the DIA data. The CVs varied from 7.3% to 18.4% in the experimental groups tested (Figure 2A). As these values were below the recommended 20% threshold [30–32], they indicate acceptable biological variability among the replicates. The correlation plot for Day 0 replicates shows a Spearman’s rank correlation coefficient of 0.959 (Figure 2B), and similar results were obtained for the other days of the time course (Figure S14). We also examined the distribution of MS2 points across chromatographic peaks; in this experiment, the median ranged from 8 to 9 MS2 points (Figure 2C). This range has been reported to be optimal for appropriately representing the peptide peak shape for accurate quantification [33]. Lastly, we assessed the median number of peptides per protein in each experimental group and found a median value of 4 to 5 for peptides per protein (Figure S15A). These metrics suggest that the LC-MS/MS method used was able to meet certain quality control requirements, ensuring robust quantitative accuracy for the experiment.
Regarding peptide and protein identification during the time course study, the total number of peptides varied from 11,812 to 30,398 (Figure 2D), yielding 2,072 to 3,891 proteins (Figure 2E), respectively. Notably, the core proteome, representing the number of proteins shared between all experimental groups, consisted of 1,813 identified proteins. The number of shared proteins between different samples varied from 96 to 1,958 (Figure S15B). Day 8 had the highest number of unique proteins (75), and Days 6 and 12 had the lowest number of unique proteins (3–4) (Figure S15C).
Inspection of the outputs from the exploratory analysis tools revealed a marked grouping at certain stages of erythropoiesis progression. Hierarchical clustering identified two major branches. The first branch comprised two sub-branches: one with day 0, and the other with days 2, 4, and 6. By contrast, the second branch contained two major sub-branches; one comprised days 12 and 14, and the other days 8, 10, and 11 (Figure 2F). In the correlation matrix, days 0 and 2 had the lowest levels of correlation compared to the other days of the time course, particularly when compared to days 12 and 14 (Figure 2G). Inspection of the PLS-DA, but not the PCA, revealed a grouping pattern in which samples from the later stages of the time course (days 12–14) separated from earlier days (Figure S16A–B). The VIP scores plot identified the top five proteins contributing to the grouping pattern observed in the PLS-DA plot (Figure S16C). This suggests timely and dynamic changes from early to late stages of erythropoiesis. When the data were displayed as a clustering heatmap of protein abundance, the protein expression profiles for each day were readily visualized, and differentially expressed proteins could be easily detected (Figure 2H). Given the important role of chromatin-modifying complexes in transcriptional regulation during erythropoiesis through modulation of chromatin accessibility [34], we used QuickProt-DIA to inspect the expression profiles of subunits of a few chromatin-modifying complexes [35], including the following: BAF (BRG1- or BRM-associated factor), ISWI (Imitation Switch), NuRD (Nucleosome Remodeling and Deacetylase), INO80 (Inositol requiring 80), SAGA (Spt-Ada-Gcn5 Acetyltransferase), ATAC (Ada Two-A Containing), SRCAP (Snf2-related CREBBP Activator Protein), and ATR-X (Alpha Thalassemia/Mental Retardation Syndrome X-linked) (Figure S17).
QuickProt-DIA generated heatmaps that allowed easy visualization of the expression patterns and relative abundances for the members of these complexes over the time course. Interestingly, the abundances of several members of the BAF, e.g., ARID1A, and ISWI, e.g., SMARCA5 complexes, peaked at days 2 and 8, implying a requirement for these subunits at these specific time points during erythropoiesis (Figure S17).
Next, we used QuickProt-DIA to evaluate the protein abundance rankings for proteomes on each day of erythropoiesis (Figure S18). Notably, AHNAK was consistently the most abundant protein at all time points.
Supported by the initial evidence from the exploratory analysis, we hypothesized that the number of upregulated proteins would increase significantly from the initial starting point on day 0 throughout certain time points during erythropoiesis progression. To test this hypothesis, we used the volcano plot tool (Figure 2I–J; Figure S19) in the QuickProt-DIA (Skyline) pipeline and performed a statistical analysis of the proteomes of day 0 compared to each subsequent day. Like the day 0 vs. day 2 comparison (Figure 2I), when day 0 was compared to days 4–11, approximately 93% of the significantly changed proteins were upregulated, while the rest were downregulated. Notably, after day 11, the number of upregulated proteins declined, continuing until day 14 (Figure 2I and J; Figure S19). On day 14, out of the 551 significantly changed proteins, around 81% were downregulated, and 19% were upregulated (Figure 2J). Consistent with these findings, when inspecting individual selected proteins (e.g., ARID1A, SMARCC2, or BAZ1B), we observed a dramatic spike on day 2 (ARID1A and SMARCC2) or day 3 (BAZ1B) and a decrease on days 12 and 14 compared to day 0 (Figure S20). These data reveal the dynamics of the proteome during erythropoiesis and suggest that specific proteins may need to be upregulated in the early stages, whereas certain proteins may require downregulation in the later stages for proper differentiation.
Lastly, we used the KEGG and GO tools in the QuickProt-DIA notebook to evaluate the biological role of proteins identified during different stages of erythropoiesis (Figures S21–S28). KEGG and GO data showed no major differences when inspecting the total number of proteins (Figure S21, S24–S26). However, focusing on significantly upregulated or downregulated proteins revealed some interesting findings. For instance, in the day 0 vs. day 2 comparison, upregulated proteins at day 2 were enriched in processes like spliceosome activity, primary metabolism (e.g., fatty acid metabolism), and ATP-dependent chromatin remodeling (Figure S22A). The enrichment of proteins involved in chromatin remodeling at day 2 is consistent with the notion of priming at the early stages of development, when the chromatin is relatively open [36], but transcription is still low. Conversely, in the day 0 vs. day 14 comparison, pathways associated with phagosome vesicle production were most enriched (Figure S22B) at day 14, likely linked to the extrusion of nuclei in the late stages of erythropoiesis [37–39]. By contrast, downregulated proteins in the day 0 vs. day 2 comparison exclusively affected pyruvate and glycolysis metabolism, while the day 0 vs. day 14 comparison showed downregulation in pathways related to spliceosome activity, DNA replication, nucleocytoplasmic transport, carbon metabolism, among others (Figure S23). This analysis indicates that certain pathways enriched on day 2 are downregulated at later stages of erythroid development, suggesting a timely and dynamic regulation during erythropoiesis.
We assessed molecular function, cellular components, and biological processes in the GO analysis (Figures S24–S28). For example, the upregulated proteins on day 2 showed enrichment for translation, ribosome biogenesis, nucleic acid, and protein binding processes (Figure S27). On day 14, cellular and protein localization, transport and binding, and ATP-related processes were prominent (Figure S27). Conversely, downregulated proteins on day 2 favored processes such as hemoglobin binding and cellular detoxification, while on day 14, protein and mRNA binding and processing, and chromosome reorganization were affected (Figure S28). Altogether, the QuickProt-DIA (Skyline) pipeline efficiently provides an overview of the biological processes underlying erythroid differentiation, establishing an initial framework for evaluating complex proteomic datasets to extract meaningful biological insights.
PRM dataset and analysis using QuickProt-PRM
PRM dataset:
The mammalian SWI/SNF complex, also known as the BAF complex, is an ATP-dependent chromatin remodeling complex that plays important roles in gene regulation during cell differentiation [40–46]. This remodeler is organized into three distinct configurations: canonical-BAF (cBAF), polybromo-associated-BAF (PBAF), and non-canonical-BAF (ncBAF) [40,47]. Each complex comprises 12–15 subunits, with some subunits shared between different complexes, and many of the subunits have one or more paralogues. Mutations in protein components of BAF have been linked to numerous forms of cancer, neurodevelopmental disorders, and defective erythropoiesis [40,41,44,48,49]. An analysis of BAF subunit copy number during erythropoiesis would provide insights into the relative stoichiometries of the subunits and their expression patterns during erythropoiesis, which could guide future studies on the roles of individual BAF subunits in chromatin remodeling and gene expression during this process [44]. Towards this aim, we determined the absolute abundances of 21 subunits (including paralogues) of the cBAF complex (all subunits except for ACTB) by SID-PRM mass spectrometry during the time course of erythropoiesis. This dataset was then analyzed using QuickProt-PRM (Heavy label) for easy visualization and interpretation. For this data generation, we created heavily labeled concatemer proteins (QconCATs) [50,51] containing up to 4 peptides for each cBAF subunit. Known amounts of these QconCATs were then spiked into the nuclear extract fractions, followed by enzymatic digestion (Method S2–3; Table S3–S7). Peptides obtained after co-digestion were cleaned and analyzed by PRM-MS to acquire targeted spectral information for the cBAF subunits (Method S3–4; Table S3–S4). Raw data (Table S4) was processed in Skyline, where, after chromatogram extraction, each product ion peak was qualitatively monitored to ensure the reliability of the corresponding quantitative information. 
After analysis and refinement, the values were exported and analyzed using the QuickProt-PRM (Heavy label) pipeline (Figure 3). The preprocessing step involved several transformations (Method S5). First, we calculated the number of femtomoles of the spiked HLIS for each QconCAT (Table S6). Next, we calculated the heavy-to-light ratios for each peptide, which were used to determine the number of femtomoles of the light peptide. We then determined the number of protein copies using Avogadro’s number. Then, we used the amount of injected nuclear extract per sample to calculate the number of injected nuclei, based on the measured protein amount per nucleus for each day in the time course (Table S7). Finally, we calculated the number of protein copies per nucleus by dividing the total number of protein copies by the number of injected nuclei for each day (Method S5).
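The preprocessing arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the QuickProt-PRM code itself: the function names, argument names, and units (ng of spiked QconCAT, ng of injected nuclear extract, pg of protein per nucleus) are hypothetical, each QconCAT molecule is assumed to release one copy of each signature peptide, and the exact formulas in Method S5 may differ in detail.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def spiked_fmol(qconcat_ng, qconcat_mw):
    """Femtomoles of spiked QconCAT from its mass (ng) and molecular weight (g/mol)."""
    moles = qconcat_ng * 1e-9 / qconcat_mw
    return moles * 1e15


def copies_per_nucleus(heavy_area, light_area, heavy_fmol,
                       injected_protein_ng, protein_pg_per_nucleus):
    """Protein copies per nucleus for one peptide in one sample (hypothetical helper)."""
    # Heavy-to-light ratio of the integrated peak areas.
    heavy_to_light = heavy_area / light_area
    # The light (endogenous) amount scales inversely with the heavy-to-light ratio.
    light_fmol = heavy_fmol / heavy_to_light
    # Convert femtomoles to molecules with Avogadro's number.
    copies = light_fmol * 1e-15 * AVOGADRO
    # Number of injected nuclei from the injected protein mass (ng converted to pg).
    nuclei = injected_protein_ng * 1000.0 / protein_pg_per_nucleus
    return copies / nuclei


# Illustrative values only; none of these numbers come from the study.
heavy = spiked_fmol(qconcat_ng=50.0, qconcat_mw=50000.0)
print(copies_per_nucleus(2e6, 1e6, heavy, 1000.0, 10.0))
```

In practice, the per-peptide values for a protein would then be summarized (for example, by the median) before comparing time points.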
Figure 3: Targeted analysis of cBAF proteins during erythropoiesis via QuickProt-PRM (Heavy-label) notebook.

Erythroid samples were spiked with heavy-labeled concatamers (QconCATs) for absolute quantification (#copies/nucleus) of 21 cBAF proteins during erythroid differentiation via PRM proteomics. A) Coefficient of variation, with median values depicted inside each violin plot. B) Hierarchical clustering dendrogram. C) Clustering heatmap for proteins in the cBAF complex. D) Trend line plot for ARID1 paralogues, ARID1A and ARID1B. E) Bar plots for the absolute abundance of ARID1A and ARID1B, each plotted independently throughout the time course. F) Stoichiometric evaluation between ARID1A and ARID1B was conducted at four time points (D0, D2, D8, and D11). Data represent the median ± SD of two biological replicates. A compact letter display is automatically assigned on top of every bar. Different letters denote significant differences (P ≤ 0.05) by ANOVA with Tukey’s post-hoc test.
PRM analysis results:
Quality control analysis revealed CVs ranging from 4.3% to 17.1% (Figure 3A). A Spearman’s rank correlation coefficient of approximately 0.9 among the samples (Figure S29) and a median of 28–32 MS2 points across chromatographic peaks (Figure S30A) were observed. These data indicate excellent reproducibility among the replicates from each experimental group. Exploratory analysis revealed two major groupings: the first comprised days 0, 4, 6, 12, and 14, while the second included days 2, 8, 10, and 11 (Figure 3B). Inspection of the correlation matrix showed high-level similarity among days, except for day 14, which had the lowest level of correlation compared to the others (Figure S30B). Both PCA and PLS-DA revealed a grouping pattern in which early and late stages of erythropoiesis appear separated, particularly between day 0 and day 14 (Figure S31A–B). The VIP scores plot identified the top five proteins contributing to the PLS-DA grouping pattern: SMARCD3, BCL7A, ACTL6B, DPF1, and SMARCD1 (Figure S31C). This exploratory analysis provides insights into the separation and relatedness of samples across the erythropoiesis time course based on the absolute protein abundance profiles of cBAF subunits.
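For readers who wish to reproduce the two reproducibility metrics outside the notebook, the following standard-library sketch shows how a coefficient of variation and a Spearman rank correlation are computed. The function names are hypothetical and the code is not taken from QuickProt-PRM, which may rely on other libraries; it only illustrates the definitions of the two statistics.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (%): sample standard deviation over the mean."""
    return stdev(values) / mean(values) * 100.0


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative copies-per-nucleus values for two replicates (not study data).
replicate_1 = [2400.0, 1800.0, 30000.0, 700.0]
replicate_2 = [2300.0, 1900.0, 33000.0, 650.0]
print(cv_percent([replicate_1[0], replicate_2[0]]))
print(spearman(replicate_1, replicate_2))
```

Because Spearman's coefficient depends only on ranks, it is insensitive to the large dynamic range between the most and least abundant subunits.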
Employing the peptide and protein yield option in QuickProt-PRM (Heavy label), a total median peptide yield of 50–54 was detected for up to 21 cBAF proteins (Figure S32A–B). Peptide distribution showed a median of 2–3 peptides per protein detected (Figure S32C). Heatmap clustering and protein abundance ranking analysis revealed that among the 21 proteins quantified, ACTL6A was consistently the most abundant, whereas DPF1, DPF3, and SS18L1 were frequently the least abundant (Figure 3C; Figure S33; Table S8–9). Further, we inspected the relative stoichiometry of all cBAF paralogues during the erythroid time course (Figure 3D; Figure S34). For example, the ARID1 paralogues, ARID1A and ARID1B, showed a similar number of copies per nucleus at day 0 (2,371 and 1,828 copies, respectively; P > 0.05). Then, rapid and significant spikes in abundance were observed on days 2, 8, and 11, followed by a decline until day 14 for both paralogues (Figure 3D–E). Notably, ARID1A yielded more copies per nucleus than ARID1B at most points of the time course (average difference of 2,763 copies on days 2–12; P ≤ 0.05) (Figure 3D, F). Discordant expression levels between paralogues were also observed for ACTL6A over ACTL6B, DPF2 over DPF1 and DPF3, SMARCA4 over SMARCA2, and to a lesser extent, for BCL7A over BCL7B and BCL7C, and SS18 over SS18L1 (Figure S34). Interestingly, SMARCD2 had a major spike exclusively on day 11, then dramatically decreased until day 14. Although SMARCD2 was mostly elevated relative to the other paralogues (SMARCD1 and SMARCD3), it is important to note a shift in relative abundance on day 14, where SMARCD3 was higher than the other two paralogues (Figure S34; Table S8–9). The observed changes in the relative stoichiometry for these paralogues during the time course could indicate a change in the composition [52,53] of cBAF or PBAF during erythroid development.
Unlike the other paralogue groups, in which one paralogue is clearly more abundant than the others, SMARCC1 and SMARCC2 yielded similar copy numbers at several time points, with some exceptions (e.g., day 14) (Figure S34; Table S8–9). Lastly, to complete the analysis of cBAF members, we analyzed the abundance trend of SMARCE1 (no paralogue) and found abundance spikes on days 2, 8, and 11 (Figure S34; Table S8–9).
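The paralogue comparisons above reduce to simple operations on a table of copies per nucleus: taking the median across replicates, ranking proteins by abundance, and dividing one paralogue by another. The sketch below illustrates this with hypothetical function names; it is not the QuickProt-PRM implementation. The day 0 values for ARID1A and ARID1B are the ones reported in the text, whereas the replicate values are invented for illustration only.

```python
from statistics import median


def median_across_replicates(replicate_values):
    """Median copies per nucleus across biological replicates."""
    return median(replicate_values)


def rank_by_abundance(copies):
    """Protein names sorted from most to least abundant."""
    return sorted(copies, key=copies.get, reverse=True)


def paralogue_ratio(copies, numerator, denominator):
    """Relative stoichiometry of two paralogues (numerator : denominator)."""
    return copies[numerator] / copies[denominator]


# Day 0 copies per nucleus as reported in the text.
day0 = {"ARID1A": 2371, "ARID1B": 1828}
print(rank_by_abundance(day0))
print(round(paralogue_ratio(day0, "ARID1A", "ARID1B"), 2))

# Hypothetical replicate values, shown only to illustrate the median step.
print(median_across_replicates([2000.0, 3000.0]))
```

A ratio near 1 corresponds to paralogues with similar copy numbers, as described for SMARCC1 and SMARCC2, whereas ratios far from 1 indicate that one paralogue dominates.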
The large range of subunit abundances at each time point (e.g., 702–33,299 copies/nucleus on day 8, Table S9) likely reflects the fact that some subunits are shared between the cBAF, PBAF and/or ncBAF complexes, as well as with other complexes, whereas other subunits are cBAF-specific. In addition, it likely reflects the presence of BAF complexes composed of different combinations of paralogous subunits. In future studies, the composition and/or stoichiometry of cBAF complexes during erythropoiesis can be studied by isolating the complexes before quantitative MS analysis.
The data mining and visualization tools from the QuickProt-PRM notebook allowed us to efficiently assess the absolute quantities of members of the cBAF complex in nuclear extracts, highlighting their relative stoichiometries and expression characteristics during erythropoiesis. This framework will be used in follow-up experiments to target specific cBAF subunits for knockdown or overexpression studies to test their functional importance during erythroid differentiation.
Conclusions and Outlook
QuickProt provides tools carefully designed to analyze and visualize data from DIA and PRM MS-proteomics experiments. In this article, we demonstrated the utility of the developed tools using in-house generated DIA and PRM datasets, offering insights into the biological processes underlying erythropoiesis. QuickProt offers comprehensive solutions beyond DIA data analysis (QuickProt-DIA). We provide notebooks for the analysis of PRM data (QuickProt-PRM), supporting both label-free and heavy-label experiments for relative and absolute quantification, respectively. Moreover, we have incorporated two more tools: QuickProt-PepSeq, to report the number of unique peptide matches to a user-defined protein sequence in DIA datasets, and QuickProt-ID Search, to retrieve UniProt protein IDs from gene names. We leveraged the accessibility of Google Colab notebooks to build our Python-based code within them. This makes QuickProt more user-friendly and versatile compared to other tools in the field, which typically require some coding skills. Furthermore, as an open-source platform, users can modify and customize the tools to suit the needs of specific projects.
Supplementary Material
Figure S1: Scheme of the QuickProt tool for proteomic data mining and visualization. QuickProt comprises four modules: QuickProt-DIA, QuickProt-PRM, QuickProt-PepSeq, and QuickProt-ID Search. QuickProt-DIA consists of two notebooks or pipelines: QuickProt-DIA (DIA-NN) and QuickProt-DIA (Skyline). Additionally, QuickProt-PRM includes two notebooks: QuickProt-PRM (Heavy label) and QuickProt-PRM (Label-free). QuickProt-PepSeq features two pipelines: QuickProt-PepSeq (DIA-NN) and QuickProt-PepSeq (Skyline). Lastly, QuickProt-ID Search has a notebook of the same name.
Figure S2: Interface for changing directories and installing Python library packages. A) Before (upper panel) and after (lower panel) views resulting from using the change directory interface. B) Before (upper panel) and after (lower panel) views following execution of the “Install necessary packages” section. Both A) and B) represent the initial code cells found in all QuickProt notebooks.
Figure S3: Overview of the QuickProt-DIA (DIA-NN) or (Skyline) notebook data preprocessing interface for sample/group name selection and data manipulation sections.
Figure S4: Overview of the QuickProt-DIA (DIA-NN) or (Skyline) notebook data preprocessing interface for peptide threshold, parameter calculation, and sample layout rearrangement.
Figure S5: Overview of the quality control, peptide and protein yields, and exploratory analysis interface in the QuickProt-DIA or QuickProt-PRM notebooks.
Figure S6: Overview of the protein abundance analysis interface in the QuickProt-DIA or QuickProt-PRM (Label-free) notebooks.
Figure S7: Overview of the enrichment analysis interface in the QuickProt-DIA (DIA-NN) or (Skyline) notebooks.
Figure S8: Overview of the data preprocessing interface in the QuickProt-PRM (Label-free) notebook.
Figure S9: Overview of the data preprocessing interface in the QuickProt-PRM (Heavy label) notebook.
Figure S10: Overview of the normalization and data manipulation modules in the QuickProt-PRM (Label-free) or (Heavy label) notebooks.
Figure S11: Overview of the protein abundance analysis interface in the QuickProt-PRM (Heavy label) notebooks.
Figure S12: Overview of QuickProt-PepSeq workflow. Outputs from DIA-NN or Skyline, coupled with QuickProt-DIA, can be imported into the QuickProt-PepSeq notebook (part of the QuickProt notebook series). A given region of interest (e.g., a protein domain) is then entered into the notebook to be mapped against the DIA data of a specific experiment. As a result, an Excel file, ‘PeptideMatches.xlsx’ will be generated. This file contains, in different tabs, the number of peptide matches per sample and per experimental group, as well as their respective amino acid sequences. The notebook also includes an option for plotting a bar graph showing the number of peptide matches in each experimental group.
Figure S13: Overview of QuickProt-ID Search workflow. The user inputs a list of genes of interest into QuickProt-ID Search. Using the Unipressed (UniProt REST) library, the notebook generates a CSV table with their respective protein IDs.
Figure S14: Spearman’s correlation coefficient (ρ) and data distribution plots among replicates for days 2, 4, 6, 8, 10, 11, 12, and 14 in DIA data.
Figure S15: Distribution of the number of peptides per protein and bar plots for shared and unique proteins in DIA data. A) The number of peptides per protein plot depicts the distribution of values as well as the median value for each experimental group. Bar plots depict the median values for B) shared and C) unique proteins. A summary table with the names and numbers of proteins is generated and stored in a subfolder called ‘SHARED_UNIQUE_PROTEINS’ within the ‘TABLES’ folder. Additionally, within the same subfolder, another folder called ‘Extracted’ is generated to extract and store tables listing the proteins that are shared or unique for a given comparison or experimental group.
Figure S16: Principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and variable importance in projection (VIP) scores plot derived from the DIA data of erythroid samples. A) PCA plot showing group patterns for erythroid time-course samples, with components PC1 and PC2 displayed. B) PLS-DA plot showing group patterns for erythroid time-course samples, with components PLS1 and PLS2 displayed. C) VIP scores plot indicating relative protein abundance [from lowest (yellow/green) to highest (blue)] and ranking the top five proteins/genes based on their contribution to the PLS-DA grouping pattern.
Figure S17: Heatmaps showing the relative abundances of proteins in the BAF, ISWI, NuRD, INO80, SAGA, ATAC, SRCAP, and ATRX chromatin remodeling complexes during the time course in the DIA data. Log2 normalization was applied to the abundance values for proteins from days 0 to 14. The abbreviation ‘n.d.’ stands for ‘not detected’, indicating values that were not found in the datasets.
Figure S18: Protein ranking for erythroid samples collected on days 0, 2, 4, 6, 8, 10, 11, 12, and 14 in DIA data. The Y-axis represents the log2 abundance of proteins in the proteomes of each experimental group, whereas the X-axis depicts the abundance ranking for proteins in a given proteome. The names of the maximum, median, and minimum-ranking proteins are shown in the plot.
Figure S19: Volcano plots for differential expression of D0 vs. D4, D6, D8, D10, D11, or D12 from the DIA data. The volcano plots depict a P-value ≤ 0.05 and a fold-change (FC) threshold of 1.0. In the upper part of the plot, beside the title, a detailed description of the total number of upregulated, downregulated, and non-significantly changed proteins is shown. The names of the top three most significant proteins/genes (with the highest -log10(p-value)) in each comparison are displayed in the volcano plots.
Figure S20: Bar plots for protein abundance and peptide numbers for selected proteins in the DIA data of the erythroid samples. A) The abundance of ARID1A (upper panel) and SMARCC2 (lower panel) proteins is displayed in bar plots. Statistical analysis using the t-test was performed on the datasets, with Day 0 serving as the reference group for comparison. Asterisks represent statistically significant differences (P ≤ 0.05), whereas ‘n.s.’ indicates non-significant differences. These statistical representations were added automatically by the code in the QuickProt-DIA notebook. B) Abundance of BAZ1B protein. Data represent the median ± SD of two biological replicates. A compact letter display is automatically assigned on top of every bar. Different letters denote significant differences (P ≤ 0.05) by ANOVA with Tukey’s post-hoc test. C) Number of peptides supporting the abundance estimate for ARID1A, SMARCC2, and BAZ1B in each experimental group. Data represent the median ± SD of two biological replicates.
Figure S21: KEGG enrichment analysis of proteomes from individual experimental groups (A–C), and for D0 vs. D2 or D14 comparisons (D–E) in DIA data.
Figure S22: KEGG enrichment analysis of upregulated proteins in proteomes for A) D0 vs. D2 or B) D0 vs. D14 comparisons from the DIA data.
Figure S23: KEGG enrichment analysis of downregulated proteins in proteomes in A) D0 vs. D2 or B) D0 vs. D14 comparisons from the DIA data.
Figure S24: GO enrichment analysis of proteomes from individual experimental groups in DIA data. GO enrichment is classified by A) Biological Process, and B) Cellular Component.
Figure S25: Molecular function GO enrichment analysis of proteomes from individual experimental groups in the DIA data.
Figure S26: GO enrichment of proteomes from D0 vs. D2 or D14 comparisons in DIA data. GO enrichment is classified by A) Biological Process, B) Cellular Component, and C) Molecular Function as depicted in the plots.
Figure S27: GO enrichment analysis of upregulated proteins from proteomes in D0 vs. D2 or D14 comparisons from the DIA data. GO enrichment is classified by A) Biological Process, B) Cellular Component, and C) Molecular Function as depicted in the plots.
Figure S28: GO enrichment analysis of downregulated proteins from proteomes in D0 vs. D2 or D14 comparisons from the DIA data. GO enrichment is classified by A) Biological Process, B) Cellular Component, and C) Molecular Function as depicted in the plots.
Figure S29: Spearman’s correlation coefficient (ρ) and data distribution plots among replicates for days 2, 4, 6, 8, 10, 11, 12, and 14 in PRM data.
Figure S30: A) Distribution of MS2 data points across the chromatographic peak and B) correlation matrix among experimental groups from the PRM data.
Figure S31: Principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and variable importance in projection (VIP) scores plot derived from the PRM data of cBAF proteins in erythroid samples. A) PCA plot showing group patterns for erythroid time-course samples, with components PC1 and PC2 displayed. B) PLS-DA plot showing group patterns for erythroid time-course samples, with components PLS1 and PLS2 displayed. C) VIP scores plot indicating relative protein abundance [from lowest (yellow/green) to highest (blue)] and ranking the top five proteins/genes based on their contribution to the PLS-DA grouping pattern.
Figure S32: Number of A) peptides and B) proteins, and C) distribution of the number of peptides per protein in PRM data. Data represent the median ± SD of two biological replicates.
Figure S33: Protein ranking for erythroid samples collected on days 0, 2, 4, 6, 8, 10, 11, 12, and 14 in PRM data. The name of every protein in the ranking is depicted in each plot. The Y-axis represents the log2 number of protein copies per nucleus in the proteomes of each experimental group, whereas the X-axis depicts the abundance ranking for proteins in a given proteome.
Figure S34: Trend line plots for protein members of the cBAF complex based on the PRM data. Trend line plots for ACTL6 (ACTL6A and ACTL6B), BCL7 (BCL7A, BCL7B, and BCL7C), DPF (DPF1–3), SMARCA (SMARCA2 and SMARCA4), SMARCC (SMARCC1 and SMARCC2), SS18 (SS18 and SS18L1), and SMARCE1 proteins. Data represent the median ± SD of two biological replicates.
Table S1: List of annotated tables generated for each QuickProt notebook. A detailed description of the folder location, names of output spreadsheet tables, and their contents is provided.
Table S2: DIA method for sample processing in Eclipse Orbitrap LC-MS/MS.
Table S3: PRM method for sample processing in Eclipse Orbitrap LC-MS/MS.
Table S4: Metadata table containing the names of samples (used for ease in analysis) and their respective LC-MS/MS raw file names.
Table S5: List of peptides and QconCATs used for PRM quantification.
Table S6: Molecular weight (MW) of QconCATs.
Table S7: Protein amount (pg) per nucleus in each experimental group.
Table S8: Absolute quantities of peptides from 21 cBAF proteins in each replicate/sample. Data represent the median copies per nucleus ± SD, and in parentheses, the number of peptides used to calculate the abundance in each case. The full list of peptides and QconCATs used for PRM quantification of each protein is provided in Table S5.
Table S9: Median number of copies per nucleus for 21 protein members of the cBAF complex. The median copies per nucleus were calculated first for cBAF subunits within each replicate, then the median values across two biological replicates were used to calculate the final copies per nucleus. The minimum and maximum values for each protein across the time course are also represented in the table.
Acknowledgments
This work was supported by the National Institutes of Health (NIH), grant numbers R01DK098449 and S10OD026936.
Footnotes
Conflict of Interest Statement
The authors declare no conflicts of interest.
Data Availability Statement
The LC-MS/MS raw data, spectral libraries, and the input files processed in QuickProt have been deposited in ProteomeXchange [PXD060333] and can also be accessed via Panorama Public [https://panoramaweb.org/QuickProt_datasets.url]. The QuickProt notebooks were deposited in GitHub (https://github.com/OmarArias-Gaguancela/QuickProt), which contains links to all notebooks along with ready-to-use input/sample data that can be processed in each notebook.
References
- [1] Shuken SR, An Introduction to Mass Spectrometry-Based Proteomics. J. Proteome Res 2023, 22, 2151–2171.
- [2] Aebersold R, Mann M, Mass spectrometry-based proteomics. Nature 2003, 422, 198–207.
- [3] Guerrera IC, Kleiner O, Application of mass spectrometry in proteomics. Biosci. Rep 2005, 25, 71–93.
- [4] Boys EL, Liu J, Robinson PJ, Reddel RR, Clinical applications of mass spectrometry-based proteomics in cancer: Where are we? Proteomics 2023, 23, 2200238.
- [5] Fröhlich K, Brombacher E, Fahrner M, Vogele D, et al., Benchmarking of analysis strategies for data-independent acquisition proteomics using a large-scale dataset comprising inter-patient heterogeneity. Nat. Commun 2022, 13, 2622.
- [6] Doerr A, DIA mass spectrometry. Nat. Methods 2015, 12, 35–35.
- [7] Fröhlich K, Fahrner M, Brombacher E, Seredynska A, et al., Data-Independent Acquisition: A Milestone and Prospect in Clinical Mass Spectrometry–Based Proteomics. Mol. Cell. Proteomics 2024, 23, 100800.
- [8] Willems P, Fels U, Staes A, Gevaert K, Van Damme P, Use of Hybrid Data-Dependent and -Independent Acquisition Spectral Libraries Empowers Dual-Proteome Profiling. J. Proteome Res 2021, 20, 1165–1177.
- [9] Page MJ, Amess B, Rohlff C, Stubberfield C, Parekh R, Proteomics: a major new technology for the drug discovery process. Drug Discov. Today 1999, 4, 55–62.
- [10] Espejo C, Lyons B, Woods GM, Wilson R, in: Greening DW, Simpson RJ (Eds.), Serum/Plasma Proteomics: Methods and Protocols, Springer US, New York, NY 2023, pp. 127–152.
- [11] Park J, Oh HJ, Han D, Wang JI, et al., Parallel Reaction Monitoring-Mass Spectrometry (PRM-MS)-Based Targeted Proteomic Surrogates for Intrinsic Subtypes in Breast Cancer: Comparative Analysis with Immunohistochemical Phenotypes. J. Proteome Res 2020, 19, 2643–2653.
- [12] Peterson AC, Russell JD, Bailey DJ, Westphall MS, Coon JJ, Parallel Reaction Monitoring for High Resolution and High Mass Accuracy Quantitative, Targeted Proteomics. Mol. Cell. Proteomics 2012, 11, 1475–1488.
- [13] Urisman A, Levin RS, Gordan JD, Webber JT, et al., An Optimized Chromatographic Strategy for Multiplexing In Parallel Reaction Monitoring Mass Spectrometry: Insights from Quantitation of Activated Kinases. Mol. Cell. Proteomics 2017, 16, 265–277.
- [14] Peckner R, Myers SA, Jacome ASV, Egertson JD, et al., Specter: linear deconvolution for targeted analysis of data-independent acquisition mass spectrometry proteomics. Nat. Methods 2018, 15, 371–378.
- [15] Gillespie MA, Palii CG, Sanchez-Taltavull D, Shannon P, et al., Absolute Quantification of Transcription Factors Reveals Principles of Gene Regulation in Erythropoiesis. Mol. Cell 2020, 78, 960–974.e11.
- [16] Demichev V, Messner CB, Vernardis SI, Lilley KS, Ralser M, DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat. Methods 2020, 17, 41–44.
- [17] Röst HL, Sachsenberg T, Aiche S, Bielow C, et al., OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat. Methods 2016, 13, 741–748.
- [18] Searle BC, Pino LK, Egertson JD, Ting YS, et al., Chromatogram libraries improve peptide detection and quantification by data independent acquisition mass spectrometry. Nat. Commun 2018, 9, 5128.
- [19] Kohler D, Staniak M, Tsai T-H, Huang T, et al., MSstats Version 4.0: Statistical Analyses of Quantitative Mass Spectrometry-Based Proteomic Experiments with Chromatography-Based Quantification at Scale. J. Proteome Res 2023, 22, 1466.
- [20] MacLean B, Tomazela DM, Shulman N, Chambers M, et al., Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26, 966–968.
- [21] Tyanova S, Temu T, Sinitcyn P, Carlson A, et al., The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods 2016, 13, 731–740.
- [22] Tyanova S, Temu T, Carlson A, Sinitcyn P, et al., Visualization of LC-MS/MS proteomics data in MaxQuant. Proteomics 2015, 15, 1453–1456.
- [23] Lazar C, Gatto L, Ferro M, Bruley C, Burger T, Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J. Proteome Res 2016, 15, 1116–1125.
- [24] Kokla M, Virtanen J, Kolehmainen M, Paananen J, Hanhineva K, Random forest-based imputation outperforms other methods for imputing LC-MS metabolomics data: a comparative study. BMC Bioinformatics 2019, 20, 492.
- [25] Jin L, Bi Y, Hu C, Qu J, et al., A comparative study of evaluating missing value imputation methods in label-free proteomics. Sci. Rep 2021, 11, 1760.
- [26] Li YF, Arnold RJ, Tang H, Radivojac P, The importance of peptide detectability for protein identification, quantification, and experiment design in MS/MS proteomics. J. Proteome Res 2010, 9, 6288–6297.
- [27] Pedregosa F, Varoquaux G, Gramfort A, Michel V, et al., Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res 2011, 12, 2825–2830.
- [28] Giarratana M-C, Kobari L, Lapillonne H, Chalmers D, et al., Ex vivo generation of fully mature human red blood cells from hematopoietic stem cells. Nat. Biotechnol 2005, 23, 69–74.
- [29] Palii CG, Cheng Q, Gillespie MA, Shannon P, et al., Single-Cell Proteomics Reveal that Quantitative Changes in Co-expressed Lineage-Specific Transcription Factors Determine Cell Fate. Cell Stem Cell 2019, 24, 812–820.e5.
- [30] Cho K-C, Oh S, Wang Y, Rosenthal LS, et al., Evaluation of the sensitivity and reproducibility of targeted proteomic analysis using data independent acquisition for serum and cerebrospinal fluid proteins. J. Proteome Res 2021, 20, 4284–4291.
- [31] Brenes AJ, Calculating and Reporting Coefficients of Variation for DIA-Based Proteomics. J. Proteome Res 2024, 23, 5274–5278.
- [32] Andersen L-AC, Palstrøm NB, Diederichsen A, Lindholt JS, et al., Determining Plasma Protein Variation Parameters as a Prerequisite for Biomarker Studies—A TMT-Based LC-MSMS Proteome Investigation. Proteomes 2021, 9, 47.
- [33] Pino LK, Just SC, MacCoss MJ, Searle BC, Acquiring and Analyzing Data Independent Acquisition Proteomics Experiments without Spectrum Libraries. Mol. Cell. Proteomics 2020, 19, 1088–1103.
- [34] Ludwig LS, Lareau CA, Bao EL, Nandakumar SK, et al., Transcriptional States and Chromatin Accessibility Underlying Human Erythropoiesis. Cell Rep 2019, 27, 3228–3240.e7.
- [35] Gourisankar S, Krokhotin A, Wenderski W, Crabtree GR, Context-specific functions of chromatin remodellers in development and disease. Nat. Rev. Genet 2024, 25, 340–361.
- [36] Gaspar-Maia A, Alajem A, Meshorer E, Ramalho-Santos M, Open chromatin in pluripotency and reprogramming. Nat. Rev. Mol. Cell Biol 2011, 12, 36–47.
- [37] Chasis JA, Mohandas N, Erythroblastic islands: niches for erythropoiesis. Blood 2008, 112, 470–478.
- [38] Klei TRL, Meinderts SM, van den Berg TK, van Bruggen R, From the Cradle to the Grave: The Role of Macrophages in Erythropoiesis and Erythrophagocytosis. Front. Immunol 2017, 8, 73.
- [39] Qiu L-B, Dickson H, Hajibagheri MAN, Crocker PR, Extruded Erythroblast Nuclei Are Bound and Phagocytosed by a Novel Macrophage Receptor. Blood 1995, 85, 1630–1639.
- [40] Mashtalir N, Suzuki H, Farrell DP, Sankar A, et al., A Structural Model of the Endogenous Human BAF Complex Informs Disease Mechanisms. Cell 2020, 183, 802–817.e24.
- [41] Michel BC, D’Avino AR, Cassel SH, Mashtalir N, et al., A non-canonical SWI/SNF complex is a synthetic lethal target in cancers driven by BAF complex perturbation. Nat. Cell Biol 2018, 20, 1410–1420.
- [42] Singhal N, Graumann J, Wu G, Araúzo-Bravo MJ, et al., Chromatin-Remodeling Components of the BAF Complex Facilitate Reprogramming. Cell 2010, 141, 943–955.
- [43] Xiao M, Kondo S, Nomura M, Kato S, et al., BRD9 determines the cell fate of hematopoietic stem cells by regulating chromatin state. Nat. Commun 2023, 14, 8372.
- [44] Alfert A, Moreno N, Kerl K, The BAF complex in development and disease. Epigenetics Chromatin 2019, 12, 19.
- [45] Wu J, Fan C, Kabir AU, Krchma K, et al., Baf155 controls hematopoietic differentiation and regeneration through chromatin priming. Cell Rep 2024, 43, 114558.
- [46] Park J, Kirkland JG, The role of the polybromo-associated BAF complex in development. Biochem. Cell Biol 2025, 103, 1–8.
- [47] Varga J, Kube M, Luck K, Schick S, The BAF chromatin remodeling complexes: structure, function, and synthetic lethalities. Biochem. Soc. Trans 2021, 49, 1489–1503.
- [48] Boulay G, Sandoval GJ, Riggi N, Iyer S, et al., Cancer-Specific Retargeting of BAF Complexes by a Prion-like Domain. Cell 2017, 171, 163–178.e19.
- [49] Kadoch C, Crabtree GR, Mammalian SWI/SNF chromatin remodeling complexes and cancer: Mechanistic insights gained from human genomics. Sci. Adv 2015, 1, e1500447.
- [50] Pratt JM, Simpson DM, Doherty MK, Rivers J, et al., Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Nat. Protoc 2006, 1, 1029–1043.
- [51] Takemori N, Takemori A, Tanaka Y, Endo Y, et al., MEERCAT: Multiplexed Efficient Cell Free Expression of Recombinant QconCATs For Large Scale Absolute Proteome Quantification. Mol. Cell. Proteomics 2017, 16, 2169–2183.
- [52] Braun SMG, Petrova R, Tang J, Krokhotin A, et al., BAF subunit switching regulates chromatin accessibility to control cell cycle exit in the developing mammalian cortex. Genes Dev 2021, 35, 335–353.
- [53] Mashtalir N, D’Avino AR, Michel BC, Luo J, et al., Modular Organization and Assembly of SWI/SNF Family Chromatin Remodeling Complexes. Cell 2018, 175, 1272–1288.e20.
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Figure S1: Scheme of the QuickProt tool for proteomic data mining and visualization. QuickProt comprises four modules: QuickProt-DIA, QuickProt-PRM, QuickProt-PepSeq, and QuickProt-ID Search. QuickProt-DIA consists of two notebooks or pipelines: QuickProt-DIA (DIA-NN) and QuickProt-DIA (Skyline). Additionally, QuickProt-PRM includes two notebooks: QuickProt-PRM (Heavy label) and QuickProt-PRM (Label-free). QuickProt-PepSeq features two pipelines: QuickProt-PepSeq (DIA-NN) and QuickProt-PepSeq (Skyline). Lastly, QuickProt-ID Search has a single notebook of the same name.
Figure S2: Interface for changing directories and installing Python library packages. A) Before (upper panel) and after (lower panel) views resulting from using the change directory interface. B) Before (upper panel) and after (lower panel) views following execution of the “Install necessary packages” section. Both A) and B) represent the initial code cells found in all QuickProt notebooks.
Figure S3: Overview of the QuickProt-DIA (DIA-NN) or (Skyline) notebook data preprocessing interface for sample/group name selection and data manipulation sections.
Figure S4: Overview of the QuickProt-DIA (DIA-NN) or (Skyline) notebook data preprocessing interface for peptide threshold, parameter calculation, and sample layout rearrangement.
Figure S5: Overview of the quality control, peptide and protein yields, and exploratory analysis interface in the QuickProt-DIA or QuickProt-PRM notebooks.
Figure S6: Overview of the protein abundance analysis interface in the QuickProt-DIA or QuickProt-PRM (Label-free) notebooks.
Figure S7: Overview of the enrichment analysis interface in the QuickProt-DIA (DIA-NN) or (Skyline) notebooks.
Figure S8: Overview of the data preprocessing interface in the QuickProt-PRM (Label-free) notebook.
Figure S9: Overview of the data preprocessing interface in the QuickProt-PRM (Heavy label) notebook.
Figure S10: Overview of the normalization and data manipulation modules in the QuickProt-PRM (Label-free) or (Heavy label) notebooks.
Figure S11: Overview of the protein abundance analysis interface in the QuickProt-PRM (Heavy label) notebooks.
Figure S12: Overview of the QuickProt-PepSeq workflow. Outputs from DIA-NN or Skyline, coupled with QuickProt-DIA, can be imported into the QuickProt-PepSeq notebook (part of the QuickProt notebook series). A given region of interest (e.g., a protein domain) is then entered into the notebook to be mapped against the DIA data of a specific experiment. As a result, an Excel file, ‘PeptideMatches.xlsx’, is generated. This file contains, in different tabs, the number of peptide matches per sample and per experimental group, as well as their respective amino acid sequences. The notebook also includes an option for plotting a bar graph showing the number of peptide matches in each experimental group.
Figure S13: Overview of the QuickProt-ID Search workflow. The user inputs a list of genes of interest into QuickProt-ID Search. Using the Unipressed (UniProt REST) library, the notebook generates a CSV table with the corresponding protein IDs.
Figure S14: Spearman’s correlation coefficient (ρ) and data distribution plots among replicates for days 0, 2, 4, 6, 8, 10, 11, 12, and 14 in the DIA data.
Figure S15: Distribution of the number of peptides per protein and bar plots for shared and unique proteins in DIA data. A) The number of peptides per protein plot depicts the distribution of values as well as the median value for each experimental group. Bar plots depict the median values for B) shared and C) unique proteins. A summary table with the names and numbers of proteins is generated and stored in a subfolder called ‘SHARED_UNIQUE_PROTEINS’ within the ‘TABLES’ folder. Additionally, within the same subfolder, another folder called ‘Extracted’ is generated to extract and store tables listing the proteins that are shared or unique for a given comparison or experimental group.
Figure S16: Principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and variable importance in projection (VIP) scores plot derived from the DIA data of erythroid samples. A) PCA plot showing group patterns for erythroid time-course samples, with components PC1 and PC2 displayed. B) PLS-DA plot showing group patterns for erythroid time-course samples, with components PLS1 and PLS2 displayed. C) VIP scores plot indicating relative protein abundance [from lowest (yellow/green) to highest (blue)] and ranking the top five proteins/genes based on their contribution to the PLS-DA grouping pattern.
Figure S17: Heatmaps showing the relative abundances of proteins in the BAF, ISWI, NuRD, INO80, SAGA, ATAC, SRCAP, and ATRX chromatin remodeling complexes during the time course in the DIA data. Log2 normalization was applied to the abundance values for proteins from days 0 to 14. The abbreviation ‘n.d.’ stands for ‘not detected’, indicating proteins that were not found in the datasets.
Figure S18: Protein ranking for erythroid samples collected on days 0, 2, 4, 6, 8, 10, 11, 12, and 14 in DIA data. The Y-axis represents the log2 abundance of proteins in the proteomes of each experimental group, whereas the X-axis depicts the abundance ranking for proteins in a given proteome. The names of the maximum, median, and minimum-ranking proteins are shown in the plot.
Figure S19: Volcano plots for differential expression of D0 vs. D4, D6, D8, D10, D11, or D12 from the DIA data. The volcano plots use a significance threshold of P ≤ 0.05 and a fold-change (FC) threshold of 1.0. At the top of each plot, beside the title, the total numbers of upregulated, downregulated, and non-significantly changed proteins are shown. The names of the top three most significant proteins/genes (those with the highest -log10(p-value)) in each comparison are displayed in the volcano plots.
Figure S20: Bar plots for protein abundance and peptide numbers for selected proteins in the DIA data of the erythroid samples. A) The abundance of ARID1A (upper panel) and SMARCC2 (lower panel) proteins is displayed in bar plots. Statistical analysis using the t-test was performed on the datasets, with Day 0 serving as the reference group for comparison. Asterisks represent statistically significant differences (P ≤ 0.05), whereas ‘n.s.’ indicates non-significant differences. These statistical representations were added automatically by the code in the QuickProt-DIA notebook. B) Abundance of BAZ1B protein. Data represent the median ± SD of two biological replicates. A compact letter display is automatically assigned on top of every bar. Different letters denote significant differences (P ≤ 0.05) by ANOVA with Tukey’s post-hoc test. C) Number of peptides supporting the abundance estimate for ARID1A, SMARCC2, and BAZ1B in each experimental group. Data represent the median ± SD of two biological replicates.
Figure S21: KEGG enrichment analysis of proteomes from individual experimental groups (A-C), and for D0 vs. D2 or D14 comparisons (D-E) in DIA data.
Figure S22: KEGG enrichment analysis of upregulated proteins in proteomes for A) D0 vs. D2 or B) D0 vs. D14 comparisons from the DIA data.
Figure S23: KEGG enrichment analysis of upregulated proteins in proteomes in A) D0 vs. D2 or B) D0 vs. D14 comparisons from the DIA data.
Figure S24: GO enrichment analysis of proteomes from individual experimental groups in DIA data. GO enrichment is classified by A) Biological Process, and B) Cellular Component.
Figure S25: Molecular function GO enrichment analysis of proteomes from individual experimental groups in the DIA data.
Figure S26: GO enrichment of proteomes from D0 vs. D2 or D14 comparisons in DIA data. GO enrichment is classified by A) Biological Process, B) Cellular Component, and C) Molecular Function as depicted in the plots.
Figure S27: GO enrichment analysis of upregulated proteins from proteomes in D0 vs. D2 or D14 comparisons from the DIA data. GO enrichment is classified by A) Biological Process, B) Cellular Component, and C) Molecular Function as depicted in the plots.
Figure S28: GO enrichment analysis of downregulated proteins from proteomes in D0 vs. D2 or D14 comparisons from the DIA data. GO enrichment is classified by A) Biological Process, B) Cellular Component, and C) Molecular Function as depicted in the plots.
Figure S29: Spearman’s correlation coefficient (ρ) and data distribution plots among replicates for days 0, 2, 4, 6, 8, 10, 11, 12, and 14 in the PRM data.
Figure S30: A) Distribution of MS2 data points across the chromatographic peak and B) correlation matrix among experimental groups from the PRM data.
Figure S31: Principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and variable importance in projection (VIP) scores plot derived from the PRM data of cBAF proteins in erythroid samples. A) PCA plot showing group patterns for erythroid time-course samples, with components PC1 and PC2 displayed. B) PLS-DA plot showing group patterns for erythroid time-course samples, with components PLS1 and PLS2 displayed. C) VIP scores plot indicating relative protein abundance [from lowest (yellow/green) to highest (blue)] and ranking the top five proteins/genes based on their contribution to the PLS-DA grouping pattern.
Figure S32: Number of A) peptides and B) proteins, and C) distribution of the number of peptides per protein in PRM data. Data represent the median ± SD of two biological replicates.
Figure S33: Protein ranking for erythroid samples collected on days 0, 2, 4, 6, 8, 10, 11, 12, and 14 in PRM data. The name of every protein in the ranking is depicted in each plot. The Y-axis represents the log2 number of protein copies per nucleus in the proteomes of each experimental group, whereas the X-axis depicts the abundance ranking for proteins in a given proteome.
Figure S34: Trend line plots for protein members of the cBAF complex based on the PRM data. Trend lines are shown for ACTL6 (ACTL6A and ACTL6B), BCL7 (BCL7A, BCL7B, and BCL7C), DPF (DPF1–3), SMARCA (SMARCA2 and SMARCA4), SMARCC (SMARCC1 and SMARCC2), SS18 (SS18 and SS18L1), and SMARCE1 proteins. Data represent the median ± SD of two biological replicates.
Table S1: List of annotated tables generated for each QuickProt notebook. A detailed description of the folder location, names of output spreadsheet tables, and their contents is provided.
Table S2: DIA method for sample processing in Eclipse Orbitrap LC-MS/MS.
Table S3: PRM method for sample processing in Eclipse Orbitrap LC-MS/MS.
Table S4: Metadata table containing the names of samples (used for ease in analysis) and their respective LC-MS/MS raw file names.
Table S5: List of peptides and QconCATs used for PRM quantification.
Table S6: Molecular weight (MW) of QconCATs.
Table S7: Protein amount (pg) per nucleus in each experimental group.
Table S8: Absolute quantities of peptides from 21 cBAF proteins in each replicate/sample. Data represent the median copies per nucleus ± SD, and in parentheses, the number of peptides used to calculate the abundance in each case. The full list of peptides and QconCATs used for PRM quantification of each protein is provided in Table S5.
Table S9: Median number of copies per nucleus for 21 protein members of the cBAF complex. The median copies per nucleus were calculated first for cBAF subunits within each replicate, then the median values across two biological replicates were used to calculate the final copies per nucleus. The minimum and maximum values for each protein across the time course are also represented in the table.
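The two-step median described for Table S9 can be illustrated with a minimal sketch. It assumes the input is a list of peptide-level copies-per-nucleus values labeled by protein and replicate. The function name `copies_per_nucleus`, the record layout, and the numeric values are hypothetical and are not taken from the QuickProt notebooks or from Tables S8 and S9.

```python
from collections import defaultdict
from statistics import median


def copies_per_nucleus(records):
    """Two-step median over (protein, replicate, peptide_copies) tuples.

    Hypothetical helper for illustration; not the QuickProt implementation.
    """
    # Step 1: collect peptide-level values for each (protein, replicate) pair
    # and reduce each pair to its median.
    per_replicate = defaultdict(list)
    for protein, replicate, copies in records:
        per_replicate[(protein, replicate)].append(copies)

    replicate_medians = defaultdict(list)
    for (protein, _replicate), values in per_replicate.items():
        replicate_medians[protein].append(median(values))

    # Step 2: take the median of the replicate-level medians for each protein.
    return {protein: median(vals) for protein, vals in replicate_medians.items()}


# Made-up values for illustration only.
example = [
    ("SMARCA4", "rep1", 1000.0),
    ("SMARCA4", "rep1", 1200.0),
    ("SMARCA4", "rep1", 1700.0),
    ("SMARCA4", "rep2", 900.0),
    ("SMARCA4", "rep2", 1100.0),
]
result = copies_per_nucleus(example)
print(result)
```

With two biological replicates, the second step reduces to the midpoint of the two replicate-level medians, so a single outlying peptide within one replicate has limited influence on the final value.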
Data Availability Statement
The LC-MS/MS raw data, spectral libraries, and the input files processed in QuickProt have been deposited in ProteomeXchange [PXD060333] and can also be accessed via Panorama Public [https://panoramaweb.org/QuickProt_datasets.url]. The QuickProt notebooks were deposited in GitHub (https://github.com/OmarArias-Gaguancela/QuickProt), which contains links to all notebooks along with ready-to-use input/sample data that can be processed in each notebook.
