Abstract
Top-down proteomics (TDP) aims to delineate proteomes in a proteoform-specific manner, which is vital for accurately understanding protein function in cellular processes. It requires high-capacity separation of proteoforms before mass spectrometry (MS) and tandem MS (MS/MS). Capillary isoelectric focusing (cIEF)-MS was recognized as a useful tool for TDP in the 1990s because cIEF is capable of high-resolution separation of proteoforms. Previous cIEF-MS studies focused on measuring protein masses without MS/MS, impeding confident proteoform identification in complex samples and accurate localization of post-translational modifications on proteoforms. Herein, for the first time, we present automated cIEF-MS/MS-based TDP for large-scale delineation of proteoforms in complex proteomes. Single-shot cIEF-MS/MS identified 711 proteoforms from an Escherichia coli (E. coli) proteome while consuming only nanograms of protein. Coupling two-dimensional size-exclusion chromatography (SEC)-cIEF separation to ESI-MS/MS enabled the identification of nearly 2000 proteoforms from the E. coli proteome. Label-free quantitative TDP of male and female zebrafish brains using SEC-cIEF-MS/MS quantified thousands of proteoforms and revealed sex-dependent proteoform profiles in brains. In particular, we discovered several proteolytic proteoforms of pro-opiomelanocortin and prodynorphin with significantly higher abundance in male zebrafish brains as potential endogenous hormone proteoforms. Multilevel quantitative proteomics (TDP and bottom-up proteomics) of the brains revealed that most proteoforms with statistically significant differences in abundance between sexes showed no abundance difference at the protein group level. This work represents the first multilevel quantitative proteomics study of sexual dimorphism of the brain.
INTRODUCTION
Mass spectrometry (MS)-based top-down proteomics (TDP) has emerged as a powerful tool for accurate identification and quantification of proteoforms, which are all the molecular forms of a protein arising from the same gene through genetic variations, alternative splicing, and post-translational modifications (PTMs).1–3 Accurate characterization of proteoforms is critical for better understanding protein functions and for discovering proteoform signatures of disease development.4–6 Because proteomes are extremely complex, high-resolution proteoform separation is vital for large-scale TDP.
Besides the routinely used reversed-phase liquid chromatography (RPLC)-tandem MS (MS/MS), capillary zone electrophoresis (CZE)-MS/MS has been suggested as a valuable tool for TDP of complex proteomes, with the identification of thousands of proteoforms.7–14 CZE separates proteoforms according to their electrophoretic mobilities, which depend on their charge-to-size ratios. CZE exhibits high separation efficiency for proteoforms and is highly compatible with electrospray ionization (ESI)-MS. However, its performance is limited by its low sample loading capacity (typically about 1% of the total capillary volume). As an alternative electrophoretic separation method, capillary isoelectric focusing (cIEF) separates amphoteric analytes based on their isoelectric points (pIs) and offers both high resolution for proteoform separation and high sample loading capacity (up to 100% of the total capillary volume).15 Integrating cIEF with ESI-MS for protein studies has been an important research area for two decades because of the ultrahigh resolution of cIEF for proteoform separation.16 The Lee and Smith groups performed pioneering cIEF-MS studies in the 1990s to characterize simple protein mixtures and complex proteomes17–20 using the coaxial sheath flow CE–MS interface.21 These pioneering studies laid the foundation for using cIEF-MS in protein characterization. However, the technique has not been widely adopted for protein characterization in the last two decades because of its manual operation, ionization suppression of analytes by ampholytes, and the lack of a robust and highly sensitive CE–MS interface.
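To put the loading-capacity contrast in volume terms, the short Python sketch below computes the internal volume of a capillary and the corresponding sample loads. The 100 cm length and 50 µm inner diameter are illustrative assumptions, not values given in this section; only the ~1% (CZE) and up to 100% (cIEF) fractions come from the text.

```python
import math


def capillary_volume_nl(length_cm: float, inner_diameter_um: float) -> float:
    """Internal volume of a cylindrical capillary in nanoliters."""
    radius_cm = inner_diameter_um * 1e-4 / 2        # 1 um = 1e-4 cm
    volume_ml = math.pi * radius_cm**2 * length_cm  # 1 cm^3 = 1 mL
    return volume_ml * 1e6                          # 1 mL = 1e6 nL


# Illustrative capillary (assumed): 100 cm long, 50 um inner diameter.
total_nl = capillary_volume_nl(100.0, 50.0)
cze_load_nl = 0.01 * total_nl   # CZE: typically ~1% of the capillary volume
cief_load_nl = 1.00 * total_nl  # cIEF: up to 100% of the capillary volume
print(f"capillary volume: {total_nl:.0f} nL")
print(f"CZE load: ~{cze_load_nl:.0f} nL; cIEF load: up to {cief_load_nl:.0f} nL")
```

With these assumed dimensions the capillary holds about 1.96 µL, so a ~1% CZE injection is roughly 20 nL, whereas cIEF can in principle load the full volume.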
In recent years, cIEF-MS has attracted renewed attention because of drastic improvements in the sensitivity of CE–MS interfaces and the automation of cIEF-MS operation. The flow-through microvial CE–MS interface22 and the electrokinetically pumped sheath flow CE–MS interface23,24 have been employed for cIEF-MS studies, for which “sandwich” injection methods were developed to automate cIEF-MS.25–27 Several studies have successfully employed automated cIEF-MS for high-resolution characterization of antibody charge variants.28–31
Although cIEF-MS has shown great potential for delineating proteoforms, previous cIEF-MS studies have mainly focused on measuring protein masses without MS/MS analysis, impeding confident proteoform identification in complex samples and accurate localization of PTMs on proteoforms. In this study, we report the first application of automated cIEF-MS/MS to large-scale TDP of complex proteomes. The automated, online cIEF-MS/MS platform was built using the electrokinetically pumped sheath flow CE–MS interface, the “sandwich” injection configuration, and linear polyacrylamide (LPA)-coated separation capillaries. First, we developed high-throughput and high-capacity cIEF-MS/MS methods for large-scale TDP. Second, we coupled size-exclusion chromatography (SEC)-cIEF to ESI-MS/MS for large-scale qualitative TDP of an Escherichia coli cell lysate and for label-free quantitative TDP of male and female zebrafish brains, aiming at a better understanding of sexual dimorphism of the brain at the proteoform level. Finally, we compared quantitative proteomics datasets of zebrafish brains from TDP and bottom-up proteomics (BUP).
EXPERIMENTAL SECTION
Full experimental details are provided in the Supporting Information; a brief summary is given below.
The E. coli lysate and zebrafish brain lysates were fractionated by SEC, and each SEC fraction was further analyzed by automated cIEF-MS/MS. The automated cIEF-MS/MS system was constructed by coupling a CESI 8000 Plus CE system (Beckman Coulter) to a Q-Exactive HF mass spectrometer (Thermo Fisher Scientific) via a commercial electrokinetically pumped sheath-flow CE–MS nanospray interface (CMP Scientific Corp).23,24 An LPA-coated capillary and the “sandwich” injection configuration were employed for automated cIEF-MS. Proteoforms were identified and relatively quantified with the TopPIC (top-down MS-based proteoform identification and characterization) software.32 Label-free quantification (LFQ) was used for quantitative TDP of male and female zebrafish brains by comparing the feature intensities of proteoforms between the samples. The feature intensity of a proteoform was calculated as the sum of the intensities of its corresponding peaks across all scans and charge states.11
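The feature-intensity definition above amounts to a grouped sum. A minimal Python sketch, assuming a hypothetical flat record of (feature, scan, charge, intensity) per peak; TopPIC's own feature files are organized differently, so this illustrates the arithmetic only.

```python
from collections import defaultdict


def feature_intensities(peaks):
    """Sum peak intensities per proteoform feature across all scans and charge states.

    `peaks` is an iterable of (feature_id, scan, charge, intensity) tuples; this
    record layout is a hypothetical stand-in, not TopPIC's output format.
    """
    totals = defaultdict(float)
    for feature_id, _scan, _charge, intensity in peaks:
        totals[feature_id] += intensity
    return dict(totals)


# Hypothetical peaks: feature "pf1" observed at charges 10+ and 11+ over two scans.
peaks = [
    ("pf1", 101, 10, 2.0e6),
    ("pf1", 101, 11, 1.5e6),
    ("pf1", 102, 10, 1.0e6),
    ("pf2", 102, 8, 4.0e5),
]
print(feature_intensities(peaks))
```

Summing over both scans and charge states makes the value insensitive to how a proteoform's signal happens to be split among charge states in a given run.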
RESULTS AND DISCUSSION
Automated High-Throughput and High-Capacity cIEF-MS/MS.
Figure 1A shows a diagram of the automated cIEF-MS system. In this platform, the outlet of an LPA-coated capillary is positioned in the electrokinetically pumped sheath-flow CE–MS interface, which is filled with an acidic sheath buffer containing 0.2% (v/v) formic acid (FA) and 10% (v/v) methanol, while its inlet is placed in an acidic anolyte solution [0.1% (v/v) FA or 5% (v/v) acetic acid (AA)]. A plug of basic catholyte [0.3% (w/w) NH3·H2O, pH 11.8] and then a mixture of analytes and ampholyte are injected into the capillary, and focusing is carried out by applying 30 kV across the capillary. After focusing, the separated proteoforms are automatically mobilized out of the capillary for ESI-MS as the pH gradient is gradually disrupted by the migration of protons from the acidic anolyte and anions from the sheath buffer (chemical mobilization).
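Focusing works because each proteoform migrates in the pH gradient until it reaches the pH at which its net charge is zero, its pI. The Python sketch below estimates net charge with the Henderson-Hasselbalch equation and finds the pI by bisection. The pKa table is an assumed, approximate set chosen for illustration; tools such as ExPASy Compute pI/Mw use other pKa scales, so values will differ somewhat, and PTMs are ignored.

```python
# Approximate pKa values (an assumed, illustrative set; other scales such as the
# one used by ExPASy Compute pI/Mw will give somewhat different pI values).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge of an unmodified sequence."""
    counts = {aa: sequence.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(sequence: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """Bisection for the pH at which the net charge crosses zero.

    Net charge decreases monotonically with pH, so a positive charge at `mid`
    means the pI lies above it.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical short sequences, one basic and one acidic.
for seq in ("GKKRKG", "GDDEEG"):
    print(seq, round(isoelectric_point(seq), 2))
```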
Figure 1.

Development of cIEF-MS/MS methods with a single SEC fraction of an E. coli lysate. (A) Flowchart of automated cIEF-MS, including basic catholyte and sample injection, focusing, and chemical mobilization. (B) Evaluation of the reproducibility of the cIEF-MS/MS system. The base peak electropherograms are from triplicate cIEF-MS/MS analyses of fraction 3 of the E. coli lysate using an 80 cm capillary. (C) Base peak electropherograms of fraction 3 using an 80 cm capillary with 0.1% FA as the anolyte (red), a 150 cm capillary with 0.1% FA as the anolyte (blue), and a 150 cm capillary with 5% AA as the anolyte (dark cyan).
To improve proteoform separation and detection, critical experimental parameters of cIEF-MS were first investigated with a standard protein mixture (Figures S1–S4). The results indicated that a 5 cm catholyte plug, a 40 cm sample plug (half of the total capillary volume), a 0.1% ampholyte concentration, and a low protein concentration were the most suitable conditions, balancing separation resolution and MS signal. Under the optimized conditions, one SEC fraction of an E. coli lysate (~0.4 mg/mL protein concentration) was analyzed by cIEF-MS/MS in triplicate. On average, nearly 300 proteoforms were identified in only 50 min, with good reproducibility in the number of proteoform identifications (n = 3, RSD = 4.1%; Figure 1B). We term this method high-throughput cIEF-MS/MS. It also showed good reproducibility in the top-down LFQ intensities of proteoforms (Figure S5).
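The reproducibility figure quoted above is a relative standard deviation. For reference, a Python sketch of that calculation, assuming the sample (n − 1) standard deviation; the replicate counts in the usage lines are hypothetical, not the values behind Figure 1B.

```python
import statistics


def rsd_percent(values):
    """Relative standard deviation in percent, using the sample (n - 1) standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical proteoform counts from three replicate runs (not the data behind Figure 1B).
replicate_counts = [290, 305, 281]
print(f"mean = {statistics.mean(replicate_counts):.1f}, RSD = {rsd_percent(replicate_counts):.1f}%")
```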
We next asked how to further boost the number of proteoform identifications from a single cIEF-MS/MS run. Inspired by our recent CZE-MS/MS-based TDP work using a 1.5 m-long LPA-coated capillary,11 we performed cIEF-MS/MS with a 1.5 m-long LPA-coated capillary on the same E. coli sample. In this case, we filled roughly 50% of the capillary with the sample (an 80 cm long sample plug). The 1.5 m capillary offered more proteoform identifications (449 vs 281) and a higher peak capacity (92 vs 77) than the 80 cm capillary (Figure 1C and the Supporting Information). In addition, compared with 0.1% (v/v) FA, using 5% (v/v) AA as the anolyte further increased the peak capacity (136 vs 92) and proteoform identifications (711 vs 449) by nearly 50 and 60%, respectively (Figure 1C and the Supporting Information). With 5% (v/v) AA, proteoforms migrated more slowly, producing a wider separation window and thereby increasing both peak capacity and proteoform identifications. This is likely because 5% (v/v) AA has a higher viscosity and a lower pH than 0.1% (v/v) FA, both of which slow protein migration during mobilization. cIEF-MS/MS using a 1.5 m-long capillary and 5% (v/v) AA as the anolyte identified 711 proteoforms and 177 proteins from the E. coli sample in about 2.5 h of instrument time while consuming roughly 480 ng of protein. We term this method high-capacity cIEF-MS/MS. The high-capacity cIEF-MS/MS method is comparable to dynamic pH junction-based CZE-MS/MS11,12 and nanoflow RPLC-MS/MS33–35 in the number of proteoform identifications per run. Notably, the LPA-coated capillaries prepared in our study are generally durable and can be used continuously for more than 60 h of cIEF-MS. Together, these data establish cIEF-MS/MS as another powerful tool for large-scale delineation of proteoforms in complex samples.
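The percentage gains quoted above can be checked directly from the reported numbers. This small Python sketch uses only values stated in the paragraph.

```python
def percent_increase(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100


# Values reported in the text.
comparisons = {
    "peak capacity (5% AA vs 0.1% FA)": (136, 92),
    "proteoform IDs (5% AA vs 0.1% FA)": (711, 449),
    "proteoform IDs (150 cm vs 80 cm capillary)": (449, 281),
}
for label, (new, old) in comparisons.items():
    print(f"{label}: +{percent_increase(new, old):.1f}%")
```

The gains come out at about 47.8% and 58.4% for the anolyte change, in line with the "nearly 50 and 60%" stated above, and about 59.8% for the longer capillary.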
Large-Scale TDP of E. coli Cells Using SEC-cIEF-MS/MS.
2D-PAGE is well known for high-capacity separation of proteoforms based on their molecular weight (MW) and pI. Unfortunately, it is challenging to couple 2D-PAGE directly to ESI-MS/MS for TDP because its operation is offline and tedious. Here, we coupled SEC-cIEF to ESI-MS/MS for large-scale TDP for the first time. The E. coli proteoforms were first fractionated into six fractions by size using SEC, and each fraction was then analyzed by online high-capacity cIEF-MS/MS (Figure 2A). The proteoforms in each SEC fraction were spread by cIEF across a separation window of about 40 min, indicating good orthogonality between SEC and cIEF for proteoform separation. The numbers of identified proteoforms and proteins per SEC fraction ranged from 150 to 711 and from 32 to 177, respectively. Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) of the SEC fractions showed that SEC reasonably separated proteoforms by MW, with a clear shift from high to low MW as the fraction number increased (Figure 2B). The mass distribution of identified proteoforms from cIEF-MS/MS analysis of each SEC fraction agreed well with the SDS-PAGE data (Figure 2C). Figure 2D shows the correlation between proteoform pI and migration time in cIEF-MS/MS analyses of two SEC fractions. Basic proteoforms tended to migrate out of the cIEF capillary earlier than acidic ones, indicating clear pI-based separation, in agreement with cIEF-MS/MS-based BUP data in the literature.36 Figure 2E depicts the cumulative proteoform and protein identifications as a function of the number of SEC fractions, showing a continuous increase in both as more SEC fractions were included.
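Cumulative identification curves like those in Figure 2E count each distinct identification once, even when the same proteoform is found in several SEC fractions. A minimal Python sketch of that bookkeeping, with hypothetical identifiers:

```python
def cumulative_unique(fractions):
    """Running count of distinct identifications as SEC fractions are added in order."""
    seen = set()
    running = []
    for ids in fractions:
        seen |= set(ids)  # an ID already seen in an earlier fraction is not counted again
        running.append(len(seen))
    return running


# Hypothetical proteoform IDs from three SEC fractions.
fractions = [{"pfA", "pfB"}, {"pfB", "pfC", "pfD"}, {"pfD", "pfE"}]
print(cumulative_unique(fractions))
```

For the three toy fractions the running totals are 2, 4, and 5, because pfB and pfD are counted only the first time they appear.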
Figure 2.

Characterization of an E. coli proteome using SEC-cIEF-MS/MS. (A) 2D separation of the E. coli proteome using the SEC-cIEF platform. Proteins were fractionated by MW in the SEC dimension (vertical chromatogram) and further separated by pI in the cIEF dimension (horizontal electropherograms). (B) SDS-PAGE profiles of the proteome in SEC fractions. (C) Box plots of the mass distribution of identified proteoforms in SEC fractions. (D) Migration time vs calculated pI of unmodified proteoforms in SEC fractions 5 and 6. The pI values were calculated using ExPASy (https://web.expasy.org/compute_pi/). (E) Cumulative numbers of proteoform (black line) and protein (dark cyan bars) identifications as SEC fractions are added.
SEC-cIEF-MS/MS identified 10,153 proteoform-spectrum matches (PrSMs), 1896 proteoforms, and 365 proteins from the E. coli proteome at a 5% proteoform-level FDR (Figure S6A and the Supporting Information). The data represent the first and largest TDP dataset obtained with cIEF-MS/MS. Most of the identified proteoforms had masses below 20 kDa, while 83 proteoforms were between 20 and 33 kDa (Figure S6B). Although the extracted E. coli proteome contained proteins ranging from ~10 to 100 kDa (Figure 2B), characterization of proteoforms larger than 30 kDa remains challenging for top-down MS because of the dramatic decrease in signal-to-noise ratio with increasing proteoform mass, the limited mass resolution of mass analyzers, and ion suppression by coeluting small proteins. The number of matched fragment ions per identified proteoform ranged from 6 to 92, with a mean of 23 (Figure S6C). An example of the fragmentation pattern of one proteoform (putative monooxygenase YdhR) is shown in Figure S6D. The proteoform was identified with 76 fragment ions, an E-value of 1.71 × 10^−45, and 52% backbone cleavage coverage. On average, we identified about five proteoforms per protein (1896 proteoforms and 365 proteins). For some proteins, the number of proteoforms was much higher. For instance, we identified 48 proteoforms of osmotically inducible protein Y (osmY). All of these proteoforms were truncated either at the N-terminus (47) or at the C-terminus (1). Because TDP directly characterizes intact proteoforms, we could determine the distribution of the first amino acid residue position of the N-terminally truncated proteoforms (Figure S6E). For 23 of the 47 N-terminally truncated proteoforms, the first 28 amino acid residues were cleaved as the signal peptide, as reported in the literature.37 Interestingly, we also identified 3 and 4 proteoforms with the first 27 and 114 amino acid residues truncated, respectively.
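The 5% proteoform-level FDR above is the kind of threshold usually set with a target-decoy search. The Python sketch below shows the generic idea under simplifying assumptions (one best E-value per proteoform, FDR estimated as decoys/targets, no special tie handling); it is not TopPIC's exact procedure, and the hits are hypothetical.

```python
def target_decoy_cutoff(hits, max_fdr=0.05):
    """Largest E-value cutoff whose estimated FDR (#decoys / #targets passing) <= max_fdr.

    `hits` holds (e_value, is_decoy) pairs, assumed to be one best hit per
    proteoform. Returns None if no cutoff satisfies the threshold.
    """
    cutoff = None
    targets = decoys = 0
    for e_value, is_decoy in sorted(hits):  # ascending E-value: best hits first
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = e_value
    return cutoff


def accepted_targets(hits, max_fdr=0.05):
    """Target hits at or below the FDR-controlled cutoff."""
    cutoff = target_decoy_cutoff(hits, max_fdr)
    if cutoff is None:
        return []
    return [e for e, is_decoy in hits if not is_decoy and e <= cutoff]


# Hypothetical hits: (E-value, is_decoy).
hits = [(1e-10, False), (1e-9, False), (1e-8, True), (1e-7, False), (1e-6, True), (1e-5, True)]
print(target_decoy_cutoff(hits, 0.05), len(accepted_targets(hits, 0.05)))
```

Keeping the largest passing cutoff, rather than stopping at the first violation, matches accepting every hit whose q-value is at or below the threshold.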
We then analyzed the relative abundance of these proteoforms truncated at different positions based on the number of PrSMs of each proteoform13,38 (Figure S6F). The 23 proteoforms lacking the first 28 amino acid residues accounted for about 87% of the total number of osmY PrSMs (248 of 284). We further examined these 23 proteoforms and found that they either had no PTMs or carried various PTMs, such as methylation, acetylation, and succinylation. Based on PrSM counts, the proteoform lacking the first 28 amino acids and carrying no PTMs is the most abundant osmY proteoform in the E. coli cells. These data demonstrate the power of our SEC-cIEF-MS/MS platform for delineating proteoforms in complex biological samples on a global scale.
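Using PrSM counts as a rough abundance proxy, the 87% figure follows from simple proportions. A minimal Python sketch; grouping the remaining osmY PrSMs into one bucket (284 − 248 = 36) is derived from the numbers in the text.

```python
def prsm_shares(prsm_counts):
    """Share of a gene's PrSMs assigned to each proteoform (or proteoform group)."""
    total = sum(prsm_counts.values())
    return {name: count / total for name, count in prsm_counts.items()}


# Counts from the text: 248 of 284 osmY PrSMs; the remaining 36 are grouped together.
osmy = {"residues 1-28 removed (23 proteoforms)": 248, "all other osmY proteoforms": 284 - 248}
for name, share in prsm_shares(osmy).items():
    print(f"{name}: {100 * share:.1f}%")
```

This prints about 87.3% and 12.7%. Spectral counting is a coarse measure, which is why the text frames it as relative abundance rather than absolute quantity.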
Quantitative TDP of Zebrafish Male and Female Brains.
Sexual dimorphism of the brain, which arises mainly from the expression of sex chromosome genes and the effects of hormones secreted by the gonads, underlies phenotypic differences in memory, cognition, emotion, stress responsivity, and reproductive behavior.39 Only a few studies have employed quantitative BUP to study sexual dimorphism of the brain.40–42 To our knowledge, no quantitative TDP study has compared male and female brain proteomes in a proteoform-specific manner. Zebrafish is an important model organism in developmental biology for both embryogenesis studies and drug development.43 Here, we performed a label-free quantitative TDP study using SEC-cIEF-MS/MS to investigate sex-related proteoforms in zebrafish brains.
Five male zebrafish brains were pooled and homogenized to reduce heterogeneity among individual fish, and the extracted protein sample was fractionated by SEC into four fractions. The female zebrafish brains were prepared with the same protocol. The eight SEC fractions (four per sex) were analyzed by high-throughput cIEF-MS/MS in technical triplicate. To simplify the quantitative TDP data analysis, the relative abundance of proteoforms was compared between female and male brains for each pair of SEC fractions (e.g., male SEC fraction 1 vs female SEC fraction 1). A total of 171, 1268, 1260, and 741 proteoforms, corresponding to 51, 211, 216, and 192 proteins, were identified from SEC fractions 1, 2, 3, and 4, respectively. We identified proteoforms with N-terminal methionine excision, N-terminal truncation or signal peptide cleavage, and several common PTMs, including acetylation (+42 Da), phosphorylation (+80 Da), and methylation (+14 Da). For instance, we identified a proteoform of calmodulin carrying N-terminal methionine excision, N-terminal acetylation, and K115 trimethylation (Figure 3A), which was also reported in our previous CZE-MS/MS study of zebrafish brains.44 In addition, we identified an N-terminally truncated proteoform of caveolae-associated protein 4a spanning Lys273 to Asp329 and phosphorylated at Thr292 (Figure 3B). Phosphorylation at Thr292 is further supported by the PTM annotation in UniProt (https://www.uniprot.org/uniprot/A1L260).
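Nominal mass shifts such as +42 Da are ambiguous: acetylation (+42.0106 Da) and trimethylation (+42.0470 Da) differ by only about 0.036 Da, so telling them apart relies on accurate mass measurement and fragment-ion localization. A minimal Python sketch of matching an observed shift against a small table of monoisotopic modification masses; the table and the ±0.01 Da default tolerance are illustrative choices, not the search settings used here.

```python
# Monoisotopic mass shifts (Da) for a few common modifications.
PTM_DELTAS = {
    "methylation": 14.015650,
    "dimethylation": 28.031300,
    "acetylation": 42.010565,
    "trimethylation": 42.046950,
    "phosphorylation": 79.966331,
    "succinylation": 100.016044,
}


def match_mass_shift(observed_da: float, tolerance_da: float = 0.01) -> list[str]:
    """Names of modifications whose delta mass lies within +/- tolerance_da of the observed shift."""
    return [name for name, delta in PTM_DELTAS.items() if abs(observed_da - delta) <= tolerance_da]


print(match_mass_shift(42.047))                    # narrow window: trimethylation only
print(match_mass_shift(42.03, tolerance_da=0.05))  # wide window: acetylation and trimethylation
```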
Figure 3.

Quantitative TDP of four SEC fractions of female and male zebrafish brains using cIEF-MS/MS. (A) Sequence and fragmentation pattern of a proteoform of calmodulin. The sequence underlined with a green line has a mass shift of 42.0 Da, corresponding to trimethylation at K115. (B) Sequence and fragmentation pattern of a proteoform of caveolae-associated protein 4a. A mass shift of 79.0 Da at T292 corresponds to phosphorylation. (C–F) Volcano plots of −log10(p-value) vs log2(fold change, female/male) of quantified proteoforms in SEC fractions 1, 2, 3, and 4 of female and male brains, respectively. Differentially abundant proteoforms were determined by t-test in Perseus with cutoffs of FDR = 0.05 and S0 = 1. Proteoforms with higher abundance in female and male brains are highlighted in red and dark cyan, respectively.
For LFQ, only proteoforms with reported intensities in all six cIEF-MS/MS runs (triplicate runs per sex) were considered for abundance comparisons between sexes. The feature intensities of the selected proteoforms were normalized and compared by t-test with an FDR threshold of 0.05 and an S0 of 1 (Figure 3C–F). We quantified 109, 814, 1089, and 569 proteoforms in SEC fractions 1 to 4, respectively. Among them, 2, 92, 34, and 40 proteoforms showed higher abundance in the corresponding SEC fractions of the female brain sample, while 3, 54, 37, and 21 proteoforms showed higher abundance in the corresponding fractions of the male brain sample. In total, 263 proteoforms showed statistically significant differences in abundance between male and female brains.
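The filtering and testing steps above can be sketched as follows. This Python sketch assumes log2-transformed intensities, Student's two-sample t-test, and the SAM-style statistic d = difference / (s + S0) that underlies the S0 parameter in Perseus; the normalization step and Perseus's permutation-based FDR are not reproduced, and the function names and data are hypothetical.

```python
import math
import statistics


def complete_cases(table, n_runs=6):
    """Keep proteoforms with a positive intensity reported in every run."""
    return {
        pf: values
        for pf, values in table.items()
        if len(values) == n_runs and all(v is not None and v > 0 for v in values)
    }


def s0_statistic(female, male, s0=1.0):
    """Return (log2 fold change female/male, SAM-style d = diff / (s + s0)).

    s is the pooled standard error of Student's two-sample t-test on log2
    intensities; s0 keeps tiny variances from inflating the statistic.
    """
    f = [math.log2(v) for v in female]
    m = [math.log2(v) for v in male]
    diff = statistics.mean(f) - statistics.mean(m)
    n1, n2 = len(f), len(m)
    pooled_var = ((n1 - 1) * statistics.variance(f) + (n2 - 1) * statistics.variance(m)) / (n1 + n2 - 2)
    s = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return diff, diff / (s + s0)


# Hypothetical intensities: three female runs followed by three male runs.
table = {
    "pf1": [8.0e6, 9.0e6, 8.5e6, 2.0e6, 2.2e6, 1.9e6],
    "pf2": [5.0e5, None, 4.8e5, 5.1e5, 4.9e5, 5.2e5],  # missing value: dropped
}
for pf, values in complete_cases(table).items():
    log2_fc, d = s0_statistic(values[:3], values[3:])
    print(pf, round(log2_fc, 2), round(d, 2))
```

Requiring complete cases avoids imputation, at the cost of discarding proteoforms detected in only some runs.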
To understand the biological significance of these differentially abundant proteoforms, we performed gene ontology (GO) enrichment analysis of the genes whose proteoforms showed significantly higher abundance in female or male brains. We focused on enriched biological process (BP) terms for 29 annotated genes from female brains (Figure S7A) and 34 genes from male brains (Figure S7B). In female brains, the enriched BP categories include sequestering of actin monomers, histone exchange, neuron projection development, cell proliferation, and actin filament organization, suggesting that these proteoforms are involved in neurite outgrowth and neuronal development. Sequestering of actin monomers, the most enriched BP category, includes thymosin beta 2 (Tβ 2) and beta thymosin-like protein. Two proteoforms of Tβ 2 and five proteoforms of beta thymosin-like protein showed significantly higher abundance in female brains. Studies of zebrafish beta-thymosin have revealed that the protein binds monomeric actin and regulates neuronal growth and differentiation.45,46 However, how specific proteoforms of beta-thymosin are involved in sex-specific brain functions remains unknown. The histone exchange category includes acidic leucine-rich nuclear phosphoprotein 32 family member A (ANP32A) and member E (ANP32E). ANP32A inhibits the acetyltransferase complex in the nucleus, regulating transcription initiation.47 ANP32E is implicated in the removal of the histone variant H2A.Z and in inhibiting protein phosphatase 2A, thereby promoting synaptogenesis.48–50 The higher abundance of N-terminally truncated proteoforms of ANP32A and ANP32E in female brains might play a role in sex-related regulation of transcription and neuronal cell proliferation.
Prothymosin alpha-A (PTα-A) and prothymosin alpha-B (PTα-B), which are enriched in the cell proliferation category, had both N-terminally and C-terminally truncated proteoforms identified in our study. PTα is an essential nuclear protein that regulates cell proliferation and protects the brain from stroke or traumatic damage by inhibiting cell apoptosis and neuronal necrosis.51 In breast cancer MCF7 cells, PTα was upregulated by estradiol at both the mRNA and protein levels, and its gene transcription can be altered by estrogen receptor α.52 Similar observations were made in neuroblastoma cells, in which estradiol treatment promoted PTα synthesis.53 This evidence suggests that the more abundant PTα proteoforms in female brains may be associated with estrogen-regulated neural cell proliferation and differentiation.
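GO enrichment tools commonly score a term with a one-sided hypergeometric (Fisher's exact) test. The tool and background used for Figure S7 are not stated in this section, so the Python sketch below is a generic illustration; apart from the 29 genes from female brains, the counts in the usage line are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for over-representation.

    N: annotated genes in the background
    K: background genes annotated with the GO term
    n: genes in the query list (e.g., genes with more abundant proteoforms)
    k: query genes annotated with the term
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# 29 query genes as in the text; the other three counts are hypothetical.
print(enrichment_p_value(k=3, n=29, K=6, N=12000))
```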
In male brains, axon development and axon extension were among the enriched BP categories (Figure S7B). Several proteoforms with higher abundance in male brains come from genes involved in neuronal development. For example, growth-associated protein 43 (Gap43), a membrane-bound protein, is responsible for axonal outgrowth and elongation.54 We found a fragment of Gap43 that was highly abundant in male brains but not in female brains, suggesting that Gap43 expression might be regulated by hormones. This hypothesis is consistent with previous studies showing that Gap43 mRNA is regulated by gonadal hormones and is sexually dimorphic.54,55 Interestingly, we identified several proteoforms with higher abundance in male brains from pro-opiomelanocortin (POMC), prodynorphin (PDYN), and prepronociceptin a (PPNOC), which are involved in the neuropeptide signaling pathway. In particular, POMC and PDYN are important neuropeptide precursors that can be proteolytically cleaved at either paired (such as Lys–Arg or Arg–Arg) or single basic residues to generate endogenous hormone peptides.56,57 We identified two proteoforms of POMC located in the region of the N-terminal peptide of POMC (NPP, Gln29 to Ser73), which is a potential adrenal growth factor.58 One POMC proteoform (Ser54 to His105), generated by cleavage at Arg–Ser at its N-terminus and at His–Lys at its C-terminus, showed 4.6 times higher abundance in male brains than in female brains (p-value: 10^−3.9). The other proteoform (Gln29 to Arg53), from which the N-terminal signal peptide had been cleaved, showed 2.9 times higher abundance in male brains than in female brains (p-value: 10^−2.3). Additionally, a PDYN proteoform (Asp20 to Val100), generated by excision of the N-terminal signal peptide and cleavage at Val–Lys at its C-terminus, showed significantly higher abundance in male brains.
A mass shift of +55.06 Da localized to the region from Gly81 to Ala85 could be due to a sequence variation or a PTM. We also identified another PDYN proteoform with the same sequence and no mass shift, which showed no statistically significant difference in abundance between male and female brains. Further studies are needed to investigate the hormone-related BPs that may be regulated by the more abundant, mass-shifted PDYN proteoform in male brains.
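The precursor processing described above can be illustrated by scanning a sequence for paired basic residues. The Python sketch below lists candidate cleavage positions after Lys/Arg pairs and splits the sequence there; it ignores single-basic-residue sites and all other processing rules, and the example sequence is made up rather than the zebrafish POMC or PDYN sequence.

```python
import re


def paired_basic_sites(sequence: str) -> list[int]:
    """1-based positions after which cleavage at a Lys/Arg pair (KK, KR, RK, RR) could occur.

    The zero-width lookahead reports overlapping pairs, e.g. both pairs in "KRR".
    """
    return [m.start() + 2 for m in re.finditer(r"(?=[KR][KR])", sequence)]


def split_at(sequence: str, sites: list[int]) -> list[str]:
    """Cut the sequence after each 1-based position in `sites`."""
    bounds = [0, *sorted(sites), len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# Made-up precursor-like sequence (not POMC or PDYN).
precursor = "MASGLLKRYGGFMRRSLEAV"
sites = paired_basic_sites(precursor)
print(sites, split_at(precursor, sites))
```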
We noted that ten and four phosphorylated proteoforms showed significantly higher abundance in female and male brains, respectively, including but not limited to proteoforms of beta thymosin-like protein, MARCKS-related protein 1-B, thymosin beta 2, calmodulin, and microtubule-associated protein (Supporting Information). These data suggest a potential role of protein phosphorylation in sexual dimorphism.
In summary, we discovered drastic differences in proteoform abundance between male and female zebrafish brains using SEC-cIEF-MS/MS-based label-free TDP. Many of the differentially abundant proteoforms are associated with neuronal development. For example, proteoforms of Tβ 2, beta thymosin-like protein, ANP32A, ANP32E, PTα-A, PTα-B, stathmin, and microtubule-associated protein were more abundant in female brains, while proteoforms of neurofilament (medium polypeptide), Gap43, trafficking regulator of GLUT4 (SLC2A4) 1a, and tubulin polymerization-promoting protein family member 2 were more abundant in male brains. Hormones have been found to regulate the expression of most of the genes encoding the proteoforms mentioned above and to affect multiple cellular processes such as neurogenesis, cell death, and cell differentiation.59 We speculate that the sex-dependent proteoform profile in zebrafish brains could be closely associated with differences in hormone regulation between sexes. Discovering these differentially abundant proteoforms will help us better understand sex-related neuronal development. Our data demonstrate the value of quantitative TDP for studying sexual dimorphism of the brain.
Quantitative BUP of Zebrafish Male and Female Brains.
We also performed tandem mass tag (TMT)-based quantitative BUP of male and female zebrafish brains, with two goals: first, to obtain a comprehensive picture of sex-dependent gene expression outcomes in the brain at the protein group level and, second, to compare and combine the TDP and BUP quantification results to better understand sexual dimorphism of the brain. The TMT quantification workflow is shown in Figure S8. We quantified 3811 protein groups from 30,738 peptides (Supporting Information). The volcano plot was generated with t-test cutoffs of FDR = 0.05 and S0 = 0.4. We found that 67 protein groups were more abundant in female brains, while 221 protein groups were more abundant in male brains (Figure 4A). GO enrichment analysis of the protein groups more abundant in female brains indicated several categories associated with neuron growth and brain development, including histone exchange, translational initiation, translation, and cell proliferation, consistent with our top-down findings. Proteins such as ANP32A, ANP32E, PTα-A, and PTα-B, which were more abundant at the protein group level, also had proteoforms with higher abundance in the TDP data. Some other proteins not annotated in the enrichment analysis also drew our attention because of their drastically higher abundance in female brains. For example, vitellogenin 1 and vitellogenin 5 from the vitellogenin gene family, which are typical estrogenic biomarkers, showed 10.6-fold (p-value: 10^−4.3) and 4.6-fold (p-value: 10^−4.8) higher abundance in female brains. Coagulation factor XIII (A1 polypeptide a, tandem duplicate 1), which was 3.3-fold (p-value: 10^−3.6) higher in female brains, was reported to be greatly upregulated by 17β-estradiol during embryonic development.60 We also found multiple hormone-regulated proteins with significantly higher abundance in male brains than in female brains.
These proteins include hemopexin, antithrombin, and lectin (mannose-binding, 1), which are associated with cellular response to estrogen stimulus based on GO enrichment analysis. For example, hemopexin, a heme scavenger, maintains iron homeostasis in neurons and prevents heme-mediated oxidative damage.61 Treatment of zebrafish embryos with estrogen downregulated hemopexin expression in the liver at various developmental stages.60 In our study, hemopexin showed 2.8 times (p-value: 10^−5.3) higher abundance in male brains, which may be associated with lower estrogen levels.
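TMT quantification compares reporter-ion intensities across channels, typically after adjusting for loading differences. The workflow actually used is given in Figure S8 and is not reproduced here; this Python sketch is a generic illustration assuming equal-total (sum) normalization across channels and a log2 ratio of mean female to mean male channel intensities, with hypothetical data.

```python
import math
import statistics


def sum_normalize(intensities):
    """Scale each channel (column) so all channels share the same total intensity.

    `intensities` maps protein group -> list of reporter-ion intensities, one per channel.
    """
    rows = list(intensities.values())
    totals = [sum(col) for col in zip(*rows)]
    target = statistics.mean(totals)
    return {pg: [v * target / totals[c] for c, v in enumerate(row)]
            for pg, row in intensities.items()}


def log2_ratio(row, groups, numerator="F", denominator="M"):
    """log2(mean numerator-channel intensity / mean denominator-channel intensity)."""
    num = [v for v, g in zip(row, groups) if g == numerator]
    den = [v for v, g in zip(row, groups) if g == denominator]
    return math.log2(statistics.mean(num) / statistics.mean(den))


# Hypothetical 4-channel experiment (two female, two male channels).
groups = ["F", "F", "M", "M"]
raw = {
    "pgA": [400.0, 420.0, 100.0, 110.0],
    "pgB": [100.0, 100.0, 110.0, 90.0],
}
for pg, row in sum_normalize(raw).items():
    print(pg, round(log2_ratio(row, groups), 2))
```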
Figure 4.

Comparison of quantitative BUP and TDP data, giving an overview of gene expression outcomes at the protein group and proteoform levels. (A) Volcano plot of protein groups quantified in female and male zebrafish brains by BUP. The cutoffs for the t-test were FDR = 0.05 and S0 = 0.4. Comparison of quantitative results between TDP and BUP for female (B) and male (C) zebrafish brains. “ND” means not detected; “-” indicates no significant change in abundance.
To compare the quantitative results of TDP and BUP, we extracted the protein accession numbers of the differentially abundant proteoforms from TDP and matched them to protein groups quantified by BUP to determine whether each was upregulated, downregulated, not differentially abundant, or not identified. Most proteoforms with statistically higher abundance in female (82.9%) or male (77.9%) brains showed no differential abundance at the protein group level (Figure 4B,C). For instance, several proteoforms of beta thymosin-like protein, beta-synuclein, thymosin beta 2, calmodulin, pro-opiomelanocortin, and prodynorphin with various PTMs showed statistically significant differences in abundance between male and female brains in TDP. However, BUP did not capture these differences and showed no significant abundance difference at the protein group level for these proteins. Interestingly, for 5.1% (female) and 8.7% (male) of the differentially abundant proteoforms, the TDP and BUP data show opposite patterns, and for only 10.1% (female) and 12.5% (male) do the TDP and BUP data agree. This discrepancy between BUP and TDP data is expected. Here, quantitative BUP most likely reported only the difference in the average abundance of all proteoforms stemming from one gene between the two samples. BUP cannot directly provide the abundance and PTM information of individual proteoforms because the intact proteoform picture is lost during enzymatic digestion. In contrast, quantitative TDP directly characterized individual proteoforms, including their PTMs and abundance differences between samples. It is well known that proteins can undergo significant changes in PTMs without overall changes in protein abundance in various BPs.
Therefore, it is reasonable that a proteoform carrying specific PTMs quantified by TDP has significant abundance difference between samples and the corresponding protein from BUP that represents all the proteoforms of the gene have consistent abundance in the samples.
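The accession-matching step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The function names, the data layout (proteoforms as accession/direction pairs and BUP protein groups as accession lists with a status), and the rule that an accession listed in several groups takes the status of the last group are all hypothetical. Each differentially expressed proteoform falls into one of four categories. Its protein group may change in the same direction, change in the opposite direction, show no significant change ("-" in Figure 4), or be absent from the BUP data ("ND").

```python
from collections import Counter

CATEGORIES = ("agree", "opposite", "no change", "not detected")


def build_lookup(protein_groups):
    """Map every accession of a BUP protein group to that group's status.

    protein_groups: iterable of (accessions, status) pairs (hypothetical
    layout); status is "female" or "male" for the sex with significantly
    higher abundance, or None if quantified without significant change.
    """
    lookup = {}
    for accessions, status in protein_groups:
        for acc in accessions:
            lookup[acc] = status  # later groups overwrite earlier ones
    return lookup


def compare_levels(proteoforms, protein_groups):
    """Classify differentially expressed TDP proteoforms against BUP.

    proteoforms: iterable of (accession, direction) pairs, where
    direction is "female" or "male" (sex with higher abundance in TDP).
    Returns the percentage of proteoforms in each category.
    """
    lookup = build_lookup(protein_groups)
    counts = Counter()
    for acc, direction in proteoforms:
        if acc not in lookup:
            counts["not detected"] += 1
        elif lookup[acc] is None:
            counts["no change"] += 1
        elif lookup[acc] == direction:
            counts["agree"] += 1
        else:
            counts["opposite"] += 1
    total = sum(counts.values())
    if total == 0:
        return {cat: 0.0 for cat in CATEGORIES}
    return {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}
```

Matching on accession numbers keeps the comparison simple. However, one unchanged protein group can absorb several proteoforms that differ in abundance, which is exactly the situation discussed above.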
The comparison of the BUP and TDP datasets is important for two reasons. First, it shows that combining the two quantitative strategies is potentially valuable for generating comprehensive information on the sexual dimorphism of zebrafish brains, because TDP and BUP provide complementary information on gene expression products. Second, the discrepancies between the BUP and TDP data indicate the importance of delineating proteins in a proteoform-specific manner with TDP for accurately understanding protein function in various BPs. The TDP and BUP MS raw data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository62 with the dataset identifier PXD020342.
CONCLUSIONS
We reported the first applications of cIEF-MS/MS and SEC-cIEF-MS/MS for large-scale TDP of complex biological samples, identifying and quantifying thousands of proteoforms. Multilevel quantitative proteomics of male and female zebrafish brains discovered a variety of differentially expressed proteins and proteoforms associated with neuron development and hormone peptide activity. We revealed substantial discrepancies between the quantitative results of TDP and BUP, suggesting the importance of multilevel proteomics for accurately understanding the roles played by proteins in cellular processes.
We note that the performance of automated cIEF-MS/MS is limited for highly basic (pI > 10) or acidic (pI < 3) proteoforms. The ampholyte used in cIEF-MS/MS still causes ionization suppression of proteoforms and contamination of the instrument. In future studies, we need to improve the technique for extremely acidic and basic proteoforms and eliminate the negative effects of the ampholyte via novel approaches, for example, immobilized pH gradient-based cIEF.63,64 Additionally, the stability of the capillary coating under automated cIEF-MS conditions needs further improvement.
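To flag proteoforms that may fall outside the pI range noted above, one could estimate a theoretical pI from the sequence alone. The sketch below is a hypothetical helper, not part of the authors' workflow. It uses approximate pKa values (an assumption; they resemble the EMBOSS set, and other pKa sets shift the result). It also ignores PTMs such as phosphorylation or N-terminal acetylation, which alter the real pI. The pH of zero net charge is found by bisection, which works because net charge decreases monotonically with pH. Only the 3 and 10 thresholds come from the text.

```python
# Approximate pKa values (assumption: close to the EMBOSS set)
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence, ph):
    """Henderson-Hasselbalch net charge of an unmodified sequence."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1
    counts["Cterm"] = 1
    # Fraction protonated for basic groups, deprotonated for acidic groups
    positive = sum(counts[g] / (1 + 10 ** (ph - pka))
                   for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph))
                   for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(sequence, tol=1e-4):
    """Bisection on pH in [0, 14] for the point of zero net charge."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: pI lies at higher pH
        else:
            high = mid  # negative or zero: pI lies at lower pH
    return (low + high) / 2


def outside_cief_window(sequence, low_pi=3.0, high_pi=10.0):
    """True if the estimated pI is below low_pi or above high_pi."""
    pi = isoelectric_point(sequence)
    return pi < low_pi or pi > high_pi


if __name__ == "__main__":
    for seq in ("KKKKKKKKKK", "DDDDDDDDDD", "GGGG"):
        print(seq, round(isoelectric_point(seq), 2), outside_cief_window(seq))
```

A lysine-rich peptide is flagged as too basic and an aspartate-rich peptide as too acidic. A peptide with only terminal charges sits near pH 6 and falls inside the window.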
Supplementary Material
ACKNOWLEDGMENTS
We thank Prof. Heedeok Hong’s group at the Department of Chemistry of Michigan State University for kindly providing the E. coli cells for this project. We thank Prof. Jose Cibelli’s group at the Department of Animal Science of Michigan State University for their help in collecting zebrafish brains for the project. We thank Prof. Xiaowen Liu’s group at Indiana University-Purdue University Indianapolis for their help with the TDP database search using the TopPIC software. We acknowledge support from the National Institute of General Medical Sciences (NIGMS) through Grant R01GM125991 and the National Science Foundation through Grant DBI1846913 (CAREER Award).
Footnotes
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.analchem.0c03266.
Experimental details, cIEF-MS data of standard proteins using different lengths of the NH4OH plug, cIEF-MS data of standard proteins using different protein concentrations, cIEF-MS data of standard proteins using different sample injection lengths, cIEF-MS data of standard proteins using different concentrations of the ampholyte, correlations of proteoform LFQ intensities between cIEF-MS/MS runs, identification results of the E. coli proteome from large-scale TDP using SEC-cIEF-MS/MS, GO enrichment analysis of differentially expressed proteoforms in female and male brains of zebrafish, and workflow of quantitative BUP (PDF)
Identified and quantified proteoforms and proteins from BUP and TDP (XLSX)
Complete contact information is available at: https://pubs.acs.org/10.1021/acs.analchem.0c03266
The authors declare no competing financial interest.
Contributor Information
Tian Xu, Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States.
Xiaojing Shen, Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States.
Zhichang Yang, Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States.
Daoyang Chen, Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States.
Rachele A. Lubeckyj, Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States.
Elijah N. McCool, Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States.
Liangliang Sun, Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States.
REFERENCES
- (1). Smith LM; Kelleher NL; Goodlett D; Langridge-Smith P; Goo YA; Safford G; Bonilla L; Kruppa G; Zubarev R Nat. Methods 2013, 10, 186–187.
- (2). Smith LM; Kelleher NL Science 2018, 359, 1106–1107.
- (3). Toby TK; Fornelli L; Kelleher NL Annu. Rev. Anal. Chem 2016, 9, 499–519.
- (4). Cabras T; Pisano E; Montaldo C; Giuca MR; Iavarone F; Zampino G; Castagnola M; Messana I Mol. Cell. Proteomics 2013, 12, 1844–1852.
- (5). Calligaris D; Villard C; Lafitte D J. Proteomics 2011, 74, 920–934.
- (6). Li H; Nguyen HH; Ogorzalek Loo RR; Campuzano IDG; Loo JA Nat. Chem 2018, 10, 139–148.
- (7). Gomes FP; Yates JR III Mass Spectrom. Rev 2019, 38, 445–460.
- (8). Schaffer LV; Millikin RJ; Miller RM; Anderson LC; Fellers RT; Ge Y; Kelleher NL; LeDuc RD; Liu X; Payne SH; Sun L; Thomas PM; Tucholski T; Wang Z; Wu S; Wu Z; Yu D; Shortreed MR; Smith LM Proteomics 2019, 19, 1800361.
- (9). Shen X; Yang Z; McCool EN; Lubeckyj RA; Chen D; Sun L TrAC, Trends Anal. Chem 2019, 120, 115644.
- (10). Han X; Wang Y; Aslanian A; Bern M; Lavallée-Adam M; Yates JR III Anal. Chem 2014, 86, 11006–11012.
- (11). Lubeckyj RA; Basharat AR; Shen X; Liu X; Sun L J. Am. Soc. Mass Spectrom 2019, 30, 1435–1445.
- (12). Lubeckyj RA; McCool EN; Shen X; Kou Q; Liu X; Sun L Anal. Chem 2017, 89, 12059–12067.
- (13). McCool EN; Lubeckyj RA; Shen X; Chen D; Kou Q; Liu X; Sun L Anal. Chem 2018, 90, 5529–5533.
- (14). Zhao Y; Sun L; Zhu G; Dovichi NJ J. Proteome Res 2016, 15, 3679–3685.
- (15). Hühner J; Lämmerhofer M; Neusüß C Electrophoresis 2015, 36, 2670–2686.
- (16). Shen Y; Xiang F; Veenstra TD; Fung EN; Smith RD Anal. Chem 1999, 71, 5348–5353.
- (17). Paša-Tolić L; Jensen PK; Anderson GA; Lipton MS; Peden KK; Martinović S; Tolić N; Bruce JE; Smith RD J. Am. Chem. Soc 1999, 121, 7949–7950.
- (18). Tang Q; Harrata AK; Lee CS Anal. Chem 1995, 67, 3515–3519.
- (19). Tang Q; Harrata AK; Lee CS Anal. Chem 1997, 69, 3177–3182.
- (20). Jensen PK; Paša-Tolić L; Anderson GA; Horner JA; Lipton MS; Bruce JE; Smith RD Anal. Chem 1999, 71, 2076–2084.
- (21). Smith RD; Barinaga CJ; Udseth HR Anal. Chem 1988, 60, 1948–1952.
- (22). Maxwell EJ; Zhong X; Zhang H; van Zeijl N; Chen DDY Electrophoresis 2010, 31, 1130–1137.
- (23). Sun L; Zhu G; Zhang Z; Mou S; Dovichi NJ J. Proteome Res 2015, 14, 2312–2321.
- (24). Wojcik R; Dada OO; Sadilek M; Dovichi NJ Rapid Commun. Mass Spectrom 2010, 24, 2554–2560.
- (25). Mokaddem M; Gareil P; Varenne A Electrophoresis 2009, 30, 4040–4048.
- (26). Zhong X; Maxwell EJ; Ratnayake C; Mack S; Chen DDY Anal. Chem 2011, 83, 8748–8755.
- (27). Zhu G; Sun L; Dovichi NJ J. Sep. Sci 2017, 40, 948–953.
- (28). Dai J; Lamp J; Xia Q; Zhang Y Anal. Chem 2018, 90, 2246–2254.
- (29). Lechner A; Giorgetti J; Gahoual R; Beck A; Leize-Wagner E; François Y-N J. Chromatogr. B: Anal. Technol. Biomed. Life Sci 2019, 1122–1123, 1–17.
- (30). Wang L; Bo T; Zhang Z; Wang G; Tong W; Da Yong Chen D Anal. Chem 2018, 90, 9495–9503.
- (31). Wang L; Chen DDY Electrophoresis 2019, 40, 2899–2907.
- (32). Kou Q; Xun L; Liu X Bioinformatics 2016, 32, 3495–3497.
- (33). Anderson LC; DeHart CJ; Kaiser NK; Fellers RT; Smith DF; Greer JB; LeDuc RD; Blakney GT; Thomas PM; Kelleher NL; Hendrickson CL J. Proteome Res 2017, 16, 1087–1096.
- (34). Liu Z; Wang R; Liu J; Sun R; Wang F J. Proteome Res. 2019, 18, 2185–2194.
- (35). Riley NM; Sikora JW; Seckler HS; Greer JB; Fellers RT; LeDuc RD; Westphall MS; Thomas PM; Kelleher NL; Coon JJ Anal. Chem 2018, 90, 8553–8560.
- (36). Zhu G; Sun L; Yang P; Dovichi NJ Anal. Chim. Acta 2012, 750, 207–211.
- (37). Yim HH; Villarejo M J. Bacteriol. 1992, 174, 3637–3644.
- (38). Geis-Asteggiante L; Ostrand-Rosenberg S; Fenselau C; Edwards NJ Anal. Chem 2016, 88, 10900–10907.
- (39). Arnold AP Nat. Rev. Neurosci 2004, 5, 701–708.
- (40). Di Domenico F; Casalena G; Sultana R; Cai J; Pierce WM; Perluigi M; Cini C; Baracca A; Solaini G; Lenaz G; Jia J; Dziennis S; Murphy SJ; Alkayed NJ; Butterfield DA Brain Res. 2010, 1362, 1–12.
- (41). Martins-de-Souza D; Schmitt A; Röder R; Lebar M; Schneider-Axmann T; Falkai P; Turck CW J. Psychiatr. Res 2010, 44, 989–991.
- (42). Ogata Y; Charlesworth MC; Higgins L; Keegan BM; Vernino S; Muddiman DC Proteomics 2007, 7, 3726–3734.
- (43). Howe K; Clark MD; Torroja CF; Torrance J; Berthelot C; Muffato M; Collins JE; Humphray S; McLaren K; Matthews L; McLaren S; Sealy I; Caccamo M; Churcher C; Scott C; Barrett JC; Koch R; Rauch G-J; White S; Chow W; Kilian B; Quintais LT; Guerra-Assunção JA; Zhou Y; Gu Y; Yen J; Vogel J-H; Eyre T; Redmond S; Banerjee R; Chi J; Fu B; Langley E; Maguire SF; Laird GK; Lloyd D; Kenyon E; Donaldson S; Sehra H; Almeida-King J; Loveland J; Trevanion S; Jones M; Quail M; Willey D; Hunt A; Burton J; Sims S; McLay K; Plumb B; Davis J; Clee C; Oliver K; Clark R; Riddle C; Elliott D; Threadgold G; Harden G; Ware D; Begum S; Mortimore B; Kerry G; Heath P; Phillimore B; Tracey A; Corby N; Dunn M; Johnson C; Wood J; Clark S; Pelan S; Griffiths G; Smith M; Glithero R; Howden P; Barker N; Lloyd C; Stevens C; Harley J; Holt K; Panagiotidis G; Lovell J; Beasley H; Henderson C; Gordon D; Auger K; Wright D; Collins J; Raisen C; Dyer L; Leung K; Robertson L; Ambridge K; Leongamornlert D; McGuire S; Gilderthorp R; Griffiths C; Manthravadi D; Nichol S; Barker G; Whitehead S; Kay M; Brown J; Murnane C; Gray E; Humphries M; Sycamore N; Barker D; Saunders D; Wallis J; Babbage A; Hammond S; Mashreghi-Mohammadi M; Barr L; Martin S; Wray P; Ellington A; Matthews N; Ellwood M; Woodmansey R; Clark G; Cooper JD; Tromans A; Grafham D; Skuce C; Pandian R; Andrews R; Harrison E; Kimberley A; Garnett J; Fosker N; Hall R; Garner P; Kelly D; Bird C; Palmer S; Gehring I; Berger A; Dooley CM; Ersan-Üruün Z; Eser C; Geiger H; Geisler M; Karotki L; Kirn A; Konantz J; Konantz M; Oberländer M; Rudolph-Geiger S; Teucke M; Lanz C; Raddatz G; Osoegawa K; Zhu B; Rapp A; Widaa S; Langford C; Yang F; Schuster SC; Carter NP; Harrow J; Ning Z; Herrero J; Searle SMJ; Enright A; Geisler R; Plasterk RHA; Lee C; Westerfield M; de Jong PJ; Zon LI; Postlethwait JH; Nüsslein-Volhard C; Hubbard TJP; Crollius HR; Rogers J; Stemple DL Nature 2013, 496, 498–503.
- (44). McCool EN; Chen D; Li W; Liu Y; Sun L Anal. Methods 2019, 11, 2855–2861.
- (45). Roth LW; Bormann P; Bonnet A; Reinhard E Development 1999, 126, 1365–1374.
- (46). van Kesteren RE; Carter C; Dissel HM; van Minnen J; Gouwenberg Y; Syed NI; Spencer GE; Smit AB J. Neurosci 2006, 26, 152–157.
- (47). Wang S; Wang Y; Lu Q; Liu X; Wang F; Ma X; Cui C; Shi C; Li J; Zhang D BioMed Res. Int 2015, 2015, 207347.
- (48). Costanzo RV; Vilá-Ortíz GJ; Perandones C; Carminatti H; Matilla A; Radrizzani M Eur. J. Neurosci 2006, 23, 309–324.
- (49). Obri A; Ouararhni K; Papin C; Diebold M-L; Padmanabhan K; Marek M; Stoll I; Roy L; Reilly PT; Mak TW; Dimitrov S; Romier C; Hamiche A Nature 2014, 505, 648–653.
- (50). Shin H; He M; Yang Z; Jeon YH; Pfleger J; Sayed D; Abdellatif M Biochim. Biophys. Acta, Gene Regul. Mech 2018, 1861, 481–496.
- (51). Fujita R; Ueda M; Fujiwara K; Ueda H Cell Death Differ. 2009, 16, 349–358.
- (52). Bianco NR; Montano MM Oncogene 2002, 21, 5233–5244.
- (53). Ciana P; Ghisletti S; Mussi P; Eberini I; Vegeto E; Maggi A J. Biol. Chem 2003, 278, 31737–31744.
- (54). Lustig RH; Sudol M; Pfaff DW; Federoff HJ Mol. Brain Res 1991, 11, 125–132.
- (55). Shughrue PJ; Dorsa DM J. Comp. Neurol 1994, 340, 174–184.
- (56). Benjannet S; Rondeau N; Day R; Chretien M; Seidah NG Proc. Natl. Acad. Sci. U.S.A 1991, 88, 3564–3568.
- (57). Day R; Lazure C; Basak A; Boudreault A; Limperis P; Dong W; Lindberg I J. Biol. Chem 1998, 273, 829–836.
- (58). Bicknell AB J. Mol. Endocrinol 2016, 56, T39–T48.
- (59). Cooke B; Hegstrom CD; Villeneuve LS; Breedlove SM Front. Neuroendocrinol 1998, 19, 323–362.
- (60). Hao R; Bondesson M; Singh AV; Riu A; McCollum CW; Knudsen TB; Gorelick DA; Gustafsson J-Å PLoS One 2013, 8, No. e79020.
- (61). Hahl P; Davis T; Washburn C; Rogers JT; Smith A J. Neurochem 2013, 125, 89–101.
- (62). Perez-Riverol Y; Csordas A; Bai J; Bernal-Llinares M; Hewapathirana S; Kundu DJ; Inuganti A; Griss J; Mayer G; Eisenacher M; Pérez E; Uszkoreit J; Pfeuffer J; Sachsenberg T; Yılmaz Ş; Tiwary S; Cox J; Audain E; Walzer M; Jarnuczak AF; Ternent T; Brazma A; Vizcaíno JA Nucleic Acids Res. 2019, 47, D442–D450.
- (63). Zhu G; Yuan H; Zhao P; Zhang L; Liang Z; Zhang W; Zhang Y Electrophoresis 2006, 27, 3578–3583.
- (64). Yang C; Wang S; Chang C; Wang Y; Hu X Anal. Chem 2010, 82, 1580–1583.
