Peptidomics dataset: Blood plasma and serum samples of healthy donors fractionated on a set of chromatography sorbents

Georgij Arapidi; Maria Osetrova; Olga Ivanova; Ivan Butenko; Tatjana Saveleva; Polina Pavlovich; Nikolay Anikanov; Vadim Ivanov; Vadim Govorun

doi:10.1016/j.dib.2018.04.018

2018 Apr 10;18:1204–1211. doi: 10.1016/j.dib.2018.04.018

PMCID: PMC5996950 PMID: 29900295

Abstract

Blood as connective tissue potentially contains evidence of all processes occurring within the organism, at least in trace amounts (Petricoin et al., 2006) [1]. Because of their small size, peptides penetrate cell membranes and epithelial barriers more freely than proteins. Among the peptides found in blood, there are both fragments of proteins secreted by various tissues and performing their function in plasma and receptor ligands: hormones, cytokines and mediators of cellular response (Anderson et al., 2002) [2]. In addition, in minor amounts, there are peptide disease markers (for example, oncomarkers) and even foreign peptides related to pathogenic organisms and infection agents. To propose an approach for detailed peptidome characterization, we carried out an LC–MS/MS analysis of blood serum and plasma samples taken from 20 healthy donors on a TripleTOF 5600+ mass-spectrometer. We prepared samples based on our previously developed method of peptide desorption from the surface of abundant blood plasma proteins followed by standard chromatographic steps (Ziganshin et al., 2011) [3]. The mass-spectrometry peptidomics data presented in this article have been deposited to the ProteomeXchange Consortium (Deutsch et al., 2017) [4] via the PRIDE partner repository with the dataset identifier PXD008141 and 10.6019/PXD008141.

Specifications table

Subject area	Biochemistry
More specific subject area	Proteomics, peptidomics
Type of data	LC–MS/MS data and identification data
How data was acquired	TripleTOF 5600+ mass spectrometer with a NanoSpray III ion source (Sciex, Canada) coupled with a NanoLC Ultra 2D+ nano-HPLC system (Eksigent, USA)
Data format	Raw and analyzed data
Experimental factors	Blood plasma and serum samples of 10 healthy male and 10 healthy female donors
Experimental features	Samples were fractionated on several sorbents (cation exchange Toyopearl CM-650M, CM Bio-Gel A, SP Sephadex C-25 and anion exchange QAE Sephadex A-25) and analyzed by LC–MS/MS individually and pooled
Data source location	Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Miklukho-Maklaya str. 16/10, Moscow 117997, Russian Federation
Data accessibility	The mass spectrometry peptidomics data have been deposited to the ProteomeXchange Consortium via the PRIDE [5] partner repository with the dataset identifier PXD008141 and 10.6019/PXD008141. Direct download link: http://www.ebi.ac.uk/pride/archive/projects/PXD008141


Value of data

• The dataset contains 59 raw LC–MS/MS analyses of blood plasma and serum samples fractionated on several different chromatography sorbents. The chromatography methods used complement each other.
• The dataset contains a large number of identified peptides, fragments of known human proteins. The dataset describes the possibilities of our approach for detailed peptidome characterization.
• The dataset can be analyzed for novel peptides, e.g. products of possible lncRNA translation.
• The dataset allows for extended statistical analysis, and we encourage such collaborations.

1. Data

Blood plasma and serum samples of 10 healthy male and 10 healthy female donors were fractionated on a set of sorbents (cation exchange Toyopearl CM-650M, CM Bio-Gel A, SP Sephadex C-25 and anion exchange QAE Sephadex A-25) and analyzed by LC–MS/MS individually and pooled in equal quantities, separately for male and female samples (Table 1, Supplementary Table S1).

Table 1.

Number of analyzed samples in each group.

Biomaterial type	Chromatography sorbent	Sample type	Number of samples
Plasma	Toyopearl	Individual	4 male, 4 female
Plasma	Toyopearl	Pool	6 male, 3 female
Serum	Toyopearl	Individual	4 male, 4 female
Serum	Toyopearl	Pool	2 male, 2 female
Serum	Bio-Gel	Pool	8 male, 7 female
Serum	SP-Sephadex	Pool	2 male, 3 female
Serum	QAE-Sephadex	Pool	6 male, 4 female
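
As a quick consistency check, the rows of Table 1 can be tallied against the 59 raw LC–MS/MS analyses stated under "Value of data". The sketch below is only a transcription of the table; it reads the Bio-Gel, SP-Sephadex and QAE-Sephadex rows as serum, in line with Table 2, and the variable and function names are illustrative.

```python
# Table 1 transcribed as (biomaterial, sorbent, sample type, male, female).
TABLE_1 = [
    ("Plasma", "Toyopearl", "Individual", 4, 4),
    ("Plasma", "Toyopearl", "Pool", 6, 3),
    ("Serum", "Toyopearl", "Individual", 4, 4),
    ("Serum", "Toyopearl", "Pool", 2, 2),
    ("Serum", "Bio-Gel", "Pool", 8, 7),
    ("Serum", "SP-Sephadex", "Pool", 2, 3),
    ("Serum", "QAE-Sephadex", "Pool", 6, 4),
]


def runs_per_group(rows):
    """Sum male and female sample counts per (biomaterial, sorbent) group."""
    totals = {}
    for biomaterial, sorbent, _sample_type, male, female in rows:
        key = (biomaterial, sorbent)
        totals[key] = totals.get(key, 0) + male + female
    return totals


group_totals = runs_per_group(TABLE_1)
total_runs = sum(group_totals.values())

for group, count in group_totals.items():
    print(group, count)
print("total analyses:", total_runs)
```

The per-group totals obtained this way correspond to the five sample groups listed in Table 2.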


Initial analysis identified 13,590 unique peptides belonging to 1430 protein groups. The distribution of the identified peptides by sample group is shown in Table 2 (more information can be found in Supplementary Table S2).

Table 2.

Number of identified peptides and spectra.

Sample group	Number of identified peptides	Number of identified spectra
Plasma, Toyopearl	4516	100,730
Serum, Toyopearl	4207	78,304
Serum, Bio-Gel	5058	67,409
Serum, SP-Sephadex	1635	7407
Serum, QAE-Sephadex	3614	25,782
Total number	13,590	279,632
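
The totals in Table 2 behave differently for the two columns, which a short calculation makes explicit. In the sketch below (a transcription of the table with illustrative names), the per-group spectra add up to the reported total, whereas the per-group peptide counts add up to more than the 13,590 unique peptides because a peptide found in several groups is counted once per group.

```python
# Table 2 transcribed as group -> (identified peptides, identified spectra).
TABLE_2 = {
    "Plasma, Toyopearl": (4516, 100_730),
    "Serum, Toyopearl": (4207, 78_304),
    "Serum, Bio-Gel": (5058, 67_409),
    "Serum, SP-Sephadex": (1635, 7407),
    "Serum, QAE-Sephadex": (3614, 25_782),
}
REPORTED_UNIQUE_PEPTIDES = 13_590
REPORTED_SPECTRA = 279_632

peptide_sum = sum(peptides for peptides, _ in TABLE_2.values())
spectra_sum = sum(spectra for _, spectra in TABLE_2.values())

# Each peptide seen in k groups contributes k - 1 to this excess.
repeat_occurrences = peptide_sum - REPORTED_UNIQUE_PEPTIDES

print("sum of per-group peptides:", peptide_sum)
print("sum of per-group spectra:", spectra_sum)
print("repeat occurrences across groups:", repeat_occurrences)
```

The excess is a count of repeat occurrences, not of shared peptides; the exact overlap between groups can only be obtained from the peptide lists themselves (Supplementary Table S2).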


The chromatography methods used complement each other, since the identified peptides are largely specific to a particular sample group (Fig. 1). At the same time, the identified peptides are reasonably reproducible within a sample group: approximately 70% of the peptides are found in at least two out of four samples (Fig. 2).

Fig. 2 — Number of common and unique peptides for 4 individual blood plasma samples.
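
The reproducibility figure quoted above (peptides found in at least two out of four samples) can be computed from per-sample peptide lists along the following lines. This is a minimal sketch: the function name is illustrative and the four toy peptide sets are hypothetical placeholders, not data from the deposited files.

```python
from collections import Counter


def fraction_reproduced(samples, min_samples=2):
    """Fraction of all observed peptides that occur in >= min_samples samples."""
    # set() guards against a peptide being listed twice within one sample.
    counts = Counter(peptide for sample in samples for peptide in set(sample))
    if not counts:
        return 0.0
    reproduced = sum(1 for n in counts.values() if n >= min_samples)
    return reproduced / len(counts)


# Hypothetical peptide identifiers for four individual samples.
toy_samples = [
    {"PEP_A", "PEP_B", "PEP_C"},
    {"PEP_A", "PEP_B", "PEP_D"},
    {"PEP_A", "PEP_E"},
    {"PEP_B", "PEP_C"},
]

print(fraction_reproduced(toy_samples))
```

In the toy data, three of the five distinct peptides occur in two or more samples, so the function returns 0.6.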

Analysis of known protein–protein interactions between the precursor proteins of the identified peptides with the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) database [6] revealed two clearly distinguishable clusters (Fig. 3). Functional enrichment analysis showed that one cluster consists of proteins belonging to the pathways “protein activation cascade” (GO:0072376), “endopeptidase inhibitor activity” (GO:0004866) and “Complement and coagulation cascades” (KEGG:04610), while the other includes proteins of the pathways “structural constituent of cytoskeleton” (GO:0005200), “Actin” (PF00022) and “Tubulin C-terminal domain” (PF03953). Most of the precursor proteins of the identified peptides belong to pathways that indicate involvement in extracellular processes, such as “extracellular region part” (GO:0044421), “extracellular exosome” (GO:0070062) and “membrane-bounded vesicle” (GO:0031988). All significant pathways can be found in Supplementary Table S3.

Fig. 3 — Protein–protein interaction network of the precursor proteins of the identified peptides, analysed with the STRING database. Red - “protein activation cascade” pathway; Green - “endopeptidase inhibitor activity” pathway; Yellow - “Complement and coagulation cascades” pathway; Blue - “structural constituent of cytoskeleton” pathway; Purple - “Actin” pathway; Brown - “Tubulin C-terminal domain” pathway.

2. Experimental design, materials and methods

2.1. Patients and specimens

Blood plasma and serum samples of 10 healthy male (average age 32 years) and 10 healthy female (average age 26 years) donors were collected from the Federal Research and Clinical Center of Physical Chemical Medicine of the Federal Medical and Biological Agency of the Russian Federation. The status of “healthy” was assigned on the basis of anamnesis and further monitoring of the donor. The study was approved by the Ethics Committees of the clinical center, and all donors gave written informed consent to participate in the study.

2.2. Sample collection

To obtain plasma, blood samples were collected from the cubital vein into blood collection tubes (REF 456023, 6 ml, Vacuette tube, Austria). Plasma was obtained within 15 min after collection. The collection tubes were centrifuged for 15 min at 700 g at room temperature. The plasma was separated from blood cells, aliquoted and stored at –80 °C until analysis. To obtain serum, blood samples were collected from the cubital vein into blood collection tubes (REF 456092, 6 ml, Vacuette tube, Austria). Serum was obtained after coagulation of blood for 1 h at room temperature. The collection tubes were centrifuged for 15 min at 700 g at room temperature. The serum was separated from the clot, aliquoted and stored at –80 °C until analysis.
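
The centrifugation step is specified as a relative centrifugal force (700 g), so the rotor speed needed to reproduce it depends on the rotor radius, which is not given here. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × N² (r in cm, N in rpm); the 10 cm radius is a hypothetical value chosen only for illustration.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (in g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


HYPOTHETICAL_RADIUS_CM = 10.0  # assumption: not stated in the protocol
rpm_700g = rpm_from_rcf(700, HYPOTHETICAL_RADIUS_CM)
print(f"700 g at r = {HYPOTHETICAL_RADIUS_CM} cm needs about {rpm_700g:.0f} rpm")
```

For an actual rotor, the radius from the manufacturer's specification should replace the placeholder value.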

2.3. Chemicals

Formic acid (FA), trifluoroacetic acid (TFA), acetic acid (AcOH), ammonium hydroxide solution (NH4OH) and sodium hydroxide (NaOH) were purchased from Sigma-Aldrich (St. Louis, MO, USA). LiChrosolv acetonitrile hypergrade for LC–MS (AcN), LiChrosolv acetone for liquid chromatography, LiChrosolv methanol gradient grade for liquid chromatography (MeOH), LiChrosolv ethanol gradient grade for liquid chromatography and HPLC grade water were acquired from Merck (Darmstadt, Germany).

2.4. Plasma and serum fractionation and peptide extraction

2.4.1. Peptide extraction using cation exchange Toyopearl, Bio-Gel or SP-Sephadex

Blood plasma and serum samples were fractionated on Toyopearl CM-650M (Tosoh Bioscience LLC, USA) weak cation exchange particles, CM Bio-Gel A (Bio-Rad Laboratories, Inc., USA) cation exchange gel, or SP Sephadex C-25 (GE Healthcare Bio-Sciences AB, Sweden) strong cation exchange particles. First, 80 µl of sorbent was washed 2 times with 400 µl WS1 (20 mM AcOH (pH 3.5), NaOH) in an Eppendorf tube. To precipitate the sorbent, the Eppendorf tube was centrifuged at 500 g for 10 s. The solvent was then carefully removed, taking care not to aspirate any sorbent beads. 200 µl of plasma/serum was diluted with 400 µl WS1 and added to the sorbent. After 30 min of incubation with vortexing, the sorbent was precipitated and the plasma/serum-buffer solution was removed. Then the sorbent was washed 3 times with 700 µl WS1. The sorbent was incubated for 15 min with 800 µl of 0.1% NH4OH (pH 11) and precipitated, and the eluate was collected. Finally, 9 µl of FA was added to the eluate to adjust the pH to approximately 3.

2.4.2. Peptide extraction using anion exchange QAE-Sephadex

Blood serum samples were fractionated on QAE Sephadex A-25 (GE Healthcare Bio-Sciences AB, Sweden) strong anion exchange particles. First, 160 µl of sorbent was washed 2 times with 400 µl WS3 (20 mM Tris (pH 8.26), FA) in an Eppendorf tube. To precipitate the sorbent, the Eppendorf tube was centrifuged at 500 g for 10 s. The solvent was then carefully removed, taking care not to aspirate any sorbent beads. 200 µl of serum was diluted with 400 µl WS3 and added to the sorbent. After 30 min of incubation with vortexing, the sorbent was precipitated and the serum-buffer solution was removed. Then the sorbent was washed 3 times with 700 µl WS3. The sorbent was incubated for 15 min with 800 µl of 0.5% TFA and precipitated, and the eluate was collected.

2.4.3. Peptide desorption from abundant blood proteins

To desorb peptides from the surface of abundant blood proteins, we used the technique described earlier [3]. The Eppendorf tube with the eluate after peptide extraction was incubated in 98 °C water for 15 min. After heating, additional fractionation was carried out on a C18 Discovery Supelco 50 mg (Sigma-Aldrich Co. LLC, USA) RP-SPE cartridge. The RP-SPE cartridge was pre-conditioned with 500 µl of MeOH and equilibrated 3 times with 500 µl of WS5 (3% AcN, 97% water, 0.1% TFA). The eluate from the previous step was applied to the sorbent and passed through at a flow rate of approximately 200 µl/min. The cartridge was washed 3 times with 300 µl WS5. The peptides were eluted with 1 ml of 80% AcN, 20% water, 0.1% TFA, concentrated under vacuum to 5 µl and diluted with 10 µl WS5.
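
The volumes quoted in Sections 2.4.1 to 2.4.3 imply a simple volume balance, sketched below. The calculation is purely volumetric: it assumes no peptide losses at any step, which will not hold in practice, and the variable names are illustrative.

```python
# Volumes in microlitres, as given in the protocol.
SAMPLE_UL = 200.0          # plasma or serum input
DILUENT_UL = 400.0         # WS1 or WS3 added before loading on the sorbent
SPE_ELUATE_UL = 1000.0     # 1 ml of 80% AcN eluate from the RP-SPE cartridge
CONCENTRATED_UL = 5.0      # volume after vacuum concentration
RECONSTITUTION_UL = 10.0   # WS5 added before LC-MS/MS

# Dilution of the sample when it is loaded on the ion-exchange sorbent.
loading_dilution = (SAMPLE_UL + DILUENT_UL) / SAMPLE_UL

# Final volume of the peptide fraction and nominal volume reduction
# relative to the starting sample (assumes 100% recovery).
final_volume_ul = CONCENTRATED_UL + RECONSTITUTION_UL
nominal_concentration_factor = SAMPLE_UL / final_volume_ul

print(f"loading dilution: {loading_dilution:.1f}x")
print(f"final volume: {final_volume_ul:.0f} ul")
print(f"nominal concentration factor: {nominal_concentration_factor:.1f}x")
```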

2.5. LC–MS/MS analysis

Analysis was performed on a TripleTOF 5600+ mass spectrometer with a NanoSpray III ion source (Sciex, Canada) coupled with a NanoLC Ultra 2D+ nano-HPLC system (Eksigent, USA). The HPLC system was configured in the trap-elute mode. For sample loading buffer and buffer A, a mixture of 98.9% water, 1% MeOH, 0.1% FA (v/v) was used. Buffer B was 99.9% AcN and 0.1% FA (v/v). Samples were loaded on a Chrom XP C18 trap column (3 µm, 120 Å, 350 µm × 0.5 mm; Eksigent) at a flow rate of 3 µl/min for 10 min and eluted through a 3C18-CL-120 separation column (3 µm, 120 Å, 75 µm × 150 mm; Eksigent) at a flow rate of 300 nl/min. The gradient was from 5% to 40% buffer B in 90 min followed by 10 min at 95% buffer B and 20 min of re-equilibration with 5% buffer B. Between different samples, two blank 45-min runs consisting of 5–8 min waves (5% B, 95%, 95%, 5%) were required to wash the system and to prevent carryover. The information-dependent mass-spectrometer experiment included one survey MS1 scan followed by 50 dependent MS2 scans. MS1 acquisition parameters were as follows: the mass range for selecting ions for MS2 analysis was 300–1250 m/z, and the signal accumulation time was 250 ms. Ions for MS2 analysis were selected on the basis of intensity with a threshold of 200 counts per second and a charge state from 2 to 5. MS2 acquisition parameters were as follows: the resolution of the quadrupole was set to UNIT (0.7 Da), the measurement mass range was 200–1800 m/z, and the signal accumulation time was 50 ms for each parent ion. Collision-activated dissociation was performed with nitrogen gas with the collision energy ramped from 25 to 55 V within the signal accumulation time of 50 ms. Analyzed parent ions were sent to the dynamic exclusion list for 15 s in order to acquire an MS2 spectrum at the chromatographic peak apex.
β-Galactosidase tryptic solution (20 fmol) was run with a 15-min gradient (5–25% buffer B) every two samples and between sample sets to calibrate the mass spectrometer and to control the overall system performance, stability, and reproducibility.
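
The analytical gradient and the acquisition timing described above can be written down as two small functions. This is a sketch under stated assumptions: the changes between gradient segments are treated as instantaneous steps (the actual ramp times are not given), and the cycle time ignores any instrument overhead between scans. Function names are illustrative.

```python
def percent_b(t_min):
    """Nominal % buffer B at time t (min) of the 120-min analytical method."""
    if t_min < 0 or t_min > 120:
        raise ValueError("time outside the 0-120 min method")
    if t_min <= 90:
        # Linear gradient from 5% to 40% B over 90 min.
        return 5.0 + (40.0 - 5.0) * t_min / 90.0
    if t_min <= 100:
        # 10-min wash at 95% B (step change assumed).
        return 95.0
    # 20-min re-equilibration at 5% B.
    return 5.0


def cycle_time_s(ms1_ms=250, ms2_ms=50, n_ms2=50):
    """Nominal duration of one MS1 survey scan plus its dependent MS2 scans."""
    return (ms1_ms + n_ms2 * ms2_ms) / 1000.0


print(percent_b(45))   # midpoint of the linear gradient
print(cycle_time_s())  # one survey scan plus 50 MS2 scans, in seconds
```

With the stated accumulation times, one full acquisition cycle lasts a nominal 2.75 s, which is short relative to the 15 s dynamic exclusion window.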

2.6. Peptide identification

Raw LC–MS/MS data were converted to .mgf peaklists with ProteinPilot (version 4.5, Sciex, Canada). For this procedure, we ran ProteinPilot in identification mode with the following parameters: no specific digestion, TripleTOF 5600 instrument, thorough ID search with detected protein threshold 95.0% against the UniProt human protein knowledgebase. For thorough protein identification, the generated peak lists were searched with the MASCOT (version 2.5.1, Matrix Science Ltd., UK) and X! Tandem (VENGEANCE, 2015.12.15, The Global Proteome Machine Organization) search engines against the UniProt human protein knowledgebase. The precursor and fragment mass tolerances were set at 20 ppm and 50 ppm, respectively. Database searches were run with no specific digestion. For X! Tandem we also selected parameters that allowed a quick check for protein N-terminal residue acetylation, peptide N-terminal glutamine ammonia loss or peptide N-terminal glutamic acid water loss. The resulting files were submitted to the Scaffold 4 software (version 4.2.1, Proteome Software, Inc., USA) for validation and further analysis. We used the local false discovery rate scoring algorithm with standard experiment-wide protein grouping. For the evaluation of peptide hits, a false discovery rate of less than 1% was selected for peptides only. False positive identifications were estimated by reverse database analysis.
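
Two quantities in this section lend themselves to a worked example: the ppm mass tolerance and a false discovery rate estimated from reversed-database (decoy) hits. The sketch below uses plain target-decoy counting to illustrate the idea; it is not Scaffold's local false discovery rate algorithm, and the function names and toy scores are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm):
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Lowest score cutoff at which decoys / targets stays <= max_fdr.

    psms is a list of (score, is_decoy) pairs; higher scores are better.
    Returns None if no cutoff satisfies the threshold. Tied scores are
    processed in sort order, which is a simplification.
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical peptide-spectrum matches: (score, is_decoy).
toy_psms = [(90, False), (80, False), (70, False), (60, True), (50, False)]

print(within_tolerance(500.005, 500.0, tol_ppm=20))  # 10 ppm error
print(score_cutoff_at_fdr(toy_psms, max_fdr=0.01))
```

In the toy list, the first decoy appears at score 60, so a 1% threshold keeps only the three matches scoring 70 or higher.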

2.7. Protein–protein interaction network and enrichment analysis

Precursor proteins whose peptides were identified in at least two samples (one sample of a male donor and one sample of a female donor) were analysed via the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) website [6]. The list of precursor protein identifiers was uploaded and standard enrichment analysis was performed using Gene Ontology (GO): Biological Process, Molecular Function or Cellular Component; Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathways; Protein Families (PFAM) Protein Domains. The false discovery rate threshold was 0.05.
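
To make the enrichment step and its false discovery rate threshold more concrete, the sketch below shows one common way such an analysis is set up: a hypergeometric over-representation test per term followed by Benjamini-Hochberg adjustment. This is an assumption made for illustration and does not claim to reproduce the statistics implemented on the STRING website; all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k): chance of seeing at least k annotated proteins in the list.

    population: background size; annotated: background proteins with the term;
    drawn: size of the submitted protein list.
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 12 of 100 submitted proteins carry a term that
# annotates 200 of 20,000 background proteins.
p_term = hypergeom_sf(12, population=20_000, annotated=200, drawn=100)
adjusted = benjamini_hochberg([p_term, 0.04, 0.03, 0.5])
significant = [q <= 0.05 for q in adjusted]
print(p_term, adjusted, significant)
```

Terms whose adjusted value is at or below 0.05 would pass a threshold like the one used here.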

Acknowledgements

LC–MS/MS analysis was supported by the Russian Science Foundation (project no. 14-50-00131). Data analysis was supported by the Russian Foundation for Basic Research (project nos. 17-00-00461 and 16-04-01414).

Footnotes

Transparency document. Supplementary data associated with this article can be found in the online version at 10.1016/j.dib.2018.04.018 (mmc1.pdf, 580.3 KB).

Appendix A. Supplementary data associated with this article can be found in the online version at 10.1016/j.dib.2018.04.018 (mmc2.xlsx, 7.1 MB).

References

1. Petricoin E.F., Belluco C., Araujo R.P., Liotta L.A. The blood peptidome: a higher dimension of information content for cancer biomarker discovery. Nat. Rev. Cancer. 2006;6:961–967. doi: 10.1038/nrc2011.
2. Anderson N.L., Anderson N.G. The human plasma proteome history, character, and diagnostic prospects. Mol. Cell. Proteom. 2002 doi: 10.1074/mcp.r200007-mcp200. 〈http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.96.5699&rep=rep1&type=pdf〉
3. Ziganshin R., Arapidi G., Azarkin I., Zaryadieva E., Alexeev D., Govorun V., Ivanov V. New method for peptide desorption from abundant blood proteins for plasma/serum peptidome analyses by mass spectrometry. J. Proteom. 2011;74:595–606. doi: 10.1016/j.jprot.2011.01.014.
4. Deutsch E.W., Csordas A., Sun Z., Jarnuczak A., Perez-Riverol Y., Ternent T., Campbell D.S., Bernal-Llinares M., Okuda S., Kawano S., Moritz R.L., Carver J.J., Wang M., Ishihama Y., Bandeira N., Hermjakob H., Vizcaíno J.A. The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Res. 2017;45:D1100–D1106. doi: 10.1093/nar/gkw936.
5. Vizcaíno J.A., Csordas A., Del-Toro N., Dianes J.A., Griss J., Lavidas I., Mayer G., Perez-Riverol Y., Reisinger F., Ternent T., Xu Q.-W., Wang R., Hermjakob H. 2016 update of the PRIDE database and its related tools. Nucleic Acids Res. 2016;44:11033. doi: 10.1093/nar/gkw880.
6. von Mering C., Jensen L.J., Snel B., Hooper S.D., Krupp M., Foglierini M., Jouffre N., Huynen M.A., Bork P. STRING: known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res. 2005;33:D433–D437. doi: 10.1093/nar/gki005.
