Author manuscript; available in PMC: 2012 Jul 24.
Published in final edited form as: Proteomics. 2009 Jan;9(2):470–484. doi: 10.1002/pmic.200800507

Dissecting the Human Plasma Proteome and Inflammatory Response Biomarkers

Sudipto Saha 1,3, Scott H Harrison 1,3, Jake Yue Chen 1,2,3
PMCID: PMC3402908  NIHMSID: NIHMS388997  PMID: 19105179

Abstract

A central focus of clinical proteomics is the search for plasma biomarkers for diagnostic and therapeutic use. We studied a set of plasma proteins accessed from the HIP2 database, a larger set of curated human proteins, and a subset of inflammatory proteins, for overlap with sets of known protein biomarkers, drug targets, and secreted proteins. Most inflammatory proteins were found to occur in plasma, and the percentage of biomarkers among inflammatory plasma proteins and their interacting protein neighbors was over three times that found in the sets of plasma and curated human proteins. Percentage overlaps with Gene Ontology terms were similar between the curated human set and the plasma protein set, yet the set of inflammatory plasma proteins had a distinct ontology-based profile. Most of the major hub proteins within protein-protein interaction networks of tissue-specific sets of inflammatory proteins were found to occur in disease pathways. This study presents a systematic approach for profiling a plasma subproteome’s relationship to both its potential range of clinical application and its overlap with complex disease.

Keywords: Mass spectrometry, plasma proteome, inflammatory response, biomarkers, drug targets

1 Introduction

The identification of protein biomarkers for the early diagnosis, subtyping, and monitoring of treatment for chronic diseases, including cardiovascular diseases, cancer, arthritis, Alzheimer’s disease, pulmonary disease, and autoimmune diseases, is now a central research focus in clinical proteomics [1, 2]. While biomedical researchers have traditionally sought to study molecular and immunological responses in tissues collected from patients, a recent trend has been to identify sets of proteins that may respond to changes in disease states or drug treatments, and be detected in easy-to-access biofluids such as human plasma. For decades, biomedical researchers and clinicians have used plasma to isolate and measure proteins that can be useful for the diagnosis or monitoring of disease [3, 4]. With the recent completion of the human genome and increasing availability of sensitive and reproducible analytical platforms, e.g., protein microarrays and liquid chromatography coupled tandem mass spectrometers (LC-MS/MS), researchers can now study hundreds or even thousands of proteins expressed in patient tissues or biofluids in parallel, thereby creating new opportunities for developing and managing disease [5–7]. An emerging focus has been to use panel biomarkers consisting of more than one protein and peptide marker to achieve a higher level of sensitivity and specificity [8]. Interpreting protein expression patterns in the human plasma proteome, which are sometimes very complex due to noisy signals and incomplete data sets, as a function of disease states often involves high-throughput data management, non-trivial statistical data analysis, and computational interpretation of data in the context of physiological and molecular pathways.
A general process for biomarker discovery in the plasma proteome can be viewed in three steps: 1) defining the detectable “human plasma proteome” based on a specific analytical platform such as mass spectrometry or antibodies; 2) selecting differentially expressed proteins between different experimental conditions; and 3) identifying candidate protein disease biomarker profiles that achieve adequate detection sensitivity and specificity [9].

Understanding the constituents, variability, and detectability of the human plasma proteome remains an elusive and challenging effort even today. With the release of the Human Proteome Organization (HUPO) Plasma Proteome Project (PPP) core dataset of 3020 plasma proteins in 2005, annotation of the human plasma proteome was initially performed by several groups [10–13]. Ping et al. [10] studied the HUPO PPP core dataset of 3020 proteins, and described plasma proteins as comprising a diverse group of proteins from the human proteome, including glycoproteins, DNA binding proteins, coagulation pathway, cardiovascular, liver, inflammation, and mononuclear phagocyte proteins. In this study, liver was the dominant tissue-based source of proteins, although many of the proteins detected are also expressed in other tissues. Berhane et al. [11] observed 354 proteins of particular interest for cardiovascular research in the core dataset of HUPO. They classified these proteins into eight categories: protein markers of inflammation in cardiovascular disease, vasoactive and coagulation proteins, signal transduction pathway proteins, growth and differentiation-associated proteins, cytoskeletal proteins, transcription proteins, channel and receptor proteins, and heart failure and remodeling-related proteins. Muthusamy et al. [12] identified 3778 gene-based plasma proteins in the literature to generate a plasma proteome database and GO profile, and compared this set of proteins with the contents of the HUPO PPP core dataset of 3020 proteins (mapping to 2446 genes). Liu et al. [13] performed a gene ontology annotation of cellular components in the plasma proteome generated from IMS-MS of 9087 plasma proteins identified by independent study and found differences between GO annotation percentages of human versus plasma proteomes. Recently, Saha et al. [14] reported a comprehensive catalogue of plasma proteins from healthy individuals called HIP2 (Healthy Human Individual’s Integrated Plasma Proteome), which collected a set of 11,588 unique human plasma proteins detectable with different shotgun mass spectrometry methods [15]. In this study, while each human plasma protein has peptide-based evidence underlying its identification, only 106 high-abundant proteins were common among four different shotgun proteomics experiments, revealing an ongoing challenge in improving reproducibility among low-abundant proteins. Sub-proteomic analyses of the plasma proteome have also begun to evaluate biological processes of potential biomedical significance such as inflammatory pathways [10, 11]. Chronic inflammation is especially of interest for human plasma protein studies, because it is characterized by a response of tissue destruction by inflammatory cells like macrophages and plasma cells, and it has been found to be a factor associated with a wide variety of chronic diseases such as cancer [2, 16–18]. While inflammatory proteins are generally recognized as being present across many varieties of chronic diseases including cancer [19], the extent to which inflammatory proteins may be used in disease biomarker studies has not yet been studied systematically.

In this study, we dissected the human plasma proteome by examining its overlap with other datasets of interest, especially proteins associated with drug treatment and biomarkers, and the subset of inflammatory proteins in the plasma proteome. First, we analyzed the prevalence of candidate cancer biomarkers, drug targets and secreted proteins in human plasma, inflammatory proteins and expanded inflammatory proteins. Second, statistical comparisons of Gene Ontology (GO) annotations were performed between the set of human Uniprot proteins and the plasma proteome, and between the plasma proteome and the set of inflammatory proteins. Third, tissue-specific protein-protein interactions (PPIs) of inflammatory proteins were calculated for each inflammatory protein subset present in each of five major tissues: brain, heart, lung, kidney and liver. We observed many of the interacting protein partners to be disease specific. Fourth, the hub proteins (10 or more interacting protein partners) in the PPI networks were searched in the pathway databases and found to be related to disease biology pathways.

Our overall objective is to provide the biomedical research community, for the first time, with a systematic survey of the human plasma proteome in relation to the human proteome, for those interested in plasma proteomics applications. With application in mind, we further examined in detail how the general inflammatory response may differ across tissues, biomolecular interaction networks and pathways, to provide insight into the non-trivial dynamics that may be impeding efforts to establish sensitive and specific panels of biomarkers.

2 Materials and methods

2.1 Datasets

We used several protein datasets in our study. Each database and dataset is described below:

HIP2 database

Human plasma proteins were collected from the HIP2 database, which currently contains 11,588 non-redundant International Protein Index (IPI) number protein entries [14]. This database is a comprehensive collection of healthy human plasma proteins, and has protein data mappings of supporting peptide evidence from several high-quality and high-throughput mass-spectrometry (MS) experimental data sets.

Human Plasma Proteome and Filtered Human Plasma Proteome sets

We defined the set of 10,138 human plasma proteins as the human plasma proteome, and the plasma proteins from HIP2 with two or more unique peptide sequences associated with their identification as the filtered human plasma proteome (n=7817). With peptide count thresholds of ≥3, ≥5 and ≥10, we defined additional filtered sets of plasma proteins (respectively, n={5392, 2718, 747}) and further gathered UniProt names corresponding to a high-confidence plasma protein set reported by States et al. [20]. The high-confidence plasma protein set was a subset of HUPO, and was based on a confidence level of at least 95% as determined by multiple hypothesis-testing techniques and coding region lengths of genes.
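The peptide-count filtering described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical mapping from protein identifier to its number of unique supporting peptides; the identifiers and counts are invented for the example and are not HIP2 data.

```python
def filter_by_peptide_count(peptide_counts, min_peptides):
    """Return the proteins supported by at least `min_peptides` unique peptides."""
    return {protein for protein, n in peptide_counts.items() if n >= min_peptides}


# Hypothetical toy data (not taken from HIP2).
toy_counts = {"IPI_A": 1, "IPI_B": 2, "IPI_C": 4, "IPI_D": 12}

# Thresholds named in the text: >=1 (unfiltered), >=2, >=3, >=5 and >=10.
filtered_sets = {t: filter_by_peptide_count(toy_counts, t) for t in (1, 2, 3, 5, 10)}
```

Each higher threshold yields a subset of the previous one, which is why the set sizes reported above shrink monotonically.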

Curated Human Proteome set

For a comprehensive list of curated human proteins, 17,807 human proteins were extracted from UniProtKB/Swiss-Prot Release 54.6 of 04-Dec-2007 with the sequence retrieval system (SRS; http://srs.ebi.ac.uk/). We defined these 17,807 human Uniprot proteins as the curated human proteome. Of the 10,138 human plasma proteins from HIP2, 9995 have distinct Uniprot names; we used Uniprot names as identifiers in our set-to-set comparisons. We considered the coverage of the curated human proteome by the plasma proteome (9995/17,807=56%) to be sufficiently high and reasonable for an initial analysis.

Inflammatory protein and inflammatory plasma sets

The inflammatory protein data set was obtained by using gene ontology queries against the human Uniprot protein database. Gene products annotated to inflammatory response (GO:0006954) or its descendants in the gene ontology were found with GO::TermFinder [21] and identified as inflammatory proteins. Applying this query to the human plasma proteome and to the curated human proteome identified 204 and 291 proteins (by Swiss-Prot accession number), respectively. We defined the set of 291 proteins as the inflammatory protein set (I), and the set of 204 proteins as the inflammatory plasma protein set (Ip).

Cancer Biomarker Data set

The cancer biomarker list of 1261 proteins, mapping to 1049 Uniprot names, was obtained from the Anderson laboratory [22]. These proteins were compiled from the literature and other sources and are believed to be differentially expressed in human cancer [22]. In this data set, of the 34 biomarkers with more than 1000 citations, 79% were reported to be plasma proteins and 56% were reported as being used for clinical diagnosis (89% of these were reported as being plasma proteins). Of the 28 biomarkers with between 500 and 1000 citations, 57% were reported as being plasma proteins, but only 7% were reported as being used clinically.

DrugBank Data set

A list of 2,396 drug target proteins, mapping to 837 human Uniprot names, was obtained by parsing entries from the DrugBank database, which combines detailed drug/chemical data with comprehensive drug target or protein information [23]. Each entry contains >80 data fields, with half of the information devoted to drug chemical structure and the other half devoted to drug target proteins. The drug target proteins come from different species; in our study we used only the human proteins (837 Uniprot proteins).

Secreted Protein data set

The human secreted protein list of 1191 proteins, mapping to 1187 Uniprot names, was obtained from the Secreted Protein Database (SPD), which consists of a core dataset and a reference dataset. The core dataset of SPD contains 18,152 secreted proteins retrieved from Swiss-Prot/TrEMBL, RefSeq and CBI-Gene of human, mouse and rat [24]. All the entries of SPD were ranked according to the prediction confidence, and contain both experimental and computationally predicted secreted proteins. For our analysis using SPD, we used human proteins coming from Rank0 that consist of the manually curated set of Swiss-Prot proteins.

2.2 Gene Ontology annotations

We studied the three major GO vocabularies – biological process, molecular function and cellular component – using GO::TermFinder, a tool for accessing and evaluating GO annotations, and calculated significance for comparisons between sets of proteins [21]. We used the Gene Ontology project Open Biomedical Ontologies (OBO) file version 1.2 format revision 5.640, and the Gene Ontology project human annotation file revision 1.75 [25]. 9576 proteins of the 10,138 plasma proteins were observed to have one or more associated GO terms. Of the 7817 proteins in the filtered human plasma proteome, 7354 had GO annotations. Of the 17,807 proteins in the curated human proteome, 16,196 had GO annotations. We defined the filtered subset of 7354 proteins in the human plasma proteome with GO annotations as P. We defined the filtered subset of 16,196 proteins in the curated human proteome with GO annotations as H.

To account for the relative differences in sizes of protein sets and frequencies of GO annotations, we based the significance of annotation frequencies on a hypergeometric calculation with a simulation-based correction [21]. The adjusted p value (padj) for a node, where a node is the GO term to which elements are annotated, is the fraction of 1000 independent null-hypothesis simulations (samplings drawn from the larger data set) having any node with a p value equal to or better than the p value for that node in the smaller data set. Comparisons were conducted between H and P, and between P and Ip. In addition to our criterion for statistical significance (padj≤0.001), initial selection of Gene Ontology categories was based on those GO terms whose shortest distance to the root of the GO hierarchy was 4 (i.e., GO level 4). In general, a GO level is the shortest distance to the root of the directed acyclic graph that forms the GO hierarchy: level 0 is the root of the hierarchy, level 1 comprises the terms “biological process”, “cellular component” and “molecular function”, and higher-numbered levels correspond to increased qualitative specificity.
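To make the simulation-based correction concrete, the following is a minimal sketch of one way such an adjusted p value could be computed. It is not the GO::TermFinder implementation: the function names, the one-sided enrichment test, the toy protein identifiers and the two toy terms are all assumptions made for illustration.

```python
import random
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K carry the annotation."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return total / comb(N, n)


def term_pvalues(sample, annotations, population_size):
    """Hypergeometric p value of every term for one sample of proteins."""
    return {
        term: hypergeom_sf(len(sample & members), population_size, len(members), len(sample))
        for term, members in annotations.items()
    }


def simulated_adjusted_pvalues(sample, population, annotations, n_sim=1000, seed=0):
    """Fraction of null simulations in which any term scores at least as well."""
    rng = random.Random(seed)
    pop_list = sorted(population)
    observed = term_pvalues(sample, annotations, len(pop_list))
    # For each null sample drawn from the larger set, keep its best (smallest) p value.
    null_minima = []
    for _ in range(n_sim):
        null_sample = set(rng.sample(pop_list, len(sample)))
        null_minima.append(min(term_pvalues(null_sample, annotations, len(pop_list)).values()))
    return {term: sum(m <= p for m in null_minima) / n_sim for term, p in observed.items()}


# Hypothetical toy data: 60 proteins and two made-up terms.
population = {f"prot{i}" for i in range(60)}
annotations = {
    "TERM_A": {f"prot{i}" for i in range(10)},
    "TERM_B": {f"prot{i}" for i in range(20, 40)},
}
sample = {f"prot{i}" for i in range(8)} | {"prot50", "prot51"}
padj = simulated_adjusted_pvalues(sample, population, annotations)
```

Because a simulation counts against a term whenever any term in the null sample scores at least as well, the correction accounts for testing many GO terms at once. With 1000 simulations, padj≤0.001 corresponds to at most one such simulation.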

2.3 Protein-protein interactions (PPIs) study and visualization

Cytoscape version 2.5.1 with the APID2NET plugin, a tool for visually exploring biological networks, was used to expand and study the interacting proteins associated with the initial set of 204 proteins [26]. Cytoscape uses a December 2006 release of the Biomolecular Interaction Network Database (BIND) for protein-protein interactions data. BIND records are based on interactions as they have been shown experimentally and published in at least one peer-reviewed journal [27]. Two search filters of APID2NET were used: i) connection level; and ii) number of experimental methods. Setting the connection level to 1 collected the first neighbors of the initial set, giving 710 proteins/nodes, and setting the connection level to 2 collected the first and second neighbors of the initial set, giving 2866 proteins/nodes. We set the number of experimental methods to 2, meaning that each PPI is supported by two or more types of experimental evidence, to avoid false positives from single experiments. Network views of protein interactions were generated with APID2NET [28], and showed proteins as nodes and interactions as edges on the network. PPI networks (connection level=1) were built for five tissues: heart, brain, lung, kidney and liver. Tissue-specificities for inflammatory proteins were based on UniProtKB/Swiss-Prot entry comments sections as accessed from the ExPASy Proteomics Server [29].
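The two search filters can be illustrated with a short sketch. This is not APID2NET code; it assumes a hypothetical list of interactions given as (protein, protein, number of experimental methods) tuples, and the protein names are placeholders.

```python
def expand_neighbors(seeds, interactions, level=1, min_methods=2):
    """Return the seeds plus every protein within `level` interaction steps.

    Only interactions supported by at least `min_methods` experimental
    methods are followed.
    """
    adjacency = {}
    for a, b, n_methods in interactions:
        if n_methods >= min_methods:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    reached = set(seeds)
    frontier = set(seeds)
    for _ in range(level):
        # Proteins adjacent to the current frontier that are not yet collected.
        frontier = {nbr for p in frontier for nbr in adjacency.get(p, ())} - reached
        reached |= frontier
    return reached


# Hypothetical toy interactions: (protein_a, protein_b, number of methods).
toy_interactions = [("S1", "A", 2), ("A", "B", 3), ("S1", "C", 1), ("B", "D", 2)]
level1 = expand_neighbors({"S1"}, toy_interactions, level=1)
level2 = expand_neighbors({"S1"}, toy_interactions, level=2)
```

In the toy data the S1 to C interaction is dropped because it has only one supporting method, so C never enters the expanded set.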

2.4 Pathway database searching

We conducted a pathway analysis of proteins in the five tissue-based PPIs for those hub proteins whose degree is ≥10. We studied the presence of these hub proteins in pathways based on four pathway databases: BioCarta (http://cgap.nci.nih.gov/Pathways/BioCarta_Pathways/), NCI-Curated (http://pid.nci.nih.gov), Reactome [30] and ProteinLounge (http://www.proteinlounge.com).
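Selecting hub proteins and looking them up in pathway annotations can be sketched as below. The edge list and the pathway membership table are hypothetical placeholders, not data from the four databases named above.

```python
from collections import defaultdict


def hub_proteins(edges, min_degree=10):
    """Map each protein with at least `min_degree` distinct partners to its degree."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions when counting partners
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {p: len(n) for p, n in neighbors.items() if len(n) >= min_degree}


def pathways_for_hubs(hubs, pathway_membership):
    """List, for every hub, the pathways whose member set contains it."""
    return {
        hub: sorted(name for name, members in pathway_membership.items() if hub in members)
        for hub in hubs
    }


# Hypothetical toy network: one protein with ten partners.
toy_edges = [("HUB", f"P{i}") for i in range(10)] + [("P0", "P1")]
toy_pathways = {"pathway_X": {"HUB", "P3"}, "pathway_Y": {"P1"}}

hubs = hub_proteins(toy_edges)
hub_pathways = pathways_for_hubs(hubs, toy_pathways)
```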

3 Results and discussion

A practical aim of our investigation was to evaluate the plasma proteins and inflammatory proteins inferred by mass spectrometry for diagnostic and therapeutic use. We analyzed the presence of cancer biomarkers, drug targets and secreted proteins across different data sets: i) curated human proteome, ii) human plasma proteome, iii) inflammatory proteins and inflammatory plasma proteins, and iv) inflammatory plasma expanded proteins. Extensive GO cross-comparison analyses were performed across these sets. To check the specificity of inflammatory proteins present in the plasma, tissue-specific PPIs were studied in five major tissues: brain, heart, lung, kidney and liver. The hub proteins in the network analysis coming from the inflammatory protein seed list were then searched in four pathway databases: NCI-curated, Reactome, BioCarta, and ProteinLounge.

3.1 Prevalence of Biomarkers, Drug Targets, and Secreted Proteins

Protein biomarkers and drug targets are studied here in the context of curated human proteome and human plasma proteome, because we believe they indicate a range of clinical application opportunities. In particular, protein biomarkers in the plasma may have especially significant utility for clinical diagnosis. Ideally, however, a tissue-specific drug target protein should not be present in the plasma in high concentration, because the drug would then bind to the proteins in the plasma and not reach the specific tissue(s) [31]. Tissue-specific proteins that are secreted outside of the cell membrane may eventually reach the plasma, so we made an attempt to study the relationship of plasma proteins to secreted proteins. As a means for studying potential association with chronic disease, we surveyed both plasma and non-plasma inflammatory proteins of the curated human proteome for biomarkers, drug targets and secreted proteins.

3.1.1 Curated human proteome, human plasma proteome and filtered human plasma proteome comparison

We compared the curated human proteome and human plasma proteome by examining the percentage of each set that overlaps with biomarkers, drug targets and secreted proteins. As shown in Table 1, the human plasma proteome and filtered human plasma proteome have slightly higher percentages of biomarkers, drug targets and secreted proteins relative to the curated human proteome. Percentage differences of overlap with biomarkers and drug targets were <0.5% between the filtered human plasma proteome and the human plasma proteome. The set of secreted proteins had the smallest percentage-wise distinction of overlap between the curated human proteome and the human plasma proteome, and was the only category to decrease in overlap between the unfiltered and filtered human plasma proteomes. All percentage overlaps were less than 10%, and these results highlight the need for using additional classifications of protein subsets within these proteomes to uncover higher amounts of overlap with biomarkers and drug targets. When adjusting the filter from peptide count ≥2 to thresholds of 3 and 5, changes to the overall percentage overlaps were within ±2%. With a threshold peptide count ≥10, the changes to percentage overlaps were similarly minor for candidate biomarkers and drug targets, but the percentage overlap more than doubled for secreted proteins. We also conducted the percentage overlap comparison with a high-confidence set of plasma proteins from States et al. [20], and found these overlaps to be distinctly different from what was found for each of the specific peptide count thresholds. Since our study relates primarily to usage of actual output and performance of existing clinical proteomics technologies, we opted to use the peptide count threshold ≥2 as reported in [10, 13].
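The percentage overlaps discussed here are simple set computations. A minimal sketch follows, assuming proteins are represented by identifier strings; the toy identifiers are hypothetical, while the coverage figure reuses the counts stated in Section 2.1.

```python
def percent_overlap(protein_set, reference_set):
    """Percentage of `protein_set` that also appears in `reference_set`."""
    if not protein_set:
        raise ValueError("protein_set must not be empty")
    return 100.0 * len(protein_set & reference_set) / len(protein_set)


# Hypothetical toy identifiers, standing in for Uniprot names.
toy_proteome = {f"P{i}" for i in range(1, 11)}
toy_biomarkers = {"P1", "P2", "X9"}
toy_percent = percent_overlap(toy_proteome, toy_biomarkers)  # 2 of 10 proteins

# Coverage reported in Section 2.1: 9995 of 17,807 Uniprot names, about 56%.
coverage = 100.0 * 9995 / 17807
```

Note that the denominator is the size of the proteome set, not of the reference list, so the same reference list gives different percentages for differently sized proteomes.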

Table 1.

Percentage profile of candidate biomarkers, drug targets and secreted proteins for the curated human proteome set and the unfiltered and filtered plasma proteome sets.

| Protein category | Curated human proteome (%), n=17,807 | Human plasma proteome (%), peptides ≥1, n=10,138 | Filtered human plasma proteome (%), peptides ≥2, n=7817 | Filtered-3 human plasma proteome (%), peptides ≥3, n=5392 | Filtered-5 human plasma proteome (%), peptides ≥5, n=2718 | Filtered-10 human plasma proteome (%), peptides ≥10, n=747 | States et al 2006 set of plasma proteins, n=436 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Candidate biomarkers | 5.9 | 8.4 | 8.7 | 9.1 | 10.4 | 12.7 | 8.2 |
| Drug target proteins | 4.7 | 6.9 | 7.2 | 7.4 | 6.8 | 7.5 | 4.3 |
| Secreted proteins | 6.7 | 7.9 | 7.6 | 7.7 | 9.5 | 20.0 | 13.7 |

3.1.2 Inflammatory proteins

We next examined inflammatory proteins and their interacting protein neighbors for overlap with biomarkers, drug targets and secreted proteins. The examined datasets were the set of 291 inflammatory proteins, the subset of 204 inflammatory plasma proteins, and datasets based on expansions with interacting protein neighbors. As shown in Table 2, restricting the initial set of 291 inflammatory proteins to the 204 proteins in the plasma proteome produced only marginal changes (≤1%) in percentage overlap with biomarkers, drug targets and secreted proteins. Overlap with biomarkers increased by 4% with a first neighbor-based protein interaction expansion on the set of inflammatory plasma proteins, yet an overall percentage-wise decrease was found for the second neighbor-based protein interaction expansion. We used Fisher’s exact test to compare the proportions of biomarkers and non-biomarkers among the inflammatory proteins present in plasma (n=204), the first neighbor expansion proteins (n=710) and the second neighbor expansion proteins (n=2866). The p-values were 0.1977 between inflammatory proteins present in plasma and first neighbor expansion proteins, 0.0006 between first neighbor expansion proteins and second neighbor expansion proteins, and 0.0141 between inflammatory proteins present in plasma and second neighbor expansion proteins (see Tables S1–S3). Protein interaction expansions to the set of inflammatory plasma proteins produced consecutive reductions in overlap with drug targets and secreted proteins, most significantly for secreted proteins; the contingency tables of Fisher’s exact test are shown in Tables S4–S9. Based on protein interactions, both first neighbor and second neighbor expansions yielded higher percentage overlaps with biomarkers, drug targets and secreted proteins than observed for the curated human proteome or human plasma proteome.
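For readers unfamiliar with the test, a two-sided Fisher’s exact test on a 2×2 contingency table can be sketched as follows. The counts in the example are hypothetical and are not the values in Tables S1–S9; the two-sided rule used here (summing all tables no more probable than the observed one) is an assumption about the variant of the test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Rows are two protein sets; columns are biomarker / non-biomarker counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k biomarkers falling in the first set.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed table.
    total = sum(p for p in (prob(k) for k in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: set 1 has 8 biomarkers and 2 non-biomarkers,
# set 2 has 1 biomarker and 5 non-biomarkers.
p_value = fisher_exact_two_sided(8, 2, 1, 5)  # about 0.035
```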

Table 2.

Overlap analysis between inflammatory proteins/expanded network proteins and different protein sets of interest.

| Comparison set | Inflammatory proteins (%) | Inflammatory proteins present in plasma proteome (%) | Inflammatory plasma expanded proteins* (%) | Inflammatory plasma expanded proteins** (%) |
| --- | --- | --- | --- | --- |
| Plasma proteome | 70 | 100 | 80 | 75 |
| Candidate biomarkers | 27 | 26 | 30 | 20 |
| Drug target proteins | 14 | 15 | 12 | 8.0 |
| Secreted proteins | 39 | 39 | 19 | 8.7 |

* Degree of neighborhood=1

** Degree of neighborhood=2

For an additional survey of biomarkers, drug targets and secreted proteins, we next distinguished plasma from non-plasma proteins in the set of inflammatory proteins and its related set of expanded proteins. The subsets of non-plasma proteins in the inflammatory and expanded sets of inflammatory proteins had elevated percentages of overlap for biomarkers and secreted proteins, as shown in Table 3. We found the greatest increases between plasma and non-plasma protein sets to occur for percentage overlaps with inflammatory protein biomarkers. We found this trend to be consistent with the presence of inflammatory protein biomarkers outside of plasma. The inflammatory response proteins and their interacting protein partners thus suggest possibilities for clinical application related to cancer detection and therapy. Investigating the relationship of these prospective sets of proteins with functional annotations, disease biology and pathways may therefore yield insight into the range and activity of these proteins in cellular and organism physiology.

Table 3.

Counts of candidate biomarkers, drug targets and secreted proteins among inflammatory-related proteins and expanded proteins, distinguishing plasma from non-plasma proteins.§

| Comparison set | Ip* | Ip̄* | Ep* | Ep̄* |
| --- | --- | --- | --- | --- |
| Candidate biomarkers | 54 (26%) | 25 (29%) | 119 (30%) | 40 (31%) |
| Drug target proteins | 31 (15%) | 10 (11%) | 49 (12%) | 10 (7.8%) |
| Secreted proteins | 79 (39%) | 35 (40%) | 48 (12%) | 22 (17%) |

§ Ip ∪ Ip̄ = I; Ep ∪ Ep̄ = E; Ip ∩ Ep = ∅; Ip̄ ∩ Ep̄ = ∅;

* denotes intersection with comparison set;

Ip: n=204; Ip̄: n=87; Ep: n=399; Ep̄: n=128;

Ip*: subset of the inflammatory protein set in the plasma proteome that intersects with comparison set;

Ip̄*: subset of the inflammatory protein set, not in the plasma proteome, that intersects with comparison set;

Ep*: subset of the expanded inflammatory protein set (degree of neighbor = 1) in the plasma proteome that intersects with comparison set;

Ep̄*: subset of the expanded inflammatory protein set (degree of neighbor = 1), not in the plasma proteome, that intersects with comparison set.

3.2 Comparative gene functional category analysis

We used gene ontology as the primary tool to annotate and compare protein functions. We conducted two sets of comparisons: i) between the annotated subsets of the curated human proteome and filtered human plasma proteome (H vs P); and ii) between the filtered human plasma proteome and inflammatory plasma proteins (P vs Ip). To account for the relative difference in sizes between protein sets and their frequency ranges of GO annotations, we determined significance based on an adjusted p-value as calculated from 1000 null-hypothesis simulations of hypergeometric comparisons based on random selection (padj≤0.001).

Following our initial selection of statistically significant, level 4 GO terms, we made further refinements to the set of GO terms to be used for analysis. The set of biological process GO terms was pruned for general redundancies with inflammatory response resulting in the removal of 35 GO terms from our analytical comparisons. Calculation of percentages was based on the sets of proteins annotated to each of three selected sets of GO terms: 11 cellular component terms, 18 molecular function terms and 14 biological process terms.
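The level-based selection relies on the GO level defined in Section 2.2, i.e., the shortest distance from a term to the root. A minimal sketch of that calculation is given below, assuming a hypothetical child-to-parents mapping; the term names are placeholders and not real GO identifiers.

```python
from collections import deque


def go_levels(parents, root):
    """Shortest distance from every reachable term to `root` (breadth-first search)."""
    children = {}
    for term, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, set()).add(term)

    levels = {root: 0}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in levels:  # the first visit gives the shortest path
                levels[child] = levels[term] + 1
                queue.append(child)
    return levels


# Hypothetical toy hierarchy: term "c" has two parents, so two paths to the root.
toy_parents = {"bp": ["root"], "a": ["bp"], "b": ["a"], "c": ["b", "bp"]}
levels = go_levels(toy_parents, "root")
level_2_terms = sorted(t for t, lvl in levels.items() if lvl == 2)
```

Because a GO term can have several parents, the shortest path is used: term "c" in the toy hierarchy sits at level 2 through "bp" even though it is also reachable by a longer path through "b".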

3.2.1 Curated human proteome and filtered human plasma proteome

We studied the annotations of 43 GO terms as they are applied to the curated human proteome and filtered human plasma proteome. Differences between the GO annotation percentages of the H and P sets were <2%, and the overall distributions were nearly identical, as shown in Figure 1. Across the three GO vocabularies, for the 13 GO terms in H and P with <2% annotation frequencies, the P set uniformly had higher percentages except for the indexed P8 category (response to bacterium, GO:0009617). For the 8 cellular component and 10 molecular function GO terms in H and P with >2% annotation frequencies, the H set had only a slightly higher number of GO terms with greater percentages than the P set. For biological process, the P set had a higher percentage than the H set for 8 of the 12 GO terms with >2% annotation frequencies. While the differences of percentages between GO annotations in the H and P sets were low (<2%), 34 of the 43 GO terms were statistically significant for the overall comparison, as presented in Tables 4–6.

Figure 1.

Gene ontology (GO) percentages for the curated human proteome (H), the filtered plasma proteome (P), and the inflammatory response plasma protein set (I). The indexed labels C1,…,C11; F1,…,F18; and P1,…,P14 are defined in Tables 4–6, and correspond to cellular component, molecular function, and biological process vocabularies respectively.

Table 4.

Selected cellular component GO terms applied to comparisons between H versus P protein sets and P versus Ip protein sets. The results of comparisons are shown in the last two columns; significance (Y) is based on an adjusted p-value as calculated from 1000 null-hypothesis simulations of hypergeometric comparisons based on random selection, padj≤0.001; n/a is when there are not any representatives of the GO term in the Ip set.

| Index | Level | GO ID | GO term | H vs P | P vs Ip |
| --- | --- | --- | --- | --- | --- |
| C1 | 6 | GO:0005783 | endoplasmic reticulum | N | N |
| C2 | 6 | GO:0005739 | mitochondrion | N | N |
| C3 | 6 | GO:0005765 | lysosomal membrane | N | n/a |
| C4 | 6 | GO:0005794 | golgi apparatus | N | N |
| C5 | 5 | GO:0016459 | myosin complex | Y | n/a |
| C6 | 4 | GO:0005581 | collagen | Y | n/a |
| C7 | 5 | GO:0044459 | plasma membrane part | Y | N |
| C8 | 6 | GO:0005856 | cytoskeleton | Y | N |
| C9 | 6 | GO:0005634 | nucleus | Y | N |
| C10 | 3 | GO:0031012 | extracellular matrix | Y | N |
| C11 | 4 | GO:0005615 | extracellular space | N | Y |

Table 6.

Selected biological process GO terms applied to comparisons between H versus P protein sets and P versus Ip protein sets. The results of comparisons are shown in the last two columns; significance (Y) is based on an adjusted p-value as calculated from 1000 null-hypothesis simulations of hypergeometric comparisons based on random selection, padj≤0.001; n/a is when there are not any representatives of the GO term in the Ip set.

| Index | Level | GO ID | GO term | H vs P | P vs Ip |
| --- | --- | --- | --- | --- | --- |
| P1 | 5 | GO:0043062 | extracellular structure organization and biogenesis | Y | n/a |
| P2 | 4 | GO:0007275 | multicellular organismal development | Y | N |
| P3 | 4 | GO:0007049 | cell cycle | Y | N |
| P4 | 4 | GO:0006950 | response to stress | Y | Y |
| P5 | 4 | GO:0016265 | death | Y | Y |
| P6 | 4 | GO:0006928 | cell motility | Y | Y |
| P7 | 4 | GO:0007154 | cell communication | Y | Y |
| P8 | 5 | GO:0009617 | response to bacterium | N | Y |
| P9 | 4 | GO:0022402 | cell cycle process | Y | N |
| P10 | 4 | GO:0008283 | cell proliferation | Y | N |
| P11 | 4 | GO:0007155 | cell adhesion | Y | N |
| P12 | 4 | GO:0006810 | transport | Y | N |
| P13 | 4 | GO:0002520 | immune system development | N | Y |
| P14 | 4 | GO:0006955 | immune response | N | Y |

Interestingly, our findings show the percentage ontology classifications of the H and P sets to be much more similar than has been previously reported in studies where the proteome coverage was low or the data came from a single instrument type [10, 12, 13]. The differences likely arose from our use of a more expansive and up-to-date reference set for the plasma proteome, our choice of a reference set of human proteins, and our use of more recent ontology and annotation files. Another source of variation between our results and the other proteomic analyses may have been our inclusion of all evidence codes, including the inferred from electronic annotation (IEA) code. Some of the percentages in our study were nevertheless almost identical to percentages from a study based on a different plasma proteome database [12]. Similarities were found for cell communication (P7: 24% versus 23.7% respectively), immune response (P14: 4.5% versus 4.3% respectively) and hydrolase activity (F14: 8.8% versus 9% respectively). The similarities are surprisingly close considering that our percentages are calculated based on a different overall set of GO terms and, unlike Muthusamy et al. [12], we did not include a category for proteins without annotation. Significant (two-fold) differences were observed, however, for transport (P12) and signal transducer activity (F18). The sorted rankings of percentages for different GO terms were consistent, although the percentages and proteomic comparisons varied between Ping et al. [10], Liu et al. [13], and our study. Nucleus (C9) and protein binding (F11) were the top-ranked GO categories of cellular component and molecular function for both our study (see Figure 1) and Ping et al. [10]. Nucleus was the second highest ranking category for Liu et al. [13].
For biological process terms across both human and plasma protein sets, transport was in the top three GO terms in our study and the most prevalent GO term in Ping et al. [10]. When comparing the human and plasma protein sets, Liu et al. [13] found a noticeable increase in extracellular component (+5%) and a decrease in nuclear proteins (-12%). We found similar directions of change, but of a reduced magnitude; for extracellular matrix and space (C10 and C11), there was a 1% increase, and for nucleus (C9), there was a 2% decrease. The biological processes of cell proliferation, immune response and cell motility in our analysis matched the percentage frequency rank order of Ping et al., as did the molecular functions of protein binding, ion binding and nucleotide binding. We did not find a similarly high magnitude of percentage differences between plasma and human protein sets. In general, our study did not find the many two-fold or more changes in percentages common to most of Ping et al. (for instance, from a human protein set to plasma protein set comparison, they found changes of 2% to 4%, and 2% to 6% for extracellular matrix and space respectively). We did however find, consistent with findings reported in Ping et al., plasma proteins to be increasingly over-represented for low frequency GO annotations relative to human proteins.

Our findings may also help further resolve proposed interpretations concerning the percentage differences between the plasma and the entire human proteome. Intriguingly, with plasma, Liu et al. and Ping et al. report similar percentages for their respective nuclear and nucleus component categories (18%). Liu et al. find a rise of the percentage (to 30%) observed in a human protein set, yet Ping et al. find an unexpected reduction (to <14%) which they suggested may be the result of the secretion of cellular breakdown products into circulation. We found 40% of the human plasma proteome to consist of nucleus proteins, with a minor rise in value to 42% for the curated human proteome. Liu et al. suggest that mimicry of percentages in the human proteome by the plasma map would indicate that it is composed of random assignments, and they find evidence against random assignment based on significant differences in percentage. We did not find significant differences in GO term percentages but our tests for statistical significance would reject a scenario of completely random assignment.

3.2.2 Filtered human plasma proteome and inflammatory plasma proteins

We compared GO annotations between the human plasma proteome and inflammatory plasma proteins. Across the set of comparisons between 43 GO terms shown in Tables 4–6 and Figure 1, Ip did not have any representative proteins for 3 of the cellular component terms: lysosomal membrane, myosin complex and collagen (C3, C5 and C6), 6 of the molecular function terms: structural constituent of cytoskeleton, chromatin binding, helicase activity, extracellular matrix structural constituent, RNA polymerase II transcription factor activity and GTPase regulator activity (F1, F2, F3, F4, F8 and F9) and a single biological process term: extracellular structure organization and biogenesis (P1). The distribution of GO term percentages in the Ip set was broadly distinguishable from that of the P set. Differences between the GO term percentages of the P and Ip sets were generally >2%. Differences were >10% for 3 of the 8 GO terms within the P versus Ip cellular component comparison, and 2 of the GO terms within each of the 12 molecular function and 13 biological process comparisons. For larger differences among cellular component GO terms (>10%), the Ip set had higher percentages for plasma membrane part (C7) and extracellular space (C11), and a smaller percentage for nucleus (C9). For larger differences among molecular function GO terms (>10%), the Ip set had higher percentages for protein binding (F11) and signal transducer activity (F18). For larger differences among biological process GO terms (>10%), the Ip set had a higher percentage for response to stress (P4) and a smaller percentage for transport (P12). For a percentage difference threshold of >8%, the Ip set also had a higher percentage for immune response (P14) and a smaller percentage for cell communication (P7).
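The percentage comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the protein identifiers and annotations are hypothetical toy data, the function names are invented for this example, and it assumes that each percentage is the share of a set's proteins annotated with a given GO term.

```python
from typing import Dict, Set


def go_term_percentages(annotations: Dict[str, Set[str]],
                        protein_set: Set[str]) -> Dict[str, float]:
    """Percentage of proteins in protein_set annotated with each GO term."""
    total = len(protein_set)
    return {
        term: 100.0 * len(members & protein_set) / total
        for term, members in annotations.items()
    }


def percentage_differences(pct_a: Dict[str, float],
                           pct_b: Dict[str, float],
                           threshold: float) -> Dict[str, float]:
    """GO terms whose percentage differs by more than `threshold` points (b minus a)."""
    return {
        term: pct_b[term] - pct_a[term]
        for term in pct_a
        if abs(pct_b[term] - pct_a[term]) > threshold
    }


# Hypothetical toy data: ten "plasma" proteins, four of them "inflammatory".
plasma = {f"p{i}" for i in range(1, 11)}
inflammatory_plasma = {"p1", "p2", "p3", "p4"}
annotations = {
    "nucleus": {"p5", "p6", "p7", "p8"},
    "extracellular space": {"p1", "p2", "p3", "p9"},
    "transport": {"p1", "p5", "p6"},
}

pct_p = go_term_percentages(annotations, plasma)
pct_ip = go_term_percentages(annotations, inflammatory_plasma)

# Terms differing by more than 10 and by more than 2 percentage points.
large = percentage_differences(pct_p, pct_ip, threshold=10.0)
moderate = percentage_differences(pct_p, pct_ip, threshold=2.0)
print(large)
print(sorted(moderate))
```

A positive value means the term is more frequent in the second set (here the toy Ip set); the thresholds mirror the >2% and >10% cut-offs used in the text.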
Statistically significant differences between the P and Ip sets were observed for 1 cellular component GO term: extracellular space (C11), 1 molecular function GO term: signal transducer activity (F18) and 7 biological process GO terms: response to stress, death, cell motility, cell communication, response to bacterium, immune system development and immune response (P4, P5, P6, P7, P8, P13 and P14). Extracellular space and signal transducer activity were exclusively significant for the P vs Ip comparison. The statistically significant distinctions of biological process with the inflammatory protein set characterize a system of cellular communication, motility, immune response and system development, death, and response to stress. For biological process, the response to bacterium, immune response and immune system development terms were exclusively significant for the P vs Ip comparison. The Ip set is based on the biological process GO term of inflammatory response, and the interdependency among concepts within the biological process GO vocabulary suggested that over-representation may occur for other biological process GO terms, as was indeed found.
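The table captions describe significance as an adjusted p-value derived from 1000 null-hypothesis simulations of hypergeometric comparisons based on random selection. The sketch below shows one common way such a test can be set up; it is an assumption-laden illustration, not the authors' implementation. The function names, the toy protein identifiers, the one-sided (over-representation) tail, and the choice to compare each observed p-value against the minimum p-value of every simulated random subset are all assumptions made for this example.

```python
import random
from math import comb
from typing import Dict, Set


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when drawing n of N proteins, K of which carry the GO term."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def simulated_adjusted_pvalues(population: Set[str],
                               subset: Set[str],
                               annotations: Dict[str, Set[str]],
                               n_sim: int = 1000,
                               seed: int = 0) -> Dict[str, float]:
    """Adjust per-term p-values against random subsets of the same size."""
    rng = random.Random(seed)
    pool = sorted(population)
    N, n = len(pool), len(subset)

    def pvalues(sample: Set[str]) -> Dict[str, float]:
        return {
            term: hypergeom_upper_tail(len(members & sample), N,
                                       len(members & population), n)
            for term, members in annotations.items()
        }

    observed = pvalues(subset)
    # Smallest p-value seen across all terms in each random (null) subset.
    null_minima = [min(pvalues(set(rng.sample(pool, n))).values())
                   for _ in range(n_sim)]
    return {
        term: sum(m <= p for m in null_minima) / n_sim
        for term, p in observed.items()
    }


# Hypothetical toy data: 40 "plasma" proteins, 10 of them "inflammatory".
population = {f"p{i}" for i in range(40)}
subset = {f"p{i}" for i in range(10)}
annotations = {
    "immune response": {f"p{i}" for i in range(8)} | {"p10", "p11"},
    "transport": {"p9", "p12", "p20", "p30"},
}

adjusted = simulated_adjusted_pvalues(population, subset, annotations)
for term, p_adj in adjusted.items():
    print(term, p_adj)
```

With 1000 simulations the smallest non-zero adjusted value is 0.001, so a cut-off of padj≤0.001 under this scheme would admit at most one random subset that looks at least as extreme as the observed one.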

Overall, we found inflammatory proteins to be annotated with functions, processes and cellular components in ways that would distinctly separate this class of proteins from the human plasma proteome in general. Beyond this observed distinction in functional annotation for inflammatory proteins, objectives for direct clinical application may be further advanced by investigating the specificity of inflammatory proteins to multiple aspects of different complex diseases. In particular, we next sought to complement the findings of our ontology analysis by investigations of tissue specificity, interactions between proteins, and biological pathways.

3.3 Tissue specific PPIs

Inflammatory processes are tissue specific [32]; we therefore investigated in which tissues the inflammatory plasma proteins are expressed. We found these proteins to be expressed in heart, brain, lung, kidney, pancreas, skin, colon, testis and ovary, liver and spleen. For our study, we evaluated the five main organs (heart, brain, lung, kidney and liver) in which at least 20 inflammatory proteins were present. The UniProt IDs of the proteins expressed in these tissues are shown in Table 7. For the visualization of PPIs, we used the APID2NET plugin for Cytoscape with the settings described in Section 2.2 of this paper. All the GO terms mentioned in the tissue PPI analysis are related to cancer prognosis [33–35].

Table 7.

Expression of inflammatory proteins (present in plasma) in different tissue organs.

Uniprot name Uniprot Primary Accession number Protein name Gene name Organs
A1AG1_HUMAN P02763 Alpha-1-acid glycoprotein 1 ORM1 Liver
A1AG2_HUMAN P19652 Alpha-1-acid glycoprotein 2 ORM2 Liver
A2AP_HUMAN P08697 Alpha-2-antiplasmin SERPINF2 Liver
AACT_HUMAN P01011 Alpha-1-antichymotrypsin SERPINA3 Brain, Liver
ADO_HUMAN Q06278 Aldehyde oxidase AOX1 Brain, Heart, Lung, Kidney, Liver
ADPRH_HUMAN P54922 ADP-ribosylarginine hydrolase ADPRH Heart, Lung, Kidney, Liver
AOC3_HUMAN Q16853 Membrane copper amine oxidase AOC3 Heart, Lung, Kidney
APOL2_HUMAN Q9BQE5 Apolipoprotein-L2 APOL2 Brain, Lung, Kidney, Liver
ATRN_HUMAN O75882 Attractin ATRN Liver
B4GT1_HUMAN P15291 Beta-1,4-galactosyltransferase 1 B4GALT1 Brain
BMP2_HUMAN P12643 Bone morphogenetic protein 2 BMP2 Brain, Heart, Lung, Liver
C163A_HUMAN Q86VB7 Scavenger receptor cysteine-rich type 1 protein M130 CD163 Liver
C3AR_HUMAN Q16581 C3a anaphylatoxin chemotactic receptor C3AR1 Brain, Heart, Lung
CCL17_HUMAN Q92583 C-C motif chemokine 17 CCL17 Lung
CCL18_HUMAN P55774 C-C motif chemokine 18 CCL18 Lung
CCL21_HUMAN O00585 C-C motif chemokine 21 CCL21 Heart, Liver
CCL23_HUMAN P55773 C-C motif chemokine 23 CCL23 Lung, Liver
CCL26_HUMAN Q9Y258 C-C motif chemokine 26 CCL26 Heart
CCL8_HUMAN P80075 C-C motif chemokine 8 CCL8 Brain, Heart, Lung, Liver
CDO1_HUMAN Q16878 Cysteine dioxygenase type 1 CDO1 Brain, Heart, Liver
CEBPB_HUMAN P17676 CCAAT/enhancer-binding protein beta CEBPB Lung, Kidney
CFAH_HUMAN P08603 Complement factor H CFH Liver
CHST1_HUMAN O43916 Carbohydrate sulfotransferase 1 CHST1 Brain
CP4FB_HUMAN Q9HBI6 Cytochrome P450 4F11 CYP4F11 Heart, Kidney, Liver
CXCR4_HUMAN P61073 C-X-C chemokine receptor type 4 CXCR4 Brain, Heart, Lung, Kidney, Liver
CXL13_HUMAN O43927 C-X-C motif chemokine 13 CXCL13 Liver
EDG3_HUMAN* Q99500 Sphingosine 1-phosphate receptor 3 S1PR3 Heart, Kidney, Liver
EPCR_HUMAN Q9UNN8 Endothelial protein C receptor PROCR Heart, Lung, Kidney, Liver
FETUA_HUMAN P02765 Alpha-2-HS-glycoprotein AHSG Liver
FHR1_HUMAN Q03591 Complement factor H-related protein 1 CFHR1 Liver
FHR5_HUMAN Q9BXR6 Complement factor H-related protein 5 CFHR5 Liver
FPRL1_HUMAN** P25090 N-formyl peptide receptor 2 FPR2 Lung
GCR_HUMAN P04150 Glucocorticoid receptor NR3C1 Heart
HDAC9_HUMAN Q9UKV0 Histone deacetylase 9 HDAC9 Brain, Heart
ICBR_HUMAN P57730 Caspase-1 inhibitor Iceberg ICEBERG Heart
IL1AP_HUMAN Q9NPH3 Interleukin-1 receptor accessory protein IL1RAP Lung, Liver
IL1F6_HUMAN Q9UHA7 Interleukin-1 family member 6 IL1F6 Brain
IRAK2_HUMAN O43187 Interleukin-1 receptor-associated kinase-like 2 IRAK2 Lung, Kidney, Liver
LT4R1_HUMAN Q15722 Leukotriene B4 receptor 1 LTB4R Brain, Heart, Liver
MEFV_HUMAN O15553 Pyrin MEFV Brain, Heart, Lung, Kidney, Liver
MGLL_HUMAN Q99685 Monoglyceride lipase MGLL Brain, Heart, Lung, Kidney, Liver
MMP25_HUMAN Q9NPA2 Matrix metalloproteinase-25 MMP25 Lung
NDST1_HUMAN P52848 Bifunctional heparan sulfate N-deacetylase/N-sulfotransferase 1 NDST1 Heart, Liver
NFAC3_HUMAN Q12968 Nuclear factor of activated T-cells, cytoplasmic 3 NFATC3 Heart, Kidney
NFAC4_HUMAN Q14934 Nuclear factor of activated T-cells, cytoplasmic 4 NFATC4 Lung, Kidney
NFAM1_HUMAN Q8NET5 NFAT activation molecule 1 NFAM1 Lung
NMI_HUMAN Q13287 N-myc-interactor NMI Brain, Liver
NOD1_HUMAN Q9Y239 Nucleotide-binding oligomerization domain-containing protein 1 NOD1 Heart, Lung, Kidney, Liver
NOX4_HUMAN Q9NPH5 NADPH oxidase 4 NOX4 Brain, Heart, Kidney
PA24C_HUMAN Q9UP65 Cytosolic phospholipase A2 gamma PLA2G4C Heart
PTAFR_HUMAN P25105 Platelet-activating factor receptor PTAFR Heart, Lung
SAA_HUMAN P02735 Serum amyloid A protein SAA1 Liver
SAA4_HUMAN P35542 Serum amyloid A-4 protein SAA4 Liver
SN_HUMAN Q9BZZ2 Sialoadhesin SIGLEC1 Brain, Lung, Liver
SPR1_HUMAN Q15743 Sphingosylphosphorylcholine receptor GPR68 Brain, Lung
STAB1_HUMAN Q9NY15 Stabilin-1 STAB1 Liver
STAT3_HUMAN P40763 Signal transducer and activator of transcription 3 STAT3 Brain, Lung, Kidney, Liver
THRB_HUMAN P00734 Prothrombin F2 Liver
TIRAP_HUMAN P58753 Toll/interleukin-1 receptor domain-containing adapter protein TIRAP Brain, Heart, Lung, Kidney, Liver
TLR10_HUMAN Q9BXR5 Toll-like receptor 10 TLR10 Lung
TLR2_HUMAN O60603 Toll-like receptor 2 TLR2 Lung, Liver
TLR7_HUMAN Q9NYK1 Toll-like receptor 7 TLR7 Brain, Lung
TLR8_HUMAN Q9NR97 Toll-like receptor 8 TLR8 Brain, Heart, Lung, Liver
TLR9_HUMAN Q9NR96 Toll-like receptor 9 TLR9 Lung, Liver
TRFE_HUMAN P02787 Serotransferrin TF Liver
VPS45_HUMAN Q9NRW7 Vacuolar protein sorting-associated protein 45 VPS45 Brain, Heart, Lung, Kidney, Liver
X3CL1_HUMAN P78423 Fractalkine CX3CL1 Brain, Heart, Lung, Kidney
* EDG3_HUMAN (associated with primary accession number: Q99500 from release 35.0) was renamed to S1PR3_HUMAN in release 56.0.

** FPRL1_HUMAN (associated with primary accession number: P25090 from release 46.0) was renamed to FPR2_HUMAN in release 55.5.
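The tissue selection step (keeping organs in which enough inflammatory proteins are expressed) amounts to counting proteins per organ. The sketch below illustrates this with six rows copied from Table 7; the function names are hypothetical, and because only a small excerpt is used, the example threshold of 3 stands in for the cut-off of 20 applied to the full table.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Excerpt of Table 7: (UniProt name, organs in which the protein is expressed).
rows = [
    ("A1AG1_HUMAN", "Liver"),
    ("AACT_HUMAN", "Brain, Liver"),
    ("STAT3_HUMAN", "Brain, Lung, Kidney, Liver"),
    ("CXCR4_HUMAN", "Brain, Heart, Lung, Kidney, Liver"),
    ("GCR_HUMAN", "Heart"),
    ("TLR2_HUMAN", "Lung, Liver"),
]


def proteins_per_organ(table: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many proteins list each organ in their comma-separated organ field."""
    counts: Counter = Counter()
    for _name, organs in table:
        for organ in organs.split(","):
            counts[organ.strip()] += 1
    return counts


def organs_meeting_threshold(counts: Counter, minimum: int) -> List[str]:
    """Organs in which at least `minimum` proteins are expressed, sorted by name."""
    return sorted(organ for organ, n in counts.items() if n >= minimum)


counts = proteins_per_organ(rows)
selected = organs_meeting_threshold(counts, minimum=3)
print(dict(counts))
print(selected)
```

Running the same counting over the complete table with `minimum=20` would correspond to the selection criterion stated in the text.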

3.3.1 Brain

We observed 25 inflammatory proteins expressed in brain tissue, and the interaction network was expanded to 129 nodes and 278 edges. We found six major hub proteins: NMI (17), CXCR4 (26), HDAC9 (17), AACT (18), BMP2 (14) and STAT3 (103), as shown in Fig. S1. In the expanded list of 129 proteins, we studied the GO terms of biological process, and observed that eight proteins are from the JAK-STAT cascade (GO:7259): STAT4, SOCS3, CCR2, STAT4, STAT3, STA5A, STA5B, NMI; four proteins are associated with anti-apoptosis (GO:6916): TF65, SOCS3, NFKB1, HDAC3; two proteins are from the I-kappaB kinase/NF-kappaB cascade (GO:7249): STAT1, TLR8; and two proteins are associated with activation of NF-kappaB-inducing kinase (GO:7250): TLR4, M3K7. Interestingly, we observed other proteins to be associated with GO development processes: one protein in eye development (GO:1654), BMR1B; two proteins in nervous system development (GO:7399), STAT3 and HDAC4; and two proteins in muscle contraction (GO:6936), LT4R1 and DAG1.

3.3.2 Heart

We observed 29 inflammatory proteins expressed in heart tissue, and the interaction network was expanded to 116 nodes and 234 edges. We found five major hub proteins: HDAC9 (17), NOD1 (13), GCR (98), BMP2 (14) and CXCR4 (26), as shown in Fig. S2. In the expanded list of 116 proteins, we studied the GO terms of biological process, and observed that five proteins are from the JAK-STAT cascade (GO:7259): STA5A, CCR2, STAT3, STA5A, STA5B; five proteins are associated with anti-apoptosis (GO:6916): HDAC3, BAG1, NFKB1, TF65, SOCS3; one protein is associated with activation of NF-kappaB transcription factor (GO:51092): TF65; and one protein is from the I-kappaB kinase/NF-kappaB cascade (GO:7249): TLR8.

3.3.3 Lung

We observed 34 inflammatory proteins expressed in lung tissue, and the interaction network was expanded to 153 nodes and 353 edges. We found eight major hub proteins: IRAK2 (12), IL1AP (13), CXCR4 (26), NOD1 (13), TLR2 (18), CEBPB (60), BMP2 (14) and STAT3 (103), as shown in Fig. S3. Six proteins are from the JAK-STAT cascade pathway (GO:7259): INAR1, SOCS3, STA5A, CCR2, NMI, STAT3; five proteins are associated with anti-apoptosis (GO:6916): FOXO1, NFKB1, SOCS3, TF65, ENPL; three proteins are from the I-kappaB kinase/NF-kappaB cascade pathway (GO:7249): IRAK2, STAT1, TLR8; and three proteins are from the activation of NF-kappaB-inducing kinase pathway (GO:7250): TLR4, M3K7, TRAF6. All the GO terms mentioned were found to relate to cancer prognosis [33–35].

3.3.4 Kidney

We observed 21 inflammatory proteins expressed in kidney tissue, and the interaction network was expanded to 106 nodes and 281 edges. We found six major hub proteins: IRAK2 (12), CXCR4 (26), NOD1 (13), CEBPB (60), BMP2 (14) and STAT3 (103), as shown in Fig. S4. We observed that five proteins are from the JAK-STAT cascade pathway (GO:7259): INAR1, SOCS3, STA5A, NMI, STAT3; four proteins are associated with anti-apoptosis (GO:6916): SOCS3, NFKB1, TF65, FOXO1; two proteins are from the I-kappaB kinase/NF-kappaB cascade (GO:7249): STAT1, IRAK2; and three proteins are associated with activation of NF-kappaB-inducing kinase (GO:7250): TLR4, M3K7, TRAF6.

3.3.5 Liver

We observed 43 inflammatory proteins expressed in liver tissue, and the interaction network was expanded to 165 nodes and 318 edges. We found fourteen major hub proteins: IRAK2 (12), NMI (25), IL1AP (13), CXCR4 (26), NOD1 (13), TLR2 (18), AACT (18), BMP2 (14), THRB (48), FETUA (10), TRFE (20), CFAH (11), A2AP (15) and STAT3 (103), as shown in Fig. S5. We observed that eight proteins are from the JAK-STAT cascade (GO:7259): STAT4, STA5B, INAR1, SOCS3, STA5A, CCR2, NMI, STAT3; four proteins are associated with anti-apoptosis (GO:6916): NFKB1, SOCS3, TF65, ENPL; three proteins are from the I-kappaB kinase/NF-kappaB cascade (GO:7249): IRAK2, TLR8, STAT1; and three proteins are associated with activation of NF-kappaB-inducing kinase (GO:7250): TLR4, M3K7, TRAF6.
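The hub proteins in the tissue networks above are reported with a degree value, and Table 8 uses a cut-off of at least 10. A minimal sketch of how hubs can be picked out of an interaction edge list is given below. It is illustrative only: the node names and edges are hypothetical, the function names are invented, and it assumes that degree means the number of distinct interaction partners of a protein.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def node_degrees(edges: Iterable[Edge]) -> Dict[str, int]:
    """Number of distinct interaction partners per node (self-loops ignored)."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(partners) for node, partners in neighbors.items()}


def hub_nodes(edges: Iterable[Edge], min_degree: int = 10) -> Dict[str, int]:
    """Nodes whose degree is at least `min_degree`."""
    return {node: d for node, d in node_degrees(edges).items() if d >= min_degree}


# Hypothetical network: hub_A interacts with ten partners and with node_B.
edges = [("hub_A", f"partner_{i}") for i in range(10)]
edges += [
    ("hub_A", "node_B"),
    ("node_B", "partner_0"),
    ("partner_0", "hub_A"),  # duplicate of an existing edge, counted once
]

hubs = hub_nodes(edges, min_degree=10)
print(hubs)
```

Storing partners in sets means that repeated or reversed entries for the same interaction do not inflate a node's degree.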

3.4 Pathway database searching for network hub proteins

Seventeen hub proteins expressed in five tissues were analyzed against known pathways, as shown in Table 8. Many inflammatory proteins were expressed in more than one tissue. For example, the BMP2_HUMAN protein is expressed in all five organs studied (heart, brain, lung, kidney, and liver) and is a secreted protein [24]. NOD1_HUMAN, AACT_HUMAN, FETUA_HUMAN, TRFE_HUMAN, CFAH_HUMAN and A2AP_HUMAN were the six hub proteins not found in our set of pathway databases. All of the pathways containing our proteins relate to complex disease. For instance, HDAC9_HUMAN is expressed in heart, brain, and liver and is associated with pathways of cardiac hypertrophy, notch signaling, and the p53 signaling pathway. GCR_HUMAN is expressed only in the heart and is found in chromatin remodelling [36]; defects in the GCR gene cause a hypertensive, hyperandrogenic disorder characterized by increased serum cortisol concentrations [37]. AACT_HUMAN is expressed only in brain and liver tissue, and defects in the AACT gene are a proposed cause of chronic obstructive pulmonary disease [38]. TLR2_HUMAN is expressed in liver and is observed in the Toll-like receptors pathway and the TLR-TRIF pathway. THRB_HUMAN is part of the thrombin signalling and thrombopoietin pathways. Six hub proteins, BMP2_HUMAN, AACT_HUMAN, CXCR4_HUMAN, FETUA_HUMAN, TRFE_HUMAN and THRB_HUMAN, were found in the cancer biomarker list reported in [22]. Four hub proteins, GCR_HUMAN, TLR2_HUMAN, TRFE_HUMAN and THRB_HUMAN, were found in the drug target list [23].

Table 8.

Major inflammatory hub proteins in Tissue-based PPIs (>=10) and their role in known biological pathways.

Major HUB proteins in Tissue-based PPIs (>=10) and GO Degree Tissues expressed MS platforms Pathways involved Source
HDAC9_HUMAN 17 Heart, Brain IMS-MS/MS_TOF DNA Methylation and Transcriptional Repression, NFAT and Cardiac Hypertrophy, Notch Signaling, p53 Signaling Protein Lounge, HDAC9, NCI-curated
NOD1_HUMAN 13 Heart, Lung, Kidney, Liver IMS-MS/MS_TOF --
GCR_HUMAN 98 Heart IMS-MS/MS_TOF Chromatin remodelling, Defects in this gene cause a hypertensive, hyperandrogenic disorder characterized by increased serum cortisol concentrations. BioCarta, NCI-curated
BMP2_HUMAN 14 Heart, Brain, Lung, Kidney, Liver IMS-MS/MS_TOF BMP Pathway, JAK/STAT Pathway, Mitochondrial Apoptosis, PAK Pathway, MIF Regulation of Innate Immune Cells, MIF Mediated Glucocorticoid Regulation, TGF-Beta Pathway, MAPK Family Pathway, PKR Pathway, Rac1 Pathway, Rho Family GTPases Protein Lounge, BioCarta
NMI_HUMAN 25 Brain, Liver IMS-MS/MS_TOF Prolactin Signaling Protein Lounge, BioCarta
CXCR4_HUMAN 26 Brain, Lung, Kidney, Liver IMS-MS/MS_TOF CXCR4 Pathway, Signaling by Slit, NF-kappaB Activation by Viruses, EphB-EphrinB Signaling, Ephrin-Eph Signaling, Apoptotic Pathways Triggered By HIV1 Protein Lounge, BioCarta, NCI-curated
AACT_HUMAN 18 Brain, Liver ESI-MS/MS_LTQ Defects may be a cause of chronic obstructive pulmonary disease
IRAK2_HUMAN 12 Lung, Kidney, Liver IMS-MS/MS_TOF Toll-Like Receptors Pathway, TLR-TRIF Pathway, NF-KappaB Family Pathway, IL-1 Pathway, NF-KappaB (p50/p65) Pathway Protein Lounge, BioCarta
IL1AP_HUMAN 13 Lung, Liver IMS-MS/MS_TOF and ESI-MS/MS_LTQ IL-1 Pathway, IL-10 Pathway Protein Lounge
CEBPB_HUMAN 60 Lung, Kidney IMS-MS/MS_TOF Glucocorticoid Receptor Signaling, Prolactin Signaling, Growth Hormone Signaling Protein Lounge, BioCarta, NCI-curated
TLR2_HUMAN 18 Lung, Liver IMS-MS/MS_TOF Toll-Like Receptors Pathway, TLR-TRIF Pathway Protein Lounge
STAT3_HUMAN 103 Brain, Lung, Kidney, Liver IMS-MS/MS_TOF STAT3 Pathway Protein Lounge
FETUA_HUMAN 10 Liver ESI-MS/MS_LTQ, IMS-MS/MS_TOF
TRFE_HUMAN 20 Liver ESI-MS/MS_LCQ, IMS-MS/MS_TOF, ESI-MS/MS_QTOF, ESI-MS/MS_DECAXP
CFAH_HUMAN 11 Liver ESI-MS/MS_LCQ, IMS-MS/MS_TOF
A2AP_HUMAN 15 Liver ESI-MS/MS_LTQ, IMS-MS/MS_TOF
THRB_HUMAN 48 Liver IMS-MS/MS_TOF, ESI-MS/MS_LTQ Thrombin Signaling, Thrombopoietin Pathway Protein Lounge
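The overlaps reported for the seventeen hub proteins can be re-derived with plain set operations. The sketch below uses only the protein lists stated in Section 3.4 and Table 8; the variable names are chosen for this example, and the derived groupings (for instance, hubs that are both candidate biomarkers and drug targets) are simple consequences of those lists rather than additional findings.

```python
# Seventeen hub proteins listed in Table 8.
hubs = {
    "HDAC9_HUMAN", "NOD1_HUMAN", "GCR_HUMAN", "BMP2_HUMAN", "NMI_HUMAN",
    "CXCR4_HUMAN", "AACT_HUMAN", "IRAK2_HUMAN", "IL1AP_HUMAN", "CEBPB_HUMAN",
    "TLR2_HUMAN", "STAT3_HUMAN", "FETUA_HUMAN", "TRFE_HUMAN", "CFAH_HUMAN",
    "A2AP_HUMAN", "THRB_HUMAN",
}

# Hubs reported as not found in the pathway databases searched.
no_pathway_hit = {
    "NOD1_HUMAN", "AACT_HUMAN", "FETUA_HUMAN",
    "TRFE_HUMAN", "CFAH_HUMAN", "A2AP_HUMAN",
}

# Hubs reported in the cancer biomarker list [22] and the drug target list [23].
cancer_biomarkers = {
    "BMP2_HUMAN", "AACT_HUMAN", "CXCR4_HUMAN",
    "FETUA_HUMAN", "TRFE_HUMAN", "THRB_HUMAN",
}
drug_targets = {"GCR_HUMAN", "TLR2_HUMAN", "TRFE_HUMAN", "THRB_HUMAN"}

with_pathway = hubs - no_pathway_hit
biomarker_and_target = cancer_biomarkers & drug_targets
neither = hubs - (cancer_biomarkers | drug_targets)
biomarkers_without_pathway = cancer_biomarkers & no_pathway_hit

print(len(with_pathway), "hubs with at least one pathway hit")
print(sorted(biomarker_and_target))
print(sorted(biomarkers_without_pathway))
print(len(neither), "hubs in neither list")
```

Keeping each published list as its own set makes it easy to repeat the comparison when a biomarker or drug target list is updated.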

4 Conclusions

Systematically characterizing the components of the plasma proteome is closely related to future clinical applications of MS proteomics emerging from a wave of promising new technologies such as SELDI [39], shotgun proteomics [40], multiple reaction monitoring (MRM) [41] and glycoprotein-mass spectrometry coupled proteomics [42]. To measure the consequences of these advancements, which may add significant quality to the biomarker discovery process, there is a need to inspect how current proteomic surveys generally relate to systems of complex disease. In this work, we examined proteomic and inflammatory protein sets for clinically important overlaps with biomarkers and drug targets in the healthy human plasma proteome recently integrated in the HIP2 database [14]. We analyzed the functional annotations of these protein sets with Gene Ontology, and investigated patterns of protein interaction, tissue specificity and overlap with pathways. The human plasma proteome and filtered human plasma proteome were found to have higher percentages of biomarkers and drug targets than the curated human proteome. Plasma is readily accessible, and the presence of biomarkers in plasma suggests that it is useful for clinical diagnosis and prognosis. Tests for statistical significance, especially at GO level 4, identified differences between the curated human proteome and the filtered human plasma proteome across all three GO vocabularies. Distinct differences in percentages for GO term overlaps were not observed, however, between the curated human proteome and the human plasma proteome, although we did find higher percentage representations in P than in H for GO terms with annotation percentages <2%. These higher percentage representations for low-frequency GO term annotations in plasma can generally be interpreted as representing how the plasma proteome specifically deviates from the overall human proteome.
Although the inflammatory process is tissue specific, 70% of inflammatory proteins are also found in plasma, and inflammation has been found to have some overlap with complex disease. We found inflammatory plasma proteins to have a higher chance to be biomarkers than the plasma proteome. Additionally, we found inflammatory plasma protein interacting partners to have a higher chance to be biomarkers than the plasma proteome. Compared to the plasma proteome, the inflammatory plasma protein set also presented a distinctive profile of statistically significant GO terms and percentages of overlap. Our visualizations of PPI network topologies showed differences between tissues, most distinctively for the higher centrality observed in brain and heart PPI networks. Six hub proteins in the PPIs expanded inflammatory network were found to be potential cancer biomarkers: BMP2_HUMAN, AACT_HUMAN, CXCR4_HUMAN, FETUA_HUMAN, TRFE_HUMAN and THRB_HUMAN. We identified a specificity issue in terms of how many inflammatory proteins are expressed in many tissues, and may thus be difficult to use as biomarkers of tissue specific cancer. Despite decades of effort, single biomarkers have not been found that can reach the levels of specificity and sensitivity that are required for routine clinical use for the detection or monitoring of the most common cancers. Alternative approaches that measure sensitivity and specificity based on multiple protein and peptide markers may therefore be necessary to achieve a higher level of diagnostic specificity. Overall, we found that a systematic evaluation of plasma, inflammatory proteins and interacting protein partners facilitates the study of complex disease and opportunities for clinical application.

Supplementary Material

Supplementary Data

Table 5.

Selected molecular function GO terms applied to comparisons between H versus P protein sets and P versus Ip protein sets. The results of comparisons are shown in the last two columns; significance (Y) is based on an adjusted p-value as calculated from 1000 null-hypothesis simulations of hypergeometric comparisons based on random selection, padj≤0.001; n/a is when there are not any representatives of the GO term in the Ip set.

Index Level GO ID GO term H vs P P vs Ip
F1 4 GO:0005200 structural constituent of cytoskeleton Y n/a
F2 4 GO:0003682 chromatin binding Y n/a
F3 4 GO:0004386 helicase activity Y n/a
F4 4 GO:0005201 extracellular matrix structural constituent Y n/a
F5 4 GO:0016563 transcription activator activity Y N
F6 4 GO:0003712 transcription cofactor activity Y N
F7 4 GO:0000166 nucleotide binding Y N
F8 4 GO:0003702 RNA polymerase II transcription factor activity Y n/a
F9 4 GO:0030695 GTPase regulator activity Y n/a
F10 4 GO:0008289 lipid binding Y N
F11 4 GO:0005515 protein binding Y N* (p=0.06)
F12 4 GO:0022892 substrate-specific transporter activity Y N
F13 4 GO:0003700 transcription factor activity Y N
F14 4 GO:0016787 hydrolase activity Y N
F15 4 GO:0016740 transferase activity Y N
F16 4 GO:0022857 transmembrane transporter activity Y N
F17 4 GO:0043167 ion binding Y N
F18 4 GO:0004871 signal transducer activity N Y

Acknowledgments

This work was supported by a grant from the National Cancer Institute (U24CA126480-01), part of NCI’s Clinical Proteomic Technologies Initiative (http://proteomics.cancer.gov). This initiative is designed to advance the field of clinical cancer proteomics by addressing the challenges to the measurement of peptides and/or proteins in clinical specimens. A component of this initiative is the set of Clinical Proteomic Technology Assessment for Cancer (CPTAC) teams that includes the Broad Institute of MIT and Harvard, Memorial Sloan-Kettering Cancer Center, Purdue University, University of California, San Francisco, and Vanderbilt University School of Medicine.

References

  • 1. Anderson NL, Anderson NG. The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics. 2002;1:845–867. doi: 10.1074/mcp.r200007-mcp200.
  • 2. Aggarwal BB. Nuclear factor-kappaB: the enemy within. Cancer cell. 2004;6:203–208. doi: 10.1016/j.ccr.2004.09.003.
  • 3. Alper C, Taggart HA. Plasma protein patterns in disease. Am Pract Dig Treat. 1954;5:349–353.
  • 4. Parfentjev IA, Johnson ML. The plasma protein pattern and its significance in geriatrics and cancer diagnosis. Geriatrics. 1955;10:232–238.
  • 5. Trusheim MR, Berndt ER, Douglas FL. Stratified medicine: strategic and economic implications of combining drugs and clinical biomarkers. Nature reviews. 2007;6:287–293. doi: 10.1038/nrd2251.
  • 6. Williams SA, Slavin DE, Wagner JA, Webster CJ. A cost-effectiveness approach to the qualification and acceptance of biomarkers. Nature reviews. 2006;5:897–902. doi: 10.1038/nrd2174.
  • 7. Veenstra TD, Conrads TP, Hood BL, Avellino AM, et al. Biomarkers: mining the biofluid proteome. Mol Cell Proteomics. 2005;4:409–418. doi: 10.1074/mcp.M500006-MCP200.
  • 8. Petricoin EF, Belluco C, Araujo RP, Liotta LA. The blood peptidome: a higher dimension of information content for cancer biomarker discovery. Nat Rev Cancer. 2006;6:961–967. doi: 10.1038/nrc2011.
  • 9. Liotta LA, Petricoin EF. Putting the “bio” back into biomarkers: orienting proteomic discovery toward biology and away from the measurement platform. Clinical chemistry. 2008;54:3–5. doi: 10.1373/clinchem.2007.097659.
  • 10. Ping P, Vondriska TM, Creighton CJ, Gandhi TK, et al. A functional annotation of subproteomes in human plasma. Proteomics. 2005;5:3506–3519. doi: 10.1002/pmic.200500140.
  • 11. Berhane BT, Zong C, Liem DA, Huang A, et al. Cardiovascular-related proteins identified in human plasma by the HUPO Plasma Proteome Project pilot phase. Proteomics. 2005;5:3520–3530. doi: 10.1002/pmic.200401308.
  • 12. Muthusamy B, Hanumanthu G, Suresh S, Rekha B, et al. Plasma Proteome Database as a resource for proteomics research. Proteomics. 2005;5:3531–3536. doi: 10.1002/pmic.200401335.
  • 13. Liu X, Valentine SJ, Plasencia MD, Trimpin S, et al. Mapping the human plasma proteome by SCX-LC-IMS-MS. Journal of the American Society for Mass Spectrometry. 2007;18:1249–1264. doi: 10.1016/j.jasms.2007.04.012.
  • 14. Saha S, Harrison SH, Shen C, Tang H, et al. HIP2: An online database of human plasma proteins from healthy individuals. BMC medical genomics. 2008;1:12. doi: 10.1186/1755-8794-1-12.
  • 15. Shadforth I, Bessant C. Genome annotating proteomics pipelines: available tools. Expert review of proteomics. 2006;3:621–629. doi: 10.1586/14789450.3.6.621.
  • 16. Balkwill F, Charles KA, Mantovani A. Smoldering and polarized inflammation in the initiation and promotion of malignant disease. Cancer cell. 2005;7:211–217. doi: 10.1016/j.ccr.2005.02.013.
  • 17. Theodoropoulos G, Papaconstantinou I, Felekouras E, Nikiteas N, et al. Relation between common polymorphisms in genes related to inflammatory response and colorectal cancer. World J Gastroenterol. 2006;12:5037–5043. doi: 10.3748/wjg.v12.i31.5037.
  • 18. Oremek GM, Sauer-Eppel H, Bruzdziak TH. Value of tumour and inflammatory markers in lung cancer. Anticancer research. 2007;27:1911–1915.
  • 19. Aggarwal BB, Shishodia S, Sandur SK, Pandey MK, Sethi G. Inflammation and cancer: how hot is the link? Biochemical pharmacology. 2006;72:1605–1621. doi: 10.1016/j.bcp.2006.06.029.
  • 20. States DJ, Omenn GS, Blackwell TW, Fermin D, et al. Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study. Nature biotechnology. 2006;24:333–338. doi: 10.1038/nbt1183.
  • 21. Boyle EI, Weng S, Gollub J, Jin H, et al. GO::TermFinder--open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics (Oxford, England) 2004;20:3710–3715. doi: 10.1093/bioinformatics/bth456.
  • 22. Polanski M, Anderson LN. Biomarker Insights. 2006:1–48.
  • 23. Wishart DS, Knox C, Guo AC, Shrivastava S, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic acids research. 2006;34:D668–672. doi: 10.1093/nar/gkj067.
  • 24. Chen Y, Zhang Y, Yin Y, Gao G, et al. SPD--a web-based secreted protein database. Nucleic acids research. 2005;33:D169–173. doi: 10.1093/nar/gki093.
  • 25. Ashburner M, Ball CA, Blake JA, Botstein D, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature genetics. 2000;25:25–29. doi: 10.1038/75556.
  • 26. Shannon P, Markiel A, Ozier O, Baliga NS, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
  • 27. Alfarano C, Andrade CE, Anthony K, Bahroos N, et al. The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic acids research. 2005;33:D418–424. doi: 10.1093/nar/gki051.
  • 28. Hernandez-Toro J, Prieto C, De las Rivas J. APID2NET: unified interactome graphic analyzer. Bioinformatics (Oxford, England) 2007;23:2495–2497. doi: 10.1093/bioinformatics/btm373.
  • 29. Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, et al. ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic acids research. 2003;31:3784–3788. doi: 10.1093/nar/gkg563.
  • 30. Vastrik I, D’Eustachio P, Schmidt E, Joshi-Tope G, et al. Reactome: a knowledge base of biologic pathways and processes. Genome biology. 2007;8:R39. doi: 10.1186/gb-2007-8-3-r39.
  • 31. Shargel L. Applied Biopharmaceutics & Pharmacokinetics. McGraw-Hill; New York: 2005.
  • 32. Luster AD, Alon R, von Andrian UH. Immune cell migration in inflammation: present and future therapeutic targets. Nature immunology. 2005;6:1182–1190. doi: 10.1038/ni1275.
  • 33. Spano JP, Milano G, Rixe C, Fagard R. JAK/STAT signalling pathway in colorectal cancer: a new biological target with therapeutic implications. Eur J Cancer. 2006;42:2668–2670. doi: 10.1016/j.ejca.2006.07.006.
  • 34. Monks NR, Biswas DK, Pardee AB. Blocking anti-apoptosis as a strategy for cancer chemotherapy: NF-kappaB as a target. Journal of cellular biochemistry. 2004;92:646–650. doi: 10.1002/jcb.20080.
  • 35. Luo JL, Kamata H, Karin M. IKK/NF-kappaB signaling: balancing life and death--a new approach to cancer therapy. The Journal of clinical investigation. 2005;115:2625–2632. doi: 10.1172/JCI26322.
  • 36. Fryer CJ, Archer TK. Chromatin remodelling by the glucocorticoid receptor requires the BRG1 complex. Nature. 1998;393:88–91. doi: 10.1038/30032.
  • 37. Malchoff DM, Brufsky A, Reardon G, McDermott P, et al. A mutation of the glucocorticoid receptor in primary cortisol resistance. The Journal of clinical investigation. 1993;91:1918–1925. doi: 10.1172/JCI116410.
  • 38. Poller W, Faber JP, Scholz S, Weidinger S, et al. Mis-sense mutation of alpha 1-antichymotrypsin gene associated with chronic lung disease. Lancet. 1992;339:1538. doi: 10.1016/0140-6736(92)91301-n.
  • 39. Tang N, Tornatore P, Weinberger SR. Current developments in SELDI affinity technology. Mass spectrometry reviews. 2004;23:34–44. doi: 10.1002/mas.10066.
  • 40.Hu L, Ye M, Jiang X, Feng S, Zou H. Advances in hyphenated analytical techniques for shotgun proteome and peptidome analysis--a review. Analytica chimica acta. 2007;598:193–204. doi: 10.1016/j.aca.2007.07.046. [DOI] [PubMed] [Google Scholar]
  • 41.Anderson L, Hunter CL. Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol Cell Proteomics. 2006;5:573–588. doi: 10.1074/mcp.M500331-MCP200. [DOI] [PubMed] [Google Scholar]
  • 42.Abbott KL, Aoki K, Lim JM, Porterfield M, et al. Targeted glycoproteomic identification of biomarkers for human breast carcinoma. Journal of proteome research. 2008;7:1470–1480. doi: 10.1021/pr700792g. [DOI] [PMC free article] [PubMed] [Google Scholar]
