Author manuscript; available in PMC: 2009 Sep 1.
Published in final edited form as: J Proteome Res. 2008 Jul 25;7(9):3755–3764. doi: 10.1021/pr800031f

Halobacterium salinarum NRC-1 PeptideAtlas: strategies for targeted proteomics

Phu T Van †, Amy K Schmid, Nichole L King, Amardeep Kaur, Min Pan, Kenia Whitehead, Tie Koide, Marc T Facciotti, Young-Ah Goo †, Eric W Deutsch, David J Reiss, Parag Mallick §, Nitin S Baliga †,‡‡,*
PMCID: PMC2643335  NIHMSID: NIHMS73201  PMID: 18652504

Abstract

The relatively small number of proteins and the fewer possible posttranslational modifications in microbes provide a unique opportunity to comprehensively characterize their dynamic proteomes. We have constructed a PeptideAtlas (PA) for 62.7% of the predicted proteome of the extremely halophilic archaeon Halobacterium salinarum NRC-1 by compiling approximately 636,000 tandem mass spectra from 497 mass spectrometry runs in 88 experiments. Analysis of the PA with respect to biophysical properties of constituent peptides, functional properties of parent proteins of detected peptides, and performance of different mass spectrometry approaches has helped highlight plausible strategies for improving proteome coverage and selecting signature peptides for targeted proteomics. Notably, the discovery of a significant correlation between absolute abundances of mRNAs and proteins has helped identify low protein abundance as the major limitation in peptide detection. Furthermore, we have discovered that iTRAQ labeling for quantitative proteomic analysis introduces a significant bias in peptide detection by mass spectrometry. Therefore, despite identifying at least one proteotypic peptide for almost all proteins in the PA, a context-dependent selection of proteotypic peptides appears to be the most effective approach for targeted proteomics.

Keywords: Peptide Atlas, Halobacterium, iTRAQ, bioinformatics, archaea, proteomics

INTRODUCTION

A complete genome sequence presents a one-dimensional perspective of the physiological potential of an organism. It is the temporally and spatially coordinated expression of genes into functional protein networks that yields the emergent behavior unique to each species. Therefore, to fully understand how cells function at a systems level, it is imperative to measure, assimilate and simultaneously analyze changes that occur at all levels of genetic information processing1. The transcriptome is dynamic and relatively easy to monitor comprehensively using whole-genome microarrays, providing insight into which genes respond to a particular environment and assist in the organism's adaptation to it2. However, much more information on regulatory processes remains locked within the proteome. Important differences between the transcriptome and the proteome stem from a variety of post-transcriptional processes, such as regulated degradation and posttranslational modifications, which elevates the importance of comprehensive analysis of dynamic changes in the proteome in response to various environmental perturbations3, 4.

However, comprehensive detection of the proteome is fraught with technical challenges, especially for proteins that are present in low abundance, integral to the membrane, or expressed only in specific environments. Even within the same protein, some peptides are more tractable than others using mass spectrometry-based approaches. Several hypotheses have been proposed to explain why some peptides are more tractable than others (e.g., biophysical properties such as isoelectric point (pI), hydrophobicity, and length)5, but other factors such as protease accessibility, protein structure, and protein modifications complicate any attempt to accurately predict peptide tractability by mass spectrometry (MS) using purely theoretical approaches.

The PeptideAtlas (PA) project was initiated to map the proteome of a given organism, cell type or tissue as experimentally detected by the mass spectrometer6. The PA is technology-agnostic and can make use of data from a variety of MS proteomic approaches, such as qualitative proteomics surveys, tandem MS of immunoprecipitated complexes, and quantitative proteomics (e.g., ICAT and iTRAQ). Once constructed, the PA can be used as a reference for designing targeted proteomic strategies such as multiple reaction monitoring (MRM), as well as for absolute protein quantification7. PeptideAtlas databases have been constructed for the human8, human plasma9, fly10, and yeast11 proteomes.

Here we report a PA for Halobacterium salinarum NRC-1, an obligate halophilic archaeon that has evolved unique adaptations, such as increased negative surface charge on folded proteins, for survival in its extreme environment of 4.5 M salt12. H. salinarum NRC-1 has a completely sequenced and easily manipulable genome and has therefore been used as a model system for constructing a predictive model of cellular responses13 to a diverse array of routine and stressful environmental changes2, 4, 14-17. The PA integrates and re-processes data from the wide array of proteomics experiments conducted in these environmental response studies (surveys of fractionated proteomes, enrichment of complexes by immunoprecipitation, and ICAT- and iTRAQ-based quantitative analysis of proteomic changes). This exercise has verified the expression of 63% of the predicted proteome of H. salinarum NRC-1, including previously undetected and potentially new members of a diverse array of physiological processes. Through extensive analysis of peptides in the PA in the context of function, biophysical properties, and abundance, we have identified several factors that might have contributed to our inability to detect 37% of the proteome. Notably, by demonstrating a significant correlation between absolute abundances of proteins and transcripts, we have identified low protein abundance as the main limiting factor in peptide detection by mass spectrometry. We have also compared all of the proteomics approaches that contributed data to the PA in order to craft strategies for improving coverage and for using proteotypic reference peptides in targeted proteomics.

MATERIALS AND METHODS

Cell culture conditions, protein preparation and mass spectrometric conditions

All details regarding cell culturing, protein preparation, and mass spectrometry conditions are given in the corresponding publications on H. salinarum NRC-1 for each of the mass spectrometry proteomics methods included in the PA4, 17-20. These methods include iTRAQ, ICAT, cell fractionation, enrichment by immunoprecipitation, and extraction of proteins from gel bands. However, to aid clarity in the present study, we have summarized the pertinent details of these procedures in Table 1.

Table 1.

Culturing conditions for all proteomics experiments included in the H. salinarum PeptideAtlas. (See also references 4, 15, 28, 29, 31)

Proteomics method: Fractionation
Biological purpose: To detect which proteins are enriched in the membrane vs. the cytoplasmic fractions
Data source: ref. 19
Strain: Halobacterium salinarum NRC-1
Culture conditions and perturbation: Standard conditionsᵃ
Protein extraction: Lysed by osmotic shock. Lysates treated with nuclease and PMSF, clarified by centrifugation.
Fractionation method: 2× ultracentrifugation at 53,000 × g over a 30% sucrose gradient cushion
Protein digestion: 100 µg protein digested with trypsin at 37 °C overnight
Labeling: N/A
Mass spectrometry: μLC-ESI-MS/MS on an LCQ DECA
Peptide-to-protein matching: SEQUESTᵉ
Data analysis: INTERACTᵈ, PeptideProphet, ProteinProphetᶠ

Proteomics method: Gel bands
Biological purpose: To identify proteins that form complexes with the bacteriorhodopsin regulator (Bat)
Data source: Facciotti and Vuthoori, unpublished data
Strain: Halobacterium salinarum NRC-1 with Bat tagged at the N terminus with myc
Culture conditions and perturbation: Standard conditions, cross-linked with 1.2% formaldehyde
Protein extraction: Lysed by sonication, treated with protease inhibitors (Roche, Switzerland), clarified by centrifugation
Fractionation method: Sample simplified on Ni2+-NTA resin, run on a 10% polyacrylamide gel; gel bands extracted
Protein digestion: In-gel digestion with trypsin
Labeling: N/A
Mass spectrometry: μLC-ESI-MS/MSᶜ
Peptide-to-protein matching: SEQUEST
Data analysis: PeptideProphet and ProteinProphet

Proteomics method: IP
Biological purpose: To delineate the protein-protein interaction network of the 13 general transcription factors
Data source: ref. 18
Strains: H. salinarum NRC-1 with each of the 13 general transcription factors tagged at the C terminus with myc
Culture conditions and perturbation: Standard conditions
Protein extraction: Lysed using a microfluidizer, treated with nuclease and PMSF, clarified by centrifugation
Fractionation method: Immunoprecipitation with sepharose-bound IgG and mouse anti-myc antibody
Protein digestion: Trypsin
Labeling: N/A
Mass spectrometry: μLC-ESI-MS/MS
Peptide-to-protein matching: SEQUEST
Data analysis: PeptideProphet and ProteinProphet

Proteomics method: ICAT
Biological purpose: To quantify proteome differences between cells with and without phototrophic ability
Data source: ref. 17
Strains: H. salinarum NRC-1 with overexpressed (bat+; S9) or absent (bat-; SD23) bacteriorhodopsin
Culture conditions and perturbation: Genetic perturbation, standard culture conditions
Protein extraction: As describedᵇ
Fractionation method: Cation exchange
Protein digestion: Trypsin (AFTER labeling)
Labeling: As describedᵈ, with modifications. Total protein (2.5 mg) was denatured with 6 M urea and 0.05% SDS and immediately reduced with 5 mM tributylphosphine. Cysteine residues were selectively labeled with a 2-fold molar excess of either light (d0; bat-) or heavy (d8; bat+) ICAT reagent (ABI). d0- and d8-ICAT-labeled proteins were mixed in a 1:1 ratio.
Mass spectrometry: μLC-ESI-MS/MS
Peptide-to-protein matching: SEQUEST
Data analysis: EXPRESS softwareᵈ

Proteomics method: iTRAQ (1)
Biological purpose: To quantify protein expression changes occurring over time in cells exposed to gamma irradiation
Data source: ref. 4
Strain: H. salinarum NRC-1
Culture conditions and perturbation: Cell cultures (OD600 = 0.4) were exposed to 2,500 Gy of gamma rays. Irradiated and control cultures recovered at 42 °C with 220 rpm shaking, during which samples were removed at various time points for protein extraction.
Protein extraction: Lysed by osmotic shock. Lysates treated with nuclease and PMSF, clarified by centrifugation. Proteins were acetone precipitated.
Fractionation method: Cation exchange desalting, then HPLC
Protein digestion: Trypsin (BEFORE labeling)
Labeling: iTRAQ labeling was conducted according to the manufacturer's instructions (ABI). Reference samples were labeled with the 114 Da reagent, whereas irradiated time-point samples were labeled with the 115, 116, or 117 Da reagent.
Mass spectrometry: LC-MS/MS using an Applied Biosystems API QSTAR Pulsar i with an in-house nanospray device
Peptide-to-protein matching: COMET, SEQUEST
Data analysis: PeptideProphet, ProteinProphet, Libraᶠ

Proteomics method: iTRAQ (2)
Biological purpose: To quantify protein expression changes occurring over time in cells exposed to aerobic vs. anaerobic conditions
Data source: ref. 20
Strain: H. salinarum NRC-1
Culture conditions and perturbation: Dissolved oxygen levels were varied between low (0–0.5% saturation) and high (80–100% saturation) in a turbidostat culture (OD 0.6) over the course of 14 hours. Samples were removed at various time points for protein extraction.
Protein extraction: Lysed by osmotic shock. Lysates treated with nuclease and PMSF, clarified by centrifugation. Proteins were acetone precipitated.
Fractionation method: Cation exchange desalting, then HPLC
Protein digestion: Trypsin (BEFORE labeling)
Labeling: Reference samples were labeled with the 114 Da reagent, whereas oxygen-treated time-point samples were labeled with the 115, 116, or 117 Da reagent.
Mass spectrometry: LC-MS/MS using an Applied Biosystems API QSTAR Pulsar i with an in-house nanospray device
Peptide-to-protein matching: COMET, SEQUEST
Data analysis: PeptideProphet, ProteinProphet, Libra
a

37 °C, 225 rpm shaking, complex medium (CM; 250 g/L NaCl, 20 g/L MgSO4·7H2O, 3 g/L sodium citrate, 2 g/L KCl, 10 g/L peptone), broad-spectrum white light, grown to stationary phase (OD600 ∼1.0–2.0). Yang, C. F., Kim, J. M., Molinari, E. & DasSarma, S. (1996) J. Bacteriol. 178, 840–845.

b

(13 DasSarma, S. & Fleischmann, E. M. (1995) Halophiles (Cold Spring Harbor Lab. Press, Plainview, NY).

c

Yi EC, Lee H, Aebersold R, Goodlett DR (2003) Rapid Commun Mass Spectrom 17:2093−2098.

d

Han, D. K., Eng, J., Zhou, H. & Aebersold, R. (2001) Nat. Biotechnol. 19, 946−951.

e

Eng, J.K., McCormack, A.L. & Yates, J.R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Amer. Soc. Mass Spectrom. 5, 976−989 (1994).

f

Keller A, Nesvizhskii AI, Kolker E, Aebersold R (2002) Anal Chem 74:5383−5392; Nesvizhskii, A. I., Keller, A., Kolker, E. & Aebersold, R. (2003) Anal Chem 75, 4646−58.

g

Pedrioli, P. G., et al.(2004) Nat Biotechnol 22, 1459−66.

PeptideAtlas build summary

A PA is created by identifying the peptides in MS/MS spectra, calculating the genomic coordinates of the peptides, and storing the datasets and derived information in a database for subsequent data mining10. The H. salinarum NRC-1 PA was constructed from 88 experiments [immunoprecipitation (IP), quantitative proteomic analysis using isotopic reagents (ICAT and iTRAQ), and proteome surveys via fractionation into soluble and membrane fractions] comprising a total of 636,000 MS/MS spectra acquired on mass spectrometers from multiple vendors [Sciex QStar (Applied Biosystems, Foster City, CA), Micromass QTOF (Waters, Milford, MA) and LCQ (ThermoFinnigan, Waltham, MA)] (Table 2). For each experiment, the vendor-format MS/MS spectra were converted to mzXML format21 and assigned to peptides by SEQUEST22 searches against the complete set of H. salinarum NRC-1 protein sequences derived from the original genome annotation12 and from the National Center for Biotechnology Information (NCBI) and SwissProt sequence databases. The peptide identifications were scored using PeptideProphet23 and filtered to retain only those with P ≥ 0.9, which corresponds to a spectrum identification false discovery rate of 1.1%. After all experiments were processed, the peptides were aligned to the reference proteome. The chromosomal coordinates of peptides from this analysis were verified against NCBI's Generic Feature Format (GFF) files and against manually curated data maintained at ISB (http://baliga.systemsbiology.net/halobacterium) in the Systems Biology Experimental Analysis Management System (SBEAMS), a relational database management system (RDBMS) (http://www.sbeams.org) that was also used to archive all of the PA results.

We generated a complete library of tryptic peptides by performing an in silico digest of the entire H. salinarum NRC-1 predicted proteome, allowing for one missed cleavage. A measure of peptide observability, the Empirical Observability Score (EOS) (E. Deutsch, personal communication), was calculated for each peptide as

EOS = N_samples(peptide) / N_samples(protein),

where N_samples(peptide) is the number of samples in which the peptide was detected and N_samples(protein) is the number of samples in which its parent protein was detected. For example, if a protein was seen in 10 different samples, and one of its constituent peptides was seen in 5 of those samples, the EOS of that peptide would be 0.5.
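The two library-building steps above, an in silico digest allowing one missed cleavage and the EOS ratio, can be sketched in a few lines of Python. This is a minimal illustration, not the PeptideAtlas pipeline. It assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P), because the text does not state which rule was used. The function names and the example peptide labels are hypothetical.

```python
import re
from collections import defaultdict

# Assumed cleavage rule: trypsin cuts C-terminal to K or R unless followed by P.
TRYPTIC_SITE = re.compile(r"[KR](?!P)")


def tryptic_digest(protein, missed_cleavages=1):
    """Return tryptic peptides containing up to `missed_cleavages` missed sites."""
    cuts = [m.end() for m in TRYPTIC_SITE.finditer(protein) if m.end() < len(protein)]
    bounds = [0] + cuts + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to missed_cleavages + 1 consecutive fully cleaved fragments.
        for k in range(missed_cleavages + 1):
            j = i + k + 1
            if j >= len(bounds):
                break
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides


def empirical_observability(detections, peptide_to_protein):
    """EOS = N_samples(peptide) / N_samples(protein).

    detections: iterable of (sample_id, peptide) pairs that passed P >= 0.9.
    peptide_to_protein: maps each peptide to its parent protein.
    """
    peptide_samples = defaultdict(set)
    protein_samples = defaultdict(set)
    for sample, peptide in detections:
        peptide_samples[peptide].add(sample)
        protein_samples[peptide_to_protein[peptide]].add(sample)
    return {
        peptide: len(samples) / len(protein_samples[peptide_to_protein[peptide]])
        for peptide, samples in peptide_samples.items()
    }


# Worked example from the text: protein seen in 10 samples, one peptide in 5.
detections = [(f"s{i}", "PEPTIDEA") for i in range(5)]
detections += [(f"s{i}", "PEPTIDEB") for i in range(10)]
eos = empirical_observability(detections, {"PEPTIDEA": "P1", "PEPTIDEB": "P1"})
print(eos["PEPTIDEA"])  # 0.5
print(tryptic_digest("MKAKPR"))  # ['MK', 'MKAKPR', 'AKPR']
```

In the toy sequence MKAKPR, the K before P is not cleaved, so only one internal site remains and the one-missed-cleavage peptide MKAKPR is the whole sequence.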

Table 2.

Summary of MS/MS spectra, proteins and peptides included in the H. salinarum NRC-1 PeptideAtlas.

Number of experimentsᵃ 88
Number of MS runs 497
Number of MS/MS spectra 636,300
Number of MS/MS spectra searchedᵇ 539,950
Number of MS/MS spectra at or above the P = 0.9 threshold 76,212
Number of distinct peptides in P ≥ 0.9 PeptideAtlas 12,316
Number of distinct peptides aligned to reference genome 11,960
Number of proteins or ORFs in reference genome 2,627
Number of proteins or ORFs detected in P ≥ 0.9 PeptideAtlas 1,646 (62.7%)
a

includes experiments of all types as listed in Figure 2.

b

∼96,000 poor-quality spectra were excluded from SEQUEST search.

Calculation of mRNA/protein abundance correlation

To calculate transcript abundance for each of the 1,646 genes whose cognate proteins were detected in the PA (Table 2), we computed the arithmetic mean intensity for that gene across 215 microarray conditions (Supplementary Table ST-1) and then log10-transformed these means. Cultures prepared for these microarray experiments were treated identically to those used for the proteomics experiments included in the PA (Table 1, Supplementary Table ST-1); conditions included gamma radiation stress4, UV radiation24, oxygen transitions20, and genetic knockouts17, 18, 24. To calculate sequence coverage per protein (Fig. 3A), the number of amino acids in each peptide corresponding to a given protein was summed and then divided by the total number of amino acids in that protein. If peptides were detected with partially overlapping amino acid sequences, each residue in the overlapping region was counted only once. To calculate the spectral counts (Fig. 3B), we computed the arithmetic mean of the number of spectra counted per protein that corresponded to peptides with a confidence value of P ≥ 0.9. To calculate the concordance between transcript abundance and cumulative proteome coverage, the average mRNA signal intensities for each gene were organized into 100 bins with 100 intensities per bin (i.e., bin 1 = intensity 0−99; bin 2 = 100−199; etc.) (Fig. 3C). The total range of intensities for this analysis was 0 to 50,000. Cumulative proteome coverage was calculated by adding the total number of proteins detected per transcript intensity bin as each successively higher bin was added to the analysis. The p-value of the correlation between mRNA and protein abundance was computed as the fraction of randomly permuted pairings of mRNA and protein levels whose correlation coefficient was greater than or equal to the reported (unpermuted) correlation.
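The per-gene calculations above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. Coverage marks every occurrence of a peptide in its protein and counts overlapping residues once. The bin width (100 intensity units) and upper limit (50,000) are parameters taken from the text. The cumulative curve is normalized by a caller-supplied count of predicted proteins, which is an assumption about how the percentage axis in Fig. 3C was scaled. All function names are hypothetical.

```python
import math


def log_mean_intensity(intensities):
    """Arithmetic mean of one gene's intensities across arrays, then log10."""
    return math.log10(sum(intensities) / len(intensities))


def sequence_coverage(protein, peptides):
    """Fraction of residues covered by detected peptides (overlaps counted once)."""
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


def cumulative_coverage(mean_intensity, detected, n_predicted,
                        bin_width=100, max_intensity=50_000):
    """Cumulative fraction of predicted proteins detected as intensity bins are added.

    mean_intensity: {gene: mean mRNA intensity (untransformed)}
    detected: set of genes whose proteins are in the PA
    Returns a list of (bin_upper_edge, proteins_in_bin, cumulative_fraction).
    """
    n_bins = max_intensity // bin_width
    counts = [0] * n_bins
    for gene, value in mean_intensity.items():
        if gene in detected and 0 <= value < max_intensity:
            counts[int(value // bin_width)] += 1
    result, running = [], 0
    for b, count in enumerate(counts):
        running += count
        result.append(((b + 1) * bin_width, count, running / n_predicted))
    return result


print(sequence_coverage("ABCDEFGHIJ", ["ABCD", "CDEF"]))  # 0.6
```

In the example, the peptides ABCD and CDEF overlap at C and D, so six distinct residues out of ten are covered rather than eight.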

Figure 3. Significant correlation between absolute mRNA and protein abundance. (A) Concordance between transcript abundance and per-protein sequence coverage.


Comparison of peptide coverage per protein (Y-axis) and transcript abundance (X-axis), each calculated as described in Materials and Methods. Each point on the scatterplot corresponds to one of the 1,646 genes whose proteins were detected in the PA. The Spearman correlation coefficient between the two datasets is shown on the graph (Rs = 0.511; P < 10⁻⁷), and the bold grey line represents the correlation squared (R²). (B) Concordance between transcript abundance and per-protein spectral counts. The arithmetic mean of the number of spectra counted per protein (Y-axis; Materials and Methods) was plotted as a function of each protein's cognate transcript abundance (X-axis). (C) Concordance between transcript abundance and proteome coverage. Average mRNA signal intensities for each gene are organized into 100 bins with 100 intensities per bin (i.e., bin 1 = intensity 0−99; bin 2 = 100−199; etc.). Note that the first three bins are empty (i.e., transcripts with low intensities were detected neither at the mRNA nor at the protein level). Cumulative proteome coverage (black connected squares; right-hand Y-axis) was calculated by adding the total number of proteins detected per transcript intensity bin as each successively higher bin was added to the analysis. Although the analysis was carried out to intensities of 50,000, for brevity the graph is terminated at 10,000 on the X-axis and at 60% cumulative coverage on the right-hand Y-axis, since only ∼3% additional detections were observed between intensities of 10,000 and 50,000. Total protein count (grey vertical bars; left-hand Y-axis) is represented by the height of each bar, denoting the total number of proteins detected per bin.

RESULTS AND DISCUSSION

Construction of the H. salinarum NRC-1 PeptideAtlas

A total of 636,000 tandem mass (MS/MS) spectra from 88 proteomic experiments comprising 497 individual runs, representing at least three types of approaches and three types of mass spectrometers (Materials and Methods), were converted to a common file format (mzXML) (Table 2). Using SEQUEST and PeptideProphet, 76,212 MS/MS spectra (∼12% of all MS/MS spectra) had significant matches (P ≥ 0.9) to peptides from 1,646 predicted proteins in H. salinarum NRC-1 (Table 2), corresponding to a false discovery rate of 1.1%. This represents 1,461 non-redundant proteins or 63% of the predicted proteome, a 1.7-fold improvement in coverage over a previous report that used a two-dimensional separation approach for protein cataloguing25. To facilitate further analysis, the PA module has been integrated with the H. salinarum NRC-1 protein annotation module in SBEAMS, a relational database system for managing systems biology data (http://baliga.systemsbiology.net/halobacterium/).

Physiological functions represented in the H. salinarum NRC-1 PA

Although gene-finding algorithms such as GLIMMER can identify protein-coding genes with relatively low error rates26, these genes are considered putative until verified experimentally. This is especially important considering that over a third of all genes predicted in almost all completely sequenced genomes do not match experimentally characterized orthologs. The identification of a peptide verifies the expression of the parent protein predicted from the genome sequence. As such, we have verified the expression of 1,461 non-redundant proteins predicted in the H. salinarum genome. Of these, 1,029 proteins (9,330 peptides) had significant matches to PFAM signatures (e-value < 0.001)27; 1,157 proteins (10,490 peptides) had significant matches to Clusters of Orthologous Groups (COGs) (e-value < 0.001)28; 902 proteins (9,012 peptides) matched manually curated functional annotations12, 29; and 838 proteins (12,410 peptides) mapped to distinct enzymatic steps within 77 metabolic pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG)30. In summary, we have experimentally verified the expression of at least 989 proteins (37.6% of the H. salinarum NRC-1 proteome) with some putative functional annotation, a 2.3-fold improvement over the 16.2% verified in a previous proteomic survey25. More importantly, we have verified the expression of at least 300 proteins with no significant matches to experimentally characterized proteins. Below we provide some highlights from this analysis.

Detection of essential cellular functions

Consistent with previous high-throughput proteomics studies of H. salinarum NRC-1, we observed a high degree of coverage for proteins involved in essential cellular functions (Table 3, Supplemental Table ST-2; Supplemental Figure SF-1). For example, with regard to genetic information processing, unique peptides were detected from five of the six predicted DNA polymerase proteins or subunits (PolA1 was not detected). We detected unique peptides from 10 of the 12 predicted RNA polymerase (RNAP) subunits. In addition, we detected a putative 7 kDa RNAP subunit, Rpc10 (COG1996), which is not co-transcribed with any of the other known RNAP subunits and was not detected in any previous H. salinarum NRC-1 proteomic survey. With regard to protein synthesis, secretion and degradation, we detected unique peptides from all 55 ribosomal proteins, all 20 aminoacyl-tRNA synthetases, elongation factors EF-1α, EF-1β and EF-2, 11 translation initiation factors, 6 putative Sec-dependent secretion proteins, 5 putative twin-arginine translocation proteins, and five proteases. Also as expected, we detected proteins involved in cellular motility and relocation, including 9 chemotaxis proteins, 8 flagellar proteins, and 11 gas vesicle biogenesis proteins. At least 53 of the 75 predicted membrane ABC transport system subunits were detected.

Table 3.

Functions of H. salinarum NRC-1 PA parent proteins as classified by KEGG.

Categoryᵃ Proteinsᵇ %ᶜ
Energy metabolism 89 100.0 %
Carbohydrate metabolism 120 92.3 %
Lipid metabolism 50 84.7 %
Amino acid metabolism 175 85.4 %
Nucleotide metabolism 85 88.5 %
Metabolism of cofactors & vitamins 62 72.1 %
Transcription 19 65.5 %
Translation 76 95.0 %
Protein processing 9 100.0 %
Replication & repair 4 80.0 %
Membrane transport 54 78.3 %
Secondary metabolite biosynthesis 24 70.6 %
Xenobiotics biodegradation & metabolism 33 75.0 %
Signal transduction 10 52.6 %
Cell motility 9 90.0 %
Metabolism of other amino acids 16 88.9 %
Unknownᵈ 811 49.3 %
Total detected 1646
a

Categories represent first-level annotations from the KEGG database. Hand-curated annotations were excluded for simplicity.

b

Total number of PA parent proteins per KEGG category.

c

Percentages indicate the fraction of proteins detected out of the total predicted by the H. salinarum NRC-1 genome sequence in each KEGG category.

d

Proteins marked “unknown” had no annotations and were not represented in the KEGG database.

Identification of specialized components of H. salinarum physiology

We have detected 20 of the 26 components of the four unique modes of energy production in H. salinarum NRC-1: oxidative phosphorylation, arginine fermentation, phototrophy and dimethyl sulfoxide (DMSO) respiration (Table 3, Supplemental Table ST-2). Although all proteins of the arginine deiminase pathway, including ArcRABC, were detected previously25, here we newly detected two previously elusive components: DmsC of the DMSO respiration pathway and Bat, the regulator of phototrophy (Table 3). Specifically, by enriching for the purple membrane, we detected 183 unique peptides from 13 proteins involved in energy production via bacteriorhodopsin-mediated phototrophy (Table 1: fractionation, ICAT, and bacteriorhodopsin IP gel band extraction experiments)31. Interestingly, in cells that overexpress the purple membrane (Table 1: ICAT experiments), we also detected VNG1459H, a protein of unknown function. VNG1459H co-localizes in the genome with other known phototrophy genes and is significantly co-expressed with them under relevant environmental conditions13, 32, 33. While its exact function remains to be tested, these data support the prediction that this protein may be involved in phototrophy, which would extend a process that was considered well understood.

Fractionation and subsequent solubilization with detergents also improved the detection of other membrane-associated proteins, allowing detection of 188 of the 550 proteins with predicted transmembrane domains19, 34 (Fig. 2D). We have also verified the expression of a large number of transcription factors (TFs) despite their presumed low abundance in the cell: 68% of predicted TFs (88 of 130) are included in the PA, a significant improvement over previous proteomic surveys of H. salinarum NRC-1 and other organisms, which detected at most 44% of all predicted TFs35. For example, we detected several general transcription factors (e.g., TFBa, TFBd, TFBe, TBPc, and TBPd) only upon enrichment by immunoprecipitation (Fig. 1, Table 1).

Figure 2. Characteristics of peptides detected in H. salinarum NRC-1 proteomics experiments included in the PeptideAtlas. (A) Influence of molecular weight (MW) on peptide detection.


A comparison of total predicted (grey bars) vs. total detected (black bars) tryptic peptides indicates the expected peptide detection range of mass spectrometry as a function of peptide size. (B) Influence of charge on detection of peptides. A similar plot of total predicted vs. detected peptides as in (A), but as a function of isoelectric point (pI), shows a bias against the detection of basic peptides. (C) Peptide detection as a function of relative hydrophobicity. Comparison of predicted vs. observed peptides as a function of hydrophobicity46 shows that detection was optimal for peptides with a hydrophobicity of ∼30. (D) Detection of membrane vs. soluble proteins. The membrane association predictions are based on hydropathy plots as in (C). This plot shows that, despite enrichment of membrane proteins in some experiments, there is a significant bias against detection of membrane proteins (∼34% detected) relative to soluble proteins (∼70% detected).

Figure 1. A threshold of peptide detections has been achieved in the H. salinarum NRC-1 PA.


The cumulative number of peptides detected with P ≥ 0.9 is plotted as a function of the cumulative number of MS/MS spectra. The number of new unique peptides increases as more MS/MS spectra are added to the PA; however, the curve appears to have reached a threshold with the five different approaches used thus far. Peptides detected within each type of proteomic experimental approach are color-coded (see legend for details). Numbers in the legend indicate the total numbers of peptides detected uniquely in each experiment type. For example, 978 peptides were detected only in ICAT experiments and not in any of the other proteomic experiments represented in the PA.

Strategies for improving proteome coverage

Despite the unprecedented proteome coverage of the H. salinarum NRC-1 PA, nearly 37% of all predicted proteins were not detected. In fact, a cumulative plot of the number of distinct peptides detected as a function of experiment type shows that we have reached an apparent threshold that was previously predicted6, 8, 11 but not observed until now (Fig. 1). To explore the possibility of improving coverage, we examined the influence of several parameters on proteome detection. Using these metrics as a guide, we discuss below possible strategies for improving detection in high-throughput analysis of the proteome.
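The two quantities behind Fig. 1, the running count of distinct peptides as spectra accumulate and the number of peptides seen in only one experiment type, can be computed as in the sketch below. It assumes each confidently identified spectrum has already been reduced to a single peptide sequence and that peptides have been grouped by experiment type. The function names and toy inputs are hypothetical.

```python
def cumulative_distinct_peptides(peptide_per_spectrum):
    """Running count of distinct peptides as spectra are added in order."""
    seen, curve = set(), []
    for peptide in peptide_per_spectrum:
        seen.add(peptide)
        curve.append(len(seen))
    return curve


def uniquely_detected(peptides_by_type):
    """Count peptides observed in one experiment type and in no other type."""
    counts = {}
    for exp_type, peptides in peptides_by_type.items():
        others = set().union(
            *(p for other, p in peptides_by_type.items() if other != exp_type))
        counts[exp_type] = len(peptides - others)
    return counts


print(cumulative_distinct_peptides(["A", "B", "A", "C"]))  # [1, 2, 2, 3]
```

A flattening of the first curve while spectra keep accumulating is the "threshold" behavior described above. The second function yields per-type counts of the kind reported in the Fig. 1 legend.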

Influence of protein molecular weight on detection

As expected, peptide detection improved with increasing molecular weight up to 1,500 Da (Fig. 2A). This is likely explained by a combination of greater sequence uniqueness with increasing peptide length and the detection limits of the mass spectrometer. Protein size is also an important factor in proteome coverage, since lower molecular weight proteins tend to be underrepresented in total protein surveys36. However, it is noteworthy that we detected at least one peptide from each of 406 (∼42%) of the 963 predicted proteins with calculated molecular weights below 20 kDa, a slight improvement over the 380 proteins detected in a recent study specifically designed to enrich these proteins36.
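A comparison like Fig. 2A needs a mass for every predicted and detected peptide. The sketch below uses standard monoisotopic residue masses for unmodified residues and computes the fraction detected per mass bin. Two choices here are assumptions: the text does not say whether average or monoisotopic masses were used, and the 500 Da bin width is arbitrary. The function names are hypothetical.

```python
from collections import Counter

# Standard monoisotopic residue masses (Da) for unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def detection_by_mass(predicted, detected, bin_da=500):
    """Fraction of predicted peptides detected, keyed by bin lower edge (Da)."""
    totals, hits = Counter(), Counter()
    for peptide in predicted:
        b = int(peptide_mass(peptide) // bin_da) * bin_da
        totals[b] += 1
        if peptide in detected:
            hits[b] += 1
    return {b: hits[b] / totals[b] for b in sorted(totals)}


print(round(peptide_mass("PEPTIDE"), 2))  # 799.36
```

Plotting these fractions across bins gives the kind of detected-vs-predicted profile shown in Fig. 2A.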

Isoelectric point

The isoelectric point of a peptide influences its enrichment, depending on the type of fractionation columns used to enrich peptides (or proteins) during sample preparation. Most proteins in H. salinarum NRC-1 have a relatively large number of acidic residues, and the resulting negative surface charge is believed to help prevent protein aggregation and precipitation in the hypersaline cytoplasm37. Consequently, most peptides in the H. salinarum NRC-1 PA are also acidic, with a median isoelectric point of 4.4 (Fig. 2B). Interestingly, despite the predominant use of cation exchange chromatography for sample processing in most of the experiments in the PA, a significant fraction of basic peptides was not detected.
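A peptide pI of the kind plotted in Fig. 2B can be estimated from the Henderson-Hasselbalch equation by searching for the pH at which the net charge is zero. Net charge decreases monotonically with pH, so bisection works. The sketch below uses an illustrative pKa set, approximately the EMBOSS values. That set is an assumption: the text does not name the pKa values used, and other published sets shift pI estimates somewhat. The function names are hypothetical.

```python
# Illustrative pKa values (approximately the EMBOSS set); an assumption.
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(peptide, ph):
    """Henderson-Hasselbalch estimate of a peptide's net charge at a given pH."""
    charge = 1 / (1 + 10 ** (ph - PKA_N_TERM))
    charge -= 1 / (1 + 10 ** (PKA_C_TERM - ph))
    for aa in peptide:
        if aa in PKA_BASIC:
            charge += 1 / (1 + 10 ** (ph - PKA_BASIC[aa]))
        elif aa in PKA_ACIDIC:
            charge -= 1 / (1 + 10 ** (PKA_ACIDIC[aa] - ph))
    return charge


def isoelectric_point(peptide, tol=1e-4):
    """Bisect for the pH at which net charge is zero (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(peptide, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("DDEEA"), 1))  # an acidic peptide, pI below 4.5
```

Applied to every tryptic peptide in the library, such estimates would reproduce the acidic skew noted above (median pI 4.4), although exact values depend on the pKa set chosen.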

Hydrophobicity

Peptides of very low hydrophobicity were poorly detected. This is expected given the retention properties of the LC columns used in most of our experiments34: low-hydrophobicity peptides are washed off these columns before the mass spectrometer has a chance to analyze them. Also as expected, peptides with hydrophobicity greater than ∼30 were detected at a relatively lower frequency, perhaps due to their low solubility (Fig. 2C).

Solubility

Despite enrichment of membrane proteins in some experiments, this fraction of the proteome is poorly represented in the PA (Fig. 1), as evidenced by the observation that over 90% of all detected peptides originated from proteins predicted to be soluble (Fig. 2D). This bias in detection has been discussed previously4.
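The soluble vs. membrane split in Fig. 2D rests on hydropathy-based predictions. As an illustration only, the sketch below applies the classic Kyte-Doolittle approach: average the hydropathy over a 19-residue sliding window and call a protein membrane-associated if any window reaches 1.6, the commonly cited cutoff for candidate transmembrane segments. Whether the cited predictions (refs. 19, 34) used this scale, window or threshold is not stated, so treat all three as assumptions. Unknown residues score 0, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of each full-length sliding window along the sequence."""
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in sequence]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def has_predicted_tm_segment(sequence, window=19, threshold=1.6):
    """True if any window's mean hydropathy reaches the threshold."""
    return any(h >= threshold for h in hydropathy_profile(sequence, window))


print(has_predicted_tm_segment("M" + "L" * 24 + "KDE"))  # True
```

Sequences shorter than the window produce an empty profile and are therefore classified as soluble.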

Influence of protein abundance on peptide detection

The abundance of a protein in a population can significantly influence the time a mass spectrometer spends analyzing that protein species38. Since there is no independent approach to measure absolute protein abundance on a systems scale, we evaluated mRNA signal intensity from microarray-based transcription profiling experiments as a proxy for protein abundance. First, we investigated whether mRNA and protein abundance were indeed proportional. The dynamic quantitative relationship between transcription and translation can be assessed at the level of absolute abundance or of relative changes in mRNA and protein levels. Although comparisons of relative changes in protein and mRNA concentrations have yielded variable relationships in some studies3, 39-43, we have previously demonstrated that, given sufficient numbers of temporal measurements of both RNA and protein level changes over time scales of minutes, there is a significant time-lagged correlation between relative changes in transcript and protein abundance for most genes2, 16.

However, a significant genome-wide correlation between absolute mRNA and protein abundance has not yet been reported40, 44. To assess this relationship, we compared mRNA signal intensities from 215 microarray experiments4, 20, 24 (Supplementary Table ST-1) with average spectral counts (over all peptides) per protein from 497 mass spectrometry runs. This comparison yielded a significant correlation between the two datasets (Spearman correlation ∼0.5; P < 10⁻⁶), indicating that proteins encoded by more abundant transcripts tend to be more abundant themselves (Fig. 3C). This relationship is not biased by protein length (Supplemental Fig. SF-2). Further, among lower-abundance transcripts, small increases in mRNA signal intensity are accompanied by a dramatic increase in proteome coverage (Fig. 3C). This may be because a large fraction of the transcriptome (>60%) in H. salinarum NRC-1 appears to be present at low abundance (300–1500 intensity units) (Fig. 3C). Regardless, more peptides tend to be detected from proteins whose transcripts are more abundant, as reflected in better sequence coverage and higher spectral counts per protein with increasing mRNA abundance (Fig. 3A, B). We conclude from this analysis that, although targeted enrichment will help detect peptides with certain biophysical properties, approaches that enrich low-abundance proteins, together with more sensitive mass spectrometers, will yield higher proteome coverage.
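For readers who want to reproduce a genome-wide comparison like the one above, Spearman's correlation is simply Pearson's correlation computed on ranks, with tied values assigned their average rank (ties are common among low spectral counts). The sketch below assumes one mRNA intensity and one average spectral count per gene, already matched by gene; it computes only the coefficient, not the P value, and the toy numbers are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-gene values: microarray intensity vs. mean spectral count.
mrna_intensity = [350.0, 900.0, 1200.0, 4000.0, 15000.0]
spectral_count = [0.0, 1.5, 0.0, 6.0, 22.0]
print(round(spearman(mrna_intensity, spectral_count), 3))
```

On Python 3.12 or later, `statistics.correlation(x, y, method="ranked")` should return the same coefficient; the explicit version is shown so the tie handling is visible.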

Strategies for targeted proteomics

As the PA becomes increasingly comprehensive, we can use it to design high-throughput strategies that rapidly characterize the proteome, both qualitatively and quantitatively. A practical way to accomplish this is through “proteotypic peptides”5, 45: peptides that map uniquely to one protein and are likely to be observed by LC-MS/MS if that protein is present. We selected as proteotypic those peptides that (i) received a PeptideProphet probability P > 0.9, (ii) were detected in more than one experiment, and (iii) have an Empirical Observability Score (EOS) > 0.3 (Materials and Methods). These peptides can serve as beacons for tracking specific proteins in high-throughput, targeted analyses of the proteome, greatly reducing mass spectrometer time while improving proteome coverage. Proteotypic peptides can also aid the QCAT approach7, in which a known quantity of an isotope-labeled artificial protein built from concatenated signature peptides is spiked in and used as a reference for absolute quantification of proteins.
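The three selection criteria above, plus the requirement that a peptide map to a single protein, translate directly into a filter. This sketch assumes each peptide record already carries its best PeptideProphet probability, the number of experiments in which it was seen, and a precomputed EOS (the EOS calculation itself is described in Materials and Methods and is not reproduced here). The record layout, the function names, the two placeholder peptides, and all PeptideProphet probabilities and experiment counts are hypothetical; only the two GdhB sequences and their EOS values come from the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class PeptideRecord:
    sequence: str
    protein: str
    prophet_p: float     # best PeptideProphet probability observed
    n_experiments: int   # number of experiments in which the peptide was seen
    eos: float           # Empirical Observability Score (precomputed)
    n_proteins: int = 1  # number of proteins the sequence maps to


def is_proteotypic(rec: PeptideRecord, p_min: float = 0.9, eos_min: float = 0.3) -> bool:
    return (
        rec.n_proteins == 1          # maps uniquely to one protein
        and rec.prophet_p > p_min    # (i) P > 0.9
        and rec.n_experiments > 1    # (ii) seen in more than one experiment
        and rec.eos > eos_min        # (iii) EOS > 0.3
    )


def proteotypic_by_protein(records: Iterable[PeptideRecord]) -> Dict[str, List[PeptideRecord]]:
    """Group passing peptides by protein, highest EOS first."""
    out: Dict[str, List[PeptideRecord]] = {}
    for rec in records:
        if is_proteotypic(rec):
            out.setdefault(rec.protein, []).append(rec)
    for peptides in out.values():
        peptides.sort(key=lambda r: r.eos, reverse=True)
    return out


# HYPOTHETICAL probabilities and experiment counts; placeholder peptides marked as such.
records = [
    PeptideRecord("CAVMDLPFGGAK", "GdhB", prophet_p=0.99, n_experiments=5, eos=0.59),
    PeptideRecord("VVQVSVPVER", "GdhB", prophet_p=0.97, n_experiments=4, eos=0.41),
    PeptideRecord("PLACEHOLDERK", "GdhB", prophet_p=0.95, n_experiments=1, eos=0.35),
    PeptideRecord("SHAREDPLACEHOLDERR", "GdhB", prophet_p=0.99, n_experiments=6,
                  eos=0.50, n_proteins=2),
]
for rec in proteotypic_by_protein(records)["GdhB"]:
    print(rec.sequence, rec.eos)
```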

Using the criteria listed above, we identified proteotypic peptides for 1,505 proteins, or 57.3% of the proteome (Table 4). In other words, we can now, in principle, tune the mass spectrometer to search specifically for 1,505 mass spectra instead of a possible ∼76,212 (Table 4), a 50-fold reduction in mass spectrometer time to obtain information on the same number of proteins. However, selecting proteotypic peptides for practical applications is more complicated, because the PA represents a diverse array of experimental techniques (ICAT, iTRAQ, immunoprecipitation, etc.) designed to address different scientific problems. Each of these approaches could have an inherent bias in peptide detection, as suggested by the observation that a majority of proteins were observed in a small number of runs; for example, 633 (43%) of the 1,461 observed non-redundant proteins were observed in 5 or fewer of the 497 runs (Fig. 4). We conducted a comparative analysis to determine possible biases and efficiencies of peptide detection by each individual approach. We caution that such a comparison can be confounded by the substantial biases in the protein and peptide populations that were intentionally enriched in several of these approaches. For example, there was very little overlap between the cysteine-specific ICAT labeling approach and any other proteomics method (Fig. 5). While genetic and environmental perturbations could partly explain this bias, it is more likely an outcome of the low per-protein cysteine content in H. salinarum NRC-1 (<65%)18.
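The two headline numbers in this paragraph follow from simple ratios of the counts reported above; the last line checks that the rows of Table 4 add up to the 1,505 proteins quoted.

```latex
\[
\frac{76{,}212}{1{,}505} \approx 50.6 \quad\Longrightarrow\quad \text{about 50-fold fewer target spectra}
\]
\[
\frac{633}{1{,}461} \approx 0.433 \approx 43\%
\]
\[
570 + 405 + 202 + 117 + 71 + 34 + 37 + 24 + 13 + 7 + 25 = 1{,}505
\]
```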

Table 4.

Detection of H. salinarum NRC-1 PA proteins by proteotypic peptides.

| Distinct peptides per protein (a) | Distinct proteins represented |
|---|---|
| 1 | 570 |
| 2 | 405 |
| 3 | 202 |
| 4 | 117 |
| 5 | 71 |
| 6 | 34 |
| 7 | 37 |
| 8 | 24 |
| 9 | 13 |
| 10 | 7 |
| >10 | 25 (b) |

(a) ">10" includes all proteins with 11 or more distinct peptides per protein.

(b) 25 represents the sum of distinct proteins represented by 11–56 distinct peptides per protein.

Figure 4. The majority of proteins detected by the H. salinarum NRC-1 PA are observed in a small number of μLC-ESI-MS/MS runs.


The number of μLC-ESI-MS/MS runs is plotted on the X-axis. The cumulative number of proteins detected with the addition of each successive MS run is plotted on the Y-axis.
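A curve like Fig. 4 can be built by accumulating the set of proteins identified as runs are added in order, and the same per-run data give the number of runs in which each protein appears (the basis for statements such as 633 of 1,461 proteins being seen in 5 or fewer runs). In this sketch each run is assumed to be represented simply as a set of protein identifiers; the run order, identifiers, and function names are hypothetical. A different run order changes the shape of the curve but not its end point.

```python
from collections import Counter
from typing import Iterable, List, Set


def cumulative_protein_counts(runs: Iterable[Set[str]]) -> List[int]:
    """Number of distinct proteins seen after each successive run."""
    seen: Set[str] = set()
    curve = []
    for proteins in runs:
        seen |= proteins
        curve.append(len(seen))
    return curve


def runs_per_protein(runs: Iterable[Set[str]]) -> Counter:
    """How many runs each protein was identified in."""
    counts: Counter = Counter()
    for proteins in runs:
        counts.update(proteins)  # +1 for every protein identified in this run
    return counts


# Hypothetical identifications from three runs.
runs = [{"protA", "protB"}, {"protB", "protC"}, {"protA", "protD"}]
print(cumulative_protein_counts(runs))  # [2, 3, 4]
counts = runs_per_protein(runs)
rare = sum(1 for n in counts.values() if n <= 1)
print(f"{rare} of {len(counts)} proteins seen in at most 1 run")
```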

Figure 5. Comparison of all experiments included in the H. salinarum PA.


Numbers in parentheses indicate peptide detections by each approach, numbers in boxes indicate unique detections, and shading indicates the percentage of peptides detected by both approaches. Experiment names refer to those introduced in Table 1.
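The two quantities encoded in Fig. 5, peptides detected by only one approach and the overlap between pairs of approaches, can be computed from per-approach peptide sets. The caption does not state the denominator used for the shading, so this sketch assumes the overlap is expressed as a percentage of the union of the two sets (a Jaccard index × 100) and that "unique" means seen by no other approach; the peptide sets and the function name are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def overlap_summary(
    detections: Dict[str, Set[str]]
) -> Tuple[Dict[str, int], Dict[Tuple[str, str], float]]:
    """Unique detections per approach and pairwise percent overlap (of the union)."""
    unique = {}
    for name, peptides in detections.items():
        others = set().union(*(p for n, p in detections.items() if n != name))
        unique[name] = len(peptides - others)  # seen by this approach only
    shared = {}
    for a, b in combinations(sorted(detections), 2):
        union = detections[a] | detections[b]
        both = detections[a] & detections[b]
        shared[(a, b)] = 100.0 * len(both) / len(union) if union else 0.0
    return unique, shared


# Hypothetical peptide sets for three approaches.
detections = {
    "ICAT": {"p1", "p2"},
    "iTRAQ": {"p2", "p3", "p4"},
    "other": {"p3", "p4", "p5"},
}
unique, shared = overlap_summary(detections)
print(unique)  # {'ICAT': 1, 'iTRAQ': 0, 'other': 1}
print(shared)
```

If the figure instead normalizes by the smaller of the two sets, only the denominator in the `shared` line needs to change.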

Despite these potential biases, we were able to make a fair and statistically meaningful comparison of performance between two of the most information-rich datasets that shared substantial numbers of detected proteins: iTRAQ vs. all other shotgun proteomics methods (Fig. 5, Table 5). Specifically, we considered proteotypic peptides for the 25 proteins observed in the largest number of LC-MS runs in both iTRAQ experiments and all other approaches (data for two proteins are shown in Table 5; data for the remaining 23 proteins are given in Supplemental Table ST-3). Notably, of the 180 proteotypic peptides for these 25 proteins, only 40 were detected reliably by both approaches. A likely explanation is that iTRAQ introduces a significant bias in the types of peptides that are detected. The choice of an appropriate proteotypic peptide will therefore vary with the application, and until the reasons for these biases are better understood, an empirical approach remains the best option for selecting proteotypic peptides. For example, searching for the glutamate dehydrogenase protein GdhB in the H. salinarum NRC-1 PA yields 26 distinct peptides, which were detected a total of 208 times. Of these, the peptide CAVMDLPFGGAK (PA accession: PAp00363211) has an EOS of 0.59, indicating that it was detected reliably. Further inspection shows that this peptide was detected a total of 57 times across iTRAQ and ICAT experiments and is therefore a reliable candidate proteotypic peptide for both of these experimental approaches. However, it was not detected in any of the experiments for targeted enrichment of transcription factors. A second peptide, VVQVSVPVER (PAp00368363), with a lower EOS of 0.41, was detected only 12 times but was observed in at least three of the targeted enrichment experiments. This second peptide would therefore make a better proteotypic peptide for these targeted enrichment applications. Using this approach, one can compile a custom list of proteotypic peptides for a specific application of interest. Further explorations of this type are possible at the H. salinarum NRC-1 proteome annotation webpage (http://baliga.systemsbiology.net/halobacterium).
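The empirical, context-dependent selection described above can be sketched as follows: keep only the peptides of a protein that were actually observed in the experiment type of interest, then prefer the one with the highest EOS. The records below encode only what the text states about the two GdhB peptides (sequence, PA accession, EOS, and the experiment types in which each was reported); the context labels and the function name `pick_for_context` are hypothetical, and a real implementation would draw these data from the PA rather than hard-coding them.

```python
from typing import Dict, List, Optional

# Only facts stated in the text; experiment types not mentioned there are omitted.
# The context labels are HYPOTHETICAL names, not PeptideAtlas fields.
GDHB_PEPTIDES: List[Dict] = [
    {"sequence": "CAVMDLPFGGAK", "accession": "PAp00363211", "eos": 0.59,
     "contexts": {"iTRAQ", "ICAT"}},
    {"sequence": "VVQVSVPVER", "accession": "PAp00368363", "eos": 0.41,
     "contexts": {"TF enrichment"}},
]


def pick_for_context(peptides: List[Dict], context: str) -> Optional[Dict]:
    """Highest-EOS peptide among those observed in the given experiment type."""
    observed = [p for p in peptides if context in p["contexts"]]
    return max(observed, key=lambda p: p["eos"], default=None)


for ctx in ("iTRAQ", "ICAT", "TF enrichment"):
    choice = pick_for_context(GDHB_PEPTIDES, ctx)
    print(ctx, "->", choice["sequence"] if choice else None)
```

Ranking by EOS within a context is one simple choice; ranking by the number of detections within that context would be an equally reasonable alternative when those counts are available.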

Table 5.

Comparison between iTRAQ and non-iTRAQ proteotypic peptides for two proteins.

| Method (a) | Protein ID | Peptide (b) |
|---|---|---|
| iTRAQ | VNG0414G | DIPTVVVER |
| iTRAQ | VNG0414G | EFDDGPAAAVIK |
| iTRAQ | VNG0414G | YGENPHQDAAVYR |
| non-iTRAQ | VNG0414G | EVVVAPGYTDDAVDVLTAK |
| non-iTRAQ | VNG0414G | DNTHAAASVVHADQLNPDAK |
| non-iTRAQ | VNG0414G | HTNPAGCATADTLADAYSDALSTDAK |
| non-iTRAQ | VNG0414G | VLDVGTLDGTPAPVTETPLVGGR |
| iTRAQ | VNG0620G | QSFLDVVMNER |
| iTRAQ | VNG0620G | GPAASGGYYTIAPTDK |
| iTRAQ | VNG0620G | AQIYAGNK |
| iTRAQ | VNG0620G | ASVQNYGVVYR |
| iTRAQ | VNG0620G | VSSPGGAVSGSEVQYR |
| non-iTRAQ | VNG0620G | LAQEKPVVTSVR |
| non-iTRAQ | VNG0620G | AVEIGLADEIGGLDAAIADAADR |
| non-iTRAQ | VNG0620G | VSSPGGAVSGSEVQYR |
| non-iTRAQ | VNG0620G | GPAASGGYYTIAPTDK |
(a) The two proteins VNG0414G and VNG0620G were identified in both iTRAQ and non-iTRAQ experiments, in 71 and 51 LC-MS runs, respectively.

(b) We selected the peptides most frequently observed with each experimental method and note that the two lists are largely distinct (for VNG0620G, two peptides appear in both).

Supplementary Material

Supp. data

ACKNOWLEDGEMENTS

We thank Christopher Bare and David Campbell for help with programming, database construction and false discovery rate calculation, and Ning Zhang for help with generating proteotypic peptide scores. This work was supported by grants from NIH (P50GM076547 and 1R01GM077398-01A2), DoE (MAGGIE: DE-FG02-07ER64327), NSF (EF-0313754, EIA-0220153, MCB-0425825, DBI-0640950) and NASA (NNG05GN58G), and by Institute for Systems Biology institutional support to NSB. Postdoctoral fellowships from NSF to MTF (DBI 0400598) and KW (0443746), and from NIH (5F32GM078980-02) to AKS, are acknowledged.

Footnotes

SUPPORTING INFORMATION AVAILABLE

A comparative analysis revealed possible similarities between spectral counts and genome organization, a finding discussed in the Supporting Information. Supporting figures (Figure SF-1, Figure SF-2) and Supplementary Tables (ST1 and ST2) are freely available online at http://pubs.acs.org.

REFERENCES

1. Facciotti MT, Bonneau R, Hood L, Baliga NS. Systems biology experimental design--considerations for building predictive gene regulatory network models for prokaryotic systems. Current Genomics. 2004;5:527–544.
2. Kaur A, Pan M, Meislin M, Facciotti MT, El-Geweley R, Baliga NS. A systems view of haloarchaeal strategies to withstand stress from transition metals. Genome Res. 2006;16:841–854. doi: 10.1101/gr.5189606.
3. Gygi SP, Rochon Y, Franza BR, Aebersold R. Correlation between protein and mRNA abundance in yeast. Mol Cell Biol. 1999;19:1720–1730. doi: 10.1128/mcb.19.3.1720.
4. Whitehead K, Kish A, Pan M, Kaur A, Reiss DJ, King N, Hohmann L, Diruggiero J, Baliga NS. An integrated systems approach for understanding cellular responses to gamma radiation. Mol Syst Biol. 2006;2:47. doi: 10.1038/msb4100091.
5. Mallick P, Schirle M, Chen SS, Flory MR, Lee H, Martin D, Ranish J, Raught B, Schmitt R, Werner T, Kuster B, Aebersold R. Computational prediction of proteotypic peptides for quantitative proteomics. Nat Biotechnol. 2007;25:125–131. doi: 10.1038/nbt1275.
6. Desiere F, Deutsch EW, King NL, Nesvizhskii AI, Mallick P, Eng J, Chen S, Eddes J, Loevenich SN, Aebersold R. The PeptideAtlas project. Nucleic Acids Res. 2006;34:D655–658. doi: 10.1093/nar/gkj040.
7. Beynon RJ, Doherty MK, Pratt JM, Gaskell SJ. Multiplexed absolute quantification in proteomics using artificial QCAT proteins of concatenated signature peptides. Nat Methods. 2005;2:587–589. doi: 10.1038/nmeth774.
8. Desiere F, Deutsch EW, Nesvizhskii AI, Mallick P, King NL, Eng JK, Aderem A, Boyle R, Brunner E, Donohoe S, Fausto N, Hafen E, Hood L, Katze MG, Kennedy KA, Kregenow F, Lee H, Lin B, Martin D, Ranish JA, Rawlings DJ, Samelson LE, Shiio Y, Watts JD, Wollscheid B, Wright ME, Yan W, Yang L, Yi EC, Zhang H, Aebersold R. Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry. Genome Biol. 2005;6:R9. doi: 10.1186/gb-2004-6-1-r9.
9. Deutsch EW, Eng JK, Zhang H, King NL, Nesvizhskii AI, Lin B, Lee H, Yi EC, Ossola R, Aebersold R. Human Plasma PeptideAtlas. Proteomics. 2005;5:3497–3500. doi: 10.1002/pmic.200500160.
10. Desiere F, Deutsch EW, Nesvizhskii AI, Mallick P, Eng JK, Aderem A, Boyle R, Brunner E, Donohoe S, Fausto N, Hafen E, Hood L, Katze MG, Kennedy K, Kregenow F, Lee H, Lin B, Martin D, Ranish J, Rawlings DJ, Samelson LE, Shiio Y, Watts J, Wollscheid B, Wright ME, Yan W, Yang L, Yi E, Zhang H, Aebersold R. Integration of Peptide Sequences Obtained by High-Throughput Mass Spectrometry with the Human Genome. Genome Biology. 2004;5:R9. doi: 10.1186/gb-2004-6-1-r9.
11. King NL, Deutsch EW, Ranish JA, Nesvizhskii AI, Eddes JS, Mallick P, Eng J, Desiere F, Flory M, Martin DB, Kim B, Lee H, Raught B, Aebersold R. Analysis of the Saccharomyces cerevisiae proteome with PeptideAtlas. Genome Biol. 2006;7:R106. doi: 10.1186/gb-2006-7-11-r106.
12. Ng WV, Kennedy SP, Mahairas GG, Berquist B, Pan M, Shukla HD, Lasky SR, Baliga NS, Thorsson V, Sbrogna J, Swartzell S, Weir D, Hall J, Dahl TA, Welti R, Goo YA, Leithauser B, Keller K, Cruz R, Danson MJ, Hough DW, Maddocks DG, Jablonski PE, Krebs MP, Angevine CM, Dale H, Isenbarger TA, Peck RF, Pohlschroder M, Spudich JL, Jung KW, Alam M, Freitas T, Hou S, Daniels CJ, Dennis PP, Omer AD, Ebhardt H, Lowe TM, Liang P, Riley M, Hood L, DasSarma S. Genome sequence of Halobacterium species NRC-1. Proc Natl Acad Sci U S A. 2000;97:12176–12181. doi: 10.1073/pnas.190337797.
13. Bonneau R, Facciotti MT, Reiss DJ, Schmid AK, Pan M, Kaur A, Thorsson V, Shannon P, Johnson MH, Bare JC, Longabaugh W, Vuthoori M, Whitehead K, Madar A, Suzuki L, Mori T, Chang DE, Diruggiero J, Johnson CH, Hood L, Baliga NS. A predictive model for transcriptional control of physiology in a free living cell. Cell. 2007;131:1354–1365. doi: 10.1016/j.cell.2007.10.053.
14. Baliga NS, Pan M, Goo YA, Yi EC, Goodlett DR, Dimitrov K, Shannon P, Aebersold R, Ng WV, Hood L. Coordinate regulation of energy transduction modules in Halobacterium sp. analyzed by a global systems approach. Proc Natl Acad Sci U S A. 2002;99:14913–14918. doi: 10.1073/pnas.192558999.
15. Baliga NS, Bjork SJ, Bonneau R, Pan M, Iloanusi C, Kottemann MCH, Hood L, DiRuggiero J. Systems Level Insights Into the Stress Response to UV Radiation in the Halophilic Archaeon Halobacterium NRC-1. Genome Res. 2004;14:1025–1035. doi: 10.1101/gr.1993504.
16. Schmid AK, Reiss DJ, Kaur A, Pan M, King N, Van PT, Hohmann L, Martin DB, Baliga NS. The anatomy of microbial cell state transitions in response to oxygen. Genome Res. 2007;17:1399–1413. doi: 10.1101/gr.6728007.
17. Facciotti MT, Reiss DJ, Pan M, Kaur A, Vuthoori M, Bonneau R, Shannon P, Srivastava A, Donohoe SM, Hood LE, Baliga NS. General transcription factor specified global gene regulation in archaea. Proc Natl Acad Sci U S A. 2007;104:4630–4635. doi: 10.1073/pnas.0611663104.
18. Baliga NS, Pan M, Goo YA, Yi EC, Goodlett DR, Dimitrov K, Shannon P, Aebersold R, Ng WV, Hood L. Coordinate regulation of energy transduction modules in Halobacterium sp. analyzed by a global systems approach. Proc Natl Acad Sci U S A. 2002;99:14913–14918. doi: 10.1073/pnas.192558999.
19. Goo YA, Yi EC, Baliga NS, Tao WA, Pan M, Aebersold R, Goodlett DR, Hood L, Ng WV. Proteomic Analysis of an Extreme Halophilic Archaeon, Halobacterium sp. NRC-1. Mol Cell Proteomics. 2003;2:506–524. doi: 10.1074/mcp.M300044-MCP200.
20. Schmid AK, Reiss DJ, Kaur A, Pan M, King N, Van PT, Hohmann L, Martin DB, Baliga NS. The anatomy of microbial cell state transitions in response to oxygen. Genome Res. 2007;17:1399–1413. doi: 10.1101/gr.6728007.
21. Pedrioli PG, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, Pratt B, Nilsson E, Angeletti RH, Apweiler R, Cheung K, Costello CE, Hermjakob H, Huang S, Julian RK, Kapp E, McComb ME, Oliver SG, Omenn G, Paton NW, Simpson R, Smith R, Taylor CF, Zhu W, Aebersold R. A common open representation of mass spectrometry data and its application to proteomics research. Nat Biotechnol. 2004;22:1459–1466. doi: 10.1038/nbt1031.
22. Eng JK, McCormack AL, Yates JR 3rd. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom. 1994;5:976–989. doi: 10.1016/1044-0305(94)80016-2.
23. Keller A, Eng J, Zhang N, Li XJ, Aebersold R. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol. 2005;1:2005.0017. doi: 10.1038/msb4100024.
24. Baliga NS, Bjork SJ, Bonneau R, Pan M, Iloanusi C, Kottemann MC, Hood L, DiRuggiero J. Systems level insights into the stress response to UV radiation in the halophilic archaeon Halobacterium NRC-1. Genome Res. 2004;14:1025–1035. doi: 10.1101/gr.1993504.
25. Gan RR, Yi EC, Chiu Y, Lee H, Kao YC, Wu TH, Aebersold R, Goodlett DR, Ng WV. Proteome analysis of Halobacterium sp. NRC-1 facilitated by the biomodule analysis tool BMSorter. Mol Cell Proteomics. 2006;5:987–997. doi: 10.1074/mcp.M500367-MCP200.
26. Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23:673–679. doi: 10.1093/bioinformatics/btm009.
27. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A. Pfam: clans, web tools and services. Nucleic Acids Res. 2006;34:D247–251. doi: 10.1093/nar/gkj149.
28. Tatusov RL, Galperin MY, Natale DA, Koonin EV. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000;28:33–36. doi: 10.1093/nar/28.1.33.
29. Bonneau R, Baliga NS, Deutsch EW, Shannon P, Hood L. Comprehensive de novo structure prediction in a systems-biology context for the archaea Halobacterium sp. NRC-1. Genome Biol. 2004;5:R52. doi: 10.1186/gb-2004-5-8-r52.
30. Kanehisa M. The KEGG database. Novartis Found Symp. 2002;247:91–101. discussion 101−103, 119−128, 244−152.
31. Hartmann R, Sickinger HD, Oesterhelt D. Anaerobic growth of halobacteria. Proc Natl Acad Sci U S A. 1980;77:3821–3825. doi: 10.1073/pnas.77.7.3821.
32. Bonneau R, Reiss DJ, Shannon P, Facciotti M, Hood L, Baliga NS, Thorsson V. The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol. 2006;7:R36. doi: 10.1186/gb-2006-7-5-r36.
33. Reiss DJ, Baliga NS, Bonneau R. Integrated biclustering of heterogeneous genome-wide datasets for the inference of global regulatory networks. BMC Bioinformatics. 2006;7:280. doi: 10.1186/1471-2105-7-280.
34. Schindler PA, Van Dorsselaer A, Falick AM. Analysis of hydrophobic proteins and peptides by electrospray ionization mass spectrometry. Anal Biochem. 1993;213:256–263. doi: 10.1006/abio.1993.1418.
35. Lipton MS, Pasa-Tolic L, Anderson GA, Anderson DJ, Auberry DL, Battista JR, Daly MJ, Fredrickson J, Hixson KK, Kostandarithes H, Masselon C, Markillie LM, Moore RJ, Romine MF, Shen Y, Stritmatter E, Tolic N, Udseth HR, Venkateswaran A, Wong KK, Zhao R, Smith RD. Global analysis of the Deinococcus radiodurans proteome by using accurate mass tags. Proc Natl Acad Sci U S A. 2002;99:11049–11054. doi: 10.1073/pnas.172170199.
36. Klein C, Aivaliotis M, Olsen JV, Falb M, Besir H, Scheffer B, Bisle B, Tebbe A, Konstantinidis K, Siedler F, Pfeiffer F, Mann M, Oesterhelt D. The Low Molecular Weight Proteome of Halobacterium salinarum. J Proteome Res. 2007;6:1510–1518. doi: 10.1021/pr060634q.
37. Kennedy SP, Ng WV, Salzberg SL, Hood L, DasSarma S. Understanding the Adaptation of Halobacterium Species NRC-1 to Its Extreme Environment through Computational Analysis of Its Genome Sequence. Genome Res. 2001;11:1641–1650. doi: 10.1101/gr.190201.
38. Zhang H, Li XJ, Martin DB, Aebersold R. Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nat Biotechnol. 2003;21:660–666. doi: 10.1038/nbt827.
39. Bitton DA, Okoniewski MJ, Connolly Y, Miller CJ. Exon level integration of proteomics and microarray data. BMC Bioinformatics. 2008;9:118. doi: 10.1186/1471-2105-9-118.
40. Cox B, Kislinger T, Wigle DA, Kannan A, Brown K, Okubo T, Hogan B, Jurisica I, Frey B, Rossant J, Emili A. Integrated proteomic and transcriptomic profiling of mouse lung development and Nmyc target genes. Mol Syst Biol. 2007;3:109. doi: 10.1038/msb4100151.
41. Flory MR, Lee H, Bonneau R, Mallick P, Serikawa K, Morris DR, Aebersold R. Quantitative proteomic analysis of the budding yeast cell cycle using acid-cleavable isotope-coded affinity tag reagents. Proteomics. 2006;6:6146–6157. doi: 10.1002/pmic.200600159.
42. Washburn MP, Koller A, Oshiro G, Ulaszek RR, Plouffe D, Deciu C, Winzeler E, Yates JR 3rd. Protein pathway and complex clustering of correlated mRNA and protein expression analyses in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 2003;100:3107–3112. doi: 10.1073/pnas.0634629100.
43. Schmidt MW, Houseman A, Ivanov AR, Wolf DA. Comparative proteomic and transcriptomic profiling of the fission yeast Schizosaccharomyces pombe. Mol Syst Biol. 2007;3:79. doi: 10.1038/msb4100117.
44. Corbin RW, Paliy O, Yang F, Shabanowitz J, Platt M, Lyons CE, Jr., Root K, McAuliffe J, Jordan MI, Kustu S, Soupene E, Hunt DF. Toward a protein profile of Escherichia coli: comparison to its transcription profile. Proc Natl Acad Sci U S A. 2003;100:9232–9237. doi: 10.1073/pnas.1533294100.
45. Craig R, Cortens JP, Beavis RC. The use of proteotypic peptide libraries for protein identification. Rapid Commun Mass Spectrom. 2005;19:1844–1850. doi: 10.1002/rcm.1992.
46. Krokhin OV. Sequence-specific retention calculator. Algorithm for peptide retention prediction in ion-pair RP-HPLC: application to 300- and 100-Å pore size C18 sorbents. Anal Chem. 2006;78:7785–7795. doi: 10.1021/ac060777w.
