Abstract
Most luminal lysosomal proteins are synthesized as precursors containing mannose 6-phosphate (Man6-P), and a number of recent studies have used affinity purification of Man6-P-containing proteins as a step towards defining the composition of the lysosome. Such studies have identified approximately 60 known lysosomal proteins as well as many other Man6-P glycoproteins, some of which represent new lysosomal proteins. These are of considerable interest from cell-biological and biomedical perspectives, but distinguishing them from other copurifying proteins remains a significant challenge. The aim of this study was to conduct a global analysis of the mammalian Man6-P glycoproteome, implementing technical and biostatistical methods to aid in the discovery and validation of lysosomal candidates. We purified Man6-P glycoproteins from 17 individual rat tissues. To distinguish specifically purified proteins from nonspecific contaminants (i.e. abundant or “sticky” proteins that are not fully removed during purification), we conducted a semi-quantitative mass spectrometric comparison of protein levels in nonspecific mock eluates versus specific affinity chromatography eluates. We identified 60 known lysosomal proteins, representing nearly all that are currently known to contain Man6-P. We also found 136 other proteins that are specifically purified but are not known to have lysosomal function. This approach provides a list of candidate lysosomal proteins and also provides insight into the relative distribution of Man6-P glycoproteins.
Introduction
The lysosome is a eukaryotic organelle that plays a critical role in the degradation and recycling of cellular macromolecules including proteins, carbohydrates, nucleic acids and lipids. The catabolic function of the lysosome is conducted by the concerted action of soluble luminal hydrolases and their accessory proteins, as well as transmembrane proteins that function in vesicular transport, catalysis and molecular transport1. To date, approximately 60 soluble lysosomal proteins have been described, and this number continues to increase. The number of lysosomal transmembrane proteins has not been well defined, although recent proteomic studies indicate that they are numerous (see below).
Lysosomes and lysosomal proteins are of considerable biomedical importance because they are directly involved in, or have been implicated in, numerous human diseases. Defects in lysosomal function result in lysosomal storage disorders2, a group of over 40 inherited diseases that are frequently progressive and neurodegenerative and that usually shorten life span. In addition, alterations in lysosomal function have been implicated in cancer and metastasis, Alzheimer disease, immune system dysfunction and other widespread human diseases.
Given these links with human disease, there is considerable interest in defining the scope of cellular functions of the lysosome, and one direction in which this has recently been explored is the proteomic characterization of its constituent proteins (reviewed in3). A particular emphasis of these studies has been the identification of new lysosomal proteins, both to better understand the function of this organelle and to identify candidates for the defective proteins in human lysosomal storage diseases of unknown etiology4, 5. Different approaches have been used in the proteomic characterization of lysosomal proteins, and each has inherent advantages and disadvantages.
Proteomic surveys have been conducted on subcellular fractions enriched for lysosomes by gradient centrifugation6-8. This approach allows the identification of both soluble and transmembrane lysosomal proteins, but because the densities of cellular organelles intrinsically overlap, lysosomes cannot be isolated to homogeneity and the enrichment of lysosomal proteins achieved is relatively modest (typically 50- to 100-fold). Proteomic studies based upon subcellular fractionation alone are therefore prone to false-positive assignments of lysosomal localization. However, as improvements in preparative methods and in the statistical analysis of data are implemented, the accuracy of lysosomal assignments from such studies appears to be increasing8.
An alternative approach, which allows much greater enrichment of the subset of proteins that reside within the lumen of the lysosome, is affinity purification based upon the presence of a specific carbohydrate modification, mannose 6-phosphate (Man6-P). Man6-P is found on N-linked glycans of most newly synthesized soluble lysosomal proteins and is recognized by two Man6-P receptors (MPRs) that direct the vesicular trafficking of lysosomal proteins from the Golgi to an acidic prelysosomal compartment9. Although lysosomal proteins in transit contain the Man6-P modification, the proportion of any given lysosomal protein in the Man6-P glycoform depends on the source, because the modification may be rapidly removed in some tissues or cell types but persist in others. Thus, depending on the type of sample analyzed, 1 to ∼50% of a given lysosomal protein may contain Man6-P, and such glycoforms can be purified from complex mixtures using immobilized soluble forms of the MPRs as an affinity purification reagent10. This approach has been used to investigate the lysosomal proteomes of a number of sources, including cultured cells and tissues10-20. It allows considerable purification factors (e.g. >10⁶-fold when Man6-P glycoproteins were purified from human plasma17), but there are important limitations. First, while strongly suggestive, the presence of Man6-P does not always equate with lysosomal localization. Second, differentiating between true Man6-P glycoproteins and contaminants can be a significant hurdle. Any sample purified by affinity chromatography on immobilized MPR contains, in addition to Man6-P glycoproteins, proteins that lack Man6-P but bind to and copurify with true Man6-P glycoproteins (i.e. specific contaminants), as well as highly abundant or “sticky” cellular proteins that are not completely removed by affinity chromatography (i.e. nonspecific contaminants).
While these different approaches to the purification of lysosomal proteins have their own particular merits, a general limitation of all the studies conducted to date is that each has examined a limited number of sources, which could restrict the number of proteins found. Lysosomes are found in all nucleated cell types and many acid hydrolases appear to be present in all lysosomes, but the levels of individual lysosomal proteins vary considerably with cell type and tissue. In addition, some lysosomal proteins are expressed only in highly specialized tissues and cell types. For example, granzymes A and B are lysosomal proteins that play a role in immune function and appear to be restricted to cytotoxic T lymphocytes and natural killer cells21. Variation in the distribution of lysosomal proteins was clearly shown in an analysis of rat tissues demonstrating that the content of Man6-P glycoproteins varies considerably in both quantitative and qualitative respects10. Similarly, expression profiling of soluble lysosomal proteins in 45 human tissues, based upon the detection of their respective transcripts (Fig. 1, Panel A; Online Supplementary Material Table 1), shows that some lysosomal proteins are widely distributed (transcripts detected in as many as 44 tissues) whereas the expression of others is more limited. In addition, the number of tissues in which transcripts for each lysosomal protein are found increases with the total number of ESTs assigned to that protein (Fig. 1, Panel B). Tissue distribution may be particularly relevant in the search for new lysosomal proteins, some of which could have escaped classification as such because of a restricted expression pattern.
In this study, we have surveyed the mammalian Man6-P glycoproteome from 17 individual rat tissues using methods that allow the micropurification of these proteins from limiting amounts of sample. We estimated protein abundance in specific versus nonspecific mock affinity column eluates to help differentiate between Man6-P glycoproteins and nonspecific contaminants. The combination of a global purification approach with bioinformatic methods to eliminate nonspecific contaminants has allowed the generation of a database of mammalian proteins that are specifically purified by MPR affinity chromatography, many of which represent previously unrecognized candidate lysosomal proteins.
Experimental
Purification of Man6-P glycoproteins
Tissues from adult Sprague-Dawley rats that were euthanized using hypobaric CO2 were obtained from Zivic Laboratories Inc (Pittsburgh, PA). Tissue samples were derived from 2 to 4 animals depending on the size of the particular tissue. Man6-P glycoproteins were affinity purified essentially as described10, with a number of modifications to allow a small-scale procedure for limiting amounts of tissue. All procedures were conducted at 4 °C. Tissues were homogenized using a Brinkmann Polytron homogenizer (Westbury, NY) with a 20 mm generator in 100 ml phosphate-buffered saline (PBS) containing protease and phosphatase inhibitors (“PBS-I”, comprising PBS containing 5 mM beta-glycerophosphate, 2.5 mM EDTA, 1 μg/ml pepstatin A, 1 μg/ml leupeptin and 0.25 mM Pefabloc). Tween 20 was added to a final concentration of 0.2% and the homogenate was centrifuged at 40,000 × g for 2 h. The resulting supernatant was filtered through Whatman 3MM paper to remove insoluble lipids and other aggregates. Supernatants were loaded overnight onto 4 ml bed volume columns of sCI-MPR coupled to Affigel 10 at a density of 5 mg/ml 10. Columns were then flow-washed with 30 ml PBS-I containing 0.2% Tween 20, batch washed 4 times with 10 ml PBS-I containing 0.2% Tween 20, and then batch washed 4 times with 10 ml PBS-I without Tween 20. Columns were then flow-washed overnight with 80 ml PBS-I and sequentially batch eluted with: 1) PBS containing 10 mM mannose and 10 mM glucose 6-phosphate; 2) PBS containing 10 mM Man6-P; and 3) 0.1 M glycine, pH 2.5. For batch elution, beads were resuspended in 4 ml of the respective elution buffer, incubated for 10 min and allowed to drain by gravity flow. Each elution was then repeated and the two eluates pooled to give 8 ml per elution fraction. Each elution fraction was concentrated to ∼100 μl using a Centricon YM10 centrifugal concentrator, and its protein concentration was determined22.
Tandem mass spectrometry
For each tissue, a sample of the specific (Man6-P) or nonspecific (mannose and glucose 6-phosphate) affinity purification eluate was heated for 10 minutes at 60 °C in reducing, denaturing SDS-PAGE sample buffer, then fractionated on precast 10% polyacrylamide gels (Invitrogen, Carlsbad, CA) until the bromophenol blue dye front had run ∼1 cm into the gel. Gel slices corresponding to each sample were excised, cut into small pieces, reduced, alkylated with iodoacetamide and digested with trypsin as described16. Samples were analyzed by LC-MS/MS using an LTQ linear ion trap mass spectrometer (Thermo Fisher Scientific, Waltham, MA) as described previously17. For the Man6-P eluates, a portion of each digest corresponding to 1 μg of starting material was typically analyzed by LC-MS. This was not possible for all samples; in some cases less was analyzed, and the amount of each sample used for each LC-MS analysis is shown in Table 1. For each mock eluate, we analyzed the same proportion (v/v) of the total purified sample as for the corresponding specific eluate. For example, if an LC-MS run was performed on 10% (v/v) of a total purified specific eluate, then we also analyzed 10% of the mock eluate, regardless of its protein concentration. Two LC-MS/MS runs were conducted for each elution condition for each tissue source.
Table 1. Rat tissue Man6-P glycoprotein purification.
Tissue | Starting material, wet wt (g) | Mock eluate yield (μg/g) | Mock eluate LC-MS load (ng) | Specific eluate yield (μg/g) | Specific eluate LC-MS load (ng) |
---|---|---|---|---|---|
brain | 3.7 | 0.2 | 2 | 11.1 | 1000 |
cecum | 1.0 | 0.4 | 2 | 10.0 | 520 |
duodenum | 2.3 | 0.9 | 19 | 4.7 | 1000 |
heart | 5.0 | 0.3 | 5 | 6.5 | 1000 |
kidney | 5.0 | 0.4 | 10 | 4.0 | 1000 |
liver | 20.0 | 0.3 | 29 | 0.8 | 1000 |
lung | 5.0 | 0.1 | 3 | 4.9 | 1000 |
mammary gland | 4.5 | 0.9 | 12 | 7.4 | 1000 |
pancreas and stomach | 3.4 | 1.1 | 14 | 7.4 | 1000 |
placenta | 2.0 | 2.2 | 13 | 16.4 | 1000 |
skeletal muscle | 3.3 | 0.5 | 9 | 5.6 | 1000 |
skin | 1.2 | 2.1 | 15 | 2.8 | 200 |
spleen | 3.5 | 0.2 | 5 | 4.5 | 1000 |
testis | 4.0 | 2.0 | 16 | 11.9 | 1000 |
thymus | 2.4 | 0.6 | 9 | 1.5 | 200 |
uterus | 0.5 | 0.2 | <1 | 3.4 | 80 |
vas deferens | 0.2 | nd* | nd* | nd* | nd* |
nd, not determined.
Generation of peak lists
Peak lists were generated from raw data using the TurboSEQUEST module of BioworksBrowser 3.1 SR1 (Thermo Fisher Scientific). Parameters were: peptide molecular weight range, 500–5000 Da; threshold intensity, 1000; precursor mass tolerance, 1.4 m/z; minimum ion count, 50; and automatic charge state determination.
Database searching
The ENSEMBL rat protein database (see below) is incomplete: five known lysosomal proteins are absent even though they are encoded by the rat genome. To include such proteins in our analysis, we searched our data against both the rat and mouse databases, converting assignments made with the mouse database to the corresponding rat gene identifier where one was available. Where no rat gene identifier corresponding to the mouse assignment was available, the mouse gene identifier was used instead. Databases (rat, ENSEMBL, Feb. 2006, version 48.34m, which contains 18311 known genes; mouse, ENSEMBL April 2007 build of the NCBI m37 assembly, database version 48.37a, which contains 21928 known genes) were searched using a local implementation of GPM-XE Manager version 2.1.0 (Beavis Informatics Ltd, Winnipeg, Canada), which uses X! Tandem version 2007.07.1 to assign spectral data23, 24. LTQ data were searched using the MudPit option to produce a merged output file, which allows consistent assignment of spectra to similar or identical gene products. Search parameters were a precursor ion mass error of +4 and −0.5 Da and a fragment mass error of 0.4 Da. Errors in assignment of monoisotopic mass were not permitted. Cysteine carbamidomethylation was specified as a complete modification; during development of the preliminary model, methionine oxidation was permitted as a variable modification and one missed cleavage site was allowed. Methionine oxidation and deamidation at asparagine and glutamine residues were allowed during model refinement of those preliminary assignments achieving an expectation value of < 0.001. The threshold for protein assignment was a log GPM expectancy score (log(e)) of −10, based on the aggregate score from the merged data, with a minimum of two peptides assigned per protein. For data analysis, sample information (e.g. source, affinity purification eluate and spectral count data) was extracted from the merged output file. Data supporting protein assignments are given in Online Supplementary Material Table 2. Tentative subcellular locations for identified rat proteins were assigned from the human or mouse equivalents using the LOCATE subcellular localization database 25, 26.
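The protein-level acceptance rule and the mouse-to-rat identifier conversion described above can be sketched in Python as follows. This is a minimal illustration, not the GPM-XE implementation: the `ProteinAssignment` record, its field names and the `mouse_to_rat` ortholog mapping are hypothetical, and the sketch assumes the log(e) threshold is inclusive (log(e) ≤ −10).

```python
from dataclasses import dataclass
from typing import Mapping


# Hypothetical record for one protein assignment read from the merged search output.
@dataclass
class ProteinAssignment:
    gene_id: str      # ENSEMBL gene identifier (rat or mouse)
    log_e: float      # aggregate log(e) score from the merged data
    n_peptides: int   # number of peptides assigned to the protein


LOG_E_THRESHOLD = -10.0  # assumed inclusive: log(e) <= -10 is accepted
MIN_PEPTIDES = 2


def passes_threshold(assignment: ProteinAssignment) -> bool:
    """Accept a protein only if both the score and the peptide count meet the cut-offs."""
    return assignment.log_e <= LOG_E_THRESHOLD and assignment.n_peptides >= MIN_PEPTIDES


def to_rat_identifier(gene_id: str, mouse_to_rat: Mapping[str, str]) -> str:
    """Return the rat equivalent of a mouse gene ID if one is known; otherwise keep the ID."""
    return mouse_to_rat.get(gene_id, gene_id)


if __name__ == "__main__":
    # Placeholder identifiers, not real ENSEMBL IDs.
    orthologs = {"mouse_gene_A": "rat_gene_A"}
    hits = [
        ProteinAssignment("mouse_gene_A", -35.2, 4),
        ProteinAssignment("mouse_gene_B", -12.0, 2),
        ProteinAssignment("rat_gene_C", -8.5, 6),  # fails the log(e) cut-off
    ]
    accepted = [to_rat_identifier(h.gene_id, orthologs) for h in hits if passes_threshold(h)]
    print(accepted)  # ['rat_gene_A', 'mouse_gene_B']
```

Keeping the mouse identifier as a fallback, rather than discarding the assignment, mirrors the stated aim of retaining proteins missing from the rat database.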
Statistical Analysis
Relative quantitation of protein abundance in different eluates was conducted by comparing the total number of spectra assigned to each protein in each sample27, 28. Statistical analysis was essentially as described previously20. In brief, the method of Wilson29 was used to calculate the upper and lower limits of the 95% confidence interval for the ratio of spectral counts found in the specific compared to mock eluate. Analyses were conducted using R version 2.5.0, which is open source software for statistical computation and graphics (http://www.r-project.org/). Data for statistical analyses are presented in Online Supplementary Material Table 3.
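A minimal Python sketch of this interval is given below; it is an illustration, not the R code used for the analysis, and the function names are hypothetical. It assumes (our reading, not stated above) that the Wilson score interval is computed for the fraction of a protein's spectra found in the specific eluate, p = specific/(specific + mock), and that the limits are converted to specific/mock ratios via p/(1 − p). Under these assumptions it reproduces, for example, the intervals listed in Table 2 for N-sulfoglucosamine sulfohydrolase and cathepsin Z and the lower limit listed in Table 4 for multiple inositol polyphosphate phosphatase 1, but it is not guaranteed to match every row.

```python
import math
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval for the binomial proportion successes/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def log2_ratio_ci(specific: int, mock: int, confidence: float = 0.95) -> Tuple[float, float, float]:
    """Return (lower, estimate, upper) for log2(specific/mock) spectral counts.

    Assumes the interval is computed on p = specific / (specific + mock) and mapped
    to a ratio with p / (1 - p); a zero count gives an infinite limit.
    """
    p_lo, p_hi = wilson_interval(specific, specific + mock, confidence)
    lower = math.log2(p_lo / (1 - p_lo)) if specific > 0 else -math.inf
    upper = math.log2(p_hi / (1 - p_hi)) if mock > 0 else math.inf
    if specific > 0 and mock > 0:
        estimate = math.log2(specific / mock)
    else:
        estimate = math.inf if mock == 0 else -math.inf
    return lower, estimate, upper


if __name__ == "__main__":
    # Cathepsin Z, Table 2: 612 spectra in the specific eluate, 170 in the mock eluate
    print([round(x, 2) for x in log2_ratio_ci(612, 170)])  # [1.6, 1.85, 2.09]
```

Because the interval is built on a proportion, a protein seen only in the specific eluate still receives a finite lower limit, which is what makes the "∞ (lower to ∞)" entries in Table 4 usable for ranking.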
Results and Discussion
MPR affinity purification was conducted on 17 different rat tissues, chosen primarily because sufficient material was available to give predicted yields of Man6-P glycoproteins adequate for multiple LC-MS/MS analyses (i.e. 10–40 μg). Yields were adequate for the majority of tissues examined. For several (skin, thymus, uterus and vas deferens), however, source material was limiting and the yields of purified protein were less than optimal (Table 1), although not low enough to preclude analysis (see Experimental). The highest relative yield was obtained from placenta (16 μg/g). As predicted from earlier blotting experiments10, relative yields of Man6-P glycoproteins were also high from brain and testis (11–12 μg/g tissue).
In total, 793 protein assignments met our criteria for significance (Online Supplementary Material Table 2; summarized in Table 2), although 21 of these were to proteins not of rodent origin and were eliminated from the analysis as contaminants. Among the remaining 772 assignments from rat tissues, 60 known soluble lysosomal proteins were identified, a number comparable with those found in proteomic analyses of human and mouse tissue sources (60 and 56 proteins in total, respectively16, 19, 20). Notably, we found two highly similar (97%) yet genetically distinct rat CLN5 paralogs, encoded by genes on chromosomes 2 and 15. Because of the similarity between these proteins, most spectra cannot be assigned to either paralog individually, and we have therefore treated the two as a single entity in our analysis.
Table 2. Lysosomal proteins in rat tissues.
ENSG Identifier | ENSEMBL Description | Min log(e) | Coverage | Spectral counts (mock eluate) | Spectral counts (specific eluate) | Log2(specific/mock) | Function/class |
---|---|---|---|---|---|---|---|
ENSMUSG00000001348 | Acid phosphatase 5, tartrate resistant | −44.6 | 17 | 1 | 72 | 6.17 (3.65 to 10.47) | e |
ENSMUSG00000005043 | N-sulfoglucosamine sulfohydrolase | −96.6 | 26 | 18 | 104 | 2.53 (1.82 to 3.25) | e |
ENSMUSG00000016256 | Cathepsin Z | −127 | 44 | 170 | 612 | 1.85 (1.6 to 2.09) | e |
ENSMUSG00000025579 | Glucosidase, alpha, acid | −223.1 | 21 | 14 | 263 | 4.23 (3.47 to 5) | e |
ENSRNOG00000000043 | Iduronidase, alpha-L- | −145 | 28 | 6 | 127 | 4.4 (3.25 to 5.55) | e |
ENSRNOG00000000108 | Aspartylglucosaminidase | −321.5 | 61 | 138 | 983 | 2.83 (2.58 to 3.09) | e |
ENSRNOG00000000435 | Lysosomal thioesterase PPT2 | −92.5 | 40 | 14 | 163 | 3.54 (2.76 to 4.32) | e |
ENSRNOG00000000571 | Prosaposin | −326.2 | 54 | 164 | 646 | 1.98 (1.73 to 2.22) | ap |
ENSRNOG00000000913 | Beta-glucuronidase | −309 | 45 | 118 | 728 | 2.63 (2.34 to 2.91) | e |
ENSRNOG00000001385 | LAMA-like protein 2 | −400.1 | 55 | 239 | 1843 | 2.95 (2.75 to 3.14) | e |
ENSRNOG00000001465 | Iduronate 2-sulfatase | −131.6 | 31 | 16 | 142 | 3.15 (2.41 to 3.89) | e |
ENSRNOG00000002188 | Heparanase | −106.5 | 35 | 8 | 94 | 3.55 (2.53 to 4.57) | e |
ENSRNOG00000002273 | N-acylethanolamine-hydrolyzing acid amidase | −146.6 | 39 | 12 | 146 | 3.6 (2.77 to 4.44) | e |
ENSRNOG00000003291 | Cellular repressor of E1A-stimulated genes 1 | −143.6 | 52 | 22 | 346 | 3.98 (3.36 to 4.59) | u |
ENSRNOG00000003759 | Galactosylceramidase | −79.8 | 19 | 5 | 69 | 3.79 (2.52 to 5.05) | e |
ENSRNOG00000004919 | Glucosamine (N-acetyl)-6-sulfatase | −522 | 61 | 389 | 1867 | 2.26 (2.11 to 2.42) | e |
ENSRNOG00000005526 | Mannosidase 2, alpha B2 | −353.2 | 35 | 29 | 555 | 4.26 (3.72 to 4.79) | e |
ENSRNOG00000005931 | Plasma glutamate carboxypeptidase | −313.3 | 60 | 282 | 890 | 1.66 (1.47 to 1.85) | e |
ENSRNOG00000007089 | Legumain | −358 | 60 | 204 | 946 | 2.21 (2 to 2.43) | e |
ENSRNOG00000007351 | Gamma-glutamyl hydrolase | −151.6 | 39 | 74 | 764 | 3.37 (3.02 to 3.71) | e |
ENSRNOG00000008064 | Alpha-N-acetylgalactosaminidase | −463.8 | 68 | 295 | 1798 | 2.61 (2.43 to 2.79) | e |
ENSRNOG00000008310 | Myeloperoxidase | −467.7 | 57 | 74 | 726 | 3.29 (2.95 to 3.64) | e |
ENSRNOG00000009325 | Tissue alpha-L-fucosidase | −187.3 | 41 | 99 | 459 | 2.21 (1.9 to 2.53) | e |
ENSRNOG00000010034 | N-acylsphingosine amidohydrolase 1 | −318.8 | 61 | 112 | 877 | 2.97 (2.69 to 3.25) | e |
ENSRNOG00000010080 / ENSRNOG00000009759 | CLN5 protein | −93.7 | 28 | 4 | 81 | 4.34 (2.95 to 5.73) | u |
ENSRNOG00000010196 | Galactosidase, beta 1 | −471 | 58 | 384 | 1552 | 2.01 (1.85 to 2.18) | e |
ENSRNOG00000010252 | Beta-hexosaminidase alpha chain | −533.3 | 62 | 282 | 1887 | 2.74 (2.56 to 2.92) | e |
ENSRNOG00000010331 | Cathepsin B | −515.2 | 82 | 298 | 1550 | 2.38 (2.2 to 2.56) | e |
ENSRNOG00000010630 | Prolylcarboxypeptidase | −304.4 | 51 | 60 | 556 | 3.21 (2.83 to 3.6) | e |
ENSRNOG00000011150 | Arylsulfatase B | −224.9 | 55 | 68 | 362 | 2.41 (2.04 to 2.79) | e |
ENSRNOG00000011513 | Galactosidase, alpha | −309.3 | 54 | 138 | 1040 | 2.91 (2.66 to 3.17) | e |
ENSRNOG00000011864 | GM2 ganglioside activator protein | −120.4 | 48 | 141 | 86 | −0.71 (−1.1 to −0.33) | ap |
ENSRNOG00000012062 | Epididymal secretory protein E1 | −191.2 | 63 | 108 | 571 | 2.4 (2.11 to 2.7) | ap |
ENSRNOG00000012616 | Palmitoyl-protein thioesterase 1 | −278.3 | 64 | 126 | 1056 | 3.07 (2.8 to 3.33) | e |
ENSRNOG00000012640 | Dipeptidyl-peptidase 2 | −461.6 | 56 | 260 | 1645 | 2.66 (2.47 to 2.85) | e |
ENSRNOG00000012953 | Arylsulfatase A | −248.3 | 59 | 86 | 544 | 2.66 (2.33 to 2.99) | e |
ENSRNOG00000013190 | Ribonuclease T2 | −204.3 | 53 | 74 | 397 | 2.42 (2.07 to 2.78) | e |
ENSRNOG00000013476 | Beta-mannosidase | −489.8 | 49 | 226 | 1673 | 2.89 (2.69 to 3.09) | e |
ENSRNOG00000014064 | Cathepsin H | −316.8 | 62 | 292 | 1446 | 2.31 (2.13 to 2.49) | e |
ENSRNOG00000014461 | Galactosamine (N-acetyl)-6-sulfate sulfatase | −351.7 | 61 | 191 | 1146 | 2.58 (2.36 to 2.81) | e |
ENSRNOG00000015573 | Chitobiase, di-N-acetyl- | −238.8 | 51 | 107 | 716 | 2.74 (2.45 to 3.03) | e |
ENSRNOG00000015857 | Cathepsin A | −556.2 | 63 | 544 | 3568 | 2.71 (2.58 to 2.84) | e |
ENSRNOG00000016496 | Cathepsin C | −782.8 | 78 | 427 | 2968 | 2.8 (2.65 to 2.94) | e |
ENSRNOG00000017977 | Sphingomyelin phosphodiesterase 1 | −176.3 | 37 | 20 | 205 | 3.36 (2.7 to 4.01) | e |
ENSRNOG00000018566 | Cathepsin L | −559.1 | 79 | 622 | 4129 | 2.73 (2.61 to 2.85) | e |
ENSRNOG00000018989 | Ependymin related protein 1 (zebrafish) | −181.4 | 55 | 255 | 1490 | 2.55 (2.36 to 2.74) | u |
ENSRNOG00000019077 | Lysosomal acid lipase | −136.9 | 47 | 44 | 372 | 3.08 (2.63 to 3.53) | e |
ENSRNOG00000019212 | Tripeptidyl peptidase I | −382.3 | 55 | 218 | 979 | 2.17 (1.96 to 2.38) | e |
ENSRNOG00000019387 | Interferon gamma inducible protein 30 | −150.7 | 53 | 26 | 203 | 2.96 (2.38 to 3.55) | e |
ENSRNOG00000019708 | Cathepsin F | −278 | 60 | 75 | 514 | 2.78 (2.43 to 3.13) | e |
ENSRNOG00000019859 | Lysophospholipase 3 | −155.3 | 31 | 16 | 221 | 3.79 (3.06 to 4.51) | e |
ENSRNOG00000020206 | Cathepsin D | −551.7 | 72 | 867 | 3822 | 2.14 (2.03 to 2.25) | e |
ENSRNOG00000021155 | Cathepsin K | −178.4 | 38 | 12 | 100 | 3.06 (2.21 to 3.91) | e |
ENSRNOG00000021157 | Cathepsin S | −140.5 | 37 | 56 | 257 | 2.2 (1.78 to 2.61) | e |
ENSRNOG00000023830 | Deoxyribonuclease-2-alpha | −186.5 | 58 | 26 | 312 | 3.58 (3.01 to 4.16) | e |
ENSRNOG00000023910 | Mannosidase 2, alpha B1 | −807.4 | 61 | 220 | 1575 | 2.84 (2.64 to 3.04) | e |
ENSRNOG00000025274 | Beta-hexosaminidase beta chain | −364.1 | 52 | 246 | 1178 | 2.26 (2.06 to 2.46) | e |
ENSRNOG00000031266 | Sialic acid acetylesterase | −231.6 | 52 | 58 | 513 | 3.14 (2.75 to 3.54) | e |
ENSRNOG00000032381 | Alpha-N-acetylglucosaminidase | −370.8 | 39 | 46 | 807 | 4.13 (3.71 to 4.56) | e |
ENSRNOG00000032942 | Neuraminidase 1 | −222.4 | 45 | 157 | 586 | 1.9 (1.65 to 2.15) | e |
A central aim of this study was to differentiate between true Man6-P glycoproteins and nonspecific contaminants. To this end, we used spectral counting27, 28 to estimate the relative abundance of each protein released from the MPR affinity column of a given tissue sample by either the glucose 6-phosphate/mannose (“mock”) or the Man6-P (“specific”) elution buffer. We predicted that true Man6-P glycoproteins (and possibly also specific contaminants associated with Man6-P glycoproteins, depending upon the strength of interaction) would be enriched in the Man6-P eluate relative to the mannose/glucose 6-phosphate eluate. In contrast, nonspecific contaminants (i.e. abundant or “sticky” proteins that leach from the column in a Man6-P-independent manner) should be present at equal or greater levels in the mock eluate compared with the specific eluate. Because the statistical power of spectral counting as a measure of protein abundance increases with the number of spectra counted, we compared, for each protein, the total spectral counts in the specific and mock eluates summed over all tissue samples. This allows confident conclusions to be drawn for proteins that are present at low levels in numerous samples, where the counts from any individual sample would be too few to be informative.
We analyzed the same proportion of the total specific and mock eluates rather than equivalent amounts of protein (see Experimental). Thus, spectral counts measured in the two eluates are effectively normalized to unit weight of starting material. For estimating enrichment in the specific Man6-P eluate, this is a conservative approach, because the total number of spectral counts is not directly proportional to the amount of protein analyzed, owing to sampling limitations during LC-MS. For example, when smaller amounts of protein are analyzed, fewer peptides are available for MS/MS analysis and each may be measured more frequently. Conversely, when larger amounts of protein are analyzed, ion suppression by more abundant peptides may decrease the signal intensity, and thus the frequency of measurement, of less abundant peptides. We determined the relationship between the amount of protein digest analyzed and the number of spectra measured experimentally (Fig. 2A); the number of spectral counts clearly plateaus as the amount of material analyzed increases. Spectral counts measured in the mock eluate may therefore be overestimated, and Fig. 2B shows this to be the case. Here, for each tissue, we plot the ratio of spectral counts for the specific versus mock eluates against the ratio of protein analyzed in the corresponding specific versus mock eluates. If spectral counts were directly proportional to the amount of protein analyzed, these two ratios would be the same, but they are not. Instead, protein abundance in each of the mock eluates is overestimated, and the stated enrichment factors are therefore likely to be underestimates.
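The load ratio plotted against spectral count ratios in Fig. 2B can be made concrete with the LC-MS loads in Table 1. The sketch below computes, for a few tissues, the specific/mock ratio of protein analyzed and its log2. If spectral counts scaled linearly with load, the total specific/mock spectral count ratio for each tissue would equal this value; the text reports that the observed ratios do not. The dictionary is simply a transcription of Table 1, and the linear-scaling model is the hypothesis being tested, not a fitted model.

```python
import math
from typing import Dict, Tuple

# LC-MS loads (ng) transcribed from Table 1: (specific eluate, mock eluate)
LC_MS_LOAD_NG: Dict[str, Tuple[float, float]] = {
    "brain": (1000, 2),
    "kidney": (1000, 10),
    "liver": (1000, 29),
    "skin": (200, 15),
}


def load_ratio(specific_ng: float, mock_ng: float) -> float:
    """Specific/mock ratio of protein analyzed by LC-MS."""
    return specific_ng / mock_ng


for tissue, (spec_ng, mock_ng) in LC_MS_LOAD_NG.items():
    r = load_ratio(spec_ng, mock_ng)
    # Under strict proportionality, the spectral count ratio would equal r.
    print(f"{tissue:8s} load ratio {r:7.1f}  log2 {math.log2(r):5.2f}")
```

Uterus is omitted because its mock load is given only as "<1 ng".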
Most of the known soluble lysosomal proteins (59/60) were enriched in the Man6-P eluate (Online Supplementary Material Table 3; Fig. 3, Panel A); GM2 activator protein was the only one depleted in the specific eluate. This may indicate that GM2 activator protein is a low-affinity ligand for the immobilized MPR that readily dissociates during washing. Alternatively, some or all of the purified GM2 activator may be purified by virtue of its association with other lysosomal proteins rather than by the presence of Man6-P. GM2 activator has been reported to traffic to the lysosome by both Man6-P-dependent and Man6-P-independent pathways30, 31, which is consistent with the latter possibility.
Reasoning that novel lysosomal proteins should be purified with a specificity similar to that of known lysosomal proteins, we used the enrichment observed for the latter to help identify potential lysosomal candidates. We set the threshold for the lower limit of the 95% confidence interval of the specific/mock elution ratio at >2.75 (log2 ≈ 1.46) (Fig. 3, Panel A and Online Supplementary Material Table 3).
We used relative enrichment to categorize the proteins not currently classified as lysosomal (Table 3; Fig. 3, Panel B). Fifty-two proteins were enriched to the same degree as the known lysosomal proteins (i.e. the lower limit of the 95% confidence interval for the specific/mock ratio was >2.75), and we have categorized these as primary candidates for lysosomal residence (Table 4). We also considered proteins that are significantly enriched in the specific eluate but not to the same degree as the known lysosomal proteins (lower limit of the 95% confidence interval for specific/mock >1 but ≤2.75); these are categorized as secondary candidates. Proteins that are significantly depleted in the specific eluate (upper limit of the 95% confidence interval for specific/mock spectral counts <1) are classified as not lysosomal, and the remaining proteins, whose confidence intervals include 1, are unclassified. Although this classification is arbitrary, we believe it is a useful way to prioritize candidates for further investigation.
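The classification rules above can be expressed as a small decision function. This is a sketch, assuming the confidence limits are supplied as fold ratios (specific/mock, not log2), that known lysosomal proteins are assigned to their own category first, and that any remaining protein whose interval includes 1 is left unclassified; the function and argument names are hypothetical.

```python
import math

PRIMARY_MIN_FOLD = 2.75  # lower 95% CI limit required for a primary candidate


def classify(lower_fold: float, upper_fold: float, known_lysosomal: bool = False) -> str:
    """Assign a category from the 95% CI of the specific/mock spectral count ratio."""
    if known_lysosomal:
        return "lysosomal"
    if upper_fold < 1:                  # significantly depleted in the specific eluate
        return "not lysosomal"
    if lower_fold > PRIMARY_MIN_FOLD:   # enriched as strongly as known lysosomal proteins
        return "primary candidate"
    if lower_fold > 1:                  # significantly enriched, but less strongly
        return "secondary candidate"
    return "unclassified"               # interval includes 1


def fold(log2_value: float) -> float:
    """Convert a log2 limit (as printed in Tables 2 and 4) back to a fold ratio."""
    return math.pow(2.0, log2_value)


# Carboxypeptidase B2, Table 4: log2 limits 2.27 to 5.48
print(classify(fold(2.27), fold(5.48)))  # primary candidate
```

Checking known lysosomal proteins first matters: by the enrichment rules alone, GM2 activator protein (log2 limits −1.1 to −0.33 in Table 2) would fall into the "not lysosomal" group.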
Table 3. Classification of affinity purified proteins based upon relative abundance in specific and mock eluates.
Category | Number of proteins |
---|---|
lysosomal | 60 |
not lysosomal | 272 |
primary candidate | 52 |
secondary candidate | 84 |
unclassified | 304 |
Total | 772 |
Table 4. Primary lysosomal candidates that are enriched in the Man6-P eluate to the same extent as known lysosomal proteins.
ENSG Identifier | ENSEMBL Description | Min log(e) | Coverage | Spectral counts (specific eluate) | Spectral counts (mock eluate) | Log2(specific/mock) | Previously identified | Number of tissues found | Function | Number of potential N-linked glycosylation sites | Signal domain | Protein Class | Subcellular location |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ENSRNOG00000011287 | Multiple inositol polyphosphate phosphatase 1 | −52 | 15 | 24 | 0 | ∞ (2.64 to ∞) | no | 9 | e | 2 | yes | secreted | ER |
ENSRNOG00000015941 | Similar to 65kDa FK506-binding protein | −141 | 30 | 85 | 18 | 2.24 (1.51 to 2.97) | no | 12 | e | 7 | yes | secreted | ER |
ENSRNOG00000010935 | Carboxypeptidase B2 | −86 | 33 | 44 | 3 | 3.87 (2.27 to 5.48) | no | 8 | e | 6 | yes | secreted | extracellular |
ENSRNOG00000008575 | Amiloride binding protein 1 | −449.5 | 55 | 435 | 111 | 1.97 (1.67 to 2.27) | no | 6 | e | 4 | yes | secreted | extracellular |
ENSRNOG00000011913 | Ceruloplasmin | −330.8 | 35 | 145 | 30 | 2.27 (1.71 to 2.84) | yes | 9 | e | 6 | yes | secreted | extracellular |
ENSRNOG00000030183 | Procollagen lysine, 2-oxoglutarate 5-dioxygenase 2 | −278.3 | 42 | 177 | 45 | 1.98 (1.51 to 2.45) | yes | 13 | e | 6 | yes | secreted | extracellular, ER |
ENSRNOG00000001823 | Sialyltransferase 1 | −120.8 | 33 | 62 | 7 | 3.15 (2.05 to 4.25) | no | 3 | e | 3 | yes | soluble, non-secreted | Golgi |
ENSRNOG00000019014 | N- heparan sulfate sulfotransferase 1 | −38.5 | 6.1 | 14 | 0 | ∞ (1.87 to ∞) | no | 5 | e | 4 | yes | type II membrane | lysosomes |
ENSRNOG00000007763 | Procollagen-lysine, 2-oxoglutarate 5-dioxygenase 1 | −196.3 | 35 | 121 | 19 | 2.67 (1.98 to 3.36) | yes | 15 | e | 4 | yes | soluble, non-secreted | mitochondria, ER |
ENSMUSG00000028015 | Cathepsin O | −21 | 11 | 16 | 0 | ∞ (2.06 to ∞) | yes | 9 | e | 4 | yes | secreted | |
ENSRNOG00000000815 | Acid sphingomyelinase-like phosphodiesterase 3a | −341.8 | 55 | 679 | 58 | 3.55 (3.16 to 3.93) | yes | 17 | e | 6 | yes | secreted | |
ENSRNOG00000002358 | Retinoid-inducible serine carboxypeptidase | −211.6 | 41 | 673 | 80 | 3.07 (2.74 to 3.41) | yes | 16 | e | 5 | yes | secreted | |
ENSRNOG00000008422 | Lactoperoxidase | −79.9 | 19 | 23 | 0 | ∞ (2.58 to ∞) | no | 3 | e | 6 | yes | secreted | |
ENSRNOG00000008933 | LAMA-like protein 1 | −409.8 | 52 | 1062 | 146 | 2.86 (2.61 to 3.11) | yes | 16 | e | 5 | yes | secreted | |
ENSRNOG00000011181 | Carboxypeptidase A3 | −55.2 | 16 | 27 | 2 | 3.75 (1.83 to 5.68) | no | 6 | e | 4 | yes | secreted | |
ENSRNOG00000015551 | Fucosidase, alpha-L- 2, plasma | −221.4 | 50 | 322 | 52 | 2.63 (2.21 to 3.05) | yes | 15 | e | 5 | yes | secreted | |
ENSRNOG00000017908 | Pancreatic lipase-related protein 1 | −214.1 | 43 | 117 | 27 | 2.12 (1.52 to 2.71) | no | 3 | e | 3 | yes | secreted | |
ENSRNOG00000018236 | Chymosin family | −253.7 | 66 | 323 | 28 | 3.53 (2.97 to 4.08) | no | 5 | e | 1 | yes | secreted | |
ENSRNOG00000024181 | Tryptase alpha/beta 1 | −59.2 | 33 | 54 | 2 | 4.75 (2.86 to 6.65) | no | 12 | e | 2 | yes | secreted | |
ENSRNOG00000026937 | Arylsulfatase K | −144.5 | 27 | 57 | 1 | 5.83 (3.31 to 10.14) | yes | 8 | e | 7 | yes | secreted | |
ENSRNOG00000028092 | Carboxypeptidase A2 | −100.7 | 34 | 37 | 3 | 3.62 (2.01 to 5.24) | no | 2 | e | 0 | yes | secreted | |
ENSRNOG00000029604 | Mast cell protease 8 | −235.7 | 49 | 241 | 38 | 2.66 (2.17 to 3.16) | no | 9 | e | 4 | yes | secreted | |
ENSRNOG00000030909 | Mast cell protease 9 | −445.3 | 76 | 1121 | 87 | 3.69 (3.37 to 4) | no | 11 | e | 2 | yes | secreted | |
ENSRNOG00000032717 | Granzyme-like protein 2 | −398.4 | 56 | 439 | 65 | 2.76 (2.38 to 3.13) | no | 10 | e | 3 | yes | secreted | |
ENSRNOG00000039865 | Vanin 3 | −167 | 39 | 121 | 5 | 4.6 (3.35 to 5.85) | no | 8 | e | 3 | yes | soluble, non-secreted | |
ENSRNOG00000039971 | Cytotoxic T lymphocyte-associated protein 2 beta-like | −55.5 | 57 | 38 | 0 | ∞ (3.31 to ∞) | no | 6 | e | 0 | no | soluble, non-secreted | |
ENSRNOG00000009086 | Serum amyloid P-component | −144.6 | 58 | 320 | 78 | 2.04 (1.68 to 2.39) | yes | 15 | l | 2 | yes | secreted | extracellular |
ENSRNOG00000009217 | F-box only protein 6 | −71.9 | 31 | 62 | 4 | 3.95 (2.55 to 5.36) | yes | 11 | l | 1 | no | soluble, non-secreted | |
ENSRNOG00000032834 | Stress 70 protein chaperone microsome-associated 6 | −138.4 | 34 | 67 | 3 | 4.48 (2.89 to 6.07) | yes | 14 | o | 5 | yes | secreted | ER |
ENSRNOG00000020129 | P-cadherin | −41.4 | 9.1 | 11 | 0 | ∞ (1.52 to ∞) | no | 1 | o | 4 | yes | type I membrane | plasma membrane |
ENSRNOG00000008838 | Gastrokine 1 | −66.8 | 34 | 91 | 15 | 2.6 (1.82 to 3.38) | no | 5 | o | 1 | yes | secreted | secretory granule |
ENSRNOG00000008716 | Neurofilament heavy polypeptide | −138.5 | 14 | 28 | 2 | 3.81 (1.88 to 5.73) | no | 4 | o | 1 | no | soluble, non-secreted | |
ENSRNOG00000018400 | Golgi membrane protein 1 | −26.7 | 7.4 | 11 | 0 | ∞ (1.52 to ∞) | no | 4 | o | 4 | yes | type II membrane | |
ENSRNOG00000013572 | Latexin | −42.2 | 26 | 21 | 0 | ∞ (2.45 to ∞) | no | 4 | pi | 3 | no | soluble, non-secreted | cytoplasm |
ENSRNOG00000030387 | Kininogen 1 | −253.1 | 56 | 302 | 82 | 1.88 (1.53 to 2.23) | yes | 13 | pi | 5 | yes | secreted | extracellular |
ENSRNOG00000005599 | Alpha-1-inhibitor 3 | −606.3 | 13 | 99 | 13 | 2.93 (2.11 to 3.75) | yes | 11 | pi | 12 | yes | secreted | extracellular |
ENSRNOG00000037188 | Murinoglobulin-1 | −635.5 | 44 | 497 | 75 | 2.73 (2.38 to 3.08) | yes | 15 | pi | 1 | yes | secreted | lysosomes |
ENSRNOG00000001201 | Cystatin-B | −96.7 | 72 | 329 | 65 | 2.34 (1.96 to 2.72) | yes | 17 | pi | 0 | no | soluble, non-secreted | nucleus, cytoplasm |
ENSRNOG00000009855 | Serine (or cysteine) peptidase inhibitor, clade A, | −38 | 21 | 13 | 0 | ∞ (1.76 to ∞) | yes | 1 | pi | 1 | yes | secreted | |
ENSRNOG00000020455 | Cystatin E/M | −62.4 | 51 | 44 | 1 | 5.46 (2.93 to 9.78) | no | 9 | pi | 0 | yes | secreted | |
ENSRNOG00000033245 | Murinoglobulin family | −340.7 | 3 | 12 | 0 | ∞ (1.64 to ∞) | no | 5 | pi | 14 | yes | secreted | |
ENSRNOG00000010527 | Serine (or cysteine) peptidase inhibitor, clade A, | −227.7 | 57 | 209 | 41 | 2.35 (1.87 to 2.83) | no | 10 | pi | 3 | yes | type I membrane | |
ENSRNOG00000004067 | Neuronal cell adhesion molecule | −51.7 | 6.7 | 11 | 0 | ∞ (1.52 to ∞) | yes | 1 | s | 23 | yes | type I membrane | plasma membrane |
ENSMUSG00000024109 | Neurexin I | −64.6 | 9.3 | 17 | 0 | ∞ (2.15 to ∞) | yes | 2 | s | 2 | no | type II membrane | plasma membrane |
ENSRNOG00000004610 | Lumican | −52.8 | 26 | 40 | 3 | 3.74 (2.13 to 5.35) | yes | 11 | s | 4 | yes | secreted | extracellular |
ENSRNOG00000004554 | Decorin | −134 | 31 | 164 | 39 | 2.07 (1.57 to 2.57) | no | 12 | s | 4 | yes | secreted | extracellular |
ENSRNOG00000015410 | Similar to asporin | −75.1 | 28 | 42 | 6 | 2.81 (1.61 to 4.01) | no | 11 | s | 2 | yes | secreted | extracellular |
ENSRNOG00000002878 | Afamin | −206.6 | 36 | 155 | 21 | 2.88 (2.23 to 3.54) | yes | 12 | tp | 6 | yes | secreted | extracellular |
ENSRNOG00000021001 | Gastric intrinsic factor | −32 | 11 | 17 | 0 | ∞ (2.15 to ∞) | no | 2 | tp | 5 | yes | secreted | |
ENSRNOG00000020148 | Interleukin-4 induced gene-1 | −135.2 | 25 | 68 | 13 | 2.39 (1.54 to 3.23) | no | 5 | o | 3 | yes | multipass membrane | nucleus |
ENSRNOG00000021258 | Endogenous retroviral sequence 3 | −174.9 | 64 | 226 | 56 | 2.01 (1.59 to 2.43) | no | 4 | o | 3 | yes | type I membrane | |
ENSRNOG00000037782 | Gastrokine family | −27 | 19 | 19 | 0 | ∞ (2.31 to ∞) | no | 3 | o | 1 | yes | type II membrane | |
Relative tissue expression of purified proteins
Expression profiling of soluble lysosomal proteins in human tissues demonstrated that some lysosomal proteins are widely distributed whereas others have a more limited expression pattern (Fig. 1, Panel A; Online Supplementary Material Table 1). Here, the number of tissues in which each protein was expressed was determined simply on the basis of its presence or absence in the respective Man6-P eluates, as determined by LC-MS/MS (Fig. 4, Panel A). Thirty of the 60 known lysosomal proteins were ubiquitously distributed, being present in all 17 tissue samples examined, and an additional 26 were present in most (12 to 16) of the sample types. Thus, the Man6-P forms of known lysosomal proteins appear to be widely distributed in the tissues examined, with each protein detected in an average of ∼15 tissues. For the proteins not assigned to the lysosome, the pattern of distribution was very different (Fig. 4, Panel B), with the majority found in three or fewer tissue samples.
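The presence/absence tally described above can be sketched as follows. The protein and tissue names are hypothetical placeholders, not data from this study; each protein simply maps to the set of tissues whose Man6-P eluate yielded an identification, and the summary bins mirror those used in the text (all 17 tissues, 12 to 16 tissues, three or fewer).

```python
from statistics import mean

N_TISSUES = 17  # number of rat tissues analyzed in this study

# Hypothetical presence/absence calls: protein -> tissues in which the protein
# was identified in the Man6-P eluate (illustrative only, not study data).
detections = {
    "protein_A": {f"tissue_{i}" for i in range(1, 18)},  # all 17 tissues
    "protein_B": {f"tissue_{i}" for i in range(1, 14)},  # 13 tissues
    "protein_C": {"tissue_2", "tissue_5"},               # 2 tissues
}


def tissue_counts(calls: dict[str, set[str]]) -> dict[str, int]:
    """Number of tissues in which each protein was detected (presence/absence only)."""
    return {protein: len(tissues) for protein, tissues in calls.items()}


def distribution_summary(counts: dict[str, int], n_tissues: int = N_TISSUES) -> dict[str, float]:
    """Bin proteins by breadth of detection and report the mean number of tissues."""
    values = list(counts.values())
    return {
        "ubiquitous": sum(v == n_tissues for v in values),
        "most (12-16)": sum(12 <= v < n_tissues for v in values),
        "three or fewer": sum(v <= 3 for v in values),
        "mean tissues per protein": mean(values),
    }


counts = tissue_counts(detections)
print(distribution_summary(counts))
```

Because only presence or absence is recorded, a protein detected by a single spectrum in a tissue counts the same as one detected abundantly there.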
Given that known lysosomal proteins tended to be relatively widely distributed, we considered whether tissue distribution could help in the identification of candidates. In Fig. 5, we examined the tissue distribution of groups of proteins categorized according to their enrichment in the specific eluate. Few of the proteins that were unclassified or categorized as not lysosomal were widely distributed; on average, each was found in 1.3 and 2.4 tissues, respectively (Fig. 5, Panels A and B). More of the secondary candidates were widely distributed (an average of 4.4 tissues per protein), but the majority were still found in few (≤3) tissues (Panel C). In contrast, many more of the primary lysosomal candidates were widely distributed, with each found in an average of 7.9 tissues (Panel D).
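The per-category averages above amount to a group-by mean over tissue counts. A minimal sketch, assuming each protein has already been assigned one enrichment category and a tissue count; the records below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (protein, enrichment category, number of tissues detected) records.
records = [
    ("p1", "primary candidate", 12),
    ("p2", "primary candidate", 4),
    ("p3", "secondary candidate", 3),
    ("p4", "secondary candidate", 1),
    ("p5", "not lysosomal", 2),
    ("p6", "unclassified", 1),
]


def mean_tissues_by_category(rows):
    """Average number of tissues per protein within each enrichment category."""
    by_category = defaultdict(list)
    for _protein, category, n_tissues in rows:
        by_category[category].append(n_tissues)
    return {category: mean(values) for category, values in by_category.items()}


print(mean_tissues_by_category(records))
```

A mean can hide a skewed distribution, which is why the text also reports that most secondary candidates were still found in three or fewer tissues.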
We therefore examined the list of lysosomal candidates for proteins found in 13 or more of the 17 tissues analyzed (Fig. 6). We identified a number of proteins that are both enriched and widely distributed and are therefore particularly promising candidates for lysosomal localization, including several orthologs of known lysosomal proteins. However, while tissue distribution can help in identifying candidates, it cannot be used to exclude them, because the population of previously discovered lysosomal proteins may be biased towards the most abundant lysosomal proteins with the widest tissue distribution. Some undiscovered lysosomal proteins (whose identification and classification is the goal of this study) may have escaped assignment to this organelle because they are rare or have a very limited distribution.
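The ≥13-of-17 screen is a simple filter over enriched candidates. A minimal sketch with hypothetical candidate names; as the text cautions, a protein failing this filter is not thereby excluded from lysosomal residence, so the function only ranks and never rejects downstream.

```python
MIN_TISSUES = 13  # "13 or more of the 17 tissues analyzed"

# Hypothetical candidates: name -> (enriched in specific eluate?, tissues detected)
candidates = {
    "cand_1": (True, 16),
    "cand_2": (True, 5),
    "cand_3": (False, 17),
    "cand_4": (True, 13),
}


def widespread_candidates(cands, min_tissues=MIN_TISSUES):
    """Enriched candidates detected in at least `min_tissues` tissues, most widespread first.

    Proteins that fail this filter remain candidates; breadth of distribution
    is used to prioritize, not to exclude.
    """
    hits = [
        (name, n_tissues)
        for name, (enriched, n_tissues) in cands.items()
        if enriched and n_tissues >= min_tissues
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)


print(widespread_candidates(candidates))
```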
Concluding Remarks
It is becoming increasingly apparent from recent studies that the soluble proteome of the lysosome is more extensive than previously imagined. While over 60 Man6-P-containing proteins are established residents of the lumen of this organelle, analyses of proteins isolated by MPR-affinity chromatography from a variety of mammalian sources10-20 have revealed a significant number of additional proteins that may have lysosomal function. In this study, we surveyed the proteome of MPR-affinity-purified proteins from a broad selection of rat tissues. We used mass spectrometric and biostatistical methods to distinguish specifically purified proteins from nonspecific contaminants, filtering the extensive list of identified proteins with parameters based upon the relative abundance of known lysosomal proteins in specific versus mock affinity column eluates. In concept, this approach resembles the I-DIRT procedure for identifying specific members of a protein complex isolated by affinity tagging of one of its constituents32, the main difference being that we relied on spectral counting rather than isotopic labeling to measure protein abundance.
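The ratio and interval columns of Table 4 can be reproduced by a short calculation, under assumptions that are ours rather than stated in this section: the two count columns immediately before the ratio are taken to be specific and mock spectral counts, the ratio is log2(specific/mock) with no normalization between eluates, and the 95% interval is a Wilson score interval (ref. 29) on the fraction of spectra falling in the specific eluate, mapped back to the log2 scale. The sketch below is a reconstruction for illustration, not the authors' pipeline; under these assumptions it matches, for example, the Carboxypeptidase B2 row (44 and 3 spectra, 3.87 (2.27 to 5.48)).

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile


def wilson_interval(x: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion x/n."""
    if n <= 0:
        raise ValueError("at least one spectrum is required")
    p = x / n
    denom = 1 + z * z / n
    center = p + z * z / (2 * n)
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    # The bounds are exactly 0 and 1 when x is 0 or n; set them explicitly so
    # floating-point round-off cannot turn an infinite log-ratio into a finite one.
    lower = 0.0 if x == 0 else (center - half) / denom
    upper = 1.0 if x == n else (center + half) / denom
    return lower, upper


def log2_ratio(p: float) -> float:
    """Map a fraction of spectra in the specific eluate to log2(specific/mock)."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return math.log2(p / (1 - p))


def enrichment(specific: int, mock: int) -> tuple[float, float, float]:
    """log2(specific/mock) spectral-count ratio with a 95% Wilson-based interval."""
    n = specific + mock
    lower, upper = wilson_interval(specific, n)
    return log2_ratio(specific / n), log2_ratio(lower), log2_ratio(upper)


print(enrichment(44, 3))  # compare Table 4, Carboxypeptidase B2: 3.87 (2.27 to 5.48)
print(enrichment(24, 0))  # compare Table 4, inositol polyphosphate phosphatase 1: ∞ (2.64 to ∞)
```

When a protein has no mock-eluate spectra, the point estimate and upper bound are infinite and the lower bound reduces to log2(n / z²), so it depends only on the number of specific spectra. This is consistent with rows such as P-cadherin (11 spectra, lower bound 1.52) and Latexin (21 spectra, lower bound 2.45).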
When data obtained from all 17 tissues were considered together, no significant conclusions could be drawn for 304/772 of the identified rodent proteins. In most cases, this could be attributed to low spectral counts in both the specific and mock eluates, resulting in an extremely wide 95% confidence interval for the ratio. However, about a third (272/772) of all identified proteins could be confidently excluded from further analysis because they were significantly depleted in the specific compared to the mock eluate. One hundred and ninety-six proteins were significantly enriched in the Man6-P compared to the mock eluate. Of these, 60 are known soluble lysosomal proteins and the remaining 136 are not currently thought to have lysosomal function. Of the latter, 52 were enriched to levels comparable to those of the known lysosomal proteins (Table 4). Some of these proteins (21/52) were also identified in previous proteomic studies of purified Man6-P glycoproteins 11-14, 16-20.
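The three outcomes above follow from where the 95% interval on log2(specific/mock) sits relative to a 1:1 ratio (log2 = 0). A minimal sketch, assuming a protein is called enriched or depleted only when its interval excludes 0; the stricter cut used to select the 52 proteins "comparable to known lysosomal proteins" is not stated in this section and is not modelled. Three interval bounds are copied from Table 4; the other two entries are hypothetical.

```python
from collections import Counter


def call(lower: float, upper: float) -> str:
    """Three-way call from a 95% interval on log2(specific/mock)."""
    if lower > 0:  # whole interval above a 1:1 ratio
        return "enriched"
    if upper < 0:  # whole interval below a 1:1 ratio
        return "depleted"
    return "no conclusion"  # interval spans 0, typically because spectral counts are low


# log2 interval bounds: three rows from Table 4 plus two hypothetical proteins.
intervals = {
    "Carboxypeptidase B2": (2.27, 5.48),
    "P-cadherin": (1.52, float("inf")),
    "Cystatin E/M": (2.93, 9.78),
    "hypothetical_depleted": (-4.1, -1.2),
    "hypothetical_sparse": (-3.0, 3.5),
}

calls = {name: call(lo, hi) for name, (lo, hi) in intervals.items()}
print(Counter(calls.values()))

# Tallies reported in the text for all 17 tissues combined.
reported = {"no conclusion": 304, "depleted": 272, "enriched": 196}
assert sum(reported.values()) == 772
known_lysosomal = 60
print(reported["enriched"] - known_lysosomal)  # enriched proteins not assigned to the lysosome
```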
The enriched proteins that are not assigned to the lysosome fall into numerous functional categories. Many are known or predicted to be hydrolases or other enzymes; as such, they represent promising lysosomal candidates, especially those that resemble known lysosomal proteins and have a widespread tissue distribution. Several proteins fall into this category. Acid sphingomyelinase-like phosphodiesterase 3A (SMPDL3A) has been identified in many studies of purified Man6-P glycoproteins and is a paralog of the lysosomal hydrolase acid sphingomyelinase. Increased expression of SMPDL3A has been observed in bladder cancer, and a role in tumorigenesis has been proposed 33. Retinoid-inducible serine carboxypeptidase (RISC) is a widely distributed protease that colocalizes with lysosome-associated membrane protein 2 and is probably a lysosomal protein 34, 35. FLJ22662 is a paralog of LOC196463, a protein identified previously 14, 17 and recently demonstrated to be lysosomal 36. Based upon sequence homology, both FLJ22662 and LOC196463 may have phosphodiesterase activity.
However, for many of the enriched proteins it is not easy to predict whether a lysosomal function is likely, although we frequently find more than one representative of a particular class of protein. For example, while glycosyltransferase activity would not appear to be a classical lysosomal activity, we find enzymes of this type (including GDP-fucose protein O-fucosyltransferase 2 (POFUT2), beta 3-glycosyltransferase-like and sialyltransferase 1) that are enriched in the Man6-P eluate. These may be ER proteins, some proportion of which is aberrantly decorated with Man6-P, especially as we also purify other proteins that may have ER localization (e.g. procollagen-lysine, 2-oxoglutarate 5-dioxygenase 1 and 2, KDEL containing protein 2 and stress 70 protein chaperone microsome-associated 6). However, it is also possible that these proteins represent a hitherto unsuspected class of lysosomal protein. As noted previously17, we also identified a number of protease inhibitors that appear to contain Man6-P. In this study, we also find a significant number of small leucine-rich proteoglycans.
Enrichment in the specific eluate during affinity purification is consistent with a lysosomal function but does not establish one. For example, a protein that is enriched in the Man6-P eluate could represent a specific contaminant or a non-lysosomal Man6-P glycoprotein rather than a bona fide lysosomal resident. While some of the enriched proteins are unquestionably purified in association with true Man6-P glycoproteins (e.g. cystatins, which lack N-linked glycosylation sites), for many or most of the purified proteins there seems little biological basis to suspect such an interaction, and thus they most probably do contain Man6-P. Earlier studies that directly demonstrated sites of Man6-phosphorylation on a number of apparently non-lysosomal proteins tend to support this conclusion19.
One property that is consistent with localization within the lumen of the lysosome is the presence of a signal sequence. In this study, signal domains are predicted for the vast majority (46/52) of the proteins that are not assigned to the lysosome but are enriched in the specific eluate to the same extent as known lysosomal proteins.
The demonstration here of numerous proteins that are not thought to reside within the lysosome but that likely contain Man6-P raises two intriguing alternatives. First, the proteome of luminal resident lysosomal proteins could be considerably larger and more diverse than is currently thought. If so, the functional significance of the lysosomal residence of these “new” proteins would need to be carefully evaluated. Second, even if they are mannose 6-phosphorylated, these proteins may have no physiological role in the lysosome. For instance, the presence of Man6-P could simply indicate that these proteins are low-affinity substrates for the Man6-phosphotransferase, so that a proportion of a given protein receives the Man6-P modification and is aberrantly targeted to the lysosome, with little biological significance. Alternatively, some non-lysosomal Man6-P glycoproteins may not be efficiently bound by the MPRs under physiological conditions but may nonetheless be isolated when exposed to the high concentration of coupled MPR used in our affinity purification protocol.
To differentiate between these possibilities, methods for the accurate subcellular localization of the candidates identified here will be required. This can be achieved on a case-by-case basis (e.g., by generating appropriate antibody reagents and performing biochemical and morphological localization studies14, 36, 37), but the most promising route currently appears to be combining the resolution of subcellular fractionation with the sensitivity of mass spectrometry for protein identification, together with technical and biostatistical approaches to validate conclusions such as those described here and elsewhere8. In this approach, subcellular fractions enriched for lysosomal activities are prepared and their protein composition is investigated using various mass spectrometric proteomic analyses. In principle, this could be an effective method for characterizing both the soluble component of the lysosome and the membrane proteins associated with this organelle. However, lysosomal subcellular fractions are highly complex samples with significant contamination by other organelles, and this poses technical hurdles for a global, data-independent mass spectrometric approach to cellular localization. As an alternative, the database of lysosomal candidates identified in this study should provide an excellent resource for targeted MS studies that address a subpopulation of candidate proteins within the complex lysosomal fractions.
Supplementary Material
Acknowledgements
This work was supported by NIH grants DK054317 and S10RR017992 (PL). We thank Caifeng Zhao for her excellent assistance with the mass spectrometry.
Footnotes
Supporting Information Available. Raw mass spectrometry data files are available upon request. Supplementary data in the form of an Excel workbook is provided which details lysosomal protein tissue distribution in terms of transcript and spectral counts as well as supporting information for protein assignment and statistical analysis. This information is available free of charge via the Internet at http://pubs.acs.org.
References
- 1. Holtzman E. Lysosomes. Plenum Press; New York: 1989. xvi, 439 p.
- 2. Scriver CR. The metabolic & molecular bases of inherited disease. 8th ed. McGraw-Hill; New York: 2001. 4 v. (xlvii, 6338, I–140 pp.).
- 3. Sleat DE, Jadot M, Lobel P. Lysosomal proteomics and disease. Proteomics - Clinical Applications. 2007. doi: 10.1002/prca.200700250.
- 4. Fan X, Zhang H, Zhang S, Bagshaw RD, Tropak MB, Callahan JW, Mahuran DJ. Identification of the gene encoding the enzyme deficient in mucopolysaccharidosis IIIC (Sanfilippo disease type C). Am J Hum Genet. 2006;79(4):738–44. doi: 10.1086/508068.
- 5. Naureckiene S, Sleat DE, Lackland H, Fensom A, Vanier MT, Wattiaux R, Jadot M, Lobel P. Identification of HE1 as the second gene of Niemann-Pick C disease. Science. 2000;290(5500):2298–301. doi: 10.1126/science.290.5500.2298.
- 6. Bagshaw RD, Mahuran DJ, Callahan JW. A proteomic analysis of lysosomal integral membrane proteins reveals the diverse composition of the organelle. Mol Cell Proteomics. 2005;4(2):133–43. doi: 10.1074/mcp.M400128-MCP200.
- 7. Chataway TK, Whittle AM, Lewis MD, Bindloss CA, Davey RC, Moritz RL, Simpson RJ, Hopwood JJ, Meikle PJ. Two-dimensional mapping and microsequencing of lysosomal proteins from human placenta. Placenta. 1998;19(8):643–54. doi: 10.1016/s0143-4004(98)90026-1.
- 8. Schroder B, Wrocklage C, Pan C, Jager R, Kosters B, Schafer H, Elsasser HP, Mann M, Hasilik A. Integral and associated lysosomal membrane proteins. Traffic. 2007;8(12):1676–86. doi: 10.1111/j.1600-0854.2007.00643.x.
- 9. Ghosh P, Dahms NM, Kornfeld S. Mannose 6-phosphate receptors: new twists in the tale. Nat Rev Mol Cell Biol. 2003;4(3):202–12. doi: 10.1038/nrm1050.
- 10. Sleat DE, Sohar I, Lackland H, Majercak J, Lobel P. Rat brain contains high levels of mannose-6-phosphorylated glycoproteins including lysosomal enzymes and palmitoyl-protein thioesterase, an enzyme implicated in infantile neuronal lipofuscinosis. J Biol Chem. 1996;271(32):19191–8. doi: 10.1074/jbc.271.32.19191.
- 11. Czupalla C, Mansukoski H, Riedl T, Thiel D, Krause E, Hoflack B. Proteomic analysis of lysosomal acid hydrolases secreted by osteoclasts: implications for lytic enzyme transport and bone metabolism. Mol Cell Proteomics. 2006;5(1):134–43. doi: 10.1074/mcp.M500291-MCP200.
- 12. Journet A, Chapel A, Kieffer S, Louwagie M, Luche S, Garin J. Towards a human repertoire of monocytic lysosomal proteins. Electrophoresis. 2000;21(16):3411–9. doi: 10.1002/1522-2683(20001001)21:16<3411::AID-ELPS3411>3.0.CO;2-M.
- 13. Journet A, Chapel A, Kieffer S, Roux F, Garin J. Proteomic analysis of human lysosomes: application to monocytic and breast cancer cells. Proteomics. 2002;2(8):1026–40. doi: 10.1002/1615-9861(200208)2:8<1026::AID-PROT1026>3.0.CO;2-I.
- 14. Kollmann K, Mutenda KE, Balleininger M, Eckermann E, von Figura K, Schmidt B, Lubke T. Identification of novel lysosomal matrix proteins by proteome analysis. Proteomics. 2005;5(15):3966–78. doi: 10.1002/pmic.200401247.
- 15. Sleat DE, Kraus SR, Sohar I, Lackland H, Lobel P. alpha-Glucosidase and N-acetylglucosamine-6-sulphatase are the major mannose-6-phosphate glycoproteins in human urine. Biochem J. 1997;324(Pt 1):33–9. doi: 10.1042/bj3240033.
- 16. Sleat DE, Lackland H, Wang Y, Sohar I, Xiao G, Li H, Lobel P. The human brain mannose 6-phosphate glycoproteome: a complex mixture composed of multiple isoforms of many soluble lysosomal proteins. Proteomics. 2005;5(6):1520–32. doi: 10.1002/pmic.200401054.
- 17. Sleat DE, Wang Y, Sohar I, Lackland H, Li Y, Li H, Zheng H, Lobel P. Identification and validation of mannose 6-phosphate glycoproteins in human plasma reveal a wide range of lysosomal and non-lysosomal proteins. Mol Cell Proteomics. 2006;5(10):1942–56. doi: 10.1074/mcp.M600030-MCP200.
- 18. Sleat DE, Zheng H, Lobel P. The human urine mannose 6-phosphate glycoproteome. Biochim Biophys Acta. 2006. doi: 10.1016/j.bbapap.2006.12.004.
- 19. Sleat DE, Zheng H, Qian M, Lobel P. Identification of sites of mannose 6-phosphorylation on lysosomal proteins. Mol Cell Proteomics. 2006;5(4):686–701. doi: 10.1074/mcp.M500343-MCP200.
- 20. Qian M, Sleat DE, Zheng H, Moore D, Lobel P. Proteomics analysis of serum from mutant mice reveals lysosomal proteins selectively transported by each of the two mannose 6-phosphate receptors. Mol Cell Proteomics. 2008;7(1):58–70. doi: 10.1074/mcp.M700217-MCP200.
- 21. Bots M, Medema JP. Granzymes at a glance. J Cell Sci. 2006;119(Pt 24):5011–4. doi: 10.1242/jcs.03239.
- 22. Bradford MM. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem. 1976;72:248–54. doi: 10.1006/abio.1976.9999.
- 23. Beavis RC. Using the global proteome machine for protein identification. Methods Mol Biol. 2006;328:217–28. doi: 10.1385/1-59745-026-X:217.
- 24. Craig R, Cortens JP, Beavis RC. Open source system for analyzing, validating, and storing protein identification data. J Proteome Res. 2004;3(6):1234–42. doi: 10.1021/pr049882h.
- 25. Fink JL, Aturaliya RN, Davis MJ, Zhang F, Hanson K, Teasdale MS, Kai C, Kawai J, Carninci P, Hayashizaki Y, Teasdale RD. LOCATE: a mouse protein subcellular localization database. Nucleic Acids Res. 2006;34(Database issue):D213–7. doi: 10.1093/nar/gkj069.
- 26. Sprenger J, Lynn Fink J, Karunaratne S, Hanson K, Hamilton NA, Teasdale RD. LOCATE: a mammalian protein subcellular localization database. Nucleic Acids Res. 2008;36(Database issue):D230–3. doi: 10.1093/nar/gkm950.
- 27. Liu H, Sadygov RG, Yates JR 3rd. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal Chem. 2004;76(14):4193–201. doi: 10.1021/ac0498563.
- 28. Zhang B, VerBerkmoes NC, Langston MA, Uberbacher E, Hettich RL, Samatova NF. Detecting differential and correlated protein expression in label-free shotgun proteomics. J Proteome Res. 2006;5(11):2909–18. doi: 10.1021/pr0600273.
- 29. Wilson EB. Probable Inference, the Law of Succession, and Statistical Inference. Journal of the American Statistical Association. 1927;22:209–212.
- 30. Glombitza GJ, Becker E, Kaiser HW, Sandhoff K. Biosynthesis, processing, and intracellular transport of GM2 activator protein in human epidermal keratinocytes. The lysosomal targeting of the GM2 activator is independent of a mannose-6-phosphate signal. J Biol Chem. 1997;272(8):5199–207. doi: 10.1074/jbc.272.8.5199.
- 31. Rigat B, Wang W, Leung A, Mahuran DJ. Two mechanisms for the recapture of extracellular GM2 activator protein: evidence for a major secretory form of the protein. Biochemistry. 1997;36(27):8325–31. doi: 10.1021/bi970571c.
- 32. Tackett AJ, DeGrasse JA, Sekedat MD, Oeffinger M, Rout MP, Chait BT. I-DIRT, a general method for distinguishing between specific and nonspecific protein interactions. J Proteome Res. 2005;4(5):1752–6. doi: 10.1021/pr050225e.
- 33. Wright KO, Messing EM, Reeder JE. Increased expression of the acid sphingomyelinase-like protein ASML3a in bladder tumors. J Urol. 2002;168(6):2645–9. doi: 10.1016/S0022-5347(05)64236-X.
- 34. Chen J, Streb JW, Maltby KM, Kitchen CM, Miano JM. Cloning of a novel retinoid-inducible serine carboxypeptidase from vascular smooth muscle cells. J Biol Chem. 2001;276(36):34175–81. doi: 10.1074/jbc.M104162200.
- 35. Lee TH, Streb JW, Georger MA, Miano JM. Tissue expression of the novel serine carboxypeptidase Scpep1. J Histochem Cytochem. 2006;54(6):701–11. doi: 10.1369/jhc.5A6894.2006.
- 36. Jensen AG, Chemali M, Chapel A, Kieffer-Jaquinod S, Jadot M, Garin J, Journet A. Biochemical characterization and lysosomal localization of the mannose-6-phosphate protein p76 (hypothetical protein LOC196463). Biochem J. 2007;402(3):449–58. doi: 10.1042/BJ20061205.
- 37. Della Valle MC, Sleat DE, Sohar I, Wen T, Pintar JE, Jadot M, Lobel P. Demonstration of lysosomal localization for the mammalian ependymin-related protein using classical approaches combined with a novel density shift method. J Biol Chem. 2006;281(46):35436–45. doi: 10.1074/jbc.M606208200.