Protein Science. 2004 May;13(5):1422–1425. doi: 10.1110/ps.03511604

Identification and analysis of polyserine linker domains in prokaryotic proteins with emphasis on the marine bacterium Microbulbifer degradans

Michael B. Howard, Nathan A. Ekborg, Larry E. Taylor, Steven W. Hutcheson, Ronald M. Weiner
PMCID: PMC2286767  PMID: 15075401

Abstract

Polyserine linkers (PSLs) are interdomain, serine-rich sequences found in modular proteins. They are common among eukaryotes but uncommon in prokaryotic enzymes. We identified 46 extracellular Microbulbifer degradans proteins that are involved in complex carbohydrate degradation. Each contains PSLs separating carbohydrate-binding or catalytic domains from other binding domains. In nine M. degradans proteins, PSLs also separate amino-terminal lipoprotein acylation sites from the remainder of the polypeptide. In addition, 76 PSL proteins were identified in sequence repositories, and 65 of these are annotated as involved in complex carbohydrate degradation. We discuss the notion that PSLs are flexible, disordered spacer regions that enhance substrate accessibility.

Keywords: Microbulbifer degradans, polyserine linker, secreted carbohydrases, domain linkers, marine bacterium 2–40, extracellular depolymerases


Functional domains (e.g., catalytic and binding domains) within some prokaryotic carbohydrases are separated by linker regions. These linkers consist of simple or repetitive sequence rich in proline, threonine, serine, or glycine (Knowles et al. 1987; Gilkes et al. 1988, 1991; Tomme et al. 1988; Shen et al. 1991; Beguin and Aubert 1994). Domain linkers have been hypothesized to give proteins a flexible region or to increase the distance between active domains, presumably to optimize interaction with substrates (Bhandari et al. 1986; Burton et al. 1989; Radford et al. 1989; Ferreira et al. 1990). Polyserine linker domains (PSLs) are thought to be flexible. This contrasts with the more common proline-rich linkers, which are predicted to adopt extended, rigid conformations (Shen et al. 1991).

Domain linkers composed predominantly of serine are rare. The soil bacterium Cellvibrio japonicus (formerly Pseudomonas fluorescens subsp. cellulosa) encodes multiple PSL-containing carbohydrases, which have been characterized extensively by Gilbert and colleagues (Hall et al. 1989; Kellett et al. 1990; Millward-Sadler et al. 1995; Brown et al. 2001; McKie et al. 2001). Thirteen of these PSL enzymes have now been sequenced and characterized. Until the genome sequence of Microbulbifer degradans strain 2–40 became available, C. japonicus was the only prokaryote known to encode more than two PSL proteins. M. degradans is a γ-proteobacterium isolated from decaying salt marsh cord grass in the Chesapeake Bay watershed (Andrykovich and Marx 1988). It has been shown to depolymerize and metabolize more than 10 insoluble complex polysaccharides (ICP), including agar, chitin, alginate, xylan, pectin, cellulose, pullulan, fucoidan, laminarin, and starch (Gonzalez and Weiner 2000; Howard et al. 2003). Here we report that 46 secreted carbohydrases and related proteins in M. degradans contain PSLs that separate a variety of functional domains.

Materials and methods

PSL-containing proteins were identified among protein sequences translated from 140 completed microbial genomes. Where possible, sequences from the 125 unfinished microbial genomes listed at the National Center for Biotechnology Information microbial genome home page (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/Complete.html) were also searched. Nonredundant, annotated protein sequence databases were searched for PSL proteins using the PIR pattern/peptide match program at the Protein Information Resource server (http://pir.georgetown.edu/). The domain architecture of each PSL protein was analyzed using the Simple Modular Architecture Research Tool (http://smart.embl-heidelberg.de). Type II secretion signals were identified using the iPSORT program (http://www.hypothesiscreator.net/iPSORT). Lipoprotein acylation sites were identified at the DOLOP Web site (http://www.mrc-lmb.cam.ac.uk/genomes/dolop).
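The exact PIR patterns used in the search are not given here. As a rough illustration, the sketch below flags serine-rich linker candidates in a translated protein sequence. It is a minimal sliding-window screen, not the authors' method. A fixed window moves along the sequence, and windows whose serine fraction passes a threshold are merged into spans. The window length (15) and threshold (60% serine) are hypothetical parameters chosen only for illustration.

```python
def serine_rich_regions(seq: str, window: int = 15, min_serine: float = 0.6) -> list:
    """Return half-open (start, end) spans of serine-rich sequence.

    A window qualifies when its serine fraction is at least `min_serine`.
    Overlapping or adjacent qualifying windows are merged, and each merged
    span is trimmed to its first and last serine. Both parameters are
    illustrative assumptions, not published PSL criteria.
    """
    if window <= 0 or not 0 < min_serine <= 1:
        raise ValueError("window must be positive and min_serine in (0, 1]")
    seq = seq.upper()
    merged = []
    for i in range(len(seq) - window + 1):
        if seq[i:i + window].count("S") / window >= min_serine:
            if merged and i <= merged[-1][1]:
                merged[-1] = (merged[-1][0], i + window)  # extend the current span
            else:
                merged.append((i, i + window))            # start a new span
    trimmed = []
    for start, end in merged:
        region = seq[start:end]
        # Every merged span contains serine because min_serine > 0.
        trimmed.append((start + region.find("S"), start + region.rfind("S") + 1))
    return trimmed


# Hypothetical protein: Met, 20 Ala, a 20-residue serine tract, then 20 Lys.
protein = "M" + "A" * 20 + "S" * 20 + "K" * 20
for start, end in serine_rich_regions(protein):
    print(start, end, protein[start:end])
```

Flagged spans would still need inspection, for example with a domain-architecture tool such as SMART, to confirm that they actually lie between predicted domains.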

Results and Discussion

Forty-six M. degradans genes were identified that encode proteins with PSLs. Of these, 18 contain a single PSL and 28 contain two or more (Supplementary Table 1). The PSLs average 39 residues in length, with an average composition of 79% serine, 11% glycine, 7% threonine, and 3% alanine. Glycine residues occur predominantly immediately flanking tracts of polyserine sequence (Fig. 1), and more than 80% of the PSLs have a glycine residue at their start or terminus. Several PSLs also contain a single aspartic acid or cysteine residue. Although serine is the predominant residue in each PSL, no two PSLs are identical in exact residue composition or sequence. All six serine codons are used to encode the PSLs. None is used preferentially, and the codons are not arranged in any obvious pattern or repeat.
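The composition figures above are averages over many linkers. The minimal sketch below shows how per-linker composition and glycine flanking could be tallied. The example linkers are hypothetical strings, not sequences from Supplementary Table 1. The sketch averages per-linker percentages; pooling all residues from all linkers instead can give slightly different values.

```python
from collections import Counter
from statistics import mean


def composition(linker: str, residues: str = "SGTA") -> dict:
    """Percentage of each residue of interest within one linker."""
    linker = linker.upper()
    counts = Counter(linker)
    return {r: 100.0 * counts[r] / len(linker) for r in residues}


def mean_composition(linkers: list, residues: str = "SGTA") -> dict:
    """Average of the per-linker percentages across a set of linkers."""
    per_linker = [composition(seq, residues) for seq in linkers]
    return {r: mean(c[r] for c in per_linker) for r in residues}


def glycine_flanked(linker: str) -> bool:
    """True if the linker begins or ends with a glycine residue."""
    linker = linker.upper()
    return linker.startswith("G") or linker.endswith("G")


# Hypothetical linkers, for illustration only.
linkers = [
    "G" + "S" * 6 + "T" + "S" * 6 + "A" + "S" * 6 + "G",
    "SSSSGSSSSSSSSS",
    "GSSSTSSSSSSSS",
]

print(mean_composition(linkers))
flanked = sum(glycine_flanked(seq) for seq in linkers) / len(linkers)
print(f"{100 * flanked:.0f}% of linkers start or end with glycine")
```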

Figure 1.

An example of a polyserine linker (PSL) protein from Microbulbifer degradans. The sequence of ZP_00064986, a predicted cellulase, contains two binding domains (underlined) separated by a PSL. A second PSL separates the second binding domain from the remainder of the protein, which contains the catalytic domain. Glycine residues (bold G’s) flank each PSL. The secretion signal is double-underlined.

All 46 M. degradans PSL proteins fall into one of three groups: carbohydrate-depolymerizing enzymes, carbohydrate-binding proteins, or proteins similar to known proteins involved in carbohydrate degradation. These include two chitinases, eight cellulases, 10 pectate lyases, five xylanases, three mannanases, a rhamnogalacturonan lyase, an alginate lyase, and 16 proteins of unknown function. No activity could be predicted for those 16 proteins. However, each has weak similarity to a known degradative enzyme or contains a region similar to a known carbohydrate-binding module or catalytic domain. Where no sequence similarity was identified, the PSLs divided the proteins into segments large enough to contain as-yet unidentified catalytic sites or carbohydrate-binding modules. Each of the 46 PSL-containing proteins contains a Type II secretion signal.
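As a quick arithmetic check, the functional categories listed above account for all 46 PSL proteins, 30 of which have a predicted activity. The counts below are taken directly from the text.

```python
# Counts of M. degradans PSL proteins by predicted function, as given in the text.
categories = {
    "chitinase": 2,
    "cellulase": 8,
    "pectate lyase": 10,
    "xylanase": 5,
    "mannanase": 3,
    "rhamnogalacturonan lyase": 1,
    "alginate lyase": 1,
    "unknown function": 16,
}

total = sum(categories.values())
predicted = total - categories["unknown function"]
print(total, predicted)  # 46 30
```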

In M. degradans, PSLs always separate predicted binding or catalytic domains (Fig. 1). Interestingly, in nine proteins a PSL immediately follows the secretion signal. All nine of these proteins contain an apparent lipoprotein acylation site, which has three features:
- at least one positively charged residue within the first five amino acids,
- a hydrophobic stretch of 8–10 residues, and
- a lipobox containing the appropriately conserved amino acids, including the requisite cysteine residue.

In Gram-negative bacteria, acylation of the lipobox cysteine anchors the protein to the inner or outer membrane (Madan Babu and Sankaran 2002). This is the first report of PSLs separating an anchoring domain from the remainder of a protein.
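The three lipoprotein criteria listed above can be expressed as a simple sequence screen. This is a minimal sketch, not DOLOP's algorithm, and it rests on several assumptions:
- the lipobox follows the commonly cited consensus [LVI][ASTVI][GAS]C;
- only Lys and Arg count as positively charged;
- the hydrophobic residue set is a hypothetical choice;
- only the first 40 residues are searched;
- the hydrophobic stretch has a minimum length of 8, and the 10-residue upper bound is not enforced.

```python
import re

# Assumed lipobox consensus; the lookahead lets candidate matches overlap.
LIPOBOX = re.compile(r"(?=[LVI][ASTVI][GAS]C)")
POSITIVE = set("KR")          # assumption: His is not counted as positively charged
HYDROPHOBIC = set("AVLIMFW")  # assumption: simple hydrophobic residue set


def longest_run(segment: str, alphabet: set) -> int:
    """Length of the longest contiguous stretch of residues drawn from `alphabet`."""
    best = current = 0
    for residue in segment:
        current = current + 1 if residue in alphabet else 0
        best = max(best, current)
    return best


def find_lipobox_cysteine(seq: str, min_hydrophobic: int = 8, search_limit: int = 40):
    """Return the 0-based index of a putative lipobox cysteine, or None.

    Checks for a Lys/Arg in the first five residues, a hydrophobic stretch of
    at least `min_hydrophobic` residues before the cysteine, and a lipobox
    ending in that cysteine. All thresholds are illustrative.
    """
    seq = seq.upper()
    if not POSITIVE & set(seq[:5]):
        return None
    for match in LIPOBOX.finditer(seq[:search_limit]):
        cys = match.start() + 3  # the cysteine is the 4th lipobox position
        if longest_run(seq[:cys], HYDROPHOBIC) >= min_hydrophobic:
            return cys
    return None


# Hypothetical lipoprotein-like N terminus followed by a serine tract.
print(find_lipobox_cysteine("MKRLALLLALALLAGC" + "S" * 20))  # 15
```

A real screen would also weigh signal-peptide predictions. This sketch only captures the three features named in the text.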

Forty-two of the 46 genes encoding PSL proteins are unique within the M. degradans genome sequence; the remaining four form two pairs of paralogs. The first pair encodes two predicted pectate lyases (ZP_00067834 and ZP_00067832). These share more than 75% identity across a carbohydrate-binding domain and a fibronectin type III domain, and more than 80% identity between the sequences corresponding to their catalytic domains. However, the nucleotide sequences of their similarly located PSLs are less than 20% identical. Likewise, two cellulases (ZP_00066178 and ZP_00068260) show significant similarity at the nucleotide level except in their PSLs. In C. japonicus, the genes for XylB and XylC lie in tandem in the genome and share a duplicated sequence at their amino termini that includes a PSL. In neither organism did we find a duplicated gene pair in which only one copy encodes a PSL. Thus, it appears that PSLs were generated neither by a known transposition mechanism nor by a recent, repetitive duplication event.
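One contributing factor to low PSL nucleotide identity is codon degeneracy. Serine has six codons split between two unrelated families (TCN and AGY), so two linkers can encode identical polyserine while sharing few or no nucleotides at aligned positions. The sketch below illustrates this with hypothetical, pre-aligned, gap-free coding sequences. Real paralog comparisons would require a gapped alignment, so the simple ungapped identity used here is an assumption for illustration.

```python
from collections import Counter

# The six standard-code serine codons (TCN and AGY families).
SERINE_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def codons(dna: str) -> list:
    """Split an in-frame coding sequence into codons (a trailing partial codon is dropped)."""
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical coding sequences: both encode Ser6, but with different codons.
psl_a = "TCT" "AGC" "TCA" "AGT" "TCG" "AGC"
psl_b = "AGC" "TCT" "AGT" "TCC" "AGC" "TCA"

both_serine = all(c in SERINE_CODONS for c in codons(psl_a) + codons(psl_b))
print(both_serine)                     # True: both encode SSSSSS
print(percent_identity(psl_a, psl_b))  # 0.0: no aligned nucleotide positions match
print(Counter(codons(psl_a)))          # codon usage within one linker
```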

Interestingly, eight of the M. degradans PSL proteins are most similar to C. japonicus enzymes, with conserved sequence, overall domain architecture, and PSL location. Horizontal transfer is known to play a role in the acquisition of new genetic material by bacteria, though it often occurs in specific ecological niches, such as the rumen (Netherwood et al. 1999; Garcia-Vallve et al. 2000). It is unlikely that C. japonicus, a soil bacterium, and M. degradans, a marine bacterium, have recently shared a common environment. Thus, these genes may have been exchanged before the two organisms adapted to different habitats, or they may have been inherited from a common ancestor (Warren et al. 1986). In either case, these domain arrangements have been conserved over a long evolutionary period, suggesting that the placement of the domains and PSLs within each enzyme is functionally significant.

Beyond the PSL proteins of M. degradans and C. japonicus, 17 PSL proteins were identified in searches of the nonredundant database and of complete and incomplete microbial genome sequences (Supplementary Table 2). No PSL proteins were identified among the archaea. Cellulose-degrading enzymes with PSLs were identified in Pseudomonas sp. ND137, Xylella fastidiosa strain Temecula1, Xylella fastidiosa strain 9a5c, and Ruminococcus albus. Erwinia chrysanthemi encodes OutD, a pectic enzyme secretion protein that contains a PSL. However, none of these species encodes more than one PSL protein. Supplementary Table 2 also lists several other PSL proteins that lack currently characterized domains.

Several observations suggest that PSLs are flexible. First, the NORSp program (Liu and Rost 2003) predicts that PSLs lack regular secondary structure and instead form extended ‘loopy’ regions. Second, lipovitellin (a eukaryotic protein that contains a polyserine region) was partially crystallized, but the polyserine region was not included in the crystal structure (Anderson et al. 1998). This is consistent with the notion that disordered regions are not amenable to crystallization. Finally, glycine residues flank more than 80% of the PSLs in M. degradans proteins. These residues may increase the overall flexibility of these regions, as the flexibility of glycine is well documented (Ladurner and Fersht 1997; Krieger et al. 2003). Taken together, these observations suggest that PSLs are disordered, flexible spacers.

During the degradation of ICP, a flexible linker coupling a catalytic and a binding domain could expand the substrate area accessible to the enzyme once its carbohydrate-binding module has contacted a polymer. Similarly, PSLs would enhance substrate availability to an enzyme anchored to the bacterial outer membrane. This could be a survival advantage in the marine environment, where diffusion and dilution are major factors affecting extracellular enzymes. In nine M. degradans enzymes and in several hypothetical proteins from other organisms (Supplementary Table 2), PSLs lie immediately after an amino-terminal lipobox. This suggests that PSLs can extend the catalytic or binding domains of a surface-associated enzyme away from the outer membrane. Our searches covered existing prokaryotic genome databases, the known enzymes of C. japonicus, the nonredundant database, and the M. degradans genome. On the basis of these searches, we posit that, in prokaryotes, PSLs are generally found in two kinds of proteins: secreted enzymes that depolymerize complex polysaccharides, and proteins involved in carbohydrate binding or metabolism. In both, PSLs serve to potentiate interaction with substrates.
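The sketch below gives a rough sense of scale for the spacer argument. It brackets the reach of a 39-residue linker, the average PSL length reported above, between two idealized limits. The upper limit is a fully extended chain (contour length, about 3.8 Å per residue). The lower limit is an ideal freely jointed chain, whose root-mean-square end-to-end distance is b·√N. Both are textbook approximations assumed here, not measurements of PSLs. Real disordered chains are stiffer than the freely jointed model, so their typical end-to-end distance should fall between the two values.

```python
import math

# Approximate distance between consecutive C-alpha atoms in a trans peptide chain (Å).
CA_CA = 3.8


def contour_length(n_residues: int) -> float:
    """Upper bound: span of a fully extended chain, one 3.8 Å segment per residue."""
    return n_residues * CA_CA


def rms_end_to_end(n_residues: int) -> float:
    """Ideal freely jointed chain: RMS end-to-end distance = b * sqrt(N)."""
    return CA_CA * math.sqrt(n_residues)


n = 39  # average PSL length reported in this study
print(f"contour length ~ {contour_length(n):.0f} Å")
print(f"freely jointed RMS end-to-end ~ {rms_end_to_end(n):.0f} Å")
```

For n = 39 this gives roughly 148 Å and 24 Å. Treating each residue as one segment is itself an approximation.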

M. degradans encodes 46 PSL proteins involved in complex carbohydrate degradation. It is also predicted to encode nearly twice as many extracellular carbohydrases whose domains are not separated by repetitive linker sequence. Similarly, C. japonicus encodes carbohydrases that lack PSLs. Deleting the polyserine linkers from two C. japonicus xylanases decreases their activity on insoluble substrates, but it neither abolishes activity nor reduces binding (Black et al. 1996; Rixon et al. 1996). Possibly relevant as well, threonine/proline-rich linkers have been shown to be dispensable, with only moderate loss of activity (Ferreira et al. 1990). These observations indicate that PSLs are not absolutely required for carbohydrase function. Instead, they may have evolved to enhance the activity of certain enzyme configurations, particularly during in situ degradation of ICP. PSL coding sequences vary, but the encoded serine-rich amino acid sequences are conserved, suggesting specific structural constraints associated with an advantageous function.

Microbulbifer degradans is unique among marine bacteria in its ability to degrade more than 10 ICP. Its draft genome sequence reveals more than 130 putative carbohydrases involved in degrading them. Forty-six of these proteins contain PSLs, and PSLs are limited to secreted enzymes involved in ICP degradation. Together, these findings underscore the importance of the PSL motif in carbohydrate catalysis in nature.

Electronic supplemental material

The supplemental material contains: Supplementary Table 1, Microbulbifer degradans proteins containing polyserine linkers; Supplementary Table 2, Additional PSL proteins among prokaryotes.

Acknowledgments

This research was supported by funds from the Maryland Sea Grant (NA16RG2207) and the National Science Foundation (DEB0109869).

We thank the Joint Genome Institute of the United States Department of Energy for sequencing the Microbulbifer degradans 2–40 genome, and J. Bretz for his valuable comments on the manuscript.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

Article published ahead of print. Article and publication date are at http://www.proteinscience.org/cgi/doi/10.1110/ps.03511604.

Supplemental material: see www.proteinscience.org

References

  1. Anderson, T.A., Levitt, D.G., and Banaszak, L.J. 1998. The structural basis of lipid interactions in lipovitellin, a soluble lipoprotein. Structure 6: 895–909.
  2. Andrykovich, G. and Marx, I. 1988. Isolation of a new polysaccharide-digesting bacterium from a salt marsh. Appl. Environ. Microbiol. 54: 1061–1062.
  3. Beguin, P. and Aubert, J.P. 1994. The biological degradation of cellulose. FEMS Microbiol. Rev. 13: 25–58.
  4. Bhandari, D.G., Levine, B.A., Trayer, I.P., and Yeadon, M.E. 1986. 1H-NMR study of mobility and conformational constraints within the proline-rich N-terminal of the LC1 alkali light chain of skeletal myosin. Correlation with similar segments in other protein systems. Eur. J. Biochem. 160: 349–356.
  5. Black, G.W., Rixon, J.E., Clarke, J.H., Hazlewood, G.P., Theodorou, M.K., Morris, P., and Gilbert, H.J. 1996. Evidence that linker sequences and cellulose-binding domains enhance the activity of hemicellulases against complex substrates. Biochem. J. 319: 515–520.
  6. Brown, I.E., Mallen, M.H., Charnock, S.J., Davies, G.J., and Black, G.W. 2001. Pectate lyase 10A from Pseudomonas cellulosa is a modular enzyme containing a family 2a carbohydrate-binding module. Biochem. J. 355: 155–165.
  7. Burton, J., Wood, S.G., Pedyczak, A., and Siemion, I.Z. 1989. Conformational preferences of sequential fragments of the hinge region of human IgA1 immunoglobulin molecule: II. Biophys. Chem. 33: 39–45.
  8. Ferreira, L.M., Durrant, A.J., Hall, J., Hazlewood, G.P., and Gilbert, H.J. 1990. Spatial separation of protein domains is not necessary for catalytic activity or substrate binding in a xylanase. Biochem. J. 269: 261–264.
  9. Garcia-Vallve, S., Romeu, A., and Palau, J. 2000. Horizontal gene transfer of glycosyl hydrolases of the rumen fungi. Mol. Biol. Evol. 17: 352–361.
  10. Gilkes, N.R., Warren, R.A., Miller Jr., R.C., and Kilburn, D.G. 1988. Precise excision of the cellulose binding domains from two Cellulomonas fimi cellulases by a homologous protease and the effect on catalysis. J. Biol. Chem. 263: 10401–10407.
  11. Gilkes, N.R., Henrissat, B., Kilburn, D.G., Miller Jr., R.C., and Warren, R.A. 1991. Domains in microbial β-1,4-glycanases: Sequence conservation, function, and enzyme families. Microbiol. Rev. 55: 303–315.
  12. Gonzalez, J. and Weiner, R. 2000. Phylogenetic characterization of marine bacterium strain 2–40, a degrader of complex polysaccharides. Int. J. Syst. Evol. Microbiol. 50: 831–834.
  13. Hall, J., Hazlewood, G.P., Huskisson, N.S., Durrant, A.J., and Gilbert, H.J. 1989. Conserved serine-rich sequences in xylanase and cellulase from Pseudomonas fluorescens subspecies cellulosa: Internal signal sequence and unusual protein processing. Mol. Microbiol. 3: 1211–1219.
  14. Howard, M.B., Ekborg, N.A., Taylor, L.E., Weiner, R.M., and Hutcheson, S.W. 2003. Genomic analysis and initial characterization of the chitinolytic system of Microbulbifer degradans strain 2–40. J. Bacteriol. 185: 3352–3360.
  15. Kellett, L.E., Poole, D.M., Ferreira, L.M., Durrant, A.J., Hazlewood, G.P., and Gilbert, H.J. 1990. Xylanase B and an arabinofuranosidase from Pseudomonas fluorescens subsp. cellulosa contain identical cellulose-binding domains and are encoded by adjacent genes. Biochem. J. 272: 369–376.
  16. Knowles, J., Lehtovaara, P., Penttila, M., Teeri, T., Harkki, A., and Salovuori, I. 1987. The cellulase genes of Trichoderma. Antonie Van Leeuwenhoek 53: 335–341.
  17. Krieger, F., Fierz, B., Bieri, O., Drewello, M., and Kiefhaber, T. 2003. Dynamics of unfolded polypeptide chains as model for the earliest steps in protein folding. J. Mol. Biol. 332: 265–274.
  18. Ladurner, A.G. and Fersht, A.R. 1997. Glutamine, alanine or glycine repeats inserted into the loop of a protein have minimal effects on stability and folding rates. J. Mol. Biol. 273: 330–337.
  19. Liu, J. and Rost, B. 2003. NORSp: Predictions of long regions without regular secondary structure. Nucleic Acids Res. 31: 3833–3835.
  20. Madan Babu, M. and Sankaran, K. 2002. DOLOP: Database of bacterial lipoproteins. Bioinformatics 18: 641–643.
  21. McKie, V.A., Vincken, J.P., Voragen, A.G., van den Broek, L.A., Stimson, E., and Gilbert, H.J. 2001. A new family of rhamnogalacturonan lyases contains an enzyme that binds to cellulose. Biochem. J. 355: 167–177.
  22. Millward-Sadler, S.J., Davidson, K., Hazlewood, G.P., Black, G.W., Gilbert, H.J., and Clarke, J.H. 1995. Novel cellulose-binding domains, NodB homologues and conserved modular architecture in xylanases from the aerobic soil bacteria Pseudomonas fluorescens subsp. cellulosa and Cellvibrio mixtus. Biochem. J. 312 (Pt. 1): 39–48.
  23. Netherwood, T., Bowden, R., Harrison, P., O’Donnell, A.G., Parker, D.S., and Gilbert, H.J. 1999. Gene transfer in the gastrointestinal tract. Appl. Environ. Microbiol. 65: 5139–5141.
  24. Radford, S.E., Laue, E.D., Perham, R.N., Martin, S.R., and Appella, E. 1989. Conformational flexibility and folding of synthetic peptides representing an interdomain segment of polypeptide chain in the pyruvate dehydrogenase multienzyme complex of Escherichia coli. J. Biol. Chem. 264: 767–775.
  25. Rixon, J.E., Clarke, J.H., Hazlewood, G.P., Hoyland, R.W., McCarthy, A.J., and Gilbert, H.J. 1996. Do the non-catalytic polysaccharide-binding domains and linker regions enhance the biobleaching properties of modular xylanases? Appl. Microbiol. Biotechnol. 46: 514–520.
  26. Shen, H., Schmuck, M., Pilz, I., Gilkes, N.R., Kilburn, D.G., Miller Jr., R.C., and Warren, R.A. 1991. Deletion of the linker connecting the catalytic and cellulose-binding domains of endoglucanase A (CenA) of Cellulomonas fimi alters its conformation and catalytic activity. J. Biol. Chem. 266: 11335–11340.
  27. Tomme, P., Van Tilbeurgh, H., Pettersson, G., Van Damme, J., Vandekerckhove, J., Knowles, J., Teeri, T., and Claeyssens, M. 1988. Studies of the cellulolytic system of Trichoderma reesei QM 9414. Analysis of domain function in two cellobiohydrolases by limited proteolysis. Eur. J. Biochem. 170: 575–581.
  28. Warren, R.A., Beck, C.F., Gilkes, N.R., Kilburn, D.G., Langsford, M.L., Miller Jr., R.C., O’Neill, G.P., Scheufens, M., and Wong, W.K. 1986. Sequence conservation and region shuffling in an endoglucanase and an exoglucanase from Cellulomonas fimi. Proteins 1: 335–341.
