Self Nonself. 2010 Jan-Mar;1(1):71–74. doi: 10.4161/self.1.1.9588

Bacterial peptides are intensively present throughout the human proteome

Brett Trost 1, Anthony Kusalik 1, Guglielmo Lucchese 2, Darja Kanduc 2
PMCID: PMC3091599  PMID: 21559180

Abstract

Forty bacterial proteomes—20 pathogens and 20 non-pathogens—were examined for amino acid sequence similarity to the human proteome. All bacterial proteomes, independent of their pathogenicity, share hundreds of nonamer sequences with the human proteome. This overlap is very widespread, with one third of human proteins sharing at least one nonapeptide with one of these bacteria. On the whole, the bacteria-versus-human nonamer overlap is numerically defined by 47,610 total perfect matches disseminated through 10,701 human proteins. These findings open new perspectives on the immune relationship between bacteria and host, and might help our understanding of fundamental phenomena such as self-nonself discrimination and tolerance versus auto-reactivity.

Keywords: bacterial proteomes, human proteome, similarity screening, peptide sharing, self-nonself discrimination, tolerance versus auto-reactivity

Introduction

The completeness of the current protein databases represents a scientific turning point for comparatively analyzing and evaluating commonalities and differences among well-defined available proteomes. Our labs are taking advantage of this unique chance to investigate the molecular determinants possibly involved in human susceptibility to infectious agents.1–3 We recently analyzed a set of viral proteomes for sequence similarity to the human proteome, and reported a massive and widespread peptide overlap between viral and human proteins.4,5 Here we analyze a set of 20 pathogenic and 20 non-pathogenic bacterial proteomes, and report that all of the bacterial proteomes studied exhibit an unexpectedly high level of peptide sharing with the human proteome, irrespective of the microbe's pathogenicity.

Results and Discussion

Quantitative analysis of nonapeptide overlap between bacterial proteomes and the human proteome is reported in Table 1. The table shows that all 40 bacterial proteomes under analysis exhibit substantial, widespread nonamer overlap with the human proteome.

Table 1.

Overlap between bacterial proteomes and the human proteome at the nonamer level

Taxonomic ID (a) Bacterium name (a) (1) (2) (3) (4) (5)
299768 Streptococcus thermophilus 442301 444042 1010 2692 652
367928 Bifidobacterium adolescentis 585334 586711 1004 2839 727
206672 Bifidobacterium longum 622420 624747 1069 3212 829
257314 Lactobacillus johnsonii 565126 573011 951 2017 549
272621 Lactobacillus acidophilus 562384 569089 938 2052 555
203120 Leuconostoc mesenteroides 584182 590347 1505 3300 703
416870 Lactococcus lactis 653204 663147 1148 3016 806
393595 Alcanivorax borkumensis 882913 885964 1987 3714 877
220668 Lactobacillus plantarum 880644 892309 1328 3326 908
226185 Enterococcus faecalis 907372 927384 1326 2757 812
420662 Methylibium petroleiphilum 1330008 1346090 3065 7266 1955
251221 Gloeobacter violaceus 1321494 1342276 2217 6957 1790
369723 Salinispora tropica 1460575 1481468 2466 5602 1975
78245 Xanthobacter autotrophicus 1539223 1578178 3497 8037 2044
138119 Desulfitobacterium hafniense 1533966 1560833 1900 5613 1469
318586 Paracoccus denitrificans 1488837 1521089 3221 6139 1506
351746 Pseudomonas putida 1684829 1713643 2986 5905 1550
222523 Bacillus cereus 1459717 1481886 2038 3973 1045
366394 Sinorhizobium medicae 1827784 1872275 3448 7053 1672
224911 Bradyrhizobium japonicum 2503929 2549744 3916 9814 2399
471472 Chlamydia trachomatis 302412 302833 675 1392 354
455434 Treponema pallidum 338611 341669 693 2678 613
392021 Rickettsia rickettsii 306266 307726 1324 3255 621
458234 Francisella tularensis 437842 444794 1125 2170 490
85962 Helicobacter pylori 473597 479277 845 1760 438
224326 Borrelia burgdorferi 365362 405718 601 1217 333
195099 Campylobacter jejuni 518259 523912 877 2392 535
374833 Neisseria meningitidis 537216 551575 1503 2711 629
516950 Streptococcus pneumoniae 597786 615907 1107 2960 762
257309 Corynebacterium diphtheriae 703475 707188 1242 2608 787
212717 Clostridium tetani 783287 789969 1103 2594 688
273036 Staphylococcus aureus 708397 714988 1280 2405 647
262698 Brucella abortus 859295 862673 2845 5080 1051
400673 Legionella pneumophila 991803 1008582 1869 4260 974
520 Bordetella pertussis 1018489 1038957 2489 5612 1474
243277 Vibrio cholerae 1114657 1123600 1967 5115 1117
349746 Yersinia pestis 1095487 1108011 2106 4212 1085
83331 Mycobacterium tuberculosis 1262574 1290217 2342 7054 1892
99287 Salmonella typhimurium 1368776 1383324 2306 4670 1160
261594 Bacillus anthracis 1402238 1416803 1977 4495 1105
All non-pathogens (b) 22026212 23204233 18465 36021 8410
All pathogens (b) 14704464 15417723 13248 28372 6249
All bacteria (b) 35431829 38621956 24789 47610 10701

The level of overlap between 20 pathogenic and 20 non-pathogenic bacterial proteomes and the human proteome is shown. The human proteome contained 36,014 proteins and 15,511,124 occurrences of 10,999,648 unique nonamers. The first 20 bacteria listed (Streptococcus thermophilus through Bradyrhizobium japonicum) are non-pathogenic; the last 20 (Chlamydia trachomatis through Bacillus anthracis), shown in bold in the original table, are pathogenic to humans. Column details are as follows: (1) unique nonamers in the bacterial proteome; (2) total number of nonamers in the bacterial proteome (including multiple occurrences); (3) unique bacterial nonamers occurring in the human proteome; (4) bacterial overlap occurrences in the human proteome (including multiple occurrences); (5) number of human proteins involved in the overlap. The linear regression equations, with associated correlation coefficients (r) and p-values, are as follows: column 3 (y) versus column 1 (x): y = 0.00157x + 288.9 (r = 0.891; p = 6.88 × 10^−15); column 4 (y) versus column 1 (x): y = 0.003600x + 671.7 (r = 0.893; p = 5.11 × 10^−15); column 5 (y) versus column 1 (x): y = 0.00095x + 135.0 (r = 0.895; p = 3.28 × 10^−15).

(a) Information for each bacterium can be found at ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi.

(b) Obtained by combining the 20 pathogenic (or the 20 non-pathogenic or all 40) bacterial proteomes into one protein set and then computing the overlap of this set with the human proteome.

The overlap between the 40 bacterial proteomes and the human proteome consists of a total of 47,610 perfect matches disseminated through 10,701 human proteins. In other words, about 50,000 perfect matches, each 9 amino acids long, are shared between the 40 bacterial proteomes described in Table 1 and about one third of the proteins in the human proteome (10,701 of 36,014). The bacterial versus human overlap is independent of the microbe's pathogenicity. We find that, as expected, the extent of the bacterial overlap depends almost exclusively on the size of the bacterial proteome. Indeed, the size of the bacterial proteome (in terms of number of unique nonamers) is positively correlated (r ≥ 0.891) with the three other variables: the number of unique overlaps in the human proteome; the total number of overlaps in the human proteome, including repeats; and the number of human proteins involved in the overlap. All of these correlations are statistically significant (p < 0.01).
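As a check on the size dependence described above, the regression of column 3 on column 1 can be recomputed from the 40 per-proteome rows of Table 1. The following is a minimal sketch in plain Python; the names `TABLE1` and `linear_fit` are illustrative, and the paper does not say which software was used for the regression.

```python
import math

# (column 1, column 3) pairs for the 40 individual proteomes, copied from Table 1:
# unique bacterial nonamers, and unique bacterial nonamers found in the human proteome.
TABLE1 = [
    (442301, 1010), (585334, 1004), (622420, 1069), (565126, 951),
    (562384, 938), (584182, 1505), (653204, 1148), (882913, 1987),
    (880644, 1328), (907372, 1326), (1330008, 3065), (1321494, 2217),
    (1460575, 2466), (1539223, 3497), (1533966, 1900), (1488837, 3221),
    (1684829, 2986), (1459717, 2038), (1827784, 3448), (2503929, 3916),
    (302412, 675), (338611, 693), (306266, 1324), (437842, 1125),
    (473597, 845), (365362, 601), (518259, 877), (537216, 1503),
    (597786, 1107), (703475, 1242), (783287, 1103), (708397, 1280),
    (859295, 2845), (991803, 1869), (1018489, 2489), (1114657, 1967),
    (1095487, 2106), (1262574, 2342), (1368776, 2306), (1402238, 1977),
]


def linear_fit(points):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = linear_fit(TABLE1)
print(f"y = {slope:.5f}x + {intercept:.1f} (r = {r:.3f})")
```

The same function can be applied to columns 4 and 5 by swapping the second element of each pair.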

These data have important implications for the link between microbial infections, molecular mimicry, and autoimmunity. Molecular mimicry is based on the principle that infectious agents initiate and sustain an autoimmune reaction by generating autoreactive B and/or T lymphocytes that simultaneously recognize cross-reactive determinants from both the original infectious agent and the host. This sharing of amino acid sequences on proteins from self- and nonself-sources (i.e., host and virus/bacterium) is the fundamental essence of the molecular mimicry concept.6,7 We note that molecular mimicry may involve both linear and conformational antigenic determinants. Since the data reported in this paper represent possible linear, but not conformational epitopes, the numbers given actually understate the level of epitopic overlap between bacterial and human proteomes. Consequently, although our data suggest an impressive potential for cross-reactivity between bacterial and human proteins, this potential must surely be even greater than our numbers indicate.

A considerable number of classical and recent reports have suggested molecular mimicry as a pathogenic mechanism in a wide range of diseases. These include acute rheumatic fever, reactive arthritis after enteric infection or associated with Reiter's syndrome, myasthenia gravis, rheumatoid arthritis, insulin-dependent diabetes, ankylosing spondylitis, Guillain-Barré syndrome, autoimmune hepatitis and primary biliary cirrhosis, neurological diseases such as multiple sclerosis and other demyelinating pathologies, and even the atherosclerotic plaque.8–14

In contrast, the results presented here are consistent with a number of other reports in which the elusive character of the molecular mimicry hypothesis has been underlined.15–27 Our past4,5 and present data tend to exclude a causal mechanistic role for molecular mimicry in the genesis of autoimmunity. According to the molecular mimicry hypothesis, the widespread overlap between viral and bacterial proteomes and the human proteome (see Table 1 and ref. 5) would predict that autoimmune diseases should have a much higher incidence than actually observed, both in the total number of individuals affected and the number of autoimmune pathologies per individual. Thus, it is difficult to reconcile the enormous number of viral and bacterial peptides disseminated throughout the human proteins with a fundamental role for molecular mimicry in the etiology of certain autoimmune conditions.

Instead, we believe that the high number of bacterial sequences that are also found in the human proteome, but are not clinically relevant in terms of inducing autoimmune diseases, offers a mechanistic basis for an additional microbial immune evasion strategy. Through evolution and adaptation, microbes have developed strategies that allow them to evade the immune system of their host. Such tactics promote infectious persistence and chronicity; among others, these include the altered peptide ligands of the circumsporozoite protein in malaria;28 macrophage apoptosis in microbial infections by Shigella;29 antigenic variations in Trypanosoma cruzi;30 and the consumption/degradation of complement components in microbial organisms like Porphyromonas gingivalis and Trichomonas vaginalis.31 The high level of peptide sharing between microbial and human proteomes might represent a camouflage mechanism that protects microbes from the immune attack of the host, possibly acting through the regulatory T cells that provide critical control of unwanted autoimmune responses. In a wider context, the high level of exact peptide sharing between microbial and human proteomes suggests that post-translational modifications (e.g., glycosylation, cysteinylation, citrullination) should be reconsidered as a factor that may contribute to the creation or disruption of microbial epitopes.32

Finally, from an evolutionary point of view, the massive and repeated distribution of bacterial amino acid sequences throughout the human proteome seems to indicate that bacterial and human proteins are composed of common peptide backbone units and suggests the existence of a common structural platform in the composition of proteomes, be they microbial or human.1,33

Methods

The human proteome was downloaded from Integr8 (www.ebi.ac.uk/integr8),34 and contained 38,009 proteins at the time that it was downloaded. To reduce sequence redundancy, all possible pairs of proteins in this proteome were examined. For a given pair, if the sequences were identical then one sequence was arbitrarily chosen for deletion; if one sequence was a fragment of the other sequence, then the fragment was deleted. After filtering, we were left with a human proteome consisting of 36,014 unique proteins, for a total of 15,806,702 amino acids.
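The redundancy filter described above can be sketched as follows. This is an illustration only: it assumes that "fragment" means an exact contiguous substring of another sequence, and the function name `filter_redundant` and the identifier-to-sequence dictionary are hypothetical. The authors' actual implementation is not given in the paper.

```python
def filter_redundant(proteins):
    """Drop exact duplicates and sequences that are fragments of another sequence.

    `proteins` maps an identifier to an amino acid sequence. "Fragment" is
    assumed here to mean an exact contiguous substring of a longer sequence.
    Returns a dict holding only the retained entries.
    """
    # Step 1: collapse identical sequences, keeping the first identifier seen
    # (the paper describes this choice as arbitrary).
    first_id = {}
    for pid, seq in proteins.items():
        first_id.setdefault(seq, pid)

    # Step 2: visit sequences from longest to shortest, so that every
    # candidate fragment is compared against sequences that could contain it.
    kept = []
    for seq in sorted(first_id, key=len, reverse=True):
        if any(seq in longer for longer in kept):
            continue  # seq is a fragment of an already retained sequence
        kept.append(seq)

    return {first_id[seq]: seq for seq in kept}


example = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAYIAKQR",  # identical to P1
    "P3": "TAYIA",       # fragment of P1
    "P4": "GGGGS",       # unrelated, retained
}
print(sorted(filter_redundant(example)))
```

This pairwise comparison is quadratic in the number of proteins, which mirrors the "all possible pairs" wording of the text; a proteome-scale run would likely need an indexed search instead.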

Like the human proteome, all bacterial proteomes were downloaded from Integr8.34 The set of pathogenic bacteria was acquired by searching EBI's list of bacteria (www.ebi.ac.uk/genomes/bacteria.html) for those that cause disease in humans. The set of non-pathogenic bacteria was acquired by arbitrarily choosing bacteria listed on the Integrated Microbial Genomes (IMG) website (http://img.jgi.doe.gov/cgi-bin/pub/main.cgi)35 that contain the annotation “Disease: none.” Although the IMG site contains downloadable proteomes for each organism, these proteomes were downloaded from Integr8 instead of the IMG site in order to maintain consistency with the pathogenic bacteria. Each bacterial proteome was filtered in the same manner as the human proteome. The 40 filtered bacterial proteomes consisted of 128,248 unique proteins for a total of 39,651,163 amino acids.

Sequence similarity analysis of each of the 40 bacterial proteomes to the human proteome was carried out using bacterial nonamers sequentially overlapped by eight residues. The scans were performed by custom programs written in C, which utilized suffix trees for efficiency.36 The bacterial proteomes were manipulated and analyzed as follows. Each bacterial proteome was decomposed in silico into a set of nonamers (including all duplicates). A library of unique nonamers for each microbial proteome was then created by removing duplicates. Next, for each nonamer in the library, the entire human proteome was searched for instances of the same nonamer. Any such occurrence was termed an overlap or match. Cursory analyses (e.g., identification of unique overlapping nonamers, counts of unique overlapping nonamers, counts of duplications) were performed using shell scripts and standard LINUX/UNIX utilities. Linear least-squares regression was performed to determine whether any linear relationships exist between the size of a given bacterial proteome and its level of overlap with the human proteome.
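The nonamer scan described above can be illustrated with a short sketch that produces the five quantities reported per proteome in Table 1. Note the substitution: the authors used C programs built on suffix trees, whereas this sketch uses Python hash-based counters, which give the same exact-match counts on small inputs but are not the authors' method. The names `kmers` and `overlap_stats` and the toy sequences are hypothetical.

```python
from collections import Counter

K = 9  # nonamer length used in the paper


def kmers(seq, k=K):
    """Yield every window of length k; consecutive windows overlap by k - 1 residues."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def overlap_stats(bacterial, human, k=K):
    """Compute the five Table 1 columns for one bacterial proteome.

    `bacterial` and `human` are lists of protein sequences (strings).
    """
    # Decompose the bacterial proteome into k-mers, duplicates included.
    bact_counts = Counter(m for seq in bacterial for m in kmers(seq, k))

    shared_counts = Counter()  # human occurrences of each shared k-mer
    proteins_hit = 0           # human proteins holding at least one shared k-mer
    for seq in human:
        hits = [m for m in kmers(seq, k) if m in bact_counts]
        shared_counts.update(hits)
        if hits:
            proteins_hit += 1

    return {
        "unique_bacterial": len(bact_counts),               # column 1
        "total_bacterial": sum(bact_counts.values()),       # column 2
        "unique_shared": len(shared_counts),                # column 3
        "human_occurrences": sum(shared_counts.values()),   # column 4
        "human_proteins": proteins_hit,                     # column 5
    }


# Toy sequences, not real proteins.
toy_bacterial = ["ACDEFGHIKLM", "ACDEFGHIK"]
toy_human = ["WWACDEFGHIKWW", "ACDEFGHIKACDEFGHIK", "YYYYYYYYYY"]
print(overlap_stats(toy_bacterial, toy_human))
```

Holding only the bacterial k-mers in memory and streaming the human proteins past them keeps the sketch simple; a suffix tree, as used in the paper, trades that simplicity for efficiency on full proteomes.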

Acknowledgements

B.T. performed the computational analysis. A.K. provided bioinformatics expertise and supervised the computational analysis. G.L. developed initial analyses of bacterial proteomes, validated them using the PIR perfect match program, and analyzed output data. D.K. proposed the original idea, supervised the work, interpreted the data and wrote the paper. All four authors revised the paper, with a major contribution by B.T.

Funding for this work was provided by the Ministry of University and Research of Italy (MIUR) and the Natural Sciences and Engineering Research Council of Canada (NSERC).

References

1. Kanduc D, Tessitore L, Lucchese G, Kusalik A, Farber E, Marincola FM. Sequence uniqueness and sequence variability as modulating factors of human anti-HCV humoral immune response. Cancer Immunol Immunother. 2008;57:1215–1223. doi: 10.1007/s00262-008-0456-y.
2. Capone G, De Marinis A, Simone S, Kusalik A, Kanduc D. Mapping the human proteome for nonredundant peptide islands. Amino Acids. 2008;35:209–216. doi: 10.1007/s00726-007-0563-7.
3. Kanduc D. "Self-nonself" peptides in the design of vaccines. Curr Pharm Des. 2009;28:3283–3289. doi: 10.2174/138161209789105135.
4. Kusalik A, Bickis M, Lewis C, Li Y, Lucchese G, Marincola FM, et al. Widespread and ample peptide overlapping between HCV and Homo sapiens proteomes. Peptides. 2008;28:1260–1267. doi: 10.1016/j.peptides.2007.04.001.
5. Kanduc D, Stufano A, Lucchese G, Kusalik A. Massive peptide sharing between viral and human proteomes. Peptides. 2008;29:1755–1766. doi: 10.1016/j.peptides.2008.05.022.
6. Oldstone MB. Molecular mimicry and immune-mediated diseases. FASEB J. 1998;12:1255–1265. doi: 10.1096/fasebj.12.13.1255.
7. Oldstone MB. A suspenseful game of 'hide and seek' between virus and host. Nat Immunol. 2007;8:325–327. doi: 10.1038/ni0407-325.
8. Oomes PG, Jacobs BC, Hazenberg MP, Bänffer JR, van der Meché FG. Anti-GM1 IgG antibodies and Campylobacter bacteria in Guillain-Barre syndrome: evidence of molecular mimicry. Ann Neurol. 1995;38:170–175. doi: 10.1002/ana.410380208.
9. Cunningham MW. Autoimmunity and molecular mimicry in the pathogenesis of post-streptococcal heart disease. Front Biosci. 2003;8:533–543. doi: 10.2741/1067.
10. Lamb DJ, El-Sankary W, Ferns GA. Molecular mimicry in atherosclerosis: a role for heat shock proteins in immunisation. Atherosclerosis. 2003;167:177–185. doi: 10.1016/s0021-9150(02)00301-5.
11. Ebringer A, Ahmadi K, Fielder M, Rashid T, Tiwana H, Wilson C, et al. Molecular mimicry: the geographical distribution of immune responses to Klebsiella in ankylosing spondylitis and its relevance to therapy. Clin Rheumatol. 1996;15:57–61. doi: 10.1007/BF03342648.
12. Karopoulos C, Rowley MJ, Handley CJ, Strugnell RA. Antibody reactivity to mycobacterial 65 kDa heat shock protein: relevance to autoimmunity. J Autoimmun. 1995;8:235–48. doi: 10.1006/jaut.1995.0018.
13. O'Donohue J, McFarlane B, Bomford A, Yates M, Williams R. Antibodies to atypical mycobacteria in primary biliary cirrhosis. J Hepatol. 1994;21:887–889. doi: 10.1016/s0168-8278(94)80255-6.
14. Li de la Sierra I, Pernot L, Prangé T, Saludjian P, Schiltz M, et al. Molecular structure of the lipoamide dehydrogenase domain of a surface antigen from Neisseria meningitidis. J Mol Biol. 1997;269:129–141. doi: 10.1006/jmbi.1997.1009.
15. Markesich DC, Sawai ET, Butel JS, Graham DY. Investigations on etiology of Crohn's disease. Humoral immune response to stress (heat shock) proteins. Dig Dis Sci. 1991;36:454–460. doi: 10.1007/BF01298874.
16. Richter W, Mertens T, Schoel B, Muir P, Ritzkowsky A, Scherbaum WA, et al. Sequence homology of the diabetes-associated autoantigen glutamate decarboxylase with coxsackie B4-2C protein and heat shock protein 60 mediates no molecular mimicry of autoantibodies. J Exp Med. 1994;180:721–726. doi: 10.1084/jem.180.2.721.
17. Horwitz MS, Bradley LM, Harbertson J, Krahl T, Lee J, Sarvetnick N. Diabetes induced by Coxsackie virus: initiation by bystander damage and not molecular mimicry. Nat Med. 1998;4:781–785. doi: 10.1038/nm0798-781.
18. Zhao R, Loftus DJ, Appella E, Collins EJ. Structural evidence of T cell xeno-reactivity in the absence of molecular mimicry. J Exp Med. 1999;189:359–370. doi: 10.1084/jem.189.2.359.
19. Tanaka A, Prindiville TP, Gish R, Solnick JV, Coppel RL, et al. Are infectious agents involved in primary biliary cirrhosis? A PCR approach. J Hepatol. 1999;31:664–671. doi: 10.1016/s0168-8278(99)80346-8.
20. Verjans GM, Remeijer L, Mooy CM, Osterhaus AD. Herpes simplex virus-specific T cells infiltrate the cornea of patients with herpetic stromal keratitis: no evidence for autoreactive T cells. Invest Ophthalmol Vis Sci. 2000;41:2607–2612.
21. Kirby B, Al-Jiffri O, Cooper RJ, Corbitt G, Klapper PE, Griffiths CE. Investigation of cytomegalovirus and human herpes viruses 6 and 7 as possible causative antigens in psoriasis. Acta Derm Venereol. 2000;80:404–406. doi: 10.1080/000155500300012738.
22. Benoist C, Mathis D. Autoimmunity provoked by infection: how good is the case for T cell epitope mimicry? Nat Immunol. 2001;2:797–801. doi: 10.1038/ni0901-797.
23. Schloot NC, Willemen SJ, Duinkerken G, Drijfhout JW, de Vries RR, Roep BO. Molecular mimicry in type 1 diabetes mellitus revisited: T-cell clones to GAD65 peptides with sequence homology to Coxsackie or proinsulin peptides do not crossreact with homologous counterpart. Hum Immunol. 2001;62:299–309. doi: 10.1016/s0198-8859(01)00223-3.
24. Faller G, Keller KM, Claeys D, Buderus S, Kühlwein D, Reiche N, et al. Prevalence and specificity of antigastric autoantibodies in adolescents infected with Helicobacter pylori. J Pediatr. 2002;140:68–74. doi: 10.1067/mpd.2002.120270.
25. Van Bilsen JH, Wagenaar-Hilbers JP, Boot EP, van Eden W, Wauben MH. Searching for the cartilage-associated mimicry epitope in adjuvant arthritis. Autoimmunity. 2002;35:201–210. doi: 10.1080/08916930290024188.
26. Wang CX, Teufel A, Cheruti U, Grötzinger J, Galle PR, Lohse AW, et al. Characterization of human gene encoding SLA/LP autoantigen and its conserved homologs in mouse, fish, fly and worm. World J Gastroenterol. 2006;12:902–907. doi: 10.3748/wjg.v12.i6.902.
27. Fourneau JM, Bach JM, van Endert PM, Bach JF. The elusive case for a role of mimicry in autoimmune diseases. Mol Immunol. 2004;40:1095–1102. doi: 10.1016/j.molimm.2003.11.011.
28. Plebanski M, Lee EA, Hannan CM, Flanagan KL, Gilbert SC, Gravenor MB, et al. Altered peptide ligands narrow the repertoire of cellular immune responses by interfering with T-cell priming. Nat Med. 1999;5:565–571. doi: 10.1038/8444.
29. Hilbi H, Zychlinsky A, Sansonetti PJ. Macrophage apoptosis in microbial infections. Parasitology. 1997;115:79–87. doi: 10.1017/s0031182097001790.
30. Schechter M, Nogueira N. Variations induced by different methodologies in Trypanosoma cruzi surface antigen profiles. Mol Biochem Parasitol. 1988;29:37–45. doi: 10.1016/0166-6851(88)90117-x.
31. Mosser DM, Brittingham A. Leishmania, macrophages and complement: a tale of subversion and exploitation. Parasitology. 1997;115:9–23. doi: 10.1017/s0031182097001789.
32. Doyle HA, Mamula MJ. Post-translational protein modifications in antigen recognition and autoimmunity. Trends Immunol. 2001;22:443–449. doi: 10.1016/s1471-4906(01)01976-7.
33. Kusalik A, Trost B, Bickis M, Fasano C, Capone G, Kanduc D. Codon number shapes peptide redundancy in the universal proteome composition. Peptides. 2009 doi: 10.1016/j.peptides.2009.06.035. In press.
34. Kersey P, Bower L, Morris L, Horne A, Petryszak R, Kanz C. Integr8 and Genome Reviews: integrated views of complete genomes and proteomes. Nucleic Acids Res. 2005;33:297–302. doi: 10.1093/nar/gki039.
35. Markowitz VM, Szeto E, Palaniappan K, Grechkin Y, Chu K, Chen IM. The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions. Nucleic Acids Res. 2008;36:528–533. doi: 10.1093/nar/gkm846.
36. Gusfield D. Algorithms on strings, trees and sequences: Computer science and computational biology. Cambridge University Press; 1997.

Articles from Self Nonself are provided here courtesy of Taylor & Francis
