Journal of Clinical Microbiology. 2010 Mar 3;48(5):1921–1923. doi: 10.1128/JCM.00357-10

MST (Molecular Serotyping Tool): a Program for Computer-Assisted Molecular Identification of Escherichia coli and Shigella O Antigens

Roney S Coimbra 1,2,*, François Artiguenave 3, Leandro S R Z Jacques 1, Guilherme C Oliveira 1,2
PMCID: PMC2863929  PMID: 20200287

Abstract

Escherichia coli and Shigella O antigens can be inferred using the rfb-restriction fragment length polymorphism (RFLP) molecular test. We present herein software, based on a dynamic programming algorithm, that compares the rfb-RFLP patterns of clinical isolates with those in a database containing the 171 previously published patterns corresponding to all known E. coli/Shigella O antigens.


Classical Escherichia coli/Shigella O serogrouping is expensive, labor-intensive, and susceptible to errors due to cross-reactivity between adsorbed O-antigen rabbit antisera, as reviewed in reference 2. Furthermore, corrupted expression of genes involved in O-antigen synthesis renders some strains nontypeable (“rough”). Importantly, classical serotyping does not detect new O antigens. These drawbacks are surmounted by the rfb-restriction fragment length polymorphism (RFLP) test (1, 2). Briefly, the rfb locus containing most of the genes involved in O-antigen synthesis is amplified by PCR and digested with MboII, and products are resolved by electrophoresis. A database with 171 rfb-RFLP patterns representing each known Shigella and E. coli O antigen has been published and used to type reference and clinical strains, including an isolate that had become rough in the lab, with 100% specificity and sensitivity (1, 2). We present herein a web-based software program to compare the rfb-RFLP patterns of clinical isolates with those of known E. coli and Shigella O antigens.

For the purpose of this work, the concepts of similarity and alignment between two rfb-RFLP patterns were adopted from Needleman and Wunsch's dynamic programming algorithm (7), which detects insertions, deletions, and substitutions as changes in the strings that represent nucleic acid or protein sequences. By analogy, restriction patterns represented by ordered fragment sizes can be aligned and their similarity can be calculated as the sum of penalties for the edit operations that transform one pattern into the other. These edit operations are deletions (missing bands) and transformations (errors in fragment sizing). A scoring function assigns a penalty to each transformation, and a dynamic programming algorithm computes the best editing scores between two patterns, producing an editing matrix. For two patterns, a = a1a2···am and b = b1b2···bn, where ai is the fragment size at position i of test pattern a and bj is the fragment size at position j of reference pattern b, the editing matrix has (m + 1) × (n + 1) positions, where m is the number of fragments in pattern a and n is the number of fragments in pattern b. The matrix is initialized, filling the first column with each fragment of pattern a and the first row with each fragment of pattern b. The aim is to move across the matrix from the first column and first row to the last column and last row. At each position, it is possible to move only one step, either along the diagonal to align two fragments and advance one position in both patterns or along the horizontal (or vertical) to acknowledge a missing fragment in one pattern and advance one position in the other. Sij is the cumulative score at position i in pattern a and position j in pattern b:

[Equation M1: recurrence for Sij, image not reproduced]

where

[Equation M2: definition of s(aibj), image not reproduced]

s(aibj) is the score for aligning the fragments ai and bj, and w is the penalty for a missing fragment at position ai of pattern a or in position bj of pattern b; σ is a threshold defined by the equation σ = −5.82E−06 ai + 0.04451.
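From the prose alone, the recurrence for Sij can plausibly be reconstructed as shown here. This is a sketch under assumptions, not the published equation: it assumes that lower scores are better (MST searches for the lowest Gs) and that the three allowed moves are the diagonal, vertical, and horizontal steps described above. How s(aibj) depends on σ cannot be recovered from the text, so it is left unspecified.

```latex
% Assumed reconstruction of the cumulative score recurrence
S_{i,j} = \min
\begin{cases}
S_{i-1,j-1} + s(a_i b_j) & \text{align fragments } a_i \text{ and } b_j \\
S_{i-1,j} + w            & \text{fragment } a_i \text{ has no partner in } b \\
S_{i,j-1} + w            & \text{fragment } b_j \text{ has no partner in } a
\end{cases}
```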

The scoring function tolerates a variable sizing error between two identical fragments. This error cannot exceed the penalty for a band deletion (w) which, as defined by the variable threshold σ, corresponds to a maximal error in band sizing ranging linearly from 7.0% at 0.5 kbp to 3.5% at 4 kbp. The editing score between two patterns is the sum of the penalties of all edit operations required to transform one pattern into another. It is found in the intersection of the last row and last column of the editing matrix. The corresponding global alignment is then extracted by tracing back the editing matrix.
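The matrix fill and traceback described above can be sketched in Python. This is a minimal illustration, not MST's implementation. Three things are assumptions, because the scoring equation is not reproduced in this text: the pair score (relative sizing error, allowed only up to σ), the constant gap penalty `w = 0.05`, and the cumulative-gap initialization of the first row and column. The function names are hypothetical. The two example patterns are isolates 98-7171 and 98-0917 from Table 1.

```python
import math


def sigma(size_bp):
    """Size-dependent threshold from the text: sigma = -5.82E-06 * a_i + 0.04451."""
    return -5.82e-06 * size_bp + 0.04451


def pair_score(a_i, b_j):
    """ASSUMED scoring: relative sizing error, permitted only up to sigma(a_i)."""
    err = abs(a_i - b_j) / a_i
    return err if err <= sigma(a_i) else math.inf


def align_patterns(a, b, w=0.05):
    """Globally align two fragment-size lists; w is a hypothetical gap penalty.

    Returns (editing_score, pairs), where each pair is (a_i, b_j) and
    None marks a missing band in that pattern.
    """
    m, n = len(a), len(b)
    S = [[0.0] * (n + 1) for _ in range(m + 1)]
    # ASSUMED initialization: leading gaps accumulate the penalty w.
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + w
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + w

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = min(
                S[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # diagonal
                S[i - 1][j] + w,                                   # band missing in b
                S[i][j - 1] + w,                                   # band missing in a
            )

    # Trace back from the last row and column to recover the alignment.
    pairs = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or S[i][j] == S[i - 1][j] + w):
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return S[m][n], pairs


isolate_98_7171 = [1350, 1005, 908, 780, 713, 674, 473, 448, 402]
isolate_98_0917 = [1360, 1008, 909, 794, 725, 688, 483, 454, 412, 259]
score, pairs = align_patterns(isolate_98_7171, isolate_98_0917)
for test_band, other_band in pairs:
    print(test_band, other_band)
```

Under these assumptions, the nine shared bands pair up along the diagonal and the 259-bp band of 98-0917 is left unpaired, which is the "missing band" case the text describes.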

A global score (Gs) is calculated using the editing score, the number of nonaligned fragments, and the total number of fragments in the two patterns: Gs = editing score × number of gaps × 100/(number of bands)².
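The Gs formula can be written as a small Python function for illustration. The function and argument names are hypothetical, and "number of bands" is taken to mean the total number of fragments in the two patterns, as the sentence above suggests.

```python
def global_score(editing_score, n_gaps, n_bands):
    """Gs = editing score x number of gaps x 100 / (number of bands)^2.

    n_bands is ASSUMED to be the total fragment count of both patterns.
    """
    if n_bands <= 0:
        raise ValueError("n_bands must be positive")
    return editing_score * n_gaps * 100 / n_bands ** 2


# Hypothetical inputs: one unmatched band across two patterns totalling 20 bands.
print(global_score(0.2, 1, 20))
# With no gaps the product is zero whatever the editing score is.
print(global_score(0.2, 0, 20))
```

As the formula is written, an alignment with no gaps yields Gs = 0 even when band sizes differ slightly. This is consistent with the statement that Gs is driven more by unmatched bands than by sizing errors.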

Gs is influenced more by a failure to match two identical bands than by errors in band sizing, which are the most common artifact affecting the reproducibility of RFLP-based methods.

MST (Molecular Serotyping Tool) (http://www.cebio.org/mst) iterates over the 171 reference Shigella and E. coli rfb-RFLP patterns previously published (1, 2), searching for the reference pattern with the lowest Gs to the test pattern provided by the user. If a match is found with Gs under a threshold (default = 1.5), the output is a schematic representation of the two patterns displayed side by side where two corresponding bands are linked by a line and the absence of a band is represented by a blank space in the respective pattern (Fig. 1).
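The search loop described above can be sketched as follows. This is an illustration under assumptions, not MST's code. `best_match` and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for Gs. Because the 171 reference patterns are not reproduced here, two isolate patterns from Table 1 serve as stand-in references. Treating the threshold as a strict "under" comparison follows the wording of the text but is also an assumption.

```python
import math


def best_match(test_pattern, references, score_fn, threshold=1.5):
    """Return (name, score) of the lowest-scoring reference, or None.

    A match is reported only if its score is under the threshold
    (default 1.5, as stated in the text).
    """
    best_name, best_score = None, math.inf
    for name, ref_pattern in references.items():
        current = score_fn(test_pattern, ref_pattern)
        if current < best_score:
            best_name, best_score = name, current
    if best_score < threshold:
        return best_name, best_score
    return None


def toy_score(a, b):
    """HYPOTHETICAL stand-in for Gs: 100 x mean relative size difference."""
    if len(a) != len(b):
        return math.inf
    return 100 * sum(abs(x - y) / x for x, y in zip(a, b)) / len(a)


# Stand-in "references": isolate patterns taken from Table 1.
stand_in_refs = {
    "98-0917": [1360, 1008, 909, 794, 725, 688, 483, 454, 412, 259],
    "98-2786": [1207, 1090, 836, 652, 511, 464, 386, 368, 339, 261],
}
query = [1362, 1016, 925, 805, 729, 692, 486, 457, 415, 260]  # isolate 98-3549

result = best_match(query, stand_in_refs, toy_score)
print(result)
```

Passing the scoring function as an argument keeps the search independent of how Gs is computed, so a real Gs implementation could be swapped in without changing the loop.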

FIG. 1.

Output of MST (Molecular Serotyping Tool). (A) Schematic view of a perfect alignment (global score [Gs] = 0) between the rfb-RFLP pattern of strain 98-0917 of Shigella dysenteriae serotype 1 and the reference pattern A1. (B) Alignment containing gaps between the rfb-RFLP patterns of strain 98-3043 of S. dysenteriae serotype 3 and reference pattern R124_A3 (Gs = 1.0). This pattern is shared by E. coli O124 and Shigella dysenteriae 3. Corresponding bands in each pattern are linked by a line. P, reference pattern; ?, experimental pattern.

MST (Molecular Serotyping Tool) was validated by searching the database with previously published rfb-RFLP patterns of 24 Shigella and 14 E. coli clinical isolates with known O antigens determined by rfb-RFLP and classical serotyping (1, 2). In all cases, MST accurately identified the O antigen from the rfb-RFLP pattern (Table 1) (100% specificity and sensitivity; discriminatory power = 1; scores ranging from 0 to 1.0, median = 0). The 171 patterns in the reference database for E. coli and Shigella were shuffled to generate 1,000 random patterns, which were then compared to the database with MST. Seventeen matches (1.7%) were found, with scores ranging from 0 to 1.5 (median = 1.2). The median scores of the 38 clinical isolates and the 17 random patterns with matches in the database were significantly different (P < 0.0001) by the Mann-Whitney test (GraphPad Prism, version 5.0; GraphPad Software Inc., San Diego, CA).

TABLE 1.

Identification of O antigens of E. coli and Shigella clinical isolates using MST (Molecular Serotyping Tool)

| Strain | Identification by biochemical tests and serotyping | DNA fragments generated by rfb-RFLP (sizes in bp) | Closest rfb-RFLP reference pattern | MST score |
| --- | --- | --- | --- | --- |
| 98-7171 | Shigella dysenteriae 1 | 1,350, 1,005, 908, 780, 713, 674, 473, 448, 402 | A1 | 0.1 |
| 98-0917 | S. dysenteriae 1 | 1,360, 1,008, 909, 794, 725, 688, 483, 454, 412, 259 | A1 | 0.0 |
| 98-1240 | S. dysenteriae 1 | 1,372, 1,019, 917, 805, 732, 687, 487, 458, 416, 263, 231, 177, 132 | A1 | 0.0 |
| 98-3549 | S. dysenteriae 1 | 1,362, 1,016, 925, 805, 729, 692, 486, 457, 415, 260 | A1 | 0.0 |
| 98-1285 | S. dysenteriae 2 | 1,985, 1,373, 1,111, 1,074, 995, 371, 329, 309, 301 | R112_A2 | 0.0 |
| 98-1062 | S. dysenteriae 3 | 1,622, 1,309, 1,157, 939, 868, 713, 381, 333, 311, 275, 262 | R124_A3 | 0.7 |
| 98-3043 | S. dysenteriae 3 | 1,627, 1,328, 1,151, 933, 860, 705, 378, 328, 304, 270, 260 | R124_A3 | 1.0 |
| 98-1802 | S. dysenteriae 3 | 1,660, 1,350, 1,172, 946, 870, 719, 393, 343, 321, 287, 277, 254 | R124_A3 | 0.2 |
| 98-2893 | S. dysenteriae 3 | 1,644, 1,327, 1,154, 938, 868, 717, 393, 341, 318, 283, 269, 251 | R124_A3 | 0.2 |
| 98-7485 | S. dysenteriae 4 | 1,799, 1,295, 914, 865, 811, 765, 670, 642, 615, 433, 406, 364, 352 | A4 | 0.0 |
| 98-2786 | S. dysenteriae 8 | 1,207, 1,090, 836, 652, 511, 464, 386, 368, 339, 261 | A8 | 0.4 |
| 94-10434 | S. dysenteriae 11 | 1,192, 1,148, 1,077, 917, 874, 788, 751, 712, 681, 626, 537, 499, 475, 400, 358, 318, 286 | A11 | 0.0 |
| 140-80 | Shigella flexneri 2 | 2,050, 1,590, 846, 719, 679, 566, 539, 263 | R13_111b_B1-5a | 0.0 |
| 137-80 | S. flexneri 3 | 2,050, 1,611, 847, 720, 680, 561, 538, 264 | R13_111b_B1-5a | 0.0 |
| 89-80 | S. flexneri 4 | 2,049, 1,580, 847, 713, 674, 562, 533, 257 | R13_111b_B1-5a | 0.0 |
| 98-7639 | Shigella boydii 1 | 946, 890, 827, 691, 630, 559, 485, 366, 306, 263 | R149a_C1 | 0.0 |
| 98-5278 | S. boydii 2 | 1,611, 1,004, 963, 851, 719, 689, 674, 594, 551, 478, 319, 300, 258 | C2 | 0.0 |
| 95-1589 | S. boydii 7 | 2,694, 1,187, 776, 730, 647, 586, 522, 503, 424, 405, 306, 265 | C7 | 0.0 |
| 98-8651 | S. boydii 10 | 1,326, 1,216, 1,173, 842, 638, 586, 540, 505, 443, 427, 373, 331, 272 | C10 | 0.0 |
| 38-87 | S. boydii 12 | 1,657, 766, 731, 690, 620, 546, 519, 470, 447, 404, 361, 301, 250 | R7_C12b | 0.0 |
| 98-8059 | S. boydii 18 | 1,975, 1,531, 719, 641, 573, 529, 475, 461, 413, 375, 252 | C18 | 0.0 |
| 98-8725 | Shigella sonnei a | 1,510, 1,440, 1,289, 1,029, 955, 898, 716, 644, 630, 590, 572, 494, 413, 377, 355, 335, 280 | D | 0.0 |
| 94-4723 | S. sonnei d | 1,499, 1,429, 1,294, 1,015, 957, 904, 721, 640, 626, 590, 574, 501, 415, 379, 357, 336, 278 | D | 0.0 |
| 155-87 | S. sonnei f | 1,475, 1,402, 1,263, 989, 938, 882, 701, 627, 606, 574, 556, 477, 396, 362, 343, 317, 260 | D | 0.0 |
| Ec155 | E. coli O2 | 3,011, 1,661, 879, 815, 716, 685, 662, 633, 606, 574, 462, 380, 325, 315, 277, 260 | R2a_5b_50 | 0.0 |
| PE42 | E. coli O15 | 2,027, 930, 737, 712, 544, 510, 491, 429, 414, 381, 273, 254 | R15 | 0.7 |
| Ec211 | E. coli O52 | 1,465, 816, 678, 607, 557, 533, 495, 467, 355, 306, 281, 266 | R52 | 0.0 |
| Ec102 | E. coli O69 | 1,760, 1,076, 836, 710, 674, 565, 479, 465, 410, 366, 354, 299, 291, 255 | R69 | 0.1 |
| Ec154 | E. coli O75 | 1,661, 1,282, 948, 858, 757, 692, 601, 477, 396, 389, 342, 321, 280, 254 | R75 | 0.1 |
| Ec197 | E. coli O86 | 1,354, 1,099, 918, 733, 690, 610, 566, 545, 530, 424, 408, 387, 323, 302 | R86 | 0.2 |
| Ec81 | E. coli O89 | 1,892, 1,119, 1,060, 1,020, 744, 527, 438, 384, 346, 325, 311, 281, 266 | R89 | 0.0 |
| Ec129 | E. coli O102 | 1,422, 1,176, 979, 650, 628, 596, 483, 455, 354, 313, 279, 263, 254 | R102 | 0.1 |
| 97-7739 | E. coli O111 | 1,858, 1,226, 1,191, 920, 593, 426, 400, 389, 374, 362, 336, 316, 296, 278 | R111a | 0.0 |
| PE125 | E. coli O113 | 1,037, 987, 949, 872, 802, 706, 467, 435, 396, 382, 320, 266 | R113 | 0.0 |
| PE16 | E. coli O121 | 1,762, 1,313, 1,013, 933, 903, 801, 698, 667, 579, 467, 451, 435, 396, 386, 346, 312, 285, 257 | R121 | 0.1 |
| 97-9206 | E. coli O126 | 1,589, 1,300, 985, 883, 765, 742, 579, 510, 494, 471, 459, 397, 355, 312 | R126 | 0.0 |
| 97-4922 | E. coli O128 | 1,401, 1,123, 915, 765, 697, 642, 611, 533, 418, 356, 344, 323, 297 | R128 | 0.0 |
| 97-11366 | E. coli O128 | 1,384, 1,104, 914, 763, 695, 633, 609, 531, 420, 358, 342, 325, 299 | R128 | 0.0 |
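The summary statistics quoted for the clinical isolates (38 isolates, scores from 0 to 1.0, median 0) can be checked against the MST score column of Table 1 with a few lines of Python. The scores are transcribed from the table in row order, and the variable names are hypothetical.

```python
from statistics import median

# MST scores from Table 1, in row order.
shigella_scores = [
    0.1, 0.0, 0.0, 0.0, 0.0, 0.7, 1.0, 0.2, 0.2, 0.0, 0.4, 0.0,
    0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
]
ecoli_scores = [
    0.0, 0.7, 0.0, 0.1, 0.1, 0.2, 0.0,
    0.1, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0,
]
all_scores = shigella_scores + ecoli_scores

print("isolates:", len(all_scores))
print("range:", min(all_scores), "to", max(all_scores))
print("median:", median(all_scores))
```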

Three new Shigella serotypes and an O148 Shiga toxin-producing E. coli strain have been described using the rfb-RFLP test (3-6). The possibility of searching the database of reference rfb-RFLP patterns using MST might further contribute to the epidemiology of E. coli and Shigella.

Acknowledgments

R.S.C. and G.C.O. are research fellows from CDTS-FIOCRUZ/CAPES and CNPq, respectively. This work was supported by FAPEMIG grant CBB-1181/0 and NIH-Fogarty grant TW007012.

We acknowledge Eric Aguiar for technical assistance.

Footnotes

Published ahead of print on 3 March 2010.

REFERENCES

1. Coimbra, R., F. Grimont, and P. Grimont. 1999. Identification of Shigella serotypes by restriction of amplified O-antigen gene cluster. Res. Microbiol. 150:543-553.
2. Coimbra, R., F. Grimont, P. Lenormand, P. Burguiere, L. Beutin, and P. Grimont. 2000. Identification of Escherichia coli O-serogroups by restriction of the amplified O-antigen gene cluster (rfb-RFLP). Res. Microbiol. 151:639-654.
3. Coimbra, R., P. Lenormand, F. Grimont, P. Bouvet, S. Matsushita, and P. Grimont. 2001. Molecular and phenotypic characterization of potentially new Shigella dysenteriae serotype. J. Clin. Microbiol. 39:618-621.
4. Espié, E., F. Grimont, V. Vaillant, M. Montet, I. Carle, C. Bavai, H. de Valk, and C. Vernozy-Rozand. 2006. O148 Shiga toxin-producing Escherichia coli outbreak: microbiological investigation as a useful complement to epidemiological investigation. Clin. Microbiol. Infect. 12:992-998.
5. Grimont, F., M. Lejay-Collin, K. Talukder, I. Carle, S. Issenhuth, K. Le Roux, and P. Grimont. 2007. Identification of a group of shigella-like isolates as Shigella boydii 20. J. Med. Microbiol. 56:749-754.
6. Melito, P., D. Woodward, J. Munro, J. Walsh, R. Foster, P. Tilley, A. Paccagnella, J. Isaac-Renton, J. Ismail, and L. Ng. 2005. A novel Shigella dysenteriae serovar isolated in Canada. J. Clin. Microbiol. 43:740-744.
7. Needleman, S., and C. Wunsch. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48:443-453.

Articles from Journal of Clinical Microbiology are provided here courtesy of American Society for Microbiology (ASM)
