Abstract
The complete DNA sequence of pO157, the large virulence plasmid of EHEC strain O157:H7 EDL 933, is presented. The 92 kb F-like plasmid is composed of segments of putative virulence genes in a framework of replication and maintenance regions, with seven insertion sequence elements, located mostly at the boundaries of the virulence segments. One hundred open reading frames (ORFs) were identified, of which 19 were previously sequenced potential virulence genes. Forty-two ORFs were sufficiently similar to known proteins for suggested functions to be assigned, and 22 had no convincing similarity with any known proteins. Of the newly identified genes, an unusually large ORF of 3169 amino acids has a putative cytotoxin active site shared with the large clostridial toxin (LCT) family and proteins such as ToxA and B of Clostridium difficile . A conserved motif was detected that links the large ORF and the LCT proteins with the OCH1 family of glycosyltransferases. In the complete sequence, the mosaic form can be observed at the levels of base composition, codon usage and gene organization. Insights were obtained from patterns of DNA composition as well as the pathogenic and 'housekeeping' gene segments. Evolutionary trees built from shared plasmid maintenance genes show that even these genes have heterogeneous origins.