Considerable advances in automated DNA sequencing and bioinformatics have made the sequencing of entire microbial genomes a reality (see fig 1). Researchers have been quick to recognise the scientific value of microbial sequence data, and there is now a frantic rush to sequence a wide range of important micro-organisms. The continuing release of data from these projects, combined with advances in techniques used to investigate pathogenesis, will lead to an unprecedented understanding of micro-organisms and how they cause disease.
Figure 1.
Sequencing of the genome of a micro-organism such as Helicobacter pylori (a). First, the microbial chromosomal DNA is isolated and purified (b). The DNA is then mechanically sheared into random fragments, which are sequenced by banks of high throughput sequencers (c). The completed fragments are assembled in the correct order by areas of overlapping sequence, and any gaps are closed (d). Computer analysis of the completed sequence permits annotation of the data and prediction of putative coding regions, operons, and regulatory sequences. The end result is a gene map (e) with genes colour coded by their role (energy metabolism, amino acid biosynthesis, replication, etc)
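The assembly step in the figure, in which fragments are ordered by their areas of overlapping sequence, can be illustrated with a toy example. The sketch below is a simple greedy merge written for illustration only; the function names and the minimum overlap of three bases are my assumptions, and real assembly software has to cope with sequencing errors, repeats, and both DNA strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments that shares the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            # No overlaps left: any remaining gaps must be closed by other means.
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Fragments that share no sufficiently long overlap are returned as separate contigs, which corresponds to the gaps that must be closed in step (d).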
In this article I discuss how this sequence information will affect the practice of clinical microbiology and the management of infectious diseases. In particular, I examine how the large scale study of the entire genetic complement of micro-organisms should result in the provision of more rapid and efficient diagnostic techniques, as well as the development of new antimicrobial strategies.
Microbial genome sequencing
In the past few years a large number of clinically important bacteria, parasites, and fungi have been sequenced, and genomic data for many human pathogens will be available by the turn of the century (see box). Because many of the early sequencing efforts were funded by private industry, concerns were raised about ownership and freedom of access to the data generated by these projects. As sequencing technology has become more refined and efficient, however, the cost of such projects has fallen, and funding is increasingly being provided by government agencies and medical charities. This has greatly improved access to microbial sequence information, and data from most projects are deposited, often on a daily basis, on websites on the internet.
A comprehensive catalogue of all genome sequencing projects is provided by the Magpie Genome Sequencing Project List (http://www.mcs.anl.gov/home/gaasterl/magpie.html), and links to individual projects are also available at this site. Other useful resources include the website of the Institute for Genomic Research (http://www.tigr.org), which has performed much of the pioneering work on genomics, and the Pasteur Institute’s home page (http://www.pasteur.fr), which has a version in English and provides cross referencing links to related sites in general microbiology, medicine, and molecular biology.
Predicted developments
- Comprehensive study of microbial pathogenesis and the interaction between pathogens and their hosts
- Identification of sensitive and specific molecular targets suitable for microbial identification, typing, and for use as markers of antimicrobial resistance
- Discovery of microbial molecular markers associated with substantial variance in the risk and severity of disease
- Selection of potential candidates for the rational development of new therapeutic agents and vaccines
- Improved understanding of normal human gene function and provision of experimental systems for the further analysis of human hereditary disease
Large scale projects for sequencing microbial genomes that are completed or in progress
| Bacteria and related organisms | Parasites | Fungi |
| --- | --- | --- |
| Borrelia burgdorferi | Brugia malayi | Aspergillus nidulans |
| Escherichia coli O157 | Cryptosporidium parvum | Candida albicans |
| Haemophilus influenzae | Leishmania major | Saccharomyces cerevisiae |
| Helicobacter pylori | Plasmodium falciparum | |
| Mycobacterium leprae and M tuberculosis | Pneumocystis carinii | |
| Mycoplasma pneumoniae | Schistosoma mansoni | |
| Neisseria gonorrhoeae and N meningitidis | Trypanosoma brucei and T cruzi | |
| Rickettsia prowazekii | | |
| Staphylococcus aureus | | |
| Streptococcus pneumoniae and S pyogenes | | |
| Treponema pallidum | | |
| Vibrio cholerae | | |
Towards a better understanding of microbial pathogenesis
In order to understand how micro-organisms cause disease, it is necessary to identify those microbial gene products specifically implicated in the infection process. By characterising the strategies used by different organisms, it is possible to appreciate why certain species or strains are associated with clinical disease, whereas others are merely harmless commensals. The genome sequence of a pathogenic micro-organism is a tremendous resource, but it is essentially only an inventory of all its genes. To unravel the complexities of pathogenesis, we must identify exactly which of these genes are associated with colonisation of the host, persistence, and disease causation.
Screening for virulence genes
With modern bioinformatic software, it is possible to screen for potential virulence genes by computer analysis of the genome sequence of a micro-organism. The power and simplicity of this approach became clear after the publication of the H influenzae genome.1 Before this genome was published, researchers had identified only seven genes involved in the biosynthesis of lipopolysaccharide, an important virulence factor. A search of the entire genome of H influenzae for lipopolysaccharide genes known to exist in other organisms (a homology search) identified 25 new biosynthetic genes.2 Further characterisation of these may lead to the identification of a potential target for a vaccine against non-type b strains of H influenzae. Similarly, it is possible to search for sequences that act as markers for virulence genes. In H influenzae and other bacteria the expression of cell surface molecules (which allow the organism to adhere to host cells, adapt to the host’s microenvironment, and avoid the host’s immune defences) is regulated by multiple repeating sequences.3 These repeating sequences can therefore be used as markers for genes important in mucosal colonisation and the interaction between the microbe and its host.
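A search for repeating sequences of this kind can be sketched in a few lines. The example below is a minimal illustration that assumes the repeats are short units copied end to end; the function name, the default unit length of four bases, and the minimum of five copies are assumptions chosen for the example rather than parameters taken from the studies cited.

```python
import re


def find_tandem_repeats(seq, unit_len=4, min_copies=5):
    """Return (start, unit, copies) for each run of a short unit repeated in tandem."""
    # One captured unit followed by at least (min_copies - 1) further copies of it.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return hits


# Hypothetical sequence containing six copies of the unit CAAT.
example = "GGATC" + "CAAT" * 6 + "TTGACG"
print(find_tandem_repeats(example))
```

Genes lying next to or spanning the reported positions would then be candidates for closer study; the sketch says nothing about whether a repeat actually regulates expression.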
Computer analysis can also be used to identify “pathogenicity islands,” hot spots within microbial genomes that contain blocks of important virulence genes. Because these distinct genetic elements usually have a different nucleotide composition to the rest of the bacterial chromosome, they can be identified with sophisticated computer software. By comparing the entire genome sequences of micro-organisms, we should be able to determine which factors are conserved for pathogenicity and which are unique to certain pathogens. Consequently, it will be possible to make predictions of clinical outcome on the basis of the genetic makeup of particular species or strains.
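The idea of finding regions with an atypical nucleotide composition can be shown with a sliding window over the sequence. This is a rough sketch only: it uses the proportion of G and C bases as the measure of composition, and the window size, step, and threshold are arbitrary assumptions. Dedicated software uses more refined statistics.

```python
def gc_fraction(seq):
    """Proportion of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_windows(genome, window=1000, step=500, threshold=0.10):
    """Return (start, gc) for windows whose G+C fraction differs from the
    genome-wide value by more than threshold."""
    overall = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - overall) > threshold:
            flagged.append((start, gc))
    return flagged


# Artificial "genome": a G+C rich backbone with an A+T rich insert.
toy_genome = "GC" * 1000 + "AT" * 200 + "GC" * 1000
print(atypical_windows(toy_genome, window=200, step=200, threshold=0.3))
```

Windows flagged in this way are only candidates; whether they contain virulence genes has to be established by examining the genes themselves.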
Biological and physiological studies
Although computer analysis of sequence data can rapidly generate new information on micro-organisms, this technique has its limitations and often only provides clues about pathogenic mechanisms. To complete the functional analysis of an organism and obtain meaningful information on how pathogens cause disease, it is necessary to perform biological and physiological studies. The traditional method of studying microbial pathogenesis has been to “knock out” or mutate an individual gene and then characterise the effect this has on the micro-organism. Normally, mutating a gene will result in the loss of a certain function (for example, loss of motility), and it is then possible to determine if loss of this function affects the ability of the micro-organism to cause disease in an animal model. This gene by gene approach is laborious and time consuming, particularly when even the smallest free living micro-organism (Mycoplasma genitalium) has 470 genes.4 Since the advent of genome sequencing the challenge facing researchers has been to develop techniques with high throughput that allow the simultaneous examination of all the genes of a micro-organism in a systematic fashion.
One system that has been developed for the large scale study of pathogenesis is signature tagged mutagenesis.5 In this technique, a large number of genes (usually about 100) are mutated simultaneously, so creating a pool of knockout mutants. As each gene is knocked out, a unique identifying “bar code” (the tag) is inserted into each knockout mutant, so making it possible to identify any of the 100 mutants within the pool. This means that the entire pool of knockout mutants can be studied together (rather than one by one), and any individual mutant that shows interesting features can be pulled out of the pool by means of the bar code identification system. This technique has proved particularly useful for identifying new virulence factors in animal models. A pool of bar coded knockout mutants is used to inoculate and cause disease in the animal. Individual mutants that are not subsequently recovered from the animal have lost the ability to establish infection and therefore harbour a mutation in a gene likely to be involved in causing disease. These particular knockout mutants can be picked out of the pool, and the role of the mutated gene in pathogenesis can be studied further. Signature tagged mutagenesis is an approach that can be used in a wide range of micro-organisms and has already identified important virulence genes in Salmonella typhimurium, Staphylococcus aureus, and Vibrio cholerae.5–7
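The logic of the screen is a comparison between the tags present in the inoculum and those recovered from the animal. The sketch below illustrates that comparison; the tag and gene names are hypothetical, and in practice the presence or absence of each tag is judged from hybridisation signals rather than supplied as a tidy list.

```python
def attenuated_mutants(input_pool, recovered_tags):
    """Return tag -> disrupted gene for mutants that were inoculated
    but not recovered from the animal."""
    return {tag: gene for tag, gene in input_pool.items()
            if tag not in recovered_tags}


# Hypothetical pool: each unique tag marks the gene knocked out in that mutant.
inoculum = {"tag01": "geneA", "tag02": "geneB", "tag03": "geneC"}
recovered = {"tag01", "tag03"}

print(attenuated_mutants(inoculum, recovered))
```

Here the mutant carrying tag02 is missing after infection, so the gene it disrupts becomes a candidate virulence gene for further study.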
Other techniques that allow a more global approach to the study of microbial pathogenesis have also recently been developed.8,9 An important feature of these approaches is that they can be used either in animal models of infection or under conditions that mimic those encountered within a host cell. This allows specific identification of microbial factors that are preferentially expressed during the infection process and therefore play a crucial role in the development of disease. These newly identified virulence factors will not only further unravel the complexity of microbial pathogenesis but will also provide potential targets for diagnostic use and for new drugs and vaccines.
New approaches to diagnosis
While enormous changes have occurred elsewhere in clinical practice, the routine diagnosis of microbial infection by clinical microbiology laboratories has changed little in the past 50 years. Identification of micro-organisms continues to rely heavily on morphological characterisation, which is both time consuming and labour intensive. Despite considerable advances in molecular biology, molecular diagnostic techniques are generally restricted to the identification of fastidious, slow growing, or dangerous organisms such as Chlamydia trachomatis and Mycobacterium tuberculosis. As well as problems of cost and practicability, the paucity of truly sensitive and specific molecular targets and of cheap, easy to use methods of detection has been a major obstacle to routine implementation of these techniques.
As the entire genome sequences of many common human pathogens become available, it will finally be possible to discover if specific genetic markers do exist for individual pathogens and if it is possible to distinguish between micro-organisms on the basis of these sequences. Moreover, it will also be possible to identify genetic markers that will assist in typing (strain specific markers), in predicting clinical outcome (virulence markers), and in predicting susceptibility to antimicrobial drugs (resistance markers). Once these targets have been identified, the challenge will be to incorporate them into standardised, automated assays that will allow the rapid and efficient examination of large numbers of clinical specimens.
Classic detection of genetic markers is achieved by hybridisation (in which the target sequence pairs up and binds to a complementary “probe” sequence) or by genetic amplification of the target sequence (such as by the polymerase chain reaction).
Gene amplification
Currently, gene amplification can be used to detect only a small number of human pathogens, including Mycoplasma pneumoniae and Bordetella pertussis. Recently, there has been interest in developing amplification techniques that might allow the detection of more than one target sequence (and hence more than one pathogen) in a sample.10 This can be achieved with multiplex polymerase chain reaction, a technique that uses two or more sets of primers to amplify multiple target DNA sequences in the same reaction. With such an approach, it would theoretically be possible to screen a clinical specimen for all commonly found pathogens (for example, a stool specimen for the microbial causes of diarrhoea). Unfortunately, the number of primer pairs that can be included in the same amplification reaction is limited, and other factors, such as the need to remove inhibitory substances from clinical specimens, also prevent the wide scale adoption of such techniques. Although gene amplification may prove useful for diagnosing specific microbial infections, it is unlikely to have a major role in the large scale examination of specimens in a clinical diagnostic laboratory.
Hybridisation
The recent development of high density oligonucleotide arrays has revolutionised hybridisation technology, and these “DNA chips” have exciting potential applications for the diagnosis of micro-organisms.11 DNA chips consist of thousands of specific oligonucleotide probes (short fragments of DNA) arranged on glass or nylon supports (the array) about the size of a postage stamp. They are constructed by means of light directed oligonucleotide synthesis, a modification of a technique used in the computer chip industry, which ensures that each probe has its own specific coordinates on the array. Because so many probes can be included on such a small area, these DNA chips can be used to screen for an enormous number of potential target DNA sequences and are therefore ideal for examining clinical samples for microbial pathogens. Target DNA would be extracted from the sample and labelled with fluorescent dye, before being hybridised to the miniaturised probe (see fig 2). The result of each hybridisation reaction is a distinctive pattern of fluorescence on the array, the coordinates of the positive fluorescent spots allowing identification of the target sequences in the sample.
Figure 2.
Use of DNA chips in diagnosing microbial infections. Microbial genome sequencing identifies specific genetic markers for individual pathogens that can be used to distinguish between micro-organisms (a). Thousands of these specific markers are placed on a small glass or nylon support to create a DNA chip (b). To screen clinical samples, target DNA is extracted and labelled with fluorescent dye (c), and pooled target sequences are hybridised to the chip (d). Target sequences in the clinical sample pair up with the corresponding probes on the chip, resulting in generation of fluorescent signal at the sites of the probes (e). Analysis of the resultant pattern of fluorescence on the array allows identification of the target sequences (and hence the microbial pathogens) present in the sample
Once information on microbial genomes is available, it will be possible to design DNA chips that contain representative markers for all potential pathogens that may be present in individual clinical samples. For example, a chip could contain genetic markers for the pathogens of the central nervous system. If the target sequence of Neisseria meningitidis were present in the cerebrospinal fluid of a patient with meningitis, it would hybridise to the appropriate probe on the DNA chip and be identified by the coordinates of the generated fluorescent signal. It would also be possible to include, on the same chip, probes that differentiate more virulent strains, allow typing and subtyping, and indicate the presence of genes conferring antimicrobial resistance. Consequently, a single hybridisation reaction would provide information on the diagnosis, predict the probable response to treatment, and provide useful data for clinical epidemiology studies. Once the problem of extracting clean target sequences from samples has been solved, such a system has exciting potential for the rapid and cost effective examination of clinical specimens.
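Reading such a chip amounts to looking up the coordinates of each fluorescent spot in a table of probe positions. The following sketch assumes a hypothetical layout in which each coordinate holds a probe for one organism and one type of marker; the layout, the signal values, and the cutoff are invented for illustration.

```python
def read_chip(layout, intensities, cutoff=0.5):
    """Map fluorescent spots back to organisms and marker types.

    layout: (row, col) -> (organism, marker type)
    intensities: (row, col) -> measured signal
    """
    calls = {}
    for coord, signal in intensities.items():
        if signal >= cutoff and coord in layout:
            organism, marker = layout[coord]
            calls.setdefault(organism, set()).add(marker)
    return calls


# Hypothetical chip for pathogens of the central nervous system.
layout = {
    (0, 0): ("Neisseria meningitidis", "species"),
    (0, 1): ("Neisseria meningitidis", "resistance"),
    (1, 0): ("Streptococcus pneumoniae", "species"),
}
signals = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.05}

print(read_chip(layout, signals))
```

In this invented example only the species probe for N meningitidis exceeds the cutoff, so the organism would be reported without a resistance marker.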
This approach may also prove an efficient way to screen micro-organisms for the presence of point mutations that are important in the development of resistance to many antimicrobial drugs. Point mutations in a target sequence (in this case, the resistance gene) cause mismatching between the target sequence and the probe, and this can be detected as reduced intensity of the fluorescent signal. This technique has already been used to study the changes in the HIV-1 protease gene known to contribute to drug resistance in this virus and may prove useful for the rapid screening for resistance in other organisms.12
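Detection of a mismatch as a drop in signal can be sketched as a comparison between a test sample and a reference sample that is known to match every probe. The probe names, the intensities, and the ratio of 0.5 used as the cutoff below are assumptions for illustration and are not taken from the study cited.

```python
def reduced_signal_probes(reference, sample, min_ratio=0.5):
    """Return probes whose signal in the sample falls below min_ratio
    of the signal obtained with a perfectly matched reference."""
    flagged = []
    for probe, ref_signal in reference.items():
        if ref_signal <= 0:
            continue  # no usable reference signal for this probe
        ratio = sample.get(probe, 0.0) / ref_signal
        if ratio < min_ratio:
            flagged.append(probe)
    return sorted(flagged)


# Hypothetical probes covering three positions in a resistance gene.
reference = {"probe_1": 1.0, "probe_2": 0.8, "probe_3": 0.9}
sample = {"probe_1": 0.95, "probe_2": 0.2, "probe_3": 0.85}

print(reduced_signal_probes(reference, sample))
```

A flagged probe indicates only that the target differs from the probe at that site; identifying the exact change would require further probes or sequencing.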
Drug and vaccine development
New approaches to the development of drugs and vaccines are desperately needed given the current problems of antimicrobial resistance and the fact that it has been almost 30 years since a new class of antibiotic was discovered. Once decoded, the entire genome sequence of a micro-organism can be examined for potential target sites for attack, opening new avenues for the rational development of antimicrobial strategies. Identification of those microbial factors essential for the viability of an organism or those preferentially expressed during the infection process would allow the selection of potential molecular targets for antimicrobial design.
The development of new vaccines is also likely to be greatly accelerated by the availability of data on microbial genomes. With sophisticated computer software, all the proteins of a micro-organism could be examined for those that may be protective. This is achieved by identifying proteins that are known to be major antigens in other species or predicting which proteins are expressed on the cell surface and therefore likely to be immunogenic. Once identified, these can be purified and studied for protective effect, either alone or in synergy with other antigens. A system that permits direct screening of the entire genome of a micro-organism for immunogenic antigens has also recently been described.13 Such a system has considerable potential for the development of both prophylactic and therapeutic vaccines.
Insights into human biology and disease
Microbial genome sequencing and the application of new molecular techniques will inevitably reveal more information on the complexities of microbial pathogenesis and the interaction between pathogens and their hosts. Further examination of how pathogens exploit mammalian host cell function will also assist our understanding of human cell biology.14
Analysis of microbial genome sequences has also revealed that many proteins implicated in human hereditary diseases are also found in micro-organisms.15 These conditions include hereditary non-polyposis colon cancer, Wilson’s disease, and adrenoleukodystrophy, and further homologues will probably be identified as other microbial sequencing projects are completed. Micro-organisms can therefore be used as convenient models for studying the function of genes implicated in human hereditary diseases. For example, the human gene implicated in adrenoleukodystrophy, a neurodegenerative disease characterised by defective long chain fatty acid metabolism, is similar to two genes identified in the yeast Saccharomyces cerevisiae. Further characterisation of these yeast genes has shown that they are involved in the transport of activated long chain fatty acids, suggesting that this is the underlying defect in patients with adrenoleukodystrophy.16
As well as predicting the biological function of genes mutated in human hereditary diseases, the study of conserved proteins in micro-organisms will ultimately lead to improvements in the diagnosis and treatment of these conditions. Microbial genome sequencing is likely to play an increasingly important role in the analysis of newly discovered human genes and provide further clues to the molecular basis of hereditary diseases.
Acknowledgments
I thank Dr Agnès Labigne for helpful comments during the preparation of this manuscript. I also thank the Station de Microscopie Electronique, Institut Pasteur, and Catherine Chevalier for providing images used in the figures.
Footnotes
Funding: PJJ is supported by a research training fellowship in medical microbiology from the Wellcome Trust.
Conflict of interest: None.
References
- 1. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 1995;269:496–512. doi: 10.1126/science.7542800.
- 2. Hood DW, Deadman ME, Allen T, Masoud H, Martin A, Brisson JR, et al. Use of the complete genome sequence information of Haemophilus influenzae strain Rd to investigate lipopolysaccharide biosynthesis. Mol Microbiol. 1996;22:951–965. doi: 10.1046/j.1365-2958.1996.01545.x.
- 3. Hood DW, Deadman ME, Jennings MP, Bisercic M, Fleischmann RD, Venter JC, et al. DNA repeats identify novel virulence genes in Haemophilus influenzae. Proc Natl Acad Sci USA. 1996;93:11121–11125. doi: 10.1073/pnas.93.20.11121.
- 4. Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, et al. The minimal gene complement of Mycoplasma genitalium. Science. 1995;270:397–403. doi: 10.1126/science.270.5235.397.
- 5. Hensel M, Shea JE, Gleeson C, Jones MD, Dalton E, Holden DW. Simultaneous identification of bacterial virulence genes by negative selection. Science. 1995;269:400–403. doi: 10.1126/science.7618105.
- 6. Mei J-M, Nourbakhsh F, Ford CW, Holden DW. Identification of Staphylococcus aureus virulence genes in a murine model of bacteraemia using signature-tagged mutagenesis. Mol Microbiol. 1997;26:399–407. doi: 10.1046/j.1365-2958.1997.5911966.x.
- 7. Chiang SL, Mekalanos JJ. Use of signature-tagged mutagenesis to identify Vibrio cholerae genes critical for colonization. Mol Microbiol. 1998;27:797–805. doi: 10.1046/j.1365-2958.1998.00726.x.
- 8. Valdivia RH, Falkow S. Bacterial genetics by flow cytometry: rapid isolation of Salmonella typhimurium acid-inducible promoters by differential fluorescence induction. Mol Microbiol. 1996;22:367–378. doi: 10.1046/j.1365-2958.1996.00120.x.
- 9. Mahan MJ, Slauch JM, Mekalanos JJ. Selection of bacterial virulence genes that are specifically induced in host tissues. Science. 1993;259:686–688. doi: 10.1126/science.8430319.
- 10. Roberts TC, Storch GA. Multiple PCR for diagnosis of AIDS-related central nervous system lymphoma and toxoplasmosis. J Clin Microbiol. 1997;35:268–269. doi: 10.1128/jcm.35.1.268-269.1997.
- 11. Marshall A, Hodgson J. DNA chips: an array of possibilities. Nature Biotechnol. 1998;16:27–31. doi: 10.1038/nbt0198-27.
- 12. Kozal MJ, Shah N, Shen N, Yang R, Fucini R, Merigan TC, et al. Extensive polymorphisms observed in HIV-1 clade B protease gene using high-density oligonucleotide arrays. Nature Med. 1996;2:753–759. doi: 10.1038/nm0796-753.
- 13. Barry MA, Lai WC, Johnston SA. Protection against mycoplasma infection using expression-library immunization. Nature. 1995;377:632–635. doi: 10.1038/377632a0.
- 14. Finlay BB, Cossart P. Exploitation of mammalian host cell functions by bacterial pathogens. Science. 1997;276:718–725. doi: 10.1126/science.276.5313.718.
- 15. Bassett DE Jr, Boguski MS, Hieter P. Yeast genes and human disease. Nature. 1996;379:589–590. doi: 10.1038/379589a0.
- 16. Hettema EH, van Roermund CWT, Distel B, van den Berg M, Vilela C, Rodrigues-Pousada C, et al. The ABC transporter proteins Pat1 and Pat2 are required for import of long-chain fatty acids into peroxisomes of Saccharomyces cerevisiae. EMBO J. 1996;15:3813–3822.


