Letter. Appl Environ Microbiol. 2003 Oct;69(10):6342–6343. doi: 10.1128/AEM.69.10.6342-6343.2003

Terminal Restriction Fragment Length Polymorphism Data Analysis

Alastair Grant and Lesley A. Ogilvie
PMCID: PMC201242  PMID: 14532073

Blackwood et al. (1) discuss the statistical analysis of terminal restriction fragment length polymorphism (T-RFLP) data. They sampled several soils and identified the methods that correctly grouped replicate samples (in cluster analysis) and distinguished between sites (in redundancy analysis). Here, we argue that their recommended analyses will not be appropriate for many studies of microbial communities: statistical analysis should be more explicitly informed by scientific objectives.

Redundancy analysis may be appropriate for analyzing data from designed experiments, or where a strong environmental gradient is expected to have a large influence on the microbial community. But where the data have no strong structure defined a priori, similarities between samples are more sensibly explored by ordination methods such as principal component analysis or multidimensional scaling. The resulting visual displays give powerful insights into the data (see reference 3 for examples).
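As an illustration of ordination, the following is a minimal Python sketch of principal coordinates analysis (classical, metric multidimensional scaling), which places samples in a few dimensions so that distances between points approximate a chosen between-sample dissimilarity. It is one variant only: non-metric multidimensional scaling, which works on the ranks of the dissimilarities rather than preserving their values, is a different procedure. The function name `pcoa`, the power-iteration solver, and the square-shaped example matrix are illustrative assumptions, not part of any cited method.

```python
import math
import random


def pcoa(dist, n_axes=2, iters=500, seed=0):
    """Principal coordinates analysis (classical metric MDS), illustrative sketch.

    dist: square, symmetric list-of-lists of dissimilarities between samples.
    Returns one list of n_axes coordinates per sample. Axes are found by power
    iteration with deflation, adequate for small tables; once an axis with a
    non-positive eigenvalue is reached, it and all later axes are left at zero.
    """
    n = len(dist)
    # Gower centring: B = -1/2 * J D^2 J, where J is the centring matrix.
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    means = [sum(row) / n for row in a]  # row means equal column means (symmetric)
    grand = sum(means) / n
    b = [[a[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * n_axes for _ in range(n)]
    for axis in range(n_axes):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract on this axis
                break
            v = [x / norm for x in w]
        vnorm = math.sqrt(sum(x * x for x in v))
        v = [x / vnorm for x in v]
        # Rayleigh quotient: eigenvalue associated with the unit vector v.
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        if lam <= 1e-12:
            break  # zero or negative eigenvalue: remaining axes stay at zero
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate B so the next pass finds the next-largest axis.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Four hypothetical samples whose dissimilarities are those of a unit square.
s = math.sqrt(2)
d = [[0, 1, 1, s], [1, 0, s, 1], [1, s, 0, 1], [s, 1, 1, 0]]
for row in pcoa(d):
    print([round(x, 3) for x in row])
```

With a T-RFLP table, `dist` would hold pairwise dissimilarities between profiles. Dissimilarities that are not Euclidean can produce negative eigenvalues; this sketch simply stops at the first non-positive axis rather than correcting for them.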

There is often no reason to expect samples to fall into discrete groups, yet many clustering methods will identify apparently well-defined clusters in data that contain no natural groups (2). Ward's method is particularly prone to this problem. Cluster analysis is best viewed as a way of dividing samples into convenient but arbitrary groups, and it should not be the only exploratory data analysis method used.
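The point that a clustering algorithm returns groups whatever the data can be seen with a minimal Python sketch of Ward's agglomerative method, assuming squared Euclidean distances and the Lance-Williams update; `ward_clusters` and the simulated points are hypothetical. Cut at three clusters, it partitions 30 points drawn from a single uniform distribution into three groups, exactly as it would partition genuinely grouped data. This does not reproduce the comparisons in reference 2; it only shows that the output is always a partition, whether or not natural groups exist.

```python
import random


def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters.

    Dissimilarities start as squared Euclidean distances and are updated with
    the Lance-Williams formula for Ward's method. Returns a list of clusters,
    each a sorted list of point indices.
    """
    n = len(points)
    clusters = {i: [i] for i in range(n)}
    d = {}  # (smaller cluster id, larger cluster id) -> Ward dissimilarity
    for i in range(n):
        for j in range(i + 1, n):
            d[(i, j)] = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    next_id = n
    while len(clusters) > k:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])  # closest pair
        del d[(i, j)]
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        # Lance-Williams update: dissimilarity of every remaining cluster m
        # to the newly merged cluster (i + j).
        for m, members in clusters.items():
            nm = len(members)
            dim = d.pop((min(i, m), max(i, m)))
            djm = d.pop((min(j, m), max(j, m)))
            d[(m, next_id)] = ((ni + nm) * dim + (nj + nm) * djm - nm * dij) / (ni + nj + nm)
        clusters[next_id] = merged
        next_id += 1
    return [sorted(c) for c in clusters.values()]


# 30 hypothetical samples drawn from one uniform distribution: no real groups.
rng = random.Random(1)
uniform = [(rng.random(), rng.random()) for _ in range(30)]
groups = ward_clusters(uniform, 3)
print(len(groups), sorted(len(g) for g in groups))
```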

Using peak heights will downweight longer fragments, because diffusion during electrophoresis broadens their peaks and so lowers their height relative to their area. It is therefore preferable to use peak areas (4).
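Assuming, for illustration only, that each peak is Gaussian with standard deviation σ, its area equals height × σ × √(2π). At a fixed area (a fixed amount of labeled product), a peak broadened to twice the width therefore has half the height. The Python sketch below, with hypothetical widths and a hypothetical function name, simply evaluates that relationship.

```python
import math


def gaussian_peak_height(area, sigma):
    """Height of a Gaussian peak with the given area and standard deviation.

    For a Gaussian, area = height * sigma * sqrt(2 * pi).
    """
    return area / (sigma * math.sqrt(2.0 * math.pi))


# Two hypothetical T-RFs carrying the same amount of signal (equal area);
# the longer fragment's peak is assumed to be twice as wide.
area = 1000.0
for label, sigma in (("shorter T-RF", 0.5), ("longer T-RF", 1.0)):
    print(label, "height:", round(gaussian_peak_height(area, sigma), 1), "area:", area)
```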

On the basis of which similarity measure gave “the right answer” for their data, Blackwood et al. (1) recommend using Euclidean distance on square-root-transformed relative peak heights (Hellinger distance). Euclidean distances treat the absence of a species from both of two samples as a sign that the samples are similar. There is, therefore, a strong argument for preferring Bray-Curtis (Czekanowski) similarities, which are not affected by the number of joint absences (3).
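For reference, a minimal Python sketch of the two measures under discussion; the function names and peak areas are hypothetical. Hellinger distance is computed as the Euclidean distance between square roots of relative abundances, and Bray-Curtis similarity as 1 - sum|x - y| / sum(x + y) on a 0 to 1 scale (reference 3 expresses it as a percentage). Because a T-RF absent from both samples adds zero to both the numerator and the denominator, appending such T-RFs leaves the Bray-Curtis value unchanged, which is the property cited above.

```python
import math


def hellinger(x, y):
    """Hellinger distance: Euclidean distance between sqrt(relative abundance) vectors."""
    sx, sy = sum(x), sum(y)
    return math.sqrt(sum((math.sqrt(a / sx) - math.sqrt(b / sy)) ** 2
                         for a, b in zip(x, y)))


def bray_curtis_similarity(x, y):
    """Bray-Curtis (Czekanowski) similarity, 1 - sum|x - y| / sum(x + y), on a 0-1 scale."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return 1.0 - num / den


# Hypothetical peak areas for four T-RFs in two samples.
s1 = [120.0, 40.0, 0.0, 15.0]
s2 = [100.0, 0.0, 30.0, 10.0]
print(round(hellinger(s1, s2), 3), round(bray_curtis_similarity(s1, s2), 3))

# Appending T-RFs absent from both samples leaves Bray-Curtis unchanged:
# they add zero to both the numerator and the denominator.
print(bray_curtis_similarity(s1 + [0.0, 0.0], s2 + [0.0, 0.0]))
```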

It is also helpful to consider more explicitly what constitutes an important difference between samples. Analyses based on raw data will be dominated by variation in the abundance of a small number of common operational taxonomic units (OTUs). A log or square root transformation reduces the influence of the most common OTUs. Jaccard distances and other measures based on presence/absence data give equal weight to rare and abundant OTUs. Studies of eukaryote communities have often found that log or square root transformations yield the most informative analysis. But analysis of raw or binary data may be appropriate if one is primarily interested in common or rare species, respectively.
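The effect of each transformation on the weight given to a dominant OTU can be checked with a short Python sketch. The profile and function names are hypothetical, and using log(1 + x) so that absent OTUs stay at zero is a common convention assumed here rather than one specified above. For this profile, the printed share of the transformed total held by the most abundant OTU falls from raw data through the square root and log transformations to presence/absence, where every OTU present counts equally. A Jaccard distance on presence/absence is included for comparison.

```python
import math


def transform(profile, method):
    """Apply one of the transformations discussed in the text to a profile."""
    if method == "raw":
        return list(profile)
    if method == "sqrt":
        return [math.sqrt(x) for x in profile]
    if method == "log":
        return [math.log1p(x) for x in profile]  # log(1 + x), so zeros stay zero
    if method == "binary":
        return [1.0 if x > 0 else 0.0 for x in profile]
    raise ValueError(f"unknown method: {method}")


def jaccard_distance(x, y):
    """1 - |shared OTUs| / |OTUs present in either sample| (presence/absence only)."""
    px = {i for i, v in enumerate(x) if v > 0}
    py = {i for i, v in enumerate(y) if v > 0}
    union = px | py
    return 1.0 - len(px & py) / len(union) if union else 0.0


# Hypothetical profile: one dominant OTU and three progressively rarer ones.
profile = [900.0, 90.0, 9.0, 1.0]
for method in ("raw", "sqrt", "log", "binary"):
    t = transform(profile, method)
    # Share of the transformed total contributed by the dominant OTU.
    print(method, round(t[0] / sum(t), 3))

print("Jaccard:", jaccard_distance([5, 0, 3, 1], [2, 4, 0, 1]))
```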

REFERENCES

1. Blackwood, C. B., T. Marsh, S. H. Kim, and E. A. Paul. 2003. Terminal restriction fragment length polymorphism data analysis for quantitative comparison of microbial communities. Appl. Environ. Microbiol. 69:926-932.
2. Chatfield, C., and A. J. Collins. 1980. Introduction to multivariate analysis. Chapman & Hall/CRC Press, Boca Raton, Fla.
3. Clarke, K. R., and R. M. Warwick. 1994. Change in marine communities: an approach to statistical analysis and interpretation. Plymouth Marine Laboratory, Plymouth, United Kingdom.
4. Kitts, C. L. 2001. Terminal restriction fragment patterns: a tool for comparing microbial communities and assessing community dynamics. Curr. Issues Intest. Microbiol. 2:17-25.

Authors' Reply

Christopher B. Blackwood, Terry Marsh, Sang-Hoon Kim, and Eldor A. Paul

We thank Grant and Ogilvie for pointing out that statistical analysis should be driven by scientific objectives. We are in agreement and would hope that the recommendations made in our paper (1) are not applied without consideration of the details of the experiment being analyzed. Some statistical methods, however, performed with a high degree of sensitivity for T-RFLP data, while others did not. While methods that we did not include may be superior in certain situations, their sensitivity in analyzing T-RFLP data remains to be tested. As stated in our paper, the study was not meant to be exhaustive, and we look forward to tests of alternative methods of T-RFLP data analysis.

Redundancy analysis cannot be used when no information other than the T-RFLP profiles is available. We agree with Grant and Ogilvie that in such situations it would be useful to apply an ordination technique in addition to cluster analysis; one method can complement the other. It may also be prudent to apply the lessons from our study when performing ordinations. Cluster analysis can be useful because a single dendrogram can summarize the information in several ordination plots. We did not observe cluster analysis identifying well-defined groups when no groups were visible in ordination plots (comparisons not discussed in our paper), although this is possible. When there were no natural groups of profiles, dendrograms showed little heterogeneity in stem lengths and ordination plots showed an undifferentiated cloud of points.

There is no consensus on whether T-RFLP peak height or peak area should be analyzed. Grant and Ogilvie recommend analysis of peak area, which we avoided because Genescan does not deconvolute overlapping peaks, so a peak's measured area is artificially altered by its proximity to other peaks. For comparisons between communities, the downweighting of larger fragments when peak height is used may not be overly detrimental, since this effect will be constant across profiles and could be dealt with analytically if necessary.

As we stated (1), future evaluations of T-RFLP data analysis could include other distance measures, such as the Bray-Curtis similarity mentioned by Grant and Ogilvie. We recommended either Hellinger or Jaccard distance because they performed equally well in general, and the properties of one measure may be preferred in particular circumstances. Redundancy analysis based on Bray-Curtis similarity, like that based on Jaccard distance, does not yield scores for evaluating the effects of particular T-RFs. In the study of Legendre and Gallagher (2), the performance of Bray-Curtis similarity was very good but not equal to that of Hellinger distance.

While Grant and Ogilvie mention that analysis of raw T-RFLP data may be desirable in some situations, we observed that raw T-RFLP data are influenced by analytical noise. Profiles should at least be transformed to relative peak height, unless very different laboratory methods are used. Also, if one is interested in heavily weighting small peaks, then care must be taken to have uniform total fluorescence among profiles.
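A minimal Python sketch of the normalization to relative peak height; the function names and values are hypothetical. The second part assumes, for illustration, that peaks are called above a fixed fluorescence threshold (50 units here, an arbitrary value): a run with lower total fluorescence then loses small peaks that a stronger run retains, which is one way unequal total fluorescence can distort analyses that weight small peaks heavily.

```python
def relative_profile(heights):
    """Divide each peak by the profile total so that peaks sum to 1."""
    total = sum(heights)
    if total <= 0:
        raise ValueError("profile has no signal")
    return [h / total for h in heights]


def apply_threshold(heights, threshold):
    """Zero out peaks below a fixed detection threshold (hypothetical cut-off)."""
    return [h if h >= threshold else 0.0 for h in heights]


# The same hypothetical community run twice, the second run with half the
# total fluorescence (e.g. less DNA loaded).
strong = [800.0, 400.0, 60.0]
weak = [400.0, 200.0, 30.0]

# Relative peak heights remove the difference in total signal...
print(relative_profile(strong), relative_profile(weak))

# ...but a fixed threshold of 50 units drops the small peak from the weak run
# only, so the two profiles no longer agree on that T-RF.
print(relative_profile(apply_threshold(strong, 50)),
      relative_profile(apply_threshold(weak, 50)))
```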

We hope that these observations and the others in our paper will serve as a good starting point for future efforts to analyze T-RFLP data.

REFERENCES

1. Blackwood, C. B., T. Marsh, S. Kim, and E. A. Paul. 2003. Terminal restriction fragment length polymorphism data analysis for quantitative comparison of microbial communities. Appl. Environ. Microbiol. 69:926-932.
2. Legendre, P., and E. D. Gallagher. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia 129:271-280.
