Blackwood et al. (1) discuss the statistical analysis of terminal restriction fragment length polymorphism (T-RFLP) data. They sampled several soils and identified methods that correctly grouped replicate samples (cluster analysis) and successfully distinguished between sites (redundancy analysis). Here, we argue that their recommended analyses will not be appropriate for many studies of microbial communities. Statistical analysis should be more explicitly informed by scientific objectives.
Redundancy analysis may be appropriate for the analysis of data from designed experiments or where there is a strong environmental gradient that is expected to have a large influence on microbial ecology. But, where data do not have a strong structure defined a priori, similarities between samples are more sensibly explored by ordination methods such as principal component analysis or multidimensional scaling. The resulting visual displays give powerful insights into the data (see reference 3 for examples).
There is often no reason to expect samples to fall into discrete groups. But many clustering methods will identify apparently well-defined clusters in data where there are no natural groups (2). Ward's method is particularly prone to this problem. Cluster analysis is best viewed as a way of dividing samples up into convenient but arbitrary groups and should not be the only exploratory data analysis method used.
Using peak heights will downweight longer fragments, because diffusion during electrophoresis broadens their peaks and so lowers them. It is therefore preferable to use peak areas (4).
On the basis of which similarity measure gave “the right answer” for their data, Blackwood et al. (1) recommend using Euclidean distance on square-root-transformed peak heights (Hellinger distances). Euclidean distances take absences of a species from two samples as a sign of their similarity. There is, therefore, a strong argument for preferring Bray-Curtis (Czekanowski) similarities, which are not affected by the number of joint absences (3).
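The contrast between the two measures can be sketched with a toy calculation. The example below is illustrative only and is not taken from Blackwood et al. (1): the three abundance vectors are made up, the function names are hypothetical, and it uses plain Euclidean distance on untransformed values rather than the Hellinger transformation. It shows that Euclidean distance can rate two samples with no OTUs in common as closer than two samples containing the same OTUs at different abundances, whereas Bray-Curtis dissimilarity does not, and that appending jointly absent OTUs leaves Bray-Curtis unchanged.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two abundance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|a - b| / sum(a + b)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


# Made-up abundances of three OTUs in three samples (assumed values).
s1 = [0, 1, 1]
s2 = [1, 0, 0]  # shares no OTU with s1
s3 = [0, 4, 4]  # same OTUs as s1, at higher abundance

for name, other in (("s1 vs s2", s2), ("s1 vs s3", s3)):
    print(name,
          "Euclidean:", round(euclidean(s1, other), 3),
          "Bray-Curtis:", round(bray_curtis(s1, other), 3))

# Two extra OTUs that are absent from both samples (joint absences).
padded = bray_curtis(s1 + [0, 0], s3 + [0, 0])
print("Bray-Curtis with joint absences added:", round(padded, 3))
```

In this sketch, Euclidean distance places s1 nearer to s2 than to s3, while Bray-Curtis assigns the s1 and s2 pair its maximum dissimilarity of 1.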
It is also helpful to consider more explicitly what represents an important difference between samples. Analyses based on raw data will be dominated by variations in the abundance of a small number of common operational taxonomic units (OTUs). A log or square root transformation reduces the influence of commoner OTUs. Jaccard distances and other methods based on presence/absence data give equal weighting to rare and abundant OTUs. Studies in eukaryote ecology have often found that log or square root transformations yield the most informative analyses. However, the analysis of either raw or binary data may be appropriate if one is interested in common or rare species, respectively.
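The effect of these choices on the weighting of OTUs can be sketched as follows. This is a minimal illustration under stated assumptions: the two abundance profiles are invented, `dominant_share` is a hypothetical helper, and the log transformation is taken as log(1 + x) so that zero abundances are defined. For each transformation, the code reports the fraction of the summed absolute between-sample difference that is contributed by the single most abundant OTU.

```python
import math


def dominant_share(a, b, transform):
    """Fraction of the total absolute difference between samples a and b
    that comes from the first (most abundant) OTU after transformation."""
    diffs = [abs(transform(x) - transform(y)) for x, y in zip(a, b)]
    return diffs[0] / sum(diffs)


# Candidate transformations; log1p(x) = log(1 + x) is an assumed choice.
transforms = {
    "raw": lambda v: v,
    "square root": math.sqrt,
    "log(1 + x)": math.log1p,
    "presence/absence": lambda v: 1 if v > 0 else 0,
}

# Invented abundances: one common OTU, one intermediate, two rare.
sample_a = [1000, 10, 1, 0]
sample_b = [500, 20, 0, 1]

shares = {name: dominant_share(sample_a, sample_b, f)
          for name, f in transforms.items()}

for name, share in shares.items():
    print(f"{name:>16}: {share:.2f}")
```

With these made-up values, the share attributable to the common OTU falls from roughly 0.98 for raw data to about 0.74 and 0.25 after square root and log transformation, and to 0 for presence/absence data, where only the two rare OTUs differ between the samples.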
REFERENCES
- 1. Blackwood, C. B., T. Marsh, S. H. Kim, and E. A. Paul. 2003. Terminal restriction fragment length polymorphism data analysis for quantitative comparison of microbial communities. Appl. Environ. Microbiol. 69:926-932.
- 2. Chatfield, C., and A. J. Collins. 1980. Introduction to multivariate analysis. Chapman & Hall/CRC Press, Boca Raton, Fla.
- 3. Clarke, K. R., and R. M. Warwick. 1994. Change in marine communities: an approach to statistical analysis and interpretation. Plymouth Marine Laboratory, Plymouth, United Kingdom.
- 4. Kitts, C. L. 2001. Terminal restriction fragment patterns: a tool for comparing microbial communities and assessing community dynamics. Curr. Issues Intest. Microbiol. 2:17-25.