Skip to main content
Bioinformation logoLink to Bioinformation
. 2010 Apr 30;4(10):463–467. doi: 10.6026/97320630004463

Classification of DNA sequences based on thermal melting profiles

Edward Reese 1, Vishwanathan V Krishnan 1,2,3,*
PMCID: PMC2951702  PMID: 21042401

Abstract

A new classification scheme based on the melting profile of DNA sequences simulated thermal melting profiles. This method was applied in the classification of (a) several species of mammalian ­ β globin and (b) α­chain class II MHC genes. Comparison of the thermal melting profile with the molecular phylogenetic trees constructed using the sequences shows that the melting temperature based approach is able to reproduce most of the major features of the sequence based evolutionary tree. Melting profile method takes into account the inherent structure and dynamics of the DNA molecule, does not require sequence alignment prior to tree construction, and provides a means to verify the results experimentally. Therefore our results show that melting profile based classification of DNA sequences could be a useful tool for sequence analysis.

Keywords: DNA, hybridization, melting profiles, classification

Background

The DNA double helix has more information built into its structure, that are both local such as variation in base pairing interactions and stacking interactions, as well as long range such as dynamic superhelical stress [13]. These interactions are responsible for the physical chemistry of the sequences and they are reflected in the thermodynamic properties such as the melting temperature. Therefore, sequences that have a high homology are expected to have similar thermodynamic parameters such as the melting temperature (Tm). Melting of a DNA molecule involves the denaturation the double­stranded DNA molecule into two single strands and it is the reverse process of hybridization. The denaturation process can be affected by many means such as an increase in temperature or denaturant concentrations [4]. The melting curve represents the denaturation process as a function of increasing sample temperature. Experimentally, the melting profile can be monitored by optical techniques such as absorption and fluorescence microscopy as the interactions among stacked bases cause a decrease in UV absorption. Melting of double­stranded DNA at elevated temperatures involves breaking the hydrogen bonds of the base pairs and a decrease of base stacking. This results in an increase in UV absorption, a hyperchromicity, which can be measured with a spectrophotometer [4].

Melting curve analysis has been used in many applications such as the detection of single nucleotide polymorphisms (SNPs) [57] and has recently been proposed as an approach to DNA sequencing [8]. DNA melting profile analysis has also been used in many clinical research applications [9]; these include genotyping [1013], mutation scanning [14,15] and simultaneous genotyping and mutation scanning [1618]. Experiments based on melting profiles have also been used as a rapid, economical means of screening close relatives for transplant compatibility [19]. The melting behavior and melting temperature of oligonucleotides can be predicted by a wide range of thermodynamic models which assumes that the stability of a DNA duplex depends on the identity and orientation of the neighboring base pairs [572025]. The idea of using the thermal stability, in particular the melting temperature to differentiate between DNA sequences was originally suggested in the pioneering work by King and Wilson [26] and later followed by others [2629]. King and Wilson used the nucleic acid hybridization melting temperature to quantify the resemblance between human and chimpanzee genes [26]. The difference in melting temperature between the reannealed human DNA and human­chimpanzee hybrid DNA is about 1.1° C, and that to the sibling species of Drosophila, congeneric species of mice and congeneric species of Drisophila are 3° C, 5° C and 19° C, respectively. Higher the difference in the Tm, larger the evolutionary distance between the species. As longer DNA sequences tend to have several localized melting events [30,31], the melting profile of the DNA sequence has additional information [9]. In one of the early works, Schmid and Marks [28] demonstrated the use of DNA hybridization as a guide to phylogeny using a model system of heteroduplexes formed between human β globin CDNA and four β ­like globin genes isolated from a different species (chimpanzee).

In this study, we present a simple method for classifying the nucleotide sequences using simulated melting profiles. We demonstrate the utility of this method in β έglobin and gene clusters of MHC class II α­chain proteins across multiple mammalian species. Comparison of the melting profile generated phylogenetics with that of the conventional sequence based approach reproduces many of the major features, but do not show a perfect match. The major advantage of this method is that it provides a way to verify phylogenetics constructed only from the DNA sequence and the molecular evolutionary process experimentally.

Methodology

The first exon of the β ­globin gene is often used as a standard example in many DNA­based graphing methods [32] is used as the first example. The gene family of β ­globin varies between 86­105 bases and has a significant biological role in oxygen transport. β ­ globin gene across the 11 species is listed in Table 1. The class II MHC α­chain sequences were obtained from the original reference of Takahashi and co­workers [33] (see supplementary material for data). The average length of the class II MHC α­chain is 612 and a total of 31 different species were used for the analysis. The melting profile was separately calculated for each individual sequence using the MELTSIM program [34, 35]. The melting profiles were calculated from 60°C to 120°C in steps of 0.1°C using the default setting (75mM of NaCl). Classification of the calculated profiles were performed using the Euclidian distance measure, unweighted arithmetic average for clustering with a 10 fold bootstrapping for revalidation. Clustering and validation of the profiles were performed using a combination of codes written in Matlab [36] and R [37,38]. For the same DNAs, sequence­based evolutionary trees were constructed using MEGA [39]. All phylogenetic output files were generated in newick tree format (http://evolution.genetics.washington.edu/phylip/newicktree.html) , and visualized using Treedyn [40].

Results

The gene family of β-globin across the 11 species are listed in Table 1 along with the percentage of the various bases, GC content as well the estimated Tm, for each sequence. Figure 1 shows the simulated melting profile using the program MELTSIM of the denaturation process (θ vs. temperature) (in Figure 1a) for all the 11 sequences and its first derivative dθ/dT (in Figure 1b). The population value 0.5 and ­0.5 represents the double­­helical and denatured (single strand) DNAs, respectively and the melting temperature (Tm) is defined to be where these populations are equal. The first derivative of the melting profile is more illustrative of the process of denaturation, as the melting temperature is represented as a peak (dθ/dT). Each derivative profile shows peak value that corresponds to the Tm and additional features such the width, shape and other low intensity peaks are manifestations of the sequence composition [41,42]. As β­globin is represented by a relatively small number of bases such distinct features are not pronounced, except for the opossum (highlighted with dark curve in Figure 1). The DNA sequence of opossum has the lowest of the GC content of the sequences and consequently has the lowest Tm [43]. Melting profile based phylogenetics of β ­globin is shown in Figure 2a, while the similar profile using the sequence is shown in Figure 2b. Most of the tree structures are reproduced in the melting profile based phylogenetics, with few notable differences. In both approaches the sequence of Opossum is clearly differentiated and the relative distances between Rat, Lemur and Gallus are also reproduced. Sequence based approach shows the cluster of Human, Chimpanzee and Gorilla, while the melting profile based approach keeps only the cluster of Chimpanzee and Gorilla together.

Figure 1.

Figure 1

Melting profiles of beta­globin genes: (a) and their respective first derivatives (b). Some of the sequences are labeled in (b) and the melting profile of Opossum is shown in dark lines.

Figure 2.

Figure 2

Evolutionary trees of β­globin genes: Evolutionary trees constructed using the thermal melting profiles (left) and that from the respective gene sequence after sequence alignment.

The shape of the melting curve and the melting temperature (Tm) are sensitive to the salt concentration in the sample [44]. All the calculations in Figure 1 are performed with the salt concentration of 75mM. To highlight the effect of salt concentration on the melting profile, the β­globin sequence of opossum that shows additional features in the melting profile is simulated as a function of salt concentration. Figure 3a shows the plot of the profiles and the change in the Tm is shown in Figure 3b. Increase in salt concentration increases Tm (70°C at 0.01M to 98°C at 0.5M) and follows the empirical relationship between Tm vs. log(Concentration) [44]. Increasing the salt concentration also drastically changes the melting profile such as drastic loss of fine features that represent local melting events. Both melting profile and the Tm are important to differentiate the sequences in determining the phylogenetics and therefore optimal concentration of the salt is expected to be critical.

Figure 3.

Figure 3

Temperature effect of Melting profiles: Plots of the melting profiles of the Opossum sequence of the β­globin gene family as a function of salt concentration (a). The plot of the melting temperature (Tm) as a function of the log (concentration) follows a liner relationship (b).

Figure 4 show the melting profiles and molecular phylogenetics, respectively for the genes of class II MHC α­chain genes [33]. The origins and divergence times of mammalian class II MHC genes have been studied in detail by Takahashi et al [33] and the data presented here is one of the subgroups of gene clusters studied in the original study ( Table 2 of [33]). The sequence lengths of class II MHC α­chain genes are much larger than that of β ­globin. Total of 31 different sequences and sequence length close to 612 for most sequences (see supporting material for the list genetic sequences) were used. As expected the melting profiles of these genes are complex suggesting the presence of additional features that could be useful to differentiate one sequence from the other. Original sequence based analysis of class II MHC α­chain genes showed four sequences (Zebra fish A1, A2 and A3 and shark) belong to an outgroup and the melting temperature analysis (Figure 4) also reproduces the same result. Mammalian class II MHC genes are clustered into four major groups, DRA, DPA, DQA and DNA [33]. Melting temperature based phylogeny is able to reproduce three of the four clusters only with overlaps between the closer clusters DQA and DPA.

Figure 4.

Figure 4

Application to Class II α­chains of MHC sequences: (a) Melting profiles of the 31 class II α­chains of MHC proteins simulated using MELTSIM. (b) Melting profile derived molecular evolutionary tree of class II alpha­chains of MHC proteins. DPA, DQA, DNA and DRA refer to the gene clusters originally ([33]).

Discussion

With the advent of new molecular biology techniques including sophisticated cloning, sequencing and monitoring genomes allows the characterization of the single species without need for crosshybridization techniques. For example Liu et al [45] have applied a melting map calculation to the complete human genome. (http://meltmap.uio.no). DNA melting curves analysis is a valuable technique in sequencing, differentiate between coding and noncoding regions [4648], genotyping [4648] in the design of oligonucleotide probes in microarray experiments [4648] and clinical applications [19] Classification of the profiles presented in this paper adds to an additional dimension to the power of melting profile analysis.

Salt concentration is expected to play a significant role on the applicability of DNA melting based differentiation between the sequences (Figure 2). The hypochromicity of DNA, responsible for the denaturing of the double helix is explained in terms of the interaction of the bases when they are stacked in the double helical array as delineated by Watson ­ Crick Model [4951]. Semiquantitative models developed in the 1960s continue to provide significant insight to the melting profiles of DNAs [44,52]. Another major variable generating the melting profiles of the DNA sequences is the choice of the simulation program. In this work we have used the one of the widely used method, MELTSIM (materials and methods). For short oligonucleotides (16­32 bases), Panjkovich and Melo [53] performed an extensive comparison of the various methods. In their study it was noted that large and significant differences in the estimations of Tm were obtained while using different methods and no conclusive recommendations were provided on the choice of simulation methods to determine the melting profiles or its accuracy. Here, we are suggesting an approach to classify DNA sequences has potential implication for sequence analysis; DNA sequences could be classified purely from the experimental melting profiles and sequence information is not mandatory as it depends. This method is expected to find wider applications once the sensitivity of the results is established by experiments.

Supplementary material

Data 1
97320630004463S1.pdf (122.2KB, pdf)

Acknowledgments

The authors acknowledge Dr. K.W. P. Miller for critical reading of the manuscript. V.V.K. is supported by an NIH grant: Research Infrastructure for Minority Institutions P20MD002732.

Footnotes

Citation:Reese et al, Bioinformation 4(10): 463-467 (2010)

References

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Data 1
97320630004463S1.pdf (122.2KB, pdf)

Articles from Bioinformation are provided here courtesy of Biomedical Informatics Publishing Group

RESOURCES