Abstract
The P450 enzymes maintain a conserved P450 fold despite a considerable variation in sequence. The P450 family even includes proteins that lack the single conserved cysteine and are therefore no longer haem-thiolate proteins. The mechanisms of successive gene duplications leading to large families in plants and animals are well established. Comparisons of P450 CYP gene clusters in related species illustrate the rapid changes in CYPome sizes. Examples of CYP copy number variation with effects on fitness are emerging, and these provide an opportunity to study the proximal causes of duplication or pseudogenization. Birth and death models can explain the proliferation of CYP genes that is amply illustrated by the sequence of every new genome. Thus, the distribution of P450 diversity within the CYPome of plants and animals, a few families with many genes (P450 blooms) and many families with few genes, follows similar power laws in both groups. A closer look at some families with few genes shows that these, often single member families, are not stable during evolution. The enzymatic prowess of P450 may predispose them to switch back and forth between metabolism of critical structural or signal molecules and metabolism dedicated to environmental response.
Keywords: P450 sequence diversity, copy number variation, duplication, gene clusters
1. Introduction
The remarkable diversity of P450 sequences and of P450 functions goes beyond the numbers: thousands of sequences and of publications, spanning all kingdoms of life and all areas of biological research [1]. The P450 enzyme family is a family of extremes. At one extreme, a single amino acid change is sufficient to change regioselectivity of hydroxylation of two closely related enzymes on the same substrate [2]. At the other extreme, P450 sequences are so diverse that there are no universally conserved residues across the family (see below). This diversity raises questions about the origins of the P450 gene family, and offers a rich field of investigation into the mechanisms of gene family diversification. The earliest P450 gene must have emerged early in the evolution of life forms, as the presence of P450s in Archaea, Bacteria and eukaryotes would suggest. The origin of the P450 family is unclear, but several hypotheses have been proposed and merit further exploration. They include a role as reductase [3], as in nitric oxide reductase [4], in anaerobic environments and before acquiring the capacity to bind dioxygen. A role as peroxygenase may have been initially important as well [5]. This was possibly followed by a role in the ‘detoxification’ of molecular oxygen at a time when this gas became a cellular and atmospheric agent of destruction and extinction [6]. The oxygenated by-products of this detoxification, such as membrane lipids and sterols, then increased in importance to metabolism in many roles, including as signal molecules [3,6]. In this view of P450 evolution, the function of the genes as environmental response genes or as essential components of physiology was blurred from the beginning. It is not known whether the lack of CYP genes in some bacteria represents an ancestral or a derived trait, nor whether this question can be resolved. We will not discuss in more detail the possible features of the ancestral progenitor of the P450 family.
Here we will explore some aspects of the success of this family in the exploration of sequence space. Structure determination of many bacterial and an increasing number of eukaryotic P450s has shown that the basic P450 fold is highly conserved, and that this has happened despite a tremendous diversification of sequence [7,8]. Great progress has been made in the last 10 years in our understanding of the importance of gene duplications as shapers of genomic diversity [9]. Much work has been done using computational approaches on a genomic scale. Work on the P450 family allows a more detailed view, and in some cases, the contribution of a P450 in determining the fitness of an organism is known, so that the predictions or trends that emerged from computational approaches can be compared with actual data. We will first show that the P450 family, perhaps more than any other gene family, exploits the full range of sequence conservation, from zero universally conserved residues to 100 per cent identity in recently duplicated or amplified genes. We describe several instances of CYP cluster evolution to show the dynamic nature of gene duplications and rearrangements that are at the base of P450 diversification. We show the power-law distribution of P450 sequences into CYP families and discuss the conundrum of stable and unstable genes that is conventionally proposed to explain why some CYP families are blooming and others are of small size.
2. P450 sequence diversity and conservation
(a). Beyond the twilight zone
When the identity of two protein sequences drops below 25 per cent, these sequences enter what Doolittle [10] called the twilight zone, where homology becomes indistinguishable from the similarity of randomly assorted sequences. Fortunately, navigating through sequence space is a stepwise, transitive process so that two sequences can be little more than 20 per cent identical, but each can have a much higher identity to a third sequence, allowing a correct assignment of homology of all three sequences. In the P450 family, the alignment of very divergent sequences has furthermore been anchored by very few invariant residues, the P450 motifs such as WXXXR in the C helix, the conserved Thr of helix I, EXXR of helix K, and PERF followed by the haem binding region FXXGXXXCXG around the axial Cys ligand [7,11]. As more sequences became available, the conserved motifs and invariant residues had to yield to more and more exceptions, and these exceptions are often a clue to a peculiarity of P450 catalysis.
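The transitive step described above can be illustrated with a minimal sketch. The three sequences are invented toy strings (not P450 sequences), the function names are hypothetical, and the 25 per cent linking threshold is simply the twilight-zone figure reused as an assumption. Identity is counted over pre-aligned columns, and sequences are grouped by single linkage, so that two sequences at 20 per cent identity are still joined through a third.

```python
from __future__ import annotations
from itertools import combinations

def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def homology_groups(aligned: dict[str, str], threshold: float = 25.0) -> list[set[str]]:
    """Single-linkage grouping: any chain of pairs at or above threshold joins a group."""
    parent = {name: name for name in aligned}

    def find(name: str) -> str:
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for p, q in combinations(aligned, 2):
        if percent_identity(aligned[p], aligned[q]) >= threshold:
            parent[find(p)] = find(q)

    groups: dict[str, set[str]] = {}
    for name in aligned:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())

# Invented 20-column toy alignment: seqA and seqC share only 4 of 20 columns,
# but each shares at least half of its columns with seqB.
toy = {
    "seqA": "AAAAAAAAAACCCCCCCCCC",
    "seqB": "AAAAAAAAAAGGGGGGGGGG",
    "seqC": "AAAATTTTTTGGGGGGGGGG",
}

if __name__ == "__main__":
    for p, q in combinations(toy, 2):
        print(p, q, percent_identity(toy[p], toy[q]))
    print(homology_groups(toy))
```

Single linkage is the simplest way to express transitivity; in practice homology is assigned from full alignments and profile methods rather than from a fixed identity cutoff.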
The conserved motif in the I helix, (Asp/Glu)Thr, is thought to be involved in the protonation of the distal oxygen in the ferric hydroperoxy complex. This conserved Thr was long considered to be invariant, with mutagenesis studies showing loss of activity when it was substituted [12]. CYP107A1 (P450eryF), a bacterial enzyme of erythromycin biosynthesis [13], was the first clear example of a P450 lacking the conserved Thr, as it has an Ala at that position. The function of the hydroxyl group of the conserved Thr is carried out by a hydroxyl group of the substrate 6-deoxyerythronolide in CYP107A1 [14]. Another example of a P450 lacking this conserved Thr is CYP176A (P450cin) in which an Asn residue forms a hydrogen bond with the oxygen from the substrate cineole ensuring regio- and stereoselectivity of the enzyme [15]. There are now multiple examples of P450 enzymes lacking this motif—in fact 25 per cent of the Arabidopsis thaliana P450s differ from the (Asp/Glu)Thr sequence in the I helix [16].
One of the last invariant residues that yielded to an exception was the Glu residue of the EXXR motif in the K helix that was found to be missing in the CYP157C1-4 of Streptomyces sp. [17]. The EXXR motif, thought to be essential for correct folding and haem incorporation, is replaced by QXXW in those P450s, yet CYP157C1 folds correctly into a spectrally normal P450. The Cys axial ligand to the haem iron has therefore remained until now the last and only invariant residue in all members of the P450 family.
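A simple way to flag sequences that depart from the canonical motifs is a pattern scan. The sketch below is only illustrative: the regular expressions are direct transcriptions of the motifs named above (X read as any residue), the two protein strings are invented, and the function names are hypothetical. Short patterns such as EXXR match by chance in real proteins, so a scan of this kind is a first filter, not a substitute for an alignment-anchored assignment.

```python
import re

# Motifs as named in the text; "." stands for X (any residue).
P450_MOTIFS = {
    "C helix WXXXR": r"W.{3}R",
    "K helix EXXR": r"E.{2}R",
    "PERF": r"PERF",
    "haem-binding FXXGXXXCXG": r"F.{2}G.{3}C.G",
}

def scan_motifs(protein: str) -> dict:
    """Return 1-based start positions of every (possibly overlapping) motif match."""
    hits = {}
    for name, pattern in P450_MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        lookahead = re.compile(f"(?=({pattern}))")
        hits[name] = [m.start() + 1 for m in lookahead.finditer(protein)]
    return hits

def missing_motifs(protein: str) -> list:
    """Names of motifs with no match at all in the sequence."""
    return [name for name, positions in scan_motifs(protein).items() if not positions]

# Invented toy sequences: TOY carries all four motifs; VARIANT has QXXW in
# place of EXXR, mimicking the kind of exception described for CYP157C1.
TOY = "MAWLLKRSSETLRLPPERFAAFGAGRRICPG"
VARIANT = "MAWLLKRSSQTLWLPPERFAAFGAGRRICPG"

if __name__ == "__main__":
    print(scan_motifs(TOY))
    print(missing_motifs(VARIANT))
```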
(b). Number of invariant residues: down to zero
As haem-thiolate proteins, the P450 sequences are characterized by the invariant Cys located before helix L. A few natural enzymes with unusual properties have modified sequences around this Cys residue. These are the allene oxide synthases and hydroperoxide lyases in the CYP74 family and the peroxygenases of the bacterial CYP152 family that have a nine- or three-residue insertion in the usual FXXGXXXC sequence and substitutions of the Phe residue [5]. Prostacyclin synthases and a few other CYP also have a substitution of this conserved Phe (figure 1). Despite these variations, the haem axial ligand Cys was until recently the only feature common to all natural P450 sequences described. This is no longer true: there are now examples of P450 family members that break this dogma. The CYP20 sequences are unusual in that region of the alignment, and the Ciona CYP20 sequences have a His as putative haem ligand instead of Cys. The CYP20 enzymes are ‘orphans’ of unknown activity [18]. Mutagenesis of Cys to His in CYP2B4 can turn the enzyme into a peroxidase when covalent attachment of the haem to the protein is favoured [19], so the CYP20 enzymes with an unusual His may still have peroxidase activity. Fusion proteins between a fatty acid haem peroxidase/dioxygenase domain and a hydroperoxide isomerase P450 domain are more unusual. These P450s found in fungi are grouped in the CYP6000 series, with Aspergillus nidulans PpoA (CYP6001A1) as the first example [20]. Although the hydroperoxide isomerase activity resembles the allene oxide synthase activity, PpoA does not have a nine-residue insertion before the conserved Cys, and it has the canonical FXXGXXXCXG sequence. The related PpoC enzyme in the same species lacks the isomerase activity, and has a degenerated sequence in that region, lacking the conserved Cys (figure 1) [21].
As these Ppo enzymes are active as homotetramers, it is possible that the P450 domain in PpoC is still active in subunit interactions with the peroxidase/dioxygenase domain after having lost its isomerase activity.
Figure 1.

Variations around the ‘invariant’ Cys (C) in some P450 sequences. x9x indicates a nine-residue insertion [5] and xxx a three-residue insertion; h indicates a hydrophobic residue and dots indicate gaps introduced to optimize the alignment. Ciona and Branchiostoma are Chordata; Amphimedon and Suberites are Porifera. The five bottom sequences are from insects.
Several insect P450 sequences also lack the conserved Cys. CYP408A1 and CYP408B1 are about 30 per cent identical to the closest ‘normal’ P450 sequences and clearly belong to the CYP3 clan. These sequences also lack the conserved C helix motif WXXXR, in which the Arg makes a salt bridge with a haem propionate. These proteins would therefore be predicted to lack a haem prosthetic group. The CYP408 sequences retain the EXXR motif, but also lack the highly conserved Thr in the I helix. These unusual P450 sequences are found in two locust species, Locusta migratoria and Schistocerca gregaria, and in three hemipteran species, the major pest of rice Nilaparvata lugens (brown plant hopper), and two leafhoppers, Peregrinus maidis and Homalodisca vitripennis (glassy-winged sharpshooter; figure 1). The occurrence of these genes with multiple expressed sequence tags (ESTs) in five species from two insect orders (Orthoptera and Hemiptera) that have diverged about 350 million years ago (Ma) strongly suggests that they are functional genes not pseudogenes. Indeed, pseudogenes would probably have lost all recognizable P450 features in such a long evolutionary time, if not for some selection of their presumably non-catalytic activity. Mining for such sequences is not trivial, because one is looking at a rare P450 that lacks some of the very characters that define a P450. Some sequences previously dismissed as probable pseudogenes may be CYP408-like P450 sequences.
(c). Conservation of sequence
The other extreme of sequence conservation is seen in pairs of recently duplicated genes that are identical or nearly so in their amino acid sequence. As an example, the Cyp3a41a and Cyp3a41b genes that lie between Cyp3a16 and Cyp3a44 in the mouse are identical at the amino acid level [22]. Gene duplication as a driver of gene diversity has been discussed before in the context of the P450 family [23]. Recent experimental data from Caenorhabditis elegans indicate that the rate of gene duplication is two orders of magnitude higher than the rate of point mutations in that species [24]. If the high rate of spontaneous gene duplications [9] is confirmed experimentally in other species as well, it would mean that point mutations have great latitude to tinker with sequences.
3. Comparative genomics of CYP clusters
Comparative genomics illustrate the dynamic nature of gene duplications, as seen in particular in CYP gene clusters. One particular cluster of six CYP2 genes, the CYP2A/B/F/G/S/T cluster, diverged through duplications and inversions in the 80 Myr since the human and rodent lineages separated, giving rise to 12 genes plus 10 pseudogenes in the mouse, 14 genes plus four pseudogenes in the rat [25], and six genes plus seven pseudogenes in humans [26]. Of the six human genes, four encode xenobiotic-metabolizing P450s, and two are orphans of unknown function. Furthermore, two of the presumed ancestral members of the cluster (CYP2T and CYP2G) are pseudogenized in humans, but still active genes in the mouse and rat. Gene conversion events have been carefully examined in the CYP2 cluster by using a phylogenetic approach showing multiple events [27]. This study indicated that the human CYP2A13 gene is a gene conversion hot spot in this cluster, with nine events, and is the ‘recipient’ of many outside sequences, including originally non-coding ones. This observation led the authors to wonder how this gene can still be functional. It is also remarkable that all the genes involved have remained clustered despite the fact that they belong to six different CYP subfamilies.
Another interesting case is the CYP21A2 locus on the short arm of chromosome 6 in humans, which is constituted by a large (approx. 35 kb) segmental duplication in the HLA region encompassing complement factor gene C4 and CYP21 [28]. The comparative genomics of this locus reveals that the number of C4/CYP21 duplication units is quite variable in mammals, from one to four [29]. In humans with two units, one segment contains a pseudogene CYP21A1P, and the other contains the active gene CYP21A2. About 85 per cent of the known CYP21 mutant alleles result from gene conversion events between the active gene and its pseudogene, of which 75 per cent are so-called ‘microconversions’ that appear as point mutations until closer analysis [28]. Because of the importance of congenital adrenal hyperplasia caused by CYP21 defects, usually compound heterozygote mutants, this case has been studied in extreme detail [28]. It is likely that such microconversions between a CYP gene and its recent duplicate (pseudogenized or not) are much more widespread than heretofore acknowledged.
(a). CYP clusters in Lepidoptera
We studied five cases of CYP cluster evolution between three related species of Lepidoptera, the silkworm Bombyx mori and the two noctuids Helicoverpa armigera and Spodoptera frugiperda [30]. Their relative times of speciation are known: the silkworm lineage diverged from the lineage leading to the noctuids about 80 Ma, and the two noctuid lineages diverged about 20–30 Ma [30]. It is therefore generally possible to distinguish duplications within a lineage (generation of paralogues) from speciation events (generation of orthologues). The five CYP clusters from different B. mori chromosomes were compared with homologous clusters in the noctuid species. These clusters have evolved at approximately the same rate, as the overall percentage identity between the various P450s is similar when Bombyx is compared with the noctuids, and when the two noctuids are compared with each other (table 1). The total number of genes is higher in the noctuids, 17 in H. armigera and 25 in S. frugiperda, whereas the five loci carry only nine genes in B. mori.
Table 1.
Comparison of five CYP gene loci in three lepidopteran species. The number of genes in each locus (n) is given for each species, and the range of percentage identities at the protein level is given for the comparison of B. mori to the two noctuids and for the comparison between the two noctuids.

| locus | B. mori (n) | % identity, B. mori versus noctuids | H. armigera (n) | % identity, H. armigera versus S. frugiperda | S. frugiperda (n) |
|---|---|---|---|---|---|
| CYP332A | 1 | 58–65 | 2 | 71–78 | 1 |
| CYP4L | 1 | 61–67 | 2 | 70–87 | 3 |
| CYP4M | 2 | 54–64 | 3 | 64–80 | 4 |
| CYP6B | 1 (a) | 52–63 | 5 | 71–78 | 5 |
| CYP9A | 4 | 51–68 | 5 | 46–76 | 12 |

(a) The CYP6B29 gene of B. mori is a probable pseudogene.
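As a quick consistency check, the per-locus gene counts of table 1 can be tallied against the species totals quoted in the text (nine, 17 and 25). The snippet is a plain transcription of the table; the variable names are arbitrary.

```python
# Gene counts per locus, transcribed from table 1.
TABLE1_COUNTS = {
    "CYP332A": {"B. mori": 1, "H. armigera": 2, "S. frugiperda": 1},
    "CYP4L":   {"B. mori": 1, "H. armigera": 2, "S. frugiperda": 3},
    "CYP4M":   {"B. mori": 2, "H. armigera": 3, "S. frugiperda": 4},
    "CYP6B":   {"B. mori": 1, "H. armigera": 5, "S. frugiperda": 5},  # B. mori copy is a probable pseudogene
    "CYP9A":   {"B. mori": 4, "H. armigera": 5, "S. frugiperda": 12},
}

def totals_per_species(counts: dict) -> dict:
    """Sum the gene counts over all loci for each species."""
    totals = {}
    for per_species in counts.values():
        for species, n in per_species.items():
            totals[species] = totals.get(species, 0) + n
    return totals

if __name__ == "__main__":
    print(totals_per_species(TABLE1_COUNTS))
```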
The CYP332A locus on chromosome 15 is the simplest case of a single gene having been duplicated in just one of the three species (H. armigera). In that species, the presence of a transposable element between the two duplicated copies (marked as RT in the figure) may provide a clue to the process involved (figure 2). The proximity of transposable elements to P450 clusters in insects has been noted before [31], although a systematic survey and a thorough statistical analysis of their relative distribution have not yet been undertaken. Similarly, a high density of transposable elements in the CYP4M region of S. frugiperda has been reported [30]. The CYP4M locus on chromosome 5 of B. mori consists of two tandemly duplicated genes, but the additional genes in each of the noctuids result from three duplications, two that occurred before the divergence of the H. armigera and S. frugiperda lineages, and a third one that occurred more recently in the Spodoptera lineage (figure 3).
Figure 2.
The CYP332 locus (a) and the CYP4L locus (b) in three lepidopteran species, B. mori (Bm), S. frugiperda (Sf) and H. armigera (Ha). The genes are depicted on each chromosome in their correct orientation and order. Stippled lines are regions not covered by the sequenced BAC. The flanking genes are shown to indicate conserved synteny. Gdh, glucose dehydrogenase; L51mt, mitochondrial ribosomal protein L51; Rps2, 40S ribosomal protein S2; Aspg, aspartylglucosaminidase.
Figure 3.

Evolution of the CYP4M locus in three lepidopteran species, B. mori (Bm), S. frugiperda (Sf) and H. armigera (Ha). The genes are depicted on each chromosome in their correct orientation and order but not relative distance. The phylogenetic tree (maximum-likelihood supported by bootstrap, with a CYP4L as outgroup, not shown) is superimposed with its topology correct but branch length modified for clarity. Filled circles indicate gene duplications and filled triangles indicate speciation events.
The CYP4L locus on B. mori chromosome 13 comprises just one gene, CYP4L6. In the noctuids, this has been duplicated at least twice, once in the Spodoptera lineage and once in the Helicoverpa lineage. Interestingly, the two genes in H. armigera, CYP4L5 and CYP4L11, are inverted when compared with the other two species. This large inversion encompasses both genes despite their large size, 12.5 and 6.5 kb, respectively, and 11 exons each. This cluster is closely linked to the tandemly duplicated pair of CYP333B1 genes from the mitochondrial P450 clan (figure 2). Close linkage between animal P450s from different clans is the subject of another paper in this issue [32].
The complex evolution of the CYP9 locus on chromosome 17 of B. mori has been described before [23,30]. It is remarkable because, of the 21 genes in the three species, only three pairs can be considered 1 : 1 orthologues, the others resulting from several duplications in each lineage. The sequence of an additional BAC extended the S. frugiperda cluster by three additional genes when compared with our original study. The CYP9A28-31 genes are recently duplicated in the Spodoptera lineage and are 76–90 per cent identical in amino acid sequence, yet we found ESTs only for CYP9A31, suggesting that expression levels for the members of this cluster are quite divergent.
The CYP6B locus provides an example where diversity following speciation as well as haplotype diversity within a species can be compared. The Bombyx CYP6B29 gene on chromosome 21 has a stop codon in the sequenced P50 strain and in several ESTs, and is therefore an expressed pseudogene at least in that strain. The CYP6B locus on BAC 21N21 of H. armigera has been described to consist of three genes [33], and we found, in the same BAC library, another haplotype at that locus. The BAC 12E11 carries five genes, of which one is interrupted by the extremity of the BAC. Evidence for two segmental duplications and rearrangements can be deduced from the analysis of self-alignment dot plots, and these involve large tracts of the chromosome (figure 4a). The five CYP6B in each of the two noctuid species result from duplications that are lineage specific, showing that duplications at the same locus have led to phylogenetically distinct clusters in these closely related species (figure 4b). The examples provided here show that the number and orientation of CYP genes in homologous clusters can be deceiving as to the orthology relationships of the genes. They also show that gene numbers can rapidly change, when copy number variation (CNV) involves segmental duplications with more than one P450 gene.
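The principle behind a self-alignment dot plot can be sketched in a few lines. This is not the procedure used for figure 4, only an illustration under simple assumptions: exact k-mer matches stand in for alignment, the toy sequence is invented (a 20-nucleotide unit placed twice, 30 positions apart), and the function names are hypothetical. A segmental duplication shows up as a long run of dots along one off-diagonal.

```python
from collections import defaultdict

def self_dotplot(seq: str, k: int = 8) -> list:
    """Return sorted (i, j) pairs, i < j, where the k-mers starting at i and j are identical (0-based)."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    dots = []
    for positions in index.values():
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                dots.append((positions[a], positions[b]))
    return sorted(dots)

def diagonal_runs(dots: list, min_len: int = 5) -> list:
    """Collapse dots into runs along each diagonal, reported as (offset, start, length)."""
    by_offset = defaultdict(list)
    for i, j in dots:
        by_offset[j - i].append(i)
    runs = []
    for offset, starts in sorted(by_offset.items()):
        starts.sort()
        run_start = prev = starts[0]
        for s in starts[1:]:
            if s == prev + 1:
                prev = s
                continue
            if prev - run_start + 1 >= min_len:
                runs.append((offset, run_start, prev - run_start + 1))
            run_start = prev = s
        if prev - run_start + 1 >= min_len:
            runs.append((offset, run_start, prev - run_start + 1))
    return runs

# Invented toy sequence: UNIT occurs at positions 10 and 40.
LEFT, UNIT, SPACER, TAIL = "GATTACAGGC", "ACGTTGCAAGGCTTAGCCAT", "CCGGAATTCG", "TGCATGCA"
TOY_SEQ = LEFT + UNIT + SPACER + UNIT + TAIL

if __name__ == "__main__":
    for offset, start, length in diagonal_runs(self_dotplot(TOY_SEQ, k=8)):
        print(f"offset {offset}: run of {length} k-mers starting at {start}")
```

Real duplicated segments diverge after duplication, so practical tools tolerate mismatches and gaps; exact k-mers are used here only to keep the sketch short.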
Figure 4.

(a) The CYP6B locus in three lepidopteran species, B. mori (Bm), S. frugiperda (Sf) and H. armigera (Ha). The two haplotypes found in H. armigera are shown as Ha_1 (from BAC 12E11, accession no. FP340431, [30]) and Ha_2 (from BAC 21N21, accession no. DQ458470, [33]). Small symbols represent repetitive sequences, and boxes under the Ha_1 haplotype represent the extent of recent segmental duplications. The BAC sequence terminates in the A gene indicating that the full Ha_1 haplotype may have a complete additional duplication of the C gene (CYP6B2). (b) Maximum-likelihood tree of the CYP6B sequences showing the independent origin of the five S. frugiperda sequences and of the five H. armigera sequences.
4. From copy number variation to duplication
The P450 family offers examples of duplications that are sufficiently recent to allow a careful examination of the fate of the duplicated genes within a few million years, when it is most informative [34]. It has been argued that CNVs in a genome are ‘a phase of molecular evolution like any other polymorphism’ [35]. Therefore, in humans and a growing number of species for which a reference genome is available, it should be possible to follow the phenotypic differences caused by CNV in P450 genes, and hence the trajectory of duplications and deletions towards fixation. As the primary manifestation of CNV is a change in gene dosage, it is important to study the fitness effect of CNVs, extra copies through duplication (effectively as gain-of-function) or fewer copies through deletion (or loss-of-function allele).
In a survey of over a thousand disease-associated human genes, Kondrashov & Koonin [36] compared the number of paralogues for two classes of genes. They showed that haploinsufficient genes, which have an abnormal phenotype when heterozygous for a loss-of-function allele, have on average more paralogues than haplosufficient genes. They argue that gene dosage may influence the initial fixation of duplications. Two copies of a recessive gene maintained by CNV could compensate for the deleterious phenotype. They also showed that haplosufficient genes are on average more likely to be enzymes. This result is generally compatible with the Kacser & Burns [37] theory of dominance derived from metabolic control theory (see, however, [38]). How do P450 enzymes fit into this scheme, and what is the phenotype of dosage imbalance for CYP genes? Disorders caused by mutations in steroid hormone biosynthesizing P450s are recessive in plants (the seven dwarfs of brassinosteroid biosynthesis), in insects (e.g. Halloween genes) and in mammals. One of the most frequent autosomal recessive mutations carried in human populations is CYP21A2 deficiency [28], but the gene is haplosufficient. While some CYP17A1 and CYP11B1 mutant heterozygotes can be detected by an ACTH (adrenocorticotropic hormone) stimulation test, their discrete biochemical response in this test can hardly be extrapolated into a fitness-affecting phenotype. Mutations in the human CYP1B1 gene that are linked to congenital glaucoma (buphthalmos) and other disease-linked human CYP mutations are also fully recessive [39].
The picture is more subtle for environmental response genes. The best example is the highly polymorphic human CYP2D6 gene with important effects of gene dosage [40]. It can be considered a haploinsufficient gene in drug metabolism, as heterozygotes lacking one functional allele are ‘intermediary metabolizers’ of bufuralol, and these individuals may have impaired clearance of some drugs. Conversely, individuals carrying extra copies of the CYP2D6 gene, duplications or amplifications (up to 13×), are ‘ultrarapid metabolizers’ [40] and may not be responsive to some antidepressant drugs. In accordance with Kondrashov & Koonin's [36] observation that haploinsufficient genes have more paralogues than haplosufficient genes, CYP2D6 has many paralogues—but in the mouse! There are nine CYP2D genes in the mouse for just one in humans. In primates, the CYP2D locus is a cluster of three genes, CYP2D6, D7 and D8. CYP2D6 and CYP2D8 were already present before the divergence of New World monkeys from Catarrhini (Old World monkeys and apes, including humans), whereas CYP2D6 and CYP2D7 duplicated within the latter lineage [41]. The CYP2D7 and CYP2D8 are pseudogenes in humans, but not in all primates, and several mutant alleles of human CYP2D6 are hybrids (resulting from gene conversions) with various proportions of the CYP2D7P pseudogene (www.cypalleles.ki.se/cyp2d6.htm).
It is paradoxical, however, that the haploinsufficient gene/more paralogues correlation [36] contrasts with the P450 situation in animals. Most paralogues are found in families generally associated with environmental response, where they are sometimes considered redundant with no overt phenotype when missing [39]. They would not readily be classified as haploinsufficient. While complete redundancy of two genes is difficult to prove formally, the commonly observed overlap of substrate specificity in P450 enzymes is one aspect of redundancy that can complicate the assessment of haploinsufficiency. Perhaps the phenotype of gene dosage imbalance (too little or too much) can be observed only in response to an environmental challenge that can have a huge effect on fitness and is difficult to predict in the absence of this challenge. The CNV in CYP2D6 may have been under selection by toxic alkaloids in the human diet 10 000–50 000 years ago [42]. Examples of P450 CNV linked to insecticide resistance have been recently described [43,44] and more are likely to be reported as predicted [45]. In the mosquito Culex quinquefasciatus, a vector of Wuchereria bancrofti and of West Nile virus, CNVs in the CYP9M10 gene are observed between different strains [43]. In a pyrethroid-susceptible strain named OGS, CYP9M10 is a null allele with a stop codon in the first coding exon, yet the strain has been maintained for over 40 years in the laboratory, and the gene is therefore dispensable. In a pyrethroid-resistant strain named JPP, however, CYP9M10 is present in two identical copies in a segmental duplication of about 100 kb that may have occurred quite recently. A single copy of the duplicated CYP9M10 haplotype is sufficient to confer significant resistance to permethrin [43].
Another example is provided by the model insect Drosophila where insecticide resistance is correlated with changes in Cyp6g1 allele frequencies, from a single-copy haplotype to various duplicated haplotypes with one or two transposable element insertions in the promoter regions for good measure [44]. Here, as perhaps in the CYP2D6 case, extra copies of a P450 that happens to metabolize an insecticide are equivalent to a gain of function. The wild-type with just one copy is now ‘haploinsufficient’ and strongly selected against. Few other examples of P450 CNVs are currently known so that the importance of gene dosage effects on the fate of a duplication [46] is difficult to evaluate. Overexpression of transgenic copies of a gene (when this is experimentally feasible) or constitutive overexpression due to mutations affecting expression levels can give an idea of what phenotype an increased gene dosage may have. The results are mixed but would merit a thorough exploration.
5. Patterns of sequence distribution into CYP families
The ubiquity of gene duplication events that underlie the diversity of the P450 family is well documented, and lineage-specific expansions in CYP subfamilies have been named ‘blooms’ [23]. The result of such blooming behaviour is reflected in the distribution of CYP gene numbers within families and subfamilies.
(a). Frequency distribution of CYP gene families
As was noted before, the distribution of CYP genes into families is skewed in all species (except those in which gene duplications are repressed, such as Neurospora crassa) [23]. The CYPome of every species is characterized by few families with large numbers of genes and many families with one or few genes. The frequency of family size versus number of genes is reasonably well described by a power law, a behaviour that is typical of families of paralogous genes in eukaryotes [47]. It can be derived from relatively simple birth and death models. Figure 5 shows the example of Vitis vinifera and one of its pests, Tetranychus urticae. Although the number of genes in the spider mite is much lower than that of the grapevine, the power-law relationship is remarkably similar. There is also no fundamental difference in the frequency distribution of CYP genes between the highly polyphagous spider mite that feeds on plants from over 140 families, and the monophagous silkworm B. mori that eats only mulberry leaves. The annual A. thaliana and the woody perennial grapevine also have the same behaviour (figure 5), and the mosquito Culex pipiens (a detritivore as larva and blood feeder as adult) or red flour beetle Tribolium castaneum, a stored grain pest, cannot be distinguished from the herbivores (results not shown). The power-law behaviour can be derived as a limit of exponential increase (gene duplications) by random decay (gene death or transition from one family to another through divergence) [47] and it has therefore a mechanistic backing [49]. In the case of the CYP genes, the sample size is small so that the power law is plausible but alternative models cannot be ruled out statistically [48]. Baek et al. [50] proposed a random group formation (RGF) model to describe the power-law distribution of data. 
The fit of the raw, unbinned data for CYP families to a power law with exponential cutoff predicted by the RGF model is remarkably good, perhaps because it accounts better for the low frequency, large family sizes. Theoretically, the RGF model is a prediction for the number of families with n genes, given the total CYPome size and the size of the largest CYP family. The analysis of CYPome data by the RGF model will be presented elsewhere, and the more simplistic representation of figure 5 is given for illustrative purposes. The main conclusion from figure 5, however, is that patterns of gene duplications and death in the CYPome are not dependent on the ecology and life-history traits of the organism. Plant–animal chemical warfare, widely thought to be a root cause of the proliferation of CYP genes [51], does not shape the distribution of P450s into families. There is nothing fundamentally different between a plant or a fungus that expands its repertoire of specific biosynthetic genes to make complex ‘secondary’ compounds and an animal that expands its repertoire of ‘detoxification’ P450s with their legendary wide and overlapping substrate specificity. P450 diversity is of course the result of evolutionary forces of which three, mutation (including duplication), recombination and genetic drift, are stochastic and non-adaptive [52]. The fourth, natural selection, plays a role in the evolution of P450s that is highly variable in time and space and would not, therefore, be predicted to be the principal determinant of the common distribution of CYP sequences into families that we observe.
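The kind of birth and death process invoked above can be explored with a toy simulation. The sketch below is an illustration under stated assumptions, not the model of [47] or the RGF model of [50]: at each step one randomly chosen gene either duplicates within its family, is lost, or diverges far enough to found a new single-gene family. The event probabilities, the starting state and the function names are all arbitrary choices made for the example.

```python
import random
from collections import Counter

def simulate_families(steps, p_dup=0.5, p_death=0.3, p_div=0.2,
                      start_families=10, seed=0):
    """Return a list of family sizes after `steps` duplication/death/divergence events."""
    if abs(p_dup + p_death + p_div - 1.0) > 1e-9:
        raise ValueError("event probabilities must sum to 1")
    rng = random.Random(seed)
    families = [1] * start_families  # assumed start: single-gene families
    for _ in range(steps):
        if not families:
            break  # every gene has been lost
        # Picking a gene uniformly means picking a family in proportion to its size.
        idx = rng.choices(range(len(families)), weights=families)[0]
        r = rng.random()
        if r < p_dup:
            families[idx] += 1          # duplication: family grows
            continue
        families[idx] -= 1              # death or divergence: family shrinks
        if r >= p_dup + p_death:
            families.append(1)          # divergence: gene founds a new family
        if families[idx] == 0:
            families.pop(idx)
    return families

def size_spectrum(families):
    """Map family size -> number of families of that size."""
    return dict(sorted(Counter(families).items()))

if __name__ == "__main__":
    spectrum = size_spectrum(simulate_families(5000, seed=1))
    for size, n_families in spectrum.items():
        print(size, n_families)
```

Because duplication acts on genes rather than on families, large families gain copies faster than small ones, which is the ingredient that tends to produce a skewed size spectrum; how closely the output follows a power law depends on the chosen probabilities and run length.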
Figure 5.
Distribution of CYP genes into families showing the frequency of CYP families of various sizes against the number of genes per family. The data for each of the four representative species were binned logarithmically over intervals of limits Xn, where log10(Xn) = n/8 and n = 0, 1, 2,… The binned data were then fitted to a power law by linear least-squares regression (dashed line). The solid line shows the one-variable fit (to the power-law scaling factor or slope) obtained by maximum-likelihood estimation [48]. The power-law distribution is the limit of a birth and death model (see main text for details) and is not specific to the species.
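The two fitting procedures named in the caption can be sketched as follows. This is not the code used for figure 5. It assumes that the binned frequency is normalized by the number of integer family sizes falling in each bin, that empty bins are skipped, and that the maximum-likelihood fit uses the discrete power law p(x) = x^(-α)/ζ(α) with a minimum family size of 1, in the spirit of ref. [48]. All function names are hypothetical.

```python
import math
from collections import Counter


def log_binned_density(sizes, bins_per_decade=8):
    """Bin integer family sizes (>= 1) on edges X_n = 10**(n / bins_per_decade).

    Returns (xs, ys): xs are the geometric centres of the occupied bins, ys the
    fraction of families per integer size in each bin. Empty bins are skipped.
    """
    counts = Counter(math.floor(bins_per_decade * math.log10(s)) for s in sizes)
    total = len(sizes)
    xs, ys = [], []
    for n in sorted(counts):
        left = 10 ** (n / bins_per_decade)
        right = 10 ** ((n + 1) / bins_per_decade)
        n_integers = math.ceil(right) - math.ceil(left)  # integers in [left, right)
        xs.append(math.sqrt(left * right))
        ys.append(counts[n] / (n_integers * total))
    return xs, ys


def least_squares_slope(xs, ys):
    """Ordinary least squares of log10(y) on log10(x): (slope, intercept)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx


def zeta(alpha, k_max=1000):
    """Riemann zeta for alpha > 1: direct sum plus an Euler-Maclaurin tail."""
    head = sum(k ** -alpha for k in range(1, k_max))
    return head + k_max ** (1 - alpha) / (alpha - 1) + 0.5 * k_max ** -alpha


def mle_exponent(sizes, lo=1.01, hi=6.0, iters=60):
    """Maximum-likelihood exponent of a discrete power law with x_min = 1.

    The log-likelihood -n*log(zeta(a)) - a*sum(log x) is concave in a, so a
    ternary search on [lo, hi] is enough to locate its maximum.
    """
    n = len(sizes)
    sum_log = sum(math.log(s) for s in sizes)

    def loglik(a):
        return -n * math.log(zeta(a)) - a * sum_log

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if loglik(m1) < loglik(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical family sizes, for illustration only (not real CYPome data).
    sizes = [1] * 20 + [2] * 6 + [3] * 3 + [5, 8, 13, 40]
    xs, ys = log_binned_density(sizes)
    print("least-squares slope:", least_squares_slope(xs, ys)[0])
    print("maximum-likelihood exponent:", mle_exponent(sizes))
```

With samples as small as a single CYPome, the two estimates can differ noticeably, which is one reason why ref. [48] recommends maximum likelihood over regression on binned data.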
6. Stable versus unstable genes
The observation that CYP families fall into two distinct groups, those that show no variation in family or subfamily size and those that do, dates to a comparison of human, mouse and rat CYPomes [53]. At the time, it was seen as no surprise that drug-metabolizing P450s were in the variable group. However, the question of why blooms of P450 duplication occur in some families while other families have just one or two members remains puzzling [23]. A key aspect seems to be the rate of pseudogenization, which depends on the gene's ‘importance’ and ‘stability’, that is, on whether a gene is ‘dispensable’ or ‘essential’ [54,55]. These criteria seem to have been derived ex post facto, and may be somewhat tautological, inasmuch as they depend on estimates of fitness in the present that may not reflect those in the distant evolutionary past. In addition, these estimates of fitness are themselves biased in favour of strong, immediate effects (e.g. ‘lethal’ genes in knockout screens) and against more subtle but perhaps more ecologically relevant effects that can, over time, lead to fixation or loss of a duplicate just as effectively. Pesticide resistance is an extreme example of the latter, an unpredictable but relevant effect on fitness that would easily be missed in a screen for ‘essential’ genes if pesticide treatment is not included in the screening criteria. As described below, the CYP family provides examples of genes from small subfamilies, often subfamilies with single genes, that are not as stable as would be expected.
(a). Stable and unstable genes in plants
The largest clan of plant P450s, CYP71, has endogenous P450s involved in essential functions (hormone and biopolymer biosynthesis) at its root, and has diversified into more than half of all plant P450 sequences [56]. In this case, environmental response, the synthesis of defensive chemicals, may have diverged from an initial physiological imperative. A recent example is provided by CYP701A6. Close to the root of the CYP71 clan, the CYP701 family of plants is highly conserved and is involved in the biosynthesis of the plant hormone gibberellin. Yet rice CYP701A6, the authentic ent-kaurene hydroxylase/oxidase, has five CYP701 paralogues of which at least one, CYP701A8, has now been shown to catalyse hydroxylation at the C3α position of terpenes in the synthesis of antifungal phytoalexins [57].
Such diversions are also found in clans with single families, and the CYP51 family is taken here as an example. CYP51 is usually a highly conserved, single-copy gene. Two CYP51 genes were initially reported from A. thaliana. They result from a relatively recent duplication, and one of the paralogues has ended up as a pseudogene that still shows root-specific expression [58]. The ‘stability’ of the CYP51 gene would thus seem confirmed in Arabidopsis: duplication remains possible but the fate of the new paralogue is to be pseudogenized. However, stability of CYP51 is not confirmed in other plant lineages. In addition to the single sterol biosynthesis (obtusifoliol 14α-demethylase) CYP51G, there are nine CYP51H enzymes of unknown function in rice and two in oats. CYP51H10, one of the oat CYP51 paralogues, is involved in avenacin biosynthesis [59], so that its neofunctionalization has drawn it to the synthesis of defence compounds and away from its physiological function in primary (sterol) metabolism. This example shows that a CYP51 bloom is apparent only in the grass lineage. In other lineages studied to date, the gene appears ‘stable’. In evolutionary time, it is possible that such blooms are totally lost, for instance because they happened long enough ago that the diverged duplicates are no longer recognized as belonging to the CYP51 family. The CYP710 family, which is thought to be derived from CYP51, may be an example [56]. An early study proposed that ‘all eukaryotic P450s descended from CYP51’ [53], but this is now considered unlikely. Indeed, the function of CYP51, three successive monooxygenations on a substrate that remains in the active site, with removal of a formyl group by carbon–carbon scission, is a specialized and likely derived trait, not an ancestral one.
It is well known that the distribution of CYP51 enzymes with a conserved 14α-demethylase function (albeit with slightly different sterol substrates, the obtusifoliol branch in plants, the lanosterol branch in animals and fungi) is remarkably broad in eukaryotes, and there is little doubt that strong purifying selection has maintained a recognizable CYP51 function probably over a billion years. CYP51 genes are also found in some bacteria [60], notably Mycobacterium tuberculosis and Streptomyces coelicolor. Lateral gene transfer from plants may be at the origin of the mycobacterial and methylococcal CYP51 [60]. Even though all CYP51 genes are descended from an ancestral CYP51, the organism (eukaryotic or bacterial) in which CYP51 originated is therefore difficult to pinpoint clearly. How many other P450 genes were present in the ancestral organism where CYP51 originated is unknown, because other CYPs may have gone extinct. It is also difficult to ascertain how many, if any, other eukaryotic CYP families derived from CYP51 by duplications and divergence, again because extinction at one extreme or rapid divergence at the other extreme may have removed all traces of the actual progenitor of extant CYP families. Extinction of ancestral CYP families, as of individual genes (see below for CYP307), is a confounding factor in the reconstruction of the evolutionary history of the CYPome. Because the ‘stable’ or ‘essential’ character of a gene can change over evolutionary time, even CYP51, a very ancient and highly conserved P450, can become ‘unstable’ as seen in grasses. The ‘instability’ of the CYP51 gene is well documented in some lineages, such as nematodes and insects. Animals of these groups have become sterol auxotrophs, and rely on their food sources for sterols. They have lost the ability to make sterols from the (still functional) mevalonate/isoprenoid pathway and have concomitantly lost CYP51.
(b). Instability of the CYP307 gene in arthropods
CYP307 is an enzyme involved in an early step of ecdysteroid (moulting hormone) biosynthesis. Its precise function is not yet known, but involves the generation of the ecdysteroid-type steroid A/B ring chemistry and perhaps the 14α hydroxy group starting from the ring structure of 7-dehydrocholesterol. The evolutionary history of CYP307 genes is still fragmentary but shows considerable instability. In Drosophila, there are two CYP307 paralogues with a complex origin [61,62]. This history can now be traced back to Chelicerates (the two-spotted spider mite) and Mandibulates (which include crustaceans and insects). When the sequence trees are analysed and interpreted onto the species tree, it is possible to discern at least five duplication events, including two retrotranspositions in the Drosophila genus lineage, and eight gene losses (figure 6). Moreover, gene duplications and losses are not restricted to a short evolutionary time or a specific lineage, but are spread out on the species tree, indicating that the CYP307 gene paralogues are not intrinsically stable, as the small number of genes in that family would imply in the classical view. For instance, two CYP307 paralogues are an ancestral trait of Pancrustacea, or perhaps of arthropods (as the single Chelicerate sequence cannot be assigned with confidence to one of the two paralogous CYP307 branches). In Hymenoptera, there was one loss in the lineage to wasps, and a reciprocal loss in the lineage of bees and ants. In Crustacea, a reciprocal loss also occurred between branchiopods (Daphnia pulex) and copepods (Lepeophtheirus salmonis). In those lineages with two more recent CYP307 paralogues (for instance, the Drosophilidae), the duplication was followed by a subfunctionalization that split the expression of the gene between two different developmental stages [63].
The example of CYP307 shows that a P450 gene with an essential physiological function (Cyp307a1 (spook) is a lethal gene in Drosophila) is ‘unstable’ when observed from a distant perspective in time and phylogenetic diversity. While there are currently no examples of blooms emanating from CYP307, there are multiple duplications and losses that are fixed in several lineages, and that have therefore overcome the expected, strong selection against CNV.
Figure 6.
Evolution of the CYP307 gene in arthropods. The species tree, drawn from the consensus phylogeny of arthropods, carries information on the CYP307 paralogues, based on maximum-likelihood reconstruction of the CYP307 phylogeny (with CYP306 as outgroup). Gene birth is shown with a stork symbol, and gene death with a grim reaper symbol, so that each line symbol represents a new CYP307 paralogue (full line, stippled lines, dotted lines, a total of seven paralogues; e.g. there is one in Lepidoptera, but three in A. pisum). This study expands on earlier interpretations of CYP307 evolution [61,62]. The initial duplication is shown as having occurred in the Pancrustacea lineage, but this timing remains hypothetical. The Drosophila subgenus is represented by D. mojavensis, D. grimshawi and D. virilis and the Sophophora subgenus by D. melanogaster, D. yakuba and D. ananassae. Lepidoptera are represented by B. mori, Spodoptera littoralis and Danaus plexippus; bees and ants are represented by Apis mellifera, Bombus terrestris, Atta cephalotes, Linepithema humile, Pogonomyrmex barbatus and Solenopsis invicta.
(c). Physiology and environmental response: back and forth
The examples described above show that even CYP families that today have a single gene in any given species are not immune to the general birth and death process. In fact, there is something of a paradox in that slowly evolving genes under purifying selection, when duplicated, have a higher probability of fixation [64,65]. Yet such genes are, in CYPomes, precisely those considered essential, the highly conserved genes involved in the synthesis or metabolism of signal and structural metabolites crucial to survival (hereafter called physiology in contrast to environmental response). Furthermore, a genome-wide study in Drosophila found that genes with restricted transcriptional activity are predominantly located in late-replicating regions of the genome, where there is a high CNV mutational bias [66]. This observation would seem to conflict with the observation that CYP genes involved in physiology are often members of single gene subfamilies, whereas those involved in environmental response are generally members of larger subfamilies. Restricted transcriptional activity is expected of hormone biosynthesis and metabolism, at a precise time and in a specialized cell type. Detoxification, on the other hand, may seem a priori to fall at the other extreme. One attempt to establish stronger relationships between functional constraints, strength of purifying selection and gene duplication partitioned gene families into those that contain at least one essential gene (E), and those without essential genes (N) [67]. The latter were shown to be more dynamic, with higher rates of pseudogenization as well as of duplicate fixation. The CYP family as a whole would be an E family, and even at the subfamily level, it is difficult to find pure N subfamilies. The basis of such classifications is again the definition of an essential gene, and the criterion used is that of high-throughput knockout or knockdown screens [67].
A distinction between genes under ‘single-copy control’ and those under ‘multicopy licence’ [68] appears stronger than the presence or absence of ‘essential’ genes in a group, and may provide a way out of the conundrum offered by CYPomes. Both modes of evolution, single-copy control and multicopy licence, that each represent the extreme of a spectrum, coexist in CYPomes but remain to be described by other than heuristic criteria.
Traditionally, plant P450 diversity is viewed as a set of sophisticated chemical factories, assembly lines of specific enzymes mounting the chemical defences against herbivores and pathogens, with a background of common P450s involved in the synthesis of hormones and structural building blocks. Animal P450 diversity is seen as an arsenal of polyvalent detoxification enzymes needed to counter the plant defences, with a background of common P450s involved in the metabolism of signal molecules. Which one of these two major functions ‘came first’ is a difficult question to answer. The enzymatic prowess of P450s and their wide latitude in exploring sequence space may predispose them to switch back and forth between the metabolism of critical structural or signal molecules and the metabolism dedicated to environmental response. The P450 phylogeny does not give any clues to what substrate class is metabolized by which branch of the tree. In humans, 10 members of the CYP2 family are xenobiotic-metabolizing enzymes, two metabolize endogenous substrates (CYP2J2 and CYP2R1) while four remain ‘orphans’ [18]. In vertebrates, all mitochondrial clan P450s metabolize endogenous substrates, while in arthropods this clan has generated several CYP families of xenobiotic-metabolizing P450s [23]. The P450 diversity is therefore remarkable not just because it is so enormous, but also because it cannot easily be categorized by a typical gene function or by a single evolutionary process shaping it. In Gene Ontology, CYP genes cover a range of ‘cellular component’ and ‘biological process’ categories, and fit in several ‘molecular function’ categories. P450 diversity defies generalizations, and that is surely one of its main attractions.
Acknowledgements
We thank David R. Nelson for the invitation to contribute to this special issue and for useful comments on this paper. We also thank him for his many years of commitment to the P450 field, providing not just CYP names, but stimulating ideas that so often bring together the interests of our diverse community. We thank the reviewers for thoughtful comments that helped improve the manuscript. We thank Michael R. Feyereisen for analysis of the data of figure 5. Hideki Sezutsu's work in Sophia-Antipolis was supported by INRA Department SPE in a NIAS-INRA collaboration that is gratefully acknowledged.
References
- 1. Nelson DR. 2013. A world of cytochrome P450s. Phil. Trans. R. Soc. B 368, 20120430. (doi:10.1098/rstb.2012.0430)
- 2. Schalk M, Croteau R. 2000. A single amino acid substitution (F363I) converts the regiochemistry of the spearmint (-)-limonene hydroxylase from a C6- to a C3-hydroxylase. Proc. Natl Acad. Sci. USA 97, 11 948–11 953. (doi:10.1073/pnas.97.22.11948)
- 3. Kahn RA, Durst F. 2000. Function and evolution of plant cytochrome P450. In Evolution of metabolic pathways (ed. Romeo JT.), pp. 151–191. Amsterdam, The Netherlands: Elsevier Science Ltd. (doi:10.1016/S0079-9920(00)80007-6)
- 4. Nakahara K, Tanimoto T, Hatano K, Usuda K, Shoun H. 1993. Cytochrome P-450 55A1 (P-450dNIR) acts as nitric oxide reductase employing NADH as the direct electron donor. J. Biol. Chem. 268, 8350–8355.
- 5. Lee DS, Nioche P, Hamberg M, Raman CS. 2008. Structural insights into the evolutionary paths of oxylipin biosynthetic enzymes. Nature 455, 363–368. (doi:10.1038/nature07307)
- 6. Nebert DW, Feyereisen R. 1994. Evolutionary argument for a connection between drug metabolism and signal transduction. In Cytochrome P450, 8th Int. Conf. (ed. Lechner MC.), pp. 3–13. Paris, France: John Libbey Eurotext.
- 7. Poulos TL, Johnson EF. 2005. Structures of cytochrome P450 enzymes. In Cytochrome P450. Structure, mechanism and biochemistry (ed. Ortiz de Montellano PR.), pp. 87–114. New York, NY: Kluwer Academic.
- 8. Mestres J. 2005. Structure conservation in cytochromes P450. Proteins 58, 596–609. (doi:10.1002/prot.20354)
- 9. Lynch M, Conery JS. 2000. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155. (doi:10.1126/science.290.5494.1151)
- 10. Doolittle RF. 1986. Of URFs and ORFs: a primer on how to analyze derived amino acid sequences. Mill Valley, CA: University Science Books.
- 11. Werck-Reichhart D, Feyereisen R. 2000. Cytochromes P450: a success story. Genome Biol. 1, 3003.1–3003.9. (doi:10.1186/gb-2000-1-6-reviews3003)
- 12. Imai M, Shimada H, Watanabe Y, Matsushima-Hibaya Y, Makino R, Koga H, Horiuchi T, Ishimura Y. 1989. Uncoupling of the cytochrome P-450cam monooxygenase reaction by a single mutation, threonine-252 to alanine or valine: possible role of the hydroxy amino acid in oxygen activation. Proc. Natl Acad. Sci. USA 86, 7823–7827. (doi:10.1073/pnas.86.20.7823)
- 13. Weber JM, Leung JO, Swanson SJ, Idler KB, McAlpine JB. 1991. An erythromycin derivative produced by targeted gene disruption in Saccharopolyspora erythraea. Science 252, 114–117. (doi:10.1126/science.2011746)
- 14. Andersen JF, Tatsuta K, Gunji H, Ishiyama T, Hutchinson CR. 1993. Substrate specificity of 6-deoxyerythronolide B hydroxylase, a bacterial cytochrome P450 of erythromycin A biosynthesis. Biochemistry 32, 1905–1913. (doi:10.1021/bi00059a004)
- 15. Slessor KE, Farlow AJ, Cavaignac SM, Stok JE, De Voss JJ. 2011. Oxygen activation by P450(cin): protein and substrate mutagenesis. Arch. Biochem. Biophys. 507, 154–162. (doi:10.1016/j.abb.2010.09.009)
- 16. Mizutani M, Sato F. 2011. Unusual P450 reactions in plant secondary metabolism. Arch. Biochem. Biophys. 507, 194–203. (doi:10.1016/j.abb.2010.09.026)
- 17. Rupasinghe S, Schuler MA, Kagawa N, Yuan H, Lei L, Zhao B, Kelly SL, Waterman MR, Lamb DC. 2006. The cytochrome P450 gene family CYP157 does not contain EXXR in the K-helix reducing the absolute conserved P450 residues to a single cysteine. FEBS Lett. 580, 6338–6342. (doi:10.1016/j.febslet.2006.10.043)
- 18. Guengerich FP, Tang Z, Cheng Q, Salamanca-Pinzon SG. 2011. Approaches to deorphanization of human and microbial cytochrome P450 enzymes. Biochim. Biophys. Acta 1814, 139–145. (doi:10.1016/j.bbapap.2010.05.005)
- 19. Vatsis KP, Peng HM, Coon MJ. 2005. Abolition of oxygenase function, retention of NADPH oxidase activity, and emergence of peroxidase activity upon replacement of the axial cysteine-436 ligand by histidine in cytochrome P450 2B4. Arch. Biochem. Biophys. 434, 128–138. (doi:10.1016/j.abb.2004.10.015)
- 20. Brodhun F, Gobel C, Hornung E, Feussner I. 2009. Identification of PpoA from Aspergillus nidulans as a fusion protein of a fatty acid heme dioxygenase/peroxidase and a cytochrome P450. J. Biol. Chem. 284, 11 792–11 805. (doi:10.1074/jbc.M809152200)
- 21. Brodhun F, Schneider S, Gobel C, Hornung E, Feussner I. 2010. PpoC from Aspergillus nidulans is a fusion protein with only one active haem. Biochem. J. 425, 553–565. (doi:10.1042/BJ20091096)
- 22. Zaphiropoulos PG. 2003. A map of the mouse Cyp3a locus. DNA Seq. 14, 155–162. (doi:10.1080/1042517031000089478)
- 23. Feyereisen R. 2011. Arthropod CYPomes illustrate the tempo and mode in P450 evolution. Biochim. Biophys. Acta 1814, 19–28. (doi:10.1016/j.bbapap.2010.06.012)
- 24. Lipinski KJ, Farslow JC, Fitzpatrick KA, Lynch M, Katju V, Bergthorsson U. 2011. High spontaneous rate of gene duplication in Caenorhabditis elegans. Curr. Biol. 21, 306–310. (doi:10.1016/j.cub.2011.01.026)
- 25. Hu S, Wang H, Knisely AA, Reddy S, Kovacevic D, Liu Z, Hoffman SM. 2008. Evolution of the CYP2ABFGST gene cluster in rat, and a fine-scale comparison among rodent and primate species. Genetica 133, 215–226. (doi:10.1007/s10709-007-9206-x)
- 26. Wang H, Donley KM, Keeney DS, Hoffman SM. 2003. Organization and evolution of the Cyp2 gene cluster on mouse chromosome 7, and comparison with the syntenic human cluster. Environ. Health Perspect. 111, 1835–1842. (doi:10.1289/ehp.6546)
- 27. Song G, et al. 2011. Conversion events in gene clusters. BMC Evol. Biol. 11, 226. (doi:10.1186/1471-2148-11-226)
- 28. Miller WL, Auchus RJ. 2011. The molecular biology, biochemistry, and physiology of human steroidogenesis and its disorders. Endocrinol. Rev. 32, 81–151. (doi:10.1210/er.2010-0013)
- 29. Horiuchi Y, Kawaguchi H, Figueroa F, O'hUigin C, Klein J. 1993. Dating the primigenial C4-CYP21 duplication in primates. Genetics 134, 331–339.
- 30. d'Alencon E, et al. 2010. Extensive synteny conservation of holocentric chromosomes in Lepidoptera despite high rates of local genome rearrangements. Proc. Natl Acad. Sci. USA 107, 7680–7685. (doi:10.1073/pnas.0910413107)
- 31. Chen S, Li X. 2007. Transposable elements are enriched within or in close proximity to xenobiotic-metabolizing cytochrome P450 genes. BMC Evol. Biol. 7, 46. (doi:10.1186/1471-2148-7-46)
- 32. Nelson DR, Goldstone JV, Stegeman JJ. 2013. The cytochrome P450 genesis locus: the origin and evolution of animal cytochrome P450s. Phil. Trans. R. Soc. B 368, 20120474. (doi:10.1098/rstb.2012.0474)
- 33. Grubor VD, Heckel DG. 2007. Evaluation of the role of CYP6B cytochrome P450s in pyrethroid resistant Australian Helicoverpa armigera. Insect Mol. Biol. 16, 15–23. (doi:10.1111/j.1365-2583.2006.00697.x)
- 34. Hahn MW. 2009. Distinguishing among evolutionary models for the maintenance of gene duplicates. J. Hered. 100, 605–617. (doi:10.1093/jhered/esp047)
- 35. Schrider DR, Hahn MW. 2010. Gene copy-number polymorphism in nature. Proc. R. Soc. B 277, 3213–3221. (doi:10.1098/rspb.2010.1180)
- 36. Kondrashov FA, Koonin EV. 2004. A common framework for understanding the origin of genetic dominance and evolutionary fates of gene duplications. Trends Genet. 20, 287–290. (doi:10.1016/j.tig.2004.05.001)
- 37. Kacser H, Burns JA. 1981. The molecular basis of dominance. Genetics 97, 639–666.
- 38. Bagheri HC. 2006. Unresolved boundaries of evolutionary theory and the question of how inheritance systems evolve: 75 years of debate on the evolution of dominance. J. Exp. Zool. B Mol. Dev. Evol. 306, 329–359. (doi:10.1002/jez.b.21069)
- 39. Nebert DW, Wikvall K, Miller WL. 2013. Human cytochromes P450 in health and disease. Phil. Trans. R. Soc. B 368, 20120431. (doi:10.1098/rstb.2012.0431)
- 40. Ingelman-Sundberg M. 2004. Pharmacogenetics of cytochrome P450 and its applications in drug therapy: the past, present and future. Trends Pharmacol. Sci. 25, 193–200. (doi:10.1016/j.tips.2004.02.007)
- 41. Yasukochi Y, Satta Y. 2011. Evolution of the CYP2D gene cluster in humans and four non-human primates. Genes Genet. Syst. 86, 109–116.
- 42. Ingelman-Sundberg M, Oscarson M, McClellan RA. 1999. Polymorphic human cytochrome P450 enzymes: an opportunity for individualized drug treatment. Trends Pharmacol. Sci. 20, 342–349. (doi:10.1016/S0165-6147(99)01363-2)
- 43. Itokawa K, Komagata O, Kasai S, Okamura Y, Masada M, Tomita T. 2010. Genomic structures of Cyp9m10 pyrethroid-resistant and -susceptible strains of Culex quinquefasciatus. Insect Biochem. Mol. Biol. 40, 631–640. (doi:10.1016/j.ibmb.2010.06.001)
- 44. Schmidt JM, et al. 2010. Copy number variation and transposable elements feature in recent, ongoing adaptation at the Cyp6g1 locus. PLoS Genet. 6, e1000998. (doi:10.1371/journal.pgen.1000998)
- 45. Taylor M, Feyereisen R. 1996. Molecular biology and evolution of resistance of toxicants. Mol. Biol. Evol. 13, 719–734. (doi:10.1093/oxfordjournals.molbev.a025633)
- 46. Kondrashov FA, Kondrashov AS. 2006. Role of selection in fixation of gene duplications. J. Theor. Biol. 239, 141–151. (doi:10.1016/j.jtbi.2005.08.033)
- 47. Koonin EV, Wolf YI. 2010. Constraints and plasticity in genome and molecular-phenome evolution. Nat. Rev. Genet. 11, 487–498. (doi:10.1038/nrg2810)
- 48. Clauset A, Shalizi CR, Newman MEJ. 2009. Power-law distributions in empirical data. SIAM Rev. 51, 661–703. (doi:10.1137/070710111)
- 49. Stumpf MPH, Porter MA. 2012. Critical truths about power laws. Science 335, 665–666. (doi:10.1126/science.1216142)
- 50. Baek SK, Bernhardsson S, Minnhagen P. 2011. Zipf's law unzipped. New J. Phys. 13, 043004.
- 51. Gonzalez F, Nebert DW. 1990. Evolution of the P450 gene superfamily: animal–plant ‘warfare’, molecular drive, and human genetic differences in drug oxidation. Trends Genet. 6, 182–186. (doi:10.1016/0168-9525(90)90174-5)
- 52. Lynch M. 2007. The frailty of adaptive hypotheses for the origins of organismal complexity. Proc. Natl Acad. Sci. USA 104, 8597–8604. (doi:10.1073/pnas.0702207104)
- 53. Nelson DR. 1999. Cytochrome P450 and the individuality of species. Arch. Biochem. Biophys. 369, 1–10. (doi:10.1006/abbi.1999.1352)
- 54. Hughes T, Liberles DA. 2008. The power-law distribution of gene family size is driven by the pseudogenisation rate's heterogeneity between gene families. Gene 414, 85–94. (doi:10.1016/j.gene.2008.02.014)
- 55. Krylov DM, Wolf YI, Rogozin IB, Koonin EV. 2003. Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res. 13, 2229–2235. (doi:10.1101/gr.1589103)
- 56. Nelson D, Werck-Reichhart D. 2011. A P450-centric view of plant evolution. Plant J. 66, 194–211. (doi:10.1111/j.1365-313X.2011.04529.x)
- 57. Wang Q, Hillwig ML, Wu Y, Peters RJ. 2012. CYP701A8: A rice ent-kaurene oxidase paralog diverted to more specialized diterpenoid metabolism. Plant Physiol. 158, 1418–1425. (doi:10.1104/pp.111.187518)
- 58. Kim HB, Schaller H, Goh CH, Kwon M, Choe S, An CS, Durst F, Feldmann KA, Feyereisen R. 2005. Arabidopsis cyp51 mutant shows postembryonic seedling lethality associated with lack of membrane integrity. Plant Physiol. 138, 2033–2047. (doi:10.1104/pp.105.061598)
- 59. Qi X, et al. 2006. A different function for a member of an ancient and highly conserved cytochrome P450 family: from essential sterols to plant defense. Proc. Natl Acad. Sci. USA 103, 18 848–18 853. (doi:10.1073/pnas.0607849103)
- 60. Rezen T, Debeljak N, Kordis D, Rozman D. 2004. New aspects on lanosterol 14α-demethylase and cytochrome P450 evolution: lanosterol/cycloartenol diversification and lateral transfer. J. Mol. Evol. 59, 51–58. (doi:10.1007/s00239-004-2603-1)
- 61. Sztal T, Chung H, Gramzow L, Daborn PJ, Batterham P, Robin C. 2007. Two independent duplications forming the Cyp307a genes in Drosophila. Insect Biochem. Mol. Biol. 37, 1044–1053. (doi:10.1016/j.ibmb.2007.05.017)
- 62. Rewitz KF, Gilbert LI. 2008. Daphnia Halloween genes that encode cytochrome P450s mediating the synthesis of the arthropod molting hormone: evolutionary implications. BMC Evol. Biol. 8, 60. (doi:10.1186/1471-2148-8-60)
- 63. Ono H, et al. 2006. Spook and Spookier code for stage-specific components of the ecdysone biosynthetic pathway in Diptera. Dev. Biol. 298, 555–570. (doi:10.1016/j.ydbio.2006.07.023)
- 64. Davis PC, Petrov DA. 2004. Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol. 2, E55. (doi:10.1371/journal.pbio.0020055)
- 65. Jordan IK, Wolf YI, Koonin EV. 2004. Duplicated genes evolve slower than singletons despite the initial rate increase. BMC Evol. Biol. 4, 22. (doi:10.1186/1471-2148-4-22)
- 66. Cardoso-Moreira MM, Long M. 2010. Mutational bias shaping fly copy number variation: implications for genome evolution. Trends Genet. 26, 243–247. (doi:10.1016/j.tig.2010.03.002)
- 67. Shakhnovich BE, Koonin EV. 2006. Origins and impact of constraints in evolution of gene families. Genome Res. 16, 1529–1536. (doi:10.1101/gr.5346206)
- 68. Waterhouse RM, Zdobnov EM, Kriventseva EV. 2010. Correlating traits of gene retention, sequence divergence, duplicability and essentiality in vertebrates, arthropods, and fungi. Genome Biol. Evol. 2, 75–86.



