Skip to main content
Biology Letters logoLink to Biology Letters
. 2009 Mar 25;5(3):397–400. doi: 10.1098/rsbl.2009.0104

Modelling mitochondrial site polymorphisms to infer the number of segregating units and mutation rate

Michael D Hendy 1,*, Michael D Woodhams 1,, Andrew Dodd 2,
PMCID: PMC2679937  PMID: 19324622

Abstract

We present a mathematical model of mitochondrial inheritance evolving under neutral evolution to interpret the heteroplasmies observed at some sites. A comparison of the levels of heteroplasmies transmitted from mother to her offspring allows us to estimate the number Nx of inherited mitochondrial genomes (segregating units). The model demonstrates the necessity of accounting for both the multiplicity of an unknown number Nx, and the threshold θ, below which heteroplasmy cannot be detected reliably, in order to estimate the mitochondrial mutation rate μm in the maternal line of descent. Our model is applicable to pedigree studies of any eukaryotic species where site heteroplasmies are observed in regions of the mitochondria, provided neutrality can be assumed. The model is illustrated with an analysis of site heteroplasmies in the first hypervariable region of mitochondrial sequence data sampled from Adélie penguin families, providing an estimate Nx and μm. This estimate of μm was found to be consistent with earlier estimates from ancient DNA analysis.

Keywords: mitochondrial DNA, mutation rate, segregating units, pedigree analysis, Adélie penguins

1. Introduction

It has recently been found that estimates of the human mitochondrial mutation rate from pedigree data are many times higher than those estimated from sequence divergence at putatively neutral sites (e.g. Parsons et al. 1997; Howell et al. 2003; Santos et al. 2005). One possible reason for this is that heteroplasmy (variation among different organelle genomes within the same individual) may be maintained for many generations, after the origin by mutation of a new variant, provided that there is not a very small bottleneck in the number of organelle genomes during transmission from parent to offspring (reviewed by Birky 1991). The persistence of heteroplasmic mutations for many generations after their origin may lead to an inflated contribution to a pedigree-based estimate of mutation rate, if they are treated as mutations that have become fixed within individuals. We propose a method for estimating the mutation rate and size of the transmission bottleneck for maternally transmitted mitochondrial genomes. We illustrate this using the dataset of Millar et al. (2008) on the Adélie penguin.

Millar et al. (2008) sequenced a segment of 344 base pairs in a fast evolving region of first hypervariable region (a region they argued was not under strong selective pressure) from 1931 Adélie penguins (508 families) at Antarctic nesting sites. This was a follow-up to the study of Lambert et al. (2002) of ancient and contemporary samples of the same region of mitochondrial DNA. In the data reported in Millar et al. (2008), a low proportion of the sites in individual birds exhibited heteroplasmies, where two bases were called on the electropherogram at that site, each with a signal greater than θ=0.23 of the total signal. Signals below the detection threshold θ, could not be reliably distinguished from background noise and were not reported there. Fifty-five mothers (out of 508) exhibited site heteroplasmies (above θ), with seven having two heteroplasmic sites. Hence, the proportion of sites with an observed heteroplasmy across all mothers was

βˆ=48+2×7508×344=3.55×104. (1.1)

In all but three cases, the observed site heteroplasmies persisted above the threshold in both mother and chicks. Site heteroplasmies in a chick were only checked at sites where a heteroplasmy had been observed in its mother.

Following Lambert et al. (2002) and Millar et al. (2008), we define mutation rate as the rate at which a base substitution is incorporated into all mitochondrial genomes of an individual. Previous studies linking mutation rate with observed site heteroplasmies include two studies (Brandstätter et al. 2004; Santos et al. 2005) of human populations of up to three generations, where the mutation rate was estimated from assuming that each observed site heteroplasmy represented a transitional substitution. We find most of the substitutions giving rise to an observed site heteroplasmy are subsequently lost, and those that eventually become fixed may persist as an observed site heteroplasmy for many generations and be oversampled.

For our study, we follow the maternal ancestry of an individual, tracking the trajectory of a site substitution introduced into the germ line by some ancestor. At each generation, the population of mitochondrial genomes will pass through a bottleneck when the oocyte segregates a small number of its mother's genomes. Lambert et al. (2002) argued that this region of the mt-genome is not under strong selective pressure, so our model assumes random drift among the Nx segregating genomes.

This model should be applicable for any region of the mt-genome assumed neutral under selection for any eukaryote species.

2. Material and methods

Our model focuses on an individual site of the mitochondrial genomes in the maternal ancestry of one individual. When a heteroplasmy (comprising two different nucleotides, generically X, Y) is observed at a site, we presume, this is a consequence of a somatic substitution XY or YX at that site in a maternal ancestor some g generations prior, inherited by a daughter, which has persisted for g generations.

Consider the maternal ancestry of an individual A0, with A1 its mother and A2 its maternal grandmother, etc. Suppose a nucleotide substitution XY has occurred at a site in one genome of a maternal line ancestor Ag (g≥1), which is inherited by Ag−1. Our model shows that the probability of an additional substitution inherited at that site among Ag−1, …, A0 is less than 10−2, so we will neglect this possibility. Suppose this site heteroplasmy persists for (exactly) k≥1 generations, inherited by Ag−1, …, Agk, but not by Agk−1. If kg, then A0 will be heteroplasmic at that site, although not necessarily observable. If k<g, Agk−1, …, A0 are not heteroplasmic at that site, with their genomes either containing all X or all Y.

We propose a model where each oocyte recruits Nx mitochondrial genomes independently from the population of its mother's genomes. (The number Nx is sometimes referred to as the number of segregating units. For the Adélie penguins, we estimated Nx to have a 95% confidence interval (CI) of 25<Nx<69.)

The observed levels of most site heteroplasmies from the blood samples of a mother and her chicks closely agree, which reflects a close agreement in the germ line. We will assume that the variation, after accounting for measurement uncertainty, is due to sampling at the recruiting bottleneck and that the proportions in the blood sample estimate the inherited proportions.

If A1 (the mother of A0) contains a site heteroplasmy with the nucleotides Y and X appearing in proportions ϕ and 1−ϕ, then the probability (using a binomial selection with replacement) that A0 inherits i genomes with allele Y and Nxi genomes of allele X is

Pr(i|Nx,ϕ)=(Nxi)ϕi(1ϕ)Nxi.

If A1 had inherited j copies of allele Y and Nxj of allele X from her mother A2, we assume she exhibits the proportions ϕ=j/Nx and 1−ϕ=1−j/Nx of the alleles in the genomes available for inheritance. Hence,

pNx(i,j)=(Nxi)(jNx)i(NxjNx)Nxi, (2.1)

is the probability that A0 inherits i copies of allele Y, and Nxi copies of allele X, given her mother had inherited j and Nxj corresponding copies. Let

PNx=[pNx(i,j)], (2.2)

be the matrix of these probabilities, where the Nx−1 rows and columns are indexed by i,j∈{1,2, …, Nx−1}.

Given that Ag has a somatic mutation in a descendant of one of her Nx founding genomes, the probability that Ag−1 inherits more than one mutated genome is very small. Hence, we will assume Ag−1 is heteroplasmic at that site, with proportion 1/Nx of its genomes containing Y. If the heteroplasmy is lost g generations later, then A0 has either all X or all Y at that site. Let hX,Y be the proportion of cases where the heteroplasmy persists and hX and hY be the proportions where the mutation is lost or fixed. The neutral model predicts that as g increases

hX,Y0,hXNx1Nx,hY1Nx.

In a simulation study of 107 heteroplasmic site histories, we followed the introduction of one mutation until the site heteroplasmy was lost, for Nx=20 and for Nx=40. We found (table 1) for θ=0.23 that hX≈((Nx−1)/Nx) and hY≈1/Nx. Table 1 also gives the average numbers of generations that the site heteroplasmies persist, and are observable. We note that over all histories, the average numbers of generations a site heteroplasmy was observable were almost identical for Nx=20 and 40, but the average variation in the levels of a heteroplasmy at a site between a mother and her chick differed significantly.

Table 1.

Site heteroplasmy histories: results from 107 simulations for Nx=20 and 40 segregating units. (In more than 99.9% of the histories, the introduced mutation is either lost or fixed within 200 generations, and no site heteroplasmy survived more than 520 generations. Statistics are presented for the cases HXn (XY lost and never observed at θ=0.23), HXo (lost but observable for at least one generation) and HY (the mutation XY is ultimately fixed). We note that the proportions fixed (HY) is close to its expected value of 1/Nx. gav records the mean number of generations the site heteroplasmy survives, and gav(obs) records the mean number of generations the site heteroplasmy is in the observable range. Note as Nx is doubled, the gav(obs) values double, but the proportions are halved so the mean number of generations for which a site heteroplasmy is observable is approximately constant, close to 2.30 in both cases. In the final row, we note the average mean square difference between the mother/chick pairs over all observed heteroplasmic sites differs for Nx=20 and Nx=40.)

Nx=20 Nx=40


category proportion gav gav(obs) proportion gav gav(obs)
HXn 82.47% 3.87 91.05% 4.94
HXo 12.54% 24.26 10.27 6.46% 50.06 19.36
HY 4.99% 37.18 20.75 2.48% 77.54 40.93
all 100.00% 8.09 2.32 100.00% 9.66 2.27
obs. M/C av. diff.=8.49% av. diff.=5.98%

For each Ak in the ancestry, let nk be the number of its founding genomes with nucleotide Y at that site. We have assumed that ng−1=1. If for k<g−1, nk=0 or Nx, the heteroplasmy is lost. Suppose 1≤nk+1=j<Nx, then the probability that nk=i, (1≤i<Nx) is

Pr(nk=i|nk+1=j)=(PNx)i,j.

In lemma 1 of the electronic supplementary material, we show that

Pr(nk=i|ng1=1)=(PNx)i,1(gk1),

which is the ith entry of the leading column of PNx(gk1). Assuming that the probability that Ag introduces a somatic mutation into the germ line at a selected site is α, then summing over all generations g≥1, the probability that A0 has a site heteroplasmy with n0=i (1≤i<Nx) is

Pr(n0=i)=g1α(PNx)i,1(g1)=α(QNx)i,1, (2.3)

where (QNx)i,1 is the first entry in the ith row of

QNx=g1PNx(g1)=(IPNx)1.

As (QNx)i,1 is the expected number of generations that a new site heteroplasmy persists with i copies (figure 1), the expected number of generations a heteroplasmy is observable at that site is

obsNx=θi/Nx1θ(QNx)i,1.

Most site heteroplasmies never reach the detection threshold, those that do, usually persist for many more than obsNx generations (table 1).

Figure 1.

Figure 1

Values of (QNx)i,1 (written as perNx), the expected number of generations a site heteroplasmy persists, and of obsNx(0.23), the expected number of generations the site heteroplasmy are observable with threshold θ=0.23 (plotted on a logarithmic scale). Note that obsNx(θ) converges to the constant value 2 ln(θ−1−1)=2.417 (dotted line), while perNx (solid line) grows logarithmically as Nx increases.

In figure 2, we plot the values of (Q20)i,1 and (Q40)i,1, noting that these two distributions are almost identical, and that (QNx)i,1 is closely approximated by 2/i within the observed region. (The limit (QNx)i,12/i as Nx→∞ was noted by Fisher and Wright (Ewens 2004, eqn (1.56)).)

Figure 2.

Figure 2

The values of per20(i), (a) for i=2, …, 19, and per40(i), (b) for i=2, …, 39, are plotted as dots. The solid lines are the curves 2/i. The shaded regions are outside the detection threshold for θ=0.23 with the observable region (unshaded) between them. We note that in both observed regions the points perNx(i) fit the curve 2/i closely.

We show in lemma 2 of the electronic supplementary material, that assuming QNx(i)(2/i), the probability β that a site has an observable heteroplasmy is closely approximated by 2α ln(θ−1−1). Assuming neutral evolution, 1/Nx of the substitutions entering the germ line will become fixed in the maternal line of descent, so that the mutation rate can be estimated as

μm=α/Nxtβ2Nxtln(θ11), (2.4)

where t is the generation time.

3. Results

We have shown that majority of heteroplasmies cannot be observed, leading to undersampling; but that those that do may persist for many generations, leading to oversampling. These two effects do not balance. We have demonstrated that the estimate of the mutation rate from the density of observed heteroplasmic sites is dependent both on the number of segregating units Nx and on the detection threshold θ.

For the Adélie penguin data of Millar et al. (2008), with θ=0.23, t=6.46 and βˆ=3.55×104, we used Bayesian analysis (see ‘Estimating Nx from mother–chick comparisons’ of the electronic supplementary material) to obtain a maximum-likelihood estimate of Nx=36.5, with 95% CI of 25.0≤Nx≤66.9. Using equation (2.4), Millar et al. (2008) estimated the median value to be μm=0.55 s s−1Myr−1 (CI 0.29≤μm≤0.88), a value not significantly different from the ancient evolutionary rate of k=0.86 s s−1Myr−1 (CI 0.53≤k≤1.17) estimated by Lambert et al. (2002) for the same region. In their study, Millar et al. showed that if they had assumed that each observed heteroplasmy represented a mutation in transition, the estimate of μ would have been increased by a factor of nearly 100.

4. Discussion

This model has assumed neutral evolution for the regions of mitochondria genome under analysis. Whether specific regions are under selective pressure is outside the scope of this study; however, Rand et al. (1994), for example, addressed this issue. Our model is applicable wherever the assumption of neutrality can be assumed for pedigree studies of any species where multiple copies of the mitochondrial genome are inherited, and a sufficient number of heteroplasmies are observed.

Accounting for the effects modelled here may illuminate the apparent discrepancies reported between molecular substitution rates and those estimated from pedigree studies, such as for human studies, e.g. Parsons et al. (1997), Howell et al. (2003) and Santos et al. (2005). As these studies do not report observational thresholds, our model cannot be applied directly to their data.

Improvement in the accuracy of determining the relative expression levels of site heteroplasmies (with θ1/Nx) might lead to a direct estimate of Nx. However, it is likely that Nx may vary across the sample, in which case the distribution would not clump around the i/Nx values. We have shown (by simulation) that the model is robust against moderate variations in Nx, provided we take Nx to be an idealized harmonic mean of the individual values in the sample.

Acknowledgements

We gratefully acknowledge support from the New Zealand Centres of Research Excellence Fund. We thank the following people who assisted in the development of this paper: J. Esins, D. M. Lambert, C. D. Millar, D. Penny, and K. Schliep. We thank the editor and referees for their very helpful comments in improving the presentation.

Footnotes

One contribution of 11 to a Special Feature on ‘Whole organism perspectives on understanding molecular evolution’.

Supplementary Material

Modelling mitochondrial site heteroplasmies to infer the number of segregating units and mutation rate—Appendix

Mathematical proofs of lemmas supporting equation (2.4) and the Bayesian Analysis used to estimate the mitochondrial mutation rate in Adélie penguins

rsbl20090104s20.pdf (96KB, pdf)

References

  1. Birky C.W. Evolution and population genetics of organelle genes. In: Selander R.K., Clark A.G., Whittam T.S., editors. Evolution at the molecular level. Sinauer; Sunderland, MA: 1991. pp. 112–134. (See also pp. 202–221.) [Google Scholar]
  2. Brandstätter A., Niederstätter H., Parson W. Monitoring the inheritance of heteroplasmy by computer-assisted detection of mixed base-calls in the entire human mitochondrial control region. Int. J. Legal Med. 2004;118:47–54. doi: 10.1007/s00414-003-0418-z. doi:10.1007/s00414-003-0418-z [DOI] [PubMed] [Google Scholar]
  3. Ewens W.J. Springer; New York, NY: 2004. Mathematical population genetics. [Google Scholar]
  4. Howell N., Smejkal C.B., Mackey D.A., Chinnery P.F., Turnbull D.M., Herrnstadt C. The pedigree rate of sequence divergence in the human mitochondrial genome: there is a difference between phylogenetic and pedigree rates. Am. J. Hum. Genet. 2003;72:659–670. doi: 10.1086/368264. doi:10.1086/368264 [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Lambert D.M., Ritchie P.A., Millar C.D., Holland B., Drummond A.J., Baroni C. Rates of evolution in ancient DNA from Adélie penguins. Science. 2002;295:2270–2273. doi: 10.1126/science.1068105. doi:10.1126/science.1068105 [DOI] [PubMed] [Google Scholar]
  6. Millar C.D., Dodd A., Anderson J., Gibb G.C., Ritchie P.A., Baroni C., Woodhams M., Hendy M.D., Lambert D.M. Mutation and evolutionary rates in Adélie penguins from the Antarctic. PLoS Genet. 2008;4:e000209. doi: 10.1371/journal.pgen.1000209. doi:10.1371/journal.pgen.1000209 [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Parsons T.J., et al. A high observed substitution rate in the human mitochondrial DNA control region. Nat. Genet. 1997;15:363–368. doi: 10.1038/ng0497-363. doi:10.1038/ng0497-363 [DOI] [PubMed] [Google Scholar]
  8. Rand D.M., Dorfsman M., Kann L.M. Neutral and non-neutral evolution of Drosophila mitochondrial DNA. Genetics. 1994;138:741–756. doi: 10.1093/genetics/138.3.741. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Santos C., Monteil R., Sierra B., Bettencourt C., Fernandez E., Alvarez L., Lima M., Abade A., Aluja M.P. Understanding differences between phylogenetic and pedigree-derived mtDNA mutation rate: a model using families from the Azores Islands (Portugal) Mol. Biol. Evol. 2005;22:1490–1505. doi: 10.1093/molbev/msi141. doi:10.1093/molbev/msi141 [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Modelling mitochondrial site heteroplasmies to infer the number of segregating units and mutation rate—Appendix

Mathematical proofs of lemmas supporting equation (2.4) and the Bayesian Analysis used to estimate the mitochondrial mutation rate in Adélie penguins

rsbl20090104s20.pdf (96KB, pdf)

Articles from Biology Letters are provided here courtesy of The Royal Society

RESOURCES