Modeling neutral evolution of Alu elements using a branching process

Marek Kimmel; Matthias Mathaes

doi:10.1186/1471-2164-11-S1-S11

. 2010 Feb 10;11(Suppl 1):S11. doi: 10.1186/1471-2164-11-S1-S11

Modeling neutral evolution of Alu elements using a branching process

Marek Kimmel ^1,^2,^✉, Matthias Mathaes ¹

PMCID: PMC2822525 PMID: 20158868

Abstract

Background

Alu elements occupy about eleven percent of the human genome and are still growing in copy numbers. Since Alu elements substantially impact the shape of our genome, there is a need for modeling the amplification, mutation and selection forces of these elements.

Methods

Our proposed theoretical neutral model follows a discrete-time branching process described by Griffiths and Pakes. From this model, we derive a limit frequency spectrum of the Alu element distribution, which serves as the theoretical, neutral frequency to which real Alu insertion data can be compared through statistical goodness of fit tests. Departures from the neutral frequency spectrum may indicate selection.

Results

A comparison of the Alu sequence data, obtained by courtesy of Dr. Jerzy Jurka, with our model shows that the distributions of Alu sequences in the AluY family systematically deviate from the expected distribution derived from the branching process.

Conclusions

This observation suggests that Alu sequences do not evolve neutrally and might be under selection.

Introduction and background

Human genome is a result of 10⁹years of evolution. It is very complex and in some respects it is still evolving. This contribution concerns evolution of the so-called Alu elements, which are movable sequences of DNA, very abundant in the human genome. We present a mathematical random process, the Griffiths-Pakes discrete-time branching process with infinite-allele mutations, which is almost ideally suited for modeling of Alu elements proliferation. For the biologically important special case of the linear-fractional offspring distribution we derive semi-explicit expressions for the expected frequency spectra of classes of alleles existing in a given number of copies (an analogue of the Ewens sampling formula). We compare the outcome with Alu-element statistics data.

Alu repeat sequences

Background on Alus

Alu elements belong to the group of transposable or mobile elements, which occupy nearly 45% of the human genome [1]. Within this group of transposable and also highly repetitive elements, LINEs (Long INterspersed Elements) and SINEs (Short INterspersed Elements) form the two largest groups. They occupy 21% and 13% of the human genome respectively [2]. Whereas the LINEs are dominated by L1 elements, the largest and hence most studied group of the SINEs is comprised of the Alu elements. While many transposable elements are present in all eukaryotic genomes, Alu elements appear only in mammals. A typical full-length Alu sequence is approximately 300 bp long. Alu sequences amplify by retrotransposition, also known as "the copy and paste" mechanism. At present it is estimated that more than one million copies of Alu elements occupy about eleven percent of the human genome, and the number of elements seems to be growing [1].

Alu elements are non-autonomous and seem to have to use the L1 elements' tools for retrotransposition. It has been hypothesized that L1 endonuclease causes a nick at the TTAAAA consensus site, after which Alu anneals directly to the site of integration [3]; then a second nick on the other strand completes the insertion. These two staggered nicks introduce an identifiable characteristic of Alu elements. The newly inserted Alu element is surrounded by an identical set of direct repeats, which are also called target site duplications (TSDs). These direct repeats range from 10 to 15 bp and are considered the prevalent feature of retrotranspositional insertion [4]. This process of integration, also known as target-primed reverse transcription (TPRT) [5,6], is responsible for the successful amplification of Alu elements. At present it is estimated that more than one million copies of Alu elements occupy about eleven percent of the human genome, and the number of elements seems to be growing [1].

Based on diagnostic mutations, Alu elements are divided into subfamilies. The three major families of Alu sequences are J, S and Y. The letters are chosen in alphabetical order to convey the different ages of each family. Alu sequences in the J family are the oldest, while Alu sequences in the Y family are the youngest. The most interesting family in the current research of Alu elements is the Y family, which contains the youngest and most active Alu elements [7]. Due to their recent integration, 25 percent of their loci are still polymorphic [1]. An Alu locus is defined to be polymorphic if some individuals have an Alu element at that particular location while others do not. These polymorphic loci can be used as genetic markers for disease association studies.

Unlike Single-Nucleotide Polymorphisms (SNPs) Alu markers are small in numbers, but they are identical by descent and essentially homoplasy-free markers and their ancestral state, which is defined by their absence from a specific locus, is always known. Polymorphic Alu loci have been used in genetic diversity studies, forensic studies and disease association studies [8,9]. Alu insertions have influenced the architecture of human genome by duplication, deletion, inversion, transduction and translocation [10]. Alu elements frequently appear in introns, 3' untranslated regions of genes, and intergenic genomic regions [11]. Alu insertions act as insertional mutagens and are responsible for 0.5 percent of human genetic disorders [12]. Almost all these diseases are caused by Alu elements from the youngest subfamilies [6]. For a comprehensive list of AluY disease loci and their associated diseases, one can also consult [6]. Their summary of Alu insertion induced diseases includes neurofibromatosis, hemophilia A and B, Huntington disease and Apert syndrome. Deiniger and Batzer [12] attribute diseases such as insulin-resistant diabetes type II, Lesch-Nyhan syndrome, Tay-Sachs disease, familial hypercholesterolaemia and -thalassaemia to Alu-mediated recombination. Additionally, several types of cancer, including Ewing sarcoma, breast cancer and leukemia are shown to be caused by Alu elements [1,12].

Alu sequence data used in this study

Dr. Jerzy Jurka of the Genetic Information Research Institute (GIRI) kindly provided Alu sequence data for our analysis. All Alu subfamilies were extracted from the March 2006 assembly of the USCS Human Genome database. Only recognizable full-length Alu sequences were retained for analysis. Overall, Alu sequences for nine different Alu subfamilies were extracted from the USCS reference genome: AluYa1, AluYa4, AluYa5, AluYa8, AluYb8, AluYc1, AluYd2, AluYe2, and AluYe5.

The goal was to extract Alu sequences that belonged to relatively large subfamilies (more than 1000 sequences), such as AluYa1, AluYa4, AluYb8, AluYc1, and AluYe2. For each subfamily, a consensus or reference Alu sequence was used to screen the entire human genome for matching sequences. A match occurred when stretches of nucleotides that include the main diagnostic mutations agreed with the Alu subfamily consensus sequences. Since the insertion mechanism of an Alu element introduces large differences in their poly-A tails, these need to be deleted from analysis. Dr. Jurka provided the Alu sequence data with poly-A tails already deleted.

Alu sequences contain the middle A-stretch, another highly variable region similar to the poly-A tail, which lies between the two monomers that constitute an Alu sequence, and can be considered the A-tail of the first monomer. To accurately delete the middle-A stretch, it is necessary to align the Alu sequences for each subfamily. A consensus sequence for each subfamily was obtained from Repbase [13], a database of repetitive elements, which is maintained by GIRI. In each subfamily, pairwise alignment of each Alu sequence in the subfamily with the Repbase consensus sequences, was performed using ClustalW [14]. MEGA4 software [15] was used to display the alignments including the middle-A stretch. After deleting the middle-A stretch, the average length of an Alu sequence is about 260 base pairs.

Following preparatory steps described above, we obtained the counts of Alu sequences that had n identical copies in the sample, for n = 1, 2, 3,.... To obtain these counts for each Alu subfamily, a program was written in R-language. These counts or corresponding percentages represent final data, which were tested against the theoretical distribution based on the branching process model.

Results and discussion

Maximum-likelihood fits

To fit the branching process model to the Alu sequence data, we use the maximum likelihood method. The highest value of the likelihood determines the estimates for our parameters. Since the log-likelihood of the sample does not exist in a closed form, we evaluate it numerically. For these runs we set the value of the probability of mutation at μ = 10^-6. Sensitivity of the outcome to variation in parameter μ is very slight as long as this parameter is small (such as 10^-5- 10^-9per division).

Figures 1, 2, 3, 4 depict the maximum-likelihood fits of the model to the data from AluYa1, AluYa5, AluYb8 and AluYc1 subfamilies, respectively. They are presented in the semi-logarithmic scale, to amplify the tail probabilities. The graphical comparison demonstrates that the data fit relatively well for allele classes 1 and 3 - 7. Notably, the allele class 2 shows the worst fit among the first seven allele classes. These seven classes account for at least 0.99 cumulative class frequency observed in the data.

**AluYa1 data-based class frequencies against the theoretical {Ψ_k} in log scale**. Fitted by Griffiths-Pakes process with linear-fractional distribution, with b = 0.016, p = 0.983.

**AluYa5 data-based class frequencies against the theoretical {Ψ_k} in log scale**. Fitted by Griffiths-Pakes process with linear-fractional distribution, with b = 0.139, p = 0.861.

**AluYb8 data-based class frequencies against the theoretical {Ψ_k} in log scale**. Fitted by Griffiths-Pakes process with linear-fractional distribution, with b = 0.143, p = 0.856.

**AluYc1 data-based class frequencies against the theoretical {Ψ_k} in log scale**. Fitted by Griffiths-Pakes process with linear-fractional distribution, with b = 0.035, p = 0.965.

Simulation-based test

Testing for significance of the differences between the theoretical and observed frequencies of allele classes is in our case made difficult by the dominance of class 1 frequency. This causes that, with finite sample sizes, counts of alleles in the further classes are quite low (see Table 1). Therefore the usual restrictions for minimum number of observations and minimum number of classes in the χ²test (as well as in other usual tests for distribution comparisons) are met only in two data sets. For this reason, we resort to a simulation-based approach.

Table 1.

Frequencies of alleles (mutant types) with j copies (class j alleles). Classes with j > 20 have been omitted.

Number of copies	1	2	3	4	5	6	7	8	9	10	11	12	13	14	16	18
Ya1	3761	25	2	4	1	1
Ya4	426	6	2	2	1									1
Ya5	1722	75	15	16	11	10	5	4	2	5	1	1	2	1		2
Ya8	28	3							1
Yb8	1489	71	11	15	8	9	4	4	1		1		1	1	1
Yc1	3162	42	9	4	1	1	1		1
Yd2	401	1
Ye2	1130	3	1
Ye5	853	10	7	2	1
All	12970	237	47	43	23	21	10	8	5	5	2	1	3	3	1	2

Open in a new tab

In this approach we use the χ²statistic as our test statistic. We start with randomly drawing a sample of size n from the uniform distribution U(0, 1), where n is equal to the number of Alu sequences in each subfamily. Based on the distribution {Ψ_j} produced by our fitted model we obtain the probabilities of Alu sequences with j = 1, 2, 3,... copies. These probabilities determine the bin a random draw from the uniform distribution will be placed in. Repeating this n times results in a distribution of counts per bin. From this simulated distribution of counts we compute the χ²statistic by using the expected counts under our fitted model and the counts from the simulated (observed) approach. For each Alu subfamily, this process was repeated 100, 000 times. The χ²values were sorted and then plotted to display their distribution. When comparing the χ²statistic from the actual data to the simulated χ²statistics for each subfamily (Table 2), it becomes apparent that the data produce a very high χ²value, which is highly unlikely under the proposed model. The two Alu subfamilies that have χ²values less than the maximum simulated χ²values are AluYa8 and AluYd2. A closer look, however, reveals that both of the χ²values for the AluYa8 and AluYd2 subfamilies are among the ten highest values in the sample of 100, 000 simulated χ²values (crude p-value of 10^-4). The large χ²statistic for our data is mostly due to the difference between the observed and the expected counts in bin number 2 (Alu sequences with 2 copies) and in the combined bins of the tail. We notice it as a systematic departure.

Table 2.

Sample-based vs. simulation-based χ²statistic for the Alu subfamilies considered

Alu	Sample-based χ²statistic	Maximum of the simulation-based χ²statistics
Ya1	1118.072	22.72153
Ya4	99.01778	19.06578
Ya5	338.4337	12.60890
Ya8	15.82118	23.83412
Yb8	274.0389	14.98851
Yc1	908.3557	15.46991
Yd2	224.6298	225.5328
Ye2	421.1556	89.49312
Ye5	169.5426	20.00121

Open in a new tab

For each subfamily 100, 000 simulations were performed.

Conclusion

The current study seems to constitute the first application of the Griffiths-Pakes process to biological data. The outcome is interesting in the sense that a generally plausible fit is obtained to the Alu element frequency distribution. It is not quite clear, why the fit fails worst at the frequency class 2. This may be influenced by initial steps of data preparation. If a region containing a a relatively frequent variant were removed so that sequences could be aligned, some unique variants might migrate to class 2. Another possibility is that the difference is caused by a departure from neutrality in Alu evolution.

We should notice that the current model does not involve genetic drift. In reality, the genomes evolve within individuals and properly, the branching process should have been embedded in a population genetic model of Wright-Fisher or Moran type. However, this would lead to enormous complications. As an additional exercise, we attempted to fit the Alu class frequencies by the classical Ewens sampling formula, using a test developed by Slatkin [16], but the fit is rather bad.

Methods

Discrete branching process of Griffiths and Pakes with infinite allele mutations

Branching processes have been widely used in modeling cell population dynamics. An insertion of an Alu sequence into a new genomic location can be considered a proliferation process not dissimilar from cell division. Therefore proliferation and mutation of Alu sequences can be described in a mathematical way using a branching process. The branching process has to account for the fact that Alu sequences are still growing in numbers in the human genome. Therefore we focus on the supercritical branching processes, in which the expected number of offspring is greater than one (m > 1). One interesting model prediction, which can be compared to data is how many different Alu sequences occur in each Alu subfamily or more specifically how many Alu alleles with frequency j exist in each subfamily. Based on a discrete-time branching process with infinite allele mutations, Griffiths and Pakes [17] derived a limit result for the expected proportion of alleles having frequencies in j.

Griffiths and Pakes [17] process is a modification of the standard Bienayme-Galton-Watson branching process to allow individuals infinitely many possible identifiable types. In our application, the types are alleles (variants) of the Alu sequence identified by specific point mutations. From time t = 0, a non-mutant clone of particles is evolving in time according to a single-type branching process (Figure 5). With probability μ per time step, a particle mutates and initates a clone of new previously nonexistent type, which evolves according to the same rules as the original non-mutant clone. As a result, a set of clones of different types emerges, spawning further clones, some of which may die out. We are interested in deriving, using Griffiths-Pakes [17] theory, expected frequencies of allele classes such that allele is in class k if it exists in k copies, for a specific biologically justifiable version of the process.

**Griffiths-Pakes branching process with infinite-allele mutations**. A non-mutant clone of particles is evolving in time according to a single-type branching process (in our case, time dicrete). With probability μ per time step, a particle mutates and initates a clone of new previously nonexistent type, which evolves according to the same rules as the original non-mutant clone. As a result, a set of clones of different types emerges, spawning further clones, some of which may die out. Upper panel: low μ; lower panel: high μ.

The number of individuals at t = 0 is defined as Z₀= i. Let G_nbe the collection of individuals in generation n and let Z_ndenote their number. Each generation size depends on the previous generation size through the branching property

where ξ_j,nare independent identically distributed (iid) integer-valued random variables, which represent the number of offspring born to the j^thmember of G_n. The distribution of ξ_j,nis characterized by its probability generating function (pgf)

where p_k= P[ξ_j,n= k], and it is assumed that p₀+ p₁<1, i.e., the branching process is nontrivial. We have m = f'(1).

If an individual produces j offspring then the number of progeny having the parental allele is distributed binomially with parameters j and 1 - μ, hence its pgf is equal to (μ + (1 - μ)s)^j. This implies that any new allele is followed by a branching process of its like-type descendants with offspring pgf H(s) = f(μ + (1 - μ)s). This process is supercritical if its expected progeny count M = m(1 - μ) is greater than 1. Within this framework let us define the symbol Inline graphic , where H^(r)(s) is the rth iterate of pgf H(s), to be equal to the probability that there are j individuals at time r in a nonmutant clone started at time 0 by a single individual. Let us denote Ψ_jthe long-term expected proportion of alleles with frequency j ≥ 1, which is the formula that we will use to compute the theoretical distribution of Alu allele classes for given offspring pgfs. Asymptotically, these proportions assume the form (based on Griffiths and Pakes [17], detailed derivation in [18]

Linear fractional offspring distribution

The process of creation of new Alu repeats by retrotransposition can be naturally described by the age dependent Markov branching process {Z_t} (i.e., process with exponentially distributed individuals' lifelengths) with binary fission, which leads to a quadratic pgf of progeny number per individual. The rationale is that any existing Alu ("individual") from an active family produces two progeny (i.e., itself and a replica) at a random time time moment, where "random" means that the intervals between successive fission events are independent, identically distributed random variables. Moreover, the copy may fail to reinsert into the genome. Therefore, the form of the progeny count pgf will be αs²+ (1 - α)s, where α is the probability of successful reinsertion. If such process is sampled at constant time intervals, the resulting discrete-time process {Z_kΔt} is a Galton-Watson branching process with linear fractional pgf ([19], expression (4.14), also c.f. [20]). A unique property of the linear fractional case of the Galton-Watson process, excluding the trivial case f(s) = ps + q, is that the iterations of the pgf can be computed explicitly and also are of linear fractional form. Let us start with the offspring pgf in the linear fractional case:

The probability distribution corresponding to this generating function is:

graphic file with name 1471-2164-11-S1-S11-i6.gif

The parameters b and p are subject to certain restrictions,

To ensure that this process is supercritical, i.e., m > 1, additional constraints on b and p are needed. The mean of f(s) is Inline graphic , so supercriticality yields an additional restriction on parameters b and p, b > (1- p)², or equivalently

To be more precise, we should satisfy condition m(1 - μ) > 1, but with μ very close to 0, the distinction is not important. As demonstrated in [18], for the linear-fractional case, we obtain the following computable expression

graphic file with name 1471-2164-11-S1-S11-i10.gif

The infinite sums in the numerator and denominator are numerically computed. A program was written in R-language to compute the Ψ_j. Since Alu sequence data in Table 1 suggest a high value for Ψ₁, we verify that the theoretical Ψ₁attains such values for any choices of parameters b, p, and μ. For fixed μ = 10^-6, we established a grid of b and p from 0 to 1 in steps of 0.01. Figure 6 shows that Ψ₁can assume any value between 0 and 1, and that high values of Ψ₁occur for a combination of low values of b and high values of p.

**Contour plot illustrating the influence of parameters b and p on Ψ₁, based on Griffiths-Pakes process with linear-fractional distribution**. Red: large Ψ₁; blue: small Ψ₁. Range of Ψ₁-values, from 0 through 1.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MK conceived the study. MK and MM jointly derived the mathematical model equations. MM prepared the Alu element data for model fitting carried out sequence alignment and statistical testing. MK drafted the manuscript with MM's help. All authors read and approved the final manuscript.

Contributor Information

Marek Kimmel, Email: kimmel@rice.edu.

Matthias Mathaes, Email: matze@rice.edu.

Acknowledgements

The authors acknowledge the generous contribution of Dr. Jerzy Jurka of the Genetic Information Research Institute (GIRI) who kindly supplied all Alu sequences employed in this study, together with invaluable advice. Anthony Pakes of the School of Mathematics and Statistics of the University of Western Australia provided helpful comments concerning the Griffiths-Pakes process.

This article has been published as part of BMC Genomics Volume 11 Supplement 1, 2010: International Workshop on Computational Systems Biology Approaches to Analysis of Genome Complexity and Regulatory Gene Networks. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/11?issue=S1.

References

Batzer M, Deininger P. Alu repeats and human genomic diversity. Nature Reviews - Genetics. 2002;3:371. doi: 10.1038/nrn811. [DOI] [PubMed] [Google Scholar]
Lander E, Linton L, Birren B, Nusbaum C, Zody M, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921. doi: 10.1038/35057062. [DOI] [PubMed] [Google Scholar]
Jurka J. Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons. Proc Natl Acad Sci USA. 1997;94:1872–1877. doi: 10.1073/pnas.94.5.1872. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cordaux R, Srikanta D, Lee J, Stoneking M, Batzer M. In search of polymorphic Alu insertions with restricted geographic distributions. Genomics. 2007;90:154–158. doi: 10.1016/j.ygeno.2007.03.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zingler N, Willhoeft U, Brose H, Schoder V, Jahns T, Hanschmann K, Morrish T, Lower J, Schumann G. Analysis of 5'junctions of human LINE-1 and Alu retrotransposons suggests an alternative model for 5'-end attachment requiring microhomology-mediated end-joining. Genome Research. 2005;15(6):780. doi: 10.1101/gr.3421505. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ostertag E, HK Jr. Biology of mammalian L1 Retrotransposons. Annual Reviews in Genetics. 2001;35:501–538. doi: 10.1146/annurev.genet.35.102401.091032. [DOI] [PubMed] [Google Scholar]
Mills R, Bennett E, Iskow R, Devine S. Which transposable elements are active in the human genome? Trends in Genetics. 2007;23(4):183–191. doi: 10.1016/j.tig.2007.02.006. [DOI] [PubMed] [Google Scholar]
Tishkoff S, Pakstis A, Stoneking M, Kidd J, Destro-Bisol G, Sanjantila A, Lu R, Deinard A, Sirugo G, Jenkins T. Short tandemrepeat polymorphism/alu haplotype variation at the PLAT locus: implications for modern human origins. Am J Hum Genet. 2000;67(4):901–925. doi: 10.1086/303068. [DOI] [PMC free article] [PubMed] [Google Scholar]
Watkins W, Rogers A, Ostler C, Wooding S, Bamshad M, Brassington A, Carroll M, andJAWalker SN, Prasad B. Genetic Variation Among World Populations: Inferences From 100 Alu Insertion Polymorphisms. Genome Research. 2003;13(7):1607–1618. doi: 10.1101/gr.894603. [DOI] [PMC free article] [PubMed] [Google Scholar]
Han K, Xing J, Wang H, Hedges D, Garber R, Cordaux R, Batzer M. Under the genomic radar: The Stealth model of Alu amplification. Genome Research. 2005;15(5):655. doi: 10.1101/gr.3492605. [DOI] [PMC free article] [PubMed] [Google Scholar]
Carroll M, Roy-Engel A, Nguyen S, Salem A, Vogel E, Vincent B, Myers J, Ahmad Z, Nguyen L, Sammarco M. Large-scale analysis of the Alu Ya5 and Yb8 subfamilies and their contribution to human genomic diversity. J Mol Biol. 2001;311:17–40. doi: 10.1006/jmbi.2001.4847. [DOI] [PubMed] [Google Scholar]
Deininger P, Batzer M. Alu Repeats and Human Disease. Molecular Genetics and Metabolism. 1999;67(3):183–193. doi: 10.1006/mgme.1999.2864. [DOI] [PubMed] [Google Scholar]
Jurka J, Kapitonov V, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;150(1-4):462–467. doi: 10.1159/000084979. [DOI] [PubMed] [Google Scholar]
Larkin M, Blackshields G, Brown N, Chenna R, McGettigan P, McWilliam H, Valentin F, Wallace I, Wilm A, Lopez R. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23(21):2947. doi: 10.1093/bioinformatics/btm404. [DOI] [PubMed] [Google Scholar]
Tamura K, Dudley J, Nei M, Kumar S. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) Software Version 4.0. Molecular Biology and Evolution. 2007;24(8):1596. doi: 10.1093/molbev/msm092. [DOI] [PubMed] [Google Scholar]
Slatkin M. An exact test for neutrality based on the Ewens sampling distribution. Genetical Research. 1994;64:71–74. doi: 10.1017/S0016672300032560. [DOI] [PubMed] [Google Scholar]
Griffiths R, Pakes A. An infinite-alleles version of the simple branching process. Advances in Applied Probability. 1988;20(3):489–524. doi: 10.2307/1427033. [DOI] [Google Scholar]
Kimmel M, Mathaes M. Griffiths-Pakes branching process as a model for evolution of Alu elements. Modeling neutral evolution of Alu elements using a branching process, Proceedings of the Badajoz Branching Processes Workshop, to appear. 2010. [DOI] [PMC free article] [PubMed]
Kimmel M, Axelrod D. Branching Processes in Biology. Springer; 2001. [Google Scholar]
Athreya K, Ney P. Branching Processes. Dover Publications; 2004. [Google Scholar]

[B1] Batzer M, Deininger P. Alu repeats and human genomic diversity. Nature Reviews - Genetics. 2002;3:371. doi: 10.1038/nrn811. [DOI] [PubMed] [Google Scholar]

[B2] Lander E, Linton L, Birren B, Nusbaum C, Zody M, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921. doi: 10.1038/35057062. [DOI] [PubMed] [Google Scholar]

[B3] Jurka J. Sequence patterns indicate an enzymatic involvement in integration of mammalian retroposons. Proc Natl Acad Sci USA. 1997;94:1872–1877. doi: 10.1073/pnas.94.5.1872. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B4] Cordaux R, Srikanta D, Lee J, Stoneking M, Batzer M. In search of polymorphic Alu insertions with restricted geographic distributions. Genomics. 2007;90:154–158. doi: 10.1016/j.ygeno.2007.03.010. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B5] Zingler N, Willhoeft U, Brose H, Schoder V, Jahns T, Hanschmann K, Morrish T, Lower J, Schumann G. Analysis of 5'junctions of human LINE-1 and Alu retrotransposons suggests an alternative model for 5'-end attachment requiring microhomology-mediated end-joining. Genome Research. 2005;15(6):780. doi: 10.1101/gr.3421505. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B6] Ostertag E, HK Jr. Biology of mammalian L1 Retrotransposons. Annual Reviews in Genetics. 2001;35:501–538. doi: 10.1146/annurev.genet.35.102401.091032. [DOI] [PubMed] [Google Scholar]

[B7] Mills R, Bennett E, Iskow R, Devine S. Which transposable elements are active in the human genome? Trends in Genetics. 2007;23(4):183–191. doi: 10.1016/j.tig.2007.02.006. [DOI] [PubMed] [Google Scholar]

[B8] Tishkoff S, Pakstis A, Stoneking M, Kidd J, Destro-Bisol G, Sanjantila A, Lu R, Deinard A, Sirugo G, Jenkins T. Short tandemrepeat polymorphism/alu haplotype variation at the PLAT locus: implications for modern human origins. Am J Hum Genet. 2000;67(4):901–925. doi: 10.1086/303068. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B9] Watkins W, Rogers A, Ostler C, Wooding S, Bamshad M, Brassington A, Carroll M, andJAWalker SN, Prasad B. Genetic Variation Among World Populations: Inferences From 100 Alu Insertion Polymorphisms. Genome Research. 2003;13(7):1607–1618. doi: 10.1101/gr.894603. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B10] Han K, Xing J, Wang H, Hedges D, Garber R, Cordaux R, Batzer M. Under the genomic radar: The Stealth model of Alu amplification. Genome Research. 2005;15(5):655. doi: 10.1101/gr.3492605. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B11] Carroll M, Roy-Engel A, Nguyen S, Salem A, Vogel E, Vincent B, Myers J, Ahmad Z, Nguyen L, Sammarco M. Large-scale analysis of the Alu Ya5 and Yb8 subfamilies and their contribution to human genomic diversity. J Mol Biol. 2001;311:17–40. doi: 10.1006/jmbi.2001.4847. [DOI] [PubMed] [Google Scholar]

[B12] Deininger P, Batzer M. Alu Repeats and Human Disease. Molecular Genetics and Metabolism. 1999;67(3):183–193. doi: 10.1006/mgme.1999.2864. [DOI] [PubMed] [Google Scholar]

[B13] Jurka J, Kapitonov V, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;150(1-4):462–467. doi: 10.1159/000084979. [DOI] [PubMed] [Google Scholar]

[B14] Larkin M, Blackshields G, Brown N, Chenna R, McGettigan P, McWilliam H, Valentin F, Wallace I, Wilm A, Lopez R. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23(21):2947. doi: 10.1093/bioinformatics/btm404. [DOI] [PubMed] [Google Scholar]

[B15] Tamura K, Dudley J, Nei M, Kumar S. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) Software Version 4.0. Molecular Biology and Evolution. 2007;24(8):1596. doi: 10.1093/molbev/msm092. [DOI] [PubMed] [Google Scholar]

[B16] Slatkin M. An exact test for neutrality based on the Ewens sampling distribution. Genetical Research. 1994;64:71–74. doi: 10.1017/S0016672300032560. [DOI] [PubMed] [Google Scholar]

[B17] Griffiths R, Pakes A. An infinite-alleles version of the simple branching process. Advances in Applied Probability. 1988;20(3):489–524. doi: 10.2307/1427033. [DOI] [Google Scholar]

[B18] Kimmel M, Mathaes M. Griffiths-Pakes branching process as a model for evolution of Alu elements. Modeling neutral evolution of Alu elements using a branching process, Proceedings of the Badajoz Branching Processes Workshop, to appear. 2010. [DOI] [PMC free article] [PubMed]

[B19] Kimmel M, Axelrod D. Branching Processes in Biology. Springer; 2001. [Google Scholar]

[B20] Athreya K, Ney P. Branching Processes. Dover Publications; 2004. [Google Scholar]

PERMALINK

Modeling neutral evolution of Alu elements using a branching process

Marek Kimmel

Matthias Mathaes

Supplement

Conference