Abstract
Models for the mosaic structure of an individual’s genome from multiparental populations have been developed primarily for autosomes, whereas X chromosomes have received very little attention. In this paper, we extend our previous approach to model ancestral origin processes along two X chromosomes in a mapping population, which is necessary for developing hidden Markov models in the reconstruction of ancestry blocks for X-linked quantitative trait locus mapping. The model accounts for the joint recombination pattern, the asymmetry between maternally and paternally derived X chromosomes, and the finiteness of population size. The model can be applied to various mapping populations such as the advanced intercross lines (AIL), the Collaborative Cross (CC), the heterogeneous stock (HS), the Diversity Outcross (DO), and the Drosophila synthetic population resource (DSPR). We further derive the map expansion, that is, the density (per Morgan) of recombination breakpoints, in advanced intercross populations with L inbred founders in the limit of an infinitely large population size. The analytic results show that for X chromosomes the genetic map expands linearly at a rate (per generation) of two-thirds times 1 – 10/(9L) for the AIL, and at a rate of two-thirds times 1 – 1/L for the DO and the HS, whereas for autosomes the map expands at a rate of 1 – 1/L for the AIL, the DO, and the HS.
Keywords: advanced intercross lines (AIL), Collaborative Cross (CC), Diversity Outcross (DO), Drosophila synthetic population resource (DSPR), map expansion, MPP, multiparental populations, Multiparent Advanced Generation Inter-Cross (MAGIC)
Recently designed quantitative trait locus (QTL) mapping populations use either multiple parents to increase the genetic diversity of the founder population, or many intercross generations to improve the mapping resolution by accumulating historical recombination events. Some examples include the Collaborative Cross (CC) (Churchill et al. 2004), the advanced intercross lines (AIL) (Darvasi and Soller 1995), the heterogeneous stock (HS) (Mott et al. 2000), the diversity outcross (DO) (Svenson et al. 2012), and the Drosophila synthetic population resource (DSPR) (King et al. 2012). The CC can be regarded as a set of eight-way recombinant inbred lines (RIL) by sibling mating, where the eight founders of each line are permuted.
The genomes of individuals in QTL mapping populations are random mosaics of the founders’ genomes. QTL mapping generally requires the reconstruction of these genome blocks along the two homologous chromosomes of a sampled individual from available genotype data. Such reconstruction is often performed under a hidden Markov model (HMM) with the latent state being the pair of ancestral origins at a locus, which requires the transition probability of ancestral origins between two loci or, equivalently, the two-locus diplotype (two-haplotype) probabilities.
Modeling ancestral origins along a pair of autosomal chromosomes has been well developed recently. Broman (2012a) extended the approach of Haldane and Waddington (1931) from the two-way to four- and eight-way RIL by sibling mating and provided recipes for calculating autosomal two-locus diplotype probabilities numerically. Johannes and Colome-Tatche (2011) derived autosomal two-locus diplotype probabilities for the two-way RIL by selfing. Zheng et al. (2014) described a general modeling framework for ancestral origins that can be applied to autosomes in various mapping populations such as the RIL by selfing or sibling mating and the AIL.
A special treatment is required for modeling ancestral origins along a pair of X chromosomes. Haldane and Waddington (1931) derived the recurrence relations of the X-linked two-locus diplotype probabilities for the two-way RIL by sibling mating and the bi-parental repeated parent-offspring mating, and their closed form solutions for the final homozygous lines. Broman (2005) extended the solutions to the two- and three-locus haplotype probabilities for the two, four, or eight-way RIL by sibling mating. Broman (2012b) derived the X-linked two-locus haplotype probabilities in advanced intercross populations including the AIL, the HS, and the DO, assuming an infinitely large population size.
In this paper, we extend our previous work (Zheng et al. 2014) to model the ancestral origins along a pair of X chromosomes in a finite mapping population. This extension also builds on the theory of junctions in inbreeding (Fisher 1949, 1954). A junction is defined as a boundary point of genome blocks on chromosomes where two distinct ancestral origins meet, and the boundary points that occur at the same location along multiple chromosomes are counted as a single junction. The map expansion is the expected junction density (per Morgan) on a maternally or paternally derived X chromosome, denoted by or , respectively. We denote by the overall junction density along the XX chromosomes of a female, and it can be used as a measure of X-linked QTL mapping resolution (Darvasi and Soller 1995; Weller and Soller 2004).
The key feature of this extension is to account for the asymmetry between maternally and paternally derived X chromosomes, because the latter did not experience any crossover events with Y chromosomes. We first present a model framework for X-linked ancestral origins, where the recurrence relations are derived for various junction densities, including the map expansions on maternally and paternally derived X chromosomes. Then, we derive the closed form solutions for these expected densities in mapping populations including the RIL by sibling mating, the AIL, the HS, the DO, and the DSPR; they are evaluated by forward simulation studies. Lastly, we discuss the model assumptions and the implications of the analytic results on haplotype reconstructions and breeding designs.
A model for X-linked ancestral origins
Assumptions and notation
Consider a dioecious population with two separate sexes: homogametic females with sex chromosomes XX and heterogametic males with sex chromosomes XY. There are no recombination events between X and Y, and thus we ignore the pseudoautosomal regions on the XY chromosomes. As in most mammals and some insects (Drosophila), some flowering species, such as white campion (Silene latifolia), papaya (Carica papaya), and asparagus (Asparagus officinalis), have the XY sex determination system (Ming and Moore 2007). The dioecious population was founded in generation 0, and it has nonoverlapping generations. There has been no natural or artificial selection since the founder population. The mating schemes of producing the next generation are random, and they may vary from one generation to the next. The assignments of offspring genders are assumed to be independent of mating schemes.
The ancestral origins along two homologous autosomes have been modeled as a continuous time Markov chain (CTMC) (Zheng et al. 2014). We extend the approach to account for the asymmetry of XX chromosomes, using superscript m (p) for maternally (paternally) derived genes or chromosomes. See Supporting Information, Table S1 for a list of symbols used in this paper. Consider the ordered pair of the ancestral origins at location x along the two X chromosomes of a randomly sampled female. The ancestral origin process is assumed to follow a CTMC, where x is the time parameter in units of Morgans. We assign a unique ancestral origin to the X chromosomes of each inbred founder, or to each X chromosome of each outbred founder. Multiple genes, within or between loci, are identical by descent (IBD) if they have the same ancestral origins. Let L be the number of possible ancestral origins that either member of the pair may take. L may be less than the number of inbred founders if some male founders did not produce daughters to pass down their X chromosomes. For example, L = 3 for the four-way RIL by sibling mating, since one of the founder mating pairs produces only one son (Figure 1A).
The L possible ancestral origins are assumed to be exchangeable, so that we focus on the changes of ancestral origins. See Figure 1B and the relevant part of the Discussion on the exchangeability assumption. The initial distribution of the ancestral origin pair at the leftmost locus is specified by the probability that the two ancestral origins are the same (IBD) at a locus; its complement is the non-IBD probability. Given either IBD or non-IBD at the locus, the ancestral origin pair takes each of the possible combinations with equal probability.
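The initial distribution described above can be sketched as follows. This is a minimal illustration under the exchangeability assumption, not code from the paper; the function name `initial_distribution` and its arguments are hypothetical, and origins are simply labeled 1..L.

```python
from itertools import product


def initial_distribution(num_origins, non_ibd_prob):
    """Distribution of the ordered origin pair at the leftmost locus.

    Assumes exchangeable origins: the IBD mass (1 - non_ibd_prob) is split
    evenly over the L identical pairs, and the non-IBD mass is split evenly
    over the L * (L - 1) ordered pairs of distinct origins.
    """
    if num_origins < 1:
        raise ValueError("need at least one ancestral origin")
    if not 0.0 <= non_ibd_prob <= 1.0:
        raise ValueError("non_ibd_prob must lie in [0, 1]")
    if num_origins == 1 and non_ibd_prob != 0.0:
        raise ValueError("a single origin cannot be non-IBD with itself")

    n_ibd_states = num_origins
    n_non_ibd_states = num_origins * (num_origins - 1)
    dist = {}
    for first, second in product(range(1, num_origins + 1), repeat=2):
        if first == second:
            dist[(first, second)] = (1.0 - non_ibd_prob) / n_ibd_states
        else:
            dist[(first, second)] = non_ibd_prob / n_non_ibd_states
    return dist
```

With L = 3 origins and a non-IBD probability of 0.6, each of the six ordered non-IBD pairs receives probability 0.1 and each of the three IBD pairs receives 0.4/3.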
The transition rate matrix of the CTMC can be constructed from the expected densities of all the junction types along the two X chromosomes of a female. The junction type denotes the four-gene IBD configuration on both sides of a junction, where () is on the left-hand (right-hand) side, haplotype () is on the first (second) chromosome, and the same integers denote IBD. Figure 1C illustrates the seven types of junctions: , , , , , , and for , where the two types and do not exist for . We do not define junction types for the eight two-locus configurations , , , , , , , and , because there are either zero or no less than two junctions between the two loci. Figure 1D shows the transition rate matrix of the CTMC in the four-way RIL by sibling mating. Figure 1E shows the relationships between the expected densities and the transition rates, and they are derived based on the interpretation that is the two-locus diplotype probability, in the limit that the genetic distance (in Morgan) between two loci goes to zero.
The map expansions and and the overall expected junction density are given by
(1) |
(2) |
(3) |
similar to those for autosomes (Zheng et al. 2014) except that for X chromosomes. We have and , since the junction densities do not depend on the direction of chromosomes. In contrast to the single-locus two-gene non-IBD probability , the ordering of the superscripts in generally does matter, that is, except for the junction type . In addition, we have (see Figure 1C). Thus, the CTMC of X-linked ancestral origins can be described by one non-IBD probability and the five expected junction densities , , , , and , under the exchangeability assumption of the L possible ancestral origins.
Single-locus non-IBD probabilities
The calculation of the expected junction densities necessitates the introduction of the probabilities for the two- and three-gene IBD configurations at a single locus. All the following derivations of the recurrence relations for these probabilities are based on the Mendelian inheritance of X-linked genes: a paternally derived gene must be a copy of the maternally derived gene in a male of the previous generation, and a maternally derived gene has equal probability of being a copy of either the maternally derived gene or the paternally derived gene in a female of the previous generation.
In a dioecious mapping population, the single-locus two-gene probabilities of IBD configuration depend on whether or not the two homologous genes are in a single individual. Thus, we denote by , , and the two-gene probability of IBD configuration , given that the two homologous genes are in two distinct individuals in generation t and have parental origins , , and , respectively (Figure 2A); it holds that .
The recurrence relations of the two-gene non-IBD probabilities are derived by tracing the parental origins of two homologous genes from generation t ≥ 1 into the previous generation, and they are given by
(4a) |
(4b) |
(4c) |
(4d) |
where equation (4d) holds immediately after one generation of random mating, although it may not hold in the founder population at t = 0. In equation (4a), the first term on the right-hand side refers to the scenario in which the two maternally derived genes in generation t come from a single female of the previous generation and, with probability 1/2, are copies of different genes of that female. In equation (4b), the two genes, one maternally and one paternally derived, cannot merge because they must come from one male and one female of the previous generation. In equation (4c), the two paternally derived genes in generation t may come from a single male of the previous generation; if so, they must merge because there is only one X chromosome in a male.
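The verbal description of the two-gene recurrences can be turned into a small iteration. The sketch below is reconstructed only from that description and from the X-linked inheritance rule stated earlier (a paternal gene copies the maternal gene of a male; a maternal gene copies either gene of a female with probability 1/2). It is not a transcription of equations (4a–4d); the names `TwoGeneNonIBD`, `step`, `iterate`, `s_mm`, and `s_pp` are illustrative assumptions.

```python
from typing import NamedTuple


class TwoGeneNonIBD(NamedTuple):
    mm: float      # two maternal genes in two distinct individuals
    mp: float      # one maternal, one paternal gene in distinct individuals
    pp: float      # two paternal genes in two distinct individuals
    within: float  # maternal and paternal gene of one female


def step(prev, s_mm, s_pp):
    """Advance the non-IBD probabilities by one generation.

    s_mm: probability that two maternal genes come from a single female.
    s_pp: probability that two paternal genes come from a single male.
    """
    # Same mother: non-IBD only if the two genes copy her different genes (1/2).
    # Different mothers: each gene is her maternal or paternal gene (1/2 each).
    mm = s_mm * 0.5 * prev.within + (1.0 - s_mm) * (
        0.25 * prev.mm + 0.5 * prev.mp + 0.25 * prev.pp
    )
    # One gene comes from a male (his maternal gene), the other from a female.
    mp = 0.5 * prev.mm + 0.5 * prev.mp
    # Same father: the genes merge; different fathers: two maternal genes.
    pp = (1.0 - s_pp) * prev.mm
    # A female's two genes come from her mother and her father.
    within = mp
    return TwoGeneNonIBD(mm, mp, pp, within)


def iterate(initial, s_mm, s_pp, generations):
    history = [initial]
    for _ in range(generations):
        history.append(step(history[-1], s_mm, s_pp))
    return history
```

Setting both coalescence probabilities to 1 mimics sibling mating, in which case the within-individual non-IBD probability decays toward zero; setting them to 0 leaves fully non-IBD starting values unchanged.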
We introduce the single-locus three-gene probabilities of IBD configuration . Let , , , and be the probabilities of IBD configuration , given that the three homologous genes are in three distinct individuals in generation t and have parental origins , , , and , respectively (Figure 2B). Similarly, we define and for three homologous genes in two distinct individuals. The ordering of the superscripts does not matter for these three-gene probabilities, for example, .
The recurrence relations of the three-gene non-IBD probabilities are derived by tracing the parental origins of three homologous genes from generation t ≥ 1 into the previous generation, and they are given by
(5a) |
(5b) |
(5c) |
(5d) |
(5e) |
(5f) |
where the three-gene coalescence probability for maternally derived genes in generation t is the probability that a particular pair of genes come from a single female of the previous generation and the third comes from another female of the previous generation, and similarly for three paternally derived genes. The equations (5e, 5f) hold immediately after one generation of random mating, although they may not hold in the founder population at t = 0.
The derivations of the recurrence equations (5a–5d) for the three-gene non-IBD probabilities are similar to equations (4a–4c) for the two-gene non-IBD probabilities. In equation (5a), the pre-factor 3 denotes that each of the three possible pairs of genes may come from a single female of the previous generation; the term is the probability that the three maternally derived genes in generation t come from three distinct females of the previous generation, and it is obtained by the probability that one pair of genes come from two distinct females minus the probability that the third gene and either gene of the pair come from a single female of the previous generation. Similarly, the term in equation (5d) is the probability that the three paternally derived genes in generation t come from three distinct males of the previous generation.
Expected junction densities
We derive the recurrence relations for , , , , and . The recurrence relation for follows from the theory of junctions (Fisher 1954): a new junction is formed whenever a recombination event occurs between two X chromosomes that are non-IBD at the location of a crossover. The recurrence relations for the map expansions and are given by
(6a) |
(6b) |
where equation (6b) follows directly from no recombination events occurring between the XY chromosomes in a male of the previous generation.
To measure differential map expansions between maternally and paternally derived chromosomes, we define and , and their recurrence relations are given by
(7a) |
(7b) |
according to the recurrence equations (6a, 6b). If there are equal numbers of males and females in the population, a randomly chosen X chromosome is maternally derived with probability 2/3, and it is paternally derived with probability 1/3. Thus the average of the maternal and paternal map expansions, weighted by 2/3 and 1/3, can be interpreted as the map expansion on a randomly chosen X chromosome.
For comparison, we also consider the map expansion on a randomly chosen autosome; its recurrence relation is given by (MacLeod et al. 2005; Zheng et al. 2014)
(8) |
where the non-IBD probability is that between two homologous autosomal genes in an individual. The equations (7a, 8) show that the map expansion for an X chromosome is two-thirds of that for an autosome if the non-IBD probability for autosomes is the same as for XX chromosomes and the sex ratio is 1.
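The two-thirds relationship can be checked numerically with a short sketch. It assumes, based on the verbal statements in this section rather than on the printed equations, that a maternally derived X averages the mother's two chromosomes and gains new junctions at a rate equal to her non-IBD probability, that a paternally derived X is an unrecombined copy of the father's maternal X, and that an autosome gains junctions at a rate equal to the autosomal non-IBD probability. All function names are hypothetical.

```python
def x_step(r_m, r_p, non_ibd_in_mother):
    """One generation of map expansion on maternal and paternal X chromosomes."""
    # Maternal X: recombinant of the mother's two X chromosomes, plus new
    # junctions wherever a crossover falls in a non-IBD region.
    new_r_m = 0.5 * r_m + 0.5 * r_p + non_ibd_in_mother
    # Paternal X: the father's maternally derived X, passed on without crossover.
    new_r_p = r_m
    return new_r_m, new_r_p


def autosome_step(r_a, non_ibd):
    """One generation of map expansion on a randomly chosen autosome."""
    return r_a + non_ibd


def run(non_ibd_series):
    """Iterate from junction-free founders, using the same non-IBD
    probability for X chromosomes and autosomes in every generation."""
    r_m = r_p = r_a = 0.0
    for non_ibd in non_ibd_series:
        r_m, r_p = x_step(r_m, r_p, non_ibd)
        r_a = autosome_step(r_a, non_ibd)
    # Equal sex ratio: two-thirds of X chromosomes are maternally derived.
    r_x = (2.0 * r_m + r_p) / 3.0
    return r_m, r_p, r_x, r_a
```

Under these assumptions the weighted X-chromosome expansion equals two-thirds of the autosomal expansion in every generation, whatever sequence of non-IBD probabilities is supplied, as long as the sequence is shared between the two chromosome types.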
In addition to and , we define , , , and for haplotypes and that are in two distinct individuals and have parental origins , , , and , respectively (Figure 2C). The contributions to the junctions in the current generation come from either the existing junctions at the previous generation, or a new junction via a crossover event. In the following, we focus on the formation of a new junction, because the contributions of the existing junctions in the previous generation are similar to those for the two-gene non-IBD probabilities in the recurrence equations (4a–4c).
The schematics of the recurrence relations for junction types and are shown in Figure S1. The ancestry transitions of type occur on both haplotypes and at exactly the same location, and thus a new junction of type can be formed only by duplicating a chromosome segment. It holds that and because of the symmetry of type . We have
(9a) |
(9b) |
(9c) |
(9d) |
for t ≥ 1, where equation (9d) may not hold in the founder population at , the first term on the right-hand side of equation (9a) refers to the scenario that both haplotypes and come from a single female of the previous generation, and the first term on the right-hand side of equation (9c) refers to the scenario that both haplotypes are the duplicated copies of the maternally derived X chromosome in a male of the previous generation (Figure S1A). According to equations (6a, 6b) and equations (9a–9d), the overall expected density in equation (3) does not depend on the three-gene non-IBD probabilities.
The ancestry transition of type occurs on haplotype . A new junction of type is formed whenever the two parental chromosomes of haplotype and the parental chromosome of haplotype are distinct and have the IBD configuration at the location of the crossover. We have
(10a) |
(10b) |
(10c) |
(10d) |
(10e) |
(10f) |
for , where equations (10e, 10f) may not hold in the founder population at . A new junction is formed at the rate (), given that the parental chromosome of haplotype is maternally (paternally) derived. The density in equation (10c) has no contributions of a new junction because there are no crossover events occurring between the XY chromosomes in the father of haplotype (Figure S1B). We denote by , and , and their recurrence relations are given in Appendix A. Both and measure the asymmetry between maternally and paternally derived X chromosomes.
Model evaluation by simulations
To evaluate the theoretical predictions of non-IBD probabilities and expected junction densities, we perform simulation studies with the same model assumptions: random mating with discrete generations, no natural selection, and no genetic interference, except that the ancestral origins along chromosomes do not follow Markov assumptions. Instead, the genome ancestral origins are simulated forward in time by first generating a pedigree according to a given breeding design, and then dropping on the pedigree the distinct founder genome labels (ancestral origins) that are assigned to the whole X chromosomes of each completely inbred founder or to each X chromosome of each outbred founder. The X chromosomes of each descendant gamete are specified as a list of the labeled segments determined by chromosomal crossovers.
For a mapping population with a particular breeding design, the realized junction densities and IBD probabilities are saved for all individuals in each generation in each simulation replicate, and they are averaged over all replicates. Various mating schemes are used in simulating breeding pedigrees. We denote by RM1 the random mating where each sampling of two randomly chosen individuals with opposite genders produces one offspring, and RM2 the random mating where each sampling of two randomly chosen individuals with opposite genders produces two offspring. We combine these mating schemes with -NE if each parent contributes a Poisson distributed number of gametes to the next generation, and -E if each parent contributes exactly two gametes. Thus, we have four random mating schemes, RM1-NE, RM1-E, RM2-NE, and RM2-E. The sibling mating belongs to RM2-E with population size 2, and the exclusive pairing in 2^n-way (n ≥ 1) crosses can be regarded as a special case of random mating without inbreeding. The genders are assigned randomly, independent of mating schemes.
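The gamete-level step of such a forward simulation can be sketched as follows. This is an illustrative sketch, not the authors' simulation code: it assumes crossovers form a Poisson process with rate one per Morgan (no interference), represents a chromosome as a sorted list of `(start, origin)` segments, and uses hypothetical function names throughout.

```python
import random


def crossover_points(length, rng):
    """Crossover positions on [0, length) from a rate-1 Poisson process."""
    points, x = [], rng.expovariate(1.0)
    while x < length:
        points.append(x)
        x += rng.expovariate(1.0)
    return points


def slice_chrom(chrom, left, right, length):
    """Segments of chrom overlapping [left, right), clipped on the left."""
    out = []
    for k, (start, origin) in enumerate(chrom):
        end = chrom[k + 1][0] if k + 1 < len(chrom) else length
        if start < right and end > left:
            out.append((max(start, left), origin))
    return out


def female_gamete(mat, pat, length, rng):
    """Recombinant X chromosome transmitted by a female."""
    bounds = [0.0] + crossover_points(length, rng) + [length]
    strands = (mat, pat)
    cur = rng.randrange(2)  # strand copied at the left end
    gamete = []
    for left, right in zip(bounds, bounds[1:]):
        for start, origin in slice_chrom(strands[cur], left, right, length):
            # Merge neighbours with the same origin: no junction is formed.
            if not gamete or gamete[-1][1] != origin:
                gamete.append((start, origin))
        cur = 1 - cur  # switch strand at each crossover
    return gamete


def male_gamete_to_daughter(mat):
    """A male passes his maternally derived X to daughters unchanged."""
    return list(mat)


def junction_count(chrom):
    """Number of points where two distinct ancestral origins meet."""
    return len(chrom) - 1
```

For a female whose two X chromosomes carry distinct single origins, the mean junction count per gamete on a chromosome of one Morgan is close to one, while a female with two IBD chromosomes transmits a junction-free gamete.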
Application to QTL mapping populations
Multistage populations
For mapping populations with stage-wise constant mating schemes, we derive analytic expressions of the non-IBD probabilities and the expected junction densities for constructing the CTMC of X-linked ancestral origins, according to the recurrence relations. The closed form solutions are obtained by linking the results of each subsequent stage via the initial conditions. The general results for a population with constant random mating are derived in Appendix A, where three scenarios are considered: finite population of size ≥6, sibling-mating population of size 2, and large population of size »6. Table S2 gives the coalescence probabilities of X chromosomes for various mating schemes, similar to Table 1 of Zheng et al. (2014) for autosomes. Table S3 summarizes the results for X chromosomes in a sibling-mating population, and Table S4 for autosomes; they are necessary for dioecious breeding populations with a stage of inbreeding by sibling mating such as the CC and the DSPR. We use the superscript A to denote the quantities for autosomes.
Table 1. Results for X chromosomes in the -way RIL by sibling mating in the last generation , where for and for .
Quantity | Theoretical Prediction |
---|---|
(A) 2 ways sibling | |
0 | |
(B) ways sibling | |
We derive the analytic expressions of the non-IBD probability and the expected junction densities in the mapping populations of the RIL, the AIL, and the DO, and they are given in Table 1, Table 2, and Table 3, respectively. These results are necessary for constructing the CTMC of ancestral origins along the XX chromosomes of a female; only the expression for the map expansion is needed for the maternally derived X chromosome of a male. For comparison, the corresponding autosomal results are included. The results for the AIL, the DO, and the DSPR are derived under the assumption of a large population size in the intercross stage. We evaluate this assumption in the DSPR, because the evaluation results hold similarly for the AIL and the DO. In addition, the map expansions are given explicitly under the assumption of an infinitely large intercross population size, which may be used as a simple measure of QTL mapping resolution.
Table 2. Results for the AIL in the last generation .
Quantity | Theoretical Prediction |
---|---|
(A) X chromosomes | |
(B) Autosomes | |
The eigenvalues and for X chromosomes, and for autosomes and . AIL, advanced inter-cross lines.
Table 3. Results for the DO in the last generation .
Quantity | Theoretical Prediction |
---|---|
(A) X chromosomes | |
(B) Autosomes | |
The eigenvalues and for X chromosomes, and for autosomes and . DO, diversity outcross.
Many breeding populations can be divided into three stages: mixing, intercross, and inbreeding, such as the RIL by sibling mating, the CC, and the DSPR. There is no inbreeding stage for the AIL, the HS, and the DO. We denote by U the number of intercross generations, V the number of inbreeding generations, and N the intercross population size. Let and denote the random mating schemes for mixing and intercross stages, respectively. We choose the mixing stage to consist of one generation of random mating, so that the non-IBD probabilities and the expected junction densities in the population do not depend on whether genes or haplotypes are in distinct individuals.
The general derivation procedure is as follows. First, we derive the initial conditions in the population for the intercross stage, according to the genetic compositions of the founder population . Second, we substitute the obtained initial conditions into the theorems of Appendix A3 under the assumption of a large intercross population size. Alternatively, the theorems of Appendix A1 may be used for a finite intercross population. Lastly, if there is a stage of inbreeding by sibling mating, we substitute analytic expressions in the population into the theorems of Appendix A2 to obtain the results in the last generation .
RIL
The -way RIL by sibling mating can be regarded as a three-stage mapping population without the intercross stage for . All the founders are fully inbred, and the intercross mating scheme is exclusively pairing so that inbreeding is completely avoided. Thus , and the non-IBD probability during the intercross stage , where for and for . According to the recurrence equations (6a, 6b), it holds
(11a) |
(11b) |
and . Furthermore, it is straightforward to obtain , , , , and , where the indicator if and 0 otherwise, since the two maternally derived genes at must come from the inbred female founder for the two-way RIL.
Substituting the initial conditions in the population into Table S3, we obtain the results for the RIL in the last generation shown in Table 1. The non-IBD probabilities for X chromosomes are the same as those for autosomes (Table 2 of Zheng et al. 2014). Thus, we show analytically that the map expansion for the X chromosome is two-thirds that of the autosome for the -way (n ≥ 1) RIL, according to equations (7a, 8). Broman (2012a) has verified this two-thirds rule via Maxima for the -way RIL up to .
Figure 3 shows that these theoretical predictions fit very well with the forward simulation results for the two- and eight-way RIL by sibling mating. The differential densities decay very fast with generation t and show some oscillations in the first few generations. The overall expected junction density reaches its maximum in the same generation as for autosomes.
AIL
We consider a multiparental AIL population that is founded by inbred females and inbred males. A unique ancestral origin is assigned to each inbred founder’s genomes so that the two-gene non-IBD probabilities and , and similarly for the three-gene non-IBD probabilities , if they exist.
The population of size N is produced by mating scheme RM1-NE or RM2-NE. According to Table S2, the coalescence probabilities and for mating scheme RM1-NE, and they hold approximately for RM2-NE with large population size N » 6. Thus, the two-gene non-IBD probabilities at are given by and according to the recurrence equations (4a–4d), and the three-gene non-IBD probabilities at are given by and according to the recurrence equations (5a–5f). In addition, no junctions can be formed from inbred founders so that it holds that , , and .
The population is maintained for U generations with constant size N and sex ratio 1. Assuming that the intercross population size is large (N » 6), all the two- and three-gene coalescence probabilities at are approximately equal and are denoted by s, and they are determined by the intercross mating scheme according to Table S2. Substituting the initial conditions in the population into the theorems of Appendix A3, we obtain in Table 2 the results for X chromosomes in the AIL in the last generation . Table 2 also shows the results for autosomes, which are derived according to Zheng et al. (2014).
As shown in Table 2, the non-IBD probabilities for X-chromosomes are unequal to those for autosomes, and thus the map expansions generally do not satisfy the two-thirds rule. According to the map expansions and in Table 2, we derive their approximations under the limit of an infinitely large population size (N →∞) so that the coalescence probability goes to zero (s →0),
(12a) |
(12b) |
where the last two terms for in Table 2 are small and thus ignored. The equations (12a, 12b) show that the two-thirds rule is approximately valid for a large number L of founder lines. The map expansion of equation (12b) for is consistent with the previous results (Darvasi and Soller 1995; Liu et al. 1996; Winkler et al. 2003; Broman 2012b).
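The large-population rates stated in the abstract make it easy to see how close the AIL comes to the two-thirds rule for a given number of founders. The sketch below only encodes those stated per-generation rates with exact fractions; the function names are hypothetical and nothing here reproduces the finite-population expressions of Table 2.

```python
from fractions import Fraction


def rate_x_ail(num_founders):
    """X-chromosome map expansion rate per generation for the AIL."""
    return Fraction(2, 3) * (1 - Fraction(10, 9 * num_founders))


def rate_x_do_hs(num_founders):
    """X-chromosome map expansion rate per generation for the DO and the HS."""
    return Fraction(2, 3) * (1 - Fraction(1, num_founders))


def rate_autosome(num_founders):
    """Autosomal map expansion rate per generation (AIL, DO, and HS)."""
    return 1 - Fraction(1, num_founders)


def x_to_autosome_ratio_ail(num_founders):
    """Ratio of X to autosomal rates in the AIL; 2/3 would be the exact rule."""
    if num_founders < 2:
        raise ValueError("need at least two founders for a nonzero autosomal rate")
    return rate_x_ail(num_founders) / rate_autosome(num_founders)
```

For L = 2 the AIL ratio is 16/27 (about 0.593) and for L = 8 it is 124/189 (about 0.656), approaching 2/3 as L grows, whereas the DO and HS rates give exactly 2/3 for any L ≥ 2.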
The left panels of Figure 4 show for the AIL that the theoretical predictions fit very well with the forward simulation results, where RM1-NE, RM1-E, , and . Within intercross generations, the non-IBD probability decreases slowly with generation t, the differential map expansion remains almost constant after a few generations of oscillations, and the map expansions in equations (12a, 12b), shown as thick red lines in Figure 4, are very good approximations.
HS and DO
The HS and the DO differ from the AIL only in the genetic compositions of the founder population. The N progenitors of the DO at were sampled independently from pre-CC lines at a variety of different generations. Each pre-CC line is produced by the RIL by sibling mating from randomly permuted founder strains. Let denote the proportion of the pre-CC progenitors that were in generation k. Thus, for a random progenitor, it holds and , where and for can be obtained from Table 1. Because the founder strains are exchangeable, we obtain , , and , and because recombination crossovers are independent among different pre-CC lines, the between-individual expected junction densities at are given by , , and , where refers to the probability that the third ancestral origin on haplotype is different from the two ancestral origins on haplotype where the ancestry transition occurs. The within-individual expected junction densities at are not required in the following derivations.
The population of size N is produced by random mating with equal sex ratio. Assuming that the population size N » 6, the coalescence probabilities at are approximated to be zero. According to the recurrence equations for the two- and three-gene non-IBD probabilities, the between-individual probabilities did not change and the within-individual non-IBD probabilities at equal to the corresponding between-individual probabilities. In addition, we have , , , , , according to the recurrence equations for the expected junction densities.
Similar to the intercross stage of the AIL, we obtain in Table 3 the results for X chromosomes in the DO in the last generation by substituting the initial conditions in the population into the theorems of Appendix A3. Table 3 also shows the results for autosomes, which are derived according to Zheng et al. (2014). Under the limit of an infinitely large population size (N→∞), we obtain from Table 3
(13a) |
(13b) |
showing that the two-thirds rule is valid under such an approximation since and for progenitors drawn from the RIL (Table 1). The map expansion in equation (13b) for is the same as the one obtained by Broman (2012b).
The right panels of Figure 4 show for the HS that the theoretical predictions fit very well with the forward simulation results, where RM1-E, the 100 individuals in the population were sampled independently from CC funnels at the same generation . The results are similar to those for the AIL with the same L shown in the left panels of Figure 4. For X chromosomes, the non-IBD probabilities in the DO are larger than those in the AIL, and thus in the DO the map expands at a higher rate than that for the AIL, see equations (12a, 13a).
DSPR
The DSPR RILs were derived from two synthetic populations, each created independently by adding the multiparental AIL with an inbreeding stage by sibling mating (King et al. 2012). For example, we derive the analytic expressions of the map expansions in one synthetic population with L founder strains. We assume that , which holds in a non-inbreeding population and approximately in a large population (e.g., ) with a large number of intercross generations (e.g., ). According to the map expansions in Table S3, we have
(14a) |
(14b) |
where and , and and are given in Table 1, Table 2, or Table 3 if the population is the last generation of the RIL, the AIL, or the DO, respectively.
We evaluate the large size assumption for various random mating schemes by simulation studies of the DSPR. Figure 5 shows the fit of the theoretical predictions to the forward simulation results for the intercross size N = 20, 50, and 100, where the mixing scheme is RM1-NE and the intercross scheme is RM1-E (RM1-NE) for the left (right) panels. The theoretical predictions are obtained by combining the results for the AIL (Table 2) with those for the sibling-mating population (Table S3), assuming the large size (N » 6). The relatively worse fit for the differential densities is probably attributable to the limited number (2 × 10^4) of simulation replicates. The fit improves with increasing size N, and it is very good for N = 100 within the range of U = 20 intercross generations. The fit for RM1-E is better than for RM1-NE because in the former case the two-gene coalescence probabilities are always equal to the three-gene probabilities (Table S2), independent of the size N. Figure S2 shows similar results for the random mating scheme RM2, except that the expected junction densities are slightly smaller. Figure S3 and Figure S4 show that autosomes are less sensitive to the large size assumption, and the fits are very good even for N = 20.
Discussion
We have extended our previous framework of modeling ancestral origin processes from autosomes to X chromosomes, and thus the same assumptions, such as exchangeability of ancestral origins, Markov properties, and random mating, also apply (Zheng et al. 2014). Deviations from the Markov properties result in larger variances of the IBD-tract length and the junction densities, which have been shown to be acceptable (Chapman and Thompson 2003; Martin and Hospital 2011). The random mating assumption implies that our approach does not apply to breeding populations under marker-assisted selection.
In contrast to the previous approaches (Haldane and Waddington 1931; Broman 2012a), the exchangeability assumption of ancestral origins greatly reduces model complexity, because the number of possible junction types does not depend on the number of founders for L ≥ 3 whereas the number of diplotype states increases very fast with L. The assumption affects the rate matrix of the Markov model, but not the expected junction densities where only changes of ancestral origins matter. The exchangeability is a good approximation for the AIL- or the multiparent advanced generation inter-cross (i.e., MAGIC)-type populations with random mating, but it does not hold for the multiway RIL by sibling mating.
However, the exchangeability assumption is not critical for the application of our results to haplotype reconstructions from genotype data. The genomes of the individuals collected in the last generation have been well mixed by random chromosomal segregations over many generations. This is demonstrated in Figure 1A for the four-way RIL by sibling mating, where a female A and a male B were crossed, a female C and a male D were crossed, and then a daughter from A × B and a son from C × D were crossed. The X chromosome of the founder D is lost in . The genotype probabilities for AB and AC are different and are given in Table 2 of Broman (2012a), although the sum of the genotype probabilities for AB, AC, and BC is equal to in Table 1. Figure 1B shows that the genotype probability for AB or AC becomes close to the average probability as generation t increases. Furthermore, in the early generations, when the asymmetry among ancestral origins is large, there are fewer recombination breakpoints, and thus more marker data per genome block are available to estimate ancestral origins. As a result, a priori equal weights of ancestral origins have little effect.
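The loss of founder D's X chromosome in the four-way cross described above can be traced with a minimal sketch. The function `cross` and the set-based representation of founder origins are illustrative assumptions, not part of the model in this paper; recombination is ignored, and each set only records which founder origins can appear on an individual's X chromosome(s).

```python
def cross(mother, father):
    """Founder X origins that can appear in a daughter and in a son.

    Each parent is given as the set of founder labels present on its
    X chromosome(s). A daughter receives one X from each parent; a son
    receives his single X from the mother only (the father passes a Y).
    """
    daughter = set(mother) | set(father)
    son = set(mother)
    return daughter, son


# Inbred founders: females A and C, males B and D.
A, B, C, D = {"A"}, {"B"}, {"C"}, {"D"}

daughter_AB, _ = cross(A, B)   # daughter of A x B carries X from A and B
_, son_CD = cross(C, D)        # son of C x D carries X from C only

# Cross the daughter of A x B with the son of C x D.
next_daughter, next_son = cross(daughter_AB, son_CD)

print(sorted(next_daughter))              # ['A', 'B', 'C']
print("D" in (next_daughter | next_son))  # False
```

Only the origins A, B, and C remain available on the X chromosomes of the offspring, which is why the genotype states involving D never arise for X chromosomes in this cross design.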
An HMM is under development for reconstructing ancestral origins for both autosomes and X chromosomes from marker data, using the present model and the previous one (Zheng et al. 2014) as the prior distribution. The previously implemented HMM methods, such as GAIN (Liu et al. 2010) and HAPPY (Mott et al. 2000), were developed for autosomes, and they do not account for the asymmetry between maternally and paternally derived X chromosomes.
The closed form expressions for non-IBD probabilities and various expected junction densities have been derived for stage-wise mapping populations. They provide not only the complete information for constructing the CTMC along two X chromosomes but also guidance for designing a new population in terms of X-linked QTL mapping resolution. For advanced intercross populations such as the AIL, the HS, and the DO under the assumption of a large intercross size, the map expands linearly at a rate that is linear in the inverse of the number L of inbred founders, a result that is robust to intercross mating schemes. For the RIL and the inbreeding stage of the DSPR, the map expansion slows down with increasing level of inbreeding. The overall junction density for the DSPR decreases after one generation of the inbreeding stage by sibling mating, whereas for the RIL it reaches the maximum in the middle of inbreeding by sibling mating. These conclusions can also be applied to autosomes. Thus the most effective way of improving mapping resolution is to increase the number U of intercross generations in a large population (N ≥ 5U, empirically).
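The per-generation map expansion rates stated in the abstract for the large-population limit can be tabulated with a short sketch. The function name `expansion_rate` and its arguments are hypothetical; only the three rate formulas are taken from the text.

```python
from fractions import Fraction


def expansion_rate(L, population, chromosome="X"):
    """Per-generation map expansion rate in the large-population limit.

    L          : number of inbred founders
    population : "AIL", "DO", or "HS"
    chromosome : "X" or "autosome"
    """
    if population not in ("AIL", "DO", "HS"):
        raise ValueError("population must be 'AIL', 'DO', or 'HS'")
    L = Fraction(L)
    if chromosome == "autosome":
        return 1 - 1 / L                                  # 1 - 1/L
    if chromosome != "X":
        raise ValueError("chromosome must be 'X' or 'autosome'")
    if population == "AIL":
        return Fraction(2, 3) * (1 - Fraction(10, 9) / L)  # (2/3)(1 - 10/(9L))
    return Fraction(2, 3) * (1 - 1 / L)                    # (2/3)(1 - 1/L)


for pop in ("AIL", "DO", "HS"):
    print(pop, expansion_rate(8, pop, "X"), expansion_rate(8, pop, "autosome"))
```

For L = 8 the X-chromosome rate is 31/54 for the AIL and 7/12 for the DO and the HS, compared with 7/8 for autosomes, so the DO rate is slightly higher than the AIL rate on X chromosomes.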
Acknowledgments
I thank George O. Agogo, Rianne Jacobs, Martin P. Boer, Fred A. van Eeuwijk, and the two anonymous reviewers for their helpful comments. This research was supported by the Stichting Technische Wetenschappen (STW) - Technology Foundation, which is part of the Nederlandse Organisatie voor Wetenschappelijk Onderzoek - Netherlands Organization for Scientific Research, and which is partly funded by the Ministry of Economic Affairs. The specific grant number was STW-Rijk Zwaan project 12425.
Appendix A
Results for constant random mating populations
We introduce some matrix-vector notations to facilitate the derivations. Denote by the element-by-element multiplication of the two matrices and , and by the element-by-element division of the two matrices. Denote by the element-wise power where the subscripts of the natural numbers , and by default . Let 1 be a vector with appropriate length and all the elements being 1. Let be the diagonal matrix with the diagonal elements being the vector . Denote by the matrix with row vectors , …, of equal length. Denote by superscript T the transpose of a vector or matrix.
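As an informal illustration of these operations, the following sketch implements them on plain Python lists. The helper names are hypothetical, and the element-wise power is shown only for a single scalar exponent, which is an assumption about the intended notation.

```python
def hadamard(A, B):
    """Element-by-element multiplication of two equally sized matrices."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def elementwise_div(A, B):
    """Element-by-element division of two equally sized matrices."""
    return [[a / b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def elementwise_pow(v, n):
    """Raise every element of the vector v to the power n."""
    return [x ** n for x in v]


def ones(n):
    """Vector of length n with all elements equal to 1."""
    return [1] * n


def diag(v):
    """Diagonal matrix whose diagonal elements are the vector v."""
    return [[v[i] if i == j else 0 for j in range(len(v))]
            for i in range(len(v))]


def transpose(A):
    """Transpose of a matrix given as a list of row vectors."""
    return [list(col) for col in zip(*A)]


A = [[1, 2], [3, 4]]
B = [[2, 2], [2, 2]]
print(hadamard(A, B))               # [[2, 4], [6, 8]]
print(elementwise_div(A, B))        # [[0.5, 1.0], [1.5, 2.0]]
print(elementwise_pow([2, 3], 2))   # [4, 9]
print(diag([5, 7]))                 # [[5, 0], [0, 7]]
print(transpose(A))                 # [[1, 3], [2, 4]]
```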
The closed form expressions for the two- and three-gene non-IBD probabilities and the expected junction densities are derived for populations with constant size and random mating schemes. The coalescence probabilities are thus constant, and set , , , and . We first consider a finite population, number of males and number of females , so that all the two- and three-gene non-IBD probabilities exist. Then consider an example of small population size, a sibling-mating population with one male and one female (), where the non-IBD probabilities , , , , and do not exist. Lastly, we consider a large population under the limit that the size .
A1 Finite population
Definition A1.1. The finite population refers to a population of constant number of males and number of females, maintained by random mating, and the initial population satisfies , , and .
Proposition A1.2. Denote by the two-gene non-IBD probability in a finite population. According to equations (4a–4c), it holds
(A1.1) |
where
(A1.2) |
Premise A1.3. The eigenvalues of in a finite population, denoted by , in the decreasing order of their absolute values, are distinct with multiplicities 1, and none of them is 0, 1, or .
Theorem A1.4. The two-gene non-IBD probability in a finite population is given by
(A1.3) |
where
(A1.4) |
(A1.5) |
Proof. It holds
(A1.6a) |
(A1.6b) |
(A1.6c) |
where the constant coefficients , , and are to be solved. Substituting and of equations (A1.6a, A1.6b) into the recurrence equation (4b), we obtain
(A1.7) |
Substituting and of equations (A1.6a, A1.6c) into the recurrence equation (4c), we obtain
(A1.8) |
Substituting and of equations (A1.7, A1.8) into equation (A1.6a–A1.6c), we obtain
(A1.9) |
where is given by in equation (A1.4), and the constant coefficient is determined by the initial condition and it is given by equation (A1.5).
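The structure of this argument, a linear recurrence whose solution is a sum of powers of distinct eigenvalues with coefficients fixed by the initial condition, can be checked numerically on a toy example. The 2 × 2 matrix `M` and the initial vector `x0` below are hypothetical and are not the transition matrix of equation (A1.2); the sketch only shows that iterating the recurrence and evaluating the eigenvalue form give the same values.

```python
import math


def step(M, x):
    """One step of the linear recurrence x_{t+1} = M x_t."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def eigenvalues_2x2(M):
    """Eigenvalues of a 2 x 2 matrix with real, distinct eigenvalues."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def closed_form(M, x0, t):
    """Evaluate x_t as a * lam1**t + b * lam2**t for each component.

    The coefficients a and b are solved from the values at t = 0 and t = 1,
    mirroring how constant coefficients are fixed by initial conditions.
    """
    lam1, lam2 = eigenvalues_2x2(M)
    x1 = step(M, x0)
    result = []
    for v0, v1 in zip(x0, x1):
        a = (v1 - lam2 * v0) / (lam1 - lam2)
        b = v0 - a
        result.append(a * lam1 ** t + b * lam2 ** t)
    return result


M = [[0.5, 0.25], [0.25, 0.5]]   # hypothetical matrix, eigenvalues 0.75 and 0.25
x0 = [1.0, 0.0]

x = x0
for _ in range(10):
    x = step(M, x)

print(all(math.isclose(u, v) for u, v in zip(x, closed_form(M, x0, 10))))  # True
```

The requirement in Premise A1.3 that the eigenvalues be distinct corresponds here to the division by `lam1 - lam2`, which would fail for a repeated eigenvalue.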
Proposition A1.5. Denote by the three-gene non-IBD probability in a finite population. According to equations (5a–5d), it holds
(A1.10) |
where
(A1.11) |
Premise A1.6. The eigenvalues of in a finite population, denoted by in the decreasing order of their absolute values, are distinct with multiplicities 1, and none of them is 0, 1, , or .
Theorem A1.7. The three-gene non-IBD probability in a finite population is given by
(A1.12) |
where
(A1.13) |
(A1.14) |
and for
(A1.15) |
Proof. It holds
(A1.16a) |
(A1.16b) |
(A1.16c) |
(A1.16d) |
where the constant coefficients , , , and are to be solved. Substituting and of equations (A1.16a, A1.16d) into the recurrence equation (5d), we obtain
(A1.17) |
Substituting , , and of equations (A1.16a–A1.16c) into the recurrence equation (5c), we obtain
(A1.18) |
Substituting , , and of equations (A1.16a–A1.16c) into the recurrence equation (5b), and substituting of equation (A1.18), we obtain
(A1.19) |
where is given by equation (A1.15). Substituting , , and of equations (A1.17–A1.19) into equations (A1.16a–A1.16d), we obtain
(A1.20) |
where is given by equation (A1.13), and the constant coefficient is determined by the initial condition and it is given by equation (A1.14).
Theorem A1.8. The map expansions in a finite population are given by
(A1.21) |
(A1.22) |
(A1.23) |
where is given by equation (A1.5), and
(A1.24) |
(A1.25) |
(A1.26) |
Proof. According to equation (7a), can be obtained from the accumulative summation of the non-IBD probability of equation (A1.6b), and we have
(A1.27) |
which is equivalent to equation (A1.21) with the stationary map expansion being given by equation (A1.24). The eigenvalues for the transition matrix of the linear recurrence equations (4a–4c) and equations (6a, 6b) are 1, , , and thus the map expansions and can be expressed in the forms of equations (A1.22, A1.23), where the constant coefficients are determined by calculating from equations (A1.22, A1.23) and comparing the result with equation (A1.21), and is determined by the initial condition . □
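The step from a non-IBD probability that decays geometrically to a map expansion with a finite stationary value can be illustrated with a generic geometric sum. This is a sketch under stated assumptions: the single term `c * lam**k`, the summation range, and the function names are hypothetical and do not reproduce equation (7a) or equation (A1.27).

```python
import math


def cumulative_sum(c, lam, t):
    """Accumulate the term c * lam**k over generations k = 0, ..., t - 1."""
    return sum(c * lam ** k for k in range(t))


def closed_sum(c, lam, t):
    """Closed form of the same sum, valid for lam != 1."""
    return c * (1 - lam ** t) / (1 - lam)


def stationary_value(c, lam):
    """Limit of the sum as t grows, valid for |lam| < 1."""
    return c / (1 - lam)


c, lam = 0.5, 0.9   # hypothetical coefficient and eigenvalue
print(math.isclose(cumulative_sum(c, lam, 50), closed_sum(c, lam, 50)))  # True
print(stationary_value(c, lam))
```

Each eigenvalue with absolute value below 1 contributes a term of this kind, so the accumulated quantity approaches a constant, which is consistent with the stationary map expansion in a finite population.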
Theorem A1.9. Denote by the expected density of junction type in a finite population, and it holds
(A1.28) |
where are given by equations (A1.24–A1.26), is given by equation (A1.4),
(A1.29) |
(A1.30) |
(A1.31) |
and for
(A1.32) |
Proof. The eigenvalues for the transition matrix of the linear recurrence equations (4a–4c), equations (6a, 6b), and equations (9a–9c) are 1, , and duplicated . It holds
(A1.33a) |
(A1.33b) |
(A1.33c) |
where the constant coefficients , , and are to be solved. Substituting and of equations (A1.33a, A1.33b) into the recurrence equation (9b), we obtain
(A1.34) |
Substituting and of equations (A1.33a, A1.33c) and of equation (A1.22) into the recurrence equation (9c), and substituting of equation (A1.34), we obtain
(A1.35) |
Substituting , and of equations (A1.33a–A1.33c) and and of equations (A1.22, A1.23) into the recurrence equation (9a), and substituting and of equations (A1.34, A1.35), we obtain in equation (A1.31) and
(A1.36a) |
(A1.36b) |
The constant coefficient is determined by the initial condition .
Definition A1.10. Denote by
(A1.37) |
(A1.38) |
and thus
(A1.39) |
(A1.40) |
Proposition A1.11. According to Definition A1.10, the recurrence equations (10a–10d) in a finite population are transformed into
(A1.41a) |
(A1.41b) |
(A1.41c) |
(A1.41d) |
Theorem A1.12. Denote by the expected density of junction type in a finite population, and it holds
(A1.42) |
where is given by equation (A1.4),
(A1.43) |
(A1.44) |
(A1.45) |
and for
(A1.46) |
where is the characteristic polynomial of the transition matrix of equation (A1.2).
Proof. The eigenvalues for the transition matrix of the linear recurrence equations (5a–5d) and equations (A1.41a–A1.41c) are , . It holds
(A1.47a) |
(A1.47b) |
(A1.47c) |
where the constant coefficients , , and are to be solved. Substituting and of equations (A1.47a, A1.47c) into the recurrence equation (A1.41c), we obtain
(A1.48) |
Substituting and of equations (A1.47a, A1.47b) and of equation (A1.16b) into the recurrence equation (A1.41b), we obtain
(A1.49) |
The theorem follows by substituting , and of equations (A1.47a–A1.47c) and and of equations (A1.16b, A1.16c) into the recurrence equation (A1.41a), and substituting and of equations (A1.48, A1.49), where the constant coefficient in equation (A1.44) is determined by the initial condition . □
Theorem A1.13. The expected density is given by
(A1.50) |
where
(A1.51) |
(A1.52) |
Proof. The eigenvalues of the transition matrix of the linear recurrence equations (5a–5d) and equation (A1.41d): , , . Thus equation (A1.50) holds, where is determined by putting equations (A1.50, A1.16b) into equation (A1.41d), and the constant coefficient is determined by the initial condition . □
Corollary A1.14. The overall expected junction density of a female in a finite population is given by
(A1.53) |
where are given by equations (A1.24–A1.26), and by equations (A1.30, A1.31).
Proof. The corollary follows by substituting and of equations (A1.22, A1.23) and of equation (A1.33b) into equation (3).
A2 Sibling-mating population
Definition A2.1. The sibling-mating population refers to a population of constant size 2, one male and one female, maintained by sibling mating, and the initial population satisfies . In such a population, the coalescence probability , and the coalescence probabilities , , are set to zero.
Definition A2.2. In a sibling-mating population, the non-IBD probabilities , , , , and are set to zero since they do not exist. Similarly the expected junction densities and are set to zero.
Proposition A2.3. Denote by the two-gene non-IBD probability in a sibling-mating population. According to equations (4a, 4b), it holds
(A2.1) |
and its eigenvalues are given by
(A2.2) |
Definition A2.4. In a sibling-mating population, the conjugate is obtained by replacing with from the terms involving .
Theorem A2.5. The two-gene non-IBD probability in a sibling-mating population is given by
(A2.3) |
(A2.4) |
Proof. Similar to Theorem A1.4, it holds
(A2.5) |
where
(A2.6) |
(A2.7) |
The theorem follows by substituting an into equation (A2.5). □
Theorem A2.6. The three-gene non-IBD probability in a sibling-mating population is given by
(A2.8) |
Proof. According to the recurrence equations (5b, 5e),
(A2.9) |
Thus it holds
(A2.10) |
where
(A2.11) |
(A2.12) |
Theorem A2.7. The map expansions in a sibling-mating population are given by
(A2.13) |
(A2.14) |
(A2.15) |
where and are given by
(A2.16) |
(A2.17) |
Proof. Similar to Theorem A1.8, the map expansions in a sibling-mating population are given by
(A2.18) |
(A2.19) |
(A2.20) |
where
(A2.21) |
(A2.22) |
(A2.23) |
The theorem follows by substituting of equation (A2.2) and of equation (A2.7) into the aforementioned equations.
Theorem A2.8. The expected density in a sibling-mating population is given by
(A2.24) |
(A2.25) |
where and are given by equations (A2.16, A2.17), respectively.
Proof. Similar to Theorem A1.9, the expected density in a sibling-mating population is given by
(A2.26) |
where and are given by equations (A2.16, A2.17), respectively, and is given by equation (A2.6), and
(A2.27) |
(A2.28) |
(A2.29) |
and for
(A2.30) |
The theorem follows by substituting of equation (A2.2) and of equation (A2.23).
Theorem A2.9. The expected densities of type (1232) in a sibling-mating population are given by
(A2.31) |
(A2.32) |
(A2.33) |
Proof. Denote by . Similar to Theorem A1.12, the expected density in a sibling-mating population is given by
(A2.34) |
where is given by equation (A2.6), and
(A2.35) |
(A2.36) |
(A2.37) |
and for
(A2.38) |
where is the characteristic polynomial of the transition matrix of equation (A2.1). The equations (A2.31, A2.32) follow by substituting of equation (A2.2), of equation (A2.11), and of equation (A2.12).
Similar to Theorem A1.13, the expected density in a sibling-mating population is given by
(A2.39) |
where
(A2.40) |
(A2.41) |
Corollary A2.10. The overall expected junction density in a sibling-mating population is given by
(A2.42) |
where and are given by equations (A2.16, A2.17), respectively.
Proof. The corollary follows by substituting and of equations (A2.14, A2.15) and of equation (A2.25) into equation (3).
A3 Large population
Definition A3.1. The large population refers to a population with large and equal number of males and females, maintained by random mating, and the initial population satisfies , , and . In such a large population, the coalescence probabilities are set to .
Proposition A3.2. Denote by the two-gene non-IBD probability in a large population, and the eigenvalues of the transition matrix in equation (A1.2) are given by
(A3.1) |
where is approximated to the first order of s to keep it smaller than 1, and and are approximated to zero order of s.
Theorem A3.3. The two-gene non-IBD probabilities in a large population are given by
(A3.2) |
where
(A3.3) |
(A3.4) |
(A3.5) |
Proof. The theorem follows directly from Theorem A1.4, where
(A3.6) |
(A3.7) |
The matrix is approximated to the zero order of s. □
Proposition A3.4. Denote by the three-gene non-IBD probability in a large population, and the eigenvalues of the transition matrix of equation (A1.11) are given by
(A3.8) |
where is approximated to the first order of s to keep it smaller than 1, and is approximated to zero order of s.
Theorem A3.5. The three-gene non-IBD probability in a large population is given by
(A3.9) |
where
(A3.10) |
(A3.11) |
Proof. The theorem follows directly from Theorem A1.7, where
where the elements of are approximated to be the leading order of s and up to the zero order of s. □
Theorem A3.6. The map expansions in a large population are given by
(A3.12) |
(A3.13) |
(A3.14) |
where is given by equation (A3.4), and
(A3.15) |
(A3.16) |
(A3.17) |
Remark. As given t, , that is, the map expansions are dominated by the linear term proportional to t; as given s, the map expansions asymptotically approach the constant .
Proof. The proof is similar to Theorem A1.8, except that the eigenvalue or has multiplicity 2. □
Theorem A3.7. The expected density , is given by
(A3.18) |
where , , and are given by equation (A3.15), equation (A3.17), and equation (A3.3), respectively, and
(A3.19) |
Proof. The eigenvalues for the transition matrix of the linear recurrence equations (4a–4c), equations (6a, 6b), and equations (9a–9c) are 1, triplicated , and duplicated and , under the limit . It holds
(A3.20a) |
(A3.20b) |
(A3.20c) |
where is given by equation (A3.17), and the constant coefficients , , and are to be solved. Substituting and of equations (A3.20a, A3.20b) into the recurrence equation (9b), we obtain
(A3.21) |
Substituting and of equations (A3.20a, A3.20c) and of equation (A3.13) into the recurrence equation (9c), and substituting of equation (A3.21), we obtain
(A3.22) |
Substituting , and of equations (A3.20a–A3.20c) and of equation (A3.13) into the recurrence equation (9a), substituting of equation (A3.21), and substituting the exact of equation (A3.22), we obtain
(A3.23) |
where the constant coefficients , , and are approximated by keeping up to the zero order of s, while is approximated by keeping up to the first order of s. The theorem follows by substituting , , , and of equations (A3.21–A3.23) into equations (A3.20a–A3.20c), where the constant coefficient of equation (A3.19) is determined by the initial conditions . □
Theorem A3.8. Denote by the expected density of junction type in a large population, and it holds
(A3.24) |
where is given by equation (A3.3), is given by equation (A3.11), and
(A3.25) |
(A3.26) |
Remark. In equation (A3.24), as ; the term of equation (A3.26) is independent of s since in the matrices is canceled with s of (see equation (A3.11)).
Proof. In a large population, the seven eigenvalues for the transition matrix of the linear recurrence equations (5a–5d) and equations (A1.41a–A1.41c) are , , , , and . It holds
(A3.27a) |
(A3.27b) |
(A3.27c) |
where the constant coefficients , , and are to be solved. Substituting and of equations (A3.27a, A3.27b) and of equation (A1.16b) into the recurrence equation (A1.41b), we obtain
(A3.28) |
Substituting and of equations (A3.27a, A3.27c) into the recurrence equation (A1.41c), we obtain
(A3.29) |
Substituting , and of equations (A3.27a–A3.27c) and and of equations (A1.16b–A1.16c) into the recurrence equation (A1.41a), and substituting and of equations (A3.28, A3.29) we obtain
(A3.30) |
According to equation (A3.10), we have
(A3.31) |
and thus
(A3.32) |
where is given by equation (A3.11). The constant coefficient is determined by the initial condition , and it is given by equation (A3.25). The theorem follows by writing equations (A3.27a–A3.27c) in the matrix form while keeping only the leading order of s.
Theorem A3.9. The expected density in a large population is given by
(A3.33) |
where , , and are given by equation (A3.11), and
(A3.34) |
Proof. In a large population, the eigenvalues of the transition matrix of the linear recurrence equations (5a–5d) and equation (A1.41d) are , duplicated , and . It holds
(A3.35) |
where is to be solved. Substituting of equation (A3.35) and of equation (A3.9) into the recurrence equation (A1.41d), we obtain
(A3.36) |
and is determined by the initial condition and it is given in equation (A3.34). The theorem follows by substituting equation (A3.36) into equation (A3.35), and approximating to zero since the leading term of is the first order of s, see equation (A3.11).
Corollary A3.10. The overall expected junction density in a large population is given by
(A3.37) |
where , , and are given by equation (A3.4), equations (A3.15–A3.17), and equation (A3.19), respectively.
Proof. The corollary follows by substituting and of equations (A3.13, A3.14) and of equation (A3.20b) into equation (3).
Footnotes
Supporting information is available online at http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.114.016154/-/DC1
Communicating editor: E. Huang
Literature Cited
- Broman K., 2005. The genomes of recombinant inbred lines. Genetics 169: 1133–1146.
- Broman K. W., 2012a. Genotype probabilities at intermediate generations in the construction of recombinant inbred lines. Genetics 190: 403–412.
- Broman K. W., 2012b. Haplotype probabilities in advanced intercross populations. G3 (Bethesda) 2: 199–202.
- Chapman N., Thompson E., 2003. A model for the length of tracts of identity by descent in finite random mating populations. Theor. Popul. Biol. 64: 141–150.
- Churchill G., Airey D., Allayee H., Angel J., Attie A., et al., 2004. The collaborative cross, a community resource for the genetic analysis of complex traits. Nat. Genet. 36: 1133–1137.
- Darvasi A., Soller M., 1995. Advanced intercross lines, an experimental population for fine genetic-mapping. Genetics 141: 1199–1207.
- Fisher R., 1949. The Theory of Inbreeding. Oliver and Boyd, London.
- Fisher R., 1954. A fuller theory of junctions in inbreeding. Heredity 8: 187–197.
- Haldane J., Waddington C., 1931. Inbreeding and linkage. Genetics 16: 357–374.
- Johannes F., Colome-Tatche M., 2011. Quantitative epigenetics through epigenomic perturbation of isogenic lines. Genetics 188: 215–227.
- King E. G., Macdonald S. J., Long A. D., 2012. Properties and power of the Drosophila Synthetic Population Resource for the routine dissection of complex traits. Genetics 191: 935–949.
- Liu E. Y., Zhang Q., McMillan L., Pardo-Manuel de Villena F., Wang W., 2010. Efficient genome ancestry inference in complex pedigrees with inbreeding. Bioinformatics 26: i199–i207.
- Liu S., Kowalski S., Lan T., Feldmann I., Paterson A., 1996. Genome-wide high-resolution mapping by recurrent intermating using Arabidopsis thaliana as a model. Genetics 142: 247–258.
- MacLeod A. K., Haley C. S., Woolliams J. A., 2005. Marker densities and the mapping of ancestral junctions. Genet. Res. 85: 69–79.
- Martin O. C., Hospital F., 2011. Distribution of parental genome blocks in recombinant inbred lines. Genetics 189: 645–654.
- Ming R., Moore P. H., 2007. Genomics of sex chromosomes. Curr. Opin. Plant Biol. 10: 123–130.
- Mott R., Talbot C., Turri M., Collins A., Flint J., 2000. A method for fine mapping quantitative trait loci in outbred animal stocks. Proc. Natl. Acad. Sci. USA 97: 12649–12654.
- Svenson K. L., Gatti D. M., Valdar W., Welsh C. E., Cheng R. Y., et al., 2012. High-resolution genetic mapping using the mouse diversity outbred population. Genetics 190: 437–447.
- Weller J., Soller M., 2004. An analytical formula to estimate confidence interval of QTL location with a saturated genetic map as a function of experimental design. Theor. Appl. Genet. 109: 1224–1229.
- Winkler C., Jensen N., Cooper M., Podlich D., Smith O., 2003. On the determination of recombination rates in intermated recombinant inbred populations. Genetics 164: 741–745.
- Zheng C. Z., Boer M. P., van Eeuwijk F. A., 2014. A general modeling framework for genome ancestral origins in multiparental populations. Genetics 198: 87–101.