Simultaneous Estimation of Mixing Rates and Genetic Drift Under Successive Sampling of Genetic Markers With Application to the Mud Crab (Scylla paramamosain) in Japan

Toshihide Kitakado; Shuichi Kitada; Yasuhiro Obata; Hirohisa Kishino

doi:10.1534/genetics.106.056424

. 2006 Aug;173(4):2063–2072. doi: 10.1534/genetics.106.056424

Simultaneous Estimation of Mixing Rates and Genetic Drift Under Successive Sampling of Genetic Markers With Application to the Mud Crab (Scylla paramamosain) in Japan

Toshihide Kitakado ^*,¹, Shuichi Kitada ^*, Yasuhiro Obata ^†, Hirohisa Kishino ^‡

PMCID: PMC1569702 PMID: 16928801

Abstract

In stock enhancement programs, it is important to assess mixing rates of released individuals in stocks. For this purpose, genetic stock identification has been applied. The allele frequencies in a composite population are expressed as a mixture of the allele frequencies in the natural and released populations. The estimation of mixing rates is possible, under successive sampling from the composite population, on the basis of temporal changes in allele frequencies. The allele frequencies in the natural population may be estimated from those of the composite population in the preceding year. However, it should be noted that these frequencies can vary between generations due to genetic drift. In this article, we develop a new method for simultaneous estimation of mixing rates and genetic drift in a stock enhancement program. Numerical simulation shows that our procedure estimates the mixing rate with little bias. Although the genetic drift is underestimated when the amount of information is small, reduction of the bias is possible by analyzing multiple unlinked loci. The method was applied to real data on mud crab stocking, and the result showed a yearly variation in the mixing rate.

IT is well known that many fisheries stocks have been reduced because of overfishing, habitat loss, and degradation. (Pauly et al. 2002). Some management procedures, such as setting a total allowable catch (TAC), are the usual practices for sustainable use of such fisheries stocks (e.g., Hilborn and Walters 1992). In addition to such management procedures, stock enhancement programs through release of hatchery-reared juveniles are used to supplement reduced natural stocks (Blaxter 2000; Svåsand et al. 2000; Leber et al. 2004; Kitada and Kishino 2006).

In stock enhancement programs, the assessment of the contribution of the released population to the commercial catch is one of the important tasks (Kitada and Kishino 2006). Also, it is necessary to monitor the genetic impact of release on a natural population. Evaluation of the mixing conditions of a released population into a natural population must then be carried out. However, if juveniles are too small to tag and/or the tag-shedding rate is significant, it is difficult to determine their mixing compositions through experiments with artificial tags. For such cases, there is the possibility of stock identification using genetic tags such as mtDNA haplotypes and microsatellite alleles (e.g., Leber et al. 2004).

We now consider a fisheries stock of a random-mating population in a closed fishing ground. Here, we call this population a natural population. Suppose that in each year hatchery-reared juveniles are released before the fishing season. Then, a population after the release of the juveniles composes a composite population, which consists of two source populations, that is, the natural and supplemented populations. The composite population could in turn be naturally reproducing in the next year. We call it a natural population again.

The haplotype or allele frequencies in a population in the fishing ground change temporally for two reasons: the release of juveniles and the genetic drift. The amount of genetic drift depends on the effective population size (N_e) (Nei and Tajima 1981; Waples 1989). The methods for estimating N_e on the basis of temporal changes in haplotype or allele frequencies have been developed. Waples (1989) introduced a moment-based estimation method using the standardized variance in allele frequency change. More efficient estimation by the maximum-likelihood method was introduced by Williamson and Slatkin (1999) and Anderson et al. (2000). Furthermore, Anderson (2005) presented a coalescent-based likelihood method for temporally sampled individuals.

Such temporal change in sampled haplotype or allele frequencies can also be used for the estimation of mixing rates. Typically, the estimation of mixing rates in a mixed population needs samples from different baseline populations as well as the composite population. For such situations, the maximum-likelihood (ML) and Bayesian methods were developed (e.g., Millar 1987; Pella and Masuda 2001). Another situation occurs in stock enhancement programs. In this case, one baseline population is a natural population before release, and the other is a released population. A mixed population is a composite population after release. Usually, sampling from the composite population is conducted in the fishing season, and the composite population in the previous year acts as a baseline. In this sense, a composite population in one year plays two roles: as a mixed population for the sampling year and as a baseline natural population for the following year. This kind of method was applied to data for the mud crab, Scylla paramamosain, in Japan. Imai et al. (2002) used the mitochondrial DNA (mtDNA) D-loop region as a genetic tag and gave a point estimate for the mixing rate of juveniles released for a year. Obata et al. (2006) obtained point estimates of mixing rates and their confidence intervals using the maximum-likelihood method.

As mentioned above, haplotype or allele frequencies can vary due to genetic drift. Hatchery releases could be effective for depleted populations with limited recruitment. Supportive breeding (Ryman and Laikre 1991) programs also target small populations of endangered species. In such cases, the effective population size could be small and the amount of genetic drift cannot be ignored. If the effect of genetic drift is large, actual haplotype or allele frequencies in the natural population, regarded as a baseline population, are different from frequencies in the natural population at release. The natural population at release should actually act as a baseline. Obata et al. (2006) conducted a simulation for sensitivity tests to quantify the magnitude of changes in estimates and standard errors of mixing rates for various values of genetic drift assumed to be known. However, the effect of the genetic drift on the estimate of the mixing rate has not been fully evaluated and the reliability of the classical method for such situations remains unknown.

In this article, we develop a new method to simultaneously estimate mixing rates and the magnitude of the genetic drift, on the basis of successive allelic frequencies. Our statistical model includes further parameters such as frequencies in the initial population and latent vectors that represent random haplotype or allele frequencies due to genetic drift. While the complete likelihood function for such latent variables and observed data can be explicitly expressed, the likelihood function for observed data does not have a closed form. To obtain the maximum-likelihood estimate of the genetic drift, we use a Monte Carlo approximation of likelihood with a Markov chain Monte Carlo (MCMC) method (e.g., Gelman et al. 2004; Robert and Casella 2004). Performance of the proposed method was evaluated via numerical simulations and real data of mud crab stocking in Japan were analyzed.

MODELS AND METHODS

We mostly illustrate our statistical model and estimation method for the case where alleles at a single locus (or haplotypes in mtDNA) are observed.

Statistical model:

Let Inline graphic be a vector of the true allele frequencies for a population in a fishing ground in year , where J is the number of alleles and . Let denote the observed allele frequencies of samples randomly collected from this natural population in year y. Then, given p_1y, the observed frequencies n_1y are assumed to have a multinomial distribution,

where Inline graphic is twice the sample size for the diploid (the sample size itself for the haploid marker).

Suppose that juveniles are reared each year using some sampled individuals with minor alleles and then released into the natural population in the following year. In this case, the alleles used in the released population act as genetic tags in the natural population. Let Inline graphic be a vector of allele frequencies for released juveniles in year (). Let denote sampled counts of allele frequencies from the released population in year y with a multinomial distribution,

where Inline graphic .

The allele frequencies in the population in the fishing ground change due to two factors: genetic drift and the release of juveniles into the population. To represent the variation due to the genetic drift, we assume a Dirichlet distribution with a probability density function

(1)

where Inline graphic is a vector of allele frequencies before release of juveniles in year y, and θ is a parameter that controls the amount of genetic drift; the smaller θ is, the larger the amount of genetic drift, and vice versa. In fact, under the assumption of nonoverlapping generations, the drift variance of allele frequencies between generations is given by p(1 − p)/2N_e, where p is an allele frequency in the previous generation and N_e is the effective population size (e.g., Nei and Tajima 1981; Waples 1989). Note that the variance for mtDNA haplotype frequencies is p(1 − p)/N_f, where N_f is the effective female population size. In this model, the variance of Inline graphic based on the Dirichlet distribution above is given by

Hence, the parameter θ is closely related to the effective population size as 1 + θ = 2N_e or 1 + θ = N_f. In this article, we assume a constant θ over all years.

Now, the vector of frequencies Inline graphic cannot be observed because the sample is taken after the release of juveniles, and therefore it is regarded as a latent random vector. After the genetic drift among years, the vector of allele frequencies further changes because of mixing with the released juveniles, as

where ω_y is the mixing rate of the released population into the natural population in year y. As a summary of our statistical model, a schematic diagram is shown in Figure 1.

Figure 1.— — A schematic of the statistical model with genetic drift.

Likelihood:

In this model, unknown parameters are mixing rates, Inline graphic , the degree of the genetic drift, θ, the vector of the true allele frequencies in a natural population in the initial year, p₁₁, and those of the released population, . For notational convenience, we summarize random variables as , , , and . If the latent vectors can be observed, the likelihood function for the unknown parameters, ω, θ, p₁₁, p₂, based on the observations n₁, n₂ and the latent variables Inline graphic , is explicitly expressed as follows:

Now, Inline graphic are unobservable latent variables, and hence we must consider a likelihood function based on the marginal distribution of n₁ and n₂ as follows:

Unfortunately, an explicit formula for the marginal likelihood function is not available. To find the maximum-likelihood estimates, we use a Monte Carlo approximation of likelihood as explained below.

Estimation algorithm:

For simplicity of illustration, we assume that the allele frequencies p₂ are known. In this case, the observation n₂ is not used in the estimation. The method with unknown p₂ is straightforward.

Let φ denote the set of parameters (ω, θ, p₁₁). To approximate L_M(φ | n₁) via a Monte Carlo simulation, we use the formula

where φ* is a reference vector, and Inline graphic is the conditional distribution of the latent variables at φ* given the observed data n₁. This integration can be approximated by Monte Carlo as

where Inline graphic are the realization of drawn from the conditional distribution .

Unfortunately, a closed form of the conditional distribution Inline graphic does not exist because it includes the marginal distribution L_M(φ* | n₁). To generate the random numbers from , we use a Metropolis-within-Gibbs sampling method (e.g., Robert and Casella 2004). Details of this algorithm are described in appendix a.

In general, the precision of the approximation depends on the choice of φ* as well as the size of m. One option for the determination of ω* and p*₁₁ in φ* is to use the maximum-likelihood estimates by ignoring the genetic drift. In this case, no latent variables are included in the likelihood. Another option is to use a Monte Carlo version of the EM algorithm (e.g., McLachlan and Krishnan 1997). A detailed description of this algorithm is given in appendix b. It is known that convergence of the EM algorithm tends to be slow, but values of parameters that are innovated up to several steps can be used for φ*, including θ*.

The case of multiple unlinked loci:

Our model can be extended for the case of multiple unlinked loci by considering the composite genotype frequencies. Different loci receive independent genetic drifts, and therefore the distribution of the latent allele frequencies, which is expressed in Equation 1, can be defined locus-by-locus. On the other hand, although the composite genotype frequencies are expressed as products of the genotype frequencies at the loci, the probability distribution of observed composite genotype is not given by the product of multinomial distributions for the loci. Here, we denote by x_1yil a composite genotype at the lth locus for the ith individual sampled from the composite population in year y (2 ≤ y ≤ Y) and by Inline graphic and p_2yl vectors of allele frequencies at the lth locus in the natural population before release and in the released population in year y, respectively. Then the distribution of x_1yi given and p_2yl is expressed as

where L is the number of loci, and the f's are probability distributions for the two source populations. The estimation algorithm with a MCMC becomes complicated for this case because the acceptance ratios in MCMC samplings depend on information from all the loci. In this article, we use the product of multinomial distributions from each locus as an approximated model.

SIMULATION STUDIES

Simulation scenarios:

To evaluate the estimation performance in our statistical model and the difference of estimates between considering and ignoring the genetic drift, we used simulation studies. As scenario I, the length of sampling years was fixed at 7, and the number of loci was assumed to be 1 or 5. Furthermore, as scenario II, we considered an extreme case where samples were taken for only 2 years. In this scenario, 5 or 10 loci were analyzed to cover the limited information from only 2 years. The allele frequencies in the natural population in the initial year and those in released populations were assumed to be common over loci, and the frequencies in the released populations were known. Note that, for the case of multiple loci, composite genotypes were generated from the true distribution, and then parameters were estimated using the approximated likelihood. To make the computational time shorter, we assumed a case of only three alleles. The mixing rates, of 0.1, were assumed to be the same over the years. Three levels of genetic drift, θ = 10, 100, and 1000, were used. The same sample size, N_y = 500, was assumed each year. Scenarios we used in this simulation are summarized in Table 1. The frequencies in the released populations are assumed to be known. Throughout this simulation study, we set the true values of parameters for φ*. Considering the computational burden, the sample size of the MCMC was fixed at 5000. Among 5000 samples, the first 1000 samples were discarded as constituting the burn-in period, and every fifth simulation draw was kept. Therefore, the effective sample size was 800. The number of simulation replicas was 100 for each scenario. All the computation was carried out using our own program written by statistical software R (http://www.r-project.org/).

TABLE 1.

True allele frequencies used in simulation studies

	Natural population			Released population
Year	E[p_1y1]	E[p_1y2]	E[p_1y3]	p_2y1	p_2y2	p_2y3
1	0.200	0.300	0.500	0.0	0.0	0.0
2	0.260	0.280	0.460	0.8	0.1	0.1
3	0.244	0.332	0.424	0.1	0.8	0.1
4	0.230	0.309	0.462	0.1	0.1	0.8
5	0.287	0.288	0.425	0.8	0.1	0.1
6	0.268	0.339	0.393	0.1	0.8	0.1
7	0.251	0.315	0.434	0.1	0.1	0.8

Open in a new tab

The mean of allele frequency E[p_1yj] was calculated using the recursive formula. Inline graphic .

Simulation results:

The results are shown in Table 2. In scenario I (Y = 7), although the estimate of θ tended to be overestimated, especially for the case of L = 1 and larger values of θ, estimates of the mixing rates, ω's, by the model that considered the genetic drift were almost unbiased throughout the simulation. The bias of Inline graphic was reduced when multiple loci (L = 5) were analyzed. On the contrary, those where the genetic drift was not considered were subject to large bias when θ was small; that is, the degree of genetic drift was large. Note that the estimation performance for the mixing rates tended to be similar in both the models for θ = 1000, the case of small genetic drift.

TABLE 2.

Expectations of estimators for mixing rates and genetic drift

		Parameter estimates
θ	L
		Scenario I (Y = 7)
10	1	0.112 (0.059)	0.132 (0.076)	0.134 (0.156)	0.122 (0.062)	0.121 (0.107)	0.137 (0.115)	16.6 (14.1)
		0.156 (0.207)	0.227 (0.261)	0.242 (0.305)	0.192 (0.266)	0.194 (0.296)	0.219 (0.290)
	5	0.104 (0.024)	0.099 (0.024)	0.106 (0.030)	0.107 (0.022)	0.111 (0.023)	0.107 (0.027)	10.1 (2.2)
		0.014 (0.035)	0.011 (0.031)	0.016 (0.081)	0.036 (0.065)	0.045 (0.079)	0.051 (0.133)
100	1	0.106 (0.061)	0.113 (0.071)	0.108 (0.086)	0.103 (0.052)	0.103 (0.068)	0.112 (0.093)	524.4 (831.1)
		0.106 (0.082)	0.116 (0.094)	0.126 (0.122)	0.108 (0.088)	0.113 (0.104)	0.121 (0.116)
	5	0.103 (0.025)	0.101 (0.030)	0.103 (0.034)	0.096 (0.028)	0.096 (0.028)	0.102 (0.039)	129.0 (39.2)
		0.075 (0.034)	0.057 (0.042)	0.030 (0.045)	0.067 (0.042)	0.060 (0.048)	0.072 (0.058)
1000	1	0.099 (0.040)	0.100 (0.042)	0.114 (0.048)	0.105 (0.038)	0.103 (0.046)	0.097 (0.073)	10,047 (17,722)
		0.100 (0.042)	0.103 (0.055)	0.116 (0.072)	0.103 (0.043)	0.099 (0.051)	0.100 (0.082)
	5	0.096 (0.013)	0.105 (0.014)	0.101 (0.021)	0.100 (0.014)	0.101 (0.015)	0.099 (0.028)	1,633 (458)
		0.098 (0.018)	0.096 (0.029)	0.089 (0.042)	0.096 (0.021)	0.092 (0.026)	0.097 (0.042)
		Scenario II (Y = 2)
10	5	0.107 (0.028)	—	—	—	—	—	12.2 (7.1)
		0.066 (0.070)	—	—	—	—	—
	10	0.105 (0.019)	—	—	—	—	—	11.4 (4.1)
		0.066 (0.049)	—	—	—	—	—
100	5	0.100 (0.034)	—	—	—	—	—	142.0 (91.0)^a
		0.094 (0.037)	—	—	—	—	—
	10	0.096 (0.023)	—	—	—	—	—	134.5 (60.7)^a
		0.092 (0.028)	—	—	—	—	—
1000	5	0.100 (0.030)	—	—	—	—	—	^b
		0.100 (0.027)	—	—	—	—	—
	10	0.099 (0.020)	—	—	—	—	—	^b
		0.104 (0.024)	—	—	—	—	—

Open in a new tab

Values in the top and bottom rows in each case showed the results of the models with and without considering genetic drift, respectively. The values in parentheses are standard deviations. The true mixing rates were assumed to be 0.1. The sample size was fixed at 500 through all years. In scenario II, an upper bound for the estimate of θ was set at 5000.

The estimates of θ that attained the upper bound (36 and 27 times for L = 5 and 10, respectively) were excluded from the calculation of the mean and standard deviation.

All the estimates of θ attained the upper bound.

In scenario II (Y = 2), because of the small size of replication with respect to years, the estimation was more difficult to make than in scenario I. In fact, especially for the case of θ = 1000, the estimates of θ tend to be infinity. Hence, we set an upper bound, 5000, for the estimate of θ. However, the simulation results suggest that the estimation of θ with many loci performs well when genetic drift is large. The estimation on mixing rates while considering the genetic drift was also stable in this scenario. On the other hand, biases were observed in the estimates without considering the drift. A noteworthy fact is that the biases are smaller than those in scenario I, where samples were taken for ≥7 years than the 2 in scenario II. This phenomenon indicates that the large bias in scenario I was due to the impact of the genetic drift through the year. Therefore, the longer the sampling period is the more effective it is to take the genetic drift into account.

To reduce the computation time for the simulation, we assumed only three alleles in our simulation studies. The allele or haplotype frequencies in the released population have a large impact on the estimation of mixing rates rather than the number of alleles. The results for our model with considering genetic drift showed that the estimation performance of mixing rates in years 4 and 7 for scenario I tended to be worse than that in other years. This is because the most common allele in the released population is also most common in the natural population before release. On the other hand, smaller standard deviations were observed in years 2 and 5, when the common allele in the released population is a rare allele in the natural population. In this way, use of rare alleles or haplotypes for the released population improves the estimation performance.

CASE STUDY: ANALYSIS FOR MUD CRABS

Data:

In Japan, three mud crab species, S. paramamosain, S. serrata, and S. olivacea, are commercially important resources in local scale fisheries. Urado Bay in Kochi Prefecture is the main area for S. paramamosain fisheries. The life span of this crab is from 1.5 to 2.5 years (Obata et al. 2006). May and June are the spawning season, and in September hatched crabs grow to commercial size and recruit into the fishing ground. The fishery is operated mainly from August to December with the highest catch in September. The commercial catch of this species has fallen since the mid-1970s from ∼12 tons and reached a record low of 1.6 tons in 1989. In the early 1980s, a stock enhancement program for this species started in Urado Bay, and the number of released juveniles has increased.

To examine the efficacy of the release program for this species, an experiment using genetic tagging was conducted (Obata et al. 2006). Samples were collected in Urado Bay around December in 1996–2002. In each year from 1996 to 2000, individuals with different haplotypes were chosen from samples to produce supplemental juveniles, and these minor haplotypes would then be used as a genetic tag. During mid-May/early June in each of the next years, juveniles were released into Urado Bay. Supplemented individuals, if not removed by fishing from the population in the year they are introduced, can contribute to the reproducing population in the second year.

We used observed haplotype frequencies in the mtDNA D-loop region of the mud crabs (Table 3). Here, we grouped very minor haplotypes into one category called “others.” The haplotype frequencies for the released population are also shown in Table 3. Only one haplotype was released every year, except 1998. Using these data, we estimated mixing rates and the genetic drift simultaneously. Because the number of released juveniles was known, we can regard the parameters p_2y as fixed. We set the burn-in period at 5000 and kept every fifth simulation draw of 250,000 samples as an effective output for Monte Carlo sampling.

TABLE 3.

A summary of observed haplotype frequencies in the mtDNA D-loop region of S. paramamosain for the natural and the released populations

	Haplotype
Year	1	2	3	4	5	6	7	8	9	Others
	Natural population
1996	79	23	13	16	10	3	1	3	0	1
1997	90	65	16	13	11	2	5	2	0	9
1998	57	37	22	11	7	0	1	2	0	6
1999	159	124	39	15	11	3	5	5	5	37
2000	225	147	47	32	21	2	4	5	4	34
2001	71	32	8	7	5	0	0	1	11	23
2002	105	50	30	17	0	5	0	0	5	35
	Released population
1996	0	0	0	0	0	0	0	0	0	0
1997	0	98,300	0	0	0	0	0	0	0	0
1998	0	0	64,000	0	0	0	11,000	0	0	0
1999	0	0	0	81,000	0	0	0	0	0	0
2000	0	0	0	0	72,000	0	0	0	0	0
2001	0	0	0	0	0	0	0	0	149,000	0
2002	0	0	0	0	0	0	0	0	0	0

Open in a new tab

Results:

Table 4 shows the summary of parameter estimations with and without taking the genetic drift into account. The estimates of mixing rates, Inline graphic , were small except for 1997. Little differences between the estimates of the two methods were observed. This may be due to the small genetic drift. In fact, the estimated θ was ∼341.1. Although θ may be overestimated considering the simulation results, severe genetic drift was not observed.

TABLE 4.

Summary of parameter estimations

Mixing rates (ω_y)
1997	1998	1999	2000	2001	Genetic drift (θ)
With considering genetic drift
0.152 (0.023)	0.0254 (0.021)	0.0000 (0.000)	0.0151 (0.0071)	0.0389 (0.0086)	341.1 (69.4)
Without considering genetic drift
0.142 (0.031)	0.0085 (0.018)	0.0000 (0.000)	0.0000 (0.0000)	0.0335 (0.0099)

Open in a new tab

The values in parentheses are the standard errors assessed by the inverse of Fisher information.

Although not so much difference in the number of released juveniles was recorded except for in 2001, a large yearly variation in the estimates of mixing rates was observed. In the next section, we discuss the effects of adaptation and selection on the estimation of mixing rates.

DISCUSSION

In this article, we proposed a statistical model for the simultaneous estimation of mixing rates and genetic drift for stock enhancement programs. In our model, temporal changes in haplotype or allele frequencies are modeled, taking into account mixing and genetic drift. This model can be regarded as an extension of work by Anderson et al. (2000), in which temporal changes due to genetic drift only were assumed and the effective population size (N_e) was estimated using the relationship between the genetic drift and N_e. Our simulation studies demonstrated that the performance for the estimation of the mixing rates, ω, based on the proposed model, was much better than that based on the model ignoring the genetic drift if the degree of genetic drift was large. Also, even if the degree of genetic drift was small, the mixing rates were well estimated.

Meanwhile, the simulation results also showed that the estimation of genetic drift is difficult compared to that of mixing rates. The genetic drift θ was overestimated especially when the genetic drift was small. In fact, the standard errors for θ in the case studies for mud crabs were large. This is partly due to the small number of sampling years and partly due to the use of only the mtDNA haplotype frequencies. Longer-term research would improve the estimation performance for the genetic drift and the mixing rates. In such studies, however, we must also consider the change in θ or N_e through time, because the parameter θ is closely related to the effective population size. Such variation may be likely in situations where juveniles are artificially released. Note that our model does not take into account effects of removal of individuals. We supposed that individuals are randomly caught by fisheries from the composite populations, and therefore change in haplotype frequencies due to fisheries catch occurs in the same manner as the genetic drift. In this respect, we think that any changes in statistical model are not essentially necessary even if considering the effect. However, we regarded such random changes as only a result of genetic drift and related it to the effective population size. If the amount of sample is large, effective population size after the fishing season may be underestimated.

The estimates of θ for mud crabs in Urado Bay were 341.1. Because mtDNA haplotypes were used in the analysis here, the degree of genetic drift θ is equal to the effective female population size minus one, N_f − 1. If the sex ratio is 1:1, then the estimate of the effective population size is N_e = 2N_f = 684.2. Although the accuracy of this value is not high because of the large standard error for θ, it seems to be small compared to the actual population size. The catch, in weight, in the late 1990s was ∼5 tons, and the average weight was estimated as 411 g through a market survey (Obata et al. 2006). Hence, the catch in number can be calculated at 12,165. The mortality rate is unknown for this stock, but if the exploitation rate can be assumed to be 33% and the natural mortality is ignored, the actual population size after exploitation, N, is (1 − 0.33) · 5000/(0.411 · 0.33) ≈ 24,700. This value is considerably larger than the estimated N_e. However, as described in Turner et al. (2002), the estimate of the ratio N_e/N for the red drum, Sciaenops ocellatus, was <0.001. The ratio N_e/N can take small values in nature, for example, if heterogeneity in fecundity or reproductivity among individuals or habitat is large. In this case, genetic impacts of hatchery releases on wild populations cannot be ignored. In this regard, further studies are necessary for the investigation of N_e and mixing for mud crabs in Urado Bay.

Our statistical model does not take into account selection on the juvenile mortality and fertility. For our situation, there are two possible selection forces: domestication and natural selection. Domestication is defined as genetic changes that result from different selective pressures in the hatchery and the wild environment (Waples and Drake 2004). If a hatchery population was established from the natural population and then kept in the hatchery and multiplied, the hatchery-reared stock generally becomes adapted to hatchery conditions and maladapted to natural conditions. Therefore, the mixing rate in the fishery season can be different from the one at the time of release. Our model estimates the former. Domestication may also cause a difference in fertility between the released individuals and the others in the composite population. In addition, natural selection is another factor that has an impact on changes in haplotype frequencies. In this study, we consider only the genetic drift and ignore the difference in the fertility and fitness.

If such selection in the reproduction process is mild and the effective population size is small, the effect on the estimation of mixing rates may be small. Meanwhile, if the effect of selection is relatively large, the directional changes in the frequencies may cause a bias in our estimation of mixing rate. For example, assume that minor haplotype I is used for release in one year and another haplotype II is used in the next year. Suppose that fitness of haplotype I is low. In this case, the frequency of haplotype I before release in the next year decreases due to selection. Therefore, the mixing rate of the released population with haplotype II is overestimated. If the same haplotype I is used in both years, and fitness of haplotype I stays low, then mixing rate is in turn underestimated. In this way, our estimates of mixing rates might be biased if the effect of selection is large. On the other hand, selection increases the level of genetic drift because it increases the rate of inbreeding. The result in the mud crab analysis showed a small level of genetic drift. We used wild female crabs caught in the fishing ground for juvenile production every year and duration for juveniles kept in the hatchery environment was short, ∼1 month or 1.5 months. Therefore, the domestication selection might be neglected for this case. Minor haplotypes are considered the consequence of long-term natural selection and evolution. Although further investigation is necessary, it may have suggested a small impact of selection on the estimation.

In this article, we described mostly a statistical model only for cases where haplotype or allele counts at a single locus are observed. Now, many microsatellites are often used in the study of population genetics, including genetic stock identification (e.g., Wirgin and Waldman 2005). The extension for multiple unlinked markers is direct. Even with loosely linked markers, our statistical models may perform well. However, the effect of linkage disequilibrium can be stronger than that in usual population genetics studies.

Linkage disequilibrium exponentially decays with successive generations. Hence, in a population reaching equilibrium, the assumption of independence among loci will be appropriate. Our article, however, considers admixture of yearly released populations, and it is impossible to expect such equilibrium. Therefore, the assumption of independent loci tends to no longer hold. Statistical models under linkage disequilibrium have been developed (e.g., Falush et al. 2003; Kitada and Kishino 2004). However, we think that such modeling and developing of estimation methods themselves are pretty complicated and beyond the scope of our research. Therefore, statistical models considering linkage disequilibrium as well as the mixing rates and genetic drift will be developed in future research.

Our statistical model included latent vectors representing haplotype frequencies in mtDNA due to genetic drift. The likelihood function is not expressed in a closed form, and hence direct maximization is not possible. In this article, we consider the maximum-likelihood estimation with a Monte Carlo approximation of likelihood (and Monte Carlo EM, MCEM, algorithm). Meanwhile, the maximum-likelihood estimate is not always the best because the profile likelihood has a drawback when the sample size is small compared to the number of parameters (e.g., Neyman and Scott 1948). In fact, for more precise estimation of θ, eliminating the other parameters including ω through integration may be efficient (Kitakado et al. 2006, accompanying article in this issue). In this case, some noninformative prior distributions are to be assumed, and the parameters, except for θ, are integrated out from the likelihood. The parameter θ is estimated by maximizing the integrated likelihood, and the parameters, except for θ, are inferred using their posterior distribution given data and the estimate of θ. This kind of hierarchical structure is also effective in a more practical situation where there is a random or systematic change in θ over years. Although in this case further hierarchical modeling is necessary, extension of our model and algorithm is not difficult for such purposes. Furthermore, if we consider the presence of locus-specific θ, the assumption of random variation of θ over loci could contribute to the reduction in the number of parameters as well as an improvement of estimation performance. In this case, further complicated algorithms would be necessary. These are matters to be investigated in future research.

In this article, we have developed a statistical method for estimating mixing rates of hatchery-released individuals in enhanced populations. Such mixing could also occur when the outside species or population comes from a natural population. In this sense, we note that our method under successive sampling can also be applied in conservation genetics to monitor the effect of invasion by outside populations.

Acknowledgments

The authors are grateful to two anonymous reviewers for their valuable comments on the original version of this manuscript. We also thank Katsuyuki Hamasaki, Naohisa Kanda, and Hiroshi Okamura for their helpful comments.

APPENDIX A: MCMC SAMPLING

We illustrate a method for generating random numbers from Inline graphic using a Metropolis-within-Gibbs sampling method. Let be the initial values for this algorithm. Then a sequence of random vectors is generated in turn,

(A1)

The method for actual generation is as follows:

Sample a candidate vector from a proposal distribution
Compute acceptance ratios defined by
Set as

For example, the form of the acceptance ratio is justified by the following formula:

(A2)

As a proposal distribution for Inline graphic , we use a Dirichlet distribution as

where the parameter c controls a deviation between a candidate vector p_1y* and the previous vector Inline graphic . In fact, the variance of is given by , and therefore the mixing condition over parameter space becomes slow if c increases, while the acceptance ratio decreases if c decreases. We chose a suitable value by trial and error. The samples in the initial stage were discarded as a burn-in period, which is usually used in MCMC to eliminate the effect of the initial values (Gelman et al. 2004). In addition, the MCMC sequence was thinned to reduce autocorrelation (Gelman et al. 2004).

APPENDIX B: MCEM ALGORITHM

Let Inline graphic be the logarithm of , and let φ^[t] denote a value of φ at the tth EM iteration. Define Q(φ | φ^[t]) by the conditional expectation of the l_c(φ) with respect to given n₁,

Then the original EM algorithm alternates between the two steps as follows:

E step: Compute Q(φ | φ^[t]).
M step: Define φ^[t+1] by a value of φ maximizing Q(φ | φ^[t]).

In our model, the complete log-likelihood, Inline graphic , can be obtained as

with Inline graphic . The original EM algorithm expects an explicit formula for Q(φ | φ^[t]). This is the case at least for statistical models with simple exponential families. Unfortunately, another step is necessary for computation of Q(φ | φ^[t]) in our model. In fact, the function Q(φ | φ^[t]) is expressed as

The two parts of the conditional expectation with respect to Inline graphic given n₁ have no explicit formulas. To compute the conditional expectation of the complete log-likelihood, Q(φ | φ^[t]), we use a Monte Carlo integration by generating simulation outcomes for given the observable data n₁. The random variables are generated from the conditional distribution, Inline graphic with an MCMC as described in appendix a. The size m_t at the tth step should be increased with t. Then the function Q(φ | φ^[t]) is approximately calculated as

where m_t is the length of the effective MCMC sequence considering the burn-in period and thinning.

References

Anderson, E. C., 2005. An efficient Monte Carlo method for estimating N_e from temporally spaced samples using a coalescent-based likelihood. Genetics 170: 955–967. [DOI] [PMC free article] [PubMed] [Google Scholar]
Anderson, E. C., E. G. Williamson and E. A. Thompson, 2000. Monte Carlo evaluation of the likelihood for N_e from temporally spaced samples. Genetics 156: 2109–2118. [DOI] [PMC free article] [PubMed] [Google Scholar]
Blaxter, J. H. S., 2000. The enhancement of marine fish stocks. Adv. Mar. Biol. 38: 1–54. [Google Scholar]
Falush, D., M. Stephens and J. K. Pritchard, 2003. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164: 1567–1587. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gelman, A., J. B. Carlin, H. S. Stern and D. B. Rubin, 2004. Bayesian Data Analysis, Ed. 2. Chapman & Hall/CRC Press, Boca Raton, FL.
Hilborn, R., and C. J. Walters, 1992. Quantitative Fisheries Stock Assessment: Choice, Dynamics and Uncertainty, Chapman & Hall, New York.
Imai, H., Y. Obata, S. Sekiya, T. Shimizu and K. Numachi, 2002. Mitochondrial DNA markers confirm successful stocking of mud crab juveniles, Scylla paramamosain in a natural population. Suisan Zoshoku 50: 149–156. [Google Scholar]
Kitada, S., and H. Kishino, 2004. Simultaneous detection of linkage disequilibrium and genetic differentiation of subdivided populations. Genetics 167: 2003–2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kitada, S., and H. Kishino, 2006. Lessons learned from Japanese marine finfish stock enhancement programmes. Fish. Res. 80: 101–112. [Google Scholar]
Kitakado, T., S. Kitada, H. Kishino and H. J. Skaug, 2006. An integrated-likelihood method for estimating genetic differentiation between populations. Genetics 173: 2073–2082. [DOI] [PMC free article] [PubMed] [Google Scholar]
Leber, K. M., S. Kitada and H. L. Blankenship, 2004. Stock Enhancement and Sea Ranching, Ed. 2. Blackwell Publishing, Oxford.
McLachlan, G. J., and T. Krishnan, 1997. The EM Algorithm and Extensions. John Wiley & Sons, New York.
Millar, R. B., 1987. Maximum likelihood estimation of mixed stock fishery composition. Can. J. Fish. Aquat. Sci. 44: 583–590. [Google Scholar]
Nei, M., and F. Tajima, 1981. Genetic drift and estimation of effective population size. Genetics 98: 625–640. [DOI] [PMC free article] [PubMed] [Google Scholar]
Neyman, J., and E. L. Scott, 1948. Consistent estimates based on partially consistent observations. Econometrica 16: 1–32. [Google Scholar]
Obata, Y., H. Imai, T. Kitakado, K. Hamasaki and S. Kitada, 2006. The contribution of stocked mud crabs Scylla paramamosain to commercial catches, estimated using a genetic stock identification technique in Japan. Fish. Res. 80: 113–121. [Google Scholar]
Pauly, D., V. Christensen, S. Guénette, T. J. Pitcher, U. R. Sumaila et al., 2002. Towards sustainability in world fisheries. Nature 418: 689–695. [DOI] [PubMed] [Google Scholar]
Pella, J., and M. Masuda, 2001. Bayesian methods for analysis of stock mixtures from genetic characters. Fish. Bull. 99: 151–167. [Google Scholar]
Robert, C. P., and G. Casella, 2004. Monte Carlo Statistical Methods, Ed. 2. Springer-Verlag, New York.
Ryman, N., and L. Laikre, 1991. Effects of supportive breeding on the genetically effective population size. Conserv. Biol. 5: 325–329. [Google Scholar]
Svåsand, T., T. S. Kristiansen, T. Pedersen, A. G. V. Salvanes, R. Engelsen et al., 2000. The enhancement of cod stocks. Fish Fish. 1: 173–205. [Google Scholar]
Turner, T. F., J. P. Wares and R. G. John, 2002. Genetic effective size is three orders of magnitude smaller than adult census size in an abundant, estuarine-dependent marine fish (Sciaenops ocellatus). Genetics 162: 1329–1339. [DOI] [PMC free article] [PubMed] [Google Scholar]
Waples, R., 1989. A generalized approach for estimating effective size from temporal changes in allele frequency. Genetics 121: 379–391. [DOI] [PMC free article] [PubMed] [Google Scholar]
Waples, R., and J. Drake, 2004. Risk/benefit consideration for marine stock enhancement: a Pacific salmon perspective, pp. 260–306 in Stock Enhancement and Sea Ranching, Ed. 2, edited by K. M. Leber, S. Kitada and H. L. Blankenship. Blackwell Publishing, Oxford.
Williamson, E. G., and M. Slatkin, 1999. Using maximum likelihood to estimate population size from temporal changes in allele frequencies. Genetics 152: 755–761. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wirgin, I., and J. R. Waldman, 2005. Use of nuclear DNA in stock indentification: single-copy and repetitive sequence markers, pp. 331–370 in Stock Identification Methods, edited by S. X. Cadrin, K. D. Friedland and J. R. Waldman. Elsevier Academic Press, Burlington, MA.

[bib1] Anderson, E. C., 2005. An efficient Monte Carlo method for estimating N_e from temporally spaced samples using a coalescent-based likelihood. Genetics 170: 955–967. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib2] Anderson, E. C., E. G. Williamson and E. A. Thompson, 2000. Monte Carlo evaluation of the likelihood for N_e from temporally spaced samples. Genetics 156: 2109–2118. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] Blaxter, J. H. S., 2000. The enhancement of marine fish stocks. Adv. Mar. Biol. 38: 1–54. [Google Scholar]

[bib4] Falush, D., M. Stephens and J. K. Pritchard, 2003. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164: 1567–1587. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib5] Gelman, A., J. B. Carlin, H. S. Stern and D. B. Rubin, 2004. Bayesian Data Analysis, Ed. 2. Chapman & Hall/CRC Press, Boca Raton, FL.

[bib6] Hilborn, R., and C. J. Walters, 1992. Quantitative Fisheries Stock Assessment: Choice, Dynamics and Uncertainty, Chapman & Hall, New York.

[bib7] Imai, H., Y. Obata, S. Sekiya, T. Shimizu and K. Numachi, 2002. Mitochondrial DNA markers confirm successful stocking of mud crab juveniles, Scylla paramamosain in a natural population. Suisan Zoshoku 50: 149–156. [Google Scholar]

[bib8] Kitada, S., and H. Kishino, 2004. Simultaneous detection of linkage disequilibrium and genetic differentiation of subdivided populations. Genetics 167: 2003–2013. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] Kitada, S., and H. Kishino, 2006. Lessons learned from Japanese marine finfish stock enhancement programmes. Fish. Res. 80: 101–112. [Google Scholar]

[bib10] Kitakado, T., S. Kitada, H. Kishino and H. J. Skaug, 2006. An integrated-likelihood method for estimating genetic differentiation between populations. Genetics 173: 2073–2082. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] Leber, K. M., S. Kitada and H. L. Blankenship, 2004. Stock Enhancement and Sea Ranching, Ed. 2. Blackwell Publishing, Oxford.

[bib12] McLachlan, G. J., and T. Krishnan, 1997. The EM Algorithm and Extensions. John Wiley & Sons, New York.

[bib13] Millar, R. B., 1987. Maximum likelihood estimation of mixed stock fishery composition. Can. J. Fish. Aquat. Sci. 44: 583–590. [Google Scholar]

[bib14] Nei, M., and F. Tajima, 1981. Genetic drift and estimation of effective population size. Genetics 98: 625–640. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib15] Neyman, J., and E. L. Scott, 1948. Consistent estimates based on partially consistent observations. Econometrica 16: 1–32. [Google Scholar]

[bib16] Obata, Y., H. Imai, T. Kitakado, K. Hamasaki and S. Kitada, 2006. The contribution of stocked mud crabs Scylla paramamosain to commercial catches, estimated using a genetic stock identification technique in Japan. Fish. Res. 80: 113–121. [Google Scholar]

[bib17] Pauly, D., V. Christensen, S. Guénette, T. J. Pitcher, U. R. Sumaila et al., 2002. Towards sustainability in world fisheries. Nature 418: 689–695. [DOI] [PubMed] [Google Scholar]

[bib18] Pella, J., and M. Masuda, 2001. Bayesian methods for analysis of stock mixtures from genetic characters. Fish. Bull. 99: 151–167. [Google Scholar]

[bib19] Robert, C. P., and G. Casella, 2004. Monte Carlo Statistical Methods, Ed. 2. Springer-Verlag, New York.

[bib20] Ryman, N., and L. Laikre, 1991. Effects of supportive breeding on the genetically effective population size. Conserv. Biol. 5: 325–329. [Google Scholar]

[bib21] Svåsand, T., T. S. Kristiansen, T. Pedersen, A. G. V. Salvanes, R. Engelsen et al., 2000. The enhancement of cod stocks. Fish Fish. 1: 173–205. [Google Scholar]

[bib22] Turner, T. F., J. P. Wares and R. G. John, 2002. Genetic effective size is three orders of magnitude smaller than adult census size in an abundant, estuarine-dependent marine fish (Sciaenops ocellatus). Genetics 162: 1329–1339. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] Waples, R., 1989. A generalized approach for estimating effective size from temporal changes in allele frequency. Genetics 121: 379–391. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] Waples, R., and J. Drake, 2004. Risk/benefit consideration for marine stock enhancement: a Pacific salmon perspective, pp. 260–306 in Stock Enhancement and Sea Ranching, Ed. 2, edited by K. M. Leber, S. Kitada and H. L. Blankenship. Blackwell Publishing, Oxford.

[bib25] Williamson, E. G., and M. Slatkin, 1999. Using maximum likelihood to estimate population size from temporal changes in allele frequencies. Genetics 152: 755–761. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] Wirgin, I., and J. R. Waldman, 2005. Use of nuclear DNA in stock indentification: single-copy and repetitive sequence markers, pp. 331–370 in Stock Identification Methods, edited by S. X. Cadrin, K. D. Friedland and J. R. Waldman. Elsevier Academic Press, Burlington, MA.

PERMALINK

Simultaneous Estimation of Mixing Rates and Genetic Drift Under Successive Sampling of Genetic Markers With Application to the Mud Crab (Scylla paramamosain) in Japan

Toshihide Kitakado

Shuichi Kitada

Yasuhiro Obata

Hirohisa Kishino

Abstract

MODELS AND METHODS

Statistical model:

Figure 1.—

Likelihood:

Estimation algorithm:

The case of multiple unlinked loci:

SIMULATION STUDIES

Simulation scenarios:

TABLE 1.

Simulation results:

TABLE 2.

CASE STUDY: ANALYSIS FOR MUD CRABS

Data:

TABLE 3.

Results:

TABLE 4.

DISCUSSION

Acknowledgments

APPENDIX A: MCMC SAMPLING

APPENDIX B: MCEM ALGORITHM

References

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Simultaneous Estimation of Mixing Rates and Genetic Drift Under Successive Sampling of Genetic Markers With Application to the Mud Crab (Scylla paramamosain) in Japan

Toshihide Kitakado

Shuichi Kitada

Yasuhiro Obata

Hirohisa Kishino

Abstract

MODELS AND METHODS

Statistical model:

Figure 1.—

Likelihood:

Estimation algorithm:

The case of multiple unlinked loci:

SIMULATION STUDIES

Simulation scenarios:

TABLE 1.

Simulation results:

TABLE 2.

CASE STUDY: ANALYSIS FOR MUD CRABS

Data:

TABLE 3.

Results:

TABLE 4.

DISCUSSION

Acknowledgments

APPENDIX A: MCMC SAMPLING

APPENDIX B: MCEM ALGORITHM

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases