Alter et al. 10.1073/pnas.0706056104.

Supporting Information

Files in this Data Supplement:

SI Table 2
SI Table 3
SI Table 4
SI Table 5
SI Figure 4
SI Figure 5
SI Figure 6
SI Methods




SI Figure 4

Fig. 4. Graphic representation of migration scenario used in demographic simulations.





SI Figure 5

Fig. 5. Relationships between pairwise genetic distance and divergence (in years) for baleen whale species, for each intron and cytochrome b. Each data point represents the pairwise genetic distance between two baleen whale species versus the divergence time for those species (1).

1. Sasaki T, Nikaido M, Hamilton H, Goto M, Kato H, Kanda N, Pastene LA, Cao Y, Fordyce RE, Hasegawa M, et al. (2005) Syst Biol 54:77-90.





SI Figure 6

Fig. 6. Estimation of mutation rate using data from all autosomal introns. The estimated mutation rate derived from the slope (bold line) is 4.15 × 10⁻¹⁰ bp⁻¹·year⁻¹. Dotted lines represent 95% confidence limits around the regression line. Confidence intervals for the overall regression in Fig. 4 were generated in the statistical package JMP.





Table 2. Introns, associated genes, primers, optimal annealing temperature for gray whales, and references used for each nuclear intron

| Intron name | Associated gene | PCR primers | Source | Optimum Ta, °C | Size of PCR product, bp |
|---|---|---|---|---|---|
| ACTA | Actin | ACT-1: GCT GTT TTT CCC TCC ATT GT; ACT-7: GTG TCA TGT ATC TTC TAC AT | ref. 1 | 47 | 1,200 |
| BTN | Butyrophilin | But-b1s: TCA GGC AGG ACG AAA ACT AC; BTNr4: GCA GTT TCT CAT ATA TTG TCT C | C. S. Baker, personal communication; S.E.A. (this study) | 50 | 700 |
| CP | Ceruloplasmin mRNA | CP-F: CTA GGT CCT GTC ATT TGG GC; CP-R: TCT TTG GGG ACA GTC CAT TC | ref. 2 | 60 | 1,500 |
| ESD | Esterase D | ESD-F: TTT TGG ACA CTC CAT GGG AG; ESD-R: TTC CAT TTA CTT TGA TCT GTT CCC | ref. 2 | 60 | 800 |
| FGG | Fibrinogen γ-polypeptide | FGG-F: TCA AGA CAT TGC CAA TAA GGG; FGG-R: CCA GTA GGA GAC AGA TGT CCA AA | ref. 2 | 55 | 1,100 |
| G6PD* | Glucose-6-phosphate dehydrogenase | G6PD-F: CGT GAG GAC CAG ATC TAC CG; G6PD-R: GCA GGA GGT GGT TCT GCA | ref. 2 | 65 | 500 |
| LACTAL | α-Lactalbumin | LacII.F: CCA AAA TGA TGT CCT TTG TC; LacIV.R: GAC TCA CCA GTA GGT AAT TC | ref. 3 | 50 | 1,200 |
| PLP* | Proteolipid protein | PLP-F: GGC CAC TGG ATT RTG TTT CT; PLP-R: CGC AGA TGG TGG TCT TGT AG | ref. 2 | 60 | 900 |
| WT1 | Wilms tumor 1 | WT1-F: GAG AAA CCA TAC CAG TGT GA; WT1-R: GTT TTA CCT GTA TGA GTC CT | ref. 4 | 60 | 700 |

Ta, annealing temperature.

*Located on X chromosome.

1. Palumbi SR, Baker CS (1994) Mol Biol Evol 11:426-435.

2. Lyons LA, Laughlin TF, Copeland NG, Jenkins NA, Womack JE, O'Brien SJ (1997) Nat Genet 15:47-56.

3. Rychel AL, Reeder TW, Berta A (2004) Mol Phylogenet Evol 32:892-901.

4. Venta PJ, Brouillete JA, Yuzbasiyanurkan V, Brewer GJ (1996) Biochem Genet 34:321-341.





Table 3. Summary statistics for introns and mitochondrial markers calculated using DnaSP (1)

| Intron | Θ(S) | Π | Fu and Li's D* | Fu and Li's F | Tajima's D |
|---|---|---|---|---|---|
| ACTA | 0.00108 | 0.00244 | 1.06482 | 1.91713 | 2.83433 |
| BTN | 0.00061 | 0.00016 | -0.99055 | -1.23144 | -1.20939 |
| CP | 0.00047 | 0.00029 | 0.70801 | 0.34996 | -0.63519 |
| ESD | 0.00157 | 0.0031 | 0.04694 | 0.8644 | 2.19531 |
| FGG | 0.00074 | 0.00081 | 0.85595 | 0.75646 | 0.1811 |
| G6PD | 0 | 0 | - | - | - |
| PLP | 0.00086 | 0.00103 | 0.88765 | 0.86054 | 0.39472 |
| LACTAL | 0.0006 | 0.00089 | 0.85595 | 1.02463 | 0.92255 |
| WT1 | 0.00048 | 0.00084 | 0.50695 | 0.72479 | 0.90013 |
| cyt b | 0.00273 | 0.00188 | 0.8129 | 0.30253 | -0.91383 |
| d-loop | 0.01468 | 0.01519 | 1.35969 | 1.08120 | 0.064 |

No markers showed statistically significant deviations from neutrality as estimated by Fu and Li's D*, Fu and Li's F, or Tajima's D (Bonferroni-corrected P < 0.0015).

1. Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R (2003) Bioinformatics 19:2496-2497.





Table 4. Species of baleen whale used in each linear regression to determine substitution rate, mutation model as determined by MODELTEST (1), and slope of linear relationship [models selected include HKY (2), F81 (3), K80 (4), and TrN (5)]

| Intron name | Species used | Mutation model | Slope |
|---|---|---|---|
| ACTA | 1, 4, 6, 10, 12 | HKY, I = 0, G = 0 | 1.00 × 10⁻⁹ |
| BTN | 1, 4, 6, 7, 9, 10, 11, 12, 14, 15 | HKY, I = 0, G = 0 | 9.00 × 10⁻¹⁰ |
| CP | 2, 4, 6, 10 | HKY, I = 0, G = 0 | 1.00 × 10⁻⁹ |
| ESD | 3, 6, 10, 11, 12 | F81, I = 0, G = 0 | 1.00 × 10⁻⁹ |
| FGG | 1, 2, 4, 6, 7, 12, 13 | F81, I = 0, G = 0 | 3.00 × 10⁻¹⁰ |
| LACTAL | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 | F81 + I | 7.00 × 10⁻¹⁰ |
| G6PD | 1, 2, 4, 6, 7, 9, 10, 11, 12 | HKY, I = 0, G = 0 | 8.00 × 10⁻¹⁰ |
| PLP | 4, 6, 9, 10, 11, 12, 14, 15 | K80, I = 0, G = 0 | 2.00 × 10⁻⁹ |
| WT1 | 1, 6, 7, 10, 12, 14 | K80, I = 0, G = 0 | 8.00 × 10⁻¹⁰ |
| cyt b | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 | TrN + G | 8.00 × 10⁻⁹ |

I = 0: proportion of invariable sites is zero. G = 0: equal rates for all sites. Species used were as follows: 1, E. glacialis; 2, E. japonica; 3, E. australis; 4, B. mysticetus; 5, C. marginata; 6, E. robustus; 7, B. borealis; 8, B. brydei; 9, B. edeni; 10, B. musculus; 11, B. physalus; 12, M. novaeangliae; 13, B. bonaerensis; 14, B. acutorostrata; 15, D. delphinus.

1. Posada D, Crandall KA (1998) Bioinformatics 14:817-818.

2. Hasegawa M, Kishino H, Yano T (1985) J Mol Evol 22:160-174.

3. Felsenstein J (1981) J Mol Evol 17:368-376.

4. Kimura M (1980) J Mol Evol 16:111-120.

5. Tamura K, Nei M (1993) Mol Biol Evol 10:512-526.





Table 5. Microsatellite loci used in Structure analysis

| Locus | Label | Ta | Repeats | N | HO | HE | Ref. |
|---|---|---|---|---|---|---|---|
| EV94 | NED | 60 | 2 | 10 | 0.72973 | 0.78934 | 38 |
| Gata417 | VIC | 61 | 4 | 7 | 0.64865 | 0.66901 | 39 |
| Gt023 | 6FAM | 61 | 2 | 9 | 0.86842 | 0.7593 | 40 |
| D17 | 6FAM | 54 | 2 | 14 | 0.97368 | 0.88561 | 41 |
| RW31 | NED | 54 | 2 | 9 | 0.79412 | 0.83363 | 42 |
| AC137 | NED | 61 | 2 | 5 | 0.60526 | 0.66281 | 43 |

Ta, annealing temperature; N, number of alleles; HO, observed heterozygosity; HE, expected heterozygosity.
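As a rough illustration of how the two heterozygosity columns relate to raw genotypes, the sketch below computes observed heterozygosity and an expected heterozygosity from allele frequencies. The genotype data are invented, and the use of Nei's small-sample correction is an assumption; the source does not say which estimator produced the HE values.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (H_O, H_E) for a list of diploid genotypes (allele_1, allele_2)."""
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    # Expected heterozygosity with Nei's small-sample correction (assumed here).
    h_exp = (total / (total - 1)) * (1.0 - sum_sq)
    return h_obs, h_exp

# Hypothetical allele sizes (bp) for four individuals at one locus.
toy = [(150, 152), (150, 150), (152, 154), (150, 154)]
h_obs, h_exp = heterozygosity(toy)
```

A large gap between the two values at a locus would be one informal sign of departure from Hardy-Weinberg proportions.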





SI Methods

Collection of Samples, Amplification, and Sequencing of Mitochondrial and Nuclear DNA.

Tissue samples from 42 individual gray whales were taken between 1991 and 2004 from the eastern North Pacific population from biopsied, stranded, or hunted animals, and were stored in 20% DMSO at -20°C until extraction (SWFSC accession nos. 394828504). Because the population size during this period was ≈18,000-30,000 (1) and most sampled tissue came from stranded animals, the probability that the same individuals were sampled twice or that close relatives were sampled is minimal. Genomic DNA was extracted at Southwest Fisheries Science Center (La Jolla, CA) by K. Robertson. Genomic DNA extraction of seven additional gray whale muscle and blubber tissues was performed in our laboratory by using a Nucleospin Tissue kit (BD Biosciences, San Jose, CA).

To amplify independently evolving introns in gray whales, we selected previously described exon-priming, intron-crossing primers from several references (Table 2). Primer pairs were used in gradient PCRs under the following conditions: 0.2 mM dNTPs, 0.2 mM 10× buffer, 0.8 mM of each primer, 0.25 units of Dynazyme EXT polymerase, and 200-500 nmol of genomic DNA in a 25-µl reaction. From the primer sets that gave a single strong band between 500 and 2,000 bp in size, we selected nine loci (seven autosomal and two X-linked) for sequencing. In addition to nuclear introns, we also amplified and sequenced two mitochondrial loci (cytochrome b and control region). Control region data for 34 individuals were kindly supplied by A. Lang, and we sequenced the remaining individuals. PCR products were first purified with a SAP-EXO reaction (USB Corporation, Cleveland, OH). Cleaned PCR products were sequenced on an automated ABI 3100 sequencer (Applied Biosystems, Foster City, CA). Internal sequencing primers were designed for most loci to ensure complete coverage of the targeted fragment in both directions.

Sequence data were visualized and aligned by eye using Sequencher 4.5 (Gene Codes Corporation, Ann Arbor, MI). All heterozygotes were confirmed by sequencing in both the forward and reverse directions, giving a 97% probability of correct identification (2). Phase was determined for each marker based on inference from the sequences of homozygotes in the dataset.

We sequenced nine nuclear introns (>7,000 bp total) in 22-42 individual gray whales from the ENP population (Table 2; GenBank accession nos. EF043286-EF043340; additional allele data available on request). We used the program DnaSP (3) to retrieve several measures of diversity for the eastern North Pacific population, including Θ, π, Tajima's D, and Fu and Li's D*. Departure from Hardy-Weinberg equilibrium was tested by using Pearson's χ² test (4). To test for linkage disequilibrium between loci, we used the index of association (IA) (5) compared to 1,000 randomized datasets, using the program Multilocus 1.2 (6).
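For readers unfamiliar with the two diversity measures, the following minimal sketch computes Watterson's estimator Θ(S) and nucleotide diversity π per site from a small alignment. It is not DnaSP's implementation: the toy sequences are invented, and gaps, missing data, and multiple hits are ignored.

```python
from itertools import combinations

def watterson_theta_and_pi(seqs):
    """Return (theta_S, pi) per site for equal-length aligned sequences."""
    n, length = len(seqs), len(seqs[0])
    # Segregating sites: alignment columns holding more than one distinct base.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # Watterson's correction factor a_n = 1 + 1/2 + ... + 1/(n - 1).
    a_n = sum(1.0 / i for i in range(1, n))
    theta_s = s / a_n / length
    # pi: mean number of differences over all sequence pairs, per site.
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    pi = diffs / len(pairs) / length
    return theta_s, pi

# Hypothetical four-sequence alignment with two segregating sites.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGTTCGTAA"]
theta_s, pi = watterson_theta_and_pi(toy)
```

Tajima's D is built on the difference between these two quantities, which is why π exceeding Θ(S) yields a positive D.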

No significant linkage disequilibrium as measured by IA was detected between any nuclear markers (P < 0.05). Significant deviation from Hardy-Weinberg equilibrium (P < 0.05) was only detected for one intron, PLP. This marker is X-linked in mammals, accounting for the heterozygote shortage.
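The heterozygote shortage at an X-linked marker follows from hemizygous males being scored as homozygotes. A minimal sketch of a Pearson χ² test for Hardy-Weinberg proportions at a biallelic locus is shown below; the genotype counts are invented, and real intron data with more than two alleles would need the multi-allele form of the test.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic for Hardy-Weinberg proportions, two alleles.

    Assumes both alleles are present, so every expected count is non-zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts showing a deficit of heterozygotes.
chi2 = hwe_chi_square(20, 10, 10)
# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
significant = chi2 > 3.841
```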

Overall, variation at all nuclear introns and cytochrome b was low, with an average of one single nucleotide polymorphism (SNP) approximately every 300 bp. This pattern contrasts with the high diversity seen in the control region at the population level, where 34 of 523 positions vary. The observed transition/transversion ratio across all introns was 3.6.

Summary statistics were calculated for each locus in DnaSP (3) and are presented in SI Table 3. None of the markers showed statistically significant deviations from neutrality as estimated by Fu and Li's D*, Fu and Li's F, or Tajima's D (Bonferroni-corrected P < 0.0015). The ACTA and ESD introns showed high, although not statistically significant, Tajima's D values. Positive values of Tajima's D can be an indication of population decline or balancing or diversifying selection. Because selective events are much more likely to be purifying rather than balancing or diversifying across the genome (7), these positive values likely reflect the removal of rare alleles during a recent population bottleneck.
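The corrected threshold of 0.0015 is consistent with dividing a nominal α of 0.05 by 33 tests, that is, three statistics on each of the 11 markers listed in SI Table 3. That count is an inference from the table, not something the text states, so the sketch below should be read as a plausibility check only.

```python
alpha = 0.05
n_markers = 11     # assumed: 9 introns + cyt b + d-loop (rows of SI Table 3)
n_statistics = 3   # Fu and Li's D*, Fu and Li's F, Tajima's D
threshold = alpha / (n_markers * n_statistics)
print(f"{threshold:.4f}")
```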

Research on the eastern North Pacific population suggests that substructure within this population is unlikely (e.g., refs. 8 and 9). We tested for the possibility of population structure in 38 of our samples by collecting additional data from six microsatellite loci (Table 5) that had previously been screened in gray or other baleen whales. We amplified each microsatellite locus in 12.5-µl reactions using 0.2 µM each of fluorescently labeled and unlabeled primers (Applied Biosystems, Foster City, CA), 0.2 mM each dNTP, 1.5 mM MgCl₂, 0.065 units of Amplitaq polymerase (Applied Biosystems), 10× PCR buffer (Applied Biosystems), and 10-20 ng of genomic DNA. Data were generated on a 3730 DNA Analyzer (Applied Biosystems) and analyzed by using GeneMapper, version 4.0 (Applied Biosystems), to obtain microsatellite genotypes. No loci were found to deviate from Hardy-Weinberg equilibrium or found to be in linkage disequilibrium. Population structure was inferred from genotypic data by using the program Structure version 2 (10), by conducting 10 independent runs each for five values of K (number of populations assumed), and calculating the posterior probability for each value of K. For all replicates, we used an admixture model with a burn-in of 10,000 iterations and a run length of 10,000 iterations. Posterior probabilities of the data given K (ln Pr(X|K)) were 0.3063, 0.0005, 0.0296, 0.016387, and 0.014976 for K = 1, 2, 3, 4, and 5, respectively. The value K = 1 has a much higher posterior probability than corresponding values for larger values of K, indicating there is little evidence of substructuring within the eastern population in this set of samples.
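One common way to turn per-K log likelihoods into posterior probabilities is Bayes' rule with a uniform prior over K. The sketch below illustrates that step under stated assumptions: the log-likelihood values are invented, and the function is not part of Structure itself.

```python
import math

def posterior_over_k(log_likelihoods):
    """Map {K: ln Pr(X|K)} to {K: Pr(K|X)} assuming a uniform prior on K."""
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    peak = max(log_likelihoods.values())
    weights = {k: math.exp(v - peak) for k, v in log_likelihoods.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical ln Pr(X|K) values, averaged over replicate runs.
example = {1: -100.0, 2: -106.0, 3: -103.0}
post = posterior_over_k(example)
best_k = max(post, key=post.get)
```

When K = 1 receives nearly all of the posterior mass, as in this toy case, the data give no support for splitting the sample into subpopulations.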

Estimating Substitution Rates.

For each intron and the two mitochondrial markers, substitution rate was estimated by comparing pairwise genetic distance between species of baleen whales and their respective divergence times, the latter taken from ref. 11. When data were not publicly available for a given marker and species, the relevant marker was amplified and sequenced from genomic DNA provided by the Southwest Fisheries Science Center by using the methods and primers described above. For each intron and cytochrome b, the species used, mutation model, and estimated substitution rate are listed in SI Table 4. The average rate of substitutions across autosomal nuclear introns was found to be 4.8 × 10⁻¹⁰ substitutions bp⁻¹·year⁻¹ (ranging from 1.5 × 10⁻¹⁰ to 1.0 × 10⁻⁹). The rate of evolution of X-linked introns is slightly lower, as expected, with an average rate of 3.8 × 10⁻¹⁰ substitutions bp⁻¹·year⁻¹. These rates are lower than rates cited in the literature for noncoding DNA in other mammals, which average ≈3.75 × 10⁻⁹ substitutions bp⁻¹·year⁻¹ (7). However, this difference is not unexpected considering the extreme size and relatively low metabolic rate of baleen whales and the theoretical relationship between substitution rate, body size, and temperature (14, 15).

We estimated a substitution rate of 4.0 × 10⁻⁹ substitutions bp⁻¹·year⁻¹ in the cytochrome b region for baleen whales. This rate is once again relatively slow compared to the rate of the same mitochondrial marker in other mammals (16).
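The regression step can be sketched as an ordinary least-squares fit of pairwise distance on divergence time. The data below are synthetic, and the sketch makes no claim about details the text leaves open, such as whether the fit was forced through the origin or how the slope was converted to a per-lineage rate.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x, with a free intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical divergence times (years) for four species pairs.
divergence_years = [5e6, 10e6, 20e6, 28e6]
# Synthetic distances lying exactly on a line of slope 5e-10 per bp per year.
pairwise_distance = [t * 5e-10 for t in divergence_years]
slope = ols_slope(divergence_years, pairwise_distance)
```

With real data the points scatter around the line, and the confidence band around the fitted slope conveys how well the rate is constrained.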

Estimating Genetic Population Parameters.

We calculated the genetic diversity parameter Θ by using the maximum likelihood program LAMARC (17), which, by employing a coalescent framework, can jointly estimate Θ with population growth and immigration rates, and can coestimate a single overall recombination rate for the genomic region(s) under consideration. Because our samples were taken from a single population, we had no immigration rates to coestimate with Θ; we also did not coestimate recombination rates, because there was no or very low signal of recombination in eight of our nine introns based on initial runs of LAMARC on data from individual introns (3). Because preliminary runs of LAMARC on our data found no clear signal for exponential growth or shrinkage in the ENP population, we proceeded to estimate Θ alone. Absence of a clear signal for long-term exponential change in the size of this population is consistent with our hypothesis of a relatively recent bottleneck that reduced the size by less than one order of magnitude. Our study of other potential bottleneck scenarios is described in the main text.

LAMARC uses a standard Markov chain Monte Carlo (MCMC) method to obtain its estimates of population parameters. The credence of any estimate produced by using an MCMC method is tied to both the volume of parameter space that was sampled during the search for the estimate and the degree to which the search is believed to have converged on the estimate. To address these issues, we performed each LAMARC analysis 15 times: for each of three different random number seeds, which influence the series of random sampling events in an analysis, we instructed LAMARC to begin its search at one of five different trial values for Θ (0.0001, 0.001, 0.01, 0.1, and 1.0). For each genomic region, we performed 10 MCMC searches of 31,000 iterations each followed by 2 searches of 1,001,000 iterations each, building up a collection of inferred evolutionary trees during each Markov chain by sampling every 20th tree from the chain. After each chain completed, the Θ value that best fit that chain's collection of sampled trees was used to generate the trees in the next chain. No samples were taken during the first 1,000 iterations of each chain, to reduce potential bias that could be incurred when sampling immediately after updating the value of Θ; we thus used a "burn-in" of 1,000. We found the MLE of Θ to be very robust to changes of starting point and random number seed, with a standard deviation of 6.83% of the mean. We also assessed convergence by using the program TRACER (18), by computing the effective sample size (ESS). Rambaut and Drummond suggest an ESS of 100 or 200 implies convergence has been attained. Our runs consistently achieved ESS values in the thousands or better. We therefore conclude that we ran LAMARC long enough to obtain stable, reproducible, reliable results.
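The run design described above can be tallied in a few lines. The sketch below enumerates the 15 seed and starting-value combinations and counts the trees retained per chain given the stated burn-in and sampling interval. The seed values are placeholders, since the text gives only their number, and nothing here invokes LAMARC.

```python
from itertools import product

seeds = [1, 2, 3]  # placeholder seed values; only the count is given in the text
start_thetas = [0.0001, 0.001, 0.01, 0.1, 1.0]
runs = list(product(seeds, start_thetas))  # one analysis per (seed, start) pair

def trees_sampled(iterations, burn_in=1_000, interval=20):
    """Trees kept from one chain after discarding burn-in and thinning."""
    return (iterations - burn_in) // interval

short_chain = trees_sampled(31_000)     # each of the 10 initial chains
long_chain = trees_sampled(1_001_000)   # each of the 2 final chains
```

Under this reading, each short chain contributes 1,500 sampled trees and each long chain 50,000.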

We instructed LAMARC to obtain a joint maximum-likelihood estimate (MLE) for Θ, using DNA sequences taken from 10 genomic regions: 7 autosomal nuclear introns, 2 X-linked introns, and cytochrome b. To produce a conservative estimate, we excluded control region data from the joint analysis because of uncertainty in phylogenetically derived mutation rate for this marker (e.g., ref. 19). Because the substitution rate in the control region is not known to the same degree of precision as it is in the aforementioned regions, we used the d-loop data alone to obtain a separate MLE for comparison. By default, LAMARC reports Θ = 4Neμ for nuclear autosomal data, Θ = 2Ne(f)μ for mitochondrial data, and Θ = 3Neμ for data from the X chromosome, where Ne is the effective population size, Ne(f) is the effective female population size, and μ is the average mutation rate per base pair per generation. We therefore instructed LAMARC to apply scaling factors of 4 and 4/3 to our mitochondrial and X-chromosome data, respectively, using the observation that females and males are present with approximately equal frequencies in the eastern Pacific population (20). We also instructed LAMARC to report Θ using a scale in which the average relative mutation rate per base pair per generation among our seven autosomal nuclear introns is unity; we did this by supplying LAMARC with scaling factors corresponding to the relative strengths of the background mutation rates in the different genomic regions. These two sets of scaling factors, along with a conversion factor of 15.5-22.28 years/generation for gray whales (see below), enabled us to simply divide each Θ value received from LAMARC by the average substitution rate of the seven autosomal nuclear introns (4.786 × 10⁻¹⁰ substitutions bp⁻¹·year⁻¹), and then to divide that quotient by four to obtain an MLE for the effective population size Ne.
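The conversion from the diversity estimate to effective population size rearranges the autosomal relation between Θ, Ne, and the per-generation mutation rate: Ne equals Θ divided by four times that rate. A minimal sketch follows, using the rate and generation-time bounds quoted above; the Θ value is a made-up placeholder, not a result from this study.

```python
RATE_PER_YEAR = 4.786e-10        # autosomal intron average, substitutions/bp/year
GENERATION_YEARS = (15.5, 22.28)  # bounds on gray whale generation time

def effective_size(theta, generation_years, rate_per_year=RATE_PER_YEAR):
    """Ne = theta / (4 * mu), with mu converted to a per-generation rate."""
    mu_per_generation = rate_per_year * generation_years
    return theta / (4.0 * mu_per_generation)

theta_example = 0.003  # hypothetical value, for illustration only
ne_range = [effective_size(theta_example, g) for g in GENERATION_YEARS]
```

A longer generation time implies more mutations per generation, so the same Θ maps to a smaller Ne.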

Carrying Capacity.

To understand whether carrying capacity has dropped today, we must address whether current feeding areas could support a much larger gray whale population. To calculate the number of gray whales the Bering and Chukchi shelves could potentially have supported in the past, we used the average prey biomass requirement for large whales of 366 kg per individual per day (39), an average feeding window of 180 days per year (40), an amphipod density of 161 g/m² (41), and the smallest estimated area of traditionally high amphipod density [37,000 km² of a total feeding range of 1,000,000 km² (40, 41)] to estimate a carrying capacity of 90,242 whales annually.
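The arithmetic behind this estimate can be approximated by dividing the standing amphipod biomass of the high-density area by one whale's seasonal requirement. The sketch below assumes the standing stock is consumed once per year with no prey renewal, which the text does not state. Under that assumption the result is roughly 90,400 whales, within about 0.2% of the figure quoted above; the source of the small difference is not recoverable from the text.

```python
area_m2 = 37_000 * 1_000_000   # 37,000 km^2 expressed in m^2
amphipod_kg_per_m2 = 0.161     # 161 g/m^2
daily_need_kg = 366            # prey biomass per whale per day
feeding_days = 180             # feeding window per year

standing_biomass_kg = area_m2 * amphipod_kg_per_m2
per_whale_kg = daily_need_kg * feeding_days
capacity = standing_biomass_kg / per_whale_kg
```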

1. Rugh D, Hobbs RC, Lerczak JA, Breiwick JM (2005) J Cetacean Res Manag 7:112.

2. Hare MP, Palumbi SR (1999) Mol Ecol 8:1750.

3. Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R (2003) Bioinformatics 19:2496-2497.

4. Zar JH (1999) Biostatistical Analysis (Prentice Hall, Upper Saddle River, NJ), 4th Ed.

5. Haubold B, Travisano M, Rainey PB, Hudson RR (1998) Genetics 150:1341-1348.

6. Agapow PM, Burt A (2000) MultiLocus (Department of Biology, Imperial College, Silwood Park, UK), Version 1.2.

7. Graur D, Li W-H (2001) Fundamentals of Molecular Evolution (Sinauer, Sunderland MA), 2nd Ed.

8. Steeves TE, Darling JD, Rosel PE, Schaeff CM, Fleischer RC (2001) Conserv Genet 2:379-384.

9. Swartz SL, Taylor BL, Rugh DJ (2006) Mamm Rev 36:66-84.

10. Pritchard JK, Stephens M, Donnelly P (2000) Genetics 155:945-959.

11. Sasaki T, Nikaido M, Hamilton H, Goto M, Kato H, Kanda N, Pastene LA, Cao Y, Fordyce RE, Hasegawa M, et al. (2005) Syst Biol 54:77-90.

12. Posada D, Crandall KA (1998) Bioinformatics 14:817-818.

13. Swofford DL (1998) PAUP*: Phylogenetic Analysis Using Parsimony and Other Methods (Sinauer, Sunderland, MA).

14. Martin A, Palumbi SR (1993) Proc Natl Acad Sci USA 90:4087-4091.

15. Gillooly JF, Allen AP, West GB, Brown JH (2005) Proc Natl Acad Sci USA 102:140-145.

16. Pesole G, Gissi C, DeChirico A, Saccone C (1999) J Mol Evol 48:427-434.

17. Kuhner MK (2006) Bioinformatics 22:768-770.

18. Rambaut A, Drummond AJ (2003) Tracer (Univ of Oxford, Oxford), Version 1.3.

19. Stoneking M (2000) Am J Hum Genet 67:1029-1032.

20. Rice DW, Wolman AA (1971) The Life History and Ecology of the Gray Whale (Eschrichtius robustus). (Am Soc of Mammalog), Special Publication No. 3.

21. Heppell SS, Caswell H, Crowder LB (2000) Ecology 81:654-665.

22. Excoffier L, Novembre J, Schneider S (2000) J Hered 91:506-509.

23. Belle EMS, Ramakrishnan U, Mountain JL, Barbujani G (2006) Proc Natl Acad Sci USA 103:8012-8017.

24. Overpeck J, et al. (1997) Science 278:1251-1256.

25. Johnson K, Nelson C (1984) Science 225:1150-1152.

26. Nerini M (1984) in The Gray Whale (Eschrichtius robustus), eds Jones ML, Swartz SL, Leatherwood S (Academic, New York), pp 423-450.

27. Nelson C, Johnson K, Barber J (1987) J Sediment Petrol 57:419-430.

28. Eberl D (2004) Am Mineral 89:1784-1794.

29. Obst B, Hunt G (1990) Auk 107:678-688.

30. Palumbi SR, Baker CS (1994) Mol Biol Evol 11:426-435.

31. Lyons LA, Laughlin TF, Copeland NG, Jenkins NA, Womack JE, O'Brien SJ (1997) Nat Genet 15:47-56.

32. Rychel AL, Reeder TW, Berta A (2004) Mol Phylogenet Evol 32:892-901.

33. Venta PJ, Brouillete JA, Yuzbasiyanurkan V, Brewer GJ (1996) Biochem Genet 34:321-341.

34. Hasegawa M, Kishino H, Yano T (1985) J Mol Evol 22:160-174.

35. Felsenstein J (1981) J Mol Evol 17:368-376.

36. Kimura M (1980) J Mol Evol 16:111-120.

37. Tamura K, Nei M (1993) Mol Biol Evol 10:512-526.

38. Valsecchi A, Amos W (1996) Mol Ecol 5:151-156.

39. Palsboll P, Berube M, Larsen AH, Jorgensen H (1997) Mol Ecol 6:893-895.

40. Berube M, Jorgensen H, McEwing R, Palsboll PJ (2000) Mol Ecol 9:2181-2183.

41. Buchanan FC, Friesen MK, Littlejohn RP, Clayton JW (1996) Mol Ecol 5:571-575.

42. Waldick RC, Brown MW, White BN (1999) Mol Ecol 8:1763-1765.

43. Frasier TR, Rastogi T, Brown MW, Hamilton PK, Kraus SD, White BN (2006) Mol Ecol Notes 6:1025-1029.