Skip to main content
Genetics logoLink to Genetics
. 2012 Mar;190(3):1145–1148. doi: 10.1534/genetics.111.136697

Extended Linkage Disequilibrium in Noncoding Regions in a Conifer, Cryptomeria japonica

Etsuko Moritsuka *,1, Yosuke Hisataka †,1, Miho Tamura , Kentaro Uchiyama , Atsushi Watanabe §, Yoshihiko Tsumura , Hidenori Tachida *,2
PMCID: PMC3296249  PMID: 22209904

Abstract

We measured linkage disequilibrium in mostly noncoding regions of Cryptomeria japonica, a conifer belonging to Cupressaceae. Linkage disequilibrium was extensive and did not decay even at a distance of 100 kb. The average estimate of the population recombination rate per base pair was 1.55 × 10−5 and was <1/70 of that in the coding regions. We discuss the impact of low recombination rates in a large part of the genome on association studies.


LINKAGE disequilibrium (LD) in the coding genes and their surrounding regions of conifers has been reported to extend to only several hundred to a few thousand base pairs [Brown et al. (2004); Heuertz et al. (2006); Pyhäjärvi et al. (2007), but see Pyhäjärvi et al. (2011) for LD extending to several kilobases]. This observation led to the conclusion that conifers are not suitable for finding associations with traits using a genomic scan (Neale and Savolainen 2004). However, if the genome-wide extent of LD is not the same as that in the regions studied thus far, the strategy may change. Indeed, the extents of LD differ between genomic regions in maize possibly due to differences in the recombination rate (Gore et al. 2009). Furthermore, conifers generally have lower recombination rates than angiosperms (Jaramillo-Correa et al. 2010). Thus, it is important to know the extent of LD in regions other than the coding genes and their surrounding regions. However, the extent of LD in such regions has not been investigated in conifers to our knowledge possibly because of the lack of sequence information on those regions [but see Hamberger et al. (2009); Keeling et al. (2010); Kovach et al. (2010)].

We recently constructed a bacterial artificial chromosome (BAC) library for Cryptomeria japonica, a conifer belonging to Cupressaceae sensu lato (Gadek et al. 2000; Kusumi et al. 2000) and determined the sequences of eight random clones from the library. We used these sequences to design primers and resequenced fragments in those regions using the same tree samples used by Fujimoto et al. (2008) for the study of the coding regions. From these data, we investigated the extent of LD at a scale of up to 100 kb in the mostly noncoding regions covered by those BACs (see Supporting Information, File S1, for more details on materials and methods).

First, we give a summary of the sequence analysis of the BAC clones relevant to this study. Details of the analysis will be published elsewhere (M. Tamura, A. Watanabe, K. Uchiyama, N. Futamura, K. Shinohara, Y. Tsumura, H. Tachida, unpublished results). Eight random clones (BAC1–BAC8) from the BAC library developed by the Forestry and Forest Products Research Institute in Japan were sequenced. The total length of the sequences was ∼800 kb. These sequences were mostly noncoding, and at least 51% of them consisted of known (the majority being LTR retrotransposons) or unknown repeats. Only part (four exons) of one putative protein-coding gene, homologous to genes coding for calcium-dependent protein kinases, was found in BAC6. The ratio of nonsynonymous-to-synonymous divergence between the exons and the corresponding regions in Taxodium distichum, a close relative of C. japonica, was 0.086, suggesting that the gene was active when the two species diverged. The size of the largest intron of this gene was 70 kb (see Figure 1).

Figure 1 .

Figure 1 

Positions of the regions in BAC3, BAC6, and BAC7 used for the study. The regions sequenced for at least 41 samples are shown by “| |,” and the positions of the exons of a putative coding gene are indicated by solid boxes (for primer information, see Table S1). We expected to amplify only one copy of each fragment if it existed in a single copy in the genome because we used haploid genomic DNA from megagametophytes. However, because the genome of C. japonica contained many copies of very similar fragments, only sequences of several fragments each in BAC3, BAC6, and BAC7 indicated in the figure could be determined for at least 40 samples by direct sequencing. We used long PCR and gel electrophoresis to confirm the distances between the sites in the sequenced fragments. The lengths of the PCR products approximately agreed with those calculated from the assembled BAC sequences (data not shown).

In the resequencing experiment, in total, 19 fragments, including the coding regions in BAC6, could be sequenced for at least 40 samples, and their total length was 15,517 bp (Figure 1). We will call them the BAC regions hereafter. The fragment size ranged from 509 to 2016 bp, and the maximum lengths between pairs of sites in the sequenced regions in BAC3, BAC6, and BAC7 were 65,409, 110,613 and 103,427 bp, respectively.

We aligned those 40 or more sequences along with the corresponding sequence of the BACs. Thus, the total number of samples was at least 41. We computed various statistics of genetic diversity excluding sites involving indel variation (Table S2) using DNAsp 5.0 (Rozas et al. 2003). The average nucleotide diversity across all sites was 0.35%, which was similar to those at the silent sites in the mostly coding regions of the same species studied by Kado et al. (2003) and Fujimoto et al. (2008).

Figure 2 shows the squared allelic correlation coefficient, r2, against the number of base pairs separating the two sites. It decayed very little in all three BACs. Thus, LD extended to sites separated by up to 110 kb. We also estimated the population recombination rate, 4Ner, where Ne and r are the effective size of the population and the recombination rate, respectively, in the BAC regions using the composite-likelihood method (Hudson 2001; McVean et al. 2002) implemented in LDhat 2.1. The results for the BAC regions are shown in Table 1, along with those for the five coding genes previously studied by Fujimoto et al. (2008). The composite-likelihood curves for the data set are shown in Figure S1 (the BACs) and Figure S2 (the coding genes). The average r/u and 4Ner per base was 5.04 × 10−3 and 1.55 × 10−5, respectively, for the BAC regions and 4.46 × 10−1 and 1.18 × 10−3, respectively, for the five coding genes. Thus, the population recombination rate seems to be <1/70 in the BAC regions compared to that in the coding genes studied by Fujimoto et al. (2008).

Figure 2 .

Figure 2 

The levels of linkage disequilibrium in BAC3, BAC6, and BAC7. The squared allelic correlation coefficient, r2, for two sites is plotted against the number of bases separating them for BAC3 (A), BAC6 (B), and BAC7 (C).

Table 1 . Estimates of population recombination rate.

Region Length (bp) RM 4Nr 4Nr/bp 4Nu/bp r/u
BACs
 BAC3 65,411 0 0 0 0.00659 0
 BAC6 110,278 1 3 2.72 × 10−5 0.00317 0.00858
 BAC7 103,427 7 2 1.93 × 10−5 0.00296 0.00653
Mean 1.55 × 10−5 0.00346 0.00504
Coding genes
 NCED 2,125 2 3 0.00141 0.00213 0.66280
 AMT 2,682 2 2 0.00075 0.00093 0.80184
 Cal 3,541 2 3 0.00085 0.00454 0.18661
 AQU 1,787 1 4 0.00223 0.00469 0.47727
 Cryj2 4,257 6 3 0.00070 0.00681 0.10348
Mean 0.00118 0.00382 0.44640

One explanation for the difference of LD between the BACs and the coding regions is the difference in the recombination rate. In angiosperms, gene density and the recombination rate are positively correlated, possibly due to low gene density and low recombination rates in heterochromatic regions (Gaut et al. 2007). Therefore, if the genome of C. japonica is mostly heterochromatic, we expect to observe low recombination rates and low gene density in random BAC clones. Alternatively, the effective size, Ne, in the BAC regions might be smaller than that in the coding genes because of genetic draft (Gillespie 2000), background selection (Charlesworth et al. 1993), and/or the weak selection Hill–Robertson effect (McVean and Charlesworth 2000). However, because the levels of the silent nucleotide diversity in the two regions were comparable, this explanation seemed unlikely.

The slow decay of LD in the BAC regions is quite different from the rapid decay of LD within 500–2000 bp observed in Pinaceae and poplars (Brown et al. 2004; Ingvarsson 2005, 2008; Krutovsky and Neale 2005; González-Martínez et al. 2006; Heuertz et al. 2006; Pyhäjärvi et al. 2007; Ingvarsson et al. 2008; Li et al. 2010) although more extensive LD has been found in a few cases (Eckert et al. 2010; Pyhäjärvi et al. 2011). C. japonica is distantly related to these species (Chaw et al. 2000), and the difference may be ascribed to differences in the species studied; however, a more plausible explanation would be the differences in the regions studied. The other studies examined relatively narrow regions that included coding genes [except for the 80-kb gene-rich region surrounding the phytochrome B2 locus in aspen studied by Ingvarsson et al. (2008)], while our BAC regions were in mostly noncoding regions. Indeed, the extensive linkage disequilibrium found here might be consistent with the interlocus linkage disequilibrium found in coastal Douglas fir (Eckert et al. 2009) and in Pinus taeda (Eckert et al. 2010).

On the basis of the rapid decay of linkage disequilibrium in conifers, Neale and Savolainen (2004) concluded that genome-wide association studies would not be possible in conifers because of the enormous SNP marker density required (see also Neale and Kremer 2011). However, the number of necessary SNP markers may not be as large if most of the genome is segregating as blocks, as found in the BAC regions studied here. Thus, genome-wide association studies may be feasible in conifers whose genome sizes are generally large (Ohri and Khoshoo 1986), although it would be difficult to locate the causal polymorphism if LD is extensive (Atwell et al. 2010). Nevertheless, identifying causal polymorphisms is not necessary for genomic selection (Meuwissen et al. 2001). Of course, we need to choose the positions of SNP markers carefully so that they are spaced appropriately in terms of recombination rates by identifying recombination-poor and -rich regions (see Flint-Garcia et al. 2003). If heterochromatic regions are recombination-poor in conifers as they are in angiosperms (Gaut et al. 2007), then identifying heterochromatic regions is important for designing efficient markers.

In this article, we have shown that linkage disequilibrium in the BAC regions was much more extensive than that in the coding regions. However, our data are restricted to those from only three random BAC clones. We need to test the generality of these features in future studies.

Supplementary Material

Supporting Information

Acknowledgments

We thank two anonymous referees for their helpful comments. This study was partially supported by the Program for Promotion of Basic and Applied Researches for Innovations in Bio-oriented Industry and by a Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science (no. 22370083).

Footnotes

Communicating editor: O. Savolainen

Literature Cited

  1. Atwell S., Huang Y. S., Vilhjlmsson B. J., Willems G., Horton M., et al. , 2010.  Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465: 627–631 [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Brown G. R., Gill G. P., Kuntz R. J., Langley C. H., Neale D. B., 2004.  Nucleotide diversity and linkage disequilibrium in loblolly pine. Proc. Natl. Acad. Sci. USA 101: 15255–15260 [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Charlesworth B., Morgan M. T., Charlesworth D., 1993.  The effect of deleterious mutations on neutral molecular variation. Genetics 134: 1289–1303 [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Chaw S. M., Parkinson C. L., Cheng Y. C., Vincent T. M., Palmer J. D., 2000.  Seed plant phylogeny inferred from all three plant genomes: monophyly of extant gymnosperms and origin of Gnetales from conifers. Proc. Natl. Acad. Sci. USA 97: 4086–4091 [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Eckert A. J., Wegrzyn J. L., Pande B., Jermstad K. D., Lee J. M., et al. , 2009.  Multilocus patterns of nucleotide diversity and divergence reveal positive selection at candidate genes related to cold hardiness in coastal Douglas fir (Pseudotsuga menziesii var. menziesii). Genetics 183: 289–298 [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Eckert A. J., Bower A. D., González-Martínez S. C., Wegrzyn J. L., Coop G., et al. , 2010.  Back to nature: ecological genomics of loblolly pine (Pinus taeda, Pinaceae). Mol. Ecol. 19: 3789–3805 [DOI] [PubMed] [Google Scholar]
  7. Flint-Garcia S. A., Thornsberry J. M., Buckler E. S., 2003.  Structure of linkage disequilibrium in plants. Annu. Rev. Plant Biol. 54: 357–374 [DOI] [PubMed] [Google Scholar]
  8. Fujimoto A., Kado T., Yoshimaru H., Tsumura Y., Tachida H., 2008.  Adaptive and slightly deleterious evolution in a conifer, Cryptomeria japonica. J. Mol. Evol. 67: 201–210 [DOI] [PubMed] [Google Scholar]
  9. Gadek P. A., Alpers D. L., Heslewood M. M., Quinn C. J., 2000.  Relationships within Cupressaceae sensu lato: a combined morphological and molecular approach. Am. J. Bot. 87: 1044–1057 [PubMed] [Google Scholar]
  10. Gaut B. S., Wright S. I., Rizzon C., Dvorak J., Anderson L. K., 2007.  Recombination: an underappreciated factor in the evolution of plant genomes. Nat. Rev. Genet. 8: 77–84 [DOI] [PubMed] [Google Scholar]
  11. Gillespie J. H., 2000.  Genetic drift in an infinite population: the pseudohitchhiking model. Genetics 155: 909–919 [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Gonzalez-Martinez S. C., Ersoz E., Brown G. R., Wheeler N. C., Neale D. B., 2006.  DNA sequence variation and selection of tag single-nucleotide polymorphisms at candidate genes for drought-stress response in Pinus taeda L. Genetics 172: 1915–1926 [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Gore M. A., Chia J. M., Elshire R. J., Sun Q., Grills G. S., et al. , 2009.  The first generation haplotype map of maize. Science 326: 1115–1117 [DOI] [PubMed] [Google Scholar]
  14. Hamberger B., Hall D., Yuen M., Oddy C., Hamberger B., et al. , 2009.  Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome. BMC Plant Biol. 9: 106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Heuertz M., De Paoli E., Källman T., Larsson H., Jurman I., et al. , 2006.  Multilocus patterns of nucleotide diversity, linkage disequilibrium and demographic history of Norway spruce (Picea abies [L.] Karst). Genetics 174: 2095–2105 [DOI] [PMC free article] [PubMed] [Google Scholar]
  16. Hudson R. R., 2001.  Two-locus sampling distributions and their application. Genetics 159: 1805–1817 [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Ingvarsson P. K., 2005.  Nucleotide polymorphism and linkage disequilibrium within and among natural populations of European Aspen (Populus tremula, L., Salicaceae). Genetics 169: 945–953 [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Ingvarsson P. K., 2008.  Multilocus patterns of nucleotide polymorphism and the demographic history of Populus tremula. Genetics 180: 329–340 [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Ingvarsson P. K., Victoria Garcia M., Luquez V., Hall D., Jansson S., 2008.  Nucleotide polymorphism and phenotypic associations within and around the phytochrome B2 locus in European Aspen (Populus tremula, Salicaceae). Genetics 178: 2217–2226 [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Jaramillo-Correa J. P., Verdú M., González-Martínez S. C., 2010.  The contribution of recombination to heterozygosity differs among plant evolutionary lineages and life-forms. BMC Evol. Biol. 10: 22. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Kado T., Yoshimaru H., Tsumura Y., Tachida H., 2003.  DNA variation in a conifer, Cryptomeria japonica (Cupressaceae sensu lato). Genetics 164: 1547–1559 [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Keeling C. I., Dullat H. K., Yuen M., Ralph S. G., Jancsik S., et al. , 2010.  Identification and functional characterization of monofunctional ent-copalyl diphosphate and ent-kaurene synthases in white spruce reveal different patterns for diterpene synthase evolution for primary and secondary metabolism in gymnosperms. Plant Physiol. 152(3): 1197–1208 [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Kovach A., Wegrzyn J. L., Parra G., Holt C., Bruening G. E., et al. , 2010.  The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences. BMC Genomics 11: 420. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Krutovsky K. V., Neale D. B., 2005.  Nucleotide diversity and linkage disequilibrium in cold-hardiness- and wood quality-related candidate genes in Douglas fir. Genetics 171: 2029–2041 [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Kusumi J., Tsumura Y., Yoshimaru H., Tachida H., 2000.  Phylogenetic relationships in Taxodiaceae and Cupressaceae sensu stricto based on matK gene, chlL gene, trnL-trnF IGS region, and trnL intron sequences. Am. J. Bot. 87: 1480–1488 [PubMed] [Google Scholar]
  26. Li Y., Stocks M., Hemmilä S., Källman T., Zhu H. T., et al. , 2010.  Demographic histories of four spruce (Picea) species of the Qinghai-Tibetan Plateau and neighboring areas inferred from multiple nuclear loci. Mol. Biol. Evol. 27(5): 1001–1014 [DOI] [PubMed] [Google Scholar]
  27. McVean G. A. T., Charlesworth B., 2000.  The effects of Hill-Robertson interference between weakly selected mutations on patterns of molecular evolution and variation. Genetics 155: 929–944 [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. McVean G., Awadalla P., Fearnhead P., 2002.  A coalescent-based method for detecting and estimating recombination rates from gene sequences. Genetics 160: 1231–1241 [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Meuwissen T. H. E., Hayes B. J., Goddard M. E., 2001.  Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829 [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Neale D. B., Kremer A., 2011.  Forest tree genomics: growing resources and applications. Nat. Rev. Genet. 12: 111–122 [DOI] [PubMed] [Google Scholar]
  31. Neale D. B., Savolainen O., 2004.  Association genetics of complex traits in conifers. Trends Plant Sci. 9: 325–330 [DOI] [PubMed] [Google Scholar]
  32. Ohri D., Khoshoo T. N., 1986.  Genome size in gymnosperms. Plant Syst. Evol. 153: 119–132 [Google Scholar]
  33. Pyhäjärvi T., García-Gil M. R., Knürr T., Mikkonen M., Wachowiak W., et al. , 2007.  Demographic history has influenced nucleotide diversity in European Pinus sylvestris populations. Genetics 177: 1713–1724 [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Pyhäjärvi T., Kujala S. T., Savolainen O., 2011.  Revisiting protein heterozygosity in plants: nucleotide diversity in allozyme coding genes of conifer Pinus sylvestris. Tree Genet. Genomes 7: 385–397 [Google Scholar]
  35. Rozas J., Sanchez-DelBarrio J. C., Messeguer X., Rozas R., 2003.  DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19: 2496–2497 [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supporting Information

Articles from Genetics are provided here courtesy of Oxford University Press

RESOURCES