In 1997, a group of scientists met in Singapore and agreed to collaborate on sequencing the rice genome. At the outset, they agreed to work on a single cultivar, to share materials, to use a clone-by-clone approach, and to adopt a policy of immediate sequence release (Sasaki and Burr, 2000). Although the ultimate goal of the International Rice Genome Sequencing Project (IRGSP) is a finished-quality sequence of the complete genome, the group has adopted the interim milestone of phase 2 quality for the complete genome by the end of 2002 (http://rgp.dna.affrc.go.jp/rgp/press_releas20011225.htm). Phase 2 quality refers to sequenced bacterial artificial chromosomes (BACs) or P1-derived artificial chromosomes (PACs) that have few sequencing gaps and whose sequence pieces are ordered and oriented. Currently, an estimated 206 Mb of nonoverlapping, assembled phase 3 (finished-quality) or phase 2 rice genomic sequence is available in public databases, and three chromosomes are nearing completion (http://rgp.dna.affrc.go.jp/cgi-bin/statusdb/seqcollab.pl). Further details of the genome-sequencing methods used by the IRGSP were described by Eckardt (2000).
Having a robust physical map is critical to the success of the project. From the outset, this project had the advantage of having an extremely well-mapped plant genome. About 1450 genetically mapped expressed sequence tags (ESTs) were used previously to anchor a yeast artificial chromosome (YAC)–based physical map that covered 63% of the genome (Saji et al., 2001; http://rgp.dna.affrc.go.jp/Publicdata.html). YACs, BACs, and PACs are large clones generally containing inserts of >100 kb, and up to 1.5 Mb in the case of some YACs. The ESTs also have been instrumental in anchoring the BAC- and PAC-based maps.
In this issue of The Plant Cell, Wu et al. (pages 525–535) and Chen et al. (pages 537–545) present two complementary studies that greatly increase the number of mapped ESTs and refine the physical maps to provide 90% coverage of the rice genome. In addition, they provide new estimates of the size of the rice genome and indicate the distribution of genes on the rice chromosomes.
A COMPREHENSIVE RICE TRANSCRIPT MAP
In the first article, Wu and colleagues obtained 3′ end sequences from >20,000 clones of their rice cDNA libraries. 3′ end sequences tend to be the least redundant, because 3′ untranslated regions are less conserved among members of a gene family, and thus they are the most likely to yield gene-specific markers. From these, the authors selected 8440 unique sequences as templates for designing polymerase chain reaction (PCR) primers. After screening, they retained 6713 sequences that amplified a single band of the predicted size from both rice genomic DNA and the pooled YAC library. They then screened pools from the YAC library to identify the individual YAC clones containing the EST markers. Most of these markers identified YACs that were already part of the physical map, so the markers could be mapped immediately. Approximately 1500 ESTs identified YACs that had not previously been placed on the physical map. A subset of 431 ESTs was mapped genetically, which allowed more clones to be placed on the YAC-based physical map and increased its coverage from 63 to 80%. Finally, a centromere-specific primer was used to identify YACs covering 11 of the 12 rice centromeres. In the end, an additional 6591 EST markers were placed on the physical map. This high marker density will be important in identifying BACs that fill the remaining gaps in the tiling path and in anchoring unplaced BAC contigs (a contig is a contiguous set of overlapping clones or sequences).
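Pooled screening is what makes thousands of markers tractable: each EST primer pair is tested on a modest number of pools rather than on every YAC, and the positive pools are intersected to locate the clone. The pooling design Wu et al. used is not described here, so the sketch below assumes a hypothetical three-dimensional scheme in which each YAC has a (plate, row, column) address; the library contents and marker names are invented for illustration.

```python
from itertools import product

# Hypothetical pooling design: each YAC has a (plate, row, column) address and
# contributes DNA to one plate pool, one row pool, and one column pool.
Address = tuple[int, str, int]

def pool_results(library: dict[Address, set[str]], marker: str):
    """Simulate PCR on the pools: a pool is positive if any YAC in it carries the marker."""
    plates, rows, cols = set(), set(), set()
    for (plate, row, col), markers in library.items():
        if marker in markers:
            plates.add(plate)
            rows.add(row)
            cols.add(col)
    return plates, rows, cols

def candidate_yacs(plates: set[int], rows: set[str], cols: set[int]) -> list[Address]:
    """Deconvolve the pools: a YAC can carry the marker only if its plate,
    row, and column pools all amplified."""
    return sorted(product(sorted(plates), sorted(rows), sorted(cols)))

# Toy library with hypothetical marker names.
library = {
    (1, "A", 1): {"EST-0001"},
    (1, "B", 2): {"EST-0002"},
    (2, "A", 2): {"EST-0002", "EST-0003"},
}

print(candidate_yacs(*pool_results(library, "EST-0001")))  # [(1, 'A', 1)]
print(candidate_yacs(*pool_results(library, "EST-0002")))  # 4 candidates, 2 of them false
```

When a marker is present in more than one YAC, the intersection also yields addresses that combine pools from different positive clones (two of the four candidates for EST-0002 above are false), so ambiguous hits have to be confirmed on individual clones.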
Previous work ties the physical map to the genetic map so that the two can be aligned. Because the genetic distance is not uniform with respect to the physical distance, these two maps show different spacing between the markers. Typically, recombination is reduced around centromeres so that genetic distances tend to be condensed, whereas they are more spread out on the chromosome arms. The work of Wu et al. (2002) presents a detailed view of the arrangement of transcribed genes along the length of the physical map. The primary lesson is that gene density varies from chromosome to chromosome and within chromosomes. The three largest chromosomes constitute 31% of the physical map but contain 41% of the EST sites. Within chromosomes, fewer ESTs are found in the vicinity of centromeres, and gene densities generally are highest at the distal ends of the chromosome arms. Similarly skewed gene distributions have been inferred in maize and wheat. Fifty-nine gene-rich regions were identified on the chromosome arms, and the authors estimate that 21% of the rice genome could contain 40% of the genes.
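The comparison of map share with EST share can be read as an enrichment ratio, and gene-rich regions can be sought by scanning windows along a chromosome. The sketch below computes both; the window size, the twofold threshold, and the positions are hypothetical, and Wu et al.'s actual criteria for defining gene-rich regions are not given here.

```python
def enrichment(frac_ests: float, frac_map: float) -> float:
    """Observed share of EST sites divided by the share expected from physical length."""
    return frac_ests / frac_map

def dense_windows(positions_mb, chrom_len_mb, window_mb, genome_density, factor=2.0):
    """Return (start, end, count) for consecutive windows whose EST density
    (sites per Mb) is at least `factor` times the genome-wide density."""
    hits = []
    start = 0.0
    while start < chrom_len_mb:
        end = min(start + window_mb, chrom_len_mb)
        count = sum(start <= p < end for p in positions_mb)
        if count / (end - start) >= factor * genome_density:
            hits.append((start, end, count))
        start = end
    return hits

# Figures from the text: 41% of EST sites on 31% of the physical map.
print(round(enrichment(0.41, 0.31), 2))  # 1.32

# Hypothetical 10-Mb chromosome with EST sites clustered near both ends.
sites = [0.1, 0.2, 0.5, 0.9, 4.5, 8.2, 8.4, 8.6, 9.1, 9.5]
print(dense_windows(sites, 10.0, 2.0, genome_density=0.5))
# [(0.0, 2.0, 4), (8.0, 10.0, 5)]
```

In this toy case only the two distal windows pass the threshold, mirroring the pattern of gene-rich chromosome ends described above.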
AN INTEGRATED PHYSICAL AND GENETIC MAP
BACs and PACs are the primary templates for the clone-by-clone sequencing approach. The article by Chen et al. (2002) represents the culmination of several years of work by Rod Wing and his colleagues at the Clemson University Genomics Institute (CUGI) to create a BAC-based physical map of rice that covers 100% of the euchromatin and >90% of the genome. Two BAC libraries, constructed from HindIII and BamHI partial digests, together represent 25-fold coverage of the genome. Both ends of each clone were sequenced, yielding ∼110,000 end sequences called sequence tag connectors (STCs). STCs are used to pick clones that flank sequenced BACs with minimal overlap. The standard approach in building a physical map is DNA fingerprinting. In this method, each BAC is digested to completion with a restriction enzyme, and the fragments are separated by high-resolution gel electrophoresis alongside molecular size standards so that they can be sized accurately. A collection of 25 or more sized fragments becomes the fingerprint for each BAC clone. FPC (FingerPrinted Contigs) software (Soderlund et al., 2000) compares the fingerprints to find overlapping BACs and assembles them into contigs. These assemblies then must be examined and edited manually. Additionally, primer probes were developed from end sequences of terminal clones in many of the assemblies to identify overlapping clones or contigs. The contigs were placed on the genetic map by probing with mapped markers from the IRGSP. Carol Soderlund, the author of FPC, has written software that generates DNA fingerprints from sequenced BACs as they appear in GenBank and brings them into the assemblies, further anchoring the sequenced BACs within the physical map.
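Two computations underlie this approach: deriving a fingerprint (the idea behind generating fingerprints from sequenced BACs) and deciding whether two fingerprints share more bands than chance would predict. The sketch below is a simplified illustration, not FPC's implementation: it digests a sequence in silico at HindIII sites (A^AGCTT), counts bands that agree within a sizing tolerance, and estimates the chance of that many matches with a binomial model in the spirit of the Sulston coincidence score used by FPC. It works in base pairs rather than gel migration units, and the tolerance, gel length, and band values are assumptions.

```python
from math import comb

def hindiii_fragments(seq: str) -> list[int]:
    """In silico HindIII digest (A^AGCTT) of a linear sequence; returns sorted fragment lengths in bp."""
    seq = seq.upper()
    cuts = []
    i = seq.find("AAGCTT")
    while i != -1:
        cuts.append(i + 1)  # HindIII cuts between the first A and AGCTT
        i = seq.find("AAGCTT", i + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)

def shared_bands(a: list[float], b: list[float], tol: float) -> int:
    """Greedy one-to-one count of bands that agree within +/- tol (both lists scanned once)."""
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches

def coincidence_probability(n_a: int, n_b: int, m: int, tol: float, gel_len: float) -> float:
    """Chance that at least m of clone B's n_b bands fall within tol of one of clone A's
    n_a bands, if band values were independent and uniform over gel_len (simplified model)."""
    p = 1 - (1 - 2 * tol / gel_len) ** n_a  # chance that a single band matches by accident
    return sum(comb(n_b, k) * p**k * (1 - p) ** (n_b - k) for k in range(m, n_b + 1))

print(hindiii_fragments("GGAAGCTTCCCAAGCTTGG"))  # [3, 7, 9]
print(shared_bands([100, 200, 300], [102, 250, 299], tol=5))  # 2
# Two hypothetical 25-band fingerprints sharing 15 bands: the chance of this
# happening by coincidence is tiny, so an overlap would be inferred.
print(coincidence_probability(25, 25, 15, tol=7, gel_len=3300))
```

A low coincidence probability is evidence of real overlap; in practice a cutoff on this kind of score decides which clones are joined, and the manual editing mentioned above catches the errors such a model cannot.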
The Monsanto Company conducted an independent genome sequence project for rice (Barry, 2001) in which >3000 BACs were sequenced to a fivefold level of redundancy. Monsanto generously made these sequences available to the IRGSP and to public researchers. As the IRGSP brings the sequence quality of Monsanto BACs to phase 2, the sequences are released to public databases. Brad Barbazuk of Monsanto was able to relate independently fingerprinted and assembled Monsanto BAC contigs to the CUGI assemblies by finding high-quality matches between the CUGI STCs and sequences in the assembled Monsanto BACs. This permitted the integration of the Monsanto BACs into the CUGI physical map.
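The integration amounts to a join between two maps: when an STC from a CUGI clone matches a Monsanto BAC with high quality, the CUGI contig containing that clone and the Monsanto contig containing that BAC can be linked, and several independent hits make the link more trustworthy. The matching method and thresholds Barbazuk used are not described here; the sketch below assumes exact matching on either strand as a stand-in for an alignment search such as BLAST, and all clone and contig names are hypothetical.

```python
import random
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def link_contigs(stcs, stc_contig, bac_seqs, bac_contig, min_len=30):
    """Count STC hits supporting each (CUGI contig, Monsanto contig) pair.

    stcs: STC name -> end sequence; stc_contig: STC name -> CUGI contig
    bac_seqs: Monsanto BAC name -> sequence; bac_contig: BAC name -> Monsanto contig
    An exact match on either strand stands in for a high-quality alignment.
    """
    links = Counter()
    for name, seq in stcs.items():
        if len(seq) < min_len:
            continue  # too short to be a reliable anchor
        rc = reverse_complement(seq)
        for bac, bac_seq in bac_seqs.items():
            if seq in bac_seq or rc in bac_seq:
                links[(stc_contig[name], bac_contig[bac])] += 1
    return links

# Toy data with hypothetical names; random sequences stand in for real clones.
rng = random.Random(0)

def random_dna(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))

bac_seqs = {"mBAC1": random_dna(500), "mBAC2": random_dna(500)}
bac_contig = {"mBAC1": "M-ctg7", "mBAC2": "M-ctg7"}
stcs = {
    "a01F": bac_seqs["mBAC1"][100:150],                      # forward-strand hit
    "a02R": reverse_complement(bac_seqs["mBAC2"][200:250]),  # reverse-strand hit
    "a03F": random_dna(50),                                  # no hit expected
}
stc_contig = {"a01F": "C-ctg12", "a02R": "C-ctg12", "a03F": "C-ctg40"}
print(link_contigs(stcs, stc_contig, bac_seqs, bac_contig))
```

Here two independent STCs tie the same CUGI contig to the same Monsanto contig, which is the kind of consistent evidence that would justify merging the two assemblies.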
RICE GENOME SIZE
Estimates of the genome size of rice and the physical length of rice chromosomes are important to rice genome sequencers, who need to know how much must be sequenced. Arumuganathan and Earle (1991) reported that the 2C (twice the gametic) value for Oryza sativa ssp. japonica ranged from 0.86 to 0.91 pg and that the haploid genome size was 430 Mb. Arumuganathan (personal communication) later measured O. sativa ssp. japonica cv Nipponbare and found a 2C value of 0.90 ± 0.02 pg. Assuming a mass of 650 D per base pair, this value places the haploid genome of cv Nipponbare at 417 Mb.
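The conversion from picograms to megabases is short arithmetic: halve the 2C value to get the 1C (haploid) mass, convert to grams, multiply by Avogadro's number, and divide by the mass of one base pair (650 D, that is, 650 g/mol, as assumed in the text). A minimal sketch:

```python
AVOGADRO = 6.02214076e23  # entities per mole

def haploid_genome_mb(two_c_pg: float, daltons_per_bp: float = 650.0) -> float:
    """Convert a 2C nuclear DNA content (pg) to haploid genome size (Mb).

    1C mass = 2C / 2; base pairs = mass in grams * Avogadro / (g/mol per bp).
    """
    one_c_grams = (two_c_pg / 2) * 1e-12
    base_pairs = one_c_grams * AVOGADRO / daltons_per_bp
    return base_pairs / 1e6

print(round(haploid_genome_mb(0.90)))  # 417
```

Under the same assumption, the ±0.02 pg uncertainty corresponds to roughly 408 to 426 Mb.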
Chen et al. (2002) estimate that they have covered nearly all of the euchromatic portions of the genome. The BAC contigs are anchored to the genetic map by mapped markers common to the physical and genetic maps, and a genetic distance for each gap between contigs can be measured. Using a local ratio of physical distance to genetic distance, the sizes of the gaps (in base pairs) can be estimated. When the BAC contigs and gaps are totaled, the estimate for the genome size comes to 403 Mb. Chen et al. (2002) allow that their physical map does not cover the nucleolar organizer region at the end of chromosome 9. Also, the libraries apparently do not include telomeres. Furthermore, the estimates for gaps in centromeres must be approximate because there is virtually no recombination in these regions.
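The genome-size estimate from the integrated map adds two kinds of terms: the measured lengths of anchored contigs and, for each gap, its genetic distance multiplied by a local kilobase-per-centimorgan ratio taken from nearby contigs. The sketch below assumes those inputs are already in hand; the numbers are hypothetical, not values from Chen et al.

```python
from dataclasses import dataclass

@dataclass
class Gap:
    cm: float         # genetic distance between the flanking anchored contigs
    kb_per_cm: float  # local physical-to-genetic ratio near the gap

def genome_size_mb(contig_lengths_kb: list[float], gaps: list[Gap]) -> float:
    """Total of contig lengths plus gap sizes estimated as cM x local kb/cM."""
    gap_kb = sum(g.cm * g.kb_per_cm for g in gaps)
    return (sum(contig_lengths_kb) + gap_kb) / 1000

# Hypothetical values for a short stretch of one chromosome.
contigs = [1200.0, 850.0, 2300.0]
gaps = [Gap(cm=0.5, kb_per_cm=250.0), Gap(cm=1.2, kb_per_cm=300.0)]
print(genome_size_mb(contigs, gaps))  # about 4.835 (Mb)
```

The weakness noted in the text is visible in the formula: a gap spanning a centromere, where there is virtually no recombination, has a genetic distance near zero, so the product says little about its true physical size.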
The current estimate of the length of each chromosome was calculated by assuming a genome size of 430 Mb and multiplying this figure by each chromosome's fraction of the total genetic distance measured in the Nipponbare × Kasalath mapping population. Now that sequencing of chromosomes 1, 4, and 10 is nearly complete, we can see that these are reasonable estimates for most of the chromosomes; the exception is chromosome 1, whose size is not known accurately because a few gaps remain, including in the centromere region. The sequencing groups working on chromosomes 1, 4, and 10 estimate sizes of 47, 36, and 24.5 Mb, respectively; the sizes estimated from the integrated physical map are 44.3, 36.6, and 25.7 Mb.
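The chromosome-length estimate is a proportional split of the genome size: each chromosome receives the genome size multiplied by its share of the total genetic length. The sketch below uses hypothetical genetic lengths for three chromosomes standing in for the full set of 12, and it inherits the assumption that physical and genetic distance are proportional, which the uneven recombination described above makes only approximate.

```python
def chromosome_sizes_mb(genetic_lengths_cm: dict[str, float], genome_mb: float = 430.0) -> dict[str, float]:
    """Apportion a genome size among chromosomes in proportion to genetic length.

    Assumes a uniform ratio of physical to genetic distance across chromosomes,
    which is only an approximation.
    """
    total_cm = sum(genetic_lengths_cm.values())
    return {chrom: genome_mb * cm / total_cm for chrom, cm in genetic_lengths_cm.items()}

# Hypothetical genetic lengths, not real map data.
sizes = chromosome_sizes_mb({"chrA": 150.0, "chrB": 100.0, "chrC": 50.0}, genome_mb=300.0)
print(sizes)  # {'chrA': 150.0, 'chrB': 100.0, 'chrC': 50.0}
```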
The work of Wu et al. (2002) and Chen et al. (2002) represents a major contribution to the sequencing of the rice genome in that they provide tools for obtaining a minimal tiling path of minimally overlapping clones to be used for sequencing templates. They further provide a detailed transcript map for rice and confirm the size of the genome to be slightly >400 Mb. Beyond sequencing, the comprehensive EST transcript map and integrated physical and genetic maps provide valuable tools for map-based cloning and gene identification in rice and related monocot species.
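One simple way to pick a minimal tiling path, once clone positions along a contig are known, is the classic greedy interval cover: starting from one end, repeatedly choose the clone that overlaps the path so far and extends farthest. This is a sketch under assumptions, not the procedure used by either group: it minimizes the number of clones rather than overlap directly, it ignores clone quality, and the coordinates (in kb) and clone names are hypothetical.

```python
def tiling_path(clones: list[tuple[str, int, int]], contig_end: int, min_overlap: int = 0) -> list[str]:
    """Greedy interval cover of [0, contig_end).

    clones: (name, start, end) positions on the contig. The first clone must
    start at 0; each later clone must overlap the path by at least min_overlap.
    Raises ValueError if the path cannot be extended.
    """
    clones = sorted(clones, key=lambda c: c[1])  # order by start coordinate
    path: list[str] = []
    reach = 0  # right edge of the region covered so far
    i = 0
    while reach < contig_end:
        limit = reach - min_overlap if path else 0
        best = None
        # Among clones starting at or before `limit`, keep the one reaching farthest.
        while i < len(clones) and clones[i][1] <= limit:
            if best is None or clones[i][2] > best[2]:
                best = clones[i]
            i += 1
        if best is None or best[2] <= reach:
            raise ValueError(f"tiling path cannot be extended beyond {reach}")
        path.append(best[0])
        reach = best[2]
    return path

clones = [("b01", 0, 120), ("b02", 10, 90), ("b03", 100, 230),
          ("b04", 60, 200), ("b05", 190, 320), ("b06", 180, 260)]
print(tiling_path(clones, 300, min_overlap=5))  # ['b01', 'b03', 'b05']
```

Choosing the farthest-reaching clone at each step is never worse than any other choice, which is why the greedy rule yields the fewest clones for a given minimum overlap.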
References
- Arumuganathan, K., and Earle, E.D. (1991). Nuclear DNA content of some important plant species. Plant Mol. Biol. Rep. 9, 208–218.
- Barry, G. (2001). The use of the Monsanto draft rice genome sequence in research. Plant Physiol. 125, 1164–1165.
- Chen, M., et al. (2002). An integrated physical and genetic map of the rice genome. Plant Cell 14, 537–545.
- Eckardt, N.A. (2000). Sequencing the rice genome. Plant Cell 12, 2011–2017.
- Saji, S., Umehara, Y., Antonio, B.A., Yamane, H., Tanoue, H., Baba, T., Aoki, H., Ishige, N., Wu, J., Koike, K., Matsumoto, T., and Sasaki, T. (2001). A physical map with yeast artificial chromosome (YAC) clones covering 63% of the 12 rice chromosomes. Genome 44, 32–37.
- Sasaki, T., and Burr, B. (2000). International Rice Genome Sequencing Project: The effort to completely sequence the rice genome. Curr. Opin. Plant Biol. 3, 138–141.
- Soderlund, C., Humphray, S., Dunham, A., and French, L. (2000). Contigs built with fingerprints, markers and FPC V4.7. Genome Res. 10, 1772–1787.
- Wu, J., et al. (2002). A comprehensive rice transcript map containing 6591 expressed sequence tag sites. Plant Cell 14, 525–535.