Yamato et al. 10.1073/pnas.0609054104.

Supporting Information

Files in this Data Supplement:

SI Text
SI Table 3
SI Table 4
SI Figure 4
SI Data Set 1
SI Table 5
SI Table 6
SI Table 7
SI Figure 5
SI Figure 6
SI Figure 7
SI Table 8
SI Table 9




SI Figure 4

Fig. 4. Schematic illustration of the M. polymorpha Y chromosome and alignment of the sequenced PAC clones. In the 470-kb contig of YR1, PAC clones pMM4G7 and pMM2D3 were previously sequenced (black bars).





SI Figure 5

Fig. 5. Alignment of repetitive elements in M19B4.1-3 and human ODF3 (UniProt accession no. Q96PU9). Amino acid residues identical to the first element of M19B4.1-3 are indicated by colons (:), and sequence gaps are indicated by dashes (-). Regions conserved among the elements are boxed.





SI Figure 6

Fig. 6. Insertions of mitochondrial DNA into the Y chromosome. Horizontal bars represent, as indicated, the collective sequence of YR1 (except the 2.2-kb BamHI repeats), the mitochondrial sequence (GenBank accession no. NC_001660), and the combined sequence of contig A and contig B in YR2. The size of the mitochondrial sequence is exaggerated for better presentation. The bars for YR1 and YR2 are shown with tiling and representative clones. Each similarity pair with a BLASTN E value < 10⁻²⁰ is connected by a line whose color reflects percent identity: red, 95% or higher; magenta, 90% or higher but below 95%; green, below 90%.
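The drawing rule in the Fig. 6 legend can be sketched as a small function. This is only an illustration of the stated thresholds; the function names and the default cutoff argument are hypothetical and are not taken from the authors' plotting code.

```python
def identity_color(percent_identity: float) -> str:
    """Map a BLASTN percent identity to the line color used in Fig. 6."""
    if percent_identity >= 95.0:
        return "red"
    if percent_identity >= 90.0:
        return "magenta"
    return "green"


def is_drawn(e_value: float, cutoff: float = 1e-20) -> bool:
    """Only similarity pairs with an E value below the cutoff are connected."""
    return e_value < cutoff


# Two hits at the identities quoted in the SI Text for YR1 insertions
print(identity_color(94.0), identity_color(98.0))
```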





SI Figure 7

Fig. 7. Pairwise alignment of the deduced amino acid sequences of the M104E4.1 and F62B12.1 genes. Amino acid residues of F62B12.1 (GenBank accession no. AB272580) that are identical to those of M104E4.1 (GenBank accession no. AB272579) are indicated by colons. Alignment gaps are indicated by dashes. The boxed region shows the LOV domain. Sites of introns are indicated by arrowheads, and their sizes are given in nucleotides above or below the arrowheads, respectively.





Table 3. Statistics of the M. polymorpha Y chromosome

| Statistic | YR1 | YR2 |
| --- | --- | --- |
| No. of PAC clones sequenced | 28* | 58 |
| Length, bp | 3,200,899 | 5,998,135 |
| GC content, % | 45 | 43 |
| No. of elements (occurrence per 100 kb) | | |
| Gene† | 9‡ | 55 (0.9) |
| Pseudogene§ | 3 | 48 (0.8) |
| EST homolog¶ | 0 | 20 (0.3) |
| Transposable element‖ | 10** | 507 (8.5) |
| Mitochondrial DNA†† | 10** | 90 (1.5) |
| Plastid DNA†† | 0 | 3 |

*Including previously sequenced PAC clones, pMM4G7 and pMM2D3.

† Sequences with BLASTX similarity (E value < 10⁻⁵) to known amino acid sequences (excluding transposable elements) or tagged by M. polymorpha EST(s).

‡The number of gene families is given, not the actual copy number.

§ Sequences with BLASTX similarity (E value < 10⁻⁵) to known amino acid sequences (excluding transposable elements), containing in-frame stop codons or frame-shifts.

¶ Sequences similar to M. polymorpha EST(s) but not identical.
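The parenthesized YR2 values in Table 3 are occurrences per 100 kb. A minimal sketch of that normalization, using the YR2 length and counts from the table, is given below; the helper name `per_100kb` is an illustrative choice, not something defined in the paper.

```python
def per_100kb(count: int, length_bp: int) -> float:
    """Number of elements per 100 kb of sequence."""
    return count / length_bp * 100_000


YR2_LENGTH_BP = 5_998_135  # from Table 3

# Element counts for YR2, from Table 3
yr2_counts = {
    "Gene": 55,
    "Pseudogene": 48,
    "EST homolog": 20,
    "Transposable element": 507,
    "Mitochondrial DNA": 90,
}

for name, count in yr2_counts.items():
    print(f"{name}: {per_100kb(count, YR2_LENGTH_BP):.1f} per 100 kb")
```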





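Table 4. Summary of sequenced clones

| Clone | Insert size, kb* | Sequence obtained, bp† | Phase‡ | No. of contigs | Accession no. | Note |
| --- | --- | --- | --- | --- | --- | --- |
| YR1 | | | | | | |
| 470-kb Contig | | | | | | |
| pMM24-58G10 | 100 | 191,821 | 1 | 91 | AP009097 | |
| pMM2D3 | 90 | 83,249 | 1 | 6 | AF542555 - AF542560 | Ref. 1 |
| pMM23-101C6 | 125 | 171,791 | 1 | 84 | AP009098 | |
| pMM23-360C7 | 139 | 273,343 | 1 | 147 | AP009099 | |
| pMM23-118G8 | 124 | 183,849 | 1 | 89 | AP009100 | |
| pMM23-300E6 | 100 | 150,988 | 1 | 80 | AP009101 | |
| pMM4G7 | 35 | 30,156 | 1 | 2 | AB062742, AB062743 | Ref. 2 |
| Representative | | | | | | |
| pMM8H2 | 114 | 131,890 | 1 | 56 | AP009102 | |
| pMM23-205B1 | 131 | 157,789 | 1 | 36 | AP009103 | |
| pMM23-205E6 | 80 | 87,633 | 1 | 39 | AP009104 | |
| pMM23-348H6 | 140 | 150,661 | 1 | 84 | AP009105 | |
| pMM23-108B3 | 70 | 69,315 | 1 | 43 | AP009106 | |
| pMM23-145D11 | 115 | 177,070 | 1 | 94 | AP009107 | |
| pMM23-287F10 | 120 | 126,991 | 1 | 64 | AP009108 | |
| pMM23-372H9 | 120 | 192,643 | 1 | 24 | AP009109 | |
| pMM23-123D8 | 125 | 145,717 | 1 | 79 | AP009110 | |
| pMM23-437F9 | 135 | 114,822 | 1 | 65 | AP009111 | |
| pMM16A2 | 70 | 121,291 | 1 | 66 | AP009112 | |
| pMM23-198A1 | 125 | 87,433 | 1 | 35 | AP009113 | |
| pMM23-144H6 | 110 | 71,658 | 1 | 26 | AP009114 | |
| pMM23-77F9 | 70 | 54,469 | 1 | 19 | AP009115 | |
| pMM23-200H8 | 100 | 48,897 | 1 | 18 | AP009116 | |
| pMM23-277H2 | 150 | 93,343 | 1 | 32 | AP009117 | |
| pMM23-863F12 | 70 | 38,375 | 1 | 15 | AP009118 | |
| pMM23-63B5 | 120 | 84,022 | 1 | 41 | AP009119 | |
| pMM23-173B4 | 80 | 110,944 | 1 | 42 | AP009120 | |
| pMM23-291C10 | 45 | 22,655 | 1 | 9 | AP009121 | |
| pMM23-70G5 | 35 | 28,084 | 1 | 14 | AP009122 | |
| YR2 | | | | | | |
| Contig-A | | | | | AP009095 | |
| pMM23-431A8 | 110 | 90,102 | 1 | 13 | | Problematic repeats |
| pMM23-338F12 | 140 | 148,010 | 2 | 1 | | |
| pMM23-354E2 | 130 | 128,026 | 2 | 1 | | |
| pMM23-477F3 | 140 | 132,018 | 2 | 1 | | |
| pMM23-480H6 | 130 | 135,906 | 2 | 1 | | |
| pMM23-284C9 | 140 | 136,311 | 2 | 1 | | |
| pMM19B4 | 150 | 141,379 | 2 | 2 | | |
| pMM23-90G5 | 125 | 104,980 | 2 | 7 | | |
| pMM23-355D5 | 140 | 139,361 | 2 | 1 | | |
| pMM23-350E4 | 150 | 159,948 | 2 | 1 | | |
| pMM23-166C5 | 125 | 123,886 | 2 | 1 | | |
| pMM23-97D8 | 115 | 108,933 | 2 | 1 | | |
| pMM23-165G8 | 120 | 122,404 | 2 | 1 | | |
| pMM24-26B6 | 100 | 103,594 | 2 | 1 | | |
| pMM23-627E2 | 140 | 145,557 | 2 | 2 | | |
| pMM23-636F5 | 130 | 118,877 | 1 | 8 | | Problematic repeats |
| pMM23-729D7 | 150 | 127,651 | 2 | 4 | | Problematic repeats |
| pMM24-38C9 | 110 | 125,205 | 1 | 3 | | Extensive deletion |
| pMM23-547D3 | 110 | 80,900 | 2 | 2 | | |
| pMM23-666C5 | 100 | 108,374 | 2 | 1 | | |
| pMM23-217D8 | 120 | 117,581 | 2 | 1 | | |
| pMM23-354D1 | 110 | 105,780 | 2 | 1 | | |
| pMM23-212C1 | 115 | 132,531 | 2 | 1 | | |
| pMM23-507A1 | 105 | 105,648 | 2 | 1 | | |
| pMM23-589F12 | 140 | 121,325 | 1 | 26 | | Problematic repeats |
| pMM23-513H5 | 115 | 148,559 | 2 | 1 | | |
| pMM23-104E4 | 140 | 133,635 | 3 | 1 | | |
| pMM23-179A5 | 150 | 132,013 | 1 | 13 | | Problematic repeats |
| pMM23-46E1 | 170 | 177,142 | 2 | 1 | | |
| pMM23-414C2 | 85 | 110,328 | 2 | 3 | | |
| pMM23-222E12 | 100 | 95,727 | 1 | 7 | | |
| pMM23-657H8 | 110 | 101,118 | 1 | 4 | | |
| pMM23-537D7 | 90 | 97,830 | 1 | 9 | | |
| Contig-B | | | | | AP009096 | |
| pMM23-359F1 | 130 | 149,396 | 2 | 1 | | |
| pMM24-95E6 | 90 | 96,919 | 2 | 1 | | |
| pMM23-635B8 | 130 | 140,583 | 2 | 1 | | |
| pMM23-408G1 | 135 | 135,387 | 3 | 1 | | |
| pMM23-265H6 | 130 | 123,253 | 3 | 1 | | |
| pMM23-420F5 | 135 | 132,477 | 3 | 1 | | |
| bridgeM420F5-M286B9 | - | 3,354 | - | - | | |
| pMM23-286B9 | 150 | 152,281 | 3 | 1 | | |
| pMM23-578C3 | 130 | 128,465 | 3 | 1 | | |
| pMM23-88B7 | 125 | 123,297 | 3 | 1 | | |
| gap | | (70 kb) | | | | |
| pMM23-47H4 | 150 | | | | | Not sequenced due to extensive deletion during culture |
| pMM23-169B8 | 70 | 66,859 | 2 | 1 | | |
| bridgeM169B8-M402H5 | - | 5,158 | - | - | | |
| pMM23-402H5 | 130 | 122,996 | 2 | 1 | | |
| pMM23-508E5 | 160 | 152,005 | 2 | 1 | | |
| pMM23-435B3 | 90 | 97,507 | 2 | 2 | | |
| pMM23-526D2 | 180 | 185,531 | 2 | 1 | | |
| pMM23-54C8 | 105 | 116,519 | 2 | 1 | | |
| pMM23-516B5 | 125 | 135,413 | 2 | 5 | | |
| pMM23-468B3 | 75 | 72,725 | 2 | 2 | | |
| pMM23-530A10 | 145 | 146,690 | 1 | 5 | | Problematic repeats |
| pMM24-22H6 | 155 | 106,302 | 1 | 5 | | Problematic repeats |
| pMM23-313C1 | 115 | 121,246 | 2 | 1 | | |
| gap | | (4 kb) | | | | |
| pMM23-546H7 | 110 | 126,876 | 1 | 5 | | Problematic repeats |
| pMM23-41H12 | 120 | 122,191 | 2 | 1 | | |
| pMM53F7 | 120 | 38,431 | 1 | 6 | | Problematic repeats |
| pMM23-45F5 | 85 | 75,384 | 2 | 1 | | |
| bridgeM45F5-M632E5 | - | 3,587 | - | - | | |
| pMM23-632E5 | 125 | 119,747 | 1 | 4 | | Problematic repeats |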

*Estimated by pulsed-field gel electrophoresis.

†Total length of contigs after assembling. Sequences of singlets are also included for YR1 clones.

‡Definition by Venter et al. (3).

1. Ishizaki K, Shimizu-Ueda Y, Okada S, Yamamoto M, Fujisawa M, Yamato KT, Fukuzawa H, Ohyama K (2002) Nucleic Acids Res 30: 4675-4681.

2. Okada S, Sone T, Fujisawa M, Nakayama S, Takenaka M, Ishizaki K, Kono K, Shimizu-Ueda Y, Hanajiri T, Yamato KT, et al. (2001) Proc Natl Acad Sci USA 98: 9454-9459.

3. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al. (2001) Science 291: 1304-1351.





Table 5. Genes identified on the M. polymorpha Y chromosome

| Name | UniProt accession | UniProt annotation | Score | E value | Organism | Similarity to InterPro domains | Presence in female | Expression, sexual organ | Expression, thallus | ESTs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YR1 | | | | | | | | | | |
| M2D3.1 | Q9SB62 | Putative alliin lyase | 998 | 7e-97 | Arabidopsis thaliana | Alliinase EGF-like | + | + | + | |
| M2D3.2 | Q6L491 | Putative AP2 domain protein | 413 | 8e-35 | Oryza sativa | Transcriptional factor B3 | + | + | + | |
| M2D3.3 | Q9C950 | Hypothetical protein T7P1.17 | 247 | 3e-18 | Arabidopsis thaliana | | + | - | - | |
| M2D3.4 | Q8S6P9 | Hypothetical protein OJ1004_F02.2 | 223 | 9e-16 | Oryza sativa | | - | + | + | |
| M2D3.5 (ORF162) | Q5Z8R1 | Putative VIP2 protein | 174 | 2e-10 | Oryza sativa | Ring-finger | - | + | - | |
| M2D3.6 | Q658A2 | Transcriptional co-repressor-like | 620 | 5e-57 | Oryza sativa | Paired amphipathic helix | + | + | + | |
| M8H2.3 | Q7XIM6 | Glyoxalase | 235 | 2e-16 | Oryza sativa | Glyoxalase/extradiol ring-cleavage dioxygenase | + | + | + | |
| M205B1.4 | GLH4_CAEEL | Putative ATP-dependent RNA helicase | 167 | 1e-7 | Caenorhabditis elegans | Zn-finger, CCHC type | + | + | - | |
| M123D8.8 | Q7EZ84 | Putative calcium-transporting ATPase 8 | 107 | 8e-6 | Oryza sativa | Calcium ATPase, transduction domain A | + | + | + | |
| YR2, Contig-A | | | | | | | | | | |
| M338F12.2 | Q9SU70 | Hypothetical protein | 135 | 7e-31 | Arabidopsis thaliana | Ring-finger, TonB box | - | + | + | |
| M338F12.1 | Q94G52 | Calcium-dependent protein kinase (Fragment) | 784 | 0.0 | Funaria hygrometrica | Serine/threonine protein kinase, Calcium-binding EF hand | - | + | + | rlwa06f07 |
| M338F12.3 | Q9FZ45 | F6I1.14 protein (Hypothetical protein) | 490 | 1e-137 | Arabidopsis thaliana | | + | + | + | rlwb31d23 |
| M480H6.1 | Q9U4M4 | 7138.7 | 176 | 4e-8 | Leishmania major | | - | + | - | |
| M19B4.1-1 | Q7QSN7 | GLP_327_32944_31097 | 226 | 6e-58 | Giardia lamblia ATCC 50803 | | - | - | - | |
| M19B4.1-2 | | | | | | | - | - | - | |
| M19B4.1-3 | | | | | | | - | + | - | rlwb09e07, rlwb25b12 |
| M19B4.2 | Q5B778 | Hypothetical protein | 173 | 2e-11 | Aspergillus nidulans | Leucine-rich repeat | - | + | - | |
| M355D5.3 | | M. polymorpha EST, rlwb44o19 | | | | | + | + | + | rlwb44o19 |
| M355D5.1 | O82677 | Retinoblastoma-related protein | 108 | 1e-23 | Chenopodium rubrum | | + | + | - | |
| M355D5.4 | | M. polymorpha EST, rlwb16g05 | | | | | - | + | - | rlwb16g05 |
| M355D5.2 | M3K1_ARATH | Mitogen-activated protein kinase kinase kinase 1 | 84 | 4e-16 | Arabidopsis thaliana | Serine/threonine protein kinase | - | + | - | |
| M350E4.4 | Q9LFV5 | Hypothetical protein (Fimbriata-like) | 145 | 9e-34 | Arabidopsis thaliana | Cyclin-like F-box, Galactose oxidase | - | + | - | |
| M166C5.1 | SFR1_ARATH | Pre-mRNA splicing factor SF2 (SR1 protein) | 248 | 3e-65 | Arabidopsis thaliana | RNA-binding region RNP-1 | - | + | + | PTA2.1754.C1, PTA2.2367.C1, rlwb48e01 |
| M166C5.2 | Q8W555 | Putative RNA-binding protein | 380 | 1e-104 | Arabidopsis thaliana | RNA-binding region RNP-1 | - | + | + | PTA2.842.C1 |
| M166C5.5 | Q7XR01 | OSJNBa0015K02.12 protein | 482 | 1e-135 | Oryza sativa | ABC1 family (ly-rich, unrelated to ABC transporters.) | - | + | - | rlwb48j21 |
| M97D8.2 | Q8H6S5 | CTV.2 | 188 | 2e-9 | Poncirus trifoliata | | + | + | + | |
| M26B6.2 | Q9FEA1 | Anthocyanin 1 | 183 | 6e-9 | Petunia hybrida | | - | + | + | |
| M636F5.4 | Q8L7G1 | Hypothetical protein At4g25330 | 190 | 1e-9 | Arabidopsis thaliana | | + | + | + | |
| M547D3.2 | Q9FYC5 | Hypothetical protein | 498 | 1e-139 | Arabidopsis thaliana | Serine/threonine protein kinase | + | + | + | rlwa21f14, PTA2.1527.C1 |
| M547D3.1 | Q9FJH0 | GTP-binding protein, ras-like (At5g60860) | 368 | 1e-101 | Arabidopsis thaliana | Ras GTPase | - | + | + | PTA2.1434.C1, PTA2.1434.C2 |
| M666C5.4 | Q9FZJ2 | F17L21.22 | 180 | 1e-8 | Arabidopsis thaliana | | + | + | + | |
| M666C5.1 | Q9GYB0 | Possible CG15429 protein | 168 | 6e-41 | Leishmania major | Cytochrome b5 | - | + | - | rlwb01o21, rlwb16a08 |
| M217D8.1 | Q8LJ68 | Protein phosphatase 2C-like protein | 244 | 7e-64 | Oryza sativa | Protein phosphatase 2C-like | - | + | + | rlwa05l09, PTA2.1601.C1 |
| M217D8.2 | O23057 | BAC IG005I10 | 991 | 0.0 | Arabidopsis thaliana | | - | + | + | rlwb23f14, PTA2.3279.C1 |
| M217D8.4 | | M. polymorpha EST, rlwb18g14 | | | | | - | + | + | rlwb18g14 |
| M217D8.3 | | M. polymorpha EST, PTA2.2137.C1 | | | | | - | + | + | PTA2.2137.C1 |
| M354D1.2 | O82317 | At2g25800 protein | 82 | 9e-28 | Arabidopsis thaliana | | - | + | + | |
| M354D1.4 | O22792 | At2g33420 protein | 180 | 1e-8 | Arabidopsis thaliana | | + | + | - | |
| M354D1.5 | O65224 | F7N22.7 protein | 156 | 9e-6 | Arabidopsis thaliana | | + | + | - | |
| M212C1.4 | P54674 | Phosphatidylinositol 3-kinase 2 | 163 | 1e-6 | Dictyostelium discoideum | Phosphatidylinositol 3- and 4-kinase | - | + | - | |
| M104E4.1 | Q9ST26 | Nonphototrophic hypocotyl 1a | 192 | 2e-47 | Oryza sativa | PAC motif | - | + | + | rlwb06g04 |
| M537D7.2-1 | Q761Z7 | BRI1-KD interacting protein 117 (Fragment) | 193 | 4e-10 | Oryza sativa | Zn-knuckle | + | + | + | |
| M537D7.2-2 | Q761Z7 | BRI1-KD interacting protein 117 (Fragment) | 193 | 4e-10 | Oryza sativa | Zn-knuckle | + | + | - | |
| YR2, Contig-B | | | | | | | | | | |
| M359F1.1 | Q9FKL6 | Calmodulin-binding protein | 553 | 1e-156 | Arabidopsis thaliana | | - | + | + | |
| M95E6.1 | HSF3_ARATH | Heat shock factor protein 3 | 376 | 1e-103 | Arabidopsis thaliana | Heat shock factor (HSF)-type (DNA-binding) | - | + | + | rlwb39a18 |
| M408G1.2 | BSL2_ARATH | Serine/threonine protein phosphatase | 1419 | 0.0 | Arabidopsis thaliana | Serine/threonine protein phosphatase (BSU1 type), Kelch repeat, Metallophosphoesterase | - | + | + | |
| M286B9.1 | Q9FIG5 | Gb\|AAF18661.1 | 200 | 2e-50 | Arabidopsis thaliana | | - | + | + | rlwb43f22 |
| M420F5.1 | Q9FIG5 | Putative C2 domain-containing protein | 159 | 3e-6 | Oryza sativa | | - | + | + | |
| M286B9.2 | Q41102 | Phaseolin G-box binding protein PG2 (Fragment) | 169 | 3e-41 | Phaseolus vulgaris | | - | + | + | M01F020 |
| M578C3.1 | Q9LJ30 | EST AU082118(E20525) corresponds to a region of the predicted gene | 770 | 0.0 | Oryza sativa | ARM repeat, Importin-beta N-terminal domain | - | + | + | |
| M88B7.1 | Q94B66 | ZIM-like 1 protein | 256 | 1e-67 | Arabidopsis thaliana | ZIM motif, CCT motif, GATA-type Zn-finger | - | + | + | |
| M402H5.1 | Q8RWB1 | Hypothetical protein (At5g37370) | 421 | 1e-117 | Arabidopsis thaliana | PRP38 family | - | + | + | |
| M402H5.5 | Q84YI1 | Polyphenol oxidase (EC 1.10.3.1) | 90 | 2e-17 | Trifolium pratense | Tyrosinase, Di-copper centre-containing domain | + | + | + | |
| M402H5.6 | | M. polymorpha EST, rlwb26I22 | | | | | + | + | + | rlwb26I22 |
| M508E5.1 | PACG_MOUSE | Parkin coregulated gene protein homolog | 269 | 2e-71 | Mus musculus | ARM repeat | - | + | - | PTA2.3045.C1 |
| M526D2.2 | Q7T0Y4 | MGC68877 protein | 192 | 4e-10 | Xenopus laevis | | - | + | - | |
| M468B3.1 | U7I1_HUMAN | Ubiquitin conjugating enzyme 7 interacting protein 1 | 69 | 2e-11 | Homo sapiens | | - | + | - | PTA2.1947.C1 |
| M530A10.2 | Q9FX33 | Hypothetical protein T9L24.41 (Hypothetical protein At1g73380) | 187 | 2e-9 | Arabidopsis thaliana | | + | + | + | |
| M530A10.3 | ALF1_PEA | Fructose-bisphosphate aldolase, cytoplasmic isozyme 1 | 545 | 1e-154 | Pisum sativum | Fructose-bisphosphate aldolase class-I | - | + | - | M01D005 |
| M22H6.1 | Q7RIA9 | Immediate early protein homolog (Fragment) | 82 | 2e-15 | Plasmodium yoelii yoelii | | + | + | + | |
| M313C1.2 | O04892 | Cytochrome P450 monooxygenase (Fragment) | 166 | 5e-7 | Nicotiana tabacum | | + | + | + | |
| M41H12.2 | Q83GU1 | Hypothetical protein | 158 | 4e-6 | Tropheryma whipplei | | + | + | - | |
| M41H12.1 | Q8LHF0 | NADH-dependent glutamate synthase | 217 | 5e-56 | Oryza sativa | Adrenodoxin reductase, Pyridine nucleotide-disulphide oxidoreductase (class-II) | - | - | - | |
| M45F5.1 | Q8VZD4 | At1g28320/F3H9_2 | 115 | 2e-25 | Arabidopsis thaliana | Trypsin-like serine proteases | + | + | + | |





Table 6. EST homologs

| EST homolog | EST |
| --- | --- |
| Contig-A | |
| M431A8.1 | rlwb11l09 |
| M354E2.1 | rlwb32a18 |
| M350E4.5 | PTA2.2869.C1 |
| M627E2.1 | rlwa36n09 |
| M636F5.2 | rlwa22o10 |
| M666C5.2 | PTA2.1055.C1 |
| M354D1.1 | PTA2.1712.C2 |
| M212C1.3 | rlwb48d02 |
| M513H5.3 | PTA2.989.C1 |
| M222E12.2 | rlwb44o19 |
| M657H8.1 | rlwb48e16 |
| M537D7.1 | rlwb19i13 |
| Contig-B | |
| M286B9.3 | PTA2.3014.C1 |
| M313C1.1 | PTA2.2164.C1 |
| M359F1.3 | rlwa04i13 |
| M402H5.7 | rlwb01b07 |
| M408G1.4 | rlwa19e23 |
| M508E5.2 | PTA2.1386.C1 |
| M516B5.1 | M01I026 |
| M546H7.1 | rlwb44o19 |





Table 7. Putative spermatogenic genes identified on the Y chromosome of M. polymorpha

| Marchantia | Chromosome | Human (UniGene ID) | Chromosome | Mouse (UniGene ID) | Chromosome | Chlamydomonas (JGI ID) | Description | Testis specific/biased expression in human and mouse* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M19B4.1 | Y | ODF3 (Hs.350949) | 11 | shippo1 (Mm.56404) | 7 | C_490060 | Outer dense fiber of sperm tails 3 | Yes |
| M508E5.1 | Y | PACRG (Hs.25791) | 6 | Pacrg (Mm.18889) | 17 | C_20334 | PARK2 co-regulated | Yes |
| M666C5.1 | Y | FLJ32499 (Hs.27475) | 17 | Gm740 (Mm.371762) | 11 | C_220104 | Hypothetical protein with cytochrome b5 domain | No |
| M480H6.1 | Y | DKFZp434I099 (Hs.513635) | 16 | Gm770 (Mm.277112) | 8 | Not detected | Hypothetical protein | Yes |
| M526D2.2 | Y | NYD-SP28 (Hs.393714) | 12 | Gm1040 (Mm.256588) | 5 | C_1160041 | Hypothetical protein | Yes |
| M468B3.1 | Y | TRIAD3 (Hs.487458) | 7 | MGI:1344349 (Mm.362087) | 5 | Not detected | Ubiquitin conjugating enzyme 7 interacting protein 1 | No |

*Based on UniGene's EST Profile Viewer at the National Center for Biotechnology Information.





Table 8. Primers used for degenerate PCR and X-linkage analysis

| Gene | Degenerate primers: target sequence | Degenerate primers: name | Degenerate primers: sequence* | Primers for X-linkage analysis: name | Primers for X-linkage analysis: sequence |
| --- | --- | --- | --- | --- | --- |
| M338F12.1 | PPFWAET | M338F12.1DF1 | CCNCCNTTYTGGGCNGARAC | M338F12.1FF1 | GAAGGCTTTGCGGATCATAG |
| | GTDWRKA | M338F12.1DR1 | GCYTTNCKCCARTCNGTNCC | M338F12.1FR1 | TATCATTCGGGCCTAAGTCG |
| M547D3.1 | TIGVEFAT | M547D3.1DF1 | ACNATHGGNGTNGARTTYGCNAC | M547D3.1FF1 | GCATCAATGTGGACAGCAAA |
| | TFENVERW | M547D3.1DR1 | CCANCKYTCNACRTTYTCRAANGT | M547D3.1FR1 | TCATATACCAAGAGAGCTCCTACG |
| M408G1.2 | AAEAEAI | M408G1.2DF2 | GCNGCNGARGCNGARGCNAT | M408G1.2FF2 | TGTTGTAGCTGCGGAGTCTG |
| | ECVMDGFE | M408G1.2DR3 | CRAANCCRTCCATNACRCAYTC | M408G1.2FR2 | CTGCGGTATCACAAAGCTCA |
| M88B7.1 | PPEKVQAV | M88B7.1DF1 | CCNCCNGARAARGTNCARGCNGT | M88B7.1FF1 | ACTTCCAGCCCGCATGAATA |
| | GLMWANKG | M88B7.1DR1 | CCYTTRTTNGCCCACATNARNCC | M88B7.1FR1 | TCCGAACAGTGTATCGAATTTTT |
| M402H5.1 | MKLTVKQM | M402H5.1DF2 | ATGAARYTTACNGTNAARCARATG | M402H5.1FF1 | CACGAATTCCCGTACCTGTT |
| | FGQRAPHR | M402H5.1DR2 | CGATGAGGAGCACGTTGTCCRAA | M402H5.1FR1 | AAACCGACAATGCAGCTTTC |

* N, A+C+G+T; H, A+C+T; R, A+G; Y, C+T; K, G+T.
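To make the ambiguity codes in the footnote concrete, the sketch below counts how many distinct oligonucleotides a degenerate primer represents and turns a primer into a regular expression. Only the codes listed in the footnote are included, and the function names `degeneracy` and `to_regex` are illustrative; this is not part of the published method.

```python
import re
from math import prod

# Ambiguity codes exactly as listed in the Table 8 footnote, plus the four plain bases
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "H": "ACT", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer)


def to_regex(primer: str) -> str:
    """Regular expression matching every sequence the primer represents."""
    return "".join(
        IUPAC[base] if len(IUPAC[base]) == 1 else f"[{IUPAC[base]}]"
        for base in primer
    )


# M338F12.1DF1 from Table 8: three N, one Y, one R
print(degeneracy("CCNCCNTTYTGGGCNGARAC"))
print(to_regex("GAR"))
```

A primer with many N positions, such as M338F12.1DF1, stands for a pool of several hundred oligonucleotides, whereas M402H5.1DR2 contains a single R and therefore stands for only two.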

 

 

 





Table 9. Primers used for mapping Contig-A and Contig-B

| Contig | Forward: name | Forward: sequence (5'->3') | Reverse: name | Reverse: sequence (5'->3') | Product length, bp | Template |
| --- | --- | --- | --- | --- | --- | --- |
| Contig-A | CA-L02F | TTCCCAGGACTCATTCAAGC | CA-L02R | GAAAACCGCAAGAACAAGGA | 4,000 | pMM23-431A8 |
| | CA-L03F | TTCTCCACCGTTTCTGTTCA | CA-L03R | ATGGGTAACTGTTGCGCTTG | 4,011 | pMM23-338F12 |
| | CA-L05F | AAGCCGTAGAAAGGAGATAAGGA | CA-L05R | ACTTTGCATGAAAGCGGAAT | 4,009 | pMM23-354E2 |
| | CA-L07F | GATCCCTGATTTTTGCGTGT | CA-L07R | TCGAAAGCAACAATTTGACG | 3,002 | pMM23-338F12 |
| | CA-L08F | TCCAGGGGTATTGCTACAGG | CA-L08R | CCGAAGACCAAAACAACCTC | 3,001 | pMM23-338F12 |
| | CA-L09F | CCATGTACTTTTACCCCGTCA | CA-L09R | GGAGGAAACGTACCAAATCG | 3,004 | pMM23-338F12 |
| | CA-L10F | ATTCGCGCCTATGTTGAGTT | CA-L10R | TGAGGAAAAGTACGGATCACAA | 2,000 | pMM23-477F3 |
| | CA-L11F | CATTTCTCCTCCCCTAGCAA | CA-L11R | ATTCTTGGGCCTTGGATTCT | 2,000 | pMM23-338F12 |
| | CA-L12F | TGGACTGCATTCGATTTTGA | CA-L12R | GCGGCGTACAGAAGTACCTG | 2,000 | pMM23-338F12 |
| Contig-B | CB-L01F | GCCTTTAGCAAGTGCCTACG | CB-L01R | TCGCATGAAAGTCAGAGGTG | 6,002 | pMM23-408G1 |
| | CB-L03F | CCTGCGAATTCCAAGTTCAT | CB-L03R | CTAGCGCGAGTTACGGTGAT | 4,001 | pMM24-95E6 |
| | CB-L04F | ATTATTGAGCCGCCAATGTC | CB-L04R | TGTAGACTGCGCCACAAACT | 3,002 | pMM23-408G1 |
| | CB-L05F | TGTGAAAGTGGCATACGAGAA | CB-L05R | CACAAAAGCTTTCAATGACACA | 3,000 | pMM23-359F1 |
| | CB-L06F | AACCACGAGGTTCGTGAGAG | CB-L06R | GGATATCGGTGGCTGACTGT | 3,002 | pMM23-359F1 |
| | CB-L07F | GCAGTGCTTGCGAACTCTTA | CB-L07R | AAAGCTGGTTGAACGTAGCC | 3,000 | pMM23-408G1 |
| | CB-L08F | TTATCACACCAAGTGTCGCAAT | CB-L08R | TGATAGCATCAATCATGCAAGG | 3,000 | pMM24-95E6 |
| | CB-L10F | CACGCACACATGGTAATTGA | CB-L10R | TCAATGCCTTTTCATCTGCTT | 3,000 | pMM23-359F1 |
| | CB-L11F | TTTATCGTTCCCTTCTTGTGG | CB-L11R | CTTCGACGGTGTGAGTGAAA | 2,002 | pMM23-408G1 |
| | CB-L12F | TTTGCTTGTCCAAGTTGCAG | CB-L12R | TTGCCTCTAAAGCCCACAAC | 2,012 | pMM23-635B8 |





SI Text

The Sequence of YR1

The accumulation of the 2.4-kb BamHI repeat family in YR1 prevented construction of its physical map by chromosome walking and forced us to exploit alternative strategies. First, a total of 429 PAC clones were isolated by colony hybridization using the 2.4-kb BamHI repeat as a probe. One of these clones, pMM4G7, was sequenced first (1). Using pMM4G7 as a starting clone, we obtained a contig map of 470 kb (SI Fig. 4). The map, however, could not be extended further because of the extensively repetitive nature of YR1. Therefore, to investigate the net sequence of YR1 beyond this initial 470-kb contig, we selected a set of clones that are derived from YR1 but do not overlap each other, as follows.

The 429 Y-chromosomal clones were first clustered into groups by a fingerprinting method to identify clones that cover different portions of YR1. Assembly of a group was verified by comparing restriction profiles of its member clones in the same gel, and one representative clone was selected from each group. Consequently, 271 of the 429 clones were assembled into 22 groups by fingerprinting with either BamHI or DraI. The remaining 158 clones were rejected because of limited insert lengths, possible chimerism, and similarities to the 470-kb contig (SI Fig. 4).

In one of the 22 groups, all 29 member clones consisted predominantly of one of the Y chromosome-specific repeat units, the 2.2-kb BamHI repeat, which differs from the 2.4-kb BamHI repeat by a few missing small reiterated sequence motifs (2). Because the sum of the insert sizes of these 29 clones was ~1.9 Mb and the coverage of the male genomic library was ~7-fold (2), the 2.2-kb BamHI repeat clusters presumably account for ~270 kb of YR1.

The other groups provided 21 representative clones with a total insert size of 2.1 Mb. The 470-kb contig, the 21 representative clones (2.1 Mb), and the clusters of the 2.2-kb BamHI repeat (270 kb) collectively cover 2.8 Mb of YR1, which is only 70% of its estimated physical size of 4 Mb (SI Fig. 4). This apparent shortfall in sequence coverage can be largely explained by the convergence of highly conserved repeats into a smaller number of representative sequences, which suggests that the sequence coverage of YR1 is much higher than it appears. Formally, however, we cannot exclude the possibility that entirely unrelated sequences that escaped our screening are also present in the YR1 region.
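The size estimates in the two preceding paragraphs follow from simple arithmetic, sketched here with the rounded values quoted in the text. The variable names are illustrative only.

```python
# All sizes in kb, as quoted in the text
summed_inserts_2_2kb_clones = 1900   # 29 clones, ~1.9 Mb in total
library_coverage = 7                 # ~7-fold coverage of the male genomic library

# Dividing the summed insert size by the library coverage estimates the
# genomic span of the 2.2-kb BamHI repeat clusters
repeat_cluster_kb = summed_inserts_2_2kb_clones / library_coverage
print(f"2.2-kb repeat clusters: ~{repeat_cluster_kb:.0f} kb")

contig_470_kb = 470
representative_clones_kb = 2100
repeat_clusters_rounded_kb = 270
yr1_physical_size_kb = 4000

covered_kb = contig_470_kb + representative_clones_kb + repeat_clusters_rounded_kb
fraction_covered = covered_kb / yr1_physical_size_kb
print(f"Covered: {covered_kb} kb ({fraction_covered:.0%} of YR1)")
```

With unrounded inputs the covered fraction comes out at about 71%, which the text reports as 70% after rounding the covered length to 2.8 Mb.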

To gather the sequence of YR1, five clones of the 470-kb contig and the 21 representative clones were shotgun sequenced (SI Fig. 4, Table 1, SI Table 3, and SI Table 4). Our efforts to sequence the YR1 representative clones initially focused on acquisition of phase 1 sequences of each PAC clone, i.e., unordered assemblies of sequence contigs (3).

The average number of sequence contigs generated by the sequence assembler was 50 per PAC clone. The sum of the contigs was often inconsistent with the sizes estimated by gel electrophoresis, reflecting misassemblies caused by the multitude of repeats (SI Table 4).
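These figures can be cross-checked against SI Table 4. The sketch below simply transcribes the 28 YR1 rows of that table (sequence obtained and number of contigs) and recomputes the mean contig count and the total YR1 length reported in SI Table 3; the variable names are illustrative.

```python
# (clone, sequence obtained in bp, number of contigs), transcribed from SI Table 4
YR1_CLONES = [
    ("pMM24-58G10", 191_821, 91), ("pMM2D3", 83_249, 6),
    ("pMM23-101C6", 171_791, 84), ("pMM23-360C7", 273_343, 147),
    ("pMM23-118G8", 183_849, 89), ("pMM23-300E6", 150_988, 80),
    ("pMM4G7", 30_156, 2), ("pMM8H2", 131_890, 56),
    ("pMM23-205B1", 157_789, 36), ("pMM23-205E6", 87_633, 39),
    ("pMM23-348H6", 150_661, 84), ("pMM23-108B3", 69_315, 43),
    ("pMM23-145D11", 177_070, 94), ("pMM23-287F10", 126_991, 64),
    ("pMM23-372H9", 192_643, 24), ("pMM23-123D8", 145_717, 79),
    ("pMM23-437F9", 114_822, 65), ("pMM16A2", 121_291, 66),
    ("pMM23-198A1", 87_433, 35), ("pMM23-144H6", 71_658, 26),
    ("pMM23-77F9", 54_469, 19), ("pMM23-200H8", 48_897, 18),
    ("pMM23-277H2", 93_343, 32), ("pMM23-863F12", 38_375, 15),
    ("pMM23-63B5", 84_022, 41), ("pMM23-173B4", 110_944, 42),
    ("pMM23-291C10", 22_655, 9), ("pMM23-70G5", 28_084, 14),
]

n_clones = len(YR1_CLONES)
total_bp = sum(bp for _, bp, _ in YR1_CLONES)
mean_contigs = sum(n for _, _, n in YR1_CLONES) / n_clones

print(f"{n_clones} clones, {total_bp:,} bp, {mean_contigs:.1f} contigs per clone")
```

Note that the mean of 50 is obtained over all 28 YR1 clones, including the two previously sequenced clones pMM4G7 and pMM2D3.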

Physical Mapping and Sequencing of YR2

In addition to the six Y-linked markers previously isolated by RDA (4), four more Y-linked RDA markers were obtained. Because these 10 markers showed no similarity to any PAC clone of YR1 by BLAST analyses and gave no products in the YR1 PCR assays, we concluded that all these RDA markers belong to YR2. With these markers we initiated chromosome walking and constructed two contigs, contig A (3.5 Mb) and contig B (2.6 Mb). The combined coverage of the two contigs is ~6 Mb and is thus consistent with the cytologically estimated size of YR2 of 6 Mb (1), suggesting that contig A and contig B together cover most of YR2. None of the member clones of contig A or contig B appears among the 429 PAC clones of YR1.

Contig A is an assembly of 245 PAC clones, whereas contig B consists of 168 PAC clones and one PCR-amplified fragment (which covers a gap in the genomic PACs). Four and six of the 10 Y-linked markers were assigned to contig A and contig B, respectively. During chromosome walking, 137 of >300 primer pairs turned out to be male-specific, confirming that contig A and contig B represent segments of the Y chromosome, and simultaneously indicating that YR2 is a composite of regions specific to the male genome and of sequences shared by other chromosomes. The two contigs could not be extended further for the following reasons. One of the terminal regions of contig A (the end terminating with pMM23-431A8 in SI Fig. 4) shows similarity to retroelements and is also highly repetitive on autosomes. The other terminal regions of contig A and contig B are unique Y-linked sequences, but no further sequences were present in the male genomic library.

Final assembly status for each PAC clone is summarized in SI Table 4. One of the 26 clones in contig B, pMM23-47H4, could not be sequenced because its DNA was highly unstable and randomly lost segments of its insert, leaving an ~70-kb unsequenced gap. The overall gene organization of YR2 is illustrated in Fig. 2.

Acquisition of ESTs

To facilitate gene identification in the Y chromosome, 32,277 5' ESTs were newly generated from thalli and sexual organs of male plants, in addition to the 1,163 previously collected nonredundant EST sequences (5). These new ESTs were assembled to generate 4,074 clusters and 6,409 singlets. Clustering of all ESTs resulted in 10,483 nonredundant sequences (SI Data Set 1). In database searches with this set of nonredundant tags, 6,370 (61%) showed BLASTX similarity (E value of 1 × 10⁻⁵ or lower) to amino acid sequences in the public databases. When the 10,483 ESTs were compared to the Y genomic sequences, none of the ESTs tagged YR1 sequences, whereas 31 tagged YR2 sequences (SI Table 5). Five of the 31 tagged sequences do not show similarity to sequences in the public databases. An additional 20 ESTs aligned to portions of the YR2 sequences and were classified as EST homologs because of obvious discrepancies in their alignments (SI Table 6).

About 39% of the M. polymorpha ESTs do not show similarity to known sequences registered in the public databases. Assuming that these 5' tags contain at least part of bona fide coding sequences, this suggests that 39% of the coding sequences in M. polymorpha are not detectable by the BLAST approach. This assumption in turn suggests that the analogous similarity search with the genomic sequences of the Y chromosome may have left a similar proportion of the genes undetected. This extrapolation would then raise the total number of genes in YR2 to ~80, including the 48 genes for which similarities were positively identified by the BLAST search against all genes in other organisms.
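The EST statistics and the extrapolation to ~80 genes can be reproduced with the numbers quoted in the two paragraphs above. This is only a sketch of the arithmetic; it assumes, as the text does, that the fraction of ESTs with a BLASTX hit also applies to genes in the YR2 genomic sequence. Variable names are illustrative.

```python
clusters = 4_074
singlets = 6_409
nonredundant_ests = clusters + singlets

ests_with_blastx_hit = 6_370
detectable_fraction = ests_with_blastx_hit / nonredundant_ests
undetectable_fraction = 1 - detectable_fraction

# Genes in YR2 identified by similarity; if only the detectable fraction of
# genes can be found this way, the total is scaled up accordingly
genes_found_by_similarity = 48
estimated_total_genes = genes_found_by_similarity / detectable_fraction

print(f"{nonredundant_ests} nonredundant ESTs")
print(f"{detectable_fraction:.0%} with a hit, {undetectable_fraction:.0%} without")
print(f"Estimated genes in YR2: ~{estimated_total_genes:.0f}")
```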

Genes Potentially Involved in Male Reproductive Functions

The 13 YR2 genes that are present only in the male genome and are expressed in sexual organs but not in thalli are: M19B4.1, M508E5.1, M666C5.1, M480H6.1, M526D2.2, M468B3.1, M355D5.1, M19B4.2, M355D5.2, M350E4.4, M166C5.5, M212C1.4, and M530A10.3. Six of these genes encode proteins for which homologs are found in animals but not in angiosperms; they are listed in SI Table 7 with the annotations for the homologous genes.
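The selection criterion described here (absent from the female genome, expressed in sexual organs, not expressed in thalli) amounts to a three-way filter on the +/- columns of SI Table 5. The sketch below applies it to four rows copied from that table as an illustration; the `Gene` type and function name are hypothetical, and only a small subset of the table is included.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    in_female: bool           # "Presence in female" column
    in_sexual_organ: bool     # "Expression, sexual organ" column
    in_thallus: bool          # "Expression, thallus" column


# Four rows from SI Table 5 ("+" -> True, "-" -> False)
genes = [
    Gene("M508E5.1", False, True, False),
    Gene("M547D3.2", True, True, True),
    Gene("M104E4.1", False, True, True),
    Gene("M41H12.1", False, False, False),
]


def is_male_reproductive_candidate(gene: Gene) -> bool:
    """Male-specific and expressed in sexual organs but not in thalli."""
    return (not gene.in_female) and gene.in_sexual_organ and (not gene.in_thallus)


candidates = [g.name for g in genes if is_male_reproductive_candidate(g)]
print(candidates)
```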

The animal counterparts of the M19B4.1 and M508E5.1 genes have been shown to play a role in spermatogenesis. Mouse and human homologs of M19B4.1, Shippo1 and ODF3, respectively, are localized in sperm flagella (6). The homolog in the flagellated green alga Chlamydomonas reinhardtii has also been assigned to the flagellar machinery (7), suggesting that the M19B4.1 protein is one of the components of sperm flagella. Three copies of the M19B4.1 gene are found in a region of ~140 kb. M19B4.1-3 encodes 24 units of a reiterated motif that is also conserved in the mouse and human homologs (SI Fig. 5), whereas the others, M19B4.1-1 and -2, encode only the C-terminal 12 units of M19B4.1-3 and thus are presumably truncated copies of M19B4.1-3.

The M508E5.1 gene codes for a protein similar to the mouse Parkin-coregulated gene (Pacrg; ref. 8), which is present in mature sperm and required for spermiogenesis in the mouse. Homologs of Pacrg are found in a number of metazoa and flagellated protozoa, suggesting that Pacrg and its homologs, including M508E5.1, are required for flagella formation.

Four more genes have animal counterparts whose functions are as yet unknown (SI Table 7). Their distribution among other organisms and their expression patterns suggest that at least three of them participate in male functions. M666C5.1, for example, has homologs in sperm-producing animals and flagellated protozoa, indicating that this group of genes plays some role in flagellar formation and thus in spermatogenesis. The mouse and human homologs of the M480H6.1 and M526D2.2 genes, annotated as hypothetical proteins, are preferentially expressed in testis (UniGene's EST Profile Viewer at the National Center for Biotechnology Information; ref. 9), suggesting that these proteins are also involved in spermatogenesis. Because homologs of M480H6.1 were not detected in the currently available C. reinhardtii sequences (Chlamydomonas reinhardtii v2.0, DOE Joint Genome Institute; SI Table 7), this gene might participate in male functions other than flagella formation. It is unclear whether the mouse and human homologs of M468B3.1 have male-fertility functions, because they show non-testis-specific expression patterns and no homolog has been found in C. reinhardtii thus far. It should also be noted that all but one (M468B3.1) of the six putative spermatogenic genes are detected in the draft genome sequences of the moss Physcomitrella patens, which produces flagellated sperm like M. polymorpha (data not shown).

Repeats, Transposable Elements, and Insertion of Organellar DNAs

Among a variety of putative transposable elements, one class contains a DNA methyltransferase domain (DNMT) associated with the polyprotein typical of retrotransposons, and was designated the DNMT-containing repetitive element (DRE). The DNMT of DRE shows the highest similarity to that of the mammalian DNA methyltransferase 3A (DNMT3a), which is required for imprinting of germ cells (10). DRE is also detected in the female genome and is expressed both in thalli and sexual organs (data not shown). There are only a few reports to date on DNA methyltransferases associated with transposable elements (11, 12). The activity of transposable elements is often affected by the degree of methylation (13), but it remains unclear at present whether the DNA methyltransferase domain of DRE is indeed beneficial for its transposition.

Insertions of mitochondrial DNA (14) were detected in both YR1 and YR2 (SI Table 3 and SI Fig. 6). In striking contrast, no insertion of plastid DNA (15) was detected in YR1, and only very few events were found in YR2. No such bias is observed in the completely sequenced genomes of two flowering plant species, Arabidopsis thaliana and Oryza sativa. Because the sequence information on M. polymorpha autosomes is limited, it is presently unclear whether this difference is a characteristic of the Y chromosome or of the species M. polymorpha.

In the YR1 sequences, a portion of the mitochondrial DNA, positions 136,796-136,911 (NC_001660; a fragment containing intron 4 of the cox1 gene), appears most frequently, with ~94% identity, suggesting that this insertion event occurred before the amplification of these YR1 sequences. Another mitochondrial sequence, spanning positions 176,138-176,302 (a fragment of ORF277 or ccmB), was found in fewer YR1 representative clones. This fragment has higher similarity (98%) to the original mitochondrial sequence, suggesting that this insertion event occurred later than the more highly amplified insertion of the cox1 fragment. Concerning the evolution of the Y chromosome, these observations imply that the YR1 domain has been increasing its repetitive nature progressively, inadvertently coamplifying the mitochondrial insertions. In contrast, no such bias in relative abundance or in the degree of divergence is observed for insertions in YR2. The different stages of divergence of the mitochondrial sequences suggest that these insertion events occurred at various time points during the evolution of the Y chromosome.
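The relative-timing argument above can be illustrated with a minimal Python sketch. It takes the two YR1 insertions named in the text, computes their lengths from the mitochondrial coordinates, and orders them by percent identity to the mitochondrial sequence. The record layout and the function names are hypothetical, and the ordering is only meaningful under the assumption that both fragments have diverged at a comparable rate since insertion; it is not the authors' analysis.

```python
# Illustrative records; coordinates and identities are those quoted in the text.
# The ~94% value is approximate in the source.
insertions = [
    {"region": "ORF277/ccmB fragment", "start": 176138, "end": 176302, "identity_pct": 98.0},
    {"region": "cox1 intron 4 fragment", "start": 136796, "end": 136911, "identity_pct": 94.0},
]


def fragment_length(record):
    """Length in bp of an insertion given inclusive 1-based coordinates."""
    return record["end"] - record["start"] + 1


def oldest_first(records):
    """Order insertions from lowest to highest identity.

    Assumption: lower identity to the source sequence means an earlier
    insertion, which holds only if divergence rates are comparable.
    """
    return sorted(records, key=lambda r: r["identity_pct"])


for rec in oldest_first(insertions):
    print(rec["region"], fragment_length(rec), "bp", rec["identity_pct"], "%")
```

Under that assumption the cox1 fragment is listed first, consistent with the interpretation that it was inserted before the ORF277/ccmB fragment.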

Comparison of the Structure of the M104E4.1 Gene and Its X-Linked Homolog, the F62B12.1 Gene

Using one of the X-linked RDA markers, rbf73 (4), an X-linked PAC clone, pMF28-62B12, was isolated from the female genomic library (2) and sequenced (GenBank accession no. AB272581). A similarity search detected a gene containing a LOV (Light, Oxygen, and Voltage) domain (16), F62B12.1. The cDNA sequences of the F62B12.1 and M104E4.1 genes were determined by RACE. The deduced amino acid sequences of the X- and Y-linked genes show an overall similarity of 44.5%, with two conserved regions of higher similarity: 91.0% for the first 64 amino acid residues and 83.7% for the LOV domain (SI Fig. 7). The four intron-insertion sites in the coding sequences are identical between the two genes.
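How a percentage of this kind can be derived from a pairwise alignment is sketched below in Python. The text does not state how similarity was calculated, so this sketch assumes the simplest definition: identical residues divided by aligned columns, with gap columns counted in the denominator. The function name and the toy sequences are hypothetical and are not taken from M104E4.1 or F62B12.1.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding identical residues.

    Both inputs must be rows of the same pairwise alignment (equal length).
    Columns where both rows are gaps are skipped; columns with a gap in one
    row count as mismatches (an assumption, not the authors' stated method).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Toy alignment: 4 identical columns out of 7.
print(round(percent_identity("MKT-LLA", "MRTALL-"), 1))
```

Restricting the two aligned rows to a sub-range of columns (for example, a single domain) before calling the function gives a region-specific value in the same way.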

Materials and Methods

DNA Fingerprinting.

DNAs of PAC clones derived from YR1 were digested with either BamHI or DraI and fractionated by gel electrophoresis, and the resulting gel images were digitized for band identification by the software Image (17). Restriction fragments <850 bp and >48.5 kb were ignored in the band identification process. Band identification data were collected and transferred to the software FPC for automated grouping. Both Image and FPC were downloaded from www.sanger.ac.uk/Software (18). The parameters for grouping of the BamHI- and DraI-digested PAC clones using FPC were: tolerance of 3 and 4; cutoff of 9 × 10^-1 and 1 × 10^-6, respectively. Assemblies of PAC DNA fingerprints were visually inspected and manually edited.
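The size window applied during band identification can be written as a short Python sketch. It only illustrates the stated rule (fragments <850 bp and >48.5 kb ignored); it is not part of Image or FPC, and the function name and example fragment sizes are hypothetical.

```python
MIN_FRAGMENT_BP = 850      # fragments shorter than this are ignored
MAX_FRAGMENT_BP = 48_500   # fragments longer than 48.5 kb are ignored


def usable_bands(fragment_sizes_bp):
    """Return the fragment sizes inside the size window, sorted ascending.

    The boundaries themselves are kept, because the text excludes only
    fragments strictly smaller than 850 bp or strictly larger than 48.5 kb.
    """
    return sorted(size for size in fragment_sizes_bp
                  if MIN_FRAGMENT_BP <= size <= MAX_FRAGMENT_BP)


# Made-up fragment sizes for one digested clone.
print(usable_bands([60000, 420, 2200, 850, 48500, 15000]))
```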

Chromosome Walking.

Contig maps for YR2 were constructed by a combination of landmark content mapping and restriction digestion fingerprinting. Y-linked RDA markers (4) and PAC-end sequences were used to screen the gridded array of the male genomic PAC library (2). DNAs of isolated PAC clones were subjected to restriction digestion fingerprinting with BamHI and NotI to establish extents of overlaps among them.

1. Okada S, Sone T, Fujisawa M, Nakayama S, Takenaka M, Ishizaki K, Kono K, Shimizu-Ueda Y, Hanajiri T, Yamato KT, et al. (2001) Proc Natl Acad Sci USA 98:9454-9459.

2. Okada S, Fujisawa M, Sone T, Nakayama S, Nishiyama R, Takenaka M, Yamaoka S, Sakaida M, Kono K, Takahama M, et al. (2000) Plant J 24:421-428.

3. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al. (2001) Science 291:1304-1351.

4. Fujisawa M, Hayashi K, Nishio T, Bando T, Okada S, Yamato KT, Fukuzawa H, Ohyama K (2001) Genetics 159:981-985.

5. Nagai J, Yamato KT, Sakaida M, Yoda H, Fukuzawa H, Ohyama K (1999) DNA Res 6:1-11.

6. Egydio de Carvalho C, Tanaka H, Iguchi N, Ventela S, Nojima H, Nishimune Y (2002) Biol Reprod 66:785-795.

7. Li JB, Gerdes JM, Haycraft CJ, Fan Y, Teslovich TM, May-Simera H, Li H, Blacque OE, Li L, Leitch CC, et al. (2004) Cell 117:541-552.

8. Lorenzetti D, Bishop CE, Justice MJ (2004) Proc Natl Acad Sci USA 101:8402-8407.

9. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Church DM, DiCuccio M, Edgar R, Federhen S, Helmberg W, et al. (2005) Nucleic Acids Res 33:D39-D45.

10. Kaneda M, Okano M, Hata K, Sado T, Tsujimoto N, Li E, Sasaki H (2004) Nature 429:900-903.

11. Lyko F, Whittaker AJ, Orr-Weaver TL, Jaenisch R (2000) Mech Dev 95:215-217.

12. Hsu MY, Inouye M, Inouye S (1990) Proc Natl Acad Sci USA 87:9454-9458.

13. Okamoto H, Hirochika H (2001) Trends Plant Sci 6:527-534.

14. Oda K, Yamato K, Ohta E, Nakamura Y, Takemura M, Nozato N, Akashi K, Kanegae T, Ogura Y, Kohchi T, et al. (1992) J Mol Biol 223:1-7.

15. Ohyama K, Fukuzawa H, Kohchi T, Shirai H, Sano T, Sano S, Umesono K, Shiki K, Takeuchi M, Chang Z, et al. (1986) Nature 322:572-574.

16. Crosson S, Rajagopal S, Moffat K (2003) Biochemistry 42:2-10.

17. Sulston J, Mallett F, Durbin R, Horsnell T (1989) Comput Appl Biosci 5:101-106.

18. Soderlund C, Longden I, Mott R (1997) Comput Appl Biosci 13:523-535.