Diversity of GH43 modules or proteins. (A) Unrooted phylogenetic tree of GH43 proteins. The eukaryotic and bacterial GH43 proteins with EC numbers in the CAZy database and GH43 proteins from F. succinogenes S85 obtained from the FibRumba database were used to generate the phylogenetic tree. GH43 domains of the retrieved proteins were aligned using ClustalW. The sequence alignments are shown in Fig. S3 of the supplemental material. The GenBank accession numbers (source of protein) were as follows: ACE84667 (Cellvibrio japonicus Ueda107) (39), BAC68753 (Streptomyces avermitilis MA-4680) (21), BAA90772 (Streptomyces chartreusis GS901) (38), BAF98235 (Vibrio sp. XY-214) (60), AAF66622 (Azospirillum irakense KBC1), CAB13699 (Bacillus subtilis subsp. subtilis ATCC 6051) (7), AAB08024 (Bacteroides ovatus V975), AAO75476 (Bacteroides thetaiotaomicron VPI-5482), AAO67499 (Bifidobacterium adolescentis DSM20083) (64), AAA63610 (Butyrivibrio fibrisolvens GS113) (61), AAB87371 (Caldicellulosiruptor saccharolyticus), BAA02527 (Clostridium stercorarium F-9) (48), CAD48310 (Clostridium stercorarium NCIMB 11745) (2), AAB97967 (Selenomonas ruminantium GA192) (25), ACF39706 (uncultured bacterium) (70), ABB92159 (uncultured bacterium) (68), BAB07402 (Bacillus halodurans C-125) (52), AAC97375 (Bacillus pumilus PLS) (34), CAA29235 (Bacillus pumilus IPO) (72), AAC27699 (Bacillus sp. KK-1) (26), AAB41091 (Bacillus subtilis), BAC87941 (Clostridium stercorarium F-9) (55), ABI49959 (Geobacillus stearothermophilus T-6) (50), ABC75004 (Geobacillus thermoleovorans IT-08) (69), CAA89208 (Prevotella bryantii B14) (16), BAA20372 (Bacillus subtilis IFO3134), CAA99586 (Bacillus subtilis subsp. subtilis 168T+) (35), CAB15969 (Bacillus subtilis subsp. subtilis 168T+) (23), ACE73676 (Geobacillus stearothermophilus T-6) (3), BAB64339 (Bacillus thermodenitrificans TS-3) (58), ABN51896 (Clostridium thermocellum ATCC 27405) (20), BAC69820 (Streptomyces avermitilis MA-4680) (19), AAB95326 (Caldicellulosiruptor sp. Rt69B.1), AAD30363 (Caldicellulosiruptor sp. Tok7B.1), CAA40378 (Paenibacillus polymyxa) (17), AAG27441 (Aspergillus aculeatus 101.43) (51), EAA58736 (Aspergillus nidulans FGSC A4) (4), EAA58810 (Aspergillus nidulans FGSC A4) (4), AAA32682 (Aspergillus niger) (46), CAK49041 (Aspergillus niger) (46), BAD89094 (Penicillium chrysogenum 31B), BAD15018 (Penicillium chrysogenum 31B) (47), BAE55732 (Aspergillus oryzae RIB40) (56), AAC67554 (Cochliobolus carbonum) (71), XP_391644 (Gibberella zeae PH-1), BAC75546 (Penicillium herquei IFO 4674) (24), XP_391670 (Gibberella zeae PH-1), CAL81199 (Humicola insolens DSM 18000) (54), ACP50519 (Penicillium purpurogenum MYA-38), BAH29957 (Irpex lacteus NBRC5367) (32), and BAD98241 (Phanerochaete chrysosporium) (22). The identifiers of the proteins from F. succinogenes S85 are as denoted in the FibRumba database (http://www.jcvi.org/rumenomics). An abbreviation for the EC number(s) of each protein is shown in square brackets. The abbreviations are as follows: [8], EC 3.2.1.8 (xylanase); [37], EC 3.2.1.37 (β-1,4-xylosidase); [55], EC 3.2.1.55 (α-L-arabinofuranosidase); [72], EC 3.2.1.72 (β-1,3-xylosidase); [99], EC 3.2.1.99 (arabinanase); [145], EC 3.2.1.145 (galactan 1,3-β-galactosidase). When there is biochemical evidence for the enzymatic activity, the numbers in the square brackets denoting the particular activity are underlined, and the references are included in the legend above, after the GenBank accession numbers. Bar, 0.1 amino acid substitutions per site. The bootstrap values are shown at the branch points.
Differences in the domain organization of the proteins are represented by different colors: green (type I), blue (type II), red (type III), and black (type IV). (B) Classification of GH43 proteins based on the presence or absence of the commonly associated modules (CBM6 or XX domain). The GH43 proteins were grouped into type I (composed mostly of a stand-alone GH43 module), type II (GH43 module fused at the C terminus to a CBM6), type III (GH43 module fused at the C terminus to an XX domain), and type IV (a more complex GH43 protein with a modular architecture outside types I, II, and III). Only the GH43 module was used in the phylogenetic analysis in panel A.
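
The grouping rule described for panel B can be illustrated with a minimal sketch. It assumes each protein is represented as a list of module names in N-to-C-terminal order; the function name `classify_gh43` and the module labels are hypothetical, and the strict matching is a simplification (the legend states that type I is composed "mostly" of stand-alone GH43 modules, so the authors' assignments may include exceptions this sketch does not capture).

```python
def classify_gh43(modules):
    """Assign a GH43 protein to type I-IV from its module order (N to C terminus).

    Simplified, hypothetical rule set based on the panel B description:
      I   -> stand-alone GH43 module
      II  -> GH43 module followed at the C terminus by CBM6
      III -> GH43 module followed at the C terminus by an XX domain
      IV  -> any other modular architecture containing GH43
    """
    if "GH43" not in modules:
        raise ValueError("protein has no GH43 module")
    if modules == ["GH43"]:
        return "I"
    if modules == ["GH43", "CBM6"]:
        return "II"
    if modules == ["GH43", "XX"]:
        return "III"
    return "IV"


# Illustrative architectures only; these are not taken from the figure.
examples = {
    "stand-alone": ["GH43"],
    "with CBM6": ["GH43", "CBM6"],
    "with XX": ["GH43", "XX"],
    "complex": ["GH43", "CBM6", "CBM6"],
}
types = {name: classify_gh43(arch) for name, arch in examples.items()}
```

Because only the GH43 module was used for the phylogeny in panel A, a rule of this kind serves solely to color the branches and does not influence the tree itself.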