Schematic overview, taxonomic distribution and ancestral character analysis of the extended LHC protein superfamily. (A) Schematic overview of the major protein families. The drawing shows the approximate positions and relative lengths of conserved regions and sequence motifs, including the CB motif-containing first and third TM alpha-helices. Details for each identified sequence are given in Additional file 1, Table S1. Classification into a limited set of families and subfamilies was based on three main criteria: (i) sequence similarity to members of previously described or newly defined families or subfamilies (such as OHP2 and RedCAP), assessed with HMM and local BLAST searches against our own database; (ii) predicted secondary structure, including the number and order of predicted TM helices; and (iii) predicted sequence motifs, such as CB and carotenoid-binding motifs. One-helix sequences are divided into the well-conserved, nuclear-encoded OHP1, which is limited to the green lineage, and the more diverse plastid-encoded or cyanobacterial HLIPs. Nuclear one-helix sequences without pronounced similarity to OHP1 are referred to as OHP1-like. (B) Presence (black circle) and absence (white circle) of LHC superfamily sequences in genomic and EST (indicated by an asterisk) databases of twelve photosynthetic eukaryotes representing all major lineages of Plantae and of three divergent cyanobacteria (unicellular as well as filamentous, heterocyst-forming). Genus names are shown (for species names, see Additional file 1, Table S1). Gene locations are marked "p" (plastid-encoded) or "n" (nuclear-encoded). (C) Ancestral character evolution analysis for plastid-related genes of cyanobacterial origin (see also Methods). The distribution of the distinct families of the extended LHC protein superfamily is mapped onto a species tree corresponding to a consensus plastid phylogeny. Gloeobacter violaceus was used to root the tree.
This analysis suggests the evolutionary origin of each family, indicated on the tree by a colored circle followed by the family name.