Fig. 7.
Database subfamily representation. Proposed database representation for TE subfamilies that maintains a detailed phylogenetic structure while reducing the number of representative models for practical genome-scale annotation. The TE seed alignment (1) from a family with evidence of subfamily structure is analysed by a clustering method to produce a detailed subfamily structure and membership (2). Sequence models are developed for the subfamilies, and two or more subfamilies are lumped (3) if keeping them separate does not improve model performance. The lumped subfamilies and their corresponding seed alignments are added to the database (4), with metadata holding the detailed tree structure and the seed sequence membership of each subfamily.
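The four steps in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `DatabaseEntry` record, the `lump_subfamilies` function, the greedy merge order, the Newick string for the tree and the `split_improves` callback (a stand-in for whatever model-performance test is actually used) are all hypothetical names and choices introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DatabaseEntry:
    """One lumped model plus metadata preserving the detailed structure (hypothetical)."""
    name: str
    seed_ids: List[str]                      # seed alignment members of the lumped model
    subfamily_members: Dict[str, List[str]]  # detailed subfamily -> its seed sequences
    tree: str                                # detailed tree, assumed here to be Newick


def lump_subfamilies(
    subfamilies: Dict[str, List[str]],
    split_improves: Callable[[List[str], List[str]], bool],
    tree: str,
) -> List[DatabaseEntry]:
    """Greedily merge subfamilies whose separation does not improve performance."""
    entries: List[DatabaseEntry] = []
    for name, seeds in subfamilies.items():
        for entry in entries:
            # Lump when keeping the two groups apart gives no improvement.
            if not split_improves(entry.seed_ids, seeds):
                entry.seed_ids.extend(seeds)
                entry.subfamily_members[name] = list(seeds)
                entry.name += "+" + name
                break
        else:
            # No existing entry absorbs this subfamily, so it gets its own model.
            entries.append(DatabaseEntry(name, list(seeds), {name: list(seeds)}, tree))
    return entries


# Toy input: an assumed clade label per seed stands in for a real performance test.
clade = {"s1": "X", "s2": "X", "s3": "X", "s4": "Y", "s5": "Y"}
subfamilies = {"A1": ["s1", "s2"], "A2": ["s3"], "B": ["s4", "s5"]}
entries = lump_subfamilies(
    subfamilies,
    split_improves=lambda a, b: clade[a[0]] != clade[b[0]],
    tree="((A1,A2),B);",
)
```

In this toy run the subfamilies `A1` and `A2` are lumped into a single entry while `B` keeps its own model, and each entry still records which seed sequences belonged to which detailed subfamily.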