
2016 Jul 29;12(7):e1005038. doi: 10.1371/journal.pcbi.1005038


© 2016 Bernardes et al

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Fig 1. The first step (top) constructs domain profiles from the Pfam database. A specific set of species can optionally be supplied to CLADE to guide the selection of homologous sequences (and species) used to build clade-centered models (CCMs); otherwise the selection is random. The output is a library of probabilistic models: for each Pfam domain, it contains an SCM, provided by Pfam, and a large number of CCMs, associated with the FULL set of Pfam sequences for the domain. All probabilistic models are used to identify potential domains occurring in query sequences. The schema illustrates model construction for domain D¹; the same procedure is applied to all Pfam domains. The second step (middle) matches all models generated in step 1 against the query sequences, which belong either to a given genome or to a set of sequences given as input, and identifies a set of potential domains occurring in them. It then filters the potential domains using support vector machines (SVMs): for each domain, it constructs an SVM that combines multiple features extracted from the SCM and the CCMs associated with the domain. The schema illustrates domain identification for a single query sequence; the same procedure is applied to all input sequences. The third step (bottom) takes the positions of the potential domains in a query sequence (from step 2) and runs DAMA, a tool designed to predict domain architectures from known ones.
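The filtering idea in the second step can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from CLADE: the `Hit` record, the three features (best SCM score, best CCM score, number of CCM hits), the domain labels `D1` and `D2`, and the hand-picked weights. A plain linear decision function stands in for the trained per-domain SVM, so the sketch only shows how evidence from several models of one domain might be combined into a single accept/reject decision.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One model match on a query sequence (hypothetical record layout)."""
    domain: str      # domain label, e.g. "D1" (placeholder, not a real accession)
    model_kind: str  # "SCM" or "CCM"
    start: int       # match start position on the query sequence
    end: int         # match end position on the query sequence
    score: float     # assumed model score, higher is better


def features_by_domain(hits):
    """Summarise the hits of each domain as a small feature vector.

    Assumed features: (best SCM score, best CCM score, number of CCM hits).
    """
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit.domain].append(hit)

    features = {}
    for domain, domain_hits in grouped.items():
        scm = [h.score for h in domain_hits if h.model_kind == "SCM"]
        ccm = [h.score for h in domain_hits if h.model_kind == "CCM"]
        features[domain] = (max(scm, default=0.0), max(ccm, default=0.0), len(ccm))
    return features


def linear_decision(x, weights, bias):
    """Linear decision value; a stand-in for a trained SVM decision function."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def filter_domains(hits, classifiers):
    """Keep the domains whose own classifier returns a positive decision value."""
    accepted = []
    for domain, x in features_by_domain(hits).items():
        weights, bias = classifiers[domain]  # one classifier per domain
        if linear_decision(x, weights, bias) > 0:
            accepted.append(domain)
    return sorted(accepted)


# Made-up hits on one query sequence: D1 is matched by the SCM and two CCMs,
# D2 only by a single weak CCM.
hits = [
    Hit("D1", "SCM", 10, 120, 25.0),
    Hit("D1", "CCM", 12, 118, 40.0),
    Hit("D1", "CCM", 15, 119, 35.0),
    Hit("D2", "CCM", 200, 260, 8.0),
]

# Hand-picked (weights, bias) per domain; a real pipeline would learn these.
classifiers = {
    "D1": ((0.5, 0.5, 1.0), -20.0),
    "D2": ((0.5, 0.5, 1.0), -20.0),
}

accepted = filter_domains(hits, classifiers)
print(accepted)
```

With these made-up numbers, `D1` is kept because several models support it, while `D2` is rejected on the strength of a single weak CCM hit. The retained positions would then be the input to the architecture-prediction step.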