Skip to main content
. 2024 Jan 4;25:11. doi: 10.1186/s12859-023-05586-5

Fig. 3.

Fig. 3

Data leakage and multi-domain proteins. A prime example of evolutionarily-induced data leakage stems from the modular anatomy of many proteins, wherein multiple copies (which often vary only slightly, e.g. as paralogs) of a particular domain are stitched together into a full-length protein. This type of phenomenon is particularly prevalent among protein homologs from more phylogenetically recent species (e.g., eukaryotes like human or yeast, versus archaeal or bacterial lineages). Notably, many proteins that contain SH3, OB and Ig domains are found to include multiple copies of those domains. Examples are schematically illustrated here, using PDB entries 2QQR, 1SSF, 3WGI, and 3L5H