Abstract
Myosins form one of the largest protein superfamilies, with 24 classes. They share conserved structural features and catalytic domains yet vary widely in other domains, which results in a variety of functions. Myosins drive many kinds of cellular processes and motility, up to the level of the whole organism. They are ATPases that use the chemical energy released by ATP hydrolysis to bring about conformational changes leading to motor function. Myosins are involved in almost all cellular activities, ranging from cell division to transcriptional regulation. They are also implicated in many congenital diseases marked by muscular malfunction, cardiac disease, deafness, and neural and immunological dysfunction, many of which lead to death at an early age. We present Myosinome, a database of selected myosin classes (myosin II, V, and VI) from five model organisms. This knowledge base provides the sequences, phylogenetic clustering, and domain architectures of myosins, together with molecular models, structural analyses, and relevant literature of their coiled-coil domains. The current version of Myosinome presents information about 71 myosin sequences belonging to three myosin classes (myosin II, V, and VI) in five model organisms (Homo sapiens, Mus musculus, D. melanogaster, C. elegans, and S. cerevisiae) identified using bioinformatics surveys; several of them are yet to be functionally characterized. As these proteins are involved in congenital diseases, such a database would be useful in short-listing candidates for gene therapy and drug development. The database can be accessed from http://caps.ncbs.res.in/myosinome.
Keywords: myosin, Myosinome, myosin II, myosin V, myosin VI, myosin database
Introduction
Movement in higher organisms is controlled by muscles. Muscles are made up of muscle fibrils comprising thin and thick filaments, which, in turn, comprise actin and myosin oligomeric assemblies.1 Apart from the conventional fibre-forming myosins in muscles, there are numerous non-muscle myosins localized specifically in various compartments and cytoplasmic or nuclear regions of the cell. Preliminary sequence-based phylogenetic analysis suggested that myosins are subdivided into 24 different classes,2 denoted myosin I to XXIV.
Myosins are molecular motors that bring about movement by converting chemical energy to kinetic energy through ATP hydrolysis and conformational changes. The resultant motion of the motor is used in numerous motility-associated functions inside the cell, through self-assembly as well as association with various subcellular cargoes and molecules. For instance, myosin type II self-assembles into oligomeric structures at the cell plate and participates in cleavage furrow formation during cell division.3,4 Myosin type V forms dimers and participates in processes such as melanosome and vesicle transport.5 Myosin V proteins are also reported to take part in growth cone formation in neuronal cells.6 Yet another interesting myosin, myosin VI, is the only one known to have retrograde motility, towards the minus end of the actin filament. These myosins act as pressure sensors at the cochlear hair cells of the inner ear. Indeed, a genetic mutation resulting in a non-functional myosin VI gives rise to congenital hearing disability.
Myosins have the general structure of a head domain followed by a lever arm (neck region) and a C-terminal tail. The N-terminal head possesses the ATPase activity and also carries the actin-binding sites.7–9 In the actin-bound form of myosin, ATP hydrolysis triggers conformational changes that are translated into a swinging motion of the lever arm. The head domain attached to the swinging lever arm finds the next binding site on the actin filament, resulting in a forward-stepping motion.10 The molecular mechanism of head domain function is more or less conserved, whereas the tail regions differ considerably in their sequence, fine structure, and function.11
Myosins are relatively large proteins, 1000 to 2000 amino acid residues long. Sequence similarity-based phylogeny of the head domain sequences resolves the different classes of myosins,12–14 and the various classes are hypothesized to derive from an ancestral myosin, which, in turn, evolved from the P-loop NTPases.10 The overall domain architecture of myosins in many cases differs considerably at the tail region, as in myosin types III, XII, XIV, and XVI, which do not possess an alpha helical region that can fold into a coiled coil. Instead, the head domain is directly connected to a globular tail domain.15 In the Myosinome database, domain boundaries were assigned using multiple approaches, and a consensus domain architecture is provided for each of the sequences.
The variations among myosins are not only due to differences in domain architectures. Insertions, deletions, and amino acid substitutions have also resulted in functional variety. The unique insert domain of the myosin VI neck is one such example, where the insertion has reversed the direction of motion of the protein. The tail regions in most of the myosin types possess an alpha helix that may dimerize into a coiled coil structure that precedes a globular cargo-binding domain. The coiled coil region has a characteristic heptad repeat pattern in which the 1st and 4th residues of the heptad (the a and d positions) are expected to be hydrophobic. This arrangement of amino acids forms a hydrophobic seam and results in the formation of a coiled coil through the knobs-into-holes arrangement of side chain packing, the structural signature of a coiled coil.16 Many existing coiled coil prediction methods predict a continuous coiled coil and are also capable of reporting stutters and stammers. However, experiments have shown that the predicted coiled coils in myosins do not necessarily interact strongly to form perfect coiled coils. Instead, they may interact in parallel or even fold to form trimers.17–19 Hence, the three-dimensional structure was obtained by imposing a coiled coil structure through a homology modeling tool (MODELLER)20 at the predicted coiled coil regions in all sequence entries in the database, and the models were later analyzed for the hydrophobic content of the central seam. Thus, the Myosinome database was developed as a platform to integrate knowledge and predictions on myosin sequence, architecture, structure, and function from various sources.
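The heptad rule described above can be illustrated with a minimal sketch. It assumes that the heptad register of the first residue is already known (for example, from a coiled coil predictor) and that the hydrophobic set is A, V, I, L, M, F, W, and Y; both are assumptions made for illustration, and the function name is hypothetical rather than part of Myosinome.

```python
# Assumed hydrophobic residue set; the article does not specify one.
HYDROPHOBIC = set("AVILMFWY")


def seam_hydrophobicity(sequence, register_start="a"):
    """Fraction of hydrophobic residues at heptad positions a and d.

    register_start is the heptad position (a-g) of the first residue.
    """
    positions = "abcdefg"
    offset = positions.index(register_start)
    # Collect the residues that fall on the a and d positions of each heptad.
    core = [
        residue
        for i, residue in enumerate(sequence.upper())
        if positions[(i + offset) % 7] in ("a", "d")
    ]
    if not core:
        return 0.0
    return sum(residue in HYDROPHOBIC for residue in core) / len(core)


# Idealized toy heptad (not a real myosin sequence): L at a and d.
toy = "LKELEDK" * 3
in_register = seam_hydrophobicity(toy, "a")
out_of_register = seam_hydrophobicity(toy, "b")
```

A value close to 1 indicates a well-formed hydrophobic seam in the chosen register, whereas a low value suggests either a weak seam or a wrongly assigned register.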
Construction and Content
Data collection and integration
For sequence data collection, we employed a sensitive sequence search tool, Position-Specific Iterated BLAST (PSI-BLAST). We set up the searches using the BLAST+ package21 against the proteomes of five model organisms: human, mouse, Drosophila, C. elegans, and yeast. Since the Myosinome database focuses on the domain architectures and specifically on the coiled coil, we restricted our searches to myosin II, IV, V, and VI. Searches were performed using 24 myosin sequences that were obtained from an initial text search at the National Center for Biotechnology Information (NCBI). At least one sequence from each of the selected myosin classes was employed as a query in these searches. We employed a relatively relaxed E-value cutoff of 1e-5 in order to increase the coverage. Unique hits from all the searches using the 24 query sequences were pooled into a single list.
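The pooling step can be sketched as follows. This is an illustration only, not the authors' script: it assumes the search results were written in the BLAST+ tabular format (`-outfmt 6`) with the default 12 columns, and it treats the 1e-5 cutoff as inclusive. The function name and the example rows are hypothetical.

```python
def pool_unique_hits(tabular_lines, evalue_cutoff=1e-5):
    """Pool unique subject IDs that pass the E-value cutoff.

    Assumes BLAST+ tabular output (-outfmt 6, default columns), where
    column 2 is the subject ID and column 11 is the E-value.
    Returns a dict mapping each subject ID to its lowest E-value.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id, evalue = fields[1], float(fields[10])
        if evalue <= evalue_cutoff:
            # Keep the lowest E-value seen for this subject across queries.
            best[subject_id] = min(evalue, best.get(subject_id, evalue))
    return best


# Made-up rows for illustration; hitA is found by two different queries.
example = [
    "query1\thitA\t45.0\t800\t400\t10\t1\t800\t1\t790\t1e-120\t420",
    "query2\thitA\t40.0\t780\t420\t12\t1\t780\t5\t770\t3e-90\t330",
    "query2\thitB\t22.0\t120\t90\t4\t10\t130\t3\t118\t0.02\t35",
]
hits = pool_unique_hits(example)
```

Keying the pooled list on the subject identifier means that a sequence recovered by several of the 24 queries is counted once.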
As a second step in data collection, we validated the sequences obtained from PSI-BLAST.22 We adopted a three-fold validation. First, using a string-based annotation filter script (genepeptscript1), we separated the annotated myosins among the PSI-BLAST hits into myosin types II, IV, V, and VI, using the GenPept data from NCBI as input. This filter also placed annotated myosins of classes other than II, IV, V, and VI into a separate category. Second, the sequences classified in this way were validated by assessing the evolutionary relationships of their head domains. For this, we used a phylogenetic clustering method, neighbor joining, in the PHYLIP package.23 Head domains of PSI-BLAST hits were seeded into a set of sequences comprising other known myosins of 19 different classes and a few non-myosin sequences, such as helicases, ATPases, and kinesins. These non-myosins acted as outgroups and helped ensure that only hits grouping with other myosin sequences were selected. This sequence similarity-based clustering was useful in assigning a myosin class to each hit and in cross-validating the hits before we organized them into a database. Third, we employed an integrated domain architecture definition obtained from various databases to provide a more comprehensive definition of domains for the sequences obtained. The domain definitions were obtained from CDD,24 PFAM,25 COILS,26 and so on. Boundary definitions were obtained from the SMART database. A detailed schema of the sequence retrieval and validation protocol is provided in Figure 1.
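The first validation step, a string-based annotation filter, might look like the sketch below. The actual genepeptscript1 is not shown in this article, so the regular expression, the function name, and the category labels are assumptions made for illustration. The sketch only recognizes classes written as Roman numerals in the definition line, so entries annotated without a numeral (for example, muscle myosin heavy chains) fall into an "unannotated" bin and would need the later validation steps.

```python
import re

# Classes targeted by the searches described in the text.
TARGET_CLASSES = {"II", "IV", "V", "VI"}

# Matches e.g. "myosin II", "myosin-Va", "myosin type VI", "myosin VIIA".
# The optional trailing letter is treated as an isoform suffix (assumption).
CLASS_PATTERN = re.compile(
    r"[Mm]yosin[\s\-]+(?:type\s+|class\s+)?([IVX]+)[A-Ca-c]?\b"
)


def classify_annotation(definition):
    """Assign a definition line to a target class, 'other', or 'unannotated'."""
    match = CLASS_PATTERN.search(definition)
    if match is None:
        return "unannotated"
    numeral = match.group(1)
    return numeral if numeral in TARGET_CLASSES else "other"


examples = [
    "unconventional myosin-Va [Mus musculus]",
    "myosin VIIA [Homo sapiens]",
    "myosin heavy chain, skeletal muscle",
]
labels = [classify_annotation(text) for text in examples]
```

Because such a filter depends entirely on how each entry was annotated, it serves only as a first pass, which is consistent with the phylogenetic and domain-based checks that follow it in the protocol.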
Coiled coil regions (the dimerization motifs) of each of the identified myosin sequences were modeled with MODELLER, using tropomyosin as a template. Where the coiled coil was much longer than the template, as in myosin II and V, models were generated by a protocol that reuses the same template repeatedly, with an overlapping region spanning 70 residues (Margaret et al, oral communication; December, 2010). The models generated this way were analyzed for charged patches using an in-house script. Charged patches are defined based on the density of charged residues in every heptad. An empirical scoring scheme was applied to calculate the charge density by counting the number of charged residues within a distance cutoff of 12 Å. The charge density correlated positively with the total number of charged residues. Hence, a gradient color scheme was adopted to distinguish densely charged patches from weakly charged ones. Apart from our own models, available crystal structures, NMR structures, and cryo-electron microscopy structures were also collected from the Protein Data Bank (PDB)27 through keyword searches and literature-based searches.
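The counting step behind the charge density can be sketched as below. The in-house script is not described in detail, so several points are assumptions: one representative atom per residue (for example, the C-alpha atom), a charged set of D, E, K, and R, and a simple count of neighboring charged residues as the score. The function name is hypothetical.

```python
import math

# Assumed charged residue set; histidine is left out in this sketch.
CHARGED = set("DEKR")


def charge_density(residues, cutoff=12.0):
    """Count charged neighbors within `cutoff` angstroms of each charged residue.

    residues: list of (one_letter_code, (x, y, z)) tuples, one atom per residue.
    Returns a dict mapping the index of each charged residue to its count.
    """
    charged = [(i, xyz) for i, (aa, xyz) in enumerate(residues) if aa in CHARGED]
    density = {}
    for i, xyz_i in charged:
        density[i] = sum(
            1
            for j, xyz_j in charged
            if j != i and math.dist(xyz_i, xyz_j) <= cutoff
        )
    return density


# Toy coordinates along one axis (not taken from a real model).
toy = [
    ("E", (0.0, 0.0, 0.0)),
    ("L", (0.0, 0.0, 1.5)),
    ("K", (0.0, 0.0, 3.0)),
    ("R", (0.0, 0.0, 20.0)),
]
toy_density = charge_density(toy)
```

The resulting counts could then be mapped onto a color gradient, in the spirit of the representation described above.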
Sequence- and structure-level data should be useful in the case of myosins whose functions are not yet experimentally characterized. However, numerous experiments have been performed to understand this class of motor protein, and a consolidation of this functional information will help in identifying future directions. The function of a molecule is inseparable from the pathways in which it takes part. Hence, we provide available KEGG pathway maps for different classes of myosins in this database.28 Finally, a list of related literature is presented along with PubMed links. Literature collection involved keyword searches in PubMed. The PubMed IDs of the literature relevant to each protein accession number were obtained from the GenPept file using automated scripts.
Database implementation
All results are delivered to the user through a web application. The database was implemented in MySQL and the graphical user interface (GUI) was designed in HTML. PHP scripts handle the communication between the GUI and the back-end database, retrieving stored myosin data in response to user queries. The back-end BLAST parser scripts were written in Perl, and domain architecture drawings were made using HTML style sheets and mouseover functions. Phylogenetic trees, models of the coiled coils, and charge distributions are provided as images. Coiled coil models are also displayed in a more user-friendly, interactive fashion using Jmol applets. The entire application is set up behind an Apache web server using LAMP (Linux-Apache-MySQL-PHP/Perl) (see Fig. 2 for a schema of the Myosinome web application).
Utility and Discussion
Browsing
The Myosinome database can be browsed in three ways: search by keyword, browse by organism, and search by sequence.
In the keyword search box, users can enter different terms, for example myosin, myosin type II, or an accession number. This facilitates searching when a user knows a specific sequence accession number or wants to retrieve all the sequences of a particular myosin type across all the genomes in the Myosinome database.
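The keyword lookup can be illustrated with a small, self-contained sketch. Myosinome itself uses MySQL with PHP; the sketch below swaps in Python with an in-memory SQLite database so that it runs on its own. The table name, column names, matching rules, and placeholder accessions (ACC0001 and so on) are all hypothetical and do not reflect the actual Myosinome schema or contents.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with placeholder rows (not real entries)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myosin ("
        "accession TEXT PRIMARY KEY, myosin_type TEXT, organism TEXT)"
    )
    conn.executemany(
        "INSERT INTO myosin VALUES (?, ?, ?)",
        [
            ("ACC0001", "myosin II", "Homo sapiens"),
            ("ACC0002", "myosin V", "Mus musculus"),
            ("ACC0003", "myosin VI", "Homo sapiens"),
        ],
    )
    return conn


def keyword_search(conn, keyword):
    """Return accessions matching an accession number or a myosin type."""
    key = keyword.strip()
    # An exact accession match takes priority.
    rows = conn.execute(
        "SELECT accession FROM myosin WHERE accession = ?", (key,)
    ).fetchall()
    if rows:
        return [row[0] for row in rows]
    # Normalize "myosin type II" and "myosin II" to the same form.
    normalized = " ".join(key.lower().replace("type", " ").split())
    if normalized == "myosin":
        query, params = "SELECT accession FROM myosin ORDER BY accession", ()
    else:
        query = (
            "SELECT accession FROM myosin "
            "WHERE lower(myosin_type) = ? ORDER BY accession"
        )
        params = (normalized,)
    return [row[0] for row in conn.execute(query, params)]


demo = build_demo_db()
type_two = keyword_search(demo, "myosin type II")
```

Parameterized queries are used throughout so that user-supplied keywords are never interpolated into the SQL text.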
The home page of the database also allows the user to submit partial or complete sequences in FASTA format to search for similar sequences in the database. A BLAST search is run in the background, and the related sequences available in the database are displayed.
Alternatively, a panel lists the organisms; clicking on an organism leads to the list of myosins within its whole genome along with their accession numbers. The bottom panel of the home page tabulates general information on myosins in four columns (myosin types, organisms, PDB structures, and KEGG maps) and also provides references to representative classical articles for each type of myosin (Fig. 3).
Content
The accession numbers of individual myosins are linked to a page where detailed information associated with the sequence is provided in a tabular format (Fig. 4). The information in this table is displayed in two main panels. The first panel covers sequence-, structure-, and function-related data in three fields, and the bottom panel contains the most relevant literature corresponding to the protein of interest. The first field of the first panel provides the sequence and a phylogenetic tree. The phylogeny includes classical myosins from different subtypes, with the query highlighted in red font to show its evolutionary position among the other myosin subtypes. The domain architecture of the sequence is shown as a block diagram, and a mouseover option provides the boundaries of each domain. This is followed by a structure section where the three-dimensional model of the coiled coil domain of the sequence is provided for download as well as for graphical display. In addition, a graphical representation of charged patches is displayed (Fig. 5; see also the section on Construction and Content) along with the hydrophobic residue distribution in the central region at the a and d positions of the heptads. These give the user a clear understanding of the relative stability of protein–protein interaction sites and should be of predictive value in designing point mutations for functional analysis. Links to context-dependent help pages explain the methodology in detail, allowing the user to analyze the coiled coils when designing experiments.
Conclusions
Myosinome is a unique repository of sequence, structural, and function data of a limited number of myosin types from select model organisms. It provides downloadable files and in-depth information on sequence, structure, functions, pathways, and relevant literature. It focuses on aspects of variability such as sequence level differences as well as domain architectural level changes. Sequence level changes are examined through the analysis of the coiled coil, which is a crucial domain facilitating self-assembly and interaction with other proteins. The community participation portal in the Myosinome web application is a feature that allows expansion of the database. It is a fully integrated resource with external data sources for bridging maximum knowledge. The interface of Myosinome is navigation-friendly, efficient, and easy to use and, thereby, accelerates knowledge exploration and accumulation. Myosinome would be useful in performing sequence similarity searches against a database of known myosins belonging to selected classes and also in designing mutational experiments that can further expand the functional knowledge base of myosins.
Acknowledgements
We thank NCBS for infrastructural facilities. We thank Prof. Jim Spudich for useful discussions and Mr. Anu Nair for help with scripts on block diagrams of domain architectures.
Footnotes
Author Contributions
RS and DPS devised the framework of the database. DPS, KS, NS, MS, CC and RS contributed to the analysis of sequences and structures of myosins. MI and DPS developed the back-end and web interface. RS and DPS participated in discussions and manuscript preparation.
Competing Interests
Author(s) disclose no potential conflicts of interest.
Disclosures and Ethics
As a requirement of publication author(s) have provided to the publisher signed confirmation of compliance with legal and ethical obligations including but not limited to the following: authorship and contributorship, conflicts of interest, privacy and confidentiality and (where applicable) protection of human and animal research subjects. The authors have read and confirmed their agreement with the ICMJE authorship and conflict of interest criteria. The authors have also confirmed that this article is unique and not under consideration or published in any other publication, and that they have permission from rights holders to reproduce any copyrighted material. Any disclosures are made in this section. The external blind peer reviewers report no conflicts of interest.
Funding
This work was supported by Human Frontier Science Program (HFSP) Grant (Code: RGP0054/2009-C/).
References
- 1. Fischman DA. Electron microscope study of myofibril formation in embryonic chick skeletal muscle. J Cell Biol. 1967;32(3):557–75. doi: 10.1083/jcb.32.3.557.
- 2. Foth BJ, Goedecke MC, Soldati D. New insights into myosin evolution and classification. Proc Natl Acad Sci U S A. 2006;103(10):3681–6. doi: 10.1073/pnas.0506307103.
- 3. Yumura S, Fukui Y. Reversible cyclic AMP-dependent change in distribution of myosin thick filaments in Dictyostelium. Nature. 1985;314:194–6. doi: 10.1038/314194a0.
- 4. Zang J, Spudich JA. Myosin II localization during cytokinesis occurs by a mechanism that does not require its motor domain. Proc Natl Acad Sci U S A. 1998;95(23):13652–7. doi: 10.1073/pnas.95.23.13652.
- 5. Hasson T. Myosin VI: two distinct roles in endocytosis. J Cell Sci. 2003;116:3453–61. doi: 10.1242/jcs.00669.
- 6. Desnos C, Huet S, Darchen F. ‘Should I stay or should I go?’: myosin V function in organelle trafficking. Biol Cell. 2007;99:411–23. doi: 10.1042/BC20070021.
- 7. Bozler E, Prince JT. The control of energy release in extracted muscle fibers. J Gen Physiol. 1953;37(1):53–61. doi: 10.1085/jgp.37.1.53.
- 8. Emoto Y, Kawamura T, Tawada K. Characterization of the ATPase active site in myosin subfragment-1 with the use of vanadate plus ADP as a reversible “affinity-labeling” reagent: evidence for heterogeneity in the active sites. J Biochem. 1985;98(3):735–45. doi: 10.1093/oxfordjournals.jbchem.a135331.
- 9. Dalgarno DC, Prince HP, Levine BA, Trayer IP. Identification of a surface actin-binding site on myosin. Biochim Biophys Acta. 1982;707(1):81–8. doi: 10.1016/0167-4838(82)90399-5.
- 10. Purcell TJ, Morris C, Spudich JA, Sweeney HL. Role of the lever arm in the processive stepping of myosin V. Proc Natl Acad Sci U S A. 2002;99(22):14159–64. doi: 10.1073/pnas.182539599.
- 11. Krendel M, Mooseker MS. Myosins: tails (and heads) of functional diversity. Physiology (Bethesda). 2005;20(4):239–51. doi: 10.1152/physiol.00014.2005.
- 12. Thompson RF, Langford GM. Myosin superfamily evolutionary history. Anat Rec. 2002;268:276–89. doi: 10.1002/ar.10160.
- 13. Foth BJ, Goedecke MC, Soldati D. New insights into myosin evolution and classification. Proc Natl Acad Sci U S A. 2006;103(10):3681–6. doi: 10.1073/pnas.0506307103.
- 14. Richards TA, Cavalier-Smith T. Myosin domain evolution and the primary divergence of eukaryotes. Nature. 2005;436(25):1113–8. doi: 10.1038/nature03949.
- 15. Goodson HV, Dawson SC. Multiplying myosins. Proc Natl Acad Sci U S A. 2006;103(10):3498–9. doi: 10.1073/pnas.0600045103.
- 16. Crick FHC. The Fourier transform of a coiled-coil. Acta Crystallogr. 1953;6:685–97.
- 17. Knight PJ, Thirumurugan K, Yu Y, et al. The predicted coiled-coil domain of myosin 10 forms a novel elongated domain that lengthens the head. J Biol Chem. 2005;280:34702–8. doi: 10.1074/jbc.M504887200.
- 18. Mukherjea M, Llinas P, Kim H, et al. Myosin VI dimerization triggers an unfolding of a three-helix bundle in order to extend its reach. Mol Cell. 2009;35(3):305–15. doi: 10.1016/j.molcel.2009.07.010.
- 19. Spudich JA, Sivaramakrishnan S. Myosin VI: an innovative motor that challenged the swinging lever arm hypothesis. Nat Rev Mol Cell Biol. 2010;11(2):128–37. doi: 10.1038/nrm2833.
- 20. Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol. 1993;234:779–815. doi: 10.1006/jmbi.1993.1626.
- 21. Altschul SF, Madden TL, Schäffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402. doi: 10.1093/nar/25.17.3389.
- 22. Syamaladevi DP, Kalaimathy S, Pasha N, Subramonian N, Sowdhamini R. A three-step validation following genome-wide data mining for myosin family members improves search efficiency. Proceedings of IEEE ICDM 2011 Workshop on Biological Data Mining and its Applications in Healthcare; Dec 10, 2011; Vancouver, Canada.
- 23. Felsenstein J. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics. 1989;5:164–6.
- 24. Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004;32:W327–31. doi: 10.1093/nar/gkh454.
- 25. Finn RD, Mistry J, Tate J, et al. The Pfam protein families database. Nucleic Acids Res. 2010;38:D211–22. doi: 10.1093/nar/gkp985.
- 26. Lupas A, Dyke MV, Stock J. Predicting coiled coils from protein sequences. Science. 1991;252:1162–4. doi: 10.1126/science.252.5009.1162.
- 27. Berman HM, Westbrook J, Feng Z, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–42. doi: 10.1093/nar/28.1.235.
- 28. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.