Abstract
Gene mutations (e.g. substitutions, insertions and deletions) and their associated phenotype information are an important body of biomedical knowledge. Many biomedical databases (e.g. OMIM) incorporate such data. However, few studies have examined the quality of these data. In the current study, we examined the quality of protein single-point mutation records in OMIM by checking whether each reported mutation position aligns with the corresponding reference sequence. Our results show that nearly 20% of the mutation records cannot be mapped to a single reference sequence. These mapping failures arise from position conflicts, site shifts (related to peptide numbering or the N-terminal methionine) and other types of data error. We propose a preliminary model to resolve such inconsistencies in the OMIM database.
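The abstract does not spell out the mapping procedure. The Python sketch below shows one way such a position check could work, assuming OMIM-style three-letter mutation names (e.g. GLY12VAL) and a reference protein sequence supplied as a plain string. The function names, the fixed +1 offset used to model numbering that skips the initiator methionine, and the caller-supplied `peptide_offset` for numbering that starts after a removed N-terminal peptide are all illustrative assumptions, not the authors' model.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Three-letter residue codes as they typically appear in OMIM allelic-variant
# names (e.g. "GLY12VAL"); "TER" denotes a stop codon.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "TER": "*",
}


@dataclass(frozen=True)
class PointMutation:
    wild_type: str  # one-letter code of the reported original residue
    position: int   # reported 1-based residue position
    mutant: str     # one-letter code of the substituted residue


def parse_mutation(text: str) -> PointMutation:
    """Parse a string such as 'GLY12VAL' (hypothetical input format)."""
    m = re.fullmatch(r"([A-Z]{3})(\d+)([A-Z]{3})", text.strip().upper())
    if m is None or m.group(1) not in THREE_TO_ONE or m.group(3) not in THREE_TO_ONE:
        raise ValueError(f"unrecognised mutation: {text!r}")
    wt, pos, mut = m.groups()
    return PointMutation(THREE_TO_ONE[wt], int(pos), THREE_TO_ONE[mut])


def residue_at(reference: str, position: int) -> Optional[str]:
    """Return the residue at a 1-based position, or None if out of range."""
    if 1 <= position <= len(reference):
        return reference[position - 1]
    return None


def classify(mutation: PointMutation, reference: str, peptide_offset: int = 0) -> str:
    """Label how a reported position maps onto the reference sequence.

    'consistent'    : wild-type residue found at the reported position
    'met_shift'     : found one residue downstream, i.e. the reported
                      numbering appears to skip the initiator methionine
    'peptide_shift' : found peptide_offset residues downstream, i.e. the
                      numbering appears to start after a removed peptide
                      whose length the caller supplies (an assumption)
    'conflict'      : none of the above
    """
    if residue_at(reference, mutation.position) == mutation.wild_type:
        return "consistent"
    if residue_at(reference, mutation.position + 1) == mutation.wild_type:
        return "met_shift"
    if peptide_offset > 0 and residue_at(
        reference, mutation.position + peptide_offset
    ) == mutation.wild_type:
        return "peptide_shift"
    return "conflict"


ref = "MKTAYIAKQR"  # toy sequence, not a real OMIM reference
for name in ["LYS2GLU", "THR2ALA", "TYR2HIS", "TRP5ARG"]:
    print(name, classify(parse_mutation(name), ref, peptide_offset=3))
# LYS2GLU consistent
# THR2ALA met_shift
# TYR2HIS peptide_shift
# TRP5ARG conflict
```

A single matching residue can be coincidental: with 20 amino acids, a random position matches the reported wild-type residue roughly one time in twenty. A more robust check would therefore require all mutations reported for the same gene to agree on one offset before accepting a shift.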
Electronic Supplementary Material
Supplementary material is available for this article at doi:10.1007/s13238-012-2037-2 and is accessible to authorized users.
Keywords: single-point mutation, OMIM, reference sequence, data quality