ChemmineR: a compound mining framework for R

Yiqun Cao; Anna Charisi; Li-Chang Cheng; Tao Jiang; Thomas Girke

Bioinformatics. 2008 Jul 2;24(15):1733–1734. doi: 10.1093/bioinformatics/btn307

PMCID: PMC2638865 PMID: 18596077

Abstract

Motivation: Software applications for structural similarity searching and clustering of small molecules play an important role in drug discovery and chemical genomics. Here, we present the first open-source compound mining framework for the popular statistical programming environment R. The integration with a powerful statistical environment maximizes the flexibility, expandability and programmability of the provided analysis functions.

Results: We discuss the algorithms and compound mining utilities provided by the R package ChemmineR. It contains functions for structural similarity searching, clustering of compound libraries with a wide spectrum of classification algorithms and various utilities for managing complex compound data. It also offers a wide range of visualization functions for compound clusters and chemical structures. The package is well integrated with the online ChemMine environment and allows bidirectional communications between the two services.
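
The two core operations named above, similarity searching and clustering, can be illustrated with a minimal sketch. It is written in Python rather than R and does not use or reproduce the ChemmineR API. The abstract does not state which descriptors or similarity measure the package uses, so the sketch assumes that each compound is represented by a set of structural features and that similarity is scored with the Tanimoto coefficient. The compound names, feature labels, function names and the 0.5 cutoff are all hypothetical, and the clustering shown is a plain threshold-based single-linkage grouping chosen for brevity.

```python
from typing import Dict, FrozenSet, List, Tuple

Features = FrozenSet[str]

# Hypothetical toy library: each compound is a set of structural feature labels.
LIBRARY: Dict[str, Features] = {
    "cmp1": frozenset({"a", "b", "c", "d"}),
    "cmp2": frozenset({"a", "b", "c", "e"}),
    "cmp3": frozenset({"x", "y", "z"}),
    "cmp4": frozenset({"x", "y", "w"}),
}


def tanimoto(a: Features, b: Features) -> float:
    """Shared features divided by total distinct features (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(
    query: Features, library: Dict[str, Features], cutoff: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs scoring at least `cutoff`, best first."""
    hits = [(name, tanimoto(query, feats)) for name, feats in library.items()]
    hits = [(name, score) for name, score in hits if score >= cutoff]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


def threshold_clusters(
    library: Dict[str, Features], cutoff: float = 0.5
) -> List[List[str]]:
    """Group compounds linked by chains of pairwise similarity >= cutoff."""
    unassigned = set(library)
    clusters: List[List[str]] = []
    for seed in sorted(library):
        if seed not in unassigned:
            continue
        unassigned.discard(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            # Collect still-unassigned compounds similar enough to `current`.
            linked = [
                name
                for name in sorted(unassigned)
                if tanimoto(library[current], library[name]) >= cutoff
            ]
            for name in linked:
                unassigned.discard(name)
                cluster.append(name)
                frontier.append(name)
        clusters.append(sorted(cluster))
    return clusters


if __name__ == "__main__":
    print(similarity_search(LIBRARY["cmp1"], LIBRARY))
    print(threshold_clusters(LIBRARY))
```

With real data, the feature sets would come from a descriptor calculation on the compound structures, and the cutoff would be tuned to the descriptor type and the library at hand.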

Availability: ChemmineR is freely available as an R package from the ChemMine project site: http://bioweb.ucr.edu/ChemMineV2/chemminer

Contact: thomas.girke@ucr.edu
