2006:69–114. doi: 10.1007/0-387-28014-6_4

Pharmaceutical Drug Discovery: Designing the Blockbuster Drug

David Jesse Cummins
Editors: Angela Dean, Susan Lewis
PMCID: PMC7121638

Abstract

Twenty years ago, drug discovery was a somewhat plodding and scholastic endeavor; those days are gone. The intellectual challenges are greater than ever but the pace has changed. Although there are greater opportunities for therapeutic targets than ever before, the costs and risks are great and the increasingly competitive environment makes the pace of pharmaceutical drug hunting range from exciting to overwhelming. These changes are catalyzed by major changes to drug discovery processes through application of rapid parallel synthesis of large chemical libraries and high-throughput screening. These techniques result in huge volumes of data for use in decision making. Besides the size and complex nature of biological and chemical data sets and the many sources of data “noise”, the needs of business produce many, often conflicting, decision criteria and constraints such as time, cost, and patent caveats. The drive is still to find potent and selective molecules but, in recent years, key aspects of drug discovery are being shifted to earlier in the process. Discovery scientists are now concerned with building molecules that have good stability but also reasonable properties of absorption into the bloodstream, distribution and binding to tissues, metabolism and excretion, low toxicity, and reasonable cost of production. These requirements result in a high-dimensional decision problem with conflicting criteria and limited resources. An overview of the broad range of issues and activities involved in pharmaceutical screening is given along with references for further reading.

Keywords: Partial Least Squares, Drug Discovery, Random Forest, Virtual Screening, Molecular Descriptor

References

  1. Abt M., Lim Y., Sacks J., Xie M., Young S. S. A sequential approach for identifying lead compounds in large chemical databases. Statistical Science. 2001;16:154–168.
  2. Amidon G., Lennernäs H., Shah V., Crison J. A theoretical basis for a biopharmaceutic drug classification: The correlation of in vitro drug product dissolution and in vivo bioavailability. Pharmaceutical Research. 1995;12:413–420. doi: 10.1023/A:1016212804288.
  3. Benjamini Y., Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B. 1995;57:289–300.
  4. Birkett D. J. How drugs are cleared by the liver. Australian Prescriber. 1990;13:88–89.
  5. Birkett D. J. Bioavailability and first pass clearance. Australian Prescriber. 1991;14:14–16.
  6. Breiman L. Bagging predictors. Machine Learning. 1996;24:123–140.
  7. Breiman L. Arcing classifiers. The Annals of Statistics. 1998;26:801–849. doi: 10.1214/aos/1024691079.
  8. Breiman L. Random forests, random features. Berkeley: University of California; 1999.
  9. Breiman L. Random forests. Machine Learning. 2001;45:5–32. doi: 10.1023/A:1010933404324.
  10. Breiman L. Statistical modeling: The two cultures. Statistical Science. 2001;16:199–231. doi: 10.1214/ss/1009213726.
  11. Breiman L., Friedman J., Olshen R., Stone C. Classification and Regression Trees. New York: CRC Press; 1984.
  12. Burges C. J. C. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery. 1998;2:121–167. doi: 10.1023/A:1009715923555.
  13. Burnham K. P., Anderson D. R. Model Selection and Multimodel Inference. New York: Springer-Verlag; 2002.
  14. Campbell C., Cristianini N., Smola A. Query learning with large margin classifiers. Proceedings of ICML2000, 8. 2000.
  15. Comprehensive Medicinal Chemistry. California: MDL Information Systems; 2003.
  16. Cook R. D., Nachtsheim C. J. Model robust, linear-optimal design: A review. Technometrics. 1982;24:49–54. doi: 10.2307/1267577.
  17. Crivori P., Cruciani G., Carrupt P., Testa B. Predicting blood-brain barrier permeation from three-dimensional molecular structure. Journal of Medicinal Chemistry. 2000;43:2204–2216. doi: 10.1021/jm990968+.
  18. Crum-Brown A., Fraser T. R. On the connection between chemical constitution and physiological action. Transactions of the Royal Society of Edinburgh. 1869;25:151–203.
  19. Cummins D. J., Andrews C. W., Bentley J. A., Cory M. Molecular diversity in chemical databases: Comparison of medicinal chemistry knowledge bases and databases of commercially available compounds. Journal of Chemical Information and Computer Sciences. 1996;36:750–763. doi: 10.1021/ci950168h.
  20. Dasarathy B. Nearest Neighbor Pattern Classification Techniques. Los Alamitos, CA: IEEE Computer Society Press; 1991.
  21. Derringer G., Suich R. Simultaneous optimization of several response variables. Journal of Quality Technology. 1980;12:214–219.
  22. Dorfman R. The detection of defective members of large populations. Annals of Mathematical Statistics. 1943;14:436–440.
  23. Drews J. Drug discovery: A historical perspective. Science. 2000;287:1960–1964. doi: 10.1126/science.287.5460.1960.
  24. Dudoit S., Fridlyand J., Speed T. P. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association. 2002;97:77–87. doi: 10.1198/016214502753479248.
  25. Engels M. F. M., Thielemans T., Verbinnen D., Tollenaere J. P., Verbeeck R. Cerberus: A system supporting the sequential screening process. Journal of Chemical Information and Computer Sciences. 2000;40:241–245. doi: 10.1021/ci990435+.
  26. Fix E., Hodges J. L. Nonparametric discrimination: Consistency properties. Texas: U.S. Air Force, School of Aviation Medicine; 1951. Discriminatory analysis.
  27. Frank I., Friedman J. A statistical view of some chemometrics regression tools. Technometrics. 1993;35:109–148. doi: 10.2307/1269656.
  28. Goldberg J., Wittes J. The estimation of false negatives in medical screening. Biometrics. 1978;34:77–86. doi: 10.2307/2529590.
  29. Hansch C., Maloney P. P., Fujita T., Muir R. M. Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature. 1962;194:178–180. doi: 10.1038/194178b0.
  30. Hartigan J. Clustering Algorithms. New York: John Wiley and Sons; 1975.
  31. Hastie T., Tibshirani R. Discriminant adaptive nearest-neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1996;18:607–616. doi: 10.1109/34.506411.
  32. Hastie T., Tibshirani R. Discriminant adaptive nearest neighbor classification and regression. In: Touretzky D. S., Mozer M. C., Hasselmo M. E., editors. Advances in Neural Information Processing Systems. Cambridge: MIT Press; 1996. pp. 409–415.
  33. Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning. New York: Springer-Verlag; 2001.
  34. Hawkins D. M., Basak S. C., Mills D. Assessing model fit by cross-validation. Journal of Chemical Information and Computer Sciences. 2003;43:579–586. doi: 10.1021/ci025626i.
  35. Higgs R., Bemis K., Watson I., Wikel J. Experimental designs for selecting molecules from large chemical databases. Journal of Chemical Information and Computer Sciences. 1997;37:861–870. doi: 10.1021/ci9702858.
  36. JMP. Version 4.0.4. North Carolina: SAS Institute; 2003.
  37. Johnson M. E., Moore L. M., Ylvisaker D. Minimax and maximum distance designs. Journal of Statistical Planning and Inference. 1990;26:131–148. doi: 10.1016/0378-3758(90)90122-B.
  38. Kennard R., Stone L. Computer aided design of experiments. Technometrics. 1969;11:137–148. doi: 10.2307/1266770.
  39. Kramer C. Y. Extensions of multiple range tests to group means with unequal numbers of replications. Biometrics. 1956;12:309–310. doi: 10.2307/3001469.
  40. Leach A. R., Gillet V. J. Introduction to Chemoinformatics. Boston: Kluwer Academic; 2003.
  41. MACCS Drug Data Report. California: MDL Information Systems; 2003.
  42. Major J. What is the future of high-throughput screening? Journal of Biomolecular Screening. 1999;4:119–125. doi: 10.1177/108705719900400304.
  43. Miller A. J. Subset Selection in Regression. Second edition. New York: Chapman & Hall/CRC; 2002.
  44. Phatarfod R. M., Sudbury A. The use of a square array scheme in blood testing. Statistics in Medicine. 1994;13:2337–2343. doi: 10.1002/sim.4780132205.
  45. Rohrer S. P., Birzin E., Mosley R., Berk S. C., Hutchins S., Shen D., Xiong Y., Hayes E., Parmar R., Foor R., Mitra S., Degrado S., Shu M., Klopp J., Cai S. J., Blake A., Chan W. W. S., Pasternak A., Yang L., Patchett A., Smith R., Chapman K., Schaeffer J. Rapid identification of subtype-selective agonists of the somatostatin receptor through combinatorial chemistry. Science. 1998;282:737–740. doi: 10.1126/science.282.5389.737.
  46. Rusinko A., III, Farmen M. W., Lambert C. G., Brown P. L., Young S. S. Analysis of a large structure/biological activity data set using recursive partitioning. Journal of Chemical Information and Computer Sciences. 1999;39:1017–1026. doi: 10.1021/ci9903049.
  47. SAS System. Version 8.2. North Carolina: SAS Institute; 2003.
  48. Shao J. Linear model selection by cross-validation. Journal of the American Statistical Association. 1993;88:486–494. doi: 10.2307/2290328.
  49. Shi P., Tsai C.-L. Regression model selection: A residual likelihood approach. Journal of the Royal Statistical Society B. 2002;64:237–252. doi: 10.1111/1467-9868.00335.
  50. Sittampalam G. S., Iversen P. W., Boadt J. A., Kahl S. D., Bright S., Zock J. M., Janzen W. P., Lister M. D. Design of signal windows in high throughput screening assays for drug discovery. Journal of Biomolecular Screening. 1997;2:159–169. doi: 10.1177/108705719700200306.
  51. Tukey J. W. Reminder sheets for “Allowances for various types of error rate”. In: Braun H. I., editor. The Collected Works of John W. Tukey, volume VIII, Multiple Comparisons: 1948–1983. New York: Chapman & Hall; 1994. pp. 335–339.
  52. Tukey J. W. More honest foundations for data analysis. Journal of Statistical Planning and Inference. 1997;57:21–28. doi: 10.1016/S0378-3758(96)00032-8.
  53. Vapnik V. N. Statistical Learning Theory. New York: Wiley-Interscience; 1998.
  54. Vapnik V. N. The Nature of Statistical Learning Theory. Second edition. New York: Springer-Verlag; 2000.
  55. Warmuth M. K., Liao J., Ratsch G., Mathieson M., Putta S., Lemmen C. Active learning with support vector machines in the drug discovery process. Journal of Chemical Information and Computer Sciences. 2003;43:667–673. doi: 10.1021/ci025620t.
  56. Weston J., Perez-Cruz F., Bousquet O., Chapelle O., Elisseeff A., Schölkopf B. Feature selection and transduction for prediction of molecular bioactivity for drug design. Bioinformatics. 2002;1:1–8. doi: 10.1093/bioinformatics/btg054.
  57. Wikel J. H., Higgs R. E. Point: Applications of molecular diversity analysis in high throughput screening. Journal of Biomolecular Screening. 1997;2:65–66. doi: 10.1177/108705719700200202.
  58. World Drug Index. London: Thomson Derwent; 2002.
  59. Ye J. On measuring and correcting the effects of data mining and model selection. Journal of the American Statistical Association. 1998;93:120–131. doi: 10.2307/2669609.
  60. Young S. S., Ekins S., Lambert C. G. So many targets, so many compounds, but so few resources. Current Drug Discovery. 2002:1–6. (www.currentdrugdiscovery.com)
  61. Zemroch P. J. Cluster analysis as an experimental design generator. Technometrics. 1986;28:39–49. doi: 10.2307/1269602.

Articles from Screening are provided here courtesy of Nature Publishing Group
