Proc Natl Acad Sci USA. 2017 May 22;114(22):5564–5566. doi: 10.1073/pnas.1706266114

The natural productome

Andrea M E Palazzolo, Claire L W Simons, Martin D Burke
PMCID: PMC5465886  PMID: 28533417

Natural products have inspired many highly impactful human medicines, crop protectants, food preservatives, and biological probes. They also make the world a more wonderful place to see, smell, taste, and feel, by serving as many of the most popular colorants, perfumes, seasonings, and lotions (Fig. 1). Having the capacities to disrupt or promote protein–protein interactions (1), allosterically modify protein activities (2), and even autonomously perform protein-like functions (3), natural products continue to inspire the medicines of tomorrow. With such an extensive track record of contributions to society, the question of how much untapped potential natural products possess is an important one. Pye et al. (4) report in PNAS a quantitative analysis of the structural novelty of most of the microbial and marine-derived natural products discovered over the past 70 y. Their findings support what may seem at first glance to be paradoxical conclusions: natural product chemical space is largely bounded, and the opportunity for future discovery in this arena is substantial (Fig. 1).

Fig. 1.

Natural products have already made extensive contributions to society. The finding that the chemical space they occupy is largely bounded paradoxically reveals substantial opportunity for future impact.

Previous support for the notion that natural product chemical space is limited can be drawn from several angles. It has been theorized that because many natural products have coevolved with protein-binding partners, the boundedness of protein fold space and the small number of structurally distinct ligand-binding pockets (5) have reciprocally limited the structural space that natural products occupy (6). Analogous to the assembly and folding of proteins, even the most topologically complex natural products are typically derived from the modular assembly of just a few building blocks, followed by intramolecular cyclizations of the resulting linear structures. These processes are fundamentally bounded by the laws of physical chemistry (5). The tailoring of these cyclic frameworks is performed by an increasingly well-defined collection of enzymes (7). Early efforts to comprehensively sequence the genomes of producing organisms suggest that although the majority of natural products likely await discovery, identifying them represents a bounded problem (8).

Pye et al. (4) approach this question of boundedness in three different ways. First, they quantified the structural novelty of newly discovered microbial and marine-derived natural products as a function of time using Tanimoto similarity scores. The percentage of newly discovered natural products scored as structurally novel declined from 1941 to the mid-1990s and has remained low for the last 20+ y. When Pye et al. repeated the analysis with natural products subdivided by class of producing organism, they observed the same general trend during the ∼30 y following the onset of investigations of each producer.
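The novelty-over-time analysis can be illustrated with a minimal Python sketch. It assumes (our assumption, not necessarily Pye et al.'s exact procedure) that each compound is represented by a fingerprint given as a set of "on" bit indices, and that a compound counts as structurally novel when its Tanimoto similarity to every compound reported in earlier years falls below a cut-off (here 0.4, the cut-off Pye et al. used). The toy fingerprints and the function names `tanimoto` and `percent_novel_by_year` are hypothetical; a real analysis would use Morgan fingerprints computed by a cheminformatics toolkit.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # toy fingerprint: set of "on" bit indices (hypothetical)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A ∩ B| / |A ∪ B| of two bit-set fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def percent_novel_by_year(compounds: List[Tuple[int, Fingerprint]],
                          cutoff: float = 0.4) -> Dict[int, float]:
    """Percentage of each year's compounds that are scored as novel.

    Assumption: a compound is novel if its Tanimoto score against every
    compound from an earlier year is below `cutoff`.
    """
    by_year: Dict[int, List[Fingerprint]] = {}
    for year, fp in compounds:
        by_year.setdefault(year, []).append(fp)

    known: List[Fingerprint] = []  # compounds reported in earlier years
    result: Dict[int, float] = {}
    for year in sorted(by_year):
        fps = by_year[year]
        novel = sum(1 for fp in fps if all(tanimoto(fp, ref) < cutoff for ref in known))
        result[year] = 100.0 * novel / len(fps)
        known.extend(fps)  # this year's compounds become references for later years
    return result


data = [
    (1941, frozenset({1, 2, 3, 4})),
    (1942, frozenset({1, 2, 3, 5})),  # Tanimoto 3/5 = 0.6 vs. the 1941 compound: not novel
    (1942, frozenset({10, 11, 12})),  # no shared bits: novel
]
print(percent_novel_by_year(data))  # {1941: 100.0, 1942: 50.0}
```

Comparing each year only against earlier years is a simplifying choice; comparing compounds reported in the same year against one another would lower the novel fraction further.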

Second, Pye et al. (4) looked specifically at cyclic tetrapeptides to determine what percentage of this region of chemical space is actually occupied by known natural products. Among all physically accessible combinations of the 20 canonical amino acids, known natural products cover only a fraction of a percent. This mirrors similar findings in protein structural space (5) and supports the notion that functional constraints likely contribute to the boundedness of natural product structures (6).
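The size of a simplified tetrapeptide space can be worked out directly. The sketch below assumes, for illustration only, that the space consists of head-to-tail cyclic sequences of the 20 canonical amino acids with no stereochemical variants, N-methylation, or other modifications, and that sequences related by rotation are the same molecule; it therefore does not reproduce Pye et al.'s enumeration of physically accessible structures. Counting sequences up to rotation uses Burnside's lemma: averaging, over the four rotations, the number of sequences each rotation leaves unchanged gives (20^4 + 20 + 20^2 + 20)/4 = 40,110. The function names are hypothetical.

```python
from math import gcd


def cyclic_sequences(alphabet_size: int, length: int) -> int:
    """Count cyclic sequences that are distinct up to rotation (Burnside's lemma).

    Rotation by k positions leaves alphabet_size ** gcd(k, length) sequences
    unchanged; averaging over all `length` rotations gives the number of orbits.
    """
    fixed = sum(alphabet_size ** gcd(k, length) for k in range(length))
    return fixed // length


def percent_covered(n_known: int, n_possible: int) -> float:
    """Share of the enumerated space occupied by known compounds, in percent."""
    return 100.0 * n_known / n_possible


# Simplified model (assumption): 20 canonical amino acids, ring of 4, rotations only.
space = cyclic_sequences(20, 4)
print(space)                                  # 40110
print(round(percent_covered(100, space), 2))  # 0.25 for a hypothetical 100 known compounds
```

Reflections are deliberately not merged, because a cyclic peptide read N→C in the opposite direction is a different molecule.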

Third, Pye et al. (4) calculated an alternative similarity score (Tversky) for each pair of molecules to determine the distribution of natural products within chemical space. This analysis revealed that >75% of the chemical space occupied by these natural products could be described by fewer than 6,500 parent scaffolds. This finding suggests that known natural product scaffolds fall into structurally related bins that may reflect similar selective pressures throughout evolution.
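The scaffold-coverage statistic (how many parent scaffolds are needed to describe >75% of the compounds) can be sketched as follows, assuming each compound has already been assigned to exactly one parent scaffold. The scaffold labels are hypothetical, and the Tversky-based clustering that Pye et al. used to define scaffolds is not reproduced here. Because taking the most populated scaffolds first maximizes coverage at every step, this greedy ordering yields the smallest number of scaffolds that reaches the target fraction.

```python
from collections import Counter
from typing import Iterable


def scaffolds_needed(scaffold_labels: Iterable[str], fraction: float = 0.75) -> int:
    """Smallest number of scaffolds whose member compounds cover >= `fraction` of all compounds.

    Assumes each compound is already labeled with exactly one parent scaffold.
    """
    counts = Counter(scaffold_labels)
    total = sum(counts.values())
    covered = 0
    # Largest scaffold families first: this greedy order is optimal for this count.
    for k, (_, n_members) in enumerate(counts.most_common(), start=1):
        covered += n_members
        if covered >= fraction * total:
            return k
    return len(counts)  # reached only for empty input or an unreachable fraction


# Hypothetical labels: 10 compounds spread over 4 parent scaffolds.
labels = ["A"] * 5 + ["B"] * 3 + ["C"] + ["D"]
print(scaffolds_needed(labels))  # 2 (A and B cover 8 of 10 compounds)
```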

There are several caveats to note. The analysis excludes plant-derived natural products because a suitable comprehensive database is lacking. Additionally, quantifying structural novelty, especially at scale, is challenging and has limitations. In their first study, Pye et al. (4) used Tanimoto similarity scoring with a circular Morgan fingerprint and a subjective cut-off of 0.4 for structural novelty. This method describes each molecule in terms of the neighborhood of each atom out to a certain bond radius, in this case 2. This small radius precludes capturing the relative stereochemistry of distant stereocenters. In addition, the Tanimoto score is symmetric and normalizes by the combined features of both molecules, which means that aberrantly low scores can be observed when a newly discovered compound is a substructure of a larger existing natural product. In the second study, Pye et al. increased the bond radius to 4 to more accurately determine the structural similarity between the tetrapeptides, but the study still relied on Tanimoto scoring and thus shared this substructure limitation. Finally, for the scaffold analysis, Pye et al. transitioned to a Tversky score, which can be evaluated in both directions to better capture substructural relationships, but returned to a bond radius of 2, again precluding full stereochemical analysis. With these limitations noted, the results of all three approaches support the same general conclusion: natural product chemical space is largely bounded.
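The substructure problem described above can be made concrete with set-based fingerprints. The Tversky index of A relative to B is |A ∩ B| / (|A ∩ B| + α|A − B| + β|B − A|); with α = β = 1 it reduces to the Tanimoto score. In the sketch below, a small compound whose fingerprint bits are entirely contained in those of a larger known compound receives a Tanimoto score of 0.25, below the 0.4 novelty cut-off, whereas a Tversky score with α = 1 and β = 0 (weights chosen here for illustration) returns 1.0. Evaluating the index in both directions and keeping the maximum is one assumed way to make the comparison "bidirectional"; it is not necessarily the scheme Pye et al. used, and the fingerprints and function names are hypothetical.

```python
from typing import FrozenSet

Fingerprint = FrozenSet[int]  # toy fingerprint: set of "on" bit indices (hypothetical)


def tversky(a: Fingerprint, b: Fingerprint, alpha: float, beta: float) -> float:
    """Tversky index of a relative to b.

    alpha weights features present only in a; beta weights features present only in b.
    The degenerate 0/0 case is defined as 1.0.
    """
    shared = len(a & b)
    denom = shared + alpha * len(a - b) + beta * len(b - a)
    return shared / denom if denom else 1.0


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    # Tanimoto is the special case alpha = beta = 1: |A ∩ B| / |A ∪ B|.
    return tversky(a, b, 1.0, 1.0)


def bidirectional_tversky(a: Fingerprint, b: Fingerprint,
                          alpha: float = 1.0, beta: float = 0.0) -> float:
    # Assumed scheme: score both directions and keep the larger value.
    return max(tversky(a, b, alpha, beta), tversky(b, a, alpha, beta))


new_compound = frozenset({1, 2, 3})      # hypothetical small compound
known_product = frozenset(range(1, 13))  # hypothetical larger product containing it

print(tanimoto(new_compound, known_product))               # 0.25
print(bidirectional_tversky(new_compound, known_product))  # 1.0
```

With β = 0, the score asks only how much of the first molecule is contained in the second, which is why it flags the substructure relationship that the Tanimoto score misses.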

On the one hand, this finding reveals that continuing to search for natural products in the same producing organisms using the same approaches carries a high probability of rediscovering known natural products or finding new ones with high structural similarity to them. That said, the decreasing percentage of novel structures has been counterbalanced by increasing numbers of isolated natural products, such that the field has maintained an impressive rate of ∼200 structurally novel natural products per year for the past 30 y. Moreover, the reported results for cyanobacteria support the notion that investigating new producing organisms triggers a burst in the discovery of structurally novel natural products (9, 10). Evidence suggests that investigating the same producing organisms with new discovery tools, such as genome mining, should yield similarly productive bursts (11), as each genome encodes on average 25 secondary metabolites, only a few of which are known (8). This may help explain the interesting data from Pye et al. (4) showing a decreasing trend in the structural similarity of newly discovered natural products from Streptomyces over the past few decades.

On the other hand, this boundedness reveals an extraordinary opportunity to leverage more systematized approaches to natural product research. The recent history of neighboring fields is encouraging. Toward the end of the 20th century, the study of human genes shifted from investigating one gene at a time to the comprehensive study of all genes simultaneously, culminating in the completion of the Human Genome Project in 2003 (12). This has enabled more rapid understanding of both individual genes and complex multigene phenomena. For example, the bioinformatic BLAST algorithm can now be used to rapidly characterize a gene of unknown function, and genome-wide association studies performed on thousands of individuals can establish links between collections of genetic variants and increased risk of complex diseases, such as psychiatric disorders (13). The advantages of systematic science have similarly been leveraged in other areas, including RNA, proteins (14), and human gut microbes (15).

Efforts have already been initiated to systematize and automate both genome mining (8, 16) and heterologous expression to better enable natural product discovery (17, 18). A recent attempt to characterize the discovery potential in actinobacteria revealed a viable roadmap to genome sequencing-based characterization of all microbially derived natural products (8). A similar opportunity exists to systematically apply emerging tools to natural product structure elucidation. These include NMR databases that leverage chemical-shift patterns to predict complex stereochemical relationships (19), new NMR techniques for rapid determination of connectivity and stereochemistry (20), capillary flow NMR techniques that require only microgram quantities of material, and modern mass spectrometry (21). Recent breakthroughs in crystallization allow single-crystal X-ray diffraction analysis of submicrogram quantities of natural products, including those that are otherwise noncrystalline (22). Moreover, methods for at least partially predicting small-molecule structures directly from genome sequence are becoming increasingly sophisticated (16). These methods have the potential to shift the role of spectroscopy from de novo structure determination to confirmation of predicted structures (23) and even to enable the discovery of new natural products via direct synthesis of genome-predicted structures (24).

Remarkably, the natural function of most natural products remains unknown (25). Just as more systematic approaches have powerfully enabled functional characterization in related fields, new systematic tools for characterizing the functions of natural products hold similar promise. For example, computer algorithms can increasingly predict the biosynthetic gene clusters responsible for a specific functional domain of a natural product and rapidly identify many other natural products with predicted activities (23). Small-molecule microarrays allow several thousand protein-binding assays to be run in parallel, rapidly providing information about the binding targets of small molecules (26). The creation of a more generalized and automated approach to natural product synthesis will further enable progress in all of these areas, and a growing understanding of the common building block-based origins of most natural products has already enabled progress in this direction (27–29).

Thus, we are likely much closer to the beginning than the end of what natural products have to offer, and the highly enabling understanding that this chemical space is largely bounded illuminates an exciting and actionable path to more effectively leverage this extraordinary natural resource.

Footnotes

Conflict of interest statement: M.D.B. is a cofounder of and consultant for REVOLUTION Medicines.

See companion article on page 5601.

References

1. Villar EA, et al. How proteins bind macrocycles. Nat Chem Biol. 2014;10:723–731. doi: 10.1038/nchembio.1584.
2. Du J, Lü W, Wu S, Cheng Y, Gouaux E. Glycine receptor mechanism elucidated by electron cryo-microscopy. Nature. 2015;526:224–229. doi: 10.1038/nature14853.
3. Grillo AS, et al. Restored iron transport by a small molecule promotes absorption and hemoglobinization in animals. Science. 2017. doi: 10.1126/science.aah3862.
4. Pye CR, Bertin MJ, Lokey RS, Gerwick WH, Linington RG. Retrospective analysis of natural products provides insights for future discovery trends. Proc Natl Acad Sci USA. 2017;114:5601–5606. doi: 10.1073/pnas.1614680114.
5. Skolnick J, Gao M. Interplay of physics and evolution in the likely origin of protein biochemical function. Proc Natl Acad Sci USA. 2013;110:9344–9349. doi: 10.1073/pnas.1300011110.
6. van Hattum H, Waldmann H. Biology-oriented synthesis: Harnessing the power of evolution. J Am Chem Soc. 2014;136:11853–11859. doi: 10.1021/ja505861d.
7. Walsh CT. A chemocentric view of the natural product inventory. Nat Chem Biol. 2015;11:620–624. doi: 10.1038/nchembio.1894.
8. Doroghazi JR, et al. A roadmap for natural product discovery based on large-scale genomics and metabolomics. Nat Chem Biol. 2014;10:963–968. doi: 10.1038/nchembio.1659.
9. Van Arnam EB, et al. Selvamicin, an atypical antifungal polyene from two alternative genomic contexts. Proc Natl Acad Sci USA. 2016;113:12940–12945. doi: 10.1073/pnas.1613285113.
10. Choi EJ, et al. Previously uncultured marine bacteria linked to novel alkaloid production. Chem Biol. 2015;22:1270–1279. doi: 10.1016/j.chembiol.2015.07.014.
11. Ziemert N, Alanjary M, Weber T. The evolution of genome mining in microbes—A review. Nat Prod Rep. 2016;33:988–1005. doi: 10.1039/c6np00025h.
12. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004;431:931–945. doi: 10.1038/nature03001.
13. Geschwind DH, Flint J. Genetics and genomics of psychiatric disease. Science. 2015;349:1489–1494. doi: 10.1126/science.aaa8954.
14. Cravatt BF, Simon GM, Yates JR 3rd. The biological impact of mass-spectrometry-based proteomics. Nature. 2007;450:991–1000. doi: 10.1038/nature06525.
15. Gill SR, et al. Metagenomic analysis of the human distal gut microbiome. Science. 2006;312:1355–1359. doi: 10.1126/science.1124234.
16. Medema MH, Fischbach MA. Computational approaches to natural product discovery. Nat Chem Biol. 2015;11:639–648. doi: 10.1038/nchembio.1884.
17. Luo Y, et al. Engineered biosynthesis of natural products in heterologous hosts. Chem Soc Rev. 2015;44:5265–5290. doi: 10.1039/c5cs00025d.
18. Zhao H, Medema MH. Standardization for natural product synthetic biology. Nat Prod Rep. 2016;33:920–924. doi: 10.1039/c6np00030d.
19. Higashibayashi S, Czechtizky W, Kobayashi Y, Kishi Y. Universal NMR databases for contiguous polyols. J Am Chem Soc. 2003;125:14379–14393. doi: 10.1021/ja0375481.
20. Liu Y, et al. Unequivocal determination of complex molecular structures using anisotropic NMR measurements. Science. 2017;356:eaam5349. doi: 10.1126/science.aam5349.
21. Henke MT, Kelleher NL. Modern mass spectrometry for synthetic biology and structure-based discovery of natural products. Nat Prod Rep. 2016;33:942–950. doi: 10.1039/c6np00024j.
22. Inokuma Y, et al. X-ray analysis on the nanogram to microgram scale using porous complexes. Nature. 2013;495:461–466. doi: 10.1038/nature11990.
23. Tietz JI, et al. A new genome-mining tool redefines the lasso peptide biosynthetic landscape. Nat Chem Biol. 2017;13:470–478. doi: 10.1038/nchembio.2319.
24. Chu J, et al. Discovery of MRSA active antibiotics using primary sequence from the human microbiome. Nat Chem Biol. 2016;12:1004–1006. doi: 10.1038/nchembio.2207.
25. Meinwald J, Eisner T. Chemical ecology in retrospect and prospect. Proc Natl Acad Sci USA. 2008;105:4539–4540. doi: 10.1073/pnas.0800649105.
26. Kuruvilla FG, Shamji AF, Sternson SM, Hergenrother PJ, Schreiber SL. Dissecting glucose signalling with diversity-oriented synthesis and small-molecule microarrays. Nature. 2002;416:653–657. doi: 10.1038/416653a.
27. Woerly EM, Roy J, Burke MD. Synthesis of most polyene natural product motifs using just 12 building blocks and one coupling reaction. Nat Chem. 2014;6:484–491. doi: 10.1038/nchem.1947.
28. Li J, et al. Synthesis of many different types of organic small molecules using one automated process. Science. 2015;347:1221–1226. doi: 10.1126/science.aaa5414.
29. Service RF. Billion-dollar project would synthesize hundreds of thousands of molecules in search of new medicines. Science. 2017. doi: 10.1126/science.aal1073.
