Abstract
Cyclic peptides offer a range of notable advantages, including potent antibacterial properties, high binding affinity and specificity to target molecules, and minimal toxicity, making them highly promising candidates for drug development. However, a comprehensive database that consolidates both synthetically derived and naturally occurring cyclic peptides is conspicuously absent. To address this void, we introduce CyclicPepedia (https://www.biosino.org/iMAC/cyclicpepedia/), a pioneering database that encompasses 8744 known cyclic peptides. This repository, structured as a composite knowledge network, offers a wealth of information encompassing various aspects of cyclic peptides, such as cyclic peptides’ sources, categorizations, structural characteristics, pharmacokinetic profiles, physicochemical properties, patented drug applications, and a collection of crucial publications. Supported by a user-friendly knowledge retrieval system and calculation tools specifically designed for cyclic peptides, CyclicPepedia will be able to facilitate advancements in cyclic peptide drug development.
Keywords: cyclic peptide, drug development, knowledge network, database
INTRODUCTION
In contemporary drug synthesis, the principal molecular categories encompass small molecules, peptides, proteins, antibodies, and other biologics. Small molecules reign as the most prevalent pharmaceutical molecules, but they grapple with great challenges such as limitations in target specificity and drug resistance [1]. Proteins exhibit significant complexity in biosynthesis and show insufficient stability, leading to diminished in vivo bioavailability [2, 3]. In contrast, cyclic peptides that combine the inherent stability and membrane permeability of small molecules with the target selectivity and specificity of peptide drugs have become a growing trend in drug development [4].
Cyclic peptides, distinguished by their distinctive closed-loop structure, commonly consist of several to dozens of amino acids. The cyclization of peptides is typically achieved through lactamization, lactonization, thiolactonization, or disulfide bridge between two reactive groups within a peptide, which can be divided into four types: head-to-tail, head-to-side chain, side chain-to-tail, and side chain-to-side chain cyclization, depending on the site of the reactive groups involved [5].
The unique structure of cyclic peptides confers favorable pharmacological features. Compared to linear peptides, the loop structure endows cyclic peptides with more rigidity, reducing the entropic cost of receptor binding [6]. Therefore, cyclic peptides can bind to target molecules with higher affinity and specificity and are capable of intervening in protein–protein interactions, which are ‘undruggable’ for traditional drugs. Importantly, the high binding specificity and selectivity allow cyclic peptide drugs to act precisely on specific biological processes, minimizing interference with other biomolecules and thereby reducing the risk of adverse reactions [7]. These unparalleled pharmaceutical properties of cyclic peptides have gained wide attention in the field of drug development.
Cyclic peptides have shown prospects in the domains of anticancer, antibacterial, and antiviral drug discovery [8]. In anticancer research, cyclic peptide drugs play a pivotal role in impeding tumor growth and angiogenesis by actively targeting cancer cells [9]. For example, Vapreotide, a synthetic somatostatin analogue, attenuates tumor growth and metastasis via the inhibition of hormone secretion from neuroendocrine tumors [10]. As antiviral agents, cyclic peptides intervene in key stages such as virus replication [11, 12] and immune regulation [13], opening up novel paths for the advancement of more potent antiviral drugs. Recently, the successful isolation of Clovibactin from uncultured soil bacteria [14]—an antibiotic without detectable resistance—underscores the burgeoning potential of cyclic peptides in combating infectious diseases.
At present, the majority of cyclic peptide drugs are derived from natural products or their derivatives [15]. The conventional experimental strategies of extracting and purifying natural cyclic peptides are often time-consuming and yield relatively low success rates. The high-throughput sequencing technique can alleviate these problems [16, 17], but the precision of sequencing-based identification still heavily relies on prior knowledge pertaining to natural cyclic peptides. Furthermore, the widely employed de novo design approach for cyclic peptides also hinges upon a profound comprehension of the sequences, spatial structures, and biological activities of known counterparts [8, 18].
Databases housing information on small molecules and proteins, such as PubChem [19], UniProt [20], and DRAMP [21], offer rich data including chemical structures, biological activities, pharmacokinetic properties, and interactions with drug targets. These databases have been instrumental in propelling the progress of drug discovery endeavors. However, the cyclic peptide data in these database resources is insufficient, and the data is dispersed and lacks standardized organization, significantly weakening their impact on cyclic peptide drug development. On the other hand, existing databases such as CyBase and Norine have curated data on cyclic peptides. Still, their contributions to cyclic peptide research are limited. For instance, CyBase only collects several types of cyclic proteins/peptides, dominated by cyclotides (~75%), and lacks information on known targets, while Norine primarily concentrates on non-ribosomal peptides. Hence, establishing a comprehensive, exhaustive, and readily accessible cyclic peptide knowledge base emerges as an imperative cornerstone for furthering the research and development of cyclic peptide therapeutics.
To attain this overarching goal, the present study has systematically gathered and compiled an extensive array of available data on cyclic peptides, encompassing multifaceted information on their structure, bioactivity, pharmacokinetic properties, targets, functions, etc. Subsequently, a specialized knowledge base named CyclicPepedia was constructed for researchers to easily access standardized and well-categorized information on cyclic peptides. The establishment of CyclicPepedia is anticipated to expedite research in the realm of cyclic peptide drugs and fully unleash cyclic peptides’ potential in drug discovery and engineering.
METHODS
Data collection and processing
To comprehensively amass published cyclic peptide data, an exhaustive search was conducted across pertinent databases, including but not limited to PubChem [19], DrugBank [22], and UniProt [20]. According to the documentation of these databases, different query criteria were applied to retrieve metadata (e.g., identifiers, sequences, structure files, pharmacological properties, physicochemical properties, related literature, and manufacturers) through web crawling (Table 1). Taking PubChem as an illustration, we utilized cyclic peptide and cyclopeptide as keywords to build the text search query. To further increase data quantity, records closely related to the query results were also retrieved. For databases such as DRAMP and Norine, fields indicating peptide cyclization information were used to screen for cyclic peptides—e.g., linear/cyclic (cyclic) for DRAMP and structure type (cyclic) for Norine. Meticulous manual verification was then carried out to ensure the accuracy and reliability of data. Web crawling codes are freely available on GitHub at https://github.com/dfwlab/cyclicpepedia.
Table 1.
Data sources and statistics
| Data source | Query criteria | Peptides | Sequence | Structure | Date | Version |
|---|---|---|---|---|---|---|
| PubChem | ‘cyclic peptide’ or ‘cyclopeptide’ | 2535 | 1600 | 2532 | 2023/01 | / |
| DrugBank | Category: ‘Peptide, cyclic’ | 91 | 5 | 13 | 2023/01 | / |
| UniProt | ‘cyclic peptide’ or ‘cyclopeptide’ | 605 | 583 | 281 | 2023/01 | Release-2022_05 |
| DPL | Structure: ‘Cyclic’ or ‘C’ | 88 | 88 | 85 | 2023/01 | / |
| APD | Name contains ‘cyclic’; Class contains ‘XXC’ | 204 | 172 | 26 | 2023/01 | Jan. 2023 |
| DRAMP | Linear/Cyclic: ‘Cyclic’ | 914 | 914 | 95 | 2023/01 | Dec. 2022 |
| CyBase | Cyclic: ‘yes’ | 1236 | 1134 | 189 | 2023/06 | / |
| Norine | Structure type: ‘Cyclic’ | 576 | 468 | 158 | 2023/06 | Jan. 2022 |
| ConoServer | Framework: disulfide rich | 3416 | 3414 | 91 | 2023/12 | / |
| CyclicPepedia | / | 8744 | 8614 | 7032 | 2024/02 | V 1.3.1 |
We conducted structure and sequence alignment to remove redundancy. Additionally, structure-sequence conversion was applied to complement missing sequence and structure data. For records devoid of sequence information, cyclic peptide SMILES were converted into sequences through the structure-to-sequence conversion (Struc2Seq), facilitated by RDKit (http://www.rdkit.org) and a monomer reference library of over 500 amino acid structural units. Conversely, the sequence-to-structure conversion (Seq2Struc) converted cyclic peptide sequences to SMILES by using RDKit. However, it is worth noting that Seq2Struc assumes head-to-tail cyclization, which does not hold for all cyclic peptides. Hence, we provide an online editing interface whereby users can refine predicted structures using additional structural information they possess.
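The head-to-tail assumption behind Seq2Struc can be illustrated with a minimal pure-Python sketch. It is not the authors' RDKit-based implementation: the residue table `RESIDUE_SMILES` and the function `seq_to_cyclic_smiles` are hypothetical names introduced here, and only four residues without side-chain rings are covered.

```python
# Hypothetical residue table (illustration only): backbone-ordered SMILES
# fragments, N-terminus to carbonyl, for a few L-amino acids whose side
# chains contain no rings. The real tool relies on RDKit and a library of
# more than 500 monomers.
RESIDUE_SMILES = {
    "G": "NCC(=O)",
    "A": "N[C@@H](C)C(=O)",
    "C": "N[C@@H](CS)C(=O)",
    "N": "N[C@@H](CC(N)=O)C(=O)",
}


def seq_to_cyclic_smiles(sequence: str) -> str:
    """Assemble a head-to-tail cyclic peptide SMILES from a one-letter sequence."""
    if len(sequence) < 2:
        raise ValueError("need at least two residues to close a ring")
    fragments = [RESIDUE_SMILES[aa] for aa in sequence]
    # Open ring bond 1 on the first backbone nitrogen.
    fragments[0] = "N1" + fragments[0][1:]
    # Close ring bond 1 on the last carbonyl carbon: "...C(=O)" becomes "...C1=O".
    fragments[-1] = fragments[-1][: -len("C(=O)")] + "C1=O"
    return "".join(fragments)


print(seq_to_cyclic_smiles("GCN"))
# N1CC(=O)N[C@@H](CS)C(=O)N[C@@H](CC(N)=O)C1=O
```

Because the ring is always closed between the first nitrogen and the last carbonyl, a peptide cyclized through a side chain or a disulfide bridge would be drawn incorrectly, which is why manual refinement of predicted structures remains useful.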
Sequence formats such as one-letter code (e.g., GCN), IUPAC condensed (e.g., cyclo[Gly-Cys-Asn]), amino acid chain (e.g., Gly(1)—Cys—Asn(1)), graph representation (e.g., Gly,Cys,Asn @0,2), and sequence graph (e.g., G(nodes = [(0, Gly), (1, Cys), (2, Asn)]; edges = [(0, 2)])) formats were inter-converted through a sequence format transformation approach. Inspired by the NOR format from Norine [23], the graph representation format separates monomers by comma and uses ‘@idx,idy’ to specify the connection points of ring closure bonds. The sequence graph is generated by NetworkX [24] for graph alignment. Structure format transformation (e.g., SMILES string, InChI, InChIKey, Mol block, and PDB block) was also accomplished based on RDKit. The implementation of structure and sequence format transformation tends to improve data standardization. Physicochemical properties were computed using RDKit, involving topological polar surface area, complexity, Log(P), hydrogen bond donor count, hydrogen bond acceptor count, rotatable bond count, drug-likeness, and fingerprints. Peptide sequence properties were predicted utilizing the ‘Peptides’ R package [25], for example, Boman index, charge, aliphatic index, instability index, and amino acid composition. The structure-sequence conversion methods (i.e., Struc2Seq and Seq2Struc), structure and sequence format transformation tools, and peptide properties prediction tools have been integrated into the website as plug-ins. In order to optimize website performance and facilitate user access, the processed metadata files were organized into multiple one-to-many or many-to-many relationships for storage based on targets, biological activities, functions, and other relevant criteria, as illustrated in Fig. 1A. This structured methodology significantly boosts the usability and efficiency of our database.
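The relationship between the sequence formats can be sketched in a few lines of Python, using the GCN example from the text. The function names and the small `THREE_LETTER` table are assumptions made for illustration, head-to-tail cyclization is assumed, and the sketch lists backbone edges explicitly alongside the ring-closure edge, which may differ from how the database stores its sequence graphs.

```python
# Small illustrative subset of the one-letter to three-letter mapping.
THREE_LETTER = {"G": "Gly", "C": "Cys", "N": "Asn", "A": "Ala"}


def to_iupac_condensed(seq: str) -> str:
    """One-letter code -> IUPAC condensed, e.g. GCN -> cyclo[Gly-Cys-Asn]."""
    return "cyclo[" + "-".join(THREE_LETTER[aa] for aa in seq) + "]"


def to_graph_representation(seq: str) -> str:
    """One-letter code -> comma-separated monomers plus '@idx,idy' ring closure."""
    monomers = ",".join(THREE_LETTER[aa] for aa in seq)
    return f"{monomers} @0,{len(seq) - 1}"


def to_sequence_graph(seq: str) -> dict:
    """One-letter code -> node and edge lists for graph-based comparison."""
    nodes = [(i, THREE_LETTER[aa]) for i, aa in enumerate(seq)]
    backbone = [(i, i + 1) for i in range(len(seq) - 1)]
    ring_closure = [(0, len(seq) - 1)]  # head-to-tail bond
    return {"nodes": nodes, "edges": backbone + ring_closure}


print(to_iupac_condensed("GCN"))       # cyclo[Gly-Cys-Asn]
print(to_graph_representation("GCN"))  # Gly,Cys,Asn @0,2
print(to_sequence_graph("GCN")["nodes"])
```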
Figure 1.
The framework and available function modules of CyclicPepedia. (A) The procedure of collecting and processing cyclic peptide data. (B) The information architecture of CyclicPepedia.
Development tools and implementation
CyclicPepedia has been developed utilizing the Django framework (version 3.2.16) in conjunction with Python (version 3.8). This framework serves as a crucial component connecting the front-end web pages to cyclic peptide data and enables us to establish a data management system for easy data updates. Data is efficiently and structurally managed through the SQLite3 database (version 3.9.0). The front-end rendering of the platform is accomplished using standard Hypertext Markup Language (HTML), complemented by Bootstrap styles and JavaScript for enhanced user interface functionality. This platform is deployed on an Ubuntu server (Linux operating system). The website’s search functionalities (i.e., advanced search, structure search, and sequence search) are implemented through the addition of plug-ins such as RDKit, Biopython (https://biopython.org/), and NetworkX.
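As a rough illustration of how a many-to-many relationship between peptides and targets might be stored in SQLite, the following sketch uses Python's built-in `sqlite3` module directly rather than Django models. The table and column names are hypothetical and do not reflect the actual CyclicPepedia schema; the two example peptides and their shared target are taken from the case study in this paper.

```python
import sqlite3

# In-memory database with a hypothetical three-table layout:
# peptides, targets, and a link table for the many-to-many relation.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE peptide (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE peptide_target (
        peptide_id TEXT NOT NULL REFERENCES peptide(id),
        target_id INTEGER NOT NULL REFERENCES target(id),
        PRIMARY KEY (peptide_id, target_id)
    );
    """
)
conn.executemany(
    "INSERT INTO peptide VALUES (?, ?)",
    [("CP00064", "Aculeacin A"), ("CP00128", "Mulundocandin")],
)
conn.execute("INSERT INTO target (id, name) VALUES (1, '1,3-beta-glucan synthase')")
conn.executemany(
    "INSERT INTO peptide_target VALUES (?, ?)",
    [("CP00064", 1), ("CP00128", 1)],
)


def peptides_for_target(connection, target_name):
    """Return (id, name) of every peptide linked to the named target."""
    query = """
        SELECT p.id, p.name
        FROM peptide AS p
        JOIN peptide_target AS pt ON pt.peptide_id = p.id
        JOIN target AS t ON t.id = pt.target_id
        WHERE t.name = ?
        ORDER BY p.id
    """
    return connection.execute(query, (target_name,)).fetchall()


print(peptides_for_target(conn, "1,3-beta-glucan synthase"))
```

A link table of this kind lets one peptide carry several targets and one target be shared by many peptides without duplicating either record.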
Platform layout and structure
CyclicPepedia is divided into several sections: homepage, browsing, search, tools, statistics, download, help, and data source interface. These sections can be accessed from the website’s navigation bar on any page (Fig. 1B).
The homepage offers a text search feature, links to various browsing pages (i.e., source, target, function, and family), and the latest news related to cyclic peptides. The browsing interface contains pages dedicated to sources, targets, functions, families, references, and a full list of cyclic peptides. Users can access the details page of a cyclic peptide by clicking on its CyclicPepedia ID on the browsing page. The cyclic peptide details page within CyclicPepedia furnishes fundamental data about the identifiers, general information (e.g., source, function, family, and target), structures, and sequences. Moreover, it also includes biological assays, pharmacokinetic properties, and physicochemical properties, if available.
To enhance user convenience, the search page has been designed to serve multiple search modalities, such as advanced search, structure search, and sequence search (Fig. 1B). CyclicPepedia also integrates multiple cyclic peptide tools for structure-sequence conversion, peptide property prediction, and structure and sequence format transformation. The statistics page presents graphic representations of existing data in the database, providing users with a more intuitive overview.
On the download page, users can selectively download specific datasets, for example, fundamental information data, structure data, and sequence data. The help page features a database tutorial, update logs, and a section for user feedback. A complete list of references and data sources is displayed on the data source interface, along with the online tools that have contributed to the database’s development.
RESULTS
Data summaries and statistics
Due to their unique pharmacological features, cyclic peptides have emerged at the forefront of drug development. Dozens of cyclic peptides have been approved by the US FDA and EMA, and many more are at various developmental stages as immunosuppressants, antibiotics, antivirals, and anticancer drugs (Fig. S1). To expedite the development of cyclic peptide therapeutics, we constructed a comprehensive knowledge base comprising 8744 published cyclic peptides, of which 8614 records have their sequences available (1082 [12.56%] sequences were predicted by CyclicPepedia’s structure-to-sequence converter) and 7032 records have structural information (3789 [53.88%] structures were predicted by CyclicPepedia’s sequence-to-structure converter, Table 1). Among these, 1210 records have three-dimensional structures, with 655 having 3D conformation structures from PubChem, 218 having structures from PDB, and 158 having structures predicted by AlphaFold2.
In addition, CyclicPepedia collected multifaceted data on cyclic peptides from a range of databases, including ChEBI [26], ChEMBL [27], KEGG [28], and Wikipedia (Fig. 1B). Evidently, there is limited data on features regarding targets, toxicity, and mechanism of action (Fig. 2). However, compared with existing databases containing curated cyclic peptide information (e.g., CyBase [29], Norine [23], APD [30], DPL [31], DRAMP [21], and ConoServer [32]), our resource is far more extensive and detailed, especially in terms of structure and sequence (Table 1, Fig. 2). Being comprehensive and standardized, these curated data can provide superior benchmark datasets for artificial intelligence and contribute to the development of cyclic peptide therapeutics.
Figure 2.
Comparisons of databases containing cyclic peptides. DrugBank, PubChem, and UniProt are knowledge resources for drugs, chemicals, and proteins, respectively. DPL, APD, DRAMP, and ConoServer are peptide databases, while Norine is dedicated exclusively to non-ribosomal peptides (NRPs). CyBase is a database containing data on cyclic proteins/peptides, dominated by cyclotide. CyclicPepedia is a comprehensive database specifically designed for cyclic peptides.
In CyclicPepedia, natural cyclic peptides still dominate, with 5465 natural cyclic peptides, accounting for 62.50% of the total cyclic peptides (Fig. 3A). According to statistics, natural cyclic peptides are mainly derived from animals and plants (animals: 57.97%, plants and Fungi: 32.08%, and Bacteria: 10.43%, Fig. S1). Besides, there are 1537 (17.58%) synthetic cyclic peptides (Fig. 3A). Observations regarding the sequence length of cyclic peptides indicate that the majority contain 40 amino acids or fewer, and natural cyclic peptides account for a larger proportion of long sequences (> 20 amino acids) than synthetically constructed cyclic peptides (Fig. 3B).
Figure 3.
Statistical analysis of cyclic peptide data. (A) Cyclic peptides are classified into natural cyclic peptides and synthetically constructed cyclic peptides based on their sources. The organism information of 19% of cyclic peptides remains unknown due to the lack of information in the original databases. (B) The distribution of cyclic peptide amino acid sequence length. (C) The distribution of cyclic peptide functions. Antimicrobials can be divided into antibacterial, antifungal, etc. Although they are not annotated in ConoServer, cyclic peptides belonging to the conotoxin family are known to have neurotoxicity. (D) The distribution of cyclic peptide family. (E) Circular bar plot of cyclic peptide targets in CyclicPepedia.
Upon classifying existing data by their functional effects, a substantial number of cyclic peptides exhibit antimicrobial activity (14.20%, n = 1242, Fig. 3C). Most of the cyclic peptides belong to toxins, cyclotides, and orbitides (Fig. 3D). Peptide toxins are potent and highly selective ligands for a wide range of ion channels and membrane receptors, therefore exhibiting great potential in drug discovery as lead compounds and pharmacological tools [33]. Cyclotides represent plant cyclic peptides characterized by the presence of a head-to-tail cyclic backbone and a cyclic cystine knot motif [34]. Many biological activities have been reported for cyclotides, including antimicrobial, antiviral, antitumor, antihypertensive, and enzyme-inhibitory activities. Orbitides are another class of plant-derived peptides with a head-to-tail cyclization, yet lacking disulfide bonds [15]. They are associated with a diverse range of activities, for example, antimicrobial and anti-inflammatory effects, inhibition of cancer cell proliferation, and regulation of the immune response. Furthermore, the known targets of cyclic peptides are primarily concentrated in surface glycoproteins, 1,3-beta-glucan synthase, and somatostatin receptors, among others (Fig. 3E). It is worth noting that the 85 cyclic peptides targeting surface glycoproteins all exhibit antiviral properties [11, 35].
Cyclic peptide details page
In order to bolster the design and advancement of cyclic peptide drugs, our knowledge base offers a wealth of detailed information about individual cyclic peptides on the information interface (Fig. 4A). This information includes: 1) a concise summary, 2) structures, 3) sequences, 4) physicochemical properties, 5) biological assays, 6) target details, 7) predictive tools, 8) drug manufacturers, 9) links to external databases, and 10) relevant literature.
Figure 4.
The CyclicPepedia user interface. (A) Cyclic peptide information interface, which encompasses basic information, a knowledge network, chemical structure, pharmacology, targets, and other information pertinent to cyclic peptides. (B) CyclicPepedia’s function interface includes full-text search, structure search, and sequence search.
Besides essential information about cyclic peptides, CyclicPepedia also furnishes researchers with knowledge networks to facilitate insights into the interrelationships between cyclic peptides, sources, targets, functions, families, etc. The multidimensional display of cyclic peptides can significantly assist researchers in designing and optimizing drug molecules, identifying potential drug targets, and explaining the mechanisms of action (Fig. 4A). Beyond these, data on targets and biological assays enables in-depth toxicological studies and side-effect evaluations of cyclic peptide drugs (Fig. 4A), hastening the ‘bench to bedside’ process.
CyclicPepedia also provides connections to external resources and tools, as well as links to manufacturers such as Merck, Baxter Healthcare Corp, and Upsher-Smith laboratories (Fig. 4A).
CyclicPepedia’s function interface
In addition to the full-text search, three search methods are provided in CyclicPepedia: advanced search, structure search, and sequence search (Fig. 4B). In advanced search, users can enter peptide names or select multiple criteria (sequence and structure information, physicochemical properties, sequence properties, and biological annotation information) to create custom search queries.
The structure search function allows users to locate cyclic peptides by drawing a structure, inputting SMILES or InChI codes, or uploading structure files in PDB/SDF formats. Three search types are available: ‘exact search’, ‘substructure search’, and ‘similarity search’. The similarity search supports multiple molecular fingerprint options such as ‘RDKit Fingerprint’, ‘MACCS (Molecular ACCess System) Keys’, and ‘Morgan Fingerprint’, coupled with similarity metrics including ‘Tanimoto similarity’ and ‘Dice similarity’. Users can constrain the search results by selecting coefficient thresholds via the slider (Fig. 4B).
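The two similarity metrics have simple set-based definitions: for fingerprints with on-bit sets A and B, Tanimoto similarity is |A ∩ B| / |A ∪ B| and Dice similarity is 2|A ∩ B| / (|A| + |B|). The sketch below applies them to toy bit sets with a threshold filter. The fingerprints and peptide names are invented for illustration, and a real search would derive the bits from RDKit, MACCS, or Morgan fingerprints.

```python
def tanimoto(a: set, b: set) -> float:
    """Shared on-bits divided by the on-bits present in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: set, b: set) -> float:
    """Twice the shared on-bits divided by the total on-bit count."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def similarity_search(query, library, metric=tanimoto, threshold=0.9):
    """Return (name, score) pairs at or above the threshold, best first."""
    hits = [(name, metric(query, bits)) for name, bits in library.items()]
    kept = [hit for hit in hits if hit[1] >= threshold]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints given as sets of on-bit positions (hypothetical data).
query_bits = {1, 2, 3, 4}
library = {
    "pep_A": {1, 2, 3, 4, 5},
    "pep_B": {1, 2, 3, 4},
    "pep_C": {7, 8},
}
print(similarity_search(query_bits, library, threshold=0.75))
# [('pep_B', 1.0), ('pep_A', 0.8)]
```

For the same pair of fingerprints the Dice value is never lower than the Tanimoto value, so a given threshold is more permissive under Dice.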
The sequence search is divided into local alignment and graph alignment. The local alignment function employs the Smith-Waterman similarity search algorithm. It filters sequences with user-specified E-values and provides similarity scores in the results (Fig. 4B). Parameters can be adjusted to set the match score and the penalties for mismatches and gaps. To leverage the cyclization information in cyclic peptide sequences (e.g., sequences in IUPAC condensed format), we developed a graph alignment algorithm based on NetworkX (Fig. S2A). This method can convert cyclic peptide sequences into graphical structures and measure the similarity between two peptides by graph isomorphism. The number of matched nodes and edges and match scores are presented in the search results.
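Both search modes can be sketched in plain Python. The first function is a textbook Smith-Waterman implementation with a linear gap penalty; it does not compute E-values, and the default scores are arbitrary. The second is a brute-force, label-preserving isomorphism test that stands in for the NetworkX-based graph alignment; it is only practical for small peptides and does not reproduce the match scores reported by the website. All function names are assumptions made for this illustration.

```python
from itertools import permutations


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def graphs_isomorphic(nodes_a, edges_a, nodes_b, edges_b):
    """Brute-force test: is there a monomer-preserving mapping that keeps all bonds?"""
    if sorted(nodes_a) != sorted(nodes_b) or len(edges_a) != len(edges_b):
        return False
    target = {frozenset(edge) for edge in edges_b}
    size = len(nodes_a)
    for perm in permutations(range(size)):
        if any(nodes_a[i] != nodes_b[perm[i]] for i in range(size)):
            continue  # mapping would change a monomer label
        if {frozenset((perm[u], perm[v])) for u, v in edges_a} == target:
            return True
    return False


print(smith_waterman("GCNAKT", "CNA"))  # (6, 'CNA', 'CNA')

ring = [(0, 1), (1, 2), (2, 3), (0, 3)]  # four residues, head-to-tail
print(graphs_isomorphic(["Gly", "Cys", "Asn", "Ala"], ring,
                        ["Asn", "Ala", "Gly", "Cys"], ring))  # True (rotation)
print(graphs_isomorphic(["Gly", "Cys", "Asn", "Ala"], ring,
                        ["Gly", "Asn", "Cys", "Ala"], ring))  # False (reordered)
```

The rotation example shows why a graph view suits cyclic peptides: the two sequences start at different residues, so a linear comparison would treat them as different, yet they describe the same ring.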
Interface of cyclic peptide-specific tools
CyclicPepedia also provides computational tools concerning cyclic peptide data processing, for example, structure-to-sequence (Struc2Seq) converter, sequence-to-structure (Seq2Struc) converter, peptide property prediction, and structure and sequence format transformation.
The Struc2Seq converter can extract amino acid units from the cyclic peptide skeleton and match them with the monomer reference library, thereby transforming cyclic peptide SMILES into sequence information (Fig. S2B). Our monomer reference library contains over 500 amino acid structural units. Most of the monomer structural information comes from the Norine database [23]. Under the condition of head-to-tail cyclization, the Seq2Struc converter can convert cyclic peptide sequences into structures (Fig. S2C). An online editor is readily accessible on CyclicPepedia for any necessary structure corrections. In addition, CyclicPepedia offers tools to compute physicochemical properties using RDKit, as well as indices of peptide sequences with the ‘Peptides’ R package. The format transformation tools allow for the inter-conversion of diverse structure and sequence formats. All these tools generate downloadable reports in HTML or TSV (tab-separated values) format.
Download and help interface
The download page offers files of basic cyclic peptide information, sequences, and structure files in Mol and PDB formats. Also, users can download search results directly from the search module. The help page consists of a database tutorial, update log files, and a feedback page that allows users to promptly submit questions and requests, serving as a reference for future database updates.
CASE STUDIES
We used antimicrobial peptides to illustrate the practicability of CyclicPepedia. It was observed that the majority of antimicrobial cyclic peptides belong to frog skin active peptides (Fig. 5A). Additionally, the 1,3-beta-glucan synthase is a common target of most antimicrobial cyclic peptides (Fig. 5A, Table S1). Quinupristin and Bacitracin A, which have multiple targets, are important for inhibiting the growth of Gram-positive bacteria (Fig. 5A, Fig. S4).
Figure 5.
Case study of antimicrobial cyclic peptides. (A) The antimicrobial cyclic peptide knowledge network. The size of nodes represents the magnitude of the degree. (B) The network of cyclic peptides similar to Caspofungin (similarity score > 0.9). The node size of cyclic peptides represents their similarity to Caspofungin. (C) The structure schematic diagrams, InChIkey information, and similarity coefficients of Caspofungin and its structural analogues. Aculeacin A (CP00064) and mulundocandin (CP00128) have known targets, while the targets of CP00973 and CP01847 are unknown.
In studies of cyclic peptide drugs targeting 1,3-beta-glucan synthase, Caspofungin [36] is the first US FDA-approved antifungal drug to inhibit 1,3-beta-glucan synthesis. We employed MACCS fingerprints and Tanimoto similarity metrics to search for cyclic peptides structurally similar to Caspofungin, leading to the identification of 49 cyclic peptides with a similarity of over 90%. Among them, 39 cyclic peptides have been documented in the literature to target 1,3-beta-glucan synthase and exhibit antifungal properties (Fig. 5B, Table S2).
As shown in Fig. 5C, Aculeacin A (CP00064, Similarity = 0.940) and Mulundocandin (CP00128, Similarity = 0.941) both share a similar structure with Caspofungin. Studies have reported that these three compounds belong to the echinocandin type of antifungal antibiotics and achieve fungicidal effects by inhibiting the activity of 1,3-beta-glucan synthase [37]. Therefore, based on the knowledge network analysis, we can speculate that CP01874 (PubChem: 102083719, Similarity = 0.970) and CP00973 (PubChem: 16072303, Similarity = 0.926) may also be similar in targets and functions to Caspofungin (Fig. 5B, C). In practical application, researchers can make valid inferences regarding the cyclic peptides of interest by examining the targets and functions of structurally similar peptides.
DISCUSSION
Advantages and roles of CyclicPepedia
Although cyclic peptides have become a new focal point in drug development, there has not been a specialized knowledge base tailored for the design of cyclic peptide therapeutics. To fill the gap, we developed a comprehensive database platform, namely CyclicPepedia, enabling researchers to conveniently and efficiently access cyclic peptide data. It provides a rich resource for studies on drug synthesis and acting mechanisms of cyclic peptides, including structure, synthesis, and biological activity.
Traditionally, drug development relied on laboratory experiments and testing. However, in recent years, this paradigm has gradually shifted towards analyzing big data, and data-driven approaches such as artificial intelligence have emerged as a mainstream trend. This transformation altered the process of drug development and significantly accelerated the discovery and validation of new drugs [38]. By leveraging metagenomic sequencing and specialized databases (e.g., UniProt, PubChem, and other massive data resources), researchers have developed new tools and algorithms for the sequencing-based bioactive natural product discovery [39–41]. Despite the remarkable progress in peptide therapeutics, the unique structure and biological properties of cyclic peptides necessitate more refined and specialized approaches to drug design and development [42].
Artificial intelligence-driven drug design (AIDD) and computer-aided drug design (CADD) have been shown to expedite the process of drug design. Various computational approaches such as quantitative structure–activity relationship (QSAR), molecular docking, molecular dynamics simulation, and AI models have been utilized to help identify therapeutic targets, understand relationships between molecular structures and biological activities, and facilitate drug screening and optimization [43, 44]. CyclicPepedia aims to support the early stage of cyclic peptide drug development. It collects and systematizes abundant resources related to cyclic peptides—sequence, structure, source, biological activity information, manufacturer resources, etc. Nonetheless, the limited data on cyclic peptide targets, toxicity, and mechanism of action is indicative of the insufficient research in these areas. The comprehensive data in CyclicPepedia can be used to predict cyclic peptides’ pharmacological features and high binding affinity targets, thus hastening research on the biological activity of cyclic peptides and the accumulation of relevant data. In addition, CyclicPepedia serves as an excellent resource for benchmark datasets for discovering and engineering novel functional cyclic peptides from genome data [39, 42], as well as developing AI tools for cyclic peptide synthesis, naturally-inspired cyclic peptide generation [45], and de novo design [46], making it an indispensable tool for advancing research in this field. The knowledge network offered by CyclicPepedia enables researchers to quickly mine cyclic peptide information of interest and thus improves data utilization. Furthermore, the available tools such as physicochemical calculation and biological activity prediction, along with the resources from manufacturers (e.g., Merck and Novartis), have broadened the application scope of CyclicPepedia and enhanced its value in the field of cyclic peptides.
Plans for future updates
As research on cyclic peptides advances, an automated update procedure will be executed every six months to incorporate newly discovered cyclic peptides. This data ingestion procedure consists of the following key steps: 1) extracting cyclic peptide data from external databases; 2) sanitizing and transforming data into standardized formats; 3) merging processed data into CyclicPepedia. Furthermore, the upcoming version of CyclicPepedia will predominantly focus on the data regarding cyclic peptide targets and their pharmacological properties. Pharmaceutical databases such as the Therapeutic Target Database [47], VARIDT [48], INTEDE [49], Comparative Toxicogenomics Database [50], TheMarker [51], and DrugMAP [52] will be explored for molecular interactions of cyclic peptides. Simultaneously, efforts will be made to enhance data quality and accuracy control, preventing the inclusion of duplicate or erroneous data. According to user feedback and demands, additional interfaces and the latest analytical tools for data application will be added in the future. For instance, by applying artificial intelligence technologies, in-depth analysis of cyclic structures can be conducted to extract features related to peptide cyclization and evaluate drug effectiveness.
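The three ingestion steps listed above could be organized roughly as follows. This is a sketch under stated assumptions rather than the planned implementation: the record fields, the `cyclic` flag, and the exact-sequence duplicate check are all hypothetical simplifications, whereas the database itself removes redundancy through structure and sequence alignment.

```python
def extract(source_records):
    """Step 1: keep only records that the source database flags as cyclic."""
    return [record for record in source_records if record.get("cyclic")]


def standardize(record):
    """Step 2: trim whitespace and upper-case the sequence."""
    return {
        "name": record["name"].strip(),
        "sequence": record.get("sequence", "").strip().upper(),
        "source": record["source"],
    }


def merge(database, records):
    """Step 3: append records whose sequence is not stored yet; return the count added."""
    known = {entry["sequence"] for entry in database if entry["sequence"]}
    added = 0
    for record in records:
        if record["sequence"] and record["sequence"] in known:
            continue  # duplicate sequence, skip
        database.append(record)
        if record["sequence"]:
            known.add(record["sequence"])
        added += 1
    return added


# Hypothetical input records from two made-up sources.
raw = [
    {"name": " PepA ", "sequence": "gcn", "cyclic": True, "source": "db_one"},
    {"name": "PepB", "sequence": "GCN", "cyclic": True, "source": "db_two"},
    {"name": "PepC", "sequence": "AAA", "cyclic": False, "source": "db_one"},
]
database = []
added = merge(database, [standardize(r) for r in extract(raw)])
print(added, [entry["name"] for entry in database])  # 1 ['PepA']
```

Keeping the steps as separate functions means each source database only needs its own extraction rule, while standardization and merging stay shared.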
In summary, CyclicPepedia stands as a pivotal resource in the realm of cyclic peptide research. In light of the expanding volume of publicly accessible cyclic peptide data, this study has established a standardized and meticulously organized knowledge base. This valuable resource empowers researchers to efficiently access and utilize cyclic peptide information, thereby expediting the development and engineering of cyclic peptide drugs.
Key Points
Cyclic peptides have exhibited immense potential as therapeutics due to their unique pharmacological features.
CyclicPepedia is a comprehensive knowledge base comprising 8744 published cyclic peptides, with 8614 sequences and 7032 structures.
The multifaceted data in CyclicPepedia can provide superior benchmark datasets for artificial intelligence and contribute to the development and engineering of cyclic peptide therapeutics.
Supplementary Material
ACKNOWLEDGEMENTS
We are grateful to all the subjects who participated in this study. Network graphs were drawn with Cytoscape.
Author Biographies
Lei Liu is a postdoctoral fellow at the School of Life Sciences and Technology, Tongji University, Shanghai, China. Her research interests focus on microbiome, AI drug design, and clinical informatics.
Liu Yang is a research assistant at the National Center, Children’s Hospital, Zhejiang University School of Medicine, Hangzhou, China.
Suqi Cao is a laboratory technician at the National Center, Children’s Hospital, Zhejiang University School of Medicine, Hangzhou, China.
Zhigang Gao is the vice president of Children’s Hospital, Zhejiang University School of Medicine, Hangzhou, China. He is an expert in the diagnosis and treatment of hepatobiliary and gastrointestinal diseases and portal hypertension in children.
Bin Yang is a system operation and maintenance engineer with a degree in software engineering. He specializes in supporting the operation of Bio-Med Big Data Center.
Guoqing Zhang is a principal investigator at the Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences. His research centers on biomedical databases and knowledge bases, focusing on the integration and management of multi-omics data, literature data, and real-world health and medical data.
Ruixin Zhu is a professor at the School of Life Sciences and Technology, Tongji University, Shanghai, China. His group is dedicated to metagenomics research driven by bioinformatics, algorithm development in metagenomics, and analysis of interactions between microbiota and host/environment.
Dingfeng Wu is a professor at the National Center, Children’s Hospital, Zhejiang University School of Medicine, Hangzhou, China. His work focuses on metagenomic methodology and the disease microbiome.
Contributor Information
Lei Liu, Department of Gastroenterology, Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200072, P. R. China.
Liu Yang, National Center, Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou 310052, P. R. China.
Suqi Cao, National Center, Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou 310052, P. R. China.
Zhigang Gao, Department of General Surgery, Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou 310052, P. R. China.
Bin Yang, Shanghai Southgene Technology Co., Ltd., Shanghai 201203, China.
Guoqing Zhang, National Genomics Data Center & Bio-Med Big Data Center, Chinese Academy of Sciences Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of the Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, P. R. China.
Ruixin Zhu, Department of Gastroenterology, Shanghai Tenth People's Hospital, School of Life Sciences and Technology, Tongji University, Shanghai 200072, P. R. China.
Dingfeng Wu, National Center, Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou 310052, P. R. China.
FUNDING
This work was supported by the National Natural Science Foundation of China (32200529 to DW, 92251307 to RZ & GZ, 82170542 to RZ, 92051116 to GZ), the National Key Research and Development Program of China (2021YFF0703702 to RZ), and the Key Research and Development Program of Zhejiang Province (2023C03029 to ZG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
DATA AVAILABILITY
All software packages used in this study are open source and publicly available, and all data and resources of CyclicPepedia are freely available on GitHub at https://github.com/dfwlab/cyclicpepedia.
AUTHORS’ CONTRIBUTIONS
Dingfeng Wu (conceived and designed the project), Ruixin Zhu (conceived and designed the project), Guoqing Zhang (conceived and designed the project), Lei Liu (collected multiomics data, performed data analysis, built the database, and wrote the original manuscript), Liu Yang (collected multiomics data, performed data analysis, built the database, and wrote the original manuscript). Each author has contributed significantly to the submitted work. All authors revised, read, and approved the final manuscript.
References
- 1. Zhong L, Li Y, Xiong L, et al. Small molecules in targeted cancer therapy: advances, challenges, and future perspectives. Signal Transduct Target Ther 2021;6(1):201.
- 2. Süssmuth RD, Mainz A. Nonribosomal peptide synthesis-principles and prospects. Angew Chem Int Ed Engl 2017;56(14):3770–821.
- 3. Goldenzweig A, Fleishman SJ. Principles of protein stability and their application in computational design. Annu Rev Biochem 2018;87(1):105–29.
- 4. Li X, Craven TW, Levine PM. Cyclic peptide screening methods for preclinical drug discovery. J Med Chem 2022;65(18):11913–26.
- 5. Chow HY, Zhang Y, Matheson E, Li X. Ligation technologies for the synthesis of cyclic peptides. Chem Rev 2019;119(17):9971–10001.
- 6. Driggers EM, Hale SP, Lee J, Terrett NK. The exploration of macrocycles for drug discovery--an underexploited structural class. Nat Rev Drug Discov 2008;7(7):608–24.
- 7. Xiao W, Ma W, Wei S, et al. High-affinity peptide ligand LXY30 for targeting α3β1 integrin in non-small cell lung cancer. J Hematol Oncol 2019;12(1):56.
- 8. Damjanovic J, Miao J, Huang H, Lin YS. Elucidating solution structures of cyclic peptides using molecular dynamics simulations. Chem Rev 2021;121(4):2292–324.
- 9. Fang Y, Jiang Y, Zou Y, et al. Targeted glioma chemotherapy by cyclic RGD peptide-functionalized reversibly core-crosslinked multifunctional poly(ethylene glycol)-b-poly(ε-caprolactone) micelles. Acta Biomater 50:396–406.
- 10. Rothen-Weinhold A, Besseghir K, De Zelicourt Y, et al. Development and evaluation in vivo of a long-term delivery system for vapreotide, a somatostatin analogue. J Control Release 1998;52(1–2):205–13.
- 11. Desimmie BA, Humbert M, Lescrinier E, et al. Phage display-directed discovery of LEDGF/p75 binding cyclic peptide inhibitors of HIV replication. Mol Ther 2012;20(11):2064–75.
- 12. White KM, Rosales R, Yildiz S, et al. Plitidepsin has potent preclinical efficacy against SARS-CoV-2 by targeting the host protein eEF1A. Science 2021;371(6532):926–31.
- 13. Guada M, Beloqui A, Kumar MN, et al. Reformulating cyclosporine A (CsA): more than just a life cycle management strategy. J Control Release 2016;225:269–82.
- 14. Shukla R, Peoples AJ, Ludwig KC, et al. An antibiotic from an uncultured bacterium binds to an immutable target. Cell 2023;186(19):4059–4073.e27.
- 15. Zhang J, Yuan J, Li Z, et al. Exploring and exploiting plant cyclic peptides for drug discovery and development. Med Res Rev 2021;41(6):3096–117.
- 16. Joo SH, Xiao Q, Ling Y, et al. High-throughput sequence determination of cyclic peptide library members by partial Edman degradation/mass spectrometry. J Am Chem Soc 128(39):13000–9.
- 17. Rentero Rebollo I, Sabisz M, Baeriswyl V, Heinis C. Identification of target-binding peptide motifs by high-throughput sequencing of phage-selected peptides. Nucleic Acids Res 2014;42(22):e169.
- 18. Hosseinzadeh P, Watson PR, Craven TW, et al. Anchor extension: a structure-guided approach to design cyclic peptides targeting enzyme active sites. Nat Commun 2021;12(1):3384.
- 19. Kim S, Chen J, Cheng T, et al. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res 2021;49(D1):D1388–D1395.
- 20. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res 2023;51(D1):D523–D531.
- 21. Shi G, Kang X, Dong F, et al. DRAMP 3.0: an enhanced comprehensive data repository of antimicrobial peptides. Nucleic Acids Res 2022;50(D1):D488–D496.
- 22. Wishart DS, Feunang YD, Guo AC, et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 2018;46(D1):D1074–D1082.
- 23. Flissi A, Ricart E, Campart C, et al. Norine: update of the nonribosomal peptide resource. Nucleic Acids Res 2020;48(D1):D465–D469.
- 24. Hagberg AA, Schult DA, Swart PJ. Exploring network structure, dynamics, and function using NetworkX. In: Varoquaux G, Vaught T, Millman J (eds). Proceedings of the 7th Python in Science Conference. Pasadena, CA, USA, 2008, 11–5.
- 25. Osorio D, Rondón-Villarreal P, Torres R. Peptides: a package for data mining of antimicrobial peptides. The R Journal 2015;7(1):4–14.
- 26. Hastings J, Owen G, Dekker A, et al. ChEBI in 2016: improved services and an expanding collection of metabolites. Nucleic Acids Res 2016;44(D1):D1214–9.
- 27. Mendez D, Gaulton A, Bento AP, et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 2019;47(D1):D930–D940.
- 28. Kanehisa M, Furumichi M, Sato Y, et al. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res 2022;51(D1):D587–92.
- 29. Wang CK, Kaas Q, Chiche L, Craik DJ. CyBase: a database of cyclic protein sequences and structures, with applications in protein discovery and engineering. Nucleic Acids Res 2008;36(Database issue):D206–10.
- 30. Wang G, Li X, Wang Z. APD3: the antimicrobial peptide database as a tool for research and education. Nucleic Acids Res 2016;44(D1):D1087–93.
- 31. Wang F, Li N, Wang C, et al. DPL: a comprehensive database on sequences, structures, sources and functions of peptide ligands. Database (Oxford) 2020;2020(D1):baaa089.
- 32. Kaas Q, Yu R, Jin A-H, et al. ConoServer: updated content, knowledge, and discovery tools in the conopeptide database. Nucleic Acids Res 2011;40(D1):D325–30.
- 33. Silva ON, Torres MDT, Cao J, et al. Repurposing a peptide toxin from wasp venom into antiinfectives with dual antimicrobial and immunomodulatory properties. Proc Natl Acad Sci 2020;117(43):26936–45.
- 34. Rowe SM, Spring DR. The role of chemical synthesis in developing RiPP antibiotics. Chem Soc Rev 2021;50(7):4245–58.
- 35. Hall PR, Hjelle B, Njus H, et al. Phage display selection of cyclic peptides that inhibit Andes virus infection. J Virol 2009;83(17):8965–9.
- 36. Thompson GR 3rd, Soriano A, Cornely OA, et al. Rezafungin versus caspofungin for treatment of candidaemia and invasive candidiasis (ReSTORE): a multicentre, double-blind, double-dummy, randomised phase 3 trial. Lancet 2023;401(10370):49–59.
- 37. Yamaguchi H, Hiratani T, Iwata K, et al. Studies on the mechanism of antifungal action of aculeacin A. J Antibiot (Tokyo) 1982;35(2):210–9.
- 38. Yang X, Wang Y, Byrne R, et al. Concepts of artificial intelligence for computer-assisted drug discovery. Chem Rev 2019;119(18):10520–94.
- 39. Ma Y, Guo Z, Xia B, et al. Identification of antimicrobial peptides from the human gut microbiome using deep learning. Nat Biotechnol 2022;40(6):921–31.
- 40. Chu J, Koirala B, Forelli N, et al. Synthetic-bioinformatic natural product antibiotics with diverse modes of action. J Am Chem Soc 2020;142(33):14158–68.
- 41. Blin K, Shaw S, Kloosterman AM, et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res 2021;49(W1):W29–35.
- 42. Wang Z, Koirala B, Hernandez Y, et al. A naturally inspired antibiotic to target multidrug-resistant pathogens. Nature 2022;601(7894):606–11.
- 43. Sabe VT, Ntombela T, Jhamba LA, et al. Current trends in computer aided drug design and a highlight of drugs discovered via computational techniques: a review. Eur J Med Chem 2021;224:113705.
- 44. Bhardwaj G, O'Connor J, Rettie S, et al. Accurate de novo design of membrane-traversing macrocycles. Cell 2022;185(19):3520–3532.e26.
- 45. Rettie SA, Campbell KV, Bera AK, et al. Cyclic peptide structure prediction and design using AlphaFold. bioRxiv 2023:2023.02.25.529956. 10.1101/2023.02.25.529956.
- 46. Pandi A, Adam D, Zare A, et al. Cell-free biosynthesis combined with deep learning accelerates de novo-development of antimicrobial peptides. Nat Commun 2023;14(1):7197.
- 47. Zhou Y, Zhang Y, Zhao D, et al. TTD: therapeutic target database describing target druggability information. Nucleic Acids Res 2023;52(D1):D1465–77.
- 48. Yin J, Chen Z, You N, et al. VARIDT 3.0: the phenotypic and regulatory variability of drug transporter. Nucleic Acids Res 2023;52(D1):D1490–502.
- 49. Zhang Y, Liu X, Li F, et al. INTEDE 2.0: the metabolic roadmap of drugs. Nucleic Acids Res 2023;52(D1):D1355–64.
- 50. Davis AP, Wiegers TC, Johnson RJ, et al. Comparative Toxicogenomics Database (CTD): update 2023. Nucleic Acids Res 2022;51(D1):D1257–62.
- 51. Zhang Y, Zhou Y, Zhou Y, et al. TheMarker: a comprehensive database of therapeutic biomarkers. Nucleic Acids Res 2023;52(D1):D1450–64.
- 52. Li F, Yin J, Lu M, et al. DrugMAP: molecular atlas and pharma-information of all drugs. Nucleic Acids Res 2022;51(D1):D1288–99.