Abstract
The unique cyclic structure of cyclic peptides grants them remarkable stability and bioactivity, making them powerful candidates for treating various diseases. However, the lack of standardized tools for cyclic peptide data has hindered their potential in today’s artificial intelligence–driven efficient drug design landscape. To bridge this gap, here we introduce a Python package named cyclicpeptide specifically for cyclic peptide drug design. This package provides standardized tools such as Structure2Sequence, Sequence2Structure, and format transformation to process, convert, and standardize cyclic peptide structure and sequence data. Additionally, it includes GraphAlignment for cyclic peptide–specific alignment and search and PropertyAnalysis to enhance the understanding of their drug-like properties and potential applications. This comprehensive suite of tools aims to streamline the integration of cyclic peptides into modern drug discovery pipelines, accelerating the development of cyclic peptide–based therapeutics.
Keywords: cyclic peptide, drug design, bioinformatics tool, artificial intelligence
Introduction
Cyclic peptides, a class of peptides with cyclic structures, have garnered significant attention in drug design due to their unique drug-like properties. Cyclic peptides have exceptional biological stability and high selectivity for specific targets [1, 2] and exhibit various biological activities, including antibacterial, antitumor, and neurological applications [3–6]. As of now, the Food and Drug Administration (FDA) has approved over 40 cyclic peptide drugs [7], including zilucoplan in October 2023 for generalized myasthenia gravis [8]. Additionally, the early-approved cyclic peptide drug cyclosporine is widely used to prevent rejection in organ transplants and treat autoimmune diseases, making it a first-line treatment option worldwide for preventing graft rejection post-transplant [9].
Despite their significant potential, cyclic peptides encounter challenges in drug design arising from their complex amino acid composition and diverse cyclization methods (such as head-to-tail cyclization, side-chain linkage, and internal cyclization) [10, 11]. Advances in computing power and artificial intelligence (AI) have empowered researchers to develop data-driven prediction algorithms, including AI-driven Drug Design (AIDD) and Computer-Aided Drug Design (CADD) [12–15]. However, utilizing these algorithms requires extensive standardized knowledge of cyclic peptides, such as amino acid composition and modifications, cyclization sites, and physicochemical properties, which introduces new challenges in cyclic peptide drug design [16–18]. In CADD, researchers commonly employ traditional cheminformatics toolkits, such as RDKit [19] and Biopython [20], for preliminary analyses. Nevertheless, these toolkits are primarily designed for small molecules and linear peptides and are suboptimal for cyclic peptide molecules. The emerging tool pyPept demonstrates considerable promise in automating the generation of 2D and 3D representations of complex peptides, but its capacity to handle non-natural amino acids and predict the bioactive conformations of peptides remains constrained [21]. AI-driven cyclic peptide design tools such as AfCycDesign have also emerged; however, these tools currently focus on specific functions, for example, cyclic peptide generation [22]. While specialized tools for identifying cyclic peptide amino acid monomers exist (for example, Smiles2Monomers, developed by Yoann Dufresne in 2015 [23]), they have limitations, including restricted functionality, cumbersome usage, and poor scalability, making them insufficient for current needs in cyclic peptide drug development.
Therefore, we introduce cyclicpeptide, a Python package tailored for cyclic peptide drug design. This package offers functionalities including Structure2Sequence, Sequence2Structure, GraphAlignment, and PropertyAnalysis, covering various application scenarios in cyclic peptide drug development. The open-source code also allows users to extend its capabilities. By integrating these comprehensive tools, cyclicpeptide seeks to seamlessly incorporate cyclic peptides into modern AI-driven drug development workflows, thus expediting the advancement of cyclic peptide–based therapeutics.
Results
Cyclicpeptide toolkit: key functions
cyclicpeptide is a Python package developed based on RDKit [19] and NetworkX [24], specifically designed for the analysis and processing of cyclic peptide molecules. It includes tools such as Structure2Sequence, Sequence2Structure, GraphAlignment, PropertyAnalysis, StructureTransformer and SequenceTransformer (for format transformation), and SequenceGeneration. This package is open-source (https://github.com/dfwlab/cyclicpeptide) and provides detailed documentation (https://dfwlab.github.io/cyclicpeptide/) to support users in effectively utilizing tools (Fig. 1A).
Figure 1.
The workflow and methodology overview of the cyclicpeptide Python package. (A) Overview of the cyclicpeptide framework. Cyclicpeptide is developed based on RDKit and NetworkX and includes the following functionalities: Structure2Sequence, Sequence2Structure, GraphAlignment, format transformation (StructureTransformer and SequenceTransformer), PropertyAnalysis, and SequenceGeneration. (B) Workflow diagram of Structure2Sequence. The process begins with detecting the backbone from the cyclic peptide structure and then expanding the backbone to locate each monomer. The sequence is generated via monomer alignment and assembly, with an option to customize the monomer library during alignment. Structure2Sequence can produce an output report in HTML format. (C) Workflow diagram of Sequence2Structure. By decoding the sequence, the order of each amino acid and its bonds are determined. Then, through monomer alignment, the peptide is assembled into a head-to-tail cyclic peptide chain. (D) Schematic diagram of the GraphAlignment tool. Amino acid sequences are converted into sets of nodes and edges and then visualized using NetworkX. Two algorithms are available for similarity calculation: graph_similarity() for computing graph edit distance and mcs_similarity() for determining the maximum common subgraph. (E) Various physicochemical properties can be calculated in PropertyAnalysis. (F) Structural format and sequence format transformation. StructureTransformer also provides 3D structure prediction. (G) Schematic diagram of the SequenceGeneration tool. This tool automatically generates new cyclic peptide sequences based on the amino acid substitution table. (H) Comparison of functionalities across different tools. Cyclicpeptide offers comprehensive functionality and is suitable for various analyses and processing tasks in cyclic peptide development.
Structure2Sequence
This tool performs amino acid monomer identification on the SMILES (Simplified Molecular Input Line Entry System) representation of cyclic peptides, splitting them into amino acid residues and converting them into sequence format (Fig. 1B). The Structure2Sequence function is suitable for various cyclization methods and supports over 500 natural and non-natural amino acid monomers (Table S1). With Structure2Sequence, researchers can quickly infer possible sequences from cyclic peptides with known structures. To visualize the structure-to-sequence conversion process, the tool also offers a HyperText Markup Language (HTML) report, which includes graphs of the cyclic peptide structure and the conversion process (Fig. 1B). This report enables users to verify the accuracy of the conversion and make adjustments if necessary.
Sequence2Structure
This tool converts known sequences (in one-letter code or amino acid chains) into cyclic structures (SMILES format). The cyclic peptide sequence is first converted into a graph, from which the monomers and their bonding relationships are decoded. Sequence2Structure then extracts the corresponding monomer structures from the monomer library and assembles the cyclic peptide structure based on these bonding connections (Fig. 1C).
GraphAlignment
The GraphAlignment tool enables similarity comparison between cyclic peptide molecules (Fig. 1D). Leveraging NetworkX, it converts amino acid sequence format into graphs, allowing for an intuitive assessment of similarity between two cyclic peptide molecules in terms of nodes and edges. This approach overcomes the limitations of traditional linear sequence alignment algorithms when applied to cyclic structures. GraphAlignment offers two similarity metrics: graph edit distance and maximum common substructure.
PropertyAnalysis
PropertyAnalysis computes physicochemical properties and provides metrics such as Topological Polar Surface Area, Complexity, Log(P), Hydrogen Bond Donor Count, and Hydrogen Bond Acceptor Count (Fig. 1E), assisting researchers in understanding the drug-like properties and application potential of cyclic peptides more effectively.
Format transformation
cyclicpeptide also offers tools for format transformation of both structures (via StructureTransformer) and sequences (via SequenceTransformer) (Fig. 1F). The StructureTransformer supports multiple structural formats such as SMILES, InChI, and Protein Data Bank (PDB) for 3D structures, while the SequenceTransformer covers various sequence formats, including one-letter codes, IUPAC (International Union of Pure and Applied Chemistry) condensed format, amino acid chains, and graphical representations. These two tools can standardize the formats of cyclic peptide sequences and structures provided by users, ensuring compatibility with downstream tools such as Structure2Sequence, Sequence2Structure, and GraphAlignment.
SequenceGeneration
cyclicpeptide provides a sequence-based tool for generating cyclic peptides (SequenceGeneration, Fig. 1G). Users can automatically generate potential new cyclic peptide sequences using either a built-in amino acid substitution table or a custom table, which can be fed into other tools for further exploration.
We compared cyclicpeptide with several existing tools for peptide analysis (Fig. 1H). For instance, Smiles2Monomers is a web-based visualization tool that primarily focuses on structure conversion. RDKit provides visualization along with mutual conversion between sequence and structure, property calculations, and structural format transformation; however, it is mainly designed for small molecules. Biopython offers various fundamental features but lacks functionality for cyclic peptide design. Tools like pyPept and AfCycDesign concentrate on cyclic peptide structure generation with limited additional functionality. In contrast, cyclicpeptide offers a comprehensive suite of tools tailored for cyclic peptides, including mutual conversion between sequence and structure (via Structure2Sequence and Sequence2Structure), sequence comparison (via GraphAlignment), structure and sequence format transformation (via StructureTransformer and SequenceTransformer), cyclic peptide generation (via SequenceGeneration), property calculation (via PropertyAnalysis), and visualization. Together, these features provide robust support for cyclic peptide drug development.
Tool’s reliability validation
To demonstrate the reliability of the cyclicpeptide package, we extracted four sets of cyclic peptide data from the CyclicPepedia knowledge base [25]: (i) 830 cyclic peptides with both structure and sequence information, (ii) 484 monocyclic peptides with structure and sequence information, featuring head-to-tail cyclization, (iii) 670 monocyclic peptides with structural data and head-to-tail cyclization, and (iv) 4658 monocyclic peptides with sequence information and head-to-tail cyclization (4342 in one-letter code format and 316 in IUPAC condensed format) (Fig. 2A). These four datasets were used to validate the accuracy and stability of Structure2Sequence and Sequence2Structure (Fig. 2B). Accuracy is defined by whether the structure (or sequence) obtained from sequence-to-structure (or structure-to-sequence) conversion matches the observed structure (or sequence). Stability is defined by the consistency maintained when a sequence (or structure) is converted to a structure (or sequence) and then converted back to its original form.
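The two validation criteria defined above can be written down as a short, library-independent sketch. Everything named here is hypothetical: `accuracy`, `stability`, and the toy converters stand in for Structure2Sequence and Sequence2Structure, whose actual signatures are not reproduced, and `same` is whatever equivalence test suits the data (for example, canonical SMILES equality).

```python
from typing import Callable, Iterable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def accuracy(pairs: Iterable[Tuple[A, B]],
             convert: Callable[[A], B],
             same: Callable[[B, B], bool]) -> float:
    """Fraction of (input, known counterpart) pairs reproduced by one conversion."""
    pairs = list(pairs)
    hits = sum(1 for source, expected in pairs if same(convert(source), expected))
    return hits / len(pairs)


def stability(items: Iterable[A],
              forward: Callable[[A], B],
              backward: Callable[[B], A],
              same: Callable[[A, A], bool]) -> float:
    """Fraction of items that survive a round trip (forward, then backward)."""
    items = list(items)
    hits = sum(1 for item in items if same(backward(forward(item)), item))
    return hits / len(items)


# Toy stand-ins for the real converters (assumptions, for illustration only):
# a "structure" here is just a tuple of residue letters.
def toy_seq_to_structure(seq: str) -> tuple:
    return tuple(seq)


def toy_structure_to_seq(structure: tuple) -> str:
    return "".join(structure)


def equal(x, y) -> bool:
    return x == y


round_trip = stability(["AAGFPVFF", "AIPFNSL"],
                       toy_seq_to_structure, toy_structure_to_seq, equal)
one_way = accuracy([("AAGFPVFF", tuple("AAGFPVFF")),
                    ("AIPFNSL", tuple("AIPFNSX"))],  # second pair deliberately mismatched
                   toy_seq_to_structure, equal)
```

Passing the equivalence test as a parameter keeps the metric separate from the question of what counts as "the same" structure or sequence, which is where conformational differences enter.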
Figure 2.
Validation of the cyclicpeptide Python package. (A) Selection process of the validation datasets. From the 8745 cyclic peptides in the CyclicPepedia knowledge base, the following subsets were selected: 830 cyclic peptides with both structural and sequence information, 484 monocyclic peptides with both structure and sequence information as well as head-to-tail cyclization, 670 head-to-tail monocyclic peptides with solely structure information, and 4658 head-to-tail monocyclic peptides with only sequence information. (B) Reliability validation is divided into accuracy validation and stability validation. Accuracy validation involves converting structures (or sequences) into sequences (or structures) using conversion tools (Structure2Sequence and Sequence2Structure) and then comparing the results with their corresponding known counterparts. Stability validation involves comparing processed structures or sequences after two rounds of conversion with the originals to assess information loss during the conversion process. (C) Accuracy validation of the Structure2Sequence tool. Conformational differences refer to discrepancies in amino acid conformations between converted sequences and originals, often arising from unspecified conformations and mixed conformations in the original sequences. (D) Accuracy validation of the Sequence2Structure tool. Incorrect conversions were primarily due to special cyclization methods as well as the presence of prosthetic groups (e.g. metal ions and acetic acid). (E) Stability validation of structure conversion. (F) Stability validation of sequence conversion. Conformational differences appeared in the IUPAC condensed format. The one-letter code format does not specify conformations.
Dataset (i) was divided into multiple subsets based on amino acid composition (natural and non-natural amino acids) and cyclization type (monocyclic and polycyclic) to validate Structure2Sequence. Structure2Sequence achieved an average amino acid detection rate of 99.6% on this dataset (Fig. 2C, Table S2). Among the remaining 0.4%, 0.1% were due to inconsistencies between the original sequence and structure (Fig. S1A), and 0.3% were due to incorrect splitting of amino acids in ester-bond cyclization (Fig. S1B). Errors in the segmentation of ester-bond-cyclized amino acids require further judgment based on prior knowledge of chemical reactions. Currently, cyclicpeptide provides detailed reports of multiple segmentation strategies for users.
Dataset (ii) was used to validate Sequence2Structure, which achieved a structure prediction accuracy of 97.7% (Fig. 2D). Incorrect conversions primarily arose from special cyclization methods, such as disulfide bonds and ester bonds, as well as the presence of prosthetic groups (e.g. metal ions and acetic acid). These discrepancies stem from the tool’s default assumption of head-to-tail peptide bond cyclization (Fig. S1C), which makes it less effective for predicting structures involving side-chain, ester-bond, or disulfide-bond cyclization. External editing tools are therefore necessary to modify these results.
Furthermore, stability validation of Structure2Sequence and Sequence2Structure was performed using datasets (iii) and (iv) (Fig. 2B). As shown in Fig. 2E and F, both conversion tools demonstrated high stability (dataset (iii): 98.2% and dataset (iv): 97.5%), indicating that, for the large majority of cyclic peptides, no information other than prosthetic groups was lost during the conversion process.
To further assess the practicality of cyclicpeptide, we compared it with several existing analytical tools, such as Smiles2Monomers, Biopython, RDKit, and pyPept, in terms of processing time and result accuracy. As shown in Table 1, both Smiles2Monomers and the Structure2Sequence tool in cyclicpeptide accurately identified amino acids during structure-to-sequence conversion. However, for the same data volume, Structure2Sequence was significantly faster than Smiles2Monomers—processing 452 cyclic peptides in 57.2 seconds. In addition, both pyPept and cyclicpeptide’s Sequence2Structure tool can convert cyclic peptide sequences into SMILES format, but Sequence2Structure demonstrated higher conversion efficiency. pyPept requires input sequences in Boehringer Ingelheim Line Notation (BILN) or Hierarchical Editing Language for Macromolecules (HELM) format, which adds an extra step before conversion.
Table 1.
Comparison of structure and sequence conversion performance.
| Tool | N^a | Time | Results |
|---|---|---|---|
| Structure to sequence | | | |
| Smiles2Monomers | 452 | 64 m 38.4 s | 100% |
| Structure2Sequence | 452 | 57.2 s | 100% |
| Sequence to structure | | | |
| pyPept | 452 | 11.1 s | 100% |
| Sequence2Structure | 452 | 2.2 s | 100% |
The performance of these tools was evaluated on a personal computer (a Mac with an 8-core Apple M1 CPU). Smiles2Monomers was run through its website.
^a N denotes the number of cyclic peptides used for validation.
For the database lookup comparison, we used the head-to-tail cyclic peptide “AAGFPVFF” as the query (Table 2). The database (N = 452) included both the exact sequence “AAGFPVFF” and a rearranged cyclic variant, “PVFFAAGF.” We evaluated whether these tools could retrieve both “AAGFPVFF” and “PVFFAAGF” from the database. Biopython, which is designed for linear peptides, failed to identify the cyclic variant, yielding a similarity score of 0.54. In contrast, RDKit and GraphAlignment demonstrated robust performance; however, they each have limitations. RDKit requires complete structural information as it focuses on recognizing structural similarity (Table S2), while GraphAlignment has a lengthy processing time. To speed up GraphAlignment, we developed an estimation model for GraphAlignment using a graph convolutional network (GA_GCN, Fig. S2, Table 2). GA_GCN can accurately predict GraphAlignment scores (mean square error = 0.000295, Pearson r = 0.995, Spearman r = 0.985, Fig. S2C) and significantly reduces computation time (0.3 s, Table 2). Overall, each algorithm captures different cyclic peptide properties (Fig. S3), offering options for diverse research needs.
Table 2.
Database lookup for head-to-tail cyclic peptide “AAGFPVFF.”
| Tool | N^a | Time | Similarity score (AAGFPVFF) | Similarity score (PVFFAAGF) |
|---|---|---|---|---|
| Biopython | 452 | 1.3 s | 1.00 | 0.54 |
| RDKit | 452 | 0.4 s | 1.00 | 1.00 |
| GraphAlignment | 452 | 69 m 29.3 s | 1.00 | 1.00 |
| GA_GCN | 452 | 0.3 s | 1.00 | 1.00 |
In the database lookup comparison, we used the head-to-tail cyclic peptide “AAGFPVFF” as the query. The database (N = 452) contained the exact sequence “AAGFPVFF” and a rearranged cyclic variant, “PVFFAAGF.” We tested whether these tools could retrieve both “AAGFPVFF” and “PVFFAAGF” from the database. RDKit performed a structure-based search, requiring structural information of cyclic peptides, while the other tools were applied to cyclic peptide sequences. GraphAlignment scores were calculated based on graph edit distance. GA_GCN, a graph convolutional network model, was used to estimate the GraphAlignment score (graph edit distance). These tools were evaluated on a personal computer (a Mac with an 8-core Apple M1 CPU).
^a N is the database size used for searching.
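A minimal standard-library sketch illustrates why a linear comparison scores the rearranged variant below 1 even though it is the same head-to-tail ring. It uses Python's `difflib.SequenceMatcher` as a stand-in for a linear similarity measure, so its scores are not expected to reproduce the Biopython value in Table 2; the function names are hypothetical and none of this is the GraphAlignment algorithm.

```python
from difflib import SequenceMatcher


def linear_similarity(a: str, b: str) -> float:
    """Similarity of two sequences read as linear strings (0..1)."""
    return SequenceMatcher(None, a, b).ratio()


def is_cyclic_rotation(a: str, b: str) -> bool:
    """True if b is a rotation of a, i.e. the same head-to-tail ring
    written from a different starting residue."""
    return len(a) == len(b) and b in a + a


def cyclic_similarity(a: str, b: str) -> float:
    """Best linear similarity over every rotation of b."""
    if not b:
        return 1.0 if not a else 0.0
    return max(linear_similarity(a, b[i:] + b[:i]) for i in range(len(b)))


query = "AAGFPVFF"
variant = "PVFFAAGF"

linear_score = linear_similarity(query, variant)   # below 1.0: read as different strings
same_ring = is_cyclic_rotation(query, variant)     # True: one is a rotation of the other
cyclic_score = cyclic_similarity(query, variant)   # 1.0 once rotations are considered
```

This rotation trick only covers simple head-to-tail rings written in one-letter code. Side-chain links, branching, and non-natural residues need the node-and-edge representation that GraphAlignment works on.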
Case study 1: mutual conversion between structure and sequence and similarity analysis
In this case study, we used three examples to better illustrate the application of fundamental functions in cyclicpeptide (Fig. 3). Structure2Sequence was employed to convert the structure of “Alpha-Amanitin” into sequences (Fig. 3A). Users can view the matched amino acids and possible sequences through the output HTML report. Additionally, Structure2Sequence differentiates between peptide and nonpeptide bonds using solid and dashed lines, respectively.
Figure 3.
Example results of cyclicpeptide’s fundamental functions. (A) The SMILES string of “alpha-Amanitin” was converted into a sequence using Structure2Sequence. (B) The sequence “AIPFNSL” was converted to a SMILES string using Sequence2Structure with the parameter “cyclic” set to true. (C) The “graph_similarity” (graph edit distance) and “mcs_similarity” (maximum common subgraph) computed by GraphAlignment are shown.
Sequence2Structure supports sequence formats including amino acid chains and one-letter code. The sequence “AIPFNSL” was fed into Sequence2Structure for sequence-to-structure conversion (Fig. 3B). The tool split the sequence into individual amino acids and connection bonds and then matched them with the monomer library. The identified monomers were assembled into a cyclic peptide through head-to-tail cyclization.
Using “Cys(1)(2)—Cys—OH-DL-Val(2)—4OH-Leu—OH-Ile(1)” as the Reference sequence for the GraphAlignment algorithm, three similar sequences were selected as queries. As shown in Fig. 3C, GraphAlignment correctly determined that Query1 is the same cyclic peptide as the Reference. It also accurately highlighted the differences in connection bonds and amino acids between the Reference and Query2, as well as between the Reference and Query3. Moreover, users can choose between the graph edit distance or the maximum common substructure algorithms to calculate similarity.
Case study 2: design of novel cyclosporine A analogs
To demonstrate the role of the cyclicpeptide toolkit in cyclic peptide drug design, we conducted a case study on the design of cyclosporine A analogs. Sequence alignment of cyclosporine A and its analogs reveals that the amino acids at positions 2 and 3 may be critical for determining immunosuppressive activity [26] (Fig. 4A). Derivatives such as alisporivir and SCY-635 eliminate immunosuppressive effects while enhancing antiviral or other pharmacological activities (Fig. 4B).
Figure 4.
Design of novel cyclosporine A (CsA) analogs. (A) The workflow for drug design applications using the cyclicpeptide toolkit, exemplified by CsA. Structures of CsA and its analogs (such as alisporivir) were obtained from drug databases like DrugBank. The Structure2Sequence tool was used to derive the amino acid chains, and GraphAlignment was employed to compare sequence similarities, identifying key sites for modification. Based on these findings, specific modification rules were established. SequenceGeneration was used to randomly generate cyclic peptide sequences according to the established criteria. These sequences were converted into 3D structures via Sequence2Structure and StructureTransformer and then docked with the known target, cyclophilin A (CypA) from the PDB database, using AutoDock. Analogs with better docking efficiency were selected, and their physicochemical properties were calculated using PropertyAnalysis for further study. (B) The sequence of CsA analogs, with highlighted amino acids indicating differences from the CsA sequence. (C) Illustration of the interaction between CsA and CypA, highlighting specific binding sites. (D) Illustration of the interaction between CsA analogs, such as alisporivir and other analogs, with CypA, along with their specific binding sites.
Based on these observations, modification rules were established in SequenceGeneration, resulting in 66 cyclosporine A analog sequences with variations at positions 2 and 3 and random modifications at other positions (variation rate < 30%, Table S3). Using Sequence2Structure and StructureTransformer, the structures of these sequences were predicted and then docked with the target protein cyclophilin A. Among them, three top-scoring analogs (CsA_RP47, CsA_RP29, and CsA_RP48) exhibited the highest docking activity with cyclophilin A (Fig. 4B and D). Compared to cyclosporine A (Fig. 4C), these three analogs displayed a spatial conformation similar to that of alisporivir, suggesting that they may also possess nonimmunosuppressive properties (Fig. 4D).
Cyclicpeptide user manual
Once the cyclicpeptide toolkit has been successfully installed, users need to call specific modules (Structure2Sequence, Sequence2Structure, GraphAlignment, etc.) to utilize its functions.
For structure-to-sequence conversion, users can import the Structure2Sequence module from the cyclicpeptide package and call the transform() function to convert structures (SMILES). The “monomers_path” parameter defaults to the built-in monomer library, but users can specify a custom monomer library through this parameter. The “report” parameter controls whether to output a results report (Fig. 5A).
Figure 5.
Code usage examples. (A) Example usage of the Structure2Sequence tool. The input structure for the transform() function must be in SMILES format; other formats can be converted using the StructureTransformer tool. (B) Example usage of the Sequence2Structure tool. The sequence input for seq2stru_non_naturalAA() must be in amino acid chain format, while the sequence input for seq2stru_naturalAA() can be in either one-letter code or amino acid chain format. Other formats can be converted using the SequenceTransformer tool. (C) Example usage of the GraphAlignment tool. The sequence input for sequence_to_node_edge() must be in amino acid chain format.
For sequence-to-structure conversion, first use the reference_aa_monomer() function to read the monomer library and then use seq2stru_non_naturalAA() to convert the sequence (amino acid chain) into a structure (SMILES) (Fig. 5B).
For sequence similarity comparison, first use the sequence_to_node_edge() function from the GraphAlignment module to obtain node and edge information from sequences (amino acid chain). Then, create a NetworkX graph using the create_graph() function and choose a similarity calculation method (graph_similarity() or mcs_similarity()) to compute the similarity value (Fig. 5C).
In addition, cyclicpeptide provides the IOManager module for drawing and exporting images, such as plot_smiles() (Fig. 5B) and graph2svg() (Fig. 5C). Detailed usage instructions for the toolkit can be found in the documentation (Supplemental file, https://dfwlab.github.io/cyclicpeptide/).
Discussion
Despite cyclic peptides becoming a significant focus in drug development, there remains a lack of analytical tools specifically developed for cyclic peptide drug design. To address this, we developed a Python package named cyclicpeptide to assist in cyclic peptide drug design. This package provides tools such as Structure2Sequence, Sequence2Structure, GraphAlignment, and other functionalities, offering technical support for early-stage drug design of cyclic peptides.
AIDD and CADD are becoming integral in drug design, complementing traditional experimental methods. These new techniques, particularly those based on deep learning and machine learning, can predict the membrane permeability of cyclic peptides and facilitate the design of cyclic peptide drugs for specified targets and diseases, significantly accelerating the drug design process [27–29]. These data-driven drug design approaches rely heavily on the quality and quantity of datasets [30, 31]. cyclicpeptide helps address this issue in the early stages of cyclic peptide drug development by providing standardized datasets through its functionalities, including Structure2Sequence, Sequence2Structure, SequenceGeneration, and format transformation.
The GraphAlignment tool in cyclicpeptide facilitates the identification of structural similarities and differences in cyclic peptides, including amino acid composition and bond types, which is crucial for discovering new drug mechanisms and potential targets. However, due to the computational complexity of graph operations, we recommend optimizing performance through parallel computing, our GraphAlignment estimation model (GA_GCN), or by integrating GraphAlignment with other tools (e.g. Biopython, RDKit). Moreover, SequenceGeneration can be used to generate new cyclic peptides, and the PropertyAnalysis tool offers comprehensive physicochemical analysis, enabling researchers to discover new cyclic peptide drugs and assess their drug-like properties and application potential.
cyclicpeptide offers various functionalities for cyclic peptide design but comes with certain limitations that require further optimization. In particular, the GraphAlignment algorithm may result in a notable increase in computation time and memory usage when applied to large-scale datasets. To address this issue, we introduced a graph convolution model to approximate the GraphAlignment scoring. Furthermore, the synthesis of cyclic peptides is a complex and precise process involving various chemical reactions and molecular interactions. However, the current toolkit can only generate 2D structures of cyclic peptides and predict potential 3D structures based on energy minimization, which may differ from the actual experimental data. Users can employ tools like AfCycDesign [22] to generate 3D structures and verify the results. In a future version, we plan to incorporate methods to support the analysis of cyclic peptide synthesis, further expanding the applications of cyclicpeptide.
In summary, cyclicpeptide is a powerful analytical tool that significantly accelerates the early-stage design of cyclic peptide therapeutics, with ongoing improvements aimed at enhancing its computational efficiency and user accessibility.
Methods
Structure2Sequence
The Structure2Sequence tool is based on the RDKit package (version 2023.9.5), enabling the conversion of cyclic peptide molecules from SMILES format to sequence format. The process involves identifying the cyclic peptide backbone from the SMILES string, followed by expanding the backbone to locate and isolate each amino acid. The isolated amino acid residues are then matched against the monomer library to generate the corresponding amino acid sequence (Fig. 1B). For greater flexibility, a “monomers_path” parameter is provided. By modifying this parameter, users can specify a custom amino acid monomer library to accommodate their research needs.
Sequence2Structure
The Sequence2Structure tool, built on the RDKit package, can convert cyclic peptide molecules from sequence format to SMILES format. It first splits the cyclic peptide sequence into an amino acid set and a connection bond set and then retrieves the SMILES string for each amino acid from the monomer library and assembles them sequentially into a complete SMILES structure (Fig. 1C). The function also provides a “monomers_path” parameter. A Boolean parameter named “cyclic” can be set to specify whether a cyclic peptide should be generated.
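The sequential assembly step can be illustrated with a plain string-building sketch. It assumes a tiny hypothetical monomer table (`ALPHA_CARBON`, natural L-residues without side-chain links) and a hypothetical `assemble_smiles` function; it is not the package's monomer library or its implementation. The ring-closure digit 9 is chosen only so that it cannot collide with the digit 1 used inside the phenylalanine side chain.

```python
# Hypothetical mini monomer library: the alpha-carbon fragment (with side chain)
# of each residue, written so that "N" + fragment + "C(=O)" is one backbone unit.
ALPHA_CARBON = {
    "G": "C",
    "A": "[C@@H](C)",
    "V": "[C@@H](C(C)C)",
    "L": "[C@@H](CC(C)C)",
    "F": "[C@@H](Cc1ccccc1)",
}


def assemble_smiles(sequence: str, cyclic: bool = True) -> str:
    """Concatenate residue fragments in order; if cyclic, join the first
    nitrogen and the last carbonyl carbon with a ring-closure bond."""
    if len(sequence) < 2:
        raise ValueError("need at least two residues")
    last = len(sequence) - 1
    parts = []
    for i, residue in enumerate(sequence):
        alpha = ALPHA_CARBON[residue]  # KeyError for residues outside the toy table
        n_term = "N9" if (cyclic and i == 0) else "N"
        if i == last:
            # Close the ring on the last carbonyl carbon, or end with a free acid.
            c_term = "C9=O" if cyclic else "C(=O)O"
        else:
            c_term = "C(=O)"
        parts.append(n_term + alpha + c_term)
    return "".join(parts)


cyclo_ga = assemble_smiles("GA", cyclic=True)
linear_ga = assemble_smiles("GA", cyclic=False)
```

The `cyclic` flag mirrors the Boolean parameter described above. Strings built this way should still be parsed with a cheminformatics toolkit such as RDKit to confirm that they are valid before further use.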
GraphAlignment
GraphAlignment is used to screen sequences similar to the given amino acid sequence. Specifically, the target amino acid sequences (Reference and Query) are first split into sets of nodes and edges and then visualized using NetworkX (version 3.2.1). To calculate the similarity between sequences, two algorithms can be utilized: graph_similarity() and mcs_similarity() (Fig. 1D). The graph_similarity() algorithm evaluates similarity by counting the minimum number of edit operations (i.e. adding, deleting, or replacing nodes and edges) required to transform one graph into another. In contrast, the mcs_similarity() algorithm measures similarity by identifying the largest subgraph shared by both graphs, that is, the substructure containing the most shared nodes and edges.
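The node-and-edge view can be sketched without any graph library. The helper names below (`sequence_to_graph`, `label_signature`) are hypothetical and are not the package's sequence_to_node_edge() or create_graph() functions. The signature comparison is only a necessary condition for two graphs to match, not a graph edit distance or maximum common subgraph computation.

```python
from collections import Counter


def sequence_to_graph(residues, cyclic=True):
    """Return (nodes, edges): nodes map position -> residue label and edges
    are peptide bonds between consecutive positions, plus the head-to-tail
    closing bond when cyclic."""
    nodes = dict(enumerate(residues))
    edges = [(i, i + 1) for i in range(len(residues) - 1)]
    if cyclic and len(residues) > 2:
        edges.append((len(residues) - 1, 0))
    return nodes, edges


def label_signature(nodes, edges):
    """Position-independent summary: residue counts and bonded-pair counts."""
    node_labels = Counter(nodes.values())
    edge_labels = Counter(tuple(sorted((nodes[u], nodes[v]))) for u, v in edges)
    return node_labels, edge_labels


ref_nodes, ref_edges = sequence_to_graph("AAGFPVFF")
qry_nodes, qry_edges = sequence_to_graph("PVFFAAGF")

# As rings, the two sequences have identical residue and bond signatures.
same_as_cycles = (label_signature(ref_nodes, ref_edges)
                  == label_signature(qry_nodes, qry_edges))

# Without the closing bond, the terminal residues differ and the signatures diverge.
same_as_linear = (label_signature(*sequence_to_graph("AAGFPVFF", cyclic=False))
                  == label_signature(*sequence_to_graph("PVFFAAGF", cyclic=False)))
```

Because positions are dropped from the signature, the starting residue chosen when writing a ring no longer matters. That is the property a graph-based comparison exploits and a linear alignment lacks.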
To speed up alignment, we employed a graph convolutional network to approximate the GraphAlignment score (graph edit distance calculated by graph_similarity()). Each cyclic peptide sequence was represented by a graph, with nodes representing amino acid residues and edges representing peptide bonds. Each residue was converted into a vector using an amino acid dictionary. The two input graphs were processed through multiple convolutional layers (Conv1 and Conv2), which aggregated local structural information from neighboring nodes to capture specific patterns and features within each peptide. The encoded information of the two cyclic peptides was then passed through fully connected layers to calculate a similarity score. The model was trained in PyTorch (version 12.1) for 200 epochs. GraphAlignment results of ~80 000 cyclic peptide pairs were used for model construction and testing (with a random 20% reserved for testing). Mean squared error was used both as the training loss and as the evaluation metric for the model.
StructureTransformer
StructureTransformer is based on the RDKit package and supports mutual conversion between diverse structural formats, including SMILES, InChI, InChIKey, mol block, and PDB block. It is important to note that to generate a PDB block, the 3D conformation must first be predicted using predict_3d_conformation(mol).
SequenceTransformer
SequenceTransformer can read various formats of cyclic peptide sequences, including graph representation, IUPAC condensed, amino acid chain, and one-letter code, using the read_sequence(seq) function. It outputs the nodes and edges of the cyclic peptide, which can then be passed to the create_sequence(nodes, edges) function to generate sequences in different formats. This allows for convenient conversion between multiple sequence formats.
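The node-and-edge intermediate can be sketched as below. The helper names read_one_letter, create_chain, and create_one_letter are hypothetical stand-ins, not the package's read_sequence() and create_sequence(); the five-residue lookup table and the "cyclo(...)" output notation are likewise illustrative assumptions.

```python
# Hypothetical lookup table covering only a few residues.
THREE_LETTER = {"A": "Ala", "G": "Gly", "V": "Val", "L": "Leu", "P": "Pro"}
ONE_LETTER = {three: one for one, three in THREE_LETTER.items()}


def read_one_letter(seq):
    """Parse a one-letter head-to-tail cyclic sequence into nodes and edges."""
    nodes = [THREE_LETTER[code] for code in seq]
    n = len(nodes)
    # Edge (i, i + 1) is a peptide bond; the final edge (n - 1, 0) closes the ring.
    edges = [(i, (i + 1) % n) for i in range(n)]
    return nodes, edges


def create_chain(nodes, edges):
    """Write the nodes as a three-letter chain, marking ring closure if present."""
    body = "-".join(nodes)
    is_ring = (len(nodes) - 1, 0) in edges
    return f"cyclo({body})" if is_ring else body


def create_one_letter(nodes, edges):
    """Write the nodes back as a one-letter string (edges are not encoded)."""
    return "".join(ONE_LETTER[name] for name in nodes)


nodes, edges = read_one_letter("AGVL")
print(nodes, edges)
print(create_chain(nodes, edges))
print(create_one_letter(nodes, edges))
```

Keeping a single intermediate means that each format needs only one reader and one writer, rather than a dedicated converter for every pair of formats.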
PropertyAnalysis
PropertyAnalysis, based on RDKit, integrates the calculation of various physicochemical properties, for example, exact mass, topological polar surface area, complexity, and hydrogen bond donor count. It also supports the generation of molecular fingerprints in different formats, such as RDKit fingerprints, topological fingerprints, Morgan fingerprints, and MACCS keys fingerprints.
SequenceGeneration
The SequenceGeneration tool can generate new cyclic peptides from a base sequence using either a built-in amino acid substitution table or custom user-defined substitution rules. SequenceGeneration randomly replaces specified amino acids to create novel cyclic peptide sequences.
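A minimal sketch of substitution-based generation is shown below. The substitution table, the function name generate_analogs, its parameters, and the 50% per-position substitution probability are all illustrative assumptions, not the package's built-in table or interface.

```python
import random

# Hypothetical substitution rules: residue -> allowed replacements.
SUBSTITUTION_TABLE = {"A": ["G", "V"], "L": ["I", "V"], "V": ["I", "L"]}


def generate_analogs(sequence, targets, table=SUBSTITUTION_TABLE, count=5, seed=0):
    """Randomly replace the residues listed in `targets` using `table`.

    Returns up to `count` distinct analogs of a one-letter sequence.
    """
    rng = random.Random(seed)
    analogs = set()
    attempts = 0
    while len(analogs) < count and attempts < 100 * count:
        attempts += 1
        residues = list(sequence)
        for i, res in enumerate(residues):
            # Each eligible position is substituted with probability 0.5.
            if res in targets and res in table and rng.random() < 0.5:
                residues[i] = rng.choice(table[res])
        candidate = "".join(residues)
        if candidate != sequence:
            analogs.add(candidate)
    return sorted(analogs)


print(generate_analogs("ALGVL", targets={"L"}, count=3))
```

The attempt limit keeps the loop from running indefinitely when the rules allow fewer distinct analogs than requested.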
Validation
Four validation datasets were selected from the CyclicPepedia knowledge base according to the availability of structural and sequence information, the number of rings, and the cyclization method (head-to-tail cyclization). These sets are designated as follows: (i) cyclic peptides with both structural and sequence information, (ii) cyclic peptides with both structural and sequence information and head-to-tail cyclization, (iii) cyclic peptides with structural information and head-to-tail cyclization, and (iv) cyclic peptides with sequence information (in one-letter code or IUPAC condensed format) and head-to-tail cyclization (Fig. 2A).
The reliability of cyclicpeptide was validated from both accuracy and stability perspectives. For accuracy, structures from set (i) were converted to sequences using Structure2Sequence, and the resulting sequences were compared to their original sequences using GraphAlignment. The sequences of set (ii) were converted to structures using Sequence2Structure, and these structures were then compared to the original structures using RDKit to determine consistency. Stability refers to the error-free round-trip conversion between cyclic peptide structural and sequence information. For stability validation, structures from set (iii) were first converted to sequences using Structure2Sequence. These sequences were then converted back to structures using Sequence2Structure, and the resulting structures were compared to the originals for consistency. Similarly, for set (iv), sequences were first converted to structures and then converted back to sequences, with the final sequences being compared to the original sequences to ensure consistency (Fig. 2B).
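The stability check amounts to a round-trip test, which can be sketched generically. The converters below are toy stand-ins rather than Structure2Sequence and Sequence2Structure, and the rotation-based comparison is an assumption made for illustration; the study itself compares sequences with GraphAlignment and structures with RDKit.

```python
def canonical_rotation(seq):
    """Pick the lexicographically smallest rotation of a cyclic sequence."""
    if not seq:
        return seq
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def same_ring(a, b):
    """Two head-to-tail sequences match if one is a rotation of the other."""
    return canonical_rotation(a) == canonical_rotation(b)


def round_trip_rate(items, forward, backward, same):
    """Fraction of items unchanged after forward then backward conversion.

    A conversion that raises an exception counts as a failed round trip.
    """
    if not items:
        raise ValueError("no items to validate")
    kept = 0
    for item in items:
        try:
            if same(item, backward(forward(item))):
                kept += 1
        except Exception:
            pass
    return kept / len(items)


# Toy converters: the "structure" is a tuple that starts at a different residue.
def to_structure(seq):
    return tuple(seq[1:] + seq[:1])


def to_sequence(struct):
    return "".join(struct)


dataset = ["AGVL", "AAAA"]
print(round_trip_rate(dataset, to_structure, to_sequence, same_ring))
print(round_trip_rate(dataset, to_structure, to_sequence, lambda a, b: a == b))
```

The second call shows why the comparison must be ring-aware: a plain string comparison reports a failure whenever the regenerated sequence merely starts at a different residue.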
Case study on the cyclosporin A analog design
Information on cyclosporin A and its known analogs, including SMILES, mechanism of action, targets, and structural details, was collected from databases such as DrugBank and PDB. Using the Structure2Sequence tool, we converted the structural information (SMILES) of cyclosporin A and its analogs into sequences. Sequence similarity was calculated using GraphAlignment. Based on the results of manual alignment, modification rules were established and then applied in SequenceGeneration. Amino acids in the cyclosporin A sequence were randomly substituted to generate multiple analog sequences. The Sequence2Structure and StructureTransformer tools were utilized to predict the 3D structures of these analogs, which were subsequently used as ligands for docking.
We conducted batch protein–ligand docking using AutoDock Vina [32]. The crystal structure of cyclophilin A, downloaded from the PDB, was prepared by removing the original ligand, water molecules, and nonstandard residues, removing charges, and adding hydrogens. The active binding sites were automatically calculated from the protein complex. Ligand structures were energy-minimized using AutoDock Vina. Protein and cyclic peptide ligands were docked in AutoDock Vina, and the docking energy scores were ranked. The top-performing cyclosporin A analogs were selected for further analysis and visualized as complexes in PyMOL (version 3.1.0) [33]. Their physicochemical properties were calculated using PropertyAnalysis, providing insights for subsequent studies.
Key Points
Cyclic peptides have enormous potential as therapeutic drugs, but the lack of standardized tools has slowed their development.
We introduced a Python package named cyclicpeptide that provides multiple standardized tools for cyclic peptide drug design.
cyclicpeptide can efficiently process and convert cyclic peptide structure and sequence data, which may speed up the early stages of cyclic peptide drug design.
Contributor Information
Liu Yang, National Center, Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, 3333 Binsheng Road, Hangzhou 310052, P. R. China.
Suqi Cao, National Center, Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, 3333 Binsheng Road, Hangzhou 310052, P. R. China.
Lei Liu, Department of Gastroenterology, Shanghai Tenth People’s Hospital, School of Life Sciences and Technology, Tongji University, 1239 Siping Road, Shanghai 200072, P. R. China.
Ruixin Zhu, Department of Gastroenterology, Shanghai Tenth People’s Hospital, School of Life Sciences and Technology, Tongji University, 1239 Siping Road, Shanghai 200072, P. R. China.
Dingfeng Wu, National Center, Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, 3333 Binsheng Road, Hangzhou 310052, P. R. China.
Conflict of interest: The authors declare that they have no conflict of interest.
Funding
This work was supported by the National Natural Science Foundation of China (82304351 to L.L., 32200529 to D.W.). The funders had no role in study design, data collection, and analysis, decision to publish, or preparation of the manuscript.
Data availability
All the software packages used in this study are open source and publicly available, and all data and resources of cyclicpeptide are freely available on GitHub at https://github.com/dfwlab/cyclicpeptide and on PyPI at https://pypi.org/project/cyclicpeptide/.
Author contributions
D.W. conceived and designed the project. Each author has contributed significantly to the submitted work. L.Y. and S.C. collected the data, performed data analysis, and built the package. L.Y. and S.C. wrote the original manuscript. All authors revised the manuscript. All authors read and approved the final manuscript.
Acknowledgments
We are grateful to all the subjects who participated in this study.
References
1. Li X, Craven TW, Levine PM. Cyclic peptide screening methods for preclinical drug discovery. J Med Chem 2022;65:11913–26. 10.1021/acs.jmedchem.2c01077.
2. Harding CJ, Bischoff M, Bergkessel M. et al. An anti-biofilm cyclic peptide targets a secreted aminopeptidase from P. aeruginosa. Nat Chem Biol 2023;19:1158–66. 10.1038/s41589-023-01373-8.
3. Muratspahić E, Deibler K, Han J. et al. Design and structural validation of peptide-drug conjugate ligands of the kappa-opioid receptor. Nat Commun 2023;14:8064. 10.1038/s41467-023-43718-w.
4. Fetse J, Zhao Z, Liu H. et al. Discovery of cyclic peptide inhibitors targeting PD-L1 for cancer immunotherapy. J Med Chem 2022;65:12002–13. 10.1021/acs.jmedchem.2c00539.
5. Ribeiro R, Pinto E, Fernandes C. et al. Marine cyclic peptides: Antimicrobial activity and synthetic strategies. Mar Drugs 2022;20:397. 10.3390/md20060397.
6. Damjanovic J, Miao J, Huang H. et al. Elucidating solution structures of cyclic peptides using molecular dynamics simulations. Chem Rev 2021;121:2292–324. 10.1021/acs.chemrev.0c01087.
7. Jain S, Gupta S, Patiyal S. et al. THPdb2: Compilation of FDA approved therapeutic peptides and proteins. Drug Discov Today 2024;29:104047. 10.1016/j.drudis.2024.104047.
8. Shirley M. Zilucoplan: First approval. Drugs 2024;84:99–104. 10.1007/s40265-023-01977-3.
9. Hansson MJ, Elmér E. Cyclosporine as therapy for traumatic brain injury. Neurotherapeutics 2023;20:1482–95. 10.1007/s13311-023-01414-z.
10. Chow HY, Zhang Y, Matheson E. et al. Ligation technologies for the synthesis of cyclic peptides. Chem Rev 2019;119:9971–10001. 10.1021/acs.chemrev.8b00657.
11. Jin K. Developing cyclic peptide-based drug candidates: An overview. Future Med Chem 2020;12:1687–90. 10.4155/fmc-2020-0171.
12. Vamathevan J, Clark D, Czodrowski P. et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov 2019;18:463–77. 10.1038/s41573-019-0024-5.
13. Liu Z, Chen Q, Lan W. et al. SSLDTI: A novel method for drug-target interaction prediction based on self-supervised learning. Artif Intell Med 2024;149:102778. 10.1016/j.artmed.2024.102778.
14. Lan W, Wang J, Li M. et al. Predicting drug–target interaction using positive-unlabeled learning. Neurocomputing 2016;206:50–7. 10.1016/j.neucom.2016.03.080.
15. Huang L, Zhang L, Chen X. Updated review of advances in microRNAs and complex diseases: Towards systematic evaluation of computational models. Brief Bioinform 2022;23:bbac407. 10.1093/bib/bbac407.
16. Dong J, Wu Z, Xu H. et al. FormulationAI: A novel web-based platform for drug formulation design driven by artificial intelligence. Brief Bioinform 2023;25:bbad419. 10.1093/bib/bbad419.
17. Bannigan P, Aldeghi M, Bao Z. et al. Machine learning directed drug formulation development. Adv Drug Deliv Rev 2021;175:113806. 10.1016/j.addr.2021.05.016.
18. Theodosiou AA, Read RC. Artificial intelligence, machine learning and deep learning: Potential resources for the infection clinician. J Infect 2023;87:287–94. 10.1016/j.jinf.2023.07.006.
19. Lovric M, Molero JM, Kern R. PySpark and RDKit: Moving towards big data in cheminformatics. Mol Inform 2019;38:e1800082. 10.1002/minf.201800082.
20. Cock PJ, Antao T, Chang JT. et al. Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009;25:1422–3. 10.1093/bioinformatics/btp163.
21. Ochoa R, Brown JB, Fox T. pyPept: A Python library to generate atomistic 2D and 3D representations of peptides. J Cheminform 2023;15:79. 10.1186/s13321-023-00748-2.
22. Rettie SA. et al. Cyclic peptide structure prediction and design using AlphaFold. bioRxiv 2023,26:2023.02.25.529956. 10.1101/2023.02.25.529956.
23. Dufresne Y, Noé L, Leclère V. et al. Smiles2Monomers: A link between chemical and biological structures for polymers. J Cheminform 2015;7:62. 10.1186/s13321-015-0111-5.
24. Scalfani VF, Patel VD, Fernandez AM. Visualizing chemical space networks with RDKit and NetworkX. J Cheminform 2022;14:87. 10.1186/s13321-022-00664-x.
25. Liu L, Yang L, Cao S. et al. CyclicPepedia: A knowledge base of natural and synthetic cyclic peptides. Brief Bioinform 2024;25:bbae190. 10.1093/bib/bbae190.
26. Dujardin M, Bouckaert J, Rucktooa P. et al. X-ray structure of alisporivir in complex with cyclophilin A at 1.5 Å resolution. Acta Crystallogr F Struct Biol Commun 2018;74:583–92. 10.1107/S2053230X18010415.
27. Vatansever S, Schlessinger A, Wacker D. et al. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021;41:1427–73. 10.1002/med.21764.
28. Bao Z, Bufton J, Hickman RJ. et al. Revolutionizing drug formulation development: The increasing impact of machine learning. Adv Drug Deliv Rev 2023;202:115108. 10.1016/j.addr.2023.115108.
29. Wang N, Zhang Y, Wang W. et al. How can machine learning and multiscale modeling benefit ocular drug development? Adv Drug Deliv Rev 2023;196:114772. 10.1016/j.addr.2023.114772.
30. Wang YL, Wang F, Shi XX. et al. Cloud 3D-QSAR: A web tool for the development of quantitative structure-activity relationship models in drug discovery. Brief Bioinform 2021;22:bbaa276. 10.1093/bib/bbaa276.
31. Song X, Chai L, Zhang J. Graph signal processing approach to QSAR/QSPR model learning of compounds. IEEE Trans Pattern Anal Mach Intell 2022;44:1963–73. 10.1109/TPAMI.2020.3032718.
32. Eberhardt J, Santos-Martins D, Tillack AF. et al. AutoDock Vina 1.2.0: New docking methods, expanded force field, and Python bindings. J Chem Inf Model 2021;61:3891–8. 10.1021/acs.jcim.1c00203.
33. Mooers BHM, Brown ME. Templates for writing PyMOL scripts. Protein Sci 2021;30:262–9. 10.1002/pro.3997.