Integrating Protein Language Model and Molecular Dynamics Simulations to Discover Antibiofouling Peptides

Ibrahim A Imam; Shea Bailey; Duolin Wang; Shuai Zeng; Dong Xu; Qing Shao

doi:10.1021/acs.langmuir.4c04140

Author manuscript; available in PMC: 2026 Jan 14.

Published in final edited form as: Langmuir. 2024 Dec 30;41(1):811–821. doi: 10.1021/acs.langmuir.4c04140


PMCID: PMC11969446 NIHMSID: NIHMS2069456 PMID: 39810350

Abstract

Antibiofouling peptide materials prevent the nonspecific adsorption of proteins on devices, enabling them to perform their designed functions as desired in complex biological environments. Due to their importance, research on antibiofouling peptide materials has been one of the central subjects of interfacial engineering. However, only a few antibiofouling peptide sequences have been developed. This narrow scope of antibiofouling peptide materials limits their capacity to adapt to the broad spectrum of application scenarios. To address this issue, we searched for antibiofouling peptides in the vast sequence pool of the microbiome library using a combination of deep learning-based high-throughput search and molecular dynamics (MD) simulations. A random forest-based model with an ensemble of ten independent classifiers was developed. Each classifier was trained by prompt-tuning the foundational protein language model Evolutionary Scale Modeling version 2 (ESM2) on a distinct training data set. We constructed data sets containing the same number of antibiofouling and biofouling peptide sequences to attenuate the bias of the existing databases. MD simulations were conducted to investigate the interfacial properties of six selected peptide candidates and their interactions with a lysozyme protein. Two known antibiofouling peptides, (glutamic acid (E)-lysine (K))₁₅ and (EK-proline (P))₁₀, and one known fouling peptide, (glycine)₃₀, were used as references. The MD simulation results indicate that five of the six peptides present the potential to resist biofouling. Our research implies that deep learning and molecular simulations can be integrated to discover functional peptide materials for interfacial applications.


1. INTRODUCTION

Materials that can resist biofouling play a crucial role in developing medical devices that perform their functions in complicated biological systems.^1–9 The biofouling on the device surfaces could segregate them from the biological systems or trigger detrimental effects on the systems.⁵ Developing antibiofouling materials has been one of the central subjects of interfacial engineering. Among various materials, antibiofouling peptides^10–17 are unique because they share amino acids, the same building blocks as proteins, the most abundant molecules in many biological systems. Sharing the same building blocks makes adapting antibiofouling peptides to biological systems easier and brings less concern over the toxicity issue due to material degradation.

Most current antibiofouling peptides are composed of several repetitive motifs such as (glutamic acid (E)-lysine (K)), (aspartic acid (D)-lysine (K)), or (glutamic acid (E)-lysine (K)-proline (P)).^{10,13,18–21} These antibiofouling peptides were first designed based on the concept of zwitterionic antibiofouling synthetic materials, such as carboxybetaine- and sulfobetaine-based polymers.^1,22,23 The EK, DK, and EKP motifs contain one cationic and one anionic amino acid residue, mimicking the zwitterionic structures. Later, White et al. revealed that E and K are the two most abundant amino acids on naturally occurring protein surfaces.²⁴ This statistical analysis implies that nature may have adopted the “zwitterionic” concept during evolution to create proteins that can work in complicated biological systems. In addition, White et al.²⁵ also showed that the “E” and “K” pair presents a binding free energy weaker than the “R” and “D” pair. The weaker binding free energy may indicate that the zwitterionic “E” and “K” pairs have a greater ability to bind water molecules instead of associating with each other.

However, it is against intuition that only zwitterionic peptides can resist biofouling. Indeed, E and K are the two most abundant amino acid residues on the protein surface, but many other residues are also present. Their representation on protein surfaces indicates that the sequence space of antibiofouling peptides should be much larger than the zwitterionic pattern. Such a large space could provide a pool to search for peptide materials suitable for various antibiofouling plus “X” application scenarios. Some efforts have been made to design antibiofouling peptides beyond the zwitterionic sequences. Most such efforts proposed and examined a few sequences based on a rationalized design principle, such as the idea that strong hydration leads to antibiofouling. However, these hypothesis-driven efforts can focus on only a small scope of peptide sequence space.

One research effort worth noting is the attempt of the White group to discover antibiofouling peptides using Bayesian networks.²⁶ One significant contribution of their work is the compilation of a database mixing synthetic antibiofouling peptide sequences and biofouling sequences from the known protein library. The data within this database support the development of deep learning-based methods. However, the original data samples possess two bias issues. First, the number of biofouling peptides (13,584) is much higher than that of antibiofouling peptides (3602). Second, all the antibiofouling peptide sequences are shorter than 30 amino acid residues, while about 72% of the biofouling sequences are longer than 30 amino acid residues (Figure S1). These two imbalance issues in the data set bring bias to the developed models. This observation is consistent with that in the literature. We created more balanced data sets based on the data set created by the White group, and the details are given in Section 2.1.

Recently emerged protein language models have paved the way for deep learning-based high-throughput methods.^27–30 A protein language model deploys the amino acid sequence of a protein as input and outputs the embedding, a 1-D vector with a preset dimension. This embedding represents this protein in a latent space. The latent space enables the design of proteins and peptides with identical functions but different sequence patterns.³¹ Another significant advancement in recent years is the expansion of the protein sequence universe.^32–34 The development of sequencing technologies has enabled researchers to identify hundreds of millions or even billions of sequences from nature.^34–36 These naturally occurring sequences provide a rich reservoir to search for functional peptides and proteins using machine learning.^36–38 For instance, researchers discovered antimicrobial peptides by screening the global microbiome libraries.³⁹

Molecular simulations have been another versatile tool for protein and peptide design, either rationalizing the design by unveiling the mechanism or validating a handful of candidates based on in-silico data. Molecular simulations have been playing an important role in understanding and designing antibiofouling materials.^22,40 The molecular simulations of Zheng et al.⁴¹ revealed that most of the forces preventing proteins from adsorbing onto the materials come from the water molecules around the material surface instead of the materials themselves. This simulation-based observation leads to the design principle of antibiofouling materials: strong hydration leads to antibiofouling ability.^42,43 The thermodynamic stability of the hydration layer and the slow dynamics of interfacial water molecules create a robust barrier, preventing protein adsorption and enhancing antibiofouling properties. This principle inspired the design of zwitterionic antibiofouling materials because their hydration is stronger than that of polyethylene glycol, the classic “golden” antibiofouling material.^44,45 Shao et al. performed molecular dynamics (MD) simulations to examine the antibiofouling potentials of a group of zwitterionic motifs.^46,47 They further developed three criteria for designing zwitterionic antibiofouling materials: (a) strong hydration, (b) weak self-association, and (c) no sticking to proteins. They also showed that molecular simulations could help to quickly evaluate the antibiofouling potential of tens of materials without time-consuming and costly synthesis and experiments.

The foundational protein language model, Evolutionary Scale Modeling version 2 (ESM2),⁴⁸ was used in this work. Indeed, many other protein language models have been published.^49–53 The ESM2 models utilize relative position encoding instead of traditional absolute position encoding, which enables the models to handle amino acid sequences of arbitrary lengths and enhances their learning efficiency. The ESM2 models have been used as the backbone or the pretrained models in various protein design tasks.^54–56 The ESM2 family contains models with 8 million (8M) to 15 billion (15B) parameters. The largest ESM2 models demand expensive hardware, although they may provide better prediction performance. For this project, we selected the ESM2 model with 650M parameters, considering the balance between the model performance, speed, and the availability of the hardware.

This work integrates deep learning and MD simulations to conduct a high-throughput search for antibiofouling peptide candidates beyond the zwitterionic sequences. The search was conducted in two steps. Step 1: we performed the deep learning-based high-throughput search to identify potential antibiofouling peptide sequences from the microbiome library. Step 2: we performed MD simulations to unveil the interfacial properties of the selected candidates and verify their antibiofouling potential. We focused on one protein language model, ESM2, due to its advanced learning efficiency and versatility. Moreover, the priority of this work is to discover antibiofouling peptide candidates instead of benchmarking the performance of various deep learning methods. The rest of the paper contains the following sections: Section 2 describes the deep learning methods and MD simulations, Section 3 presents results and discussion, and Section 4 presents the conclusion.

2. METHODS

2.1. Construct Data Sets.

This work built two groups of data sets based on the original data set created by the White group. The constructed data sets will be available on the GitHub repository (https://github.com/qshao/AntiBiofoulingPeptides). The first group consists of ten data sets for training the ten ESM2-based classifiers based on prompt tuning. Each data set contains 2882 antibiofouling peptides and the same number of biofouling peptides selected from the White group data set. Only peptides with ≤30 amino acid residues were chosen to attenuate the bias due to the discrepancy in the sequence lengths of antibiofouling and biofouling peptides in the original data set. These peptides were then randomly split into the training and validation sets in a 3:1 ratio. The training process is detailed in Section 2.2.
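The construction of one balanced data set can be illustrated with a minimal sketch. It assumes the sequences are available as plain Python lists; the function names, the toy sequences, and the random seed are hypothetical and are not taken from the repository.

```python
import random


def build_balanced_set(antifouling, fouling, n_per_class, max_len=30, seed=0):
    """Sample an equal number of short peptides from each class."""
    rng = random.Random(seed)
    # Keep only sequences with at most max_len residues to remove the length bias.
    pos = [s for s in antifouling if len(s) <= max_len]
    neg = [s for s in fouling if len(s) <= max_len]
    if min(len(pos), len(neg)) < n_per_class:
        raise ValueError("not enough short peptides in one class")
    # Label 1 = antibiofouling, label 0 = biofouling.
    data = [(s, 1) for s in rng.sample(pos, n_per_class)]
    data += [(s, 0) for s in rng.sample(neg, n_per_class)]
    rng.shuffle(data)
    return data


def split_train_val(data, train_parts=3, val_parts=1):
    """Split already shuffled data into training and validation sets (3:1 by default)."""
    n_train = len(data) * train_parts // (train_parts + val_parts)
    return data[:n_train], data[n_train:]


# Toy sequences for illustration only (not from the White group data set).
toy_pos = ["EKEKEK", "DKDKDK", "EKPEKP", "KEKEKE"]
toy_neg = ["GGGGGG", "AVLIAV", "G" * 40, "FWFWFW", "LLVVII"]
data = build_balanced_set(toy_pos, toy_neg, n_per_class=4)
train, val = split_train_val(data)
```

With `n_per_class=2882`, the same call would give one of the ten data sets described above; repeating it with different seeds is one possible way to obtain ten distinct sets.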

The second group is a data set containing 720 antibiofouling and 720 biofouling peptide sequences selected from the original data set created by the White group. This data set was created to train the model ensemble based on the ten prompted ESM2 models. The peptides in this data set do not overlap with those in the first group. The antibiofouling and biofouling peptides are randomly split into training, validation, and testing subsets with a 3:1:1 ratio. The numbers of antifouling and fouling sequences are equal in the three subsets.
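A class-wise (stratified) split is one way to keep the two classes equal in every subset. The sketch below is an illustration under that assumption; the helper name and the placeholder identifiers are hypothetical.

```python
import random


def stratified_split(pos, neg, ratios=(3, 1, 1), seed=0):
    """Split each class by the same ratios so every subset stays balanced."""
    rng = random.Random(seed)
    total = sum(ratios)
    subsets = [[] for _ in ratios]
    for label, seqs in ((1, list(pos)), (0, list(neg))):
        rng.shuffle(seqs)
        start = 0
        for i, part in enumerate(ratios):
            # The last subset takes the remainder to avoid rounding losses.
            if i == len(ratios) - 1:
                end = len(seqs)
            else:
                end = start + len(seqs) * part // total
            subsets[i].extend((s, label) for s in seqs[start:end])
            start = end
    return subsets


# Placeholder identifiers stand in for the 720 + 720 peptide sequences.
pos = [f"anti_{i}" for i in range(720)]
neg = [f"foul_{i}" for i in range(720)]
train, val, test = stratified_split(pos, neg)
print(len(train), len(val), len(test))  # 864 288 288
```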

2.2. Prompt Tuning ESM2 Model for Binary Classification.

This work deploys prompt tuning to adapt the ESM2–650M model to a binary classifier that provides a prediction of antibiofouling (positive) or biofouling (negative). Unlike full-model fine-tuning, prompt tuning does not update the parameters of the base ESM2 model, making it more suitable for small training data. Figure 1 illustrates the continuous prompt tuning process for binary classification using the ESM2 model. Prompt tuning is an efficient method for enhancing large foundation models by introducing trainable parameters known as soft prompts.⁵⁷ The input consists of a peptide sequence with soft prompts prepended to its embeddings. Only the soft prompts are trainable, while the rest of the ESM2 model’s parameters remain fixed. The soft prompts are initialized by randomly drawing embeddings from the ESM2 embedding table to maintain consistency in the input distribution. Since prompt tuning extends the hidden states of the model, we omit the hidden states corresponding to the soft prompts to reduce noise. In the final layers, special tokens (CLS and SEP) added by the ESM2 model are removed, and the enriched amino acid representations, matching the input sequence length, are used as input to a binary classifier.

Figure 1. Schematic of continuous prompt tuning of the ESM2 model for binary classification. The input consists of a peptide sequence prepended with a soft prompt. The embedding is processed through the frozen ESM2 model with the detailed prompt architecture shown on the right. The soft prompt has a length of 5, and the maximum peptide length is 30. B represents the batch size; L represents the total length; and E represents the embedding dimension. Specifically, the input size is (64, 35, 1280), the intermediate embeddings are (64, 35, 1280), and the final output embedding is (64, 30, 1280). The output of the binary classifier provides a prediction of antibiofouling or biofouling.

The classifier is added to the output of the ESM2 model using the hidden embeddings of ESM2 for binary classification. The classifier consists of an average pooling layer that converts the amino acid-level representations into the protein-level representations and a deep learning neural network (DNN) that predicts whether the peptide is antibiofouling or biofouling. Table S1 lists the hyperparameters of the continuous prompt tunings. Equations 1–4 describe the prompt tuning process in the ESM2 model:

P_{1,2,\dots,L}^{emb} = \mathrm{emb}(P_{1,2,\dots,L}) \quad (1)

h = \mathrm{ESM}(\mathrm{concat}(S_{1,2,\dots,K}, P_{1,2,\dots,L}^{emb})) \quad (2)

h_{emb} = \mathrm{rm\_soft\_prompts}(h) \quad (3)

r_{AA} = \mathrm{rm\_special\_tokens}(h_{emb}) \quad (4)

where S_{1,2,\dots,K} represents the soft prompts (with K as the hyperparameter for prompt length), P_{1,2,\dots,L} represents the input peptide sequence, L represents the sequence length, and emb(·) represents the embedding function in the ESM2 model. The concat(·) function concatenates the soft prompts and input sequence embeddings, h represents the hidden states, h_{emb} represents the hidden states after removing the soft prompts, and r_{AA} is the final amino acid representation produced by the ESM2 model, which is then passed to the binary classifier.
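The bookkeeping of eqs 1–4 and the subsequent average pooling can be traced with a toy sketch in plain Python. It is not the ESM2 implementation: the frozen encoder is replaced by an identity stand-in, the embedding dimension is shrunk from 1280 to 4, and placing the soft prompts in front of the CLS token is an assumption made only for illustration. Because the special tokens are counted explicitly here, the toy sequence has K + L + 2 positions.

```python
K, L, E = 5, 30, 4  # prompt length and peptide length from Figure 1; E shrunk from 1280


def embed(n_tokens, value):
    """Stand-in for emb(.): one E-dimensional vector per token."""
    return [[value] * E for _ in range(n_tokens)]


def frozen_encoder(states):
    """Stand-in for the frozen ESM2 layers: hidden states keep the input shape."""
    return [list(vec) for vec in states]


def remove_soft_prompts(hidden, k):
    """Eq 3: drop the first k hidden states, which belong to the soft prompts."""
    return hidden[k:]


def remove_special_tokens(hidden):
    """Eq 4: drop the CLS and SEP positions at both ends."""
    return hidden[1:-1]


def average_pool(residue_states):
    """Average residue-level vectors into one peptide-level vector."""
    n = len(residue_states)
    return [sum(vec[d] for vec in residue_states) / n for d in range(E)]


soft_prompts = embed(K, 0.5)                              # S_1..K, the only trainable part
tokens = embed(1, -1.0) + embed(L, 1.0) + embed(1, -1.0)  # CLS + residues + SEP (eq 1)
h = frozen_encoder(soft_prompts + tokens)                 # eq 2
h_emb = remove_soft_prompts(h, K)                         # eq 3
r_aa = remove_special_tokens(h_emb)                       # eq 4
pooled = average_pool(r_aa)                               # input to the DNN classifier
```

In an actual training loop, only the values in `soft_prompts` and the classifier weights would receive gradient updates, while the encoder stays fixed.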

2.3. Random Forest-Based Model Ensemble.

A random forest (RF) binary classifier was developed to combine the ten prompt-tuned ESM2 models into a model ensemble. The input of the RF model was the outputs of the ten prompt-tuned ESM2 models, and the output was a score in the range [0, 1] describing the antibiofouling ability of a peptide (biofouling: <0.5; antibiofouling: 0.5–1). This RF binary classifier was trained on the second data set group constructed in Section 2.1. The RF classifier was developed and trained using Scikit-learn.⁵⁸ Table S2 lists the hyperparameters for training the RF classifier.

2.4. Molecular Dynamics Simulation.

2.4.1. Molecular Model.

The initial 3D conformations of the six peptides and the three references, (EK)₁₅, (EKP)₁₀, and G₃₀, were predicted using the AlphaFold3⁵⁹ Webserver, a state-of-the-art protein structure prediction tool. The 3D structure of the reference protein, lysozyme (PDB ID: 1LYZ), was retrieved from the Protein Data Bank and processed to remove all water molecules. Hydrogen atoms were added to the peptides and lysozyme structures at a pH of 7 using PDBfixer.⁶⁰

This work creates two types of simulation boxes. The simulation box of peptide systems was created by placing a single peptide chain in the middle of a cubic box, and the simulation box of lysozyme–peptide interactions was made by putting lysozyme and the peptide chain 1.5 nm apart from each other in the middle of a cubic box. The cubic box was filled with water molecules using the TIP3P model,⁶¹ and the net charge of the system was neutralized with Na⁺ counterions. Figure 2 shows a snapshot of the initial configuration for a lysozyme–peptide simulation box. Table S3 lists the initial dimensions of the simulation boxes.

Figure 2. — A snapshot of the simulation box. The lysozyme (blue) and peptide #3 (orange) molecules are 1.5 nm apart and explicitly solvated with water molecules (red).

The lysozyme, peptides, and ions were described using the AMBER14SB force field.⁶² Nonbonded interactions were calculated as the sum of a Lennard-Jones 12–6 potential and a Coulombic potential, as shown in eq 5:

E_{ij}(r_{ij}) = 4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \frac{e_{i} e_{j}}{4\pi\varepsilon_{0} r_{ij}} \quad (5)

where E_ij is the potential energy between atoms i and j due to nonbonded interactions, r_ij is the distance between atoms i and j, ϵ_ij is the energetic parameter, σ_ij is the geometric parameter, e_i and e_j are the partial charges of atoms i and j, and ε_0 is the vacuum permittivity.
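Eq 5 can be evaluated directly for a single atom pair, as in the minimal sketch below. It assumes GROMACS-style units (nm, kJ/mol, elementary charges), in which the prefactor 1/(4πε_0) equals 138.935458 kJ mol⁻¹ nm e⁻². The σ and ϵ values are illustrative and are not AMBER14SB parameters.

```python
COULOMB_CONSTANT = 138.935458  # 1/(4*pi*eps0) in kJ mol^-1 nm e^-2


def nonbonded_energy(r, epsilon, sigma, q_i, q_j):
    """Lennard-Jones 12-6 plus Coulomb energy (eq 5) in kJ/mol; r and sigma in nm."""
    if r <= 0:
        raise ValueError("distance must be positive")
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB_CONSTANT * q_i * q_j / r
    return lennard_jones + coulomb


# Illustrative parameters only (not taken from the force field).
sigma, epsilon = 0.3, 0.5
r_min = 2 ** (1 / 6) * sigma  # location of the Lennard-Jones minimum

e_at_sigma = nonbonded_energy(sigma, epsilon, sigma, 0.0, 0.0)    # LJ term crosses zero
e_at_minimum = nonbonded_energy(r_min, epsilon, sigma, 0.0, 0.0)  # LJ well depth, -epsilon
e_ion_pair = nonbonded_energy(1.0, 0.0, sigma, 1.0, -1.0)         # pure Coulomb attraction
```

The two uncharged cases check the familiar properties of the 12–6 potential: the energy vanishes at r = σ and reaches −ϵ at r = 2^{1/6}σ.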

Bonded interactions, including bond, angle, and dihedral terms, were quantified according to AMBER force field⁶³ specifications. Visual Molecular Dynamics (VMD) software⁶³ was employed for comprehensive visual analysis of the molecular structures in this study.

2.4.2. Simulation Details.

The MD simulation for each system involved three steps. First, an energy minimization was conducted using the steepest descent method to remove close contacts between atoms. Second, a 200 ns MD simulation in the isothermal–isobaric ensemble (NPT, P = 1 atm, T = 310 K, time step = 2 fs) was performed to reach thermodynamic equilibrium. During this step, the Berendsen thermostat⁶⁴ and barostat were employed to control the temperature and pressure. Third, a 300 ns NPT (P = 1 atm, T = 310 K, time step = 2 fs) MD simulation was carried out with the coordinate trajectory recorded every 200 ps. In this production step, temperature coupling was managed using the V-rescale thermostat,⁶⁵ and the Parrinello–Rahman⁶⁶ barostat was used for pressure control. A 1.2 nm cutoff was applied for Lennard-Jones interactions, and long-range electrostatics were calculated using the particle-mesh Ewald⁶⁷ (PME) method. The LINCS⁶⁸ algorithm was used to constrain all bonds involving hydrogen atoms. All simulations and energy minimizations were performed using GROMACS 2022.1.⁶⁹
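The run lengths quoted above translate into step and frame counts by simple unit conversion. The helper below is a hypothetical illustration of that arithmetic and is not part of the simulation setup.

```python
def steps_and_frames(length_ns, dt_fs, save_every_ps):
    """Return (integration steps, steps between saved frames, saved frames)."""
    total_fs = length_ns * 1_000_000   # 1 ns = 1e6 fs
    stride_fs = save_every_ps * 1_000  # 1 ps = 1e3 fs
    n_steps = total_fs // dt_fs
    stride = stride_fs // dt_fs
    n_frames = n_steps // stride       # excludes the initial frame at t = 0
    return n_steps, stride, n_frames


production = steps_and_frames(300, 2, 200)
equilibration_steps = steps_and_frames(200, 2, 200)[0]
print(production)           # (150000000, 100000, 1500)
print(equilibration_steps)  # 100000000
```

Each 300 ns production trajectory therefore provides 1500 frames for the analyses in Section 3, not counting the starting configuration.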

3. RESULTS AND DISCUSSION

3.1. Evaluation of RF-Based Model Ensemble.

The developed RF-based model ensemble performs equally well on the antibiofouling and biofouling classes. Figure 3 shows the confusion matrix of the developed RF-based model ensemble. The model ensemble correctly predicts 96 out of 140 (69%) antibiofouling peptides and 103 out of 148 (70%) biofouling peptides. The model ensemble presents an overall accuracy of 0.69, with an F1 score of [0.68, 0.70], a precision score of [0.68, 0.70], and a recall score of [0.69, 0.70] for the antibiofouling and biofouling cases. These scores indicate that the RF model ensemble performs equally well in categorizing the peptides into the antibiofouling and biofouling classes.
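The reported scores can be recomputed from the counts in the confusion matrix. The sketch below assumes that 96 and 103 are the numbers of correctly classified antibiofouling and biofouling peptides, so the remaining 44 and 45 peptides are the misclassified ones; the function name is hypothetical.

```python
def class_metrics(tp, fn, fp):
    """Precision, recall, and F1 score for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


anti_correct, anti_total = 96, 140
foul_correct, foul_total = 103, 148
anti_missed = anti_total - anti_correct  # antibiofouling predicted as biofouling
foul_missed = foul_total - foul_correct  # biofouling predicted as antibiofouling

accuracy = (anti_correct + foul_correct) / (anti_total + foul_total)
anti = class_metrics(tp=anti_correct, fn=anti_missed, fp=foul_missed)
foul = class_metrics(tp=foul_correct, fn=foul_missed, fp=anti_missed)

print(round(accuracy, 2))
print([round(x, 2) for x in anti])  # precision, recall, F1 for antibiofouling
print([round(x, 2) for x in foul])  # precision, recall, F1 for biofouling
```

Rounded to two decimals, the accuracy is 0.69, the antibiofouling class gives 0.68, 0.69, and 0.68, and the biofouling class gives 0.70 for all three scores, in line with the ranges quoted above.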

This balance is critical for identifying potential antibiofouling candidates. Indeed, our RF-based model ensemble does not present accuracy as high as that of the literature models^26,70 (0.82 and 0.88). However, the other models reported in the literature were developed based on a biased data set and may not serve well as a screener. For instance, the peptide BERT model⁷⁰ shows an accuracy of 0.88 but a significant gap between the two classes in its F1 score, precision score, and recall score. The gap in these scores implies that the trained peptide BERT model is overwhelmingly biased toward the biofouling peptides. During our exploration, we observed a similar scenario for our models trained on a biased data set. Such biased models exclude many potential antibiofouling peptide candidates. Our RF-based model may not reach the overall accuracy reported in the literature. Still, it could discover adequate peptide candidates that can be used for further screening.

3.2. Deep Learning-Based High-Throughput Search.

The deep learning-based high-throughput search involved three steps. We first deployed the RF-based model ensemble to identify antibiofouling peptide candidates from a pool of 1,202,505 sequences with ≤30 amino acid residues from the ProGenome v3 library.⁷¹ Then, we narrowed the list of identified peptide candidates by only selecting those with a net charge because many proteins are charged. This selection aims to minimize the direct attraction between the proteins and peptides. After this, we ranked the peptides based on the ratio of hydrophilic and polar amino acid residues in the sequence because we expect that the hydrophilic amino acids are more likely to bring strong hydration, an essential feature for antibiofouling materials.
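The two sequence descriptors used in the second and third steps can be computed with a short sketch. The hydrophilic residue set follows the grouping in the Figure 6 caption. Treating histidine as neutral and ignoring the terminal charges are simplifying assumptions, and the sketch only evaluates the descriptors; it does not reproduce the exact selection thresholds, which are not stated here.

```python
HYDROPHILIC = set("QTDESNRHKCY")  # grouping used in the Figure 6 caption
POSITIVE = set("KR")              # His treated as neutral at pH 7 (assumption)
NEGATIVE = set("DE")


def net_charge(seq):
    """Net side-chain charge in units of e; terminal charges are ignored."""
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def hydrophilic_fraction(seq):
    """Fraction of residues that belong to the hydrophilic group."""
    return sum(aa in HYDROPHILIC for aa in seq) / len(seq)


# The six candidates from Table 1.
candidates = {
    1: "MTKRREKTREELTNEIENNQEKIRRYEEF",
    2: "MDYSHKYATKEQYKINEDEKKDKDLSTGF",
    3: "MLWDTANLNISTENEQTKTKNEKEKNRHE",
    4: "MILIQPLNTHSENQKTNSNEQHQNKEDEK",
    5: "MKKFKHTNEELELAKLQSEKEKKEKESEE",
    6: "MKEAQQKSDYTQKREDNAVNSSYPQYPQN",
}

# Rank candidates from most to least hydrophilic.
ranked = sorted(candidates, key=lambda k: hydrophilic_fraction(candidates[k]), reverse=True)
for k in ranked:
    seq = candidates[k]
    print(k, net_charge(seq), round(hydrophilic_fraction(seq), 2))
```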

Our procedure differs from a pure hypothesis-driven design. Pure hypothesis-driven designs tend to focus on peptides with similar patterns or peptides composed of a small set of amino acid residues. Our procedure overcomes this limitation. The deep-learning screening in the first step could identify many antibiofouling peptide sequences possessing a broad spectrum of amino acid residues and patterns. Indeed, Steps 2 and 3 select the candidates from the pool using a hypothesis or prior knowledge. However, this selection is made on a pool with diversified sequence patterns and amino acid residues. The six candidates examined in this work (Table 1) show a wide spectrum regarding the types of amino acid residues and motif patterns.

Table 1.

Amino Acid Sequences of the Six Antibiofouling Peptide Candidates

#	Amino acid sequences
1	MTKRREKTREELTNEIENNQEKIRRYEEF
2	MDYSHKYATKEQYKINEDEKKDKDLSTGF
3	MLWDTANLNISTENEQTKTKNEKEKNRHE
4	MILIQPLNTHSENQKTNSNEQHQNKEDEK
5	MKKFKHTNEELELAKLQSEKEKKEKESEE
6	MKEAQQKSDYTQKREDNAVNSSYPQYPQN


We obtained six peptide sequences using the three steps, as shown in Table 1. These sequences present a diversity in their amino acid pattern, different from the low complexity pattern in the current zwitterionic antibiofouling peptides. The six peptide candidates possess 11, 14, 14, 12, 11, and 13 amino acid types. In contrast, most zwitterionic antibiofouling peptides possess 2–4 amino acid types. Such an increase in the number of amino acid types could expand the application scenarios of antibiofouling peptides if these peptides are proven to be antibiofouling.

Despite the diversity in the amino acid types, some sequences present zwitterionic motifs. For instance, Peptide #1 shows the “EK” motif twice; Peptides #3 and #5 show the “EKEK” motif. Peptide #5 also shows the “KEKE” motif. These zwitterionic motifs may indicate the vital role of this motif in the antibiofouling ability of a peptide. However, a peptide with only repetitive zwitterionic motifs may not be the only answer to the antibiofouling ability.

3.3. Conformation of Peptides.

Figure 4 shows the AlphaFold3 predicted 3D structure of the six peptide candidates and two well-known zwitterionic antibiofouling peptides, (EK)₁₅ and (EKP)₁₀. Five of the six peptide candidates showed some similarity in their secondary structure. Peptides #1, #2, #5, and #6 show a dominating α-helix structure and a random coil near one terminal, and Peptide #3 shows a full helix structure. Peptide #4 is an exception, showing a full random coil structure similar to that of (EKP)₁₀. The disordered structure is expected for (EKP)₁₀ because it possesses many P residues. Peptide #4 possesses only one P, and AlphaFold3 predicts it to have a disordered structure.

Nevertheless, the six peptides show a dynamic secondary structure distribution in the MD simulations. Figure 5 shows the secondary structure distribution for the six peptides and the (EK)₁₅ and (EKP)₁₀ references during the 300 ns simulations. For the two antibiofouling references, as shown in Figure 5a,b, the (EK)₁₅ peptide is predominantly α-helical, and the (EKP)₁₀ peptide presents a mixture of random coil and ordered structure. The dynamic variation of secondary structure contours implies that the two peptide references do not possess a static secondary structure. Peptide #1 is predominantly α-helical from residue 10 to 30 and presents a random coil at the C-terminus (Figure 5c), consistent with the AlphaFold3 prediction. Peptide #2 presents a mixture of α-helix, turn, and random coil. Peptide #3 is predominantly α-helical, consistent with the AlphaFold3 prediction. The MD simulation result of Peptide #4 shows a discrepancy with the AlphaFold3 prediction. While AlphaFold3 predicts a fully random coil conformation, the MD simulation indicates that Peptide #4 adopts an α-helix structure in residues 19–23. Peptide #5 presents another discrepancy between the AlphaFold3 prediction and MD simulation results. While AlphaFold3 predicts Peptide #5 to be primarily α-helical, MD simulations indicate that only one-third of the residues are in the α-helix. Peptide #6 also presents diversity in conformation beyond the AlphaFold3 prediction.

Figure 6. — Solvent accessible surface area (SASA) of the peptides (A) hydrophilic (Gln, Thr, Asp, Glu, Ser, Asn, Arg, His, Lys, Cys, and Tyr) and (B) hydrophobic (Met, Gly, Val, Leu, Ile, Trp, Ala, Phe, and Pro).

The discrepancies between AlphaFold3 predictions and MD simulation results remind us that we must be cautious about drawing any conclusion on the peptide 3D conformation based on AlphaFold3 predictions. First, AlphaFold3 may predict a false structure for such short peptides. Second, and more importantly, AlphaFold3 predicts a static conformation, while peptides are expected to be dynamic and to change their conformation as a function of time. MD simulation trajectories may be more feasible for evaluating the conformations of these peptides.

3.4. Interfacial Properties.

The interfacial properties of peptides are expected to play an essential role in their antibiofouling ability. We will first analyze the interfacial properties of peptides using solvent-accessible surface area (SASA). The value of the SASA quantifies the exposure area of a peptide. Figure 6 shows the SASA of the hydrophobic and hydrophilic regions of the six predicted peptides and the two antibiofouling peptide references (EK)₁₅ and (EKP)₁₀.

All six predicted peptides present a hydrophilic dominant exposure area. As shown in Figure 6a, the majority of the SASA of the hydrophilic regions is 20–30 nm² for the six predicted peptides. Peptide #1 presents the highest SASA of the hydrophilic region among the six, and Peptides #2, #3, #5, and #6 show a similar median value of the SASA for the hydrophilic region. Peptide #4 presents a median lower than those of the other five predicted peptides for the SASA of the hydrophilic region. The six predicted peptides all showed a moderate hydrophilic region. As shown in Figure 6b, the six peptides present a SASA ranging from 4 to 10 nm² for the hydrophobic region, much less than that of the hydrophilic region (20–30 nm²). The high ratio of the hydrophilic region could enhance the antibiofouling potential of these predicted peptides.

We then analyzed the interfacial properties of these peptides using the number of residue–residue and residue–water hydrogen bonds. The hydrogen bond was defined based on the criteria proposed by the Chandler group. The distance between the O(acceptor) and the O(donor) must be less than 0.35 nm, and the H(donor)–O(donor)–O(acceptor) angle must be less than 30°. Figure 7 shows the number of residue–residue and residue–water hydrogen bonds for the six predicted peptides, (EK)₁₅ (ref. 1), and (EKP)₁₀ (ref. 2).
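The geometric criterion can be written as a small function acting on three atomic positions. The sketch assumes Cartesian coordinates in nm and applies the two thresholds quoted above; the function name and the example coordinates are hypothetical, and periodic boundary conditions are not handled.

```python
import math


def is_hydrogen_bond(donor, hydrogen, acceptor, max_dist_nm=0.35, max_angle_deg=30.0):
    """Return True if the donor-acceptor distance and H-donor-acceptor angle satisfy the cutoffs."""
    to_acceptor = [a - d for a, d in zip(acceptor, donor)]
    to_hydrogen = [h - d for h, d in zip(hydrogen, donor)]
    dist = math.sqrt(sum(c * c for c in to_acceptor))
    bond = math.sqrt(sum(c * c for c in to_hydrogen))
    if dist == 0.0 or bond == 0.0:
        return False
    cos_angle = sum(x * y for x, y in zip(to_acceptor, to_hydrogen)) / (dist * bond)
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return dist < max_dist_nm and angle < max_angle_deg


# Hypothetical geometries (nm): linear and close, too far, and strongly bent.
linear_close = is_hydrogen_bond((0, 0, 0), (0.1, 0, 0), (0.28, 0, 0))
too_far = is_hydrogen_bond((0, 0, 0), (0.1, 0, 0), (0.40, 0, 0))
bent = is_hydrogen_bond((0, 0, 0), (0, 0.1, 0), (0.28, 0, 0))
```

Counting the pairs that pass this test in every trajectory frame gives hydrogen bond numbers of the kind plotted in Figure 7.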

The eight peptides present a wide range of residue–residue hydrogen bond numbers. As shown in Figure 7a, (EKP)₁₀ only has around 10 hydrogen bonds between residues, while Peptides #1, #3, and (EK)₁₅ possess around 25 hydrogen bonds. This difference in residue–residue hydrogen bonds is consistent with the secondary structure conformations of these peptides. Peptides #1, #3, and (EK)₁₅ are all predominantly α-helical, and (EKP)₁₀ is mainly in a random coil structure. Peptides #2, #4, #5, and #6 possess around 20 hydrogen bonds, and the percentages of the ordered secondary structure in these four peptides are between those of (EKP)₁₀ and (EK)₁₅.

All six predicted peptides possess large amounts of hydrogen bonds with water molecules. The hydrogen bonds with water molecules are suggested as one source contributing to the antibiofouling ability of materials. Figure 7b shows the number of hydrogen bonds between the eight peptides and water molecules. (EK)₁₅ possesses around 140 hydrogen bonds with water molecules, the highest among the eight peptides. The six predicted peptides possess around 100 hydrogen bonds with water molecules. This number is lower than that of (EK)₁₅ but is still probably high enough to provide a solid antibiofouling ability to the peptides.

3.5. Interaction with Lysozyme.

We analyzed the minimal distance between the individual residues and lysozyme proteins to explore their antibiofouling potential. Shao and Jiang⁴⁷ show that the distribution of distance between a zwitterionic motif and a protein can be used as a descriptor to analyze the potential of the motif to be developed into antibiofouling materials. We utilize the identical concept to explore the possibility of peptides being developed into antibiofouling materials. First, we analyze the distance distribution between three reference peptides and a lysozyme in a solvent. We use a threshold of 0.4 nm to define whether a residue is attached to the protein (median <0.4 nm). This threshold is determined based on the van der Waals radii of S (0.18 nm), C (0.17 nm), N (0.15 nm), and O (0.15 nm) atoms.
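The attachment criterion reduces to a median test on each residue's distance time series. The sketch below assumes the minimal distances have already been extracted per residue and per frame; the dictionary layout and the toy numbers are hypothetical.

```python
from statistics import median

ATTACH_THRESHOLD_NM = 0.4  # threshold stated in the text


def attached_residues(min_distances, threshold=ATTACH_THRESHOLD_NM):
    """Return residue indices whose median minimal distance to the protein is below the threshold."""
    return [res for res, series in sorted(min_distances.items()) if median(series) < threshold]


# Hypothetical per-frame minimal distances (nm) for three residues.
toy = {
    1: [0.30, 0.35, 0.90],  # median 0.35: attached
    2: [1.20, 1.50, 0.38],  # median 1.20: not attached
    3: [0.39, 0.41, 0.36],  # median 0.39: attached
}
attached = attached_residues(toy)
fraction_attached = len(attached) / len(toy)
```

Using the median instead of the minimum means a residue that touches the protein only briefly is not counted as attached.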

The antibiofouling and fouling peptides present a distinct difference regarding the percentage of residues attached to the lysozyme protein. Figure 8 shows the violin distribution of the minimal distance between individual residues on (EK)₁₅, (EKP)₁₀, and G₃₀ and the lysozyme protein during the MD simulations. The hydrogen atoms are excluded from this calculation. (EK)₁₅ and (EKP)₁₀ are known as antibiofouling peptides, and G₃₀ is the fouling peptide representative. Only two residues in the (EK)₁₅ peptide show a median of <0.3 nm (Figure 8a). The number of residues attached to the lysozyme increases to around one-third in the (EKP)₁₀ peptide, as shown in Figure 8b. Such an increase indicates that introducing proline may compromise the antibiofouling capacity of the peptide but does not shift it to the biofouling category. On the other hand, G₃₀, the biofouling representative, presents at least two-thirds of the residues attached to the protein. Thus, the number of residues attached to lysozyme could be used to analyze the antibiofouling potential of a peptide.

Figure 8. — Peptide–lysozyme distance distribution. (A) (EK)₁₅, (B) (EKP)₁₀, and (C) G₃₀

We then analyze the antibiofouling potential of the six new peptide sequences based on the residue–lysozyme distance distributions (Figure 9). We utilize these distance distributions to determine whether a residue is attached to the lysozyme protein (<0.4 nm) or not. A peptide is considered to have a higher antibiofouling potential if it possesses fewer residues attached to the lysozyme protein. Based on this criterion, five predicted peptides demonstrate antibiofouling potential similar to (EKP)₁₀, except Peptide #2. Thus, five out of six peptides may present a chance to be developed into antibiofouling materials. Among the five, Peptides #1 and #3 present the highest opportunity because they only have three amino acid residues attached to the lysozyme protein, the lowest number among the five candidates. Peptide #5 may be the second because it has five amino acid residues attached to the lysozyme. Peptides #4 and #6 may be the third because they have ten amino acid residues attached to the lysozyme protein, the highest among the five candidates.

4. CONCLUSION

This work develops and deploys a combined deep learning and MD simulation workflow to identify antibiofouling peptides from the microbiome library. We developed ten independent deep-learning models to classify short antibiofouling and biofouling peptides by prompt-tuning ESM2, a pretrained foundational protein language model. To improve classification, we built a model ensemble from the ten prompt-tuned ESM2 models using the random forest method. The search of the global microbiome library identified six candidate antibiofouling peptides. Subsequently, we performed MD simulations to analyze the conformation, interfacial properties, and protein interactions of the six predicted peptides, using known antibiofouling and fouling peptides as references. The MD simulations revealed diverse conformations among these peptides and a high capacity to form hydrogen bonds with water molecules in the solvent environment. Protein interaction analysis indicates that five of the six peptides have a high potential to be developed into antibiofouling materials.

This combination of deep learning and MD simulations highlights the transformative capability of protein language models in peptide discovery, offering an efficient framework to explore vast sequence spaces and predict functional candidates. Incorporating advanced simulation techniques, such as enhanced sampling or free energy calculations, may refine the mechanistic understanding of peptide–surface interactions, enabling the rational design of multifunctional peptides. This approach underscores the untapped potential of integrating deep learning with MD simulations to accelerate material discovery and innovation in biotechnology.

Supplementary Material

NIHMS2069456-supplement-supplement.pdf (349.1 KB, PDF)

ACKNOWLEDGMENTS

We thank Dr. Jin Chen for providing input on machine learning model selection. We acknowledge the National Institutes of Health (grant R01LM014510) for financial support. D.W., S.Z., and D.X. are also partially supported by the National Institutes of Health (grant R35GM126985). S.B. and Q.S. also acknowledge the support of the National Science Foundation (No. 2150337). The computation for this work was partially performed on the high-performance computing infrastructure provided by Research Computing Support Services at the University of Missouri. We also thank the University of Kentucky Center for Computational Sciences and Information Technology Services Research Computing for their support and the use of associated research computing resources. This work also used Delta-GPU at NCSA through allocation CIS230053 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by the National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.

Footnotes

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.langmuir.4c04140.

Table S1: hyperparameters of the continuous prompt tuning; Table S2: hyperparameters for the random forest-based model ensemble using Scikit-learn; Table S3: details of the peptide and lysozyme–peptide systems (PDF)

The authors declare no competing financial interest.

Complete contact information is available at: https://pubs.acs.org/10.1021/acs.langmuir.4c04140

Contributor Information

Ibrahim A. Imam, Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States.

Shea Bailey, Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States; Department of Chemistry and Biochemistry, Butler University, Indianapolis, Indiana 46208, United States.

Duolin Wang, Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, United States.

Shuai Zeng, Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, United States.

Dong Xu, Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, United States.

Qing Shao, Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States.


Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

supplement

NIHMS2069456-supplement-supplement.pdf^{(349.1KB, pdf)}

[R1] (1).Li Q; Wen C; Yang J; Zhou X; Zhu Y; Zheng J; Cheng G; Bai J; Xu T; Ji J; et al. Zwitterionic biomaterials. Chem. Rev 2022, 122 (23), 17073–17154. [DOI] [PubMed] [Google Scholar]

[R2] (2).Asha AB; Chen Y; Narain R Bioinspired dopamine and zwitterionic polymers for non-fouling surface engineering. Chem. Soc. Rev 2021, 50 (20), 11668–11683. [DOI] [PubMed] [Google Scholar]

[R3] (3).Jiang C; Wang G; Hein R; Liu N; Luo X; Davis JJ Antifouling Strategies for Selective In Vitro and In Vivo Sensing. Chem. Rev 2020, 120 (8), 3852–3889. [DOI] [PubMed] [Google Scholar]

[R4] (4).Xu X; Chang Y; Gong Y; Zhang Y; Yu Y; Peng H; Fu C Recent Advances in Antifouling Surface Polymer Brushes. ACS Appl. Polym. Mater 2024, 6 (1), 1–27. [Google Scholar]

[R5] (5).Frutiger A; Tanno A; Hwu S; Tiefenauer RF; Vörös J; Nakatsuka N Nonspecific Binding–Fundamental Concepts and Consequences for Biosensing Applications. Chem. Rev 2021, 121 (13), 8095–8160. [DOI] [PubMed] [Google Scholar]

[R6] (6).Heggestad JT; Fontes CM; Joh DY; Hucknall AM; Chilkoti A In Pursuit of Zero 2.0: Recent Developments in Nonfouling Polymer Brushes for Immunoassays. Adv. Mater 2020, 32 (2), 1903285. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] (7).Liu S; Guo W Anti-Biofouling and Healable Materials: Preparation, Mechanisms, and Biomedical Applications. Adv. Funct. Mater 2018, 28 (41), 1800596. [Google Scholar]

[R8] (8).Chen S; Jiang S An New Avenue to Nonfouling Materials. Adv. Mater 2008, 20 (2), 335–338. [Google Scholar]

[R9] (9).Durand H; Whiteley A; Mailley P; Nonglaton G Combining Topography and Chemistry to Produce Antibiofouling Surfaces: A Review. ACS Appl. Bio Mater 2022, 5 (10), 4718–4740. [DOI] [PubMed] [Google Scholar]

[R10] (10).Ye H; Wang L; Huang R; Su R; Liu B; Qi W; He Z Superior antifouling performance of a zwitterionic peptide compared to an amphiphilic, non-ionic peptide. ACS Appl. Mater. Interfaces 2015, 7 (40), 22448–22457. [DOI] [PubMed] [Google Scholar]

[R11] (11).Keefe AJ; Caldwell KB; Nowinski AK; White AD; Thakkar A; Jiang S Screening nonspecific interactions of peptides without background interference. Biomaterials 2013, 34 (8), 1871–1877. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] (12).Chen S; Cao Z; Jiang S Ultra-low fouling peptide surfaces derived from natural amino acids. Biomaterials 2009, 30 (29), 5892–5896. [DOI] [PubMed] [Google Scholar]

[R13] (13).Sakala GP; Reches M Peptide-Based Approaches to Fight Biofouling. Adv. Mater. Interfaces 2018, 5 (18), 1800073. [Google Scholar]

[R14] (14).Saha A; Nir S; Reches M Amphiphilic Peptide with Dual Functionality Resists Biofouling. Langmuir 2020, 36 (15), 4201–4206. [DOI] [PubMed] [Google Scholar]

[R15] (15).Zhang D; Cao C; Chen Q; Liu J; Liu H; Liu Y; Yuan Y; Liu H; Lin H; Liu R Antifouling zwitterionic poly-β-peptides. Appl. Mater. Today 2022, 27, 101511. [Google Scholar]

[R16] (16).Nowinski AK; Sun F; White AD; Keefe AJ; Jiang S Sequence, Structure, and Function of Peptide Self-Assembled Monolayers. J. Am. Chem. Soc 2012, 134 (13), 6000–6005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] (17).Ederth T; Lerm M; Orihuela B; Rittschof D Resistance of Zwitterionic Peptide Monolayers to Biofouling. Langmuir 2019, 35 (5), 1818–1827. [DOI] [PubMed] [Google Scholar]

[R18] (18).Han R; Li Y; Shi M; Ding C; Luo X Designed Polyhydroxyproline Helical Peptide with Ultrarobust Antifouling Capability for Electrochemical Sensing in Diverse Complex Biological Fluids. Anal. Chem 2023, 95 (50), 18540–18548. [DOI] [PubMed] [Google Scholar]

[R19] (19).Guo Y; Xu L; Lin W; Chen S Development of Nonfouling Zwitterionic Copolymerized Peptides Based on Glutamic Acid and Lysine Dimers for Adjustable Enzymatic Degradation. Langmuir 2021, 37 (19), 5776–5782. [DOI] [PubMed] [Google Scholar]

[R20] (20).Beyer CD; Thavalingam S; Guseva T; Schardt L; Zimmermann R; Werner C; Dietze P; Bandow JE; Metzler-Nolte N; Rosenhahn A Zwitterionic Peptides Reduce Accumulation of Marine and Freshwater Biofilm Formers. ACS Appl. Mater. Interfaces 2021, 13 (42), 49682–49691. [DOI] [PubMed] [Google Scholar]

[R21] (21).Li C; Li M; Qi W; Su R; Yu J Effect of Hydrophobicity and Charge Separation on the Antifouling Properties of Surface-Tethered Zwitterionic Peptides. Langmuir 2021, 37 (28), 8455–8462. [DOI] [PubMed] [Google Scholar]

[R22] (22).Shao Q; Jiang S Molecular understanding and design of zwitterionic materials. Adv. Mater 2015, 27 (1), 15–26. [DOI] [PubMed] [Google Scholar]

[R23] (23).Jiang S; Cao Z Ultralow-fouling, functionalizable, and hydrolyzable zwitterionic materials and their derivatives for biological applications. Adv. Mater 2010, 22 (9), 920–932. [DOI] [PubMed] [Google Scholar]

[R24] (24).White AD; Nowinski AK; Huang W; Keefe AJ; Sun F; Jiang S Decoding nonspecific interactions from nature. Chem. Sci 2012, 3 (12), 3488–3494. [Google Scholar]

[R25] (25).White AD; Keefe AJ; Ella-Menye J-R; Nowinski AK; Shao Q; Pfaendtner J; Jiang S Free Energy of Solvated Salt Bridges: A Simulation and Experimental Study. J. Phys. Chem. B 2013, 117 (24), 7254–7259. [DOI] [PubMed] [Google Scholar]

[R26] (26).Barrett R; Jiang S; White AD Classifying antimicrobial and multifunctional peptides with Bayesian network models. Pept. Sci 2018, 110 (4), No. e24079. [Google Scholar]

[R27] (27).Ruffolo JA; Madani A Designing proteins with language models. Nat. Biotechnol 2024, 42 (2), 200–202. [DOI] [PubMed] [Google Scholar]

[R28] (28).Ferruz N; Höcker B Controllable protein design with language models. Nat. Mach. Intell 2022, 4 (6), 521–532. [Google Scholar]

[R29] (29).Qiu Y; Wei G-W Artificial intelligence-aided protein engineering: From topological data analysis to deep protein language models. Briefings Bioinf 2023, 24 (5), bbad289. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R30] (30).Bepler T; Berger B Learning the protein language: Evolution, structure, and function. Cell Syst 2021, 12 (6), 654–669.e3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R31] (31).Madani A; Krause B; Greene ER; Subramanian S; Mohr BP; Holton JM; Olmos JL; Xiong C; Sun ZZ; Socher R; Fraser JS; Naik N Large language models generate functional protein sequences across diverse families. Nat. Biotechnol 2023, 41 (8), 1099–1106. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R32] (32).Levitt M Nature of the protein universe. Proc. Natl. Acad. Sci. U. S. A 2009, 106 (27), 11079–11084. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R33] (33).Durairaj J; Waterhouse AM; Mets T; Brodiazhenko T; Abdullah M; Studer G; Tauriello G; Akdel M; Andreeva A; Bateman A; Tenson T; Hauryliuk V; Schwede T; Pereira J Uncovering new families and folds in the natural protein universe. Nature 2023, 622 (7983), 646–653. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] (34).Bileschi ML; Belanger D; Bryant DH; Sanderson T; Carter B; Sculley D; Bateman A; DePristo MA; Colwell LJ Using deep learning to annotate the protein universe. Nat. Biotechnol 2022, 40 (6), 932–937. [DOI] [PubMed] [Google Scholar]

[R35] (35).Richardson L; Allen B; Baldi G; Beracochea M; Bileschi ML; Burdett T; Burgin J; Caballero-Pérez J; Cochrane G; Colwell LJ MGnify: The microbiome sequence data analysis resource in 2023. Nucleic Acids Res 2023, 51 (D1), D753–D759. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R36] (36).Barrio-Hernandez I; Yeo J; Jänes J; Mirdita M; Gilchrist CLM; Wein T; Varadi M; Velankar S; Beltrao P; Steinegger M Clustering predicted structures at the scale of the known protein universe. Nature 2023, 622 (7983), 637–645. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R37] (37).Koehler Leman J; Szczerbiak P; Renfrew PD; Gligorijevic V; Berenberg D; Vatanen T; Taylor BC; Chandler C; Janssen S; Pataki A; Carriero N; Fisk I; Xavier RJ; Knight R; Bonneau R; Kosciolek T Sequence-structure-function relationships in the microbial protein universe. Nat. Commun 2023, 14 (1), 2351. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R38] (38).Robinson SL; Piel J; Sunagawa S A roadmap for metagenomic enzyme discovery. Nat. Prod. Rep 2021, 38 (11), 1994–2023. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R39] (39).Santos-Júnior CD; Torres MDT; Duan Y; Rodríguez Del Río Á; Schmidt TSB; Chong H; Fullam A; Kuhn M; Zhu C; Houseman A; et al. Discovery of antimicrobial peptides in the global microbiome with machine learning. Cell 2024, 187 (14), 3761–3778.e16. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R40] (40).Liu Y; Zhang D; Ren B; Gong X; Xu L; Feng Z-Q; Chang Y; He Y; Zheng J Molecular simulations and understanding of antifouling zwitterionic polymer brushes. J. Mater. Chem. B 2020, 8 (17), 3814–3828. [DOI] [PubMed] [Google Scholar]

[R41] (41).Zheng J; Li L; Tsao H-K; Sheng Y-J; Chen S; Jiang S Strong Repulsive Forces between Protein and Oligo (Ethylene Glycol) Self-Assembled Monolayers: A Molecular Simulation Study. Biophys. J 2005, 89 (1), 158–166. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R42] (42).Chen S; Li L; Zhao C; Zheng J Surface hydration: Principles and applications toward low-fouling/nonfouling biomaterials. Polymer 2010, 51 (23), 5283–5293. [Google Scholar]

[R43] (43).Hower JC; Bernards MT; Chen S; Tsao H-K; Sheng Y-J; Jiang S Hydration of “Nonfouling” Functional Groups. J. Phys. Chem. B 2009, 113 (1), 197–201. [DOI] [PubMed] [Google Scholar]

[R44] (44).He Y; Hower J; Chen S; Bernards MT; Chang Y; Jiang S Molecular Simulation Studies of Protein Interactions with Zwitterionic Phosphorylcholine Self-Assembled Monolayers in the Presence of Water. Langmuir 2008, 24 (18), 10358–10364. [DOI] [PubMed] [Google Scholar]

[R45] (45).Shao Q; He Y; White AD; Jiang S Difference in Hydration between Carboxybetaine and Sulfobetaine. J. Phys. Chem. B 2010, 114 (49), 16625–16631. [DOI] [PubMed] [Google Scholar]

[R46] (46).Shao Q; Mi L; Han X; Bai T; Liu S; Li Y; Jiang S Differences in cationic and anionic charge densities dictate zwitterionic associations and stimuli responses. J. Phys. Chem. B 2014, 118 (24), 6956–6962. [DOI] [PubMed] [Google Scholar]

[R47] (47).Shao Q; Jiang S Influence of Charged Groups on the Properties of Zwitterionic Moieties: A Molecular Simulation Study. J. Phys. Chem. B 2014, 118 (27), 7630–7637. [DOI] [PubMed] [Google Scholar]

[R48] (48).Lin Z; Akin H; Rao R; Hie B; Zhu Z; Lu W; Smetanin N; Verkuil R; Kabeli O; Shmueli Y; dos Santos Costa A; Fazel-Zarandi M; Sercu T; Candido S; Rives A Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379 (6637), 1123–1130. [DOI] [PubMed] [Google Scholar]

[R49] (49).Lv L; Lin Z; Li H; Liu Y; Cui J; Chen CY-C; Yuan L; Tian Y Prollama: A protein large language model for multi-task protein language processing. arXiv, 2024. [Google Scholar]

[R50] (50).Brandes N; Ofer D; Peleg Y; Rappoport N; Linial M ProteinBERT: A universal deep-learning model of protein sequence and function. Bioinformatics 2022, 38 (8), 2102–2110. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R51] (51).Heinzinger M; Weissenow K; Sanchez JG; Henkel A; Mirdita M; Steinegger M; Rost B ProstT5: Bilingual Language Model for Protein Sequence and Structure. bioRxiv, 2023. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R52] (52).Wang D; Pourmirzaei M; Abbas UL; Zeng S; Manshour N; Esmaili F; Poudel B; Jiang Y; Shao Q; Chen J; et al. S-PLM: Structure-Aware Protein Language Model via Contrastive Learning Between Sequence and Structure. Adv. Sci 2024, 2404212. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R53] (53).Hayes T; Rao R; Akin H; Sofroniew NJ; Oktay D; Lin Z; Verkuil R; Tran VQ; Deaton J; Wiggert M et al.Simulating 500 million years of evolution with a language model. bioRxiv, 2024. [DOI] [PubMed] [Google Scholar]

[R54] (54).Zhang Z; Wayment-Steele HK; Brixi G; Wang H; Kern D; Ovchinnikov S Protein language models learn evolutionary statistics of interacting sequence motifs. Proc. Natl. Acad. Sci. U.S.A 2024, 121 (45), No. e2406285121. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R55] (55). Ertelt M; Meiler J; Schoeder CT Combining Rosetta Sequence Design with Protein Language Model Predictions Using Evolutionary Scale Modeling (ESM) as Restraint. ACS Synth. Biol. 2024, 13 (4), 1085–1092.

[R56] (56). Jiang K; Yan Z; Di Bernardo M; Sgrizzi SR; Villiger L; Kayabolen A; Kim BJ; Carscadden JK; Hiraizumi M; Nishimasu H; et al. Rapid in silico directed evolution by a protein language model with EVOLVEpro. Science 2024, No. eadr6006.

[R57] (57). Lester B; Al-Rfou R; Constant N The Power of Scale for Parameter-Efficient Prompt Tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics, 2021; pp 3045–3059.

[R58] (58). Pedregosa F; Varoquaux G; Gramfort A; Michel V; Thirion B; Grisel O; Blondel M; Prettenhofer P; Weiss R; Dubourg V; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.

[R59] (59). Abramson J; Adler J; Dunger J; Evans R; Green T; Pritzel A; Ronneberger O; Willmore L; Ballard AJ; Bambrick J; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630 (8016), 493.

[R60] (60). Eastman P; Swails J; Chodera JD; McGibbon RT; Zhao Y; Beauchamp KA; Wang LP; Simmonett AC; Harrigan MP; Stern CD; Wiewiora RP; Brooks BR; Pande VS OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol. 2017, 13 (7), No. e1005659.

[R61] (61). Jorgensen WL; Chandrasekhar J; Madura JD; Impey RW; Klein ML Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79 (2), 926–935.

[R62] (62). Maier JA; Martinez C; Kasavajhala K; Wickstrom L; Hauser KE; Simmerling C ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput. 2015, 11 (8), 3696–3713.

[R63] (63). Humphrey W; Dalke A; Schulten K VMD: Visual molecular dynamics. J. Mol. Graphics 1996, 14 (1), 33–38.

[R64] (64). Berendsen HJC; Postma JPM; van Gunsteren WF; DiNola A; Haak JR Molecular-Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81 (8), 3684–3690.

[R65] (65). Bussi G; Donadio D; Parrinello M Canonical sampling through velocity rescaling. J. Chem. Phys. 2007, 126 (1), 014101.

[R66] (66). Parrinello M; Rahman A Crystal-Structure and Pair Potentials - a Molecular-Dynamics Study. Phys. Rev. Lett. 1980, 45 (14), 1196–1199.

[R67] (67). Darden T; York D; Pedersen L Particle Mesh Ewald - an N·log(N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98 (12), 10089–10092.

[R68] (68). Hess B; Bekker H; Berendsen HJC; Fraaije JGEM LINCS: A linear constraint solver for molecular simulations. J. Comput. Chem. 1997, 18 (12), 1463–1472.

[R69] (69). Abraham MJ; Murtola T; Schulz R; Páll S; Smith JC; Hess B; Lindahl E GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 2015, 1–2, 19–25.

[R70] (70). Guntuboina C; Das A; Mollaei P; Kim S; Barati Farimani A PeptideBERT: A language model based on transformers for peptide property prediction. J. Phys. Chem. Lett. 2023, 14 (46), 10427–10434.

[R71] (71). Fullam A; Letunic I; Schmidt TS; Ducarmon QR; Karcher N; Khedkar S; Kuhn M; Larralde M; Maistrenko OM; Malfertheiner L; et al. proGenomes3: Approaching one million accurately and consistently annotated high-quality prokaryotic genomes. Nucleic Acids Res. 2023, 51 (D1), D760–D766.
