Proceedings of the National Academy of Sciences of the United States of America. 2019 May 20;116(23):11259–11264. doi: 10.1073/pnas.1903376116

Design of self-assembly dipeptide hydrogels and machine learning via their chemical features

Fei Li a,1, Jinsong Han a,1, Tian Cao b, William Lam c, Baoer Fan d, Wen Tang d, Sijie Chen a, Kin Lam Fok c,2, Linxian Li a,2
PMCID: PMC6561259  PMID: 31110004

Significance

Hydrogels hold great potential for biomedical applications. However, predicting whether a chemical can form a hydrogel based on its chemical structure alone remains challenging. In this study, we developed a combinatorial approach to obtain a structurally diverse hydrogel library with over 2,000 peptides as a training dataset for machine learning. We calculated their chemical features, including topological and physicochemical properties, and used machine learning methods to predict the self-assembly behavior.

Keywords: self-assembly, dipeptide hydrogels, machine learning

Abstract

Hydrogels that are self-assembled by peptides have attracted great interest for biomedical applications. However, the link between the chemical structures of peptides and their corresponding hydrogel properties is still unclear. Here, we present a combinatorial approach to generate a structurally diverse hydrogel library with more than 2,000 peptides and evaluate their corresponding properties. We used a quantitative structure–property relationship to calculate their chemical features reflecting the topological and physicochemical properties, and applied machine learning to predict the self-assembly behavior. We observed that the stiffness of hydrogels is correlated with the diameter and cross-linking degree of the nanofiber. Importantly, we demonstrated that the hydrogels support cell proliferation in culture, suggesting the biocompatibility of the hydrogel. The combinatorial hydrogel library and the machine learning approach we developed linked the chemical structures with their self-assembly behavior and can accelerate the design of novel peptide structures for biomedical use.


Hydrogels that are cross-linked by three-dimensional networks of modified molecules can hold a large amount of water without dissolving, which makes them very similar to natural tissue. As a result of favorable biocompatibility, hydrogels have great potential in biomedical applications such as drug delivery, tissue engineering, sensing, and cell encapsulation (1–7). In the past few years, considerable attention has been directed toward the design of peptide-based hydrogels in particular, not only because of their favorable features such as easy synthesis, decoration, biodegradability, and high compatibility, but also due to their wide applications in the biological and medical fields (8–14). However, to the best of our knowledge, the prediction and design of peptide-based hydrogels is still challenging, which limits the range of peptide-based hydrogels available for research (15, 16). Therefore, a design strategy for hydrogels based on peptides is of great significance. Our aim is to reveal the relationship between molecular structure and hydrogel behavior, which can help us to predict and design peptide hydrogels with new chemical structures.

There are approaches using molecular dynamics simulation to model the self-assembly behavior of peptides into different types of nanostructures, including nanofibers, which can subsequently form hydrogels (17–19). However, it is difficult to evaluate the actual prediction accuracy of the molecular dynamics simulation methods because only a few positive peptides were selected and synthesized to test whether they could form a hydrogel. Additionally, the currently reported synthesis of 9-fluorenylmethyloxycarbonyl (Fmoc)-peptides is limited to the traditional peptide synthesis method, involving step-by-step protection and deprotection. Since a high-throughput peptide generation method is not available, our first motive is to develop a simple and fast method to generate a library with thousands of peptidelike molecules and then test their abilities to form a hydrogel. Using a rational complexation behavior from either a carboxylic acid or metal ions (20, 21), we plan to design chemical structures that can form a hydrogel at neutral pH without any carboxylic acid groups and divalent or trivalent metal ions. Next, the structure–property relationship between the chemical structures of peptides and their self-assembly properties can be examined by introducing different chemical groups (other than carboxylic acids) into this peptide library.

Deep learning and machine learning have been successfully applied to medical applications with accurate prediction; for example, in the diagnosis of pathology images (22–24). However, there are only a few reports on their application in the design of organic materials, and typical prediction accuracy is lower than 50% (25). Most of the work using machine learning for materials design is reported in the field of energy, but reports on its usage for biomaterials design are very limited. To the best of our knowledge, this is the first time that combinatorial chemistry and machine learning have been used to predict the self-assembly behavior of hydrogels. In this work, our second motive is to develop a machine learning method to link the chemical features of peptides with their self-assembly properties and to predict the gel formation ability based on the two-dimensional chemical structure.

In this work, we developed a peptidelike chemical library based on a Ugi four-component reaction for screening the compounds that can form hydrogels. Selected hydrogels were characterized with a rheometer and transmission electron microscopy (TEM) and were further cultured with an adherent cell line. We generated the chemical features of the whole chemical library and developed the machine learning method to recognize these chemical features and predict whether a chemical structure can form a hydrogel at neutral pH without any divalent or trivalent metal ions. We also summarize the relationship between the molecular structure and gelation property.

Peptide-based hydrogels are usually formed based on the response of the carboxylic acid group toward metal ions. In this paper, we built a peptide-based library without a carboxylic acid group. For the construction of a comprehensive chemical library as a testing pool, we used 31 monomers, including 8 amines, 8 aldehydes/ketones, 12 Fmoc-amino acids, and 3 isocyanides to synthesize 2,304 compounds via the Ugi reaction as shown in Fig. 1. The reaction was verified via mass spectrometry (MS) (SI Appendix, Figs. S30–S125) and ¹H NMR (SI Appendix, Figs. S126–S135) of 96 selected compounds. After the completion of the reactions, organic solvents were removed and phosphate buffered saline (PBS) solution was added to the reaction system. The solution was heated up to 80 °C and then cooled to room temperature to form hydrogels, as shown in Fig. 2E. Structure–property relationships of the 81 hydrogels (Fig. 2 A–D and G) demonstrate that monomers A12, B7, C6, and D3 were the building blocks most likely to yield gel-forming structures (Fig. 2 A–D). We also studied the effect of potential parameters on hydrogel properties, such as the numbers of hydrogen bond acceptors (nHBAcc) and donors (nHBDon), the number of basic groups (nBase), and the Ghose–Crippen LogKow (ALogP), as shown in Fig. 2 H–K. The results demonstrated that compounds with lower nHBAcc, moderate nHBDon, no nBase, and higher ALogP had stronger abilities to generate hydrogels. However, these features are not enough to predict whether a compound with a new chemical structure can form a hydrogel.
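The size of the library follows directly from the monomer counts given above. The following is a minimal sketch of that enumeration, not the authors' code. The labels A1–A12, B1–B8, C1–C8, and D1–D3 are placeholders chosen to match the panel lettering of Fig. 2 and the monomer names A12, B7, C6, and D3; the actual structures are those in Fig. 1.

```python
from itertools import product

# Placeholder monomer labels (assumption: lettering follows Fig. 2 A-D).
fmoc_amino_acids = [f"A{i}" for i in range(1, 13)]  # 12 Fmoc-amino acids
amines = [f"B{i}" for i in range(1, 9)]              # 8 amines
carbonyls = [f"C{i}" for i in range(1, 9)]           # 8 aldehydes/ketones
isocyanides = [f"D{i}" for i in range(1, 4)]         # 3 isocyanides

# Each Ugi four-component product combines one monomer from every class.
library = list(product(fmoc_amino_acids, amines, carbonyls, isocyanides))

n_monomers = (len(fmoc_amino_acids) + len(amines)
              + len(carbonyls) + len(isocyanides))
print(n_monomers)    # total number of building blocks
print(len(library))  # number of four-component combinations
```

Under these assumptions, 31 building blocks give 12 × 8 × 8 × 3 = 2,304 combinations, which matches the library size reported in the text.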

Fig. 1.


Substrates for the construction of the hydrogel library.

Fig. 2.


From screening to the rational design of hydrogels. Statistical data of monomers Fmoc-amino acids (A), amines (B), aldehydes or ketones (C), and isocyanides (D) that formed gels. (E) Method for the preparation of hydrogels. (F) Design of machine learning methods. (G) Screening results of hydrogels (red, gel formed; gray, solution state). (H–K) Correlation between hydrogel percentages with nHBAcc (H), nHBDon (I), nBase (J), and ALogP (K).

Machine learning-based artificial intelligence (AI) has proved to be useful for the prediction of human perception by employing a large number of psychophysical datasets (26, 27). However, the prediction for the formation of hydrogels is still challenging. Herein, machine learning was employed to explore the relationship between the molecular skeleton and hydrogel properties for the guidance of designing hydrogels.

First, 2,304 separate chemical structures were produced based on the monomers by the Ugi reaction for the construction of a combinatorial library. Second, PaDEL-Descriptor (28) was employed for the calculation of molecular descriptors and fingerprints from the chemical library that contains all 2,304 structures; ∼7,163,136 effective structural parameters (3,109 structural parameters per molecule) were obtained according to the calculation.
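As a quick check on the figures quoted above, the total number of descriptor values is the product of the number of structures and the number of descriptors per structure. The symbols below are introduced here for illustration only.

```latex
N_{\text{values}}
  = N_{\text{structures}} \times N_{\text{descriptors}}
  = 2{,}304 \times 3{,}109
  = 7{,}163{,}136
```

In other words, the input to the learning step can be viewed as a 2,304 × 3,109 feature matrix, with one row per compound.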

Next, data points were processed with machine learning algorithms (Fig. 2F). We formulated our question as a binary classification problem: given the structural parameters for each chemical structure, predict whether a hydrogel can be formed or not. This problem is challenging because our data are highly imbalanced. Fewer than 4% (81/2,304) of the chemical structures can form hydrogels. To mitigate the class-imbalance problem, we introduced data resampling as a preprocessing step.
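To see why this imbalance matters, consider a trivial classifier that always predicts "no gel". The short sketch below uses only the counts reported above; the always-negative baseline is an illustrative assumption, not a model from the study.

```python
n_total = 2304  # compounds screened
n_gel = 81      # compounds that formed a hydrogel

positive_rate = n_gel / n_total

# Confusion counts for a baseline that labels every compound "no gel".
tp, fp = 0, 0
fn, tn = n_gel, n_total - n_gel

accuracy = (tp + tn) / n_total
recall = tp / (tp + fn)

print(f"positive rate: {positive_rate:.2%}")
print(f"baseline accuracy: {accuracy:.2%}")
print(f"baseline recall: {recall:.2%}")
```

The baseline reaches high accuracy while finding none of the gel-forming compounds, which is why accuracy alone is a poor guide here and why resampling and precision–recall analysis are useful.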

Data resampling is a common approach to handle imbalanced data. We used three common resampling approaches in our models: random oversampling (RO), synthetic minority oversampling technique (SMOTE), and adaptive synthetic sampling (ADASYN). RO is a naïve resampling approach which oversamples the minority class. The sampling strategy generates new samples by randomly sampling with replacement from the available samples. After RO resampling, certain data points are therefore duplicated multiple times. SMOTE is a synthetic sampling method which generates new samples from existing data points. For a given data point in the minority class, SMOTE generates a new sample by linear interpolation between the data point and one of its nearest neighbors from the same class (Fig. 3B). ADASYN is an extension of SMOTE that takes the local distribution of the minority class into account, generating more synthetic samples for minority points that have more majority-class neighbors.
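The difference between RO and SMOTE can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the implementation used in the study: the function names `random_oversample` and `smote_like` are hypothetical, the toy data are random, and ADASYN's adaptive weighting is not included.

```python
import numpy as np

def random_oversample(X_min, n_new, rng):
    """RO: draw n_new rows from the minority class with replacement."""
    idx = rng.integers(0, len(X_min), size=n_new)
    return X_min[idx]  # exact duplicates of existing rows

def smote_like(X_min, n_new, k, rng):
    """SMOTE-style: interpolate between a minority point and one of its
    k nearest minority-class neighbors."""
    # Pairwise Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbors[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + lam * (X_min[pick] - X_min[base])

rng = np.random.default_rng(0)
X_minority = rng.normal(size=(10, 3))  # toy stand-in for gel-forming rows

ro_samples = random_oversample(X_minority, 20, rng)
smote_samples = smote_like(X_minority, 20, k=3, rng=rng)
print(ro_samples.shape, smote_samples.shape)
```

RO can only repeat rows that already exist, whereas the SMOTE-style sampler produces new points on the line segments joining neighboring minority samples.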

Fig. 3.


Machine learning algorithms for gel prediction. (A) Example of random forest and gradient boosting algorithms. (B) Illustration of SMOTE and ADASYN oversampling algorithms. (C–E) PR curve and ROC curve calculated from the random forest model (C), the gradient boosting model (D), and the logistic regression model (E).

After data resampling, we applied an extensive list of classification algorithms to our data, from linear classifiers such as logistic regression to nonlinear classifiers such as a neural network. After tuning the hyperparameters for each algorithm, we found that three algorithms had the best prediction abilities (random forest, gradient boosting, and logistic regression; Fig. 3 A and C–E), with gradient boosting being superior to the other two. We illustrate the precision–recall (PR) curves and receiver operating characteristic (ROC) curves for the different methods in Fig. 3. As our data are highly imbalanced (fewer than 4% of the compounds can form hydrogels), we focused on precision and recall here. Precision is the fraction of predicted positive samples that are truly positive, while recall is the fraction of truly positive samples that are predicted as positive. Our methods achieve precisions of 54%, 57%, and 62% for random forest, logistic regression, and gradient boosting, respectively, at 50% recall. Moreover, feature importance was calculated and the top 20 descriptors were obtained from the best three machine learning algorithms (Fig. 4). The results indicated that the descriptors monomer1 (Fmoc-amino acids), SpMax1_Bhi (largest absolute of Burden modified eigenvalue), and SpMin1_Bhi (smallest absolute of Burden modified eigenvalue) contribute most to the formation of molecular hydrogels.
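The "precision at 50% recall" figure can be made concrete with a small worked example. The sketch below is illustrative only: the labels and scores are invented toy values, the helper names are hypothetical, and reading "at 50% recall" as "best precision among thresholds with recall of at least 50%" is an assumed convention.

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def precision_at_recall(y_true, scores, target_recall):
    """Best precision over score thresholds whose recall >= target_recall."""
    best = 0.0
    for threshold in sorted(set(scores)):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        precision, recall = precision_recall(y_true, y_pred)
        if recall >= target_recall:
            best = max(best, precision)
    return best

# Toy data: 1 = forms a gel, 0 = stays in solution (invented values).
y_true = [1, 0, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1]

print(precision_recall([1, 0, 1, 0], [1, 1, 0, 0]))
print(precision_at_recall(y_true, scores, 0.5))
```

Lowering the threshold recovers more true gels (higher recall) at the cost of more false positives (lower precision); the PR curves in Fig. 3 trace this trade-off for each model.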

Fig. 4.


Feature importance (top-20 descriptors) from machine learning algorithms for gel prediction. (A–C) Top 20 parameters related to gel formation calculated from random forest algorithm (A), gradient boosting algorithm (B), and logistic regression (C).

Hydrogels with diversified functional groups can exhibit different mechanical properties, which is important for controlled drug release and tissue engineering (29–32). We selected typical hydrogels from our algorithm with good temperature-responsive properties to study their rheological properties. As shown in Fig. 5A (also see SI Appendix, Figs. S1–S11), frequency-dependent oscillatory rheology (γ0 = 0.5%, 0.1 to 100 rad s⁻¹) confirmed the hydrogel-like behavior of the selected samples, with G′ dominant over the whole frequency range. Meanwhile, different chemical structures showed a variety of rheological properties. Hydrogels 19/PBS, 10/PBS, 21/PBS, 20/PBS, and 64/PBS were selected to represent a gradual increase in G′ and G″ values. They displayed different values of the storage and loss moduli under oscillatory shear (G′ and G″). These results reflect differences in the hardness and elasticity of the hydrogels, demonstrating that peptidelike molecules with multiple functional groups can differ in rheological behavior. Meanwhile, the mechanical properties (such as elasticity and viscosity) of substrates can influence the morphology, proliferation, and differentiation of stem cells. The increase of G′ (elastic modulus) and G″ (viscous modulus) from compounds 19 to 64 demonstrated that these hydrogels cover a wide range of mechanical properties with potential application in stem cell research (33–35). These results also indicated that a series of hydrogels with different rheological behaviors could be readily obtained via a combinatorial approach.

Fig. 5.


(A) Frequency-dependent (γ0 = 0.5%, 25 °C) oscillatory shear rheology (Insets: photographs of hydrogels and chemical structures of compounds 19, 10, 21, 20, and 64, from left to right). (Magnification: 5×.) (B) TEM pictures of compound 19/PBS, 10/PBS, 21/PBS, 20/PBS, and 64/PBS hydrogels (from left to right). (Scale bar: 1 μm.)

Since the microstructure of hydrogels can influence their rheological behavior, TEM experiments were performed to characterize their morphology. As shown in Fig. 5B, these compounds in PBS solution exhibited an entangled fibrous network, which is ascribed to the supramolecular self-assembly of these compounds in PBS solution, leading to the formation of hydrogels. Interestingly, consistent with G′ and G″, it was easily observed that the density of nanofibers gradually increased from the 19/PBS hydrogel to the 10/PBS and 21/PBS hydrogels. Meanwhile, in corroboration with their rheological behavior, an increase in the density and the diameter of nanofiber was observed from the 21/PBS hydrogel to the 20/PBS and 64/PBS hydrogels. These results suggest that compounds with different functional groups exhibit differential self-assembly abilities and differentiated morphologies, which in turn leads to their distinct rheological properties.

Finally, we tested the ability of hydrogels to support the culture of TM4 cells, an adherent mouse Sertoli cell line with epithelial cell morphology. We labeled the cell body with CellTracker Green, and the cell nucleus with Hoechst 33342, to visualize the potential changes in cell morphology. As shown in Fig. 6, at day 1 after seeding, a subpopulation of cells in hydrogel 10- and 79-coated dishes demonstrated classical epithelial morphologies, whereas another subpopulation formed small cell clusters. Both hydrogels 10 and 79 support the proliferation of cells as indicated by the increase in cell number from day 1 to day 3 after seeding, suggesting the biocompatibility of these hydrogels.

Fig. 6.


Morphologies of cells cultured on AI-designed hydrogels. Representative images showing the morphologies of TM4 cells on indicated hydrogels at day 1 (Left) and day 3 (Right) after seeding. Uncoated glass-bottom dish was used as a control. The cell body was stained with CellTracker Green 5-chloromethylfluorescein diacetate (CMFDA, green) and the nucleus was stained with Hoechst 33342 (blue). (Magnification: 100×.)

In conclusion, we have used a combinatorial approach to establish a chemical library and a screen for hydrogel behavior. This approach is highly efficient, allowing high-throughput design and prediction to obtain hydrogels with novel chemical structures and controlled physical properties. We have developed a machine learning approach to study the correlation between the chemical features of the peptidelike molecules and their ability to form hydrogels. The machine learning revealed that the structure descriptors based on quantum chemistry exhibit a high correlation with gelling behavior. Importantly, we further showed that the hydrogels designed by this approach can be used in biomedical applications such as cell culture. We envision that our combinatorial approach and machine learning method can be used as design and prediction tools for peptide hydrogels with controlled physical properties for biomedical applications, such as drug delivery and tissue engineering.

Materials and Methods

Materials and methods are detailed in SI Appendix.


Acknowledgments

This project was supported by start-up funds from Ming Wai Lau Centre for Reparative Medicine, Karolinska Institutet; the National Natural Science Foundation of China (Grant 81771639); the Research Grants Council of Hong Kong (Grants 14127316 and 14129316); and start-up funds from the Lo Kwee Seong Foundation.

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1903376116/-/DCSupplemental.

References

1. Balakrishnan B., Banerjee R., Biopolymer-based hydrogels for cartilage tissue engineering. Chem. Rev. 111, 4453–4474 (2011).
2. Brown T. E., Anseth K. S., Spatiotemporal hydrogel biomaterials for regenerative medicine. Chem. Soc. Rev. 46, 6532–6552 (2017).
3. Choi M., et al., Light-guiding hydrogels for cell-based sensing and optogenetic synthesis in vivo. Nat. Photonics 7, 987–994 (2013).
4. Li J., Mooney D. J., Designing hydrogels for controlled drug delivery. Nat. Rev. Mater. 1, 16071 (2016).
5. Li L., Yan B., Yang J., Chen L., Zeng H., Novel mussel-inspired injectable self-healing hydrogel with anti-biofouling property. Adv. Mater. 27, 1294–1299 (2015).
6. Li Y., Rodrigues J., Tomás H., Injectable and biodegradable hydrogels: Gelation, biodegradation and biomedical applications. Chem. Soc. Rev. 41, 2193–2221 (2012).
7. Yu L., Ding J., Injectable hydrogels as unique biomedical materials. Chem. Soc. Rev. 37, 1473–1481 (2008).
8. Banwell E. F., et al., Rational design and application of responsive alpha-helical peptide hydrogels. Nat. Mater. 8, 596–600 (2009).
9. Kisiday J., et al., Self-assembling peptide hydrogel fosters chondrocyte extracellular matrix production and cell division: Implications for cartilage tissue repair. Proc. Natl. Acad. Sci. U.S.A. 99, 9996–10001 (2002).
10. Luo Z., et al., A powerful CD8+ T-cell stimulating D-tetra-peptide hydrogel as a very promising vaccine adjuvant. Adv. Mater. 29, 1601776 (2017).
11. Lutolf M. P., Hubbell J. A., Synthetic biomaterials as instructive extracellular microenvironments for morphogenesis in tissue engineering. Nat. Biotechnol. 23, 47–55 (2005).
12. Madl C. M., et al., Maintenance of neural progenitor cell stemness in 3D hydrogels requires matrix remodelling. Nat. Mater. 16, 1233–1242 (2017).
13. Moore A. N., Hartgerink J. D., Self-assembling multidomain peptide nanofibers for delivery of bioactive molecules and tissue regeneration. Acc. Chem. Res. 50, 714–722 (2017).
14. Yan C., Pochan D. J., Rheological properties of peptide-based hydrogels for biomedical and other applications. Chem. Soc. Rev. 39, 3528–3540 (2010).
15. Frederix P. W., et al., Exploring the sequence space for (tri-)peptide self-assembly to design and discover new hydrogels. Nat. Chem. 7, 30–37 (2015).
16. Weiss R. G., The past, present, and future of molecular gels. What is the status of the field, and where is it going? J. Am. Chem. Soc. 136, 7519–7530 (2014).
17. Lee O. S., Cho V., Schatz G. C., Modeling the self-assembly of peptide amphiphiles into fibers using coarse-grained molecular dynamics. Nano Lett. 12, 4907–4913 (2012).
18. Lee O. S., Stupp S. I., Schatz G. C., Atomistic molecular dynamics simulations of peptide amphiphile self-assembly into cylindrical nanofibers. J. Am. Chem. Soc. 133, 3677–3683 (2011).
19. Frederix P. W., Ulijn R. V., Hunt N. T., Tuttle T., Virtual screening for dipeptide aggregation: Toward predictive tools for peptide self-assembly. J. Phys. Chem. Lett. 2, 2380–2384 (2011).
20. Du X., Zhou J., Shi J., Xu B., Supramolecular hydrogelators and hydrogels: From soft matter to molecular biomaterials. Chem. Rev. 115, 13165–13307 (2015).
21. Zou R., et al., Peptide self-assembly triggered by metal ions. Chem. Soc. Rev. 44, 5200–5219 (2015).
22. Deo R. C., Machine learning in medicine. Circulation 132, 1920–1930 (2015).
23. Ehteshami Bejnordi B., et al.; The CAMELYON16 Consortium, Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).
24. Kermany D. S., et al., Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172, 1122–1131.e9 (2018).
25. Nagasawa S., Al-Naamani E., Saeki A., Computer-aided screening of conjugated polymers for organic solar cell: Classification by random forest. J. Phys. Chem. Lett. 9, 2639–2646 (2018).
26. Keller A., et al.; DREAM Olfaction Prediction Consortium, Predicting human olfactory perception from chemical features of odor molecules. Science 355, 820–826 (2017).
27. Poivet E., et al., Functional odor classification through a medicinal chemistry approach. Sci. Adv. 4, eaao6086 (2018).
28. Yap C. W., PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 32, 1466–1474 (2011).
29. Burdick J. A., Murphy W. L., Moving from static to dynamic complexity in hydrogel design. Nat. Commun. 3, 1269 (2012).
30. Camci-Unal G., Annabi N., Dokmeci M. R., Liao R., Khademhosseini A., Hydrogels for cardiac tissue engineering. NPG Asia Mater. 6, e99 (2014).
31. Hoare T. R., Kohane D. S., Hydrogels in drug delivery: Progress and challenges. Polymer 49, 1993–2007 (2008).
32. Seliktar D., Designing cell-compatible hydrogels for biomedical applications. Science 336, 1124–1128 (2012).
33. Banerjee A., et al., The influence of hydrogel modulus on the proliferation and differentiation of encapsulated neural stem cells. Biomaterials 30, 4695–4699 (2009).
34. Charrier E. E., Pogoda K., Wells R. G., Janmey P. A., Control of cell morphology and differentiation by substrates with independently tunable elasticity and viscous dissipation. Nat. Commun. 9, 449 (2018).
35. Engler A. J., Sen S., Sweeney H. L., Discher D. E., Matrix elasticity directs stem cell lineage specification. Cell 126, 677–689 (2006).
