Comput Methods Programs Biomed. Author manuscript; available in PMC: 2017 Aug 1.
Published in final edited form as: Comput Methods Programs Biomed. 2016 Apr 28;132:93–103. doi: 10.1016/j.cmpb.2016.04.025

A review of the applications of data mining and machine learning for the prediction of biomedical properties of nanoparticles

David E Jones 1, Hamidreza Ghandehari 2,3, Julio C Facelli 1,3,*
PMCID: PMC4902872  NIHMSID: NIHMS782520  PMID: 27282231

Abstract

This article presents a comprehensive review of applications of data mining and machine learning for the prediction of biomedical properties of nanoparticles of medical interest. The papers reviewed here present results of research using these techniques to predict the biological fate and properties of a variety of nanoparticles relevant to their biomedical applications. These include the influence of particle physicochemical properties on cellular uptake, cytotoxicity, molecular loading, and molecular release, in addition to manufacturing properties such as nanoparticle size and polydispersity. Overall, the results are encouraging and suggest that, as more systematic data on nanoparticles become available, machine learning and data mining will become a powerful aid in the design of nanoparticles for biomedical applications. However, the great heterogeneity of nanoparticles poses a challenge that will make such discoveries more difficult than in traditional small-molecule drug design.

Keywords: Nanomedicine, Nanoinformatics, Data Mining, Machine Learning

Introduction

The field of nanomedicine, which focuses on the use of nanoparticles and nanotechnology in the biomedical domain, is becoming an increasingly active area of research. To date, only a few nanoparticle systems have received FDA approval (1, 2), and there is great interest in accelerating the translation of nanoscience discoveries from the bench into clinical practice. While nanomedicine research has well-developed experimental protocols, the corresponding informatics support is less developed, and there is a substantial lack of authoritative sources of information accessible to non-informatics specialists (3). Increasing the use of QSAR (quantitative structure-activity relationship) methods in nanomedicine, also known as "nano-QSARs" (4, 5), and of other predictive models can greatly accelerate the translational process (4, 6, 7). However, information about existing nano-QSAR approaches is not readily available to non-informatics specialists interested in nanomedicine. The goals of this review are to: (1) review research involving the use of data mining and machine learning for the prediction of biomedical properties of nanoparticles of medical interest, and (2) examine the progress and challenges that this relatively new field of nano-QSAR methodology faces in becoming a major contributor to the development of effective nanomedicines.

A comprehensive search of the existing literature in the field of nanomedicine referencing the use of data mining and/or machine learning techniques was conducted in the fall of 2015. Both Scopus and PubMed were searched using the query "nanomedicine AND ((data mining) OR (machine learning))." The articles initially retrieved were reviewed to assess their content and to gather additional references from related publications. The methods and results reported in these publications are discussed in the following sections, and the discussion section presents the authors' perspective on the successes and remaining challenges of using artificial intelligence and data mining to predict biomedical properties of nanoparticles of medical interest.

The research papers covered in this review apply data mining and machine learning to nanoinformatics with the goal of developing predictive models for a variety of nanoparticle properties and their biological effects. The material is divided into two main sections: one discusses papers in which biomedical effects of nanoparticles are predicted from nanoparticle properties and experimental conditions, and the other discusses papers that use these techniques to predict actual molecular or aggregate properties of nanomaterials from their composition and processing.

Predicted Properties

In this section, the material is organized into subsections covering the following properties: cellular uptake, cytotoxicity, molecular loading, molecular release, nanoparticle adherence, nanoparticle size, and polydispersity.

Cellular Uptake

Significant efforts have been made in the field of nanomedicine to understand and improve cellular uptake and targeting. This is primarily driven by the desire to use nanoparticles to treat cancer by delivering biologically active compounds specifically to cancerous cells (8-11). Predictive models and nano-QSARs could be very beneficial in this area because the cost of developing novel nanoparticles with the desired properties is high and the design space is large. Any computational tool that narrows the design space by quantitatively predicting the characteristics of candidate molecules before synthesis would allow researchers to dedicate limited resources to experimental work on the most promising candidates.

Two papers have reported the use of data mining and machine learning to predict cellular uptake of nanoparticles. Both papers examined the cellular uptake data of cross-linked iron oxide (CLIO) nanoparticles from the paper by Weissleder (10).

Fourches et al. (6) developed a method for predicting the uptake of CLIO nanoparticles, whose surfaces were decorated with a variety of small organic molecules, by human pancreatic cancer cells (PaCa2) as a function of the nanoparticle properties. For the 109 organic compounds in their study, they calculated 150 two-dimensional MOE descriptors (using commercial software distributed by the Chemical Computing Group), including surface areas, physical properties, Kier & Hall connectivity indices, kappa shape indices, atom and bond counts, adjacency and distance matrix descriptors, molecular charges, and pharmacophore feature descriptors. As the prediction algorithm, they used k-nearest neighbors (kNN) regression validated by 5-fold cross-validation. The central idea of kNN is that the activity of a compound can be predicted as the average activity of the k compounds in the dataset that are most chemically similar to it (12, 13). Their initial model achieved an R2 value of 0.72; when they applied an applicability domain criterion and removed compounds falling outside it, the R2 value improved to 0.77. The most important features in this model were associated with the lipophilicity of the compounds. In both models, the more lipophilic the molecule bound to the CLIO nanoparticle, the greater the cellular uptake.
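The kNN regression and applicability-domain filtering described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance on already-scaled descriptor vectors and a hypothetical domain cutoff equal to the mean plus z standard deviations of the training compounds' average distance to their k nearest neighbors (z = 0.5 is an arbitrary choice). All function names are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Predict the query's activity as the mean activity of its k nearest
    training compounds; also return their mean distance to the query."""
    nearest = sorted((euclidean(x, query), y) for x, y in zip(train_X, train_y))[:k]
    return mean(y for _, y in nearest), mean(d for d, _ in nearest)


def ad_cutoff(train_X, k, z=0.5):
    """Hypothetical applicability-domain threshold: mean + z * std of each
    training compound's average distance to its k nearest other compounds."""
    avg_dists = []
    for i, x in enumerate(train_X):
        others = sorted(euclidean(x, o) for j, o in enumerate(train_X) if j != i)
        avg_dists.append(mean(others[:k]))
    return mean(avg_dists) + z * pstdev(avg_dists)


def r_squared(observed, predicted):
    m = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def cross_validate(X, y, k=3, folds=5, use_ad=False, seed=0):
    """k-fold cross-validated R^2. With use_ad=True, test compounds outside
    the applicability domain are skipped. Returns (R^2, number predicted)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    observed, predicted = [], []
    for f in range(folds):
        test = set(idx[f::folds])
        train = [i for i in idx if i not in test]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        cutoff = ad_cutoff(tX, k) if use_ad else math.inf
        for i in sorted(test):
            pred, dist = knn_predict(tX, ty, X[i], k)
            if dist <= cutoff:  # keep only compounds inside the domain
                observed.append(y[i])
                predicted.append(pred)
    return r_squared(observed, predicted), len(observed)
```

Calling `cross_validate(X, y, use_ad=True)` reports R2 only over the compounds inside the domain, together with how many were kept, so any gain in R2 obtained this way comes at the cost of coverage.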

Winkler et al. (14) examined the ability of machine learning techniques to predict the uptake of CLIO nanoparticles, also decorated with a variety of small molecules, by human umbilical vein endothelial cells (HUVEC) and PaCa2 cells. The dataset consisted of 108 samples, split into a training set of 87 samples and a test set of 21 samples. Two-dimensional DRAGON descriptors were calculated for the decorated CLIO nanoparticles (15), and two nano-QSAR models were developed, one linear and one nonlinear. The sparse linear model was developed with a multiple linear regression algorithm combined with an expectation maximization algorithm using a sparse (Laplacian) prior (16). The sparse nonlinear model was developed with a Bayesian regularized neural network with either a Gaussian or a Laplacian prior (17-19). The linear and nonlinear models for HUVEC uptake used 11 of the DRAGON descriptors and yielded R2 values of 0.63 and 0.66, respectively. The linear and nonlinear models for PaCa2 uptake used 19 of the DRAGON descriptors and yielded R2 values of 0.79 and 0.54, respectively (see Fig. 1).
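A Laplacian prior is what makes the linear model sparse: with Gaussian noise, the maximum a posteriori estimate under a Laplace prior on the coefficients is the L1-penalized (lasso) least-squares solution, which sets many descriptor weights exactly to zero. The sketch below solves that penalized problem by cyclic coordinate descent. It illustrates the sparsity mechanism only; it is not the expectation maximization procedure used by Winkler et al., and the penalty `lam` is a hypothetical tuning parameter.

```python
def soft_threshold(rho, lam):
    """Proximal operator of the L1 penalty: shrink rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - b0 - Xw||^2 + lam * ||w||_1 by cyclic coordinate
    descent. Returns (intercept, weights); many weights end up exactly 0."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred descriptors
    resid = [v - y_mean for v in y]  # residual for w = 0
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in Xc]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # a constant descriptor carries no information
            # correlation of descriptor j with the residual, adding back its own contribution
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new_wj = soft_threshold(rho, lam) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                w[j] = new_wj
    intercept = y_mean - sum(wj * m for wj, m in zip(w, x_mean))
    return intercept, w
```

Raising `lam` drives more weights to exactly zero, which is how a sparse prior selects a small subset (for example 11 or 19) out of a much larger pool of descriptors.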

Figure 1.

Predicted vs. measured log nanoparticle uptake for PaCa2 cells. Training set (circles), test set (triangles). From reference (14), with permission.

The authors reported little or no overlap between the sets of DRAGON descriptors used in the HUVEC and PaCa2 uptake models and concluded that the two cell types may use different mechanisms of cellular uptake (20). They also suggested that the poor results observed for macrophage and macrophage-like cells may be due to either the small size or the surface modifications of the nanoparticles used in the study.

Both studies, Fourches et al. (6) and Winkler et al. (14), reported similar performance for their best PaCa2 uptake models (R2 = 0.77 vs. 0.79), even though they used different predictive methods and molecular descriptors. Most likely this is because both drew on the same dataset of fluorescent nanoparticles from the study by Weissleder (10) (109 compounds in Fourches et al. and 108 samples in Winkler et al.).

Cytotoxicity

Predicting the cytotoxicity of nanoparticles has been the most common application of data mining and machine learning in nanoinformatics. Cytotoxicity to non-target cells is a major concern in nanomedicine (21, 22), because the use of nanoparticles for human treatment is contingent upon low cytotoxicity of the carriers at the required therapeutic doses. Toxicity is also a serious concern for nanoparticles used in consumer products because of their potential environmental impact (23). The ability to predict cytotoxicity in silico is highly desirable because toxicity prescreening during nanomaterial design could have a high payoff: limited development resources could be shifted toward the synthesis and testing of nanoparticles with low predicted cytotoxicity (24), which are more likely to be suitable for human treatment or consumption.

Experimentally, cytotoxicity can be measured by a number of in vitro toxicity assays that can infer cytotoxicity by examining different cellular parameters including, but not limited to, oxidative stress, inflammatory response, genotoxicity, and cell viability (25). The articles reviewed in this area of data mining and machine learning research report results on several different nanoparticle types, cell types, and cytotoxicity analysis methods. A summary of the systems studied, methods, and findings from these research articles is given in Table 1.

Table 1.

Summary of the systems studied, methods, and findings from papers using data mining and machine learning to predict cytotoxicity of nanoparticles. The evaluation results correspond to the most successful model reported in the publication.

First Author | Nanoparticle Type | Cell Type / Organism | Cytotoxicity Analysis | Predictive Method | Evaluation Result
Sayes (26) | Metal oxide nanoparticles (TiO2) | Rat lung alveolar macrophages and immortalized rat L2 lung epithelial cells | LDH release | LDA classification | R2 = 0.77
Puzyn (5) | Metal oxide nanoparticles (ZnO, CuO, V2O3, Y2O3, Bi2O3, In2O3, Sb2O3, Al2O3, Fe2O3, SiO2, ZrO2, SnO2, TiO2, CoO, NiO, Cr2O3, and La2O3) | E. coli | EC50 | Multiple regression method | R2 = 0.85
Liu (27) | Metal oxide nanoparticles (Al2O3, CeO2, Co3O4, TiO2, ZnO, CuO, SiO2, Fe3O4, and WO3) | BEAS-2B | Plasma membrane integrity | Logistic regression models | Accuracy = 100%
Horev-Azaria (28) | Metal oxide nanoparticles (CoFe2O4) | A549, NCI H441, HepG2, MDCK, Caco-2 TC7, TK6, and primary mouse dendritic cells | Cell viability | J48 | Accuracy = 92.5%
Winkler (14) | Metal oxide nanoparticles (CLIO) | Endothelial cells, smooth muscle cells, hepatocytes, and monocytes | Smooth muscle apoptosis | Bayesian neural networks | R2 = 0.90
Toropova (29) | Metal oxide nanoparticles | E. coli | pLC50 | Monte Carlo method | R2 = 0.9835
Fourches (6) | Metal nanoparticles (CLIO, pseudocaged, monocrystalline iron oxide, CdSe core quantum dots, and iron-based) | Monocytes, hepatocytes, endothelial cells, and smooth muscle cells | Biological activity profiles | Support vector machine-based classification | Accuracy = 88%
Liu (30) | Metal nanoparticles, dendrimers, metal oxides, and polymeric materials | Embryonic zebrafish | 24-hour post-fertilization mortality | IBK | Accuracy = 83.7%
Jones (24) | PAMAM dendrimers | Caco-2 | Cell viability | J48 | Accuracy = 83.5%

Many of the articles in Table 1 report cytotoxicity predictions for metal oxide nanoparticles. Sayes and Ivanov (26) used linear discriminant analysis (LDA) classification (31) and multivariate linear regression to predict lactate dehydrogenase (LDH) release from rat lung alveolar macrophages and immortalized rat L2 lung epithelial cells exposed to titanium dioxide (TiO2) and zinc oxide (ZnO) nanoparticles. LDH release is indicative of cell membrane damage and, ultimately, cell death. The TiO2 nanoparticles were characterized by five experimentally measured physicochemical properties that were used as feature descriptors in the models: engineered size, size in water, size in PBS, concentration, and zeta potential. The ZnO nanoparticles were characterized by six experimentally measured physicochemical properties: engineered size, size in water, size in PBS, size in CCM, concentration, and zeta potential. For both TiO2 and ZnO nanoparticles, all possible combinations of descriptors were analyzed in the predictive models. The dataset consisted of 42 samples: 24 TiO2 samples at concentrations ranging from 25 to 200 mg/L and 18 ZnO samples at concentrations ranging from 25 to 100 mg/L; the TiO2 and ZnO sample sets were analyzed independently. The multivariate linear regression analysis of the TiO2 nanoparticles yielded R2 values ranging from 0.15 to 0.70, with the highest performance observed when all descriptors were used, which may be an indication of overfitting. The LDA analysis of the TiO2 nanoparticles yielded R2 values ranging from 0.70 to 0.77. Because the different size measurements for ZnO were correlated, only combinations of engineered size, concentration, and zeta potential were examined in the multivariate linear regression models for ZnO. This analysis yielded R2 values ranging from 0.19 to 0.49, again with the best model obtained when all of these descriptors were used, leading the authors to conclude that their dataset did not contain enough data to accurately predict LDH release for ZnO. The authors also acknowledged that other features, absent from their dataset, might be needed to obtain better models for ZnO. This highlights the need for larger, well-curated datasets to better understand the real limitations of nano-QSAR methods.
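The exhaustive search described above, fitting one multivariate linear regression per non-empty subset of descriptors and recording its R2, can be sketched as follows. The descriptor names and data are hypothetical, and the code reports training-set R2 only. That quantity can never be lower for the full descriptor set than for any of its subsets, which is why the best score coming from "all descriptors" is a warning sign of overfitting rather than evidence of a better model. Strongly correlated descriptors, such as the ZnO size measurements, also make the normal equations ill-conditioned (singular in the exactly collinear case).

```python
from itertools import combinations


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols_r2(X, y):
    """Fit y = b0 + X b by ordinary least squares (normal equations); return training R^2."""
    rows = [[1.0] + list(r) for r in X]  # prepend intercept column
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def all_subsets(data, names, y):
    """Training R^2 for every non-empty combination of descriptor columns."""
    results = {}
    for k in range(1, len(names) + 1):
        for subset in combinations(range(len(names)), k):
            X = [[row[j] for j in subset] for row in data]
            results[tuple(names[j] for j in subset)] = ols_r2(X, y)
    return results
```

With three descriptors this fits seven models; judging them fairly requires held-out data (for example cross-validation) rather than the training R2 returned here.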

Puzyn et al. (5) predicted cytotoxicity to Escherichia coli (E. coli), expressed as the effective concentration that reduces bacterial viability by 50% (EC50), for 17 different metal oxide nanoparticles: ZnO, CuO, V2O3, Y2O3, Bi2O3, In2O3, Sb2O3, Al2O3, Fe2O3, SiO2, ZrO2, SnO2, TiO2, CoO, NiO, Cr2O3, and La2O3. The MOPAC 2009 software package was used to calculate 12 molecular descriptors of the metal oxide nanoparticles: standard heat of formation of the oxide cluster, total energy of the oxide cluster, electronic energy of the oxide cluster, core-core repulsion energy of the oxide cluster, calculated area of the oxide cluster, calculated volume of the oxide cluster, energy of the highest occupied molecular orbital (HOMO) of the oxide cluster, energy of the lowest unoccupied molecular orbital (LUMO) of the oxide cluster, HOMO-LUMO energy difference, enthalpy of detachment of metal cations Me^n+ from the cluster surface, enthalpy of formation of a gaseous cation, and lattice energy of the oxide. Multiple regression was combined with a genetic algorithm to find the best model for predicting cytotoxicity (32). The genetic algorithm, which selected the best combination of calculated descriptors, found that the enthalpy of formation of a gaseous cation having the same oxidation state as in the metal oxide structure, ΔHMe+, is the best descriptor. The regression using this descriptor reached an R2 value of 0.85, with an external validation coefficient, Q2ext, of 0.83 and an RMS error of 0.19. The authors concluded that their model can be used to predict the toxicity of novel, untested metal oxide nanoparticles; however, this applies only if their structures are not significantly different from those of the metal oxide nanoparticles in the training set, which limits the generalizability of the approach.
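The three statistics quoted for the single-descriptor model can be computed as in the following sketch. It assumes the common definition of Q2ext in which the external-set residuals are compared against the training-set mean (other variants use the external-set mean), and it computes the RMS error on the external set, which is also an assumption. The descriptor values are placeholders, not the published ΔHMe+ data.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def determination(observed, predicted, reference_mean):
    """1 - SS_res / SS_tot, with SS_tot taken around a chosen reference mean."""
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - reference_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def evaluate(train_x, train_y, test_x, test_y):
    a, b = fit_line(train_x, train_y)
    pred_train = [a + b * v for v in train_x]
    pred_test = [a + b * v for v in test_x]
    train_mean = sum(train_y) / len(train_y)
    return {
        "R2": determination(train_y, pred_train, train_mean),
        # external set scored against the training mean (assumed Q2ext variant)
        "Q2ext": determination(test_y, pred_test, train_mean),
        "RMSE_ext": math.sqrt(
            sum((o - p) ** 2 for o, p in zip(test_y, pred_test)) / len(test_y)),
    }
```

A Q2ext close to R2, as reported (0.83 vs. 0.85), indicates that the fit did not degrade much on compounds the model had not seen.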

Using logistic regression models, Liu et al. (27) classified cytotoxicity, assessed through plasma membrane integrity, in transformed bronchial epithelial cells (BEAS-2B) exposed to nine different metal oxide nanoparticles (Al2O3, CeO2, Co3O4, TiO2, ZnO, CuO, SiO2, Fe3O4, and WO3). For the development of the model, a set of 10 nanoparticle descriptors was selected and measured experimentally. These include simple constitutional descriptors (number of oxygen atoms in the metal oxide, number of metal atoms in the metal oxide, metal oxide molecular weight, and atomic mass of the metal); stability and reactivity information (atomization energy); element group properties (periodic table group and period of the metal in the metal oxide); a simple geometric descriptor (nanoparticle primary size); and indicators of surface charge and aggregation tendency (zeta potential and isoelectric point). Experimental conditions were taken into account by adding measured values for a set of four different concentrations as model inputs. The paper does not explicitly state the number of samples used, but it appears that 83 samples were used. All possible combinations of the descriptors and concentrations were analyzed for the nano-QSAR models, which achieved accuracies ranging from 93 to 100%. The best-performing model used four descriptors: the atomization energy of the metal oxide, nanoparticle size, nanoparticle volume fraction, and period of the metal in the nanoparticle. The authors observed that atomization energy made the greatest contribution to the model, with toxicity increasing as the atomization energy decreases, and argued that this could be explained by the lower stability and higher reactivity of such metal oxide nanoparticles. Although encouraged by their results, the authors stated that the experimental dataset must be expanded to increase confidence in and improve the reliability of their results, since the high accuracies reported may be a consequence of overfitting or of a lack of diversity in the reference data.
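Logistic regression models the probability that a nanoparticle is toxic as a sigmoid of a weighted sum of descriptors, and a threshold on that probability turns it into a toxic/non-toxic classifier. The minimal sketch below fits such a model by batch gradient descent on the log-loss. The single standardized descriptor and the data are hypothetical, chosen only so that lower values are associated with toxicity, as reported for atomization energy; the learning rate and epoch count are arbitrary.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit P(toxic | x) = sigmoid(b + w.x) by batch gradient descent on the
    log-loss. y holds 1 (toxic) or 0 (non-toxic)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def classify(w, b, x, threshold=0.5):
    """Return 1 (toxic) when the predicted probability reaches the threshold."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold else 0
```

A negative fitted weight encodes the reported trend: as the descriptor decreases, the predicted probability of toxicity rises.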

Horev-Azaria et al. (28) used a J48 classification model to predict the cytotoxicity of cobalt ferrite nanoparticles, measured as cell viability and treated as a binary toxic/non-toxic outcome, on seven different cell lines (A549, NCI H441, HepG2, MDCK, Caco-2 TC7, TK6, and primary mouse dendritic cells) and on precision-cut rat lung slices. J48 is Weka's implementation of the C4.5 decision tree algorithm (33). The paper does not explicitly state the number of samples used, but it appears that 151 samples were used. The model was evaluated by ten-fold cross-validation for three decreasing cell viability values: 30%, 25%, and 20%. Its accuracy reached 92.5%, 89%, and 85.2% for the 30%, 25%, and 20% values, respectively. The J48 decision tree shows that the most important descriptor for the predictions was the concentration of cobalt ferrite nanoparticles. Two experimental conditions, cell type and exposure time, also appeared in the decision tree, but no intrinsic nanoparticle properties were found to be important in the model. The authors noted that their study is restricted to a specific type of nanoparticle and to the cell lines used, and that making the model more generalizable would require a significantly larger database of different nanoparticles and cell lines.
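C4.5, and hence J48, grows a tree by repeatedly choosing the attribute and threshold that best separate the classes, scored by the gain ratio (information gain divided by the entropy of the split itself). The sketch below evaluates that criterion for the root split over numeric attributes only. It omits C4.5's pruning, missing-value handling, and other refinements, and the attribute names and values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(labels, left, right):
    """Information gain of a binary split divided by the split's own entropy."""
    n = len(labels)
    gain = entropy(labels) - len(left) / n * entropy(left) - len(right) / n * entropy(right)
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in (left, right) if s)
    return gain / split_info if split_info > 0 else 0.0


def best_split(rows, labels, names):
    """Scan every numeric attribute and every midpoint between consecutive
    distinct values; return (gain ratio, attribute name, threshold)."""
    best = (0.0, None, None)
    for j, name in enumerate(names):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [lab for r, lab in zip(rows, labels) if r[j] <= threshold]
            right = [lab for r, lab in zip(rows, labels) if r[j] > threshold]
            score = gain_ratio(labels, left, right)
            if score > best[0]:
                best = (score, name, threshold)
    return best
```

Applied recursively to each branch, this choice is what places an attribute such as concentration at the top of a tree when it separates toxic from non-toxic samples most cleanly.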

Winkler et al. (14) used Bayesian neural networks and multiple linear regression models to predict smooth muscle cell apoptosis caused by 50 different CLIO nanoparticles. The data covered four cell types (endothelial cells, smooth muscle cells, hepatocytes, and monocytes), four biological assays for toxicity, and four nanoparticle concentrations, yielding 3,200 samples (50 nanoparticles × 4 cell types × 4 assays × 4 concentrations). The two model types are the same as those described above in the Cellular Uptake section. The linear and nonlinear models achieved R2 values of 0.86 and 0.90, respectively. The authors observed that nanoparticle core material, surface coating type, and surface charge were the most important features for accurately predicting the smooth muscle apoptosis caused by CLIO nanoparticles.

Using a Monte Carlo method, Toropova et al. (29) built a nano-QSAR model to predict pLC50 values induced by metal oxide nanoparticles in E. coli, where pLC50 is the negative decimal logarithm of the nanoparticle concentration that kills 50% of the original bacterial population. The paper does not explicitly state the number of samples used. The authors used quasi-SMILES as the descriptors for the model. The data were split into six different datasets, each used as a training, calibration, or test set. Their models yielded R2 values ranging from 0.73 to 0.98. The authors state that the way data are distributed into training, calibration, and validation sets significantly influences their results, and suggest that data should instead be distributed into training and external validation sets to improve the reliability of the study.
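One common way to turn quasi-SMILES into a regression model is to give every symbol a numerical weight, define the descriptor of a sample as the sum of the weights of its symbols, and use Monte Carlo moves to adjust the weights so that this descriptor correlates with the endpoint on the training set. The sketch below follows that idea under heavy simplification: the tokens and endpoint values are hypothetical, the target function is just the training-set Pearson correlation, and the calibration and validation sets discussed by the authors are omitted. It is not Toropova et al.'s exact procedure.

```python
import random
from statistics import mean


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def descriptor(tokens, weights):
    """Sum of symbol weights; symbols never seen in training count as 0."""
    return sum(weights.get(t, 0.0) for t in tokens)


def monte_carlo_weights(train, n_steps=2000, step=0.5, seed=0):
    """train: list of (tokens, endpoint) pairs. Each step perturbs one symbol
    weight and keeps the change only if the training correlation between the
    descriptor and the endpoint does not decrease."""
    rng = random.Random(seed)
    vocab = sorted({t for tokens, _ in train for t in tokens})
    weights = {t: rng.uniform(0.0, 1.0) for t in vocab}
    ys = [y for _, y in train]

    def score():
        return pearson([descriptor(toks, weights) for toks, _ in train], ys)

    best = score()
    for _ in range(n_steps):
        t = rng.choice(vocab)
        old = weights[t]
        weights[t] = old + rng.uniform(-step, step)
        s = score()
        if s >= best:
            best = s
        else:
            weights[t] = old  # reject the move
    return weights, best


def fit_model(train, weights):
    """Least-squares line endpoint = c0 + c1 * descriptor on the training set."""
    xs = [descriptor(toks, weights) for toks, _ in train]
    ys = [y for _, y in train]
    mx, my = mean(xs), mean(ys)
    c1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - c1 * mx, c1
```

Because the optimized weights depend on which samples land in the training set, different splits can produce quite different models, consistent with the spread of R2 values (0.73 to 0.98) reported above.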

Fourches et al. (6) used support vector machine (SVM)-based classification to predict cytotoxicity, as a binary toxic/non-toxic value, of a variety of metal nanoparticles (CLIO, pseudocaged, monocrystalline iron oxide, CdSe core quantum dots, and iron-based) on four different cell lines (monocytes, hepatocytes, endothelial cells, and smooth muscle cells). Their model used experimentally measured attributes describing the nanoparticles: size, zeta potential, and relaxivity, which represents the magnetic properties of the nanoparticle. The biological activity profile was represented by the dose, cell line, and assay used for each nanoparticle; these values were combined into an arithmetic mean, which was then used to assign the toxic or non-toxic class used by the model. The number of samples used in this study is not explicitly stated, but it can be estimated at 3,264. A five-fold cross-validation scheme was used, and the model achieved prediction accuracies ranging from 56% to 88%. The authors suggest that many different approaches will need to be explored to identify and predict relationships between metal nanoparticle structures and their biological activity, and thus to provide more generalizable nano-QSAR relationships.
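A support vector machine separates two classes with the hyperplane that maximizes the margin, penalizing points on the wrong side of the margin through the hinge loss. The following is a minimal linear SVM trained with the Pegasos stochastic subgradient method. The solver was chosen for brevity and is not necessarily what Fourches et al. used; the linear kernel, the regularization constant `lam`, and the two scaled descriptors in the usage data (standing in for, say, size and zeta potential) are assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)), labels y in {+1, -1}.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (it is regularised too, a simplification)."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: regularisation term
            if margin < 1.0:  # hinge loss active: move toward the point
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    """+1 (toxic) or -1 (non-toxic) depending on the side of the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0.0 else -1
```

Nonlinear class boundaries would need a kernel SVM; the linear version is shown only because it is short enough to read in full.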

Liu et al. (30) used a variety of algorithms (IBK, Bagging, M5P, and KStar) to predict post-fertilization toxic effects in zebrafish embryos of several nanoparticles, including metal nanoparticles, dendrimers, metal oxides, and polymeric materials. IBK is a k-nearest neighbor classifier that assigns an input the most common output label among its k nearest neighbors (34). Bagging (bootstrap aggregating) is an ensemble method that reduces variance by training a base classifier on multiple bootstrap samples of the training data and combining their predictions by voting (35). M5P is a tree algorithm that generates M5 model trees and rules (36), and KStar is an instance-based classifier that assigns a test instance the class of similar training instances (37). The paper does not explicitly state the number of samples used. The models used 20 input variables representing nanoparticle properties (e.g., particle size distribution, structure, surface charge, water solubility) and experimental conditions (e.g., exposure route, concentration, duration). The most successful predictions were obtained with the IBK algorithm for 24-hour post-fertilization mortality, 120-hour post-fertilization mortality, and 120-hour post-fertilization heart malformation, with accuracies of 0.84, 0.77, and 0.73, respectively. The accuracies obtained for other predicted properties with the various methods are reported in Table 2.
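IBK and bagging, both named above, can each be sketched in a few lines. The IBK-style function below uses squared Euclidean distance and a simple majority vote (Weka's IBk offers further options such as distance weighting). The bagging function trains the same base learner on bootstrap resamples, drawn with replacement and of the same size as the data, and takes a majority vote. The features, labels, and ensemble size are hypothetical.

```python
import random
from collections import Counter


def ibk_predict(train_X, train_y, x, k=3):
    """Majority label among the k training instances closest to x."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def bagging_predict(train_X, train_y, x, base, n_models=25, seed=0):
    """Bootstrap aggregating: query `base` on n_models bootstrap resamples
    of the training data and return the majority vote."""
    rng = random.Random(seed)
    n = len(train_X)
    votes = Counter()
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        votes[base([train_X[i] for i in idx], [train_y[i] for i in idx], x)] += 1
    return votes.most_common(1)[0][0]
```

Any function with the signature `base(train_X, train_y, x)` can be plugged into `bagging_predict`; averaging over resamples mainly helps unstable learners whose predictions change a lot with small changes in the training data.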

Table 2.

Accuracies reported for models predicting different measurements of toxicity in post-fertilization zebrafish embryos using different methods (30).

Predicted Attribute | Algorithm | Accuracy
120-hour post-fertilization jaw malformation | IBK | 0.667
120-hour post-fertilization trunk malformation | IBK | 0.657
24-hour post-fertilization developmental progression | IBK | 0.591
120-hour post-fertilization pigmentation | IBK | 0.565
120-hour post-fertilization eye malformation | IBK | 0.544
120-hour post-fertilization snout malformation | IBK | 0.486
120-hour post-fertilization touch response | IBK | 0.476
120-hour post-fertilization caudal fin malformation | IBK | 0.441
120-hour post-fertilization yolk sac edema | Bagging | 0.439
120-hour post-fertilization pectoral fin malformation | IBK | 0.387
120-hour post-fertilization swim bladder | M5P | 0.380
120-hour post-fertilization circulation | IBK | 0.368
120-hour post-fertilization otic malformation | IBK | 0.331
120-hour post-fertilization brain malformation | IBK | 0.297
120-hour post-fertilization axis malformation | IBK | 0.294
120-hour post-fertilization somite malformation | Bagging | 0.262
24-hour post-fertilization notochord malformation | M5P | 0.125
24-hour post-fertilization spontaneous movement | KStar | -0.003

The results of Liu et al. (30) indicate that dosage concentration, shell composition, and surface charge are the most important attributes for predicting post-fertilization mortality of zebrafish embryos, which agrees with previous bench studies (38, 39). The authors discussed increasing the size and diversity of their data to expand and refine the impact of their predictive models.

Jones et al. (24) tested the ability of a variety of algorithms (Naïve Bayes, SMO, J48, Bagging, Classification via Regression, Filtered Classifier, LWL, Decision Table, DTNB, NBTree, and Random Forest) to predict the cytotoxicity of poly(amido amine) (PAMAM) dendrimers on human colorectal cancer cells (Caco-2), measured as cell viability and treated as a binary toxic/non-toxic variable. Naïve Bayes is a Bayesian classifier that uses posterior probability to predict the value of the target attribute (40): given the input attributes, it selects the target value that maximizes the conditional (posterior) probability, assuming that the input attributes are conditionally independent given the target. SMO is a support vector machine classifier, trained with the sequential minimal optimization algorithm, that globally replaces all missing values and transforms nominal attributes into binary ones (41). J48, introduced above, starts with a large set of cases belonging to known classes and analyzes them for patterns that allow reliable discrimination between the classes; the patterns are represented as models, either decision trees or sets of if-then rules, that can be used to classify new cases. Classification via Regression binarizes each class and builds one regression model for each class (42). Filtered Classifier runs an arbitrary classifier on data that have been passed through an arbitrary filter (43). LWL (locally weighted learning) uses an instance-based algorithm to assign instance weights (44). Decision Table is a simple decision table majority classifier (45). DTNB is a decision table/Naïve Bayes hybrid classifier: at each point in its search, the algorithm evaluates the merit of dividing the attributes into two disjoint subsets, one for the decision table and the other for Naïve Bayes (46). NBTree is a decision tree/Naïve Bayes hybrid classifier that builds a decision tree with Naïve Bayes classifiers at the leaves (47).

The dataset used in this study consisted of 103 samples, and the predictive models used 51 molecular descriptors (e.g., molecular weight, pI, molecular polarizability) calculated with MarvinSketch (48). The models achieved 10-fold cross-validated accuracies ranging from 65.0 to 83.5%. The best classification models were obtained with the J48 and Bagging methods, both with 10-fold cross-validated accuracies of 83.5%. The decision tree from the J48 classifier (see Figure 2) shows that the descriptors used in making the best prediction were pI, molecular weight, and PAMAM dendrimer concentration. The authors also point out the importance of larger datasets for creating more reliable and robust classification models.
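The Naïve Bayes rule described above picks the class with the highest posterior probability, treating the descriptors as conditionally independent given the class. The minimal sketch below handles numeric descriptors by assuming that each descriptor follows a normal distribution within each class (the default treatment of numeric attributes in Weka's NaiveBayes, although the estimator details differ). The two descriptors and their values are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Per class: log prior, and mean/variance of each descriptor."""
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # small floor keeps a constant descriptor from giving zero variance
        variances = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def log_posterior(model, label, x):
    """log P(label) + sum_j log N(x_j; mean_j, var_j), up to a constant shared by all classes."""
    log_prior, means, variances = model[label]
    return log_prior + sum(
        -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        for xj, m, v in zip(x, means, variances))


def predict_nb(model, x):
    """Class with the largest (log) posterior."""
    return max(model, key=lambda label: log_posterior(model, label, x))
```

Working with log probabilities turns the product of per-descriptor likelihoods into a sum, which avoids numerical underflow when many descriptors (51 in this study) are used.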

Figure 2.

Decision tree for the 10-fold cross-validation J48 classifier of the fourth analysis, which included the expert-selected molecular descriptors together with the concentration of the dendrimers used in the experiments. Values on the branches represent the rule or decision used for making the classification. The boxes at the bottom represent the classifications, with the number of PAMAM dendrimers classified as such on the left and the number of exceptions (misclassifications) on the right. From Ref. (24) with permission.

Overall, there is a tendency toward developing toxicity models for metal oxide and metal nanoparticles. For many classes of nanoparticles, such as micelles, liposomes, and polymeric nanoparticles, no work has yet been reported on the use of data mining methods to predict cytotoxicity. It is encouraging that many of the papers concluded that properties related to the charge, concentration, and size of nanoparticles are important for predicting cytotoxicity. These properties have been hypothesized to be important indicators of the potential cytotoxicity of nanoparticles (49), and the results compiled in this review provide computational support for that hypothesis. Collecting and aggregating more cytotoxicity data is essential for the further development of methods to predict the cytotoxicity of nanoparticles.

Molecular Loading

Molecular loading is a very important property of nanoparticles intended to deliver drugs and/or imaging contrast agents to specific tissues or cells. Two research articles reported the use of data mining and machine learning techniques to predict the ability to load molecules into nanoparticles. A summary of their findings is given in Table 3.

Table 3.

Summary of the data mining and machine learning methods used to predict molecular loading of nanoparticles. The evaluation results correspond to the most successful model reported in the publication.

Primary Author | Nanoparticle Type | Loaded Molecule Type | Loading Type | Predictive Method | Evaluation Results
Winkler (14) | Surface-modified gold nanoparticles | Protein | Nonspecific protein binding | Bayesian neural networks | R2 = 0.94
Shalaby (50) | Di- and triblock copolymers of polyethylene glycol and polylactide | Noscapine | Entrapment efficiency | ANN | R2 = 0.96484

Winkler et al. (14) explored the use of Bayesian neural networks and multiple linear regression models to predict acetylcholinesterase (AChE) inhibition (or nonspecific adsorption) and nonspecific protein binding to surface-modified gold nanoparticles. The dataset used for this study consisted of 80 samples. Two-dimensional DRAGON descriptors were calculated for the surface-modified gold nanoparticles (15). Both modeling approaches are described above in the Cellular Uptake section of this review. The linear and nonlinear models for AChE inhibition used 14 of the DRAGON descriptors and yielded R2 values of 0.81 and 0.80, respectively. The linear and nonlinear models for nonspecific protein binding used 10 of the DRAGON descriptors and yielded R2 values of 0.93 and 0.94, respectively. Nonspecific protein binding correlated only with protein concentration and did not exhibit any dependence on the nanoparticle properties. The authors considered their results good, but cautioned that new predictions should be made with care, because reasonably sized datasets, high-quality data, and high-quality descriptors are needed to further verify the generalizability of their models.
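The R2 values quoted throughout this review are coefficients of determination, R2 = 1 - SS_res/SS_tot. As a minimal sketch (not Winkler et al.'s code), the block below fits a multiple linear regression by solving the normal equations and computes R2. The two descriptors and the 80 synthetic samples are hypothetical, chosen only to mirror the dataset size mentioned above. This R2 is computed on the training data, which is exactly why the generalizability caveat in the paragraph above matters.

```python
import random


def fit_mlr(X, y):
    """Ordinary least squares with an intercept, solved from (X^T X) b = X^T y."""
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(A[0])
    xtx = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    xty = [sum(a[i] * t for a, t in zip(A, y)) for i in range(p)]
    M = [row + [b] for row, b in zip(xtx, xty)]  # augmented matrix
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (M[i][p] - sum(M[i][j] * beta[j] for j in range(i + 1, p))) / M[i][i]
    return beta


def predict_mlr(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# 80 synthetic samples with two hypothetical, pre-scaled descriptors.
gen = random.Random(1)
X = [[gen.random(), gen.random()] for _ in range(80)]
y = [1.5 + 2.0 * a - 0.5 * b + gen.gauss(0, 0.1) for a, b in X]
beta = fit_mlr(X, y)
r2 = r_squared(y, [predict_mlr(beta, row) for row in X])
print([round(b, 2) for b in beta], round(r2, 3))
```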

Shalaby et al. (50) used artificial neural networks (ANN) to predict the entrapment efficiency of noscapine in di- and triblock copolymers of poly(ethylene glycol) and poly(lactide). The number of samples used in this study is not explicitly stated. The experimentally measured input variables used by their model were the molecular weight of the polymer, the polymer-to-drug ratio, and the number of blocks per polymer. Their model yielded an overall R2 value of 0.91486 for the noscapine entrapment efficiency predictions. The polymer-to-drug ratio was the most important feature for predicting noscapine entrapment efficiency. Similar results have been observed experimentally for poly(lactide) (PLA) and poly(ethylene glycol) (PEG) (51).
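The paragraph above does not say how the inputs were ranked by importance. One model-agnostic option is permutation importance: shuffle one input column in held-out data and measure how much the prediction error grows. In the sketch below a k-nearest-neighbour regressor stands in for the ANN to keep the example dependency-free. The feature names and the synthetic relationship that makes the polymer-to-drug ratio dominant are assumptions, not data from Shalaby et al.

```python
import random


def knn_predict(train_X, train_y, row, k=3):
    """Mean target of the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(tx, row)), ty)
        for tx, ty in zip(train_X, train_y)
    )[:k]
    return sum(ty for _, ty in nearest) / k


def mse(model, X, y):
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE on (X, y) when a single input column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    scores = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increase += mse(model, X_perm, y) - baseline
        scores.append(increase / n_repeats)
    return scores


# Hypothetical pre-scaled inputs (n_blocks: 0.0 = diblock, 1.0 = triblock);
# the made-up target is built so that the polymer-to-drug ratio dominates.
FEATURES = ["polymer_mw", "polymer_to_drug_ratio", "n_blocks"]
gen = random.Random(42)
X, y = [], []
for _ in range(120):
    mw, ratio, blocks = gen.random(), gen.random(), gen.choice([0.0, 1.0])
    X.append([mw, ratio, blocks])
    y.append(40 + 40 * ratio + 5 * mw + 2 * blocks + gen.gauss(0, 1))

train_X, train_y, test_X, test_y = X[:90], y[:90], X[90:], y[90:]


def model(row):
    return knn_predict(train_X, train_y, row)


scores = permutation_importance(model, test_X, test_y)
for name, score in sorted(zip(FEATURES, scores), key=lambda p: -p[1]):
    print(f"{name:>22}: MSE increase {score:.1f}")
```

Importance is measured on the held-out rows, so it reflects what the fitted model relies on when predicting new formulations rather than what it memorized.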

Both research articles showed promising results for the use of data mining and machine learning to predict the loading of molecules into nanoparticles. Both reported R2 values above 0.90, and in both the most important predictive feature relates to the amount of molecule available in solution to be loaded, which is consistent with first-order dynamics for the entrapment process. Clearly, more data on a much more diverse set of nanoparticles are needed to evaluate the relevance of nanoparticle properties to molecular loading beyond the concentration of the loaded molecule.

Molecular Release

Because many cancer drugs are toxic, it is important to encapsulate or conceal them within a nanoparticle until they reach the cancerous cells or target tissues in the body (52, 53). Some nanoparticle drug delivery systems are also suitable carriers for unstable active pharmaceutical ingredients (54), protecting them from degradation before they reach the target cells or tissues. Two publications examined the use of data mining and machine learning to predict the release of molecules encapsulated in nanoparticles. A summary of their findings is given in Table 4.

Table 4.

Summary of the data mining and machine learning methods used to predict molecular release from nanoparticles. The evaluation results correspond to the most successful model reported in the publication.

| Primary Author | Nanoparticle Type | Released Molecule Type | Predictive Method | Evaluation Results |
|---|---|---|---|---|
| Husseini (52) | Polymeric (Pluronic P105) micelles | Doxorubicin | ANN | Maximum prediction error = 0.001 |
| Szlek (55) | Poly(lactic-co-glycolic acid) (PLGA) nanoparticles | Bovine serum albumin, human serum albumin, recombinant human erythropoietin, lysozyme, recombinant human epidermal growth factor, recombinant human growth hormone, beta-amyloid, recombinant human erythropoietin coupled with human serum albumin, hen ovalbumin, insulin, bovine insulin, L-asparaginase, chymotrypsin, and alpha-1 antitrypsin | ANN | RMSE = 15.4 |

Husseini et al. (52) used ANN to model the release of doxorubicin from polymeric (Pluronic P105) micelles at two different ultrasound frequencies. The number of samples used in this study is not explicitly stated. The model was trained on experimentally obtained input-output data for the release of doxorubicin from the micelles. The ANN predictions corresponded closely to the experimental data, with maximum prediction errors of 0.002 and 0.001 at ultrasound frequencies of 20 and 70 kHz, respectively.

Szlek et al. (55) used ANN to predict the release of macromolecules (bovine serum albumin, human serum albumin, recombinant human erythropoietin, lysozyme, recombinant human epidermal growth factor, recombinant human growth hormone, beta-amyloid, recombinant human erythropoietin coupled with human serum albumin, hen ovalbumin, insulin, bovine insulin, L-asparaginase, chymotrypsin, and alpha-1 antitrypsin) from poly(lactic-co-glycolic acid) (PLGA) nanoparticles. The paper does not explicitly state the number of samples in the dataset; however, it appears that 754 samples with 320 variables were used. The independent variables included formulation characteristics, experimental conditions, and molecular descriptors calculated with the Marvin cxcalc plugin (56). Feature selection was performed to remove features that did not improve the predictions, resulting in four different analyses. Using a monotone multi-layer perceptron neural network with 21, 17, 16, and 11 input features, these models achieved relative root-mean-squared errors of 17.7, 17.1, 16.4, and 15.4, respectively. The best analysis used eleven input variables: Szeged index, pI (isoelectric point), quaternary structure of the macromolecule, lactide-to-glycolide ratio in the polymer, poly(vinyl alcohol) concentration in the inner phase, poly(vinyl alcohol) concentration in the outer phase, encapsulation rate, mean particle size, dissolution pH, production method, and percentage of macromolecule dissolved.
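The release studies above report errors in different forms. The sketch below shows the usual definitions of RMSE (in the units of the target, e.g. percent released), a range-normalized RMSE expressed in percent, and the maximum absolute error. The review does not define how Szlek et al. made their RMSE "relative", so range normalization is only one common convention assumed here, and the observed and predicted values are invented.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error, in the units of the target (e.g. % released)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def nrmse_percent(y_true, y_pred):
    """RMSE divided by the observed range, as a percentage (one common normalization)."""
    return 100.0 * rmse(y_true, y_pred) / (max(y_true) - min(y_true))


def max_abs_error(y_true, y_pred):
    """Largest single absolute deviation; one plausible reading of a 'maximum prediction error'."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


# Invented release profile (% of macromolecule released) and model predictions.
observed = [0.0, 10.0, 25.0, 40.0, 60.0, 80.0]
predicted = [3.0, 8.0, 30.0, 38.0, 55.0, 84.0]
print(f"RMSE  = {rmse(observed, predicted):.2f}")
print(f"NRMSE = {nrmse_percent(observed, predicted):.2f} %")
print(f"max   = {max_abs_error(observed, predicted):.1f}")
```

For these numbers the RMSE is about 3.72, the range-normalized RMSE about 4.65 %, and the maximum absolute error 5.0; the three metrics can rank models differently, which is worth keeping in mind when comparing entries in Table 4.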

These results demonstrate that it is feasible to create predictive models for the quantitative release of molecules from nanoparticles. Although several released molecules and nanoparticle types were studied, a wider variety of nanocarriers, predictive algorithms, and carried substances must be evaluated before a final determination can be made of the power of machine learning for this application.

Nanoparticle Adherence

In the treatment and imaging of cancer, researchers often limit the size of synthesized nanoparticles to 200-300 nm to take advantage of the enhanced permeability and retention (EPR) effect (57). This is not always the best strategy for developing new therapies, because EPR-based therapies face many limitations (58). Moreover, if nanoparticles are to be used to treat diseases other than cancer, relying on fenestrations in the vasculature will be unsuccessful, because such fenestrations are specific to cancer. The nanoparticles in the study discussed below were designed to adhere to the walls of diseased blood vessels and to avoid dislodgement by hydrodynamic forces; they provide a useful dataset for exploring the use of data mining and machine learning to predict nanoparticle adherence.

Boso et al. (59) used ANN to predict the number of fluorescent polystyrene nanoparticles adhering to vessel walls as a function of wall shear rate and nanoparticle diameter. This is important because the goal is to identify an optimal structural configuration of nanoparticles that enhances their accumulation in diseased tissues. The paper does not explicitly state the number of samples in the dataset. The ANN performed quite well at predicting the optimal particle diameter, with root mean squared error values of 0.03678 nm and 0.03460 nm. The authors claim that this work demonstrates that, because of the accuracy of the predictive models, ANN can minimize the number of long parallel plate flow chamber experiments required. They also argue that the predictive model could be optimized for in vivo studies, thereby reducing the amount of animal experimentation.
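Once a network has learned adhesion as a function of shear rate and diameter, finding an "optimal" diameter amounts to querying the model over a grid of candidate sizes, which is what lets a trained model stand in for some flow-chamber runs. In the sketch below, `surrogate_adhesion` is a made-up analytic stand-in for a trained ANN (contact area growing as the square of the diameter, against a drag penalty); its form, its constant, and the units are assumptions, not Boso et al.'s model.

```python
import math


def surrogate_adhesion(shear_rate, diameter):
    """Hypothetical stand-in for a trained ANN (arbitrary units).

    Adhesion grows with contact area (~diameter**2) but is penalized by drag,
    modelled as exp(-shear_rate * diameter / 250); the form and constant are made up.
    """
    return diameter ** 2 * math.exp(-shear_rate * diameter / 250.0)


def optimal_diameter(model, shear_rate, diameters):
    """Grid search: the candidate diameter with the highest predicted adhesion."""
    return max(diameters, key=lambda d: model(shear_rate, d))


grid = [0.1 * i for i in range(1, 101)]  # candidate diameters 0.1 to 10.0 (hypothetical µm)
for shear in (50.0, 100.0, 200.0):       # hypothetical wall shear rates (1/s)
    best = optimal_diameter(surrogate_adhesion, shear, grid)
    print(f"shear {shear:6.1f} 1/s -> best diameter {best:.1f}")
```

Under this toy surrogate the optimum shifts to smaller diameters as the shear rate rises (10.0, 5.0 and 2.5 for the three shear rates); with a real trained network only the `model` argument would change.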

Nanoparticle Size

As discussed above, size is a very important nanoparticle property for nanomedicine; for instance, nanoparticle size has been found to be a major factor determining the fate of nanoparticles in vivo (60). Optimizing size is also important for the design and development of nanoparticles used to treat a variety of tumors, because size affects permeability and retention (57). Nanoparticle size can change depending on solution conditions, manufacturing, drug loading, and drug release (61). Two publications examined the use of data mining and machine learning to predict nanoparticle size. A summary of their findings is given in Table 5.

Table 5.

Summary of the data mining and machine learning methods used to predict nanoparticle size. The evaluation results correspond to the most successful model reported in the publication.

| Primary Author | Nanoparticle Type | Predictive Method | Evaluation Results |
|---|---|---|---|
| Asadi (62) | Poly(lactide)-poly(ethylene glycol)-poly(lactide) (PLA-PEG-PLA) nanoparticles | ANN | R2 = 0.9434 |
| Shalaby (50) | Di- and triblock copolymers of poly(ethylene glycol) and poly(lactide) | ANN | R2 = 0.9783 |

Asadi et al. (62) used ANN to determine the features most relevant to predicting the particle size of tri-block poly(lactide)-poly(ethylene glycol)-poly(lactide) (PLA-PEG-PLA) nanoparticles. The paper does not explicitly state the number of samples in the dataset; however, it appears that 51 samples were used. Four input variables were used: amount of drug, polymer concentration, mixing rate, and solvent ratio. The model predicted nanoparticle size in nm. The specific ANN was a three-layer feed-forward back-propagation neural network, which achieved an R2 value of 0.9434 on the validation data. The authors found that polymer concentration is the most important feature for determining nanoparticle size.
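A three-layer feed-forward back-propagation network has an input layer, one hidden layer and an output layer, and is trained by propagating the prediction error backwards to update the weights. The pure-Python sketch below uses four inputs named after Asadi et al.'s variables, but the tanh activation, five hidden units, learning rate, epoch count, and the synthetic data (51 samples, 41 for training and 10 for validation, with size driven mainly by polymer concentration) are all assumptions.

```python
import math
import random


class ThreeLayerNet:
    """Input layer -> one tanh hidden layer -> one linear output, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return hidden, sum(w * h for w, h in zip(self.w2, hidden)) + self.b2

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        """One stochastic-gradient step on the loss 0.5 * (prediction - target)**2."""
        hidden, output = self.forward(x)
        err = output - target  # dLoss/dOutput
        for j, h in enumerate(hidden):
            # Error signal at hidden unit j via tanh' = 1 - tanh**2 (uses w2[j] before it is updated).
            delta = err * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * err * h
            self.b1[j] -= lr * delta
            self.w1[j] = [w - lr * delta * xi for w, xi in zip(self.w1[j], x)]
        self.b2 -= lr * err


def mse(net, rows):
    return sum((net.predict(x) - t) ** 2 for x, t in rows) / len(rows)


# Hypothetical scaled inputs [drug amount, polymer concentration, mixing rate, solvent ratio];
# the scaled "size" target is a made-up function dominated by polymer concentration.
gen = random.Random(3)
data = []
for _ in range(51):
    drug, conc, mixing, solvent = [gen.random() for _ in range(4)]
    size = 0.3 + 0.5 * conc + 0.1 * drug - 0.05 * mixing + gen.gauss(0, 0.02)
    data.append(([drug, conc, mixing, solvent], size))
train, valid = data[:41], data[41:]

net = ThreeLayerNet(n_in=4, n_hidden=5)
initial_mse = mse(net, train)
for epoch in range(300):
    gen.shuffle(train)
    for x, t in train:
        net.train_step(x, t, lr=0.05)
final_mse = mse(net, train)

mean_valid = sum(t for _, t in valid) / len(valid)
ss_tot = sum((t - mean_valid) ** 2 for _, t in valid)
r2_valid = 1.0 - mse(net, valid) * len(valid) / ss_tot
print(f"train MSE {initial_mse:.4f} -> {final_mse:.4f}, validation R2 {r2_valid:.3f}")
```

In practice a library implementation (for example scikit-learn's `MLPRegressor`) would normally be used; the explicit loop is shown only to make the back-propagation step visible.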

Shalaby et al. (50) used ANN to predict the particle size of nanoparticles made from di- and triblock copolymers of poly(ethylene glycol) and poly(lactide). The paper does not explicitly state the number of samples in the dataset; however, it appears that 27 samples were used. Three input variables were used: polymer molecular weight, polymer-to-drug ratio, and number of blocks in the copolymer. The model predicted nanoparticle size in nm and yielded an R2 value of 0.9783. The prediction of particle size was determined mostly by the molecular weight of the polymer.

The methods for predicting nanoparticle size are clearly quite accurate. One potential reason is that, in each of these studies, the predicted size appears to depend on only a single feature. However, this dominant feature differs between the two studies (polymer concentration for Asadi et al. and polymer molecular weight for Shalaby et al.).

Polydispersity

One of the many challenges and goals of nanomedicine is the ability to prepare narrowly dispersed nanoparticles (63). Nanoparticles commonly exhibit relatively high polydispersity, which can result in a number of drawbacks, such as mixtures of nanoparticles with varying loading capacities, decreased physical stability, variable release profiles, and unpredictable degradation and clearance rates (64-67).

Esmaeilzadeh-Gharehdaghi et al. (68) predicted the polydispersity of chitosan nanoparticles using ANN with four input features: sonication amplitude and sonication time of the chitosan solution, chitosan solution concentration, and chitosan solution pH. The dataset consisted of 39 samples. Applying the model to the validation data yielded an R2 value of 0.84. The analysis revealed that polydispersity decreased as the chitosan solution concentration increased, and increased as the pH of the chitosan solution decreased (i.e., became more acidic).

Discussion

The steady growth of nanomedicine has led to the development of nanoinformatics and, subsequently, to the use of data mining and machine learning to develop nano-QSARs and other methods to predict both functional and structural properties of nanoparticles. Research articles in this area appear in a wide variety of journals. The methods reported attempt to predict a large number of nanoparticle properties, including cellular uptake, cytotoxicity, molecular loading, molecular release, nanoparticle adherence, nanoparticle size, and polydispersity.

Two common themes can be observed in the papers reviewed here. First, the most common prediction method is some variant of artificial neural networks (ANN). Several reasons may explain why ANN is the method of choice, including the complexity of nanoparticle data, the large number of attributes describing nanoparticles, and the potential difficulty of building predictions with rule-based algorithms given the lack of sufficient empirical knowledge. Second, the descriptors or attributes most commonly needed to create accurate predictions involve charge-, concentration-, and size-based properties of nanoparticles.

Cytotoxicity of inorganic materials is the most commonly predicted nanoparticle property, and most reports find that charge, concentration, and size are its most important determinants; this is not surprising, as these properties have been hypothesized to be important indicators of the potential cytotoxicity of nanoparticles (49). Very little work has been reported on the use of data mining and machine learning methods to predict the cytotoxicity of organic nanoparticles. One potential reason is the lack of databases or publications analyzing the cytotoxicity of a variety of organic nanoparticles. Another is the variability of biological models across laboratories. Factors such as potential nanoparticle aggregation, variations in the media used, and cell origin and passage, among others, further contribute to variability in the data obtained.

Only one of the articles reviewed (30) examined in vivo applications of data mining in nanomedicine. There is a clear lack of data mining and machine learning applications to in vivo nanoparticle data. Again, this is probably attributable to the lack of easily accessible in vivo data on nanoparticles and to the greater variability of in vivo results.

Another commonality among many of the research articles presented in this review is the limited sample size and high dimensionality of the datasets used for analysis. Lack of data can lead to several problems, including overfitting; difficulty in demonstrating the reliability, generalizability, and applicability of the predictive models to other nanoparticles; and class imbalance. Validating a predictive model can be problematic when the sample size is limited and each sample is described by a large number of variables (69). The most common way to address high dimensionality is variable (feature) selection, which reduces the number of variables in the predictive model (70). Variable selection was commonly used in the research articles presented in this review and, as stated before, most researchers pared their variable lists down to charge-, concentration-, and size-based properties of nanoparticles to create accurate predictions. Class imbalance, which occurs when the number of samples representing one class is much lower than the number representing other classes, is a challenging problem for the data mining community (71). The simplest way to overcome this issue is to ensure a balanced representation of each class in the dataset, but this is a significant challenge in nanoinformatics, as the lack of large, well-curated datasets seriously limits the amount, quality, and variety of available data. This is perhaps the most serious limitation observed in most of the papers discussed in this review. Because the field of nanoinformatics is relatively young, the data mining and machine learning results reported in these articles are very preliminary, and their generalizability remains an open question for further investigation.

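Two of the remedies discussed above can be sketched in a few lines. Stratified folds keep the class proportions of a small, imbalanced dataset in every cross-validation fold, and inverse-frequency class weights (the same n_samples / (n_classes × count) heuristic that scikit-learn applies for `class_weight="balanced"`) let a learner count each minority sample more heavily. The 45/5 split between "non-toxic" and "toxic" samples is hypothetical. Any feature selection should likewise be repeated inside each training fold, so that the held-out fold does not influence which variables are kept.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Deal the indices of each class round-robin into k folds so that every fold
    keeps roughly the overall class proportions."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for i in members:
            folds[position % k].append(i)
            position += 1
    return folds


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of the class)."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return {label: n / (n_classes * count) for label, count in counts.items()}


# Hypothetical imbalanced dataset: 45 non-toxic and 5 toxic nanoparticles.
labels = ["non-toxic"] * 45 + ["toxic"] * 5
folds = stratified_folds(labels, k=5)
weights = balanced_class_weights(labels)
for f, fold in enumerate(folds):
    print(f"fold {f}: {Counter(labels[i] for i in fold)}")
print(weights)
```

Each fold receives nine non-toxic samples and one toxic sample, and each toxic sample is weighted 5.0 against about 0.56 for a non-toxic one, so both classes contribute equally to a weighted loss.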
As stated previously, we believe that NLP methods and the development of large curated databases of nanoparticle information will help relieve the limitations identified in most of the articles discussed here. Recent review papers have focused on several challenges facing the development of nano-QSARs and other predictive models, including the lack of high-quality experimental data, limited knowledge of interactions between nanoparticles such as aggregation, and high polydispersity in nanoparticles (72, 73). These are significant challenges for the field of nanoinformatics and should be a focus of future research.

The papers reviewed here illustrate the power and accuracy that data mining and machine learning methods bring to predicting functional and structural properties of nanoparticles. With the development of text mining, text extraction, and useful databases in nanomedicine, we believe that accurate nano-QSARs and other predictive models are achievable with state-of-the-art data mining and machine learning practices. Nonetheless, the great heterogeneity of nanoparticles will make these discoveries more challenging than in traditional small-molecule drug design.

Highlights.

  • Nanomedicine, which focuses on the use of nanoparticles and nanotechnology in the biomedical domain, is increasingly becoming a very active field of research.

  • Informatics support for nanomedicine is less developed and there is a substantial lack of authoritative sources of information accessible to non-informatics specialists.

  • The goals of this review are to: (1) review research involving the use of data mining and machine learning in the field of nanomedicine and (2) examine the progress and challenges that this relatively new field of nanoinformatics faces to become a major contributor toward the development of effective nanomedicines.

Acknowledgments

The project described was supported by Grant Number T15LM007124 from the National Library of Medicine. Also, this work has been partially funded by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number 1ULTR001067 and NIH grant R01ES024681.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  • 1.Jain K. The Handbook of Nanomedicine. 1. Totowa, New Jersey: Humana; 2008. [Google Scholar]
  • 2.Dawidczyk CM, Kim C, Park JH, Russell LM, Lee KH, Pomper MG, et al. State-of-the-art in design rules for drug delivery platforms: lessons learned from FDA-approved nanomedicines. Journal of controlled release : official journal of the Controlled Release Society. 2014;187:133–44. doi: 10.1016/j.jconrel.2014.05.036. Epub 2014/05/31. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Tropsha A, Golbraikh A. Predictive QSAR modeling workflow, model applicability domains, and virtual screening. Current pharmaceutical design. 2007;13(34):3494–504. doi: 10.2174/138161207782794257. Epub 2008/01/29. [DOI] [PubMed] [Google Scholar]
  • 4.Puzyn T, Leszczynska D, Leszczynski J. Toward the development of “nano-QSARs”: advances and challenges. Small. 2009;5(22):2494–509. doi: 10.1002/smll.200900179. Epub 2009/09/30. [DOI] [PubMed] [Google Scholar]
  • 5.Puzyn T, Rasulev B, Gajewicz A, Hu X, Dasari TP, Michalkova A, et al. Using nano-QSAR to predict the cytotoxicity of metal oxide nanoparticles. Nature nanotechnology. 2011;6(3):175–8. doi: 10.1038/nnano.2011.10. Epub 2011/02/15. [DOI] [PubMed] [Google Scholar]
  • 6.Fourches D, Pu D, Tassa C, Weissleder R, Shaw SY, Mumper RJ, et al. Quantitative nanostructure-activity relationship modeling. ACS nano. 2010;4(10):5703–12. doi: 10.1021/nn1013484. Epub 2010/09/23. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.de la Iglesia D, Maojo V, Chiesa S, Martin-Sanchez F, Kern J, Potamias G, et al. International efforts in nanoinformatics research applied to nanomedicine. Methods of information in medicine. 2011;50(1):84–95. doi: 10.3414/ME10-02-0012. Epub 2010/11/19. [DOI] [PubMed] [Google Scholar]
  • 8.Jones AT, Gumbleton M, Duncan R. Understanding endocytic pathways and intracellular trafficking: a prerequisite for effective design of advanced drug delivery systems. Advanced drug delivery reviews. 2003;55(11):1353–7. doi: 10.1016/j.addr.2003.07.002. [DOI] [PubMed] [Google Scholar]
  • 9.Gabizon A, Bradbury M, Prabhakar U, Zamboni W, Libutti S, Grodzinski P. Cancer nanomedicines: closing the translational gap. The Lancet. 2014;384(9961):2175–6. doi: 10.1016/S0140-6736(14)61457-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Weissleder R, Kelly K, Sun EY, Shtatland T, Josephson L. Cell-specific targeting of nanoparticles by multivalent attachment of small molecules. Nature biotechnology. 2005;23(11):1418–23. doi: 10.1038/nbt1159. Epub 2005/10/26. [DOI] [PubMed] [Google Scholar]
  • 11.Kim SS, Rait A, Rubab F, Rao AK, Kiritsy MC, Pirollo KF, et al. The clinical potential of targeted nanomedicine: delivering to cancer stem-like cells. Molecular therapy : the journal of the American Society of Gene Therapy. 2014;22(2):278–91. doi: 10.1038/mt.2013.231. Epub 2013/10/12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Zheng W, Tropsha A. Novel variable selection quantitative structure--property relationship approach based on the k-nearest-neighbor principle. Journal of chemical information and computer sciences. 2000;40(1):185–94. doi: 10.1021/ci980033m. Epub 2000/02/08. [DOI] [PubMed] [Google Scholar]
  • 13.Shen M, Xiao Y, Golbraikh A, Gombar VK, Tropsha A. Development and validation of k-nearest-neighbor QSPR models of metabolic stability of drug candidates. Journal of medicinal chemistry. 2003;46(14):3013–20. doi: 10.1021/jm020491t. Epub 2003/06/27. [DOI] [PubMed] [Google Scholar]
  • 14.Winkler DA, Burden FR, Yan B, Weissleder R, Tassa C, Shaw S, et al. Modelling and predicting the biological effects of nanomaterials. SAR and QSAR in environmental research. 2014;25(2):161–72. doi: 10.1080/1062936X.2013.874367. Epub 2014/03/15. [DOI] [PubMed] [Google Scholar]
  • 15.Dragon Professional Software Package. 5.3 for Windows ed. Milano, Italy: Milano Chemometrics and QSAR Research Group; 2009. [Google Scholar]
  • 16.Burden FR, Winkler DA. Optimum QSAR model feature selection using sparse Bayesian methods. QSAR Comb Sci. 2009;28:645–53. [Google Scholar]
  • 17.Burden FR, Winkler DA. Robust QSAR models using Bayesian regularized neural networks. Journal of medicinal chemistry. 1999;42(16):3183–7. doi: 10.1021/jm980697n. Epub 1999/08/17. [DOI] [PubMed] [Google Scholar]
  • 18.Burden FR, Winkler DA. An optimal self-pruning neural network that performs nonlinear descriptor selection for QSAR. QSAR Comb Sci. 2009;28:1092–7. [Google Scholar]
  • 19.Winkler DA, Burden FR. Robust QSAR models from novel descriptors and Bayesian regularized neural networks. Mol Simul. 2000;24:243–58. [Google Scholar]
  • 20.Epa VC, Burden FR, Tassa C, Weissleder R, Shaw S, Winkler DA. Modelling biological activities of nanoparticles. Nano letters. 2012;12:5808–12. doi: 10.1021/nl303144k. [DOI] [PubMed] [Google Scholar]
  • 21.Elsaesser A, Howard CV. Toxicology of nanoparticles. Advanced drug delivery reviews. 2012;64(2):129–37. doi: 10.1016/j.addr.2011.09.001. Epub 2011/09/20. [DOI] [PubMed] [Google Scholar]
  • 22.Fadeel B, Garcia-Bennett AE. Better safe than sorry: Understanding the toxicological properties of inorganic nanoparticles manufactured for biomedical applications. Advanced drug delivery reviews. 2010;62(3):362–74. doi: 10.1016/j.addr.2009.11.008. Epub 2009/11/11. [DOI] [PubMed] [Google Scholar]
  • 23.Maojo V, Fritts M, de la Iglesia D, Cachau RE, Garcia-Remesal M, Mitchell JA, et al. Nanoinformatics: a new area of research in nanomedicine. International journal of nanomedicine. 2012;7:3867–90. doi: 10.2147/IJN.S24582. Epub 2012/08/07. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Jones DE, Ghandehari H, Facelli JC. Predicting cytotoxicity of PAMAM dendrimers by molecular descriptors. Beilstein J Nanotechnol. 2015;6:1886–96. doi: 10.3762/bjnano.6.192. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Landsiedel R, Ma-Hock L, Kroll A, Hahn D, Schnekenburger J, Wiench K, et al. Testing metal-oxide nanomaterials for human safety. Adv Mater. 2010;22(24):2601–27. doi: 10.1002/adma.200902658. Epub 2010/06/01. [DOI] [PubMed] [Google Scholar]
  • 26.Sayes C, Ivanov I. Comparative study of predictive computational models for nanoparticle-induced cytotoxicity. Risk analysis : an official publication of the Society for Risk Analysis. 2010;30(11):1723–34. doi: 10.1111/j.1539-6924.2010.01438.x. Epub 2010/06/22. [DOI] [PubMed] [Google Scholar]
  • 27.Liu R, Rallo R, George S, Ji Z, Nair S, Nel AE, et al. Classification NanoSAR development for cytotoxicity of metal oxide nanoparticles. Small. 2011;7(8):1118–26. doi: 10.1002/smll.201002366. Epub 2011/04/02. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Horev-Azaria L, Baldi G, Beno D, Bonacchi D, Golla-Schindler U, Kirkpatrick JC, et al. Predictive toxicology of cobalt ferrite nanoparticles: comparative in-vitro study of different cellular models using methods of knowledge discovery from data. Particle and fibre toxicology. 2013;10:32. doi: 10.1186/1743-8977-10-32. Epub 2013/07/31. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Toropova AP, Toropov AA, Rallo R, Leszczynska D, Leszczynski J. Optimal descriptor as a translator of eclectic data into prediction of cytotoxicity for metal oxide nanoparticles under different conditions. Ecotoxicology and environmental safety. 2015;112:39–45. doi: 10.1016/j.ecoenv.2014.10.003. Epub 2014/12/03. [DOI] [PubMed] [Google Scholar]
  • 30.Liu X, Tang K, Harper S, Harper B, Steevens JA, Xu R. Predictive modeling of nanomaterial exposure effects in biological systems. International journal of nanomedicine. 2013;8(Suppl 1):31–43. doi: 10.2147/IJN.S40742. Epub 2013/10/08. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Devroye L, Gyorfi L, Lugosi G. A Probabilistic Theory of Pattern Recognition. New York: Springer-Verlag; 1996. [Google Scholar]
  • 32.Stewart JJP. MOPAC2009. Stewart Computational Chemistry; 2009. [Google Scholar]
  • 33.Quinlan R. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann Publishers; 1993. [Google Scholar]
  • 34.Aha D, Kibler D, Albert MK. Instance-based learning algorithms. Machine Learning. 1991;6(1):37–66. [Google Scholar]
  • 35.Breiman L. Bagging predictors. Machine Learning. 1996;24(2):123–40. [Google Scholar]
  • 36.Quinlan RJ, editor. Learning with continuous classes; 5th Australian Joint conference on Artificial Intelligence; Singapore. 1992. [Google Scholar]
  • 37.Cleary JG, Trigg LE. K*: An Instance-based Learner Using an Entropic Distance Measure; 12th International conference on Machine Learning; 1995. pp. 108–14. [Google Scholar]
  • 38.Asharani PV, Lian Wu Y, Gong Z, Valiyaveettil S. Toxicity of silver nanoparticles in zebrafish models. Nanotechnology. 2008;19(25):255102. doi: 10.1088/0957-4484/19/25/255102. Epub 2008/06/25. [DOI] [PubMed] [Google Scholar]
  • 39.Harper SL, Carriere JL, Miller JM, Hutchison JE, Maddux BL, Tanguay RL. Systematic evaluation of nanomaterial toxicity: utility of standardized materials and rapid assays. ACS nano. 2011;5(6):4688–97. doi: 10.1021/nn200546k. Epub 2011/05/26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Witten I, Frank E, Hall M. Data Mining: Practical Machine Learning Tools and Techniques. 3. Morgan Kaufmann Publishers; 2011. p. 629. [Google Scholar]
  • 41.Schoelkopf B, Burges C, Smola A. Advances in Kernel Methods - Support Vector Learning. 1998 [Google Scholar]
  • 42.Frank E, Wang Y, Inglis S, Holmes G, Witten IH. Using model trees for classification. Machine Learning. 1998;32(1):63–76. [Google Scholar]
  • 43.Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA Data Mining Software: An Update. SIGKDD Explorations. 2009;11(1):10–8. [Google Scholar]
  • 44.Frank E, Hall M, Pfahringer B. Locally Weighted Naive Bayes; 19th conference in Uncertainty in Artificial Intelligence; 2003. pp. 249–56. [Google Scholar]
  • 45.Kohavi R. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection; International Joint conference on Artificial Intelligence; 1995. [Google Scholar]
  • 46.Hall M, Frank E. Combining Naive Bayes and Decision Tables; 21st Florida Artificial Intelligence Society conference (FLAIRS); 2008. [Google Scholar]
  • 47.Kohavi R. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid; Second International conference on Knowledge Discovery and Data Mining; 1996. pp. 202–7. [Google Scholar]
  • 48.ChemAxon. Marvin 5.12.4. 2013 [Google Scholar]
  • 49.El-Sayed M, Ginski M, Rhodes CA, Ghandehari H. Influence of Surface Chemistry of Poly(Amidoamine) Dendrimers on Caco-2 Cell Monolayers. journal of Bioactive and Compatible Polymers. 2003;18(1):7–22. [Google Scholar]
  • 50.Shalaby KS, Soliman ME, Casettari L, Bonacucina G, Cespi M, Palmieri GF, et al. Determination of factors controlling the particle size and entrapment efficiency of noscapine in PEG/PLA nanoparticles using artificial neural networks. International journal of nanomedicine. 2014;9:4953–64. doi: 10.2147/IJN.S68737. Epub 2014/11/05.
  • 51.Leo E, Brina B, Forni F, Vandelli MA. In vitro evaluation of PLA nanoparticles containing a lipophilic drug in water-soluble or insoluble form. International journal of pharmaceutics. 2004;278(1):133–41. doi: 10.1016/j.ijpharm.2004.03.002. Epub 2004/05/26.
  • 52.Husseini GA, Mjalli FS, Pitt WG, Abdel-Jabbar N. Using artificial neural networks and model predictive control to optimize acoustically assisted Doxorubicin release from polymeric micelles. Technology in cancer research & treatment. 2009;8(6):479–88. doi: 10.1177/153303460900800609. Epub 2009/11/21.
  • 53.Makadia HK, Siegel SJ. Poly Lactic-co-Glycolic Acid (PLGA) as Biodegradable Controlled Drug Delivery Carrier. Polymers. 2011;3(3):1377–97. doi: 10.3390/polym3031377. Epub 2012/05/12.
  • 54.Danhier F, Ansorena E, Silva JM, Coco R, Le Breton A, Preat V. PLGA-based nanoparticles: an overview of biomedical applications. Journal of controlled release : official journal of the Controlled Release Society. 2012;161(2):505–22. doi: 10.1016/j.jconrel.2012.01.043. Epub 2012/02/23.
  • 55.Szlek J, Paclawski A, Lau R, Jachowicz R, Mendyk A. Heuristic modeling of macromolecule release from PLGA microspheres. International journal of nanomedicine. 2013;8:4601–11. doi: 10.2147/IJN.S53364. Epub 2013/12/19.
  • 56.Marvin cxcalc plugin, UK. 5.11 ed. Budapest, Hungary: ChemAxon.
  • 57.Matsumura Y, Oda T, Maeda H. General mechanism of intratumor accumulation of macromolecules: advantage of macromolecular therapeutics. Gan to kagaku ryoho Cancer & chemotherapy. 1987;14(3 Pt 2):821–9. Epub 1987/03/01.
  • 58.Jain RK. Barriers to drug delivery in solid tumors. Scientific American. 1994;271(1):58–65. doi: 10.1038/scientificamerican0794-58. Epub 1994/07/01.
  • 59.Boso DP, Lee SY, Ferrari M, Schrefler BA, Decuzzi P. Optimizing particle size for targeting diseased microvasculature: from experiments to artificial neural networks. International journal of nanomedicine. 2011;6:1517–26. doi: 10.2147/IJN.S20283. Epub 2011/08/17.
  • 60.Harashima H, Sakata K, Funato K, Kiwada H. Enhanced hepatic uptake of liposomes through complement activation depending on the size of liposomes. Pharmaceutical research. 1994;11(3):402–6. doi: 10.1023/a:1018965121222. Epub 1994/03/01.
  • 61.Ren J, Hong H, Song J, Ren T. Particle size and distribution of biodegradable poly-D,L-lactide-co-poly(ethylene glycol) block polymer nanoparticles prepared by nanoprecipitation. J Appl Polym Sci. 2005;98:1884–90.
  • 62.Asadi H, Rostamizadeh K, Salari D, Hamidi M. Preparation of biodegradable nanoparticles of tri-block PLA-PEG-PLA copolymer and determination of factors controlling the particle size using artificial neural network. Journal of microencapsulation. 2011;28(5):406–16. doi: 10.3109/02652048.2011.576784. Epub 2011/07/09.
  • 63.Mukherjee B, Santra K, Pattnaik G, Ghosh S. Preparation, characterization and in-vitro evaluation of sustained release protein-loaded nanoparticles based on biodegradable polymers. International journal of nanomedicine. 2008;3(4):487–96. doi: 10.2147/ijn.s3938. Epub 2008/01/01.
  • 64.Sattler KD. Handbook of Nanophysics: Nanoparticles and Quantum Dots. CRC Press; 2010.
  • 65.Sinko PJ. Martin’s Physical Pharmacy and Pharmaceutical Sciences: Physical Chemical and Biopharmaceutical Principles in the Pharmaceutical Sciences. Lippincott Williams & Wilkins; 2006.
  • 66.Belbella AA, Vauthier C, Fessi H, Devissaguet JP, Puisieux F. In vitro degradation of nanospheres from poly(D,L-lactides) of different molecular weights and polydispersities. International journal of pharmaceutics. 1996;129:95–102.
  • 67.Schärtl W. Light Scattering from Polymer Solutions and Nanoparticle Dispersions. Springer-Verlag Berlin Heidelberg; 2007.
  • 68.Esmaeilzadeh-Gharehdaghi E, Faramarzi MA, Amini MA, Moazeni E, Amani A. Processing/formulation parameters determining dispersity of chitosan particles: an ANNs study. Journal of microencapsulation. 2014;31(1):77–85. doi: 10.3109/02652048.2013.805842. Epub 2013/06/26.
  • 69.Dougherty ER, Hua J, Bittner ML. Validation of computational methods in genomics. Current Genomics. 2007;8(1):1–19. doi: 10.2174/138920207780076956.
  • 70.Lee C, Lee GG. Information gain and divergence-based feature selection for machine learning-based text categorization. Information Processing & Management. 2006;42(1):155–65.
  • 71.Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F. A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics. 2012;42(4):463–84.
  • 72.Tantra R, Oksel C, Puzyn T, Wang J, Robinson KN, Wang XZ, et al. Nano(Q)SAR: Challenges, pitfalls and perspectives. Nanotoxicology. 2015;9(5):636–42. doi: 10.3109/17435390.2014.952698. Epub 2014/09/12.
  • 73.Oksel C, Ma CY, Liu JL, Wilkins T, Wang XZ. (Q)SAR modelling of nanomaterial toxicity: A critical review. Particuology. 2015;21:1–19.
