Nanomedicine. 2024 Jun 21;19(14):1271–1283. doi: 10.1080/17435889.2024.2359355

The role of artificial intelligence and data science in nanoparticles development: a review

Rodrigo Fonseca Silveira a, Ana Luiza Lima a, Idejan Padilha Gross a, Guilherme Martins Gelfuso a, Tais Gratieri a, Marcilio Cunha-Filho a,*
PMCID: PMC11285233  PMID: 38905147

ABSTRACT

Artificial intelligence has revolutionized many sectors with unparalleled predictive capabilities supported by machine learning (ML). So far, however, this tool has not driven a comparable level of development in pharmaceutical nanotechnology. This review discusses the current data science methodologies related to polymeric drug-loaded nanoparticle production from an innovative multidisciplinary perspective while considering the strictest data science practices. Several methodological and data interpretation flaws were identified by analyzing the few qualified ML studies. Most issues stem from skipping appropriate analysis steps, such as cross-validation, data balancing, or testing alternative models. Thus, better-planned studies following the recommended data science analysis steps, along with adequate numbers of experiments, would change the current landscape, allowing the exploration of the full potential of ML.

Keywords: artificial neural network, data mining, data science, machine learning, polymeric nanoparticle, quality by design

Plain language summary

Executive summary.

Data science

  • The main methodologies in data science are presented, including knowledge discovery in databases (KDD).

  • Extracting new knowledge from datasets requires the realization of critical steps.

  • The model should be validated outside the laboratory to assess its real-world applicability.

State-of-the-art polymeric nanoparticles assisted by data science

  • Only a few qualified studies used machine learning to obtain polymeric nanoparticles.

  • Several methodological flaws were identified, considering modern practices in data science.

  • Databases commonly showed an inappropriate number of experiments.

  • All the studies were performed using commercial software without personalized computer programming.

  • Several studies exhibited models with apparent problems of overfitting or selection bias.

Opportunities for improvements

  • Data science analysis steps should be scrupulously followed with the assistance of an expert in the field.

  • Performing appropriate data balancing and an adequate number of experiments is crucial.

  • Better-planned studies using machine learning could lead to reliable and helpful real-life models.

1. Background

Over time, data science has progressed fast, uncovering new knowledge hidden in data and underpinning some of humanity's most significant recent achievements. Algorithms and models for classifying or predicting information have gained prominence, particularly those based on the machine learning (ML) approach. In this context, the rise of artificial neural networks (ANN) has revolutionized the field, especially due to the bioinspired architecture that allows the capture of highly complex relationships and its growing capacity to learn and adapt with the progressive increase in available knowledge [1]. Thus, data science has supported decision-making in many fields, from finance to education to pharmaceuticals [2].

The modern framework of data science has an essentially interdisciplinary nature, not only because of its potential application in all areas of knowledge but also because it requires analytical steps from very specialized domains. For example, computer science is essential to select, prepare, and process datasets using programming codes. Statistics, in turn, creates models based on data distribution, probability, and resampling techniques. Additionally, mathematical science is indispensable in recognizing models associated with time series, such as ANN, or algebraic concepts, such as vectors and matrices. Finally, design and marketing are necessary to present the results intuitively and naturally for the project's target audience, known as ‘data storytelling’ [2]. In this intricate association of expertise, if any step fails, the result is invariably compromised, making the conclusions obtained useless or even dangerously wrong.

Nanotechnology, coincidentally, presents similar development milestones and has an equally disruptive character. The possibility of manipulating matter in nanometric dimensions has demonstrated remarkable potential in energy, environment, and health. However, nanotechnology is still far from having its full potential explored on an industrial scale [3]. Even though most scientific studies in nanotechnology have entirely empirical developments, data science has been used to obtain predictive models related to specific properties of nanoparticles and their production process [4–6].

Indeed, computer-aided design of experiments has become a valuable addition to the systematic quality-by-design approach in pharmaceutical drug development [7]. In fact, the basis of quality-by-design lies in describing the product's desired quality attributes using computational tools such as molecular docking and surface response methods [8,9].

However, these tools alone seem rather simplistic, considering the sophisticated processes available in data science. Response surface methodology uses more straightforward methods, such as linear and polynomial regression, to model the data and cannot capture the complexity and diversity of some real-world phenomena [8]. In addition, it can be sensitive to outliers, noise, and measurement errors in the data. In the case of molecular docking, inaccurate treatment of ligand flexibility or binding energy due to the lack of reliable information limits its scope of action to preliminary research phases [7].

On the other hand, ML models that can generalize different complex processes could fill these gaps and make more accurate predictions, mainly when supported by balanced input samples during model calibration. The ability of ML to recognize patterns and optimize processes makes it a valuable tool to support the development of nanomedicines that align with the overall goals of quality-by-design.

In fact, ML has been used to support the development of different types of drug products, including those tailored to assess in vivo/in vitro correlations [10], dissolution profiles [11,12], and skin permeation characteristics [13–15]. ML tools have also been explored in obtaining drug-loaded nanoparticles, particularly for the prediction of the particle size [4–6,16–24], which holds great importance in drug biodistribution [25,26] and drug delivery [27]. Furthermore, ML models have been created to predict the zeta potential [4,19,21] and encapsulation efficiency [21,24,28], which can be crucial for stability and drug targeting. Despite the recognized benefits of this approach, ML is far from being a reality accessible for the pharmaceutical industry in nanostructured systems.

In this scenario, this review proposes, for the first time, to perform a critical analysis of the data science methodologies and protocols used in the predictive models for drug-loaded nanoparticle properties obtained from ML. Specifically, this study focuses on the evaluation of the ML state-of-the-art related to polymeric nanoparticles, which are one of the most common nanoparticles used for therapeutic purposes, identifying if the proper steps for building a reliable model were followed and how the chosen models can be used to solve real-world problems outside the laboratory. Finally, opportunities for improvements in data science in this field are discussed.

2. Data science

2.1. Methodologies & analysis steps

The development of sophisticated methodologies for data science over time has helped to organize the critical steps in extracting new knowledge from datasets [29]. One of the most renowned processes is the cross-industry standard process for data mining (CRISP-DM) [30], which combines the best practices to make data mining as productive and efficient as possible. This tool has been used in financial data, human resources, production, and customer habits, mainly focused on business understanding and deployment aspects.

The knowledge discovery in databases (KDD), in turn, is another recognized method for developing strategies to discover knowledge from data, focusing on data mining steps that have been used to identify medical problems [31], in the field of psychology for text mining [32], or even for classifying energy quality in feeders [33]. Another approach, the data mining process known as SEMMA, involves five key stages – sampling, exploration, modification, modeling, and assessment. Such a system has proven helpful in identifying the best techniques for handling large amounts of data [34], improving probabilistic sound predictions [35], or building a predictive model for loan defaults [36]. Despite their particularities, they all have similar steps and common goals [37], as shown in Table 1.

Table 1.

Comparison of the steps of the data science methods knowledge discovery in databases (KDD), cross-industry standard process for data mining (CRISP-DM) and SEMMA.

Knowledge discovery in databases | CRISP-DM | SEMMA | Points to be addressed
— | Business understanding | — | What does the business need?
Data selection and integration | Data understanding | Sample | What data are available or needed? Are they clean?
Data cleaning and preprocessing | — | Explore | —
Transformation | Data preparation | Modify | How to organize the data for modeling?
Data mining | Modeling | Model | What modeling techniques could be used?
Pattern evaluation/Interpretation | Evaluation | Assessment | Which model best meets the business goals?
— | Deployment | — | How can stakeholders access the results in the real world?

The first phase considered crucial in data analysis by CRISP-DM is business understanding, in which the target product or research objective is outlined. This step is part of identifying success criteria, assessing the resources needed, evaluating the associated risks, developing a project plan, and analyzing the related costs and benefits [29].

The next step, common to the different methodologies (understanding/data selection/sampling phase), is focused on gathering essential information [37]. The focus is on describing, exploring, and understanding the data, ensuring quality, and preventing the issue of imbalanced data [29].

In the data preparation/transformation/modification phase, various techniques are applied to eliminate data imbalances, for example, oversampling or undersampling methods [38]. Oversampling involves supplementing the original dataset with artificially generated data from the existing data source [29]. Conversely, undersampling reduces the data in the original dataset to achieve a balanced representation. Then, the most appropriate modeling techniques are chosen based on the previously analyzed input characteristics and the desired target solution. Models can vary based on the specific problem [37].
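The two balancing strategies above can be condensed into a minimal sketch in Python (assuming NumPy); the two-class particle-size dataset is hypothetical and serves only to illustrate the mechanics:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative unbalanced dataset: 20 'small particle' experiments
# versus only 5 'large particle' experiments.
small = rng.normal(loc=100.0, scale=10.0, size=(20, 1))  # majority class
large = rng.normal(loc=300.0, scale=10.0, size=(5, 1))   # minority class

# Oversampling: replicate minority-class rows (sampling with replacement)
# until both classes contain the same number of examples.
idx = rng.integers(0, len(large), size=len(small))
large_over = large[idx]
assert large_over.shape == (20, 1)

# Undersampling: randomly discard majority-class rows instead.
idx = rng.choice(len(small), size=len(large), replace=False)
small_under = small[idx]
assert small_under.shape == (5, 1)
```

Oversampling preserves all measured data but duplicates information; undersampling avoids duplication at the cost of discarding experiments, which is rarely affordable with small nanoparticle datasets.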

The results achieved are carefully examined in the evaluation/interpretation/assessment phase, and their compliance with the success criteria defined in the business understanding phase is assessed. In this phase, a decision is made on whether to continue with the implementation step or to go through the previous phases again if the results fall short of expectations. In that situation, all necessary steps are repeated until the success criteria are met and the desired results are achieved [30]. This evaluation process is crucial for determining the effectiveness and validity of the developed model.

Finally, the implementation plan is executed to put the developed model into practice, a phase the CRISP-DM methodology explicitly designates as Deployment. As a part of this process, it is crucial to establish a robust monitoring and maintenance plan to ensure the continued effectiveness and usefulness of the deployed model in the long term [29,37].

2.2. Basic considerations on ANN

The functioning of biological neural networks inspires ANN, which usually consist of three layer levels: the input, the hidden, and the output layers [1,39]. The input layers of the neural networks receive the available database, which can be composed, for example, of experimental and/or theoretical results. The hidden layers perform the required calculations and transformations on the input data. Finally, the output layer generates the results or predictions based on the processed inputs.

The multilayer perceptron is an extended version of the feedforward neural network. The term feedforward is derived from the unidirectional flow of information within the network, where data move forward from the input layers through the hidden layers to the output layer [1,39,40]. The training of neurons in a multilayer perceptron is usually performed using the backpropagation learning algorithm, which facilitates the adjustment of weights and biases and improves the precision and accuracy of predictions. It serves as a fundamental mathematical mechanism in data mining and ML. Multilayer perceptrons can approximate any continuous function and solve problems that are not linearly separable. Unlike traditional algorithms, an ANN is distinguished by its ability to perform better on large amounts of data [40,41].
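As a minimal illustration of such a feedforward multilayer perceptron trained by backpropagation, the sketch below uses scikit-learn's MLPRegressor on a synthetic nonlinear dataset; the two input variables and the particle-size-like response are hypothetical, not taken from any reviewed study:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic dataset: two formulation variables mapped to a response
# through a nonlinear (not linearly separable) relationship.
X = rng.uniform(0, 1, size=(200, 2))
y = 100 + 150 * X[:, 0] * X[:, 1] + 20 * np.sin(6 * X[:, 0])

# Feedforward network: input layer -> one hidden layer of 8 neurons ->
# output layer; weights and biases are adjusted by backpropagation.
mlp = MLPRegressor(hidden_layer_sizes=(8,), activation="relu",
                   max_iter=5000, random_state=0)
mlp.fit(X, y)

print(round(mlp.score(X, y), 2))  # coefficient of determination on training data
```

The single hidden layer is enough here because the synthetic function is smooth; real nanoparticle data would require the validation safeguards discussed later in this review.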

ANN also has hyperparameters that must be specified before training the network [42,43]. These hyperparameters include the activation function, the number of nodes in the hidden layer, and the number of epochs. The activation function of a neuron in an ANN determines whether it should be activated based on the inputs it receives [42,44,45]. Different activation functions can be used depending on the nature of the problem.

The number of neurons in the hidden layer of an ANN is a crucial hyperparameter. Typically, the number of neurons in the hidden layer can be chosen between the size of the input layer and the size of the output layer [39,43]. If too few neurons are used, the model may not have enough capacity to capture the underlying patterns in the data, resulting in inadequate fitting. On the other hand, too many neurons can lead to overfitting, where the model memorizes the training data instead of learning generalizable patterns [39,42].

The epoch refers to a complete cycle over the entire training dataset during ANN training. Determining an appropriate number of epochs is critical to achieving optimal performance.

A reasonable starting point for the number of epochs is often three times the number of columns or features in the dataset [39,42]. Selecting the correct hyperparameters for an ANN requires experimentation and fine-tuning to find the optimal configuration that provides the best performance and generalization ability for the problem [39,40]. It demands in-depth knowledge of the nature of the data analyzed and is carried out in a non-automated way.
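The two starting-point heuristics above (a hidden-layer size between the input and output layer sizes, and an epoch count of roughly three times the number of features) can be condensed into a small helper. This is only an illustrative sketch of the rules of thumb, not a tuning procedure, and the function name is our own:

```python
def starting_hyperparameters(n_inputs: int, n_outputs: int) -> dict:
    """Rough starting points for ANN hyperparameters, following the
    heuristics discussed above; fine-tuning is still required."""
    # Hidden-layer size: somewhere between the input and output layer
    # sizes; the midpoint is a common first guess.
    hidden = max(n_outputs, (n_inputs + n_outputs) // 2)
    # Epochs: roughly three times the number of input features.
    epochs = 3 * n_inputs
    return {"hidden_neurons": hidden, "epochs": epochs}

# e.g., 9 formulation/process variables predicting size, PDI and zeta potential:
print(starting_hyperparameters(9, 3))  # {'hidden_neurons': 6, 'epochs': 27}
```

Such defaults only bound the initial search; the final configuration must still be selected by experimentation, as the text notes.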

2.3. State-of-the-art polymeric nanoparticles assisted by data science

A literature review was performed to identify studies involving the intersection of polymeric nanoparticles and data science. Web of Science, Scopus, and Google Scholar were used as databases. The search criteria involved combining keywords such as ‘polymeric nanoparticles’ with relevant data science terms, including ‘machine learning,’ ‘prediction,’ ‘artificial intelligence,’ ‘neural network’ and ‘deep learning’. No time restrictions were used in order to obtain all possible results. Only articles that specifically addressed the production of polymeric nanoparticles for therapeutic application were included. Accordingly, specific exclusion criteria were applied, eliminating studies, for example, that produced metallic nanoparticles or focused on environmental applications. Table 2 provides a comprehensive overview of data science studies that specifically address the production of polymeric nanoparticles and fulfill pre-established criteria.

Table 2.

Studies of polymeric nanoparticle production using data science models.

Year | Input features | Output features | Program package | Data size | Ref.
2005 | Energy interaction polymer/drug, polymer, hydration energy | Encapsulation efficiency | NeuralWare | 18 | [28]
2011 | Polymer viscosity, drug content, S/W ratio, mixing rate | Particle size | MATLAB | 51 | [6]
2011 | Polymer concentration, surfactant concentration, storage temperature | Particle size | MATLAB | 49 | [16]
2014 | Solution flow rate, washing time, ultrasonic power, molar ratio of CO2 and solvent, agitation time | Particle size | NeuralWare | 10 | [17]
2017 | Needle diameter, voltage, polymer/enzyme concentration | Particle size | INForm | 30 | [18]
2017 | Viscosity, contact angle, polymer concentration, S:W ratio, interfacial tension | Particle size, polydispersity index | Visual Gene Developer | 24 | [5]
2018 | Acetone concentration, polymer concentration, total volume, solvent/antisolvent ratio, agitation type, agitation intensity, temperature, mixing time, flow type | Particle size, polydispersity index, zeta potential | FormRules | 189 | [4]
2021 | Type of stabilizer, stabilizer concentration, solvent/antisolvent ratio, solvent volume, mixing time, flow rate, acetone concentration, polymer molecular weight, polymer concentration | Particle size, polydispersity index, zeta potential | FormRules | 299 | [19]
2014 | Polymer molecular weight, polymer and drug ratio, number of blocks | Particle size | MATLAB | 27 | [20]
2015 | CaCl2 concentration, homogenizer speed, agar concentration, polymer concentration | Particle size, polydispersity index, zeta potential, encapsulation efficiency | MATLAB | 60 | [21]
2016 | Polymer concentration, thiamine pyrophosphate concentration, polymer/thiamine pyrophosphate mass ratio | Particle size | MATLAB | 45 | [22]
2012 | Polymer concentration, buffer pH, sonication amplitude, sonication time | Particle size | INForm | 52 | [23]
2017 | Polymer concentration, albumin concentration, stirring time | Particle size, encapsulation efficiency, cytotoxicity | INForm | 30 | [24]

2.4. Output features used in ANN

The particle size is one of the most important characteristics to be monitored in the production of nanoparticles, and it can be assessed easily using, for example, dynamic light scattering. This characteristic is decisive for the therapeutic use of nanostructured systems, as it influences their pharmacokinetics and stability [17,46,47]. Indeed, studies have reported a significant correlation between size and drug absorption and have highlighted that particle size is an essential factor for their biodistribution profile [6].

Unsurprisingly, this output is present in 12 of the 13 studies selected for this analysis (Table 2). At the same time, it is a challenging parameter to control since even the exact reproduction of nanoparticle-obtaining protocols can sometimes result in different particle sizes, showing that this parameter is influenced by multiple non-linear factors [48].

The encapsulation efficiency is another crucial parameter in nanostructured drug-delivery systems. It measures the percentage of drug encapsulated/entrapped within the nanoparticles. Such property is governed by the interplay of several physicochemical and processing parameters. In fact, drug properties, including size, charge, and lipophilicity, influence their interaction with the polymer matrix. Likewise, polymer characteristics, such as molecular weight, hydrophobicity, and functional groups, also directly impact drug-polymer interactions and loading capacity.

Moreover, the preparation method plays a significant role in determining the encapsulation efficiency [49]. In this context, factors such as solvent selection, mixing ratios, and processing parameters (for example, sonication, emulsification, and stirring method) may significantly affect the encapsulation efficiency. Additionally, external factors such as temperature, pH, and stabilizers can also affect drug stability and interactions within the nanoenvironment, thereby impacting encapsulation yields.

Despite its relevance for nanostructured medicines, only three of the 13 selected studies (Table 2) included encapsulation efficiency as a model output. The laborious determination of this parameter limits its modeling. The drug content required for the calculations is usually determined by complex analytical methods employing high-performance liquid or gas chromatography [50,51]. A model capable of making predictions regarding this parameter would have to be built with many drugs, interspersing formulation and production variables, making the experimental execution difficult.

Zeta potential, which refers to the electrical potential at the hydrodynamic shear plane surrounding the colloidal particles, is another critical nanoparticle characteristic. The stability of these systems during the nanostructured medicine's shelf life is highly dependent on this parameter [46]. Despite its easy experimental determination, only a few studies (three of 13) included this parameter in the model (Table 2).

Controlling parameters within the desired ranges and reproducing results, especially when there is a change in production scale, are common issues in nanoparticle production since numerous factors strongly influence such features. This characteristic raises the need for multiple experiments. However, it is also the primary indication that data science prediction algorithms such as ANN can radically leverage the development of nanoparticles based on reliable models [28].

2.5. Computer-aided design of experiments

Initial database searches returned 278 results involving this review's topics of interest. However, most of them use the surface response approach and a variant of it, the Box-Behnken experimental design – a surface response experiment that works with a fractional experimental design and estimates first and second-order coefficients [52]. This method proves advantageous when the safe operating range of the process is known, which is hardly the case when it comes to polymeric nanoparticles.

The surface response methodology is a valuable tool in experimental design, offering distinct advantages in optimizing processes and understanding the intricate relationships between input variables and desired outcomes. This methodology facilitates the exploration of the design space by systematically varying factors and analyzing their impact on the response of interest. Moreover, the approach allows for identifying optimal conditions, improving efficiency and cost-effectiveness in pharmaceutical and chemical processes. One of the main advantages of such a method lies in its ability to provide a mathematical model with a relatively small number of experiments. However, surface response can produce biased results when factors vary widely or in the presence of outliers, especially when classical experimental principles such as repetition and control variables are not strictly followed [21,52]. Furthermore, the response surface uses more straightforward methods, such as linear and polynomial regression, to model the data, potentially limiting the accuracy of predictions in the highly nonlinear systems inherent in the complexity and diversity of some real-world phenomena [21,53,54].
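A second-order response surface of the kind described above amounts to an ordinary least-squares fit over linear, quadratic, and interaction terms. The sketch below, assuming scikit-learn and a synthetic dataset with hypothetical factors, makes both its economy and its limitation visible: the model fits well precisely because the underlying relationship was chosen to be quadratic:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

# Two hypothetical factors (e.g., polymer and surfactant concentration)
# and a noisy quadratic response, as response surface methodology assumes.
X = rng.uniform(0, 1, size=(30, 2))
y = (50 + 40 * X[:, 0] - 30 * X[:, 1] ** 2
     + 10 * X[:, 0] * X[:, 1] + rng.normal(0, 1, 30))

# Second-order response surface: linear, quadratic and interaction
# terms fitted by ordinary least squares.
surface = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
surface.fit(X, y)
print(round(surface.score(X, y), 3))  # high only while the truth is quadratic
```

Replacing the quadratic ground truth with a sharper nonlinearity would degrade this fit, which is the gap that ML models such as ANN are argued to fill.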

Another very common computational approach that has been used is molecular docking. Such a theoretical approach consists of a computational method generally based on molecular mechanics, which involves describing the polyatomic system using classical physics, often parametrized by quantum-mechanical semiempirical and ab initio theoretical calculations [55]. In this context, molecular docking is often used to simulate the interaction of two molecules, which represents an interesting strategy to simulate the interaction between different nanoparticles and drugs, testing the best combination of materials to maximize the drug-loading capacity, for example [56]. However, molecular docking faces challenges, especially in accurately predicting complicated intermolecular interactions [56]; in addition, the complexity of the experimental conditions involved in nanoparticle synthesis restricts its viability.

ANN, in contrast, can learn from data without requiring prior knowledge or assumptions about the process. It can also handle nonlinear and high-dimensional data, which appears to be the case with nanoparticles in general [42]. High-dimensional data are naturally a feature of nanoparticle production, which depends on countless factors such as stabilizer concentration, solvent volume, mixing time, polymer molecular weight, polymer concentration, temperature, active ingredient content, and storage temperature [5,16,17,19,22]. The studies that compared the performance of ANN to the surface response method demonstrated its superiority for almost all outcomes of their models, even when using an insufficient number of experiments [21,53,57].

Traditional computational methods are practical because they require fewer experiments and rely on simpler equations to model the data [54,57]. They are also more accessible to laypeople, since plenty of proprietary software options exist, whereas ANN requires programming and more non-automated operations to make the tuned model work. In addition, data science algorithms require extensive upfront experimentation for calibration and computer programming. Therefore, most computer-aided nanotechnology research relies on traditional computational methods over ML. Despite their experimental feasibility, however, such models are limited and have little application in the real world. Hence, this review focused on the 13 qualified studies on polymeric nanoparticle fabrication using ML.

2.6. Computer languages, frameworks, & programming for data science

Computer programming is central to data science, providing tools – in code, software, or model applications – to apply knowledge to real-world problems and facilitate their solution. Studying data science requires knowledge of computer programming, using at least one programming language directly or through specialized software [2].

Using a programming language directly lets the programmer choose all the steps and variables in a model. It also allows the latest packages to be incorporated without the disadvantage of being tied to a specific software version, the update of which depends on the manufacturer [2,58]. Studies highlight Python, R, and SQL as the most advantageous programming languages for data science [59–61]. Although Java and C++ see some use, Python and R stand out for graphical analysis and the implementation of ML models, making them particularly efficient for data cleansing and integration tasks [59–61].

In the studies reviewed, five used MATLAB software to run their ML models, while the remainder used other proprietary packages. This pattern strongly indicates that these authors are less familiar with the state-of-the-art tools commonly used in data science.

Although MATLAB is recognized as a fast and powerful language for data science, offering an excellent graphics package, it offers fewer ready-to-use ML algorithm packages and is more complex to use than Python and R [53]. The NeuralWare software, used in two studies, was last updated in 2011 and offers only feedforward neural networks trained with backpropagation for prediction and classification problems and Kohonen self-organizing maps for clustering problems (https://www.neuralware.com/index.php/products). The latest version of the INForm software, used in three studies, was released in 2014 (https://www.inform-software.com/en/expertise/downloads). The newest version of Visual Gene Developer, used in one of the articles, dates from 2019 and was developed to optimize synthetic genes, offering only ANN algorithms (https://www.visualgenedeveloper.net/Download.html). FormRules, utilized by two articles, was last updated in 2016 and only works with neuro-fuzzy systems with a limited number of input variables (https://www.bioz.com/result/formrules). Thus, considering how quickly data analysis procedures evolve, proprietary software does not offer the necessary flexibility in ML and is not updated enough for the dynamic landscape of data science.

2.7. Number of experiments

The amount of data required to train an ML model is influenced by several factors, the most important of which is the number of parameters the algorithm must learn. A widely recognized benchmark for assessing the adequacy of the dataset is the application of the ‘rule of 10’, a common rule of thumb [62]. This guideline specifies that the minimum amount of input data should be ten times greater than the number of degrees of freedom of the model [62]. Some studies suggest that a dataset of more than 100 instances is generally sufficient for most models with complexity higher than linear [63,64].
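The rule of 10 can be made concrete with a short calculation; the network dimensions below are hypothetical, chosen only to show how quickly the required number of experiments grows:

```python
def minimum_dataset_size(n_parameters: int, factor: int = 10) -> int:
    """'Rule of 10': the dataset should hold at least ten examples
    per degree of freedom (learnable parameter) of the model."""
    return factor * n_parameters

# A small MLP with 2 inputs, 4 hidden neurons and 1 output has
# 2*4 weights + 4 biases in the hidden layer, plus 4*1 weights
# + 1 bias in the output layer = 17 learnable parameters:
n_params = 2 * 4 + 4 + 4 * 1 + 1
print(minimum_dataset_size(n_params))  # 170
```

Even this very small network would call for roughly 170 experiments, which puts the data sizes in Table 2 into perspective.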

It is important to note that following these recommendations does not guarantee optimal model accuracy; they are, however, valuable initial benchmarks in most cases. Even so, less than half of the selected studies complied with such sample size requirements (Table 2), which is quite surprising considering that most employed ANN exclusively. If an insufficient number of experiments is utilized, the model may lack the capacity to effectively capture the intricate underlying patterns within the data, leading to suboptimal fitting and compromised generalization performance.

The use of a limited sample amount may be due to the technical challenges involved in nanoparticle production and physicochemical characterization. The comprehensive characterization of colloidal dispersions involves laborious processes, as discussed in the previous section. This multilayered characterization can extend over several months with enormous associated costs. Consequently, conducting many experiments is possibly the biggest challenge, depending on the team resources available and the deadline for completing the research [65,66].

2.8. Useful data manipulation techniques

Some data manipulation procedures, generally unavailable in proprietary software, can be applied in ML to improve the dataset's quality. Strategies such as data equalization techniques, cross-validation methods, and fine-tuning of hyperparameters are particularly interesting for producing nanoparticles as they can reduce the number of experiments necessary to build a model with adequate accuracy.

Cross-validation can be a valuable strategy for small datasets. This technique facilitates a more objective and precise evaluation of an ML model's performance by dividing the available data into distinct training and test sets. It is explicitly recommended for small datasets, as it helps to make better use of the available data and to avoid problems of overfitting or selection bias [41]. Such verification is particularly important in domains such as nanoparticle synthesis development, where experimental details can be of great significance. A one-time, randomized training/testing split was used in all articles examined (Table 2). For small datasets, however, relying on a single split is inadvisable. A solution is provided by k-fold cross-validation, in which the dataset is randomly split into k subsets, and the model is trained and tested k times, mitigating the effects of chance and improving generalization ability (Figure 1).

Figure 1.


Cross-validation strategies for small datasets. (A) Scheme in which only one random split of the dataset was made. (B) A scheme where parts of the dataset are split n-times randomly and the model is trained over and over, with the final accuracy being an average of these executions. The blank portion is the testing data, and the gray portions are the training data, randomly changed every round.
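A k-fold scheme such as the one in Figure 1B can be sketched with scikit-learn; the dataset below is synthetic and merely illustrative, and ridge regression stands in for whatever model is being evaluated:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(7)

# Small illustrative dataset: 30 'experiments', 3 formulation variables.
X = rng.uniform(0, 1, size=(30, 3))
y = 120 + 80 * X[:, 0] - 40 * X[:, 1] + rng.normal(0, 2, 30)

# 5-fold cross-validation: the 30 samples are split into 5 folds and the
# model is trained/tested 5 times, each fold serving once as test set.
cv = KFold(n_splits=5, shuffle=True, random_state=7)
scores = cross_val_score(Ridge(), X, y, cv=cv)
print(len(scores), round(scores.mean(), 2))  # mean R2 over 5 held-out folds
```

Reporting the mean (and spread) of the per-fold scores, rather than the accuracy of a single lucky split, is precisely what guards against the overfitting and selection-bias issues discussed in the text.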

Another essential analysis relates to the problem of unbalanced data, which often occurs in scenarios with small datasets and can significantly affect the performance of ML models [38]. A dataset is considered unbalanced if there is a noticeable disproportion in the number of examples in one or more classes compared with the others. In such cases, ML algorithms lose effectiveness and produce inaccurate classifications and regressions, as the majority classes are favored while the minority classes have a lower recognition rate. In situations with multiple variables and continuous values, such as the particle size, even a slight imbalance in the class distribution on the input layer can lead to accuracy errors. Therefore, it is essential to perform a comprehensive balance analysis at the beginning of the study to mitigate such problems [41]. Furthermore, when dealing with dependent variables that collectively influence the same outcome, it becomes crucial to address not only the individual balance of each variable in isolation but also to carefully consider and balance the intercorrelation among the data.

Unbalanced data issues can be addressed using under- and over-sampling techniques. For small datasets, oversampling can be a valuable strategy. Different oversampling methods exist, including oversampling with noise [67], in which a fixed, randomly generated number is added to or subtracted from the mean of the unbalanced class. Another effective strategy is the synthetic minority oversampling technique (SMOTE), which produces synthetic data points based on the K-nearest neighbors algorithm [67]. These approaches can help overcome the challenges of imbalanced data and improve the performance of machine learning models.
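Both strategies can be sketched with numpy alone (a simplified illustration, not the reference SMOTE implementation [67]; the function names and the noise scale are assumptions made for this example):

```python
import numpy as np

def oversample_with_noise(X_minority, n_new, scale=0.05, seed=0):
    """Duplicate random minority samples and perturb them with Gaussian noise
    scaled by each feature's standard deviation."""
    rng = np.random.default_rng(seed)
    picks = rng.integers(0, len(X_minority), size=n_new)
    noise = rng.normal(0.0, scale, size=(n_new, X_minority.shape[1]))
    return X_minority[picks] + noise * X_minority.std(axis=0)

def smote_like(X_minority, n_new, k=3, seed=0):
    """SMOTE-style synthesis: interpolate each picked sample toward one of
    its k nearest minority neighbours at a random fraction."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(0, len(X_minority))
        dists = np.linalg.norm(X_minority - X_minority[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbours)
        frac = rng.random()
        synthetic.append(X_minority[i] + frac * (X_minority[j] - X_minority[i]))
    return np.array(synthetic)
```

Because the synthetic points lie on segments between real minority samples, they stay within the region the minority class already occupies.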

The SMOTE technique is also recommended for small sample sizes, particularly when the output values associated with each input range are known to be highly accurate. Synthetic samples can be generated either through a K-means-based selection or by adding a small, carefully curated amount of deviation (noise) to the actual values; the precision of the original data then carries over to the synthetic points and helps improve the model's overall accuracy. It is important to note that this technique should be validated with new tests before being consolidated into the model [67].

No data-balancing analyses were performed in any of the studies reviewed. Only one study mentioned such a step [28], where it was executed automatically by the NeuralWare software without a detailed description of the procedure. All other studies randomly divided the data into training and test sets. In one study [51], for example, only four experiments were used for testing, and an accuracy of 94% was obtained, probably due to overfitting [6]. The same problem was identified in another study [16], in which 40 and nine experiments were used for training and testing, respectively, and an accuracy of 97% was achieved.

2.9. Authors' academic background

A noticeable diversity of academic backgrounds emerged when evaluating the training areas of the authors of the selected publications: only 4% have academic training in computer science. Most authors (42%) have a degree in pharmacy, followed by 13% in chemical engineering and nanomedicine (Figure 2). The papers that used MATLAB have authors with degrees in mathematics and engineering.

Figure 2.

Authors' training areas related to the selected articles involving ANN in producing polymeric nanoparticles for pharmaceutical purposes.

The lack of knowledge about data science methods and techniques appears to be an important factor that prevented the papers from achieving better results. Indeed, using a programming language is essential to properly follow the steps established in data science for ANN, as already discussed. The fact that none of the papers used languages such as R, SQL, or Python strongly suggests that the authors are not fully integrated into the ML and data science environment and that their models have possibly never been deployed or made available for external validation.

Furthermore, the reviewed studies did not adequately perform the data analysis and preparation steps. In these steps, the input variables are thoroughly examined: a correlation analysis should determine which of them could be discarded; any imbalance in the data should be corrected so it does not compromise the model's accuracy; and outliers must be identified and removed from the database. Frequently, converting continuous variables into discrete ones reduces the complexity of the model and is a recommended procedure. Sometimes, grouping one or more variables is an alternative way to reduce complexity [2,37].
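Two of these preparation steps, correlation-based variable screening and outlier removal, can be sketched with numpy (a minimal illustration; the 0.95 correlation threshold and the 3-sigma z-score cutoff are common rule-of-thumb assumptions, not values from the reviewed studies):

```python
import numpy as np

def drop_correlated(X, names, threshold=0.95):
    """Flag one variable from each highly correlated pair for removal."""
    corr = np.corrcoef(X, rowvar=False)
    drop = set()
    for i in range(len(names)):
        if names[i] in drop:
            continue
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                drop.add(names[j])
    keep = [n for n in names if n not in drop]
    return keep, sorted(drop)

def remove_outliers(x, z=3.0):
    """Keep observations within z standard deviations of the mean."""
    scores = np.abs((x - x.mean()) / x.std())
    return x[scores < z]
```

Screening before training removes redundant inputs that would otherwise inflate model complexity without adding information.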

In the field of ML, it is imperative to recognize that no single model is suitable for solving all problems [1,39,68]. The choice of the optimal model depends on factors such as the size and structure of the input dataset. Only three of the analyzed articles tested alternative models. Trying multiple models is worthwhile because each can differ in complexity, accuracy, interpretability, and generalization [41]. When the dataset is small, these differences can be more pronounced and more dependent on the particular training and test data. As the name suggests, data science is about experimenting and testing hypotheses, which should involve comparing the results of multiple models. In classification, a simpler model, such as a decision tree, may better match the nature of the dataset than a more complex model, such as a neural network [41].

For example, in the case of nanoparticles, where multiple variable dimensions are involved, a neural network could provide a more accurate result. Nevertheless, for a simple classification task, such as predicting whether nanoparticles form under a particular method, simpler models may be more appropriate.
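Comparing the two model families under cross-validation takes only a few lines (an illustrative sketch assuming scikit-learn is available; the synthetic dataset merely stands in for formulation experiments and the hyperparameter values are arbitrary assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

# Small synthetic dataset standing in for formulation experiments
X, y = make_classification(n_samples=60, n_features=5, n_informative=3,
                           random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "neural network": MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                    random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
    print(f"{name}: {scores.mean():.2f} ± {scores.std():.2f}")
```

Reporting the cross-validated mean and spread for each candidate, rather than a single split's accuracy, makes the comparison meaningful on small datasets.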

In practically all the studies, the experiments were simply collected, some hyperparameters were tested, the model was run, and good accuracy values were obtained. Such accuracy is usually very difficult to achieve, especially with small amounts of data and without data-balance analysis or cross-validation to compensate for the few experiments. Beyond this weakness, some studies did not deploy the model, or the authors validated it with only a few experiments from their own dataset and still obtained good accuracy values, probably a case of overfitting [5,6,18,20]. A good practice in data science is to have another team test the final model, which did not happen in any of the cases analyzed [30].

The pragmatic identification of the steps to be followed in using ML based on the checklist in Table 3 can serve as a basis for future studies in which ANN is used to produce polymeric nanoparticles.

Table 3.

Analysis of selected articles using a checklist of the good data analysis practices for machine learning.

Practices/references [28] [6] [16] [17] [18] [5] [4] [19] [20] [21] [22] [23] [24]
Appropriate data size
Steps of data analysis, preparation, and transformation
Cross-validation
Balance data
Wide input variables
Appropriate model deployment
Use of several models
Wide output variables
Impact of factors by ANN
Use of different values for ANN hyperparameters
Realistic accuracy determination
Appropriate language used for data science

2.10. Opportunities for improvements

An analysis of the articles regarding the implementation of essential steps in ML is presented in Table 3.

As a positive aspect, all studies considered different values for the ANN hyperparameters, which are easily adjustable settings in software such as MATLAB. Thus, all either tested a different number of neurons in the hidden layer, used more than one activation function, or tested different values for the epochs. Some studies used the response surface method to compare the results with the ANN model and discussed the relative importance weights of the input variables and their significance [5,6,17,21,22].

However, all analyzed studies omitted several basic data analysis steps discussed in the previous topics, and none defined success criteria for building the model [2,37]. Determining from the beginning the requirements for the model and the level of accuracy to be achieved is important in deciding whether to add a new cycle to the process, further optimize the hyperparameters, or even conduct new experiments [37].

This background analysis suggests that future research could profit from incorporating data scientists as team members, integrating the studies more fully into the ML and data science environment.

3. Conclusion

This review addresses the question of how nanotechnology and data science can be combined to enhance the production of polymer drug-loaded nanoparticles. Nanomedicines are still far from realizing their full potential on an industrial scale, as repeatable production patterns are a constant challenge. While most scientific studies in the field of nanotechnology are based on empirical developments, the ability of ML algorithms to recognize patterns and optimize processes is a valuable tool to support the development of nanomedicines that align with the overall goals of quality-by-design.

In any data science project, usability for potential users rests on reliable predictive capability, which depends entirely on the quality of the model's training data. Quality means not only the balance between input and output variables but also the accuracy of the observed values. In this regard, the articles analyzed in this review revealed a large gap for improvement in projects combining nanotechnology and data science, mainly due to methodological shortcomings. Indeed, only 38% of such studies used an appropriate data size, leading to prediction errors and biased results. Additionally, none of them clearly described the steps of data analysis, preparation, and transformation, or cross-validation and data balancing, which can greatly compromise the predictive ability of the models. Moreover, no study commented on the possibility of overfitting, even though high validation metrics were reached despite too few experiments and variables, and only a few articles evaluated different models to find the best fit for their data. Finally, no model was actually used and tested by external users with their own datasets to prove the predictions in real problems.

4. Future perspective

Data science could be a decisive instrument for overcoming the empiricism that marks nanoparticle production and scale-up. Numerous studies have explored this potential, but almost all rely on less powerful computational tools.

ML seems to be the most appropriate strategy, considering the characteristics of nanostructured systems whose properties are conditioned by multiple non-linear factors. To extract the maximum potential from ML, developing better-planned studies following the recommended analysis steps involving data science specialists and having a database containing an adequate number of experiments to construct reliable and helpful models is essential.

Funding Statement

This work was supported by Fundação de Apoio à Pesquisa do Distrito Federal (FAPDF, 00193-00000735/2021-10) and the University of Brasilia.

Author contributions

RF Silveira: conceptualization & writing – original draft; AL Lima: data curation, review & editing; IP Gross: data curation, review & editing; GM Gelfuso: methodology, review & editing; T Gratieri: resources, review & editing; M Cunha-Filho: conceptualization, writing – original draft.

Financial disclosure

This work was supported by Fundação de Apoio à Pesquisa do Distrito Federal (FAPDF, 00193-00000735/2021-10) and the University of Brasilia. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Competing interests disclosure

The authors have no competing interests or relevant affiliations with any organization or entity with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

Writing disclosure

No writing assistance was utilized in the production of this manuscript.

References

Papers of special note have been highlighted as: • of interest; •• of considerable interest

  • 1.Alzubi J, Nayyar A, Kumar A. Machine learning from theory to algorithms: an overview. J Phys Conf Ser. 2018;1142:12012. doi: 10.1088/1742-6596/1142/1/012012 [DOI] [Google Scholar]
  • 2.Dhar V. Data science and prediction. Commun ACM. 2013;56:64–73. doi: 10.1145/2500499 [DOI] [Google Scholar]
  • 3.Bastogne T. Quality-by-design of nanopharmaceuticals – a state of the art. Nanomedicine. 2017;13:2151–2157. doi: 10.1016/j.nano.2017.05.014 [DOI] [PubMed] [Google Scholar]
  • 4.Jara MO, Catalan-Figueroa J, Landin M, et al. Finding key nanoprecipitation variables for achieving uniform polymeric nanoparticles using neurofuzzy logic technology. Drug Deliv Transl Res. 2018;8:1797–1806. doi: 10.1007/s13346-017-0446-8 [DOI] [PubMed] [Google Scholar]
  • 5.Youshia J, Ali ME, Lamprecht A. Artificial neural network based particle size prediction of polymeric nanoparticles. European J Pharmaceut Biopharmaceut. 2017;119:333–342. doi: 10.1016/j.ejpb.2017.06.030 [DOI] [PubMed] [Google Scholar]
  • 6.Asadi H, Rostamizadeh K, Salari D, et al. Preparation of biodegradable nanoparticles of tri-block PLA–PEG–PLA copolymer and determination of factors controlling the particle size using artificial neural network. J Microencapsul. 2011;28:406–416. doi: 10.3109/02652048.2011.576784 [DOI] [PubMed] [Google Scholar]
  • 7.Stanzione F, Giangreco I, Cole JC. Chapter four-use of molecular docking computational tools in drug discovery. Prog Med Chem. 2021;60:273–343. doi: 10.1016/bs.pmch.2021.01.004 [DOI] [PubMed] [Google Scholar]
  • 8.Zeeshan M, Ali H, Ain QU, et al. A holistic QBD approach to design galactose conjugated PLGA polymer and nanoparticles to catch macrophages during intestinal inflammation. Mater Sci Engin. 2021;126:112183. doi: 10.1016/j.msec.2021.112183 [DOI] [PubMed] [Google Scholar]
  • 9.Vemula SK, Daravath B, Repka M. Quality by design (QbD) approach to develop fast-dissolving tablets using melt-dispersion paired with surface-adsorption method: formulation and pharmacokinetics of flurbiprofen melt-dispersion granules. Drug Deliv Transl Res. 2023;13:3204–3222. doi: 10.1007/s13346-023-01382-z [DOI] [PubMed] [Google Scholar]
  • 10.de Matas M, Shao Q, Richardson CH, et al. Evaluation of in vitro in vivo correlations for dry powder inhaler delivery using artificial neural networks. European J Pharmaceut Sci. 2008;33:80–90. doi: 10.1016/j.ejps.2007.10.001 [DOI] [PubMed] [Google Scholar]
  • 11.Peh KK, Lim CP, Quek SS, et al. Use of artificial neural networks to predict drug dissolution profiles and evaluation of network performance using similarity factor. Pharm Res. 2000;17:1384–1389. doi: 10.1023/A:1007578321803 [DOI] [PubMed] [Google Scholar]
  • 12.Leane MM, Cumming I, Corrigan OI. The use of artificial neural networks for the selection of the most appropriate formulation and processing variables in order to predict the in vitro dissolution of sustained release minitablets. AAPS PharmSciTech. 2003;4:129–140. doi: 10.1208/pt040226 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Yamashita F, Hashida M. Mechanistic and empirical modeling of skin permeation of drugs. Adv Drug Deliv Rev. 2003;55:1185–1199. doi: 10.1016/S0169-409X(03)00118-2 [DOI] [PubMed] [Google Scholar]
  • 14.Chen L, Lian G, Han L. Prediction of human skin permeability using artificial neural network (ANN) modeling. Acta Pharmacol Sin. 2007;28:591–600. doi: 10.1111/j.1745-7254.2007.00528.x [DOI] [PubMed] [Google Scholar]
  • 15.Ita K, Roshanaei S. Artificial intelligence for skin permeability prediction: deep learning. J Drug Target. 2024;32:334–346. doi: 10.1080/1061186X.2024.2309574 [DOI] [PubMed] [Google Scholar]
  • 16.Furtuna R, Curteanu S, Racles C. NSGA-II-RJG applied to multi-objective optimization of polymeric nanoparticles synthesis with silicone surfactants. Open Chem. 2011;9:1080–1095. doi: 10.2478/s11532-011-0096-5 [DOI] [Google Scholar]
  • 17.Zabihi F, Xin N, Jia J, et al. High yield and high loading preparation of curcumin–PLGA nanoparticles using a modified supercritical antisolvent technique. Ind Eng Chem Res. 2014;53:6569–6574. doi: 10.1021/ie404215h [DOI] [Google Scholar]
  • 18.Yaghoobi N, Majidi RF, ali Faramarzi M, et al. Preparation, optimization and activity evaluation of PLGA/streptokinase nanoparticles using electrospray. Adv Pharm Bull. 2017;7:131. doi: 10.15171/apb.2017.017 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Jara MO, Landin M, Morales JO. Screening of critical variables in fabricating polycaprolactone nanoparticles using Neuro Fuzzy Logic. Int J Pharm. 2021;601:120558. doi: 10.1016/j.ijpharm.2021.120558 [DOI] [PubMed] [Google Scholar]; •• This is the study with the largest number of data collected using machine learning in the synthesis of polymeric drug-loaded nanoparticles.
  • 20.Shalaby KS, Soliman ME, Casettari L, et al. Determination of factors controlling the particle size and entrapment efficiency of noscapine in PEG/PLA nanoparticles using artificial neural networks. Int J Nanomed. 2014;9:4953. doi: 10.2147/IJN.S68737 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Zaki MR, Varshosaz J, Fathi M. Preparation of agar nanospheres: comparison of response surface and artificial neural network modeling by a genetic algorithm approach. Carbohydr Polym. 2015;122:314–320. doi: 10.1016/j.carbpol.2014.12.031 [DOI] [PubMed] [Google Scholar]; • Tests alternative models for the data in addition to the neural network.
  • 22.Hashad RA, Ishak RAH, Fahmy S, et al. Chitosan-tripolyphosphate nanoparticles: optimization of formulation parameters for improving process yield at a novel pH using artificial neural networks. Int J Biol Macromol. 2016;86:50–58. doi: 10.1016/j.ijbiomac.2016.01.042 [DOI] [PubMed] [Google Scholar]; • Tests alternative models for the data in addition to the neural network.
  • 23.Esmaeilzadeh-Gharedaghi E, Faramarzi MA, Amini MA, et al. Effects of processing parameters on particle size of ultrasound prepared chitosan nanoparticles: an artificial neural networks study. Pharm Dev Technol. 2012;17:638–647. doi: 10.3109/10837450.2012.696269 [DOI] [PubMed] [Google Scholar]
  • 24.Baharifar H, Amani A. Size, loading efficiency, and cytotoxicity of albumin-loaded chitosan nanoparticles: an artificial neural networks study. J Pharm Sci. 2017;106:411–417. doi: 10.1016/j.xphs.2016.10.013 [DOI] [PubMed] [Google Scholar]
  • 25.Kumar M, Kulkarni P, Liu S, et al. Nanoparticle biodistribution coefficients: a quantitative approach for understanding the tissue distribution of nanoparticles. Adv Drug Deliv Rev. 2023;194:114708. doi: 10.1016/j.addr.2023.114708 [DOI] [PubMed] [Google Scholar]
  • 26.Gaumet M, Vargas A, Gurny R, et al. Nanoparticles for drug delivery: the need for precision in reporting particle size parameters. European J Pharmaceut Biopharmaceut. 2008;69:1–9. doi: 10.1016/j.ejpb.2007.08.001 [DOI] [PubMed] [Google Scholar]
  • 27.Cho K, Wang X, Nie S, et al. Therapeutic nanoparticles for drug delivery in cancer. Clin Cancer Res. 2008;14:1310–1316. doi: 10.1158/1078-0432.CCR-07-1441 [DOI] [PubMed] [Google Scholar]
  • 28.Devarajan PV, Sonavane GS, Doble M. Computer-aided molecular modeling: a predictive approach in the design of nanoparticulate drug delivery system. J Biomed Nanotechnol. 2005;1:375–383. doi: 10.1166/jbn.2005.051 [DOI] [Google Scholar]; • One of the few articles found that evaluates encapsulation efficiency as an output in determining models using neural networks.
  • 29.Shafique U, Qaiser H. A comparative study of data mining process models (KDD, CRISP-DM and SEMMA). Inter J Innovat Scientif Res. 2014;12:217–222. [Google Scholar]; •• Important reference about data mining process models.
  • 30.Martinez-Plumed F, Contreras-Ochando L, Ferri C, et al. CRISP-DM twenty years later: from data mining processes to data science trajectories. IEEE Trans Knowl Data Eng. 2019;33:3048–3061. doi: 10.1109/TKDE.2019.2962680 [DOI] [Google Scholar]
  • 31.Steiner MTA, Soma NY, Shimizu T, et al. Study of a medical problem using kdd, with emphasis on exploratory data analysis. Gestão & Produção. 2006;13:325–337. doi: 10.1590/S0104-530X2006000200013 [DOI] [Google Scholar]
  • 32.Mariñelarena-Dondena L, Errecalde ML, Castro Solano A. Extracción de conocimiento con técnicas de mineria de textos aplicadas a la psicologia. Rev Argent Cienc Comport. 2017;9:65–76. [Google Scholar]
  • 33.Góes ART, Steiner MTA, Neto PJS. Classification of power quality considering voltage sags occurred in feeders. Proceedings of the International Conference on Neural Computation Theory and Applications. Vilamoura, Portugal: September 20-22, 2013. p. 433–442. [Google Scholar]
  • 34.Firas O. A combination of SEMMA & CRISP-DM models for effectively handling big data using formal concept analysis based knowledge discovery: a data mining approach. World J Advan Engin Technol Sci. 2023;8:9–14. doi: 10.30574/wjaets.2023.8.1.0147 [DOI] [Google Scholar]
  • 35.Holdaway KR. Data mining methodologies enhance probabilistic well forecasting. Proceedings of the SPE Middle East Intelligent Oil and Gas Symposium. Manama, Bahrain,: October 28–30, 2013. p. D031S014R004. [Google Scholar]
  • 36.Tariq HI, Sohail A, Aslam U, et al. Loan default prediction model using sample, explore, modify, model, and assess (SEMMA). J Comput Theor Nanosci. 2019;16:3489–3503. doi: 10.1166/jctn.2019.8313 [DOI] [Google Scholar]
  • 37.Azevedo A, Santos MF. KDD, SEMMA and CRISP-DM: a parallel overview. IADS-DM. 2008;2008:182–185. [Google Scholar]
  • 38.Domingues I, Amorim JP, Abreu PH, et al. Evaluation of Oversampling Data Balancing Techniques in the Context of Ordinal Classification. Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN). Rio de Janeiro, Brazil: IEEE; July 8–13 2018. p. 1–8. doi: 10.1109/IJCNN.2018.8489599 [DOI] [Google Scholar]
  • 39.Delashmit WH, Manry MT. Recent developments in multilayer perceptron neural networks. Proceedings of the Seventh Annual Memphis Area Engineering and Science Conference, MAESC. Memphis TN, USA: May 11, 2005. p. 1–15. [Google Scholar]
  • 40.Alom MZ, Taha TM, Yakopcic C, et al. A state-of-the-art survey on deep learning theory and architectures. Electronics (Basel). 2019;8:292. doi: 10.3390/electronics8030292 [DOI] [Google Scholar]
  • 41.Vabalas A, Gowen E, Poliakoff E, et al. Machine learning algorithm validation with a limited sample size. PLOS ONE. 2019;14:e0224365. doi: 10.1371/journal.pone.0224365 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Kadhim ZS, Abdullah HS, Ghathwan KI. Artificial neural network hyperparameters optimization: a survey. Inter J Online Biomed Engin. 2022;18:18. doi: 10.3991/ijoe.v18i15.34399 [DOI] [Google Scholar]
  • 43.Pannakkong W, Thiwa-Anont K, Singthong K, et al. Hyperparameter tuning of machine learning algorithms using response surface methodology: a case study of ANN, SVM, and DBN. Math Probl Eng. 2022;2022:1–17. doi: 10.1155/2022/8513719 [DOI] [Google Scholar]
  • 44.Lau MM, Lim KH. Review of adaptive activation function in deep neural network. Proceedings of the 2018 IEEE-EMBS Conference on Biomedical Engineering and Sciences (IECBES). Sarawak, Malaysia: December 03-06, 2018. p. 686–690. doi: 10.1109/IECBES.2018.8626714 [DOI] [Google Scholar]
  • 45.Apicella A, Donnarumma F, Isgro F, et al. A survey on modern trainable activation functions. Neural Netw. 2021;138:14–32. doi: 10.1016/j.neunet.2021.01.026 [DOI] [PubMed] [Google Scholar]
  • 46.Lima AL, Gratieri T, Cunha-Filho M, et al. Polymeric nanocapsules: a review on design and production methods for pharmaceutical purpose. Methods. 2022;199:54–66. doi: 10.1016/j.ymeth.2021.07.009 [DOI] [PubMed] [Google Scholar]; • This review is an updated compilation of the production of polymeric nanoparticles for pharmaceutical purposes.
  • 47.Pires FQ, Gross IP, Sa-Barreto LL, et al. In-situ formation of nanoparticles from drug-loaded 3D polymeric matrices. European J Pharmaceut Sci. 2023;188:106517. doi: 10.1016/j.ejps.2023.106517 [DOI] [PubMed] [Google Scholar]
  • 48.Rocha JL, Pires FQ, Gross IP, et al. Propranolol-loaded nanostructured lipid carriers for topical treatment of infantile hemangioma. J Drug Deliv Sci Technol. 2023;80:104099. doi: 10.1016/j.jddst.2022.104099 [DOI] [Google Scholar]
  • 49.Oliveira ACS, Oliveira PM, Cunha-Filho M, et al. Latanoprost loaded in polymeric nanocapsules for effective topical treatment of alopecia. AAPS PharmSciTech. 2020;21:1–7. doi: 10.1208/s12249-020-01863-1 [DOI] [PubMed] [Google Scholar]
  • 50.Angelo T, Pires FQ, Gelfuso GM, et al. Development and validation of a selective HPLC-UV method for thymol determination in skin permeation experiments. J Chromatogr B. 2016;1022:81–86. doi: 10.1016/j.jchromb.2016.04.011 [DOI] [PubMed] [Google Scholar]
  • 51.Kasbaum FE, de Carvalho DM, de Jesus Rodrigues L, et al. Development of lipid polymer hybrid drug delivery systems prepared by hot-melt extrusion. AAPS PharmSciTech. 2023;24:156. doi: 10.1208/s12249-023-02610-y [DOI] [PubMed] [Google Scholar]
  • 52.Ferreira SLC, Bruns RE, Ferreira HS, et al. Box-Behnken design: an alternative for the optimization of analytical methods. Anal Chim Acta. 2007;597:179–186. doi: 10.1016/j.aca.2007.07.011 [DOI] [PubMed] [Google Scholar]
  • 53.Bingöl D, Hercan M, Elevli S, et al. Comparison of the results of response surface methodology and artificial neural network for the biosorption of lead using black cumin. Bioresour Technol. 2012;112:111–115. doi: 10.1016/j.biortech.2012.02.084 [DOI] [PubMed] [Google Scholar]
  • 54.Samuel OD, Okwu MO. Comparison of Response Surface Methodology (RSM) and Artificial Neural Network (ANN) in modelling of waste coconut oil ethyl esters production. Energy Sources Part A. 2019;41:1049–1061. doi: 10.1080/15567036.2018.1539138 [DOI] [Google Scholar]
  • 55.Prieto-Martinez FD, Arciniega M, Medina-Franco JL. Acoplamiento molecular: avances recientes y retos. TIP Rev Especializada en Ciencias Quimico-Biológicas. 2019;21:65–87. doi: 10.22201/fesz.23958723e.2018.0.143 [DOI] [Google Scholar]
  • 56.Ghasemi JB, Abdolmaleki A, Shiri F. Molecular docking challenges and limitations. Pharmaceutical Sciences: Breakthroughs in Research and Practice. Hershey, Pennsylvania, USA: IGI Global; 2017. p. 770–794. doi: 10.4018/978-1-5225-1762-7.ch030 [DOI] [Google Scholar]
  • 57.Gomes HM, Awruch AM. Comparison of response surface and neural network with other methods for structural reliability analysis. Struct Safety. 2004;26:49–67. doi: 10.1016/S0167-4730(03)00022-5 [DOI] [Google Scholar]
  • 58.Raschka S, Patterson J, Nolet C. Machine learning in python: main developments and technology trends in data science, machine learning, and artificial intelligence. Information. 2020;11:193. doi: 10.3390/info11040193 [DOI] [Google Scholar]
  • 59.Zhou X, Ordonez C. Programming Languages in Data Science: a Comparison from a Database Angle. Proceedings of the 2021 IEEE International Conference on Big Data (Big Data). Orlando, FL, USA: December 15-18, 2021. p. 3147–3154. doi: 10.1109/BigData52589.2021.9672007 [DOI] [Google Scholar]
  • 60.Cagnoni S, Cozzini L, Lombardo G, et al. Emotion-based analysis of programming languages on stack overflow. ICT Express. 2020;6:238–242. doi: 10.1016/j.icte.2020.07.002 [DOI] [Google Scholar]
  • 61.Ozgur C, Colliau T, Rogers G, et al. MatLab vs. Python vs. R. J Data Sci. 2017;15:355–371. doi: 10.6339/JDS.201707_15(3).0001 [DOI] [Google Scholar]
  • 62.RVSPK R, Priyanath HMS, Megama RGN. Methods and rules-of-thumb in the determination of minimum sample size when applying structural equation modelling: a review. J Soc Sci Res. 2020;15:102–109. doi: 10.24297/jssr.v15i.8670 [DOI] [Google Scholar]
  • 63.Boomsma A. Nonconvergence, improper solutions, and starting values in lisrel maximum likelihood estimation. Psychometrika. 1985;50:229–242. doi: 10.1007/BF02294248 [DOI] [Google Scholar]
  • 64.Kline RB. Methodology in the social sciences. Principles and practice of structural equation modeling (3rd Edition). New York, USA: Guilford Publications; 2023. [Google Scholar]
  • 65.Nassiri Koopaei N, Abdollahi M. Opportunities and obstacles to the development of nanopharmaceuticals for human use. DARU Journal of Pharmaceutical Sciences. 2016;24(1):23–29. doi: 10.1186/s40199-016-0163-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Awasthi R, Roseblade A, Hansbro PM, et al. Nanoparticles in cancer treatment: opportunities and obstacles. Curr Drug Targets. 2018;19:1696–1709. doi: 10.2174/1389450119666180326122831 [DOI] [PubMed] [Google Scholar]
  • 67.Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321–357. doi: 10.1613/jair.953 [DOI] [Google Scholar]
  • 68.Mahesh B. Machine learning algorithms-a review. Inter J Sci Res (IJSR)Internet. 2020;9:381–386. [Google Scholar]
