Abstract
The advancement of technologies and the development of more efficient artificial intelligence (AI) enable the processing of large amounts of data in a very short time. Concurrently, the increase in information within biological databases, such as 3D molecular structures or networks of functional macromolecule associations, will facilitate the creation of new methods for risk assessment that can serve as alternatives to animal testing. Specifically, the predictive capabilities of AI as new approach methodologies (NAMs) are poised to revolutionise risk assessment approaches. Our previous studies on molecular docking predictions, using the software AutoDock Vina, indicated high-affinity binding of certain toxic chemicals to the 3D structures of human proteins associated with nervous and reproductive functions. Similar approaches revealed potential sublethal interactions of neonicotinoids with proteins linked to the bees' immune system. Building on these findings, we plan to develop an AI-based decision tool that exploits the toxicity data available for the best-known chemicals, such as LD50 values, together with the data obtainable from their interactions with human proteins, to support risk assessment studies of multiple stressors that have not yet been characterised. Our focus will be on utilising these new bioinformatics methodologies to develop specific experimental designs that allow for confident and predictable study of the toxic and sublethal effects of pesticides on humans. We will also validate the developed NAMs by integrating existing in vivo information from scientific literature and technical reports. These approaches will significantly impact toxicity studies, guiding researchers' experiments and greatly reducing the need for animal testing.
Keywords: artificial intelligence, molecular docking, molecular stressors, pesticide toxicity, protein 3D structure, risk assessment, software development
SUMMARY
The lethal dose 50 (LD50) is a toxicological measure that defines the amount of a substance that is lethal to 50% (one half) of the experimental group of test animals exposed to it, as can be seen in Figure 1. By definition, it is therefore impossible to avoid the use of experimental animals when determining an LD50. However, it is crucial to know the LD50 of pesticides, additives, preservatives and other chemicals present in the food production chain, in order to protect human health. Although some research lines address the issue from an in silico prediction perspective, we are still far from obtaining a reliable predictor that can reduce animal experimentation for new chemicals. Obtaining the LD50 without resorting to animal sacrifice would represent a positive advance both ethically and economically. However, this is not a trivial regression problem. In our work, we propose an in-depth experimentation using descriptors obtained from molecular docking with different proteins, combined with others obtained from verified databases, and processed through different deep learning architectures.
FIGURE 1.

LD50 is the amount of a lethal substance required to kill half of a group of test animals.
In two of the three architectures trained to date, the results favour the configuration that learns, through 2D convolutional networks, from the patterns of the 2D structures extracted from PubChem. Other tests consisted of extracting eight descriptors in a 3D space through flexible docking of different chemicals with the protein acetylcholinesterase. These descriptors are based on those selected by RosENet, a project that developed a predictor for binding affinity. With this voxelised configuration in a 25 × 25 × 25 × 8 space, a 3D convolutional network was trained. In two architectures the results are worse, possibly due to 'noise' induced in the weights by the large number of parameters; however, in the architecture where 3D convolutional neurons predominated over those of the other two designs, results comparable to the optimal configurations were obtained, suggesting a possible improvement by expanding the data set with dockings against new proteins.
1. INTRODUCTION
The exponential growth of data in global databases across various scientific fields has encountered a bottleneck in their analysis. This issue is particularly evident in risk assessment analyses, where a vast amount of data from diverse fields must be evaluated. The advancement of technologies and the development of efficient artificial intelligence (AI) can enable us to process this large volume of data within a manageable timeframe. Additionally, AI can be utilised to generate new insights from the analysis of publicly available data, such as predicting molecular interactions between chemicals and biological macromolecules and understanding how these interactions might impact the organism's metabolic pathways.
In fact, the increase in information within biological and chemical databases, such as 3D molecular structures (Burley et al., 2021), functional macromolecule association networks (Goldsmith et al., 2014) or the effects of active substances on metabolic pathways (EFSA, 2023), will facilitate the development of new risk assessment methods that serve as alternatives to animal testing.
Molecular docking (MOD) analysis, which predicts the molecular interactions that hold together a protein and a ligand (typically a small molecule) in a non‐covalent bound state, can be used to uncover unknown interactions. Flexible MOD, an advanced variant, considers the flexibility of the involved molecules, especially the protein, which can adopt different conformations. This variant provides a more accurate representation of molecular interactions, as in biological conditions, molecules are not rigid. This capability allows for a more focused approach to studying toxic chemicals. Specifically, these findings can help to understand the molecular‐level results obtained in in vivo experiments, clarifying the metabolic pathways affected by toxic chemicals and aiding in the risk assessment under certain conditions. Moreover, predicting new interactions between proteins and toxic ligands enables scientists to design new experiments to gather data on chemical effects that have not yet been evaluated.
Using MOD approaches, we predicted high‐affinity binding of certain toxic chemicals to the 3D structures of human proteins related to nervous and reproductive functions (manuscript in preparation). The predicted interactions correlated with effects observed experimentally, as described in the literature, leading us to hypothesise a functional model of interaction that explains both the predicted and experimental data. This model opens up new avenues for research. Similarly, bioinformatics studies on a data set of 3D protein structures from honeybees (Del Águila Conde & Febbraio, 2022) highlighted potential sublethal interactions of neonicotinoids with proteins related to the immune system of bees (manuscript in preparation). This suggests a possible link between the spread of neonicotinoids and bee deaths, not through direct toxicity, but via immune system inhibition and subsequent exposure to environmental pathogens.
Although the underlying algorithms for MOD have existed since the 1980s, their practical application only became possible with the improvements in computational capacity of recent decades, which have led to a surge in research and information obtained through this process. A similar phenomenon has been observed in the field of deep learning.
Deep learning consists of a set of machine learning algorithms that employ deep neural networks to model complex data and extract high‐level features. In this project, we aim to apply these techniques to solve a regression problem. Within the realm of deep learning, convolutional neural networks (CNNs) stand out as a class of networks designed to process data with a grid‐like structure, such as images. These networks use convolution operations to capture spatial and temporal patterns in the data. We will employ two types of convolutional networks: 3D convolutional networks, which will consider a three‐dimensional space of descriptors corresponding to the PDB coordinates resulting from docking, and 2D convolutional networks, which we will use for learning from the two‐dimensional structures of PubChem (Kim et al., 2023).
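As a concrete illustration of what a 3D convolutional layer computes on such input, the following NumPy sketch applies a single filter to an 8-channel voxel grid of the size used in this work. It is purely illustrative: the 3 × 3 × 3 kernel size and the random weights are arbitrary choices, not the trained network.

```python
import numpy as np

def conv3d_single_filter(volume, kernel):
    """Valid 3D convolution of a multi-channel voxel grid with one filter.

    volume: (D, H, W, C) voxelised descriptors; kernel: (kd, kh, kw, C).
    Returns a (D-kd+1, H-kh+1, W-kw+1) feature map.
    """
    D, H, W, C = volume.shape
    kd, kh, kw, _ = kernel.shape
    out = np.zeros((D - kd + 1, H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                patch = volume[i:i + kd, j:j + kh, k:k + kw, :]
                out[i, j, k] = np.sum(patch * kernel)
    return out

# A 25 x 25 x 25 x 8 grid, as used for the docking descriptors.
grid = np.random.rand(25, 25, 25, 8)
fmap = conv3d_single_filter(grid, np.random.rand(3, 3, 3, 8))
print(fmap.shape)  # (23, 23, 23)
```

In practice a deep learning framework applies many such filters in parallel and learns the kernel weights; this loop version only shows the operation itself.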
In order to increase the number of descriptors for the CNN analysis, we exploited RosENet, a research project developed by Hassan-Harrirou, Zhang and Lemmin, which extracts descriptors from a protein after performing molecular docking with a series of chemicals and trains a deep learning ResNet architecture for binding affinity prediction (Hassan-Harrirou et al., 2020). These descriptors include attraction, repulsion, electrostatic and solvation energies obtained using Rosetta software (Leman et al., 2020), as well as chemical descriptors such as aromatic carbon, hydrogen bond acceptor, positive ionisable and negative ionisable, obtained using AutoDock Vina software (Eberhardt et al., 2021). The results obtained by RosENet demonstrate significant improvements in binding affinity prediction by combining the accuracy of molecular mechanics energies with the capabilities of an ensemble of 3D convolutional neural networks. This research is the main inspiration for our current work.
2. DATA AND METHODOLOGIES
2.1. Data
A data set consisting of 238 chemicals used in fertilisers and pesticides was developed. The chemical formulas, SMILES formulas, PDB files and both oral and dermal LD50 values were obtained from online databases, such as PubChem (Kim et al., 2023) or ZINC (Irwin & Shoichet, 2005), and scientific literature. Additionally, the PDB file of the protein acetylcholinesterase was acquired from the online PDB database (https://www.rcsb.org/).
Once all the materials were collected, they were processed for MOD using the target protein, and descriptors were extracted using Rosetta. The extracted descriptors align with those obtained in the RosENet project. Since these descriptors exist in a three-dimensional space, they needed to be discretised. This was achieved through voxelisation into a 25 × 25 × 25 × 8 grid (the last value represents the number of descriptors), with additional trials conducted using a 2 × 2 × 2 × 8 grid.
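The voxelisation step can be sketched as follows. This is an illustrative simplification, assuming a uniform bounding box derived from the coordinates and simple accumulation of descriptor values per voxel, rather than the exact procedure used:

```python
import numpy as np

def voxelise(coords, descriptors, grid_size=25, n_channels=8):
    """Discretise 3D descriptor points into a grid_size^3 x n_channels grid.

    coords: (N, 3) point coordinates from the docking pose;
    descriptors: (N, n_channels) descriptor values at each point.
    Values falling into the same voxel are accumulated.
    """
    grid = np.zeros((grid_size, grid_size, grid_size, n_channels))
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    idx = ((coords - lo) / span * (grid_size - 1)).astype(int)
    for (x, y, z), d in zip(idx, descriptors):
        grid[x, y, z] += d
    return grid

coords = np.random.rand(100, 3) * 20.0  # e.g. coordinates in Angstroms
desc = np.random.rand(100, 8)           # 8 RosENet-style channels
grid = voxelise(coords, desc)
print(grid.shape)  # (25, 25, 25, 8)
```

Because voxels only accumulate values, the total of each descriptor channel is preserved; only its spatial resolution changes with the grid size (25 × 25 × 25 vs. 2 × 2 × 2).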
Based on this, a series of convolutional neural networks were trained using different inputs or combinations thereof to compare results and provide an initial estimation of the path forward in predicting LD50 values.
The following subsections will explain the methodology of the two main aspects: the execution of the docking and the training of the neural networks.
2.2. Methodologies
2.2.1. Methodology for performing docking
The docking process involved the development and extension of several scripts, including .sh scripts for command-line execution on Linux operating systems and higher-level .py scripts. Most of the work was conducted on the Ubuntu operating system. The main scripts used include:
vina_script.sh: This script selects all ligands and combines them with all proteins contained within a directory. It utilises the Vina library and requires the ligand names to start with ‘ligand_’ and the protein names to start with ‘protein_’. During the current research, it was extended to perform flexible docking, which is more computationally intensive but provides more realistic results and better extraction of corresponding energies. The script reads the bounding box from a file named conf.txt. The total execution time for 238 ligands and one protein was approximately 3 days.
mol2_to_pdbqt.py and pdbqt_to_pdb.sh: These are format conversion scripts, since the pdbqt format is required for docking but the pdb format is often clearer for other tasks. Note: mol2_to_pdbqt.py requires libraries exclusive to Windows, making Windows necessary for running this script.
Other scripts perform mathematical operations and file management, such as descriptor extraction using Rosetta, subtraction of energies between the docking complex and the original protein, and voxelisation. Most of the libraries involved are only compatible with Python 2.7, so it is important to work with this version.
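The pairing logic of vina_script.sh can be sketched in Python. The directory layout, file prefixes and conf.txt file follow the description above, while the exact Vina invocation is an assumption and may differ from the real script:

```python
import os

def docking_commands(workdir):
    """Pair every 'ligand_*' file with every 'protein_*' file in workdir and
    build the corresponding AutoDock Vina command lines. The docking box is
    read by Vina from conf.txt, as in vina_script.sh."""
    files = sorted(os.listdir(workdir))
    ligands = [f for f in files if f.startswith("ligand_")]
    proteins = [f for f in files if f.startswith("protein_")]
    conf = os.path.join(workdir, "conf.txt")
    cmds = []
    for protein in proteins:
        for ligand in ligands:
            out = "out_%s_%s" % (protein, ligand)
            cmds.append(["vina",
                         "--receptor", os.path.join(workdir, protein),
                         "--ligand", os.path.join(workdir, ligand),
                         "--config", conf,
                         "--out", os.path.join(workdir, out)])
    return cmds

# Each command can then be run, e.g. with subprocess.run(cmd, check=True),
# capturing the reported binding affinities for use as descriptors.
```

With 238 ligands and one protein this loop issues 238 docking runs, which explains the roughly 3-day execution time reported above for the flexible docking variant.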
2.2.1.1. Preparation
Initially, a working folder was created to store all the necessary files for the docking process. The required files included proteins (prefixed with ‘protein_’), ligands (prefixed with ‘ligand_’), the docking software Vina, the custom execution script vina_script.sh, and the configuration file conf.txt containing coordinates obtained through AutoDock tools, which delimit the docking space.
2.2.1.2. Process steps
- Preparation of proteins and ligands:
○ Protein files with the prefix 'protein_' were placed in the working folder.
○ Ligand files were verified and, if necessary, renamed using the anadir_ligand.py script to ensure they had the prefix 'ligand_'. The ligand files in .pdbqt format were also placed in the working folder.
- Execution of docking:
○ The docking was executed using the vina_script.sh script with the command ./vina_script.sh > output.txt. This script performed the docking process and saved the results in the same folder, recording binding affinities in the output.txt file, which were used as descriptors.
By following these steps and utilising the developed scripts, the docking process was carried out in an organised and efficient manner. The provided documentation ensures a comprehensive understanding of the work performed and the methodologies employed.
2.2.2. Methodology for the design and training of neural network architectures
Three different architectures were designed, varying the number of layers and neurons: one emphasised convolutional layers over fully connected layers, one the reverse, and one took a balanced approach. Ten training sessions with different parameter sets were conducted under the hyperparameter configuration considered optimal, based on a series of test trainings and validation checks using a validation graph table. For each set of models, two versions were trained: one with a movable test set and the other with a fixed test set.
2.2.2.1. Architecture design
Variety of architectures:
○ Three distinct architectures were designed, each with a variable number of layers and neurons.
○ Design approaches:
- Priority to convolutional layers: more convolutional layers compared to fully connected layers.
- Priority to fully connected layers: greater emphasis on fully connected layers than on convolutional layers.
- Balanced architecture: a balance between convolutional layers and fully connected layers.
2.2.2.2. Network training
Training configuration:
Ten training sessions were conducted with different sets of parameters.
Optimal hyperparameters were determined through a series of test trainings and verified using a validation graph table.
Two versions of each training session were run, one with a movable test set and one with a fixed test set, always shuffling the training and validation sets (Figure 2).
FIGURE 2.

Architecture of the neural network.
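The split strategy described above can be sketched as follows. The 20% test and validation fractions are illustrative assumptions, not the values used in training:

```python
import numpy as np

def split(n_samples, fixed_test, rng, test_frac=0.2, val_frac=0.2):
    """Index split with either a fixed or a movable test set.

    With fixed_test=True the test set is always the same leading slice of
    indices; with fixed_test=False it is redrawn each run. Training and
    validation indices are reshuffled in both cases.
    """
    idx = np.arange(n_samples)
    n_test = int(n_samples * test_frac)
    if fixed_test:
        test, rest = idx[:n_test], idx[n_test:].copy()
        rng.shuffle(rest)
    else:
        rng.shuffle(idx)
        test, rest = idx[:n_test], idx[n_test:]
    n_val = int(n_samples * val_frac)
    return rest[n_val:], rest[:n_val], test  # train, validation, test

train, val, test = split(100, fixed_test=True, rng=np.random.default_rng(0))
print(len(train), len(val), len(test))  # 60 20 20
```

A fixed test set makes results comparable across runs, while a movable test set gives a better estimate of variance across data partitions; training both variants, as done here, captures both views.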
2.2.2.3. Loss functions and comparison
Loss function:
○ Mean squared error (MSE) was used as the loss function to help minimise large errors during training.
Results comparison:
○ Mean absolute error (MAE) was used to compare the results of the architectures. This approach provides greater clarity in understanding the differences between the obtained results.
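The two metrics can be stated compactly; the LD50 values below are illustrative only:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the training loss, penalising large errors."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: used to compare architectures, in LD50 units."""
    return float(np.mean(np.abs(y_true - y_pred)))

y_true = np.array([10.0, 50.0, 200.0])  # illustrative LD50 values
y_pred = np.array([12.0, 45.0, 190.0])
print(mse(y_true, y_pred))  # 43.0
print(mae(y_true, y_pred))  # 5.666...
```

Squaring makes the 10-unit error dominate the MSE, which is why it is the better training signal for avoiding large mistakes, while the MAE stays in the original units and is easier to interpret when comparing architectures.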
3. ASSESSMENT
3.1. Statistical analyses
3.1.1. Statistical assessments
The analyses conducted reveal several important findings regarding the evaluated configurations. First, the Friedman test demonstrates significant differences between the configurations in both the testing and validation phases. This suggests that the configurations used have a considerable influence on the results obtained.
The post hoc Nemenyi analysis, on the other hand, provides specific results regarding these differences in the validation phase. Configurations that include energies (25 × 25 × 25, 2 × 2 × 2 or the approach including only energies and binding affinity) show significant differences compared to those where the input is fed directly into the fully connected network. This finding indicates that energies and other voxelised descriptors have a distinct impact on model performance during validation.
However, the post hoc Nemenyi analysis does not show significant differences in the testing phase. This apparent contradiction with the Friedman test is an unusual phenomenon suggesting that, although there is an overall difference between the configurations, none stands out significantly over the others in the testing phase. This contradiction is expected to be resolved by expanding the data set with docking involving more proteins, thereby increasing the number of energies and other descriptors.
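A minimal sketch of the Friedman test on per-fold MAE scores, using SciPy and synthetic, illustrative data (the four columns stand in for input configurations such as 25 × 25 × 25 energies, 2 × 2 × 2 energies, affinity only, and direct fully connected input):

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Illustrative MAE scores: rows = repeated evaluations (e.g. folds or runs),
# columns = input configurations being compared.
rng = np.random.default_rng(42)
scores = rng.normal(loc=[1.0, 1.1, 1.4, 1.5], scale=0.05, size=(10, 4))

stat, p = friedmanchisquare(*scores.T)
print("Friedman chi-square = %.2f, p = %.4f" % (stat, p))
# A small p suggests at least one configuration differs. A post hoc Nemenyi
# test (available, e.g., in the scikit-posthocs package as
# posthoc_nemenyi_friedman) then identifies which pairs differ.
```

The pattern reported above, with a significant Friedman result but no significant Nemenyi pairs in testing, can occur because the pairwise post hoc test is more conservative than the omnibus test, especially with few repeated measurements.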
3.1.2. General assessments
Improvement in binding affinity: The binding affinity extracted from molecular docking significantly improves results compared to other descriptors derived from docking. This finding underscores the importance of this specific parameter in model performance.
Impact of energies and voxelised descriptors: While energies and other voxelised descriptors do not provide the best results on their own, a higher level of definition and the inclusion of more convolutional layers help reduce error, bringing it closer to the optimal value achieved so far. This suggests that, with more detailed processing, these descriptors may still offer significant benefits.
Use of descriptors from multiple proteins: Currently, the descriptors used focus on a single protein. A crucial future work would be to extend this methodology to extract descriptors from a variety of proteins, evaluating whether this improves prediction accuracy.
Effectiveness of 2D convolutional layers: 2D convolutional layers applied to structural images of chemicals have proven superior to voxelised descriptors. In tests, these layers provided the best results in two of the three evaluated architectures, highlighting their potential in future applications. However, in the architecture with the highest convolutional density, the results were nearly identical between the image-input architecture and the architecture based on voxelised docking descriptors.
4. CONCLUSION
The innovation of this research lies in the application of molecular docking, a simulation method for chemical-protein binding that has become computationally feasible only in recent decades. By docking chemicals with known LD50 values against the protein acetylcholinesterase, the study extracted descriptors such as energy values and Boolean indicators, distributed in a 3D space and voxelised into a 25 × 25 × 25 grid for processing. This combined data set has become the cornerstone of our research.
In addition to these descriptors, the research utilised SMILES formulas (from which we obtain the number and type of halogen atoms), binding affinity data and 2D chemical structures. Three different neural network architectures were trained, varying the fully connected and convolutional layers to identify patterns in 3D spaces or images. Initial results indicate that two of the three architectures showed improved performance using 2D structures, with potential for further error reduction through the inclusion of more docking descriptors and proteins.
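The halogen counting from SMILES mentioned above can be sketched with a simple regular expression. This is an illustrative simplification: it matches two-letter halogens before one-letter ones so 'Cl' is not miscounted, but it ignores exotic bracket-atom notation, for which a chemistry toolkit would be more robust:

```python
import re

def halogen_counts(smiles):
    """Count halogen atoms in a SMILES string.

    The alternation tries 'Cl' and 'Br' before 'F' and 'I' so that the
    two-letter symbols are matched as single atoms.
    """
    counts = {"F": 0, "Cl": 0, "Br": 0, "I": 0}
    for atom in re.findall(r"Cl|Br|F|I", smiles):
        counts[atom] += 1
    return counts

# Illustrative DDT-like SMILES with five chlorine atoms:
print(halogen_counts("C1=CC(=CC=C1C(C2=CC=C(C=C2)Cl)C(Cl)(Cl)Cl)Cl"))
# {'F': 0, 'Cl': 5, 'Br': 0, 'I': 0}
```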
This research also highlighted the distinct impact of voxelised energy descriptors on model performance, suggesting that while 2D convolutional layers applied to structural images of chemicals are currently more effective, there is significant potential for improvement through the integration of more detailed voxelised descriptors. Future studies should focus on expanding the data set to include multiple proteins, which could enhance the accuracy and generalisability of the predictive models. Additionally, the importance of binding affinity as a critical descriptor was underscored, emphasising its role in refining model performance. The findings suggest that a more comprehensive approach, combining multiple types of descriptors and advanced neural network architectures, may yield superior predictive capabilities in chemical‐protein binding studies.
5. RECOMMENDATIONS
To continue improving in this line of research, the following actions are proposed:
Expansion of energies and other docking‐based descriptors: It is essential to broaden the range of energies and descriptors based on the conjunction of current chemicals and a wider set of proteins. This will allow us to evaluate how protein diversity influences model accuracy and exponentially expand the training set.
Filtering compatible proteins: Proteins should be filtered to identify those that are compatible with the studied chemicals under in vitro conditions. This will ensure that the descriptors used are relevant and representative.
Evaluation of new descriptor combinations: It is recommended to thoroughly evaluate other combinations of descriptors, especially those related to 2D structures. This evaluation will help identify the most effective configurations for improving model accuracy.
Conversion of 2D structures to graphs for training GNNs: While convolutional networks are currently trained to learn patterns from 2D structures, it could be interesting to explore a graph‐based approach to eliminate non‐informative data.
Improvement through simultaneous predictions: A promising approach is to attempt improving results through simultaneous predictions of multiple parameters, such as oral and dermal LD50, or even binding affinity.
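As a minimal illustration of the graph representation proposed above for GNN training, the following sketch builds a molecular graph from hand-written, hypothetical atom and bond lists; in practice these would come from a parsed 2D structure:

```python
def to_graph(atoms, bonds):
    """Represent a 2D chemical structure as a graph.

    Nodes are atoms (with a feature, here just the element symbol) and
    edges are bonds, stored as an undirected adjacency list.
    """
    adjacency = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    return {"node_features": atoms, "adjacency": adjacency}

# Hypothetical fragment C-C-Cl: three atoms, two bonds.
graph = to_graph(["C", "C", "Cl"], [(0, 1), (1, 2)])
print(graph["adjacency"])  # {0: [1], 1: [0, 2], 2: [1]}
```

Unlike a fixed-size image of the 2D structure, such a graph carries only the atoms and bonds themselves, which is the non-informative-data reduction the GNN recommendation aims at.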
In summary, the results obtained so far are promising and provide a solid foundation for future research. The expansion and refinement of current methodologies, along with the exploration of new combinations and approaches, promise significant advances in prediction and analysis in this field.
COPYRIGHT FOR NON‐EFSA CONTENT
EFSA may include images or other content for which it does not hold copyright. In such cases, EFSA indicates the copyright holder and users should seek permission to reproduce the content from the original source.
ACKNOWLEDGEMENTS
This report is funded by EFSA as part of the EU-FORA programme. The authors wish to thank the following for the support provided to this scientific output: Marco Chino, Francesco Altiero, Angelo Ciaramella. A special thanks to Beatriz Remeseiro, who has taught Enol Junquera much of what he has applied in this work.
Junquera, E. , Díaz, I. , Montes, S. , & Febbraio, F. (2024). New approach methodologies for risk assessment using deep learning. EFSA Journal, 22(S1), e221105. 10.2903/j.efsa.2024.e221105
Approved: 23 September 2024
The declarations of interest of all scientific experts active in EFSA’s work are available at https://open.efsa.europa.eu/experts
REFERENCES
- Burley, S. K., Bhikadiya, C., Bi, C., Bittrich, S., Chen, L., Crichlow, G. V., Christie, C. H., Dalenberg, K., Di Costanzo, L., Duarte, J. M., Dutta, S., Feng, Z., Ganesan, S., Goodsell, D. S., Ghosh, S., Green, R. K., Guranović, V., Guzenko, D., Hudson, B. P., … Zhuravleva, M. (2021). RCSB Protein Data Bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Research, 49, D437–D451.
- Del Águila Conde, M., & Febbraio, F. (2022). Risk assessment of honey bee stressors based on in silico analysis of molecular interactions. EFSA Journal, 20, e200912.
- Eberhardt, J., Santos-Martins, D., Tillack, A. F., & Forli, S. (2021). AutoDock Vina 1.2.0: New docking methods, expanded force field, and Python bindings. Journal of Chemical Information and Modeling, 61(8), 3891–3898.
- EFSA (European Food Safety Authority). (2023). EFSA's public MetaPath database. Zenodo.
- Goldsmith, M.-R., Grulke, C. M., Chang, D. T., Transue, T. R., Little, S. B., Rabinowitz, J. R., & Tornero-Velez, R. (2014). DockScreen: A database of in silico biomolecular interactions to support computational toxicology. Dataset Papers in Science, 2014, 1–5.
- Hassan-Harrirou, H., Zhang, C., & Lemmin, T. (2020). RosENet: Improving binding affinity prediction by leveraging molecular mechanics energies with an ensemble of 3D convolutional neural networks. Journal of Chemical Information and Modeling, 60(6), 2791–2802.
- Irwin, J. J., & Shoichet, B. K. (2005). ZINC – A free database of commercially available compounds for virtual screening. Journal of Chemical Information and Modeling, 45(1), 177–182.
- Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B. A., Thiessen, P. A., Yu, B., Zaslavsky, L., Zhang, J., & Bolton, E. E. (2023). PubChem 2023 update. Nucleic Acids Research, 51(D1), D1373–D1380.
- Leman, J. K., Weitzner, B. D., Lewis, S. M., Adolf-Bryfogle, J., Alam, N., Alford, R. F., Aprahamian, M., Baker, D., Barlow, K. A., Barth, P., Basanta, B., Bender, B. J., Blacklock, K., Bonet, J., Boyken, S. E., Bradley, P., Bystroff, C., Conway, P., Cooper, S., … Bonneau, R. (2020). Macromolecular modeling and design in Rosetta: Recent methods and frameworks. Nature Methods, 17(7), 665–680.
