Author manuscript; available in PMC: 2026 May 13.
Published in final edited form as: Comput Toxicol. 2025 May 13;34:100353. doi: 10.1016/j.comtox.2025.100353

Can graph similarity metrics be helpful for analogue identification as part of a read-across approach?

Brett Hagan a,b, Imran Shah b, Grace Patlewicz b,*
PMCID: PMC12973264  NIHMSID: NIHMS2131328  PMID: 41816015

Abstract

Read-across is a technique used to fill data gaps for substances lacking specific hazard data. The technique relies on identifying source analogues with relevant data that are ‘similar’ to the substance of interest (target). Typically, source analogues are identified on the basis of structural similarity, but the evaluation of their suitability for read-across depends on other contexts of similarity. This manuscript aimed to review the ways in which source analogues are identified for read-across using chemical fingerprint/scaffold approaches before describing graph-based approaches including graph kernel, graph embedding, and deep learning methods. To demonstrate how these could be practically used for analogue identification, five different toxicity datasets of varying size and diversity were selected that had been the subject of previous read-across or QSAR analyses. One dataset was an analogue set whereas the other four datasets comprised substances evaluated for their skin sensitisation, skin irritation, fathead minnow aquatic toxicity and genotoxicity potential. The analogues and their associated similarities using the different graph-based approaches were compared with the outcomes from two chemical fingerprint approaches (ToxPrints and Morgan). The results for each dataset are briefly described. Based on the examples evaluated, graph kernel approaches were found to have some promise; in contrast, unsupervised whole graph embedding approaches were ineffective for all the datasets evaluated. Graph convolutional networks produced meaningful embeddings for the genotoxicity dataset evaluated. Depending on use case, availability and size of training data, graph similarity approaches have the potential to play a larger role in analogue identification and evaluation for read-across.

Keywords: Read-across, Graph similarity, Graph kernels, Graph Convolutional Networks (GCNs)

1. Introduction

1.1. Background to the need for Read-Across

There are tens of thousands of substances in active commerce; for example, the US Toxic Substances Control Act (TSCA) inventory comprises ~42,000 substances, of which only a small proportion have undergone sufficient toxicological evaluation [1]. According to a recent EPA report [2], only 15% of substances in US commerce had been subjected to any of the standard toxicity tests used to characterise human health effects. Assessing each remaining chemical by conventional testing would present a significant and impractical challenge in terms of cost, animal welfare, and resources. In vitro and in silico approaches have the potential to play a large role in prioritising which chemicals to focus on in the absence of conventional toxicity data. In silico approaches encompass existing information, (quantitative) structure–activity relationships ((Q)SAR) as well as read-across; the latter two both relate chemical structure to (eco)toxicological or physical property endpoints. QSARs are commonly used to address gaps for environmental fate, ecotoxicological and physical property endpoints whereas read-across is most often used for human health related endpoints. To illustrate its significance for regulatory purposes, read-across is cited as the most commonly used adaptation to address information requirements under the European Union’s Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) regulation [3,4].

In brief, read-across describes the method for filling a data gap whereby a substance with existing data (termed the ‘source analogue’) is used to make a prediction of the same property for a ‘target’ substance with limited available empirical data. Read-across can be performed in a number of different ways to fill data gaps: one-to-one; many-to-one (two or more source analogues being used to make a prediction), one-to-many or many-to-many [5,6]. The approach relies on the premise that both source and target substances are ‘similar’ in some context with relevant information pertaining to a specific outcome [5,6]. Key to this approach is the characterisation of similarity. Although structural similarity is the most common approach used to identify candidate source analogues, other similarity contexts such as similarity in physicochemical properties, metabolism, chemical reactivity, bioactivity and toxicological profile also play a significant role in justifying the relevance and suitability of those source analogues for read-across. For example, metabolic similarity might entail an assessment of the similarity of transformation pathways or the commonality of metabolites formed as determined in experimental studies. Physicochemical similarity might compare certain physical property information such as the log of the octanol–water partition coefficient (logKow), melting point, boiling point etc. of source analogues relative to the target substance to determine whether physical form and partitioning are likely to be sufficiently similar. Similarity in toxicity might evaluate whether the available empirical data identifies the same target organs impacted and whether the potencies are comparable or follow a specific trend. Such similarity context assessments are largely qualitative and heavily reliant on expert judgement in concert with empirical data [7]. 
This reliance on expert judgement results in challenges in terms of reproducibility, scalability and acceptance for regulatory purposes [8,9]. Indeed, read-across as a technique has been in use for ~25 years, but acceptance for certain regulatory contexts (e.g. risk assessment) or within specific jurisdictions still remains variable [9]. Thus, progress towards approaches that may increase confidence in, and reduce the levels of inherent uncertainty in, read-across predictions continues to be a focus of ongoing research.

Significant effort has been directed towards the evaluation of confidence in analogue identification and evaluation across a wide range of studies [7,10–13]. Several have aimed to define frameworks for characterising uncertainty [10,11,13,14], whereas others have demonstrated how high-throughput screening data can be helpful in substantiating mechanistic or biological similarity within read-across justifications [13,15,16]. The European Chemicals Agency (ECHA) has developed a read-across assessment framework in an effort to improve the characterisation and documentation of read-across uncertainties [17], whereas the Organisation for Economic Co-operation and Development (OECD) has been facilitating the development of case studies with the aim of updating existing grouping technical guidance [18], with one focus being on reducing read-across uncertainties (Integrated Approaches to Testing and Assessment (IATA)). In our own work, Generalised Read-Across (GenRA) [8,9] was created with the goals of quantifying performance and uncertainty by establishing performance baselines and quantifying the contribution that different similarity contexts play in identifying source analogues and making toxicity predictions. Research has continued to evaluate the impact that different types of similarity play in read-across for the prediction of in vivo toxicity outcomes [19–23], together with implementing the insights gained in the publicly available web application GenRA (https://www.comptox.gov/genra) [9,19–24].

Existing technical guidance as well as other reviews have described the main terms of reference for read-across [6]. This article aims to describe the ways in which source analogues are typically identified for read-across using chemical fingerprint/scaffold approaches before moving on to explore the potential utility of graph-based approaches. Three such approaches, namely graph kernel, graph embedding, and deep learning (DL) approaches, were selected for use in a series of case studies. After outlining the conceptual basis of each approach, we then evaluate how these could be practically used for analogue identification using five different toxicity datasets of varying size and diversity. The size of the different datasets influences which graph-based approach might be more meaningful to investigate. The datasets selected had been the subject of previous read-across or QSAR analyses, and were ones that at least one of the authors was familiar with or had helped assemble, which allowed for a reasonable understanding of the selection process for the substances they contained and the quality of the underlying biological data. They were also chosen to represent a range of modes of action which could influence the identification and evaluation of analogue similarity. After evaluating each case study dataset, recommendations were made as to the suitability of each of the approaches depending on use case, endpoint and size of dataset. Fig. 1 provides a graphical outline of this article for ease of reading.

Fig. 1.

Graphical guide of the organisation of the manuscript.

1.2. Source analogue identification

There are a number of software tools that facilitate the identification of source analogues. Most of these use structural similarity as a basis to return analogues. This is usually performed in one of two main ways – either by a descriptor-based similarity calculation or a substructure-based assessment [25]. In practice, this means that a software tool contains a large database (or dataset) of chemical substances that serves as a source analogue inventory. To identify analogues, a search query is performed using the target substance of interest to return candidate analogues. In a substructure-based approach, either the substructures shared with the target substance are determined, or matched molecular pairs [26,27] are generated to identify common core structures that are distinguished at a given site. Such substructure-based calculations are binary: either the target and source analogues share a pre-defined substructure or they do not, so no adjustable threshold exists to tune the returned set of candidate analogues. On the other hand, the hits returned are often more chemically intuitive and interpretable.

In a descriptor-based approach, the key considerations are how the substances forming the source inventory are represented numerically and what metric is used to quantify similarity against a specific threshold. Source analogues can be characterised by 1D, 2D or 3D representations of structures or hybrids of these. The EPA CompTox Chemicals Dashboard [28], PubChem, as well as the many functionalities within the OECD Toolbox (qsartoolbox.org) [29] facilitate such analogue searches. Two-dimensional binary chemical fingerprints are frequently used for practical efficiency, especially when a source inventory contains large numbers of substances e.g. the 1.2 million substances that underpin the EPA CompTox Chemicals Dashboard. A target substance is converted into the same chemical fingerprint representation and a query based on pairwise similarities returns a number of candidates based either on the similarity threshold set or on a user-defined number of candidates. The similarity score is a quantitative measure between 0 and 1 that summarises the commonality in structure based on the presence and absence of particular fingerprint features; the threshold is the minimum score a candidate must reach to be returned. By far the most common similarity index used is the Tanimoto (Jaccard) index [30], though there are a number of other similarity indices that can also be employed [30,31]. The choice of similarity index depends on the chemical representation used. A Tanimoto index lends itself best to binary fingerprints whereas other metrics (as outlined in Gallegos-Saliner et al. [32]) may be more suitable in cases where continuous descriptors represent the source analogues. Whilst there is no specific threshold to determine the suitability of a source analogue for read-across, using a Tanimoto score of 0.6 or higher can be a reasonable starting point to return an initial set of source analogues for further evaluation.
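To make the threshold-based query concrete, the following minimal sketch computes the Tanimoto index between binary fingerprints stored as sets of 'on' bit positions and returns candidates at or above a cut-off. The function names, the toy inventory and the bit positions are hypothetical and chosen purely for illustration; they do not correspond to any particular tool's implementation.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) index for fingerprints given as sets of 'on' bit positions."""
    if not fp_a and not fp_b:
        return 0.0  # convention assumed here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def find_analogues(target, inventory, threshold=0.6):
    """Return (name, similarity) pairs at or above the threshold, most similar first."""
    hits = [(name, tanimoto(target, fp)) for name, fp in inventory.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy fingerprints: the bit positions are made up for illustration.
target_fp = {1, 4, 7, 9, 12}
inventory = {
    "candidate_A": {1, 4, 7, 9, 13},  # 4 shared bits, 6 bits in the union
    "candidate_B": {1, 4, 20, 21},    # 2 shared bits, 7 bits in the union
    "candidate_C": {1, 4, 7, 9, 12},  # identical to the target
}

for name, score in find_analogues(target_fp, inventory):
    print(name, round(score, 3))
```

With the 0.6 starting threshold mentioned above, candidate_B (2/7) is filtered out while candidate_C and candidate_A are retained for further evaluation.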

There are several types of chemical fingerprints; one of the most popular is the extended connectivity fingerprint (ECFP) or Morgan fingerprint [33]. The ECFP defines molecular features by assigning identifiers to each of the atoms in the molecule based on some combination of properties such as atomic number, atomic mass etc. Each atom then collects its identifier and those of its neighbouring atoms into an array and uses a hash function to reduce the array into a single integer identifier. This captures the neighbourhood of the atom. Once all atoms have generated their new identifiers, these are updated and the process is repeated several times. After each iteration, the identifier contains information about the immediate neighbours, then the neighbours of those neighbours, and so on, so that with enough iterations each atom would contain information from all parts of the molecule. Finally, the identifiers are converted into a bit array whose length is defined by the user. ECFP4 is probably the most common ECFP fingerprint, where the 4 denotes the largest possible fragment having a width of 4 bonds.
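The iterative hashing procedure described above can be sketched in a few lines. This is a deliberately simplified illustration and not the published ECFP algorithm: it assumes atoms are described only by element symbol and degree, ignores bond orders, and does not remove structurally duplicate features. The function name and the use of CRC32 as the hash are assumptions made for the example.

```python
import zlib


def circular_fingerprint(atoms, bonds, radius=2, n_bits=1024):
    """Simplified ECFP-style fingerprint, returned as a set of 'on' bit positions.

    atoms: list of element symbols; bonds: list of (i, j) atom index pairs.
    radius=2 corresponds to a maximum fragment width of 4 bonds.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    def stable_hash(obj):
        # CRC32 of the text representation: deterministic across runs.
        return zlib.crc32(repr(obj).encode("utf-8"))

    # Iteration 0: identifiers from atom properties (here element and degree only).
    ids = [stable_hash((atoms[i], len(neighbours[i]))) for i in range(len(atoms))]
    features = set(ids)

    for _ in range(radius):
        # Each atom hashes its own identifier together with its neighbours' identifiers.
        ids = [
            stable_hash((ids[i], tuple(sorted(ids[j] for j in neighbours[i]))))
            for i in range(len(atoms))
        ]
        features.update(ids)

    # Fold the integer identifiers into a fixed-length bit array.
    return {feature % n_bits for feature in features}


# Heavy-atom graph of ethanol (C-C-O), written with two different atom orderings.
ethanol = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_reordered = circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
print(ethanol == ethanol_reordered)
```

Because neighbour identifiers are sorted before hashing, the resulting bit set does not depend on the order in which atoms are listed.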

Another type of fingerprint is the key or dictionary fingerprint where there is a defined fixed set of substructural features representing molecular characteristics. MACCS (Molecular Access System by Molecular Design Limited) [34] and ChemoType ToxPrints [35] are examples of these. The MACCS fingerprint was one of the first developed, containing 166 structural features. The original ToxPrints comprised a set of 729 generic structural fragments organised by atom, bond, chain, ring types as well as specific chemical functional groups. Atom pairs form another type of fingerprint, where an atom typing algorithm is applied such that certain values for each atom of a molecule are computed [36]. An atom pair is defined in terms of the atomic environments of, and the shortest path separations between, all pairs of atoms in the topological representation of a chemical structure.
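As an illustration of the atom pair idea, the sketch below enumerates all pairs of atoms together with their shortest path separation. It is a simplification under stated assumptions: the atom type is just the element symbol (the original scheme uses a richer atomic environment) and the separation is counted in bonds. The function name is hypothetical.

```python
from collections import Counter, deque


def atom_pair_counts(atom_types, bonds):
    """Count (type, separation, type) features over all atom pairs of a molecular graph."""
    neighbours = {i: [] for i in range(len(atom_types))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    counts = Counter()
    for source in range(len(atom_types)):
        # Breadth-first search gives shortest path lengths (in bonds) from the source atom.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, separation in dist.items():
            if target > source:  # count each unordered pair once
                first, second = sorted((atom_types[source], atom_types[target]))
                counts[(first, separation, second)] += 1
    return counts


# Heavy-atom graph of ethanol: C-C-O
ethanol_pairs = atom_pair_counts(["C", "C", "O"], [(0, 1), (1, 2)])
print(ethanol_pairs)
```

The resulting counts can be used directly as a count-based fingerprint or folded into a bit vector.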

In each case, the fingerprint is usually represented as a bit string or binary vector to denote presence and absence of a structural feature that can then be used as a query to search for source analogues. Some fingerprints can also encode counts to capture the number of occurrences of a structural feature rather than just its presence or absence.

Chemical fingerprints have proved useful for fast similarity comparisons as well as inputs into the development of QSARs for different activity outcomes including toxicity endpoints. The fingerprints themselves are a simplified representation of a chemical that may be insufficient to resolve differences in toxicity outcomes that are important in read-across. For example, Morgan circular fingerprints are typically poor at perceiving global features of a molecule (e.g. size or shape) and may fail to discriminate between subtle changes between two small molecules. More details on the different types of structural representations can be found in a recent review [37].

This study took inspiration from Mellor et al. [38] to evaluate whether considering the inherent representation of a chemical structure as a molecular graph, with atoms as nodes and bonds as edges, might offer novel ways of characterising structural information and, in turn, similarity for read-across.

It is important to acknowledge that considering chemicals as molecular graphs is not a novel concept. In fact, a wide variety of chemical properties and processes have been modelled using information derived from molecular graphs for many decades. Traditional topological indices for chemical structures are algebraic invariants of hydrogen depleted molecular graphs which represent the topology of a molecule. There are hundreds of topological indices although the majority can be broadly categorised into 5 main types namely: degree-based indices, distance-based indices, count-based indices, eigenvalue-based indices and information-theoretic indices [39]. Further information on these indices is provided in the supplementary information.
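As a concrete example of a distance-based index, the sketch below computes the Wiener index, i.e. the sum of shortest path lengths between all pairs of atoms in the hydrogen depleted graph. The choice of this particular index and the function name are ours for illustration; the indices considered in this study are described in the supplementary information.

```python
from collections import deque


def wiener_index(n_atoms, bonds):
    """Sum of shortest path distances (in bonds) over all pairs of heavy atoms."""
    neighbours = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    total = 0
    for source in range(n_atoms):
        # Breadth-first search from each atom in turn.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Count each unordered pair once.
        total += sum(d for node, d in dist.items() if node > source)
    return total


n_butane = (4, [(0, 1), (1, 2), (2, 3)])   # linear chain of four carbons
isobutane = (4, [(0, 1), (0, 2), (0, 3)])  # central carbon with three methyl groups
print(wiener_index(*n_butane), wiener_index(*isobutane))  # 10 9
```

The two isomers are distinguished here, but the whole structure is still reduced to one integer per molecule.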

1.3. Graph similarity

Topological indices provide a single, composite number that characterises each molecule’s structure. Whilst this approach is advantageous for its simplicity and efficiency, it often compresses complex structural information into one value, which can obscure finer details about specific molecular substructures. To address this, a broader range of graph similarity methods have been developed, allowing for more granular analysis. Techniques such as graph edit distance, graph isomorphism, and maximum common subgraph matching offer ways to directly compare molecular structures, identifying subtle differences and shared features that single-index values may overlook [40–46]. These are summarised in brief.

1.3.1. Graph edit distances using Reduced graphs

Reduced graphs provide a summarised representation of a chemical structure that is produced by collapsing connected atoms into single nodes and forming edges between the nodes in accordance with bonds in the original structure. Reduced graphs have been used in a variety of applications in cheminformatics ranging from the representation and search of Markush structures to the identification of structure–activity relationships (SARs). There are a number of different graph reduction schemes, each devised to address a different purpose [47–49]. Graph reduction schemes have been developed for similarity searching, often with the objective of identifying substances with similarity in activity.

1.3.2. Graph isomorphism

Questions in chemical similarity have often been framed as graph comparison problems where chemical equivalence may be modelled as a graph isomorphism task, i.e. are two chemical graphs identical (isomorphic)? Another question may be to determine whether some chemical graph is included as part of another chemical. Searching for a specific substructure (e.g. a benzene ring) within another chemical has been modelled as a subgraph isomorphism task. The enumeration of possible chemical structures is closely related to graph enumeration [50]. Graph isomorphism is a test of structural equivalence, wherein two graphs are isomorphic if a mapping exists that preserves a one-to-one correspondence between the two graphs’ sets of nodes and edges.
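For very small graphs, the definition of isomorphism can be checked directly by trying every one-to-one mapping of atoms. The sketch below does exactly that; it assumes simple graphs (no repeated bonds), ignores bond orders, and scales factorially with the number of atoms, so it is only meant to illustrate the definition and not a practical algorithm. The function name is hypothetical.

```python
from itertools import permutations


def are_isomorphic(labels_a, edges_a, labels_b, edges_b):
    """Brute-force search for a label- and edge-preserving one-to-one atom mapping."""
    if len(labels_a) != len(labels_b) or len(edges_a) != len(edges_b):
        return False
    if sorted(labels_a) != sorted(labels_b):
        return False
    edge_set_b = {frozenset(edge) for edge in edges_b}
    n = len(labels_a)
    for mapping in permutations(range(n)):
        # mapping[i] is the atom of graph B assigned to atom i of graph A.
        if any(labels_a[i] != labels_b[mapping[i]] for i in range(n)):
            continue
        if all(frozenset((mapping[i], mapping[j])) in edge_set_b for i, j in edges_a):
            return True
    return False


# Heavy-atom graphs: propan-1-ol, the same molecule renumbered, and propan-2-ol.
propan_1_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
propan_1_ol_renumbered = (["O", "C", "C", "C"], [(0, 1), (1, 2), (2, 3)])
propan_2_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)])

print(are_isomorphic(*propan_1_ol, *propan_1_ol_renumbered))  # True
print(are_isomorphic(*propan_1_ol, *propan_2_ol))             # False
```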

1.3.3. Maximum common subgraph

A common need in cheminformatics is the ability to align pairs of molecules together to determine the degree of structural overlap. This is useful when exploring SARs, predicting bioactivity of substances or identifying chemical reaction sites. The degree of overlap between a pair of chemicals can be determined using maximum common subgraph isomorphism algorithms [51,52]. In cheminformatics, maximum common subgraph isomorphism is usually referred to as identifying the maximum common substructure (MCS). Given two structures, the MCS is the largest substructure common to both. Maximum could be interpreted to imply the maximum number of atoms, number of bonds, number of cycles or even some physical property. There are also variations in how atom and bond equivalency might be defined. However, in the most common MCS definition, atoms are considered equivalent if their atomic numbers are the same and bonds are considered equivalent if they are of the same type. There are a range of algorithms that can determine the MCS between pairs of chemicals. Raymond and Willett [52] reviewed the main solutions for pairwise MCS including multiple MCS [53].

Although methods such as the maximum common subgraph (MCS) excel at pinpointing shared structural features in pairwise comparisons, they can become computationally intensive and less scalable when applied to large chemical datasets. Further, the presence of different functional groups can reduce the similarity as the mismatch prevents inclusion into the MCS especially if the algorithm used prioritises topological similarity relative to functionality similarity. To address some of the limitations, more recent graph similarity approaches, such as graph kernel methods, graph embeddings, and deep learning-based techniques, have been developed. Graph kernel methods directly calculate a similarity score between two graphs based on their structural properties. Graph embedding methods transform graphs into numerical representations (vectors) that can be compared using standard distance metrics. These techniques are both unsupervised in that the representations are not tuned or customised for any specific outcome such as toxicity. Deep learning methods e.g. neural networks learn these numerical representations using labelled data (i.e. supervised) such that the resulting embeddings capture insights about the toxicity endpoint of interest.

1.3.4. Graph kernels

Graph kernels were first introduced as a way to compare complex structures like graphs based on a concept from Haussler’s work on kernels for discrete structures [54]. The term “graph kernels” soon emerged to describe methods specifically for comparing graphs [54–56]. The core idea behind graph kernels is to break down a graph into smaller components, called substructures. These substructures are then used to create feature vectors, which characterise the graph. By comparing these feature vectors, it is possible to measure how similar two graphs are. The inner products of the feature vectors can be efficiently computed to produce a similarity score between the graphs. The key to graph kernels lies in how the graph is decomposed. One simple approach is to count how many times each node label occurs in each graph and compute the inner product of these label counts to produce a similarity score [57]. Fig. S1 provides a conceptual example of counting node labels.
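The node label counting approach amounts to an inner product of label histograms, as in the following minimal sketch. The function names are hypothetical, and the cosine-style normalisation to the range 0 to 1 is a common convention that we assume here for illustration.

```python
import math
from collections import Counter


def label_histogram_kernel(labels_a, labels_b):
    """Inner product of the node label counts of two graphs."""
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    return sum(counts_a[label] * counts_b[label] for label in counts_a)


def normalised_kernel(labels_a, labels_b):
    """Scale the kernel so that a graph compared with itself scores 1."""
    k_ab = label_histogram_kernel(labels_a, labels_b)
    k_aa = label_histogram_kernel(labels_a, labels_a)
    k_bb = label_histogram_kernel(labels_b, labels_b)
    return k_ab / math.sqrt(k_aa * k_bb)


ethanol_atoms = ["C", "C", "O"]           # heavy atoms of ethanol
acetic_acid_atoms = ["C", "C", "O", "O"]  # heavy atoms of acetic acid

# C: 2 * 2 = 4, O: 1 * 2 = 2, so the raw kernel value is 6.
print(label_histogram_kernel(ethanol_atoms, acetic_acid_atoms))
print(round(normalised_kernel(ethanol_atoms, acetic_acid_atoms), 3))
```

Because only label counts are compared, this kernel is blind to connectivity: any two molecules with the same heavy-atom composition receive the maximum score.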

Another way to decompose a graph is through random walk kernels. This involves taking random paths through the graph and counting how often each path occurs in each graph [58]. Shortest path kernels find the shortest paths between labelled nodes (atoms) in each graph and use these to construct feature vectors [59]. A more advanced method, the WL subtree kernel introduced by Shervashidze et al. in 2011 [60], builds on the Weisfeiler-Lehman (WL) graph isomorphism heuristic. The WL isomorphism heuristic works by iteratively updating the label of each atom based on the labels of its neighbouring atoms. Over several iterations, this process captures more detailed substructures within the molecule, gradually embedding the context of each atom, and hence the molecular structure, into the labels. As the labels evolve, they encode increasingly larger neighbourhoods around each atom. This means that the WL kernel can capture structural features like functional groups that are shared between molecules. When the heuristic is used as an isomorphism test, the algorithm terminates as soon as the atom labels of the two molecular graphs do not match, as the two graphs cannot be isomorphic. In the kernel, the number of matching labels across iterations serves as a measure of graph similarity, i.e. how similar the molecules are in terms of their structure. Fig. 2 shows an example iteration of the kernel between two graphs.

Fig. 2.

One iteration of the WL kernel. Feature vectors initially consist of counts of original node labels. At each iteration, new labels (colours) are created for each atom by considering the labels of its neighbours. Nodes a in both G1′ and G2′ are labelled grey as they were both adjacent to a single blue label in the previous iteration, whereas nodes e in G1′ and G2′ are assigned different labels due to the differences in their neighbouring node labels. The feature vectors consist of counts of the original and newly created node labels, with iteration continuing until a defined limit or convergence is reached. The inner products are computed to obtain a similarity score.
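The relabelling and counting steps of the WL subtree kernel can be sketched as follows. This is a minimal illustration and not the implementation used in the case studies: labels are kept as readable strings rather than being compressed to integers, bond types are ignored, and the function names are hypothetical.

```python
from collections import Counter


def wl_features(labels, edges, iterations=2):
    """Count the original and iteratively refined node labels of one graph."""
    neighbours = {i: [] for i in range(len(labels))}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    current = list(labels)
    features = Counter(current)
    for _ in range(iterations):
        # New label = own label plus the sorted multiset of neighbour labels.
        current = [
            current[i] + "(" + ",".join(sorted(current[j] for j in neighbours[i])) + ")"
            for i in range(len(labels))
        ]
        features.update(current)
    return features


def wl_kernel(graph_a, graph_b, iterations=2):
    """Inner product of the WL label counts of two graphs given as (labels, edges)."""
    features_a = wl_features(*graph_a, iterations=iterations)
    features_b = wl_features(*graph_b, iterations=iterations)
    return sum(features_a[key] * features_b[key] for key in features_a)


# Heavy-atom graphs of ethanol (C-C-O) and propan-1-ol (C-C-C-O).
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])

print(wl_kernel(ethanol, propanol, iterations=0))  # 7: original labels only
print(wl_kernel(ethanol, propanol, iterations=1))  # 10: adds C(C), C(C,O) and O(C)
```

After one iteration the shared labels C(C), C(C,O) and O(C) each contribute 1 to the score, reflecting the terminal methyl and the CH2-OH environment that the two alcohols have in common.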

1.3.5. Graph embeddings

Whilst there are numerous advantageous qualities to graph representations, their unstructured, relational nature does not allow them to be directly used as inputs into read-across and QSAR models [61]. To overcome this limitation, graph embedding techniques are used to create lower dimensional representations of graph data whilst retaining as much topological and label information as possible. Graph embeddings allow for a type of similarity measurement between graphs. Embedding methods represent graphs in a multi-dimensional latent space, where highly similar molecular graphs will lie near each other, and dissimilar molecular graphs will lie further apart. The distance between the embedding of two molecular graphs in the latent space provides a quantitative measure of similarity.
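Once molecules have been embedded as vectors, analogue searching reduces to ranking by a standard vector similarity or distance. The sketch below uses cosine similarity; the embedding values and names are invented for illustration and are not the output of any embedding method.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_analogues(target_vec, embeddings, k=2):
    """Names of the k embedded molecules most similar to the target."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(target_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical 2-dimensional embeddings, for illustration only.
embeddings = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.9, 0.1]}
print(nearest_analogues([1.0, 0.0], embeddings))
```

Euclidean distance could be substituted for cosine similarity where vector magnitude is considered meaningful.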

A number of different methods exist that are capable of creating graph embeddings, which can be broadly divided into two categories: node embeddings and whole graph embeddings. Node embeddings map individual atoms in a molecular graph to numerical vectors, capturing atom characteristics and relationships. Whole graph embeddings represent the entire molecular graph as a single vector, often by combining atom embeddings or using other methods, to permit pairwise molecular graph comparisons. There are a variety of different approaches to either task, with well established taxonomies in the literature dividing them into three distinct categories: matrix factorisation methods, random walk based methods, and neural network methods, with substantial areas of overlap between the three [62,63].

Matrix factorisation techniques were the earliest studied, beginning with multi-dimensional scaling (MDS), which decomposed adjacency matrices [64]. Other factorisation methods operate on graph proximity (distance matrices) or graph Laplacian matrices [65,66]. Although factorisation methods are the most well-established and theoretically understood, they often scale poorly [67]. Random walk based embeddings [68] later emerged based upon word and document embedding methodologies such as Word2Vec, adapting the skip-gram neural network model used to create word embeddings to the graph context. The skip-gram model is a simple single hidden layer neural network (see Fig. 3) that is trained to predict the probabilities for each word in a given vocabulary to appear near in sequence to a given target word. The network is trained, and the weights of the trained network are exploited as vectorised word embeddings, with the underlying intuition being that words that often appear in similar contexts are likely to be highly similar [69].

Fig. 3.

Skip-gram model for Word2Vec word embeddings. A neural network with one hidden layer is trained to determine the probabilities for each word in a vocabulary of appearing near in sequence to a given target word. The target word is presented as a one-hot encoded input, and after training via backpropagation over a number of epochs, the hidden weights of the network are used as embedded vector representations of words.
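The training data for a skip-gram model are simply (target, context) pairs drawn from a sliding window over each sequence. The sketch below generates these pairs only; the network training itself is omitted. The function name, the window size and the example token names are assumptions for illustration.

```python
def skipgram_pairs(sequence, window=2):
    """List the (target, context) pairs within `window` positions of each token."""
    pairs = []
    for i, target in enumerate(sequence):
        start = max(0, i - window)
        stop = min(len(sequence), i + window + 1)
        pairs.extend((target, sequence[j]) for j in range(start, stop) if j != i)
    return pairs


# A "sentence" could equally be a list of substructure identifiers or graph nodes.
sentence = ["methyl", "methylene", "hydroxyl"]
print(skipgram_pairs(sentence, window=1))
```

Tokens that repeatedly share context tokens end up with similar hidden-layer weights, which is what allows the weights to be reused as embeddings.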

In the chemistry domain, Jaeger et al. [70] developed Mol2Vec, which is analogous in concept to Word2Vec. Mol2Vec was developed to learn vector representations of molecular substructures that point in similar directions for chemically related substructures. Substructures derived using the Morgan algorithm were treated as “words” and substances as “sentences”. The Word2Vec algorithm was then applied to a corpus of 19.9 million substances taken from the ZINC and ChEMBL databases. The feature vectors for the substructures generated were then summed to obtain substance vectors which could be used as inputs for any subsequent machine learning approaches. DeepWalk adapted the skip-gram approach to a graph setting [68] for node embedding. Words are analogous to nodes in the graph, the sequences of words (a “context”) are analogous to random walks across node neighbourhoods, and the vocabulary of words is analogous to all nodes in the graph. Node2Vec iterated upon DeepWalk with the introduction of parameters to control the length and freedom of the random walk operations [71]. Graph2Vec iterated upon Node2Vec to allow for skip-gram based whole graph embeddings based on rooted subgraphs analogous to words in Word2Vec [72]. In the context of molecular graphs, Graph2Vec identifies recurring substructures across many molecules and learns which are important and how they combine to form the whole molecule. The information is then encoded into a fixed length vector for each molecule. Each molecule is represented by a vector that captures the overall structure from both a local perspective (in terms of specific functional groups) in addition to more global patterns (like the arrangement of these features). GL2Vec improved upon Graph2Vec in classification tasks by incorporating information gleaned from a line graph representation, better allowing for the capture of structural information [73]. The LDP (Local Degree Profile) embedding method, introduced in 2019, showed comparable performance to more sophisticated embedding methods while considering only the degree information of nodes in a graph, without any label information whatsoever [74].
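The analogy between sentences and random walks can be illustrated by generating the walks that a DeepWalk-style method would feed to the skip-gram model. This sketch assumes uniform, unbiased walks of fixed length from every node; the function name, parameters and seed are hypothetical, and the subsequent embedding step is not shown.

```python
import random


def random_walks(neighbours, walk_length=5, walks_per_node=2, seed=0):
    """Generate uniform random walks ("sentences" of nodes) starting from every node."""
    rng = random.Random(seed)  # seeded so that the example is reproducible
    walks = []
    for _ in range(walks_per_node):
        for start in neighbours:
            walk = [start]
            # Step to a randomly chosen neighbour until the walk is long enough.
            while len(walk) < walk_length and neighbours[walk[-1]]:
                walk.append(rng.choice(neighbours[walk[-1]]))
            walks.append(walk)
    return walks


# Adjacency list for the heavy-atom graph of ethanol: C0-C1-O2
ethanol_graph = {0: [1], 1: [0, 2], 2: [1]}
walks = random_walks(ethanol_graph, walk_length=4, walks_per_node=2)
for walk in walks:
    print(walk)
```

Each walk plays the role of a sentence and each node the role of a word, so nodes that co-occur in many walks receive similar embeddings.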

1.3.6. Deep learning embeddings

Graph neural networks (GNNs) were introduced in 2009 with the goal of extending existing neural network models for processing graph structured data [75]. Graph convolutional networks (GCNs) were introduced by Duvenaud et al. [76] to operate on graphs for molecular property predictions. Subsequently, Coley et al. [77] constructed feature vectors of atoms using atom and bond attributes in molecules and considered local chemical environment information within different neighborhood radii. By directly inputting the complete molecular graph into the Convolutional Neural Network (CNN), the model could learn to recognise atom cluster features, significantly improving the performance of the CNN model. Gilmer et al. [78] reformulated existing models as message passing neural networks (MPNN) and leveraged MPNN to demonstrate state-of-the-art results on quantum mechanical property prediction tasks for small organic molecules. Wang et al. [79] used graph structures with convolutional networks to discover the relationship of each atom and designed a convolution spatial graph embedding layer (C-SGEL) to make full use of the spatial connectivity information of molecules.

At the base level, GCNs take a graph as input and pass it through a number of convolutional layers that aggregate each node’s neighbourhood information. At each layer, each node in the graph has its hidden state updated by aggregating the hidden states of its neighbours with some function and combining the result with the node’s current hidden state. The output of the convolutional layers is a set of node embeddings, i.e. vectorised representations of each node in the graph. Whole graph embeddings are generated from these individual node embeddings by combining them through a “pooling” layer that aggregates the node embeddings together. The resulting embeddings can then be used as inputs into different regression or classification based machine learning models (see Fig. 4).

Fig. 4.

Graph convolutional network conceptual model. A graph is given as input into the model and passed through a series of convolutional layers and activation functions that produce embeddings for each node in the graph. The individual node embeddings are aggregated together by some pooling operation in a readout layer in order to produce a whole graph embedding as output that can be used in building classification or regression models.
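The forward pass through one convolutional layer followed by a readout can be sketched without any deep learning library. This is a simplified illustration under stated assumptions: neighbour states are combined by a plain mean that includes the node itself, the weight matrix is fixed by hand rather than learned, and sum pooling is used for the readout. The function names are hypothetical.

```python
def gcn_layer(node_states, neighbours, weights):
    """One graph convolution: average a node's state with its neighbours', apply a linear map, then ReLU.

    weights is a matrix with one row per input feature and one column per output feature.
    """
    new_states = []
    for i, state in enumerate(node_states):
        group = [state] + [node_states[j] for j in neighbours[i]]
        mean = [sum(column) / len(group) for column in zip(*group)]
        out = [sum(m * w for m, w in zip(mean, column)) for column in zip(*weights)]
        new_states.append([max(0.0, x) for x in out])
    return new_states


def readout(node_states):
    """Sum pooling: combine node embeddings into a single whole graph embedding."""
    return [sum(column) for column in zip(*node_states)]


# Ethanol heavy atoms (C0-C1-O2) with one-hot input features [is_carbon, is_oxygen].
states = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbours = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for weights that training would learn

hidden = gcn_layer(states, neighbours, identity)
graph_embedding = readout(hidden)
print(hidden)
print(graph_embedding)
```

In a supervised setting the weights would be fitted so that the whole graph embedding is predictive of the labelled endpoint; stacking further layers widens the neighbourhood that each node embedding summarises.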

2. Case study methods

Having reviewed ways in which source analogues can be identified using chemical features and provided an overview of different graph similarity approaches, we turn our attention to evaluating how different graph-based approaches could practically be used to assess similarity for read-across. Three representative approaches were selected that span a spectrum of methodological complexity and interpretability: the Weisfeiler-Lehman (WL) graph kernel, graph embeddings, and deep learning (DL)-based models. This was motivated by the need to explore how different levels of structural abstraction and model capacity affected similarity in datasets varying in size, diversity, and chemical context. Morgan and ToxPrint chemical fingerprints were used as baseline comparators. Five different datasets varying in size, from an analogue approach case taken from an EPA assessment to a large publicly available dataset of genotoxicity outcomes for several thousand substances, were selected as case studies. The selection of the datasets was influenced by several considerations: 1) they had been the subject of previous read-across or QSAR analyses; 2) they were ones that at least one of the authors was familiar with, which allowed for a reasonable understanding of the selection process for the substances they contained and the quality of the underlying biological data; and 3) they represented a range of endpoints and use cases which could influence the use of molecular similarity. For example, the single analogue approach reflected the expert assessment use case for repeated dose toxicity whereas the large genotoxicity dataset could leverage deep learning approaches.

2.1. Datasets analysed

The five datasets represented different read-across scenarios, thus allowing several different types of similarity calculations to be performed on different representations of chemical structure. The datasets were publicly available and had been extensively curated as part of their original publications. The repeated dose dataset was the subject of one of the published EPA Provisional Peer-Reviewed Toxicity Values (PPRTV) assessments. A PPRTV is defined as a toxicity value derived for use in the EPA Superfund Program (see https://www.epa.gov/pprtv). The Local Lymph Node Assay (LLNA) dataset was taken from two publications that evaluated the performance of different in vitro test methods to address the information requirements for REACH as part of an IATA. The reaction domain information annotated for the dataset was one of the outcomes of an ECVAM expert workshop whose objective was to evaluate the need for new methods to assess pre- and pro-haptens [80]. The BfR skin irritation dataset underpinned the BfR structural rulebase classification scheme [81]. The dataset was provided to the JRC for use in re-implementing the scheme into the JRC Toxtree rulebase software tool [82] as well as its sister read-across tool, ToxMatch [83]. The fathead minnow dataset underpins the EPA’s ASTER scheme [84] that assigns mode of action information. Finally, the genotoxicity dataset was compiled as part of an EPA project aimed at helping to prioritise potential candidate substances for risk based evaluations under TSCA (https://www.epa.gov/sciencematters/proof-concept-case-study-integrating-publicly-available-information-screen) and was subsequently used to investigate the development of consensus models for genotoxicity [85]. Table 1 summarises the datasets.

Table 1.

The datasets investigated in this study with a description of the coverage and scope.

Dataset No. Effect/Toxicity No. Chemicals Types of Chemicals Techniques attempted Reference
1 Repeated dose toxicity 6 Analogue approach for a nitrotoluene and its analogues taken from a published EPA assessment WL PPRTV
2 Local Lymph Node Assay (LLNA) for skin sensitisation that have both chemical and biological diversity 222 A broad range of chemicals capturing different reaction mechanisms. Dataset was originally compiled to evaluate the performance of different in vitro assays relative to in vivo sensitisation outcomes. The reaction domains were assigned by an expert panel during an ECVAM workshop as described in the cited reference. WL, Graph2Vec, Mol2Vec Patlewicz et al [80]; Asturiol et al [86]
3 Fathead Minnow MOA aquatic acute toxicity 617 Broad range of chemicals capturing different Modes of Action (MOAs). Dataset underpinning various EPA models such as ASsessment Tools for the Evaluation of Risk (ASTER). WL, Graph2Vec, Mol2Vec Dataset taken from the public read-across tool, ToxMatch [83]
4 BfR skin irritation 70 Training set of chemicals including aliphatic alcohols, esters, aldehydes and haloalkanes with classification information for skin irritation that was used to inform the BfR rulebase WL Dataset taken from Toxtree [82]
5 Genotoxicity dataset 5403 Summary genotoxicity outcomes extracted from ToxValDB 9.5 but aggregated in accordance with Pradeep et al. [85] Graph2Vec, GCN Pradeep et al. [85]

WL = Weisfeiler-Lehman graph isomorphism; Graph2Vec = Whole Graph Embedding; Mol2Vec = Node Embedding; GCN = Graph Convolutional Networks.

2.2. Chemical representations

Morgan chemical fingerprints were generated using a radius of 3 and a bitvector length of 2048. ToxPrints were the original 729 features described in Yang et al. [35]. WL subtree kernels were generated using the GraKeL Python library [87]. The node-level information used to derive the WL kernels comprised the atom type, its degree, hybridisation, aromaticity, formal charge and implicit hydrogen count.
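To make the WL relabelling idea concrete, the following is a minimal standard-library sketch of a normalised WL subtree kernel. It is not the GraKeL implementation used in this study: the function names, the toy heavy-atom graphs and the use of the element symbol alone as the node label are assumptions made for illustration (the study used a richer node label comprising atom type, degree, hybridisation, aromaticity, formal charge and implicit hydrogen count).

```python
from collections import Counter
from math import sqrt


def wl_features(adjacency, labels, iterations=2):
    """Count node labels seen at iteration 0 and after each WL relabelling."""
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbour_labels = sorted(current[n] for n in neighbours)
            updated[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = updated
        counts.update(current.values())
    return counts


def dot(features_a, features_b):
    """Dot product of two sparse label-count vectors."""
    return sum(count * features_b[label] for label, count in features_a.items())


def wl_similarity(graph_a, graph_b, iterations=2):
    """Normalised kernel value in [0, 1]; 1 means identical label counts."""
    fa = wl_features(graph_a[0], graph_a[1], iterations)
    fb = wl_features(graph_b[0], graph_b[1], iterations)
    return dot(fa, fb) / sqrt(dot(fa, fa) * dot(fb, fb))


# Hypothetical toy graphs (heavy atoms only): bromoethane and 1-bromopropane.
bromoethane = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "Br"})
bromopropane = ({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]},
                {0: "C", 1: "C", 2: "C", 3: "Br"})

print(wl_similarity(bromoethane, bromopropane))
```

Normalising by the self-kernel values puts the score on the same 0 to 1 scale as a Jaccard similarity, which is what makes side-by-side comparison with fingerprint similarities possible.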

Graph2Vec embeddings were created using the KarateClub package [88], from which pairwise cosine distances were calculated. Mol2Vec-type embeddings were derived following an approach inspired by natural language processing techniques such as Word2Vec. The DSSTox library [89] of approximately 0.5 million substances that could be represented by a discrete structure was used as the corpus of diverse chemicals. SMILES were tokenised on the basis of Morgan chemical fingerprints, and Gensim’s [90] Word2Vec engine was then used to train a model to learn embeddings for the molecular fragments. Embeddings for the entire molecule were created by taking the mean of the fragment embeddings.
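The last two steps described above (averaging fragment embeddings into a molecule embedding, then comparing molecules by cosine distance) can be sketched as follows. The fragment identifiers and two-dimensional vectors are hypothetical placeholders, not output from the trained Word2Vec model, and skipping fragments that are absent from the vocabulary is an assumption of this sketch.

```python
from math import sqrt


def mean_embedding(fragment_ids, fragment_vectors):
    """Average the vectors of the fragments found in the vocabulary."""
    vectors = [fragment_vectors[f] for f in fragment_ids if f in fragment_vectors]
    if not vectors:
        raise ValueError("no fragment of this molecule is in the vocabulary")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity; ranges from 0 (same direction) to 2 (opposite)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical fragment vocabulary with made-up 2-dimensional embeddings.
fragment_vectors = {
    "frag_a": [1.0, 0.0],
    "frag_b": [0.0, 1.0],
    "frag_c": [1.0, 1.0],
}

molecule_1 = mean_embedding(["frag_a", "frag_b"], fragment_vectors)
molecule_2 = mean_embedding(["frag_a", "frag_c"], fragment_vectors)
print(cosine_distance(molecule_1, molecule_2))
```

Because the molecule vector is a mean, molecules built from largely overlapping fragment vocabularies end up pointing in similar directions, which is one reason cosine distances between such embeddings can be compressed into a narrow range.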

The performance of the different representations was inspected using heatmap plots that summarised the pairwise similarity (or distance) within each dataset. For the genotoxicity dataset, the Graph2Vec embeddings were also used as inputs to two classifier models, a k-nearest neighbour (k-NN) classifier and logistic regression, to assess their information content. The two classifiers were implemented using the open source Python package scikit-learn [91], with the area under the receiver operating characteristic curve (AUC-ROC) as the performance metric. Model performance was assessed through a 5-fold cross validation procedure.
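For readers less familiar with the evaluation protocol, the sketch below illustrates the two ingredients named above: an AUC-ROC computed as the fraction of (positive, negative) pairs ranked correctly, and a 5-fold partition of sample indices. The study itself used scikit-learn; this dependency-free version, its function names, and the absence of shuffling or stratification in the folds are assumptions made purely for illustration.

```python
def roc_auc(labels, scores):
    """AUC-ROC as the probability that a positive outranks a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # Ties between a positive and a negative count as half a correct ranking.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists; every sample is held out exactly once."""
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# Toy labels and classifier scores (not data from the study).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
for train, test in k_fold_indices(10, k=5):
    print(test)
```

In a full workflow, a classifier would be fitted on each `train` split, scored on the matching `test` split with `roc_auc`, and the five values averaged.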

The deep learning graph convolutional network model comprised three graph attention convolutional layers (GATv2Conv, per the algorithm described in Brody et al. [92]) with ReLU activation functions, a global mean pooling readout layer, and a single fully connected linear layer that was used to make predictions. For the molecular graphs, one-hot encodings of the atom symbol labels were attached as node feature vectors. The graphs were split into a training and a validation set. Using cross entropy loss and an Adam optimiser with a learning rate of 0.001, the model was trained over 50 epochs, with the AUC score of the training and validation graphs reported at each epoch. After training, embeddings for the validation graphs were generated by inputting the graphs into the trained model and extracting the resultant embedding from the readout layer. These were visualised via t-Distributed Stochastic Neighbor Embedding (t-SNE) [93] and labelled by toxicity outcome. These GCN embeddings were also used as inputs into k-NN and logistic regression classification models, with performance compared against the use of Morgan chemical fingerprints.
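The data flow through such a model (node features, repeated neighbourhood aggregation, then a readout) can be illustrated without a deep learning framework. The sketch below is deliberately simplified and is not the trained model: it uses one-hot atom symbols as node features, replaces each GATv2Conv layer with a weight-free average over a node and its neighbours followed by ReLU (no attention and no learned parameters), and applies a global mean pooling readout. All names and the toy graph are assumptions.

```python
def one_hot(symbol, vocabulary):
    """One-hot encode an atom symbol against a fixed vocabulary."""
    return [1.0 if symbol == entry else 0.0 for entry in vocabulary]


def average_neighbours(features, adjacency):
    """Weight-free stand-in for a convolution: mean over self + neighbours, then ReLU."""
    updated = {}
    for node, neighbours in adjacency.items():
        group = [features[node]] + [features[n] for n in neighbours]
        summed = [sum(column) for column in zip(*group)]
        updated[node] = [max(0.0, value / len(group)) for value in summed]
    return updated


def global_mean_pool(features):
    """Readout: average all node vectors into one whole-graph vector."""
    vectors = list(features.values())
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def embed(symbols, adjacency, vocabulary, layers=3):
    """Three aggregation rounds followed by the mean-pool readout."""
    features = {node: one_hot(sym, vocabulary) for node, sym in symbols.items()}
    for _ in range(layers):
        features = average_neighbours(features, adjacency)
    return global_mean_pool(features)


# Hypothetical toy graph: bromoethane, heavy atoms only.
symbols = {0: "C", 1: "C", 2: "Br"}
adjacency = {0: [1], 1: [0, 2], 2: [1]}
embedding = embed(symbols, adjacency, ["C", "Br"])
print(embedding)
```

In the trained network each aggregation step carries learned weights and attention coefficients, so the pooled vector is shaped by the genotoxicity labels rather than by topology alone.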

3. Data and code availability

All analysis was performed in Python 3.10 using Jupyter notebooks. RDKit [94] was used for generation of Morgan chemical fingerprints. The EPA Cheminformatics Modules were used to retrieve ToxPrints.

Molecular graph representations were created using the Python package RDKit [94]. The open source Python package GraKeL [87] was used to implement the WL subtree kernel. The open source Python package KarateClub [88] was used to create the Graph2Vec embeddings. Gensim [90] was used to train the Word2Vec model for the Mol2Vec approach. Scikit-learn [91] was used to develop k-NN and logistic regression models for the embeddings derived from the Graph2Vec and GCN approaches. PyTorch Geometric [95] was used to train the GCN model using the genotoxicity dataset.

The code repository and associated data files supporting this analysis are available at https://github.com/patlewig/metgraph_survey and 10.5281/zenodo.15368541 respectively.

4. Results and discussion

4.1. Pairwise similarities

4.1.1. Local Lymph Node Assay (LLNA)

The LLNA dataset comprised 222 substances with their associated skin sensitising outcome as well as their reaction chemistry domain. The reaction domain assignments had been reported in Patlewicz et al [80] as the result of an ECVAM workshop reviewing which substances were likely to act directly with skin proteins or required some kind of activation either by metabolism or chemical transformation. The assignments differentiated substances that could act via a Schiff base reaction mechanism from a Michael acceptor mechanism or a bimolecular nucleophilic substitution (SN2) reaction mechanism. The five main reaction domains associated with skin sensitisation are described in more detail by Roberts and Aptula [96]. These are Michael acceptors (MA), Schiff base formers (SB), Acyl transfer agents (Acyl), Bimolecular nucleophilic substitution reactions (SN2), and SNAr, nucleophilic aromatic substitution reactions. Essentially, the rate determining step for skin sensitisation relies on a substance forming a covalent bond with a skin protein, thus substances that are electrophilic in nature can bind with nucleophilic skin proteins. Potential skin sensitisers can be identified based on their electrophilic reaction sites e.g. substances such as alpha, beta-unsaturated esters or aldehydes are activated to act via a Michael addition reaction. The reaction domains have formed the basis of structural alerts for skin sensitisation such as those implemented in software tools including the OECD Toolbox and Toxtree as described earlier.

Pairwise similarities were calculated on the basis of the 2 chemical fingerprints as well as the WL kernel approach. The expectation was that higher similarities (closer to 1) would be observed for substances sharing the same reaction domain compared with the overall dataset. Pairwise Jaccard similarities calculated using Morgan chemical fingerprints were low across the entire LLNA dataset. The maximum of the median pairwise similarities across all substances was 0.125, whereas the minimum median value was 0.021. When these same parameters were re-derived for a specific reaction domain, the values increased only marginally, i.e. the range of median pairwise similarities across all 29 Michael acceptors (MA) varied from 0.04 to 0.16, whereas the 20 Schiff base formers (SB) varied from 0.05 to 0.13 and the 14 Acyl transfer agents (Acyl) varied from 0.06 to 0.16. That is to say, substances sharing the same reaction domain and characterised by Morgan chemical fingerprints showed little increase in pairwise similarities relative to the entire dataset. Morgan fingerprints did not appear to capture the reaction chemistry information needed to differentiate substances acting via a specific pathway.
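The summary statistic used here (for each substance, the median of its Jaccard similarities to every other substance, followed by the maximum and minimum of those medians) can be sketched as below. The fingerprints are hypothetical sets of "on" bit indices rather than real Morgan fingerprints, and returning 0 for two empty fingerprints is a convention assumed for this sketch.

```python
from statistics import median


def jaccard(bits_a, bits_b):
    """Jaccard (Tanimoto) similarity between two sets of 'on' bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # assumed convention for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


def median_pairwise(fingerprints):
    """Median similarity of each substance to all other substances."""
    result = {}
    for name, bits in fingerprints.items():
        others = [jaccard(bits, other_bits)
                  for other, other_bits in fingerprints.items() if other != name]
        result[name] = median(others)
    return result


# Hypothetical fingerprints expressed as sets of set-bit positions.
fingerprints = {
    "substance_a": {1, 2, 3},
    "substance_b": {2, 3, 4},
    "substance_c": {7, 8},
}

medians = median_pairwise(fingerprints)
print(max(medians.values()), min(medians.values()))
```

Restricting `fingerprints` to the members of a single reaction domain before calling `median_pairwise` gives the within-domain ranges quoted in the text.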

Pairwise Jaccard similarities using ToxPrints were comparatively higher, with a maximum median Jaccard similarity of 0.17 for the entire dataset and even higher values for specific reaction domains; for Michael acceptors, the maximum median Jaccard similarity was 0.32, for Schiff base formers, 0.25 and Acyl transfer agents was 0.195. Thus pairwise similarities of substances within the same reaction domain assignment tended to be higher than those across the entire dataset, suggesting that ToxPrints were able to capture chemical information pertaining to the functional groups associated with the electrophilic reaction centres to a greater extent than Morgan fingerprints.

Maximum median pairwise similarities were also higher using the WL subtree kernel relative to the 2 fingerprint types: 0.34 for the entire dataset. However, the pairwise similarity values were lower for specific reaction domains, e.g. 0.24 for Michael acceptors, 0.22 for Schiff base formers and 0.34 for Acyl transfer agents. WL kernels appear to be able to capture more chemical information, but whilst the metrics were higher overall, the kernel score dropped within reaction domains, suggesting that the WL kernel was unable to discriminate well between domains.

Fig. 5 shows the pairwise similarities for one reaction domain, Michael acceptors (MA) using all 3 approaches where the orange-red cells are indicative of higher pairwise similarities (0.6 and higher). In panel 3 of Fig. 5, from inspection, only a few pairs of substances appear to be very similar based on Morgan fingerprints whereas there appears to be a higher number of similar pairs within the WL and ToxPrint pairwise comparisons.

Fig. 5.

Pairwise similarity matrices across the 3 approaches for all chemicals (top panel) and the same matrices restricted to the Michael acceptor (MA) reaction domain (bottom panel). The pockets of orange throughout each matrix highlight those pairs of chemicals that are most similar to each other. The orange squares in the bottom panel are much more frequent in the ToxPrint and WL kernel heatmaps, whereas there are few if any in the Morgan heatmap. Greater pairwise similarity would be expected within a given reaction domain, as the chemicals it contains would be expected to react via the same reaction chemistry. The fact that WL and ToxPrints showed a higher proportion of more similar pairs indicates that both these representations better capture the features important for the chemistry of the reaction domain. In contrast, there was little to discriminate the substances when represented by Morgan chemical fingerprints, as indicated by the large proportion of pairwise similarities falling in the lowest ranges. Indeed, ToxPrints captured structural features that characterised the reaction domains well, which explains the greater proportion of higher pairwise similarities. The WL approach appeared to be better than Morgan fingerprints at characterising structural features relevant for skin sensitisation, as evidenced by almost 5% of pairs having a similarity of 0.5 or greater compared with only 0.4% of pairs based on Morgan fingerprints.

Extracting from the pairwise similarity matrices, across the whole dataset characterised by Morgan fingerprints, 71% of the pairwise comparisons had a Jaccard similarity in the range 0–0.1, with 27% in the range 0.1–0.3. In contrast, for Michael acceptors characterised by Morgan fingerprints (see Table 2), 49% of the pairwise comparisons had similarities ranging from 0 to 0.1 whereas 43% fell within the range 0.1–0.3; i.e. within a reaction domain the pairwise similarities were still very low, although a smaller proportion of pairs fell within the lowest range.

Table 2.

Percentage of Michael acceptor (MA) pairs that fall into different similarity thresholds based on their structural representation.

Fingerprint Representation/Pairwise similarity ranges 0–0.1 0.1–0.3 0.3–0.5 0.5–0.7 0.7–1
Morgan 49 % 43.6 % 6.6 % 0.2 % 0.2 %
ToxPrint 30.5 % 43 % 19.9 % 4.67 % 1.7 %
WL 31 % 52.2 % 11.8 % 4.4 % 0.4 %

For ToxPrints within the Michael acceptor domain, the proportions were quite different from Morgan fingerprints with 30% having a Jaccard similarity range between 0–0.1, 43% with a Jaccard similarity range of 0.1–0.3, 20% having a Jaccard similarity range of 0.3–0.5, and the remainder with similarities in excess of 0.5. Of note, approximately 5% of WL scores within the Michael domain were greater than 0.5.
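The percentages in Table 2 come from assigning each pairwise similarity to one of five ranges. A minimal sketch of that binning is given below; the toy similarity values are illustrative only, and treating each lower bound as inclusive (so that a value of exactly 0.1 falls in the 0.1–0.3 range) is an assumption, since the handling of boundary values is not stated.

```python
from bisect import bisect_right

EDGES = [0.1, 0.3, 0.5, 0.7]
RANGES = ["0-0.1", "0.1-0.3", "0.3-0.5", "0.5-0.7", "0.7-1"]


def range_percentages(similarities):
    """Percentage of pairwise similarities falling in each range."""
    if not similarities:
        raise ValueError("at least one similarity value is required")
    counts = [0] * len(RANGES)
    for value in similarities:
        # bisect_right places a boundary value in the upper of the two ranges.
        counts[bisect_right(EDGES, value)] += 1
    total = len(similarities)
    return {label: 100.0 * count / total for label, count in zip(RANGES, counts)}


# Toy upper-triangle similarities (not values from the study).
print(range_percentages([0.05, 0.2, 0.2, 0.9]))
```

Only the upper triangle of a similarity matrix should be passed in, so that each pair is counted once and self-similarities on the diagonal are excluded.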

A handful of example pairs from the same reaction domain are shown in Table 3, which demonstrates the higher similarities obtained when using ToxPrints and the WL kernel.

Table 3.

Example cases of substances sharing the same reaction domain and their associated pairwise similarity score. Domains are represented by Schiff base formers (SB), Michael Acceptors (MA), Acyl transfer agents (Acyl), nucleophilic aromatic substitution (SNAr), unimolecular nucleophilic substitution (SN1) or bimolecular nucleophilic substitution (SN2).

Domain/Similarity type WL TxP Morgan
SB 2,2,6,6-Tetramethyl-3,5-heptanedione (DTXSID7049396) 5-Methyl-2,3-hexanedione (DTXSID7049215) 0.21 0.5 0.185
MA Ethyl acrylate (DTXSID4020583) Butyl acrylate (DTXSID6024676) 0.458 0.66 0.5
MA trans-2-decenal (DTXSID5047035) trans-2-hexenal (DTXSID1041425) 0.53 0.86 0.48
Acyl Phthalic anhydride (DTXSID2021159) Trimellitic anhydride (DTXSID7026235) 0.538 0.7 0.368

4.1.2. BfR skin irritation

The BfR dataset comprised 70 substances with their associated skin irritation classification outcome per the former EU Classification and Labelling regulation. Substances classified as irritants were labelled with a R38. Fig. 6 shows heatmaps of the pairwise similarities based on Morgan, ToxPrint and WL structural representations. Overall, the WL heatmap showed a larger number of similar pairings compared with the other 2 fingerprint types which is confirmed by the percentages of cases with the higher similarity ranges (Table 4).

Fig. 6.

Pairwise similarity matrices across the 3 approaches for the BfR dataset. The pockets of oranges throughout the matrix highlight those pairs of chemicals that are most similar to each other. The frequency of the orange squares is much more pronounced in the WL heatmap overall whereas there are few if any cases in the Morgan heatmap.

Table 4.

Percentage of substance pairs that fall into different similarity thresholds based on their structural representation.

Fingerprint Representation/Pairwise similarity ranges 0–0.1 0.1–0.3 0.3–0.5 0.5–0.7 0.7–1
Morgan 74 % 22 % 2.98 % 0.6 % 0.2 %
ToxPrint 65.9 % 23.36 % 8.29 % 1.36 % 0.9 %
WL 49.8 % 32.8 % 9.8 % 5.1 % 2.3 %

Table 5 highlights several example pairs of chemicals and their pairwise similarities. It is evident that ToxPrints were not able to differentiate between the number of substituents present, such that 1,6-dibromohexane and 1-bromohexane were considered equivalent. In contrast, WL gave rise to a high similarity whereas the similarity based on Morgan fingerprints was only modest. The pairwise similarity between 1-bromohexane and 1-bromopentane yielded a much higher score when using ToxPrints but was less pronounced for the other two representations; the chain length difference did not impact the score for ToxPrints.

Table 5.

Example cases of substances and their associated pairwise similarity score by the 3 representations.

WL TxP Morgan
1,6-Dibromohexane (DTXSID4044452) 1-Bromohexane (DTXSID4021929) 0.72 1 0.52
1-Bromohexane (DTXSID4021929) 1-Bromopentane (DTXSID3049203) 0.75 0.9 0.71
3-Phenylprop-2-enal (DTXSID1024835) Cyclamen aldehyde (DTXSID2044769) 0.27 0.2 0.13
alpha-Terpineol (DTXSID5026625) D-Limonene (DTXSID1020778) 0.415 0.5 0.4

Although the chemical fingerprints are agnostic to the irritation endpoint from which these examples were drawn, it was noteworthy that 1,6-dibromohexane is not irritating whereas 1-bromohexane and 1-bromopentane were both classified as irritating. None of the fingerprint approaches takes into account molecular size attributes, which could have modulated the differences in irritation potential observed and would be worth capturing to augment the structural representations used. 3-Phenylprop-2-enal and cyclamen aldehyde are both aldehydes that share a benzene ring, though the former has the potential to react as a Michael acceptor and impart its irritating properties through this reaction pathway whereas cyclamen aldehyde only has the potential to act as a weak Schiff base former. Though both substances are irritating, their pairwise similarities were very low, ranging from 0.125 to 0.274, suggesting an absence of any correlation between similarity score and irritation potential. Indeed, alpha-Terpineol and D-Limonene share a cyclic diene scaffold and are both irritating, but their pairwise similarities were low although consistent across the 3 representations (0.4–0.5).

Stratifying pairwise similarities by whether substances were irritants or not led to some shifts in the proportions of similar pairs observed for the different representations. For Morgan fingerprints, an increase in the percentage of pairs that were most similar (0.7–1) was noted (0.6% vs. 0.2%), whereas for ToxPrints a shift was observed for pairs with a low similarity (0.3–0.5; 13% vs. 8.6%), and for WL a shift was noted in the moderate similarity range (0.5–0.7; 7.69% vs. 5.1%).

Examples of irritants with the highest similarities between each other were Sodium dodecyl sulfate (DTXSID1026031), Methyl hexadecanoate (DTXSID4029149), 1-Decanol (DTXSID7021946), 10-Undecenoic acid (DTXSID8035001) and 1-Bromopentane (DTXSID3049203). However, if a query was performed for one of these, e.g. 10-Undecenoic acid (DTXSID8035001), the top 3 closest analogues in terms of their WL similarities were Dodecanoic acid (DTXSID5021590), Methyl dodecanoate (DTXSID5026889) and Methyl hexadecanoate (DTXSID4029149). Whilst these analogues appeared to be structurally related based on visual inspection, they spanned both irritants and non-irritants, reaffirming that none of the representations used here encodes any features that help to discriminate for irritation potential.

4.1.3. Fathead Minnow (FHM)

The FHM dataset comprised 617 substances with their associated acute lethality outcomes in fathead minnow as well as their mode of action (MOA) annotations. One of the best known MOA schemes is that proposed by Verhaar et al [97] which classifies organic compounds into one of four categories: inert chemicals (Class 1), less inert chemicals (Class 2), reactive chemicals (Class 3), and chemicals acting by a specific mechanism (Class 4).

Chemicals in Class 1 exhibit nonpolar narcosis or baseline toxicity and can only be predicted if they have log octanol:water partition coefficient (Kow) values between 0 and 6 (e.g., benzenes). Chemicals in Class 2 are more toxic and cause polar narcosis, and typically possess hydrogen bond donor acidity (e.g., phenols and anilines). Chemicals in Class 3 demonstrate enhanced toxicity as compared to baseline toxicity and react nonspecifically with biomolecules (e.g., epoxides) or are metabolised into more toxic species (e.g., nitriles). Chemicals in Class 4 cause toxicity through a specific mechanism such as acetylcholinesterase (AChE) inhibition by carbamate insecticides. The assignment of a chemical to a class is based on a decision tree that utilises the presence or absence of certain chemical structures and moieties.

Pairwise similarities using Morgan fingerprints, ToxPrint fingerprints and the WL kernel were calculated and stratified based on 2 of the MOAs: baseline narcosis, which had the highest number of chemicals, and a specific MOA, acetylcholinesterase (AChE) inhibition. Fig. 7 depicts the heatmaps of the pairwise similarities based on these 3 structural representations and 2 MOAs. Overall, the WL heatmap shows a larger number of similar pairings than the other 2 fingerprint types for baseline narcotics (12% of pairs had a similarity between 0.3–0.5) whereas ToxPrints appear to differentiate better for AChE inhibitors (over 7% of pairs had a similarity greater than 0.5). In the latter case, this was probably because the dataset was limited to several substances that were either closely related carbamates or organophosphates.

Fig. 7.

Pairwise similarity matrices across the 3 approaches for the FHM dataset. The pockets of oranges throughout the matrix highlight those pairs of chemicals that are most similar to each other. The frequency of the orange squares is more pronounced in the WL heatmap for substances acting as baseline narcotics whereas there are more examples of similar AChE pairs using ToxPrints.

As an example substance, the top 4 analogues for 1-bromoheptane (DTXSID7022095) (nominally assigned as a baseline narcotic owing to its saturated hydrocarbon class) were retrieved on the basis of the WL scores and compared with ToxPrint similarities. (Note that whilst Morgan similarities were computed, we chose not to describe their scores explicitly for this example given their lack of discrimination in the previous 2 case studies.) Pairwise similarities for the analogues 1-bromohexane (DTXSID4021929), 1-octanamine (DTXSID8021939), 1-octanol (DTXSID7021940) and 1-bromooctane (DTXSID3021938) all exceeded 0.78. However, these pairwise similarities differed to a much greater extent when ToxPrints formed the basis of the representation. 1-Octanol and 1-octanamine had much lower similarities due to the different functional groups present relative to the target 1-bromoheptane, yet all substances were presumed to act as baseline narcotics (see Table 6). As with the LLNA dataset, for this FHM MOA dataset ToxPrints appear to be better able to discriminate substances where specific functional groups were significant in characterising the MOA, as was the case for the AChE domain, whereas the broader, more general baseline narcosis domain benefited from the WL kernel representation to identify promising candidate analogues.

Table 6.

Analogues identified for 1-bromoheptane on the basis of WL similarity and TxP features.

Name Role WL TxP
1-bromoheptane (DTXSID7022095) Target 1.0 1.0
1-bromooctane (DTXSID3021938) Analogue 0.93 0.91
1-bromohexane (DTXSID4021929) Analogue 0.86 1.0
1-octanol (DTXSID7021940) Analogue 0.78 0.36
1-octanamine (DTXSID8021939) Analogue 0.78 0.36
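Analogue retrieval of this kind amounts to ranking candidates by their similarity to the target. The sketch below re-ranks the Table 6 substances under each representation; the similarity values are those reported in Table 6, while the function name and dictionary layout are assumptions made for illustration.

```python
def top_analogues(target, similarities, k=4):
    """Return the k candidates most similar to the target, highest first."""
    candidates = [(name, score) for name, score in similarities.items()
                  if name != target]
    # sorted() is stable, so tied scores keep their original order.
    return sorted(candidates, key=lambda item: item[1], reverse=True)[:k]


# Similarities to 1-bromoheptane as reported in Table 6.
wl_scores = {
    "1-bromoheptane": 1.0,
    "1-bromooctane": 0.93,
    "1-bromohexane": 0.86,
    "1-octanol": 0.78,
    "1-octanamine": 0.78,
}
toxprint_scores = {
    "1-bromoheptane": 1.0,
    "1-bromooctane": 0.91,
    "1-bromohexane": 1.0,
    "1-octanol": 0.36,
    "1-octanamine": 0.36,
}

print(top_analogues("1-bromoheptane", wl_scores))
print(top_analogues("1-bromoheptane", toxprint_scores))
```

Both rankings place the two bromoalkanes first, but under ToxPrints the alcohol and amine fall far behind, which would exclude them from an analogue list under any reasonable similarity cut-off even though they share the presumed baseline narcosis MOA.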

4.1.4. PPRTV

A read-across example, comprising the target substance 2-Amino-4,6-dinitrotoluene (2-ADNT) (CASRN 35572-78-2) and its structural analogues, was extracted from one of the published EPA Provisional Peer-Reviewed Toxicity Values (PPRTV) assessments. A PPRTV is defined as a toxicity value derived for use in the EPA Superfund Program. PPRTVs are derived after a review of the relevant scientific literature using established EPA guidance on human health toxicity value derivations. The objective is to provide support for the hazard and dose–response assessment pertaining to chronic and subchronic exposures of substances of concern, to present the major conclusions reached in the hazard identification and derivation of the PPRTVs, and to characterise the overall confidence in these conclusions and toxicity values. Current assessments can be accessed on the U.S. Environmental Protection Agency’s (EPA’s) PPRTV website at https://www.epa.gov/pprtv. In cases where there is a paucity of data to derive a PPRTV for a specific substance, an analogue approach is applied which permits the use of data from related substances to calculate a screening value. The exact procedure is described in more detail in Wang et al [98].

Five structural analogues with relevant oral non cancer toxicity values had been identified for the target substance 2-ADNT (see Table 7) within the PPRTV assessment report.

Table 7.

2-ADNT is denoted as the target substance based on its role designation. TNT was ultimately selected as the read-across candidate out of the 5 candidate analogues, as discussed in the PPRTV report. WL, TxP and Morgan denote the similarity scores computed. WL relies on molecular graphs constructed using only atoms and other atom property information. The pairwise scores are shown in each case. For example, TNT was determined to have a Jaccard similarity with 2-ADNT of 0.57 with Morgan fingerprints and 0.67 with ToxPrints, whereas the WL score was 0.69.

Substance Role DTXSID WL TxP Morgan
2-ADNT Target DTXSID6044068 1 1 1
TNT Selected DTXSID7024372 0.69 0.67 0.57
2-Methyl-5-nitroaniline Candidate DTXSID4020959 0.49 1 0.4
Isopropalin Candidate DTXSID8024157 0.39 0.33 0.21
Pendimethalin Candidate DTXSID7024245 0.46 0.37 0.24
Trifluralin Candidate DTXSID4021395 0.36 0.26 0.23

Table 7 compares the WL scores with the Jaccard similarities based on Morgan and ToxPrint fingerprints.

Based on the expert-driven evaluation of the structural, physicochemical, available toxicokinetic (TK) and toxicity data performed as described in the PPRTV assessment, 2,4,6-Trinitrotoluene (TNT) was determined to be the ‘best analogue’, primarily based on its metabolic similarity, structural similarity, and shared metabolites. As quoted in the assessment report, ‘the similarity of toxicological outcomes across all the source analogues established confidence in the toxicologic read-across for 2-ADNT. TNT was also determined to be the most health-protective analogue because its point of departure (POD) and corresponding reference dose (RfD) value were lower than the other candidate analogues’. WL and Jaccard (based on Morgan fingerprints) pairwise similarities across the target and all analogues are shown in Fig. 8. TNT had both the highest WL score and the highest Jaccard similarity on the basis of Morgan fingerprints. ToxPrints identified 2-methyl-5-nitroaniline as more similar on account of the number of repeating functional groups. Overall, based on the highest WL score, TNT would have been prioritised as the most promising candidate analogue. However, that is not to say that the representation captures the other considerations that factored into its selection for the read-across of 2-ADNT.

Fig. 8.

Pairwise similarity matrices across the 3 approaches for 2-ADNT and its source analogues.

Pairwise similarities: Preliminary insights

Across 3 structurally diverse, heterogeneous datasets (LLNA, BfR and FHM), WL was able to differentiate between structurally similar and dissimilar substances better than Morgan fingerprints and, to some extent, ToxPrints. WL iteratively relabels node information, thereby capturing information about the atoms and the topology of the molecular graph. A refinement to the approach should consider adding bond information as another attribute in the node labels so that analogues could be refined further. This would better differentiate between certain functional groups, especially those activated by an unsaturated bond, e.g. alpha, beta-unsaturated aldehydes vs. alkyl aldehydes. When datasets were stratified by MOA or reaction chemistry that was well aligned with specific functional groups, such as those indicative of electrophilic features, ToxPrints fared better at differentiating between chemicals. ToxPrints fared poorly when the presence of multiple functional groups was a factor, e.g. 2-ADNT and 2-methyl-5-nitroaniline were considered equivalent on account of the nitro group, but the dinitro moiety would confer some different reaction chemistry. Based on the insights derived from exploring these datasets, WL appears to show promise in identifying candidate analogues, but only where reactive chemistry is not a determining factor for the toxicity concerned, e.g. WL returned promising candidate analogues for the baseline narcotic 1-bromoheptane.

4.2. Unsupervised graph embeddings

4.2.1. Graph2Vec

Given that WL focused on relabelling of nodes alone, Graph2Vec was next investigated in an attempt to learn whole graph-level embeddings of the substances for both the LLNA and FHM datasets. These datasets were chosen since they were modest in size (several hundred substances in each case). t-SNE 2D projections [93] colour coded by the LLNA skin sensitisation reaction domains (Fig. 9) or by FHM MOA (data not shown) showed no obvious clustering of the substances. The expectation had been that substances sharing the same reaction domain would group together based on their whole graph embeddings. This was a disappointing outcome, though it suggests that a much larger dataset is needed to train the model in order for meaningful embeddings to be generated that capture nuanced differences between the substances. Graph2Vec’s default dimensionality comprises 128 features, thus a dataset of at least 1000 examples may have yielded more reasonable embeddings than datasets of 200–600 substances. Accordingly, to leverage the utility of the Graph2Vec technique, datasets of thousands of substances would be needed to learn more useful embeddings.

Fig. 9.

t-SNE plot of embeddings from Graph2Vec for the LLNA dataset of substances.

A case in point was that for 1-bromoheptane (DTXSID7022095), the top 4 analogues based on the Graph2Vec embeddings and their cosine distance (range 0.45–0.47) were 2-methoxyethylamine (DTXSID1021908), 2,3-dihydrobenzofuran (DTXSID2022040), methyl tert-butyl ether (DTXSID3020833) and 2-chloro-1-methylpyridinium iodide (DTXSID6022260). For comparison, the cosine distances for the source analogues previously identified by using a WL kernel as discussed in Section 4.1.3 (see Table 8) were all quite high (cosine distance ranges from 0 to 2, where distances closest to 0 are indicative of high similarity) demonstrating that the embeddings did not determine these source analogues as particularly similar.

Table 8.

WL similarities and cosine distances derived from Graph2Vec embeddings for WL-identified analogues of 1-bromoheptane taken from the FHM dataset. Note the different scales: WL represents a similarity from 0 to 1, whereas Graph2Vec is measured with a cosine distance that ranges from 0 to 2, where values close to 0 are indicative of high similarity.

Name Role WL Graph2Vec
1-bromoheptane (DTXSID7022095) Target 1.0 0.0
1-bromooctane (DTXSID3021938) Analogue 0.93 0.70
1-bromohexane (DTXSID4021929) Analogue 0.86 0.81
1-octanol (DTXSID7021940) Analogue 0.78 0.63
1-octanamine (DTXSID8021939) Analogue 0.78 0.66
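The analogue ranking described above can be sketched in a few lines. This is a minimal illustration only, assuming each substance is already represented by an embedding vector; the function names (`cosine_distance`, `top_analogues`) and the toy 3-dimensional vectors are invented for this example and are not taken from the study.

```python
import math

def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity; ranges from 0 to 2."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)

def top_analogues(target, embeddings, k=4):
    """Return the k substances closest to `target` as (name, distance) pairs."""
    distances = [
        (name, cosine_distance(embeddings[target], vector))
        for name, vector in embeddings.items()
        if name != target
    ]
    return sorted(distances, key=lambda pair: pair[1])[:k]

# Toy vectors, invented for illustration (real Graph2Vec embeddings
# have 128 dimensions by default).
toy_embeddings = {
    "target": [1.0, 0.0, 0.0],
    "same_direction": [2.0, 0.0, 0.0],
    "orthogonal": [0.0, 1.0, 0.0],
    "opposite": [-1.0, 0.0, 0.0],
}
ranked = top_analogues("target", toy_embeddings, k=3)
```

The three toy analogues span the full scale: a vector pointing in the same direction has distance 0, an orthogonal vector has distance 1 and an opposing vector has distance 2, which is why distances of 0.63–0.81 for close structural homologues indicate weak embeddings.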

4.2.2. Mol2Vec

The Mol2Vec model derived from DSSTox structures was applied to both the LLNA and FHM datasets from which distance matrices using cosine as a metric were generated. Pairwise distances were explored for the entire dataset as well as different reaction domains/MOAs.

Considering the same pairs of substances as in Section 4.1.1, pairwise cosine distances were found to be very low, suggesting that the embeddings were able to resolve high similarities between the pairs in Table 9.

Table 9.

Pairwise similarities based on ToxPrint fingerprints and cosine distances from Mol2Vec embeddings for selected substances from the LLNA dataset.

Reaction domain Substance 1 Substance 2 Mol2Vec ToxPrint
MA Ethyl acrylate (DTXSID4020583) Butyl acrylate (DTXSID6024676) 0.0096 0.66
MA trans-2-decenal (DTXSID5047035) trans-2-hexenal (DTXSID1041425) 0.015 0.86
Acyl Phthalic anhydride (DTXSID2021159) Trimellitic anhydride (DTXSID7026235) 0.0067 0.7

However, closer inspection of the cosine distance matrix revealed very little variation across the entire LLNA dataset. In fact, ~94% of the pairwise distances were in the range of 0–0.1, whereas even the most dissimilar pairs had a cosine distance of only up to 0.5. Searching for the top 5 source analogues for 1-bromobutane (DTXSID6021903) identified substances that were unrelated, at least on the basis of their reaction domain. 1-Bromobutane would react by an SN2 mechanism with respect to skin sensitisation, whereas the source analogues identified were either non-reactive or Schiff base formers (1-butanol (DTXSID1021740), 2,3-butanedione (DTXSID6021583), N,N-dimethylformamide (DTXSID6020515) and 1,2-propylene glycol (DTXSID0021206)).

For the analogues identified based on WL for 1-bromoheptane, all the source analogues had a cosine distance of 0.004–0.006. Overall, however, 93% of the pairwise distances in the cosine distance matrix for the FHM dataset were in the range of 0–0.1, revealing that the embeddings were unable to discriminate dissimilar substances effectively.
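The check on how much of the distance matrix is crowded into a narrow range can be sketched as follows. This is an illustrative example only: the function name `fraction_within` and the toy 4 x 4 matrix are invented, and counting each unique pair once (excluding the diagonal) is an assumption, since the text does not state how the percentages were computed.

```python
from itertools import combinations

def fraction_within(distance_matrix, low, high):
    """Fraction of unique pairs (i < j) whose distance d satisfies low <= d <= high."""
    n = len(distance_matrix)
    pairs = [distance_matrix[i][j] for i, j in combinations(range(n), 2)]
    if not pairs:
        raise ValueError("need at least two substances")
    return sum(low <= d <= high for d in pairs) / len(pairs)

# Toy symmetric cosine distance matrix for 4 substances (invented values).
toy_matrix = [
    [0.00, 0.05, 0.08, 0.40],
    [0.05, 0.00, 0.02, 0.45],
    [0.08, 0.02, 0.00, 0.09],
    [0.40, 0.45, 0.09, 0.00],
]
share = fraction_within(toy_matrix, 0.0, 0.1)  # 4 of the 6 unique pairs
```

When nearly all pairs fall in the lowest bin, as reported for the Mol2Vec embeddings, a low distance between two substances carries little information because almost any pair would score similarly.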

Accordingly, a much larger corpus is likely needed to produce meaningful embeddings for the 2 datasets herein, perhaps in the order of tens of millions of structures as used in the original work rather than the 0.5 million diverse structures taken from DSSTox. Of note, the original Mol2Vec package has been archived and it was not possible to recreate the corpus used in that study to explore whether any useful embeddings might have been derived for the 2 datasets here.

Unsupervised graph embeddings: Preliminary insights

Unsupervised whole graph embeddings using Graph2Vec or Mol2Vec appear to offer the potential to better encode whole molecule information beyond the more limited capabilities that WL can offer in terms of capturing local neighbourhoods and atom information. However, for the 2 datasets (FHM and LLNA) explored, the embeddings learned from a ‘large’ dataset of 0.5 million DSSTox substances proved insufficient to be able to resolve differences in structure that could be useful from a read-across perspective.

4.3. Graph embedding

As a final attempt to explore the utility of the graph embedding approach, a large dataset of genotoxicity outcomes was used. The genotoxicity dataset was an updated version of that compiled in Pradeep et al. [85], drawn from the EPA Toxicity Values database (ToxValDB) (https://www.epa.gov/comptox-tools/downloadable-computational-toxicology-data#AT). The same methodology as described in Pradeep et al. [85] was used to create a dataset with a summary genotoxicity outcome for each chemical. Genotoxicity studies, including in vitro and in vivo chromosomal aberration, Ames, micronucleus and mouse lymphoma studies, were initially retrieved from ToxValDB. To create a single outcome per chemical, the dataset was first grouped by substance identifier and summarised as follows. If a substance was associated with a positive Ames result, a positive genotoxicity outcome was returned. If a substance was not associated with a positive Ames result but did have a reported positive chromosomal aberration or micronucleus outcome, it was tagged as a clastogen. If only inconclusive studies were associated with a substance, an inconclusive tag was assigned. Finally, if only negative outcomes were associated with the substance, a non-genotoxic outcome was returned. For the dataset compiled with structural information, there were 5403 chemicals with QSAR-READY SMILES and a genotoxicity outcome. Only positive and negative outcomes were carried forward in the subsequent analyses.
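The summarisation rules above can be expressed as a small function. This is a hedged sketch rather than the code used in the study: the record layout, the assay and result labels, and the function name `summarise_genotoxicity` are all assumptions made for illustration, and combinations that the text does not cover (for example a mix of negative and inconclusive studies) are returned as "unassigned" rather than guessed.

```python
from collections import defaultdict

# Assays whose positive result leads to a clastogen tag (labels are assumed).
CLASTOGENICITY_ASSAYS = {"chromosomal aberration", "micronucleus"}

def summarise_genotoxicity(studies):
    """Collapse (assay, result) pairs for one substance into a single call."""
    positives = {assay for assay, result in studies if result == "positive"}
    results = {result for _, result in studies}
    if "ames" in positives:
        return "positive"
    if positives & CLASTOGENICITY_ASSAYS:
        return "clastogen"
    if results == {"inconclusive"}:
        return "inconclusive"
    if results == {"negative"}:
        return "negative"
    return "unassigned"  # case not described in the text

# Invented study records: (substance identifier, assay, result).
records = [
    ("substance_A", "ames", "positive"),
    ("substance_A", "micronucleus", "negative"),
    ("substance_B", "ames", "negative"),
    ("substance_B", "micronucleus", "positive"),
    ("substance_C", "ames", "inconclusive"),
    ("substance_D", "ames", "negative"),
    ("substance_D", "chromosomal aberration", "negative"),
]

# Group by substance identifier, then summarise each group.
by_substance = defaultdict(list)
for substance_id, assay, result in records:
    by_substance[substance_id].append((assay, result))
summary = {sid: summarise_genotoxicity(s) for sid, s in by_substance.items()}
```

The order of the checks matters: a positive Ames result takes precedence over every other study, so substance_A is called positive despite its negative micronucleus result.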

Vectorised embeddings for each substance were derived using Graph2Vec embedding models. The embeddings were projected in 2D using t-SNE [93] and colour coded by genotoxicity outcome. Fig. 10 shows the 2D projection, in which there was little if any discrimination between positive and negative genotoxicity outcomes.

Fig. 10.

Graph2Vec embeddings projected in 2D t-SNE for the genotoxicity dataset.

The embeddings were also used as inputs to 2 classifiers, a k-NN classifier and logistic regression, to assess their information content. As a baseline comparator, Morgan chemical fingerprints were used as feature inputs into the same two classifiers.

The embeddings generated by Graph2Vec failed to capture relevant chemical features well enough to discriminate between genotoxic and non-genotoxic outcomes. Morgan chemical fingerprints outperformed the graph embeddings using both classifiers (see Table 10). Graph2Vec struggled to separate the data, with almost no discrimination between the two outcomes as shown in Fig. 10. Since the embeddings were generated using the default parameters of the model, fine tuning parameters such as embedding length and learning rate could possibly increase performance. Default parameters were also used for the classification models, leaving another area of possible improvement.

Table 10.

5-fold cross validated k-NN and logistic regression genotoxicity classification results using Morgan fingerprints and Graph2Vec embeddings.

Embedding Method k-NN Logistic Regression
Morgan FPs 0.67 0.73
Graph2Vec 0.51 0.548
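The evaluation protocol behind this comparison (feature vectors in, a cross validated score out) can be sketched with the standard library alone. This is a simplified stand-in, not the pipeline used in the study: the helper names, the Euclidean distance, k = 5, the approximate stratification of folds and the toy data are all assumptions made for illustration, and only the k-NN arm is shown.

```python
import math
import random

def knn_score(train_X, train_y, x, k=5):
    """Fraction of positive labels among the k nearest training vectors."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the held-out fold")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(X, y, n_folds=5, k=5, seed=0):
    """Mean AUC over n_folds held-out folds with roughly equal class balance."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    order.sort(key=lambda i: y[i])  # stable sort: striding now mixes both classes
    folds = [order[f::n_folds] for f in range(n_folds)]
    fold_aucs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        scores = [knn_score(train_X, train_y, X[j], k) for j in held_out]
        fold_aucs.append(auc([y[j] for j in held_out], scores))
    return sum(fold_aucs) / n_folds

# Toy, well separated feature vectors (invented): 10 negatives, 10 positives.
X = [(0.1 * i, 0.0) for i in range(10)] + [(5.0 + 0.1 * i, 5.0) for i in range(10)]
y = [0] * 10 + [1] * 10
mean_auc = cross_validated_auc(X, y)
```

Because the two toy classes are far apart, every fold is separated perfectly; with real embeddings the same procedure yields intermediate values, and a score near 0.5 indicates features that carry almost no signal for the outcome.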

4.4. GCN embeddings

The same dataset as described in Section 4.3 was used to demonstrate the applicability of the GCN embedding method. GCN embeddings were visualised via t-SNE and labelled by outcome as shown in Fig. 11. The 5-fold cross validation AUC scores for the k-NN and logistic regression classifiers using the GCN embeddings were found to be 0.659 and 0.778 respectively: a comparable performance to Morgan fingerprints with the k-NN approach but a marked improvement with logistic regression.

Fig. 11.

GCN embeddings of validation graph set labelled by genotoxicity outcome.

As with the previous graph embedding discussed in Section 4.3, default parameters were used for both classification models, likely leaving room for improvement in performance through hyperparameter tuning. As with all DL models, there are a large number of options available when constructing a GCN architecture. Layer types, selection of activation functions, pooling methods, choice of loss functions and optimisers, as well as the fine tuning of parameters such as the optimiser’s learning rate, the number of training epochs and the number of neurons per layer, are all of significant importance to a network’s performance. Further experimentation with network architecture would likely lead to better performance, but for the purposes of this illustrative example, a generically designed network without any fine tuning was still able to yield reasonable performance.

Deriving cosine distances from the embeddings generated for the validation set identified 21.6% of pairings in the 0–0.1 distance range, 44.46% in the 0.1–0.3 range, 21% in the 0.3–0.5 range, 8.4% in the 0.5–0.7 range and 4.3% in the 0.7–1.0 range; i.e. just over 20% of the pairwise embeddings are very similar and an additional ~44% are reasonably similar.

For the target substance 3-Methyl-4-nitroquinoline 1-oxide (DTXSID8074944), the top 4 closest analogues were identified as shown in Table 11. Three of the analogues were associated with positive genetox outcomes, in concordance with that of the target substance. These three source analogues all contained nitroso moieties known to be associated with positive genetox outcomes. Whereas the unsupervised whole graph embedding approach proved unsuccessful, it was possible to use a large dataset of several thousand substances to extract a learned embedding that encoded specific features capable of discriminating genotoxicity. These embeddings showed much more promise in retrieving analogues that were both structurally and toxicologically related, as illustrated for this target.

Table 11.

Pairwise cosine distances for analogues related to 3-Methyl-4-nitroquinoline 1-oxide (DTXSID8074944).

Substance Cosine distance Genetox outcome
3-Methyl-4-nitroquinoline 1-oxide (DTXSID8074944) 0.0 1
1-Naphthalenesulfonyl chloride, 3-diazo-3,4-dihydro-4-oxo- (DTXSID9067980) 0.05 0
N-Isobutyl-N’-nitro-N-nitrosoguanidine (DTXSID8020751) 0.06 1
1-Ethyl-1-nitrosourea (DTXSID8020593) 0.08 1
3,4-Dibromo-n-nitrosopiperidine (DTXSID70875601) 0.08 1

5. Conclusions

In this study, the common ways in which source analogues are typically identified for read-across using chemical fingerprint/scaffold approaches were described before the potential utility of graph based approaches was explored. A selection of approaches to quantify graph similarity was investigated using 5 different toxicity datasets to better understand their utility in identifying and evaluating analogues within a read-across approach compared with 2 conventional chemical fingerprint descriptors. The datasets ranged in size and diversity from an expert driven analogue approach for repeated dose toxicity to a set of genotoxicity outcomes for several thousand substances.

The WL graph kernel approach was found to be useful in characterising potential analogues relative to 2-ADNT in the analogue approach, identifying TNT as the most similar analogue, which was consistent with the selection made in the published read-across assessment. WL scores were also found to be useful in grouping similar chemicals into their respective MOA for the FHM dataset, particularly where reactive functional groups did not play a role in directing the toxicity. Pairwise similarities within the baseline narcosis MOA were highest for the WL kernel relative to ToxPrints and Morgan chemical similarities.

However, WL scores were found to be sensitive to the way in which the molecular graphs were initially constructed: if atom and bond characteristics were not sufficiently captured, local differences in structural representations could be underrepresented relative to whole molecule effects, thereby inflating the resulting scores. Careful attention is needed to capture node and edge information before use. Topological and label information play a significant role in determining the WL similarities.

For datasets such as skin sensitisation, where substances were assigned to reaction chemistry domains, or for specific MOAs where reactivity played a significant role, ToxPrints were found to be more effective at capturing relevant chemical information, largely since the features explicitly define relevant bonds, atoms and functional groups. In all the datasets where pairwise similarities were considered, Morgan fingerprints were associated with the lowest similarities.

In contrast, embedding approaches building on the Mol2Vec approach were found to be extremely poor at capturing relevant molecular information to discriminate between substances of different reaction domains for skin sensitisation or MOAs for fathead minnow. However, this might have been more related to the less than ideal dataset sizes for training purposes. The original Mol2Vec model was trained on millions of substances to create the generalised features that resulted in informative embeddings, whereas the efforts herein were limited to only 0.5 million substances from DSSTox. Graph2Vec approaches were also found to be ineffective for all datasets evaluated, even for the genotoxicity dataset comprising several thousand substances. The embeddings produced were not able to discriminate between substances that were genotoxic or not. Morgan fingerprints were found to be superior in predicting the genotoxicity outcomes as part of a classification model.

A deep learning GCN model fared much better, with a marked improvement in performance compared with Morgan fingerprints for classifying genotoxicity outcomes. Whereas the Graph2Vec embedding approaches were unsupervised in nature, the GCN required labelled training data to create informed embeddings, which in turn facilitated genotoxicity classification. The enriched embeddings showed promise in identifying source analogues for read-across based on the target substance considered, though this was not systematically evaluated. The performance increase observed came at a cost in resources and complexity, and required a much larger dataset for training purposes. Continued efforts could explore the applicability of GCNs to other toxicity datasets of similar size to evaluate whether the performance is as promising and whether the source analogues identified are more relevant than those identified by typical chemical fingerprints. This is a line of investigation that the authors are now actively exploring.

Overall, the datasets considered helped to illustrate the potential that graph similarity approaches could play in the identification of suitable analogues for read-across. WL kernels were most useful for analogue identification where the endpoint was not mediated by reaction chemistry. Graph2Vec embeddings were ineffective in all of the example datasets despite the potential that whole graph embeddings might have in capturing structural information. For larger datasets with toxicity outcomes, GCN approaches produced promising embeddings informed by genotoxicity, which showed better performance than Morgan fingerprints for classification of genotoxicity and in identifying relevant analogues.

Thus depending on use case and availability of training data, graph similarity could play a larger role in analogue identification and evaluation for read-across. Future work should also consider the role that graph based approaches could play in encoding other types of information beyond structure such as metabolism information for read-across purposes.

Funding

The work presented in this manuscript was supported by appropriated funds of the US EPA.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.comtox.2025.100353.

Footnotes

CRediT authorship contribution statement

Brett Hagan: Writing – original draft, Formal analysis. Imran Shah: Writing – review & editing, Supervision. Grace Patlewicz: Writing – original draft, Visualization, Supervision, Methodology, Investigation, Formal analysis, Conceptualization.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Grace Patlewicz has served as an editor for special issues of this journal.

Disclaimer

This manuscript reflects the opinions of the authors and is not reflective of the opinions or policies of the US EPA.

References

  • [1].USEPA. The Frank R. Lautenberg Chemical Safety for the 21st Century Act, 2016. https://www.epa.gov/assessing-and-managingchemicals-under-tsca/frank-r-lautenberg-chemical-safety-21st-centuryact-law.
  • [2].USEPA, Scientific studies supporting development of transcriptomic points of departure for EPA transcriptomic assessment products (ETAPs) (2024). doi: 10.23645/epacomptox.25365550.
  • [3].EU. Commission, Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH), establishing a European Chemicals Agency, amending Directive 1999/45/EC and repealing Council Regulation (EEC) No 793/93 and Commission Regulation (EC) No 1488/94 as well as Council Directive 76/769/EEC and Commission Directives 91/155/EEC, 93/67/EEC,93/105/EC and 2000/21/EC, legislative Body: CONSIL, EP (Dec. 2006). URL http://data.europa.eu/eli/reg/2006/1907/oj/eng.
  • [4].Macmillan DS, Bergqvist A, Burgess-Allen E, Callan I, Dawick J, Carrick B, Ellis G, Ferro R, Goyak K, Smulders C, Stackhouse RA, Troyano E, Westmoreland C, Ramón BS, Rocha V, Zhang X, The last resort requirement under REACH: From principle to practice, Regul. Toxicol. Pharmacol. 147 (2024) 105557. doi: 10.1016/j.yrtph.2023.105557. URL https://www.sciencedirect.com/science/article/pii/S0273230023002258.
  • [5].Enoch SJ, Chemical Category Formation and Read-Across for the Prediction of Toxicity, in: Puzyn T, Leszczynski J, Cronin MT (Eds.), Recent Advances in QSAR Studies: Methods and Applications, Springer Netherlands, Dordrecht, 2010, pp. 209–219. doi: 10.1007/978-1-4020-9783-6_7.
  • [6].OECD, Guidance on Grouping of Chemicals, Second Edition | en | OECD (2014). URL https://www.oecd.org/publications/guidance-on-grouping-of-chemicals-second-edition-9789264274679-en.htm.
  • [7].Patlewicz G, Ball N, Boogaard PJ, Becker RA, Hubesch B, Building scientific confidence in the development and evaluation of read-across, Regul. Toxicol. Pharmacol. 72 (1) (2015) 117–133. doi: 10.1016/j.yrtph.2015.03.015.
  • [8].Shah I, Liu J, Judson RS, Thomas RS, Patlewicz G, Systematically evaluating read-across prediction and performance using a local validity approach characterized by chemical structure and bioactivity information, Regul. Toxicol. Pharmacol. 79 (2016) 12–24. doi: 10.1016/j.yrtph.2016.05.008.
  • [9].Patlewicz G, Shah I, Towards systematic read-across using Generalised Read-Across (GenRA), Comput. Toxicol. 25 (2023) 100258. doi: 10.1016/j.comtox.2022.100258. URL https://www.sciencedirect.com/science/article/pii/S2468111322000469.
  • [10].Blackburn K, Stuard SB, A framework to facilitate consistent characterization of read across uncertainty, Regul. Toxicol. Pharmacol. 68 (3) (2014) 353–362. doi: 10.1016/j.yrtph.2014.01.004.
  • [11].Schultz TW, Richarz A-N, Cronin MTD, Assessing uncertainty in read-across: Questions to evaluate toxicity predictions based on knowledge gained from case studies, Comput. Toxicol. 9 (2019) 1–11. doi: 10.1016/j.comtox.2018.10.003. URL https://www.sciencedirect.com/science/article/pii/S2468111318300811.
  • [12].Wu S, Blackburn K, Amburgey J, Jaworska J, Federle T, A framework for using structural, reactivity, metabolic and physicochemical similarity to evaluate the suitability of analogs for SAR-based toxicological assessments, Regul. Toxicol. Pharmacol. 56 (1) (2010) 67–81. doi: 10.1016/j.yrtph.2009.09.006.
  • [13].Patlewicz G, Cronin MT, Helman G, Lambert JC, Lizarraga LE, Shah I, Navigating through the minefield of read across frameworks: A commentary perspective, Comput. Toxicol. 6 (2018) 39–54. doi: 10.1016/j.comtox.2018.04.002. URL https://linkinghub.elsevier.com/retrieve/pii/S2468111318300331.
  • [14].Schultz TW, Amcoff P, Berggren E, Gautier F, Klaric M, Knight DJ, Mahony C, Schwarz M, White A, Cronin MTD, A strategy for structuring and reporting a read-across prediction of toxicity, Regul. Toxicol. Pharmacol. 72 (3) (2015) 586–601. doi: 10.1016/j.yrtph.2015.05.016.
  • [15].Escher SE, Kamp H, Bennekou SH, Bitsch A, Fisher C, Graepel R, Hengstler JG, Herzler M, Knight D, Leist M, Norinder U, Oúedraogo G, Pastor M, Stuard S, White A, Zdrazil B, van de Water B, Kroese D, Towards grouping concepts based on new approach methodologies in chemical hazard assessment: the read-across approach of the EUToxRisk project, Archives of Toxicology 93 (12) (2019) 3643–3667. doi: 10.1007/s00204-019-02591-7. URL http://link.springer.com/10.1007/s00204-019-02591-7.
  • [16].Rovida C, Escher SE, Herzler M, Bennekou SH, Kamp H, Kroese DE, Maslankiewicz L, Moné MJ, Patlewicz G, Sipes N, Aerts L. v., White A, Yamada T, Water B. v. d., NAM-supported read-across: From case studies to regulatory guidance in safety assessment, ALTEX - Alternatives to animal experimentation 38 (1) (2021) 140–150. doi: 10.14573/altex.2010062. URL https://www.altex.org/index.php/altex/article/view/2140.
  • [17].European. Chemicals. Agency (ECHA), Read-Across Assessment Framework (RAAF)., Publications Office, 2017. URL https://data.europa.eu/doi/10.2823/619212.
  • [18].OECD. Integrated Approaches to Testing and Assessment (IATA) – OECD. URL https://www.oecd.org/chemicalsafety/risk-assessment/iata/.
  • [19].Patlewicz G, Karamertzanis P, Paul Friedman K, Sannicola M, Shah I, A systematic analysis of read-across within REACH registration dossiers, Comput. Toxicol. 30 (2024) 100304. doi: 10.1016/j.comtox.2024.100304. URL https://www.sciencedirect.com/science/article/pii/S2468111324000069.
  • [20].Tate T, Wambaugh J, Patlewicz G, Shah I, Repeat-dose toxicity prediction with Generalized Read-Across (GenRA) using targeted transcriptomic data: A proof-of-concept case study, Comput. Toxicol. 19 (2021) 1–12. doi: 10.1016/j.comtox.2021.100171.
  • [21].Helman G, Shah I, Patlewicz G, Extending the Generalised Read-Across approach (GenRA): A systematic analysis of the impact of physicochemical property information on read-across performance, Comput. Toxicol. 8 (2018) 34–50. doi: 10.1016/j.comtox.2018.07.001. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6820193/.
  • [22].Nelms MD, Mellor CL, Enoch SJ, Judson RS, Patlewicz G, Richard AM, Madden JM, Cronin MTD, Edwards SW, A mechanistic framework for integrating chemical structure and high-throughput screening results to improve toxicity predictions, Comput. Toxicol. 8 (2018) 1–12. doi: 10.1016/j.comtox.2018.08.003.
  • [23].Boyce M, Meyer B, Grulke C, Lizarraga L, Patlewicz G, Comparing the performance and coverage of selected in silico (liver) metabolism tools relative to reported studies in the literature to inform analogue selection in read-across: A case study, Comput. Toxicol. 21 (2022) 1–15. doi: 10.1016/j.comtox.2021.100208.
  • [24].Shah I, Patlewicz G, GenRA (2024). URL https://www.comptox.epa.gov/genra.
  • [25].Kunimoto R, Vogt M, Bajorath J, Maximum common substructure-based Tversky index: an asymmetric hybrid similarity measure, J. Comput. Aided Mol. Des. 30 (7) (2016) 523–531. doi: 10.1007/s10822-016-9935-y.
  • [26].O’Boyle NM, Boström J, Sayle RA, Gill A, Using matched molecular series as a predictive tool to optimize biological activity, J. Med. Chem. 57 (6) (2014) 2704–2713. doi: 10.1021/jm500022q.
  • [27].Lester C, Yan G, A matched molecular pair (MMP) approach for selecting analogs suitable for structure activity relationship (SAR)-based read across, Regul. Toxicol. Pharmacol. (2021) 104966. doi: 10.1016/j.yrtph.2021.104966.
  • [28].Williams AJ, Grulke CM, Edwards J, McEachran AD, Mansouri K, Baker NC, Patlewicz G, Shah I, Wambaugh JF, Judson RS, Richard AM, The CompTox Chemistry Dashboard: a community data resource for environmental chemistry, J. Cheminf. 9 (1) (2017) 61. doi: 10.1186/s13321-017-0247-6.
  • [29].Schultz TW, Diderich R, Kuseva CD, Mekenyan OG, The OECD QSAR Toolbox Starts Its Second Decade, Methods Mol. Biol. 1800 (2018) 55–77. doi: 10.1007/978-1-4939-7899-1_2.
  • [30].Bajusz D, Rácz A, Héberger K, Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J. Cheminf. 7 (1) (2015) 20. doi: 10.1186/s13321-015-0069-3.
  • [31].Floris M, Manganaro A, Nicolotti O, Medda R, Mangiatordi GF, Benfenati E, A generalizable definition of chemical similarity for read-across, J. Cheminf. 6 (1) (2014) 39. doi: 10.1186/s13321-014-0039-1.
  • [32].Gallegos Saliner A, Patlewicz G, Worth A, A similarity based approach for chemical category classification, Joint Research Centre, EUR 21867 EN (2005).
  • [33].Rogers D, Hahn M, Extended-Connectivity Fingerprints, J. Chem. Inf. Model. 50 (5) (2010) 742–754. doi: 10.1021/ci100050t.
  • [34].Durant JL, Leland BA, Henry DR, Nourse JG, Reoptimization of MDL Keys for Use in Drug Discovery, J. Chem. Inf. Comput. Sci. 42 (6) (2002) 1273–1280. doi: 10.1021/ci010132r.
  • [35].Yang C, Tarkhov A, Marusczyk J, Bienfait B, Gasteiger J, Kleinoeder T, Magdziarz T, Sacher O, Schwab CH, Schwoebel J, Terfloth L, Arvidson K, Richard A, Worth A, Rathman J, New publicly available chemical query language, CSRML, to support chemotype representations for application to data mining and modeling, J. Chem. Inf. Model. 55 (3) (2015) 510–528. doi: 10.1021/ci500667v.
  • [36].Carhart RE, Smith DH, Venkataraghavan R, Atom pairs as molecular features in structure-activity studies: definition and applications, J. Chem. Inf. Comput. Sci. 25 (2) (1985) 64–73. doi: 10.1021/ci00046a002.
  • [37].Banerjee A, Kar S, Roy K, Patlewicz G, Charest N, Benfenati E, Cronin M, Molecular similarity in chemical informatics and predictive toxicity modeling: from quantitative read-across (q-ra) to quantitative read-across structure-activity relationship (q-rasar) with the application of machine learning, Crit. Rev. Toxicol. 54 (2024) 659–684. doi: 10.1080/10408444.2024.2386260.
  • [38].Mellor CL, Marchese Robinson RL, Benigni R, Ebbrell D, Enoch SJ, Firman JW, Madden JC, Pawar G, Yang C, Cronin MTD, Molecular fingerprint-derived similarity measures for toxicological read-across: Recommendations for optimal use, Regul. Toxicol. Pharmacol. 101 (2019) 121–134. doi: 10.1016/j.yrtph.2018.11.002.
  • [39].Alameri A, Alsharafi M, Topological indices types in graphs and their applications, 2021.
  • [40].Ullmann J, An Algorithm for Subgraph Isomorphism, Journal of the ACM 23 (1) (1976) 31–42. doi: 10.1145/321921.321925.
  • [41].Pelillo M, Replicator equations, maximal cliques, and graph isomorphism, Neural Comput. 11 (8) (1999) 1933–1955. doi: 10.1162/089976699300016034. URL https://direct.mit.edu/neco/article/11/8/1933-1955/6302.
  • [42].Melnik S, Garcia-Molina H, Rahm E, Similarity flooding: a versatile graph matching algorithm and its application to schema matching, in: Proceedings 18th International Conference on Data Engineering, IEEE Comput. Soc, San Jose, CA, USA, 2002, pp. 117–128. doi: 10.1109/ICDE.2002.994702. URL http://ieeexplore.ieee.org/document/994702/.
  • [43].Jeh G, Widom J, SimRank: a measure of structural-context similarity, in: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’02, Association for Computing Machinery, New York, NY, USA, 2002, pp. 538–543. doi: 10.1145/775047.775126.
  • [44].Zager LA, Verghese GC, Graph Similarity for scoring and matching, Applied Mathematics Letters 21 (1) (2008) 86–94. doi: 10.1016/j.aml.2007.01.006.
  • [45].Koutra D, Parikh A, Ramdas A, Xiang J, Algorithms for Graph Similarity and Subgraph Matching, Report, Carnegie Mellon University, 2011.
  • [46].Chartrand G, Kubicki G, Schultz M, Graph similarity and distance in graphs, Aequationes Mathematicae 55 (1) (1998) 129–145. doi: 10.1007/s000100050025.
  • [47].Gillet VJ, Willett P, Bradshaw J, Similarity searching using reduced graphs, J. Chem. Inf. Comput. Sci. 43 (2) (2003) 338–345. doi: 10.1021/ci025592e.
  • [48].Birchall K, Gillet VJ, Reduced graphs and their applications in chemoinformatics, Methods Mol. Biol. 672 (2011) 197–212. doi: 10.1007/978-1-60761-839-3_8.
  • [49].Birchall K, Gillet VJ, Harper G, Pickett SD, Training similarity measures for specific activities: application to reduced graphs, J. Chem. Inf. Model. 46 (2) (2006) 577–586. doi: 10.1021/ci050465e.
  • [50].Akutsu T, Nagamochi H, Comparison and enumeration of chemical graphs, Comput. Struct. Biotechnol. J. 5 (6) (2013) e201302004. doi: 10.5936/csbj.201302004. URL https://www.sciencedirect.com/science/article/pii/S2001037014600325.
  • [51].Duesbury E, Holliday JD, Willett P, Maximum Common Subgraph Isomorphism Algorithms, MATCH Communications in Mathematical and in Computer Chemistry 77 (2) (2017) 213–232. URL http://match.pmf.kg.ac.rs/content77n2.htm.
  • [52].Raymond JW, Willett P, Maximum common subgraph isomorphism algorithms for the matching of chemical structures, J. Comput. Aided Mol. Des. 16 (7) (2002) 521–533. doi: 10.1023/A:1021271615909.
  • [53].Dalke A, Hastings J, FMCS: a novel algorithm for the multiple MCS problem, J. Cheminf. 5 (Suppl 1) (2013) O6. doi: 10.1186/1758-2946-5-S1-O6. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3606201/.
  • [54].Haussler D, Convolution kernels on discrete structures http: www.cse.ucsc.eduhaussler. Accessed 2th April 2025 https://bpb-us-e1.wpmucdn.com/sites.ucsc.edu/dist/4/821/files/2018/12/Convolution-Kernels.pdf.
  • [55].Kondor R, Lafferty JD, Diffusion kernels on graphs and other discrete input spaces, in: International Conference on Machine Learning. Vol. 2 (2002).
  • [56].Vishwanathan S, Borgwardt KM, Schraudolph NN, Fast computation of graph kernels, in: Schölkopf B, Platt J, Hofmann T (Eds.), Advances in Neural Information Processing Systems 19, The MIT Press, 2007, pp. 1449–1456. doi: 10.7551/mitpress/7503.003.0186.
  • [57].Kriege NM, Johansson FD, Morris C, A survey on graph kernels, Applied Network Science 5 (1) (2020) 1–42. doi: 10.1007/s41109-019-0195-3. URL https://appliednetsci.springeropen.com/articles/10.1007/s41109-019-0195-3.
  • [58].Gärtner T, A survey of kernels for structured data, SIGKDD Explor. Newsl 5 (1) (2003) 49–58. doi: 10.1145/959242.959248.
  • [59].Borgwardt KM, Kriegel HP, Shortest-path kernels on graphs, in: Fifth IEEE International Conference on Data Mining (ICDM’05), 8 pp. doi: 10.1109/ICDM.2005.132.
  • [60].Shervashidze N, Schweitzer P, van Leeuwen EJ, Mehlhorn K, Borgwardt KM, Weisfeiler-Lehman graph kernels, J. Mach. Learn. Res 12 (77) (2011) 2539–2561.
  • [61].Cai H, Zheng VW, Chang K, A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications, IEEE Transactions on Knowledge & Data Engineering 30 (09) (2018) 1616–1637. doi: 10.1109/TKDE.2018.2807452. URL http://doi.ieeecomputersociety.org/10.1109/TKDE.2018.2807452.
  • [62].Xu M, Understanding Graph Embedding Methods and Their Applications, SIAM Review 63 (4) (2021) 825–853. doi: 10.1137/20m1386062. URL https://epubs.siam.org/doi/abs/10.1137/20M1386062.
  • [63].Goyal P, Ferrara E, Graph embedding techniques, applications, and performance: A survey, Knowledge-Based Systems 151 (2018) 78–94. doi: 10.1016/j.knosys.2018.03.022. URL https://www.sciencedirect.com/science/article/pii/S0950705118301540.
  • [64].Kruskal JB, Wish M, Multidimensional Scaling, Sage University Paper Series on Quantitative Applications in the Social Sciences, No. 07–011, Sage Publications, Newbury Park, 1978. doi: 10.4135/9781412985130.
  • [65].Tenenbaum JB, de Silva V, Langford JC, A global geometric framework for nonlinear dimensionality reduction, Science 290 (5500) (2000) 2319–2323. doi: 10.1126/science.290.5500.2319.
  • [66].Belkin M, Niyogi P, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. 15 (6) (2003) 1373–1396. doi: 10.1162/089976603321780317.
  • [67].Xu M, Understanding graph embedding methods and their applications, arXiv:2012.08019 [cs, math] (Dec. 2020). doi: 10.48550/arXiv.2012.08019. URL http://arxiv.org/abs/2012.08019.
  • [68].Perozzi B, Al-Rfou R, Skiena S, DeepWalk: Online Learning of Social Representations, in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, 2014, pp. 701–710, arXiv:1403.6652 [cs]. doi: 10.1145/2623330.2623732. URL http://arxiv.org/abs/1403.6652.
  • [69].Mikolov T, Chen K, Corrado G, Dean J, Efficient Estimation of Word Representations in Vector Space, arXiv:1301.3781 [cs] (Sep. 2013). URL http://arxiv.org/abs/1301.3781.
  • [70].Jaeger S, Fulle S, Turk S, Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition, Journal of Chemical Information and Modeling 58 (1) (2018) 27–35. doi: 10.1021/acs.jcim.7b00616.
  • [71].Grover A, Leskovec J, node2vec: Scalable feature learning for networks, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016).
  • [72].Narayanan A, Chandramohan M, Venkatesan R, Chen L, Liu Y, Jaiswal S, graph2vec: Learning Distributed Representations of Graphs, arXiv:1707.05005 (2017).
  • [73].Chen H, Koga H, GL2vec: Graph embedding enriched by line graphs with edge features, in: Neural Information Processing, Springer International Publishing (2019) 3–14. doi: 10.1007/978-3-030-36718-3_1.
  • [74].Cai H, Zheng VW, Chang KC-C, A comprehensive survey of graph embedding: Problems, techniques, and applications, IEEE Transactions on Knowledge & Data Engineering 30 (09) (2018) 1616–1637. doi: 10.1109/tkde.2018.2807452. URL https://doi.ieeecomputersociety.org/10.1109/TKDE.2018.2807452.
  • [75].Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G, The Graph Neural Network Model, IEEE Transactions on Neural Networks 20 (1) (2009) 61–80. doi: 10.1109/TNN.2008.2005605.
  • [76].Duvenaud DK, Maclaurin D, Iparraguirre J, Bombarell R, Hirzel T, Aspuru-Guzik A, Adams RP, Convolutional networks on graphs for learning molecular fingerprints, Adv. Neural Inf. Proces. Syst 28 (2015).
  • [77].Coley CW, Barzilay R, Green WH, Jaakkola TS, Jensen KF, Convolutional embedding of attributed molecular graphs for physical property prediction, J. Chem. Inf. Model 57 (8) (2017) 1757–1772.
  • [78].Gilmer J, Schoenholz S, Riley P, Vinyals O, Dahl G, Neural message passing for quantum chemistry, in: International Conference on Machine Learning (2017).
  • [79].Wang X, Li Z, Jiang M, Wang S, Zhang S, Wei Z, Molecule Property Prediction Based on Spatial Graph Embedding, J. Chem. Inf. Model 59 (9) (2019) 3817–3828. doi: 10.1021/acs.jcim.9b00410.
  • [80].Patlewicz G, Casati S, Basketter DA, Asturiol D, Roberts DW, Lepoittevin J-P, Worth AP, Aschberger K, Can currently available non-animal methods detect pre and pro-haptens relevant for skin sensitization? Regul. Toxicol. Pharm. RTP 82 (2016) 147–155. doi: 10.1016/j.yrtph.2016.08.007.
  • [81].Hulzebos E, Walker J, Gerner I, Schlegel K, Use of structural alerts to develop rules for identifying chemical substances with skin irritation or skin corrosion potential, QSAR Comb. Sci 24 (2005) 332–342. doi: 10.1002/qsar.200430905.
  • [82].Patlewicz G, Jeliazkova N, Safford RJ, Worth AP, Aleksiev B, An evaluation of the implementation of the Cramer classification scheme in the Toxtree software, SAR QSAR Environ. Res 19 (5–6) (2008) 495–524. doi: 10.1080/10629360802083871.
  • [83].Patlewicz G, Jeliazkova N, Gallegos Saliner A, Worth AP, Toxmatch–a new software tool to aid in the development and evaluation of chemically similar groups, SAR QSAR Environ. Res 19 (3–4) (2008) 397–412. doi: 10.1080/10629360802083848.
  • [84].Russom CL, Bradbury SP, Broderius SJ, Hammermeister DJ, Drummond RA, Veith GD, Predicting modes of toxic action from chemical structure, Environ. Toxicol. Chem 32 (7) (2013) 1441–1442. doi: 10.1002/etc.2249.
  • [85].Pradeep P, Judson R, DeMarini DM, Keshava N, Martin TM, Dean J, Gibbons CF, Simha A, Warren SH, Gwinn MR, Patlewicz G, Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset, Computational Toxicology (Amsterdam, Netherlands) 18 (May 2021). doi: 10.1016/j.comtox.2021.100167.
  • [86].Asturiol D, Casati S, Worth A, Consensus of classification trees for skin sensitisation hazard prediction, Toxicol. in Vitro: an International Journal Published in Association with BIBRA 36 (2016) 197–209. doi: 10.1016/j.tiv.2016.07.014.
  • [87].Siglidis G, Nikolentzos G, Limnios S, Giatsidis C, Skianis K, Vazirgiannis M, GraKeL: A Graph Kernel Library in Python, arXiv:1806.02193 [cs, stat] (Mar. 2020). doi: 10.48550/arXiv.1806.02193. URL http://arxiv.org/abs/1806.02193.
  • [88].Rozemberczki B, Kiss O, Sarkar R, Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs, in: Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM’20), 2020, pp. 3125–3132.
  • [89].Grulke CM, Williams AJ, Thillanadarajah I, Richard AM, EPA’s DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research, Comput Toxicol. 12 (2019). doi: 10.1016/j.comtox.2019.100096.
  • [90].Rehurek R, Sojka P, Gensim–python framework for vector space modelling, NLP Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic 3 (2) (2011).
  • [91].Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E, Scikit-learn: Machine learning in Python, J. Mach. Learn. Res 12 (2011) 2825–2830.
  • [92].Brody S, Alon U, Yahav E, How attentive are graph attention networks?, arXiv:2105.14491 (2021).
  • [93].van der Maaten L, Hinton G, Visualizing Data using t-SNE, J. Mach. Learn. Res 9 (2008) 2579–2605.
  • [94].Landrum GL, RDKit: Open-source cheminformatics. URL http://www.rdkit.org.
  • [95].Fey M, Lenssen JE, Fast graph representation learning with PyTorch Geometric (2019). arXiv:1903.02428. URL https://arxiv.org/abs/1903.02428.
  • [96].Roberts DW, Aptula AO, Determinants of skin sensitisation potential, J. Appl. Toxicol 28 (3) (2008) 377–387. doi: 10.1002/jat.1289.
  • [97].Verhaar H, van Leeuwen C, Hermens J, Classifying environmental pollutants. 1. Structure-activity relationships for prediction of aquatic toxicity, Chemosphere 25 (1992) 471–491.
  • [98].Wang NCY, Jay Zhao Q, Wesselkamper SC, Lambert JC, Petersen D, Hess-Wilson JK, Application of computational toxicological approaches in human health risk assessment. I. A tiered surrogate approach, Regulatory Toxicology and Pharmacology 63 (1) (2012) 10–19. doi: 10.1016/j.yrtph.2012.02.006. URL https://linkinghub.elsevier.com/retrieve/pii/S0273230012000323.

Associated Data

Data Availability Statement

All analysis was performed in Python 3.10 using Jupyter notebooks. RDKit [94] was used for generation of Morgan chemical fingerprints. The EPA Cheminformatics Modules were used to retrieve ToxPrints.

Molecular graph representations were created using the Python package RDKit [94]. The open-source Python package GraKeL [87] was used to implement the WL subtree kernel, and the open-source Python package Karate Club [88] was used to create the Graph2Vec embeddings. Gensim [90] was used to train the Word2Vec model for the Mol2Vec approach. Scikit-learn [91] was used to develop k-NN and logistic models for the embeddings derived from the Graph2Vec and GCN approaches. PyTorch Geometric [95] was used to train the GCN model on the genotoxicity dataset.
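
For readers unfamiliar with the WL subtree kernel named above, the idea can be illustrated with a minimal, dependency-free Python sketch. This is not the GraKeL implementation used in the analysis, and it makes several simplifying assumptions: molecules are hand-written hydrogen-suppressed graphs, atom symbols are the only node labels, bond orders are ignored, and the function names (wl_features, wl_kernel, normalised_wl_similarity) are hypothetical. Each atom label is repeatedly refined with the sorted labels of its neighbours, and two molecules are compared by the normalised dot product of their label counts.

```python
import math
from collections import Counter


def wl_features(adjacency, labels, iterations=2):
    """Count Weisfeiler-Lehman subtree labels for one labelled graph.

    adjacency: dict mapping node -> list of neighbouring nodes
    labels:    dict mapping node -> initial label (here, an element symbol)
    """
    current = dict(labels)
    counts = Counter(current.values())  # iteration 0: the raw atom labels
    for _ in range(iterations):
        refined = {}
        for node, neighbours in adjacency.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbour_labels = sorted(current[n] for n in neighbours)
            refined[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = refined
        counts.update(current.values())
    return counts


def wl_kernel(features_a, features_b):
    """Dot product of two WL label-count vectors (missing labels count as 0)."""
    return sum(count * features_b[label] for label, count in features_a.items())


def normalised_wl_similarity(graph_a, graph_b, iterations=2):
    """Cosine-style normalisation so that a graph compared with itself gives 1."""
    fa = wl_features(*graph_a, iterations=iterations)
    fb = wl_features(*graph_b, iterations=iterations)
    return wl_kernel(fa, fb) / math.sqrt(wl_kernel(fa, fa) * wl_kernel(fb, fb))


# Illustrative hydrogen-suppressed graphs as (adjacency, labels) pairs.
ethanol = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "C", 2: "O"})
propanol = ({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]},
            {0: "C", 1: "C", 2: "C", 3: "O"})
dimethyl_ether = ({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "O", 2: "C"})

if __name__ == "__main__":
    print("ethanol vs propan-1-ol:   ", normalised_wl_similarity(ethanol, propanol))
    print("ethanol vs dimethyl ether:", normalised_wl_similarity(ethanol, dimethyl_ether))
```

In this toy comparison ethanol scores higher against propan-1-ol than against dimethyl ether, because the two alcohols share the terminal C-O environment whereas the ether shares only the raw atom counts. A production workflow would build the graphs from RDKit molecules and use a tested library rather than this sketch.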

The code repository and associated data files supporting this analysis are available at https://github.com/patlewig/metgraph_survey and https://doi.org/10.5281/zenodo.15368541, respectively.
