Abstract
High-fidelity computer-aided experimentation is becoming more accessible with the development of computing power and artificial intelligence tools. The advancement of experimental hardware also empowers researchers to reach a level of accuracy that was not possible in the past. Marching toward the next generation of self-driving laboratories, the orchestration of both resources lies at the focal point of autonomous discovery in chemical science. To achieve such a goal, algorithmically accessible data representations and standardized communication protocols are indispensable. In this perspective, we recategorize the recently introduced approach based on Materials Acceleration Platforms into five functional components and discuss recent case studies that focus on the data representation and exchange scheme between different components. Emerging technologies for interoperable data representation and multi-agent systems are also discussed with their recent applications in chemical automation. We hypothesize that knowledge graph technology, orchestrating semantic web technologies and multi-agent systems, will be the driving force to bring data to knowledge, evolving our way of automating the laboratory.
Keywords: Knowledge graph, digital twin, chemistry digitalization, closed-loop optimization, laboratory automation
Introduction
The automation of the laboratory involves linking the abstract concepts of chemical processes and the hardware responsible for the execution.1,2 It can be achieved by creating a fully connected virtual representation of the physical equipment and their status, that is, a “digital twin” of the laboratory that bridges the gap between the virtual and the real world. By doing so, it enables the orchestration of physical and computational experimentation in cyberspace, facilitating the automation of chemical discovery.3 Therefore, it shortens the time span from making a new chemical in the research environment to the delivery of its mass production to the end-users. This presents the opportunity to deliver a significant level of decarbonization with reduced labor and energy consumption, making the digitalization of chemical manufacturing one of the critical technology paths toward a more sustainable society.4,5
The first automated hardware for chemistry dates back to the late 1960s.6 Since then, considerable advances have been made to expand the capabilities of such tools, covering the fields of chemical reactions,7,8 drug discovery,9 and material discovery for clean energy.10,11 In the quest for a universal organic compound synthesis machine, three key capabilities were identified:12 access to a database of chemical reaction knowledge, planning of synthetic steps, and automated execution of a proposed action sequence. For a detailed historical excursus, readers may refer to Dimitrov et al.13 In 2018, Aspuru-Guzik and Persson14 proposed the Materials Acceleration Platform (MAP), a platform-based approach, as the paradigm to accelerate the material discovery process, which was further adopted and expanded by Flores-Leonar et al.15 In line with the three key capabilities that seem to be required to build a robo-chemist,12 Flores-Leonar et al.15 envisaged the integration of machine learning (ML) algorithms and robotics platforms, with further interfacing between humans and robots, as the way toward autonomous experimentation. Current development practice in laboratory automation follows this trend. Researchers adopt automation of chemical experiments and advances in ML to enable functional material discovery,16,17 the discovery of chemical reactions,18 synthesis planning,19,20 and optimization of process conditions.21−23 Despite the great success demonstrated by the community, incorporating new equipment into an existing platform can require substantial effort. Tailored extraction–transformation–loading (ETL) tools and a specific data exchange scheme for establishing effective communication must be developed for each piece of equipment added.
Therefore, these platforms normally face difficulties in scalability and interoperability, because heterogeneous data formats are an obstacle to holistic integration, especially when it comes to the vision of a globally integrated collaboration network.11 Standardized data representation and exchange protocols are a prerequisite for digitalization, and their absence is seen as one of the critical challenges faced by the community.8
A way forward may be offered by Semantic Web technologies,24 which present a vision of a fully linked web of data, demonstrating interoperability across scales and domains. They use ontologies to describe the concepts and relationships within a given domain for shared understanding. In this perspective, we refer to ontologies developed to describe knowledge in the chemistry domain, and more importantly, those implemented in a way that is compatible with the semantic web standards,25 as chemical ontologies. One prominent example is ChEBI.26,27 An ontology normally consists of two components: a terminological box (TBox) and an assertional box (ABox).25 The TBox refers to the description at a conceptual level, while the ABox stores the data that is a realization of the concepts defined by the TBox. Both levels can be accessed via internationalized resource identifiers (IRIs), essentially generalized uniform resource identifiers (URIs), for unambiguous identification. In the context of automating experiments, this opens up the possibility of developing a fully linked data representation for the chemical processes and equipment status as a universal framework to facilitate concrete data exchange within and between platforms.
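To make the TBox/ABox distinction concrete, the following minimal sketch stores both levels as subject-predicate-object triples in plain Python. The IRIs under https://example.org/ and the class and property names are hypothetical placeholders, not terms from ChEBI or any published ontology; only the rdf:type and rdfs:subClassOf IRIs are standard W3C vocabulary.

```python
# Minimal sketch of a TBox (concepts) and an ABox (instances) as triples.
# All https://example.org/ IRIs are hypothetical placeholders.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "https://example.org/lab#"

# TBox: conceptual level (classes and their relationships).
tbox = {
    (EX + "FlowReactor", RDFS_SUBCLASS_OF, EX + "Reactor"),
    (EX + "Reactor", RDFS_SUBCLASS_OF, EX + "LabEquipment"),
}

# ABox: data that realizes the concepts defined in the TBox.
abox = {
    (EX + "reactor_01", RDF_TYPE, EX + "FlowReactor"),
    (EX + "reactor_01", EX + "hasTemperatureCelsius", "60.0"),
}


def superclasses(cls, tbox_triples):
    """Return all classes reachable from cls via rdfs:subClassOf."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in tbox_triples:
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def instances_of(cls, tbox_triples, abox_triples):
    """Return individuals typed as cls or as any subclass of cls."""
    result = set()
    for s, p, o in abox_triples:
        if p == RDF_TYPE and (o == cls or cls in superclasses(o, tbox_triples)):
            result.add(s)
    return result
```

Asking for instances of the hypothetical LabEquipment class returns reactor_01 even though it was only asserted to be a FlowReactor, which illustrates the kind of inference that a linked, ontology-backed representation makes available to every platform sharing the same identifiers.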
Besides the interoperable data representation, an effective way to communicate and share data must be addressed to achieve laboratory automation. In this regard, collective intelligent agents have been used to automate the tasks involved in crystal-structure phase mapping,28 material discovery,29 and reaction optimization.30 Considering the historical discussions of integrating the two technologies,31 we hypothesize that an ontological representation of a laboratory, linked with different data standards, would enable the rapid implementation of artificial intelligence (AI) tools for chemical discovery and development.
This perspective aims to review the potential for emerging technologies to enhance how we approach laboratory automation. The presentation of this perspective is structured as follows. First, we review the state of the art in laboratory automation practice with a focus on data infrastructure. Based on the limitations of current approaches, we assess community efforts toward standardized data representation and effective data exchange. We identify dynamic knowledge graphs, that is, a combination of ontologies and agents, as an interesting technology option. This approach allows the intelligent automation of experiments to be linked with chemical knowledge resources and aligned with other AI techniques. It is suggested that this will play a key role in the next generation of laboratory automation.
Platform-Based Approach
Detailed reviews of the applications of closed-loop optimization have been published by Cao et al.32 and Coley et al.7 In this section, we focus on the data flow between the different components of such an automated experimentation platform as presented in the state-of-the-art studies. To show the data flow between the different parts more clearly, and thus to reveal how these functional components can be recast as agents in the knowledge-graph-based approach, we regroup the five key elements proposed by Flores-Leonar et al.15 and recast them as illustrated in Figure 1. The receptionist acts as a human–machine interface that receives, analyzes, and translates the requests into machine-understandable objects, and enables real-time and interactive communication between user and data. The coordinator manages the workflow by locating resources given constraints, requesting data from the librarian, asking the planner for suggestions over the next steps, and requesting experiments from the executor. The planner is a decision-making entity that designs the experiment, plans retrosynthesis steps, and selects suitable surrogate models for the given use case. The librarian is responsible for data management, including maintenance of the database, data cleaning, data validation, and outlier detection. The executor performs the computational and physical experiments, both interfaced with the available experimental resources. We categorize the selected studies by the functional components they realize and assess the data communication between each of them. It should be noted that we do not cover the specific internal realization of the components, that is, we do not consider how the planner handles the input historical data and how it recommends the synthesis route; instead, we focus on the format of the recommendation output from the planner.
Following the review, we list the limitations of the platform-based approach that lead to the quest for better data representation and exchange protocols.
Figure 1.
Functional components of a platform-based approach toward chemical discovery, annotated with the communications between each component.
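The division of labor among the five components can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the design of any cited platform: all class and method names are hypothetical, random search stands in for a real planner, and a made-up response function stands in for a real experiment.

```python
# Minimal sketch of the coordinator-centred data flow among the components.
# Class and method names are hypothetical; the planner and executor are toys.
import random


class Librarian:
    """Keeps the experiment history (here just an in-memory list)."""

    def __init__(self):
        self.records = []

    def read(self):
        return list(self.records)

    def write(self, record):
        self.records.append(record)


class Planner:
    """Suggests the next condition; random search stands in for a real algorithm."""

    def __init__(self, low, high, seed=0):
        self.low, self.high = low, high
        self.rng = random.Random(seed)

    def suggest(self, history):
        return {"temperature": self.rng.uniform(self.low, self.high)}


class Executor:
    """Runs an 'experiment'; a made-up response peaks at 70 degrees."""

    def run(self, condition):
        t = condition["temperature"]
        return {"yield": max(0.0, 1.0 - ((t - 70.0) / 50.0) ** 2)}


class Coordinator:
    """Routes every exchange between librarian, planner, and executor."""

    def __init__(self, librarian, planner, executor):
        self.librarian, self.planner, self.executor = librarian, planner, executor

    def run_campaign(self, n_experiments):
        for _ in range(n_experiments):
            history = self.librarian.read()
            condition = self.planner.suggest(history)
            result = self.executor.run(condition)
            self.librarian.write({**condition, **result})
        return max(self.librarian.read(), key=lambda r: r["yield"])


def receptionist(request):
    """Translates a user request (a plain dict here) into a campaign run."""
    coordinator = Coordinator(Librarian(), Planner(20.0, 120.0), Executor())
    return coordinator.run_campaign(request["n_experiments"])


best = receptionist({"n_experiments": 10})
```

Every message in this sketch passes through the coordinator, which mirrors the annotated communications in Figure 1 and also makes visible why the coordinator becomes the critical point of the architecture.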
Selected Studies
There have been extensive reviews on developing each of the functional components.15,33−36 In the context of chemical automation, Mateos et al.37 reviewed the realization of the components in selected continuous flow platforms. In this perspective, we selected the studies below to illustrate how data is exchanged between the functional components in the platform-based approach. Specifically, we review the data exchange protocols between the coordinator, librarian, planner, and executor to investigate interoperability within one platform and between different platforms in the current setups. We identified three main types of data representation and storage in the automated experimentation platforms, namely, variables held in memory by a running program, data stored in a file on a hard disk, and data stored in a database. Based on this classification, three types of data transfer and communication protocols were identified: assigning in-memory values during program run-time, file transfer protocol, and HTTP request/response. It should be noted that although the latter two ways of communication both belong to the application layer in the TCP/IP model, they are distinguished herein to emphasize the format in which the data is stored and consequently transferred. The complete details, to the best of our knowledge, are summarized in tables in the Supporting Information.
Receptionist
The receptionist acts as the human–machine interface. Among different platforms, multiple ways of interaction have been reported. Knight et al.38 present a voice-controlled user interface integrating voice, text, and visual dashboards. This increases the flexibility for the experimentalist to communicate and collaborate with the automated setups without requiring coding experience. Web interfaces via HTTP requests/responses21,39,40 are another way of interaction. The advantage of this approach is that authorized users can log in to the web page and access the platform from all over the world.35 Moreover, natural language processing (NLP) modules can be built on top of the web interface as chatbots, which can further connect to existing messaging services such as Gmail, Twitter, Slack, and Dropbox.16,41 The graphical user interface (GUI) is a more intuitive way of interaction between the users and the automated experimental platforms. It can be built in different programming environments, such as Matlab,42 Python,17,19 and LabVIEW.22,43,44 It should be noted that each receptionist can only work within its own operating system because it is bound to specific communication protocols and a specific coding language.
Coordinator
The coordinator manages the workflow in the closed-loop system. Among the different programming languages and tools that have been employed to develop the coordinator, Python is perhaps the most widely adopted. The Aspuru-Guzik group proposed ChemOS,16,41 a modular coordinator orchestrating the learning module (the AI-based planner), the communication module (server-based receptionist), and an operation module for remote control of the robotic platform. ChemOS demonstrated decision-making capabilities in managing the workflow for thin-film material discovery16 and increasing the efficiency of organic photovoltaics.45 It has now been commercialized as Atinary SDLabs46 with a Scientia version freely available for academics. Zhu’s group presented MAOSIC,17 a coordinator upgraded from their previous system MAOS,47 which was applied to the autonomous discovery of optically active chiral inorganic perovskite nanocrystals. Experiment Specification, Capture and Laboratory Automation Technology (ESCALATE) has a coordinator acting as a bridge to connect the experimental workflow.48 Its initial implementation was designed for the exploratory synthesis of single-crystal metal-halide perovskites. Further discovery of the formation of two new perovskite phases was demonstrated.49 Chemputer19 was developed for organic synthesis optimization in batch reactors. This coordinator brought together synthesis abstraction, chemical programming and hardware control, and tested the synthesis of three small pharmaceutical compounds with similar yields to those obtained by manual work. Moreover, by using a standardized format for reporting a chemical synthesis procedure within the coordinator, Chemputer captures synthetic protocols as digital code that can be further published, versioned, and transferred flexibly.
LeyLab39 is a PHP-based coordinator orchestrating multiple users and equipment in different continents for the development of catalysts and process conditions in flow reactors. The firewall within the coordinator prevents malicious attacks from unauthorized users.
The Lapkin group presented a Matlab-based coordinator for multi-objective optimization of the reaction conditions for SNAr and N-benzylation reactions.50 It demonstrated its flexibility to a different chemical system with an aldol condensation reaction optimization.42
There are also coordinators based on LabVIEW. Given the user-friendly graphical programming interface in LabVIEW, building a separate receptionist module is not required in this setup. However, Matlab43 or Python44 is occasionally paired with LabVIEW to enable the planner module to suggest new experiments.
Another notable development is C#-based ARES OS,51 an open-source software released by Air Force Research Laboratory (AFRL) following their autonomous research system (ARES). As the first reported autonomous experimentation system for materials development, ARES demonstrated its capability in carbon nanotube synthesis experiments52,53 and additive manufacturing applications.54
It can be seen that coordinators followed different coding philosophies in different programming languages. For each case study, the reported coordinator indeed satisfied the specific need yet failed to extend to other systems.
Coordinator–Librarian
The interaction between the coordinator and librarian focuses on reading historical data and writing new data for storage. The data communication protocols between the coordinator and librarian vary from platform to platform, depending on the operating system of the coordinator and the structure of the librarian.
An intuitive approach is to store and transfer the data as variables in the memory of the operating system. Jeraal et al.42 stored and transferred data as Matlab variables. Similarly, Christensen et al.55 used Python variables for communication. This approach is lightweight and independent of the database structure. However, it is vulnerable as there is no backup for the data obtained. Moreover, the data stored are hard-coded and picked beforehand, meaning the variables will be reassigned during the iterations.
File transfer is an approach to overcome this issue. Cao et al.5,32 used CSV files as the bridge for communication. Other studies used MAT files in a similar fashion.22,56 In this approach, the experimental results were exported and stored as a file that can be loaded later for suggesting the next experiments. Compared to storing data as in-memory cache variables, the file transfer approach gives a way to back up the data on a separate machine or online server with flexible access and secure storage. However, the files can still be hard to track and classify when the number of experiments is high or more than one type of experiment is run on the platform.
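A file-based exchange of this kind can be sketched with the Python standard library. The file name, column names, and values below are hypothetical and do not reproduce the files used in the cited studies.

```python
# Minimal sketch of file-based exchange between coordinator and planner.
# File name, column names, and values are hypothetical.
import csv
import os
import tempfile

FIELDS = ["experiment_id", "temperature_C", "residence_time_min", "yield"]


def export_results(path, rows):
    """Coordinator side: write experimental results to a CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def load_results(path):
    """Planner side: read the file back, converting numeric columns."""
    with open(path, newline="") as handle:
        return [
            {
                "experiment_id": row["experiment_id"],
                "temperature_C": float(row["temperature_C"]),
                "residence_time_min": float(row["residence_time_min"]),
                "yield": float(row["yield"]),
            }
            for row in csv.DictReader(handle)
        ]


rows = [
    {"experiment_id": "exp-001", "temperature_C": 60.0, "residence_time_min": 5.0, "yield": 0.42},
    {"experiment_id": "exp-002", "temperature_C": 80.0, "residence_time_min": 2.5, "yield": 0.57},
]
with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "results.csv")
    export_results(csv_path, rows)
    loaded = load_results(csv_path)
```

Note that the reader has to know the column names and their units in advance, since a CSV file carries no types or semantics of its own; this is one reason why such files become hard to track once several types of experiment share a platform.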
Databases provide a solution to efficiently manage large amounts of experimental data. Li et al.17 stored long-term data through SQLAlchemy, which supports a database management system (DBMS), with databases such as MySQL, Postgres, Oracle, and SQLite as the back-end. The coordinator MAOSIC can read and write new entries to the server-based database via API. In Roch et al.,41 the coordinator ChemOS was connected to SQLite, and the information was stored in four distinct databases (requestDB, parameterDB, robotDB, feedbackDB) on SQLite to better classify the data and retrieve them in the later stage. Materials Experiment and Analysis Database (MEAD)57 consists of both raw data and metadata from high-throughput experimentation. By instantiating an event-sourced architecture for materials provenances (ESAMP),58 the MEAD database enabled the ML algorithm to utilize the material state within its experimental workflow for accelerating materials discovery.
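As a minimal sketch of the database approach, the snippet below uses the sqlite3 module from the Python standard library. The table layout is a hypothetical illustration and is not the schema of ChemOS, MAOSIC, or MEAD.

```python
# Minimal sketch of a database-backed librarian using the standard sqlite3 module.
# The table layout is hypothetical and is not the schema of any cited platform.
import sqlite3


def create_librarian(path=":memory:"):
    """Open a database and create the experiment table if needed."""
    connection = sqlite3.connect(path)
    connection.execute(
        """CREATE TABLE IF NOT EXISTS experiments (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               experiment_type TEXT NOT NULL,
               temperature_c REAL NOT NULL,
               yield REAL
           )"""
    )
    return connection


def write_result(connection, experiment_type, temperature_c, yield_value):
    """Insert one record using parameter binding rather than string formatting."""
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "INSERT INTO experiments (experiment_type, temperature_c, yield) "
            "VALUES (?, ?, ?)",
            (experiment_type, temperature_c, yield_value),
        )


def read_history(connection, experiment_type):
    """Return (temperature, yield) pairs for one type of experiment."""
    cursor = connection.execute(
        "SELECT temperature_c, yield FROM experiments "
        "WHERE experiment_type = ? ORDER BY id",
        (experiment_type,),
    )
    return cursor.fetchall()


db = create_librarian()
write_result(db, "aldol", 60.0, 0.42)
write_result(db, "snar", 90.0, 0.71)
write_result(db, "aldol", 80.0, 0.57)
aldol_history = read_history(db, "aldol")
```

Compared with loose files, a query can select one type of experiment from a mixed history in a single statement, which addresses the tracking and classification problem noted for the file transfer approach.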
Coordinator–Planner
To avoid an exhaustive search of the chemical space, the planner needs to decide which new experiments should be conducted. Depending on the purpose of the platform, planner algorithms can be classified as discovery-oriented or optimization-oriented. Detailed reviews of the existing planner algorithms have already been published; interested readers can refer to Garud et al.59 and Clayton et al.60 The communication between the coordinator and the planner is mainly done in two ways: variables stored in memory16,22,30 and file transfer.5,19,20,50 It is worth mentioning that the communication protocols are not necessarily the same within one platform. Li et al.17 used database queries for the interaction between the coordinator and librarian, yet they depend on Python variables for the communication between the coordinator and planner. It can be seen that the platform-based approach can adapt to different ways of data exchange, yet case-specific modifications will be needed.
Coordinator–Executor
The executor runs the experiments, computationally or physically, and sends back the experimental results. The interaction between the coordinator and executor module highly depends on the operating system for the instrument, as the actual experiment resources within the executor are normally surrounded by a layer of interface. Therefore, we review the communication protocols of the physical and computational experimental platforms separately.
Physical Experiment Interface
Robotic platforms have their origins in instances such as peptide synthesis6 and the pharmaceutical industry.61,62 Some existing commercially available semi-automated and fully automated platforms in chemistry have emerged as powerful tools and can be embedded into the closed-loop optimization system.15
Commercial platforms provide various high-throughput workflow solutions, ranging from single benchtop or standalone automated workstations up to complete and integrated workflows for the entire product development process in chemical material science.63,64 Greenaway et al.65 applied the Chemspeed Accelerator SLT-100 synthesizer platform in the discovery of porous organic cages and the optimization of the cage formation conditions. This platform can carry out up to 96 reactions in parallel, greatly speeding up the testing of the proposed experimental conditions that are sent to the platform via file transfer within the Chemspeed custom software. The hardware from Chemspeed is also used by IBM’s RoboRxn,66 a remotely accessible automated organic synthesis platform utilizing various Transformer-based67 ML algorithms for chemical reaction prediction,68 retrosynthetic pathway planning,69 synthesis action extraction,70 and chemistry grammar extraction.71 Vapourtec delivers an automated flow reaction platform with multiple choices for pumps and flow reactors. Successful examples of using the Vapourtec system in the closed-loop optimization setup include drug discovery,72 scale-up development,73 and reaction condition optimization.42,50 It is worth mentioning that commercially available mobile robots and robotic arms have been used in complex and multistep operations.20,23 Communication between the coordinator and the robots was achieved using various communication protocols (TCP/IP over Wi-Fi/LAN, RS-232, WebSocket, etc.). Although commercial systems developed by various vendors are easily implemented and come with user-friendly interfaces, they limit the experimental choice across platforms, and it is hard to fit them into the existing workflow architecture and setups in the lab.
To enable a modular plug-and-play platform, single-board controllers, for example, Raspberry Pi and Arduino, were used to act as the interface layer connecting the coordinator to the actual experiment executor, that is, sample preparation, analytics etc. This is favored by the academic community due to its flexibility and compatibility with different experimental instruments at a relatively low cost. The communication protocols between the coordinator, single-board controller, and experiment executors vary. A TCP/IP protocol was used in the cases where Raspberry Pi was applied. Fitzpatrick et al.21 used a VLAN to control lab equipment and also an SSH tunnel between the virtual environment and the remote control server. Similarly, Roch et al.74 controlled the pump system using Raspberry Pi and interacted via SCP with the executor codes. In Chemputer designed by Steiner et al.,19 an Arduino was used as the microcontroller. Instances of experiment executors are created as Python instances at the initialization stage and the coordinator reads related information stored in a GraphML file. Li et al.17 conducted their high-throughput experiments via an Arduino control board as well but followed the JSON-RPC 2.0 protocol used for robots and characterization equipment control. A detailed review of microcontrollers and their applications in automated experimental systems can be found in Fitzpatrick et al.75 The in-house built platform can connect to different lab equipment based on the users’ need and existing lab setup, yet different communication protocols prevent it from extending to other labs or systems.
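To illustrate what a message-based control protocol of this kind looks like, the sketch below builds and dispatches JSON-RPC 2.0 messages with the Python standard library. The envelope fields (jsonrpc, method, params, id, result, error) and the error code follow the JSON-RPC 2.0 specification; the method name set_pump_flow_rate, its parameters, and the handler are hypothetical and are not taken from Li et al.

```python
# Minimal sketch of JSON-RPC 2.0 messages for instrument control.
# The method name "set_pump_flow_rate" and its parameters are hypothetical.
import json


def make_request(method, params, request_id):
    """Serialize a JSON-RPC 2.0 request object."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )


def handle_request(raw, handlers):
    """Dispatch a request on the device side and serialize the response."""
    request = json.loads(raw)
    handler = handlers.get(request["method"])
    if handler is None:
        # -32601 is the error code the specification reserves for "Method not found".
        body = {"error": {"code": -32601, "message": "Method not found"}}
    else:
        body = {"result": handler(**request["params"])}
    return json.dumps({"jsonrpc": "2.0", **body, "id": request["id"]})


def set_pump_flow_rate(pump, ml_per_min):
    """Stand-in for real hardware control; simply echoes the setpoint."""
    return {"pump": pump, "ml_per_min": ml_per_min, "status": "ok"}


handlers = {"set_pump_flow_rate": set_pump_flow_rate}
request = make_request("set_pump_flow_rate", {"pump": "A", "ml_per_min": 0.5}, 1)
response = json.loads(handle_request(request, handlers))
unknown = json.loads(handle_request(make_request("open_valve", {}, 2), handlers))
```

The envelope is standardized, but the method names and parameter meanings remain a private agreement between one coordinator and one device, which is consistent with the observation that such platforms are difficult to extend to other labs.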
Robot Operating System (ROS)76 is the de facto standard middleware in the robotics field for orchestrating multi-robot systems. In 2019, Marquez-Gamez and Maffetton77 proposed a ROS architecture for laboratory robotics motivated by Burger et al.,23 envisaging a “cobot” future where human researchers and robots work collaboratively in the chemistry lab using modular and reconfigurable lab equipment interfaced via ROS. A recent paper from Fakhruldeen et al.78 shows proof-of-concept toward this direction.
Computational Experiment Interface
With the rapid development of computational power and simulation methods, computational experiments are playing a more vital role in catalyst design and optimization,79 synthesis planning,80 and catalyst discovery.81 By using theoretical, fully automated screening methods combining ML and optimization to guide density functional theory (DFT) calculations, Tran and Ulissi82 screened across intermetallics for the discovery of electrocatalysts for CO2 reduction and H2 evolution.
The main executor for computational experiments is the high-performance computer (HPC). However, the interaction between the HPC and the coordinator on local computers is different from case to case. The scheduler is the interface for the users on the login nodes to submit batch jobs to the compute nodes on the HPC, as the users cannot run their calculations directly and interactively (as they do on their personal workstations or laptops). The scheduler stores the batch jobs, evaluates their resource requirements and priorities, and distributes the jobs to suitable compute nodes.
Several open-source schedulers are available depending on the setup of the HPC, among which SLURM is widely used in research computing services.83 Rosen et al.84 developed the PyMOFScreen Python package to manage automated DFT calculations, leading to new electronic structure database constructions and accelerating new materials discovery.85 Multiple software packages were developed to enable high-throughput screening on the HPC, such as Python Materials Genomics (pymatgen),86 FireWorks,87 custodian,86 Atomate,88 GASpy,81,82 and ChemEco.89,90 Depending on the user’s need as well as the DFT calculation software, the structure and the output files of those Python packages are different and nontransferable. A notable effort in addressing this issue is MolSSI QCArchive,91 which offers open access to millions of quantum chemistry calculations done with different software, as well as on-demand computation.
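As an illustration of how a coordinator might hand a computational experiment to the scheduler, the sketch below composes the text of a SLURM batch script. The #SBATCH directives shown are standard SLURM options; the job name, resource values, and the command line are hypothetical placeholders and do not correspond to any of the packages cited above.

```python
# Minimal sketch: compose a SLURM batch script for one computational experiment.
# Job name, resources, and the command line are hypothetical placeholders.
def make_batch_script(job_name, nodes, tasks_per_node, walltime, command):
    """Return the text of a batch script that the scheduler can queue."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


script = make_batch_script(
    job_name="dft-screen-001",
    nodes=1,
    tasks_per_node=16,
    walltime="02:00:00",
    command="srun ./run_dft.sh structure_001.xyz",  # hypothetical executable
)
```

In practice the text would be written to a file and submitted from a login node with sbatch, after which the scheduler decides when and where the job runs. The resource directives are specific to SLURM, so a coordinator written this way would need changes to target an HPC with a different scheduler.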
Current Limitations
Despite the huge improvements made in the literature, a few limitations remain to be overcome before it is possible to achieve a global collaborative network.11 The platform-based approach presented heavily relies on the coordinator. This increases the possibility of data loss during transmission, and it will become unsustainable soon with further expansion of the ecosystem. Direct communication between functional components is one potential approach to mitigate this issue, as demonstrated by Fitzpatrick et al.21 in letting the planner directly communicate with lab equipment via TCP/IP.
Another limitation is the ad hoc data representation and storage. This is particularly important as there is no standard method of representing results or recipes for chemical experiments, even though several competing standards for representing molecules coexist. Heterogeneous data formats lack interoperability, which precludes the full utilization of the embedded information. This problem is further exacerbated when collaboration between different groups is considered; data generated by one group may be shared and tested on the platform of another group for reproducibility and further experimentation. Moreover, the resulting variety of data transfer and communication protocols leads to low extensibility, as a considerable amount of time is often required when new hardware or software is integrated, as also noted by Breen et al.92
Unbalanced chemical data is another limitation to be addressed.8 In ML applications, historical data from reaction databases are normally applied as the training set to guide the learning of the planner models. However, only “good” experiment results are published and stored in these databases, limiting the opportunity of learning from “bad” examples,93 not to mention those platforms generating experimental data from scratch, without utilizing the prior chemical knowledge at all. A further issue lies in several examples where users are required to manually input chemical data.42,94 This is error-prone and limits the potential of full automation.
In brief, improving the interoperability within one platform and between different platforms is a key step in lowering the entry barrier of digitalizing chemistry and promoting a fully automated laboratory. It is thus important for us, as a community, to know how far we are from meeting the prerequisite condition – a fully interconnected data representation capturing the data generated within the experimentation.
Data Representation and Exchange Protocols
As promoted by various researchers,1,8,36,95 the digitalization of chemistry facilitates the collaboration between research groups. Figure 2 reviews data representation and exchange from the different perspectives of a chemical experiment, namely, molecule, reaction, analytical data and method, procedure and hardware, and finally holistic data capture and exchange. Importantly, we distinguish the community efforts into non-semantic and semantic paradigms depending on whether chemical ontologies are involved, and we lay out the connection between them. The agent-based approaches toward standardized and effective communication between each of the components involved are discussed.
Figure 2.
Community landscape toward better data representation and exchange in chemical digitalization. The focus of each category: (a) Molecule: chemical structure, physicochemical properties, and spectral information on a given species; (b) Reaction: chemical reaction scheme, conditions, description of procedures, and statistical summary of the reaction outcome; (c) Analytical data and method: analytical data collected and the methods applied within the experimentation (this is distinct from the spectral information on a given species as this focuses on the data collection process); (d) Procedure and hardware: the operational procedure in an experiment in the format that can be directly executed by hardware; (e) Holistic data capture and exchange: the initiatives to capture all the experimental information generated within the experiment and the exchange of data between different hardware/software. Items placed on the boundary between two categories cover both areas. Chemical Markup Language (CML) is labeled as both semantic and non-semantic since it preserves hard-coded and rule-based semantics but does not use ontologies following semantic web standards.25 Basic Formal Ontology (BFO) is an upper-level ontology that serves as the basis of other ontologies, and it does not capture any domain-specific information.
Non-Semantic Representation
In this review, we broadly distinguish non-semantic efforts into four parts: a representation of cheminformatics formats, a schema for constrained encoding of data, a collection of data stored in a database, and finally a holistic architecture that aims to capture all data generated within an experiment.
Since the discovery of the periodic table of the elements, chemical knowledge has been built on structures with competing representations.96 The most commonly used representations are string and line notations, including SMILES,97 InChI,98 SMARTS,99 SELFIES,100 etc. for molecules, and RInChI,101 SMIRKS,102 etc. for reactions. Chemical table files express molecules and reactions in terms of x-y-z coordinates of atoms and bonds. For a more visual representation, molecules and reactions can be illustrated with 2D line drawings (or 2.5D including stereochemistry), and 3D conformers. These formats are interchangeable with the help of cheminformatics tools, e.g., Open Babel103 and RDKit.104 An ML application normally starts with encoding structural representations in the form of high-dimensional vectors to map the implicit chemistry to either physicochemical properties of one molecule or reactivity between different molecules.
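The encoding step can be illustrated with a deliberately simple toy that maps a SMILES string to a fixed-length vector by hashing character n-grams. This is an assumption-laden sketch using only the Python standard library: it treats the string as plain text, it is not a chemically aware fingerprint, and practical work would rely on a cheminformatics toolkit such as those cited above, which parse the molecular graph.

```python
# Minimal sketch: map a SMILES string to a fixed-length count vector.
# This hashes character n-grams and is NOT a chemically aware fingerprint.
import zlib


def smiles_to_vector(smiles, n_bits=64, max_ngram=3):
    """Count hashed character n-grams of the string into n_bits buckets."""
    vector = [0] * n_bits
    for size in range(1, max_ngram + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # crc32 is deterministic across runs, unlike the built-in hash().
            vector[zlib.crc32(fragment.encode("utf-8")) % n_bits] += 1
    return vector


ethanol = smiles_to_vector("CCO")
acetic_acid = smiles_to_vector("CC(=O)O")
```

Because the toy operates on characters, two valid SMILES strings for the same molecule (for example, CCO and OCC for ethanol) generally produce different vectors. This sensitivity to the chosen notation is one practical consequence of having several competing string representations.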
Popular chemical databases and registry systems normally store various representations of the above together with identifiers, for example, IUPAC name, CAS number, and PubChem CID, for unique and unambiguous identification within themselves and cross-reference between repositories. PubChem105 is the largest open-source structural chemical information repository. For reaction informatics,106 the scale of open-source databases is much smaller. The USPTO database107 is one of the seminal databases in the community and contains 3.7 million reactions extracted from US patents. It was commercialized as Pistachio,108 which contains more than 13 million reactions with reaction classifications annotated using the named reaction ontology (RXNO109) and expands coverage to other patent offices, that is, the World Intellectual Property Organization (WIPO) and the European Patent Office (EPO). Despite the public availability of the USPTO database, its representation schema, Chemical Markup Language (CML) in eXtensible Markup Language (XML), requires extra format conversion effort for ML applications. As a result, different versions of the USPTO subset have been derived and adapted by various researchers for their applications.68,110−112 As a tailored database can be kept private to the research group, benchmarking new algorithms can be difficult.
To facilitate the development of ML in chemistry, Open Reaction Database (ORD)113,114 was formed to encourage precompetitive data sharing in a standardized format. It records how the reaction was performed, including reaction inputs, conditions, outcome, etc. Notably, ORD uses a protocol buffer as its data structure, instead of the commonly used XML schema. It deliberately avoids the use of ontologies due to insufficient ML applications with ontologies seen in the community.115 Despite ORD storing the operation sequence in a machine-readable format, the authors declared it a nongoal at present to make it compatible with programmatic execution on automated synthesis hardware. For more complex operations, ORD only supports a free-text description of the procedure. In terms of the reaction outcome, it focuses more on the statistical summary of the reaction, for example, conversion and yield, and unprocessed analytical data if available. At present, ORD contains 2 million reactions,115 including part of the USPTO data set that was converted from CML.
Unified Data Model (UDM)116 is another initiative aiming at capturing and integrating the experimental information generated during chemical synthesis. UDM was originally developed by Roche as a transfer model of MDL RD file format for integrating data from various sources into Reaxys database.117 It has since evolved to an XML schema with three main elements, namely, citations, molecules, and reactions. In addition to recording the molecule and reaction identifiers, UDM annotates its data with semantic vocabularies. The reaction classification is based on the molecular processes (MOP118) and RXNO ontologies, demonstrated by its sample data taken from Reaxys. The analytical method and results type are based on a working draft version of Allotrope Foundation Ontology (AFO119) where duplicate entries exist. However, it should be noted that the way UDM integrates the ontologies is by enumerating the ontological classes as a sub-schema of UDM and tagging them to the XML elements as attributes. One general issue with this type of enumeration and attribution is that the relationships declared in the ontologies are not retained in the XML schema, for example, class and subclass relationship between concepts in MOP and RXNO, and the corresponding relationship between result types and analytical methods in the AFO. Looking at the publicly available resources, there are no programmatic constraints over how ontological axioms are enforced in a UDM file. Moreover, UDM allows any type of format for analytical data recording, at least by XML schema itself; tailored tools would be necessary for better utilization of the data. In its latest release, UDM extends its support to the SPRESI database.120 Moving forward, UDM aims to provide fully captured representations of reaction predictions and optimizations for multistep reactions. Additional support for environmental health and safety data is also of interest.121
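The loss of ontological relationships under attribute-style tagging can be illustrated with a short sketch. The XML fragment, the element and attribute names, and the `example:` terms below are hypothetical and are not taken from the UDM schema or from RXNO. The point is only that an XML parser returns the tag as an opaque string, so the class hierarchy has to be supplied from somewhere else.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: names are illustrative, not the UDM schema.
XML_RECORD = """
<reaction id="r1">
  <classification vocabulary="RXNO" term="example:SuzukiCoupling"/>
</reaction>
""".strip()

# Hypothetical subclass axioms, as an ontology would declare them.
# Nothing in the XML file above carries this information.
SUBCLASS_OF = {
    "example:SuzukiCoupling": "example:CrossCoupling",
    "example:CrossCoupling": "example:CouplingReaction",
}


def ancestors(term: str) -> list[str]:
    """Walk the subclass axioms upward from a term."""
    found = []
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        found.append(term)
    return found


def is_a(term: str, target: str) -> bool:
    """True if term equals target or is a (transitive) subclass of it."""
    return term == target or target in ancestors(term)


root = ET.fromstring(XML_RECORD)
term = root.find("classification").get("term")

# With the XML alone, only string equality is available.
matches_by_string = term == "example:CrossCoupling"
# With the axioms, the more general class is also recognised.
matches_by_ontology = is_a(term, "example:CrossCoupling")
print(matches_by_string, matches_by_ontology)
```

A query for all cross-coupling reactions would miss this record unless the hierarchy is re-implemented in tailored code, which is the limitation discussed above.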
Similar to ORD, Chemotion122 aims to build a community-driven repository to better publish reaction data generated across different laboratories. In practice, despite containing less data, a key distinguisher of Chemotion is its level of interoperability in enabling programmatic transfer of raw analytical measurements for integration of electronic lab notebook (ELN) from individual laboratories. It does so by supporting reading and converting analytical data in the widely used JCAMP-DX format.123 Each published reaction in Chemotion has a semi-machine-readable format with a digital object identifier (DOI). It cross-references compound entries in PubChem. Like UDM, Chemotion incorporates ontologies (RXNO and chemical methods (CHMO124)) for semantic annotations at a vocabulary level. On the data validation front, Chemotion automates curation of some types of analytical data, for example, plausibility checks of nuclear magnetic resonance (NMR) data. Human inputs are still required to ensure data quality for publication. To enable more data resources, Chemotion is planning to support reactions stored in a UDM format. Chemotion is also planning to connect ELN to robotics to establish an automated platform for chemical synthesis.125
As mentioned, JCAMP-DX is a data standard widely used for recording and sharing analytical data. However, one drawback to its utilization is the lack of validation tools, which makes it difficult for data generated from different software to adhere to the standard terms.126 One approach to alleviate this problem is modernizing the standard terms with an XML schema, such as Analytical Information Markup Language (AnIML).127 AnIML is partly based on SpectroML128 and Generalized Analytical Markup Language (GAML)126 and also draws from JCAMP-DX and ASTM ANDI. On the chemical structure side, AnIML supports the CML format together with other commonly used line notations. AnIML aims to provide vendor-neutral analytical and biological data representations that are designed for manufacturers to install and maintain. For the same reason, AnIML provides audit trails and other metadata for reporting information in regulatory processes. At present, AnIML supports most common analytical equipment with detailed documentation for ultraviolet–visible spectrophotometry (UV/vis), chromatography, and indexing.
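As an illustration of what a lightweight validation step could look like, the sketch below reads the labelled data records (lines of the form `##LABEL=value`) from an abbreviated JCAMP-DX-style text and checks for a few expected labels. The sample text is not a complete valid file, and the set of labels checked is an illustrative subset chosen by us rather than the full requirement list of the standard. Labels are normalized by uppercasing and dropping spaces, hyphens, slashes, and underscores, which is our reading of how the standard treats label comparison.

```python
import re

# Abbreviated, illustrative record; not a complete valid JCAMP-DX file.
SAMPLE = """\
##TITLE=Example spectrum
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##NPOINTS=3  $$ inline comment
##XYDATA=(X++(Y..Y))
400 0.91 0.89 0.90
##END=
"""

# Illustrative subset of labels to check (an assumption, not the full list).
EXPECTED_LABELS = {"TITLE", "JCAMPDX", "DATATYPE", "END"}


def normalize_label(label: str) -> str:
    """Uppercase and drop spaces, hyphens, slashes, and underscores."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def parse_header(text: str) -> dict[str, str]:
    """Collect '##LABEL=value' records; skip data lines and '$$' comments."""
    records = {}
    for line in text.splitlines():
        line = line.split("$$", 1)[0].rstrip()
        if not line.startswith("##") or "=" not in line:
            continue  # tabulated data or blank line
        label, value = line[2:].split("=", 1)
        records[normalize_label(label)] = value.strip()
    return records


header = parse_header(SAMPLE)
missing = sorted(EXPECTED_LABELS - header.keys())
print(header["DATATYPE"], missing)
```

Because writers differ in how they spell labels ("DATA TYPE", "DATATYPE", "Data_Type"), normalizing before comparison is what allows files from different software to be checked against the same list of terms.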
Up to this point, reviewed efforts are standardizing the data generated during the experiment. Initiatives exist to standardize the instrumentation interface, for example, Standardization in Lab Automation (SiLA).129 SiLA is a micro-service architecture using gRPC and HTTP/2 protocols with a protocol buffer as its payload. It adopts a client/server view to describe the devices in the lab environment, where entities expose (multiple) services as SiLA Features accessible to others. SiLA Features are expressed in a predefined XML-based schema and stored in an online repository for service discovery. Each feature is assigned a unique identifier to enable peer-to-peer interactive communication, status queries, and reactions to events. As SiLA is a communication protocol for equipment control, it utilizes AnIML as the medium for the bidirectional transfer of analytical data between laboratory information management systems (LIMS) and chromatography data systems (CDS) in a file-less fashion.130 The combination of SiLA and AnIML represents a promising direction: standardized interfaces for instrumentation and unified machine-readable data representations. This results in a complete data package after completion of the analytical experiment, including all the process steps and the generated data.
While SiLA standardizes the equipment interface, chemical recipe file (CRF)20 and chemical description language (XDL)131 are initiatives to automate experiment execution. They both focus on translating the operational procedures from unstructured descriptions to robot execution commands.
CRF20 is a CSV-based schema developed for flow synthesis. Since the instructions are generated based on batch reaction data, human modification is required to enable continuous processes. One notable aspect of their setup is their modularized reaction hardware, making it robotically self-reconfigurable, as demonstrated by the back-to-back synthesis of medicinally relevant small molecules.
XDL131 is an XML schema focusing on batch synthesis. It contains three main components: the apparatus to be employed and manually configured, the chemicals to be used, and the robotic steps abstracted from operations used by chemists in the lab. An ontology is proposed to map the command and hardware executions; however, it is not published in semantic web standards.25 Before the instructions are sent for execution, researchers can modify the conditions to benefit from human intuition.
Both CRF and XDL focused on providing a flexible framework to conduct synthesis for multiple molecules. However, neither of them included an automated analysis step. The statistical summary of the chemical synthesis is thus not provided in a standardized format as done by other reaction schemas.
ESCALATE is an attempt toward holistic data capture and exchange.48 It proposed an ontological framework for experimentation, supporting data collection, reporting, and experiment generation. This framework captures and reports all the reactions conducted, including “bad reactions”, in line with the cultural change promoted by the community.95 In its first release,48 the claimed ontological framework was realized by implementing template-based files to store the experimental information, for example, CSV and text files in a file-sharing folder infrastructure (Google Drive). The authors additionally acknowledge that the Allotrope Foundation Data Standard could be incorporated into this data lake. Despite uniform resource locators (URLs) being employed as pointers to some data, the data representation remains heterogeneous and only semi-structured, without the semantic features required by semantic web standards.25 In a more recent development,132 an ESCALATE REST API133 was made available to showcase the possibility of retrieving chemical informatics data from PubChem API, interacting with a Postgres database for submitting experiment jobs to a laboratory, and querying the hosted results.
In general, the non-semantic efforts are closely connected to each other. Multiple representations are normally used within schemas or databases to meet the needs of different applications. Databases cross-reference to each other using registry numbers.
Another notable trend is the adoption of XML schema as data structures. XML is a machine-readable format for algorithmic operations. It relies on string parsing when automating some of the processing steps, for example, the automated unit conversion provided by XDL, where the case-insensitive conversion to a standard unit was performed. However, XML is not designed to host large sets of data as querying between different files can be challenging. The linkage between entries in XML is implicit and requires tailored codes to handle. A solution to this problem could be hosting data in a database and exposing that as the query interface. Yet as demonstrated in the platform-based approach, the same scalability issue would emerge.
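The reliance on string parsing can be made concrete with a short sketch of a case-insensitive unit conversion applied to an XML attribute. The element name, attribute names, and conversion table are our assumptions for illustration and do not reproduce the implementation provided by XDL.

```python
import re
import xml.etree.ElementTree as ET

# Conversion factors to the standard unit (mL); keys are lowercase.
TO_ML = {"ml": 1.0, "l": 1000.0, "ul": 0.001}

QUANTITY = re.compile(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)\s*")


def volume_to_ml(text: str) -> float:
    """Parse strings such as '0.5 L' or '250 ML' and return millilitres."""
    match = QUANTITY.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse quantity: {text!r}")
    value, unit = match.groups()
    key = unit.lower()  # case-insensitive lookup
    if key not in TO_ML:
        raise ValueError(f"unknown volume unit: {unit!r}")
    return float(value) * TO_ML[key]


# Hypothetical step element; attribute names are illustrative.
step = ET.fromstring('<Add reagent="water" volume="0.5 L"/>')
volume_ml = volume_to_ml(step.get("volume"))
print(volume_ml)
```

The unit lives inside an untyped string, so every consumer of the file must repeat this parsing. Folding case is also lossy in principle, since prefixes such as "m" and "M" become indistinguishable.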
It is worth noting the efforts to improve interoperability. Most of the schemas classify items using annotations based on ontological taxonomies. Some works also claim to have developed ontologies that are not, however, represented in a formal ontology language such as Web Ontology Language (OWL); their data is still file-based. In the context of this perspective, we consider these outputs to be taxonomies that formalize the hierarchical relationships, distinguishing them from the chemical ontologies that are introduced in the next section. The difficulty of achieving general interoperability remains an issue to be addressed.
Semantic Representation
Since the landmark publication by Berners-Lee et al.,24 the semantic web field has envisioned the next generation of the web in both a human- and machine-readable format for better data sharing among mankind and faster data processing using computers. Through ups and downs, the semantic web community has pivoted from ontologies to linked data, and further to knowledge graphs, which are gaining attention again in recent years. For a comprehensive review of developments in the semantic web field, interested readers are referred to Hitzler.134 The focus herein is the uptake of such technologies in the chemistry domain, as illustrated in the right half of Figure 2. For initiatives where only TBox are available, we labeled them as “Ontology”, whereas ABox that are published are labeled “Semantic Web”. Those under “TheWorldAvatar” will be introduced in the next section.
Chemical informatics has a long history of utilizing semantic web technologies. The chemical semantic web135−137 is one such early attempt by Murray-Rust and co-workers, contemporaneous with Berners-Lee’s proposal of the semantic web.24 In their work, CML was employed to host the data, prior to OWL becoming the semantic web standard. The CML schema covers concepts related to atoms, molecules, computational chemistry, crystallography, spectra, chemical reactions, and polymers. It greatly influenced the development of reaction informatics; in particular, it is the molecule representation implicitly used by various cheminformatics software.138
Since OWL became more and more popular in modeling ontologies, more activities of ontology development have been demonstrated in the scientific domain. Despite the authors of CML holding the view that ontologies following the semantic web standards25 are “too complex for the chemical community to take on board, and provides little effective added value”139 compared to their approach, the benefit of semantics motivated the development of chemical ontologies to a great extent, especially work at Royal Society of Chemistry (RSC),140 that is, CHMO,124 RXNO,141 and MOP.118 These ontologies are sophisticated and carefully curated. As demonstrated in the non-semantic efforts, they are widely used for annotating reaction classes and analytical methods.
Another driving force of ontology development in the chemistry and biology domain is the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI). In contrast to RSC ontologies that only provide concepts, EBI ontologies provide knowledge at both a terminological and an assertional level, covering small molecules (ChEBI26) and cheminformatics (CHEMINF142) in a cross-referenced fashion. CHEMINF supports molecular structure representations in the CML format; it also partly transformed data from PubChem into a knowledge base together with cross-reference to their PubChem entries. ChEBI deposited its data in PubChem entries and cross-referenced to Reaxys entries. These ontologies complement other ontologies in the field. For example, CHMO intends to describe the physical and practical methods, whereas CHEMINF covers the computational and theoretical ones.
Ontologizing existing databases has been demonstrated in the community, including ChEMBL RDF143 and PubChemRDF,144 the semantic version of the current largest open-source chemical information repository, PubChem.105 However, the Resource Description Framework (RDF) versions of these databases did not come with an officially supported SPARQL Protocol and RDF Query Language (SPARQL) endpoint. Galgonek and Vondrášek145 recently addressed this issue by integrating PubChem, ChEMBL, and ChEBI data sets as a PostgreSQL database and exposing that to support SPARQL queries. This enabled fast access to chemical data from different sources.
Allotrope Foundation is a collaborative effort from the pharmaceutical industry.119 Similar to AnIML, it aims to propose a common data exchange format to unify the laboratory information technology (IT) landscape. It started from realizing the vision of Roberts et al.146,147 where an XML schema was envisaged to provide a holistic data format. It later decided to store data based on HDF5 and RDF formats that were controlled by ontologies for semantic capabilities. The foundation now contains three ontologies, namely, AFO, Allotrope Data Format (ADF), and Allotrope Data Model (ADM). AFO is the ontology at the TBox level representing the knowledge in the chemistry domain and it borrows heavily from CHMO. ADF refers to the ontology ABox classified by AFO, extended with more features on data structure and provenance for long-term archiving. ADM is the constraint for how data in ADF should be modeled following AFO. However, only AFO is freely accessible to the public, with the remaining resources restricted to community members.
Compared to non-semantic efforts, a key distinguishing factor of the semantic approach is its fully linked concepts and data instances. This is particularly true for the ontologies reviewed above, as their concepts follow the classification of the Basic Formal Ontology (BFO). The instances stored under each ontology are inherently linked and consistent in logic. This enables interoperability between domains and easy access to data from different sources via SPARQL queries. Moreover, the linked nature made it possible to reduce duplication of information by providing unique identification to the entities, whereas in XML it would be more likely that the same information would appear in different files, for example, when the same molecules are involved in different reactions.
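The effect of unique identification can be sketched with a minimal in-memory triple store. The `ex:` identifiers and predicate names are hypothetical, and the pattern matcher is only a stand-in for a SPARQL triple pattern, not a SPARQL implementation. The molecule is described once and is referenced by identifier from every reaction that involves it.

```python
from typing import Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical identifiers; each entity is described exactly once.
TRIPLES: set[Triple] = {
    ("ex:reaction1", "ex:hasReactant", "ex:ethanol"),
    ("ex:reaction2", "ex:hasReactant", "ex:ethanol"),
    ("ex:reaction2", "ex:hasProduct", "ex:ethylAcetate"),
    ("ex:ethanol", "ex:hasSMILES", "CCO"),
}


def match(pattern: Pattern, store: set[Triple]) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard,
    playing the role of a variable in a SPARQL triple pattern."""
    return sorted(
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


# Which reactions use ethanol as a reactant?
reactions_using_ethanol = [
    s for s, _, _ in match((None, "ex:hasReactant", "ex:ethanol"), TRIPLES)
]
# The structure of ethanol is stored once, however many reactions use it.
smiles_records = match(("ex:ethanol", "ex:hasSMILES", None), TRIPLES)
print(reactions_using_ethanol, smiles_records)
```

In a file-based XML layout the same structural information would typically be repeated in each reaction file, and correcting it would require editing every copy.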
The biology community has demonstrated that the population of data is the key to a broader impact with well-defined ontologies.148 However, classifying and annotating data into ontologies while maintaining logical consistency is a challenging task, especially with complex ontologies. It is costly to adopt and creates a high entry barrier. This is reflected in reaction informatics, as ontological data is still very much limited to chemical species information, and there is currently no semantic version of reaction data available. This further exacerbates the problem of insufficient adoption of semantic web technologies in ML and other practical engineering applications, as noted by the developers of ORD.115 Actually controlling equipment execution and automating the data exchange framework is even more challenging. A trade-off between engineering practices and comprehensive representation is thus important. A potential solution would be to convert existing databases149 into RDF.
The Allotrope Foundation119 acknowledged the same issue, noting a trend toward simpler data models for practical applications. One of their partner companies, TetraScience, developed an Intermediate Data Schema (IDS), a JSON-based schema of analytical data that serves as the precursor of the AFO format. Using an agent, data generated from the analytical equipment was collected and converted to ADF for further analytics. Despite being proprietary, it points the way forward to standardizing data conversion and integration while the data is generated. A perspective from Godfrey et al.150 backed this idea, that is, data stored in an ontological framework would very much facilitate the proliferation of interoperable standards and also keep the flexibility of introducing new methodologies.
Agent-Based Approaches
With the ontological data representation in place, the way data is generated and consumed is another issue that needs to be addressed. By definition, an agent is a piece of “automated” software capable of acting toward achieving its objectives.151 In such a process, agents can communicate and coordinate, that is, exchange information with each other, in a standardized format. As mentioned above, TetraScience utilizes agents to standardize data generation; this section focuses on agent applications in standardizing data utilization.
In the context of chemical automation, agent-based approaches can be adapted to replace the functional components within a platform-based approach. Montoya et al.29 wrapped different algorithms as agents to suggest the next experiments for DFT calculations on stable materials discovery. Gomes et al.28 standardized various tasks as agents (bots) in a platform for crystal-structure phase mapping. Caramelli et al.30 applied agent-based model simulations to showcase the effectiveness of multi-threaded networking principles in searching for the optimal solution in the chemical space.
In the above studies, a step was made to turn functional components into modularized agents and standardize the data exchange between them. However, the communication was done by passing in-memory programming variables28,29 or posting plain-text on a human messaging platform (Twitter).30 As discussed in earlier sections, the same drawbacks such as lack of scalability and interoperability will emerge when scaling up the framework and integrating computational and physical experimentation. A relevant first step toward addressing this issue is demonstrated by DLHub,152 which allows users to publish, share, and cite ML models for applications in science.
Following the introduction of ontological data representations, a natural question is whether the use of agents and ontologies can be combined to harness the strengths of both approaches. The challenge of how best to do this has been an open research question since the 2000s.31 In theory,24 the ontology can help agents with more flexible operations, whereas agents can help the ontology with better data utilization. The Foundation for Intelligent Physical Agents153 (FIPA) proposed a set of specifications focusing on communication and interoperability between agents. Specifically, the FIPA Ontology Service Specification elaborated in detail the idea of having an ontology agent to support message interpretation between agents. However, it never made it to the standard stage. In the following years, JADE,154 a Java-based software platform that simplifies the implementation of FIPA-compatible multi-agent systems, attempted to provide an ontology in its realization of FIPA standards, but it only provided the ontology as part of the Java code, without connecting to a knowledge base. Attempts to merge the two technologies have been seen in other domains, but not much in chemistry until very recently. An attempt to do this is described in the next section.
Dynamic Knowledge-Graph-Based Approach
In this section, we explore how a combination of semantic web technologies and multi-agent systems, a dynamic knowledge-graph-based approach, might be applied to realize a complete digital and self-driving laboratory, that is, a chemical digital twin. We review an attempt to develop such an approach in the “World Avatar” project. We subsequently outline a conceptual example of automated closed-loop optimization powered by a dynamic knowledge graph and assess its potential in achieving full automation.
Before diving into further details, we also provide a glossary of terms that are heavily used in this section. We acknowledge that the terms may have different meanings in other contexts; we make no attempt at general definitions here.
Knowledge graph: a collection of data and software agents expressed as a directed graph controlled by ontologies, where the nodes and edges refer to concepts and relationships correspondingly. This has broader coverage than the knowledge graph as commonly used in semantic web studies,134 where only data are modeled as a directed graph. This is also different from the knowledge graph built based on Reaxys by Segler and Waller155 for reaction discovery problems, which expressed molecules as nodes and binary reactions as edges.
Digital twin: a virtual replica of real-world entities in the form of a knowledge graph. It is usually created for the real-time monitoring and controlling of real entities and thus should be synchronous with its physical counterpart.
Autonomous agent: a semantic web service that acts upon the knowledge graph to achieve predefined goals. Importantly, agents themselves are part of the knowledge graph and represented using the ontology for the agent. While active, agents communicate with each other and interact with the knowledge graph for data retrieval and operation. In the sense of a multi-agent system, the knowledge graph is the “environment” of the agents. The communication between the active agents is conducted via an HTTP request/response. They use ontologies to establish a common understanding of the topic of interest.
Dynamic knowledge graph: a knowledge graph that is constantly modified by agents with the latest status of the real world. It controls and influences the real world by updating the specifications of the digital twin and actuating that with agents.
Current State
The World Avatar (http://theworldavatar.com/) project aims to develop an all-encompassing framework156 that is capable of describing any aspect of the world. The World Avatar uses a dynamic knowledge graph, based on an ontological representation of physical entities and interoperable agents. The agents are able to update the knowledge graph with new data, analyze data, make decisions and control entities in the real world. This approach has been suggested to offer a suitable design for a universal digital twin.157
Starting from an industrial perspective, the J-Park Simulator, a precursor of the World Avatar, developed a framework that was applied to describe waste energy158 and optimize the operation159 of an eco-industrial park on Jurong Island, Singapore.160
The World Avatar has also been applied to describe a number of different types of chemical data and provides ontologies for quantum chemistry (OntoCompChem161), chemical reaction kinetics (OntoKin162), chemical species (OntoSpecies163), and combustion experiments (OntoChemExp164). OntoSpecies links other ontologies to provide unambiguous identification of the chemicals, enabling translation of chemical names when integrating chemical data gathered from different sources.164 The ontologies are connected to many of those described in previous sections. For instance, the development of OntoCompChem is partly based on the CompChem terms as described in the CML and the Gainesville Core (GNVC) ontology.165 The relationship between these ontologies and other data representations used by the community is shown in Figure 2.
To facilitate the automated data utilization within the knowledge graph, an agent ontology (OntoAgent166) was developed as the design pattern of interoperable agents. Each atomic agent is capable of predefined simple tasks with its input/output (I/O) signature linked to the concepts in the domain ontologies. This enabled I/O-based service discoveries to form the agent composition for complex tasks.166 Notably, by using OntoAgent to express the agents as part of the knowledge graph, the activities of agents are easily trackable so that provenance can be recorded to document the changes of the knowledge graph over time.
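The idea of I/O-based service discovery can be sketched as a search over agent signatures. The agent names, the `ex:` concept identifiers, and the breadth-first strategy below are our assumptions for illustration. They are not the OntoAgent vocabulary or its composition algorithm, and each agent is simplified to a single input and a single output.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentDescription:
    """Hypothetical I/O signature of an atomic agent."""
    name: str
    input_type: str
    output_type: str


# Hypothetical registry; in a knowledge graph these descriptions would be
# instances linked to concepts in the domain ontologies.
REGISTRY = [
    AgentDescription("GeometryAgent", "ex:Species", "ex:OptimisedGeometry"),
    AgentDescription("DFTAgent", "ex:OptimisedGeometry", "ex:ElectronicEnergy"),
    AgentDescription("ThermoAgent", "ex:ElectronicEnergy", "ex:Enthalpy"),
    AgentDescription("NamingAgent", "ex:Species", "ex:Name"),
]


def compose(available: str, wanted: str,
            registry: list[AgentDescription]) -> Optional[list[str]]:
    """Breadth-first search for the shortest chain of agents whose
    signatures lead from the available concept to the wanted one."""
    queue = deque([(available, [])])
    seen = {available}
    while queue:
        concept, chain = queue.popleft()
        if concept == wanted:
            return chain
        for agent in registry:
            if agent.input_type == concept and agent.output_type not in seen:
                seen.add(agent.output_type)
                queue.append((agent.output_type, chain + [agent.name]))
    return None  # no composition produces the wanted concept


plan = compose("ex:Species", "ex:Enthalpy", REGISTRY)
print(plan)
```

Because matching is done on concepts rather than on agent names, registering a new agent with a suitable signature makes it available to existing compositions without changing the caller.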
Tools and Resources
All outputs from the World Avatar project are available in the public domain. Various agents were developed and released on GitHub to provide services in the chemistry domain, for example, automated DFT calculations to address inconsistent thermodynamic data,167 automated mechanism calibration to improve the alignment between kinetic models and experimental data,164 and a question answering system enabling intuitive human-data interaction through natural language queries of chemical data from different sources.168 Work is in progress to integrate services provided by agents into the natural language processing system so that on-demand computations can be invoked when a question cannot be answered with the current knowledge. Users are welcome to check for more functionalities over time: https://kg.cmclinnovations.com/explore/marie.
Knowledge Graph Value Proposition
A core strength of the knowledge graph approach is interoperability. The knowledge graph provides a mechanism to combine data, descriptions of software, and hardware interfaces in a standardized way, facilitating automation and allowing communication between agents acting on data from different domains.164,167
Another key feature is the open-world assumption, enabling the scalability of a knowledge graph system. Once the skeleton ontology is set, extending knowledge coverage and tailoring against specific applications is easy to manage. It should work just like adding new features to a computational library.
Moreover, once the code of conduct is defined for each of the agents, they can act autonomously and modify the knowledge graph as time elapses. By doing so, the dynamic knowledge-graph reflects and influences the ever-evolving status of the real world.
Automated Closed-Loop Optimization
The characteristics of dynamic knowledge graphs open up the possibility of a new and powerful approach to closed-loop optimization. In this section, we explore how to apply a dynamic knowledge graph to do this in the context of a case study that was previously automated using a platform-based approach.42 The case study considers flow chemistry. However, given suitable ontologies and agents, the underlying principles are expected to generalize to any practices in chemistry where a “design–make–test–analyze” loop is involved.
Figure 3 illustrates the whole framework consisting of three layers, namely, the real world, the dynamic knowledge graph, and active agents. Reaction data are expressed in ontologies and hosted in the knowledge graph, together with the digital twin of the lab equipment and interoperable agents. Once activated, these agents act autonomously over the knowledge graph and keep the cyber and real worlds synchronized. The update of the digital twin is based on the readings from the equipment. This is not limited to the reaction and analytical equipment but includes environmental sensors located in the laboratory. Each device has its corresponding input agent transmitting the data into the knowledge graph. The monitor agent is responsible for monitoring the status of the digital twin and assessing if further optimization is required. If needed, it invokes the design of experiment (DoE) agent to suggest new experiments and update the configurations of the digital twin. The actuation of such settings is the responsibility of the execution agent to reflect the changes made in the knowledge graph. This loop of self-optimization continues until the monitor agent decides the optimal condition is reached. Importantly, with agents expressed in the OntoAgent format, this framework supports agent discovery service to enable agent-agnostic execution requests.
Figure 3.
Dynamic knowledge-graph-based approach toward automated closed-loop optimization. The real world layer demonstrates the existing physical entities, adapting from the experimentation setup of Jeraal et al.42 The dynamic knowledge graph layer hosts all the data generated during the experimentation and a digital twin of the experimentation apparatus. This layer is dynamic as it reflects and influences the status of the real world in real time. This synchronization is enforced by the agents in the active agents layer, which are instantiated from their ontological representation in the knowledge graph.
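The loop described above can be sketched in a few lines, assuming a plain dictionary as a stand-in for the knowledge graph and a simulated reactor in place of real hardware. All names, the yield model, the target value, and the naive step-wise search used by the DoE agent are our assumptions for illustration; a practical DoE agent would use a proper optimization algorithm and the agents would communicate through the knowledge graph rather than through function calls.

```python
def simulated_reactor(temperature_c: float) -> float:
    """Stand-in for the physical setup: yield peaks at 80 degrees C."""
    return max(0.0, 1.0 - ((temperature_c - 80.0) / 50.0) ** 2)


# Dictionary standing in for the dynamic knowledge graph.
kg = {
    "twin": {"setpoint_c": 40.0, "measured_yield": None},  # digital twin
    "history": [],         # (setpoint, yield) for every experiment run
    "target_yield": 0.95,
    "doe_step": 10.0,      # state kept by the DoE agent
}


def execution_agent(kg: dict) -> float:
    """Actuate the setpoint stored in the digital twin; return the reading."""
    return simulated_reactor(kg["twin"]["setpoint_c"])


def input_agent(kg: dict, reading: float) -> None:
    """Write the equipment reading back into the knowledge graph."""
    kg["twin"]["measured_yield"] = reading
    kg["history"].append((kg["twin"]["setpoint_c"], reading))


def monitor_agent(kg: dict) -> bool:
    """Return True if further optimization is required."""
    return kg["twin"]["measured_yield"] < kg["target_yield"]


def doe_agent(kg: dict) -> None:
    """Naive search: keep stepping; reverse and halve the step if the
    latest experiment was worse than the previous one."""
    history = kg["history"]
    if len(history) >= 2 and history[-1][1] < history[-2][1]:
        kg["doe_step"] = -kg["doe_step"] / 2
    kg["twin"]["setpoint_c"] += kg["doe_step"]


for _ in range(20):  # hard cap so the loop always terminates
    input_agent(kg, execution_agent(kg))
    if not monitor_agent(kg):
        break
    doe_agent(kg)

print(kg["twin"], len(kg["history"]))
```

Each agent reads from and writes to the shared state only, so any one of them can be replaced, for example by a different DoE strategy, without touching the others.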
Compared to the platform-based approach, one distinguishing feature of the dynamic knowledge-graph-based approach is that everything is connected, scalable, unambiguous, distributed, multi-domain, interoperable, accessible, and most importantly evolving in time. As all the digital replicas of the hardware are expressed in the same way, new equipment can be immediately accessed by any existing software once it is instantiated in the knowledge graph. The same applies when adding new ML algorithms wrapped following OntoAgent specifications; standardized interactions with data and HPC services can be established in no time.167 This enables the rapid integration of the most advanced algorithms and equipment. Due to the modularized nature, in contrast to heavily intertwined coding logic within a monolithic application, the duty of development of each component is separated, improving the maintainability of the entire system.
Another advantage of this approach is its future-proof nature, for example, its interoperability when integrating with other ontological initiatives in the community. At the species level, OntoSpecies acts like a register system that covers most of the chemical identifiers, making it possible to match with PubChemRDF or other molecular databases. In terms of chemical reactions, OntoKin is already able to describe the kinetic mechanisms of gas-phase chemistry, with OntoChemExp covering the statistical summary of combustion reactions. These concepts can be expanded to describe other chemistry domains of interest. A further opportunity lies in linking the reactions with concepts as defined in RXNO and MOP, embracing their full semantic capabilities. Similar expansion can be made with CHMO or AFO to describe the analytical data and method employed in the experimentation.
Toward a Digital Laboratory and Beyond
Beyond closed-loop optimization, various researchers have envisioned the path toward the next generation of autonomous laboratories and a global collaborative network.1,8,11,15,36,40,66,92,146,147 Below, we list a few key challenges and how we see the knowledge-graph-based approach helping.
Data Generation, Integration, and Sharing
This challenge lies in the data management practice of the platform-based approach.8,36 In moving toward full digitalization, the ability to capture all data generated within an experiment (even a “bad” reaction), integrate it with literature data, and share it with the community is crucial for navigating chemical space. As discussed above, the knowledge-graph-based approach is designed to be a holistic data capture and exchange framework. Given a consensual description of the experiment, literature data stored in open-source databases can be converted into the ontological format and integrated with the newly generated data.
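A minimal sketch of this conversion step is given below. It assumes that both literature records and new laboratory results arrive as flat field-value records, and that a hypothetical mapping (FIELD_TO_PROPERTY) translates field names into ontology properties. All names and all numerical values are invented for illustration and are not taken from any data set.

```python
from typing import Dict, List, Set, Tuple, Union

Value = Union[str, float]
Triple = Tuple[str, str, Value]

# Hypothetical mapping from flat field names to ontology properties.
FIELD_TO_PROPERTY: Dict[str, str] = {
    "solvent": "ex:hasSolvent",
    "temperature_C": "ex:hasTemperatureCelsius",
    "yield_percent": "ex:hasYieldPercent",
}


def record_to_triples(iri: str, record: Dict[str, Value], source: str) -> Set[Triple]:
    """Convert one flat record into triples that follow the shared description."""
    triples: Set[Triple] = {
        (iri, "rdf:type", "ex:ReactionExperiment"),
        (iri, "ex:hasSource", source),
    }
    for field, value in record.items():
        if field in FIELD_TO_PROPERTY:
            triples.add((iri, FIELD_TO_PROPERTY[field], value))
    return triples


def yields_by_solvent(graph: Set[Triple], solvent: str) -> List[Tuple[str, Value]]:
    """Query across sources: yield of every experiment run in `solvent`."""
    runs = {s for (s, p, o) in graph if p == "ex:hasSolvent" and o == solvent}
    return sorted(
        (s, o) for (s, p, o) in graph if s in runs and p == "ex:hasYieldPercent"
    )


literature = record_to_triples(
    "ex:exp_lit_1",
    {"solvent": "ethanol", "temperature_C": 60.0, "yield_percent": 82.0},
    "literature",
)
# A "bad" reaction is captured in exactly the same way as a successful one.
new_run = record_to_triples(
    "ex:exp_new_1",
    {"solvent": "ethanol", "temperature_C": 80.0, "yield_percent": 0.0},
    "laboratory",
)
graph = literature | new_run
print(yields_by_solvent(graph, "ethanol"))
```

Because both sources end up in the same representation, integration reduces to a set union, and a single query returns literature and laboratory results side by side.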
Roberts et al.147 envisioned a combination of XML and relational databases to achieve the same goal. However, the authors acknowledged that a database is difficult for a nonspecialist to explore without clear documentation. To enable data-agnostic queries within the knowledge graph, question answering systems can be of help.168 Researchers can thus interact with data intuitively from anywhere at any time, aligning with FAIR principles.169 The semantic-rich nature incorporates prior knowledge into the data, presenting the potential to explore informed ML applications.170
Orchestration of Physical and Computational Experiments
This challenge lies in the emerging trend of physically synthesizing the compounds identified by computational high-throughput screening.8,65,92,171,172 In a platform-based approach, this places a heavy workload on the coordinator, which must manage the information flow and orchestrate software and hardware from different vendors. SiLA and AnIML are initiatives that provide standardized interfaces and data reporting for proprietary hardware; they adopt a mindset of peer-to-peer information exchange similar to that of the platform-based approach.
In contrast, in the vision of Roberts et al.146,147 and in a dynamic knowledge graph, information is made accessible to all stakeholders within a laboratory environment, which flattens the structural design. For instance, active agents in the World Avatar share the same world-view. The communication between them only serves as a pointer to the correct resources (IRIs). This enables asynchronous communication to accommodate time-consuming activities. Moreover, the communication itself is stored in the knowledge graph and accessible to all agents: everything is transparent and FAIR. By further introducing dependencies between different concepts, both data and instructions to the instruments will act like a flow of information traveling through the knowledge graph, analogous to an adaptive organism.
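The pointer-based communication and the dependency-driven flow of information can be sketched as follows. This is a single-process illustration under stated assumptions, not the World Avatar agent framework: a dictionary stands in for the shared knowledge graph, a queue stands in for asynchronous messaging, the dependency graph is assumed to be acyclic, and all IRIs, function names, and numbers are hypothetical.

```python
import queue
from typing import Callable, Dict, List, Sequence, Tuple

# Shared world-view: every agent reads from and writes to the same store.
store: Dict[str, float] = {}
# Messages are pointers (IRIs) only; the queue decouples sender and receiver.
inbox: "queue.Queue[str]" = queue.Queue()
# Derived IRI -> (input IRIs, function). All IRIs here are hypothetical.
derivations: Dict[str, Tuple[Sequence[str], Callable[..., float]]] = {}


def publish(iri: str, value: float) -> None:
    """Write data to the shared store, then notify others with the IRI alone."""
    store[iri] = value
    inbox.put(iri)


def propagate() -> List[str]:
    """Consume pending pointers and refresh every quantity that depends on them.

    Assumes the dependency graph is acyclic.
    """
    refreshed: List[str] = []
    while not inbox.empty():
        changed = inbox.get()
        for derived, (inputs, func) in derivations.items():
            if changed in inputs and all(i in store for i in inputs):
                value = func(*(store[i] for i in inputs))
                if store.get(derived) != value:
                    store[derived] = value
                    refreshed.append(derived)
                    inbox.put(derived)  # let the change travel further downstream
    return refreshed


# Yield depends on the measured product and reactant amounts (made-up numbers).
derivations["ex:yield_1"] = (
    ("ex:product_mol_1", "ex:reactant_mol_1"),
    lambda product, reactant: product / reactant,
)
publish("ex:reactant_mol_1", 2.0)
publish("ex:product_mol_1", 1.5)
refreshed = propagate()
print(refreshed, store["ex:yield_1"])
```

The messages never carry measurement values, only the IRIs under which they are stored, so any agent that later resolves a pointer sees the same state as every other agent.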
Democratization of Chemical Automation
As previously discussed, different approaches toward chemical automation coexist. Groups upgrading from a conventional lab environment therefore face a choice. Ideally, an off-the-shelf solution that is compatible with any platform should be available to lower the entry barrier. Interoperability is therefore key to the democratization of chemical automation.
By design, the knowledge graph approach is able to connect to any laboratory. As it is based on ontologies abstracted from the laboratory entities, it is possible to instantiate a new lab into the knowledge graph and utilize the framework. Developing such a usable and reusable ontology is an iterative process and requires the consensus of the domain. We envision its development and life-cycle maintenance as a community effort. As demonstrated by the general semantic web community134 and by application experience in the chemical engineering community (OntoCAPE173), trial and error will be inevitable in the coming decade. However, it is reasonable to be positive given the successful adoption of these technologies by major IT companies.174 In that regard, the World Avatar is an open project with all resources available on GitHub and welcomes contributions from the community.
Role of Human Researchers
Despite the advantages of chemical automation, there has been concern that the automation of chemistry will replace the bench chemist.175 In our view, the development of a digitalized and automated laboratory would enhance the capability of human researchers, enabling them to focus on creative activities without worrying about the exact physical steps required to achieve their goals. This is similar to how the computer changed our way of working and increased productivity. Since the data in the knowledge graph is easy to query, researchers can focus on interpreting the experimental data and finding insights in the historical knowledge generated by mankind.106,176 There exists an opportunity for researchers to encode their chemical intuition into the knowledge graph, essentially making a digital twin of themselves. It would be possible for researchers from different laboratories to exchange views and establish collaborations that were previously unfeasible. It would be interesting to see what human intuition can achieve when empowered by greater computing abilities.
Moreover, the linked nature of semantic web technologies can bring us further to smart factories, smart buildings, and smart grids,177 as has already been demonstrated by the application of the World Avatar in smart city planning178 and the UK Digital Twin157 (https://kg.cmclinnovations.com/explore/digital-twin). By constructing a digital laboratory and linking it to the wider context, we believe it will facilitate multi-scale and cross-domain interactions between scientists, engineers, and policy makers to investigate how research done in the lab would affect the whole world. Equipped with scenario analysis, this will help to identify the direction in which science advances.
Conclusions and Outlook
This contribution was motivated by the absence of standardized data representations and communication protocols, which precludes further development toward the vision of a global collaborative research network.
We performed a thorough review of the data flow between the different functional components within state-of-the-art studies on chemical automation. We found that the common platform-based approach employs ad hoc data representations and, consequently, different data transfer protocols. This results in scalability issues when integrating new hardware and software, and interoperability issues when collaborating among different platforms: better data representation and exchange are desired.
We reviewed both semantic and non-semantic efforts in the community and outlined the connections between initiatives. In addition to a trend toward semantic representations of chemical knowledge, studies are emerging that use agent-based approaches for the standardized generation and consumption of data.
Based on our past experience in closed-loop optimization and knowledge-graph development, we conjecture that a dynamic knowledge-graph-based approach would enable rapid integration of data and AI-based agents for chemical discovery and development. By integrating physical entities into cyberspace, it promotes better utilization of the abundant computational power available in our efforts toward a sustainable future.179
In light of the Industry 4.0 revolution, as well as the current COVID situation, this perspective combines a review of common practices in data representation and exchange, a survey of the community landscape in developing better data for reaction informatics, and an outlook toward the holistic integration of automation, AI, and chemistry. The topic of this perspective is timely, and we believe it will prompt conversations within the community about the way toward fully digitalized chemistry.
Following the knowledge graph approach, we hope to see the realization of a global collaborative research network in the not too distant future. We envisage that it would allow more interdisciplinary studies to be conducted for a better understanding of the research activities of mankind. With such further advancements in knowledge graph technology, we look forward to a sustainable future in the coming decade.
Acknowledgments
This research was supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its Campus for Research Excellence and Technological Enterprise (CREATE) programme, and Pharma Innovation Platform Singapore (PIPS) via grant to CARES Ltd “Data2Knowledge, C12”. The authors are grateful to EPSRC (grant number: EP/R029369/1) and ARCHER for financial and computational support as a part of their funding to the UK Consortium on Turbulent Reacting Flows (www.ukctrf.com). This work was cofunded by EPSRC (grant number: EP/R009902/1) “Combining Chemical Robotics and Statistical Methods to Discover Complex Functional Products”. The authors thank Dr. Jacob W. Martin for his advice on information management. The authors thank Dr. Andrew C. Breeson for his help with proofreading. The authors thank Yiqun Bian and Guanhua Li for their helpful recommendations and feedback on color scheme, which helped to improve the overall aesthetic expression of the TOC graphic. J. Bai acknowledges financial support provided by CSC Cambridge International Scholarship from Cambridge Trust and China Scholarship Council. M. Kraft gratefully acknowledges the support of the Alexander von Humboldt Foundation.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/jacsau.1c00438.
Detailed findings from the selected state-of-the-art studies in chemical automation: the functional component realization in the platform-based approach, identified to the best of our knowledge (Table S1), and the data flow and communication protocols between the functional components, categorized following the method described in the main text (Table S2) (PDF)
Author Contributions
∥ J.B. and L.C. contributed equally to this work.
The authors declare no competing financial interest.
References
- Wilbraham L.; Mehr S. H. M.; Cronin L. Digitizing Chemistry Using the Chemical Processing Unit: From Synthesis to Discovery. Acc. Chem. Res. 2021, 54, 253–262. 10.1021/acs.accounts.0c00674.
- Hammer A. J. S.; Leonov A. I.; Bell N. L.; Cronin L. Chemputation and the Standardization of Chemical Informatics. JACS Au 2021, 1, 1572–1587. 10.1021/jacsau.1c00303.
- Tao F.; Qi Q. Make More Digital Twins. Nature 2019, 573, 490–491. 10.1038/d41586-019-02849-1.
- Inderwildi O.; Zhang C.; Wang X.; Kraft M. The Impact of Intelligent Cyber-Physical Systems on the Decarbonization of Energy. Energy Environ. Sci. 2020, 13, 744–771. 10.1039/C9EE01919G.
- Cao L.; Russo D.; Felton K.; Salley D.; Sharma A.; Keenan G.; Mauer W.; Gao H.; Cronin L.; Lapkin A. A. Optimization of Formulations Using Robotic Experiments Driven by Machine Learning DoE. Cell Rep. Phys. Sci. 2021, 2, 100295. 10.1016/j.xcrp.2020.100295.
- Merrifield R. B.; Stewart J. M.; Jernberg N. Instrument for Automated Synthesis of Peptides. Anal. Chem. 1966, 38, 1905–1914. 10.1021/ac50155a057.
- Coley C. W.; Eyke N. S.; Jensen K. F. Autonomous Discovery in the Chemical Sciences Part I: Progress. Angew. Chem., Int. Ed. 2020, 59, 22858–22893. 10.1002/anie.201909987.
- Coley C. W.; Eyke N. S.; Jensen K. F. Autonomous Discovery in the Chemical Sciences Part II: Outlook. Angew. Chem., Int. Ed. 2020, 59, 23414–23436. 10.1002/anie.201909989.
- Schneider G. Automating drug discovery. Nat. Rev. Drug Discovery 2018, 17, 97–113. 10.1038/nrd.2017.232.
- Tabor D. P.; Roch L. M.; Saikin S. K.; Kreisbeck C.; Sheberla D.; Montoya J. H.; Dwaraknath S.; Aykol M.; Ortiz C.; Tribukait H.; Amador-Bedolla C.; Brabec C. J.; Maruyama B.; Persson K. A.; Aspuru-Guzik A. Accelerating the Discovery of Materials for Clean Energy in the Era of Smart Automation. Nat. Rev. Mater. 2018, 3, 5–20. 10.1038/s41578-018-0005-z.
- Stach E.; et al. Autonomous Experimentation Systems for Materials Development: A Community Perspective. Matter 2021, 4, 2702–2726. 10.1016/j.matt.2021.06.036.
- Peplow M. Organic Synthesis: The Robo-Chemist. Nature 2014, 512, 20. 10.1038/512020a.
- Dimitrov T.; Kreisbeck C.; Becker J. S.; Aspuru-Guzik A.; Saikin S. K. Autonomous Molecular Design: Then and Now. ACS Appl. Mater. Interfaces 2019, 11, 24825–24836. 10.1021/acsami.9b01226.
- Aspuru-Guzik A.; Persson K. Materials Acceleration Platform: Accelerating Advanced Energy Materials Discovery by Integrating High-Throughput Methods and Artificial Intelligence. Mission Innovation: Innovation Challenge 6. 2018; http://nrs.harvard.edu/urn-3:HUL.InstRepos:35164974, Accessed 12 November 2021.
- Flores-Leonar M. M.; Mejía-Mendoza L. M.; Aguilar-Granda A.; Sanchez-Lengeling B.; Tribukait H.; Amador-Bedolla C.; Aspuru-Guzik A. Materials Acceleration Platforms: On the Way to Autonomous Experimentation. Curr. Opin. Green Sustain. Chem. 2020, 25, 100370. 10.1016/j.cogsc.2020.100370.
- MacLeod B. P.; et al. Self-Driving Laboratory for Accelerated Discovery of Thin-Film Materials. Sci. Adv. 2020, 6, eaaz8867. 10.1126/sciadv.aaz8867.
- Li J.; Li J.; Liu R.; Tu Y.; Li Y.; Cheng J.; He T.; Zhu X. Autonomous Discovery of Optically Active Chiral Inorganic Perovskite Nanocrystals through an Intelligent Cloud Lab. Nat. Commun. 2020, 11, 2046. 10.1038/s41467-020-15728-5.
- McNally A.; Haffemayer B.; Collins B. S. L.; Gaunt M. J. Palladium-Catalysed C–H Activation of Aliphatic Amines to Give Strained Nitrogen Heterocycles. Nature 2014, 510, 129–133. 10.1038/nature13389.
- Steiner S.; Wolf J.; Glatzel S.; Andreou A.; Granda J. M.; Keenan G.; Hinkley T.; Aragon-Camarasa G.; Kitson P. J.; Angelone D.; Cronin L. Organic Synthesis in a Modular Robotic System Driven by a Chemical Programming Language. Science 2019, 363, eaav2211. 10.1126/science.aav2211.
- Coley C. W.; et al. A Robotic Platform for Flow Synthesis of Organic Compounds Informed by AI Planning. Science 2019, 365, eaax1566. 10.1126/science.aax1566.
- Fitzpatrick D. E.; Maujean T.; Evans A. C.; Ley S. V. Across-the-World Automated Optimization and Continuous-Flow Synthesis of Pharmaceutical Agents Operating through a Cloud-Based Server. Angew. Chem., Int. Ed. 2018, 57, 15128–15132. 10.1002/anie.201809080.
- Bédard A.-C.; Adamo A.; Aroh K. C.; Russell M. G.; Bedermann A. A.; Torosian J.; Yue B.; Jensen K. F.; Jamison T. F. Reconfigurable System for Automated Optimization of Diverse Chemical Reactions. Science 2018, 361, 1220–1225. 10.1126/science.aat0650.
- Burger B.; Maffettone P. M.; Gusev V. V.; Aitchison C. M.; Bai Y.; Wang X.; Li X.; Alston B. M.; Li B.; Clowes R.; Rankin N.; Harris B.; Sprick R. S.; Cooper A. I. A Mobile Robotic Chemist. Nature 2020, 583, 237–241. 10.1038/s41586-020-2442-2.
- Berners-Lee T.; Hendler J.; Lassila O. The Semantic Web. Sci. Am. 2001, 284, 34–43. 10.1038/scientificamerican0501-34.
- W3C, Semantic Web. 2015; https://www.w3.org/standards/semanticweb/, Accessed 1 June 2021.
- Hastings J.; de Matos P.; Dekker A.; Ennis M.; Harsha B.; Kale N.; Muthukrishnan V.; Owen G.; Turner S.; Williams M.; Steinbeck C. The ChEBI Reference Database and Ontology for Biologically Relevant Chemistry: Enhancements for 2013. Nucleic Acids Res. 2012, 41, D456–D463. 10.1093/nar/gks1146.
- Hastings J.; Glauer M.; Memariani A.; Neuhaus F.; Mossakowski T. Learning Chemistry: Exploring the Suitability of Machine Learning for the Task of Structure-Based Chemical Ontology Classification. J. Cheminf. 2021, 13, 23. 10.1186/s13321-021-00500-8.
- Gomes C. P.; Bai J.; Xue Y.; Björck J.; Rappazzo B.; Ament S.; Bernstein R.; Kong S.; Suram S. K.; van Dover R. B.; Gregoire J. M. CRYSTAL: A Multi-Agent AI System for Automated Mapping of Materials’ Crystal Structures. MRS Commun. 2019, 9, 600–608. 10.1557/mrc.2019.50.
- Montoya J. H.; Winther K. T.; Flores R. A.; Bligaard T.; Hummelshøj J. S.; Aykol M. Autonomous Intelligent Agents for Accelerated Materials Discovery. Chem. Sci. 2020, 11, 8517–8532. 10.1039/D0SC01101K.
- Caramelli D.; Salley D.; Henson A.; Camarasa G. A.; Sharabi S.; Keenan G.; Cronin L. Networking Chemical Robots for Reaction Multitasking. Nat. Commun. 2018, 9, 3406. 10.1038/s41467-018-05828-8.
- Hendler J. Agents and the Semantic Web. IEEE Intell. Syst. 2001, 16, 30–37. 10.1109/5254.920597.
- Cao L.; Russo D.; Lapkin A. A. Automated Robotic Platforms in Design and Development of Formulations. AIChE J. 2021, 67, e17248. 10.1002/aic.17248.
- Godfrey A. G.; Masquelin T.; Hemmerle H. A Remote-Controlled Adaptive Medchem Lab: An Innovative Approach to Enable Drug Discovery in the 21st Century. Drug Discovery Today 2013, 18, 795–802. 10.1016/j.drudis.2013.03.001.
- Ley S. V.; Fitzpatrick D. E.; Ingham R. J.; Myers R. M. Organic Synthesis: March of the Machines. Angew. Chem., Int. Ed. 2015, 54, 3449–3464. 10.1002/anie.201410744.
- Fitzpatrick D. E.; Ley S. V. Engineering Chemistry for the Future of Chemical Synthesis. Tetrahedron 2018, 74, 3087–3100. 10.1016/j.tet.2017.08.050.
- Häse F.; Roch L. M.; Aspuru-Guzik A. Next-Generation Experimentation with Self-Driving Laboratories. Trends Chem. 2019, 1, 282–291. 10.1016/j.trechm.2019.02.007.
- Mateos C.; Nieves-Remacha M. J.; Rincón J. A. Automated Platforms for Reaction Self-Optimization in Flow. React. Chem. Eng. 2019, 4, 1536–1544. 10.1039/C9RE00116F.
- Knight N. J.; Kanza S.; Cruickshank D.; Brocklesby W. S.; Frey J. G. Talk2Lab: The Smart Lab of the Future. IEEE Internet Things J. 2020, 7, 8631–8640. 10.1109/JIOT.2020.2995323.
- Fitzpatrick D. E.; Battilocchio C.; Ley S. V. A Novel Internet-Based Reaction Monitoring, Control and Autonomous Self-Optimization Platform for Chemical Synthesis. Org. Process Res. Dev. 2016, 20, 386–394. 10.1021/acs.oprd.5b00313.
- Ingham R. J.; Battilocchio C.; Fitzpatrick D. E.; Sliwinski E.; Hawkins J. M.; Ley S. V. A Systems Approach Towards an Intelligent and Self-Controlling Platform for Integrated Continuous Reaction Sequences. Angew. Chem., Int. Ed. 2015, 127, 146–150. 10.1002/ange.201409356.
- Roch L. M.; Häse F.; Kreisbeck C.; Tamayo-Mendoza T.; Yunker L. P. E.; Hein J. E.; Aspuru-Guzik A. ChemOS: An Orchestration Software to Democratize Autonomous Discovery. PLoS One 2020, 15, e0229862. 10.1371/journal.pone.0229862.
- Jeraal M. I.; Sung S.; Lapkin A. A. A Machine Learning-Enabled Autonomous Flow Chemistry Platform for Process Optimization of Multiple Reaction Metrics. Chem. Methods 2021, 1, 71–77. 10.1002/cmtd.202000044.
- Mo Y.; Rughoobur G.; Nambiar A. M. K.; Zhang K.; Jensen K. F. A Multifunctional Microfluidic Platform for High-Throughput Experimentation of Electroorganic Chemistry. Angew. Chem., Int. Ed. 2020, 59, 20890–20894. 10.1002/anie.202009819.
- Chatterjee S.; Guidi M.; Seeberger P. H.; Gilmore K. Automated Radial Synthesis of Organic Molecules. Nature 2020, 579, 379–384. 10.1038/s41586-020-2083-5.
- Langner S.; Häse F.; Perea J. D.; Stubhan T.; Hauch J.; Roch L. M.; Heumueller T.; Aspuru-Guzik A.; Brabec C. J. Beyond Ternary OPV: High-Throughput Experimentation and Self-Driving Laboratories Optimize Multicomponent Systems. Adv. Mater. 2020, 32, 1907801. 10.1002/adma.201907801.
- Atinary Technologies Inc. & Atinary Technologies Sàrl, Atinary – Enabling Self-Driving Laboratories. https://atinary.com/, Accessed 13 November 2021.
- Li J.; Tu Y.; Liu R.; Lu Y.; Zhu X. Toward “On-Demand” Materials Synthesis and Scientific Discovery through Intelligent Robots. Adv. Sci. 2020, 7, 1901957. 10.1002/advs.201901957.
- Pendleton I. M.; Cattabriga G.; Li Z.; Najeeb M. A.; Friedler S. A.; Norquist A. J.; Chan E. M.; Schrier J. Experiment Specification, Capture and Laboratory Automation Technology (ESCALATE): A Software Pipeline for Automated Chemical Experimentation and Data Management. MRS Commun. 2019, 9, 846–859. 10.1557/mrc.2019.72.
- Li Z.; Najeeb M. A.; Alves L.; Sherman A. Z.; Shekar V.; Cruz Parrilla P.; Pendleton I. M.; Wang W.; Nega P. W.; Zeller M.; Schrier J.; Norquist A. J.; Chan E. M. Robot-Accelerated Perovskite Investigation and Discovery. Chem. Mater. 2020, 32, 5650–5663. 10.1021/acs.chemmater.0c01153.
- Schweidtmann A. M.; Clayton A. D.; Holmes N.; Bradford E.; Bourne R. A.; Lapkin A. A. Machine Learning Meets Continuous Flow Chemistry: Automated Optimization Towards the Pareto Front of Multiple Objectives. Chem. Eng. J. 2018, 352, 277–282. 10.1016/j.cej.2018.07.031.
- Air Force Research Laboratory, ARES OS. 2021; https://github.com/AFRL-ARES/ARES_OS, Accessed 13 November 2021.
- Nikolaev P.; Hooper D.; Perea-Lopez N.; Terrones M.; Maruyama B. Discovery of Wall-Selective Carbon Nanotube Growth Conditions via Automated Experimentation. ACS Nano 2014, 8, 10214–10222. 10.1021/nn503347a.
- Nikolaev P.; Hooper D.; Webber F.; Rao R.; Decker K.; Krein M.; Poleski J.; Barto R.; Maruyama B. Autonomy in Materials Research: A Case Study in Carbon Nanotube Growth. npj Comput. Mater. 2016, 2, 16031. 10.1038/npjcompumats.2016.31.
- Deneault J. R.; Chang J.; Myung J.; Hooper D.; Armstrong A.; Pitt M.; Maruyama B. Toward Autonomous Additive Manufacturing: Bayesian Optimization on a 3D Printer. MRS Bull. 2021, 46, 566–575. 10.1557/s43577-021-00051-1.
- Christensen M.; Yunker L. P. E.; Adedeji F.; Häse F.; Roch L. M.; Gensch T.; dos Passos Gomes G.; Zepel T.; Sigman M. S.; Aspuru-Guzik A.; Hein J. Data-Science Driven Autonomous Process Optimization. Commun. Chem. 2021, 4, 112. 10.1038/s42004-021-00550-x.
- Wigley P. B.; Everitt P. J.; van den Hengel A.; Bastian J. W.; Sooriyabandara M. A.; McDonald G. D.; Hardman K. S.; Quinlivan C. D.; Manju P.; Kuhn C. C. N.; Petersen I. R.; Luiten A. N.; Hope J. J.; Robins N. P.; Hush M. R. Fast Machine-Learning Online Optimization of Ultra-Cold-Atom Experiments. Sci. Rep. 2016, 6, 25890. 10.1038/srep25890.
- Soedarmadji E.; Stein H. S.; Suram S. K.; Guevarra D.; Gregoire J. M. Tracking Materials Science Data Lineage to Manage Millions of Materials Experiments and Analyses. npj Comput. Mater. 2019, 5, 79. 10.1038/s41524-019-0216-x.
- Statt M. J.; Rohr B. A.; Brown K.; Guevarra D.; Hummelshøj J. S.; Hung L.; Anapolsky A.; Gregoire J. M.; Suram S. K. ESAMP: Event-Sourced Architecture for Materials Provenance Management and Application to Accelerated Materials Discovery. ChemRxiv Preprint 2021, 10.26434/chemrxiv.14583258.v1.
- Garud S. S.; Karimi I. A.; Kraft M. Design of Computer Experiments: A Review. Comput. Chem. Eng. 2017, 106, 71–95. 10.1016/j.compchemeng.2017.05.010.
- Clayton A. D.; Manson J. A.; Taylor C. J.; Chamberlain T. W.; Taylor B. A.; Clemens G.; Bourne R. A. Algorithms for the Self-Optimisation of Chemical Reactions. React. Chem. Eng. 2019, 4, 1545–1554. 10.1039/C9RE00209J.
- Winicov H.; Schainbaum J.; Buckley J.; Longino G.; Hill J.; Berkoff C. Chemical Process Optimization by Computer-A Self-Directed Chemical Synthesis System. Anal. Chim. Acta 1978, 103, 469–476. 10.1016/S0003-2670(01)83110-X.
- Lindsey J. S. A Retrospective on the Automation of Laboratory Synthetic Chemistry. Chemom. Intell. Lab. Syst. 1992, 17, 15–45. 10.1016/0169-7439(92)90025-B.
- McNally A.; Prier C. K.; MacMillan D. W. Discovery of an α-Amino C-H Arylation Reaction Using the Strategy of Accelerated Serendipity. Science 2011, 334, 1114–1117. 10.1126/science.1213920.
- Hoogenboom R.; Fijten M. W. M.; Brändli C.; Schroer J.; Schubert U. S. Automated Parallel Temperature Optimization and Determination of Activation Energy for the Living Cationic Polymerization of 2-Ethyl-2-Oxazoline. Macromol. Rapid Commun. 2003, 24, 98–103. 10.1002/marc.200390017.
- Greenaway R. L.; Santolini V.; Bennison M. J.; Alston B. M.; Pugh C. J.; Little M. A.; Miklitz M.; Eden-Rump E. G. B.; Clowes R.; Shakil A.; Cuthbertson H. J.; Armstrong H.; Briggs M. E.; Jelfs K. E.; Cooper A. I. High-Throughput Discovery of Organic Cages and Catenanes Using Computational Screening Fused with Robotic Synthesis. Nat. Commun. 2018, 9, 2849. 10.1038/s41467-018-05271-9.
- O’Neill S. AI-Driven Robotic Laboratories Show Promise. Engineering 2021, 7, 1351. 10.1016/j.eng.2021.08.006.
- Wolf T.; et al. Transformers: State-of-the-Art Natural Language Processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations; Association for Computational Linguistics, 2020; pp 38–45.
- Schwaller P.; Laino T.; Gaudin T.; Bolgar P.; Hunter C. A.; Bekas C.; Lee A. A. Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction. ACS Cent. Sci. 2019, 5, 1572–1583. 10.1021/acscentsci.9b00576.
- Schwaller P.; Petraglia R.; Zullo V.; Nair V. H.; Haeuselmann R. A.; Pisoni R.; Bekas C.; Iuliano A.; Laino T. Predicting Retrosynthetic Pathways Using Transformer-Based Models and a Hyper-Graph Exploration Strategy. Chem. Sci. 2020, 11, 3316–3325. 10.1039/C9SC05704H.
- Vaucher A. C.; Zipoli F.; Geluykens J.; Nair V. H.; Schwaller P.; Laino T. Automated Extraction of Chemical Synthesis Actions from Experimental Procedures. Nat. Commun. 2020, 11, 3601. 10.1038/s41467-020-17266-6.
- Schwaller P.; Hoover B.; Reymond J.-L.; Strobelt H.; Laino T. Extraction of Organic Chemistry Grammar from Unsupervised Learning of Chemical Reactions. Sci. Adv. 2021, 7, eabe4166. 10.1126/sciadv.abe4166.
- Gilmore K.; Kopetzki D.; Lee J. W.; Horváth Z.; McQuade D. T.; Seidel-Morgenstern A.; Seeberger P. H. Continuous Synthesis of Artemisinin-Derived Medicines. Chem. Commun. 2014, 50, 12652–12655. 10.1039/C4CC05098C.
- Vieira T.; Stevens A. C.; Chtchemelinine A.; Gao D.; Badalov P.; Heumann L. Development of a Large-Scale Cyanation Process Using Continuous Flow Chemistry En Route to the Synthesis of Remdesivir. Org. Process Res. Dev. 2020, 24, 2113–2121. 10.1021/acs.oprd.0c00172.
- Roch L. M.; Häse F.; Kreisbeck C.; Tamayo-Mendoza T.; Yunker L. P. E.; Hein J. E.; Aspuru-Guzik A. ChemOS: Orchestrating Autonomous Experimentation. Sci. Robot. 2018, 3, eaat5559. 10.1126/scirobotics.aat5559.
- Fitzpatrick D. E.; O’Brien M.; Ley S. V. A Tutored Discourse on Microcontrollers, Single Board Computers and Their Applications to Monitor and Control Chemical Reactions. React. Chem. Eng. 2020, 5, 201–220. 10.1039/C9RE00407F.
- Quigley M.; Gerkey B.; Conley K.; Faust J.; Foote T.; Leibs J.; Berger E.; Wheeler R.; Ng A. ROS: An Open-Source Robot Operating System. ICRA Workshop on Open Source Software. IEEE: 2009; http://robotics.stanford.edu/~ang/papers/icraoss09-ROS.pdf, Accessed 14 November 2021.
- Marquez-Gamez D.; Maffettone P. A ROS Based Architecture for an Autonomous Chemistry Laboratory. Presented at ROSCon Macau 2019. 2019.
- Fakhruldeen H.; Marquez-Gamez D.; Cooper A. I. Development of a ROS Driver and Support Stack for the KMR iiwa Mobile Manipulator. Annual Conference Towards Autonomous Robotic Systems; Springer: 2021; pp 304–314.
- Varghese J. J.; Cao L.; Robertson C.; Yang Y.; Gladden L. F.; Lapkin A. A.; Mushrif S. H. Synergistic Contribution of the Acidic Metal Oxide–Metal Couple and Solvent Environment in the Selective Hydrogenolysis of Glycerol: A Combined Experimental and Computational Study Using ReOX–Ir as the Catalyst. ACS Catal. 2019, 9, 485–503. 10.1021/acscatal.8b03079.
- Thakkar A.; Johansson S.; Jorner K.; Buttar D.; Reymond J.-L.; Engkvist O. Artificial Intelligence and Automation in Computer Aided Synthesis Planning. React. Chem. Eng. 2021, 6, 27–51. 10.1039/D0RE00340A.
- Tran K.; Palizhati A.; Back S.; Ulissi Z. W. Dynamic Workflows for Routine Materials Discovery in Surface Science. J. Chem. Inf. Model. 2018, 58, 2392–2400. 10.1021/acs.jcim.8b00386.
- Tran K.; Ulissi Z. W. Active Learning Across Intermetallics to Guide Discovery of Electrocatalysts for CO2 Reduction and H2 Evolution. Nat. Catal. 2018, 1, 696–703. 10.1038/s41929-018-0142-1.
- Reuther A.; Byun C.; Arcand W.; Bestor D.; Bergeron B.; Hubbell M.; Jones M.; Michaleas P.; Prout A.; Rosa A.; Kepner J. Scalable System Scheduling for HPC and Big Data. J. Parallel Distrib. Comput. 2018, 111, 76–92. 10.1016/j.jpdc.2017.06.009.
- Rosen A. S.; Notestein J. M.; Snurr R. Q. Identifying Promising Metal-Organic Frameworks for Heterogeneous Catalysis via High-Throughput Periodic Density Functional Theory. J. Comput. Chem. 2019, 40, 1305–1318. 10.1002/jcc.25787.
- Rosen A.; Iyer S.; Ray D.; Yao Z.; Aspuru-Guzik A.; Gagliardi L.; Notestein J.; Snurr R. Q. Machine Learning the Quantum-Chemical Properties of Metal-Organic Frameworks for Accelerated Materials Discovery. Matter 2021, 4, 1578–1597. 10.1016/j.matt.2021.02.015.
- Ong S. P.; Richards W. D.; Jain A.; Hautier G.; Kocher M.; Cholia S.; Gunter D.; Chevrier V. L.; Persson K. A.; Ceder G. Python Materials Genomics (pymatgen): A Robust, Open-Source Python Library for Materials Analysis. Comput. Mater. Sci. 2013, 68, 314–319. 10.1016/j.commatsci.2012.10.028.
- Jain A.; Ong S. P.; Chen W.; Medasani B.; Qu X.; Kocher M.; Brafman M.; Petretto G.; Rignanese G.-M.; Hautier G.; Gunter D.; Persson K. A. FireWorks: A Dynamic Workflow System Designed for High-Throughput Applications. Concurr. Comput. Pract. Exp 2015, 27, 5037–5059. 10.1002/cpe.3505.
- Mathew K.; et al. Atomate: A High-Level Interface to Generate, Execute, and Analyze Computational Materials Science Workflows. Comput. Mater. Sci. 2017, 139, 140–152. 10.1016/j.commatsci.2017.07.030.
- Hachmann J.; Afzal M. A. F.; Haghighatlari M.; Pal Y. Building and Deploying a Cyberinfrastructure for the Data-driven Design of Chemical Systems and the Exploration of Chemical Space. Mol. Simul. 2018, 44, 921–929. 10.1080/08927022.2018.1471692.
- Haghighatlari M.; Vishwakarma G.; Altarawy D.; Subramanian R.; Kota B. U.; Sonpal A.; Setlur S.; Hachmann J. ChemML: A Machine Learning and Informatics Program Package for the Analysis, Mining, and Modeling of Chemical and Materials Data. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2020, 10, e1458. 10.1002/wcms.1458.
- MolSSI QCArchive, The MolSSI Quantum Chemistry Archive. https://qcarchive.molssi.org/, Accessed 14 November2021.
- Breen C. P.; Nambiar A. M. K.; Jamison T. F.; Jensen K. F. Ready, Set, Flow! Automated Continuous Synthesis and Optimization. Trends Chem. 2021, 3, 373–386. 10.1016/j.trechm.2021.02.005. [DOI] [Google Scholar]
- Raccuglia P.; Elbert K. C.; Adler P. D. F.; Falk C.; Wenny M. B.; Mollo A.; Zeller M.; Friedler S. A.; Schrier J.; Norquist A. Machine-Learning-Assisted Materials Discovery Using Failed Experiments. Nature 2016, 533, 73–76. 10.1038/nature17439. [DOI] [PubMed] [Google Scholar]
- Skilton R. A.; et al. Remote-Controlled Experiments with Cloud Chemistry. Nat. Chem. 2015, 7, 1–5. 10.1038/nchem.2143. [DOI] [PubMed] [Google Scholar]
- Herres-Pawlis S.; Koepler O.; Steinbeck C. NFDI4Chem: Shaping a Digital and Cultural Change in Chemistry. Angew. Chem., Int. Ed. 2019, 58, 10766–10768. 10.1002/anie.201907260. [DOI] [PubMed] [Google Scholar]
- Zhou Q.; Tang P.; Liu S.; Pan J.; Yan Q.; Zhang S.-C. Learning Atoms for Materials Discovery. Proc. Natl. Acad. Sci. U. S. A. 2018, 115, E6411–E6417. 10.1073/pnas.1801181115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Weininger D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36. 10.1021/ci00057a005. [DOI] [Google Scholar]
- Heller S. R.; McNaught A.; Pletnev I.; Stein S.; Tchekhovskoi D. InChI, the IUPAC International Chemical Identifier. J. Cheminf. 2015, 7, 23. 10.1186/s13321-015-0068-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Daylight, SMARTS - A Language for Describing Molecular Patterns. 2014; https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html, Accessed 27 May 2021.
- Krenn M.; Häse F.; Nigam A.; Friederich P.; Aspuru-Guzik A. Self-Referencing Embedded Strings (SELFIES): A 100% Robust Molecular String Representation. Mach. Learn.: Sci. Technol. 2020, 1, 045024. 10.1088/2632-2153/aba947.
- Grethe G.; Blanke G.; Kraut H.; Goodman J. M. International Chemical Identifier for Reactions (RInChI). J. Cheminf. 2018, 10, 38. 10.1186/s13321-018-0277-8.
- Daylight, SMIRKS - A Reaction Transform Language. 2014; https://www.daylight.com/dayhtml/doc/theory/theory.smirks.html, Accessed 27 May 2021.
- O’Boyle N. M.; Banck M.; James C. A.; Morley C.; Vandermeersch T.; Hutchison G. R. Open Babel: An Open Chemical Toolbox. J. Cheminf. 2011, 3, 33. 10.1186/1758-2946-3-33.
- Landrum G.; et al. RDKit: Open-Source Cheminformatics Software. https://www.rdkit.org/, Accessed 27 May 2021.
- Kim S.; Chen J.; Cheng T.; Gindulyte A.; He J.; He S.; Li Q.; Shoemaker B. A.; Thiessen P. A.; Yu B.; Zaslavsky L.; Zhang J.; Bolton E. E. PubChem 2019 Update: Improved Access to Chemical Data. Nucleic Acids Res. 2019, 47, D1102–D1109. 10.1093/nar/gky1033.
- Nicklaus M. C. NIH Virtual Workshop on Reaction Informatics, May 18–20. 2021; https://cactus.nci.nih.gov/presentations/NIHReactInf_2021-05/NIHReactInf.html, Accessed 31 July 2021.
- Lowe D. Chemical Reactions from US Patents (1976-Sep 2016). 2017, 10.6084/m9.figshare.5104873.v1.
- NextMove Software, Pistachio. https://www.nextmovesoftware.com/pistachio.html, Accessed 15 July 2021.
- Schneider N.; Lowe D. M.; Sayle R. A.; Tarselli M. A.; Landrum G. A. Big Data from Pharmaceutical Patents: A Computational Analysis of Medicinal Chemists’ Bread and Butter. J. Med. Chem. 2016, 59, 4385–4402. 10.1021/acs.jmedchem.6b00153.
- Bradshaw J.; Kusner M. J.; Paige B.; Segler M. H. S.; Hernández-Lobato J. M. A Generative Model for Electron Paths. Proceedings of the 7th International Conference on Learning Representations (ICLR 2019). ICLR: 2019; pp 1–19.
- Jin W.; Coley C. W.; Barzilay R.; Jaakkola T. Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network. Proceedings of the 31st International Conference on Neural Information Processing Systems. Association for Computing Machinery: 2017; pp 2604–2613.
- Schwaller P.; Gaudin T.; Lanyi D.; Bekas C.; Laino T. “Found in Translation”: Predicting Outcomes of Complex Organic Chemistry Reactions Using Neural Sequence-to-Sequence Models. Chem. Sci. 2018, 9, 6091–6098. 10.1039/C8SC02339E.
- Open Reaction Database Project Authors, Welcome to the Open Reaction Database! 2021; https://docs.open-reaction-database.org/en/latest/, Accessed 27 May 2021.
- Kearnes S. M.; Maser M. R.; Wleklinski M.; Kast A.; Doyle A. G.; Dreher S. D.; Hawkins J. M.; Jensen K. F.; Coley C. W. The Open Reaction Database. J. Am. Chem. Soc. 2021, 143, 18820–18826. 10.1021/jacs.1c09820.
- Kearnes S. The Open Reaction Database. NIH Virtual Workshop on Reaction Informatics, May 18–20. 2021; https://cactus.nci.nih.gov/presentations/NIHReactInf_2021-05/Kearnes_Open_Reaction_Database-NIH_Reaction_Informatics.pptx, Accessed 13 November 2021.
- Pistoia Alliance, Unified Data Model. 2020; https://github.com/PistoiaAlliance/UDM, Accessed 27 May 2021.
- Goodman J. Computer Software Review: Reaxys. J. Chem. Inf. Model. 2009, 49, 2897–2898. 10.1021/ci900437n.
- EMBL-EBI, Molecular Process Ontology. 2014; https://www.ebi.ac.uk/ols/ontologies/mop, Accessed 14 June 2021.
- Millecam T.; Jarrett A. J.; Young N.; Vanderwall D. E.; Della Corte D. Coming of Age of Allotrope: Proceedings from the Fall 2020 Allotrope Connect. Drug Discovery Today 2021, 26, 1922–1928. 10.1016/j.drudis.2021.03.028.
- Roth D. L. SPRESIweb 2.1, a Selective Chemical Synthesis and Reaction Database. J. Chem. Inf. Model. 2005, 45, 1470–1473. 10.1021/ci050274b.
- Blanke G. The Unified Data Model (UDM). NIH Virtual Workshop on Reaction Informatics, May 18–20. 2021; https://cactus.nci.nih.gov/presentations/NIHReactInf_2021-05/UDM_at_NIH_Reaction_conference_May_2021_-_Gerd_Blanke.pdf, Accessed 13 November 2021.
- Tremouilhac P.; Lin C.-L.; Huang P.-C.; Huang Y.-C.; Nguyen A.; Jung N.; Bach F.; Ulrich R.; Neumair B.; Streit A.; Bräse S. The Repository Chemotion: Infrastructure for Sustainable Research in Chemistry. Angew. Chem., Int. Ed. 2020, 59, 22771–22778. 10.1002/anie.202007702.
- Lampen P.; Lambert J.; Lancashire R. J.; McDonald R. S.; McIntyre P. S.; Rutledge D. N.; Fröhlich T.; Davies A. N. An Extension to the JCAMP-DX Standard File Format, JCAMP-DX V. 5.01. Pure Appl. Chem. 1999, 71, 1549–1556. 10.1351/pac199971081549.
- EMBL-EBI, Chemical Methods Ontology. 2019; https://www.ebi.ac.uk/ols/ontologies/chmo, Accessed 14 June 2021.
- Jung N. Documentation and Publication of Reactions with Chemotion ELN and Repository. NIH Virtual Workshop on Reaction Informatics, May 18–20, 2021; https://cactus.nci.nih.gov/presentations/NIHReactInf_2021-05/Nicole_Jung_Chemotion_NIH_2021.pdf, Accessed 13 November 2021.
- Thermo Fisher Scientific (Informatics), An XML-Based File Format for Archival Storage of Analytical Instrument Data. 2001; http://www.gaml.org/Documentation/XML%20Analytical%20Archive%20Format.pdf, Accessed 31 July 2021.
- AnIML Working Group, AnIML: Overview. https://www.animl.org/overview, Accessed 31 July 2021.
- Rühl M. A.; Schäfer R.; Kramer G. W. SpectroML - A Markup Language for Molecular Spectrometry Data. JALA: J. Assoc. Lab. Autom. 2001, 6, 76–82. 10.1016/S1535-5535-04-00168-6.
- SiLA, SiLA Rapid Integration | Standardization in Lab Automation. 2021; https://sila-standard.com/, Accessed 27 May 2021.
- Schäfer B. Data Exchange in the Laboratory of the Future - A Glimpse at AnIML and SiLA. 2018; https://analyticalscience.wiley.com/do/10.1002/gitlab.17270/full/, Accessed 15 July 2021.
- Mehr S. H. M.; Craven M.; Leonov A. I.; Keenan G.; Cronin L. A Universal System for Digitization and Automatic Execution of the Chemical Synthesis Literature. Science 2020, 370, 101–108. 10.1126/science.abc2986.
- Noack M. M.; Sethian J. A. Autonomous Discovery in Science and Engineering. 2021; https://www.osti.gov/biblio/1818491, Accessed 14 November 2021.
- ESCALATE, Interacting with the ESCALATE REST API. https://github.com/darkreactions/ESCALATE/blob/master/demonstrations/REST_API_DEMO.ipynb, Accessed 15 November 2021.
- Hitzler P. A Review of The Semantic Web Field. Commun. ACM 2021, 64, 76–83. 10.1145/3397512.
- Gkoutos G. V.; Murray-Rust P.; Rzepa H. S.; Wright M. Chemical Markup, XML, and the World-Wide Web. 3. Toward a Signed Semantic Chemical Web of Trust. J. Chem. Inf. Comput. Sci. 2001, 41, 1124–1130. 10.1021/ci000406v.
- Murray-Rust P.; Rzepa H. S.; Tyrrell S. M.; Zhang Y. Representation and Use of Chemistry in the Global Electronic Age. Org. Biomol. Chem. 2004, 2, 3192–3203. 10.1039/b410732b.
- Coles S. J.; Day N. E.; Murray-Rust P.; Rzepa H. S.; Zhang Y. Enhancement of the Chemical Semantic Web through the Use of InChI Identifiers. Org. Biomol. Chem. 2005, 3, 1832–1834. 10.1039/b502828k.
- Murray-Rust P. Chemistry for Everyone. Nature 2008, 451, 648–651. 10.1038/451648a.
- Murray-Rust P. CML - Frequently Asked Questions. http://www.xml-cml.org/documentation/FAQ.html#chemistry, Accessed 31 July 2021.
- Batchelor C.; Corbett P. Semantic Enrichment of Journal Articles Using Chemical Named Entity Recognition. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions. Association for Computational Linguistics: 2007; pp 45–48.
- EMBL-EBI, Name Reaction Ontology. 2021; https://www.ebi.ac.uk/ols/ontologies/rxno, Accessed 14 June 2021.
- Hastings J.; Chepelev L.; Willighagen E.; Adams N.; Steinbeck C.; Dumontier M. The Chemical Information Ontology: Provenance and Disambiguation for Chemical Data on the Biological Semantic Web. PLoS One 2011, 6, e25513. 10.1371/journal.pone.0025513.
- Willighagen E. L.; Waagmeester A.; Spjuth O.; Ansell P.; Williams A. J.; Tkachenko V.; Hastings J.; Chen B.; Wild D. J. The ChEMBL Database as Linked Open Data. J. Cheminf. 2013, 5, 23. 10.1186/1758-2946-5-23.
- Fu G.; Batchelor C.; Dumontier M.; Hastings J.; Willighagen E.; Bolton E. PubChemRDF: Towards the Semantic Annotation of PubChem Compound and Substance Databases. J. Cheminf. 2015, 7, 34. 10.1186/s13321-015-0084-4.
- Galgonek J.; Vondrášek J. IDSM ChemWebRDF: SPARQLing Small-Molecule Datasets. J. Cheminf. 2021, 13, 38. 10.1186/s13321-021-00515-1.
- Roberts J. M.; Bean M. F.; Cole S. R.; Young W. K.; Weston H. E. Informatics in the Analytical Laboratory: Vision for a New Decade. Am. Pharm. Rev. 2010, 13, 60.
- Roberts J. M.; Bean M. F.; Cole S. R.; Young W. K.; Weston H. E. The Adaptable Laboratory: A Holistic Informatics Architecture. Am. Pharm. Rev. 2011, 14, 12.
- Bard J. B. L.; Rhee S. Y. Ontologies in Biology: Design, Applications and Future Challenges. Nat. Rev. Genet. 2004, 5, 213–222. 10.1038/nrg1295.
- Menon A.; Krdzavac N. B.; Kraft M. From Database to Knowledge Graph - Using Data in Chemistry. Curr. Opin. Chem. Eng. 2019, 26, 33–37. 10.1016/j.coche.2019.08.004.
- Godfrey A. G.; Michael S. G.; Sittampalam G. S.; Zahoránszky-Köhalmi G. A Perspective on Innovating the Chemistry Lab Bench. Front. Robot. AI 2020, 7, 24. 10.3389/frobt.2020.00024.
- Russell S.; Norvig P. Artificial Intelligence: A Modern Approach, 3rd ed.; Prentice Hall, 2010.
- Chard R.; Li Z.; Chard K.; Ward L.; Babuji Y.; Woodard A.; Tuecke S.; Blaiszik B.; Franklin M. J.; Foster I. DLHub: Model and Data Serving for Science. 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE: 2019; pp 283–292.
- The Foundation for Intelligent Physical Agents, Welcome to the Foundation for Intelligent Physical Agents. 2020; http://www.fipa.org/, Accessed 27 May 2021.
- JADE, Java Agent DEvelopment Framework: Jade Site. 2021; https://jade.tilab.com/, Accessed 27 May 2021.
- Segler M. H. S.; Waller M. P. Modelling Chemical Reasoning to Predict and Invent Reactions. Chem. Eur. J. 2017, 23, 6118–6128. 10.1002/chem.201604556.
- Eibeck A.; Lim M. Q.; Kraft M. J-Park Simulator: An Ontology-Based Platform for Cross-domain Scenarios in Process Industry. Comput. Chem. Eng. 2019, 131, 106586. 10.1016/j.compchemeng.2019.106586.
- Akroyd J.; Mosbach S.; Bhave A.; Kraft M. Universal Digital Twin - A Dynamic Knowledge Graph. Data-Centric Engineering 2021, 2, e14. 10.1017/dce.2021.10.
- Zhang C.; Romagnoli A.; Zhou L.; Kraft M. Knowledge Management of Eco-industrial Park for Efficient Energy Utilization Through Ontology-Based Approach. Appl. Energy 2017, 204, 1412–1421. 10.1016/j.apenergy.2017.03.130.
- Zhou L.; Pan M.; Sikorski J. J.; Garud S.; Aditya L. K.; Kleinelanghorst M. J.; Karimi I. A.; Kraft M. Towards an Ontological Infrastructure for Chemical Process Simulation and Optimization in the Context of Eco-industrial Parks. Appl. Energy 2017, 204, 1284–1298. 10.1016/j.apenergy.2017.05.002.
- Pan M.; Sikorski J.; Kastner C. A.; Akroyd J.; Mosbach S.; Lau R.; Kraft M. Applying Industry 4.0 to the Jurong Island Eco-industrial Park. Energy Procedia 2015, 75, 1536–1541. 10.1016/j.egypro.2015.07.313.
- Krdzavac N.; Mosbach S.; Nurkowski D.; Buerger P.; Akroyd J.; Martin J.; Menon A.; Kraft M. An Ontology and Semantic Web Service for Quantum Chemistry Calculations. J. Chem. Inf. Model. 2019, 59, 3154–3165. 10.1021/acs.jcim.9b00227.
- Farazi F.; Akroyd J.; Mosbach S.; Buerger P.; Nurkowski D.; Salamanca M.; Kraft M. OntoKin: An Ontology for Chemical Kinetic Reaction Mechanisms. J. Chem. Inf. Model. 2020, 60, 108–120. 10.1021/acs.jcim.9b00960.
- Farazi F.; Krdzavac N. B.; Akroyd J.; Mosbach S.; Menon A.; Nurkowski D.; Kraft M. Linking Reaction Mechanisms and Quantum Chemistry: An Ontological Approach. Comput. Chem. Eng. 2020, 137, 106813. 10.1016/j.compchemeng.2020.106813.
- Bai J.; Geeson R.; Farazi F.; Mosbach S.; Akroyd J.; Bringley E. J.; Kraft M. Automated Calibration of a Poly(oxymethylene) Dimethyl Ether Oxidation Mechanism Using the Knowledge Graph Technology. J. Chem. Inf. Model. 2021, 61, 1701–1717. 10.1021/acs.jcim.0c01322.
- Chemical Semantics, GNVC: Gainesville Core Ontology - Standard for Publishing Results of Computational Chemistry. 2015; http://ontologies.makolab.com/gc/gc07.owl, Accessed 21 September 2021.
- Zhou X.; Eibeck A.; Lim M. Q.; Krdzavac N. B.; Kraft M. An Agent Composition Framework for the J-Park Simulator - a Knowledge Graph for the Process Industry. Comput. Chem. Eng. 2019, 130, 106577. 10.1016/j.compchemeng.2019.106577.
- Mosbach S.; Menon A.; Farazi F.; Krdzavac N.; Zhou X.; Akroyd J.; Kraft M. Multiscale Cross-Domain Thermochemical Knowledge-Graph. J. Chem. Inf. Model. 2020, 60, 6155–6166. 10.1021/acs.jcim.0c01145.
- Zhou X.; Nurkowski D.; Mosbach S.; Akroyd J.; Kraft M. Question Answering System for Chemistry. J. Chem. Inf. Model. 2021, 61, 3868–3880. 10.1021/acs.jcim.1c00275.
- Wilkinson M. D.; et al. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Sci. Data 2016, 3, 160018. 10.1038/sdata.2016.18.
- von Rueden L.; Mayer S.; Beckh K.; Georgiev B.; Giesselbach S.; Heese R.; Kirsch B.; Walczak M.; Pfrommer J.; Pick A.; Ramamurthy R.; Garcke J.; Bauckhage C.; Schuecker J. Informed Machine Learning - A Taxonomy and Survey of Integrating Prior Knowledge into Learning Systems. IEEE Trans. Knowl. Data Eng. 2021, 10.1109/TKDE.2021.3079836.
- Curtarolo S.; Hart G. L. W.; Nardelli M. B.; Mingo N.; Sanvito S.; Levy O. The High-Throughput Highway to Computational Materials Design. Nat. Mater. 2013, 12, 191–201. 10.1038/nmat3568.
- Coley C. W. Defining and Exploring Chemical Spaces. Trends Chem. 2021, 3, 133–145. 10.1016/j.trechm.2020.11.004.
- Morbach J.; Yang A.; Marquardt W. OntoCAPE - a Large-scale Ontology for Chemical Process Engineering. Eng. Appl. Artif. Intell. 2007, 20, 147–161. 10.1016/j.engappai.2006.06.010.
- Noy N.; Gao Y.; Jain A.; Narayanan A.; Patterson A.; Taylor J. Industry-Scale Knowledge Graphs: Lessons and Challenges. Commun. ACM 2019, 62, 36–43. 10.1145/3331166.
- Brazil R. Automation in the Chemistry Lab. 2021; https://www.chemistryworld.com/careers/automation-in-the-chemistry-lab/4012832.article, Accessed 31 July 2021.
- Nicklaus M. C. NIH Virtual Workshop on Ultra-Large Chemistry Databases, Dec 1–3, 2020; https://cactus.nci.nih.gov/presentations/NIHBigDB_2020-12/NIHBigDB.html, Accessed 31 July 2021.
- Sabou M.; Biffl S.; Einfalt A.; Krammer L.; Kastner W.; Ekaputra F. J. Semantics for Cyber-Physical Systems: A Cross-Domain Perspective. Semantic Web 2020, 11, 115–124. 10.3233/SW-190381.
- Chadzynski A.; Krdzavac N.; Farazi F.; Lim M. Q.; Li S.; Grisiute A.; Herthogs P.; von Richthofen A.; Cairns S.; Kraft M. Semantic 3D City Database - An Enabler for a Dynamic Geospatial Knowledge Graph. Energy and AI 2021, 6, 100106. 10.1016/j.egyai.2021.100106.
- Gomes C.; et al. Computational Sustainability: Computing for a Better World and a Sustainable Future. Commun. ACM 2019, 62, 56–65. 10.1145/3339399.