A possible extension to the RInChI as a means of providing machine readable process data

Philipp-Maximilian Jacob; Tian Lan; Jonathan M Goodman; Alexei A Lapkin

doi:10.1186/s13321-017-0210-6

2017 Apr 11;9:23.


PMCID: PMC5388667 PMID: 29086180

Abstract

The algorithmic, large-scale use and analysis of reaction databases such as Reaxys is currently hindered by the absence of widely adopted standards for publishing reaction data in machine readable formats. Crucial data such as yields of all products or stoichiometry are frequently not explicitly stated in the published papers and, hence, not reported in the database entry for those reactions, limiting their usefulness for algorithmic analysis. This paper presents a possible extension to the IUPAC RInChI standard via an auxiliary layer, termed ProcAuxInfo, which is a standardised, extensible form in which to report certain key reaction parameters such as declaration of all products and reactants as well as auxiliaries known in the reaction, reaction stoichiometry, amounts of substances used, conversion, yield and operating conditions. The standard is demonstrated by creating the RInChI, including the ProcAuxInfo layer, for three published reactions, and accurate data recoverability is shown via reverse translation of the created strings. Implementation of this or another method of reporting process data by the publishing community would ensure that databases, such as Reaxys, would be able to abstract crucial data for big data analysis of their contents.

Background

In the current environment of ever-increasing amounts of available chemical data, both industrial and academic actors find themselves constantly having to review the changing state of the art in their fields. In 2005 it was estimated that 1.5 million new compounds alone were being discovered annually [1]. Though this figure is somewhat out of date, it gives an estimate of the growth rate observed and of the challenges this raises when trying to keep an overview of a field of research or of practice. This trend towards higher availability of data has also seen the advent of large-scale databases holding chemical reaction information, such as Reaxys (Elsevier), the CAS databases accessed through SciFinder (American Chemical Society) or ChemSpider (Royal Society of Chemistry). Data held in well-structured databases are amenable to algorithmic analyses. It was postulated in 1990 [2] and demonstrated in 2005 [3] that data held within Reaxys (or rather its predecessors) can be converted into a network, allowing the use of graph theoretical approaches. Having a network of reactions rather than a database greatly facilitates the identification of possible synthetic pathways by using network traversal algorithms [4]. Similarly, it has been shown that the network representation can be used for the optimisation of parallel syntheses [5], the identification of suspicious purchases of precursors to controlled substances [6], the estimation of functional group cross-influence on chemical reactivity [7], or the discovery of one-pot reactions [8]. These demonstrated uses rely on connectivity data across disjoint papers and some structural information on the molecules.

Particularly from a chemical engineering or process chemistry perspective, however, it is crucial to ensure that the connectivity exploited for synthesis route planning is not superficial but that the algorithms navigate the network in a meaningful way. The definition of “meaningfulness” must be adapted to the specific use case, but could encompass criteria such as economic factors, preservation of certain chemical structure elements across the route, minimisation of process condition changes between synthesis steps, or the consideration of different sustainability criteria. We have recently demonstrated the use of sustainability criteria in this context by linking a process synthesis on the basis of network traversal with exergy analysis, automated e-factor calculation and multi-criteria decision making [9]. However, such detailed analysis of reactions requires reaction data and information on the process conditions. When analysing a set of 33.5 million reactions downloaded from Reaxys [10], which amounts to 80% of the total number of reactions contained in the database [11], and removing all incomplete and multistep reactions, which leaves 15.4 million reactions or 37%, it becomes apparent that a significant number of data points are missing, making any further analysis impossible. We expect that any other large-scale database of chemical data would, at present, have similar data scarcity issues.

As Table 1 shows, in the analysed sample set 54% of reaction entries had no yield data attached, while 53.9% and 98.4% had no temperature and pressure entries, respectively. Furthermore, the database does not record stoichiometry. The absence of such crucial data makes any automated evaluation of a synthesis route candidate along mass- or energy/exergy-based criteria nearly impossible. Analysing the multi-year trend by investigating the information content of all reactions added to Reaxys in a given year for the set of reaction data types shown in Table 1, it becomes apparent that the overall picture is encouraging in many areas, see Fig. 1. The number of records added each year more than doubled between 2000 and 2015. During this time the information content of most entries seems to have risen for the properties analysed here. While in the year 2000 50% of records added were still without temperature data, to pick but one property, by 2015 this had dropped to roughly 20%. This trend is pointing in the right direction but 20% is still a large number and progress for many other properties, such as yield, which still hovers around 40%, has not been as good. Though awareness of and efforts to overcome the problem seem to have led to improvements, a systemic issue still seems to persist.

Table 1.

Analysis of reaction data content in Reaxys, based on a sample set of 15.4 million reactions

Property	Percentage of reactions with value for property
Yield	46.0
Temperature	46.1
Pressure	1.6
pH-value	1.0
Reaction time	48.4
Solvent ID	70.9
Reagent ID	67.8
Catalyst ID	4.3
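
The missing-data shares quoted in the text follow from Table 1 as 100 minus the tabulated coverage. A minimal Python sketch of that arithmetic is given here; the names `coverage` and `missing` are illustrative only and are not part of any Reaxys tooling.

```python
# Coverage values copied from Table 1
# (percentage of reactions that have a value for the property).
coverage = {
    "Yield": 46.0,
    "Temperature": 46.1,
    "Pressure": 1.6,
    "pH-value": 1.0,
    "Reaction time": 48.4,
    "Solvent ID": 70.9,
    "Reagent ID": 67.8,
    "Catalyst ID": 4.3,
}

# Share of reactions lacking each property, rounded to one decimal place.
missing = {name: round(100.0 - pct, 1) for name, pct in coverage.items()}

for name, pct in missing.items():
    print(f"{name}: {pct}% of entries have no value")
```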


Fig. 1 — Plot of the number of records added to Reaxys in a given year and their information content when analysing a fixed set of properties. The “# of Records” line is plotted against the right-hand side y-axis

The cause for this problem is two-fold. On the one hand crucial data, such as reaction stoichiometry, is too frequently absent from publications, while on the other hand existing data, such as temperature or pressure which will be reported in some form in almost all papers, is not published in a way that allows it to be excerpted correctly. Both causes can be remedied. For example, by agreeing on clear and enforced data reporting standards the life of authors would be made easier by clearly setting out what data, and in which format, are required to allow the publication to achieve its maximum impact. At the same time the task of the database provider would be simplified by ensuring that the agreed, and provided, data are available in a machine-readable format.

Structure and reaction data formats can be roughly split into two categories, both of which are based on connection tables: those that are XML-based, such as the Chemical Markup Language [12, 13] and Reaxys’s internal data storage format, and those that are line-based, such as SMILES (simplified molecular-input line-entry system) [14] and InChI [15].

Connection tables are widely used and form the basis of many other standards, but no formal standard exists for the tables [16]. Connection tables store information on the atoms, bonds and, optionally, the atoms’ coordinates for a given molecule, making them a graph representation of the molecule [17, 18]. Connection tables can be canonicalised to provide one unique table per molecule, for example, using the Morgan algorithm, first proposed in 1965 and still in use with some modifications [19, 20]. By applying graph theoretical algorithms it is then possible to carry out substructure matching across a database of connection tables [16]. One of the earliest mentions of connection tables was in 1957 [21]. Subsequently, the tables found wide adoption and are used by the CAS database as well as other data formats, such as the Chemical Markup Language (CML), and as a basis to generate InChIs [17, 22–24]. A consequence of the way bonds are represented in traditional connection tables is that they struggle to represent delocalised bonds, inorganics and reaction intermediates, a limitation that some attempts have been made to address [16, 25]. In the absence of a non-proprietary standard gaining traction over time, the CTfile [26] has become the de facto standard for connection tables and the exchange of structural data [16]. It was initially developed by MDL, which is now owned by Biovia, a subsidiary of the Dassault Group. This connection table forms the basis for many formats, such as the molfile, which describes a single molecule, the reaction file (rxnfile), which contains the structural information of the reactants and products, and the Reaction-data files (RDfiles), which can represent molecules and reactions as well as their associated data. The current version of the standards can be found on Biovia’s website.

XML-based data standards are useful when it comes to electronic database storage of data as they are highly extensible and flexible, and all data entries are labelled. Key examples are the Chemical Markup Language and Elsevier’s Unified Data Model. These properties are useful when it comes to exchanging data between different software suites [27–29]. A key downside is that, if the data are not already generated by a machine, generation of a valid XML document can be complicated and requires a certain degree of IT knowledge.

SMILES is one of the major formats seeking to condense the tabular connection-table format into a more compact and easier to use linear, alphanumeric string [18]. This greatly reduces the required storage space and is faster than handling a whole connection table [16]. Conversion to line notation from connection tables does, however, incur some information loss [16].

An issue that very quickly arose, however, was that SMILES strings in use were not canonical, which severely limited the applicability of SMILES in databases [23]. Canonical SMILES strings are available but are proprietary and the algorithm is not publicly available. Thus, various different versions are in circulation and implementation is seriously hampered [15, 16, 30]. These severe drawbacks were among the factors that led to the creation of the IUPAC International Chemical Identifier (InChI) in order to create a freely available, non-proprietary identifier to allow the easier linking of data compilations and the unambiguous identification of chemical substances [31].

The InChI is a representation that allows for the canonical encoding of structures, with both known and, as of yet, unknown [32], tautomers and isotopes. In addition, it is an open standard and can be easily incorporated into in-house software [1]. The InChI has turned into a widely adopted, worldwide standard as far as line notation is concerned [15, 22]. Additionally, it can be hashed to further reduce required storage space and to facilitate indexing and searching [15, 16, 18, 33]. Though collisions of keys are possible due to the hashing, so far only two cases have been reported since 2007 [22]. In theory the probability is finite, but extremely small [22]. The collision resistance was investigated experimentally, with the conclusion that “the current design and implementation seem to meet their goals” [34].

The InChI algorithm itself can, to date, process organometallic and coordination compounds as well as radicals, neutral and ionic organic molecules. Projects are being undertaken to extend the representation to reactions and polymers, which is facilitated by the fact that due to its hierarchical nature new layers can be added relatively easily [16, 35].

The InChI is composed of six hierarchical layers, where each successive layer is designed to provide further structural refinement [16, 32, 36]. All layers aside from the main one are optional, and will only appear if the corresponding information has been provided in the source file [16, 36]. If the same structure has been drawn at two different levels of detail, the InChI for the one with less detail forms a subset of the one with more [15]. For further technical information on InChIs the reader is referred to [37].

Amongst several extensions to the InChI agreed upon by the InChI Trust [38] is a reaction identifier termed RInChI. Largely developed by Jonathan Goodman, Chad Allen and Guenter Grethe this culminated in the publication of an interim report in 2013 [35]. The RInChI consists of a version field (V), three groups containing molecules (group1 and group2, each containing the molecules on one side of the arrow in the reaction equation and group3 containing the substances present above, below or on both sides of the arrow, such as solvents and catalysts) and an optional directionality layer showing whether group1 contains the reactants and group2 the products (denoted by “d+”), vice versa (“d−”), or if it is an equilibrium reaction (“d=”). The molecules within each group are represented by their InChIs, separated by a double forward slash “//” and are sorted; subsequently, the order of the groups containing the starting materials and products is determined using the Unix ‘sort’ command [35]. For the exact definition of version 0.02 the reader is referred to [35]. A new version (0.03) has recently been released, the definition of which can be found in [39]. A template is shown in Eq. (1):

RInChI=0.03.1S/group1<>group2<>group3/directionality   (1)

The “0.03” denotes the RInChI version and “1S” the InChI version used. The RInChI standard, under its current scope, does not define fields to store reaction conditions, scale, process type and kinetic data, all critical for any process calculations. The RInChI has the great advantage that it is an entirely open-source standard, building on the widely adopted InChI and supported by both IUPAC and several major publishing houses. This presents tangible advantages over proprietary data standards in its ease of adoption and incorporation into in-house software suites. It is understood that XML-based standards are able to capture a greater wealth of data and are better suited to use in databases. This, however, comes at a cost. Firstly, permitting a near-unlimited choice of data to include and an ability to specify units relatively freely results in a lesser engagement of the publishing author with his or her data during publication. Secondly, adoption of an XML-based format is more complicated and requires a greater degree of IT proficiency. The latter point weighs heavily as it has the potential to significantly hinder uptake of a proposed standard. Using the already in-built facility to extend the RInChI through auxiliary layers, we put forward a potential formal interface between authors, publishers and database providers, ultimately also contributing to the quality of data stored in XML-based datasets.
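
To illustrate how the template in Eq. (1) breaks down, the following Python sketch splits a string of that shape into its version fields, its three groups and the optional directionality flag. It is not the official RInChI software: the function name `split_rinchi` is hypothetical, the input is the placeholder template rather than a real reaction, and no attempt is made to split the individual InChIs within a group or to handle an appended ProcAuxInfo layer.

```python
def split_rinchi(rinchi: str) -> dict:
    """Split a string shaped like the template in Eq. (1) into its parts."""
    prefix = "RInChI="
    if not rinchi.startswith(prefix):
        raise ValueError("not an RInChI string")
    # The header ("0.03.1S") ends at the first slash; the rest holds the groups.
    header, _, body = rinchi[len(prefix):].partition("/")
    rinchi_version, _, inchi_version = header.rpartition(".")
    # The directionality layer is optional and, if present, comes last.
    direction = None
    head, sep, tail = body.rpartition("/")
    if sep and tail in ("d+", "d-", "d\u2212", "d="):
        body, direction = head, tail
    return {
        "rinchi_version": rinchi_version,
        "inchi_version": inchi_version,
        "groups": body.split("<>"),
        "direction": direction,
    }


# Placeholder template, not a real reaction identifier.
parts = split_rinchi("RInChI=0.03.1S/group1<>group2<>group3/d+")
print(parts)
```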

In this paper we show how an optional auxiliary field appended to the RInChI, termed ProcAuxInfo, could be used for this purpose and demonstrate data integrity upon reverse translation in three examples, before proceeding to show a plausible application of machine readable process data in automated reaction analysis by using the reverse translated data to determine a reaction mass efficiency. To our knowledge this is the first publication trying to provide this additional information in the RInChI standard, and is intended to contribute to the discussion of standards for publication of research data in machine readable formats.

ProcAuxInfo

Definition of a standard

So as to not affect the integrity of the RInChI standard it is proposed that the reaction information is appended to the existing RInChI string and that this field is optional as far as the standard is concerned. In order for the standard to be useful in addressing the challenges set out above, it requires widespread adoption, most easily achieved by demonstrating its use in extending the reach of a paper and by journals mandating submission of the data required to compile it during the editing processes.

The ProcAuxInfo string is to contain some of the reaction data deemed most essential to further analysis, though is open to further extension during subsequent iterations:

Version of ProcAuxInfo
Starting material
Stoichiometry
Reaction temperature
Reaction pressure
(Time: Conversion) pairs
Yield of product and byproducts
Molar amounts of reactants used
Amounts of group3 compounds used
Reactor volume

The ProcAuxInfo field begins with a double dollar sign (“$$”) to clearly demarcate it from the main RInChI, as neither Version 0.03 nor Version 0.02 of the RInChI contains any dollar signs in the standard, and from any additional ProcAuxInfo layers, as outlined further on. Each field is to be separated by a single vertical line (“|”), thus taking the following form, Eq. (2):

ProcAuxInfo = $$Version|StartingMaterial|Stoichiometry of group1|Stoichiometry of group2|Temperature|Pressure|Time:Conversion|Yield|Amount of group1 fed|Amount of group2 fed|Amount of group3 fed|Volume of reactor   (2)

If no data are available for a given field, or sub-field, a question mark (“?”) is to be used as a space-holder instead. If a given group is absent from the RInChI, for example if no auxiliaries are used and thus no group3 exists, then the fields in the ProcAuxInfo relating to the missing group are to contain a question mark too. The current version is 0.01. The version field is to have exactly one decimal point at all times and is to begin with “PAI” to clearly identify the following block.
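
The field layout of Eq. (2) and the placeholder rule can be sketched as a small Python parser. This is an illustration under stated assumptions, not a reference implementation: the field names, the function `split_procauxinfo` and the example string are hypothetical, and the version is assumed to be written as “PAI0.01”, which is one reading of the rule that the field begins with “PAI” and contains exactly one decimal point. Only a single ProcAuxInfo block is handled.

```python
# Field order taken from Eq. (2); the Python names are illustrative.
FIELD_NAMES = [
    "version", "starting_material",
    "stoichiometry_group1", "stoichiometry_group2",
    "temperature", "pressure", "time_conversion", "yield",
    "amount_group1", "amount_group2", "amount_group3",
    "reactor_volume",
]


def split_procauxinfo(text: str) -> dict:
    """Split one ProcAuxInfo block into named fields; '?' becomes None."""
    if not text.startswith("$$"):
        raise ValueError("ProcAuxInfo must start with '$$'")
    fields = text[2:].split("|")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(fields)}")
    version = fields[0]
    if not version.startswith("PAI") or version.count(".") != 1:
        raise ValueError("version must start with 'PAI' and hold one decimal point")
    return {name: (None if value == "?" else value)
            for name, value in zip(FIELD_NAMES, fields)}


# Hypothetical string written for this sketch, not taken from the paper.
example = "$$PAI0.01|1:1|1;1|1;1;2|393|6e6|?|0.90|?|?|?|1e-5"
parsed = split_procauxinfo(example)
print(parsed["temperature"], parsed["time_conversion"])
```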

The “starting material” is the species with respect to which all properties, such as conversion and yield, are specified. This may be the limiting reactant but does not have to be. It is to be specified by its group number followed by the index of its position in that group counting left to right, separated by a colon (“:”). It is realised that since different studies of the same reaction may define different substances as starting materials, the ProcAuxInfo layer will not be canonical. Since reaction searching is, however, carried out through the canonical InChI string and the ProcAuxInfo layer acts as data repository this is not considered to create any problems.

The stoichiometry fields are based on the stoichiometric coefficients of the products and reactants as found in the fully balanced stoichiometric equation, based on which the RInChI is compiled. These are to be integers and positive, as the directionality is already given in the main body of the RInChI. The coefficients are to be listed according to the order of the corresponding species in the respective group and separated from each other by use of a semicolon (“;”).
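
The starting-material and stoichiometry fields can be read with a few lines of Python. The sketch below assumes that the position index counts from one, which the text does not state explicitly; the function names are hypothetical.

```python
from typing import List, Tuple


def parse_starting_material(field: str) -> Tuple[int, int]:
    """Read 'group:index', e.g. '1:2' = second species of group1 (assumed 1-based)."""
    group, index = field.split(":")
    return int(group), int(index)


def parse_stoichiometry(field: str) -> List[int]:
    """Read semicolon-separated positive integer coefficients, e.g. '1;2'."""
    coefficients = [int(c) for c in field.split(";")]
    if any(c <= 0 for c in coefficients):
        raise ValueError("stoichiometric coefficients must be positive integers")
    return coefficients


print(parse_starting_material("1:2"))   # group number and position in that group
print(parse_stoichiometry("1;2;1"))     # one coefficient per species in the group
```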

Reaction temperature is to be given in kelvin and the reaction pressure in pascals. The reaction pressure is to be represented in scientific exponential notation in order to save space and to clearly indicate the number of significant digits.
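
Since papers often report temperatures in degrees Celsius and pressures in bar, a conversion step is usually needed before these fields can be filled. The following Python sketch shows one way to do this; the helper names and input values are hypothetical, and the exact exponent syntax (here Python's default, such as `6e+06`) is an assumption because the text only asks for scientific exponential notation.

```python
def celsius_to_kelvin(t_celsius: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return t_celsius + 273.15


def bar_to_pascal(p_bar: float) -> float:
    """Convert a pressure from bar to pascals (1 bar = 1e5 Pa)."""
    return p_bar * 1.0e5


def sci(value: float, digits: int = 1) -> str:
    """Format a number in exponential notation with `digits` significant digits."""
    return f"{value:.{digits - 1}e}"


# Hypothetical inputs chosen for illustration.
temperature_field = str(round(celsius_to_kelvin(120.0)))
pressure_field = sci(bar_to_pascal(60.0), digits=1)
print(temperature_field, pressure_field)
```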

Time is to be specified in seconds, again in scientific exponential notation. Reaction time is often reported as time taken to achieve maximum conversion, though different definitions are possible and the definition used in the particular case is thus not always apparent. For this reason, time is reported as a value pair along with conversion of the starting material. The two values are to be separated by a colon (“:”). To allow kinetic studies it is encouraged to publish multiple time:conversion pairs, each separated by a semicolon (“;”). In the case of a flow experiment residence time:conversion pairs are to be published instead. Both yield and conversion values are to be published in their decimal fractional value out of one rather than as percentage (for example, 0.01 instead of 1%). The yield is to be included for each species derived from the starting material. The yields are to be listed in the order in which the respective products are listed in group1 or group2 and separated by a semicolon (“;”). Where a substance is not derived from the starting material and a yield would thus be meaningless or where no yield data are available, the field for that substance is to contain a question mark (“?”) as a space holder instead. The yield is to be calculated using the following equation:

Y_{i} = \frac{n_{i,\mathrm{out}} - n_{i,\mathrm{in}}}{n_{\mathrm{SM},\mathrm{in}}}

where $Y_{i}$ is the yield of species $i$, $n_{i,\mathrm{out}}$ is the amount of $i$ at the end of the reaction and $n_{i,\mathrm{in}}$ is the amount fed (in the case of flow reactions these are the corresponding flow rates); $n_{\mathrm{SM},\mathrm{in}}$ is the amount of starting material fed. The conversion is defined as:

X = \frac{n_{\mathrm{SM},\mathrm{in}} - n_{\mathrm{SM},\mathrm{out}}}{n_{\mathrm{SM},\mathrm{in}}}
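
As a worked illustration of the two definitions, the Python sketch below evaluates yield and conversion for made-up amounts. Conversion is taken here as the fraction of starting material consumed, so that it runs from 0 to 1 as in the tabulated examples; the function names and the numbers are hypothetical.

```python
def conversion(n_sm_in: float, n_sm_out: float) -> float:
    """Fraction of the starting material consumed (0 = none, 1 = all)."""
    return (n_sm_in - n_sm_out) / n_sm_in


def product_yield(n_i_in: float, n_i_out: float, n_sm_in: float) -> float:
    """Net amount of species i formed per amount of starting material fed."""
    return (n_i_out - n_i_in) / n_sm_in


# Hypothetical amounts in mol: 1.0e-3 fed, 1.0e-4 left over,
# 8.9e-4 of product formed with none fed.
x = conversion(n_sm_in=1.0e-3, n_sm_out=1.0e-4)
y = product_yield(n_i_in=0.0, n_i_out=8.9e-4, n_sm_in=1.0e-3)
print(f"conversion = {x:.2f}, yield = {y:.2f}")
```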

Amounts of group1 and group2 compounds are to be specified in terms of moles of substance fed (or mol s⁻¹ fed in the case of flow reactions) and listed in the order that the compounds are given in the respective group in the main body of the RInChI. The different values are to be separated using a semicolon (“;”) and given in scientific exponential notation.

For catalysts it may not be meaningful to specify the amounts in moles as it is not always clear what constitutes a molecule of the catalyst. Thus, the catalyst is specified in grams as a base unit. In addition, in the case of flow chemistry or bulk continuous processes the catalyst might be immobilised and thus does not have an associated flowrate, for example in fluidised catalytic beds, coated wall reactors or packed beds. As such, each entry in group3 is to be followed by “:m” or “:g”, depending on whether it is expressed in moles or grams, and subsequently by “:f” or “:a”, depending on whether it is a flowrate or an absolute amount. Therefore, if three grams of catalyst were immobilised inside the reactor the entry would read “3:g:a”, while four moles per second of solvent being fed would read “4:m:f”. Should the expression of the amount of catalyst only be possible in moles, this format allows this to be easily accommodated by changing the flag to “:m” instead of “:g”. The amount of group3 substances fed is also to be specified in scientific exponential notation.
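
The group3 entry format lends itself to a small parser. The Python sketch below decodes the two examples given in the text; the class and function names are hypothetical and the error handling is only indicative.

```python
from dataclasses import dataclass


@dataclass
class Group3Amount:
    value: float
    unit: str       # "mol" or "g"
    is_flow: bool   # True: per-second flowrate, False: absolute amount


def parse_group3_entry(entry: str) -> Group3Amount:
    """Decode 'value:unit:mode', e.g. '3:g:a' or '4:m:f'."""
    value, unit_flag, mode_flag = (part.strip() for part in entry.split(":"))
    units = {"m": "mol", "g": "g"}
    modes = {"f": True, "a": False}
    if unit_flag not in units or mode_flag not in modes:
        raise ValueError(f"unknown flag in group3 entry: {entry!r}")
    return Group3Amount(float(value), units[unit_flag], modes[mode_flag])


print(parse_group3_entry("3:g:a"))  # three grams, immobilised (absolute amount)
print(parse_group3_entry("4:m:f"))  # four moles per second (flowrate)
```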

The current version of RInChI allows for a species to appear in two places, say as reactant and as auxiliary, if a reactant for example also acts as solvent. This could lead to double-counting of masses when compiling the group1, 2 or 3 amount fields. Therefore, if a species appears more than once all entries but the first one for that species in the group1, group2 or group3 amount fields need to be marked appropriately. To this end, they are to be marked with an “x” followed by a colon and the group number and another colon and the index within that group corresponding to the first appearance of the species in the RInChI. Thus, the position where the amount fed can be found is indexed and links back to the first entry without registering the amount twice.
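
A reader of the amount fields therefore has to distinguish a numerical amount from a back-reference of the form “x:group:index”. The Python sketch below shows one way to do so for plain group1 or group2 entries; the function name is hypothetical, and how the marker combines with the unit and mode flags of group3 entries is not specified in the text and is not handled here.

```python
from typing import Tuple, Union


def parse_amount_or_reference(entry: str) -> Union[float, Tuple[int, int]]:
    """Return the amount as a float, or (group, index) for an 'x:g:i' marker."""
    if entry.startswith("x:"):
        _, group, index = entry.split(":")
        return int(group), int(index)
    return float(entry)


print(parse_amount_or_reference("8.3e-7"))  # a real amount fed
print(parse_amount_or_reference("x:1:2"))   # amount is recorded at group1, position 2
```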

Furthermore, version 0.03 of the RInChI introduces empty fields instead of groups in the case of, for example, incomplete “half” reactions where no reactants or no products are listed. This can be observed in some cases in Reaxys. It is unclear if this is a faulty database entry or already the case in the paper. However, the standard provides for this to be generally applicable. If this is the case, the field containing the amount fed of the corresponding group is to contain only a question mark as a placeholder. Similarly, the number of amounts fed needs to match the number of species specified in the respective group of the RInChI.

The volume of the reactor is to be expressed in terms of metres cubed, m³. In the case of a batch reaction it is to contain the expression “batch” instead. If it was a batch reaction the amounts of group1 and group2 substances given previously are absolute amounts, else they are flowrates. At the same time this provides valuable information about the scale of the reaction (bench, pilot or industrial).

Should the reaction have been carried out at several different sets of conditions (such as different temperatures) a separate ProcAuxInfo is to be published for each set and appended to the previous string.

Should no value be available for a given property the field in question still needs to be included in the string using the requisite separators, but the field itself is to contain a question mark (“?”) as space holder instead of a value.
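
The rules on placeholders and on appending one block per set of conditions can be combined in a short builder. The Python sketch below is an illustration only: the field names, the functions and the input values are hypothetical, the version is assumed to be written as “PAI0.01”, and blocks are simply concatenated because each one starts with its own “$$”.

```python
# Field order taken from Eq. (2); the Python names are illustrative.
FIELD_ORDER = (
    "version", "starting_material",
    "stoichiometry_group1", "stoichiometry_group2",
    "temperature", "pressure", "time_conversion", "yield",
    "amount_group1", "amount_group2", "amount_group3",
    "reactor_volume",
)


def build_block(values: dict) -> str:
    """Build one ProcAuxInfo block, writing '?' for every missing field."""
    fields = []
    for name in FIELD_ORDER:
        value = values.get(name)
        fields.append("?" if value is None else str(value))
    return "$$" + "|".join(fields)


def build_procauxinfo(condition_sets: list) -> str:
    """Append one block per set of conditions."""
    return "".join(build_block(values) for values in condition_sets)


# Two hypothetical runs of the same reaction at different temperatures.
common = {"version": "PAI0.01", "starting_material": "1:1"}
runs = [
    {**common, "temperature": "300"},
    {**common, "temperature": "350", "yield": "0.5"},
]
result = build_procauxinfo(runs)
print(result)
```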

The use of the ProcAuxInfo is demonstrated below for three published reactions carrying out palladium-catalyzed aziridination of aliphatic amines [40], a ruthenium oxide catalyzed oxidation of benzyl alcohol [41] and a Suzuki coupling. The first two have been chosen from the groups’ publications and the third has been randomly chosen from the reactions classified by Reaxys as Suzuki reactions [42]. However, only the data available in the published article or its published supplementary information were used in all three cases.

Generation of ProcAuxInfo

Example 1

The reaction is carried out between 3,3,5,5-tetramethylmorpholin-2-one (starting material) and (diacetoxyiodo)benzene as reactants, forming 2,2,6-trimethyl-4-oxa-1-azabicyclo[4.1.0]heptan-5-one, iodobenzene and acetic acid. Toluene acts as solvent, palladium(II) acetate as catalyst, and acetic acid and acetic anhydride as auxiliary substances, as shown in Scheme 1.

Scheme 1 — A case study of C–H activation reaction

Using the RInChI generator the following RInChI is generated for this reaction: [RInChI string reproduced only as an image in the source: 13321_2017_210_Figa_HTML.jpg]

The reactor volume was 1 × 10⁻⁵ m³. All other required information on the reaction can be found in Tables 2, 3 and 4.

Table 2.

The amounts of substances fed in example 1

Compound	Amount fed (mol s⁻¹)
3,3,5,5-Tetramethylmorpholin-2-one	8.3 × 10⁻⁷
(Diacetoxyiodo)benzene	8.3 × 10⁻⁷
Acetic acid	8.3 × 10⁻⁶
Palladium(II) acetate	4.2 × 10⁻⁹
Acetic anhydride	1.7 × 10⁻⁶
Toluene	1.5 × 10⁻⁴


Table 3.

Conditions of reaction 1

Property	Value
Reaction temperature	393 K
Reaction pressure	6 × 10⁶ Pa
Yield	0.90


Table 4.

Residence time: conversion pairs for reaction 1

Residence time (s)	Conversion
60	0.06
120	0.14
180	0.20
240	0.32
300	0.40
360	0.52
420	0.70
480	0.90
540	1.00
600	1.00
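
The residence time: conversion pairs of Table 4 can be written into the corresponding ProcAuxInfo field and read back again, as sketched below in Python. This is not the string published by the authors, which is available only as a graphic; the function names and the number formatting (two significant digits for time, two decimals for conversion) are assumptions made for this illustration.

```python
# (residence time in s, conversion) pairs copied from Table 4.
pairs = [(60, 0.06), (120, 0.14), (180, 0.20), (240, 0.32), (300, 0.40),
         (360, 0.52), (420, 0.70), (480, 0.90), (540, 1.00), (600, 1.00)]


def encode_pairs(values) -> str:
    """Join 'time:conversion' pairs with ';', time in exponential notation."""
    return ";".join(f"{t:.1e}:{x:.2f}" for t, x in values)


def decode_pairs(field: str):
    """Reverse translation of the field into (time, conversion) tuples."""
    result = []
    for item in field.split(";"):
        time_text, conversion_text = item.split(":")
        result.append((float(time_text), float(conversion_text)))
    return result


encoded = encode_pairs(pairs)
decoded = decode_pairs(encoded)
print(encoded)
```

With this formatting the decoded values compare equal to the tabulated ones, which is the kind of data recoverability the reverse translation is meant to show.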


The resulting ProcAuxInfo is thus given by: [ProcAuxInfo string reproduced only as an image in the source: 13321_2017_210_Figb_HTML.jpg]

Example 2

This reaction oxidises benzyl alcohol into benzaldehyde and water with molecular oxygen as an oxidant, toluene as solvent and using ruthenium supported on aluminium oxide as a catalyst, as shown in Scheme 2.

Scheme 2 — A case study of benzyl alcohol oxidation

InChIs are not currently able to represent ruthenium supported on aluminium oxide and thus consider the two as separate species. This leads to the following RInChI: [RInChI string reproduced only as an image in the source: 13321_2017_210_Figc_HTML.jpg]

All required reaction data can be found in Tables 5, 6 and 7.

Table 5.

Amounts of substances fed into reaction 2

Compound	Amount fed
Benzyl alcohol	3.3 × 10⁻⁵ mol s⁻¹
Toluene	3.1 × 10⁻⁴ mol s⁻¹
Oxygen	4.9 × 10⁻⁶ mol s⁻¹
Ruthenium	9 × 10⁻³ g
Aluminium oxide	0.991 g


Table 6.

Conditions of reaction 2

Property	Value
Reaction temperature	388 K
Reaction pressure	8 × 10⁶ Pa
Yield	0.25
Reactor volume	9 × 10⁻⁴ m³


Table 7.

Residence time: conversion pairs for reaction 2

Residence time (s)	Conversion
9	0.25


Allowance had to be made for the fact that the InChI standard is not able to represent ruthenium supported on aluminium oxide and thus required reporting of the two substances individually. This is a limitation in the InChI standard, which filters down to the RInChI and thus also impacts the ProcAuxInfo layer. Seeing as this limitation originates in the InChI it was not attempted to “fix” this limitation in the ProcAuxInfo layer as this would most likely be the wrong place for such an attempt.

The resulting ProcAuxInfo is thus given by: [ProcAuxInfo string reproduced only as an image in the source: 13321_2017_210_Figd_HTML.jpg]

Example 3

For this example it was decided to encode a Suzuki–Miyaura reaction as this is a very common reaction in organic synthesis. A publication reporting the Suzuki–Miyaura reaction was chosen at random from Reaxys. The specific example [42] carries out a Suzuki–Miyaura reaction using phenylboronic acid and 4-bromotoluene as reagents to produce 4-phenyltoluene. It uses a phosphine ligand, N-methyl-2-pyrrolidinone as solvent and sodium carbonate as base as shown in Scheme 3.

Scheme 3 — An example of a Suzuki–Miyaura reaction

Observing the reported reaction equation, it is apparent that the equation is not balanced, since the byproduct species are missing. Another problematic factor is that the base is, at least partially, consumed during the reaction. Reporting it as an agent is, hence, not entirely accurate. Since this example translates the information provided in the paper, this assumption is not questioned, but the RInChI is generated taking account of the missing product species. Processing the information with the RInChI API yields the following RInChI: [RInChI string reproduced only as an image in the source: 13321_2017_210_Fige_HTML.jpg]

All required reaction data, as taken directly from the paper, can be found in Tables 8, 9 and 10.

Table 8.

Amounts of substances fed into reaction 3

Compound	Amount fed
Phenylboronic acid	1.1 × 10⁻³ mol
4-bromotoluene	1.0 × 10⁻³ mol
Phosphine ligand	2.2 × 10⁻⁵ mol
N-methyl-2-pyrrolidinone	3.1 × 10⁻² mol
Palladium(II) acetylacetonate	2.2 × 10⁻⁵ mol
Sodium carbonate	Not reported


Table 9.

Conditions of reaction 3

Property	Value
Reaction temperature	363 K
Reaction pressure	Not reported
Yield	0.89
Reactor volume	Not reported


Table 10.

Residence time: conversion pairs for reaction 3

Residence time (s)	Conversion
Not reported	Not reported


From the way the data is reported we could deduce the limiting reactant and then the corresponding amounts of agents, which were reported as per cent. The amount of base was not reported at all. This highlights why a precisely defined set of information and the associated units are required when transmitting data, which would force the authors to complete the necessary data.

One might reasonably assume that the reaction was conducted at atmospheric pressure under reflux conditions as no pressure is given in the paper. However, doing so might run the risk of potentially establishing an erroneous assumption as fact. Hence, this is not done here and the pressure field is left blank. Similarly, a mention of the reactor volume is absent from the paper. The paper does not specify with regards to which species the yield is defined, but we could reasonably assume that this would be the limiting reactant.

The paper does not specify any side-reactions or by-products being formed so one might assume that all reacted reactant is converted into product, thus making conversion equal to yield. Given the fact that at least one product species is missing accepting this assumption at face value could be highly misleading. No reaction time is given; thus it is impossible to reliably deduce a residence time:conversion pair in Tables 7 and 10.

Taking the information thus extracted it is possible to produce a ProcAuxInfo string: