An empirical analysis of enzyme function reporting for experimental reproducibility: missing/incomplete information in published papers

Peter Halling; Paul F Fitzpatrick; Frank M Raushel; Johann Rohwer; Santiago Schnell; Ulrike Wittig; Roland Wohlgemuth; Carsten Kettner

doi:10.1016/j.bpc.2018.08.004

Author manuscript; available in PMC: 2019 Nov 1.

Published in final edited form as: Biophys Chem. 2018 Aug 24;242:22–27. doi: 10.1016/j.bpc.2018.08.004


PMCID: PMC6258184 NIHMSID: NIHMS1505950 PMID: 30195215

Abstract

A key component of enzyme function experiments is reporting of considerable metadata, to allow other researchers to replicate, interpret properly or use fully the results. This paper evaluates the completeness of enzyme function data reporting for reproducibility. We present a detailed examination of 11 recent papers (and their supplementary material) from two leading journals. We found that in every paper we were not able to collect some critical information necessary to reproduce the enzyme function findings. Study of 100 papers used by the SABIO-RK database confirmed some of the most common omissions: concentration of enzyme or its substrates, identity of counter-ions in buffers. A computer system should be better at preventing such omissions, helping secure the scientific record. Many of the omissions found would be trapped by the currently available version of STRENDA DB.

Keywords: enzyme function, metadata, reproducibility, transparency, data quality, standardization

Introduction

Studies of enzyme function are important in many areas, including biochemistry, cell biology, chemical biology and applied biocatalysis. The comprehensive description of experiments on enzyme function is necessary but not trivial, as many factors can affect the results obtained. Unfortunately, full description cannot be guaranteed using a simple checklist, as the factors that need to be reported depend on the precise experiment carried out and the type of measurements performed. Most scientists working in the field have experience of finding papers with missing essential information about experimental details, but this often only comes to light when attempting to repeat the experiments described. Some previous studies have shown that key information is quite commonly missing (Wittig et al, 2014ab). Through extensive interactions with the biochemistry community, the Standards for Reporting Enzymology Data (STRENDA) Commission, supported by the Beilstein Institut, has established reporting guidelines that aim to help authors report kinetic and equilibrium data from their investigations of enzyme activities comprehensively (http://www.beilstein-strenda.org). The goal is to provide a seamless mechanism through which scientists and journals can jointly ensure reliable reporting of data (Apweiler et al., 2005; Apweiler et al., 2010; Gardossi et al., 2010; Tipton et al., 2014).

These guidelines are now recommended by more than 50 international biochemistry journals which publish work on enzyme function. In addition, the guidelines are registered with FAIRsharing.org (a resource on data and metadata standards, inter-related to databases and data policies) as part of the FAIRDOM Guidelines for Systems Biology. However, none of these journals enforce data reporting in compliance with the guidelines. Full reporting is clearly important if others wish to repeat the experiments. But it can be equally if not more important to those who simply wish to compare the reported results or reuse them for purposes like modelling metabolic pathways. The values of kinetic parameters can be very dependent on the precise conditions under which they are measured. Hence accurate and comprehensive reporting of conditions is essential to make results useful to users in fields such as systems biology, synthetic biology and applied biocatalysis.

In this work, we seek to test the efficacy of current reporting in the recent literature by systematically sampling papers investigating enzyme function in two high profile journals. Over the last few years, the research community has been shifting towards sharing with any reader all data necessary to understand, assess, and extend the conclusions of a manuscript. For example, Science suggests using established community repositories and databases to host data (American Association for the Advancement of Science. Science journals: Editorial Policies. Available at www.sciencemag.org/authors/science-journals-editorial-policies). We test the availability of critical enzyme assay data to allow replication of experiments, using the STRENDA Guidelines. In every paper sampled, at least one piece of critical data could not be found.

Approach adopted

Our empirical sample comprised 11 papers reporting studies on enzyme function in two selected leading journals, which we studied very carefully. Having arbitrarily chosen a recent date, we worked backwards to find all papers including significant enzyme function studies, getting 6 and 5 from the two journals. No further selection was made, so these papers should be a representative sample. These 11 papers and their Supplementary Materials were then studied in great detail, and assessed using the STRENDA Reporting Guidelines. All 11 papers were found to have some issues, of which further details are given below. Clearly some are more serious than others, but all should be avoided if the requirement for a comprehensive description of an enzymology experiment is to be met. The object of this study is not to show deficiencies in the papers of a particular journal or individual authors. Undoubtedly similar problems could be found by detailed study of papers from any journal and most authors. Hence references are not given below, and the quoted text has irrelevant sections removed to prevent easy identification by searches. Full references and quotes have been made available to editors and referees on a confidential basis. We do also give full details of some previous publications by co-authors of the present paper, who have identified similar issues with their own work! In addition, to get more meaningful statistics, we have looked for selected omissions in papers used as sources for the SABIO-RK database on enzyme function (http://sabiork.h-its.org/) (Wittig et al, 2018), extending the previous study (Wittig et al, 2014ab). For this purpose we have analysed the 100 most recent papers taken from the 10 journals that are the most common sources for this database.

The objective of our approach is to show how easy it is for authors, referees, editors and others to fail to spot missing information. Avoiding omissions in reporting data is a job that electronic systems do much better than the human brain. It is for this reason that the Beilstein Institut, with guidance from the STRENDA Commission, has created a web-based system (STRENDA DB) that incorporates the STRENDA guidelines. The STRENDA DB data submission form automatically checks the manuscript data for compliance with the guidelines prior to or during the publication process. The data entered normally become publicly available in a database only after the corresponding article has been peer-reviewed and published in a journal. The current released version is freely available for use at https://www.beilstein-strenda-db.org/strenda/, and has been described in more detail (Swainston et al, 2018). As this tool develops it should gain ever more sophisticated algorithms to handle the admittedly quite complicated rules for what details are needed for each type of experiment. But many of the issues noted below would already be trapped by the current version of STRENDA DB, as indicated in the table showing details. This is done to illustrate better the omissions in the papers analysed, rather than to show comprehensively what the current version of STRENDA DB can and cannot do.
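To make the idea of automated checking concrete, the following is a minimal sketch of a completeness check on an assay description. The record layout, field names and the `missing_fields` helper are hypothetical and chosen only for illustration; they are not the STRENDA DB schema or its actual validation logic. The example values are loosely modelled on the kind of HEPES-buffered assay discussed in this paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class AssayRecord:
    """Hypothetical assay description (NOT the STRENDA DB data model)."""
    ph: Optional[float] = None
    temperature_c: Optional[float] = None
    buffer_compound: Optional[str] = None      # e.g. "HEPES"
    buffer_counter_ion: Optional[str] = None   # e.g. "Na+" or "K+"
    buffer_conc_mm: Optional[float] = None
    enzyme_conc_um: Optional[float] = None
    # substrate name -> (lowest, highest) concentration studied, in mM
    substrate_conc_mm: Dict[str, Tuple[float, float]] = field(default_factory=dict)


REQUIRED_SCALARS = (
    "ph",
    "temperature_c",
    "buffer_compound",
    "buffer_counter_ion",
    "buffer_conc_mm",
    "enzyme_conc_um",
)


def missing_fields(record: AssayRecord) -> List[str]:
    """Return the names of required items that have not been supplied."""
    missing = [name for name in REQUIRED_SCALARS if getattr(record, name) is None]
    if not record.substrate_conc_mm:
        missing.append("substrate_conc_mm")
    return missing


# Illustrative record: everything is stated except the buffer counter-ion.
example = AssayRecord(
    ph=7.5,
    temperature_c=37.0,
    buffer_compound="HEPES",
    buffer_conc_mm=25.0,
    enzyme_conc_um=2.0,
    substrate_conc_mm={"substrate": (5e-6, 5e-6)},  # fixed at 5 nM
)
print(missing_fields(example))
```

A check of this kind never tires or skims, which is the sense in which an electronic system is better suited to the task than a human reader. The rules that depend on the type of experiment are of course much harder to encode than this flat list.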

Details of issues found in recent published papers

We evaluated by careful inspection whether each paper provided the information needed to replicate the work, and judged that all papers missed some details critical to making the results potentially reproducible. The missing data issues are listed in Table 1. In every case we looked carefully to see if apparently missing information could be found elsewhere in the paper or the Supplementary Materials, but did not find it. Extracts shown may be from the paper itself or its Supplementary Materials. In each case we have removed some unnecessary detail from the quotations to make the sources hard to identify. Deletions are indicated by ….., while substitutions have been made when ..[enzyme].. or ..[substrate].. replaces the full name.

Table 1. Missing information found in 11 sample papers.

The table also illustrates where the use of STRENDA DB would ensure proper reporting.

Issues found in papers	Where could STRENDA DB help?
Paper 1 In the description of a single turnover experiment, we find “After the pre-incubation periods, the samples were heated to 37°C and reaction was initiated by mixing 1:1 with 25 mM HEPES, pH 7.5, 50 mM KCl, 16 mM MgCl2, 1 mM DTT, 0.1 mg/mL BSA and 5% glycerol, to a final concentration of 2 μM enzyme and 5 nM substrate.” HEPES buffers require a positive counter-ion, usually Na+ or K+, but the choice can affect enzyme function. The inclusion of KCl in the buffer here suggests there may be monovalent cation effects on this enzyme. A reviewer has pointed out that the high temperature coefficient of HEPES buffers will also lead to a significant pH change in experiments with temperature varied between 20 °C and 37 °C. (See further details for paper 5).	STRENDA DB requires naming of a specified compound for all buffer components. If the user follows the instructions given when the compound is assigned as a buffer, full counter-ion details will be given.

Paper 2 An assay is described “to determine the initial rate of O2 consumption in solutions employing 0.2 μM ..[enzyme].., dissolved atmospheric oxygen (i.e., 0.250 mM), and sodium formate (100 mM) in 50 mM acetate (pH 3.6).” The wording suggests that sodium formate might have been added to an acetate buffer previously adjusted to pH 3.6. Acetate at pH 3.6 has rather weak buffering capacity, whereas formate is close to its pKa and so will dominate buffering – hence the final pH would end up quite far above 3.6. Of course the formate might have been adjusted to pH 3.6, but this is not clear. Furthermore, the counter-ion to the acetate is not specified – probably Na+, but we cannot be sure.	STRENDA DB requires the final assay pH, which should be the value after all components are added. STRENDA DB requires entry of buffer counter-ions if help recommendations are followed.
Results from assays are reported as apparent k_cat and k_cat/K_M values. But there is no information on the range of substrate concentrations studied to estimate these values. From the values presented we can see that K_M must be much lower than the fixed value of substrate concentration quoted elsewhere.	STRENDA DB requires entry of the range of concentrations studied for any substrate that is varied.

Paper 3 A part of the study is reported as “Test reactions using ..[enzyme].. (1 mg/mL), 2 mM ..[substrate], 50 mM HEPPS (pH 8.0), and 1 mM of various nucleotide-linked sugars were allowed to incubate for 12 h at 37 °C.” Again the counter-ion in the HEPPS buffer is not stated.	STRENDA DB requires entry of buffer counter-ions if help recommendations are followed.
For determination of kinetic parameters we have “The 1.5 mL reaction mixtures contained 5 mM ..[substrate].., 50 mM HEPPS (pH 8.0), 0.28 μM ..[enzyme].., and ..[substrate 2].. concentrations ranging from 0.01−2.0 mM.” Again the buffer counter-ion is not stated. (This is a single time point assay, and the reaction time is not stated here, but a comment in the Results section confirms that this was within the linear range.)	STRENDA DB requires entry of buffer counter-ions if help recommendations are followed. STRENDA DB requires entry of the range of concentrations studied for any substrate that is varied.

Paper 4 A plot of the Michaelis-Menten equation for the effect of substrate concentration has its y axis labelled just “[hour−1]”. This may be an initial linear reaction rate presented as mol product (mol enzyme)−1 hour−1 but this is not made clear.	STRENDA DB does not currently allow entry of initial rate values, although this is a planned enhancement. Units for all results entered must be selected from a pre-set list which should have unambiguous meanings.

Paper 5 The legend for a figure showing effects on k_cat reads “The concentration of NADH was 80 μM and concentration of ..[substrate].. was increased as needed with increasing temperature to maintain substrate saturation. The data was normalized relatively to that in the buffer without TMAO. All measurements were performed in triplicate with average values reported. Buffer: 100mM Na-phosphate, pH 7.2 with or without TMAO.” No information is given about the actual substrate concentrations studied, and the evidence that enzyme saturation was ensured so that k_cat values were obtained. Furthermore, the value of k_cat used for normalisation is not stated, so no absolute values can be deduced.	STRENDA DB requires entry of the range of concentrations studied for any substrate that is varied. To make an independent confirmation of saturation would require initial rate data for various concentrations of both substrates, which cannot currently be entered in STRENDA DB. Another planned enhancement is specification of the equation which has been fitted to the data, which would clarify the meaning of k_cat values. STRENDA DB requires that k_cat values are reported in absolute units, selected from a preset list.
Studies of the effect of temperature over a significant range are reported, but without necessary detail on the treatment of pH. Was buffer pH set at some reference temperature, then heated or cooled as required? This can cause significant changes in actual pH. Or were pH values actually set/measured at each assay temperature?	Fuller details of the treatment of pH are a planned enhancement.

Paper 6 Assays use a “standard buffer, consisting of 50 mM HEPES (pH 7.5), 150 mM KCl, 10 mM MgCl2, and 5 mM β- mercaptoethanol (β-ME).” The description misses the counter-ion in the HEPES buffer, and the presence of KCl again makes one suspect monovalent cation effects.	STRENDA DB requires entry of buffer counter-ions if help recommendations are followed.
For a pre-steady state study “In a typical experiment, one syringe contained ….. enzyme at a final concentration of 5 μM … monomer, 8 units/mL PPiase, a saturating concentration of ..[substrate], and 25 μM (final) ..[substrate 2] (if present) in reaction buffer, while the second syringe contained 100 μM [α−32P]ATP in reaction buffer.” Unlike “standard buffer”, “reaction buffer” is not defined elsewhere in the paper or Supplementary Information.	STRENDA DB requires entry of the buffer type, concentration and pH for every assay. If these are the same as the previous assay, default values can be left in place – but these can be edited where necessary.
The last extract, and also a figure legend, refer to “a saturating concentration of ..[substrate]..”, but no information is given as to what this was, and how the authors know it to be saturating.	STRENDA DB requires entry of substrate concentrations, either a fixed value or a range. To make an independent confirmation of saturation would require initial rate data for various substrate concentrations, which cannot currently be entered in STRENDA DB.
A table reports kinetic parameters for both substrates, without indicating whether these are true or apparent values. For a two substrate enzyme there will be one true k_cat value and a true K_M value for each substrate. It is possible to obtain apparent K_M and k_cat values when one substrate concentration is held constant and the other is varied. However, these values depend on the arbitrary choice of fixed concentration for the other substrate, and so are only meaningful if this is stated. Because two k_cat values are stated, it is likely that the values are apparent ones, but the paper gives no information on the chosen fixed substrate concentrations.	A planned enhancement to STRENDA DB is the entry of the equation fitted to data in order to estimate kinetic parameters. STRENDA DB currently requires entry of substrate concentrations used, either as a fixed value or a range.
In a table and the text there are reported rates of a presteady state process stated to show exponential and linear phases. Rates of both these phases are given in units of s-1. For an exponential process the meaning is clear as an apparent first order rate constant. But for the linear phase the meaning needs to be clarified – probably it means mol reaction per mol enzyme per s.	STRENDA DB does not currently allow entry of pre-steady state kinetic parameters, but this is a possible extension.

Paper 7. Describing kinetic experiments on the enzyme, the authors state “After suitable period of time (2–80 min, depending on the activity …….), the reactions were simultaneously quenched”. This implies that time courses were studied, which would allow estimation of initial rates and from them, kinetic parameters. But inspection of the results indicates that most k_cat and K_M values quoted are actually based on a single time point. However, it seems that progress was not actually fully linear over the time range used, as evidenced by a plot showing a rather poor fit to the Michaelis-Menten equation when rates assumed constant over different time periods are combined. A relatively subtle problem, but we really need raw time and conversion data to allow better analysis of kinetics.	The current STRENDA DB version would not check this, indeed it does not currently allow entry of single time point data. Future enhancements do include more details of data from which k_cat and K_M values are estimated. (Relevant details might be captured by the existing free text “Methodology” field.)
A figure gives “rate” in units of min-1. This may be a turnover frequency (mol product formed per mol catalyst per min), but this is not clear. Rates are normally reported as concentration or molar amount per unit time.	STRENDA DB does not currently accept rate data, but this is a planned enhancement. Clear units are enforced for all data entered.

Paper 8 In describing an assay, the authors write “30 pmol of 5’ fluorescein labeled primer (…..) were annealed with 30 pmol of template (……) and 0.4 μg of polymerase by heat denaturation at 90◦C for 1 minute and allowing to cool to room temperature. Reactions were initiated by the addition of “start” mix which contained 50mM Tris-HCl (pH8.4), 10 mM (NH4)2SO4, 10 mM KCl, 2 mM MgSO4 and 200 μM dNTPs.” Here amounts of nucleic acid and template are presented, but with no indication of assay volume. Hence the reaction cannot be repeated with the correct concentrations. The same problem is found in two further assay descriptions in this paper.	STRENDA DB requires concentrations for all assay components as well as a defined assay temperature
In another assay there is no mention of the amount or concentration of enzyme used, but the results are given in units of nmol/s - this is meaningless without knowledge of the amount of enzyme present.	STRENDA DB requires enzyme concentration (and also a statement on how this was measured).
In a radiochemical assay the authors report “Reactions (100 μL) consisting of assay buffer, 1 mM MgSO4, and 500 nM duplex were initiated by variable amounts of α-P32-dCTP (0.003–400μM), which was diluted 1:400 in unlabeled dCTP.” This is not completely clear as to the final dCTP concentration. Is this 0.003–400 μM, or is it actually 400 times higher? (The latter is unlikely but possible).	STRENDA DB requires concentrations of all substrates.
Two figures show values of k_cat and K_M without units.	STRENDA DB requires units for all kinetic parameters.
No error estimates are given for reported k_cat and K_M values, and the number of replicates is not stated.	STRENDA DB requires error estimates. The possibility to report more detail on the basis for these error estimates is a possible enhancement.

Paper 9 An assay buffer is described as “50 mM GPT buffer (50 mM glycine, 50 mM phosphate, and 50 mM Tris), pH 7.6”. This is not clear. It could be a mixture of glycine and Tris base adjusted to pH with phosphoric acid, but the total phosphate concentration in this case would be less than 50 mM. Hence there are probably also other counter-ions present, such as Na⁺ and/or Cl⁻.	STRENDA DB requires entry of buffer counter-ions if help recommendations are followed.
In a later assay description, no buffer is mentioned. Since at least two different buffers are reported for other enzyme assays, we cannot guess which is used (or whether it’s actually a third).	STRENDA DB requires entry of the buffer type, concentration and pH for every assay. If these are the same as the previous assay, default values can be left in place – but these can be edited where necessary.
Studies of the effect of temperature over a significant range are reported, but without necessary detail on the treatment of pH (see paper 5).	Details of the treatment of pH are planned as enhancements for future releases of STRENDA DB.

Paper 10 In an assay description we find “The mixtures were diluted into 50 mM Tris-HCl, 5 mM NaCl, pH 7 and mM ascorbic acid ……..” A value is missing before “mM ascorbic”.	STRENDA DB requires concentrations for all assay components.
Later, the authors report “The mixtures were diluted into 50 mM Tris-HCl, 5 mM NaCl, pH 7 and …. [the compound] ….. under investigation (HEPES, MES and MOPS @ 500 mM, ascorbic acid @ 100 mM, or Tris alone @ 50 mM) to a final concentration of 16.7 nM …. protein and a final volume of 300 μL.” The wording suggests that test compounds were added to a buffer previously adjusted to pH 7. Adding these as a single compound could substantially alter pH. But perhaps the additives were pH adjusted, in which case details of the counter-ions present are missing.	STRENDA DB requires entry of full details of buffer species and counter-ions, provided help recommendations are followed. If these species were assigned as “Other components”, user would still have to select a defined chemical compound to enter. STRENDA DB requires entry of assay pH, which should be the final value in the assay.
In describing an assay there is “The mixtures were diluted into 500 mM HEPES, pH 7”, but no mention of the cation in the HEPES buffer.	STRENDA DB requires entry of buffer counter-ions if help recommendations are followed.
Figures in the paper and in Supplementary Materials show that the progress of the reaction studied is clearly not linear. However, data is presented as turnover frequency (min−1) averaged over the full experimental period. Really what should be reported is observed number of turnovers after a given reaction time.	STRENDA DB does not currently accept input of single time point data, but this is a planned enhancement.

Paper 11 An assay mixture is described as comprising “20 mM MOPS-KOH, pH 7, 8 mM MgSO₄, 3 mM ATP”. In this case where the ATP concentration is within an order of magnitude of the buffer concentration, the salt form or counter-ions of the added ATP should be given. The counter-ions may have a significant effect, and a significant pH effect is also possible.	STRENDA DB requires entry of a defined chemical compound for all assay components. Hence the appropriate salt of ATP would have been selected. The final assay pH is also required, which should be the value after all components are added.


Footnote to table: A reviewer has pointed out that the IUBMB Recommendations for ‘Symbolism and Terminology in Enzyme Kinetics (Recommendation 1981)’ http://www.sbcs.qmul.ac.uk/iubmb/kinetics/index.html imply that k_cat should only be used where it can be calculated using the measured active site concentration in the enzyme preparation used. This would imply that the presentation of results as k_cat in papers 2, 3, 4 and 8 is wrong. Since papers 2–4 do make clear the way protein concentration has been measured, their experiments are reproducible in this respect. However, their reported k_cat values may only set a minimum for the true microscopic rate constant.
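The pH concern raised for Paper 2 in Table 1 can be illustrated with a rough charge-balance estimate. The sketch below assumes one of the possible readings of that assay description: a 50 mM acetate buffer made up with a monovalent cation (such as Na+) to pH 3.6, to which 100 mM sodium formate is then added as the salt without further pH adjustment. The pKa values are approximate textbook figures for 25 °C, activity coefficients and OH− are neglected, and the function names are our own. It is an order-of-magnitude illustration, not a reconstruction of what the authors of Paper 2 actually did.

```python
PKA_ACETIC = 4.76  # approximate textbook value, 25 °C
PKA_FORMIC = 3.75  # approximate textbook value, 25 °C


def anion_fraction(ph: float, pka: float) -> float:
    """Fraction of a monoprotic acid present as its anion (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def h_conc_mm(ph: float) -> float:
    """Hydrogen ion concentration in mM, treating activity as concentration."""
    return 10.0 ** (-ph) * 1000.0


def final_ph(acetate_mm: float, formate_salt_mm: float, start_ph: float) -> float:
    """Estimate pH after adding a formate salt to an acetate buffer set to start_ph."""
    # Cation needed to hold the acetate buffer at start_ph (charge balance).
    cation_mm = acetate_mm * anion_fraction(start_ph, PKA_ACETIC) - h_conc_mm(start_ph)
    # The formate salt brings one monovalent cation per formate.
    cation_mm += formate_salt_mm

    def charge_excess(ph: float) -> float:
        anions = (acetate_mm * anion_fraction(ph, PKA_ACETIC)
                  + formate_salt_mm * anion_fraction(ph, PKA_FORMIC))
        return cation_mm + h_conc_mm(ph) - anions

    # charge_excess falls monotonically with pH, so bisection finds the root.
    low, high = 0.0, 14.0
    for _ in range(60):
        mid = 0.5 * (low + high)
        if charge_excess(mid) > 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


estimated_ph = final_ph(acetate_mm=50.0, formate_salt_mm=100.0, start_ph=3.6)
print(f"Estimated final pH: {estimated_ph:.2f}")
```

Under these assumptions the estimate comes out at roughly pH 4.5, almost a full unit above the quoted 3.6. This is why it matters whether a quoted pH is that of the starting buffer or of the final assay mixture.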

As referees of manuscripts, we will quite often spot issues of this type and require their correction. But as a referee, the focus is naturally on the scientific interest and quality of the paper, and the extent to which conclusions are supported by the results presented. In the present cases, where we did not need to be concerned with such topics, we could focus on these issues of completeness in the description of the experiments. As noted, this is ultimately a job better suited to a computational system, leaving referees to deal just with the scientific quality, for which human judgment is essential. It is of course important to assess whether the experimental design was sensible and the analysis of the results appropriate. But this is beyond the scope of our current paper. We simply focus on a more basic issue: does the paper fully report what was done? If essential details are omitted, then complete further expert evaluation is impossible.

Data omissions in published papers by authors of this article

The papers analysed in Table 1 are presented anonymously, because we do not suggest they are in any way unusual in having these deficiencies. Our next step is to show similar omissions in some published papers by co-authors of this article, to illustrate the complexity of reporting enzyme function data (Table 2).

Table 2. Omissions in published papers by authors of this article.

We also show where STRENDA DB would have avoided those omissions.

Issues found in papers	Where could STRENDA DB help?
Michele Cianci, Bartlomiej Tomaszewski, John R. Helliwell, and Peter J. Halling (2010) “Crystallographic Analysis of Counterion Effects on Subtilisin Enzymatic Action in Acetonitrile” J. Am. Chem. Soc. 132, 2293–2300. In Materials & Methods, under “Cs Soaking, Cross-Linking, and Acetonitrile Wash” there is “So the soak solution was prepared freshly by mixing 2 volumes of 25% aqueous glutaraldehyde with 3 volumes of 250 mM Na-cacodylate buffer pH 6.5 containing 25% v/v glycerol and, where required, 1.5 M salt. Crystals were soaked in this for 15 min, with vortexing in the case of suspensions of multiple crystals for activity assay. Washing in acetonitrile followed immediately.” The presentation of the buffer and pH does not make it clear whether:	STRENDA DB requires entry of the pH, which should be the final value in the assay mixture. Further details of the treatment of pH are a planned enhancement.
a) 250 mM Na-cacodylate pH 6.5 was first prepared, then 25% glycerol was added, or
b) pH 6.5 is an apparent pH reading in the aqueous-glycerol mixture from an electrode calibrated in dilute aqueous buffer, or
c) pH 6.5 refers to an absolute pH scale established for 25% aqueous glycerol, with a protocol using appropriate standards in this mixture.

Richard S. Hall, Rakhi Agarwal, Daniel Hitchcock, J. Michael Sauder, Stephen K. Burley, Subramanyam Swaminathan and Frank M. Raushel (2010) “Discovery and Structure Determination of the Orphan Enzyme Isoxanthopterin Deaminase” Biochemistry 49, 4374–4382. Under “Measurement of Enzymatic Activity”, there is “The coupling system contained 7.4mM alpha-ketoglutarate, 0.4mM NADH, 6 units of glutamate dehydrogenase, and 100 mM HEPES (pH8.5) in a final volume of 0.25mL.” The cation present in the HEPES buffer is not stated. Also, the range of substrate concentrations used to measure the kinetic constants was not clearly reported.	STRENDA DB requires entry of buffer counter-ions if help recommendations are followed. In addition, STRENDA DB requires entry of substrate concentrations, either a fixed value or a range.

Wolfgang E. Schäfer, Johann M. Rohwer and Frederik C. Botha (2004) “A kinetic study of sugarcane sucrose synthase”. Eur. J. Biochem. 271, 3971–3977.
In the description under “SuSy assays”, the assay temperature is not specified.	STRENDA DB requires the assay temperature.

Table 1 of that paper does not provide error estimates for K_i values.	STRENDA DB requires error estimates for all kinetic constants entered.


Missing data statistics of papers used as sources for SABIO-RK

The SABIO-RK database is an expert-curated collection of enzyme kinetic values, of particular value in modelling multi-enzyme networks and metabolism (Wittig et al, 2018). Wittig et al. (2014ab) recently investigated publications on enzyme function with the aim of detecting omissions that complicate efficient data extraction for SABIO-RK. This analysis showed a number of issues that prevent readers from repeating the experiments, such as missing temperature or pH values at which the assays were carried out. The major focus however was on problems that complicated the abstraction of data into the database, such as the absence of unique identifiers for enzymes (e.g. UniProtKB codes). Such issues would often not prevent a dedicated expert from reproducing the experiments, but are a serious impediment for database curators, and researchers who wish to use data from many publications, for example in systems biology models. Other problems of this type identified were:

required information is scattered in different parts of the paper, and between text, tables and figures
no use of recognised unique identifiers or controlled vocabularies for source organism, compounds used in assay, etc.
a reference (which often leads to a sequence of further references) must be consulted to find details of methods
values are given in non-standard units, e.g. mg/mL rather than M
assay concentrations are not stated, but must be calculated from separate values of amount and volume
full reaction catalysed is not stated, e.g. products not named

It should be noted that the consistent use of STRENDA DB will also deal with issues of this type.

In this paper, however, we focus on issues that would definitely prevent repetition of the work (Table 3). These will clearly also affect the meaning and transferability of the results reported. It is clear that omissions of this type are common. Previous analysis by Wittig et al (2014ab) indicated that such omissions are not becoming less frequent over time. The larger test set of papers analysed here allows us to make more confident comments about the most common types of omission, which hence should be priority targets for checking.

Table 3. Statistics for omissions based on recent papers used in the SABIO-RK database.

Missing information	Found in papers (%)
No reaction pH	4
No temperature	11
States “room temperature”	5
No information on buffer	3
No enzyme concentration	45
No substrate concentrations	11
“Tris buffer” with no counter-ion stated	22


The 100 papers analysed for this table were all from one of the 10 journals most represented in the SABIO-RK database (J Biol Chem, Biochemistry, Biochem J, Eur J Biochem, Biochim Biophys Acta, Arch Biochem Biophys, J Bacteriol, FEBS Lett, Plant Physiol, Proc Natl Acad Sci U S A). They are the 100 most recent (as of September 2017), with the earliest dating from 2008.
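Because Table 3 is based on a sample of 100 papers, each percentage carries sampling uncertainty. As a rough guide only, the sketch below computes approximate 95% Wilson score intervals for the tabulated frequencies. This calculation is our illustrative addition and not part of the original analysis; it assumes the 100 papers can be treated as a simple random sample, which is a simplification.

```python
import math
from typing import Tuple


def wilson_interval(count: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    p = count / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Counts taken from Table 3 (percentages of 100 papers, so counts equal percentages).
table3_counts = {
    "No reaction pH": 4,
    "No temperature": 11,
    "States room temperature": 5,
    "No information on buffer": 3,
    "No enzyme concentration": 45,
    "No substrate concentrations": 11,
    "Tris buffer with no counter-ion stated": 22,
}

for label, count in table3_counts.items():
    low, high = wilson_interval(count, 100)
    print(f"{label}: {count}% (approx. 95% interval {100 * low:.0f}-{100 * high:.0f}%)")
```

Even allowing for this uncertainty, the two most frequent omissions (enzyme concentration and buffer counter-ion) remain clearly separated from the rarer ones.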

The most common omission is clearly the failure to report the concentration of the enzyme used. Authors may have allowed for this in reporting a V or k_cat value, but the actual value used is necessary, for example to be sure that studies were in the range where the reaction rate is proportional to enzyme concentration.

The next most common omission is failing to report the counter-ion used in a buffer, which was also identified in 5 out of the 11 papers examined more closely in Table 1. Any ions present in the reaction medium can affect the enzyme behaviour, so these should be identified. Common lab parlance will refer to “phosphate buffer” or “Tris buffer”, but this should not allow counter-ions to be omitted in formal reporting.

Failure to report substrate concentrations used (usually a range) was also found in 4 out of the 11 papers. Whether the concentrations were used to estimate K_M or chosen to “ensure saturation”, they are needed to test the validity of the conclusions. In the case of two-substrate enzymes it is not trivial to ensure saturation with both substrates, and kinetic constants obtained varying only one substrate are typically apparent values dependent on the fixed concentration of the other substrate.
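
The dependence of apparent constants on the fixed co-substrate can be illustrated numerically. The sketch below assumes a sequential (ternary-complex) two-substrate mechanism with the standard rate equation v = Vmax·A·B / (K_iA·K_B + K_B·A + K_A·B + A·B); this mechanism, the function names and all parameter values are assumptions chosen for illustration, not data from the papers analysed.

```python
def bisubstrate_rate(a, b, vmax, k_a, k_b, k_ia):
    """Initial rate for an assumed sequential (ternary-complex) mechanism."""
    return vmax * a * b / (k_ia * k_b + k_b * a + k_a * b + a * b)


def apparent_constants(b_fixed, vmax, k_a, k_b, k_ia):
    """Apparent Vmax and apparent K_M for A when B is held at b_fixed.

    Obtained by rearranging the rate equation into Michaelis-Menten
    form with respect to A.
    """
    vmax_app = vmax * b_fixed / (k_b + b_fixed)
    km_app = (k_ia * k_b + k_a * b_fixed) / (k_b + b_fixed)
    return vmax_app, km_app


# Hypothetical constants in arbitrary but consistent units.
params = dict(vmax=10.0, k_a=0.5, k_b=2.0, k_ia=1.5)

for b_fixed in (0.5, 2.0, 20.0):
    vmax_app, km_app = apparent_constants(b_fixed, **params)
    print(f"[B] = {b_fixed:5.1f}: apparent Vmax = {vmax_app:.2f}, "
          f"apparent K_M(A) = {km_app:.2f}")
```

With these illustrative numbers the apparent K_M for A falls from 1.30 to 0.59 as the fixed concentration of B rises from 0.5 to 20, and approaches the true K_A of 0.5 only when B is saturating. A K_M reported without the concentration of the second substrate therefore cannot be interpreted or compared.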

Some other issues identified in several of the 11 papers would not have been spotted in our analysis of the SABIO-RK papers.

A value of pH may be given, but its meaning is not clear. In two papers it is not clear whether this is the final reaction pH or that of a buffer to which significant acidic or basic compounds are then added. Ideally the final reaction pH should be reported. At the very least, it should be completely obvious at what stage the quoted pH value was measured or set. In another two papers there are issues about the treatment of pH for experiments conducted at a significant range of temperatures. The most common protocol is to measure or set the pH of a buffer at a reference temperature (typically about 20 °C), then heat or cool to the chosen assay temperature. This can be reproduced, but is theoretically not ideal, because the pH will change significantly with temperature, particularly for some buffer species. A better option, but less used, is to calibrate the pH electrode with an appropriate standard at each temperature, then measure or adjust buffer pH at the same temperature. When it is not clear which approach is adopted, the experiments cannot be reproduced precisely.
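
The size of the temperature effect can be estimated with a simple linear approximation. The sketch below assumes that the acid/base ratio of the buffer does not change on heating, so that the pH shifts by the same amount as the buffer pKa, and that d(pKa)/dT is constant over the range. The temperature coefficients used (about −0.028 per °C for Tris and about −0.0028 per °C for phosphate) are approximate, commonly quoted values and should be treated as assumptions; the function name is hypothetical.

```python
def ph_at_assay_temperature(ph_set, t_set_c, t_assay_c, dpka_dt):
    """Estimate buffer pH after moving from t_set_c to t_assay_c (both in °C).

    Linear approximation: the pH is assumed to follow the buffer pKa,
    which changes by dpka_dt per °C.
    """
    return ph_set + dpka_dt * (t_assay_c - t_set_c)


# Approximate temperature coefficients per °C (assumed, illustrative values).
DPKA_DT = {"Tris": -0.028, "phosphate": -0.0028}

for name, coeff in DPKA_DT.items():
    ph = ph_at_assay_temperature(7.5, 20.0, 37.0, coeff)
    print(f"{name}: pH 7.50 set at 20 °C -> about {ph:.2f} at 37 °C")
```

Under these assumptions a Tris buffer set to pH 7.5 at 20 °C would be close to pH 7.0 at 37 °C, whereas a phosphate buffer would move by less than 0.1 unit, which is why the stage and temperature at which the pH was set need to be stated.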

In two papers kinetic parameters are given without units or with only relative values. Using values relative to some reference can often be a useful way to present results, but the absolute reference value should always be stated, so that absolute values under any conditions can be calculated if wanted. Papers like these would not be incorporated in SABIO-RK, as the parameters are not usable.

Conclusions

We find that none of 11 recent papers reporting research on enzyme function, chosen non-selectively, provides enough information to reproduce its findings. Our detailed analysis is backed up by statistics from 100 papers used as sources for the SABIO-RK database. Therefore, we can be confident that a large fraction of published papers have similar critical omissions important for replication purposes. It is clear that the field would benefit from systems ensuring that necessary data and meta-data are properly recorded, so that others can use results or repeat experiments in confidence. It is obvious that paper-based reporting guidelines will not be widely applied, either by the authors or by the journals, as discussed in the introduction. However, the use of an electronic tool such as STRENDA DB should increase the chances that guidelines are adopted and that reporting complies with them. Moving towards deposit of data, at the time of publication, in open and trusted repositories appears to be a natural and obvious step in research and publication policy. This should improve the quality, openness and transparency of experimental enzyme function data in the literature, especially if planned enhancements are implemented, including algorithms that validate datasets as complete.
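
To make the idea of automated completeness checking concrete, the following is a minimal sketch in Python of a check covering the omission types counted in Table 3. It is not the STRENDA DB implementation; the field names, the record layout and the function name are hypothetical and chosen only for illustration.

```python
# Hypothetical field names mirroring the omission types in Table 3.
REQUIRED_FIELDS = (
    "ph",
    "temperature_c",
    "buffer",
    "buffer_counter_ion",
    "enzyme_concentration",
    "substrate_concentrations",
)


def find_omissions(record):
    """Return the required fields that are missing or unusable in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or value == "" or value == []:
            missing.append(field)
        elif field == "temperature_c" and not isinstance(value, (int, float)):
            # A phrase such as "room temperature" is not a usable value.
            missing.append(field)
    return missing


# Illustrative record with several of the common omissions.
example = {
    "ph": 7.5,
    "temperature_c": "room temperature",
    "buffer": "Tris",
    "substrate_concentrations": [0.1, 0.5, 2.0],
}

print(find_omissions(example))
```

For the example record the check flags the temperature, the buffer counter-ion and the enzyme concentration. A real validation tool would also need to check units and plausible ranges, which this sketch does not attempt.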

Highlights.

Most enzyme function papers still omit meta-data essential for reproduction
Enzyme or substrate concentrations are often missing
Humans are not good at spotting such omissions; a computer tool is needed
Many can be prevented by currently available STRENDA DB tool
Standardized description for assay conditions and results is required

Acknowledgments

SS acknowledges the funding from NIH/NIDDK under grant R25 DK088752.

Footnotes

Declarations of interest: none.

References

Apweiler R, Cornish-Bowden A, Hofmeyr J-HS, Kettner C, Leyh TS, Schomburg D and Tipton KT (2005) The importance of uniformity in reporting protein-function data. Trends Biochem. Sci. 30:11–12.
Apweiler R, Armstrong R, Bairoch A, Cornish-Bowden A, Halling PJ, Hofmeyr J-HS, Kettner C, Leyh TS, Rohwer J, Schomburg D, Steinbeck C and Tipton KT (2010) A large-scale protein-function database. Nature Chemical Biology 6: 785. DOI:10.1038/nchembio.460
Gardossi L, Poulsen PB, Ballesteros A, Hult K, Švedas VK, Vasić-Rački Đ, Carrea G, Magnusson A, Schmid A, Wohlgemuth R, Halling PJ (2010) Guidelines for reporting of biocatalytic reactions. Trends Biotechnol. 28, 171–180.
Swainston Neil, Baici Antonio, Bakker Barbara M., Cornish-Bowden Athel, Fitzpatrick Paul F., Halling Peter, Leyh Thomas S., O’Donovan Claire, Raushel Frank M., Reschel Udo, Rohwer Johann M., Schnell Santiago, Schomburg Dietmar, Tsai Ming-Daw, Westerhoff Hans V., Wittig Ulrike, Wohlgemuth Roland and Kettner Carsten (2018) “STRENDA DB: enabling the validation and sharing of enzyme kinetics data” FEBS Journal, 285, 2193–2204. DOI:10.1111/febs.14427
Tipton Keith F., Armstrong Richard N., Bakker Barbara M., Bairoch Amos, Cornish-Bowden Athel, Halling Peter J., Hofmeyr Jan-Hendrik, Leyh Thomas S., Kettner Carsten, Raushel Frank M., Rohwer Johann, Schomburg Dietmar, Steinbeck Christoph (2014) Standards for Reporting Enzyme Data: The STRENDA Consortium: What it aims to do and why it should be helpful. Perspectives in Science 1(1–6):131–137. DOI:10.1016/j.pisc.2014.02.012
Wittig Ulrike, Kania Renate, Bittkowski Meik, Wetsch Elina, Shi Lei, Jong Lenneke, Golebiewski Martin, Rey Maja, Weidemann Andreas, Rojas Isabel, Müller Wolfgang (2014a) “Data extraction for the reaction kinetics database SABIO-RK” Perspectives in Science 1, 33–40
Wittig Ulrike, Rey Maja, Kania Renate, Bittkowski Meik, Shi Lei, Golebiewski Martin, Weidemann Andreas, Müller Wolfgang and Rojas Isabel (2014b) “Challenges for an enzymatic reaction kinetics database” FEBS Journal 281, 572–582
Wittig Ulrike, Rey Maja, Weidemann Andreas, Kania Renate, Müller Wolfgang (2018) SABIO-RK: an updated resource for manually curated biochemical reaction kinetics. Nucleic Acids Res 46(D1):D656–D660
