The Clinical Biochemist Reviews. 2005 Nov;26(4):155–158. Letter.

Uncertainty of Measurement: What it is and What it Should Be

Tony Badrick, Robert C Hawkins, Susan R Wilson, Peter E Hickman
PMCID: PMC1320178  PMID: 16648885

Uncertainty of Measurement (UM) is defined [ISO15189 (3.17)] as “a parameter associated with the result of a measurement that characterises the dispersion of the values that could reasonably be attributed to the measurand” [2].

In the clinical laboratory, similar information has long been available in the form of the Standard Deviation (SD) as a measure of imprecision. Thus the current discussion is about an old concept that is routinely measured and available from all accredited laboratories. What is therefore surprising is that UM is causing such anxiety and interest when it is so familiar to laboratory professionals. Perhaps some are concerned that highlighting UM issues may be interpreted by clinicians as a confession of previously undisclosed “laboratory error”.

Although UM may not be a new idea for the laboratory, the present emphasis on the statistics tends to overlook the far greater problem of over-interpretation of results by clinicians. There are several areas where uncertainty in laboratory results can cause clinical problems. Examples are given below, as well as suggestions as to how guidelines on UM could be extended to provide useful information to clinical users.

1. “Does This Change in Result Reflect a Pathological Process?”

Clinicians often face this problem and may ask the laboratory: “Is this change in result a ‘real’ change in the patient, or is it simply a reflection of ‘noise’ in the assay?” How can the laboratory help the clinician answer this question?

Five major components contribute to the variability of test results: pre-analytical factors, intra-individual variation, assay imprecision, operator differences and pathological processes. Of these, the clinician is interested only in the last.

The ways in which pre-analytical factors can influence results of analyses are many. Pre-analytical factors may be either physiological, such as fasting or pregnancy, related to a particular test, such as the timing of a drug level or a hormone in relation to a cycle, or a collection artefact, such as difficulty with collection, poor preservation of the sample post-collection or the use of the incorrect anticoagulant. There are also potential interferences in assays such as the presence of heterophilic antibodies or drugs leading to incorrect results and invalid interpretations. Efforts can be made to minimise such variables but their effect cannot be totally eliminated. For clinical laboratory analyses, UM assessment generally excludes pre-analytical factors.

The issue of biological variation is discussed in section 4 below.

With regard to assay imprecision, it is possible to state whether a given change between two measurements of the same analyte using the same assay can be explained by analytical imprecision alone [3]. If one assumes that the SD for the assay is the same at the two concentrations in question, then that SD can be used to estimate the maximum change in analyte concentration that can be attributed to analytical imprecision alone.

For example, take potassium levels in a patient changing from 4.8 to 5.2 mmol/L in an assay with an imprecision (SD) of 0.1 mmol/L, and assume that the two measurements are statistically independent. If one wants to be 95% confident that an apparent change in concentration is not due to assay imprecision alone, then the difference between the two results must be greater than a derived value, known as the Reference Change Value (RCV). The RCV is calculated as the SD multiplied by 2.77. The factor 2.77 is derived from z × √2, where z = 1.96 is the standard normal deviate for a two-sided 95% confidence interval and √2 arises because we are comparing two results with the same SD. In our example of potassium, the calculated RCV is 0.3 mmol/L (2.77 × 0.1 = 0.277, rounded to the reporting unit). The observed change of 0.4 mmol/L exceeds this, so we conclude that the change from 4.8 to 5.2 mmol/L represents a ‘likely’ change in the patient.
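The potassium example can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original letter; the function names `reference_change_value` and `exceeds_rcv` are illustrative, and it assumes, as the text does, independent measurements with the same SD at both concentrations.

```python
import math

def reference_change_value(sd, z=1.96):
    """RCV for two independent results that share the same analytical SD."""
    # sqrt(2) appears because the SD of a difference of two results is sqrt(2) * SD.
    return z * math.sqrt(2) * sd

def exceeds_rcv(first, second, sd, z=1.96):
    """True if the observed change is larger than the RCV."""
    return abs(second - first) > reference_change_value(sd, z)

rcv = reference_change_value(0.1)       # potassium SD of 0.1 mmol/L
print(f"RCV = {rcv:.2f} mmol/L")        # prints RCV = 0.28 mmol/L (0.3 in the text)
print(exceeds_rcv(4.8, 5.2, sd=0.1))    # change of 0.4 mmol/L against that RCV
```

Passing a different `z` gives the RCV at another confidence level without changing the rest of the calculation.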

The choice of 95% as the confidence level is widely used in many statistical applications but it should be appreciated that its use stems empirically from Fisher in the 1920s [4]. He was interested in (agricultural) experiments comparing different treatments and needed to answer the question “does this outcome differ from that of another treatment?” He wanted to choose a decision point (confidence level) that was extremely unlikely to have arisen by chance alone. An alternative choice to the 95% confidence interval that may be useful in the clinical laboratory setting is to use the 50% confidence interval, as this is the boundary where an apparent change moves from being more likely due to analytical error, to more likely a ‘real’ change, due to a pathological change in that patient. An advantage of using the 50% confidence interval is that the RCV is very close to the SD value (RCV = 0.954 × SD).
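The factors 2.77 and 0.954 both follow from the same expression, z × √2, with z taken from the standard normal distribution for the chosen two-sided confidence level. A short Python sketch (the helper name `rcv_factor` is ours, and normally distributed analytical error is assumed) makes the dependence explicit:

```python
import math
from statistics import NormalDist

def rcv_factor(confidence):
    """Multiplier of the SD giving the RCV at a two-sided confidence level."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Two-sided level: split the remaining probability between both tails.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(2)

for level in (0.50, 0.95):
    print(f"{level:.0%}: RCV = {rcv_factor(level):.3f} x SD")
```

For 50% and 95% this gives factors of 0.954 and 2.772 respectively, matching the figures quoted in the text.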

2. Choice of Reporting Unit Interval

Following on from this analysis of how certain one can be that a change in result is due to a pathological process rather than an analytical artefact, we consider what reporting interval is appropriate for a particular analyte.

It is apparent that for many assays we report analyte concentrations to too many significant figures [5]. For example, even using the less stringent 50% CI criterion (that the apparent change in analyte concentration is more likely to be due to pathological change than to an assay artefact), creatinine concentrations reported in 1 μmol/L increments imply an unwarranted precision, and we suggest that creatinine should only be reported in 5 μmol/L increments within the reference interval and probably in 10 μmol/L increments at higher concentrations [3]. If one wants to report analyte concentrations with a greater degree of certainty that the change is ‘real’ and not simply a reflection of assay and other imprecision, then the reporting interval will need to be substantially wider.

Unfortunately many laboratory information systems do not allow rounding of results at the final reporting step, or only allow one level of rounding rather than differential rounding based on concentration. Laboratories should demand that their laboratory information systems have this capability.
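Differential rounding of this kind is simple to express. The sketch below is illustrative Python, not taken from any laboratory information system; the 5 and 10 μmol/L steps are the creatinine increments suggested above, while the 110 μmol/L switch point is a placeholder standing in for a laboratory's own upper reference limit.

```python
import math

# Hypothetical switch point between the two reporting steps (umol/L);
# a laboratory would substitute its own upper reference limit.
UPPER_REFERENCE_LIMIT = 110

def round_to_step(value, step):
    """Round to the nearest multiple of step, with halves rounded up."""
    # Built-in round() uses banker's rounding, so floor(x + 0.5) is used instead.
    return int(math.floor(value / step + 0.5) * step)

def report_creatinine(value_umol_l):
    """Apply a 5 umol/L step up to the switch point and 10 umol/L above it."""
    step = 5 if value_umol_l <= UPPER_REFERENCE_LIMIT else 10
    return round_to_step(value_umol_l, step)

for raw in (83, 88, 143, 146):
    print(raw, "->", report_creatinine(raw))
```

With these settings, measured values of 83 and 88 μmol/L are reported as 85 and 90, and 143 and 146 μmol/L as 140 and 150.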

3. The Problem of Using Sharply Defined Cutpoints

With better appreciation of the uncertainty associated with all laboratory measurements, the practice of using sharply defined interpretative cutpoints is clearly inappropriate. We accept that, for many illnesses except perhaps trauma, there is a gradation from healthy to clinically diseased. For example, type 2 diabetes usually represents the endpoint of a gradual decline in glucose tolerance rather than a sharp boundary between normal and diabetic. The intermediate ‘grey zone’ is recognised as impaired glucose tolerance or impaired fasting glycaemia.

Yet in other situations we apparently do not accept this gradation. Take, for example, the interpretation of Tetracosactrin or Synacthen stimulation tests. Laboratories often use criteria such as “a normal response is an increase of at least 200 nmol/L to a final concentration of 500 nmol/L”. Using these criteria, a change from 310 to 505 nmol/L is abnormal whilst a change from 295 to 500 nmol/L is normal. If the analytical SD for cortisol measurement at this concentration is 40 nmol/L, one can easily see that the results are effectively identical, as the estimated range of the lower result would be approximately 300 ± 80 nmol/L (mean ± 2 SD).
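Applying the quoted rule literally shows how brittle it is. In this Python sketch (the function name, and the reading of “to a final concentration of 500 nmol/L” as “at least 500 nmol/L”, are our assumptions), the two patients receive opposite classifications even though their results differ by far less than one SD:

```python
SD_CORTISOL = 40  # analytical SD in nmol/L, as assumed in the text

def strict_synacthen_rule(baseline, final, min_rise=200, min_final=500):
    """Sharp-cutpoint rule: rise of at least min_rise to at least min_final."""
    return (final - baseline) >= min_rise and final >= min_final

for baseline, final in ((310, 505), (295, 500)):
    verdict = "normal" if strict_synacthen_rule(baseline, final) else "abnormal"
    print(f"{baseline} -> {final} nmol/L: rise {final - baseline}, {verdict}")

# Differences between the two baselines and between the two final results,
# expressed as fractions of the analytical SD.
print((310 - 295) / SD_CORTISOL, (505 - 500) / SD_CORTISOL)
```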

To deal with this from a UM perspective, one must consider two different comparisons: the 200 nmol/L serial change and the 500 nmol/L absolute cutoff. The choice of z value and confidence level to calculate the single result uncertainty and RCV has been discussed earlier. Results can satisfy one, both or neither of the criteria. Consideration of UM makes us aware of the unsatisfactory nature of the present approach. The profession needs to develop a more scientific approach to dynamic testing akin to that for glucose tolerance testing with differentiation of normal glucose tolerance from impaired fasting glycaemia/impaired glucose tolerance and these from frank diabetes. Studies are needed to examine the diagnostic and prognostic importance of each criterion separately and together.
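One way to make the two comparisons explicit is to attach an uncertainty to each and allow an “indeterminate” outcome. The Python sketch below is only one possible formulation, not a validated rule: it uses z × SD for the single final result and z × √2 × SD for the rise, and the names `assess` and `synacthen_with_uncertainty` are hypothetical.

```python
import math

def assess(value, threshold, uncertainty):
    """Compare value with threshold, allowing for +/- uncertainty."""
    if value - uncertainty >= threshold:
        return "met"
    if value + uncertainty < threshold:
        return "not met"
    return "indeterminate"

def synacthen_with_uncertainty(baseline, final, sd, z=1.96,
                               min_rise=200, min_final=500):
    """Return the status of the rise criterion and the final-value criterion."""
    single_u = z * sd                  # uncertainty of one result
    change_u = z * math.sqrt(2) * sd   # uncertainty of a difference (the RCV)
    return (assess(final - baseline, min_rise, change_u),
            assess(final, min_final, single_u))

print(synacthen_with_uncertainty(310, 505, sd=40))
print(synacthen_with_uncertainty(295, 500, sd=40))
print(synacthen_with_uncertainty(200, 700, sd=40))
```

With an SD of 40 nmol/L, both borderline patients from the example come out as indeterminate on both criteria, whereas a rise from 200 to 700 nmol/L meets both.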

Considering UM when interpreting results against fixed cutoffs and change values should lead to a more reasoned approach to interpretation, one that is also more compatible with rule-governed computerised interpretative reporting.

In the meantime, it may be more realistic to report Synacthen stimulation results as “final concentration <400 nmol/L is abnormal, final concentration >600 nmol/L is a normal response to Synacthen stimulation”. Clear but unspoken is that concentrations between 400 and 600 nmol/L require clinical interpretation and that the laboratory alone cannot provide guidance.
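The interim reporting scheme amounts to a three-way classification. A minimal Python sketch, with the 400 and 600 nmol/L limits taken from the wording above and the function name and returned phrases chosen for illustration:

```python
def synacthen_comment(final_nmol_l, lower=400, upper=600):
    """Map a final cortisol concentration to an interpretative comment."""
    if final_nmol_l < lower:
        return "abnormal response"
    if final_nmol_l > upper:
        return "normal response"
    # Between the limits the laboratory result alone is not decisive.
    return "requires clinical interpretation"

for final in (350, 505, 650):
    print(final, "nmol/L:", synacthen_comment(final))
```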

4. What is the Validity of Applying Biological Variation (BV) Data (Collected in Healthy People) in Sick People?

The Laboratory Implementation Guide proposed that data on intra-individual BV should be used in the assessment of fitness for purpose. We are cautious of this proposal and below discuss our concerns regarding choice of appropriate BV data and practical implementation of this suggestion.

There is evidence that for some analytes and conditions, biological variability seen in healthy individuals mirrors that seen in sick individuals [6–9]. Thus most BV data are collected in healthy subjects and then applied to unhealthy individuals. Whether this is valid is a moot point, especially in the setting of the seriously ill individual where entirely different regulatory mechanisms may be operating.

As an example, consider the regulation of plasma glucose in a healthy person. The plasma glucose concentration will be dependent upon diet (carbohydrate content, with up- and down-regulation of insulin receptors), insulin secretion and insulin resistance (in our society largely driven by obesity). When a person becomes seriously ill, these regulatory mechanisms will be substantially modified by excess secretion of cortisol which induces synthesis of gluconeogenic enzymes, with a resultant higher hepatic production of glucose. It is therefore unsurprising to note that median intra-individual BV of glycohaemoglobin is higher in diabetic patients with co-morbidities (9.8%) than in diabetic patients without coexisting medical problems (7.1%), despite no difference in median glycohaemoglobin between the two groups [10].

A more recent example is the much higher intra-individual BV values for NT-proBNP seen in patients with stable chronic heart failure (median 35%) compared to healthy individuals (mean 9.1%) [11,12]. This example also reminds us that the characteristics of the population in which BV is assessed should match the population in which the assay will be used. Similarly, BV data on tumour markers used for monitoring should ideally be taken from diseased patients rather than healthy volunteers. Readers should be aware of this when assessing published BV data.

Choice of sampling interval will also affect measures of BV. Changing from monthly to three-monthly sampling intervals in the glycohaemoglobin study mentioned above increased median intra-individual BV from 5.1 to 9.8% for patients with co-morbidities and from 4.2 to 7.1% for patients without co-morbidities. Although tables of BV data can be useful [13,14], one may need to check the primary literature to ascertain the sampling interval used in the studies quoted. One should also appreciate that some individuals show larger BVs than the mean or median figures quoted. This variability is recognised for lipid specimens, with the relative range model used to determine the number of specimens required [15], but is rarely considered for other analytes. It should also be considered when interpreting analytical goals based on BV data [14].

Readers should also be aware of the arbitrary nature of the factors used in the definitions of optimum, desirable and minimum analytical goals used for deciding fitness for clinical use. There is little justification for the figures of 0.25, 0.50 and 0.75, and other arbitrary figures such as 0.10, 0.15 and 0.20 could just as well be used. Many analytes easily achieve the ‘optimum’ standard for acceptability, while some key analytes (sodium, calcium) fail to reach even the ‘minimum’ standard. These observations suggest that this approach is insufficient as the sole guide to fitness for clinical use, a point acknowledged in the guideline.
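As these goals are commonly formulated, analytical imprecision (CVa) is compared with a fraction of the within-subject biological variation (CVi). The Python sketch below applies the three factors quoted above; the function name and the example CVs are hypothetical and are not data from any study. Changing `FACTORS` shows how strongly the resulting grade depends on the choice of factors.

```python
# Factors quoted in the text for optimum, desirable and minimum goals.
FACTORS = (("optimum", 0.25), ("desirable", 0.50), ("minimum", 0.75))

def grade_imprecision(cv_analytical, cv_within_subject, factors=FACTORS):
    """Return the strictest goal met by the analytical CV, or 'fails minimum'."""
    for name, factor in factors:
        if cv_analytical <= factor * cv_within_subject:
            return name
    return "fails minimum"

# Hypothetical CVs in percent, chosen only to exercise each branch.
print(grade_imprecision(1.0, 5.0))   # 1.0 <= 0.25 * 5.0
print(grade_imprecision(2.0, 5.0))   # 2.0 <= 0.50 * 5.0
print(grade_imprecision(3.5, 5.0))   # 3.5 <= 0.75 * 5.0
print(grade_imprecision(1.0, 0.7))   # 1.0 >  0.75 * 0.7
```

An analyte under tight physiological control (small CVi) can fail the minimum goal even with a low analytical CV, which is the situation described for sodium and calcium.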

Although the published guideline suggests that BV data not be included in UM assessments, some may argue that inclusion, even if based on healthy individuals, can reduce some of the over-interpretation of laboratory data by providing a robust UM figure for “ruling out” significant change. The imprecision of an assay is unique to a laboratory, and laboratories can easily determine the analytical imprecision of each assay. Certainly laboratories should be able to tell referring clinicians whether a change in results can be explained by analytical imprecision alone. Whatever one’s view, whether or not BV is included in the UM value available to users should be clearly stated.
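If BV is included, the usual approach is to combine analytical and within-subject variation in quadrature before applying the same z × √2 factor. The Python sketch below assumes that formulation, which the letter itself does not spell out; the 3% analytical CV is a made-up figure, while 7.1% and 9.8% are the glycohaemoglobin BV values quoted earlier in this section.

```python
import math

def rcv_percent(cv_analytical, cv_within_subject=0.0, z=1.96):
    """Percentage RCV from analytical CV and optional within-subject CV."""
    combined_cv = math.sqrt(cv_analytical ** 2 + cv_within_subject ** 2)
    return z * math.sqrt(2) * combined_cv

CV_A = 3.0  # hypothetical analytical CV in percent

print(f"analytical only:               {rcv_percent(CV_A):.1f}%")
print(f"with BV 7.1% (no co-morbidity): {rcv_percent(CV_A, 7.1):.1f}%")
print(f"with BV 9.8% (co-morbidity):    {rcv_percent(CV_A, 9.8):.1f}%")
```

The comparison illustrates why it matters that a reported UM value states whether BV is included: the same assay yields a much larger threshold for “significant change” once BV is added.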

Conclusions

We believe that exploring UM is of great importance for clinical laboratories and their users. Information on UM should help reduce over-interpretation of minor changes in analyte concentrations by users unaware of the inherent imprecision of the measurement. We have identified a number of issues where further work is needed and suggest that future ISO guidelines should offer definitive recommendations in these areas.

Footnotes

In November 2004, “Uncertainty of Measurement in Quantitative Medical Testing: A Laboratory Implementation Guide” was published in the Clinical Biochemist Reviews [1]. In the Preface, comments were invited, and that is the purpose of this paper. Whilst most of the material in the document is logical and useful, there are elements of it that should be more widely discussed before it is adopted as an application document.

References

  1. White G, Farrance I. Uncertainty of measurement in quantitative medical testing: a laboratory implementation guide. Clin Biochem Rev. 2004;25:S1–S24.
  2. International Standard ISO 15189. First edition 2003-02-15. Medical laboratories – particular requirements for quality and competence. Reference number ISO 15189; 2003(E).
  3. Badrick T, Wilson SR, Dimeski G, Hickman PE. Objective Determination of appropriate reporting intervals. Ann Clin Biochem. 2004;41:385–90. doi: 10.1258/0004563041731583.
  4. Sterne JA, Egger M, Smith GD. Systematic reviews in health care: Investigating and dealing with publication and other biases in meta-analysis. BMJ. 2001;323:101–5. doi: 10.1136/bmj.323.7304.101.
  5. Hawkins RC, Johnson RN. The significance of significant figures. Clin Chem. 1990;36:824.
  6. Trape J, Aliart ML. Reference change value for HbA1c in patients with type 2 diabetes mellitus. Clin Chem Lab Med. 2000;38:1283–7. doi: 10.1515/CCLM.2000.202.
  7. Fraser CG. Age–related changes in laboratory test results. Clinical implications. Drugs Aging. 1993;3:246–57. doi: 10.2165/00002512-199303030-00006.
  8. Holzel WG. Intra-individual variation of analytes in serum from patients with chronic liver diseases. Clin Chem. 1987;33:1133–6.
  9. Holzel WG. Intra-individual variation of some analytes in serum of patients with insulin-dependent diabetes mellitus. Clin Chem. 1987;33:57–61.
  10. Phillipou G, Phillips P. Intraindividual variation of glycohemoglobin: implications for interpretation and analytical goals. Clin Chem. 1993;39:2305–8.
  11. Melzi d'Eril G, Tagnochetti T, Nauti A, et al. Biological Variation of N-Terminal Pro-Brain Natriuretic Peptide in healthy individuals. Clin Chem. 2003;49:1554–5. doi: 10.1373/49.9.1554.
  12. Bruins S, Fokkema MR, Romer JWP, et al. High intraindividual variation of B-Type Natriuretic Peptide (BNP) and Amino-Terminal proBNP in patients with stable chronic heart failure. Clin Chem. 2004;50:2052–8. doi: 10.1373/clinchem.2004.038752.
  13. Westgard J. Desirable specifications for total error, imprecision and bias derived from biologic variation. 2000. URL: www.westgard.com/biodatabase1.htm. Accessed 28/3/05.
  14. Fraser C. Biological variation: from principles to practice. Washington: AACC Press, 2001.
  15. Cooper G, Smith S, Myers G, Sampson E, Magid E. Estimating and minimizing effects of biologic sources of variation by relative range when measuring the mean of serum lipids and lipoproteins. Clin Chem. 1994;40:227–32.

Articles from The Clinical biochemist. Reviews / Australian Association of Clinical Biochemists. are provided here courtesy of Australasian Association for Clinical Biochemistry and Laboratory Medicine
