Abstract
All results in laboratory medicine are compared to some reference for interpretation. This reference may be a previous result from the same patient, a reference population – either healthy or diseased, or both – or a decision limit recommended by an expert group. The aim for the medical laboratory is to improve the signal-to-noise ratio by increasing the signal or reducing the noise. This presentation deals with the more general tools for reduction of the noise component, and focuses on biological within-subject variation, reference intervals and decision models.
Regarding biological within-subject variation, the estimation of reference change value (RCV) as a yardstick for judging measured differences within the patient over time is an important tool. Usually only type 1 errors are considered, but type 2 errors should also be taken into account. Moreover, variance homogeneity is assumed for the application of RCV, but this assumption is not always fulfilled, and erroneous interpretations may result. A tool for comparison of different and more complicated algorithms applied to serial measurements is computer simulation (e.g. on data from tumour markers).
In order to reduce the noise component from reference intervals, partitioning according to relevant subgroups is a tool, and useful criteria for judging whether subgroups should be combined are reported. Geographical and racial differences may cause different reference distributions (e.g. plasma proteins), but it has been possible to establish common reference intervals for 25 common components in Caucasians in the five Nordic countries. Transformation of data and presentation of accumulated ranked values in rankit plots where Gaussian (or log-Gaussian) distributions show up as straight lines is a valuable tool for interpretation of the distributions and comparison of subgroups. In this way it is often possible to isolate a low-risk group which fits a log-Gaussian distribution. In case of thyroid autoantibodies the distributions look biphasic, even after all possible rule-out criteria have been exhausted, but a composite model makes it possible to extract a reference population from the mixture.
The classical decision model is bimodal, reflecting an assumption of two independent but overlapping distributions, with a clear but unknown prevalence for the disease. When a decision limit (cut-off point) is applied, the percentages of false positive and false negative results will be determined, but the underlying prevalence is unchanged. In contrast to this, the unimodal distributions cover a continuum of probabilities for a certain disease (risk) and the decision limit is arbitrarily chosen. Thus, the disease or risk is defined by the measured component. Consequently, the decision limit directly defines the prevalence, and this limit can be different over time and geography as has been the case for cholesterol or glucose.
Introduction
Interpretation of patients’ laboratory data varies from clinician to clinician and covers all levels from the simplest, where each result is compared to some decision limit, to very advanced computer aided interpretations according to complicated algorithms.
Simple interpretations, however, are dominant for the majority of data produced in laboratory medicine, and the primary objective for the profession is to create the background for the best interpretation by optimising the signal-to-noise ratio. An example of such an optimisation is the diagnosis of myocardial infarction, where components with increasing specificity have been introduced: starting with plasma aspartate aminotransferase, diagnostic accuracy was improved first by plasma creatine kinase, then by its isoenzymes (isoenzyme B, isoenzyme MB), ending with the most specific components, plasma troponin I and troponin T. This type of optimisation is essential for the improvement of diagnostics and knowledge about the best treatment of patients, but it takes a lot of manpower and other resources to produce these investigations.
Other tools to improve the interpretation of the laboratory data are general in nature and related to analyses of the patient results, based on more detailed understanding of the biological and clinical variability and elucidation of the basic data. These tools mainly focus on reduction of the ‘noise’ element of the signal-to-noise ratio by use of other information than the laboratory data and by careful evaluation of biological and clinical distributions of patient results. A well known method to reduce some of the noise is to use factors such as age, gender, race and to reduce the controllable factors (preparation of patient, standardisation of sampling, reduction of analytical noise), whereas the non controllable factors are the patient’s biological/diseased set point and within-subject variation.
In the simple interpretation of laboratory data, all results are compared to some reference:
- one or more previous results from the same patient
- a reference group (healthy population, with a certain disease or both)
- a decision limit (defined by expert committee or by evaluation of data)
The purpose of the present paper is to describe a series of such tools for optimisation of signal-to-noise ratios for
- biological within-subject variation
- reference intervals and
- decision models
Biological Within-Subject Variation
Over time, the concentration of a component changes in the individual person/patient, and during steady state conditions the variations are assumed to be Gaussian distributed about a biological set point. The biological set point (estimated by the arithmetic mean value, μ) is characteristic of the person/patient and it is assumed constant for a certain period – often years. The biological within-subject variability is described by the standard deviation, σ, as calculated from the measurements around the set point. In order to facilitate comparison between individuals and between components, the coefficient of variation, CVwithin-subject (%), is calculated as (σ/μ) × 100. The set points are distributed as well, which will be described in more detail under reference intervals.
Callum Fraser has described the many aspects of biological variability, with very illuminating examples, in his book on ‘biological variation’.1 A long list of CV values for biological within- and between-subject variations, compiled by Carmen Ricós et al., is published on Westgard’s website.2
Reference Change Value
The term RCV was introduced by Harris & Yasaka3 in order to give a statistical estimate of how large a difference between two measurements in the same patient must be to be considered significant. RCV = z × √2 × CVwithin-subject, where z is the standardised deviation from the mean for a certain probability, e.g. 1.96 for p = 5% (two tailed), and √2 accounts for two measurements with the same variation CVwithin-subject. For p = 5%, the RCV is thus also known as 2.77 × CVwithin-subject.
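The RCV formula can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name and the example CV of 5.7% (the glucose value cited later in this paper) are choices made here, not part of the original method description.

```python
from math import sqrt
from statistics import NormalDist


def reference_change_value(cv_within: float, p_two_tailed: float = 0.05) -> float:
    """Return the RCV in the same units as cv_within (here: percent)."""
    # z for a two-tailed probability, e.g. about 1.96 for p = 5%
    z = NormalDist().inv_cdf(1 - p_two_tailed / 2)
    # sqrt(2): the difference between two measurements carries twice the variance
    return z * sqrt(2) * cv_within


if __name__ == "__main__":
    # Factor for a CV of 1%: about 2.77, as quoted in the text
    print(round(reference_change_value(1.0), 2))
    # Illustrative CV of 5.7%
    print(round(reference_change_value(5.7), 1))
```

Only the biological within-subject variation enters this sketch, as in the formula above.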
In the use of RCV, only the statistical type 1 error is taken into consideration (p = 5%, two tailed, i.e. α = 2.5% in each tail), assuming steady state. The type 2 error, however, is β = 50% for a true change equal to RCV, and thus the power is (1 − β) = 50%. Both α and β can be changed. Whereas α determines z, and thereby the value of RCV, the power is related to a change of size zα + zβ: the change within the individual that can be detected with the desired power must be defined as (zα + zβ) × √2 × CVwithin-subject.4
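The relationship between RCV and power can be checked numerically. The sketch below assumes Gaussian within-subject variation and steady state around the shifted set point; the function names and the example values are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_for_change(true_change: float, cv_within: float, p_two_tailed: float = 0.05) -> float:
    """Probability that a true change (in percent) gives a difference beyond the RCV."""
    z_alpha = STD_NORMAL.inv_cdf(1 - p_two_tailed / 2)
    shift = true_change / (sqrt(2) * cv_within)  # true change in SD units of a difference
    # Both tails are counted; the far tail is negligible for a positive change
    return (1 - STD_NORMAL.cdf(z_alpha - shift)) + STD_NORMAL.cdf(-z_alpha - shift)


def detectable_change(cv_within: float, p_two_tailed: float = 0.05, power: float = 0.90) -> float:
    """Change (in percent) detected with the desired power: (z_alpha + z_beta) * sqrt(2) * CV."""
    z_alpha = STD_NORMAL.inv_cdf(1 - p_two_tailed / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2) * cv_within


if __name__ == "__main__":
    cv = 5.7  # illustrative CV in percent
    rcv = STD_NORMAL.inv_cdf(0.975) * sqrt(2) * cv
    print(round(power_for_change(rcv, cv), 3))   # power for a true change equal to RCV
    print(round(detectable_change(cv), 1))       # change detectable with 90% power
```

A true change equal to the RCV is flagged in about half of the cases, which is the 50% power described above.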
Index of individuality is an estimate of the ratio between CVwithin-subject and CVbetween-subject introduced by Harris in order to validate the relative usefulness of RCV and reference intervals.5 These applications have further been investigated by Iglesias et al. in relation to the power functions and the relationship between decision limits (cut-off points) and the power functions of RCV.6
In practice, the within-subject variances are considered homogeneous, which allows a single CVwithin-subject, calculated as the square root of the mean of the individual variances, to be used for all individuals. This homogeneity of variances is seldom tested; it may be achieved in some situations,7 whereas considerable deviations are documented in other cases.8
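As a minimal sketch of the pooling step, the common CVwithin-subject can be computed as the square root of the mean of the squared individual CVs. The data below are invented for illustration, and listing the individual CVs side by side is only a crude look at homogeneity, not a formal test.

```python
from math import sqrt
from statistics import mean, stdev


def individual_cv(values):
    """Within-subject CV (%) for one individual's serial measurements."""
    return 100 * stdev(values) / mean(values)


def pooled_cv_within(subjects):
    """Square root of the mean of the individual variances (expressed as CV squared)."""
    cvs = [individual_cv(values) for values in subjects]
    return sqrt(mean([cv ** 2 for cv in cvs]))


if __name__ == "__main__":
    # Hypothetical serial results for three individuals
    subjects = [
        [9.0, 10.0, 11.0],
        [18.0, 20.0, 22.0],
        [95.0, 100.0, 105.0],
    ]
    print([round(individual_cv(v), 1) for v in subjects])  # inspect spread between individuals
    print(round(pooled_cv_within(subjects), 1))
```

If the individual CVs differ widely, the pooled value will be too large for some individuals and too small for others, which is the problem raised above.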
Computer Simulations of Tumour Markers
When more complicated algorithms are needed in follow up of cancer patients after surgery and chemotherapy monitoring, simulation by computer using variations in concentrations of tumour markers can be used. The steady state is simulated by the use of randomly Gaussian distributed values and adding an exponential function of the tumour marker simulating the recurrence of the tumour.9
The simulations can be repeated 100 times for each assumed situation and the percentage of false positive signals and of delay of true positive signals can be estimated. Different algorithms can easily be compared using such computer simulations and the best outcome for the clinical situation chosen. This type of investigation needs reliable data for the applied parameters and when such data are obtained, the simulation method can substitute for large clinical investigations and also save money by avoiding considerable analytical work.10
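The simulation principle described above can be sketched as follows. Everything specific here is an assumption made for illustration: the set point of 100, the total CV of 10%, the doubling time of the added marker concentration, the series length, and the simple rule that signals when the increase from the previous value exceeds a threshold. It is not the algorithm used in the cited studies.

```python
import random


def simulate_series(rng, n_points, set_point, cv_percent, onset=None, doubling_time=2.0):
    """Gaussian steady state, plus an exponential rise after `onset` if given."""
    sd = set_point * cv_percent / 100
    series = []
    for t in range(n_points):
        value = rng.gauss(set_point, sd)
        if onset is not None and t > onset:
            # Assumed recurrence model: the added marker doubles every `doubling_time` steps
            value += set_point * (2 ** ((t - onset) / doubling_time) - 1)
        series.append(value)
    return series


def first_signal(series, threshold_percent):
    """Index of the first increase from the previous value above the threshold, else None."""
    for t in range(1, len(series)):
        previous = series[t - 1]
        if previous > 0 and 100 * (series[t] - previous) / previous > threshold_percent:
            return t
    return None


def evaluate(threshold_percent, runs=100, seed=1, n_points=12, onset=4):
    """Return (% of steady-state runs with a false signal, mean delay of true signals)."""
    rng = random.Random(seed)
    false_positives = 0
    delays = []
    for _ in range(runs):
        steady = simulate_series(rng, n_points, 100.0, 10.0)
        if first_signal(steady, threshold_percent) is not None:
            false_positives += 1
        rising = simulate_series(rng, n_points, 100.0, 10.0, onset=onset)
        hit = first_signal(rising[onset:], threshold_percent)
        if hit is not None:
            delays.append(hit)  # time steps from onset to the first signal
    mean_delay = sum(delays) / len(delays) if delays else None
    return 100 * false_positives / runs, mean_delay


if __name__ == "__main__":
    for threshold in (20.0, 27.7, 40.0):
        print(threshold, evaluate(threshold))
```

Running the same seeded simulation with different thresholds, or with a different signalling rule substituted for `first_signal`, shows the trade-off between false positive signals and delay of true positive signals.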
Reference Intervals
Reference intervals are usually estimated from only one specimen from each of the reference individuals, and thus contain elements of both CVwithin-subject and CVbetween-subject variation. As the index of individuality for many components is low, the distribution of set points becomes more decisive, and the distribution of reference values is seldom Gaussian, but sometimes log-Gaussian, or subgroups are log-Gaussian distributed. Consequently, non parametric statistical methods are often used to estimate reference limits, but the frequent use of these methods leads to broader and less informative reference intervals as discussed below.
Reference limits from reference intervals are often used as decision limits when no recommendations for judgement of laboratory data are available. This habit may come from the increasing volume of laboratory data and the escalating use of such data in wellness testing, but this leads to ‘repeated testing’ (mass significance) with increasing probability of measurements outside these reference limits for each test performed.11 Therefore, reference limits should not be used as decision limits, unless it is concluded from investigations or decided as a consensus. However, as long as these investigations are lacking, we have to live with this (mis)use.
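The effect of repeated testing can be illustrated with a short calculation. The sketch assumes that the tests are independent and that each reference interval covers 95% of healthy individuals; real test panels are often correlated, so the figures are indicative only.

```python
def prob_any_outside(n_tests: int, coverage: float = 0.95) -> float:
    """Probability that a healthy person has at least one result outside the limits."""
    return 1 - coverage ** n_tests


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, round(100 * prob_any_outside(n), 1), "%")
```

Under these assumptions, a panel of 10 tests gives about a 40% chance of at least one 'abnormal' result, and a panel of 20 tests about 64%.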
Criteria for Partitioning of Reference Values
In comparison of reference intervals from different groups or subgroups, it is important to have criteria for when to combine two groups as one single reference group and when to decide for partitioning. The original methods for these decisions were based mainly on statistical t or U tests for comparison of mean values and F tests for comparison of variances, but these tests do not focus on the reference limits, and their dependency on sample size makes them unsuitable for the purpose. Lahti et al. have, however, based the decisions on the percentage of each subgroup with values outside the common reference limits.12,13 The method is simply to take the difference between the two upper limits and divide it by the smaller standard deviation – and similarly for the lower limits. If one of these test ratios exceeds 0.75, then separate reference intervals should be used. If both test ratios are below 0.25, the two groups should be combined to one single reference interval. If the test ratios are between 0.25 and 0.75, other information should be taken into consideration, such as data from the literature, practicability, importance of decisions, etc. For non parametric distributions, the percentage outside the common reference limits for each subgroup should be between 0.9 and 4.1% for combining the two intervals.
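The test ratios and the decision rule, as summarised in the paragraph above, can be written as a small function. The function names and the example limits are hypothetical, and the sketch covers only the Gaussian case with the thresholds of 0.25 and 0.75.

```python
def limit_ratios(lower_a, upper_a, sd_a, lower_b, upper_b, sd_b):
    """Distances between the subgroup limits, in units of the smaller standard deviation."""
    smaller_sd = min(sd_a, sd_b)
    return abs(lower_a - lower_b) / smaller_sd, abs(upper_a - upper_b) / smaller_sd


def partition_decision(low_ratio, high_ratio):
    """Apply the thresholds quoted in the text to the two test ratios."""
    if max(low_ratio, high_ratio) > 0.75:
        return "partition"
    if low_ratio < 0.25 and high_ratio < 0.25:
        return "combine"
    return "use other information"


if __name__ == "__main__":
    # Hypothetical subgroup limits and standard deviations (arbitrary units)
    low, high = limit_ratios(3.0, 7.0, 1.0, 3.1, 8.2, 1.3)
    print(round(low, 2), round(high, 2), partition_decision(low, high))
```

With these invented figures the upper limits differ by 1.2 of the smaller standard deviation, so separate reference intervals would be indicated.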
Racial and Environmental Variability
Reference intervals may depend not only on age and gender, but also on race and environment, as illustrated for plasma proteins by Ichihara et al. in six Asian cities (Tokyo, Seoul, Hong Kong, Taipei, Kuala Lumpur and Shanghai), where they found considerable differences between the populations.14 The investigated reference groups were both geographically and racially different, so it was not possible from the investigation to see whether race or location was the dominating factor.
In another investigation, Johnson et al. compared plasma proteins in adult men of Caucasian and Asian Indian origin (second generation in England) living in Leeds, England.15 The most extreme differences were for orosomucoid (α1-acid glycoprotein), where the Caucasians had the highest concentrations (high test ratio = 1.7), and α1-antitrypsin, where the distributions were mainly overlapping, but with an excess of low values in the Caucasians due to the genotypes MS and MZ, both of which give lower concentrations (low test ratio = 2.5). Here the geographical factor is eliminated, but the genetic factors may be confounded by differences in diet between the two groups.
Caucasians within a geographical area with comparable lifestyle can share common reference intervals for common analytes. In a Nordic project on common reference intervals for 25 common clinical chemical analytes, samples from more than 3000 Caucasian reference individuals from the five Nordic countries were investigated.16,17 Partitioning according to gender and age was based on Lahti’s criteria and all five countries could share the common reference intervals – but dependent on each laboratory’s ability to obtain control results within an acceptable bias of ± 0.375*CVpopulation.
Rankit Transformation
When details of distributions of subgroups are wanted, rankit transformation of the Gaussian (or log-Gaussian) distributions facilitates interpretations of the homogeneity of the groups.18 By accumulation of the bell shaped Gaussian distribution from minus infinity, the area becomes a function of the z value, e.g. z = −1.96 gives the area 2.5%, and z = 0 gives 50%. This accumulated curve is S shaped, and by stretching the ordinate to minus and plus infinity in proportion to the z value, the S shaped curve becomes a straight line in the transformed rankit plot. Correspondingly, if an accumulated distribution of ranked reference values shows up as a straight line in the rankit plot, the distribution can be considered Gaussian and, if the abscissa is logarithmic, log-Gaussian. The advantage is that it is easy to distinguish several distributions and, if the accumulated points do not fit a straight line, to see where the deviation is located.
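A rankit transformation can be sketched without plotting by pairing each ranked value with the z value of its accumulated fraction. The plotting position (i − 0.5)/n and the use of a correlation coefficient as a measure of straightness are assumptions made for this illustration; the cited work uses graphical confidence curves instead.

```python
import random
from math import log, sqrt
from statistics import NormalDist, mean


def rankits(values):
    """Return (sorted value, z) pairs, with z from the accumulated fraction (i - 0.5) / n."""
    ordered = sorted(values)
    n = len(ordered)
    std = NormalDist()
    return [(x, std.inv_cdf((i - 0.5) / n)) for i, x in enumerate(ordered, start=1)]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def straightness(values, log_scale=False):
    """Correlation between (log) values and their rankits; near 1 for a straight line."""
    pairs = rankits(values)
    xs = [log(x) if log_scale else x for x, _ in pairs]
    zs = [z for _, z in pairs]
    return pearson(xs, zs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical log-Gaussian reference values
    sample = [rng.lognormvariate(1.6, 0.8) for _ in range(200)]
    print(round(straightness(sample, log_scale=False), 3))
    print(round(straightness(sample, log_scale=True), 3))
```

For log-Gaussian data the correlation is expected to be closer to 1 on the logarithmic abscissa than on the linear one, which corresponds to the straight line in the log-rankit plot.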
Low Risk Populations
When we investigated the consequences of introducing new criteria for diagnosis of diabetes according to the ‘American Diabetes Association’ in the late 1990s, we found that there had been no publications on distributions of plasma glucose in healthy individuals for more than twenty years. Consequently, we began a project where 2100 individuals were selected randomly according to the local Personal Identification Register. Seven hundred and fifty five individuals volunteered and fasting plasma glucose was measured on them. However, after ruling out individuals with risk factors (first degree relatives with diabetes, history of hypertension, cardiac and vascular diseases) only a smaller group of 424 low risk individuals were left. The fasting plasma glucose concentrations from this low risk group showed up as a straight line in the log-rankit plot in accordance with a log-Gaussian distribution.19 Subgroups according to gender or age or body mass index were also log-Gaussian and the lines were parallel in the rankit plot.
In another project on reference intervals for plasma TSH, a low risk group was identified based on the guidelines from the National Academy of Clinical Biochemistry (NACB). By excluding all individuals with increased concentrations of autoantibodies against thyroid peroxidase (TPOAb), thyroglobulin (TGAb) and TSH receptor (TRAb), the distributions of serum TSH were log-Gaussian for the four subgroups according to gender and the age of 40 years. The test ratios according to Lahti et al.12 were 0.31 and 0.15 for the most extreme groups: men below 40 and women above 40.20 This means that there is no reason to retain different reference intervals for serum TSH, but the autoantibodies should be determined.
For autoantibodies, however, the improved sensitivity of analytical methods introduces a problem for TPOAb and TGAb as all healthy individuals have measurable concentrations, even when all information according to NACB and other tools are exhausted. The accumulated curve in a rankit plot is clearly biphasic, and if it is considered a composite distribution, the odd curve can be explained by the combination of two distributions with a certain prevalence of the ‘high’ group. Thus two subgroups, a ‘high’ and a ‘low’ are hidden in the composite distribution, and the ‘low’ has the same mean and standard deviation (in logarithmic units) for all four subgroups according to gender and age within the Lahti criteria (unpublished data).
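The idea of a composite distribution can be illustrated by mixing two log-Gaussian distributions and transforming the accumulated fraction to rankits. All parameters below (medians of 5 and 200, logarithmic standard deviations of 0.5 and 1.0, prevalence of 15% for the 'high' group) are invented; they are not the unpublished estimates referred to above.

```python
from math import log
from statistics import NormalDist


def composite_cdf(x, low, high, prevalence_high):
    """Accumulated fraction of a mixture of two log-Gaussian distributions."""
    lx = log(x)
    return (1 - prevalence_high) * low.cdf(lx) + prevalence_high * high.cdf(lx)


def rankit_z(x, low, high, prevalence_high):
    """z value of the composite accumulated fraction, as shown in a rankit plot."""
    return NormalDist().inv_cdf(composite_cdf(x, low, high, prevalence_high))


if __name__ == "__main__":
    # Hypothetical 'low' and 'high' groups, described in logarithmic units
    low = NormalDist(mu=log(5), sigma=0.5)
    high = NormalDist(mu=log(200), sigma=1.0)
    for x in (2, 5, 10, 20, 50, 100, 500, 2000):
        print(x, round(rankit_z(x, low, high, 0.15), 2))
```

Plotted against log(x), these z values follow one slope in the low range and a flatter slope in the high range, which is the biphasic appearance described above. Fitting the two components to observed data is a separate step not shown here.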
Decision Models
Bimodal Distributions
The classical model for classification of patients is based on the assumption of two different groups: a healthy (or without the disease under consideration) and a diseased (or pre-diseased), where the decision limit (cut-off point) is decided from some optimisation analysis of the percentages of false positive (FP) and false negative (FN). The characteristic assumption is that the prevalence is independent of the decision limit, whose location only changes the values of FP and FN.
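For the bimodal model, the effect of moving the cut-off can be sketched with two Gaussian distributions. The means and standard deviations are hypothetical, and the sketch assumes that the diseased group has the higher values.

```python
from statistics import NormalDist


def error_rates(cutoff, healthy, diseased):
    """Return (FP fraction among healthy, FN fraction among diseased) for a cut-off."""
    false_positive = 1 - healthy.cdf(cutoff)  # healthy above the cut-off
    false_negative = diseased.cdf(cutoff)     # diseased below the cut-off
    return false_positive, false_negative


if __name__ == "__main__":
    # Hypothetical distributions in arbitrary units
    healthy = NormalDist(mu=5.0, sigma=1.0)
    diseased = NormalDist(mu=8.0, sigma=1.0)
    for cutoff in (6.0, 6.5, 7.0):
        fp, fn = error_rates(cutoff, healthy, diseased)
        print(cutoff, round(100 * fp, 1), round(100 * fn, 1))
```

The prevalence does not appear in the function at all: moving the cut-off trades FP against FN, while the proportion of diseased individuals stays whatever it is in the population.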
Unimodal Distributions
In contrast to the bimodal distributions, the unimodal distributions have no clear biological or medical difference between healthy and diseased, and the decision about state of health is based on one (or two) measurements of a certain component using a more or less arbitrarily chosen cut-off.21 The unimodal concept is often used in relation to risk, but the decision limits are also used both for diagnosis and for decisions about treatment.
An example is cholesterol, where a decision limit of 6.2 mmol/L has often been used for initiating treatment. According to a report of the distribution of cholesterol values in a Scottish population, 55% of the men over 45 years then need treatment, or if the result is given in the same terms as for bimodal distributions, the prevalence is 55%.22 The value 6.2 mmol/L is an arbitrary figure; if a value of 7.0 mmol/L is chosen instead, the prevalence would fall to 25%.
Another unimodal interpretation is the diagnosis of diabetes according to the American Diabetes Association, where one measurement of fasting plasma glucose above 7.0 mmol/L together with other risk factors – or two measurements above 7.0 mmol/L – confirms the diagnosis. Here it is possible to plot the risk/probability of being diagnosed as diabetic as a function of the patient’s set point. With a CVwithin-subject of 5.7%,2 the probability follows an S shaped curve, with a probability of 50% for a single measurement for a patient with a set point of 7.0 mmol/L. When two measurements are used, the combined probability changes to 25% for the same set point.23 For a set point of 6.8 mmol/L, the probabilities are 31% and 10% respectively.
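The probability curve can be approximated with a simple Gaussian model. The sketch assumes that only within-subject variation with a CV of 5.7% contributes, that the standard deviation is proportional to the set point, and that two measurements are independent; the cited study may include further components, so small differences from the quoted percentages are to be expected.

```python
from statistics import NormalDist


def prob_single_above(set_point, cutoff=7.0, cv_within=5.7):
    """Probability that one measurement exceeds the cut-off for a given set point."""
    sd = set_point * cv_within / 100
    return 1 - NormalDist(mu=set_point, sigma=sd).cdf(cutoff)


def prob_two_above(set_point, cutoff=7.0, cv_within=5.7):
    """Probability that two independent measurements both exceed the cut-off."""
    return prob_single_above(set_point, cutoff, cv_within) ** 2


if __name__ == "__main__":
    for set_point in (6.4, 6.8, 7.0, 7.2, 7.6):
        print(set_point,
              round(100 * prob_single_above(set_point), 1),
              round(100 * prob_two_above(set_point), 1))
```

For a set point of 7.0 mmol/L the sketch gives 50% and 25%; for 6.8 mmol/L it gives roughly 30% and 9%, close to the 31% and 10% reported above.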
Discussion
Interpretation of the increasing volume of laboratory data is becoming more and more difficult. Even computerised delta checks and marking of ‘abnormal’ results, i.e. values outside the reference limits, cannot compensate for the many ‘false’ ‘abnormal’ results due to repeated testing (mass significance). Combining the results from several components in patterns may help, but needs computer programs designed for all the specific clinical situations.
In this presentation, I have tried to describe a number of problems which need to be solved, and have provided some tools to optimise signal-to-noise ratios, mainly by reducing the noise elements.
In monitoring of patients, the relevant within-subject variation should be used for interpretation of reference change values, and the type 2 errors should also be considered. The non homogeneity of within-subject variances is a problem, which should be further investigated. A tool for this could be computer simulations based on firm biological and clinical data, for selecting the best algorithms for monitoring (e.g. tumour markers in cancer patients), sparing the patients and saving resources and money by letting the computers perform the repetitious work.
For reference intervals, criteria for combining or partitioning subgroups are essential, as race and geography may cause differences of varying size, and partitioning keeps the variation within the reference interval for each subgroup at a reasonable level. Common reference intervals for Caucasians in the Nordic countries are possible for the most common components in clinical chemistry.
The rankit plot is a tool for investigating distributions and subgroups, for disclosing lack of homogeneity, and for judging the ‘low-risk’ groups. It is also a tool for de-convolution of composite distributions where all other tools for subgrouping have been exhausted.
Finally, the decision models are increasingly changing from the classical bimodal with a clear prevalence of the diseased group to the unimodal with gradually increasing risk for increasing (or decreasing) concentrations of the component for which the decisions are made based on an arbitrarily chosen cut-off point.
Conclusions
The signal-to-noise ratios can be improved for
- reference change values by using the relevant within-subject variation, considering the type 2 errors as well
- reference intervals by using reliable criteria for partitioning and rankit plots for validation of distributions
- decision models by using the relevant model for the purpose.
Footnotes
Competing interests: None declared
References
- 1. Fraser CG. Biological variation: from principles to practice. AACC Press, Washington, DC, USA; 2001.
- 2. Ricós C, García-Lario JV, Alvarez V, et al. Biological variation database, and quality specifications for imprecision, bias and total error (desirable and minimum). The 2004 update. http://www.westgard.com/guest26.htm Accessed 14-10-2005.
- 3. Harris EK, Yasaka T. On the calculation of a “reference change” or comparing two consecutive measurements. Clin Chem. 1983;29:25–30.
- 4. Iglesias Canadell N, Hyltoft Petersen P, Jensen E, Ricós C, Jørgensen PE. Reference change value and power functions. Clin Chem Lab Med. 2004;42:415–22. doi: 10.1515/CCLM.2004.073.
- 5. Harris EK. Effects of intra- and interindividual variation on the appropriate use of normal ranges. Clin Chem. 1974;20:1535–42.
- 6. Iglesias Canadell N, Hyltoft Petersen P, Ricós C. Power functions of the reference change value in relation to cutoff points, reference intervals and index of individuality. Clin Chem Lab Med. 2005;43:441–8. doi: 10.1515/CCLM.2005.078.
- 7. Lytken Larsen M, Fraser CG, Hyltoft Petersen P. A comparison of analytical goals for haemoglobin A1c derived using different strategies. Ann Clin Biochem. 1991;28:272–8. doi: 10.1177/000456329102800313.
- 8. Tuxen MK, Sölétormos G, Hyltoft Petersen P, Dombernowsky P. Interpretation of sequential measurements of cancer antigen 125 (CA 125), carcinoembryonic antigen (CEA), and tissue polypeptide antigen (TPA) based on analytical imprecision and biological variation in the monitoring of ovarian cancer. Clin Chem Lab Med. 2001;39:531–8. doi: 10.1515/CCLM.2001.089.
- 9. Sölétormos G, Hyltoft Petersen P, Dombernowsky P. Progression criteria for cancer antigen 15.3 and carcinoembryonic antigen in metastatic breast cancer compared to computer simulation of marker data. Clin Chem. 2000;46:939–49.
- 10. Sölétormos G, Hyltoft Petersen P, Dombernowsky P. Assessment of CA 15.3, CEA and TPA concentrations during monitoring of breast cancer. Clin Chem Lab Med. 2000;38:453–63. doi: 10.1515/CCLM.2000.066.
- 11. Jørgensen LGM, Brandslund I, Hyltoft Petersen P. Should we maintain the 95 percent reference intervals in the era of wellness testing? A concept paper. Clin Chem Lab Med. 2004;42:747–51. doi: 10.1515/CCLM.2004.126.
- 12. Lahti A, Hyltoft Petersen P, Boyd JC, Fraser CG, Jørgensen N. Objective criteria for partitioning Gaussian-distributed reference values into subgroups. Clin Chem. 2002;48:338–52.
- 13. Lahti A, Hyltoft Petersen P, Boyd J. Impact of subgroup prevalence on partitioning of Gaussian-distributed reference values. Clin Chem. 2002;48:1987–99.
- 14. Ichihara K, Itoh Y, Min W-K, et al. Diagnostic and epidemiological implications of regional differences in serum concentrations of proteins observed in six Asian cities. Clin Chem Lab Med. 2004;42:800–9. doi: 10.1515/CCLM.2004.133.
- 15. Johnson AM, Hyltoft Petersen P, Whicher JT, Carlström A, MacLennan S on behalf of the International Federation of Clinical Chemistry and Laboratory Medicine, Committee on Plasma Proteins. Reference intervals for serum proteins: similarities and differences between adult Caucasian and Asian Indian males in Yorkshire, UK. Clin Chem Lab Med. 2004;42:792–9. doi: 10.1515/CCLM.2004.132.
- 16. Rustad P, Felding P (eds). Transnational biological reference intervals. Scand J Clin Lab Invest. 2004;64:263–441.
- 17. Rustad: www.furst.no/norip/ Accessed 14-10-2005.
- 18. Hyltoft Petersen P, Blaabjerg O, Andersen M, Jørgensen LGM, Schousboe K, Jensen E. Graphical interpretation of confidence curves in rankit plots. Clin Chem Lab Med. 2004;42:715–24. doi: 10.1515/CCLM.2004.122.
- 19. Jørgensen LGM, Stahl M, Brandslund I, Hyltoft Petersen P, Borch-Johnsen K, de Fine Olivarius N. Plasma glucose reference interval in a low-risk population 2. Impact of the new WHO and ADA recommendations on diagnosis of diabetes mellitus. Scand J Clin Lab Invest. 2001;61:181–90. doi: 10.1080/003655101300133621.
- 20. Jensen E, Hyltoft Petersen P, Blaabjerg O, et al. Establishment of a serum thyroid stimulating hormone (TSH) reference interval in healthy adults. The importance of environmental factors, including thyroid antibodies. Clin Chem Lab Med. 2004;42:824–32. doi: 10.1515/CCLM.2004.136.
- 21. Hyltoft Petersen P, Hørder M. Ways of assessing quality goals for diagnostic tests in clinical situations. Arch Pathol Lab Med. 1988;112:435–43.
- 22. Tunstall-Pedoe H. Who is for cholesterol testing? Br Med J. 1989;298:1593–4. doi: 10.1136/bmj.298.6688.1593.
- 23. Jørgensen LGM, Hyltoft Petersen P, Brandslund I. The impact of variability in the risk of disease exemplified by diagnosing diabetes mellitus based on ADA and WHO criteria as gold standard. Int J Risk Assessment and Management. 2005;5:358–73.
