Abstract
Orthopaedic research, and biomedical research in general, has made enormous strides to develop treatments for conditions long thought to be inevitable or untreatable; however, there is growing concern about the quality of published research. Considerable efforts have been made to improve overall research quality, integrity, and rigor, including meaningful proposals focused on transparency of reporting, appropriate use of statistics, and reporting of negative results. However, we believe that there is another key component to rigor and reproducibility that is not discussed sufficiently – analytical validation and quality control (QC). In this commentary, we discuss QC and method validation principles and practices that are systematically applied in the clinical laboratory setting to verify and monitor analytical performance of quantitative assays, and the utility of applying similar practices to biochemical assays in the orthopaedic research setting. This commentary includes: 1) recommendations for validation and QC practices, including examples of assay performance limitations uncovered by validation experiments performed in our laboratory, and 2) a description of a continuous QC program developed to monitor the ongoing performance of commonly used assays in our lab. We hope that this commentary and the examples presented here will be thought-provoking and inspire further discussion and adaptation of analytical validation and QC procedures to advance our shared pursuit of high quality, rigorous, and reproducible orthopaedic research.
Keywords: Meniscus, Glycosaminoglycans, Cartilage, Collagen, DNA
Introduction
Orthopaedic research, and biomedical research in general, has made enormous strides to develop treatments for conditions long thought to be inevitable or untreatable; however, there is growing concern about the quality of published research. Reports in many fields have raised the alarm about limited reproducibility of results in biomedical research1; 2. Most researchers recognize that this is a legitimate concern, and considerable efforts have been made to improve overall research quality, integrity, and rigor, including meaningful proposals focused on transparency of reporting, appropriate use of statistics, and reporting of negative results3. All of these initiatives have the potential to improve the quality of research. However, we believe that there is another key component to rigor and reproducibility that is not discussed sufficiently – analytical validation and quality control (QC). While it may be taken for granted, or assumed that every laboratory has sufficiently validated their methods, achieving analytical quality is something that requires ongoing monitoring, analysis, and process improvement.
The need for ongoing analytical quality assessment has been recognized in the clinical laboratory setting for over 40 years4; 5, and QC and quality assurance (QA) practices in the clinical laboratory are a continuously evolving area of active investigation with hundreds of publications on best practices for QC and method validation appearing in clinical laboratory journals annually. Requirements for quality management practices and documentation were formalized in 1988 for all clinical laboratories in the United States by the Clinical Laboratory Improvement Act (CLIA)6. Extensive QC/QA practices are employed in all clinical laboratories and are a major part of accreditation, continuing education, and management activities.
QC and QA practices are far less common and not standardized in research laboratories. Research labs performing preclinical research for FDA approval of a therapeutic are required to follow Good Laboratory Practice (GLP), as defined in federal regulation 21 CFR Part 58; however, for any other preclinical or basic science research there is little in the way of guidance or standards for analytical quality management. Researchers receiving NIH funding are required to describe in their grant applications a plan for “authentication of key biological or chemical resources” but this requirement only applies to “resources,” such as chemicals and cell lines, without specifying validation of analytical methods or assays. This lack of focus on analytical validity of laboratory assays represents a significant area of opportunity for improving rigor and reproducibility in preclinical and basic science research.
More widespread application of well-established QC/QA and method validation practices7 in biomedical laboratory research could improve the quality of preclinical research. Therefore, we have performed simple method validation experiments for biochemical assays commonly used in our laboratory and developed a continuous QC program to monitor the performance of these assays. The examples presented below demonstrate the importance of performing such method validation to determine the analytical range and suitability of an assay for a given sample type or experimental design. We hope to raise awareness of the importance of analytical quality management practices and spur further discussion of best practices for ensuring accurate and rigorous results in biomedical research.
Method Validation
In clinical laboratory practice, even extensively characterized, FDA approved, commercially available assays are subject to in-house verification of performance before they can be used to report patient results. Furthermore, there is an acute awareness in clinical laboratory practice that introducing any novel components into an assay, including sample type or collection method, can introduce unexpected consequences and therefore requires validation. It is essential to verify that the assay performs according to the manufacturer’s specifications when utilized with different patient populations, sample types, environmental conditions, and/or staff members. In particular, for FDA approved tests, verification of precision, accuracy, reportable range, and reference interval is standard practice and required by CLIA. For in-house lab developed tests, consideration is also given to evaluating the possibility of interfering substances and sample stability concerns. Not all of these parameters apply to the research setting. However, research labs could save considerable time, money, and mental resources spent chasing erroneous results by applying similar quality management approaches, including validation for all assays and sample types regularly utilized by the lab.
Precision and Assay Range
Precision and assay range should be assessed by simple validation experiments. For example, we performed experiments to assess these parameters on assays commonly used in our laboratory, including the dimethyl methylene blue (DMMB) assay for sulfated glycosaminoglycans (sGAG), hydroxyproline (OHP) assay for collagen, and PicoGreen™ assay for DNA. Samples were prepared in experimental conditions designed to produce very high concentrations of the relevant analyte and assayed in serial dilution, spanning the assay range, to assess linearity and range on an experimentally representative sample. The upper limit of assay range is the limit of linearity. There are statistical methods for confirmation of the limit of linearity10, but visual assessment of the line of best fit is generally sufficient for research applications. However, determination of the lower assay limit requires more careful assessment. The lower limit of an assay can be determined using a few easily calculated metrics, known as the limit of detection (LOD) and limit of quantitation (LOQ)9. LOD, which identifies the lowest value that can be distinguished from zero, is calculated as follows:
LOD = mean blank value + 3.29 × (SD of blank replicates)9.
The value 3.29*standard deviation (SD) is an approximation of the 95% confidence intervals around the zero/blank and a very low sample, and therefore represents the lowest concentration distinguishable from zero with 95% confidence9.
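As a minimal sketch, the LOD calculation above can be scripted so it is computed the same way on every validation run. The blank measurements below are hypothetical values for illustration, not data from our assays:

```python
import statistics

def limit_of_detection(blank_values):
    """Estimate the LOD from replicate blank measurements using the
    simplified formula in the text: LOD = mean(blank) + 3.29 * SD(blank)."""
    mean_blank = statistics.mean(blank_values)
    sd_blank = statistics.stdev(blank_values)  # sample SD (n-1 denominator)
    return mean_blank + 3.29 * sd_blank

# Hypothetical concentrations (ug/mL) back-calculated from n=6 blank wells
blanks = [2.1, 3.4, 2.8, 4.0, 2.5, 3.2]
lod = limit_of_detection(blanks)
```

Any measurement below the returned `lod` value would then be treated as indistinguishable from zero for that assay and sample type.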
The DMMB assay used in our lab and many other orthopaedic research labs is adapted from Farndale et al.8, who described the assay as linear down to 5 µg/mL. The protocol used in our lab, which has been modified and handed down over the years in many orthopaedic research labs, included a standard curve ranging from 3.125 µg/mL to 200 µg/mL. However, an LOD calculation based on replicate measurements of our zero standard (n=6) indicated a lower bound of 11.9 µg/mL. In the protocol as written, two standards fall below the calculated LOD (Figure 1A). Most concerning, these standards lead the user to believe that measurements in this range are usable, when in reality these points are outside the detectable range of the assay. Indeed, in our dilution series of a high concentration experimental sample, measurements that fell above the lowest standard but below the calculated LOD gave discordant results compared to measurements between the LOD and the highest standard (Figure 1B).
Figure 1:
(A) Representative standard curve for DMMB assay. Limit of detection (LOD) was determined to be 11.9 μg/mL. (B) Calculated concentration of sGAG (µg/mL; measured concentration*dilution factor) from a serial dilution of an unknown sample. Measurements below the LOD return spurious results (Mean ± SD, n=3).
Another important consideration in assay validation is the LOQ, which is defined as the lowest value that “can not only be reliably detected but at which some predefined goals for bias and imprecision are met”9. Therefore, LOQ is generally determined by assay precision, or the reproducibility of replicate measurements. In the clinical laboratory, the exact criteria for LOQ vary based on the clinical application of the assay. However, in the absence of clear clinical criteria to guide determination, a common definition for LOQ is the lowest concentration at which the assay imprecision is less than 20%9, as indicated by the percent coefficient of variation (% CV = SD/mean*100). As is common for most assays, we observed that imprecision was inversely proportional to signal for the DMMB assay (Figure 1B), and by calculating the % CV for each point on the standard curve and experimental sample dilution series, we calculated an LOQ of approximately 20 µg/mL. It is ultimately up to the discretion of the researcher to determine the criteria used to define assay range, but a precision-based LOQ may be the most appropriate lower bound for research use, as statistical comparison of results from experimental groups is generally based on the assumption that variability between biological replicates is far greater than variability between technical replicates. It is important to note that these parameters are specific to the performing laboratory and may be affected by numerous variables, such as protocol, reagents, sample type or preparation, operator, measurement device, and diluent. Therefore, these values are not generalizable to other labs and should be empirically determined by each lab in order to understand the performance characteristics of an assay for its given application.
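A precision-based LOQ determination can be sketched the same way. The replicate values below are hypothetical, and the 20% CV goal is the common definition cited above, not a universal criterion:

```python
import statistics

def percent_cv(replicates):
    """Percent coefficient of variation: SD / mean * 100."""
    return statistics.stdev(replicates) / statistics.mean(replicates) * 100

def limit_of_quantitation(dilution_series, cv_goal=20.0):
    """Return the lowest concentration whose replicate %CV meets the
    imprecision goal. `dilution_series` maps nominal concentration
    to a list of replicate measurements."""
    passing = [conc for conc, reps in dilution_series.items()
               if percent_cv(reps) < cv_goal]
    return min(passing) if passing else None

# Hypothetical replicate measurements (ug/mL) at each dilution point
series = {
    100.0: [98.0, 101.0, 102.0],
    50.0:  [48.5, 51.0, 49.5],
    20.0:  [19.0, 22.0, 20.5],
    10.0:  [6.0, 13.5, 10.0],   # imprecision blows up near the LOD
}
loq = limit_of_quantitation(series)
```

Tabulating `percent_cv` across the whole dilution series also documents how imprecision grows as the signal falls, which is the pattern noted for the DMMB assay above.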
Interfering Substances
An extensive workup for the presence of interfering substances may be beyond the scope of the limited validation that we are suggesting for a research-use assay. However, performing a dilution series on an experimentally generated sample provides an opportunity to detect interfering substances. When a high concentration sample is diluted in sample diluent, deviations from linearity within the range of the linear standard curve indicate the presence of an interfering substance. For example, such a deviation was observed in our PicoGreen™ assay validation. The standard curve, prepared in sample digestion buffer, was linear up to 2,000 ng/mL (Figure 2A). However, when a sample of digested meniscus tissue was prepared in serial dilution, it lost linearity around 1,600 ng/mL (Figure 2B). In this case, a minimum dilution must be applied to meniscus tissue digests in order to prevent inaccurate results. Importantly, these specific results are also not generalizable, as the interference effect is likely dependent on the amount and type of digested tissue. While these findings are not surprising, they highlight the importance of validating assay performance for a given sample type or experimental setup. A researcher who only assesses the linearity of the standard curve could easily report erroneous results if they have not validated the performance and range of the assay using a sample that is representative of experimental conditions.
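The dilution-linearity check described above can be automated as a simple recovery comparison: back-calculate the concentration at each dilution and flag points that disagree with the rest of the series. The measurements, dilution factors, and 15% tolerance below are illustrative assumptions, not part of our protocol:

```python
import statistics

def flag_nonlinear_dilutions(measurements, tolerance_pct=15.0):
    """Flag dilution points whose back-calculated concentration
    (measured value * dilution factor) deviates from the series
    median by more than `tolerance_pct`. `measurements` is a list
    of (dilution_factor, measured_value) pairs."""
    back_calc = [(df, meas * df) for df, meas in measurements]
    median = statistics.median(calc for _, calc in back_calc)
    return [df for df, calc in back_calc
            if abs(calc - median) / median * 100 > tolerance_pct]

# Hypothetical dilution series of a digested-tissue sample: the least
# dilute point reads low, suggesting suppression by the sample matrix
data = [(2, 1500.0), (4, 1000.0), (8, 510.0), (16, 248.0)]
flagged = flag_nonlinear_dilutions(data)
```

A flagged low dilution factor, as in this sketch, is the signature of matrix interference at high sample concentration and suggests a minimum dilution is needed for that sample type.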
Figure 2:
(A) Standard curve for PicoGreen™ assay in papain, showing linearity up to 2,000 ng/mL with the DNA standard. (B) Dilution series of papain digested meniscus tissue with unknown concentration shows loss of linearity at approximately 1,600 ng/mL, indicating interference from the sample matrix (Mean ± SD, n=3).
Continuous Quality Control
In order to monitor assay performance and repeatability over time, a continuous QC program similar to what is employed in clinical laboratories is necessary. In our laboratory, we prepared large quantities of material representative of experimental sample types used in our lab, including papain digested meniscus tissue and supernatant from cultured tissue explants. These samples were prepared to have a high concentration of the analytes of interest. Concentrations of each analyte were measured, and two dilutions were prepared for each assay, one falling near the upper and one near the lower end of the range for each assay. These high and low concentration QC material samples were then frozen in hundreds of single-use aliquots. We recommend preparing enough aliquots for at least a year’s worth of assays, so that assay performance can be monitored over a significant period of time before preparing a new batch of QC material. Each time an assay is performed, a high and low QC material aliquot is thawed and run alongside the experimental samples. This allows assay performance and stability to be monitored over time and provides a measure of overall assay variability. Performing continuous QC in this way alerts users to issues affecting assay performance, such as deterioration of an assay reagent or a problematic new lot or supplier of any assay component. Assay performance over time can be monitored using a Levey-Jennings plot11. In our laboratory, lab members enter their measured QC values into an Excel sheet (see Supplemental data for template) embedded in a shared OneNote notebook to generate Levey-Jennings plots (Figure 3), using a method similar to that described by Sharma12.
Mean and SD are calculated based on the first 20 values obtained, then performance is monitored over time by observing if subsequent measurements fall within the expected range based on the historical distribution; that is, 95% of subsequent measurements are expected to be within 2 SD of the mean, and > 99% should be within 3 SD of the historic mean. Values that deviate more than 3 SD from the historical mean, or multiple values greater than 2 SD from the historical mean (i.e., both QC levels for one run or one value each on consecutive runs) should alert the user that there may be an issue with assay performance based on the “Westgard rules”13; 14. Additionally, drift in assay performance should be visually apparent on the Levey-Jennings plot if numerous consecutive values are on the same side of the historical mean or trending in one direction. After the initial setup, this system requires very little effort on the part of the researcher performing an assay, as the QC samples are easily integrated into assay setup. This small effort is well worth it, as it quickly alerts the user to issues with assay performance, preventing wasted time and resources, and/or drawing of false conclusions based on spurious assay results.
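The checks described above can be sketched as a small function. The rules implemented here (1-3s, 2-2s, and a six-point drift run) are a subset of the Westgard rules chosen for illustration, and the QC values and historical statistics are hypothetical:

```python
def check_westgard(values, mean, sd):
    """Screen a series of QC measurements against a subset of the
    Westgard rules: 1-3s (one value beyond 3 SD), 2-2s (two consecutive
    values beyond 2 SD on the same side of the mean), and a simple
    drift check (six consecutive values on one side of the mean).
    Returns (rule, index) pairs for each violation."""
    z = [(v - mean) / sd for v in values]
    violations = []
    for i, zi in enumerate(z):
        if abs(zi) > 3:
            violations.append(("1-3s", i))
        if i >= 1 and abs(zi) > 2 and abs(z[i - 1]) > 2 and zi * z[i - 1] > 0:
            violations.append(("2-2s", i))
    run = 1
    for i in range(1, len(z)):
        run = run + 1 if (z[i] > 0) == (z[i - 1] > 0) else 1
        if run == 6:  # flag once, when a six-point run first appears
            violations.append(("drift", i))
    return violations

# Hypothetical QC history against a historical mean of 100 and SD of 5
qc = [101.0, 99.0, 104.0, 118.0, 96.0, 102.0]
alerts = check_westgard(qc, mean=100.0, sd=5.0)
```

In this sketch the fourth value trips the 1-3s rule, which would prompt troubleshooting before any experimental results from that run are accepted.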
Figure 3:
Example of Levey-Jennings control chart generated with a simple Excel template for monitoring of QC values. Dashed lines represent mean (green), mean ± 2 SD (yellow), and mean ± 3 SD (red).
Further Considerations
We believe that the suggestions described above represent a minimum set of validation/QC methods for quantitative assays in the research setting. These approaches do not impose a significant burden on the laboratory and greatly increase researchers’ awareness of the performance characteristics, limitations, and analytical validity of their assays. Verification of precision and assay range can be accomplished in a single quick experiment involving a standard curve and a dilution series of an unknown, high concentration experimental sample. While clinical laboratories use at least 40 samples run across multiple days to verify these parameters, a smaller number of samples is likely sufficient for the research setting. A duplicate run of the standard curve and a dilution series with at least n=6 technical replicates may be a reasonable minimum. If the assay will be used for longitudinal measurements, where results generated at multiple time points will be pooled or compared, then between-run precision should be assessed by repeat measurement of aliquots from experimentally representative samples (for example, a low, medium, and high sample), and total variability should be determined over at least three separate runs on different days.
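A simple estimate of total (within-run plus between-run) imprecision from such a design might be computed by pooling all measurements of the same QC aliquot across runs. The duplicate-per-day layout and values below are hypothetical:

```python
import statistics

def total_cv(runs):
    """Pool replicate measurements of the same QC material from
    multiple runs (e.g., three runs on different days) and report the
    overall %CV as a crude estimate of total imprecision, combining
    within-run and between-run variability."""
    pooled = [v for run in runs for v in run]
    return statistics.stdev(pooled) / statistics.mean(pooled) * 100

# Hypothetical QC sample measured in duplicate on three different days
runs = [[98.0, 100.0], [103.0, 105.0], [95.0, 97.0]]
cv = total_cv(runs)
```

A more formal analysis would partition within-run and between-run components (e.g., by ANOVA), but for judging whether results from different days can reasonably be pooled, this overall %CV is often informative enough.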
An additional parameter required for validation of FDA approved clinical assays is accuracy. This is usually assessed by comparison of the method under investigation to another available assay method for the same analyte15 and/or by verification against a reference standard whose concentration has been determined to a very high degree of certainty16. In clinical practice, standardization to a “traceable” international reference standard requires significant effort and is standard practice for only a limited number of analytes17. While this type of verification of accuracy may be desirable to produce results that can be objectively compared with results from the same assay in a different lab, it is not feasible in most cases in the research setting.
Another parameter that should be considered in method validation is sample stability. A number of sample aliquots should be subjected to relevant storage conditions, such as freeze-thaw cycles and various lengths of time at storage temperatures, and assayed to determine concordance of results. This small amount of time and resources spent to determine sample stability parameters may pay dividends by preventing usage of degraded samples and/or QC material and guiding appropriate experimental design to ensure quality samples are used for analysis.
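A stability assessment of this kind reduces to comparing stressed aliquots against a baseline measurement. The freeze-thaw values and the 10% acceptance limit below are illustrative assumptions, not validated criteria:

```python
def stability_deviation(baseline, stressed):
    """Percent difference of a stressed aliquot (e.g., after freeze-thaw
    cycles or extended storage) from its baseline measurement."""
    return (stressed - baseline) / baseline * 100

# Hypothetical sGAG results (ug/mL) after 1-3 freeze-thaw cycles
baseline = 50.0
cycles = {1: 49.5, 2: 48.0, 3: 42.5}

# Flag storage conditions deviating more than 10% from baseline
unstable = [c for c, v in cycles.items()
            if abs(stability_deviation(baseline, v)) > 10.0]
```

In this sketch, the third freeze-thaw cycle exceeds the limit, which would argue for sub-aliquoting samples and QC material so that no aliquot is thawed more than twice.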
Discussion
While the need to validate assays and assess the quality of data may seem obvious, the practices necessary to ensure analytical quality of results are not. As in the clinical lab, quality management in a research setting will not be achieved by a one-size-fits-all prescription; rather, it requires knowledge, awareness, and ongoing process improvement. The example data presented here are not generalizable to other research labs. Instead, the principles and tools discussed in this commentary to evaluate performance for each assay and sample type can be applied in all labs to establish a culture and practice of analytical quality control.
As with other improvements to rigor and reproducibility, the greatest question is how to promote implementation and sustained practice of analytical quality management. Analytical methods and experimental setups are even more varied in the research lab than in the clinical laboratory setting, and even in the clinical lab CLIA does not mandate specific quantitative measures of quality management. Thus, it is far more challenging to prescribe a formula for appropriate validation and QC in the research lab setting, and professional judgment by individual researchers is necessary to determine the validation practices appropriate for each assay. Education on validation and QC practices should be a priority, with emphasis placed on the fact that these practices save the time and resources otherwise wasted running multiple assays with degraded reagents or following up on spurious results. In our laboratory, the availability of consistent QC material and a QC data monitoring system has already proven valuable: when COVID-related supply disruptions necessitated changes in reagent suppliers, the effects of the new reagents on assay performance were identified immediately.
We hope that this commentary and the examples reported here will be thought-provoking and provide useful illustrations of the utility of analytical validation and QC for biochemical assays. We aim to inspire further discussion and awareness of the need for analytical validation and quality control to advance our shared pursuit of high quality, rigorous, and reproducible orthopaedic research.
Supplementary Material
Summary Recommendations
Assay validation recommendations:
- Prepare standard curve and dilution series of experimentally representative sample (“unknown”) spanning assay range (n=6 technical replicates for each standard and unknown).
- Assess upper limit of linearity based on unknown sample dilution.
- Calculate LOD from zero standard (mean of blank + 3.29*SD of blank).
- Calculate %CV for each dilution of the unknown. Note variability in precision across assay range.
- Determine LOQ (lowest concentration with %CV < 20%).
- Note differences in assay performance between standards (pure reference material + diluent) and experimentally representative samples (complex sample with potential interfering substances).
- Perform validation for every sample type/assay combination, and when changes are made in assay formulation or sample composition.
QC recommendations:
- Prepare pools of experimentally representative sample material near the upper and lower ends of assay range.
- Store individual aliquots (enough for > 1 year of assay use or as much as practical) at ≤ −70°C.
- Perform measurement of high and low QC material with experimental samples every time assay is performed.
- Calculate mean and SD of first 20 assay runs. Monitor assay performance over time by plotting results on Levey-Jennings plot.
Investigate QC failures:
- One measurement >3 SD or multiple measurements >2 SD from mean, or trending measurements (multiple consecutive measurements on one side of mean or increasing in one direction) indicate assay performance issues and should prompt troubleshooting of assay components, preparation, or measurement system.
References
1. Begley CG, Ellis LM. 2012. Raise standards for preclinical cancer research. Nature 483:531–533.
2. Prinz F, Schlange T, Asadullah K. 2011. Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery 10:712.
3. Casadevall A, Ellis LM, Davies EW, et al. 2016. A Framework for Improving the Quality of Research in the Biological Sciences. mBio 7:e01256-16.
4. Howanitz PJ, Howanitz JH. 1983. Quality control for the clinical laboratory. Clinics in Laboratory Medicine 3:541–551.
5. Annino JS. 1978. What Does Laboratory “Quality Control” Really Control? New England Journal of Medicine 299:1130–1131.
6. Ehrmeyer SS, Laessig RH. 2004. Has compliance with CLIA requirements really improved quality in US clinical laboratories? Clinica Chimica Acta 346:37–43.
7. Linnet K, Moons KGM, Boyd JC. 2023. Statistical methodologies in laboratory medicine: Analytical and Clinical Evaluation of Laboratory Tests. In: Tietz Textbook of Laboratory Medicine, Seventh Edition. St. Louis, MO: Elsevier.
8. Farndale RW, Buttle DJ, Barrett AJ. 1986. Improved quantitation and discrimination of sulphated glycosaminoglycans by use of dimethylmethylene blue. Biochimica et Biophysica Acta 883:173–177.
9. Armbruster DA, Pry T. 2008. Limit of blank, limit of detection and limit of quantitation. Clin Biochem Rev 29 Suppl 1:S49–S52.
10. Jhang JS, Chang C-C, Fink DJ, et al. 2004. Evaluation of Linearity in the Clinical Laboratory. Archives of Pathology & Laboratory Medicine 128:44–48.
11. Westgard JO, Barry PL, Hunt MR, et al. 1981. A multi-rule Shewhart chart for quality control in clinical chemistry. Clinical Chemistry 27:493–501.
12. Sharma D. 2011. Use of Microsoft Excel for automated plotting of Levey-Jennings charts. Indian Journal of Medical Microbiology 29:448–449.
13. Carroll TA, Pinnick HA, Carroll WE. 2003. Probability and the Westgard Rules. Annals of Clinical and Laboratory Science 33:113–114.
14. “Westgard Rules” and Multirules. 2019. https://www.westgard.com/mltirule.htm. Last accessed 11/14/22.
15. Pum J. 2019. A practical guide to validation and verification of analytical methods in the clinical laboratory. Advances in Clinical Chemistry 90:215–281.
16. Vesper HW, Thienpont LM. 2009. Traceability in laboratory medicine. Clin Chem 55:1067–1075.
17. Vesper HW, Myers GL, Miller WG. 2016. Current practices and challenges in the standardization and harmonization of clinical laboratory tests. The American Journal of Clinical Nutrition 104 Suppl 3:907S–912S.