The Journal of Clinical Endocrinology and Metabolism. 2015 Apr 7;100(6):2165–2170. doi: 10.1210/jc.2015-1040

Measuring Estrogen Exposure and Metabolism: Workshop Recommendations on Clinical Issues

L M Demers 1, S E Hankinson 1, S Haymond 1, T Key 1, W Rosner 1, R J Santen 1, F Z Stanczyk 1, H W Vesper 1, R G Ziegler 1
PMCID: PMC5393513  PMID: 25850026

The Pooks Hill workshop on “Measuring Estrogen Exposure and Metabolism” addressed key clinical and research issues reflected by the title. This meeting brought together 25 experts as speakers and an additional 75 attendees. Principal sponsors for the meeting were The Endocrine Society, the American Association for Clinical Chemistry (AACC), and the Partnership for Accurate Testing of Hormones (PATH), with meeting coordination by Penn State Continuing Education. After the 25 formal presentations and discussion, time was devoted to obtaining opinions on clinical issues related to the measurement of the estrogens and their metabolites. The discussion regarding research issues and the opinions expressed will be formulated and published separately. The clinical recommendations of the workshop participants are highlighted below, and the following paragraphs summarize the conclusions in more detail, as compiled by the workshop steering committee, the authors of this manuscript. A special issue of the journal Steroids contains the manuscripts summarizing most of the individual presentations.

Recommendations

  • Stakeholders should acknowledge the importance of standardization and harmonization and thus support the efforts of the Centers for Disease Control and Prevention (CDC) or alternative strategies to arrive at measurement methods for estrogens and their metabolites (see Glossary for definitions) that are accuracy-based and reliable.

  • Both immunoassays and mass spectrometry-based assays for estrogens and their metabolites (see Glossary for definitions) would be acceptable if they are accurate and reliable and meet performance criteria suitable for their intended use.

  • A study to evaluate the different approaches for validation of estrogen assay accuracy should be conducted to assess the general applicability and to obtain further information about the strengths and limitations of each approach.

  • Reference ranges for estradiol need to be established in postmenopausal women and in prepubertal and pubertal boys and girls (age and Tanner stage-specific within each gender) using validated, standardized assays.

  • The Standards of Reporting Diagnostic Accuracy (STARD) criteria should be updated to require data on a clinical assay's accuracy (trueness) or bias compared to an accepted reference method.

  • It was recommended that journals, at this time, require a statement regarding accuracy assessment for estrogen and estrogen metabolite assays with a long-term goal of requiring measurements using accurate assays.

Importance of Standardization

Sex steroid hormone measurements are widely employed in patient care using either immunoassay or mass spectrometry-based methods. Problems associated with the accuracy and precision of these measurements, particularly with immunoassays in select populations, have been reported for several years (1, 2). The recognition of the negative impact this has had on research and clinical management led to recommendations not to use certain assays for clinical care and scientific publication (3, 4). Efforts are under way to address these limitations and the need for improved hormone tests to enable better patient care. Emanating from a meeting of experts sharing a broad perspective and interest in testosterone (T) standardization, great progress has been made toward standardizing T assays (4). In contrast, assays for estrogens and their metabolites currently lack adequate performance and standardization (3). This impacts clinical decision-making and patient care, particularly at low estrogen concentrations, because measurements by different methods may not yield the same numeric result. Reference ranges for estrogens also vary based on the specific assay used and may not be available for relevant populations, which, due to assay inadequacies, further complicates appropriate clinical care.

Once methods are standardized or harmonized, reference ranges will need to be established in postmenopausal women as well as prepubertal and pubertal boys and girls. These ranges should be age- and puberty stage-specific, within each gender. Clinical laboratories performing estrogen measurements have often established such ranges, but these differ across labs and do not translate across methods.

Meeting attendees generally agreed that stakeholders should acknowledge the importance of standardization and harmonization for clinical laboratory measurement procedures and support the efforts of the CDC or other alternative strategies described below to arrive at estrogen methods that are accuracy-based and reliable. Standardization is possible for analytes for which primary (pure substance) reference materials and/or reference measurement procedures are available. A roadmap for harmonization of laboratory measurement procedures was published in 2011 and describes approaches to harmonize results for analytes that do not have primary reference materials or reference method procedures (5). The CDC Hormone Standardization (HoSt) Program consists of three steps: developing a reference system, calibrating individual assays to a single reference material, and certifying end-user test performance as meeting requirements for accuracy and precision over time. The program is designed for assay manufacturers, proficiency testing providers, and those employing laboratory-developed tests. It offers technical assistance to those enrolled to meet defined criteria for accuracy and precision. The standardization process for T is in place, and that for estradiol is in progress. A key point of this program is the use of single donor human matrix-matched (ie, serum or plasma) materials for commutability. The use of a mass spectrometry-based assay does not guarantee certification. As with immunoassays, mass spectrometry assays can exhibit inaccuracy, limited specificity, and imprecision resulting in differences among mass spectrometry results. Although variations in accuracy are most often due to the standards used for calibration, inaccuracy can also result from interference and matrix effects, especially at low estrogen concentrations. 
Although costly, the CDC methods have succeeded in standardizing steroid methods for the measurement of T and vitamin D (4; http://www.cdc.gov/labstandards/pdf/hs/CDC_Certified_Vitamin_D_Procedures.pdf).

Immunoassay vs Mass Spectrometry-Based Assays

Mass spectrometry-based methods have gained popularity and preference for measurement of estrogens and their metabolites in specific populations due to their improved sensitivity, specificity, and ability to multiplex. However, the instrumentation for mass spectrometry assays is expensive, and assays to quantify estrogen metabolites are technically demanding. Therefore, despite clear limitations, many clinical laboratories continue to use direct (nonextraction, nonchromatography) immunoassays because they are inexpensive, are easy to use, and provide rapid results. The key consideration is that estrogen assays for clinical research, epidemiological studies, and patient care need to be validated using stringent criteria regardless of the methodology used. This was considered a key issue by the conference attendees. Most attendees reached the agreement that both immunoassays and mass spectrometry-based assays for estrogens and estrogen metabolites would be acceptable if they are accurate and reliable, meeting performance criteria suitable for their intended use.

Use of currently available direct estradiol immunoassays appears to compromise assay specificity, probably due to antibody cross-reactivity, matrix effects, and suboptimal sensitivity. These direct assays usually yield higher values than those measured by methods with superior specificity, including immunoassays preceded by organic solvent extraction, liquid chromatography, or gas or liquid chromatography-mass spectrometry assays. The inaccuracy of direct immunoassays is especially problematic in patients with very low estrogen concentrations, which includes men, postmenopausal women, those taking aromatase inhibitors, and prepubertal children (2, 6). Thus, the use of current direct assays for measuring estradiol concentrations in these groups of patients should be discouraged.

Method Validation

There was substantial discussion about method validation and what is required to validate assay accuracy. Accuracy is defined as how closely a result represents the true value in the sample (ie, trueness), as determined by a “gold standard” or reference method. Assay inaccuracy complicates the comparison of results obtained from different laboratories, from the same laboratory over time, or from different epidemiological and clinical studies and constrains the application of guidelines to individual and/or groups of patients. A set of clinical practice guidelines cannot be implemented unless the assays used are accurate and precise.

The key elements of method validation for a clinical assay include accuracy, precision, specificity, sensitivity, and the establishment of a reportable range and a reference range. Measurement of estradiol is challenging because the physiologically relevant concentration range spans at least four orders of magnitude. Thus, accuracy and precision across this wide concentration range may not be consistent, and an assay that is suitable for use for diagnosis and management of infertility in adult women may not meet performance specifications for use when evaluating the onset of puberty in a child. Two particular areas are problematic in the use of current estradiol assays: measurements in postmenopausal women and in pediatric subjects where sensitivity is a critical issue. Method validation studies are designed at the discretion of the laboratory director, but guidelines for clinical validation of quantitative methods are available (7–10). The CDC HoSt program has developed initial criteria as part of the standardization efforts, but further investigation is needed to determine their general applicability as method validation criteria. At this time, workshop attendees recommended no specific requirements for accuracy and precision, but it was agreed that performance must be appropriate to the question being addressed. The lower limit of quantitation (sometimes referred to as sensitivity) of the assay also needs to be adequately validated and appropriate for the patients being tested. For example, the lower limit of quantitation of estradiol assays for postmenopausal women needs to be substantially lower than that used in pubertal children or premenopausal women. This requires that the concentrations of the calibration and certified reference materials used be appropriate for the values of the samples to be assayed.
Laboratories should follow best practices, verify manufacturers' claims for Food and Drug Administration-cleared assay performance in their specific testing populations, and ensure that performance is adequate over the clinically relevant concentration range. This includes verification of claimed accuracy, precision, reportable range, and reference intervals (11).

Requirement for Accuracy

Assay accuracy is commonly determined by comparing the measured results to those expected, based on a given concentration determined by a reference method procedure. Thus, certified reference materials are essential for determining assay accuracy. The target values are assigned to these reference materials using metrological reference methods (or reference measurement procedures) set out by several national and international bodies. Metrological reference laboratories differ from clinical laboratories. Reference measurement procedures are codified and provide the highest possible level of specificity, accuracy, and precision and are not intended for use in patient care or clinical or epidemiological research. Furthermore, the quality of the reference material used to assess measurement accuracy is crucial. Thus, high-quality reference materials with target values assigned by a metrological reference method are necessary to assess and describe assay accuracy.
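
As an illustration of comparing measured results with assigned target values, the following minimal sketch computes percent bias at several reference-material levels. All concentrations, the replicate results, and the 10% allowable-bias limit are hypothetical values chosen for the example; the workshop did not recommend specific limits.

```python
from statistics import mean


def percent_bias(measured, target):
    """Percent bias of the mean of replicate results against an assigned target value."""
    return 100.0 * (mean(measured) - target) / target


# Hypothetical reference materials: target value (pg/mL) assigned by a
# reference measurement procedure, mapped to replicate results from the
# assay under evaluation.
reference_levels = {
    5.0: [5.6, 5.9, 5.5],
    50.0: [51.0, 49.5, 50.5],
    500.0: [498.0, 505.0, 497.0],
}

# Illustrative limit only (an assumption, not a workshop criterion).
ALLOWABLE_BIAS_PCT = 10.0

for target, replicates in reference_levels.items():
    bias = percent_bias(replicates, target)
    status = "within" if abs(bias) <= ALLOWABLE_BIAS_PCT else "outside"
    print(f"target {target:g} pg/mL: bias {bias:+.1f}% ({status} allowable limit)")
```

In this made-up data set the bias is largest at the lowest concentration, which mirrors the concern that inaccuracy is most problematic at low estrogen concentrations.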

Methods for Determining Analytical Accuracy

The assurance of assay accuracy is a complex process, which is often limited by expense. There are several methods for evaluating the accuracy of clinical assays, which vary in degree of excellence and cost. A first-order level of determination, the most expensive, involves programs such as the National Glycohemoglobin Standardization Program. Such programs use multiple levels of reference materials (20–40 different levels) with “true” values assigned by a recognized reference method. The accuracy assessment follows generally recognized protocols (12). In all approaches, bias should be evaluated and be within clinically allowable limits. Linear regression analysis and Bland-Altman type plots are also recommended to further assess any concentration-dependent bias. Described below are several alternative approaches for determining accuracy, each of which is associated with less cost than a first-order method.

  1. Certified, serum-based reference materials such as those offered by metrological institutions such as National Institute of Standards and Technology (NIST) can be used. These materials consist of pooled sera and are available at two to four different analyte concentrations. A potential complicating factor is that some of these materials may have been modified to a degree such that they will not allow the intended correct assessment of measurement accuracy. This problem usually results from matrix effects that influence some assay methods and not others, a characteristic of reference materials called “noncommutability.” In addition, reliance on only two to four analyte concentrations prevents evaluation of validity over a broad range of physiological values. Furthermore, assay specificity can be related to the sex, age, and menopausal status of the subject, and a limited number of analyte preparations may not identify the related differences in the validity of an assay. Generally recognized protocols and procedures to perform an accuracy assessment using these reference materials are limited. Assays are performed by the laboratories using these materials and are normally not reviewed by an independent organization.

  2. Accuracy-based surveys are made available by several proficiency testing providers. The materials used in these surveys consist of a number of different analyte concentrations in an artificial matrix (ie, not serum or plasma). Some of these materials may be noncommutable and thus unsuitable for the correct assessment of accuracy. Generally recognized protocols and procedures to perform an accuracy assessment using these materials are limited.

  3. Comparing results on similar specimens between or among laboratories represents another method. Generally recognized protocols and statistical procedures exist to make this comparison (13). These kinds of studies are not normally monitored by an independent organization but are used by participating laboratories to help ensure comparability and accuracy.
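
The bias evaluation, linear regression analysis, and Bland-Altman type analysis recommended for all of these approaches can be sketched as follows. This is a minimal illustration, not a validated protocol: the paired results are hypothetical, and ordinary least squares is used only for simplicity (it assumes the comparison method is free of error, and other regression techniques are often preferred in method-comparison studies).

```python
from statistics import mean, stdev


def bland_altman(test, reference):
    """Return mean difference (bias) and approximate 95% limits of agreement."""
    diffs = [t - r for t, r in zip(test, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def ols_fit(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical paired estradiol results (pg/mL) on the same specimens:
# comparison (reference) method versus the assay under evaluation.
reference = [4.0, 8.0, 15.0, 30.0, 60.0, 120.0]
test = [7.0, 10.5, 17.0, 31.5, 61.0, 119.0]

bias, lower, upper = bland_altman(test, reference)
slope, intercept = ols_fit(reference, test)

print(f"mean bias: {bias:.2f} pg/mL, limits of agreement: {lower:.2f} to {upper:.2f}")
print(f"regression: test = {slope:.3f} * reference + {intercept:.2f}")

# Points for a Bland-Altman type plot: (mean of the two methods, difference).
for t, r in zip(test, reference):
    print(f"{(t + r) / 2:7.2f}  {t - r:+.2f}")
```

A slope that differs from 1 suggests proportional bias, and an intercept that differs from 0 suggests constant bias; a trend in the differences across the concentration range points to concentration-dependent bias.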

Study to Compare Methodologies for Determining Accuracy

At the workshop, it was recommended that a study to evaluate the different approaches for validating accuracy be conducted to assess the general applicability and to obtain further information about the strengths and limitations of each approach. Funding might be obtained from large commercial reference laboratories and/or other interested sources. Suggestions were put forward that PATH, The Endocrine Society, and the AACC might collaboratively undertake this effort. This will be particularly important for estrogen metabolites for which there are no certified reference materials or reference measurement procedures likely to be developed.

Advice to Journal Editors

The group concurred that recommendations about estradiol assays be developed and made available to journal editors to aid in the assessment of assay validity in submitted manuscripts.

Whether accuracy should be considered an essential component of assay validity in submitted manuscripts and whether this requirement should be imposed by journal editors remain core questions not only for estradiol assays but also for all estrogen and estrogen metabolite assays. No specific requirements for accuracy and precision (ie, coefficients of variation between and within analytical runs at specific concentrations) were recommended, but it was suggested that performance must be appropriate to the question in the manuscript being addressed. Workshop attendees recommended that authors report assay precision, as well as accuracy and sensitivity over the time period that the samples were being assayed.
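
For readers unfamiliar with how within-run and between-run coefficients of variation (CVs) are obtained, here is a minimal sketch for a single control concentration. The data are hypothetical, equal numbers of replicates per run are assumed, and the between-run figure is simply the CV of the run means rather than a formal variance-components estimate.

```python
from math import sqrt
from statistics import mean, stdev, variance


def within_run_cv(runs):
    """Pooled within-run CV (%), assuming the same number of replicates in every run."""
    pooled_variance = mean([variance(run) for run in runs])
    grand_mean = mean([value for run in runs for value in run])
    return 100.0 * sqrt(pooled_variance) / grand_mean


def between_run_cv(runs):
    """CV (%) of the run means; a simplification of a variance-components analysis."""
    run_means = [mean(run) for run in runs]
    return 100.0 * stdev(run_means) / mean(run_means)


# Hypothetical quality-control results (pg/mL) for one low-concentration
# control, measured in triplicate in four analytical runs.
control_runs = [
    [9.8, 10.4, 10.1],
    [10.9, 11.2, 10.6],
    [9.5, 9.9, 9.7],
    [10.3, 10.0, 10.6],
]

print(f"within-run CV:  {within_run_cv(control_runs):.1f}%")
print(f"between-run CV: {between_run_cv(control_runs):.1f}%")
```

Repeating the calculation with controls at several concentrations, over the period in which study samples were assayed, would give the concentration-specific precision figures that the attendees asked authors to report.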

Currently, journals do not require that assay results be traceable to a “gold standard.” It was recommended that journals, at this time, require a statement regarding accuracy assessment, with the long-term goal of requiring accurate assays. The methodology must describe the traceability of the calibration standards used for the assay. As an example of this description, analyte “A” was measured using a kit from the “ABC” company with validity based on commutable standards from a national metrology institute such as NIST (National Institute of Standards and Technology) or the Institute for Reference Materials and Measurements.

The information obtained from the planned study of the various accuracy assessment approaches should aid reviewers and editors to assess the appropriateness of the accuracy information provided in a given manuscript.

Ten years ago, a set of standards for publications dealing with the diagnostic accuracy of clinical laboratory tests was published and adopted by many clinical journals (14). The STARD criteria require a full description of the analytical method, including the assay's precision and the calibration standards used. STARD does not currently require inclusion of data regarding the analytical accuracy of the test. Attendees at the workshop recommended that the STARD criteria be updated to require data on a clinical assay's accuracy (trueness) or bias compared to an accepted reference method.

Glossary*

Assay accuracy is defined as how closely the result represents the true values in the sample (ie, trueness) as determined by measurement with a “gold standard” or reference method procedure.

Certified reference material (CRM) is a reference material accompanied by documentation issued by an authoritative body that provides one or more specified property values with associated uncertainties and traceabilities using valid procedures. CRMs may be used to calibrate and validate analytical measurement methods or to verify metrological traceability of other reference materials.

Commutability is the ability of a reference material to have interassay properties comparable to the properties demonstrated by authentic clinical samples when measured by more than one measurement procedure.

Estrogen is a general term that refers to estradiol, estrone, and estriol.

Estrogen metabolites is a general term that refers primarily to the 2-, 4-, and 16-hydroxyestrone and estradiol metabolites and their methylated, glucuronidated, and sulfated derivatives.

“Gold standard” is a term used to describe the benchmark method that is considered the best available in a given scenario.

Harmonization refers to any process that enables the establishment of equivalence of reported values produced by different measurement procedures for the same analyte (ie, the quantity intended to be measured). In a narrower sense, harmonization is defined as the uniformity of laboratory test results among laboratories when neither a commutable, higher-order primary reference material nor a reference measurement procedure is available.

Metrological reference laboratories are laboratories with the highest level of assay methodology, which are designed to obtain data that most closely approximate “trueness.” An example is The Laboratory of Analytical Chemistry in the Department of Pharmaceutical Analysis in Ghent, Belgium.

NIST is the National Institute of Standards and Technology.

Noncommutability refers to a limitation of certain reference materials, likely due to matrix effects, that causes differences in the properties of the reference material and authentic clinical samples when measured by more than one measurement procedure.

Reference measurement procedure refers to a measurement method that has been rigorously validated and accepted as providing measurement results fit for its intended use of assessing measurement trueness of value obtained from other measurement procedures and for characterization of calibrators and reference materials.

Standardization is the uniformity of laboratory test results among laboratories based on the relationship to a recognized, commutable, nationally or internationally agreed upon reference standard (AACC Position Statement).

Trueness is defined as a condition when the amount of substance measured by the assay is identical to the amount present as determined by the best available method.

*Definitions differ depending on the specialty utilizing these terms. The definitions here conform to those used in ISO Standards.

Acknowledgments

Appreciation is expressed for financial support for the meeting provided by the Intramural Program of the Division of Cancer Epidemiology and Genetics of the National Cancer Institute, the National Institutes of Health; the CDC Foundation; the American Association for Clinical Chemistry; The Endocrine Society; ARUP Laboratories; Beckman Coulter, Inc; inVentiv Health; and Quest Diagnostics.

Disclosure Summary: L.M.D., S.E.H., S.H., T.K., W.R., F.Z.S., H.W.V., and R.G.Z. have no disclosures. R.J.S. is the principal investigator on a grant from Pfizer to the University of Virginia and served on a Pfizer Advisory board in 2013. The document represents his personal opinion, not that of his role as an officer of The Endocrine Society.

References

  1. Stanczyk FZ, Lee JS, Santen RJ. Standardization of steroid hormone assays: why, how, and when? Cancer Epidemiol Biomarkers Prev. 2007;16(9):1713–1719.
  2. Stanczyk FZ, Jurow J, Hsing AW. Limitations of direct immunoassays for measuring circulating estradiol levels in postmenopausal women and men in epidemiologic studies. Cancer Epidemiol Biomarkers Prev. 2010;19(4):903–906.
  3. Rosner W, Hankinson SE, Sluss PM, Vesper HW, Wierman ME. Challenges to the measurement of estradiol: an Endocrine Society Position Statement. J Clin Endocrinol Metab. 2013;98(4):1376–1387.
  4. Rosner W, Vesper H. Toward excellence in testosterone testing: a consensus statement. J Clin Endocrinol Metab. 2010;95(10):4542–4548.
  5. Miller WG, Myers GL, Gantzer ML, et al. Roadmap for harmonization of clinical laboratory measurement procedures. Clin Chem. 2011;57(8):1108–1117.
  6. Ikegami S, Moriwake T, Tanaka H, et al. An ultrasensitive assay revealed age-related changes in serum oestradiol at low concentrations in both sexes from infancy to puberty. Clin Endocrinol (Oxf). 2001;55(6):789–795.
  7. Tholen DW, Kallner A, Kennedy JW, Krouwer JS, Meier K. Evaluation of precision performance of quantitative methods; approved guideline. 2nd ed. NCCLS Document EP5-A2. Wayne, PA: Clinical and Laboratory Standards Institute; 2014:1–39.
  8. Tholen DW, Kroll M, Astles JR, et al. Evaluation of the linearity of quantitative measurement procedures: a statistical approach; approved guideline. NCCLS Document EP6-A. Wayne, PA: Clinical and Laboratory Standards Institute; 2003:1–47.
  9. Krouwer JS, Cembrowski GS, Tholen DW. Preliminary evaluation of quantitative clinical laboratory measurement procedures; approved guideline. 3rd ed. NCCLS Document EP10-A3-AMD. Wayne, PA: Clinical and Laboratory Standards Institute; 2014:1–50.
  10. Tholen DW, Linnet K, Kondratovich M, et al. Protocols for determination of limits of detection and limits of quantitation; approved guideline. NCCLS Document EP17-A. Wayne, PA: Clinical and Laboratory Standards Institute; 2004.
  11. Carey RN, Anderson FP, George H, et al. User verification of performance for precision and trueness; approved guideline. 2nd ed. NCCLS Document EP15-A2. Wayne, PA: Clinical and Laboratory Standards Institute; 2005:1–49.
  12. Centers for Disease Control and Prevention. Laboratory quality assurance and standardization programs. http://www.cdc.gov/labstandards/hs_standardization.html. Updated March 25, 2015.
  13. Lee JS, Ettinger B, Stanczyk FZ, et al. Comparison of methods to measure low serum estradiol levels in postmenopausal women. J Clin Endocrinol Metab. 2006;91(10):3791–3797.
  14. Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Fam Pract. 2004;21(1):4–10.

Articles from The Journal of Clinical Endocrinology and Metabolism are provided here courtesy of The Endocrine Society
