ABSTRACT
Objectives
Many investigations of human health, behaviors, and adaptations require an indicator of ovarian cycle functioning as a causal, outcome, or confounding variable in the study design and analyses. Because the dynamic fluctuations in cycle hormones can rarely be adequately characterized by a single measurement, but repeated blood sampling can be onerous, salivary free progesterone (PFree‐SAL) concentration is widely used in both clinical and research contexts as an alternative to total progesterone concentration in venous blood samples (PTotal‐VEN). However, some doubts have been raised about the use of PFree‐SAL because of suggestions that Bolivian and other populations and/or individuals might differ markedly in the ratio of PFree‐SAL to PTotal‐VEN (the apparent uptake fraction, UF). If there are such differences, several decades of comparative population research based on PFree‐SAL would require reconsideration, and a seemingly useful tool in both clinical and research contexts would be lost or require additional extensive pre‐use evaluations. Such impacts would fall disproportionally on clinical monitoring and research studies of menstruating persons, a segment of the population that has long been underrepresented in research and clinical trials, especially in low resource conditions. Therefore, we tested three hypotheses: (H1) UF differs by ovarian cycle phase; (H2) UF differs in Bolivian women from that of non‐Bolivian women; and (H3) within a population, UF is consistently higher or lower in some individuals than in most others.
Methods
We collected mid‐follicular and mid‐luteal near‐concurrent samples of venous blood and saliva from 36 healthy premenopausal Bolivian women. PTotal‐VEN and PFree‐SAL were measured using commercial enzyme immunoassays. To test the study hypotheses, we used graphical and statistical methods to analyze these new data and to analyze data from several previously published studies.
Results
In our study sample of Bolivian women, PFree‐SAL and PTotal‐VEN concentrations (n = 66 pairs) were significantly and highly correlated (Spearman's rho = 0.858; mixed model: intercept = 77.4 pmol/L [p < 0.001], β = 0.0191 [p < 0.001]). An individual's follicular‐phase UF and luteal‐phase UF were not significantly correlated (rho = −0.19, p = 0.462). Median UF was 8.1% for follicular‐phase pairs and 2.3% for luteal‐phase pairs, values comparable to those published for other populations.
Conclusions
Hypothesis 1 was supported. Consistent with prior reports for other populations, in these Bolivian women UF was higher and more variable in the follicular than in the luteal phase. The source(s) of phase‐associated variation in UF deserves additional study, particularly the dynamic relationship to different conformers of corticosteroid‐binding globulin (CBG). Hypothesis 2 was not supported. Paired PFree‐SAL and PTotal‐VEN were highly correlated, and UF in these Bolivians was comparable to published values for other populations. Hypothesis 3 was not supported. There was no evidence that some individuals have consistently higher (or lower) UF than most other persons. In sum, these findings do not support the suggestions that the physiology underlying the relationship between PFree‐SAL and PTotal‐VEN differs substantially and inexplicitly between populations and individuals. These results also reinforce the critical roles of fastidious attention to sample collection and handling, judicious assessment of assay results, and appropriate statistical methods when using ovarian steroid data in any project. We suggest some guidelines for meeting these requirements. Used with due consideration for its advantages and limitations, PFree‐SAL reliably tracks PTotal‐VEN during the menstrual cycle and is a useful option in the biomarker toolkit. Just as it is costly to continue our work with tools not up to the task, so is it costly to discard useful tools without good reason. The development (and improvement through replication) of a robust toolkit for assessing changes in and the impacts of menstrual cycle hormones is foundational to reducing gender‐based health disparities. (The linked file listed below under “Supporting Information” presents these findings in Spanish).
Keywords: analytical methods, biomarkers, Bolivia, corticosteroid‐binding globulin, enzyme immunoassay, ovarian function, United States
Scatterplot of paired measurements of PFree‐SAL (y‐axis, log scale) versus PTotal‐VEN (x‐axis, log scale) for each of our 36 Bolivian study participants. Dotted diagonals represent lines of constant uptake fraction (UF = PFree‐SAL/PTotal‐VEN) from 0.2% to 100%. The solid red line represents the linear regression model for luteal phase pairs (red dots), which tend to monotonically increase together, hence falling within a relatively narrow band of UF values at the highest serum values. The distribution of the follicular PFree‐SAL–PTotal‐VEN pairs (blue triangles) is more globular and spans a wider range of UF values than the distribution of the luteal pairs. Similar data distributions have been observed in non‐Bolivian populations (e.g., New Zealand), refuting the hypothesis that UF in Bolivian women differs from that of non‐Bolivian women.

RESUMEN
Objetivos
Muchas investigaciones sobre la salud, los comportamientos y las adaptaciones humanas requieren un indicador del funcionamiento del ciclo ovárico como variable causal, de resultado o de confusión en el diseño y los análisis del estudio. Dado que las fluctuaciones dinámicas de las hormonas del ciclo rara vez pueden caracterizarse adecuadamente con una sola medición, pero que la toma repetida de muestras de sangre puede resultar onerosa, la concentración de progesterona libre salival (PLibre‐SAL) se utiliza ampliamente tanto en contextos clínicos como de investigación como alternativa a la concentración total de progesterona en muestras de sangre venosa (PTotal‐VEN). Sin embargo, se han planteado algunas dudas sobre el uso de PLibre‐SAL debido a las sugerencias de que las poblaciones bolivianas y otras poblaciones y/o los individuos podrían diferir notablemente en la relación entre PLibre‐SAL y PTotal‐VEN (la fracción de captación aparente, UF). Si existieran tales diferencias, habría que reconsiderar varias décadas de investigación poblacional comparativa basada en la PLibre‐SAL, y se perdería una herramienta aparentemente útil tanto en contextos clínicos como de investigación, o se requerirían amplias evaluaciones adicionales previas a su uso. Tales repercusiones recaerían desproporcionadamente en el seguimiento clínico y los estudios de investigación de las personas que menstrúan, un segmento de la población que durante mucho tiempo ha estado infrarrepresentado en la investigación y los ensayos clínicos, especialmente en condiciones de escasos recursos. Por lo tanto, pusimos a prueba tres hipótesis: (H1) la UF difiere según la fase del ciclo ovárico; (H2) la UF difiere en las mujeres bolivianas de la de las mujeres no bolivianas; y (H3) dentro de una población, la UF es consistentemente mayor o menor en algunos individuos que en la mayoría de los demás.
Métodos
Recogimos muestras casi simultáneas de sangre venosa y saliva de mediados foliculares y mediados lúteas de 36 mujeres bolivianas premenopáusicas sanas. Se midió la PTotal‐VEN y la PLibre‐SAL utilizando inmunoensayos enzimáticos comerciales. Para comprobar las hipótesis del estudio, utilizamos métodos gráficos y estadísticos para analizar estos nuevos datos y para analizar los datos de varios estudios publicados anteriormente.
Resultados
En nuestra muestra de estudio de mujeres bolivianas, las concentraciones de PLibre‐SAL y PTotal‐VEN (n = 66 pares) estaban significativamente y altamente correlacionadas (rho de Spearman = 0,858; modelo mixto: intercepto = 77,4 pmol/L [p < 0,001], β = 0,0191 [p < 0,001]). La UF de la fase folicular y la UF de la fase lútea de un individuo no estaban significativamente correlacionadas (rho = −0,19; p = 0,462). La mediana de UF fue del 8,1% para los pares en fase folicular y del 2,3% para los pares en fase lútea, valores comparables a los publicados para otras poblaciones.
Conclusiones
Se confirmó la Hipótesis 1. En consonancia con informes anteriores de otras poblaciones, en estas mujeres bolivianas la UF fue mayor y más variable en la fase folicular que en la lútea. La(s) fuente(s) de la variación de la UF asociada a la fase merece un estudio adicional, en particular la relación dinámica con los diferentes conformadores de la globulina fijadora de corticosteroides (CBG). No se confirmó la Hipótesis 2. La PLibre‐SAL y la PTotal‐VEN emparejadas estaban altamente correlacionadas, y la UF en estos bolivianos era comparable a los valores publicados para otras poblaciones. La Hipótesis 3 no fue corroborada. No hubo evidencia de que algunos individuos tengan consistentemente mayor (o menor) UF que la mayoría de las otras personas. En resumen, estos resultados no apoyan las sugerencias de que la fisiología subyacente a la relación entre PLibre‐SAL y PTotal‐VEN difiere sustancial e inexplícitamente entre poblaciones e individuos. Estos resultados también refuerzan las funciones críticas de la atención meticulosa a la recogida y manipulación de muestras, la evaluación juiciosa de los resultados de los ensayos y los métodos estadísticos apropiados cuando se utilizan datos de esteroides ováricos en cualquier proyecto. Sugerimos algunas directrices para cumplir estos requisitos. Utilizado con la debida consideración de sus ventajas y limitaciones, PLibre‐SAL realiza un seguimiento fiable de PTotal‐VEN durante el ciclo menstrual y es una opción útil en el conjunto de herramientas de biomarcadores. Del mismo modo que es costoso continuar nuestro trabajo con herramientas que no están a la altura, también lo es descartar herramientas útiles sin una buena razón. El desarrollo (y la mejora a través de la replicación) de un sólido conjunto de herramientas para evaluar los cambios y los efectos de las hormonas del ciclo menstrual es fundamental para reducir las disparidades de salud basadas en el género. 
(El archivo vinculado que figura a continuación en «Información complementaria» presenta estas conclusiones en español).
Abbreviations
- E2: estradiol
- P: progesterone
- PFree‐SAL: free (not bound to carrier proteins) progesterone in saliva or a saliva sample
- PFree‐VEN: free progesterone in circulation or in a venous blood sample (plasma or serum)
- PTotal‐DBS: total progesterone concentration in a dried blood spot sample (from a fingertip prick)
- PTotal‐VEN: total (free plus bound) progesterone in circulation or in a venous blood sample
- UF: apparent uptake fraction = PFree‐SAL/PTotal‐VEN
- UFFOLL: UF during the follicular phase
- UFLUT: UF during the luteal phase
1. Introduction
Many investigations of human health, behaviors, and adaptations require an indicator of ovarian cycle functioning as a causal, outcome, or confounding variable in the study design and analyses. The belated recognition that females¹ were severely underrepresented in biomedical studies prompted passage of the U.S. National Institutes of Health (NIH) Revitalization Act in 1993, which mandated the inclusion of female participants in most NIH‐funded research on humans (Mastroianni et al. 1994; NIH 2024). For similar policies elsewhere see, for example, Government of Canada IAP on RE (2023) and Klinge (2008).
Although there has been progress, 30 years later biases in sex and gender inclusion persist worldwide at every level of research from the bench to the community (Ah‐King 2022; Arnegard et al. 2020; Barr et al. 2024; Beery and Zucker 2011; DuBois and Shattuck‐Heidorn 2021; Geller et al. 2018; Mazure and Jones 2015; Orr et al. 2020; Wilson et al. 2020). This imbalance is exacerbated for those persons who experience health disparities (i.e., are at greater risk of poorer health and higher mortality compared to the general population) because of their ethnicity/race, low income, social status, rural residency, or other factors (Thomson et al. 2006).
In many (arguably most) studies that have included females, either the possible effects of changes in cycle hormones on outcome variables are ignored entirely or data collection is restricted to the early follicular phase (when the principal ovarian steroids, progesterone [P] and estradiol [E2], are low) (Shea and Vitzthum 2020; Wenner and Stachenfeld 2020). Many studies collect only a single venous blood sample to ascertain whether a cycle hormone concentration falls within some expected range. Such research strategies are insufficient for assessing the roles of ovarian hormones on health and well‐being during at least 75% (3 of every 4 weeks) of the roughly 40 years of menstrual cycling experienced by half of the human population.
Addressing these inequities is no simple task. The menstrual cycle is not a clock (Vitzthum 2009; Vitzthum et al. 2021). The natural variability in follicular and luteal phase durations between cycles and between persons (Figure 1) makes it difficult to accurately predict the timings of ovulation, peak preovulatory E2, or the peaks of mid‐luteal E2 and P. Hence, a single sample collected on a day presumed to represent one of these fleeting states will not necessarily be accurately timed for capturing the true hormonal concentrations at these commonly defined transitions in the cycle. Because the body dynamically responds to internal and external cues (Vitzthum 2001, 2008, 2024), reliance on a single blood sample, even judiciously and fortuitously timed, is typically not up to the task of adequately characterizing ovarian functioning (Fujimoto et al. 1990).
FIGURE 1.

Natural (“normal”) variation in serum progesterone concentration in adult premenopausal women. Inter‐cycle variability (i.e., within‐woman or intra‐woman variability) for ovulation (dark orange) and next menses (dark blue) is shown as the 95% prediction intervals for the timing of these events in any single woman, assuming an inter‐cycle average duration equal to the population average. Inter‐woman (between‐women) variability for ovulation (orange) and next menses (blue) is shown as the 95% prediction intervals for the timing of these events in the overall population. Hormone levels represent usual values and are not necessarily related to what is healthy. Hormone ranges vary between persons at the same biological stage of the menstrual cycle. The actual timing (days from menstruation) of that biological stage varies between cycles and between women. The ranges denoted by biological stage (dark green) are the 90% prediction intervals for hormone levels for women at the same biological stage. Inter‐woman variability (between women, yellow) is shown as prediction intervals of up to 95% for hormone levels in the overall population. Source: Adapted from https://commons.wikimedia.org/wiki/File:Hormones_estradiol,_progesterone,_LH_and_FSH_during_menstrual_cycle.png.
Ovarian hormone concentrations are best captured by serial sampling. Because serial venous blood collection is logistically demanding and understandably onerous for most persons, methods to measure hormones in urine, saliva, and fingertip blood drops (dried blood spots, DBS) have been developed. The choice of a biomarker depends on several considerations, including the research question, available facilities, and the study population. There are various biomarkers, each with advantages and limitations, that meet these differing needs (see, e.g., Leonard (2021), Snodgrass (2022), and the papers in each issue; Gröschl (2017), McDade (2014), Vitzthum (2021), and Worthman and Costello (2009)).
The concentration of free progesterone in saliva (PFree‐SAL) is one widely used biomarker (Figure 2). PFree (P not bound to the carrier proteins corticosteroid‐binding globulin [CBG] and albumin) is the biologically active portion of total P (PTotal), that is, the portion available to the tissues. Because PFree diffuses rapidly from blood into saliva (Bolaji 1994; Vining and McGinley 1986), PFree‐SAL is a credible proxy for (but does not equal) PFree in venous blood (PFree‐VEN) (Blom et al. 1993; Delfs et al. 1994; Evans 1986; Laine and Ojanotko 1999; Wang and Knyba 1985). However, PFree‐VEN is not routinely measured because the protocols require considerably more labor and expense than does measuring PTotal in a sample of venous blood (PTotal‐VEN), and because PFree‐VEN is not commonly used for clinical purposes (Read 1989). Therefore, the validity of PFree‐SAL as a useful biomarker has been evaluated, with few exceptions, by comparing the concentrations of PFree‐SAL and PTotal‐VEN.²,³
FIGURE 2.

Progesterone pathways from circulating venous blood to saliva. In circulation, most P molecules are bound to a carrier protein (CBG or albumin); only unbound PFree‐VEN diffuses passively, requiring about 1 min for unidirectional passage from blood into saliva. Saliva may be contaminated by blood (from gum or other mouth injury/disease) and/or mouth debris (e.g., food, tobacco). If present, such contamination typically raises the measured PFree‐SAL by several times the concentration that enters saliva via passive diffusion.
Some investigators have questioned the use of PFree‐SAL to compare ovarian functioning across populations and individuals because of hypothesized, but unspecified, atypical secretory mechanics in Bolivians (Chatterton et al. 2006) and perhaps also in some U.S. women who had self‐identified as either Japanese or Caucasian ethnicity (Konishi et al. 2012). The authors of these two independent studies suggested that the ratio PFree‐SAL/PTotal‐VEN (for convenience, referred to here as the apparent uptake fraction [UF]) is too variable, for whatever biological or methodological reasons, for PFree‐SAL to be a useful biomarker of ovarian functioning for at least some persons and/or populations. Their conclusions differed markedly from previously published validation studies of PFree‐SAL.
Many research groups have evaluated paired PTotal‐VEN and PFree‐SAL measurements from near‐concurrently collected blood and saliva samples and have reported a high correlation between the two biomarkers (Appendix A; discussed in Section 1.2). Of those studies that also reported UF, follicular‐phase (UFFOLL) values are typically higher than luteal phase (UFLUT) values; phase‐specific UF values are roughly similar across studies. However, it is much rarer to compare paired PTotal‐VEN and PFree‐SAL measurements in those populations that are culturally averse to having any blood drawn and/or in those research settings lacking the requisite facilities and personnel for venipuncture collection and subsequent storage of blood samples. Under such circumstances, and in light of the high correlations reported in laboratory validation studies, forgoing a comparison of paired PTotal‐VEN and PFree‐SAL in novel contexts has not previously prompted much concern about the validity of PFree‐SAL as a proxy for PTotal‐VEN.
On the one hand, it is a reasonable assumption until shown otherwise that the biochemistry of PFree diffusion from blood to saliva (Figure 2) is comparable across human populations. As such, it appears justified to use protocols that have been laboratory validated in one population for studies in other populations.
On the other hand, if (as was argued by Chatterton et al. (2006) and Konishi et al. (2012)) the diffusion of PFree (and perhaps other analytes) consistently and significantly differs between some populations and/or some persons, several decades of community‐based comparative research relying on PFree‐SAL measurements would require reassessment.
Furthermore, the future use of serial saliva sampling to evaluate ovarian functioning in study participants would, at the least, require additional pre‐study validation that the UF is sufficiently similar in prospective participants and/or populations for reliably evaluating the study questions, or that UF variation can be suitably accounted for in analyses. If not, there would be fewer tools for serial measurements throughout the cycle of hormonal changes and their consequent effects on health and behaviors.
1.1. Study Hypotheses
To evaluate the utility of PFree‐SAL as a biomarker of human ovarian functioning, we present new data from 66 sets of near‐concurrent saliva and venous blood samples collected in a research laboratory from 36 Bolivian women. We tested three hypotheses regarding the UF for progesterone (i.e., the ratio of PFree‐SAL/PTotal‐VEN).
(H1) UF differs by ovarian cycle phase (i.e., follicular vs. luteal phases).
(H2) UF differs in Bolivian women from that of non‐Bolivian women.
(H3) Within a population, UF is consistently higher (or lower) in some individuals than in most others.
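To make the quantity under test concrete, the following is a minimal sketch of how UF might be computed for each saliva–blood pair and then summarized by cycle phase. The participant identifiers and concentrations are hypothetical illustrative values, not data from this study, and both analytes are assumed to be expressed in the same units (pmol/L) so that the ratio is dimensionless. The function names are ours, introduced only for illustration.

```python
from statistics import median

# Hypothetical paired concentrations, both in pmol/L.
# Illustrative values only; these are NOT data from the study.
pairs = [
    {"id": "w01", "phase": "follicular", "p_sal": 90.0,  "p_ven": 1500.0},
    {"id": "w01", "phase": "luteal",     "p_sal": 600.0, "p_ven": 30000.0},
    {"id": "w02", "phase": "follicular", "p_sal": 120.0, "p_ven": 1200.0},
    {"id": "w02", "phase": "luteal",     "p_sal": 800.0, "p_ven": 32000.0},
]


def uptake_fraction(p_sal, p_ven):
    """Apparent uptake fraction (UF) in percent: 100 * PFree-SAL / PTotal-VEN."""
    if p_ven <= 0:
        raise ValueError("PTotal-VEN must be positive")
    return 100.0 * p_sal / p_ven


def median_uf_by_phase(rows):
    """Group the per-pair UF values by cycle phase and return each phase's median."""
    by_phase = {}
    for row in rows:
        uf = uptake_fraction(row["p_sal"], row["p_ven"])
        by_phase.setdefault(row["phase"], []).append(uf)
    return {phase: median(values) for phase, values in by_phase.items()}


result = median_uf_by_phase(pairs)
print(result)
```

In this toy example the follicular‐phase median UF is higher than the luteal‐phase median, which is the pattern that the first hypothesis asks about. A real analysis would also need to handle values below assay sensitivity and repeated measures within a participant.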
1.2. Prior Evaluations of PFree‐SAL as a Biomarker for Circulating Progesterone
A search for published and gray literature that reported assessments of the relationship of PFree‐SAL to PTotal‐VEN yielded the list of studies in Appendix A. Notably, only a few of these are for populations outside the United States and Europe. However, not all validation studies are necessarily published, perhaps particularly those that are done in a different population but are consistent with previously published validations of the same (or near same) protocols.
A principal goal motivating the development of salivary steroid assays in the late 1970s was the need for minimally invasive alternatives to blood collection, thus facilitating serial monitoring of patients and the effects of treatments. Moreover, in many populations there were (and still are) strong aversions to having blood drawn, and/or a lack of facilities and trained personnel, which limit both health care and health research. International organizations (e.g., World Health Organization [WHO]) as well as research units at health centers and universities were involved in the development of various alternatives to collecting venous blood for measuring analytes of interest (International Atomic Energy Agency (IAEA) 1982; Read et al. 1984; Sufi et al. 1982, 1985).
The Tenovus Institute for Cancer Research in Cardiff, Wales, was among those first teams to develop PFree‐SAL collection and assay protocols (R. F. Walker et al. 1978, 1979; S. Walker et al. 1981). For validation, a total of nine healthy menstruating volunteers collected daily saliva samples throughout a complete menstrual cycle (Appendix A: Studies 1 and 2). Matched venous blood samples were collected from each of these volunteers on at least 8 and up to 25 days. The correlation (r) of each volunteer's paired PFree‐SAL and PTotal‐VEN concentrations ranged from 0.81 to 0.97 and was 0.86 for the entire sample of 143 matched pairs.
By 1980, the Tenovus Institute protocols were in use in a Bangladeshi study of reproductive functioning in impoverished women (Seaton and Riad‐Fahmy 1980). Saliva samples were collected daily for 3–4 months, a protocol not likely to have been sustained if blood samples had been collected instead. Low PFree‐SAL throughout a cycle, indicative of anovulation or postovulatory luteal insufficiency, was common in the sample of Bangladeshi women. Such patterns can be readily missed if only one, or even a few, blood or salivary samples are collected during each cycle. An assessment of UF in Bangladeshis at that time, if done, was not reported in the available literature. Given that the investigators interpreted low PFree‐SAL as they would have if observed in United Kingdom (UK) women, it appears the investigators assumed that the correlation between measurements of salivary and plasma P, and the UF, was similar in United Kingdom and Bangladeshi women.
Some three decades later the relationship of PFree‐SAL to PTotal‐VEN was assessed in a new study of Bangladeshi women. Houghton (2008) compared the correlation of PFree‐SAL and PTotal‐VEN in three samples of adult women: nonmigrant (“sedentee”) Bangladeshi, Bangladeshi who had migrated in adulthood to the United Kingdom, and Britons of European descent (Appendix A: Study 26). A single pair of saliva and blood samples was collected during the presumptive luteal phase. Participant reporting of the subsequent first day of menses revealed that some sample collections were not, in fact, during the luteal phase (predicting the timing of the luteal phase is a common difficulty in ovarian cycle research). Houghton (2008) reported several unforeseeable challenges that specifically impacted data collection from the Bangladeshi migrants, which may explain the very low correlation (r = 0.19) of PFree‐SAL and PTotal‐VEN for this sample of women. In contrast, the correlations of PFree‐SAL and PTotal‐VEN in the Bangladeshi sedentee and British samples were high and comparable to those observed in earlier studies (Appendix A). Moreover, UFLUT in the British and Bangladeshi sedentee samples were similar (0.30% ± 0.18% and 0.25% ± 0.23%, respectively). These findings are consistent with the expectation that the dynamic relationship between P in blood and in saliva does not differ across human populations (or, at the least, not between Bangladeshi and British women).
For various reasons, measurements of an analyte are not necessarily comparable across laboratories (Dabbs Jr. et al. 1995). Evaluating hormone concentrations over time and across populations typically requires the use of equivalent materials and protocols. Rigorous quality controls within laboratories help to maintain consistency in measurement. With these and other requirements in mind, in the 1970s the World Health Organization (Special Program of Research in Human Reproduction) established the Matched Reagent Program in dozens of countries worldwide (Sufi et al. 1982).
As part of this program, protocols for PFree‐SAL and salivary estradiol (E2Free‐SAL) were evaluated in daily samples throughout a cycle from six women in each of five countries (Chile, China, India, Singapore, Thailand; Sufi et al. 1985). In this initial study (published results were not identified by country), PFree‐SAL in three centers were comparable to those in previously published studies (the authors cited⁴ Chearskul et al. 1982; Donaldson et al. 1984; R. F. Walker et al. 1984). In the one study center where PFree‐SAL was unusually low, four of the six participants were from an infertility clinic. For uncertain reasons, one center had exceptionally high values for PFree‐SAL and E2Free‐SAL, and variability for each steroid was several times greater than in the other centers. Nonetheless, in this center the relative changes in both steroids across cycle phases mirrored those of the other centers. The program concluded that PFree‐SAL is a “clearly useful analytical tool, particularly for studies involving examination of endocrine status over a period of time [and] in population groups from whom regular blood samples would be difficult or impossible to obtain…” (Sufi et al. 1985).
At Harvard University (Boston, USA), Ellison adapted the Tenovus salivary progesterone protocols (R. F. Walker et al. 1979; S. Walker et al. 1981) for use in anthropological research (Ellison 1988; Ellison et al. 1986). The reported correlation of an individual's paired serum and saliva samples during “the course of the menstrual cycle” ranged from 0.80 to 0.97 in three Boston women (Appendix A: Study 20 [Lipson and Ellison 1992]). These collection and assay protocols were used in several populations (Figure 3) including Bostonians and Congolese (Ituri Forest, DRC) horticulturalists/hunter‐gatherers (Ellison et al. 1986, 1989), Nepalese farmers (Panter‐Brick et al. 1993), Polish farmers (Jasienska and Ellison 1998), and Bolivian Quechua agropastoralists (Vitzthum et al. 2000).
FIGURE 3.

Mid‐luteal progesterone (PFree‐SAL concentration) in five population samples assayed following the same protocol in the same laboratory (Ellison et al. 1993). The left panel compares all participants in each study sample. The right panel compares similar subsamples (ages 25–35 years; cycles during the population's least stressful season) from each study (data from Vitzthum et al. (2000): table 2).
Only a few studies (Appendix A) have published (in English) evaluations of the association of PFree‐SAL and PTotal‐VEN in populations that are outside the United States, Europe, or New Zealand: China (Liu et al. 2018; Wong et al. 1990), Thailand (Chearskul and Visutakul 1994; Vienravi et al. 1994), Iraq (Abood 2008), and Bangladesh (Houghton 2008). In every case, the correlation of PFree‐SAL and PTotal‐VEN was greater than 0.7, except for older Iraqi women (Study 25). However, correlations say nothing about the UF (i.e., even if populations differed in UF, the correlation of population‐specific PFree‐SAL and PTotal‐VEN would not necessarily differ across populations).
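The point that correlation carries no information about UF can be illustrated with a small sketch. Below, two hypothetical populations share the same serum values and the same multiplicative scatter, but one has a UF of about 1% and the other about 3%. All values are invented for illustration, and the rank‐correlation helper is a plain‐Python stand‐in written for this example rather than the software used in any of the cited studies.

```python
from statistics import median


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical serum values (pmol/L) and a shared multiplicative noise pattern.
p_ven = [5000, 12000, 20000, 28000, 35000, 41000]
noise = [1.10, 0.90, 1.05, 0.95, 1.20, 0.85]

sal_a = [0.01 * v * e for v, e in zip(p_ven, noise)]  # population A: UF near 1%
sal_b = [0.03 * v * e for v, e in zip(p_ven, noise)]  # population B: UF near 3%

rho_a = spearman(p_ven, sal_a)
rho_b = spearman(p_ven, sal_b)
uf_a = median(100 * s / v for s, v in zip(sal_a, p_ven))
uf_b = median(100 * s / v for s, v in zip(sal_b, p_ven))
print(rho_a, rho_b, uf_a, uf_b)
```

The two rank correlations are identical even though the median UF differs threefold, because multiplying every salivary value by a constant leaves the rank order unchanged. Comparing UF across populations therefore requires the ratios themselves, not only the correlation coefficients.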
1.3. Variation of UF by Cycle Phase
Perhaps the earliest indication that UF varies by cycle phase comes from Chearskul et al. (1982) (Appendix A: Study 3). From their published summary data (their figure 8), we estimated UFFOLL to be about 3% and UFLUT to be about 0.9%. Subsequent independent studies likewise reported UFFOLL to be from two to six times higher than UFLUT (Appendix A: Studies 2, 5, 7–9, 12–14 [Bourque et al. 1986; Cedard et al. 1984; Choe et al. 1983; De Boever et al. 1986; Evans 1986; Riad‐Fahmy et al. 1987; Tallon et al. 1984; S. Walker et al. 1981; Zorn et al. 1984]). Some, perhaps much, of the variability across studies in UF likely reflects differences in calculating UF (discussed in Section 2.4 and Appendix B), selected cycle days, samples per person, and collection and assay protocols. All of these early studies were in countries where the population was predominantly of European descent (e.g., United Kingdom, United States), so it is unlikely that the study differences in UF are attributable to inherent population differences in steroid biochemistry.⁵ Notably, a study (Appendix A: Study 23) in Thailand (Chearskul and Visutakul 1994) reported UF comparable to those reported by the United Kingdom and United States studies.
FIGURE 8.

Scatterplots of paired measurements of PFree‐SAL (y‐axis, log scale) versus PTotal‐VEN (x‐axis, log scale) for each participant in four studies. Linear regression models (panels A and D) and linear mixed model (panel B) (see endnote 7) are shown for luteal phase pairs (excluding outliers) in each sample. x and y scales are identical across studies. Shaded areas represent values below the limit of assay sensitivity (shown if reported by study authors). Diagonal dotted lines represent lines of constant uptake fraction (UF = PFree‐SAL/PTotal‐VEN) from 0.2% to 100%. Panel A (this study): 66 pairs from 36 Bolivians. Panel B (data from Evans 1986): 38 pairs from 4 New Zealanders. Panel C (data from Wang and Knyba 1985): 36 pairs collected on a random cycle day from 36 Britons. Study authors screened values below the sensitivity of either assay (see their figure 4 for unscreened full data set). Panel D (data from Chatterton et al. 2006): 25 pairs from 25 Bolivians, and 15 pairs from 15 Chicagoans. In panels A and B, paired samples were collected during the follicular phase (blue triangle) and luteal phase (red dot). In panel B, purple “+” indicates collection on peri‐ovulatory days. In panels A and C, green “+” indicates uncertain phase. In panel D, all samples were collected at the presumptive mid‐luteal phase (NW‐Bolivia: red dots; NW‐Chicago: black “x”). Circled symbols are PFree‐SAL–PTotal‐VEN pairs with implausibly high UF, suggesting sample contamination.
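The diagonal lines of constant UF in these log–log scatterplots follow directly from the definition of UF. As a brief worked derivation, assuming both concentrations are plotted in the same units and UF is expressed as a proportion rather than a percentage:

```latex
\mathrm{UF} = \frac{P_{\mathrm{Free\text{-}SAL}}}{P_{\mathrm{Total\text{-}VEN}}}
\quad\Longrightarrow\quad
\log_{10} P_{\mathrm{Free\text{-}SAL}}
  = \log_{10} P_{\mathrm{Total\text{-}VEN}} + \log_{10} \mathrm{UF}
```

Each constant value of UF therefore traces a straight line of slope 1 whose intercept is log10(UF): 0 for UF = 100%, −2 for UF = 1%, and approximately −2.70 for UF = 0.2%. Points lying along a band parallel to these diagonals share a similar UF, whereas a fitted line with a slope different from 1 implies that UF changes with the serum concentration.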
Although these studies, individually and collectively, refuted the previously common assumption that UF is, on average, constant across the cycle, the various authors concluded that the high correlations between PFree‐SAL and PTotal‐VEN supported the use of PFree‐SAL for monitoring ovarian function. These findings also made evident the need to account for cycle phase when calculating the correlation of matched saliva–serum pairs. Nonetheless, depending on the research question, phase differences in UF do not lessen the utility of PFree‐SAL as a biomarker for ovarian functioning. Changes in PFree‐SAL follow a similar trajectory to those in PTotal‐VEN. Therefore, the timing of ovulation (if it occurs) and cycle phase can be determined fairly accurately from the pattern of PFree‐SAL variation in serial saliva samples spanning the cycle (Vitzthum 2021).
Most investigators listed in Appendix A explicitly noted the limitations of relying on a single saliva or blood sample. Although studies have concluded that neither salivary metabolism nor saliva flow rate is a source of error in PFree‐SAL (Blom et al. 1993; Laine and Ojanotko 1999), the use of any single P measurement in diagnoses or research is problematic for several other reasons (Ansbacher 1990; Arslan et al. 2023; Fujimoto et al. 1990). Because the secretion of P is pulsatile (Delfs et al. 1994; Filicori et al. 1984; Fujimoto et al. 1990; L. L. Lewis et al. 1995; O'Rourke and Ellison 1990; Rossmanith et al. 1990; Soules et al. 1988; Veldhuis et al. 1988), a single blood draw or saliva sample may fall, unknown to the investigator, at a pulse peak, at a nadir, or between pulses. Significant diurnal variation in both PFree‐SAL and PTotal‐VEN has also been observed in free‐living study participants and in controlled laboratory settings (Patil et al. 2023; Rahman et al. 2019). In the body, the half‐life of progesterone is about 5 min (Kolatorova et al. 2022), and the passage from blood to saliva is about 1 min (Blom et al. 1993). Therefore, time lags between the collection of blood and saliva samples and prolonged saliva collection may yield a PFree‐SAL measurement that is not strictly synchronous with the measured PTotal‐VEN, contributing to differences between studies in calculated UF and the correlation of PFree‐SAL and PTotal‐VEN. Other factors that may affect measurement precision and accuracy, and hence correlation and calculated UF, include the protocols and materials for collecting and storing samples, variation in study participants' adherence to protocols, and different types of assays. Idiosyncratic and unsuitable statistical analyses also make it difficult to interpret findings and to compare outcomes across these and other studies.
These methodological differences aside, if UF does differ systematically across populations or even subpopulations, some research questions cannot be addressed with PFree‐SAL, and much of the work and theory relying on such data would need rethinking. Furthermore, future studies might not be able to rely on this noninvasive method for adequately monitoring ovarian functioning in a broad array of investigations of the determinants of human health and adaptations. For these reasons, we conducted this study to test the three hypotheses listed above.
2. Methods
2.1. Study Sample
We recruited study participants via word of mouth and posted announcements at the Universidad Mayor de San Andrés, La Paz. All participants were native Bolivians, born and residing at ≥ 3500 m, from 20 to 40 years old inclusive, of normal weight for height, regularly cycling for at least the past 3 months, and not using hormonal contraceptives nor any other hormonal medication during the previous 4 months. Of 38 women who entered the study, 36 completed all protocols.
Each participant was interviewed in her preferred language (Aymara or Spanish) by a native speaker regarding her birthplace and date, medical and reproductive histories, residences, education, and occupations. A private training/practice session on saliva collection was done prior to the first scheduled day of sample collection. A team member demonstrated the full protocol, which was then practiced by the participant. Standard anthropometrics (weight, height, arm circumference, and four skinfolds) were measured by a single experienced physician (Dr. Hilde Spielvogel).
2.2. Sample Collection
Based on reported first day of the previous menses, two sample collections were scheduled to coincide with days approximating the mid‐follicular and mid‐luteal phases, respectively. Each collection comprised staff‐monitored participant self‐collection of saliva according to a standard protocol (Vitzthum 2021) immediately followed by venous blood draw by an experienced phlebotomist. Collection materials were made of either laboratory‐quality glass (blood) or polypropylene (saliva). On the morning of the scheduled appointment, a participant was required to refrain from eating or drinking anything other than plain water, to refrain from tooth brushing or flossing (to avoid gum irritation), and not to wear lipstick or other ointments on the lips or mouth. Upon arrival at the laboratory, she rinsed her mouth with cool clean water to remove any debris and to help constrict gum capillaries to reduce the risk of blood contamination. To avoid sample dilution, saliva collection began ≥ 20 min after rinsing. Saliva production was promoted by gently chewing a piece of inert wax, allowing saliva to drool into the collection vial. Participants avoided talking or sucking their cheeks during saliva collection, which typically required 2–5 min. A research team member monitored study participant preparation and saliva collection throughout the procedure.
Shortly after collection, blood (but not saliva) samples were centrifuged, and serum and saliva aliquots were frozen at −20°C and maintained frozen until assayed. All samples were air shipped in a single day in an insulated shipping box packed with −20°C phase‐change ice packs; a constant‐read electronic thermometer confirmed that the samples had stayed frozen throughout shipping (Thornburg and Vitzthum 2020).
For each participant, the expected date of the first day of the next menses was estimated based on prior cycle length. Participants were asked to record when the upcoming first day of bleeding actually occurred and to contact the research team. Participants were also contacted shortly after the anticipated first day and thereafter to learn the actual first day of menses. Of the 36 women, 3 were uncertain of the day of menses onset.
At their first sample collection appointments (first round, Figure 4), saliva samples from four women were contaminated by visible gum bleeding; two of these women withdrew from the study, and two elected to continue. In one other case, samples were not assayed because of a lengthy time delay between saliva and blood collection. Among the 36 women who participated in the second round of collection, one produced a saliva sample with visible blood contamination, and two others had trouble producing an adequate saliva sample. A total of 66 sample pairs were assayed; all measurements were included in subsequent statistical analyses except as noted.
FIGURE 4.

Study sample composition.
2.3. Laboratory Assays
Enzyme immunoassays (EIAs) were done at the CISAB Laboratory at Indiana University, Bloomington, USA, using serum kits from American Laboratory Products Company (Salem, NH, USA) with a lower limit of detection of 0.318 nmol/L (=0.1 ng/mL) PTotal‐SER and saliva kits from Salimetrics LLC. (State College, PA, USA) with a lower limit of detection of 31.8 pmol/L (=10.0 pg/mL) PFree‐SAL. All saliva (or serum) samples from a given participant were assayed directly on the same plate in sequential positions on the plate according to the manufacturer's instructions. For the PFree‐SAL assay, participants were randomly allocated to one of two plates and randomly allocated to positions on the plate. For the PTotal‐SER assay, the allocation of participants and positions on the plates mirrored those of the salivary assay plates. All samples were assayed in duplicate.
A saliva–serum sample pair was designated as follicular if it was collected ≥ 15 days before, or as luteal if it was collected ≤ 14 days before, the reported first day of menses following sample collection; the luteal designation is presumptive because the occurrence and timing of ovulation in a given cycle are unknown. Of the 66 serum–saliva pairs, 29 were follicular (median collection day = cycle day 8 [=reverse cycle day −21]), 34 were luteal (median collection day = cycle day 22 [=reverse cycle day −8]), and 3 were unassigned because of uncertainty in menses onset.
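The phase‐assignment rule can be written out as a minimal Python sketch. The function name `classify_phase`, the use of calendar dates as inputs, and the rejection of collections on or after the next menses onset are assumptions made for illustration; they are not part of the study's software.

```python
from datetime import date
from typing import Optional


def classify_phase(collection_day: date, next_menses_onset: Optional[date]) -> str:
    """Classify one saliva-serum pair by days before the next menses onset."""
    if next_menses_onset is None:
        # Onset of the following menses is uncertain, so no phase is assigned.
        return "unassigned"
    days_before = (next_menses_onset - collection_day).days
    if days_before < 1:
        # Assumption of this sketch: collection must precede the next onset.
        raise ValueError("collection day must precede the next menses onset")
    # >= 15 days before onset: follicular; <= 14 days before: presumptive luteal.
    return "follicular" if days_before >= 15 else "luteal"


if __name__ == "__main__":
    # Hypothetical dates, 21 days and 8 days before onset.
    print(classify_phase(date(2020, 1, 1), date(2020, 1, 22)))
    print(classify_phase(date(2020, 1, 14), date(2020, 1, 22)))
```

The luteal label returned here is presumptive in the same sense as in the text, because the rule relies only on the reported menses onset and not on confirmation of ovulation.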
2.4. Analytical Approaches
PFree‐SAL is reported in pmol/L (1 pmol/L = 0.3145 pg/mL) and PTotal‐VEN in nmol/L (1 nmol/L = 0.3145 ng/mL). Data were analyzed using SPSS v29.0 (IBM). Sample central tendency is reported as mean ± standard deviation (SD) or median; coefficient of variation (CV) = SD/mean. Statistical significance was set at p ≤ 0.05. Descriptive statistics were calculated for PFree‐SAL, PTotal‐SER, UFFOLL, UFLUT, and participant characteristics. Possible subsample differences arising from the allocation of samples to the two assay plates were evaluated with Student's t‐test. The relationship of PFree‐SAL to PTotal‐SER was evaluated statistically using Spearman's correlation (see endnote 6) and hierarchical linear modeling (HLM, SPSS mixed model command). HLM accounts for the lack of independence of serial sample pairs from the same individual (repeated measurements violate standard linear regression's assumption of independence). The impact of outliers in the data set (which may arise from errors in reported days of menses, collection and/or assay protocols, sample storage, and/or sample contamination or dilution) was explicitly evaluated.
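For readers converting between molar and mass units, the following Python sketch applies the conversion factor and the CV definition given above. The function names are hypothetical; the numerical inputs are the assay detection limits and the follicular‐phase PFree‐SAL summary values reported in this paper.

```python
# Conversion factor stated in the text: 1 pmol/L = 0.3145 pg/mL
# (equivalently, 1 nmol/L = 0.3145 ng/mL).
MASS_PER_MOLAR_UNIT = 0.3145


def molar_to_mass(concentration_molar: float) -> float:
    """Convert pmol/L to pg/mL (or nmol/L to ng/mL)."""
    return concentration_molar * MASS_PER_MOLAR_UNIT


def coefficient_of_variation(sd: float, mean: float) -> float:
    """CV = SD / mean, as defined in the analytical approach."""
    return sd / mean


if __name__ == "__main__":
    # Salivary assay detection limit: 31.8 pmol/L, reported as 10.0 pg/mL.
    print(round(molar_to_mass(31.8), 1))
    # Follicular-phase PFree-SAL: mean 100.5, SD 50.8 pmol/L.
    print(round(coefficient_of_variation(50.8, 100.5), 2))
```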
The data were also evaluated graphically. In the case of data with a large dynamic range (i.e., data for which the largest value is much larger than the smallest value, as is often the case for PFree‐SAL and PTotal‐SER), a standard “linear” plot makes it hard to distinguish relative variation among the smaller values. A plot on logarithmic scales makes it much easier to perceive relative variation among all the values, both small and large. For both linear and log scales, scatterplots of PFree‐SAL versus PTotal‐SER are a useful tool for understanding the joint distribution of these data. Such scatterplots can convey important information which is hidden by summary statistics or one‐dimensional plots such as histograms or cumulative distributions.
Recall that differences in laboratory protocols usually preclude direct comparison of analyte measurements across studies. However, plots using logarithmic scales allow for cross‐study comparisons of the trends and shapes in the distributions of the data. That is, suppose for example that we are comparing two studies (“A” and “B”), and that A's serum assay gives 1.25 times the result of B's serum assay for the same samples, and A's saliva assay gives 1.5 times the result of B's saliva assay for the same samples. Then in a log–log scatterplot of PFree‐SAL versus PTotal‐SER (such as those in Figures 5 and 8), the pattern of A's and B's points will be identical, with A's pattern shifted to the right by log(1.25) and up by log(1.5) with respect to B's pattern. In a log‐scale histogram or cumulative distribution of UF (such as in Figure 9), A's pattern will again be identical to B's pattern, except shifted to the right by log(1.5/1.25).
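The shift argument can be checked numerically. The Python sketch below uses three hypothetical study‐B pairs (the values are illustrative, not measured data) together with the calibration factors 1.25 and 1.5 from the example, and confirms that every point moves by the same offsets on log scales.

```python
import math

SERUM_FACTOR = 1.25   # study A serum assay reads 1.25x study B (example in text)
SALIVA_FACTOR = 1.5   # study A saliva assay reads 1.5x study B (example in text)

# Hypothetical study-B pairs: (PTotal-SER in nmol/L, PFree-SAL in pmol/L).
pairs_b = [(0.9, 72.0), (4.0, 120.0), (30.0, 600.0)]
pairs_a = [(serum * SERUM_FACTOR, saliva * SALIVA_FACTOR)
           for serum, saliva in pairs_b]


def log_coords(pair):
    """Coordinates of one pair on a log-log scatterplot."""
    serum, saliva = pair
    return math.log10(serum), math.log10(saliva)


def log_uf(pair):
    """log10 of the uptake fraction, with serum converted to pmol/L."""
    serum_nmol, saliva_pmol = pair
    return math.log10(saliva_pmol / (serum_nmol * 1000.0))


# Offset of each study-A point relative to the matching study-B point.
offsets = [(log_coords(a)[0] - log_coords(b)[0],
            log_coords(a)[1] - log_coords(b)[1])
           for a, b in zip(pairs_a, pairs_b)]
uf_offsets = [log_uf(a) - log_uf(b) for a, b in zip(pairs_a, pairs_b)]

if __name__ == "__main__":
    print(offsets)      # each is about (log10(1.25), log10(1.5))
    print(uf_offsets)   # each is about log10(1.5 / 1.25)
```

Because the offsets are the same for every pair, the shape of the point cloud is preserved and only its position changes, which is what permits comparison of distributional patterns across studies that used different assays.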
FIGURE 5.

Uptake fraction (UF = PFree‐SAL/PTotal‐VEN) in Bolivia sample (66 PFree‐SAL–PTotal‐VEN pairs, 36 women). PFree‐SAL (y‐axis, log scale) versus PTotal‐VEN (x‐axis, log scale). Diagonals represent lines of constant UF from 0.5% to 100%. For each sample pair, cycle phase is follicular (blue triangle), luteal (red dot), or uncertain (green cross). Vertical dashed line = 4 nmol/L PTotal‐VEN, above which all sample pairs have UF between 1.5% and 4.5%. Two vertical black arrows represent the phase‐specific change in UF due to the same quantity of a hypothetical contaminant in a saliva sample (discussed in text; see Table 3). Circled symbols are PFree‐SAL–PTotal‐VEN pairs with implausibly high UF, suggesting sample contamination. Shaded areas are below the limit of assay sensitivity.
FIGURE 9.

Distribution of % uptake fraction (x‐axis, log scale) in four study samples for luteal‐phase (red) and follicular‐phase (blue) samples. Histogram (left y‐axis, bins are equally spaced in log(UF)), cumulative distribution (right y‐axis), median (◊), and IQR (|—|). Panel A (this study); Panel B (data from Evans 1986); Panels C and D (data from Chatterton et al. 2006).
2.4.1. Calculating and Comparing UF
There is no standard approach for calculating mean UF, and the most common computation (the ratio of arithmetic mean [A‐mean] of the saliva analyte to the A‐mean of the venous analyte) is not the most useful. Nonetheless, for comparison of our findings to the published literature, we calculated this ratio as has been done in most other reports. Specifically,
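$$\mathrm{UF} = \frac{\frac{1}{N}\sum_{i=1}^{N} P_{\mathrm{Free\text{-}SAL},\,i}}{\frac{1}{N}\sum_{i=1}^{N} P_{\mathrm{Total\text{-}SER},\,i}} \qquad (1)$$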
where N equals the number of pairs of concurrently collected saliva and serum concentrations, PFree‐SAL,i and PTotal‐SER,i respectively, for i = 1, 2, 3, …, N. Note that some authors have calculated this ratio using non‐paired data (i.e., all available measurements, even if one member of one or more paired samples was unavailable [e.g., lost because of contamination or assay failure]).
In Appendix B, we discuss the limitations of this common computational approach and present more suitable alternatives (e.g., geometric mean or median) for describing and comparing UF.
Regardless of which measure of central tendency is chosen, it is strongly preferable to use only paired data when computing the UF. If non‐paired data are used, then UF is inherently confounded by any inter‐cycle‐phase or inter‐individual variation in PFree‐SAL, PTotal‐SER, and/or UF that may be present. For example, a data set with a larger fraction of luteal phase saliva samples than the fraction of luteal phase serum samples will have an upwardly biased UF. This bias is easy to envision for the extreme hypothetical case of a study sample comprising only ovulatory cycles with all serum samples collected during the follicular phase and all saliva samples collected during the luteal phase: the computed UF would be systematically biased upwards.
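The direction of this bias can be illustrated with a small Python sketch. The four sample pairs are hypothetical (constructed with a UF of 8% in the follicular phase and 2% in the luteal phase), and the function name `ratio_of_means_uf` is introduced only for this illustration.

```python
def ratio_of_means_uf(saliva_pmol, serum_nmol):
    """UF (%) as (A-mean of saliva) / (A-mean of serum), serum converted to pmol/L."""
    mean_saliva = sum(saliva_pmol) / len(saliva_pmol)
    mean_serum_pmol = 1000.0 * sum(serum_nmol) / len(serum_nmol)
    return 100.0 * mean_saliva / mean_serum_pmol


# Hypothetical pairs: (phase, PTotal-SER in nmol/L, PFree-SAL in pmol/L).
pairs = [
    ("follicular", 1.0, 80.0),
    ("follicular", 0.8, 64.0),
    ("luteal", 20.0, 400.0),
    ("luteal", 30.0, 600.0),
]

# Paired computation: every serum value has its matching saliva value.
paired_uf = ratio_of_means_uf([sal for _, _, sal in pairs],
                              [ser for _, ser, _ in pairs])

# Non-paired computation: suppose both follicular saliva samples were lost,
# but all four serum values are still used in the denominator.
unpaired_uf = ratio_of_means_uf(
    [sal for phase, _, sal in pairs if phase == "luteal"],
    [ser for _, ser, _ in pairs],
)

if __name__ == "__main__":
    print(paired_uf, unpaired_uf)  # the non-paired value is the larger one
```

In this constructed case the saliva mean is drawn only from luteal samples while the serum mean still includes low follicular values, so the non‐paired ratio overstates the UF.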
3. Results
Table 1 presents sample characteristics and hormone concentrations for those women who completed the study protocol (n = 36). These variables did not statistically differ between the women randomly assigned to the two plates used in each assay.
TABLE 1.
Sample characteristics and hormone concentrations for 36 participants, 66 saliva–serum pairs (29 follicular phase, 34 luteal phase, 3 phase uncertain).
| Characteristic | Value | Range |
|---|---|---|
| Age (mean ± SD, years) | 28.1 ± 6.6 | 20.1 to 39.8 |
| Menarcheal age (median years) | 13 | 9 to 19 |
| Age at first birth (median years), n = 19 | 22.5 | 16.9 to 33.0 |
| Number of live births (median) | 1 | 0 to 3 |
| Currently breastfeeding (n) | 4 | |
| Education (median, years) | > 12 | 5 to 12+ |
| Principal occupation | ||
| Student | 53% (19) | |
| Employed | 39% (14) | |
| Homemaker | 8% (3) | |
| Height (mean ± SD, cm) | 155 ± 5.9 | |
| Weight (mean ± SD, kg) | 57.5 ± 7.9 | |
| BMI (mean ± SD, kg/m2) | 24.0 ± 3.6 | |
| Cycle duration (median days) | 28.0 | 22 to 38 |
| Follicular phase: reverse cycle day range (median) | −21 | −29 to −15 |
| Luteal phase: reverse cycle day range (median) | −8 | −14 to −2 |
| PFree‐SAL (pmol/L; mean ± SD [CV]) | | |
| Follicular phase | 100.5 ± 50.8 [0.51] | |
| Luteal phase | 446.5 ± 277.3 [0.62] | |
| PTotal‐SER (nmol/L; mean ± SD [CV]) | | |
| Follicular phase | 0.989 ± 0.525 [0.53] | |
| Luteal phase | 19.14 ± 10.64 [0.56] | |
| Correlation: follicular‐phase UF vs. luteal‐phase UF (n = 17) | rho = −0.19 (p = 0.462) | |
For each assay, controls (provided by the manufacturers except for pooled serum) were run in quadruplicates on two plates. For PFree‐SAL low (137 pmol/L) and high (3121 pmol/L) controls respectively, inter‐assay CVs were 5.3% and 2.8%, and intra‐assay CVs were all < 4.1% and < 4.5%. Intra‐assay CV (duplicates of participants' samples) was 2.2%. For PTotal‐SER pooled low (6.52 nmol/L) and high (30.85 nmol/L) controls respectively, inter‐assay CVs were 13.1% and 4.4%, and intra‐assay CVs were all < 11.2% and < 4.4%. Intra‐assay CV (duplicates of participants' samples) was 4.3%.
Figure 5 is a scatterplot on log scales of PFree‐SAL versus PTotal‐VEN for all 66 sample pairs from 36 Bolivians. Dashed diagonals represent lines of constant UF from 0.5% to 100% (mean UF reported by other researchers ranges from about 0.9% to 9% [Appendix A] depending on cycle phase and method of computation).
In our Bolivia sample, UF is between 1.5% and 4.5% for all sample pairs in which PTotal‐VEN is > 4 nmol/L (indicated in Figure 5 by vertical dashed line), and UF is < 10% for all cases of PTotal‐VEN > 1 nmol/L. A set of nine pairs (circled symbols), each with PTotal‐VEN < 1 nmol/L (for these nine, mean PTotal‐VEN = 0.536 ± 0.244 nmol/L), has implausibly high UF values (18.6%–52.5%). Eight were collected during the follicular phase and one during the very late luteal phase (red dot). Each of the nine pairs is from a different participant. The second PFree‐SAL–PTotal‐VEN sample pair collected from each of these nine participants has biologically typical UF (1.67%–5.44%) and, collectively, much higher mean PTotal‐VEN (19.52 ± 12.16 nmol/L), typical of luteal‐phase P.
Excluding the one luteal‐phase outlier (circled red dot), PTotal‐VEN is > 1 nmol/L in all luteal phase samples, and UFLUT is 1.5%–8.8% (median = 2.2%, G‐mean = 2.4%). Excluding the eight follicular‐phase outliers, UFFOLL is slightly more variable (3.2%–12.6%) and greater (median = 7.3%, G‐mean = 7.0%) than UFLUT (also see Figure 9A in Section 4.2).
The correlation of PFree‐SAL and PTotal‐VEN is high for the total sample (Spearman rho = 0.858) and luteal‐phase pairs (rho = 0.857), but low for the follicular‐phase pairs (rho = 0.18) (Table 2). Correlation of the follicular‐phase pairs rises dramatically (rho = 0.70) once eight biologically improbable outliers are excluded from the analysis. (For comparison, Table 2 also shows results from two other studies that are discussed in Section 4.2.)
TABLE 2.
PFree‐SAL versus PTotal‐VEN correlations for Bolivia and Chicago samples in three data sets.
| | Bolivia samples | Chicago samples |
|---|---|---|
| This study | This study | |
| PFree‐SAL vs. PTotal‐VEN (all pairs) | rho = 0.858 (n = 66) | |
| PFree‐SAL vs. PTotal‐VEN (all luteal‐phase pairs) | rho = 0.857 (n = 34) | |
| PFree‐SAL vs. PTotal‐VEN (luteal; outlier excluded) | rho = 0.70 (n = 33) | |
| PFree‐SAL vs. PTotal‐VEN (all follicular‐phase pairs) | rho = 0.18 (n = 29) | |
| PFree‐SAL vs. PTotal‐VEN (follicular; outliers excluded) | rho = 0.70 (n = 21) | |
| Source: Lu et al. (1997) | | Lu‐Chicago |
| Luteal PFree‐SAL vs. PTotal‐VEN | | r = 0.75 (n = 48), p < 0.001 |
| Data source: Chatterton et al. (2006); samples assayed at Northwestern University | NW‐Bolivia | NW‐Chicago |
| PFree‐SAL vs. PTotal‐VEN (all pairs, presumed luteal) | r = 0.45 (n = 25), p = 0.026 | r = 0.17 (n = 15), p = 0.55 |
| PFree‐SAL vs. PTotal‐VEN (outliers excluded) | r = 0.43 (n = 24), p = 0.037 | r = 0.48 (n = 13), p = 0.10 |
Figure 6 is a scatterplot of UFLUT (y‐axis) versus UFFOLL (x‐axis) for the 17 women in our Bolivia sample who each had both a follicular and a luteal serum/saliva pair with no evidence of contamination in either pair. The correlation between an individual's UFFOLL and UFLUT was low (rho = −0.19) and nonsignificant (p = 0.462). In other words, it was not the case that an individual having a relatively high follicular‐phase UFFOLL tended to also have a relatively high luteal‐phase UFLUT, or vice versa. This observation refutes Hypothesis 3 (i.e., that within a population, UF is consistently higher [or lower] in some individuals than in most others).
FIGURE 6.

UFLUT versus UFFOLL in this study of Bolivian women. N = 17 women, each having a follicular and a luteal serum/saliva pair (nine outlier PFree‐SAL–PTotal‐VEN pairs with UF > 18% were excluded).
Figure 7 depicts the best‐fit mixed models for PFree‐SAL as a function of PTotal‐VEN (see endnote 7). The model parameters changed only slightly when the nine outlier PFree‐SAL–PTotal‐VEN pairs with UF > 18% were excluded from the analysis. Variables for cycle phase, cycle day, and reverse cycle day were not significant when included in the model and did not improve the fit of the model (analyses not shown).
FIGURE 7.

Best‐fit mixed models for PFree‐SAL as a function of PTotal‐SER. Individual intercepts are treated as random effects. Solid line: N = 66 PFree‐SAL–PTotal‐VEN pairs, intercept = 77.4 pmol/L (p < 0.001), β = 0.0191 (p < 0.001). Dashed line (nine pairs having a likely contaminated sample are excluded): N = 57 pairs, intercept = 121.3 (p < 0.001), β = 16.5 (p < 0.001). Insert in upper left is enlargement of boxed section of plot in lower left quadrant; blue triangles = follicular phase, red circles = luteal phase, green cross = phase uncertain; circled symbols are PFree‐SAL–PTotal‐VEN pairs with implausibly high UF, suggesting sample contamination.
4. Discussion
Analyses of these new data from Bolivian women support the utility of, and highlight the necessary conditions for, the use of PFree‐SAL as a biomarker of ovarian functioning and as a proxy for PTotal‐VEN in individuals and across populations.
In this set of paired saliva and venous blood (specifically, serum) samples, mean and SD for follicular‐ and luteal‐phase PTotal‐VEN and PFree‐SAL (Table 1) are comparable to published serum reference values (Stricker et al. 2006) and “expected normal ranges” for the salivary assay (Salimetrics 2010), respectively. The similarity of the samples' CV (Table 1) for PTotal‐VEN and PFree‐SAL in each cycle phase (respectively, 0.53 and 0.51 in the follicular phase, and 0.56 and 0.62 in the luteal phase) indicates that phase‐specific PFree‐SAL is not substantially more or less variable than is PTotal‐VEN for the same phase. The correlations (Spearman rho) of PFree‐SAL and PTotal‐VEN for all 66 pairs (rho = 0.858) and for the luteal‐phase pairs (rho = 0.857) are high. The correlation for follicular‐phase pairs is 0.70 once outliers with an implausible UF (> 18%) are excluded.
At least some of the differences in previously reported correlations of PFree‐SAL and PTotal‐VEN arise from study differences in time lags between the collections of blood and a paired saliva sample. For this reason, in the present study in which the specific goal was to evaluate the concurrent UF, we took particular care to collect saliva near‐concurrent with the blood draw. Illustrative of this point, the two saliva–serum pairs in our study which were not collected concurrently (and were not included in the statistical analyses) had UFs of 26% and 12%, both substantially higher than the median UF in our study.
Follicular‐phase PFree‐SAL and PTotal‐VEN pairs correlated well (rho = 0.70) once biologically implausible outliers were excluded from the analysis (circled symbols in Figure 5). Such outliers are credibly explained by saliva contamination from minute cross‐reactants (e.g., food particles, drink residues, chewed substances, ointments, make‐up [Núñez‐de la Mora et al. 2007; Vitzthum et al. 1993; Wang and Knyba 1985]). Not surprisingly, when PTotal‐VEN is very low, as is typical during the follicular phase, the impact of a contaminant is proportionally far greater than when PTotal‐VEN is high.
For example (Table 3), suppose that the true UFFoll = 8% and UFLut = 2%, and that some saliva samples have a contaminant that cross‐reacts in the assay to produce a false PFree‐SAL reading of 120 pmol/L. Then for a true mid‐follicular PTotal‐VEN of 900 pmol/L and PFree‐SAL of 72 pmol/L, the sum of the true and false readings yields a measured PFree‐SAL of 192 pmol/L and a measured UF of 21.3% (tall vertical black arrow in Figure 5). In contrast, for a true mid‐luteal PTotal‐VEN of 30 nmol/L and PFree‐SAL of 600 pmol/L, adding the false reading from contamination yields a measured PFree‐SAL of 720 pmol/L, which only raises the measured UF to 2.4% (short vertical black arrow in Figure 5).
TABLE 3.
Different impacts of same hypothetical saliva contaminant (which cross‐reacts in the assay to produce a false PFree‐SAL reading) in follicular versus luteal phases.
| Mid‐follicular | PTotal‐VEN (nmol/L) | PFree‐SAL (pmol/L) | UFFOLL (%) |
|---|---|---|---|
| True | 0.9 | 72 | 8.0 |
| Contaminant | | 120 | |
| Total (measured) | | 192 | 21.3 |

| Mid‐luteal | PTotal‐VEN (nmol/L) | PFree‐SAL (pmol/L) | UFLUT (%) |
|---|---|---|---|
| True | 30 | 600 | 2 |
| Contaminant | | 120 | |
| Total (measured) | | 720 | 2.4 |
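The arithmetic in Table 3 can be reproduced with a short Python sketch. The function name `measured_uf_percent` is hypothetical; the inputs are the hypothetical values used in the table (true UF of 8% and 2%, and a false reading of 120 pmol/L from the contaminant).

```python
def measured_uf_percent(serum_nmol_per_l: float,
                        true_uf_percent: float,
                        contaminant_pmol_per_l: float) -> float:
    """UF (%) that would be measured when a cross-reacting contaminant
    adds a fixed false reading to the true salivary concentration."""
    serum_pmol = serum_nmol_per_l * 1000.0            # nmol/L -> pmol/L
    true_saliva = serum_pmol * true_uf_percent / 100.0
    measured_saliva = true_saliva + contaminant_pmol_per_l
    return 100.0 * measured_saliva / serum_pmol


if __name__ == "__main__":
    # Mid-follicular: PTotal-VEN = 0.9 nmol/L, true UF = 8%.
    print(round(measured_uf_percent(0.9, 8.0, 120.0), 1))
    # Mid-luteal: PTotal-VEN = 30 nmol/L, true UF = 2%.
    print(round(measured_uf_percent(30.0, 2.0, 120.0), 1))
```

The same additive error more than doubles the apparent follicular UF but changes the luteal UF by less than half a percentage point, because the error is divided by a much larger serum concentration in the luteal phase.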
Such low‐level contamination in some samples may explain our observation that, independent of phase, PFree‐SAL and PTotal‐VEN are highly correlated in pairs in which PTotal‐VEN > 4 nmol/L (rho = 0.807, p < 0.001), poorly correlated in pairs in which PTotal‐VEN < 4 nmol/L (rho = 0.168, p = 0.306), but moderately well correlated in pairs in which PTotal‐VEN < 4 nmol/L once samples with anomalously high UF (> 15%, evidence of likely contamination) are excluded (rho = 0.697, p < 0.001).
4.1. Hypothesis 1: UF Differs by Ovarian Cycle Phase
These new data from Bolivian women support Hypothesis 1. Excluding outliers, median UFFOLL (=7.3%) is more than three‐fold higher than median UFLUT (=2.2%). Previous studies have reported phase‐associated differences in mean UF of similar (Appendix A: Studies 2, 5, 7, 12) (Choe et al. 1983; Evans 1986; Tallon et al. 1984; S. Walker et al. 1981) and even greater magnitude (Appendix A: Studies 9, 13, 14) (Bourque et al. 1986; Cedard et al. 1984; De Boever et al. 1986).
To illustrate, we plotted our Bolivian data shown in Figure 7 on a log scale (Figure 8A) and data from a New Zealand (NZ) sample (Evans 1986) on the same log scale (Figure 8B), and also depicted lines of equal UF (dotted diagonals). In the NZ sample (outliers were excluded by Evans), median UFFOLL = 2.7% (n = 10) and was three‐fold higher than UFLUT (=0.9%, n = 23), a ratio similar to that observed in our Bolivian sample.
Also note the similar shape in the distributions of Bolivian and NZ data. In both the Bolivian and NZ samples, luteal PFree‐SAL–PTotal‐VEN pairs (red dots) tend to monotonically increase together, hence falling within a relatively narrow band of UF values at the highest serum values. The linear regressions (red curves plotted in Figure 8A,B) of PFree‐SAL on PTotal‐VEN for luteal phase pairs (outliers excluded) in these study samples are also similar. Furthermore, in both the Bolivian and NZ samples, the distribution of the follicular PFree‐SAL–PTotal‐VEN pairs (blue triangles) is more globular and spans a wider range of UF values than the distribution of the luteal pairs (red dots). The distribution of data (panel 8C) from a UK sample (Wang and Knyba 1985) also resembles those in the Bolivia and NZ samples (i.e., a narrower band of UF at higher PTotal‐VEN, fanning into a wider range of UF as PTotal‐VEN decreases) with the caveat that cycle phase was not identified in the UK data (samples were collected at random during the cycle).
The sources of observed phase differences in UF are not immediately obvious. The disproportionate effect of a contaminant when PFree‐SAL concentration is low is one factor (discussed above). However, this sort of random error is unlikely to produce the fairly consistent UFFOLL values that have been reported by many independent studies (Appendix A). Therefore, idiosyncratic effects aside, there is presumably a biological explanation for relatively higher UFFOLL compared to UFLUT.
Explicitly, this difference in UF is likely a consequence of phase‐specific physiology (De Geyter et al. 2002; Hamidovic et al. 2020; Hodyl et al. 2020; Vitzthum et al. 2006). For example, Misao et al. (1999) reported that CBG mRNA level is positively correlated with PTotal‐VEN (r = 0.85, p < 0.01) and is highest at the mid‐luteal phase. Cameron et al. (2010) demonstrated that CBG affinity for P is temperature dependent over a clinically relevant range of human body temperature. They also suggested that because CBG binds preferentially to cortisol over progesterone, changes in plasma cortisol levels could result in changes in the free versus bound fractions of venous P and hence the ratio of PFree‐VEN to PTotal‐VEN and PFree‐SAL to PTotal‐VEN (i.e., the UF). Lewis and Elder (2011, 2013) challenged standing assumptions regarding the dynamic relationship between total CBG and cortisol (and by extension, P) with evidence that intact and elastase‐cleaved CBG may coexist in circulation.
Collectively, these studies are consistent with the hypothesis that the variation in UF with cycle phase reflects corresponding variation in the free versus bound fractions of venous P, itself related to fluctuations in CBG conformers and/or bound/free cortisol concentrations. In other words, it appears that with ovulation, there follows a progressive increase in P and upregulation of CBG mRNA, such that a lower fraction of venous P is free (and available to diffuse into saliva); but the net effect is higher PFree‐VEN in the luteal than in the follicular phase (as was observed by Choe et al. 1983; Evans 1986; Wang and Knyba 1985; Appendix A: Studies 5, 10, 12).
4.2. Hypothesis 2: UF in Bolivian Women Differs From That of Non‐Bolivian Women
Our new data from Bolivian women refute Hypothesis 2. Specifically, the UFs in our study sample of Bolivians (Table 1, Figures 5 and 9) are consistent with other reported values for various populations (Appendix A: Studies 5, 7, 14 [Choe et al. 1983; De Boever et al. 1986; Tallon et al. 1984]) but differ from those of Chatterton et al. (2006), who argued that steroid biochemistry is somehow unusual in Bolivians. Further examination of the data used by Chatterton et al. (2006), described below, suggests more prosaic interpretations of those data.
Based on pairs of PFree‐SAL and PTotal‐VEN samples collected at the presumptive mid‐luteal phase, Chatterton et al. (2006) reported that the mean PTotal‐VEN for NW‐Bolivia was 197% of the mean PTotal‐VEN for NW‐Chicago, but the mean PFree‐SAL for NW‐Bolivia was only 48% of the mean PFree‐SAL in NW‐Chicago (Table 4). This computational approach is unique to their study and therefore cannot be compared to the work done by other investigators (Appendix A). More importantly, their approach is useful (i.e., arithmetically and biologically meaningful) only if the UF is constant for all saliva–venous pairs in each of the analytical samples (i.e., only if the UF does not vary between saliva–venous pairs in the NW‐Chicago sample, and likewise for the NW‐Bolivia sample). It is evident in Figures 8D and 9C,D, that in both the NW‐Chicago and NW‐Bolivia samples there is marked variation in UF across the paired saliva–venous samples even though all pairs were collected during the presumptive mid‐luteal phase. Furthermore, the variation is greater in the NW‐Chicago sample (the two sample pairs we designated as outliers in Figure 8D were included in the analyses in Chatterton et al. (2006)). In sum, their calculation of the ratio of the group means for each analyte for the two population samples cannot address whether and to what extent the UF of paired saliva and venous samples might differ between populations.
TABLE 4.
Computational approaches for evaluating the relationship between PFree‐SAL and PTotal‐VEN in NW‐Bolivia and NW‐Chicago samples.
| Population ratio for each analyte (reported by Chatterton et al. 2006) | Value |
|---|---|
| (NW‐Bolivia mean PTotal‐VEN)/(NW‐Chicago mean PTotal‐VEN) | 30.2/15.3 = 197% |
| (NW‐Bolivia mean PFree‐SAL)/(NW‐Chicago mean PFree‐SAL) | 252/522 = 48% |

| UF computation ᵃ | NW‐Bolivia | NW‐Chicago |
|---|---|---|
| All data (reported by Chatterton et al. 2006) | n: PFree‐SAL = 25, PTotal‐VEN = 26 | n: PFree‐SAL = 17, PTotal‐VEN = 18 |
| (A‐mean PFree‐SAL)/(A‐mean PTotal‐VEN) | 0.83% | 3.4% |
| Alternative approaches/estimates for same data used by Chatterton et al. (2006) | | |
| Only paired data (PFree‐SAL & PTotal‐VEN) | n pairs = 25 | n pairs = 15 |
| (A‐mean PFree‐SAL)/(A‐mean PTotal‐VEN) | 0.87% | 3.1% |
| A‐mean (individual PFree‐SAL/PTotal‐VEN) | 1.97% | 12.1% |
| (G‐mean PFree‐SAL)/(G‐mean PTotal‐VEN) | 1.17% | 5.0% |
| G‐mean (individual PFree‐SAL/PTotal‐VEN) | 1.17% | 5.0% |
| Median (individual PFree‐SAL/PTotal‐VEN) | 0.74% | 4.4% |
| Paired data without outliers | n pairs = 24 | n pairs = 13 |
| (A‐mean PFree‐SAL)/(A‐mean PTotal‐VEN) | 0.88% | 2.5% |
| A‐mean (individual PFree‐SAL/PTotal‐VEN) | 2.04% | 5.1% |
| (G‐mean PFree‐SAL)/(G‐mean PTotal‐VEN) | 1.2% | 3.45% |
| G‐mean (individual PFree‐SAL/PTotal‐VEN) | 1.2% | 3.45% |
| Median (individual PFree‐SAL/PTotal‐VEN) | 0.77% | 3.14% |

ᵃ A‐mean = arithmetic mean; G‐mean = geometric mean.
Chatterton et al. (2006) also reported (A‐mean PFree‐SAL)/(A‐mean PTotal‐VEN) using all measurements, regardless of whether they were paired (Table 4). This approach appears to be the most frequently used by others (Appendix A); however, it is often not clear in these studies whether a previously reported UF included all measurements or only paired data. Using non‐paired data means that any between‐day or between‐individual variation in PFree‐SAL, PTotal‐VEN, and/or UF within a group confounds the inter‐group UF difference (see discussion in Section 2.4 and Appendix B). Calculated using all data, UFLut for NW‐Bolivia = 0.83%, similar to the range of values reported by others for the luteal phase (Appendix A), and NW‐Chicago = 3.4%, greater than any of these previous reports. Table 4 also presents UF for each NW sample using alternative computational approaches and only paired data. Notably, for NW‐Chicago the arithmetic mean of all the individual UFs is 12.1%, an exceptionally high mean UF that strongly suggests the presence of one or more outliers and/or other anomalies in the NW‐Chicago data.
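The alternative computations listed in Table 4 can be compared side by side in a short sketch. The data below are hypothetical values chosen only for illustration (they are not measurements from this or any other study), and the function and variable names are our own. The sketch assumes PFree‐SAL in pmol/L and PTotal‐VEN in nmol/L, as elsewhere in this paper, and uses only paired samples.

```python
import math
from statistics import mean, median


def geometric_mean(values):
    """Geometric mean computed via logarithms; all values must be positive."""
    return math.exp(mean(math.log(v) for v in values))


def uf_estimates(saliva_pmol_l, venous_nmol_l):
    """Return several 'mean UF' estimates (as fractions) for paired samples.

    Venous values are converted from nmol/L to pmol/L (x 1000) so that
    each ratio is unitless.
    """
    venous_pmol_l = [v * 1000.0 for v in venous_nmol_l]
    ratios = [s / v for s, v in zip(saliva_pmol_l, venous_pmol_l)]
    return {
        "ratio_of_a_means": mean(saliva_pmol_l) / mean(venous_pmol_l),
        "a_mean_of_ratios": mean(ratios),
        "ratio_of_g_means": geometric_mean(saliva_pmol_l)
        / geometric_mean(venous_pmol_l),
        "g_mean_of_ratios": geometric_mean(ratios),
        "median_of_ratios": median(ratios),
    }


# Hypothetical luteal-phase pairs (NOT data from any study). The last pair
# mimics an outlier: a high salivary value paired with a low venous value.
saliva = [300.0, 450.0, 250.0, 500.0, 950.0]  # PFree-SAL, pmol/L
venous = [20.0, 30.0, 15.0, 35.0, 2.0]        # PTotal-VEN, nmol/L

estimates = uf_estimates(saliva, venous)
for name, value in estimates.items():
    print(f"{name}: {100 * value:.2f}%")
```

In this hypothetical sample, the arithmetic mean of the individual ratios is pulled far above the median by the single outlier pair, whereas the two geometric‐mean computations agree with each other and are less affected.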
The radioimmunoassay (RIA) protocol used by Chatterton et al. (2006) was first published by Lu et al. (1997), who reported r = 0.75 for paired PFree‐SAL–PTotal‐VEN samples collected during the luteal phase from 48 Chicago women (designated here as “Lu‐Chicago”). In contrast, the correlation is only r = 0.17 for NW‐Chicago (n = 15) and r = 0.45 for NW‐Bolivia samples (n = 25) (Table 2). Excluding three extreme outliers (which lower r and strongly bias the group means), the correlation is still much lower in both samples (NW‐Chicago r = 0.48; NW‐Bolivia r = 0.43) than those reported by Lu et al. (1997) and nearly all other researchers (Appendix A). These much lower correlations strongly suggest—particularly in the case of the NW‐Chicago sample—that these assay data do not accurately represent the true hormone levels in the study participants, most likely due to failed laboratory protocols and/or sample contamination or degradation.
Some of the problems with the NW‐Chicago and NW‐Bolivia data are evident in the scatterplot of the data and the plotted regression line fitted to the non‐outlier luteal‐phase data in each sample (Figure 8D). In such a scatterplot, an ideal data set (PFree‐SAL highly correlated with PTotal‐VEN) would have all the points representing luteal‐phase pairs falling on a relatively narrow diagonal band of UF and those points representing follicular‐phase pairs falling within a somewhat broader band with biologically reasonable values and ranges for each hormone and each UF. Correspondingly, the luteal‐phase regression model would have an intercept that is only a tiny fraction of a typical mid‐luteal PFree‐SAL, so that the linear (“slope”) term in the model contributes a large majority of the modeled PFree‐SAL for almost all luteal data. Such patterns are seen in Figure 8A–C. The NW‐Chicago and NW‐Bolivia data are far from this expectation. In particular, there are extreme outliers. Two of the NW‐Chicago PFree‐SAL values are excessively high (> 900 pmol/L), and one NW‐Bolivia pair (boxed red dot in Figure 8D) has the implausible combination of a very low PFree‐SAL (51 pmol/L), suggesting an anovulatory cycle or a follicular sample, paired with a moderately high PTotal‐VEN (13.7 nmol/L), typical of an ovulatory cycle's luteal phase. These outliers severely bias the sample means, rendering questionable any interpretations based on the means reported by Chatterton et al. (2006) (see Table 4).
Moreover, the data for the NW‐Chicago sample are actually more anomalous than those for the NW‐Bolivia sample and thus are not explained by the hypothesis that Bolivians have unusual physiology and/or biochemistry. Figure 9 presents histograms and cumulative distributions of UF by cycle phase (red = luteal, blue = follicular) in Bolivia (this study, panel 9A) and NZ (panel 9B), and for mid‐luteal phase sample pairs in NW‐Chicago (panel 9C) and NW‐Bolivia (panel 9D); the x‐axis is log scale UF. In both Bolivia and NZ, the median luteal phase UF is lower and the variability is less than for the respective follicular‐phase UF. The luteal phase UF is much more variable (shown in Figure 9 as the distributions being much broader and the IQR “error bars” being much wider) in both NW samples (Figure 9C,D) than in the Bolivia (Figure 9A) or NZ (Figure 9B) samples. In fact, the NW‐Chicago sample has a more variable luteal UF (IQR = 3.45%) than the NW‐Bolivia sample (IQR = 1.63%).
The effects of these anomalies are evident in Figure 8D. Most notably, in both the NW‐Chicago and NW‐Bolivia samples (each comprising only luteal phase measurements), PFree‐SAL shows little systematic variation even as PTotal‐VEN ranges over more than a factor of 20. UF varies over almost as wide a range as PTotal‐VEN but in the opposite direction to PTotal‐VEN so that their product (PFree‐SAL) varies only weakly. This pattern is captured by the plotted regressions for NW‐Chicago (black line) and for NW‐Bolivia (red line), both of which are notably flatter (i.e., have much higher intercepts relative to the mid‐luteal PFree‐SAL) than those in NZ (Figure 8B) and Bolivia (Figure 8A). This relative constancy of PFree‐SAL across the luteal phase in both of the NW study samples is a biologically inexplicable pattern that has never been previously reported. In contrast, other published studies generally show that for data restricted to the luteal phase, PFree‐SAL is roughly proportional to PTotal‐VEN with a relatively narrow range of UF (see, e.g., Figure 9A,B).
We find sample contamination and/or inaccurate assays to be a more parsimonious explanation for the results reported by Chatterton et al. (2006). They suggested possible differences in bioavailability in the NW‐Bolivia sample as an explanation for their findings. However, their results are actually more anomalous (little systematic variation in PFree‐SAL while PTotal‐VEN varies over a wide range) for the NW‐Chicago sample than for the NW‐Bolivia sample (Figure 8D).
Moreover, if UF actually varied in the manner suggested by their presented data, then it would be difficult to explain how many other published studies of several different populations (Ellison et al. 1989; Jasienska and Ellison 1998; Panter‐Brick et al. 1993), including Bolivians (Vitzthum 2013; Vitzthum et al. 2002, 2004) and Chicagoans (Lu et al. 1997), have obtained reasonable profiles of PFree‐SAL during the course of an ovarian cycle (Figure 10).
FIGURE 10.

Salivary progesterone profiles of samples from four populations, measured in the same laboratory using the same assay protocol (Ellison et al. 1993).
4.3. Hypothesis 3: UF is Consistently Higher (Or Lower) in Some Individuals Than in Most Others
Hypothesis 3, proposed by Konishi et al. (2012), was not supported by our analyses of new data from Bolivians. We found no evidence of subsamples of women that had consistently higher or lower UF than the sample average. Notably, an individual's follicular and luteal phase UFs were not significantly correlated (Figure 6). None of the nine Bolivian women exhibiting an unusually high UF (> 19%) for one saliva–serum pair exhibited a particularly high UF in her other saliva–serum pair. Notably, PTotal‐VEN was < 1.0 nmol/L in all nine cases in which UF > 19%. The most parsimonious explanation for these nine outliers is undetected contamination in samples of particularly low PFree‐SAL (i.e., low signal to noise ratio as in the hypothetical example presented in Table 3). Consistent with this explanation, UF was 12%–59% in samples in which subjects had visible gum bleeding (these corrupted measures were neither plotted nor included in our analyses).
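The signal‐to‐noise argument can be made explicit with a simplified additive model. The symbols $S$ (true PFree‐SAL), $V$ (PTotal‐VEN, expressed in the same units as $S$), and $c$ (a fixed amount of contaminating progesterone) are illustrative assumptions introduced here; they are not quantities estimated in this study, nor are they taken from Table 3.

$$
\mathrm{UF}_{\text{measured}} = \frac{S + c}{V} = \mathrm{UF}_{\text{true}} + \frac{c}{V}
$$

Under this assumption, the inflation term $c/V$ is inversely proportional to PTotal‐VEN: a given amount of contamination inflates the measured UF 20 times as much when PTotal‐VEN is 1 nmol/L as when it is 20 nmol/L.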
Konishi et al. (2012) evaluated PFree‐SAL (measured using Salimetrics EIA kits) and PTotal‐VEN (measured in DBS collected from a pricked fingertip) in samples of self‐identified Caucasian or Japanese‐ancestry women living in the Seattle area. Based on standard (fixed effects) hierarchical linear regression (pooled sample of 232 observations, 4 weekly paired saliva‐DBS samples from each of 58 women), they observed that the observations for some women were consistently either above or below the regression line, attributed such patterns to “individual‐specific characteristics perhaps salivary hormone excretion capacity,” and concluded that “salivary P4 [does] not closely mirror between‐individual variation of serum P4…in a substantial proportion of individuals.”
The differences in findings between Konishi et al. (2012) and our study (and also the past studies presented in Appendix A) likely arise from their analytical approaches. In a repeated measures study design (i.e., any study design where samples are collected at multiple times from the same participant), the samples from a single individual are not independent, thus precluding the use of standard linear regression. Mixed‐effects regression models, which we used to evaluate our Bolivian data, can address this nonindependence (Twisk 2006; West et al. 2007) and can (should) incorporate per‐individual random effects to model inter‐individual variation in overall hormone levels. Twisk emphasized that the inclusion of a random intercept is a conceptual necessity in regression models of repeated observations.
Konishi et al. (2012) treated 232 paired samples as independent observations in a standard linear regression analysis. Women with four positive residuals were designated as “high secretors,” those with four negative residuals were designated as “low secretors,” and those with a mix of positive and negative residuals were designated as “medium secretors.” However, the nonindependence of the four sample pairs from each woman means that a woman with one positive residual is more likely to have four positive residuals than would be predicted by random chance alone. Although Konishi et al. (2012) mentioned mixed‐effects models and included the mixed models' coefficients in their table 3, their interpretations and conclusions appear to be based solely on the fixed‐effects models.
In addition, the distribution of their PFree‐SAL data at a given PTotal‐DBS is quite skewed (has a long right tail; see their figure 3A). This means that, contrary to the assumption of their calculation of the expected number of high and low secretors in their sample, each individual residual is not expected to have an equal (50%) probability of being positive or negative. Instead, we expect > 50% negative residuals (relatively smaller in magnitude) and < 50% positive residuals (relatively larger in magnitude). This expectation is broadly consistent with their finding that 40% (23/58) of individuals had four negative residuals, but only 20% (11/58) had four positive residuals. A more precise assessment is not possible due to the nonindependence (correlation) of the four residuals from a single woman.
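The effect of nonindependence on the count of “all positive” and “all negative” residual patterns can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the data of Konishi et al. (2012): the regression coefficients, the normally distributed noise, and the standard deviations are arbitrary values, and the function name is our own. Only the design (58 women, 4 samples each, pooled ordinary least squares) follows the description above.

```python
import random
from statistics import mean


def same_sign_fractions(n_women=58, n_samples=4, between_sd=1.0,
                        within_sd=1.0, n_sims=500, seed=1):
    """Simulate pooled-OLS residuals; return the average fractions of women
    whose residuals are all positive and all negative.

    Each simulated y = 1 + 0.5 * x + woman_effect + noise (arbitrary units).
    between_sd is the SD of the per-woman random intercept; setting it to 0
    makes the residuals nearly independent.
    """
    rng = random.Random(seed)
    all_pos = all_neg = 0
    for _sim in range(n_sims):
        xs, ys, ids = [], [], []
        for woman in range(n_women):
            effect = rng.gauss(0.0, between_sd)
            for _k in range(n_samples):
                x = rng.uniform(0.0, 10.0)
                xs.append(x)
                ys.append(1.0 + 0.5 * x + effect + rng.gauss(0.0, within_sd))
                ids.append(woman)
        # Ordinary least squares fit to the pooled data (ignores clustering).
        mx, my = mean(xs), mean(ys)
        slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
                 / sum((x - mx) ** 2 for x in xs))
        intercept = my - slope * mx
        # Record the sign of each residual, grouped by woman.
        signs = [[] for _ in range(n_women)]
        for x, y, woman in zip(xs, ys, ids):
            signs[woman].append(y - (intercept + slope * x) > 0)
        all_pos += sum(all(s) for s in signs)
        all_neg += sum(not any(s) for s in signs)
    total = n_women * n_sims
    return all_pos / total, all_neg / total


indep = same_sign_fractions(between_sd=0.0)      # no per-woman effect
clustered = same_sign_fractions(between_sd=1.0)  # with per-woman effect
print(f"independent: all positive {indep[0]:.3f}, all negative {indep[1]:.3f}")
print(f"clustered:   all positive {clustered[0]:.3f}, all negative {clustered[1]:.3f}")
```

With independent residuals, each pattern is expected in about (1/2)^4 = 6.25% of women. Adding a per‐woman random intercept raises both fractions substantially even though no woman differs in any “secretor capacity” beyond an overall shift in hormone level, which is exactly what a random intercept in a mixed‐effects model is meant to absorb.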
Another factor that may have confounded Konishi et al.'s analyses is their stated disregard of individual differences in cycle length or whether a cycle was ovulatory (even though they did collect these data). Unfortunately, given their sample collection schedule, this approach is likely to upwardly bias the number of follicular‐phase saliva–blood pairs. The paired samples for their analyses were collected once per week for 4 weeks from each of 48 women, but it is unclear whether the collections were all on the same cycle days. Even if that were the case, those women with longer follicular phases (the follicular phase is more variable in length than the luteal phase) may have contributed three follicular‐phase samples and only one luteal‐phase sample. An additional 10 women collected daily samples at home; of these, the samples from cycle days 3, 10, 17, and 24 were included in the analyses (for a total of 58 cycles, of which 17 were missing a cycle end date). Of course, anovulatory cycles, by definition, would not have contributed any luteal‐phase samples (2 cycles > 73 days in length were considered anovulatory but retained in the analyses; as no other criteria for anovulation were considered, all other cycles, some of which may have been anovulatory, were also retained).
As discussed earlier, UF differs by cycle phase in ovulatory cycles, and the low PTotal‐VEN values in the follicular phase are more likely to have lower signal to noise ratios. Nonetheless, Konishi et al. (2012) (p. 239) judged anovulation to be “irrelevant…because statistical analyses were conducted without taking cycle phase into account.”
As Konishi et al. (2012) did not consider these various sources of potential bias, interpretation of their findings is difficult. The specific cause of the greater similarity among the samples and residuals from a single woman cannot be ascertained from the provided information. Several explanations for their reported observations are more parsimonious than positing unknown mechanisms, including:
a participant's personal sample collection practices (e.g., a participant who typically shortens the waiting period after rinsing may have consistently lower measured UF than would otherwise be the case, and one who has undetected gum bleeding and/or poor tooth brushing/flossing technique, leaving debris in the mouth, may have consistently higher measured UF),
the use of DBS from finger pricks rather than serum from venipuncture,
delays between collecting saliva and DBS,
delayed or inadequate freezing of sample,
inclusion of four samples from each anovulatory cycle, all of which can be expected to have low progesterone and be more susceptible to contamination (which could produce a falsely elevated UF),
other plausible biological factors (e.g., variation in CBG and/or cortisol concentrations).
In sum, it is not surprising that more women have all positive or all negative residuals in the regression analyses of Konishi et al. (2012) than would be expected for independent random variation of residuals across individuals and samples (each with an independent 50%/50% probability of being positive/negative), and it cannot be inferred that such a pattern is likely to be the consequence of hypothesized individual “secretor capacity.” As discussed earlier, PFree‐SAL enters saliva via passive diffusion, the mechanics of which have not been shown to vary significantly between individuals. Studies of saliva flow rate and of salivary metabolism have reported neither factor to be an important source of error or variation in PFree‐SAL (Blom et al. 1993; Laine and Ojanotko 1999).
4.4. Advantages and Limitations of This Study
Our sample collection protocol was specifically optimized for high‐quality saliva collection (see Section 2.2). Advantages of this study include:
Pre‐collection individual training for each study participant included demonstration and practice of saliva collection protocol.
Participants abstained from breakfast the morning of sample collection to reduce the possibility of food‐particle contamination of saliva samples.
A research team member monitored saliva collection to ensure compliance with protocol.
Blood sampling followed saliva collection to minimize any negative impact of blood draws on the saliva sample (passage of PFree from blood to saliva takes only about 1 min [Blom et al. 1993]).
Follow‐up with participant to learn first day of subsequent menses (used to calculate reverse cycle day and estimate day of ovulation).
All sampling in the morning (before lunch) to minimize the potential effect of circadian variation.
Conducted phase‐specific analyses.
Used commercial EIA kits. Most studies that have examined the relationship between PFree‐SAL and PTotal‐VEN have used RIA assays (Table 1). EIA dispenses with the need to handle and dispose of radioactive materials and is now more commonly used in many research projects and smaller labs. But fewer studies have used EIA (instead of RIA) to assess the correlation between PFree‐SAL and PTotal‐VEN, a gap this study narrows.
Use of commercial assay kits makes it easier for other researchers to replicate/generalize our results.
Limitations of this study include:
No direct measure of occurrence and timing of ovulation.
Only two paired saliva/blood collections (follicular phase and presumptive luteal phase) per woman.
Because our study participants met selection criteria (e.g., regularly cycling and not using hormones) intended to minimize the effects of potential confounders, the study sample is not strictly representative of the population of Andean Bolivian women. In the absence of modern contraception, fertility is high (≥ 7 children/family) in Andean Bolivian women (Stinson 1982). The sample's low parity is not surprising (half of the sample are university students) but contrasts with reproductive patterns in the larger population, especially those in rural Andean communities where there is little access to contraception. Although early investigators had suggested high altitude might impair successful reproduction, this hypothesis was not supported by subsequent demographic data (reviewed in Vitzthum (2013)). Notably, there is evidence for natural selection for adaptation to hypoxic environments in Tibetan populations (Ye et al. 2024), but comparable work has yet to be undertaken in Andean populations.
Collectively, the study participants were a convenience (rather than random) sample and likely were, on average, economically and socially better off than the average Andean Bolivian woman. The participants included residents from a poorer neighborhood (where we had conducted a previous study [Vitzthum et al. 2002]). Some were maids/housekeepers or had other low‐paying and/or occasional jobs. The economic status of the study participants attending university was more varied, roughly ranging from low to high middle class; it is unlikely any students were very poor or especially well‐off.
5. Summary and Broader Implications
UF in this study sample of Bolivian women did not differ in any substantial or consistent manner from UF observed in women in other populations. In addition, an individual's follicular‐ and luteal‐phase UF were not correlated, refuting the hypothesis that there are Bolivian women who consistently have relatively higher or lower UF than the sample's mean UF. Mean UF did vary by cycle phase, being generally higher and more variable before ovulation than after. Prior research on the biochemistry of progesterone transport has consistently found that progesterone UF varies between cycle phases. To the best of our knowledge, there is no credible evidence that phase‐specific UF of progesterone substantially differs between human populations or individuals. However, this possibility has not yet been thoroughly investigated. The sources of phase‐associated variation in UF deserve additional study, particularly regarding temporal and individual variation in PTotal‐VEN and PFree‐SAL, and their dynamic relationship to variation in total and different conformers of CBG, and perhaps also to variation in free and bound cortisol.
There is substantial evidence of natural variation in progesterone concentration between populations and between healthy individuals within populations (Figures 1, 3, and 10). At the same time, variability (measured by the ratio of the 95th to the 5th percentile) in progesterone biomarkers is similar in samples of Germans (mean‐peak‐pregnanediol‐glucuronide (PdG) varies 3.65‐fold) and Bolivians (mean‐peak‐PFree‐SAL varies 3.9‐fold) (Vitzthum et al. 2021). Although there are several hypotheses, the cause(s) of progesterone variation are uncertain. There is, nonetheless, clear empirical evidence that Bolivian women successfully reproduce (averaging about 7–8 pregnancies in a lifetime) with an average progesterone concentration only 70% that of US women (Vitzthum et al. 2004). These and other findings raise questions regarding the criteria for defining “normal” progesterone levels.
All too often, it is assumed without even a semblance of justification that the observed biology of Western Europeans and their descendants (aka “white”) is “normal” and thus the natural yardstick by which to assess the normalness of a phenotype in other populations. With regard to menstrual cycling, this myopia has been repeatedly interrogated (Eaton et al. 1994; Strassmann 1997; Vitzthum et al. 2021) and yet it persists, as witnessed by the assumption that the progesterone data from the NW‐Chicago sample were correct and, therefore, that those from the NW‐Bolivia sample were aberrant. To the contrary, compared to the findings from other studies, it is the NW‐Chicago data that were decidedly more “abnormal.”
General acknowledgment of the extensive natural (i.e., healthy and “normal”) variability in ovarian cycling, between individuals and populations, has been slow in coming despite having been extensively documented (Arey 1939; Foster 1889; Fraenkel 1926; Treloar et al. 1967). Because the timing of hormonal changes during the ovarian cycle is uncertain, a single measurement of a cycle hormone provides limited information regarding ovarian functioning. The common practice of collecting data only during the early follicular phase in order to avoid accounting for the effects of ovarian steroid variation has hampered basic research and clinical trials in menstruating persons. The serial collection of salivary biomarkers, including PFree‐SAL, has yielded otherwise unattainable insights regarding variation in human reproductive functioning. There are, however, many aspects of saliva collection, storage, shipping, and processing that require consistent adherence (Ellison 1988; Gröschl 2017; Vitzthum 2021). Appendix C discusses some of the more notable challenges in the assessment of ovarian functioning. Neglecting to address these issues can lead to spurious outcomes and misinterpretations of the data.
With due attention to the collection and handling of saliva samples, PFree‐SAL is a suitable biomarker for evaluating temporal, individual, and population variation in PTotal, particularly if repeated sampling is done during the luteal phase when PTotal is elevated in ovulatory cycles. Investigators have often cautioned that a single P measurement has only limited research or diagnostic utility. A principal advantage of PFree‐SAL is that it permits serial sample collection where this would be difficult or impossible with venous blood collection. In novel contexts, a selected biomarker should be pre‐tested to evaluate research team and study participant compliance with protocols and setting‐ and culturally specific challenges (including logistics and unanticipated sources of sample contamination). Data on likely and potential covariates of hormonal variation (e.g., age, ethnicity, socioeconomic resources) should be collected. It is very valuable to future investigators to publish all data and measurements, as did Wang and Knyba (1985), even for those cases where the sample measurements were below assay sensitivity, and to calculate sample statistics that both include and omit such measurements and other outliers.
Often, a biomarker is a surrogate or approximation of the biological status or process of interest, but is not of interest in and of itself. Careful protocols can reduce error in the biomarker, but less error does not necessarily make the biomarker a more suitable proxy for the focus of interest. The history of the use of body mass index (BMI) exemplifies this distinction. There has been extensive effort to make measurements of height and weight more precise, and by extension the calculation of BMI more accurate. Nonetheless, only relatively recently has there been a comprehensive reconsideration of whether BMI is a suitable indicator of body composition in all populations (Heymsfield et al. 2016).
Selection of a biomarker should be based on the research question, the research context, and the available resources. Because confidence in the selected biomarker is requisite to a successful study, the suitability of biomarker options should be evaluated prior to initiating the bulk of sample collection. Different steroids have different properties, functions, and carrier molecules. The biochemistry of progesterone cannot be generalized to all steroids or even all reproductive steroids.
Gender‐based health disparities are pervasive throughout the world regardless of an individual's or population's social and economic status. Women's health is manifestly understudied, a reflection of both cultural and political views of female bodies and of the challenges of adequately accounting for hormonal changes during the menstrual cycle. Over‐representation of data collected during the early follicular phase does not suffice. We can and must do better. Some necessary steps to improve research and health care are obvious and relatively easy (adequate funding to collect data throughout menstrual cycling; development, monitoring and improvement of tools and protocols for measuring hormonal variability). Other steps are more challenging (accounting for the dynamic interactions between the biological and social determinants of health).
Just as it is costly to continue our work with tools not up to the task, so it is equally costly to discard useful tools without good reason. The development (and improvement through replication) of a robust toolkit for assessing changes in and the impacts of menstrual cycle hormones is foundational to reducing gender‐based health disparities.
Author Contributions
All coauthors contributed to the study execution and reviewed the manuscript for intellectual content. V.J.V. and D.B. conceived and supervised the study. V.J.V., L.E., and E.C. conducted data collection in Bolivia. V.J.V. and J.T. performed statistical analyses and wrote the manuscript.
Ethics Statement
All study protocols were approved by the institutional review boards at the Medical School of Universidad Mayor de San Andrés, La Paz, Plurinational State of Bolivia, and Indiana University, Bloomington, USA.
Consent
All study participants gave signed informed consent.
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Data S1. Supporting information.
Acknowledgments
This work is dedicated to the memory of Dr. Hilde Spielvogel, core investigator for the study reported here and for Project REPA (Reproduction and Ecology in Provincía Aroma), and la Responsable de la Unidad de Fisiología del Ejercicio del Instituto Boliviano de Biología de Altura—Universidad Mayor de San Andrés, La Paz, Bolivia. Her keen intellect, good humor, generosity, and kindness to innumerable patients, students, colleagues, and collaborators enriched all and immeasurably advanced understanding of adaptations and health in high‐altitude populations.
We thank Gertrudis Nina for invaluable assistance with participant recruitment and data collection; Dr. Rosemary A. Stewart (currently at Michigan State University) for contributing to the study design and performing the assays at the Center for the Integrative Study of Animal Behavior (Indiana University, Bloomington), and the study participants for their time and patience.
Preliminary analyses were presented at annual meetings of the Human Biology Association (Thornburg et al. 2008a, 2011) and the American Association of Physical Anthropologists (Thornburg et al. 2008b).
Appendix A. Assessments of the Correlation Between Progesterone in Blood and Saliva Samples
| Author and year | Population/location | n persons (n paired samples) (L: luteal; F: follicular) | r (mean [range]) (L: luteal) | p | Saliva assay | Blood assay [PTotal‐VEN unless noted otherwise] | Apparent uptake fraction (UF): follicular | Apparent uptake fraction (UF): luteal |
|---|---|---|---|---|---|---|---|---|
| (1) R. F. Walker et al. (1979) | UK (United Kingdom) | 4 (L: 40; 8–15/person) | L: [0.91–0.97] | | RIA | RIA/plasma | | |
| (2) S. Walker et al. (1981) a, b | UK | 5 (103; 14–25/person) | [0.81–0.93] | | RIA | RIA/plasma | | |
| | | 9 (143; 8–25/person) | 0.86 [0.81–0.97] | | RIA | RIA/plasma | 2.3% | 1.2% |
| (3) Chearskul et al. (1982) c | UK | 5 (“several” during cycle) | Parallel changes but nonlinear correlation | | RIA | RIA/serum | ≈ 3% | ≈ 0.9% |
| (4) Shah and Swift (1984) | UK | 50 (50; random day) | 0.89 | | RIA | RIA/plasma | | |
| (5) Choe et al. (1983) d | US (United States) | 9 ovulatory cycles (96) | 0.58 (L: 0.45) | 0.001; L: < 0.001 | RIA | RIA/plasma | 8% | 1.5% |
| (6) Donaldson et al. (1984) | UK | 2 (10; 18); L: 1 day, ~hourly | | | RIA | RIA/plasma | | ~ 1% |
| (7) Tallon et al. (1984) e | Ireland | 11 (56) | 0.85 | < 0.001 | EIA | EIA/plasma | 7.5% | 1.8% |
| (8) Zorn et al. (1984) f, g | France | 32 (L: 80) | L: 0.76 | < 0.001 | RIA | RIA/plasma | 5.1% | 1.02% ± 0.39% |
| (9) Cedard et al. (1984) h | France | 14 (L: 39) | L: 0.63 | < 0.001 | RIA | RIA/plasma | 5.3% | 1.12% ± 0.39% |
| (10) Wang and Knyba (1985) | UK | 36 (36) | 0.78 | < 0.001 | RIA | RIA/serum | | |
| | | 36 (36) | 0.75 | < 0.001 | RIA | PFree‐VEN: RIA/serum | | |
| (11) Webley and Edwards (1985) | UK | (77) | 0.71 | < 0.001 | RIA | RIA/plasma | | |
| (12) Evans (1986) i | New Zealand | 27 (L: 25; F: 18) | 0.90 | | RIA | RIA/plasma | | |
| | | 4 (38) | 0.83 | | RIA | RIA/plasma | 2.1% | 0.87% |
| (13) Bourque et al. (1986) j | Belgium | 14 (76) | 0.68; L: 0.78; F: 0.04 | < 0.001; L: < 0.01; F: ns | RIA | RIA/plasma | 9.6% | 1.6% |
| (14) De Boever et al. (1986) k | Belgium | 7 (96) | 0.88 | < 0.001 | CIA | RIA/serum | Early‐mid F: 19.6%; Peri‐ovul: 6.3% | 1.8% |
| (15) Heasley and Thompson (1986) | Ireland | 40 | 0.82 | < 0.001 | RIA | RIA/serum | | |
| (16) Petsos et al. (1986) | UK | 6 (82) | 0.89 [0.78–0.97] | < 0.002 | RIA | RIA/serum | | |
| (17) Vuorento et al. (1989) | Finland | 32 (40) | 0.93 | N/A | RIA | RIA/serum | | |
| (18) Wong et al. (1990) | People's Republic of China | 10 (43) | 0.93 | < 0.001 | RIA | RIA/serum | | |
| (19) Kesner et al. (1992) | US | 10 (7–15 pairs/person; ~2nd & 3rd cycle weeks) | 0.73 (mean r of 10 individual r) | ≤ 0.0001 | RIA | RIA/serum | | |
| (20) Lipson and Ellison (1992) | US | 3 (“over course of menstrual cycle”) | 0.80–0.97 | | RIA | RIA/serum | | |
| (21) Bolaji (1994) | Ireland | 23 | 0.99 | < 0.0001 | EIA | EIA/serum | | |
| (22) Vienravi et al. (1994) | Thailand | 20 (~40) | 0.835 | < 0.001 | RIA | PFree‐VEN: RIA/serum | | |
| (23) Chearskul and Visutakul (1994) l | Thailand | 10 ovulatory cycles (daily: 280 pairs) (L: 138; F: 132) | All: 0.89; L: 0.75; F: −0.08 | All: < 0.001; L: < 0.001; F: ns | RIA | RIA/plasma | 3.32% ± 0.32% | 1.39% ± 0.07% |
| (24) Lu et al. (1997) | US | 48 (L: 48) | L: 0.75 | < 0.001 | RIA | RIA/serum | | |
| (25) Abood (2008) | Iraq (16–19 years) | 9 | 0.73 | 0.02 | RIA | RIA/serum | | |
| | (20–29 years) | 26 | 0.75 | 0.001 | RIA | RIA/serum | | |
| | (30–39 years) | 42 | | ns | RIA | RIA/serum | | |
| | (40–49 years) | 11 | 0.65 | 0.03 | RIA | RIA/serum | | |
| (26) Houghton (2008) m | Britons (European descent) | 10 (L: 8, F: 1, ?: 1) | 0.88 | < 0.0005 | RIA | RIA/serum | 0.30% ± 0.18% | |
| | Bangladeshi adulthood migrants to UK | 18 (L: 9, F: 1, ?: 8) | 0.19 | ns (0.448) | RIA | RIA/serum | 1.10% ± 2.2% | |
| | Bangladeshi sedentee | 11 (L: 10, F: 1) | 0.70 | < 0.0005 | RIA | RIA/serum | 0.25% ± 0.23% | |
n = 9 comprises n = 5 reported by S. Walker et al. (1981) and n = 4 reported by R. F. Walker et al. (1979).
UF for n = 9 reported by Riad‐Fahmy et al. (1987): table II.
UF estimated from means reported by Choe et al. (1983).
UF estimated from means reported by Tallon et al. (1984).
Luteal UF reported by Zorn et al. (1984): table 1 (calculated from individual saliva/plasma pairs).
Follicular UF (mean saliva P)/(mean plasma P) and luteal UF (calculated from individual saliva/plasma pairs) reported by Cedard et al. (1984).
UF estimated from mean of individual paired ratios reported by Evans (1986).
UF estimated from data reported by De Boever et al. (1986).
UF reported by Chearskul and Visutakul (1994): table 2 (calculated from individual saliva/plasma pairs).
UF reported by Houghton (2008): calculated from individual saliva/plasma pairs.
Appendix B.
Rather than using computation (1), presented in Section 2.4.1, a preferable if less common way to calculate UF is to use the A‐mean of the individual UFs:
$$\mathrm{UF}(\text{A-mean of UFs}) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{UF}_i \qquad \text{(B.1)}$$
where each individual UF is defined as UF_i = PFree‐SAL,i / PTotal‐SER,i.
Unfortunately, while both (1) and (B.1) would seem to be “natural” definitions for “mean UF,” they generally give different answers (see Table 4 in Section 4.2 for several examples), and it is not obvious whether (1) or (B.1) is “more natural.”
Another problem with both (1) and (B.1) is that the A‐mean of a set of values with a wide dynamic range is much more sensitive to changes in the larger values than to changes in the smaller values. For example, given a set of samples from both the follicular and luteal phases, if we were to (hypothetically) double a single (relatively large) mid‐luteal PFree‐SAL,i value (leaving all the other PFree‐SAL,i and all the PTotal‐SER,i unchanged), computation (1) would change much more than if we were to double a single (relatively small) mid‐follicular PFree‐SAL,i value. Similarly, if we were to (hypothetically) double a single (relatively large) mid‐follicular UF_i (leaving all the other UF_i unchanged), computation (B.1) would change much more than if we were to double a single (relatively small) mid‐luteal UF_i.
Because of these problems, we suggest that “mean uptake fraction” is better defined as the geometric mean (G‐mean):
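$$\mathrm{UF}(\text{ratio of G-means}) = \frac{\text{G-mean}(P_{\text{Free-SAL},i})}{\text{G-mean}(P_{\text{Total-SER},i})} \qquad \text{(B.2)}$$
or alternatively as
$$\mathrm{UF}(\text{G-mean of UFs}) = \text{G-mean}(\mathrm{UF}_i) \qquad \text{(B.3)}$$
where the G‐mean of a set of N numbers X_1, X_2, X_3, …, X_N is defined as
$$\text{G-mean}(X_1, X_2, \ldots, X_N) = \left(\prod_{i=1}^{N} X_i\right)^{1/N} = \exp\!\left(\frac{1}{N}\sum_{i=1}^{N} \ln X_i\right)$$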
Mathematically, computations (B.2) and (B.3) always give precisely the same result, so it doesn't matter which of these we choose to define “mean uptake fraction.” Moreover, the G‐mean has the same relative sensitivity to both small and large values (e.g., hypothetically doubling a single value causes the same percentage change in the G‐mean regardless of whether the doubled value is relatively small or relatively large).
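The equivalence of the two G‐mean definitions, (B.2) and (B.3), and the equal relative sensitivity of the G‐mean can be checked directly. The short derivation below is a sketch that uses the shorthand S_i for PFree‐SAL,i and T_i for PTotal‐SER,i (symbols introduced here for brevity only) and assumes all values are positive:

$$\frac{\text{G-mean}(S_i)}{\text{G-mean}(T_i)} = \frac{\left(\prod_{i=1}^{N} S_i\right)^{1/N}}{\left(\prod_{i=1}^{N} T_i\right)^{1/N}} = \left(\prod_{i=1}^{N} \frac{S_i}{T_i}\right)^{1/N} = \text{G-mean}(\mathrm{UF}_i)$$

$$\text{G-mean}(X_1, \ldots, 2X_k, \ldots, X_N) = 2^{1/N}\,\text{G-mean}(X_1, \ldots, X_k, \ldots, X_N) \quad \text{for any } k$$

The second line shows that doubling any single value multiplies the G‐mean by the same factor, 2^{1/N}, whether that value is small or large.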
Another possible definition (offering improved robustness to outliers) is the median of the individual uptake fractions: UF(median of UFs) = median(UF_i for i = 1, 2, 3, …, N). If the distribution of UF is asymmetric (as is often the case), the median UF is a more informative measure of central tendency than the A‐mean or G‐mean UF; alternatively, the UF_i values may be log‐transformed before analysis.
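The definitions discussed in this appendix can be compared side by side in a few lines of code. The following is a minimal sketch, not the authors' analysis code: the paired values are invented for illustration, the function names are hypothetical, and computation (1) is assumed to be the ratio of the arithmetic means, as the surrounding text implies.

```python
import math
from statistics import mean, median


def gmean(values):
    """Geometric mean: exp of the arithmetic mean of the logs (all values > 0)."""
    return math.exp(mean(math.log(v) for v in values))


def uf_summaries(saliva, serum):
    """Summarize the uptake fraction (UF) for paired values given in the SAME units."""
    ufs = [sal / ser for sal, ser in zip(saliva, serum)]
    return {
        "ratio_of_a_means": mean(saliva) / mean(serum),    # assumed form of computation (1)
        "a_mean_of_ufs": mean(ufs),                        # A-mean of individual UFs, (B.1)
        "ratio_of_g_means": gmean(saliva) / gmean(serum),  # G-mean definition
        "g_mean_of_ufs": gmean(ufs),                       # equivalent G-mean definition
        "median_of_ufs": median(ufs),
    }


# Hypothetical paired values in pmol/L (invented for illustration only):
# three follicular-like pairs followed by three luteal-like pairs.
saliva = [40.0, 55.0, 60.0, 300.0, 450.0, 520.0]
serum = [500.0, 800.0, 600.0, 20000.0, 35000.0, 40000.0]

base = uf_summaries(saliva, serum)

# Double one small (follicular-like) or one large (luteal-like) saliva value.
small_doubled = uf_summaries([80.0] + saliva[1:], serum)
large_doubled = uf_summaries(saliva[:-1] + [1040.0], serum)

print({k: round(v, 4) for k, v in base.items()})
for key in ("ratio_of_a_means", "ratio_of_g_means"):
    print(
        key,
        "small doubled: x%.3f" % (small_doubled[key] / base[key]),
        "large doubled: x%.3f" % (large_doubled[key] / base[key]),
    )
```

On these invented values, the two G‐mean entries agree with each other and respond identically to either doubling, whereas the ratio of A‐means responds far more strongly when the large value is doubled.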
Appendix C. Notable Challenges in the Assessment of Ovarian Functioning
Accurately characterizing individual and population variation in ovarian steroids requires fastidious attention to sample collection and judicious evaluation of the data. Regardless of the chosen matrix and assay protocols, the inherent nature of ovarian functioning presents several potential pitfalls that should be recognized, and addressed accordingly, in nearly any measurement of ovarian hormones.
C.1. Identifying Anovulatory Cycles and Cycle Phases
Anovulatory cycles are fairly common in many populations and subpopulations, particularly those living in demanding physical and socioeconomic conditions. Reported anovulation rates in adult premenopausal non‐breastfeeding women range from 8% to more than 50% (Vitzthum 2009). An anovulatory cycle has no luteal phase and therefore lacks the expected postovulatory rise in steroids; progesterone concentration and UF in the third or fourth week after menses onset will resemble those of a follicular phase. Furthermore, cycle length is not reliably indicative of ovulatory status; therefore, reverse day counting to estimate the timing of a putative ovulation and define phases is prone to error. There are several methods for evaluating the ovulatory status of a cycle and estimating the timing of ovulation; the choice among them depends on resources, research population and setting, and research questions (Vitzthum 2021).
C.2. Sampling Frequency: No Matter the Matrix, One Sample Per Cycle Likely Won't Suffice
Despite substantial evidence of marked variability in ovarian hormones between individuals and populations as well as during the menstrual cycle, the inadequacy of a single sample for many clinical and research purposes remains widely underappreciated (Ansbacher 1990; Arslan et al. 2023; Fujimoto et al. 1990). Natural variability in P arises from several factors including pulsatile hormone release and individual circadian and ultradian rhythms (Delfs et al. 1994; Filicori et al. 1984; Veldhuis et al. 1988). Factors that confound interpretation of a single measurement include minute contaminants, undetected bleeding, type of assay and its sensitivity, different reference standards and other laboratory idiosyncrasies, and insufficient adherence to collection and storage protocols, among others (Fujimoto et al. 1990).
Thus, a single sample, whatever the matrix, is likely inadequate for accurately characterizing individual and populational variation in ovarian steroids, for ascertaining whether ovulation has occurred, or for testing many other questions of interest.
For example, following the advent of RIA for ovarian steroids, small anthropological studies (van der Walt et al. 1977, 1978) and several large epidemiological studies of hypothesized links between breast cancer and ovarian steroid variation collected only a single venous sample per person (reviewed in Vitzthum 2009). The sparse sampling in these early studies significantly undercut useful interpretation of the data.
Moreover, a high P concentration in a single sample does not necessarily indicate that ovulation has occurred; the sample may have been contaminated. In an ovulatory cycle, ovulation is accompanied by a continuing rise in progesterone, which is best assessed using serial sampling (i.e., at least every other day). Conversely, an inordinately high progesterone reading during the follicular phase is not necessarily indicative of a contaminant. In some instances, a persistent elevation in follicular‐phase progesterone is associated with early pregnancy loss (Vitzthum et al. 2006), a pattern that would have gone unrecognized without serial sampling.
A single elevated PFree‐SAL or PTotal‐VEN measurement taken at some time in the middle of the second week following the first appearance of vaginal bleeding may occur for various other reasons, including a short follicular phase, miscounting days since bleeding, a very early pregnancy loss, or fortuitously capturing a pulsatile P spike or a preovulatory P spike. A low PFree‐SAL or PTotal‐VEN measurement from a putative mid‐luteal day may indicate a long follicular phase, anovulation, or assay failure. Serial sampling, aggregating repeated measures, and examining the data set for obvious outliers based on statistical methods and biological plausibility substantially mitigate such errors.
Wang and Knyba's (1985) evaluation of the relationship between PFree‐SAL and PTotal‐VEN exemplifies judicious assessment of measurements and illustrates the limitations of relying on a single sample. They collected paired saliva and blood samples on a single randomly selected day from 96 volunteers in the United Kingdom. Seven of these pairs had PTotal‐VEN < 10 nmol/L (indicative of follicular phase) but PFree‐SAL > 300 pmol/L (two were ≥ 750 pmol/L); each of these salivary values could be mistaken for luteal‐phase P if relied upon as the sole evaluation of menstrual functioning. Despite their rigorous efforts to prevent contamination, they concluded (p. 978) that food debris, rather than blood, was the most likely immunoreactive impurity because some “saliva samples would need to have up to 25% of blood present” for the observed readings and “a subsequent saliva sample from such a volunteer did not have a high progesterone level.”
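A screening step of this kind is simple to automate when paired measurements are available. The sketch below flags pairs whose serum value is in the follicular range while the salivary value is in the luteal range, using the two cut‐offs mentioned above (10 nmol/L and 300 pmol/L) as defaults. The function name, record layout, and data are hypothetical, and the cut‐offs are illustrative rather than validated screening thresholds.

```python
from typing import List, NamedTuple


class Pair(NamedTuple):
    participant: str
    serum_nmol_l: float   # PTotal-VEN, nmol/L
    saliva_pmol_l: float  # PFree-SAL, pmol/L


def flag_discordant(pairs: List[Pair],
                    serum_below: float = 10.0,
                    saliva_above: float = 300.0) -> List[Pair]:
    """Return pairs with follicular-range serum P but luteal-range salivary P."""
    return [
        p for p in pairs
        if p.serum_nmol_l < serum_below and p.saliva_pmol_l > saliva_above
    ]


# Invented values for illustration only.
pairs = [
    Pair("A", 2.1, 95.0),    # concordant, follicular-like
    Pair("B", 4.0, 780.0),   # low serum, high saliva: flagged
    Pair("C", 38.0, 610.0),  # concordant, luteal-like
    Pair("D", 9.5, 320.0),   # low serum, high saliva: flagged
]

flagged = flag_discordant(pairs)
print([p.participant for p in flagged])
```

A flagged pair is only a prompt for closer inspection (for example, a repeat saliva sample), not evidence of contamination in itself.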
Other potential sources of error in the measurement of salivary steroids (which are particularly “sticky” molecules) include the use of collection devices and plastics other than polypropylene (Banerjee et al. 1985; Wood 2009; Vitzthum 2021). Unfortunately, findings from some older studies that followed then‐contemporary standards are now of uncertain validity because of the materials used. Occasional studies still unwittingly use unsuitable collection materials (e.g., polyethylene).
C.3. Essential Precautions for Saliva Collection and Interpretation of Steroid Measurements
Our study highlights the necessity of giving as much attention to the collection, transport, and storage of saliva samples as is typically given to the assays of those same samples. The ease with which saliva can be obtained in almost any setting belies the risks of contamination and degradation that may render a sample useless and generate spurious findings. There are several resources on best practices for saliva sample collection, handling, and interpretation of data (see Vitzthum (2021) and references therein). Sample handling, transport, and storage from shortly after collection until assaying must be designed to minimize bacterial growth and biomarker degradation. A preservative may be added to the sample, but some preservatives can interfere with some assays (see Chester and Vitzthum (2018) for an EIA protocol that can be used with samples preserved with sodium azide). Freezing shortly after collection is ideal but may be difficult in remote locales, and it requires the maintenance of a cold chain until the samples are assayed (repeated freeze/thaw cycles degrade many biomarkers).
Undetected food particles, drink residues, chewed substances (e.g., tobacco, coca leaves, betel nut), ointments, makeup, and blood can result in erroneously elevated measurements. Rinsing with cool clean water can mitigate the presence of these spoilers, but there must be enough time after rinsing to allow equilibration of normal saliva composition and avoid erroneously low measurements. It is fairly easy to ensure adherence to collection protocols in a laboratory but obviously far more difficult to do so if study participants are self‐collecting alone at home or elsewhere. For example, a participant who typically shortens the waiting period after rinsing may have consistently lower UF than would otherwise be the case, and one who has undetected gum bleeding may have consistently higher measured UF. In our previous community‐based studies in Bolivia (Vitzthum et al. 2006, 2009), each saliva sample was collected in a participant's home during a brief visit by a research team member who could ensure adherence to a protocol that minimized sample contamination and dilution. Several thousand salivary samples were successfully collected across multiple sequential menstrual cycles from more than 200 rural Bolivian women.
C.4. Suitable Statistical Methods
The median or geometric mean of individual ratios is the better approach for computing mean UF (see Section 2.4.1 and Appendix B). The conclusions reported by Chatterton et al. (2006) were based on ratios of NW‐Bolivia and NW‐Chicago group means for PFree‐SAL and PTotal‐VEN rather than on individual ratios. Some of the problems with this approach are evident in a scatterplot of their data (Figure 8D). The use of arithmetic means is inappropriate here, as “mean uptake fractions” computed in this way are generally not the same as the mean of the individual uptake fractions. (In contrast, if geometric means were used, the resulting uptake fractions would coincide with the geometric means of the individual uptake fractions.) For example, Table 4 compares P UFs calculated using Chatterton et al.'s (2006) method versus the arithmetic mean, geometric mean, and median of the individual UFs. Notice that the UFs computed with Chatterton et al.'s (2006) method are substantially different from those computed by the other methods. These differences are a reflection of the wide range of the hormone levels within each sample, and of the presence of outliers in the data.
Following the example of Wang and Knyba (1985), investigators should present all data and measurements, including those cases where the sample measurements were below assay sensitivity, and should calculate sample statistics both including and omitting such measurements and other outliers. Doing so is very valuable to future investigators.
Vitzthum, V. J. , Bellido D., Echalar L., Caceres E., and Thornburg J.. 2025. “Salivary/Serum Progesterone Ratio Differs Between Menstrual Cycle Phases but Not Between Populations: Implications for Health, Reproductive, and Behavioral Research.” American Journal of Human Biology 37, no. 7: e70077. 10.1002/ajhb.70077.
Funding: This work was supported by Indiana University and Instituto Boliviano de Biologia de Altura.
Endnotes
There are many persons who experience ovarian cycling but who do not self‐identify as women (and/or female) and persons who identify as women and/or female but who may not be currently experiencing ovarian cycling. The arguments in this paper apply to all these persons. For cited literature, we use the terms used by the author(s) of the work. We use woman/women when referring to the Bolivian study participants, all of whom self‐identified as women; in the cultures of highland Bolivia, questions regarding sexuality would have been considered highly inappropriate. Given the biochemistry of saliva production and steroid transfer, there is no reason known at this time to expect gender and/or sexuality to influence those dynamics.
PTotal‐VEN concentration can be measured in serum (the liquid that remains after a blood sample is allowed to clot) or plasma (the liquid that remains after an anticoagulant is added to the blood sample to prevent clotting).
Ideally, with careful collection (see Appendix C for further information) and a reliable assay protocol, the measurement of free progesterone in a saliva sample is a useful approximation of free (unbound) progesterone in circulation. Nonetheless, assay cross‐reactivity with similar binding sites on other molecules and/or contaminants in saliva samples can introduce error in the measurement of a target analyte. Rigorous attention to collection and measurement protocols helps to minimize these potential sources of error.
Because none of these studies mentioned participants' ancestry, it is impossible to know whether the participants in each study were predominantly of European descent. None of the studies reported “ethnic” variation in their study participants, perhaps because there was none and/or because the authors thought that ethnicity was irrelevant to their study of fundamental human physiology.
Spearman rho is preferred to Pearson r in this case because Spearman rho does not require normality and can be used with ordinal data. Moreover, when the data have a large dynamic range, Pearson r is mainly determined by the behavior of the larger (i.e., luteal) values and is relatively insensitive to changes in the smaller (i.e., follicular) values. For example, if one were to turn our luteal data into a “cloud” by pairing each luteal‐phase serum measurement with a randomly chosen luteal‐phase saliva measurement, the Pearson correlation of the entire data set would drop considerably. But if one were instead to do the same thing to the follicular data (pair each follicular‐phase serum measurement with a randomly chosen follicular‐phase saliva measurement), the Pearson correlation of the entire data set would drop only slightly. In contrast, Spearman rho is equally sensitive to all parts of the serum and saliva distributions. Note that when a sample has multiple observations per person, Pearson r or Spearman rho roughly estimates the association of two data distributions, but neither sample statistic is suitable for hypothesis testing.
Each model plotted in Figure 7 and Figure 8B is a single‐level mixed model of the general form
$$(P_{\text{Free-SAL}})_{ij} = \beta_0 + \mu_i + \beta_1 (P_{\text{Total-SER}})_{ij} + \varepsilon_{ij},$$
where i indexes the participant, j indexes the multiple serum–saliva pairs for each participant, β_0 and β_1 are the model intercept and slope, respectively, each μ_i is a per‐participant random intercept that models inter‐individual variation in overall hormone levels within a population (Twisk 2006), and each ε_ij is an unmodelled random error.
Each model also incorporates an autocorrelation structure to model the nonindependence of serial samples from the same participant. “Spatial power” (using reverse cycle day and day relative to ovulation day as the time variables for the Figure 7 and Figure 8B models, respectively) and AR(1) structures for the covariance matrix gave nearly identical models, including nearly identical values of Akaike's Information Criterion (AIC) and Schwarz's Bayesian Information Criterion (BIC); we used the “spatial power” structure for the final models reported here.
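The endnote comparing Spearman rho with Pearson r describes a re‐pairing thought experiment that is easy to try on synthetic data. The sketch below is an illustration only and does not use the study data: the simulated concentrations, the coefficients 77 and 19, the 10% multiplicative noise, the sample sizes, and all function names are assumptions chosen to give plausible magnitudes, and "randomly re‐pairing" is implemented as a random permutation of saliva values within one phase.

```python
import math
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simulate_phase(n, serum_lo, serum_hi):
    """Synthetic serum (nmol/L) and saliva (pmol/L) values; all numbers are arbitrary."""
    serum = [rng.uniform(serum_lo, serum_hi) for _ in range(n)]
    saliva = [(77.0 + 19.0 * s) * math.exp(rng.gauss(0.0, 0.1)) for s in serum]
    return serum, saliva


def mean_drop(corr, fol, lut, phase, n_shuffles=200):
    """Average fall in corr() over the pooled data after re-pairing one phase at random."""
    serum = fol[0] + lut[0]
    baseline = corr(serum, fol[1] + lut[1])
    total = 0.0
    for _ in range(n_shuffles):
        f_sal, l_sal = list(fol[1]), list(lut[1])
        rng.shuffle(f_sal if phase == "follicular" else l_sal)
        total += baseline - corr(serum, f_sal + l_sal)
    return total / n_shuffles


follicular = simulate_phase(30, 0.3, 3.0)   # low-concentration phase
luteal = simulate_phase(30, 15.0, 60.0)     # high-concentration phase

drops = {
    (name, phase): mean_drop(fn, follicular, luteal, phase)
    for name, fn in (("pearson", pearson), ("spearman", spearman))
    for phase in ("follicular", "luteal")
}
for key, value in drops.items():
    print(key, round(value, 4))
```

On data simulated this way, re‐pairing the luteal values lowers Pearson r substantially while re‐pairing the follicular values barely changes it, whereas Spearman rho falls by a comparable amount in both cases; other simulation settings may give different magnitudes.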
Data Availability Statement
The data that support the findings of this study are openly available in DRYAD at https://datadryad.org/.
References
- Abood, N. A. 2008. “Assessment of Steroidal Hormones Levels From Serum and Saliva in Females.” Master's thesis, Al‐Nahrain University.
- Ah‐King, M. 2022. “The History of Sexual Selection Research Provides Insights as to Why Females Are Still Understudied.” Nature Communications 13, no. 1: 6976. 10.1038/s41467-022-34770-z.
- Ansbacher, R. 1990. “Variability of Serum Prolactin and Progesterone Levels in Normal Women: The Relevance of Single Hormone Measurements in the Clinical Setting.” Obstetrics and Gynecology 76, no. 5 Pt 1: 895–896.
- Arey, L. B. 1939. “The Degree of Normal Menstrual Irregularity.” American Journal of Obstetrics and Gynecology 37: 12–29. https://www.ajog.org/article/S0002‐9378(16)40957‐9/abstract.
- Arnegard, M. E., Whitten L. A., Hunter C., and Clayton J. A. 2020. “Sex as a Biological Variable: A 5‐Year Progress Report and Call to Action.” Journal of Women's Health 29, no. 6: 858–864. 10.1089/jwh.2019.8247.
- Arslan, R. C., Blake K., Botzet L. J., et al. 2023. “Not Within Spitting Distance: Salivary Immunoassays of Estradiol Have Subpar Validity for Predicting Cycle Phase.” Psychoneuroendocrinology 149: 105994. 10.1016/j.psyneuen.2022.105994.
- Banerjee, S., Levitz M., and Rosenberg C. R. 1985. “On the Stability of Salivary Progesterone Under Various Conditions of Storage.” Steroids 46: 967–974.
- Barr, E., Popkin R., Roodzant E., Jaworski B., and Temkin S. M. 2024. “Gender as a Social and Structural Variable: Research Perspectives From the National Institutes of Health (NIH).” Translational Behavioral Medicine 14, no. 1: 13–22. 10.1093/tbm/ibad014.
- Beery, A. K., and Zucker I. 2011. “Sex Bias in Neuroscience and Biomedical Research.” Neuroscience and Biobehavioral Reviews 35, no. 3: 565–572. 10.1016/j.neubiorev.2010.07.002.
- Blom, T., Ojanotko‐Harri A., Laine M., and Huhtaniemi I. 1993. “Metabolism of Progesterone and Testosterone in Human Parotid and Submandibular Salivary Glands In Vitro.” Journal of Steroid Biochemistry and Molecular Biology 44, no. 1: 69–76. 10.1016/0960-0760(93)90153-n.
- Bolaji, I. I. 1994. “Sero‐Salivary Progesterone Correlation.” International Journal of Gynaecology and Obstetrics 45, no. 2: 125–131. 10.1016/0020-7292(94)90119-8.
- Bourque, J., Sulon J., Demey‐Ponsart E., Sodoyez J. C., and Gaspard U. 1986. “A Simple, Direct Radioimmunoassay for Salivary Progesterone Determination During the Menstrual Cycle.” Clinical Chemistry 32, no. 6: 948–951. 10.1093/clinchem/32.6.948.
- Cameron, A., Henley D., Carrell R., Zhou A., Clarke A., and Lightman S. 2010. “Temperature‐Responsive Release of Cortisol From Its Binding Globulin: A Protein Thermocouple.” Journal of Clinical Endocrinology & Metabolism 95, no. 10: 4689–4695. 10.1210/jc.2010-0942.
- Cedard, L., Janssens Y., Tanguy G., and Zorn J. R. 1984. “Radioimmunoassay of Plasma and Salivary Progesterone During the Menstrual Cycle.” Journal of Steroid Biochemistry 20, no. 1: 487–490. 10.1016/0022-4731(84)90258-9.
- Chatterton, R. T., Mateo E. T., Lu D., and Ling F. J.‐H. 2006. “Interpopulational Differences in the Concentrations and Ratios of Salivary and Serum Progesterone.” Fertility and Sterility 86, no. 3: 723–725. 10.1016/j.fertnstert.2006.01.034.
- Chearskul, S., Rincon‐Rodriguez I., Sufi S. B., Donaldson A., and Jeffcoate S. L. 1982. “Simple Direct Assays for Measuring Oestradiol and Progesterone in Saliva.” In Radioimmunoassay and Related Procedures in Medicine 1982: Proceedings of a Symposium, Vienna (21–25 June 1982), 265–274. International Atomic Energy Agency. https://inis.iaea.org/search/search.aspx?orig_q=RN:14772267.
- Chearskul, S., and Visutakul P. 1994. “Non‐Invasive Hormonal Analysis for Ovulation Detection.” Journal of the Medical Association of Thailand 77, no. 4: 176–186.
- Chester, E. M., and Vitzthum V. J. 2018. Protocol for Extraction and Enzyme Immunosorbent Assay (ELISA) of Progesterone or Estradiol From Human Salivary Samples Preserved With or Without Sodium Azide (Protocol v1.0). IU Scholar Works. http://hdl.handle.net/2022/21883.
- Choe, J. K., Khan‐Dawood F. S., and Yusoff‐Dawood M. 1983. “Progesterone and Estradiol in the Saliva and Plasma During the Menstrual Cycle.” American Journal of Obstetrics and Gynecology 147, no. 5: 557–562. 10.1016/0002-9378(83)90016-9.
- Dabbs, J., Jr., Campbell B., Gladue B., et al. 1995. “Reliability of Salivary Testosterone Measurements, a Multicenter Evaluation.” Clinical Chemistry 41: 1581–1584. 10.1093/clinchem/41.11.1581.
- De Boever, J., Kohen F., and Vandekerckhove D. 1986. “Direct Solid‐Phase Chemiluminescence Immunoassay for Salivary Progesterone.” Clinical Chemistry 32, no. 5: 763–767. 10.1093/clinchem/32.5.763.
- De Geyter, C., De Geyter M., Huber P. R., Nieschlag E., and Holzgreve W. 2002. “Progesterone Serum Levels During the Follicular Phase of the Menstrual Cycle Originate From the Crosstalk Between the Ovaries and the Adrenal Cortex.” Human Reproduction 17, no. 4: 933–939. 10.1093/humrep/17.4.933.
- Delfs, T. M., Klein S., Fottrell P., Naether O. G., Leidenberger F. A., and Zimmermann R. C. 1994. “24‐Hour Profiles of Salivary Progesterone.” Fertility and Sterility 62, no. 5: 960–966. 10.1016/s0015-0282(16)57058-7.
- Donaldson, A., Jeffcoate S. L., and Sufi S. B. 1984. “Assays of Oestradiol and Progesterone in Saliva in the Assessment of Ovarian Function.” In Steroid Hormones in Saliva, edited by Ferguson D. B., 80–86. Karger.
- DuBois, L. Z., and Shattuck‐Heidorn H. 2021. “Challenging the Binary: Gender/Sex and the Bio‐Logics of Normalcy.” American Journal of Human Biology 33, no. 5: e23623. 10.1002/ajhb.23623.
- Eaton, S. B., Pike M. C., Short R. V., et al. 1994. “Women's Reproductive Cancers in Evolutionary Context.” Quarterly Review of Biology 69, no. 3: 353–367. 10.1086/418650.
- Ellison, P. T. 1988. “Human Salivary Steroids: Methodological Considerations and Applications in Physical Anthropology.” American Journal of Physical Anthropology 31, no. S9: 115–142. 10.1002/ajpa.1330310507.
- Ellison, P. T., Lipson S., O'Rourke M., et al. 1993. “Population Variation in Ovarian Function.” Lancet 342, no. 8868: 433–434. 10.1016/0140-6736(93)92845-K.
- Ellison, P. T., Peacock N. R., and Lager C. 1986. “Salivary Progesterone and Luteal Function in Two Low‐Fertility Populations of Northeast Zaire.” Human Biology 58: 473–483. https://www.jstor.org/stable/41463781.
- Ellison, P. T., Peacock N. R., and Lager C. 1989. “Ecology and Ovarian Function Among Lese Women of the Ituri Forest, Zaire.” American Journal of Physical Anthropology 78, no. 4: 519–526. 10.1002/ajpa.1330780407.
- Evans, J. J. 1986. “Progesterone in Saliva Does Not Parallel Unbound Progesterone in Plasma.” Clinical Chemistry 32, no. 3: 542–544. 10.1093/clinchem/32.3.542.
- Filicori, M., Butler J., and Crowley W. 1984. “Neuroendocrine Regulation of the Corpus Luteum in the Human.” Journal of Clinical Investigation 73: 1638–1647. https://www.jci.org/articles/view/111370.
- Foster, F. P. 1889. “The Periodicity and Duration of the Menstrual Flow.” New York Medical Journal 49: 610–611. https://hdl.handle.net/2027/nnc2.ark:/13960/t40s2g25p.
- Fraenkel, L. 1926. “Zeit und Ursaechlichkeitsverhaeltnis Zwischen Ovulation und Menstruation [Time and the Causal Relationship Between Ovulation and Menstruation].” In Handbuch Der Normalen Und Der Pathologischen Physiologie [Handbook of Normal and Pathological Physiology], vol. 1, 454. Springer. https://link.springer.com/chapter/10.1007/978‐3‐642‐91024‐1_71.
- Fujimoto, V. Y., Clifton D. K., Cohen N. L., and Soules M. R. 1990. “Variability of Serum Prolactin and Progesterone Levels in Normal Women: The Relevance of Single Hormone Measurements in the Clinical Setting.” Obstetrics and Gynecology 76, no. 1: 71–78. http://hdl.handle.net/1773/4434.
- Geller, S. E., Koch A. R., Roesch P., Filut A., Hallgren E., and Carnes M. 2018. “The More Things Change, the More They Stay the Same: A Study to Evaluate Compliance With Inclusion and Assessment of Women and Minorities in Randomized Controlled Trials.” Academic Medicine 93, no. 4: 630–635. 10.1097/ACM.0000000000002027.
- Government of Canada IAP on RE. 2023, January 11. Tri‐Council Policy Statement: Ethical Conduct for Research Involving Humans – TCPS 2 (2022). https://ethics.gc.ca/eng/policy‐politique_tcps2‐eptc2_2022.html.
- Gröschl, M. 2017. “Saliva: A Reliable Sample Matrix in Bioanalytics.” Bioanalysis 9, no. 8: 655–668. 10.4155/bio-2017-0010.
- Hamidovic, A., Karapetyan K., Serdarevic F., Choi S. H., Eisenlohr‐Moul T., and Pinna G. 2020. “Higher Circulating Cortisol in the Follicular vs. Luteal Phase of the Menstrual Cycle: A Meta‐Analysis.” Frontiers in Endocrinology 11: 311. 10.3389/fendo.2020.00311.
- Heasley, R. N., and Thompson W. 1986. “Salivary Progesterone Measurements in the Normal Menstrual Cycle.” Irish Journal of Medical Science 155: 19–22. http://link.springer.com/10.1007/BF02939945.
- Heymsfield, S. B., Peterson C. M., Thomas D. M., Heo M., and Schuna J. M. Jr. 2016. “Why Are There Race/Ethnic Differences in Adult Body Mass Index–Adiposity Relationships? A Quantitative Critical Review.” Obesity Reviews 17, no. 3: 262–275. 10.1111/obr.12358.
- Hodyl, N. A., Stark M. J., Meyer E. J., Lewis J. G., Torpy D. J., and Nenke M. A. 2020. “High Binding Site Occupancy of Corticosteroid‐Binding Globulin by Progesterone Increases Fetal Free Cortisol Concentrations.” European Journal of Obstetrics & Gynecology and Reproductive Biology 251: 129–135. 10.1016/j.ejogrb.2020.05.034.
- Houghton, L. 2008. “Ethnic Variation in Correlations of Salivary and Serum Reproductive Steroid Hormones: A Comparison of Bangladeshi and British Women.” Master's thesis, Durham University. http://etheses.dur.ac.uk/2038/.
- International Atomic Energy Agency (IAEA). 1982. Radioimmunoassay and Related Procedures in Medicine, 1982: Proceedings of an International Symposium on Radioimmunoassay and Related Procedures in Medicine. IAEA. https://inis.iaea.org/records/4z6sb‐n1h97/files/14771821.pdf.
- Jasienska, G., and Ellison P. T. 1998. “Physical Work Causes Suppression of Ovarian Function in Women.” Proceedings of the Royal Society B: Biological Sciences 265, no. 1408: 1847–1851. 10.1098/rspb.1998.0511.
- Kesner, J., Wright D., Schrader S., Chin N., and Krieg E. F. Jr. 1992. “Methods of Monitoring Menstrual Function in Field Studies: Efficacy of Methods.” Reproductive Toxicology 6: 385–400.
- Klinge, I. 2008. “Gender Perspectives in European Research.” Pharmacological Research 58, no. 3–4: 183–189. 10.1016/j.phrs.2008.07.011.
- Kolatorova, L., Vitku J., Suchopar J., Hill M., and Parizek A. 2022. “Progesterone: A Steroid With Wide Range of Effects in Physiology as Well as Human Medicine.” International Journal of Molecular Sciences 23, no. 14: 7989. 10.3390/ijms23147989.
- Konishi, S., Brindle E., Guyton A., and O'Connor K. A. 2012. “Salivary Concentration of Progesterone and Cortisol Significantly Differs Across Individuals After Correcting for Blood Hormone Values.” American Journal of Physical Anthropology 149, no. 2: 231–241. 10.1002/ajpa.22114.
- Laine, M. A., and Ojanotko A. O. 1999. “Progesterone Metabolism in Human Saliva In Vitro.” Journal of Steroid Biochemistry and Molecular Biology 70, no. 1–3: 109–113. 10.1016/S0960-0760(99)00087-4.
- Leonard, W. R. 2021. “Novel Applications of Minimally Invasive Biomarkers in Human Biology Research.” American Journal of Human Biology 33, no. 1: e23568. 10.1002/ajhb.23568.
- Lewis, J. G., and Elder P. A. 2011. “Corticosteroid‐Binding Globulin Reactive Centre Loop Antibodies Recognise Only the Intact Natured Protein: Elastase Cleaved and Uncleaved CBG May Coexist in Circulation.” Journal of Steroid Biochemistry and Molecular Biology 127, no. 3–5: 289–294. 10.1016/j.jsbmb.2011.08.006.
- Lewis, J. G., and Elder P. A. 2013. “Intact or “Active” Corticosteroid‐Binding Globulin (CBG) and Total CBG in Plasma: Determination by Parallel ELISAs Using Monoclonal Antibodies.” Clinica Chimica Acta 416: 26–30. 10.1016/j.cca.2012.11.016.
- Lewis, L. L., Greenblatt E. M., Amanda Rittenhouse C., Veldhuis J. D., and Jaffe R. B. 1995. “Pulsatile Release Patterns of Luteinizing Hormone and Progesterone in Relation to Symptom Onset in Women With Premenstrual Syndrome.” Fertility and Sterility 64, no. 2: 288–292. 10.1016/S0015-0282(16)57725-5.
- Lipson, S. F., and Ellison P. T. 1992. “Normative Study of Age Variation in Salivary Progesterone Profiles.” Journal of Biosocial Science 24, no. 2: 233–244. 10.1017/S0021932000019751.
- Liu, J., Qiu X., Wang D., et al. 2018. “Quantification of 10 Steroid Hormones in Human Saliva From Chinese Adult Volunteers.” Journal of International Medical Research 46, no. 4: 1414–1427. 10.1177/0300060517752733.
- Lu, Y., Chatterton R. T., Vogelsong K. M., and May L. K. 1997. “Direct Radioimmunoassay of Progesterone in Saliva.” Journal of Immunoassay 18, no. 2: 149–163. 10.1080/01971529708005810.
- Mastroianni, A. C., Faden R., and Federman D. 1994. “Women and Health Research: A Report From the Institute of Medicine.” Kennedy Institute of Ethics Journal 4, no. 1: 55–62. https://muse.jhu.edu/article/245708.
- Mazure, C. M., and Jones D. P. 2015. “Twenty Years and Still Counting: Including Women as Participants and Studying Sex and Gender in Biomedical Research.” BMC Women's Health 15, no. 1: 94. 10.1186/s12905-015-0251-9.
- McDade, T. W. 2014. “Development and Validation of Assay Protocols for Use With Dried Blood Spot Samples.” American Journal of Human Biology 26, no. 1: 1–9. 10.1002/ajhb.22463.
- Misao, R., Nakanishi Y., Fujimoto J., Iwagaki S., and Tamaya T. 1999. “Levels of Sex Hormone‐Binding Globulin and Corticosteroid‐Binding Globulin mRNAs in Corpus Luteum of Human Subjects: Correlation With Serum Steroid Hormone Levels.” Gynecological Endocrinology 13, no. 2: 82–88. 10.3109/09513599909167537.
- NIH. 2024. “The Inclusion of Women and Minorities as Participants in Research Involving Human Subjects.” https://grants.nih.gov/policy/inclusion/women‐and‐minorities.htm.
- Núñez‐de la Mora, A., Chatterton R. T., Mateo E. T., Jesmin F., and Bentley G. R. 2007. “Effect of Chewing Betel Nut on Measurements of Salivary Progesterone and Estradiol.” American Journal of Physical Anthropology 132, no. 2: 311–315. 10.1002/ajpa.20513.
- O'Rourke, M. T., and Ellison P. T. 1990. “Salivary Measurement of Episodic Progesterone Release.” American Journal of Physical Anthropology 81, no. 3: 423–428. 10.1002/ajpa.1330810311.
- Orr, T. J., Burns M., Hawkes K., et al. 2020. “It Takes Two to Tango: Including a Female Perspective in Reproductive Biology.” Integrative and Comparative Biology 60, no. 3: 796–813. 10.1093/icb/icaa084.
- Panter‐Brick, C., Lotstein D. S., and Ellison P. T. 1993. “Seasonality of Reproductive Function and Weight Loss in Rural Nepali Women.” Human Reproduction 8: 684–690. 10.1093/oxfordjournals.humrep.a138015.
- Patil, S., Patil N., Bhat R., et al. 2023. “Diurnal Variation in Salivary Progesterone in Fertile Indian Women.” Heliyon 9, no. 1: e12719. 10.1016/j.heliyon.2022.e12719.
- Petsos, P. , Ratcliffe W. A., Heath D. F., and Anderson D. C.. 1986. “Comparison of Blood Spot, Salivary and Serum Progesterone Assays in the Normal Menstrual Cycle.” Clinical Endocrinology 24: 31–38. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1365‐2265.1986.tb03251.x. [DOI] [PubMed] [Google Scholar]
- Rahman, S. A. , Grant L. K., Gooley J. J., Rajaratnam S. M. W., Czeisler C. A., and Lockley S. W.. 2019. “Endogenous Circadian Regulation of Female Reproductive Hormones.” Journal of Clinical Endocrinology & Metabolism 104, no. 12: 6049–6059. 10.1210/jc.2019-00803. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Read, G. F. 1989. “Hormones in Saliva.” In Human Saliva, Volume II, edited by Tenovuo J. O.. CRC Press. [Google Scholar]
- Read, G. F. , Riad‐Fahmy D., Walker R. F., and Griffiths K., eds. 1984. Immunoassays of Steroids in Saliva: Proceedings of the Ninth Tenovus Workshop, Cardiff, November 1982. Alpha Omega Publishing Ltd. https://archive.org/details/immunoassaysofst0000teno. [Google Scholar]
- Riad‐Fahmy, D. , Read G. F., Walker R. F., Walker S. M., and Griffiths K.. 1987. “Determination of Ovarian Steroid Hormone Levels in Saliva: An Overview.” Journal of Reproductive Medicine 32: 254–272. [PubMed] [Google Scholar]
- Rossmanith, W. G. , Laughlin G. A., Mortola J. F., Johnson M. L., Veldhuis J. D., and Yen S. S.. 1990. “Pulsatile Cosecretion of Estradiol and Progesterone by the Midluteal Phase Corpus Luteum: Temporal Link to Luteinizing Hormone Pulses.” Journal of Clinical Endocrinology and Metabolism 70, no. 4: 990–995. 10.1210/jcem-70-4-990. [DOI] [PubMed] [Google Scholar]
- Salimetrics . 2010. “Salivary Progesterone EIA Kit Insert, Cat.# 1‐1502, 1‐1502‐5. Revision Date: 12‐9‐10.”
- Seaton, B. , and Riad‐Fahmy D.. 1980. “Use of Salivary Progesterone Assays to Monitor Menstrual Cycles in Bangladeshi Women.” Journal of Endocrinology 87, no. 3_Supplement: 21. [Google Scholar]
- Shah, R. , and Swift A. D.. 1984. “Salivary Progesterone Throughout the Menstrual Cycle and Pregnancy.” In Immunoassays of Steroids in Saliva: Proceedings of the Ninth Tenovus Workshop, Cardiff, November 1982, edited by Read G. F., Riad‐Fahmy D., Walker R. F., and Griffiths K., 134–147. Alpha Omega Publishing Ltd. https://archive.org/details/immunoassaysofst0000teno. [Google Scholar]
- Shea, A. A. , and Vitzthum V. J.. 2020. “The Extent and Causes of Natural Variation in Menstrual Cycles: Integrating Empirically‐Based Models of Ovarian Cycling Into Research on Women's Health.” Drug Discovery Today: Disease Models 32: 41–49. 10.1016/j.ddmod.2020.11.002. [DOI] [Google Scholar]
- Snodgrass, J. J. 2022. “Minimally Invasive Biomarkers in Human Population Biology Research, Part 2: An Introduction to the Special Issue.” American Journal of Human Biology 34, no. 11: e23822. 10.1002/ajhb.23822. [DOI] [PubMed] [Google Scholar]
- Soules, M. R. , Clifton D. K., Steiner R. A., Cohen N. L., and Bremner W. J.. 1988. “The Corpus Luteum: Determinants of Progesterone Secretion in the Normal Menstrual Cycle.” Obstetrics and Gynecology 71, no. 5: 659–666. [PubMed] [Google Scholar]
- Stinson, S. 1982. “The Interrelationship of Mortality and Fertility in Rural Bolivia.” Human Biology 54, no. 2: 299–313. [PubMed] [Google Scholar]
- Strassmann, B. I. 1997. “The Biology of Menstruation in Homo sapiens : Total Lifetime Menses, Fecundity, and Non‐Synchrony in a Natural‐Fertility Population.” Current Anthropology 38, no. 1: 123–129. 10.1086/204592. [DOI] [Google Scholar]
- Stricker, R. , Eberhart R., Chevailler M.‐C., Quinn F. A., Bischof P., and Stricker R.. 2006. “Establishment of Detailed Reference Values for Luteinizing Hormone, Follicle Stimulating Hormone, Estradiol, and Progesterone During Different Phases of the Menstrual Cycle on the Abbott ARCHITECT Analyzer.” Clinical Chemistry and Laboratory Medicine 44, no. 7: 883–887. 10.1515/CCLM.2006.160. [DOI] [PubMed] [Google Scholar]
- Sufi, S. B. , Donaldson A., Gandy S. C., et al. 1985. “Multicenter Evaluation of Assays for Estradiol and Progesterone in Saliva.” Clinical Chemistry 31, no. 1: 101–103. 10.1093/clinchem/31.1.101. [DOI] [PubMed] [Google Scholar]
- Sufi, S. B. , Jeffcoate S. L., Hall P. E., and Goncharov N.. 1982. “Five Years' Experience With the World Health Organization Matched Assay Reagent Programme in Research in Human Reproduction.” In Radioimmunoassay and Related Procedures in Medicine 1982. IAEA. https://inis.iaea.org/search/search.aspx?orig_q=RN:14771855. [Google Scholar]
- Tallon, D. F. , Gosling J. P., Buckley P. M., et al. 1984. “Direct Solid‐Phase Enzyme Immunoassay of Progesterone in Saliva.” Clinical Chemistry 30, no. 9: 1507–1511. 10.1093/clinchem/30.9.1507. [DOI] [PubMed] [Google Scholar]
- Thomson, G. E. , Mitchell F., and Williams M., eds. 2006. Examining the Health Disparities Research Plan of the National Institutes of Health: Unfinished Business. National Academies Press. https://books.google.ca/books?id=LKVVAgAAQBAJ&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false. [PubMed] [Google Scholar]
- Thornburg, J. , Bellido D., Stewart R., et al. 2011. “New Data Demonstrates That Relationship of Salivary to Serum Progesterone in Bolivian Women Is Comparable to That in Other Populations.” American Journal of Human Biology 23, no. 2: 274–275. 10.1002/ajhb.21153. [DOI] [Google Scholar]
- Thornburg, J. , Spielvogel H., and Vitzthum V.. 2008a. “Salivary Progesterone as a Proxy for Serum Progesterone: Reports of Its Death Have Been Greatly Exaggerated.” American Journal of Human Biology 20: 236. 10.1002/ajhb.20760. [DOI] [Google Scholar]
- Thornburg, J. , Spielvogel H., and Vitzthum V. J.. 2008b. “Are There Significant Interpopulational Differences in the Ratio of Salivary to Serum Progesterone?” American Journal of Physical Anthropology 135, no. S46: 206–207. https://onlinelibrary.wiley.com/doi/epdf/10.1002/ajpa.20806.18046774 [Google Scholar]
- Thornburg, J. , and Vitzthum V. J.. 2020. “An Economical Insulated Shipping Container Providing a >4 Day Lifetime for Frozen Biosamples Without Dry Ice.” engrXiv: 1–26. 10.31224/osf.io/ef9k4. [DOI] [Google Scholar]
- Treloar, A. E. , Boynton R. E., Behn B. G., and Brown B. W.. 1967. “Variation of the Human Menstrual Cycle Through Reproductive Life.” International Journal of Fertility 12: 77–126. https://scispace.com/papers/variation‐of‐the‐human‐menstrual‐cycle‐through‐reproductive‐4p0ybcy8rk. [PubMed] [Google Scholar]
- Twisk, J. W. R. 2006. Applied Multilevel Analysis: A Practical Guide for Medical Researchers. Cambridge University Press. 10.1017/CBO9780511610806. [DOI] [Google Scholar]
- van der Walt, L. A. , Wilmsen E. N., Levin J., and Jenkins T.. 1977. “Endocrine Studies on the San ('Bushmen') of Botswana.” South African Medical Journal 52: 230–232. [PubMed] [Google Scholar]
- van der Walt, L. A. , Wilmsen E. N., and Jenkins T.. 1978. “Unusual Sex Hormone Patterns Among Desert‐Dwelling Hunter‐Gatherers.” Journal of Clinical Endocrinology & Metabolism 46, no. 4: 658–663. [DOI] [PubMed] [Google Scholar]
- Veldhuis, J. D. , Christensen E., Evans W. S., Kolp L. A., Rogol A. D., and Johnson M. L.. 1988. “Physiological Profiles of Episodic Progesterone Release During the Midluteal Phase of the Human Menstrual Cycle: Analysis of Circadian and Ultradian Rhythms, Discrete Pulse Properties, and Correlations With Simultaneous Luteinizing Hormone Release.” Journal of Clinical Endocrinology & Metabolism 66, no. 2: 414–421. 10.1210/jcem-66-2-414. [DOI] [PubMed] [Google Scholar]
- Vienravi, V. , Amatayakul K., Kanluan T., Uttavichai C., and Andres R.. 1994. “A Direct Radioimmunoassay for Free Progesterone in Saliva.” Journal of the Medical Association of Thailand = Chotmaihet Thangphaet 77, no. 3: 138–147. [PubMed] [Google Scholar]
- Vining, R. F. , and McGinley R. A.. 1986. “Hormones in Saliva.” CRC Critical Reviews in Clinical Laboratory Sciences 23, no. 2: 95–146. 10.3109/10408368609165797. [DOI] [PubMed] [Google Scholar]
- Vitzthum, V. J. 2001. “Why Not So Great Is Still Good Enough: Flexible Responsiveness in Human Reproductive Functioning’.” In Reproductive Ecology and Human Evolution, edited by Ellison P. T., 179–202. Aldine de Gruyter. [Google Scholar]
- Vitzthum, V. J. 2008. “Evolutionary Models of Women's Reproductive Functioning.” Annual Review of Anthropology 37: 53–73. https://www.annualreviews.org/content/journals/10.1146/annurev.anthro.37.081407.085112. [Google Scholar]
- Vitzthum, V. J. 2009. “The Ecology and Evolutionary Endocrinology of Reproduction in the Human Female.” American Journal of Physical Anthropology 140, no. S49: 95–136. https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fajpa.21195&file=AJPA_21195_sm_suppfig.eps, https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fajpa.21195&file=AJPA_21195_sm_supptables.pdf. [DOI] [PubMed] [Google Scholar]
- Vitzthum, V. J. 2013. “Fifty Fertile Years: Anthropologists' Studies of Reproduction in High Altitude Natives.” American Journal of Human Biology 25, no. 2: 179–189. 10.1002/ajhb.22357. [DOI] [PubMed] [Google Scholar]
- Vitzthum, V. J. 2021. “Field Methods and Strategies for Assessing Female Reproductive Functioning.” American Journal of Human Biology 33, no. 5: e23513. 10.1002/ajhb.23513. [DOI] [PubMed] [Google Scholar]
- Vitzthum, V. J. 2024. “11. How It Works: The Biological Mechanisms That Generate Demographic Diversity.” In Human Evolutionary Demography, edited by Burger O., Lee R., and Sear R., 251–290. Open Book Publishers. 10.11647/OBP.0251. [DOI] [Google Scholar]
- Vitzthum, V. J. , Bentley G. R., Spielvogel H., et al. 2002. “Salivary Progesterone Levels and Rate of Ovulation Are Significantly Lower in Poorer Than in Better‐Off Urban‐Dwelling Bolivian Women.” Human Reproduction 17: 1906–1913. https://academic.oup.com/humrep/article‐abstract/17/7/1906/576950. [DOI] [PubMed] [Google Scholar]
- Vitzthum, V. J. , Dornum M., and Ellison P. T.. 1993. “Effect of Coca‐Leaf Chewing on Salivary Progesterone Assays.” American Journal of Physical Anthropology 92: 539–544. 10.1002/ajpa.1330920410. [DOI] [PubMed] [Google Scholar]
- Vitzthum, V. J. , Ellison P. T., Sukalich S., Caceres E., and Spielvogel H.. 2000. “Does Hypoxia Impair Ovarian Function in Bolivian Women Indigenous to High Altitude?” High Altitude Medicine and Biology 1, no. 1: 39–49. 10.1089/152702900320676. [DOI] [PubMed] [Google Scholar]
- Vitzthum, V. J. , Spielvogel H., and Thornburg J.. 2004. “Interpopulational Differences in Progesterone Levels During Conception and Implantation in Humans.” Proceedings of the National Academy of Sciences 101, no. 6: 1443–1448. 10.1073/pnas.0302640101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vitzthum, V. J. , Spielvogel H., Thornburg J., and West B.. 2006. “A Prospective Study of Early Pregnancy Loss in Humans.” Fertility and Sterility 86, no. 2: 373–379. 10.1016/j.fertnstert.2006.01.021. [DOI] [PubMed] [Google Scholar]
- Vitzthum, V. J. , Worthman C. M., Beall C. M., et al. 2009. “Seasonal and Circadian Variation in Salivary Testosterone in Rural Bolivian Males.” American Journal of Human Biology 21: 762–768. http://doi.wiley.com/10.1002/ajhb.20927. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vitzthum, V. J. , Thornburg J., Spielvogel H., and Deschner T.. 2021. “Recognizing Normal Reproductive Biology: A Comparative Analysis of Variability in Menstrual Cycle Biomarkers in German and Bolivian Women.” American Journal of Human Biology 33, no. 5: e23663. 10.1002/ajhb.23663. [DOI] [PubMed] [Google Scholar]
- Vuorento, T. , Lahti A., Hovatta O., and Huhtaniemi I.. 1989. “Daily Measurements of Salivary Progesterone Reveal a High Rate of Anovulation in Healthy Students.” Scandinavian Journal of Clinical and Laboratory Investigation 49: 395–401. http://www.tandfonline.com/doi/full/10.3109/00365518909089113. [DOI] [PubMed] [Google Scholar]
- Walker, R. F. , Hughes I. A., Riad‐Fahmy D., and Read G. F.. 1978. “Assessment of Ovarian Function by Salivary Progesterone.” Lancet 312, no. 8089: 585. 10.1016/S0140-6736(78)92931-8. [DOI] [PubMed] [Google Scholar]
- Walker, R. F. , Read G. F., and Riad‐Fahmy D.. 1979. “Radioimmunoassay of Progesterone in Saliva: Application to the Assessment of Ovarian Function.” Clinical Chemistry 25, no. 12: 2030–2033. 10.1093/clinchem/25.12.2030. [DOI] [PubMed] [Google Scholar]
- Walker, R. F. , Walker S., and Riad‐Fahmy D.. 1984. “Salivary Progesterone as an Index of Luteal Function.” In Immunoassays of Steroids in Saliva: Proceedings of the Ninth Tenovus Workshop, Cardiff, November 1982, edited by Read G. F., Riad‐Fahmy D., Walker R. F., and Griffiths K., 115–126. Alpha Omega Publishing Ltd. https://archive.org/details/immunoassaysofst0000teno. [Google Scholar]
- Walker, S. , Mustafa A., Walker R. F., and Riad‐Fahmy D.. 1981. “The Role of Salivary Progesterone in Studies of Infertile Women.” BJOG 88, no. 10: 1009–1015. 10.1111/j.1471-0528.1981.tb01689.x. [DOI] [PubMed] [Google Scholar]
- Wang, D. Y. , and Knyba R. E.. 1985. “Salivary Progesterone: Relation to Total and Non‐Protein‐Bound Blood Levels.” Journal of Steroid Biochemistry 23, no. 6: 975–979. 10.1016/0022-4731(85)90055-X. [DOI] [PubMed] [Google Scholar]
- Webley, G. E. , and Edwards R.. 1985. “Direct Assay for Progesterone in Saliva: Comparison with a Direct Serum Assay.” Annals of Clinical Biochemistry 22, no. Pt 6: 579–585. https://journals.sagepub.com/doi/pdf/10.1177/000456328502200605. [DOI] [PubMed] [Google Scholar]
- Wenner, M. M. , and Stachenfeld N. S.. 2020. “Point: Investigators Should Control for Menstrual Cycle Phase When Performing Studies of Vascular Control That Include Women.” Journal of Applied Physiology 129, no. 5: 1114–1116. 10.1152/japplphysiol.00443.2020. [DOI] [PubMed] [Google Scholar]
- West, B. T. , Welch K. B., and Galecki A. T.. 2007. Linear Mixed Models, a Practical Guide Using Statistical Software. Chapman & Hall/CRC. [Google Scholar]
- Wilson, R. , Adams M. L., and Pyke K. E.. 2020. “Inclusion of Female Participants in Cardiovascular Research: A Case Study of Ontario NSERC‐Funded Programs.” Applied Physiology, Nutrition, and Metabolism 45, no. 8: 911–914. 10.1139/apnm-2019-0693. [DOI] [PubMed] [Google Scholar]
- Wong, Y. F. , Mao K., Panesar N. S., Loong E. P. L., Chang A. M. Z., and Mi Z. J.. 1990. “Salivary Estradiol and Progesterone During the Normal Ovulatory Menstrual Cycle in Chinese Women.” European Journal of Obstetrics & Gynecology and Reproductive Biology 34, no. 1–2: 129–135. 10.1016/0028-2243(90)90016-T. [DOI] [PubMed] [Google Scholar]
- Wood, P. 2009. “Salivary Steroid Assays – Research or Routine?” Annals of Clinical Biochemistry 46: 183–196. [DOI] [PubMed] [Google Scholar]
- Worthman, C. , and Costello E.. 2009. “Tracking Biocultural Pathways in Population Health: The Value of Biomarkers.” Annals of Human Biology 36, no. 3: 281–297. 10.1080/03014460902832934. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ye, S. , Sun J., Craig S. R., et al. 2024. “Higher Oxygen Content and Transport Characterize High‐Altitude Ethnic Tibetan Women With the Highest Lifetime Reproductive Success.” Proceedings of the National Academy of Sciences 121, no. 45: e2403309121. 10.1073/pnas.2403309121. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zorn, J. R. , McDonough P. G., Nessman C., Janssens Y., and Cedard L.. 1984. “Salivary Progesterone as an Index of the Luteal Function.” Fertility and Sterility 41, no. 2: 248–253. 10.1016/s0015-0282(16)47599-0. [DOI] [PubMed] [Google Scholar]
Supplementary Materials
Data S1. Supporting information.
Data Availability Statement
The data that support the findings of this study are openly available in DRYAD at https://datadryad.org/.


