Table 3.
Data cleaning and analysis information of the included studies.
Study | Raw data preparation | Impossible values excluded | Missing data | Outlier | Analysis approach for cortisol data only | Statistical approach for the main research question |
---|---|---|---|---|---|---|
1. Abshire et al. (2018). United States | Data were checked for completeness, quality, and consistency. | NR | NR | NR | Original value | Nonparametric tests (including Mann–Whitney two-group comparisons) were used to examine the difference between implant strategy groups for continuous variables; categorical data comparisons were done using χ2 tests. A Spearman's rank correlation matrix was created to examine relationships between continuous psychological and physiological stress variables. Bivariate logistic regression modeling was used to explore relationships between physiological and psychological stress and dichotomized outcomes (high quality of life (QOL) and high functional status). |
2. Anderson et al., 2021. United States | Participants were initially excluded from cortisol assays if they reported use of psychotropic or steroid-based medications (excluding birth control). Participants were excluded if there was no actigraphy or low actigraphy wear time (<80% wear time; excluded 36 participants), they did not have all saliva samples on the required days (excluded 23 participants), they did not have actigraphy data (including sleep) on the appropriate day to align with saliva (excluded 17 participants), or they did not have demographic data (excluded 1 participant); Only participants who had two complete consecutive days of data and saliva samples from the following morning were included in analysis | NR | NR (Missing data was handled using mixed effect model) | NR | Original value | Multilevel linear models |
3. Armer et al., 2018. United States | Before statistical analyses, sampling time outliers for cortisol were removed. Ranges of sampling times were determined to fit the maximum number of participants while maintaining homogeneity. Acceptable ranges were from 0400 to 0900 h for morning cortisol collection, from 1600 to 1830 h for afternoon cortisol collection, and from 2000 to 2400 h for nocturnal cortisol collection. | NR | NR | Cortisol values greater than 4 standard deviations (SD) beyond the mean for a particular time point were excluded. | log transformation (natural log) | General linear models controlling for patient age were used, and Bonferroni corrections were applied to allow for pairwise comparisons between time points. Longitudinal analyses included all 3 time points in trajectory calculation and used linear mixed-effects models with fixed slopes and participant intercept terms, Mediation model |
4. Ayala-Grosso et al., 2021. Venezuela | Volunteers who failed to collect the complete set of samples were excluded from the analysis. | NR | NR | NR | log transformation | Correlation |
5. Basson et al., 2019. Canada | NR | NR | NR | NR | log transformation (log base 10) | Independent Samples t tests for group difference; simple linear regressions, ANOVA, linear mixed method |
6. Benz et al., 2019. Germany | Recorded times from the MEMS caps were checked against the times written down on the protocol sheets to allow identification of discrepancies, visual inspection of raw data; Special occurrences noted on the protocol sheets like heavy exercise or sickness were used to discard individual observations. | NA | interpolation of missing values after visual inspection of raw data | winsorizing of outliers | raw data | Type III ANOVAs |
7. Bernsdorf and Schwabe, 2018. Germany | NR | NR | NR | NR | raw data | Mixed model of ANOVA and correlations |
8. Bitsika et al., 2017. Australia | NR | NR | NR | NR | raw data | MANOVA models |
9. Boss et al., 2016. United States | NR | NR | NR | NR | log transformation (natural log) | Univariate analyses and multiple linear regression |
10. Chandola et al., 2018. UK | NR | NR | NR | NR | log transformation (natural log) | Multilevel growth curve model |
11. Charles et al., 2020. United States | NR | NR | Missing rates were low (this was mentioned for AL, to impute) | NR | raw data | Multi-level linear mixed effects model |
12. Chiang et al., 2016. United States | Morning saliva samples that were considered noncompliant according to actigraphy-based estimations of wake time were also assigned as missing given that the estimation of CAR is sensitive to timing of samples relative to actual wake time (Dockray et al., 2008; Stalder et al., 2016). Samples were deemed non-compliant if they were provided past a 15-min window around the actigraph wake time, and around the 15- and 30-min mark after actigraphy wake time. On any given day, 43–84 adolescents provided at least one non-compliant morning sample. | Cortisol values greater than 60 nmol/L were set to missing | Multiple imputation was conducted in order to minimize potential bias stemming from missing data. All study variables, potential confounds, and auxiliary variables were included in imputation models, and twenty datasets were generated. | After excluding outliers and cortisol values from noncompliant saliva samples, 217 out of the 316 participants had complete data on all computed variables of interest and covariates. | log transformed | Multiple linear regressions (run on both log-transformed and raw values; reported results are based on raw values, using the multiply imputed datasets) |
13. Chin et al., 2017. United States | In all cases, samples were only included for analysis if they were collected ±45 min of the scheduled collection time. This was based on our earlier work indicating we could maintain 95% or more of the data using this range and at the same time retain the normal diurnal rhythm (e.g., Janicki-Deverts et al., 2016; also, see http://www.cmu.edu/common-cold-project//combining-the-5-studies/variable-modifications.html). Samples collected outside of this window were treated as missing. | NR | NR (the concept of missing data was used to define sufficient data, but how missing data were handled was not reported) | NR | log transformation (log base 10) | Hierarchical multiple linear regression with waking day cortisol AUC as outcome, and multilevel modeling with waking daily cortisol slope as outcome |
14. Corominas-Roso et al., 2017. Spain | NR | NR | NR | NR | log transformation (log base 10) | Pearson correlation |
15. Cuneo et al., 2017. United States | NR | NR | Three participants missing afternoon cortisol values had slopes calculated from morning and bedtime samples, an approach consistent with recommendations from Kraemer et al., (2006). | Participants possessing cortisol values ≥ 4 SD from the mean at any time-point were also excluded (N = 1) | log transformation (natural log) | General linear models |
16. D'Cunha et al., 2019. Australia | NR | NR | NR | NR | log transformation | Friedman test |
17. Darabos et al., 2020. United States | NR | NR | NR | NR | log transformation | Multiple linear regression |
18. Doolin et al., 2017. Ireland | NR | NR | NR | NR | log transformation | Mann-Whitney U test and correlation |
19. Engert et al., 2018. Germany | NR | NR | Because salivary cortisol and experience sampling self-report data were eventually averaged across two sampling days, missing values were replaced for these repeatedly sampled variables | Winsorization of outliers. Non-parametric Spearman correlations in all analyses. Because Spearman's correlation limits an outlier to the value of its rank, outliers were included unwinsorized. | log transformation | Spearman correlation, network analysis |
20. Fuentecilla et al., 2019. United States | Participants completed five to seven daily diary interviews, with a mean of 6.87 interviews (SD = 0.37), and provided saliva on average on 3.99 (SD = 0.07) of the diary days. Given that waking up in the late afternoon is associated with cortisol output, the days on which participants woke up in the afternoon (n = 5) were excluded. Thus, of the total 563 valid days, 5 days were removed from the analysis, resulting in a total of 558 days. | Cortisol values were examined on a daily basis and removed if participants did not complete a daily interview, participants did not indicate time of sample collection, at least one cortisol value was over 60 nmol/L, participants were awake for less than 12 h or more than 20 h, or woke up past 12:00 noon. The entire day was excluded if there was less than 15 min or more than 60 min between the waking cortisol sample and the 30-min cortisol sample. | The multilevel model can handle missing data | NR | The skew and kurtosis of each cortisol value was assessed. Due to the non-normal distribution of the cortisol levels, the natural log was calculated for all cortisol values and used for all analyses. | Multilevel modeling |
21. Garcia A.F. et al., 2017. United States | To minimize the potential effects of exposure to stressful events during the sampling period, participants who were currently students were not sampled the week prior to scheduled class examinations. In addition, participants indicating daily hassles or exposure to stressful daily events or protocol non-compliance during sampling periods (teeth brushing, etc.) were excluded from the final analyses. | NR | NR | NR | Results are based on raw scores, but log-transformed variables were also used for modeling | Mixed effects regression model and path analysis. |
22. Garcia M.A. et al., 2021. United States | NR | NR | NR | NR | NR | correlation and ANOVA |
23. Goldstein et al., 2017. United States | Samples were excluded if the adolescent reported being sick; participants were only included in analyses if they had at least 1 day with all 3 samples meeting inclusion criteria. | NR | Excluded participants with only one day of samples (this did not alter results) | Samples were excluded if the cortisol level was more than 3 SD above the mean for the cohort. Samples were also excluded if they fell outside the following time windows: waking samples taken more than 10 min after waking time, 30-min samples taken less than 15 or more than 45 min after waking, and evening samples taken before 16:00 h or after 24:00 h. | Prior to conducting inferential statistics, all individual cortisol samples were adjusted for sampling time since waking using regression | t-test, linear regression |
24. Herane-Vives et al., 2018. UK and Chile | NR | NR | NR | NR | raw data | ANOVA, linear regression and logistic regression |
25. Ho, Lo et al., 2020. Hong Kong, China | NR | NR | Missing data were handled using full information maximum likelihood under the missing-at-random assumption for the intent-to-treat analytic approach. | NR | log transformation | t-test, latent difference score approach |
26. Ho, Fong, Yau et al., 2020. Hong Kong, China | NR | NR | Missing data were handled via full information maximum likelihood under the missing-at-random assumption | Cortisol analysis was based on 838 valid samples (98.0%) after removing 17 outliers that deviated substantially (>3 standard deviations) from the mean. | raw data | structural equation modeling |
27. Ho, Fong, Chan et al., 2020. Hong Kong, China | NR | NR | Missing data were handled via full information maximum likelihood under the missing-at-random assumption, which allowed the analysis of all of the available data under the standard intent-to-treat clinical approach | Preliminary screening of cortisol values winsorized outliers that deviated substantially (>3 SD) from the means. A total of 17, 13, 21, and 11 cortisol outliers were winsorized among the 853, 821, 761, and 678 samples at Time 1, Time 2, Time 3, and Time 4, respectively. | raw data | Multigroup latent growth modeling |
28. Holmqvist-Jamsen et al., 2017. Finland | NR | NR | NR | The cortisol values were winsorized to reduce the effect of potentially spurious outliers by setting outliers to 3 SD from the mean | raw data | GEE |
29. Hooper, 2019. United States | NR | NR | NR (only smoking status is mentioned) | NR | log transformation | Repeated measures ANOVA tested the effects of time of day, race/ethnicity, and their interactions on cortisol levels. Models controlled for income, education (continuous variables), and smoking status. Multivariate logistic regression models examined the odds of smoking relapse at the one-month follow-up by race/ethnicity, while controlling for (1) demographic covariates and (2) demographic covariates and baseline cortisol slope. |
30. Huang et al., 2020. Taiwan, China | Salivary cortisol data of 6 hepatocellular carcinoma patients were incomplete because the participants had forgotten to collect their saliva at certain time points. | NR | NR | NR | t tests to assess the difference in mean cortisol levels at each time point between the subgroups, GEE | |
31. Huynh et al., 2016. United States | Adolescents provided three days of cortisol samples on different days of the week. Only weekday samples were included in the analyses | Samples with cortisol values over 60 (n = 14) were removed. Morning samples in which participants reported more than 30 min between sample 1 and sample 2 (n = 12) or more than 60 min between collecting sample 1 and sample 3 (n = 10) for a particular day were flagged. Analyses excluding these cases did not change the results, therefore these samples were not excluded from the final analyses. The description above does not make clear whether the excluded values were treated as impossible values or as outliers. | NR | NR | log transformation | multiple regression |
32. Jakuszkowiak-Wojten et al., 2016. Poland | Six subjects delivered incomplete sets of saliva samples and were excluded from the analysis | NR | NR | NR | raw data | Chi square; Pearson correlation |
33. Johnson et al., 2020. Canada | NR | Cortisol values greater than 4 standard deviations above the sample mean for that timepoint were removed | The variables used in the analysis were examined for missing data using the MissMech package in R. The pattern of missing data as well as a non-significant Little's MCAR (missing completely at random) test indicated that there was not enough evidence to reject the MCAR assumption. Missing data were imputed using multiple imputation with the predictive mean matching method in the MICE package. | Cortisol values greater than 4 standard deviations above the sample mean for that timepoint were removed. | To adjust for the non-normal distributions of the raw cortisol values, all values were transformed using a natural log transformation and the transformed values were used for all analyses | multilevel structural equation modeling framework |
34. Keefe et al., 2018. United States | The average subject had 94.3% of pre-treatment measurements completed (mean = 11.3), and 92.8% of post-treatment measurements completed (mean = 11.1). | NR | All collected awakening and post-awakening measurements were used in the model, under the assumption that any given unobserved measurement was missing at random | NR | log transformation (log base 10) | mixed model |
35. Kristiansen et al., 2020. Sweden | Only if there was a congruency between either exact time entries in the diary or event entries in the ECG with the movement pattern and increased heart rate (indicating awakening) were the morning samples included in the analysis. Based on this strict selection, 83% of the patients had acceptable cortisol samples and were included in the analysis (167 out of 201 individuals). Individuals with diabetes had a lower rate of successful sampling than controls (80% versus 88%), mostly due to low glucose levels in the morning that impeded cortisol sampling in some cases. Children had a lower rate of successful sampling than adults (80% versus 91%). | NR | NR | NR | log transformation (natural log) | Mann–Whitney U test |
36. Labad et al., 2018. Spain | NR | NR | NR | NR | Cortisol values were transformed to approximate a normal distribution, as suggested by recent expert consensus guidelines. The following power transformation was used: $X' = (X^{0.26} - 1)/0.26$ | Pearson correlations (and Spearman correlations, when needed), GLM, three separate multiple regression analyses |
37. Landau et al., 2021. Australia | Consecutive morning saliva samples were averaged to create average Cortmorn and average CRPmorn values; evening saliva samples were averaged in the same way to create average Corteve and average CRPeve values. Morning Cort:CRP ratio (Cort:CRPmorn) was calculated by dividing untransformed Cortmorn values by untransformed CRPmorn values, and evening Cort:CRP ratio (Cort:CRPeve) was calculated in the same manner with Corteve and CRPeve values. Diurnal cortisol slopes were calculated by taking the difference between natural-log transformed Cortmorn and Corteve values divided by time between sample collection. Saliva data outliers (n = 5 at T1 and n = 4 at T3) were winsorized to 0.01 μg/dL for cortisol values and 0.01 pg/mL for CRP values. | NR | Out of the 122 participants in the intention-to-treat sample at T1, a total of 107 (87.7% of the total sample) provided full or partial T3 (follow-up) data. Little's Missing Completely at Random (MCAR) tests were used to test for patterns of missingness in the data prior to imputation. Little's MCAR results indicated non-significance (statistics not shown), suggesting MCAR and acceptability of multiple imputation. Multiple imputation was performed on the entire dataset using the ‘Multiple Imputation by Chained Equations’ (mice) package in RStudio with all variables included in the present study. Predictive mean matching imputation, considered more robust for use with non-normal data, was used for quantitative continuous data (e.g., saliva, questionnaires), and logistic regression was used for categorical data. Percentage of variables missing and other missingness assumptions are presented in Supplemental Table 1. | Outliers > ±3 standard deviations (SD) above/below the mean were investigated by log-transforming the values (ref. to Landau 2019). Saliva data outliers (n = 5 at T1 and n = 4 at T3) were winsorized to 0.01 μg/dL for cortisol values. Outliers for questionnaire variables were not adjusted (as in Blake et al., 2016, 2017a, 2017b, 2018) because research has shown psychological variables are typically positively skewed in non-clinical populations, with outliers to be expected due to the self-report nature of these measures. | raw data and log transformation (natural log) | Simple regression analyses; a series of analyses of covariance (ANCOVA); a series of multivariate linear and logistic regression analyses |
38. Laures-Gore et al., 2019. United States | NR | NR | NR | NR | raw data | Repeated measures ANOVA |
39. Liu et al., 2017. United States | A saliva sample was invalid if: 1) the caregiver was awake for less than 12 hr or greater than 20 hr (n = 14), or 2) the caregiver woke up after 12 pm (n = 0), or 3) for cortisol assay specifically, there was a greater than 10 nmol/L rise between the second (30 min after getting out of bed) and third sample (before lunch) (n = 11), or 4) the recorded collection time between the first (upon wakeup) and second sample (30 min after getting out of bed) was either less than 15 min or greater than 60 min (n = 99). | NR | NR | NR | raw data | growth curve models |
40. Mitchell et al. (2020). United States | NR | NR | Not specified; only mentions that participants who provided complete data were included. | NR | descriptive | Hierarchical general linear modeling |
41. Morgan et al. (2017). United States | NR | NR | NR | NR | Cortisol modeling: $Y_{ij} = f(t_{ij}) + \alpha_i + \epsilon_{ij}$, where $Y_{ij}$ is the log-transformed cortisol value for the $j$th sample from the $i$th respondent; $t_{ij}$ is the time at which the sample was taken; $\alpha_i$ is a respondent-level deviation from the mean with distribution $N(0, \sigma_\alpha^2)$; and the error term $\epsilon_{ij}$ is assumed to be independent with distribution $N(0, \sigma^2)$. Log-transformed average cortisol levels | Unadjusted and adjusted multiple linear regression. These models were fit using the survey weights distributed with the data set, which account for differential probabilities of selection and differential nonresponse. Design-based standard errors were obtained using the linearization method as implemented in the Stata statistical software package, version 13.1. |
42. Otto et al. (2018). United States | Days were excluded from the calculation of the cortisol indices if (1) saliva collection time stamps were missing, (2) the participant woke up after 12 pm, (3) the participant was awake <12 h or >20 h, or (4) if there was an indication of non-compliance with the saliva collection protocol such that <15 or >60 min elapsed between the first two measurements (Stawski, Cichy, Piazza and Almeida, 2013). The analytic sample sizes were 46 participants for DCS and 43 participants for CAR and AUCg. | NR | NR | NR | raw data | linear regression |
43. Pace et al. (2021). United States | Success was defined as obtaining biomarker data from ≥85% of samples per protocol. Saliva concentrations of cortisol were averaged across collection days in morning, afternoon, or evening because an effect of day was not expected; We first examined biomarker and HRQOL variables by computing means and their standard errors by biomarker and time point (for cortisol only). | NR | Not specified; only mentioned that 96% and 92% of saliva samples were collected from survivors and caregivers | NR | Data that were not normally distributed (Shapiro–Wilk test) were natural log transformed before any inferential testing | Examined the association between biomarker variables (CRP, AM cortisol, PM cortisol, and cortisol slope) and HRQOL domains by computing partial and semipartial correlation coefficients controlling for body mass index (BMI) and chemotherapy treatment (survivors) and Pearson product-moment correlation coefficients (caregivers). A Spearman's rank correlation coefficient was computed instead for associations where one or both outcomes were not normally distributed. |
44. Ramos-Quiroga et al. (2016). Spain | NR | NR | NR | NR | Because the distribution of cortisol values was positively skewed, these data have been base-10 logarithmically transformed prior to any further analyses. | Chi-square test (χ2); repeated measures ANCOVA; Spearman-Rho correlations |
45. Rosnick et al. (2016). United States | NR | NR | NR | NR | NR | GEE analysis was conducted to examine the between treatment group difference in peak cortisol change over time from pre- to post-augmentation. |
46. Sampedro-Piquero et al. (2020). Spain | NR | NR | NR | NR | raw data | RM ANOVA and MANOVA, Pearson correlation |
47. Schreier and Chen, 2017. United States | Cortisol data were unavailable for 17 adolescents who did not return useable samples. These adolescents did not differ from participants who returned useable samples with respect to age, BMI, chronic and acute stress ratings, ethnicity, and family income (ps > 0.10) but were more likely to be female (χ2 (1) = 6.184, p = .013). On average, adolescents completed 5.47 (±1.03) out of the 6 days. | NR | NR | NR | log transformation | hierarchical multiple regression analyses |
48. Schuler et al., 2017. United States | Before testing hypotheses, cortisol data were inspected for outliers. | NR | NR | Four criteria were used to identify outliers, namely, (1) standardized cortisol values were bigger than three standard deviations from the mean; (2) adolescent participants were ill on a given sampling day (e.g., any illness symptoms indicated in the diary); (3) blood contamination (e.g., from cuts in the mouth); and (4) saliva samples deemed to be collected nonadherent to sampling instructions (i.e., participants ate or drank before collecting saliva samples or saliva samples were collected outside the instructed time) | raw data | a hierarchical multiple regression |
49. Seidenfaden et al. (2017). Denmark | NR | NR | For series of samples with more than one sample missing, the AUC was not computed. If only one sample was missing, values were replaced by the mean of the two adjacent values, or, if the missing value were either the awakening or 11 pm sample, by the mean of the full sample for that time point. | Before computations, extreme values in each group for each time point (outside the 99th percentile) were excluded (30 out of a total of 658 determinations). | NR | repeated measures ANOVA |
50. Sin et al. (2017). United States | NR | NR | Models were estimated using full information maximum likelihood estimation in SAS 9.4 PROC MIXED, which makes use of all available data in the estimation of parameters and can flexibly handle missing data | cortisol samples were excluded where the cortisol level was >60 nmol/L (1.46%), the time stamp was missing (1.28%), or the lunch sample was ≥10 nmol/L more than the 30-min post-waking sample (suggesting that participants ate before collecting their saliva, 1.82%). Further, cortisol samples were excluded from days when participants woke before 4 a.m. (3.14%) or after 12 pm (0.67%), or days when <15 or >60 min elapsed between the first two samples (indicators of noncompliance that influence assessment of the awakening response, 9.74%). | log transformation (natural log) | Multilevel modeling |
51. Starr et al. (2017). United States | Of the original sample of 241, 12 were excluded from cortisol procedures for medical reasons, and 18 declined to participate in cortisol procedures or failed to return samples, leaving 211 participants with samples that were assayed. Careful measures were taken to exclude values that might not accurately represent the CAR. | NR | Cortisol values at each sampling time were winsorized to correct for extreme outliers (>3 SD; 5 data points for waking, 2 for +30 min, and 5 for +60 min) | Both variables were winsorized to 3 SD to correct for outliers | NR | Moderation analysis, linear regression |
52. Strahler and Nater, 2018. Germany | NR | NR | NR | NR | NR | Hierarchical linear models |
53. Tada, 2018. Japan | NR | NR | NR | NR | NR | Baseline data on POMS-SF and salivary biomarkers of both groups were compared using the Mann–Whitney U test. Wilcoxon signed-rank tests were used to compare differences in the groups' scores at baseline and 6-month follow-up. Correlations between changes in cortisol level and in POMS-SF “fatigue” score were assessed using Pearson correlation coefficients |
54. Urizar et al. (2021). United States | Averaging the cortisol values across the two saliva collection days at each study time point. | No impossible values, based on no outliers being identified | Missing cortisol samples for a particular collection day were estimated by using the participant's second day sample for that timepoint. | No cortisol outliers (defined as being three standard deviations from the mean for each cortisol index) were identified in the current investigation. | log transformation (log base 10) | Pearson correlation, mixed effect linear model |
55. Walls et al., 2020. United States | Single Sample Values were examined for possible measurement error | NR | NR | Single Sample Values were examined for possible measurement error and any outlier values that required deeper examination; We also performed separate t-tests to examine the influence of the largest discrepancies (i.e. outliers and extreme cases) on cortisol indices. | raw data | Pearson correlation, t-test |
56. Wong and Shobo, 2017. United States | A set of criteria was used to determine the analytic sample. 235 did not provide saliva samples and were dropped. Individuals who did not follow the cortisol collection procedures (n = 10) and those who did not provide complete data on medication use (n = 79) were dropped. | Following the Winsorization statistical approach (Dixon and Yuen, 1974), salivary cortisol values higher than 60 nmol/L were recoded as 61 to minimize the influence of extreme outliers. | NR | Following the Winsorization statistical approach (Dixon and Yuen, 1974), salivary cortisol values higher than 60 nmol/L were recoded as 61 to minimize the influence of extreme outliers. | log transformation | Two-level multilevel models |
57. Yu et al. (2016). Netherlands | All samples were checked for correctness of sampling. Cases were excluded from analyses if the cortisol data were of incorrect sampling time, unclear how it was sampled (i.e., not registered), contaminated (e.g., by smoking or brushing teeth), or of extreme values (i.e., >3 SD from average) | NR | Reported attrition and Little's MCAR test; applied Full Information Maximum Likelihood (FIML) in Mplus for the model estimations | NR | raw data | Multiple regression models incorporating latent growth models |
58. Yu et al. (2019). Netherlands | NR | NR | Analyses of all variables used in this study revealed a normed χ2 (χ2/df) of 1.04, which indicates that the pattern of the missing data was not materially different from a missing completely at random pattern | NR | raw data | mixed model |
Notes. NR: not reported.
GEE: generalized estimating equations; ANOVA: analysis of variance.
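
Several cleaning steps recur across the studies in Table 3: setting cortisol values above 60 nmol/L to missing, winsorizing values beyond 3 SD of the mean, and applying a natural-log or power transformation. The sketch below illustrates these steps in Python using only the standard library. It is a minimal illustration under stated assumptions, not the code of any included study: the function names, the default thresholds, and the use of `None` to mark missing values are all hypothetical choices made for this example.

```python
import math
import statistics


def set_impossible_to_missing(values, ceiling=60.0):
    """Mark values above `ceiling` (nmol/L) as missing (None).

    The 60 nmol/L ceiling follows the rule reported by several studies;
    treating it as a default here is an assumption of this sketch.
    """
    return [v if v is not None and v <= ceiling else None for v in values]


def winsorize_sd(values, k=3.0):
    """Clamp observed values to mean +/- k sample SDs; missing values stay None."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample SD (n - 1 denominator)
    low, high = mean - k * sd, mean + k * sd
    return [None if v is None else min(max(v, low), high) for v in values]


def natural_log(values):
    """Natural-log transform; assumes all observed values are strictly positive."""
    return [None if v is None else math.log(v) for v in values]


def power_transform(values, lam=0.26):
    """Power transformation X' = (X**lam - 1) / lam, with lam = 0.26 as in study 36."""
    return [None if v is None else (v ** lam - 1.0) / lam for v in values]


if __name__ == "__main__":
    # Hypothetical cortisol values in nmol/L, for illustration only.
    raw = [12.4, 8.1, 75.0, 3.2, None, 15.9]
    cleaned = winsorize_sd(set_impossible_to_missing(raw))
    print(natural_log(cleaned))
    print(power_transform(cleaned))
```

The order of operations (impossible values first, then outlier handling, then transformation) is one plausible choice; the included studies differ on this point, and many do not report it.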