. 2024 Dec 29;43:100936. doi: 10.1016/j.bbih.2024.100936

Table 3.

Data cleaning and analysis information of the included studies.

Study	Raw data preparation	Impossible values excluded	Missing data	Outlier	Analysis approach for cortisol data only	Statistical approach for the main research question
1. Abshire et al., (2018). United States	Data were checked for completeness, quality, and consistency.	NR	NR	NR	Original value	Nonparametric tests (including Mann–Whitney two-group comparisons) were used to examine the difference between implant strategy groups for continuous variables; categorical data comparisons were done using χ2 tests. A Spearman's rank correlation matrix was created to examine relationships between continuous psychological and physiological stress variables. Bivariate logistic regression modeling was used to explore relationships between physiological and psychological stress and dichotomized outcomes (high quality of life (QOL) and high functional status).
2. Anderson et al., 2021. United States	Participants were initially excluded from cortisol assays if they reported use of psychotropic or steroid-based medications (excluding birth control). Participants were excluded if there was no actigraphy or low actigraphy wear time (<80% wear time; excluded 36 participants), they did not have all saliva samples on the required days (excluded 23 participants), they did not have actigraphy data (including sleep) on the appropriate day to align with saliva (excluded 17 participants), or they did not have demographic data (excluded 1 participant); Only participants who had two complete consecutive days of data and saliva samples from the following morning were included in analysis	NR	NR (Missing data was handled using mixed effect model)	NR	Original value	Multilevel linear models
3. Armer et al., 2018. United States	Before statistical analyses, sampling time outliers for cortisol were removed. Ranges of sampling times were determined to fit the maximum number of participants while maintaining homogeneity. Acceptable ranges were from 0400 to 0900 h for morning cortisol collection, from 1600 to 1830 h for afternoon cortisol collection, and from 2000 to 2400 h for nocturnal cortisol collection.	NR	NR	Cortisol values greater than 4 standard deviations (SD) beyond the mean for a particular time point were excluded.	log transformation (natural log)	General linear models controlling for patient age were used, and Bonferroni corrections were applied to allow for pairwise comparisons between time points. Longitudinal analyses included all 3 time points in trajectory calculation and used linear mixed-effects models with fixed slopes and participant intercept terms, Mediation model
4. Ayala-Grosso et al., 2021. Venezuela	Volunteers that failed in collecting the complete set of samples were excluded from the analysis.	NR	NR	NR	log transformation	Correlation
5. Basson et al., 2019. Canada	NR	NR	NR	NR	log transformation (log base 10)	Independent Samples t tests for group difference; simple linear regressions, ANOVA, linear mixed method
6. Benz et al., 2019. Germany	Recorded times from the MEMS caps were checked against the times written down on the protocol sheets to allow identification of discrepancies, visual inspection of raw data; Special occurrences noted on the protocol sheets like heavy exercise or sickness were used to discard individual observations.	NA	interpolation of missing values after visual inspection of raw data	winsorizing of outliers	raw data	Type III ANOVAs
7. Bernsdorf and Schwabe, 2018. Germany	NR	NR	NR	NR	raw data	Mixed model of ANOVA and correlations
8. Bitsika et al., 2017. Australia	NR	NR	NR	NR	raw data	MANOVA models
9. Boss et al., 2016. United States	NR	NR	NR	NR	log transformation (natural log)	Univariate analyses and multiple linear regression
10. Chandola et al., 2018. UK	NR	NR	NR	NR	log transformation (natural log)	Multilevel growth curve model
11. Charles et al., 2020. United States	NR	NR	Missing rates were low (this was mentioned for AL, to impute)	NR	raw data	Multi-level linear mixed effects model
12. Chiang et al., 2016. United States	Morning saliva samples that were considered noncompliant according to actigraphy-based estimations of wake time were also assigned as missing given that the estimation of CAR is sensitive to timing of samples relative to actual wake time (Dockray et al., 2008; Stalder et al., 2016). Samples were deemed non-compliant if they were provided past a 15-min window around the actigraph wake time, and around the 15- and 30-min mark after actigraphy wake time. On any given day, 43–84 adolescents provided at least one non-compliant morning sample	Cortisol values greater than 60 nmol/L were set to missing	multiple imputation was conducted in order to minimize potential bias stemming from missing data. All study variables, potential confounds, and auxiliary variables were included in imputation models, and twenty datasets were generated.	After excluding outliers and cortisol values from noncompliant saliva samples, 217 out of the 316 participants had complete data on all computed variables of interest and covariates.	log transformed	multiple linear regressions (run on both log-transformed and raw values; results reported based on raw values, using the multiple imputation dataset)
13. Chin et al., 2017. United States	In all cases, samples were only included for analysis if they were collected ±45 min of the scheduled collection time. This was based on our earlier work indicating we could maintain 95% or more of the data using this range and at the same time retain the normal diurnal rhythm (e.g., Janicki-Deverts et al., 2016; also, see http://www.cmu.edu/common-cold-project//combining-the-5-studies/variable-modifications.html). Samples collected outside of this window were treated as missing.	NR	NR (used the concept of missing data to define sufficient data, but did not report how missing data were handled)	NR	log transformation (log base 10)	hierarchical multiple linear regression with waking day cortisol AUC as outcome, and multilevel modeling with waking daily cortisol slope as outcome
14. Corominas-Roso et al., 2017. Spain	NR	NR	NR	NR	log transformation (log base 10)	Pearson correlation
15. Cuneo et al., 2017. United States	NR	NR	Three participants missing afternoon cortisol values had slopes calculated from morning and bedtime samples, an approach consistent with recommendations from Kraemer et al., (2006).	Participants possessing cortisol values ≥ 4 SD from the mean at any time-point were also excluded (N = 1)	log transformation (natural log)	General linear models
16. D'Cunha et al., 2019. Australia	NR	NR	NR	NR	log transformation	Friedman test
17. Darabos et al., 2020. United States	NR	NR	NR	NR	log transformation	Multiple linear regression
18. Doolin et al., 2017. Ireland	NR	NR	NR	NR	log transformation	Mann-Whitney U test and correlation
19. Engert et al., 2018. Germany	NR	NR	Because salivary cortisol and experience sampling self-report data were eventually averaged across two sampling days, missing values were replaced for these repeatedly sampled variables	winsorization of outliers. Non-parametric Spearman correlations in all analyses. Because Spearman's correlation limits an outlier to the value of its rank, outliers were included unwinsorized.	log transformation	Spearman Correlation, Network analysis
20. Fuentecilla et al., 2019. United States	Participants completed five to seven daily diary interviews with a mean of 6.87 interviews (SD = 0.37) and provided saliva on average 3.99 (SD = 0.07) of the diary days. Given that waking up in the late afternoon is associated with cortisol output, the days in which participants woke up in the afternoon (n = 5) were excluded. Thus, of the total 563 valid days, 5 days were removed from the analysis, resulting in a total of 558 days.	Cortisol values were examined on a daily basis and removed if participants did not complete a daily interview, participants did not indicate time of sample collection, at least one cortisol value was over 60 nmol/L, participants were awake for less than 12 h or more than 20 h, or woke up past 12:00 noon. The entire day was excluded if there was less than 15 min or more than 60 min between the waking cortisol sample and the 30-min cortisol sample.	multilevel model can handle missing data	NR	The skew and kurtosis of each cortisol value was assessed. Due to the non-normal distribution of the cortisol levels, the natural log was calculated for all cortisol values and used for all analyses.	Multilevel modeling
21. Garcia A.F. et al., 2017. United States	To minimize the potential effects of exposure to stressful events during the sampling period, participants who were currently students were not sampled the week prior to scheduled class examinations. In addition, participants indicating daily hassles or exposure to stressful daily events or protocol non-compliance during sampling periods (teeth brushing, etc.) were excluded from the final analyses.	NR	NR	NR	results reported based on raw scores; log-transformed variables were also used for modeling	Mixed effects regression model and path analysis.
22. Garcia M.A. et al., 2021. United States	NR	NR	NR	NR	NR	correlation and ANOVA
23. Goldstein et al., 2017. United States	Samples were excluded if the adolescent reported being sick; participants were only included in analyses if they had at least 1 day with all 3 samples meeting inclusion criteria.	NR	excluded participants with only one day of samples (this did not alter results)	Samples were excluded if the cortisol level was more than 3 SD above the mean for the cohort. Samples were also excluded if they fell outside the following time windows: waking samples taken more than 10 min after waking time, 30-min samples taken less than 15 or more than 45 min after waking, and evening samples taken before 16:00 h or after 24:00 h.	Prior to conducting inferential statistics all individual cortisol samples were adjusted for sampling time since waking using regression	t-test, linear regression
24. Herane-Vives et al., 2018. UK and Chile	NR	NR	NR	NR	raw data	ANOVA, linear regression and logistic regression
25. Ho, Lo et al., 2020. Hong Kong, China	NR	NR	Missing data were handled using full information maximum likelihood under the missing-at-random assumption for the intent-to-treat analytic approach.	NR	log transformation	t-test, latent difference score approach
26. Ho, Fong, Yau et al., 2020. Hong Kong, China	NR	NR	Missing data were handled via full information maximum likelihood under the missing-at-random assumption	Cortisol analysis was based on 838 valid samples (98.0%) after removing 17 outliers that deviated substantially (>3 standard deviations) from the mean.	raw data	structural equation modeling
27. Ho, Fong, Chan et al., 2020. Hong Kong, China	NR	NR	Missing data were handled via full information maximum likelihood under the missing-at-random assumption, which allowed the analysis of all of the available data under the standard intent-to-treat clinical approach	Preliminary screening of cortisol values winsorized outliers that deviated substantially (>3 SD) from the means. A total of 17, 13, 21, and 11 cortisol outliers were winsorized among the 853, 821, 761, and 678 samples at Time 1, Time 2, Time 3, and Time 4, respectively.	raw data	Multigroup latent growth modeling
28. Holmqvist-Jamsen et al., 2017. Finland	NR	NR	NR	The cortisol values were winsorized to reduce the effect of potentially spurious outliers by setting outliers to 3 SD from the mean	raw data	GEE
29. Hooper, 2019. United States	NR	NR	NR (only mentions smoking status)	NR	log transformation	Repeated measures ANOVA tested the effects of time of day, race/ethnicity, and their interactions on cortisol levels. Models controlled for income, education (continuous variables), and smoking status. Multivariate logistic regression models examined the odds of smoking relapse at the one-month follow-up by race/ethnicity, while controlling for (1) demographic covariates and (2) demographic covariates and baseline cortisol slope.
30. Huang et al., 2020. Taiwan, China	salivary cortisol data of 6 hepatocellular carcinoma patients were incomplete because the participants had forgotten to collect their saliva at certain time points.	NR	NR	NR		t tests to assess the difference in mean cortisol levels at each time point between the subgroups, GEE
31. Huynh et al., 2016. United States	Adolescents provided three days of cortisol samples on different days of the week. Only weekday samples were included in the analyses	Samples with cortisol values over 60 (n = 14) were removed. Morning samples in which participants reported more than 30 min between sample 1 and sample 2 (n = 12) or more than 60 min between collecting sample 1 and sample 3 (n = 10) for a particular day were flagged. Analyses excluding these cases did not change the results, therefore these samples were not excluded from the final analyses. The description above does not make clear whether the removed values were treated as impossible values or as outliers.	NR	NR	log transformation	multiple regression
32. Jakuszkowiak-Wojten et al., 2016. Poland	Six subjects delivered incomplete sets of saliva samples and were excluded from the analysis	NR	NR	NR	raw data	Chi square; Pearson correlation
33. Johnson et al., 2020. Canada	NR	Cortisol values greater than 4 standard deviations above the sample mean for that timepoint were removed	The variables used in the analysis were examined for missing data using the MissMech package in R. The pattern of missing data as well as a non-significant Little's MCAR (missing completely at random) tests indicated that there was not enough evidence to reject the MCAR assumptions. Missing data were imputed using a multiple imputation with predictive mean matching method in the MICE package	Cortisol values greater than 4 standard deviations above the sample mean for that timepoint were removed.	To adjust for the non-normal distributions of the raw cortisol values, all values were transformed using a natural log transformation and the transformed values were used for all analyses	multilevel structural equation modeling framework
34. Keefe et al., 2018. United States	The average subject had 94.3% of pre-treatment measurements completed (mean = 11.3), and 92.8% of post-treatment measurements completed (mean = 11.1).	NR	All collected awakening and post-awakening measurements were used in the model, under the assumption that any given unobserved measurement was missing at random	NR	log transformation (log base 10)	mixed model
35. Kristiansen et al., 2020. Sweden	Only if there was a congruency between either exact time entries in the diary or event entries in the ECG with the movement pattern and increased heart rate (indicating awakening) were the morning samples included in the analysis. Based on this strict selection, 83% of the patients had acceptable cortisol samples and were included in the analysis (167 out of 201 individuals). Individuals with diabetes had a lower rate of successful sampling than controls (80% versus 88%), mostly due to low glucose levels in the morning that impeded cortisol sampling in some cases. Children had a lower rate of successful sampling than adults (80% versus 91%).	NR	NR	NR	log transformation (natural log)	Mann–Whitney U test
36. Labad et al., 2018. Spain	NR	NR	NR	NR	Cortisol values were transformed to approximate a normal distribution, as suggested by recent expert consensus guidelines. The following power transformation was used: X' = (X^0.26 − 1)/0.26	Pearson correlations (and Spearman correlations, when needed), GLM, Three separate multiple regression analyses
37. Landau et al., 2021. Australia	Consecutive morning saliva samples were averaged to create average Cortmorn and average CRPmorn values; evening saliva samples were calculated the same to create average Corteve and average CRPeve values. Morning Cort:CRP ratio (Cort:CRPmorn) was calculated by dividing untransformed Cortmorn values by untransformed CRPmorn values, and evening Cort:CRP ratio (Cort:CRPeve) was calculated in the same manner with Corteve and CRPeve values. Diurnal cortisol slopes were calculated by taking the difference between natural-log transformed Cortmorn and Corteve values divided by time between sample collection. Saliva data outliers (n = 5 at T1 and n = 4 at T3) were winsorized to 0.01 μg/dL for cortisol values and 0.01 pg/mL for CRP values.	NR	Out of the 122 participants in the intention-to-treat sample at T1, a total of 107 participants (87.7% of the total sample) provided full or partial T3 (follow-up) data. Little's Missing Completely at Random (MCAR) tests were used to test for patterns of missingness in the data prior to imputation. Little's MCAR results indicated non-significance (statistics not shown), suggesting MCAR and acceptability of multiple imputation. Multiple imputation was performed on the entire dataset using the ‘Multiple Imputation by Chained Equations’ (mice) package in RStudio with all variables included in the present study. Predictive mean matching imputation, considered more robust for use with non-normal data, was used for quantitative continuous data (e.g., saliva, questionnaires), and logistic regression was used for categorical data. Percentage of variables missing and other missingness assumptions are presented in Supplemental Table 1.	Outliers > ±3 standard deviations (SD) above/below the mean were investigated by log-transforming the values (ref. to Landau 2019). Saliva data outliers (n = 5 at T1 and n = 4 at T3) were winsorized to 0.01 μg/dL for cortisol values. Outliers for questionnaire variables were not adjusted (as in Blake et al., 2016, 2017a, 2017b, 2018) because research has shown psychological variables are typically positively skewed in non-clinical populations with outliers to be expected due to the self-report nature of these measures.	raw data and log transformation (natural log)	Simple regression analyses; a series of analyses of covariance (ANCOVA); a series of multivariate linear and logistic regression analyses
38. Laures-Gore et al., 2019. United States	NR	NR	NR	NR	raw data	Repeated measures ANOVA
39. Liu et al., 2017. United States	A saliva sample was invalid if: 1) the caregiver was awake for less than 12 hr or greater than 20 hr (n = 14), or 2) the caregiver woke up after 12 pm (n = 0), or 3) for cortisol assay specifically, there was a greater than 10 nmol/L rise between the second (30 min after getting out of bed) and third sample (before lunch) (n = 11), or 4) the recorded collection time between the first (upon wakeup) and second sample (30 min after getting out of bed) was either less than 15 min or greater than 60 min (n = 99).	NR	NR	NR	raw data	growth curve models
40. Mitchell et al. (2020). United States	NR	NR	Not specified; only mentions including those who provided complete data.	NR	descriptive	Hierarchical general linear modeling
41. Morgan et al. (2017). United States	NR	NR	NR	NR	Cortisol modeling: Y_ij = f(t_ij) + α_i + ϵ_ij, where Y_ij is the log-transformed cortisol value for the jth sample from the ith respondent; t_ij is the time at which the sample was taken; and α_i is a respondent-level deviation from the mean with distribution N(0, σ_α^2). The error term ϵ_ij is assumed to be independent with distribution N(0, σ^2). Log-transformed average cortisol levels.	Unadjusted and adjusted multiple linear regression. These models were fit using the survey weights distributed with the data set that account for differential probabilities of selection and differential nonresponse. Design-based standard errors were obtained using the linearization method [46] as implemented in the Stata statistical software package version 13.1 [47].
42. Otto et al. (2018). United States	Days were excluded from the calculation of the cortisol indices if (1) saliva collection time stamps were missing, (2) the participant woke up after 12 pm, (3) the participant was awake <12 h or >20 h, or (4) if there was an indication of non-compliance with the saliva collection protocol such that <15 or >60 min elapsed between the first two measurements (Stawski, Cichy, Piazza and Almeida, 2013). The analytic sample sizes were 46 participants for DCS and 43 participants for CAR and AUCg.	NR	NR	NR	raw data	linear regression
43. Pace et al. (2021). United States	Success was defined as obtaining biomarker data from ≥85% of samples per protocol. Saliva concentrations of cortisol were averaged across collection days in morning, afternoon, or evening because an effect of day was not expected; We first examined biomarker and HRQOL variables by computing means and their standard errors by biomarker and time point (for cortisol only).	NR	Not specified; only mentioned that 96% and 92% of saliva samples were collected from survivors and caregivers	NR	Data that were not normally distributed (Shapiro–Wilk test) were natural log transformed before any inferential testing	Examined the association between biomarker variables (CRP, AM cortisol, PM cortisol, and cortisol slope) and HRQOL domains by computing partial and semi partial correlation coefficients controlling for body mass index (BMI) and chemotherapy treatment (survivors) and Pearson product-moment correlation coefficients (caregivers). A Spearman's rank correlation coefficient was computed instead for associations where one or both outcomes were not normally distributed.
44. Ramos-Quiroga et al. (2016). Spain	NR	NR	NR	NR	Because the distribution of cortisol values was positively skewed, these data have been base-10 logarithmically transformed prior to any further analyses.	Chi-square test (χ2); repeated measures ANCOVA; Spearman-Rho correlations
45. Rosnick et al. (2016). United States	NR	NR	NR	NR	NR	GEE analysis was conducted to examine the between treatment group difference in peak cortisol change over time from pre- to post-augmentation.
46. Sampedro-Piquero et al. (2020). Spain	NR	NR	NR	NR	raw data	RM ANOVA and MANOVA, Pearson correlation
47. Schreier and Chen, 2017. United States	Cortisol data were unavailable for 17 adolescents who did not return useable samples. These adolescents did not differ from participants who returned useable samples with respect to age, BMI, chronic and acute stress ratings, ethnicity, and family income (ps > 0.10) but were more likely to be female (χ2 (1) = 6.184, p = .013). On average, adolescents completed 5.47 (±1.03) out of the 6 days.	NR	NR	NR	log transformation	hierarchical multiple regression analyses
48. Schuler et al., 2017. United States	Before testing hypotheses, cortisol data were inspected for outliers.	NR	NR	Four criteria were used to identify outliers, namely, (1) standardized cortisol values were bigger than three standard deviations from the mean; (2) adolescent participants were ill on a given sampling day (e.g., any illness symptoms indicated in the diary); (3) blood contamination (e.g., from cuts in the mouth); and (4) saliva samples deemed to be collected nonadherent to sampling instructions (i.e., participants ate or drank before collecting saliva samples or saliva samples were collected outside the instructed time)	raw data	a hierarchical multiple regression
49. Seidenfaden et al. (2017). Denmark	NR	NR	For series of samples with more than one sample missing, the AUC was not computed. If only one sample was missing, values were replaced by the mean of the two adjacent values, or, if the missing value were either the awakening or 11 pm sample, by the mean of the full sample for that time point.	Before computations, extreme values in each group for each time point (outside the 99th percentile) were excluded (30 out of a total of 658 determinations).	NR	repeated measures ANOVA
50. Sin et al. (2017). United States	NR	NR	Models were estimated using full information maximum likelihood estimation in SAS 9.4 PROC MIXED, which makes use of all available data in the estimation of parameters and can flexibly handle missing data	cortisol samples were excluded where the cortisol level was >60 nmol/L (1.46%), the time stamp was missing (1.28%), or the lunch sample was ≥10 nmol/L more than the 30-min post-waking sample (suggesting that participants ate before collecting their saliva, 1.82%). Further, cortisol samples were excluded from days when participants woke before 4 a.m. (3.14%) or after 12 pm (0.67%), or days when <15 or >60 min elapsed between the first two samples (indicators of noncompliance that influence assessment of the awakening response, 9.74%).	log transformation (natural log)	Multilevel modeling
51. Starr et al. (2017). United States	Of the original sample of 241, 12 were excluded from cortisol procedures for medical reasons, and 18 declined to participate in cortisol procedures or failed to return samples, leaving 211 participants with samples that were assayed. Careful measures were taken to exclude values that might not accurately represent the CAR.	NR	Cortisol values at each sampling time were winsorized to correct for extreme outliers (>3 SD; 5 data points for waking, 2 for +30 min, and 5 for +60 min)	Both variables were winsorized to 3 SD to correct for outliers	NR	Moderation analysis, linear regression
52. Strahler and Nater, 2018. Germany	NR	NR	NR	NR	NR	Hierarchical linear models
53. Tada, 2018. Japan	NR	NR	NR	NR	NR	Baseline data on POMS-SF and salivary biomarkers of both groups were compared using the Mann–Whitney U test. Wilcoxon signed-rank tests were used to compare differences in the groups' scores at baseline and 6-month follow-up. Correlations between changes in cortisol level and in POMS-SF “fatigue” score were assessed using Pearson correlation coefficients
54. Urizar et al. (2021). United States	Averaging the cortisol values across the two saliva collection days at each study time point.	no impossible values based on no outliers	Missing cortisol samples for a particular collection day were estimated by using the participant's second day sample for that timepoint.	No cortisol outliers (defined as being three standard deviations from the mean for each cortisol index) were identified in the current investigation	log transformation (log base 10)	Pearson correlation, mixed effect linear model
55. Walls et al., 2020. United States	Single Sample Values were examined for possible measurement error	NR	NR	Single Sample Values were examined for possible measurement error and any outlier values that required deeper examination; We also performed separate t-tests to examine the influence of the largest discrepancies (i.e. outliers and extreme cases) on cortisol indices.	raw data	Pearson correlation, t-test
56. Wong and Shobo, 2017. United States	A set of criteria was used to determine the analytic sample. 235 did not provide saliva samples and were dropped. Individuals who did not follow the cortisol collection procedures (n = 10) and those who did not provide complete data on medication use (n = 79) were dropped.	Following the Winsorization statistical approach (Dixon and Yuen, 1974), salivary cortisol values higher than 60 nmol/L were recoded as 61 to minimize the influence of extreme outliers.	NR	Following the Winsorization statistical approach (Dixon and Yuen, 1974), salivary cortisol values higher than 60 nmol/L were recoded as 61 to minimize the influence of extreme outliers.	log transformation	Two-level multilevel models
57. Yu et al. (2016). Netherlands	All samples were checked for correctness of sampling. Cases were excluded from analyses if the cortisol data were of incorrect sampling time, unclear how it was sampled (i.e., not registered), contaminated (e.g., by smoking or brushing teeth), or of extreme values (i.e., >3 SD from average)	NR	Reported attrition and Little's MCAR test; applied Full Information Maximum Likelihood (FIML) in Mplus for the model estimations	NR	raw data	Multiple regression models incorporating latent growth models
58. Yu et al. (2019). Netherlands	NR	analyses of all variables used in this study revealed a normed χ² (χ²/df) of 1.04, which indicates that the pattern of the missing data was not materially different from a missing completely at random pattern	NR	NR	raw data	mixed model

Notes. NR: not reported.

GEE: generalized estimating equations; ANOVA: analysis of variance.
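
Several of the cleaning steps reported in Table 3 recur across studies: setting cortisol values above 60 nmol/L to missing, winsorizing values beyond 3 SD of the mean, and applying a natural-log or power transformation. The sketch below shows one possible way to express these steps in Python using only the standard library. It is illustrative and is not taken from any of the included studies; the function names, the use of `None` for missing values, and the order of the steps are assumptions, while the 60 nmol/L ceiling, the 3 SD limit, and the exponent 0.26 are the values quoted in the table.

```python
import math
import statistics

# Ceiling used by several studies in Table 3 for implausible values (nmol/L).
IMPOSSIBLE_NMOL_L = 60.0


def drop_impossible(values, ceiling=IMPOSSIBLE_NMOL_L):
    """Set values above the ceiling to None (treated as missing)."""
    return [v if v is not None and v <= ceiling else None for v in values]


def winsorize_sd(values, n_sd=3.0):
    """Clip observed values to mean +/- n_sd sample SDs; None stays None."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    low, high = mean - n_sd * sd, mean + n_sd * sd
    return [None if v is None else min(max(v, low), high) for v in values]


def natural_log(values):
    """Natural-log transform of positive values; None stays None."""
    return [None if v is None else math.log(v) for v in values]


def power_transform(values, lam=0.26):
    """Power transformation X' = (X^lam - 1) / lam; None stays None."""
    return [None if v is None else (v ** lam - 1.0) / lam for v in values]


if __name__ == "__main__":
    # Hypothetical cortisol values in nmol/L, invented for illustration only.
    raw = [10.0] * 20 + [55.0, 75.0]
    cleaned = winsorize_sd(drop_impossible(raw))
    print(natural_log(cleaned)[-3:])
```

Note that the 3 SD rule depends on sample size: in very small samples no observation can lie more than 3 sample SDs from the mean, so the clipping step has no effect there.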