2022 Oct 31;12:18311. doi: 10.1038/s41598-022-22709-9

Table 1.

Study design overview.

Question	Hypothesis (if applicable)	Sampling Plan (e.g., power analysis)	Analysis Plan	Interpretation given to different outcomes
RQ1. Is the judgement of trustworthiness of singular facial parts (eyes/mid-face/mouth) different from the judgement of trustworthiness of whole faces?	H0: Trustworthiness judgements of singular facial parts are not different from trustworthiness judgements of the whole face. H1: Trustworthiness judgements of singular facial parts are different from trustworthiness judgements of the whole face.	α = 5%, minimum power = 99%, two-sided; ICC = 0.30; 15% non-response and dropout rate; needed sample size is N = 2276 raters⁵³	Random-effects multilevel model: L1 (within-person) = ratings of different faces/facial parts; L2 (between-person) = raters	If trustworthiness judgements of one or more facial parts do not significantly differ (p ≥ 0.05) from the whole face (i.e., reference category), then this/these facial part/s is/are primarily responsible for the trustworthiness judgements of faces.
RQ2A. Do humans judge the trustworthiness of faces/facial parts of the stimuli from their own ethnicity differently compared to stimuli from other ethnicities?	H0: Humans do not judge the trustworthiness of faces/facial parts of the stimuli from their own ethnicity differently compared to stimuli from other ethnicities. H1: Humans judge the trustworthiness of faces/facial parts of the stimuli from their own ethnicity differently compared to stimuli from other ethnicities.	α = 5%, minimum power = 99%, two-sided; ICC = 0.30; 15% non-response and dropout rate; needed sample size is N = 2276 raters⁵³	Random-effects multilevel model: L1 (within-person) = ratings of different faces/facial parts; L2 (between-person) = raters	If one of the rater ethnicities is significant (p < 0.05), this means that raters judge the trustworthiness of whole faces/facial parts differently depending on whether the target’s ethnicity matches/mismatches their own ethnicity. If none of the rater ethnicities is significant (p ≥ 0.05), this means that raters judge the trustworthiness of whole faces/facial parts independently of whether the target’s ethnicity matches/mismatches their own ethnicity.
RQ2B. Do humans judge the trustworthiness of faces/facial parts of stimuli from the dominant ethnicity of their social environment differently compared to stimuli from other ethnicities?	H0: Humans do not judge the trustworthiness of faces/facial parts of stimuli from the dominant ethnicity of their social environment differently compared to stimuli from other ethnicities. H1: Humans judge the trustworthiness of faces/facial parts of stimuli from the dominant ethnicity of their social environment differently compared to stimuli from other ethnicities.	α = 5%, minimum power = 99%, two-sided; ICC = 0.30; 15% non-response and dropout rate; needed sample size is N = 2276 raters⁵³	Random-effects multilevel model: L1 (within-person) = ratings of different faces/facial parts; L2 (between-person) = raters	If the rater’s dominant ambient ethnicity is significant (p < 0.05), this means there is a difference between the rater’s dominant ambient ethnicity and other ethnicities regarding the judgements of trustworthiness of whole faces and facial parts. If the rater’s dominant ambient ethnicity is not significant (p ≥ 0.05), this means there is no difference between the rater’s dominant ambient ethnicity and other ethnicities regarding the judgements of trustworthiness of whole faces and facial parts.
EA1: target sex	Not applicable	α = 5%, minimum power = 99%, two-sided; ICC = 0.30; 15% non-response and dropout rate; needed sample size is N = 2276 raters⁵³	Random-effects multilevel model: L1 (within-person) = ratings of different faces/facial parts; L2 (between-person) = raters
EA2: rater sex, eye color, and hair color	Not applicable	α = 5%, minimum power = 99%, two-sided; ICC = 0.30; 15% non-response and dropout rate; needed sample size is N = 2276 raters⁵³	Random-effects multilevel model: L1 (within-person) = ratings of different faces/facial parts; L2 (between-person) = raters
EA3: difficulty of rating the stimuli of the full faces, eye parts, mid-face parts, and mouth parts	Not applicable	α = 5%, minimum power = 99%, two-sided; ICC = 0.30; 15% non-response and dropout rate; needed sample size is N = 2276 raters⁵³	Random-effects multilevel model: L1 (within-person) = ratings of different faces/facial parts; L2 (between-person) = raters
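
The two-level structure named in the Analysis Plan column (ratings at L1 nested within raters at L2) and the assumed ICC of 0.30 can be illustrated with a small simulation. The sketch below is not the study's analysis or data: the rater and stimulus counts, the function names, and the unit total variance are hypothetical choices made for illustration, and a one-way ANOVA estimator stands in for a fitted multilevel model.

```python
import random
from statistics import mean


def simulate_ratings(n_raters, n_stimuli, icc, seed=0):
    """Simulate ratings nested in raters, with total variance fixed at 1.

    Hypothetical setup: each rater gets a random intercept (L2) and each
    rating adds independent residual noise (L1).
    """
    rng = random.Random(seed)
    sd_between = icc ** 0.5          # L2: between-rater standard deviation
    sd_within = (1.0 - icc) ** 0.5   # L1: within-rater standard deviation
    data = []
    for _ in range(n_raters):
        intercept = rng.gauss(0.0, sd_between)
        data.append([intercept + rng.gauss(0.0, sd_within)
                     for _ in range(n_stimuli)])
    return data


def estimate_icc(data):
    """One-way ANOVA estimator of the ICC for balanced data (rows = raters)."""
    k = len(data)        # number of raters (L2 units)
    m = len(data[0])     # ratings per rater (L1 units)
    grand = mean(x for row in data for x in row)
    row_means = [mean(row) for row in data]
    ms_between = m * sum((rm - grand) ** 2 for rm in row_means) / (k - 1)
    ms_within = sum((x - rm) ** 2
                    for row, rm in zip(data, row_means)
                    for x in row) / (k * (m - 1))
    return (ms_between - ms_within) / (ms_between + (m - 1) * ms_within)


# Hypothetical sizes, chosen only to keep the example fast.
ratings = simulate_ratings(n_raters=500, n_stimuli=20, icc=0.30)
icc_hat = estimate_icc(ratings)
print(f"estimated ICC: {icc_hat:.2f}")
```

An ICC of 0.30 means that about 30% of the variance in ratings lies between raters rather than within them, which is why the ratings cannot be treated as independent observations and a multilevel model is specified.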