ABSTRACT
Background
A key outcome in aesthetic treatments is the patient's view of how their skin looks and feels after a treatment.
Objectives
The aim of this study was to add a Treatment Outcome scale to the SKIN-Q patient-reported outcome measure.
Methods
Concept elicitation interviews were performed with patients in Canada and the United States. Data were coded, analyzed, and used to draft a Treatment Outcome scale. The scale was refined with patient and expert feedback and field tested in an online sample (ie, Prolific). Psychometric analyses were performed to examine reliability and validity.
Results
The concept elicitation interviews included 26 participants. The first draft of the Treatment Outcome scale included 32 items that assessed changes in appearance (eg, look better) and well-being (eg, feel more confident). Items were revised with input from 12 experts, 11 patients, and 174 online participants who had aesthetic face and/or body treatments and provided 180 survey responses, resulting in 36 items. Prolific data from 499 participants provided 542 assessments. The sample comprised 80.6% women; 78.8% had a facial treatment, 11.4% had a body treatment, and 9.8% had both. Data for a final 10-item Treatment Outcome scale fit the Rasch model (chi-square = 50.46, df = 40, P = .124). The scale evidenced high reliability, with person separation index, Cronbach α, and intraclass correlation coefficient values ≥.87. A total of 19 out of 22 (86%) predefined construct validation hypotheses were accepted.
Conclusions
This new SKIN-Q scale can be used alongside other patient-centered outcome tools to measure outcomes of minimally invasive aesthetic treatments.
Level of Evidence: 5 (Diagnostic)
In aesthetics, there is a growing interest in minimally invasive surgical techniques to enhance skin appearance and quality. According to the 2023 statistics from the American Society of Plastic Surgery, the use of these treatments has increased by 7% from the previous year, with 25 million procedures performed annually.1 Treatments such as platelet-rich plasma injections, autologous fat transfers, and various collagen induction therapies are reported to have regenerative effects on the skin by improving vascularity and promoting collagen and elastin formation.2-5
However, research designed to evaluate the effectiveness of minimally invasive interventions has been limited due to a paucity of validated and reliable patient-reported outcome measures (PROMs) designed to assess subjective changes in skin appearance and quality. In clinical practice and in research trials, measurement from the perspective of patients is paramount. Objective outcome assessments and clinician-reported outcome measures may correlate with the experiences of patients but cannot replace their unique perspective.6,7
The SKIN-Q was developed to address this limitation. The SKIN-Q is a novel PROM composed of item libraries that measure satisfaction with how the skin looks and feels. SKIN-Q items cover the 4 key domains of skin quality described by Goldie et al: tone evenness, skin surface evenness, skin firmness, and skin glow.8 The SKIN-Q can be used as an outcomes tool in studies of minimally invasive aesthetic treatments. The item libraries can be used in full or as customized short-form scales to maximize content validity while minimizing respondent burden. The SKIN-Q was developed following best-practice guidelines9-12 and is described in detail elsewhere.13
The aim of our study was to add a new scale to the SKIN-Q to provide a means to measure what patients think of the outcome of minimally invasive aesthetic treatments. A useful reference for this is FACE-Q Aesthetics, where one of the most widely used scales is the Treatment Outcome scale,14 which measures how satisfied people are with the outcome of their treatment (eg, Pleased with the result; The result was just as I expected).15 However, FACE-Q Aesthetics was developed specifically for facial treatments and does not provide a means to evaluate satisfaction with aesthetic treatments of the body.
METHODS
This mixed-methods study followed international guidelines for PROM development and validation.9-12 We obtained ethics board approval (number 13603) from the Hamilton Integrated Ethics Board of McMaster University, Canada. Prior to data collection, participants gave their informed consent.
Figure 1 shows the study methodology. The reader is referred elsewhere for a detailed description of the methods used to develop the SKIN-Q, including the concept elicitation interview guide.13
Figure 1.
Methods flow diagram. CTT, classical test theory; DIF, differential item functioning; ICC, intraclass correlation coefficient; PCA, principal component analysis; PSI, person separation index; TRT, test-retest.
Content Validation
Between October 2021 and March 2022, participants for concept elicitation interviews were recruited in Canada (3 sites) and the United States (3 sites). Clinic staff were asked to include people who varied by age, gender, race, and type of minimally invasive treatment. Interviews followed the SKIN-Q interview guide13 and were audio-recorded, transcribed verbatim, and coded by 2 interviewers who reached a consensus on their codes. Constant comparison was used to refine the codes. Interviews continued until saturation of concepts elicited was reached. We provided each participant with a US$100 gift card to thank them for their participation in the study.
Outcome concepts related to appearance and psychological well-being were formed into items for the Treatment Outcome scale and refined with feedback from the concept elicitation participants through a REDCap (Vanderbilt University, Nashville, TN) survey and cognitive interviews. Participants who provided feedback received a US$30 gift card. Participants who completed the REDCap survey and participated in a cognitive interview received an additional US$70 gift card. Experts in aesthetics and aesthetics industry representatives were invited by email to provide feedback.
In December 2022, we further studied content validity in a sample of Prolific (www.prolific.com) participants, who were paid the equivalent of UK£10.80 per hour for completing our survey. We included participants who had received a treatment listed in Supplemental Table 1 at a plastic surgery or dermatology clinic in the past 12 months and who did not choose "none" or "other" for the treatment type or "other" for the location of their body treatment. Participants answered questions designed to determine whether they understood the SKIN-Q items and whether the items were relevant to them.
In February 2023, the Prolific sample completed a pilot field test of the SKIN-Q survey. Following the pilot field test, another screening survey in Prolific using the same criteria (Supplemental Table 1) identified a new sample. Testing of the Treatment Outcome scale was part of a larger study designed to create and test the SKIN-Q13 and new FACE-Q scales.16,17 Given the survey's length, it was divided into 2 parts separated by several days, with the Treatment Outcome scale included in Part 2, which was sent to those who agreed to continue the study. Final data collection involved a test-retest study 7 to 14 days after the initial survey.
The pilot and field-test data were merged and analyzed using Rasch measurement theory analysis.18,19 RUMM2030 software20 (RUMM Laboratory Pty Ltd, Duncraig, Australia) was used with the unrestricted Rasch model for polytomous data. Rasch measurement theory analysis was used to identify the best subset of items to retain for the Treatment Outcome scale based on a set of psychometric assessments, including classical test theory (Tables 1, 2).19-27
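To make the polytomous Rasch model more concrete, the sketch below computes the probability of each response category for a single item under the unrestricted (partial credit) parameterization, in which each item has its own thresholds, and checks whether those thresholds are ordered, the first test in Table 1. It is a minimal illustration, not the RUMM2030 implementation; the function names and the example item (4 response options with thresholds at −1.5, 0, and +1.5 logits) are hypothetical.

```python
import math
from typing import Sequence


def category_probabilities(theta: float, delta: float, taus: Sequence[float]) -> list[float]:
    """Probability of scoring 0..m on one item under a polytomous Rasch model.

    theta: person location (logits); delta: item location (logits);
    taus: threshold offsets tau_1..tau_m relative to delta (hypothetical values).
    """
    # Score x has numerator exp(sum_{k<=x} (theta - delta - tau_k)); score 0 has an empty sum.
    exponents = [0.0]
    running = 0.0
    for tau in taus:
        running += theta - delta - tau
        exponents.append(running)
    # Subtract the largest exponent before exponentiating for numerical stability.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(taus: Sequence[float]) -> bool:
    """True if every threshold lies strictly above the one before it."""
    return all(lower < upper for lower, upper in zip(taus, taus[1:]))


# Hypothetical item with 4 response options (3 thresholds).
example_taus = [-1.5, 0.0, 1.5]
probs_at_centre = category_probabilities(theta=0.0, delta=0.0, taus=example_taus)
probs_at_first_threshold = category_probabilities(theta=-1.5, delta=0.0, taus=example_taus)
```

At a person location equal to the first threshold (theta = delta + tau_1), scoring 0 and scoring 1 are equally likely, which is what a threshold means in this model. Disordered thresholds, where a higher response option is never the most likely at any location, would show up as `thresholds_ordered` returning False.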
Table 1.
Psychometric tests performed
| Test | Description |
|---|---|
| Thresholds for item response options | Examines whether the item response options are ordered on a continuum (eg, a score of 1 should be lower than scores of 2 and higher). The item hierarchy sorts items into order from easiest to hardest to endorse. |
| Item fit | Examines the extent to which observed values fit the expectations of the Rasch model. Item fit is assessed with fit residuals and chi-square statistics. Fit residuals summarize the difference between observed and expected responses to an item across the sample and should ideally lie between −2.5 and +2.5. Chi-square values summarize the difference between observed and expected responses to an item for subgroups in the sample (class intervals) and should be nonsignificant after Bonferroni adjustment for multiple testing. Item characteristic curves can be viewed graphically.19 The sample was adjusted to 500 for tests of statistical fit. |
| Local dependency | Residual correlations were examined to identify any that were >0.20 above the average residual correlation. Items deemed locally dependent were combined into subtests to determine the impact on scale reliability.21 |
| Scale-to-sample targeting | Inspects the spread of person locations and item locations. A scale that is better targeted has more coverage with the mean person location close to the center of the items.22 |
| Differential item functioning (DIF) | Examines if items are invariant across subgroups—age (ie, 20-30, 31-40, >40 years), gender, skin location (body, face)—in the sample. This test uses analysis of variance to examine estimated person ability differences between class intervals within subgroups. A significant difference between subgroups identifies potential DIF. If DIF was identified, variables were split for the relevant items, with both original and split person locations correlated to examine the impact of DIF on scale scoring.23 |
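| Reliability | Reliability statistics range from 0 to 1, with higher scores indicating greater reliability. Scores should be >0.70.24,25 Three types of reliability were examined: person separation index (PSI), Cronbach α, and test-retest reliability using the intraclass correlation coefficient (ICC). |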
| Unidimensionality | Scale items were included in a principal component analysis in SPSS with the hypothesis that all items would load onto a single factor and that factor loadings would be >0.70, to support that each item represents part of the scale's latent variable.27 |
| Construct Validity | Examines the extent to which the scale measures what it purports to measure. To examine construct validity, FACE-Q Aesthetics14-17 and SKIN-Q13 scales were included. |
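Two of the Table 1 checks reduce to short formulas. The sketch below computes Cronbach α from a persons × items matrix of item scores and flags item pairs whose residual correlation exceeds the average residual correlation by more than 0.20, the local dependency criterion described in Table 1. It assumes the residual correlation matrix has already been exported from the Rasch software and that the average is taken over all off-diagonal pairs; the function names and example values are hypothetical.

```python
import statistics
from itertools import combinations
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a persons x items matrix of item scores."""
    k = len(responses[0])  # number of items
    item_vars = [statistics.variance([row[i] for row in responses]) for i in range(k)]
    total_var = statistics.variance([sum(row) for row in responses])
    # alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def flag_local_dependency(
    resid_corr: Sequence[Sequence[float]], margin: float = 0.20
) -> list[tuple[int, int]]:
    """Item pairs (0-based) whose residual correlation is > margin above the mean off-diagonal value."""
    pairs = list(combinations(range(len(resid_corr)), 2))
    mean_r = statistics.mean(resid_corr[i][j] for i, j in pairs)
    return [(i, j) for i, j in pairs if resid_corr[i][j] > mean_r + margin]


# Hypothetical residual correlations for 3 items; items 1 and 2 look locally dependent.
example_resid = [
    [1.0, -0.1, -0.1],
    [-0.1, 1.0, 0.33],
    [-0.1, 0.33, 1.0],
]
flagged_pairs = flag_local_dependency(example_resid)
```

A flagged pair would then be combined into a subtest, as described above, to see how much reliability drops when the dependency is absorbed.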
Table 2.
Predefined Hypotheses for Testing Construct Validity
| Hypotheses | Accepted |
|---|---|
| Overall | |
| Women will score higher than men | Y |
| Those who think they look younger or the same as their age will score higher than those who think they look older on the FACE-Q Age Visual Analogue Scale | Y |
| Those whose treatment has worn off will score lower | Y |
| Scores will incrementally decrease as participants report being more bothered by skin laxity | N |
| Scores will incrementally increase as participants report being more satisfied with how the body/face looks today compared to before treatment | Y |
| Scores will be negligibly correlated (<0.3) with age | Y |
| Face only | |
| Scores will incrementally increase as participants report being more satisfied with facial appearance | Y |
| Those who plan to have another facial cosmetic treatment in the future will score higher than those who were unsure or who do not want future treatment | Y |
| Scores will incrementally increase as participants report being more satisfied with how their facial skin looks now compared to before treatment | Y |
| Scores will incrementally increase as participants report being more satisfied with how their facial skin feels now compared to before treatment | Y |
| Scores will incrementally increase as participants report that their skin looks more natural | Y |
| Scores will incrementally increase as participants report that their skin feels more natural | Y |
| Convergent validity with other PROM scales | |
| Related, similar construct (>0.5) | |
| FACE-Q Outcome | Y |
| FACE-Q Decision | Y |
| Related but dissimilar construct (0.3-0.5) | |
| SKIN-Q Looks | Y |
| SKIN-Q Feels | Y |
| FACE-Q Face Overall | Y |
| FACE-Q Lines Overall | N |
| FACE-Q Aging Appraisal | N |
| FACE-Q Psychological | Y |
| FACE-Q Social | Y |
| FACE-Q Appearance Distress | Y |
| Total number of hypotheses accepted | 19 |
| Total number of hypotheses tested | 22 |
| Percent accepted | 86% |
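The acceptance rule behind Table 2 can be sketched in a few lines. The correlation bands (>0.5 for related, similar constructs; 0.3-0.5 for related but dissimilar constructs; <0.3 for a negligible correlation) come from the table, and the 75% benchmark is the COSMIN threshold cited in the Results. The band names, the use of an absolute value for "negligible," and the inclusive 0.3 and 0.5 boundaries are assumptions made for illustration.

```python
# Percent of hypotheses that should be accepted to support construct validity.
COSMIN_THRESHOLD = 75.0


def correlation_band_met(r: float, band: str) -> bool:
    """Check a correlation against a hypothesized band (hypothetical band labels)."""
    if band == "similar":
        return r > 0.5
    if band == "related_dissimilar":
        return 0.3 <= r <= 0.5
    if band == "negligible":
        return abs(r) < 0.3
    raise ValueError(f"unknown band: {band}")


def percent_accepted(outcomes: list[bool]) -> float:
    """Share of hypotheses accepted, as a percentage."""
    return 100 * sum(outcomes) / len(outcomes)


# Counts taken from Table 2: 19 accepted, 3 not accepted.
table2_outcomes = [True] * 19 + [False] * 3
pct = percent_accepted(table2_outcomes)
supports_validity = pct >= COSMIN_THRESHOLD
```

With the Table 2 counts, 19 of 22 gives about 86.4%, which rounds to the reported 86% and clears the 75% benchmark.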
RESULTS
Participant and treatment characteristics for the qualitative, cognitive, and psychometric samples are shown in Tables 3 and 4. The qualitative sample of 26 participants comprised 23 women and 3 men aged 24 to 76 years (mean [standard deviation], 46.9 [13.8] years). The cognitive sample of 174 participants comprised 142 women, 29 men, and 3 people who identified as gender diverse, aged 20 to 73 years (mean, 40.5 [13.5] years). The psychometric sample of 499 participants comprised 402 women, 89 men, 6 gender-diverse participants, and 2 who preferred not to answer the gender question, aged 20 to 85 years (mean, 38.2 [12.5] years). Most participants in the psychometric sample were White, were married or living common law, held a college, trade, or university degree, and lived in the United States.
Table 3.
Participant Characteristics
| Characteristic | Category | Qualitative sample (N = 26), n | Prolific cognitive sample (N = 174), n | % | Prolific psychometric sample (N = 499), n | % |
|---|---|---|---|---|---|---|
| Treatment location | Body | 0 | 45 | 25.9 | 57 | 11.4 |
| | Face | 26 | 123 | 70.7 | 393 | 78.8 |
| | Face and body | 6 | 6 | 3.4 | 49 | 9.8 |
| Country | Canada | 6 | 31 | 17.8 | 82 | 16.4 |
| | USA | 20 | 143 | 82.2 | 415 | 83.2 |
| | Missing | 0 | 0 | 0 | 2 | 0.4 |
| Age (years) | 20-29 | 3 | 44 | 25.3 | 142 | 28.5 |
| | 30-39 | 6 | 46 | 26.4 | 169 | 33.9 |
| | 40-49 | 7 | 30 | 17.2 | 85 | 17.0 |
| | 50-59 | 6 | 35 | 20.1 | 64 | 12.8 |
| | ≥60 | 4 | 19 | 10.9 | 39 | 7.8 |
| Gender | Woman | 23 | 142 | 81.6 | 402 | 80.6 |
| | Man | 3 | 29 | 16.7 | 89 | 17.8 |
| | Gender diverse | 0 | 3 | 1.7 | 6 | 1.2 |
| | Prefer to not answer | 0 | 0 | 0 | 2 | 0.4 |
| Race | White | 22 | 127 | 73.0 | 340 | 68.1 |
| | Black | 2 | 15 | 8.6 | 35 | 7.0 |
| | Latin American | 0 | 15 | 8.6 | 22 | 4.4 |
| | East Asian | 0 | 12 | 6.9 | 31 | 6.2 |
| | Middle Eastern | 0 | 5 | 2.9 | 8 | 1.6 |
| | South Asian | 1 | 4 | 2.3 | 13 | 2.6 |
| | Southeast Asian | 1 | 4 | 2.3 | 4 | 0.8 |
| | Indigenous | 0 | 1 | 0.6 | 1 | 0.2 |
| | Mixed | 0 | 0 | 0 | 42 | 8.4 |
| | Other | 0 | 1 | 0.6 | 3 | 0.6 |
| Marital status | Married/common law | 16 | 78 | 44.8 | 251 | 50.3 |
| | Single | 7 | 61 | 35.1 | 190 | 38.1 |
| | Divorced | 2 | 26 | 14.9 | 42 | 8.4 |
| | Separated | 0 | 3 | 1.7 | 5 | 1.0 |
| | Widowed | 1 | 2 | 1.1 | 2 | 0.4 |
| | Other/prefer not to answer | 0 | 4 | 2.3 | 9 | 1.8 |
| Fitzpatrick skin type | Always burn and never tan | 2 | 9 | 5.2 | 38 | 7.6 |
| | Usually burn and minimally tan | 9 | 45 | 25.9 | 129 | 25.9 |
| | Mild burn and then tan | 9 | 64 | 36.8 | 173 | 34.7 |
| | Rarely burn and always tan | 4 | 33 | 19.0 | 107 | 21.4 |
| | Rarely burn and tan very easily | 1 | 15 | 8.6 | 42 | 8.4 |
| | Never burn and never tan | 1 | 8 | 4.6 | 10 | 2.0 |
| Highest education | Some high school | 0 | 2 | 1.1 | 2 | 0.4 |
| | High school | 1 | 11 | 6.3 | 27 | 5.4 |
| | Some college, trade, or university | 4 | 24 | 13.8 | 70 | 14.0 |
| | College, trade, or university degree | 9 | 98 | 56.3 | 261 | 52.3 |
| | Some masters or doctoral degree | 0 | 7 | 4.0 | 30 | 6.0 |
| | Masters or doctoral degree | 11 | 31 | 17.8 | 108 | 21.6 |
| | Missing/prefer to not answer | 1 | 1 | 0.6 | 1 | 0.2 |
Table 4.
Treatment History
| Treatment type | Treatment | Qualitative sample (N = 26), n | Prolific cognitive sample (N = 180 responses), n | % | Prolific psychometric sample (N = 499ᵃ), n | % |
|---|---|---|---|---|---|---|
| Facial treatments | | | | | | |
| Injectables | Botox | 18 | 76 | 58.9 | 190 | 38.1 |
| | Filler | 17 | 71 | 55.0 | 145 | 29.1 |
| | Platelet-rich plasma | 1 | 7 | 5.4 | 20 | 4.0 |
| | Skin booster | 0 | 0 | 0 | 20 | 4.0 |
| Skin resurfacing | Microdermabrasion | 7 | 59 | 45.8 | 182 | 36.5 |
| | Chemical peel | 16 | 51 | 39.5 | 191 | 38.3 |
| | Hydrafacial | 2 | 40 | 41.0 | 202 | 40.5 |
| | Laser | 14 | 37 | 28.7 | 96 | 19.2 |
| | Microneedling | 2 | 30 | 23.3 | 117 | 23.4 |
| | Light therapy | 14 | 25 | 19.4 | 75 | 15.0 |
| Skin tightening | Radiofrequency | 7 | 11 | 8.6 | 51 | 10.2 |
| | High-intensity ultrasound | 0 | 9 | 7.0 | 41 | 8.2 |
| | Thread lift | 1 | 6 | 4.7 | 24 | 4.8 |
| Fat removal | Fat removal | 1 | 6 | 4.7 | 23 | 4.6 |
| Body treatments | | | | | | |
| Injectables | Filler | 0 | 17 | 33.3 | 35 | 7.0 |
| | Skin booster | 0 | 0 | 0 | 16 | 3.2 |
| Skin resurfacing | Laser | 1 | 0 | 0 | 0 | 0.0 |
| Fat reduction | Fat removal | 0 | 6 | 11.8 | 18 | 3.6 |
| | Cryolipolysis | 2 | 28 | 50.9 | 41 | 8.2 |
| | Laser lipolysis | 0 | 10 | 19.6 | 17 | 3.4 |
| | Radiofrequency | 1 | 8 | 15.6 | 13 | 2.6 |
| | High-intensity focused electromagnetic | 4 | 0 | 0 | 0 | 0.0 |
| Skin tightening | High-intensity ultrasound | 0 | 13 | 24.5 | 30 | 6.0 |
| | Radiofrequency | 2 | 14 | 26.4 | 25 | 5.0 |
| | Intense pulsed light and radiofrequency | 0 | 8 | 15.7 | 17 | 3.4 |
| Cellulite | Cellulite treatment | 0 | 17 | 33.4 | 32 | 6.4 |
ᵃRepresents the number of treatments reported by participants.
Concepts from the qualitative data were developed into 32 items, which were refined during 3 rounds of feedback. Figure 1 shows the number of participants in each round of feedback, and Supplemental Table 2 shows the changes made to the scale in each round. In the final round of feedback, 45 items were tested, and each was understood by ≥97.8% of participants. Relevance ratings ranged from 47.2% to 94.4%, with 32 of the 45 items rated relevant by ≥80% of the sample. The final field-test version of the scale included 36 items.
Figure 1 shows the number of Prolific participants who completed the Treatment Outcome scale. We combined the pilot (N = 153) and field-test (N = 389) data from 499 participants (total assessments = 542). Of the 36 items tested, 9 were dropped because they did not fit the Rasch model and 9 because they were redundant based on residual correlation values. From the remaining 18 items, we identified a 10-item solution that retained as many of the 4 remaining psychological items as possible, so that the Treatment Outcome scale measures results in terms of both appearance and psychological well-being. The final set of 10 items comprised 4 psychological items and 6 appearance items and provided a good fit to the Rasch model with high reliability. All 10 items had ordered thresholds (Supplemental Figure 1A) and nonsignificant chi-square P values after Bonferroni adjustment, and 9 of the 10 items had fit residuals within ±2.5. No differential item functioning (DIF) was detected for any item by age, gender, or location (body vs face). Data from the sample fit the Rasch model (chi-square = 50.46, df = 40, P = .124). PSI (0.87, 0.88) and Cronbach α (.95, .91) values with and without extremes were high. Residual correlations between items were <0.20 for all but one item pair (0.33). A subtest performed to determine the impact of this correlation on reliability reduced the PSI by 0.01 and the Cronbach α by .02. Overall, 394 of the 542 assessments (72.7%) fell within the range of measurement provided by the scale. Supplemental Figure 1B shows the person-item threshold distribution for the face and body treatment groups. The principal component analysis supported a single factor, with factor loadings ranging from 0.72 to 0.86.
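The overall fit statistic above can be checked by converting it to a P value. For an even number of degrees of freedom (df = 40 here), the chi-square upper tail has the closed form exp(−x/2) × Σ_{k=0}^{df/2−1} (x/2)^k / k!, which the sketch below uses; it should land close to the reported P = .124. The sketch also shows a Bonferroni-adjusted per-item α. A nominal α of 0.05 divided by the 10 items is an assumption, since the article does not state the values used, and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2
    term = 1.0  # (x/2)^0 / 0!
    total = term
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! from the previous term
        total += term
    return math.exp(-half) * total


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni adjustment."""
    return alpha / n_tests


overall_p = chi2_sf_even_df(50.46, 40)          # item-trait interaction reported above
per_item_alpha = bonferroni_alpha(0.05, 10)     # assumed nominal alpha and item count
```

An item's chi-square P value would then be compared with `per_item_alpha` rather than with 0.05.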
Participants' scores for the Treatment Outcome scale were transformed to a 0 (worst) to 100 (best) scale. The proportions of participants scoring at the floor and ceiling were low (0.3% and 1.7%, respectively). The mean [standard deviation] Treatment Outcome scale score was 78 [18] for those who had facial treatments and 68 [23] for those who had treatments on the body. Of the 22 hypotheses tested (Table 2), 19 (86%) were accepted, exceeding the 75% threshold recommended by COSMIN methodology to support construct validity. Detailed results are provided in Supplemental Tables 3 and 4.
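The article does not describe how scores were converted to 0 to 100. As a stand-in only, the sketch below linearly rescales a person location in logits between two hypothetical endpoints and then computes floor and ceiling percentages, the quantities reported above. The endpoints, the linear mapping, and the function names are all assumptions, not the SKIN-Q scoring method.

```python
from typing import Sequence


def rescale_0_100(logit: float, lo: float, hi: float) -> float:
    """Linearly map a logit onto 0-100, clipping values outside [lo, hi] (hypothetical endpoints)."""
    clipped = min(max(logit, lo), hi)
    return 100 * (clipped - lo) / (hi - lo)


def floor_ceiling(scores: Sequence[float]) -> tuple[float, float]:
    """Percent of scores at the scale minimum (0) and maximum (100)."""
    n = len(scores)
    floor_pct = 100 * sum(s == 0 for s in scores) / n
    ceiling_pct = 100 * sum(s == 100 for s in scores) / n
    return floor_pct, ceiling_pct


# Hypothetical person locations and endpoints of -5 and +5 logits.
example_scores = [rescale_0_100(x, -5.0, 5.0) for x in (-7.0, 0.0, 5.0, 6.0)]
example_floor, example_ceiling = floor_ceiling(example_scores)
```

Clipping is what produces floor and ceiling scores in this sketch: any location beyond the hypothetical endpoints collapses onto 0 or 100.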
The test-retest study included 136 eligible participants. The intraclass correlation coefficient result (average) for the Treatment Outcome scale was 0.89 (95% CI, 0.85-0.92).
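The Results report an "average" intraclass correlation coefficient without naming the model. Assuming it is the two-way random-effects, absolute-agreement, average-measures form (Shrout and Fleiss ICC(2,k)), it can be computed from a two-way analysis of variance of a participants × occasions score matrix, as sketched below. The model choice, the function name, and the example scores are assumptions.

```python
import statistics
from typing import Sequence


def icc_2k(data: Sequence[Sequence[float]]) -> float:
    """Two-way random, absolute-agreement, average-measures ICC (Shrout & Fleiss ICC(2,k)).

    data: one row per participant, one column per occasion (eg, test and retest).
    """
    n = len(data)      # participants
    k = len(data[0])   # occasions
    grand = statistics.mean(x for row in data for x in row)
    row_means = [statistics.mean(row) for row in data]
    col_means = [statistics.mean(row[j] for row in data) for j in range(k)]
    # Sums of squares for participants, occasions, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)


# Hypothetical 0-100 scores: identical at retest, then shifted up by 2 points at retest.
perfect = icc_2k([[10, 10], [20, 20], [30, 30]])
shifted = icc_2k([[10, 12], [20, 22], [30, 32]])
```

Because this form measures absolute agreement, a uniform shift between test and retest lowers the coefficient slightly even when the rank order of participants is unchanged.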
DISCUSSION
Our findings provide evidence of the reliability and validity of the novel SKIN-Q Treatment Outcome scale using data from a large sample of participants who underwent minimally invasive skin treatments for the face and body. These findings expand upon the authors' previous work, which used best-practice guidelines for PROM development to develop and examine the validity and reliability of the SKIN-Q item libraries for measuring how the skin looks (46 items) and feels (20 items).13
Given the continuing year-over-year increase in the number of minimally invasive treatments performed in aesthetic plastic surgery and dermatology, a validated PROM that evaluates patient satisfaction with the outcome of treatment is needed to keep pace with clinical practice.
The SKIN-Q Treatment Outcome scale can be utilized to evaluate patient satisfaction with a variety of commonly performed minimally invasive procedures, including injectables (eg, Botox, fillers, platelet-rich plasma), skin resurfacing treatments (eg, laser therapy, microneedling, chemical peels), and skin-tightening techniques (eg, radiofrequency).2-5 This new scale can be used in research and clinical care to assess the relative effectiveness of these procedures and emerging innovations.
This study has several limitations. First, relatively few participants had undergone an aesthetic treatment of the body. Second, several interventions were represented by only a small number of participants. Third, our study included only English speakers from Canada and the United States; therefore, the findings may not generalize to other populations. Fourth, the clinical data were self-reported by participants and could not be verified. Finally, the sample was recruited through an online platform. Although online research makes it possible to collect a large amount of data quickly and at low cost, participants self-select to take part and are paid for their involvement, which introduces the potential for bias. However, Prolific has been shown to provide high-quality data compared with similar platforms.28,29
CONCLUSIONS
The SKIN-Q Treatment Outcome scale is a valid and reliable PROM designed to measure patient satisfaction with minimally invasive treatments of the skin anywhere on the face or body. This PROM can be used to inform clinical care and research that evaluates the effectiveness of minimally invasive treatment options commonly used in aesthetics, such as injectables, skin resurfacing, and skin-tightening procedures. The SKIN-Q is available free of charge for academic use at www.qportfolio.org.
Supplemental Material
This article contains supplemental material located online at https://doi.org/10.1093/asj/sjaf075.
Disclosures
Drs Klassen, Cano, and Pusic are co-developers of the SKIN-Q PROM and receive a share of any license revenues as royalties based on their institutions' inventor sharing policy. Dr Klassen is an owner of EVENTUM Research (Hamilton, ON, Canada), which provides consulting services to the pharmaceutical industry.
Funding
The authors received no financial support for the research, authorship, and publication of this article.
References
- 1. Aesthetic Plastic Surgery National Databank Statistics 2023. Aesthet Surg J. 2024;44(Suppl 2):1–25. doi: 10.1093/asj/sjae188
- 2. Hoffman L, Fabi S. Look better, feel better, live better? The impact of minimally invasive aesthetic procedures on satisfaction with appearance and psychosocial wellbeing. J Clin Aesthet Dermatol. 2022;15:47–58.
- 3. Jack MC, Pozner JN. Putting it all together. Plast Reconstr Surg. 2014;134:101S–107S. doi: 10.1097/prs.0000000000000670
- 4. Noland ME, Lalonde DH, Yee GJ, Rohrich RJ. Current uses of botulinum neurotoxins in plastic surgery. Plast Reconstr Surg. 2016;138:519e–530e. doi: 10.1097/prs.0000000000002480
- 5. Weissler JM, Carney MJ, Carreras Tartak JA, Bensimon RH, Percec I. The evolution of chemical peeling and modern-day applications. Plast Reconstr Surg. 2017;140:920–929. doi: 10.1097/prs.0000000000003787
- 6. Told R, Placheta-Györi E, Lackner B, et al. FACE-Q Patient report-assisted subjective and objective evaluation of blepharoplasty outcomes using two different suturing techniques: a randomized and patient-blinded pilot study. Aesthet Plast Surg. 2023;47:1410–1417. doi: 10.1007/s00266-023-03339-6
- 7. Mommaerts MY. Patient- and clinician-reported outcomes of lower jaw contouring using patient-specific 3D-printed titanium implants. Int J Oral Maxillofac Surg. 2020;50:373–377. doi: 10.1016/j.ijom.2020.07.008
- 8. Goldie K, Kerscher M, Fabi SG, et al. Skin quality—a holistic 360° view: consensus results. Clin Cosmet Investig Dermatol. 2021;14:643–654. doi: 10.2147/ccid.s309374
- 9. Food and Drug Administration. Guidance for Industry Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. U.S. Department of Health and Human Services; 2009.
- 10. Terwee CB, Prinsen CA, Chiarotto A, et al. COSMIN Methodology for Assessing the Content Validity of PROMs. VU University Medical Center; 2018.
- 11. Patrick DL, Burke LB, Gwaltney CJ, et al. Content validity–establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: part 1–eliciting concepts for a new PRO instrument. Value Health. 2011;14:967–977. doi: 10.1016/j.jval.2011.06.014
- 12. Patrick DL, Burke LB, Gwaltney CJ, et al. Content validity–establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO Good Research Practices Task Force report: part 2–assessing respondent understanding. Value Health. 2011;14:978–988. doi: 10.1016/j.jval.2011.06.013
- 13. Klassen AF, Pusic AL, Kaur M, et al. The SKIN-Q: an innovative patient-reported outcome measure for evaluating minimally invasive skin treatments for the face and body. Facial Plast Surg Aesthet Med. 2024;26:247–255. doi: 10.1089/fpsam.2023.0204
- 14. Gallo L, Kim P, Yuan M, et al. Best practices for FACE-Q aesthetics research: a systematic review of study methodology. Aesthet Surg J. 2023;43:NP674–NP686. doi: 10.1093/asj/sjad141
- 15. Klassen AF, Cano SJ, Schwitzer JA, Scott AM, Pusic AL. FACE-Q Scales for health-related quality of life, early life impact, satisfaction with outcomes, and decision to have treatment. Plast Reconstr Surg. 2014;135:375–386. doi: 10.1097/prs.0000000000000895
- 16. Klassen AF, Pusic AL, Kaur M, et al. Extending the range of measurement for minimally invasive treatments by adding new concepts to FACE-Q aesthetics scales. Plast Reconstr Surg Glob Open. 2024;12:e5736. doi: 10.1097/gox.0000000000005736
- 17. Klassen AF, Cano S, Mansouri J, et al. “I want it to look natural”: development and validation of the FACE-Q aesthetics natural module. Aesthet Surg J. 2024;44:733–743. doi: 10.1093/asj/sjad374
- 18. Rasch G. Studies in Mathematical Psychology: I. Probabilistic Models for Some Intelligence and Attainment Tests. Nielsen & Lydiche; 1960.
- 19. Hobart J, Cano S. Improving the evaluation of therapeutic interventions in multiple sclerosis: the role of new psychometric methods. Health Technol Assess. 2009;13:iii, ix–x, 1–177. doi: 10.3310/hta13120
- 20. Andrich D, Sheridan BS, Luo G. RUMM2030Plus: Rasch Unidimensional Models for Measurement. RUMM Laboratory; 2021.
- 21. Christensen KB, Makransky G, Horton M. Critical values for Yen's Q3: identification of local dependence in the Rasch model using residual correlations. Appl Psychol Meas. 2017;41:178–194. doi: 10.1177/0146621616677520
- 22. Cleanthous S, Bongardt S, Marquis P, Stach C, Cano S, Morel T. Psychometric analysis from EMBODY1 and 2 clinical trials to help select suitable fatigue PRO scales for future systemic lupus erythematosus studies. Rheumatol Ther. 2021;8:1287–1301. doi: 10.1007/s40744-021-00338-4
- 23. Andrich D, Hagquist C. Real and artificial differential item functioning. J Educ Behav Stat. 2012;37:387–416. doi: 10.3102/1076998611411913
- 24. Nunnally JC. Psychometric Theory. 3rd ed. McGraw-Hill; 1994.
- 25. Prinsen CAC, Mokkink LB, Bouter LM, et al. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27:1147–1157. doi: 10.1007/s11136-018-1798-3
- 26. Andrich D. An index of person separation in latent trait theory, the traditional KR.20 index, and the Guttman scale response pattern. Educ Res Perspect. 1982;9:95–104.
- 27. Gaskin CJ, Happell B. On exploratory factor analysis: a review of recent evidence, an assessment of current practice, and recommendations for future use. Int J Nurs Stud. 2013;51:511–521. doi: 10.1016/j.ijnurstu.2013.10.005
- 28. Peer E, Rothschild D, Gordon A, Evernden Z, Damer E. Data quality of platforms and panels for online behavioral research. Behav Res Methods. 2022;54:1643–1662. doi: 10.3758/s13428-021-01694-3
- 29. Douglas BD, Ewell PJ, Brauer M. Data quality in online human-subjects research: comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA. PLoS One. 2023;18:e0279720. doi: 10.1371/journal.pone.0279720