Abstract
BACKGROUND
A validated scale is needed for objective and reproducible comparisons of marionette lines before and after treatment in clinical studies.
OBJECTIVE
To describe the development and validation of a 5-point photonumeric marionette lines scale.
METHODS
The scale was developed to include an assessment guide, verbal descriptors, and real and morphed subject images for each scale grade. Intrarater and interrater reliability was evaluated in initial scale validation (web-based review) (N = 51) and live-subject validation (N = 75) studies, each completed during 2 sessions.
RESULTS
In the initial scale validation study, intrarater agreement for 2 physician raters was near perfect (weighted kappa = 0.92 and 0.94). Interrater agreement was excellent during sessions 1 and 2 (intraclass correlation coefficients of 0.94 and 0.95, respectively). In the live-subject validation study, intrarater agreement for 3 physician raters showed a strong correlation (mean weighted kappa = 0.77). Interrater agreement was high during live-subject validation sessions 1 and 2 (intraclass correlation coefficients = 0.89 for both sessions).
CONCLUSION
This new marionette lines scale is a validated and reliable scale for physician rating of marionette line severity.
Labiomandibular or melomental folds, commonly referred to as marionette lines, are cutaneous depressions that extend from the corner of the mouth toward the chin/mandible and become more prominent with age (Figure 1).1,2 Contractions of the platysma, depressor anguli oris, and mentalis muscles contribute to marionette lines.3 There are 2 underlying structural fixation points that anatomically define the marionette line: the cutaneous insertion of the depressor anguli oris muscle (superior border) and the attachment site of the mandibular ligament (inferior border).1 The lateral pull exerted by the zygomaticus major and platysma muscles at these 2 fixation points, which occurs with facial animation, reveals the entire length of this natural fold.1 Similar to the nasolabial fold, the superficial musculoaponeurotic system that traverses this area transitions from the superficial musculoaponeurotic system type I morphology on the cheek side to type II on the lower-lip side,4 further contributing to the natural development of a habitual fold. With age-related soft tissue changes, the appearance of marionette lines can become more obvious as a result of increasing laxity of the submandibular and mandibular septum fat compartments, the retaining ligaments, and the overlying skin.5,6 Marionette lines contribute to a downturned appearance of the mouth corners, thereby conveying not only an aged appearance, but also a sad or angry expression.7
Figure 1. Illustration of marionette lines.
The appearance of marionette lines is a common aesthetic concern that can be improved using minimally invasive treatments including injectable fillers and botulinum toxin type A.2,8–11 In an online survey of over 500 aesthetically oriented women, the priorities regarding facial treatments shifted with advancing age (i.e., age 30–34 years vs 60–65 years) from areas of the upper face to areas of the lower face, with marionette lines representing the greatest difference in treatment priority.12 The increase in popularity of minimally invasive facial aesthetic procedures has been accompanied by a shift in treatment approach from single areas of the face to multiple areas of the face to improve overall appearance.13,14 Facial features in the perioral area, such as marionette lines, are a focus of aesthetic improvement and may be included in panfacial treatment.15
Validated rating scales are important tools for objectively quantifying aesthetic treatment outcomes during clinical trials, and for setting treatment goals and aligning expectations between physicians and patients.16,17 Existing photonumeric scales to assess marionette line severity16,18 are not commonly used in clinical studies, comprise different facial angles, and lack validation across diverse populations (e.g., race/ethnicity, skin type), which may hinder their utility and clinical relevance. This study describes the development and validation of a new Allergan photonumeric scale for the clinical assessment and rating of marionette line severity applicable across a demographically diverse patient population.
Materials and Methods
Figure 2 summarizes key steps in the development and validation of the marionette lines scale, which were performed between January 2020 and March 2021.
Figure 2. Development and validation process for the marionette lines scale. 2D, two-dimensional.
Subjects
A total of 160 males and females above 18 years old with a range of skin types (Fitzpatrick skin types [FSTs] I−VI), races, and ethnicities were recruited for scale development and/or live validation. All subjects provided written informed consent.
Enrollment criteria were the same for scale development and live scale validation. Subjects were instructed to arrive at the study center clean shaven, remove make-up and jewelry, wear a provided black T-shirt, not drink alcohol excessively before the sessions, try not to alter their usual routine (e.g., their facial care routine and normal sleep or hydration patterns) between sessions, and not have tanning sessions or extensive sun exposure between sessions. Subjects were excluded if they had the following: their photographs included in the scale (live validation only); disorders or scarring that would interfere with visual assessment of the face; any treatment with toxin/fillers or surgery that would alter facial appearance within 2 weeks of the evaluation session, or plans to have one of these procedures between evaluation sessions; or were pregnant.
Scale Development
A team of 6 European physicians reviewed 110 2-dimensional subject photographs in a web-based format. Photographs were captured by trained photographers from Canfield Scientific, Inc. (Canfield, Parsippany, NJ). A Canon 6D camera (Canon, Melville, NY) with IntelliStudio (Canfield) was used to capture frontal and oblique images of each subject. Images were captured at rest, and each captured image was cropped below the nose by a Canfield graphic technician. Using their own experiences of marionette line severities, the physicians graded the photos on a preliminary 5-grade severity scale that had no images or descriptors. The physicians were not allowed to discuss image reviews with one another, and each completed the review independently.
The prescale development data were collaboratively reviewed by AbbVie, Canfield, and the team of physicians to determine which subjects had most agreement for potential use on the scale. For each grade, 2 subject photographs that had high agreement among most physicians (i.e., 6 of 6 physicians rated the photograph the same grade) were selected. Using these representative photographs for each of the 5 grades, scale descriptors were created: Grade 0 (None): No lines; Grade 1 (Minimal): Oral commissures begin to point downward with shallow lines; Grade 2 (Moderate): Inverted oral commissures with moderate lines; Grade 3 (Severe): Inverted oral commissures with deep lines; and Grade 4 (Very Severe): Inverted oral commissures with deep furrows and sagging. A representative subject was selected as a base model by scale developers to generate morphed photographs. A Canfield graphics technician morphed the selected subject image to match the created descriptors of each numeric grade. In total, the scale contains 10 morphed images (5 frontal, 5 oblique) to provide better guidance for grading assessments without any other distracting features. The other images in the scale are nonmanipulated subject photographs representing the multiple physical changes that occur in an aging face.
The finalized marionette lines scale contains the scale descriptors for each grade, real and morphed subject images, and an assessment guide, which is a line drawing of anatomic markers demarcating the area of interest (Figures 3 and 4). The assessment guide was developed by Canfield based on detailed instructions from AbbVie. The morphed images are the first 2 images for each grade on the scale and are outlined with a white border.
Figure 3. Assessment guide for the marionette lines scale.
Figure 4. The marionette lines scale assigns a grade from None (0) to Very Severe (4) that describes the severity of marionette lines.
The physician raters who participated in the initial scale development received a copy of the finalized scale and participated in a web-based review to determine whether the scale was easy to use and functioned reliably as intended. This review was performed twice (2 sessions) using left oblique, frontal, and right oblique photographs from the original 51 subject photographs. Each physician individually assigned a score using the scale. Interrater and intrarater reliability were evaluated. Based on the results of this review, some subjects were invited to the live scale validation as training subjects.
Live Scale Validation
Three US physician raters experienced in using aesthetic photonumeric scales and who were not involved in scale development participated in 2 live-subject validation sessions occurring 2 weeks apart. Before the first live evaluation session, all physician raters were trained to use the marionette lines scale. The first 10 subjects rated during the first validation session were considered run-in training subjects and were excluded from the analysis. The physician raters evaluated each enrolled subject twice, once during each of 2 validation sessions. Each physician rated subjects independently, and physicians were unable to discuss subject scores. Scores were recorded and uploaded to a server using a dedicated tablet. Interrater and intrarater reliability were evaluated.
Statistics
The SAS macro “INTRACC” was used to determine interrater reliability by calculating the intraclass correlation coefficient (ICC [2,1]) and 95% confidence intervals (CIs) for validation sessions 1 and 2 using the formula described by Shrout and Fleiss.19 For each rater, the SAS procedure “FREQ” was used to determine intrarater reliability by calculating a weighted kappa with Cicchetti-Allison weights (SAS default weights) and 95% CIs.20 For the live scale validation, the overall mean weighted kappa was computed by averaging the overall weighted kappa for each rater from validation sessions 1 and 2. ICCs and weighted kappa coefficients of 0.00 to 0.20 indicate slight agreement, 0.21 to 0.40 indicate fair agreement, 0.41 to 0.60 indicate moderate agreement, 0.61 to 0.80 indicate substantial agreement, and 0.81 to 1.00 indicate almost perfect agreement.21 SAS version 9.4 (SAS Institute, Cary, NC) was used for all statistical analyses.
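As an illustration of the interrater statistic named above, the following Python sketch computes ICC(2,1) from the two-way mean squares in the Shrout and Fleiss formulation. It is not the INTRACC macro and does not compute confidence intervals; the function name `icc_2_1` and the example ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single rater, absolute agreement.

    `ratings` is a list of rows, one row per subject and one column per rater.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way layout without replication
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_subjects - ss_raters

    bms = ss_subjects / (n - 1)            # between-subjects mean square
    jms = ss_raters / (k - 1)              # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical grades (0 to 4) from 3 raters for 5 subjects
example_ratings = [
    [0, 0, 1],
    [1, 1, 1],
    [2, 3, 2],
    [3, 3, 4],
    [4, 4, 4],
]
print(round(icc_2_1(example_ratings), 2))
```

Because ICC(2,1) treats raters as a random sample and penalizes systematic differences between them, a rater who consistently scores one grade higher than the others lowers the coefficient even when the rank order of subjects is identical.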
Results
Initial Scale Validation: Web-Based Review
The marionette lines scale demonstrated excellent interrater agreement (ICC >0.90) during validation sessions 1 and 2, with ICCs of 0.94 and 0.95, respectively (Table 1). Overall intrarater agreement for physician raters 1 and 2 showed a near perfect correlation (kappa >0.90), with weighted kappas of 0.92 and 0.94, respectively.
TABLE 1.
Estimates of Agreement for the Marionette Lines Scale: Initial Scale Validation (Web-Based Review)
| Interrater agreement, ICC (95% CI) | |
| Validation session 1 (N = 3 raters) | 0.94 (0.90–0.96) |
| Validation session 2 (N = 2 raters) | 0.95 (0.92–0.97) |
| Intrarater agreement, weighted kappa (95% CI) | |
| Physician rater 1 (N = 51 subject photographs) | 0.92 (0.86–0.98) |
| Physician rater 2 (N = 51 subject photographs) | 0.94 (0.88–0.99) |
CI, confidence interval; ICC, intraclass correlation coefficient.
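The intrarater statistic in Table 1 is a weighted kappa with Cicchetti-Allison weights. A minimal Python sketch of that calculation for one rater's session 1 and session 2 grades follows, assuming grades are coded 0 to 4 and that the full 0 to 4 range defines the weights. The function name and the example grades are hypothetical, and the confidence intervals reported by PROC FREQ are omitted.

```python
def weighted_kappa_linear(session1, session2, n_grades=5):
    """Weighted kappa with Cicchetti-Allison (linear) weights.

    `session1` and `session2` are equal-length lists of integer grades
    in the range 0 .. n_grades - 1 given by the same rater.
    """
    n = len(session1)

    # Joint proportions: rows are session 1 grades, columns are session 2 grades
    table = [[0.0] * n_grades for _ in range(n_grades)]
    for a, b in zip(session1, session2):
        table[a][b] += 1.0 / n

    row_marg = [sum(table[i]) for i in range(n_grades)]
    col_marg = [sum(table[i][j] for i in range(n_grades)) for j in range(n_grades)]

    observed = 0.0   # weighted observed agreement
    expected = 0.0   # weighted agreement expected by chance
    for i in range(n_grades):
        for j in range(n_grades):
            # Linear weight: 1 on the diagonal, 0 for the largest disagreement
            weight = 1.0 - abs(i - j) / (n_grades - 1)
            observed += weight * table[i][j]
            expected += weight * row_marg[i] * col_marg[j]

    return (observed - expected) / (1.0 - expected)


# Hypothetical grades for 8 photographs rated twice by one physician
first_pass = [0, 1, 1, 2, 2, 3, 4, 4]
second_pass = [0, 1, 2, 2, 2, 3, 3, 4]
print(round(weighted_kappa_linear(first_pass, second_pass), 2))
```

With linear weights, a one-grade disagreement on the 5-point scale still earns 0.75 credit, so the statistic is more forgiving of near misses than an unweighted kappa.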
Live Scale Validation
A total of 75 subjects were enrolled in the live scale validation study. Demographic characteristics of subjects in the final scale validation set are shown in Table 2. The mean age of the validation study population was 51 years, with a broad span of ages represented (range, 22–87). Most subjects were female (67%), white (76%), and had FST III/IV (52%; Table 2).
TABLE 2.
Subject Demographics: Live Scale Validation (N = 75)
| Characteristic | |
| Mean age (range), yr | 51 (22–87) |
| Sex, n (%) | |
| Female | 50 (67) |
| Male | 25 (33) |
| Fitzpatrick skin type, n (%) | |
| I | 6 (8) |
| II | 13 (17) |
| III | 22 (29) |
| IV | 17 (23) |
| V | 13 (17) |
| VI | 4 (5) |
The scale demonstrated high interrater agreement (ICC: 0.75–0.90), with ICCs of 0.89 for both validation sessions (Table 3). Overall, intrarater agreement for physician raters 1, 2, and 3 was substantial (kappa: 0.61–0.80), with a mean weighted kappa of 0.77 (Table 3).
TABLE 3.
Estimates of Agreement for the Marionette Lines Scale: Live Scale Validation
| Interrater agreement, ICC (95% CI) | |
| Validation session 1 (N = 3 raters) | 0.89 (0.86–0.91) |
| Validation session 2 (N = 3 raters) | 0.89 (0.85–0.92) |
| Intrarater agreement, weighted kappa (95% CI) | |
| Physician rater 1 (N = 75 subjects) | 0.76 (0.70–0.82) |
| Physician rater 2 (N = 75 subjects) | 0.82 (0.77–0.87) |
| Physician rater 3 (N = 75 subjects) | 0.73 (0.67–0.79) |
| Mean weighted kappa | 0.77 (0.74–0.80) |
CI, confidence interval; ICC, intraclass correlation coefficient.
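The mean weighted kappa in Table 3 can be reproduced from the three per-rater values and read against the agreement bands listed in the Statistics section. The short Python sketch below does that arithmetic; the helper name `agreement_band` is hypothetical, and coefficients are rounded to 2 decimals before lookup because the bands are stated to 2 decimals.

```python
def agreement_band(coefficient):
    """Map an ICC or weighted kappa to the bands given in the Statistics section."""
    value = round(coefficient, 2)
    if not 0.0 <= value <= 1.0:
        raise ValueError("bands are only defined for 0.00 to 1.00")
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_bound, label in bands:
        if value <= upper_bound:
            return label


# Per-rater weighted kappas from Table 3
rater_kappas = [0.76, 0.82, 0.73]
mean_kappa = sum(rater_kappas) / len(rater_kappas)

print(f"{mean_kappa:.2f}", agreement_band(mean_kappa))  # prints: 0.77 substantial
print(agreement_band(0.89))                             # prints: almost perfect
```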
Discussion
This study demonstrated high interrater and substantial intrarater agreement for the new marionette lines scale, indicating that the scale is reliable and validated for measurement of marionette line severity. This scale uses morphed and nonmanipulated subject photographs showing the perioral area at 2 different angles with 5 marionette line severity descriptions. The scale was developed to represent a demographically diverse population; therefore, a range of FSTs and races/ethnicities were included in the validation. Two subject photographs representing each grade of the scale were selected to represent diversity in sex and race/ethnicity. Online and live subject validation studies were performed to allow refinement of the scale.
Other scales to evaluate marionette lines have been described in the literature. A photonumeric scale was developed to objectively quantify the improvement of facial wrinkles with injectable filler by correlating observer judgment with optical assessments of skin replicas.18 The Marionette Line Grading Scale, a 5-point photonumeric scale to quantify the severity of marionette lines, demonstrated similar interrater and intrarater reliability compared with the current study.16 Although the Marionette Line Grading Scale is a validated scale that improved upon former scales, it was developed by raters using morphed images to judge photographs of live patients.16 Morphed images, although standardized to a site-specific area, do not translate clinically to the multiple physical changes that occur in an aging face. In addition, the development of both scales did not consider the incorporation of different demographics or races/ethnicities. These scales have not been widely used in clinical studies.
There are some limitations to this study. To assess the clinical significance of the scale, patient-reported outcomes, such as the FACE-Q, may capture clinically meaningful improvement from a patient perspective.22–24 Although the scale includes all skin types, the representation of FSTs I and VI was low in the subject population (8% and 5%, respectively) compared with other skin types. Similar to previously developed aesthetic rating scales,22–25 the use of 2-dimensional photonumeric scales to capture 3-dimensional anatomical areas may not fully capture the true benefits of treatment. Ultimately, live-subject assessment is superior to photographic assessment. The scale assesses marionette lines in isolation, whereas in reality, facial aesthetics is multifactorial and multiple areas of the face are often treated simultaneously in the clinic. Although it is important to note that this scale was designed for clinical research and not for routine use in aesthetics clinics, further validation using patients in a real-world clinical setting may further increase its applicability and clinical relevance.
Conclusions
The new marionette lines scale demonstrated high interrater and substantial intrarater agreement among physicians and accurately describes the severity of marionette lines. This validated scale includes a user-friendly diagram, detailed descriptions, and real and morphed patient images representative across sexes and skin types. The scale's standardized ratings may be uniformly applied in clinical trials and by clinicians who treat patients seeking aesthetic improvement in the appearance of marionette lines.
Footnotes
Allergan Aesthetics, an AbbVie Company, funded this study and participated in the study design, research, analysis, data collection, interpretation of data, reviewing, and approval of the publication. All authors had access to relevant data and participated in the drafting, review, and approval of this publication. No honoraria or payments were made for authorship. Medical writing support was provided by Maria Lim, PhD of Peloton Advantage, an OPEN Health company, and funded by Allergan Aesthetics, an AbbVie company.
The authors have indicated no significant interest with commercial supporters.
Contributor Information
Sofía Ruiz del Cueto, Email: ruizdelcueto@miracueto.com.
Fernando Urdiales-Gálvez, Email: furdiales@institutomedicomiramar.com.
Laurence Barry, Email: barry.laurence@wanadoo.fr.
Alessandro Gritti, Email: alessandrogritti81@gmail.com.
Alexandre Marchac, Email: alexandremarchac@gmail.com.
Maria Lim, Email: marialim@openhealthgroup.com.
Carola de la Guardia, Email: carola.delaguardia@abbvie.com.
Graeme Kerson, Email: graeme.kerson@abbvie.com.
Michael Silberberg, Email: michael.silberberg@abbvie.com.
References
1. Pessa JE, Garza PA, Love VM, Zadoo VP, et al. The anatomy of the labiomandibular fold. Plast Reconstr Surg 1998;101:482–6.
2. Bae GY, Na JI, Park KC, Cho SB. Nonsurgical correction of drooping mouth corners using monophasic hyaluronic acid and incobotulinumtoxinA. J Cosmet Dermatol 2020;19:338–45.
3. Braz AV, Louvain D, Mukamal LV. Combined treatment with botulinum toxin and hyaluronic acid to correct unsightly lateral-chin depression. An Bras Dermatol 2013;88:138–40.
4. Sandulescu T, Franzmann M, Jast J, Blaurock-Sandulescu T, et al. Facial fold and crease development: a new morphological approach and classification. Clin Anat 2019;32:573–84.
5. Reece EM, Pessa JE, Rohrich RJ. The mandibular septum: anatomical observations of the jowls in aging-implications for facial rejuvenation. Plast Reconstr Surg 2008;121:1414–20.
6. Gierloff M, Stöhring C, Buder T, Wiltfang J. The subcutaneous fat compartments in relation to aesthetically important facial folds and rhytides. J Plast Reconstr Aesthet Surg 2012;65:1292–7.
7. Michaud T, Gassia V, Belhaouari L. Facial dynamics and emotional expressions in facial aging treatments. J Cosmet Dermatol 2015;14:9–21.
8. Bertucci V, Nikolis A, Solish N, Lane V, et al. Subject and partner satisfaction with lip and perioral enhancement using flexible hyaluronic acid fillers. J Cosmet Dermatol 2021;20:1499–504.
9. Solish N, Bertucci V, Percec I, Wagner T, et al. Dynamics of hyaluronic acid fillers formulated to maintain natural facial expression. J Cosmet Dermatol 2019;18:738–46.
10. Mess SA. Lower face rejuvenation with injections: Botox, Juvederm, and Kybella for marionette lines and jowls. Plast Reconstr Surg Glob Open 2017;5:e1551.
11. Carruthers A, Carruthers J, Monheit GD, Davis PG, et al. Multicenter, randomized, parallel-group study of the safety and effectiveness of onabotulinumtoxinA and hyaluronic acid dermal fillers (24-mg/ml smooth, cohesive gel) alone and in combination for lower facial rejuvenation. Dermatol Surg 2010;36(Suppl 4):2121–34.
12. Narurkar V, Shamban A, Sissins P, Stonehouse A, et al. Facial treatment preferences in aesthetically aware women. Dermatol Surg 2015;41(Suppl 1):S153–S160.
13. 2020 National Plastic Surgery Statistics. American Society of Plastic Surgeons; 2020. Available at: https://www.plasticsurgery.org/documents/News/Statistics/2020/plastic-surgery-statistics-report-2020.pdf. Accessed: June 11, 2021.
14. Kaminer MS, Cohen JL, Shamban A, Werschler WP, et al. Maximizing panfacial aesthetic outcomes: findings and recommendations from the HARMONY study. Dermatol Surg 2020;46:810–7.
15. Cohen JL, Rivkin A, Dayan S, Shamban A, et al. Multimodal facial aesthetic treatment on the appearance of aging, social confidence, and psychological wellbeing: HARMONY study. Aesthet Surg J 2022;42:NP115–24.
16. Carruthers A, Carruthers J, Hardas B, Kaur M, et al. A validated grading scale for marionette lines. Dermatol Surg 2008;34(Suppl 2):S167–72.
17. Niforos F, Liew S, Acquilla R, Ogilvie P, et al. Creation and validation of a photonumeric scale to assess volume deficiency in the infraorbital region. Dermatol Surg 2017;43:684–91.
18. Lemperle G, Holmes RE, Cohen SR, Lemperle SM. A classification of facial wrinkles. Plast Reconstr Surg 2001;108:1735–50; discussion 1751-2.
19. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–8.
20. Cicchetti DV, Allison T. A new procedure for assessing reliability of scoring EEG sleep recordings. Am J EEG Technol 1971;11:101–10.
21. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
22. Donofrio L, Carruthers J, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of infraorbital hollows. Dermatol Surg 2016;42(Suppl 1):S251–8.
23. Carruthers J, Donofrio L, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of facial fine lines. Dermatol Surg 2016;42(Suppl 1):S227–34.
24. Carruthers A, Donofrio L, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of static horizontal forehead lines. Dermatol Surg 2016;42(Suppl 1):S243–50.
25. Junxue A, Chen L, Xiaobing M, Qu J, et al. Validation of a chin retrusion scale for Chinese subjects. J Craniofac Surg 2022;33:48–51.
