Dermatol Surg. 2016 Sep 26;42(Suppl 1):S235–S242. doi: 10.1097/DSS.0000000000000851

Development and Validation of a Photonumeric Scale for Evaluation of Transverse Neck Lines

Derek Jones, Alastair Carruthers, Bhushan Hardas, Diane K Murphy, Jonathan M Sykes, Lisa Donofrio, Jean Carruthers, Lela Creutz, Ann Marx, Sara Dill
PMCID: PMC5671792  PMID: 27661746

Abstract

BACKGROUND

A validated scale is needed for objective and reproducible comparisons of horizontal neck lines before and after treatment in practice and clinical studies.

OBJECTIVE

To describe the development and validation of the 5-point photonumeric Allergan Transverse Neck Lines Scale.

METHODS

The Allergan Transverse Neck Lines Scale was developed to include an assessment guide, verbal descriptors, morphed images, and real subject images for each scale grade. The clinical significance of a 1-point score difference was evaluated in a review of multiple image pairs representing varying differences in severity. Interrater and intrarater reliability was evaluated in a live-subject rating validation study (N = 297) completed during 2 sessions occurring 3 weeks apart.

RESULTS

A difference of ≥1 point on the scale was shown to reflect a clinically significant difference (mean [95% confidence interval] absolute score difference, 1.22 [1.09–1.35] for clinically different image pairs and 0.57 [0.42–0.72] for not clinically different pairs). Intrarater agreement between the 2 live-subject rating validation sessions was substantial (mean weighted kappa = 0.78). Interrater agreement was substantial during the second rating session (0.73, primary end point).

CONCLUSION

The Allergan Transverse Neck Lines Scale is a validated and reliable scale for rating of severity of neck lines.


Horizontal or transverse neck lines can occur at any age.1 Neck lines may be associated with the deposition of submental and subplatysmal fat, and they are exacerbated by age-related decreases in elasticity and thickness of the skin of the neck, combined with gravity and the downward pull of the platysma muscle.2–4 Horizontal neck lines may be treated with botulinum toxin Type A in cases where the lines are clearly caused by the activity of the platysma muscles,3–5 although some groups report having little success with this approach.6 Use of injectable filler for the treatment of horizontal neck lines has been reported in one case study1 and in a prospective single-center study in combination with other therapies.7 Other approaches for reducing the appearance of neck lines include rhytidectomy,8 fractional laser treatment,9,10 fractional radiofrequency treatment,11,12 and microfocused ultrasound.13,14

Patients are increasingly seeking treatment for nonfacial rejuvenation, including neck lines, and clinicians need a way to both educate and assess patients regarding treatments. Clinical studies of neck line treatments have assessed outcomes using general numeric wrinkle scales that did not include images and were not validated for the assessment of the neck.9,10,12 This report describes the development and validation of a new photonumeric scale designed to rate horizontal lines of the neck (Allergan Transverse Neck Lines Scale). The scale was created to meet FDA requirements for outcome assessments in clinical trials15 and to provide a practical tool that physicians can use for the assessment of patients. The objectives of this study were to determine the clinically significant difference in scale scores and to establish the interrater and intrarater reliability of the scale for rating severity of horizontal lines of the neck in live subjects.

Methods

Scale Development

Figure 1 summarizes key steps in the creation and validation of the Allergan Transverse Neck Lines Scale. A 9-member team comprising 5 external members (3 board-certified dermatologists, 1 board-certified oculoplastic surgeon, and 1 board-certified facial plastic surgeon) and 4 Allergan employees (2 dermatologists, 1 plastic surgeon, and 1 clinical scientist) developed the scale from a pool of subject images collected for scale development by Canfield Scientific, Inc (Canfield, Fairfield, NJ). A total of 396 men and women aged 18 years or older with Fitzpatrick skin Types I through VI and in good general health volunteered for image capture. All subjects provided informed photograph consent before image collection. Subjects were excluded if they had anything that would interfere with visual assessment of the area of interest. Canfield photographers obtained full 2-dimensional (2D) images of the face and neck using a 2D custom suite for face and neck imaging (Nikon D7100 Hi Res SLR). Images were cropped horizontally from 1 cm lateral to the neck/shoulder junction on the left and right sides and vertically from 1 cm above the bony menton down to 2 cm below the neck/shoulder junction to produce images of the area of interest.

Figure 1.

Scale development and validation processes.

Scale descriptors were created for each of the 5 grades of the scale (Table 1). Two members of the Allergan team met individually with each member of the scale development team for preliminary input on each scale grade. After preliminary scale grades were established, all 9 individuals involved in scale creation had a collaborative discussion about the scale grades and descriptors. The wording for each grade was then finalized by the Allergan team.

TABLE 1.

Descriptors for the Allergan Transverse Neck Lines Scale

Canfield created an assessment guide with a line drawing of anatomic markers demarcating the anterior third of the neck between each sternocleidomastoid based on detailed instructions from the Allergan team regarding anatomic markers (Figure 2). Canfield revised the drawing multiple times based on careful review by the Allergan team.

Figure 2.

Assessment guide for the Allergan Transverse Neck Lines Scale.

A base image to demonstrate Grade 2 neck lines was selected, and this image was morphed to represent all 5 grades of the scale. A Canfield graphics technician morphed the anatomic area of interest in the base image to match the descriptors provided for Grades 0, 1, 3, and 4. Alignment of the morphed images with the scale descriptors was achieved via an interactive process with the Allergan team.

A forced ranking review was performed to delineate the range of severity between Grades 2 and 3 and to confirm the selection of the best representative image to be used as Grade 2. The 5 external scale developers performed the web-based forced ranking exercise on preselected images that represented the upper and lower boundaries of Grades 2 and 3.

To determine whether there was a clinically significant difference between grades of the scale, the 5 external scale developers were asked to perform an online clinical significance review of image pairs. Multiple image pairs were selected to represent varying degrees of differences in severity (ranging from no difference to a 4-point difference). During the session, the scale developers determined whether there was a clinically significant difference (Yes/No) between images for each pair. After the session, the images from all image pairs were randomly mixed in with other images to be used in the morphed image scale validation (described in the following paragraph) and assigned a score by scale developers so that score differences between the 2 images in each pair could be calculated.

The morphed image scale was validated by having the 5 external scale developers use the scale to rate randomized images representing all scale grades during 2 web-based sessions occurring at least 3 days apart. A total of 299 images were rated (120 images in Session 1 and 179 images in Session 2). The scale had acceptable interrater and intrarater agreement (>0.5), so scale development proceeded using the morphed images.

For both the clinical significance review and the morphed image scale validation review, Canfield provided scale developers uniform hardware to complete the reviews. Before the reviews, the external scale developers completed a web‐based PowerPoint training to familiarize themselves with the hardware, the review platform, and the purpose of the clinical significance and morphed image validation reviews. The scale developers were not allowed to discuss the reviews with one another, and each completed the reviews independently.

After the morphed image scale was created, 2 subject photographs representing each grade of the scale were selected to represent diversity in sex and Fitzpatrick skin type per grade. The final scale contains the scale descriptors for each grade, an assessment guide, the morphed images, and the real subject images (Figure 3).

Figure 3.

The Allergan Transverse Neck Lines Scale assigns a grade from none (0) to extreme (4) that describes the presence and depth of transverse lines within the area of the neck demarcated in the diagram in the upper right corner.

Scale Validation

The interrater and intrarater reliability of the final scale was evaluated in a live-subject rating validation study. Eight physician raters experienced in using aesthetic photonumeric scales who were not involved in scale development participated in two 2-day live validation sessions occurring 3 weeks apart. Before the first live evaluation session, all physician raters were trained on the use of the scale in an interactive group training session using 4 example subjects. Raters were instructed to rate only horizontal neck lines, to disregard vertical lines (e.g., platysmal bands on neck), to select a grade based on the most severe line present (with 1 line being sufficient to determine grade), and to assess effaceable versus noneffaceable lines visually and not through attempts to manually efface lines (Figure 3).

All subjects who qualified for the initial image capture events were invited to attend the live validation sessions. Subjects were instructed to arrive clean shaven, remove makeup and jewelry, wear dark pants or jeans and a provided black T-shirt, not drink alcohol excessively before the sessions, try not to alter their usual routine (e.g., their facial care routine and normal sleep or hydration patterns) between sessions, and not have tanning sessions or extensive sun exposure between sessions. Upon arrival at the study center for the first live validation session, subjects signed informed consent and were assessed for eligibility, age, sex, race (as reported by the subject), and Fitzpatrick skin type (determined by the investigator). Subjects were excluded if they had their photographs included in the scale; anything that would interfere with the visual assessment of the area of interest; any treatment with toxin/fillers, dental procedures, or surgery that would alter the area of interest within 2 weeks of the first validation session or plans to have one of these procedures between the 2 sessions; or diagnosis of pregnancy. Two-dimensional images of each subject were collected using a 2D custom studio suite at the first live validation session. The first 5 subjects rated during the first validation session were considered run-in training subjects and were excluded from the analysis.

During the first and second live scale validation sessions, each physician rater evaluated all subjects on all scales (7 additional scales for other anatomic features were evaluated at the same sessions and are reported separately16–22). Raters had separate evaluation stations with an examination lamp, table, a stool for subject seating, supplies, and the photonumeric scale mounted and displayed for use in subject evaluation. Subjects presented themselves to each rater individually and proceeded from one rating station to the next in the same order until evaluated by all 8 raters. Raters were instructed to not discuss ratings with subjects or other raters. Raters took at least a 10-minute break every hour and at least a 30-minute lunch break to avoid rater fatigue.

Statistics

To determine the utility of the scale grades for detecting clinically meaningful differences in horizontal neck lines, absolute score differences for the image pairs deemed “clinically different” or “not clinically different” during scale development were summarized (mean, standard deviation, range, 95% confidence interval [CI]). For the live-subject scale validation study, intrarater reliability was compared between Round 1 and Round 2 scores by calculating weighted kappa scores using Fleiss-Cohen weights.23 Kappa scores within the range of 0.0 to 0.20 indicate slight agreement, 0.21 to 0.40 indicate fair agreement, 0.41 to 0.60 indicate moderate agreement, 0.61 to 0.80 indicate substantial agreement, and 0.81 to 1.00 indicate almost perfect agreement.24 Interrater agreement was measured by determining the intraclass correlation coefficient (ICC [2,1]) and 95% CIs calculated using the formula described by Shrout and Fleiss.25 The a priori primary end point for the interrater agreement analysis was ICC (2,1) for the second rating session. SAS version 9.3 (Cary, NC) was used for all statistical analyses.
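As an illustration of the intrarater statistic described above, the following is a minimal Python sketch of a weighted kappa with Fleiss-Cohen (quadratic) weights for one rater's two sessions, together with the agreement labels quoted from the text. It is not the SAS code used in the study; the function names `weighted_kappa_fc` and `landis_koch_label` and the example scores are hypothetical.

```python
from collections import Counter


def weighted_kappa_fc(session1, session2, n_grades=5):
    """Weighted kappa with Fleiss-Cohen (quadratic) weights.

    session1 and session2 hold one rater's integer grades (0..n_grades-1)
    for the same subjects, in the same order.
    """
    if len(session1) != len(session2) or not session1:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(session1)
    # Observed mean squared disagreement between the two sessions.
    observed = sum((a - b) ** 2 for a, b in zip(session1, session2)) / n
    # Squared disagreement expected by chance from the marginal counts.
    m1, m2 = Counter(session1), Counter(session2)
    expected = sum(
        m1[i] * m2[j] * (i - j) ** 2
        for i in range(n_grades)
        for j in range(n_grades)
    ) / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when all scores are identical")
    # The (n_grades - 1)^2 normalisation of the weights cancels in the ratio.
    return 1 - observed / expected


def landis_koch_label(kappa):
    """Map a kappa value to the agreement ranges quoted in the text."""
    if kappa < 0:
        return "below zero (not covered by the quoted ranges)"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical scores for one rater across two sessions.
round1 = [0, 1, 2, 3, 4, 2, 1]
round2 = [0, 1, 2, 3, 3, 2, 2]
kappa = weighted_kappa_fc(round1, round2)
print(round(kappa, 2), landis_koch_label(kappa))
```

Quadratic weights penalise a 2-grade disagreement four times as heavily as a 1-grade disagreement, which suits an ordinal severity scale of this kind.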

Sample Size Considerations

The sample size for the live-subject validation sessions was calculated using the method described by Bonett.26 With up to 10 raters and an ICC of 0.5, a total of 66 subjects were needed in order to have a 95% CI with a width of 0.2 for interrater reliability. Considering the potential loss of subjects between the 2 rounds, at least 80 subjects were to be enrolled for the scale. Because 297 subjects were eligible for the scale validation analysis, the number of subjects evaluated using this scale was substantially larger than the preplanned sample size of 80, and the overall number of assessments for some grades of this scale was larger than that for the other grades. To minimize the imbalance in the number of subjects across scale grades and to meet the sample size requirement, the mean score across the 8 raters for each subject was used to assign an overall grade for each subject, and a subset of 80 subjects with minimal imbalance across the grades (∼16 subjects per each of the 5 scale grades) was randomly selected from the eligible subjects using a prespecified procedure and a preselected randomization seed. This random selection of the subset was performed 20 times. Interrater and intrarater agreements calculated for each of the 20 subsets were combined using SAS procedure PROC MIANALYZE to obtain the overall interrater and intrarater agreements.
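The figure of 66 subjects can be reproduced with the closed-form approximation commonly attributed to Bonett, n = 8z²(1 − ρ)²[1 + (k − 1)ρ]² / [k(k − 1)w²] + 1, where ρ is the planning ICC, k the number of raters, and w the desired CI width. The sketch below assumes this is the formula that was applied; the function name `bonett_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def bonett_sample_size(icc, n_raters, ci_width, confidence=0.95):
    """Approximate number of subjects needed for an ICC confidence
    interval of the requested total width (assumed Bonett approximation)."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    k = n_raters
    n = (
        8 * z ** 2 * (1 - icc) ** 2 * (1 + (k - 1) * icc) ** 2
        / (k * (k - 1) * ci_width ** 2)
        + 1
    )
    # Round up to the next whole subject.
    return math.ceil(n)


# Planning values quoted in the text: 10 raters, ICC 0.5, CI width 0.2.
print(bonett_sample_size(icc=0.5, n_raters=10, ci_width=0.2))  # 66
```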

Results

Clinical Significance Determination by Scale Developers

The mean (95% CI) absolute difference in scores was 1.22 (1.09–1.35) for image pairs identified as clinically different and 0.57 (0.42–0.72) for image pairs identified as not clinically different (Table 2). The 95% CIs for clinically different pairs did not overlap with the 95% CIs for pairs deemed not clinically different, confirming that a 1-point difference in scores is clinically significant.
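The summary statistics and the non-overlap check described above can be sketched as follows. This is an illustrative Python example, not the study's analysis: the CI uses a normal approximation (an assumption, since the text does not state the CI method), the helper names are hypothetical, and the only real values are the two intervals quoted from the text.

```python
import math
from statistics import mean, stdev


def summarize_abs_differences(score_pairs, z=1.96):
    """Mean, SD, range, and approximate 95% CI of absolute score differences.

    score_pairs is a list of (score_image_a, score_image_b) tuples.
    """
    diffs = [abs(a - b) for a, b in score_pairs]
    m = mean(diffs)
    sd = stdev(diffs)
    # Normal-approximation half-width (assumed CI method).
    half_width = z * sd / math.sqrt(len(diffs))
    return {
        "mean": m,
        "sd": sd,
        "range": (min(diffs), max(diffs)),
        "ci95": (m - half_width, m + half_width),
    }


def intervals_overlap(ci_a, ci_b):
    """Return True if two closed intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Intervals reported in the text for the two groups of image pairs.
clinically_different = (1.09, 1.35)
not_clinically_different = (0.42, 0.72)
print(intervals_overlap(clinically_different, not_clinically_different))
```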

TABLE 2.

Differences in Scores for Image Pairs Deemed Clinically Different or Not Clinically Different Using the Allergan Transverse Neck Lines Scale

Live-Subject Scale Validation

Of the 297 subjects eligible for Allergan Transverse Neck Lines Scale validation analysis, 288 subjects were selected in at least 1 of the 20 random subsets. Demographic characteristics of subjects in the final scale validation set are shown in Table 3. Most subjects were female (67%), Caucasian (79%), and had Fitzpatrick skin Type III (27%) or IV (33%). Median age was 48 years, and a broad span of ages was represented (18–83 years).
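The repeated balanced subsampling that produced these subsets (described under Sample Size Considerations) might look like the sketch below. The rounding rule for turning the mean of 8 ratings into an overall grade, the seed value, and all function names are assumptions made for illustration; the study's prespecified procedure is not given in the text.

```python
import random
from collections import defaultdict


def overall_grade(rater_scores, max_grade=4):
    """Assign an overall grade from the mean rater score.

    Rounding half up is an assumption; the text does not state the rule.
    """
    return min(max_grade, int(sum(rater_scores) / len(rater_scores) + 0.5))


def draw_balanced_subsets(scores_by_subject, per_grade=16, n_draws=20, seed=12345):
    """Draw n_draws random subsets with up to per_grade subjects per grade."""
    rng = random.Random(seed)  # fixed seed makes the draws reproducible
    by_grade = defaultdict(list)
    for subject_id in sorted(scores_by_subject):
        grade = overall_grade(scores_by_subject[subject_id])
        by_grade[grade].append(subject_id)
    subsets = []
    for _ in range(n_draws):
        subset = []
        for grade in sorted(by_grade):
            pool = by_grade[grade]
            # Take every subject when a grade has fewer than per_grade.
            subset.extend(rng.sample(pool, min(per_grade, len(pool))))
        subsets.append(subset)
    return subsets


# Hypothetical data: 100 subjects, 8 identical ratings each, 20 per grade.
scores = {i: [i % 5] * 8 for i in range(100)}
subsets = draw_balanced_subsets(scores)
ever_selected = set().union(*subsets)
print(len(subsets), len(subsets[0]), len(ever_selected))
```

Counting the subjects that appear in at least one draw, as in the last lines, corresponds to the 288-of-297 figure reported above.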

TABLE 3.

Demographics of Subjects in the Live Scale Validation Study for the Allergan Transverse Neck Lines Scale

Intrarater agreement between the 2 live-subject rating validation sessions was substantial (mean weighted kappa = 0.78) (Table 4). Interrater agreement for the Allergan Transverse Neck Lines Scale was substantial in Session 1 (0.72) and Session 2 (0.73) (Table 4).
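For readers who wish to compute the interrater statistic on their own data, the following is a minimal Python sketch of ICC(2,1) in the Shrout and Fleiss formulation (two-way random effects, single rater, absolute agreement), derived from the mean squares of a subjects-by-raters table. It does not reproduce the study's SAS analysis or its CIs; the function name `icc_2_1` and the example ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete table: one row per subject, one column per rater."""
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and raters")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_raters = n * sum((m - grand_mean) ** 2 for m in rater_means)
    ss_error = ss_total - ss_subjects - ss_raters
    # Mean squares: between subjects, between raters, residual.
    bms = ss_subjects / (n - 1)
    jms = ss_raters / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical grades from 3 raters for 5 subjects.
example = [
    [0, 0, 1],
    [1, 1, 1],
    [2, 3, 2],
    [3, 3, 4],
    [4, 4, 4],
]
print(round(icc_2_1(example), 2))
```

Because the rater mean square enters the denominator, systematic differences between raters lower ICC(2,1), which is why it is read as a measure of absolute agreement rather than mere consistency.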

TABLE 4.

Physician Intrarater and Interrater Agreement on the Allergan Transverse Neck Lines Scale (Validation Testing With Live Subjects)

Discussion

This study demonstrated substantial interrater and intrarater agreement for the Allergan Transverse Neck Lines Scale, indicating that the scale is reliable for multiple assessments of the same subject and across different raters. A 1-point difference in scale ratings was shown to reflect clinically significant differences, indicating that the scale has sufficient sensitivity for detecting clinically significant changes in horizontal lines of the neck.

The scale requires that effaceable versus noneffaceable lines be assessed visually, not manually; most physicians with experience in the treatment of neck lines can generally tell whether the line is effaceable with visual inspection alone. The scale uses morphed images to represent each grade to focus the rater's attention on the change from one grade to the next, with all other features remaining constant across scale grades. Real-world images representing a diverse range of skin types across sexes and races are an important addition to the scale because morphed images may not always translate to the broad array of appearances or physical changes observed in the clinic. Representation of both sexes and multiple ethnic groups in rating scales is important, as growing numbers of men and members of diverse ethnic groups are seeking aesthetic facial treatment.4,27

Patients are increasingly seeking aesthetic treatment for areas other than the face, including the neck. In the experience of the authors, transverse neck lines are often observed in younger patients, even those without extensive photodamage. In some middle-aged patients, the neck is much more severely damaged than the face, making neck lines a chief concern. Restoration of a more normal neck appearance can substantially improve self-esteem and confidence. Clinicians need a way to both educate and assess patients for neck line treatments, and the Allergan Transverse Neck Lines Scale provides standardized ratings that may be uniformly applied in day-to-day clinical practice and potentially in clinical trials, due to its validation in live subjects and use of both morphed and unaltered images.

The Allergan Transverse Neck Lines Scale is not used to rate vertical neck lines. In the experience of the authors, neck treatments such as botulinum toxin Type A are especially useful for improving the appearance of the neck and jaw line rather than just reducing lines; the loss of downward pull and the softening of vertical lines are also important considerations with neck treatments. More generic wrinkle scales may be helpful for assessing vertical neck lines.9,10,12

Study Limitations

The scale developers solely determined the clinical significance of scale scores; although a 1-point change on the scale was considered meaningful to the scale developers, it may or may not be meaningful to subjects. Hence, this scale is not intended for patient self-assessment of meaningful improvement. Use of the FACE-Q appearance appraisal scale, a validated patient satisfaction instrument with a subscale for satisfaction with the neck, may be helpful for capturing the perspective of the patient on the appearance before and after treatment.28 Finally, the verbal descriptors for each grade on the Allergan Transverse Neck Lines Scale are subjective. However, the descriptors were developed and refined during extensive collaboration among 9 clinical experts to minimize inherent subjectivity.

Conclusions

Because increasing numbers of patients are seeking aesthetic treatment of the neck, there is a need for a validated scale for the assessment of neck lines. The Allergan Transverse Neck Lines Scale includes user-friendly diagrams, detailed verbal descriptions, and morphed and real subject images representative of both sexes and diverse skin types. The scale demonstrated substantial intrarater and interrater agreement among physicians, and a 1-point score difference was shown to reflect clinically significant differences in horizontal neck lines. The scale meets FDA criteria for validated clinical outcome measures in clinical trials and provides standardized ratings that can be uniformly applied by dermatologists and plastic surgeons who treat patients seeking treatment of horizontal lines of the neck.

Acknowledgments

The authors thank the following physicians for completing the scale validation study: David E. Bank, MD, FAAD; Sue Ellen Cox, MD; Timothy M. Greco, MD, FACS; Z. Paul Lorenc, MD, FACS; David J. Narins, MD, PC, FACS; William B. Nolan, MD; Robert A. Weiss, MD; and Margaret Weiss, MD. Statistical support was provided by Yijun Sun, PhD, and Shraddha Mehta, PhD of Allergan plc, Irvine, CA.

Footnotes

Supported by Allergan plc, Dublin, Ireland. Editorial support for this article was provided by Peloton Advantage, Parsippany, New Jersey, and was funded by Allergan plc. The authors received an honorarium for participating in scale development and validation.

B. Hardas, A. Marx, and D.K. Murphy are employees of Allergan plc. L. Creutz provided medical writing services at the request of the authors, which was funded by Allergan plc. The remaining authors have indicated no significant interest with commercial supporters. The opinions expressed in this article are those of the authors. The authors received no honorarium or other form of financial support related to the development of this article.

References

  • 1. Chao YY, Chiu HH, Howell DJ. A novel injection technique for horizontal neck lines correction using calcium hydroxylapatite. Dermatol Surg 2011;37:1542–5.
  • 2. Brandt FS, Boker A. Botulinum toxin for the treatment of neck lines and neck bands. Dermatol Clin 2004;22:159–66.
  • 3. Raspaldo H, Niforos FR, Gassia V, Dallara JM, et al. Lower-face and neck antiaging treatment and prevention using onabotulinumtoxin A: the 2010 multidisciplinary French consensus–part 2. J Cosmet Dermatol 2011;10:131–49.
  • 4. Carruthers JD, Glogau RG, Blitzer A. Advances in facial rejuvenation: botulinum toxin type a, hyaluronic acid dermal fillers, and combination therapies–consensus recommendations. Plast Reconstr Surg 2008;121(Suppl 5):5S–30S.
  • 5. Ascher B, Talarico S, Cassuto D, Escobar S, et al. International consensus recommendations on the aesthetic usage of botulinum toxin type A (Speywood Unit)—Part II: wrinkles on the middle and lower face, neck and chest. J Eur Acad Dermatol Venereol 2010;24:1285–95.
  • 6. Dayan SH, Maas CS. Botulinum toxins for facial wrinkles: beyond glabellar lines. Facial Plast Surg Clin North Am 2007;15:41–9, vi.
  • 7. Sarnoff DS, Gotkin RH. ACELIFT: a minimally invasive alternative to a facelift. J Drugs Dermatol 2014;13:1038–46.
  • 8. Agarwal A, Dejoseph L, Silver W. Anatomy of the jawline, neck, and perioral area with clinical correlations. Facial Plast Surg 2005;21:3–10.
  • 9. Oram Y, Akkaya AD. Neck rejuvenation with fractional CO2 laser: long-term results. J Clin Aesthet Dermatol 2014;7:23–9.
  • 10. Bencini PL, Tourlaki A, Galimberti M, Pellacani G. Non-ablative fractionated laser skin resurfacing for the treatment of aged neck skin. J Dermatolog Treat 2015;26:252–6.
  • 11. Abraham MT, Ross EV. Current concepts in nonablative radiofrequency rejuvenation of the lower face and neck. Facial Plast Surg 2005;21:65–73.
  • 12. Alexiades M, Berube D. Randomized, blinded, 3-arm clinical trial assessing optimal temperature and duration for treatment with minimally invasive fractional radiofrequency. Dermatol Surg 2015;41:623–32.
  • 13. Fabi SG, Goldman MP. Retrospective evaluation of micro-focused ultrasound for lifting and tightening the face and neck. Dermatol Surg 2014;40:569–75.
  • 14. Woodward JA, Fabi SG, Alster T, Colon-Acevedo B. Safety and efficacy of combining microfocused ultrasound with fractional CO2 laser resurfacing for lifting and tightening the face and neck. Dermatol Surg 2014;40(Suppl 12):S190–3.
  • 15. U.S. Department of Health and Human Services, Food and Drug Administration. Guidance for industry: patient-reported outcome measures: use in medical product development to support labeling claims. 2009. Available from: http://www.fda.gov/downloads/Drugs/Guidances/UCM193282.pdf. Accessed July 21, 2016.
  • 16. Jones D, Donofrio L, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of volume deficit of the hand. Dermatol Surg 2016;42(Suppl 10):S195–202.
  • 17. Sykes JM, Carruthers A, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for assessment of chin retrusion. Dermatol Surg 2016;42(Suppl 10):S211–18.
  • 18. Donofrio L, Carruthers A, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of facial skin texture. Dermatol Surg 2016;42(Suppl 10):S219–26.
  • 19. Carruthers J, Jones D, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of volume deficit of the temple. Dermatol Surg 2016;42(Suppl 10):S203–10.
  • 20. Carruthers J, Donofrio L, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of facial fine lines. Dermatol Surg 2016;42(Suppl 10):S227–34.
  • 21. Carruthers A, Donofrio L, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of static horizontal forehead lines. Dermatol Surg 2016;42(Suppl 10):S243–50.
  • 22. Donofrio L, Carruthers J, Hardas B, Murphy DK, et al. Development and validation of a photonumeric scale for evaluation of infraorbital hollows. Dermatol Surg 2016;42(Suppl 10):S251–58.
  • 23. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measure of reliability. Educ Psychol Meas 1973;33:613–9.
  • 24. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–74.
  • 25. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–8.
  • 26. Bonett DG. Sample size requirements for estimating intraclass correlations with desired precision. Stat Med 2002;21:1331–5.
  • 27. American Society of Plastic Surgeons. 2014 Plastic surgery statistics report. 2015. Available from: http://www.plasticsurgery.org/Documents/news-resources/statistics/2014-statistics/plastic-surgery-statsitics-full-report.pdf. Accessed July 21, 2016.
  • 28. Klassen AF, Cano SJ, Scott AM, Pusic AL. Measuring outcomes that matter to face-lift patients: development and validation of FACE-Q appearance appraisal scales and adverse effects checklist for the lower face and neck. Plast Reconstr Surg 2014;133:21–30.

Articles from Dermatologic Surgery are provided here courtesy of Wolters Kluwer Health
