Abstract
Introduction:
The U.S. Food and Drug Administration’s The Real Cost campaigns inform the public about the harms of tobacco use. The U.S. Food and Drug Administration has followed a 2-part approach to formative testing of ads to ensure sufficient audience receptivity: preproduction qualitative testing and postproduction quantitative testing. This process provides safeguards for ad performance, but course corrections to message selection or development after the ads are already fully produced can be time and resource intensive. This study assesses equivalence in message evaluations between pre and postproduction campaign ads.
Methods:
Data were collected in 2023 from an online panel survey of 804 youth aged 13–17 years who currently use or are susceptible to using E-cigarettes or cigarettes. Participants were randomly assigned to a set of 4 preproduction or postproduction The Real Cost campaign ads. Participants completed a series of message evaluation measures after each ad. The authors tested for equivalence of mean message evaluation scores between the pre and postproduction versions of each ad using 2 one-sided t-tests and estimated concordance correlation coefficients. Data were analyzed in 2024.
Results:
Aggregate ad performance scores between pre and postproduction versions were equivalent. Patterns varied for individual ads, with most demonstrating equivalence on some but not all message evaluation measures. Concordance correlation coefficients for pre and postproduction ads were positive and moderate in size.
Conclusions:
Results suggest that message evaluation assessments of preproduction ads are likely a good indicator of message evaluation assessments for postproduction ads. Public health practitioners may consider incorporating quantitative copy testing earlier in the formative testing process to identify the most promising ad concepts.
INTRODUCTION
The Family Smoking Prevention and Tobacco Control Act grants the U.S. Food and Drug Administration (FDA) broad authority to regulate the manufacture, distribution, and marketing of tobacco products.1 As part of this authority, FDA’s Center for Tobacco Products is tasked with educating the public, including youth, about the dangers of using tobacco products. Public education campaigns are an evidence-based strategy to prevent tobacco use among youth.2 In 2014, FDA launched The Real Cost Youth Cigarette Prevention Campaign to educate youth aged 12–17 years about the dangers of cigarette smoking. The Real Cost brand was later expanded in 2018 to include other tobacco products, including The Real Cost Youth E-Cigarette Prevention Campaign.
The Real Cost campaigns use a scientific approach to campaign design, implementation, and evaluation. When developing new ad content for The Real Cost campaigns, the FDA integrates lessons learned from the performance of past ads. For example, after research found that framing addiction-related messages as threats to independence was effective with youth, subsequent addiction-focused ads incorporated the same loss-of-control framing strategies.3 In addition, new ad content incorporates new or emerging research on teen trends and tobacco-related science. The campaigns continually monitor the tobacco, teen, and media landscapes using national data sets such as the National Youth Tobacco Survey and the Youth Risk Behavior Surveillance System, Google search trends, social media listening, conversations with subject-matter experts, and in-depth interviews with teens about their lives and their relationship with tobacco. This information is used to develop campaign messages and ad concepts, which may undergo formative testing with members of the intended audiences. For example, this monitoring revealed that bathrooms were a key location for E-cigarette use among teens, which directly informed the development of an ad set in a high-school bathroom.
Once a campaign ad is launched, performance is continually measured and optimized through digital metrics (e.g., impressions, video completion rate). The FDA also conducts outcome evaluation surveys to measure the impact of ad exposure on tobacco-related knowledge, attitudes, and behaviors. These activities ensure that a scientific approach is used throughout the campaign lifecycle and have led to effective ads that have demonstrated high receptivity,4 changes in campaign-related beliefs and attitudes,5,6 and behavioral change7,8 among the intended audiences.
The FDA has developed criteria for formative testing that enable the development, rigorous testing, and timely launch of new or refreshed ad concepts to keep pace with a rapidly changing tobacco marketplace and an ever-evolving teen audience. When determining whether an ad concept will undergo premarket testing, the FDA considers whether the concept is for a new audience, whether it includes a new tobacco-related fact, and whether it is part of an established messaging approach that has previously demonstrated effectiveness, in addition to adherence to federal rules and guidance on such testing. For ads that undergo formative testing, the FDA has generally followed a 2-part approach to assess and ensure sufficient receptivity among the intended audience: preproduction qualitative testing and postproduction quantitative testing.9
The first stage of formative testing seeks to gather audience feedback on ad concepts or animatics, which are draft visual representations of what an ad could look like when fully produced. Animatics are commonly used by commercial advertisers and public health practitioners as a cost-efficient way to pretest ads and concepts before spending the time and budget to produce them. These animatics (also referred to as preproduction ads) take a variety of formats, such as still images with a voiceover, basic line drawings, and higher-resolution color video animatics (Figure 1 provides examples of preproduction concepts from The Real Cost campaigns). Once animatics are developed, FDA tests them in focus group discussions with the campaign’s intended audience, in which groups are shown 4 to 5 animatics and asked for feedback on the main message, believability, emotional response, likes/dislikes, and format/tone after viewing each concept. On the basis of qualitative data from the focus groups, FDA selects and refines promising ad concepts for production.
Figure 1.

Screenshot examples of pre and postproduction concepts from The Real Cost Campaigns.
Notes: Figure 1A–C displays screenshots from preproduction versions of each ad concept, and Figure 1D–F displays screenshots from the postproduction versions of each ad. Figure 1A is a full-color animatic. Figure 1B is a still image with voiceover. Figure 1C is a black-and-white boardomatic with voiceover.
After production, ads may be quantitatively tested with the campaign’s intended audience through postproduction testing (i.e., copy testing) prior to being launched in the market. A substantial body of research on quantitative message testing indicates that this approach can predict a message’s potential to impact audience campaign-related knowledge, attitude, and behavioral intentions.10,11 The FDA assesses multiple dimensions of message evaluation (ME) commonly included in the literature, including perceived effectiveness (PE) measures,12,13 emotional responses,14 and psychological reactance.15
FDA’s formative research process provides safeguards for ad performance at multiple stages of production. Qualitative preproduction testing of animatics provides a low-cost avenue to explore creative ideas, gather in-depth feedback and perspectives from the campaign’s intended audience, and make revisions before the advertisement goes into full production. Quantitative postproduction testing, in turn, can triangulate preproduction findings and offset limitations of qualitative research by enabling direct standardized comparisons across messages; generalizing findings; and assessing the impact of the ad on beliefs, attitudes, and behavioral intentions.16
Although this testing process has led to content that has demonstrated high effectiveness, it has also presented a few challenges. First, a 2-part approach to formative testing increases both the timeline and budget for ad production. Second, decision making around which ads to produce is based solely on qualitative data. Quantitative ME measures are collected after the ads have been developed and produced, which limits the ability to easily refine the ads on the basis of quantitative results. Because of these limits, quantitative measures serve essentially as a safeguard (e.g., preventing ads that score poorly on quantitative measures from being launched into the market or prompting minor postproduction changes, such as voiceover script refinement, when quantitative testing suggests confusion or unintended consequences among the intended audience).
One solution to address these challenges is to use a mixed-methods design with concurrent timing rather than sequential timing; that is, incorporate quantitative ad effectiveness measures alongside preproduction qualitative focus group testing. In addition to yielding significant cost savings, this approach offers several logistical and methodologic benefits. First, if any red flags or unintended consequences are identified through quantitative measures, it is less time and resource intensive to make changes to the concept before the ad has gone through the production process. Second, collecting quantitative measures during preproduction testing allows for testing a much larger number of concepts and/or multiple variations of the same concept, better leveraging the diagnostic capabilities of message effectiveness measures and ensuring that the strongest ads will be produced. Third, a concurrent mixed-methods approach to preproduction testing allows for timely and concurrent data triangulation to enable more robust decision making around ad selection to go into production. Finally, this streamlined approach allows for adaptation to a rapidly changing marketplace and the ability to launch new concepts into the market more quickly while also ensuring a robust evidence base for ad production decisions.
Despite these potential benefits of incorporating quantitative measures earlier in the formative testing process, more research is needed to understand how quantitative ME measures perform with ad concepts in the preproduction stage. Literature evaluating the predictive validity of ME measures has primarily been conducted using postproduction ads,11 and limited research has been conducted to assess the extent to which preproduction ad performance is equivalent to postproduction ad performance. A small, older body of commercial advertising research with adult participants found that assessments at the animatic stage of concept development generally correspond to those of the finished execution, including emotional responses, but may be less effective for concepts that will rely on audio–visual techniques to create an emotional response (e.g., food commercials focused on appetite and pleasure appeals).17,18 A more recent study adds that animatics may be perceived as equally credible and effective as fully produced ads when focus group moderators explicitly instruct participants that the ad is a draft or preview of an ad that will later be developed.19
However, little research in this area has been conducted in recent years, and there have been many technological advancements not reflected in this older body of work (e.g., the ability to create highly finessed animation as opposed to basic storyboards or other static images). This study updates and adds to the body of research by assessing equivalence in MEs between preproduction animatics and fully produced ads among youth. In addition, the study assesses the extent to which an ad’s relative performance assessed before production is diagnostic of its relative performance assessed after production.
METHODS
Study Sample
Data for this study were obtained through an online, cross-sectional survey (N=804) conducted between August 10 and September 14, 2023. The study was approved by Advarra IRB. To recruit participants, an online survey vendor, Dynata, sent invitations to adults in their panel who were likely to have eligible children on the basis of their panel profiles. Once parental permission was obtained, youth were asked to give their assent to participate and were then directed to a brief screener to determine eligibility.
Eligible participants were U.S.-based youth (aged 13–17 years) who reported current use of or susceptibility to E-cigarettes or cigarettes. Current use was defined as having used E-cigarettes or cigarettes on 1 or more of the past 30 days. Lifetime use was defined as having used E-cigarettes or cigarettes at least once but not in the past 30 days. Susceptibility to E-cigarettes and cigarettes was determined using an adapted version of an established susceptibility measure in which respondents who answered definitely yes, probably yes, or probably not to at least 1 of the following were considered susceptible: intent to vape or smoke in the next year, intent to vape or smoke soon, or likelihood that they would try vaping or smoking if offered by a friend.20
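As a minimal illustration of the susceptibility coding rule described above, the classification can be sketched in Python. The function name, response strings, and item ordering are assumptions for this example, not the study's actual instrument or code.

```python
# Hypothetical sketch of the susceptibility classification described above.
# A respondent is coded susceptible if ANY of the 3 intention items (vape/smoke
# in the next year, vape/smoke soon, try if offered by a friend) is answered
# "definitely yes", "probably yes", or "probably not".
SUSCEPTIBLE_RESPONSES = {"definitely yes", "probably yes", "probably not"}

def is_susceptible(item_responses):
    """Return True if at least 1 item response indicates susceptibility."""
    return any(r.lower() in SUSCEPTIBLE_RESPONSES for r in item_responses)

# Only respondents answering "definitely not" to all 3 items are not susceptible.
print(is_susceptible(["definitely not", "probably not", "definitely not"]))   # -> True
print(is_susceptible(["definitely not", "definitely not", "definitely not"]))  # -> False
```

Note that, under this rule, "probably not" still counts as susceptible; only a firm "definitely not" on every item classifies a respondent as not susceptible.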
Potential participants were excluded if they reported established cigarette smoking (smoked at least 100 cigarettes in their lifetime), if they had participated in tobacco-related research in the past 3 months, or if they or a family member currently or previously worked for a tobacco company. Participants were assigned to either an E-cigarette or a cigarette use group on the basis of their reported use of and susceptibility to E-cigarettes and cigarettes, with quotas to achieve an even distribution between use groups. For example, a respondent who reported current use of E-cigarettes only and was not susceptible to and had never used cigarettes was assigned to the E-cigarette use group. Respondents who qualified for both use groups were assigned to the cigarette group until the cigarette use group quota was filled. Within each use group, quotas were set to achieve a balanced distribution of participants who were susceptible to versus those who had ever used the product.
Within each use group, participants were randomly assigned to 1 of 2 experimental study conditions. Those in the preproduction ad condition were shown a series of preproduction ads (e.g., animatic, still image series); those in the postproduction ad condition were shown corresponding fully produced video ads designed for airing on online video, streamed TV, and/or broadcast TV (examples are provided in Figure 1; full descriptions are provided in Appendix Table 1, available online; and key frames are provided in Appendix Table 2, available online). This resulted in the following 4 study conditions: E-cigarette preproduction ads (n=199), E-cigarette postproduction ads (n=203), cigarette preproduction ads (n=200), and cigarette postproduction ads (n=202). A built-in least-fill randomizer function was used on the survey platform to achieve a relatively even distribution across groups.
Each ad set (E-cigarette and cigarette) included 8 The Real Cost ads ranging from 15 to 45 seconds and 1 comparison ad (i.e., an informational ad featuring text and narration describing tobacco product features). Comparison ads were identical between pre and postproduction conditions and were used for descriptive comparison with mean scores for The Real Cost ads; results for comparison ads are not shown in the primary results tables. Appendix Tables 3 and 4 (available online) display mean scores for pre and postproduction The Real Cost and comparison ads. Within each study condition, respondents were shown a random selection of 4 of the 9 ads and completed a series of ME measures after each ad.
Measures
PE was assessed using a scaled measure of the following items: This ad is… worth remembering, powerful, informative, meaningful, convincing, and This ad grabbed my attention, on a 5-point scale (1=strongly disagree to 5=strongly agree).12 Effects perceptions were captured using a scaled measure of the following items: How much does this message… Make you worry about what vaping/smoking cigarettes will do to you, Make you think vaping/smoking cigarettes is a bad idea, and Discourage you from vaping/smoking cigarettes, on a 5-point scale (1=not at all to 5=a great deal).13 Reactance was assessed through a scaled measure of the following items: This ad… is trying to manipulate me, annoys me, and is overblown, on a 5-point scale (1=strongly disagree to 5=strongly agree).15 Negative emotional reactions included 3 individual items (sad, afraid, and angry) rated on a 5-point scale (1=not at all to 5=very). Positive emotional reactions comprised 3 individual items (amused, surprised, and hopeful) rated on the same 5-point scale.
Statistical Analysis
To confirm that the random assignment to ad condition worked as intended, Fisher’s exact tests were used to determine whether there were nonrandom associations between ad condition (before versus after production) and age, race/ethnicity, and geographic region of residence. Tests were conducted for each tobacco product (E-cigarettes and cigarettes) separately.
The study addressed the following research questions: (1) To what extent do ad reactions vary between pre and postproduction ad versions? Compared with preproduction ads, are postproduction versions perceived as less, more, or equally effective? (2) To what extent are ad reactions for preproduction ads associated with ad reactions for postproduction ads?
Research Question 1 tested for equivalence of mean ME scores between pre and postproduction ads. Equivalence testing is a statistical procedure that involves testing whether the means of 2 groups are practically equivalent, according to a predetermined equivalence range. A difference in mean scores that is within the equivalence range is considered practically equivalent; differences that are greater than the equivalence range are not. To determine an appropriate equivalence range, data from a recent study were used, which included assessment of audience reactions to 78 postproduction E-cigarette prevention ads selected from FDA, Truth Initiative, and state and county public health agencies.21 For each ME measure, ad-level data were used to calculate an equivalence range as the difference between the mean and lowest score. For example, in this study, the average mean PE score (range=1–5) was 3.93, and the lowest score was 3.31; the difference in these scores—0.62—was used to represent the equivalence range for PE. This same approach was used to determine the equivalence range for other ME measures. To be considered practically different, the difference between pre and postproduction ads (accounting for uncertainty) would need to be greater than the equivalence range for a given ME measure.
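The equivalence-range rule described above can be sketched in a few lines of Python. The 3 ad-level values below are fabricated solely so that their mean (3.93) and minimum (3.31) match the PE figures reported in the text; they are not the study's data.

```python
# Sketch of the equivalence-range rule described above: for each ME measure,
# the range is the mean ad-level score minus the lowest ad-level score.
def equivalence_range(ad_level_scores):
    mean_score = sum(ad_level_scores) / len(ad_level_scores)
    return mean_score - min(ad_level_scores)

# Fabricated ad-level PE means chosen so the mean is 3.93 and the minimum 3.31,
# reproducing the reported equivalence range of 0.62.
pe_scores = [3.31, 3.93, 4.55]
print(round(equivalence_range(pe_scores), 2))  # -> 0.62
```

In the study itself this calculation used ad-level data from 78 postproduction E-cigarette prevention ads, not the 3 illustrative values shown here.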
Equivalence of mean ME scores between pre and postproduction ads was tested within a symmetric equivalence interval, using a 2 one-sided t-tests approach.22 Using Stata’s tostt procedure, for each ME measure, the authors conducted unpaired t-tests with null hypotheses of the following form: H01: Δ − (mean[preproduction ME score] − mean[postproduction ME score]) ≤ 0; H02: (mean[preproduction ME score] − mean[postproduction ME score]) + Δ ≤ 0, where the equivalence interval for the true difference in means ranges from −Δ to +Δ, and Δ is the equivalence range expressed in the same units as the pre and postproduction ME scores. Each null hypothesis is a statement of nonequivalence; that is, if either null hypothesis is true, then the mean scores are not equivalent. Thus, both null hypotheses must be rejected to consider the mean scores equivalent. Each test was conducted using an alpha of 0.05. Analyses were conducted overall for each ad set (E-cigarette versus cigarette prevention ads) and for each individual ad.
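As a rough illustration (not the authors' code), the 2 one-sided t-tests described above can be sketched in Python. The function name, the simulated scores, and the pooled-variance (equal-variance) choice are assumptions for this example; Stata's tostt also supports an unequal-variance option not shown here.

```python
import numpy as np
from scipy import stats

def tost_unpaired(pre, post, delta, alpha=0.05):
    """Two one-sided t-tests (TOST) for equivalence of two independent means.
    Equivalence is concluded only if BOTH one-sided nulls are rejected."""
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    n1, n2 = len(pre), len(post)
    diff = pre.mean() - post.mean()
    # Pooled standard error for the unpaired, equal-variance t-test.
    sp2 = ((n1 - 1) * pre.var(ddof=1) + (n2 - 1) * post.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    # H01: diff >= delta  (rejected when the left-tail p-value is small)
    p1 = stats.t.cdf((diff - delta) / se, df)
    # H02: diff <= -delta (rejected when the right-tail p-value is small)
    p2 = stats.t.sf((diff + delta) / se, df)
    return max(p1, p2) < alpha  # equivalent at level alpha iff both rejected

# Simulated 1-5 scale scores with a small true difference (0.1), well inside
# the PE equivalence range of 0.62, so the test should conclude equivalence.
rng = np.random.default_rng(0)
pre = rng.normal(3.9, 0.8, 200)   # hypothetical preproduction PE scores
post = rng.normal(4.0, 0.8, 200)  # hypothetical postproduction PE scores
print(tost_unpaired(pre, post, delta=0.62))  # -> True
```

Using max(p1, p2) mirrors the logic in the text: a single failed one-sided test is enough to withhold a conclusion of equivalence.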
Research Question 2 estimated concordance correlation coefficients (CCCs) between pre and postproduction ME scores. The CCC assesses agreement on a continuous measure obtained by 2 persons or methods,23,24 enabling examination of the extent of agreement between pre and postproduction ads. CCCs range from −1 to 1, with 1 indicating perfect positive agreement, −1 indicating perfect negative agreement, and 0 indicating no agreement. To facilitate these analyses, the authors created an ad-level data set with 1 observation per ad (n=16; 8 E-cigarette prevention ads and 8 cigarette prevention ads), with records of mean ME scores for each version (before and after production) of each ad. Because of the small ad-level sample size, these analyses combined all E-cigarette and cigarette prevention ads.
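Lin's CCC can be computed directly from paired ad-level means, as sketched below. This is an illustrative implementation under the standard formula; the function name and toy inputs are hypothetical, not the authors' code or data.

```python
import numpy as np

def lin_ccc(x, y):
    """Lin's concordance correlation coefficient between paired scores,
    e.g., ad-level mean ME scores before (x) and after (y) production.
    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    sx2, sy2 = x.var(), y.var()          # population (ddof=0) variances
    sxy = ((x - mx) * (y - my)).mean()   # population covariance
    return 2 * sxy / (sx2 + sy2 + (mx - my) ** 2)

# Pairs falling exactly on the 45-degree line yield perfect agreement.
print(lin_ccc([1, 2, 3], [1, 2, 3]))  # -> 1.0
```

Unlike a Pearson correlation, the CCC penalizes both location shifts (the squared mean difference in the denominator) and scale differences, so it captures agreement rather than mere linear association.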
RESULTS
Table 1 shows sample characteristics. The distribution of age categories ranged from 14.9% (age 13 years) to 25.1% (age 16 years). The majority of respondents were White, non-Hispanic (62.6%). Geographic residence varied, with the greatest proportion of respondents (40.4%) reporting that they lived in the South. Across categories of E-cigarette use and susceptibility, the greatest proportion of respondents (44.8%) had never used E-cigarettes and were susceptible to using them. Across categories of cigarette use and susceptibility, the greatest proportion of respondents (40.0%) had never smoked cigarettes and were not susceptible to using them. Current use of other tobacco products ranged from 2.1% (smokeless tobacco) to 9.7% (cigars). No statistically significant (p<0.05) associations were found between experimental study condition and age, race/ethnicity, or geographic region of residence, suggesting that assignment to condition was adequately randomized.
Table 1.
Sample Characteristics
| Variable | % | n |
|---|---|---|
| Age, years | ||
| 13 | 14.9 | 120 |
| 14 | 15.9 | 128 |
| 15 | 20.1 | 162 |
| 16 | 25.1 | 202 |
| 17 | 23.9 | 192 |
| Race/ethnicity | ||
| White, NH | 62.6 | 503 |
| Black, NH | 9.7 | 78 |
| Asian, NH | 2.5 | 20 |
| Other/multiracial, NH | 7.0 | 56 |
| Hispanic | 18.3 | 147 |
| Region | ||
| Northeast | 19.0 | 153 |
| Midwest | 21.8 | 175 |
| South | 40.4 | 325 |
| West | 18.8 | 151 |
| E-cigarette use status | ||
| Never used, not susceptible | 1.4 | 11 |
| Never used, susceptible | 44.8 | 360 |
| Susceptible lifetime user | 15.0 | 121 |
| Current user | 38.8 | 312 |
| Cigarette smoking status | ||
| Never used, not susceptible | 40.0 | 322 |
| Never used, susceptible | 30.3 | 244 |
| Susceptible lifetime user | 11.7 | 94 |
| Current user | 17.9 | 144 |
| Current use of other tobacco products | ||
| Smokeless tobacco | 2.1 | 17 |
| Cigars | 9.7 | 78 |
| Hookah | 5.3 | 43 |
Note: Other/multiracial, NH includes those identifying as American Indian/Native Alaskan, Native Hawaiian/other Pacific Islander, or multiple races and non-Hispanic ethnicity.
NH, non-Hispanic.
Tables 2 and 3 show raw differences in mean ME scores for E-cigarette and cigarette prevention ads, with positive scores reflecting ads for which postproduction scores were higher than those of preproduction ads and negative scores reflecting ads with higher scores on preproduction than on postproduction versions. Overall, across both E-cigarette and cigarette prevention ads, mean ME scores between pre and postproduction versions were equivalent. Patterns varied for individual ads, with most ads demonstrating equivalence on some but not all measures. In most cases of nonequivalence, postproduction ads scored higher than preproduction ads. Among 72 equivalence tests for individual E-cigarette ad pairs, 60 were equivalent, and 12 were not equivalent. Among 72 equivalence tests for individual cigarette ad pairs, 67 were equivalent, and 5 were not equivalent.
Table 2.
Difference in Mean ME Scores for E-cigarette Prevention Ads (Post–Pre)
| Ad | Perceived effectiveness | Effects perceptions | Reactance | Sad | Afraid | Angry | Amused | Surprised | Hopeful |
|---|---|---|---|---|---|---|---|---|---|
| Overall | 0.09 | 0.00 | −0.07 | 0.18 | 0.18 | 0.06 | −0.04 | 0.11 | −0.08 |
| Scary Enough | 0.51 | 0.31 | −0.28 | 0.22 | 0.63 | 0.36 | −0.37 | 0.35 | 0.00 |
| Awkward Silence: Ride Home | −0.13 | −0.25 | −0.02 | 0.49 | 0.31 | 0.03 | −0.16 | −0.29 | −0.27 |
| Possessed | 0.28 | 0.24 | −0.03 | 0.35 | 0.34 | 0.44 | −0.50 | 0.10 | −0.32 |
| Epidemic | 0.50 | 0.38 | −0.13 | 0.41 | 0.32 | 0.06 | −0.05 | 0.72 | −0.07 |
| No Vape in Team: Cheer | −0.15 | −0.20 | −0.14 | −0.19 | −0.28 | −0.42 | −0.24 | −0.15 | 0.08 |
| No Vape in Team: Football | −0.33 | −0.47 | 0.14 | 0.20 | 0.13 | 0.13 | 0.10 | 0.18 | −0.27 |
| Nicotine Addiction Isn’t Pretty: Bathroom | −0.13 | −0.19 | 0.05 | 0.25 | −0.03 | 0.06 | 0.55 | −0.05 | −0.02 |
| Macroscopic Metals | 0.20 | 0.20 | −0.15 | −0.02 | 0.20 | −0.08 | 0.08 | 0.07 | 0.09 |
Note: Boldface indicates that scores are equivalent between pre and postproduction ads (within equivalence bounds for a given metric).
ME, message evaluation.
Table 3.
Difference in Mean ME Scores for Cigarette Prevention Ads (Post–Pre)
| Ad | Perceived effectiveness | Effects perceptions | Reactance | Sad | Afraid | Angry | Amused | Surprised | Hopeful |
|---|---|---|---|---|---|---|---|---|---|
| Overall | 0.23 | 0.15 | −0.10 | −0.05 | −0.03 | 0.06 | 0.03 | 0.17 | 0.11 |
| Said Every Smoker Ever | 0.13 | 0.23 | 0.03 | 0.13 | −0.01 | 0.17 | 0.16 | 0.21 | 0.07 |
| The Auctioneer | 0.28 | 0.12 | −0.24 | −0.22 | −0.19 | −0.07 | −0.04 | −0.19 | 0.02 |
| Little Lungs: Snowboard | 0.20 | 0.05 | 0.12 | −0.04 | 0.01 | −0.01 | −0.28 | 0.19 | −0.04 |
| Little Lungs: Celebrity | 0.25 | 0.16 | −0.14 | 0.04 | −0.07 | −0.01 | 0.29 | 0.28 | −0.02 |
| Delivery | 0.18 | 0.19 | −0.09 | 0.41 | 0.26 | 0.08 | −0.40 | 0.37 | −0.09 |
| Contract | 0.24 | 0.15 | −0.22 | 0.07 | −0.21 | 0.01 | 0.05 | −0.14 | 0.13 |
| Smoke Army | 0.32 | 0.10 | −0.25 | −0.29 | −0.04 | 0.08 | 0.00 | 0.32 | −0.02 |
| Tooth | 0.08 | 0.14 | 0.02 | −0.42 | −0.02 | 0.31 | −0.08 | 0.08 | 0.49 |
Note: Boldface indicates scores are equivalent between pre and postproduction ads (within equivalence bounds for a given metric).
ME, message evaluation.
Figure 2 shows mean scores and CCCs for pre and postproduction ads, by ME measure. To aid interpretation, each panel includes a gray 45-degree line (indicative of perfect positive agreement) to illustrate deviation of paired values from perfect positive agreement. Across all ME measures, CCCs for pre and postproduction ads were positive, ranging from 0.40 (reactance) to 0.79 (surprised). Although there are no objective criteria for evaluating CCC scores, Lawrence and Chinchilli25 suggest that comparisons should be made for similar ranges of measurement and to historical values for a similar method of measurement. In the absence of comparable historical values, Altman26 suggests a general rule of thumb that scores <0.20 are poor and those ≥0.80 are excellent.
Figure 2.

Mean scores and concordance correlation coefficients for pre and postproduction ads.
DISCUSSION
This study used 16 pre/postproduction sets of ad concepts from The Real Cost campaigns to first assess the extent to which preproduction ME ad performance is equivalent to ME of the same concept’s postproduction ad performance. Overall, the mean scores for audience reactions to both E-cigarette and cigarette prevention ads were substantively equivalent between production stages (i.e., before production compared with after production).
The individual ads followed largely the same pattern. Among the 144 equivalence tests for individual ad pairs, a total of 127 (88%) were equivalent. Where equivalence was not found for an individual ad, postproduction ads generally had higher ME scores. Specifically, for measures that assessed the effectiveness of the ad (i.e., PE and effects perceptions), all but 1 finding of nonequivalence showed higher scores for the postproduction version of the ad. These findings of higher PE for postproduction ads are in the expected direction because postproduction ads are more refined and professionally produced. These results also provide confidence that ME measures collected on preproduction ads are indicative of postproduction ad performance, because preproduction scores were generally equivalent to, and in selected cases conservative (lower) estimates of, postproduction scores.
There were a limited number of tests where findings of nonequivalence found preproduction scores to be higher than postproduction scores; however, these often occurred with emotional responses, some of which the ad did not intend to evoke. For example, The Real Cost E-cigarette campaign ad No Vape in Team: Cheer showed nonequivalence with higher levels of feeling angry in the preproduction version, and the Possessed ad from the same campaign showed nonequivalence with higher levels of amusement. Because these emotional responses were not intended to be evoked, the fact that they were lower in the final produced versions gives no strong reason to suspect that they would compromise campaign objectives. None of the ads elicited high levels of reactance, with the mean score below 3 (on a scale of 1–5) for all ads; however, there were 3 instances in which reactance was higher in the preproduction than in the postproduction version (i.e., Scary Enough, The Auctioneer, and Smoke Army). One potential explanation for this finding is that messages perceived as effective often simultaneously elicit moderate levels of reactance.27 Supporting this explanation, the Scary Enough ad demonstrated higher PE for the postproduction than for the preproduction version.
The second study objective was to assess the extent to which an ad’s performance relative to other ads, assessed before production, is diagnostic of its relative performance assessed after production, by computing CCCs between pre and postproduction ads for each outcome measure. CCC scores enable a descriptive assessment of the level of agreement between pre and postproduction scores, informing how patterns of scores relate between the 2 ad versions. Across all ME measures, CCC scores for pre and postproduction ads were positive. In general, these moderate, positive concordance correlations suggest that preproduction ads may be a good indicator of performance for postproduction ads.
This study points to additional research that could expand this area. Specifically, there may be characteristics of the ads beyond production stage (i.e., before versus after production) that could contribute to differences not directly accounted for in this study. Two such characteristics are (1) production disparity (e.g., the type and sophistication of preproduction relative to postproduction ads) and (2) creative deviation between preproduction and postproduction versions. The preproduction animatics included in this study ranged from basic (e.g., a still image with voiceover) to highly produced (e.g., full-color animation). Future research could more thoroughly examine which types of preproduction ads best elicit audience reactions indicative of postproduction ad performance. For example, more sophisticated animatics may be warranted for ads intended to elicit strong emotions from the audience (e.g., fear) but may not be needed for concepts that do not rely on strong emotional responses.
In addition, this study did not assess the impact of creative change (i.e., changes in script, fact, or concept) between the pre and postproduction versions of the ad. All ad pairings in this study had at least some creative deviation between the preproduction and postproduction versions, except for Awkward Silence, with some concepts having larger creative deviations than others (Appendix Table 1, available online, describes the changes between preproduction and postproduction ads for all ads). Future studies could account for (or systematically manipulate) these and other ad characteristics that may explain some of the differences found in this study.
Limitations
This study includes a small number of pre and postproduction ads, which limits generalizability across other types of ads and comparison of coefficients across measures. This study also focuses on tobacco prevention messages specifically designed for youth, which limits generalizability to other health contexts and populations. Future research could explore equivalence testing across other contexts and intended audiences and further explore whether specific ME measures might inherently have stronger agreement across production phases or whether specific measures should be prioritized for certain types of ads. Finally, study participants were recruited from a nonprobability-based panel; as such, results may not be representative of U.S. youth aged 13–17 years.
CONCLUSIONS
Overall, findings from this study revealed that preproduction ad scores were generally substantively equivalent to postproduction ad scores for both The Real Cost Youth E-cigarette and Cigarette Prevention campaign ads. In cases of nonequivalence, postproduction ads generally outperformed preproduction ads, particularly for outcomes assessing message effectiveness rather than emotional responses. In addition, CCCs revealed positive associations between preproduction and postproduction ad scores, providing evidence that an ad's relative performance assessed before production is diagnostic of its relative performance assessed after production. Together, these results provide empirical support for using preproduction ad evaluations as a suitable proxy for postproduction MEs in copy testing in health communication messaging.
The integration of these study findings into concept development strengthens FDA’s evidence-based framework for The Real Cost campaigns. Using a concurrent mixed-methods approach during preproduction formative research offers enhanced contextual insights at earlier stages of the production timeline. This use of science-based criteria to inform timely ad selection and testing may help other public health entities to optimize resource allocation and balance logistical and methodologic considerations in the development of effective health communication messaging.
Supplementary Material
Supplemental materials associated with this article can be found in the online version at https://doi.org/10.1016/j.amepre.2025.108044.
Acknowledgments
Disclaimer:
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Food and Drug Administration.
Declaration of interest:
None.
Footnotes
CREDIT AUTHOR STATEMENT
Emily B. Peterson: Conceptualization; Methodology; Writing – Original Draft. Matthew E. Eggers: Conceptualization; Methodology; Formal Analysis; Writing – Original Draft; Visualization. Emily C. Sanders: Conceptualization; Writing – Review & Editing. Anh Nguyen Zarndt: Conceptualization; Methodology; Writing – Review & Editing. Emily McDonald: Writing – Original Draft; Writing – Review & Editing. Xiaoquan Zhao: Conceptualization; Methodology; Writing – Review & Editing. James M. Nonnemaker: Conceptualization; Methodology; Formal Analysis; Writing – Review & Editing. Nicole B. Swires: Formal Analysis; Data Curation; Project Administration. Margaret G. Moakley: Formal Analysis; Data Curation; Writing – Original Draft; Project Administration.
REFERENCES
- 1. U.S. Food and Drug Administration. Family Smoking Prevention and Tobacco Control Act, Division A, Public Law 111–31; 2009. https://www.govinfo.gov/content/pkg/PLAW-111publ31/pdf/PLAW-111publ31.pdf. Accessed September 2, 2025.
- 2. Farrelly MC, Niederdeppe J, Yarsevich J. Youth tobacco prevention mass media campaigns: past, present, and future directions. Tob Control. 2003;12(suppl 1):i35–i47. 10.1136/tc.12.suppl_1.i35.
- 3. Roditis ML, Jones C, Dineva AP, Alexander TN. Lessons on addiction messages from "The Real Cost" campaign. Am J Prev Med. 2019;56(2)(suppl 1):S24–S30. 10.1016/j.amepre.2018.07.043.
- 4. Zhao X, Delahanty JC, Duke JC, et al. Perceived message effectiveness and campaign-targeted beliefs: evidence of reciprocal effects in youth tobacco prevention. Health Commun. 2022;37(3):356–365. 10.1080/10410236.2020.1839202.
- 5. MacMonegle AJ, Smith AA, Duke J, et al. Effects of a national campaign on youth beliefs and perceptions about electronic cigarettes and smoking. Prev Chronic Dis. 2022;19:E16. 10.5888/pcd19.210332.
- 6. Duke JC, Farrelly MC, Alexander TN, et al. Effect of a national tobacco public education campaign on youth's risk perceptions and beliefs about smoking. Am J Health Promot. 2018;32(5):1248–1256. 10.1177/0890117117720745.
- 7. Farrelly MC, Duke JC, Nonnemaker J, et al. Association between The Real Cost media campaign and smoking initiation among youths — United States, 2014–2016. MMWR Morb Mortal Wkly Rep. 2017;66(2):47–50. 10.15585/mmwr.mm6602a2.
- 8. Duke JC, MacMonegle AJ, Nonnemaker JM, et al. Impact of The Real Cost media campaign on youth smoking initiation. Am J Prev Med. 2019;57(5):645–651. 10.1016/j.amepre.2019.06.011.
- 9. Crosby K, Santiago S, Talbert EC, Roditis ML, Resch G. Bringing "The Real Cost" to life through breakthrough, evidence-based advertising. Am J Prev Med. 2019;56(2)(suppl 1):S16–S23. 10.1016/j.amepre.2018.08.024.
- 10. Ma H, Gottfredson O'Shea N, Kieu T, et al. Examining the longitudinal relationship between perceived and actual message effectiveness: a randomized trial. Health Commun. 2024;39(8):1510–1519. 10.1080/10410236.2023.2222459.
- 11. Noar SM, Barker J, Bell T, Yzer M. Does perceived message effectiveness predict the actual effectiveness of tobacco education messages? A systematic review and meta-analysis. Health Commun. 2020;35(2):148–157. 10.1080/10410236.2018.1547675.
- 12. Davis KC, Nonnemaker J, Duke J, Farrelly MC. Perceived effectiveness of cessation advertisements: the importance of audience reactions and practical implications for media campaign planning. Health Commun. 2013;28(5):461–472. 10.1080/10410236.2012.696535.
- 13. Noar SM, Rohde JA, Prentice-Dunn H, Kresovich A, Hall MG, Brewer NT. Evaluating the actual and perceived effectiveness of e-cigarette prevention advertisements among adolescents. Addict Behav. 2020;109:106473. 10.1016/j.addbeh.2020.106473.
- 14. Dillard JP, Peck E. Affect and persuasion: emotional responses to public service announcements. Commun Res. 2000;27(4):461–495. 10.1177/009365000027004003.
- 15. Hall MG, Sheeran P, Noar SM, Ribisl KM, Boynton MH, Brewer NT. A brief measure of reactance to health warnings. J Behav Med. 2017;40(3):520–529. 10.1007/s10865-016-9821-z.
- 16. Parvanta S, Gibson L, Forquer H, et al. Applying quantitative approaches to the formative evaluation of antismoking campaign messages. Soc Mar Q. 2013;19(4):242–264. 10.1177/1524500413506004.
- 17. Morris JD, Waine C. Managing the creative effort: pre-production and post-production measures of emotional response. In: Proceedings of the Conference of the American Academy of Advertising; 1993. https://www.adsam.com/wp-content/uploads/2020/02/Managing-Creative-Effort.pdf. Accessed September 2, 2025.
- 18. Reynolds TJ, Gengler C. A strategic framework for assessing advertising: the animatic vs. finished issue. J Advert Res. 1991;31(5):61–71. 10.1080/00218499.1991.12466792.
- 19. Jiménez FR, Gammoh BS, Wergin R. The effect of imagery and product involvement in copy testing scores of animatics and finished ads: a schemata approach. J Mark Theor Pract. 2020;28(4):460–471. 10.1080/10696679.2020.1782231.
- 20. Strong DR, Hartman SJ, Nodora J, et al. Predictive validity of the expanded susceptibility to smoke index. Nicotine Tob Res. 2015;17(7):862–869. 10.1093/ntr/ntu254.
- 21. Peinado S, Vigorita M, Eggers M, et al. Examining the validity of message evaluation measures in the context of vaping prevention. 2025 [in preparation].
- 22. Schuirmann DJ. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J Pharmacokinet Biopharm. 1987;15(6):657–680. 10.1007/BF01068419.
- 23. Lin LI. A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989;45(1):255–268. 10.2307/2532051.
- 24. Steichen TJ, Cox NJ. A note on the concordance correlation coefficient. Stata J. 2002;2(2):183–189. 10.1177/1536867X0200200206.
- 25. Lin LI, Chinchilli VM. Rejoinder to the letter to the editor from Atkinson and Nevill. Biometrics. 1997;53:777–778. https://www.jstor.org/stable/2533979. Accessed September 2, 2025.
- 26. Altman DG. Practical Statistics for Medical Research. Boca Raton, FL: Chapman & Hall/CRC; 1990. 10.1201/9780429258589.
- 27. Hall MG, Sheeran P, Noar SM, Ribisl KM, Bach LE, Brewer NT. Reactance to health warnings scale: development and validation. Ann Behav Med. 2016;50(5):736–750. 10.1007/s12160-016-9799-3.
