Author manuscript; available in PMC: 2024 Jun 1.
Published in final edited form as: Psychol Assess. 2023 Mar 16;35(6):469–483. doi: 10.1037/pas0001231

Measuring affect in daily life: A multilevel psychometric evaluation of the PANAS-X across four ecological momentary assessment samples

Alison M Haney 1, Megan N Fleming 1, Andrea M Wycoff 1, Sarah A Griffin 2, Timothy J Trull 1
PMCID: PMC10213137  NIHMSID: NIHMS1880770  PMID: 36931821

Abstract

While there is strong evidence for the psychometric reliability of the PANAS-X in cross-sectional studies, the between- and within-person psychometric performance of the PANAS-X in an intensive longitudinal framework is less understood. As affect is thought to be dynamic and responsive to context, this study investigated the multilevel reliability of PANAS-X positive affect, negative affect, fear, sadness, and hostility. Generalizability Theory and structural equation modeling techniques (coefficient ω) were employed in four ecological momentary assessment samples (N = 309; 41,261 reports). Results demonstrate that the PANAS-X subscales, including short versions of the positive and negative affect subscales, can reliably detect between-person differences. PANAS-X subscales were also able to reliably measure within-person change, though these estimates may be impacted by scale content and study design. These results support the use of the PANAS-X in daily life research to intensively measure affect in the natural environment.

Keywords: PANAS-X, affect, mood, EMA, generalizability theory, psychometrics


Our understanding of mood and affect, as well as our ability to measure emotional experiences, has long been central to the study of psychological well-being. In addition to contributing directly to psychological health, affective states are associated with numerous psychological and behavioral outcomes. These include alcohol use (Duif et al., 2019; Dvorak et al., 2016), smoking (Vinci et al., 2017), binge-eating (Schaefer et al., 2020), nonsuicidal self-injury (Dillon et al., 2021; Hepp et al., 2021), and attention deficit hyperactivity disorder symptoms (Murray et al., 2021). Importantly, affect is dynamic and can change in response to environmental and social context (Bejarano et al., 2019; Beute & de Kort, 2018; Liu et al., 2019; Mote & Fulford, 2020). Changes in affect and patterns of affective lability can be equally and incrementally informative when seeking to understand the role of affect in psychological outcomes (Trull et al., 2008, 2015). To better understand these complex, health-relevant processes, it is imperative that researchers have dynamic measurement tools that can reliably capture affect and are sensitive to affective change.

Measuring Affect in EMA

Historically, investigators have attempted to gather data on dynamic psychological processes, like affective experience, by using single-occasion retrospective self-reports. Retrospective descriptions of a person’s affective experience are often quite broad and subject to a variety of reporting and memory biases (e.g., duration neglect, peak-end rule; Fredrickson, 2000; Fredrickson & Kahneman, 1993). Furthermore, there is widespread agreement that affective or emotional experience fluctuates both within and across days and is influenced by factors that are both internal (e.g., perceptions, cognitive styles, biological) and external (e.g., context, environment) to the individual. Therefore, single-occasion and retrospective reports on affective experience are imprecise and do not reflect its dynamic nature.

For these reasons, many investigators have turned to ecological momentary assessment (EMA; or ambulatory assessment, AA) methods to characterize and study affective experience. EMA/AA involves multiple assessments of individuals’ current or very recent states or behaviors over time in the real-world, where individuals are free to choose their own environments and contexts and can rate affective states as they are experienced. Intuitively, the argument for obtaining data on persons in real time or near real time is compelling, but there has been relatively little research on the psychometric properties of the items and scales used in EMA/AA assessment (Fisher & To, 2012; Shrout & Lane, 2012; Stone & Shiffman, 2002; Trull & Ebner-Priemer, 2020; Wright & Zimmermann, 2019). Commonly, researchers select items from a larger cross-sectional measure and adapt the instructions to fit the desired time frame (e.g., “over the last 15 minutes”). However, despite the convenience, one cannot assume that cross-sectional measures will retain original, or even similar, psychometric properties when administered repeatedly (e.g., several times per day) over shorter intervals (e.g., in the last 15 minutes). For example, research examining the intensive longitudinal properties of the five-factor model of personality has shown differences in between- and within-person structures (Borkenau & Ostendorf, 1998; Ringwald et al., 2020).

This relative neglect of psychometric evaluation of questionnaires for use in EMA/AA may be due to a general lack of familiarity with statistical methods to assess psychometric properties of repeated longitudinal data. Fortunately, options exist for addressing this issue (Calamia, 2019; Cranford et al., 2006; Fisher & To, 2012; Geldhof et al., 2014; Shrout & Lane, 2012). Several procedures for evaluating the multilevel reliability of scales used in EMA/AA research have been outlined, both between-subjects (Geldhof et al., 2014; Shrout & Lane, 2012), and within-subject (i.e., across time), including reliability of change scores within-subject (Cranford et al., 2006).

Measuring Affect with the PANAS-X

In this paper, we focus on the reliability of scales and subscales from the Positive and Negative Affect Schedule – Expanded Form (PANAS-X; Watson & Clark, 1999), a measure commonly used in EMA/AA studies to assess affective experience. In a recent systematic review of 234 studies using EMA for momentary assessment of mood symptoms, more than a quarter of all included studies used a version of the PANAS (Hall et al., 2021). The PANAS-X was designed to capture the valence of affect (e.g., positive affect [PA] and negative affect [NA]) at higher scale dimensions and specific affect contents in lower dimensions (e.g., anger, sadness, joviality, attentiveness). Studies examining the psychometric properties of the PANAS-X using different time course instructions tapping into person-level (e.g., “in general,” past year, past month) and state-level (e.g., past few days, today, right now) affect generally support the psychometric utility of the measure in a wide range of populations and languages (Cotigă, 2012; Merz et al., 2013; Terracciano et al., 2003; von Humboldt et al., 2017; Watson et al., 1988; Watson & Clark, 1999). For example, Watson and Clark (1999) reported excellent internal consistency of the PANAS-X in data from thousands of participants across time courses (α = 0.83–0.90 for NA; 0.84–0.91 for PA).

Additional work has found that person-level test-retest correlations for the PANAS are high when reporting on affect “in general” (0.73 for NA and 0.76 for PA), though test-retest correlations at the state-level are more moderate when reporting on momentary affect (i.e., “right now”, NA = 0.52, PA = 0.65; Terracciano et al., 2003). The strength of these within-person test-retest correlations may vary, however, depending on the time course assessed, as one study found lower test-retest correlations when assessing over the “past few days” (0.43 for NA; 0.39 for PA; Cotigă, 2012). Although most are familiar with measures of scale reliability in cross-sectional studies (e.g., coefficient alpha), the reliability of scales used in EMA/AA requires a different approach, one that estimates reliability of a scale within persons and reliability of a scale between persons.

Methods of Multilevel Reliability Assessment

Generalizability Theory

One framework for assessing the reliability of the PANAS-X for use in EMA/AA is Generalizability Theory. Generalizability Theory (GT) is an extension of classical test theory where multiple sources of variance can be partitioned simultaneously (rather than a single source as in Spearman-Brown approaches; Cronbach et al., 1972; Shavelson & Webb, 1991). Previous work has detailed how this approach can be applied to determine the reliability of multi-item measures of affect in daily life research (Cranford et al., 2006; Revelle & Condon, 2018; Shrout & Lane, 2012). To summarize, there are two analytic steps when employing GT, the first of which is commonly referred to as the generalizability study (G study). The “G Study” involves decomposing systematic and error variance in measurement using multilevel linear regression (Cranford et al., 2006). A researcher determines what components to include in this variance decomposition model based on their study design. In a survey-based EMA design using a measure of affect like the PANAS-X, such components commonly include item, person, and day of study as potential sources of measurement variance (Cranford et al., 2006). This is represented in Equation 1, where $M_{ijk}$ is the measure for item $i$ for person $j$ at time $k$.

$$M_{ijk} = \mu + I_i + P_j + T_k + IP_{ij} + IT_{ik} + PT_{jk} + IPT_{ijk} + e_{ijk} \tag{1}$$

By estimating these components simultaneously, the G study allows researchers to consider multiple sources of variation in measurement. This includes between-person variation ($P_j$), as well as how individuals may respond differently to specific items (item*person, $IP_{ij}$) or at different times (person*time, $PT_{jk}$). Equation 1 represents time as crossed with person (person*time), which assumes that times and items can be meaningfully distinguished or ordered. For example, suppose a researcher studies a sample of students the week before an exam, such that every student responds to the same items at the same times on the same days. The researcher may treat time and items as crossed in order to look at the trajectory of an outcome like stress over the set number of days before the exam. In many EMA research designs, data is not collected at the same time and frequency across participants, and considering time as nested within person, rather than crossed, may be more appropriate (Shrout & Lane, 2012). In this case, G study variance estimates would be derived from a nested linear model (i.e., time nested within person; time[person]). A nested model would differ from Equation 1 by estimating only three nested variance components, person, time(person), and an error component, with time and items treated as random effects for each of the components (for a description of nested models, see Nezlek & Gable, 2001).
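As a point of comparison with Equation 1, the nested decomposition described above can be sketched as follows. The subscript notation $k(j)$ for time nested within person is introduced here for illustration and is not taken from the original equations; items are assumed to be absorbed into the residual, consistent with the three components named in the text.

$$M_{ijk} = \mu + P_j + T_{k(j)} + e_{ijk}$$

Under this sketch, score variance is partitioned into only $\sigma^2_{\text{Person}}$, $\sigma^2_{\text{Time(Person)}}$, and $\sigma^2_{\text{Error}}$.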

The next step in a GT framework is the “D Study”, or “decision” study, which estimates reliability as a proportion of variance using the variance components from the G Study. GT can produce multiple forms of between- and within-person reliability estimates that are based on researcher-defined variance components, and can flexibly accommodate multiple nested (e.g., day of study within person) and crossed (e.g., developmental age range; Hamrick et al., 2020) components of variance. The reliability estimates produced (commonly including RkF, RkR, RkRN, RC, RCN) provide information about the circumstances under which a measure may be reliable.

For example, suppose a researcher gives a sample of participants a measure every day for 7 days. RkF (Reliability over k Fixed days) represents the reliability of that measure if the researcher wanted to take an average score for the week (k = 7) and look at between-person differences. RkF assumes individuals are measured over the same (fixed) days. In this example, this would mean all participants in the sample started the week of data collection on the same day. In EMA research, it is often the case that the data collection period is not the same for all participants (e.g., different start days due to rolling recruitment). In this case, RkR (Reliability over k Random days) is an estimate of the between-person reliability across an average (k) number of days. RkF and RkR treat items and times as crossed, which assumes a meaningful order or structure for these variables (e.g., measuring response to a psychological intervention across 7 distinct sessions in a set order). However, in many EMA/AA studies, such as event-contingent designs, treating occasions as nested within individuals (rather than crossed) may be more appropriate, as the timing and frequency of reports can vary. In this case, the researcher would use RkRN (Reliability over k Random Nested days) to estimate between-person reliability. Finally, the researcher may be interested in within-person changes in the measure over the 7 days of the study. Depending on whether their study design is best represented by crossed or nested treatment of time, RC (Reliability of Change) and RCN (Reliability of Change when time is Nested) represent the measure’s ability to reliably detect systematic within-person change over the course of the week.

Coefficient Omega

Like Generalizability Theory, coefficient omega (ω) offers several advantages over classical test theory (α) for EMA/AA researchers (for a comparison of elements in GT and ω and a review of multilevel confirmatory factor analysis approaches see Geldhof et al., 2014). Limitations of the GT approach are that it relies on the assumptions that items are parallel (equally related to true score) and that items do not have differing error distributions (Shrout & Lane, 2012). Violations of these assumptions in the data may lead to misestimation of reliability. For example, reliability can be underestimated in a GT framework if the assumption that item loadings are equal is violated (Lane & Shrout, 2010). In contrast, between- and within-person omegas are calculated using a multilevel structural equation modeling approach, and therefore items are not assumed to be parallel and can have different error distributions (McDonald, 1999). This means that factor loadings for each item are freely estimated at both levels, allowing for different item factor structures within versus between subjects, rather than constraining these factor loadings to equality as in GT approaches. Equation 2 demonstrates how the classic reliability ratio is extended to produce ω, where $\left(\sum_i \lambda_i\right)^2$ is the squared sum of the factor loadings across items, and $\sum_i \Psi_i^2$ is the sum of the error variances for those items. When applied to EMA data with multiple levels, this approach employs multilevel confirmatory factor analysis to estimate reliability both between-person and within-person (Geldhof et al., 2014).

$$\omega = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_i \Psi_i^2} \tag{2}$$
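To make the ratio in Equation 2 concrete, the following is a minimal sketch that computes ω from a set of factor loadings and error variances. It assumes the conventional form of McDonald's ω (squared sum of loadings over squared sum of loadings plus summed error variances). The loadings and error variances are hypothetical values chosen for illustration, not estimates from this study, and the function name is likewise an assumption.

```python
def coefficient_omega(loadings, error_variances):
    """Omega as (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    if len(loadings) != len(error_variances):
        raise ValueError("one error variance is needed per loading")
    common = sum(loadings) ** 2      # variance attributable to the common factor
    unique = sum(error_variances)    # summed item-level error variance
    return common / (common + unique)


# Hypothetical five-item solutions at each level (illustrative values only).
within_loadings = [0.6, 0.5, 0.7, 0.4, 0.6]
within_errors = [0.5, 0.6, 0.4, 0.7, 0.5]
between_loadings = [0.9, 0.8, 0.9, 0.7, 0.8]
between_errors = [0.1, 0.2, 0.1, 0.3, 0.2]

omega_within = coefficient_omega(within_loadings, within_errors)
omega_between = coefficient_omega(between_loadings, between_errors)
```

In a multilevel analysis the same ratio would be applied separately to the within-person and between-person parameter estimates, which is why the two levels can yield different reliabilities for the same items.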

Coefficient ω offers an approach based on fewer assumptions than GT that may better accommodate EMA data structures. However, it has been suggested that a small number of within-person observations may lead to over-estimation of between-person reliability (ω) under some circumstances (Geldhof et al., 2014; Huang & Weng, 2012). Depending on the specific type of reliability metric of interest to the researcher and the underlying structure of the data within a particular research design, both GT estimates and ω can provide important information about how reliably the PANAS-X performs under different circumstances.

The Present Study

To evaluate the reliability of the PANAS-X when used in daily life research, we utilized data from four EMA studies and assessed reliability using both Generalizability Theory and coefficient ω for each PANAS-X subscale. To determine how the PANAS-X may perform under different circumstances, in each sample we calculated several reliability estimates that make distinct assumptions about the underlying data structure (e.g., nested vs. crossed, fixed vs. random time). Additionally, we examined the reliability of shorter versions of PANAS-X subscales to assess whether subscales requiring less participant time could reliably be used in EMA/AA research.

Methods

Participants

Participants for this study were drawn from four separate samples, all recruited from outpatient psychiatric clinics and the surrounding community in central Missouri. Though considerable overlap exists between eligibility criteria, inclusion and exclusion criteria varied slightly between studies (see Table 1). Complete demographic information on each of these samples has previously been reported (see Table 2 for relevant citations), and information on age, gender, and ethnic composition for each sample is summarized in Table 3. Study 1 recruited two groups of individuals: one group met DSM-IV criteria for borderline personality disorder (BPD) including the criterion of affective instability (Study 1 BPD group), and the other group met DSM-IV criteria for current major depressive disorder or dysthymia but not for BPD or affective instability (Study 1 DD group). Study 2 also recruited two groups of participants, one group meeting DSM-IV criteria for BPD including the criterion of affective instability (Study 2 BPD group), and the other group included community controls that did not meet criteria for BPD or report history of affective instability (Study 2 COM group). Study 3 and Study 4 each recruited participants that met criteria for current emotional distress disorders (i.e., depressive disorder, anxiety disorder, and/or borderline personality disorder). Studies 2 and 3 included alcohol-specific inclusion/exclusion criteria. Study 4 also included an fMRI component, leading to additional inclusion/exclusion criteria.

Table 1.

Comparison of Eligibility Criteria Across Samples 1 – 4

Inclusion Criteria
 Sample 1: 18 to 65 years old; Met DSM-IV criteria for BPD; Met DSM-IV criteria for current major depressive disorder and/or dysthymia
 Sample 2: 18 to 45 years old; Reported drinking alcohol avg. once per week over the past month; Met DSM-IV criteria for BPD (BPD only); Receiving outpatient psychotherapy (BPD)
 Sample 3: 18 to 45 years old; Reported drinking alcohol at least 2 times per week; Met DSM-5 criteria for a mood, anxiety, or borderline personality disorders; Receiving outpatient psychotherapy
 Sample 4: 18 to 45 years old; Women; Met DSM-5 criteria for a mood, anxiety, or borderline personality disorders; Receiving outpatient psychotherapy
Exclusion Criteria
 Sample 1: Current psychosis; Intellectual disability; History of neurological dysfunction or severe head trauma; DSM-IV diagnostic criteria for severe substance use disorder; DSM-IV diagnostic criteria for Mania; Co-Occurring BPD (DD only)
 Sample 2: Current psychosis; Intellectual disability; History of neurological dysfunction or severe head trauma; Currently in or seeking treatment for alcohol use or alcohol problems; Being pregnant or planning to become pregnant; Reported emotion dysregulation (COM only); Meeting DSM-IV criteria for BPD (COM); Responding to BPD-targeted advertisements regardless of meeting criteria (COM)
 Sample 3: History of head trauma; Current treatment for substance use or substance-related problems; Being pregnant or planning to become pregnant
 Sample 4: History of head trauma; Diagnosed with cystic fibrosis or diabetes; Contraindication for fMRI procedures; Being pregnant or planning to become pregnant; Left-handed

Note: BPD = Borderline Personality Disorder; DD = Depressive Disorder; COM = Community controls

Table 2.

Comparison of Study Procedures Across Studies 1 – 4

Procedure | Study 1 | Study 2 | Study 3 | Study 4
Study Procedures
 Recruitment Method | Outpatient psychiatric clinics | Outpatient psychiatric clinics, circulars, message boards. Separate ads were targeted for individuals with BPD and community volunteers. | Outpatient psychiatric clinics and a university-wide email news bulletin | Outpatient psychiatric clinics and a university-wide email news bulletin
 Diagnostic Interviews¹ | SIDP-IV, SCID-I | SIDP-IV, SCID-I | SIDP-IV, M.I.N.I. 7.0 | SIDP-IV, M.I.N.I. 7.0
 Max Compensation² | $200 | $180 | $250 | $400
EMA Procedure
 Device Used | Palm Pilots (Palm Zire 31) | Palm Pilots (Palm Tungsten E2 © handheld computer) | Motorola Droid MAXX, Android 4.4.4 smartphone | Motorola Droid MAXX, Android 4.4.4 smartphone
 # of Days | 28 | 21 | 7³ | 14
 # of Prompts/Day | 6 | 7 or more | 7 or more | 7 or more
EMA Reports
 Random reports | Administered at equal intervals | 6 times per day, not within 30 min of another report | 6 times per day, not within 30 min of another report | 6 times per day, not within 30 min of another report
 Self-initiated reports | Not applicable | Report drinking, self-harm, cigarette smoking | Report drinking, change in mood, or alcohol craving | Report change in mood
 Follow-up reports | Not applicable | 30, 60, 120, and 180 min after self-initiated report of drinking, self-harm | 30, 60, and 120 min after self-initiated or random report of drinking | Not applicable
Reference for procedure details | Trull et al., 2008; Hepp et al., 2021 | Lane et al., 2016; Carpenter et al., 2017 | Trull et al., 2022; Griffin et al., 2021 | Trull et al., 2022; Griffin et al., 2021

¹Structured Interview for DSM-IV Personality = SIDP-IV, Structured Clinical Interview for DSM-IV Axis I = SCID-I, Mini-International Neuropsychiatric Interview for DSM-5 = M.I.N.I. 7.0.

²Compensation varied based on participant compliance to EMA protocols

³While additional days with EMA reports were collected, only the first 7 days were consecutive

Table 3.

Summary of Sample Demographics

Characteristic | Sample 1 BPD | Sample 1 DD | Sample 2 BPD | Sample 2 COM | Sample 3 All | Sample 4 All
N | 80 | 51 | 56 | 60 | 30 | 32
Total Reports | 11,822 | 7,496 | 7,562 | 8,327 | 3,504 | 2,550
Age (Mean) | 32.1 | 34.5 | 26.0 | 26.7 | 24.0 | 24.9
Ethnicity (%)
 African American | 6.3 | 7.8 | 5.4 | 6.7 | 10.0 | 6.3
 Hispanic/Latinx | 2.5 | 3.9 | 1.8 | 3.3 | 3.3 | 3.1
 European American | 85.0 | 88.2 | 83.9 | 85.0 | 90.0 | 87.5
 Native American | 1.3 | 0.0 | 0.0 | 0.0 | 0.0 | 3.1
 Asian American | 2.5 | 0.0 | 1.8 | 3.3 | 0.0 | 0.0
 Other | 2.5 | 0.0 | 7.1 | 1.7 | 0.0 | 0.0
Gender (%)
 Men | 7.5 | 21.6 | 17.9 | 25.0 | 23.3 | 0.0
 Women | 92.5 | 78.4 | 82.1 | 75.0 | 73.3 | 100.0
 Nonbinary/Other | 0.0 | 0.0 | 0.0 | 0.0 | 3.3 | 0.0

Measures

Participants completed a variety of affect items in survey prompts administered across samples, including those from the PANAS-X (Watson & Clark, 1999). For the PANAS-X items, participants were asked to rate the degree to which they felt the following 21 negative affect items: “afraid,” “ashamed,” “distressed,” “guilty,” “hostile,” “irritable,” “jittery,” “nervous,” “scared,” “upset,” “frightened,” “shaky,” “angry,” “scornful,” “disgusted,” “loathing,” “sad,” “blue,” “downhearted,” “alone,” and “lonely.” Participants were also presented with the following 10 positive affect items: “active,” “alert,” “attentive,” “determined,” “enthusiastic,” “excited,” “inspired,” “interested,” “proud,” and “strong.”

Ratings were obtained using a 5-point scale (1 = very slightly or not at all, 2 = a little, 3 = moderately, 4 = quite a bit, 5 = extremely). From these items, subscales for fear, sadness, and hostility were also derived and examined, along with five-item versions of the positive and negative affect scales used in the International Positive and Negative Affect Schedule Short Form (I-PANAS-SF; Thompson, 2007). A complete list of items used in each subscale can be found in Supplemental Table 1. In Studies 2, 3, and 4 participants rated the degree to which they experienced each item in the past 15 minutes, while Study 1 asked participants to rate according to their experience since the last prompt.
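The scoring step described above can be sketched as follows. Two things here are assumptions rather than details confirmed by the text: the item-to-subscale assignments, which follow the PANAS-X and I-PANAS-SF as commonly published (Supplemental Table 1 remains the authoritative list for this study), and the use of item means rather than sums. The names `SUBSCALES` and `score_report` are hypothetical.

```python
from statistics import mean

# ASSUMPTION: item assignments follow the PANAS-X and I-PANAS-SF as commonly
# published; Supplemental Table 1 is the authoritative list for this study.
SUBSCALES = {
    "PA (10)": ["active", "alert", "attentive", "determined", "enthusiastic",
                "excited", "inspired", "interested", "proud", "strong"],
    "NA (10)": ["afraid", "ashamed", "distressed", "guilty", "hostile",
                "irritable", "jittery", "nervous", "scared", "upset"],
    "PA (5)": ["active", "alert", "attentive", "determined", "inspired"],
    "NA (5)": ["afraid", "ashamed", "hostile", "nervous", "upset"],
    "Fear": ["afraid", "frightened", "jittery", "nervous", "scared", "shaky"],
    "Hostility": ["angry", "disgusted", "hostile", "irritable", "loathing",
                  "scornful"],
    "Sadness": ["alone", "blue", "downhearted", "lonely", "sad"],
}


def score_report(ratings):
    """Return the mean 1-5 rating per subscale for a single EMA report.

    ASSUMPTION: subscales are scored as item means; summed scores are an
    equally common convention and only change the metric.
    """
    scores = {}
    for name, items in SUBSCALES.items():
        values = [ratings[item] for item in items]  # KeyError if an item is missing
        if any(not 1 <= value <= 5 for value in values):
            raise ValueError(f"ratings for {name} must lie between 1 and 5")
        scores[name] = mean(values)
    return scores
```

A report is passed as a dictionary keyed by item name (for example, `{"sad": 5, "blue": 4, ...}`), and the function returns one score per subscale for that report.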

Procedure

A summary of procedures for Studies 1–4 is provided in Table 2. All studies received approval from the University of Missouri Institutional Review Board. All participants completed diagnostic interviews to determine eligibility, including the Structured Interview for DSM-IV Personality (SIDP-IV; Pfohl et al., 1997) for personality pathology, and the Structured Clinical Interview for DSM-IV (SCID; First et al., 1995) or the Mini-International Neuropsychiatric Interview 7.0 (MINI; Sheehan et al., 1998) for other diagnoses. Participants across samples completed an initial study visit to complete informed consent, baseline assessment, and EMA training procedures. Studies 3 and 4 included an additional visit, during which participants completed psychophysiological (Study 3) or fMRI (Study 4) assessment and then were trained on completing EMAs. Though the number of lab visits varied between samples, participants across samples visited the lab once per week so study staff could download data from study devices, monitor compliance, and provide feedback on ways to boost compliance.

Analytic Plan

Generalizability Theory

Within a generalizability theory (GT) framework, six estimates of reliability were computed for each subscale of the PANAS-X for each sample (Revelle & Condon, 2018). Estimates were computed using the psych (Revelle, 2020) and lme4 (Bates et al., 2015) packages in R (R Core Team, 2021). Reliability estimates are represented in Equations 3–7, where m represents the number of items in each scale, k the number of days in the study, and days are treated either as random (subscript R) or fixed (subscript F). Reliability coefficient RkF (Equation 3) represents the stability of between-person differences in a scale when averaged across items and days. RkR (Equation 4) represents reliability when taking the average of k time points across all items. RC (Equation 5) represents the within-person reliability of change when time and items are fixed. Although these first three reliability estimates assume that items and days can be distinguished (e.g., person and day are crossed [person*day] in Equation 4), the final two reliability estimates (RkRN and RCN) treat days as nested within person, as is often the case in EMA studies examining affective fluctuation and change using the PANAS-X. RkRN represents the generalizability of between-person differences averaged over days, and RCN represents the generalizability of within-person variations averaged over items. While RkR and RkRN are both measures of between-person reliability, and RC and RCN are both measures of within-person reliability, RkRN and RCN differ from RkR and RC by treating time as nested (denoted by subscript N; Equations 6 & 7).

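$$R_{kF} = \frac{\sigma^2_{\text{Person}} + \sigma^2_{\text{Person} \times \text{Item}}/m}{\sigma^2_{\text{Person}} + \sigma^2_{\text{Person} \times \text{Item}}/m + \sigma^2_{\text{Error}}/(k \cdot m)} \tag{3}$$

$$R_{kR} = \frac{\sigma^2_{\text{Person}} + \sigma^2_{\text{Person} \times \text{Item}}/m}{\sigma^2_{\text{Person}} + \sigma^2_{\text{Person} \times \text{Item}}/m + \sigma^2_{\text{Day}} + \sigma^2_{\text{Person} \times \text{Day}}/k + \sigma^2_{\text{Error}}/(k \cdot m)} \tag{4}$$

$$R_{C} = \frac{\sigma^2_{\text{Person} \times \text{Day}}}{\sigma^2_{\text{Person} \times \text{Day}} + \sigma^2_{\text{Error}}/m} \tag{5}$$

$$R_{kRN} = \frac{\sigma^2_{\text{Person}}}{\sigma^2_{\text{Person}} + \sigma^2_{\text{Day(Person)}}/k + \sigma^2_{\text{Error}}/(k \cdot m)} \tag{6}$$

$$R_{CN} = \frac{\sigma^2_{\text{Day(Person)}}}{\sigma^2_{\text{Day(Person)}} + \sigma^2_{\text{Error}}/m} \tag{7}$$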
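As an illustration of how Equations 3–7 turn variance components into reliability coefficients, the sketch below implements the five ratios directly. It is not the psych or lme4 code used in the analyses. The function names and all numeric inputs are hypothetical, and the day variance in RkR is entered without division by k because that is how Equation 4 is printed.

```python
def crossed_reliabilities(person, person_item, day, person_day, error, m, k):
    """Equations 3-5: days crossed with persons; m items, k days."""
    between = person + person_item / m
    rkf = between / (between + error / (k * m))
    # Day variance is not divided by k here, following Equation 4 as printed.
    rkr = between / (between + day + person_day / k + error / (k * m))
    rc = person_day / (person_day + error / m)
    return {"RkF": rkf, "RkR": rkr, "RC": rc}


def nested_reliabilities(person, day_in_person, error, m, k):
    """Equations 6-7: days nested within persons; m items, k days."""
    rkrn = person / (person + day_in_person / k + error / (k * m))
    rcn = day_in_person / (day_in_person + error / m)
    return {"RkRN": rkrn, "RCN": rcn}


# Hypothetical variance components for a 10-item scale over 28 days.
crossed = crossed_reliabilities(
    person=0.30, person_item=0.10, day=0.00, person_day=0.17,
    error=0.50, m=10, k=28,
)
nested = nested_reliabilities(
    person=0.30, day_in_person=0.20, error=0.60, m=10, k=28,
)
```

Because the error term in the between-person coefficients is divided by k·m while the within-person coefficients divide it only by m, between-person estimates approach 1 quickly as days accumulate, whereas within-person estimates depend mainly on the number of items.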

Coefficient Omega

Between- and within-person coefficient omegas (ω) were computed for each PANAS-X subscale using the multilevelTools (Wiley, 2020) and lavaan (Rosseel, 2012) packages in R. Between- and within-person ω were computed using a multilevel confirmatory factor analysis approach outlined by Geldhof and colleagues (2014). Unlike GT estimates derived using linear mixed modeling, ω was estimated using structural equation modeling which allowed separate items (e.g., “active” and “alert”) to have different error distributions and relate differentially to the underlying construct (e.g., positive affect; Shrout & Lane, 2012).

Supplemental Validity Analyses

Correlations among each of the affect subscales (10- and 5-item positive and negative affect, fear, hostility, and sadness) and related constructs also were computed. Using data from Studies 1 and 2, we computed bivariate correlations between each PANAS-X/I-PANAS-SF subscale and the Beck Depression Inventory (BDI; Beck et al., 1961) and the Neuroticism and Extraversion domains of the Revised NEO Personality Inventory (NEO-PI-R; Costa & McCrae, 2008). These measures have been used in prior work to evaluate the concurrent and discriminant validity of subscales of the PANAS-X (e.g., Terracciano et al., 2003).

To create variables representing individuals’ mean levels of 10- and 5-item positive affect, 10- and 5-item negative affect, fear, hostility, and sadness while in the study, we first took the average of individuals’ subscale scores for observations collected from a given participant on a given day (day-level mean). Then we averaged day-level means for a given participant to create variables representing individuals’ overall average level of each subscale during the study.
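The two-step averaging described above (reports to day-level means, then day-level means to a person-level mean) can be sketched as follows. The record layout of `(person_id, day, score)` and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def person_means(records):
    """Average scores within person-day first, then across days within person.

    records: iterable of (person_id, day, score) tuples (hypothetical layout).
    """
    by_person_day = defaultdict(list)
    for person, day, score in records:
        by_person_day[(person, day)].append(score)

    # Step 1: one mean per participant per day.
    day_means = defaultdict(list)
    for (person, _day), scores in by_person_day.items():
        day_means[person].append(mean(scores))

    # Step 2: average the day-level means for each participant.
    return {person: mean(values) for person, values in day_means.items()}
```

Averaging in two steps weights each day equally regardless of how many reports it contains. For a participant with scores of 1 and 3 on one day and 5 on another, the result is 3.5, whereas pooling all three reports would give 3.0.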

Across all four samples, we also examined bivariate correlations between PANAS subscale scores collected at the momentary level with participants’ momentary EMA reports of drinking (e.g., a dichotomous variable representing whether participants responded “yes” or “no” to the item, “Have you consumed/used alcohol since the last recording/beep you answered?”).
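A bivariate correlation between a continuous subscale score and a yes/no drinking report is a Pearson correlation with one dichotomous variable (a point-biserial correlation). The following is a minimal sketch of that calculation. It treats momentary reports as independent observations, which is a simplification, and the article does not describe the procedure beyond "bivariate correlations". The function name and example data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical momentary data: NA subscale scores and drinking (1 = yes, 0 = no).
na_scores = [1.0, 1.5, 2.0, 3.0]
drank = [0, 0, 1, 1]
r = pearson_r(na_scores, drank)
```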

This study was not preregistered. Data and study materials for the included samples are not available online.

Results

Across all four studies, a total of 41,261 reports were collected from 309 individuals. Multilevel reliabilities for Sample 1 are reported in Table 4, results for Sample 2 are reported in Table 5, and results for Samples 3 and 4 are reported in Table 6. Variance estimates (σ²) and percentages for each component used to compute GT reliability estimates are reported in Tables 7–9. To interpret reliabilities, we used classical test theory conventions (i.e., 0.00–0.10 = virtually none, 0.11–0.40 = slight, 0.41–0.60 = fair, 0.61–0.80 = moderate, 0.81–1.00 = substantial; Shrout, 1998). In general, between-person reliability estimates were higher than within-person reliabilities. While complete results are presented in Tables 4–6, we will emphasize comparisons between a) within-person ω and RCN and b) between-person ω and RkRN, as these components have similar interpretations across methods and may be most relevant to typical EMA/AA designs.
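The interpretive bands cited above (Shrout, 1998) can be applied mechanically, as in the sketch below. The function name is hypothetical, and treating each upper bound as inclusive is an assumption, since the bands are stated to two decimals.

```python
def interpret_reliability(value):
    """Map a reliability estimate onto the bands attributed to Shrout (1998)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if value <= 0.10:
        return "virtually none"
    if value <= 0.40:
        return "slight"
    if value <= 0.60:
        return "fair"
    if value <= 0.80:
        return "moderate"
    return "substantial"
```

Applied to Table 4, for example, the Sample 1 RCN of .65 for PA (10) falls in the moderate band and the corresponding within-person ω of .89 falls in the substantial band.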

Table 4.

Multilevel Reliabilities for Sample 1

Sample 1
PA (10) PA (5) NA (10) NA (5) Fear Hostility Sadness
RkF .99 .97 .99 .99 .99 .99 .99
RkR .98 .97 .98 .97 .98 .97 .98
RC .72 .57 .76 .63 .71 .69 .70
RkRN .97 .97 .98 .97 .98 .97 .98
RCN .65 .50 .71 .55 .63 .63 .65
Within ω .89 .83 .87 .77 .83 .86 .85
Between ω .96 .92 .96 .91 .93 .95 .94
Sample 1 - BPD
PA (10) PA (5) NA (10) NA (5) Fear Hostility Sadness

RkF .99 .99 .99 .99 .99 .99 .99
RkR .97 .97 .98 .98 .98 .97 .98
RC .72 .58 .78 .65 .73 .72 .71
RkRN .97 .96 .98 .97 .98 .97 .98
RCN .66 .51 .73 .59 .67 .65 .68
Within ω .90 .83 .88 .79 .84 .87 .85
Between ω .96 .92 .97 .95 .94 .96 .96
Sample 1 - DD
PA (10) PA (5) NA (10) NA (5) Fear Hostility Sadness

RkF .99 .99 .99 .98 .99 .99 .99
RkR .98 .97 .97 .96 .98 .96 .98
RC .71 .56 .73 .59 .67 .64 .69
RkRN .98 .97 .97 .96 .98 .96 .97
RCN .63 .49 .65 .48 .56 .57 .62
Within ω .88 .82 .85 .75 .80 .83 .85
Between ω .96 .92 .92 .82 .91 .92 .91

Note. RkF = Between-person reliability when averaged across measurement times (fixed); RkR = Between-person reliability when averaged across measurement times (random); RC = Within-person reliability of change; RkRN = Between-person reliability when averaged across measurement times (random, nested); RCN = Within-person reliability of change (nested); Within ω = Within-person omega; Between ω = Between-person omega. BPD = Borderline Personality Disorder; DD = Depressive Disorder

Table 5.

Multilevel Reliabilities for Sample 2

Sample 2
PA (10) PA (5) NA (10) NA (5) Fear Hostility Sadness
RkF .99 .98 .99 .99 .99 .99 .99
RkR .97 .96 .98 .97 .98 .97 .97
RC .66 .47 .68 .54 .60 .58 .66
RkRN .97 .96 .98 .97 .97 .97 .97
RCN .59 .40 .62 .47 .52 .52 .63
Within ω .87 .78 .83 .72 .76 .82 .85
Between ω .96 .90 .96 .94 .93 .95 .98
Sample 2 - BPD
PA (10) PA (5) NA (10) NA (5) Fear Hostility Sadness

RkF .99 .98 .99 .98 .99 .99 .99
RkR .97 .96 .97 .96 .97 .96 .96
RC .68 .48 .70 .57 .64 .59 .67
RkRN .97 .96 .97 .96 .97 .96 .96
RCN .61 .41 .64 .49 .54 .52 .64
Within ω .87 .77 .84 .73 .76 .83 .85
Between ω .97 .94 .95 .93 .92 .94 .97
Sample 2 - COM
PA (10) PA (5) NA (10) NA (5) Fear Hostility Sadness

RkF .99 .98 .97 .94 .95 .95 .95
RkR .97 .96 .92 .90 .92 .90 .88
RC .64 .46 .59 .42 .44 .52 .62
RkRN .97 .96 .92 .89 .91 .90 .87
RCN .56 .38 .56 .40 .40 .50 .60
Within ω .88 .80 .81 .68 .73 .81 .84
Between ω .95 .88 .91 .88 .83 .93 .92

Note. RkF = Between-person reliability when averaged across measurement times (fixed); RkR = Between-person reliability when averaged across measurement times (random); RC = Within-person reliability of change; RkRN = Between-person reliability when averaged across measurement times (random, nested); RCN = Within-person reliability of change (nested); Within ω = Within-person omega; Between ω = Between-person omega. BPD = Borderline Personality Disorder; DD = Depressive Disorder

Table 6.

Multilevel Reliabilities for Samples 3 and 4

Sample 3
PA (10) PA (5) NA (10) NA (5) Fear Hostility Sadness
RkF .97 .93 .89 .79 .86 .90 .88
RkR .91 .88 .70 .64 .69 .73 .69
RC .66 .45 .71 .52 .64 .69 .69
RkRN .90 .88 .69 .63 .69 .73 .69
RCN .61 .40 .65 .48 .59 .63 .68
Within ω .89 .79 .84 .73 .77 .86 .85
Between ω .96 .94 .81 .79 .77 .89 .93
Sample 4
PA (10) PA (5) NA (10) NA (5) Fear Hostility Sadness
RkF .98 .96 .97 .93 .97 .95 .96
RkR .94 .91 .93 .89 .94 .90 .90
RC .69 .51 .61 .42 .54 .53 .64
RkRN .94 .91 .93 .88 .94 .89 .90
RCN .64 .44 .54 .35 .46 .46 .59
Within ω .92 .87 .85 .73 .82 .82 .88
Between ω .96 .89 .95 .88 .97 .92 .89

Note. RkF = Between-person reliability when averaged across measurement times (fixed); RkR = Between-person reliability when averaged across measurement times (random); RC = Within-person reliability of change; RkRN = Between-person reliability when averaged across measurement times (random, nested); RCN = Within-person reliability of change (nested); Within ω = Within-person omega; Between ω = Between-person omega

Table 7.

Variance Decomposition and % for Sample 1

Sample 1
PA (10) PA (5) NA (10) NA (5) Fear Hostility Sadness
Crossed σ² % σ² % σ² % σ² % σ² % σ² % σ² %
 Person 0.31 23 0.34 23 0.28 27 0.27 26 0.34 32 0.26 26 0.52 37
 Day 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Item 0.10 7 0.08 6 0.04 4 0.05 5 0.04 4 0.05 6 0.00 0
 Person x Day 0.17 12 0.19 13 0.15 14 0.15 15 0.16 15 0.16 16 0.24 17
 Person x Item 0.13 10 0.14 10 0.13 12 0.13 12 0.13 12 0.09 9 0.13 9
 Day x Item 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Residual 0.66 48 0.70 48 0.46 44 0.44 42 0.39 37 0.43 43 0.50 36
 Total 1.38 100 1.45 100 1.06 100 1.04 100 1.06 100 0.99 100 1.40 100
Nested
 Person 0.32 24 0.36 25 0.29 28 0.29 29 0.36 35 0.27 28 0.54 39
 Day(Person) 0.16 12 0.18 12 0.15 14 0.15 14 0.15 15 0.16 16 0.23 17
 Residual 0.88 64 0.90 62 0.62 58 0.59 57 0.54 51 0.56 56 0.62 44
 Total 1.37 100 1.44 100 1.06 100 1.03 100 1.05 100 0.98 100 1.40 100
Sample 1 - BPD
Crossed σ² % σ² % σ² % σ² % σ² % σ² % σ² %
 Person 0.29 21 0.31 21 0.37 31 0.34 31 0.38 34 0.33 29 0.58 41
 Day 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Item 0.10 7 0.08 6 0.04 3 0.04 4 0.05 4 0.06 5 0.00 0
 Person x Day 0.18 13 0.20 14 0.17 15 0.17 15 0.18 16 0.19 17 0.24 17
 Person x Item 0.13 9 0.14 10 0.11 9 0.10 9 0.10 9 0.10 9 0.10 7
 Day x Item 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Residual 0.69 50 0.73 50 0.48 41 0.45 41 0.40 36 0.46 40 0.50 35
 Total 1.39 100 1.47 100 1.18 100 1.12 100 1.12 100 1.14 100 1.42 100
Nested
 Person 0.30 22 0.34 24 0.38 32 0.37 33 0.40 36 0.35 31 0.59 42
 Day(Person) 0.18 13 0.19 13 0.17 15 0.17 15 0.18 16 0.19 17 0.24 17
 Residual 0.91 66 0.92 63 0.63 53 0.58 52 0.53 48 0.60 53 0.58 41
 Total 1.39 100 1.46 100 1.17 100 1.11 100 1.11 100 1.14 100 1.42 100
Sample 1 - MDD
Crossed σ² % σ² % σ² % σ² % σ² % σ² % σ² %
 Person 0.36 26 0.37 26 0.15 17 0.15 17 0.29 29 0.13 18 0.43 32
 Day 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Item 0.10 8 0.08 6 0.04 4 0.07 7 0.04 4 0.05 6 0.01 0
 Person x Day 0.15 11 0.17 12 0.11 13 0.12 13 0.12 12 0.11 15 0.23 17
 Person x Item 0.14 10 0.14 10 0.15 17 0.17 18 0.16 17 0.07 10 0.18 14
 Day x Item 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Residual 0.61 45 0.66 46 0.43 49 0.41 45 0.36 37 0.38 51 0.51 38
 Total 1.35 100 1.43 100 0.89 100 0.92 100 0.97 100 0.75 100 1.36 100
Nested
 Person 0.37 27 0.40 28 0.17 19 0.19 20 0.31 32 0.14 19 0.47 34
 Day(Person) 0.14 11 0.16 11 0.11 13 0.11 12 0.12 12 0.11 15 0.22 16
 Residual 0.83 62 0.85 60 0.61 68 0.61 67 0.54 56 0.49 66 0.68 49
 Total 1.35 100 1.42 100 0.89 100 0.91 100 0.97 100 0.74 100 1.37 100

Note: BPD=Borderline Personality Disorder; DD=Depressive Disorder

Table 9.

Variance Decomposition and % for Samples 3 and 4

Sample 3
PA (10) PA (5) NA (10) NA (5) Fear Hostility Sadness

Crossed σ² % σ² % σ² % σ² % σ² % σ² % σ² %
 Person 0.32 22 0.32 22 0.04 6 0.03 5 0.04 8 0.08 10 0.09 11
 Day 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Item 0.05 3 0.08 5 0.03 5 0.03 4 0.02 3 0.04 6 0.00 0
 Person x Day 0.16 11 0.14 10 0.10 15 0.09 15 0.08 18 0.15 20 0.21 26
 Person x Item 0.13 9 0.10 6 0.07 11 0.05 9 0.04 9 0.06 8 0.03 4
 Day x Item 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Residual 0.81 55 0.85 57 0.42 63 0.40 67 0.29 62 0.42 56 0.45 58
 Total 1.47 100 1.48 100 0.66 100 0.60 100 0.46 100 0.76 100 0.79 100
Nested
 Person 0.33 23 0.34 23 0.05 7 0.04 7 0.04 9 0.09 12 0.10 12
 Day(Person) 0.15 10 0.14 9 0.10 15 0.08 14 0.08 18 0.15 20 0.20 26
 Residual 0.98 67 1.00 68 0.52 78 0.46 78 0.33 73 0.52 69 0.48 62
 Total 1.47 100 1.47 100 0.66 100 0.59 100 0.45 100 0.75 100 0.78 100
Sample 4
Crossed σ² % σ² % σ² % σ² % σ² % σ² % σ² %
 Person 0.25 20 0.21 15 0.11 13 0.07 9 0.14 20 0.09 12 0.17 17
 Day 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Item 0.06 5 0.09 7 0.05 5 0.05 7 0.05 7 0.07 10 0.00 0
 Person x Day 0.16 12 0.16 11 0.08 9 0.07 9 0.07 10 0.08 11 0.18 18
 Person x Item 0.11 9 0.14 10 0.10 12 0.09 13 0.08 12 0.05 7 0.12 12
 Day x Item 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Residual 0.71 55 0.76 56 0.50 60 0.45 62 0.35 51 0.42 59 0.52 53
 Total 1.29 100 1.35 100 0.83 100 0.72 100 0.68 100 0.71 100 0.99 100
Nested
 Person 0.26 20 0.24 18 0.12 15 0.09 12 0.15 22 0.10 14 0.19 19
 Day(Person) 0.15 12 0.15 11 0.07 9 0.06 9 0.07 10 0.07 11 0.18 18
 Residual 0.87 68 0.95 71 0.63 76 0.56 79 0.46 68 0.53 76 0.62 63
 Total 1.29 100 1.34 100 0.83 100 0.71 100 0.67 100 0.70 100 0.99 100

Positive Affect

Within-person ω for the 10-item positive affect scale was substantial across all samples (range 0.87–0.92). For the 5-item positive affect scale, within-person ω ranged from moderate to substantial (0.77–0.87). For the 10-item positive affect scale, RCN ranged from fair to moderate (0.56–0.66). RCN for the 5-item positive affect scale ranged from slight to fair (0.38–0.51). For both the 10- and 5-item positive affect scales, between-person ω and RkRN were substantial (≥0.88).

Negative Affect

Within-person ω was substantial for the 10-item negative affect scale (range 0.81–0.88), and moderate for the 5-item negative affect scale (0.68–0.79). For the 10-item negative affect scale RCN ranged from fair to moderate (0.54–0.73), while RCN ranged from slight to fair for the 5-item version (0.35–0.59). Between-person ω was substantial for the 10-item negative affect scale (0.81–0.97) and ranged from moderate to substantial for the 5-item scale (0.79–0.95). RkRN ranged from moderate to substantial for both the 10- and 5-item version (10-item: 0.69–0.98; 5-item: 0.63–0.97).

Fear, Hostility, and Sadness

Within-person ω for the 6-item fear scale was moderate to substantial (range 0.73–0.84), and RCN ranged from slight to moderate (0.40–0.67). Between-person ω and RkRN for fear ranged from moderate to substantial (ω: 0.77–0.97; RkRN: 0.69–0.98). For the 6-item hostility scale, within-person ω was substantial (0.81–0.87), and RCN ranged from fair to moderate (0.46–0.65). Between-person ω for hostility was substantial (0.89–0.96), and RkRN ranged from moderate to substantial (0.73–0.97). Within-person ω for the 5-item sadness scale was substantial (0.84–0.88), and RCN ranged from fair to moderate (0.59–0.68). For sadness, between-person ω was substantial (0.89–0.98), and RkRN ranged from moderate to substantial (0.69–0.98).

Supplemental Validity Analyses

Results from supplemental analyses examining correlations among the affect subscales and depressive symptoms, Neuroticism, Extraversion, and alcohol use are presented in Supplemental Tables 2–4. The 10- and 5-item versions of the positive and negative affect scales showed nearly identical patterns of correlation with related constructs. Both versions of negative affect were positively associated with depressive symptoms (5-item r = 0.61; 10-item r = 0.64), Neuroticism domain scores (5-item r = 0.48; 10-item r = 0.51), and all Neuroticism facet scores. Both negative affect subscales were negatively associated with positive emotions (5-item r = −0.25; 10-item r = −0.26) and drinking occurrence (rs = −0.05). The 10- and 5-item versions of positive affect were negatively associated with Neuroticism domain scores (5-item r = −0.14; 10-item r = −0.17), and the depression and self-consciousness facets. Both positive affect subscales were positively associated with activity (5-item r = 0.22; 10-item r = 0.26) and drinking occurrence (5-item r = 0.05; 10-item r = 0.08). Fear, hostility, and sadness subscales were all positively associated with depressive symptoms (r range = 0.59 to 0.66), Neuroticism domain scores (r range = 0.46 to 0.48), and all Neuroticism facets. The fear, hostility, and sadness subscales were all negatively associated with drinking occurrence (r range = −0.06 to −0.04).

Discussion

We examined the reliability of the PANAS-X subscales in an ecological momentary assessment framework using multiple samples and forms of reliability estimation. Overall, our findings support previous work demonstrating the psychometric strength of the PANAS-X and extend this work by identifying the circumstances under which the PANAS-X can be used reliably as an intensive repeated measure. As anticipated based on the assumptions made by each approach, Generalizability Theory estimates, particularly when time was treated as nested, were generally lower than coefficient ω estimates. The PANAS-X showed particularly strong between-person reliability, while within-person reliability varied depending on scale content and length.

Reliability of Positive and Negative Affect Subscales

We examined both 10-item (from the PANAS-X) and 5-item (from the I-PANAS-SF) versions of positive and negative affect subscales. Between-person reliabilities (between-person ω, RkF, RkR, RkRN) were largely equivalent across 5-item and 10-item versions, demonstrating moderate to substantial reliability. This indicates that when a researcher is primarily interested in between-person differences in affect, either scale length could reliably be used. Reliability of within-person change was lower than between-person reliability for each version of the positive and negative affect subscales, particularly when time was treated as nested (RCN). While reliabilities for 10-item versions of positive and negative affect were fair to moderate, 5-item versions of these subscales ranged from slight to fair. When primarily examining within-person change in positive or negative affect, the 10-item version of the PANAS-X positive or negative affect may therefore detect this change most reliably. However, with the exception of one estimate of within-person reliability for negative affect (Sample 4 RCN = .35), all within-person reliability estimates for 5-item versions of positive and negative affect were fair, suggesting that these shorter subscales can reliably detect within-person change.

Reliability of Fear, Hostility, and Sadness Subscales

Between-person ω and GT estimates of between-person reliability for fear, hostility, and sadness subscales were all moderate to substantial, suggesting these subscales can reliably examine between-person differences across a study period. Although they only comprised 5–6 items each, estimates of within-person reliabilities of change were fair to moderate for hostility and sadness, indicating that both subscales can also be used to reliably measure within-person change. Within-person reliability estimates of fear were somewhat more variable (slight to moderate; 0.40–0.67), but most estimates produced fell in the fair to moderate range. This indicates that the fear subscale may be used to detect within-person change, though reliability should be examined within the study sample, particularly when time is treated as nested within person (RCN).

Using the PANAS-X in Daily Life Research

Researchers should consider the unique aspects of their planned study design when determining whether and how to use the PANAS-X in daily life research. For example, in many EMA/AA designs, measurement occasions are best represented as nested within person. However, this study found that, compared to GT estimates of reliability when measurement times were crossed (i.e., RkR and RC), the corresponding nested GT estimates (RkRN and RCN) were lower, most visibly for within-person reliability of change. This suggests that studies measuring the same individuals over the same fixed time intervals may measure mood dynamics more reliably, though practical and other constraints may not support such a study design.
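The gap between crossed and nested estimates of within-person change can be illustrated from the tabled variance components. The sketch below is not the article's own code; it assumes the reliability of change takes the form commonly given in the Generalizability Theory literature (e.g., Cranford et al., 2006), namely occasion-level variance divided by itself plus residual variance over the number of items. The function name and dictionary keys are hypothetical. The inputs are the Sample 1 - BPD, PA (10) components from Table 7.

```python
def reliability_of_change(var_occasion: float, var_residual: float, n_items: int) -> float:
    """Assumed form: var_occasion / (var_occasion + var_residual / n_items)."""
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return var_occasion / (var_occasion + var_residual / n_items)


# Sample 1 - BPD, PA (10), as reported in Table 7 (rounded to two decimals).
crossed = {"person_x_day": 0.18, "residual": 0.69}
nested = {"day_in_person": 0.18, "residual": 0.91}
N_ITEMS = 10  # items in the 10-item positive affect scale

# Crossed design: Person x Day is the occasion-level component.
rc = reliability_of_change(crossed["person_x_day"], crossed["residual"], N_ITEMS)
# Nested design: Day(Person) plays that role, with a larger residual.
rcn = reliability_of_change(nested["day_in_person"], nested["residual"], N_ITEMS)

print(f"RC  = {rc:.2f}")
print(f"RCN = {rcn:.2f}")
```

Under these assumptions the recomputed values agree with the tabled RC and RCN for this subgroup to two decimals. The nested estimate is lower because the nested residual absorbs variance that the crossed model attributes to separate item-related components.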

Further, when examining score variance due to item and person*item, it appears that many PANAS-X subscales have a notable amount of variance due to item and how individuals respond to certain items (see Tables 7–9). For example, in Sample 1, 7% of the variance in positive affect was due to item, or the general tendency for some items to receive higher or lower responses than others, and 10% of the variance was due to the way some people responded idiosyncratically to certain items (e.g., an affirmative response style; person*item, see Table 7). This variability due to item content and individual responses to particular items is important to consider given the heterogeneity in items selected for use in EMA studies on affect. Often, researchers will select a small number of items from a scale like the PANAS-X to reduce participant burden in an EMA design, perhaps without consideration for how this may influence multilevel reliability and internal consistency. Item-level variability was identified using Generalizability Theory approaches to reliability, and GT methods assume τ equivalence. As previously discussed, when this assumption of equal item loadings is violated, GT approaches may underestimate reliability (Lane & Shrout, 2010). When using an approach that does not assume this item equivalence (i.e., coefficient ω), within-person reliability estimates were moderate to substantial (at or above 0.7) for all subscales.

It is important for researchers to consider what information reliability estimates provide in an EMA context. Determining that a measure can reliably detect within-person change does not necessarily mean that it is sensitive to detect that change across all contexts and time scales. Researchers must consider whether adequate fluctuation in their construct of interest can reasonably occur within the time frame they plan to observe. Further, if the construct of interest is thought to fluctuate in response to a particular event (e.g., affect change following alcohol consumption), timing of prompts closer in time to the target event (e.g., event-contingent designs) may capture greater amounts of meaningful variation than a fixed-interval design. As researchers are often interested in how two constructs covary over time, some have proposed examining the reliability of within-person couplings over time (e.g., stress and negative affect, Neubauer et al., 2020) to maximize reliability.

In addition to reliability, researchers should also consider the relative amount of between-person and within-person variability in their affective subscale of interest. We found stronger between-person reliability than within-person reliability for the PANAS-X, and variance decomposition estimates (see Tables 7–9) suggest that in general a larger proportion of variance in item response was between-person (person) than within-person (person*day or day[person]). While much research on affect is focused on within-person dynamic changes over time, the amount of variance due to within-person change varied depending on subscale. Within-person variance in positive affect, for example, ranged from 10% to 12%, while for sadness this range was 16% to 26%. While positive affect and sadness subscales may demonstrate similar reliability, researchers must still consider whether there is adequate within-person variability within a given sample in their affective construct of interest.
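As a rough illustration of how between-person and within-person shares can be read off a variance decomposition, the following sketch converts the crossed components for Sample 1, PA (10) in Table 7 into percentages. The helper name and dictionary keys are hypothetical, and because the tabled components are rounded to two decimals, recomputed percentages can differ from the published ones by about a point.

```python
def variance_shares(components: dict[str, float]) -> dict[str, float]:
    """Return each variance component as a percentage of their sum."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {name: 100.0 * value / total for name, value in components.items()}


# Crossed decomposition for Sample 1, PA (10), from Table 7.
sample1_pa10 = {
    "person": 0.31,         # between-person differences
    "day": 0.00,
    "item": 0.10,
    "person_x_day": 0.17,   # within-person change across occasions
    "person_x_item": 0.13,  # idiosyncratic responses to particular items
    "day_x_item": 0.00,
    "residual": 0.66,
}

shares = variance_shares(sample1_pa10)
for name, pct in shares.items():
    print(f"{name:>14}: {pct:5.1f}%")
```

Comparing `shares["person"]` with `shares["person_x_day"]` gives a quick check on whether a subscale shows enough within-person variability in a given sample to support analyses of change.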

Clinical Research Implications

In this article, we focused on the reliability of commonly used scales, composed of PANAS-X items, for momentary measurement of affective experience in EMA/AA clinical research. Importantly, because the same set of 31 items (21 for NA and 10 for PA) was administered across the four studies, we were able to compare reliability indices across studies. Further, we were able to examine these indices for 10-item versions of PA and NA, 5-item versions of PA and NA, and fear, hostility, and sadness subscales. Therefore, our findings are relevant to a wide range of momentary affective states (both broad and specific) that are targeted by EMA/AA researchers.

Momentary affective experience is a dynamic process, and these items appear to reliably assess these affects as well as their changes. One way that emotion dysregulation has been operationalized is to examine changes in affect states from moment to moment (e.g., through lag analyses or mean square successive differences; Jahng et al., 2008), allowing for empirical tests of factors associated with such changes (e.g., experiences, contexts, interpersonal conflict; Hepp et al., 2021). As such, establishing the reliability of measuring change in affect is a necessary first step toward measuring emotion dysregulation in daily life, which is increasingly implicated in studies of psychopathology given its transdiagnostic nature.

Our results also suggest that clinical researchers seeking to assess the broad domains of NA and PA at the momentary level might consider using the five-item versions of these scales, as they do not appear to significantly sacrifice reliability of the scales or change scores. Clinical researchers who use EMA/AA methods must balance the length of assessments with participant burden, rendering the short versions of these scales attractive and viable alternatives. To this end, researchers often use single-item measures to capture a construct with minimal participant burden, and the approaches presented here can provide information about reliability of single affect items to detect within-person change. For example, the equation for RC (Equation 5) does not include item-level variance estimates and would be interpreted in a similar fashion for a single-item measure. However, examinations of single-item reliability suggest that using more than one item to measure a construct may be beneficial, as single items may be associated with greater error variance (Dejonckheere et al., 2022).
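To make the trade-off between scale length and reliability of change concrete, the sketch below projects RC for different numbers of items. It is a hypothetical illustration only: it assumes the same commonly used form of RC (occasion variance over occasion variance plus residual variance divided by the number of items) and holds the Sample 1 - BPD, PA (10) components from Table 7 fixed, whereas a real shorter scale would have its own variance estimates. The function name is an assumption.

```python
def projected_rc(var_person_x_day: float, var_residual: float, n_items: int) -> float:
    """Assumed RC form with residual variance averaged over n_items."""
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return var_person_x_day / (var_person_x_day + var_residual / n_items)


# Crossed components for Sample 1 - BPD, PA (10), from Table 7.
VAR_PERSON_X_DAY = 0.18
VAR_RESIDUAL = 0.69

# Projected reliability of change for 1-, 5-, and 10-item scales.
rc_by_items = {
    m: projected_rc(VAR_PERSON_X_DAY, VAR_RESIDUAL, m) for m in (1, 5, 10)
}

for m, value in rc_by_items.items():
    print(f"{m:>2} item(s): projected RC = {value:.2f}")
```

With a single item the residual is not averaged down at all, which is one way to see why single-item measures may carry more error variance than multi-item scales.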

Constraints on Generality, Limitations, and Future Directions

Although our reliability findings for the PANAS-X items were largely replicated across four studies, it is important to acknowledge that they might not generalize to other populations (e.g., those with other psychiatric diagnoses), other studies with different lengths of assessment (e.g., one month or longer), different racial/ethnic composition, or different ages (e.g., youth). Given the observed variance suggesting idiosyncratic differences in how some individuals responded to some items, understanding how PANAS-X items may vary across time as a function of these individual differences requires additional investigation.

Although PANAS-X items are very frequently used in EMA/AA research, there are alternative measures of affect and mood that are widely used in EMA studies (e.g., items from the Profile of Mood States or the Inventory of Anxiety and Depression Symptoms; Jimenez et al., 2022). Our findings do not provide information about other measures of affect (or other combinations of PANAS-X items), and the multilevel reliability of these measures needs to be evaluated in each study (see Jimenez et al., 2022). Indeed, the PANAS-X may not capture the full range of discrete affect states that a researcher may be interested in examining. While we examined positive and negative affect along with fear, hostility, and sadness subscales, other theoretical models of affect include additional affective states, including low arousal positive affect (e.g., relaxed, calm) not included in the PANAS-X positive affect subscale. Circumplex models conceptualize affect along two dimensions, and in addition to positive and negative affect (Watson et al., 1988), some consider tension and energy (Thayer, 1996), approach and withdrawal (Lang, Bradley, & Cuthbert, 1998), or valence and arousal (Russell, 1980) as primary dimensions. Future work should consider the multilevel reliability of a broader range of affective states and measurement modalities.

Conclusions

Considerable EMA/AA research has used the PANAS-X to examine affect and affective change between- and within-person. This psychometric investigation demonstrates that the PANAS-X can reliably capture affective states and fluctuations when used in an intensive repeated measures framework for positive affect, negative affect, fear, hostility, and sadness. Researchers should, however, continue to consider the underlying structure and variance distribution of their data when using the PANAS-X to measure affect in daily life.

Supplementary Material

Supplemental Material

Table 8.

Variance Decomposition and % for Sample 2

Sample 2
PA (10) PA (5) NA (10) NA (5) Fear Hostility Sadness

Crossed σ² % σ² % σ² % σ² % σ² % σ² % σ² %
 Person 0.33 24 0.29 22 0.15 29 0.14 28 0.15 32 0.13 26 0.20 35
 Day 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Item 0.08 6 0.07 6 0.01 2 0.01 3 0.01 3 0.02 3 0.00 0
 Person x Day 0.13 10 0.13 9 0.05 10 0.05 11 0.05 11 0.06 11 0.10 17
 Person x Item 0.15 11 0.15 11 0.05 10 0.05 10 0.05 11 0.05 9 0.02 4
 Day x Item 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Residual 0.68 50 0.72 53 0.25 49 0.23 48 0.20 43 0.24 50 0.25 44
 Total 1.37 100 1.36 100 0.52 100 0.49 100 0.46 100 0.49 100 0.58 100
Nested
 Person 0.35 25 0.32 24 0.15 29 0.15 30 0.16 33 0.13 28 0.20 35
 Day(Person) 0.13 9 0.12 9 0.05 10 0.05 11 0.05 10 0.05 11 0.10 17
 Residual 0.90 65 0.91 67 0.32 61 0.29 59 0.26 56 0.30 61 0.28 48
 Total 1.37 100 1.36 100 0.52 100 0.49 100 0.47 100 0.49 100 0.58 100
Sample 2 - BPD
Crossed σ² % σ² % σ² % σ² % σ² % σ² % σ² %
 Person 0.39 27 0.34 24 0.21 26 0.19 25 0.22 30 0.18 24 0.29 30
 Day 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Item 0.06 4 0.06 4 0.03 3 0.03 4 0.03 5 0.04 5 0.00 0
 Person x Day 0.15 10 0.14 10 0.09 11 0.10 13 0.09 12 0.09 12 0.17 18
 Person x Item 0.14 10 0.14 10 0.09 11 0.08 10 0.08 11 0.08 10 0.04 5
 Day x Item 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Residual 0.71 49 0.74 52 0.39 49 0.38 48 0.31 42 0.37 49 0.44 46
 Total 1.46 100 1.42 100 0.81 100 0.79 100 0.74 100 0.76 100 0.94 100
Nested
 Person 0.40 27 0.37 26 0.22 26 0.21 26 0.23 31 0.19 25 0.29 31
 Day(Person) 0.15 10 0.13 9 0.09 11 0.09 12 0.09 11 0.09 11 0.17 18
 Residual 0.93 63 0.93 65 0.52 63 0.49 62 0.44 58 0.48 63 0.48 51
 Total 1.48 100 1.43 100 0.83 100 0.80 100 0.76 100 0.76 100 0.95 100
Sample 2 - COM
Crossed σ² % σ² % σ² % σ² % σ² % σ² % σ² %
 Person 0.29 22 0.26 20 0.02 9 0.01 9 0.01 9 0.02 10 0.01 10
 Day 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Item 0.09 7 0.09 7 0.00 2 0.00 2 0.00 2 0.00 2 0.00 0
 Person x Day 0.11 9 0.12 9 0.02 10 0.01 10 0.01 9 0.02 13 0.03 21
 Person x Item 0.15 12 0.15 11 0.01 8 0.01 6 0.01 10 0.01 5 0.01 4
 Day x Item 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0 0.00 0
 Residual 0.65 50 0.69 53 0.12 71 0.10 72 0.09 70 0.13 69 0.09 64
 Total 1.29 100 1.31 100 0.17 100 0.14 100 0.13 100 0.18 100 0.14 100
Nested
 Person 0.30 24 0.29 22 0.02 10 0.01 11 0.01 11 0.02 11 0.02 11
 Day(Person) 0.11 9 0.11 8 0.02 10 0.01 10 0.01 9 0.02 13 0.03 21
 Residual 0.87 68 0.90 69 0.14 80 0.11 79 0.11 80 0.14 76 0.10 68
 Total 1.28 100 1.29 100 0.17 100 0.14 100 0.13 100 0.18 100 0.14 100

Note: BPD=Borderline Personality Disorder; DD=Depressive Disorder

Public Significance Statement:

The Positive and Negative Affect Schedule (PANAS-X) is commonly used to measure change in affect from moment to moment in ecological momentary assessment studies, though the PANAS-X was not originally designed with these studies in mind. This study demonstrates that researchers can reliably use the PANAS-X to capture changes in positive affect, negative affect, sadness, hostility, and fear in daily life using ecological momentary assessment.

Acknowledgments

This research was supported by National Institute on Alcohol Abuse and Alcoholism (NIAAA) Grants T32 AA013526, F31 AA027958, P60 AA011998 (PI: Trull/Heath), R21 MH069472 (Trull), AA022099 (Trull, PI), and MH100359 (Kerns and Trull, MPIs).

Footnotes

This study was not preregistered.

Data and study materials are not available online but are available upon request.

Disclosures: Timothy J. Trull is a co-founder of TigerAware LLC which developed the software described in this study. He receives no compensation or royalties for the use of the software.

Alison M. Haney played a lead role in conceptualization, data curation, formal analysis, methodology, validation, writing of original draft, and writing of review and editing. Megan N. Fleming played a supporting role in data curation, project administration, validation, writing of original draft, and writing of review and editing. Andrea M. Wycoff played a supporting role in conceptualization, data curation, formal analysis, methodology, project administration, investigation, and writing of review and editing. Sarah A. Griffin played a supporting role in data curation, writing of original draft, and writing of review and editing. Timothy J. Trull played a lead role in funding acquisition, investigation, project administration, resources, software, and supervision and a supporting role in conceptualization, formal analysis, methodology, validation, writing of original draft, and writing of review and editing.

References

  1. Beck AT, Ward CH, Mendelson M, Mock J, & Erbaugh J (1961). An inventory for measuring depression. Archives of General Psychiatry, 4(6), 561–571. 10.1001/archpsyc.1961.01710120031004 [DOI] [PubMed] [Google Scholar]
  2. Borkenau P, & Ostendorf F (1998). The Big Five as states: How useful is the five-factor model to describe intraindividual variations over time? Journal of Research in Personality, 32(2), 202–221. Scopus®. 10.1006/jrpe.1997.2206 [DOI] [Google Scholar]
  3. Bejarano CM, Cushing CC, & Crick CJ (2019). Does context predict psychological states and activity? An ecological momentary assessment pilot study of adolescents. Psychology of Sport and Exercise, 41, 146–152. 10.1016/j.psychsport.2018.05.008 [DOI] [Google Scholar]
  4. Beute F, & de Kort YAW (2018). The natural context of wellbeing: Ecological momentary assessment of the influence of nature and daylight on affect and stress for individuals with depression levels varying from none to clinical. Health & Place, 49, 7–18. 10.1016/j.healthplace.2017.11.005 [DOI] [PubMed] [Google Scholar]
  5. Calamia M (2019). Practical considerations for evaluating reliability in ambulatory assessment studies. Psychological Assessment, 31(3), 285–291. 10.1037/pas0000599 [DOI] [PubMed] [Google Scholar]
  6. Costa PT Jr., & McCrae RR (2008). The Revised NEO Personality Inventory (NEO-PI-R). In The SAGE Handbook of Personality Theory and Assessment: Volume 2—Personality Measurement and Testing (pp. 179–198). SAGE Publications Ltd. 10.4135/9781849200479 [DOI] [Google Scholar]
  7. Cotigă MI (2012). Development and validation of a Romanian version of the expanded version of Positive and Negative Affect Schedule (PANAS-X). PSIWORLD 2011, 33, 248–252. 10.1016/j.sbspro.2012.01.121 [DOI] [Google Scholar]
  8. Cranford JA, Shrout PE, Iida M, Rafaeli E, Yip T, & Bolger N (2006). A procedure for evaluating sensitivity to within-person change: Can mood measures in diary studies detect change reliably? Personality and Social Psychology Bulletin, 32(7), 917–929. 10.1177/0146167206287721 [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Cronbach L, Gleser G, Nanda H, & Rajaratnam N (1972). The dependability of behavioral measurement: Theory of generalizability for scores and profiles Wiley. [Google Scholar]
  10. Dejonckheere E, Demeyer F, Geusens B, Piot M, Tuerlinckx F, Verdonck S, & Mestdagh M (2022). Assessing the reliability of single-item momentary affective measurements in experience sampling. Psychological Assessment 10.1037/pas0001178 [DOI] [PubMed]
  11. Dillon KH, Glenn JJ, Dennis PA, Mann AJ, Deming CA, Aho N, Hertzberg JS, DeBeer BB, Meyer EC, Morissette SB, Gratz KL, Silvia PJ, Calhoun PS, Beckham JC, & Kimbrel NA (2021). Affective states and nonsuicidal self‐injury (NSSI): Results from an ecological momentary assessment study of veterans with NSSI disorder. Suicide and Life-Threatening Behavior, sltb.12818. 10.1111/sltb.12818 [DOI] [PMC free article] [PubMed]
  12. Duif M, Thewissen V, Wouters S, Lechner L, & Jacobs N (2019). Affective instability and alcohol consumption: Ecological momentary assessment in an adult sample. Journal of Studies on Alcohol and Drugs, 80(4), 441–447. 10.15288/jsad.2019.80.441 [DOI] [PubMed] [Google Scholar]
  13. Dvorak RD, Pearson MR, Sargent EM, Stevenson BL, & Mfon AM (2016). Daily associations between emotional functioning and alcohol involvement: Moderating effects of response inhibition and gender. Drug and Alcohol Dependence, 163, S46–S53. 10.1016/j.drugalcdep.2015.09.034 [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. First MG, Spitzer RL, Gibbon M, & Williams JB (1995). Structured Clinical Interview for DSM-IV Axis I disorder—Patient Ed New York State Psychiatric Institute, Biometrics Research Department. [Google Scholar]
  15. Fisher CD, & To ML (2012). Using experience sampling methodology in organizational behavior. Journal of Organizational Behavior, 33(7), 865–877. 10.1002/job.1803 [DOI] [Google Scholar]
  16. Fredrickson BL (2000). Extracting meaning from past affective experiences: The importance of peaks, ends, and specific emotions. Cognition & Emotion, 14(4), 577–606. 10.1080/026999300402808 [DOI] [Google Scholar]
  17. Fredrickson BL, & Kahneman D (1993). Duration neglect in retrospective evaluations of affective episodes. Journal of Personality and Social Psychology, 65(1), 45–55. 10.1037/0022-3514.65.1.45 [DOI] [PubMed] [Google Scholar]
  18. Geldhof GJ, Preacher KJ, & Zyphur MJ (2014). Reliability estimation in a multilevel confirmatory factor analysis framework. Psychological Methods, 19(1), 72–91. 10.1037/a0032138 [DOI] [PubMed] [Google Scholar]
  19. Hall M, Scherner PV, Kreidel Y, & Rubel JA (2021). A systematic review of momentary assessment designs for mood and anxiety symptoms. Frontiers in Psychology, 12, 642044. 10.3389/fpsyg.2021.642044 [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Hamrick LR, Haney AM, Kelleher BL, & Lane SP (2020). Using generalizability theory to evaluate the comparative reliability of developmental measures in neurogenetic syndrome and low-risk populations. Journal of Neurodevelopmental Disorders, 12(1), 16. 10.1186/s11689-020-09318-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Hepp J, Carpenter RW, Freeman LK, Vebares TJ, & Trull TJ (2021). The environmental, interpersonal, and affective context of nonsuicidal self-injury urges in daily life. Personality Disorders: Theory, Research, and Treatment, 12(1), 29–38. 10.1037/per0000456 [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Huang P-H, & Weng L-J (2012). Estimating the reliability of aggregated and within-person centered scores in ecological momentary assessment. Multivariate Behavioral Research, 47(3), 421–441. 10.1080/00273171.2012.673924 [DOI] [PubMed] [Google Scholar]
  23. Jahng S, Wood PK, & Trull TJ (2008). Analysis of affective instability in ecological momentary assessment: Indices using successive difference and group comparison via multilevel modeling. Psychological Methods, 13(4), 354–375. 10.1037/a0014173 [DOI] [PubMed] [Google Scholar]
  24. Jimenez A, McMahon TP, Watson D, & Naragon-Gainey K (2022). Dysphoria and well-being in daily life: Development and validation of ecological momentary assessment scales. Psychological Assessment. 10.1037/pas0001117 [DOI] [PubMed]
  25. Lane SP, & Shrout PE (2010). Assessing the reliability of within-person change over time: A dynamic factor analysis approach. Multivariate Behavioral Research, 45(6), 1027. 10.1080/00273171.2010.534380 [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Lang PJ, Bradley MM, & Cuthbert BN (1998). Emotion, motivation, and anxiety: Brain mechanisms and psychophysiology. Biological Psychiatry, 44(12), 1248–1263. 10.1016/S0006-3223(98)00275-3 [DOI] [PubMed] [Google Scholar]
  27. Liu H, Xie QW, & Lou VWQ (2019). Everyday social interactions and intra-individual variability in affect: A systematic review and meta-analysis of ecological momentary assessment studies. Motivation and Emotion, 43(2), 339–353. 10.1007/s11031-018-9735-x [DOI] [Google Scholar]
  28. McDonald RP (1999). Test theory: A unified treatment (1st ed.). Lawrence Erlbaum Associates, Inc. [Google Scholar]
  29. Merz EL, Malcarne VL, Roesch SC, Ko CM, Emerson M, Roma VG, & Sadler GR (2013). Psychometric properties of Positive and Negative Affect Schedule (PANAS) original and short forms in an African American community sample. Journal of Affective Disorders, 151(3), 942–949. 10.1016/j.jad.2013.08.011 [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Mote J, & Fulford D (2020). Ecological momentary assessment of everyday social experiences of people with schizophrenia: A systematic review. Schizophrenia Research, 216, 56–68. 10.1016/j.schres.2019.10.021 [DOI] [PubMed] [Google Scholar]
  31. Murray AL, Wong S-C, Obsuth I, Rhodes S, Eisner M, & Ribeaud D (2021). An ecological momentary assessment study of the role of emotional dysregulation in co-occurring ADHD and internalising symptoms in adulthood. Journal of Affective Disorders, 281, 708–713. 10.1016/j.jad.2020.11.086 [DOI] [PubMed] [Google Scholar]
  32. Nezlek JB, & Gable SL (2001). Depression as a moderator of relationships between positive daily events and day-to-day psychological adjustment. Personality and Social Psychology Bulletin, 27(12), 1692–1704. 10.1177/01461672012712012 [DOI] [Google Scholar]
  33. Pfohl B, Blum N, & Zimmerman M (1997). Structured interview for DSM-IV personality: SIDP-IV. American Psychiatric Pub. [Google Scholar]
  34. Revelle W, & Condon DM (2018). Reliability from alpha to omega: A tutorial [Preprint]. PsyArXiv. 10.31234/osf.io/2y3w9 [DOI] [PubMed]
  35. Ringwald WR, Manuck SB, Marsland AL, & Wright AGC (2022). Psychometric evaluation of a Big Five personality state scale for intensive longitudinal studies. Assessment, 29(6), 1301–1319. 10.1177/10731911211008254 [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Russell JA (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178. 10.1037/h0077714 [DOI] [Google Scholar]
  37. Schaefer LM, Smith KE, Anderson LM, Cao L, Crosby RD, Engel SG, Crow SJ, Peterson CB, & Wonderlich SA (2020). The role of affect in the maintenance of binge-eating disorder: Evidence from an ecological momentary assessment study. Journal of Abnormal Psychology, 129(4), 387–396. 10.1037/abn0000517 [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Shavelson RJ, & Webb NM (1991). Generalizability theory: A primer (pp. xiii, 137). Sage Publications, Inc. [Google Scholar]
  39. Sheehan DV, Lecrubier Y, Sheehan KH, Amorim P, Janavs J, Weiller E, & Dunbar GC (1998). The Mini-International Neuropsychiatric Interview (MINI): The development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. Journal of Clinical Psychiatry, 59(20), 22–33. [PubMed] [Google Scholar]
  40. Shrout PE, & Lane SP (2012). Psychometrics. In Mehl MR & Conner TS (Eds.), Handbook of research methods for studying daily life (pp. 302–320). The Guilford Press. [Google Scholar]
  41. Stone AA, & Shiffman S (2002). Capturing momentary, self-report data: A proposal for reporting guidelines. Annals of Behavioral Medicine, 24(3), 236–243. 10.1207/S15324796ABM2403_09 [DOI] [PubMed] [Google Scholar]
  42. Terracciano A, McCrae RR, & Costa PT Jr. (2003). Factorial and Construct Validity of the Italian Positive and Negative Affect Schedule (PANAS). European Journal of Psychological Assessment, 19(2), 131–141. 10.1027//1015-5759.19.2.131 [DOI] [PMC free article] [PubMed] [Google Scholar]
      Thayer RE (1996). The origin of everyday moods: Managing energy, tension, and stress. Oxford University Press.
  43. Thompson ER (2007). Development and Validation of an Internationally Reliable Short-Form of the Positive and Negative Affect Schedule (PANAS). Journal of Cross-Cultural Psychology, 38(2), 227–242. 10.1177/0022022106297301 [DOI] [Google Scholar]
  44. Trull TJ, & Ebner-Priemer UW (2020). Ambulatory assessment in psychopathology research: A review of recommended reporting guidelines and current practices. Journal of Abnormal Psychology, 129(1), 56. [DOI] [PubMed] [Google Scholar]
  45. Trull TJ, Lane SP, Koval P, & Ebner-Priemer UW (2015). Affective dynamics in psychopathology. Emotion Review, 7(4), 355–361. 10.1177/1754073915590617 [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Trull TJ, Solhan MB, Tragesser SL, Jahng S, Wood PK, Piasecki TM, & Watson D (2008). Affective instability: Measuring a core feature of borderline personality disorder with ecological momentary assessment. Journal of Abnormal Psychology, 117(3), 647–661. 10.1037/a0012532 [DOI] [PubMed] [Google Scholar]
  47. Vinci C, Li L, Wu C, Lam CY, Guo L, Correa-Fernández V, Spears CA, Hoover DS, Etcheverry PE, & Wetter DW (2017). The association of positive emotion and first smoking lapse: An ecological momentary assessment study. Health Psychology, 36(11), 1038–1046. 10.1037/hea0000535 [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. von Humboldt S, Monteiro A, & Leal I (2017). Validation of the PANAS: A measure of positive and negative affect for use with cross-national older adults. Review of European Studies, 9(2), 11. 10.5539/res.v9n2p10 [DOI] [Google Scholar]
  49. Watson D, & Clark LA (1999). The PANAS-X: Manual for the Positive and Negative Affect Schedule—Expanded Form. University of Iowa. [Google Scholar]
  50. Watson D, Clark LA, & Tellegen A (1988). Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology, 54(6), 1063–1070. 10.1037/0022-3514.54.6.1063 [DOI] [PubMed] [Google Scholar]
  51. Wright AGC, & Zimmermann J (2019). Applied ambulatory assessment: Integrating idiographic and nomothetic principles of measurement. Psychological Assessment, 31(12), 1467–1480. 10.1037/pas0000685 [DOI] [PMC free article] [PubMed] [Google Scholar]
