JAMA Netw Open. 2024 Jan 22;7(1):e2352590. doi: 10.1001/jamanetworkopen.2023.52590

Mental Health Counseling From Conversational Content With Transformer-Based Machine Learning

Zac E Imel 1, Michael J Tanana 1, Christina S Soma 1, Thomas D Hull 2, Brian T Pace 1, Sarah C Stanco 1, Torrey A Creed 3, Theresa B Moyers 4, David C Atkins 1
PMCID: PMC10804269  PMID: 38252437

Key Points

Question

Is session content associated with client-reported outcomes in asynchronous text-based counseling?

Findings

In this quality improvement study, a transformer-based, deep learning model was used to evaluate the content from 166 644 clients and 20 600 274 messages. Significant correlations were found between therapist interventions and client satisfaction, engagement, and distress.

Meaning

The findings of this study suggest that components of supportive counseling (eg, asking open-ended questions and making reflective listening statements) may be key factors in the success of asynchronous text-based counseling.

Abstract

Importance

Use of asynchronous text-based counseling is rapidly growing as an easy-to-access approach to behavioral health care. As with in-person treatment, it is challenging to assess reliably because measures of process and content do not scale.

Objective

To use machine learning to evaluate clinical content and client-reported outcomes in a large sample of text-based counseling episodes of care.

Design, Setting, and Participants

In this quality improvement study, participants received text-based counseling between 2014 and 2019; data analysis was conducted from September 22, 2022, to November 28, 2023. The deidentified content of messages was retained as a part of ongoing quality assurance. Treatment was asynchronous text-based counseling via an online and mobile therapy app (Talkspace). Therapists were licensed to provide mental health treatment and were either independent contractors or employees of the product company. Participants were self-referred via online sign-up, received services through their insurance or self-pay, and were assigned a diagnosis by their health care professional.

Exposure

All clients received counseling services from a licensed mental health clinician.

Main Outcomes and Measures

The primary outcomes were client engagement in counseling (number of weeks), treatment satisfaction, and changes in client symptoms, measured via the 8-item version of the Patient Health Questionnaire (PHQ-8). A previously trained, transformer-based, deep learning model automatically categorized messages into types of therapist interventions and summaries of clinical content.

Results

The total sample included 166 644 clients treated by 4973 therapists (20 600 274 messages). Participating clients were predominantly female (75.23%), aged 26 to 35 years (55.4%), single (37.88%), White (61.8%), and held a bachelor’s degree (59.13%). There was substantial variability in intervention use and treatment content across therapists. A series of mixed-effects regressions indicated that, collectively, interventions and clinical content were associated with key outcomes: engagement (multiple R = 0.43), satisfaction (multiple R = 0.46), and change in PHQ-8 score (multiple R = 0.13).

Conclusions and Relevance

This quality improvement study found associations between therapist interventions, clinical content, and client-reported outcomes. Consistent with traditional forms of counseling, higher amounts of supportive counseling were associated with improved outcomes. These findings suggest that machine learning–based evaluations of content may increase the scale and specificity of psychotherapy research.


This quality improvement study examines the use of machine learning to evaluate the outcomes of text-based counseling services.

Introduction

Psychotherapy is a conversation-based treatment traditionally delivered via in-person, 50-minute sessions between a licensed therapist and a client.1 Owing to problems with access, maturing digital communication technologies, and the COVID-19 pandemic, text-based psychotherapy offers advantages over traditional, in-person treatment. In particular, asynchronous message-based conversations, in which a therapist and client may have an ongoing interaction, have become popular because of their convenience and immediacy (particularly for individuals with unpredictable work schedules) and their similarity to the text-based interactions common in everyday life.2,3,4

The outcomes of psychotherapy delivered through text can be comparable to those of treatments delivered in traditional formats for key diagnostic concerns,5,6,7 and additional large-scale studies are under way.8 However, as with traditional in-person treatment, it is likely that both outcomes and the quality of treatment provided vary.9 In particular, quality assurance is complicated by the lack of scalable methods for assessing therapy quality.10 Reference standard methods for evaluating psychotherapy rely on observational coding systems in which trained evaluators rate the presence of specific, evidence-based practices or content.11,12,13 Observational coding systems are labor-intensive, expensive at scale, and often not used.

Complicating the evaluation of psychotherapy in naturalistic settings, therapists use a variety of treatment practices, ranging from general, supportive counseling (eg, empathy and reflections) to more structured interventions, such as cognitive behavioral therapy (CBT).14 Therapy conversations can also vary substantially in topical focus (eg, suicide, relationship concerns, or chat).15 Without scalable assessment methods, the quality and content of psychotherapy are largely unknown, undermining quality improvement and treatment transparency for clients.

Advances in natural language processing, such as newer transformer-based machine learning methods, have accelerated research on the process of psychotherapy. Using reference standard observational coding and psychotherapy transcripts as training data, models learn to identify processes and content such as empathy,16 active listening,17 CBT skills,18 and session content.19,20 Dictionary-based natural language processing has shown associations between specific word use and treatment outcomes (eg, first-person pronoun use and symptom ratings in 6229 transcripts).21 In a study of approximately 17 000 clients receiving text-based manualized CBT, there were significant associations between a novel measure of therapist CBT skills, therapy-related content, symptom improvement, and treatment engagement.22

In this study, previously trained machine learning models assessed therapy content and the use of specific therapy interventions, based on reference standard measures created by the treatment developers (eg, Moyers et al23 and Young and Beck24), in asynchronous message-based counseling episodes for 166 644 clients. The resulting content and intervention assessments were then examined in relation to client satisfaction, engagement, and symptom change.

Methods

Design

Data were obtained from an online and mobile therapy app (Talkspace) for services provided between 2014 and 2019.25 The platform connects individuals seeking care, either directly through self-pay or as part of employer and health plan benefits, to a virtual treatment room with a network of contracted and employed licensed mental health professionals, including psychologists, clinical mental health therapists, marriage and family therapists, social workers, and psychiatrists. Although video teletherapy is available, most clients select the messaging option, in which they can message their therapist 24 hours a day, 7 days a week and receive responses during agreed-upon office hours; this messaging option is the focus of the current research. Diagnoses are established by the client’s clinician on the online and mobile therapy app, and the population is similar to that seeking care in outpatient private therapy practice. Large-scale effectiveness studies to date suggest that the platform is beneficial for most clients.6,26 The information captured through the platform is intended to support quality monitoring, treatment planning, and auditing of services. When registering to use the service, clients and therapists agree to the use of their anonymized data for quality assurance and for research. This study was therefore deemed exempt from full institutional review board review at the University of New Mexico. The article was prepared in accordance with the Standards for Quality Improvement Reporting Excellence (SQUIRE) reporting guideline.

A treatment episode was defined as all the messages a client sent and received. Clients whose treatment episodes included fewer than 30 messages were excluded. The total sample size varied across key outcomes because of missing responses for the 8-item version of the Patient Health Questionnaire (PHQ-8) and the satisfaction measure. For engagement, there were 166 644 clients and 4973 therapists (20 600 274 messages). For change in PHQ-8 score, there were 43 118 clients and 4381 therapists (26% of the total sample, 89% of therapists). For treatment satisfaction, there were 17 636 clients and 3155 therapists (11% of the total sample, 64% of therapists).

Outcomes

Clinical outcomes included satisfaction, client engagement, and change in the PHQ-8 score.27 Satisfaction was the mean of 3 survey items: (1) How satisfied are you with your overall Talkspace experience so far? (2) How likely are you to recommend Talkspace to a friend or colleague? and (3) How likely are you to recommend your therapist to a friend or colleague? Engagement was estimated as the number of weeks a client was an active user of Talkspace services. All client-reported measures were accessible through the app, and users received an in-app alert to complete surveys. Clients were asked to complete a symptom outcome measure at baseline and then every 3 weeks. The online and mobile therapy app first administered the PHQ-8; although the PHQ-9 was later added, the available sample was quite small. Because the PHQ-8 and PHQ-9 are very highly correlated,28,29 we used the PHQ-8 score for analysis. Satisfaction surveys were prompted at 5 and 14 days after treatment initiation, and models used the first available rating.
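To make the outcome definitions concrete, the sketch below shows one way the three client-reported outcomes could be derived from raw platform exports. It is a minimal illustration only: the data frames (surveys, messages, phq8) and their column names are hypothetical and do not reflect the platform’s actual schema, and the week-counting rule is one plausible operationalization of “weeks as an active user.”

```r
library(dplyr)

# Hypothetical inputs: `surveys`, `messages`, and `phq8` are illustrative data
# frames, not the platform's actual tables.

# Satisfaction: mean of the 3 survey items, using the first available rating
satisfaction <- surveys %>%
  arrange(client_id, survey_date) %>%
  distinct(client_id, .keep_all = TRUE) %>%
  mutate(satisfaction = (item_overall + item_recommend_app +
                           item_recommend_therapist) / 3) %>%
  select(client_id, satisfaction)

# Engagement: number of distinct calendar weeks in which the client sent a message
engagement <- messages %>%
  filter(sender == "client") %>%
  mutate(week = format(as.Date(sent_at), "%Y-%U")) %>%
  group_by(client_id) %>%
  summarise(weeks_active = n_distinct(week), .groups = "drop")

# PHQ-8 change: final minus baseline score (negative values indicate improvement);
# clients with fewer than 2 administrations are excluded, as in the analysis
phq_change <- phq8 %>%
  arrange(client_id, assessed_at) %>%
  group_by(client_id) %>%
  filter(n() >= 2) %>%
  summarise(phq8_change = last(score) - first(score), .groups = "drop")
```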

Machine Learning Assessment of Message Content

Messages (n = 20 600 274) were assessed for 54 different interventions and content areas (eTable 1 in Supplement 1) by a machine learning system that uses a transformer-based deep learning model30 posttrained on annotated therapy transcripts (ie, in-domain training data; see eTable 2 in Supplement 1 for human interrater agreement on the training data and overall model performance on each label). Transformer models are a type of deep learning tool that can classify long sequences of input words based on extensive pretraining with unlabeled examples of text. The 54 dimensions include items from the Motivational Interviewing Skills Code (MISC) 2.1,23 which assesses supportive counseling dialogue acts; the Cognitive Therapy Rating Scale (CTRS),24 which assesses 11 dimensions of CBT competency; and a list of clinical content labels. The MISC and content codes were generated for each message and normalized as percentages of utterances in a treatment episode. The CTRS values, as well as a subset of MISC labels, were generated as summary scores for each client’s treatment episode. An episode was defined as the entire record of conversation between the therapist and client. For the MISC, empathy and collaboration scores range from 1 to 5, with 1 indicating little evidence that the therapist attempted to understand the client and 5 indicating consistent attempts to understand. For the CTRS, CBT scores range from 0 to 6, with 0 representing poor or no use of the skill and 6 representing excellent use. The 54 dimensions of interventions and content then served as inputs to regressions on clinical outcomes.
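The classifier itself is proprietary and is not reproduced here, but the normalization step it feeds can be illustrated. The sketch below, written under the simplifying assumption of one label per utterance and a hypothetical message_labels table (client_id, message_id, label), turns per-utterance labels into episode-level percentage features of the kind used as regression inputs.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: one row per labeled utterance, with columns
# client_id, message_id, and label (one of the MISC or content codes).
episode_features <- message_labels %>%
  group_by(client_id) %>%
  mutate(n_utterances = n()) %>%                  # utterances in the episode
  group_by(client_id, label, n_utterances) %>%
  summarise(n_label = n(), .groups = "drop") %>%
  mutate(pct = 100 * n_label / n_utterances) %>%  # percent of episode utterances
  select(client_id, label, pct) %>%
  pivot_wider(names_from = label, values_from = pct, values_fill = 0)
```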

Statistical Analysis

Data analysis was conducted from September 22, 2022, to November 28, 2023. We performed 3 general linear mixed-effects regression analyses, with engagement, satisfaction, and change in PHQ-8 score as the dependent variables. Cases without at least 2 PHQ-8 scores were excluded from PHQ-8 analyses. Predictor variables were either normalized as the percentage of messages containing each feature or entered as overall session-level ratings of each client’s treatment episode. All PHQ-8 analyses subtracted the pretreatment PHQ-8 score from the final PHQ-8 score, so the outcome corresponded to a change score. We obtained an omnibus multiple R value for each outcome from the mixed-effects models. We do not present individual coefficients for each feature because of multicollinearity; for interpretation, we present bivariate correlations between each outcome and each machine-generated feature. All mixed-effects regression and bivariate correlation analyses were performed in R, version 4.2 (R Foundation for Statistical Computing). Statistical significance was defined as P < .05.
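As a hedged sketch of the analytic approach (not the authors’ exact code), the snippet below fits a client-level mixed-effects regression with a random intercept for therapist using lme4, derives an omnibus multiple R as the correlation between observed and fitted values (the article does not specify its exact computation), and computes per-feature bivariate correlations with 95% CIs. The data frame analysis_df and its columns (engagement, therapist_id, feat_1 through feat_54) are hypothetical.

```r
library(lme4)

# analysis_df: one row per client, with the outcome (here, weeks of engagement),
# a therapist_id grouping factor, and machine-generated features feat_1 ... feat_54.
feature_cols <- grep("^feat_", names(analysis_df), value = TRUE)
rhs <- paste(c(feature_cols, "(1 | therapist_id)"), collapse = " + ")

fit <- lmer(as.formula(paste("engagement ~", rhs)), data = analysis_df)

# One way to obtain an omnibus multiple R: the correlation between observed
# outcomes and model-fitted values.
multiple_r <- cor(analysis_df$engagement, fitted(fit))

# Bivariate correlations and 95% CIs for each feature, reported for interpretation
# because multicollinearity makes individual regression coefficients unstable.
bivariate <- do.call(rbind, lapply(feature_cols, function(f) {
  ct <- cor.test(analysis_df[[f]], analysis_df$engagement)
  data.frame(feature = f,
             r = unname(ct$estimate),
             ci_low = ct$conf.int[1],
             ci_high = ct$conf.int[2])
}))
```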

Results

Demographic characteristics were available for a subset of the total sample (n = 106 423) (Table). Clients identified their race and ethnicity as African American (12.74%, n = 1255), Asian (7.92%, n = 780), biracial or multiracial (0.86%, n = 85), Hispanic (6.99%, n = 689), Native American (0.36%, n = 35), Native Hawaiian or Other Pacific Islander (0.05%, n = 5), White (61.8%, n = 6089), other (7.99%, n = 787), or declined to answer (1.29%, n = 127). Race and ethnicity data were collected to ensure that this mobile and online therapy app is used equitably, and that reporting of usage is more accurate. Clients identified their gender as female (75.23%, n = 49 765), male (23.25%, n = 15 380), nonbinary (0.4%, n = 265), transgender female (0.13%, n = 86), transgender male (0.24%, n = 159), gender queer (0.47%, n = 311), gender variant (0.1%, n = 106), or another gender identifier (0.18%, n = 191). Most clients were aged 26 to 35 years (55.4%).

Table. Client Demographic Characteristics.

Category and identifier No. (%)
Gender, total (n = 66 150)
Female 49 765 (75.23)
Male 15 380 (23.25)
Gender queer 311 (0.47)
Nonbinary 265 (0.40)
Transgender male 159 (0.24)
Transgender female 86 (0.13)
Gender other 119 (0.18)
Race and ethnicity, total (n = 9852)
African American 1255 (12.74)
Asian 780 (7.92)
Biracial or multiracial 85 (0.86)
Hispanic 689 (6.99)
Native American 35 (0.36)
Native Hawaiian or other Pacific Islander 5 (0.05)
White 6089 (61.80)
Other 787 (7.99)
Declined 127 (1.29)
Marital status, total (n = 55 930)
Single 21 186 (37.88)
Married 17 853 (31.92)
In a relationship 7047 (12.60)
Living with a partner 6863 (12.27)
Divorced 1956 (3.15)
Separated 1040 (1.86)
Widowed 179 (0.32)
Educational level, total (n = 33 845)
Less than a high-school diploma 403 (1.19)
High school diploma 4048 (11.96)
Some college, no degree 3077 (9.09)
Associate degree 1124 (3.32)
Professional degree 670 (1.98)
Bachelor’s degree 20 013 (59.13)
Master’s degree 3686 (10.89)
Doctoral degree 822 (2.43)
Age, total, y (n = 17 527)
0-17 8 (0.05)
18-25 3737 (21.3)
26-35 9702 (55.4)
36-49 3425 (19.5)
≥50 655 (3.7)

Frequency of Interventions and Content

Therapist interventions and clinical content varied substantially across clients and therapists. From the MISC, mean (SD) therapist empathy was 2.3 (0.41), with a notable range from a low of 1.0 to a high of 3.8. Overall, 5% (n = 10 130) of therapist messages included complex reflections, 5% (n = 10 130) included open questions, 8% (n = 16 207) included affirmations, and 2% (n = 4052) emphasized client autonomy. The 2 highest scores for the CTRS were interpersonal effectiveness (mean, 3.3) and collaboration (mean, 2.7), which represent broader therapeutic skills. Over half (52%; n = 105 347) of therapist messages involved giving information, and 8% (n = 16 207) offered advice.

Less than 1% (n < 2026) of messages contained simple reflections, structuring statements, or confrontation. Other CBT fidelity scores were quite low, indicating a generally low level of CBT fidelity in conversations. Mean subscale scores ranged from 1.44 to 3.34, with most scores below 2, ratings considered between mediocre and barely adequate use of CBT skills. Across all client sessions, the predicted total CBT competence score was 22.1, well below the accepted threshold for competence (ie, 40). The most frequent topics of conversation were goals and interventions (28% of messages; n = 56 725), other (15%; n = 30 389), primary or family relationships (14%; n = 28 363), and mood and emotions (7%; n = 14 181).

Interventions and Content Related to Engagement (Weeks in Treatment)

The model-based correlation of therapy interventions and content with engagement was strong (multiple R = 0.43), explaining 18% of the variance (P < .001). To interpret individual associations, eTable 3 in Supplement 1 contains the bivariate correlations and 95% CIs between each feature and each outcome. The therapist behaviors with the highest positive correlations with client retention were complex reflections (r = 0.26), open questions (r = 0.15), closed questions (r = 0.12), simple reflections (r = 0.12), and affirmations (r = 0.10) (Figure 1 and Figure 2). Cognitive behavioral therapy skills as measured by the CTRS were associated with decreased client retention.

Figure 1. Percent of Complex Reflections That Correspond With Weeks of Retention.

Figure 2. Percent of Open Questions That Correspond With Weeks of Retention.

The topics with the highest positive correlations with client retention were social relationships (r = 0.21), mood or emotional state (r = 0.19), relationships or family (r = 0.17), and activities and hobbies (r = 0.17). The topics most negatively correlated with engagement were case management (r = −0.26) and assessment (r = −0.16).

Interventions and Content Related to Client Satisfaction

The overall model-based correlation of therapy interventions and content with satisfaction was strong (multiple R = 0.46), explaining 22% of the variance (P < .001). The therapist interventions with the highest positive correlations with satisfaction were complex reflections (r = 0.22), affirmations (r = 0.16), open questions (r = 0.10), simple reflections (r = 0.08), and advising (r = 0.07). The therapist interventions most negatively correlated with client satisfaction were providing structure (r = −0.21) and giving information (r = −0.16). All CTRS-coded behaviors were negatively correlated with client satisfaction.

The topics with the highest positive correlations with satisfaction were mood or emotional state (r = 0.20), relationships or family (r = 0.17), activities and hobbies (r = 0.16), social relationships (r = 0.14), and self-identity development (r = 0.13). The topics most negatively correlated with satisfaction were case management (r = −0.36) and assessment (r = −0.13) (eTables 3 and 4 in Supplement 1).

Interventions and Content Related to PHQ-8 Change

The mean (SD) initial PHQ-8 score was 10.80 (5.78; minimum, 0; maximum, 24), and the final score was 8.04 (5.72; minimum, 0; maximum, 24), a moderate effect size (Cohen d = 0.48). The overall model-based correlation of therapy interventions and content with PHQ-8 change was significant (multiple R = 0.13), explaining 1.7% of the variance (P < .001). In a post hoc sensitivity analysis restricted to clients whose first reported PHQ-8 score was 10 or higher, multiple R values were generally similar to those reported above. For engagement and satisfaction, the correlations remained moderate and significant, although somewhat lower for satisfaction (engagement: 0.44 vs 0.43; satisfaction: 0.38 vs 0.46). For change in PHQ-8 score, the correlation was slightly larger in the clinically elevated sample (0.21 vs 0.13). The therapist behaviors most associated with reductions in PHQ-8 score were complex reflections (r = −0.01), simple reflections (r = −0.02), affirmations (r = −0.08), and advice (r = −0.01). The therapist behaviors with the highest positive correlations with increases in PHQ-8 score were giving information (r = 0.03) and structuring (r = 0.04). The topics most associated with reductions in PHQ-8 score were mood or emotional state (r = −0.04), activities and hobbies (r = −0.03), self-identity development (r = −0.04), and health and medical issues (r = −0.03) (eTables 3 and 5 in Supplement 1).
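For readers checking the reported effect size, the arithmetic below reproduces a Cohen d of roughly 0.48 from the reported means and SDs, using the root mean square of the two SDs as the denominator; the article does not state which pooling formula was used, so this is one plausible reconstruction.

```r
# Reported pre- and posttreatment PHQ-8 summary statistics
m_pre  <- 10.80; sd_pre  <- 5.78
m_post <-  8.04; sd_post <- 5.72

# Cohen d with a root-mean-square SD denominator: (10.80 - 8.04) / 5.75, about 0.48
d <- (m_pre - m_post) / sqrt((sd_pre^2 + sd_post^2) / 2)
```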

Discussion

In this quality improvement study, results suggest robust associations between therapist interventions, clinical content, and client-reported outcomes. Together, machine learning–generated measures of therapist interventions and content correlated strongly (multiple R = 0.43-0.46) with engagement and satisfaction. The multiple correlation of 0.13 with change in PHQ-8 score is smaller, but with more than 20 million unique messages and a broad assessment of interventions and content, this study highlights key curative processes in community-based treatment and is several orders of magnitude larger than previous meta-analyses on treatment process.31,32

Robust predictors of outcomes have been difficult to find within the psychotherapy process literature.33 The most consistent correlates of treatment outcome are client-rated relational measures, such as the working alliance and empathy.34,35 These studies are large but rely on self-report measures, providing no insight into the therapy conversation. Observational studies are labor intensive, and samples are typically limited. A meta-analysis of therapist CBT adherence and competence in relation to treatment outcome revealed a near-zero association between measures of adherence, competence, and clinical outcome,36 while a recent meta-analysis found no association between adherence and outcome but some associations with indicators of treatment integrity in a smaller set of 3 studies with fewer than 500 clients in total.32 While the overall correlation with PHQ-8 change reported herein is smaller than in the abovementioned meta-analysis,32 it is larger than correlations with outcomes in previous natural language processing–based work.22 This could be a result of the additional measures of counseling skills beyond a standardized form of CBT interventions and the more varied clinical content.

The present research found that better client outcomes were associated with higher empathy, open questions, complex reflections, and affirmations, which is consistent with prior work on treatment process and supportive counseling more broadly.37,38 In particular, complex reflections were among the strongest relative predictors across outcomes. Giving information was the most common therapist intervention; it may be that such behaviors are especially common in text-based counseling. To our knowledge, there are no comparable analyses in traditional psychotherapy, and it is difficult to interpret the base rates found herein for different interventions. Regardless, more information giving was associated with worse outcomes, suggesting that an overabundance of psychoeducation may be detrimental to treatment.

In this sample, CBT measures were often associated with less satisfaction, less engagement, and less improvement in PHQ-8 scores. As is common in community research, it is unclear how many therapists intended to provide CBT as a primary intervention or whether CBT interventions were part of an eclectic mix of therapy approaches. Overall CBT quality was low, similar to other community-based research.39,40 Given the very low base rate of these skills, conclusions cannot be drawn about the impact CBT would have if delivered with fidelity in this modality. Furthermore, CTRS codes were developed for ratings of a single 50-minute conversation, which is likely to differ from a potentially multiday asynchronous text-based conversation. It is also possible that text-based counseling is more suited to therapists providing support, encouragement, and understanding, whereas attempts to structure sessions or assign homework in ways comparable to in-person counseling may have unintended consequences or require alternative approaches. Future work should examine adapted and new measures of CBT skills in text-based counseling.

The present results also provide a detailed analysis of the content discussed in therapy conversations, based on actual conversations as opposed to surveys or medical records. Consistent with extensive research on the benefits of discussing emotion during therapy, such as reported by Pascual-Leone,41 discussing mood and emotional state and anxiety corresponded with higher satisfaction and engagement and decreases in client distress. Conversations surrounding the client’s general well-being (eg, physical health) and unique client experiences (eg, hobbies and identity development) may be indicative of the therapist asking more questions about the client, potentially demonstrating curiosity and eliciting client engagement.

Limitations

The findings in this study are correlational—clients were not assigned to conditions in which therapists attempted more or less specific types of interventions or topics. Future studies could assign therapists to conditions in which they receive ongoing feedback on the use of skills to examine whether feedback can increase use of skills and outcomes. For example, generative language models could be trained to write specific types of interventions for the therapist to review or potentially to edit therapist statements, which has shown promise in peer communication.42 While therapy chatbots are now widely available as phone apps, the augmentation of traditional human therapists with machine learning intelligence could be an important next step in mental health care.

This study assessed the use of a variety of clinical content domains. However, included measures drawn from motivational interviewing and CBT treatments do not capture the full range of possible interventions that may be beneficial to clients. Across the interventions evaluated in this study, indicators of therapist empathy and active listening were the most consistent correlates of key outcomes. This finding is broadly consistent with prior work on the common factors of psychotherapy43 that suggests the factors common across different types of psychotherapy are the most robust predictors of treatment outcome.44,45 Future studies should also assess other types of evidence-based interventions that are usually found in clinical practice.

In a post hoc sensitivity analysis, the clients excluded from the PHQ-8 regressions (ie, those with 0 or 1 PHQ-8 responses) were similar to those included in the analysis in terms of satisfaction and baseline PHQ-8 level. However, the sample with missing PHQ-8 scores had lower overall engagement in treatment. Naturally, the longer a client engaged in treatment, the more opportunity they had to complete any, or a second, PHQ-8. As such, results may be more generalizable to clients who engage longer in treatment.

Conclusion

In this quality improvement study of conversational content in counseling, there were robust associations between therapist interventions, clinical content, and client-reported outcomes. Increased access to services further suggests the need for quality assurance tools that can scale with these services. Machine learning–derived metrics have the potential to complement labor-intensive tools that have never fully been used outside of research settings. These tools may increase visibility into the care received by clients as well as the specificity of research on why treatments work.

Supplement 1.

eTable 1. Description of Lyssn ML Generated Metrics

eTable 2. Interrater and Human-Machine F-Scores

eTable 3. Engagement Bivariate Correlations and 95% CIs

eTable 4. Satisfaction Bivariate Correlations and 95% CIs

eTable 5. PHQ-8 Change Bivariate Correlations and 95% CIs

Supplement 2.

Data Sharing Statement

References

1. Tadmon D, Olfson M. Trends in outpatient psychotherapy provision by US psychiatrists: 1996–2016. Am J Psychiatry. 2022;179(2):110-121. doi: 10.1176/appi.ajp.2021.21040338
2. Carlo AD, Hosseini Ghomi R, Renn BN, Areán PA. By the numbers: ratings and utilization of behavioral health mobile applications. NPJ Digit Med. 2019;2:54. doi: 10.1038/s41746-019-0129-6
3. Hull TD, Mahan K. A study of asynchronous mobile-enabled SMS text psychotherapy. Telemed J E Health. 2017;23(3):240-247. doi: 10.1089/tmj.2016.0114
4. Song J, Litvin B, Allred R, Chen S, Hull TD, Areán PA. Comparing message-based psychotherapy to once-weekly, video-based psychotherapy for moderate depression: randomized controlled trial. J Med Internet Res. 2023;25:e46052. doi: 10.2196/46052
5. Chan S, Li L, Torous J, Gratzer D, Yellowlees PM. Review of use of asynchronous technologies incorporated in mental health care. Curr Psychiatry Rep. 2018;20(10):85. doi: 10.1007/s11920-018-0954-3
6. Hull TD, Malgaroli M, Connolly PS, Feuerstein S, Simon NM. Two-way messaging therapy for depression and anxiety: longitudinal response trajectories. BMC Psychiatry. 2020;20(1):297. doi: 10.1186/s12888-020-02721-x
7. Malgaroli M, Hull TD, Wiltsey Stirman S, Resick P. Message delivery for the treatment of posttraumatic stress disorder: longitudinal observational study of symptom trajectories. J Med Internet Res. 2020;22(4):e15587. doi: 10.2196/15587
8. Arean P, Hull D, Pullmann MD, Heagerty PJ. Protocol for a sequential, multiple assignment, randomized trial to test the effectiveness of message-based psychotherapy for depression compared with telepsychotherapy. BMJ Open. 2021;11(11):e046958. doi: 10.1136/bmjopen-2020-046958
9. Baldwin SA, Imel ZE. Therapist effects: findings and methods. In: Lambert MJ, ed. Bergin and Garfield’s Handbook of Psychotherapy and Behavior Change. 6th ed. John Wiley & Sons; 2013:258-297.
10. Institute of Medicine (US) Committee on Crossing the Quality Chasm: Adaptation to Mental Health and Addictive Disorders. Improving the Quality of Health Care for Mental and Substance-Use Conditions. National Academies Press; 2006.
11. Proctor EK, Landsverk J, Aarons G, Chambers D, Glisson C, Mittman B. Implementation research in mental health services: an emerging science with conceptual, methodological, and training challenges. Adm Policy Ment Health. 2009;36(1):24-34. doi: 10.1007/s10488-008-0197-4
12. Moyers TB, Martin T, Manuel JK, Hendrickson SML, Miller WR. Assessing competence in the use of motivational interviewing. J Subst Abuse Treat. 2005;28(1):19-26. doi: 10.1016/j.jsat.2004.11.001
13. Goldberg SB, Baldwin SA, Merced K, et al. The structure of competence: evaluating the factor structure of the Cognitive Therapy Rating Scale. Behav Ther. 2020;51(1):113-122. doi: 10.1016/j.beth.2019.05.008
14. Solomonov N, McCarthy KS, Gorman BS, Barber JP. The Multitheoretical List of Therapeutic Interventions–30 items (MULTI-30). Psychother Res. 2019;29(5):565-580. doi: 10.1080/10503307.2017.1422216
15. Bamatter W, Carroll KM, Añez LM, et al. Informal discussions in substance abuse treatment sessions with Spanish-speaking clients. J Subst Abuse Treat. 2010;39(4):353-363. doi: 10.1016/j.jsat.2010.07.005
16. Xiao B, Imel ZE, Georgiou PG, Atkins DC, Narayanan SS. Rate my therapist: automated detection of empathy in drug and alcohol counseling via speech and language processing. PLoS One. 2015;10(12):e0143055. doi: 10.1371/journal.pone.0143055
17. Can D, Marín RA, Georgiou PG, Imel ZE, Atkins DC, Narayanan SS. “It sounds like...”: a natural language processing approach to detecting counselor reflections in motivational interviewing. J Couns Psychol. 2016;63(3):343-350. doi: 10.1037/cou0000111
18. Flemotomos N, Martinez V, Gibson J, Atkins D, Creed T, Narayanan S. Language features for automated evaluation of cognitive behavior psychotherapy sessions. In: Interspeech 2018. ISCA; 2018.
19. Imel ZE, Steyvers M, Atkins DC. Computational psychotherapy research: scaling up the evaluation of patient-provider interactions. Psychotherapy (Chic). 2015;52(1):19-30. doi: 10.1037/a0036841
20. Gaut G, Steyvers M, Imel ZE, Atkins DC, Smyth P. Content coding of psychotherapy transcripts using labeled topic models. IEEE J Biomed Health Inform. 2017;21(2):476-487. doi: 10.1109/JBHI.2015.2503985
21. Nook EC, Hull TD, Nock MK, Somerville LH. Linguistic measures of psychological distance track symptom levels and treatment outcomes in a large set of psychotherapy transcripts. Proc Natl Acad Sci U S A. 2022;119(13):e2114737119. doi: 10.1073/pnas.2114737119
22. Ewbank MP, Cummins R, Tablan V, et al. Quantifying the association between psychotherapy content and clinical outcomes using deep learning. JAMA Psychiatry. 2020;77(1):35-43. doi: 10.1001/jamapsychiatry.2019.2664
23. Moyers T, Martin T, Catley D, Harris KJ, Ahluwalia JS. Assessing the integrity of motivational interviewing interventions: reliability of the Motivational Interviewing Skills Code. Behav Cogn Psychother. 2003;31(2):177-184. doi: 10.1017/S1352465803002054
24. Young J, Beck A. Cognitive Therapy Scale Rating Manual. 1980. Accessed December 17, 2023. https://beckinstitute.org/wp-content/uploads/2021/06/CTRS-Manual-2020.pdf
25. Talkspace. Accessed December 10, 2023. https://www.talkspace.com/
26. Darnell D, Pullmann MD, Hull TD, Chen S, Areán P. Predictors of disengagement and symptom improvement among adults with depression enrolled in Talkspace, a technology-mediated psychotherapy platform: naturalistic observational study. JMIR Form Res. 2022;6(6):e36521. doi: 10.2196/36521
27. Kroenke K, Strine TW, Spitzer RL, Williams JBW, Berry JT, Mokdad AH. The PHQ-8 as a measure of current depression in the general population. J Affect Disord. 2009;114(1-3):163-173. doi: 10.1016/j.jad.2008.06.026
28. Shin C, Lee SH, Han KM, Yoon HK, Han C. Comparison of the usefulness of the PHQ-8 and PHQ-9 for screening for major depressive disorder: analysis of psychiatric outpatient data. Psychiatry Investig. 2019;16(4):300-305. doi: 10.30773/pi.2019.02.01
29. Wu Y, Levis B, Riehm KE, et al. Equivalency of the diagnostic accuracy of the PHQ-8 and PHQ-9: a systematic review and individual participant data meta-analysis. Psychol Med. 2020;50(8):1368-1380. doi: 10.1017/S0033291719001314
30. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: Guyon I, Luxburg UV, Bengio S, et al, eds. Advances in Neural Information Processing Systems. Vol 30. Curran Associates Inc; 2017:5998-6008.
31. Magill M, Apodaca TR, Borsari B, et al. A meta-analysis of motivational interviewing process: technical, relational, and conditional process models of change. J Consult Clin Psychol. 2018;86(2):140-157. doi: 10.1037/ccp0000250
32. Power N, Noble LA, Simmonds-Buckley M, et al. Associations between treatment adherence-competence-integrity (ACI) and adult psychotherapy outcomes: a systematic review and meta-analysis. J Consult Clin Psychol. 2022;90(5):427-445. doi: 10.1037/ccp0000736
33. Barkham M, Lutz W, Castonguay LG. Bergin and Garfield’s Handbook of Psychotherapy and Behavior Change. John Wiley & Sons; 2021.
34. Flückiger C, Del Re AC, Wampold BE, Horvath AO. The alliance in adult psychotherapy: a meta-analytic synthesis. Psychotherapy (Chic). 2018;55(4):316-340. doi: 10.1037/pst0000172
35. Elliott R, Bohart AC, Watson JC, Murphy D. Therapist empathy and client outcome: an updated meta-analysis. Psychotherapy (Chic). 2018;55(4):399-410. doi: 10.1037/pst0000175
36. Webb CA, Derubeis RJ, Barber JP. Therapist adherence/competence and treatment outcome: a meta-analytic review. J Consult Clin Psychol. 2010;78(2):200-211. doi: 10.1037/a0018912
37. Moyers TB, Miller WR. Is low therapist empathy toxic? Psychol Addict Behav. 2013;27(3):878-884. doi: 10.1037/a0030274
38. Cuijpers P, Driessen E, Hollon SD, van Oppen P, Barth J, Andersson G. The efficacy of non-directive supportive therapy for adult depression: a meta-analysis. Clin Psychol Rev. 2012;32(4):280-291. doi: 10.1016/j.cpr.2012.01.003
39. Creed TA, Wolk CB, Feinberg B, Evans AC, Beck AT. Beyond the label: relationship between community therapists’ self-report of a cognitive behavioral therapy orientation and observed skills. Adm Policy Ment Health. 2016;43(1):36-43. doi: 10.1007/s10488-014-0618-5
40. Santa Ana EJ, Martino S, Ball SA, Nich C, Frankforter TL, Carroll KM. What is usual about “treatment-as-usual”? data from two multisite effectiveness trials. J Subst Abuse Treat. 2008;35(4):369-379. doi: 10.1016/j.jsat.2008.01.003
41. Pascual-Leone A. How clients “change emotion with emotion”: a programme of research on emotional processing. Psychother Res. 2018;28(2):165-182. doi: 10.1080/10503307.2017.1349350
42. Sharma A, Lin IW, Miner AS, Atkins DC, Althoff T. Human–AI collaboration enables more empathic conversations in text-based peer-to-peer mental health support. Nat Mach Intell. 2023;5(1):46-57. doi: 10.1038/s42256-022-00593-2
43. Wampold BE, Mondin GW, Moody M, Stich F, Benson K, Ahn HN. A meta-analysis of outcome studies comparing bona fide psychotherapies: empirically, “all must have prizes.” Psychol Bull. 1997;122(3):203. doi: 10.1037/0033-2909.122.3.203
44. Elliott R, Bohart AC, Watson JC, Greenberg LS. Empathy. Psychotherapy (Chic). 2011;48(1):43-49. doi: 10.1037/a0022187
45. Horvath AO, Del Re AC, Flückiger C, Symonds D. Alliance in individual psychotherapy. Psychotherapy (Chic). 2011;48(1):9-16. doi: 10.1037/a0022186
