Voice and speech are remarkably rich and versatile media that hold untapped potential for transforming healthcare. We propose that the systematic collection of voice and speech data in clinical trials can offer profound insights, enhance patient monitoring, and drive forward the decentralization of clinical research.
Voice and speech, by their very nature, carry a wealth of health-related information, including but not limited to emotional state, cognitive function, respiratory health, and neuromuscular changes [1]. Voice is the sound produced in the larynx when air pressure from the lungs sets the vocal folds into vibration. Much as in a musical instrument, that sound wave is then shaped by the throat, mouth, and nose (the resonators). Speech, on the other hand, involves the articulation of words and the production of content, including its linguistic structure and prosody. Fluctuations and nuances in voice and speech can serve as sensitive indicators of underlying health conditions, either as a vocal biomarker (fulfilling the criteria of the BEST biomarker definition [2]) or as a digital outcome measure in clinical trials [3]. Yet, despite this potential, voice and speech data remain a largely underutilized resource in clinical research, mostly because standards and processes for integrating voice biomarkers into clinical trials are lacking.
Longitudinal tracking of health through voice and speech data could profoundly change our approach to patient monitoring. Unlike traditional biomarkers, which often require invasive procedures, voice data can be collected non-invasively and repeatedly, providing a dynamic picture of an individual’s health over time. Such frequent monitoring could be particularly valuable for remote patient monitoring, enabling timely and personalized interventions [4]. Voice data collection can also substantially reduce the burden on patients. Traditional clinical trials often require frequent and inconvenient visits to clinical sites, whereas voice data can be collected remotely, contributing to the decentralization of clinical trials and reducing the need for physical visits. This is especially important for chronic diseases, where patients may find frequent clinical visits a substantial burden. By allowing patients to contribute data remotely, we can extend the reach of clinical trials beyond traditional clinical settings. Decentralization is particularly beneficial for widening access to trials for individuals in remote or underserved areas, thereby enhancing the diversity and generalizability of clinical research findings.
The cost implications of integrating voice data into clinical trials are also noteworthy. Traditional biomarkers and diagnostic tools can be expensive and resource-intensive, whereas voice data collection requires minimal equipment: often just a smartphone or a computer with a microphone. This accessibility not only reduces costs but also democratizes participation in clinical trials, making it easier for a broader population to contribute to, and benefit from, research.
Despite these clear advantages, integrating voice data into clinical trials is not without challenges. One of the primary hurdles is the lack of standardized protocols for voice data collection and analysis; we have not yet fully cracked the “code” of how to leverage voice data systematically. A large multidisciplinary consortium [5] is currently developing and validating standardized methods for capturing, processing, and analyzing voice data in clinical research contexts [6].
Furthermore, many voice features are not specific to a single disease or symptom. Rather, they may indicate a range of health conditions, reflecting the complex interplay of biological, physiological, and even cognitive processes. This non-specificity, while challenging, also underscores the potential of voice as a holistic biomarker of health. By capturing subtle longitudinal variations in voice, we can gain insights into an individual’s overall health status rather than being limited to specific diseases or conditions. Voice could, for example, serve as a rapid, scalable, first-line screening tool or health check. It could also become a marker of health deterioration in emergency departments, hospitals, or home monitoring of patients with known disease, revealing a patient’s overall decline and signaling to the care team when to intervene.
Beyond serving as a holistic biomarker of health, voice biomarkers can also become precisely measurable digital endpoints in clinical trials. For example, patients with Parkinson’s disease (PD) show a slower rate of speech than people without PD. A clinical trial of a drug targeting PD could add this digital outcome measure by collecting voice recordings from patients at different time points during treatment and using speech rate as a digital endpoint.
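To make the speech-rate endpoint more concrete, here is a minimal sketch of how it could be computed per visit and expressed as change from baseline. It assumes, hypothetically, that each recording comes with a verified word count and a measured speaking duration; the `Recording` class, the visit labels, and the words-per-minute definition (pauses included) are illustrative choices, not a validated or standardized protocol.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    """One hypothetical speech recording collected at a trial visit."""
    visit: str          # hypothetical visit label, e.g. "baseline", "week_12"
    word_count: int     # words spoken, e.g. taken from a verified transcript
    duration_s: float   # total speaking time in seconds (pauses included)


def speech_rate_wpm(rec: Recording) -> float:
    """Speech rate in words per minute, with pauses counted in the duration."""
    if rec.duration_s <= 0:
        raise ValueError("duration must be positive")
    return rec.word_count / rec.duration_s * 60.0


def change_from_baseline(recs: list[Recording],
                         baseline: str = "baseline") -> dict[str, float]:
    """Difference in speech rate (wpm) between each visit and the baseline visit."""
    rates = {r.visit: speech_rate_wpm(r) for r in recs}
    base = rates[baseline]
    return {visit: rate - base for visit, rate in rates.items() if visit != baseline}
```

For example, `change_from_baseline([Recording("baseline", 150, 60.0), Recording("week_12", 165, 60.0)])` returns `{"week_12": 15.0}`. Counting pauses in the denominator is a deliberate simplification; articulation rate, which excludes pauses, is a common alternative and may behave differently in a given population.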
Although many unknowns remain about how to extract potential biomarkers, we strongly argue that collecting the data now, in a standardized fashion, will provide an important advantage: a resource that can be explored later.
It is also crucial to address the ethical considerations associated with voice data collection. Ensuring privacy and security by design is of paramount importance to secure buy-in from end users (patients and healthcare professionals) and other stakeholders. Robust protocols for privacy preservation, secure storage, and trustworthy use of voice data, in line with existing and upcoming regulations (e.g., HIPAA, GDPR, the EU AI Act), must be in place to protect the sensitive information encapsulated in voice recordings.
In light of these considerations, we call on the scientific and clinical research communities to embrace the collection of voice data in clinical trials. By doing so, we can build a strong evidence base to establish the utility of voice as a longitudinal health biomarker. Collaborative efforts, such as the Bridge2AI-Voice consortium [5] in North America, are essential to develop standardized protocols and datasets; provide guidelines, best practices, and reference values; and ensure ethical and secure data handling.
In conclusion, we advocate for a concerted effort to incorporate voice data into clinical trials. This call to action is not merely about adopting a new technology; it is about embracing a paradigm shift in how we monitor and understand health, and about accelerating the decentralization of clinical trials. By leveraging the richness of voice, we can move toward a future in which healthcare is more personalized, accessible, and effective.
Conflict of Interest Statement
G.F. is a member of the External Scientific Advisory Board of the Bridge2AI-Voice as a Biomarker of Health consortium and is heading a European network on vocal biomarkers. He has provided advisory/speaking services and/or has received research grants and/or speaker honoraria from MSD, MSDAvenir, Eli Lilly, Roche Diabetes Care, AstraZeneca, Danone Research, Diabeloop, Bristol Myers Squibb, L’Oréal R&D, AbbVie Pharmaceutical, Pfizer, VitalAire, and Akuity Care. Y.B. is an assistant professor, laryngologist, and chief of the laryngology division at the University of South Florida in the Department of Otolaryngology – Head and Neck Surgery.
Funding Sources
G.F. is running Colive Voice, an international research program on vocal biomarkers, funded by the Luxembourg Institute of Health. Y.B. is the co-PI of the Bridge2AI-Voice as a Biomarker of Health award from the National Institutes of Health (Award Nos.: 01 OT-OD032 01 and 03 OT-OD032 01S1). These sources had no role in the study design, execution and analysis, and manuscript conception, planning, writing, and decision to publish.
Author Contributions
G.F. and Y.B. contributed equally to the conceptualization, data curation, methodology, validation, writing – original draft, writing – review and editing. They take full responsibility for the decision to submit and publish.
References
- 1. Fagherazzi G, Fischer A, Ismael M, Despotovic V. Voice for health: the use of vocal biomarkers from research to clinical practice. Digit Biomark. 2021;5(1):78–88.
- 2. FDA-NIH Biomarker Working Group. Glossary. In: BEST (Biomarkers, EndpointS, and other Tools) resource. Food and Drug Administration; 2021.
- 3. Bensoussan Y, Elemento O, Rameau A. Voice as an AI biomarker of health-introducing audiomics. JAMA Otolaryngol Head Neck Surg. 2024;150(4):283–4.
- 4. Evangelista EG, Bélisle-Pipon J-C, Naunheim MR, Powell M, Gallois H; Bridge2AI-Voice Consortium, et al. Voice as a biomarker in health-tech: mapping the evolving landscape of voice biomarkers in the start-up world. Otolaryngol Head Neck Surg. 2024;171(2):340–52.
- 5. Bridge2AI-Voice. [cited 2024 Jul 9]. Available from: https://www.b2ai-voice.org/
- 6. Awan SN, Bahr R, Watts S, Boyer M, Budinsky R; Bridge2AI Voice Consortium, et al. Validity of acoustic measures obtained using various recording methods including smartphones with and without headset microphones. J Speech Lang Hear Res. 2024;67(6):1712–30.