Despite decades of research, psychiatrists still don’t have reliable biomarkers to trace the path of human misery. The complexity and heterogeneity of psychiatric conditions mean that there are no biopsies for bipolar disorder and no blood tests or brain scans to indicate the depths of someone’s depression. Even once diagnosed and offered treatment, many patients struggle to improve because they miss appointments, don’t take medication, and fail to realize when their state of mind is about to take a downturn.
By tracking the way that people with psychiatric conditions use their smartphones, researchers are finding characteristic types of behavior that could help doctors track and know when to treat mental ailments. Image credit: Shutterstock/Gorodenkoff.
But a potentially helpful tool may already, quite literally, be in hand. By tracking how people with psychiatric conditions use their smartphones, including how often they charge the battery, how many calls and messages they send, where they spend their time, and how well they sleep, researchers are finding characteristic types of behavior. Called digital phenotypes, these patterns could signal early warnings of relapse, track symptom severity, or simply show when a clinician should check in with a vulnerable person.
Still, even as proponents hone digital phenotyping for mental health, they’ve yet to rigorously demonstrate its effectiveness on a large scale—studies to date are relatively small, and researchers often don’t share the same software platforms, making it difficult to compare results. Others worry about privacy and the way such information could be abused. “We have to definitely take into account that there are a lot of risks that are associated with the implementation of these tools,” says Katherine Bassil, a neuro-ethicist at the University Medical Center Utrecht in the Netherlands.
Disease Tracking
The idea that people’s use of smartphones can offer important clues about their mental health emerged in a 2015 Nature Biotechnology article (1), in which US researchers pointed out that the “convergence of digital technologies and biology” offered an opportunity to ask whether “our interface with technology could be somehow diagnostic and/or prognostic for certain conditions.”
For example, the article suggested, a bipolar patient whose mania manifests as rapid, uninterruptible speech or as hypergraphia (a compulsive urge to write) could have their disease characterized by the frequency, length, and content of their participation on social media. Digital phenotypes, it added, could help “ensure that early manifestations of disease do not go unnoticed and allow the healthcare system to develop nimble, targeted and prompt interventions.”
John Brownstein, an epidemiologist at Harvard Medical School in Massachusetts and a coauthor on that pioneering 2015 paper, says the idea grew from efforts at the time to track infectious disease from the way people disclosed symptoms on social media. Such revelations provided access to population health data that the team otherwise wouldn’t have been privy to, he says. “And it led to a wide range of different studies.”
His team used social media data, such as Twitter mentions and Google search terms, to track many factors, including obesity, drug abuse, and gun violence. “We advocated it as an additional tool for healthcare decision making, but also as a research tool for public health,” he says. The increasing sophistication of mobile devices and new media platforms soon led people to quantify “their health in far more ways than when we first started,” Brownstein says.
The idea was initially considered unworkable by some, he says. Would the data that are part of people’s digital footprints be detailed enough to pick up the symptoms and characteristics of specific conditions—conditions whose symptoms often overlap? But as the number of usable data types has grown, and methods have been developed to find signature patterns, digital phenotypes have emerged as a useful and more widely accepted psychiatric tool. “They’re not perfect,” Brownstein says. “But they provide a window into an individual that you couldn’t get otherwise.”
More Accurate Measures
In some cases, that window also offers a more reliable view than self-reporting. For example, people in surveys tend to overestimate how much they sleep (2) and underestimate how much time they are sedentary (3).
For psychiatrists grappling with opaque mental health conditions, accurate data on an individual’s behavior can offer distinct advantages over such self-reported information. “It makes sense, right? If you’re not feeling well, you probably change your behaviors to some extent, and that may reflect in the ways you use your phone,” says John Torous, a psychiatrist at the Beth Israel Deaconess Medical Center in Boston, Massachusetts, and a leading researcher in digital phenotyping.
Driven by this idea, dozens of studies have analyzed connections between psychiatric symptoms and data collected from sensors on phones and other mobile devices. These studies have linked, for example, blunted facial expressivity, as measured by smartphone cameras, to the severity of schizophrenia (4); sleep disruption to increased mania in bipolar disorder (5); and more time spent at home, as measured by GPS data, to more severe depression (6).
“As people go through mental health challenges, their digital traces, the breadcrumbs people leave behind, will look different,” Brownstein says. “And if you look at those digital biomarkers, you could get insights that help clinical decision making.”
The most reliable digital phenotype, proponents agree, is one that examines multiple types of behavior and looks for changes in any of them, for better or worse. In a 2018 study, Torous and colleagues used this approach to show that AI algorithms can spot “anomalies” in smartphone-collected mobility and social behavior and thereby detect early signals of relapse in patients with schizophrenia (7). More recently, Torous and colleagues have used similar signals to detect spikes in anxiety and depression. “To really help understand the earliest warning signs when someone may need extra help is pushing us towards prevention,” Torous says. This sort of monitoring, he adds, can help avert emergency room visits.
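To make the idea of flagging “anomalies” across several behavioral streams more concrete, here is a minimal Python sketch. It compares each day’s values with the same person’s preceding days and flags a day when the combined deviation is large. This is a simple rolling z-score approach chosen for illustration, not the method used in the 2018 study; the function name `flag_anomalous_days`, the feature names `km_travelled` and `calls`, the 14-day baseline, and the threshold of 3 are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalous_days(days, window=14, threshold=3.0):
    """Flag days whose behavior departs sharply from the person's own recent baseline.

    Illustrative sketch only (hypothetical function, features, and parameters).
    days: list of dicts, one per day, mapping a feature name (e.g. the hypothetical
          "km_travelled" or "calls") to that day's value. All days share the same keys.
    window: number of preceding days used as the personal baseline (must be >= 2).
    threshold: root-mean-square z-score above which a day is flagged.
    Returns the indices of flagged days.
    """
    flagged = []
    for i in range(window, len(days)):
        baseline = days[i - window:i]  # only the days before day i
        squared_z = []
        for feature, value in days[i].items():
            history = [day[feature] for day in baseline]
            sd = stdev(history)
            if sd == 0:
                continue  # a constant baseline gives no scale to compare against
            squared_z.append(((value - mean(history)) / sd) ** 2)
        # Average the squared z-scores across features, then take the square root.
        if squared_z and (sum(squared_z) / len(squared_z)) ** 0.5 > threshold:
            flagged.append(i)
    return flagged


# Usage: 20 unremarkable days, then a day with almost no travel and no calls.
history = [{"km_travelled": 5.0 + i % 2, "calls": 4 + i % 2} for i in range(20)]
history.append({"km_travelled": 0.5, "calls": 0})
print(flag_anomalous_days(history))  # [20]
```

Using each person’s own recent history as the reference, rather than a population norm, is one design choice that fits the observation that behavioral markers vary from person to person.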
The value of combining several data streams to find early warnings was also shown in a study published last November, in which researchers in South Korea used machine learning to analyze data from connected apps and wearable devices to predict whether people with mood disorders would show preliminary signs or symptoms of a panic attack the following day (8). The study, which monitored 43 patients for two years, recorded not only heart rate, sleep duration, step count, and other wearable data, but also daily entries in a separate smartphone app in which users logged their mood and their levels of energy, anxiety, and annoyance, along with their smoking and their consumption of coffee and alcohol. Users also self-reported panic symptoms.
An AI system then identified patterns in the data that distinguished stable days from those just before a panic attack set in. Identifying stable days is as clinically useful as predicting panic attacks, the researchers say, because it reassures patients about their state of mind and thereby reduces anxiety.
Sharing Concerns
Despite the promising results, the nascent field of digital phenotyping does draw skepticism. Bassil is concerned about drawing overly specific conclusions from behavioral data—especially when the connections are drawn by AI. “Many times, these machine learning algorithms that are analyzing all these data are unsupervised,” she says. “We don’t know how they come out with the output that they come out with.” The Korean study that identified the warning signs of panic, for example, could only highlight a cluster of variables that seemed the most significant—including responses to clinical questions about childhood trauma alongside data on heart rate.
Bassil also raises a more fundamental objection. Many digital phenotype research projects make a point of sharing the findings with those who volunteer their data, Torous says. But, according to Bassil, evidence from years of genetic testing shows that returning results in this way carries significant risks. “Handing the results of genetic testing to individuals that do not necessarily have any symptoms, but might show a certain risk for a certain disease or disorder, can lead to unnecessary anxiety in individuals,” she says.
Moreover, digital phenotyping results that indicate latent signs of mental disorders could make people behave in irrational and unpredictable ways, she warns: “When we think of bioethics principles and this kind of testing, we think about how would this influence the autonomy of an individual after they receive such information.”
Torous agrees that patients could misunderstand digital phenotyping results and says the findings are always best interpreted through a clinician. “We’re still learning about the clinical validity of these signals, but you can certainly talk about it with a person. You can interpret it together,” he says, noting that his team provides different visualizations of the data to educate patients. Changes or patterns in the data could spark a discussion between patient and physician to understand what is happening at that point.
Patients who share their data could also inadvertently reveal more about themselves on social media than they would like. “There’s definitely a risk as people make data more and more public that the information could be used in assessing them,” Brownstein says, adding that the work of academics in this space is less concerning than how social media companies already use data to push advertising and content.
As digital phenotyping evolves and becomes more widely used, interpreting its results will likely require support similar to the genetic counseling that helps people understand DNA tests. “I think for digital phenotypes, we’re going to need something similar as we build up the science of what it means,” Torous says.
Sizing Up
A crucial factor for developing that science is study size: scientists have yet to conduct a large-scale project to validate the findings of smaller studies and gather enough data to generate clinically reliable links between behavior and symptoms for whole populations, not just individuals. Given the ubiquity and convenience of smartphones, “this should be something that we can do in thousands, millions, or billions of people,” Torous says.
But so far, most published digital phenotyping studies of people with psychiatric conditions remain at the pilot scale, each collecting smartphone and symptom data from at most a few hundred people. For digital phenotyping to become a clinically useful tool, studies need to expand to thousands of patients, perhaps more. And that will be a logistical challenge.
“I think it is possible. It just needs a lot of, like, coordinated efforts or a large consortium of different groups,” says Talayeh Aledavood, a computer scientist at Aalto University in Finland, who recently published the results of a pilot-scale digital phenotyping project in that country. Called Mobile Monitoring of Mood, the study gathered smartphone behavioral data over 12 months from some 130 patients diagnosed with various depressive disorders. It identified dozens of potential markers of disease severity and change, including the phone’s battery charge level and how often the screen was off (9).
One challenge for large-scale studies is that many of the behavioral markers differ among individuals. While increased messaging might signal worse depression for some people, in others, it comes when they feel better. “That means when we are looking at a group, we don’t get a strong signal because people are canceling out each other,” she says. It’s a Catch-22. “We need to use personalized models, but to make those models, we need a lot more data.”
Other studies have shown a similar picture, with the same behavioral signal associated with disparate, sometimes contrasting, symptoms in different people. That also makes sense, Torous says, and it highlights that digital phenotyping is currently best viewed as personalized medicine rather than as a tool that can be applied to broad and diverse populations.
“These mental illnesses are very heterogeneous in how people experience them, and the environmental factors that trigger them or exacerbate them for people are, of course, different,” Torous says. “They cannot be reduced down to one gene, one brain scan, or one kind of smartphone signal.”
To progress, the field must now wrestle not only with scientific challenges but also with an important technical one. To date, academic and commercial groups looking for digital phenotypes have tended to build and use their own apps and software, often keeping the details secret. That has blocked attempts to combine and reproduce the results of studies, but this is starting to change.
“One of the Achilles heels of this field is that there’s so many different software platforms,” Torous adds. “So it’s definitely exciting to finally see many teams asking their own independent questions and their own samples, but using the same software.”
References
- 1. Jain S., et al., The digital phenotype. Nat. Biotechnol. 33, 462–463 (2015).
- 2. Lauderdale D. S., et al., Self-reported and measured sleep duration: How similar are they? Epidemiology 19, 838–845 (2008).
- 3. Prince S. A., et al., A comparison of self-reported and device measured sedentary behaviour in adults: A systematic review and meta-analysis. Int. J. Behav. Nutr. Phys. Act. 17, 31 (2020).
- 4. Abbas A., et al., Facial and vocal markers of schizophrenia measured using remote smartphone assessments: Observational study. JMIR Form. Res. 6, e26276 (2022).
- 5. Ebner-Priemer U. W., et al., Digital phenotyping: Towards replicable findings with comprehensive assessments and integrative models in bipolar disorders. Int. J. Bipolar Disord. 8, 35 (2020).
- 6. Laiou P., et al., The association between home stay and symptom severity in major depressive disorder: Preliminary findings from a multicenter observational study using geolocation data from smartphones. JMIR Mhealth Uhealth 10, e28095 (2022).
- 7. Barnett I., et al., Relapse prediction in schizophrenia through digital phenotyping: A pilot study. Neuropsychopharmacology 43, 1660–1666 (2018).
- 8. Jang S., et al., A digital phenotyping dataset for impending panic symptoms: A prospective longitudinal study. Sci. Data 11, 1264 (2024).
- 9. Ikäheimonen A., et al., Predicting and monitoring symptoms in patients diagnosed with depression using smartphone data: Observational study. J. Med. Internet Res. 26, e56874 (2024).

