Digital phenotyping, defined as the in situ collection of data on people's phenotypes using digital devices, is an increasingly attractive method for understanding and treating psychiatric disorders1. The popularity of this method reflects, in part, the near ubiquity of smartphones, geolocation, social media, and other behavioral and physiological recording technologies in many modern societies. These technologies are relatively inexpensive and often yield continuous data streams that can be collected unobtrusively while people navigate their daily routines. This extends the geographical boundaries of assessment well beyond the traditional clinical setting and can potentially improve the effectiveness and reduce the cost of various interventions.
To date, there is an impressive literature demonstrating “proof of principle/concept” for a wide range of technologies2. The voluminous material typically yielded by these technologies lends itself nicely to “big data” and complex computational analytic approaches. Indeed, the literature is replete with algorithms showing impressive accuracies for predicting a slew of clinical events and conditions using these technologies. To our knowledge, however, none has been approved for clinical psychiatric or psychological use by a governmental regulatory agency, and few, if any, have been adopted by clinicians, patients or organizations. This is in stark contrast to medicine more generally, which has adopted a large number of ambulatory objective technologies for a growing number of assessment and treatment solutions3.
We believe that the challenges in implementing these technologies reflect, in part, a lack of psychometrics to effectively evaluate and understand them4. Traditionally, measures of psychiatric/psychological phenomena are evaluated using reliability and validity. Reliability concerns the consistency of a measure across time (test-retest reliability), individual items of the measure (e.g., internal consistency), informants (e.g., inter-rater reliability), and situations (e.g., situational reliability). Validity, on the other hand, concerns the accuracy of the measure, evaluated based on its putative structure (e.g., structural validity), its convergence with conceptually related (e.g., convergent validity) and divergence from conceptually unrelated (e.g., discriminant validity) constructs, and its association with clinically relevant criteria (e.g., concurrent and predictive criterion validity).
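As a minimal illustration of these conventions, the sketch below uses simulated, hypothetical data to quantify test-retest reliability and convergent validity as simple correlations; the sample size, noise levels and variable names are illustrative assumptions, not values from the literature.

```python
# Minimal sketch of conventional psychometric evaluation.
# All data here are simulated; in practice these would be repeated
# administrations of a real measure and a conceptually related measure.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

n = 100                                      # hypothetical participants
true_score = rng.normal(0, 1, n)             # latent construct
time1 = true_score + rng.normal(0, 0.7, n)   # administration 1
time2 = true_score + rng.normal(0, 0.7, n)   # administration 2

# Test-retest reliability: consistency of the measure across time.
r_retest, _ = pearsonr(time1, time2)

# Convergent validity: correlation with a conceptually related measure.
related = true_score + rng.normal(0, 1.0, n)
r_convergent, _ = pearsonr(time1, related)

print(f"test-retest r = {r_retest:.2f} "
      f"(variance explained = {r_retest**2:.2f})")
print(f"convergent r = {r_convergent:.2f}")
```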
Importantly, the reliability and validity of measures in clinical psychiatry are far below those acceptable in other sciences. Reliability coefficients explaining 50% of score variance (i.e., a reliability of roughly .70, since variance explained is the square of the coefficient) are generally considered acceptable5. Validity values are interpreted even more liberally: for example, measures showing less than 50% overlapping variance with conceptually relevant measures are often considered acceptable, if not good or excellent2. Such large amounts of unexplained variance would be unacceptable in applied medicine, physics, chemistry, engineering, biology, computer science, informatics and related fields.
Why are reliability and validity insufficient for understanding psychiatric phenomena? Psychiatric phenotypes are not static across time and space. When measured using sufficiently “high resolution” optics, we see that, for example, psychosis varies as a function of proximity to stressful situations, borderline symptoms primarily emerge as a function of proximity to interpersonal “objects”, and substance use craving waxes and wanes in proximity to substance-related cues2, 6. Even psychological constructs such as narcissism and cognition vary considerably over temporal and spatial epochs when assessed with high-resolution measures6, 7. For example, “sundowning” effects, involving progressive deterioration of fluid cognitive resources throughout the day, are a signature of many neurodegenerative disorders and a potentially useful target for discriminating dementia from depression.
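To make the idea of temporal resolution concrete, here is a brief sketch, again with simulated data, of how a within-day “sundowning” trend might be estimated from repeated ambulatory cognitive assessments; the assessment window, the 0.15-point hourly decline and the noise level are all illustrative assumptions.

```python
# Sketch: estimating a within-day ("sundowning") trend from repeated
# ambulatory cognitive assessments. Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(1)

hours = rng.uniform(8, 22, 300)          # assessment times (8:00-22:00)
decline_per_hour = -0.15                 # assumed true diurnal effect
scores = 50 + decline_per_hour * hours + rng.normal(0, 2, 300)

# Ordinary least-squares fit of score on hour of day.
slope, intercept = np.polyfit(hours, scores, 1)
print(f"estimated change per hour: {slope:.2f} points")
# A reliably negative slope would be consistent with progressive
# deterioration of fluid cognitive resources across the day.
```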
Currently used clinical measures offer limited information regarding the temporal and spatial dynamics underlying psychiatric phenotypes. Structured clinical interviews, personality tests, symptom inventories and functioning measures generally rely on self-report responses and collateral observations/information obtained cross-sectionally during a spatially “constrained” interaction (i.e., a clinical visit), and usually offer imprecise instructions regarding how specific facets of the construct change as a function of temporal and spatial features. Moreover, few, if any, commonly used clinical measures provide data that can be scaled over user-defined periods of time or over clearly operationalized spatial contexts.
Imagine if, for example, a patient's psychosis could be understood using an interface similar to online geographic maps. One could “zoom out” (decrease the resolution) to observe psychosis symptoms over days, weeks and months, and could “zoom in” (increase the resolution) to observe whether psychosis changes systematically as a function of time (e.g., worse in the evening) or spatial conditions (e.g., worse when interacting with certain peers). This sort of dynamic data and interface would provide unprecedented opportunities for understanding psychiatric disorders and for personalizing pharmacological, psychosocial and emergency interventions.
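A minimal sketch of what such “zooming” could look like computationally, assuming a hypothetical hourly stream of psychosis severity scores: the same data are aggregated at daily, weekly and time-of-day resolutions.

```python
# Sketch of "zooming" across temporal resolutions: one hypothetical
# stream of hourly psychosis-severity scores viewed at three scales.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

idx = pd.date_range("2024-01-01", periods=24 * 90, freq="h")  # 90 days
severity = pd.Series(rng.normal(3, 1, len(idx)), index=idx)

daily = severity.resample("D").mean()     # zoom out: day-level view
weekly = severity.resample("W").mean()    # zoom further out: week-level view
by_hour = severity.groupby(severity.index.hour).mean()  # zoom in: diurnal profile

print(by_hour.idxmax())  # hour of day with highest average severity
```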
Just as the reliability and validity of biomedical measures of, for example, glucose or heart rate3 are reported and evaluated only under specific and controlled circumstances, so too should the reliability and validity of digital phenotyping technologies be understood as a function of time and space. Digital phenotyping technologies are not “reliable and valid” per se, but rather can have reliability and validity under specific circumstances and for specific purposes. Reporting psychometric features with regard to relevant temporal and spatial characteristics can help guide implementation of digital phenotyping technologies, improve interpretation of their data, and potentially help optimize signal and reduce noise. Conceivably, this can improve reliability and validity parameters such that they approximate those of biomedical tests more generally.
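One possible way to report psychometrics as a function of temporal context, sketched below with simulated data: validity (here, the correlation between a hypothetical passive sensor feature and momentary symptom ratings) is computed separately within morning and evening windows rather than pooled across them.

```python
# Sketch: reporting validity stratified by temporal context.
# All variables are hypothetical; the evening/morning split and the
# context-dependent noise are illustrative assumptions.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

rng = np.random.default_rng(3)

df = pd.DataFrame({
    "hour": rng.integers(8, 22, 400),
    "sensor_feature": rng.normal(0, 1, 400),
})
# Simulate a criterion that tracks the feature more closely in the evening.
evening = df["hour"] >= 17
noise = np.where(evening, 0.5, 1.5)
df["symptom_rating"] = df["sensor_feature"] + rng.normal(0, noise)

# Validity reported per temporal context, not as a single pooled value.
for label, subset in df.groupby(evening.map({True: "evening", False: "morning"})):
    r, _ = pearsonr(subset["sensor_feature"], subset["symptom_rating"])
    print(f"{label}: r = {r:.2f}")
```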
To illustrate how resolution can improve digital phenotyping validation efforts, consider natural language processing technologies used to quantify psychosis. A cursory review of the literature reveals that “validity” has been established, in that modest convergence is documented between various computationally derived semantic speech features and “gold-standard” clinical symptom ratings8. This approach to validation seems inappropriate when one considers the mismatch in resolution between these measures: the former are derived from systematic analysis of brief language samples procured during a fairly contrived clinical interaction or cognitive task, whereas the latter represent an ordinal rating assigned by a clinician based on an extended clinical interview9. These ratings reflect very different temporal and spatial characteristics, and hence the failure to find strong convergence is unsurprising. While machine learning-based algorithms connecting digital phenotyping technologies and clinical ratings have shown impressive accuracy, they have generally also ignored the overt resolution mismatch between these variables and have not demonstrated generalizability to new samples, speaking tasks or clinical measures2, 9.
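A sketch of one way to reduce this mismatch before validation, under the illustrative assumption that each clinical rating covers a 7-day recall window: the daily speech feature is averaged over each rating's window so that the two measures are compared at the same temporal resolution.

```python
# Sketch: matching temporal resolution before validation. A daily
# computational speech feature is averaged over the (assumed) 7-day
# recall window of each clinical rating. Data are hypothetical.
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

rng = np.random.default_rng(4)

# Daily speech feature (e.g., a semantic coherence score).
days = pd.date_range("2024-01-01", periods=84, freq="D")
feature = pd.Series(rng.normal(0, 1, len(days)), index=days)

# Weekly clinician ratings, each assumed to cover the preceding 7 days.
rating_dates = days[6::7]

# Aggregate the feature to the rating's resolution: mean over each window.
windowed = np.array([feature.loc[d - pd.Timedelta(days=6): d].mean()
                     for d in rating_dates])

# Simulate ratings that track the window-averaged feature.
ratings = windowed + rng.normal(0, 0.5, len(windowed))

r, _ = pearsonr(windowed, ratings)
print(f"resolution-matched convergence: r = {r:.2f}")
```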
To our knowledge, resolution is not generally considered in digital phenotyping research. For digital phenotyping of psychiatric disorders to be considered on par with that of biomedical disorders more generally, its psychometrics need to be similarly precise. This precision can be achieved through deliberate consideration of “resolution”.
References
1. Insel TR. World Psychiatry 2018;17:276-7.
2. Cohen AS. Psychol Assess 2019;31:277-84.
3. Rodbard D. Diabetes Technol Ther 2016;18(Suppl. 2):S3-13.
4. Holmlund TB, Foltz PW, Cheng J et al. Psychol Assess 2019;31:292-303.
5. Koo TK, Li MY. J Chiropr Med 2016;15:155-63.
6. Wright AGC, Hopwood CJ. Assessment 2016;23:399-403.
7. Salthouse TA. Neuropsychology 2007;21:401-11.
8. Cohen AS, Elvevåg B. Curr Opin Psychiatry 2014;27:203-9.
9. Elvevåg B, Foltz PW, Rosenstein M et al. Schizophr Bull 2017;43:509-13.