Abstract
Our current diagnostic methods for treatment planning in Psychiatry and Neurodevelopmental Disabilities leave room for improvement, and null results in clinical trials in these fields may be a result of insufficient tools for patient stratification. Great hope has been placed in novel technologies to improve clinical and trial outcomes, but we have yet to see a substantial change in clinical practice. As we examine attempts at biomarker validation within these fields, we find that it may be the diagnoses themselves that fall short. We now need to improve neuropsychiatric nosologies with a focus on validity based not solely on behavioral features, but on a synthesis that includes genetic and biological data as well. The eventual goal is diagnostic biomarkers and diagnoses themselves based on distinct mechanisms, but such an understanding of the causal relationship across levels of analysis is likely to be elusive for some time. Rather, we propose an approach in the near-term that deconstructs diagnosis into a series of independent, empiric and clinically relevant associations among a single, defined patient group, a single biomarker, a single intervention and a single clinical outcome. Incremental study across patient groups, interventions, outcomes and modalities will lead to a more interdigitated network of knowledge, and correlations in metrics across levels of analysis will eventually give way to the causal understanding that will allow for mechanistically based diagnoses.
New technologies have always had the potential to uproot our existing perspective on disease causation and nosology (Drury, 1996, p. 4; Greenwald, 2012). In almost every field of medicine, including Neurology, methods of observation have advanced sufficiently to understand illness mechanisms in a way that has transformed clinical practice and therapeutic outcomes. The fields of Psychiatry and Neurodevelopmental Disabilities (NDD) have been the outliers. Both medical disciplines have, for over a century, relied on behavior, subjective phenomenology and function within the community as the basis for assessment, diagnosis, therapy and prognostication. Our framework for diagnosis has, for the past 40 years, been based on the Diagnostic and Statistical Manual of Mental Disorders (DSM-III) (American Psychiatric Association, 1980). Prior to that time, there were multiple, competing psychiatric nosologies (Kendler, 2009), based on different clinicians’ attempts to lump patients into groups in terms of their behavior and reported subjective experience. The goal when developing the DSM was to enhance inter-rater reliability, the probability that any two psychiatrists would arrive at the same diagnoses in the same patient. What DSM was not explicitly developed to reflect was validity of the diagnostic constructs, “[confirmed] hypotheses concerning the etiology and pathophysiology of an illness construct, demonstrating that the category represents a real and naturally occurring entity with a specific pathological mechanism” (Bogenschutz & Nurnberg, 2000). It is not clear, then, that DSM diagnoses truly “carve nature at its joints” but rather may simply reflect a semi-arbitrary naming convention.
While the DSM diagnoses are helpful to treatment planning in some instances (examples to follow), their collective limitations have become evident (Zachar & Kendler, 2017), even at the level of describing symptomatology. Heterogeneity is rife: there are 636,120 permutations of symptoms that qualify for the diagnosis of Post-Traumatic Stress Disorder (Galatzer-Levy & Bryant, 2013). Moreover, high rates of comorbidity mean that the lines between diagnoses are not as clear as one would expect if DSM diagnoses were truly reflections of distinct pathogenic mechanisms. The limitations of the DSM extend to predicting outcomes and responsiveness to therapy, both in the clinic and in clinical trials. An especially striking example involves the widespread use of aripiprazole in schizophrenia, bipolar disorder (BPD), major depressive disorder (MDD), Tourette syndrome and irritability associated with autism spectrum disorder (ASD). The fact that this medication is useful across all these DSM diagnoses suggests that the diagnoses lack discriminant validity, i.e., judged from the framework of aripiprazole responsiveness, the different disorders reflect a distinction without a difference.
Conversely, there is a concern that behavior- and feeling-based syndromes are inadequate to stratify patients meaningfully in clinical trials. If patients vary widely in illness mechanism within a diagnostic syndrome, the success of novel therapies for those patients as a group is not likely to be high. A string of failures in neurobehavioral clinical trials has caused several large pharmaceutical companies to move out of this space beginning a decade ago (Hyman, 2012), denying millions of sufferers critical investment in future therapies. Some suspect that a true effect in some responders with an intervention-sensitive pathogenic mechanism has been washed out by a larger number of non-responders who have alternate and intervention-insensitive pathogenic mechanisms. Restated, there is important mechanistic heterogeneity within a single diagnosis that is not captured by our current nosology, and this stifles drug development and personalized medicine.
These perceived limitations in DSM diagnoses extend not only to symptom expression and responsiveness to interventions, but to other types of biological data. Genetics (Gandal et al., 2018), imaging effects (Lui et al., 2015; Zhang, Lei, et al., 2020), behavioral characterization (Reininghaus et al., 2019; Rosen et al., 2012; Waterhouse, London, & Gillberg, 2016), cognition (Hill et al., 2009; Hill et al., 2013), eye movement studies (Sweeney et al., 1994; Sweeney, Takarae, Macmillan, Luna, & Minshew, 2004) and outcomes of interventions (Dew, Switzer, Myaskovsky, DiMartini, & Tovt-Korshynska, 2006) have both great heterogeneity within diagnoses and overlap/nonspecificity across diagnoses.
What is the alternative to the DSM approach? One is to assume that behaviorally based diagnoses have the potential to be adequate for our purposes, and that other ways of “slicing the symptom pie” may generate divisions that are more clinically meaningful (Clementz et al., 2020). A radically different approach is to make biology a co-equal or even primary factor in considering how patients are best separated into diagnostic groups. And, beyond that, lies the holy grail of approaches, which relates each diagnosis to a biological causal mechanism of pathological behavior or subjective experience, based on “etiology and pathophysiology,” and representing “a real and naturally occurring entity” (Bogenschutz & Nurnberg, 2000). This is the apparent goal set by regulatory agencies who have a role in biomarker development. With such a mechanistically based nosology, the hope and expectation are that the diagnostic categories would align tightly with the efficacy of existing and future interventions, as well as with symptomatology and read-outs of various types of biology.
Mechanisms and Degeneracy
A mechanism is an aspect of (patho)physiology or anatomy that causes normal or altered function at a higher level of analysis (Illari & Williamson, 2012). By “level of analysis” (Fig. 1), we mean an inferential scale within neuroscience. For example, we believe that genetic code determines molecular processes, and that those molecular processes are responsive to pharmacotherapy; experience-based factors, possibly mediated by epigenetic mechanisms, may influence gene expression. Molecular processes influence both neuroanatomical “wiring” and the expression of neuronal physiology. Neuronal physiology influences the behavior of neuronal ensembles, and the computational properties of even larger-scale neural networks influence behavior. In Psychiatry and NDD more so than in other medical disciplines, supra-organismal levels of analysis are also important: inter-personal processes may create judgements about “normal” vs. “abnormal” behavior, and function relative to community supports and expectations may determine the presence or absence of disability.
Figure 1.
Levels of analysis considered for biomarker development programs
What is hoped for, then, is a clean understanding of pathogenic mechanisms, Psychiatry and NDD diagnoses that are based on mechanistic constructs, and the technology to measure those mechanisms in a sensitive and specific way. That way, we can predict when a new treatment will work for a given patient to produce a particular outcome, and rationally develop new interventions. A mechanistic description that spans multiple levels will successfully predict outcomes from interventions applied at various levels (e.g., pharmacology, neurostimulation, cognitive training).
Consider Neurology as a neighboring example of the type of understanding we may desire in neurobehavioral disorders. In a patient with nocturnal headaches and vomiting, symptoms concerning for hydrocephalus, we obtain an MRI which can show direct consequences of increased intracranial pressure. If positive, we ask the neurosurgeons to insert a shunt to relieve the pressure. If the patient has symptoms of meningitis, we culture the cerebrospinal fluid (CSF) and treat with antibiotics that kill the offending bacteria. If the patient has a spell that sounds like a seizure, we measure the brain’s electrophysiology with an EEG and prescribe a medication that alters that physiology. Importantly, these tests also provide a directly relevant and specific way to evaluate treatment outcome when necessary. Clinical phenomenology, assessment and treatment are all linked in explainable fashion.
Neurology has advanced much more rapidly over recent decades than Psychiatry and NDDs when it comes to developing these multi-level mechanistic constructs and using them to guide diagnostic and therapeutic practice. With this multilevel mechanistic understanding, there is relatively little interindividual variation when it comes to sensitivity of particular tests for markers of disease mechanisms within diagnostic entities.
Not so for Psychiatry or behaviorally defined NDDs, where there is considerable heterogeneity of symptoms within DSM diagnoses and co-morbidity across diagnoses, and we have relatively little solid understanding of the mechanistic drivers of neurobehavioral illness. Furthermore, in the case of psychiatric disorders, mechanistic studies are particularly difficult. Cross-species research is more challenging because we do not have illness mechanisms to model and it is hard to be certain that a mouse model and a human patient share the neurobiological condition. Moreover, causal reasoning is difficult to validate based on observational studies or the limited capacity to experimentally perturb brain systems to test mechanistic models (Adamek, Luo, & Ewen, in press). There is no simple equivalent to Koch’s Postulates in cognitive neuroscience.
In many fields of Neurology, the relationship among diagnoses, diagnostic tools and clinical characteristics approaches the nearly 1:1 correspondence between the 47,XX/XY,+21 karyotype and the clinical features of Down syndrome. Many examples in Psychiatry are likely more akin to the multifactorial and statistically noisy relationship between the 100-plus risk genes of modest effect for ASD or schizophrenia and the expression of those disorders.
Restated, the links between levels of analysis in Psychiatry remain poorly established and are likely complicated by the number of contributing biological and environmental/cultural influences. When the clinically relevant subjectivity that defines whether a disorder is even present in the first place is a product of many factors apart from objective biology (Bennett & Hacker, 2003), detecting specific abnormalities with biological tests is likely to paint an incomplete picture at best. However, we do not take this as a reason not to pursue biological therapies and biomarkers. The existence of effective biological therapies, such as psychopharmacology, as well as the (relative) success of cognitive neuropsychology in cortical localization, are proof positive of the link between the neurobiological aspects of psychiatric illness and clinically meaningful outcomes. While we believe there are level-spanning mechanisms yet to be discovered, until there is a better understanding of these neurobehavioral mechanisms, it seems probable that we will get the best signal-to-noise ratio (SNR) from biological measurements made at the level closest to that of the intended intervention. An early example of clinically relevant measurements made at a level relevant to the therapy is provided by studies on the distribution of plasma concentrations of drugs in groups given the same dose, with the variation ultimately determined to be genetic in nature (Endrenyi, Inaba, & Kalow, 1976; Evans, Manley, & McKusick, 1960). Thus, patients could be diagnosed as slow or fast metabolizers.
By the same reasoning, a factor that has presumably contributed to the increasingly apparent limitations of the DSM is that novel therapies are progressively more focused on specific biological mechanisms distal from the nonspecific phenomenology that forms the current framework for diagnosis. Prescribing cognitive-behavioral therapy to alleviate anxiety based solely on subjective phenomenology is relatively straightforward: a psychological problem treated by a psychological intervention. By contrast, psychopharmacological interventions work at a molecular level, and the strongest correlation with psychopharmacological intervention may come from technologies that index the molecular level. As another example, some investigators believe that different individuals may arrive at the core symptoms of Autism Spectrum Disorder (ASD) through different cognitive routes (Happé, Ronald, & Plomin, 2006). A panel of cognitive event-related potentials (ERPs) may then help differentiate patients who develop the ASD phenotype as a result of a deficit in face processing (Pelphrey, Yang, & McPartland, 2014), for example, from those who have a deficit in multi-sensory processing (Brandwein et al., 2015). The optimal set of cognitive interventions for a particular patient may be based on which cognitive mechanism produced his or her ASD phenotype in the first place.
Enter Biomarkers
Technological and neuroscientific advances have spurred a new, hopeful era in biomarker development for neurobehavioral conditions (Clementz et al., 2016; Ewen, 2016; Sahin et al., 2018), in which we have the ability to measure directly the structure and activity of biological systems at a wider variety of levels of analysis than ever before. The hope is that, by measuring processes closer to the level of action of a particular intervention, we have the best possible SNR for identifying participants for clinical trials and for later targeting therapy in the clinic. MRI, EEG and CSF studies, genetic investigation and blood-based biochemical characterization have had success in advancing mechanistic understanding, but when evaluated for use in our currently defined, heterogeneous clinical syndromes, they have not surprisingly lacked the degree of specificity needed to develop or apply mechanistically informed neuropsychiatric biomarkers. As a result, without improved nosologies that define more biologically discrete conditions or patient subgroups, advances in drug development and clinical care will likely remain limited. We offer a path for doing so below, but first must lay out the nature of how biomarker candidates are actually validated and shown to serve the function intended.
Validation
Every measurement contains noise (variance) and sensitivity to artifact (bias). The goal of a validation study (Bossuyt et al., 2015; Ewen & Beniczky, 2018) is to define how much noise a novel metric contains (dependent variable; “index test”), when judged against an existing reference standard (independent variable; “gold standard” or “criterion”). There is no meaningful sense in which a biomarker candidate has measurable performance except in relationship to some reference standard. The output of any validation effort is simply to report how well the output of the novel biomarker (index test) reflects the output of the reference test. Therefore, in the context of biomarker validation, (1) all mismatch between index test and reference test must be assumed to be noise within the index test, and (2) all validated diagnostic biomarkers necessarily inherit the limitations of their reference standard and the diagnostic constructs against which they were validated. Validating a novel biomarker candidate against a DSM diagnosis, then, does not permit one to transcend the limitations of DSM diagnoses but rather passes those limitations to the “next generation” of diagnostic tests.
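The mechanics of this asymmetry can be made concrete with a small sketch (all data invented): a validation study cross-tabulates index-test results against the reference standard, and every disagreement is charged to the index test.

```python
# Sketch of a validation study: the reference standard is treated as ground
# truth by construction, so any mismatch is counted against the index test.
# All data below are invented for illustration.

def validate(index_results, reference_results):
    """Cross-tabulate index test vs. reference standard; report accuracy stats."""
    pairs = list(zip(index_results, reference_results))
    tp = sum(1 for i, r in pairs if i and r)          # both positive
    tn = sum(1 for i, r in pairs if not i and not r)  # both negative
    fp = sum(1 for i, r in pairs if i and not r)      # index falsely positive
    fn = sum(1 for i, r in pairs if not i and r)      # index falsely negative
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}

# True = "diagnosis present" according to each test.
reference = [True, True, True, True, False, False, False, False]
index     = [True, True, True, False, False, False, False, True]

print(validate(index, reference))  # {'sensitivity': 0.75, 'specificity': 0.75}
```

Note that these numbers say nothing about whether the reference standard itself is valid; a biomarker validated this way inherits whatever limitations the reference carries.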
In the authors’ experience, most current efforts in diagnostic biomarker validation use some metric of DSM diagnosis as the reference standard/independent variable and therefore recapitulate DSM diagnosis, with all of its limitations. This is not to criticize the DSM on all counts, as there are situations where it clearly offers clinical benefit, such as prescribing therapy in a patient with low mood who is diagnosed with Bipolar Disorder (BPD) vs. Major Depressive Disorder (MDD) vs. Adjustment Disorder. But the aripiprazole example of trans-diagnostic efficacy is perhaps the rule rather than the exception.
Biomarkers can add value to a DSM-based diagnostic approach when they are cheaper, faster, less invasive, less subjective, more psychometrically reliable, have a lower failure rate, or can be deployed more widely (requiring less specialized equipment or human expertise) (Ewen & Beniczky, 2018; Gunnarsdottir et al., 2020; Gunnarsdottir et al., 2018). These considerations are particularly relevant given workforce shortages in the Psychiatry and NDD fields. One example, though not an uncontroversial one (Loo et al., 2013), is the use of an EEG-based device to help general pediatricians better distinguish the DSM diagnosis of ADHD from ADHD-mimics. In these studies, the reference standard was DSM diagnosis as determined by a multi-disciplinary panel of specialists (Snyder, Rugino, Hornig, & Stein, 2015). The “value added” of the biomarker, then, is that it helps general pediatricians recapitulate the expert diagnoses of the panel more cost-effectively and with greater availability than referring each patient to the panel itself.
However, our modern technological wonders are often not less expensive, more widespread, faster or less invasive, so what is the added value of employing a new, less efficient test to estimate the output of an existing, more efficient one? Performing functional MRI is much more expensive than administering a psychiatric inventory, and few people are trained to perform and interpret it reliably. The hope, of course, is that the biomarker can offer information that is more mechanistically based, or can at least report on structure and activity closer to the level of analysis of the intended intervention. And with information synthesized from a range of technologies, patient groups and interventions, we have a hope of discovering new information that can reboot nosology to transcend the limitations of the DSM and thus foster development of improved and more personalized therapies.
Yet the path is not so simple, since the use of the imperfect DSM diagnosis as a reference standard or criterion inherently shuts down our ability to do just this. If the aim is to develop a biomarker, there is, mathematically, no way in this scenario to disambiguate the noise associated with our novel technique from the noise inherent in our reference test and in the diagnosis that the reference test measures; it is an ill-posed problem, akin to trying to find a unique solution to a single equation with two unknowns. In particularly egregious clinical trial situations, a single dataset could contain up to four unknowns: the effect of a novel intervention, the performance of the biomarker candidate, the validity of the reference standard and the utility of the underlying diagnostic construct. How do we apportion the statistical uncertainty among these unknowns? Mathematically, there is no principled way to do so, and we risk falling into an irresolvable state of confusion (Plato, 1924).
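The single-equation analogy can be illustrated with a toy calculation (error rates invented): if the index test and the reference test each independently mislabel a case, the only observable quantity, the mismatch rate, confounds the two error rates, so very different apportionments of error are indistinguishable.

```python
# Toy model: each test independently flips the true label with its own
# error rate. The observed disagreement rate p + q - 2pq is one equation
# in two unknowns, so the error cannot be apportioned from the data alone.

def expected_mismatch(p_index, p_reference):
    """P(index and reference disagree) when each errs independently."""
    return p_index + p_reference - 2 * p_index * p_reference

# Very different apportionments of error yield the same observable:
print(expected_mismatch(0.20, 0.00))             # all error in the index test
print(expected_mismatch(0.00, 0.20))             # all error in the reference
print(round(expected_mismatch(0.125, 0.10), 3))  # error split between the two
```

All three calls print 0.2: a noisy biomarker judged against a clean criterion is observationally identical to a clean biomarker judged against a noisy criterion.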
The Criterion Problem
This paradox is not new; it vexed efforts to validate psychometric tests in the middle of the 20th century and was known as the Criterion Problem: “The goal of validating measures of psychological [diagnoses] necessarily requires criteria that are themselves valid. One cannot show that a [diagnostic biomarker] of some form of psychopathology is valid, unless one can show that the [diagnostic biomarker] relates to an indicator of that form of psychopathology that is, itself, valid. One cannot show that a certain deficit in cognitive processing [or neurobiology] characterizes individuals with a certain disorder unless one has defined and validly measured the disorder” (Strauss & Smith, 2009) (updating of some of the terminology is ours). In short, if we do not have an existing and valid way to measure our “Platonic Form” of a diagnostic construct, then we have no direct way to measure how well a new biomarker approximates it. And if we are also skeptical of our Platonic Form of the diagnostic construct, we are in even worse shape. Contrast this situation with that of Neurology, where neuropathological diagnosis has almost always stood as a final and unequivocal arbiter of the disease pathology (Meehl, 1977, pp. 285–288). Even if neuropathological information came “too late” for the individual patient, it served to tune neurologists’ understanding of the performance of their diagnostic tools.
In sum, we want to develop new tools that perform better for treatment planning and clinical trial stratification, yet our nosological framework and associated existing diagnostic tools leave us without a trusted yard-stick against which to judge the performance of these new tools. If anything, we hope to use these (yet unproven) tools to engineer a new yard-stick! The process that can move the field forward will involve the incremental interplay among mechanistic science, biomarker development, trials of novel interventions and ongoing nosological refinement.
To build up to this goal, we begin by deconstructing why we employ diagnoses in the first place. We break the problem into smaller and more empiric units, since we do not yet have adequate mechanistic understanding of the range of psychiatric manifestations. If we make explicit what the functions of diagnoses are, then we have a near-term target to aim for. As we proceed, we remain mindful of where similar problems were previously solved in classical psychometrics, to avoid repeating errors that were identified and solved in the middle of the last century, for example, in studies of personality and clinical symptom patterns (Cattell & Gibbons, 1968; Lorr, Jenkins, & O’Connor, 1955; Strack, Choca, & Gurtman, 2001). But we also have to extend the psychometric toolkit to consider how we can integrate data from multiple levels of analysis.
Just What is a Diagnosis For Anyway?
Patients walk into a medical clinic with a chief complaint and hopes of walking out with an effective treatment plan and an accurate prognosis. Diagnoses and differential diagnoses can be thought of as practical tools that mediate the transformation from chief complaint to treatment/prognosis. Diagnoses have at least four functions:
Predict how likely a certain treatment is to work in an individual patient,
Prognosticate a patient’s natural history,
Identify clinically (treatment, prognostically) meaningful groups of patients by summarizing symptoms and clinical course, and
Reflect a causal mechanism, which links multiple manifestations, outcomes and interventions, with the goal of enabling cross-species research and providing prior knowledge, for example, about the probability of whether a never-before-tested therapy will be effective for a certain disorder.
These functions are inter-related. Any given diagnosis or diagnostic biomarker accomplishes some—but not necessarily all—of these.
DSM-based diagnoses are predicated primarily on the third function, above. As illustrated in the examples below, merely categorizing patients into groups (function #3) is an insufficient criterion for clinical utility unless those groups meaningfully guide therapy (function #1) or enable prognostication about natural history (function #2). This is in fact the charge we level against the DSM: it fails to perform functions #1 and #2 consistently. Because explanatory cognitive neuroscience is still far from the level of development needed to perform function #4 (from which functions #1–3 would follow naturally), an alternative and intermediate approach is to focus on relatively straightforward, data-driven/“black-box” methods that develop tools that may not be diagnostically specific but have proven utility in guiding therapy or prognostication.
More empiric, “single-use” biomarkers focus on the prediction of responsiveness to one particular treatment (Zhang et al., 2018) or the prognostication of one particular outcome in a particular condition; a diagnosis centered around this approach might be “hormone-responsive breast cancer.” Predicted responsiveness to a therapy becomes the nosology defined operationally by biomarker measurement.
Synthesis Across Levels of Analysis
In parallel to considering how to conceptualize diagnosis for the purpose of validating biomarkers, we must contend with how to synthesize (or not) data across multiple levels of analysis. Mechanistic approaches, in which we have causal understanding of how processes at one level drive processes at another, remain the Holy Grail. Consider the example of Parkinson’s disease (PD), an illness with a relatively well understood etiological mechanism (Krakauer, Ghazanfar, Gomez-Marin, MacIver, & Poeppel, 2017). Knowing that a certain behavioral symptom (e.g., bradykinesia) is a consequence of specific alterations of the dopaminergic system helps us consider which new medications may be effective and study the phenomenon in non-human animal models. All of these elements are in play in the case of PD, in which motor abnormalities (the behavioral symptoms) are known to be linked to deficiency of dopaminergic transmission in a specific brain region, motivating treatments that would increase dopamine function, ideally in a limited area to avoid side effects. There has been sufficient experimentation in both animals and humans to be reasonably confident of the validity of the diagnostic construct termed “PD.” And with knowledge of dopamine deficiency in the CNS, initially discovered through post-mortem studies, researchers had a target to aim for when developing biomarkers such as PET imaging of dopamine transporter abundance. These biomarkers have been developed and validated as a useful means of documenting the degree of pathology. However, even in the case of PD, we are not certain that the diagnostic construct is valid in the sense of delineating a single, fully understood disease process that can be modelled in animals, allowing us to discover and apply additional translational biomarkers.
Nonetheless, the PD example illustrates how the development of translational biomarkers is more readily accomplished when there is some mechanistic knowledge to link measurements across species such as dopamine transporter density in a brain region in the living rather than post-mortem state.
By contrast, non-mechanistic, “single-use” approaches that reduce diagnosis to the prognostication of a single outcome or the prediction of response to a single therapy treat each level of analysis of biomarker read-out independently from every other level, in multi-axial fashion (Rutter, Shaffer, & Shepherd, 1973). The expectation for this approach would be that, in clinical practice, treatments at one level of analysis would be prescribed based on the biomarker that reflects that level, independently of the output from biomarkers at other levels of analysis.
An approach that lies somewhere between the fully-independent, multi-axial approach and the causal, mechanistic approach is to look for statistical clustering within multi-modal data integrated across levels of analysis (with subsequent validation as to the predictive/prognostic utility of cluster membership). However, while this multi-level clustering approach reflects “correlation” across levels, it does not establish true causation, as the fully mechanistic approach requires. As such, like the single-level biomarker, these clusters are limited in that they are applicable only for a prespecified patient group, intervention and outcome.
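As a minimal sketch of what such multi-level clustering might look like in the simplest case (synthetic data only, with invented "EEG" and "molecular" features standing in for two levels of analysis), each modality is z-scored so neither level dominates, the features are concatenated, and a basic k-means is run:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic multi-modal data: rows are patients, one feature per level of
# analysis (invented "EEG" and "molecular" measures). A latent two-group
# structure is built in; in practice it would be unknown.
n = 40
group = np.repeat([0, 1], n // 2)
eeg = rng.normal(loc=group * 2.0, scale=1.0)
molecular = rng.normal(loc=group * 3.0, scale=1.0)

# Z-score each modality so no single level of analysis dominates, then
# concatenate across levels.
X = np.column_stack([eeg, molecular])
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Minimal k-means (k=2), initialized from two patients known to differ.
centers = X[[0, n - 1]].copy()
for _ in range(20):
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    centers = np.array([X[labels == k].mean(axis=0) for k in range(2)])

# Agreement between recovered clusters and the built-in latent groups.
agreement = max((labels == group).mean(), (labels != group).mean())
print(f"cluster/latent-group agreement: {agreement:.2f}")
```

Any structure found this way is correlational; cluster membership would still need validation as a predictor of a prespecified outcome in a prespecified patient group before it carries clinical meaning.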
While mechanistically informed biomarkers and diagnoses are the Holy Grail of clinically applied cognitive neuroscience, our primary contention is that successful near-term efforts will focus on “single-use,” empiric biomarkers. Having considered the mechanics of validation, how diagnosis may be reduced to more elemental clinical problems, and strategies for statistically linking (or not) measures at multiple levels of analysis, we now build up a picture of how neurobehavioral biomarker efforts may proceed.
Empiric, Elemental Diagnostic Biomarkers and Predictive Validity
Predictive validity is a type of validity in which there is a time component: we assess how the index test, measured at t0, reports on the value of the criterion at t1. The original intelligence test, for example, developed by Binet and Simon, was intended not for defining some high-minded concept of human performance, but as a simple empiric tool that would predict which children would not benefit well from the French public education system. If a biomarker reports on an outcome earlier than it would otherwise be evident (Ewen et al., 2009), then it still may be valuable even if it is not cheaper, more deployable, less invasive, etc., than the reference test that serves as the independent variable. If a $2,000 fMRI-based test performed before treatment initiation is able to predict whether a patient will respond to a certain medication that may require weeks of treatment before clinical response is clear (even as measured by something as simple as the Clinical Global Impression [CGI] scale), it may not matter that the fMRI is expensive, takes an hour to perform and is only available at a few centers, whereas the CGI takes a brief interview and a pencil. The ability to know-in-advance, to save patients weeks of suffering during a treatment trial that may not be effective, may justify the expense in many circumstances.
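The trade-off described here can be put in back-of-envelope terms (every figure below is invented): what matters is not the cost of the test relative to the CGI, but the expected weeks of ineffective treatment the test avoids.

```python
# Back-of-envelope sketch (every number invented) of the value of a
# pre-treatment predictive biomarker: expected futile weeks on an
# ineffective medication, with and without the test.

response_rate = 0.5   # fraction of patients who would respond to the drug
trial_weeks   = 6     # weeks before non-response becomes clinically evident
sensitivity   = 0.85  # test detects true future responders
specificity   = 0.85  # test screens out true future non-responders

# Without the test, every eventual non-responder sits through the trial.
weeks_without = (1 - response_rate) * trial_weeks

# With the test, only false positives (non-responders the test wrongly
# flags as responders) are started on the drug and wait out the trial.
weeks_with = (1 - response_rate) * (1 - specificity) * trial_weeks

# Separate cost, not measured in weeks: responders the test wrongly
# screens out (governed by sensitivity) are denied an effective drug.
missed_responders = response_rate * (1 - sensitivity)

print(weeks_without)                # 3.0 expected futile weeks per patient
print(round(weeks_with, 2))         # 0.45 expected futile weeks per patient
print(round(missed_responders, 3))  # 0.075 responders missed per patient
```

Whether the reduction from 3.0 to 0.45 expected futile weeks justifies an expensive test depends on the cost of the test and the burden of untreated illness; the point is that the comparison is with the weeks saved, not with the price of the reference measure.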
In this sense, we can reduce much of diagnosis merely to prediction and prognostication (functions #1 and #2 of diagnosis). Diagnosis should create a path to optimal treatment, and the optimal treatment is selected by estimating the probability and effect size of response to one intervention from among a set. A patient reports low mood. Will the mood improve to the greatest degree if we assess and mitigate psychosocial stressors, if we treat with an SSRI, or if we treat with a mood stabilizer? Rather than diagnosing Adjustment Disorder, MDD or BPD, our diagnoses could be rewritten as “Low Mood with Responsiveness to SSRI,” “Low Mood with Responsiveness to Mood Stabilizer,” and “Low Mood with Responsiveness to Stressor Management.” While this example gives credence to the utility of DSM-style diagnoses, the case of psychosis is different. Whatever DSM psychotic diagnostic criteria the patient might meet, the primary treatment will still be an antipsychotic. In the one instance in which the efficacy of one antipsychotic can be differentiated from all others, clozapine is useful in patients who fail to respond to other antipsychotics; in this case we would have a diagnosis that identifies who will not respond to any antipsychotic other than clozapine.
A particular diagnosis may also trigger an investigation into an underlying cause that could lead to sequelae beyond the chief complaint or surveillance for associated conditions. These aspects can similarly be broken down into elemental problems. For example, a patient reports head pain (cephalgia). What is the probability that the patient will die of cerebral herniation? We currently apply available tools (history, neurological exam, MRI) to diagnose migraine vs. mass lesion or increased intracranial pressure (ICP). But our diagnoses could be rewritten as “Cephalgia with Increased Risk of Herniation” and “Cephalgia without Increased Risk of Herniation,” which do not map precisely to the current diagnoses of migraine, increased ICP, cluster headache, etc. We could validate neurological checklists and MRI interpretations (human or machine learning) against the criterion of risk for severe neurological outcomes.
The use of biomarkers in this way to predict treatment outcomes with demonstrated clinical utility in Psychiatry or for NDDs is, to date, fairly rare (Parker et al., 2019; Ren et al., 2013; Zhang et al., 2018). However, there are examples of studies that have shown some utility of clinical prediction of treatment response at the single-patient level and prediction of future illness in high-risk individuals (Cao et al., 2019; Emerson et al., 2017; Zhang, Nery, et al., 2020). One of the most recent examples emerges from a machine learning–based analysis of EEG data demonstrating potential biomarkers of antidepressant response to an SSRI (Wu et al., 2020). A limitation of current research for the approach suggested here is that it often includes a specific DSM diagnosis as part of the inclusion/exclusion criteria, rather than leaving DSM to one side and basing inclusion/exclusion solely on individual symptoms or chief complaints, as in the approach taken by the Bipolar and Schizophrenia Network for Intermediate Phenotypes (B-SNIP) consortium (Tamminga et al., 2013).
The current governing schema for biomarkers used in clinical trials, the (US) FDA Biomarkers, EndpointS and other Tools (BEST) framework (FDA-NIH Biomarker Working Group, 2016), distinguishes among diagnostic biomarkers, predictive biomarkers and prognostic biomarkers (Leptak et al., 2017; Menetski et al., 2019). The agency typically encourages treatment development for diagnoses rather than symptoms, which has led to biomarker efforts in clinical trials being considered at the level of syndromes/diagnoses rather than symptom clusters or biologically defined conditions. With the validity of DSM diagnoses in doubt, the biomarker-based approach toward restructuring or subtyping diagnosis redirects plans for biomarker development toward independent predictions and prognostications at the individual level. The upside of this approach is that it is concrete. It can be fully data-driven and “black box”: there is no need to appeal to theoretically based neurobiological or psychological mechanisms. But the flip-side of employing such a “single-use” approach is that a biomarker that predicts response to one class of medication may not predict response to a different class of medications, or even to other medications within the same class; we certainly cannot assume that it does. This is somewhat analogous to communicating in a foreign language using only a phrase-book: the method is practical, but there is little room for flexibility beyond the scenarios that the authors have set.
Another downside to this “elemental” and empiric approach is that it could be cumbersome to characterize all clinically relevant outcomes and susceptibilities that we typically associate with a diagnosis. Rather than saying that a patient has Generalized Anxiety Disorder, the output of a biomarker panel may instead conclude that she has a “Chief Complaint of Feeling Anxious in Multiple Contexts, High Probability of Response to SSRIs, Low Probability of Response to Benzodiazepines, Moderate Probability of Response to Cognitive-Behavioral Therapy, Low Probability of Suicide, Low Probability of Thyroid Complications, Low Probability of Somatic Outcomes Related to Pheochromocytoma,” with each statement possibly arrived at via a different biomarker technology. Using this approach, we would treat different clinical problems independently, even if they existed at the same level of analysis (e.g., responsiveness to SSRIs and responsiveness to benzodiazepines).
When we have clinical symptoms or potential interventions at differing levels of analysis, our approach could also treat them as independent problems, with no assumption of dependence between levels of analysis. This is particularly relevant in the discipline of NDD, in which patients routinely receive interventions at different levels of analysis. Embracing degeneracy and assigning multi-axial diagnoses is an established part of NDD care. Educational approaches are based on attentional and cognitive factors, often guided by history and psychoeducational testing (Ewen & Shapiro, 2008), while novel molecular therapeutics are more related to genotype, and neurostimulatory interventions may prove to be more related to a systems-neuroscientific level of analysis. Consider patients with Fragile X syndrome (FXS) with and without Autism Spectrum Disorder (ASD), and patients with ASD with and without FXS. Many patients with FXS express the phenotype of ASD, and some patients with ASD have a genotype of FXS. Applied Behavioral Analysis (ABA) is a recognized, behavioral intervention for ASD. mGluR5 antagonists are experimental pharmacological (molecular) therapies for FXS. A patient with FXS may be included in a clinical trial using a mGluR5 antagonist whether or not he carries an ASD diagnosis, and a patient with ASD could be prescribed ABA whether or not he has a diagnosis of FXS. We can therefore consider a patient’s ASD status and FXS status independently, in a multi-axial fashion (Rutter et al., 1973).
Progress in developing these types of biomarkers for neuropsychiatric and neurodevelopmental disorders remains very limited. Nonetheless, recent smaller-scale studies and ongoing consortia and collaborations are developing multi-axial biomarkers in disorders such as ASD and schizophrenia, with the aim of identifying one or more biomarkers that might be applied to treatment development or delivery (Brady & Potter, 2014; Kas, Serretti, & Marston, 2019).
In the empiric approach as we have described it so far, the relevant patient group (the inclusion/exclusion criteria for validation and then utilization of the biomarker) is based on existing clinical criteria, which determine who will receive the test and who will not. If the patient falls slightly outside of the inclusion/exclusion criteria under which the biomarker was validated, both clinician and patient are out of luck. And because inclusion/exclusion criteria are predefined in the validation study and are based on symptomatology that may or may not correlate well with the true mechanisms to which the biomarker candidate is sensitive, there is no a priori way to optimally pre-stratify patients and determine who should be considered a candidate for the biomarker. However, data-driven clustering approaches may offer a “first step” path forward to establish biomarker utility.
Clustering and Outliers: Data-Driven Selection of Patient Groups
The empiric and “single-use” prediction/prognostication approach described above helps perform the third diagnostic function, finding clinically meaningful groups of similar patients, by dividing them dichotomously by a single outcome: high vs. low risk of a particular outcome, or high vs. low probability of response to a particular intervention. However, there is another way to stratify patients, one that identifies multiple modes within the distribution of a single biomarker read-out or clusters among multiple read-outs considered simultaneously. In this approach, cluster membership, rather than the thresholded/binarized output of the biomarker, becomes the dependent variable in the validation study or the new diagnosis. And, of course, the mere fact that biomarker data cluster in a statistically significant way is an insufficient demonstration that these clusters are clinically meaningful for any purpose. The ability of cluster membership to help predict any particular clinical outcome needs to be subjected to individual validation studies (in separate samples) for each of the relevant clinical outcomes. In the EEG prediction of SSRI response (Wu et al., 2020), the machine learning analysis was not purely data-driven, pulling out one or more EEG-defined subgroup(s) of depressed patients, but rather was directed to look for EEG characteristics of patients who showed a good response to an SSRI but not to placebo.
Whether clusters of individuals, defined unidimensionally or multidimensionally, exist in clinical and biomarker data can be determined using cluster analysis, mixture analysis or machine learning approaches; recent advances in machine learning have provided tools that facilitate progress along these lines (Ivleva, Turkozer, & Sweeney, 2020; Sun et al., 2018; Sun et al., 2015; Xiao et al., 2019). In a sense, however, this process of “lumping and splitting” similar and dissimilar groups of patients is of a kind with the one that occurred in the minds of late 19th and early 20th century psychiatrists (Kendler, 2009), though now grounded in statistical inference.
The problem of how to deal with multiple levels of analysis parallels the earlier discussion. In the easiest case, “clusters” are based on a single dimension (dependent variable), such as the example of finding a group of “fast metabolizers” in drug studies (Endrenyi et al., 1976; Evans et al., 1960). Although it required methodologic advances to describe the detailed genetics underlying the observed variations, there is no question that simple recognition of a non-normal distribution set off an important and clinically relevant chain of studies.
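The “fast metabolizer” scenario, in which a bimodal distribution in a single read-out reveals a clinically distinct subgroup, can be sketched with a simple mixture-model comparison. The data below are simulated, and the group sizes, means and the metabolizer framing are illustrative assumptions rather than values from any study:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulated steady-state drug levels: an invented example in which ~20% of
# patients ("fast metabolizers") cluster at much lower plasma levels.
slow = rng.normal(loc=100, scale=10, size=160)
fast = rng.normal(loc=40, scale=8, size=40)
levels = np.concatenate([slow, fast]).reshape(-1, 1)

# Fit 1- vs 2-component Gaussian mixtures; lower BIC indicates the
# better-supported model, i.e., whether a second mode is warranted.
bic = {k: GaussianMixture(n_components=k, random_state=0).fit(levels).bic(levels)
       for k in (1, 2)}
best = min(bic, key=bic.get)
print(f"best-supported number of modes: {best}")
```

As the text notes, detecting the mode is only the start: the genetics (or other mechanism) underlying the subgroup, and its clinical relevance, must be established in follow-up studies.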
A more complex example uses multiple measurements within a single level of analysis. Genetic variations can have very complex distributions. To date, decades of subgrouping psychiatric patients according to various candidate single-nucleotide polymorphisms (SNPs) have failed to provide clinically meaningful subgroup delineation outside of drug metabolism, such as dosing and susceptibility to side effects of antipsychotics (Hoehe & Morris-Rosendahl, 2018). However, efforts continue, and polygenic risk scores, perhaps in combination with phenotypes of interest, are of great interest as potential biomarkers in Psychiatry. Another widely used approach to biological subtyping relies on anatomical MRI, taking into account thousands of variables in each patient; these efforts have shown significant clustering associated with different white-matter tracts (Sun et al., 2015), though the clinical relevance of these groups has not yet been established through predictive validation.
Other clustering efforts take into account data from multiple levels of analysis at once. When data from different levels of analysis is non-redundant (Rodrigue et al., 2018; Stan et al., 2020), then more clusters may be identified using all data sources than by only using a single data source. Identifying (and then validating) a finer-grained set of individuals using this approach may better fulfill the promise of identifying a novel nosology of groups of patients who are most similar in terms of a range of clinical outcomes. Such was the case when identifying distinct “biotypes” of patients with psychosis from psychophysical and EEG data (Clementz et al., 2016).
Information from different levels of analysis can also be redundant. Although this redundancy does not increase our ability to carve nature at finer joints, it does increase our confidence that the joints we have discovered are real and naturally occurring (convergent validity). As we have discussed, all methods are sensitive to assumptions, confounds and artifact, with different methods differentially sensitive to artifact. For example, in an investigation of visual processing using resting-state fMRI and psychophysics, the fMRI may be less sensitive to (confounding) attentional fluctuations, and the psychophysics results may be less sensitive to participant movement.
Moreover, a measure at one level of analysis may be more or less strongly correlated with a measure at another, depending on how many additional, unaccounted-for factors are involved and how much noise is inherent in the system or its measurement. Strong association between metrics on different levels of analysis, convergent validity, can also be viewed as a form of “inter-method” reliability (Campbell & Fiske, 1959), one in which performance on either measure is robust to the assumptions and potential sources of noise of both methods. The presence of convergent validity makes it more likely that features are identifying naturally occurring joints in nature and not merely methodological noise. One way of establishing convergent validity or inter-method reliability is to demonstrate that the clusters identified from data at one or more levels of analysis also differentiate themselves at other levels of analysis. Such was the case when neuropsychologically- and EEG-defined clusters in patients with psychosis were tested against clinical and MRI measures (Clementz et al., 2016). Again, these statistical “correlations” do not entail that the variation in clinical measures is caused by the processes that are reflected in the EEG measures.
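When each modality yields its own cluster assignment for the same patients, one simple way to quantify this kind of inter-method agreement is the adjusted Rand index (ARI), which compares two partitions while ignoring the arbitrary names given to clusters. The labels below are hypothetical, invented solely to illustrate the metric:

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical cluster assignments for the same 12 patients, derived
# independently from two modalities (e.g., EEG- and MRI-based clustering).
eeg_clusters = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

# The same partition under different arbitrary label names: ARI ignores
# labeling and reports perfect agreement.
mri_clusters_same = [2, 2, 2, 2, 0, 0, 0, 0, 1, 1, 1, 1]
print(adjusted_rand_score(eeg_clusters, mri_clusters_same))  # 1.0

# A partition that disagrees on two patients: agreement is above
# chance level (~0) but below perfect (1.0).
mri_clusters_noisy = [2, 2, 2, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(adjusted_rand_score(eeg_clusters, mri_clusters_noisy))
```

High cross-modal agreement of this kind supports, but does not by itself establish, that the clusters reflect real joints in nature rather than method-specific artifact.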
For the most part, this is where the state-of-the-art stands. We are still a ways from the goal of mechanistically defined biomarkers and diagnostic constructs. Yet the approaches described above give us tools to help us reach the lofty goal of mechanistic biomarkers. We have methods to begin to carve nature and to validate whether we are truly hitting the joints. And while clinical practice can yet gain much from the procedures described above, it is worth considering what more is to be gained from mechanistically (rather than empirically) defined biomarkers and hurdles that need to be overcome to reach this goal.
Mechanisms, Diagnoses and Construct Validation
We should contrast biomarker studies, which represent a relatively small fraction of scientific work into affective and cognitive disorders, with the far larger body of explanatory science. Whereas biomarker validation studies require individual-level statistics and attempt to quantify the noise/uncertainty invariably associated with clinical practice, explanatory studies usually employ case-control designs and group comparisons and try to artificially remove sources of uncertainty. Explanatory multi-level, multi-modal investigation (Mill, Ito, & Cole, 2017) is making progress, if slowly, in assembling pictures of pathogenic mechanisms in neuropsychiatric illnesses, and these results will eventually be used to achieve the goal of mechanistically based diagnoses and diagnostic biomarkers. For reasons discussed above, these explanatory science efforts have not yet translated into diagnostic biomarkers or new nosologies. But one feels that the decades of preclinical and clinical neuroscience research in neuropsychiatry are approaching the point where biological and behavioral features together will be considered in clinical practice (Lui, Zhou, Sweeney, & Gong, 2016). However, it seems worthwhile at this stage to consider what approaches are best for developing mechanistically influenced diagnoses and biomarkers, and what such developments would permit.
Mechanistically based diagnoses, and biomarkers that can index those mechanisms with specificity, are qualitatively different from the “single-use” biomarkers that have been discussed to this point. One limitation of the “single-use,” elemental approach is that it would take large numbers of independent biomarkers to learn all of the potential outcomes that one would want to know in the clinical care of a single patient. Another limitation of the “single-use,” predictive approach is that these biomarkers exist independently of each other: little can be predicted when the precise inclusion/exclusion criteria are changed, the intervention (in the case of response-prediction biomarkers) is changed or the outcome measure of interest (criterion, reference standard) is changed. We return to our analogy of relying on a phrase-book as a tourist. In order to navigate the complexities of clinical practice, however, it is better to be more like a fluent speaker of the language (analogy to (Searle, 1980)). To correctly act on a positive CSF culture, we do not need to care that the patient in front of us is outside the age range in which the CSF culture was originally validated. This level of interplay between mechanistic knowledge and clinical testing is common to many other medical fields, but translation of neuroscience knowledge and technologies into patient care in Psychiatry and NDD remains stalled.
There is much about human brain function that cannot be measured in clinical protocols even now with current tools, and as a result, we are required to infer many constructs from the sparse data we do have, at least until new tools come along (Drury, 1996, p. 4; Greenwald, 2012). We can look to the history of medicine to understand how a more mechanistically informed clinical practice arose in other disciplines.
Semmelweis demonstrated empirically that hand-washing decreases rates of puerperal fever. Because the pathological mechanism was not understood, he invented a construct—“cadaveric particles”—as an ad hoc “explanation” of this effect. It was only after the development of Germ Theory, linking communicable disease with observable entities (bacteria) and, eventually, with the sensitivity of those entities to inoculation, antibiotics and pasteurization, that we had the full range of inter-linked knowledge (nomological network) that now powers the field of Infectious Diseases. In short, we have built up a cohesive “vocabulary” and “grammar” of infectious disease. As a result of this nomological network, we are now able to observe multiple properties of bacteria with new tools, contrast the types, identify their effective components (e.g., toxins) and predict their response to various conditions. We are able to rationally design new therapies, preventive strategies and clinical tests.
But because so many important aspects of brain and cognitive function cannot be observed directly, it is natural that—whether we are aware of it or not—cognitive brain research is balkanized into different disciplines, with different frameworks, different assumptions and different “vocabularies” (Andreas, 2017). This balkanization probably has not been seen in other Western medical fields since the pre-Galenic era (see (Kleinman, 1981)). Elemental biomarkers, as proposed above, side-step this issue by reducing diagnosis to a set of observable, “surface” features. On the other hand, trying to inter-link these observations in meaningful ways, given the limitations of current human neuroscience, requires that we build mechanistic theories that link observations using assumption-laden constructs.
When we equate diagnosis with a mechanism that is not directly observable, we are bound by the assumptions, “vocabulary,” and “grammar” of the theory and discipline that gave rise to that particular account. DSM maps observable symptomatology onto the derived constructs of Affective Disorders, Psychotic Disorders and Autism Spectrum Disorders. Psychoanalysis has “ego strength,” “resilience” and “reality testing.” Neuropsychology has procedural and declarative memory, aphasia, apraxia and agnosia. Dynamic Systems Theory has physical laws and chaotic attractors applied to psychological phenomena. Connectomics approaches often use whole-brain measures of path length and clustering coefficient, which differ substantially from localizationist claims about which lobe or gyrus “supports” particular functions or is altered in particular disorders. Synthesizing accounts that include, for example, both whole-brain network metrics and gyral-level BOLD activation in functional MRI requires considerable work to get these approaches to “speak the same language,” just as it would to try to synthesize psychophysical accounts of visual-spatial attention within a psychodynamic account of ego strength (see (Nagel, 1979)). And these research fields exist largely separate from clinical practice (Li, Wu, Lui, Gong, & Sweeney, 2020; Lui et al., 2016), which represents another divergence of Psychiatry from other fields of medicine, for which laboratory tests of target organ function and pathology are standard clinical practice. This “Tower of Babel” problem was recognized in the psychiatric literature prior to the MRI era (Meehl, 1977, p. 286) and has been noted recently in the Cognitive Neuroscience literature (Poldrack & Yarkoni, 2016).
How can we develop a more universal vocabulary and grammar of clinical and basic neuroscience? More effort needs to be made to define a common set of constructs that spans frameworks and, to the extent possible, can be shown empirically to “carve nature at the joints.” This is essentially the same problem that motivates this article: the clinical insufficiency that has led many to conclude that DSM does not reflect “real and naturally occurring entities.” Data-driven clustering, dimensional reduction methods and “double dissociations” (Shallice, 1988; Shallice & Cooper, 2011, pp. 74–78) help us identify “potential joints.” Convergent validity approaches across modalities help us validate that the apparent joints are not merely artifacts of the tools we use (MacCorquodale & Meehl, 1948), and discriminant validity approaches, in which different groups are shown to have differing relationships to external criteria, help us validate that the joints are real.
This article started out by drawing attention to the paradox inherent in trying to demonstrate, empirically, how a novel biomarker candidate performs when there is uncertainty about the integrity of the diagnostic nosology against which it is judged. The pathway we propose represents an effort to end the segregation of explanatory biological studies and clinical diagnosis in psychiatry by bootstrapping a novel, mechanism-based nosology at the same time as we assess the validity of the read-out of novel technologies. To overcome the paradox, this goal can only be accomplished over the course of many studies that cover broad sets of biomarker technologies, patient groups, clinical outcomes and interventions, and establish the pattern of links and dissociations across this parametric landscape. Then, biology can inform nosology and treatment planning and nosology can be revised as knowledge is gained. The validation of constructs, contrasted with predictive validation in purely empiric approaches, has been seen throughout the history of psychometrics as an iterative and progressive boot-strapping process without a clear end-point (Nunnally & Bernstein, 2010, pp. 83–113; Strauss & Smith, 2009). It is entirely possible that a “Newtonian Revolution of the Brain” will spur a sudden and enlightening paradigm shift that will make further advances all but trivial, but it is definitionally impossible to lay a course to such a destination. Investment in highly unorthodox frameworks could support these efforts, but most investment will likely be directed strategically toward “normal science” iterative approaches for developing clinically useful biomarkers and bringing biological observations into diagnostic practice in neuropsychiatry.
A Path Forward
To recap, some diagnostic biomarkers work within the current DSM-style nosology in instances when we have faith in the diagnosis itself and in the independent variables that serve as our current “gold standard” for those diagnoses. Given workforce issues in Psychiatry and NDD, these types of biomarkers have a welcome place in sustaining, modifying or validating current practice, but they do not advance treatment planning and prognosis. They can approximate the diagnostic judgments of expert clinicians more quickly, more cheaply or in a more widely available fashion. The Criterion Problem rears its head when we are not confident that our current diagnostic nosologies match underlying disease processes. To get around this problem, we can reduce diagnosis to empiric statements about outcome or responsiveness to individual therapies. We can also use empiric, “black box” methods to identify discrete clusters of individuals, but these clusters must subsequently be demonstrated empirically to have clinical utility, and the functions for which they are useful must be specified.
“Heterogeneity” within DSM diagnoses and nonspecificity across diagnoses are in part a function of the noisy links between levels of analysis in clinical cognitive neuroscience. By using methods that interrogate levels of analysis closer to the action of our interventions, we can also hope that any new nosology will perform better statistically in selecting interventions and predicting treatment response.
The apparent end-goal of the biomarker community, however, is to develop and validate mechanistically oriented biomarkers to guide subsequent advances in nosology. Building up extensive and cross-discipline mechanistic knowledge is distinct from, and complementary to, an empirically based approach. Mechanistically oriented biomarkers have the advantage of allowing translation of knowledge across species, as well as applicability beyond the “single use” contexts in which any particular biomarker was explicitly validated. However, they require extensive knowledge resulting from explanatory, case-control science. They also require the empiric validation of a “vocabulary” of constructs that can allow for translation across the many brain- and behavior-oriented scientific and clinical disciplines.
The path toward desired biomarkers for neuropsychiatric and neurodevelopmental conditions has seen slow but steady progress. Methods have been improved, relations to clinical syndromes better understood, and early successes in establishing narrow clinical utility offer promise for the future. Perhaps investing in more empiric and/or “single use” biomarkers, whether explicitly developed to advance mechanistic knowledge or to form the basis for model development toward that aim, remains the best next step toward the ultimate goal of fully mechanistically based biomarkers. However, as noted in the PD example, one can make major empirical advances that reach an intermediate stage of mechanistic understanding, one that leads to effective treatments, without yet having the total picture.
Among the biomarker consortia/collaborative studies that have been formed, ranging from US-based efforts in the areas of psychotic, autism and mood spectrum disorders to those in some of the same areas funded under the European Union’s Innovative Medicines Initiative, there is an implicit commitment to such a strategy (Brady & Potter, 2014). Instead of waiting for the field of neuroscience to develop the methods needed to understand causal relationships across all levels of analysis from genetics through brain function to behaviors and feelings, we may still advance therapeutics by finding biomarker predictors of intervention-specific response in syndromal (i.e., DSM-style) conditions or in identifiable subgroups of individuals within broadly defined syndromes. The prediction, cited earlier, of which depressed patients will respond to an SSRI based on pretreatment EEG using data from the collaborative EMBARC study is one such example (Wu et al., 2020). Other consortia involving public-private partnerships are focused on identifying a wider range of biomarkers of degenerative neurologic brain conditions, including Alzheimer’s disease, PD, multiple sclerosis and frontotemporal dementia.
One low-hanging yield from such studies, and one that brings industry to the investment table, is the immediate application of biomarkers of brain function to ascertain whether or not an administered agent is doing what one expects it to do based on whatever clinical and preclinical data one has. The simple expedient of making sure that a drug really engages some molecular target in the brain and produces some functional effect (determined by EEG, fMRI and/or PET) could prevent useless investment in agents that have been either under-dosed or proven not sufficiently brain-penetrant in humans to change the mechanism in question (Paul et al., 2010; Potter, 2015). Cumulatively, in the past, literally billions of dollars have been spent on studies that lacked efforts to establish target engagement in patients. The real advance has been to have measures with good enough psychometric properties and established association with disease to produce read-outs that can be trusted in deciding whether to go forward with a particular treatment.
Some of the challenges we now face were surmounted by previous generations in other fields of medicine and in behaviorally focused Psychiatry through good psychometric practices. What is needed now is for those practices to be extended to include inference between levels of analysis, together with efforts to establish real-world clinical utility. Although data-driven approaches toward biomarker development can be successful, as outlined above, and may be a necessary first step toward achieving practical aims, thinking through the theoretical aspects of the challenge—psychological theory, brain theory and psychometric theory—can speed progress by providing a framework for aligning efforts and avoiding redundancy.
Will we be successful? Let us hope so. We and many others are confident that we will be able to find clinically useful biomarkers on the molecular and perhaps systems biology levels that can advance development and clinical use of psychopharmacological interventions—hence the large investments noted above. But progress in fully establishing a tight and specific explanatory model that links levels of analysis is likely to be a challenge for the coming decades of basic and clinical research.
Funding:
This work was supported by the National Institutes of Health [U54HD082008 to JAS, R01MH113652 to JBE, and P50HD103538 to the Kennedy Krieger Institute] and the National Natural Science Foundation of China [NSFC81820108018 to JAS].
References
- Adamek JH, Luo Y, & Ewen JB (in press). Using Connectivity to Explain Neuropsychiatric Conditions: The Example of Autism. In Thakor NV (Ed.), Springer Handbook of Neuroengineering. Springer.
- American Psychiatric Association. (1980). Diagnostic and Statistical Manual of Mental Disorders (Third ed.). Philadelphia: American Psychiatric Press.
- Andreas H (2017, July 20). Theoretical Terms in Science. The Stanford Encyclopedia of Philosophy (Fall 2017). Retrieved from https://plato.stanford.edu/archives/fall2017/entries/theoretical-terms-science/
- Bennett MR, & Hacker PMS (2003). Philosophical Foundations of Neuroscience. Hoboken, NJ: Wiley-Blackwell.
- Bogenschutz MP, & Nurnberg HG (2000). Classification of Mental Disorders. In Sadock BJ & Sadock VA (Eds.), Kaplan & Sadock’s Comprehensive Textbook of Psychiatry (Vol. 1, pp. 828). Philadelphia: Lippincott Williams & Wilkins.
- Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, … Group, S. (2015). STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ, 351, h5527. doi: 10.1136/bmj.h5527
- Brady LS, & Potter WZ (2014). Public-private partnerships to revitalize psychiatric drug discovery. Expert Opin Drug Discov, 9(1), 1–8. doi: 10.1517/17460441.2014.867944
- Brandwein AB, Foxe JJ, Butler JS, Frey HP, Bates JC, Shulman LH, & Molholm S (2015). Neurophysiological indices of atypical auditory processing and multisensory integration are associated with symptom severity in autism. J Autism Dev Disord, 45(1), 230–244. doi: 10.1007/s10803-014-2212-9
- Campbell DT, & Fiske DW (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol Bull, 56(2), 81–105. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/13634291
- Cao H, McEwen SC, Chung Y, Chen OY, Bearden CE, Addington J, … Cannon TD (2019). Altered Brain Activation During Memory Retrieval Precedes and Predicts Conversion to Psychosis in Individuals at Clinical High Risk. Schizophr Bull, 45(4), 924–933. doi: 10.1093/schbul/sby122
- Cattell RB, & Gibbons BD (1968). Personality factor structure of the combined Guilford and Cattell personality questionnaires. J Pers Soc Psychol, 9(1), 107–120. doi: 10.1037/h0025724
- Clementz BA, Sweeney JA, Hamm JP, Ivleva EI, Ethridge LE, Pearlson GD, … Tamminga CA (2016). Identification of Distinct Psychosis Biotypes Using Brain-Based Biomarkers. Am J Psychiatry, 173(4), 373–384. doi: 10.1176/appi.ajp.2015.14091200
- Clementz BA, Trotti RL, Pearlson GD, Keshavan MS, Gershon ES, Keedy SK, … Tamminga CA (2020). Testing Psychosis Phenotypes From Bipolar-Schizophrenia Network for Intermediate Phenotypes for Clinical Application: Biotype Characteristics and Targets. Biol Psychiatry Cogn Neurosci Neuroimaging, 5(8), 808–818. doi: 10.1016/j.bpsc.2020.03.011
- Dew AM, Switzer GE, Myaskovsky L, DiMartini A, & Tovt-Korshynska MI (2006). Rating Scales for Mood Disorders. In Stein D, Kupfer D, & Schatzberg AF (Eds.), Textbook of Mood Disorders. Washington, DC: American Psychiatric Publishing, Inc.
- Drury MOC (1996). The Danger of Words and Writings on Wittgenstein (Berman D, Fitzgerald M, & Hayes J Eds.). Chippenham, Wiltshire: Thoemmes Press. [Google Scholar]
- Emerson RW, Adams C, Nishino T, Hazlett HC, Wolff JJ, Zwaigenbaum L, … Piven J (2017). Functional neuroimaging of high-risk 6-month-old infants predicts a diagnosis of autism at 24 months of age. Sci Transl Med, 9(393). doi: 10.1126/scitranslmed.aag2882 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Endrenyi L, Inaba T, & Kalow W (1976). Genetic study of amobarbital elimination based on its kinetics in twins. Clin Pharmacol Ther, 20(6), 701–714. doi: 10.1002/cpt1976206701 [DOI] [PubMed] [Google Scholar]
- Evans DA, Manley KA, & McKusick VA (1960). Genetic control of isoniazid metabolism in man. Br Med J, 2(5197), 485–491. doi: 10.1136/bmj.2.5197.485 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ewen JB (2016). The eternal promise of EEG-based biomarkers: Getting closer? Neurology, 87(22), 2288–2289. doi: 10.1212/WNL.0000000000003275 [DOI] [PubMed] [Google Scholar]
- Ewen JB, & Beniczky S (2018). Validating biomarkers and diagnostic tests in clinical neurophysiology: Developing strong experimental designs and recognizing confounds. In Schomer DL & Lopes da Silva FH (Eds.), Niedermeyer’s Electroencephalography (7th ed.). New York: Oxford University Press. [Google Scholar]
- Ewen JB, Kossoff EH, Crone NE, Lin DD, Lakshmanan BM, Ferenc LM, & Comi AM (2009). Use of quantitative EEG in infants with port-wine birthmark to assess for Sturge-Weber brain involvement. Clin Neurophysiol, 120(8), 1433–1440. doi: 10.1016/j.clinph.2009.06.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ewen JB, & Shapiro BK (2008). Specific Learning Disabilities. In Accardo PJ (Ed.), Developmental Disabilities in Infancy and Childhood (3rd ed.). Baltimore: Paul H. Brookes Publishing Co. [Google Scholar]
- FDA-NIH Biomarker Working Group. (2016). BEST (Biomarkers, EndpointS, and other Tools) Resource. Silver Spring (MD): Food and Drug Administration (US). [PubMed] [Google Scholar]
- Galatzer-Levy IR, & Bryant RA (2013). 636,120 Ways to Have Posttraumatic Stress Disorder. Perspect Psychol Sci, 8(6), 651–662. doi: 10.1177/1745691613504115 [DOI] [PubMed] [Google Scholar]
- Gandal MJ, Haney JR, Parikshak NN, Leppa V, Ramaswami G, Hartl C, … Geschwind DH (2018). Shared molecular neuropathology across major psychiatric disorders parallels polygenic overlap. Science, 359(6376), 693–697. doi: 10.1126/science.aad6469 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Greenwald AG (2012). There Is Nothing So Theoretical as a Good Method. Perspect Psychol Sci, 7(2), 99–108. doi: 10.1177/1745691611434210 [DOI] [PubMed] [Google Scholar]
- Gunnarsdottir KM, Gamaldo C, Salas RM, Ewen JB, Allen RP, Hu K, & Sarma SV (2020). A novel sleep stage scoring system: Combining expert-based features with the generalized linear model. J Sleep Res, e12991. doi: 10.1111/jsr.12991 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gunnarsdottir KM, Gamaldo C, Salas RM, Ewen JB, Allen RP, & Sarma SV (2018). A Novel Sleep Stage Scoring System: Combining Expert-Based Rules with a Decision Tree Classifier. Conf Proc IEEE Eng Med Biol Soc, 2018, 3240–3243. doi: 10.1109/EMBC.2018.8513039 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Happé F, Ronald A, & Plomin R (2006). Time to give up on a single explanation for autism. Nat Neurosci, 9(10), 1218–1220. doi: 10.1038/nn1770 [DOI] [PubMed] [Google Scholar]
- Hill SK, Reilly JL, Harris MS, Rosen C, Marvin RW, Deleon O, & Sweeney JA (2009). A comparison of neuropsychological dysfunction in first-episode psychosis patients with unipolar depression, bipolar disorder, and schizophrenia. Schizophr Res, 113(2–3), 167–175. doi: 10.1016/j.schres.2009.04.020 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hill SK, Reilly JL, Keefe RS, Gold JM, Bishop JR, Gershon ES, … Sweeney JA (2013). Neuropsychological impairments in schizophrenia and psychotic bipolar disorder: findings from the Bipolar-Schizophrenia Network on Intermediate Phenotypes (B-SNIP) study. Am J Psychiatry, 170(11), 1275–1284. doi: 10.1176/appi.ajp.2013.12101298 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hoehe MR, & Morris-Rosendahl DJ (2018). The role of genetics and genomics in clinical psychiatry. Dialogues Clin Neurosci, 20(3), 169–177. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/30581286 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hyman SE (2012). Revolution stalled. Sci Transl Med, 4(155), 155cm111. doi: 10.1126/scitranslmed.3003142 [DOI] [PubMed] [Google Scholar]
- Illari PM, & Williamson J (2012). What is a mechanism? Thinking about mechanisms across the sciences. Euro Jnl Phil Sci, 2, 119–135. doi:DOI 10.1007/s13194-011-0038-2 [DOI] [Google Scholar]
- Ivleva EI, Turkozer HB, & Sweeney JA (2020). Imaging-Based Subtyping for Psychiatric Syndromes. Neuroimaging Clin N Am, 30(1), 35–44. doi: 10.1016/j.nic.2019.09.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kas MJ, Serretti A, & Marston H (2019). Quantitative neurosymptomatics: Linking quantitative biology to neuropsychiatry. Neurosci Biobehav Rev, 97, 1–2. doi: 10.1016/j.neubiorev.2018.11.013 [DOI] [PubMed] [Google Scholar]
- Kendler KS (2009). An historical framework for psychiatric nosology. Psychol Med, 39(12), 1935–1941. doi: 10.1017/S0033291709005753 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kleinman A (1981). Patients and Healers in the Context of Culture. Los Angeles: University of California Press. [Google Scholar]
- Krakauer JW, Ghazanfar AA, Gomez-Marin A, MacIver MA, & Poeppel D (2017). Neuroscience Needs Behavior: Correcting a Reductionist Bias. Neuron, 93(3), 480–490. doi: 10.1016/j.neuron.2016.12.041 [DOI] [PubMed] [Google Scholar]
- Leptak C, Menetski JP, Wagner JA, Aubrecht J, Brady L, Brumfield M, … Wholley D (2017). What evidence do we need for biomarker qualification? Sci Transl Med, 9(417). doi: 10.1126/scitranslmed.aal4599 [DOI] [PubMed] [Google Scholar]
- Li F, Wu D, Lui S, Gong Q, & Sweeney JA (2020). Clinical Strategies and Technical Challenges in Psychoradiology. Neuroimaging Clin N Am, 30(1), 1–13. doi: 10.1016/j.nic.2019.09.001 [DOI] [PubMed] [Google Scholar]
- Loo SK, Cho A, Hale TS, McGough J, McCracken J, & Smalley SL (2013). Characterization of the theta to beta ratio in ADHD: identifying potential sources of heterogeneity. J Atten Disord, 17(5), 384–392. doi: 10.1177/1087054712468050 [DOI] [PubMed] [Google Scholar]
- Lorr M, Jenkins R, & O’Connor JP (1955). Factors descriptive of psychopathology and behavior of hospitalized psychotics. J Abnorm Psychol, 50(1), 78–86. doi: 10.1037/h0045418 [DOI] [PubMed] [Google Scholar]
- Lui S, Yao L, Xiao Y, Keedy SK, Reilly JL, Keefe RS, … Sweeney JA (2015). Resting-state brain function in schizophrenia and psychotic bipolar probands and their first-degree relatives. Psychol Med, 45(1), 97–108. doi: 10.1017/S003329171400110X [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lui S, Zhou XJ, Sweeney JA, & Gong Q (2016). Psychoradiology: The Frontier of Neuroimaging in Psychiatry. Radiology, 281(2), 357–372. doi: 10.1148/radiol.2016152149 [DOI] [PMC free article] [PubMed] [Google Scholar]
- MacCorquedale K, & Meelh PE (1948). On a Distinction Between Hypothetical Constructs and Intervening Variables. Psychological Review, 55(2), 95–107. [DOI] [PubMed] [Google Scholar]
- Meehl PE (1977). Psychodiagnosis: Selected Papers. New York, NY: W. W. Norton & Company, Inc. [Google Scholar]
- Menetski JP, Austin CP, Brady LS, Eakin G, Leptak C, Meltzer A, & Wagner JA (2019). The FNIH Biomarkers Consortium embraces the BEST. Nat Rev Drug Discov, 18(8), 567–568. doi: 10.1038/d41573-019-00015-w [DOI] [PubMed] [Google Scholar]
- Mill RD, Ito T, & Cole MW (2017). From connectome to cognition: The search for mechanism in human functional brain networks. Neuroimage, 160, 124–139. doi: 10.1016/j.neuroimage.2017.01.060 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nagel E (1979). The Structure of Science: Problems in the Logic of Scientific Explanation. Indianapolis, IN: Hacket Publishing Company. [Google Scholar]
- Nunnally JC, & Bernstein IH (2010). Psychometric Theory (Third ed.): Tata McGraw-Hill. [Google Scholar]
- Parker DA, Hamm JP, McDowell JE, Keedy SK, Gershon ES, Ivleva EI, … Clementz BA (2019). Auditory steady-state EEG response across the schizo-bipolar spectrum. Schizophr Res, 209, 218–226. doi: 10.1016/j.schres.2019.04.014 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Paul SM, Mytelka DS, Dunwiddie CT, Persinger CC, Munos BH, Lindborg SR, & Schacht AL (2010). How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat Rev Drug Discov, 9(3), 203–214. doi: 10.1038/nrd3078 [DOI] [PubMed] [Google Scholar]
- Pelphrey KA, Yang DY, & McPartland JC (2014). Building a social neuroscience of autism spectrum disorder. Curr Top Behav Neurosci, 16, 215–233. doi: 10.1007/7854_2013_253 [DOI] [PubMed] [Google Scholar]
- Plato. (1924). The Meno (Lamb WRM, Trans.). In Plato II (pp. 264–371). Cambridge, MA: Harvard University Press. [Google Scholar]
- Poldrack RA, & Yarkoni T (2016). From Brain Maps to Cognitive Ontologies: Informatics and the Search for Mental Structure. Annu Rev Psychol, 67, 587–612. doi: 10.1146/annurev-psych-122414-033729 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Potter WZ (2015). Optimizing early Go/No Go decisions in CNS drug development. Expert Rev Clin Pharmacol, 8(2), 155–157. doi: 10.1586/17512433.2015.991715 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Reininghaus U, Bohnke JR, Chavez-Baldini U, Gibbons R, Ivleva E, Clementz BA, … Tamminga CA (2019). Transdiagnostic dimensions of psychosis in the Bipolar-Schizophrenia Network on Intermediate Phenotypes (B-SNIP). World Psychiatry, 18(1), 67–76. doi: 10.1002/wps.20607 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ren W, Lui S, Deng W, Li F, Li M, Huang X, … Gong Q (2013). Anatomical and functional brain abnormalities in drug-naive first-episode schizophrenia. Am J Psychiatry, 170(11), 1308–1316. doi: 10.1176/appi.ajp.2013.12091148 [DOI] [PubMed] [Google Scholar]
- Rodrigue AL, McDowell JE, Tandon N, Keshavan MS, Tamminga CA, Pearlson GD, … Clementz BA (2018). Multivariate Relationships Between Cognition and Brain Anatomy Across the Psychosis Spectrum. Biol Psychiatry Cogn Neurosci Neuroimaging, 3(12), 992–1002. doi: 10.1016/j.bpsc.2018.03.012 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rosen C, Marvin R, Reilly JL, Deleon O, Harris MS, Keedy SK, … Sweeney JA (2012). Phenomenology of first-episode psychosis in schizophrenia, bipolar disorder, and unipolar depression: a comparative analysis. Clin Schizophr Relat Psychoses, 6(3), 145–151. doi: 10.3371/CSRP.6.3.6 [DOI] [PubMed] [Google Scholar]
- Rutter M, Shaffer D, & Shepherd M (1973). An evaluation of the proposal for a multi-axial classification of child psychiatric disorders. Psychol Med, 3(2), 244–250. doi: 10.1017/s0033291700048595 [DOI] [PubMed] [Google Scholar]
- Sahin M, Jones SR, Sweeney JA, Berry-Kravis E, Connors BW, Ewen JB, … Mamounas LA (2018). Discovering translational biomarkers in neurodevelopmental disorders. Nat Rev Drug Discov. doi: 10.1038/d41573-018-00010-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Searle JR (1980). Mind, brains, and programs. Behavioral and Brain Sciences, 3(3), 417–457. [Google Scholar]
- Shallice T (1988). From Neuropsychology to Mental Structure. New York: Cambridge University Press. [Google Scholar]
- Shallice T, & Cooper RP (2011). The Organisation of Mind. New York: Oxford University Press. [Google Scholar]
- Snyder SM, Rugino TA, Hornig M, & Stein MA (2015). Integration of an EEG biomarker with a clinician’s ADHD evaluation. Brain Behav, 5(4), e00330. doi: 10.1002/brb3.330 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stan AD, Tamminga CA, Han K, Kim JB, Padmanabhan J, Tandon N, … Gibbons RD (2020). Associating Psychotic Symptoms with Altered Brain Anatomy in Psychotic Disorders Using Multidimensional Item Response Theory Models. Cereb Cortex, 30(5), 2939–2947. doi: 10.1093/cercor/bhz285 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Strack S, Choca JP, & Gurtman MB (2001). Circular structure of the MCMI-III personality disorder scales. J Pers Disord, 15(3), 263–274. doi: 10.1521/pedi.15.3.263.19206 [DOI] [PubMed] [Google Scholar]
- Strauss ME, & Smith GT (2009). Construct validity: advances in theory and methodology. Annu Rev Clin Psychol, 5, 1–25. doi: 10.1146/annurev.clinpsy.032408.153639 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sun H, Chen Y, Huang Q, Lui S, Huang X, Shi Y, … Gong Q (2018). Psychoradiologic Utility of MR Imaging for Diagnosis of Attention Deficit Hyperactivity Disorder: A Radiomics Analysis. Radiology, 287(2), 620–630. doi: 10.1148/radiol.2017170226 [DOI] [PubMed] [Google Scholar]
- Sun H, Lui S, Yao L, Deng W, Xiao Y, Zhang W, … Gong Q (2015). Two Patterns of White Matter Abnormalities in Medication-Naive Patients With First-Episode Schizophrenia Revealed by Diffusion Tensor Imaging and Cluster Analysis. JAMA Psychiatry, 72(7), 678–686. doi: 10.1001/jamapsychiatry.2015.0505 [DOI] [PubMed] [Google Scholar]
- Sweeney JA, Clementz BA, Haas GL, Escobar MD, Drake K, & Frances AJ (1994). Eye tracking dysfunction in schizophrenia: characterization of component eye movement abnormalities, diagnostic specificity, and the role of attention. J Abnorm Psychol, 103(2), 222–230. doi: 10.1037//0021-843x.103.2.222 [DOI] [PubMed] [Google Scholar]
- Sweeney JA, Takarae Y, Macmillan C, Luna B, & Minshew NJ (2004). Eye movements in neurodevelopmental disorders. Curr Opin Neurol, 17(1), 37–42. doi: 10.1097/00019052-200402000-00007 [DOI] [PubMed] [Google Scholar]
- Tamminga CA, Ivleva EI, Keshavan MS, Pearlson GD, Clementz BA, Witte B, … Sweeney JA (2013). Clinical phenotypes of psychosis in the Bipolar-Schizophrenia Network on Intermediate Phenotypes (B-SNIP). Am J Psychiatry, 170(11), 1263–1274. doi: 10.1176/appi.ajp.2013.12101339 [DOI] [PubMed] [Google Scholar]
- Waterhouse L, London E, & Gillberg C (2016). ASD Validity. Rev J Autism Dev Disord, 3, 302. doi: 10.1007/s40489-016-0085-x [DOI] [Google Scholar]
- Wu W, Zhang Y, Jiang J, Lucas MV, Fonzo GA, Rolle CE, … Etkin A (2020). An electroencephalographic signature predicts antidepressant response in major depression. Nat Biotechnol, 38(4), 439–447. doi: 10.1038/s41587-019-0397-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xiao Y, Yan Z, Zhao Y, Tao B, Sun H, Li F, … Lui S (2019). Support vector machine-based classification of first episode drug-naive schizophrenia patients and healthy controls using structural MRI. Schizophr Res, 214, 11–17. doi: 10.1016/j.schres.2017.11.037 [DOI] [PubMed] [Google Scholar]
- Zachar P, & Kendler KS (2017). The Philosophy of Nosology. Annu Rev Clin Psychol, 13, 49–71. doi: 10.1146/annurev-clinpsy-032816-045020 [DOI] [PubMed] [Google Scholar]
- Zhang W, Lei D, Keedy SK, Ivleva EI, Eum S, Yao L, … Sweeney JA (2020). Brain gray matter network organization in psychotic disorders. Neuropsychopharmacology, 45(4), 666–674. doi: 10.1038/s41386-019-0586-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang W, Nery FG, Tallman MJ, Patino LR, Adler CM, Strawn JR, … DelBello MP (2020). Individual prediction of symptomatic converters in youth offspring of bipolar parents using proton magnetic resonance spectroscopy. Eur Child Adolesc Psychiatry. doi: 10.1007/s00787-020-01483-x [DOI] [PubMed] [Google Scholar]
- Zhang W, Xiao Y, Sun H, Patino LR, Tallman MJ, Weber WA, … DelBello MP (2018). Discrete patterns of cortical thickness in youth with bipolar disorder differentially predict treatment response to quetiapine but not lithium. Neuropsychopharmacology, 43(11), 2256–2263. doi: 10.1038/s41386-018-0120-y [DOI] [PMC free article] [PubMed] [Google Scholar]