The advent of microarray-based genomic technologies in the late 1980s and early 1990s, leading to the groundbreaking paper by Brown and coworkers in 1995 describing the first microarray expression analysis (1), ushered in the age of genomics. Since then, a number of approaches aimed at the “collective characterization and quantification of pools of biological molecules that translate into the structure, function, and dynamics of an organism” (2), together referred to as “omics,” have been developed and applied to a wide variety of biological systems. Omics approaches, including genomics, transcriptomics, proteomics, and metabolomics, have transformed their respective disciplines. No longer is the study of a single gene, RNA, protein, or metabolite the norm. These approaches have also transformed modern biology, all the while stirring up a vigorous debate between those who favor hypothesis-driven science and those who favor data-driven science (3). This dichotomy, however, is too simplistic. In this editorial, I discuss how arguments about hypothesis- vs data-driven science have impacted the integration of omics approaches into our biological tool kit, as well as the grant review panels that evaluate proposed omics research. In addition, I discuss the role of journals, such as Molecular Endocrinology, in providing a forum for the presentation of high-quality omics resources.
“The goal is to discover things we neither knew nor expected …”
Patrick O. Brown
Discovery, Mechanism, and Description: Two Out of Three Ain't Bad
Although many scientists, like me, get excited by a well-executed and informative omics experiment, especially one using a newly developed method that can delve into previously inaccessible biological endpoints, others are put off by such “discovery science.” Pat Brown, an advocate of discovery science, asserted that “the goal is to discover things we neither knew nor expected, and to see relationships and connections among the elements, whether previously suspected or not. It follows that this process is not driven by hypothesis and should be as model-independent as possible” (4). In contrast, John Allen, a critic of the approach, predicted that “induction and data-mining, uninformed by ideas, can themselves produce neither knowledge nor understanding” (5). Although entertaining, thought provoking, and, at times, useful, this debate does not reflect the current realities and applications of omics 20+ years into the enterprise.
In many respects, labeling omics as purely discovery science is a misrepresentation. Although discovery is an important use for omics, testing molecular mechanisms on a global scale is equally important. Scientists have gotten better at designing omics experiments and combining them with other approaches (eg, perturbations, such as RNAi-mediated knockdown or chemical inhibition of effector proteins), demonstrating the power of genomics to test the generality of molecular mechanisms on a global scale. If there is a problem with the use and application of omics approaches, it lies not with the tools (ie, omics approaches), but with those who wield them (ie, scientists), at times not so deftly.
In my view, there are 3 types of omics experiments, which I refer to as discovery-focused, mechanism-focused, and descriptive (see Box 1). Discovery-focused omics experiments are designed to “discover things we neither knew nor expected,” as Pat Brown imagined. They apply new or existing methodologies to biological systems under conditions or at time points that are most likely to reveal key aspects of the biology (hmmm … that almost sounds like a hypothesis, but I will address that in more detail below). Such experiments may be combined with high-throughput screens (eg, RNAi or chemical libraries), allowing discovery of targets on a global scale. Mechanism-focused omics experiments (also referred to as functional omics experiments) are typically based on clear hypotheses and are designed to test, or perhaps reveal, underlying molecular mechanisms on a global scale. These are classical perturbation-effect experiments writ large, revealing the generality of a molecular mechanism, or variations in a mechanism, across the “ome” in question. Finally, descriptive omics experiments are those that are neither discovery- nor mechanism-focused, and perhaps might best be described as omics experiments gone wrong. They survey the biological system, without leading to significant discoveries. They may also appear to address mechanisms without actually doing so. Descriptive omics experiments produce catalogs (lists of genes, transcripts, proteins, or metabolites) whose levels “change” from condition A to condition B, without revealing how or why. I think discovery- and mechanism-focused omics experiments have great value in modern biology. Descriptive omics experiments, however, are of limited value, because they are typically poorly designed and, as such, fail to capitalize on the power of omics as a tool of discovery and hypothesis testing.
Grant Me This: Study Sections and Their Love/Hate Relationship With Omics Experiments
The debate about hypothesis-driven vs data-driven science has made its way into the grant review process. For grant applicants, this can be a difficult minefield to navigate. This is disappointing, because well-crafted and well-integrated omics experiments can contribute significantly to a research proposal. In my experience, both as a member of an NIH study section and an applicant, grant review panels whose traditional purview is mechanisms or physiology/pathophysiology are prone to a dim view of specific aims that are omics-centric. Common criticisms include: 1) lack of a hypothesis, 2) proposed experiments that are too ambitious, and 3) lack of a clear plan for sorting through and testing targets and potential mechanisms. Ironically, proposals that focus on a single gene and lack a global view are often criticized (rightly so, in my opinion) for not including the obvious omics experiment that would broaden the perspective and reveal new molecular details. Thus, it is understandable if applicants sense that grant review panels have a love/hate relationship with omics experiments.
Although each of the criticisms noted above may be valid in specific cases, I think they are too often applied in a knee-jerk manner. For example, even discovery-focused omics experiments have an implicit hypothesis: “the signature of the transcriptome (or proteome, or metabolome) is indicative of the state of the biological system and will reveal the key effectors (RNAs, proteins, metabolites) driving the biological state.” Furthermore, the proliferation of omics technologies has led to the establishment of core facilities and companies that can process samples, generate data, and assist with data analysis, making omics experiments practical, cost effective, and accessible to most researchers (although access to the computational resources needed for detailed analyses is limiting for many scientists, but that is a topic for another editorial). Nonetheless, in a practical and strategic surrender, I have taken to coaching new faculty and postdocs applying for grants or fellowships to downplay their omics experiments in their aims, presenting them as preliminary data (if initial experiments can be completed before submission) or making them more of an afterthought or “confirmation on a global scale.” Of course, none of this is ideal. Rather than furthering the scientific enterprise, it stands in the way. What can be done?
I think there is shared culpability for the issues that I outlined above, and both the grant reviewers and the applicants need to be part of the solution. On the one hand, grant reviewers must get past the aversion to discovery science. Although the philosophical debate may persist, the practical debate is over. Experience from the past 20+ years tells us that omics approaches are an essential part of modern biology. They have allowed us to make discoveries and understand biology in a way that we could not have done without them. In addition, grant reviewers should be willing to accept that even a discovery-focused omics experiment can have a hypothesis. Finally, grant reviewers must make every effort to distinguish between the 3 types of omics experiments described above, weeding out those that are purely descriptive.
On the other hand, applicants must provide a clear justification for the need, as well as an explanation of the benefit, of their proposed omics experiments. Importantly, they must present clear hypotheses for the discovery- and mechanism-focused omics experiments that they propose, as noted above, and avoid descriptive omics experiments. Furthermore, applicants must include 1) a clear plan for analyzing and sorting through the data, 2) a proposal for establishing the priority of targets for follow-up analyses, 3) a description of the expected outcomes, and 4) an explanation of how the outcomes will provide a test of the hypothesis. The applicant is responsible for leading the reviewer through the experiments. I think there is a tendency for applicants to include a particular omics experiment in their proposal, with the assumption that the reviewer will somehow be able to discern the path from A to B to C. For example, “performing RNA-seq” becomes shorthand for everything from sample collection, data generation, and data analysis to the outcomes and their interpretation, without actually elucidating any of the details for the reviewer. Finally, the applicant must present a credible plan for data analysis, including a description of the methods, resources, computational infrastructure, and computational personnel available for the analysis.
I think these common-sense approaches for applicants and reviewers will improve the use and integration of omics experiments into research plans, as well as improve the process of reviewing grant applications that propose to use omics approaches.
Data Resources for the Community: A Role for Journals?
The large datasets generated in omics experiments can be mined again and again, making them great resources for future experiments that were not yet conceived at the time of the initial data generation. In my own lab, we regularly integrate data from our own genomic experiments with others that are publicly available. For example, in one recent study (6), we mined more than 25 different publicly available genomic datasets and integrated them with a few of our own. The publicly available data gave us a much broader and deeper view of the system than we could have discerned from our data alone.
The availability of data generated by other laboratories or consortia, especially the raw data, which can be more flexibly mined, is an important component of the overall omics enterprise. Although many omics datasets are associated with and released as part of publications in science journals, neither the journals nor the authors are equipped for, or well suited to, maintaining repositories of the data for others who wish to mine them. Rather, a number of omics data repositories, many of which are publicly supported (eg, by the NIH), have been established to maintain the data as a resource to the scientific community. These include the National Center for Biotechnology Information (NCBI)'s Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) for genomic data and the Metabolomics Consortium Data Repository and Coordinating Center (DRCC; http://www.metabolomicsworkbench.org/data/DRCCDataDeposit.php) for metabolomic data. More should be done to support, maintain, and expand these repositories. Wisely, most journals have made submission of new omics datasets used in the papers they publish a requirement for acceptance and publication.
Box 1. Three Types of Omics Experiments
1) Discovery-focused
Designed to discover things neither known nor expected. They apply new or existing methodologies to biological systems under conditions or at time points that are most likely to reveal key aspects of the biology.
2) Mechanism-focused (functional)
They are based on clear hypotheses and are designed to test underlying molecular mechanisms on a global scale using perturbation-effect approaches. They reveal the generality of a molecular mechanism, or variations in a mechanism, across the ome in question.
3) Descriptive
They survey the biological system, without leading to significant discovery, and they may appear to address a mechanism without actually doing so. They produce catalogs (lists of genes, transcripts, proteins, or metabolites) whose levels change from condition A to condition B, without revealing how or why.
In addition to broad repositories, such as those noted above, some fields of research have developed their own topical resources that contain (or link to) omics data, as well as other information, relevant to the field. One such resource relevant to many readers of Molecular Endocrinology is the Nuclear Receptor Signaling Atlas (NURSA; https://www.nursa.org/nursa/index.jsf), which organizes and contains links to published datasets in the nuclear receptor field, as well as a tool called “Transcriptomine,” which allows users to mine tissue-specific nuclear receptor signaling pathways based on public transcriptomic datasets. Field-specific resources are even being organized, supported, and maintained by some institutes at the NIH. This includes the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)'s Information Network (dkNET; http://www.dknet.org), which provides access to large pools of data relevant to the mission of the NIDDK. These resources are a useful launching point for exploring available omics data in a particular field.
Box 2. Criteria for Resource Papers in Molecular Endocrinology
Describes the development of a new method or an improved version of an existing method that is of general use to others who might conduct similar studies in the future.
Describes a dataset for a relevant biological system that is superior to any previously published datasets (must meet standards of quality, novelty, and uniqueness).
Presents the data in interesting ways that illustrate the full spectrum of the data and the biological interest.
Includes new or improved methods that lead to new observations or the generation of new hypotheses that were not reported previously.
Includes functional follow-up that tests the underlying hypotheses generated from the data.
Journals have also begun to venture into the realm of omics data resources by publishing “resource” papers that highlight new, interesting, and broadly useful datasets, especially those generated using newly developed methodologies. Omics datasets, however, are not useful to the community if they are limited, highly specific, or of poor quality. Obviously, not every RNA-seq or ChIP-seq dataset is a useful resource. Thus, a rigorous set of criteria must be applied when evaluating resource papers. The editors at Molecular Endocrinology have developed a set of criteria that we use and ask our reviewers to use when evaluating resource papers for publication (see Box 2). Understandably, papers with descriptive omics experiments are not reviewed favorably. We hope that these criteria will help authors as they consider submitting their omics papers for publication as resources.
I think my enthusiasm for the use of omics in modern biology should be evident from this editorial. I am an advocate for the use and integration of omics experiments wherever applicable. However, I also think that omics is a tool that must be wielded deftly for maximum benefit. Continuing dialogs like this one should help us to refine the way omics tools are applied and allow us to perfect this craft.
W. Lee Kraus, PhD
Acknowledgments
The author thanks Gary Hon for helpful comments and feedback on this piece.
The omics research in the author's lab is funded by grants from the National Institutes of Health (NIH)/National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), the Cancer Prevention Research Institute of Texas (CPRIT), and the Cecil H. and Ida Green Center for Reproductive Biology Sciences Endowment. W.L.K. holds the Cecil H. and Ida Green Distinguished Chair in Reproductive Biology Sciences.
Disclosure Summary: The author has nothing to disclose.
References
- 1. Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995;270:467–470.
- 2. Simó C, Cifuentes A, García-Cañas V, eds. Fundamentals of Advanced Omics Technologies: From Genes to Metabolites. 1st ed. Amsterdam, The Netherlands: Elsevier; 2014.
- 3. Mazzocchi F. Could Big Data be the end of theory in science? A few remarks on the epistemology of data-driven science. EMBO Rep. 2015;16:1250–1255. DOI: 10.15252/embr.201541001.
- 4. Brown PO, Botstein D. Exploring the new world of the genome with DNA microarrays. Nat Genet. 1999;21:33–37.
- 5. Allen JF. Bioinformatics and discovery: induction beckons again. Bioessays. 2001;23:104–107.
- 6. Hah N, Murakami S, Nagari A, Danko CG, Kraus WL. Enhancer transcripts mark active estrogen receptor binding sites. Genome Res. 2013;23:1210–1223.