The era of Precision Medicine (PM) has entered its execution phase, in which the acquisition of human genome variation data is already leading the charge [1], [2]. PM has come a long way to this phase from the Human Genome Project, started some 25 years ago [3], [4]; it marks the beginning of disease-centric research in the biomedical sciences. Other than delivering high-coverage human genomes by the millions (tailored, of course, to patients and common diseases) in the next decade or so, what else should we expect, and what should we be doing to make the most of PM projects beyond the current expectations of genetics?
Believe it or not, the PM bandwagon still largely flies banners promoting large-scale, data-driven (or discovery-driven) approaches, such as in-depth discovery of cancer-causing mutations, drug targets, and biomarkers, which may be coupled with hypotheses to balance the currently debated environment-centric vs. mutation-centric arguments. Nevertheless, there is still time to consider how to approach disease-centric PM tasks, which largely focus on common diseases that are clear in priority and complex in nature, including but not limited to cancers, as well as cardiovascular, infectious, metabolic, and neurological diseases or disorders. Beyond disease-oriented objectives, we should think harder and more deeply about other useful data and well-designed tasks in addition to genome sequences, as well as a new synthesis of how to acquire novel data and interpret them wisely [5], [6], [7].
First of all, once we have an enormous amount of high-quality sequences, understanding human population structures and defining haplotypes within and between populations, as well as their disease relevance, are of the essence. As this venture is largely informational, population-based sequence variation databases [8] are long awaited, since raw data accumulation may exceed current computation and storage capacities. An integrated database that hosts all sequence variations and functional annotations is highly desirable. In addition, mutation biases can be used to define function-selected sequence elements beyond protein-coding sequences [9], [10]. For instance, when human genes are partitioned into house-keeping and tissue-specific genes, the mutation rate of tissue-specific genes appears 33% higher than that of house-keeping genes [11], [12]. This observation suggests that germline-associated genes and chromosome organization hold the keys to a variable mutation rate. Another example is that the sizes of universal introns (size-invariable within lineages) are found, based on population data, to be functionally selected toward size optima, albeit in a lineage-associated manner [9], [13]. After all, mutations are created neither randomly nor equally in an operational sense across DNA sequences and their carriers, the chromosomes, let alone selection, which is largely attributable to function or phenotype and poorly defined in structural terms of genes and intergenic sequences [14].
Second, studying cellular gene expression and regulation with precision presents another challenge. As the cell is a compartmental unit of life, cellular heterogeneity provides functional diversity, even among asexually dividing bacteria. Although transcriptomics is well respected as a critical paradigm for studying gene expression tailored to cells [15], not a single standard human transcriptome has yet been produced, claimed, validated, and hosted in an authoritative database. The difficulties are enormous at present [16]. We are not yet able to sequence RNA directly at a resolution of copies per cell or to define chemically modified RNA sequences quantitatively at single-molecule resolution [17], [18], [19]. Nor are we yet able to separate cellular RNAs into appropriate classes: ribosomal vs. total, messenger vs. non-coding, small vs. large, etc. All of this calls for a genome-wide, large-scale, worldwide project: the Human Transcriptomes Project. This project will certainly come after the sequencing effort in the early phase of the PM project, as a sequel to it, or maybe even sooner, as current sequencing capacity and tasks will have to be redirected once the genome sequencing effort reaches its peak.
Third, defining DNA structural elements and gene organization as landmarks of chromosomes is undoubtedly a major endeavor. The ENCODE project has been paving the way for a thorough definition of operational DNA elements for each cell type and tissue (http://www.genome.gov/encode/). Quite a few questions along this line remain unanswered. For instance, most human genes are organized into clusters, but circadian-regulated genes are cluster-avoiding. How are they synchronized in expression and organized into chromosome territories [14]? How are transcript-rich cells, such as testis, brain, and stem cells, organized to express most of their genes [20], [21]? How are chromosomes organized, inherited, and regulated in precise, step-wise changes in germline cells to ensure body development from a zygote? Obviously, a human chromosome-based gene organization map becomes important; it may include experimental data and information such as sites of chemical modifications, gene clustering and regulation (such as antisense transcription-based regulation), nucleosome occupancy (density vs. expression levels), non-transcribed regulatory elements, and organizer-anchorage sequences. We still have a long way to go toward dynamic three-dimensional modeling of human chromosomes during development and differentiation.
Fourth, other than informational and operational (structural and interactive functions) systems, the rules and nature of various homeostatic processes are also critical and unique, including the generation and control of energy, material, and signal transduction. The leptin-adipocyte signal control system is an excellent example; the discovery of the obese (ob) gene and its mutation in the mouse has not yet led to an ultimate cure for obesity [22]. In this particular regime, cellular processes that cut across physiological systems remain to be deciphered, and large PM projects to categorize components and metabolites in circulating and excreted body fluids are to be expected. To measure everything with precision, novel assays and instrumentation are both essential.
Fifth, some PM projects have to be longitudinal, as life is after all governed by time. In the dimension of time, many mysteries remain to be solved; in addition to normal development and aging, there are more than enough conditions to be alleviated and maybe even cured, including menopausal syndrome, Alzheimer’s disease, osteoporosis, osteoarthritis, and diabetes, just to name a few. In this regime, plasticity, or cellular responsiveness to stress signals and materials, takes center stage, and both degree and timing need to be measured with precision. Whether the relevant PM projects are named exposomes, stressomes, or plastisomes may not be important, but time-lapse records and measurements are the key. The connectome projects for the neurology of several model organisms, together with the Human Connectome Project, have been pioneering work on cognitive plasticity (http://www.humanconnectomeproject.org/). The time has also come for similar projects on the lymphatic system.
It is clear that the stratification of biology into distinct systems is as important as that of diseases, as we are increasingly capable of exploring new territories of research [23], [24]. The fields of genetics, epigenetics, and environment have not provided enough conceptual freedom to allow precise description of the genotype–phenotype relationship, where complexity, phenotypic plasticity, and cellular heterogeneity are frequently used concepts. On the one hand, multi-track biology takes a divide-and-conquer approach to define useful data for understanding disease mechanisms at the molecular level. On the other hand, multi-track biology also takes a systematic approach to integrate data into an information commons that can be synthesized into knowledge of physiological systems as well as of diseases, the pathological states of those systems. The medicine side of the PM projects is also disease-centric, and its success largely depends on the organization of cohorts; we also expect the data from this side to merge into the same information commons, where mechanisms of diseases are deciphered and strategies are designed to make humans healthier than ever.
Competing interests
The author declares that there are no competing interests.
Acknowledgments
This work was supported by the “Strategic Priority Research Program” of the Chinese Academy of Sciences (Grant No. XDA08010304) awarded to JY.
Handled by Hongxing Lei
Footnotes
Peer review under responsibility of Beijing Institute of Genomics, Chinese Academy of Sciences and Genetics Society of China.
References
- 1. National Research Council (US) Committee on A Framework for Developing a New Taxonomy of Disease. National Academies Press (US); 2011. Toward precision medicine: building a knowledge network for biomedical research and a new taxonomy of disease.
- 2. Holst L. 2015. The precision medicine initiative: data-driven treatments as unique as your own body. https://www.whitehouse.gov/blog/2015/01/30/precision-medicine-initiative-data-driven-treatments-unique-your-own-body.
- 3. National Research Council (US) Committee on Mapping and Sequencing the Human Genome. National Academies Press (US); 1988. Mapping and sequencing the human genome.
- 4. DeLisi C. The Human Genome Project: the ambitious proposal to map and decipher the complete sequence of human DNA. Am Sci. 1988;76:488–493.
- 5. Yu J., Wong G.K. Genome biology: the second modern synthesis. Genomics Proteomics Bioinformatics. 2005;3:3–4. doi: 10.1016/S1672-0229(05)03002-0.
- 6. Yu J. Challenges to the common dogma. Genomics Proteomics Bioinformatics. 2012;10:55–57. doi: 10.1016/j.gpb.2012.05.003.
- 7. Yu J. Life on two tracks. Genomics Proteomics Bioinformatics. 2012;10:123–126. doi: 10.1016/j.gpb.2012.06.001.
- 8. Ling Y., Jin Z., Su M., Zhong J., Zhao Y., Yu J. VCGDB: a dynamic genome database of Chinese population. BMC Genomics. 2014;15:265. doi: 10.1186/1471-2164-15-265.
- 9. Yu J., Yang Z., Kibukawa M., Paddock M., Passey D.A., Wong G.K. Minimal introns are not “junk”. Genome Res. 2002;12:1185–1189. doi: 10.1101/gr.224602.
- 10. Yang L., Yu J. A comparative analysis of divergently-paired genes (DPGs) of Drosophila and vertebrate genomes. BMC Evol Biol. 2009;9:55. doi: 10.1186/1471-2148-9-55.
- 11. Cui P., Ding F., Lin Q., Zhang L., Li A., Zhang Z. Distinct contributions of replication and transcription to mutation rate variation of human genomes. Genomics Proteomics Bioinformatics. 2012;10:4–10. doi: 10.1016/S1672-0229(11)60028-4.
- 12. Cui P., Lin Q., Ding F., Hu S., Yu J. The transcript-centric mutations in human genomes. Genomics Proteomics Bioinformatics. 2012;10:11–22. doi: 10.1016/S1672-0229(11)60029-6.
- 13. Wang D., Yu J. Both size and GC-content of minimal introns are selected in human population. PLoS One. 2011;6:e17945. doi: 10.1371/journal.pone.0017945.
- 14. Wu G., Zhu J., He F., Wang W., Hu S., Yu J. Gene and genome parameters of mammalian liver circadian genes (LCG). PLoS One. 2012;7:e46961. doi: 10.1371/journal.pone.0046961.
- 15. Wu J., Xiao J., Zhang Z., Wang X., Hu S., Yu J. Ribogenomics: the science and knowledge of RNA. Genomics Proteomics Bioinformatics. 2014;12:57–63. doi: 10.1016/j.gpb.2014.04.002.
- 16. Yu J. The human transcriptomes project: is it hard? Next Gener Sequenc Appl. 2015;2:e104.
- 17. Marinov G.K., Williams B.A., McCue K., Schroth G.P., Gertz J., Myers R.M. From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing. Genome Res. 2014;24:496–510. doi: 10.1101/gr.161034.113.
- 18. Macaulay I.C., Voet T. Single cell genomics: advances and future perspectives. PLoS Genet. 2014;10:e1004126. doi: 10.1371/journal.pgen.1004126.
- 19. Ozsolak F., Milos P.M. Single-molecule direct RNA sequencing without cDNA synthesis. Wiley Interdiscip Rev RNA. 2011;2:565–570. doi: 10.1002/wrna.84.
- 20. Cui P., Liu W., Zhao Y., Lin Q., Zhang D., Ding F. Comparative analyses of H3K4 and H3K27 trimethylations between the mouse cerebrum and testis. Genomics Proteomics Bioinformatics. 2012;10:82–93. doi: 10.1016/j.gpb.2012.05.007.
- 21. Cui P., Liu W., Zhao Y., Lin Q., Ding F., Xin C. The association between H3K4me3 and antisense transcription. Genomics Proteomics Bioinformatics. 2012;10:74–81. doi: 10.1016/j.gpb.2012.05.001.
- 22. Li M.D. Leptin and beyond: an Odyssey to the central control of body weight. Yale J Biol Med. 2011;84:1–7.
- 23. Niu Y., Zhao X., Wu Y.S., Li M.M., Wang X.J., Yang Y.G. N6-methyl-adenosine (m6A) in RNA: an old modification with a novel epigenetic function. Genomics Proteomics Bioinformatics. 2013;11:8–17. doi: 10.1016/j.gpb.2012.12.002.
- 24. Dominissini D., He C., Rechavi G. RNA epigenetics: DNA isn’t the only decorated nucleic acid in the cell. Scientist. 2016. http://www.the-scientist.com/?articles.view/articleNo/44873/title/RNA-Epigenetics/
