The American Statistical Association (ASA) has recently issued an official statement warning against the misuse of p values in science.1 Misuse of statistics has been discussed for many decades without much improvement. We have not been wise enough to eliminate this problem. In the ASA’s statement,1 a quote from Dr. Jeff Leek aptly describes the current chaotic situation: “the vast majority of data analysis is not performed by people properly trained to perform data analysis”. By analogy, as pathologists or laboratory physicians, we should ask ourselves, “Would you let non-professionals make pathological diagnoses?”
The statement1 begins with an intriguing question and answer: “Q: Why do so many people still use p = 0.05?”, “A: Because that’s what they were taught in college or grad school”. This closely parallels the historical misfortune of the “QWERTY” keyboard, which has remained dominant to this day despite its inefficient arrangement (Paul David, 1985). The explanation is simply “because that’s what was taught and everyone is using it”.
We should be skeptical that any p value (such as 0.05) can serve as a threshold for claiming scientific significance. A substantial challenge, evident in the literature, is that the vast majority of researchers who conduct studies have received inadequate training in study design, data analysis, and informatics. Thus, researchers appear to have relied on a threshold such as p = 0.05 for their scientific judgments, without considering the context of the research design and analyses. However, to interpret a given p value appropriately, a thorough understanding of not only statistics but also the particular study design and data analysis is essential. By analogy, we must undergo appropriate training in Anatomic Pathology or Clinical Pathology to interpret histopathologic features or laboratory assays, respectively.
The p value of 0.05 was arbitrarily suggested as a threshold at a time when we had neither big datasets nor computers. Now it is easy for us to test 10 or 1000 hypotheses with just a few mouse clicks. The numbers of analyses, papers and journals are all skyrocketing. When many hypotheses are tested, some p values will fall below 0.05 by chance alone even if no true effect exists. One can then select and report the most significant findings without describing all analyses conducted (i.e., selective reporting). As a result, we are facing a large number of non-reproducible findings, many of them produced by researchers without adequate knowledge and skills in study design and statistics.
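As a rough illustration of why testing many hypotheses undermines a fixed 0.05 threshold, the sketch below assumes m independent tests in which every null hypothesis is true, so that each p value is uniformly distributed between 0 and 1. Under that assumption, the probability that at least one p value falls below α is 1 - (1 - α)^m. The function names are hypothetical, chosen for illustration only, and the simulation simply checks the formula under the same idealized assumptions; real analyses are rarely fully independent.

```python
import random


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one p < alpha among num_tests independent
    tests when every null hypothesis is true (idealized assumption)."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_min_p_below(num_tests: int, alpha: float = 0.05,
                         num_studies: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated 'studies' whose smallest null p value is
    below alpha, i.e., how often selective reporting of the best result
    would yield a 'significant' finding although no effect exists.
    Null p values are drawn as uniform random numbers in [0, 1)."""
    rng = random.Random(seed)  # fixed seed for a reproducible illustration
    hits = 0
    for _ in range(num_studies):
        smallest_p = min(rng.random() for _ in range(num_tests))
        if smallest_p < alpha:
            hits += 1
    return hits / num_studies


if __name__ == "__main__":
    for m in (1, 10, 1000):
        analytic = familywise_error_rate(m)
        simulated = simulate_min_p_below(m)
        print(f"{m:>5} tests: analytic {analytic:.3f}, simulated {simulated:.3f}")
```

Under these assumptions the analytic probability is 0.05 for a single test, about 0.40 for 10 tests, and essentially 1 for 1000 tests: reporting only the smallest p value from many analyses almost guarantees a "significant" result.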
It is very clear that all researchers who conduct studies must receive adequate training in study design and data analysis. This is one reason why pathology and laboratory medicine should become integrated with statistics and health data science (i.e., epidemiology).2 Molecular pathological epidemiology has been growing as a single unified field,2 with the promise that all pathologists and laboratory professionals will be well trained in statistics and data science in the future.
Although it may seem nearly impossible, we should set the goal of having all biological, medical and public health researchers well-versed in data science in the near future, as we pursue genomics and precision medicine. We believe that this issue needs to be addressed by the entire research community.
Acknowledgments
Funding support: This work was supported by U.S. National Institutes of Health grants R35 CA197735 (to SO) and K07 CA190673 (to RN).
We thank Dr. Robert B. Wilson for helpful comments.
Footnotes
We declare no conflict of interest.
References
- 1. Wasserstein RL, Lazar NA. The ASA’s statement on p-values: context, process, and purpose. The American Statistician. 2016; in press (published online).
- 2. Ogino S, Nishihara R, VanderWeele TJ, et al. The role of molecular pathological epidemiology in the study of neoplastic and non-neoplastic diseases in the era of precision medicine. Epidemiology. 2016; in press (published online). doi: 10.1097/EDE.0000000000000471.