Abstract
It is essential that clinicians and researchers remain updated with the findings of current biomedical literature for evidence-based medicine. However, they come across many studies that are not reproducible or are difficult to interpret clinically. The distinction between statistical and clinical significance is one such difficulty that clinicians and researchers face in many instances. In simple terms, the P value tests every assumption about how the data were produced (the model as a whole), and not just the targeted hypothesis it is intended to test (such as a null hypothesis), which has a bearing on how reliable the research results are. Most of the time it is misinterpreted and misused as a measure for judging results as clinically significant. Hence, this review aims to impart knowledge about the P value and its importance in biostatistics and to highlight the difference between statistical and clinical significance for appropriate interpretation of research results.
Keywords: Bias, biostatistics, clinical significance, research design, sample size, statistical significance
Introduction
Currently, in the publish-or-perish era where most research is judged on its statistically significant findings, it is often difficult for young researchers to interpret the findings of a study correctly. The development of high-speed computing and sophisticated statistical software packages has resulted in a significant increase in the use of statistical methods and hypothesis tests reported in the health literature. Unfortunately, the appropriate interpretation of research results from the clinical point of view has not received similar attention.[1] This decades-long imbalance between statistical and clinical significance, and the publication of such results in reputed indexed journals, has led researchers to regard statistically significant results as clinically important ones. It is essential to understand that publication in a reputed indexed journal does not indicate that an appropriate study design or statistical methods were used. This dilemma creates obstacles in young researchers' clinical decision-making and ultimately affects their role in evidence-based practice.[2]
Researchers must realize that a clinical study is valuable and of importance to clinical practice only when its results are appropriately interpreted. Every year, hundreds of studies and clinical trials are conducted to test different hypotheses. These trials depend entirely on appropriate statistical tests to assess whether new therapies or treatment protocols are better in clinical practice than the usual approach or methods. Researchers should therefore understand the importance of both statistical and clinical significance.[3]
From a clinical point of view, a statistically significant difference among groups is not of prime importance in itself. If a well-conducted study shows a difference in treatment options between two groups, it is of prime importance to know whether that difference is of clinical importance or not.[4] Since sample size and measurement variability can easily influence the statistical results, a nonsignificant outcome does not imply that the new therapy or treatment protocol is not clinically useful.[5,6]
Hence, this review aims to impart knowledge about the P value and its importance in biostatistics and to highlight the difference between statistical and clinical significance for appropriate interpretation of research results.
What does P value infer?
In simple terms, the P value tests every assumption about how the data were produced (the model as a whole), not just the targeted hypothesis it is intended to test (such as a null hypothesis).[7]
The P value is the probability that, if every model assumption, including the test hypothesis, were correct, the chosen test statistic would have been at least as large as its observed value.[7]
The most common threshold for the P value in the biomedical literature is 0.05 (or 5%), and most often the P value is distorted into a dichotomous indicator: results are considered "statistically significant" when P falls on or below the cut-off (usually 0.05) and are otherwise declared "nonsignificant".[7]
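To make this concrete, the following sketch (in Python, using hypothetical simulated data and the SciPy library) computes a two-sample t-test P value and then applies the common, but discouraged, dichotomization at the 0.05 threshold; all numbers and variable names are illustrative assumptions, not data from any study cited here.

```python
# A minimal sketch with simulated (hypothetical) data: compute a P value and
# note how dichotomizing it at 0.05 discards information about its magnitude.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
control = rng.normal(loc=120.0, scale=10.0, size=30)  # e.g., systolic BP (mmHg)
treated = rng.normal(loc=115.0, scale=10.0, size=30)

t_stat, p_value = stats.ttest_ind(treated, control)
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")

# The dichotomy below is exactly the practice the text cautions against.
print("statistically significant" if p_value <= 0.05 else "nonsignificant")
```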
Why are “P” values not enough?
According to Ron Wasserstein, the ASA's executive director, the P value was never meant to substitute for scientific reasoning, which is of greater interest. The P value is a number between zero and one that is compared against a threshold; on its own, it cannot establish whether an observed difference between groups is real or meaningful. A well-reasoned and scientifically driven explanation will always remain the basis of reporting scientific outcomes.[8]
On what factors does the “P” value depend?
It should be borne in mind that the P value only represents the extent to which the data are inconsistent or incompatible with a given statistical model (i.e., the null hypothesis). Hence, it only allows the null hypothesis to be rejected or not rejected, rather than focusing on the research hypothesis. From a statistical point of view, it measures the strength of evidence against the null hypothesis.[9]
With advances in biostatistics, it is now clear that the P value can easily be affected by various factors such as sample size, the magnitude of the relationship, and error. Each of these factors can act independently or in combination to distort study findings based on P values.[10]
(1) Effect of error on “P” values
In general, two types of error, that is, systematic and random error, affect the P value.
"Systematic errors," that is, "nonrandom errors" of a certain significant magnitude, distort the research results in a specific direction or can alter the observed association in either direction. This type of error generally occurs when a single examiner takes the measurements, leading to an unintended bias that deviates the results towards his/her expectations, or it may result from modification of the measuring technique. Hence, systematic error is a flaw in the measurement of a variable due to methodological error, leading to underestimation or overestimation of measurements. The extent of systematic error can be determined by re-examining and re-measuring a sufficient subsample of individuals (e.g., 20%, although this is not always applicable) using the same materials and methods. Statistical approaches such as the paired t-test, the intraclass correlation coefficient, and the Bland-Altman method can also help in the detection of systematic errors.[10,11,12]
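As an illustration of how such checks might look in practice, the short Python sketch below simulates repeat measurements with an assumed 0.5 cm systematic offset and applies a paired t-test and a Bland-Altman style bias calculation; the data, offset, and variable names are hypothetical assumptions for demonstration only.

```python
# Minimal sketch with simulated repeat measurements: a paired t-test and
# Bland-Altman bias/limits of agreement to flag a systematic error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
true_value = rng.normal(25.0, 3.0, size=20)                      # e.g., MUAC in cm
first_exam = true_value + rng.normal(0.0, 0.3, size=20)          # random error only
second_exam = true_value + 0.5 + rng.normal(0.0, 0.3, size=20)   # +0.5 cm systematic bias

# Paired t-test: a small P value suggests a systematic difference between exams.
t_stat, p_value = stats.ttest_rel(first_exam, second_exam)

# Bland-Altman: mean difference (bias) and approximate 95% limits of agreement.
diff = second_exam - first_exam
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)
print(f"paired t-test P = {p_value:.4f}")
print(f"bias = {bias:.2f} cm, limits of agreement = {bias - loa:.2f} to {bias + loa:.2f} cm")
```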
A "random error" is defined as variability in the data that cannot be explained. Random errors of high magnitude make measurements difficult to reproduce, which calls into question both the methodology and the examiner's ability. Such error occurs randomly across the population, ultimately distorting the results. Consider the example of measuring the mid-upper arm circumference (MUAC) of a population. While measuring the MUAC of each individual, random error may show itself as values that are slightly less or more than the actual measurement. This may result from how the tape was held, the position at which the measurement was taken (ideally midway between the olecranon process and the acromion), and which researcher took the measurement. Random error can be reduced by including a large number of samples or measurements; that is, the more study participants are included, the smaller the effect of random error becomes.[10]
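A small simulation can illustrate this point. The Python sketch below (with hypothetical MUAC values and error sizes assumed only for demonstration) shows how the standard error of the mean shrinks as the number of measurements grows.

```python
# Minimal sketch with simulated MUAC data: the influence of random measurement
# error on the estimated mean shrinks as the number of measurements grows.
import numpy as np

rng = np.random.default_rng(seed=7)
true_muac = 26.0  # cm; assumed true value for illustration

for n in (10, 100, 1000):
    measurements = true_muac + rng.normal(0.0, 0.5, size=n)  # 0.5 cm random error
    sem = measurements.std(ddof=1) / np.sqrt(n)              # standard error of the mean
    print(f"n = {n:4d}: mean = {measurements.mean():.2f} cm, standard error = {sem:.3f} cm")
```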
(2) Effect of sample size on “P” values
It is well known that the P value depends on the sample size to a large extent. The larger the sample size, the smaller the variability of the measurement or data and the more precise the measurement for the study population. With an increase in sample size, the magnitude of random error decreases, and the study is more likely to find a significant relationship if one exists.[10]
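The sketch below (simulated, hypothetical data) makes the same point numerically: an identical underlying difference of five units between two groups yields very different P values as the sample size per group increases.

```python
# Minimal sketch with simulated data: the same true group difference gives
# different P values at different sample sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)

for n in (10, 50, 500):
    group_a = rng.normal(100.0, 15.0, size=n)
    group_b = rng.normal(105.0, 15.0, size=n)  # true difference of 5 units in every case
    _, p_value = stats.ttest_ind(group_a, group_b)
    print(f"n per group = {n:3d}: P = {p_value:.4f}")
```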
(3) Effect of magnitude of relationship between groups on “P” values
The P value also depends on the magnitude of the difference or relationship between the groups compared. In simple terms, the more substantial the difference between two groups, the easier it is to detect and the smaller the resulting P value.[10]
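The following sketch (again with simulated, hypothetical data) illustrates this: at a fixed sample size, progressively larger true differences between groups give progressively smaller P values.

```python
# Minimal sketch with simulated data: at a fixed sample size, a larger
# difference between groups produces a smaller P value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=11)
n = 30  # participants per group

for true_diff in (2.0, 5.0, 15.0):
    control = rng.normal(100.0, 15.0, size=n)
    treated = rng.normal(100.0 + true_diff, 15.0, size=n)
    _, p_value = stats.ttest_ind(treated, control)
    print(f"true difference = {true_diff:4.1f}: P = {p_value:.4f}")
```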
What are the American Statistical Association (ASA) principal statements on statistical significance and P values?
On 8 March 2016, in view of the growing concern over misuse and misinterpretation of P values, the ASA issued six principal statements to improve the conduct and interpretation of quantitative research and to increase research reproducibility. The six statements regarding statistical significance and the P value are as follows:
(1) A P value indicates how incompatible the data are with a specified statistical model.[8]
(2) A P value is neither a measure of the probability that the studied hypothesis is true nor the probability that the study data were produced by random chance alone.[8]
(3) Scientific conclusions and business or policy decisions should not be based only on whether a P value passes a specific threshold.[8]
(4) Authors and researchers have a duty to report research or experimental findings fully and transparently.[8]
(5) A P value represents neither the importance of research results nor the effect size of the study.[8]
(6) By itself, a P value does not provide a good measure of evidence regarding a model or hypothesis.[8]
What are clinically significant outcomes?
The term "clinically significant" can be used for research in which clinically relevant results or outcomes are used to assess the effectiveness or efficacy of a treatment modality. In this sense, clinically significant findings are those that improve the patient's quality of life and help him/her feel and function better.[13]
Clinically significant findings are those that improve medical care, resulting in improvement of an individual's physical function, mental status, and ability to engage in social life. Improvement of quality of life in medical care has both objective and subjective components. Objective improvement includes better performance status, longer duration of disease remission, and prolongation of life span, while subjective improvement in quality of life includes improved mood, attitude, physical and social activity, a feeling of general wellbeing, and the alleviation of distressing symptoms such as pain, weakness, and discomfort.
Statistically significant results do not necessarily mean that the findings are clinically relevant or lead to improvement in individuals' quality of life; many outcomes can be statistically significant without being clinically relevant. Hence, clinicians and researchers should give importance to both statistical and clinical significance.[13]
A clinically relevant intervention is one whose benefits outweigh the associated costs, harms, and inconveniences for the individuals for whom it is intended. The main difference between statistical and clinical significance is that clinical significance considers whether the difference between two groups or two treatment modalities matters for patients, while statistical significance indicates only whether there is mathematical significance in the analysis of the results.
Different studies can have similar statistical significance but differ greatly in clinical significance. Consider an example of two different chemotherapy agents for cancer. The first study estimates that Drug A (less expensive than the usual chemotherapeutic agents) increases the survival of treated patients by five years (P = 0.01, alpha = 0.05), while a second study estimates that Drug B (more expensive than the usual chemotherapeutic agents) increases the survival of treated patients by a mere five months (P = 0.01, alpha = 0.05). In both cases the statistical test is significant, but Drug B increases survival by only five months, which, compared with Drug A's five-year gain, is neither clinically significant nor useful in terms of cost-effectiveness and superiority over already available chemotherapeutic agents.[14,15]
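A rough numerical sketch of this scenario is given below; the survival times, standard deviations, and sample sizes are invented assumptions chosen only so that both comparisons cross the significance threshold while the size of the survival gain differs greatly.

```python
# Minimal sketch with hypothetical survival data: two trials can both be
# "statistically significant" yet differ greatly in clinical importance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=5)
n = 200  # patients per arm (assumed)

usual_care = rng.normal(24.0, 20.0, size=n)     # survival in months
drug_a = rng.normal(24.0 + 60.0, 20.0, size=n)  # roughly a five-year gain
drug_b = rng.normal(24.0 + 5.0, 20.0, size=n)   # roughly a five-month gain

for name, arm in (("Drug A", drug_a), ("Drug B", drug_b)):
    _, p_value = stats.ttest_ind(arm, usual_care)
    gain = arm.mean() - usual_care.mean()
    print(f"{name}: mean survival gain = {gain:5.1f} months, P = {p_value:.4g}")
```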
Conclusion
From the above description of statistically significant and clinically significant results, it is clear that both notions have an importance of their own. Statistically significant results may not be of clinical importance and, vice versa, results that are of clinical importance may not be statistically significant. It is high time that researchers, journal editors, and readers took a keen interest in looking beyond the threshold P value and also considered the results from a clinical point of view, rather than assessing the worth of research by P values alone. Researchers should also take into account the design, sample size, effect size, incorporated bias, and reproducibility of a study while analyzing its results. An aware researcher with a logical and critical mind is in the best position to evaluate research results and thereby apply them to practice evidence-based medicine. Discussion of clinically significant research results will improve understanding of new treatment modalities and will help in the upliftment of evidence-based practice.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References
- 1. Arnold LD, Braganza M, Salih R, Colditz GA. Statistical trends in the Journal of the American Medical Association and implications for training across the continuum of medical education. PLoS One. 2013;8:e77301. doi: 10.1371/journal.pone.0077301.
- 2. Grabowski B. "P<0.05" might not mean what you think: American Statistical Association clarifies P values. J Natl Cancer Inst. 2016;108:djw194. doi: 10.1093/jnci/djw194.
- 3. Page P. Beyond statistical significance: Clinical interpretation of rehabilitation research literature. Int J Sports Phys Ther. 2014;9:726–36.
- 4. Bhandari M, Joensson A. Part IIC: Understanding treatment effects. In: Clinical Research for Surgeons. New York, NY: Thieme; 2009. pp. 139–44.
- 5. Sullivan GM, Feinn R. Using effect size-or why the P value is not enough. J Grad Med Educ. 2012;4:279–82. doi: 10.4300/JGME-D-12-00156.1.
- 6. Batterham AM, Hopkins WG. Making meaningful inferences about magnitudes. Int J Sports Physiol Perform. 2006;1:50–7.
- 7. Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, et al. Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. Eur J Epidemiol. 2016;31:337–50. doi: 10.1007/s10654-016-0149-3.
- 8. P-values under question. American Psychological Association. [Last accessed 30 December 2020]. Available from: https://www.apa.org/science/about/psa/2016/03/p-values
- 9. Nahm FS. What the P values really tell us. Korean J Pain. 2017;30:241–2. doi: 10.3344/kjp.2017.30.4.241.
- 10. Thiese MS, Ronna B, Ott U. P value interpretations and considerations. J Thorac Dis. 2016;8:E928–31. doi: 10.21037/jtd.2016.08.16.
- 11. Houston WJ. The analysis of errors in orthodontic measurements. Am J Orthod. 1983;83:382–90. doi: 10.1016/0002-9416(83)90322-6.
- 12. Cançado RH, Lauris JR. Error of the method: What is it for? Dental Press J Orthod. 2014;19:25–6. doi: 10.1590/2176-9451.19.2.025-026.ebo.
- 13. Armijo-Olivo S. The importance of determining the clinical significance of research results in physical therapy clinical research. Braz J Phys Ther. 2018;22:175–6. doi: 10.1016/j.bjpt.2018.02.001.
- 14. Tenny S, Abdelgawad I. Statistical significance. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2019. [Updated 2019 May 13]. Available from: https://www.ncbi.nlm.nih.gov/books/NBK459346/
- 15. Man-Son-Hing M, Laupacis A, O'Rourke K, Molnar FJ, Mahon J, Chan KB, et al. Determination of the clinical importance of study results. J Gen Intern Med. 2002;17:469–76. doi: 10.1046/j.1525-1497.2002.11111.x.