How we use statistical inference is a critical piece in the evaluation of new information in the biomedical literature.
Many researchers today believe that a “statistically significant” p-value is the primary justification for publication of their findings. In the first two weeks of March 2016, two articles on the use of the p-value were published. The first, highlighted in the British journal Nature, summarized a statement released by the American Statistical Association on the misuse of the p-value [i]. The ASA issued a strongly worded statement cautioning against overreliance on a “statistically significant” p-value to drive major changes in concepts or public policy. The second article, published in the American journal JAMA, discussed how use of the p-value has increased in the medical literature over the last quarter of a century [ii]. Both articles caution that a p-value, by itself, is a potentially misleading basis for a change in scientific thinking. Indeed, a study whose results are “statistically significant”, as evidenced by the p-value, often cannot be reproduced.
A p-value is supposed to tell us whether an observed difference in ratios or other numbers is potentially related to chance or may be “real”, allowing one to reject the “null hypothesis” that the numbers are similar. It is easy to find a computer program that will do the arithmetic to calculate a mean and standard deviation, from which a p-value can be calculated. However, one would be well served to remember that classical statistics depend on “normally distributed” values, and that the sample must not be biased. If the collection of the sample to be studied is biased, then the “statistical significance” will also be biased. Finally, we must be aware of the “pragmatic significance” of a difference in means or ratios. Who cares whether the average height of a class of 11-year-old children is 4 ft 3.5 in or 4 ft 3.25 in, even if the p-value is less than 0.001? This would be a case of a highly statistically significant, but pragmatically unimportant, difference.
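The height example can be made concrete with a short simulation. The sketch below is an illustration of the general point, not taken from either article; the class sizes, mean heights, and standard deviation are all assumed for the example. It compares two very large simulated classes whose true average heights differ by only a quarter of an inch, using a standard large-sample test of the difference in means:

```python
import math
import random

def two_sample_p(a, b):
    """Two-sided p-value for a difference in means (Welch z-test,
    normal approximation -- reasonable for samples this large)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se = math.sqrt(va / na + vb / nb)
    z = (ma - mb) / se
    return ma - mb, math.erfc(abs(z) / math.sqrt(2))

random.seed(0)
# Hypothetical heights in inches of two large classes of 11-year-olds:
# true means 51.50 vs 51.25 (a quarter inch apart), SD 2 inches.
class_a = [random.gauss(51.50, 2.0) for _ in range(100_000)]
class_b = [random.gauss(51.25, 2.0) for _ in range(100_000)]

diff, p = two_sample_p(class_a, class_b)
print(f"difference in means: {diff:.2f} in, p = {p:.2e}")
```

With enough observations, even a trivial quarter-inch difference produces a vanishingly small p-value: highly “statistically significant”, yet pragmatically meaningless.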
A particularly inopportune use of statistical inference is when the sample sizes are very small. In this instance it is difficult to verify that the samples are normally distributed, and they have a high likelihood of being biased. Here any p-value can potentially lead us down a garden path to nowhere.
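A brief simulation can illustrate the small-sample hazard. In the hypothetical sketch below (not from the cited articles; the group size, number of trials, and the naive normal-approximation test are all assumptions chosen for illustration), both groups are drawn from the same population, so every “significant” result is a false positive. A shortcut test that behaves well for large samples flags noticeably more than the nominal 5% of these null experiments:

```python
import math
import random

def naive_p(a, b):
    """Naive two-sided p-value using a normal approximation --
    the kind of shortcut that misleads when samples are tiny
    (a proper t-test would use heavier-tailed critical values)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(1)
trials = 2000
false_positives = 0
for _ in range(trials):
    # Both "groups" come from the SAME population: any difference is chance.
    a = [random.gauss(100, 15) for _ in range(5)]
    b = [random.gauss(100, 15) for _ in range(5)]
    if naive_p(a, b) < 0.05:
        false_positives += 1

rate = false_positives / trials
print(f"nominal 5% test flags {rate:.1%} of null experiments as 'significant'")
```

With only five observations per group, the “significant” findings here are all artifacts of chance and a test applied outside its comfort zone, which is one way a small study ends up on the garden path.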
The increasing volume of research papers (from just over 400,000 in 1990 to almost 1,200,000 in 2014, nearly a trebling over 25 years) means that many papers depend on “statistical significance” to get published. The exact proportion of irreproducible results among these millions of papers, unfortunately, isn’t clear.
One message that has been proposed is that, before we change our perspective, the first paper to show a difference or correlation needs independent confirmation. Perhaps the second paper is more important than the first. In addition, we might be well served to ask, “So what? Is this a meaningful result, regardless of the p-value?”
[ii] Chavalarias D, et al. Evolution of Reporting P Values in the Biomedical Literature, 1990-2015. JAMA. 2016;315:1141-1148.