Biologists are not the best mathematicians in the world. But that’s okay, because many of our experiments do not require complicated mathematical manipulations. However, we do often use statistics, and this is where our training often lets us down. Many university biology courses do not require students to study statistics, so a lot of biologists lack a basic understanding of null-hypothesis testing. For a good primer on statistical tests, including null-hypothesis testing, see Stattrek.
Like all scientists, biologists generate hypotheses, design experiments to test them and then often use null-hypothesis testing to analyse the data. Null-hypothesis testing is the application of a statistical test to determine whether or not there is a significant difference between test conditions. For example, you may have the hypothesis (H1) that 11-year-old boys differ in height from 11-year-old girls. The null hypothesis (H0) for this statement would be that there is no difference in height between 11-year-old boys and girls. You would then measure the heights of a number of 11-year-old boys and girls and compare them with a statistical test (such as a standard Student’s t-test) to find out whether or not there is a significant difference in the mean (average) heights. To do this, scientists usually look at the p-value generated by the t-test. Usually, biologists use a cut-off of p < 0.05: if the test gives a p-value of less than 0.05, the difference between control and test groups is deemed significant and the null hypothesis is rejected. In this case, a p-value below 0.05 would lead us to conclude that there is a significant difference in the heights of 11-year-old boys and girls. Setting the threshold at 0.05 necessarily means that, when the null hypothesis is actually true, we will incorrectly reject it 5% of the time, giving a ‘false-positive’ result. In other words, we have a 1 in 20 chance of incorrectly thinking that there is a significant difference in heights when there is actually no difference. [Note – I don’t know whether or not there is a difference in the heights of 11-year-old boys and girls – feel free to test this!]
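As a concrete (and entirely hypothetical) illustration of the height example, here is a small Python sketch. The ‘heights’ are simulated numbers, not real measurements, and the t-statistic is computed by hand and compared against a tabulated critical value rather than an exact p-value, so that the example needs nothing beyond the standard library.

```python
import random
import statistics

random.seed(42)

# Hypothetical data: simulated heights (cm) for 11-year-old boys and girls.
# The means and spread below are invented for illustration, not real data.
boys = [random.gauss(145, 7) for _ in range(10)]
girls = [random.gauss(146, 7) for _ in range(10)]

def two_sample_t(a, b):
    """Pooled-variance two-sample t-statistic (Student's t-test)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

t = two_sample_t(boys, girls)
# Two-sided 5% critical value for df = 10 + 10 - 2 = 18 (from a t table).
T_CRIT = 2.101
print(f"t = {t:.2f}; significant at p < 0.05: {abs(t) > T_CRIT}")
```

In practice you would use a library routine (e.g. a t-test function in your statistics package) that reports the p-value directly; the point here is only to show what ‘rejecting H0 at p < 0.05’ amounts to.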
A 5% chance of incorrectly rejecting the null hypothesis seems an acceptable level, but I would like to describe one of the ways in which biologists misuse statistical tests and why we should stop doing it. This is something that I’ve thought about for a while, and a recent paper in Psychological Science presents data showing how lazy experimental design can lead to large increases in false-positive results (Simmons et al., Psych Sci 2011). The authors of this paper suggest that it is too easy to publish “statistically significant” data consistent with any hypothesis because of the way that scientists manipulate their experiments.
In this paper, the authors show that in an experiment with a small sample size (n < 10 – as is often the case in biological experiments), performing a statistical test after each additional data collection increases the likelihood of reaching ‘false’ significance (as determined by a p-value < 0.05) to a massive 22%. It is true that findings in some areas of biology (though not epidemiology or clinical medicine) do not wholly rest on the use of statistics, as they do in psychology, but this does call into question the value of some data presented in this way. It is common to hear biologists say, “I’ve done 3 repeats and it’s almost significant, so I’ll just do it a few more times”, meaning that they repeat the experiment once, test for significance and, if it doesn’t reach significance, repeat again – exactly the kind of ‘conditional stopping’ problem that the paper in Psychological Science illustrates. What many researchers don’t realise is that this does not cause a small increase in the likelihood of a false-positive result; it more than quadruples it.
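The ‘conditional stopping’ effect is easy to demonstrate by simulation. The sketch below is my own illustration, not the authors’ code: both groups are drawn from the same distribution, so the null hypothesis is true and any ‘significant’ result is a false positive. One strategy tests once at the final sample size; the other tests after every added observation and stops as soon as p < 0.05. The exact inflation depends on the stopping rule and sample sizes, so the rate here will not match the paper’s 22% exactly, but it comes out well above the nominal 5%.

```python
import random
import statistics

random.seed(0)

# Two-sided 5% critical t values for df = 2n - 2 (n per group), n = 3..10,
# taken from a standard t table.
T_CRIT = {3: 2.776, 4: 2.447, 5: 2.306, 6: 2.228, 7: 2.179,
          8: 2.145, 9: 2.120, 10: 2.101}

def t_stat(a, b):
    """Pooled-variance two-sample t-statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / \
           (pooled * (1 / na + 1 / nb)) ** 0.5

def experiment(peek):
    """One experiment under H0 (both groups from the same distribution).

    With peek=True, run the t-test after every added observation and stop
    as soon as it comes out 'significant' -- the "almost significant, so
    I'll do a few more repeats" strategy. With peek=False, test once at
    the final sample size of n = 10 per group.
    """
    a = [random.gauss(0, 1) for _ in range(3)]
    b = [random.gauss(0, 1) for _ in range(3)]
    while True:
        n = len(a)
        if (peek or n == 10) and abs(t_stat(a, b)) > T_CRIT[n]:
            return True            # false positive: H0 is true by design
        if n == 10:
            return False
        a.append(random.gauss(0, 1))
        b.append(random.gauss(0, 1))

SIMS = 20_000
peeking = sum(experiment(peek=True) for _ in range(SIMS)) / SIMS
honest = sum(experiment(peek=False) for _ in range(SIMS)) / SIMS
print(f"false-positive rate, testing once at n=10:      {honest:.3f}")
print(f"false-positive rate, testing after each repeat: {peeking:.3f}")
```

Testing once keeps the false-positive rate near the nominal 5%; testing after every repeat and stopping on significance multiplies it severalfold, even though every individual test uses the ‘correct’ p < 0.05 threshold.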
Getting a false-positive result 5% of the time is not bad, but unfortunately many researchers, in biology and other disciplines, do not apply the tests properly, which vastly increases the rate of false positives. The field is then misled, and other researchers waste their time trying to repeat the results of ‘false-positive’ experiments. It is also notable that ‘failures to replicate’ or ‘negative data’ are very difficult to publish in most journals, which means that many incorrect results remain unchallenged in the scientific literature. Biological experiments are often time-consuming, but also a bit ‘messy’, as there are many, many variables that can change experimental results. It’s time for researchers in biology either to stop using misleading and misunderstood statistical tests, or to take them seriously and ensure that they are applying appropriate tests in the right way.