What does p < .05 really mean?

If you’re involved in research, you’ve likely come across the notation ‘p < .05’ in journal articles or elsewhere, or you’re using this notation in your writing to indicate a statistically significant result. It is used pretty much everywhere, almost as a mantra across studies.

But do you know what p < .05 actually means? Are you interpreting it correctly?

Here are a few common interpretations of p < .05 (Spence & Stanley, 2018).

“There is a low probability (less than 5%) that the result was due to chance”

“There is less than a 5% chance that the null hypothesis is true”

“There is a 95% chance of finding the same result in a replication”

“The odds that a result happened due to chance is small – specifically less than 5%”

The problem with these interpretations is that they are all wrong.

What does the expression p < .05 mean?

Because most students hate reading about statistics, I am taking the liberty of presenting the bottom line first.

The bottom line is that a statistically significant result means there is probably an effect: there is probably a difference between the means of your groups, or there is probably a relationship between your variables.

Now let us look at some explanatory notes that may appear beneath this bottom line.

Stay with me, please…

The notes explain that a significant result (p < .05) on its own tells you nothing about how big the probable effect is, how large the probable difference between the group means is, or how strong the probable relationship between your variables is.

All you can safely say from a significant result is that there is probably some nonzero effect, some nonzero difference, or some nonzero relationship.
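To see why significance alone says nothing about size, here is a minimal sketch. It assumes a simple one-sample z test with a known population standard deviation, which is my simplification for illustration and not something the studies you read will necessarily use. The function names and all the numbers are hypothetical.

```python
import math

def z_statistic(mean_difference, population_sd, n):
    """z for a one-sample test when the population SD is assumed known."""
    standard_error = population_sd / math.sqrt(n)
    return mean_difference / standard_error

def two_sided_p(z):
    """Two-sided p value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical numbers: same population SD, very different sample sizes
small_study = {"mean_difference": 6.0, "population_sd": 15.0, "n": 25}
large_study = {"mean_difference": 0.3, "population_sd": 15.0, "n": 10_000}

p_small_study = two_sided_p(z_statistic(**small_study))
p_large_study = two_sided_p(z_statistic(**large_study))

print(f"n = 25:     difference = 6.0, p = {p_small_study:.4f}")
print(f"n = 10000:  difference = 0.3, p = {p_large_study:.4f}")
```

Both made-up studies come out just under .05 with essentially the same p value, even though one mean difference is twenty times larger than the other. The p value cannot tell the two apart.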

So where to from here?

To judge how strong this probable nonzero effect is or how large the mean difference or relationship is, the next step is to calculate the effect size. The effect size indicates the meaningfulness or practical significance of the statistically significant result.
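As a small taste of what calculating an effect size can look like, here is a sketch of one common measure for two groups, Cohen's d (the difference between the group means divided by the pooled standard deviation). It is only one of several effect size measures, the function name is mine, and the scores are made up for illustration.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Made-up scores, for illustration only
treatment = [12, 15, 14, 16, 13, 17, 15, 14]
control = [11, 13, 12, 14, 12, 13, 11, 12]

d = cohens_d(treatment, control)
print(f"Cohen's d = {d:.2f}")
```

Cohen's widely quoted rules of thumb treat d of about 0.2 as small, 0.5 as medium, and 0.8 as large, although what counts as meaningful depends on your field.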

The concept of effect size is very important (and the topic of a future blog post).

Want to know more about the statistics behind p and significance?

Here we go…

The main method for testing your hypothesis or expectation is called null hypothesis significance testing. In this method, we start from the null hypothesis, which assumes that there is no treatment effect in your study, i.e., no difference between your groups or no relationship between your variables. Opposing the null hypothesis is your alternative hypothesis, H1, that there is an effect. In short, the null hypothesis is Ho: There is no effect. The alternative hypothesis is H1: There is an effect. (These hypotheses are usually written using Greek symbols for the population values, such as μ for a population mean.)

Now, think about your research. If you are using quantitative methodology, you have probably selected a sample, ideally a random sample, from a population and calculated its mean, mean difference, or relationship between variables.

Next, consider the following theoretical scenario, assuming that Ho is true (under Ho, there is no difference or no effect).

Hypothetically, you could continue drawing many, many random samples of the same size from your population of interest and calculate the same statistic for each one, ending up with a distribution of sample results under the assumption that Ho is true. You can then work out the proportion of these hypothetical samples that would give a result as extreme as your observed sample result, or more extreme. If that proportion is large, your result is unsurprising under Ho, and you continue assuming that Ho is true (you fail to reject it).

However, say you find that very few, i.e., fewer than 5%, of the many hypothetical samples drawn under the null hypothesis give a result as extreme as the one you found in your study sample, or a more extreme one. In other words, a result like yours would be rare if Ho were true. That makes the assumption that the null hypothesis is true hard to maintain. Because the proportion of hypothetical samples drawn under Ho that match or exceed your finding is so small (p < .05), you would reject Ho.

Therefore, p is the index used to decide whether you have a significant result. It is the proportion of samples drawn from a hypothetical distribution of samples under Ho (i.e., if there is no effect) that would yield a result as extreme as yours or more extreme. Note that p is calculated assuming Ho is true, so it is not the probability that Ho is true.
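If you like to see things rather than take them on faith, the idea of counting hypothetical samples can be tried out with a short simulation. This is a rough sketch under assumptions I have chosen for illustration: a normally distributed population whose mean under Ho is 100 with a standard deviation of 15, a sample of 25 people, and an observed sample mean of 106. None of these numbers come from a real study, and the function name is hypothetical.

```python
import random
import statistics

def simulated_p_value(observed_mean, null_mean, population_sd, n,
                      n_samples=20_000, seed=1):
    """Proportion of samples drawn under Ho whose mean is at least as far
    from the null mean as the observed mean (a two-sided p value)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed_distance = abs(observed_mean - null_mean)
    as_extreme = 0
    for _ in range(n_samples):
        # Draw one hypothetical sample from the population assumed under Ho
        sample = [rng.gauss(null_mean, population_sd) for _ in range(n)]
        if abs(statistics.fmean(sample) - null_mean) >= observed_distance:
            as_extreme += 1
    return as_extreme / n_samples

# Assumed values, for illustration only
p_value = simulated_p_value(observed_mean=106, null_mean=100,
                            population_sd=15, n=25)
print(f"Simulated p = {p_value:.3f}")
```

With these assumed numbers the simulated proportion should land a little under .05, though the exact figure shifts slightly with the seed and the number of simulated samples. In practice, statistical software works out this proportion from a theoretical distribution instead of drawing thousands of samples, but the logic is the same.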

The bottom line once again: What a significant result or rejecting Ho (p < .05) means

As Ho says that there is no effect, the essence of what it means to reject Ho is that you reject that there is no effect. You reject that there is no difference between your groups. You reject that there is no relationship between your variables. Therefore, on the basis of the small proportion (< 5%) of hypothetical samples under Ho that would support your finding, you say that there is probably an effect, a difference between your groups, or a relationship between your variables.

Then, you move on to calculating and interpreting the effect size.

I hope you will join me further on our statistical journey.

Contact me at [email protected] if you need help with your study design, statistical analysis, or any stage of your dissertation.