The p-value is an important concept in statistics because it has one solid advantage over the critical value method: you do not have to state the significance level before conducting the hypothesis test. It also makes it easier to see intuitively whether or not you are going to reject the null hypothesis. In this segment, we will work through a very typical problem: testing whether a coin is fair using the p-value.
Recall the definition of the p-value: it is the probability of observing a result that is as extreme as, or more extreme than, the one actually observed, given that the null hypothesis is true.
Let's look at this definition more closely. You may not have noticed, but it allows us to conduct hypothesis tests on distributions that are not normal. (In fact, hypothesis testing in general can be done on non-normal distributions. However, given the concepts that you learnt in the previous sessions, only the p-value method is within the scope of what we can discuss here.)
This method is best explained using an example. This is a very common type of question asked in interviews.
Demonstration
Suppose you toss a coin without knowing whether it is biased or unbiased. After tossing it 10 times, you observe 8 heads and 2 tails. You are asked to test the hypothesis that the coin is unbiased: calculate the p-value and make a decision at a 0.05 significance level.
Now, the solution methodology for this case may not seem straightforward at first glance, but as a matter of fact, it is quite neat and intuitive.
First, as we always do while conducting a hypothesis test, let's define the null and the alternative hypotheses.
So, what would the null hypothesis be in this case?
Well, according to the question, the null hypothesis of this test is that the coin is unbiased, i.e., P(H) = P(T) = 0.5.
The alternative hypothesis would be P(H) ≠ 0.5, or equivalently P(T) ≠ 0.5. Both statements describe the same situation, since P(T) = 1 − P(H).
Now, let's formally create the null and alternative hypotheses.
H0 : P(H) = 0.5
H1 : P(H) ≠ 0.5
(You can also use P(T) to denote the null and alternative hypotheses in the case above.)
Now, as stated in the problem, we have observed 8 heads.
Recall what the p-value definition states: it is the probability of observing a result that is as extreme as, or more extreme than, the one actually observed, given that the null hypothesis is true.
Let's use this definition in our solution methodology to get the answer.
The solution methodology using the definition of p-value would look somewhat like this:
Solution methodology
1. Assume the null hypothesis to be true, i.e., P(H) = 0.5.
2. Identify the similar or more extreme observations. Here, they are denoted by (Heads ≥ 8).
3. Calculate P(Heads ≥ 8), given that P(H) = 0.5.
4. Observe that the hypothesis test is two-tailed. Hence, multiply the previous probability by 2. This is the p-value of the test.
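The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a general-purpose test: the function names `upper_tail` and `doubled_tail_p_value` are made up for this example, and the doubling in the last step assumes that the null value is 0.5 and that the observed number of heads is at least half the number of tosses.

```python
from math import comb


def upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def doubled_tail_p_value(n: int, heads: int) -> float:
    """Two-tailed p-value for H0: P(H) = 0.5, following the four steps.

    Assumes heads >= n / 2, so "more extreme" means "even more heads".
    """
    p_null = 0.5                         # Step 1: assume H0 is true
    tail = upper_tail(n, heads, p_null)  # Steps 2-3: P(Heads >= observed)
    return min(1.0, 2 * tail)            # Step 4: double it (capped at 1)


p_value = doubled_tail_p_value(n=10, heads=8)
print(f"p-value = {p_value:.4f}")
```

The cap at 1 only matters when the observed count is close to the expected count, where doubling the tail would otherwise exceed 1.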
Explanation
First, we assumed that the null hypothesis is true. Then we checked the current observation and tried to deduce what the extreme version of this observation might be from the given null hypothesis.
Under the null hypothesis, the expected number of heads in 10 tosses is 5. But here, we got 8 heads. Thus, the more extreme versions of this observation lie towards 8 or more heads, rather than 8 or fewer heads. This is analogous to the way critical regions are found.
(But you could also argue that observing 2 or fewer heads is an equally extreme observation. How do we take that into consideration? You will see how in a short while.)
Step 3 is the most crucial step. Here, we leverage the definition to calculate the p-value. Given that the null hypothesis is true, i.e., P(H) = 0.5, we calculate the probability of getting a similar or more extreme observation, which is P(Heads ≥ 8).
If you observe carefully, you will see that it is equivalent to calculating the probability of observing 8 or more heads in a coin toss experiment where the unbiased coin is flipped 10 times.
In other words, the problem reduces to calculating an upper-tail (cumulative) probability of a binomial distribution with p = 0.5, n = 10 and r = 8.
Thus, P(Heads ≥ 8) = P(X ≥ r) = P(X ≥ 8) = P(X = 8) + P(X = 9) + P(X = 10), which gives

$$P(X \geq 8) = \binom{10}{8}(0.5)^{8}(0.5)^{2} + \binom{10}{9}(0.5)^{9}(0.5)^{1} + \binom{10}{10}(0.5)^{10} \approx 0.055.$$
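To see where the figure of roughly 0.055 comes from, the three binomial terms can be evaluated exactly. The snippet below is a small sketch using Python's standard `fractions` module; the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

n = 10
half = Fraction(1, 2)

# P(X = k) = C(n, k) * 0.5^k * 0.5^(n - k) for k = 8, 9, 10
terms = {k: comb(n, k) * half**k * half ** (n - k) for k in (8, 9, 10)}
for k, term in terms.items():
    print(f"P(X = {k}) = {comb(n, k)}/1024 = {float(term):.5f}")

tail = sum(terms.values())
print(f"P(X >= 8) = {tail} = {float(tail):.5f}")
```

The three numerators are 45, 10 and 1, so the tail probability is 56/1024 = 7/128, which rounds to 0.055.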
Thus, P(Heads ≥ 8) is now calculated. Note, however, that this is a two-tailed test: the alternative hypothesis allows the coin to be biased towards either heads or tails, so extreme observations can occur at both ends. (Take a look at the image above to understand the position of the extreme observations.)
So, observing 2 or fewer heads is the other extreme. What do we do now?
Well, since the binomial distribution with p = 0.5 is symmetric, P(Heads ≤ 2) = P(Heads ≥ 8), so we need not do much here; simply multiply the previous value by 2, just as you did in the case of a normal distribution.
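The symmetry that justifies the doubling can be checked numerically. The sketch below compares the two tails for a fair coin and, purely as a contrast, for a hypothetical coin with P(H) = 0.3 (a value chosen for illustration, not part of the problem). The helper names are assumptions made for this example.

```python
from math import comb


def pmf(n: int, k: int, p: float) -> float:
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def lower_tail(n: int, k: int, p: float) -> float:
    """Return P(X <= k)."""
    return sum(pmf(n, i, p) for i in range(0, k + 1))


def upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k)."""
    return sum(pmf(n, i, p) for i in range(k, n + 1))


n = 10
fair_lower = lower_tail(n, 2, 0.5)   # P(Heads <= 2) for a fair coin
fair_upper = upper_tail(n, 8, 0.5)   # P(Heads >= 8) for a fair coin
skew_lower = lower_tail(n, 2, 0.3)   # same tails for a hypothetical p = 0.3
skew_upper = upper_tail(n, 8, 0.3)

print(f"p = 0.5: lower = {fair_lower:.5f}, upper = {fair_upper:.5f}")
print(f"p = 0.3: lower = {skew_lower:.5f}, upper = {skew_upper:.5f}")
```

For p = 0.5 the two tails match, so doubling one of them gives the two-tailed probability. For any other value of p the tails differ, and the shortcut of multiplying by 2 would no longer be exact.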
And voila! We have the p-value as 2 * P(Heads ≥ 8) = 2 * 0.055 = 0.11.
Since the calculated p-value (0.11) is greater than the significance level of 0.05, we fail to reject the null hypothesis: 8 heads in 10 tosses is not strong enough evidence to conclude that the coin is biased.
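The decision rule can be written out as a short sketch. The function names are hypothetical, the rule "reject when p-value < α" is one common convention, and the 9-heads case is included only as an illustrative comparison that is not part of the original problem.

```python
from math import comb

ALPHA = 0.05  # significance level given in the problem


def two_tailed_p_value(n: int, heads: int) -> float:
    """Doubling method for H0: P(H) = 0.5, assuming heads >= n / 2."""
    tail = sum(comb(n, i) for i in range(heads, n + 1)) / 2**n
    return min(1.0, 2 * tail)


def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Compare the p-value with the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# 8 heads is the observation from the problem; 9 heads is hypothetical.
for heads in (8, 9):
    p = two_tailed_p_value(10, heads)
    print(f"{heads} heads: p-value = {p:.4f} -> {decide(p)}")
```

With 8 heads, the p-value stays above 0.05, matching the conclusion above. A single additional head would have pushed it below the threshold, which shows how sensitive the decision is with only 10 tosses.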