The two main types of chi-squared tests are:
Chi-squared test of independence
Chi-squared goodness of fit (This is used to test whether the sample data correctly represents the population data.)
Chi-squared test of independence: This is used to determine whether or not there is a significant relationship between two nominal (categorical) variables.
For example, a researcher wants to examine the relationship between gender (male vs female) and the chances of developing Alzheimer's disease. The chi-squared test of independence can be used to examine this relationship. The null hypothesis (H0) for this test is that there is no relationship between gender and the chances of developing Alzheimer's disease, and the alternative hypothesis (H1) is that there is a relationship between the two.
Here, gender is a categorical (nominal) variable with two categories: male and female.
Let’s draw a table for these two categories:

| | Male | Female |
|---|---|---|
| Expected Value | | |
| Sample Value | | |
The expected value is calculated by assuming that the null hypothesis is correct. So, if men and women are equally represented in the population and you select a sample of, say, 100 Alzheimer’s patients, 50 should be men and 50 should be women.
Putting the expected values in the table above, you get:
| | Male | Female |
|---|---|---|
| Expected Value | 50 | 50 |
| Sample Value | | |
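
The expected counts in the table can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `expected_counts` is hypothetical, and the 50/50 split is the assumption the text makes under the null hypothesis.

```python
# Hypothetical helper (not from the original text): expected count per
# category when the null hypothesis holds.
def expected_counts(sample_size, null_proportions):
    """Multiply the sample size by each category's proportion under H0."""
    return {category: sample_size * p for category, p in null_proportions.items()}


# Assumption from the text: under H0, a patient is equally likely to be
# male or female.
expected = expected_counts(100, {"male": 0.5, "female": 0.5})
print(expected)  # {'male': 50.0, 'female': 50.0}
```

Changing the proportions (for example, to reflect a population that is not split evenly) changes the expected counts and therefore the test statistic.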
Let’s say the sample value comes out to be a bit different, and in a sample of 100 Alzheimer’s patients, 60 are men and 40 are women.
| | Male | Female |
|---|---|---|
| Expected Value | 50 | 50 |
| Sample Value | 60 | 40 |
The test statistic for the chi-squared test is equal to

$$\chi^2 = \sum \frac{(O - E)^2}{E},$$

where O is the observed sample value, E is the expected value, and the sum runs over all categories.
So, our test statistic will be equal to:
$$\chi^2 = \frac{(60 - 50)^2}{50} + \frac{(40 - 50)^2}{50} = \frac{10^2}{50} + \frac{(-10)^2}{50} = 4$$
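
The arithmetic above can be checked with a short Python sketch. The function name `chi_squared_statistic` is illustrative only; the observed and expected counts are the ones from the example.

```python
def chi_squared_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [60, 40]  # sample values: men, women
expected = [50, 50]  # expected values under the null hypothesis
statistic = chi_squared_statistic(observed, expected)
print(statistic)  # 4.0
```

Each category contributes 100 / 50 = 2 to the sum, which gives the total of 4.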
Let’s select the level of significance as 5%, or 0.05.
Degrees of freedom = (r - 1) × (c - 1), where r is the number of rows and c is the number of columns.
With r = 2 and c = 2 in this case, the degrees of freedom equal (2 - 1) × (2 - 1) = 1.
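
As a quick check of the degrees-of-freedom formula, here is a small sketch; `degrees_of_freedom` is a hypothetical helper name, and the 2 × 2 shape is the one used in this example.

```python
def degrees_of_freedom(rows, cols):
    """Degrees of freedom for a table with the given number of rows and columns."""
    return (rows - 1) * (cols - 1)


df = degrees_of_freedom(2, 2)
print(df)  # 1
```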
Now, you will use the chi-squared distribution table to look up the critical value.
Select the value corresponding to the required degrees of freedom and the significance level.
So, the critical value is 3.84, and the test statistic value is 4.
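
The table lookup can also be reproduced in code. The sketch below relies on a property that holds only for 1 degree of freedom: a chi-squared variable with 1 degree of freedom is the square of a standard normal variable, so its critical value is the square of a normal quantile. The function name `critical_value_df1` is an assumption for illustration; for other degrees of freedom you would typically use a statistics library such as SciPy (`scipy.stats.chi2.ppf`) instead.

```python
from statistics import NormalDist


def critical_value_df1(alpha):
    """Upper-tail critical value of the chi-squared distribution with 1 df.

    Valid only for 1 degree of freedom: P(Z^2 > c) = alpha is equivalent to
    |Z| > sqrt(c), so sqrt(c) is the standard normal quantile at 1 - alpha / 2.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2


critical = critical_value_df1(0.05)
print(round(critical, 2))  # 3.84
```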
The following is a chi-squared distribution for different values of k (degrees of freedom):
In this case, the test statistic (4) is greater than the critical value (3.84), so it lies in the rejection region.
Therefore, you reject the null hypothesis.
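
The same decision can be reached with a p-value instead of a critical value. This is a sketch under the same 1-degree-of-freedom assumption, where the upper-tail probability has the closed form erfc(sqrt(x / 2)); the name `p_value_df1` is hypothetical.

```python
import math


def p_value_df1(statistic):
    """P(X > statistic) for a chi-squared variable X with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


statistic = 4.0  # test statistic from the example
alpha = 0.05     # significance level chosen in the text
p_value = p_value_df1(statistic)
reject_null = p_value < alpha
print(round(p_value, 4), reject_null)  # 0.0455 True
```

A p-value below the significance level corresponds to the test statistic falling in the rejection region, so both approaches give the same conclusion here.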