Now that you have learnt the basics of hypothesis testing, you are well equipped to frame a hypothesis, test it, and decide whether or not to reject the null hypothesis. (So far, this has assumed that the population standard deviation is known and that the sample size is greater than 30.)
But how will you test the hypothesis if these conditions are not fulfilled? Let’s find out.
The t-distribution is similar to the normal distribution: it is also symmetric and single-peaked, but less concentrated around its peak. In layman’s terms, a t-distribution is shorter and flatter around the centre than a normal distribution, with more of its probability in the tails. It is used to study the mean of a population whose distribution is approximately normal.
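To make the "shorter and flatter" description concrete, here is a minimal sketch that compares the two densities at the centre and in the tail. The helper names `normal_pdf` and `t_pdf` and the choice of 5 degrees of freedom are assumptions made for illustration only.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom at x."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

# Compare the two curves at the centre (x = 0) and in the tail (x = 3).
for x in (0.0, 3.0):
    print(f"x = {x}: normal = {normal_pdf(x):.4f}, t (df = 5) = {t_pdf(x, 5):.4f}")
```

At the centre the t-density is lower than the normal density, and in the tail it is higher, which is what "less concentrated around its peak" means.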
Two simple conditions to determine when to use the t-statistic are as follows:
The population standard deviation is unknown.
The sample size is less than 30.
If even one of these conditions applies, you can use a t-test. The formula for the t-statistic is:
t = (x̄ – μ) / (s/√n)
Here, x̄ is the sample mean, μ is the population mean stated in the null hypothesis, s is the sample standard deviation, and n is the sample size.
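As a rough sketch of how the t-statistic could be computed from raw data, the function below uses only the Python standard library. The function name `one_sample_t` and the five data points are hypothetical and serve only to illustrate the calculation.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t-statistic for testing whether the population mean equals mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    # statistics.stdev uses the n - 1 denominator, i.e. the sample standard deviation s.
    s = statistics.stdev(sample)
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical data, for illustration only.
sample = [12, 15, 11, 14, 13]
print(one_sample_t(sample, mu0=12))
```

Note that `statistics.stdev` (not `statistics.pstdev`) is the right choice here, because s is estimated from a sample rather than known for the whole population.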
Let’s look at a problem to get a better understanding of the t-test.
The National Highways Authority of India (NHAI) stated that the average number of accidents per month on national highways is 12,000. A researcher wanted to test this claim. To that end, he collected data for 25 months and found that the sample mean was 13,105 and the sample standard deviation was 1,638.4.
Let’s now try to solve this problem according to the steps we discussed earlier.
The hypotheses for this case will be:
H₀: μ = 12000
H₁: μ ≠ 12000
In this case, the population standard deviation is not given. So, you will calculate the t-statistic.
t = (x̄ – μ) / (s/√n)
= (13105 - 12000)/(1638.4/√25)
= 1105/327.68
= 3.37
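The same arithmetic can be checked with a few lines of Python. This is a minimal sketch using the summary figures from the example; the function name `t_from_summary` is an assumption made for illustration.

```python
import math

def t_from_summary(sample_mean, mu0, s, n):
    """t-statistic computed from summary statistics instead of raw data."""
    standard_error = s / math.sqrt(n)  # 1638.4 / 5 = 327.68 in the example
    return (sample_mean - mu0) / standard_error

t_stat = t_from_summary(sample_mean=13105, mu0=12000, s=1638.4, n=25)
print(round(t_stat, 2))  # 3.37
```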
Now, as in the case of a normal test, you need to compare the value you calculated with the tabular value.
For a 90% confidence level (two-tailed test) and a sample size of 25, which gives 24 degrees of freedom, the critical t-value is 1.71.
(Here is a link to the tutorial of critical t-value calculation: http://www.dummies.com/education/math/statistics/how-to-find-t-values-for-confidence-intervals/.)
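Instead of reading the critical value from a table, it can also be looked up in software. The sketch below assumes that SciPy is installed and uses `scipy.stats.t.ppf`, the inverse cumulative distribution function of the t-distribution; the variable names are illustrative.

```python
from scipy import stats

confidence = 0.90
n = 25
df = n - 1                # degrees of freedom for a one-sample t-test
alpha = 1 - confidence    # total probability in both tails

# Two-tailed test: put alpha / 2 in each tail.
t_crit = stats.t.ppf(1 - alpha / 2, df)
print(round(t_crit, 2))  # 1.71
```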
Thus, our acceptance region lies between +1.71 and -1.71.
As the calculated t-value (3.37) lies outside the acceptance region, you reject the null hypothesis: the sample gives sufficient evidence against the claim that the average number of accidents on national highways is 12,000 per month.
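The decision step can be sketched in code as well. The snippet below assumes that SciPy is available and, as an additional check that is not part of the worked example, also computes the two-tailed p-value with `scipy.stats.t.sf`.

```python
from scipy import stats

t_stat = 3.37   # t-statistic calculated in the example
df = 25 - 1     # degrees of freedom
t_crit = stats.t.ppf(0.95, df)  # two-tailed critical value at 90% confidence

# Reject the null hypothesis if the statistic falls outside [-t_crit, +t_crit].
reject_null = abs(t_stat) > t_crit

# Equivalent check: the two-tailed p-value is below the 10% significance level.
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(reject_null, p_value)
```

Both checks lead to the same conclusion, since comparing the statistic with the critical value and comparing the p-value with the significance level are two views of the same rule.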
This example covers the one-sample t-test. Let’s now focus on the two-sample t-test. As the name suggests, this test is conducted on two sets of sample data in order to compare the means of the two populations they are drawn from.
Note that a two-sample test can be performed for multiple statistical parameters, but you are going to focus only on the two-sample test for means, where the standard deviations of both populations are unknown.
The formula for the two-sample t-test is:
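One widely used form of this statistic, which does not assume that the two populations have equal variances (often called Welch's t-test), is sketched below for the null hypothesis that the two population means are equal. The notation is an assumption made here: x̄₁ and x̄₂ are the sample means, s₁ and s₂ the sample standard deviations, and n₁ and n₂ the sample sizes. A pooled-variance form also exists, so this should be read as one common variant rather than the only one.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
```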
Suppose that you want to come up with a hypothesis test regarding the mean age difference between men and women. You can use the two-sample t-test in such a case.
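As a minimal sketch of such a test, the snippet below compares two small groups of ages. The data are hypothetical and chosen only for illustration. The code assumes that SciPy is installed and uses the unequal-variance (Welch) form of the test via `scipy.stats.ttest_ind` with `equal_var=False`; the helper `welch_t` is an illustrative name that computes the same statistic by hand.

```python
import math
import statistics
from scipy import stats

# Hypothetical ages, for illustration only.
men_ages = [34, 29, 41, 38, 45, 31, 36, 40]
women_ages = [30, 27, 35, 33, 39, 28, 32]

def welch_t(a, b):
    """Two-sample t-statistic without assuming equal population variances."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

t_manual = welch_t(men_ages, women_ages)

# SciPy's implementation of the same test (equal_var=False selects Welch's form).
result = stats.ttest_ind(men_ages, women_ages, equal_var=False)
print(t_manual, result.statistic, result.pvalue)
```

The manual value and SciPy's statistic should agree, which is a quick way to confirm which variant of the formula a library call is using.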