So far you have learnt how to estimate a population’s statistics from sample data. Now, you will learn how to validate the accuracy of your estimates in the upcoming video.
In this video, you learnt how to arrive at a confidence interval within which the population mean (µ) will lie with a certain probability.
First, the sampling distribution of sample means is constructed for a particular sample size (n ≥ 30). This distribution is approximately normal, with a mean equal to the population mean (µ) and a standard error equal to the standard deviation of the population (σ) divided by the square root of the sample size (σ/√n). To be clear, we do not know the mean and standard deviation of the population.
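To make the two properties of the sampling distribution concrete, here is a minimal simulation sketch. The population is hypothetical (a skewed exponential distribution with a mean of about 50), and the sample size of 36 and the 5,000 repetitions are arbitrary choices made for illustration. In practice the population is not known; it is generated here only so that the claims can be checked.

```python
import random
import statistics

# Hypothetical, deliberately skewed population (exponential, mean about 50).
rng = random.Random(42)
population = [rng.expovariate(1 / 50) for _ in range(20_000)]

mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 36               # sample size (n >= 30)
num_samples = 5_000  # number of repeated samples

# Draw many samples (with replacement) and record the mean of each one.
sample_means = [
    statistics.fmean(rng.choices(population, k=n)) for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)  # should be close to mu
observed_se = statistics.pstdev(sample_means)   # spread of the sample means
expected_se = sigma / n ** 0.5                  # sigma / sqrt(n)

print(f"population mean      : {mu:.2f}")
print(f"mean of sample means : {mean_of_means:.2f}")
print(f"observed std. error  : {observed_se:.2f}")
print(f"sigma / sqrt(n)      : {expected_se:.2f}")
```

The two pairs of printed values should roughly agree, even though the population itself is far from normal.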
As stated by the empirical rule, 95% of all sample means lie within two standard errors of the mean of the sampling distribution (which is also the mean of the population, according to the central limit theorem).
Now, pick a random sample of size ‘n’. If the standard deviation of the population is not known to us, we have to use the sample standard deviation (S) as an estimate of the population standard deviation (σ). This can be done if we assume that the population follows a normal distribution. Hence, if we are not provided with the population standard deviation, we make this assumption so that we can compute the standard error and proceed.
Now, according to the empirical rule, the mean of a randomly picked sample has a 95% chance of lying within two standard errors of the population mean (as the mean of the sampling distribution is equal to the population mean). We can invert this statement to say that the population mean has a 95% chance of lying within two standard errors of the sample mean. (Think carefully about why we can perform this inversion.)
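One way to convince yourself that the inversion is valid is to note that "the sample mean is within two standard errors of µ" and "µ is within two standard errors of the sample mean" describe the same event. The sketch below checks this empirically. It assumes a hypothetical, roughly normal population (mean 170, standard deviation 10), and the function name `coverage_rate` is introduced only for this illustration.

```python
import random
import statistics

def coverage_rate(population, n, trials, seed=0):
    """Fraction of trials in which the population mean lies within
    two standard errors of the mean of a random sample of size n."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    se = statistics.pstdev(population) / n ** 0.5  # sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = statistics.fmean(rng.choices(population, k=n))
        # |x_bar - mu| <= 2 * se, written as an interval around x_bar.
        if x_bar - 2 * se <= mu <= x_bar + 2 * se:
            hits += 1
    return hits / trials

# Hypothetical population used only for this check.
pop_rng = random.Random(7)
population = [pop_rng.gauss(170, 10) for _ in range(10_000)]

rate = coverage_rate(population, n=40, trials=4_000)
print(f"intervals containing the population mean: {rate:.1%}")
```

The printed fraction should come out close to 95%, which is what the inverted statement claims.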
We have thus arrived at the conclusion that the population mean has a 95% chance of lying within two standard errors of the sample mean. Hence, the 95% confidence interval for the population mean is approximately (X̄ − 2σ/√n) to (X̄ + 2σ/√n).
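As a rough illustration of this calculation, the sketch below computes the approximate 95% interval from a single sample. The data are made up, and the sample standard deviation (S) stands in for σ, which is an assumption that applies when the population standard deviation is not available. The function name `approx_95_interval` is hypothetical.

```python
import math
import statistics

def approx_95_interval(sample):
    """Approximate 95% confidence interval for the population mean,
    taken as the sample mean plus or minus two standard errors."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)        # sample standard deviation (S)
    standard_error = s / math.sqrt(n)   # S / sqrt(n), S used in place of sigma
    return x_bar - 2 * standard_error, x_bar + 2 * standard_error

# Made-up sample of 35 observations (values between 36 and 42).
sample = [36 + (i % 7) for i in range(35)]

low, high = approx_95_interval(sample)
print(f"sample mean: {statistics.fmean(sample):.2f}")
print(f"approximate 95% confidence interval: {low:.2f} to {high:.2f}")
```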
Remember that if the population standard deviation (σ) is not known, we can approximate it using the sample standard deviation (S) by assuming that the population is normally distributed.
The confidence interval for different confidence levels can be calculated using the following formula:
Confidence interval = (X̄ − Z·σ/√n, X̄ + Z·σ/√n)
Here, X̄ is the sample mean, σ is the standard deviation of the population, n is the sample size, and Z is the z-score associated with the chosen confidence level. The z-scores of some commonly used confidence levels are given in the table provided below.
Confidence Level | Z-score |
50% | 0.674 |
80% | 1.282 |
90% | 1.645 |
95% | 1.960 |
99% | 2.576 |
You can refer to this table to quickly look up the z-score for a given confidence level and then compute the corresponding confidence interval.
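The z-scores in the table can also be computed instead of looked up. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module and assumes a two-sided interval, so the z-score for a confidence level c is the standard normal quantile at (1 + c) / 2. The function names `z_score` and `confidence_interval`, and the numbers in the usage example, are illustrative only.

```python
import math
from statistics import NormalDist

def z_score(confidence):
    """Two-sided z-score for a confidence level given as a decimal (e.g. 0.95)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return NormalDist().inv_cdf((1 + confidence) / 2)

def confidence_interval(x_bar, sigma, n, confidence):
    """Interval x_bar +/- z * sigma / sqrt(n) for the population mean."""
    margin = z_score(confidence) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Compare against the confidence levels listed in the table.
for level in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_score(level):.3f}")

# Made-up example: sample mean 50, sigma 12, n = 36, 95% confidence.
low, high = confidence_interval(x_bar=50, sigma=12, n=36, confidence=0.95)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

If σ is not known, S can be passed in its place under the assumption discussed earlier.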
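Another important quantity, which will be quite useful as you proceed in this course, is the significance level.
Significance level = 100% − confidence level (in per cent)
OR
Significance level = 1 − confidence level (as a decimal)
For example, a confidence level of 95% corresponds to a significance level of 5%, or 0.05.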
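The conversion is a one-line calculation. Here is a minimal sketch, with the confidence level given as a decimal; the function name `significance_level` is chosen only for this example.

```python
def significance_level(confidence):
    """Significance = 1 - confidence, with confidence given as a decimal."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1 - confidence

# Show the significance for each confidence level, as a decimal and in per cent.
for confidence in (0.90, 0.95, 0.99):
    alpha = significance_level(confidence)
    print(f"confidence {confidence:.0%} -> significance {alpha:.2f} ({alpha:.0%})")
```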