Now that you have been introduced to the world of inferential statistics, it is time to look at an important tool that will come in handy as you voyage through it – the central limit theorem.
In the upcoming video, you will learn more about the central limit theorem from Thomas.
Note: The simulation tool at this link can be used to understand sampling better.
In this video, you learnt that the central limit theorem states that if you take sufficiently large random samples (sample size ‘n’) from any population distribution with a mean μ and a finite standard deviation σ, the distribution of sample means (or the ‘sampling distribution of sample means’) will be approximately a normal distribution with a mean μ and standard deviation σ/√n.
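The statement above can be checked with a small simulation. The sketch below is only an illustration under assumed settings: the exponential population (for which μ = 1 and σ = 1), the sample size of 50 and the number of repeated samples are arbitrary choices, not values from the video.

```python
import random
import statistics

random.seed(42)  # fixed seed so that the run is repeatable

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
MU, SIGMA = 1.0, 1.0
n = 50              # sample size (assumed)
num_samples = 5000  # number of repeated samples (assumed)

# Draw many samples of size n and record the mean of each one.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = SIGMA / n ** 0.5

print(f"mean of sample means: {mean_of_means:.3f} (population mean: {MU})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma / sqrt(n): {expected_se:.3f})")
```

Even though the assumed population is strongly skewed, the mean of the sample means should land close to μ and their standard deviation close to σ/√n.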
How would you feel about working with the outcomes of a sample when you are not sure what their distribution is like? Well, that is where the central limit theorem comes into play. The central limit theorem, in simple words, says that for a large sample size, the distribution of the sample mean is likely to be close to a normal distribution.
Suppose you have taken a sample of 100 employees and recorded their ages. As per the central limit theorem, if you drew many such samples, their mean ages would be approximately normally distributed, with most values concentrated around the central peak of the distribution.
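To make the employee example concrete, here is a minimal sketch that uses a made-up workforce. The population of 20,000 employees, its right-skewed age distribution and the 3,000 repeated samples are all hypothetical. It measures how tightly the mean ages of samples of 100 cluster around the population mean.

```python
import random
import statistics

random.seed(7)  # fixed seed so that the run is repeatable

# Hypothetical workforce: 20,000 employees with right-skewed ages from 22 to 65.
population = [min(65.0, 22.0 + random.expovariate(1 / 10)) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 100                # employees per sample, as in the example above
se = sigma / n ** 0.5  # standard deviation predicted for the sample means

# Mean age of each of 3,000 independent samples of 100 employees.
means = [statistics.fmean(random.sample(population, n)) for _ in range(3_000)]

# Share of sample means that fall within one and two predicted standard deviations of mu.
within_1 = sum(abs(m - mu) <= se for m in means) / len(means)
within_2 = sum(abs(m - mu) <= 2 * se for m in means) / len(means)

print(f"population mean age: {mu:.2f}, predicted spread of sample means: {se:.2f}")
print(f"share within 1 spread: {within_1:.1%}, within 2 spreads: {within_2:.1%}")
```

For a normal distribution, these two shares are about 68% and 95%, so values near those figures suggest that the mean ages are piling up around the centre as described.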
Let’s break down the statement of the central limit theorem to better understand what it means.
The sampling distribution of sample means is the probability distribution of the values of the sample means for a particular sample size. For sufficiently large samples, this distribution turns out to be approximately normal and has some interesting properties.
First, the mean of the sampling distribution is equal to the mean of the population.
Second, the standard deviation of the sampling distribution is equal to the standard deviation of the population divided by the square root of the sample size.
Note that the standard deviation of the sample means distribution is also referred to as the ‘standard error of the mean’, or simply the ‘standard error’, and is denoted by ‘SE’.
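From the formula of the standard error, SE = σ/√n, it is clear that as the sample size increases, the sampling distribution of sample means becomes narrower. Separately, the central limit theorem tells you that the distribution also better resembles a normal distribution as the sample size grows.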
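The effect of the sample size on the standard error can be seen with a quick calculation. In the sketch below, the helper name `standard_error` and the population standard deviation of 10 are assumptions made purely for illustration.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma divided by the square root of n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)


sigma = 10.0  # hypothetical population standard deviation
for n in (25, 100, 400):
    print(f"n = {n:>3}: SE = {standard_error(sigma, n)}")
# n =  25: SE = 2.0
# n = 100: SE = 1.0
# n = 400: SE = 0.5
```

Because the sample size sits under a square root, it must be quadrupled to halve the standard error.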
By convention, a sample size of at least 30 is used as a rule of thumb for applying the central limit theorem. Hence, a sample size of 30 can be treated as a practical cut-off, although heavily skewed populations may need larger samples for the normal approximation to hold well.
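One rough way to see why the sample size matters is to measure how lopsided the sampling distribution still is at different sample sizes. The sketch below is an illustration only, under assumed settings: the exponential population, the sample sizes of 5, 30 and 200, and the `skewness` and `sample_means` helpers are choices made here, not part of the course material. A skewness near 0 indicates a symmetric, bell-like shape.

```python
import random
import statistics

random.seed(11)  # fixed seed so that the run is repeatable


def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def sample_means(n, num_samples=4000):
    """Means of repeated samples of size n from an assumed exponential population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


skew_small = skewness(sample_means(5))
skew_rule = skewness(sample_means(30))
skew_large = skewness(sample_means(200))

print(f"skewness of sample means, n = 5:   {skew_small:.2f}")
print(f"skewness of sample means, n = 30:  {skew_rule:.2f}")
print(f"skewness of sample means, n = 200: {skew_large:.2f}")
```

For this skewed population, the skewness should fall as n grows, but it is not exactly zero at n = 30, which is why 30 is best read as a convention rather than a guarantee.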
To summarise, the central limit theorem claims that irrespective of the probability distribution of the population (provided its standard deviation is finite), the distribution of sample means approximately follows a normal distribution if the sample size is sufficiently large.
One of the biggest implications of the central limit theorem is that it applies irrespective of the probability distribution of the population. This is what makes the central limit theorem so impactful. Also, the fact that the mean of the sampling distribution is equal to the population mean has several implications, which you will learn in the next segment.
Now that you have learnt how to estimate a population’s parameters from sample data, the next segment will teach you how to validate the accuracy of your estimates.