Before moving on to the Python code, we need to address an important aspect of linear regression: the assumptions of linear regression.
While building a linear model, you assume that the target variable and the input variables are linearly dependent. But do you need any assumptions other than this?
Let’s hear what Rahim has to say.
You are making inferences on the 'population' using a 'sample'. The assumption that variables are linearly dependent is not enough to generalise the results you obtain on a sample to the population, which is much larger in size than the sample. Thus, you need to have certain assumptions in place in order to make inferences.
Let's understand the importance of each assumption one by one:
There is a linear relationship between X and Y:
Error terms are normally distributed with mean zero(not X, Y):
Error terms are independent of each other:
Error terms have constant variance (homoscedasticity):
You will look at each of these assumptions in more detail later and validate these while building the model.
You can also go through the following link to see what happens when the assumptions are violated. But things will anyway get clearer once we keep moving ahead.
Image sources: