One crucial aspect of multiple linear regression remains to be discussed: feature selection. When building a multiple linear regression model, you may have quite a few potential predictor variables, and selecting just the right ones becomes an extremely important exercise.
Let’s see how you can select the optimal features for building a good model.
To get the optimal model, you could always try all the possible combinations of independent variables and see which model fits best. But with k candidate features there are 2^k - 1 possible models to compare (over 1,000 for just 10 features), so this method is time-consuming and infeasible. Hence, you need another method to arrive at a decent model. This is where manual feature elimination comes in, wherein you:

1. Build the model with all the features.
2. Drop the features that are the least helpful in prediction (i.e., the statistically insignificant ones, as flagged by high p-values).
3. Drop the features that are redundant (i.e., those highly correlated with other predictors, as flagged by pairwise correlations or VIF).
4. Rebuild the model and repeat steps 2 and 3 until you are left with a compact set of significant predictors.
Note that the second and third steps go hand in hand, and the choice of which features to eliminate first is very subjective. You will see this during the hands-on demonstration of multiple linear regression in Python in the next session.
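To make these steps concrete, here is a minimal sketch of one elimination pass in Python using statsmodels. The synthetic data, the column names, and the commonly used cut-offs of 0.05 (p-value) and 5 (VIF) are illustrative assumptions, not the exact code you will see in the demonstration.

```python
# One manual-elimination pass: fit OLS on all features, then inspect
# p-values (significance) and VIFs (redundancy) to pick a feature to drop.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "area": rng.normal(size=100),
    "bedrooms": rng.normal(size=100),
})
X["rooms"] = X["bedrooms"] + rng.normal(0, 0.1, size=100)  # deliberately redundant
y = 3 * X["area"] + rng.normal(size=100)                   # only 'area' matters

# Step 1: build the model with all the features
X_const = sm.add_constant(X)        # statsmodels needs an explicit intercept
model = sm.OLS(y, X_const).fit()

# Step 2: insignificant features show high p-values (commonly > 0.05)
print(model.pvalues.round(3))

# Step 3: redundant features show high VIFs (commonly > 5)
for i, col in enumerate(X_const.columns):
    if col != "const":
        print(col, round(variance_inflation_factor(X_const.values, i), 2))

# Step 4: drop the worst offender, rebuild the model, and repeat
```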
Now, manual feature elimination may work when you have a relatively low number of potential predictor variables, say 10 or even 20. But it is not a practical approach when you have a large number of features, say 100. In such a case, you automate the feature selection (or elimination) process. Let's see how.
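One widely used automated technique (not necessarily the one used in this course) is Recursive Feature Elimination, which repeatedly fits a model and discards the weakest feature until a requested number remain. The sketch below uses scikit-learn's RFE on synthetic data; the counts of 100 candidates and a 15-feature shortlist are made-up assumptions.

```python
# Coarse, automated elimination with scikit-learn's RFE: fit a linear model,
# drop the weakest feature, and repeat until only a shortlist remains.
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: 100 candidate features, of which only 10 are informative
X, y = make_regression(n_samples=200, n_features=100, n_informative=10,
                       noise=5.0, random_state=42)

# Whittle the 100 candidates down to a shortlist of 15
rfe = RFE(estimator=LinearRegression(), n_features_to_select=15)
rfe.fit(X, y)

shortlist = [i for i, keep in enumerate(rfe.support_) if keep]
print("Feature indices retained by RFE:", shortlist)
```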
You need to combine the manual approach and the automated one to get an optimal model that is relevant to the business. Hence, you first perform an automated elimination (coarse tuning), and once you have a small set of potential variables left to work with, you can use your domain expertise and judgement to eliminate a few more features (fine tuning).
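A sketch of that combined workflow, under the same illustrative assumptions as the snippets above: RFE makes the coarse cut, and a statsmodels refit on the shortlist exposes the p-values you would scan while fine-tuning by hand.

```python
# Combined workflow: automated coarse tuning (RFE), then a refit that
# supports manual fine tuning via p-values. Feature counts are illustrative.
import pandas as pd
import statsmodels.api as sm
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=100, n_informative=10,
                       noise=5.0, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])

# Coarse tuning: cut the 100 candidates down to 15 automatically
rfe = RFE(LinearRegression(), n_features_to_select=15).fit(X, y)
shortlist = X.columns[rfe.support_]

# Fine tuning: refit with statsmodels and review the p-values by hand
ols = sm.OLS(y, sm.add_constant(X[shortlist])).fit()
print(ols.summary())
```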