Recall that you had used RFE to select 15 features. But as you saw in the pairwise correlations, some of these 15 features are still highly correlated with each other, i.e. there is still some multicollinearity among the features. So you need to check the VIFs as well to further eliminate the redundant variables. Recall that the VIF measures how well one independent variable is explained by all the other independent variables combined. Its formula is given as:
$$ \mathrm{VIF}_i = \frac{1}{1 - R_i^2} $$
where 'i' refers to the i-th variable, which is being represented as a linear combination of the rest of the independent variables, and R_i^2 is the R-squared of that regression.
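To make the definition concrete, here is a minimal sketch that computes the VIF of each column as 1 / (1 - R^2), where R^2 comes from regressing that column on all the others. It uses only NumPy and synthetic data. The function name vif and the columns a, b, c, d are hypothetical and are not taken from the churn dataset. In practice you may prefer the variance_inflation_factor function from statsmodels.stats.outliers_influence.

```python
import numpy as np

def vif(X):
    """Return the VIF of every column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    scores = []
    for i in range(k):
        y = X[:, i]                                   # variable being explained
        others = np.delete(X, i, axis=1)              # all remaining variables
        A = np.column_stack([np.ones(n), others])     # add an intercept term
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)

# Synthetic example (an assumption for illustration, not the telecom data):
# c is almost a + b, so a, b and c are strongly multicollinear; d is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + 0.1 * rng.normal(size=500)
d = rng.normal(size=500)
vif_values = vif(np.column_stack([a, b, c, d]))
print(vif_values)
```

The first three values should come out large because each of a, b and c is almost fully explained by the other two, while the value for d should stay close to 1.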
Let's see Rahim talk about eliminating the insignificant variables based on the VIFs, and the p-values.
To summarise, you performed manual feature elimination iteratively, re-checking the VIFs and p-values after each feature was dropped. You also kept checking the accuracy to make sure that dropping a particular feature didn't affect it much.
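The VIF part of this loop can be sketched as follows. This is only an illustration under stated assumptions: the threshold of 5 is a common rule of thumb rather than a value given in this lesson, the helper names vif and drop_high_vif and the columns x1 to x4 are hypothetical, and the data is synthetic. The p-value and accuracy checks described above are not reproduced here, because they require refitting the logistic regression model after every drop.

```python
import numpy as np

def vif(X):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the others."""
    n, k = X.shape
    scores = []
    for i in range(k):
        y = X[:, i]
        A = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)

def drop_high_vif(X, names, threshold=5.0):
    """Drop the highest-VIF column one at a time until all VIFs <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    dropped = []
    while X.shape[1] > 1:
        scores = vif(X)                  # recompute after every drop
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break                        # nothing left above the threshold
        dropped.append(names.pop(worst))
        X = np.delete(X, worst, axis=1)
    return names, dropped

# Synthetic data (assumption): x3 is almost x1 + x2, x4 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.1 * rng.normal(size=500)
x4 = rng.normal(size=500)
kept, dropped = drop_high_vif(np.column_stack([x1, x2, x3, x4]),
                              ["x1", "x2", "x3", "x4"])
print("kept:", kept)
print("dropped:", dropped)
```

Only one feature is removed per pass because the VIFs of the remaining features change once a correlated feature is gone, which is also why the lesson re-checks the VIFs after each drop.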
This was the set of 15 features that RFE had selected which we began with:
And this is the final set of features which you arrived at after eliminating features manually:
As you can see, we had dropped the features 'PhoneService' and 'TotalCharges' as a part of manual feature elimination.
Refer to the above image, i.e. the final summary statistics after completing manual feature elimination. Now suppose you are a data analyst working for the telecom company, and you want to compare two customers, customer A and customer B. For both of them, the values of the variables tenure, PhoneService, Contract_One year, etc. are all the same. The only exception is the variable PaperlessBilling, which is equal to 1 for customer A and 0 for customer B.
In other words, customer A and customer B have the exact same behaviour as far as these variables are concerned, except that customer A opts for paperless billing, and customer B does not. Now use this information to answer the following questions.
Now that we have a final model, we can begin with model evaluation and making predictions. We'll start doing that in the next session.