Let's revisit the credit card example from earlier sessions.
Consider two cases in which a variable's value is missing (equal to NA):
- Utilisation is missing: As mentioned earlier, if this variable is missing for a particular customer, it could well be because the bank did not consider that customer creditworthy enough to be issued a credit card. These values are therefore not missing at random, and replacing them with the mean or the median would be misleading. It is wiser to perform a WOE (weight of evidence) analysis and then replace these values.
- Age is missing: Consider why the variable age might be missing for some customers. It is very likely the result of a system error or a manual error, with no clear pattern behind the missing values. Here, it makes more sense to simply replace the missing values with the mean or the median than to spend time on a WOE analysis.
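The two cases above can be sketched in a few lines of Python. This is only an illustration on made-up data: the column names (`age`, `utilisation`, `default`), the toy values, and the helper `woe_by_bin` are all assumptions, not taken from the course dataset. It also assumes the convention WOE = ln(share of good customers / share of bad customers); some texts use the inverse ratio.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'default' is 1 for a bad customer, 0 for a good one.
df = pd.DataFrame({
    "age":         [25, 40, np.nan, 35, 50, np.nan, 30, 45],
    "utilisation": [0.2, np.nan, 0.5, np.nan, 0.1, 0.8, np.nan, 0.3],
    "default":     [0, 1, 0, 1, 0, 1, 0, 0],
})

# Case 1 (age): missingness is assumed to be random, so the median is enough.
age_median = df["age"].median()
df["age"] = df["age"].fillna(age_median)


def woe_by_bin(bins, target):
    """Hypothetical helper: WOE per bin = ln(good share / bad share)."""
    counts = pd.crosstab(bins, target)
    good_share = counts[0] / counts[0].sum()
    bad_share = counts[1] / counts[1].sum()
    return np.log(good_share / bad_share)


# Case 2 (utilisation): treat "missing" as its own bin and measure its WOE.
util_bin = pd.Series(
    np.where(df["utilisation"].isna(), "missing", "present"),
    index=df.index,
    name="util_bin",
)
woe = woe_by_bin(util_bin, df["default"])

# Replace each row's bin label with that bin's WOE value.
df["utilisation_woe"] = util_bin.map(woe)
print(woe)
```

In this toy data the "missing" bin has a clearly lower WOE than the "present" bin, which is the kind of pattern that signals the values are not missing at random. In practice the non-missing values would be split into several bins rather than one.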
Also, recall that Hindol mentioned two more methods for treating missing values. They were not taught in this course because of their complexity. However, if you wish, you can read about both of them here:
- Markov Chain Monte Carlo - SAS Support
- Expectation Maximisation - Oxford University Statistics