You learnt the nuances of sample selection in the last lecture. However, suppose that the model you built, after selecting the data with all due consideration, has low accuracy; in other words, it is not performing well on the chosen sample data set. Your task is to build a model that performs reasonably well.
So, assuming that you cannot take another sample, what can you do to improve the model's performance? There are various ways of handling such problems in industry, and one of them is segmentation of the population. Let's learn how to perform population segmentation in detail from Hindol Basu.
Hence, it can be very helpful to segment the population before building a model, especially when different segments are likely to show different predictive patterns.
Let's talk about the ICICI example again.
For students and salaried people, different variables may be important. Whether a student defaults may depend on factors such as the program enrolled in, the prestige of the university attended and the parents' income, whereas the probability of a salaried person defaulting may depend on factors such as marital status and income. The predictive pattern across these two segments is therefore very different, so it makes more sense to build a separate child model for each segment than to build one parent model for the whole population.
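To make the parent-versus-child idea concrete, here is a minimal Python sketch using NumPy on synthetic data. The data, the two features and the coefficient values are hypothetical stand-ins (feature 0 plays the role of a student-specific variable such as parents' income, feature 1 a salaried-specific one such as income), and the hand-written gradient-descent logistic regression is an assumption chosen to keep the example dependency-light, not the model used in the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=1.0, n_iter=2000):
    """Batch gradient-descent logistic regression; an intercept column is added."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        # Gradient of the mean log-loss with respect to the weights
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w

def predict(w, X):
    """Predict 1 (default) when the estimated probability is at least 0.5."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return (sigmoid(Xb @ w) >= 0.5).astype(int)

def make_population(n_per_segment, rng):
    """Hypothetical population: segment 0 ('students') defaults depend only on
    feature 0, segment 1 ('salaried') defaults depend only on feature 1."""
    X = rng.normal(size=(2 * n_per_segment, 2))
    seg = np.repeat([0, 1], n_per_segment)
    logit = np.where(seg == 0, 4.0 * X[:, 0], 4.0 * X[:, 1])
    y = (rng.random(len(seg)) < sigmoid(logit)).astype(int)
    return X, y, seg

rng = np.random.default_rng(0)
X_tr, y_tr, s_tr = make_population(2000, rng)
X_te, y_te, s_te = make_population(2000, rng)

# One parent model fitted on the whole population
parent_w = fit_logistic(X_tr, y_tr)
parent_acc = np.mean(predict(parent_w, X_te) == y_te)

# One child model per segment, each applied only to its own segment
child_pred = np.empty_like(y_te)
for s in (0, 1):
    w = fit_logistic(X_tr[s_tr == s], y_tr[s_tr == s])
    child_pred[s_te == s] = predict(w, X_te[s_te == s])
child_acc = np.mean(child_pred == y_te)

print(f"parent model accuracy: {parent_acc:.3f}")
print(f"child models accuracy: {child_acc:.3f}")
```

On data like this, the child models should score noticeably higher than the single parent model, because the parent model has to compromise between two different predictive patterns. On real data, the size of the gap (if any) depends on how different the segments actually are.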
On the other hand, a segmentation that divides your population into male and female may not be as effective, because the predictive pattern is unlikely to differ much between these two segments.
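One rough way to check whether a candidate segmentation is worth it is to fit the same simple model separately in each segment and compare the coefficients: if they are close, the segments share a predictive pattern and separate child models are unlikely to help much. The sketch below assumes exactly two segments, uses ordinary least squares (a linear probability model) purely as a quick diagnostic rather than as the final model, and runs on hypothetical synthetic data; the helper names `segment_coefficients` and `pattern_gap` are illustrative, not from any library.

```python
import numpy as np

def segment_coefficients(X, y, seg):
    """Least-squares (linear probability) coefficients fitted separately per segment."""
    coefs = {}
    for s in np.unique(seg):
        mask = seg == s
        Xb = np.column_stack([np.ones(mask.sum()), X[mask]])
        coefs[s] = np.linalg.lstsq(Xb, y[mask], rcond=None)[0]
    return coefs

def pattern_gap(X, y, seg):
    """Largest absolute difference in any slope between the two segments
    (assumes exactly two segments; the intercept is ignored)."""
    c0, c1 = segment_coefficients(X, y, seg).values()
    return float(np.max(np.abs(c0[1:] - c1[1:])))

rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 2))
seg = rng.integers(0, 2, size=n)
noise = rng.normal(size=n)

# Same driver in both segments (like the male/female split)
y_same = (X[:, 0] + noise > 0).astype(float)
# Different driver per segment (like the student/salaried split)
y_diff = (np.where(seg == 0, X[:, 0], X[:, 1]) + noise > 0).astype(float)

gap_same = pattern_gap(X, y_same, seg)
gap_diff = pattern_gap(X, y_diff, seg)
print(f"gap when patterns match: {gap_same:.3f}")
print(f"gap when patterns differ: {gap_diff:.3f}")
```

The first split behaves like the male/female example (a small gap), while the second behaves like the student/salaried example (a large gap). Any threshold for "large enough" is a judgment call that would need to be set for the problem at hand.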
So with that, we've looked at the various aspects to take care of while selecting data (samples) for building a model. In the next lecture, you will learn which aspects to take care of when working with variables.