In the last segment, Ashish explained the steps to collect the data. As in the previous segments, the first step is to generate certain hypotheses before we collect the data and analyse it. Let's hear from Ashish as he walks us through the steps for the same.
The data related to claims and policies is readily available with the insurance company, so it is not difficult to get hold of it. There is no cost involved in procuring this data, and the analytics team will typically have access to it anyway.
We create risky-profession and risky-geography (city and state) variables. To do that, we essentially calculate the average number of claims, map each city against this average, mark it as high or low, and use that flag in the model. This way the model always penalizes the right combinations and stays up to date.
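As a rough illustration of how such a flag could be built, here is a minimal sketch in Python. It assumes a hypothetical pandas DataFrame with columns 'city' and 'claim' (1 if the policy resulted in a claim, 0 otherwise); the same idea would apply to profession or state.

```python
import pandas as pd

def add_risky_geography_flag(policies: pd.DataFrame) -> pd.DataFrame:
    """Flag each city as high or low risk relative to the overall average claim rate.

    Assumes hypothetical columns: 'city' and 'claim' (0/1 claim indicator).
    """
    overall_rate = policies["claim"].mean()                # average claim rate across all policies
    city_rate = policies.groupby("city")["claim"].mean()   # claim rate per city
    # A city is marked risky when its claim rate exceeds the overall average
    risky_cities = city_rate[city_rate > overall_rate].index
    policies = policies.copy()
    policies["risky_geography"] = policies["city"].isin(risky_cities).astype(int)
    return policies
```

Because the flag is recomputed from the latest claims data each time, the variable stays current as claim patterns shift.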
Each policy is scored at login, and the score is eventually segregated into high, medium and low bands. The top 2% of policies are either rejected straight away or sent for medicals.
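The banding and the 2% referral rule could look something like the sketch below. The band cut-offs and column names are assumptions for illustration; only the idea of bucketing scores and routing the riskiest 2% comes from the description above.

```python
import pandas as pd

def segment_scores(scores: pd.Series) -> pd.DataFrame:
    """Bucket risk scores into Low / Medium / High and flag the riskiest 2%.

    'scores' is an illustrative Series of model risk scores, one per policy,
    where a higher score means higher risk.
    """
    result = pd.DataFrame({"score": scores})
    # Assumed banding: split scores into three equal-sized risk bands
    result["band"] = pd.qcut(scores, q=3, labels=["Low", "Medium", "High"])
    # Policies in the top 2% of scores are routed for rejection or medicals
    cutoff = scores.quantile(0.98)
    result["refer_for_medicals"] = scores >= cutoff
    return result
```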
Early claims have come down over a period of time. If the early-claims volume goes up, we revisit the model and recalibrate it. However, what is important to note here is that it is not just the analytical model that has made all the difference.