Now that you have learnt the theory behind the ROC curve, let's plot an ROC curve in Python for our telecom churn case study.
Let's first take a look at the ROC curve code that you just saw:
```python
# Imports needed by the function
import matplotlib.pyplot as plt
from sklearn import metrics

# Defining the function to plot the ROC curve
def draw_roc(actual, probs):
    # False positive rates, true positive rates and the cut-offs that produce them
    fpr, tpr, thresholds = metrics.roc_curve(actual, probs, drop_intermediate=False)
    # Area under the ROC curve
    auc_score = metrics.roc_auc_score(actual, probs)
    plt.figure(figsize=(5, 5))
    plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % auc_score)
    # Dashed diagonal: the reference line for a random model
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate or [1 - True Negative Rate]')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic example')
    plt.legend(loc="lower right")
    plt.show()
    return None

# Calling the function
draw_roc(y_train_pred_final.Churn, y_train_pred_final.Churn_Prob)
```
Notice that in the last line, you're passing the actual Churn values and the corresponding predicted churn probabilities to the function.
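To see what `draw_roc` receives and computes, here is a minimal sketch on a hypothetical four-customer toy dataset (the arrays `actual` and `probs` are made up for illustration, not taken from the case study). It calls the same two scikit-learn functions used inside `draw_roc`: `metrics.roc_curve`, which returns one (FPR, TPR) point per candidate cut-off, and `metrics.roc_auc_score`, which returns the area under those points.

```python
import numpy as np
from sklearn import metrics

# Hypothetical toy data: actual churn labels (1 = churned) and predicted churn probabilities
actual = np.array([0, 0, 1, 1])
probs = np.array([0.1, 0.4, 0.35, 0.8])

# One (fpr, tpr) point per distinct cut-off, listed from the highest cut-off to the lowest
fpr, tpr, thresholds = metrics.roc_curve(actual, probs, drop_intermediate=False)
auc_score = metrics.roc_auc_score(actual, probs)

print("FPR:       ", fpr)
print("TPR:       ", tpr)
print("Thresholds:", thresholds)
print("AUC:       ", auc_score)
```

Each entry of `thresholds` is the cut-off that produces the matching `fpr`/`tpr` pair. The first entry is a sentinel above every score (its exact value differs between scikit-learn versions), which is why the curve always starts at (0, 0).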
Following is the ROC curve that you got. Note that it is the same curve you got in Excel; the only difference is that Excel used a scatter plot to show the discrete points, whereas here the points are joined by a continuous line.
The Diagonal
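For a completely random model, the ROC curve lies close to the 45-degree diagonal shown in the graph above, whereas for a perfect model it passes through the upper-left corner of the graph. So, for any model that does at least as well as random guessing, the area under the ROC curve lies between 0.5 and 1. (An area below 0.5 means the model ranks customers worse than random; inverting its predictions would give an area above 0.5.)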
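The claims about the diagonal and the upper-left corner can be checked numerically. The sketch below, assuming simulated labels and probabilities rather than the churn data, scores a "random" model whose probabilities ignore the labels, a "perfect" model that gives every churner a higher probability than every non-churner, and the perfect model with its ranking reversed.

```python
import numpy as np
from sklearn import metrics

rng = np.random.default_rng(42)
n = 10_000
# Simulated binary labels (1 = churned)
y = rng.integers(0, 2, size=n)

# "Random" model: probabilities drawn independently of the labels
random_probs = rng.random(n)
# "Perfect" model: churners get probabilities in [0.5, 1), non-churners in [0, 0.5)
perfect_probs = 0.5 * y + 0.5 * rng.random(n)

random_auc = metrics.roc_auc_score(y, random_probs)
perfect_auc = metrics.roc_auc_score(y, perfect_probs)
# Negating the scores reverses the ranking, which mirrors the ROC curve (AUC becomes 1 - AUC)
inverted_auc = metrics.roc_auc_score(y, -perfect_probs)

print(f"random: {random_auc:.3f}  perfect: {perfect_auc:.3f}  inverted: {inverted_auc:.3f}")
```

Here `random_auc` lands near 0.5, `perfect_auc` is 1.0 and `inverted_auc` is 0.0, which is why an AUC below 0.5 signals a model whose predictions are "backwards" rather than one that carries no information.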
The Sensitivity vs Specificity Trade-off
As you saw in the last segment, the ROC curve shows the trade-off between the True Positive Rate and the False Positive Rate, which can also be viewed as a trade-off between Sensitivity and Specificity. On the Y-axis you have Sensitivity, and on the X-axis you have (1 - Specificity). Notice that as Sensitivity increases along the curve, (1 - Specificity) also increases, and since (1 - Specificity) is increasing, Specificity must be decreasing.
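The trade-off can be reproduced by hand by sweeping the cut-off, much as you did in Excel. This sketch uses a hypothetical helper, `sens_spec_at_cutoff`, and eight made-up customers. Lowering the cut-off flags more customers as churners, so Sensitivity can only go up while Specificity can only go down.

```python
import numpy as np

def sens_spec_at_cutoff(actual, probs, cutoff):
    """Hypothetical helper: Sensitivity and Specificity when we predict churn for prob >= cutoff."""
    predicted = (probs >= cutoff).astype(int)
    tp = np.sum((predicted == 1) & (actual == 1))
    fn = np.sum((predicted == 0) & (actual == 1))
    tn = np.sum((predicted == 0) & (actual == 0))
    fp = np.sum((predicted == 1) & (actual == 0))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy labels (1 = churned) and predicted churn probabilities
actual = np.array([0, 0, 0, 1, 0, 1, 1, 1])
probs = np.array([0.05, 0.2, 0.3, 0.45, 0.55, 0.6, 0.8, 0.9])

rows = []
# Sweep the cut-off from high to low, i.e. move along the ROC curve from (0, 0) towards (1, 1)
for cutoff in [0.9, 0.7, 0.5, 0.3, 0.1]:
    sens, spec = sens_spec_at_cutoff(actual, probs, cutoff)
    rows.append((cutoff, sens, spec, 1 - spec))
    print(f"cutoff={cutoff:.1f}  sensitivity={sens:.2f}  "
          f"specificity={spec:.2f}  1-specificity={1 - spec:.2f}")
```

With these values, Sensitivity rises through 0.25, 0.50, 0.75, 1.00, 1.00 while Specificity falls through 1.00, 1.00, 0.75, 0.50, 0.25. Each (1 - Specificity, Sensitivity) pair is one point on the ROC curve.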
Area Under the Curve
By computing the area under the curve (AUC) of an ROC curve, you can gauge how good the model is. The closer the ROC curve is to the upper-left corner of the graph, the better the model; the closer it is to the 45-degree diagonal, the closer the model is to random guessing. So, the larger the AUC, the better your model, which is something you saw in the last segment as well.
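The AUC can be read in two equivalent ways, and the sketch below (on simulated data, not the churn dataset) computes both: as a literal area, using the trapezoidal rule in `metrics.auc` over the points from `metrics.roc_curve`, and as the probability that a randomly chosen churner receives a higher predicted probability than a randomly chosen non-churner, with ties counted as half. Both agree with `metrics.roc_auc_score`.

```python
import numpy as np
from sklearn import metrics

rng = np.random.default_rng(0)
# Simulated data: churners tend to get higher predicted probabilities than non-churners
actual = rng.integers(0, 2, size=500)
probs = np.clip(0.3 * actual + rng.normal(0.35, 0.2, size=500), 0, 1)

# 1) AUC as a literal area: trapezoidal rule over the ROC points
fpr, tpr, _ = metrics.roc_curve(actual, probs)
area_trapezoid = metrics.auc(fpr, tpr)

# 2) AUC as a ranking probability: compare every churner with every non-churner
pos = probs[actual == 1]
neg = probs[actual == 0]
diff = pos[:, None] - neg[None, :]                         # all pairwise score differences
area_ranking = np.mean((diff > 0) + 0.5 * (diff == 0))     # wins count 1, ties count 0.5

print(metrics.roc_auc_score(actual, probs), area_trapezoid, area_ranking)
```

The pairwise version builds a matrix of size (number of churners) × (number of non-churners), so it is only practical for small samples; it is shown here because it makes the meaning of "a larger AUC is better" concrete.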
The following link provides a very interesting insight into the ROC curve.