A quick follow-up to the last post: I forgot to write about plotting ROC curves in R for the different models. In the last post I built five progressively more complicated decision trees, which didn't add any apparent benefit when judged on the accuracy of the model. But accuracy is just one metric: what do the ROC curves look like, and what are the areas under them?
For this we need to load the ROCR package, and then the process is pretty simple. Here is how to plot the ROC curve for one of the models:
```r
library(ROCR)

# Calculate the predicted probabilities of the positive class
pred_p_t <- predict(tree, test, type = "prob")[, 2]

# Make a prediction object for the ROC curve
predict_1 <- prediction(pred_p_t, test$FiveHundredPlus)

# Calculate the performance (true positive rate vs. false positive rate)
perf_1 <- performance(predict_1, "tpr", "fpr")

# Calculate the area under the curve
auc_1 <- performance(predict_1, "auc")@y.values[[1]]

# Plot the ROC curve
plot(perf_1)
```
When this process is repeated for models 1-5, it gives the chart.
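One way to produce that overlay is to loop over the models and add each curve to the same axes. This is a minimal sketch: the object names tree_1 to tree_5 for the five fitted trees are assumptions carried over from the last post.

```r
library(ROCR)

# Assumed names for the five trees fitted in the last post
trees <- list(tree_1, tree_2, tree_3, tree_4, tree_5)
cols  <- c("black", "red", "green3", "blue", "orange")

for (i in seq_along(trees)) {
  # Predicted probability of the positive class for model i
  pred_p <- predict(trees[[i]], test, type = "prob")[, 2]
  perf   <- performance(prediction(pred_p, test$FiveHundredPlus), "tpr", "fpr")
  # Draw the first curve, then add the rest to the same plot
  plot(perf, col = cols[i], add = (i > 1))
}
legend("bottomright", legend = paste("Decision Tree", 1:5), col = cols, lty = 1)
```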
And the AUC values:
| Model | Accuracy | AUC |
|-----------------|----------|-------|
| Decision Tree 1 | 0.920 | 0.722 |
| Decision Tree 2 | 0.920 | 0.722 |
| Decision Tree 3 | 0.920 | 0.729 |
| Decision Tree 4 | 0.920 | 0.850 |
| Decision Tree 5 | 0.920 | 0.867 |
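The AUC column can be collected in one pass rather than model by model. Again a sketch, reusing the `trees` list of assumed names from the chart code above:

```r
# Compute the AUC for each model in turn
aucs <- sapply(trees, function(tr) {
  p <- predict(tr, test, type = "prob")[, 2]
  performance(prediction(p, test$FiveHundredPlus), "auc")@y.values[[1]]
})
round(aucs, 3)
```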
On this measure the more complex trees do add value, which shows the benefit of using more than one performance metric.