Simple machine learning in R – ROC

A quick follow-up on the last post: I forgot to write about plotting ROC curves in R for the different models. In the last post I created five progressively more complicated decision trees, which didn't add any benefit when judged on accuracy alone. But accuracy is just one metric: what do the ROC curves look like, and what are the areas under them?

For this we need to load the ROCR package; after that the process is pretty simple. Here it is for plotting the ROC curve for one of the models:

# Load the ROCR package
library(ROCR)

# Calculate the predicted probability of the positive class
pred_p_t <- predict(tree, test, type = "prob")[, 2]

# Make a prediction object for the ROC curve
predict_1 <- prediction(pred_p_t, test$FiveHundredPlus)

# Calculate the performance (true positive rate vs false positive rate)
perf_1 <- performance(predict_1, "tpr", "fpr")

# Calculate the area under the curve
auc_1 <- performance(predict_1, "auc")@y.values[[1]]

# Plot the ROC curve
plot(perf_1)

Repeating this process for models 1 to 5 gives the chart below.
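The repetition itself can be scripted rather than copied and pasted. Here is a rough sketch of how all five curves could be overlaid on one plot and their AUCs collected; the model names tree_1 to tree_5 are placeholders for whatever the fitted trees were actually called:

# Sketch: loop over the five fitted trees (placeholder names tree_1 ... tree_5)
library(ROCR)

models <- list(tree_1, tree_2, tree_3, tree_4, tree_5)
cols   <- c("black", "red", "blue", "green", "purple")
aucs   <- numeric(length(models))

for (i in seq_along(models)) {
  # Probability of the positive class from each tree
  probs <- predict(models[[i]], test, type = "prob")[, 2]
  pred  <- prediction(probs, test$FiveHundredPlus)
  perf  <- performance(pred, "tpr", "fpr")

  # Overlay each curve on the same plot
  plot(perf, col = cols[i], add = (i > 1))

  # Store the area under the curve for the table below
  aucs[i] <- performance(pred, "auc")@y.values[[1]]
}

legend("bottomright", legend = paste("Decision Tree", 1:5), col = cols, lty = 1)
aucs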

And the AUC values:

Model            Accuracy   AUC
Decision Tree 1  0.920      0.722
Decision Tree 2  0.920      0.722
Decision Tree 3  0.920      0.729
Decision Tree 4  0.920      0.850
Decision Tree 5  0.920      0.867


On this measure the more complex trees do add value, which shows the benefit of using more than one performance metric.
