I thought I was done with this, but I’m not. Time to have more of a play in R with Land Registry data and the National Statistics Postcode Lookup (NSPL). You never know, maps and networks might also be involved.
In the last two posts I created some simple decision trees and tested their accuracy. Now it’s time to try some other models. As before, I’m going to continue predicting the variable FiveHundredPlus with a limited set of factors to keep the processing load down. Once I’m a bit more confident, I’ll move to the larger dataset and a more powerful machine. I’m going to use the caret package and recreate this post from Analytics Vidhya.
The full code is saved on my GitHub page here.