A quick follow-up to the last post. I forgot to cover plotting ROC curves in R for the different models. In the last post I built five progressively more complex decision trees, none of which added much benefit in terms of accuracy. But accuracy is just one metric: what do the ROC curves look like, and what are the areas under them?
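Before reaching for a package, it can help to see what an ROC curve and its area actually are. Below is a minimal base-R sketch, assuming you already have predicted probabilities for the positive class and the true 0/1 labels. With an rpart classification tree these probabilities would typically come from something like `predict(tree, test, type = "prob")[, 2]`, assuming the second column is the over-£500k class. The scores and labels are made up for illustration, the helpers `roc_points` and `auc_trapezoid` are hypothetical, and the area is computed with the trapezoidal rule rather than any particular package's method.

```r
# Minimal sketch: ROC curve points and AUC in base R.
# roc_points() and auc_trapezoid() are hypothetical helpers, not a package API.

# Sweep a threshold over every distinct score (highest first) and record the
# false positive rate and true positive rate at each threshold.
roc_points <- function(scores, labels) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  tpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 1) / n_pos)
  fpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 0) / n_neg)
  # Start the curve at (0, 0), where nothing is predicted positive
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Area under the curve using the trapezoidal rule
auc_trapezoid <- function(roc) {
  widths <- diff(roc$fpr)
  heights <- (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2
  sum(widths * heights)
}

# Made-up predicted probabilities of "worth more than 500k" and true labels
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1, 1, 0, 1, 0, 0, 1, 0)

roc <- roc_points(scores, labels)
auc <- auc_trapezoid(roc)
print(auc)  # 0.75 for these made-up values

plot(roc$fpr, roc$tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # a random classifier would sit on this diagonal
```

Running this once per model, with each tree's predicted probabilities, gives one curve and one AUC per tree to compare alongside accuracy. Packages such as ROCR (`prediction()` and `performance()`) do the same job with more options.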
Now that the land registry data has been imported and explored a little, let's have a go at building a price prediction model. I'll use a small subset of the data and, to start with, only try to predict whether a house is worth more than £500k, rather than tackling the harder problem of predicting the actual price. The code in this post is largely based on the DataCamp course “Introduction to Machine Learning”, and the code for this project is on my GitHub page. This post focuses on decision trees using the rpart package.
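As a rough illustration of the approach (not the code from the DataCamp course or the GitHub repo), here is a minimal sketch of fitting an rpart classification tree to a binary over/under £500k target and measuring accuracy on a held-out set. The data are synthetic: the `property_type` codes mimic the land registry's D/S/T/F property types, but the `london` column, the rule that generates `over_500k`, and the 70/30 split are all assumptions invented for the example.

```r
# Minimal sketch: a classification tree for "worth more than 500k?" with rpart.
# The data below are synthetic stand-ins; column names are hypothetical.
library(rpart)

set.seed(2016)
n <- 500
houses <- data.frame(
  property_type = factor(sample(c("D", "S", "T", "F"), n, replace = TRUE)),
  london = factor(sample(c("yes", "no"), n, replace = TRUE, prob = c(0.3, 0.7)))
)

# Made-up rule plus noise: detached or London properties are more often over 500k
p_over <- 0.05 + 0.4 * (houses$property_type == "D") + 0.4 * (houses$london == "yes")
houses$over_500k <- factor(ifelse(runif(n) < p_over, "yes", "no"))

# Simple 70/30 train/test split
train_idx <- sample(n, size = 0.7 * n)
train <- houses[train_idx, ]
test <- houses[-train_idx, ]

# method = "class" fits a classification (not regression) tree
tree <- rpart(over_500k ~ property_type + london, data = train, method = "class")

# Predicted class labels on the test set, then a confusion matrix and accuracy
pred <- predict(tree, test, type = "class")
conf_matrix <- table(actual = test$over_500k, predicted = pred)
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
print(conf_matrix)
print(accuracy)
```

One way to get progressively more complex trees from the same formula is to lower rpart's complexity parameter, for example `rpart(..., control = rpart.control(cp = 0.001))`, or to add more predictors.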
In my first land registry post I imported a month's worth of land registry data, named the columns and had a go at using the ggplot2 package to produce some nice-looking charts. This time I want to go a little further. Using the same dataset, my aims are to:
- Look at the distribution of prices
- Look at the prices by different factors
- Initially just using factors in the land registry data
Following my last post on the ONS data structure, this is the first of a few posts on using that structure, along with other public data (mostly from the UK government), to make maps in R. This first post covers getting shapefiles from various sources, loading them into R and plotting them.