I thought I was done with this but I’m not. Time to have more of a play in R with Land Registry data and the National Statistics Postcode Lookup (NSPL). You never know, maps and networks might also be involved.
I’ve posted before on this and I clearly didn’t know what I was doing. I still don’t really know what I’m doing, but I now have some pretty pictures and that’s all anybody really wants. In this post I’m going to import a postcode shapefile from the OS, plot the postcodes in R, find the neighbours of each postcode and convert the data into a network graph. GitHub repository here.
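The neighbours-to-network step can be sketched roughly as follows. This is a minimal illustration, not the code from the post: it assumes the sf, spdep and igraph packages (the post does not say which packages were used), and it swaps the OS shapefile for a made-up 2 x 2 grid of squares with hypothetical labels so that it runs without any data files.

```r
library(sf)
library(spdep)
library(igraph)

# Hypothetical stand-in for postcode polygons: a 2 x 2 grid of unit squares.
# With real data this would be something like st_read() on the shapefile.
square <- function(x, y) {
  st_polygon(list(rbind(
    c(x, y), c(x + 1, y), c(x + 1, y + 1), c(x, y + 1), c(x, y)
  )))
}
areas <- st_sf(
  postcode = c("A1", "A2", "B1", "B2"),  # made-up labels
  geometry = st_sfc(square(0, 0), square(1, 0), square(0, 1), square(1, 1))
)

# Find neighbours: queen = FALSE means two areas must share an edge,
# not just a corner, to count as neighbours.
nb <- poly2nb(areas, queen = FALSE)

# Turn the neighbour list into a binary adjacency matrix, then a graph.
adj <- nb2mat(nb, style = "B", zero.policy = TRUE)
dimnames(adj) <- list(areas$postcode, areas$postcode)
g <- graph_from_adjacency_matrix(adj, mode = "undirected")

print(g)
```

On this toy grid each square touches two others along an edge, so the graph has four vertices and four edges. Setting queen = TRUE would also link the diagonal pairs that only meet at a corner.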
During the World Cup I ran a competition to see who on the team was the best at predicting the results of the World Cup. Unlike everyone else in my company, my competition used R Shiny as an interface and was hosted on an Amazon virtual machine.
In the last two posts I created some simple decision trees and tested their accuracy. Now it’s time to try some other models. As before I’m going to continue predicting the variable FiveHundredPlus with a limited set of factors to keep the processing pressures down. Once I’m a bit more confident I’ll move to the larger dataset and a more powerful machine. I’m going to use the package caret and recreate this post from Analytics Vidhya.
Full code saved on my GitHub page here.
A quick follow-up on the last post. I forgot to write about plotting ROC curves in R based on the different models. In the last post I created five progressively more complicated decision trees, which didn’t really add any benefit when looking at the accuracy of the model. But accuracy is just one metric: what do the ROC curves look like and what are the areas under the curves?
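For anyone wondering what the area under the curve actually measures, here is a small base R sketch. It is not the method from the post (which may well use a dedicated ROC package); it uses the rank-sum identity, where the AUC is the share of positive/negative pairs in which the positive case gets the higher score. The function name auc_rank and the scores are made up for illustration.

```r
# AUC via the rank-sum (Mann-Whitney) identity.
# scores: predicted probabilities; labels: 1 for positive, 0 for negative.
auc_rank <- function(scores, labels) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)  # ties get average ranks
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up scores for six houses, three of which are truly over the threshold
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4)
labels <- c(1, 1, 0, 1, 0, 0)

auc_value <- auc_rank(scores, labels)
print(auc_value)  # 8 of the 9 positive/negative pairs are ordered correctly
```

An AUC of 0.5 means the model ranks houses no better than chance, and 1 means every positive is scored above every negative.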
Now that the Land Registry data has been imported and had some initial exploratory work done on it, let’s have a go at making a price prediction model. I’ll use a small subset of the data and initially only try to predict whether the house is worth more or less than £500k, rather than the more complicated process of predicting the price. The code used in this post is largely based upon the DataCamp course “Introduction to Machine Learning”. Code for this project is on my GitHub page here. This post focuses on decision trees using the package rpart.
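As a rough idea of what that looks like with rpart, here is a minimal sketch. The data is simulated, not real Land Registry prices, and the column names PropertyType, London and Price are assumptions made for the example; only FiveHundredPlus is a name used in these posts. The 70/30 split is also just a choice for the sketch.

```r
library(rpart)

set.seed(1)
n <- 400

# Simulated stand-in for a Land Registry extract (NOT real prices)
sales <- data.frame(
  PropertyType = factor(sample(c("D", "S", "T", "F"), n, replace = TRUE)),
  London = factor(sample(c("Y", "N"), n, replace = TRUE))
)
base_price <- ifelse(sales$London == "Y", 550000, 250000) +
  ifelse(sales$PropertyType == "D", 150000, 0)
sales$Price <- base_price * exp(rnorm(n, sd = 0.3))

# Binary target: is the house worth £500k or more?
sales$FiveHundredPlus <- factor(ifelse(sales$Price >= 500000, "yes", "no"))

# Hold back 30% of the rows to check the tree on data it has not seen
train_idx <- sample(n, 0.7 * n)
train <- sales[train_idx, ]
holdout <- sales[-train_idx, ]

# Classification tree on a limited set of factors
tree <- rpart(FiveHundredPlus ~ PropertyType + London,
              data = train, method = "class")

# Confusion matrix and accuracy on the held-back rows
pred <- predict(tree, holdout, type = "class")
conf <- table(actual = holdout$FiveHundredPlus, predicted = pred)
accuracy <- sum(diag(conf)) / sum(conf)
print(conf)
print(accuracy)
```

Price is deliberately left out of the formula, otherwise the tree would simply split on it and look perfect.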
Quite a long time ago now I wrote a post on my map of UK postcode towns in Microsoft Excel, based on various posts I had seen on clearandsimply.com. It turns out I do actually use this map quite a lot, as it is very quick to use and doesn’t require any extra specialist software. So I thought I would revisit this.
Following on from the previous Land Registry posts I've had a go at using the googleVis package to plot the data. So far my favourite method is the motion chart below, although I need to make more equal sales areas as London ruins the x-axis a bit. The best view is the histogram view.
In my first Land Registry post I imported a month’s worth of Land Registry data, named the rows and had a go at using the ggplot2 package to produce a number of nice-looking charts. This time I want to progress a little further. My aims, using the same dataset, are to:
- Look at the distribution of prices
- Look at the prices by different factors
- Initially just using factors in the land registry data