A quick follow-up on the last post: I forgot to write about plotting ROC curves in R for the different models. In the last post I created five progressively more complicated decision trees, which didn’t really add any benefit in terms of model accuracy. But accuracy is just one metric. What do the ROC curves look like, and what are the areas under the curves?
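To make the ROC idea concrete, here is a minimal sketch of how the curve and the area under it can be computed by hand. It is written in plain Python rather than R so that it needs no packages, and it is not the code from the post. The function names `roc_points` and `auc`, and the labels and scores, are made up for illustration (think of a label of 1 as "house worth more than £500k" and the score as a tree's predicted probability).

```python
def roc_points(labels, scores):
    """Return ROC points as (false positive rate, true positive rate).

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted probability (or any score) of the positive class.
    Assumes both classes appear at least once.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Sweep the threshold from the highest score down to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores cross the threshold together, so handle them as a group.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical data, purely illustrative.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(labels, scores)))
```

An area of 1 means the scores separate the two classes perfectly, while 0.5 is what you would expect from random guessing, which is why two models with similar accuracy can still differ here.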
Now that the Land Registry data has been imported and had some initial exploratory work done on it, let’s have a go at making a price prediction model. I’ll use a small subset of the data and initially only try to predict whether a house is worth more or less than £500k, rather than tackling the more complicated problem of predicting the price itself. The code used in this post is largely based upon the DataCamp course “Introduction to Machine Learning”. Code for this project is on my GitHub page here. This post focuses on decision trees using the package rpart.
Quite a long time ago now, I wrote a post on my map of UK postcode towns in Microsoft Excel, based on various posts I had seen on clearandsimply.com. It turns out I actually use this map quite a lot, as it is very quick to use and doesn’t require any extra specialist software. So I thought I would revisit it.
Following on from the previous Land Registry posts, I've had a go at using the googleVis package to plot the data. So far my favourite method is the motion chart below, although I need to make the sales areas more equal, as London ruins the x-axis a bit. The best view is the histogram view.
After my last post on the ONS data structure, this is the first of a few posts on using that structure, along with some other public data (mostly UK government data), and mapping it using R. This first post is about getting shapefiles from various locations, loading them into R and plotting them.
I have been looking through the ONS geographic data on their Geo Portal, and there are acronyms and variables everywhere, so I thought it best to understand what they all mean. Whenever I refer to output areas and super output areas, I’m referring to the ones as at the 2011 census in England and Wales.
I was setting up some trackers at work the other day, using OLAP cubes in Excel across a number of different variables (about 20) to track monthly sales, which I could refresh each month. Once I’d set the sales tracker up, I realised that I also wanted to look at average price across the same variables, so I made a copy of the spreadsheet and went through each of the pivot tables, changing them to track average price. When I then wanted to look at sales mix (% of sales that month), I thought there must be a better way, so I decided to write some VBA to do all of this for me.