Now that the land registry data has been imported and given some initial exploratory analysis, let's have a go at building a price prediction model. I'll use a small subset of the data and, initially, only try to predict whether a house is worth more or less than £500k, rather than tackling the more complicated problem of predicting the price itself. The code used in this post is largely based on the DataCamp course "Introduction to Machine Learning". Code for this project is on my GitHub page here. This post focuses on decision trees using the rpart package.

In my first land registry post I imported a month's worth of land registry data, named the rows and had a go at using the ggplot2 package to produce a number of nice-looking charts. This time I want to go a little further. Using the same dataset, my aims are to:

  1. Look at the distribution of prices
  2. Look at the prices by different factors
    1. Initially using only the factors available in the land registry data


My computer has been struggling with some of the code I've been trying to run: it is pretty old and doesn't have enough memory to handle large datasets in R. So rather than buy a better laptop, I've set up an Amazon Web Services account and, using this guide, set up a remote computer so I don't have to use mine. I'm only using the free tier for now, but if I want to have a go at processing something larger, this will let me pay a small fee to use a more powerful machine for a short period of time.


"You know they'll never really die while the Trunk is alive[...]
It lives while the code is shifted,
and they live with it, always Going Home."

- Moist von Lipwig, Going Postal, Chapter 13

It is a year and a day since Sir Terry Pratchett, author of the Discworld series amongst other things, died. But it is nice to see that, a year on, he still lives on, not just through his books but through the Clacks Overhead: a little movement by web developers who add a "GNU Terry Pratchett" header to their websites to keep his memory alive.
