Land Registry Data

My computer has been struggling with some of the code I’ve been trying to run: it is pretty old and doesn’t have enough memory for large datasets in R. So rather than buy a better laptop, I’ve set up an Amazon Web Services (AWS) account and, using this guide, set up a remote machine so I don’t have to use my own. I’m only using the free tier for now, but if I want to have a go at processing something larger, this will let me pay a small fee to use a more powerful machine for a short period of time.

One thing this does mean is that I can’t just download my data to my computer and reference it by a file location, but that isn’t really an issue in R, which can read data directly from a URL.
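As a rough sketch of what that looks like, `read.csv()` accepts a URL wherever it accepts a file path. The web address below is a placeholder rather than the real Land Registry download link, `read_price_paid()` is a hypothetical helper, and `header = FALSE` is an assumption about the file layout. The runnable part reads two made-up rows back through a `file://` URL so it works without a network connection.

```r
# Sketch: read a Price Paid CSV straight from a URL or a path.
# header = FALSE is an assumption about the file layout.
read_price_paid <- function(source) {
  read.csv(source, header = FALSE, stringsAsFactors = FALSE)
}

# Hypothetical usage (placeholder URL, not the real download address):
# landregistry <- read_price_paid("https://example.com/price-paid-one-month.csv")

# Self-contained demo: write two made-up rows to a temporary file and
# read them back through a file:// URL, the same way a web URL is read.
tmp <- tempfile(fileext = ".csv")
writeLines(c('"{ID-1}","150000","2015-06-01 00:00"',
             '"{ID-2}","230000","2015-06-15 00:00"'), tmp)
demo <- read_price_paid(paste0("file://", normalizePath(tmp)))

# Hypothetical column names, matching the ones used later in this post
names(demo) <- c("ID", "Price", "Date_of_Transfer")
str(demo)
```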

So, on to my first project using this cloud-based R. I’m going to look at the different analysis tools in R using the UK Land Registry data from data.gov.uk. In this post I’ll be playing with just one month’s worth of data and using the package ggplot2.

ggplot2 is easy to use and produces very nice-looking charts with very little effort. After loading the data into a data frame and doing a little processing on the dates, the code

library(ggplot2)
ggplot(data = landregistry, aes(x = Transfer_Year, y = Price)) + geom_smooth()

produces this chart

[Chart: geom_smooth() of Price against Transfer_Year]

As expected, it shows the smoothed average house price rising between 1995 and 2015.

This code, which plots a data frame of yearly average prices,

ggplot(data = AveragePrice, aes(x = Year, y = Average_Price)) + geom_point(shape = 1) + geom_line()

produces this chart

[Chart: Average_Price against Year, drawn with geom_point() and geom_line()]
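The post doesn’t show how the AveragePrice data frame was built. One plausible way, sketched here on a tiny made-up data frame, is base R’s `aggregate()` taking the mean Price for each Transfer_Year. The sample values, and renaming the columns to Year and Average_Price to match the plotting call, are assumptions rather than the original code.

```r
# Hypothetical miniature stand-in for the landregistry data frame
landregistry <- data.frame(
  Transfer_Year = c(1995, 1995, 2015, 2015),
  Price = c(50000, 70000, 250000, 290000)
)

# Mean Price for each Transfer_Year (groups come back sorted by year)
AveragePrice <- aggregate(Price ~ Transfer_Year, data = landregistry, FUN = mean)

# Rename to the column names used in the ggplot() call
names(AveragePrice) <- c("Year", "Average_Price")
AveragePrice
```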

Converting the input text into dates was tricky at first, but once I found the right code it was quite easy, using the function as.POSIXct combined with the function as.Date:

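# format and tz are named: the second positional argument of as.POSIXct() is tz, not format
landregistry$Transfer_Date <- as.Date(as.POSIXct(landregistry$Date_of_Transfer, format = "%Y-%m-%d %H:%M", tz = "UTC"))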
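Here is a small sketch of the conversion on two made-up strings in the "YYYY-MM-DD HH:MM" layout. It passes `format` and `tz` by name, because the second positional argument of `as.POSIXct()` is `tz`; fixing `tz = "UTC"` means midnight timestamps cannot land on the previous day whatever the machine’s local time zone. It also derives a year column of the kind the first chart plots as Transfer_Year; how the original code created that column isn’t shown, so that step is an assumption.

```r
# Two made-up strings in the "YYYY-MM-DD HH:MM" layout parsed above
raw <- c("1995-01-01 00:00", "2015-06-30 00:00")

# Name format and tz explicitly; parsing in UTC keeps midnight from
# rolling back a day when as.Date() converts the times to dates.
transfer_date <- as.Date(as.POSIXct(raw, format = "%Y-%m-%d %H:%M", tz = "UTC"))

# Alternative: as.Date() parses the leading date and ignores the time part
transfer_date_alt <- as.Date(raw, format = "%Y-%m-%d")

# Hypothetical Transfer_Year values, as an integer year
transfer_year <- as.integer(format(transfer_date, "%Y"))
```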

I haven’t had much time to play with all of this yet, but what I have done so far has been interesting. Hopefully I’ll be able to post more soon. The code I’ve used to get this far is saved here.
