Land Registry Part 2

In my first Land Registry post I imported a month’s worth of Land Registry data, named the rows and had a go at using the ggplot2 package to produce a number of nice-looking charts. This time I want to progress a little further. My aims, using the same dataset, are to:

  1. Look at the distribution of prices
  2. Look at the prices by different factors
    1. Initially just using factors in the land registry data

The first aim is easy: plotting a histogram only requires the code

#Look at the distribution of prices
hist(landregistry$Price)

Which produces the chart

rplot3
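If you want to read the distribution as numbers rather than squinting at the chart, `hist()` can return the bins without drawing anything. This is only a minimal sketch: the `toy_prices` vector and the bin width are made up for illustration and are not taken from the Land Registry file.

```r
# Toy prices, invented for illustration only (not real Land Registry values)
toy_prices <- c(95000, 120000, 150000, 180000, 210000,
                250000, 320000, 450000, 900000, 2500000)

# plot = FALSE returns the bin edges and counts instead of drawing the chart
h <- hist(toy_prices, breaks = seq(0, 2500000, by = 500000), plot = FALSE)

h$breaks  # bin edges
h$counts  # number of sales falling in each bin
```

The same call should work on `landregistry$Price`, as long as the breaks cover the full range of prices.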

The problem is that this data contains sales from different times, not just 2016 sales. I only want to see the sales from 2016, so I need to filter the sales first and then produce a histogram of what is left. This is done using the code

#Look at the distribution of prices for 2016 sales
sales2016 <- landregistry[ which(landregistry$Transfer_Year==2016), ]
hist(sales2016$Price)

To produce the chart

rplot02

In the end it looks very similar to the histogram of the unfiltered data frame, but filtering was the right thing to do either way.
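A quick aside on why the filter is wrapped in `which()`. The sketch below uses a tiny made-up data frame (`toy` and its values are assumptions, only the column names match the ones used above) to show how the different ways of filtering behave when the year is missing.

```r
# Toy stand-in for the land registry data; the values are invented for illustration
toy <- data.frame(
  Price         = c(150000, 210000, 95000, 320000),
  Transfer_Year = c(2016, 2015, NA, 2016)
)

# which() returns only the positions where the test is TRUE, so the NA row is dropped
with_which <- toy[which(toy$Transfer_Year == 2016), ]

# Plain logical indexing returns an extra all-NA row for every NA in the test
without_which <- toy[toy$Transfer_Year == 2016, ]

# subset() also treats NA as "not selected"
with_subset <- subset(toy, Transfer_Year == 2016)

nrow(with_which)     # 2
nrow(without_which)  # 3
nrow(with_subset)    # 2
```

If the year column has no missing values, all three approaches give the same rows.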

Now to move onto the house prices by the different factors. In the data there are a number of different factors which could be worth looking at:

  • Property type
  • Old/new
  • Duration
  • County
  • Date of transfer (I’ll just look at the month)

There are details on these variables here. I could also look at details based upon the postcode but I don’t feel like looking at them at the moment. Throughout this section I’ll be using the 2016 only data.

Using the plyr package

#Load plyr, which provides ddply
library(plyr)
#Count the sales and average the price for each transfer month
Average_Month <- ddply(sales2016, c("Transfer_Month"), summarise,
  Sales         = length(Price),
  Average_Price = mean(Price)
)
#Plot a bar chart of the results
barplot(Average_Month$Average_Price, main="Average price by month", xlab="Transaction Month")

You get the chart

rplot04

Which isn’t all that informative. What does it look like in ggplot? Using the code

ggplot(data=Average_Month, aes(x=Transfer_Month, y=Average_Price)) + geom_bar(stat="identity")

rplot05

Which is a bit better, but not much. Using the same process for property type and old/new we get:

rplot07

rplot06
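For reference, the grouping step behind the property type and old/new charts can be sketched as below. This is a minimal sketch on a made-up data frame: `toy_sales`, its values, and the column names `Property_Type` and `Old_New` are all assumptions, so swap in whatever the columns are called in your own `sales2016`.

```r
library(plyr)

# Toy stand-in for sales2016; Property_Type and Old_New are assumed column names
# and the prices are invented for illustration
toy_sales <- data.frame(
  Price         = c(400000, 300000, 200000, 150000, 250000, 100000),
  Property_Type = c("D", "D", "S", "T", "F", "F"),
  Old_New       = c("Y", "N", "N", "N", "Y", "N")
)

# Same summary as for the month, grouped by property type instead
Average_Type <- ddply(toy_sales, c("Property_Type"), summarise,
  Sales         = length(Price),
  Average_Price = mean(Price)
)

# And again grouped by the old/new flag
Average_OldNew <- ddply(toy_sales, c("Old_New"), summarise,
  Sales         = length(Price),
  Average_Price = mean(Price)
)

Average_Type
Average_OldNew
```

Either result can then go into the same ggplot call as before, with `x` set to the grouping column.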

These charts show that, unsurprisingly, detached houses are the most expensive, with flats, terraced houses and semi-detached houses being cheaper. It is interesting that on this view new builds are more expensive, but I think there is more to it than that. I haven't looked at the other variables I said I would because I have run out of time.

Hopefully I'll have time to look into this data more and post what I find. I'll try to post my R code from what I do here.
