After my last post on the ONS data structure this post is the first of a few on using that structure and some other public data, mostly UK government data, and mapping it using R. This first post is about getting shapefiles from various locations, loading them into R and plotting them.
First I need some shape files of the UK. I’ve found UK shapefiles for counties and countries, the ONS LSOA boundaries and postcode boundaries at sector, district and area level. For these I used these sources:
|Shapefile(s)||Source 1||Source 2|
|UK Counties and Countries||http://gadm.org/|
Note that you will need to read each of their private policies to see how their data can be used. UK government data can usually be used privately and commercially but the other sources may not be so generous.
The code I’m using throughout this project is saved on my GitHub account here. The different data sources I’ve got my data from generally use one of two map projections
- OSGB36 – The Ordinance Survey’s national grid. This uses Eastings and Northings and gives the number of metres north and east of a point just off the south west of Cornwall.
- WGS84 – Longitude and Latitude using the World Geodetic System
More info on these can be found at www.epsg.org
The general process for loading in the shapefiles is to first use the function readOGR from the package rgdal. Then use the function spTransform, again from rgdal, to convert the coordinates into the correct coordinate system. You also need to have the package sp loaded for the function CRS to work. CRS tells you what projection the current SpatialPointsDataFrame uses.
The code does also try to dissolve the postcode areas away to give a map of the UK using the functions spChFIDs (sp) and gUnaryUnion (rgeos) but it doesn’t seem to work that well, that is also why the code loads the various ONS lookup files and doesn’t use them I just can’t get the dissolve to work to get the boundaries for MSOA and LADs.
This process is repeated for each of the shapefiles.
Now that they have all been loaded we can see what they look like using progressively more complicated code. All of the examples below will use the postcode area file since this contains a nice number of areas to plot. And produces a nice comparison to what I did with choropleth maps in Excel and how much simpler this process is!
spplot(Post.Area.SP, "name", colorkey=FALSE)
Gives the slightly more colourful
At the moment this is just giving the areas random colours (I think), how do we get some real data on here. Here I merge on some real data (saved here) and plot the population of each area in England and Wales
What about plotting this on a Google map
Which is already much nicer than the Excel choropleth map. And with a few extra tweaks
plotGoogleMaps( SP = Post.Area.SP2,
filename = 'GooglePostArea.htm',
zcol = 2,
fillOpacity = 0.5)
If you output the HTML file you can open it even if you aren’t connected to the internet, your browser just doesn’t open the map underneath. And the HTML file is here GooglePostArea.
Now the problem here is that the above map is over 20mb which is a bit too big for such a simple map. This is because the shape file is very, very detailed. So for plotting purposes we can simplify the data using the function gSimplify (rgeos).
Post.Area.SP2 <- gSimplify(Post.Area.SP,500,topologyPreserve = TRUE)
Which creates this chart,GooglePostArea500m , which is what I’ll use for plotting in future posts.
Now I just need to find some actual data to process.