Recently I’ve been working through the various courses on DataCamp. It soon became apparent that this was something I should have done long ago, as it would have saved me a lot of time reading through various blogs and Stack Overflow questions.
In my first land registry post I imported a month’s worth of land registry data, named the rows and had a go at using the ggplot2 package to produce a number of nice-looking charts. This time I want to progress a little further. My aims, using the same dataset, are to:
- Look at the distribution of prices
- Look at the prices by different factors
- Initially just using factors in the land registry data
My computer has been struggling with some of the code I’ve been trying to run; it is pretty old and doesn’t have enough memory for large datasets in R. So rather than buy a better laptop, I’ve set up an Amazon Web Services account and, using this guide, set up a computer so I don’t have to use mine. I’m only using the free one for now, but if I want to have a go at processing something larger this will allow me to pay a small fee to use a more powerful machine for a short period of time.
Sometimes I have a workbook containing lots of sheets of similar format, all using the same colour scheme. If after a while I decide I don’t like the colours any more, it can be quite annoying to change them all. So I decided to write this short macro to change the colours.
"You know they'll never really die while the Trunk is alive[...]
It lives while the code is shifted,
and they live with it, always Going Home."
- Moist von Lipwig, Going Postal, Chapter 13
It is a year and a day since Sir Terry Pratchett, author of the Discworld series amongst other things, died. But it is nice to see that a year on he still lives on, not just through his books but through the Clacks Overhead. This was a little movement by web developers to keep his memory alive.
Time is of the essence with this post, so please excuse the strange mix of notations later on. One of my friends sent me this puzzle from fivethirtyeight.com and here is my solution for how fast the dog needs to be to catch the duck.
Lots of people's first instinct is π times faster, but this is really just a lower bound, i.e. if the dog travels less than π times faster than the duck then the duck can just head radially to the point opposite the dog's starting position.
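To see where the π comes from, here is a rough sketch of the arithmetic. The setup is my assumption about the puzzle: a circular pond of radius r, the duck starting at the centre and swimming at speed v, and the dog running around the edge at speed kv. The symbols r, v and k are only for illustration.

```latex
% Assumed setup: pond radius r, duck speed v, dog speed kv.
% The duck swims one radius; the dog must run half the circumference.
\[
  t_{\text{duck}} = \frac{r}{v},
  \qquad
  t_{\text{dog}} = \frac{\pi r}{k v}
\]
% The duck reaches the far bank first when its time is the smaller one.
\[
  \frac{r}{v} < \frac{\pi r}{k v}
  \;\Longleftrightarrow\;
  k < \pi
\]
```

So for any k below π the straight-line dash is enough for the duck, which is why π can only be a lower bound on the speed the dog needs.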
There are a few key assumptions here that we can make from the beginning:
In SAS it is easy to loop a macro between two numbers:
%DO I=1 %TO 10; ... %END;
But if you have a list of non-sequential numbers or text you want to run your macro over, e.g. a list of towns, it can be a bit trickier. This SUGI paper gives a macro which lets you do just that. A few years ago I looked at that macro and didn't really understand how it worked. Not wanting to use code I didn't understand, I wrote my own version.
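To give a flavour of the general idea, here is a minimal sketch of looping over a space-delimited list. It is neither the macro from the SUGI paper nor my own version, just one common pattern; the macro name loop_over and the example towns are made up for illustration.

```sas
/* Hypothetical macro: runs once per item in a space-delimited list */
%macro loop_over(list);
    %local i item;
    /* COUNTW counts the space-delimited words in the list */
    %do i = 1 %to %sysfunc(countw(&list, %str( )));
        /* %SCAN pulls out the i-th word */
        %let item = %scan(&list, &i, %str( ));
        /* Placeholder: replace this with the real work for each item */
        %put NOTE: processing &item;
    %end;
%mend loop_over;

%loop_over(Leeds York Bristol)
```

Because the list is split on spaces, items that contain spaces themselves (a two-word town name, say) would need a different delimiter in both COUNTW and %SCAN.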
I’ve been trying to improve my Excel choropleth map spreadsheet from my first post.
The first thing I tried was to update it using some maps created on ClearlyandSimply.com. So I’ve created two new versions, one for Europe and one for the World, both using the maps from ClearlyandSimply.com, with a few small alterations to the code so that hovering over the map tells you the country rather than the abbreviation. I did try to make my own UK maps of Constituency boundaries and Counties using this technique, but no matter how hard I tried I couldn’t get it to work. If you do get it to work, please let me know. Note that in both the Europe and World maps the data may not be accurate and is only there for illustrative purposes.
One of the most frequent tasks I do is summarising data using either proc sql or proc means with code like this:
proc means data=inputdata nway missing noprint;
    class var1 var2;
    var var3 var4;
    output out=outputdata (drop = _type_ _freq_) sum=;
run;
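For comparison, the proc sql route could look roughly like the sketch below. It reuses the same placeholder names (inputdata, outputdata, var1 to var4) and is intended to give the same grouped sums, though I haven't checked the two side by side.

```sas
/* Sketch: grouped sums with proc sql, mirroring the proc means step */
proc sql;
    create table outputdata as
    select var1,
           var2,
           sum(var3) as var3,   /* keep the original variable names */
           sum(var4) as var4
    from inputdata
    group by var1, var2;        /* missing values form their own group */
quit;
```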
Given that I use it in SAS a lot, I’m going to assume that I’ll use it in R a lot too, so it seems like the next sensible thing to learn.