Simple machine learning in R - Decision Trees

Now that the land registry data has been imported and had some initial exploratory work done to it lets have a go at making a price prediction model. I’ll use a small subset of the data and initially only try to predict whether or not the house is worth more or less than £500k, rather than the more complicated process of predicting the price. The code used in this post is largely based upon the DataCamp course “Introduction to Machine Learning”. Code for this project is on my GitHub page here. This post focuses on decision trees using the package rpart. Read More

Summarising data in R

Summarising data

One of the most frequent tasks I do is summarising data using either proc sql or proc means with code like this:

proc means data=inputdata nway missing noprint;
    class var1 var2;
    var var3 var4;
    output out=outputdata (drop = _type_ _freq_) sum=;

Given that I use it in SAS a lot I’m going to assume that I’ll use it in R a lot so it seems like the next sensible thing to learn.
Read More