Recently I’ve been working through the various courses on DataCamp. It soon became apparent that this was something I should have done long ago: it would have saved me a lot of time reading through various blogs and Stack Overflow questions.
The most recent course I’ve worked on is about importing data. It starts off using the standard functions, e.g. read.csv, but it then moves on to what the course claims are better and quicker functions from the packages readr and data.table. While I do believe the course, I wanted to find out for myself which was quicker. So, using data from the UK Land Registry, I set up this little test.
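For anyone who wants to try a comparison like this themselves, here is a rough sketch of how it could be done. To be clear about the assumptions: it does not use the Land Registry file. It writes a small synthetic CSV to a temporary file so that it runs anywhere, and the column names (id, price, town), the row count and the time_read helper are all made up for illustration. It also assumes readr and data.table are installed. It times a single read per function, so treat the numbers as indicative only.

```r
# Sketch only: times read.csv, read_csv and fread on a synthetic CSV.
# The data is a stand-in, NOT the UK Land Registry file.
library(readr)
library(data.table)

set.seed(1)
n <- 100000  # hypothetical row count, chosen so the sketch runs quickly

# Write the stand-in data to a temporary CSV file.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    id    = seq_len(n),
    price = round(runif(n, 5e4, 1e6)),
    town  = sample(LETTERS, n, replace = TRUE)
  ),
  csv_path,
  row.names = FALSE
)

# Hypothetical helper: elapsed wall-clock seconds for one read of the file.
time_read <- function(reader) {
  system.time(reader(csv_path))[["elapsed"]]
}

timings <- c(
  read.csv = time_read(function(p) read.csv(p)),
  # suppressMessages hides readr's column specification message
  read_csv = time_read(function(p) suppressMessages(read_csv(p))),
  fread    = time_read(function(p) fread(p))
)

print(timings)
```

Swapping csv_path for the path to a real file would time the same three functions on real data.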
I ran the test on my not very good home laptop and on a virtual machine on AWS, and got these results:
Wow, my computer is slow compared to the Amazon one. I checked, and the data was loaded correctly in the Amazon case.
From this not very scientific test we can see that, on my laptop, fread and read_csv are both much quicker than read.csv. fread is slightly quicker in this case, but there isn’t much in it. The AWS machine shows the same pattern, so from now on I’ll use either fread or read_csv. Thanks, DataCamp.
On a different note, it turns out that, like physical computers, virtual machines need to be rebooted every so often, otherwise they crash.