Treasure Island

Creating a punctuation worksheet using Python

I wanted to make some simple grammar worksheets based on an existing book, rather than making some up myself. And rather than just opening a book and typing the sentences out, I thought I’d use Python. This way you can personalise the worksheet to a book the kids like and that has language at their level. To make things extra difficult I’ve done this on a Raspberry Pi.

Spoiler - here is the Treasure Island one.

Challenge

Extract random sentences from an ebook (.epub) of choice, remove any punctuation and capitalisation, and display the result to the user.

Issues

  1. A Python IDE I like isn’t installed on my Pi and I don’t really understand Linux.
  2. Once in Python, I don’t really know where to start.

Solution

First, I installed Jupyter using this guide. This took more restarts than I realised it would, and I lost the first draft of this post in the process.

Then I got an ebook of Treasure Island, along with a handful of other books for testing, from Project Gutenberg. Easy bits done!

After much messing around I found this post on Medium, which gave me the functions I needed to extract the text from the book. With some keyboard mashing I got it working, imported my book and created a list of all of its sentences. I then added some functions to remove line breaks from the sentences, pick 15 sentences at random, strip out the punctuation, and build a pandas DataFrame of the questions and answers. The gist is sketched below.
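Here’s a minimal sketch of that pipeline. The package choices are assumptions on my part rather than an exact copy of the Medium post’s code: ebooklib and BeautifulSoup to pull the text out of the .epub, nltk to split it into sentences, and the file and function names are just placeholders.

```python
# Sketch of the worksheet pipeline: .epub -> text -> sentences -> Q&A pairs.
import random
import string

import ebooklib
import nltk
import pandas as pd
from bs4 import BeautifulSoup
from ebooklib import epub

nltk.download("punkt", quiet=True)  # sentence tokeniser model


def epub_to_text(path):
    """Join the visible text of every HTML document inside the epub."""
    book = epub.read_epub(path)
    parts = []
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        soup = BeautifulSoup(item.get_content(), "html.parser")
        parts.append(soup.get_text())
    return " ".join(parts)


def make_questions(path, n=15):
    """Pick n random sentences; strip punctuation and capitals for the questions."""
    sentences = nltk.sent_tokenize(epub_to_text(path))
    # Collapse line breaks so each sentence sits on a single line.
    sentences = [" ".join(s.split()) for s in sentences]
    answers = random.sample(sentences, n)
    # string.punctuation plus the curly quotes Project Gutenberg texts use.
    punct = string.punctuation + "“”‘’"
    questions = [a.translate(str.maketrans("", "", punct)).lower() for a in answers]
    return pd.DataFrame({"question": questions, "answer": answers})


df = make_questions("treasure_island.epub")  # placeholder file name
print(df.head())
```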

To get the data into a PDF I relied heavily on this post, which uses the packages Jinja and WeasyPrint. I had to set up an HTML template, and for the styling I used the same CSS file as in the post, as I don’t really understand CSS. The rendering step looks roughly like the sketch below.
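A sketch of the rendering step, not my exact code: the template name (worksheet.html), stylesheet (style.css) and template variables are placeholders for my actual files, and df is the DataFrame built in the previous snippet.

```python
# Render the questions into HTML via a Jinja template, then print to PDF.
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML

env = Environment(loader=FileSystemLoader("."))
template = env.get_template("worksheet.html")

# Fill the template with the questions; the answers could go on a
# second page or a separate answer sheet.
html_out = template.render(
    title="Treasure Island punctuation worksheet",
    questions=df["question"].tolist(),
)

# WeasyPrint lays out the HTML with the borrowed CSS and writes a PDF.
HTML(string=html_out).write_pdf("worksheet.pdf", stylesheets=["style.css"])
```

WeasyPrint does the heavy lifting here: it lays out the HTML and CSS much as a browser would, which is why borrowing an existing CSS file is enough to get a decent-looking worksheet.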

Once that was done I could run the code over my book and create my worksheets.

I’ve tested it on a number of books and it seems to work pretty well. There are some issues where characters speak more than one sentence, which leaves unmatched opening quotation marks in the answers, but I’m happy with the result. Another issue is that it sometimes includes copyright notices in the questions; to get around that I just run the code again to generate a new set.

Example Questions:

  • had there been a breath of wind we should have fallen on the six mutineers who were left aboard with us slipped our cable and away to sea
  • and who may you be and then as he saw the squires letter he seemed to me to give something almost like a start

The code to make these challenges, including the CSS and HTML templates, is on my GitHub page here.

Example worksheets

Shapefiles in R

I’ve posted before on this and I clearly didn’t know what I was doing. I still don’t really know what I’m doing, but I now have some pretty pictures and that’s all anybody really wants. In this post I’m going to import a postcode shapefile from the OS, plot the postcodes in R, find the neighbours of each postcode and convert the data into a network graph. GitHub repository here.

Read More

Simple machine learning in R - Caret

In the last two posts I created some simple decision trees and tested their accuracy. Now it’s time to try some other models. As before, I’m going to continue predicting the variable FiveHundredPlus with a limited set of factors to keep the processing load down. Once I’m a bit more confident I’ll move to the larger dataset and a more powerful machine. I’m going to use the caret package and recreate this post from Analytics Vidhya.

Full code is saved on my GitHub page here.

Read More

Simple machine learning in R - Decision Trees

Now that the Land Registry data has been imported and given some initial exploratory work, let’s have a go at making a price prediction model. I’ll use a small subset of the data and initially only try to predict whether or not a house is worth more than £500k, rather than the more complicated task of predicting the price itself. The code used in this post is largely based on the DataCamp course “Introduction to Machine Learning”. Code for this project is on my GitHub page here. This post focuses on decision trees using the rpart package.

Read More