Here are links to some of my best work this year:
R for Journalists
In the autumn of 2016 I created R for Journalists, one of the first websites anywhere to look at how the R programming language could be used for journalism.
I started it out of frustration – I was looking into how I could use R, but all the resources I could find online were geared towards data science, were full of jargon and assumed a relatively high level of prior knowledge.
I decided to go back to basics and set up a website designed to be as accessible as possible for beginners looking to get into R.
In the last 12 months I have published 18 posts chronicling my own data journalism using R and working through examples.
Automated weekly print pages
The Trinity Mirror Data Unit prepares print pages for 34 of our weekly titles. Every week our team produces automated print pages about a data topic such as crime, house sales or health.
Some of these pages are my responsibility, including the ones about crime.
The data I needed was spread across hundreds of open spreadsheets of monthly street-level crime figures available from data.police.uk, containing millions of rows in total.
I collated them using R, found the areas that were relevant to our titles, made an overview of the situation and identified the worst five streets for each area. This included using an R package to make API calls to Google to get the postcodes of the worst streets.
This entire process is automated, meaning we can produce 34 detailed print graphics that are individually relevant to each of our titles.
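The ranking step described above can be sketched in a few lines. This is only an illustration, written in Python rather than the R used for the real pages. The sample rows, the area and street names and the `worst_streets` function are all hypothetical, and the real data.police.uk files contain many more columns.

```python
from collections import Counter, defaultdict

# Hypothetical sample: one (area, street) pair per recorded crime.
# These names are made up for illustration only.
crimes = [
    ("Area A", "High Street"),
    ("Area A", "High Street"),
    ("Area A", "High Street"),
    ("Area A", "Station Road"),
    ("Area A", "Station Road"),
    ("Area A", "Mill Lane"),
    ("Area B", "Church Lane"),
    ("Area B", "Church Lane"),
    ("Area B", "Park Road"),
]


def worst_streets(rows, top_n=5):
    """Return {area: [(street, crime_count), ...]} with the top_n streets per area."""
    counts = defaultdict(Counter)
    for area, street in rows:
        counts[area][street] += 1
    # most_common sorts streets by descending crime count
    return {area: tally.most_common(top_n) for area, tally in counts.items()}


for area, streets in worst_streets(crimes).items():
    print(area, streets)
```

Keeping one tally per area means the same function can serve every title, which is what makes a fully automated weekly run practical.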
School absence and deprivation
I analysed the Government’s school absence data and combined it with the latest Index of Multiple Deprivation to see whether pupils who go to school in poorer areas were more likely to miss school without their parents’ permission.
I used R to analyse the hundreds of thousands of rows of raw data.
My hunch was correct: poorer pupils were more likely to skip school than their richer peers. This lack of attendance won’t just be truancy – some pupils won’t be at school because of their chaotic family lives, according to two psychologists I spoke to for the piece.
This absence in turn will likely affect their ability to get into universities or further education in the future.
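For readers curious how two datasets like these can be combined, here is a minimal sketch in Python (the original analysis was done in R). The school names, area codes, absence rates and the `mean_absence_by_decile` function are hypothetical. It assumes each school can be matched to an area with a deprivation decile, where 1 means most deprived. It also takes a simple average across schools and does not weight by pupil numbers, which a real analysis might want to do.

```python
from collections import defaultdict

# Hypothetical rows: (school, area_code, unauthorised_absence_rate_percent)
absence = [
    ("School 1", "E001", 2.4),
    ("School 2", "E001", 2.0),
    ("School 3", "E002", 0.9),
    ("School 4", "E003", 1.1),
]

# Hypothetical lookup: area_code -> deprivation decile (1 = most deprived)
deprivation = {"E001": 1, "E002": 10, "E003": 10}


def mean_absence_by_decile(absence_rows, decile_lookup):
    """Join schools to deciles by area code and average the absence rate per decile."""
    totals = defaultdict(lambda: [0.0, 0])  # decile -> [sum of rates, number of schools]
    for _school, area, rate in absence_rows:
        decile = decile_lookup.get(area)
        if decile is None:
            continue  # skip schools whose area has no deprivation record
        totals[decile][0] += rate
        totals[decile][1] += 1
    return {decile: total / n for decile, (total, n) in sorted(totals.items())}


print(mean_absence_by_decile(absence, deprivation))
```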
Prisons under Pressure
The English and Welsh prison system is a mess. I’ve reported for years about how prison violence is rising, self-harm is increasing and drug-taking is rife.
With Prisons under Pressure I brought all these publicly available datasets together to show at a glance just how serious the violence and drug use have become.
With our coder Ashley Brown and designer Kelly Leung we created an innovative chart evoking prison bars. The user selects a prison, ‘unlocks’ an indicator and the chart ranks the facility, showing how bad it is in that prison and how it compares to the rest of the country.
Cyclists hitting car doors
I analysed millions of rows of the STATS19 data from the Department for Transport using R. The data is taken from police records of road accidents across Britain in which at least one person was injured or killed.
I narrowed it down to cyclists getting injured or killed by colliding with car doors. Drivers can get very careless when opening car doors, not paying attention to oncoming cyclists.
The charity Cycling UK advocates using the so-called ‘Dutch reach’ – using your opposite hand to open a car door, which forces your body to turn and increases the likelihood of spotting a cyclist either directly or in your wing mirror.
It’s well known that children born in the summer fare worse at school than those who are born in the autumn at the start of the academic year.
An extra 10 or 11 months to develop is a long time for five-year-olds. But could this advantage in the early years extend all the way to university?
To find out, I submitted Freedom of Information requests to the universities in Britain’s leading Russell Group, asking them for details on the month of birth of their first-year undergraduates for the past five years.
I calculated the proportion of students born in each month. The job wasn’t done – births are not evenly distributed throughout the year, so raw percentages alone would mislead. As a baseline, I used 20 years of England and Wales births data from the Office for National Statistics.
My findings showed that there was a bias towards students born in September and October and away from students born in July and August.
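The comparison against a births baseline can be sketched as follows. This is an illustration in Python, not the code used for the story, and every number in it is made up. The function names are hypothetical too. A ratio above 1 means a month is over-represented among students relative to its share of births, and a ratio below 1 means it is under-represented.

```python
def month_shares(counts):
    """Convert {month: count} into {month: share of the total}."""
    total = sum(counts.values())
    return {month: n / total for month, n in counts.items()}


def relative_representation(student_counts, birth_counts):
    """Ratio of each month's share of students to its share of all births."""
    students = month_shares(student_counts)
    births = month_shares(birth_counts)
    return {month: students[month] / births[month] for month in students}


# Made-up counts for illustration only; not real university or ONS figures.
students = {"Jan": 400, "Feb": 370, "Mar": 410, "Apr": 395, "May": 405, "Jun": 390,
            "Jul": 380, "Aug": 370, "Sep": 450, "Oct": 440, "Nov": 400, "Dec": 390}
births = {"Jan": 54000, "Feb": 50000, "Mar": 55000, "Apr": 53000, "May": 56000, "Jun": 55000,
          "Jul": 58000, "Aug": 57000, "Sep": 58000, "Oct": 57000, "Nov": 53000, "Dec": 54000}

for month, ratio in relative_representation(students, births).items():
    print(f"{month}: {ratio:.2f}")
```

Dividing by the baseline share rather than by one twelfth is what corrects for some months simply having more births than others.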
To show the results, our developer Carlos Novoa, designer Kelly Leung and I created this widget that revealed for the first time the summer gap at British universities.
You could select your university and it showed you whether students there born in each month tallied with what we would expect.
This is something schools minister Nick Gibb is concerned about.
A Department for Education spokesman told me: “We want all children to have an equal chance to excel in school and are concerned that some children may be missing the reception year.
“We are carefully considering how best to address these issues and any impact this may have on the admissions system.”
High Speed Rail or Slow Speed Fail?
A perennial complaint from town halls outside London is that the capital has vastly superior infrastructure compared to the rest of the country.
London has its own bus regulation, Britain’s only extensive Underground system, Crossrail and Britain’s only current High Speed rail terminus. This gives London an unfair advantage in the eyes of many outside the capital.
I wanted to investigate whether trains to and from London were faster than those to and from Britain’s other major cities.
I downloaded the coordinates of each train station in Britain from the Department for Transport’s records.
Then I selected Britain’s largest cities and looped through each pair of stations, applying a formula to get the distance as the crow flies between them.
That gave us the distances. Remember your school physics classes: Speed = distance / time.
We needed the time by train between each station pair. We were able to get this by scraping the Trainline website – here’s an example page.
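The distance and speed steps above could look something like this. It is a rough sketch in Python, not the code used for the story. The formula used in the original analysis is not named here, so the haversine formula below is an assumption, as are the approximate station coordinates, the made-up journey time and the function names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ('as the crow flies') distance in km between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_kmh(distance_km, minutes):
    """Speed = distance / time, with the time converted from minutes to hours."""
    return distance_km / (minutes / 60)


# Approximate coordinates (latitude, longitude), for illustration only.
stations = {
    "London Euston": (51.5282, -0.1337),
    "Manchester Piccadilly": (53.4774, -2.2309),
}

# Hypothetical journey time in minutes; the real figures came from scraping.
journey_minutes = {("London Euston", "Manchester Piccadilly"): 126}

for (origin, destination), minutes in journey_minutes.items():
    distance = haversine_km(*stations[origin], *stations[destination])
    print(f"{origin} to {destination}: {distance:.0f} km, "
          f"{speed_kmh(distance, minutes):.0f} km/h")
```

Because the distance is a straight line rather than the length of the track, the result is an effective speed between two cities, which is what matters when comparing how well connected they are.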
The results confirmed what we thought: Britain’s provincial cities really do have slow train connections with one another compared with their trains to London.
The fastest trains were all to and from London. It’s much easier to go North-South or South-North in Britain by train than to go East-West. This left some isolated cities such as Hull and Plymouth with slow rail connections.