Online journalism • 28 May 2013

Aspiring data journalist? This book is a must-read

By Peter Verweij

It was a work in progress, but after almost a year and 40 ‘versions’, Paul Bradshaw’s ‘Scraping for Journalists’ has been published. Bradshaw teaches at City University London and Birmingham City University, and he is also a respected data journalist and blogger at the Online Journalism Blog. And not without reason.

You can order the work as an e-book in PDF, Mobi or EPUB format. Leanpub, where you can obtain a copy, has an interesting concept: it offers all the tools for producing and publishing a book. You can make changes and additions while publishing and, not unimportantly, the royalties are higher than in traditional publishing. Bradshaw says he has “become a huge fan” as “the format combines the best qualities of traditional book publishing with those of blogging and social media.”

Must read

‘Scraping for Journalists’ is a must-read for data journalists. One of their recurring problems is how to get data from online resources into a spreadsheet. Scraping is the answer. But how do you do that, given that most journalists are not coders? In 30 chapters and almost 500 pages, Bradshaw gives his recipes for scraping data. The book is not meant to be read from cover to cover but to be learned from by doing. You follow the recipes step by step on your computer, add some variation to the examples and finally try to apply the recipes to your own data. This works wonderfully, because learning to program from scratch takes too long before you get results. Here you get ready-made code that works, and you can experiment until you can successfully apply it to your own data.

Fast start

You can make a quick start from chapter one: within five minutes, you can scrape your first data. Bradshaw starts by explaining the ImportHTML and ImportXML functions, which are used in Google Drive spreadsheets to import data from a web page. The trick is to find the right table or list containing the data. You can dig deep into the HTML or XML soup, but you can also guess and experiment. Just try some numbers in the expression, advises Bradshaw.
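The ‘try some numbers’ approach can also be imitated outside a spreadsheet. What follows is only a rough sketch in Python, not a recipe from the book: the names TableCollector, import_table and SAMPLE_PAGE are made up for illustration, the page is a hard-coded string rather than a live website, and nested tables are not handled. It collects every table on a page and returns the one at the position you ask for, counting from 1 as ImportHTML does.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> on a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per table found so far
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_table(html, index):
    """Return the index-th table (counting from 1) found in the HTML."""
    collector = TableCollector()
    collector.feed(html)
    return collector.tables[index - 1]


# Hypothetical page: the first table is a menu, the second holds the data.
SAMPLE_PAGE = """
<table><tr><td>Menu</td></tr></table>
<table>
  <tr><th>Member</th><th>Party</th></tr>
  <tr><td>A. Jansen</td><td>Green</td></tr>
</table>
"""

# Guess and experiment: print each candidate table and see which one fits.
for guess in (1, 2):
    print(guess, import_table(SAMPLE_PAGE, guess))
```

Here the first guess returns the menu table and the second returns the data you are after, which is exactly the kind of trial and error Bradshaw recommends.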

Of course, extracting tables from a website can be done faster with a nice tool called OutWit Hub. You just load the web page with your data in OutWit and push the ‘table’ button, and there is your scraped data, ready to be exported in Excel format. The free version works, but Bradshaw advises buying the paid version for about 60 euros, because it lifts the free version’s limit of a hundred scraped lines. This is useful when you are scraping a lot of data. Take, for example, 150 members of parliament who all have their own web pages. If the pages are structured in the same way, with a heading or paragraph where the members state their education and former jobs, collecting this by hand, page after page, is boring and time-consuming. Instead, you can build a scraper based on the opening and closing tags around education and jobs, then run it over the 150 individual member pages. Have a cup of coffee and, after a while, your data will be ready for exporting to Excel. Bradshaw takes great care in explaining how to find the opening and closing tags in the HTML soup for the data you are looking for, so that you will get the scraper working after a while.
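The idea of a scraper built on opening and closing tags can be sketched in a few lines of Python. This is an illustration under assumptions, not OutWit Hub’s method or code from the book: the two member pages, the tag markers and the function names text_between, scrape_members and to_csv are all invented for the example, and real pages would first have to be downloaded.

```python
import csv
import io


def text_between(html, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker."""
    start = html.find(start_marker)
    if start == -1:
        return ""  # marker not on this page
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return ""
    return html[start:end].strip()


# Hypothetical stand-ins for downloaded member pages with identical structure.
PAGES = {
    "member-1": "<h1>A. Jansen</h1><p class='education'>Law, Leiden</p>"
                "<p class='jobs'>Lawyer</p>",
    "member-2": "<h1>B. de Vries</h1><p class='education'>Economics, Utrecht</p>"
                "<p class='jobs'>Teacher</p>",
}


def scrape_members(pages):
    """Build one row per page: page name, education, former jobs."""
    rows = []
    for name, html in pages.items():
        education = text_between(html, "<p class='education'>", "</p>")
        jobs = text_between(html, "<p class='jobs'>", "</p>")
        rows.append([name, education, jobs])
    return rows


def to_csv(rows):
    """Turn the rows into CSV text that a spreadsheet program can open."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["page", "education", "former_jobs"])
    writer.writerows(rows)
    return out.getvalue()


print(to_csv(scrape_members(PAGES)))
```

The same loop would run unchanged over 150 pages, provided the markers really are identical on every page; a page that lacks them simply yields empty cells, which makes gaps easy to spot in the spreadsheet.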

Scraperwiki

You are not the only journalist who is scraping data. Scraperwiki is the playground where you can meet your peers and share your skills. On Scraperwiki you will find various scrapers that others have used to collect data. Copy one, revise it for your own purposes and run it. This sounds simple, but scrapers are written in code, generally in one of three languages: PHP, Ruby or Python. Still, you don’t have to be a programmer to use the scripts. After Bradshaw’s explanation of the structure of a scraper, you can start experimenting yourself. And, like any good educator and trainer, Bradshaw gives you some assignments at the end of each chapter.

There is much more to discover: do you know how to scrape a PDF, cells in a large spreadsheet, or data in a CSV file? In the book you will find the recipes. When I show the tricks in training sessions, participants always ask: do you have this in writing? Now it is in writing.
