Aspiring data journalist? This book is a must-read
It was a work in progress, but after almost a year and some 40 ‘versions’, Paul Bradshaw’s ‘Scraping for Journalists’ has been published. Bradshaw teaches at City University London and Birmingham City University, but he is also a respected data journalist and blogger at the Online Journalism Blog. And not without reason.
You can order the book as an e-book, available in PDF, Mobi or EPUB format. Leanpub, where you can obtain a copy, has an interesting concept: it offers all the tools for producing and publishing a book. You can make changes and additions while publishing, and, not unimportantly, the royalties are higher than in traditional publishing. Bradshaw says he has “become a huge fan” because “the format combines the best qualities of traditional book publishing with those of blogging and social media.”
Must read
‘Scraping for journalists’ is a must read for data journalists. One of the problems is how to get your data from the online resources into a spreadsheet. Scraping is the answer. But how do you do that, given the fact that most journalists are not coders? In 30 chapters and almost 500 pages Bradshaw gives his recipes for scraping data. The book is not for reading from cover to cover but rather learning by doing. You follow the recipes step by step on your computer, add some variation to the examples and finally you try to apply the recipes on your own data. This works wonderfully, because starting with programming takes too much time before you get results. Now you have some readymade code, which works, and you can experiment until you can successfully apply it to your own data.
Fast start
You can make a quick start right from chapter one: within five minutes you have scraped your first data. Bradshaw begins by explaining the ImportHTML and ImportXML formulas, used in Google Drive spreadsheets to pull data from a web page into your own sheet. The trick is to find the right table or list containing the data. You can dig deep into the HTML or XML soup, but you can also guess and experiment. Just try some numbers in the formula, Bradshaw advises.
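In practice that looks like the formulas below, typed straight into a spreadsheet cell. The URL here is just a placeholder; the final number in ImportHTML is the index of the table on the page, which is exactly the number you experiment with, and the second argument of ImportXML is an XPath query (here: every h2 heading).

=IMPORTHTML("https://example.org/members", "table", 1)
=IMPORTXML("https://example.org/members", "//h2")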
Of course, extracting tables from a website can be done faster with a nice tool called OutWit Hub. You just load the web page in OutWit, push the ‘table’ button, and there is your scraped data, ready to be exported in Excel format. The free version works, but Bradshaw advises buying the paid one for about 60 euros, because it does not limit you to scraping a hundred rows. That matters when you are scraping a lot of data. Take, for example, 150 members of parliament who all have their own web pages. If those pages are structured in the same way, with a heading or paragraph where the members state their education and former jobs, collecting this by hand, page after page, is boring and time-consuming. Instead you can build a scraper based on the opening and closing tags around the education and jobs sections and run it over the 150 individual member pages (a rough sketch of this idea follows below). Have a cup of coffee, and after a while your data is ready for export to Excel. Bradshaw takes great care in explaining how to find the opening and closing tags in the HTML soup for the data you are looking for, so you will get it working after a while.
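In code, that idea looks roughly like the Python sketch below. This is not the book’s recipe (Bradshaw does this with OutWit Hub here, and with other tools later on); the URL pattern and the class names around the education and jobs sections are hypothetical placeholders.

import csv
import requests
from bs4 import BeautifulSoup

rows = []
for member_id in range(1, 151):  # 150 member pages, assumed to share one URL pattern
    url = f"https://example-parliament.org/members/{member_id}"  # placeholder URL
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    education = soup.find("div", class_="education")  # assumed tag around the education section
    jobs = soup.find("div", class_="former-jobs")     # assumed tag around the former-jobs section
    rows.append([
        member_id,
        education.get_text(strip=True) if education else "",
        jobs.get_text(strip=True) if jobs else "",
    ])

with open("members.csv", "w", newline="") as f:  # ready to open in Excel
    writer = csv.writer(f)
    writer.writerow(["member_id", "education", "former_jobs"])
    writer.writerows(rows)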
ScraperWiki
You are not the only journalist scraping data. ScraperWiki is the playground where you can meet your peers and share skills. On ScraperWiki you will find scrapers that others use to collect data; you can copy one, revise it for your own purposes and run it. That sounds simple, but scrapers are written in code, generally in one of three languages: PHP, Ruby or Python. Still, you don’t have to be a programmer to use the scripts. After Bradshaw’s explanation of the structure of a scraper you can start experimenting yourself (the skeleton below gives a rough idea of that structure). And, like any good educator and trainer, Bradshaw gives you assignments at the end of each chapter.
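Whatever the language, most scrapers share the same basic structure: fetch a page, pick out the bits you need, and store them somewhere. The bare-bones Python skeleton below illustrates that structure with nothing but the standard library; the URL and the stored fields are placeholders, and it deliberately leaves out ScraperWiki’s own helper functions.

import sqlite3
import urllib.request

# 1. Fetch: download the raw HTML of the page you want to scrape
url = "https://example.org/data"  # placeholder URL
html = urllib.request.urlopen(url).read().decode("utf-8")

# 2. Parse: pull out the piece you need (here just the page title, as a stand-in)
title = html.split("<title>", 1)[-1].split("</title>", 1)[0] if "<title>" in html else ""

# 3. Store: save the result so it can be queried or exported later
con = sqlite3.connect("scrape.db")
con.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT, title TEXT)")
con.execute("INSERT INTO pages VALUES (?, ?)", (url, title))
con.commit()
con.close()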
There is much more to discover: do you know how to scrape a PDF, cells scattered across a large spreadsheet, or data in a CSV file? The book has recipes for all of them. When I show these tricks in training sessions, participants always ask: is this available in writing? Now it is.
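To give a flavour of how small some of those recipes can be, here is a minimal sketch of reading a couple of columns from a CSV file with Python’s standard library; the file name and column names are hypothetical placeholders, not examples from the book.

import csv

with open("spending.csv", newline="") as f:  # placeholder file name
    for row in csv.DictReader(f):
        print(row["supplier"], row["amount"])  # placeholder column names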