Data journalism: where coders and journos meet


How do you make sense of the growing pile of data the internet churns out? The number of journalists who can analyse this data and write stories based on it is still relatively small.

At a time when large numbers of journalists are being laid off, as print newspapers close or cut editorial staff, data journalism is becoming a great way to get more value out of journalistic work.

Newspapers are exploring it too, and data journalism could attract more readers to print and online editions. In Europe, The Guardian’s data page is an example of how well the form can be pulled off.

The growing interest in data journalism is reflected in reports from the National Institute for Computer-Assisted Reporting (NICAR) conference held in February 2012 in St Louis, USA. NICAR, a branch of the organisation Investigative Reporters and Editors (IRE), is the focal point of data journalism in the USA.

According to reports from the NICAR conference, the atmosphere was vibrant. Alex Howard wrote: “At NICAR 2012, you could literally see the code underpinning the future of journalism written — or at least projected — on the walls”.

“The energy level was incredible,” said David Herzog, associate professor for print and digital news at the Missouri School of Journalism, in an email interview after NICAR. “I didn’t see participants wringing their hands and worrying about the future of journalism. They’re too busy building it.”

Scraping
Looking at the list of topics and presentations at the NICAR conference, it seems journalists and coders are now officially engaged. It all starts, of course, with how to use spreadsheets: doing some statistics, making graphs, or mapping the data.

But you are not driving your father’s car here. Before you can start, you need the data, and when it is downloaded from a database it often needs to be cleaned first. Google Refine does that job quite well.
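
Google Refine is a point-and-click tool, but the kind of clean-up it automates is easy to picture in code. A minimal sketch in Python; the sample names are made up for illustration:

    # clean_names.py -- illustrative clean-up of a messy text field,
    # the sort of job Google Refine handles interactively
    raw = ["  Smith, John ", "SMITH, JOHN", "smith,john"]

    def clean(value):
        # strip stray whitespace and normalise case and spacing
        parts = [part.strip() for part in value.split(",")]
        return ", ".join(part.title() for part in parts)

    for value in raw:
        print(clean(value))  # every variant becomes "Smith, John"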

If your data is in PDF format, use Cometdocs to convert it into Excel. Converting your Excel data to HTML, meanwhile, can be done with Mr. Data Converter.
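
Mr. Data Converter runs in the browser, but the same conversion can be scripted. A minimal sketch using Python’s pandas library (my choice, not a tool mentioned here; the file name is a placeholder):

    # excel_to_html.py -- convert a spreadsheet into an HTML table
    import pandas as pd

    # "budget.xlsx" is a placeholder file name; reading .xlsx files
    # also requires the openpyxl package to be installed
    table = pd.read_excel("budget.xlsx")

    # write the data out as a plain HTML table, ready to drop into a page
    table.to_html("budget.html", index=False)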

If the data has been published on a web page, you need to scrape it off that page. Chrome has a browser extension that can do this.

Or you could try Tabletop, a JavaScript library that links Google spreadsheets to a web page. Things can get really nasty if none of these tools work. You then have to write a scraper of your own in Python, or make use of ScraperWiki.
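
What such a script looks like depends on the page, but the pattern is short. A minimal sketch using the requests and BeautifulSoup libraries (my choice of libraries, and the URL is a placeholder):

    # scrape_table.py -- pull the first HTML table off a page and print its rows
    import requests
    from bs4 import BeautifulSoup

    # placeholder URL -- substitute the page you actually need
    html = requests.get("http://example.com/stats.html").text
    soup = BeautifulSoup(html, "html.parser")

    # walk the rows of the first table on the page
    for row in soup.find("table").find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        print(cells)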

Spreadsheet data is sometimes available in CSV (comma-separated values) format, which makes things easier. To work in this format and prepare data for importing into a spreadsheet, install csvkit. A quick solution is Mr. People, which turns a simple list of, say, names into CSV format.
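
Python’s built-in csv module can do the same sort of job in a few lines. A minimal sketch, with made-up names and a placeholder file name:

    # names_to_csv.py -- turn a plain list of names into a CSV file,
    # roughly what Mr. People does in the browser
    import csv

    names = ["Ada Lovelace", "Grace Hopper", "Alan Turing"]  # sample data

    with open("names.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["first_name", "last_name"])  # header row
        for name in names:
            first, last = name.split(" ", 1)
            writer.writerow([first, last])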

Mapping
Mapping data used to be complicated, requiring proprietary software such as ESRI’s ArcMap. Google made things easier with Fusion Tables, and if you want to go for the whole experience, there is QGIS. That, however, is like using a shotgun to kill a mosquito.
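
Whatever tool draws the map, it needs coordinates in a format it can read. A minimal sketch that writes point data as GeoJSON, a format many mapping tools accept (the cities and coordinates are sample data):

    # points_to_geojson.py -- write a few points as a GeoJSON file
    import json

    # sample data: (name, longitude, latitude)
    points = [("St Louis", -90.20, 38.63), ("London", -0.13, 51.51)]

    features = [
        {
            "type": "Feature",
            "properties": {"name": name},
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
        }
        for name, lon, lat in points
    ]

    with open("points.geojson", "w") as out:
        json.dump({"type": "FeatureCollection", "features": features}, out)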

Take it easy and stick to the more standard solutions: make your graphs with ManyEyes and your maps with StatSilk.

Scraping from the web into a spreadsheet is all well and good, but can you do it the other way round? Can you publish the data from your spreadsheet on the web in a nice format? Absolutely: with Django, a web framework based on Python, or with similar frameworks for Ruby. As an editor, however, you should hire a coder to help you out. A sketch of what that work looks like follows below.
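
To give a feel for the scale of the job, here is a minimal sketch of a Django view that reads a CSV export of a spreadsheet and serves it as an HTML table. The file name and the app name myapp are placeholders, and the snippet assumes a standard Django project around it:

    # views.py -- serve a CSV export of a spreadsheet as a bare HTML table
    import csv
    from django.http import HttpResponse
    from django.utils.html import escape

    def data_table(request):
        rows = []
        # "data.csv" is a placeholder for your spreadsheet's CSV export
        with open("data.csv", newline="") as f:
            for record in csv.reader(f):
                cells = "".join("<td>{}</td>".format(escape(c)) for c in record)
                rows.append("<tr>{}</tr>".format(cells))
        return HttpResponse("<table>{}</table>".format("".join(rows)))

    # urls.py -- hook the view up to a URL
    from django.urls import path
    from myapp.views import data_table

    urlpatterns = [path("data/", data_table)]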
