Why data driven documentation is the future of online journalism

Data visualisation


Anyone who thinks journalism is not about numbers is wrong. Figures in the newsroom are more important than ever. We live in an age in which more data is collected than ever before, and that data is used to set policy goals and make decisions. We also all sit behind computer screens, and databases are no longer rocket science, so if you’re not using data for investigations, you’re missing out.

The news is still in text format but tables with figures are playing an increasingly important role. Even something as simple as a press release from the ECB (European Central Bank) about the decline in the number of financial institutions can provide a mine of interesting data for journalists to exploit. The Guardian has been especially innovative in the field of data journalism, as this interactive map about youth unemployment in the UK shows.

Data journalism is a hot issue among media professionals. Demand for training in the Netherlands and Belgium is high, from national and regional newspapers as well as audio-visual media. The main topics in a data journalism training are how to download data or scrape it with Outwit Hub, how to clean it with Google Refine, and how to analyze it in a spreadsheet such as Excel. Nevertheless, the main question remains: how do you make a story out of a spreadsheet full of figures? You can’t just drop your data into a printed newspaper or the online edition. Visualization is the key word, and in this area we see important developments.
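The clean-and-analyze steps covered in such a training can also be sketched in a few lines of code. The example below is a minimal illustration in Python, not a description of how Outwit Hub, Google Refine or Excel work internally. The table, its column names and all of its figures are hypothetical.

```python
import csv
import io

# Hypothetical raw export: stray spaces, inconsistent capitalisation,
# thousands separators and one missing figure.
RAW_CSV = """region,year,unemployed
 North ,2012,"12,400"
South,2012,"8,150"
north,2013,"13,100"
South,2013,
"""

def clean_rows(text):
    """Yield (region, year, count) tuples, skipping rows without a figure."""
    for row in csv.DictReader(io.StringIO(text)):
        figure = row["unemployed"].replace(",", "").strip()
        if not figure:
            continue  # no number reported; leave it out rather than guess
        yield row["region"].strip().title(), int(row["year"]), int(figure)

def change_by_region(rows):
    """Return the change between the first and last year seen per region."""
    series = {}
    for region, year, count in rows:
        series.setdefault(region, {})[year] = count
    return {
        region: years[max(years)] - years[min(years)]
        for region, years in series.items()
    }

rows = list(clean_rows(RAW_CSV))
changes = change_by_region(rows)
print(changes)
```

The output of a step like this is still only a table of numbers. Turning it into a story is the part that needs visualization.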

Visualization is not that complicated. Excel offers plenty of options for making all kinds of graphs, but these are static images. They are useful, but for an online edition they are boring and not interactive. So we need to go beyond Excel and visualize data with other applications. Services such as Tableau are convenient and not too complicated. Google’s chart editing tool has more options, if you are willing to dig into JavaScript. Or you could experiment on Google’s graphics playground.

Displaying geographic data and making different kinds of maps is slightly more complex. Google Fusion Tables is a wonderful tool, both for displaying information pinpoints (clickable dots on the map that link to information) and for making maps with specific data, such as an administrative map of a country with crime data. The problem is that a map in Google Fusion Tables (Google FT) has to be in Google’s KML format (Keyhole Markup Language), while a lot of maps are available in ESRI’s SHP (shapefile) format. Luckily, SHP can be converted into KML using a separate service such as Shape Escape. When you are looking for maps, there are interesting databases to browse, including GeoCommons and DIVA-GIS.
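To give an idea of what KML looks like, here is a minimal Python sketch that writes a few information pinpoints as a KML document using only the standard library. The function name `points_to_kml` and the sample places are assumptions made for illustration. The sketch only covers simple points, not the administrative boundaries you would get from converting an SHP file.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def points_to_kml(points):
    """Build a KML document string from (name, latitude, longitude) tuples."""
    ET.register_namespace("", KML_NS)  # write KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML stores coordinates as longitude,latitude (in that order).
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

# Hypothetical pinpoints; the coordinates are approximate city centres.
kml_text = points_to_kml([
    ("Cape Town", -33.92, 18.42),
    ("Johannesburg", -26.20, 28.05),
])
print(kml_text)
```

Note the order of the coordinates: KML expects longitude first, which is a common source of pinpoints landing in the wrong place.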

A real GIS (Geographic Information System) program, such as Quantum GIS, is sometimes more convenient to use. QGIS is open source and a complete program for creating and editing digital maps. But your maps are then not online, unless you install a map server. The best solution is to make a map in SHP format in QGIS and export the layer to KML. Next, upload this KML file to Google FT, and your map is online via an embed code or a link. Here is a simple example of crime in South Africa using Google FT. If you want to start immediately without bothering about SHP or KML, then Indiemapper is a very convenient solution.

Meanwhile, the programmable Web (Web 2.0), based on JavaScript, is developing fast. Static web pages are increasingly being replaced by dynamic pages, many of which use JavaScript-based APIs (Application Programming Interfaces) to ensure that data from other pages, services or databases appears in real time on the page. But we actually want more than just linking all kinds of services and information to each other on a webpage. One step further is to present the contents of documents, such as graphs, maps and tables, as interactive visualizations on the web. This is the direction of publishing Data-Driven Documents (D3) on the web.

US Budget
The New York Times has shown some of the amazing possibilities. Take a look at the visualization of the US budget by Jim Vallandingham. The result is beautiful, but it comes at a price: D3 is quite difficult. It has a steep learning curve and requires programming skills. Moreover, old browsers cannot display the result. Because D3 relies on SVG, Internet Explorer 9 or higher is needed.

The basis of these attractive infographics is d3.js, a JavaScript library developed by Mike Bostock that opens up a variety of new applications. Last year, the latest version of this JavaScript toolset was published, and the results are surprising. We can now publish document-based, dynamic and interactive infographics on the web by combining SVG (Scalable Vector Graphics), CSS (Cascading Style Sheets) and JavaScript. This is precisely the technique that data journalism needs. Visualizations are the conclusions of a data journalism investigation. Therefore I think that D3 opens up a great future for data journalism.
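The core idea of a data-driven document, where each data point becomes a graphical element and CSS controls how it looks, can be shown without D3 itself. The following Python sketch is not D3 and does not use its API. It only illustrates the principle by generating a static SVG bar chart from a list of values. The function name `bar_chart_svg`, its parameters and the figures are hypothetical.

```python
from xml.sax.saxutils import escape

def bar_chart_svg(data, bar_height=20, gap=5, max_width=300):
    """Return an SVG string with one horizontal bar per (label, value) pair."""
    largest = max(value for _, value in data)
    parts = []
    for i, (label, value) in enumerate(data):
        y = i * (bar_height + gap)
        width = round(value / largest * max_width)  # scale value to pixels
        parts.append(
            f'<rect class="bar" x="0" y="{y}" width="{width}" height="{bar_height}"/>'
        )
        parts.append(
            f'<text x="{width + 5}" y="{y + bar_height - 5}">{escape(label)}</text>'
        )
    height = len(data) * (bar_height + gap)
    # The look of the bars is set in CSS, separate from the data.
    style = "<style>.bar { fill: steelblue; }</style>"
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{max_width + 150}" '
        f'height="{height}">{style}{"".join(parts)}</svg>'
    )

# Hypothetical figures, purely for illustration.
svg = bar_chart_svg([("Defence", 50), ("Health", 100), ("Education", 25)])
print(svg)
```

Saved to a file ending in .svg, the result opens in any browser that supports SVG. What D3 adds on top of this is that the binding between data and elements happens in the browser, so the chart can animate and respond to the reader.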

JavaScript in general, and D3 in particular, is not something to teach the average editor in the newsroom. That is a bridge too far: D3 is more of a tool for developers. To publish data-driven documents in the online edition, however, media organizations first need to hire these developers. Secondly, data editors at the newspaper need to be at least aware of the basic principles of D3 so that they can direct the developers. Well, maybe it is time for more training for the newspaper of the future.


