
Data journalist? Here’s how to deal with the changes to ScraperWiki

Scraping is an important tool for data journalists. Sometimes you are lucky and can simply download your data, or copy and paste it from a website. If not, you have to reach for heavier tools: a wrench like OutWit Hub may do the job. And if that fails too, there is one last resort: the crowbar that is ScraperWiki, where you can code your own scraper. Paul Bradshaw paid much attention to ScraperWiki in his book Scraping for Journalists (check out the Memeburn review).

ScraperWiki has recently been updated, and not just the look and feel of the website. Luckily you can still use the recipes Bradshaw put together, but there are a few other things you might need to know.

To use the new ScraperWiki, you have to create a new account: your old login and password no longer work. Your scrapers and data are not carried over to the new site automatically, either. You can still find them on the old website, where you can log in with your existing ID and password. A script is available for exporting your work from the old site to the new one, though copying and pasting also works.
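If you prefer to move your data by hand rather than via the migration script, it helps to know that a Classic scraper's datastore is an SQLite database. Once you have a local copy of it, dumping a table to CSV for re-upload takes a few lines of Python. This is a sketch, not ScraperWiki's own tooling; the table name `swdata` (Classic's default) and the file paths are assumptions about your own scraper:

```python
import csv
import sqlite3

def export_table_to_csv(db_path, table, csv_path):
    """Dump one table from a local copy of a scraper's SQLite
    datastore to a CSV file, header row included."""
    conn = sqlite3.connect(db_path)
    # Note: the table name is trusted here; fine for your own data,
    # not for untrusted input.
    cur = conn.execute(f"SELECT * FROM {table}")
    headers = [col[0] for col in cur.description]
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(cur)
    conn.close()

# export_table_to_csv("myscraper.sqlite", "swdata", "myscraper.csv")
```

The resulting CSV can then be uploaded to the new site with its spreadsheet-upload tile.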

Community

The new ScraperWiki service has several limitations and now comes with a price tag too:

  • Community, the free version, is limited to three scrapers and/or datasets, none bigger than 8 MB, using no more than 30 minutes of CPU time;
  • Data Scientist, the second option, gives you for US$29 a month an unlimited number of scrapers/datasets of up to 256 MB each, again using no more than 30 minutes of CPU time;
  • Explorer, the third and last option, gives you 10 datasets for US$9 a month.

When I tried to scrape a new dataset, already having three sets in my account, ScraperWiki immediately served me with a screen demanding I upgrade.

“More powerful for the end-user and more flexible for the coder”: this is the new motto of ScraperWiki, and it becomes clear as soon as you want to scrape a new dataset. The old menus have been replaced by tiles. ‘Code in your browser’ brings you back to the familiar environment for creating a scraper in various languages (Python, Ruby and PHP are still available, and new ones have been added).
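For readers new to the idea, a scraper is just a short script that fetches a page, picks out the values you want and stores them in a table. On the site itself you would use ScraperWiki's own helper library; the stripped-down sketch below uses only Python's standard library, and the `HeadlineParser` class, the `swdata` table name and the `save` helper are illustrative names of my own, not ScraperWiki's actual API:

```python
import sqlite3
from html.parser import HTMLParser

class HeadlineParser(HTMLParser):
    """Collect the text of every <h2> element on a page."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2 and data.strip():
            self.headlines.append(data.strip())

def scrape(html):
    """Return the list of <h2> headlines found in an HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    return parser.headlines

def save(rows, db_path):
    """Store scraped rows in an SQLite table, as ScraperWiki does."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS swdata (headline TEXT)")
    conn.executemany("INSERT INTO swdata VALUES (?)", [(r,) for r in rows])
    conn.commit()
    conn.close()
```

In a real scraper the HTML string would come from fetching a live URL rather than being passed in directly.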

Maps and Graphs

Once you have a scraper working, there are now several new possibilities when it comes time to work with your data.

Again we can choose options from different tiles:

  • View your data in a table format;
  • Create a graph or map from the dataset, or query it using SQL;
  • Download your data.

These options are new and are much easier and faster to work with than the old interface, where you had to create a separate view in order to inspect and/or download your dataset.
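Because each dataset is stored as an SQLite table, the SQL-query tile behaves much like querying SQLite directly, which you can try out locally with Python's stdlib. The table name `swdata` and the city figures below are made-up sample data for illustration:

```python
import sqlite3

# Build a small in-memory table standing in for a scraped dataset
# (the population figures are made-up illustrations).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swdata (city TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO swdata VALUES (?, ?)",
    [("Utrecht", 361924), ("Amsterdam", 921402), ("Rotterdam", 664311)],
)

# The kind of query you might type into the SQL tile:
query = (
    "SELECT city, population FROM swdata "
    "WHERE population > 500000 ORDER BY population DESC"
)
results = list(conn.execute(query))
for city, population in results:
    print(city, population)
```

Filtering, sorting and aggregating this way is often quicker than downloading the data and opening it in a spreadsheet.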

New in the main menu are tiles for searching for tweets and for searching Flickr using geotags. Uploading a spreadsheet, querying it with SQL or creating a graph or map from the data also works smoothly. Coders have another choice: they can create their own tools and log in directly to the ScraperWiki server using SSH.

But where is the old option to look into other users’ scrapers, fork them and modify them for your own purposes? “Unlike Classic, the new ScraperWiki is not aiming to be a place where people publicly share code and data. The new ScraperWiki is, at its heart, a more private, personal service.”

That is bad luck, because studying working scrapers is not only helpful but also instructive. However, says ScraperWiki, you can publish your scrapers on GitHub, or share your data at DataHub.io.

That is cold comfort, and in the meantime (probably until September) I’ll stick with the old ScraperWiki.

Author | Peter Verweij

After 30 years of lecturing and training in journalism, politics and new media at the School of Journalism in Utrecht, Peter Verweij started his own company, D3-Media, in 2005. It focuses on the production of journalistic content for multimedia and blogs, and research in the area...