Data journalist? Here’s how to deal with the changes to ScraperWiki
Scraping is an important tool for data journalists. Sometimes you are lucky and can simply download your data or copy and paste it from a website. If not, the data journalist has to reach for heavier tools: a wrench like Outwit Hub may do the job. And if that fails too, there is one last resort: the crowbar that is ScraperWiki, where you can code your own scraper. Paul Bradshaw paid a lot of attention to ScraperWiki in his book Scraping for Journalists (check out the Memeburn review).
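To give an idea of what "coding your own scraper" involves, here is a minimal sketch in Python, the language most of Bradshaw's recipes use. The URL and the table layout are invented for illustration, so any real scraper has to be adapted to the page it targets.

```python
# Minimal scraping sketch: fetch a page and pull rows out of an HTML table.
# The URL and the table layout are hypothetical; adapt both to your target page.
import requests
from bs4 import BeautifulSoup

url = "http://example.com/statistics.html"   # placeholder URL
html = requests.get(url).text
soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.select("table tr")[1:]:        # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) >= 2:
        rows.append({"name": cells[0], "value": cells[1]})

print(rows)
```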
ScraperWiki has recently been updated, and we are not just talking about the look and feel of the website. Luckily you can still use the recipes put together by Bradshaw, but there are a few other things you need to know.
In order to use the new ScraperWiki, you have to create a new account: your old login and password no longer work. Your scrapers and data are also not carried over automatically to the renewed ScraperWiki. You can find them on the old website, where you can still log in with your ID and password. There is a script available for exporting your work from the old site to the new one; copying and pasting also works.
Community
The new ScraperWiki service has several limitations and now comes with a price tag too:
- The free version, called Community, is limited to three scrapers and/or datasets, each no bigger than 8 MB, and no more than 30 minutes of CPU time;
- Data Scientist, the second option, gives you for US$29 a month an unlimited number of scrapers/datasets of up to 256 MB each, again with no more than 30 minutes of CPU time;
- Explorer is the third and last option: for US$9 a month you can use 10 datasets.
When I tried to scrape a new dataset while already having three sets in my account, ScraperWiki immediately presented me with a screen demanding an upgrade.
“More powerful for the end-user and more flexible for the coder”: this is the new motto of ScraperWiki. It becomes clear immediately when you want to scrape a new dataset. The old menus have been replaced by tiles. ‘Code in your browser’ brings you back to the familiar environment for creating a scraper in various languages (Python, Ruby and PHP are still available, and new ones have been added).
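For readers who never used the coding environment: a Python scraper there typically ends by saving its records into the built-in datastore. Here is a rough sketch of that pattern, assuming the scraperwiki library's save call still behaves as it did in Classic (the exact module path may differ on the new platform), with dummy rows for illustration.

```python
# Sketch of saving scraped records into ScraperWiki's SQLite datastore.
# Assumes the scraperwiki library still exposes sqlite.save() as in Classic.
import scraperwiki

# Dummy records for illustration only.
records = [
    {"id": 1, "name": "Cape Town", "value": 433},
    {"id": 2, "name": "Johannesburg", "value": 957},
]

for record in records:
    # unique_keys tells ScraperWiki which column(s) identify a row,
    # so re-running the scraper updates rows instead of duplicating them.
    scraperwiki.sqlite.save(unique_keys=["id"], data=record)
```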
Maps and Graphs
Once you have a scraper working, there are now several new possibilities when it comes time to work with your data.
Again we can choose options from different tiles:
- View your data in a table format
- Create a graph or map from the dataset, or query it using SQL (see the sketch below)
- Download your data
These options are new and much easier and faster to use than the old interface, where you had to create a separate view in order to inspect and/or download your dataset.
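To give a flavour of the SQL option: the query you type in is ordinary SQLite. The sketch below runs the same kind of query locally with Python's sqlite3 module against a downloaded copy of a dataset; the file name and the table name (swdata was the Classic default) are assumptions.

```python
# Sketch: querying a downloaded copy of a ScraperWiki dataset with plain SQLite.
# The file name and the table name ("swdata" was the Classic default) are assumptions.
import sqlite3

conn = sqlite3.connect("scraperwiki.sqlite")
cursor = conn.execute(
    "SELECT name, value FROM swdata "
    "WHERE value > 500 ORDER BY value DESC"
)
for name, value in cursor:
    print(name, value)
conn.close()
```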
New options in the main menu are tiles for ‘searching for tweets’ and for ‘searching Flickr’ using geo-tags. The possibility to upload a spreadsheet, query it with SQL, or create a graph or map from the data also works smoothly. For coders there is another option: they can create their own tools and log in directly to the ScraperWiki server using SSH.
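The spreadsheet tile essentially does for you what the sketch below does by hand: read a spreadsheet (here a CSV with a made-up file name and columns) into a small SQLite table and run SQL against it.

```python
# Rough local equivalent of the 'upload a spreadsheet and query it with SQL' tile:
# read a CSV into an in-memory SQLite table and query it.
# The file name and column names are made up for illustration.
import csv
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sheet (municipality TEXT, budget REAL)")

with open("budgets.csv", newline="") as f:
    for row in csv.DictReader(f):
        conn.execute(
            "INSERT INTO sheet VALUES (?, ?)",
            (row["municipality"], float(row["budget"])),
        )

for municipality, budget in conn.execute(
    "SELECT municipality, budget FROM sheet ORDER BY budget DESC LIMIT 5"
):
    print(municipality, budget)
```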
But where is the old option to look into the scrapers of other users, fork them and modify them for your own purposes? “Unlike Classic, the new ScraperWiki is not aiming to be a place where people publicly share code and data. The new ScraperWiki is, at its heart, a more private, personal service”.
That is unfortunate, because studying working scrapers is not only helpful but also instructive. However, says ScraperWiki, you can publish your scrapers on GitHub, or share your data at DataHub.io.
That is cold comfort, and in the meantime (probably until September) I’ll stick with the old ScraperWiki.