
Online journalism • 17 Jul 2013

Data journalist? Here’s how to deal with the changes to ScraperWiki

By Peter Verweij


ScraperWiki

Scraping is an important tool for data journalists. Sometimes you are lucky and can download your data or copy and paste it from a website. If you are not, you have to reach for heavier tools: a wrench like Outwit Hub could do the job. But if this fails too, there is one last resort: the crowbar that is ScraperWiki, where you can code your own scraper. Paul Bradshaw paid much attention to ScraperWiki in his book Scraping for Journalists (check out the Memeburn review).

Recently ScraperWiki has been updated and we are not just talking about the look and feel of the website. Luckily you can still continue to use the recipes put together by Bradshaw, but there are a few other things you might need to know.

To use the new ScraperWiki, you have to create a new account. Your old login and password no longer work. Your scrapers and data are also not carried over automatically to the renewed ScraperWiki. You can still find them at the old website, where you can log in with your old ID and password. There is a script available for exporting your work from the old website to the new one. Copying and pasting also works, though.

Community

The new ScraperWiki service has several limitations and now comes with a price tag too:

Community is the free version, which is limited to three scrapers and/or datasets of no more than 8 MB each, using no more than 30 minutes of CPU time;
Data Scientist is the second option: for US$29 a month it gives you an unlimited number of scrapers/datasets with a maximum of 256 MB each, using no more than 30 minutes of CPU time;
Explorer is the third and last option: for US$9 a month you can use 10 datasets.

When I tried to scrape a new dataset, already having three sets in my account, ScraperWiki immediately served me with a screen demanding I upgrade.

“More powerful for the end-user and more flexible for the coder”: this is the new adage of ScraperWiki. What that means becomes clear as soon as you want to scrape a new dataset. The old menus have been replaced by tiles. ‘Code in your browser’ brings you back to the well-known environment for creating a scraper in various languages (Python, Ruby and PHP are still available, and new ones have been added).
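For readers who have never coded a scraper, here is a minimal sketch of the idea in Python. It uses only the standard library rather than ScraperWiki's own helper functions, and the sample page, the place names and the names TableScraper and scrape_table are made up for illustration. A real scraper would first fetch the page over HTTP.

```python
from html.parser import HTMLParser

# Hypothetical sample page with invented data; a real scraper would
# download the HTML from a website instead of using a string.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Alpha</td><td>1200</td></tr>
  <tr><td>Beta</td><td>800</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, if any
        self._cell = None   # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Turn the first row into column names and the rest into records."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
for record in records:
    print(record)
```

Each record is a plain dictionary keyed by column name, which is a convenient shape for saving into a dataset afterwards. Note that the values are still strings at this point; converting them to numbers is a separate cleaning step.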

Maps and Graphs

Once you have a scraper working, there are now several new possibilities when it comes time to work with your data.

Again we can choose options from different tiles:

View your data in table format;
Create a graph or map from the dataset, or query the dataset using SQL;
Download your data.

These options are new, and they are much easier and faster to use than the old interface, where you had to create a separate view in order to inspect and/or download your dataset.
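To give an idea of what querying a dataset with SQL looks like, here is a small sketch using Python's built-in sqlite3 module. It assumes the scraped data sits in a SQLite table; the table name swdata, the columns and the values are assumptions made up for this example, not taken from a real ScraperWiki dataset.

```python
import sqlite3

# In-memory database standing in for a scraped dataset.
# Table name, columns and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swdata (city TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO swdata (city, population) VALUES (?, ?)",
    [("Alpha", 1200), ("Beta", 800), ("Gamma", 1500)],
)

# The kind of question you might ask of a dataset:
# which places are above a threshold, largest first?
rows = conn.execute(
    "SELECT city, population FROM swdata "
    "WHERE population > ? ORDER BY population DESC",
    (1000,),
).fetchall()

for city, population in rows:
    print(city, population)

conn.close()
```

The same SELECT statement, without the Python around it, is the sort of query you would type into an SQL tile.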

New options in the main menu are a tile for ‘searching for tweets’ and a tile for ‘searching Flickr’ using geo-tags. The options to upload a spreadsheet, query it with SQL, or create a graph or map from the data also work smoothly. For coders there is another choice: they can create their own tools and log in directly to the ScraperWiki server using SSH.

But where is the old option to look into the scrapers of other users, fork them and modify them for your own purposes? “Unlike Classic, the new ScraperWiki is not aiming to be a place where people publicly share code and data. The new ScraperWiki is, at its heart, a more private, personal service”.

That is bad luck, because studying working scrapers is not only helpful, but also instructive. However, ScraperWiki says, you can publish your scrapers on GitHub or share your data at DataHub.io.

That is cold comfort, and in the meantime, probably until September, I’ll stick with the old ScraperWiki.
