Tuesday 16 July 2013


Scraping is an important tool for data journalists. Sometimes you are lucky and can download your data or copy-paste it from a website. Bad luck? Then the data journalist has to reach for heavier tools: a wrench like OutWit Hub could do the job. And if that fails too, there is one last resort, the crowbar of ScraperWiki, where you can code your own scraper. Paul Bradshaw paid much attention to ScraperWiki in his book 'Scraping for Journalists', reviewed in Memeburn.

Recently ScraperWiki has been updated, and we are not just talking about the look and feel of the website. Luckily you can still continue to use the recipes from Paul Bradshaw's book.
Published on Memeburn: http://memeburn.com/2013/07/data-journalist-heres-how-to-deal-with-the-changes-to-scraperwiki/

In order to use the new ScraperWiki, you have to create a new account; your old login and password no longer work. Your scrapers and data are also not automatically available on the renewed service. You can find them on the old website, where you can still log in with your old ID and password. A script is available for exporting your work from the old website to the new one, though copy-paste also works.


The new ScraperWiki service has several limitations and now comes with a price tag too:

- the free version, called Community, is limited to three scrapers and/or datasets of no more than 8 MB each, using no more than 30 minutes of CPU time;

- Data Scientist is the second option: for 29 USD a month you get an unlimited number of scrapers/datasets, with a maximum of 256 MB each, using no more than 30 minutes of CPU time;

- Explorer is the third and last option: for 9 USD a month you can use 10 datasets.

When I tried to scrape a new dataset while already having three sets in my account, ScraperWiki immediately served me a screen for upgrading my plan.

“More powerful for the end user and more flexible for the coder”: that is the new adage of ScraperWiki. This becomes clear immediately when you want to scrape a new dataset. The old menus have been replaced by tiles. 'Code in your browser' brings you back to the well-known environment for creating a scraper in various languages (Python, Ruby and PHP are still available, and new ones have been added).
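To give an idea of the kind of code the browser editor is for, here is a minimal sketch of a table scraper in Python, using only the standard library. The HTML snippet, field names and numbers are invented for illustration; a real scraper would fetch a live page and save the records into a ScraperWiki dataset.

```python
# A minimal sketch of scraper logic like you would write in the browser
# editor. The HTML, table layout and field names are made up for this example.
from html.parser import HTMLParser

HTML = """
<table>
  <tr><td>Amsterdam</td><td>821702</td></tr>
  <tr><td>Rotterdam</td><td>616294</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped per <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())

scraper = TableScraper()
scraper.feed(HTML)

# Turn the raw rows into records, ready to be saved as a dataset.
records = [{"city": city, "population": int(pop)} for city, pop in scraper.rows]
print(records)
```

In the real editor you would replace the hard-coded HTML with a downloaded page and hand `records` to ScraperWiki's save routine, but the parsing step looks much the same.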

Maps and Graphs
Once you have a scraper working, there are now several new possibilities to work with your data.
Again we can choose options from different tiles:
- you can view your data in a table format;

- create a graph or map from the dataset;

- or query your dataset using SQL;

- and finally you can download your data.
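The SQL option deserves a small illustration. A ScraperWiki dataset is essentially a SQLite table, so the same kind of query can be tried offline with Python's built-in sqlite3 module. The table name, columns and numbers below are invented for the example.

```python
import sqlite3

# Build a small in-memory stand-in for a scraped dataset; the table name,
# columns and figures are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE swdata (city TEXT, population INTEGER)")
con.executemany(
    "INSERT INTO swdata VALUES (?, ?)",
    [("Amsterdam", 821702), ("Rotterdam", 616294), ("Utrecht", 328164)],
)

# The same kind of query you would type into the SQL tile.
big = con.execute(
    "SELECT city FROM swdata WHERE population > 500000 ORDER BY city"
).fetchall()
print(big)  # [('Amsterdam',), ('Rotterdam',)]
```

The point of the SQL tile is exactly this: filtering and sorting a dataset without writing a separate scraper view first.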

These options are new and work much more easily and faster than the old interface, where you had to create a separate view in order to inspect and/or download your dataset.

New options in the main menu are tiles for 'searching for tweets' and for 'searching Flickr' using geo-tags. The possibility to upload a spreadsheet, query it with SQL, or create a graph or map from the data also works smoothly. For coders there is another choice: they can create their own tools and log in directly to the ScraperWiki server using SSH.
But where is the old option to look into the scrapers of other users, fork them and modify them for your own purposes? “Unlike Classic, the new ScraperWiki is not aiming to be a place where people publically share code and data. The new ScraperWiki is, at its heart, a more private, personal service”.
Ouch, gone! That is bad luck, because studying working scrapers is not only helpful but also instructive. However, says ScraperWiki, you can publish your scrapers on GitHub, or share your data at DataHub.io.
That is cold comfort, and in the meantime (probably until September) I will continue working in the old ScraperWiki.
