Scraping is an important tool for data journalists. Sometimes you are lucky and can simply download your data or copy-paste them from a website. If you are out of luck, the data journalist has to reach for heavier tools: a wrench like Outwit Hub could do the job. And if that fails too, there is one last resort, the crowbar of ScraperWiki, where you can code your own scraper. Paul Bradshaw paid much attention to ScraperWiki in his book 'Scraping for Journalists', reviewed on Memeburn.
Recently ScraperWiki has been updated, and we are not just talking about the look and feel of the website. Luckily you can still use Paul Bradshaw's recipes.
Published on Memeburn: http://memeburn.com/2013/07/data-journalist-heres-how-to-deal-with-the-changes-to-scraperwiki/
In order to use the new ScraperWiki, you have to create a new account; your old login and password no longer work. Your scrapers and data are not moved to the renewed service automatically either. You can find them at the old website, where you can still log in with your old ID and password. There is a script available for exporting your work from the old site to the new one, though copy-paste also works.
Community
The new ScraperWiki service has several limitations and now carries a price tag too:
- the free version, called Community, is limited to three scrapers and/or datasets no bigger than 8 MB, using no more than 30 minutes of CPU time;
- Data Scientist, the second option, gives you for 29 USD a month an unlimited number of scrapers/datasets with a maximum of 256 MB each, again using no more than 30 minutes of CPU time;
- Explorer is the third and last option; for 9 USD a month you can use 10 datasets.
When I tried to scrape a new dataset while already having three sets in my account, ScraperWiki immediately served me a screen asking me to upgrade the service.
“More powerful for the end user and more flexible for the coder” is the new adage of ScraperWiki. This becomes clear immediately when you want to scrape a new dataset. The old menus are replaced by tiles. 'Code in your browser' brings you back to the well-known environment for creating a scraper in various languages (Python, Ruby and PHP are still available, and new ones have been added).
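To give an idea of what such a browser-coded scraper looks like, here is a minimal Python sketch. The URL and the page structure are invented for illustration, and the calls to scraperwiki.scrape() and scraperwiki.sqlite.save() follow the recipes from Bradshaw's book; check the current documentation for the exact module names in the renewed service.

import scraperwiki
import lxml.html

# Fetch and parse a (hypothetical) news page.
html = scraperwiki.scrape("http://example.com/news")
root = lxml.html.fromstring(html)

# Assume headlines sit in <h2><a href="...">...</a></h2> elements.
for link in root.xpath("//h2/a"):
    record = {
        "title": link.text_content().strip(),
        "url": link.get("href"),
    }
    # Save each record into the scraper's SQLite dataset, keyed on the URL.
    scraperwiki.sqlite.save(unique_keys=["url"], data=record)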
Maps and Graphs
Once you have a scraper working, there are now several new possibilities to work with your data. Again we can choose options from different tiles:
- you can view your data in a table format;
- create a graph or map from the dataset;
- or query your dataset using SQL (a small example follows below);
- and finally you can download your data.
These options are new and are much easier and faster to use than the old interface, where you had to create a separate view in order to inspect and/or download your dataset.
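As an illustration of that SQL option, the snippet below runs the kind of query you might type into the query tile, here from scraper code via scraperwiki.sqlite.select(). The table name swdata is the classic default and the column names come from the hypothetical scraper sketched above, so adjust both to your own dataset.

import scraperwiki

# Hypothetical query: the first ten records, ordered by title.
# scraperwiki.sqlite.select() prepends SELECT and returns a list of dicts.
rows = scraperwiki.sqlite.select(
    "title, url FROM swdata ORDER BY title LIMIT 10"
)
for row in rows:
    print("%s -> %s" % (row["title"], row["url"]))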
New options in the main menu are tiles for 'searching for tweets' and for 'searching Flickr' using geo-tags. The possibility to upload a spreadsheet, query it with SQL or create a graph or map from the data also works smoothly. For coders there is another choice: they can create their own tools and log in directly on the ScraperWiki server using SSH.
But where is the old option to look into the scrapers of other users, fork them and modify them for your own purposes? “Unlike Classic, the new ScraperWiki is not aiming to be a place where people publically share code and data. The new ScraperWiki is, at its heart, a more private, personal service”.
Ouch, gone! That is bad luck, because studying working scrapers is not only helpful but also instructive. However, says ScraperWiki, you can publish your scrapers on GitHub, or share your data at DataHub.io.
That is cold comfort, and in the meantime, probably until September, I will continue working in the old ScraperWiki.