Tuesday 27 April 2021

R Data Journalism Helpers

R is an interesting platform that has found its way into data journalism. Once installed, it is ready to use. However, R is open source and constantly evolving: new packages are released and new ideas for analysis are discussed, for example at https://www.r-bloggers.com/.

1. Get your data with an R API

R is a very important statistical platform for data scientists. R has also found applications in data journalism and is used, for example, by journalists at The Economist and the BBC (http://d3-media.blogspot.com/2019/02/blog-post.html). Getting data from, say, an online database into R usually means a detour: you select the data, download the set as .csv or .xls, and only then import it into R.

Downloading data directly into R is now often an option. For World Bank data there is the R package ‘wbstats’ (more: https://econandrew.github.io/wdi-api-R-gettingstarted/using-r-to-access-the-world-bank-api.html); for IMF data there is ‘imfr’ (more: https://meshry.com/blog/downloading-data-from-the-imf-api-using-r/). For poverty data, the World Bank provides ‘povcalnetR’ (more: https://github.com/worldbank/povcalnetR). These packages are a great help not only for data scientists but also for journalists reporting on, for example, poverty and development in sub-Saharan Africa.
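As a minimal sketch of what direct access looks like, assuming the ‘wbstats’ package (version 1.0 or later) is installed: its function `wb_data()` downloads World Bank indicators for the chosen countries and years straight into a data frame (a tibble). The helper name `get_indicator()` is hypothetical, and the indicator code `SP.POP.TOTL` (total population) and the country codes are only examples.

```r
# Hypothetical helper around wbstats::wb_data() (wbstats >= 1.0 assumed).
# Nothing is downloaded until the function is actually called.
get_indicator <- function(indicator, countries, from, to) {
  wbstats::wb_data(
    indicator  = indicator,   # World Bank indicator code, e.g. "SP.POP.TOTL"
    country    = countries,   # ISO country codes
    start_date = from,        # first year
    end_date   = to           # last year
  )
}

# Usage (needs the wbstats package and an internet connection):
# pop <- get_indicator("SP.POP.TOTL", c("KE", "NG", "GH"), 2010, 2019)
# wbstats::wb_search("poverty")  # look up indicator codes by keyword
```

Because the result is already a data frame in R, the .csv detour disappears: you can filter, plot or export it right away.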

In the Netherlands, the Central Bureau of Statistics (CBS, Statistics Netherlands) has an interesting R API for collecting data: ‘cbsodataR’. Here is a manual (in Dutch): https://www.cbs.nl/nl-nl/onze-diensten/open-data/statline-als-open-data/snelstartgids. For making maps, CBS has the following handout: https://www.cbs.nl/nl-nl/onze-diensten/open-data/statline-als-open-data/cartografie
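A comparable sketch for CBS, assuming the ‘cbsodataR’ package is installed: `cbs_get_toc()` returns the catalogue of StatLine tables, `cbs_get_data()` downloads one table by its identifier, and `cbs_add_label_columns()` adds readable labels next to the coded columns. The helper `get_cbs_table()` and the placeholder `"TABLE_ID"` are hypothetical; take a real identifier from the catalogue.

```r
# Hypothetical helper around cbsodataR (package assumed to be installed).
# Nothing is downloaded until the function is actually called.
get_cbs_table <- function(table_id) {
  # Download the full table from CBS StatLine
  data <- cbsodataR::cbs_get_data(table_id)
  # Add label columns so coded values (e.g. region codes) become readable
  cbsodataR::cbs_add_label_columns(data)
}

# Usage (needs an internet connection):
# toc <- cbsodataR::cbs_get_toc()    # catalogue of all StatLine tables
# head(toc)                          # browse it to find the table you need
# tab <- get_cbs_table("TABLE_ID")   # replace with a real identifier
```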

2. Scraping and plotting in R

Scraping data from a web page is generally not very difficult when you use Excel for analysis. For example, if there is a table on the web page, you can import it directly into Excel via Data and then From Web: paste the URL and you are done. OutWit Hub is a nice program for scraping. Table Capture in Chrome is another possibility: capture the data as .csv and then import it into Excel or R.

Scraping data directly in R is more difficult: with the package ‘rvest’ you have to dive deep into the HTML code of the page. More: https://www.dataquest.io/blog/web-scraping-in-r-rvest/
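A minimal sketch of what that involves, assuming ‘rvest’ 1.0 or later: `read_html()` parses a page, `html_element()` selects the table with a CSS selector, and `html_table()` turns it into a data frame. To keep the example self-contained, a small inline HTML string with made-up numbers stands in for a real web page; normally you would pass the page URL to `read_html()` and look up the right selector in the page source.

```r
library(rvest)

# Inline HTML standing in for a real web page (made-up numbers)
html <- "
<table id='stats'>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>A</td><td>100</td></tr>
  <tr><td>B</td><td>200</td></tr>
</table>"

page <- read_html(html)                     # parse the HTML
node <- html_element(page, "table#stats")   # select the table via a CSS selector
tbl  <- html_table(node)                    # header row (th cells) becomes column names

print(tbl)
```

The CSS selector is the part that requires reading the HTML: on a real page you first have to find the id, class or position of the table you want.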

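Now there is an R package that takes a point-and-click approach to scraping, called ‘datapasta’: you copy a table in the browser and paste it into your R script as code through an RStudio addin, so scraping a table from the web with R becomes a piece of cake. Here is a tutorial: https://milesmcbain.github.io/datapasta/articles/how-to-datapasta.html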
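To give an idea of the result, assuming ‘datapasta’ is installed: after copying a table in the browser, the RStudio addin (or a call such as `datapasta::df_paste()`) writes the clipboard contents into your script as plain R code. The block below shows the kind of code that comes out, with made-up values; the exact formatting datapasta produces may differ.

```r
# Step 1 (interactive, in RStudio): copy a table in the browser, then run
# datapasta::df_paste()   # reads the clipboard and inserts code like below

# Step 2: the inserted code is ordinary R (made-up values for illustration)
scraped <- data.frame(
  Country    = c("A", "B", "C"),
  Population = c(100L, 200L, 150L)
)

str(scraped)
```

Because the table ends up as code in the script, the analysis stays reproducible without a separate .csv file.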

Making a plot with ggplot2 can also be time-consuming because of the wide range of possibilities. A GUI for plotting with ggplot2 is very helpful; I use a package called ‘esquisse’. Here are my experiences: http://d3-media.blogspot.com/2019/02/gui-for-r-ggplot.html
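A minimal sketch, assuming ‘ggplot2’ and ‘esquisse’ are installed: `esquisse::esquisser()` opens the drag-and-drop interface on a data frame, and from the interface you can export the ggplot2 code behind your chart. The block shows the kind of code you end up with for a simple bar chart; the data frame `df` and its values are made up.

```r
library(ggplot2)

# Made-up data standing in for a downloaded or scraped table
df <- data.frame(
  country = c("A", "B", "C"),
  value   = c(10, 25, 17)
)

# Interactive step (opens the GUI): esquisse::esquisser(df)

# Code of the kind the GUI exports: a simple bar chart
p <- ggplot(df) +
  aes(x = country, y = value) +
  geom_col() +
  labs(title = "Made-up values by country", x = "Country", y = "Value") +
  theme_minimal()

print(p)
```

The exported code can be pasted into a script and tweaked further, so the GUI is a starting point rather than a dead end.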


Monday 26 April 2021

Data Journalism on the Samsung Tablet with Termux, Ubuntu and R

Can I turn my tablet into a real workstation? Apps are nice for small jobs, reading mail and a bit of browsing. But for work? Office is a nice app and gives you the standard Office tools like Word and Excel; lite versions, of course, but they work OK. But how about doing statistics with R? There is no R for Android, so you need a Linux environment. Installing Linux inside Android is possible with, for example, the Linux Deploy app. However, that requires rooting the device. Generally I have no problem with that: I want full control over all my machines. But rooting has consequences. First, you lose the warranty. Second, a working root method has to be available for this specific device with this Android version. And finally, if it fails, you run the risk of turning your device into a brick.


Data Journalism on the Samsung Tablet (1)

I had to buy a new tablet. The old one, a Samsung Galaxy Tab S 10.5 (SM-T800), regularly crashed when it had too much data to load. Reading the New York Times was a disaster, with pages constantly reloading; visualizations in The Economist crashed regularly because the tablet ran out of memory. My choice for the new one: the Samsung Galaxy Tab S7+ (SM-T970), running Android 11. Of course with a keyboard cover, just like the old one. Here they are: