Tuesday 27 April 2021

R Data Journalism Helpers

R is an interesting platform that has found its way into data journalism. Once installed, it is ready to use. However, R is open source and constantly developing: new packages are released and new ideas for analysis are discussed, for example at https://www.r-bloggers.com/.

1. Get your data with an R API

R is a very important statistical platform for data scientists. R has also found applications in data journalism and is used, for example, by journalists at The Economist and the BBC (http://d3-media.blogspot.com/2019/02/blog-post.html). Getting your data into R from, let’s say, a database usually puts you on a kind of detour: you make a selection of the data, download the set as .csv or .xls, and from there import it into R.
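
As a minimal sketch of that detour, the import step in base R could look like the following. The file contents and column names are made up for illustration; in practice the file would be the selection you downloaded from the database.

```r
# Write a tiny example file so the sketch is self-contained;
# in practice this is the .csv you downloaded from the database.
# (Column names and values are made up for illustration.)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("country,year,population",
             "Netherlands,2019,17.3",
             "Belgium,2019,11.5"), csv_path)

# Import the downloaded selection into R as a data frame.
population <- read.csv(csv_path, stringsAsFactors = FALSE)
str(population)
```

For an .xls download, one option is read_excel() from the ‘readxl’ package instead of read.csv().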

Accessing and downloading the data directly in R is now often an option. Databases like the World Bank's can be queried with an R package called ‘wbstats’ (more: https://econandrew.github.io/wdi-api-R-gettingstarted/using-r-to-access-the-world-bank-api.html); for the IMF there is ‘imfr’ (more: https://meshry.com/blog/downloading-data-from-the-imf-api-using-r/). For poverty data the World Bank uses ‘povcalnetR’ (more: https://github.com/worldbank/povcalnetR). This is a great help not only for data scientists but also for journalists who are reporting, for example, on poverty and development in Sub-Saharan Africa.
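
A rough sketch of such a direct download with ‘wbstats’ might look like this. It assumes the package is installed and an internet connection is available; the helper name get_population, the choice of countries and the years are only illustrative.

```r
# Sketch: download total population for a few countries with 'wbstats'.
# Assumes install.packages("wbstats") has been run and the network is up.
# get_population is a hypothetical helper, not part of the package.
get_population <- function(countries = c("NGA", "KEN", "ETH"),
                           from = 2010, to = 2019) {
  wbstats::wb_data(
    indicator  = "SP.POP.TOTL",  # World Bank code for total population
    country    = countries,      # ISO3 country codes
    start_date = from,
    end_date   = to
  )
}

# Usage (requires network access):
# pop <- get_population()
# head(pop)
```

The result is an ordinary data frame, so no .csv download and import step is needed in between.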

In the Netherlands, the Central Bureau of Statistics (CBS) has an interesting R API to collect data: ‘cbsodataR’. Here is a manual: https://www.cbs.nl/nl-nl/onze-diensten/open-data/statline-als-open-data/snelstartgids. For making maps they have the following handout: https://www.cbs.nl/nl-nl/onze-diensten/open-data/statline-als-open-data/cartografie
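
As a hedged sketch, fetching a StatLine table with ‘cbsodataR’ could look like this. It assumes the package is installed and the CBS service is reachable; the helper fetch_cbs_table is hypothetical and the table id in the usage lines is only an example, so look up the id you need in the table of contents first.

```r
# Sketch: fetch a StatLine table by its id with 'cbsodataR'.
# Assumes install.packages("cbsodataR") has been run and the network is up.
# fetch_cbs_table is a hypothetical wrapper, not part of the package.
fetch_cbs_table <- function(table_id) {
  cbsodataR::cbs_get_data(table_id)
}

# Usage (requires network access):
# toc <- cbsodataR::cbs_get_toc()        # list of available tables and their ids
# dat <- fetch_cbs_table("83765NED")     # example id; replace with your own
# head(dat)
```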

2. Scraping and plotting in R

Scraping data from a web page is generally not so difficult when you use Excel for analysis. For example, if there is a table on the web page, you can import it directly into Excel: choose Data, then From Web, paste the URL, and you are done. OutWit Hub is a nice program for scraping. Table Capture in Chrome is another possibility. Capture the data as .csv and then import it into Excel or R.

Scraping data directly in R is more difficult: using the package ‘rvest’, you have to dive deep into the HTML code. More: https://www.dataquest.io/blog/web-scraping-in-r-rvest/
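
To give an idea of what that looks like, here is a minimal ‘rvest’ sketch. A small inline HTML snippet stands in for a real web page so that the example runs offline; the table id and its values are made up for illustration.

```r
library(rvest)

# A small inline page stands in for a real URL;
# for a live page you would use read_html("https://...") instead.
# (The table id "pop" and the values are made up for illustration.)
page <- minimal_html('
  <table id="pop">
    <tr><th>country</th><th>population</th></tr>
    <tr><td>Netherlands</td><td>17.3</td></tr>
    <tr><td>Belgium</td><td>11.5</td></tr>
  </table>')

# Select the table by its CSS id and convert it to a data frame.
pop <- html_table(html_element(page, "#pop"))
print(pop)
```

On a real page, finding the right CSS selector is the part that requires reading the HTML source.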

Now there is a package in R for scraping using a GUI, called ‘datapasta’; with it, scraping a table from the web into R becomes a piece of cake. Here is a tutorial: https://milesmcbain.github.io/datapasta/articles/how-to-datapasta.html
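
As far as I understand it, the workflow is: copy the table in the browser, then use the "Paste as tribble" addin in RStudio (the function behind it is datapasta::tribble_paste()), which writes the table into your script as code. The snippet below only sketches the kind of code that ends up in the script; the column names and values are made up for illustration.

```r
library(tibble)

# The kind of code that appears in the script after copying a table
# in the browser and choosing the "Paste as tribble" addin.
# (Column names and values are made up for illustration.)
pasted <- tribble(
  ~country,      ~population,
  "Netherlands", 17.3,
  "Belgium",     11.5
)
pasted
```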

Making a plot with ggplot can also be time-consuming because of the wide range of possibilities. A GUI for plotting with ggplot is very helpful. I use a package called ‘esquisse’. Here are my experiences: http://d3-media.blogspot.com/2019/02/gui-for-r-ggplot.html
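
For comparison, this is a small hand-written ggplot example of the sort of code a GUI can generate for you. It uses the built-in mtcars data set, so the choice of variables is only illustrative.

```r
library(ggplot2)

# A scatter plot of the built-in 'mtcars' data: car weight against fuel use.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
p

# To build a plot interactively instead (opens the GUI; needs 'esquisse'):
# esquisse::esquisser(mtcars)
```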