Tuesday 27 April 2021

R Data Journalism Helpers

R is an interesting platform that has found its way into data journalism. Once installed, it is ready to use. However, R is open source and constantly developing: new packages are released and new ideas for analysis are discussed, for example at https://www.r-bloggers.com/.

1. Get your data with an R API

R is a very important statistical platform for data scientists. R has also found applications in data journalism and is used, for example, by journalists at the Economist and the BBC (http://d3-media.blogspot.com/2019/02/blog-post.html). Getting your data into R from, let's say, a database usually puts you on a kind of detour: you make a selection of the data, download the set as .csv or .xls, and only then import it into R.

Accessing and downloading the data directly from within R is now often an option. For the World Bank database there is an R package called ‘wbstats’ (more: https://econandrew.github.io/wdi-api-R-gettingstarted/using-r-to-access-the-world-bank-api.html); for the IMF there is ‘imfr’ (more: https://meshry.com/blog/downloading-data-from-the-imf-api-using-r/). For poverty data the World Bank offers ‘povcalnetR’ (more: https://github.com/worldbank/povcalnetR). This is a great help not only for data scientists but also for journalists who report, for example, on poverty and development in sub-Saharan Africa.
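As a rough illustration of what such direct access can look like, here is a minimal sketch using ‘wbstats’. It assumes wbstats version 1.0 or later is installed and that the World Bank API is reachable; the helper name `fetch_population` and the default years are my own choices for the example, not part of the package.

```r
# Sketch only: assumes the 'wbstats' package (>= 1.0) is installed
# and that the World Bank API can be reached from your machine.
# 'fetch_population' is a hypothetical helper name, not part of wbstats.
fetch_population <- function(countries, from = 2010, to = 2019) {
  wbstats::wb_data(
    indicator  = "SP.POP.TOTL",  # World Bank code for total population
    country    = countries,      # ISO country codes, e.g. c("NG", "KE")
    start_date = from,
    end_date   = to
  )
}
```

Calling `fetch_population(c("NG", "KE"))` should then return a data frame with one row per country and year, ready for analysis without the .csv detour.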

In the Netherlands, the Central Bureau of Statistics (CBS) has an interesting R API for collecting data: ‘cbsodataR’. Here is a manual (in Dutch): https://www.cbs.nl/nl-nl/onze-diensten/open-data/statline-als-open-data/snelstartgids . For making maps they have the following handout: https://www.cbs.nl/nl-nl/onze-diensten/open-data/statline-als-open-data/cartografie
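A minimal sketch of how a search in the CBS catalogue might look with ‘cbsodataR’. It assumes the package is installed and that the table of contents contains `Identifier` and `Title` columns; the helper name `find_cbs_tables` is hypothetical.

```r
# Sketch only: assumes the 'cbsodataR' package is installed and the
# CBS open data service is reachable.
# 'find_cbs_tables' is a hypothetical helper name, not part of cbsodataR.
find_cbs_tables <- function(keyword) {
  toc <- cbsodataR::cbs_get_toc()                       # catalogue of StatLine tables
  hits <- grepl(keyword, toc$Title, ignore.case = TRUE) # match keyword in the titles
  toc[hits, c("Identifier", "Title")]
}
```

With the identifier of a table in hand, `cbsodataR::cbs_get_data()` downloads the table itself as a data frame.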

2. Scraping and plotting in R

Scraping data from a web page is generally not so difficult when you use Excel for the analysis. For example, if there is a table on the web page, you can import it directly into Excel: choose Data, then From Web, paste the URL and you are done. OutWit Hub is a nice programme for scraping. The Table Capture extension for Chrome is another possibility. Capture the data as .csv and then import it into Excel or R.

Scraping data directly in R is more difficult: using the package ‘rvest’, you have to dive deep into the HTML code. More: https://www.dataquest.io/blog/web-scraping-in-r-rvest/
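To give an idea of what that diving into the HTML looks like, here is a minimal sketch with ‘rvest’ (assuming version 1.0 or later). To keep it self-contained it parses a small inline page with made-up placeholder values instead of a real website; for a real page you would replace `minimal_html(...)` with `read_html("<url>")` and work out the right CSS selector yourself.

```r
library(rvest)

# A tiny inline page stands in for a real website.
# The table contents are placeholder values, not real data.
page <- minimal_html('
  <table id="stats">
    <tr><th>Country</th><th>Value</th></tr>
    <tr><td>Country A</td><td>10</td></tr>
    <tr><td>Country B</td><td>20</td></tr>
  </table>
')

# Select the table with a CSS selector and convert it to a data frame.
tbl <- page %>%
  html_element("table#stats") %>%
  html_table()

print(tbl)
```

The selector `table#stats` is the part that takes the work on a real page: you need to inspect the HTML source to find an id, class or position that singles out the table you want.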

Now there is an R package called ‘datapasta’ that lets you copy a table from a web page and paste it straight into R through an RStudio addin; getting a table from the web into R becomes a piece of cake. Here is a tutorial: https://milesmcbain.github.io/datapasta/articles/how-to-datapasta.html

Making a plot with ggplot can also be time consuming because of the wide range of possibilities. A GUI for plotting with ggplot is very helpful. I use a package called ‘esquisse’. Here are my experiences: http://d3-media.blogspot.com/2019/02/gui-for-r-ggplot.html

Monday 26 April 2021

Data Journalism on the Samsung Tablet with Termux, Ubuntu and R

Can I turn my tablet into a real workstation? Apps are fine for small jobs such as reading mail and a bit of browsing. But for work? Office is a nice app that gives you the standard Office tools like Word and Excel; lite versions of course, but they work OK. But how about doing statistics with R? There is no R for Android, so you need a Linux environment. Installing Linux in a container on Android is possible with, for example, the Linux Deploy app. However, you need to root the machine. Generally I have no problem with that: I want full control of all my machines. But rooting has consequences. First, you lose the warranty. Second, a working root method has to be available for this specific machine and Android version. And finally, if it fails, you run the risk of turning your machine into a brick.

Data journalism on Samsung Tablet (1)

I had to buy a new tablet. The old one, a Samsung Tab S 10.5 (SM-T800), regularly crashed when it had too much data to load. Reading the NYTimes was a disaster, with pages constantly reloading; visualizations of the Economist crashed regularly because the tablet ran out of memory. My choice for a new one was the Samsung Tab S7+ (SM-T970), running Android 11. With a keyboard cover of course, just like the old one. Here they are:

Friday 20 November 2020

Advertising in politics

Social media have fundamentally changed political discourse. The classic communication model, with journalism in the middle as gatekeeper, has made way for direct communication between politician and citizen. Twitter and Facebook are the two most important social media, and both have considerable political influence. On Twitter, for example, the influence of a tweeting politician can be described by the number of followers/friends. The content of the tweets says something about the politician's positions. Earlier I paid attention to the tweets of Baudet and Wilders. Facebook pages and their followers also give an impression of a party's support and positions.

Political advertising on social media is a new and growing phenomenon. NRC Handelsblad recently published an interesting analysis of the Facebook campaigns of Trump and Biden. The article is based on an analysis of data from Facebook's advertising database. In this database you can search on 'issues, elections and politics' per country. For the Netherlands, the database shows, for example, the total amount the VVD has spent on advertisements. It also turns out that FVD invests a lot of money in this kind of advertising and the PVV almost nothing.

The elections are not until next year, but this is still an interesting database to investigate further.

Friday 6 November 2020

Visualization: and the winner is....

Data is hot, but without visualization data is not interesting for journalism. New visualization tools are being developed fast. For years I have been using Tableau Public for visualizing data. But how about newer tools like Power BI and Qlik?

In the end they all do the same thing: visualizing data in nice graphs and maps on dashboards that can be published as a web page. There are differences, however. If you don't have the time to try them all and need a comparison between the three software packages, I found the following helpful: Difference Between Power BI vs Tableau vs Qlik. And here is a second one, taking R and Shiny into account: https://www.r-bloggers.com/2020/11/3-top-business-intelligence-tools-compared-tableau-powerbi-and-sisense/

Friday 21 August 2020


For RTV Noord I made the following screencast on African themes, in preparation for an introductory talk on data journalism.


Tuesday 11 August 2020

KAPWING for online video editing and slide shows

Just found a nice tool for editing video online and making various kinds of slideshows: KAPWING at https://www.kapwing.com/

It is not rocket science, but this video is a fast-track how-to: https://www.youtube.com/watch?v=zhyusGNX1Ig

Here is an example: a short slideshow with pictures of last year's ABSA RHODES data journalism training