Wednesday 27 December 2017


Wanted to start 2018 with a larger view of the connected world. Here is my new 28-inch Samsung UHD monitor. Here are the specs: http://www.samsung.com/nl/monitors/uhd-ue590/LU28E590DSEN/

Wednesday 6 December 2017

Thursday 23 November 2017


That's enough... after a year of struggling with Windows 10 fast boot, secure boot, registered keys, UEFI files and endless updating, I wiped the whole hard drive. Back to Linux again: installed Mint 17.3. All the data journalism stuff is working: Excel (Office Online) or Calc (OpenOffice), OutWit Hub or Table Capture for Chrome, Refine, Tabula, QGIS, R and RStudio. Just in case, I am running Win10 in a VirtualBox.
One small problem: I love Tableau Public for viz, but it is not available for Linux (yet)... so I am starting with plot.ly, which has better integration with R.

Thursday 16 November 2017

Word frequencies of headlines of The Source

This is work in progress!
Last month's headlines in The Source.
Here are the top word frequencies (wf > 5) as a bar graph and a word cloud.
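The counting step itself is simple. Here is a minimal sketch in Python of the approach (an illustration, not the code actually used for the graphs):

```python
import re
from collections import Counter

def word_frequencies(headlines, min_freq=5):
    """Count lower-cased words across a list of headlines and keep
    only the words that occur more than min_freq times."""
    words = []
    for headline in headlines:
        words.extend(re.findall(r"[a-z']+", headline.lower()))
    counts = Counter(words)
    return {word: n for word, n in counts.items() if n > min_freq}

# Tiny made-up example; the real input would be last month's headlines.
sample = ["Rain hits the city", "Rain again", "City fears more rain"]
print(word_frequencies(sample, min_freq=1))
```

The resulting dictionary is what feeds both the bar graph and the word cloud.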

Monday 18 September 2017


Making visualizations like maps or charts is the end product of much data journalism research. Problems emerge when you want to explain the structure of the research and its different steps, or when you want to make the research more transparent by sharing the outcome of each step. For example, I use R for analysis and plotly for visualizations; for showing the different steps I have a text describing the whole process in markdown. During a lecture or a training you have to switch from one application to another. Jupyter notebooks solve this problem: by using different kernels in the notebook you can show text in markdown, calculations in R, and visualizations in plotly. You can also share the notebook together with the data, so that anybody can download it and follow the research process step by step.
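As a sketch of what such a notebook cell can contain: a plotly figure is, underneath, just a JSON structure of traces and a layout, so you can build one in plain Python even without the R kernel (the pressure readings below are made-up placeholders):

```python
import json

# Hypothetical data points; in a real notebook these would come from the analysis.
pressures = [1013.2, 1012.8, 1011.5, 1010.9]

# A plotly figure is a plain data structure: a list of traces plus a layout.
figure = {
    "data": [{"type": "scatter", "mode": "lines+markers", "y": pressures}],
    "layout": {"title": {"text": "Air pressure (hPa)"}},
}

# Serializing it is all that is needed to share or embed the chart spec.
print(json.dumps(figure, indent=2))
```

Because the figure is just data, it travels with the notebook as easily as the markdown text does.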

Friday 7 July 2017

Tanzania data journalism training


Tanzania Media Foundation
Twitter: @newTMF

Training Tanzania's next Data Journalism Leaders

TMF and EJC continue to deepen their commitment to data journalism in Tanzania by implementing their first workshop targeting the country's need for data journalism trainers. Kicking off 17 July in Dar es Salaam and led by former Utrecht professor Peter Verweij, the programme features sessions led by participants themselves and dives deep into the local context of controversial statistics and cybercrime laws recently passed by the government.

During the workshop, the newly published Swahili-language version of EJC's Data Journalism Handbook will be released and distributed to journalists across Tanzania and Kenya, both digitally and in hardcopy.

Thursday 6 July 2017


Are you still using WhatsApp? That is old school, and there are good reasons to switch to another service: Telegram Messenger, for example. After Facebook bought WhatsApp, the chances that your privacy was endangered increased. Of course Facebook created a possibility for users to opt out, meaning Facebook had no access to the WhatsApp data… for the moment. The first reason for me to look into Telegram was that it is completely cloud based. You log in to your Telegram account with a mobile phone, a tablet, or your laptop, using an app or just the browser. Secondly, Telegram offers more possibilities for including files in your message: not only pics but also audio, video, or text files can be attached. Nice, but that was not compelling enough for me to switch. What was: the possibility of using bots, and even creating your own bot.

How does it work? Installing the messenger service is a piece of cake. Download the app and subscribe. A pin code is sent to your phone to log in; done. In your contacts you can see who is already using Telegram; you can send the others an invitation.
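Bots talk to Telegram through a plain HTTPS API. A minimal sketch in Python (the token comes from Telegram's @BotFather; the function names here are mine, not from an official client library):

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.telegram.org"

def build_url(token, method):
    """Every Bot API call is an HTTPS request to /bot<token>/<method>."""
    return f"{API_BASE}/bot{token}/{method}"

def send_message(token, chat_id, text):
    """POST a sendMessage call and return the decoded JSON reply."""
    payload = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    with urllib.request.urlopen(build_url(token, "sendMessage"), payload) as resp:
        return json.load(resp)

# Usage (needs a real bot token and chat id from Telegram):
# send_message("123456:ABC-DEF", 987654321, "Hello from my bot")
```

That the whole bot interface is just URLs and JSON is what makes writing your own bot so approachable.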

Monday 26 June 2017


Creating interactive graphics is vital to data journalism stories. In my first blog post on this subject I explored the possibilities of D3, JavaScript, R and plotly. If you want to avoid D3 and JavaScript completely and only make use of Python, plotly has developed an interesting new library called Dash. I have been digging into this possibility using data about Dutch municipalities.
From an analysis in R I know that there is a correlation between house values and average income across municipalities. Checking the partial correlation, using political party as an intervening variable, the correlation does not change dramatically. Can we produce an interactive graph showing this conclusion?
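The statistic behind that graph can be sketched in a few lines of Python; the municipality figures below are made-up placeholders, not the real Dutch data:

```python
import math

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partial_corr(x, y, z):
    """Correlation of x and y with the influence of z partialled out."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical values per municipality: house value, income, party share.
house_value = [180, 210, 250, 310]
income = [24, 26, 31, 35]
party = [0.2, 0.35, 0.3, 0.4]
print(pearson(house_value, income), partial_corr(house_value, income, party))
```

If the partial correlation stays close to the raw correlation, the intervening variable is not doing much, which is exactly the conclusion the interactive graph should make visible.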

Friday 16 June 2017


Working with the new version of Tableau Public, 10.3, makes working with data a lot easier. Here are some of the most important improvements:
  1. PDFs are always a pain in the rear, and cracking the file can sometimes be hard, even using one of the web services like PDF to Excel, or using Tabula. Now Tableau is able to open PDFs and connect them immediately to a worksheet.
  2. If you don't want to work with Excel, there is always Google Sheets. But getting the sheets into Tableau used to require exporting to .xls format. Now we can import Google Sheets directly into Tableau.
  3. Excel has its limitations for statistical analysis. R has many more tools under the hood, but its visualization options are limited, especially online. Starting with Tableau 10.3, .Rdata files can be imported directly into Tableau worksheets.
  4. Making maps with Tableau had important limitations because one had to rely on the maps provided by Tableau. My solution was to produce the map in QGIS and export it to Google Fusion Tables. And here it is: Tableau reads shape files (.shp) and makes beautiful maps. Adding data to the map is no problem: choose one of 4 different database joins between your map and your data.

Saturday 10 June 2017


Since the beginnings of data journalism in the nineties of the last century, then called CARR or Computer-Assisted Research and Reporting, techniques for analyzing and visualizing data have improved enormously. One of the central tools in the nineties was the spreadsheet, standardized by Microsoft Excel. Spreadsheets are still much used for analysis, but moving into the area of advanced data journalism (using for example R for deeper statistical analysis or D3 for creating better interactive graphics) creates various new challenges. Then you often have to engage in different types of coding: I got stuck between Python (for R) and JavaScript (for D3). Does a data journalist need to learn all these programming languages, or is there an easier and faster solution?
Looking at journalism practice, the answer is: step onto the steep learning curve and start learning how to code. Here is some help. Paul Bradshaw starts an MA in Data Journalism at the Birmingham School of Media next year. Studying "coding and computational thinking being applied journalistically (I cover using JavaScript, R, and Python, command line, SQL and Regex to pursue stories)" is one of the elements of this new MA, writes Bradshaw on his blog.
Looking at the market, there is real demand for data journalists with coding skills. Here is a job listing from The Economist; among the preferred qualities: "A good understanding of data analytics and coding skills (JavaScript and Python), or a background in data journalism, are a plus."
In the following I will argue that a basic understanding of coding is very helpful, but that new services on the web help data journalists avoid getting stuck in coding.

Tuesday 16 May 2017


Playing around with scalable vector graphics (.svg): text files describing shapes like squares, rectangles etc. They are important for creating charts with D3, data-driven documents. For manipulating .svg I use Inkscape, an open source alternative to Adobe Illustrator, available for Windows, Mac and Linux at: https://inkscape.org/en/
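To see how little magic there is in an .svg file, here is a sketch in Python that writes one by hand; Inkscape (or any browser) will open the result:

```python
# An SVG file is just text: an XML document describing shapes.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">'
    '<rect x="10" y="10" width="80" height="60" fill="steelblue"/>'
    '<circle cx="150" cy="40" r="30" fill="orange"/>'
    '</svg>'
)

with open("shapes.svg", "w") as f:
    f.write(svg)
```

This is the same kind of markup D3 generates for you when it draws a chart in the browser.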

Tuesday 9 May 2017



Bought a 32 GB USB drive… that is a complete hard drive! Instead of dual booting Linux from the hard drive, you can boot from the USB in persistent mode, saving your work and settings. Reading/writing and booting speed are not really different from dual booting. This Philips USB 3.0 Circle 32 GB (compatible with USB 2.0) reads at 55 MB/s and writes at 10 MB/s. To speed things up you can run the whole thing in RAM.


Booting from USB is a bit more difficult with UEFI. Generally a live Ubuntu starts from UEFI with secure boot on. However, installing a module in the kernel that is not signed (for example bcmwl-kernel-source, for wireless) is then blocked, so let's disable secure boot. That works perfectly; however, if you want to start a persistent live Ubuntu you need to start from UEFI with CSM, because the Ubuntu .iso is a hybrid, and the Compatibility Support Module (CSM) provides legacy BIOS compatibility.

So set up the system like this: disable secure boot and fast boot, and enable UEFI with CSM.



To make the USB boot persistent, use mkusb: choose the image, select the drive, tag UEFI and persistent. However, when working on an HP (mine is an HP Elite 820), choose the MSDOS partition table, not the GUID partition table (GPT) which is standard for UEFI, because HP seems to like MSDOS better.

Now I am booting into Ubuntu from a portable drive, which I can carry along and use on any machine.

Wednesday 12 April 2017

My bot ALEXA: from news-update to reading mail

Amazon's software for its Alexa bot has been ported to the Raspberry Pi. It is a nice experience asking for the news (BBC or NYTimes) or listening to the emails I just received.

Installing is a piece of cake.
Here is the recipe: How to Build Your Own Amazon Echo with a Raspberry Pi.
Two problems after installing:
- I cannot get the WakeWordAgent working; it means that instead of yelling ALEXA... I now have to push the 'listen' button to start my question.
- Of course you want to use your local settings; here is a howto: Using outside the US.
  The time zone can be set, but not the location; it accepts only US, GB and DE zip codes. I am stuck with Dusseldorf weather.



Tuesday 3 January 2017

The Electronic Barometer

This goes beyond data journalism. It is a project about the Internet of Things, studying the relationship between measurement/observation, data and storing data, and finally retrieving, analyzing and visualizing data, if possible in real time. Here is the story.

Monday 2 January 2017


In part 1 we discussed publishing real-time data; in part 2 I highlighted storing data in a database and publishing it on a blog. In this last part 3 of the series I will pay attention to retrieving data from the database and visualizing the query results.
The simplest way to retrieve data from MySQL is to install phpMyAdmin, which gives you complete control of the database, tables and queries. phpMyAdmin makes building a query and exporting the result to .csv very easy. The exported .csv can be used for further analysis with, for example, Excel, and for visualizing with Google.
More interesting is to make a direct connection to the database from R, and to query, analyze and visualize directly from R. Finally, I pay attention to publishing these results online with the plot.ly REST API.
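For the phpMyAdmin route, the query and the parsing of the exported .csv might look like this in Python (the table and column names are my guesses, not the ones in the actual project):

```python
import csv
import io

# Hypothetical schema: pressure(reading_time DATETIME, hpa FLOAT)
QUERY = (
    "SELECT reading_time, hpa FROM pressure "
    "WHERE reading_time >= NOW() - INTERVAL 1 DAY "
    "ORDER BY reading_time;"
)

def parse_export(csv_text):
    """Turn a phpMyAdmin .csv export into (timestamp, pressure) pairs."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows)  # skip the header line
    return [(t, float(p)) for t, p in rows]

# Example with a tiny export:
sample = "reading_time,hpa\n2017-01-02 10:00:00,1013.2\n"
print(parse_export(sample))
```

The same pairs can then go into Excel, R, or a plotly chart.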


It is nice to read the changes in pressure in real time as an indication of changes in the weather. However, the observations, once published, are lost. It would be much better to store the measurements as data in, for example, MySQL and publish the data from the database in WordPress.
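The storing step can be sketched like this (assuming a DB-API connection, for example from the pymysql package, and a hypothetical table pressure(reading_time DATETIME, hpa FLOAT); none of these names come from the actual project):

```python
# The SQL for one barometer reading; %s is the DB-API placeholder.
INSERT_SQL = "INSERT INTO pressure (reading_time, hpa) VALUES (NOW(), %s)"

def store_reading(connection, hpa):
    """Write one pressure reading; `connection` is an open DB-API connection."""
    with connection.cursor() as cur:
        cur.execute(INSERT_SQL, (hpa,))
    connection.commit()

# Usage (requires a running MySQL server):
# import pymysql
# conn = pymysql.connect(host="localhost", user="pi", password="secret", db="weather")
# store_reading(conn, 1013.2)
```

Once every measurement lands in the table, WordPress (or any other front end) can query and publish the history instead of only the latest value.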