Thursday 16 November 2017

Word frequencies of headlines of the Source

This is a work in progress! Here are some word clouds made with R showing the top 100, 300 and 500 word frequencies.
I am also starting an interactive bubble chart, produced with Tableau, for all frequencies.
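The word clouds themselves were made in R. For readers who want to see what the counting step behind a "top 100, 300 or 500" list looks like, here is a minimal sketch in Python using only the standard library. The stop-word list and the sample headlines are hypothetical; a real analysis needs a proper stop-word list for the language of the headlines.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis needs a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "is"}


def tokenize(headline):
    """Lower-case a headline and split it into words (Unicode letters and digits)."""
    return re.findall(r"[^\W_]+", headline.lower())


def top_words(headlines, n=100):
    """Return the n most frequent non-stop-words across all headlines."""
    counts = Counter(
        word
        for headline in headlines
        for word in tokenize(headline)
        if word not in STOPWORDS
    )
    return counts.most_common(n)


# Hypothetical sample headlines, for illustration only.
headlines = [
    "Data journalism in the newsroom",
    "The newsroom learns to code",
    "Code and data: a new beat",
]
print(top_words(headlines, 3))
```

Calling `top_words(headlines, 100)`, `300` or `500` gives the three cut-offs used for the clouds; the resulting word/count pairs could also be saved as a CSV file to feed a bubble chart.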

Monday 18 September 2017


Visualizations such as maps or charts are the end product of much data journalism research. Problems emerge when you want to explain the structure of that research and its different steps, or when you want to make the research more transparent by sharing the outcome of each step. For example, I use R for analysis and plotly for visualizations, and to show the different steps I have a text in markdown describing the whole process. During a lecture or a training session you have to switch from one application to another. Jupyter notebooks solve this problem: a notebook combines text cells written in markdown with code cells that are run by a kernel (for example an R kernel), so you can show the explanation in markdown, the calculations in R and the visualizations in plotly in one document. You can also share the notebook together with the data, so that anybody who downloads them can follow the research process step by step.
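To make the idea concrete: a notebook is stored as a single JSON file in which markdown cells and code cells sit side by side. The sketch below builds such a file by hand in Python. It assumes the nbformat 4 layout and an R kernel registered under the name `ir` (the default name used by IRkernel); the helper names and the cell contents, including `municipalities.csv`, are hypothetical. In practice you would simply create the notebook in Jupyter itself.

```python
import json


def markdown_cell(text):
    """A text cell, rendered as markdown by the notebook front end."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """A code cell, executed by the notebook's kernel (not yet run, so no outputs)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code,
    }


def make_notebook(cells, kernel_name="ir", display_name="R", language="R"):
    """Return a minimal nbformat-4 notebook as a dict (assumed R kernel by default)."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {
            "kernelspec": {
                "name": kernel_name,
                "display_name": display_name,
                "language": language,
            }
        },
        "cells": cells,
    }


# Hypothetical research steps: explanation in markdown, calculation in R.
nb = make_notebook([
    markdown_cell("## Step 1: load the data\nWe read the municipality data."),
    code_cell('data <- read.csv("municipalities.csv")'),
])
print(json.dumps(nb, indent=1))
```

Writing the same dictionary to a file with `json.dump` and the extension `.ipynb` gives a notebook that can be shared alongside the data.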

Friday 7 July 2017

Tanzania data journalism training


Tanzania Media Foundation
Twitter: @newTMF

Training Tanzania's next Data Journalism Leaders

TMF and EJC continue to deepen their commitment to data journalism in Tanzania by implementing their first workshop targeting the country's need for data journalism trainers. Kicking off 17 July in Dar es Salaam and led by former Utrecht professor, Peter Verweij, the programme features sessions led by participants themselves and dives deep into the local context of controversial statistics and cybercrime laws recently passed by the government.

During the workshop, the newly published Swahili-language version of EJC's Data Journalism Handbook is being released and distributed to journalists across Tanzania and Kenya, both digitally and in hard copy.

Thursday 6 July 2017


Are you still using WhatsApp? That is old school, and there are good reasons to switch to another service, Telegram Messenger for example. After Facebook bought WhatsApp, the risk to your privacy increased. Of course Facebook gave users the possibility to opt out, meaning that Facebook would have no access to their WhatsApp data… for the moment. The first reason for me to look into Telegram was that it is completely cloud-based. You log in to your Telegram account with a mobile phone, a tablet or your laptop, using an app or just the browser. Secondly, Telegram offers more possibilities for attaching files to your messages: not only pictures, but also audio, video or text files. Nice, but that was not a compelling reason for me to switch. The compelling reason was the possibility of using bots, and of creating your own bot.

How does it work? Installing the messenger service is a piece of cake. Download the app and sign up. A code is sent to your phone to log in; done. In your contacts you can see who is already using Telegram; the others you can send an invitation.
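Bots talk to Telegram through the Telegram Bot API over HTTPS. A new bot is registered by chatting with @BotFather, which hands out an access token. The sketch below, in Python with only the standard library, builds a `sendMessage` request. The token and chat id are hypothetical placeholders, and nothing is sent over the network unless you call `send_message` yourself with a real token.

```python
import json
import urllib.request

API_BASE = "https://api.telegram.org"


def build_send_message_request(token, chat_id, text):
    """Build (but do not send) a Bot API sendMessage request with a JSON body."""
    url = f"{API_BASE}/bot{token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_message(token, chat_id, text, timeout=10):
    """Send the request and return the decoded JSON reply from Telegram."""
    request = build_send_message_request(token, chat_id, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical placeholders: use the token from @BotFather and a real chat id.
req = build_send_message_request("123456:hypothetical-token", 12345678, "Hello from my first bot")
print(req.get_method(), req.full_url)
```

Keeping the request construction separate from the network call makes it easy to check what a bot will send before it actually talks to Telegram.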

Monday 26 June 2017


Creating interactive graphics is vital to data journalism stories. In my first blog post on this subject I explored the possibilities of D3, JavaScript, R and plotly. If you want to avoid D3 and JavaScript completely and use only Python, plotly has developed an interesting new library for that: Dash. I have been digging into this possibility using data about Dutch municipalities.
From an analysis in R I know that, across municipalities, there is a correlation between the value of houses and the average income. When I check the partial correlation, using political party as an intervening (control) variable, the correlation does not change dramatically. Can we produce an interactive graph showing this conclusion?
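The analysis itself was done in R. As an illustration only, here is a Python sketch (standard library only) of one common way to compute a partial correlation when the control variable is categorical, such as political party: subtract each party's mean from both variables, which gives the same residuals as a regression on party dummies, and correlate what is left. Treating "party" as a single label per municipality is an assumption, and the figures in the usage lines are made up.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def demean_by_group(values, groups):
    """Subtract each group's mean: the residuals of a regression on group dummies."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def partial_corr_categorical(xs, ys, groups):
    """Correlation of xs and ys after removing the effect of a categorical control."""
    return pearson(demean_by_group(xs, groups), demean_by_group(ys, groups))


# Hypothetical figures for six municipalities, for illustration only.
house_value = [210, 250, 190, 320, 280, 230]  # average house value, x 1,000 euro
income = [31, 35, 29, 42, 38, 33]             # average income, x 1,000 euro
party = ["A", "A", "B", "B", "C", "C"]        # hypothetical party label
print("correlation:", round(pearson(house_value, income), 3))
print("partial, controlling for party:",
      round(partial_corr_categorical(house_value, income, party), 3))
```

If the two printed numbers are close, the party variable explains little of the relationship, which is the kind of conclusion an interactive graph would then need to show.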

Friday 16 June 2017


The new version of Tableau Public, 10.3, makes working with data a lot easier. Here are some of the most important improvements:
  1. PDFs are always a pain in the rear, and cracking the file with one of the web services such as PDF to Excel, or with Tabula, can sometimes be hard. Now Tableau can open PDFs and connect them immediately to a worksheet.
  2. If you don't want to work with Excel, there is always Google Sheets. But to get the sheets into Tableau, you first had to export them to .xls format. Now we can import Google Sheets directly into Tableau.
  3. Excel has its limitations for statistical analysis. R has many more tools under the hood, but its visualization options are limited, especially for online publication. Starting with Tableau 10.3, .Rdata files can be imported directly into Tableau worksheets.
  4. Making maps with Tableau had important limitations, because one had to rely on the maps provided by Tableau. My solution was to produce the map in QGIS and export it to Google FT (Fusion Tables). And here it is: Tableau now reads shapefiles (.shp) and makes beautiful maps. Adding data to the map is no problem: choose one of four different database joins between your map and your data.
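Tableau offers four join types: inner, left, right and full outer. The choice matters when the map and the data do not list exactly the same municipalities. The sketch below is not how Tableau works internally; it simply illustrates in plain Python which rows each join keeps, using two dictionaries keyed by hypothetical municipality codes.

```python
def join(left, right, how="inner"):
    """Join two dicts on their keys, like a database join.

    how: "inner", "left", "right" or "full" (full outer).
    Returns a sorted list of (key, left_value, right_value); a missing side is None.
    """
    if how == "inner":
        keys = left.keys() & right.keys()   # only keys present on both sides
    elif how == "left":
        keys = left.keys()                  # every key of the left table (the map)
    elif how == "right":
        keys = right.keys()                 # every key of the right table (the data)
    elif how == "full":
        keys = left.keys() | right.keys()   # every key from either side
    else:
        raise ValueError(f"unknown join type: {how}")
    return [(k, left.get(k), right.get(k)) for k in sorted(keys)]


# Hypothetical map shapes and data values, keyed by municipality code.
shapes = {"A": "shape-A", "B": "shape-B", "C": "shape-C"}
values = {"A": 1.0, "B": 2.0, "D": 4.0}

for how in ("inner", "left", "right", "full"):
    print(how, join(shapes, values, how))
```

A left join from the map keeps every shape, so municipalities without data still appear (with an empty value); an inner join silently drops them, which is worth checking before publishing a map.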

Saturday 10 June 2017


Since the beginnings of data journalism in the nineties of the last century, then called CARR or Computer Assisted Research and Reporting, techniques for analyzing and visualizing data have improved enormously. One of the central tools in the nineties was the spreadsheet, with Microsoft Excel as the de facto standard. Spreadsheets are still widely used for analysis, but moving into advanced data journalism, for example using R for deeper statistical analysis or D3 for better interactive graphics, creates various new challenges. You will then often have to engage in different types of coding: I got stuck between Python (or R) and JavaScript (for D3). Does a data journalist need to learn all these programming languages, or is there an easier and faster solution?
Looking at journalism practice, the answer is: step onto the steep learning curve and start learning how to code. Here is some help. Next year Paul Bradshaw starts an MA in Data Journalism at the Birmingham School of Media. Studying "Coding and computational thinking being applied journalistically (I cover using JavaScript, R, and Python, command line, SQL and Regex to pursue stories)" is one of the elements of this new MA, writes Bradshaw on his blog.
Looking at the job market, there is real demand for data journalists with coding skills. Here is a job listing from The Economist. Its preferred qualities include: "A good understanding of data analytics and Coding skills (JavaScript and Python), or a background in data journalism, are a plus."
In what follows I will argue that a basic understanding of coding is very helpful, but that new web services help data journalists avoid getting stuck in coding.