Monday, 26 June 2017


Creating interactive graphics is vital to data journalism stories. In my first blog post on this subject I explored the possibilities of D3, JavaScript, R and plotly. If you want to avoid D3 and JavaScript completely and use only Python, plotly has developed an interesting new library called Dash. I have been digging into this possibility using data about Dutch municipalities.
From an analysis in R I know that there is a correlation between the value of houses and the average income of municipalities. When I check the partial correlation, using political party as an intervening variable, the correlation does not change dramatically. Can we produce an interactive graph showing this conclusion?
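
For readers who want to see what such a partial-correlation check looks like outside R, here is a minimal sketch in Python. It is not the analysis from this post: the data are randomly generated, and the names `income`, `house_value` and `party` are made up for illustration. Because party is a category, the sketch removes the party means from both variables (the same as regressing each on party dummies) and then correlates what is left.

```python
import numpy as np


def partial_correlation(x, y, groups):
    """Correlate x and y after removing the effect of a categorical variable.

    Both variables are regressed on dummy variables for `groups`;
    the correlation of the residuals is the partial correlation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)

    # One dummy column per category (no separate intercept needed).
    labels = np.unique(groups)
    design = (groups[:, None] == labels[None, :]).astype(float)

    # Residuals of x and y after the least-squares fit on the dummies.
    x_res = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    y_res = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(x_res, y_res)[0, 1])


# Made-up data, NOT the Dutch municipality figures used in this post.
rng = np.random.default_rng(0)
n = 300
party = rng.choice(["A", "B", "C"], size=n)           # hypothetical largest party
income = rng.normal(30, 5, size=n)                    # hypothetical average income
house_value = 6 * income + rng.normal(0, 20, size=n)  # hypothetical house value

raw_r = float(np.corrcoef(income, house_value)[0, 1])
partial_r = partial_correlation(income, house_value, party)
print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

In this made-up example party is unrelated to income and house value, so the two numbers come out close together, which is the pattern described above. If the control variable did drive both, the partial correlation would drop noticeably below the raw one.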

Friday, 16 June 2017


The new version of Tableau Public, 10.3, makes working with data a lot easier. Here are some of the most important improvements:
  1. PDFs are always a pain in the rear, and cracking the file can sometimes be hard, whether you use one of the web services like Pdf to Excel or a tool like Tabula. Now Tableau is able to open PDFs and connect them immediately to a worksheet.
  2. If you don't want to work with Excel, there is always Google Sheets. But to get the sheets into Tableau, exporting to an .xls format was needed. Now we can import Google Sheets directly into Tableau.
  3. Excel has its limitations for statistical analysis. R has many more tools under the hood, but its options for visualization are limited, especially online. Starting with Tableau 10.3, .Rdata files can be imported directly into the worksheets of Tableau.
  4. Making maps with Tableau had important limitations because one had to rely on the maps provided by Tableau. My solution was to produce the map in QGIS and export it to Google FT (Fusion Tables). And here it is: Tableau now reads shape files (.shp) and makes beautiful maps. Adding data to the map is now no problem: choose one of 4 different database joins between your map and your data.
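
In Tableau the join between map and data is chosen by clicking, but the four join types behave like the ones in any database. As a rough illustration only, here is the same idea in Python with pandas. The two small tables and their column names (`municipality`, `shape_id`, `avg_income`) are invented for this sketch and have nothing to do with Tableau's internals.

```python
import pandas as pd

# Hypothetical map table: one row per shape in the shape file.
shapes = pd.DataFrame({
    "municipality": ["A", "B", "C"],
    "shape_id": [1, 2, 3],
})

# Hypothetical data table: "C" is missing, "D" has no shape.
data = pd.DataFrame({
    "municipality": ["A", "B", "D"],
    "avg_income": [31.0, 28.5, 35.2],
})

# The four classic join types, with the map as the left table.
joined = {
    how: shapes.merge(data, on="municipality", how=how)
    for how in ("inner", "left", "right", "outer")
}

for how, table in joined.items():
    print(how, len(table), "rows")
    print(table, end="\n\n")
```

An inner join keeps only municipalities present in both tables, a left join keeps every shape (with empty data where nothing matches), a right join keeps every data row, and a full outer join keeps everything. For a map, the left join is usually the safe choice, because no shape disappears from the map when its data is missing.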

Saturday, 10 June 2017


Since the beginnings of data journalism in the nineties of the last century, when it was called CARR or Computer Assisted Research and Reporting, techniques for analyzing and visualizing data have improved enormously. One of the central tools in the nineties was the spreadsheet, standardized by Microsoft Excel. Spreadsheets are still much used for analysis, but moving into the area of advanced data journalism, for example using R for deeper statistical analysis or D3 for creating better interactive graphics, creates various new challenges. You will then often engage in different types of coding: I got stuck between Python (for R) and JavaScript (for D3). Does a data journalist need to learn all these programming languages, or is there an easier and faster solution?
Looking at journalism practice, the answer is: step onto the steep learning curve and start learning how to code. Here is some help. Next year Paul Bradshaw starts an MA in Data Journalism at the Birmingham School of Media. “Coding and computational thinking being applied journalistically (I cover using JavaScript, R, and Python, command line, SQL and Regex to pursue stories)” is one of the elements of this new MA, writes Bradshaw on his blog.
Looking at the market, there is real demand for data journalists with coding skills. Here is a job listing from the Economist. The preferred qualities include: “A good understanding of data analytics” and “Coding skills (JavaScript and Python), or a background in data journalism, are a plus.”
In the following I will argue that a basic understanding of coding is very helpful, but new services on the web help data journalists to avoid getting stuck in coding.