Tuesday 25 December 2018

Re-opened my WordPress blog: d3media

I forgot that I used a WordPress blog to write about digital journalism. I started on Blogger writing about data journalism. I have now updated my WordPress blog with the Blogger content: D3Media
Here is the URL: https://d3media.wordpress.com/

Why? WordPress gives more options to enhance the quality. I am looking, for example, at installing automated tagging or a bot for content search.

Sunday 23 December 2018


Holiday time... now it is time to play around with... what? I have an old camera, a Nikon P5100. Question: can I control that camera from a PC? I did not want to create trouble on my training/work-related laptop, so I fired up my Raspberry Pi.
After a bit of Google browsing I found out that you can control a camera from a PC:

1. First you have to install gphoto2 on the Pi; here is the recipe to install from scratch. Compiling the whole thing takes a bit of time, but after a cup of coffee you are ready to go.

2. Hook up the camera with a USB connection and open a terminal. Here is an overview of the commands to check if gphoto2 is working. Take care that the Pi does not open the SD card of the camera through auto-mounting of removable media; turn that off.

3. Now you want to take some shots. This is how you do that from the command line.

4. Of course you cannot remember all these commands with their detailed options. Well, there are scripts on GitHub. Shooot is a very nice one.

5. Finally, if you prefer a GUI, use gphoto-bottle. BTW the P5100 is quite limited in use with gphoto2: no movie, no streaming, only shooting pictures in a time sequence. So better use a newer camera!
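
The steps above can be sketched in a few lines of Python that wrap the gphoto2 command line. This is only a sketch under assumptions: the function names are made up for illustration, the file name pattern is my own choice, and it presumes gphoto2 is installed and on the PATH. The flags used (--auto-detect, --capture-image-and-download, --frames, --interval, --filename) are standard gphoto2 options, but whether the P5100 honours all of them is something to test on the camera itself.

```python
import shutil
import subprocess
from typing import List


def detect_command() -> List[str]:
    """Command that lists the cameras gphoto2 can see over USB."""
    return ["gphoto2", "--auto-detect"]


def timelapse_command(frames: int, interval_s: int,
                      pattern: str = "shot-%Y%m%d-%H%M%S.jpg") -> List[str]:
    """Command that shoots `frames` pictures, one every `interval_s` seconds.

    The file name pattern is a hypothetical default; gphoto2 expands the
    date and time codes when it saves each picture.
    """
    if frames < 1 or interval_s < 1:
        raise ValueError("frames and interval_s must be positive")
    return [
        "gphoto2",
        "--capture-image-and-download",
        "--frames", str(frames),
        "--interval", str(interval_s),
        "--filename", pattern,
    ]


def run(command: List[str]) -> int:
    """Run the command; return its exit code, or -1 if the program is missing."""
    if shutil.which(command[0]) is None:
        return -1
    return subprocess.run(command, check=False).returncode
```

With the camera connected, run(detect_command()) checks that it is recognised, and run(timelapse_command(10, 30)) would take ten pictures at thirty-second intervals.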

Wednesday 12 December 2018


Voyant is an amazing tool for text analysis; just play around with it at: https://voyant-tools.org/ Just upload or paste your text and push the buttons. That is a different experience from the coding I used to do in R. The disadvantage is that your texts are public on the web. Solution: run the Voyant tool on your own machine. Of course, use a Docker container: https://hub.docker.com/r/sepastian/voyant-docker/
Start the container with:
docker run -d --name voyant -p 8080:8080 sepastian/voyant-docker:latest
And here is the screen running at http://localhost:8080/voyant/

Thursday 8 November 2018

Sovereign Debt Southern Africa

Wednesday 7 November 2018

Big Mac Interactive

Big Mac Index for Southern African countries-STATIC

Tuesday 25 September 2018


There is a trend to make data journalism easier. No spreadsheets, difficult statistics or coding, but immediately producing impressive visualizations. Read more: https://d3-media.blogspot.com/2018/07/new-steps-in-data-journalism.html
I believe we have arrived at the logical end of this development: turn your data into a work of art. Give it a try at: https://morph.graphics/. Difficult? No, of course not, but the intro by Alberto Cairo is very nice. Here is an example of my art work, using data about life expectancy and GDP per capita from sub-Saharan countries. Impressive. A nice start for a PowerPoint presentation, but don't ask what it means.

Monday 17 September 2018

Google Data: Traveling back in Time

Discussing how to control the Internet and the big data companies like Google, Amazon and Facebook is important. https://d3-media.blogspot.com/2018/07/take-back-control-over-internet.html However, finding out what these companies know about you is another question. This is about YOUR data and thus on a personal level. I found out that it is pretty scary how much Google, for example, knows about the details of my whereabouts.
First I downloaded my Google data using Takeout: https://takeout.google.com/settings/takeout. Unzipped, the file was 363 GB. To show the amount of detail and to produce an interesting visualization I took out the location history file, 273 MB. The file is in JSON. There is an interesting site that visualizes your location history in a heat map, https://locationhistoryvisualizer.com/heatmap/, and by zooming in you get a lot of detail.
Here is the result of an investigation of the NYTimes about apps using location info:
Last year I worked for a month in Cape Town. The three pictures below zoom in step by step, and in the end I can find the restaurant at Bree Straat where I had lunch last year. This is time traveling.
NOW it is time to scrutinize all my data at Google and Facebook!
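
For those who would rather look at the location file on their own machine than upload it to a site, the heat map idea can be sketched in Python. This is a minimal sketch under assumptions: it presumes the Takeout JSON has a top-level "locations" list whose entries carry "latitudeE7" and "longitudeE7" (degrees multiplied by ten million), which is how the export looked at the time of writing; the three sample points are hypothetical and only roughly placed around Cape Town.

```python
import json
from collections import Counter

CELL_E7 = 100_000  # grid cell of 0.01 degree, expressed in E7 integer units


def heatmap_bins(takeout_json: str, cell_e7: int = CELL_E7) -> Counter:
    """Count location points per grid cell; busy cells are the hot spots."""
    data = json.loads(takeout_json)
    bins = Counter()
    for point in data.get("locations", []):
        if "latitudeE7" not in point or "longitudeE7" not in point:
            continue  # skip records without coordinates
        cell = (point["latitudeE7"] // cell_e7,
                point["longitudeE7"] // cell_e7)
        bins[cell] += 1
    return bins


# Hypothetical sample standing in for the real 273 MB file.
SAMPLE = json.dumps({"locations": [
    {"timestampMs": "1514800000000", "latitudeE7": -339205000, "longitudeE7": 184205000},
    {"timestampMs": "1514800060000", "latitudeE7": -339208000, "longitudeE7": 184209000},
    {"timestampMs": "1514803600000", "latitudeE7": -339305000, "longitudeE7": 184305000},
]})

bins = heatmap_bins(SAMPLE)
for (lat_cell, lon_cell), count in bins.most_common():
    # Convert the cell index back to the south-west corner in degrees.
    print(lat_cell * CELL_E7 / 1e7, lon_cell * CELL_E7 / 1e7, count)
```

For the real file, read it with open(...).read() and pass the text to heatmap_bins; a smaller cell size gives the same effect as zooming in on the map.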

Tuesday 21 August 2018


Some time ago I installed Docker to explore Blockstack. I have now been working with Docker for a while from the command prompt. Starting and stopping containers, pulling an image or removing one: it is a lot of typing. And I make mistakes in the commands, forgetting a flag or the specification for the volume. I tried to find a GUI for Docker: just creating containers with a click. After reading reviews on the web I tried Portainer. Portainer runs in a Docker container and can be quickly installed:
docker run -d -p 9000:9000 \
-v /var/run/docker.sock:/var/run/docker.sock \

Interest in Docker has been rising over the past year, according to Google Trends. That is not surprising, because Docker has a lot of advantages compared to virtualization with, for example, Oracle VirtualBox. A container shares the kernel of the host instead of booting a complete guest operating system; the software runs in a separate container. It is an ideal solution for developers testing software or for users who want to experiment.
When I work with R, making graphs and visualizations is important for data journalism applications.
The Shiny server of R is one of the most advanced possibilities. Running it in a Docker container, you deploy the server in a minute and can experiment as much as you like, without the chance of wrecking your OS.

Tuesday 24 July 2018

Text Mining Made Easy

When I am doing a text analysis I generally use R. R has various libraries for text analysis and there are also how-tos. Here is one for basic text mining in R by Philip Murphy: https://rstudio-pubs-static.s3.amazonaws.com/265713_cbef910aee7642dc8b62996e38d2825d.html. When you know the basics of R and RStudio, this works like a cooking recipe. However, for a training for data journalists this is a bit over the top, because first you have to talk some theory about text mining, next introduce R and RStudio, and then take them step by step through an example. This learning curve is a bit steep.
Found some light at the end of the Google tunnel: Voyant Tools, https://voyant-tools.org/. Voyant Tools is easy and simple to handle, it is web based, it is free, and it has lots of possibilities for analysis, ranging from simple word frequencies and word clouds to correlations between words and links between words in a network. On top of this, all your visualizations, like a word cloud, can be downloaded separately as .png or .svg, and data like word frequencies can be downloaded as .csv. And finally there is a link to the page with the whole analysis. Upload your data and start mining.
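
To make clear what the simplest of these analyses does, here is a rough Python sketch of a word frequency count with a CSV export. It is not how Voyant works internally; the stop word list and the function names are my own assumptions for illustration.

```python
import csv
import io
import re
from collections import Counter

# A tiny, hypothetical stop word list; real tools ship much longer ones.
STOPWORDS = {"a", "an", "and", "is", "of", "on", "the", "to"}


def word_frequencies(text: str, stopwords=STOPWORDS) -> Counter:
    """Lower-case the text, split it into words and count the non-stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token not in stopwords)


def to_csv(frequencies: Counter) -> str:
    """Return the frequencies as CSV text, most frequent word first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["word", "count"])
    writer.writerows(frequencies.most_common())
    return buffer.getvalue()


text = ("Data journalism is social science done on deadline. "
        "Data journalism uses the tools of social science.")
freq = word_frequencies(text)
print(to_csv(freq))
```

The counts are the raw material for a word cloud: the more often a word occurs, the bigger it is drawn.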

Sunday 8 July 2018


Data journalism is already more than fifty years old. It started in the sixties as precision journalism with Phil Meyer, then became CARR (computer assisted research and reporting), and is now called data journalism. The shortest definition of data journalism is 'social science done on deadline' (Steve Doig). We incorporate the tools of the social sciences to analyze data and include them in our storytelling.
In the beginning, some 10-15 years ago, practicing data journalism needed extra skills and training. Scraping data, cleaning up and analyzing in Excel, making graphs and maps, getting data into the story: all of this needed some extra journalism training. Therefore data journalism became a specialization of journalism.

The field is changing fast, and data journalism is becoming a do-it-yourself toolkit that everybody can use with a minimum of skills and understanding. Take a tool like Flourish, https://app.flourish.studio/, for example: put the data in, push a button and get the graph or a map. Or the latest: Workbench. Clean, scrape, analyze and visualize data without coding. A project from Columbia J-school in New York. Sign up and get started: http://workbenchdata.com/. All the data journalism tools integrated in one package.

Reflecting on data journalism on his Online Journalism Blog, Paul Bradshaw creates two categories of data journalism training: teaching slow or fast. Teaching data journalism fast works as follows: “For many years I began my introductory data journalism classes with basic spreadsheet techniques, followed by visualization sessions to show them how to bring some of the results to life. In 2016, however, I decided to try something different: what if, instead of taking students through the process chronologically, we started at the end — and worked backwards from there? The class worked like this: students were given a spreadsheet of several tables already ready to be turned into a chart”. The new tools just mentioned not only make data journalism easy, but also clear the way for thinking about the story to be produced rather than about the technology and number crunching behind it.


When I switched on the Internet at the School of Journalism at the end of the eighties of the past century, I was impressed by the idea of electronic communication, ranging from e-mail to IRC chat.
This would enhance communication and understanding, and contribute to democracy. Now the opposite is the case. “At the heart of their disenchantment is that the internet has become much more ‘centralised’ (in the tech crowd’s terminology) than it was even ten years ago” ... “the system was ‘biased in favour of decentralisation of power and freedom to act’”, writes the Economist.

From de-centralized to centralized
Instead of having direct one-on-one communication, decentralized and uncontrolled, we are working on controlled, centralized systems. “These days the main way of getting online is via smartphones and tablets that confine users to carefully circumscribed spaces, or “walled gardens”, which are hardly more exciting than television channels”. It almost looks as if the times before the Internet have returned. Is Facebook so different from what was once CompuServe?

The decentralized infrastructure of the Internet is still there. On the basic level the net still runs on TCP/IP. “The connections to transfer information still exist, as do the protocols, but the extensions the internet has spawned now greatly outweigh the original network”. It is not the basic level but the levels higher up, the consumer websites and all these apps, that are centralized and controlled. Take the social networks, for example: we work on the machines of Facebook (comparable with the CompuServe mainframe). “The best way to picture all this is as a vast collection of data silos with big pipes between them, connected to all kinds of devices which both deliver services and collect more data”.

Data business
How could that happen? Answer: data! “The Google search engine attracts users, which attracts suppliers of content (in Google’s case, websites that want to be listed in its index), which in turn improves the user experience, and so on. Similarly, the more people use Google’s search service, the more data it will collect, which helps to make the results more relevant.” The same goes for Facebook or Instagram. Data and targeted advertising are the basis of the business model that turned the Internet into a totally different beast. “Having tried to sell its technology to companies, it went for advertising, later followed by Facebook and other big internet firms. That choice meant they had to collect ever more data about their users. The more information they have, the better they can target their ads and the more they can charge for them.”

Take back control
What can we do to take back our original control over our communication on the internet? Below I give a summary of four possible solutions, based on the literature referred to in the links.

Monday 11 June 2018

Friday 8 June 2018

IS DATA JOURNALISM UNDER ATTACK (Opening Media Lab speech, Peter Verweij)

Dar es Salaam, June 7, 2018, Tanzania Media Fund (TMF)

Lianne Houben (in the middle), Deputy Head of Mission
at the Embassy of the Netherlands at Dar es Salaam,
opening the new media lab of TMF
Photo: Josh Laporte, EJC

Of course you are all on Facebook, right? So you all gave Zuckerberg permission to hoover up all your data to sell targeted advertising. In exchange you can post messages and pics to the world and to your friends. Zuckerberg: by creating better communication we create a better world. This ideology is under attack once we understand the true business model behind Facebook: not only making huge profits, but also, through analyzing and combining the data of the users, trying to influence our thinking and acting through advertising and information. Book a flight and within minutes you are advised to book a car and a hotel at your destination. And it is not only Facebook but Amazon and Google as well. They all live from the use and misuse of your data. After Cambridge Analytica, Facebook got the full blow; the others are temporarily off the hook. The result is clear: the very idea of data is under attack, because of privacy, advertising, manipulation and misuse. There is something fishy about data.

Saturday 10 February 2018


Earthquakes in the province of Groningen have been induced by the extraction of natural gas since the sixties. The KNMI has recorded and collected the data on the quakes. Inspired by Maarten Lambrechts, I loaded the data into a template at Flourish, which gives the following time chart:

Friday 9 February 2018


Flourish is an awesome tool to create charts. Its output is almost art; this could move data journalism away from its original goal: being a kind of 'sociology done on deadline', aiming at 'improving reporting by using the tools of science'. Although a chart can be made quickly, easily and beautifully, the question still is: what does it show and what does it mean?
Below I show how to use R and R Studio to do an analysis of the same dataset.

Loading the data set into a data frame h
Showing the structure of the data set:
'data.frame':   165 obs. of  8 variables:
 $ year                : int  2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 ...
 $ country             : Factor w/ 11 levels "Angola","Botswana",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ life.expec          : num  46.6 47.4 48.1 48.8 49.4 ...
 $ gdp.cap             : num  606 574 776 850 1136 ...
 $ code                : Factor w/ 11 levels "AGO","BWA","CMR",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Total.as.percGDP    : num  2.79 5.38 3.63 4.41 4.71 4.1 4.54 3.38 3.84 4.37 ...
 $ govperc.total.exp   : num  60.2 52.2 46.4 46.4 51.1 ...
 $ privat.perc.of.total: num  39.8 47.8 53.6 53.6 48.9 ...
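
As a small illustration of the kind of question a chart alone does not answer, the sketch below computes the correlation between life expectancy and GDP per capita. It is written in Python rather than R, and it only uses the first five values of life.expec and gdp.cap as they are printed, rounded, in the structure listing above (Angola, 2000 to 2004). Five rounded points from one country are far too few for a conclusion; the full analysis would use all 165 observations.

```python
from math import sqrt

# First five rows as shown (rounded) in the structure listing: Angola, 2000-2004.
life_expec = [46.6, 47.4, 48.1, 48.8, 49.4]
gdp_cap = [606, 574, 776, 850, 1136]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


r = pearson(life_expec, gdp_cap)
print(f"Pearson r for the five listed rows: {r:.2f}")
```

A value close to 1 means the two series rise together in these rows; it says nothing yet about cause and effect.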

Thursday 8 February 2018


You don't have to be a highly skilled data journalist to create interesting graphs and charts. There are a large number of internet sites where you can drop your data and retrieve, in seconds, awesome graphics to embed on your news blog or website. In my training I have been working with, for example, Datawrapper, Plotly and Tableau. Recently a tweet by Alberto Cairo drew my attention to Flourish. Amazing! Flourish, based on a cooperation with Google News Lab, easily beats the competition. And of course it is free. Create an account, log in, choose a template and you are in business.
I played around with it, using some data from the World Bank. I selected a number of sub-Saharan countries and downloaded life expectancy and GDP per capita from 2000 to 2014. Here is my creation, done in a few minutes.