Monday, 17 September 2018

Google Data: Traveling back in Time

Discussing how to control the Internet and big data companies like Google, Amazon and Facebook is important. However, finding out what these companies know about you is another question. This is about YOUR data, and thus a matter on a personal level. I found out that it is pretty scary how much Google, for example, knows about the details of my whereabouts.
First I downloaded my Google data using Google Takeout. Unzipped, the file was 363 GB. To show the amount of detail and to produce an interesting visualization, I took out the location history file, 273 MB. The file is in JSON. There is an interesting site that visualizes your location history as a heat map, and by zooming in you get a lot of detail.
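For readers who want to look inside the location history file themselves, here is a minimal Python sketch. It is not the method used by the heat-map site. It assumes the layout the Takeout export used at the time: a top-level "locations" list whose records carry "timestampMs", "latitudeE7" and "longitudeE7" (degrees multiplied by ten million). Those field names, the helper names and the sample records are assumptions for illustration, so check them against your own file.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def parse_locations(raw):
    """Yield (utc_datetime, latitude, longitude) for each usable record."""
    for rec in raw.get("locations", []):
        # Skip records that lack coordinates (assumed to be possible).
        if "latitudeE7" not in rec or "longitudeE7" not in rec:
            continue
        when = datetime.fromtimestamp(int(rec["timestampMs"]) / 1000, tz=timezone.utc)
        # E7 values are degrees multiplied by 10,000,000.
        yield when, rec["latitudeE7"] / 1e7, rec["longitudeE7"] / 1e7


def heat_cells(points, decimals=3):
    """Count points per grid cell; 3 decimals is roughly a 100 m grid."""
    return Counter(
        (round(lat, decimals), round(lon, decimals)) for _, lat, lon in points
    )


# Hypothetical sample records, not real data.
sample = json.loads("""
{"locations": [
  {"timestampMs": "1505649600000", "latitudeE7": -339212000, "longitudeE7": 184178000},
  {"timestampMs": "1505650200000", "latitudeE7": -339213000, "longitudeE7": 184181000},
  {"timestampMs": "1505736000000", "latitudeE7": -339300000, "longitudeE7": 184200000}
]}
""")

points = list(parse_locations(sample))
cells = heat_cells(points)

# The busiest cells are the "hot" spots a heat map would highlight.
for (lat, lon), count in cells.most_common():
    print(lat, lon, count)
```

To try this on a real export, replace the sample with the result of json.load() on your own file. A file of several hundred megabytes may need a fair amount of memory when loaded in one go.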
Last year I worked for a month in Cape Town. The three pictures below zoom in step by step, and in the end I can find the restaurant at Bree Straat where I had lunch last year. This is time traveling.
NOW it is time to scrutinize all my data at Google and Facebook!

Tuesday, 21 August 2018


Some time ago I installed Docker to explore Blockstack. I have now been working with Docker for a while using the command prompt. Starting and stopping containers, pulling an image or removing them is a lot of typing. And I make mistakes in the commands, forgetting a flag or the specification for the volume. I tried to find a GUI for Docker: just creating containers with a click. After reading reviews on the web I tried Portainer. Portainer runs in a Docker container and can be quickly installed:
docker run -d -p 9000:9000 \
-v /var/run/docker.sock:/var/run/docker.sock \

Interest in Docker has been rising over the past year, according to Google Trends. That is not surprising, because Docker has a lot of advantages compared to virtualization with, for example, Oracle VirtualBox. Docker containers share the kernel of the host instead of booting a full guest operating system; the software runs in a separate container. It is an ideal solution for developers testing software or for users who want to experiment.
When I work with R, making graphs and visualizations is important for data journalism applications.
The Shiny server of R is one of the most advanced possibilities. Running it in a Docker container, you deploy the server in a minute and can experiment as much as you like, without the chance of wrecking your OS.

Tuesday, 24 July 2018

Text Mining Made Easy

When I am doing a text analysis I generally use R. R has various libraries for text analysis and there are also how-tos. There is one for basic text mining in R by Philip Murphy. When you know the basics of R and RStudio, this works like a cooking recipe. However, for a training for data journalists this is a bit over the top, because first you have to cover some theory about text mining, next introduce R and RStudio, and then take them step by step through an example. This learning curve is a bit steep.
I found some light at the end of the Google tunnel: Voyant Tools. Voyant Tools is simple and easy to handle, it is web based, it is free, and it has lots of possibilities for analysis, ranging from simple word frequencies and word clouds to correlations between words and links between words in a network. On top of this, all your visualizations, like a word cloud, can be downloaded separately as .png or .svg, and data like word frequencies can be downloaded as .csv. And finally there is a link to the page with the whole analysis. Upload your data and start mining.
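To give an idea of what the simplest of these analyses involves, here is a minimal Python sketch of a word frequency count with a CSV export. It is an illustration only and not how Voyant Tools is implemented; the stop word list, the function names and the sample text are made up for the example.

```python
import csv
import io
import re
from collections import Counter

# Tiny illustrative stop word list; real tools use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "that"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into words and count the non-stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


def to_csv(freqs, top=10):
    """Return the most frequent words as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["word", "count"])
    writer.writerows(freqs.most_common(top))
    return buf.getvalue()


# Hypothetical sample text.
text = (
    "Data journalism is social science done on deadline. "
    "Data tells the story; the story needs data."
)
freqs = word_frequencies(text)
print(to_csv(freqs, top=5))
```

The same counts are what a word cloud is drawn from: the higher the count, the bigger the word.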

Sunday, 8 July 2018


Data journalism is already more than fifty years old. It started in the sixties as precision journalism with Phil Meyer, then became CARR (computer-assisted research and reporting), and is now called data journalism. The shortest definition of data journalism is 'social science done on deadline' (Steve Doig). We incorporate the tools of the social sciences to analyze data and include the results in our storytelling.
In the beginning, some 10-15 years ago, practicing data journalism needed extra skills and training. Scraping data, cleaning up and analyzing in Excel, making graphs and maps, getting data into the story: all this needed some extra journalism training. Therefore data journalism became a specialization within journalism.

The field is changing fast, and data journalism is becoming a do-it-yourself toolkit that everybody can use with a minimum of skills and understanding. Take a tool like Flourish, for example: put the data in, push a button and get the graph or the map. Or the latest: Workbench, a project from the Columbia J-school in New York. Clean, scrape, analyze and visualize data without coding. Sign up and get started: all the data journalism tools integrated in one package.

Reflecting on data journalism on his onlinejournalism blog, Paul Bradshaw distinguishes two ways of teaching data journalism: slow or fast. Teaching data journalism fast works as follows: “For many years I began my introductory data journalism classes with basic spreadsheet techniques, followed by visualization sessions to show them how to bring some of the results to life. In 2016, however, I decided to try something different: what if, instead of taking students through the process chronologically, we started at the end — and worked backwards from there? The class worked like this: students were given a spreadsheet of several tables already ready to be turned into a chart”. The new tools just mentioned not only make data journalism easy, but also clear the way for thinking about the story to be produced, rather than about the technology and number crunching behind it.


When I switched on the Internet at the School of Journalism at the end of the eighties of the past century, I was impressed by the idea of electronic communication, ranging from e-mail to IRC chat.
This would enhance communication and understanding, and contribute to democracy. Now the opposite is the case. “At the heart of their disenchantment is that the internet has become much more ‘centralised’ (in the tech crowd’s terminology) than it was even ten years ago” ... “the system was ‘biased in favour of decentralisation of power and freedom to act’”, writes The Economist.

From de-centralized to centralized
Instead of having direct one-on-one communication, decentralized and uncontrolled, we are working on controlled, centralized systems. “These days the main way of getting online is via smartphones and tablets that confine users to carefully circumscribed spaces, or ‘walled gardens’, which are hardly more exciting than television channels”. It almost looks as if the times before the Internet have returned. Is Facebook so different from what was once CompuServe?

The decentralized infrastructure of the Internet is still there. At the basic level the net still runs on TCP/IP. “The connections to transfer information still exist, as do the protocols, but the extensions the internet has spawned now greatly outweigh the original network”. It is not the basic level but the levels higher up that are centralized and controlled: consumer websites and all these apps. Take the social networks, for example: we work on the machines of Facebook (comparable with the CompuServe mainframe). “The best way to picture all this is as a vast collection of data silos with big pipes between them, connected to all kinds of devices which both deliver services and collect more data”.

Data business
How could that happen? Answer: data! “The Google search engine attracts users, which attracts suppliers of content (in Google’s case, websites that want to be listed in its index), which in turn improves the user experience, and so on. Similarly, the more people use Google’s search service, the more data it will collect, which helps to make the results more relevant.” The same goes for Facebook or Instagram. Data and targeted advertising are the basis of the business model that turned the Internet into a totally different beast. “Having tried to sell its technology to companies, it went for advertising, later followed by Facebook and other big internet firms. That choice meant they had to collect ever more data about their users. The more information they have, the better they can target their ads and the more they can charge for them.”

Take back control
What can we do to take back our original control over our communication on the internet? Below I give a summary of four possible solutions, based on the literature referred to in the links.

Monday, 11 June 2018

Friday, 8 June 2018

IS DATA JOURNALISM UNDER ATTACK (Opening Media Lab speech, Peter Verweij)

Dar es Salaam, June 7, 2018, Tanzania Media Fund (TMF)

Lianne Houben (in the middle), Deputy Head of Mission at the Embassy of the Netherlands in Dar es Salaam, opening the new TMF media lab.
Photo: Josh Laporte, EJC

Of course you are all on Facebook, right? So you all gave Zuckerberg permission to hoover up all your data to sell targeted advertising. In exchange you can post messages and pics to the world and to your friends. Zuckerberg: by creating better communication we create a better world. This ideology comes under attack once we understand the true business model behind Facebook: not only making huge profits, but also, by analyzing and combining the data of its users, trying to influence our thinking and acting through advertising and information. Book a flight and within minutes you are advised to book a car and a hotel at your destination. And it is not only Facebook, but Amazon and Google as well. They all live from the use and misuse of your data. After Cambridge Analytica, Facebook took the full blow; the others are temporarily off the hook. The result is clear: the very idea of data is under attack because of privacy, advertising, manipulation and misuse. There is something fishy about data.